user

Messages by Thread

- Re: How is Spark a memory based solution if it writes data to disk before shuffles? krexos
- Re: How is Spark a memory based solution if it writes data to disk before shuffles? Sid
- Re: How is Spark a memory based solution if it writes data to disk before shuffles? krexos
- Re: How is Spark a memory based solution if it writes data to disk before shuffles? Sean Owen
- Re: How is Spark a memory based solution if it writes data to disk before shuffles? krexos
- Re: How is Spark a memory based solution if it writes data to disk before shuffles? Sid
- Re: How is Spark a memory based solution if it writes data to disk before shuffles? Gourav Sengupta
- Re: How is Spark a memory based solution if it writes data to disk before shuffles? Apostolos N. Papadopoulos
Spark Group How to Ask Zehra Günindi
- Re: Spark Group How to Ask Sean Owen
DatasourceV2 with Custom JDBC Source Arsh Bhardwaj
Sources/V2 DatasourceV2 in Spark 3.* Bigg Ben
Understanding about joins in spark Sid
[FINAL CALL] - Travel Assistance to ApacheCon New Orleans 2022 Gavin McDonald
Glue is serverless? how? Sid
- Re: Glue is serverless? how? Bjørn Jørgensen
- Re: Glue is serverless? how? finkel
- Re: Glue is serverless? how? Sid
Follow up on Jira Issue 39549 Chenyang Zhang
- Re: Follow up on Jira Issue 39549 Sean Owen
- Re: Follow up on Jira Issue 39549 Chenyang Zhang
- Re: Follow up on Jira Issue 39549 Sean Owen
Need help with the configuration for AWS glue jobs Sid
- Re: Need help with the configuration for AWS glue jobs Gourav Sengupta
- Re: Need help with the configuration for AWS glue jobs Sid
[Java 17] --add-exports required? Greg Kopff
- Re: [Java 17] --add-exports required? Yang,Jie(INF)
- Re: [Java 17] --add-exports required? Greg Kopff
- Re: [Java 17] --add-exports required? Yang,Jie(INF)
- Re: [Java 17] --add-exports required? Greg Kopff
StructuredStreaming - read from Kafka, writing data into Mongo every 10 minutes karan alang
repartition(n) should be deprecated/alerted Igor Berman
- Re: repartition(n) should be deprecated/alerted Sean Owen
- Re: repartition(n) should be deprecated/alerted Igor Berman
[Spark Dataframe] How to load compressed file? (lz4, snappy) HelloWorld
Will it lead to OOM error? Sid
- Re: Will it lead to OOM error? Deepak Sharma
- Re: Will it lead to OOM error? Enrico Minack
- Re: Will it lead to OOM error? Sid
- Re: Will it lead to OOM error? Enrico Minack
- Re: Will it lead to OOM error? Yong Walt
- Re: Will it lead to OOM error? Sid
Spark Doubts Sid
- Re: Spark Doubts Apostolos N. Papadopoulos
- Re: Spark Doubts Yong Walt
- Re: Spark Doubts Sid
- Spark Doubts Sid
- Re: Spark Doubts Tufan Rakshit
- Re: Spark Doubts Sid
- Re: Spark Doubts russell . spitzer
spark-submit on kubernetes Michaela Bogiages
Spark Summit Europe Gowran, Declan
- Re: Spark Summit Europe Sean Owen
How to guarantee dataset is split over unique partitions (partitioned by a column value) DESCOTTE Loic - externe
- Re: How to guarantee dataset is split over unique partitions (partitioned by a column value) Sean Owen
How reading works? Sid
- Re: How reading works? Sid
- Re: How reading works? Sid
- Re: How reading works? Bjørn Jørgensen
- Re: How reading works? Bjørn Jørgensen
- Re: How reading works? Sid
input file size mbreuer
- Re: input file size marc nicole
- Re: input file size Yong Walt
- Re: input file size Enrico Minack
- Re: input file size Gourav Sengupta
- Re: input file size Enrico Minack
- Re: input file size marc nicole
- Re: input file size Markus Breuer
how to properly filter a dataset by dates ? marc nicole
- Re: how to properly filter a dataset by dates ? Sean Owen
- Re: how to properly filter a dataset by dates ? marc nicole
- Re: how to properly filter a dataset by dates ? Sean Owen
- Re: how to properly filter a dataset by dates ? marc nicole
- Re: how to properly filter a dataset by dates ? Stelios Philippou
- Re: how to properly filter a dataset by dates ? marc nicole
- Re: how to properly filter a dataset by dates ? Stelios Philippou
- Re: how to properly filter a dataset by dates ? marc nicole
- Re: how to properly filter a dataset by dates ? marc nicole
- Re: how to properly filter a dataset by dates ? marc nicole
How to update TaskMetrics from Python? Shay Elbaz
Spark Structured streaming(batch mode) - running dependent jobs concurrently karan alang
How to recognize and get the min of a date/string column in Java? marc nicole
- Re: How to recognize and get the min of a date/string column in Java? Sean Owen
- Re: How to recognize and get the min of a date/string column in Java? marc nicole
- Re: How to recognize and get the min of a date/string column in Java? marc nicole
- Re: How to recognize and get the min of a date/string column in Java? Sean Owen
- Re: How to recognize and get the min of a date/string column in Java? marc nicole
- Re: How to recognize and get the min of a date/string column in Java? marc nicole
Stickers and Swag Xiao Li
- Re: Stickers and Swag Hyukjin Kwon
- Re: Stickers and Swag Gengliang Wang
- Re: Stickers and Swag Reynold Xin
- Re: Stickers and Swag Qian Sun
Redesign approach for hitting the APIs using PySpark Sid
- Re: Redesign approach for hitting the APIs using PySpark Gourav Sengupta
- Re: Redesign approach for hitting the APIs using PySpark Sid
- Re: Redesign approach for hitting the APIs using PySpark Gourav Sengupta
- Re: Redesign approach for hitting the APIs using PySpark Sid
- Re: Redesign approach for hitting the APIs using PySpark Gourav Sengupta
- Re: Redesign approach for hitting the APIs using PySpark Sid
[no subject] Rodrigo
- Re: Aironman DirtDiver
Spark streaming / confluent Kafka- messages are empty KhajaAsmath Mohammed
API Problem Sid
- Re: API Problem Stelios Philippou
- Re: API Problem Sean Owen
- Re: API Problem Sid
- Re: API Problem Stelios Philippou
- Re: API Problem Sid
- Re: API Problem Enrico Minack
- Re: API Problem Enrico Minack
- Re: API Problem Sid
- Re: API Problem Enrico Minack
- Re: API Problem Sid
- Re: API Problem Enrico Minack
Retrieve the count of spark nodes Poorna Murali
- Re: Retrieve the count of spark nodes Stephen Coy
- Re: Retrieve the count of spark nodes Poorna Murali
to find Difference of locations in Spark Dataframe rows Chetan Khatri
- Re: to find Difference of locations in Spark Dataframe rows Bjørn Jørgensen
How the data is distributed Sid
- Re: How the data is distributed Peyman Mohajerian
- Re: How the data is distributed Sean Owen
- Re: How the data is distributed Sid
Structured streaming with protobuf proto3 schema registry Kiran Biswal
partitionBy creating lot of small files Nikhil Goyal
- Re: partitionBy creating lot of small files Enrico Minack
How to convert a Dataset<Row> to a Dataset<String>? marc nicole
- Re: How to convert a Dataset<Row> to a Dataset<String>? Sean Owen
- Re: How to convert a Dataset<Row> to a Dataset<String>? marc nicole
- Re: How to convert a Dataset<Row> to a Dataset<String>? Sean Owen
- Re: How to convert a Dataset<Row> to a Dataset<String>? marc nicole
- Re: How to convert a Dataset<Row> to a Dataset<String>? Enrico Minack
- Re: How to convert a Dataset<Row> to a Dataset<String>? marc nicole
- Re: How to convert a Dataset<Row> to a Dataset<String>? Enrico Minack
- Re: How to convert a Dataset<Row> to a Dataset<String>? marc nicole
- Re: How to convert a Dataset<Row> to a Dataset<String>? Christophe Préaud
- Re: How to convert a Dataset<Row> to a Dataset<String>? Stelios Philippou
PartitionBy and SortWithinPartitions Nikhil Goyal
- Re: PartitionBy and SortWithinPartitions Enrico Minack
- Re: PartitionBy and SortWithinPartitions Nikhil Goyal
approx_count_distinct in spark always return 1 marc nicole
Does adaptive auto broadcast respect spark.sql.autoBroadcastJoinThreshold Henry Quan
What's the expected Spark 3.1.4 release date ? Sandeep Vinayak
Kotlin API for Apache Spark feedback finkel
Unable to format timestamp values in pyspark Sid
- Re: Unable to format timestamp values in pyspark Stelios Philippou
- Re: Unable to format timestamp values in pyspark Sid
Unable to convert double values Sid
- Re: Unable to convert double values Stelios Philippou
- Re: Unable to convert double values marc nicole
- Re: Unable to convert double values marc nicole
k-anonymity with Spark in Java marc nicole
Issues getting Apache Spark Martin, Michael
- Re: Issues getting Apache Spark Apostolos N. Papadopoulos
java.lang.NoSuchMethodError: org.apache.hadoop.hive.common.FileUtils.mkdir --> Spark to Hive Prasanth M Sasidharan
- Fwd: java.lang.NoSuchMethodError: org.apache.hadoop.hive.common.FileUtils.mkdir --> Spark to Hive Prasanth M Sasidharan
Complexity with the data Sid
- Re: Complexity with the data Apostolos N. Papadopoulos
- Re: Complexity with the data Sid
- Re: Complexity with the data Gavin Ray
- Re: Complexity with the data Sid
- Re: Complexity with the data Sid
- Re: Complexity with the data Bjørn Jørgensen
- Re: Complexity with the data Sid
- Re: Complexity with the data Bjørn Jørgensen
- Re: Complexity with the data Sid
- Re: Complexity with the data Apostolos N. Papadopoulos
- Re: Complexity with the data Sid
- Re: Complexity with the data Bjørn Jørgensen
- Re: Complexity with the data Sid
- Re: Complexity with the data Bjørn Jørgensen
- Re: Complexity with the data Gourav Sengupta
- Re: Complexity with the data Sid
[SPARK SQL] Spark Thrift server, It is not releasing memory. Ramakrishna Chilaka
GCP Dataproc - adding multiple packages(kafka, mongodb) while submitting spark jobs not working karan alang
Re: Job migrated from EMR to Dataproc takes 20 hours instead of 90 minutes Ranadip Chatterjee
- Re: Job migrated from EMR to Dataproc takes 20 hours instead of 90 minutes Ori Popowski
- Re: Job migrated from EMR to Dataproc takes 20 hours instead of 90 minutes Ranadip Chatterjee
- Re: Job migrated from EMR to Dataproc takes 20 hours instead of 90 minutes Aniket Mokashi
- Re: Job migrated from EMR to Dataproc takes 20 hours instead of 90 minutes Ori Popowski
- Re: Job migrated from EMR to Dataproc takes 20 hours instead of 90 minutes Ranadip Chatterjee
- Re: Job migrated from EMR to Dataproc takes 20 hours instead of 90 minutes Gourav Sengupta
Spark Push-Based Shuffle causing multiple stage failures Han Altae-Tran
- Re: Spark Push-Based Shuffle causing multiple stage failures Mridul Muralidharan
- Re: Spark Push-Based Shuffle causing multiple stage failures Ye Zhou
- Re: Spark Push-Based Shuffle causing multiple stage failures Han Altae-Tran
- Re: Spark Push-Based Shuffle causing multiple stage failures Ye Zhou
how to add a column for percent wilson
- Re: how to add a column for percent Raghavendra Ganesh
Problem with implementing the Datasource V2 API for Salesforce Rohit Pant
- Re: Problem with implementing the Datasource V2 API for Salesforce Gourav Sengupta
Final reminder: ApacheCon North America call for presentations closing soon Rich Bowen
[SQL] Why does a small two-source JDBC query take ~150-200ms with all optimizations (AQE, CBO, pushdown, Kryo, unsafe) enabled? (v3.4.0-SNAPSHOT) Gavin Ray
Spark 3 migration question Jason Xu
What does Apache Spark do? Turritopsis Dohrnii Teo En Ming
- Re: What does Apache Spark do? Pasha Finkelshtein
Stopping streaming after the write commit and before the read commit? kineret M
A scene with unstable Spark performance Bowen Song