Re: [Spark SQL] Does Spark group small files

2018-11-13 Thread Silvio Fiorito
Yes, it does bin-packing for small files, which is a good thing so you avoid having many small partitions, especially if you're writing this data back out (i.e. it's compacting as you read). The default partition size is 128MB, with a 4MB "cost" for opening files. You can configure this using the
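The reply is cut off in the archive, but the 128MB and 4MB defaults it mentions correspond to the Spark SQL settings `spark.sql.files.maxPartitionBytes` and `spark.sql.files.openCostInBytes` (both present in Spark 2.x). A minimal sketch of overriding them at session creation (app name and values are illustrative):

```scala
import org.apache.spark.sql.SparkSession

// Raise the target partition size and the per-file open cost used when
// Spark packs small files into read partitions. Values are in bytes.
val spark = SparkSession.builder()
  .appName("small-file-packing")                             // illustrative name
  .config("spark.sql.files.maxPartitionBytes", 134217728L)   // 128 MB (the default)
  .config("spark.sql.files.openCostInBytes", 4194304L)       // 4 MB (the default)
  .getOrCreate()
```

The same keys can also be set per-session with `spark.conf.set(...)` before reading.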

[ANNOUNCE] Apache Toree 0.3.0-incubating Released

2018-11-13 Thread Luciano Resende
Apache Toree is a kernel for the Jupyter Notebook platform providing interactive and remote access to Apache Spark. The Apache Toree community is pleased to announce the release of Apache Toree 0.3.0-incubating which provides various bug fixes and the following enhancements. * Fix JupyterLab

[ANNOUNCE] Apache Bahir 2.2.2 Released

2018-11-13 Thread Luciano Resende
Apache Bahir provides extensions to multiple distributed analytic platforms, extending their reach with a diversity of streaming connectors and SQL data sources. The Apache Bahir community is pleased to announce the release of Apache Bahir 2.2.2 which provides the following extensions for Apache

[ANNOUNCE] Apache Bahir 2.1.3 Released

2018-11-13 Thread Luciano Resende
Apache Bahir provides extensions to multiple distributed analytic platforms, extending their reach with a diversity of streaming connectors and SQL data sources. The Apache Bahir community is pleased to announce the release of Apache Bahir 2.1.3 which provides the following extensions for Apache

inferred schemas for spark streaming from a Kafka source

2018-11-13 Thread Colin Williams
Does anybody know how to use inferred schemas with structured streaming: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#schema-inference-and-partition-of-streaming-dataframesdatasets I have some code like: object StreamingApp { def launch(config: Config,
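For a Kafka source specifically, the `value` column arrives as binary, so the streaming schema inference the linked doc describes (a file-source feature) does not apply. A common workaround, sketched below under the assumption that the payloads are JSON, is to infer the schema once from a static sample and apply it with `from_json`; the sample path, broker address, and topic name here are all hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json

val spark = SparkSession.builder().appName("kafka-inferred-schema").getOrCreate()
import spark.implicits._

// Infer the schema from a static batch read of the same JSON payloads.
// (hypothetical sample location)
val sampleSchema = spark.read.json("s3://my-bucket/sample-events/").schema

// Apply the inferred schema to the streaming Kafka value column.
val parsed = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")  // hypothetical broker
  .option("subscribe", "events")                     // hypothetical topic
  .load()
  .select(from_json($"value".cast("string"), sampleSchema).as("event"))
  .select("event.*")
```

The trade-off is that the schema is fixed at job start; if the payloads evolve, the job must be restarted with a fresh sample.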

[Spark SQL] Does Spark group small files

2018-11-13 Thread Yann Moisan
Hello, I'm using Spark 2.3.1. I have a job that reads 5,000 small Parquet files from S3. When I do a mapPartitions followed by a collect, only *278* tasks are used (I would have expected 5000). Does Spark group small files? If yes, what is the threshold for grouping? Is it configurable? Any
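The task count can be estimated without a cluster. Below is a sketch of the greedy bin-packing Spark 2.3 applies to file sources: each partition is filled up to a split size derived from `spark.sql.files.maxPartitionBytes`, `spark.sql.files.openCostInBytes`, and the default parallelism. `estimatePartitions` is a hypothetical helper for illustration, not a Spark API, and the 5,000 one-megabyte files and parallelism of 200 are made-up inputs:

```scala
// Estimate how many read partitions Spark's file bin-packing produces.
def estimatePartitions(fileSizes: Seq[Long],
                       maxPartitionBytes: Long,
                       openCostInBytes: Long,
                       defaultParallelism: Int): Int = {
  // Each file is charged its length plus a fixed "open cost".
  val totalBytes = fileSizes.map(_ + openCostInBytes).sum
  val bytesPerCore = totalBytes / defaultParallelism
  // The effective split size is capped by maxPartitionBytes but never
  // smaller than the open cost.
  val maxSplitBytes =
    math.min(maxPartitionBytes, math.max(openCostInBytes, bytesPerCore))

  var partitions = 0
  var currentSize = 0L
  var filesInPartition = 0
  for (size <- fileSizes) {
    // Close the current partition when the next file would overflow it.
    if (currentSize + size > maxSplitBytes && filesInPartition > 0) {
      partitions += 1; currentSize = 0L; filesInPartition = 0
    }
    currentSize += size + openCostInBytes
    filesInPartition += 1
  }
  if (filesInPartition > 0) partitions += 1
  partitions
}

val mb = 1024L * 1024L
// 5,000 one-megabyte files, default 128MB/4MB settings, parallelism 200:
val n = estimatePartitions(Seq.fill(5000)(1 * mb), 128 * mb, 4 * mb, 200)
println(n) // 200: each partition packs 25 files (25 × (1MB + 4MB) = 125MB)
```

With larger real file sizes the count drops further, which is consistent with 5,000 files collapsing to a few hundred tasks.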

Failed to convert java.sql.Date to String

2018-11-13 Thread luby
Hi, All, I'm new to Spark SQL and have just started to use it in our project. We are using Spark 2. When importing data from a Hive table, I got the following error: if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else staticinvoke(class