Which predicate pushdowns work or do not work with Parquet?

2017-11-06 Thread Manuel Vonthron
Hi all, I am trying to determine which predicate pushdowns work and which do not with Spark+Parquet (mostly for versions 2.1.0 and/or 2.2.0). I've read a lot of messages, pull request comments, JIRA tickets, and even the comments in Parquet's source, but it's hard to get a clear picture of…
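A quick way to see what actually gets pushed is to inspect the physical plan: predicates Spark hands to the Parquet reader show up in the scan node's PushedFilters list. A minimal sketch, assuming a hypothetical Parquet path and column names:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("pushdown-check").getOrCreate()
    import spark.implicits._

    // Hypothetical data set; the path and column names are assumptions.
    val df = spark.read.parquet("/tmp/events.parquet")

    // explain() prints the physical plan. Predicates pushed into the Parquet
    // reader appear under "PushedFilters: [...]" in the file scan node; anything
    // Spark could not push remains in a separate Filter node above the scan.
    df.filter($"id" > 100 && $"name".startsWith("a")).explain()

Note that appearing in PushedFilters only means the filter was offered to the source; whether Parquet can actually use it (e.g. via row-group statistics) depends on the column type and the Spark/Parquet versions involved, which is exactly what the question is trying to pin down.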

Re: spark-avro aliases incompatible

2017-11-06 Thread Gourav Sengupta
Hi, I may be wrong about this, but when you are using format("") you are basically using old Spark classes, which still exist for backward compatibility. Please refer to the following documentation to take advantage of the recent changes in Spark: …
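For context, a sketch of the two read styles being contrasted, using the Databricks spark-avro package (the paths are placeholders):

    import org.apache.spark.sql.SparkSession
    import com.databricks.spark.avro._

    val spark = SparkSession.builder().appName("avro-read").getOrCreate()

    // Generic DataSource API: the source is resolved by its qualified name.
    val viaFormat = spark.read.format("com.databricks.spark.avro").load("/tmp/data.avro")

    // Shorthand enabled by the spark-avro import above; both produce the same DataFrame.
    val viaShorthand = spark.read.avro("/tmp/data.avro")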

Re: Structured Streaming equivalent of reduceByKey

2017-11-06 Thread Michael Armbrust
Hmmm, I see. You could output the delta using flatMapGroupsWithState.
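A minimal sketch of that idea, with made-up types and a socket source standing in for the real stream: keep a running sum per key in state and emit the updated total whenever new rows arrive, a streaming analogue of reduceByKey(_ + _).

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

    case class Event(key: String, value: Long)
    case class Total(key: String, total: Long)

    val spark = SparkSession.builder().appName("delta-sums").getOrCreate()
    import spark.implicits._

    // Hypothetical input: "key,value" lines arriving on a socket.
    val events = spark.readStream
      .format("socket").option("host", "localhost").option("port", 9999).load()
      .as[String]
      .map { line => val Array(k, v) = line.split(","); Event(k, v.toLong) }

    // The state per key is the running sum; each trigger emits the new total.
    val totals = events
      .groupByKey(_.key)
      .flatMapGroupsWithState[Long, Total](OutputMode.Update(), GroupStateTimeout.NoTimeout()) {
        (key: String, rows: Iterator[Event], state: GroupState[Long]) =>
          val sum = state.getOption.getOrElse(0L) + rows.map(_.value).sum
          state.update(sum)
          Iterator(Total(key, sum))
      }

    totals.writeStream.outputMode("update").format("console").start().awaitTermination()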

A PySpark SQL query

2017-11-06 Thread paulgureghian
Are the min, max, and mean functions correct?
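For reference, a tiny sketch of those built-in aggregate functions (the data and column name are made up, since the original query isn't visible):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{min, max, mean}

    val spark = SparkSession.builder().appName("agg-demo").getOrCreate()
    import spark.implicits._

    // Made-up numbers; min/max/mean are standard built-in aggregates,
    // and mean is an alias of avg.
    val df = Seq(1.0, 2.0, 3.0, 4.0).toDF("price")
    df.agg(min("price"), max("price"), mean("price")).show()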

Re: spark-avro aliases incompatible

2017-11-06 Thread Gaspar Muñoz
Of course. Right now I'm trying it locally with Spark 2.2.0 and spark-avro 4.0.0. I've just uploaded a snippet: https://gist.github.com/gasparms/5d0740bd61a500357e0230756be963e1 Basically, my Avro schema has a field with an alias, and in the last part of the code spark-avro is not able to read old data…
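For readers unfamiliar with the feature: an Avro alias is a field-level "aliases" attribute in the reader schema that lets a new field name match data written under an old name. An illustrative (made-up) reader schema, embedded as a string:

    // Files written with "old_name" should still be readable as "new_name"
    // when the reader honors the alias; the thread reports spark-avro does not.
    val readerSchema = """
      {
        "type": "record",
        "name": "User",
        "fields": [
          { "name": "new_name", "type": "string", "aliases": ["old_name"] }
        ]
      }
    """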

PySpark driver memory limit

2017-11-06 Thread Nicolas Paris
Hi there. Can anyone clarify the driver memory aspects of PySpark? According to [1], spark.driver.memory limits JVM + Python memory. In the case spark.driver.memory=2G, does that mean the user won't be able to use more than 2G, whatever Python code and RDD work they are doing? Thanks. [1]: …
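Not a definitive answer, but for reference: spark.driver.memory sizes the driver JVM heap, and in client mode it must be set before the JVM launches, so it is normally passed via spark-submit --driver-memory or spark-defaults.conf rather than in application code. A small sketch for checking the effective value at runtime:

    import org.apache.spark.sql.SparkSession

    // Assumes the application was launched with something like:
    //   spark-submit --driver-memory 2g app.py
    // Setting spark.driver.memory inside the application has no effect in
    // client mode, because the driver JVM is already running at that point.
    val spark = SparkSession.builder().getOrCreate()
    println(spark.conf.get("spark.driver.memory", "not set"))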

Building Spark with Hive 1.1.0

2017-11-06 Thread HARSH TAKKAR
Hi, I am using the Cloudera (CDH 5.11.0) setup, which has Hive version 1.1.0, but when I build Spark with Hive and Thrift support it packages Hive version 1.6.0. Please let me know how I can build Spark with Hive 1.1.0. The command I am using to build: ./dev/make-distribution.sh --name …