Re: RV: Unintelligible warning arose out of the blue.

2018-05-04 Thread Marco Mistroni
Hi, I think it has to do with the Spark configuration; I don't think the standard configuration is geared up to run in local mode on Windows. Your DataFrame is fine, though: you can confirm it was read successfully by printing df.count(), and you will see that your code is reading the DataFrame.
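
A minimal sketch of the suggested sanity check, assuming a pyspark session in local mode (the app name and input path are hypothetical placeholders):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").appName("check").getOrCreate()
    df = spark.read.csv("data.csv", header=True)  # hypothetical input file
    print(df.count())  # a successful count confirms the DataFrame was read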

RV: Unintelligible warning arose out of the blue.

2018-05-04 Thread Tomas Zubiri
From: Tomas Zubiri Sent: Friday, May 4, 2018, 04:23 PM To: user@spark.apache.org Subject: Unintelligible warning arose out of the blue. My setup is as follows: Windows 10, Python 3.6.5, Spark 2.3.0, the latest Java JDK, winutils/hadoop installed from
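
For context, a common way to wire winutils into a Windows setup like this is to point HADOOP_HOME at the directory containing bin\winutils.exe before the session starts; a sketch under that assumption (the path is a hypothetical placeholder):

    import os

    # Hypothetical winutils location; HADOOP_HOME must contain bin\winutils.exe.
    os.environ["HADOOP_HOME"] = r"C:\hadoop"
    os.environ["PATH"] += r";C:\hadoop\bin"

    from pyspark.sql import SparkSession
    spark = SparkSession.builder.master("local[*]").getOrCreate()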

Re: [pyspark] Read multiple files parallely into a single dataframe

2018-05-04 Thread Irving Duran
I could be wrong, but I think you can use a wildcard: df = spark.read.format('csv').load('/path/to/file*.csv.gz') Thank You, Irving Duran On Fri, May 4, 2018 at 4:38 AM Shuporno Choudhury < shuporno.choudh...@gmail.com> wrote: > Hi, > > I want to read multiple files in parallel into 1
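
Expanded into a runnable sketch (the path is a hypothetical placeholder; glob patterns are resolved by the underlying Hadoop filesystem, so all matching files land in one DataFrame):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    # The wildcard matches every gzipped CSV with that prefix in one read.
    df = spark.read.format('csv').load('/path/to/file*.csv.gz')
    print(df.count())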

Re: Free Column Reference with $

2018-05-04 Thread Vadim Semenov
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/SQLImplicits.scala#L38-L47 It's called string interpolation; see "Advanced Usage" here: https://docs.scala-lang.org/overviews/core/string-interpolation.html On Fri, May 4, 2018 at 10:10 AM, Christopher Piggott

Free Column Reference with $

2018-05-04 Thread Christopher Piggott
How does $"something" actually work (from a scala perspective) as a free column reference?

Re: AccumulatorV2 vs AccumulableParam (V1)

2018-05-04 Thread Sergey Zhemzhitsky
Hi Wenchen, Thanks a lot for the clarification and help. Here is what I mean regarding the remaining points. For 2: should we update the documentation [1] on custom accumulators to be clearer and to highlight that a) custom accumulators should always override the "copy" method to prevent
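
The copy/reset discussion applies to the JVM-side AccumulatorV2; for comparison, pyspark still exposes the V1-style AccumulatorParam, where a custom accumulator looks roughly like this (the list-merging example is hypothetical):

    from pyspark import SparkContext
    from pyspark.accumulators import AccumulatorParam

    class ListParam(AccumulatorParam):
        # V1-style contract: zero() builds an identity value and
        # addInPlace() merges two partial values.
        def zero(self, value):
            return []
        def addInPlace(self, acc1, acc2):
            acc1.extend(acc2)
            return acc1

    sc = SparkContext.getOrCreate()
    acc = sc.accumulator([], ListParam())
    sc.parallelize([1, 2, 3]).foreach(lambda x: acc.add([x]))
    print(acc.value)  # e.g. [1, 2, 3]; ordering is not guaranteed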

[pyspark] Read multiple files parallely into a single dataframe

2018-05-04 Thread Shuporno Choudhury
Hi, I want to read multiple files in parallel into 1 dataframe. But the files have random names and do not conform to any pattern (so I can't use a wildcard). Also, the files can be in different directories. If I provide the file names in a list to the dataframe reader, it reads them sequentially.
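
For reference, the DataFrameReader's load() also accepts a list of paths in a single call, which reads them into one DataFrame in one job rather than one read per file; a sketch with hypothetical paths:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    # Hypothetical files with unrelated names in different directories.
    paths = ['/dir1/abc123.csv.gz', '/dir2/xyz789.csv.gz']
    df = spark.read.format('csv').load(paths)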

Re: Pickling Keras models for use in UDFs

2018-05-04 Thread Khaled Zaouk
Why don't you try to encapsulate your Keras model within a wrapper class (an estimator, let's say) and implement inside this wrapper class the two functions __getstate__ and __setstate__? On Thu, May 3, 2018 at 5:27 PM erp12 wrote: > I would like to create a Spark
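
A rough sketch of such a wrapper, assuming a Keras model that can be round-tripped through HDF5 via a temp file (the class name is hypothetical, and tensorflow.keras can be swapped for standalone keras):

    import os
    import tempfile
    from tensorflow import keras  # assumed Keras distribution

    class PicklableKerasModel:
        """Hypothetical wrapper that makes a Keras model picklable."""
        def __init__(self, model):
            self.model = model

        def __getstate__(self):
            # Serialize the model to HDF5 bytes through a temp file.
            fd, path = tempfile.mkstemp(suffix='.h5')
            os.close(fd)
            try:
                self.model.save(path)
                with open(path, 'rb') as f:
                    return {'model_bytes': f.read()}
            finally:
                os.remove(path)

        def __setstate__(self, state):
            # Rebuild the model from the serialized bytes.
            fd, path = tempfile.mkstemp(suffix='.h5')
            os.close(fd)
            try:
                with open(path, 'wb') as f:
                    f.write(state['model_bytes'])
                self.model = keras.models.load_model(path)
            finally:
                os.remove(path)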

I cannot use spark 2.3.0 and kafka 0.9?

2018-05-04 Thread kant kodali
Hi All, this link seems to suggest I can't use Spark 2.3.0 with a Kafka 0.9 broker. Is that correct? https://spark.apache.org/docs/latest/streaming-kafka-integration.html Thanks!
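
If I read that guide correctly, the deprecated spark-streaming-kafka-0-8 integration supports brokers 0.8.2.1 or higher (so it should reach a 0.9 broker), while the 0-10 integration and the Structured Streaming Kafka source require 0.10.0 or higher. A pyspark sketch of the 0-8 direct stream, with a hypothetical broker address and topic:

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils  # 0-8 integration (deprecated)

    sc = SparkContext.getOrCreate()
    ssc = StreamingContext(sc, batchDuration=5)
    stream = KafkaUtils.createDirectStream(
        ssc, ["my-topic"], {"metadata.broker.list": "broker09:9092"})
    stream.pprint()
    ssc.start()
    ssc.awaitTermination()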

SparkContext taking time after adding jars and asking yarn for resources

2018-05-04 Thread neeravsalaria
In my production setup Spark always takes 40 seconds between these steps, as if a fixed timer were set. In my local lab the same steps take exactly 1 second. I am not able to find the root cause of this behaviour. My Spark application is running on the Hortonworks platform in YARN client mode. Can

Re: question on collect_list or say aggregations in general in structured streaming 2.3.0

2018-05-04 Thread kant kodali
1) I get an error when I set the watermark to 0. 2) I set the window and slide interval to 1 second with no watermark. It still aggregates messages from the previous batch that fall in the 1-second window. So is it fair to say there is no declarative way to do stateless aggregations? On Thu, May 3, 2018 at
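
For reference, the kind of declarative windowed aggregation under discussion looks roughly like this in pyspark Structured Streaming (the rate source and column names are hypothetical stand-ins):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import window, collect_list

    spark = SparkSession.builder.getOrCreate()
    # Hypothetical streaming source with an event-time column "ts".
    events = (spark.readStream.format("rate").load()
              .withColumnRenamed("timestamp", "ts"))

    agg = (events
           .withWatermark("ts", "1 second")                # late-data cutoff
           .groupBy(window("ts", "1 second", "1 second"))  # 1-second tumbling window
           .agg(collect_list("value")))

    query = (agg.writeStream.outputMode("update")
             .format("console").start())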