Re: Issue with file names writeStream in Structured Streaming

2019-02-27 Thread Gourav Sengupta
Should that not cause more problems? Regards, Gourav Sengupta On Wed, Feb 27, 2019 at 7:36 PM SRK wrote: > > Hi, > > We are using something like the following to write data to files in > Structured Streaming and we seem to get file names as part* as mentioned in > >

dummy coding in sparklyr

2019-02-27 Thread ya
Dear list, I am trying to run some regression models on a big data set using sparklyr. Some of the explanatory variables (Xs) in my model are categorical, so they have to be converted into dummy codes before the analysis. I understand that in Spark columns need to be treated as string

Issue with file names writeStream in Structured Streaming

2019-02-27 Thread SRK
Hi, We are using something like the following to write data to files in Structured Streaming and we seem to get file names as part* as mentioned in https://stackoverflow.com/questions/51056764/how-to-define-a-spark-structured-streaming-file-sink-file-path-or-file-name. How to get file names

Re: to_avro and from_avro not working with struct type in spark 2.4

2019-02-27 Thread Hien Luu
Thanks for looking into this. Does this mean string fields should always be nullable? You are right that the result is not yet correct and further digging is needed :( On Wed, Feb 27, 2019 at 1:19 AM Gabor Somogyi wrote: > Hi, > > I was dealing with avro stuff lately and most of the time it

Hadoop free spark on kubernetes => NoClassDefFound

2019-02-27 Thread Sommer Tobias
Hi, we are having problems using a custom Hadoop lib in a Spark image when running it on a Kubernetes cluster while following the steps in the documentation. Details are in the description below. Has anyone else had similar problems? Is there something missing in the setup below? Or is

Spark 2.4.0 Master going down

2019-02-27 Thread lokeshkumar
Hi All, we are running Spark version 2.4.0 and we run a few Spark streaming jobs listening on Kafka topics. We receive an average of 10-20 msgs per second, and the Spark master has been going down after 1-2 hours of running. The exception is given below: Along with that, Spark executors also get

Re: Spark Streaming - Problem to manage offset Kafka and starts from the beginning.

2019-02-27 Thread Gabor Somogyi
I mixed it up with the Spark version. Seems like the issue is different based on Guillermo's last mail. On Wed, Feb 27, 2019 at 1:16 PM Akshay Bhardwaj < akshay.bhardwaj1...@gmail.com> wrote: > Hi Gabor, > > I guess you are looking at Kafka 2.1 but Guillermo mentioned initially > that they are working with

Re: Spark Streaming - Problem to manage offset Kafka and starts from the beginning.

2019-02-27 Thread Gabor Somogyi
Where exactly? In Kafka broker configuration section here it's 10080: https://kafka.apache.org/documentation/ offsets.retention.minutes After a consumer group loses all its consumers (i.e. becomes empty) its offsets will be kept for this retention period before getting discarded. For standalone
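The retention rule quoted above can be sketched as a small check. This is only an illustration under simplifying assumptions: the function name `offsets_expired` is hypothetical, the two retention values are simply the ones quoted by posters in this thread, and the real broker removes expired offsets during a periodic cleanup rather than at the exact minute.

```python
from datetime import timedelta

# Retention values quoted in this thread, in minutes.
RETENTION_FROM_KAFKA_DOCS = 10080  # figure read from the Kafka documentation page
RETENTION_QUOTED_AS_1440 = 1440    # figure quoted elsewhere in the thread


def offsets_expired(downtime: timedelta, retention_minutes: int) -> bool:
    """Return True if a consumer group that stayed empty for `downtime`
    would be past the retention period (simplified model, hypothetical helper)."""
    return downtime > timedelta(minutes=retention_minutes)


# A job that was stopped for two days before being restarted.
downtime = timedelta(days=2)
print(offsets_expired(downtime, RETENTION_QUOTED_AS_1440))   # past a 24-hour retention
print(offsets_expired(downtime, RETENTION_FROM_KAFKA_DOCS))  # still within a 7-day retention
```

Under this model, whether a restarted job resumes from its last committed offset or falls back to its reset policy depends on which retention value the broker is actually running with, so checking the broker configuration directly is the reliable way to settle the question.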

Re: Spark Streaming - Problem to manage offset Kafka and starts from the beginning.

2019-02-27 Thread Akshay Bhardwaj
Hi Gabor, I guess you are looking at Kafka 2.1 but Guillermo mentioned initially that they are working with Kafka 1.0 Akshay Bhardwaj +91-97111-33849 On Wed, Feb 27, 2019 at 5:41 PM Gabor Somogyi wrote: > Where exactly? In Kafka broker configuration section here it's 10080: >

Re: Spark Streaming - Problem to manage offset Kafka and starts from the beginning.

2019-02-27 Thread Guillermo Ortiz
I'm going to check the value, but I didn't change it. Normally the process is always running, but sometimes I have to restart it to apply some changes. Sometimes it starts from the beginning and other times it continues from the last offset. On Wed, Feb 27, 2019 at 12:25, Akshay Bhardwaj (<

Spark on k8s - map persistentStorage for data spilling

2019-02-27 Thread Tomasz Krol
Hey Guys, I hope someone will be able to help me, as I've been stuck with this for a while :) Basically I am running some jobs on Kubernetes as per the documentation https://spark.apache.org/docs/latest/running-on-kubernetes.html All works fine, however if I run queries on a bigger data volume, then jobs

Re: Spark Streaming - Problem to manage offset Kafka and starts from the beginning.

2019-02-27 Thread Akshay Bhardwaj
Hi Gabor, I am talking about offsets.retention.minutes, which defaults to 1440 (24 hours). Akshay Bhardwaj +91-97111-33849 On Wed, Feb 27, 2019 at 4:47 PM Gabor Somogyi wrote: > Hi Akshay, > > The feature you've mentioned has a default value of 7 days... > > BR, > G > > > On Wed,

Re: Spark Streaming - Problem to manage offset Kafka and starts from the beginning.

2019-02-27 Thread Gabor Somogyi
Hi Akshay, The feature you've mentioned has a default value of 7 days... BR, G On Wed, Feb 27, 2019 at 7:38 AM Akshay Bhardwaj < akshay.bhardwaj1...@gmail.com> wrote: > Hi Guillermo, > > What was the interval in between restarting the spark job? As a feature in > Kafka, a broker deleted

Re: to_avro and from_avro not working with struct type in spark 2.4

2019-02-27 Thread Gabor Somogyi
Hi, I was dealing with avro stuff lately and most of the time it has something to do with the schema. One thing I've pinpointed quickly (where I was also struggling) is that the name field should be nullable, but the result is not yet correct, so further digging is needed... scala> val expectedSchema =

Faster Spark ML training using accelerators

2019-02-27 Thread inaccel
If you want to run your Spark ML applications much faster (up to 14x faster), you can now use the Accelerated ML suite from inaccel on AWS. The Accelerated ML suite lets you offload the most computationally intensive part of the ML tasks to FPGA hardware accelerators

How to start two Workers connected to two different masters

2019-02-27 Thread onmstester onmstester
I have two Java applications sharing the same Spark cluster; the applications should be running on different servers. Based on my experience, if the Spark driver (inside the Java application) connects remotely to the Spark master (which is running on a different node), then the response time to submit a