Re: Stream is corrupted in ShuffleBlockFetcherIterator

2019-08-27 Thread Darshan Pandya
You can also try setting "spark.io.compression.codec" to "snappy" to try a different compression codec. On Fri, Aug 16, 2019 at 10:14 AM Vadim Semenov wrote: > This is what you're looking for: > Handle large corrupt shuffle blocks: https://issues.apache.org/jira/browse/SPARK-26089 > So…
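A minimal sketch of that suggestion (the property name comes from the thread; snappy is one of the codecs Spark ships alongside the default lz4):

    import org.apache.spark.sql.SparkSession

    // Sketch: switch Spark's internal I/O compression (shuffle outputs,
    // spills, broadcasts) from the default lz4 to snappy.
    val spark = SparkSession.builder()
      .appName("codec-experiment")   // hypothetical app name
      .config("spark.io.compression.codec", "snappy")
      .getOrCreate()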

Re: Spark Structured Streaming not connecting to Kafka using kerberos

2017-10-16 Thread Darshan Pandya
…Sincerely, Darshan. On Mon, Oct 16, 2017 at 12:08 PM, Burak Yavuz <brk...@gmail.com> wrote: > Hi Darshan, > How are you creating your Kafka stream? Can you please share the options you provide? > spark.readStream.format("kafka") > .option(...) // all these pl…
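For context, Burak's elided pattern filled out as a minimal sketch (the topic name and broker address are placeholders, not values from the thread):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("kafka-demo").getOrCreate()

    // Minimal Structured Streaming Kafka source; every .option value here
    // is illustrative, not taken from Darshan's job.
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "some_topic")
      .load()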

Spark Structured Streaming not connecting to Kafka using kerberos

2017-10-14 Thread Darshan Pandya
Hello, I'm using Spark 2.1.0 on CDH 5.8 with Kafka 0.10.0.1 + Kerberos. I am unable to connect to the Kafka broker; it fails with the following message and cannot consume any messages: 17/10/14 14:29:10 WARN clients.NetworkClient: Bootstrap broker 10.197.19.25:9092 disconnected. And am using it as…
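For reference, a commonly cited way to attach Kerberos to the Kafka 0.10 source on Spark 2.x — a sketch under assumed defaults, not the poster's actual configuration — is to set the SASL options on the source and hand each JVM a JAAS file:

    // Any option prefixed with "kafka." is passed straight through to the
    // underlying Kafka consumer. The topic name and JAAS path are placeholders.
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "10.197.19.25:9092")
      .option("subscribe", "some_topic")
      .option("kafka.security.protocol", "SASL_PLAINTEXT")
      .option("kafka.sasl.kerberos.service.name", "kafka")
      .load()

    // The JAAS file must also reach the driver and executors at submit time,
    // e.g. --files kafka_client_jaas.conf plus
    // --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./kafka_client_jaas.conf"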

Any rabbit mq connect for spark structured streaming ?

2017-10-05 Thread Darshan Pandya
-- Sincerely, Darshan

Spark Yarn mode - unsupported exception

2017-08-17 Thread Darshan Pandya
Hello Users, I am running into a Spark issue: "Unsupported major.minor version 52.0". The code I am trying to run is https://github.com/cpitman/spark-drools-example/. It runs fine in Spark local mode but fails horribly with the above exception when the job is submitted in YARN mode.
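Class-file version 52.0 is Java 8, so the usual culprit is that the YARN containers launch an older JVM than the one that compiled the jars. A quick diagnostic sketch (assuming a live SparkSession named spark) that reports which Java version the executors actually run:

    // Run a tiny job whose only output is each executor's JVM version.
    val versions = spark.sparkContext
      .parallelize(1 to 8, 8)
      .map(_ => System.getProperty("java.version"))
      .collect()
      .distinct
    println(versions.mkString(", "))

If that prints 1.7.x, pointing the containers at a Java 8 install (the spark.executorEnv.JAVA_HOME and spark.yarn.appMasterEnv.JAVA_HOME properties at submit time) is the usual fix.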

Serialization error - sql UDF related

2017-02-17 Thread Darshan Pandya
Hello, I am getting the famous serialization exception when running some code like the below: val correctColNameUDF = udf(getNewColumnName(_: String, false: Boolean): String); val charReference: DataFrame = thinLong.select("char_name_id", "char_name").withColumn("columnNameInDimTable", …
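Assuming this is the usual Task-not-serializable failure, the common cause is that getNewColumnName is a method on a non-serializable enclosing class, so the udf closure captures the whole instance. A sketch of the standard workaround — the function body here is invented; only the structure matters:

    import org.apache.spark.sql.functions.udf

    // Hypothetical standalone holder: referencing a method on a top-level
    // object avoids capturing a non-serializable outer instance.
    object ColumnNames extends Serializable {
      def getNewColumnName(name: String, upper: Boolean): String =
        if (upper) name.toUpperCase else name.toLowerCase
    }

    val correctColNameUDF = udf((s: String) => ColumnNames.getNewColumnName(s, false))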

Re: pivot over non numerical data

2017-02-02 Thread Darshan Pandya
…nary > hint, you can "aggregate" text values using *max*: > df.groupBy("someCol") > .pivot("anotherCol") > .agg(max($"textCol")) > Thanks, Kevin > On Wed, Feb 1, 2017 at 2:02 PM, Darshan Pandya <darshanpan...@gmail.com>…
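Kevin's snippet, expanded into a self-contained example (the column names follow his reply; the sample rows are made up):

    import org.apache.spark.sql.functions.max
    import spark.implicits._   // assumes a SparkSession named spark

    val df = Seq(
      ("a", "color", "red"),
      ("a", "size",  "large"),
      ("b", "color", "blue")
    ).toDF("someCol", "anotherCol", "textCol")

    // max imposes an ordering on strings, so it is a legal aggregate for
    // text columns; each pivoted cell gets the lexicographically largest value.
    val wide = df.groupBy("someCol")
      .pivot("anotherCol")
      .agg(max($"textCol"))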

pivot over non numerical data

2017-02-01 Thread Darshan Pandya
Hello, I am trying to transpose some data using groupBy/pivot/agg as described in this blog post: https://databricks.com/blog/2016/02/09/reshaping-data-with-pivot-in-apache-spark.html. But this works only for numerical data. Any hints for doing the same thing for non-numerical data? -- Sincerely, Darshan