Re: intermittent Kryo serialization failures in Spark

2019-09-17 Thread Vadim Semenov
Pre-register your classes:

```scala
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

class MyKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(Class.forName("[[B")) // byte[][]
  }
}
```
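A minimal sketch of wiring the registrator into the application, assuming the class above is compiled into the job's jar; the package name com.example is a placeholder:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("kryo-preregistration")
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .config("spark.kryo.registrator", "com.example.MyKryoRegistrator")
  // Optional: fail fast on any class that was not pre-registered,
  // which makes otherwise intermittent serialization errors reproducible.
  .config("spark.kryo.registrationRequired", "true")
  .getOrCreate()
```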

Re: custom rdd - do I need a hadoop input format?

2019-09-17 Thread Arun Mahadevan
You can do it with a custom RDD implementation. You will mainly implement "getPartitions" (the logic to split your input into partitions) and "compute" (to compute and return the values from the executors). On Tue, 17 Sep 2019 at 08:47, Marcelo Valle wrote: > Just to be more clear about my
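A minimal sketch of that approach, assuming the set of blocks can be enumerated up front on the driver; BlockPartition and readBlock are hypothetical stand-ins for the existing block-reading library:

```scala
import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical partition: one partition (and therefore one task) per block.
case class BlockPartition(index: Int, path: String, startLine: Long, numLines: Long)
  extends Partition

class BlockRDD(sc: SparkContext, blocks: Seq[BlockPartition])
  extends RDD[String](sc, Nil) {

  // Driver side: how the input is split into partitions.
  override protected def getPartitions: Array[Partition] = blocks.toArray

  // Executor side: read just this block and return its records.
  override def compute(split: Partition, context: TaskContext): Iterator[String] = {
    val p = split.asInstanceOf[BlockPartition]
    readBlock(p.path, p.startLine, p.numLines)
  }

  // Placeholder for the existing InputStream -> Iterator[Block] reader.
  private def readBlock(path: String, start: Long, n: Long): Iterator[String] = ???
}
```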

Re: custom rdd - do I need a hadoop input format?

2019-09-17 Thread Marcelo Valle
Just to be more clear about my requirements: what I have is actually a custom format, with a header, a summary, and multi-line blocks. I want to create tasks per block, not per line. I already have a library that reads an InputStream and outputs an Iterator of Block, but now I need to integrate this

custom rdd - do I need a hadoop input format?

2019-09-17 Thread Marcelo Valle
Hi, I want to create a custom RDD which will read n lines in sequence from a file (which I call a block), and each block should be converted to a Spark dataframe to be processed in parallel. Question - do I have to implement a custom Hadoop input format to achieve this? Or is it possible to do it

Can I set the Alluxio WriteType in Spark applications?

2019-09-17 Thread Mark Zhao
Hi, If a Spark application writes data into Alluxio, can the WriteType be configured? Thanks, Mark
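A hedged sketch of one way this is commonly done, assuming the Alluxio client jar is already on the Spark classpath: the Alluxio client's default write behavior is governed by the alluxio.user.file.writetype.default client property (values such as MUST_CACHE, CACHE_THROUGH, THROUGH), which can be passed to the driver and executor JVMs as a system property. Exact property names and accepted values depend on the Alluxio version in use.

```bash
# Assumption: alluxio.user.file.writetype.default is the relevant client property
# for your Alluxio version, and the Alluxio client jar is on the Spark classpath.
spark-submit \
  --conf "spark.driver.extraJavaOptions=-Dalluxio.user.file.writetype.default=CACHE_THROUGH" \
  --conf "spark.executor.extraJavaOptions=-Dalluxio.user.file.writetype.default=CACHE_THROUGH" \
  my-app.jar
```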

Re: intermittent Kryo serialization failures in Spark

2019-09-17 Thread Jerry Vinokurov
Hi folks, Posted this some time ago but the problem continues to bedevil us. I'm including a (slightly edited) stack trace that results from this error. If anyone can shed any light on what exactly is happening here and what we can do to avoid it, that would be much appreciated.

How to Integrate Spark mllib Streaming Training Models To Spark Structured Streaming

2019-09-17 Thread Praful Rana
Spark MLlib's streaming training models work with DStreams. So is there any way to use them with Spark Structured Streaming?

How to integrate MLeap with Spark Structured Streaming

2019-09-17 Thread Praful Rana
So I am trying to integrate MLeap with Spark Structured Streaming, but I am facing a problem: Spark Structured Streaming with Kafka works with DataFrames, while MLeap requires a LeapFrame. So I tried to convert the DataFrame to a LeapFrame using the MLeap Spark support library function

Re: Re: how can I dynamic parse json in kafka when using Structured Streaming

2019-09-17 Thread lk_spark
I want to parse the struct of the data dynamically, then write the data to Delta Lake; I think it can automatically merge the schema. 2019-09-17 lk_spark From: Tathagata Das Sent: 2019-09-17 16:13 Subject: Re: how can I dynamic parse json in kafka when using Structured Streaming To: "lk_spark"
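A minimal sketch of the Delta Lake half of that idea, assuming a parsed streaming DataFrame named parsedDf and placeholder paths; note that schema merging on write is opt-in via the mergeSchema option rather than fully automatic:

```scala
// parsedDf: the streaming DataFrame produced by from_json (see the next message).
val query = parsedDf.writeStream
  .format("delta")
  .option("checkpointLocation", "/tmp/checkpoints/events") // placeholder path
  .option("mergeSchema", "true") // allow new columns to be merged into the table schema
  .start("/tmp/delta/events")    // placeholder table path
```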

Re: how can I dynamic parse json in kafka when using Structured Streaming

2019-09-17 Thread Tathagata Das
You can use the built-in *from_json* SQL function to parse JSON. https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/functions.html#from_json-org.apache.spark.sql.Column-org.apache.spark.sql.Column- On Mon, Sep 16, 2019 at 7:39 PM lk_spark wrote: > hi,all : > I'm using Structured
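A minimal sketch of that suggestion against a Kafka source; the broker address, topic name, and schema fields are placeholders, since from_json needs the schema declared up front:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{StringType, StructType}

val spark = SparkSession.builder().appName("parse-kafka-json").getOrCreate()

// Placeholder schema: declare (or pre-infer and reuse) the fields you need.
val schema = new StructType()
  .add("id", StringType)
  .add("name", StringType)

val parsedDf = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // placeholder brokers
  .option("subscribe", "events")                       // placeholder topic
  .load()
  .select(from_json(col("value").cast("string"), schema).as("data"))
  .select("data.*")
```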