Re: [Structured Streaming] Commit protocol to move temp files to dest path only when complete, with code

2018-03-20 Thread dcam
I'm just circling back to this now. Is the commit protocol an acceptable way of making this configurable? I could make the temp path (currently "_temporary") configurable if that is what you are referring to. Michael Armbrust wrote > We didn't go this way initially because it doesn't work on
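The commit pattern under discussion — write output to a temporary path and move it into the destination only once it is complete — can be sketched outside Spark. This is a minimal local-filesystem illustration, not the project's actual commit protocol; the function name and signature are hypothetical, and rename semantics differ on distributed and object stores.

```python
import os
import tempfile

def write_committed(dest_path: str, data: bytes, temp_dir: str = "_temporary") -> None:
    """Write to a temp file first, then move it to dest_path only on success.

    A reader listing the destination never observes a partial file, because
    os.replace is an atomic rename on a single local filesystem.
    """
    os.makedirs(temp_dir, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=temp_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())        # make sure bytes hit disk before commit
        os.replace(tmp, dest_path)      # the "commit": atomic rename into place
    except Exception:
        os.unlink(tmp)                  # abort: drop the temp file, dest untouched
        raise
```

On a crash before the rename, only `_temporary` contains debris and the destination directory stays clean, which is the property the thread is after.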

Re: org.apache.kafka.clients.consumer.OffsetOutOfRangeException

2018-02-13 Thread dcam
Hi Mina I believe this works differently for Structured Streaming from Kafka, specifically. I'm assuming you are using Structured Streaming based on the name of the dependency ("spark-streaming-kafka"). There is a note in the docs here:
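`OffsetOutOfRangeException` typically means Kafka's retention has already deleted the offsets the consumer asked for. A Kafka-free sketch of the recovery decision — clamping a requested offset to the retained range according to a reset policy, mirroring `auto.offset.reset` — with all names hypothetical:

```python
class OffsetOutOfRangeError(Exception):
    pass

def resolve_offset(requested: int, earliest: int, latest: int,
                   reset_policy: str = "earliest") -> int:
    """Decide where to resume when a requested offset is no longer retained.

    reset_policy mirrors Kafka's auto.offset.reset: 'earliest' skips the
    deleted gap (losing those records), 'latest' jumps to the head of the
    log, and anything else surfaces the error to the caller.
    """
    if earliest <= requested <= latest:
        return requested
    if reset_policy == "earliest":
        return earliest
    if reset_policy == "latest":
        return latest
    raise OffsetOutOfRangeError(
        f"offset {requested} outside retained range [{earliest}, {latest}]")
```

The point of the sketch: the exception is not corruption, it is the broker saying the data is gone, and the caller has to choose a resume point explicitly.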

Re: [Structured Streaming] Avoiding multiple streaming queries

2018-02-13 Thread dcam
Hi Priyank I have a similar structure, although I am reading from Kafka and sinking to multiple MySQL tables. My input stream has multiple message types and each is headed for a different MySQL table. I've looked for a solution for a few months, and have only come up with two alternatives: 1.
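One commonly discussed shape for this problem (not necessarily the alternatives the author enumerates) is to split the mixed-type stream so each output table only ever receives its own message type. A Spark-free sketch of that routing step, with hypothetical record and table names:

```python
from collections import defaultdict

def route_by_type(records, table_for_type):
    """Group mixed-type records into per-table batches.

    This mirrors the 'one filtered query per message type' approach:
    each destination table sees only records of its own type, and
    records with an unmapped type are dropped rather than failing
    the whole batch.
    """
    batches = defaultdict(list)
    for rec in records:
        table = table_for_type.get(rec["type"])
        if table is not None:
            batches[table].append(rec)
    return dict(batches)
```

In actual Structured Streaming this dispatch usually becomes one `filter` plus one sink per type, which is exactly why it multiplies into several streaming queries.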

Re: Spark streaming: java.lang.ClassCastException: org.apache.spark.util.SerializableConfiguration ... on restart from checkpoint

2017-08-08 Thread dcam
Considering the @transient annotations and the work done in the instance initializer, not much state is really being broadcast to the executors. It might be simpler to just create these instances on the executors, rather than trying to broadcast them? -- View this message in context:
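The suggestion — construct the non-serializable object on each executor instead of shipping it from the driver — is usually done with a lazy per-process singleton. A minimal sketch, with `ExpensiveHelper` standing in for whatever the checkpoint was failing to deserialize:

```python
class ExpensiveHelper:
    """Hypothetical stand-in for a heavyweight, non-serializable object
    (e.g. a client holding connections or Hadoop configuration)."""
    def __init__(self, config):
        self.config = config

_helper = None  # one instance per executor process, created on first use

def get_helper(config):
    """Build the helper lazily where the task runs, instead of broadcasting
    it from the driver. When called inside mapPartitions/foreachPartition,
    every task in the same executor process reuses the single instance."""
    global _helper
    if _helper is None:
        _helper = ExpensiveHelper(config)
    return _helper
```

Because nothing is serialized into the closure or the checkpoint, restart-from-checkpoint cast errors like the one in the subject line cannot arise from this object.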

[Spark Structured Streaming]: truncated Parquet after driver crash or kill

2017-08-08 Thread dcam
Hello list We have a Spark application that performs a set of ETLs: reading messages from a Kafka topic, categorizing them, and writing the contents out as Parquet files on HDFS. After writing, we are querying the data from HDFS using Presto's Hive integration. We are having problems because the
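A file truncated by a driver crash is detectable before a query engine trips over it: a well-formed Parquet file both starts and ends with the 4-byte magic `PAR1`, with the footer length in the 4 bytes before the trailing magic. A hedged sketch of the check a cleanup or validation job might run (heuristic only, not a full footer parse):

```python
PARQUET_MAGIC = b"PAR1"

def looks_complete(path: str) -> bool:
    """Heuristic: was this Parquet file fully written?

    A valid file begins and ends with the magic bytes PAR1; a file cut
    short by a crash or kill typically fails the trailing check because
    the footer was never written.
    """
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(0, 2)
        size = f.tell()
        if size < 12 or head != PARQUET_MAGIC:   # magic + footer len + magic
            return False
        f.seek(-4, 2)
        return f.read(4) == PARQUET_MAGIC
```

Quarantining files that fail this check (or writing via a temp-then-rename commit so partial files never land in the queried path) keeps Presto from scanning truncated output.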