Re: Spark -- Writing to Partitioned Persistent Table

2015-10-28 Thread Susan Zhang
Have you tried partitionBy? Something like hiveWindowsEvents.foreachRDD( rdd => { val eventsDataFrame = rdd.toDF() eventsDataFrame.write.mode(SaveMode.Append).partitionBy(" windows_event_time_bin").saveAsTable("windows_event") }) On Wed, Oct 28, 2015 at 7:41 AM, Bryan Jeffrey

Re: Spark Streaming Checkpointing Restarts with 0 Event Batches

2015-08-26 Thread Susan Zhang
directKStream.checkpoint(checkpointDuration) Just calling checkpoint on the streaming context should be sufficient to save the metadata On Tue, Aug 25, 2015 at 2:36 PM, Susan Zhang suchenz...@gmail.com wrote: Sure thing! The main looks like

Re: Spark Streaming Checkpointing Restarts with 0 Event Batches

2015-08-26 Thread Susan Zhang
affect anything? On Wed, Aug 26, 2015 at 1:04 PM, Susan Zhang suchenz...@gmail.com wrote: Thanks for the suggestions! I tried the following: I removed createOnError = true And reran the same process to reproduce. Double checked that checkpoint is loading: 15/08/26 10:10:40 INFO

Re: Spark Streaming Checkpointing Restarts with 0 Event Batches

2015-08-26 Thread Susan Zhang
lines you posted indicate that the checkpoint was restored and those offsets were processed; what are the log lines for the following KafkaRDD ? On Wed, Aug 26, 2015 at 2:04 PM, Susan Zhang suchenz...@gmail.com wrote: Compared offsets, and it continues from checkpoint loading: 15/08/26 11:24:54

Re: Spark Streaming Checkpointing Restarts with 0 Event Batches

2015-08-25 Thread Susan Zhang
Yeah. All messages are lost while the streaming job was down. On Tue, Aug 25, 2015 at 11:37 AM, Cody Koeninger c...@koeninger.org wrote: Are you actually losing messages then? On Tue, Aug 25, 2015 at 1:15 PM, Susan Zhang suchenz...@gmail.com wrote: No; first batch only contains messages

Re: Spark Streaming Checkpointing Restarts with 0 Event Batches

2015-08-25 Thread Susan Zhang
No; first batch only contains messages received after the second job starts (messages come in at a steady rate of about 400/second). On Tue, Aug 25, 2015 at 11:07 AM, Cody Koeninger c...@koeninger.org wrote: Does the first batch after restart contain all the messages received while the job was

Re: Spark Streaming Checkpointing Restarts with 0 Event Batches

2015-08-25 Thread Susan Zhang
that reproduces the issue? On Tue, Aug 25, 2015 at 1:40 PM, Susan Zhang suchenz...@gmail.com wrote: Yeah. All messages are lost while the streaming job was down. On Tue, Aug 25, 2015 at 11:37 AM, Cody Koeninger c...@koeninger.org wrote: Are you actually losing messages then? On Tue, Aug