Have you tried partitionBy?
Something like:

  hiveWindowsEvents.foreachRDD { rdd =>
    // Assumes import org.apache.spark.sql.SaveMode and the SQLContext
    // implicits (import sqlContext.implicits._) are in scope for toDF()
    val eventsDataFrame = rdd.toDF()
    eventsDataFrame.write
      .mode(SaveMode.Append)
      .partitionBy("windows_event_time_bin")
      .saveAsTable("windows_event")
  }
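If it helps: partitionBy("windows_event_time_bin") should lay the table out
as one subdirectory per distinct bin value (windows_event_time_bin=<value>),
so each append only adds files under the matching partitions.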
On Wed, Oct 28, 2015 at 7:41 AM, Bryan Jeffrey wrote:
directKStream.checkpoint(checkpointDuration)
Just calling checkpoint on the streaming context should be sufficient to
save the metadata.
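For reference, a minimal sketch of that setup against the Spark 1.x direct
Kafka API (checkpointDir, kafkaParams, topics, and checkpointDuration are
hypothetical placeholders, not values from this thread):

  import kafka.serializer.StringDecoder
  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}
  import org.apache.spark.streaming.kafka.KafkaUtils

  val checkpointDir = "hdfs:///tmp/checkpoint"                      // hypothetical path
  val kafkaParams = Map("metadata.broker.list" -> "localhost:9092") // hypothetical broker
  val topics = Set("windows_event")                                 // hypothetical topic
  val checkpointDuration = Seconds(50)                              // hypothetical interval

  def createContext(): StreamingContext = {
    val ssc = new StreamingContext(new SparkConf(), Seconds(10))
    // Turns on metadata checkpointing; for the direct stream this is what
    // persists the consumed Kafka offsets across restarts
    ssc.checkpoint(checkpointDir)
    val directKStream = KafkaUtils.createDirectStream[
      String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics)
    // Optionally also checkpoint the stream's RDDs at a fixed interval
    directKStream.checkpoint(checkpointDuration)
    // ... transformations and output operations go here ...
    ssc
  }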
On Tue, Aug 25, 2015 at 2:36 PM, Susan Zhang suchenz...@gmail.com wrote:
Sure thing!
The main looks like:
affect anything?
On Wed, Aug 26, 2015 at 1:04 PM, Susan Zhang suchenz...@gmail.com wrote:
Thanks for the suggestions! I tried the following: I removed

  createOnError = true

and reran the same process to reproduce. Double-checked that the checkpoint
is loading:
15/08/26 10:10:40 INFO
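For context, createOnError is the fourth parameter of
StreamingContext.getOrCreate in Spark 1.x. A sketch of how it wires
together, reusing the hypothetical createContext from the sketch above:

  import org.apache.spark.streaming.StreamingContext

  // With createOnError = true, getOrCreate silently builds a fresh context
  // (and so loses the saved offsets) when the checkpoint fails to load;
  // with the default false, it throws instead, making checkpoint problems
  // visible on restart
  val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _,
    createOnError = false)
  ssc.start()
  ssc.awaitTermination()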
The log lines you posted indicate that the checkpoint was restored and
those offsets were processed; what are the log lines for the following
KafkaRDD?
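One way to surface those offsets explicitly, as a sketch against the Spark
1.x Kafka integration (directKStream as in the sketch above):

  import org.apache.spark.streaming.kafka.HasOffsetRanges

  directKStream.foreachRDD { rdd =>
    // Each batch of a direct stream is backed by a KafkaRDD that knows the
    // exact offset range it will consume for every topic/partition
    val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
    ranges.foreach { o =>
      println(s"topic=${o.topic} partition=${o.partition} " +
        s"fromOffset=${o.fromOffset} untilOffset=${o.untilOffset}")
    }
  }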
On Wed, Aug 26, 2015 at 2:04 PM, Susan Zhang suchenz...@gmail.com wrote:
Compared offsets, and it continues from checkpoint loading:
15/08/26 11:24:54
Yeah. All messages are lost while the streaming job was down.
On Tue, Aug 25, 2015 at 11:37 AM, Cody Koeninger c...@koeninger.org wrote:
Are you actually losing messages then?
On Tue, Aug 25, 2015 at 1:15 PM, Susan Zhang suchenz...@gmail.com wrote:
No; first batch only contains messages received after the second job starts
(messages come in at a steady rate of about 400/second).
On Tue, Aug 25, 2015 at 11:07 AM, Cody Koeninger c...@koeninger.org wrote:
Does the first batch after restart contain all the messages received while
the job was down? Can you post code that reproduces the issue?