Re: uncontinuous offset in kafka will cause the spark streaming failure

2018-01-23 Thread Justin Miller
We appear to be kindred spirits; I’ve recently run into the same issue. Are you running compacted topics? I’ve run into this issue on non-compacted topics as well. It happens rarely but is still a pain. You might check out this patch and the related Spark Streaming Kafka ticket:

Re: "Got wrong record after seeking to offset" issue

2018-01-18 Thread Justin Miller
ote: > > https://kafka.apache.org/documentation/#compaction > > On Thu, Jan 18, 2018 at 1:17 AM, Justin Miller > <justin.mil...@protectwise.com> wrote: >> By compacted do you mean compression? If so then we did recently turn on lz4 >> compression. If there’s another me

Re: "Got wrong record after seeking to offset" issue

2018-01-17 Thread Justin Miller
the driver told it to > get, something's generally wrong. > > What happens when you try to consume the particular failing offset > from another (e.g. commandline) consumer? > > Is the topic in question compacted? > > > > On Tue, Jan 16, 2018 at 11:10 PM, Justin Miller > <

"Got wrong record after seeking to offset" issue

2018-01-16 Thread Justin Miller
Greetings all, I’ve recently started hitting the following error in Spark Streaming with Kafka. Adjusting maxRetries and spark.streaming.kafka.consumer.poll.ms, even to five minutes, doesn’t seem to help. The problem only manifested in the last few days, restarting with a new consumer
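For reference, the two settings mentioned above could be written as Spark configuration entries roughly as follows. This is only a sketch: it assumes "maxRetries" refers to the spark.streaming.kafka.maxRetries property, and the helper name kafka_stream_conf and the retry count of 3 are hypothetical, not taken from the original message.

```python
def kafka_stream_conf(poll_minutes, max_retries):
    """Build Spark config entries for a Kafka stream (illustrative helper)."""
    return {
        # The consumer poll timeout is expressed in milliseconds.
        "spark.streaming.kafka.consumer.poll.ms": str(poll_minutes * 60 * 1000),
        # Assumption: "maxRetries" in the message maps to this property.
        "spark.streaming.kafka.maxRetries": str(max_retries),
    }


# Five minutes, as described in the message; the retry count is made up.
conf = kafka_stream_conf(poll_minutes=5, max_retries=3)

# Print the entries in the form accepted by spark-submit.
for key, value in sorted(conf.items()):
    print(f"--conf {key}={value}")
```

The same key/value pairs could instead be applied to a SparkConf before the streaming context is created; either way, five minutes corresponds to a poll timeout of 300000 ms.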

Forcing either Hive or Spark SQL representation for metastore

2017-05-18 Thread Justin Miller
Hello, I was wondering if there were a way to force one representation or another for the Hive metastore. Some of our data can’t be parsed with the Hive method so it switches over to the Spark SQL method, leaving some of our data stored in Spark SQL format and some in Hive format. It’d be nice

Spark Streaming Kafka Job has strange behavior for certain tasks

2017-04-05 Thread Justin Miller
Greetings! I've been running various Spark Streaming jobs to persist data from Kafka topics, and one persister in particular seems to have issues. I've verified that the number of messages is the same per partition (roughly, of course) and the volume of data is a fraction of the volume of other

Re: How to gracefully handle Kafka OffsetOutOfRangeException

2017-03-10 Thread Justin Miller
pen a JIRA. It would be great if > there is a fix. I'm just saying I know a similar issue does not exist in > structured streaming. > > On Fri, Mar 10, 2017 at 7:46 AM, Justin Miller <justin.mil...@protectwise.com > <mailto:justin.mil...@protectwise.com>> wrote: >

Re: How to gracefully handle Kafka OffsetOutOfRangeException

2017-03-10 Thread Justin Miller
Hi Michael, I'm experiencing a similar issue. Will this not be fixed in Spark Streaming? Best, Justin > On Mar 10, 2017, at 8:34 AM, Michael Armbrust wrote: > > One option here would be to try Structured Streaming. We've added an option > "failOnDataLoss" that will
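As a rough illustration of the "failOnDataLoss" option mentioned in the quoted reply, the Kafka source options for a Structured Streaming query might be assembled like this. It is a sketch under stated assumptions: the helper name kafka_source_options, the broker address, and the topic name are all hypothetical placeholders.

```python
def kafka_source_options(bootstrap_servers, topic, fail_on_data_loss=False):
    """Collect Kafka source options for Structured Streaming (illustrative helper)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        # Option values are passed as strings; "false" asks the query not to
        # fail when offsets are no longer available.
        "failOnDataLoss": "true" if fail_on_data_loss else "false",
    }


# Placeholder broker and topic names, not taken from the thread.
opts = kafka_source_options("broker1:9092", "events")
print(opts)
```

With an existing SparkSession named spark, these options would typically be applied with spark.readStream.format("kafka").options(**opts).load().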

Re: Is there any scheduled release date for Spark 2.1.0?

2016-12-28 Thread Justin Miller
> attached to all of the code that is currently intended for the associated > release number. > > On Wed, Dec 28, 2016 at 3:09 PM, Justin Miller <justin.mil...@protectwise.com > <mailto:justin.mil...@protectwise.com>> wrote: > It looks like the jars for 2.1.0-SNAP

Re: Is there any scheduled release date for Spark 2.1.0?

2016-12-28 Thread Justin Miller
ta.com > <mailto:ko...@tresata.com>> wrote: > seems like the artifacts are on maven central but the website is not yet > updated. > > strangely the tag v2.1.0 is not yet available on github. i assume its equal > to v2.1.0-rc5 > > On Fri, Dec 23, 2016 at 10:52 AM, Jus

Re: Is there any scheduled release date for Spark 2.1.0?

2016-12-23 Thread Justin Miller
I'm curious about this as well. Seems like the vote passed. > On Dec 23, 2016, at 2:00 AM, Aseem Bansal wrote: >