Re: Issue with offset management using Spark on Dataproc

2019-04-30 Thread Shixiong(Ryan) Zhu
I recommend using Structured Streaming, as it has a patch that can work around this issue: https://issues.apache.org/jira/browse/SPARK-26267 Best Regards, Ryan On Tue, Apr 30, 2019 at 3:34 PM Shixiong(Ryan) Zhu wrote: > There is a known issue that Kafka may return a wrong offset even if

Re: Issue with offset management using Spark on Dataproc

2019-04-30 Thread Shixiong(Ryan) Zhu
There is a known issue that Kafka may return a wrong offset even if there is no reset happening: https://issues.apache.org/jira/browse/KAFKA-7703 Best Regards, Ryan On Tue, Apr 30, 2019 at 10:41 AM Austin Weaver wrote: > @deng - There was a short erroneous period where 2 streams were reading

Re: Issue with offset management using Spark on Dataproc

2019-04-30 Thread Austin Weaver
@deng - There was a short erroneous period where 2 streams reading from the same topic with the same group id were running at the same time. We saw errors from this and stopped the extra stream. That being said, I would think that auto.offset.reset would kick in regardless, since the documentation says

Re: Issue with offset management using Spark on Dataproc

2019-04-30 Thread Akshay Bhardwaj
Hi Austin, Are you using Spark Streaming or Structured Streaming? To help us understand the problem, could you also provide sample code and config params for the Spark-Kafka connector used by this streaming job? Akshay Bhardwaj +91-97111-33849 On Mon, Apr 29, 2019 at 10:34 PM Austin Weaver wrote: >

Issue with offset management using Spark on Dataproc

2019-04-29 Thread Austin Weaver
Hey guys, relatively new Spark dev here. I'm seeing some Kafka offset issues and was wondering if you guys could help me out. I am currently running a Spark job on Dataproc and am getting errors trying to re-join a group and read data from a Kafka topic. I have done some digging and am not