Github user koeninger commented on the issue: https://github.com/apache/spark/pull/21917

Example report of skipped offsets in a non-compacted, non-transactional situation:
http://mail-archives.apache.org/mod_mbox/kafka-users/201801.mbox/%3ccakwx9vxc1cdosqwwwjk3qmyy3svvtmh+rjdrjyvsbejsds8...@mail.gmail.com%3E

I asked on the kafka list about ways to tell if an offset is a transactional marker. I also asked about endOffset alternatives, although I don't think that totally solves the problem (for instance, in cases where the batch size has been rate limited).

On Mon, Aug 6, 2018 at 2:57 AM, Quentin Ambard <notificati...@github.com> wrote:

> By failed, you mean returned an empty collection after timing out, even
> though records should be available? You don't. You also don't know that it
> isn't just lost because kafka skipped a message. AFAIK, from the information
> you have from a kafka consumer, once you start allowing gaps in offsets,
> you don't know.
>
> Ok, that's interesting; my understanding was that if you successfully poll
> and get results, you are 100% sure that you don't lose anything. Do you have
> more details on that? Why would kafka skip a record while consuming?
>
> Have you tested comparing the results of consumer.endOffsets for consumers
> with different isolation levels?
>
> endOffsets returns the last offset (same as seekToEnd). But you're right
> that the easiest solution for us would be to have something like a
> seekToLastRecord method instead. Maybe something we could also ask?
>
> --
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <https://github.com/apache/spark/pull/21917#issuecomment-410620996>, or mute
> the thread
> <https://github.com/notifications/unsubscribe-auth/AAGAB2FVhHp_76l0WnRg_2WPgzSx1LlSks5uN_bxgaJpZM4VmlWm>.
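To make the ambiguity discussed above concrete, here is a minimal, hypothetical sketch (not Spark's or Kafka's actual code; the function name and data are invented) of gap detection over the offsets returned by a poll. The point it illustrates: the consumer API only tells you *that* offsets are missing, not *why*, so a gap could equally be a compacted record, a transaction commit/abort marker, or a genuinely lost message.

```python
# Hypothetical illustration only: gap detection over polled record offsets.
# Real code would read offsets from ConsumerRecords returned by poll().

def find_gaps(polled_offsets, start_offset):
    """Return (gaps, next_offset), where gaps is a list of (first, last)
    inclusive ranges of offsets that were expected but never seen."""
    gaps = []
    expected = start_offset
    for off in polled_offsets:
        if off > expected:
            gaps.append((expected, off - 1))  # offsets skipped by the broker
        expected = off + 1
    return gaps, expected

# A consecutive batch: nothing missing.
assert find_gaps([5, 6, 7], 5) == ([], 8)

# Offset 6 is missing: compaction, a transaction marker, or data loss --
# from the consumer's side these cases are indistinguishable.
assert find_gaps([5, 7, 8], 5) == ([(6, 6)], 9)
```

This is exactly why "allow gaps in offsets" makes strict no-data-loss checks impossible without extra information, and why the thread asks the Kafka list for a way to identify transactional markers directly.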