Github user vackosar commented on the issue:

https://github.com/apache/spark/pull/22143

@arunmahadevan min and max are used because other writers in different jobs can be writing to the same topic. The messages sent would then be interleaved, and one would have to return a large number of intervals to be accurate. This approach gives sufficient information about where the data ended up being written, while also being resilient and simple. Would you recommend adding this as Javadoc? To explain the motivation, I updated the description of this PR using the description from the Jira. (To track data lineage, we need to know where data was read from and written to, at least approximately.)
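
To illustrate the trade-off described above, here is a minimal Python sketch. It is not the code from this PR: the function names, the `(topic, partition, offset)` tuple shape, and the sample offsets are all assumptions made up for the example. It compares a single min/max range per topic partition with the exact list of contiguous intervals when another job's writes are interleaved.

```python
def offset_ranges(acks):
    """Collapse acknowledged writes into one (min, max) offset range per topic partition."""
    ranges = {}
    for topic, partition, offset in acks:
        key = (topic, partition)
        lo, hi = ranges.get(key, (offset, offset))
        ranges[key] = (min(lo, offset), max(hi, offset))
    return ranges


def exact_intervals(offsets):
    """Group offsets into contiguous runs, i.e. the fully accurate alternative."""
    intervals = []
    for off in sorted(offsets):
        if intervals and off == intervals[-1][1] + 1:
            # Extend the current run by one offset.
            intervals[-1] = (intervals[-1][0], off)
        else:
            # A gap (for example another job's message) starts a new run.
            intervals.append((off, off))
    return intervals


# Hypothetical data: this job got offsets 10, 12, 14, 16 on partition 0,
# while another job's messages landed on the offsets in between.
acks = [("events", 0, off) for off in (10, 12, 14, 16)]

summary = offset_ranges(acks)
intervals = exact_intervals(off for _, _, off in acks)
```

With this sample input, the min/max summary is one range per partition, while the exact representation needs one interval per message. The range is only approximate, since it also covers offsets written by the other job.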