Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/22138

I just applied a new approach: "separation of concerns". This approach pools both Kafka consumers and fetched data. Both pools support eviction of idle objects, which helps close stale objects whose topic partition is no longer assigned to any task. It also allows applying different policies to each pool, so pooling can be tuned per pool.

We were concerned about multiple tasks pointing to the same topic partition with the same group id; the existing code can't handle this, so excess seeks and fetches could happen. The new approach handles that case properly. It also makes the code always safe to leverage the cache, so there is no need to maintain the reuseCache parameter.

@koeninger @tdas @zsxwing @arunmahadevan Could you please take a look at the new approach? I think it solves multiple issues the existing code has.
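To illustrate the idea (not Spark's actual classes — `SimpleKeyedPool`, `borrow`, `giveBack`, and `evictIdle` are hypothetical names for this sketch), a keyed pool with idle-object eviction could look roughly like this. The same structure would back both pools: consumers keyed by (group id, topic partition), and fetched-data buffers keyed analogously. Eviction returns the stale objects so the caller can close them (e.g. `KafkaConsumer#close`):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a keyed object pool with idle eviction.
// Keys would be (group id, topic partition) for the consumer pool,
// and an analogous key for the fetched-data pool.
final class SimpleKeyedPool<K, V> {
    private static final class Entry<V> {
        final V value;
        final long lastReturnedAt;
        Entry(V value, long lastReturnedAt) {
            this.value = value;
            this.lastReturnedAt = lastReturnedAt;
        }
    }

    private final Function<K, V> factory;
    private final Map<K, Deque<Entry<V>>> idle = new HashMap<>();

    SimpleKeyedPool(Function<K, V> factory) { this.factory = factory; }

    // Borrow an object for the key, creating one if none is idle.
    synchronized V borrow(K key) {
        Deque<Entry<V>> q = idle.get(key);
        if (q != null && !q.isEmpty()) return q.pollFirst().value;
        return factory.apply(key);
    }

    // Return an object to the pool, recording when it became idle.
    synchronized void giveBack(K key, V value, long now) {
        idle.computeIfAbsent(key, k -> new ArrayDeque<>())
            .addLast(new Entry<>(value, now));
    }

    // Evict objects idle longer than maxIdleMillis and return them,
    // so the caller can close the underlying resources.
    synchronized List<V> evictIdle(long maxIdleMillis, long now) {
        List<V> evicted = new ArrayList<>();
        Iterator<Map.Entry<K, Deque<Entry<V>>>> it = idle.entrySet().iterator();
        while (it.hasNext()) {
            Deque<Entry<V>> q = it.next().getValue();
            q.removeIf(e -> {
                if (now - e.lastReturnedAt > maxIdleMillis) {
                    evicted.add(e.value);
                    return true;
                }
                return false;
            });
            if (q.isEmpty()) it.remove(); // drop keys with no idle objects left
        }
        return evicted;
    }

    synchronized int idleCount() {
        return idle.values().stream().mapToInt(Deque::size).sum();
    }
}
```

With per-pool policies meaning, for instance, a different idle timeout or capacity for the consumer pool versus the fetched-data pool. A production version would more likely build on Apache Commons Pool's `GenericKeyedObjectPool`, which provides this behavior (idle eviction, per-pool config) out of the box.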