AngersZhuuuu commented on pull request #30139: URL: https://github.com/apache/spark/pull/30139#issuecomment-722380427
> The cost for the latter increases linearly with the number of entities in `streams`, hence the problem appears if the number of entities is quite huge.

A single iteration over all streams is cheap, but every chunk triggers a `chunksBeingTransferred` count, so the total cost becomes very serious, and there is also contention on `streams`'s locks since it is a `ConcurrentHashMap`.

> So that sounds like a trade-off. It would probably be beneficial to leave it as it is if we assume the number of stream entities stays small enough, but if we assume the number of stream entities can be quite large, the cost of synchronizing `numChunksBeingTransferred` is going to be smaller.

Yea, that is why I said the cost of synchronizing `numChunksBeingTransferred` is acceptable.

> We agreed that the value of `numChunksBeingTransferred` doesn't need to be strictly accurate, so this might be acceptable.

From this point of view, I think the initial approach is good: the code is simple and easy to understand.
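To make the trade-off concrete, here is a minimal Java sketch of the two counting strategies, assuming a simplified stream manager. The names (`StreamManagerSketch`, `StreamState`, `chunkSendStarted`, etc.) are illustrative, not Spark's actual implementation:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the two strategies discussed above (illustrative names only).
class StreamManagerSketch {
  static class StreamState {
    final AtomicLong chunksBeingTransferred = new AtomicLong(0);
  }

  private final ConcurrentHashMap<Long, StreamState> streams =
      new ConcurrentHashMap<>();

  // Strategy 1: recompute on every call by iterating all streams.
  // O(number of streams) per chunk sent -- the cost flagged above.
  long chunksBeingTransferredByIteration() {
    long sum = 0L;
    for (StreamState state : streams.values()) {
      sum += state.chunksBeingTransferred.get();
    }
    return sum;
  }

  // Strategy 2: maintain one shared counter, updated as chunks start
  // and finish. Reads are O(1); writers contend on a single counter,
  // and the value may be momentarily stale.
  private final AtomicLong numChunksBeingTransferred = new AtomicLong(0);

  void chunkSendStarted() {
    numChunksBeingTransferred.incrementAndGet();
  }

  void chunkSendFinished() {
    numChunksBeingTransferred.decrementAndGet();
  }

  long chunksBeingTransferredByCounter() {
    return numChunksBeingTransferred.get();
  }
}
```

With the iteration approach, each read costs O(number of streams) and is paid per chunk; the shared counter makes the read O(1) at the cost of contention on one counter and a value that need not be strictly accurate, which the thread above agrees is acceptable.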