AngersZhuuuu commented on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-722380427


   > The cost for the latter increases linearly with the number of entities 
in `streams`, hence the problem appears if the number of entities is quite huge.
   
   Each iteration over all streams is cheap on its own, but every chunk triggers 
a recount of `chunksBeingTransferred`, so the aggregate cost becomes very serious, 
and there is also contention on `streams`' locks since it is a 
`ConcurrentHashMap` (see the sketch below).
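   A minimal sketch of the pattern being described (a hypothetical simplification, 
not Spark's exact code; the class and field names are illustrative): every chunk 
request re-walks the whole stream map, so the per-chunk cost grows with the number 
of registered streams.
   
   ```java
   import java.util.concurrent.ConcurrentHashMap;
   import java.util.concurrent.atomic.AtomicLong;
   
   class StreamManagerSketch {
     static class StreamState {
       // Per-stream count of chunks currently in flight.
       final AtomicLong chunksBeingTransferred = new AtomicLong(0L);
     }
   
     // Stream registry; a ConcurrentHashMap, as in the discussion above.
     private final ConcurrentHashMap<Long, StreamState> streams = new ConcurrentHashMap<>();
   
     // Called once per chunk request: O(number of streams) on every call.
     long chunksBeingTransferred() {
       long sum = 0L;
       for (StreamState state : streams.values()) {
         sum += state.chunksBeingTransferred.get();
       }
       return sum;
     }
   }
   ```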
   
   > So that sounds like a trade-off. It would probably be quite beneficial to 
leave it as it is if we assume the number of stream entities will remain small 
enough, but if we assume the case where the number of stream entities is quite 
large, the cost of synchronizing numChunksBeingTransferred is going to be 
smaller.
   
   Yeah, that is why I said the cost of synchronizing 
`numChunksBeingTransferred` is acceptable.
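   To make the trade-off concrete, here is a hedged sketch of the counter-based 
alternative, assuming update hooks named `chunkSent`/`chunkComplete` (hypothetical 
names): the read becomes O(1) regardless of the number of streams, at the price 
of every transfer start/finish touching one shared counter.
   
   ```java
   import java.util.concurrent.atomic.AtomicLong;
   
   class CounterSketch {
     // Single shared counter, updated on every chunk transfer start/finish.
     private final AtomicLong numChunksBeingTransferred = new AtomicLong(0L);
   
     // Hypothetical hooks for where a transfer begins and ends.
     void chunkSent()     { numChunksBeingTransferred.incrementAndGet(); }
     void chunkComplete() { numChunksBeingTransferred.decrementAndGet(); }
   
     // O(1) read, independent of how many streams are registered.
     long chunksBeingTransferred() { return numChunksBeingTransferred.get(); }
   }
   ```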
   
   > We agreed that the value of `numChunksBeingTransferred` doesn't need to be 
strictly accurate, so this might be acceptable.
   
   From this point of view, I think the initial approach is good: the code is 
simple and easy to understand.
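   If strict accuracy really isn't required, a `LongAdder` (just a sketch of a 
further option, not something this PR necessarily uses) would reduce update 
contention even more; its `sum()` can be slightly stale under concurrent updates, 
which matches the relaxed-accuracy assumption above.
   
   ```java
   import java.util.concurrent.atomic.LongAdder;
   
   class RelaxedCounterSketch {
     // LongAdder stripes updates across cells to cut contention;
     // sum() is not an atomic snapshot, which is fine here.
     private final LongAdder numChunksBeingTransferred = new LongAdder();
   
     void chunkSent()     { numChunksBeingTransferred.increment(); }
     void chunkComplete() { numChunksBeingTransferred.decrement(); }
   
     long chunksBeingTransferred() { return numChunksBeingTransferred.sum(); }
   }
   ```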
   

