[GitHub] [spark] advancedxy commented on pull request #41192: [SPARK-43530][PROTOBUF] Read descriptor file only once

via GitHub Wed, 17 May 2023 04:57:57 -0700


advancedxy commented on PR #41192:
URL: https://github.com/apache/spark/pull/41192#issuecomment-1551260315


   I believe one major problem with carrying the byte buffer is that it is 
serialized and deserialized when tasks are scheduled. 
   
   When the `FileDescriptorSet` is large, or when many protobuf 
functions are used, each task becomes larger and adds scheduling 
overhead. It would be much more lightweight to carry just the file path.
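
   To make the size argument concrete, here is a rough sketch that is not Spark code. It uses Python's `pickle` as a stand-in for task serialization, and the 1 MiB buffer, the path `/tmp/example.desc`, the message name `Example`, and the helper `payload_size` are all hypothetical values chosen for illustration. It assumes every protobuf function carries its own copy of the descriptor.

```python
import pickle

# Hypothetical stand-ins: a 1 MiB serialized FileDescriptorSet and the
# path of the file it was read from.
descriptor_bytes = bytes(1024 * 1024)
descriptor_path = "/tmp/example.desc"


def payload_size(descriptor, num_functions):
    """Total pickled size when each protobuf function carries `descriptor`.

    Each function's state is pickled separately, which approximates
    every expression shipping its own copy with the task.
    """
    return sum(
        len(pickle.dumps({"message_name": "Example", "descriptor": descriptor}))
        for _ in range(num_functions)
    )


bytes_size = payload_size(descriptor_bytes, 3)
path_size = payload_size(descriptor_path, 3)
print(f"carrying bytes: {bytes_size} B, carrying path: {path_size} B")
```

   Under these assumptions the byte-buffer payload grows with both the descriptor size and the number of functions, while the path payload stays at a few hundred bytes. The actual overhead in Spark depends on its own serializer and on how expressions are shipped to executors.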
   
   To address that problem, we would normally broadcast the byte buffer. 
However, that may not work well with Spark Connect.
   
   Do you think it's necessary to give users the option to pass the descriptor 
file path, i.e. the current behavior?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


