Re: [PR] [SHUFFLE] [WIP] Prototype: store shuffle file on external storage like S3 [spark]

via GitHub Wed, 17 Apr 2024 00:08:55 -0700


pspoerri commented on PR #34864:
URL: https://github.com/apache/spark/pull/34864#issuecomment-2060527524


   @steveloughran How do I call the Hue APIs from Spark? Can you point me to a 
package?
   I agree with you that using the Hadoop APIs is not ideal performance-wise, 
but they are great from a usability and portability perspective.
   
   Another issue is that Hadoop wants to know the size of every file it 
reads. That makes sense for formats like Parquet, where the footer 
metadata is located in the last few bytes of the file, but it does not make 
sense for shuffle, where you already know the exact block/file you want to 
read.
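   A minimal Python sketch of the shuffle-read point, assuming a layout 
loosely modeled on Spark's sort-shuffle output: one data file per map task 
plus an index file of big-endian 64-bit cumulative offsets (numPartitions + 1 
entries). The function and file names are hypothetical, and local files stand 
in for an object store. The reader derives (offset, length) for a reduce 
partition from the index and reads exactly that range, so it never needs the 
total size of the data file.

```python
import os
import struct
import tempfile


def write_shuffle_output(data_path, index_path, partitions):
    """Write reduce partitions back to back and record cumulative offsets.

    Hypothetical layout, loosely modeled on Spark's sort-shuffle data + index files.
    """
    offsets = [0]
    with open(data_path, "wb") as data:
        for block in partitions:
            data.write(block)
            offsets.append(offsets[-1] + len(block))
    with open(index_path, "wb") as index:
        # numPartitions + 1 big-endian signed 64-bit offsets
        index.write(struct.pack(f">{len(offsets)}q", *offsets))


def block_range(index_path, reduce_id):
    """Return (offset, length) of one reduce partition, using only the index."""
    with open(index_path, "rb") as index:
        index.seek(reduce_id * 8)
        start, end = struct.unpack(">qq", index.read(16))
    return start, end - start


def read_block(data_path, offset, length):
    """Read exactly [offset, offset + length) without asking for the file size."""
    with open(data_path, "rb") as data:
        data.seek(offset)
        buf = data.read(length)
    if len(buf) != length:
        raise EOFError(f"expected {length} bytes at offset {offset}, got {len(buf)}")
    return buf


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        data_path = os.path.join(d, "shuffle_0_0_0.data")  # illustrative names
        index_path = os.path.join(d, "shuffle_0_0_0.index")
        write_shuffle_output(data_path, index_path, [b"alpha", b"", b"gamma!"])
        off, length = block_range(index_path, 2)
        print(off, length, read_block(data_path, off, length))  # prints: 5 6 b'gamma!'
```

   On an object store, the (offset, length) pair maps naturally onto a single 
ranged read, so a mandatory file-length lookup before opening can add an 
extra metadata request without giving the shuffle reader any information it 
lacks.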

