gaogaotiantian commented on PR #55726: URL: https://github.com/apache/spark/pull/55726#issuecomment-4400809350
First of all, I think this is super useful. However, this extra task takes an extra slot out of our limit of 20 concurrent jobs. I'm definitely not saying we shouldn't do this (we should definitely do it, without question), but we also need to think about whether it's worth giving it a separate slot, or whether we can combine it with some other prerequisite job (for example, when we detect that we need to build it, we just build it).

Another observation is this:

| Task | download/upload | tar/untar |
| ---- | --------------- | --------- |
| Compile | 1m2s | 4m52s |
| Use | 25s | 2m38s |

which makes me wonder: maybe we should use a less aggressive compression algorithm, such as zstd or gzip? We are spending much more time compressing/decompressing than uploading/downloading.
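To put a number on the timings above, here is a minimal sketch that computes what share of each task's archive handling goes to tar/untar rather than transfer. The durations are copied from the table; the helper names `to_seconds` and `archive_share` are made up for illustration and are not part of the PR.

```python
import re


def to_seconds(duration: str) -> int:
    """Convert a duration such as '4m52s' or '25s' to whole seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(?:(\d+)s)?", duration)
    if not duration or match is None:
        raise ValueError(f"unrecognized duration: {duration!r}")
    minutes, seconds = match.groups(default="0")
    return int(minutes) * 60 + int(seconds)


def archive_share(transfer: str, archive: str) -> float:
    """Fraction of (transfer + archive) time spent on tar/untar."""
    transfer_s = to_seconds(transfer)
    archive_s = to_seconds(archive)
    return archive_s / (transfer_s + archive_s)


# (download/upload, tar/untar) per task, taken from the table above.
timings = {
    "Compile": ("1m2s", "4m52s"),
    "Use": ("25s", "2m38s"),
}

for task, (transfer, archive) in timings.items():
    share = archive_share(transfer, archive)
    print(f"{task}: {share:.0%} of the time is tar/untar")
# Compile: 82% of the time is tar/untar
# Use: 86% of the time is tar/untar
```

In both tasks more than four fifths of the time goes to tar/untar, so a faster compression setting could be worthwhile even if it makes the artifact somewhat larger to transfer.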
