rdblue commented on pull request #31355: URL: https://github.com/apache/spark/pull/31355#issuecomment-768668219
Overall, I have no problem adding this. My one reservation is that I would hope most sources don't explicitly control parallelism, and I think adding only this API would cause implementations to do exactly that. Instead, I would like to see some factor given to Spark to control parallelism, like bytes per task, so that parallelism can grow as incoming data grows. That said, I know there are cases where parallelism does need to be purposely controlled, like writing to a system with just a few nodes.
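To illustrate the bytes-per-task idea, here is a minimal sketch of deriving task count from data volume rather than a source-supplied fixed number. This is not Spark's API; the function name, parameters, and the optional hard cap (for constrained sinks) are all hypothetical.

```python
from typing import Optional

def num_tasks(total_bytes: int, bytes_per_task: int,
              max_tasks: Optional[int] = None) -> int:
    """Hypothetical sketch: derive parallelism from a size factor so it
    scales with incoming data, instead of a fixed source-chosen number."""
    if bytes_per_task <= 0:
        raise ValueError("bytes_per_task must be positive")
    # Ceiling division: enough tasks to cover all bytes, at least one task.
    tasks = max(1, -(-total_bytes // bytes_per_task))
    if max_tasks is not None:
        # Hard cap for the case the comment mentions: a sink with few nodes.
        tasks = min(tasks, max_tasks)
    return tasks
```

With this shape, a source reports its size and Spark (or the writer) picks the factor, so growing input naturally yields more tasks, while the cap still covers sinks that genuinely need bounded parallelism.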