rdblue commented on pull request #31355: URL: https://github.com/apache/spark/pull/31355#issuecomment-768668219
Overall, I have no problem adding this. My one reservation is that I would hope most sources don't explicitly control parallelism, and I think adding only this API would cause implementations to do exactly that. Instead, I would like to see some factor given to Spark to control parallelism, like bytes per task, so that parallelism can grow as incoming data grows. That said, I know there are cases where parallelism does need to be purposely controlled, like writing to a system with just a few nodes.
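To illustrate the bytes-per-task idea, here is a minimal sketch of deriving task count from data volume rather than a source-supplied fixed number. This is not Spark's API; the function name, parameters, and the optional hard cap (for constrained sinks) are all hypothetical.

```python
from typing import Optional

def num_tasks(total_bytes: int, bytes_per_task: int,
              max_tasks: Optional[int] = None) -> int:
    """Hypothetical sketch: derive parallelism from a size factor so it
    scales with incoming data, instead of a fixed source-chosen number."""
    if bytes_per_task <= 0:
        raise ValueError("bytes_per_task must be positive")
    # Ceiling division: enough tasks to cover all bytes, at least one task.
    tasks = max(1, -(-total_bytes // bytes_per_task))
    if max_tasks is not None:
        # Hard cap for the case the comment mentions: a sink with few nodes.
        tasks = min(tasks, max_tasks)
    return tasks
```

With this shape, a source reports its size and Spark (or the writer) picks the factor, so growing input naturally yields more tasks, while the cap still covers sinks that genuinely need bounded parallelism.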