Github user vgankidi commented on the issue:
https://github.com/apache/spark/pull/20372
I agree with @ash211. Applications shouldn't rely on the order of the files within a partition.
This optimization looks good to me.
Github user vgankidi commented on a diff in the pull request:
https://github.com/apache/spark/pull/19633#discussion_r153957431
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala ---
@@ -424,11 +424,19 @@ case class FileSourceScanExec
Github user vgankidi commented on a diff in the pull request:
https://github.com/apache/spark/pull/19633#discussion_r151236188
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala ---
@@ -424,11 +424,19 @@ case class FileSourceScanExec
Github user vgankidi commented on a diff in the pull request:
https://github.com/apache/spark/pull/19633#discussion_r150431620
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala ---
@@ -424,11 +424,19 @@ case class FileSourceScanExec
Github user vgankidi commented on the issue:
https://github.com/apache/spark/pull/19633
@gatorsmile Can you please take a look? I'd like to hear your thoughts on
this.
---
-
To unsubscribe, e-mail: reviews-unsubscr
Github user vgankidi commented on the issue:
https://github.com/apache/spark/pull/19634
We will end up having fewer combined splits. That reduces the number of
files that the job produces and also reduces the number of tasks in the
downstream jobs. In some tests I have noticed about
Github user vgankidi commented on the issue:
https://github.com/apache/spark/pull/19634
@gatorsmile I also wanted to discuss whether we should consider other bin-packing algorithms. According to this handout, http://www.math.unl.edu/~s-sjessie1/203Handouts/Bin%20Packing.pdf, next fit decreasing
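As a rough illustration of the heuristic under discussion (a sketch, not the PR's actual code), next fit decreasing sorts the splits by size in descending order and then packs greedily: each split goes into the single open bin if it fits, otherwise a new bin is opened.

```python
def next_fit_decreasing(split_sizes, bin_capacity):
    """Pack split sizes (bytes) into bins of at most bin_capacity bytes
    using next fit decreasing: sort descending, keep one open bin, and
    start a new bin whenever the next split does not fit."""
    bins = []
    current, current_size = [], 0
    for size in sorted(split_sizes, reverse=True):
        if current and current_size + size > bin_capacity:
            bins.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        bins.append(current)
    return bins

# Ten 30 MB splits with a 128 MB cap pack into 3 combined partitions
# instead of 10 single-split tasks.
packed = next_fit_decreasing([30] * 10, 128)
```

Next fit only ever considers the most recently opened bin, which makes it cheap (a single pass after sorting) but generally less tight than first fit decreasing, which scans all open bins for a fit.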
GitHub user vgankidi opened a pull request:
https://github.com/apache/spark/pull/19634
[SPARK-22412][SQL] Fix incorrect comment in DataSourceScanExec
## What changes were proposed in this pull request?
Next fit decreasing bin packing algorithm is used to combine splits
Github user vgankidi commented on the issue:
https://github.com/apache/spark/pull/19633
How about using spark.dynamicAllocation.maxExecutors for calculating bytesPerCore when dynamic allocation is enabled?
Github user vgankidi commented on the issue:
https://github.com/apache/spark/pull/19633
ping @davies
GitHub user vgankidi opened a pull request:
https://github.com/apache/spark/pull/19633
[SPARK-22411][SQL] Disable the heuristic to calculate max partition size
when dynamic allocation is enabled and use the value specified by the property
spark.sql.files.maxPartitionBytes instead
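To make the heuristic in question concrete, here is a hedged sketch of how the target split size is derived in Spark's file source: each file is charged its length plus an open cost, the total is spread over the available cores, and the result is clamped between the open cost and spark.sql.files.maxPartitionBytes. The function name and Python form are illustrative; the config names match Spark's.

```python
def max_split_bytes(file_sizes, default_parallelism,
                    max_partition_bytes=128 * 1024 * 1024,
                    open_cost_in_bytes=4 * 1024 * 1024):
    """Sketch of the max-partition-size heuristic: charge each file an
    open cost, divide total bytes by the parallelism, then clamp the
    result between open_cost_in_bytes and max_partition_bytes."""
    total_bytes = sum(size + open_cost_in_bytes for size in file_sizes)
    bytes_per_core = total_bytes // default_parallelism
    return min(max_partition_bytes, max(open_cost_in_bytes, bytes_per_core))
```

The PR's concern falls out of the formula: with dynamic allocation, the cluster's effective parallelism is not fixed, so bytes_per_core computed from a momentary core count can make splits far smaller or larger than intended, whereas honoring spark.sql.files.maxPartitionBytes directly gives a stable bound.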
Github user vgankidi commented on the issue:
https://github.com/apache/spark/pull/19425
@davies Can you please take a look?
GitHub user vgankidi opened a pull request:
https://github.com/apache/spark/pull/19425
[SPARK-22196][Core] Combine multiple input splits into a HadoopPartition
## What changes were proposed in this pull request?
Spark native read path allows tuning the partition size based