[GitHub] spark pull request #16347: [SPARK-18934][SQL] Writing to dynamic partitions ...

2017-06-19 Thread junegunn
Github user junegunn closed the pull request at: https://github.com/apache/spark/pull/16347 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature

[GitHub] spark issue #16347: [SPARK-18934][SQL] Writing to dynamic partitions does no...

2017-06-19 Thread junegunn
Github user junegunn commented on the issue: https://github.com/apache/spark/pull/16347 Hive makes sure that the output file is properly sorted by the column specified in the `SORT BY` clause by having only one reduce task (output) for each partition. ``` STAGE PLANS
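To make the "one reduce task per partition" idea concrete, here is a minimal plain-Scala sketch. It does not use Hive or Spark, and the names `OneReducerPerPartition`, `Row` and `writeFiles` are hypothetical. It assumes every row of a dynamic partition is routed to a single reducer, which then sorts its rows by the `SORT BY` column before producing that partition's single output file:

```scala
// Hypothetical model, not Hive code: one "reducer" per partition key,
// each sorting its own rows before writing a single output file.
object OneReducerPerPartition {
  final case class Row(value: Int, part: Int)

  // Returns, for each partition key, the values its single output file would hold.
  def writeFiles(rows: Seq[Row]): Map[Int, Seq[Int]] =
    rows
      .groupBy(_.part) // "shuffle": every row of a partition goes to the same reducer
      .map { case (part, rs) => part -> rs.map(_.value).sorted } // SORT BY value inside that reducer
}
```

Because no partition's rows are ever split across tasks, each output file comes out fully sorted, whatever order the rows arrived in.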

[GitHub] spark issue #16347: [SPARK-18934][SQL] Writing to dynamic partitions does no...

2017-05-23 Thread junegunn
Github user junegunn commented on the issue: https://github.com/apache/spark/pull/16347 See my answer above.

[GitHub] spark issue #16347: [SPARK-18934][SQL] Writing to dynamic partitions does no...

2017-04-11 Thread junegunn
Github user junegunn commented on the issue: https://github.com/apache/spark/pull/16347 @cloud-fan It's not a problem in the context of the DataFrame API. But when it comes to Spark SQL, it makes Spark SQL incompatible with equivalent HiveQL in a subtle way. At least we may need to revisit

[GitHub] spark issue #16347: [SPARK-18934][SQL] Writing to dynamic partitions does no...

2017-04-11 Thread junegunn
Github user junegunn commented on the issue: https://github.com/apache/spark/pull/16347 @cloud-fan Unfortunately, yes. ```scala sc.parallelize(1 to 1000).toDS.withColumn("part", 'value.mod(2)) .repartition(1, 'part).sortWithinPartitions("value"
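The snippet above puts the rows into a single task with `repartition(1, 'part)` and orders them by `value` with `sortWithinPartitions`. Assuming the result is then written with `part` as the dynamic partition column, the writer still has to group those rows by `part`. The plain-Scala sketch below (no Spark; `RegroupSketch` and its methods are hypothetical names) shows the two simple ways such a regrouping keeps the `value` order: a stable sort on `part` alone, or a sort on the composite key `(part, value)`, which does not depend on stability at all:

```scala
// Hypothetical illustration, not Spark's writer.
object RegroupSketch {
  final case class Row(value: Int, part: Int)

  // Scala's sortBy is stable, so rows with equal `part` keep their input order.
  def byPartStable(rows: Seq[Row]): Seq[Row] = rows.sortBy(_.part)

  // With `value` as a secondary key, ties on `part` can no longer be reordered.
  def byPartThenValue(rows: Seq[Row]): Seq[Row] = rows.sortBy(r => (r.part, r.value))
}
```

For input already ordered by `value`, both functions return the same sequence. They only diverge if the regrouping step is allowed to reorder rows that share a `part`.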

[GitHub] spark issue #16347: [SPARK-18934][SQL] Writing to dynamic partitions does no...

2017-01-19 Thread junegunn
Github user junegunn commented on the issue: https://github.com/apache/spark/pull/16347 Rebased to current master. The patch is simpler thanks to the refactoring made in [SPARK-18243](https://issues.apache.org/jira/browse/SPARK-18243). Anyway, I can understand your rationale

[GitHub] spark issue #16347: [SPARK-18934][SQL] Writing to dynamic partitions does no...

2017-01-04 Thread junegunn
Github user junegunn commented on the issue: https://github.com/apache/spark/pull/16347 @chpritchard-expedia The patch here fixes the problem. I don't think it's possible to work around the issue by using the Spark API differently, because we can't completely avoid memory
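Why spills matter can be sketched with a toy external sort. This is a hypothetical model, not Spark's sorter, and `SpillMergeSketch`, `externalSortByPart`, `runSize` and `preferEarlierRun` are made-up names. The input is cut into "spill" runs, each run is sorted by `part` in memory, and the runs are then merged. If the merge breaks ties on `part` in favour of the earlier run, the incoming `value` order survives; if it does not, rows from later runs can overtake earlier ones:

```scala
import scala.collection.mutable

// Hypothetical model of a sort that spills; none of these names come from Spark.
object SpillMergeSketch {
  final case class Row(value: Int, part: Int)

  // Cut the input into runs of at most `runSize` rows, sort each run stably by
  // `part`, then k-way merge the runs. `preferEarlierRun` decides how ties on
  // `part` between different runs are broken during the merge.
  def externalSortByPart(rows: Seq[Row], runSize: Int, preferEarlierRun: Boolean): Vector[Row] = {
    require(runSize > 0, "runSize must be positive")
    val runs: Vector[Vector[Row]] =
      rows.grouped(runSize).map(_.sortBy(_.part).toVector).toVector

    // Heap entries are (part, runIndex, positionInRun). Each run has at most one
    // entry in the heap at a time, so rows of the same run keep their in-run
    // order; only ties between different runs depend on `tieBreak`.
    val tieBreak: Int => Int = r => if (preferEarlierRun) r else -r
    // PriorityQueue is a max-heap, so the ordering is reversed to pop the smallest key.
    val smallestFirst: Ordering[(Int, Int, Int)] =
      Ordering.by((e: (Int, Int, Int)) => (e._1, tieBreak(e._2))).reverse
    val heap = mutable.PriorityQueue.empty[(Int, Int, Int)](smallestFirst)
    for (r <- runs.indices) heap.enqueue((runs(r).head.part, r, 0))

    val out = Vector.newBuilder[Row]
    while (heap.nonEmpty) {
      val (_, r, i) = heap.dequeue()
      out += runs(r)(i)
      if (i + 1 < runs(r).length) heap.enqueue((runs(r)(i + 1).part, r, i + 1))
    }
    out.result()
  }

  // True if, inside every partition, `value` appears in ascending order.
  def sortedWithinPartitions(rows: Seq[Row]): Boolean =
    rows.groupBy(_.part).values.forall { rs =>
      val vs = rs.map(_.value)
      vs == vs.sorted
    }
}
```

With 1000 rows ordered by `value` and `runSize = 100`, both variants group the rows by `part`, but only the run that prefers earlier runs on ties keeps each partition sorted by `value`. Since the number of runs depends on memory pressure, the caller cannot rule the second case out just by arranging the input differently.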

[GitHub] spark issue #16347: [SPARK-18934][SQL] Writing to dynamic partitions does no...

2016-12-20 Thread junegunn
Github user junegunn commented on the issue: https://github.com/apache/spark/pull/16347 Thanks for the comment. I was trying to implement the following HiveQL in Spark SQL/API: ```sql set hive.exec.dynamic.partition.mode=nonstrict; set hive.mapred.mode = nonstrict

[GitHub] spark pull request #16347: [SPARK-18934][SQL] Writing to dynamic partitions ...

2016-12-19 Thread junegunn
GitHub user junegunn opened a pull request: https://github.com/apache/spark/pull/16347 [SPARK-18934][SQL] Writing to dynamic partitions does not preserve sort order if spills occur ## What changes were proposed in this pull request? Make dynamic partition writer perform