[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-20 Thread sddyljsx
Github user sddyljsx commented on the issue: https://github.com/apache/spark/pull/21859 'The ShuffleWriter should treat RangePartitioner specially and consume the sampled data in RangePartitioner instead of the input iterator.' This idea is good, maybe we can cache both the K and V

[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-20 Thread sddyljsx
Github user sddyljsx commented on the issue: https://github.com/apache/spark/pull/21859 I read the source code again. The RangePartitioner[K, V] in ShuffleExchangeExec is an instance of RangePartitioner[InternalRow, Null]. RangePartitioner only samples K for getting

[GitHub] spark pull request #21859: [SPARK-24900][SQL]Speed up sort when the dataset ...

2018-08-20 Thread sddyljsx
Github user sddyljsx commented on a diff in the pull request: https://github.com/apache/spark/pull/21859#discussion_r211270748 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -1207,6 +1207,13 @@ object SQLConf { .intConf

[GitHub] spark pull request #21859: [SPARK-24900][SQL]Speed up sort when the dataset ...

2018-08-20 Thread sddyljsx
Github user sddyljsx commented on a diff in the pull request: https://github.com/apache/spark/pull/21859#discussion_r211230520 --- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala --- @@ -166,9 +169,20 @@ class RangePartitioner[K : Ordering : ClassTag, V

[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-19 Thread sddyljsx
Github user sddyljsx commented on the issue: https://github.com/apache/spark/pull/21859 We may not know in advance how big this query is. The data at the beginning is large, but it may be very small after filtering. I encountered this problem while using thrift server for queries

[GitHub] spark pull request #21859: [SPARK-24900][SQL]Speed up sort when the dataset ...

2018-08-19 Thread sddyljsx
Github user sddyljsx commented on a diff in the pull request: https://github.com/apache/spark/pull/21859#discussion_r211131380 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -1207,6 +1207,13 @@ object SQLConf { .intConf

[GitHub] spark pull request #21859: [SPARK-24900][SQL]Speed up sort when the dataset ...

2018-08-19 Thread sddyljsx
Github user sddyljsx commented on a diff in the pull request: https://github.com/apache/spark/pull/21859#discussion_r211131294 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -1207,6 +1207,13 @@ object SQLConf { .intConf

[GitHub] spark pull request #21859: [SPARK-24900][SQL]Speed up sort when the dataset ...

2018-08-19 Thread sddyljsx
Github user sddyljsx commented on a diff in the pull request: https://github.com/apache/spark/pull/21859#discussion_r211130877 --- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala --- @@ -155,6 +156,8 @@ class RangePartitioner[K : Ordering : ClassTag, V

[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-13 Thread sddyljsx
Github user sddyljsx commented on the issue: https://github.com/apache/spark/pull/21859 I think I need another retest. Please help. @viirya --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark pull request #21859: [SPARK-24900][SQL]Speed up sort when the dataset ...

2018-08-12 Thread sddyljsx
Github user sddyljsx commented on a diff in the pull request: https://github.com/apache/spark/pull/21859#discussion_r209486199 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala --- @@ -294,7 +296,12 @@ object ShuffleExchangeExec

[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-12 Thread sddyljsx
Github user sddyljsx commented on the issue: https://github.com/apache/spark/pull/21859 Please help retest it. @kiszk @viirya

[GitHub] spark pull request #21859: [SPARK-24900][SQL]Speed up sort when the dataset ...

2018-08-11 Thread sddyljsx
Github user sddyljsx commented on a diff in the pull request: https://github.com/apache/spark/pull/21859#discussion_r209420016 --- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala --- @@ -166,9 +169,17 @@ class RangePartitioner[K : Ordering : ClassTag, V

[GitHub] spark pull request #21859: [SPARK-24900][SQL]Speed up sort when the dataset ...

2018-08-11 Thread sddyljsx
Github user sddyljsx commented on a diff in the pull request: https://github.com/apache/spark/pull/21859#discussion_r209418116 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala --- @@ -294,7 +296,12 @@ object ShuffleExchangeExec

[GitHub] spark pull request #21859: [SPARK-24900][SQL]Speed up sort when the dataset ...

2018-08-11 Thread sddyljsx
Github user sddyljsx commented on a diff in the pull request: https://github.com/apache/spark/pull/21859#discussion_r209418058 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala --- @@ -294,7 +296,12 @@ object ShuffleExchangeExec

[GitHub] spark pull request #21859: [SPARK-24900][SQL]Speed up sort when the dataset ...

2018-08-11 Thread sddyljsx
Github user sddyljsx commented on a diff in the pull request: https://github.com/apache/spark/pull/21859#discussion_r209417745 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala --- @@ -294,7 +296,12 @@ object ShuffleExchangeExec

[GitHub] spark pull request #21859: [SPARK-24900][SQL]Speed up sort when the dataset ...

2018-08-10 Thread sddyljsx
Github user sddyljsx commented on a diff in the pull request: https://github.com/apache/spark/pull/21859#discussion_r209417551 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala --- @@ -294,7 +296,12 @@ object ShuffleExchangeExec

[GitHub] spark pull request #21859: [SPARK-24900][SQL]Speed up sort when the dataset ...

2018-08-10 Thread sddyljsx
Github user sddyljsx commented on a diff in the pull request: https://github.com/apache/spark/pull/21859#discussion_r209417115 --- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala --- @@ -166,9 +169,17 @@ class RangePartitioner[K : Ordering : ClassTag, V

[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-09 Thread sddyljsx
Github user sddyljsx commented on the issue: https://github.com/apache/spark/pull/21859 @ueshin please retest it; an unknown error occurred.

[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-09 Thread sddyljsx
Github user sddyljsx commented on the issue: https://github.com/apache/spark/pull/21859 retest this, please

[GitHub] spark pull request #21859: [SPARK-24900][SQL]Speed up sort when the dataset ...

2018-08-08 Thread sddyljsx
Github user sddyljsx commented on a diff in the pull request: https://github.com/apache/spark/pull/21859#discussion_r208804032 --- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala --- @@ -166,7 +169,16 @@ class RangePartitioner[K : Ordering : ClassTag, V

[GitHub] spark pull request #21859: [SPARK-24900][SQL]Speed up sort when the dataset ...

2018-08-08 Thread sddyljsx
Github user sddyljsx commented on a diff in the pull request: https://github.com/apache/spark/pull/21859#discussion_r208803622 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala --- @@ -294,7 +296,12 @@ object ShuffleExchangeExec

[GitHub] spark pull request #21859: [SPARK-24900][SQL]Speed up sort when the dataset ...

2018-08-08 Thread sddyljsx
Github user sddyljsx commented on a diff in the pull request: https://github.com/apache/spark/pull/21859#discussion_r208803055 --- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala --- @@ -166,7 +169,16 @@ class RangePartitioner[K : Ordering : ClassTag, V

[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-08 Thread sddyljsx
Github user sddyljsx commented on the issue: https://github.com/apache/spark/pull/21859 This optimization is only for SQL, but other places also use RangePartitioner. How can it affect other places? The failed UTs are caused by ``` else if (sampleCacheEnabled

[GitHub] spark pull request #21859: [SPARK-24900][SQL]Speed up sort when the dataset ...

2018-08-08 Thread sddyljsx
Github user sddyljsx commented on a diff in the pull request: https://github.com/apache/spark/pull/21859#discussion_r208801641 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala --- @@ -294,7 +296,12 @@ object ShuffleExchangeExec

[GitHub] spark pull request #21859: [SPARK-24900][SQL]Speed up sort when the dataset ...

2018-08-08 Thread sddyljsx
Github user sddyljsx commented on a diff in the pull request: https://github.com/apache/spark/pull/21859#discussion_r208801492 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2799,6 +2799,26 @@ class SQLQuerySuite extends QueryTest

[GitHub] spark pull request #21859: [SPARK-24900][SQL]Speed up sort when the dataset ...

2018-08-08 Thread sddyljsx
Github user sddyljsx commented on a diff in the pull request: https://github.com/apache/spark/pull/21859#discussion_r208801520 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2799,6 +2799,26 @@ class SQLQuerySuite extends QueryTest

[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-08 Thread sddyljsx
Github user sddyljsx commented on the issue: https://github.com/apache/spark/pull/21859 @ueshin please test again

[GitHub] spark pull request #21859: [SPARK-24900][SQL]Speed up sort when the dataset ...

2018-08-07 Thread sddyljsx
Github user sddyljsx commented on a diff in the pull request: https://github.com/apache/spark/pull/21859#discussion_r208441135 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SmallDataSortBenchmark.scala --- @@ -0,0 +1,95 @@ +/* + * Licensed

[GitHub] spark pull request #21859: [SPARK-24900][SQL]Speed up sort when the dataset ...

2018-08-07 Thread sddyljsx
Github user sddyljsx commented on a diff in the pull request: https://github.com/apache/spark/pull/21859#discussion_r208441067 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SmallDataSortBenchmark.scala --- @@ -0,0 +1,85 @@ +/* + * Licensed

[GitHub] spark pull request #22028: [SPARK-25046][SQL] Fix Alter View can excute sql ...

2018-08-07 Thread sddyljsx
GitHub user sddyljsx opened a pull request: https://github.com/apache/spark/pull/22028 [SPARK-25046][SQL] Fix Alter View can excute sql like "ALTER VIEW ... AS INSERT INTO" ## What changes were proposed in this pull request? Alter View can execute SQL like

[GitHub] spark pull request #21859: [SPARK-24900][SQL]Speed up sort when the dataset ...

2018-08-07 Thread sddyljsx
Github user sddyljsx commented on a diff in the pull request: https://github.com/apache/spark/pull/21859#discussion_r208129700 --- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala --- @@ -166,7 +170,13 @@ class RangePartitioner[K : Ordering : ClassTag, V

[GitHub] spark pull request #21859: [SPARK-24900][SQL]Speed up sort when the dataset ...

2018-08-07 Thread sddyljsx
Github user sddyljsx commented on a diff in the pull request: https://github.com/apache/spark/pull/21859#discussion_r208125902 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SmallDataSortBenchmark.scala --- @@ -0,0 +1,85 @@ +/* + * Licensed

[GitHub] spark issue #21859: [SPARK-24900][SQL]speed up sort when the dataset is smal...

2018-08-02 Thread sddyljsx
Github user sddyljsx commented on the issue: https://github.com/apache/spark/pull/21859 @felixcheung Thanks for the review. **1. How small is 'small':** This optimization works when the sampled data of the RangePartitioner covers all the data to sort
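The condition in the comment above (the sample covers all the data to sort) can be illustrated with a small standalone sketch. This is not Spark code and not the PR's implementation: the function names, the per-partition sample size, and the fallback signal (`None`) are hypothetical, chosen only to show the idea that a sort can be answered from the sample when no row was left out of it.

```python
import random


def sample_partition(rows, sample_size, rng):
    """Reservoir-sample up to sample_size rows and count all rows seen.

    Hypothetical helper: it stands in for the sampling step of a range
    partitioner and is not the Spark API.
    """
    reservoir = []
    count = 0
    for row in rows:
        count += 1
        if len(reservoir) < sample_size:
            reservoir.append(row)
        else:
            # Replace an existing sample with probability sample_size / count.
            j = rng.randrange(count)
            if j < sample_size:
                reservoir[j] = row
    return reservoir, count


def sort_from_sample(partitions, sample_size_per_partition, seed=0):
    """Return the sorted data if every partition fit in its sample, else None.

    None signals that the sample missed some rows, so a caller would have
    to fall back to the regular shuffle-based sort (assumed, not shown).
    """
    rng = random.Random(seed)
    sampled = []
    for part in partitions:
        sample, count = sample_partition(part, sample_size_per_partition, rng)
        if count > len(sample):
            return None  # the sample does not cover this partition
        sampled.extend(sample)
    return sorted(sampled)
```

In this sketch, "small" simply means that each partition holds no more rows than the per-partition sample size; larger inputs take the fallback path.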

[GitHub] spark pull request #21859: [SPARK-24900][SQL]speed up sort when the dataset ...

2018-07-24 Thread sddyljsx
GitHub user sddyljsx opened a pull request: https://github.com/apache/spark/pull/21859 [SPARK-24900][SQL]speed up sort when the dataset is small ## What changes were proposed in this pull request? When running SQL like 'select * from order where order_status = 4 order