[GitHub] spark pull request #22612: [SPARK-24958] Add executors' process tree total m...

2018-11-29 Thread yucai
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/22612#discussion_r237737064 --- Diff: core/src/main/scala/org/apache/spark/executor/ProcfsMetricsGetter.scala --- @@ -0,0 +1,228 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #22612: [SPARK-24958] Add executors' process tree total m...

2018-11-29 Thread yucai
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/22612#discussion_r237736770 --- Diff: core/src/main/scala/org/apache/spark/executor/ProcfsMetricsGetter.scala --- @@ -0,0 +1,228 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #22965: [SPARK-25964][SQL][Minor] Revise OrcReadBenchmark...

2018-11-08 Thread yucai
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/22965#discussion_r231908870 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcReadBenchmark.scala --- @@ -32,9 +32,11 @@ import org.apache.spark.sql.types

[GitHub] spark pull request #22823: [SPARK-25676][SQL][TEST] Rename and refactor Benc...

2018-11-07 Thread yucai
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/22823#discussion_r231771399 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/WideTableBenchmark.scala --- @@ -0,0 +1,52 @@ +/* + * Licensed

[GitHub] spark pull request #22823: [SPARK-25676][SQL][TEST] Rename and refactor Benc...

2018-11-06 Thread yucai
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/22823#discussion_r231198094 --- Diff: sql/core/benchmarks/WideTableBenchmark-results.txt --- @@ -0,0 +1,17

[GitHub] spark issue #22823: [SPARK-25676][SQL][TEST] Rename and refactor BenchmarkWi...

2018-11-06 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22823 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #22823: [SPARK-25676][SQL][TEST] Rename and refactor Benc...

2018-11-06 Thread yucai
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/22823#discussion_r231116690 --- Diff: sql/core/benchmarks/WideTableBenchmark-results.txt --- @@ -0,0 +1,17

[GitHub] spark issue #22823: [SPARK-25676][SQL][TEST] Refactor BenchmarkWideTable to ...

2018-11-05 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22823 @dongjoon-hyun Just push the rebased version, thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

2018-11-02 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22847 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

2018-11-01 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22847 @cloud-fan @rednaxelafx I missed that! Please help review. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

2018-11-01 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22847 @cloud-fan @gatorsmile How about merging this PR first? And then we can dissuss those performance issue in other PR? 1. One PR to improve WideTableBenchmark #22823 WIP. 2. One PR to add more

[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

2018-11-01 Thread yucai
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/22847#discussion_r230248773 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -812,6 +812,17 @@ object SQLConf { .intConf

[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

2018-11-01 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22847 I used the WideTableBenchmark to test this configuration. 4 scenarioes are tested, `2048` is always better than `1024`, overall it is also good and looks more safe to avoid hitting 8KB limitaion

[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

2018-10-31 Thread yucai
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/22847#discussion_r229919857 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -812,6 +812,17 @@ object SQLConf { .intConf

[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

2018-10-31 Thread yucai
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/22847#discussion_r229638917 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -812,6 +812,17 @@ object SQLConf { .intConf

[GitHub] spark issue #22879: [SPARK-25872][SQL][TEST] Add an optimizer tracker for TP...

2018-10-30 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22879 `tpcdsQueries` and `tpcdsQueriesV2_7` are duplicated to `TPCDSQueryBenchmark`'s, should we maintain them together

[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

2018-10-30 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22847 @cloud-fan @dongjoon-hyun @kiszk I just add a negative check, maybe we need another PR to figure better value later if it is not easy to decide now

[GitHub] spark issue #22861: [SPARK-25663][SPARK-25661][SQL][TEST] Refactor BuiltInDa...

2018-10-30 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22861 @dongjoon-hyun Tests have been passed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands

[GitHub] spark pull request #22861: [SPARK-25663][SPARK-25661][SQL][TEST] Refactor Bu...

2018-10-29 Thread yucai
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/22861#discussion_r229162191 --- Diff: external/avro/src/test/scala/org/apache/spark/sql/execution/benchmark/AvroWriteBenchmark.scala --- @@ -19,22 +19,17 @@ package

[GitHub] spark pull request #22845: [SPARK-25848][SQL][TEST] Refactor CSVBenchmarks t...

2018-10-29 Thread yucai
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/22845#discussion_r229011040 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVBenchmarks.scala --- @@ -137,22 +124,15 @@ object CSVBenchmarks extends

[GitHub] spark pull request #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks ...

2018-10-29 Thread yucai
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/22844#discussion_r229010476 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmarks.scala --- @@ -195,23 +170,16 @@ object JSONBenchmarks

[GitHub] spark issue #22861: [SPARK-25663][SPARK-25661][SQL][TEST] Refactor BuiltInDa...

2018-10-29 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22861 @dongjoon-hyun I used #22872 to make main args accessible for `BenchmarkBase`'s subclass, this PR is mainly for refactoring `DataSourceWriteBenchmark` and `BuiltInDataSourceWriteBenchmark

[GitHub] spark issue #22872: [SPARK-25864][SQL][TEST] Make main args accessible for B...

2018-10-29 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22872 @HyukjinKwon @cloud-fan My previous tests have been passed :). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

2018-10-29 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22847 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #22861: [SPARK-25663][SPARK-25661][SQL][TEST] Refactor BuiltInDa...

2018-10-29 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22861 Current implementation misses main args, but some suite would need it anyway. Let's can discuss #22872. --- - To unsubscribe

[GitHub] spark issue #22872: [SPARK-25864][SQL][TEST] Make main args accessible for B...

2018-10-29 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22872 @gengliangwang @dongjoon-hyun @cloud-fan --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark pull request #22872: [SPARK-25864][SQL][TEST] Make main args accessibl...

2018-10-29 Thread yucai
GitHub user yucai opened a pull request: https://github.com/apache/spark/pull/22872 [SPARK-25864][SQL][TEST] Make main args accessible for BenchmarkBase's subclass ## What changes were proposed in this pull request? Set main args correctly in BenchmarkBase, to make

[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

2018-10-28 Thread yucai
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/22847#discussion_r228788123 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -812,6 +812,17 @@ object SQLConf { .intConf

[GitHub] spark issue #22861: [SPARK-25663][SPARK-25661][SQL][TEST] Refactor BuiltInDa...

2018-10-28 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22861 @dongjoon-hyun Originally, I want to do two things in this PR. 1. Make `mainArgs` correctly set in `BenchmarkBase`. 2. Include an example to use `mainArgs`: refactor `DataSourceWriteBenchmark

[GitHub] spark issue #22861: [SPARK-25663][SPARK-25661][SQL][TEST] Refactor BuiltInDa...

2018-10-27 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22861 @gengliangwang @wangyum @dongjoon-hyun help review. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark pull request #22861: [SPARK-25663][SPARK-25661][SQL][TEST] Refactor Bu...

2018-10-27 Thread yucai
GitHub user yucai opened a pull request: https://github.com/apache/spark/pull/22861 [SPARK-25663][SPARK-25661][SQL][TEST] Refactor BuiltInDataSourceWriteBenchmark, DataSourceWriteBenchmark and AvroWriteBenchmark to use main method ## What changes were proposed in this pull request

[GitHub] spark pull request #22823: [SPARK-25676][SQL][TEST] Improve BenchmarkWideTab...

2018-10-27 Thread yucai
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/22823#discussion_r228715829 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/BenchmarkWideTable.scala --- @@ -1,52 +0,0 @@ -/* - * Licensed

[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

2018-10-26 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22847 @cloud-fan @dongjoon-hyun @gengliangwang Kindly help review. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

2018-10-26 Thread yucai
GitHub user yucai opened a pull request: https://github.com/apache/spark/pull/22847 [SPARK-25850][SQL] Make the split threshold for the code generated method configurable ## What changes were proposed in this pull request? As per the [discussion](https://github.com/apache

[GitHub] spark pull request #22823: [SPARK-25676][SQL][TEST] Refactor BenchmarkWideTa...

2018-10-25 Thread yucai
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/22823#discussion_r228387605 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -910,12 +910,14 @@ class CodegenContext

[GitHub] spark issue #22823: [SPARK-25676][SQL][TEST] Refactor BenchmarkWideTable to ...

2018-10-25 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22823 Thanks @wangyum for good suggestion! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark pull request #22823: [SPARK-25676][SQL][TEST] Refactor BenchmarkWideTa...

2018-10-25 Thread yucai
GitHub user yucai opened a pull request: https://github.com/apache/spark/pull/22823 [SPARK-25676][SQL][TEST] Refactor BenchmarkWideTable to use main method ## What changes were proposed in this pull request? Refactor BenchmarkWideTable to use main method. Generate

[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...

2018-10-24 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/19788 @cloud-fan , exception screenshot. Let me know if you want any change. ![image](https://user-images.githubusercontent.com/2989575/47471258-1793ce00-d83c-11e8-90bf-107865fc9032.png

[GitHub] spark pull request #21156: [SPARK-24087][SQL] Avoid shuffle when join keys a...

2018-10-23 Thread yucai
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/21156#discussion_r227268021 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/JoinUtils.scala --- @@ -0,0 +1,63 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #22580: [SPARK-25508][SQL] Refactor OrcReadBenchmark to u...

2018-09-28 Thread yucai
GitHub user yucai opened a pull request: https://github.com/apache/spark/pull/22580 [SPARK-25508][SQL] Refactor OrcReadBenchmark to use main method ## What changes were proposed in this pull request? Refactor OrcReadBenchmark to use main method. Generate benchmark

[GitHub] spark issue #22493: [SPARK-25485][SQL][TEST] Refactor UnsafeProjectionBenchm...

2018-09-26 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22493 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #22493: [SPARK-25485][SQL][TEST] Refactor UnsafeProjectionBenchm...

2018-09-26 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22493 @dongjoon-hyun could you help review? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands

[GitHub] spark pull request #22495: [SPARK-25486][TEST] Refactor SortBenchmark to use...

2018-09-23 Thread yucai
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/22495#discussion_r219731873 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SortBenchmark.scala --- @@ -28,12 +28,15 @@ import

[GitHub] spark pull request #22490: [SPARK-25481][TEST] Refactor ColumnarBatchBenchma...

2018-09-21 Thread yucai
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/22490#discussion_r219548887 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchBenchmark.scala --- @@ -30,8 +30,13 @@ import

[GitHub] spark pull request #22495: [SPARK-25486][TEST] Refactor SortBenchmark to use...

2018-09-20 Thread yucai
GitHub user yucai opened a pull request: https://github.com/apache/spark/pull/22495 [SPARK-25486][TEST] Refactor SortBenchmark to use main method ## What changes were proposed in this pull request? Refactor SortBenchmark to use main method. Generate benchmark result

[GitHub] spark pull request #22493: [SPARK-25485][TEST] Refactor UnsafeProjectionBenc...

2018-09-20 Thread yucai
GitHub user yucai opened a pull request: https://github.com/apache/spark/pull/22493 [SPARK-25485][TEST] Refactor UnsafeProjectionBenchmark to use main method ## What changes were proposed in this pull request? Refactor `UnsafeProjectionBenchmark` to use main method

[GitHub] spark issue #22490: [SPARK-25481][TEST] Refactor ColumnarBatchBenchmark to u...

2018-09-20 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22490 @wangyum Could you help review? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #22490: [SPARK-25481][TEST] Refactor ColumnarBatchBenchma...

2018-09-20 Thread yucai
GitHub user yucai opened a pull request: https://github.com/apache/spark/pull/22490 [SPARK-25481][TEST] Refactor ColumnarBatchBenchmark to use main method ## What changes were proposed in this pull request? Refactor `ColumnarBatchBenchmark` to use main method. Generate

[GitHub] spark issue #21791: [SPARK-24925][SQL] input bytesRead metrics fluctuate fro...

2018-09-11 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/21791 @kiszk I feel it is hard to add UT, do you have any idea? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #21791: [SPARK-24925][SQL] input bytesRead metrics fluctuate fro...

2018-09-08 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/21791 @kiszk I see #22324 has been solved, which is one of my PR's dependency actually (see https://issues.apache.org/jira/browse/SPARK-24925?focusedCommentId=16556818

[GitHub] spark issue #22197: [SPARK-25207][SQL] Case-insensitve field resolution for ...

2018-08-30 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22197 @cloud-fan, tests have passed. And I will use a followup PR to make it cleaner. --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #22197: [SPARK-25207][SQL] Case-insensitve field resoluti...

2018-08-30 Thread yucai
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/22197#discussion_r213912108 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala --- @@ -1021,6 +1022,113 @@ class

[GitHub] spark issue #22197: [SPARK-25207][SQL] Case-insensitve field resolution for ...

2018-08-29 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22197 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #22197: [SPARK-25207][SQL] Case-insensitve field resolution for ...

2018-08-29 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22197 @cloud-fan I reverted to the previous version. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #22197: [SPARK-25207][SQL] Case-insensitve field resolution for ...

2018-08-29 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22197 @dongjoon-hyun Sorry for the late response, description is changed to: > Although filter "ID < 100L" is generated by Spark, it fails to pushdown into parquet actually,

[GitHub] spark issue #22197: [SPARK-25207][SQL] Case-insensitve field resolution for ...

2018-08-29 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22197 I will treat the above case as acceptable and will add a duplicated field check for the parquet schema. --- - To unsubscribe, e

[GitHub] spark issue #22197: [SPARK-25207][SQL] Case-insensitve field resolution for ...

2018-08-29 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22197 Both `catalystRequestedSchema` and `parquetSchema` are recursive structure, is there the easy way to find duplicated fields? Thanks

[GitHub] spark issue #22197: [SPARK-25207][SQL] Case-insensitve field resolution for ...

2018-08-29 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22197 @cloud-fan I also think my way changes too much in this PR. > go through the parquet schema and find duplicated field names If user query only query non-duplicated field, this

[GitHub] spark pull request #22184: [SPARK-25132][SQL][DOC] Add migration doc for cas...

2018-08-29 Thread yucai
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/22184#discussion_r213581563 --- Diff: docs/sql-programming-guide.md --- @@ -1895,6 +1895,10 @@ working with timestamps in `pandas_udf`s to get the best performance, see - Since

[GitHub] spark pull request #22197: [SPARK-25207][SQL] Case-insensitve field resoluti...

2018-08-28 Thread yucai
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/22197#discussion_r213551202 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -350,25 +356,38 @@ private[parquet] class

[GitHub] spark pull request #22184: [SPARK-25132][SQL][DOC] Add migration doc for cas...

2018-08-28 Thread yucai
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/22184#discussion_r213519348 --- Diff: docs/sql-programming-guide.md --- @@ -1895,6 +1895,10 @@ working with timestamps in `pandas_udf`s to get the best performance, see - Since

[GitHub] spark pull request #22184: [SPARK-25132][SQL][DOC] Add migration doc for cas...

2018-08-28 Thread yucai
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/22184#discussion_r213386126 --- Diff: docs/sql-programming-guide.md --- @@ -1895,6 +1895,10 @@ working with timestamps in `pandas_udf`s to get the best performance, see - Since

[GitHub] spark issue #22197: [SPARK-25207][SQL] Case-insensitve field resolution for ...

2018-08-27 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22197 @dongjoon-hyun In the **schema matched case** as you listed, it is expected behavior in current master. ``` spark.sparkContext.hadoopConfiguration.setInt("parquet.block.size", 8 * 1

[GitHub] spark issue #22197: [SPARK-25207][SQL] Case-insensitve field resolution for ...

2018-08-26 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22197 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #22197: [SPARK-25207][SQL] Case-insensitve field resolution for ...

2018-08-25 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22197 @gatorsmile I can help check `spark.sql.caseSensitive` for all the built-in data sources. --- - To unsubscribe, e-mail: reviews

[GitHub] spark pull request #22197: [SPARK-25207][SQL] Case-insensitve field resoluti...

2018-08-25 Thread yucai
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/22197#discussion_r212814524 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -350,25 +356,38 @@ private[parquet] class

[GitHub] spark pull request #22197: [SPARK-25207][SQL] Case-insensitve field resoluti...

2018-08-25 Thread yucai
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/22197#discussion_r212813302 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala --- @@ -1021,6 +1022,116 @@ class

[GitHub] spark pull request #22197: [SPARK-25207][SQL] Case-insensitve field resoluti...

2018-08-25 Thread yucai
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/22197#discussion_r212813240 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala --- @@ -1021,6 +1022,116 @@ class

[GitHub] spark pull request #22197: [SPARK-25207][SQL] Case-insensitve field resoluti...

2018-08-25 Thread yucai
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/22197#discussion_r212801680 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala --- @@ -1021,6 +1021,88 @@ class

[GitHub] spark issue #22197: [SPARK-25207][SQL] Case-insensitve field resolution for ...

2018-08-25 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22197 @cloud-fan @HyukjinKwon Seem cannot simply add `originalName` into `ParquetSchemaType`. Because we need exact ParquetSchemaType info for type match, like: ``` private val

[GitHub] spark pull request #22197: [SPARK-25207][SQL] Case-insensitve field resoluti...

2018-08-25 Thread yucai
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/22197#discussion_r212798970 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -350,25 +352,43 @@ private[parquet] class

[GitHub] spark pull request #22197: [SPARK-25207][SQL] Case-insensitve field resoluti...

2018-08-24 Thread yucai
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/22197#discussion_r212788978 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -377,7 +378,7 @@ class ParquetFileFormat

[GitHub] spark issue #22183: [SPARK-25132][SQL][BACKPORT-2.3] Case-insensitive field ...

2018-08-24 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22183 We need to backport it. Without this PR, we cannot solve the data issue in [SPARK-25206] Wrong data may be returned when enable pushdown

[GitHub] spark pull request #22197: [SPARK-25207][SQL] Case-insensitve field resoluti...

2018-08-23 Thread yucai
GitHub user yucai opened a pull request: https://github.com/apache/spark/pull/22197 [SPARK-25207][SQL] Case-insensitve field resolution for filter pushdown when reading Parquet ## What changes were proposed in this pull request? Currently, filter pushdown will not work

[GitHub] spark issue #22148: [SPARK-25132][SQL] Case-insensitive field resolution whe...

2018-08-20 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22148 I trigged 3 hours ago, but see many Jenkins submission is in the queue. And it says "Jenkins is about to shut down" ? ![image](https://user-images.githubusercontent.com/298957

[GitHub] spark issue #22148: [SPARK-25132][SQL] Case-insensitive field resolution whe...

2018-08-20 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22148 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #22148: [SPARK-25132][SQL] Case-insensitive field resolution whe...

2018-08-19 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22148 LGTM. @cloud-fan @gatorsmile Could you kindly help trigger Jenkins and review? --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...

2018-08-11 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/19788 **Summary** One disk IO solution's performance seems not as good as current PR19877's implementation. **Benchmark** ```scacla spark.range(1, 512000L, 1, 1280).selectExpr

[GitHub] spark issue #22077: [SPARK-25084][SQL][BACKPORT-2.3] "distribute by" on mult...

2018-08-11 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22077 @LantaoJin KafkaSourceStressForDontFailOnDataLossSuite fails occationally, thanks @wangyum for retesting. --- - To unsubscribe, e

[GitHub] spark issue #22077: [SPARK-25084][SQL] "distribute by" on multiple columns (...

2018-08-11 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22077 ok to test. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #22077: [SPARK-25084][SQL] "distribute by" on multiple columns (...

2018-08-11 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22077 @cloud-fan Kindly help review. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #22077: [SPARK-25084][SQL] "distribute by" on multiple columns (...

2018-08-11 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22077 LGTM. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #22077: [SPARK-25084][SQL] "distribute by" on multiple columns (...

2018-08-11 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22077 @LantaoJin, could you modify the title and description? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns (...

2018-08-11 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22066 @cloud-fan Synced with @LantaoJin he will help port to 2.3 soon and I will review it. --- - To unsubscribe, e-mail: reviews

[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns (...

2018-08-11 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22066 @cloud-fan @jerryshao sure, I will do it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark pull request #22066: [SPARK-25084][SQL] "distribute by" on multiple co...

2018-08-11 Thread yucai
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/22066#discussion_r209426886 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala --- @@ -404,21 +404,26 @@ abstract class HashExpression[E

[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns (...

2018-08-10 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22066 @LantaoJin I realized the initial way had some issue, so I marked it as WIP to refine and add test. It is different from your original implementation, so I would like to use this one

[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns (...

2018-08-10 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22066 @cloud-fan Jira and 1st is from this one. It is critical to our 2.3 migration. --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #22066: [SPARK-25084][SQL] "distribute by" on multiple co...

2018-08-10 Thread yucai
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/22066#discussion_r209267827 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala --- @@ -778,21 +783,22 @@ case class HiveHash(children: Seq

[GitHub] spark pull request #22066: [SPARK-25084][SQL] "distribute by" on multiple co...

2018-08-10 Thread yucai
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/22066#discussion_r209265323 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2831,4 +2831,17 @@ class SQLQuerySuite extends QueryTest

[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns m...

2018-08-10 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22066 @cloud-fan @gatorsmile PR has been ready, kindly help review. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #22066: [WIP][SPARK-25084][SQL] "distribute by" on multiple colu...

2018-08-10 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22066 @cloud-fan I am refining and adding tests. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark pull request #22066: [SPARK-25084][SQL] "distribute by" on multiple co...

2018-08-09 Thread yucai
GitHub user yucai opened a pull request: https://github.com/apache/spark/pull/22066 [SPARK-25084][SQL] "distribute by" on multiple columns may lead to codegen issue ## What changes were proposed in this pull request? "distribute by" on multiple columns

[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...

2018-07-30 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/19788 @cloud-fan @gatorsmile I am trying the new method as suggested and I have a question. If we make it **purely server-side** optimization, for external shuffle service, it has no idea how

[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...

2018-07-30 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/19788 @gatorsmile @cloud-fan @carsonwang I will update it recently. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #21156: [SPARK-24087][SQL] Avoid shuffle when join keys are a su...

2018-07-26 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/21156 closed by mistake, reopen it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #21156: [SPARK-24087][SQL] Avoid shuffle when join keys a...

2018-07-26 Thread yucai
GitHub user yucai reopened a pull request: https://github.com/apache/spark/pull/21156 [SPARK-24087][SQL] Avoid shuffle when join keys are a super-set of bucket keys ## What changes were proposed in this pull request? To improve the bucket join, when join keys are a super

[GitHub] spark pull request #21156: [SPARK-24087][SQL] Avoid shuffle when join keys a...

2018-07-26 Thread yucai
Github user yucai closed the pull request at: https://github.com/apache/spark/pull/21156 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21791: [SPARK-24832][SQL] Improve inputMetrics's bytesRe...

2018-07-17 Thread yucai
GitHub user yucai opened a pull request: https://github.com/apache/spark/pull/21791 [SPARK-24832][SQL] Improve inputMetrics's bytesRead update for ColumnarBatch ## What changes were proposed in this pull request? Currently, ColumnarBatch's bytesRead need to be updated every

[GitHub] spark issue #21156: [SPARK-24087][SQL] Avoid shuffle when join keys are a su...

2018-07-11 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/21156 @maryannxue how about this way? Any better idea? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

  1   2   3   >