[GitHub] spark issue #23171: [SPARK-26205][SQL] Optimize In for bytes, shorts, ints

2018-12-03 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/23171 @rxin `switch` in Java is still significantly faster than a hash set, even without boxing / unboxing problems, when the number of elements is small. We were thinking about having two implementations

[GitHub] spark issue #23171: [SPARK-26205][SQL] Optimize In for bytes, shorts, ints

2018-11-29 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/23171 @cloud-fan as @aokolnychyi said, `switch` will still be faster than an optimized `Set` without autoboxing when the number of elements is small. As a result, this PR is still very useful
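To make the comparison in this comment concrete, here is a minimal sketch (not the PR's code) of the two lookup styles being discussed. It assumes a fixed list of small integer literals; the object and method names are hypothetical. A `match` on integer literals is typically compiled to a JVM switch instruction, while a generic `Set[Int]` stores boxed elements.

```scala
object InLookupSketch {
  // Membership test written as a match on Int literals.
  // scalac usually emits a tableswitch/lookupswitch for this shape,
  // so no boxing and no hashing is involved.
  def inViaSwitch(v: Int): Boolean = v match {
    case 1 | 3 | 5 | 7 => true
    case _             => false
  }

  // The same test through a generic immutable Set[Int].
  // The elements are stored as boxed java.lang.Integer values.
  private val values: Set[Int] = Set(1, 3, 5, 7)

  def inViaSet(v: Int): Boolean = values.contains(v)
}
```

Both functions return the same answers; which one is faster for a given list size is exactly what the thread is debating, so it would need to be measured rather than assumed.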

[GitHub] spark issue #23100: [SPARK-26133][ML] Remove deprecated OneHotEncoder and re...

2018-11-28 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/23100 I went through the PR again, and it looks right to me. Merged into master. Thanks! --- - To unsubscribe, e-mail: reviews

[GitHub] spark issue #23100: [SPARK-26133][ML] Remove deprecated OneHotEncoder and re...

2018-11-28 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/23100 It's hard to track the huge diffs from the renaming. I didn't go through it line by line. But if they're just renames, the rest LGTM

[GitHub] spark issue #23171: [SPARK-26205][SQL] Optimize In for bytes, shorts, ints

2018-11-28 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/23171 The approach looks great and can significantly improve performance. For Long, I agree that we should also implement a binary search approach for `O(log n)` lookup. Wondering which one
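The binary search idea mentioned for Long can be sketched as follows. This is only an illustration under the assumption that the `IN` list is a fixed set of Long literals known up front; `LongInSketch` and its sample values are hypothetical and not taken from the PR.

```scala
import java.util.Arrays

object LongInSketch {
  // Sort the literal values once, ahead of time.
  private val sorted: Array[Long] = {
    val a = Array(42L, 7L, 1000000000000L, -3L)
    Arrays.sort(a)
    a
  }

  // O(log n) membership test on the primitive long[] array, with no boxing.
  // Arrays.binarySearch returns a non-negative index only when the key is found.
  def contains(v: Long): Boolean = Arrays.binarySearch(sorted, v) >= 0
}
```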

[GitHub] spark pull request #23171: [SPARK-26205][SQL] Optimize In for bytes, shorts,...

2018-11-28 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/23171#discussion_r237227892 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala --- @@ -335,6 +343,41 @@ case class In(value: Expression

[GitHub] spark pull request #23171: [SPARK-26205][SQL] Optimize In for bytes, shorts,...

2018-11-28 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/23171#discussion_r237226275 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala --- @@ -335,6 +343,41 @@ case class In(value: Expression

[GitHub] spark issue #23148: [SPARK-26177] Automated formatting for Scala code

2018-11-27 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/23148 Thanks for testing it out. I personally like auto-formatting, as my company's projects use scalafmt and we find it very useful for keeping a consistent coding style

[GitHub] spark issue #23148: [SPARK-26177] Automated formatting for Scala code

2018-11-27 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/23148 My concern is this: let's say we have code like the following, which I copied from `ParquetSchemaPruningSuite.scala`; scalafmt will complain that the second line is longer than 98 characters and reformat

[GitHub] spark pull request #23148: [SPARK-26177] Automated formatting for Scala code

2018-11-27 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/23148#discussion_r236830497 --- Diff: dev/.scalafmt.conf --- @@ -0,0 +1,24 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] spark pull request #23148: [SPARK-26177] Automated formatting for Scala code

2018-11-27 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/23148#discussion_r236813956 --- Diff: dev/.scalafmt.conf --- @@ -0,0 +1,24 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] spark issue #23139: [SPARK-25860][SPARK-26107] [FOLLOW-UP] Rule ReplaceNullW...

2018-11-26 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/23139 Thanks. Merged into master.

[GitHub] spark issue #23139: [SPARK-25860][SPARK-26107] [FOLLOW-UP] Rule ReplaceNullW...

2018-11-26 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/23139 LGTM.

[GitHub] spark pull request #23139: [SPARK-25860][SPARK-26107] [FOLLOW-UP] Rule Repla...

2018-11-26 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/23139#discussion_r236463594 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseInPredicate.scala --- @@ -79,29 +80,31 @@ object

[GitHub] spark pull request #23100: [SPARK-26133][ML] Remove deprecated OneHotEncoder...

2018-11-26 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/23100#discussion_r236411677 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoder.scala --- @@ -17,126 +17,512 @@ package org.apache.spark.ml.feature

[GitHub] spark pull request #23100: [SPARK-26133][ML] Remove deprecated OneHotEncoder...

2018-11-26 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/23100#discussion_r236410750 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoder.scala --- @@ -17,126 +17,512 @@ package org.apache.spark.ml.feature

[GitHub] spark pull request #23100: [SPARK-26133][ML] Remove deprecated OneHotEncoder...

2018-11-26 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/23100#discussion_r236410306 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoder.scala --- @@ -17,126 +17,512 @@ package org.apache.spark.ml.feature

[GitHub] spark pull request #23139: [SPARK-25860][SPARK-26107] [FOLLOW-UP] Rule Repla...

2018-11-26 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/23139#discussion_r236394865 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseInPredicate.scala --- @@ -79,29 +80,31 @@ object

[GitHub] spark issue #23139: [SPARK-25860][SPARK-26107] [FOLLOW-UP] Rule ReplaceNullW...

2018-11-26 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/23139 Although we are trying to make sure on the caller side that `replaceNullWithFalse` is only called when the expression is of boolean type, I agree that, for safety, we should check it and throw an exception
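A defensive check of the kind described here could look roughly like the sketch below. It uses a tiny stand-in expression model rather than Catalyst's real classes, so every type in it (`Expr`, `NullLiteral`, `Attr`, and so on) is a hypothetical simplification; only the name `replaceNullWithFalse` comes from the thread.

```scala
object ReplaceNullSketch {
  sealed trait DataType
  case object BooleanType extends DataType
  case object IntegerType extends DataType

  // Minimal stand-in for an expression tree.
  sealed trait Expr { def dataType: DataType }
  final case class NullLiteral(dataType: DataType) extends Expr
  case object FalseLiteral extends Expr { val dataType: DataType = BooleanType }
  final case class Attr(name: String, dataType: DataType) extends Expr
  final case class And(left: Expr, right: Expr) extends Expr {
    val dataType: DataType = BooleanType
  }

  def replaceNullWithFalse(e: Expr): Expr = {
    // Safety check: fail fast instead of trusting every caller.
    require(e.dataType == BooleanType,
      s"Expected a boolean expression but got ${e.dataType}")
    e match {
      case NullLiteral(BooleanType) => FalseLiteral
      case And(l, r)                => And(replaceNullWithFalse(l), replaceNullWithFalse(r))
      case other                    => other
    }
  }
}
```

Here `require` throws an `IllegalArgumentException` for a non-boolean input; which exception type the real rule should throw is a separate design decision.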

[GitHub] spark issue #23118: [SPARK-26144][BUILD] `build/mvn` should detect `scala.ve...

2018-11-26 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/23118 Late to the party! Thanks @dongjoon-hyun for taking care of this.

[GitHub] spark issue #22967: [SPARK-25956] Make Scala 2.12 as default Scala version i...

2018-11-14 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22967 Retest this please.

[GitHub] spark issue #22967: [SPARK-25956] Make Scala 2.12 as default Scala version i...

2018-11-13 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22967 @dongjoon-hyun thanks for triggering the build. The Python test script was only looking for Scala 2.11 jars, resulting in Python test failures. I just fixed it in the latest push. Let's see how it goes

[GitHub] spark pull request #22967: [SPARK-25956] Make Scala 2.12 as default Scala ve...

2018-11-13 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/22967#discussion_r22593 --- Diff: pom.xml --- @@ -2718,7 +2710,6 @@ *:*_2.11

[GitHub] spark issue #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in MimaBu...

2018-11-12 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22977 LGTM. Thanks!

[GitHub] spark issue #22764: [SPARK-25765][ML] Add training cost to BisectingKMeans s...

2018-11-11 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22764 @mgaido91 I'm on Thanksgiving vacation and will be back in the community to help with code review on Nov 21st. Sorry for the delay

[GitHub] spark issue #22967: [SPARK-25956] Make Scala 2.12 as default Scala version i...

2018-11-11 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22967 Waiting for https://github.com/apache/spark/pull/22977 to be merged; then I'll rebase on it and fix the remaining binary incompatibilities

[GitHub] spark pull request #22967: [SPARK-25956] Make Scala 2.12 as default Scala ve...

2018-11-11 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/22967#discussion_r232510439 --- Diff: pom.xml --- @@ -2717,7 +2717,6

[GitHub] spark pull request #22967: [SPARK-25956] Make Scala 2.12 as default Scala ve...

2018-11-11 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/22967#discussion_r232505402 --- Diff: docs/sparkr.md --- @@ -133,7 +133,7 @@ specifying `--packages` with `spark-submit` or `sparkR` commands, or if initiali

[GitHub] spark pull request #22967: [SPARK-25956] Make Scala 2.12 as default Scala ve...

2018-11-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/22967#discussion_r232024914 --- Diff: docs/sparkr.md --- @@ -133,7 +133,7 @@ specifying `--packages` with `spark-submit` or `sparkR` commands, or if initiali

[GitHub] spark pull request #22967: [SPARK-25956] Make Scala 2.12 as default Scala ve...

2018-11-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/22967#discussion_r232024557 --- Diff: pom.xml --- @@ -1998,7 +1998,7 @@ --> org.jboss.ne

[GitHub] spark pull request #22967: [SPARK-25956] Make Scala 2.12 as default Scala ve...

2018-11-07 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/22967#discussion_r231781938 --- Diff: docs/sparkr.md --- @@ -133,7 +133,7 @@ specifying `--packages` with `spark-submit` or `sparkR` commands, or if initiali

[GitHub] spark pull request #22967: [SPARK-25956] Make Scala 2.12 as default Scala ve...

2018-11-07 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/22967#discussion_r231781635 --- Diff: docs/sparkr.md --- @@ -133,7 +133,7 @@ specifying `--packages` with `spark-submit` or `sparkR` commands, or if initiali

[GitHub] spark issue #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark

2018-11-07 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22966 jmh is a framework for writing benchmarks that can generate standardized reports to be consumed by Jenkins. Here is an example, https://github.com/pvillega/jmh-scala-test/blob/master/src/main

[GitHub] spark issue #22967: [SPARK-25956] Make Scala 2.12 as default Scala version i...

2018-11-07 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22967 retest this please

[GitHub] spark issue #22970: [SPARK-25676][FOLLOWUP][BUILD] Fix Scala 2.12 build erro...

2018-11-07 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22970 Merged into master as the compilation finished. Thanks!

[GitHub] spark issue #22970: [SPARK-25676][FOLLOWUP][BUILD] Fix Scala 2.12 build erro...

2018-11-07 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22970 LGTM. Thanks.

[GitHub] spark issue #22966: [SPARK-25965][SQL][TEST] Add avro read benchmark

2018-11-07 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22966 cc @jleach4 and @aokolnychyi We have had great success using [jmh](http://openjdk.java.net/projects/code-tools/jmh/) for this type of benchmarking; the benchmarks can be written in the unit

[GitHub] spark issue #22967: [SPARK-25956] Make Scala 2.12 as default Scala version i...

2018-11-07 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22967 @dongjoon-hyun Yeah, seems https://github.com/apache/spark/commit/63ca4bbe792718029f6d6196e8a6bb11d1f20fca breaks the Scala 2.12 build. I'll re-trigger the build once Scala 2.12 build

[GitHub] spark pull request #22967: [SPARK-25956] Make Scala 2.12 as default Scala ve...

2018-11-07 Thread dbtsai
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/22967 [SPARK-25956] Make Scala 2.12 as default Scala version in Spark 3.0 ## What changes were proposed in this pull request? This PR makes Spark's default Scala version 2.12, and Scala 2.11

[GitHub] spark pull request #22953: [SPARK-25946] [BUILD] Upgrade ASM to 7.x to suppo...

2018-11-05 Thread dbtsai
Github user dbtsai closed the pull request at: https://github.com/apache/spark/pull/22953

[GitHub] spark issue #22953: [SPARK-25946] [BUILD] Upgrade ASM to 7.x to support JDK1...

2018-11-05 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22953 Thanks. Merged into master.

[GitHub] spark pull request #22947: [SPARK-24913][SQL] Make AssertNotNull and AssertT...

2018-11-05 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/22947#discussion_r230975661 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala --- @@ -66,6 +66,8 @@ case class AssertTrue(child: Expression

[GitHub] spark issue #22953: [SPARK-25946] [BUILD] Upgrade ASM to 7.x to support JDK1...

2018-11-05 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22953 ASM6 supports Java 9 while ASM7 supports Java 9, Java 10, and Java 11.

[GitHub] spark issue #22953: [SPARK-25946] [BUILD] Upgrade ASM to 7.x to support JDK1...

2018-11-05 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22953 cc @gatorsmile @srowen @HyukjinKwon

[GitHub] spark pull request #22953: [SPARK-25946] [BUILD] Upgrade ASM to 7.x to suppo...

2018-11-05 Thread dbtsai
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/22953 [SPARK-25946] [BUILD] Upgrade ASM to 7.x to support JDK11 ## What changes were proposed in this pull request? Upgrade ASM to 7.x to support JDK11 ## How was this patch tested

[GitHub] spark issue #22786: [SPARK-25764][ML][EXAMPLES] Update BisectingKMeans examp...

2018-11-05 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22786 LGTM. Merged into master. Thanks!

[GitHub] spark issue #22869: [SPARK-25758][ML] Deprecate computeCost in BisectingKMea...

2018-11-05 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22869 LGTM too. Merged into master. Thanks.

[GitHub] spark issue #22919: [SPARK-25906][SHELL] Restores '-i' option's behaviour in...

2018-11-01 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22919 I'm also on @cloud-fan's side---we should keep it consistent with the upstream Scala Shell. However, we should document it on `./bin/spark-shell --help`, so when a user complains or files a ticket

[GitHub] spark issue #22857: [SPARK-25860][SQL] Replace Literal(null, _) with FalseLi...

2018-10-31 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22857 Thanks all for reviewing! The latest change looks good to me too. Merged into master.

[GitHub] spark issue #22880: [SPARK-25407][SQL] Ensure we pass a compatible pruned sc...

2018-10-30 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22880 I can confirm that this fixes https://issues.apache.org/jira/browse/SPARK-25879 cc @cloud-fan @gatorsmile Thanks

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-10-30 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/21320 cc @viirya If we select a nested field and a top level field, the schema pruning will fail. Here is the reproducible test, ```scala testSchemaPruning("select a single co

[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

2018-10-28 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/22857#discussion_r228741341 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -736,3 +736,65 @@ object CombineConcats extends Rule

[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

2018-10-28 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/22857#discussion_r228739082 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -736,3 +736,65 @@ object CombineConcats extends Rule

[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

2018-10-28 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/22857#discussion_r228739018 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseSuite.scala --- @@ -0,0 +1,324

[GitHub] spark issue #22857: [SPARK-25860][SQL] Replace Literal(null, _) with FalseLi...

2018-10-28 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22857 LGTM. @cloud-fan and @gatorsmile, this is the PR I mentioned to you earlier this year in the SF Spark summit which can simplify some of our queries. Also add @dongjoon-hyun

[GitHub] spark pull request #22857: [SPARK-25860][SQL] Replace Literal(null, _) with ...

2018-10-28 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/22857#discussion_r228738623 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -736,3 +736,65 @@ object CombineConcats extends Rule

[GitHub] spark issue #22839: [SPARK-25656][SQL][DOC][EXAMPLE][BRANCH-2.4] Add a doc a...

2018-10-25 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22839 Thanks @dongjoon-hyun This LGTM!

[GitHub] spark issue #22801: [SPARK-25656][SQL][DOC][EXAMPLE] Add a doc and examples ...

2018-10-23 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22801 This LGTM.

[GitHub] spark issue #22788: [SPARK-25769][SQL]escape nested columns by backtick each...

2018-10-23 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22788 @cloud-fan I like the idea of using JSON, but that will also change the definition of the string format. Do we just use JSON for the nested case so the existing data source doesn't have to be changed

[GitHub] spark issue #22788: [SPARK-25769][SQL]escape nested columns by backtick each...

2018-10-23 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22788 @cloud-fan @dongjoon-hyun instead of changing the `Filter` API, do you think using a proper escape character, as in the PR at https://github.com/apache/spark/pull/22573, is a good approach

[GitHub] spark pull request #22597: [SPARK-25579][SQL] Use quoted attribute names if ...

2018-10-15 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/22597#discussion_r225309479 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilterSuite.scala --- @@ -383,4 +385,17 @@ class OrcFilterSuite extends

[GitHub] spark issue #22597: [SPARK-25579][SQL] Use quoted attribute names if needed ...

2018-10-12 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22597 In `ParquetFilter`, the way we test whether a predicate pushdown works is by removing that predicate from the Spark SQL physical plan and relying only on the reader to do the filtering. Thus, if there is a bug

[GitHub] spark issue #22597: [SPARK-25579][SQL] Use quoted attribute names if needed ...

2018-10-12 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22597 Is it possible to add tests like the Parquet ones, which remove the filter in Spark SQL to ensure that the predicate is pushed down to the reader? Thanks

[GitHub] spark issue #22664: [SPARK-25662][SQL][TEST] Refactor DataSourceReadBenchmar...

2018-10-12 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22664 @peter-toth I assigned it to you. Thanks for the contribution.

[GitHub] spark issue #22664: [SPARK-25662][SQL][TEST] Refactor DataSourceReadBenchmar...

2018-10-11 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22664 Thanks @dongjoon-hyun for pinging me. LGTM too. We're working on some Parquet reader improvements, and this will be useful

[GitHub] spark issue #22684: [SPARK-25699][SQL] Partially push down conjunctive predi...

2018-10-10 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22684 Merged into master. Thanks.

[GitHub] spark issue #22684: [SPARK-25699][SQL] Partially push down conjunctive predi...

2018-10-10 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22684 LGTM. Just some styling feedback. Thanks.

[GitHub] spark pull request #22684: [SPARK-25699][SQL] Partially push down conjunctiv...

2018-10-10 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/22684#discussion_r224179579 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFilters.scala --- @@ -90,32 +107,51 @@ private[orc] object OrcFilters extends Logging

[GitHub] spark pull request #22684: [SPARK-25699][SQL] Partially push down conjunctiv...

2018-10-10 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/22684#discussion_r224179447 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFilters.scala --- @@ -90,32 +107,51 @@ private[orc] object OrcFilters extends Logging

[GitHub] spark pull request #22684: [SPARK-25699][SQL] Partially push down conjunctiv...

2018-10-10 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/22684#discussion_r224178237 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala --- @@ -138,39 +138,75 @@ private[sql] object OrcFilters

[GitHub] spark pull request #22684: [SPARK-25699][SQL] Partially push down conjunctiv...

2018-10-10 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/22684#discussion_r224174206 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala --- @@ -138,39 +138,75 @@ private[sql] object OrcFilters

[GitHub] spark issue #22679: [SPARK-25559] [FOLLOW-UP] Add comments for partial pushd...

2018-10-09 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22679 Thanks. Merged into master.

[GitHub] spark issue #22679: [SPARK-25559] [FOLLOW-UP] Add comments for partial pushd...

2018-10-09 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22679 LGTM. Wait for the PR build. Thanks.

[GitHub] spark issue #22574: [SPARK-25559][SQL] Remove the unsupported predicates in ...

2018-09-28 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22574 I changed the title, and hopefully it's much clearer now.

[GitHub] spark issue #22573: [SPARK-25558][SQL] Pushdown predicates for nested fields...

2018-09-28 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22573 I was thinking of changing the APIs in `Filter` so we can represent nested fields more easily, but also realized that it's a stable public interface. Without changing the interface of `Filter`, we

[GitHub] spark pull request #22574: [SPARK-25559][SQL] Just remove the unsupported pr...

2018-09-28 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/22574#discussion_r221374514 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -488,26 +494,27 @@ private[parquet] class

[GitHub] spark issue #22574: [SPARK-25559][SQL] Just remove the unsupported predicate...

2018-09-28 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22574 test this again.

[GitHub] spark pull request #22574: [SPARK-25559][SQL] Just remove the unsupported pr...

2018-09-28 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/22574#discussion_r221153414 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -488,26 +494,25 @@ private[parquet] class

[GitHub] spark pull request #22574: [SPARK-25559][SQL] Just remove the unsupported pr...

2018-09-28 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/22574#discussion_r221152340 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -488,26 +494,25 @@ private[parquet] class

[GitHub] spark pull request #22573: [SPARK-25558][SQL] Pushdown predicates for nested...

2018-09-27 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/22573#discussion_r221128544 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -437,53 +436,65 @@ object DataSourceStrategy

[GitHub] spark issue #22574: [SPARK-25556][SQL] Just remove the unsupported predicate...

2018-09-27 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22574 cc @gatorsmile @cloud-fan @HyukjinKwon @dongjoon-hyun @viirya

[GitHub] spark pull request #22574: [SPARK-25556][SQL] Just remove the unsupported pr...

2018-09-27 Thread dbtsai
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/22574 [SPARK-25556][SQL] Just remove the unsupported predicates in Parquet ## What changes were proposed in this pull request? Currently, in `ParquetFilters`, if one of the children predicates
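The description above is cut off, so the exact rule in `ParquetFilters` is not visible here. As one plausible reading of the PR title, the sketch below shows how a conjunction can drop a child that cannot be converted and still push down the other child. The predicate model and the string output are hypothetical stand-ins, not Parquet's or Spark's API.

```scala
object PartialPushdownSketch {
  // Hypothetical, simplified predicate model.
  sealed trait Pred
  final case class Eq(col: String, value: Int) extends Pred
  final case class Unsupported(desc: String) extends Pred
  final case class And(l: Pred, r: Pred) extends Pred
  final case class Or(l: Pred, r: Pred) extends Pred

  // Returns a pushed-down filter (rendered as a string here), or None
  // when nothing can be pushed down safely.
  def convert(p: Pred): Option[String] = p match {
    case Eq(c, v)       => Some(s"$c = $v")
    case Unsupported(_) => None
    case And(l, r) =>
      (convert(l), convert(r)) match {
        case (Some(a), Some(b)) => Some(s"($a AND $b)")
        // Dropping one conjunct only lets extra rows through, so it is
        // safe as long as the full filter is still evaluated afterwards.
        case (Some(a), None)    => Some(a)
        case (None, Some(b))    => Some(b)
        case (None, None)       => None
      }
    // A disjunction needs both sides; dropping one would lose rows.
    case Or(l, r) =>
      for { a <- convert(l); b <- convert(r) } yield s"($a OR $b)"
  }
}
```

The sketch deliberately has no negation: under a `Not`, dropping a conjunct would no longer be safe, so a real implementation has to treat that case separately.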

[GitHub] spark issue #22535: [SPARK-17636][SQL][WIP] Parquet predicate pushdown in ne...

2018-09-27 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22535 I'm breaking this PR into three smaller PRs. I'll fix the tests in those smaller PRs. Thanks.

[GitHub] spark issue #22573: [SPARK-25558][SQL] Pushdown predicates for nested fields...

2018-09-27 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22573 @gatorsmile @cloud-fan @dongjoon-hyun @viirya

[GitHub] spark pull request #22573: [SPARK-25558][SQL] Pushdown predicates for nested...

2018-09-27 Thread dbtsai
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/22573 [SPARK-25558][SQL] Pushdown predicates for nested fields in DataSource Strategy ## What changes were proposed in this pull request? This PR allows Spark to create predicates for nested

[GitHub] spark pull request #22535: [SPARK-17636][SQL][WIP] Parquet predicate pushdow...

2018-09-24 Thread dbtsai
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/22535 [SPARK-17636][SQL][WIP] Parquet predicate pushdown in nested fields ## What changes were proposed in this pull request? Support Parquet predicate pushdown in nested fields ## How

[GitHub] spark pull request #22418: [SPARK-25427][SQL][TEST] Add BloomFilter creation...

2018-09-17 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/22418#discussion_r218272427 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala --- @@ -50,6 +55,66 @@ abstract class OrcSuite

[GitHub] spark pull request #22418: [SPARK-25427][SQL][TEST] Add BloomFilter creation...

2018-09-17 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/22418#discussion_r218158845 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala --- @@ -50,6 +55,66 @@ abstract class OrcSuite

[GitHub] spark issue #22431: [SPARK-24418][FOLLOWUP][DOC] Update docs to show Scala 2...

2018-09-15 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22431 Thanks! Merged into both branch 2.4 and master.

[GitHub] spark issue #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite.scala, ...

2018-09-13 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22394 LGTM. Merged into master and branch 2.4. Thanks.

[GitHub] spark issue #22409: [SPARK-25352][SQL][Followup] Add helper method and addre...

2018-09-13 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22409 LGTM. Wait for the test. Thanks.

[GitHub] spark pull request #22344: [SPARK-25352][SQL] Perform ordered global limit w...

2018-09-12 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/22344#discussion_r217128163 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala --- @@ -68,22 +68,42 @@ abstract class SparkStrategies extends

[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-12 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22357 Thanks all again. Merged into 2.4 branch and master.

[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-11 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22357 LGTM. Thank you all for participating in the discussion. @cloud-fan and @gatorsmile, do you have any further comments? If not, I would like to merge it tomorrow into both master and rc branch

[GitHub] spark pull request #22357: [SPARK-25363][SQL] Fix schema pruning in where cl...

2018-09-11 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/22357#discussion_r216776055 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala --- @@ -110,7 +110,17 @@ private[sql

[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-11 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22357 FYI, @mallman I'm working on having `ParquetFilter` support pushing `IsNotNull(employer.id)` down into the Parquet reader

[GitHub] spark pull request #22357: [SPARK-25363][SQL] Fix schema pruning in where cl...

2018-09-11 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/22357#discussion_r216559045 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala --- @@ -110,7 +110,17 @@ private[sql

[GitHub] spark pull request #22357: [SPARK-25363][SQL] Fix schema pruning in where cl...

2018-09-09 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/22357#discussion_r216204022 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala --- @@ -155,6 +161,47 @@ class

[GitHub] spark pull request #22357: [SPARK-25363][SQL] Fix schema pruning in where cl...

2018-09-09 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/22357#discussion_r216202879 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala --- @@ -110,7 +110,12 @@ private[sql

[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-09 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22357 cc @beettlle
