[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...

2018-10-11 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/22614#discussion_r224639756 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -746,34 +746,20 @@ private[client] class Shim_v0_13 extends

[GitHub] spark pull request #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilt...

2018-10-05 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/22614#discussion_r223172392 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -746,34 +746,20 @@ private[client] class Shim_v0_13 extends

[GitHub] spark pull request #18954: [SPARK-17654] [SQL] Enable populating hive bucket...

2018-07-26 Thread tejasapatil
Github user tejasapatil closed the pull request at: https://github.com/apache/spark/pull/18954 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #19001: [SPARK-19256][SQL] Hive bucketing support

2018-07-26 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/19001 I will close this for now

[GitHub] spark pull request #19001: [SPARK-19256][SQL] Hive bucketing support

2018-07-26 Thread tejasapatil
Github user tejasapatil closed the pull request at: https://github.com/apache/spark/pull/19001

[GitHub] spark pull request #20226: [SPARK-23034][SQL] Override `nodeName` for all *S...

2018-02-06 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/20226#discussion_r166426610 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -86,6 +86,9 @@ case class RowDataSourceScanExec

[GitHub] spark pull request #20226: [SPARK-23034][SQL] Override `nodeName` for all *S...

2018-02-06 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/20226#discussion_r166426184 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala --- @@ -169,10 +171,12 @@ case class LogicalRDD( case class

[GitHub] spark pull request #20226: [SPARK-23034][SQL] Override `nodeName` for all *S...

2018-02-06 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/20226#discussion_r166425954 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala --- @@ -103,6 +103,8 @@ case class ExternalRDDScanExec[T

[GitHub] spark pull request #20226: [SPARK-23034][SQL] Override `nodeName` for all *S...

2018-01-19 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/20226#discussion_r162769562 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/LocalTableScanExec.scala --- @@ -30,6 +30,8 @@ case class LocalTableScanExec

[GitHub] spark pull request #19054: [SPARK-18067] Avoid shuffling child if join keys ...

2018-01-19 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/19054#discussion_r162768714 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala --- @@ -271,23 +325,24 @@ case class

[GitHub] spark pull request #19054: [SPARK-18067] Avoid shuffling child if join keys ...

2018-01-19 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/19054#discussion_r162768516 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala --- @@ -220,45 +220,76 @@ case class

[GitHub] spark pull request #19054: [SPARK-18067] Avoid shuffling child if join keys ...

2018-01-19 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/19054#discussion_r162768446 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala --- @@ -220,45 +220,76 @@ case class

[GitHub] spark issue #19054: [SPARK-18067] Avoid shuffling child if join keys are sup...

2018-01-18 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/19054 cc @hvanhovell @cloud-fan for review

[GitHub] spark pull request #20226: [SPARK-23034][SQL] Override `nodeName` for all *S...

2018-01-17 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/20226#discussion_r162224065 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala --- @@ -45,7 +46,12 @@ trait CodegenSupport extends

[GitHub] spark issue #20226: [SPARK-23034][SQL] Override `nodeName` for all *ScanExec...

2018-01-17 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/20226 Jenkins retest this please.

[GitHub] spark issue #20226: [SPARK-23034][SQL] Override `nodeName` for all *ScanExec...

2018-01-16 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/20226 The test failure does look legit to me. I have not been able to repro it on my laptop. IntelliJ doesn't treat it as a test case. The command line does recognize it as a test case but hits runtime

[GitHub] spark issue #19001: [SPARK-19256][SQL] Hive bucketing support

2018-01-13 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/19001 cc @cloud-fan @gatorsmile @sameeragarwal for review

[GitHub] spark issue #19001: [SPARK-19256][SQL] Hive bucketing support

2018-01-13 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/19001 Jenkins retest this please

[GitHub] spark issue #19001: [SPARK-19256][SQL] Hive bucketing support

2018-01-12 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/19001 Jenkins retest this please

[GitHub] spark pull request #20226: [SPARK-23034][SQL][UI] Display tablename for `Hiv...

2018-01-11 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/20226#discussion_r161147457 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala --- @@ -62,6 +62,8 @@ case class HiveTableScanExec

[GitHub] spark issue #19001: [SPARK-19256][SQL] Hive bucketing support

2018-01-11 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/19001 Jenkins retest this please

[GitHub] spark issue #19001: [SPARK-19256][SQL] Hive bucketing support

2018-01-11 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/19001 Now that https://github.com/apache/spark/pull/19080 has been merged to trunk, I am rebasing this PR. A small part of this PR is put in https://github.com/apache/spark/pull/20206 and ready

[GitHub] spark pull request #20226: [SPARK-23034][SQL][UI] Display tablename for `Hiv...

2018-01-11 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/20226#discussion_r161036084 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala --- @@ -62,6 +62,8 @@ case class HiveTableScanExec

[GitHub] spark issue #20206: [SPARK-19256][SQL] Remove ordering enforcement from `Fil...

2018-01-11 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/20206 cc @cloud-fan @gengliangwang for review

[GitHub] spark issue #20226: [SPARK-23034][SQL][UI] Display tablename for `HiveTableS...

2018-01-10 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/20226 @dongjoon-hyun : I have updated the PR description

[GitHub] spark issue #20226: [SPARK-23034][SQL][UI] Display tablename for `HiveTableS...

2018-01-10 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/20226 Jenkins retest this please.

[GitHub] spark issue #20206: [SPARK-19256][SQL] Remove ordering enforcement from `Fil...

2018-01-10 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/20206 Jenkins retest this please

[GitHub] spark issue #20226: [SPARK-23034][SQL][UI] Display tablename for `HiveTableS...

2018-01-10 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/20226 Jenkins test this please. Previous test case failure is not related to the PR ``` org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite.(It is not a test

[GitHub] spark issue #20226: [SPARK-23034][SQL][UI] Display tablename for `HiveTableS...

2018-01-10 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/20226 @dongjoon-hyun : I tried it out over master, and since the table scan goes via codegen, it won't show the table name. Will update the PR description with this finding. Let's move this discussion

[GitHub] spark pull request #20226: [SPARK-23034][SQL][UI] Display tablename for `Hiv...

2018-01-10 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/20226#discussion_r160843749 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala --- @@ -62,6 +62,8 @@ case class HiveTableScanExec

[GitHub] spark pull request #20226: [SPARK-23034][Hive][UI] Display tablename for `Hi...

2018-01-10 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/20226#discussion_r160843166 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala --- @@ -62,6 +62,8 @@ case class HiveTableScanExec

[GitHub] spark issue #20226: [SPARK-23034][Hive][UI] Display tablename for `HiveTable...

2018-01-10 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/20226 @dongjoon-hyun : For Spark native tables, the table scan node is abstracted out as a `WholeStageCodegen` node in the DAG. A codegen node might be doing more things besides table scan so

[GitHub] spark pull request #20226: [SPARK-23034][Hive][UI] Display tablename for `Hi...

2018-01-10 Thread tejasapatil
GitHub user tejasapatil opened a pull request: https://github.com/apache/spark/pull/20226 [SPARK-23034][Hive][UI] Display tablename for `HiveTableScan` node in UI ## What changes were proposed in this pull request? For queries which scan multiple tables

[GitHub] spark pull request #20206: [SPARK-19256][SQL] Remove ordering enforcement fr...

2018-01-09 Thread tejasapatil
GitHub user tejasapatil opened a pull request: https://github.com/apache/spark/pull/20206 [SPARK-19256][SQL] Remove ordering enforcement from `FileFormatWriter` and let planner do that ## What changes were proposed in this pull request? This is as per discussion in https

[GitHub] spark issue #20041: [SPARK-22042] [FOLLOW-UP] [SQL] ReorderJoinPredicates ca...

2017-12-20 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/20041 Jenkins retest this please

[GitHub] spark issue #20041: [SPARK-22042] [FOLLOW-UP] [SQL] ReorderJoinPredicates ca...

2017-12-20 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/20041 Checked the test case failure, but I don't think it's related to this PR. ``` org.apache.spark.sql.execution.datasources.parquet.ParquetQuerySuite.(It is not a test

[GitHub] spark issue #19257: [SPARK-22042] [SQL] ReorderJoinPredicates can break when...

2017-12-20 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/19257 Created https://github.com/apache/spark/pull/20041 for addressing the follow-up comments by @gatorsmile

[GitHub] spark pull request #20041: [SPARK-22042] [FOLLOW-UP] [SQL] ReorderJoinPredic...

2017-12-20 Thread tejasapatil
GitHub user tejasapatil opened a pull request: https://github.com/apache/spark/pull/20041 [SPARK-22042] [FOLLOW-UP] [SQL] ReorderJoinPredicates can break when child's partitioning is not decided ## What changes were proposed in this pull request? This is a followup PR

[GitHub] spark pull request #19725: [DO NOT REVIEW][SPARK-22042] [SQL] Insert shuffle...

2017-12-20 Thread tejasapatil
Github user tejasapatil closed the pull request at: https://github.com/apache/spark/pull/19725

[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...

2017-12-19 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/19977 As per the Hive implementation of CONCAT(), [these are the rules used](https://github.com/apache/hive/blob/ff67cdda1c538dc65087878eeba3e165cf3230f4/ql/src/java/org/apache/hadoop/hive/ql/udf/generic

[GitHub] spark pull request #19257: [SPARK-22042] [SQL] ReorderJoinPredicates can bre...

2017-11-27 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/19257#discussion_r153343928 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala --- @@ -602,6 +602,28 @@ abstract class BucketedReadSuite

[GitHub] spark pull request #19257: [SPARK-22042] [SQL] ReorderJoinPredicates can bre...

2017-11-27 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/19257#discussion_r153343898 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala --- @@ -265,6 +268,7 @@ case class

[GitHub] spark pull request #19725: [DO NOT REVIEW][SPARK-22042] [SQL] Insert shuffle...

2017-11-11 Thread tejasapatil
GitHub user tejasapatil opened a pull request: https://github.com/apache/spark/pull/19725 [DO NOT REVIEW][SPARK-22042] [SQL] Insert shuffle nodes in entire tree before applying `ReorderJoinPredicates` trying out suggestion in https://github.com/apache/spark/pull/19257#issuecomment

[GitHub] spark issue #19257: [SPARK-22042] [SQL] ReorderJoinPredicates can break when...

2017-11-10 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/19257 Or we could move forward with the current approach and defer the refactoring around how shuffles are added in planning phase

[GitHub] spark issue #19257: [SPARK-22042] [SQL] ReorderJoinPredicates can break when...

2017-11-10 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/19257 @dongjoon-hyun : It will take me time to get back to this. Having said that, it's not ideal to have master in a bad state. How about disabling the rule by default (using a config

[GitHub] spark pull request #19672: [SPARK-22456] Add support for dayofweek function

2017-11-06 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/19672#discussion_r149256287 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2550,6 +2550,13 @@ object functions { * @group datetime_funcs

[GitHub] spark pull request #17644: [SPARK-17729] [SQL] Enable creating hive bucketed...

2017-11-02 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/17644#discussion_r148566192 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala --- @@ -247,7 +247,7 @@ abstract class

[GitHub] spark pull request #17644: [SPARK-17729] [SQL] Enable creating hive bucketed...

2017-11-02 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/17644#discussion_r148558587 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala --- @@ -247,7 +247,7 @@ abstract class

[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2017-10-12 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/19222 At a high level, this idea is good and worth moving forward with. I still have to dig into your analysis in response to the concern raised by @hvanhovell. In terms of the PR itself

[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2017-10-12 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/19222 I pulled up the frequency of `UTF8String` methods being invoked on FB prod clusters and picked the top 25. ``` .writeToMemory() .getBytes() .toString

[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2017-10-12 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/19222 Apart from `UTF8String.trim`, can you try some other method? If we have to evaluate perf, it's better to pick a method which would be most frequently used... if I have to guess, `trim

[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2017-10-12 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r144457118 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java --- @@ -17,47 +17,168 @@ package

[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2017-10-12 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r144457126 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java --- @@ -17,47 +17,168 @@ package

[GitHub] spark issue #19483: [SPARK-21165][SQL] FileFormatWriter should handle mismat...

2017-10-12 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/19483 >> I'll refactor it later, to use requiredChildOrdering to do the sort. The hive bucketing PR does that : https://github.com/apache/spark/pull/19001 I can isolate that piece a

[GitHub] spark pull request #19449: [SPARK-22219][SQL] Refactor code to get a value f...

2017-10-06 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/19449#discussion_r143312237 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -929,7 +929,7 @@ class

[GitHub] spark issue #19449: [SPARK-22219][SQL] Refactor code to get a value for "spa...

2017-10-06 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/19449 LGTM. There are already multiple places in codegen where `SQLConf.get` is being used so this will make things consistent

[GitHub] spark pull request #19449: [SPARK-22219][SQL] Refactor code to get a value f...

2017-10-06 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/19449#discussion_r143291724 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -929,7 +929,7 @@ class

[GitHub] spark issue #19330: Orderable MapType

2017-09-23 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/19330 @hvanhovell : based on [your comment over the jira](https://issues.apache.org/jira/browse/SPARK-18134?focusedCommentId=15693519&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel

[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2017-09-21 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r140336969 --- Diff: common/unsafe/src/main/java/org/apache/spark/sql/catalyst/expressions/HiveHasher.java --- @@ -38,6 +39,10 @@ public static int hashLong(long

[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2017-09-21 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r140336704 --- Diff: common/unsafe/src/main/java/org/apache/spark/sql/catalyst/expressions/HiveHasher.java --- @@ -38,6 +39,10 @@ public static int hashLong(long

[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2017-09-21 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r140336092 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/array/ByteArrayMethods.java --- @@ -46,6 +47,42 @@ public static int

[GitHub] spark issue #19257: [SPARK-22042] [SQL] ReorderJoinPredicates can break when...

2017-09-21 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/19257 By "placeholder shuffle nodes", do you mean dummy ones? We need to know the exact partitioning of the children, which dummy nodes won't give (maybe I didn't get what you meant

[GitHub] spark issue #19257: [SPARK-22042] [SQL] ReorderJoinPredicates can break when...

2017-09-20 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/19257 cc @cloud-fan @gatorsmile @sameeragarwal for review

[GitHub] spark pull request #19281: [SPARK-21998][SQL] SortMergeJoinExec did not calc...

2017-09-19 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/19281#discussion_r139850950 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala --- @@ -101,14 +101,15 @@ case class SortMergeJoinExec

[GitHub] spark pull request #19281: [SPARK-21998][SQL] SortMergeJoinExec did not calc...

2017-09-19 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/19281#discussion_r139850236 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala --- @@ -64,6 +67,42 @@ class JoinSuite extends QueryTest with SharedSQLContext

[GitHub] spark pull request #19281: [SPARK-21998][SQL] SortMergeJoinExec did not calc...

2017-09-19 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/19281#discussion_r139850127 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala --- @@ -64,6 +67,42 @@ class JoinSuite extends QueryTest with SharedSQLContext

[GitHub] spark pull request #19281: [SPARK-21998][SQL] SortMergeJoinExec did not calc...

2017-09-19 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/19281#discussion_r139849801 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala --- @@ -396,6 +396,26 @@ abstract class SparkPlan extends QueryPlan

[GitHub] spark pull request #19281: [SPARK-21998][SQL] SortMergeJoinExec did not calc...

2017-09-19 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/19281#discussion_r139820801 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala --- @@ -396,6 +396,26 @@ abstract class SparkPlan extends QueryPlan

[GitHub] spark pull request #19257: [SPARK-22042] [SQL] ReorderJoinPredicates can bre...

2017-09-16 Thread tejasapatil
GitHub user tejasapatil opened a pull request: https://github.com/apache/spark/pull/19257 [SPARK-22042] [SQL] ReorderJoinPredicates can break when child's partitioning is not decided ## What changes were proposed in this pull request? See jira description for the bug

[GitHub] spark issue #19257: [SPARK-22042] [SQL] ReorderJoinPredicates can break when...

2017-09-16 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/19257 Jenkins test this please

[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2017-09-13 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r138745656 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/array/ByteArrayMethods.java --- @@ -46,6 +47,42 @@ public static int

[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2017-09-13 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r138744794 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/array/ByteArrayMethods.java --- @@ -46,6 +47,42 @@ public static int

[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2017-09-13 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r138744855 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/hash/Murmur3_x86_32.java --- @@ -59,6 +60,18 @@ public static int hashUnsafeWords(Object

[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2017-09-13 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r138737907 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/ByteArrayMemoryBlock.java --- @@ -0,0 +1,74 @@ +/* + * Licensed

[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2017-09-13 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r138744155 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/ByteArray.java --- @@ -18,6 +18,7 @@ package org.apache.spark.unsafe.types

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-09-05 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/18692 @cloud-fan : In the event that the set of join keys is a superset of the child node's partitioning keys, it's possible to avoid the shuffle: https://github.com/apache/spark/pull/19054 ... this can help
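
The superset condition in this comment can be illustrated with a small sketch. It is not Spark's planner code: the function name `can_skip_shuffle` and its arguments are hypothetical, and it only captures the key-set check. The reasoning is that rows which agree on every join key also agree on every partitioning key when the join keys are a superset, so hash-partitioned rows with equal join keys already sit in the same partition. A real planner would presumably also have to check that both join sides are partitioned compatibly (same partition count, matching key positions), which this sketch ignores.

```python
def can_skip_shuffle(join_keys, partition_keys):
    """Hypothetical check: is data partitioned on partition_keys already
    co-located for a join (or grouping) on join_keys?

    True only when partition_keys is non-empty and every partitioning key
    is also a join key, i.e. join_keys is a superset of partition_keys.
    """
    partition_set = set(partition_keys)
    if not partition_set:
        # No known partitioning, so nothing can be reused.
        return False
    return partition_set <= set(join_keys)


if __name__ == "__main__":
    # Table bucketed on (a), joined on (a, b): existing layout can be reused.
    print(can_skip_shuffle(["a", "b"], ["a"]))
    # Table bucketed on (a, b), joined on (a) only: rows with equal `a`
    # may sit in different partitions, so a shuffle is still needed.
    print(can_skip_shuffle(["a"], ["a", "b"]))
```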

[GitHub] spark issue #18975: [SPARK-4131] Support "Writing data into the filesystem f...

2017-09-05 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/18975 @gatorsmile : what's your opinion on the job failure scenario I mentioned in the last comment?

[GitHub] spark issue #18975: [SPARK-4131] Support "Writing data into the filesystem f...

2017-09-05 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/18975 @gatorsmile: If the query fails in the middle (e.g. tasks are OOMing), Hive would have written data to the staging location and not the final output location. So users won't see this partial data

[GitHub] spark pull request #19112: [SPARK-21901][SS] Define toString for StateOperat...

2017-09-04 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/19112#discussion_r136892336 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala --- @@ -200,7 +202,7 @@ class SourceProgress protected[sql

[GitHub] spark pull request #19124: [SPARK-21912][SQL] Creating ORC datasource table ...

2017-09-04 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/19124#discussion_r136877087 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala --- @@ -169,6 +171,16 @@ class OrcFileFormat extends FileFormat

[GitHub] spark pull request #18692: [SPARK-21417][SQL] Infer join conditions using pr...

2017-09-04 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/18692#discussion_r136868330 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala --- @@ -152,3 +152,71 @@ object EliminateOuterJoin extends

[GitHub] spark issue #18975: [SPARK-4131] Support "Writing data into the filesystem f...

2017-09-04 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/18975 @gatorsmile : Yes. Hive is not 100% atomic, as things can go wrong between removing the old data and renaming the staging location. But it's superior in these regards: - Hive would output

[GitHub] spark issue #18975: [SPARK-4131] Support "Writing data into the filesystem f...

2017-09-03 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/18975 There is a difference in Hive's semantics vs what this PR is doing. In Hive, the query execution writes to a staging location and the destination location is cleared + re-populated after
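
The staging pattern this comment describes can be sketched in a few lines. This is an illustration on a local filesystem only, not Hive's or Spark's implementation: `write_via_staging` and its parameters are hypothetical names, and the `.staging-` prefix is an assumption. Output is written to a sibling staging directory first, and the destination is cleared and replaced only after every file has been written, so a failure during the write leaves the old destination untouched. As noted elsewhere in the thread, the swap itself is not atomic: a crash between the delete and the rename can still lose the old data.

```python
import os
import shutil
import tempfile


def write_via_staging(dest_dir, files):
    """Hypothetical helper: write `files` (name -> text) to a staging
    directory, then swap it into place at `dest_dir`."""
    parent = os.path.dirname(os.path.abspath(dest_dir))
    # Staging directory lives next to the destination so the final
    # rename stays on the same filesystem.
    staging = tempfile.mkdtemp(prefix=".staging-", dir=parent)
    try:
        for name, text in files.items():
            with open(os.path.join(staging, name), "w") as handle:
                handle.write(text)
    except Exception:
        # The write failed: discard partial output, leave dest_dir as it was.
        shutil.rmtree(staging, ignore_errors=True)
        raise
    # Swap step (not atomic): clear the old destination, then rename.
    if os.path.exists(dest_dir):
        shutil.rmtree(dest_dir)
    os.rename(staging, dest_dir)


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    out = os.path.join(root, "out")
    write_via_staging(out, {"part-0": "old"})
    write_via_staging(out, {"part-0": "new", "part-1": "more"})
    print(sorted(os.listdir(out)))
    shutil.rmtree(root)
```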

[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...

2017-09-03 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r136727120 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveDirCommand.scala --- @@ -0,0 +1,145 @@ +/* + * Licensed

[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...

2017-09-03 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r136727065 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/InsertIntoDataSourceDirCommand.scala --- @@ -0,0 +1,81

[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...

2017-09-03 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r136726991 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -178,11 +179,50 @@ class AstBuilder(conf: SQLConf

[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...

2017-09-03 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r136726921 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -1509,4 +1509,86 @@ class SparkSqlAstBuilder(conf: SQLConf

[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...

2017-09-03 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r136726866 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -1509,4 +1509,86 @@ class SparkSqlAstBuilder(conf: SQLConf

[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...

2017-09-02 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r136706616 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -178,11 +179,50 @@ class AstBuilder(conf: SQLConf

[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...

2017-09-02 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r136706593 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -178,11 +179,50 @@ class AstBuilder(conf: SQLConf

[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...

2017-09-01 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/18692 Can we restrict this to cartesian products ONLY? One clear downside of doing this for other joins is that it will potentially add a shuffle in the case of (bucketing queries) and (subqueries

[GitHub] spark issue #19001: [SPARK-19256][SQL] Hive bucketing support

2017-08-30 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/19001 https://github.com/apache/spark/pull/19080 is improving the distribution semantics in the planner. Will wait for that to get in. --- If your project is set up for it, you can reply to this email

[GitHub] spark issue #19001: [SPARK-19256][SQL] Hive bucketing support

2017-08-30 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/19001 ping @cloud-fan @gatorsmile

[GitHub] spark pull request #19080: [SPARK-21865][SQL] remove Partitioning.compatible...

2017-08-29 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/19080#discussion_r135949966 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala --- @@ -30,18 +30,32 @@ import

[GitHub] spark pull request #19080: [SPARK-21865][SQL] remove Partitioning.compatible...

2017-08-29 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/19080#discussion_r135950331 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala --- @@ -162,64 +156,40 @@ case class

[GitHub] spark pull request #19080: [SPARK-21865][SQL] remove Partitioning.compatible...

2017-08-29 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/19080#discussion_r135949262 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala --- @@ -153,6 +139,14 @@ case class

[GitHub] spark pull request #19080: [SPARK-21865][SQL] remove Partitioning.compatible...

2017-08-29 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/19080#discussion_r135949500 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala --- @@ -30,18 +30,32 @@ import

[GitHub] spark issue #19054: [SPARK-18067] Avoid shuffling child if join keys are sup...

2017-08-25 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/19054 cc @hvanhovell @cloud-fan for review

[GitHub] spark issue #15605: [WIP] [SPARK-18067] [SQL] SortMergeJoin adds shuffle if ...

2017-08-25 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/15605 This is superseded by https://github.com/apache/spark/pull/19054 Closing

[GitHub] spark pull request #15605: [WIP] [SPARK-18067] [SQL] SortMergeJoin adds shuf...

2017-08-25 Thread tejasapatil
Github user tejasapatil closed the pull request at: https://github.com/apache/spark/pull/15605
