[GitHub] drill pull request #1117: DRILL-6089 Removed ordering trait from HashJoin in...

2018-02-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/1117


---


[GitHub] drill pull request #1117: DRILL-6089 Removed ordering trait from HashJoin in...

2018-02-12 Thread ilooner
Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1117#discussion_r167734589
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestHashJoinAdvanced.java
 ---
@@ -197,4 +199,14 @@ public void emptyPartTest() throws Exception {
   BaseTestQuery.resetSessionOption(ExecConstants.SLICE_TARGET);
 }
   }
+
+  @Test // DRILL-6089
+  public void testJoinOrdering() throws Exception {
+final String query = "select * from dfs.`sample-data/nation.parquet` 
nation left outer join " +
--- End diff --

Thanks for catching this, updated the test.


---


[GitHub] drill pull request #1117: DRILL-6089 Removed ordering trait from HashJoin in...

2018-02-12 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/1117#discussion_r167722292
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestHashJoinAdvanced.java
 ---
@@ -197,4 +199,14 @@ public void emptyPartTest() throws Exception {
   BaseTestQuery.resetSessionOption(ExecConstants.SLICE_TARGET);
 }
   }
+
+  @Test // DRILL-6089
+  public void testJoinOrdering() throws Exception {
+final String query = "select * from dfs.`sample-data/nation.parquet` 
nation left outer join " +
--- End diff --

I missed this in the prior review but I think the ORDER BY should be on the 
column coming from the left input of the join (assuming that nation table is 
the left input).  The reason is the original code was using 'convertedLeft' to 
propagate the collation trait from the Probe (left) side, so the test should 
match that scenario. Another thing is the you don't necessarily need the 
subquery.. you could  do  `select * from  nation.parquet nation  left outer 
join region.parquet region on ... order by nation.n_name desc`. 


---


[GitHub] drill pull request #1117: DRILL-6089 Removed ordering trait from HashJoin in...

2018-02-12 Thread ilooner
Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1117#discussion_r167694158
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestHashJoinAdvanced.java
 ---
@@ -197,4 +198,24 @@ public void emptyPartTest() throws Exception {
   BaseTestQuery.resetSessionOption(ExecConstants.SLICE_TARGET);
 }
   }
+
+  @Test // DRILL-6089
+  public void testJoinOrdering() throws Exception {
+final String query = "select * from dfs.`sample-data/nation.parquet` 
nation left outer join " +
+  "(select * from dfs.`sample-data/region.parquet`) " +
+  "as region on region.r_regionkey = nation.n_nationkey order by 
region.r_name desc";
+final String plan = getPlanInString("EXPLAIN PLAN for " + 
QueryTestUtil.normalizeQuery(query), OPTIQ_FORMAT);
+lastSortAfterJoin(plan);
--- End diff --

The method `testPlanMatchingPatterns(String query, String[] 
expectedPatterns, String[] excludedPatterns)` is not sufficient out of the box 
since I have to add the `Pattern.DOTALL` flag when compiling the pattern in 
order to have the regex match across new lines. So I've added 
`testPlanMatchingPatterns(String query, Pattern[] expectedPatterns, Pattern[] 
excludedPatterns)`


---


[GitHub] drill pull request #1117: DRILL-6089 Removed ordering trait from HashJoin in...

2018-02-12 Thread ilooner
Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1117#discussion_r167675404
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPruleBase.java
 ---
@@ -246,4 +246,17 @@ public RelNode convertChild(final DrillJoinRel join,  
final RelNode rel) throws
 
   }
 
+  // DRILL-6089 make sure no collations are added to HashJoin
+  public static RelTraitSet removeCollation(RelTraitSet traitSet, 
RelOptRuleCall call)
+  {
+RelTraitSet newTraitSet = call.getPlanner().emptyTraitSet();
+
+for (RelTrait trait: traitSet) {
+  if (!trait.getTraitDef().getTraitClass().equals(RelCollation.class)) 
{
--- End diff --

I don't see a way to remove a trait with RelTraitSet.replace(). So I made 
this removeCollation method instead.


---


[GitHub] drill pull request #1117: DRILL-6089 Removed ordering trait from HashJoin in...

2018-02-12 Thread ilooner
Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1117#discussion_r167675182
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPruleBase.java
 ---
@@ -246,4 +246,17 @@ public RelNode convertChild(final DrillJoinRel join,  
final RelNode rel) throws
 
   }
 
+  // DRILL-6089 make sure no collations are added to HashJoin
+  public static RelTraitSet removeCollation(RelTraitSet traitSet, 
RelOptRuleCall call)
--- End diff --

moved


---


[GitHub] drill pull request #1117: DRILL-6089 Removed ordering trait from HashJoin in...

2018-02-12 Thread ilooner
Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1117#discussion_r167674109
  
--- Diff: 
contrib/storage-hive/core/src/test/java/org/apache/drill/exec/TestHashJoinOrdering.java
 ---
@@ -0,0 +1,63 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec;
+
+import org.apache.drill.categories.HiveStorageTest;
+import org.apache.drill.categories.SlowTest;
+import org.apache.drill.exec.hive.HiveTestBase;
+import org.apache.drill.exec.planner.physical.PlannerSettings;
+import org.apache.drill.test.QueryTestUtil;
+import org.junit.AfterClass;
+import org.junit.Assert;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+
+@Category({SlowTest.class, HiveStorageTest.class})
+public class TestHashJoinOrdering extends HiveTestBase {
--- End diff --

removed


---


[GitHub] drill pull request #1117: DRILL-6089 Removed ordering trait from HashJoin in...

2018-02-12 Thread ilooner
Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1117#discussion_r167673140
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/FragmentsRunner.java
 ---
@@ -61,7 +61,7 @@
   private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(FragmentsRunner.class);
   private static final ControlsInjector injector = 
ControlsInjectorFactory.getInjector(FragmentsRunner.class);
 
-  private static final long RPC_WAIT_IN_MSECS_PER_FRAGMENT = 5000;
+  private static final long RPC_WAIT_IN_MSECS_PER_FRAGMENT = 3;
--- End diff --

These changes were added as a separate commit to fix issues with our 
functional tests and build so that I could test this change. I've opened 
separate PRs for these changes so I'll remove them from this PR.


---


[GitHub] drill pull request #1117: DRILL-6089 Removed ordering trait from HashJoin in...

2018-02-11 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/1117#discussion_r167441911
  
--- Diff: 
contrib/storage-hive/core/src/test/java/org/apache/drill/exec/TestHashJoinOrdering.java
 ---
@@ -0,0 +1,63 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec;
+
+import org.apache.drill.categories.HiveStorageTest;
+import org.apache.drill.categories.SlowTest;
+import org.apache.drill.exec.hive.HiveTestBase;
+import org.apache.drill.exec.planner.physical.PlannerSettings;
+import org.apache.drill.test.QueryTestUtil;
+import org.junit.AfterClass;
+import org.junit.Assert;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+
+@Category({SlowTest.class, HiveStorageTest.class})
+public class TestHashJoinOrdering extends HiveTestBase {
--- End diff --

I am wondering why a separate HashJoinOrdering test is needed for Hive. The 
Drill join planner is not doing something specific for Hive tables.  If we have 
a Hive test, why not for other storage plugins such as HBase etc but since the 
proposed fix removes Collation property for any input stream of a HashJoin, it 
should not matter where that input originated from. 


---


[GitHub] drill pull request #1117: DRILL-6089 Removed ordering trait from HashJoin in...

2018-02-11 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/1117#discussion_r167442088
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPruleBase.java
 ---
@@ -246,4 +246,17 @@ public RelNode convertChild(final DrillJoinRel join,  
final RelNode rel) throws
 
   }
 
+  // DRILL-6089 make sure no collations are added to HashJoin
+  public static RelTraitSet removeCollation(RelTraitSet traitSet, 
RelOptRuleCall call)
+  {
+RelTraitSet newTraitSet = call.getPlanner().emptyTraitSet();
+
+for (RelTrait trait: traitSet) {
+  if (!trait.getTraitDef().getTraitClass().equals(RelCollation.class)) 
{
--- End diff --

I suppose you already considered Calcite's RelTraitSet.replace() method and 
found this to be simpler ?


---


[GitHub] drill pull request #1117: DRILL-6089 Removed ordering trait from HashJoin in...

2018-02-11 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/1117#discussion_r167442059
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/JoinPruleBase.java
 ---
@@ -246,4 +246,17 @@ public RelNode convertChild(final DrillJoinRel join,  
final RelNode rel) throws
 
   }
 
+  // DRILL-6089 make sure no collations are added to HashJoin
+  public static RelTraitSet removeCollation(RelTraitSet traitSet, 
RelOptRuleCall call)
--- End diff --

Since this is static method, better to add it to PrelUtil class for general 
purpose use. 


---


[GitHub] drill pull request #1117: DRILL-6089 Removed ordering trait from HashJoin in...

2018-02-11 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/1117#discussion_r167441485
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/FragmentsRunner.java
 ---
@@ -61,7 +61,7 @@
   private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(FragmentsRunner.class);
   private static final ControlsInjector injector = 
ControlsInjectorFactory.getInjector(FragmentsRunner.class);
 
-  private static final long RPC_WAIT_IN_MSECS_PER_FRAGMENT = 5000;
+  private static final long RPC_WAIT_IN_MSECS_PER_FRAGMENT = 3;
--- End diff --

Not sure why this and the memory settings changes are in this PR..they are 
unrelated to DRILL-6089.  


---


[GitHub] drill pull request #1117: DRILL-6089 Removed ordering trait from HashJoin in...

2018-02-11 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/1117#discussion_r167441417
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestHashJoinAdvanced.java
 ---
@@ -197,4 +198,24 @@ public void emptyPartTest() throws Exception {
   BaseTestQuery.resetSessionOption(ExecConstants.SLICE_TARGET);
 }
   }
+
+  @Test // DRILL-6089
+  public void testJoinOrdering() throws Exception {
+final String query = "select * from dfs.`sample-data/nation.parquet` 
nation left outer join " +
+  "(select * from dfs.`sample-data/region.parquet`) " +
+  "as region on region.r_regionkey = nation.n_nationkey order by 
region.r_name desc";
+final String plan = getPlanInString("EXPLAIN PLAN for " + 
QueryTestUtil.normalizeQuery(query), OPTIQ_FORMAT);
+lastSortAfterJoin(plan);
--- End diff --

Most plan tests that we have use one of the utility methods in PlanTestBase 
(from which JoinTestBase is derived) which uses Java's regex Pattern and 
Matcher classes.  In your query, is it necessary to check the index of the Sort 
vs the HashJoin ?  Since there is expected to be only 1 Sort (corresponding to 
the final ORDER BY), as long as there is a regex pattern that matches  
`'*Sort*HashJoin',` I think that would be sufficient.  You might want to see 
the callers of [1] if it satisfies your requirement. 

[1] 
https://github.com/apache/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/PlanTestBase.java#L82


---


[GitHub] drill pull request #1117: DRILL-6089 Removed ordering trait from HashJoin in...

2018-02-07 Thread ilooner
GitHub user ilooner opened a pull request:

https://github.com/apache/drill/pull/1117

DRILL-6089 Removed ordering trait from HashJoin in planner

HashJoin typically does not preserve ordering in most databases. Drill's 
HashJoin operator technically preserved ordering up to this point, but after 
spilling is implemented it will no longer preserve ordering. This change makes 
sure the planner knows that HashJoin does not preserve ordering.

All unit and functional tests are passing @amansinha100 @Ben-Zvi please 
review.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ilooner/drill DRILL-6089

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1117.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1117


commit 40521a6edb77d018902903892ba356033e7bf7ca
Author: Timothy Farkas 
Date:   2018-01-26T21:46:22Z

DRILL-6089: Removed ordering trait from HashJoin in planner and verified 
the planner does not assume HashJoin preserves ordering.

commit 726bf663186a21e3b54a58d2ed79eaca8d746bbf
Author: Timothy Farkas 
Date:   2018-02-08T07:26:36Z

DRILL-6144: Tune direct memory to prevent unit test hangs on Travis and on 
Jenkins




---