[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2016-02-20 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15155710#comment-15155710
 ] 

Jacques Nadeau commented on DRILL-3623:
---

[~sudheeshkatkam], can you break this into more subtasks? If it is WIP, let's 
break out the sub tasks and share them.

> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2016-02-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15144934#comment-15144934
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user sudheeshkatkam commented on the pull request:

https://github.com/apache/drill/pull/364#issuecomment-183425694
  
The update does not handle all type (because some types were out of scope). 
The [latest patch](https://github.com/sudheeshkatkam/drill/commits/limit_zero) 
(WIP) is rebased on top of Hsuan's work.


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2016-02-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15143257#comment-15143257
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user StevenMPhillips commented on the pull request:

https://github.com/apache/drill/pull/364#issuecomment-183008840
  
@hhsuanyi I did not open any calcite jira. The new calcite version was 
there from Sudheesh's commit which I pulled in.

@sudheeshkatkam Does your update handle limit 0 for all types?



> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2016-02-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15139835#comment-15139835
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user sudheeshkatkam commented on the pull request:

https://github.com/apache/drill/pull/364#issuecomment-182089660
  
@StevenMPhillips This is still WIP, right?

@hsuanyi and I plan to post an update soon.


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2016-02-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135208#comment-15135208
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

GitHub user StevenMPhillips opened a pull request:

https://github.com/apache/drill/pull/364

DRILL-3623: Optimize limit 0 queries

This pulls some patches that Sudheesh had previously worked on but not 
committed. In addition, fixes some problems when the type is not one of the 
json ExtendTypes.

There are still problems with float, decimal, and interval_day/year.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/StevenMPhillips/drill limit_0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/364.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #364


commit 8c3f2e8911dabc93f7b3f381600ccfaccc008938
Author: Sudheesh Katkam 
Date:   2015-11-17T23:12:48Z

DRILL-3623: Use shorter exec path for LIMIT 0 queries when schema of the 
root logical node is known

+ DRILL-4043: Perform LIMIT 0 optimization before logical transformation

commit 3be63cd5b95891cbfee1bdb6cfa2baaaf18f31c8
Author: Sudheesh Katkam 
Date:   2015-12-22T19:07:39Z

WIP

commit b4061e46fe91b747bc2afe82e66cd6d0bc060d1b
Author: Steven Phillips 
Date:   2016-01-31T01:49:04Z

Make limit 0 return correct type for VARCHAR

Still does not work correctly for float4, decimal, or interval types




> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-11-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014511#comment-15014511
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/193#discussion_r45408783
  
--- Diff: 
contrib/storage-hive/core/src/test/java/org/apache/drill/exec/hive/TestHiveStorage.java
 ---
@@ -42,6 +43,25 @@ public void hiveReadWithDb() throws Exception {
   }
 
   @Test
+  public void simpleLimitZero() throws Exception {
+testBuilder()
+.sqlQuery("select * from hive.kv limit 0")
+.expectsEmptyResultSet()
+.baselineColumns("key", "value")
--- End diff --

will do.


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-11-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014506#comment-15014506
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/193#discussion_r45408655
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/FindLimit0Visitor.java
 ---
@@ -36,16 +48,62 @@
  * executing a schema-only query.
  */
 public class FindLimit0Visitor extends RelShuttleImpl {
-  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(FindLimit0Visitor.class);
+//  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(FindLimit0Visitor.class);
+
+  /**
+   * Values in the {@link 
DrillConstExecutor#DRILL_TO_CALCITE_TYPE_MAPPING} map.
+   * + without {@link SqlTypeName#ANY} (avoid late binding)
+   * + without {@link SqlTypeName#VARBINARY} ({@link DrillValuesRel 
values} does not support this)
+   * + with {@link SqlTypeName#CHAR} ({@link DrillValuesRel values} 
supports this, but the above mapping does not
+   *   contain this type.
+   */
+  private static final ImmutableSet TYPES = 
ImmutableSet.builder()
+  .add(SqlTypeName.INTEGER, SqlTypeName.BIGINT, SqlTypeName.FLOAT, 
SqlTypeName.DOUBLE, SqlTypeName.VARCHAR,
+  SqlTypeName.BOOLEAN, SqlTypeName.DATE, SqlTypeName.DECIMAL, 
SqlTypeName.TIME, SqlTypeName.TIMESTAMP,
+  SqlTypeName.INTERVAL_YEAR_MONTH, SqlTypeName.INTERVAL_DAY_TIME, 
SqlTypeName.MAP, SqlTypeName.ARRAY,
+  SqlTypeName.TINYINT, SqlTypeName.SMALLINT, SqlTypeName.CHAR)
+  .build();
 
   private boolean contains = false;
 
+  /**
+   * Checks if the root portion of the RelNode tree contains a limit 0 
pattern.
+   */
   public static boolean containsLimit0(RelNode rel) {
 FindLimit0Visitor visitor = new FindLimit0Visitor();
 rel.accept(visitor);
 return visitor.isContains();
   }
 
+  /**
+   * If all field types of the given node are {@link #TYPES recognized 
types}, then this method returns the tree:
+   *   DrillLimitRel(0)
+   * \
+   * DrillValuesRel(field types)
+   * Otherwise, the method returns null.
+   */
+  public static DrillRel getValuesRelIfFullySchemaed(final RelNode rel) {
+final List fieldList = 
rel.getRowType().getFieldList();
+final ImmutableList.Builder tupleBuilder = new 
ImmutableList.Builder<>();
+final RexBuilder literalBuilder = new 
RexBuilder(rel.getCluster().getTypeFactory());
+for (final RelDataTypeField field : fieldList) {
+  if (!TYPES.contains(field.getType().getSqlTypeName())) {
+return null;
+  } else {
+tupleBuilder.add((RexLiteral) literalBuilder.makeLiteral(null, 
field.getType(), false));
+  }
+}
+
+final ImmutableList> tuples = new 
ImmutableList.Builder>()
+.add(tupleBuilder.build())
+.build();
+final RelTraitSet traits = 
rel.getTraitSet().plus(DrillRel.DRILL_LOGICAL);
+// TODO: ideally, we want the limit to be pushed into values
+final DrillValuesRel values = new DrillValuesRel(rel.getCluster(), 
rel.getRowType(), tuples, traits);
--- End diff --

I did this to avoid logical transformation completely, as this is an 
exceptional case.

LogicalValues [allows an empty list of 
tuples](https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/logical/LogicalValues.java#L101)
 but as Jacques pointed out Drill does not handle that well (we use data to 
encode types).


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-11-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014479#comment-15014479
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jinfengni commented on the pull request:

https://github.com/apache/drill/pull/193#issuecomment-158212142
  
Overall, looks good to me.  +1



> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-11-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014478#comment-15014478
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/193#discussion_r45407282
  
--- Diff: 
contrib/storage-hive/core/src/test/java/org/apache/drill/exec/hive/TestHiveStorage.java
 ---
@@ -42,6 +43,25 @@ public void hiveReadWithDb() throws Exception {
   }
 
   @Test
+  public void simpleLimitZero() throws Exception {
+testBuilder()
+.sqlQuery("select * from hive.kv limit 0")
+.expectsEmptyResultSet()
+.baselineColumns("key", "value")
+.go();
+  }
+
+  @Test
+  public void simpleLimitZeroPlan() throws Exception {
+final String[] expectedPlan = {
+  ".*Limit.*\n" +
+  ".*Values.*"
+};
+final String[] excludedPlan = {};
+PlanTestBase.testPlanMatchingPatterns("select * from hive.kv limit 0", 
expectedPlan, excludedPlan);
+  }
+
--- End diff --

Also, you may consider adding a negative test, such as "schemed limit 0", 
but select clause has a function. 

select substr(char_col, 1, 5) from hive.table limit 0;

In such case, I assume this patch would not do the short cut, and the 
function return type is not known for now in Drill's planning time.



> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-11-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014471#comment-15014471
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/193#discussion_r45406939
  
--- Diff: 
contrib/storage-hive/core/src/test/java/org/apache/drill/exec/hive/TestHiveStorage.java
 ---
@@ -42,6 +43,25 @@ public void hiveReadWithDb() throws Exception {
   }
 
   @Test
+  public void simpleLimitZero() throws Exception {
+testBuilder()
+.sqlQuery("select * from hive.kv limit 0")
+.expectsEmptyResultSet()
+.baselineColumns("key", "value")
--- End diff --

In the test with schemed "limit 0" case, I think we had better verify the 
column type as well. 


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-11-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014469#comment-15014469
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/193#discussion_r45406832
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/FindLimit0Visitor.java
 ---
@@ -36,16 +48,62 @@
  * executing a schema-only query.
  */
 public class FindLimit0Visitor extends RelShuttleImpl {
-  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(FindLimit0Visitor.class);
+//  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(FindLimit0Visitor.class);
+
+  /**
+   * Values in the {@link 
DrillConstExecutor#DRILL_TO_CALCITE_TYPE_MAPPING} map.
+   * + without {@link SqlTypeName#ANY} (avoid late binding)
+   * + without {@link SqlTypeName#VARBINARY} ({@link DrillValuesRel 
values} does not support this)
+   * + with {@link SqlTypeName#CHAR} ({@link DrillValuesRel values} 
supports this, but the above mapping does not
+   *   contain this type.
+   */
+  private static final ImmutableSet TYPES = 
ImmutableSet.builder()
+  .add(SqlTypeName.INTEGER, SqlTypeName.BIGINT, SqlTypeName.FLOAT, 
SqlTypeName.DOUBLE, SqlTypeName.VARCHAR,
+  SqlTypeName.BOOLEAN, SqlTypeName.DATE, SqlTypeName.DECIMAL, 
SqlTypeName.TIME, SqlTypeName.TIMESTAMP,
+  SqlTypeName.INTERVAL_YEAR_MONTH, SqlTypeName.INTERVAL_DAY_TIME, 
SqlTypeName.MAP, SqlTypeName.ARRAY,
+  SqlTypeName.TINYINT, SqlTypeName.SMALLINT, SqlTypeName.CHAR)
+  .build();
 
   private boolean contains = false;
 
+  /**
+   * Checks if the root portion of the RelNode tree contains a limit 0 
pattern.
+   */
   public static boolean containsLimit0(RelNode rel) {
 FindLimit0Visitor visitor = new FindLimit0Visitor();
 rel.accept(visitor);
 return visitor.isContains();
   }
 
+  /**
+   * If all field types of the given node are {@link #TYPES recognized 
types}, then this method returns the tree:
+   *   DrillLimitRel(0)
+   * \
+   * DrillValuesRel(field types)
+   * Otherwise, the method returns null.
+   */
+  public static DrillRel getValuesRelIfFullySchemaed(final RelNode rel) {
+final List fieldList = 
rel.getRowType().getFieldList();
+final ImmutableList.Builder tupleBuilder = new 
ImmutableList.Builder<>();
+final RexBuilder literalBuilder = new 
RexBuilder(rel.getCluster().getTypeFactory());
+for (final RelDataTypeField field : fieldList) {
+  if (!TYPES.contains(field.getType().getSqlTypeName())) {
+return null;
+  } else {
+tupleBuilder.add((RexLiteral) literalBuilder.makeLiteral(null, 
field.getType(), false));
+  }
+}
+
+final ImmutableList> tuples = new 
ImmutableList.Builder>()
+.add(tupleBuilder.build())
+.build();
+final RelTraitSet traits = 
rel.getTraitSet().plus(DrillRel.DRILL_LOGICAL);
+// TODO: ideally, we want the limit to be pushed into values
+final DrillValuesRel values = new DrillValuesRel(rel.getCluster(), 
rel.getRowType(), tuples, traits);
--- End diff --

FindLimit0Visitor.containsLimit0(relNode) is called when relNode is 
LogicalRel.  Can we create LogicalValue with empty list of tuple, in stead of 
creating DrillValues with one tuple followed by a DrillLimit(0) ? 

I think LogicalValue would allow an empty list of tuple, right?


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-11-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15011802#comment-15011802
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user sudheeshkatkam commented on the pull request:

https://github.com/apache/drill/pull/193#issuecomment-157839747
  
Does the latest patch look fine? Also, please see [the note in 
FindLimit0Visitor](https://github.com/apache/drill/pull/193/files#diff-6d8243a8f379ee0041618329fb132ca6R54).


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010297#comment-15010297
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/193#discussion_r45162213
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillDirectScanRel.java
 ---
@@ -0,0 +1,111 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.logical;
+
+import com.google.common.collect.Iterators;
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.AbstractRelNode;
+import org.apache.calcite.rel.RelWriter;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.drill.common.logical.data.LogicalOperator;
+import org.apache.drill.exec.physical.base.PhysicalOperator;
+import org.apache.drill.exec.planner.physical.DrillScanPrel;
+import org.apache.drill.exec.planner.physical.PhysicalPlanCreator;
+import org.apache.drill.exec.planner.physical.PlannerSettings;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.PrelUtil;
+import org.apache.drill.exec.planner.physical.visitor.PrelVisitor;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.store.direct.DirectGroupScan;
+
+import java.io.IOException;
+import java.util.Iterator;
+
+/**
+ * Logical and physical RelNode representing a {@link DirectGroupScan}. 
This is not backed by a {@link DrillTable},
+ * unlike {@link DrillScanRel}.
+ */
+public class DrillDirectScanRel extends AbstractRelNode implements 
DrillScanPrel, DrillRel {
--- End diff --

The `getValuesRelIfFullySchemaed(...)` check is done before logical 
transformation to avoid creating expensive objects while applying rules (unless 
rules are ordered). For example, DrillScanRule creates a DrillScanRel object 
which constructs a group scan object that can be quite expensive (see HiveScan, 
MongoGroupScan, HbaseGroupScan).


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010015#comment-15010015
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/193#discussion_r45150593
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillDirectScanRel.java
 ---
@@ -0,0 +1,111 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.logical;
+
+import com.google.common.collect.Iterators;
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.AbstractRelNode;
+import org.apache.calcite.rel.RelWriter;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.drill.common.logical.data.LogicalOperator;
+import org.apache.drill.exec.physical.base.PhysicalOperator;
+import org.apache.drill.exec.planner.physical.DrillScanPrel;
+import org.apache.drill.exec.planner.physical.PhysicalPlanCreator;
+import org.apache.drill.exec.planner.physical.PlannerSettings;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.PrelUtil;
+import org.apache.drill.exec.planner.physical.visitor.PrelVisitor;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.store.direct.DirectGroupScan;
+
+import java.io.IOException;
+import java.util.Iterator;
+
+/**
+ * Logical and physical RelNode representing a {@link DirectGroupScan}. 
This is not backed by a {@link DrillTable},
+ * unlike {@link DrillScanRel}.
+ */
+public class DrillDirectScanRel extends AbstractRelNode implements 
DrillScanPrel, DrillRel {
--- End diff --

My note was in reference to DrillReduceExpressionsRule.

Basically, you could be able to modify this classes implementations of 
createEmptyRelOrEquivalent() to switch to a values (with fake data) operator 
followed by a limit(0). At least that was the thought.




> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009832#comment-15009832
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/193#discussion_r45138996
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillDirectScanRel.java
 ---
@@ -0,0 +1,111 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.logical;
+
+import com.google.common.collect.Iterators;
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.AbstractRelNode;
+import org.apache.calcite.rel.RelWriter;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.drill.common.logical.data.LogicalOperator;
+import org.apache.drill.exec.physical.base.PhysicalOperator;
+import org.apache.drill.exec.planner.physical.DrillScanPrel;
+import org.apache.drill.exec.planner.physical.PhysicalPlanCreator;
+import org.apache.drill.exec.planner.physical.PlannerSettings;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.PrelUtil;
+import org.apache.drill.exec.planner.physical.visitor.PrelVisitor;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.store.direct.DirectGroupScan;
+
+import java.io.IOException;
+import java.util.Iterator;
+
+/**
+ * Logical and physical RelNode representing a {@link DirectGroupScan}. 
This is not backed by a {@link DrillTable},
+ * unlike {@link DrillScanRel}.
+ */
+public class DrillDirectScanRel extends AbstractRelNode implements 
DrillScanPrel, DrillRel {
--- End diff --

@jinfengni I'll post the patch with Values approach.

@julianhyde That makes sense.

@jacques-n Since we use data to encode types in Values, pushing Limit 
(specially zero) into into Values requires a sizable change (?). Also, I think 
[there is a 
bug](https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillValuesRel.java#L141)
 since Calcite allows for creating a LogicalValues [with types and without 
literals](https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/logical/LogicalValues.java#L101),
 and Drill does not handle this case.

Regarding the above approach (putting a limit 0 on top of scan), I think 
fast schema wasn't sufficient because there was only one record batch during 
execution (extreme case). Right now, I do not see any specific issue there.

Regarding the note, can you expand on what you mean? Change the visitor 
pattern to a logical rule? Or is there a logical rule that conflicts with this 
change, and this shorter path should be avoided?


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009085#comment-15009085
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/193#discussion_r45090762
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillDirectScanRel.java
 ---
@@ -0,0 +1,111 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.logical;
+
+import com.google.common.collect.Iterators;
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.AbstractRelNode;
+import org.apache.calcite.rel.RelWriter;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.drill.common.logical.data.LogicalOperator;
+import org.apache.drill.exec.physical.base.PhysicalOperator;
+import org.apache.drill.exec.planner.physical.DrillScanPrel;
+import org.apache.drill.exec.planner.physical.PhysicalPlanCreator;
+import org.apache.drill.exec.planner.physical.PlannerSettings;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.PrelUtil;
+import org.apache.drill.exec.planner.physical.visitor.PrelVisitor;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.store.direct.DirectGroupScan;
+
+import java.io.IOException;
+import java.util.Iterator;
+
+/**
+ * Logical and physical RelNode representing a {@link DirectGroupScan}. 
This is not backed by a {@link DrillTable},
+ * unlike {@link DrillScanRel}.
+ */
+public class DrillDirectScanRel extends AbstractRelNode implements 
DrillScanPrel, DrillRel {
--- End diff --

One other note on the Calcite rule: it seems like we should just modify the 
Calcite mainline rule to avoid applying the zero records values operator 
optimization in the case that a column is not yet bound to a type (is an ANY 
column). That way we can stop maintaining our version of the rule. @jaltekruse, 
thoughts?


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009078#comment-15009078
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/193#discussion_r45090426
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillDirectScanRel.java
 ---
@@ -0,0 +1,111 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.logical;
+
+import com.google.common.collect.Iterators;
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.AbstractRelNode;
+import org.apache.calcite.rel.RelWriter;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.drill.common.logical.data.LogicalOperator;
+import org.apache.drill.exec.physical.base.PhysicalOperator;
+import org.apache.drill.exec.planner.physical.DrillScanPrel;
+import org.apache.drill.exec.planner.physical.PhysicalPlanCreator;
+import org.apache.drill.exec.planner.physical.PlannerSettings;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.PrelUtil;
+import org.apache.drill.exec.planner.physical.visitor.PrelVisitor;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.store.direct.DirectGroupScan;
+
+import java.io.IOException;
+import java.util.Iterator;
+
+/**
+ * Logical and physical RelNode representing a {@link DirectGroupScan}. 
This is not backed by a {@link DrillTable},
+ * unlike {@link DrillScanRel}.
+ */
+public class DrillDirectScanRel extends AbstractRelNode implements 
DrillScanPrel, DrillRel {
--- End diff --

There is one challenge with the Values execution in Drill. We use data to 
encode types (and generate the vectors). It seems like the ideal would be 
expressing a values operation that has no data. Maybe we should just support a 
local limit in the values operator? That would allow us to bypass adding the 
limit(0) and sv remover for the simple case. Generally, we should probably 
support leaf node limit pushdown anyway. I see the new patch takes a different 
approach to the one above. One of the things that seemed to be an issue above 
is that the Limit operator was properly terminating its parents in the fast 
schema case of a limit 0. @sudheeshkatkam and @jinfengni, do you agree that is 
an issue? If it is, we should get a JIRA opened for it.


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-11-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15007785#comment-15007785
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user julianhyde commented on a diff in the pull request:

https://github.com/apache/drill/pull/193#discussion_r45012029
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillDirectScanRel.java
 ---
@@ -0,0 +1,111 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.logical;
+
+import com.google.common.collect.Iterators;
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.AbstractRelNode;
+import org.apache.calcite.rel.RelWriter;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.drill.common.logical.data.LogicalOperator;
+import org.apache.drill.exec.physical.base.PhysicalOperator;
+import org.apache.drill.exec.planner.physical.DrillScanPrel;
+import org.apache.drill.exec.planner.physical.PhysicalPlanCreator;
+import org.apache.drill.exec.planner.physical.PlannerSettings;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.PrelUtil;
+import org.apache.drill.exec.planner.physical.visitor.PrelVisitor;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.store.direct.DirectGroupScan;
+
+import java.io.IOException;
+import java.util.Iterator;
+
+/**
+ * Logical and physical RelNode representing a {@link DirectGroupScan}. 
This is not backed by a {@link DrillTable},
+ * unlike {@link DrillScanRel}.
+ */
+public class DrillDirectScanRel extends AbstractRelNode implements 
DrillScanPrel, DrillRel {
--- End diff --

Yeah, I think DrillDirectScanRel is a wrong turn for this functionality, 
because you can't see that it is empty and reason about it. A DirectGroupScan 
is a runtime thing, so shouldn't be floating around at planning time.

I notice that DrillValuesRel does not extend Values. If it did, it would 
get all of the rules in PruneEmptyRules for free - pruning away Filter, Project 
etc. on top of it.


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-11-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15007751#comment-15007751
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/193#discussion_r45010657
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillDirectScanRel.java
 ---
@@ -0,0 +1,111 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.logical;
+
+import com.google.common.collect.Iterators;
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.AbstractRelNode;
+import org.apache.calcite.rel.RelWriter;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.drill.common.logical.data.LogicalOperator;
+import org.apache.drill.exec.physical.base.PhysicalOperator;
+import org.apache.drill.exec.planner.physical.DrillScanPrel;
+import org.apache.drill.exec.planner.physical.PhysicalPlanCreator;
+import org.apache.drill.exec.planner.physical.PlannerSettings;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.PrelUtil;
+import org.apache.drill.exec.planner.physical.visitor.PrelVisitor;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.store.direct.DirectGroupScan;
+
+import java.io.IOException;
+import java.util.Iterator;
+
+/**
+ * Logical and physical RelNode representing a {@link DirectGroupScan}. 
This is not backed by a {@link DrillTable},
+ * unlike {@link DrillScanRel}.
+ */
+public class DrillDirectScanRel extends AbstractRelNode implements 
DrillScanPrel, DrillRel {
--- End diff --

Sounds like that the DrillDirectScanRel approach works only when "limit 0" 
is on the root level; it would not apply when "limit 0" is inside a subquery. 

The Values approach may apply for the case "limit 0" in a subquery. 
Certainly, this JIRA only targets the case of "limit 0" on root. But with the 
VALUES approach, there is room for improvement for a more general scenarios. 




> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-11-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15007723#comment-15007723
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/193#discussion_r45008717
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillDirectScanRel.java
 ---
@@ -0,0 +1,111 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.logical;
+
+import com.google.common.collect.Iterators;
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.AbstractRelNode;
+import org.apache.calcite.rel.RelWriter;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.drill.common.logical.data.LogicalOperator;
+import org.apache.drill.exec.physical.base.PhysicalOperator;
+import org.apache.drill.exec.planner.physical.DrillScanPrel;
+import org.apache.drill.exec.planner.physical.PhysicalPlanCreator;
+import org.apache.drill.exec.planner.physical.PlannerSettings;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.PrelUtil;
+import org.apache.drill.exec.planner.physical.visitor.PrelVisitor;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.store.direct.DirectGroupScan;
+
+import java.io.IOException;
+import java.util.Iterator;
+
+/**
+ * Logical and physical RelNode representing a {@link DirectGroupScan}. 
This is not backed by a {@link DrillTable},
+ * unlike {@link DrillScanRel}.
+ */
+public class DrillDirectScanRel extends AbstractRelNode implements 
DrillScanPrel, DrillRel {
--- End diff --

Should I stick to the current impl?


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-11-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15007685#comment-15007685
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/193#discussion_r45006594
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillDirectScanRel.java
 ---
@@ -0,0 +1,111 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.logical;
+
+import com.google.common.collect.Iterators;
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.AbstractRelNode;
+import org.apache.calcite.rel.RelWriter;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.drill.common.logical.data.LogicalOperator;
+import org.apache.drill.exec.physical.base.PhysicalOperator;
+import org.apache.drill.exec.planner.physical.DrillScanPrel;
+import org.apache.drill.exec.planner.physical.PhysicalPlanCreator;
+import org.apache.drill.exec.planner.physical.PlannerSettings;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.PrelUtil;
+import org.apache.drill.exec.planner.physical.visitor.PrelVisitor;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.store.direct.DirectGroupScan;
+
+import java.io.IOException;
+import java.util.Iterator;
+
+/**
+ * Logical and physical RelNode representing a {@link DirectGroupScan}. 
This is not backed by a {@link DrillTable},
+ * unlike {@link DrillScanRel}.
+ */
+public class DrillDirectScanRel extends AbstractRelNode implements 
DrillScanPrel, DrillRel {
--- End diff --

With DrillDirectScanRel, the execution plan is: *Screen <-- Project <-- 
DrillDirectScanRel*. In my experiments ([TPCDS 
queries](https://github.com/mapr/drill-test-framework/tree/master/framework/resources/Functional/tpcds/original/hive)),
 the total time did not cross 0.6 s.

With VALUES operator, there needs to be a LIMIT(0) operator on top. Then, 
the execution plan is: *Screen <-- Project <-- SVR <-- Limit <-- Values*. In my 
experiments, sometimes (seen twice) Project and SVR take ~0.5s to setup, and 
the query take 2s to complete, but this is **not reproducible.**


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15005037#comment-15005037
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/193#discussion_r44852176
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillDirectScanRel.java
 ---
@@ -0,0 +1,111 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.logical;
+
+import com.google.common.collect.Iterators;
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.AbstractRelNode;
+import org.apache.calcite.rel.RelWriter;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.drill.common.logical.data.LogicalOperator;
+import org.apache.drill.exec.physical.base.PhysicalOperator;
+import org.apache.drill.exec.planner.physical.DrillScanPrel;
+import org.apache.drill.exec.planner.physical.PhysicalPlanCreator;
+import org.apache.drill.exec.planner.physical.PlannerSettings;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.physical.PrelUtil;
+import org.apache.drill.exec.planner.physical.visitor.PrelVisitor;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.store.direct.DirectGroupScan;
+
+import java.io.IOException;
+import java.util.Iterator;
+
+/**
+ * Logical and physical RelNode representing a {@link DirectGroupScan}. 
This is not backed by a {@link DrillTable},
+ * unlike {@link DrillScanRel}.
+ */
+public class DrillDirectScanRel extends AbstractRelNode implements 
DrillScanPrel, DrillRel {
--- End diff --

Can we just use the VALUES operator instead? I think that Calcite already 
has the code to do this in the reduceexpressionsrule.


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15005000#comment-15005000
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user sudheeshkatkam commented on the pull request:

https://github.com/apache/drill/pull/193#issuecomment-156602174
  
Updated the patch: for LIMIT 0 queries, short circuit before logical 
transformation if all field types are known for the root node.


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14979707#comment-14979707
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/193#issuecomment-152059989
  
Just to add to my comment above, if you want to do a quick call or hangout 
to discuss I'm more than happy to. As I said above, it is possible I am 
misunderstanding. If so, I'll definitely revise my objection.


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Jinfeng Ni
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14979701#comment-14979701
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/193#issuecomment-152056732
  
I'm sorry to say that I'm -1 on this change

It seems to be adding a planning rewrite rule where there should be a 
simple fix execution bug. Let's just fix the execution bug. 

Limit 0 should complete its execution the moment it receives a schema (as 
part of fast schema). It doesn't need to receive any records. You just 
described a situation where it is waiting for records from a blocking operator. 
That shouldn't be the case. If there is some other real benefit to this change 
after that execution bug is fixed, let's revisit in that light.

If you think I'm misunderstanding your description of the execution 
behavior or the dynamics involved, please help me to better understand. 


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Jinfeng Ni
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14979671#comment-14979671
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user sudheeshkatkam commented on the pull request:

https://github.com/apache/drill/pull/193#issuecomment-152053724
  
I think I see the source of confusion (sorry); this patch does not address 
that query in the JIRA, which is why Jinfeng asked me to change the title in 
one of his comments. Regarding that query, DRILL-3921 helps avoids most of the 
execution time, but we still incur the planning time. And my initial approach 
address this issue but as mentioned above, this is blocked by DRILL-2288 and 
other things.

The new approach actually addresses any query that has a limit 0 above a 
blocking operator that consumes all records. And avoiding parallelization made 
the query much worse. (Actually, was fast-schema supposed to still kick in? Did 
not seem like it from my experiments.)

I tested against a query like `SELECT * FROM (SELECT COUNT(DISTINCT a), 
COUNT(DISTINCT b), COUNT(DISTINCT c) FROM very_large_table) T LIMIT 0` and this 
completed two orders of magnitude faster.


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Jinfeng Ni
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14979632#comment-14979632
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/193#issuecomment-152047328
  
Interesting. Can you explain where the time is coming from? It isn't clear 
to me why this will have a big impact over what we had before. While you're 
pushing the limit down to just above the scan nodes, we already had an 
optimization which avoided parallelization. Since we're pipelined this really 
shouldn't matter much. Is limit zero not working right in the limit operator? 
It should terminate upon receiving schema, not wait until a batch of actual 
records (I'm wondering if it is doing the latter). Is sending zero records 
through causing operators to skip compilation? In what cases was this change 
taking something from hundreds of seconds to a few seconds? I'm asking these 
questions so I can better understand as I want to make sure there isn't a bug 
somewhere else. Thanks!


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Jinfeng Ni
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14979610#comment-14979610
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/193#discussion_r43339305
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/FindLimit0Visitor.java
 ---
@@ -46,6 +51,32 @@ public static boolean containsLimit0(RelNode rel) {
 return visitor.isContains();
   }
 
+  public static DrillRel addLimitOnTopOfLeafNodes(final DrillRel rel) {
+final RelShuttleImpl shuttle = new RelShuttleImpl() {
+
+  private RelNode addLimitAsParent(RelNode node) {
+final RexBuilder builder = node.getCluster().getRexBuilder();
+final RexLiteral offset = 
builder.makeExactLiteral(BigDecimal.ZERO);
+final RexLiteral fetch = builder.makeExactLiteral(BigDecimal.ZERO);
+return new DrillLimitRel(node.getCluster(), node.getTraitSet(), 
node, offset, fetch);
--- End diff --

Thank you Julian, RelBuilder seems perfect for this case.

Jinfeng, for now, making this visitor more general and using RelBuilder 
needs bigger changes, so I am adding a TODO(DRILL-3993).


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Jinfeng Ni
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14979502#comment-14979502
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user sudheeshkatkam commented on the pull request:

https://github.com/apache/drill/pull/193#issuecomment-152034356
  
Also, on the execution side, I was actually hitting 
[DRILL-2288](https://issues.apache.org/jira/browse/DRILL-2288), where sending 
exactly one batch with schema and without data is not handled correctly by 
various RecordBatches. With a fix for that issue, we could add further 
optimization for schemaed tables (i.e. add the previous implementation) with 
this implementation as the fallback.


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Jinfeng Ni
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14979500#comment-14979500
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jinfengni commented on the pull request:

https://github.com/apache/drill/pull/193#issuecomment-152034091
  
Sudheesh and I feel this new approach is more like a big optimization step 
towards solving the performance issue for "limit 0" query, rather than hack 
solution :  1) It shows quite significantly reduction in query time, from 
hundreds of seconds to couple of seconds in some cases. That's a big 
improvement. 2) it would benefit not only schema-based query, but also 
schema-less query, while the original approach would only apply for 
schema-based query. 

I agree we should continue to optimize "limit 0" query. But for now, I 
think this new approach has its own merits.

The aggregation /implicit casting are the two things that I can think of, 
if we go with the schema-based approach. 
 



> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Jinfeng Ni
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14979395#comment-14979395
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/193#issuecomment-152018974
  
Got it. Thanks for the explanation. So this is a hack until we can solve 
those issues.

I think we have to do this work, however. a 1-2 second response to a limit 
0 query is too much. We should open up jiras for all of these inconsistency 
issues and then get Calcite and Drill in alignment. 

What do you think we're talking about: aggregation outputs, implicit 
casting. What else? 


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Jinfeng Ni
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14979348#comment-14979348
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jinfengni commented on the pull request:

https://github.com/apache/drill/pull/193#issuecomment-152013964
  
The original approach (skipping the execution phase for limit 0 
completely), actually could potentially have issues in some cases, due to the 
difference in Calcite rule and Drill execution rule, in terms of how type is 
determined.

For example, sum(int) in calcite is resolved to be int, while in Drill 
execution, we changed to bigint. Another case is implicit cast. Currently, 
there are some small differences between Calcite and Drill execution. That 
means, if we skip the execution for limit 0, then types which are resolved in 
Calcite could be different from the type if the query goes through Drill 
execution. For BI tool like Tableau, that means the type returned from "limit 
0" query and type from a second query w/o "limit 0" could be different. 

If we want to avoid the above issues, we have to detect all those cases, 
which are painful. That's why Sudheesh and I are now more inclined to this new 
approach. 
 


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Jinfeng Ni
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14979292#comment-14979292
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/193#issuecomment-152001143
  
What happened to the original strategy of short circuiting on schema'd 
files. This approach still means we have to pay for all the operation 
compilations for no reason.


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Jinfeng Ni
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14979213#comment-14979213
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jinfengni commented on the pull request:

https://github.com/apache/drill/pull/193#issuecomment-151987787
  
Please modify the title of JIRA DRILL-3623, since the new pull request is 
using a completely different approach to address the performance issue for 
"LIMIT 0".


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Jinfeng Ni
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14979209#comment-14979209
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user julianhyde commented on a diff in the pull request:

https://github.com/apache/drill/pull/193#discussion_r43316183
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/FindLimit0Visitor.java
 ---
@@ -46,6 +51,32 @@ public static boolean containsLimit0(RelNode rel) {
 return visitor.isContains();
   }
 
+  public static DrillRel addLimitOnTopOfLeafNodes(final DrillRel rel) {
+final RelShuttleImpl shuttle = new RelShuttleImpl() {
+
+  private RelNode addLimitAsParent(RelNode node) {
+final RexBuilder builder = node.getCluster().getRexBuilder();
+final RexLiteral offset = 
builder.makeExactLiteral(BigDecimal.ZERO);
+final RexLiteral fetch = builder.makeExactLiteral(BigDecimal.ZERO);
+return new DrillLimitRel(node.getCluster(), node.getTraitSet(), 
node, offset, fetch);
--- End diff --

Agree with @jinfengni. In more recent versions of Calcite, use 
RelBuilder.limit() or .sortLimit(). The RelBuilder will be configured to create 
the appropriate Drill variants of all RelNodes. It might also do some useful 
canonization/optimization. We recommend using RelBuilder for most tasks 
involving creating RelNodes.


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Jinfeng Ni
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14979198#comment-14979198
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/193#discussion_r43315255
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/FindLimit0Visitor.java
 ---
@@ -46,6 +51,32 @@ public static boolean containsLimit0(RelNode rel) {
 return visitor.isContains();
   }
 
+  public static DrillRel addLimitOnTopOfLeafNodes(final DrillRel rel) {
+final RelShuttleImpl shuttle = new RelShuttleImpl() {
+
+  private RelNode addLimitAsParent(RelNode node) {
+final RexBuilder builder = node.getCluster().getRexBuilder();
+final RexLiteral offset = 
builder.makeExactLiteral(BigDecimal.ZERO);
+final RexLiteral fetch = builder.makeExactLiteral(BigDecimal.ZERO);
+return new DrillLimitRel(node.getCluster(), node.getTraitSet(), 
node, offset, fetch);
--- End diff --

I understand in your case, you only putDrillLimitRel. But you may want to 
make this Visitor more general, such that it could create any kind of LimitRel, 
including DrillLimitRel, LogicalLimitRel, DrillLimitPrel, etc. You can do that 
by define a LimitFactory, and pass to this Visitor. This's similar to what 
other Calcite rule would do, to make the code more general. 


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Jinfeng Ni
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14979189#comment-14979189
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/193#discussion_r43314473
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/FindLimit0Visitor.java
 ---
@@ -46,6 +51,32 @@ public static boolean containsLimit0(RelNode rel) {
 return visitor.isContains();
   }
 
+  public static DrillRel addLimitOnTopOfLeafNodes(final DrillRel rel) {
+final RelShuttleImpl shuttle = new RelShuttleImpl() {
+
+  private RelNode addLimitAsParent(RelNode node) {
+final RexBuilder builder = node.getCluster().getRexBuilder();
+final RexLiteral offset = 
builder.makeExactLiteral(BigDecimal.ZERO);
+final RexLiteral fetch = builder.makeExactLiteral(BigDecimal.ZERO);
+return new DrillLimitRel(node.getCluster(), node.getTraitSet(), 
node, offset, fetch);
+  }
+
+  @Override
+  public RelNode visit(TableScan scan) {
--- End diff --

You also need override visitValues, since Value could be leaf operator as 
well, in addition to TableScan. 


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Jinfeng Ni
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977652#comment-14977652
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user sudheeshkatkam commented on the pull request:

https://github.com/apache/drill/pull/193#issuecomment-151711063
  
@jinfengni please review.


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Jinfeng Ni
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955464#comment-14955464
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/193#discussion_r41908422
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/NonDeferredSchemaTableLimit0Visitor.java
 ---
@@ -0,0 +1,126 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.sql.handlers;
+
+import com.google.common.collect.Lists;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.RelShuttleImpl;
+import org.apache.calcite.rel.core.TableScan;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.sql.type.SqlTypeFactoryImpl;
+import org.apache.calcite.sql.type.SqlTypeName;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.expr.TypeHelper;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.planner.logical.DrillConstExecutor;
+import org.apache.drill.exec.planner.logical.DrillScanRel;
+import org.apache.drill.exec.planner.logical.DrillTable;
+import org.apache.drill.exec.planner.types.DrillRelDataTypeSystem;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.store.AbstractRecordReader;
+import org.apache.drill.exec.store.RecordReader;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ *
--- End diff --

Yes, this is a reminder for me that I should add comments.


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955462#comment-14955462
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/193#discussion_r41908298
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java
 ---
@@ -162,6 +166,9 @@ public PhysicalPlan getPlan(SqlNode sqlNode) throws 
ValidationException, RelConv
 
 log("Optiq Logical", queryRelNode, logger);
 DrillRel drel = convertToDrel(queryRelNode, validatedRowType);
+if (otherPlan != null) {
--- End diff --

The DrillRel node for pojo record reader needs to have corresponding Prel 
and Prule, right?


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951271#comment-14951271
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/193#discussion_r41682242
  
--- Diff: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/DrillHiveTable.java
 ---
@@ -179,4 +179,9 @@ private void throwUnsupportedHiveDataTypeError(String 
hiveType) {
 
 throw new RuntimeException(errMsg.toString());
   }
+
+  @Override
+  public boolean providesDeferredSchema() {
--- End diff --

I think DrillHiveTable or other schema-aware table would return 
RelRecordType (in Calcite), when calling typeFactory.createStructType() as 
RowType.

For DrillDynamicTable, which is schema-less, the RowType would be 
RelDataTypeDrillImpl.





> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950396#comment-14950396
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/193#discussion_r41630368
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/NonDeferredSchemaTableLimit0Visitor.java
 ---
@@ -0,0 +1,126 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.sql.handlers;
+
+import com.google.common.collect.Lists;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.RelShuttleImpl;
+import org.apache.calcite.rel.core.TableScan;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.sql.type.SqlTypeFactoryImpl;
+import org.apache.calcite.sql.type.SqlTypeName;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.expr.TypeHelper;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.planner.logical.DrillConstExecutor;
+import org.apache.drill.exec.planner.logical.DrillScanRel;
+import org.apache.drill.exec.planner.logical.DrillTable;
+import org.apache.drill.exec.planner.types.DrillRelDataTypeSystem;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.store.AbstractRecordReader;
+import org.apache.drill.exec.store.RecordReader;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ *
+ */
+public class NonDeferredSchemaTableLimit0Visitor extends RelShuttleImpl {
--- End diff --

I think the logic here is incorrect/incomplete. We should be: 

1. traverse the entire tree to determine if all the table leaves are 
FullySchemaed and the root portion of the tree is limit 0. 
  - For schema detection, a better possibility would be to determine if the 
output is fully schemaed as opposed to all the inputs are fully schemaed. Maybe 
this would work if we verify that all of the fields of the root node are not an 
ANY type. @jinfengni could provide comments here.
2. If (1) is true, then we should read the schema of the root node of the 
tree
3. Create a reader for the root node of the tree.

The query here can be arbitrarily complex (many joins, group bys, etc).


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950384#comment-14950384
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/193#discussion_r41629498
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/NonDeferredSchemaTableLimit0Visitor.java
 ---
@@ -0,0 +1,126 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.sql.handlers;
+
+import com.google.common.collect.Lists;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.RelShuttleImpl;
+import org.apache.calcite.rel.core.TableScan;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.sql.type.SqlTypeFactoryImpl;
+import org.apache.calcite.sql.type.SqlTypeName;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.expr.TypeHelper;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.planner.logical.DrillConstExecutor;
+import org.apache.drill.exec.planner.logical.DrillScanRel;
+import org.apache.drill.exec.planner.logical.DrillTable;
+import org.apache.drill.exec.planner.types.DrillRelDataTypeSystem;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.store.AbstractRecordReader;
+import org.apache.drill.exec.store.RecordReader;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ *
--- End diff --

A little more explanation in this javadoc would be good :)


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950383#comment-14950383
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/193#discussion_r41629465
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/NonDeferredSchemaTableLimit0Visitor.java
 ---
@@ -0,0 +1,126 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.sql.handlers;
+
+import com.google.common.collect.Lists;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.RelShuttleImpl;
+import org.apache.calcite.rel.core.TableScan;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.sql.type.SqlTypeFactoryImpl;
+import org.apache.calcite.sql.type.SqlTypeName;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.expr.TypeHelper;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.planner.logical.DrillConstExecutor;
+import org.apache.drill.exec.planner.logical.DrillScanRel;
+import org.apache.drill.exec.planner.logical.DrillTable;
+import org.apache.drill.exec.planner.types.DrillRelDataTypeSystem;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.store.AbstractRecordReader;
+import org.apache.drill.exec.store.RecordReader;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ *
+ */
+public class NonDeferredSchemaTableLimit0Visitor extends RelShuttleImpl {
+//  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(NonDeferredSchemaTableLimit0Visitor.class);
+
+  private RecordReader reader;
+
+  public static RecordReader getReader(RelNode node) {
+final NonDeferredSchemaTableLimit0Visitor visitor = new 
NonDeferredSchemaTableLimit0Visitor();
+node.accept(visitor);
+return visitor.getReader();
+  }
+
+  private NonDeferredSchemaTableLimit0Visitor() {
+  }
+
+  RecordReader getReader() {
+return reader;
+  }
+
+  @Override
+  public RelNode visit(TableScan scan) {
+if (scan instanceof DrillScanRel) {
+  final DrillTable table = ((DrillScanRel) scan).getDrillTable();
+
+  System.out.println("here?");
--- End diff --

yes?


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950381#comment-14950381
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/193#discussion_r41629260
  
--- Diff: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/DrillHiveTable.java
 ---
@@ -179,4 +179,9 @@ private void throwUnsupportedHiveDataTypeError(String 
hiveType) {
 
 throw new RuntimeException(errMsg.toString());
   }
+
+  @Override
+  public boolean providesDeferredSchema() {
--- End diff --

or maybe simply

isFullySchemaed()


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950374#comment-14950374
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/193#discussion_r41628985
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java
 ---
@@ -162,6 +166,9 @@ public PhysicalPlan getPlan(SqlNode sqlNode) throws 
ValidationException, RelConv
 
 log("Optiq Logical", queryRelNode, logger);
 DrillRel drel = convertToDrel(queryRelNode, validatedRowType);
+if (otherPlan != null) {
--- End diff --

This is a hack. Let's create a relnode which is equivalent to using a 
pojorecord reader and then when we create the plan, it will effectively be a 
direct plan.


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950369#comment-14950369
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/193#discussion_r41628774
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DirectPlan.java 
---
@@ -45,11 +52,15 @@ public static PhysicalPlan 
createDirectPlan(QueryContext context, boolean result
   public static  PhysicalPlan createDirectPlan(QueryContext context, T 
obj){
 Iterator iter = (Iterator) Collections.singleton(obj).iterator();
 return createDirectPlan(context.getCurrentEndpoint(), iter, (Class) 
obj.getClass());
+  }
 
+  public static  PhysicalPlan createDirectPlan(DrillbitEndpoint 
endpoint, Iterator iterator, Class clazz) {
+PojoRecordReader reader = new PojoRecordReader<>(clazz, iterator);
+return createDirectPlan(endpoint, reader, ScanStats.TRIVIAL_TABLE);
   }
-  public static  PhysicalPlan createDirectPlan(DrillbitEndpoint 
endpoint, Iterator iterator, Class clazz){
-PojoRecordReader reader = new PojoRecordReader(clazz, iterator);
-DirectGroupScan scan = new DirectGroupScan(reader);
+
+  public static PhysicalPlan createDirectPlan(DrillbitEndpoint endpoint, 
RecordReader reader, ScanStats stats) {
--- End diff --

Let's keep the default behavior static constructor (which uses trivial) and 
then we don't have to update the constructor everywhere.


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950363#comment-14950363
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/193#discussion_r41628570
  
--- Diff: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/DrillHiveTable.java
 ---
@@ -179,4 +179,9 @@ private void throwUnsupportedHiveDataTypeError(String 
hiveType) {
 
 throw new RuntimeException(errMsg.toString());
   }
+
+  @Override
+  public boolean providesDeferredSchema() {
--- End diff --

Double negative here. How about 
hasKnownSchema()

With javadoc.


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950364#comment-14950364
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/193#discussion_r41628637
  
--- Diff: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/DrillHiveTable.java
 ---
@@ -179,4 +179,9 @@ private void throwUnsupportedHiveDataTypeError(String 
hiveType) {
 
 throw new RuntimeException(errMsg.toString());
   }
+
+  @Override
+  public boolean providesDeferredSchema() {
--- End diff --

I believe that there other places (the select * replacement) which depend 
on the same type of property. We should use a single method to determine this. 
@jinfengni , can you point Sudheesh in the right direction?


> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-10-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949565#comment-14949565
 ] 

ASF GitHub Bot commented on DRILL-3623:
---

GitHub user sudheeshkatkam opened a pull request:

https://github.com/apache/drill/pull/193

DRILL-3623: Use shorter query path for LIMIT 0 queries on schema-ed tables

Initial patch.

DrillTable#providesDeferredSchema function is used by the 
NonDeferredSchemaTableLimit0Visitor to check if the table can provide schema 
directly, and if so the result is directly returned.

It seems the shorter query path for this query needs a hacky "otherPlan" in 
the DefaultSqlHandler without major refactoring (Should I go ahead and make 
changes?). This also means that "EXPLAIN PLAN ..." returns a plan that is 
different the actual query plan (without a check in ExplainHandler, another 
hack).

I think the classes need more meaningful names 
(NonDeferredSchemaTableLimit0Visitor).

Also, note the type conversion using CALCITE_TO_DRILL_TYPE_MAPPING.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sudheeshkatkam/drill DRILL-3623

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/193.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #193


commit a766c54b34697df8b851204705ea1ce16c7114b7
Author: Sudheesh Katkam 
Date:   2015-10-08T22:38:00Z

DRILL-3623: Use shorter query path for LIMIT 0 queries on schema-ed tables




> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
> Fix For: Future
>
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-08-27 Thread Rahul Challapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717327#comment-14717327
 ] 

Rahul Challapalli commented on DRILL-3623:
--

The below query returned just fine (on top of hive partitioned table)
{code}
select * from hive.lineitem_text_partitioned_hive_hier_intint limit 0;
+-+++---+-+--+-++---+---+-+---++-+-++---++
| l_orderkey  | l_partkey  | l_suppkey  | l_linenumber  | l_quantity  | 
l_extendedprice  | l_discount  | l_tax  | l_returnflag  | l_linestatus  | 
l_shipdate  | l_commitdate  | l_receiptdate  | l_shipinstruct  | l_shipmode  | 
l_comment  | year  | month  |
+-+++---+-+--+-++---+---+-+---++-+-++---++
+-+++---+-+--+-++---+---+-+---++-+-++---++
No rows selected (5.616 seconds)
{code}

I also created a view  on top of the above partitioned hive table and was able 
to run a "limit 0" query on top of it..

One thing to note however is that the query took a lot of time to return for a 
small data set (~6 seconds)

> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Venki Korukanti
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3623) Hive query hangs with limit 0 clause

2015-08-11 Thread Sudheesh Katkam (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682021#comment-14682021
 ] 

Sudheesh Katkam commented on DRILL-3623:


Can you check the logs for pertinent information?

> Hive query hangs with limit 0 clause
> 
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Venki Korukanti
>
> Running a select * from hive.table limit 0 does not return (hangs).
> Select * from hive.table limit 1 works fine
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)