[jira] [Commented] (DRILL-6099) Drill does not push limit past project (flatten) if it cannot be pushed into scan

2018-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381616#comment-16381616
 ] 

ASF GitHub Bot commented on DRILL-6099:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/1096#discussion_r171479641
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java ---
@@ -224,4 +226,64 @@ public Void visitInputRef(RexInputRef inputRef) {
     }
   }
 
+  public static boolean isLimit0(RexNode fetch) {
+    if (fetch != null && fetch.isA(SqlKind.LITERAL)) {
+      RexLiteral l = (RexLiteral) fetch;
+      switch (l.getTypeName()) {
+        case BIGINT:
+        case INTEGER:
+        case DECIMAL:
+          if (((long) l.getValue2()) == 0) {
+            return true;
+          }
+      }
+    }
+    return false;
+  }
+
+  public static boolean isProjectOutputRowcountUnknown(RelNode project) {
+    assert project instanceof Project : "Rel is NOT an instance of project!";
+    try {
+      RexVisitor visitor =
--- End diff --

Would FLATTEN ever occur within other expressions? I believe it always 
occurs as an independent expression. If that's the case, it seems to me that 
having a visitor is overkill. What do you think? Even the original rewrite 
from project to flatten just iterates over the project exprs here [1].

[1]  
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/visitor/RewriteProjectToFlatten.java#L77
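
The trade-off raised in this comment can be sketched as follows. This is only an illustration: `Expr` is a hypothetical stand-in for a Calcite `RexNode`, and neither method is taken from the pull request. If FLATTEN only ever appears as a top-level project expression, the cheap check and the full tree walk give the same answer; they differ only when FLATTEN is nested inside another call.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a Calcite RexNode: an operator name plus operands.
// A leaf (for example a column reference) has no operands.
final class Expr {
  final String op;
  final List<Expr> operands;

  Expr(String op, Expr... operands) {
    this.op = op;
    this.operands = Arrays.asList(operands);
  }
}

final class FlattenCheckSketch {
  private static boolean isFlatten(Expr e) {
    return "flatten".equalsIgnoreCase(e.op);
  }

  // Cheap check: looks only at the top-level project expressions.
  static boolean hasTopLevelFlatten(List<Expr> projects) {
    for (Expr e : projects) {
      if (isFlatten(e)) {
        return true;
      }
    }
    return false;
  }

  // Full walk: also finds a FLATTEN nested inside another call.
  static boolean hasFlattenAnywhere(List<Expr> projects) {
    for (Expr e : projects) {
      if (isFlatten(e) || hasFlattenAnywhere(e.operands)) {
        return true;
      }
    }
    return false;
  }
}
```

For a project list such as `name=[$0], category=[FLATTEN($1)]` both methods return true; whether the nested case can arise in Drill is exactly the open question in the comment.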


> Drill does not push limit past project (flatten) if it cannot be pushed into 
> scan
> -
>
> Key: DRILL-6099
> URL: https://issues.apache.org/jira/browse/DRILL-6099
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>Priority: Major
> Fix For: 1.13.0
>
>
> It would be useful to have pushdown occur past flatten(project). Here is an 
> example to illustrate the issue:
> {{explain plan without implementation for }}{{select name, 
> flatten(categories) as category from dfs.`/tmp/t_json_20` LIMIT 1;}}
> {{DrillScreenRel}}{{  }}
> {{  DrillLimitRel(fetch=[1])}}{{    }}
> {{    DrillProjectRel(name=[$0], category=[FLATTEN($1)])}}
> {{      DrillScanRel(table=[[dfs, /tmp/t_json_20]], groupscan=[EasyGroupScan 
> [selectionRoot=maprfs:/tmp/t_json_20, numFiles=1, columns=[`name`, 
> `categories`], files=[maprfs:///tmp/t_json_20/0_0_0.json]]])}}
> = 
> Content of 0_0_0.json
> =
> {
>   "name" : "Eric Goldberg, MD",
>   "categories" : [ "Doctors", "Health & Medical" ]
> } {
>   "name" : "Pine Cone Restaurant",
>   "categories" : [ "Restaurants" ]
> } {
>   "name" : "Deforest Family Restaurant",
>   "categories" : [ "American (Traditional)", "Restaurants" ]
> } {
>   "name" : "Culver's",
>   "categories" : [ "Food", "Ice Cream & Frozen Yogurt", "Fast Food", 
> "Restaurants" ]
> } {
>   "name" : "Chang Jiang Chinese Kitchen",
>   "categories" : [ "Chinese", "Restaurants" ]
> } 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6099) Drill does not push limit past project (flatten) if it cannot be pushed into scan

2018-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381617#comment-16381617
 ] 

ASF GitHub Bot commented on DRILL-6099:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/1096#discussion_r171478117
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java ---
@@ -224,4 +226,64 @@ public Void visitInputRef(RexInputRef inputRef) {
     }
   }
 
+  public static boolean isLimit0(RexNode fetch) {
+    if (fetch != null && fetch.isA(SqlKind.LITERAL)) {
+      RexLiteral l = (RexLiteral) fetch;
+      switch (l.getTypeName()) {
+        case BIGINT:
+        case INTEGER:
+        case DECIMAL:
+          if (((long) l.getValue2()) == 0) {
+            return true;
+          }
+      }
+    }
+    return false;
+  }
+
+  public static boolean isProjectOutputRowcountUnknown(RelNode project) {
+    assert project instanceof Project : "Rel is NOT an instance of project!";
+    try {
+      RexVisitor visitor =
+          new RexVisitorImpl(true) {
+            public Void visitCall(RexCall call) {
+              if ("flatten".equals(call.getOperator().getName().toLowerCase())) {
+                throw new Util.FoundOne(call); /* throw exception to interrupt tree walk (this is similar to
+                                                  other utility methods in RexUtil.java */
+              }
+              return super.visitCall(call);
+            }
+          };
+      for (RexNode rex : ((Project) project).getProjects()) {
+        rex.accept(visitor);
+      }
+    } catch (Util.FoundOne e) {
+      Util.swallow(e, null);
+      return true;
+    }
+    return false;
+  }
+
+  public static boolean isProjectOutputSchemaUnknown(RelNode project) {
--- End diff --

Javadoc




[jira] [Commented] (DRILL-6099) Drill does not push limit past project (flatten) if it cannot be pushed into scan

2018-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381615#comment-16381615
 ] 

ASF GitHub Bot commented on DRILL-6099:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/1096#discussion_r171478085
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java ---
@@ -224,4 +226,64 @@ public Void visitInputRef(RexInputRef inputRef) {
     }
   }
 
+  public static boolean isLimit0(RexNode fetch) {
+    if (fetch != null && fetch.isA(SqlKind.LITERAL)) {
+      RexLiteral l = (RexLiteral) fetch;
+      switch (l.getTypeName()) {
+        case BIGINT:
+        case INTEGER:
+        case DECIMAL:
+          if (((long) l.getValue2()) == 0) {
+            return true;
+          }
+      }
+    }
+    return false;
+  }
+
+  public static boolean isProjectOutputRowcountUnknown(RelNode project) {
--- End diff --

Could you add Javadoc for this utility function?




[jira] [Commented] (DRILL-6099) Drill does not push limit past project (flatten) if it cannot be pushed into scan

2018-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381618#comment-16381618
 ] 

ASF GitHub Bot commented on DRILL-6099:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/1096#discussion_r171480227
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule.java ---
@@ -55,18 +62,21 @@ public void onMatch(RelOptRuleCall call) {
     }
   };
 
-  public static DrillPushLimitToScanRule LIMIT_ON_PROJECT =
-      new DrillPushLimitToScanRule(
-          RelOptHelper.some(DrillLimitRel.class, RelOptHelper.some(
-              DrillProjectRel.class, RelOptHelper.any(DrillScanRel.class))),
-          "DrillPushLimitToScanRule_LimitOnProject") {
+  public static DrillPushLimitToScanRule LIMIT_ON_PROJECT = new DrillPushLimitToScanRule(
+      RelOptHelper.some(DrillLimitRel.class, RelOptHelper.any(DrillProjectRel.class)), "DrillPushLimitToScanRule_LimitOnProject") {
     @Override
     public boolean matches(RelOptRuleCall call) {
       DrillLimitRel limitRel = call.rel(0);
-      DrillScanRel scanRel = call.rel(2);
-      // For now only applies to Parquet. And pushdown only apply limit but not offset,
+      DrillProjectRel projectRel = call.rel(1);
+      // pushdown only apply limit but not offset,
       // so if getFetch() return null no need to run this rule.
-      if (scanRel.getGroupScan().supportsLimitPushdown() && (limitRel.getFetch() != null)) {
--- End diff --

One implication of this: if the underlying Scan does not support Limit 
pushdown, you could end up with a plan `Scan->Limit->Project->Limit` 
where the Limit above the Scan is redundant (assuming there is no FLATTEN in 
this query). Can this be avoided?
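
To make the redundancy concrete, here is a small sketch that models the two plans with Java streams. It is not Drill code: the class and method names are hypothetical, and `String::toUpperCase` merely stands in for any project that emits exactly one output row per input row (that is, one without FLATTEN).

```java
import java.util.List;
import java.util.stream.Collectors;

final class RedundantLimitSketch {
  // Models Scan -> Limit -> Project -> Limit with a row-preserving project.
  static List<String> withLowerLimit(List<String> scan, int fetch) {
    return scan.stream()
        .limit(fetch)              // Limit placed directly above the Scan
        .map(String::toUpperCase)  // Project without FLATTEN (one row in, one row out)
        .limit(fetch)              // the original Limit
        .collect(Collectors.toList());
  }

  // Models Scan -> Project -> Limit.
  static List<String> withoutLowerLimit(List<String> scan, int fetch) {
    return scan.stream()
        .map(String::toUpperCase)
        .limit(fetch)
        .collect(Collectors.toList());
  }
}
```

Both methods return the same rows for any input, so when the scan cannot absorb the limit, the second Limit operator adds a plan node without changing the result.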




[jira] [Commented] (DRILL-6191) Need more information on TCP flags

2018-02-28 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381466#comment-16381466
 ] 

Ted Dunning commented on DRILL-6191:


Fixed the test to release results. Updated pull request. This pull may now 
conflict with DRILL-6190, but probably not.

> Need more information on TCP flags
> --
>
> Key: DRILL-6191
> URL: https://issues.apache.org/jira/browse/DRILL-6191
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Ted Dunning
>Assignee: Ted Dunning
>Priority: Major
> Fix For: 1.13.0
>
>
>  
> This is a small fix based on input from Charles Givre





[jira] [Commented] (DRILL-6190) Packets can be bigger than strictly legal

2018-02-28 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381429#comment-16381429
 ] 

Ted Dunning commented on DRILL-6190:


Travis build is fixed:
h3. [ #5031 passed|https://travis-ci.org/apache/drill/builds/347567906]
 *  Ran for 43 min 7 sec

> Packets can be bigger than strictly legal
> -
>
> Key: DRILL-6190
> URL: https://issues.apache.org/jira/browse/DRILL-6190
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Ted Dunning
>Assignee: Ted Dunning
>Priority: Major
> Fix For: 1.13.0
>
>
> Packets, especially those generated by malware, can be bigger than the legal 
> limit for IP. The fix is to leave 64kB padding in the buffers instead of 9kB.
>  
>  





[jira] [Commented] (DRILL-6190) Packets can be bigger than strictly legal

2018-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381377#comment-16381377
 ] 

ASF GitHub Bot commented on DRILL-6190:
---

Github user tdunning commented on the issue:

https://github.com/apache/drill/pull/1133
  

Fixed the test regression. Deferring investigation into why the data field 
doesn't look unique to Drill because we will probably need to revamp how raw 
data is returned anyway.




[jira] [Commented] (DRILL-6197) Duplicate entries in inputProfiles of minor fragments for specific operators

2018-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381306#comment-16381306
 ] 

ASF GitHub Bot commented on DRILL-6197:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/1141#discussion_r171431227
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentStats.java ---
@@ -31,6 +32,13 @@
 public class FragmentStats {
 //  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(FragmentStats.class);
 
+  //Skip operators that already have stats reported by org.apache.drill.exec.physical.impl.BaseRootExec
+  private static final List operatorStatsInitToSkip = Lists.newArrayList(
--- End diff --

This could get out of sync with the types of senders that extend 
BaseRootExec. 
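
One way to avoid that drift, sketched with hypothetical classes only (none of these are Drill's, and this is not what the pull request does), would be to decide from the type hierarchy rather than from a hand-maintained list. Any new operator that extends the base class is then covered automatically.

```java
// Hypothetical stand-ins; RootExecSketch plays the role of BaseRootExec.
abstract class RootExecSketch { }

final class ScreenSketch extends RootExecSketch { }

final class SingleSenderSketch extends RootExecSketch { }

// An ordinary operator that does not extend the root base class.
final class FilterSketch { }

final class StatsSkipSketch {
  // True when the operator's base class already reports its own stats,
  // so a second, fragment-level registration would be a duplicate.
  static boolean reportsOwnStats(Class<?> operatorClass) {
    return RootExecSketch.class.isAssignableFrom(operatorClass);
  }
}
```

Whether the operator class is available at the point where the stats are created is an assumption of this sketch; in the diff above, the decision is keyed on operator type values instead.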


> Duplicate entries in inputProfiles of minor fragments for specific operators
> 
>
> Key: DRILL-6197
> URL: https://issues.apache.org/jira/browse/DRILL-6197
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Monitoring
>Affects Versions: 1.12.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Minor
> Fix For: 1.13.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Minor fragments for the following operators show duplicate entries of the 
> inputProfile ({{org.apache.drill.exec.ops.OperatorStats}} instance) when 
> viewed in the Profile UI.
> e.g
> {code:json}
> {
> ...
>   "query": "select * from sys.version",
> ...
>   [ ...
>   {
>   "inputProfile": [{
>   "records": 0,
>   "batches": 0,
>   "schemas": 0
>   }],
>   "operatorId": 0,
>   "operatorType": 13,
>   "setupNanos": 0,
>   "processNanos": 0,
>   "peakLocalMemoryAllocated": 27131904,
>   "waitNanos": 0
>   },
>   {
>   "inputProfile": [{
>   "records": 1,
>   "batches": 1,
>   "schemas": 1
>   }],
>   "operatorId": 0,
>   "operatorType": 13,
>   "setupNanos": 0,
>   "processNanos": 752448,
>   "peakLocalMemoryAllocated": 27131904,
>   "metric": [{
>   "metricId": 0,
>   "longValue": 178
>   }],
>   "waitNanos": 889492
>   }]
>   ...
> }
> {code}
> {{operatorType: 13}} is the screen operator, for which there can be only one 
> inputProfile.
> It turns out that by default, all minor fragments' operators are provided a 
> list of inputProfiles by 
> {{org.apache.drill.exec.ops.FragmentStats.newOperatorStats(OpProfileDef, 
> BufferAllocator)}}. However, for the following 4 operators, the 
> {{org.apache.drill.exec.physical.impl.BaseRootExec}} constructors also inject 
> {{OperatorStats}}. 
> {code:java}
> org.apache.drill.exec.proto.beans.CoreOperatorType.SCREEN
> org.apache.drill.exec.proto.beans.CoreOperatorType.SINGLE_SENDER
> org.apache.drill.exec.proto.beans.CoreOperatorType.BROADCAST_SENDER
> org.apache.drill.exec.proto.beans.CoreOperatorType.HASH_PARTITION_SENDER
> {code}
> All updates to the inputProfiles are done by the latter, while the former 
> only reports zero values.
> The workaround is to have {{org.apache.drill.exec.ops.FragmentStats}} skip 
> injecting the {{org.apache.drill.exec.ops.OperatorStats}} instance for these 
> operators.





[jira] [Updated] (DRILL-6180) Use System Option "output_batch_size" for External Sort

2018-02-28 Thread Padma Penumarthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Padma Penumarthy updated DRILL-6180:

  Labels: ready-to-commit  (was: )
Reviewer: Paul Rogers

> Use System Option "output_batch_size" for External Sort
> ---
>
> Key: DRILL-6180
> URL: https://issues.apache.org/jira/browse/DRILL-6180
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.12.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Critical
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> External Sort has a boot-time configuration option for output batch size, 
> "drill.exec.sort.external.spill.merge_batch_size", which defaults to 16M.
> To make batch sizing configuration uniform across all operators, change this 
> to use the newly added system option 
> "drill.exec.memory.operator.output_batch_size". This option has a default value 
> of 32M.
> So, what are the implications if the default is changed to 32M for external sort?
> Instead, should we change the output batch size default to 16M for all 
> operators?
>  
>  





[jira] [Commented] (DRILL-6193) Latest Calcite optimized out join condition and cause "This query cannot be planned possibly due to either a cartesian join or an inequality join"

2018-02-28 Thread Aman Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381106#comment-16381106
 ] 

Aman Sinha commented on DRILL-6193:
---

Thanks [~julianhyde]. We currently disable cartesian joins at the SQL level 
if we encounter 'CROSS JOIN', but we cannot detect the case when this keyword is 
omitted, such as:
{noformat}
SELECT col FROM t1, t2; 
SELECT col FROM t1, t2 WHERE t2.a = 5;{noformat}
These are only detected at the logical plan stage. I agree with the scenario 
where a cartesian join is 'safe', namely when one input is provably scalar, and 
we do support it currently for some limited cases (we haven't yet moved to using 
RelMdMaxRowCount due to 
[CALCITE-1048|https://issues.apache.org/jira/browse/CALCITE-1048]).

 

> Latest Calcite optimized out join condition and cause "This query cannot be 
> planned possibly due to either a cartesian join or an inequality join"
> --
>
> Key: DRILL-6193
> URL: https://issues.apache.org/jira/browse/DRILL-6193
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.13.0
>Reporter: Chunhui Shi
>Assignee: Hanumath Rao Maduri
>Priority: Blocker
> Fix For: 1.13.0
>
>
> I got the same error on apache master's MapR profile on the tip(before Hive 
> upgrade) and on changeset 9e944c97ee6f6c0d1705f09d531af35deed2e310, the last 
> commit of Calcite upgrade with the failed query reported in functional test 
> but now it is on parquet file:
>  
> {quote}SELECT L.L_QUANTITY, L.L_DISCOUNT, L.L_EXTENDEDPRICE, L.L_TAX
>  
> FROM cp.`tpch/lineitem.parquet` L, cp.`tpch/orders.parquet` O
> WHERE cast(L.L_ORDERKEY as int) = cast(O.O_ORDERKEY as int) AND 
> cast(L.L_LINENUMBER as int) = 7 AND cast(L.L_ORDERKEY as int) = 10208 AND 
> cast(O.O_ORDERKEY as int) = 10208;
>  {quote}
> However, with Drill built on commit ef0fafea214e866556fa39c902685d48a56001e1, the 
> commit right before the Calcite upgrade commits, the same query worked.
> This was caused by the latest Calcite simplifying the predicates; during this 
> process, "cast(L.L_ORDERKEY as int) = cast(O.O_ORDERKEY as int) " was 
> considered redundant and was removed, so the logical plan of this query 
> gets an always-true condition for the Join:
> {quote}DrillJoinRel(condition=[true], joinType=[inner])
> {quote}
> While in the previous version we have 
> {quote}DrillJoinRel(condition=[=($5, $0)], joinType=[inner])
> {quote}
>  
>  
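
Why the simplifier can treat the equality as redundant is easy to check on a toy example. The sketch below is plain Java with hypothetical names, not Calcite or Drill code: once both sides are pinned to the same constant, adding or dropping the `l == o` test selects exactly the same pairs.

```java
final class ImpliedJoinPredicateSketch {
  // Counts (l, o) pairs where both keys equal the constant;
  // when checkJoin is true, the pair must additionally satisfy l == o.
  static int matches(int[] lKeys, int[] oKeys, int constant, boolean checkJoin) {
    int count = 0;
    for (int l : lKeys) {
      for (int o : oKeys) {
        boolean filters = l == constant && o == constant;
        boolean join = !checkJoin || l == o;
        if (filters && join) {
          count++;
        }
      }
    }
    return count;
  }
}
```

The result set is unchanged, but the plan shape is not: with the equality gone, the join is left with `condition=[true]`, which is what triggers the "cartesian join" planning error quoted in the issue title.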





[jira] [Commented] (DRILL-6197) Duplicate entries in inputProfiles of minor fragments for specific operators

2018-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381096#comment-16381096
 ] 

ASF GitHub Bot commented on DRILL-6197:
---

GitHub user kkhatua opened a pull request:

https://github.com/apache/drill/pull/1141

DRILL-6197: Skip duplicate entry for OperatorStats

`org.apache.drill.exec.ops.FragmentStats` should skip injecting the 
`org.apache.drill.exec.ops.OperatorStats` instance for these operators:
```
org.apache.drill.exec.proto.beans.CoreOperatorType.SCREEN
org.apache.drill.exec.proto.beans.CoreOperatorType.SINGLE_SENDER
org.apache.drill.exec.proto.beans.CoreOperatorType.BROADCAST_SENDER
org.apache.drill.exec.proto.beans.CoreOperatorType.HASH_PARTITION_SENDER
```
They all use the `org.apache.drill.exec.physical.impl.BaseRootExec` to 
inject the correct statistics.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kkhatua/drill DRILL-6197

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1141.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1141


commit f61e0416b10ebf9826540bb0bbe7de5d826de029
Author: Kunal Khatua 
Date:   2018-02-28T21:58:09Z

DRILL-6197: Skip duplicate entry for OperatorStats

org.apache.drill.exec.ops.FragmentStats should skip injecting the 
org.apache.drill.exec.ops.OperatorStats instance for these operators:
org.apache.drill.exec.proto.beans.CoreOperatorType.SCREEN
org.apache.drill.exec.proto.beans.CoreOperatorType.SINGLE_SENDER
org.apache.drill.exec.proto.beans.CoreOperatorType.BROADCAST_SENDER
org.apache.drill.exec.proto.beans.CoreOperatorType.HASH_PARTITION_SENDER





[jira] [Commented] (DRILL-6197) Duplicate entries in inputProfiles of minor fragments for specific operators

2018-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381098#comment-16381098
 ] 

ASF GitHub Bot commented on DRILL-6197:
---

Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/1141
  
@amansinha100  please review




[jira] [Created] (DRILL-6197) Duplicate entries in inputProfiles of minor fragments for specific operators

2018-02-28 Thread Kunal Khatua (JIRA)
Kunal Khatua created DRILL-6197:
---

 Summary: Duplicate entries in inputProfiles of minor fragments for 
specific operators
 Key: DRILL-6197
 URL: https://issues.apache.org/jira/browse/DRILL-6197
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Monitoring
Affects Versions: 1.12.0
Reporter: Kunal Khatua
Assignee: Kunal Khatua
 Fix For: 1.13.0


Minor fragments for the following operators show duplicate entries of the 
inputProfile ({{org.apache.drill.exec.ops.OperatorStats}} instance) when viewed 
in the Profile UI.
e.g
{code:json}
{
...
"query": "select * from sys.version",
...
[ ...
{
"inputProfile": [{
"records": 0,
"batches": 0,
"schemas": 0
}],
"operatorId": 0,
"operatorType": 13,
"setupNanos": 0,
"processNanos": 0,
"peakLocalMemoryAllocated": 27131904,
"waitNanos": 0
},
{
"inputProfile": [{
"records": 1,
"batches": 1,
"schemas": 1
}],
"operatorId": 0,
"operatorType": 13,
"setupNanos": 0,
"processNanos": 752448,
"peakLocalMemoryAllocated": 27131904,
"metric": [{
"metricId": 0,
"longValue": 178
}],
"waitNanos": 889492
}]
...
}
{code}

{{operatorType: 13}} is the screen operator, for which there can be only one 
inputProfile.

It turns out that by default, all minor fragments' operators are provided a list 
of inputProfiles by 
{{org.apache.drill.exec.ops.FragmentStats.newOperatorStats(OpProfileDef, 
BufferAllocator)}}. However, for the following 4 operators, the 
{{org.apache.drill.exec.physical.impl.BaseRootExec}} constructors also inject 
{{OperatorStats}}. 

{code:java}
org.apache.drill.exec.proto.beans.CoreOperatorType.SCREEN
org.apache.drill.exec.proto.beans.CoreOperatorType.SINGLE_SENDER
org.apache.drill.exec.proto.beans.CoreOperatorType.BROADCAST_SENDER
org.apache.drill.exec.proto.beans.CoreOperatorType.HASH_PARTITION_SENDER
{code}

All updates to the inputProfiles are done by the latter, while the former only 
reports zero values.

The workaround is to have {{org.apache.drill.exec.ops.FragmentStats}} skip 
injecting the {{org.apache.drill.exec.ops.OperatorStats}} instance for these 
operators.
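
The workaround described above can be sketched roughly as follows. Everything here is hypothetical and simplified: `OpTypeSketch` is a made-up enum holding the four operator names from this issue plus one ordinary operator, and `FragmentStatsSketch` only records which operators get a fragment-level stats entry. It is not the actual `FragmentStats` code.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical subset of operator types; the four skipped names come from the issue text.
enum OpTypeSketch { SCREEN, SINGLE_SENDER, BROADCAST_SENDER, HASH_PARTITION_SENDER, FILTER }

final class FragmentStatsSketch {
  // Operators whose stats are assumed to be injected elsewhere (by the root exec).
  private static final Set<OpTypeSketch> REPORTED_BY_ROOT_EXEC = EnumSet.of(
      OpTypeSketch.SCREEN,
      OpTypeSketch.SINGLE_SENDER,
      OpTypeSketch.BROADCAST_SENDER,
      OpTypeSketch.HASH_PARTITION_SENDER);

  private final List<OpTypeSketch> registered = new ArrayList<>();

  // Adds a fragment-level entry unless the operator already reports its own stats.
  void newOperatorStats(OpTypeSketch type) {
    if (!REPORTED_BY_ROOT_EXEC.contains(type)) {
      registered.add(type);
    }
  }

  List<OpTypeSketch> registered() {
    return registered;
  }
}
```

With this guard, registering `SCREEN` and `FILTER` leaves a single fragment-level entry (for `FILTER`), so the screen operator would show one inputProfile instead of two.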





[jira] [Comment Edited] (DRILL-6193) Latest Calcite optimized out join condition and cause "This query cannot be planned possibly due to either a cartesian join or an inequality join"

2018-02-28 Thread Hanumath Rao Maduri (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380938#comment-16380938
 ] 

Hanumath Rao Maduri edited comment on DRILL-6193 at 2/28/18 9:18 PM:
-

[~vvysotskyi] Thank you for your input. I have looked at that code. It looks 
like we may require more changes than just overloading the filter method in 
DrillRelBuilder, because the filter method returns the RelBuilder and not just 
the simplified predicates. Currently the client calls the build method to build a 
FilterNode. Unless we overload the build method and use the original predicates to add 
back the removed join predicates, I think we cannot fix it. Please do let me know if 
I am missing anything here.


was (Author: hanu.ncr):
[~vvysotskyi] Thank you for your input. I have looked at that code. It looks 
like we may require more changes than just overloading filter method in 
DrillRelBuilder, because the filter method returns the RelBuilder and the 
simplified predicates. Currently the client will call the build to build a 
FilterNode. Unless we overload the build and use the original predicates to add 
the removed join predicates, I think we cannot fix it. Please do let me know if 
I am missing anything here.

> Latest Calcite optimized out join condition and cause "This query cannot be 
> planned possibly due to either a cartesian join or an inequality join"
> --
>
> Key: DRILL-6193
> URL: https://issues.apache.org/jira/browse/DRILL-6193
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 1.13.0
> Reporter: Chunhui Shi
> Assignee: Hanumath Rao Maduri
> Priority: Blocker
> Fix For: 1.13.0
>
>
> I got the same error on apache master's MapR profile at the tip (before the Hive 
> upgrade) and on changeset 9e944c97ee6f6c0d1705f09d531af35deed2e310, the last 
> commit of the Calcite upgrade, with the failed query reported in the functional 
> tests, but now it is on a parquet file:
>  
> {quote}SELECT L.L_QUANTITY, L.L_DISCOUNT, L.L_EXTENDEDPRICE, L.L_TAX
>  
> FROM cp.`tpch/lineitem.parquet` L, cp.`tpch/orders.parquet` O
> WHERE cast(L.L_ORDERKEY as int) = cast(O.O_ORDERKEY as int) AND 
> cast(L.L_LINENUMBER as int) = 7 AND cast(L.L_ORDERKEY as int) = 10208 AND 
> cast(O.O_ORDERKEY as int) = 10208;
>  {quote}
> However, with Drill built on commit ef0fafea214e866556fa39c902685d48a56001e1, the 
> commit right before the Calcite upgrade commits, the same query worked.
> This was caused by the latest Calcite simplifying the predicates: during this 
> process, "cast(L.L_ORDERKEY as int) = cast(O.O_ORDERKEY as int)" was 
> considered redundant and was removed, so the logical plan of this query 
> gets an always-true condition for the join:
> {quote}DrillJoinRel(condition=[true], joinType=[inner])
> {quote}
> While in the previous version we have 
> {quote}DrillJoinRel(condition=[=($5, $0)], joinType=[inner])
> {quote}
>  
>  





[jira] [Commented] (DRILL-6032) Use RecordBatchSizer to estimate size of columns in HashAgg

2018-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381044#comment-16381044
 ] 

ASF GitHub Bot commented on DRILL-6032:
---

Github user cchang738 commented on the issue:

https://github.com/apache/drill/pull/1101
  
My test fails with OOM. @ilooner has the test log.


> Use RecordBatchSizer to estimate size of columns in HashAgg
> ---
>
> Key: DRILL-6032
> URL: https://issues.apache.org/jira/browse/DRILL-6032
> Project: Apache Drill
> Issue Type: Improvement
> Reporter: Timothy Farkas
> Assignee: Timothy Farkas
> Priority: Major
> Fix For: 1.13.0
>
>
> We need to use the RecordBatchSizer to estimate the size of columns in the 
> Partition batches created by HashAgg.





[jira] [Commented] (DRILL-6193) Latest Calcite optimized out join condition and cause "This query cannot be planned possibly due to either a cartesian join or an inequality join"

2018-02-28 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381032#comment-16381032
 ] 

Julian Hyde commented on DRILL-6193:


By design, you cannot tell where the relational expression came from.

One case where a cartesian join is "safe" is where one or both sides have 
zero or one rows. Then the join has no multiplying effect. Looks like that may 
hold in this case, if you have a PK on Orders. There is a statistic for this, 
RelMdMaxRowCount.

If you are going to disable cartesian joins, why not do it at the SQL level 
rather than the algebra level? There are legitimate patterns where cartesian 
joins are the best plan.
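
A minimal sketch of the "safe" test described above, assuming each join input has an upper bound on its row count available, with null meaning the bound is unknown. The class and method names are hypothetical, and the sketch deliberately avoids calling any Calcite API. It follows the wording above in treating a bounded input on either side as sufficient.

```java
final class CartesianJoinSafetySketch {
  // True only when the input is known to produce zero or one rows.
  // A null bound means "unknown" and is treated as unsafe.
  static boolean atMostOneRow(Double maxRowCount) {
    return maxRowCount != null && maxRowCount <= 1.0;
  }

  // A join with an always-true condition has no multiplying effect when
  // either input is known to produce at most one row.
  static boolean isSafeCartesianJoin(Double leftMaxRows, Double rightMaxRows) {
    return atMostOneRow(leftMaxRows) || atMostOneRow(rightMaxRows);
  }

  public static void main(String[] args) {
    System.out.println(isSafeCartesianJoin(1.0, null));
    System.out.println(isSafeCartesianJoin(null, 1000.0));
  }
}
```

A planner could use a check like this to let through a join whose condition simplified to true, while still rejecting cartesian joins over unbounded inputs.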



[jira] [Commented] (DRILL-6193) Latest Calcite optimized out join condition and cause "This query cannot be planned possibly due to either a cartesian join or an inequality join"

2018-02-28 Thread Aman Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381012#comment-16381012
 ] 

Aman Sinha commented on DRILL-6193:
---

[~julianhyde] is there a way in Calcite to distinguish whether a Join with 
'True' condition was created after doing expression simplification (via 
RexSimplify) vs. one where the original Join itself was a cartesian join? 
Drill currently throws a CannotPlan for cartesian joins, but for the first case 
we could potentially allow it since there is an underlying assumption that this 
simplification was 'safe'.



[jira] [Commented] (DRILL-6193) Latest Calcite optimized out join condition and cause "This query cannot be planned possibly due to either a cartesian join or an inequality join"

2018-02-28 Thread Hanumath Rao Maduri (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380938#comment-16380938
 ] 

Hanumath Rao Maduri commented on DRILL-6193:


[~vvysotskyi] Thank you for your input. I have looked at that code. It looks 
like we may require more changes than just overloading the filter method in 
DrillRelBuilder, because the filter method returns the RelBuilder and the 
simplified predicates. Currently the client calls the build method to build a 
FilterNode. Unless we overload the build method and use the original predicates to add 
back the removed join predicates, I think we cannot fix it. Please do let me know if 
I am missing anything here.



[jira] [Updated] (DRILL-6196) Upgrade HiveTestDataGenerator to leverage "schematool"

2018-02-28 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-6196:
---
Description: 
Since version 2.0, Hive uses 
["schematool"|https://cwiki.apache.org/confluence/display/Hive/Hive+Schema+Tool]
 to create the necessary schema in the metastore on startup if one doesn't 
exist.
 The old method, using the datanucleus property "METASTORE_AUTO_CREATE_ALL", is 
[deprecated|https://github.com/apache/hive/blob/branch-2.1/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L718].

This upgrade is especially needed to add test cases for transactional tables - 
[https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions]

Once DRILL-5978 and DRILL-6195 are merged, test cases need to be created here for 
partitioned and non-partitioned transactional tables.

  was:
Since version 2.0, Hive uses 
["schematool"|https://cwiki.apache.org/confluence/display/Hive/Hive+Schema+Tool]
 to create the necessary schema in the metastore on a startup if one doesn't 
exist.
 The old method via using datanucleus property "METASTORE_AUTO_CREATE_ALL" is 
[deprecated|https://github.com/apache/hive/blob/branch-2.1/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L718].

That is especially needed to add test cases for transactional tables - 
[https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions]


> Upgrade HiveTestDataGenerator to leverage "schematool"
> --
>
> Key: DRILL-6196
> URL: https://issues.apache.org/jira/browse/DRILL-6196
> Project: Apache Drill
> Issue Type: Improvement
> Components: Tools, Build & Test
> Reporter: Vitalii Diravka
> Priority: Minor
>
> Since version 2.0, Hive uses 
> ["schematool"|https://cwiki.apache.org/confluence/display/Hive/Hive+Schema+Tool]
>  to create the necessary schema in the metastore on startup if one doesn't 
> exist.
>  The old method, using the datanucleus property "METASTORE_AUTO_CREATE_ALL", is 
> [deprecated|https://github.com/apache/hive/blob/branch-2.1/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L718].
> This upgrade is especially needed to add test cases for transactional tables - 
> [https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions]
> Once DRILL-5978 and DRILL-6195 are merged, test cases need to be created here for 
> partitioned and non-partitioned transactional tables.





[jira] [Commented] (DRILL-6195) Quering Hive non-partitioned transactional tables via Drill

2018-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380886#comment-16380886
 ] 

ASF GitHub Bot commented on DRILL-6195:
---

Github user vdiravka commented on the issue:

https://github.com/apache/drill/pull/1140
  
@arina-ielchiieva The current implementation of schema creation for the Drill Hive 
embedded metastore doesn't allow creating transactional tables. That's why I 
have created a separate Jira task to update Drill's HiveTestDataGenerator - 
[DRILL-6196](https://issues.apache.org/jira/browse/DRILL-6196).


> Quering Hive non-partitioned transactional tables via Drill
> ---
>
> Key: DRILL-6195
> URL: https://issues.apache.org/jira/browse/DRILL-6195
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Hive
> Affects Versions: 1.12.0
> Reporter: Vitalii Diravka
> Assignee: Vitalii Diravka
> Priority: Major
> Fix For: 1.13.0
>
>
> After updating the Hive client, Drill can query Hive partitioned bucketed tables.
> The same logic can be used for Hive non-partitioned transactional bucketed 
> tables.
> Use case:
> {code}
> Hive
> CREATE TABLE test_txn_2 (userid VARCHAR(64), link STRING, came_from STRING)
> CLUSTERED BY (userid) INTO 8 BUCKETS STORED AS ORC
> TBLPROPERTIES (
>  'transactional'='true'
> );
> INSERT INTO TABLE test_txn_2 VALUES ('jsmith', 'mail.com', 'sports.com'), 
> ('jdoe', 'mail.com', null);
> {code}
> {code}
> 0: jdbc:drill:> select * from hive.test_txn_2;
> Error: SYSTEM ERROR: IOException: Open failed for file: 
> /user/hive/warehouse/test_txn_2, error: Invalid argument (22)
> Setup failed for null
> {code}





[jira] [Commented] (DRILL-6180) Use System Option "output_batch_size" for External Sort

2018-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380861#comment-16380861
 ] 

ASF GitHub Bot commented on DRILL-6180:
---

Github user ppadma commented on the issue:

https://github.com/apache/drill/pull/1129
  
@paul-rogers Made the change you suggested. Please take a look when you get 
a chance. 


> Use System Option "output_batch_size" for External Sort
> ---
>
> Key: DRILL-6180
> URL: https://issues.apache.org/jira/browse/DRILL-6180
> Project: Apache Drill
> Issue Type: Improvement
> Components: Execution - Flow
> Affects Versions: 1.12.0
> Reporter: Padma Penumarthy
> Assignee: Padma Penumarthy
> Priority: Critical
> Fix For: 1.13.0
>
>
> External Sort has a boot-time configuration for output batch size, 
> "drill.exec.sort.external.spill.merge_batch_size", which defaults to 16M.
> To make batch sizing configuration uniform across all operators, change this 
> to use the newly added system option 
> "drill.exec.memory.operator.output_batch_size". This option has a default value 
> of 32M.
> So, what are the implications if the default is changed to 32M for external sort?
> Instead, should we change the output batch size default to 16M for all 
> operators?
>  
>  





[jira] [Commented] (DRILL-6032) Use RecordBatchSizer to estimate size of columns in HashAgg

2018-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380835#comment-16380835
 ] 

ASF GitHub Bot commented on DRILL-6032:
---

Github user priteshm commented on the issue:

https://github.com/apache/drill/pull/1101
  
Spoke with Chun, he will run the tests and update the PR with the test 
results.




[jira] [Commented] (DRILL-6185) Error is displaying while accessing query profiles via the Web-UI

2018-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380828#comment-16380828
 ] 

ASF GitHub Bot commented on DRILL-6185:
---

Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/1137
  
The purpose of parsing the plan as a String is primarily to extract the 
alternative operator names for the UI. The rest of the items are irrelevant for 
that use case. Were you looking to figure out a way to deserialize the plan 
text, so that other scenarios could leverage that?
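
To illustrate what extracting only the operator names might look like, here is a rough sketch. The plan line layout it assumes ("<major>-<operator id>", whitespace, then the operator name) is an assumption for illustration, and PlanOperatorNameSketch is a hypothetical class, not the code in the pull request.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PlanOperatorNameSketch {
  // Assumed line layout: "00-01      Project(version=[$0])".
  // Group 1 is the operator path, group 2 the operator name.
  private static final Pattern OPERATOR_LINE =
      Pattern.compile("^(\\d+-\\d+)\\s+(\\w+)");

  // Maps each operator path to its name, ignoring everything else on the
  // line and skipping lines that do not match the assumed layout.
  static Map<String, String> operatorNames(String planText) {
    Map<String, String> names = new LinkedHashMap<>();
    for (String line : planText.split("\\R")) {
      Matcher m = OPERATOR_LINE.matcher(line.trim());
      if (m.find()) {
        names.put(m.group(1), m.group(2));
      }
    }
    return names;
  }

  public static void main(String[] args) {
    String plan = "00-00    Screen\n00-01      Project(version=[$0])\n";
    System.out.println(operatorNames(plan));
  }
}
```

Because only the leading path and name are read, the remaining plan attributes never need to be deserialized for this use case.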


> Error is displaying while accessing query profiles via the Web-UI
> -
>
> Key: DRILL-6185
> URL: https://issues.apache.org/jira/browse/DRILL-6185
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.13.0
> Reporter: Anton Gozhiy
> Assignee: Kunal Khatua
> Priority: Blocker
> Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> *Steps:*
>  # Execute the following query:
> {code:sql}
> show schemas;
> {code}
> # On the Web-UI, go to the Profiles tab
> # Open the profile for the query you executed
> *Expected result:* You can access the profile entry
> *Actual result:* An error is displayed:
> {code:json}
> {
>   "errorMessage" : "1"
> }
> {code}
> *Note:* This error doesn't happen with every query. For example, the profile for "select * 
> from system.version" can be accessed without error, while "show tables", "use 
> dfs", "alter sessions" etc. end with this error.





[jira] [Updated] (DRILL-6195) Quering Hive non-partitioned transactional tables via Drill

2018-02-28 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6195:

Reviewer: Arina Ielchiieva



[jira] [Commented] (DRILL-6195) Quering Hive non-partitioned transactional tables via Drill

2018-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380764#comment-16380764
 ] 

ASF GitHub Bot commented on DRILL-6195:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1140
  
Looks good, could you add unit test?




[jira] [Commented] (DRILL-6126) Allocate memory for value vectors upfront in flatten operator

2018-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380746#comment-16380746
 ] 

ASF GitHub Bot commented on DRILL-6126:
---

Github user ppadma commented on the issue:

https://github.com/apache/drill/pull/1125
  
@paul-rogers Paul, thanks a lot for your review comments and for bringing up 
some good issues. Just want to let you know that I am working on refactoring the 
batch sizer code and writing a bunch of unit tests to test sizing and vector 
allocation for all the different vector types. I found some bugs in the process and 
fixed them. I will be posting new changes soon and will need your review once they 
are ready.


> Allocate memory for value vectors upfront in flatten operator
> -
>
> Key: DRILL-6126
> URL: https://issues.apache.org/jira/browse/DRILL-6126
> Project: Apache Drill
> Issue Type: Improvement
> Reporter: Padma Penumarthy
> Assignee: Padma Penumarthy
> Priority: Critical
> Fix For: 1.12.0
>
>
> With recent changes to control batch size for the flatten operator, we figure out 
> the row count in the output batch based on memory. Since we know how many rows we 
> are going to include in the batch, we can also allocate the memory needed 
> upfront instead of starting with the initial value (4096) and doubling, copying 
> every time we need more. 
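
The cost of the doubling strategy can be estimated with a small model. This is a simplified sketch, not Drill's allocator: it assumes the initial value 4096 is a capacity counted in values, that capacity doubles on each growth, and that a full vector is copied every time it grows. The class and method names are hypothetical.

```java
final class VectorAllocationSketch {
  // Initial capacity taken from the issue text (assumed to count values).
  static final int INITIAL_CAPACITY = 4096;

  // Number of reallocate-and-copy steps needed to grow from the initial
  // capacity to at least targetCount values by repeated doubling.
  static int doublingSteps(int targetCount) {
    int steps = 0;
    long capacity = INITIAL_CAPACITY;
    while (capacity < targetCount) {
      capacity *= 2;
      steps++;
    }
    return steps;
  }

  // Total values copied across all growth steps, assuming the vector is
  // full each time it grows.
  static long valuesCopied(int targetCount) {
    long copied = 0;
    long capacity = INITIAL_CAPACITY;
    while (capacity < targetCount) {
      copied += capacity;
      capacity *= 2;
    }
    return copied;
  }

  public static void main(String[] args) {
    int rows = 65536;
    System.out.println("growth steps: " + doublingSteps(rows));
    System.out.println("values copied: " + valuesCopied(rows));
  }
}
```

Under this model a 65536-row batch takes 4 growth steps and copies 61440 values per vector, all of which an upfront allocation of the known row count would avoid.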



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6114) Complete internal metadata layer for improved batch handling

2018-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380696#comment-16380696
 ] 

ASF GitHub Bot commented on DRILL-6114:
---

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/1112
  
@arina-ielchiieva, @parthchandra can either of you perhaps give this one a 
committer review? Thanks! 


> Complete internal metadata layer for improved batch handling
> 
>
> Key: DRILL-6114
> URL: https://issues.apache.org/jira/browse/DRILL-6114
> Project: Apache Drill
> Issue Type: Improvement
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
> Fix For: 1.13.0
>
>
> Slice of the ["batch handling" 
> project|https://github.com/paul-rogers/drill/wiki/Batch-Handling-Upgrades] 
> that includes enhancements to the internal metadata system.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6195) Quering Hive non-partitioned transactional tables via Drill

2018-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380693#comment-16380693
 ] 

ASF GitHub Bot commented on DRILL-6195:
---

Github user vdiravka commented on the issue:

https://github.com/apache/drill/pull/1140
  
@arina-ielchiieva Please review




[jira] [Commented] (DRILL-6040) Need to add usage for graceful_stop to drillbit.sh

2018-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380692#comment-16380692
 ] 

ASF GitHub Bot commented on DRILL-6040:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1135#discussion_r171319850
  
--- Diff: distribution/src/resources/drillbit.sh ---
@@ -45,7 +45,7 @@
 # configuration file. The option takes precedence over the
 # DRILL_CONF_DIR environment variable.
 #
-# The command is one of: start|stop|status|restart|run
+# The command is one of: start|stop|status|restart|run|graceful_stop
--- End diff --

This command will be typed by hand sometimes. Can we find a shorter 
command? Retire? Remove? Or, should stop be changed to do this, with a new kill 
for the "ungraceful" stop?


> Need to add usage for graceful_stop to drillbit.sh
> --
>
> Key: DRILL-6040
> URL: https://issues.apache.org/jira/browse/DRILL-6040
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Flow
> Affects Versions: 1.13.0
> Reporter: Krystal
> Assignee: Venkata Jyothsna Donapati
> Priority: Major
> Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> git.commit.id.abbrev=eb0c403
> Usage for graceful_stop is missing from drillbit.sh.
> ./drillbit.sh
> Usage: drillbit.sh [--config|--site ] 
> (start|stop|status|restart|run) [args]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6195) Quering Hive non-partitioned transactional tables via Drill

2018-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380690#comment-16380690
 ] 

ASF GitHub Bot commented on DRILL-6195:
---

GitHub user vdiravka opened a pull request:

https://github.com/apache/drill/pull/1140

DRILL-6195: Quering Hive non-partitioned transactional tables via Drill



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vdiravka/drill DRILL-6195

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1140.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1140


commit 655075f9cea5f41e2770924ceabee79a57ea323e
Author: Vitalii Diravka 
Date:   2018-02-28T14:16:17Z

DRILL-6195: Quering Hive non-partitioned transactional tables via Drill






[jira] [Commented] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2018-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380687#comment-16380687
 ] 

ASF GitHub Bot commented on DRILL-4120:
---

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/1138
  
General comment: if we could move to the new scan framework, it handles 
implicit columns for all file-based readers. It also handles projection, 
missing columns, etc.


> dir0 does not work when the directory structure contains Avro files
> ---
>
> Key: DRILL-4120
> URL: https://issues.apache.org/jira/browse/DRILL-4120
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.3.0
>Reporter: Stefán Baxter
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.13.0
>
> Attachments: 
> 0001-DRILL-4120-Support-reading-directories-having-avro-f.patch
>
>
> Any select statement containing dirN fails if the target directory structure 
> contains Avro files.
> Steps to test:
> 1. create a simple directory structure
> 2. copy an avro file into each directory
> 3. execute a query containing dir0
> outcome:
> Error: VALIDATION ERROR: From line 1, column 117 to line 1, column 120: 
> Column 'dir0' not found in any table





[jira] [Commented] (DRILL-6180) Use System Option "output_batch_size" for External Sort

2018-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380682#comment-16380682
 ] 

ASF GitHub Bot commented on DRILL-6180:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1129#discussion_r171315310
  
--- Diff: exec/java-exec/src/main/resources/drill-module.conf ---
@@ -421,7 +416,7 @@ drill.exec.options: {
 drill.exec.storage.implicit.fqn.column.label: "fqn",
 drill.exec.storage.implicit.suffix.column.label: "suffix",
 drill.exec.testing.controls: "{}",
-drill.exec.memory.operator.output_batch_size : 33554432, # 32 MB
+drill.exec.memory.operator.output_batch_size : 16777216, # 16 MB
--- End diff --

Thanks for making this adjustment.


> Use System Option "output_batch_size" for External Sort
> ---
>
> Key: DRILL-6180
> URL: https://issues.apache.org/jira/browse/DRILL-6180
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.12.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Critical
> Fix For: 1.13.0
>
>
> External Sort has a boot-time configuration option for output batch size, 
> "drill.exec.sort.external.spill.merge_batch_size", which defaults to 16M.
> To make batch sizing configuration uniform across all operators, change this 
> to use the newly added system option 
> "drill.exec.memory.operator.output_batch_size". This option has a default value 
> of 32M.
> So, what are the implications if the default is changed to 32M for external sort?
> Instead, should we change the output batch size default to 16M for all 
> operators?
>  
>  





[jira] [Commented] (DRILL-6180) Use System Option "output_batch_size" for External Sort

2018-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380681#comment-16380681
 ] 

ASF GitHub Bot commented on DRILL-6180:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1129#discussion_r171314892
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/SortConfig.java
 ---
@@ -71,8 +72,8 @@
 
   private final int mSortBatchSize;
 
-  public SortConfig(DrillConfig config) {
-
+  public SortConfig(FragmentContext context) {
+DrillConfig config = context.getConfig();
--- End diff --

Suggestion: pass in the original `DrillConfig` plus an option manager 
rather than the fragment context. The suggestion minimizes undesired 
dependencies.
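
A minimal sketch of what this suggestion could look like, under stated assumptions: `BootConfig` and `OptionSource` are hypothetical stand-ins for Drill's `DrillConfig` and option manager (their real method signatures are not shown in this thread), and the key `sketch.sort.batch_size` is invented purely for illustration. Only the option name `drill.exec.memory.operator.output_batch_size` comes from the issue description.

```java
// Hypothetical stand-ins for Drill's boot-time config and option manager.
// Only the single lookup each side needs is modelled here.
interface BootConfig {
  int getInt(String path);
}

interface OptionSource {
  long getLong(String name);
}

final class SortConfigSketch {
  // Option name taken from the issue description.
  static final String OUTPUT_BATCH_SIZE =
      "drill.exec.memory.operator.output_batch_size";
  // Hypothetical boot-time key, used only to show the config dependency.
  static final String SORT_BATCH_SIZE = "sketch.sort.batch_size";

  private final int sortBatchSize;
  private final long mergeBatchSize;

  // Takes exactly the two collaborators it reads from, not a whole context.
  SortConfigSketch(BootConfig config, OptionSource options) {
    this.sortBatchSize = config.getInt(SORT_BATCH_SIZE);
    this.mergeBatchSize = options.getLong(OUTPUT_BATCH_SIZE);
  }

  int sortBatchSize() {
    return sortBatchSize;
  }

  long mergeBatchSize() {
    return mergeBatchSize;
  }
}
```

With this shape a unit test can supply both collaborators as lambdas, without constructing a fragment context.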




[jira] [Created] (DRILL-6196) Upgrade HiveTestDataGenerator to leverage "schematool"

2018-02-28 Thread Vitalii Diravka (JIRA)
Vitalii Diravka created DRILL-6196:
--

 Summary: Upgrade HiveTestDataGenerator to leverage "schematool"
 Key: DRILL-6196
 URL: https://issues.apache.org/jira/browse/DRILL-6196
 Project: Apache Drill
  Issue Type: Improvement
  Components: Tools, Build & Test
Reporter: Vitalii Diravka


Since version 2.0, Hive uses 
["schematool"|https://cwiki.apache.org/confluence/display/Hive/Hive+Schema+Tool]
 to create the necessary schema in the metastore on startup if one doesn't 
exist.
 The old method, which uses the datanucleus property "METASTORE_AUTO_CREATE_ALL", is 
[deprecated|https://github.com/apache/hive/blob/branch-2.1/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L718].

This upgrade is especially needed to add test cases for transactional tables - 
[https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions]





[jira] [Commented] (DRILL-6193) Latest Calcite optimized out join condition and cause "This query cannot be planned possibly due to either a cartesian join or an inequality join"

2018-02-28 Thread Aman Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380661#comment-16380661
 ] 

Aman Sinha commented on DRILL-6193:
---

[~vvysotskyi]'s suggestion seems reasonable in the near term and the change can 
be made in Drill.  I don't know how much work is involved in overriding the 
base class's filter() method.  

It would be helpful if we could distinguish between a pure cartesian join 
(submitted by the user) and one that resulted after RexSimplify was 
applied. 

BTW, if the query was written with the ON clause, then the join predicate 
remains intact, so this can be a workaround: 
{noformat}
0: jdbc:drill:zk=local> explain plan without implementation for select count(*) 
from cp.`tpch/nation.parquet` n inner join cp.`tpch/region.parquet` r on 
n.n_nationkey = r.r_regionkey where n.n_nationkey = 5 and r.r_regionkey 
| DrillScreenRel
  DrillAggregateRel(group=[{}], EXPR$0=[COUNT()])
    DrillProjectRel($f0=[0])
      DrillJoinRel(condition=[=($0, $1)], joinType=[inner])
        DrillFilterRel(condition=[=($0, 5)])
          DrillScanRel(table=[[cp, tpch/nation.parquet]], 
groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
[path=classpath:/tpch/nation.parquet]], 
selectionRoot=classpath:/tpch/nation.parquet, numFiles=1, numRowGroups=1, 
usedMetadataFile=false, columns=[`n_nationkey`]]])
        DrillFilterRel(condition=[=($0, 5)])
          DrillScanRel(table=[[cp, tpch/region.parquet]], 
groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
[path=classpath:/tpch/region.parquet]], 
selectionRoot=classpath:/tpch/region.parquet, numFiles=1, numRowGroups=1, 
usedMetadataFile=false, columns=[`r_regionkey`]]]){noformat}

> Latest Calcite optimized out join condition and cause "This query cannot be 
> planned possibly due to either a cartesian join or an inequality join"
> --
>
> Key: DRILL-6193
> URL: https://issues.apache.org/jira/browse/DRILL-6193
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.13.0
>Reporter: Chunhui Shi
>Assignee: Hanumath Rao Maduri
>Priority: Blocker
> Fix For: 1.13.0
>
>
> I got the same error on the tip of Apache master with the MapR profile (before 
> the Hive upgrade) and on changeset 9e944c97ee6f6c0d1705f09d531af35deed2e310, 
> the last commit of the Calcite upgrade, using the failed query reported in the 
> functional tests, but now run against a parquet file:
>  
> {quote}SELECT L.L_QUANTITY, L.L_DISCOUNT, L.L_EXTENDEDPRICE, L.L_TAX
>  
> FROM cp.`tpch/lineitem.parquet` L, cp.`tpch/orders.parquet` O
> WHERE cast(L.L_ORDERKEY as int) = cast(O.O_ORDERKEY as int) AND 
> cast(L.L_LINENUMBER as int) = 7 AND cast(L.L_ORDERKEY as int) = 10208 AND 
> cast(O.O_ORDERKEY as int) = 10208;
>  {quote}
> However, with Drill built on commit ef0fafea214e866556fa39c902685d48a56001e1, the 
> commit right before the Calcite upgrade commits, the same query worked.
> This was caused by the latest Calcite simplifying the predicates; during this 
> process, "cast(L.L_ORDERKEY as int) = cast(O.O_ORDERKEY as int) " was 
> considered redundant and was removed, so the logical plan of this query 
> gets an always-true condition for the join:
> {quote}DrillJoinRel(condition=[true], joinType=[inner])
> {quote}
> While in the previous version we have 
> {quote}DrillJoinRel(condition=[=($5, $0)], joinType=[inner])
> {quote}
>  
>  





[jira] [Updated] (DRILL-6189) Security: passwords logging and file permissions

2018-02-28 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6189:

Reviewer: Arina Ielchiieva

> Security: passwords logging and file permissions
> ---
>
> Key: DRILL-6189
> URL: https://issues.apache.org/jira/browse/DRILL-6189
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Volodymyr Tkach
>Assignee: Volodymyr Tkach
>Priority: Major
>
> *Prerequisites:*
>  *1.* Log level is set to "all" in the conf/logback.xml:
> {code:xml}
> 
> 
> 
> 
> {code}
> *2.* PLAIN authentication mechanism is configured:
> {code:java}
>   security.user.auth: {
>   enabled: true,
>   packages += "org.apache.drill.exec.rpc.user.security",
>   impl: "pam",
>   pam_profiles: [ "sudo", "login" ]
>   }
> {code}
> *Steps:*
>  *1.* Start the drillbits
>  *2.* Connect by sqlline:
> {noformat}
> /opt/mapr/drill/drill-1.13.0/bin/sqlline -u "jdbc:drill:zk=node1:5181;" -n 
> user1 -p 
> {noformat}
> *Expected result:* Logs shouldn't contain clear-text passwords
> *Actual results:* During drillbit startup or when establishing connections via 
> JDBC or ODBC, the following lines appear in the drillbit.log:
> {noformat}
> properties {
> key: "password"
> value: ""
> }
> {noformat}
> The same thing happens with storage configuration data: everything, including 
> passwords, is logged to file.
> *Another issue:*
> Currently Drill config files have the permissions 0644:
> {noformat}
> -rw-r--r--. 1 mapr mapr 1081 Nov 16 14:42 core-site-example.xml
> -rwxr-xr-x. 1 mapr mapr 1807 Dec 19 11:55 distrib-env.sh
> -rw-r--r--. 1 mapr mapr 1424 Nov 16 14:42 distrib-env.sh.prejmx
> -rw-r--r--. 1 mapr mapr 1942 Nov 16 14:42 drill-am-log.xml
> -rw-r--r--. 1 mapr mapr 1279 Dec 19 11:55 drill-distrib.conf
> -rw-r--r--. 1 mapr mapr  117 Nov 16 14:50 drill-distrib-mem-qs.conf
> -rw-r--r--. 1 mapr mapr 6016 Nov 16 14:42 drill-env.sh
> -rw-r--r--. 1 mapr mapr 1855 Nov 16 14:50 drill-on-yarn.conf
> -rw-r--r--. 1 mapr mapr 6913 Nov 16 14:42 drill-on-yarn-example.conf
> -rw-r--r--. 1 mapr mapr 1135 Dec 19 11:55 drill-override.conf
> -rw-r--r--. 1 mapr mapr 7820 Nov 16 14:42 drill-override-example.conf
> -rw-r--r--. 1 mapr mapr 3136 Nov 16 14:42 logback.xml
> -rw-r--r--. 1 mapr mapr  668 Nov 16 14:51 warden.drill-bits.conf
> -rw-r--r--. 1 mapr mapr 1581 Nov 16 14:42 yarn-client-log.xml
> {noformat}
> As they may contain sensitive information, like passwords or secret 
> keys, they should not be readable by everyone. I suggest reducing the 
> permissions to at least 0640.
>  
>  
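
For illustration only, a minimal Java sketch of tightening one config file to 0640 with the standard `java.nio.file` API. The class and method names (`ConfigPermissions`, `restrict`, `describe`) are hypothetical and not part of Drill, and the actual patch may well change permissions in packaging scripts instead. The sketch assumes a POSIX file system; on other file systems these calls throw `UnsupportedOperationException`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Hypothetical helper, not part of Drill.
final class ConfigPermissions {
  // "rw-r-----" is the symbolic form of octal 0640:
  // owner read/write, group read, no access for others.
  static final Set<PosixFilePermission> OWNER_RW_GROUP_R =
      PosixFilePermissions.fromString("rw-r-----");

  // Replaces the file's permission bits with 0640.
  static void restrict(Path file) throws IOException {
    Files.setPosixFilePermissions(file, OWNER_RW_GROUP_R);
  }

  // Returns the current permissions in "rwxrwxrwx" notation.
  static String describe(Path file) throws IOException {
    return PosixFilePermissions.toString(Files.getPosixFilePermissions(file));
  }
}
```

Scripts such as distrib-env.sh in the listing above carry the execute bit, so they would need a different mask than plain config files.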





[jira] [Commented] (DRILL-6189) Security: passwords logging and file permissions

2018-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380646#comment-16380646
 ] 

ASF GitHub Bot commented on DRILL-6189:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1139#discussion_r171307607
  
--- Diff: 
logical/src/main/java/org/apache/drill/common/config/LogicalPlanPersistence.java
 ---
@@ -52,6 +53,7 @@ public LogicalPlanPersistence(DrillConfig conf, 
ScanResult scanResult) {
 mapper.configure(Feature.ALLOW_UNQUOTED_FIELD_NAMES, true);
 mapper.configure(JsonGenerator.Feature.QUOTE_FIELD_NAMES, true);
 mapper.configure(Feature.ALLOW_COMMENTS, true);
+mapper.setFilterProvider(new 
SimpleFilterProvider().setFailOnUnknownId(false));
--- End diff --

Will filtering passwords work when profiles are sent between nodes (i.e. 
when we have several major fragments)?




[jira] [Commented] (DRILL-6189) Security: passwords logging and file permissions

2018-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380645#comment-16380645
 ] 

ASF GitHub Bot commented on DRILL-6189:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1139#discussion_r171307292
  
--- Diff: 
contrib/storage-jdbc/src/main/java/org/apache/drill/exec/store/jdbc/JdbcStorageConfig.java
 ---
@@ -17,13 +17,15 @@
  */
 package org.apache.drill.exec.store.jdbc;
 
+import com.fasterxml.jackson.annotation.JsonFilter;
 import org.apache.drill.common.logical.StoragePluginConfig;
 
 import com.fasterxml.jackson.annotation.JsonCreator;
 import com.fasterxml.jackson.annotation.JsonProperty;
 import com.fasterxml.jackson.annotation.JsonTypeName;
 
 @JsonTypeName(JdbcStorageConfig.NAME)
+@JsonFilter("passwordFilter")
--- End diff --

Please explain how this works?




[jira] [Commented] (DRILL-6189) Security: passwords logging and file permissions

2018-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380644#comment-16380644
 ] 

ASF GitHub Bot commented on DRILL-6189:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/1139#discussion_r171308023
  
--- Diff: 
protocol/src/main/java/org/apache/drill/exec/proto/UserProtos.java ---
@@ -5798,6 +5798,34 @@ public static UserToBitHandshake 
getDefaultInstance() {
 public UserToBitHandshake getDefaultInstanceForType() {
   return defaultInstance;
 }
+public String safeLogString() {
--- End diff --

You cannot add custom methods to proto buffers. Also consider using tabs 
instead of multiple spaces.
Please add to the Jira an example of how the log files looked before and 
after your changes.
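
One possible direction, sketched under assumptions: keep the redaction in a separate helper instead of in the generated `UserProtos` class. The `SafeLog` class below is hypothetical and works on a plain `Map` standing in for the handshake properties; it does not use the real protobuf types. The `properties { ... }` output only loosely mirrors the log excerpt in the issue.

```java
import java.util.Map;

// Hypothetical helper, kept outside the generated protobuf code.
final class SafeLog {
  private static final String MASK = "*******";

  // Renders the properties on one line, masking the value of any
  // key named "password" (compared case-insensitively).
  static String describe(Map<String, String> properties) {
    StringBuilder sb = new StringBuilder("properties {");
    for (Map.Entry<String, String> e : properties.entrySet()) {
      String key = e.getKey();
      String value = "password".equalsIgnoreCase(key) ? MASK : e.getValue();
      sb.append(' ').append(key).append(": \"").append(value).append('"');
    }
    return sb.append(" }").toString();
  }
}
```

Because the helper lives outside the generated file, regenerating the protobuf sources would not drop it.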




[jira] [Commented] (DRILL-6189) Security: passwords logging and file permissions

2018-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380530#comment-16380530
 ] 

ASF GitHub Bot commented on DRILL-6189:
---

GitHub user vladimirtkach opened a pull request:

https://github.com/apache/drill/pull/1139

DRILL-6189: Security: passwords logging and file permissions

1. Overrode serialization methods for instances with passwords
2. Changed file permissions for configuration files

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vladimirtkach/drill DRILL-6189

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1139.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1139


commit 9bf7f464fe921cef92ad9802f56c75b72064b0aa
Author: Vladimir Tkach 
Date:   2018-02-28T11:10:50Z

DRILL-6189: Security: passwords logging and file permissions

1. Overrode serialization methods for instances with passwords
2. Changed file permissions for configuration files






[jira] [Commented] (DRILL-4761) Hit NoClassDefFoundError on class oadd/org/apache/log4j/Logger

2018-02-28 Thread John Humphreys (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380501#comment-16380501
 ] 

John Humphreys commented on DRILL-4761:
---

Seeing same with 1.12 JAR. :(

> Hit NoClassDefFoundError on class oadd/org/apache/log4j/Logger
> --
>
> Key: DRILL-4761
> URL: https://issues.apache.org/jira/browse/DRILL-4761
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: Alfie
>Priority: Minor
>
> I'm using the drill-jdbc-all jar and see a NoClassDefFoundError exception 
> because the class oadd/org/apache/log4j/Logger cannot be found.
> The class is not included in the jar.





[jira] [Created] (DRILL-6195) Querying Hive non-partitioned transactional tables via Drill

2018-02-28 Thread Vitalii Diravka (JIRA)
Vitalii Diravka created DRILL-6195:
--

 Summary: Querying Hive non-partitioned transactional tables via 
Drill
 Key: DRILL-6195
 URL: https://issues.apache.org/jira/browse/DRILL-6195
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Hive
Affects Versions: 1.12.0
Reporter: Vitalii Diravka
Assignee: Vitalii Diravka
 Fix For: 1.13.0


After updating the Hive client, Drill can query Hive partitioned bucketed tables.
The same logic can be used for Hive non-partitioned transactional bucketed 
tables.

Use case:
{code}
Hive
CREATE TABLE test_txn_2 (userid VARCHAR(64), link STRING, came_from STRING)
CLUSTERED BY (userid) INTO 8 BUCKETS STORED AS ORC
TBLPROPERTIES (
 'transactional'='true'
);
INSERT INTO TABLE test_txn_2 VALUES ('jsmith', 'mail.com', 'sports.com'), 
('jdoe', 'mail.com', null);
{code}
{code}
0: jdbc:drill:> select * from hive.test_txn_2;
Error: SYSTEM ERROR: IOException: Open failed for file: 
/user/hive/warehouse/test_txn_2, error: Invalid argument (22)

Setup failed for null
{code}





[jira] [Commented] (DRILL-5932) HiveServer2 queries throw error with jdbc connection

2018-02-28 Thread Willian Mattos Ribeiro (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380342#comment-16380342
 ] 

Willian Mattos Ribeiro commented on DRILL-5932:
---

I'm having the same problem even with version 1.12.

> HiveServer2 queries throw error with jdbc connection
> 
>
> Key: DRILL-5932
> URL: https://issues.apache.org/jira/browse/DRILL-5932
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC, Storage - Hive
>Affects Versions: 1.11.0
> Environment: linux
> 2.3 hive version
>Reporter: tooptoop4
>Priority: Blocker
>
> Basic hive queries all throw error!
> {code:sql}
> copied 
> https://repo1.maven.org/maven2/org/apache/hive/hive-jdbc/2.3.0/hive-jdbc-2.3.0-standalone.jar
>  to /usr/lib/apache-drill-1.11.0/jars/3rdparty/hive-jdbc-2.3.0-standalone.jar
> added this storage plugin:
> {
>   "type": "jdbc",
>   "driver": "org.apache.hive.jdbc.HiveDriver",
>   "url": "jdbc:hive2://host:1/default",
>   "username": "hive",
>   "password": "hive1234",
>   "enabled": true
> }
> [ec2-user@host ~]$ cd /usr/lib/apache-drill-1.11.0
> [ec2-user@host apache-drill-1.11.0]$ ./bin/drill-embedded
> OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support 
> was removed in 8.0
> Nov 01, 2017 7:53:53 AM org.glassfish.jersey.server.ApplicationHandler 
> initialize
> INFO: Initiating Jersey application, version Jersey: 2.8 2014-04-29 
> 01:25:26...
> apache drill 1.11.0
> "this isn't your grandfather's sql"
> 0: jdbc:drill:zk=local> SELECT count(*) FROM hive2.`contact`;
> Error: DATA_READ ERROR: The JDBC storage plugin failed while trying setup the 
> SQL query.
> sql SELECT COUNT(*) AS EXPR$0
> FROM (SELECT 0 AS $f0
> FROM.default.contact) AS t
> plugin hive2
> Fragment 0:0
> [Error Id: 4b293e97-7547-49c5-91da-b9ee2f2184fc on 
> ip-myip.mydomain.orghere.com:31010] (state=,code=0)
> 0: jdbc:drill:zk=local> ALTER SESSION SET `exec.errors.verbose` = true;
> +---+---+
> |  ok   |summary|
> +---+---+
> | true  | exec.errors.verbose updated.  |
> +---+---+
> 1 row selected (0.351 seconds)
> 0: jdbc:drill:zk=local> select * from hive2.contact;
> Error: DATA_READ ERROR: The JDBC storage plugin failed while trying setup the 
> SQL query.
> sql SELECT *
> FROM.default.contact
> plugin hive2
> Fragment 0:0
> [Error Id: fe36b026-e8ff-4354-af6c-6073130680c9 on ip-ip.domain.org.com:31010]
>   (org.apache.hive.service.cli.HiveSQLException) Error while compiling 
> statement: FAILED: ParseException line 2:4 cannot recognize input near '.' 
> 'default' '.' in join source
> org.apache.hive.jdbc.Utils.verifySuccess():267
> org.apache.hive.jdbc.Utils.verifySuccessWithInfo():253
> org.apache.hive.jdbc.HiveStatement.runAsyncOnServer():313
> org.apache.hive.jdbc.HiveStatement.execute():253
> org.apache.hive.jdbc.HiveStatement.executeQuery():476
> org.apache.commons.dbcp.DelegatingStatement.executeQuery():208
> org.apache.commons.dbcp.DelegatingStatement.executeQuery():208
> org.apache.drill.exec.store.jdbc.JdbcRecordReader.setup():177
> org.apache.drill.exec.physical.impl.ScanBatch.():104
> org.apache.drill.exec.physical.impl.ScanBatch.():126
> org.apache.drill.exec.store.jdbc.JdbcBatchCreator.getBatch():40
> org.apache.drill.exec.store.jdbc.JdbcBatchCreator.getBatch():33
> org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():156
> org.apache.drill.exec.physical.impl.ImplCreator.getChildren():179
> org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():136
> org.apache.drill.exec.physical.impl.ImplCreator.getChildren():179
> org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():109
> org.apache.drill.exec.physical.impl.ImplCreator.getExec():87
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():207
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> java.lang.Thread.run():748
>   Caused By (org.apache.hive.service.cli.HiveSQLException) Error while 
> compiling statement: FAILED: ParseException line 2:4 cannot recognize input 
> near '.' 'default' '.' in join source
> org.apache.hive.service.cli.operation.Operation.toSQLException():380
> org.apache.hive.service.cli.operation.SQLOperation.prepare():206
> org.apache.hive.service.cli.operation.SQLOperation.runInternal():290
> org.apache.hive.service.cli.operation.Operation.run():320
> 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal():530
> 
> org.apache.hive.service.cli.session.HiveSes

[jira] [Created] (DRILL-6194) Allow un-caching of parquet metadata or stop queries from failing when metadata is old.

2018-02-28 Thread John Humphreys (JIRA)
John Humphreys created DRILL-6194:
-

 Summary: Allow un-caching of parquet metadata or stop queries from 
failing when metadata is old.
 Key: DRILL-6194
 URL: https://issues.apache.org/jira/browse/DRILL-6194
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Parquet
Affects Versions: 1.10.0
Reporter: John Humphreys


Let's say you have files stored in the standard hierarchical way and the data 
is held in parquet:
 * year/
 ** month/
 *** day/
  filev2.parquet

If you cache the metadata under year/ or one of the other levels, and then you 
replace filev2.parquet with filev3.parquet, you will get errors when running 
queries, complaining that filev2.parquet is not present.

I'm specifically seeing this when using maxdir() and dir0/1/2 for 
year/month/day, but I suspect it's a general issue.

Queries using cached metadata should not fail if the metadata is outdated; they 
should just choose not to use it.  Otherwise there should be an uncache 
operator for the metadata so people can just decide to stop using it.

It's not always efficient to run a metadata refresh before every single query 
you do, and it's difficult to run one from every program that touches HDFS files 
immediately after it touches them.
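
The fallback asked for above could look roughly like the sketch below. This is not Drill's code: the class `MetadataCacheCheck` and both method names are hypothetical, and it assumes the cache is simply a list of file paths that can be checked for existence before it is used.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch, not Drill's implementation: decide whether a cached
// file listing can still be trusted before planning a query with it.
final class MetadataCacheCheck {

  // Returns true only if every file recorded in the cache still exists.
  static boolean isCacheUsable(List<Path> cachedFiles) {
    for (Path file : cachedFiles) {
      if (!Files.exists(file)) {
        return false; // e.g. filev2.parquet was replaced by filev3.parquet
      }
    }
    return true;
  }

  // Uses the cached listing while it is still valid; otherwise falls back to
  // the freshly listed files instead of failing the query.
  static List<Path> filesToScan(List<Path> cachedFiles, List<Path> freshListing) {
    return isCacheUsable(cachedFiles) ? cachedFiles : freshListing;
  }
}
```

Checking every cached file on each query may be too expensive on HDFS, so a real fix would more likely compare directory modification times; the sketch only shows the "ignore stale metadata instead of failing" behaviour.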

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-1170) YARN support for Drill

2018-02-28 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-1170:

Labels: doc-impacting  (was: ready-to-commit)

> YARN support for Drill
> --
>
> Key: DRILL-1170
> URL: https://issues.apache.org/jira/browse/DRILL-1170
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.12.0
>Reporter: Neeraja
>Assignee: Paul Rogers
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.13.0
>
> Attachments: Drill-on-YARNDesignOverview.pdf, 
> Drill-on-YARNUserGuide.pdf
>
>
> This is a tracking item to make Drill work with YARN.
> Below are a few requirements/needs to consider.
> - Drill should run as a YARN-based application, side by side with other 
> YARN-enabled applications (on the same nodes or different nodes). Both memory 
> and CPU resources of Drill should be controlled through this mechanism.
> - As a YARN-enabled application, Drill's resource consumption should be 
> adaptive to the load on the cluster. For example, when there is no load on 
> Drill, Drill should consume no resources on the cluster. As the load on 
> Drill increases, resources permitting, usage should grow proportionally.
> - Low latency is a key requirement for Apache Drill, along with support for 
> multiple users (concurrency in 100s-1000s). This should be supported when run 
> as a YARN application as well.





[jira] [Updated] (DRILL-1170) YARN support for Drill

2018-02-28 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-1170:

Labels: doc-impacting ready-to-commit  (was: doc-impacting)

> YARN support for Drill
> --
>
> Key: DRILL-1170
> URL: https://issues.apache.org/jira/browse/DRILL-1170
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.12.0
>Reporter: Neeraja
>Assignee: Paul Rogers
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.13.0
>
> Attachments: Drill-on-YARNDesignOverview.pdf, 
> Drill-on-YARNUserGuide.pdf
>
>
> This is a tracking item to make Drill work with YARN.
> Below are a few requirements/needs to consider.
> - Drill should run as a YARN-based application, side by side with other 
> YARN-enabled applications (on the same nodes or different nodes). Both memory 
> and CPU resources of Drill should be controlled through this mechanism.
> - As a YARN-enabled application, Drill's resource consumption should be 
> adaptive to the load on the cluster. For example, when there is no load on 
> Drill, Drill should consume no resources on the cluster. As the load on 
> Drill increases, resources permitting, usage should grow proportionally.
> - Low latency is a key requirement for Apache Drill, along with support for 
> multiple users (concurrency in 100s-1000s). This should be supported when run 
> as a YARN application as well.





[jira] [Updated] (DRILL-1170) YARN support for Drill

2018-02-28 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-1170:

Labels: ready-to-commit  (was: )

> YARN support for Drill
> --
>
> Key: DRILL-1170
> URL: https://issues.apache.org/jira/browse/DRILL-1170
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.12.0
>Reporter: Neeraja
>Assignee: Paul Rogers
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
> Attachments: Drill-on-YARNDesignOverview.pdf, 
> Drill-on-YARNUserGuide.pdf
>
>
> This is a tracking item to make Drill work with YARN.
> Below are a few requirements/needs to consider.
> - Drill should run as a YARN-based application, side by side with other 
> YARN-enabled applications (on the same nodes or different nodes). Both memory 
> and CPU resources of Drill should be controlled through this mechanism.
> - As a YARN-enabled application, Drill's resource consumption should be 
> adaptive to the load on the cluster. For example, when there is no load on 
> Drill, Drill should consume no resources on the cluster. As the load on 
> Drill increases, resources permitting, usage should grow proportionally.
> - Low latency is a key requirement for Apache Drill, along with support for 
> multiple users (concurrency in 100s-1000s). This should be supported when run 
> as a YARN application as well.





[jira] [Updated] (DRILL-6153) Revised operator framework

2018-02-28 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6153:

Labels: ready-to-commit  (was: )

> Revised operator framework
> --
>
> Key: DRILL-6153
> URL: https://issues.apache.org/jira/browse/DRILL-6153
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> Adds the core operator framework which is the foundation for the revised scan 
> operators. This is another incremental part of the batch sizing project.





[jira] [Updated] (DRILL-6153) Revised operator framework

2018-02-28 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6153:

Reviewer: Padma Penumarthy

> Revised operator framework
> --
>
> Key: DRILL-6153
> URL: https://issues.apache.org/jira/browse/DRILL-6153
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.13.0
>
>
> Adds the core operator framework which is the foundation for the revised scan 
> operators. This is another incremental part of the batch sizing project.





[jira] [Updated] (DRILL-6153) Revised operator framework

2018-02-28 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6153:

Affects Version/s: 1.13.0

> Revised operator framework
> --
>
> Key: DRILL-6153
> URL: https://issues.apache.org/jira/browse/DRILL-6153
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.13.0
>
>
> Adds the core operator framework which is the foundation for the revised scan 
> operators. This is another incremental part of the batch sizing project.





[jira] [Commented] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2018-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380160#comment-16380160
 ] 

ASF GitHub Bot commented on DRILL-4120:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1138
  
You are basically reverting the changes made in DRILL-3810 to support schema 
validation in Avro. 
The Avro format is strict and has a schema. Should Drill treat it the same way 
or loosen parsing?

We should evaluate the option of keeping the schema for Avro but adding 
implicit columns. Maybe the change won't be as easy as changing 
`AvroDrillTable` to `DynamicDrillTable` but it might be more correct.
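
As a rough illustration of that option (keep the strict schema, add implicit columns), here is a minimal sketch. It is not Drill's `AvroDrillTable`; the class `StrictTableWithImplicitColumns` and its constructor parameters are hypothetical, and only the `dirN` implicit columns are modelled.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not Drill's AvroDrillTable: a table that keeps its
// declared schema strict but also exposes partition-style implicit columns.
final class StrictTableWithImplicitColumns {
  private final Set<String> columns = new LinkedHashSet<>();

  StrictTableWithImplicitColumns(List<String> schemaColumns, int directoryDepth) {
    columns.addAll(schemaColumns);
    for (int i = 0; i < directoryDepth; i++) {
      columns.add("dir" + i); // implicit columns dir0, dir1, ...
    }
  }

  // Validation stays strict: names outside schema + implicit columns fail.
  boolean hasColumn(String name) {
    return columns.contains(name);
  }
}
```

With a table built from columns `id`, `name` and a directory depth of 2, `dir0` and `dir1` would validate while a misspelled column would still be rejected.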

You can also start a thread on the dev / user mailing lists, asking about 
treating Avro as a dynamic format (listing pros and cons), to get feedback from the users. 

[1] https://issues.apache.org/jira/browse/DRILL-3810


> dir0 does not work when the directory structure contains Avro files
> ---
>
> Key: DRILL-4120
> URL: https://issues.apache.org/jira/browse/DRILL-4120
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.3.0
>Reporter: Stefán Baxter
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.13.0
>
> Attachments: 
> 0001-DRILL-4120-Support-reading-directories-having-avro-f.patch
>
>
> Any select statement containing dirN fails if the target directory structure 
> contains Avro files.
> Steps to test:
> 1. create a simple directory structure
> 2. copy an avro file into each directory
> 3. execute a query containing dir0
> outcome:
> Error: VALIDATION ERROR: From line 1, column 117 to line 1, column 120: 
> Column 'dir0' not found in any table





[jira] [Updated] (DRILL-6188) Fix C++ client build on Centos 7 and OSX

2018-02-28 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6188:

Labels: ready-to-commit  (was: )

> Fix C++ client build on Centos 7 and OSX 
> -
>
> Key: DRILL-6188
> URL: https://issues.apache.org/jira/browse/DRILL-6188
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Parth Chandra
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> compile issue on CentOS 7:
> {quote}In file included from 
> /root/default/private-drill/contrib/native/client/src/clientlib/utils.cpp:22:0:
>  /root/default/private-drill/contrib/native/client/src/clientlib/logger.hpp: 
> In constructor 'Drill::Logger::Logger()':
>  
> /root/default/private-drill/contrib/native/client/src/clientlib/logger.hpp:38:29:
>  error: 'cout' is not a member of 'std'
>  m_pOutStream = &std::cout;
>  ^
>  make[2]: *** [src/clientlib/CMakeFiles/drillClient.dir/utils.cpp.o] Error 1
>  make[1]: *** [src/clientlib/CMakeFiles/drillClient.dir/all] Error 2
>  make: *** [all] Error 2
> {quote}
> OSX - has this compile error:
> {quote}In file included from 
> /Users/mapr/private-drill/contrib/native/client/src/clientlib/drillClientImpl.cpp:34:
>  
> /Users/mapr/private-drill/contrib/native/client/src/clientlib/drillClientImpl.hpp:185:39:
>  error: 'm_bHasError' is a private member of 'Drill::DrillClientQueryHandle'
>  void setHasError(bool hasError) { m_bHasError = hasError; }
> ^
>  
> /Users/mapr/private-drill/contrib/native/client/src/clientlib/drillClientImpl.hpp:158:10:
>  note: declared private here
>  bool m_bHasError;
>  ^
> {quote}





[jira] [Updated] (DRILL-6188) Fix C++ client build on Centos 7 and OSX

2018-02-28 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6188:

Reviewer: Arina Ielchiieva

> Fix C++ client build on Centos 7 and OSX 
> -
>
> Key: DRILL-6188
> URL: https://issues.apache.org/jira/browse/DRILL-6188
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Parth Chandra
>Assignee: Parth Chandra
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> compile issue on CentOS 7:
> {quote}In file included from 
> /root/default/private-drill/contrib/native/client/src/clientlib/utils.cpp:22:0:
>  /root/default/private-drill/contrib/native/client/src/clientlib/logger.hpp: 
> In constructor 'Drill::Logger::Logger()':
>  
> /root/default/private-drill/contrib/native/client/src/clientlib/logger.hpp:38:29:
>  error: 'cout' is not a member of 'std'
>  m_pOutStream = &std::cout;
>  ^
>  make[2]: *** [src/clientlib/CMakeFiles/drillClient.dir/utils.cpp.o] Error 1
>  make[1]: *** [src/clientlib/CMakeFiles/drillClient.dir/all] Error 2
>  make: *** [all] Error 2
> {quote}
> OSX - has this compile error:
> {quote}In file included from 
> /Users/mapr/private-drill/contrib/native/client/src/clientlib/drillClientImpl.cpp:34:
>  
> /Users/mapr/private-drill/contrib/native/client/src/clientlib/drillClientImpl.hpp:185:39:
>  error: 'm_bHasError' is a private member of 'Drill::DrillClientQueryHandle'
>  void setHasError(bool hasError) { m_bHasError = hasError; }
> ^
>  
> /Users/mapr/private-drill/contrib/native/client/src/clientlib/drillClientImpl.hpp:158:10:
>  note: declared private here
>  bool m_bHasError;
>  ^
> {quote}





[jira] [Updated] (DRILL-6188) Fix C++ client build on Centos 7 and OSX

2018-02-28 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6188:

Affects Version/s: 1.12.0

> Fix C++ client build on Centos 7 and OSX 
> -
>
> Key: DRILL-6188
> URL: https://issues.apache.org/jira/browse/DRILL-6188
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Parth Chandra
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> compile issue on CentOS 7:
> {quote}In file included from 
> /root/default/private-drill/contrib/native/client/src/clientlib/utils.cpp:22:0:
>  /root/default/private-drill/contrib/native/client/src/clientlib/logger.hpp: 
> In constructor 'Drill::Logger::Logger()':
>  
> /root/default/private-drill/contrib/native/client/src/clientlib/logger.hpp:38:29:
>  error: 'cout' is not a member of 'std'
>  m_pOutStream = &std::cout;
>  ^
>  make[2]: *** [src/clientlib/CMakeFiles/drillClient.dir/utils.cpp.o] Error 1
>  make[1]: *** [src/clientlib/CMakeFiles/drillClient.dir/all] Error 2
>  make: *** [all] Error 2
> {quote}
> OSX - has this compile error:
> {quote}In file included from 
> /Users/mapr/private-drill/contrib/native/client/src/clientlib/drillClientImpl.cpp:34:
>  
> /Users/mapr/private-drill/contrib/native/client/src/clientlib/drillClientImpl.hpp:185:39:
>  error: 'm_bHasError' is a private member of 'Drill::DrillClientQueryHandle'
>  void setHasError(bool hasError) { m_bHasError = hasError; }
> ^
>  
> /Users/mapr/private-drill/contrib/native/client/src/clientlib/drillClientImpl.hpp:158:10:
>  note: declared private here
>  bool m_bHasError;
>  ^
> {quote}





[jira] [Assigned] (DRILL-6188) Fix C++ client build on Centos 7 and OSX

2018-02-28 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-6188:
---

Assignee: Parth Chandra  (was: Arina Ielchiieva)

> Fix C++ client build on Centos 7 and OSX 
> -
>
> Key: DRILL-6188
> URL: https://issues.apache.org/jira/browse/DRILL-6188
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Parth Chandra
>Assignee: Parth Chandra
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> compile issue on CentOS 7:
> {quote}In file included from 
> /root/default/private-drill/contrib/native/client/src/clientlib/utils.cpp:22:0:
>  /root/default/private-drill/contrib/native/client/src/clientlib/logger.hpp: 
> In constructor 'Drill::Logger::Logger()':
>  
> /root/default/private-drill/contrib/native/client/src/clientlib/logger.hpp:38:29:
>  error: 'cout' is not a member of 'std'
>  m_pOutStream = &std::cout;
>  ^
>  make[2]: *** [src/clientlib/CMakeFiles/drillClient.dir/utils.cpp.o] Error 1
>  make[1]: *** [src/clientlib/CMakeFiles/drillClient.dir/all] Error 2
>  make: *** [all] Error 2
> {quote}
> OSX - has this compile error:
> {quote}In file included from 
> /Users/mapr/private-drill/contrib/native/client/src/clientlib/drillClientImpl.cpp:34:
>  
> /Users/mapr/private-drill/contrib/native/client/src/clientlib/drillClientImpl.hpp:185:39:
>  error: 'm_bHasError' is a private member of 'Drill::DrillClientQueryHandle'
>  void setHasError(bool hasError) { m_bHasError = hasError; }
> ^
>  
> /Users/mapr/private-drill/contrib/native/client/src/clientlib/drillClientImpl.hpp:158:10:
>  note: declared private here
>  bool m_bHasError;
>  ^
> {quote}





[jira] [Updated] (DRILL-6188) Fix C++ client build on Centos 7 and OSX

2018-02-28 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6188:

Fix Version/s: 1.13.0

> Fix C++ client build on Centos 7 and OSX 
> -
>
> Key: DRILL-6188
> URL: https://issues.apache.org/jira/browse/DRILL-6188
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Parth Chandra
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> compile issue on CentOS 7:
> {quote}In file included from 
> /root/default/private-drill/contrib/native/client/src/clientlib/utils.cpp:22:0:
>  /root/default/private-drill/contrib/native/client/src/clientlib/logger.hpp: 
> In constructor 'Drill::Logger::Logger()':
>  
> /root/default/private-drill/contrib/native/client/src/clientlib/logger.hpp:38:29:
>  error: 'cout' is not a member of 'std'
>  m_pOutStream = &std::cout;
>  ^
>  make[2]: *** [src/clientlib/CMakeFiles/drillClient.dir/utils.cpp.o] Error 1
>  make[1]: *** [src/clientlib/CMakeFiles/drillClient.dir/all] Error 2
>  make: *** [all] Error 2
> {quote}
> OSX - has this compile error:
> {quote}In file included from 
> /Users/mapr/private-drill/contrib/native/client/src/clientlib/drillClientImpl.cpp:34:
>  
> /Users/mapr/private-drill/contrib/native/client/src/clientlib/drillClientImpl.hpp:185:39:
>  error: 'm_bHasError' is a private member of 'Drill::DrillClientQueryHandle'
>  void setHasError(bool hasError) { m_bHasError = hasError; }
> ^
>  
> /Users/mapr/private-drill/contrib/native/client/src/clientlib/drillClientImpl.hpp:158:10:
>  note: declared private here
>  bool m_bHasError;
>  ^
> {quote}





[jira] [Commented] (DRILL-6188) Fix C++ client build on Centos 7 and OSX

2018-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380132#comment-16380132
 ] 

ASF GitHub Bot commented on DRILL-6188:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1132
  
+1


> Fix C++ client build on Centos 7 and OSX 
> -
>
> Key: DRILL-6188
> URL: https://issues.apache.org/jira/browse/DRILL-6188
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: Parth Chandra
>Assignee: Parth Chandra
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> compile issue on CentOS 7:
> {quote}In file included from 
> /root/default/private-drill/contrib/native/client/src/clientlib/utils.cpp:22:0:
>  /root/default/private-drill/contrib/native/client/src/clientlib/logger.hpp: 
> In constructor 'Drill::Logger::Logger()':
>  
> /root/default/private-drill/contrib/native/client/src/clientlib/logger.hpp:38:29:
>  error: 'cout' is not a member of 'std'
>  m_pOutStream = &std::cout;
>  ^
>  make[2]: *** [src/clientlib/CMakeFiles/drillClient.dir/utils.cpp.o] Error 1
>  make[1]: *** [src/clientlib/CMakeFiles/drillClient.dir/all] Error 2
>  make: *** [all] Error 2
> {quote}
> OSX - has this compile error:
> {quote}In file included from 
> /Users/mapr/private-drill/contrib/native/client/src/clientlib/drillClientImpl.cpp:34:
>  
> /Users/mapr/private-drill/contrib/native/client/src/clientlib/drillClientImpl.hpp:185:39:
>  error: 'm_bHasError' is a private member of 'Drill::DrillClientQueryHandle'
>  void setHasError(bool hasError) { m_bHasError = hasError; }
> ^
>  
> /Users/mapr/private-drill/contrib/native/client/src/clientlib/drillClientImpl.hpp:158:10:
>  note: declared private here
>  bool m_bHasError;
>  ^
> {quote}





[jira] [Assigned] (DRILL-6188) Fix C++ client build on Centos 7 and OSX

2018-02-28 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-6188:
---

Assignee: Arina Ielchiieva

> Fix C++ client build on Centos 7 and OSX 
> -
>
> Key: DRILL-6188
> URL: https://issues.apache.org/jira/browse/DRILL-6188
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Parth Chandra
>Assignee: Arina Ielchiieva
>Priority: Major
>
> compile issue on CentOS 7:
> {quote}In file included from 
> /root/default/private-drill/contrib/native/client/src/clientlib/utils.cpp:22:0:
>  /root/default/private-drill/contrib/native/client/src/clientlib/logger.hpp: 
> In constructor 'Drill::Logger::Logger()':
>  
> /root/default/private-drill/contrib/native/client/src/clientlib/logger.hpp:38:29:
>  error: 'cout' is not a member of 'std'
>  m_pOutStream = &std::cout;
>  ^
>  make[2]: *** [src/clientlib/CMakeFiles/drillClient.dir/utils.cpp.o] Error 1
>  make[1]: *** [src/clientlib/CMakeFiles/drillClient.dir/all] Error 2
>  make: *** [all] Error 2
> {quote}
> OSX - has this compile error:
> {quote}In file included from 
> /Users/mapr/private-drill/contrib/native/client/src/clientlib/drillClientImpl.cpp:34:
>  
> /Users/mapr/private-drill/contrib/native/client/src/clientlib/drillClientImpl.hpp:185:39:
>  error: 'm_bHasError' is a private member of 'Drill::DrillClientQueryHandle'
>  void setHasError(bool hasError) { m_bHasError = hasError; }
> ^
>  
> /Users/mapr/private-drill/contrib/native/client/src/clientlib/drillClientImpl.hpp:158:10:
>  note: declared private here
>  bool m_bHasError;
>  ^
> {quote}





[jira] [Commented] (DRILL-6188) Fix C++ client build on Centos 7 and OSX

2018-02-28 Thread Parth Chandra (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380121#comment-16380121
 ] 

Parth Chandra commented on DRILL-6188:
--

[~arina], [~amansinha100], this was reported by Patrick who has verified the 
fix. Can I please get a committer review?

> Fix C++ client build on Centos 7 and OSX 
> -
>
> Key: DRILL-6188
> URL: https://issues.apache.org/jira/browse/DRILL-6188
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Parth Chandra
>Priority: Major
>
> compile issue on CentOS 7:
> {quote}In file included from 
> /root/default/private-drill/contrib/native/client/src/clientlib/utils.cpp:22:0:
>  /root/default/private-drill/contrib/native/client/src/clientlib/logger.hpp: 
> In constructor 'Drill::Logger::Logger()':
>  
> /root/default/private-drill/contrib/native/client/src/clientlib/logger.hpp:38:29:
>  error: 'cout' is not a member of 'std'
>  m_pOutStream = &std::cout;
>  ^
>  make[2]: *** [src/clientlib/CMakeFiles/drillClient.dir/utils.cpp.o] Error 1
>  make[1]: *** [src/clientlib/CMakeFiles/drillClient.dir/all] Error 2
>  make: *** [all] Error 2
> {quote}
> OSX - has this compile error:
> {quote}In file included from 
> /Users/mapr/private-drill/contrib/native/client/src/clientlib/drillClientImpl.cpp:34:
>  
> /Users/mapr/private-drill/contrib/native/client/src/clientlib/drillClientImpl.hpp:185:39:
>  error: 'm_bHasError' is a private member of 'Drill::DrillClientQueryHandle'
>  void setHasError(bool hasError)
> Unknown macro: \{ m_bHasError = hasError; }
> ^
>  
> /Users/mapr/private-drill/contrib/native/client/src/clientlib/drillClientImpl.hpp:158:10:
>  note: declared private here
>  bool m_bHasError;
>  ^
> {quote}





[jira] [Commented] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2018-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380114#comment-16380114
 ] 

ASF GitHub Bot commented on DRILL-4120:
---

GitHub user vvysotskyi opened a pull request:

https://github.com/apache/drill/pull/1138

DRILL-4120: Allow implicit columns for Avro storage format

The existing implementation of `AvroDrillTable` does not allow dynamic column 
discovery. The `AvroDrillTable.getRowType()` method returns a `RelDataTypeImpl` 
instance with the list of all table columns. This forces the validator to check 
columns from the select list against the `RowType` list, which makes it 
impossible to use implicit columns.

This fix replaces the usage of `AvroDrillTable` with `DynamicDrillTable` for 
the Avro format and also allows non-existent columns to be used in Avro tables, 
to be consistent with other storage formats.
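
The difference between the two validation styles can be sketched as follows. This is only a toy model under stated assumptions: `TableSchema`, `StaticSchema` and `DynamicSchema` are hypothetical names and are not Drill or Calcite classes.

```java
import java.util.List;

// Hypothetical sketch of the two validation styles described above; these
// are not Drill's or Calcite's classes.
interface TableSchema {
  boolean accepts(String column);
}

// Static row type: the validator only accepts columns listed up front.
final class StaticSchema implements TableSchema {
  private final List<String> columns;

  StaticSchema(List<String> columns) {
    this.columns = columns;
  }

  @Override
  public boolean accepts(String column) {
    return columns.contains(column);
  }
}

// Dynamic row type: any column name passes validation and is resolved
// only when the data is actually read.
final class DynamicSchema implements TableSchema {
  @Override
  public boolean accepts(String column) {
    return true;
  }
}
```

Under the static style a query on `dir0` is rejected at validation time, which matches the "Column 'dir0' not found in any table" error in the issue; under the dynamic style the same query passes validation, as does a query on a column that does not exist.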

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vvysotskyi/drill DRILL-4120

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1138.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1138


commit 402accca668481bb6816aad438c867781157fac6
Author: Volodymyr Vysotskyi 
Date:   2018-02-27T16:39:22Z

DRILL-4120: Allow implicit columns for Avro storage format




> dir0 does not work when the directory structure contains Avro files
> ---
>
> Key: DRILL-4120
> URL: https://issues.apache.org/jira/browse/DRILL-4120
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.3.0
>Reporter: Stefán Baxter
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.13.0
>
> Attachments: 
> 0001-DRILL-4120-Support-reading-directories-having-avro-f.patch
>
>
> Any select statement containing dirN fails if the target directory structure 
> contains Avro files.
> Steps to test:
> 1. create a simple directory structure
> 2. copy an avro file into each directory
> 3. execute a query containing dir0
> outcome:
> Error: VALIDATION ERROR: From line 1, column 117 to line 1, column 120: 
> Column 'dir0' not found in any table





[jira] [Commented] (DRILL-6191) Need more information on TCP flags

2018-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380103#comment-16380103
 ] 

ASF GitHub Bot commented on DRILL-6191:
---

Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1134
  
+1


> Need more information on TCP flags
> --
>
> Key: DRILL-6191
> URL: https://issues.apache.org/jira/browse/DRILL-6191
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Ted Dunning
>Assignee: Ted Dunning
>Priority: Major
> Fix For: 1.13.0
>
>
>  
> This is a small fix based on input from Charles Givre





[jira] [Commented] (DRILL-6190) Packets can be bigger than strictly legal

2018-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380097#comment-16380097
 ] 

ASF GitHub Bot commented on DRILL-6190:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1133
  
@tdunning travis fails with `
Failed tests: 
  
TestPcapRecordReader.testDistinctQuery:51->runSQLVerifyCount:56->printResultAndVerifyRowCount:68
 expected:<1> but was:<2>`


> Packets can be bigger than strictly legal
> -
>
> Key: DRILL-6190
> URL: https://issues.apache.org/jira/browse/DRILL-6190
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Ted Dunning
>Assignee: Ted Dunning
>Priority: Major
> Fix For: 1.13.0
>
>
> Packets, especially those generated by malware, can be bigger than the legal 
> limit for IP. The fix is to leave 64kB padding in the buffers instead of 9kB.
>  
>  





[jira] [Commented] (DRILL-6190) Packets can be bigger than strictly legal

2018-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380089#comment-16380089
 ] 

ASF GitHub Bot commented on DRILL-6190:
---

Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1133
  
+1


> Packets can be bigger than strictly legal
> -
>
> Key: DRILL-6190
> URL: https://issues.apache.org/jira/browse/DRILL-6190
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Ted Dunning
>Assignee: Ted Dunning
>Priority: Major
> Fix For: 1.13.0
>
>
> Packets, especially those generated by malware, can be bigger than the legal 
> limit for IP. The fix is to leave 64kB padding in the buffers instead of 9kB.
>  
>  





[jira] [Commented] (DRILL-6193) Latest Calcite optimized out join condition and cause "This query cannot be planned possibly due to either a cartesian join or an inequality join"

2018-02-28 Thread Volodymyr Vysotskyi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380071#comment-16380071
 ] 

Volodymyr Vysotskyi commented on DRILL-6193:


This simplification of the filter condition happens when a new filter is 
created with the {{RelBuilder.filter()}} method. With the Calcite upgrade, we 
started using {{DrillRelBuilder}} instead of Calcite's {{RelBuilder}}.

To fix this issue, we can override the {{RelBuilder.filter()}} method in 
{{DrillRelBuilder}}: after the filter condition is simplified, split the AND 
predicates, check whether each predicate can be used in the join condition, 
and add it back if needed.

Here is an example of how to determine whether an equals predicate may be used 
in the join: 
[https://github.com/apache/calcite/commit/b60b67eb8f62463ccbc230358969ef2450cdbe05?diff=unified#diff-a6a937c185ffdee97b49b98530c5112dR713].
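
The "split, check, add back" idea can be illustrated with a small standalone 
sketch. It is only a toy model: {{EqPredicate}} and {{JoinPredicateRestorer}} 
are hypothetical names, operands are plain strings, and a predicate counts as a 
join candidate when both sides are columns of different tables. A real fix 
would work on Calcite {{RexNode}} conjuncts inside {{DrillRelBuilder}}, which 
this sketch does not attempt to reproduce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical stand-in for an equality predicate; not a Calcite or Drill class. */
final class EqPredicate {
  final String left;
  final String right;

  EqPredicate(String left, String right) {
    this.left = left;
    this.right = right;
  }

  /** True when both sides are column references that belong to different tables. */
  boolean isJoinCandidate() {
    return isColumn(left) && isColumn(right) && !table(left).equals(table(right));
  }

  // Assumption of this toy model: a column looks like "ALIAS.COLUMN", anything else is a literal.
  private static boolean isColumn(String operand) {
    return operand.indexOf('.') > 0 && Character.isLetter(operand.charAt(0));
  }

  private static String table(String column) {
    return column.substring(0, column.indexOf('.'));
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof EqPredicate)) {
      return false;
    }
    EqPredicate other = (EqPredicate) o;
    return left.equals(other.left) && right.equals(other.right);
  }

  @Override
  public int hashCode() {
    return Objects.hash(left, right);
  }

  @Override
  public String toString() {
    return left + " = " + right;
  }
}

final class JoinPredicateRestorer {
  /** Returns the simplified conjuncts plus every original join candidate that simplification dropped. */
  static List<EqPredicate> restore(List<EqPredicate> original, List<EqPredicate> simplified) {
    List<EqPredicate> result = new ArrayList<>(simplified);
    for (EqPredicate predicate : original) {
      if (predicate.isJoinCandidate() && !result.contains(predicate)) {
        result.add(predicate);
      }
    }
    return result;
  }
}
```

With the four predicates from the failing query as {{original}} and the three 
literal comparisons as {{simplified}}, {{restore}} appends the 
{{L.L_ORDERKEY = O.O_ORDERKEY}} predicate again, so the join keeps an equality 
condition.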

> Latest Calcite optimized out join condition and cause "This query cannot be 
> planned possibly due to either a cartesian join or an inequality join"
> --
>
> Key: DRILL-6193
> URL: https://issues.apache.org/jira/browse/DRILL-6193
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.13.0
>Reporter: Chunhui Shi
>Assignee: Hanumath Rao Maduri
>Priority: Blocker
> Fix For: 1.13.0
>
>
> I got the same error on the tip of Apache master with the MapR profile (before 
> the Hive upgrade) and on changeset 9e944c97ee6f6c0d1705f09d531af35deed2e310, 
> the last commit of the Calcite upgrade, with the failed query reported in the 
> functional test, but now it is on a parquet file:
>  
> {quote}SELECT L.L_QUANTITY, L.L_DISCOUNT, L.L_EXTENDEDPRICE, L.L_TAX
>  
> FROM cp.`tpch/lineitem.parquet` L, cp.`tpch/orders.parquet` O
> WHERE cast(L.L_ORDERKEY as int) = cast(O.O_ORDERKEY as int) AND 
> cast(L.L_LINENUMBER as int) = 7 AND cast(L.L_ORDERKEY as int) = 10208 AND 
> cast(O.O_ORDERKEY as int) = 10208;
>  {quote}
> However, with Drill built on commit ef0fafea214e866556fa39c902685d48a56001e1, 
> the commit right before the Calcite upgrade commits, the same query worked.
> This was caused by the latest Calcite simplifying the predicates: during this 
> process, "cast(L.L_ORDERKEY as int) = cast(O.O_ORDERKEY as int)" was 
> considered redundant and was removed, so the logical plan of this query 
> gets an always-true condition for the join:
> {quote}DrillJoinRel(condition=[true], joinType=[inner])
> {quote}
> While in the previous version we have 
> {quote}DrillJoinRel(condition=[=($5, $0)], joinType=[inner])
> {quote}
>  
>  





[jira] [Commented] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2018-02-28 Thread Volodymyr Vysotskyi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379990#comment-16379990
 ] 

Volodymyr Vysotskyi commented on DRILL-4120:


The existing implementation of {{AvroDrillTable}} does not allow dynamic 
column discovery. The {{AvroDrillTable.getRowType()}} method returns a 
{{RelDataTypeImpl}} instance with the list of all table columns. This forces 
the validator to check the columns from the select list against the 
{{RowType}} list, which makes it impossible to use implicit columns.

I think {{AvroDrillTable}} should be replaced by {{DynamicDrillTable}} for the 
Avro format, and non-existent columns should also be allowed in Avro tables, 
to be consistent with other storage formats.
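
The difference between the two table kinds can be shown with a simplified 
sketch. None of the types below are Drill or Calcite classes: 
{{FixedSchemaTable}}, {{DynamicSchemaTable}} and {{SelectListValidator}} are 
hypothetical names that only model "a fixed column list is checked at 
validation time" versus "any column name is accepted and resolved later".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical view of a table as seen by a validator. */
interface TableSketch {
  boolean hasColumn(String name);
}

/** Models a table whose row type lists every column up front. */
final class FixedSchemaTable implements TableSketch {
  private final Set<String> columns;

  FixedSchemaTable(String... columns) {
    this.columns = new HashSet<>(Arrays.asList(columns));
  }

  @Override
  public boolean hasColumn(String name) {
    return columns.contains(name);
  }
}

/** Models a dynamic table: every name is accepted and resolved when the data is read. */
final class DynamicSchemaTable implements TableSketch {
  @Override
  public boolean hasColumn(String name) {
    return true;
  }
}

final class SelectListValidator {
  /** Rejects the first select-list column that the table does not report. */
  static void validate(TableSketch table, List<String> selectList) {
    for (String column : selectList) {
      if (!table.hasColumn(column)) {
        throw new IllegalArgumentException("Column '" + column + "' not found in any table");
      }
    }
  }
}
```

In this model, a select list containing {{dir0}} is rejected by the 
fixed-schema table, because {{dir0}} is not part of the file schema, while the 
dynamic table lets it through.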

> dir0 does not work when the directory structure contains Avro files
> ---
>
> Key: DRILL-4120
> URL: https://issues.apache.org/jira/browse/DRILL-4120
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.3.0
>Reporter: Stefán Baxter
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.13.0
>
> Attachments: 
> 0001-DRILL-4120-Support-reading-directories-having-avro-f.patch
>
>
> Any select statement containing dirN fails if the target directory structure 
> contains Avro files.
> Steps to test:
> 1. create a simple directory structure
> 2. copy an avro file into each directory
> 3. execute a query containing dir0
> outcome:
> Error: VALIDATION ERROR: From line 1, column 117 to line 1, column 120: 
> Column 'dir0' not found in any table





[jira] [Updated] (DRILL-5270) Improve loading of profiles listing in the WebUI

2018-02-28 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-5270:

Fix Version/s: (was: 1.13.0)
   1.14.0

> Improve loading of profiles listing in the WebUI
> 
>
> Key: DRILL-5270
> URL: https://issues.apache.org/jira/browse/DRILL-5270
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.9.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Major
> Fix For: 1.14.0
>
>
> Currently, as the number of profiles increases, we reload the same list of 
> profiles from the FS.
> An ideal improvement would be to detect whether there are any new profiles and 
> reload from disk only then. Otherwise, a cached list is sufficient.
> For a directory of 280K profiles, the load time is close to 6 seconds on a 
> 32-core server. With caching, we can get it down to as little as a few 
> milliseconds.
> To invalidate the cache, we inspect the last modified time of the 
> directory to confirm whether a reload is needed. 
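
A minimal sketch of the caching idea described above, assuming a local 
directory and plain java.nio. The class name {{ProfileListingCache}} and its 
methods are hypothetical and are not taken from the Drill code base. The sketch 
also relies on the file system updating the directory's last modified time 
whenever an entry is added or removed, which depends on the file system and 
its timestamp granularity.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical cache of a directory listing, invalidated by the directory's last modified time. */
final class ProfileListingCache {
  private final Path directory;
  private FileTime cachedModifiedTime;              // null until the first load
  private List<String> cachedNames = Collections.emptyList();
  private int reloadCount;

  ProfileListingCache(Path directory) {
    this.directory = directory;
  }

  /** Returns the cached listing, re-reading the directory only when its timestamp changed. */
  synchronized List<String> list() throws IOException {
    FileTime current = Files.getLastModifiedTime(directory);
    if (!current.equals(cachedModifiedTime)) {
      List<String> names = new ArrayList<>();
      try (DirectoryStream<Path> stream = Files.newDirectoryStream(directory)) {
        for (Path entry : stream) {
          names.add(entry.getFileName().toString());
        }
      }
      Collections.sort(names);
      cachedNames = Collections.unmodifiableList(names);
      cachedModifiedTime = current;
      reloadCount++;
    }
    return cachedNames;
  }

  /** Number of times the directory was actually read; useful for checking cache hits. */
  synchronized int reloadCount() {
    return reloadCount;
  }
}
```

Repeated calls to {{list()}} on an unchanged directory cost one timestamp 
lookup instead of a full directory scan.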





[jira] [Updated] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2018-02-28 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-4120:

Reviewer: Arina Ielchiieva

> dir0 does not work when the directory structure contains Avro files
> ---
>
> Key: DRILL-4120
> URL: https://issues.apache.org/jira/browse/DRILL-4120
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.3.0
>Reporter: Stefán Baxter
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.13.0
>
> Attachments: 
> 0001-DRILL-4120-Support-reading-directories-having-avro-f.patch
>
>
> Any select statement containing dirN fails if the target directory structure 
> contains Avro files.
> Steps to test:
> 1. create a simple directory structure
> 2. copy an avro file into each directory
> 3. execute a query containing dir0
> outcome:
> Error: VALIDATION ERROR: From line 1, column 117 to line 1, column 120: 
> Column 'dir0' not found in any table





[jira] [Updated] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2018-02-28 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-4120:

Fix Version/s: (was: Future)
   1.13.0

> dir0 does not work when the directory structure contains Avro files
> ---
>
> Key: DRILL-4120
> URL: https://issues.apache.org/jira/browse/DRILL-4120
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.3.0
>Reporter: Stefán Baxter
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.13.0
>
> Attachments: 
> 0001-DRILL-4120-Support-reading-directories-having-avro-f.patch
>
>
> Any select statement containing dirN fails if the target directory structure 
> contains Avro files.
> Steps to test:
> 1. create a simple directory structure
> 2. copy an avro file into each directory
> 3. execute a query containing dir0
> outcome:
> Error: VALIDATION ERROR: From line 1, column 117 to line 1, column 120: 
> Column 'dir0' not found in any table





[jira] [Assigned] (DRILL-4120) dir0 does not work when the directory structure contains Avro files

2018-02-28 Thread Volodymyr Vysotskyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi reassigned DRILL-4120:
--

Assignee: Volodymyr Vysotskyi  (was: Bhallamudi Venkata Siva Kamesh)

> dir0 does not work when the directory structure contains Avro files
> ---
>
> Key: DRILL-4120
> URL: https://issues.apache.org/jira/browse/DRILL-4120
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.3.0
>Reporter: Stefán Baxter
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: Future
>
> Attachments: 
> 0001-DRILL-4120-Support-reading-directories-having-avro-f.patch
>
>
> Any select statement containing dirN fails if the target directory structure 
> contains Avro files.
> Steps to test:
> 1. create a simple directory structure
> 2. copy an avro file into each directory
> 3. execute a query containing dir0
> outcome:
> Error: VALIDATION ERROR: From line 1, column 117 to line 1, column 120: 
> Column 'dir0' not found in any table





[jira] [Updated] (DRILL-6185) Error is displaying while accessing query profiles via the Web-UI

2018-02-28 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6185:

Reviewer: Arina Ielchiieva

> Error is displaying while accessing query profiles via the Web-UI
> -
>
> Key: DRILL-6185
> URL: https://issues.apache.org/jira/browse/DRILL-6185
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Anton Gozhiy
>Assignee: Kunal Khatua
>Priority: Blocker
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> *Steps:*
>  # Execute the following query:
> {code:sql}
> show schemas;
> {code}
> # On the Web-UI, go to the Profiles tab
> # Open the profile for the query you executed
> *Expected result:* You can access the profile entry
> *Actual result:* Error is displayed:
> {code:json}
> {
>   "errorMessage" : "1"
> }
> {code}
> *Note:* This error doesn't happen with every query. For example, "select * 
> from system.version" can be accessed without error, while "show tables", "use 
> dfs", "alter sessions", etc. end with this error.





[jira] [Updated] (DRILL-6185) Error is displaying while accessing query profiles via the Web-UI

2018-02-28 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6185:

Labels: ready-to-commit  (was: )

> Error is displaying while accessing query profiles via the Web-UI
> -
>
> Key: DRILL-6185
> URL: https://issues.apache.org/jira/browse/DRILL-6185
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Anton Gozhiy
>Assignee: Kunal Khatua
>Priority: Blocker
>  Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> *Steps:*
>  # Execute the following query:
> {code:sql}
> show schemas;
> {code}
> # On the Web-UI, go to the Profiles tab
> # Open the profile for the query you executed
> *Expected result:* You can access the profile entry
> *Actual result:* Error is displayed:
> {code:json}
> {
>   "errorMessage" : "1"
> }
> {code}
> *Note:* This error doesn't happen with every query. For example, "select * 
> from system.version" can be accessed without error, while "show tables", "use 
> dfs", "alter sessions", etc. end with this error.





[jira] [Commented] (DRILL-6185) Error is displaying while accessing query profiles via the Web-UI

2018-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379954#comment-16379954
 ] 

ASF GitHub Bot commented on DRILL-6185:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1137
  
+1, LGTM.


> Error is displaying while accessing query profiles via the Web-UI
> -
>
> Key: DRILL-6185
> URL: https://issues.apache.org/jira/browse/DRILL-6185
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Anton Gozhiy
>Assignee: Kunal Khatua
>Priority: Blocker
> Fix For: 1.13.0
>
>
> *Steps:*
>  # Execute the following query:
> {code:sql}
> show schemas;
> {code}
> # On the Web-UI, go to the Profiles tab
> # Open the profile for the query you executed
> *Expected result:* You can access the profile entry
> *Actual result:* Error is displayed:
> {code:json}
> {
>   "errorMessage" : "1"
> }
> {code}
> *Note:* This error doesn't happen with every query. For example, "select * 
> from system.version" can be accessed without error, while "show tables", "use 
> dfs", "alter sessions", etc. end with this error.


