[jira] [Commented] (DRILL-6611) Add [meta]-[Enter] js handler for query form submission

2018-07-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16555164#comment-16555164
 ] 

ASF GitHub Bot commented on DRILL-6611:
---

kkhatua commented on issue #1392: Implements DRILL-6611 to enable meta-enter 
query submission in web query interface
URL: https://github.com/apache/drill/pull/1392#issuecomment-407639822
 
 
   @hrbrmstr can you also change the PR's title to 
   **DRILL-6611: Enable meta-enter query submission in web query interface**
   This will let the Apache JIRA system pick up the PR and link to it 
automatically.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add [meta]-[Enter] js handler for query form submission
> ---
>
> Key: DRILL-6611
> URL: https://issues.apache.org/jira/browse/DRILL-6611
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.14.0
>Reporter: Bob Rudis
>Assignee: Bob Rudis
>Priority: Minor
>  Labels: doc-impacting
> Fix For: 1.15.0
>
>
> The new ACE-based SQL query editor is great. Being able to submit the form 
> without using a mouse would be even better.
> Adding:
>  
> {noformat}
> document.getElementById('queryForm')
>  .addEventListener('keydown', function(e) {
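>  // Ignore every keystroke except [Meta]+[Enter]: keyCode 13 is Enter, and
>  // e.metaKey is Cmd on macOS (the platform's meta modifier elsewhere).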
>  if (!(e.keyCode == 13 && e.metaKey)) return;
>  if (e.target.form) doSubmitQueryWithUserName();
> });
> {noformat}
> to {{./exec/java-exec/src/main/resources/rest/query/query.ftl}} adds such 
> support.
> I can file a PR with the code if desired.
> --
> Functionality (for the documentation):
> This JIRA's commit introduces the following to Drill:
> When composing queries in the web query editor it is now possible to submit 
> the query text by using the {{Meta+Enter}} key combination. This will trigger 
> the same action as pressing the {{Submit}} button. On Mac keyboards 
> {{Meta+Enter}} is {{Cmd+Enter}}. On Windows or Linux it is {{Ctrl+Enter}}, though 
> Linux users may have keymapped the {{Meta}} key to another physical keyboard 
> key.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6385) Support JPPD (Join Predicate Push Down)

2018-07-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16555113#comment-16555113
 ] 

ASF GitHub Bot commented on DRILL-6385:
---

weijietong commented on issue #1334: DRILL-6385: Support JPPD feature
URL: https://github.com/apache/drill/pull/1334#issuecomment-407627190
 
 
   @amansinha100 thanks for your valuable review. Since I am on vacation, the 
other comments will be addressed and the PR updated later.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support JPPD (Join Predicate Push Down)
> ---
>
> Key: DRILL-6385
> URL: https://issues.apache.org/jira/browse/DRILL-6385
> Project: Apache Drill
>  Issue Type: New Feature
>  Components:  Server, Execution - Flow
>Affects Versions: 1.14.0
>Reporter: weijie.tong
>Assignee: weijie.tong
>Priority: Major
>
> This feature adds support for JPPD (Join Predicate Push Down). It will 
> benefit HashJoin and Broadcast HashJoin performance by reducing both the 
> number of rows sent across the network and the memory consumed. This feature 
> is already supported by Impala, which calls it RuntimeFilter 
> ([https://www.cloudera.com/documentation/enterprise/5-9-x/topics/impala_runtime_filtering.html]).
>  The first PR will try to push down a bloom filter from the HashJoin node to 
> Parquet's scan node. The proposed basic procedure is as follows (a sketch of 
> the bloom filter idea appears after this quoted description):
>  # The HashJoin build side accumulates the equality join condition rows to 
> construct a bloom filter, then sends the bloom filter to the foreman node.
>  # The foreman node passively accepts the bloom filters from all the 
> fragments that have the HashJoin operator. It then aggregates the bloom 
> filters to form a global bloom filter.
>  # The foreman node broadcasts the global bloom filter to all the probe side 
> scan nodes, which may already have sent out partial data to the hash join 
> nodes (currently the hash join node prefetches one batch from both sides).
>  # The scan node accepts the global bloom filter from the foreman node and 
> uses it to filter the remaining rows.
>  
> To implement the above execution flow, the main new notions are described 
> below:
>       1. RuntimeFilter
> A filter container which may contain a BloomFilter or a MinMaxFilter.
>       2. RuntimeFilterReporter
> It wraps the logic to send the hash join's bloom filter to the foreman. The 
> serialized bloom filter is sent out through the data tunnel. This object is 
> instantiated by the FragmentExecutor and passed to the FragmentContext, so 
> the HashJoin operator can obtain it through the FragmentContext.
>      3. RuntimeFilterRequestHandler
> It is responsible for accepting a SendRuntimeFilterRequest RPC and stripping 
> the actual BloomFilter from the network. It then hands this filter to the 
> WorkerBee's new registerRuntimeFilter interface.
> Another RPC type is BroadcastRuntimeFilterRequest. It registers the accepted 
> global bloom filter with the WorkerBee via the registerRuntimeFilter method 
> and then propagates it to the FragmentContext, through which the probe side 
> scan node can fetch the aggregated bloom filter.
>       4. RuntimeFilterManager
> The foreman instantiates a RuntimeFilterManager. It indirectly obtains every 
> RuntimeFilter via the WorkerBee. Once all the BloomFilters have been accepted 
> and aggregated, it broadcasts the aggregated bloom filter to all the probe 
> side scan nodes through the data tunnel via a BroadcastRuntimeFilterRequest 
> RPC.
>      5. RuntimeFilterEnableOption 
>  A global option will be added to decide whether to enable this new feature.
>  
> Suggestions and advice are welcome. The related PR will be presented as soon 
> as possible.
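
For illustration, here is a minimal, self-contained sketch of the bloom filter
idea behind the four steps above. This is a hedged example, not Drill's actual
BloomFilter or RuntimeFilterReporter code; the class name and the two-position
hashing scheme are hypothetical stand-ins:

{noformat}
import java.util.BitSet;

// Hypothetical sketch: the build side hashes each join-key value into a bit
// set, the foreman ORs the per-fragment bit sets together, and the probe-side
// scan drops rows whose key cannot possibly be on the build side.
public class BloomFilterSketch {
  private final BitSet bits;
  private final int numBits;

  public BloomFilterSketch(int numBits) {
    this.numBits = numBits;
    this.bits = new BitSet(numBits);
  }

  // Two derived bit positions per key; real implementations use k independent
  // hash functions (e.g. Murmur) over a serializable byte[].
  private int pos1(long hash) { return (int) Math.floorMod(hash, (long) numBits); }
  private int pos2(long hash) { return (int) Math.floorMod(hash * 31 + 17, (long) numBits); }

  // Build side (step 1): accumulate the hash of each join-key row.
  public void add(long keyHash) {
    bits.set(pos1(keyHash));
    bits.set(pos2(keyHash));
  }

  // Foreman side (step 2): aggregate per-fragment filters by OR-ing their bits.
  public void merge(BloomFilterSketch other) {
    bits.or(other.bits);
  }

  // Probe-side scan (step 4): false means the key is definitely absent from
  // the build side, so the row can be filtered out.
  public boolean mightContain(long keyHash) {
    return bits.get(pos1(keyHash)) && bits.get(pos2(keyHash));
  }
}
{noformat}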



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6385) Support JPPD (Join Predicate Push Down)

2018-07-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16555105#comment-16555105
 ] 

ASF GitHub Bot commented on DRILL-6385:
---

weijietong commented on a change in pull request #1334: DRILL-6385: Support 
JPPD feature
URL: https://github.com/apache/drill/pull/1334#discussion_r204974446
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/filter/BloomFilterCreator.java
 ##
 @@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.work.filter;
+
+import io.netty.buffer.DrillBuf;
+import org.apache.drill.exec.memory.BufferAllocator;
+
+public class BloomFilterCreator {
 
 Review comment:
   The current implementation is one bloom filter per join column. In your 
example, a multi-column join would generate two bloom filters. The reason for 
this implementation is to achieve one vector-column memory access per hash 
computation. But the Murmur hash's complex computation ate up the pipeline 
performance, so the resulting performance is not good. I will change it to the 
implementation you suggested.
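
To make the trade-off concrete, here is a hedged sketch of the two hashing
designs under discussion (the class and helper names are hypothetical, not the
PR's BloomFilterCreator API):

{noformat}
import java.util.Arrays;
import java.util.Objects;

// Hypothetical illustration of the two designs discussed above.
public class JoinFilterHashing {

  // Design 1 (current PR): one bloom filter per join column. Each hash reads
  // a single column vector, so memory access stays within one vector per
  // computation, but an n-column join needs n filters and n hashes per row.
  static long[] perColumnHashes(Object[] joinKeyColumnsOfRow) {
    long[] hashes = new long[joinKeyColumnsOfRow.length];
    for (int i = 0; i < joinKeyColumnsOfRow.length; i++) {
      hashes[i] = Objects.hashCode(joinKeyColumnsOfRow[i]); // stand-in for Murmur
    }
    return hashes;
  }

  // Design 2 (suggested in the review): one filter over the combined key.
  // Only a single hash per row, but computing it must touch every join column.
  static long combinedHash(Object[] joinKeyColumnsOfRow) {
    return Arrays.hashCode(joinKeyColumnsOfRow); // stand-in for a combined Murmur hash
  }
}
{noformat}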


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (DRILL-6385) Support JPPD (Join Predicate Push Down)

2018-07-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16555102#comment-16555102
 ] 

ASF GitHub Bot commented on DRILL-6385:
---

weijietong commented on a change in pull request #1334: DRILL-6385: Support 
JPPD feature
URL: https://github.com/apache/drill/pull/1334#discussion_r204973381
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/filter/RuntimeFilterManager.java
 ##
 @@ -0,0 +1,586 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.work.filter;
+
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.JoinInfo;
+import org.apache.calcite.rel.core.JoinRelType;
+import org.apache.calcite.rel.metadata.RelMetadataQuery;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.commons.collections.CollectionUtils;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.ops.AccountingDataTunnel;
+import org.apache.drill.exec.ops.Consumer;
+import org.apache.drill.exec.ops.QueryContext;
+import org.apache.drill.exec.ops.SendingAccountor;
+import org.apache.drill.exec.ops.StatusHandler;
+import org.apache.drill.exec.physical.PhysicalPlan;
+
+import org.apache.drill.exec.physical.base.AbstractPhysicalVisitor;
+import org.apache.drill.exec.physical.base.Exchange;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.physical.base.PhysicalOperator;
+import org.apache.drill.exec.physical.config.BroadcastExchange;
+import org.apache.drill.exec.physical.config.HashJoinPOP;
+import org.apache.drill.exec.planner.fragment.Fragment;
+import org.apache.drill.exec.planner.fragment.Wrapper;
+import org.apache.drill.exec.planner.physical.HashJoinPrel;
+import org.apache.drill.exec.planner.physical.ScanPrel;
+import org.apache.drill.exec.proto.BitData;
+import org.apache.drill.exec.proto.CoordinationProtos;
+import org.apache.drill.exec.proto.GeneralRPCProtos;
+import org.apache.drill.exec.proto.UserBitShared;
+import org.apache.drill.exec.proto.helper.QueryIdHelper;
+import org.apache.drill.exec.rpc.RpcException;
+import org.apache.drill.exec.rpc.RpcOutcomeListener;
+import org.apache.drill.exec.rpc.data.DataTunnel;
+import org.apache.drill.exec.server.DrillbitContext;
+import org.apache.drill.exec.util.Pointer;
+import org.apache.drill.exec.work.QueryWorkUnit;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.ConcurrentHashMap;
+
+/**
+ * This class traverses the physical operator tree to find the HashJoin
+ * operators for which JPPD (join predicate push down) is possible. The
+ * prerequisites for JPPD are:
+ * 1. The join condition is an equality
+ * 2. The physical join node is a HashJoin
+ * 3. The probe side children of the HashJoin node do not contain a blocking
+ * operator such as HashAgg
+ */
+public class RuntimeFilterManager {
+
+  private Wrapper rootWrapper;
+  //HashJoin node's major fragment id to its corresponding probe side nodes' endpoints
+  private Map<Integer, List<CoordinationProtos.DrillbitEndpoint>> joinMjId2probdeScanEps = new HashMap<>();
+  //HashJoin node's major fragment id to its corresponding probe side nodes' number
+  private Map<Integer, Integer> joinMjId2scanSize = new ConcurrentHashMap<>();
+  //HashJoin node's major fragment id to its corresponding probe side scan node's belonging major fragment id
+  private Map<Integer, Integer> joinMjId2ScanMjId = new HashMap<>();
+
+  private RuntimeFilterWritable aggregatedRuntimeFilter;
+
+  private DrillbitContext drillbitContext;
+
+  private SendingAccountor sendingAccountor = new SendingAccountor();
+
+  private String lineSeparator;
+
+  private static final Logger logger = 
LoggerFactory.getLogger(RuntimeFilterManager.class);
+
+  /**
+   * This class maintains context for the runtime join push down's filter 
management. It
+   * does a traversal of the physical operators by leveraging the root wrapper 
which indirectly
+   * holds the global PhysicalOperator tree a
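
The three prerequisites in the class Javadoc above can be pictured as a small
recursive check over the operator tree. The following is a hedged,
self-contained sketch with hypothetical operator types, not Drill's actual
PhysicalOperator/HashJoinPrel classes:

{noformat}
import java.util.List;

// Hypothetical operator model illustrating the three JPPD prerequisites.
public class JppdEligibilitySketch {

  interface Op { List<Op> children(); }

  static class HashJoinOp implements Op {
    final boolean equalityJoin;          // prerequisite 1
    final Op probeSide, buildSide;
    HashJoinOp(boolean eq, Op probe, Op build) {
      this.equalityJoin = eq; this.probeSide = probe; this.buildSide = build;
    }
    public List<Op> children() { return List.of(probeSide, buildSide); }
  }

  static class HashAggOp implements Op { // an example of a blocking operator
    final Op child;
    HashAggOp(Op child) { this.child = child; }
    public List<Op> children() { return List.of(child); }
  }

  static class ScanOp implements Op {
    public List<Op> children() { return List.of(); }
  }

  // Prerequisite 3: no blocking operator between the join and the probe-side scan.
  static boolean probeSideIsStreaming(Op op) {
    if (op instanceof HashAggOp) return false;
    for (Op child : op.children()) {
      if (!probeSideIsStreaming(child)) return false;
    }
    return true;
  }

  static boolean jppdPossible(Op op) {
    return op instanceof HashJoinOp                          // prerequisite 2
        && ((HashJoinOp) op).equalityJoin                    // prerequisite 1
        && probeSideIsStreaming(((HashJoinOp) op).probeSide); // prerequisite 3
  }
}
{noformat}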

[jira] [Commented] (DRILL-6385) Support JPPD (Join Predicate Push Down)

2018-07-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16555101#comment-16555101
 ] 

ASF GitHub Bot commented on DRILL-6385:
---

weijietong commented on a change in pull request #1334: DRILL-6385: Support 
JPPD feature
URL: https://github.com/apache/drill/pull/1334#discussion_r204973345
 
 

 ##
 File path: exec/java-exec/src/main/resources/drill-module.conf
 ##
 @@ -455,6 +455,8 @@ drill.exec.options: {
 exec.hashjoin.num_partitions: 32,
 exec.hashjoin.num_rows_in_batch: 1024,
 exec.hashjoin.max_batches_in_memory: 0,
+exec.hashjoin.enable.runtime_filter: true,
 
 Review comment:
   agree


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6385) Support JPPD (Join Predicate Push Down)

2018-07-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16555100#comment-16555100
 ] 

ASF GitHub Bot commented on DRILL-6385:
---

weijietong commented on a change in pull request #1334: DRILL-6385: Support 
JPPD feature
URL: https://github.com/apache/drill/pull/1334#discussion_r204973201
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/filter/RuntimeFilterManager.java
 ##

[jira] [Commented] (DRILL-6385) Support JPPD (Join Predicate Push Down)

2018-07-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16555031#comment-16555031
 ] 

ASF GitHub Bot commented on DRILL-6385:
---

weijietong commented on a change in pull request #1334: DRILL-6385: Support 
JPPD feature
URL: https://github.com/apache/drill/pull/1334#discussion_r204965485
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/filter/RuntimeFilterManager.java
 ##

[jira] [Commented] (DRILL-6589) Push transitive closure generated predicates past aggregates/projects

2018-07-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554957#comment-16554957
 ] 

ASF GitHub Bot commented on DRILL-6589:
---

gparai commented on a change in pull request #1372: DRILL-6589: Push transitive 
closure predicates past aggregates/projects
URL: https://github.com/apache/drill/pull/1372#discussion_r204952967
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/RuleInstance.java
 ##
 @@ -140,4 +145,14 @@
 
   SubQueryRemoveRule SUB_QUERY_JOIN_REMOVE_RULE =
   new 
SubQueryRemoveRule.SubQueryJoinRemoveRule(DrillRelFactories.LOGICAL_BUILDER);
+
+  FilterAggregateTransposeRule DRILL_FILTER_AGGREGATE_TRANSPOSE_RULE =
 
 Review comment:
   Done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Push transitive closure generated predicates past aggregates/projects
> -
>
> Key: DRILL-6589
> URL: https://issues.apache.org/jira/browse/DRILL-6589
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>Priority: Major
> Fix For: 1.15.0
>
>
> Here is a sample query that may benefit from this optimization:
> SELECT * FROM T1 WHERE a1 = 5 AND a1 IN (SELECT a2 FROM T2); 
> The IN subquery is planned as a join on an aggregate over a2, so the 
> transitive predicate a2 = 5 (derived from a1 = 5 and a1 = a2) would be pushed 
> past that aggregate by this optimization.
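
As a hedged illustration of the mechanism (not Drill's or Calcite's planner
code): transitive closure maintains equivalence classes of columns linked by
equality predicates and propagates constant predicates across each class,
which is what derives a2 = 5 from a1 = 5 and a1 = a2:

{noformat}
import java.util.HashMap;
import java.util.Map;

// Minimal union-find over column names; a constant attached to one column
// propagates to every column in the same equivalence class.
public class TransitiveClosureSketch {
  private final Map<String, String> parent = new HashMap<>();
  private final Map<String, Object> constants = new HashMap<>();

  private String find(String col) {
    parent.putIfAbsent(col, col);
    String root = parent.get(col);
    if (!root.equals(col)) {
      root = find(root);
      parent.put(col, root); // path compression
    }
    return root;
  }

  // Record an equality predicate such as a1 = a2 (from the IN subquery join).
  public void addEquality(String c1, String c2) {
    String r1 = find(c1), r2 = find(c2);
    if (!r1.equals(r2)) {
      parent.put(r1, r2);
      Object v = constants.remove(r1);
      if (v != null) constants.put(r2, v);
    }
  }

  // Record a constant predicate such as a1 = 5.
  public void addConstant(String col, Object value) {
    constants.put(find(col), value);
  }

  // Derived constant predicate for any column in the class, e.g. a2 -> 5.
  public Object derivedConstant(String col) {
    return constants.get(find(col));
  }

  public static void main(String[] args) {
    TransitiveClosureSketch tc = new TransitiveClosureSketch();
    tc.addConstant("a1", 5);
    tc.addEquality("a1", "a2");
    System.out.println(tc.derivedConstant("a2")); // prints 5
  }
}
{noformat}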



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6589) Push transitive closure generated predicates past aggregates/projects

2018-07-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554958#comment-16554958
 ] 

ASF GitHub Bot commented on DRILL-6589:
---

gparai commented on issue #1372: DRILL-6589: Push transitive closure predicates 
past aggregates/projects
URL: https://github.com/apache/drill/pull/1372#issuecomment-407595809
 
 
   @vdiravka thanks for the review. I have addressed your review comments. 
Please take a look.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6589) Push transitive closure generated predicates past aggregates/projects

2018-07-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554955#comment-16554955
 ] 

ASF GitHub Bot commented on DRILL-6589:
---

gparai commented on a change in pull request #1372: DRILL-6589: Push transitive 
closure predicates past aggregates/projects
URL: https://github.com/apache/drill/pull/1372#discussion_r204952905
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushFilterPastProjectRule.java
 ##
 @@ -50,9 +54,12 @@
   }
 
   private DrillPushFilterPastProjectRule(RelBuilderFactory relBuilderFactory) {
-super(operand(LogicalFilter.class, operand(LogicalProject.class, any())), 
relBuilderFactory,null);
+super(operand(LogicalFilter.class, operand(LogicalProject.class, any())), 
relBuilderFactory, null);
   }
 
+  private DrillPushFilterPastProjectRule(RelBuilderFactory relBuilderFactory, 
boolean forDrill) {
 
 Review comment:
   Done


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6589) Push transitive closure generated predicates past aggregates/projects

2018-07-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554953#comment-16554953
 ] 

ASF GitHub Bot commented on DRILL-6589:
---

gparai commented on a change in pull request #1372: DRILL-6589: Push transitive 
closure predicates past aggregates/projects
URL: https://github.com/apache/drill/pull/1372#discussion_r204952377
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/RuleInstance.java
 ##
 @@ -140,4 +145,14 @@
 
   SubQueryRemoveRule SUB_QUERY_JOIN_REMOVE_RULE =
   new 
SubQueryRemoveRule.SubQueryJoinRemoveRule(DrillRelFactories.LOGICAL_BUILDER);
+
+  FilterAggregateTransposeRule DRILL_FILTER_AGGREGATE_TRANSPOSE_RULE =
+  new FilterAggregateTransposeRule(Filter.class,
+  DrillRelBuilder.proto(DrillRelFactories.DRILL_LOGICAL_FILTER_FACTORY,
+  DrillRelFactories.DRILL_LOGICAL_AGGREGATE_FACTORY), 
Aggregate.class);
+
+  FilterProjectTransposeRule DRILL_FILTER_PROJECT_TRANSPOSE_RULE =
 
 Review comment:
   Removed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6589) Push transitive closure generated predicates past aggregates/projects

2018-07-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554945#comment-16554945
 ] 

ASF GitHub Bot commented on DRILL-6589:
---

gparai commented on a change in pull request #1372: DRILL-6589: Push transitive 
closure predicates past aggregates/projects
URL: https://github.com/apache/drill/pull/1372#discussion_r204951832
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRelFactories.java
 ##
 @@ -122,4 +127,16 @@ public RelNode createJoin(RelNode left, RelNode right,
 }
   }
 
+  private static class DrillAggregateFactoryImpl implements 
RelFactories.AggregateFactory {
+
+@Override
+public RelNode createAggregate(RelNode input, boolean indicator, 
ImmutableBitSet groupSet,
+   ImmutableList groupSets, 
List aggCalls) {
+  try {
+return new DrillAggregateRel(input.getCluster(), input.getTraitSet(), 
input, indicator, groupSet, groupSets, aggCalls);
+  } catch (InvalidRelException ex) {
 
 Review comment:
   Done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6589) Push transitive closure generated predicates past aggregates/projects

2018-07-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554944#comment-16554944
 ] 

ASF GitHub Bot commented on DRILL-6589:
---

gparai commented on a change in pull request #1372: DRILL-6589: Push transitive 
closure predicates past aggregates/projects
URL: https://github.com/apache/drill/pull/1372#discussion_r204951732
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRelFactories.java
 ##
 @@ -122,4 +127,16 @@ public RelNode createJoin(RelNode left, RelNode right,
 }
   }
 
+  private static class DrillAggregateFactoryImpl implements 
RelFactories.AggregateFactory {
 
 Review comment:
   Done


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6589) Push transitive closure generated predicates past aggregates/projects

2018-07-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554942#comment-16554942
 ] 

ASF GitHub Bot commented on DRILL-6589:
---

gparai commented on a change in pull request #1372: DRILL-6589: Push transitive 
closure predicates past aggregates/projects
URL: https://github.com/apache/drill/pull/1372#discussion_r204951494
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestAggregateFunctions.java
 ##
 @@ -732,6 +732,25 @@ public void testPushFilterInExprPastAgg() throws 
Exception {
 .build().run();
   }
 
+  @Test
+  public void testTransitiveFilterPushPastAgg() throws Exception {
 
 Review comment:
   Moved the test case. I decided to remove the push-filter-past-project rule 
from transitive closure. It was causing too many side effects (plan patterns 
breaking, etc.). Moreover, it may not be very useful from a cost perspective. 
We can re-introduce it if it were to be applied in a cost-based manner (via the 
Volcano planner).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6589) Push transitive closure generated predicates past aggregates/projects

2018-07-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554943#comment-16554943
 ] 

ASF GitHub Bot commented on DRILL-6589:
---

gparai commented on a change in pull request #1372: DRILL-6589: Push transitive 
closure predicates past aggregates/projects
URL: https://github.com/apache/drill/pull/1372#discussion_r204951637
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRelFactories.java
 ##
 @@ -92,7 +97,7 @@ public RelNode createProject(RelNode child,
 
   /**
* Implementation of {@link RelFactories.FilterFactory} that
-   * returns a vanilla {@link LogicalFilter}.
+   * returns a vanilla LogicalFilter.
 
 Review comment:
   Done


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6632) drill-jdbc-all jar size limit too small for release build

2018-07-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554841#comment-16554841
 ] 

ASF GitHub Bot commented on DRILL-6632:
---

Ben-Zvi closed pull request #1396: DRILL-6632: Increase jdbc-all jar size limit 
to 3650
URL: https://github.com/apache/drill/pull/1396
 
 
   

This is a PR merged from a forked repository. As GitHub hides the original
diff on merge, it is displayed below for the sake of provenance:

diff --git a/exec/jdbc-all/pom.xml b/exec/jdbc-all/pom.xml
index f7af5110e50..983a98f4e2a 100644
--- a/exec/jdbc-all/pom.xml
+++ b/exec/jdbc-all/pom.xml
@@ -506,7 +506,7 @@
   This is likely due to you adding new dependencies to a 
java-exec and not updating the excludes in this module. This is important as it 
minimizes the size of the dependency of Drill application users.
 
   
-  <maxsize>3600</maxsize>
+  <maxsize>3650</maxsize>
   <minsize>1500</minsize>
   

${project.build.directory}/drill-jdbc-all-${project.version}.jar


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> drill-jdbc-all jar size limit too small for release build
> -
>
> Key: DRILL-6632
> URL: https://issues.apache.org/jira/browse/DRILL-6632
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.14.0
>Reporter: Boaz Ben-Zvi
>Assignee: Boaz Ben-Zvi
>Priority: Blocker
> Fix For: 1.14.0
>
>
> Among the changes for DRILL-6294, the limit for the drill-jdbc-all jar file 
> size was increased to 3600, about what was needed to accommodate the new 
> Calcite version.  
> However, a release build requires a slightly larger size (probably due to 
> adding several of those 
> *org.codehaus.plexus.compiler.javac.JavacCompiler6931842185404907145arguments*).
> Proposed Fix: Increase the size limit to 36,500,000
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6632) drill-jdbc-all jar size limit too small for release build

2018-07-24 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-6632:
---

 Summary: drill-jdbc-all jar size limit too small for release build
 Key: DRILL-6632
 URL: https://issues.apache.org/jira/browse/DRILL-6632
 Project: Apache Drill
  Issue Type: Bug
  Components: Tools, Build & Test
Affects Versions: 1.14.0
Reporter: Boaz Ben-Zvi
Assignee: Boaz Ben-Zvi
 Fix For: 1.14.0


Among the changes for DRILL-6294, the limit for the drill-jdbc-all jar file 
size was increased to 3600, about what was needed to accommodate the new 
Calcite version.  

However, a release build requires a slightly larger size (probably due to adding 
several of those 
*org.codehaus.plexus.compiler.javac.JavacCompiler6931842185404907145arguments*).

Proposed Fix: Increase the size limit to 36,500,000

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6629) BitVector split and transfer does not work correctly for transfer length < 8

2018-07-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554825#comment-16554825
 ] 

ASF GitHub Bot commented on DRILL-6629:
---

ppadma commented on a change in pull request #1395: DRILL-6629 BitVector split 
and transfer does not work correctly for transfer length < 8
URL: https://github.com/apache/drill/pull/1395#discussion_r204918913
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/BitVector.java
 ##
 @@ -323,7 +323,8 @@ public void splitAndTransferTo(int startIndex, int length, 
BitVector target) {
   if (length % 8 != 0) {
 // start is not byte aligned so we have to copy some bits from the 
last full byte read in the
 // previous loop
-byte lastButOneByte = byteIPlus1;
+// if numBytesHoldingSourceBits == 1, lastButOneByte is the first 
byte, but we have not read it yet, so read it
+byte lastButOneByte = (numBytesHoldingSourceBits == 1) ? 
this.data.getByte(firstByteIndex) : byteIPlus1;
 
 Review comment:
   @bitblender I think there could be a problem here; please check. If you are 
copying, say, from firstBitOffset 2 with length 4, we want to copy only 4 bits, 
but this might copy 6 bits: bitsFromLastButOneByte will be all the bits from 
firstBitOffset to the end of the byte.
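
For concreteness, here is a hedged sketch of the masking this comment asks
about; the helper is hypothetical, not BitVector's actual splitAndTransferTo
code. Copying exactly length bits starting at firstBitOffset requires masking
off the extra high bits that a copy of everything up to the end of the byte
would drag along:

{noformat}
// Hypothetical illustration of the over-copy concern raised above:
// extracting exactly 'length' bits starting at 'firstBitOffset' within a byte.
public class BitCopySketch {
  static byte copyBits(byte source, int firstBitOffset, int length) {
    // Shift the wanted bits down to position 0, assuming an LSB-first packed
    // layout as used by bit-packed boolean vectors.
    int shifted = (source & 0xFF) >>> firstBitOffset;
    // Mask to exactly 'length' bits; without this mask, all bits from
    // firstBitOffset to the end of the byte would be copied (e.g. offset 2,
    // length 4 would copy 6 bits, as noted in the review comment).
    int mask = (1 << length) - 1;
    return (byte) (shifted & mask);
  }

  public static void main(String[] args) {
    byte src = (byte) 0b1111_1100;           // bits 2..7 set
    System.out.println(copyBits(src, 2, 4)); // prints 15 (0b1111), not 63
  }
}
{noformat}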


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> BitVector split and transfer does not work correctly for transfer length < 8
> 
>
> Key: DRILL-6629
> URL: https://issues.apache.org/jira/browse/DRILL-6629
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Data Types
> Environment: BitVector split and transfer does not work correctly for 
> transfer length < 8.
>Reporter: Karthikeyan Manivannan
>Assignee: Karthikeyan Manivannan
>Priority: Major
> Fix For: 1.15.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-6631) Wrong result from LateralUnnest query with aggregation and order by

2018-07-24 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker reassigned DRILL-6631:


Assignee: Parth Chandra

> Wrong result from LateralUnnest query with aggregation and order by
> ---
>
> Key: DRILL-6631
> URL: https://issues.apache.org/jira/browse/DRILL-6631
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Parth Chandra
>Assignee: Parth Chandra
>Priority: Major
> Fix For: 1.15.0
>
>
> Reported by Chun:
> The following query gives the correct result:
> {noformat}
> 0: jdbc:drill:zk=10.10.30.166:5181> select customer.c_custkey, 
> customer.c_name, orders.totalprice from customer, lateral (select 
> sum(t.o.o_totalprice) as totalprice from unnest(customer.c_orders) t(o) WHERE 
> t.o.o_totalprice in 
> (89230.03,270087.44,246408.53,82657.72,153941.38,65277.06,180309.76)) orders 
> where customer.c_custkey = 101276;
> +------------+---------------------+-------------+
> | c_custkey  |       c_name        | totalprice  |
> +------------+---------------------+-------------+
> | 101276     | Customer#000101276  | 82657.72    |
> +------------+---------------------+-------------+
> 1 row selected (6.184 seconds)
> {noformat}
> But if I remove the where clause and replace it with order by and limit, I 
> get the following empty result set, which is wrong.
> {noformat}
> 0: jdbc:drill:zk=10.10.30.166:5181> select customer.c_custkey, 
> customer.c_name, orders.totalprice from customer, lateral (select 
> sum(t.o.o_totalprice) as totalprice from unnest(customer.c_orders) t(o) WHERE 
> t.o.o_totalprice in 
> (89230.03,270087.44,246408.53,82657.72,153941.38,65277.06,180309.76)) orders 
> order by customer.c_custkey limit 50;
> +------------+---------+-------------+
> | c_custkey  | c_name  | totalprice  |
> +------------+---------+-------------+
> +------------+---------+-------------+
> No rows selected (2.753 seconds)
> {noformat}
> Here is the plan for the query giving the correct result:
> {noformat}
> 00-00 Screen : rowType = RecordType(ANY c_custkey, ANY c_name, ANY 
> totalprice): rowcount = 472783.35, cumulative cost = {8242193.734985 
> rows, 4.10218543349E7 cpu, 0.0 io, 5.80956180479E9 network, 0.0 
> memory}, id = 14410
> 00-01  Project(c_custkey=[$0], c_name=[$1], totalprice=[$2]) : rowType = 
> RecordType(ANY c_custkey, ANY c_name, ANY totalprice): rowcount = 472783.35, 
> cumulative cost = {8194915.399985 rows, 4.0974575E7 cpu, 0.0 io, 
> 5.80956180479E9 network, 0.0 memory}, id = 14409
> 00-02 UnionExchange : rowType = RecordType(ANY c_custkey, ANY c_name, 
> ANY totalprice): rowcount = 472783.35, cumulative cost = {7722132.04999 
> rows, 3.955622594996E7 cpu, 0.0 io, 5.80956180479E9 network, 0.0 
> memory}, id = 14408
> 01-01  LateralJoin(correlation=[$cor1], joinType=[inner], 
> requiredColumns=[{0}], column excluded from output: =[`c_orders`]) : rowType 
> = RecordType(ANY c_custkey, ANY c_name, ANY totalprice): rowcount = 
> 472783.35, cumulative cost = {7249348.6 rows, 3.577395915E7 cpu, 0.0 
> io, 0.0 network, 0.0 memory}, id = 14407
> 01-03 SelectionVectorRemover : rowType = RecordType(ANY c_orders, 
> ANY c_custkey, ANY c_name): rowcount = 472783.35, cumulative cost = 
> {6776561.35 rows, 2.442713975E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 
> 14403
> 01-05  Filter(condition=[=($1, 101276)]) : rowType = 
> RecordType(ANY c_orders, ANY c_custkey, ANY c_name): rowcount = 472783.35, 
> cumulative cost = {6303778.0 rows, 2.39543564E7 cpu, 0.0 io, 0.0 network, 0.0 
> memory}, id = 14402
> 01-07 Scan(groupscan=[EasyGroupScan 
> [selectionRoot=maprfs:/drill/testdata/lateral/tpchsf1/json/customer, 
> numFiles=10, columns=[`c_orders`, `c_custkey`, `c_name`], 
> files=[maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_6.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_4.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_3.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_7.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_5.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_2.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_0.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_8.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_1.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_9.json]]]) : 
> rowType = RecordType(ANY c_orders, ANY c_custkey, ANY c_name): rowcount = 
> 3151889.0, cumulative cost = {3151889.0 rows, 9455667.0 cpu, 0.0 io, 0.0 
> network, 0.0 memory}, id = 14401
> 01-02 StreamAgg(group=[{}], 

[jira] [Updated] (DRILL-6631) Wrong result from LateralUnnest query with aggregation and order by

2018-07-24 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6631:
-
Fix Version/s: 1.15.0

> Wrong result from LateralUnnest query with aggregation and order by
> ---
>
> Key: DRILL-6631
> URL: https://issues.apache.org/jira/browse/DRILL-6631
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Parth Chandra
>Assignee: Parth Chandra
>Priority: Major
> Fix For: 1.15.0
>
>
> Reported by Chun:
> The following query gives correct result:
> {noformat}
> 0: jdbc:drill:zk=10.10.30.166:5181> select customer.c_custkey, 
> customer.c_name, orders.totalprice from customer, lateral (select 
> sum(t.o.o_totalprice) as totalprice from unnest(customer.c_orders) t(o) WHERE 
> t.o.o_totalprice in 
> (89230.03,270087.44,246408.53,82657.72,153941.38,65277.06,180309.76)) orders 
> where customer.c_custkey = 101276;
> ++-+-+
> | c_custkey  |   c_name| totalprice  |
> ++-+-+
> | 101276 | Customer#000101276  | 82657.72|
> ++-+-+
> 1 row selected (6.184 seconds)
> {noformat}
> But if I remove the where clause and replace it with order by and limit, I 
> get the following empty result set. This is wrong.
> {noformat}
> 0: jdbc:drill:zk=10.10.30.166:5181> select customer.c_custkey, 
> customer.c_name, orders.totalprice from customer, lateral (select 
> sum(t.o.o_totalprice) as totalprice from unnest(customer.c_orders) t(o) WHERE 
> t.o.o_totalprice in 
> (89230.03,270087.44,246408.53,82657.72,153941.38,65277.06,180309.76)) orders 
> order by customer.c_custkey limit 50;
> ++-+-+
> | c_custkey  | c_name  | totalprice  |
> ++-+-+
> ++-+-+
> No rows selected (2.753 seconds)
> {noformat}
> Here is the plan for the query giving the correct result:
> {noformat}
> 00-00  Screen : rowType = RecordType(ANY c_custkey, ANY c_name, ANY 
> totalprice): rowcount = 472783.35, cumulative cost = {8242193.734985 
> rows, 4.10218543349E7 cpu, 0.0 io, 5.80956180479E9 network, 0.0 
> memory}, id = 14410
> 00-01  Project(c_custkey=[$0], c_name=[$1], totalprice=[$2]) : rowType = 
> RecordType(ANY c_custkey, ANY c_name, ANY totalprice): rowcount = 472783.35, 
> cumulative cost = {8194915.399985 rows, 4.0974575E7 cpu, 0.0 io, 
> 5.80956180479E9 network, 0.0 memory}, id = 14409
> 00-02  UnionExchange : rowType = RecordType(ANY c_custkey, ANY c_name, 
> ANY totalprice): rowcount = 472783.35, cumulative cost = {7722132.04999 
> rows, 3.955622594996E7 cpu, 0.0 io, 5.80956180479E9 network, 0.0 
> memory}, id = 14408
> 01-01  LateralJoin(correlation=[$cor1], joinType=[inner], 
> requiredColumns=[{0}], column excluded from output: =[`c_orders`]) : rowType 
> = RecordType(ANY c_custkey, ANY c_name, ANY totalprice): rowcount = 
> 472783.35, cumulative cost = {7249348.6 rows, 3.577395915E7 cpu, 0.0 
> io, 0.0 network, 0.0 memory}, id = 14407
> 01-03  SelectionVectorRemover : rowType = RecordType(ANY c_orders, 
> ANY c_custkey, ANY c_name): rowcount = 472783.35, cumulative cost = 
> {6776561.35 rows, 2.442713975E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 
> 14403
> 01-05  Filter(condition=[=($1, 101276)]) : rowType = 
> RecordType(ANY c_orders, ANY c_custkey, ANY c_name): rowcount = 472783.35, 
> cumulative cost = {6303778.0 rows, 2.39543564E7 cpu, 0.0 io, 0.0 network, 0.0 
> memory}, id = 14402
> 01-07  Scan(groupscan=[EasyGroupScan 
> [selectionRoot=maprfs:/drill/testdata/lateral/tpchsf1/json/customer, 
> numFiles=10, columns=[`c_orders`, `c_custkey`, `c_name`], 
> files=[maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_6.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_4.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_3.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_7.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_5.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_2.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_0.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_8.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_1.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_9.json]]]) : 
> rowType = RecordType(ANY c_orders, ANY c_custkey, ANY c_name): rowcount = 
> 3151889.0, cumulative cost = {3151889.0 rows, 9455667.0 cpu, 0.0 io, 0.0 
> network, 0.0 memory}, id = 14401
> 01-02  StreamAgg(group=[{}], totalpric

[jira] [Commented] (DRILL-6631) Wrong result from LateralUnnest query with aggregation and order by

2018-07-24 Thread Parth Chandra (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554664#comment-16554664
 ] 

Parth Chandra commented on DRILL-6631:
--

The issue is caused by incorrect handling of empty batches in the streaming 
aggregator. In the case of empty input and no group-by, streaming agg sends 
out a 'special' batch with no (or null) records and a row count of 1. Once a 
special batch has been sent, streaming agg always returns a NONE outcome on 
subsequent calls to next().

In a lateral/unnest subquery, this behaviour needs to be emulated for every 
empty batch produced by unnest. However, we cannot return NONE after sending 
out such a batch; we must instead reset the state. Streaming agg handles this 
incorrectly and returns NONE, causing the query to terminate early.

There are other issues with the handling of state in this case. However, none 
of them is caught by the unit tests because they all have a group-by.
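
A minimal sketch of the state handling described above (the class, method and 
outcome names are assumptions for illustration, not Drill's actual 
StreamingAggBatch code):

{code:java}
// Hypothetical sketch: emit one special batch per empty input, and inside a
// lateral/unnest subquery reset the state instead of latching onto NONE.
enum Outcome { EMIT, NONE }

class StreamingAggSketch {
  private boolean specialBatchSent = false;

  // Called when the incoming batch is empty and there is no group-by.
  Outcome onEmptyInput(boolean insideLateralSubquery) {
    if (!specialBatchSent) {
      emitSpecialBatch();            // one null record, row count = 1
      specialBatchSent = true;
    }
    if (insideLateralSubquery) {
      // Every empty batch from unnest needs its own special batch, so we
      // must re-arm the state here; returning NONE ends the whole query.
      specialBatchSent = false;
      return Outcome.EMIT;           // end of this sub-result only
    }
    return Outcome.NONE;             // top-level query: no more data
  }

  private void emitSpecialBatch() { /* allocate and send the 1-row batch */ }
}
{code}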

 

> Wrong result from LateralUnnest query with aggregation and order by
> ---
>
> Key: DRILL-6631
> URL: https://issues.apache.org/jira/browse/DRILL-6631
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Parth Chandra
>Priority: Major
>
> Reported by Chun:
> The following query gives the correct result:
> {noformat}
> 0: jdbc:drill:zk=10.10.30.166:5181> select customer.c_custkey, 
> customer.c_name, orders.totalprice from customer, lateral (select 
> sum(t.o.o_totalprice) as totalprice from unnest(customer.c_orders) t(o) WHERE 
> t.o.o_totalprice in 
> (89230.03,270087.44,246408.53,82657.72,153941.38,65277.06,180309.76)) orders 
> where customer.c_custkey = 101276;
> ++-+-+
> | c_custkey  |   c_name| totalprice  |
> ++-+-+
> | 101276 | Customer#000101276  | 82657.72|
> ++-+-+
> 1 row selected (6.184 seconds)
> {noformat}
> But if I remove the where clause and replace it with order by and limit, I 
> get the following empty result set. This is wrong.
> {noformat}
> 0: jdbc:drill:zk=10.10.30.166:5181> select customer.c_custkey, 
> customer.c_name, orders.totalprice from customer, lateral (select 
> sum(t.o.o_totalprice) as totalprice from unnest(customer.c_orders) t(o) WHERE 
> t.o.o_totalprice in 
> (89230.03,270087.44,246408.53,82657.72,153941.38,65277.06,180309.76)) orders 
> order by customer.c_custkey limit 50;
> ++-+-+
> | c_custkey  | c_name  | totalprice  |
> ++-+-+
> ++-+-+
> No rows selected (2.753 seconds)
> {noformat}
> Here is the plan for the query giving the correct result:
> {noformat}
> 00-00  Screen : rowType = RecordType(ANY c_custkey, ANY c_name, ANY 
> totalprice): rowcount = 472783.35, cumulative cost = {8242193.734985 
> rows, 4.10218543349E7 cpu, 0.0 io, 5.80956180479E9 network, 0.0 
> memory}, id = 14410
> 00-01  Project(c_custkey=[$0], c_name=[$1], totalprice=[$2]) : rowType = 
> RecordType(ANY c_custkey, ANY c_name, ANY totalprice): rowcount = 472783.35, 
> cumulative cost = {8194915.399985 rows, 4.0974575E7 cpu, 0.0 io, 
> 5.80956180479E9 network, 0.0 memory}, id = 14409
> 00-02  UnionExchange : rowType = RecordType(ANY c_custkey, ANY c_name, 
> ANY totalprice): rowcount = 472783.35, cumulative cost = {7722132.04999 
> rows, 3.955622594996E7 cpu, 0.0 io, 5.80956180479E9 network, 0.0 
> memory}, id = 14408
> 01-01  LateralJoin(correlation=[$cor1], joinType=[inner], 
> requiredColumns=[{0}], column excluded from output: =[`c_orders`]) : rowType 
> = RecordType(ANY c_custkey, ANY c_name, ANY totalprice): rowcount = 
> 472783.35, cumulative cost = {7249348.6 rows, 3.577395915E7 cpu, 0.0 
> io, 0.0 network, 0.0 memory}, id = 14407
> 01-03  SelectionVectorRemover : rowType = RecordType(ANY c_orders, 
> ANY c_custkey, ANY c_name): rowcount = 472783.35, cumulative cost = 
> {6776561.35 rows, 2.442713975E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 
> 14403
> 01-05  Filter(condition=[=($1, 101276)]) : rowType = 
> RecordType(ANY c_orders, ANY c_custkey, ANY c_name): rowcount = 472783.35, 
> cumulative cost = {6303778.0 rows, 2.39543564E7 cpu, 0.0 io, 0.0 network, 0.0 
> memory}, id = 14402
> 01-07  Scan(groupscan=[EasyGroupScan 
> [selectionRoot=maprfs:/drill/testdata/lateral/tpchsf1/json/customer, 
> numFiles=10, columns=[`c_orders`, `c_custkey`, `c_name`], 
> files=[maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_6.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_4.json, 
> maprfs:///drill/testdata/lateral/tpchsf1/

[jira] [Created] (DRILL-6631) Wrong result from LateralUnnest query with aggregation and order by

2018-07-24 Thread Parth Chandra (JIRA)
Parth Chandra created DRILL-6631:


 Summary: Wrong result from LateralUnnest query with aggregation 
and order by
 Key: DRILL-6631
 URL: https://issues.apache.org/jira/browse/DRILL-6631
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.14.0
Reporter: Parth Chandra


Reported by Chun:

The following query gives the correct result:
{noformat}
0: jdbc:drill:zk=10.10.30.166:5181> select customer.c_custkey, customer.c_name, 
orders.totalprice from customer, lateral (select sum(t.o.o_totalprice) as 
totalprice from unnest(customer.c_orders) t(o) WHERE t.o.o_totalprice in 
(89230.03,270087.44,246408.53,82657.72,153941.38,65277.06,180309.76)) orders 
where customer.c_custkey = 101276;
++-+-+
| c_custkey  |   c_name| totalprice  |
++-+-+
| 101276 | Customer#000101276  | 82657.72|
++-+-+
1 row selected (6.184 seconds)
{noformat}
But if I remove the where clause and replace it with order by and limit, I get 
the following empty result set. This is wrong.
{noformat}
0: jdbc:drill:zk=10.10.30.166:5181> select customer.c_custkey, customer.c_name, 
orders.totalprice from customer, lateral (select sum(t.o.o_totalprice) as 
totalprice from unnest(customer.c_orders) t(o) WHERE t.o.o_totalprice in 
(89230.03,270087.44,246408.53,82657.72,153941.38,65277.06,180309.76)) orders 
order by customer.c_custkey limit 50;
++-+-+
| c_custkey  | c_name  | totalprice  |
++-+-+
++-+-+
No rows selected (2.753 seconds)
{noformat}
Here is the plan for the query giving the correct result:
{noformat}
00-00  Screen : rowType = RecordType(ANY c_custkey, ANY c_name, ANY 
totalprice): rowcount = 472783.35, cumulative cost = {8242193.734985 rows, 
4.10218543349E7 cpu, 0.0 io, 5.80956180479E9 network, 0.0 memory}, id = 
14410
00-01  Project(c_custkey=[$0], c_name=[$1], totalprice=[$2]) : rowType = 
RecordType(ANY c_custkey, ANY c_name, ANY totalprice): rowcount = 472783.35, 
cumulative cost = {8194915.399985 rows, 4.0974575E7 cpu, 0.0 io, 
5.80956180479E9 network, 0.0 memory}, id = 14409
00-02  UnionExchange : rowType = RecordType(ANY c_custkey, ANY c_name, 
ANY totalprice): rowcount = 472783.35, cumulative cost = {7722132.04999 
rows, 3.955622594996E7 cpu, 0.0 io, 5.80956180479E9 network, 0.0 
memory}, id = 14408
01-01  LateralJoin(correlation=[$cor1], joinType=[inner], 
requiredColumns=[{0}], column excluded from output: =[`c_orders`]) : rowType = 
RecordType(ANY c_custkey, ANY c_name, ANY totalprice): rowcount = 472783.35, 
cumulative cost = {7249348.6 rows, 3.577395915E7 cpu, 0.0 io, 0.0 
network, 0.0 memory}, id = 14407
01-03  SelectionVectorRemover : rowType = RecordType(ANY c_orders, 
ANY c_custkey, ANY c_name): rowcount = 472783.35, cumulative cost = {6776561.35 
rows, 2.442713975E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 14403
01-05  Filter(condition=[=($1, 101276)]) : rowType = RecordType(ANY 
c_orders, ANY c_custkey, ANY c_name): rowcount = 472783.35, cumulative cost = 
{6303778.0 rows, 2.39543564E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 14402
01-07  Scan(groupscan=[EasyGroupScan 
[selectionRoot=maprfs:/drill/testdata/lateral/tpchsf1/json/customer, 
numFiles=10, columns=[`c_orders`, `c_custkey`, `c_name`], 
files=[maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_6.json, 
maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_4.json, 
maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_3.json, 
maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_7.json, 
maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_5.json, 
maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_2.json, 
maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_0.json, 
maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_8.json, 
maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_1.json, 
maprfs:///drill/testdata/lateral/tpchsf1/json/customer/0_0_9.json]]]) : rowType 
= RecordType(ANY c_orders, ANY c_custkey, ANY c_name): rowcount = 3151889.0, 
cumulative cost = {3151889.0 rows, 9455667.0 cpu, 0.0 io, 0.0 network, 0.0 
memory}, id = 14401
01-02  StreamAgg(group=[{}], totalprice=[SUM($0)]) : rowType = 
RecordType(ANY totalprice): rowcount = 1.0, cumulative cost = {4.0 rows, 19.0 
cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 14406
01-04  Filter(condition=[OR(=($0, 89230.03), =($0, 270087.44), 
=($0, 246408.53), =($0, 82657.72), =($0, 153941.38), =($0, 65277.06), =($0, 
180309.76))]) : rowType = RecordType(ANY ITEM): rowcount = 1.0, cumulative cost 
= {3.0 rows, 7.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 14405
01-0

[jira] [Updated] (DRILL-6629) BitVector split and transfer does not work correctly for transfer length < 8

2018-07-24 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6629:
-
Reviewer: Sorabh Hamirwasia

> BitVector split and transfer does not work correctly for transfer length < 8
> 
>
> Key: DRILL-6629
> URL: https://issues.apache.org/jira/browse/DRILL-6629
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Data Types
> Environment: BitVector split and transfer does not work correctly for 
> transfer length < 8.
>Reporter: Karthikeyan Manivannan
>Assignee: Karthikeyan Manivannan
>Priority: Major
> Fix For: 1.15.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6629) BitVector split and transfer does not work correctly for transfer length < 8

2018-07-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554588#comment-16554588
 ] 

ASF GitHub Bot commented on DRILL-6629:
---

bitblender commented on issue #1395: DRILL-6629 BitVector split and transfer 
does not work correctly for transfer length < 8
URL: https://github.com/apache/drill/pull/1395#issuecomment-407495625
 
 
   @HanumathRao @sohami Previous fix for the BitVector split and transfer was 
missing a case. Please review this.
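
For context, a rough illustration of why transfer lengths below 8 are a 
special case: the requested bits fit inside a single destination byte but can 
straddle a byte boundary in the source when the start index is not 
byte-aligned. (A sketch only, not the actual BitVector split-and-transfer 
code.)

{code:java}
// Copies 'length' (< 8) bits starting at bit index 'start' of 'src' into
// bit 0 of dst[0]. Illustrative only; not Drill's BitVector implementation.
class BitCopySketch {
  static void copyBits(byte[] src, int start, int length, byte[] dst) {
    int firstByte = start >>> 3;        // byte holding the first source bit
    int offset = start & 7;             // bit offset within that byte
    int bits = (src[firstByte] & 0xFF) >>> offset;
    if (offset + length > 8) {
      // The run straddles a byte boundary: the remaining bits come from the
      // next byte. A fix that reads only one source byte misses this case.
      bits |= (src[firstByte + 1] & 0xFF) << (8 - offset);
    }
    dst[0] = (byte) (bits & ((1 << length) - 1));  // keep only 'length' bits
  }
}
{code}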


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> BitVector split and transfer does not work correctly for transfer length < 8
> 
>
> Key: DRILL-6629
> URL: https://issues.apache.org/jira/browse/DRILL-6629
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Data Types
> Environment: BitVector split and transfer does not work correctly for 
> transfer length < 8.
>Reporter: Karthikeyan Manivannan
>Assignee: Karthikeyan Manivannan
>Priority: Major
> Fix For: 1.15.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-07-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554546#comment-16554546
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

jbimbert commented on a change in pull request #1298: DRILL-5796: Filter 
pruning for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r204839799
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetIsPredicate.java
 ##
 @@ -62,91 +62,126 @@ private ParquetIsPredicate(LogicalExpression expr, 
BiPredicate, Ra
 return visitor.visitUnknown(this, value);
   }
 
-  @Override
-  public boolean canDrop(RangeExprEvaluator evaluator) {
+  /**
+   * Apply the filter condition against the metadata of the rowgroup.
+   */
+  public RowsMatch matches(RangeExprEvaluator evaluator) {
 Statistics exprStat = expr.accept(evaluator, null);
-if (isNullOrEmpty(exprStat)) {
-  return false;
-}
+return isNullOrEmpty(exprStat) ? RowsMatch.SOME : 
predicate.apply(exprStat, evaluator);
+  }
 
-return predicate.test(exprStat, evaluator);
+  /**
+   * After applying the filter against the statistics of the rowgroup, if the 
result is RowsMatch.ALL,
+   * then we still must know whether the rowgroup contains some null values, 
because they can change the filter result.
+   * If it contains some null values, then we change RowsMatch.ALL into 
RowsMatch.SOME, which says that maybe
+   * some values (the null ones) should be discarded.
+   */
+  private static RowsMatch checkNull(Statistics exprStat) {
+return hasNoNulls(exprStat) ? RowsMatch.ALL : RowsMatch.SOME;
   }
 
+  /**
+   * Return true if exprStat.getMin is defined and true
+   */
+  private static Boolean minIsTrue(Statistics exprStat) { return 
exprStat.hasNonNullValue() && ((BooleanStatistics) exprStat).getMin(); }
+
+  /**
+   * Return true if exprStat.getMin is defined and false
+   */
+  private static Boolean minIsFalse(Statistics exprStat) { return 
exprStat.hasNonNullValue() && !((BooleanStatistics) exprStat).getMin(); }
+
+  /**
+   * Return true if exprStat.getMax is defined and true
+   */
+  private static Boolean maxIsTrue(Statistics exprStat) { return 
exprStat.hasNonNullValue() && ((BooleanStatistics) exprStat).getMax(); }
+
+  /**
+   * Return true if exprStat.getMax is defined and false
+   */
+  private static Boolean maxIsFalse(Statistics exprStat) { return 
exprStat.hasNonNullValue() && !((BooleanStatistics) exprStat).getMax(); }
+
   /**
* IS NULL predicate.
*/
   private static > LogicalExpression 
createIsNullPredicate(LogicalExpression expr) {
 return new ParquetIsPredicate(expr,
-//if there are no nulls  -> canDrop
-(exprStat, evaluator) -> hasNoNulls(exprStat)) {
-  private final boolean isArray = isArray(expr);
-
-  private boolean isArray(LogicalExpression expression) {
-if (expression instanceof TypedFieldExpr) {
-  TypedFieldExpr typedFieldExpr = (TypedFieldExpr) expression;
-  SchemaPath schemaPath = typedFieldExpr.getPath();
-  return schemaPath.isArray();
-}
-return false;
-  }
-
-  @Override
-  public boolean canDrop(RangeExprEvaluator evaluator) {
+  (exprStat, evaluator) -> {
 // for arrays we are not able to define exact number of nulls
 // [1,2,3] vs [1,2] -> in second case 3 is absent and thus it's null 
but statistics shows no nulls
-return !isArray && super.canDrop(evaluator);
-  }
-};
+if (expr instanceof TypedFieldExpr) {
+  TypedFieldExpr typedFieldExpr = (TypedFieldExpr) expr;
+  if (typedFieldExpr.getPath().isArray()) {
+return RowsMatch.SOME;
+  }
+}
+if (hasNoNulls(exprStat)) {
+  return RowsMatch.NONE;
+}
+return isAllNulls(exprStat, evaluator.getRowCount()) ? RowsMatch.ALL : 
RowsMatch.SOME;
+  });
   }
 
   /**
* IS NOT NULL predicate.
*/
   private static > LogicalExpression 
createIsNotNullPredicate(LogicalExpression expr) {
 return new ParquetIsPredicate(expr,
-//if there are all nulls  -> canDrop
-(exprStat, evaluator) -> isAllNulls(exprStat, evaluator.getRowCount())
+  (exprStat, evaluator) -> isAllNulls(exprStat, evaluator.getRowCount()) ? 
RowsMatch.NONE : checkNull(exprStat)
 );
   }
 
   /**
* IS TRUE predicate.
*/
   private static LogicalExpression createIsTruePredicate(LogicalExpression 
expr) {
-return new ParquetIsPredicate(expr, (exprStat, evaluator) ->
-//if max value is not true or if there are all nulls  -> canDrop
-isAllNulls(exprStat, evaluator.getRowCount()) || 
exprStat.hasNonNullValue() && !((BooleanStatistics) exprStat).getMax()
-);
+return new ParquetIsPredicate(expr, (exprStat, evaluator) -> {
+  if (isAllNulls(exprStat, evaluator.getRowCou
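
The diff above replaces the boolean {{canDrop}} with a three-valued 
{{RowsMatch}} result. A condensed, self-contained sketch of the idea for the 
IS NOT NULL case ({{RowsMatch}} and the null-count checks follow the diff; the 
{{Stats}} class is an assumed stand-in for the Parquet statistics):

{code:java}
// NONE -> no row can match: the rowgroup can be pruned entirely.
// SOME -> some rows may match: keep the filter operator.
// ALL  -> every row matches: the filter itself can be dropped.
enum RowsMatch { NONE, SOME, ALL }

class IsNotNullSketch {
  static class Stats {                     // stand-in for Parquet statistics
    final long numNulls, rowCount;
    Stats(long numNulls, long rowCount) {
      this.numNulls = numNulls;
      this.rowCount = rowCount;
    }
    boolean hasNoNulls() { return numNulls == 0; }
    boolean isAllNulls() { return numNulls == rowCount; }
  }

  // IS NOT NULL: all nulls -> NONE; null-free -> ALL; mixed -> SOME, because
  // the null rows still have to be filtered out at execution time.
  static RowsMatch matches(Stats s) {
    if (s.isAllNulls()) {
      return RowsMatch.NONE;
    }
    return s.hasNoNulls() ? RowsMatch.ALL : RowsMatch.SOME;
  }
}
{code}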

[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-07-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554545#comment-16554545
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

jbimbert commented on a change in pull request #1298: DRILL-5796: Filter 
pruning for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r204839765
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetIsPredicate.java
 ##
 @@ -62,91 +62,126 @@ private ParquetIsPredicate(LogicalExpression expr, 
BiPredicate, Ra
 return visitor.visitUnknown(this, value);
   }
 
-  @Override
-  public boolean canDrop(RangeExprEvaluator evaluator) {
+  /**
+   * Apply the filter condition against the metadata of the rowgroup.
+   */
+  public RowsMatch matches(RangeExprEvaluator evaluator) {
 Statistics exprStat = expr.accept(evaluator, null);
-if (isNullOrEmpty(exprStat)) {
-  return false;
-}
+return isNullOrEmpty(exprStat) ? RowsMatch.SOME : 
predicate.apply(exprStat, evaluator);
+  }
 
-return predicate.test(exprStat, evaluator);
+  /**
+   * After applying the filter against the statistics of the rowgroup, if the 
result is RowsMatch.ALL,
+   * then we still must know whether the rowgroup contains some null values, 
because they can change the filter result.
+   * If it contains some null values, then we change RowsMatch.ALL into 
RowsMatch.SOME, which says that maybe
+   * some values (the null ones) should be discarded.
+   */
+  private static RowsMatch checkNull(Statistics exprStat) {
+return hasNoNulls(exprStat) ? RowsMatch.ALL : RowsMatch.SOME;
   }
 
+  /**
+   * Return true if exprStat.getMin is defined and true
+   */
+  private static Boolean minIsTrue(Statistics exprStat) { return 
exprStat.hasNonNullValue() && ((BooleanStatistics) exprStat).getMin(); }
 
 Review comment:
   Functions removed because of the next comment


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Filter pruning for multi rowgroup parquet file
> --
>
> Key: DRILL-5796
> URL: https://issues.apache.org/jira/browse/DRILL-5796
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: Damien Profeta
>Assignee: Jean-Blas IMBERT
>Priority: Major
> Fix For: 1.14.0
>
>
> Today, filter pruning uses the file name as the partitioning key. This means 
> you can prune a partition only if the whole file belongs to the same partition. 
> With Parquet, the filter can be pruned at the rowgroup level, making the 
> rowgroup, not the file, the unit of work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-07-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554547#comment-16554547
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

jbimbert commented on a change in pull request #1298: DRILL-5796: Filter 
pruning for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r204839867
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetPushDownFilter.java
 ##
 @@ -165,12 +167,29 @@ protected void doOnMatch(RelOptRuleCall call, FilterPrel 
filter, ProjectPrel pro
   return;
 }
 
-
 RelNode newScan = ScanPrel.create(scan, scan.getTraitSet(), newGroupScan, 
scan.getRowType());;
 
 if (project != null) {
   newScan = project.copy(project.getTraitSet(), ImmutableList.of(newScan));
 }
+
+if (newGroupScan instanceof AbstractParquetGroupScan) {
+  RowsMatch matchAll = RowsMatch.ALL;
+  List rowGroupInfos = ((AbstractParquetGroupScan) 
newGroupScan).rowGroupInfos;
+  for (RowGroupInfo rowGroup : rowGroupInfos) {
+if (rowGroup.getRowsMatch() != RowsMatch.ALL) {
+  matchAll = RowsMatch.SOME;
+  break;
+}
+  }
+  if (matchAll == ParquetFilterPredicate.RowsMatch.ALL) {
+call.transformTo(newScan);
+  }
+} else {
+  final RelNode newFilter = filter.copy(filter.getTraitSet(), 
ImmutableList.of(newScan));
+  call.transformTo(newFilter);
+}
+
 final RelNode newFilter = filter.copy(filter.getTraitSet(), 
ImmutableList.of(newScan));
 
 Review comment:
   Exact. Removed the duplicate lines.
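
The resulting control flow, condensed (a sketch over the names in the diff, 
with the Calcite plumbing reduced to a single helper; not the exact Drill 
source):

{code:java}
import java.util.List;

enum RowsMatch { NONE, SOME, ALL }

class PushDownSketch {
  // The filter node can be dropped only when every remaining rowgroup fully
  // matches the predicate; otherwise the filter must be kept.
  static boolean canDropFilter(List<RowsMatch> rowGroupMatches) {
    for (RowsMatch m : rowGroupMatches) {
      if (m != RowsMatch.ALL) {
        return false;                // some rowgroup still needs the filter
      }
    }
    return true;                     // every row in every rowgroup matches
  }
}
{code}

With this shape, {{doOnMatch}} calls {{call.transformTo(newScan)}} when the 
helper returns true, and otherwise builds the filter copy exactly once, which 
also answers the duplicate {{filter.copy}} question raised in the review.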


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Filter pruning for multi rowgroup parquet file
> --
>
> Key: DRILL-5796
> URL: https://issues.apache.org/jira/browse/DRILL-5796
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: Damien Profeta
>Assignee: Jean-Blas IMBERT
>Priority: Major
> Fix For: 1.14.0
>
>
> Today, filter pruning uses the file name as the partitioning key. This means 
> you can prune a partition only if the whole file belongs to the same partition. 
> With Parquet, the filter can be pruned at the rowgroup level, making the 
> rowgroup, not the file, the unit of work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-07-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554543#comment-16554543
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

jbimbert commented on a change in pull request #1298: DRILL-5796: Filter 
pruning for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r204839620
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetIsPredicate.java
 ##
 @@ -124,8 +124,7 @@ private static LogicalExpression 
createIsTruePredicate(LogicalExpression expr) {
*/
   private static LogicalExpression createIsFalsePredicate(LogicalExpression 
expr) {
 return new ParquetIsPredicate(expr, (exprStat, evaluator) ->
-//if min value is not false or if there are all nulls  -> canDrop
-isAllNulls(exprStat, evaluator.getRowCount()) || 
exprStat.hasNonNullValue() && ((BooleanStatistics) exprStat).getMin()
+  isAllNulls(exprStat, evaluator.getRowCount()) || 
exprStat.hasNonNullValue() && ((BooleanStatistics) exprStat).getMin() ? 
RowsMatch.NONE : checkNull(exprStat)
 
 Review comment:
   Done. Added 12 unit tests for the cases below (expected results sketched 
after the list):
   a. ST:[min: true, max: true, num_nulls: 0]
   b. ST:[min: false, max: false, num_nulls: 0]
   c. ST:[min: false, max: true, num_nulls: 0]
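
Those three statistics shapes pin down the expected tri-state results for the 
IS FALSE predicate. A self-contained sketch (the {{isFalseMatches}} stand-in 
below encodes the predicate semantics and is an assumption for illustration, 
not the code under review):

{code:java}
import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class IsFalseStatsSketchTest {
  enum RowsMatch { NONE, SOME, ALL }

  // Minimal stand-in for evaluating IS FALSE against boolean column stats.
  static RowsMatch isFalseMatches(boolean min, boolean max, long numNulls, long rowCount) {
    if (numNulls == rowCount || min) {  // all nulls, or min=true: no false value
      return RowsMatch.NONE;
    }
    if (!max && numNulls == 0) {        // max=false and null-free: all false
      return RowsMatch.ALL;
    }
    return RowsMatch.SOME;              // mixed values (or nulls present)
  }

  @Test
  public void isFalseAgainstBooleanStatistics() {
    long rows = 100;
    // a. min=true,  max=true,  num_nulls=0: no row is false -> prune rowgroup
    assertEquals(RowsMatch.NONE, isFalseMatches(true, true, 0, rows));
    // b. min=false, max=false, num_nulls=0: every row is false -> drop filter
    assertEquals(RowsMatch.ALL, isFalseMatches(false, false, 0, rows));
    // c. min=false, max=true,  num_nulls=0: mixed values -> keep the filter
    assertEquals(RowsMatch.SOME, isFalseMatches(false, true, 0, rows));
  }
}
{code}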


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Filter pruning for multi rowgroup parquet file
> --
>
> Key: DRILL-5796
> URL: https://issues.apache.org/jira/browse/DRILL-5796
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: Damien Profeta
>Assignee: Jean-Blas IMBERT
>Priority: Major
> Fix For: 1.14.0
>
>
> Today, filter pruning uses the file name as the partitioning key. This means 
> you can prune a partition only if the whole file belongs to the same partition. 
> With Parquet, the filter can be pruned at the rowgroup level, making the 
> rowgroup, not the file, the unit of work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-07-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554456#comment-16554456
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

vrozov commented on a change in pull request #1298: DRILL-5796: Filter pruning 
for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r204817724
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetPushDownFilter.java
 ##
 @@ -165,12 +167,29 @@ protected void doOnMatch(RelOptRuleCall call, FilterPrel 
filter, ProjectPrel pro
   return;
 }
 
-
 RelNode newScan = ScanPrel.create(scan, scan.getTraitSet(), newGroupScan, 
scan.getRowType());;
 
 if (project != null) {
   newScan = project.copy(project.getTraitSet(), ImmutableList.of(newScan));
 }
+
+if (newGroupScan instanceof AbstractParquetGroupScan) {
+  RowsMatch matchAll = RowsMatch.ALL;
+  List rowGroupInfos = ((AbstractParquetGroupScan) 
newGroupScan).rowGroupInfos;
+  for (RowGroupInfo rowGroup : rowGroupInfos) {
+if (rowGroup.getRowsMatch() != RowsMatch.ALL) {
+  matchAll = RowsMatch.SOME;
+  break;
+}
+  }
+  if (matchAll == ParquetFilterPredicate.RowsMatch.ALL) {
+call.transformTo(newScan);
+  }
+} else {
+  final RelNode newFilter = filter.copy(filter.getTraitSet(), 
ImmutableList.of(newScan));
+  call.transformTo(newFilter);
+}
+
 final RelNode newFilter = filter.copy(filter.getTraitSet(), 
ImmutableList.of(newScan));
 
 Review comment:
   How does it work in case `newGroupScan` is not an instance of 
`AbstractParquetGroupScan`? Won't `filter.copy` be called twice?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Filter pruning for multi rowgroup parquet file
> --
>
> Key: DRILL-5796
> URL: https://issues.apache.org/jira/browse/DRILL-5796
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: Damien Profeta
>Assignee: Jean-Blas IMBERT
>Priority: Major
> Fix For: 1.14.0
>
>
> Today, filter pruning uses the file name as the partitioning key. This means 
> you can prune a partition only if the whole file belongs to the same partition. 
> With Parquet, the filter can be pruned at the rowgroup level, making the 
> rowgroup, not the file, the unit of work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-07-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554414#comment-16554414
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

vrozov commented on a change in pull request #1298: DRILL-5796: Filter pruning 
for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r204807292
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetIsPredicate.java
 ##
 @@ -124,8 +124,7 @@ private static LogicalExpression 
createIsTruePredicate(LogicalExpression expr) {
*/
   private static LogicalExpression createIsFalsePredicate(LogicalExpression 
expr) {
 return new ParquetIsPredicate(expr, (exprStat, evaluator) ->
-//if min value is not false or if there are all nulls  -> canDrop
-isAllNulls(exprStat, evaluator.getRowCount()) || 
exprStat.hasNonNullValue() && ((BooleanStatistics) exprStat).getMin()
+  isAllNulls(exprStat, evaluator.getRowCount()) || 
exprStat.hasNonNullValue() && ((BooleanStatistics) exprStat).getMin() ? 
RowsMatch.NONE : checkNull(exprStat)
 
 Review comment:
   Please add unit tests. As you can see, integration tests may result in 
false positives.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Filter pruning for multi rowgroup parquet file
> --
>
> Key: DRILL-5796
> URL: https://issues.apache.org/jira/browse/DRILL-5796
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: Damien Profeta
>Assignee: Jean-Blas IMBERT
>Priority: Major
> Fix For: 1.14.0
>
>
> Today, filter pruning uses the file name as the partitioning key. This means 
> you can prune a partition only if the whole file belongs to the same partition. 
> With Parquet, the filter can be pruned at the rowgroup level, making the 
> rowgroup, not the file, the unit of work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-07-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554406#comment-16554406
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

vrozov commented on a change in pull request #1298: DRILL-5796: Filter pruning 
for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r204805759
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetIsPredicate.java
 ##
 @@ -62,91 +62,126 @@ private ParquetIsPredicate(LogicalExpression expr, 
BiPredicate, Ra
 return visitor.visitUnknown(this, value);
   }
 
-  @Override
-  public boolean canDrop(RangeExprEvaluator evaluator) {
+  /**
+   * Apply the filter condition against the metadata of the rowgroup.
+   */
+  public RowsMatch matches(RangeExprEvaluator evaluator) {
 Statistics exprStat = expr.accept(evaluator, null);
-if (isNullOrEmpty(exprStat)) {
-  return false;
-}
+return isNullOrEmpty(exprStat) ? RowsMatch.SOME : 
predicate.apply(exprStat, evaluator);
+  }
 
-return predicate.test(exprStat, evaluator);
+  /**
+   * After applying the filter against the statistics of the rowgroup, if the 
result is RowsMatch.ALL,
+   * then we still must know whether the rowgroup contains some null values, 
because they can change the filter result.
+   * If it contains some null values, then we change RowsMatch.ALL into 
RowsMatch.SOME, which says that maybe
+   * some values (the null ones) should be discarded.
+   */
+  private static RowsMatch checkNull(Statistics exprStat) {
+return hasNoNulls(exprStat) ? RowsMatch.ALL : RowsMatch.SOME;
   }
 
+  /**
+   * Return true if exprStat.getMin is defined and true
+   */
+  private static Boolean minIsTrue(Statistics exprStat) { return 
exprStat.hasNonNullValue() && ((BooleanStatistics) exprStat).getMin(); }
+
+  /**
+   * Return true if exprStat.getMin is defined and false
+   */
+  private static Boolean minIsFalse(Statistics exprStat) { return 
exprStat.hasNonNullValue() && !((BooleanStatistics) exprStat).getMin(); }
+
+  /**
+   * Return true if exprStat.getMax is defined and true
+   */
+  private static Boolean maxIsTrue(Statistics exprStat) { return 
exprStat.hasNonNullValue() && ((BooleanStatistics) exprStat).getMax(); }
+
+  /**
+   * Return true if exprStat.getMax is defined and false
+   */
+  private static Boolean maxIsFalse(Statistics exprStat) { return 
exprStat.hasNonNullValue() && !((BooleanStatistics) exprStat).getMax(); }
+
   /**
* IS NULL predicate.
*/
   private static > LogicalExpression 
createIsNullPredicate(LogicalExpression expr) {
 return new ParquetIsPredicate(expr,
-//if there are no nulls  -> canDrop
-(exprStat, evaluator) -> hasNoNulls(exprStat)) {
-  private final boolean isArray = isArray(expr);
-
-  private boolean isArray(LogicalExpression expression) {
-if (expression instanceof TypedFieldExpr) {
-  TypedFieldExpr typedFieldExpr = (TypedFieldExpr) expression;
-  SchemaPath schemaPath = typedFieldExpr.getPath();
-  return schemaPath.isArray();
-}
-return false;
-  }
-
-  @Override
-  public boolean canDrop(RangeExprEvaluator evaluator) {
+  (exprStat, evaluator) -> {
 // for arrays we are not able to define exact number of nulls
 // [1,2,3] vs [1,2] -> in second case 3 is absent and thus it's null 
but statistics shows no nulls
-return !isArray && super.canDrop(evaluator);
-  }
-};
+if (expr instanceof TypedFieldExpr) {
+  TypedFieldExpr typedFieldExpr = (TypedFieldExpr) expr;
+  if (typedFieldExpr.getPath().isArray()) {
+return RowsMatch.SOME;
+  }
+}
+if (hasNoNulls(exprStat)) {
+  return RowsMatch.NONE;
+}
+return isAllNulls(exprStat, evaluator.getRowCount()) ? RowsMatch.ALL : 
RowsMatch.SOME;
+  });
   }
 
   /**
* IS NOT NULL predicate.
*/
   private static > LogicalExpression 
createIsNotNullPredicate(LogicalExpression expr) {
 return new ParquetIsPredicate(expr,
-//if there are all nulls  -> canDrop
-(exprStat, evaluator) -> isAllNulls(exprStat, evaluator.getRowCount())
+  (exprStat, evaluator) -> isAllNulls(exprStat, evaluator.getRowCount()) ? 
RowsMatch.NONE : checkNull(exprStat)
 );
   }
 
   /**
* IS TRUE predicate.
*/
   private static LogicalExpression createIsTruePredicate(LogicalExpression 
expr) {
-return new ParquetIsPredicate(expr, (exprStat, evaluator) ->
-//if max value is not true or if there are all nulls  -> canDrop
-isAllNulls(exprStat, evaluator.getRowCount()) || 
exprStat.hasNonNullValue() && !((BooleanStatistics) exprStat).getMax()
-);
+return new ParquetIsPredicate(expr, (exprStat, evaluator) -> {
+  if (isAllNulls(exprStat, evaluator.getRowCount

[jira] [Commented] (DRILL-5796) Filter pruning for multi rowgroup parquet file

2018-07-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554363#comment-16554363
 ] 

ASF GitHub Bot commented on DRILL-5796:
---

vrozov commented on a change in pull request #1298: DRILL-5796: Filter pruning 
for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r204797790
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetIsPredicate.java
 ##
 @@ -62,91 +62,126 @@ private ParquetIsPredicate(LogicalExpression expr, 
BiPredicate, Ra
 return visitor.visitUnknown(this, value);
   }
 
-  @Override
-  public boolean canDrop(RangeExprEvaluator evaluator) {
+  /**
+   * Apply the filter condition against the metadata of the rowgroup.
+   */
+  public RowsMatch matches(RangeExprEvaluator evaluator) {
 Statistics exprStat = expr.accept(evaluator, null);
-if (isNullOrEmpty(exprStat)) {
-  return false;
-}
+return isNullOrEmpty(exprStat) ? RowsMatch.SOME : 
predicate.apply(exprStat, evaluator);
+  }
 
-return predicate.test(exprStat, evaluator);
+  /**
+   * After applying the filter against the statistics of the rowgroup, if the 
result is RowsMatch.ALL,
+   * then we still must know whether the rowgroup contains some null values, 
because they can change the filter result.
+   * If it contains some null values, then we change RowsMatch.ALL into 
RowsMatch.SOME, which says that maybe
+   * some values (the null ones) should be discarded.
+   */
+  private static RowsMatch checkNull(Statistics exprStat) {
+return hasNoNulls(exprStat) ? RowsMatch.ALL : RowsMatch.SOME;
   }
 
+  /**
+   * Return true if exprStat.getMin is defined and true
+   */
+  private static Boolean minIsTrue(Statistics exprStat) { return 
exprStat.hasNonNullValue() && ((BooleanStatistics) exprStat).getMin(); }
 
 Review comment:
   why **B**oolean?
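
(The expression already evaluates to a primitive, so the usual fix is simply 
the unboxed return type; a sketch over the same names used in the diff above:)

{code:java}
// Primitive return type avoids the needless autoboxing of the result; the
// body is unchanged from the diff.
private static boolean minIsTrue(Statistics exprStat) {
  return exprStat.hasNonNullValue() && ((BooleanStatistics) exprStat).getMin();
}
{code}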


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Filter pruning for multi rowgroup parquet file
> --
>
> Key: DRILL-5796
> URL: https://issues.apache.org/jira/browse/DRILL-5796
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: Damien Profeta
>Assignee: Jean-Blas IMBERT
>Priority: Major
> Fix For: 1.14.0
>
>
> Today, filter pruning uses the file name as the partitioning key. This means 
> you can prune a partition only if the whole file belongs to the same partition. 
> With Parquet, the filter can be pruned at the rowgroup level, making the 
> rowgroup, not the file, the unit of work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6630) Extra spaces are ignored while publishing results in Drill Web UI

2018-07-24 Thread Anton Gozhiy (JIRA)
Anton Gozhiy created DRILL-6630:
---

 Summary: Extra spaces are ignored while publishing results in 
Drill Web UI
 Key: DRILL-6630
 URL: https://issues.apache.org/jira/browse/DRILL-6630
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.14.0
Reporter: Anton Gozhiy


*Prerequisites:*
Use Drill Web UI to submit queries

*Query:*
{code:sql}
select '   sdssada' from (values(1))
{code}

*Expected Result:*
{noformat}
"  sdssada"
{noformat}

*Actual Result:*
{noformat}
"sds sada"
{noformat}

*Note:* Inspecting the element using Chrome Developer Tools, you can see that it 
contains the real string, so the problem is in the HTML formatting (browsers 
collapse runs of whitespace by default).
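
One plausible direction for a fix (a sketch, not the actual Drill change): 
either render result cells with a CSS rule like {{white-space: pre-wrap}}, or 
escape the cell text so runs of spaces survive HTML rendering, e.g. with a 
hypothetical server-side helper:

{code:java}
// Hypothetical helper, for illustration only: escapes a result cell for HTML
// and turns spaces into non-breaking spaces so runs of them are preserved.
class CellEscapeSketch {
  static String escapeCell(String value) {
    StringBuilder sb = new StringBuilder(value.length());
    for (int i = 0; i < value.length(); i++) {
      char c = value.charAt(i);
      switch (c) {
        case '<': sb.append("&lt;");   break;
        case '>': sb.append("&gt;");   break;
        case '&': sb.append("&amp;");  break;
        case ' ': sb.append("&nbsp;"); break;  // survives whitespace collapsing
        default:  sb.append(c);
      }
    }
    return sb.toString();
  }
}
{code}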



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)