[jira] [Commented] (DRILL-6645) Transform TopN in Lateral Unnest pipeline to Sort and Limit.

2018-08-02 Thread Pritesh Maker (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567768#comment-16567768
 ] 

Pritesh Maker commented on DRILL-6645:
--

Added the ready-to-commit label for the batch committer to review as well.

> Transform TopN in Lateral Unnest pipeline to Sort and Limit.
> 
>
> Key: DRILL-6645
> URL: https://issues.apache.org/jira/browse/DRILL-6645
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.14.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
>
> The TopN operator is not supported in the Lateral Unnest pipeline, so 
> transform TopN into a Sort plus Limit.
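A hedged, self-contained illustration of the equivalence this rewrite relies on (plain Java, not Drill's planner code; the values are made up): TopN(k) returns exactly the rows produced by a full Sort followed by a Limit of k.
{code}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TopNEquivalence { // illustration only
  public static void main(String[] args) {
    List<Integer> rows = List.of(5, 1, 4, 2, 3);
    int k = 3;

    List<Integer> sorted = new ArrayList<>(rows);
    sorted.sort(Comparator.naturalOrder());    // what the Sort operator does
    List<Integer> topK = sorted.subList(0, k); // what Limit k does

    System.out.println(topK); // [1, 2, 3] -- what TopN(k) would emit
  }
}
{code}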



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6645) Transform TopN in Lateral Unnest pipeline to Sort and Limit.

2018-08-02 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6645:
-
Labels: ready-to-commit  (was: )

> Transform TopN in Lateral Unnest pipeline to Sort and Limit.
> 
>
> Key: DRILL-6645
> URL: https://issues.apache.org/jira/browse/DRILL-6645
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.14.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
>
> The TopN operator is not supported in the Lateral Unnest pipeline, so 
> transform TopN into a Sort plus Limit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6645) Transform TopN in Lateral Unnest pipeline to Sort and Limit.

2018-08-02 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6645:
-
Reviewer: Gautam Kumar Parai

> Transform TopN in Lateral Unnest pipeline to Sort and Limit.
> 
>
> Key: DRILL-6645
> URL: https://issues.apache.org/jira/browse/DRILL-6645
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.14.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Major
> Fix For: 1.15.0
>
>
> The TopN operator is not supported in the Lateral Unnest pipeline, so 
> transform TopN into a Sort plus Limit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-08-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567763#comment-16567763
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader 
for Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-410142512
 
 
   @ilooner, the PR in question was a different one; one that tried a more 
general fix for the concurrent vector update issue. Thanks for rerunning this 
one so we can see if the workaround works.
   
   I did just rebase the code, so hopefully you will find no issues when you do 
it again.
   
   Thanks much for your help on this. Once this goes in, I can add the final 
vector types, then move on to the text and JSON readers. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Attachments: 6373_Functional_Fail_07_13_1300.txt, 
> drill-6373-with-6585-fix-functional-failure.txt
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6657) Unnest reports one batch less than the actual number of batches

2018-08-02 Thread Pritesh Maker (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567758#comment-16567758
 ] 

Pritesh Maker commented on DRILL-6657:
--

h3.  [parthchandra|https://github.com/parthchandra] merged commit 
[{{419f51e}}|https://github.com/apache/drill/commit/419f51e57b389e39a0c3c090ae0e8d34e1fb944c]

> Unnest reports one batch less than the actual number of batches
> ---
>
> Key: DRILL-6657
> URL: https://issues.apache.org/jira/browse/DRILL-6657
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Parth Chandra
>Assignee: Parth Chandra
>Priority: Major
> Fix For: 1.15.0
>
>
> Unnest doesn't count the first batch that comes in. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-08-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567756#comment-16567756
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

ilooner commented on issue #1244: DRILL-6373: Refactor Result Set Loader for 
Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-410141154
 
 
   @paul-rogers My apologies, I don't recall discussing or disagreeing with any 
fix to the value vectors. I am in favor of whatever it takes to get this change 
in ASAP. I will try rebasing this PR on top of the latest master. If the tests 
pass I will merge it. If they don't, I will try to help debug or provide a 
unit test that reproduces the issue. Will keep you posted.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Attachments: 6373_Functional_Fail_07_13_1300.txt, 
> drill-6373-with-6585-fix-functional-failure.txt
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6644) In Some Cases The HashJoin Memory Calculator Over Reserves Memory For The Probe Side During The Build Phase

2018-08-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567749#comment-16567749
 ] 

ASF GitHub Bot commented on DRILL-6644:
---

ilooner commented on a change in pull request #1409: DRILL-6644: Don't reserve 
space for incoming probe batches unnecessarily during the build phase.
URL: https://github.com/apache/drill/pull/1409#discussion_r207436910
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinMemoryCalculatorImpl.java
 ##
 @@ -391,7 +391,7 @@ private void calculateMemoryUsage()
     // probe batch we sniffed.
     // TODO when batch sizing project is complete we won't have to sniff probe batches since
     // they will have a well defined size.
-    reservedMemory = incompletePartitionsBatchSizes + maxBuildBatchSize + maxProbeBatchSize;
+    reservedMemory = incompletePartitionsBatchSizes + maxBuildBatchSize + probeSizePredictor.getBatchSize();
 
 Review comment:
   I'm not sure. I think we need to get some clarification on the desired 
behavior of operators. If the Drill engine is operating under the assumption 
that ownership of VectorContainers passes to a downstream operator after a call 
to next(), then this is a bug that should be fixed.
   
   I will ask a question on the dev list about this.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> In Some Cases The HashJoin Memory Calculator Over Reserves Memory For The 
> Probe Side During The Build Phase
> ---
>
> Key: DRILL-6644
> URL: https://issues.apache.org/jira/browse/DRILL-6644
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>
> There are two cases where the HashJoin memory calculator over-reserves memory:
>  1. It reserves a maximum incoming probe batch size during the build phase. 
> This is not really necessary because we will not fetch probe data until the 
> probe phase; we only have to account for the data received during 
> OK_NEW_SCHEMA (see the sketch below).
>  2. https://issues.apache.org/jira/browse/DRILL-6646
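A hedged arithmetic sketch of case 1; the variable names echo the PR diff, but every byte count below is an assumed example value:
{code}
public class ReservationSketch { // illustration with made-up sizes
  public static void main(String[] args) {
    long incompletePartitionsBatchSizes = 4L << 20; // 4 MiB, assumed
    long maxBuildBatchSize = 2L << 20;              // 2 MiB, assumed
    long maxProbeBatchSize = 8L << 20;              // worst-case probe estimate
    long sniffedProbeBatchSize = 1L << 20;          // batch seen at OK_NEW_SCHEMA

    long before = incompletePartitionsBatchSizes + maxBuildBatchSize + maxProbeBatchSize;
    long after = incompletePartitionsBatchSizes + maxBuildBatchSize + sniffedProbeBatchSize;

    // The fix frees the difference for the build side during the build phase.
    System.out.println("freed bytes: " + (before - after)); // 7340032 (7 MiB)
  }
}
{code}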



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6453) TPC-DS query 72 has regressed

2018-08-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567737#comment-16567737
 ] 

ASF GitHub Bot commented on DRILL-6453:
---

ilooner commented on a change in pull request #1408: DRILL-6453: Resolve 
deadlock when reading from build and probe sides simultaneously in HashJoin
URL: https://github.com/apache/drill/pull/1408#discussion_r207435469
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java
 ##
 @@ -381,16 +409,14 @@ public HashJoinMemoryCalculator getCalculatorImpl() {
 
   @Override
   public IterOutcome innerNext() {
-    if (!prefetched) {
+    if (!prefetchedBuild) {
       // If we didn't retrieve our first data hold batch, we need to do it now.
-      prefetched = true;
-      prefetchFirstBatchFromBothSides();
+      prefetchedBuild = true;
+      prefetchFirstBuildBatch();
 
       // Handle emitting the correct outcome for termination conditions
-      // Use the state set by prefetchFirstBatchFromBothSides to emit the correct termination outcome.
+      // Use the state set by prefetchFirstBuildBatch to emit the correct termination outcome.
 
 Review comment:
   Refactored this code.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> TPC-DS query 72 has regressed
> -
>
> Key: DRILL-6453
> URL: https://issues.apache.org/jira/browse/DRILL-6453
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.14.0
>Reporter: Khurram Faraaz
>Assignee: Timothy Farkas
>Priority: Blocker
> Fix For: 1.15.0
>
> Attachments: 24f75b18-014a-fb58-21d2-baeab5c3352c.sys.drill, 
> jstack_29173_June_10_2018.txt, jstack_29173_June_10_2018.txt, 
> jstack_29173_June_10_2018_b.txt, jstack_29173_June_10_2018_b.txt, 
> jstack_29173_June_10_2018_c.txt, jstack_29173_June_10_2018_c.txt, 
> jstack_29173_June_10_2018_d.txt, jstack_29173_June_10_2018_d.txt, 
> jstack_29173_June_10_2018_e.txt, jstack_29173_June_10_2018_e.txt
>
>
> TPC-DS query 72 seems to have regressed; the query profile for the case where 
> it was canceled after 2 hours on Drill 1.14.0 is attached here.
> {noformat}
> On Drill 1.14.0-SNAPSHOT 
> commit: 931b43e (TPC-DS query 72 executed successfully on this commit, took 
> around 55 seconds to execute)
> SF1 parquet data on 4 nodes; 
> planner.memory.max_query_memory_per_node = 10737418240. 
> drill.exec.hashagg.fallback.enabled = true
> TPC-DS query 72 executed successfully & took 47 seconds to complete execution.
> {noformat}
> {noformat}
> TPC-DS data in the below run has date values stored as DATE datatype and not 
> VARCHAR type.
> On Drill 1.14.0-SNAPSHOT
> commit: 82e1a12
> SF1 parquet data on 4 nodes; 
> planner.memory.max_query_memory_per_node = 10737418240. 
> drill.exec.hashagg.fallback.enabled = true
> and
> alter system set `exec.hashjoin.num_partitions` = 1;
> TPC-DS query 72 executed for 2 hrs and 11 mins and did not complete; I had to 
> cancel it by stopping the Foreman drillbit.
> As a result, several minor fragments are reported to be in 
> CANCELLATION_REQUESTED state on the UI.
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6453) TPC-DS query 72 has regressed

2018-08-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567738#comment-16567738
 ] 

ASF GitHub Bot commented on DRILL-6453:
---

ilooner commented on issue #1408: DRILL-6453: Resolve deadlock when reading 
from build and probe sides simultaneously in HashJoin
URL: https://github.com/apache/drill/pull/1408#issuecomment-410138161
 
 
   @Ben-Zvi I've refactored the code; it should be much cleaner now. Please take 
another look.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> TPC-DS query 72 has regressed
> -
>
> Key: DRILL-6453
> URL: https://issues.apache.org/jira/browse/DRILL-6453
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.14.0
>Reporter: Khurram Faraaz
>Assignee: Timothy Farkas
>Priority: Blocker
> Fix For: 1.15.0
>
> Attachments: 24f75b18-014a-fb58-21d2-baeab5c3352c.sys.drill, 
> jstack_29173_June_10_2018.txt, jstack_29173_June_10_2018.txt, 
> jstack_29173_June_10_2018_b.txt, jstack_29173_June_10_2018_b.txt, 
> jstack_29173_June_10_2018_c.txt, jstack_29173_June_10_2018_c.txt, 
> jstack_29173_June_10_2018_d.txt, jstack_29173_June_10_2018_d.txt, 
> jstack_29173_June_10_2018_e.txt, jstack_29173_June_10_2018_e.txt
>
>
> TPC-DS query 72 seems to have regressed; the query profile for the case where 
> it was canceled after 2 hours on Drill 1.14.0 is attached here.
> {noformat}
> On Drill 1.14.0-SNAPSHOT 
> commit: 931b43e (TPC-DS query 72 executed successfully on this commit, took 
> around 55 seconds to execute)
> SF1 parquet data on 4 nodes; 
> planner.memory.max_query_memory_per_node = 10737418240. 
> drill.exec.hashagg.fallback.enabled = true
> TPC-DS query 72 executed successfully & took 47 seconds to complete execution.
> {noformat}
> {noformat}
> TPC-DS data in the below run has date values stored as DATE datatype and not 
> VARCHAR type.
> On Drill 1.14.0-SNAPSHOT
> commit: 82e1a12
> SF1 parquet data on 4 nodes; 
> planner.memory.max_query_memory_per_node = 10737418240. 
> drill.exec.hashagg.fallback.enabled = true
> and
> alter system set `exec.hashjoin.num_partitions` = 1;
> TPC-DS query 72 executed for 2 hrs and 11 mins and did not complete; I had to 
> cancel it by stopping the Foreman drillbit.
> As a result, several minor fragments are reported to be in 
> CANCELLATION_REQUESTED state on the UI.
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6453) TPC-DS query 72 has regressed

2018-08-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567734#comment-16567734
 ] 

ASF GitHub Bot commented on DRILL-6453:
---

ilooner commented on a change in pull request #1408: DRILL-6453: Resolve 
deadlock when reading from build and probe sides simultaneously in HashJoin
URL: https://github.com/apache/drill/pull/1408#discussion_r207435292
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java
 ##
 @@ -248,32 +254,54 @@ protected void buildSchema() throws SchemaChangeException {
     }
   }
 
-  @Override
-  protected boolean prefetchFirstBatchFromBothSides() {
-    if (leftUpstream != IterOutcome.NONE) {
-      // We can only get data if there is data available
-      leftUpstream = sniffNonEmptyBatch(leftUpstream, LEFT_INDEX, left);
-    }
-
+  private void prefetchFirstBuildBatch() {
     if (rightUpstream != IterOutcome.NONE) {
       // We can only get data if there is data available
       rightUpstream = sniffNonEmptyBatch(rightUpstream, RIGHT_INDEX, right);
     }
 
     buildSideIsEmpty = rightUpstream == IterOutcome.NONE;
 
-    if (verifyOutcomeToSetBatchState(leftUpstream, rightUpstream)) {
+    if (rightUpstream == IterOutcome.OUT_OF_MEMORY) {
+      // We reached a termination state
+      state = BatchState.OUT_OF_MEMORY;
+    } else if (rightUpstream == IterOutcome.STOP) {
+      state = BatchState.STOP;
+    } else {
       // For build side, use aggregate i.e. average row width across batches
-      batchMemoryManager.update(LEFT_INDEX, 0);
       batchMemoryManager.update(RIGHT_INDEX, 0, true);
-
-      logger.debug("BATCH_STATS, incoming left: {}", batchMemoryManager.getRecordBatchSizer(LEFT_INDEX));
       logger.debug("BATCH_STATS, incoming right: {}", batchMemoryManager.getRecordBatchSizer(RIGHT_INDEX));
 
       // Got our first batch(es)
       state = BatchState.FIRST;
+    }
+  }
+
+  /**
+   *
+   * @return true to terminate, false to continue.
+   */
+  private boolean prefetchFirstProbeBatch() {
 
 Review comment:
   Refactored this code.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> TPC-DS query 72 has regressed
> -
>
> Key: DRILL-6453
> URL: https://issues.apache.org/jira/browse/DRILL-6453
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.14.0
>Reporter: Khurram Faraaz
>Assignee: Timothy Farkas
>Priority: Blocker
> Fix For: 1.15.0
>
> Attachments: 24f75b18-014a-fb58-21d2-baeab5c3352c.sys.drill, 
> jstack_29173_June_10_2018.txt, jstack_29173_June_10_2018.txt, 
> jstack_29173_June_10_2018_b.txt, jstack_29173_June_10_2018_b.txt, 
> jstack_29173_June_10_2018_c.txt, jstack_29173_June_10_2018_c.txt, 
> jstack_29173_June_10_2018_d.txt, jstack_29173_June_10_2018_d.txt, 
> jstack_29173_June_10_2018_e.txt, jstack_29173_June_10_2018_e.txt
>
>
> TPC-DS query 72 seems to have regressed; the query profile for the case where 
> it was canceled after 2 hours on Drill 1.14.0 is attached here.
> {noformat}
> On Drill 1.14.0-SNAPSHOT 
> commit: 931b43e (TPC-DS query 72 executed successfully on this commit, took 
> around 55 seconds to execute)
> SF1 parquet data on 4 nodes; 
> planner.memory.max_query_memory_per_node = 10737418240. 
> drill.exec.hashagg.fallback.enabled = true
> TPC-DS query 72 executed successfully & took 47 seconds to complete execution.
> {noformat}
> {noformat}
> TPC-DS data in the below run has date values stored as DATE datatype and not 
> VARCHAR type.
> On Drill 1.14.0-SNAPSHOT
> commit: 82e1a12
> SF1 parquet data on 4 nodes; 
> planner.memory.max_query_memory_per_node = 10737418240. 
> drill.exec.hashagg.fallback.enabled = true
> and
> alter system set `exec.hashjoin.num_partitions` = 1;
> TPC-DS query 72 executed for 2 hrs and 11 mins and did not complete; I had to 
> cancel it by stopping the Foreman drillbit.
> As a result, several minor fragments are reported to be in 
> CANCELLATION_REQUESTED state on the UI.
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6662) Access AWS access key ID and secret access key using Credential Provider API for S3 storage plugin

2018-08-02 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567682#comment-16567682
 ] 

Steve Loughran commented on DRILL-6662:
---

you might want to take the code from [S3A 
Utils|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AUtils.java#L735],
 which handles per-bucket secrets in the config files. That allows you to have 
different secrets (including encryption keys) for different buckets.
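A hedged illustration of that per-bucket pattern (the helper class and the bucket name "logs" are made up; the fs.s3a.bucket.* key convention is Hadoop's):
{code}
import org.apache.hadoop.conf.Configuration;

public class PerBucketConfig { // illustration only
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("fs.s3a.secret.key", "default-secret");               // global fallback
    conf.set("fs.s3a.bucket.logs.secret.key", "logs-only-secret"); // bucket "logs" only
    System.out.println(conf.get("fs.s3a.bucket.logs.secret.key"));
  }
}
{code}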

> Access AWS access key ID and secret access key using Credential Provider API 
> for S3 storage plugin
> --
>
> Key: DRILL-6662
> URL: https://issues.apache.org/jira/browse/DRILL-6662
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
>
> Hadoop provides a [CredentialProvider 
> API|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html]
>  which allows passwords and other sensitive secrets to be stored in an 
> external provider rather than in plaintext in configuration files.
> Currently the S3 storage plugin reads its passwords, namely 
> 'fs.s3a.access.key' and 'fs.s3a.secret.key', in clear text from the 
> Configuration with the get() method. To let users remove clear-text 
> passwords for S3 from configuration files, the Configuration.getPassword() 
> method should be used instead, given that they configure the 
> 'hadoop.security.credential.provider.path' property to point to a file 
> containing encrypted passwords rather than configuring the two 
> aforementioned properties.
> With this approach, credential providers will be checked first; if the 
> secret is not provided or no providers are configured, there will be a 
> fallback to secrets configured in clear text (unless 
> 'hadoop.security.credential.clear-text-fallback' is set to "false"), thus 
> making the change backwards-compatible.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6640) Drill takes long time in planning when there are large number of files in views/tables DFS parent directory

2018-08-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567667#comment-16567667
 ] 

ASF GitHub Bot commented on DRILL-6640:
---

HanumathRao commented on a change in pull request #1405: DRILL-6640: Modifying 
DotDrillUtil implementation to avoid using globStatus calls
URL: https://github.com/apache/drill/pull/1405#discussion_r207419304
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/dotdrill/TestDotDrillUtil.java
 ##
 @@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.dotdrill;
+
+import java.util.List;
+
+import static org.junit.Assert.assertTrue;
+
+import org.apache.drill.exec.store.dfs.DrillFileSystem;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+public class TestDotDrillUtil {
+
+  private static final String TEST_ROOT_DIR = 
System.getProperty("java.io.tmpdir") + "/dot_drill_util";
 
 Review comment:
   I think you may want to look at some existing tests and inherit a base class 
like BaseTestQuery. This will make sure that the test directories are cleaned 
up properly.
   
   @ilooner Can you please take a look at the test file?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Drill takes long time in planning when there are large number of files in  
> views/tables DFS parent directory
> 
>
> Key: DRILL-6640
> URL: https://issues.apache.org/jira/browse/DRILL-6640
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Reporter: Arjun
>Assignee: Arjun
>Priority: Major
> Fix For: 1.15.0
>
>
> When Drill is used for querying views/tables, the query planning time 
> increases as the number of files in the views'/tables' parent directory 
> grows. This becomes unacceptably long with complex queries.
> This is caused by the globStatus operation on view files, which uses a GLOB 
> pattern to retrieve view file status. It can be improved by avoiding the 
> GLOB pattern for Drill metadata files such as view files (see the sketch 
> below).
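A hedged sketch of the kind of change described above (illustrative, not the PR's actual code; DotDrillLister is a made-up name): replace the glob call with a plain directory listing plus a PathFilter, avoiding glob-pattern expansion.
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DotDrillLister { // hypothetical helper, illustration only
  public static FileStatus[] listDotDrill(FileSystem fs, Path root) throws IOException {
    // One directory listing plus a suffix check per entry, instead of
    // glob matching via fs.globStatus(new Path(root, "*.drill")).
    return fs.listStatus(root, path -> path.getName().endsWith(".drill"));
  }
}
{code}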



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6101) Optimize Implicit Columns Processing

2018-08-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567660#comment-16567660
 ] 

ASF GitHub Bot commented on DRILL-6101:
---

ilooner commented on issue #1414: DRILL-6101: Optimized implicit columns 
handling within scanner
URL: https://github.com/apache/drill/pull/1414#issuecomment-410114076
 
 
   @paul-rogers I suggest we go with this change for now since the code change 
is minimal, is needed by some of our users, and provides a big speedup. I see 
your result set loader PRs are blocked on testing. I will be your personal 
tester and try to help you get those PRs in. When those PRs are ready we can 
switch to using your comprehensive solution.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Optimize Implicit Columns Processing
> 
>
> Key: DRILL-6101
> URL: https://issues.apache.org/jira/browse/DRILL-6101
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.12.0
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Critical
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
>
> Problem Description -
>  * Apache Drill allows users to specify columns even for SELECT STAR queries
>  * From my discussion with [~paul-rogers], Apache Calcite has a limitation 
> where the extra columns are not provided
>  * The workaround has been to always include all implicit columns for SELECT 
> STAR queries
>  * Unfortunately, the current implementation is very inefficient as implicit 
> column values get duplicated; this leads to substantial performance 
> degradation when the number of rows is large
> Suggested Optimization -
>  * The NullableVarChar vector should be enhanced to efficiently store 
> duplicate values (see the sketch below)
>  * This will not only address the current Calcite limitation (for SELECT 
> STAR queries) but also optimize all queries with implicit columns
>  
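A minimal, hedged sketch of the suggested idea (the class and API are made up, not Drill's vector code): a column whose value repeats for every row, as implicit file/partition columns do, can store the value once plus a row count.
{code}
public class ConstantColumn { // illustration only
  private final String value;  // the single repeated value, stored once
  private final int rowCount;  // logical number of rows

  public ConstantColumn(String value, int rowCount) {
    this.value = value;
    this.rowCount = rowCount;
  }

  public String get(int index) {
    if (index < 0 || index >= rowCount) {
      throw new IndexOutOfBoundsException("row " + index + " of " + rowCount);
    }
    return value; // every row sees the same value, e.g. a filename
  }
}
{code}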



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6645) Transform TopN in Lateral Unnest pipeline to Sort and Limit.

2018-08-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567618#comment-16567618
 ] 

ASF GitHub Bot commented on DRILL-6645:
---

gparai commented on a change in pull request #1417: DRILL-6645: Transform TopN 
in Lateral Unnest pipeline to Sort and Limit.
URL: https://github.com/apache/drill/pull/1417#discussion_r207410445
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/TopNPrel.java
 ##
 @@ -115,6 +118,16 @@ public Prel addImplicitRowIDCol(List<RelNode> children) {
 
         .replace(this.getTraitSet().getTrait(DrillDistributionTraitDef.INSTANCE))
         .replace(collationTrait)
         .replace(DRILL_PHYSICAL);
-    return (Prel) this.copy(traits, children);
+    return transformTopNToSortAndLimit(children, traits, collationTrait);
+  }
+
+  private Prel transformTopNToSortAndLimit(List<RelNode> children, RelTraitSet traits, RelCollation collationTrait) {
+    SortPrel sortprel = new SortPrel(this.getCluster(), traits, children.get(0), collationTrait);
+    RexNode offset = this.getCluster().getRexBuilder().makeExactLiteral(BigDecimal.valueOf(0),
+        this.getCluster().getTypeFactory().createSqlType(SqlTypeName.INTEGER));
+    RexNode limit = this.getCluster().getRexBuilder().makeExactLiteral(BigDecimal.valueOf(this.limit),
 
 Review comment:
   Just a minor comment - you can add a comment describing why we don't need 
SMEX.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Transform TopN in Lateral Unnest pipeline to Sort and Limit.
> 
>
> Key: DRILL-6645
> URL: https://issues.apache.org/jira/browse/DRILL-6645
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.14.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Major
> Fix For: 1.15.0
>
>
> The TopN operator is not supported in the Lateral Unnest pipeline, so 
> transform TopN into a Sort plus Limit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6640) Drill takes long time in planning when there are large number of files in views/tables DFS parent directory

2018-08-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567611#comment-16567611
 ] 

ASF GitHub Bot commented on DRILL-6640:
---

HanumathRao commented on a change in pull request #1405: DRILL-6640: Modifying 
DotDrillUtil implementation to avoid using globStatus calls
URL: https://github.com/apache/drill/pull/1405#discussion_r207409716
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/dotdrill/DotDrillUtil.java
 ##
 @@ -48,16 +59,70 @@
 }
 return files;
   }
-
+  /**
+   * Return list of DotDrillFile objects whose file name ends with .drill and 
matches the provided Drill Dot files types
+   * in a given parent Path.
+   * Return an empty list if no files matches the given Dot Drill File Types.
+   * @param fs DrillFileSystem instance
+   * @param root parent Path
+   * @param types Dot Drill Types to be matched
+   * @return List of matched DotDrillFile objects
+   * @throws IOException
+   */
   public static List<DotDrillFile> getDotDrills(DrillFileSystem fs, Path root, 
DotDrillType... types) throws IOException{
-return getDrillFiles(fs, fs.globStatus(new Path(root, "*.drill")), types);
+return getDrillFiles(fs,  getDrillFileStatus(fs,root,"*.drill",types) , 
types);
   }
 
+  /**
+   * Return list of DotDrillFile objects whose file name matches the provided 
name pattern and Drill Dot files types
+   * in a given parent Path.
+   * Return an empty list if no files matches the given file name and Dot 
Drill File Types.
+   * @param fs DrillFileSystem instance
+   * @param root parent Path
+   * @param name name/pattern of the file
+   * @param types Dot Drill Types to be matched
+   * @return List of matched DotDrillFile objects
+   * @throws IOException
+   */
   public static List<DotDrillFile> getDotDrills(DrillFileSystem fs, Path root, 
String name, DotDrillType... types) throws IOException{
-if(!name.endsWith(".drill")) {
-  name = name + DotDrillType.DOT_DRILL_GLOB;
-}
+   return getDrillFiles(fs, getDrillFileStatus(fs,root,name,types), types);
+  }
 
-return getDrillFiles(fs, fs.globStatus(new Path(root, name)), types);
+  /**
+   * Return list of FileStatus objects whose file name matches the provided 
name pattern and Drill Dot file types
+   * in a given parent Path.
+   * Return an empty list if no files matches the pattern and Drill Dot file 
types.
+   * @param fs DrillFileSystem instance
+   * @param root parent Path
+   * @param name name/pattern of the file
+   * @param types Dot Drill Types to be matched
+   * @return List of FileStatuses for files matching name and  Drill Dot file 
types.
+   * @throws IOException  if any I/O error occurs when fetching file status
+   */
+  private static List<FileStatus> getDrillFileStatus(DrillFileSystem fs, Path 
root, String name, DotDrillType... types) throws IOException{
+    List<FileStatus> statusList = new ArrayList<>();
 
 Review comment:
   Would the name filesStatus make sense instead?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Drill takes long time in planning when there are large number of files in  
> views/tables DFS parent directory
> 
>
> Key: DRILL-6640
> URL: https://issues.apache.org/jira/browse/DRILL-6640
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Reporter: Arjun
>Assignee: Arjun
>Priority: Major
> Fix For: 1.15.0
>
>
> When Drill is used for querying views/tables, the query planning time 
> increases as the number of files in the views'/tables' parent directory 
> grows. This becomes unacceptably long with complex queries.
> This is caused by the globStatus operation on view files, which uses a GLOB 
> pattern to retrieve view file status. It can be improved by avoiding the 
> GLOB pattern for Drill metadata files such as view files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6640) Drill takes long time in planning when there are large number of files in views/tables DFS parent directory

2018-08-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567610#comment-16567610
 ] 

ASF GitHub Bot commented on DRILL-6640:
---

HanumathRao commented on a change in pull request #1405: DRILL-6640: Modifying 
DotDrillUtil implementation to avoid using globStatus calls
URL: https://github.com/apache/drill/pull/1405#discussion_r207409509
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/dotdrill/DotDrillUtil.java
 ##
 @@ -48,16 +59,70 @@
 }
 return files;
   }
-
+  /**
+   * Return list of DotDrillFile objects whose file name ends with .drill and 
matches the provided Drill Dot files types
+   * in a given parent Path.
+   * Return an empty list if no files matches the given Dot Drill File Types.
+   * @param fs DrillFileSystem instance
+   * @param root parent Path
+   * @param types Dot Drill Types to be matched
+   * @return List of matched DotDrillFile objects
+   * @throws IOException
+   */
   public static List<DotDrillFile> getDotDrills(DrillFileSystem fs, Path root, 
DotDrillType... types) throws IOException{
-return getDrillFiles(fs, fs.globStatus(new Path(root, "*.drill")), types);
+return getDrillFiles(fs,  getDrillFileStatus(fs,root,"*.drill",types) , 
types);
   }
 
+  /**
+   * Return list of DotDrillFile objects whose file name matches the provided 
name pattern and Drill Dot files types
+   * in a given parent Path.
+   * Return an empty list if no files matches the given file name and Dot 
Drill File Types.
+   * @param fs DrillFileSystem instance
+   * @param root parent Path
+   * @param name name/pattern of the file
+   * @param types Dot Drill Types to be matched
+   * @return List of matched DotDrillFile objects
+   * @throws IOException
+   */
   public static List<DotDrillFile> getDotDrills(DrillFileSystem fs, Path root, 
String name, DotDrillType... types) throws IOException{
-if(!name.endsWith(".drill")) {
-  name = name + DotDrillType.DOT_DRILL_GLOB;
-}
+   return getDrillFiles(fs, getDrillFileStatus(fs,root,name,types), types);
+  }
 
-return getDrillFiles(fs, fs.globStatus(new Path(root, name)), types);
+  /**
+   * Return list of FileStatus objects whose file name matches the provided 
name pattern and Drill Dot file types
+   * in a given parent Path.
+   * Return an empty list if no files matches the pattern and Drill Dot file 
types.
+   * @param fs DrillFileSystem instance
+   * @param root parent Path
+   * @param name name/pattern of the file
+   * @param types Dot Drill Types to be matched
+   * @return List of FileStatuses for files matching name and  Drill Dot file 
types.
+   * @throws IOException  if any I/O error occurs when fetching file status
+   */
+  private static List<FileStatus> getDrillFileStatus(DrillFileSystem fs, Path 
root, String name, DotDrillType... types) throws IOException{
+    List<FileStatus> statusList = new ArrayList<>();
+
+if (name.endsWith(".drill")) {
+  FileStatus[] status = fs.globStatus(new Path(root, name));
 
 Review comment:
   In this case, does it mean that types should not matter? If so, would it be 
good to add an assert?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Drill takes long time in planning when there are large number of files in  
> views/tables DFS parent directory
> 
>
> Key: DRILL-6640
> URL: https://issues.apache.org/jira/browse/DRILL-6640
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Reporter: Arjun
>Assignee: Arjun
>Priority: Major
> Fix For: 1.15.0
>
>
> When Drill is used for querying views/tables, the query planning time 
> increases as the number of files in the views'/tables' parent directory 
> grows. This becomes unacceptably long with complex queries.
> This is caused by the globStatus operation on view files, which uses a GLOB 
> pattern to retrieve view file status. It can be improved by avoiding the 
> GLOB pattern for Drill metadata files such as view files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6640) Drill takes long time in planning when there are large number of files in views/tables DFS parent directory

2018-08-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567607#comment-16567607
 ] 

ASF GitHub Bot commented on DRILL-6640:
---

HanumathRao commented on a change in pull request #1405: DRILL-6640: Modifying 
DotDrillUtil implementation to avoid using globStatus calls
URL: https://github.com/apache/drill/pull/1405#discussion_r207409122
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/dotdrill/DotDrillUtil.java
 ##
 @@ -48,16 +59,70 @@
 }
 return files;
   }
-
+  /**
+   * Return list of DotDrillFile objects whose file name ends with .drill and 
matches the provided Drill Dot files types
+   * in a given parent Path.
+   * Return an empty list if no files matches the given Dot Drill File Types.
+   * @param fs DrillFileSystem instance
+   * @param root parent Path
+   * @param types Dot Drill Types to be matched
+   * @return List of matched DotDrillFile objects
+   * @throws IOException
+   */
   public static List<DotDrillFile> getDotDrills(DrillFileSystem fs, Path root, 
DotDrillType... types) throws IOException{
-return getDrillFiles(fs, fs.globStatus(new Path(root, "*.drill")), types);
+return getDrillFiles(fs,  getDrillFileStatus(fs,root,"*.drill",types) , 
types);
   }
 
+  /**
+   * Return list of DotDrillFile objects whose file name matches the provided 
name pattern and Drill Dot files types
+   * in a given parent Path.
+   * Return an empty list if no files matches the given file name and Dot 
Drill File Types.
+   * @param fs DrillFileSystem instance
+   * @param root parent Path
+   * @param name name/pattern of the file
+   * @param types Dot Drill Types to be matched
+   * @return List of matched DotDrillFile objects
+   * @throws IOException
+   */
   public static List<DotDrillFile> getDotDrills(DrillFileSystem fs, Path root, 
String name, DotDrillType... types) throws IOException{
-if(!name.endsWith(".drill")) {
-  name = name + DotDrillType.DOT_DRILL_GLOB;
-}
+   return getDrillFiles(fs, getDrillFileStatus(fs,root,name,types), types);
+  }
 
-return getDrillFiles(fs, fs.globStatus(new Path(root, name)), types);
+  /**
+   * Return list of FileStatus objects whose file name matches the provided 
name pattern and Drill Dot file types
+   * in a given parent Path.
+   * Return an empty list if no files matches the pattern and Drill Dot file 
types.
+   * @param fs DrillFileSystem instance
+   * @param root parent Path
+   * @param name name/pattern of the file
+   * @param types Dot Drill Types to be matched
+   * @return List of FileStatuses for files matching name and  Drill Dot file 
types.
+   * @throws IOException  if any I/O error occurs when fetching file status
+   */
+  private static List<FileStatus> getDrillFileStatus(DrillFileSystem fs, Path 
root, String name, DotDrillType... types) throws IOException{
+    List<FileStatus> statusList = new ArrayList<>();
+
+if (name.endsWith(".drill")) {
+  FileStatus[] status = fs.globStatus(new Path(root, name));
 
 Review comment:
   Why do you want to change the existing logic for this case?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Drill takes long time in planning when there are large number of files in  
> views/tables DFS parent directory
> 
>
> Key: DRILL-6640
> URL: https://issues.apache.org/jira/browse/DRILL-6640
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Reporter: Arjun
>Assignee: Arjun
>Priority: Major
> Fix For: 1.15.0
>
>
> When Drill is used for querying views/tables, the query planning time 
> increases as the number of files in the views'/tables' parent directory 
> grows. This becomes unacceptably long with complex queries.
> This is caused by the globStatus operation on view files, which uses a GLOB 
> pattern to retrieve view file status. It can be improved by avoiding the 
> GLOB pattern for Drill metadata files such as view files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6640) Drill takes long time in planning when there are large number of files in views/tables DFS parent directory

2018-08-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567604#comment-16567604
 ] 

ASF GitHub Bot commented on DRILL-6640:
---

HanumathRao commented on a change in pull request #1405: DRILL-6640: Modifying 
DotDrillUtil implementation to avoid using globStatus calls
URL: https://github.com/apache/drill/pull/1405#discussion_r207409069
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/dotdrill/DotDrillUtil.java
 ##
 @@ -48,16 +59,70 @@
 }
 return files;
   }
-
+  /**
+   * Return list of DotDrillFile objects whose file name ends with .drill and 
matches the provided Drill Dot files types
+   * in a given parent Path.
+   * Return an empty list if no files matches the given Dot Drill File Types.
+   * @param fs DrillFileSystem instance
+   * @param root parent Path
+   * @param types Dot Drill Types to be matched
+   * @return List of matched DotDrillFile objects
+   * @throws IOException
+   */
   public static List<DotDrillFile> getDotDrills(DrillFileSystem fs, Path root, 
DotDrillType... types) throws IOException{
-return getDrillFiles(fs, fs.globStatus(new Path(root, "*.drill")), types);
+return getDrillFiles(fs,  getDrillFileStatus(fs,root,"*.drill",types) , 
types);
   }
 
+  /**
+   * Return list of DotDrillFile objects whose file name matches the provided 
name pattern and Drill Dot files types
+   * in a given parent Path.
+   * Return an empty list if no files matches the given file name and Dot 
Drill File Types.
+   * @param fs DrillFileSystem instance
+   * @param root parent Path
+   * @param name name/pattern of the file
+   * @param types Dot Drill Types to be matched
+   * @return List of matched DotDrillFile objects
+   * @throws IOException
+   */
   public static List<DotDrillFile> getDotDrills(DrillFileSystem fs, Path root, 
String name, DotDrillType... types) throws IOException{
-if(!name.endsWith(".drill")) {
-  name = name + DotDrillType.DOT_DRILL_GLOB;
-}
+   return getDrillFiles(fs, getDrillFileStatus(fs,root,name,types), types);
+  }
 
-return getDrillFiles(fs, fs.globStatus(new Path(root, name)), types);
+  /**
+   * Return list of FileStatus objects whose file name matches the provided 
name pattern and Drill Dot file types
+   * in a given parent Path.
+   * Return an empty list if no files matches the pattern and Drill Dot file 
types.
+   * @param fs DrillFileSystem instance
+   * @param root parent Path
+   * @param name name/pattern of the file
+   * @param types Dot Drill Types to be matched
+   * @return List of FileStatuses for files matching name and  Drill Dot file 
types.
+   * @throws IOException  if any I/O error occurs when fetching file status
+   */
+  private static List<FileStatus> getDrillFileStatus(DrillFileSystem fs, Path 
root, String name, DotDrillType... types) throws IOException{
+    List<FileStatus> statusList = new ArrayList<>();
+
+if (name.endsWith(".drill")) {
+  FileStatus[] status = fs.globStatus(new Path(root, name));
+  if (status != null) {
+statusList.addAll(Arrays.asList(status));
+  }
+} else {
+  DotDrillType[] typeArray;
 
 Review comment:
   Instead of typeArray you may want  to use "types".


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Drill takes long time in planning when there are large number of files in  
> views/tables DFS parent directory
> 
>
> Key: DRILL-6640
> URL: https://issues.apache.org/jira/browse/DRILL-6640
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Reporter: Arjun
>Assignee: Arjun
>Priority: Major
> Fix For: 1.15.0
>
>
> When Drill is used for querying views/tables, the query planning time 
> increases as the number of files in the views'/tables' parent directory 
> grows. This becomes unacceptably long with complex queries.
> This is caused by the globStatus operation on view files, which uses a GLOB 
> pattern to retrieve view file status. It can be improved by avoiding the 
> GLOB pattern for Drill metadata files such as view files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6640) Drill takes long time in planning when there are large number of files in views/tables DFS parent directory

2018-08-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567605#comment-16567605
 ] 

ASF GitHub Bot commented on DRILL-6640:
---

HanumathRao commented on a change in pull request #1405: DRILL-6640: Modifying 
DotDrillUtil implementation to avoid using globStatus calls
URL: https://github.com/apache/drill/pull/1405#discussion_r207409122
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/dotdrill/DotDrillUtil.java
 ##
 @@ -48,16 +59,70 @@
 }
 return files;
   }
-
+  /**
+   * Return list of DotDrillFile objects whose file name ends with .drill and 
matches the provided Drill Dot files types
+   * in a given parent Path.
+   * Return an empty list if no files matches the given Dot Drill File Types.
+   * @param fs DrillFileSystem instance
+   * @param root parent Path
+   * @param types Dot Drill Types to be matched
+   * @return List of matched DotDrillFile objects
+   * @throws IOException
+   */
   public static List<DotDrillFile> getDotDrills(DrillFileSystem fs, Path root, 
DotDrillType... types) throws IOException{
-return getDrillFiles(fs, fs.globStatus(new Path(root, "*.drill")), types);
+return getDrillFiles(fs,  getDrillFileStatus(fs,root,"*.drill",types) , 
types);
   }
 
+  /**
+   * Return list of DotDrillFile objects whose file name matches the provided 
name pattern and Drill Dot files types
+   * in a given parent Path.
+   * Return an empty list if no files matches the given file name and Dot 
Drill File Types.
+   * @param fs DrillFileSystem instance
+   * @param root parent Path
+   * @param name name/pattern of the file
+   * @param types Dot Drill Types to be matched
+   * @return List of matched DotDrillFile objects
+   * @throws IOException
+   */
   public static List<DotDrillFile> getDotDrills(DrillFileSystem fs, Path root, 
String name, DotDrillType... types) throws IOException{
-if(!name.endsWith(".drill")) {
-  name = name + DotDrillType.DOT_DRILL_GLOB;
-}
+   return getDrillFiles(fs, getDrillFileStatus(fs,root,name,types), types);
+  }
 
-return getDrillFiles(fs, fs.globStatus(new Path(root, name)), types);
+  /**
+   * Return list of FileStatus objects whose file name matches the provided 
name pattern and Drill Dot file types
+   * in a given parent Path.
+   * Return an empty list if no files matches the pattern and Drill Dot file 
types.
+   * @param fs DrillFileSystem instance
+   * @param root parent Path
+   * @param name name/pattern of the file
+   * @param types Dot Drill Types to be matched
+   * @return List of FileStatuses for files matching name and  Drill Dot file 
types.
+   * @throws IOException  if any I/O error occurs when fetching file status
+   */
+  private static List<FileStatus> getDrillFileStatus(DrillFileSystem fs, Path 
root, String name, DotDrillType... types) throws IOException{
+    List<FileStatus> statusList = new ArrayList<>();
+
+if (name.endsWith(".drill")) {
+  FileStatus[] status = fs.globStatus(new Path(root, name));
 
 Review comment:
   Why do you want to change the existing logic for this case?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Drill takes long time in planning when there are large number of files in  
> views/tables DFS parent directory
> 
>
> Key: DRILL-6640
> URL: https://issues.apache.org/jira/browse/DRILL-6640
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Reporter: Arjun
>Assignee: Arjun
>Priority: Major
> Fix For: 1.15.0
>
>
> When Drill is used for querying views/tables, the query planning time 
> increases as the number of files in the views'/tables' parent directory 
> grows. This becomes unacceptably long with complex queries.
> This is caused by the globStatus operation on view files, which uses a GLOB 
> pattern to retrieve view file status. It can be improved by avoiding the 
> GLOB pattern for Drill metadata files such as view files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6654) Data verification failure with lateral unnest query having filter in and order by

2018-08-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567570#comment-16567570
 ] 

ASF GitHub Bot commented on DRILL-6654:
---

sohami opened a new pull request #1418: DRILL-6654: Data verification failure 
with lateral unnest query having filter in and order by
URL: https://github.com/apache/drill/pull/1418


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Data verification failure with lateral unnest query having filter in and 
> order by
> -
>
> Key: DRILL-6654
> URL: https://issues.apache.org/jira/browse/DRILL-6654
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Kedar Sankar Behera
>Assignee: Sorabh Hamirwasia
>Priority: Major
> Fix For: 1.15.0
>
> Attachments: Lateral Parquet.pdf, Lateral json.pdf, flatten.pdf
>
>
> Data verification failure with a lateral unnest query having a filter (IN) 
> and an order by.
> lateral query - 
> {code}
> select customer.c_custkey, customer.c_name, orders.totalprice from customer, 
> lateral (select sum(t.o.o_totalprice) as totalprice from 
> unnest(customer.c_orders) t(o) WHERE t.o.o_totalprice in 
> (89230.03,270087.44,246408.53,82657.72,153941.38,65277.06,180309.76)) orders 
> order by customer.c_custkey limit 50;
> {code}
> result :-
> {code}
> +-----------+---------------------+-------------+
> | c_custkey | c_name              | totalprice  |
> +-----------+---------------------+-------------+
> | 101276    | Customer#000101276  | 82657.72    |
> | 120295    | Customer#000120295  | 266119.96   |
> | 120376    | Customer#000120376  | 180309.76   |
> +-----------+---------------------+-------------+
> {code}
> flatten query -
> {code}
> select f.c_custkey, f.c_name, sum(f.o.o_totalprice) from (select c_custkey, 
> c_name, flatten(c_orders) as o from customer) f WHERE f.o.o_totalprice in 
> (89230.03,270087.44,246408.53,82657.72,153941.38,65277.06,180309.76) group by 
> f.c_custkey, f.c_name order by f.c_custkey limit 50;
> {code}
> result :-
> {code}
> +-----------+---------------------+------------+
> | c_custkey | c_name              | EXPR$2     |
> +-----------+---------------------+------------+
> | 101276    | Customer#000101276  | 82657.72   |
> | 120376    | Customer#000120376  | 180309.76  |
> +-----------+---------------------+------------+
> {code}
> PS: The above results are for Parquet data. The same query for JSON data 
> gives an identical result, as follows:
> {code}
> +-----------+---------------------+------------+
> | c_custkey | c_name              | EXPR$2     |
> +-----------+---------------------+------------+
> | 101276    | Customer#000101276  | 82657.72   |
> | 120376    | Customer#000120376  | 180309.76  |
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6101) Optimize Implicit Columns Processing

2018-08-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567542#comment-16567542
 ] 

ASF GitHub Bot commented on DRILL-6101:
---

sachouche commented on issue #1414: DRILL-6101: Optimized implicit columns 
handling within scanner
URL: https://github.com/apache/drill/pull/1414#issuecomment-410086886
 
 
   @paul-rogers 
   I agree the planner should have been responsible for optimally exposing the 
implicit & partition columns since it has all the necessary metadata. This 
feature was not implemented that way. The execution layer includes logic to 
optimize implicit column handling when the query is not a STAR_QUERY; my fix 
mainly passes the missing metadata in the STAR_QUERY case and thus triggers 
the existing optimization. Our testing showed up to a 30% improvement within 
the Parquet scanner for SELECT_STAR queries and a big reduction in batch 
memory utilization.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Optimize Implicit Columns Processing
> 
>
> Key: DRILL-6101
> URL: https://issues.apache.org/jira/browse/DRILL-6101
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.12.0
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Critical
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
>
> Problem Description -
>  * Apache Drill allows users to specify columns even for SELECT STAR queries
>  * From my discussion with [~paul-rogers], Apache Calcite has a limitation 
> where the extra columns are not provided
>  * The workaround has been to always include all implicit columns for SELECT 
> STAR queries
>  * Unfortunately, the current implementation is very inefficient as implicit 
> column values get duplicated; this leads to substantial performance 
> degradation when the number of rows is large
> Suggested Optimization -
>  * The NullableVarChar vector should be enhanced to efficiently store 
> duplicate values
>  * This will not only address the current Calcite limitations (for SELECT 
> STAR queries) but also optimize all queries with implicit columns
>  
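> As a rough illustration of the suggested optimization (a hypothetical 
> sketch only; the class and methods below are illustrative, not Drill's 
> actual ValueVector API): duplicated implicit-column values can be stored 
> once per run, with per-row lookups resolved by binary search over run ends.
> {code}
> // Hypothetical sketch of run-length storage for duplicated implicit-column
> // values; names are illustrative, not Drill's vector API.
> import java.util.ArrayList;
> import java.util.List;
>
> public class RleVarcharSketch {
>   private final List<String> values = new ArrayList<>();   // one entry per run
>   private final List<Integer> runEnds = new ArrayList<>(); // exclusive end row
>
>   // Append 'count' copies of 'value' while storing the bytes only once.
>   public void appendRun(String value, int count) {
>     int priorEnd = runEnds.isEmpty() ? 0 : runEnds.get(runEnds.size() - 1);
>     values.add(value);
>     runEnds.add(priorEnd + count);
>   }
>
>   // Resolve a row's value via binary search over the run ends.
>   public String get(int rowIndex) {
>     int lo = 0;
>     int hi = runEnds.size() - 1;
>     while (lo < hi) {
>       int mid = (lo + hi) >>> 1;
>       if (runEnds.get(mid) <= rowIndex) {
>         lo = mid + 1;
>       } else {
>         hi = mid;
>       }
>     }
>     return values.get(lo);
>   }
>
>   public static void main(String[] args) {
>     RleVarcharSketch filename = new RleVarcharSketch();
>     filename.appendRun("/data/part-0.parquet", 100000); // rows 0..99999
>     filename.appendRun("/data/part-1.parquet", 50000);  // rows 100000..149999
>     System.out.println(filename.get(123456));           // /data/part-1.parquet
>   }
> }
> {code}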



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6657) Unnest reports one batch less than the actual number of batches

2018-08-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16567541#comment-16567541
 ] 

ASF GitHub Bot commented on DRILL-6657:
---

parthchandra closed pull request #1413: DRILL-6657: Unnest reports one batch 
less than the actual number of b…
URL: https://github.com/apache/drill/pull/1413
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unnest/UnnestRecordBatch.java
 
b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unnest/UnnestRecordBatch.java
index e89144db59d..6204d37cfb0 100644
--- 
a/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unnest/UnnestRecordBatch.java
+++ 
b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unnest/UnnestRecordBatch.java
@@ -55,9 +55,6 @@
   private IntVector rowIdVector; // vector to keep the implicit rowId column in
 
   private Unnest unnest = new UnnestImpl();
-  private boolean hasNewSchema = false; // set to true if a new schema was 
encountered and an empty batch was
-// sent. The next iteration, we need 
to make sure the record batch sizer
-// is updated before we process the 
actual data.
   private boolean hasRemainder = false; // set to true if there is data left 
over for the current row AND if we want
 // to keep processing it. Kill may be 
called by a limit in a subquery that
 // requires us to stop processing 
thecurrent row, but not stop processing
@@ -164,7 +161,7 @@ protected void killIncoming(boolean sendUpstream) {
 Preconditions.checkState(context.getExecutorState().isFailed() ||
   lateral.getLeftOutcome() == IterOutcome.STOP, "Kill received by unnest 
with unexpected state. " +
   "Neither the LateralOutcome is STOP nor executor state is failed");
-logger.debug("Kill received. Stopping all processing");
+  logger.debug("Kill received. Stopping all processing");
 state = BatchState.DONE;
 recordCount = 0;
 hasRemainder = false; // whatever the case, we need to stop processing the 
current row.
@@ -180,12 +177,6 @@ public IterOutcome innerNext() {
   return IterOutcome.NONE;
 }
 
-if (hasNewSchema) {
-  memoryManager.update();
-  hasNewSchema = false;
-  return doWork();
-}
-
 if (hasRemainder) {
   return doWork();
 }
@@ -200,12 +191,13 @@ public IterOutcome innerNext() {
   state = BatchState.NOT_FIRST;
   try {
 stats.startSetup();
-hasNewSchema = true; // next call to next will handle the actual data.
 logger.debug("First batch received");
 schemaChanged(); // checks if schema has changed (redundant in this 
case becaause it has) AND saves the
  // current field metadata for check in subsequent 
iterations
 setupNewSchema();
 stats.batchReceived(0, incoming.getRecordCount(), true);
+memoryManager.update();
+hasRemainder = incoming.getRecordCount() > 0;
   } catch (SchemaChangeException ex) {
 kill(false);
 logger.error("Failure during query", ex);
@@ -216,14 +208,18 @@ public IterOutcome innerNext() {
   }
   return IterOutcome.OK_NEW_SCHEMA;
 } else {
+  Preconditions.checkState(incoming.getRecordCount() > 0,
+"Incoming batch post buildSchema phase should never be empty for 
Unnest");
   container.zeroVectors();
   // Check if schema has changed
   if (lateral.getRecordIndex() == 0) {
-hasNewSchema = schemaChanged();
+boolean hasNewSchema = schemaChanged();
 stats.batchReceived(0, incoming.getRecordCount(), hasNewSchema);
 if (hasNewSchema) {
   try {
 setupNewSchema();
+hasRemainder = true;
+memoryManager.update();
   } catch (SchemaChangeException ex) {
 kill(false);
 logger.error("Failure during query", ex);


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Unnest reports one batch less than the actual number of batches
> ---
>
> Key: DRILL-6657
> URL: https://issues.apache.org/jira/browse/DRILL-6657
> Project: Apache Drill
>  

[jira] [Commented] (DRILL-6640) Drill takes long time in planning when there are large number of files in views/tables DFS parent directory

2018-08-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16567530#comment-16567530
 ] 

ASF GitHub Bot commented on DRILL-6640:
---

priteshm commented on issue #1405: DRILL-6640: Modifying DotDrillUtil 
implementation to avoid using globStatus calls
URL: https://github.com/apache/drill/pull/1405#issuecomment-410083376
 
 
   @HanumathRao can you complete the review today?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Drill takes long time in planning when there are large number of files in  
> views/tables DFS parent directory
> 
>
> Key: DRILL-6640
> URL: https://issues.apache.org/jira/browse/DRILL-6640
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning  Optimization
>Reporter: Arjun
>Assignee: Arjun
>Priority: Major
> Fix For: 1.15.0
>
>
> When Drill is used for querying views/tables, query planning time increases 
> with the number of files in the view's/table's parent directory. This 
> becomes unacceptably long with complex queries.
> This is caused by the globStatus operation, which uses a GLOB pattern to 
> retrieve view file status. It can be improved by avoiding GLOB patterns for 
> Drill metadata files such as view files.
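> A minimal sketch of the idea, assuming a Hadoop FileSystem and a known 
> view-file naming scheme (the helper below is illustrative, not the actual 
> DotDrillUtil change):
> {code}
> // Sketch only: look up a known Drill view file directly instead of
> // expanding a glob over the (possibly huge) parent directory.
> import java.io.FileNotFoundException;
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileStatus;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public class ViewLookupSketch {
>
>   // Slow on large directories: the glob has to list and pattern-match
>   // every child of 'dir'.
>   static FileStatus[] viaGlob(FileSystem fs, Path dir) throws IOException {
>     return fs.globStatus(new Path(dir, "*.view.drill"));
>   }
>
>   // Direct metadata lookup: when the file name is fully known, ask for
>   // exactly that one file.
>   static FileStatus viaDirectLookup(FileSystem fs, Path dir, String view)
>       throws IOException {
>     try {
>       return fs.getFileStatus(new Path(dir, view + ".view.drill"));
>     } catch (FileNotFoundException e) {
>       return null; // view does not exist
>     }
>   }
>
>   public static void main(String[] args) throws IOException {
>     FileSystem fs = FileSystem.get(new Configuration());
>     FileStatus s = viaDirectLookup(fs, new Path("/user/drill"), "my_view");
>     System.out.println(s == null ? "no such view" : s.getPath());
>   }
> }
> {code}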



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6654) Data verification failure with lateral unnest query having filter in and order by

2018-08-02 Thread Sorabh Hamirwasia (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-6654:
-
Reviewer: Parth Chandra

> Data verification failure with lateral unnest query having filter in and 
> order by
> -
>
> Key: DRILL-6654
> URL: https://issues.apache.org/jira/browse/DRILL-6654
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Kedar Sankar Behera
>Assignee: Sorabh Hamirwasia
>Priority: Major
> Fix For: 1.15.0
>
> Attachments: Lateral Parquet.pdf, Lateral json.pdf, flatten.pdf
>
>
> Data verification failure with a lateral unnest query having an IN filter 
> and ORDER BY.
> lateral query - 
> {code}
> select customer.c_custkey, customer.c_name, orders.totalprice from customer, 
> lateral (select sum(t.o.o_totalprice) as totalprice from 
> unnest(customer.c_orders) t(o) WHERE t.o.o_totalprice in 
> (89230.03,270087.44,246408.53,82657.72,153941.38,65277.06,180309.76)) orders 
> order by customer.c_custkey limit 50;
> {code}
> result :-
> {code}
> +-----------+---------------------+-------------+
> | c_custkey | c_name              | totalprice  |
> +-----------+---------------------+-------------+
> | 101276    | Customer#000101276  | 82657.72    |
> | 120295    | Customer#000120295  | 266119.96   |
> | 120376    | Customer#000120376  | 180309.76   |
> +-----------+---------------------+-------------+
> {code}
> flatten query -
> {code}
> select f.c_custkey, f.c_name, sum(f.o.o_totalprice) from (select c_custkey, 
> c_name, flatten(c_orders) as o from customer) f WHERE f.o.o_totalprice in 
> (89230.03,270087.44,246408.53,82657.72,153941.38,65277.06,180309.76) group by 
> f.c_custkey, f.c_name order by f.c_custkey limit 50;
> {code}
> result :-
> {code}
> +-----------+---------------------+------------+
> | c_custkey | c_name              | EXPR$2     |
> +-----------+---------------------+------------+
> | 101276    | Customer#000101276  | 82657.72   |
> | 120376    | Customer#000120376  | 180309.76  |
> +-----------+---------------------+------------+
> {code}
> PS: The above results are for Parquet data. The same lateral query on JSON 
> data gives the following result, matching the flatten output:
> {code}
> +-----------+---------------------+------------+
> | c_custkey | c_name              | EXPR$2     |
> +-----------+---------------------+------------+
> | 101276    | Customer#000101276  | 82657.72   |
> | 120376    | Customer#000120376  | 180309.76  |
> +-----------+---------------------+------------+
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6652) PartitionLimit changes for Lateral and Unnest

2018-08-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16567450#comment-16567450
 ] 

ASF GitHub Bot commented on DRILL-6652:
---

sohami closed pull request #1407: DRILL-6652: PartitionLimit changes for 
Lateral and Unnest
URL: https://github.com/apache/drill/pull/1407
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/contrib/native/client/src/protobuf/UserBitShared.pb.cc 
b/contrib/native/client/src/protobuf/UserBitShared.pb.cc
index 282f581c5b6..739804844bf 100644
--- a/contrib/native/client/src/protobuf/UserBitShared.pb.cc
+++ b/contrib/native/client/src/protobuf/UserBitShared.pb.cc
@@ -750,7 +750,7 @@ void protobuf_AddDesc_UserBitShared_2eproto() {
 "TATEMENT\020\005*\207\001\n\rFragmentState\022\013\n\007SENDING\020"
 
"\000\022\027\n\023AWAITING_ALLOCATION\020\001\022\013\n\007RUNNING\020\002\022"
 
"\014\n\010FINISHED\020\003\022\r\n\tCANCELLED\020\004\022\n\n\006FAILED\020\005"
-"\022\032\n\026CANCELLATION_REQUESTED\020\006*\271\010\n\020CoreOpe"
+"\022\032\n\026CANCELLATION_REQUESTED\020\006*\316\010\n\020CoreOpe"
 "ratorType\022\021\n\rSINGLE_SENDER\020\000\022\024\n\020BROADCAS"
 "T_SENDER\020\001\022\n\n\006FILTER\020\002\022\022\n\016HASH_AGGREGATE"
 
"\020\003\022\r\n\tHASH_JOIN\020\004\022\016\n\nMERGE_JOIN\020\005\022\031\n\025HAS"
@@ -777,11 +777,12 @@ void protobuf_AddDesc_UserBitShared_2eproto() {
 "_SCAN\020.\022\022\n\016MONGO_SUB_SCAN\020/\022\017\n\013KUDU_WRIT"
 "ER\0200\022\026\n\022OPEN_TSDB_SUB_SCAN\0201\022\017\n\013JSON_WRI"
 "TER\0202\022\026\n\022HTPPD_LOG_SUB_SCAN\0203\022\022\n\016IMAGE_S"
-"UB_SCAN\0204\022\025\n\021SEQUENCE_SUB_SCAN\0205*g\n\nSasl"
-"Status\022\020\n\014SASL_UNKNOWN\020\000\022\016\n\nSASL_START\020\001"
-"\022\024\n\020SASL_IN_PROGRESS\020\002\022\020\n\014SASL_SUCCESS\020\003"
-"\022\017\n\013SASL_FAILED\020\004B.\n\033org.apache.drill.ex"
-"ec.protoB\rUserBitSharedH\001", 5385);
+"UB_SCAN\0204\022\025\n\021SEQUENCE_SUB_SCAN\0205\022\023\n\017PART"
+"ITION_LIMIT\0206*g\n\nSaslStatus\022\020\n\014SASL_UNKN"
+"OWN\020\000\022\016\n\nSASL_START\020\001\022\024\n\020SASL_IN_PROGRES"
+
"S\020\002\022\020\n\014SASL_SUCCESS\020\003\022\017\n\013SASL_FAILED\020\004B."
+"\n\033org.apache.drill.exec.protoB\rUserBitSh"
+"aredH\001", 5406);
   ::google::protobuf::MessageFactory::InternalRegisterGeneratedFile(
 "UserBitShared.proto", _RegisterTypes);
   UserCredentials::default_instance_ = new UserCredentials();
@@ -956,6 +957,7 @@ bool CoreOperatorType_IsValid(int value) {
 case 51:
 case 52:
 case 53:
+case 54:
   return true;
 default:
   return false;
diff --git a/contrib/native/client/src/protobuf/UserBitShared.pb.h 
b/contrib/native/client/src/protobuf/UserBitShared.pb.h
index 134dc2b500c..4599abb23aa 100644
--- a/contrib/native/client/src/protobuf/UserBitShared.pb.h
+++ b/contrib/native/client/src/protobuf/UserBitShared.pb.h
@@ -257,11 +257,12 @@ enum CoreOperatorType {
   JSON_WRITER = 50,
   HTPPD_LOG_SUB_SCAN = 51,
   IMAGE_SUB_SCAN = 52,
-  SEQUENCE_SUB_SCAN = 53
+  SEQUENCE_SUB_SCAN = 53,
+  PARTITION_LIMIT = 54
 };
 bool CoreOperatorType_IsValid(int value);
 const CoreOperatorType CoreOperatorType_MIN = SINGLE_SENDER;
-const CoreOperatorType CoreOperatorType_MAX = SEQUENCE_SUB_SCAN;
+const CoreOperatorType CoreOperatorType_MAX = PARTITION_LIMIT;
 const int CoreOperatorType_ARRAYSIZE = CoreOperatorType_MAX + 1;
 
 const ::google::protobuf::EnumDescriptor* CoreOperatorType_descriptor();
diff --git 
a/exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/PartitionLimit.java
 
b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/PartitionLimit.java
new file mode 100644
index 000..29f8bb2fe3f
--- /dev/null
+++ 
b/exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/PartitionLimit.java
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language 

[jira] [Updated] (DRILL-6636) Planner side changes to use PartitionLimitBatch in place of LimitBatch

2018-08-02 Thread Sorabh Hamirwasia (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-6636:
-
Labels: ready-to-commit  (was: )

> Planner side changes to use PartitionLimitBatch in place of LimitBatch
> --
>
> Key: DRILL-6636
> URL: https://issues.apache.org/jira/browse/DRILL-6636
> Project: Apache Drill
>  Issue Type: Task
>  Components: Query Planning  Optimization
>Affects Versions: 1.14.0
>Reporter: Sorabh Hamirwasia
>Assignee: Hanumath Rao Maduri
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6652) PartitionLimit changes for Lateral and Unnest

2018-08-02 Thread Sorabh Hamirwasia (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-6652:
-
Labels: ready-to-commit  (was: )

> PartitionLimit changes for Lateral and Unnest
> -
>
> Key: DRILL-6652
> URL: https://issues.apache.org/jira/browse/DRILL-6652
> Project: Apache Drill
>  Issue Type: Task
>  Components: Execution - Relational Operators, Query Planning  
> Optimization
>Affects Versions: 1.14.0
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>Priority: Major
>  Labels: ready-to-commit
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6635) PartitionLimit for Lateral/Unnest

2018-08-02 Thread Sorabh Hamirwasia (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-6635:
-
Labels: ready-to-commit  (was: )

> PartitionLimit for Lateral/Unnest
> -
>
> Key: DRILL-6635
> URL: https://issues.apache.org/jira/browse/DRILL-6635
> Project: Apache Drill
>  Issue Type: Task
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
>
> With the batch processing changes in Lateral/Unnest, a limit/TopN clause 
> within the Lateral-Unnest subquery will not work as expected, since it would 
> impose the limit/TopN across rowIds. We need a new mechanism to apply these 
> operators at the rowId boundary (see the sketch below).
> For now we plan to add support only for limit, and hence need a new 
> operator, PartitionLimit, which receives the partitionColumn on which the 
> limit should be imposed. This will currently only support queries between 
> lateral and unnest. 
> For TopN we can still achieve the same using a combination of Sort and 
> PartitionLimit; later we can figure out how to address it directly within 
> TopN, or whether that is needed at all, since the number of rows across an 
> EMIT boundary on which Sort operates should be small enough to be handled 
> mostly in memory.
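> A toy sketch of the PartitionLimit semantics (illustrative only; the Row 
> class and the grouped-by-rowId input are assumptions, not the operator's 
> actual record-batch interface):
> {code}
> // Toy model: emit at most 'limit' rows per rowId partition, assuming the
> // input arrives grouped by rowId (as rows from unnesting one outer row do).
> import java.util.ArrayList;
> import java.util.List;
>
> public class PartitionLimitSketch {
>
>   static class Row {
>     final int rowId;      // implicit partition column from Lateral/Unnest
>     final String payload;
>     Row(int rowId, String payload) {
>       this.rowId = rowId;
>       this.payload = payload;
>     }
>   }
>
>   static List<Row> partitionLimit(List<Row> input, int limit) {
>     List<Row> output = new ArrayList<>();
>     int currentRowId = Integer.MIN_VALUE;
>     int emitted = 0;
>     for (Row row : input) {
>       if (row.rowId != currentRowId) { // partition boundary: reset count
>         currentRowId = row.rowId;
>         emitted = 0;
>       }
>       if (emitted < limit) {           // a plain Limit would count globally
>         output.add(row);
>         emitted++;
>       }
>     }
>     return output;
>   }
> }
> {code}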



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6101) Optimize Implicit Columns Processing

2018-08-02 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6101:
--
Fix Version/s: 1.15.0

> Optimize Implicit Columns Processing
> 
>
> Key: DRILL-6101
> URL: https://issues.apache.org/jira/browse/DRILL-6101
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.12.0
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Critical
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
>
> Problem Description -
>  * Apache Drill allows users to specify columns even for SELECT STAR queries
>  * From my discussion with [~paul-rogers], Apache Calcite has a limitation 
> where the extra columns are not provided
>  * The workaround has been to always include all implicit columns for SELECT 
> STAR queries
>  * Unfortunately, the current implementation is very inefficient as implicit 
> column values get duplicated; this leads to substantial performance 
> degradation when the number of rows is large
> Suggested Optimization -
>  * The NullableVarChar vector should be enhanced to efficiently store 
> duplicate values
>  * This will not only address the current Calcite limitations (for SELECT 
> STAR queries) but also optimize all queries with implicit columns
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6101) Optimize Implicit Columns Processing

2018-08-02 Thread salim achouche (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche updated DRILL-6101:
--
Labels: ready-to-commit  (was: pull-request-available)

> Optimize Implicit Columns Processing
> 
>
> Key: DRILL-6101
> URL: https://issues.apache.org/jira/browse/DRILL-6101
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Affects Versions: 1.12.0
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Critical
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
>
> Problem Description -
>  * Apache Drill allows users to specify columns even for SELECT STAR queries
>  * From my discussion with [~paul-rogers], Apache Calcite has a limitation 
> where the extra columns are not provided
>  * The workaround has been to always include all implicit columns for SELECT 
> STAR queries
>  * Unfortunately, the current implementation is very inefficient as implicit 
> column values get duplicated; this leads to substantial performance 
> degradation when the number of rows is large
> Suggested Optimization -
>  * The NullableVarChar vector should be enhanced to efficiently store 
> duplicate values
>  * This will not only address the current Calcite limitations (for SELECT 
> STAR queries) but also optimize all queries with implicit columns
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6645) Transform TopN in Lateral Unnest pipeline to Sort and Limit.

2018-08-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16567106#comment-16567106
 ] 

ASF GitHub Bot commented on DRILL-6645:
---

HanumathRao opened a new pull request #1417: DRILL-6645: Transform TopN in 
Lateral Unnest pipeline to Sort and Limit.
URL: https://github.com/apache/drill/pull/1417
 
 
   
   In the Lateral/Unnest pipeline, getting correct results requires 
introducing a PartitionLimit instead of a Limit; PartitionLimit is introduced 
by the PR for DRILL-6652. Similarly, TopN should have another version, 
PartitionTopN. Since the PartitionTopN operator is not yet implemented, we can 
use Sort and Limit to replace a TopN. This PR includes the changes to 
transform TopN -> Sort and Limit.
   
   @gparai @sohami Can you please review these changes?
   
   This PR needs to be committed after DRILL-6652 is committed.
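   
   For intuition, the rewrite relies on the semantic equivalence TopN(k) = 
Sort followed by Limit(k). A small self-contained illustration (plain Java, 
not the Calcite planner rule itself):

// Not the planner rule itself: a plain-Java check that TopN(k) and
// Sort-then-Limit(k) produce the same rows.
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopNEquivalenceSketch {

  // TopN via a bounded min-heap that keeps the k largest values.
  static List<Integer> topN(List<Integer> input, int k) {
    PriorityQueue<Integer> heap = new PriorityQueue<>();
    for (int v : input) {
      heap.offer(v);
      if (heap.size() > k) {
        heap.poll(); // drop the smallest of the k+1 candidates
      }
    }
    List<Integer> result = new ArrayList<>(heap);
    result.sort(Comparator.reverseOrder());
    return result;
  }

  // The rewritten plan: full Sort followed by Limit(k).
  static List<Integer> sortThenLimit(List<Integer> input, int k) {
    List<Integer> sorted = new ArrayList<>(input);
    sorted.sort(Comparator.reverseOrder());
    return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
  }

  public static void main(String[] args) {
    List<Integer> data = List.of(5, 1, 9, 3, 7);
    System.out.println(topN(data, 2));          // [9, 7]
    System.out.println(sortThenLimit(data, 2)); // [9, 7]
  }
}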


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Transform TopN in Lateral Unnest pipeline to Sort and Limit.
> 
>
> Key: DRILL-6645
> URL: https://issues.apache.org/jira/browse/DRILL-6645
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning  Optimization
>Affects Versions: 1.14.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Major
> Fix For: 1.15.0
>
>
> TopN operator is not supported in Lateral Unnest pipeline. Hence transform 
> the TopN to use Sort and Limit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6662) Access AWS access key ID and secret access key using Credential Provider API for S3 storage plugin

2018-08-02 Thread Bohdan Kazydub (JIRA)
Bohdan Kazydub created DRILL-6662:
-

 Summary: Access AWS access key ID and secret access key using 
Credential Provider API for S3 storage plugin
 Key: DRILL-6662
 URL: https://issues.apache.org/jira/browse/DRILL-6662
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Bohdan Kazydub
Assignee: Bohdan Kazydub


Hadoop provides the [CredentialProvider 
API|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html],
 which allows passwords and other sensitive secrets to be stored in an 
external provider rather than in plaintext in configuration files.

Currently the S3 storage plugin accesses passwords, namely 'fs.s3a.access.key' 
and 'fs.s3a.secret.key', stored in clear text in the Configuration via the 
get() method. To give users the ability to remove clear-text passwords for S3 
from configuration files, the Configuration.getPassword() method should be 
used instead: users configure the 'hadoop.security.credential.provider.path' 
property, which points to a file containing encrypted passwords, rather than 
the two aforementioned properties.

With this approach, credential providers are checked first; if the secret is 
not found there, or no providers are configured, there is a fallback to 
secrets configured in clear text (unless 
'hadoop.security.credential.clear-text-fallback' is set to "false"), making 
the new change backwards-compatible.
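
A minimal sketch of the proposed lookup, assuming only the standard Hadoop 
Configuration API (the helper and class names below are illustrative, not the 
actual plugin code):
{code}
// Sketch: prefer the Credential Provider API, relying on Hadoop's built-in
// clear-text fallback. Class and method names here are illustrative.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;

public class S3CredentialLookupSketch {

  static String getSecret(Configuration conf, String key) throws IOException {
    // getPassword() consults 'hadoop.security.credential.provider.path'
    // first, then falls back to the clear-text value in the Configuration
    // (unless 'hadoop.security.credential.clear-text-fallback' is false).
    char[] secret = conf.getPassword(key);
    return secret == null ? null : String.valueOf(secret);
  }

  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    String accessKey = getSecret(conf, "fs.s3a.access.key");
    String secretKey = getSecret(conf, "fs.s3a.secret.key");
    System.out.println(accessKey != null && secretKey != null
        ? "credentials resolved" : "credentials missing");
  }
}
{code}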



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6657) Unnest reports one batch less than the actual number of batches

2018-08-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16567064#comment-16567064
 ] 

ASF GitHub Bot commented on DRILL-6657:
---

parthchandra commented on issue #1413: DRILL-6657: Unnest reports one batch 
less than the actual number of b…
URL: https://github.com/apache/drill/pull/1413#issuecomment-409995213
 
 
   Addressed review comments


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Unnest reports one batch less than the actual number of batches
> ---
>
> Key: DRILL-6657
> URL: https://issues.apache.org/jira/browse/DRILL-6657
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Parth Chandra
>Assignee: Parth Chandra
>Priority: Major
> Fix For: 1.15.0
>
>
> Unnest doesn't count the first batch that comes in. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6657) Unnest reports one batch less than the actual number of batches

2018-08-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16567029#comment-16567029
 ] 

ASF GitHub Bot commented on DRILL-6657:
---

sohami commented on a change in pull request #1413: DRILL-6657: Unnest reports 
one batch less than the actual number of b…
URL: https://github.com/apache/drill/pull/1413#discussion_r207287174
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unnest/UnnestRecordBatch.java
 ##
 @@ -231,14 +223,18 @@ public IterOutcome innerNext() {
   }
   return IterOutcome.OK_NEW_SCHEMA;
 } else {
+  Preconditions.checkState(incoming.getRecordCount() > 0,
+"Incoming batch post buildSchema phase should never be empty for 
Unnest");
   container.zeroVectors();
   // Check if schema has changed
   if (lateral.getRecordIndex() == 0) {
-hasNewSchema = schemaChanged();
+boolean hasNewSchema = schemaChanged();
 stats.batchReceived(0, incoming.getRecordCount(), hasNewSchema);
 if (hasNewSchema) {
   try {
 setupNewSchema();
+hasRemainder = incoming.getRecordCount() > 0;
 
 Review comment:
   we can just set `hasRemainder=true` here since, based on the Preconditions 
check, the right side will always be true.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Unnest reports one batch less than the actual number of batches
> ---
>
> Key: DRILL-6657
> URL: https://issues.apache.org/jira/browse/DRILL-6657
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Parth Chandra
>Assignee: Parth Chandra
>Priority: Major
> Fix For: 1.15.0
>
>
> Unnest doesn't count the first batch that comes in. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6657) Unnest reports one batch less than the actual number of batches

2018-08-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16567028#comment-16567028
 ] 

ASF GitHub Bot commented on DRILL-6657:
---

sohami commented on a change in pull request #1413: DRILL-6657: Unnest reports 
one batch less than the actual number of b…
URL: https://github.com/apache/drill/pull/1413#discussion_r207286232
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unnest/UnnestRecordBatch.java
 ##
 @@ -166,7 +163,7 @@ protected void killIncoming(boolean sendUpstream) {
 Preconditions.checkNotNull(lateral);
 // Do not call kill on incoming. Lateral Join has the responsibility for 
killing incoming
 if (context.getExecutorState().isFailed() || lateral.getLeftOutcome() == 
IterOutcome.STOP) {
-  logger.debug("Kill received. Stopping all processing");
+logger.debug("Kill received. Stopping all processing");
 
 Review comment:
   please revert this indentation.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Unnest reports one batch less than the actual number of batches
> ---
>
> Key: DRILL-6657
> URL: https://issues.apache.org/jira/browse/DRILL-6657
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Parth Chandra
>Assignee: Parth Chandra
>Priority: Major
> Fix For: 1.15.0
>
>
> Unnest doesn't count the first batch that comes in. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6661) Need a configurable parameter to stop Long running queries

2018-08-02 Thread Aditya Allamraju (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Allamraju updated DRILL-6661:

Summary: Need a configurable parameter to stop Long running queries  (was: 
Resource management: Need a configurable parameter to stop Long running queries)

> Need a configurable parameter to stop Long running queries
> --
>
> Key: DRILL-6661
> URL: https://issues.apache.org/jira/browse/DRILL-6661
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Monitoring
>Affects Versions: 1.13.0
>Reporter: Aditya Allamraju
>Priority: Major
>
> I am looking for a way to stop any long-running queries that run beyond a 
> certain time.
> This is not to be confused with the queue timeout, which does not trigger 
> once the query has started executing. Other database vendors handle this via 
> resource management or a simple timeout.
> Currently, the default behavior is to allow any executing query to continue 
> for an arbitrarily long time until completion.
> There is a genuine need for this. For instance, I want to stop queries 
> running beyond 15 minutes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6661) Resource management: Need a configurable parameter to stop Long running queries

2018-08-02 Thread Aditya Allamraju (JIRA)
Aditya Allamraju created DRILL-6661:
---

 Summary: Resource management: Need a configurable parameter to 
stop Long running queries
 Key: DRILL-6661
 URL: https://issues.apache.org/jira/browse/DRILL-6661
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Monitoring
Affects Versions: 1.13.0
Reporter: Aditya Allamraju


I am looking for a way to stop any long-running queries that run beyond a 
certain time.

This is not to be confused with the queue timeout, which does not trigger 
once the query has started executing. Other database vendors handle this via 
resource management or a simple timeout.

Currently, the default behavior is to allow any executing query to continue 
for an arbitrarily long time until completion.

There is a genuine need for this. For instance, I want to stop queries 
running beyond 15 minutes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)