[jira] [Commented] (DRILL-4795) Nested aggregate windowed query fails - IllegalStateException

2016-07-25 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15391449#comment-15391449
 ] 

Khurram Faraaz commented on DRILL-4795:
---

Results from Postgres 9.3 for the same data:

{noformat}
postgres=# select avg(sum(c1)) over() from t222;
     avg
--------------
 4296044133.
(1 row)
{noformat}

Drill returns an exception for the same query. Below is another case where Drill 
returns an error whereas Postgres returns results; note that the window 
definition is empty.

{noformat}
0: jdbc:drill:schema=dfs.tmp> select AVG(SUM(c1)) OVER() FROM 
`tblWnulls.parquet`;

Error: SYSTEM ERROR: IllegalStateException: This generator does not support 
mappings beyond

Fragment 0:0

[Error Id: 660751c5-bcb1-41b8-a1eb-a2e4a7d3e036 on centos-01.qa.lab:31010] 
(state=,code=0)
{noformat}

> Nested aggregate windowed query fails - IllegalStateException 
> --
>
> Key: DRILL-4795
> URL: https://issues.apache.org/jira/browse/DRILL-4795
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.8.0
> Environment: 4 node CentOS cluster
>Reporter: Khurram Faraaz
>Assignee: Gautam Kumar Parai
>Priority: Critical
> Attachments: tblWnulls.parquet
>
>
> The below two window function queries fail on MapR Drill 1.8.0 commit ID 
> 34ca63ba
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select avg(sum(c1)) over (partition by c1) from 
> `tblWnulls.parquet`;
> Error: SYSTEM ERROR: IllegalStateException: This generator does not support 
> mappings beyond
> Fragment 0:0
> [Error Id: b32ed6b0-6b81-4d5f-bce0-e4ea269c5af1 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select avg(sum(c1)) over (partition by c2) from 
> `tblWnulls.parquet`;
> Error: SYSTEM ERROR: IllegalStateException: This generator does not support 
> mappings beyond
> Fragment 0:0
> [Error Id: ef9056c7-3989-427e-b180-b48741bfc6a4 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> From drillbit.log
> {noformat}
> 2016-07-21 11:19:27,778 [286f503f-9b20-87e3-d7ec-2d3881f29e4a:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 286f503f-9b20-87e3-d7ec-2d3881f29e4a: select avg(sum(c1)) over (partition by 
> c2) from `tblWnulls.parquet`
> ...
> 2016-07-21 11:19:27,979 [286f503f-9b20-87e3-d7ec-2d3881f29e4a:frag:0:0] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: 
> This generator does not support mappings beyond
> Fragment 0:0
> [Error Id: ef9056c7-3989-427e-b180-b48741bfc6a4 on centos-01.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: This generator does not support mappings beyond
> Fragment 0:0
> [Error Id: ef9056c7-3989-427e-b180-b48741bfc6a4 on centos-01.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
>  ~[drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:318)
>  [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:185)
>  [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:287)
>  [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_101]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_101]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_101]
> Caused by: java.lang.IllegalStateException: This generator does not support 
> mappings beyond
> at 
> org.apache.drill.exec.compile.sig.MappingSet.enterChild(MappingSet.java:102) 
> ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitFunctionHolderExpression(EvaluationVisitor.java:188)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.expr.EvaluationVisitor$ConstantFilter.visitFunctionHolderExpression(EvaluationVisitor.java:1077)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitFunctionHolderExpression(EvaluationVisitor.java:815)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> at 
> or

[jira] [Commented] (DRILL-4802) NULLS are not first when NULLS FIRST is used with ORDER BY in window definition

2016-07-25 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15391456#comment-15391456
 ] 

Khurram Faraaz commented on DRILL-4802:
---

We have several tests in our functional suite 
(resources/Functional/window_functions) that cover NULLS FIRST; they have been 
there for some time now and run clean. So this issue seems to be related to 
nested aggregates.
Here are some existing tests for NULLS FIRST with regular window functions.

{noformat}
aggregates/winFnQry_62.q:select c1, c2, max ( c1 ) over ( partition by c2 order 
by c1 nulls first ) w_max from `tblWnulls.parquet`;
aggregates/winFnQry_63.q:select c1, c2, sum ( c1 ) over ( partition by c2 order 
by c1 desc nulls first ) w_sum from `tblWnulls.parquet`;
aggregates/winFnQry_82.q:select c1, c2, w_avg from ( select c1, c2, avg ( c1 ) 
over ( partition by c2 order by c1 asc nulls first ) w_avg from 
`tblWnulls.parquet` ) sub_query where w_avg is not null;

{noformat}

> NULLS are not first when NULLS FIRST is used with ORDER BY in window 
> definition
> ---
>
> Key: DRILL-4802
> URL: https://issues.apache.org/jira/browse/DRILL-4802
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.8.0
> Environment: 4 node CentOS cluster
>Reporter: Khurram Faraaz
>
> NULLS FIRST is not honored when used with ORDER BY inside a window definition. 
> This is a wrong-results issue.
> MapR Drill 1.8.0 commit ID : 34ca63ba
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select c2, AVG(SUM(c1)) OVER(partition by c2 
> order by c2 nulls first) FROM `tblWnulls.parquet` group by c2;
> +---++
> |  c2   | EXPR$1 |
> +---++
> | a | 11152.0|
> | b | 41.0   |
> | c | 56.0   |
> | d | 4.294967315E9  |
> | e | 14.0   |
> | null  | 106.0  |
> +---++
> 6 rows selected (0.227 seconds)
> {noformat}
> {noformat}
> postgres=# select c2, AVG(SUM(c1)) OVER(partition by c2 order by c2 nulls 
> first) FROM t222 group by c2;
>  c2 |  avg   
> +
> |   106.
>  a  | 11152.
>  b  |41.
>  c  |56.
>  d  |4294967315.
>  e  |14.
> (6 rows)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (DRILL-4802) NULLS are not first when NULLS FIRST is used with ORDER BY in window definition

2016-07-25 Thread Khurram Faraaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz reassigned DRILL-4802:
-

Assignee: Khurram Faraaz

> NULLS are not first when NULLS FIRST is used with ORDER BY in window 
> definition
> ---
>
> Key: DRILL-4802
> URL: https://issues.apache.org/jira/browse/DRILL-4802
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.8.0
> Environment: 4 node CentOS cluster
>Reporter: Khurram Faraaz
>Assignee: Khurram Faraaz
>
> NULLS FIRST is not honored when used with ORDER BY inside a window definition. 
> This is a wrong-results issue.
> MapR Drill 1.8.0 commit ID : 34ca63ba
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select c2, AVG(SUM(c1)) OVER(partition by c2 
> order by c2 nulls first) FROM `tblWnulls.parquet` group by c2;
> +---++
> |  c2   | EXPR$1 |
> +---++
> | a | 11152.0|
> | b | 41.0   |
> | c | 56.0   |
> | d | 4.294967315E9  |
> | e | 14.0   |
> | null  | 106.0  |
> +---++
> 6 rows selected (0.227 seconds)
> {noformat}
> {noformat}
> postgres=# select c2, AVG(SUM(c1)) OVER(partition by c2 order by c2 nulls 
> first) FROM t222 group by c2;
>  c2 |  avg   
> +
> |   106.
>  a  | 11152.
>  b  |41.
>  c  |56.
>  d  |4294967315.
>  e  |14.
> (6 rows)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-4673) Implement "DROP TABLE IF EXISTS" for drill to prevent FAILED status on command return

2016-07-25 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva resolved DRILL-4673.
-
   Resolution: Fixed
Fix Version/s: 1.8.0

Fixed in f36fec9750fedd00a063f26b9998f9a994d025ad

> Implement "DROP TABLE IF EXISTS" for drill to prevent FAILED status on 
> command return
> -
>
> Key: DRILL-4673
> URL: https://issues.apache.org/jira/browse/DRILL-4673
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Functions - Drill
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Minor
>  Labels: drill
> Fix For: 1.8.0
>
>
> Implement "DROP TABLE IF EXISTS" for drill to prevent FAILED status on 
> command "DROP TABLE" return if table doesn't exist.
> The same for "DROP VIEW IF EXISTS"
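
A minimal usage sketch of the new syntax (the workspace and object names below are hypothetical):

{code:sql}
-- Returns OK instead of a FAILED status even when the object does not exist.
DROP TABLE IF EXISTS dfs.tmp.`some_table`;
DROP VIEW IF EXISTS dfs.tmp.`some_view`;
{code}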



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4673) Implement "DROP TABLE IF EXISTS" for drill to prevent FAILED status on command return

2016-07-25 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-4673:

Labels: doc-impacting drill  (was: drill)

> Implement "DROP TABLE IF EXISTS" for drill to prevent FAILED status on 
> command return
> -
>
> Key: DRILL-4673
> URL: https://issues.apache.org/jira/browse/DRILL-4673
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Functions - Drill
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Minor
>  Labels: doc-impacting, drill
> Fix For: 1.8.0
>
>
> Implement "DROP TABLE IF EXISTS" for drill to prevent FAILED status on 
> command "DROP TABLE" return if table doesn't exist.
> The same for "DROP VIEW IF EXISTS"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3149) TextReader should support multibyte line delimiters

2016-07-25 Thread Arina Ielchiieva (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15391564#comment-15391564
 ] 

Arina Ielchiieva commented on DRILL-3149:
-

Fixed in 5ca2340a0a83412aa8fc8b077b72eca5f55e4226

> TextReader should support multibyte line delimiters
> ---
>
> Key: DRILL-3149
> URL: https://issues.apache.org/jira/browse/DRILL-3149
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Text & CSV
>Affects Versions: 1.0.0, 1.1.0
>Reporter: Jim Scott
>Assignee: Arina Ielchiieva
>Priority: Minor
>  Labels: doc-impacting
> Fix For: 1.8.0
>
>
> lineDelimiter in the TextFormatConfig doesn't support \r\n for record 
> delimiters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-4746) Verification Failures (Decimal values) in drill's regression tests

2016-07-25 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva resolved DRILL-4746.
-
Resolution: Fixed

> Verification Failures (Decimal values) in drill's regression tests
> --
>
> Key: DRILL-4746
> URL: https://issues.apache.org/jira/browse/DRILL-4746
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types, Storage - Text & CSV
>Affects Versions: 1.7.0
>Reporter: Rahul Challapalli
>Assignee: Arina Ielchiieva
>Priority: Critical
> Fix For: 1.8.0
>
>
> We started seeing the below 4 functional test failures in drill's extended 
> tests [1]. The data for the below tests can be downloaded from [2]
> {code}
> framework/resources/Functional/aggregates/tpcds_variants/text/aggregate28.q
> framework/resources/Functional/tpcds/impala/text/q43.q
> framework/resources/Functional/tpcds/variants/text/q6_1.sql
> framework/resources/Functional/aggregates/tpcds_variants/text/aggregate29.q
> {code}
> The failures started showing up from the commit [3]
> [1] https://github.com/mapr/drill-test-framework
> [2] http://apache-drill.s3.amazonaws.com/files/tpcds-sf1-text.tgz
> [3] 
> https://github.com/apache/drill/commit/223507b76ff6c2227e667ae4a53f743c92edd295
> Let me know if more information is needed to reproduce this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (DRILL-4748) Verification Failures (Decimal values) in drill's regression tests

2016-07-25 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva closed DRILL-4748.
---
Resolution: Duplicate

> Verification Failures (Decimal values) in drill's regression tests
> --
>
> Key: DRILL-4748
> URL: https://issues.apache.org/jira/browse/DRILL-4748
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types, Storage - Text & CSV
>Affects Versions: 1.7.0
>Reporter: Rahul Challapalli
>Priority: Critical
>
> We started seeing the below 4 functional test failures in drill's extended 
> tests [1]. The data for the below tests can be downloaded from [2]
> {code}
> framework/resources/Functional/aggregates/tpcds_variants/text/aggregate28.q
> framework/resources/Functional/tpcds/impala/text/q43.q
> framework/resources/Functional/tpcds/variants/text/q6_1.sql
> framework/resources/Functional/aggregates/tpcds_variants/text/aggregate29.q
> {code}
> The failures started showing up from the commit [3]
> [1] https://github.com/mapr/drill-test-framework
> [2] http://apache-drill.s3.amazonaws.com/files/tpcds-sf1-text.tgz
> [3] 
> https://github.com/apache/drill/commit/223507b76ff6c2227e667ae4a53f743c92edd295
> Let me know if more information is needed to reproduce this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (DRILL-4747) Verification Failures (Decimal values) in drill's regression tests

2016-07-25 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva closed DRILL-4747.
---
Resolution: Duplicate

> Verification Failures (Decimal values) in drill's regression tests
> --
>
> Key: DRILL-4747
> URL: https://issues.apache.org/jira/browse/DRILL-4747
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types, Storage - Text & CSV
>Affects Versions: 1.7.0
>Reporter: Rahul Challapalli
>Priority: Critical
>
> We started seeing the below 4 functional test failures in drill's extended 
> tests [1]. The data for the below tests can be downloaded from [2]
> {code}
> framework/resources/Functional/aggregates/tpcds_variants/text/aggregate28.q
> framework/resources/Functional/tpcds/impala/text/q43.q
> framework/resources/Functional/tpcds/variants/text/q6_1.sql
> framework/resources/Functional/aggregates/tpcds_variants/text/aggregate29.q
> {code}
> The failures started showing up from the commit [3]
> [1] https://github.com/mapr/drill-test-framework
> [2] http://apache-drill.s3.amazonaws.com/files/tpcds-sf1-text.tgz
> [3] 
> https://github.com/apache/drill/commit/223507b76ff6c2227e667ae4a53f743c92edd295
> Let me know if more information is needed to reproduce this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-3149) TextReader should support multibyte line delimiters

2016-07-25 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva resolved DRILL-3149.
-
Resolution: Fixed

> TextReader should support multibyte line delimiters
> ---
>
> Key: DRILL-3149
> URL: https://issues.apache.org/jira/browse/DRILL-3149
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Text & CSV
>Affects Versions: 1.0.0, 1.1.0
>Reporter: Jim Scott
>Assignee: Arina Ielchiieva
>Priority: Minor
>  Labels: doc-impacting
> Fix For: 1.8.0
>
>
> lineDelimiter in the TextFormatConfig doesn't support \r\n for record 
> delimiters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (DRILL-4749) Verification Failures (Decimal values) in drill's regression tests

2016-07-25 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva closed DRILL-4749.
---
Resolution: Duplicate

> Verification Failures (Decimal values) in drill's regression tests
> --
>
> Key: DRILL-4749
> URL: https://issues.apache.org/jira/browse/DRILL-4749
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types, Storage - Text & CSV
>Affects Versions: 1.7.0
>Reporter: Rahul Challapalli
>Priority: Critical
>
> We started seeing the below 4 functional test failures in drill's extended 
> tests [1]. The data for the below tests can be downloaded from [2]
> {code}
> framework/resources/Functional/aggregates/tpcds_variants/text/aggregate28.q
> framework/resources/Functional/tpcds/impala/text/q43.q
> framework/resources/Functional/tpcds/variants/text/q6_1.sql
> framework/resources/Functional/aggregates/tpcds_variants/text/aggregate29.q
> {code}
> The failures started showing up from the commit [3]
> [1] https://github.com/mapr/drill-test-framework
> [2] http://apache-drill.s3.amazonaws.com/files/tpcds-sf1-text.tgz
> [3] 
> https://github.com/apache/drill/commit/223507b76ff6c2227e667ae4a53f743c92edd295
> Let me know if more information is needed to reproduce this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4802) NULLS are not first when NULLS FIRST is used with ORDER BY in window definition

2016-07-25 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15391577#comment-15391577
 ] 

Khurram Faraaz commented on DRILL-4802:
---

I did some more investigation and here is what I found.
While we do have tests that cover NULLS FIRST in our window function tests, we 
do not have a test similar to the one shown in case (ii) below. So this seems 
to be a general window function issue, not one specific to nested aggregates.

Schema details of the parquet file used in the tests below:

{noformat}
[root@centos-01 parquet-tools]# ./parquet-schema ~/tblWnulls.parquet
message root {
  optional int32 c1;
  optional binary c2 (UTF8);
}
{noformat}

case (i): ORDER BY c1 NULLS FIRST returns correct results. Note that nulls come 
first in column c1 for each group in c2.

{noformat}
0: jdbc:drill:schema=dfs.tmp> select AVG(c1) OVER(partition by c2 order by c1 
nulls first), c1, c2 FROM `tblWnulls.parquet`;
++-+---+
|   EXPR$0   | c1  |  c2   |
++-+---+
| 0.0| 0   | a |
| 0.5| 1   | a |
| 2.0| 5   | a |
| 4.0| 10  | a |
| 5.4| 11  | a |
| 6.833  | 14  | a |
| 1593.142857142857  | 1   | a |
| 2.0| 2   | b |
| 5.5| 9   | b |
| 8.0| 13  | b |
| 10.25  | 17  | b |
| null   | null| c |
| 4.0| 4   | c |
| 5.0| 6   | c |
| 6.0| 8   | c |
| 7.5| 12  | c |
| 9.334  | 13  | c |
| 9.334  | 13  | c |
| null   | null| d |
| null   | null| d |
| 10.0   | 10  | d |
| 10.5   | 11  | d |
| 1.07374182875E9| 2147483647  | d |
| 1.07374182875E9| 2147483647  | d |
| -1.0   | -1  | e |
| 7.0| 15  | e |
| null   | null| null  |
| 19.0   | 19  | null  |
| 32777.5| 65536   | null  |
| 355185.0   | 100 | null  |
++-+---+
30 rows selected (0.145 seconds)
{noformat}

case (ii): ORDER BY c2 NULLS FIRST does not return correct results. Note that 
nulls are NOT first in column c1 for each group in c2.

{noformat}
0: jdbc:drill:schema=dfs.tmp> select AVG(c1) OVER(partition by c2 order by c2 
nulls first), c1, c2 FROM `tblWnulls.parquet`;
++-+---+
|   EXPR$0   | c1  |  c2   |
++-+---+
| 1593.142857142857  | 1   | a |
| 1593.142857142857  | 5   | a |
| 1593.142857142857  | 10  | a |
| 1593.142857142857  | 11  | a |
| 1593.142857142857  | 1   | a |
| 1593.142857142857  | 14  | a |
| 1593.142857142857  | 0   | a |
| 10.25  | 17  | b |
| 10.25  | 9   | b |
| 10.25  | 13  | b |
| 10.25  | 2   | b |
| 9.334  | 6   | c |
| 9.334  | 13  | c |
| 9.334  | 8   | c |
| 9.334  | null| c |
| 9.334  | 4   | c |
| 9.334  | 12  | c |
| 9.334  | 13  | c |
| 1.07374182875E9| 2147483647  | d |
| 1.07374182875E9| null| d |
| 1.07374182875E9| 11  | d |
| 1.07374182875E9| null| d |
| 1.07374182875E9| 2147483647  | d |
| 1.07374182875E9| 10  | d |
| 7.0| 15  | e |
| 7.0| -1  | e |
| 355185.0   | 65536   | null  |
| 355185.0   | 19  | null  |
| 355185.0   | null| null  |
| 355185.0   | 100 | null  |
++-+---+
30 rows selected (0.195 seconds)
{noformat}


> NULLS are not first when NULLS FIRST is used with ORDER BY in window 
> definition
> ---
>
> Key: DRILL-4802
> URL: https://issues.apache.org/jira/browse/DRILL-4802
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.8.0
> Environment: 4 node CentOS cluster
>Reporter: Khurram Faraaz
>Assignee: Khurram Faraaz
>
> NULLS FIRST is n

[jira] [Commented] (DRILL-4786) Improve metadata cache performance for queries with multiple partitions

2016-07-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392312#comment-15392312
 ] 

ASF GitHub Bot commented on DRILL-4786:
---

GitHub user amansinha100 opened a pull request:

https://github.com/apache/drill/pull/553

DRILL-4786: Read the metadata cache file from the least common ancestor directory when multiple partitions are selected.

Handle wildcards appropriately when metadata cache is present.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/amansinha100/incubator-drill DRILL-4786-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/553.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #553


commit 002898ac21738cdcd9f367969d91fa7b7c5b2a3c
Author: Aman Sinha 
Date:   2016-07-22T23:42:09Z

DRILL-4786: Read the metadata cache file from the least common ancestor 
directory when multiple partitions are selected.

Handle wildcards appropriately when metadata cache is present.




> Improve metadata cache performance for queries with multiple partitions
> ---
>
> Key: DRILL-4786
> URL: https://issues.apache.org/jira/browse/DRILL-4786
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Metadata, Query Planning & Optimization
>Affects Versions: 1.7.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>
> Consider  queries of the following type run against Parquet data with 
> metadata caching:   
> {noformat}
> SELECT col FROM `A` WHERE dir0 = 'B' AND dir1 IN ('1', '2', '3')
> {noformat}
> For such queries, Drill will read the metadata cache file from the top level 
> directory 'A', which is not very efficient since we are only interested in 
> the files  from some subdirectories of 'B'.   DRILL-4530 improves the 
> performance of such queries when the leaf level directory is a single 
> partition.  Here, there are 3 subpartitions due to the IN list.   We can 
> build upon the DRILL-4530 enhancement by at least reading the cache file from 
> the immediate parent level  `/A/B`  instead of the top level.  
> The goal of this JIRA is to improve performance for such types of queries.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4786) Improve metadata cache performance for queries with multiple partitions

2016-07-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392328#comment-15392328
 ] 

ASF GitHub Bot commented on DRILL-4786:
---

Github user amansinha100 commented on the issue:

https://github.com/apache/drill/pull/553
  
@jinfengni could you please review the PR since you reviewed the related PR 
earlier? Thanks.


> Improve metadata cache performance for queries with multiple partitions
> ---
>
> Key: DRILL-4786
> URL: https://issues.apache.org/jira/browse/DRILL-4786
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Metadata, Query Planning & Optimization
>Affects Versions: 1.7.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>
> Consider  queries of the following type run against Parquet data with 
> metadata caching:   
> {noformat}
> SELECT col FROM `A` WHERE dir0 = 'B' AND dir1 IN ('1', '2', '3')
> {noformat}
> For such queries, Drill will read the metadata cache file from the top level 
> directory 'A', which is not very efficient since we are only interested in 
> the files  from some subdirectories of 'B'.   DRILL-4530 improves the 
> performance of such queries when the leaf level directory is a single 
> partition.  Here, there are 3 subpartitions due to the IN list.   We can 
> build upon the DRILL-4530 enhancement by at least reading the cache file from 
> the immediate parent level  `/A/B`  instead of the top level.  
> The goal of this JIRA is to improve performance for such types of queries.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-4499) Remove unused classes

2016-07-25 Thread Sudheesh Katkam (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sudheesh Katkam resolved DRILL-4499.

Resolution: Fixed

Fixed in 
[5a7d4c3|https://github.com/apache/drill/commit/5a7d4c3983747a778e6a29d3450dd18871e98f2c].

> Remove unused classes
> -
>
> Key: DRILL-4499
> URL: https://issues.apache.org/jira/browse/DRILL-4499
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Sudheesh Katkam
>Assignee: Sudheesh Katkam
>
> List of unused classes that I've tracked over time:
> exec/interpreter/src/test/java/org/apache/drill/exec/expr/TestPrune.java
> exec/java-exec/src/main/java/org/apache/drill/exec/expr/annotations/MethodMap.java
> exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/FunctionBody.java
> exec/java-exec/src/main/java/org/apache/drill/exec/ops/Multitimer.java
> exec/java-exec/src/main/java/org/apache/drill/exec/ops/QuerySetup.java
> exec/java-exec/src/main/java/org/apache/drill/exec/rpc/control/AvailabilityListener.java
> exec/java-exec/src/main/java/org/apache/drill/exec/rpc/control/ControlCommand.java
> exec/java-exec/src/main/java/org/apache/drill/exec/rpc/control/SendProgress.java
> exec/java-exec/src/main/java/org/apache/drill/exec/rpc/data/SendProgress.java
> exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/DrillUser.java
> exec/java-exec/src/main/java/org/apache/drill/exec/store/RecordRecorder.java
> exec/java-exec/src/main/java/org/apache/drill/exec/store/schedule/PartialWork.java
> exec/java-exec/src/main/java/org/apache/drill/exec/util/AtomicState.java
> exec/java-exec/src/main/java/org/apache/drill/exec/work/RecordOutputStream.java
> exec/java-exec/src/main/java/org/apache/drill/exec/work/ResourceRequest.java
> exec/java-exec/src/main/java/org/apache/drill/exec/work/RootNodeDriver.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4800) Improve parquet reader performance

2016-07-25 Thread Parth Chandra (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392388#comment-15392388
 ] 

Parth Chandra commented on DRILL-4800:
--

Good point. I'll include that in the benchmarking phase after making the first 
set of changes. 

> Improve parquet reader performance
> --
>
> Key: DRILL-4800
> URL: https://issues.apache.org/jira/browse/DRILL-4800
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Parth Chandra
>
> Reported by a user in the field: 
> We're generally getting read speeds of about 100-150 MB/s/node on the PARQUET 
> scan operator. This seems a little low given the number of drives on the node 
> (24). We're looking for ways to improve the performance of this 
> operator, as most of our queries are I/O bound. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4514) Add describe schema command

2016-07-25 Thread Robert Hou (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392490#comment-15392490
 ] 

Robert Hou commented on DRILL-4514:
---

Tests have been added, commit: cdcb7a0736646105ae01db8d49b88de22977a336.

Tests pass.

> Add describe schema  command
> -
>
> Key: DRILL-4514
> URL: https://issues.apache.org/jira/browse/DRILL-4514
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: Future
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
> Fix For: 1.8.0
>
>
> Add a describe database command which will return the directory 
> associated with a database on the fly.
> Syntax:
> describe database 
> describe schema 
> Output:
> {code:sql}
>  DESCRIBE SCHEMA dfs.tmp;
> {code}
> {noformat}
> +++
> | schema | properties |
> +++
> | dfs.tmp | {
>   "type" : "file",
>   "enabled" : true,
>   "connection" : "file:///",
>   "config" : null,
>   "formats" : {
> "psv" : {
>   "type" : "text",
>   "extensions" : [ "tbl" ],
>   "delimiter" : "|"
> },
> "csv" : {
>   "type" : "text",
>   "extensions" : [ "csv" ],
>   "delimiter" : ","
> },
> "tsv" : {
>   "type" : "text",
>   "extensions" : [ "tsv" ],
>   "delimiter" : "\t"
> },
> "parquet" : {
>   "type" : "parquet"
> },
> "json" : {
>   "type" : "json",
>   "extensions" : [ "json" ]
> },
> "avro" : {
>   "type" : "avro"
> },
> "sequencefile" : {
>   "type" : "sequencefile",
>   "extensions" : [ "seq" ]
> },
> "csvh" : {
>   "type" : "text",
>   "extensions" : [ "csvh" ],
>   "extractHeader" : true,
>   "delimiter" : ","
> }
>   },
>   "location" : "/tmp",
>   "writable" : true,
>   "defaultInputFormat" : null
> } |
> +++
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (DRILL-4514) Add describe schema command

2016-07-25 Thread Robert Hou (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou closed DRILL-4514.
-

Tests pass.

> Add describe schema  command
> -
>
> Key: DRILL-4514
> URL: https://issues.apache.org/jira/browse/DRILL-4514
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: Future
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
> Fix For: 1.8.0
>
>
> Add a describe database command which will return the directory 
> associated with a database on the fly.
> Syntax:
> describe database 
> describe schema 
> Output:
> {code:sql}
>  DESCRIBE SCHEMA dfs.tmp;
> {code}
> {noformat}
> +++
> | schema | properties |
> +++
> | dfs.tmp | {
>   "type" : "file",
>   "enabled" : true,
>   "connection" : "file:///",
>   "config" : null,
>   "formats" : {
> "psv" : {
>   "type" : "text",
>   "extensions" : [ "tbl" ],
>   "delimiter" : "|"
> },
> "csv" : {
>   "type" : "text",
>   "extensions" : [ "csv" ],
>   "delimiter" : ","
> },
> "tsv" : {
>   "type" : "text",
>   "extensions" : [ "tsv" ],
>   "delimiter" : "\t"
> },
> "parquet" : {
>   "type" : "parquet"
> },
> "json" : {
>   "type" : "json",
>   "extensions" : [ "json" ]
> },
> "avro" : {
>   "type" : "avro"
> },
> "sequencefile" : {
>   "type" : "sequencefile",
>   "extensions" : [ "seq" ]
> },
> "csvh" : {
>   "type" : "text",
>   "extensions" : [ "csvh" ],
>   "extractHeader" : true,
>   "delimiter" : ","
> }
>   },
>   "location" : "/tmp",
>   "writable" : true,
>   "defaultInputFormat" : null
> } |
> +++
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4682) Allow full schema identifier in SELECT clause

2016-07-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392556#comment-15392556
 ] 

ASF GitHub Bot commented on DRILL-4682:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/549#discussion_r72131382
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/DrillCompoundIdentifier.java
 ---
@@ -69,31 +70,38 @@ public void addIndex(int index, SqlParserPos pos){
 }
   }
 
-  public SqlNode getAsSqlNode(){
-if(ids.size() == 1){
+  public SqlNode getAsSqlNode(Set fullSchemasSet) 
{
--- End diff --

It would be great not to convert the original Calcite `SqlNode` with 
`CompoundIdentifierConverter`. 
In that case the unit tests from my PR would have passed, but Drill 
functionality with nested complex schemas wouldn't work (e.g. querying JSON 
arrays). So I think we can't drop the `DrillParserWithCompoundIdConverter` 
logic. 

And the main idea of this PR is to make CompoundIdentifierConverter 
stop ignoring the full schema at the beginning of the identifier.
If I missed something, please correct me.


> Allow full schema identifier in SELECT clause
> -
>
> Key: DRILL-4682
> URL: https://issues.apache.org/jira/browse/DRILL-4682
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: SQL Parser
>Reporter: Andries Engelbrecht
>
> Currently Drill requires aliases to identify columns in the SELECT clause 
> when working with multiple tables/workspaces.
> Many BI/Analytical and other tools by default will use the full schema 
> identifier in the select clause when generating SQL statements for execution 
> for generic JDBC or ODBC sources. Not supporting this feature causes issues 
> and a slower adoption of utilizing Drill as an execution engine within the 
> larger Analytical SQL community.
> Propose to support 
> SELECT ... FROM 
> ..
> Also see DRILL-3510 for double quote support as per ANSI_QUOTES
> SELECT ""."".""."" FROM 
> ""."".""
> Which is very common generic SQL being generated by most tools when dealing 
> with a generic SQL data source.
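
A sketch of the proposed form, with made-up schema, table, and column names:

{code:sql}
-- Fully qualified column reference in the SELECT list, mirroring the FROM clause.
-- (DRILL-3510 would additionally allow the same identifiers in ANSI double quotes.)
SELECT dfs.tmp.`orders`.o_orderkey
FROM dfs.tmp.`orders`;
{code}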



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-2330) Add support for nested aggregate expressions for window aggregates

2016-07-25 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-2330:
-
Reviewer: Khurram Faraaz

> Add support for nested aggregate expressions for window aggregates
> --
>
> Key: DRILL-2330
> URL: https://issues.apache.org/jira/browse/DRILL-2330
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 0.8.0
>Reporter: Abhishek Girish
>Assignee: Gautam Kumar Parai
> Fix For: 1.8.0
>
> Attachments: drillbit.log
>
>
> Aggregate expressions currently cannot be nested. 
> *The following query fails to validate:*
> {code:sql}
> select avg(sum(i_item_sk)) from item;
> {code}
> Error:
> Query failed: SqlValidatorException: Aggregate expressions cannot be nested
> Log attached. 
> Reference: TPCDS queries (20, 63, 98, ...) fail to execute.
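
Until nested aggregates are supported directly, an equivalent formulation computes the inner aggregate in a subquery and applies the outer (window) aggregate on top; a sketch using TPC-DS-style column names (i_category is assumed only for illustration):

{code:sql}
-- Rewrite of a nested window aggregate: aggregate per group first,
-- then apply the window function over the pre-aggregated rows.
SELECT i_category,
       AVG(cat_sum) OVER () AS avg_of_sums
FROM (SELECT i_category, SUM(i_item_sk) AS cat_sum
      FROM item
      GROUP BY i_category) t;
{code}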



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4682) Allow full schema identifier in SELECT clause

2016-07-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392646#comment-15392646
 ] 

ASF GitHub Bot commented on DRILL-4682:
---

Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/549#discussion_r72142344
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/CompoundIdentifierConverter.java
 ---
@@ -115,6 +119,18 @@ public SqlNode visitChild(
   enableComplex = true;
 }
   }
+  if (expr.getKind() == SqlKind.SELECT) {
+if (((SqlSelect) expr).getFrom() instanceof 
DrillCompoundIdentifier) {
+  fullSchemasSet.add((DrillCompoundIdentifier) ((SqlSelect) 
expr).getFrom());
+} else if (((SqlSelect) expr).getFrom() instanceof SqlJoin) {
--- End diff --

You are right. I should add recursive checking of the full-schema identifier for 
every nested query. 
It doesn't help right now, though, because when I ran a query with nested 
subqueries over different schema-qualified tables I got an error from Calcite. 
I've already written about it on the Calcite dev list. 


> Allow full schema identifier in SELECT clause
> -
>
> Key: DRILL-4682
> URL: https://issues.apache.org/jira/browse/DRILL-4682
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: SQL Parser
>Reporter: Andries Engelbrecht
>
> Currently Drill requires aliases to identify columns in the SELECT clause 
> when working with multiple tables/workspaces.
> Many BI/Analytical and other tools by default will use the full schema 
> identifier in the select clause when generating SQL statements for execution 
> for generic JDBC or ODBC sources. Not supporting this feature causes issues 
> and a slower adoption of utilizing Drill as an execution engine within the 
> larger Analytical SQL community.
> Propose to support 
> SELECT ... FROM 
> ..
> Also see DRILL-3510 for double quote support as per ANSI_QUOTES
> SELECT ""."".""."" FROM 
> ""."".""
> Which is very common generic SQL being generated by most tools when dealing 
> with a generic SQL data source.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4530) Improve metadata cache performance for queries with single partition

2016-07-25 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4530:
-
Reviewer: Rahul Challapalli

> Improve metadata cache performance for queries with single partition 
> -
>
> Key: DRILL-4530
> URL: https://issues.apache.org/jira/browse/DRILL-4530
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.6.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
> Fix For: 1.8.0
>
>
> Consider two types of queries which are run with Parquet metadata caching: 
> {noformat}
> query 1:
> SELECT col FROM  `A/B/C`;
> query 2:
> SELECT col FROM `A` WHERE dir0 = 'B' AND dir1 = 'C';
> {noformat}
> For a certain dataset, the query1 elapsed time is 1 sec whereas query2 
> elapsed time is 9 sec even though both are accessing the same amount of data. 
>  The user expectation is that they should perform roughly the same.  The main 
> difference comes from reading the bigger metadata cache file at the root 
> level 'A' for query2 and then applying the partitioning filter.  query1 reads 
> a much smaller metadata cache file at the subdirectory level. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4746) Verification Failures (Decimal values) in drill's regression tests

2016-07-25 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4746:
-
Reviewer: Khurram Faraaz

> Verification Failures (Decimal values) in drill's regression tests
> --
>
> Key: DRILL-4746
> URL: https://issues.apache.org/jira/browse/DRILL-4746
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types, Storage - Text & CSV
>Affects Versions: 1.7.0
>Reporter: Rahul Challapalli
>Assignee: Arina Ielchiieva
>Priority: Critical
> Fix For: 1.8.0
>
>
> We started seeing the below 4 functional test failures in drill's extended 
> tests [1]. The data for the below tests can be downloaded from [2]
> {code}
> framework/resources/Functional/aggregates/tpcds_variants/text/aggregate28.q
> framework/resources/Functional/tpcds/impala/text/q43.q
> framework/resources/Functional/tpcds/variants/text/q6_1.sql
> framework/resources/Functional/aggregates/tpcds_variants/text/aggregate29.q
> {code}
> The failures started showing up from the commit [3]
> [1] https://github.com/mapr/drill-test-framework
> [2] http://apache-drill.s3.amazonaws.com/files/tpcds-sf1-text.tgz
> [3] 
> https://github.com/apache/drill/commit/223507b76ff6c2227e667ae4a53f743c92edd295
> Let me know if more information is needed to reproduce this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-25 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4743:
-
Reviewer: Robert Hou

> HashJoin's not fully parallelized in query plan
> ---
>
> Key: DRILL-4743
> URL: https://issues.apache.org/jira/browse/DRILL-4743
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.5.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>  Labels: doc-impacting
> Fix For: 1.8.0
>
>
> The underlying problem is a filter selectivity under-estimate for queries with 
> complicated predicates, e.g. deeply nested AND/OR predicates. This leads to 
> under-parallelization of the major fragment doing the join. 
> To really resolve this problem we need table/column statistics to correctly 
> estimate the selectivity. However, in the absence of statistics, or even when 
> existing statistics are insufficient for a correct selectivity estimate, 
> this will serve as a workaround.
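
An illustrative (made-up) shape of such a predicate, where the combined selectivity of the nested AND/OR branches is hard to estimate without statistics:

{code:sql}
-- Hypothetical TPC-H-style query: a deeply nested AND/OR filter feeding a hash join.
SELECT o.o_orderkey, c.c_name
FROM orders o
JOIN customer c ON o.o_custkey = c.c_custkey
WHERE (o.o_orderstatus = 'F'
       AND (c.c_mktsegment = 'BUILDING' OR c.c_mktsegment = 'MACHINERY'))
   OR (o.o_orderpriority = '1-URGENT' AND c.c_acctbal > 1000.0);
{code}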



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4744) Fully Qualified JDBC Plugin Tables return Table not Found via Rest API

2016-07-25 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4744:
-
Reviewer: Chun Chang

> Fully Qualified JDBC Plugin Tables return Table not Found via Rest API
> --
>
> Key: DRILL-4744
> URL: https://issues.apache.org/jira/browse/DRILL-4744
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JDBC
>Affects Versions: 1.6.0
>Reporter: John Omernik
>Assignee: Roman Lozovyk
>Priority: Minor
> Fix For: 1.7.0
>
>
> When trying to query a JDBC table via authenticated Rest API, using a fully 
> qualified table name returns table not found.  This does not occur in 
> sqlline, and a workaround is to "use pluginname.mysqldatabase" prior to the 
> query. (Then the fully qualified table name will work)
> Plugin Name: mysql
> Mysql Database: events
> Mysql Table: curevents
> Via Rest:
> select * from mysql.events.curevents limit 10;
> Fail with "VALIDATION ERROR "Table 'mysql.events.curevents' not found
> Via Rest:
> use mysql.events;
> select * from mysql.events.curevents limit 10;
> - Success. 
> Via SQL line, authenticating with the same username, you can connect, and run 
> select * from mysql.events.curevents limit 10;
> without issue. (and without the use mysql.events)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4794) Regression: Wrong result for query with disjunctive partition filters

2016-07-25 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4794:
-
Reviewer: Rahul Challapalli

> Regression: Wrong result for query with disjunctive partition filters
> -
>
> Key: DRILL-4794
> URL: https://issues.apache.org/jira/browse/DRILL-4794
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.7.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
> Fix For: 1.8.0
>
>
> For a query that contains certain types of disjunctive filter conditions, such 
> as 'dir0=x OR dir1=y', we get a wrong result when metadata caching is used.  
> This is a regression due to DRILL-4530.  
> Note that the filter involves an OR of 2 different directory levels. For the 
> normal case of an OR condition at the same level, the problem does not occur. 
> Correct result (without metadata cache) 
> {noformat}
> 0: jdbc:drill:zk=local> select count(*) from dfs.`orders` where dir0=1994 or 
> dir1='Q3' ;
> +-+
> | EXPR$0  |
> +-+
> | 60  |
> +-+
> {noformat}
> Wrong result (with metadata cache):
> {noformat}
> 0: jdbc:drill:zk=local> select count(*) from dfs.`orders` where dir0=1994 or 
> dir1='Q3' ;
> +-+
> | EXPR$0  |
> +-+
> | 50  |
> +-+
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4664) ScanBatch.isNewSchema() returns wrong result for map datatype

2016-07-25 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4664:
-
Reviewer: Chun Chang

> ScanBatch.isNewSchema() returns wrong result for map datatype
> -
>
> Key: DRILL-4664
> URL: https://issues.apache.org/jira/browse/DRILL-4664
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.6.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Minor
> Fix For: 1.8.0
>
>
> The isNewSchema() method checks whether the top-level schema or any of the deeper map 
> schemas has changed. The latter check doesn't work properly with the count function:
> "deeperSchemaChanged" is true even when two map schema strings have the same 
> child fields.
> Discovered while trying to fix [DRILL-2385|DRILL-2385].
> Dataset test.json for reproducing (MAP datatype object):
> {code}{"oooi":{"oa":{"oab":{"oabc":1{code}
> Example of query:
> {code}select count(t.oooi) from dfs.tmp.`test.json` t{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (DRILL-3559) Make filename available to sql statments just like dirN

2016-07-25 Thread Krystal (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krystal closed DRILL-3559.
--

git.commit.id.abbrev=ba22806

Verified feature and added tests to automation framework.

> Make filename available to sql statments just like dirN
> ---
>
> Key: DRILL-3559
> URL: https://issues.apache.org/jira/browse/DRILL-3559
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: SQL Parser
>Affects Versions: 1.1.0
>Reporter: Stefán Baxter
>Assignee: Arina Ielchiieva
>Priority: Minor
>  Labels: doc-impacting
> Fix For: 1.7.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (DRILL-3474) Add implicit file columns support

2016-07-25 Thread Krystal (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krystal closed DRILL-3474.
--

git.commit.id.abbrev=ba22806

Verified feature and added tests to automation framework.

> Add implicit file columns support
> -
>
> Key: DRILL-3474
> URL: https://issues.apache.org/jira/browse/DRILL-3474
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Metadata
>Affects Versions: 1.1.0
>Reporter: Jim Scott
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
> Fix For: 1.7.0
>
>
> I could not find another ticket which talks about this ...
> The file name should be a column which can be selected or filtered when 
> querying a directory just like dir0, dir1 are available.
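
A sketch of the kind of query this enables, assuming the implicit column is exposed as `filename` alongside the existing dirN columns (path and column usage are hypothetical):

{code:sql}
-- Select and filter on the file name, just like dir0/dir1 today.
SELECT filename, COUNT(*) AS rows_per_file
FROM dfs.`/data/logs`
WHERE dir0 = '2016' AND filename LIKE '%.csv'
GROUP BY filename;
{code}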



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4783) Flatten on CONVERT_FROM fails with ClassCastException if resultset is empty

2016-07-25 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4783:
-
Reviewer: Rahul Challapalli

> Flatten on CONVERT_FROM fails with ClassCastException if resultset is empty
> ---
>
> Key: DRILL-4783
> URL: https://issues.apache.org/jira/browse/DRILL-4783
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>Priority: Critical
> Fix For: 1.8.0
>
>
> Flatten fails to work on top of convert_from when the result set is empty. 
> For an HBase table like this:
> 0: jdbc:drill:zk=localhost:5181> select convert_from(t.address.cities,'json') 
> from hbase.`/tmp/flattentest` t;
> +--+
> |  EXPR$0 
>  |
> +--+
> | {"list":[{"city":"SunnyVale"},{"city":"Palo Alto"},{"city":"Mountain 
> View"}]}|
> | {"list":[{"city":"Seattle"},{"city":"Bellevue"},{"city":"Renton"}]} 
>  |
> | {"list":[{"city":"Minneapolis"},{"city":"Falcon Heights"},{"city":"San 
> Paul"}]}  |
> +--+
> Flatten works when row_key is in (1,2,3)
> 0: jdbc:drill:zk=localhost:5181> select flatten(t1.json.list) from (select 
> convert_from(t.address.cities,'json') json from hbase.`/tmp/flattentest` t 
> where row_key=1) t1;
> +---+
> |  EXPR$0   |
> +---+
> | {"city":"SunnyVale"}  |
> | {"city":"Palo Alto"}  |
> | {"city":"Mountain View"}  |
> +---+
> But Flatten throws exception if the resultset is empty
> 0: jdbc:drill:zk=localhost:5181> select flatten(t1.json.list) from (select 
> convert_from(t.address.cities,'json') json from hbase.`/tmp/flattentest` t 
> where row_key=4) t1;
> Error: SYSTEM ERROR: ClassCastException: Cannot cast 
> org.apache.drill.exec.vector.NullableIntVector to 
> org.apache.drill.exec.vector.complex.RepeatedValueVector
> Fragment 0:0
> [Error Id: 07fd0cab-d1e6-4259-bfec-ad80f02d93a2 on atsqa4-127.qa.lab:31010] 
> (state=,code=0)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4682) Allow full schema identifier in SELECT clause

2016-07-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392657#comment-15392657
 ] 

ASF GitHub Bot commented on DRILL-4682:
---

Github user julianhyde commented on a diff in the pull request:

https://github.com/apache/drill/pull/549#discussion_r72143283
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/DrillCompoundIdentifier.java
 ---
@@ -69,31 +70,38 @@ public void addIndex(int index, SqlParserPos pos){
 }
   }
 
-  public SqlNode getAsSqlNode(){
-if(ids.size() == 1){
+  public SqlNode getAsSqlNode(Set fullSchemasSet) 
{
--- End diff --

I can't argue with the facts.

But you're writing an ugly piece of code and building up technical debt. 
That is a bad idea.

Maybe you need to revisit how you deal with JSON arrays.


> Allow full schema identifier in SELECT clause
> -
>
> Key: DRILL-4682
> URL: https://issues.apache.org/jira/browse/DRILL-4682
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: SQL Parser
>Reporter: Andries Engelbrecht
>
> Currently Drill requires aliases to identify columns in the SELECT clause 
> when working with multiple tables/workspaces.
> Many BI/Analytical and other tools by default will use the full schema 
> identifier in the select clause when generating SQL statements for execution 
> for generic JDBC or ODBC sources. Not supporting this feature causes issues 
> and a slower adoption of utilizing Drill as an execution engine within the 
> larger Analytical SQL community.
> Propose to support 
> SELECT ... FROM 
> ..
> Also see DRILL-3510 for double quote support as per ANSI_QUOTES
> SELECT ""."".""."" FROM 
> ""."".""
> Which is very common generic SQL being generated by most tools when dealing 
> with a generic SQL data source.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4673) Implement "DROP TABLE IF EXISTS" for drill to prevent FAILED status on command return

2016-07-25 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4673:
-
Reviewer: Chun Chang

> Implement "DROP TABLE IF EXISTS" for drill to prevent FAILED status on 
> command return
> -
>
> Key: DRILL-4673
> URL: https://issues.apache.org/jira/browse/DRILL-4673
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Functions - Drill
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Minor
>  Labels: doc-impacting, drill
> Fix For: 1.8.0
>
>
> Implement "DROP TABLE IF EXISTS" for drill to prevent FAILED status on 
> command "DROP TABLE" return if table doesn't exist.
> The same for "DROP VIEW IF EXISTS"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4175) IOBE may occur in Calcite RexProgramBuilder when queries are submitted concurrently

2016-07-25 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4175:
-
Reviewer: Rahul Challapalli

> IOBE may occur in Calcite RexProgramBuilder when queries are submitted 
> concurrently
> ---
>
> Key: DRILL-4175
> URL: https://issues.apache.org/jira/browse/DRILL-4175
> Project: Apache Drill
>  Issue Type: Bug
> Environment: distribution
>Reporter: huntersjm
> Fix For: 1.8.0
>
>
> I queried a SQL statement like `select v from table limit 1` and got an error:
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IndexOutOfBoundsException: Index: 68, Size: 67
> After debugging, I found a bug in the Calcite parser.
> First, look at line 72 in org.apache.calcite.rex.RexProgramBuilder:
> {noformat}
>registerInternal(RexInputRef.of(i, fields), false);
> {noformat}
> Here we get a RexInputRef from RexInputRef.of, which uses a method named 
> createName(int index), where NAMES is a SelfPopulatingList. 
> SelfPopulatingList is described as a thread-safe list, but in fact it is 
> thread-unsafe: when NAMES.get(index) is called concurrently, it produces an error. 
> We expect SelfPopulatingList to contain {$0, $1, $2, ..., $n}, but when it is 
> populated concurrently it may contain {$0, $1 ... $29, $30 ... $59, $30, $31 ... $59 ...}.
> Now look at the method registerInternal:
> {noformat}
> private RexLocalRef registerInternal(RexNode expr, boolean force) {
> expr = simplify(expr);
> RexLocalRef ref;
> final Pair key;
> if (expr instanceof RexLocalRef) {
>   key = null;
>   ref = (RexLocalRef) expr;
> } else {
>   key = RexUtil.makeKey(expr);
>   ref = exprMap.get(key);
> }
> if (ref == null) {
>   if (validating) {
> validate(
> expr,
> exprList.size());
>   }
> {noformat}
> Here makeKey(expr) is expected to produce distinct keys, but it produces the same 
> key, so addExpr(expr) is called fewer times than expected. In that method:
> {noformat}
> RexLocalRef ref;
> final int index = exprList.size();
> exprList.add(expr);
> ref =
> new RexLocalRef(
> index,
> expr.getType());
> localRefList.add(ref);
> return ref;
> {noformat}
> localRefList ends up with the wrong size, so at line 939,
> {noformat}
> final RexLocalRef ref = localRefList.get(index);
> {noformat}
> throws an IndexOutOfBoundsException.
> Bug fix:
> We can't change the original Calcite code before they fix this bug, but we can 
> initialize NAMES in RexInputRef at startup. Just add 
> {noformat}
> RexInputRef.createName(2048);
> {noformat}
> during bootstrap.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4746) Verification Failures (Decimal values) in drill's regression tests

2016-07-25 Thread Rahul Challapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392660#comment-15392660
 ] 

Rahul Challapalli commented on DRILL-4746:
--

[~khfaraaz] I already have a branch where I moved the failing tests back to 
passing. So now you have the fun task of reproducing the issue with new tests :)

> Verification Failures (Decimal values) in drill's regression tests
> --
>
> Key: DRILL-4746
> URL: https://issues.apache.org/jira/browse/DRILL-4746
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types, Storage - Text & CSV
>Affects Versions: 1.7.0
>Reporter: Rahul Challapalli
>Assignee: Arina Ielchiieva
>Priority: Critical
> Fix For: 1.8.0
>
>
> We started seeing the below 4 functional test failures in drill's extended 
> tests [1]. The data for the below tests can be downloaded from [2]
> {code}
> framework/resources/Functional/aggregates/tpcds_variants/text/aggregate28.q
> framework/resources/Functional/tpcds/impala/text/q43.q
> framework/resources/Functional/tpcds/variants/text/q6_1.sql
> framework/resources/Functional/aggregates/tpcds_variants/text/aggregate29.q
> {code}
> The failures started showing up with commit [3]
> [1] https://github.com/mapr/drill-test-framework
> [2] http://apache-drill.s3.amazonaws.com/files/tpcds-sf1-text.tgz
> [3] 
> https://github.com/apache/drill/commit/223507b76ff6c2227e667ae4a53f743c92edd295
> Let me know if more information is needed to reproduce this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (DRILL-4800) Improve parquet reader performance

2016-07-25 Thread Parth Chandra (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Parth Chandra reassigned DRILL-4800:


Assignee: Parth Chandra

> Improve parquet reader performance
> --
>
> Key: DRILL-4800
> URL: https://issues.apache.org/jira/browse/DRILL-4800
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Parth Chandra
>Assignee: Parth Chandra
>
> Reported by a user in the field: 
> We're generally getting read speeds of about 100-150 MB/s/node on the PARQUET 
> scan operator. This seems a little low given the number of drives on the node 
> (24). We're looking for options to improve the performance of this operator, as 
> most of our queries are I/O bound. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-1950) Implement filter pushdown for Parquet

2016-07-25 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392753#comment-15392753
 ] 

Jinfeng Ni commented on DRILL-1950:
---

I have put up an initial draft of a proposal to add row-group-level filter pushdown 
for Parquet, after looking at the initial patch Adam submitted.

The draft is linked below [1]. Please let me know if you have any comments or 
suggestions. Thanks!

[1] 
https://docs.google.com/document/d/1obTtgjaY6zMaKO97gtZxHCrHfo2jXmLJ_66DhY08O2o/edit?usp=sharing

> Implement filter pushdown for Parquet
> -
>
> Key: DRILL-1950
> URL: https://issues.apache.org/jira/browse/DRILL-1950
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Reporter: Jason Altekruse
>Assignee: Jinfeng Ni
>Priority: Critical
> Fix For: Future
>
> Attachments: DRILL-1950.1.patch.txt
>
>
> The parquet reader currently supports project pushdown, for limiting the 
> number of columns read; however, it does not use filter pushdown to read only a 
> subset of the rows. Filter pushdown is particularly useful with parquet 
> files that contain statistics, most importantly min and max values on pages. 
> Evaluating predicates against these values could save some major reading and 
> decoding time.
> The largest barrier to implementing this is the current design of the reader. 
> Firstly, we currently have two separate parquet readers: one for reading flat 
> files very quickly and another for reading complex data. There are 
> enhancements we can make to the flat reader to support nested data 
> in a much more efficient manner. However, the speed of the flat file reader 
> currently comes from being able to make vectorized copies out of the parquet 
> file. This design is somewhat at odds with filter pushdown, as we can only 
> make useful vectorized copies if the filter matches a large run of values 
> within the file. That might not be too rare a case, assuming files are often 
> somewhat sorted on a primary field like a date or a numeric key, and these are 
> often the fields used to limit the query to a subset of the data. However, for 
> cases where we are filtering out a few records here and there, we should just 
> make individual copies.
> We need to do more design work on the best way to balance performance with 
> these use cases in mind.
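
As an illustration only (not the design in the linked proposal, and not Drill or 
parquet-mr code): a minimal sketch of row-group pruning against min/max statistics. 
RowGroupStats and its fields are hypothetical stand-ins for the metadata a Parquet 
footer exposes.

{code}
// Illustrative only: evaluate a simple "col >= bound" predicate against
// per-row-group min/max statistics and keep only the row groups that might match.
import java.util.ArrayList;
import java.util.List;

public final class RowGroupPruningSketch {

  static final class RowGroupStats {
    final int rowGroupIndex;
    final long minValue;   // min of the filtered column within this row group
    final long maxValue;   // max of the filtered column within this row group

    RowGroupStats(int rowGroupIndex, long minValue, long maxValue) {
      this.rowGroupIndex = rowGroupIndex;
      this.minValue = minValue;
      this.maxValue = maxValue;
    }
  }

  /** Keep a row group unless its statistics prove no row can satisfy col >= bound. */
  static List<RowGroupStats> prune(List<RowGroupStats> rowGroups, long bound) {
    List<RowGroupStats> kept = new ArrayList<>();
    for (RowGroupStats rg : rowGroups) {
      if (rg.maxValue >= bound) {   // if even the max is below the bound, skip the group
        kept.add(rg);
      }
    }
    return kept;
  }
}
{code}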



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (DRILL-4126) Adding HiveMetaStore caching when impersonation is enabled.

2016-07-25 Thread Dechang Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dechang Gu closed DRILL-4126.
-

Verified with Apache Drill 1.5.0 (git id 3f228d3) against the commit (git id 
539cbba) prior to the patch, querying INFORMATION_SCHEMA. There is a significant 
reduction in the number of calls to the Hive API.
Before the patch (git id 539cbba):
-- get_all_databases was called 340 times
-- get_all_tables was called 336 times.

With the patch (Apache Drill 1.5.0, git id 3f228d3), for the same query and same 
databases:
-- get_all_databases was called only 2 times, and
-- get_all_tables was called 38 times.

So the fix LGTM, and the JIRA is closed.

> Adding HiveMetaStore caching when impersonation is enabled. 
> 
>
> Key: DRILL-4126
> URL: https://issues.apache.org/jira/browse/DRILL-4126
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
> Fix For: 1.5.0
>
>
> Currently, HiveMetaStore caching is used only when impersonation is disabled, 
> such that all HiveMetaStore calls go through 
> NonCloseableHiveClientWithCaching [1]. However, if impersonation is enabled, 
> caching is not used for HiveMetaStore access.
> This can significantly increase the planning time when the Hive storage plugin 
> is enabled, or when running a query against INFORMATION_SCHEMA. Depending on 
> the number of databases/tables in the Hive storage plugin, the planning time or 
> the INFORMATION_SCHEMA query can become unacceptably long. This becomes even 
> worse if the Hive metastore is running on a different node from the drillbit, 
> making HiveMetaStore access even slower.
> We are seeing planning times, or execution times for INFORMATION_SCHEMA queries, 
> of 30~60 seconds. The long planning or execution time for INFORMATION_SCHEMA 
> queries prevents Drill from behaving "interactively" for such queries. 
> We should enable caching when impersonation is used. As long as the 
> authorizer verifies that the user has access to the databases/tables, we should 
> serve the data from the cache. By doing that, we should see a reduced number of 
> API calls to the HiveMetaStore.
> [1] 
> https://github.com/apache/drill/blob/master/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/DrillHiveMetaStoreClient.java#L299
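
A minimal sketch of the idea (illustrative only; not the actual 
DrillHiveMetaStoreClient implementation): cache metastore lookups keyed by the 
impersonated user plus the requested object, so a cached entry is only reused for a 
user whose access has already been authorized. The MetaStoreLookup interface and the 
60-second TTL below are assumptions of the sketch; only the general caching approach 
comes from the issue description.

{code}
// Illustrative per-user caching of metastore calls; not Drill's actual code.
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

import java.util.List;
import java.util.concurrent.TimeUnit;

public final class PerUserMetastoreCacheSketch {

  /** Hypothetical lookup that performs the real, authorized metastore call. */
  interface MetaStoreLookup {
    List<String> getAllTables(String userName, String dbName) throws Exception;
  }

  private final LoadingCache<String, List<String>> tableCache;

  PerUserMetastoreCacheSketch(final MetaStoreLookup lookup) {
    this.tableCache = CacheBuilder.newBuilder()
        .expireAfterWrite(60, TimeUnit.SECONDS)   // short TTL keeps metadata reasonably fresh
        .build(new CacheLoader<String, List<String>>() {
          @Override
          public List<String> load(String key) throws Exception {
            // The key encodes both the user and the database, so one user's cached
            // result is never handed to a different (possibly unauthorized) user.
            String[] parts = key.split("\u0001", 2);
            return lookup.getAllTables(parts[0], parts[1]);
          }
        });
  }

  public List<String> getAllTables(String userName, String dbName) throws Exception {
    return tableCache.get(userName + "\u0001" + dbName);
  }
}
{code}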



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-25 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai updated DRILL-4743:
--
Description: 
The underlying problem is filter selectivity under-estimate for a query with 
complicated predicates e.g. deeply nested and/or predicates. This leads to 
under parallelization of the major fragment doing the join. 

To really resolve this problem we need table/column statistics to correctly 
estimate the selectivity. However, in the absence of statistics OR even when 
existing statistics are insufficient to get a correct estimate of selectivity 
this will serve as a workaround.

For now, the fix is to provide options for controlling the lower and upper 
bounds for filter selectivity. The user can use the options
{code} 
planner.filter.min_selectivity_estimate_factor 
{code} 

  was:
The underlying problem is filter selectivity under-estimate for a query with 
complicated predicates e.g. deeply nested and/or predicates. This leads to 
under parallelization of the major fragment doing the join. 

To really resolve this problem we need table/column statistics to correctly 
estimate the selectivity. However, in the absence of statistics OR even when 
existing statistics are insufficient to get a correct estimate of selectivity 
this will serve as a workaround.


> HashJoin's not fully parallelized in query plan
> ---
>
> Key: DRILL-4743
> URL: https://issues.apache.org/jira/browse/DRILL-4743
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.5.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>  Labels: doc-impacting
> Fix For: 1.8.0
>
>
> The underlying problem is filter selectivity under-estimate for a query with 
> complicated predicates e.g. deeply nested and/or predicates. This leads to 
> under parallelization of the major fragment doing the join. 
> To really resolve this problem we need table/column statistics to correctly 
> estimate the selectivity. However, in the absence of statistics OR even when 
> existing statistics are insufficient to get a correct estimate of selectivity 
> this will serve as a workaround.
> For now, the fix is to provide options for controlling the lower and upper 
> bounds for filter selectivity. The user can use the options
> {code} 
> planner.filter.min_selectivity_estimate_factor 
> {code} 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-25 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai updated DRILL-4743:
--
Description: 
The underlying problem is filter selectivity under-estimate for a query with 
complicated predicates e.g. deeply nested and/or predicates. This leads to 
under parallelization of the major fragment doing the join. 

To really resolve this problem we need table/column statistics to correctly 
estimate the selectivity. However, in the absence of statistics OR even when 
existing statistics are insufficient to get a correct estimate of selectivity 
this will serve as a workaround.

For now, the fix is to provide options for controlling the lower and upper 
bounds for filter selectivity. The user can use the following options. The 
selectivity can be varied between 0 and 1 with min selectivity always less than 
or equal to max selectivity.
{code} 
planner.filter.min_selectivity_estimate_factor 
planner.filter.max_selectivity_estimate_factor 
{code} 

When using 'explain plan including all attributes for ' it should cap the 
estimated ROWCOUNT based on these options. Estimated ROWCOUNT of operators 
downstream is not directly controlled by these options. However, they may 
change as a result of dependency between different operators.

  was:
The underlying problem is filter selectivity under-estimate for a query with 
complicated predicates e.g. deeply nested and/or predicates. This leads to 
under parallelization of the major fragment doing the join. 

To really resolve this problem we need table/column statistics to correctly 
estimate the selectivity. However, in the absence of statistics OR even when 
existing statistics are insufficient to get a correct estimate of selectivity 
this will serve as a workaround.

For now, the fix is to provide options for controlling the lower and upper 
bounds for filter selectivity. The user can use the options
{code} 
planner.filter.min_selectivity_estimate_factor 
{code} 


> HashJoin's not fully parallelized in query plan
> ---
>
> Key: DRILL-4743
> URL: https://issues.apache.org/jira/browse/DRILL-4743
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.5.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>  Labels: doc-impacting
> Fix For: 1.8.0
>
>
> The underlying problem is filter selectivity under-estimate for a query with 
> complicated predicates e.g. deeply nested and/or predicates. This leads to 
> under parallelization of the major fragment doing the join. 
> To really resolve this problem we need table/column statistics to correctly 
> estimate the selectivity. However, in the absence of statistics OR even when 
> existing statistics are insufficient to get a correct estimate of selectivity 
> this will serve as a workaround.
> For now, the fix is to provide options for controlling the lower and upper 
> bounds for filter selectivity. The user can use the following options. The 
> selectivity can be varied between 0 and 1 with min selectivity always less 
> than or equal to max selectivity.
> {code} 
> planner.filter.min_selectivity_estimate_factor 
> planner.filter.max_selectivity_estimate_factor 
> {code} 
> When using 'explain plan including all attributes for ' it should cap the 
> estimated ROWCOUNT based on these options. Estimated ROWCOUNT of operators 
> downstream is not directly controlled by these options. However, they may 
> change as a result of dependency between different operators.
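
For users, a minimal sketch of how these bounds might be set from a client, assuming 
the standard Drill JDBC driver and the option names quoted above (the 0.25/0.85 
values and the zk=local connection URL are illustrative only):

{code}
// Illustrative only: cap the filter-selectivity estimates for the current session
// via JDBC, then run the affected query so its plan reflects the new bounds.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public final class SelectivityBoundsExample {
  public static void main(String[] args) throws Exception {
    // "jdbc:drill:zk=local" assumes an embedded/local Drillbit; adjust for a cluster.
    try (Connection conn = DriverManager.getConnection("jdbc:drill:zk=local");
         Statement stmt = conn.createStatement()) {
      stmt.execute("ALTER SESSION SET `planner.filter.min_selectivity_estimate_factor` = 0.25");
      stmt.execute("ALTER SESSION SET `planner.filter.max_selectivity_estimate_factor` = 0.85");
      // Queries issued on this session now use the capped selectivity estimates.
    }
  }
}
{code}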



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-25 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai updated DRILL-4743:
--
Description: 
The underlying problem is filter selectivity under-estimate for a query with 
complicated predicates e.g. deeply nested and/or predicates. This leads to 
under parallelization of the major fragment doing the join. 

To really resolve this problem we need table/column statistics to correctly 
estimate the selectivity. However, in the absence of statistics OR even when 
existing statistics are insufficient to get a correct estimate of selectivity 
this will serve as a workaround.

For now, the fix is to provide options for controlling the lower and upper 
bounds for filter selectivity. The user can use the following options. The 
selectivity can be varied between 0 and 1 with min selectivity always less than 
or equal to max selectivity.
{code} 
planner.filter.min_selectivity_estimate_factor 
planner.filter.max_selectivity_estimate_factor 
{code} 

When using 'explain plan including all attributes for ' it should cap the 
estimated ROWCOUNT based on these options. Estimated ROWCOUNT of operators 
downstream is not directly controlled by these options. However, they may 
change as a result of dependency between different operators.

  was:
The underlying problem is filter selectivity under-estimate for a query with 
complicated predicates e.g. deeply nested and/or predicates. This leads to 
under parallelization of the major fragment doing the join. 

To really resolve this problem we need table/column statistics to correctly 
estimate the selectivity. However, in the absence of statistics OR even when 
existing statistics are insufficient to get a correct estimate of selectivity 
this will serve as a workaround.

For now, the fix is to provide options for controlling the lower and upper 
bounds for filter selectivity. The user can use the following options. The 
selectivity can be varied between 0 and 1 with min selectivity always less than 
or equal to max selectivity.
{code} 
planner.filter.min_selectivity_estimate_factor 
planner.filter.max_selectivity_estimate_factor 
{code} 

When using 'explain plan including all attributes for ' it should cap the 
estimated ROWCOUNT based on these options. Estimated ROWCOUNT of operators 
downstream is not directly controlled by these options. However, they may 
change as a result of dependency between different operators.


> HashJoin's not fully parallelized in query plan
> ---
>
> Key: DRILL-4743
> URL: https://issues.apache.org/jira/browse/DRILL-4743
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.5.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>  Labels: doc-impacting
> Fix For: 1.8.0
>
>
> The underlying problem is filter selectivity under-estimate for a query with 
> complicated predicates e.g. deeply nested and/or predicates. This leads to 
> under parallelization of the major fragment doing the join. 
> To really resolve this problem we need table/column statistics to correctly 
> estimate the selectivity. However, in the absence of statistics OR even when 
> existing statistics are insufficient to get a correct estimate of selectivity 
> this will serve as a workaround.
> For now, the fix is to provide options for controlling the lower and upper 
> bounds for filter selectivity. The user can use the following options. The 
> selectivity can be varied between 0 and 1 with min selectivity always less 
> than or equal to max selectivity.
> {code} 
> planner.filter.min_selectivity_estimate_factor 
> planner.filter.max_selectivity_estimate_factor 
> {code} 
> When using 'explain plan including all attributes for ' it should cap the 
> estimated ROWCOUNT based on these options. Estimated ROWCOUNT of operators 
> downstream is not directly controlled by these options. However, they may 
> change as a result of dependency between different operators.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-25 Thread Gautam Kumar Parai (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Kumar Parai updated DRILL-4743:
--
Description: 
The underlying problem is filter selectivity under-estimate for a query with 
complicated predicates e.g. deeply nested and/or predicates. This leads to 
under parallelization of the major fragment doing the join. 

To really resolve this problem we need table/column statistics to correctly 
estimate the selectivity. However, in the absence of statistics OR even when 
existing statistics are insufficient to get a correct estimate of selectivity 
this will serve as a workaround.

For now, the fix is to provide options for controlling the lower and upper 
bounds for filter selectivity. The user can use the following options. The 
selectivity can be varied between 0 and 1 with min selectivity always less than 
or equal to max selectivity.
{code}
planner.filter.min_selectivity_estimate_factor 
planner.filter.max_selectivity_estimate_factor 
{code} 

When using 'explain plan including all attributes for ' it should cap the 
estimated ROWCOUNT based on these options. Estimated ROWCOUNT of operators 
downstream is not directly controlled by these options. However, they may 
change as a result of dependency between different operators.

  was:
The underlying problem is filter selectivity under-estimate for a query with 
complicated predicates e.g. deeply nested and/or predicates. This leads to 
under parallelization of the major fragment doing the join. 

To really resolve this problem we need table/column statistics to correctly 
estimate the selectivity. However, in the absence of statistics OR even when 
existing statistics are insufficient to get a correct estimate of selectivity 
this will serve as a workaround.

For now, the fix is to provide options for controlling the lower and upper 
bounds for filter selectivity. The user can use the following options. The 
selectivity can be varied between 0 and 1 with min selectivity always less than 
or equal to max selectivity.
{code} 
planner.filter.min_selectivity_estimate_factor 
planner.filter.max_selectivity_estimate_factor 
{code} 

When using 'explain plan including all attributes for ' it should cap the 
estimated ROWCOUNT based on these options. Estimated ROWCOUNT of operators 
downstream is not directly controlled by these options. However, they may 
change as a result of dependency between different operators.


> HashJoin's not fully parallelized in query plan
> ---
>
> Key: DRILL-4743
> URL: https://issues.apache.org/jira/browse/DRILL-4743
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.5.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>  Labels: doc-impacting
> Fix For: 1.8.0
>
>
> The underlying problem is filter selectivity under-estimate for a query with 
> complicated predicates e.g. deeply nested and/or predicates. This leads to 
> under parallelization of the major fragment doing the join. 
> To really resolve this problem we need table/column statistics to correctly 
> estimate the selectivity. However, in the absence of statistics OR even when 
> existing statistics are insufficient to get a correct estimate of selectivity 
> this will serve as a workaround.
> For now, the fix is to provide options for controlling the lower and upper 
> bounds for filter selectivity. The user can use the following options. The 
> selectivity can be varied between 0 and 1 with min selectivity always less 
> than or equal to max selectivity.
> {code}
> planner.filter.min_selectivity_estimate_factor 
> planner.filter.max_selectivity_estimate_factor 
> {code} 
> When using 'explain plan including all attributes for ' it should cap the 
> estimated ROWCOUNT based on these options. Estimated ROWCOUNT of operators 
> downstream is not directly controlled by these options. However, they may 
> change as a result of dependency between different operators.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)