[jira] [Commented] (DRILL-6193) Latest Calcite optimized out join condition and cause "This query cannot be planned possibly due to either a cartesian join or an inequality join"

2018-02-28 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381032#comment-16381032
 ] 

Julian Hyde commented on DRILL-6193:


By design, you cannot tell where the relational expression came from.

One case where a cartesian join is "safe" is where one or both sides have at 
most one row. Then the join has no multiplying effect. Looks like that may 
hold in this case, if you have a PK on Orders. There is a statistic for this, 
RelMdMaxRowCount.

If you are going to disable cartesian joins, why not do it at the SQL level 
rather than the algebra level? There are legitimate patterns where cartesian 
joins are the best plan.
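As a toy illustration of the no-multiplying-effect argument (made-up row values, not Drill code):

```java
import java.util.ArrayList;
import java.util.List;

public class CartesianDemo {
  // Cross product of two row lists: every pairing of left and right rows
  static List<String> cross(List<String> left, List<String> right) {
    List<String> out = new ArrayList<>();
    for (String l : left) {
      for (String r : right) {
        out.add(l + "," + r);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> lineitem = List.of("q1", "q2", "q3");
    // A PK filter such as O_ORDERKEY = 10208 leaves at most one Orders row,
    // so the cartesian join emits at most |lineitem| rows
    List<String> orders = List.of("o10208");
    System.out.println(cross(lineitem, orders).size()); // 3
  }
}
```

With |orders| <= 1 the result size is bounded by |lineitem|, which is exactly what a max-row-count statistic can prove.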

> Latest Calcite optimized out join condition and cause "This query cannot be 
> planned possibly due to either a cartesian join or an inequality join"
> --
>
> Key: DRILL-6193
> URL: https://issues.apache.org/jira/browse/DRILL-6193
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.13.0
>Reporter: Chunhui Shi
>Assignee: Hanumath Rao Maduri
>Priority: Blocker
> Fix For: 1.13.0
>
>
> I got the same error on Apache master's MapR profile at the tip (before the 
> Hive upgrade) and on changeset 9e944c97ee6f6c0d1705f09d531af35deed2e310, the 
> last commit of the Calcite upgrade, with the failed query reported in 
> functional tests, but now on a parquet file:
>  
> {quote}SELECT L.L_QUANTITY, L.L_DISCOUNT, L.L_EXTENDEDPRICE, L.L_TAX
>  
> FROM cp.`tpch/lineitem.parquet` L, cp.`tpch/orders.parquet` O
> WHERE cast(L.L_ORDERKEY as int) = cast(O.O_ORDERKEY as int) AND 
> cast(L.L_LINENUMBER as int) = 7 AND cast(L.L_ORDERKEY as int) = 10208 AND 
> cast(O.O_ORDERKEY as int) = 10208;
>  {quote}
> However, with Drill built on commit ef0fafea214e866556fa39c902685d48a56001e1, 
> the commit right before the Calcite upgrade commits, the same query worked.
> This was caused by the latest Calcite simplifying the predicates; during this 
> process, "cast(L.L_ORDERKEY as int) = cast(O.O_ORDERKEY as int)" was 
> considered redundant and was removed, so the logical plan of this query 
> gets an always-true condition for the Join:
> {quote}DrillJoinRel(condition=[true], joinType=[inner])
> {quote}
> While in the previous version we had 
> {quote}DrillJoinRel(condition=[=($5, $0)], joinType=[inner])
> {quote}
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6159) No need to offset rows if order by is not specified in the query.

2018-02-14 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16365062#comment-16365062
 ] 

Julian Hyde commented on DRILL-6159:


You're technically correct, but what's the point of doing the optimization? 
Consider why the user wrote that query: because they wanted to do something 
like pagination. With your change, the syntax will be useless for pagination, 
and they will have to write something more complicated.

Some data sets have a natural order, and if people can rely on that order, why 
not let them?

> No need to offset rows if order by is not specified in the query.
> -
>
> Key: DRILL-6159
> URL: https://issues.apache.org/jira/browse/DRILL-6159
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.12.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Major
> Fix For: Future
>
>
> For queries which have an offset and limit but no order by, there is no need 
> to add the offset to the limit during pushdown of the limit.
> SQL doesn't guarantee order in the output if no order by is specified in the 
> query. It is observed that for queries with offset and limit and no order 
> by, the current optimizer adds the offset to the limit and limits that many 
> rows. Doing so prevents the query from exiting early.
> Here is an example for a query.
> {code}
> select zz1,zz2,a11 from dfs.tmp.viewtmp limit 10 offset 1000
> 00-00Screen : rowType = RecordType(ANY zz1, ANY zz2, ANY a11): rowcount = 
> 1.01E7, cumulative cost = {1.06048844E8 rows, 5.54015404E8 cpu, 0.0 io, 
> 1.56569100288E11 network, 4.64926176E7 memory}, id = 787
> 00-01  Project(zz1=[$0], zz2=[$1], a11=[$2]) : rowType = RecordType(ANY 
> zz1, ANY zz2, ANY a11): rowcount = 1.01E7, cumulative cost = {1.05038844E8 
> rows, 5.53005404E8 cpu, 0.0 io, 1.56569100288E11 network, 4.64926176E7 
> memory}, id = 786
> 00-02SelectionVectorRemover : rowType = RecordType(ANY zz1, ANY zz2, 
> ANY a11): rowcount = 1.01E7, cumulative cost = {1.05038844E8 rows, 
> 5.53005404E8 cpu, 0.0 io, 1.56569100288E11 network, 4.64926176E7 memory}, id 
> = 785
> 00-03  Limit(offset=[1000], fetch=[10]) : rowType = 
> RecordType(ANY zz1, ANY zz2, ANY a11): rowcount = 1.01E7, cumulative cost = 
> {9.4938844E7 rows, 5.42905404E8 cpu, 0.0 io, 1.56569100288E11 network, 
> 4.64926176E7 memory}, id = 784
> 00-04UnionExchange : rowType = RecordType(ANY zz1, ANY zz2, ANY 
> a11): rowcount = 1.01E7, cumulative cost = {8.4838844E7 rows, 5.02505404E8 
> cpu, 0.0 io, 1.56569100288E11 network, 4.64926176E7 memory}, id = 783
> 01-01  SelectionVectorRemover : rowType = RecordType(ANY zz1, ANY 
> zz2, ANY a11): rowcount = 1.01E7, cumulative cost = {7.4738844E7 rows, 
> 4.21705404E8 cpu, 0.0 io, 3.2460300288E10 network, 4.64926176E7 memory}, id = 
> 782
> 01-02Limit(fetch=[1010]) : rowType = RecordType(ANY zz1, 
> ANY zz2, ANY a11): rowcount = 1.01E7, cumulative cost = {6.4638844E7 rows, 
> 4.11605404E8 cpu, 0.0 io, 3.2460300288E10 network, 4.64926176E7 memory}, id = 
> 781
> 01-03  Project(zz1=[$0], zz2=[$2], a11=[$1]) : rowType = 
> RecordType(ANY zz1, ANY zz2, ANY a11): rowcount = 2.3306983E7, cumulative 
> cost = {5.4538844E7 rows, 3.71205404E8 cpu, 0.0 io, 3.2460300288E10 network, 
> 4.64926176E7 memory}, id = 780
> 01-04HashJoin(condition=[=($0, $2)], joinType=[left]) : 
> rowType = RecordType(ANY ZZ1, ANY A, ANY ZZ2): rowcount = 2.3306983E7, 
> cumulative cost = {5.4538844E7 rows, 3.71205404E8 cpu, 0.0 io, 
> 3.2460300288E10 network, 4.64926176E7 memory}, id = 779
> 01-06  Scan(groupscan=[EasyGroupScan 
> [selectionRoot=maprfs:/tmp/csvd1, numFiles=3, columns=[`ZZ1`, `A`], 
> files=[maprfs:/tmp/csvd1/Daamulti11random2.csv, 
> maprfs:/tmp/csvd1/Daamulti11random21.csv, 
> maprfs:/tmp/csvd1/Daamulti11random211.csv]]]) : rowType = RecordType(ANY 
> ZZ1, ANY A): rowcount = 2.3306983E7, cumulative cost = {2.3306983E7 rows, 
> 4.6613966E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 776
> 01-05  BroadcastExchange : rowType = RecordType(ANY ZZ2): 
> rowcount = 2641626.0, cumulative cost = {5283252.0 rows, 2.3774634E7 cpu, 0.0 
> io, 3.2460300288E10 network, 0.0 memory}, id = 778
> 02-01Scan(groupscan=[EasyGroupScan 
> [selectionRoot=maprfs:/tmp/csvd2, numFiles=1, columns=[`ZZ2`], 
> files=[maprfs:/tmp/csvd2/D222random2.csv]]]) : rowType = RecordType(ANY ZZ2): 
> rowcount = 2641626.0, cumulative cost = {2641626.0 rows, 2641626.0 cpu, 0.0 
> io, 0.0 network, 0.0 memory}, id = 777
> {code}
> The limit pushed down is  Limit(fetch=[1010]) instead it sh

[jira] [Commented] (DRILL-6135) New Feature: SHOW CREATE TABLE / VIEW command

2018-02-06 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354347#comment-16354347
 ] 

Julian Hyde commented on DRILL-6135:


After some research, I agree that {{SHOW CREATE}} seems to be the standard. As 
you say, MySQL and Presto (and its derivative, Athena) support it. Also Hive 
supports it; see HIVE-967. Oracle, PostgreSQL, DB2, SQL Server do not have an 
equivalent (other than using stored procedures).

There is a minor concern for how we would generate DDL for sub-objects such as 
columns and foreign keys, which do not have their own CREATE statement but 
nevertheless could have their own DDL fragment.

> New Feature: SHOW CREATE TABLE / VIEW command
> -
>
> Key: DRILL-6135
> URL: https://issues.apache.org/jira/browse/DRILL-6135
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Metadata, Storage - Information Schema
>Affects Versions: 1.10.0
> Environment: MapR 5.2 + Kerberos
>Reporter: Hari Sekhon
>Priority: Major
>
> Feature Request to implement
> {code:java}
> SHOW CREATE VIEW ;{code}
> A colleague and I just had to cat the view file, which is non-pretty JSON and 
> makes a large view creation statement hard to read, when it could have been 
> presented in the Drill shell, formatted.





[jira] [Comment Edited] (DRILL-6135) New Feature: SHOW CREATE VIEW command

2018-02-05 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16352772#comment-16352772
 ] 

Julian Hyde edited comment on DRILL-6135 at 2/5/18 6:56 PM:


How about {{SHOW DDL FOR  [ . ]}}? I can imagine this 
being useful for other object types besides views.


was (Author: julianhyde):
How about {{SHOW DDL FOR  [ . ]}}? I can imagine this 
being useful for other object types besides views?

> New Feature: SHOW CREATE VIEW command
> -
>
> Key: DRILL-6135
> URL: https://issues.apache.org/jira/browse/DRILL-6135
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Metadata, Storage - Information Schema
>Affects Versions: 1.10.0
> Environment: MapR 5.2 + Kerberos
>Reporter: Hari Sekhon
>Priority: Major
>
> Feature Request to implement
> {code:java}
> SHOW CREATE VIEW ;{code}
> A colleague and I just had to cat the view file, which is non-pretty JSON and 
> makes a large view creation statement hard to read, when it could have been 
> presented in the Drill shell, formatted.





[jira] [Commented] (DRILL-6135) New Feature: SHOW CREATE VIEW command

2018-02-05 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16352772#comment-16352772
 ] 

Julian Hyde commented on DRILL-6135:


How about {{SHOW DDL FOR  [ . ]}}? I can imagine this 
being useful for other object types besides views?

> New Feature: SHOW CREATE VIEW command
> -
>
> Key: DRILL-6135
> URL: https://issues.apache.org/jira/browse/DRILL-6135
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Metadata, Storage - Information Schema
>Affects Versions: 1.10.0
> Environment: MapR 5.2 + Kerberos
>Reporter: Hari Sekhon
>Priority: Major
>
> Feature Request to implement
> {code:java}
> SHOW CREATE VIEW ;{code}
> A colleague and I just had to cat the view file, which is non-pretty JSON and 
> makes a large view creation statement hard to read, when it could have been 
> presented in the Drill shell, formatted.





[jira] [Commented] (DRILL-5578) Drill fails on date functions in 'where clause' when queried on a JDBC Storage plugin

2017-12-11 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16287010#comment-16287010
 ] 

Julian Hyde commented on DRILL-5578:


These are the lines in Calcite JdbcImplementor.java:

{code}
switch (literal.getTypeName().getFamily()) {
case CHARACTER:
  return SqlLiteral.createCharString((String) literal.getValue2(), POS);
...
case ANY:
case NULL:
  switch (literal.getTypeName()) {
  case NULL:
return SqlLiteral.createNull(POS);
  // fall through
  }
default:
  throw new AssertionError(literal + ": " + literal.getTypeName());
}
{code}

So it appears that Calcite can't convert an interval literal from a RexNode to 
a SQL string. Possible explanations: (1) Calcite usually represents interval 
literals as numerics, but Drill is doing things differently; (2) we've not 
tested this before because we don't know of any standard SQL functions that 
accept intervals as arguments.

> Drill fails on date functions in 'where clause' when queried on a JDBC 
> Storage plugin
> -
>
> Key: DRILL-5578
> URL: https://issues.apache.org/jira/browse/DRILL-5578
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Rahul Raj
>
> Drill 1.9/1.10 fails on any date manipulation in the WHERE clause while 
> querying a table from the JDBC storage plugin. The issue happens on Postgres 
> and Oracle.
> The following two queries error out:
> select * from config_1.APP.EXECUTIONSTEP  where DATE_ADD(CAST(STARTED_AT as 
> DATE),interval '1' second) < CAST(CURRENT_DATE as DATE)
> select `id` from (select * from config_1.public.project_release) where 
> CAST(DATE_ADD(`start_date`,interval '19800' second(5)) AS DATE) = DATE 
> '2011-11-11'
> However, date manipulation succeeds when date functions are applied to the 
> selected fields
>  
> select DATE_ADD(CAST(STARTED_AT as DATE),interval '1' second) < 
> CAST(CURRENT_DATE as DATE) from config_1.APP.EXECUTIONSTEP 
> select `id` from (select * from config_1.public.project_release) where 
> CAST(`start_date` AS DATE) = DATE '2011-11-11'
> Error:
> [Error Id: 048fe4b9-ecab-40fe-aca9-d57eb2df9b0c on vpc12.o3c.in:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> AssertionError: 1000: INTERVAL_DAY_TIME
> [Error Id: 048fe4b9-ecab-40fe-aca9-d57eb2df9b0c on vpc12.o3c.in:31010]
>   at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
>  ~[drill-common-1.9.0.jar:1.9.0]
>   at 
> org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:825)
>  [drill-java-exec-1.9.0.jar:1.9.0]
>   at 
> org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:935) 
> [drill-java-exec-1.9.0.jar:1.9.0]
>   at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:281) 
> [drill-java-exec-1.9.0.jar:1.9.0]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_60]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_60]
>   at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
> Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected 
> exception during fragment initialization: 1000: INTERVAL_DAY_TIME
>   ... 4 common frames omitted
> Caused by: java.lang.AssertionError: 1000: INTERVAL_DAY_TIME
>   at 
> org.apache.calcite.adapter.jdbc.JdbcImplementor$Context.toSql(JdbcImplementor.java:179)
>  ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
>   at 
> org.apache.calcite.adapter.jdbc.JdbcImplementor$Context.toSql(JdbcImplementor.java:268)
>  ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
>   at 
> org.apache.calcite.adapter.jdbc.JdbcImplementor$Context.toSql(JdbcImplementor.java:212)
>  ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
>   at 
> org.apache.calcite.adapter.jdbc.JdbcImplementor$Context.toSql(JdbcImplementor.java:268)
>  ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
>   at 
> org.apache.calcite.adapter.jdbc.JdbcImplementor$Context.toSql(JdbcImplementor.java:212)
>  ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
>   at 
> org.apache.calcite.adapter.jdbc.JdbcImplementor$Context.toSql(JdbcImplementor.java:268)
>  ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
>   at 
> org.apache.calcite.adapter.jdbc.JdbcImplementor$Context.toSql(JdbcImplementor.java:212)
>  ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
>   at 
> org.apache.calcite.adapter.jdbc.JdbcRules$JdbcFilter.implement(JdbcRules.java:658)
>  ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
>   at 
> org.apache.calcite.adapter.jdbc.JdbcImplementor.visit

[jira] [Commented] (DRILL-5800) Explicitly set locale to en_US on locale-dependent tests

2017-09-18 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16169759#comment-16169759
 ] 

Julian Hyde commented on DRILL-5800:


The position we took in Calcite was that code shouldn't depend on the JVM's 
locale. As a server, we want to operate in the client's locale.

If Drill has the same policy, your change is masking a bug. Besides, your 
change will affect subsequent tests run in the same VM, which could cause 
indeterminacy in the test suite.

> Explicitly set locale to en_US on locale-dependent tests
> 
>
> Key: DRILL-5800
> URL: https://issues.apache.org/jira/browse/DRILL-5800
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Reporter: Uwe L. Korn
>
> Some tests depend on the locale, i.e. they run with {{en_US}} successfully 
> but fail with {{de_DE}} due to a different decimal separator.
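For instance, the decimal-separator difference can be reproduced with plain JDK formatting (a minimal sketch, not one of the actual failing Drill tests):

```java
import java.util.Locale;

public class LocaleFormatDemo {
  public static void main(String[] args) {
    double x = 1234.5;
    // en_US: '.' as decimal separator, ',' for grouping
    System.out.println(String.format(Locale.US, "%,.1f", x));      // 1,234.5
    // de_DE: ',' as decimal separator, '.' for grouping
    System.out.println(String.format(Locale.GERMANY, "%,.1f", x)); // 1.234,5
    // Any test that string-compares formatted numbers passes under one
    // default locale and fails under the other
  }
}
```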





[jira] [Commented] (DRILL-3993) Rebase Drill on Calcite master branch

2017-08-31 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149431#comment-16149431
 ] 

Julian Hyde commented on DRILL-3993:


Excellent work, Roman.

(Can you please make your Jira handle more unique? I can't find you in the 
search box, because there are lots of other Romans.)

> Rebase Drill on Calcite master branch
> -
>
> Key: DRILL-3993
> URL: https://issues.apache.org/jira/browse/DRILL-3993
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.2.0
>Reporter: Sudheesh Katkam
>Assignee: Roman
>
> Calcite keeps moving, and now we need to catch up to Calcite 1.5, and ensure 
> there are no regressions.
> Also, how do we resolve this 'catching up' issue in the long term?





[jira] [Comment Edited] (DRILL-5703) Add Syntax Highlighting & Autocompletion to Query Form

2017-08-03 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16113730#comment-16113730
 ] 

Julian Hyde edited comment on DRILL-5703 at 8/4/17 12:16 AM:
-

Actually, I'm not sure that you need to rebase. I think you need to create a 
branch that just contains the fix to this one issue, and base your pull request 
on that, not your master branch. If there are multiple commits that are all 
required to fix the issue, squash them before you create the pull request. 
Interactive rebase, e.g. {{git rebase -i origin/master}} is a good way to 
squash commits (and other things like re-ordering commits).


was (Author: julianhyde):
Actually, I'm not sure that you need to rebase. I think you need to create a 
branch that just contains the fix to this one issue, and base your pull request 
on that, not your master branch. If there are multiple commits that are all 
required to fix the issue, squash them before you create the pull request. 
Interactive rebase, e.g. `git rebase -i origin/master` is a good way to squash 
commits (and other things like re-ordering commits).

> Add Syntax Highlighting & Autocompletion to Query Form
> --
>
> Key: DRILL-5703
> URL: https://issues.apache.org/jira/browse/DRILL-5703
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.11.0
>Reporter: Charles Givre
>
> The UI could really benefit from having syntax highlighting and 
> autocompletion in the query window as well as the form to update storage 
> plugins.  This PR adds that capability to the query form using the Ace code 
> editor (https://ace.c9.io). 





[jira] [Commented] (DRILL-5703) Add Syntax Highlighting & Autocompletion to Query Form

2017-08-03 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16113730#comment-16113730
 ] 

Julian Hyde commented on DRILL-5703:


Actually, I'm not sure that you need to rebase. I think you need to create a 
branch that just contains the fix to this one issue, and base your pull request 
on that, not your master branch. If there are multiple commits that are all 
required to fix the issue, squash them before you create the pull request. 
Interactive rebase, e.g. `git rebase -i origin/master` is a good way to squash 
commits (and other things like re-ordering commits).

> Add Syntax Highlighting & Autocompletion to Query Form
> --
>
> Key: DRILL-5703
> URL: https://issues.apache.org/jira/browse/DRILL-5703
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.11.0
>Reporter: Charles Givre
>
> The UI could really benefit from having syntax highlighting and 
> autocompletion in the query window as well as the form to update storage 
> plugins.  This PR adds that capability to the query form using the Ace code 
> editor (https://ace.c9.io). 





[jira] [Commented] (DRILL-5703) Add Syntax Highlighting & Autocompletion to Query Form

2017-08-03 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16113614#comment-16113614
 ] 

Julian Hyde commented on DRILL-5703:


That's a lot of commits! Looks like you need to rebase your pull request.

I don't know what kind of autocompletion you are supplying, but Calcite has a 
little-known class called SqlAdvisor that enables auto-completion of 
identifiers. For example, if the cursor is in the middle of the SELECT clause, 
it knows what table aliases and columns are available because it understands 
what is in the FROM clause. It even works if the query is malformed.

> Add Syntax Highlighting & Autocompletion to Query Form
> --
>
> Key: DRILL-5703
> URL: https://issues.apache.org/jira/browse/DRILL-5703
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.11.0
>Reporter: Charles Givre
>
> The UI could really benefit from having syntax highlighting and 
> autocompletion in the query window as well as the form to update storage 
> plugins.  This PR adds that capability to the query form using the Ace code 
> editor (https://ace.c9.io). 





[jira] [Commented] (DRILL-5640) NOT causes Error: SYSTEM ERROR: IllegalArgumentException: Invalid value for boolean: AA

2017-07-01 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16071303#comment-16071303
 ] 

Julian Hyde commented on DRILL-5640:


IIRC we fixed a bug in NOT precedence in Calcite a while ago. Might be the 
same problem. 

> NOT  causes Error: SYSTEM ERROR: IllegalArgumentException: Invalid 
> value for boolean: AA
> ---
>
> Key: DRILL-5640
> URL: https://issues.apache.org/jira/browse/DRILL-5640
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.10.0
> Environment: Drill 1.10
>Reporter: N Campbell
>
> Following statement will fail, will not fail when NOT removed
> select TJOIN2.RNUM, TJOIN1.C1, TJOIN1.C2, TJOIN2.C2 as C2J2 from 
> ( 
>  values 
> ( 0, 10, 15),
> ( 1, 20, 25),
> ( 2, cast(NULL as integer), 50)
>  ) TJOIN1 (RNUM, C1, C2)
> inner join 
> (
> values ( 0, 10, 'BB'),
> ( 1, 15, 'DD'),
> ( 2, cast(NULL as integer), 'EE'),
> ( 3, 10, 'FF')
> ) TJOIN2 (RNUM, C1, C2)
> on ( TJOIN1.C1 = TJOIN2.C1 and not TJOIN2.C2 = 'AA' )
> Error: SYSTEM ERROR: IllegalArgumentException: Invalid value for boolean: AA





[jira] [Commented] (DRILL-5527) Support for querying slowly changing dimensions of HBase/MapR-DB tables on TIMESTAMP/TIMERANGE/VERSION

2017-05-19 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16017558#comment-16017558
 ] 

Julian Hyde commented on DRILL-5527:


I think it would be best if we did this in the relational way. In the 
relational model a column has only one value, and the update timestamp is a 
column that belongs to the row. 

So let's suppose that each time a column's value is changed, Drill makes an 
entire new row appear. The row has the same key as the previous row, but a new 
timestamp or sequence number. Or perhaps the new row has a surrogate key that 
is unique across the whole table.

This is how people manage SCDs in traditional Kimball data warehousing. I think 
it is the simplest way to expose it for Drill users. 

> Support for querying slowly changing dimensions of HBase/MapR-DB tables on 
> TIMESTAMP/TIMERANGE/VERSION
> --
>
> Key: DRILL-5527
> URL: https://issues.apache.org/jira/browse/DRILL-5527
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - HBase
>Affects Versions: 1.10.0
>Reporter: Alan Fischer e Silva
>
> HBase and MapR-DB support versioning of cell values via timestamp, but today 
> a Drill query only returns the most recent version of a cell.
> Being able to query an HBase/MapR-DB cell on its version, timestamp or 
> timerange would be a major improvement to the HBase storage plugin in order 
> to support slowly changing dimensions.





[jira] [Commented] (DRILL-4039) Query fails when non-ascii characters are used in string literals

2017-05-05 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15998565#comment-15998565
 ] 

Julian Hyde commented on DRILL-4039:


Why no test case?

> Query fails when non-ascii characters are used in string literals
> -
>
> Key: DRILL-4039
> URL: https://issues.apache.org/jira/browse/DRILL-4039
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.1.0
> Environment: Linux lnxx64r6 2.6.32-131.0.15.el6.x86_64 #1 SMP Tue May 
> 10 15:42:40 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Sergio Lob
>  Labels: ready-to-commit
> Fix For: 1.11.0
>
> Attachments: DRILL-4039.patch
>
>
> The following query against DRILL returns this error:
> SYSTEM ERROR: CalciteException: Failed to encode  'НАСТРОЕние' in character 
> set 'ISO-8859-1'
>  cc39118a-cde6-4a6e-a1d6-4b6b7e847b8a on maprd
> Query is:
>     SELECT
>    T1.`F01INT`,
>    T1.`F02UCHAR_10`,
>    T1.`F03UVARCHAR_10`
>     FROM
>    DPRV64R6_TRDUNI01T T1
>     WHERE
>    (T1.`F03UVARCHAR_10` =  'НАСТРОЕние')
>     ORDER BY
>    T1.`F01INT`;
> This issue looks similar to jira HIVE-12207.
> Is there a fix or workaround for this?





[jira] [Commented] (DRILL-5089) Skip initializing all enabled storage plugins for every query

2017-04-13 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968517#comment-15968517
 ] 

Julian Hyde commented on DRILL-5089:


How are you able to call {{SimpleCalciteSchema.from}}? Isn't 
{{SimpleCalciteSchema}} package-private?

In CALCITE-1748 you ask to override a method that returns {{CalciteSchema}}, but 
in CALCITE-911 we agreed that Drill wouldn't create its own CalciteSchema 
sub-classes. What has changed?

Can you take a look at CALCITE-1742, and tell me how it relates to your 
problem? I'd rather fix Calcite's schema cache for Drill's and Phoenix's needs 
than let people drill holes in our APIs.

> Skip initializing all enabled storage plugins for every query
> -
>
> Key: DRILL-5089
> URL: https://issues.apache.org/jira/browse/DRILL-5089
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Abhishek Girish
>Assignee: Chunhui Shi
>Priority: Critical
>
> In a query's lifecycle, an attempt is made to initialize each enabled storage 
> plugin while building the schema tree. This is done regardless of the actual 
> plugins involved in the query.
> Sometimes, when one or more of the enabled storage plugins have issues - 
> either due to misconfiguration or the underlying datasource being slow or 
> down - the overall query time increases drastically, most likely due to the 
> attempt to register schemas from the faulty plugin.
> For example, when a jdbc plugin is configured with SQL Server, and at one 
> point the underlying SQL Server db goes down, any Drill query starting to 
> execute at that point and beyond begin to slow down drastically. 
> We must skip registering unrelated schemas (& workspaces) for a query. 





[jira] [Commented] (DRILL-5360) Timestamp type documented as UTC, implemented as local time

2017-03-16 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15929189#comment-15929189
 ] 

Julian Hyde commented on DRILL-5360:


Please read the SQL standard. TIMESTAMP has no time zone. The time zone is 
neither UTC nor the local time zone. 

The JDBC spec is different. A java.sql.Timestamp is an instant (a point in 
time). It has no time zone but the Timestamp.toString will print it with the 
JVM's time zone.
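The toString behavior is easy to see in a small sketch (the epoch instant is chosen arbitrarily):

```java
import java.sql.Timestamp;
import java.util.TimeZone;

public class TimestampZoneDemo {
  public static void main(String[] args) {
    // One instant: 0 ms since the epoch, i.e. 1970-01-01 00:00:00 UTC
    Timestamp ts = new Timestamp(0L);

    // toString renders the same instant in whatever the JVM default zone is
    TimeZone.setDefault(TimeZone.getTimeZone("UTC"));
    System.out.println(ts); // 1970-01-01 00:00:00.0

    TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"));
    System.out.println(ts); // 1969-12-31 16:00:00.0
  }
}
```

The stored millisecond value never changes; only the rendering depends on the JVM's default time zone.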

> Timestamp type documented as UTC, implemented as local time
> ---
>
> Key: DRILL-5360
> URL: https://issues.apache.org/jira/browse/DRILL-5360
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>
> The Drill documentation implies that the {{Timestamp}} type is in UTC:
> bq. JDBC timestamp in year, month, date hour, minute, second, and optional 
> milliseconds format: -MM-dd HH:mm:ss.SSS. ... TIMESTAMP literals: Drill 
> stores values in Coordinated Universal Time (UTC). Drill supports time 
> functions in the range 1971 to 2037. ... Drill does not support TIMESTAMP 
> with time zone.
> The above is ambiguous. The first part talks about JDBC timestamps. From the 
> JDK Javadoc:
> bq. Timestamp: A thin wrapper around java.util.Date. ... Date class is 
> intended to reflect coordinated universal time (UTC)...
> So, a JDBC timestamp is intended to represent time in UTC. (The "intended to 
> reflect" statement leaves open the possibility of misusing {{Date}} to 
> represent times in other time zones. This was common practice in early Java 
> development and was the reason for the eventual development of the Joda, then 
> Java 8 date/time classes.)
> The Drill documentation implies that timestamp *literals* are in UTC, but a 
> careful read of the documentation does allow an interpretation that the 
> internal representation can be other than UTC. If this is true, then we would 
> also rely on a liberal reading of the Java `Timestamp` class to also not be 
> UTC. (Or, we rely on the Drill JDBC driver to convert from the (unknown) 
> server time zone to a UTC value returned by the Drill JDBC client.)
> Still, a superficial reading (and common practice) would suggest that a Drill 
> Timestamp should be in UTC.
> However, a test on a Mac, with an embedded Drillbit (run in the Pacific time 
> zone, with Daylight Savings Time in effect), shows that the Timestamp binary 
> value is actually local time:
> {code}
>   long before = System.currentTimeMillis();
>   long value = getDateValue(client, "SELECT NOW() FROM (VALUES(1))" );
>   double hrsDiff = (value - before) / (1000.00 * 60 * 60);
>   System.out.println("Hours: " + hrsDiff);
> {code}
> The above gets the actual UTC time from Java. Then, it runs a query that gets 
> Drill's idea of the current time using the {{NOW()}} function. (The 
> {{getDateValue}} function uses the new test framework to access the actual 
> {{long}} value from the returned value vector.) Finally, we compute the 
> difference between the two times, converted to hours. Output:
> {code}
> Hours: -6.975
> {code}
> As it turns out, this is the difference between UTC and PDT. So, the time is 
> in local time, not UTC.
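The offset arithmetic reported by the test can be reproduced outside Drill. The sketch below is plain Java, not Drill code; it encodes an instant as "local wall-clock millis" by adding the zone offset, which yields the same roughly -7 hour gap the test observed (the extra ~90 seconds in -6.975 is query latency):

```java
import java.util.TimeZone;

public class LocalVsUtc {
    // Encode a UTC instant as "local wall-clock time expressed in epoch millis"
    // by adding the zone offset -- mimicking a server that reports local time
    // through an epoch-millis field.
    static long encodeLocal(long utcMillis, TimeZone zone) {
        return utcMillis + zone.getOffset(utcMillis);
    }

    public static void main(String[] args) {
        long utc = 1490000000000L;  // 2017-03-20, Daylight Savings in effect on the US west coast
        TimeZone pdt = TimeZone.getTimeZone("America/Los_Angeles");
        long local = encodeLocal(utc, pdt);
        double hrsDiff = (local - utc) / (1000.0 * 60 * 60);
        System.out.println("Hours: " + hrsDiff);  // -7.0
    }
}
```

If the value vector holds `encodeLocal`-style millis, a client can only recover the true instant by subtracting the server's (otherwise unknown) offset.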
> Since the documentation and implementation are both ambiguous, it is hard to 
> know the intent of the Drill Timestamp. Clearly, common practice is to use 
> UTC. But, there is wiggle-room.
> If the Timestamp value is supposed to be local time, then Drill should 
> provide a function to return the server's time zone offset (in ms) from UTC 
> so that the client can do the needed local-to-UTC conversion to get a true 
> timestamp.
> On the other hand, if the Timestamp is supposed to be UTC (per common 
> practice), then {{NOW()}} should not report local time; it should return UTC.
> Further, if {{NOW()}} returns local time but Timestamp literals are UTC, then 
> it is hard to see how any query that mixes a computed timestamp value with a 
> literal can be rationally written.
> So, job #1 is to define the Timestamp semantics. Then, use that to figure out 
> where the bug lies, so that the implementation can be made consistent with the 
> documentation (or vice versa).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5293) Poor performance of Hash Table due to same hash value as distribution below

2017-02-22 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879511#comment-15879511
 ] 

Julian Hyde commented on DRILL-5293:


Or, let each hash function have a "generation", basically a seed, the same way 
that you would if you were doing multiple hash partitioning runs. Each operator 
uses its operator ID as the generation. That way you are only hashing once (and 
therefore you don't have to worry so much about the computational expense of the 
hash function).
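A minimal sketch of the per-operator "generation" idea. The {{hash32}} below is a hypothetical stand-in, not Drill's generated hash function; the point is only that the same column value hashes differently under different operator IDs, so the exchange's partitioning and the hash table's bucketing become uncorrelated:

```java
public class GenerationHash {
    // Hypothetical seeded hash: mixes a per-operator "generation" into the value.
    // The xor-shift / multiply / xor-shift finisher is invertible, so for a fixed
    // value, different generations are guaranteed to give different hashes.
    static int hash32(int value, int generation) {
        int h = value * 0x9E3779B9 + generation;
        h ^= h >>> 16;
        h *= 0x85EBCA6B;
        h ^= h >>> 13;
        return h;
    }

    public static void main(String[] args) {
        int exchangeOpId = 1, hashAggOpId = 2;
        // Same value, different generations: different hash, different bucket.
        System.out.println(hash32(42, exchangeOpId) != hash32(42, hashAggOpId));  // true
    }
}
```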

> Poor performance of Hash Table due to same hash value as distribution below
> ---
>
> Key: DRILL-5293
> URL: https://issues.apache.org/jira/browse/DRILL-5293
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Codegen
>Affects Versions: 1.8.0
>Reporter: Boaz Ben-Zvi
>Assignee: Boaz Ben-Zvi
>
> The computation of the hash value is basically the same whether for the Hash 
> Table (used by Hash Agg, and Hash Join), or for distribution of rows at the 
> exchange. As a result, a specific Hash Table (in a parallel minor fragment) 
> gets only rows "filtered out" by the partition below ("upstream"), so the 
> pattern of this filtering leads to a non-uniform usage of the hash buckets in 
> the table.
>   Here is a simplified example: An exchange partitions into TWO (minor 
> fragments), each running a Hash Agg. So the partition sends rows of EVEN hash 
> values to the first, and rows of ODD hash values to the second. Now the first 
> recomputes the _same_ hash value for its Hash table -- and only the even 
> buckets get used !!  (Or with a partition into EIGHT -- possibly only one 
> eighth of the buckets would be used !! ) 
>This would lead to longer hash chains and thus a _poor performance_ !
> A possible solution -- add a distribution function distFunc (only for 
> partitioning) that takes the hash value and "scrambles" it so that the 
> entropy in all the bits affects the low bits of the output. This function 
> should be applied (in HashPrelUtil) over the generated code that produces the 
> hash value, like:
>distFunc( hash32(field1, hash32(field2, hash32(field3, 0))) );
> Tested with a huge hash aggregate (64 M rows) and a parallelism of 8 ( 
> planner.width.max_per_node = 8 ); minor fragments 0 and 4 used only 1/8 of 
> their buckets, the others used 1/4 of their buckets.  Maybe the reason for 
> this variance is that distribution is using "hash32AsDouble" and hash agg is 
> using "hash32".  
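A sketch of what such a distFunc finalizer could look like; the constants here are illustrative, not a vetted mixer. It folds high-bit entropy into the low bits, so a stream of all-even hash values (as delivered to one of two minor fragments) no longer lands only in even buckets:

```java
public class DistFuncDemo {
    // Illustrative scrambler: xor-shift / multiply / xor-shift. Each step is
    // invertible (the multiplier is odd), so distinct inputs stay distinct.
    static int distFunc(int h) {
        h ^= h >>> 16;
        h *= 0x7FEB352D;
        h ^= h >>> 15;
        return h;
    }

    public static void main(String[] args) {
        // Even hash values spread over both even and odd buckets after scrambling.
        for (int h = 2; h <= 16; h += 2) {
            System.out.println(h + " -> bucket " + (distFunc(h) & 7));
        }
    }
}
```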



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5290) Provide an option to build operator table once for built-in static functions and reuse it across queries.

2017-02-22 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879088#comment-15879088
 ] 

Julian Hyde commented on DRILL-5290:


This is a good idea. Just to clarify, the "static" functions don't need to be 
static fields in the java class. They could be non-static if you choose, as 
long as you re-use the same operator table instance. Calcite's operator table 
API gives you several ways to build and combine operator tables, and reflection 
of (static) fields is not the only way. 

> Provide an option to build operator table once for built-in static functions 
> and reuse it across queries.
> -
>
> Key: DRILL-5290
> URL: https://issues.apache.org/jira/browse/DRILL-5290
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
> Fix For: 1.10
>
>
> Currently, DrillOperatorTable which contains standard SQL operators and 
> functions and Drill User Defined Functions (UDFs) (built-in and dynamic) gets 
> built for each query as part of creating QueryContext. This is an expensive 
> operation ( ~30 msec to build) and allocates  ~2M on heap for each query. For 
> high throughput, concurrent low latency operational queries, we quickly run 
> out of heap memory, causing JVM hangs. Build operator table once during 
> startup for static built-in functions and save in DrillbitContext, so we can 
> reuse it across queries.
> Provide a system/session option to not use dynamic UDFs so we can use the 
> operator table saved in DrillbitContext and avoid building each time.
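A sketch of the proposed caching; class and method names here are hypothetical, not Drill's actual API. The static-function table is built once, lazily, and reused; per-query construction happens only when dynamic UDFs are enabled:

```java
public class OperatorTableCache {
    private static volatile Object staticTable;   // stand-in for the operator table

    // Stand-in for the expensive per-query construction (~30 ms, ~2 MB heap).
    static Object buildTable() {
        return new Object();
    }

    static Object getTable(boolean dynamicUdfsEnabled) {
        if (dynamicUdfsEnabled) {
            return buildTable();                  // dynamic UDFs: rebuild per query
        }
        Object t = staticTable;
        if (t == null) {
            synchronized (OperatorTableCache.class) {
                if ((t = staticTable) == null) {
                    staticTable = t = buildTable();  // built once, shared thereafter
                }
            }
        }
        return t;
    }

    public static void main(String[] args) {
        System.out.println(getTable(false) == getTable(false));  // true: cached
        System.out.println(getTable(true) == getTable(true));    // false: rebuilt
    }
}
```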



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5283) Support "is not present" as subtype of "is null" for JSON data

2017-02-21 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15876356#comment-15876356
 ] 

Julian Hyde commented on DRILL-5283:


Consider calling this "is empty". MDX and HBase/Phoenix have concepts of empty 
cells, and we should share concepts wherever possible. Otherwise you no longer 
have a data model; you are bubbling up details of the data format.

> Support "is not present" as subtype of "is null" for JSON data
> --
>
> Key: DRILL-5283
> URL: https://issues.apache.org/jira/browse/DRILL-5283
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>
> JSON files consist of a series of "objects", each of which has name/value 
> pairs. Values can be in one of three states:
> * Not present (the value does not appear)
> * Null (the name appears and the value is null)
> * Non-null (the field is one of the JSON data types)
> Drill, however, has only a single null state and so Drill collapses "not 
> present" and "null" into the same state.
> The not-present and present-but-null states work identically for calculations 
> inside Drill. But, when doing a CTAS from JSON to JSON, the collapsed state 
> means that the user does not get out of Drill what was put in: all null 
> values either appear as null values, or do not appear at all (depending on 
> Drill version).
> This ticket asks to repurpose the "bit" fields in nullable vectors. Rename 
> the vector to "nullState". Then, use these values:
> * 0: value is set
> * 1: value is null
> * 3: value is not present
> The column is null if the null state is non-zero. The column is not null if 
> the null state is 0.
> This change requires reversing the "polarity" of the bit field, and so is a 
> major change.
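The proposed encoding can be sketched in a few lines (using the literal values from the description; this is not Drill's vector code). The null test becomes "non-zero means null", and a JSON writer could use the third state to omit the field entirely:

```java
public class NullStateDemo {
    // The three proposed nullState values from this issue.
    static final int VALUE_SET   = 0;
    static final int VALUE_NULL  = 1;
    static final int NOT_PRESENT = 3;

    // Reversed polarity relative to today's "bit" vector: non-zero means null.
    static boolean isNull(int nullState) {
        return nullState != 0;
    }

    public static void main(String[] args) {
        // A CTAS JSON writer could write an explicit null for VALUE_NULL
        // and omit the field for NOT_PRESENT, round-tripping the input.
        System.out.println(isNull(VALUE_SET));    // false
        System.out.println(isNull(VALUE_NULL));   // true
        System.out.println(isNull(NOT_PRESENT));  // true
    }
}
```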



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5254) Enhance default reduction factors in optimizer

2017-02-13 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864178#comment-15864178
 ] 

Julian Hyde commented on DRILL-5254:


Thanks. I am [~julianhyde], by the way.

> Enhance default reduction factors in optimizer
> --
>
> Key: DRILL-5254
> URL: https://issues.apache.org/jira/browse/DRILL-5254
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.9.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.10
>
>
> Drill uses Calcite for query parsing and optimization. Drill uses Calcite's 
> default selectivity (reduction factor) rules to compute the number of rows 
> removed by a filter.
> The default rules appear to be overly aggressive in estimating reductions. In 
> a production use case, an input with 4 billion rows was estimated to return 
> just 40K rows from a filter. That is, the filter estimated a 1/100,000 
> reduction in rows. As it turns out, the actual reduction was closer to 1/2.
> The result was that the planner compared the expected 40K rows against 
> another input of 2.5 million rows, and decided the 40K rows would be best on 
> the build side of a hash join. When confronted with the actual 3 billion 
> rows, the hash join ran out of memory.
> The moral of the story is that, in Drill, it is worth being conservative when 
> planning for memory-intensive operations.
> The (sanitized) filter is the following, annotated with (a guess at) the 
> default reduction factors in each term:
> {code}
> col1_s20 in ('Value1','Value2','Value3','Value4',
>  'Value5','Value6','Value7','Value8','Value9') -- 25%
> AND col2_i <=3 -- 25%
> AND col3_s1 = 'Y' -- 15%
> AND col4_s1 = 'Y' -- 15%
> AND col5_s6 not like '%str1%' -- 25%
> AND col5_s6 not like '%str2%' -- 25%
> AND col5_s6 not like '%str3%' -- 25%
> AND col5_s6 not like '%str4%' -- 25%
> {code}
> Total reduction is something like:
> {code}
> .25 * .25 * .15 ^ 2 * .25 ^ 4 = 0.0000055
> {code}
> Filter estimation is a known hard problem. In general, one needs statistics 
> and other data, and even then the estimates are just guesses.
> Still, it is possible to ensure that the defaults are at least unbiased. That 
> is, if we assume the probability of A LIKE B is 25%, then the 
> probability of A NOT LIKE B should be 75%, not also 25%.
> This JIRA suggests creating an experimental set of defaults based on the 
> "core" Calcite defaults, but with other reduction factors derived using the 
> laws of probability. In particular:
> || Operator || Revised || Explanation || Calcite Default ||
> | = | 0.15 | Default in Calcite | 0.15
> | <> | 0.85 | 1 - p(=) | 0.5
> | < | 0.425 | p(<>) / 2 | 0.5
> | > | 0.425 | p(<>) / 2 | 0.5
> | <= | 0.575 | p(<) + p(=) | 0.5
> | >= | 0.575 | p(>) + p(=) | 0.5
> | LIKE | 0.25 | Default in Calcite | 0.25
> | NOT LIKE | 0.75 | 1 - p(LIKE) | 0.25
> | NOT NULL | 0.90 | Default in Calcite | 0.90
> | IS NULL | 0.10 | 1 - p(NOT NULL) | 0.25
> | IS TRUE | 0.5 | 1 / 2 | 0.25
> | IS FALSE | 0.5 | 1 / 2 | 0.25
> | IS NOT TRUE | 0.55 | 1 - p(IS TRUE) - p(IS NULL) | 0.25
> | IS NOT FALSE | 0.55 | 1 - p(IS FALSE) - p(IS NULL) | 0.25
> | A OR B | Varies | min(p(A) + p(B) - p(A ^ B), 0.5) | 0.5
> | A AND B | Varies | p(A ^ B) = p(A) * p(B) | Same
> | IN (a) | 0.15 | p(=) | 0.25
> | x IN (a, b, c, ...) | Varies | p(x = a v x = b v x = c v ...) | 0.25
> | NOT A | Varies | 1 - p(A) | 0.25
> | BETWEEN a AND b | 0.33 | p(<= ^ >=) | 0.25
> | NOT BETWEEN a AND b | 0.72 | 1 - p(BETWEEN) | 0.25
> The Calcite defaults were identified by inspection and verified by tests. The 
> Calcite rules make sense if one considers conditional probability: that the 
> user applied a particular expression to the data with the expectation that 
> given that data set, the expression matches 25% of the rows.
> The probability of the IS NOT TRUE statement assumes the presence of nulls, 
> while IS TRUE does not. The rule for OR caps the reduction factor at 0.5 per 
> standard practice. The rule for BETWEEN arises because Calcite (or Drill?) 
> rewrites {{a BETWEEN b AND c}} as {{a >= b AND a <= c}}.
> With the revised rules, the example WHERE reduction becomes:
> {code}
> col1_s20 in ('Value1','Value2','Value3','Value4',
>  'Value5','Value6','Value7','Value8','Value9') -- 50%
> AND col2_i <=3 -- 57%
> AND col3_s1 = 'Y' -- 15%
> AND col4_s1 = 'Y' -- 15%
> AND col5_s6 not like '%str1%' -- 85%
> AND col5_s6 not like '%str2%' -- 85%
> AND col5_s6 not like '%str3%' -- 85%
> AND col5_s6 not like '%str4%' -- 85%
> .5 * .57 * .15^2 * .85^4 = 0.003
> {code}
> The new rules are not a panacea: they are still just guesses. However, they 
> are unbiased guesses based on the rules of probability which result in more 
> conservative reductions of filters. The res
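The combined reductions quoted above can be checked mechanically. A throwaway program, using the per-term factors from the description, computes both products:

```java
public class ReductionCheck {
    // Combined reduction under the Calcite-default factors for the eight terms:
    // IN-list 25%, <= 25%, two equalities at 15%, four NOT LIKEs at 25%.
    static double defaultReduction() {
        return 0.25 * 0.25 * Math.pow(0.15, 2) * Math.pow(0.25, 4);
    }

    // Combined reduction under the revised, probability-based factors:
    // IN-list 50%, <= 57%, two equalities at 15%, four NOT LIKEs at 85%.
    static double revisedReduction() {
        return 0.5 * 0.57 * Math.pow(0.15, 2) * Math.pow(0.85, 4);
    }

    public static void main(String[] args) {
        System.out.println(defaultReduction());  // ~5.5e-6
        System.out.println(revisedReduction());  // ~0.0033
    }
}
```

The defaults predict roughly one row in 180,000 surviving; the revised factors predict about one in 300, a far more conservative basis for choosing a hash-join build side.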

[jira] [Commented] (DRILL-5254) Enhance default reduction factors in optimizer

2017-02-12 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863011#comment-15863011
 ] 

Julian Hyde commented on DRILL-5254:


Please consider fixing this in Calcite. It's tiresome when the good stuff only 
goes one way.

> Enhance default reduction factors in optimizer
> --
>
> Key: DRILL-5254
> URL: https://issues.apache.org/jira/browse/DRILL-5254
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.9.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.10
>
>
> Drill uses Calcite for query parsing and optimization. Drill uses Calcite's 
> default selectivity (reduction factor) rules to compute the number of rows 
> removed by a filter.
> The default rules appear to be overly aggressive in estimating reductions. In 
> a production use case, an input with 4 billion rows was estimated to return 
> just 40K rows from a filter. That is, the filter estimated a 1/100,000 
> reduction in rows. As it turns out, the actual reduction was closer to 1/2.
> The result was that the planner compared the expected 40K rows against 
> another input of 2.5 million rows, and decided the 40K rows would be best on 
> the build side of a hash join. When confronted with the actual 3 billion 
> rows, the hash join ran out of memory.
> The moral of the story is that, in Drill, it is worth being conservative when 
> planning for memory-intensive operations.
> The (sanitized) filter is the following, annotated with (a guess at) the 
> default reduction factors in each term:
> {code}
> col1_s20 in ('Value1','Value2','Value3','Value4',
>  'Value5','Value6','Value7','Value8','Value9') -- 25%
> AND col2_i <=3 -- 25%
> AND col3_s1 = 'Y' -- 15%
> AND col4_s1 = 'Y' -- 15%
> AND col5_s6 not like '%str1%' -- 25%
> AND col5_s6 not like '%str2%' -- 25%
> AND col5_s6 not like '%str3%' -- 25%
> AND col5_s6 not like '%str4%' -- 25%
> {code}
> Total reduction is something like:
> {code}
> .25 * .25 * .15 ^ 2 * .25 ^ 4 = 0.0000055
> {code}
> Filter estimation is a known hard problem. In general, one needs statistics 
> and other data, and even then the estimates are just guesses.
> Still, it is possible to ensure that the defaults are at least unbiased. That 
> is, if we assume the probability of A LIKE B is 25%, then the 
> probability of A NOT LIKE B should be 75%, not also 25%.
> This JIRA suggests creating an experimental set of defaults based on the 
> "core" Calcite defaults, but with other reduction factors derived using the 
> laws of probability. In particular:
> || Operator || Revised || Explanation || Calcite Default ||
> | = | 0.15 | Default in Calcite | 0.15
> | <> | 0.85 | 1 - p(=) | 0.5
> | < | 0.425 | p(<>) / 2 | 0.5
> | > | 0.425 | p(<>) / 2 | 0.5
> | <= | 0.575 | p(<) + p(=) | 0.5
> | >= | 0.575 | p(>) + p(=) | 0.5
> | LIKE | 0.25 | Default in Calcite | 0.25
> | NOT LIKE | 0.75 | 1 - p(LIKE) | 0.25
> | NOT NULL | 0.90 | Default in Calcite | 0.90
> | IS NULL | 0.10 | 1 - p(NOT NULL) | 0.25
> | IS TRUE | 0.5 | 1 / 2 | 0.25
> | IS FALSE | 0.5 | 1 / 2 | 0.25
> | IS NOT TRUE | 0.55 | 1 - p(IS TRUE) - p(IS NULL) | 0.25
> | IS NOT FALSE | 0.55 | 1 - p(IS FALSE) - p(IS NULL) | 0.25
> | A OR B | Varies | min(p(A) + p(B) - p(A ^ B), 0.5) | 0.5
> | A AND B | Varies | p(A ^ B) = p(A) * p(B) | Same
> | IN (a) | 0.15 | p(=) | 0.25
> | x IN (a, b, c, ...) | Varies | p(x = a v x = b v x = c v ...) | 0.25
> | NOT A | Varies | 1 - p(A) | 0.25
> | BETWEEN a AND b | 0.33 | p(<= ^ >=) | 0.25
> | NOT BETWEEN a AND b | 0.72 | 1 - p(BETWEEN) | 0.25
> The Calcite defaults were identified by inspection and verified by tests. The 
> Calcite rules make sense if one considers conditional probability: that the 
> user applied a particular expression to the data with the expectation that 
> given that data set, the expression matches 25% of the rows.
> The probability of the IS NOT TRUE statement assumes the presence of nulls, 
> while IS TRUE does not. The rule for OR caps the reduction factor at 0.5 per 
> standard practice. The rule for BETWEEN arises because Calcite (or Drill?) 
> rewrites {{a BETWEEN b AND c}} as {{a >= b AND a <= c}}.
> With the revised rules, the example WHERE reduction becomes:
> {code}
> col1_s20 in ('Value1','Value2','Value3','Value4',
>  'Value5','Value6','Value7','Value8','Value9') -- 50%
> AND col2_i <=3 -- 57%
> AND col3_s1 = 'Y' -- 15%
> AND col4_s1 = 'Y' -- 15%
> AND col5_s6 not like '%str1%' -- 85%
> AND col5_s6 not like '%str2%' -- 85%
> AND col5_s6 not like '%str3%' -- 85%
> AND col5_s6 not like '%str4%' -- 85%
> .5 * .57 * .15^2 * .85^4 = 0.003
> {code}
> The new rules are not a panacea: they are still just guesses. However, they 
> are unbiased guesses based on the rules of probability which result 

[jira] [Commented] (DRILL-5202) Planner misses opportunity to link sort & filter without remover

2017-01-18 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828849#comment-15828849
 ] 

Julian Hyde commented on DRILL-5202:


Wouldn't the logical planner usually have pushed project and filter through 
sort?

> Planner misses opportunity to link sort & filter without remover
> 
>
> Key: DRILL-5202
> URL: https://issues.apache.org/jira/browse/DRILL-5202
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Paul Rogers
>Priority: Minor
>
> Consider the following query:
> {code}
> SELECT * FROM (SELECT * FROM `mock`.`mock.json` ORDER BY col1) d WHERE d.col1 
> = 'bogus'
> {code}
> The data source here is a mock: it simply generates a data set with 10 
> columns numbered col1 to col10. Then it generates 10,000 rows of data.
> The plan for this query misses an optimization opportunity. (See plan below.)
> The current plan is (abbreviated):
> * scan
> * ...
> * sort
> * selection vector remover
> * project
> * filter
> * selection vector remover
> * project
> * screen
> Careful inspection shows that this query is very simple. The following steps 
> would work just as well:
> * scan
> * ...
> * sort
> * filter
> * project
> * screen
> That is, the filter can handle an input with a selection vector. So, no SVR 
> is needed between the sort and the filter. Plus, this is a {{SELECT *}} 
> query, so all the extra projects don't really do anything useful and can 
> be removed where unneeded. The revised plan eliminates an unnecessary data 
> copy.
> Of course, the planner should have pushed the filter below the sort. But that 
> is DRILL-5200.
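The "filter reads through the sort's selection vector" idea can be sketched in a few lines of plain Java (not Drill's operator code): the filter follows the upstream indirection and emits a new selection vector, copying no row data, which is why no remover is needed between sort and filter:

```java
import java.util.Arrays;

public class SvFilterDemo {
    // Filter that consumes an upstream selection vector and emits a new one.
    // Rows are addressed through the SV indices; no row data is copied.
    static int[] filterThroughSv(String[] col1, int[] upstreamSv, String target) {
        int[] out = new int[upstreamSv.length];
        int n = 0;
        for (int idx : upstreamSv) {
            if (col1[idx].equals(target)) {
                out[n++] = idx;   // keep the index, not the row
            }
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        String[] col1 = {"c", "a", "bogus", "b"};
        int[] sortSv = {1, 3, 2, 0};  // sort output: row indices in ascending col1 order
        System.out.println(Arrays.toString(filterThroughSv(col1, sortSv, "bogus")));  // [2]
    }
}
```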
> {code} 
>  "graph" : [ {
> "pop" : "mock-scan",
> "@id" : 8,
> ...
>   }, {
> "pop" : "project",
> "@id" : 7,
> "exprs" : [ {
>   "ref" : "`T0¦¦*`",
>   "expr" : "`*`"
> }, {
>   "ref" : "`col1`",
>   "expr" : "`col1`"
> } ],
> "child" : 8,
> ...
>   }, {
> "pop" : "external-sort",
> "@id" : 6,
> "child" : 7,
> "orderings" : [ {
>   "order" : "ASC",
>   "expr" : "`col1`",
>   "nullDirection" : "UNSPECIFIED"
> } ],
> ...
>   }, {
> "pop" : "selection-vector-remover",
> "@id" : 5,
> "child" : 6,
> ...
>   }, {
> "pop" : "project",
> "@id" : 4,
> "exprs" : [ {
>   "ref" : "`T0¦¦*`",
>   "expr" : "`T0¦¦*`"
> } ],
> "child" : 5,
> ...
>   }, {
> "pop" : "filter",
> "@id" : 3,
> "child" : 4,
> ...
>   }, {
> "pop" : "selection-vector-remover",
> "@id" : 2,
> ...
>   }, {
> "pop" : "project",
> "@id" : 1,
> "exprs" : [ {
>   "ref" : "`*`",
>   "expr" : "`T0¦¦*`"
> } ],
> "child" : 2,
>...
>   }, {
> "pop" : "screen",
> "@id" : 0,
> ...
>   } ]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-5169) Reconsider use of AutoCloseable within Drill

2016-12-29 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15786120#comment-15786120
 ] 

Julian Hyde commented on DRILL-5169:


I think {{DrillCloseable}} should extend {{AutoCloseable}}. You can override 
the {{close}} method to have no declared exceptions.

People can then close a {{DrillCloseable}} from within try-with-resources if 
they like. And you can use utilities such as [Guava's 
Closer|https://google.github.io/guava/releases/19.0/api/docs/com/google/common/io/Closer.html]
 or [Calcite's 
Closer|https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/util/Closer.java].
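The suggested interface is legal Java because an overriding method may narrow the throws clause. A minimal sketch (names are the ones proposed in this issue, not an existing Drill API):

```java
public class CloseableDemo {
    // Extends AutoCloseable but narrows close() to declare no checked exceptions.
    interface DrillCloseable extends AutoCloseable {
        @Override
        void close();
    }

    static class Buf implements DrillCloseable {
        boolean closed;
        @Override
        public void close() { closed = true; }
    }

    public static void main(String[] args) {
        Buf b = new Buf();
        try (Buf t = b) {
            // usable in try-with-resources like any AutoCloseable
        }
        new Buf().close();   // also closable directly, no try/catch required
        System.out.println(b.closed);  // true
    }
}
```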

> Reconsider use of AutoCloseable within Drill
> 
>
> Key: DRILL-5169
> URL: https://issues.apache.org/jira/browse/DRILL-5169
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Priority: Minor
>
> Drill has many resources that must be closed: value vectors, threads, 
> operators and on and on. The {{close()}} method may sometimes throw an 
> exception or take a long time. Drill has developed, or borrowed from Guava, 
> many utilities to help manage the close operation.
> Java has two forms of "predefined" closeable interfaces: {{Closeable}} and 
> {{AutoCloseable}}. {{Closeable}} is for I/O resources and thus can throw an 
> {{IOException}}. {{AutoCloseable}} declares {{close()}} to throw {{Exception}}, 
> and is integrated into the language for use in try-with-resources blocks. Because 
> {{AutoCloseable}} is intended only for this use, any creation or return of an 
> {{AutoCloseable}} outside of a try-with-resources block produces compiler 
> warnings.
> Neither of the two Java interfaces fits Drill's needs. {{Closeable}} throws a 
> particular exception ({{IOException}}) which Drill seldom throws, but does 
> not throw exceptions that Drill does throw.
> Drill has settled on {{AutoCloseable}}, but few of Drill's resources are 
> limited in life to a single try-with-resources block. The result is either 
> hundreds of resource warnings (which developers learn to ignore), or hundreds 
> of insertions of {{@SuppressWarnings("resource")}} tags, which just clutter 
> the code.
> Note that there is nothing special about either of the Java-provided 
> interfaces. {{Closeable}} is simply a convention to allow easy closing of IO 
> resources such as streams and so on. {{AutoCloseable}} exists for the sole 
> purpose of implementing try-with-resource blocks.
> What we need is a Drill-specific interface that provides a common {{close()}} 
> method which throws only unchecked exceptions, but is not required to be used 
> in try-with-resources. Perhaps call this {{DrillCloseable}}.
> Next, reimplement the various close utilities. For example: 
> {{DrillCloseables.closeAll()}} would close a set of resources, suppressing 
> exceptions, and throwing a single combined exception if any operations fail.
> Then, convert all uses of {{AutoCloseable}} to {{DrillCloseable}}, or at 
> least all that are not used in try-with-resources block. Doing so will 
> eliminate many compiler warnings and/or suppress-warnings tags. Because Java 
> allows classes to implement multiple interfaces, it is even possible for a 
> class to implement both {{DrillCloseable}} and {{AutoCloseable}} in the rare 
> instance where both are needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4455) Depend on Apache Arrow for Vector and Memory

2016-12-12 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742666#comment-15742666
 ] 

Julian Hyde commented on DRILL-4455:


[~jnadeau] and [~sphillips], Can you respond to [~parthc]'s suggestion? I don't 
want this rapprochement to go cold.

> Depend on Apache Arrow for Vector and Memory
> 
>
> Key: DRILL-4455
> URL: https://issues.apache.org/jira/browse/DRILL-4455
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
> Fix For: 2.0.0
>
>
> The code for value vectors and memory has been split and contributed to the 
> apache arrow repository. In order to help this project advance, Drill should 
> depend on the arrow project instead of internal value vector code.
> This change will require recompiling any external code, such as UDFs and 
> StoragePlugins. The changes will mainly just involve renaming the classes to 
> the org.apache.arrow namespace.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-2543) Correlated subquery where outer table contains NULL values returns seemingly wrong result

2016-11-30 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15710628#comment-15710628
 ] 

Julian Hyde commented on DRILL-2543:


[~jni], did you ever log a Calcite issue for this? CALCITE-1513 is looking 
similar.

> Correlated subquery where outer table contains NULL values returns  seemingly 
> wrong result
> --
>
> Key: DRILL-2543
> URL: https://issues.apache.org/jira/browse/DRILL-2543
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 0.8.0
>Reporter: Victoria Markman
>Assignee: Jinfeng Ni
>Priority: Critical
> Fix For: Future
>
>
> {code}
> 0: jdbc:drill:schema=dfs> select * from t1;
> +---------+-------------+--------+
> |   a1    |     b1      |   c1   |
> +---------+-------------+--------+
> | 1       | 2015-03-01  | a      |
> | 2       | 2015-03-02  | b      |
> | null    | null        | null   |
> +---------+-------------+--------+
> 3 rows selected (0.064 seconds)
> 0: jdbc:drill:schema=dfs> select * from t2;
> +---------+-------------+--------+
> |   a2    |     b2      |   c2   |
> +---------+-------------+--------+
> | 5       | 2017-03-01  | a      |
> +---------+-------------+--------+
> 1 row selected (0.07 seconds)
> 0: jdbc:drill:schema=dfs> select t1.c1, count(*) from t1 where t1.b1 not in 
> (select b2 from t2 where t1.a1 = t2.a2) group by t1.c1 order by t1.c1;
> +-----+---------+
> | c1  | EXPR$1  |
> +-----+---------+
> | a   | 1       |
> | b   | 1       |
> +-----+---------+
> 2 rows selected (0.32 seconds)
> {code}
> Postgres returns row from the outer table where a1 is null.
> This is the part I don't understand, because the join condition in the 
> subquery should have eliminated the row where a1 IS NULL. To me the Drill 
> result looks correct, unless there is something different in correlated 
> comparison semantics that I'm not aware of.
> {code}
> postgres=# select * from t1;
>  a1 |     b1     | c1
> ----+------------+----
>   1 | 2015-03-01 | a
>   2 | 2015-03-02 | b
>     |            |
> (3 rows)
> {code}
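For reference, SQL's three-valued NOT IN does make the Postgres result defensible: for the outer row where a1 is NULL, the correlated subquery matches no t2 row, and x NOT IN (empty set) is TRUE regardless of x. The evaluation rules can be sketched as follows (plain Java, not Drill code; null models SQL UNKNOWN):

```java
import java.util.List;

public class NotInDemo {
    // Three-valued NOT IN: Boolean.TRUE / Boolean.FALSE, or null for UNKNOWN.
    static Boolean notIn(Object probe, List<Object> set) {
        if (set.isEmpty()) {
            return Boolean.TRUE;          // empty set: NOT IN is always TRUE
        }
        if (probe == null) {
            return null;                  // NULL probe vs non-empty set: UNKNOWN
        }
        boolean sawNull = false;
        for (Object v : set) {
            if (v == null) { sawNull = true; continue; }
            if (v.equals(probe)) { return Boolean.FALSE; }
        }
        return sawNull ? null : Boolean.TRUE;
    }

    public static void main(String[] args) {
        // Outer row with a1 = NULL: the correlated set is empty, so NOT IN is
        // TRUE and the row qualifies -- which is what Postgres returns.
        System.out.println(notIn(null, List.of()));   // true
    }
}
```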
> Explain plan for the query:
> {code}
> 00-01  Project(c1=[$0], EXPR$1=[$1])
> 00-02StreamAgg(group=[{0}], EXPR$1=[COUNT()])
> 00-03  Sort(sort0=[$0], dir0=[ASC])
> 00-04Project(c1=[$0])
> 00-05  SelectionVectorRemover
> 00-06Filter(condition=[NOT(IS TRUE($3))])
> 00-07  HashJoin(condition=[=($1, $2)], joinType=[left])
> 00-09Project($f1=[$0], $f3=[$2])
> 00-11  SelectionVectorRemover
> 00-13Filter(condition=[IS NOT NULL($1)])
> 00-15  Project(c1=[$1], b1=[$0], a1=[$2])
> 00-17Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:/test/t1]], selectionRoot=/test/t1, 
> numFiles=1, columns=[`c1`, `b1`, `a1`]]])
> 00-08Project($f02=[$1], $f2=[$2])
> 00-10  StreamAgg(group=[{0, 1}], agg#0=[MIN($2)])
> 00-12Sort(sort0=[$0], sort1=[$1], dir0=[ASC], 
> dir1=[ASC])
> 00-14  Project($f0=[$1], $f02=[$2], $f1=[true])
> 00-16HashJoin(condition=[=($2, $0)], 
> joinType=[inner])
> 00-18  StreamAgg(group=[{0}])
> 00-20Sort(sort0=[$0], dir0=[ASC])
> 00-22  Project($f0=[$1])
> 00-23SelectionVectorRemover
> 00-24  Filter(condition=[IS NOT NULL($0)])
> 00-25Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:/test/t1]], selectionRoot=/test/t1, 
> numFiles=1, columns=[`b1`, `a1`]]])
> 00-19  Project(a2=[$1], b2=[$0])
> 00-21Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:/test/t2]], selectionRoot=/test/t2, 
> numFiles=1, columns=[`a2`, `b2`]]])
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4455) Depend on Apache Arrow for Vector and Memory

2016-11-29 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706607#comment-15706607
 ] 

Julian Hyde commented on DRILL-4455:


[~jnadeau], [~sphillips], [~amansinha100] and [~parthc],

Can all parties please agree (and state publicly for the record) that moving 
value vector code out of Drill and into Arrow is in the best interests of the 
Drill project?

Most contributions can be managed by a process of submitting a patch, review, 
reject, revise, and repeat. But this is not one of those patches that can be 
casually kicked back to the contributor. It is huge, because it is an 
architectural change. I would like to see a commitment from both sides 
(contributor and reviewer) that we will find consensus and accept the patch.

> Depend on Apache Arrow for Vector and Memory
> 
>
> Key: DRILL-4455
> URL: https://issues.apache.org/jira/browse/DRILL-4455
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
> Fix For: 2.0.0
>
>
> The code for value vectors and memory has been split and contributed to the 
> apache arrow repository. In order to help this project advance, Drill should 
> depend on the arrow project instead of internal value vector code.
> This change will require recompiling any external code, such as UDFs and 
> StoragePlugins. The changes will mainly just involve renaming the classes to 
> the org.apache.arrow namespace.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4455) Depend on Apache Arrow for Vector and Memory

2016-11-21 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15685763#comment-15685763
 ] 

Julian Hyde commented on DRILL-4455:


[~sphillips], [~jnadeau], What do you think about memory management/accounting? 
Is there potentially a compromise, say an interface for accounting that 
could be called by Arrow?
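One possible shape for such a compromise, with hypothetical names (nothing here is an existing Arrow or Drill API): the allocator reports allocations through a callback the host engine supplies, so Drill keeps ownership of the accounting while Arrow owns the memory format:

```java
public class AccountingDemo {
    // Hypothetical hook an Arrow-style allocator could call on every
    // allocation and release, leaving the accounting policy to the engine.
    interface MemoryAccountant {
        void onAllocate(long bytes);
        void onRelease(long bytes);
    }

    static class CountingAccountant implements MemoryAccountant {
        long used;
        @Override
        public void onAllocate(long bytes) { used += bytes; }
        @Override
        public void onRelease(long bytes) { used -= bytes; }
    }

    public static void main(String[] args) {
        CountingAccountant acct = new CountingAccountant();
        acct.onAllocate(4096);
        acct.onRelease(1024);
        System.out.println(acct.used);  // 3072
    }
}
```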

> Depend on Apache Arrow for Vector and Memory
> 
>
> Key: DRILL-4455
> URL: https://issues.apache.org/jira/browse/DRILL-4455
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
> Fix For: 2.0.0
>
>
> The code for value vectors and memory has been split and contributed to the 
> apache arrow repository. In order to help this project advance, Drill should 
> depend on the arrow project instead of internal value vector code.
> This change will require recompiling any external code, such as UDFs and 
> StoragePlugins. The changes will mainly just involve renaming the classes to 
> the org.apache.arrow namespace.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4455) Depend on Apache Arrow for Vector and Memory

2016-11-21 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15685135#comment-15685135
 ] 

Julian Hyde commented on DRILL-4455:


Now that Arrow has had a release, should we re-visit this? I have no dog in 
this race (or maybe I have two, as a PMC member of both Drill and Arrow), but 
it seems to me that moving the memory format out of Drill is to Drill's benefit.

> Depend on Apache Arrow for Vector and Memory
> 
>
> Key: DRILL-4455
> URL: https://issues.apache.org/jira/browse/DRILL-4455
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
> Fix For: 2.0.0
>
>
> The code for value vectors and memory has been split and contributed to the 
> apache arrow repository. In order to help this project advance, Drill should 
> depend on the arrow project instead of internal value vector code.
> This change will require recompiling any external code, such as UDFs and 
> StoragePlugins. The changes will mainly just involve renaming the classes to 
> the org.apache.arrow namespace.





[jira] [Comment Edited] (DRILL-4924) Can not use case expression within an IN predicate.

2016-10-03 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15543278#comment-15543278
 ] 

Julian Hyde edited comment on DRILL-4924 at 10/3/16 8:09 PM:
-

Actually, Calcite does now support "SELECT without FROM" if you enable the 
right SQL conformance level. See CALCITE-1120.


was (Author: julianhyde):
Actually, Calcite does now support "SELECT without FROM" if you enable the 
right SQL conformance level.

> Can not use case expression within an IN predicate.
> ---
>
> Key: DRILL-4924
> URL: https://issues.apache.org/jira/browse/DRILL-4924
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Khurram Faraaz
>
> Can not use case expression within an IN predicate.
> Drill 1.9.0 git commit ID: f3c26e34
> Similar query works on Postgres 9.3
> {noformat}
> postgres=# select * from t1 where c1 in ( select case when c1=2 then 30 else 
> c1 end);
>  c1 | c2
> +-
>   1 |   0
>   3 |  19
>  -1 |  11
>   5 |  13
>  10 |  17
>  11 |  -1
>  13 |   1
>  17 |  20
>   0 |   9
>  19 | 100
> (10 rows)
> {noformat}
> Drill 1.9.0 returns an error
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select * from `emp_tbl` where id IN ( SELECT 
> CASE WHEN id=2 THEN 30 else id end );
> Error: PARSE ERROR: Encountered ")" at line 1, column 81.
> Was expecting one of:
> "FROM" ...
> "," ...
> "AS" ...
>  ...
>  ...
>  ...
>  ...
>  ...
> "NOT" ...
> "IN" ...
> "BETWEEN" ...
> "LIKE" ...
> "SIMILAR" ...
> "=" ...
> ">" ...
> "<" ...
> "<=" ...
> ">=" ...
> "<>" ...
> "+" ...
> "-" ...
> "*" ...
> "/" ...
> "||" ...
> "AND" ...
> "OR" ...
> "IS" ...
> "MEMBER" ...
> "SUBMULTISET" ...
> "MULTISET" ...
> "[" ...
> SQL Query select * from `emp_tbl` where id IN ( SELECT CASE WHEN id=2 THEN 30 
> else id end )
>   
>   ^
> [Error Id: e6c3f120-8776-476e-8df7-7ef30f6b7307 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}





[jira] [Commented] (DRILL-4924) Can not use case expression within an IN predicate.

2016-10-03 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15543278#comment-15543278
 ] 

Julian Hyde commented on DRILL-4924:


Actually, Calcite does now support "SELECT without FROM" if you enable the 
right SQL conformance level.

> Can not use case expression within an IN predicate.
> ---
>
> Key: DRILL-4924
> URL: https://issues.apache.org/jira/browse/DRILL-4924
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.9.0
>Reporter: Khurram Faraaz
>
> Can not use case expression within an IN predicate.
> Drill 1.9.0 git commit ID: f3c26e34
> Similar query works on Postgres 9.3
> {noformat}
> postgres=# select * from t1 where c1 in ( select case when c1=2 then 30 else 
> c1 end);
>  c1 | c2
> +-
>   1 |   0
>   3 |  19
>  -1 |  11
>   5 |  13
>  10 |  17
>  11 |  -1
>  13 |   1
>  17 |  20
>   0 |   9
>  19 | 100
> (10 rows)
> {noformat}
> Drill 1.9.0 returns an error
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select * from `emp_tbl` where id IN ( SELECT 
> CASE WHEN id=2 THEN 30 else id end );
> Error: PARSE ERROR: Encountered ")" at line 1, column 81.
> Was expecting one of:
> "FROM" ...
> "," ...
> "AS" ...
>  ...
>  ...
>  ...
>  ...
>  ...
> "NOT" ...
> "IN" ...
> "BETWEEN" ...
> "LIKE" ...
> "SIMILAR" ...
> "=" ...
> ">" ...
> "<" ...
> "<=" ...
> ">=" ...
> "<>" ...
> "+" ...
> "-" ...
> "*" ...
> "/" ...
> "||" ...
> "AND" ...
> "OR" ...
> "IS" ...
> "MEMBER" ...
> "SUBMULTISET" ...
> "MULTISET" ...
> "[" ...
> SQL Query select * from `emp_tbl` where id IN ( SELECT CASE WHEN id=2 THEN 30 
> else id end )
>   
>   ^
> [Error Id: e6c3f120-8776-476e-8df7-7ef30f6b7307 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}





[jira] [Commented] (DRILL-4918) Adding 2 dates results in AssertionError: Multiple functions with best cost found

2016-09-29 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15534220#comment-15534220
 ] 

Julian Hyde commented on DRILL-4918:


Adding two dates should give a validation error (or runtime error if you have 
late schema).

> Adding 2 dates results in AssertionError: Multiple functions with best cost 
> found
> -
>
> Key: DRILL-4918
> URL: https://issues.apache.org/jira/browse/DRILL-4918
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types, Functions - Drill
>Reporter: Rahul Challapalli
> Attachments: error.log
>
>
> git.commit.id.abbrev=2295715
> The below query fails
> {code}
> select l_commitdate+l_receiptdate from cp.`tpch/lineitem.parquet`;
> Error: SYSTEM ERROR: AssertionError: Multiple functions with best cost found
> Fragment 0:0
> [Error Id: 983983bd-73e3-45c8-a377-e7022a5899c7 on qa-node182.qa.lab:31010] 
> (state=,code=0)
> {code}
> The more likely scenario is that this operation is not supported. At least 
> we should throw a proper error message. I have attached the error from the logs.





[jira] [Commented] (DRILL-4805) COUNT(*) over window and group by partitioning column results in validation error

2016-07-24 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15391102#comment-15391102
 ] 

Julian Hyde commented on DRILL-4805:


I can reproduce this on Calcite; run QuidemTest with 
https://github.com/julianhyde/calcite/tree/drill-4805-win-agg-group-by.

> COUNT(*) over window and group by partitioning column results in validation 
> error
> -
>
> Key: DRILL-4805
> URL: https://issues.apache.org/jira/browse/DRILL-4805
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.8.0
> Environment: 4 node cluster
>Reporter: Khurram Faraaz
>  Labels: window_function
>
> COUNT(*) over window and group by partitioning column results in validation 
> error.
> MapR Drill 1.8.0 commit ID : 34ca63ba
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> SELECT COUNT(*) OVER ( PARTITION BY c2 ORDER BY 
> c2 ) FROM `tblWnulls.parquet` group by c2;
> Error: VALIDATION ERROR: At line 1, column 14: Expression '*' is not being 
> grouped
> SQL Query null
> [Error Id: 44e03ee7-4c86-4809-90e7-5eaeb634691d on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> Postgres returns the COUNT for the same query and same data.
> {noformat}
> postgres=# SELECT COUNT(*) OVER ( PARTITION BY c2 ORDER BY c2 ) FROM t222 
> group by c2;
>  count 
> ---
>  1
>  1
>  1
>  1
>  1
>  1
> (6 rows)
> {noformat}
> Interestingly, when we do a nested COUNT(COUNT(*)) over the window, Drill 
> does return the count.
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> SELECT COUNT(COUNT(*)) OVER ( PARTITION BY c2 
> ORDER BY c2 ) FROM `tblWnulls.parquet` group by c2;
> +-+
> | EXPR$0  |
> +-+
> | 1   |
> | 1   |
> | 1   |
> | 1   |
> | 1   |
> | 1   |
> +-+
> 6 rows selected (0.22 seconds)
> {noformat}
> Also without the GROUP BY c2, and with column c2 in the project Drill returns 
> results.
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> SELECT COUNT(*) OVER ( PARTITION BY c2 ORDER BY 
> c2 ), c2 FROM `tblWnulls.parquet`;
> +-+---+
> | EXPR$0  |  c2   |
> +-+---+
> | 7   | a |
> | 7   | a |
> | 7   | a |
> | 7   | a |
> | 7   | a |
> | 7   | a |
> | 7   | a |
> | 4   | b |
> | 4   | b |
> | 4   | b |
> | 4   | b |
> | 7   | c |
> | 7   | c |
> | 7   | c |
> | 7   | c |
> | 7   | c |
> | 7   | c |
> | 7   | c |
> | 6   | d |
> | 6   | d |
> | 6   | d |
> | 6   | d |
> | 6   | d |
> | 6   | d |
> | 2   | e |
> | 2   | e |
> | 4   | null  |
> | 4   | null  |
> | 4   | null  |
> | 4   | null  |
> +-+---+
> 30 rows selected (0.172 seconds)
> {noformat}





[jira] [Commented] (DRILL-4280) Kerberos Authentication

2016-07-22 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15390108#comment-15390108
 ] 

Julian Hyde commented on DRILL-4280:


If the main (only?) use case is client-to-drill communication, rather than 
drill-to-drill communication, and since Avatica already does Kerberos (see 
CALCITE-1159) it seems to me that a client based on Avatica would be an 
alternative. This would solve DRILL-4791 as well.

What are the pros and cons of Avatica?

> Kerberos Authentication
> ---
>
> Key: DRILL-4280
> URL: https://issues.apache.org/jira/browse/DRILL-4280
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Keys Botzum
>Assignee: Sudheesh Katkam
>  Labels: security
>
> Drill should support Kerberos based authentication from clients. This means 
> that both the ODBC and JDBC drivers as well as the web/REST interfaces should 
> support inbound Kerberos. For Web this would most likely be SPNEGO while for 
> ODBC and JDBC this will be more generic Kerberos.
> Since Hive and much of Hadoop supports Kerberos there is a potential for a 
> lot of reuse of ideas if not implementation.
> Note that this is related to but not the same as 
> https://issues.apache.org/jira/browse/DRILL-3584 





[jira] [Commented] (DRILL-4175) IOBE may occur in Calcite RexProgramBuilder when queries are submitted concurrently

2016-07-18 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383387#comment-15383387
 ] 

Julian Hyde commented on DRILL-4175:


Is it worth doing a force-push to fix it? The commit was only 4 hours ago, with 
one change on top of it. It depends on Drill's policy on force-push, I suppose.

> IOBE may occur in Calcite RexProgramBuilder when queries are submitted 
> concurrently
> ---
>
> Key: DRILL-4175
> URL: https://issues.apache.org/jira/browse/DRILL-4175
> Project: Apache Drill
>  Issue Type: Bug
> Environment: distribution
>Reporter: huntersjm
> Fix For: 1.8.0
>
>
> I queried a SQL statement like `select v from table limit 1` and got an error:
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IndexOutOfBoundsException: Index: 68, Size: 67
> After debugging, I found a bug in Calcite's parsing.
> First, look at line 72 in org.apache.calcite.rex.RexProgramBuilder:
> {noformat}
>    registerInternal(RexInputRef.of(i, fields), false);
> {noformat}
> There we get a RexInputRef from RexInputRef.of, which has a method named 
> createName(int index); here NAMES is a SelfPopulatingList. 
> SelfPopulatingList is described as a thread-safe list, but in fact it is 
> thread-unsafe. When NAMES.get(index) is called concurrently, it produces an 
> error. We expect the list to contain {$0 $1 $2 ... $n}, but under concurrent 
> access it may contain {$0,$1...$29,$30...$59,$30,$31...$59...}.
> Look at the method registerInternal:
> {noformat}
> private RexLocalRef registerInternal(RexNode expr, boolean force) {
> expr = simplify(expr);
> RexLocalRef ref;
> final Pair key;
> if (expr instanceof RexLocalRef) {
>   key = null;
>   ref = (RexLocalRef) expr;
> } else {
>   key = RexUtil.makeKey(expr);
>   ref = exprMap.get(key);
> }
> if (ref == null) {
>   if (validating) {
> validate(
> expr,
> exprList.size());
>   }
> {noformat}
> Here makeKey(expr) is expected to produce distinct keys, but it produces the 
> same key, so addExpr(expr) is called fewer times than expected. In that method:
> {noformat}
> RexLocalRef ref;
> final int index = exprList.size();
> exprList.add(expr);
> ref =
> new RexLocalRef(
> index,
> expr.getType());
> localRefList.add(ref);
> return ref;
> {noformat}
> localRefList ends up with the wrong size, so at line 939,
> {noformat}
> final RexLocalRef ref = localRefList.get(index);
> {noformat}
> an IndexOutOfBoundsException is thrown.
> Bug fix:
> We can't change Calcite's original code before they fix this bug, but we can 
> initialize NAMES in RexInputRef at startup. Just add 
> {noformat}
> RexInputRef.createName(2048);
> {noformat}
> at bootstrap.
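The workaround in the report, populating the name list once before any concurrent access, can be illustrated with a minimal stand-in for the lazily growing list. This sketch is not Calcite's actual SelfPopulatingList; the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for the lazily growing, supposedly thread-safe name list
// described in the report. NOT Calcite's actual SelfPopulatingList.
class LazyNameList {
    private final List<String> names = new ArrayList<>();

    // Grows the list on demand. Without synchronization, two threads can both
    // observe size() <= index and append overlapping runs of entries, so the
    // element at position i is no longer guaranteed to be "$i".
    String get(int index) {
        while (names.size() <= index) {
            names.add("$" + names.size());
        }
        return names.get(index);
    }

    // The proposed workaround: populate once at startup, single-threaded, so
    // concurrent readers never mutate the list (analogous to calling
    // RexInputRef.createName(2048) at bootstrap).
    void prepopulate(int n) {
        for (int i = names.size(); i < n; i++) {
            names.add("$" + i);
        }
    }
}

public class Main {
    public static void main(String[] args) {
        LazyNameList names = new LazyNameList();
        names.prepopulate(2048);
        System.out.println(names.get(67));   // prints $67
        System.out.println(names.get(2047)); // prints $2047
    }
}
```

Pre-population turns a read-modify race into read-only access, which is why it sidesteps the bug without touching Calcite.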





[jira] [Commented] (DRILL-4175) calcite parse sql error

2016-07-15 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15379958#comment-15379958
 ] 

Julian Hyde commented on DRILL-4175:


Can you correct 1099 to 1009?

> calcite parse sql error
> ---
>
> Key: DRILL-4175
> URL: https://issues.apache.org/jira/browse/DRILL-4175
> Project: Apache Drill
>  Issue Type: Bug
> Environment: distribution
>Reporter: huntersjm
>
> I queried a SQL statement like `select v from table limit 1` and got an error:
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IndexOutOfBoundsException: Index: 68, Size: 67
> After debugging, I found a bug in Calcite's parsing.
> First, look at line 72 in org.apache.calcite.rex.RexProgramBuilder:
> {noformat}
>    registerInternal(RexInputRef.of(i, fields), false);
> {noformat}
> There we get a RexInputRef from RexInputRef.of, which has a method named 
> createName(int index); here NAMES is a SelfPopulatingList. 
> SelfPopulatingList is described as a thread-safe list, but in fact it is 
> thread-unsafe. When NAMES.get(index) is called concurrently, it produces an 
> error. We expect the list to contain {$0 $1 $2 ... $n}, but under concurrent 
> access it may contain {$0,$1...$29,$30...$59,$30,$31...$59...}.
> Look at the method registerInternal:
> {noformat}
> private RexLocalRef registerInternal(RexNode expr, boolean force) {
> expr = simplify(expr);
> RexLocalRef ref;
> final Pair key;
> if (expr instanceof RexLocalRef) {
>   key = null;
>   ref = (RexLocalRef) expr;
> } else {
>   key = RexUtil.makeKey(expr);
>   ref = exprMap.get(key);
> }
> if (ref == null) {
>   if (validating) {
> validate(
> expr,
> exprList.size());
>   }
> {noformat}
> Here makeKey(expr) is expected to produce distinct keys, but it produces the 
> same key, so addExpr(expr) is called fewer times than expected. In that method:
> {noformat}
> RexLocalRef ref;
> final int index = exprList.size();
> exprList.add(expr);
> ref =
> new RexLocalRef(
> index,
> expr.getType());
> localRefList.add(ref);
> return ref;
> {noformat}
> localRefList ends up with the wrong size, so at line 939,
> {noformat}
> final RexLocalRef ref = localRefList.get(index);
> {noformat}
> an IndexOutOfBoundsException is thrown.
> Bug fix:
> We can't change Calcite's original code before they fix this bug, but we can 
> initialize NAMES in RexInputRef at startup. Just add 
> {noformat}
> RexInputRef.createName(2048);
> {noformat}
> at bootstrap.





[jira] [Commented] (DRILL-4175) calcite parse sql error

2016-07-15 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15379908#comment-15379908
 ] 

Julian Hyde commented on DRILL-4175:


Rebasing is intermediate between cherry-picking (back-porting) and getting off 
the fork. You'd be moving forward the point at which the branch occurs, so 
you'd share history up to that point and get all of the fixes. If you're not 
regularly rebasing like this then by definition you're falling behind.

By the way, where is Drill's fork of Calcite? I could find the maven repo where 
the binaries are posted, but I couldn't find the git repo.

> calcite parse sql error
> ---
>
> Key: DRILL-4175
> URL: https://issues.apache.org/jira/browse/DRILL-4175
> Project: Apache Drill
>  Issue Type: Bug
> Environment: distribution
>Reporter: huntersjm
>
> I queried a SQL statement like `select v from table limit 1` and got an error:
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IndexOutOfBoundsException: Index: 68, Size: 67
> After debugging, I found a bug in Calcite's parsing.
> First, look at line 72 in org.apache.calcite.rex.RexProgramBuilder:
> {noformat}
>    registerInternal(RexInputRef.of(i, fields), false);
> {noformat}
> There we get a RexInputRef from RexInputRef.of, which has a method named 
> createName(int index); here NAMES is a SelfPopulatingList. 
> SelfPopulatingList is described as a thread-safe list, but in fact it is 
> thread-unsafe. When NAMES.get(index) is called concurrently, it produces an 
> error. We expect the list to contain {$0 $1 $2 ... $n}, but under concurrent 
> access it may contain {$0,$1...$29,$30...$59,$30,$31...$59...}.
> Look at the method registerInternal:
> {noformat}
> private RexLocalRef registerInternal(RexNode expr, boolean force) {
> expr = simplify(expr);
> RexLocalRef ref;
> final Pair key;
> if (expr instanceof RexLocalRef) {
>   key = null;
>   ref = (RexLocalRef) expr;
> } else {
>   key = RexUtil.makeKey(expr);
>   ref = exprMap.get(key);
> }
> if (ref == null) {
>   if (validating) {
> validate(
> expr,
> exprList.size());
>   }
> {noformat}
> Here makeKey(expr) is expected to produce distinct keys, but it produces the 
> same key, so addExpr(expr) is called fewer times than expected. In that method:
> {noformat}
> RexLocalRef ref;
> final int index = exprList.size();
> exprList.add(expr);
> ref =
> new RexLocalRef(
> index,
> expr.getType());
> localRefList.add(ref);
> return ref;
> {noformat}
> localRefList ends up with the wrong size, so at line 939,
> {noformat}
> final RexLocalRef ref = localRefList.get(index);
> {noformat}
> an IndexOutOfBoundsException is thrown.
> Bug fix:
> We can't change Calcite's original code before they fix this bug, but we can 
> initialize NAMES in RexInputRef at startup. Just add 
> {noformat}
> RexInputRef.createName(2048);
> {noformat}
> at bootstrap.





[jira] [Commented] (DRILL-4175) calcite parse sql error

2016-07-14 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378683#comment-15378683
 ] 

Julian Hyde commented on DRILL-4175:


bq. Given that, quick solution is to port CALCITE-1009 back to Drill's fork 
Calcite.

Or, better, rebase Drill's fork onto the latest Calcite release.

> calcite parse sql error
> ---
>
> Key: DRILL-4175
> URL: https://issues.apache.org/jira/browse/DRILL-4175
> Project: Apache Drill
>  Issue Type: Bug
> Environment: distribution
>Reporter: huntersjm
>
> I queried a SQL statement like `select v from table limit 1` and got an error:
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IndexOutOfBoundsException: Index: 68, Size: 67
> After debugging, I found a bug in Calcite's parsing.
> First, look at line 72 in org.apache.calcite.rex.RexProgramBuilder:
> {noformat}
>    registerInternal(RexInputRef.of(i, fields), false);
> {noformat}
> There we get a RexInputRef from RexInputRef.of, which has a method named 
> createName(int index); here NAMES is a SelfPopulatingList. 
> SelfPopulatingList is described as a thread-safe list, but in fact it is 
> thread-unsafe. When NAMES.get(index) is called concurrently, it produces an 
> error. We expect the list to contain {$0 $1 $2 ... $n}, but under concurrent 
> access it may contain {$0,$1...$29,$30...$59,$30,$31...$59...}.
> Look at the method registerInternal:
> {noformat}
> private RexLocalRef registerInternal(RexNode expr, boolean force) {
> expr = simplify(expr);
> RexLocalRef ref;
> final Pair key;
> if (expr instanceof RexLocalRef) {
>   key = null;
>   ref = (RexLocalRef) expr;
> } else {
>   key = RexUtil.makeKey(expr);
>   ref = exprMap.get(key);
> }
> if (ref == null) {
>   if (validating) {
> validate(
> expr,
> exprList.size());
>   }
> {noformat}
> Here makeKey(expr) is expected to produce distinct keys, but it produces the 
> same key, so addExpr(expr) is called fewer times than expected. In that method:
> {noformat}
> RexLocalRef ref;
> final int index = exprList.size();
> exprList.add(expr);
> ref =
> new RexLocalRef(
> index,
> expr.getType());
> localRefList.add(ref);
> return ref;
> {noformat}
> localRefList ends up with the wrong size, so at line 939,
> {noformat}
> final RexLocalRef ref = localRefList.get(index);
> {noformat}
> an IndexOutOfBoundsException is thrown.
> Bug fix:
> We can't change Calcite's original code before they fix this bug, but we can 
> initialize NAMES in RexInputRef at startup. Just add 
> {noformat}
> RexInputRef.createName(2048);
> {noformat}
> at bootstrap.





[jira] [Commented] (DRILL-4709) Document the included Foodmart sample data

2016-06-06 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15317059#comment-15317059
 ] 

Julian Hyde commented on DRILL-4709:


Microsoft have never objected to this use of the data, nor have they shown much 
interest in curating it. The de facto home of the data is my 
foodmart-data-hsqldb project:

https://github.com/julianhyde/foodmart-data-hsqldb

You will see that on that page I make some effort to describe the schema, etc. 
Maybe you could help improve that site, and include a reference to that site in 
Drill documentation.

Under ASL you could of course copy that site into Drill's documentation but 
please, for heaven's sake, don't fork.

> Document the included Foodmart sample data
> --
>
> Key: DRILL-4709
> URL: https://issues.apache.org/jira/browse/DRILL-4709
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.6.0
>Reporter: Paul Rogers
>Priority: Minor
>
> Drill includes a JSON version of the Mondrian FoodMart sample data. This data 
> appears in the $DRILL_HOME/jars/3rdparty/foodmart-data-json-0.4.jar jar file, 
> accessible using the class path storage plugin.
> The documentation mentions using the cp plugin to access customers.json. 
> However, the FoodMart data set is quite rich, with many example files.
> As it is, unless someone is a curious developer, and good with Google, they 
> won't be able to find the other data sets or the source of the FoodMart data.
> The data appears to be a JSON version of the SQL sample data for the Mondrian 
> project. A schema description is here: 
> https://github.com/pentaho/mondrian/blob/master/demo/FoodMart.xml
> The Mondrian data appears to have originated at Microsoft to highlight their 
> circa 2000 OLAP projects, but has since been discontinued. See
> * http://sqlmag.com/development/dts-2000-action
> * https://technet.microsoft.com/en-us/library/aa217032(v=sql.80).aspx
> * http://sqlmag.com/sql-server/desperately-seeking-samples
> Or do a Google search for "microsoft foodmart database".
> The request is to:
> 1. Credit MS and Mondrian for the data.
> 2. Either explain the data (which is quite a bit of work), or
> 3. Explain how to extract the files from the jar file to explore manually.
> 4. Provide a pointer to a description of the schema (if such can be found.)
> For option 3:
> cd $DRILL_HOME/jars/3rdparty
> unzip foodmart-data-json-0.4.jar -d ~/foodmart
> cd ~/foodmart
> ls
> Looking at the data, it is clear that SOME description is needed to 
> understand the many tables and how they might work with Drill.





[jira] [Commented] (DRILL-4689) Need to support conversion from TIMESTAMP type to TIME type

2016-05-20 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293474#comment-15293474
 ] 

Julian Hyde commented on DRILL-4689:


I support converting TIMESTAMP to TIME, but a TIME literal needs to be in the 
correct format, regardless of what PostgreSQL does.

> Need to support conversion from TIMESTAMP type to TIME type
> ---
>
> Key: DRILL-4689
> URL: https://issues.apache.org/jira/browse/DRILL-4689
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.7.0
> Environment: CentOS cluster
>Reporter: Khurram Faraaz
>
> According to the ISO/IEC 9075-2 standard, conversion from TIMESTAMP type to 
> TIME type is allowed and supported.
> This does not seem to work on Drill 1.7.0
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> values(TIME '2050-2-3 10:11:12.1000');
> Error: PARSE ERROR: Illegal TIME literal '2050-2-3 10:11:12.1000': not in 
> format 'HH:mm:ss'
> SQL Query values(TIME '2050-2-3 10:11:12.1000')
>^
> [Error Id: 77168fe0-760f-4384-a7c6-682241675348 on centos-03.qa.lab:31010] 
> (state=,code=0)
> 0: jdbc:drill:schema=dfs.tmp> values(cast('2050-2-3 10:11:12.1000' as time));
> Error: SYSTEM ERROR: IllegalArgumentException: Invalid format: "2050-2-3 
> 10:11:12.1000" is malformed at "50-2-3 10:11:12.1000"
> Fragment 0:0
> [Error Id: 5168dfe6-b5e5-4ce0-8570-02ea74da6367 on centos-03.qa.lab:31010] 
> (state=,code=0)
> 0: jdbc:drill:schema=dfs.tmp>
> {noformat}
> The above two expressions are supported on Postgres 9.3
> {noformat}
> postgres=# values(TIME '2050-2-3 10:11:12.1000');
>   column1   
> 
>  10:11:12.1
> (1 row)
> postgres=# values(cast('2050-2-3 10:11:12.1000' as time));
>   column1   
> 
>  10:11:12.1
> (1 row)
> postgres=# 
> {noformat}





[jira] [Commented] (DRILL-4678) Query HANG - SELECT DISTINCT over date data

2016-05-17 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15287073#comment-15287073
 ] 

Julian Hyde commented on DRILL-4678:


OK, next step is to try to reproduce this in Calcite and if it reproduces, move 
it over to Calcite.

> Query HANG - SELECT DISTINCT over date data
> ---
>
> Key: DRILL-4678
> URL: https://issues.apache.org/jira/browse/DRILL-4678
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.7.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Priority: Critical
> Attachments: hung_Date_Query.log
>
>
> Below query hangs
> {noformat}
> 2016-05-16 10:33:57,506 [28c65de9-9f67-dadb-5e4e-e1a12f8dda49:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 28c65de9-9f67-dadb-5e4e-e1a12f8dda49: SELECT DISTINCT dt FROM (
> VALUES(CAST('1964-03-07' AS DATE)),
>   (CAST('2002-03-04' AS DATE)),
>   (CAST('1966-09-04' AS DATE)),
>   (CAST('1993-08-18' AS DATE)),
>   (CAST('1970-06-11' AS DATE)),
>   (CAST('1970-06-11' AS DATE)),
>   (CAST('1970-06-11' AS DATE)),
>   (CAST('1970-06-11' AS DATE)),
>   (CAST('1970-06-11' AS DATE)),
>   (CAST('1959-10-23' AS DATE)),
>   (CAST('1992-01-14' AS DATE)),
>   (CAST('1994-07-24' AS DATE)),
>   (CAST('1979-11-25' AS DATE)),
>   (CAST('1945-01-14' AS DATE)),
>   (CAST('1982-07-25' AS DATE)),
>   (CAST('1966-09-06' AS DATE)),
>   (CAST('1989-05-01' AS DATE)),
>   (CAST('1996-03-08' AS DATE)),
>   (CAST('1998-08-19' AS DATE)),
>   (CAST('2013-08-13' AS DATE)),
>   (CAST('2013-08-13' AS DATE)),
>   (CAST('2013-08-13' AS DATE)),
>   (CAST('2013-08-13' AS DATE)),
>   (CAST('2013-08-13' AS DATE)),
>   (CAST('2013-08-13' AS DATE)),
> (CAST('1999-07-20' AS DATE)),
> (CAST('1962-07-03' AS DATE)),
>   (CAST('2011-08-17' AS DATE)),
>   (CAST('2011-05-16' AS DATE)),
>   (CAST('1946-05-08' AS DATE)),
>   (CAST('1994-02-13' AS DATE)),
>   (CAST('1978-08-09' AS DATE)),
>   (CAST('1978-08-09' AS DATE)),
>   (CAST('1978-08-09' AS DATE)),
>   (CAST('1978-08-09' AS DATE)),
>   (CAST('1958-02-06' AS DATE)),
>   (CAST('2012-06-11' AS DATE)),
>   (CAST('2012-06-11' AS DATE)),
>   (CAST('2012-06-11' AS DATE)),
>   (CAST('2012-06-11' AS DATE)),
>   (CAST('1998-03-26' AS DATE)),
>   (CAST('1996-11-04' AS DATE)),
>   (CAST('1953-09-25' AS DATE)),
>   (CAST('2003-06-17' AS DATE)),
>   (CAST('2003-06-17' AS DATE)),
>   (CAST('2003-06-17' AS DATE)),
>   (CAST('2003-06-17' AS DATE)),
>   (CAST('2003-06-17' AS DATE)),
>   (CAST('1980-07-05' AS DATE)),
>   (CAST('1982-06-15' AS DATE)),
>   (CAST('1951-05-16' AS DATE)))
> tbl(dt)
> {noformat}
> Details from Web UI Profile tab, please note that the query is still in 
> STARTING state
> {noformat}
> Running Queries
> Time  UserQuery   State   Foreman
> 05/16/2016 10:33:57   
> mapr
>  SELECT DISTINCT dt FROM ( VALUES(CAST('1964-03-07' AS DATE)), 
> (CAST('2002-03-04' AS DATE)), (CAST('1966-09-04' AS DATE)), (CAST('199
> STARTING
> centos-01.qa.lab
> {noformat}
> There is no other useful information in drillbit.log. jstack output is 
> attached here for your reference.
> The same query works fine on Postgres 9.3





[jira] [Commented] (DRILL-4682) Allow full schema identifier in SELECT clause

2016-05-17 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15287066#comment-15287066
 ] 

Julian Hyde commented on DRILL-4682:


The validator support was added to Calcite a while ago in 
https://issues.apache.org/jira/browse/CALCITE-356. Not sure why this is not 
showing up in Drill.

> Allow full schema identifier in SELECT clause
> -
>
> Key: DRILL-4682
> URL: https://issues.apache.org/jira/browse/DRILL-4682
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: SQL Parser
>Reporter: Andries Engelbrecht
>
> Currently Drill requires aliases to identify columns in the SELECT clause 
> when working with multiple tables/workspaces.
> Many BI/Analytical and other tools by default will use the full schema 
> identifier in the select clause when generating SQL statements for execution 
> for generic JDBC or ODBC sources. Not supporting this feature causes issues 
> and a slower adoption of utilizing Drill as an execution engine within the 
> larger Analytical SQL community.
> Propose to support 
> SELECT ... FROM 
> ..
> Also see DRILL-3510 for double quote support as per ANSI_QUOTES
> SELECT ""."".""."" FROM 
> ""."".""
> Which is very common generic SQL being generated by most tools when dealing 
> with a generic SQL data source.





[jira] [Commented] (DRILL-4678) Query HANG - SELECT DISTINCT over date data

2016-05-16 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15285187#comment-15285187
 ] 

Julian Hyde commented on DRILL-4678:


From the stack it looks as if it is not hung, but is hitting a performance bug 
evaluating metadata (the estimate of the number of distinct rows). I suspect 
something is O(n^2) or worse in the size of the VALUES clause.

You can confirm that the large VALUES clause is the main contributing factor by 
reducing its size and seeing what that does to the running time.

CALCITE-604 and CALCITE-1147 might both improve this situation, so it's worth 
finding out whether they're in that Drill version. They might also make the 
situation worse, caching being somewhat of a blunt instrument.

See if you can reproduce this bug in just Calcite. (E.g. in the PlannerTest.)
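The shrink-the-VALUES experiment suggested above is easy to script. As a hedged illustration (the query shape is copied from the report; the helper function name is mine, not part of Drill or Calcite), this generates an n-row variant of the query so planning time can be compared at different sizes:

```python
def distinct_dates_query(n):
    """Build a SELECT DISTINCT over an n-row VALUES list of DATE casts,
    mirroring the reported query. If metadata evaluation is O(n^2) or
    worse, doubling n should roughly quadruple the planning time."""
    rows = ",\n".join("  (CAST('1970-06-11' AS DATE))" for _ in range(n))
    return "SELECT DISTINCT dt FROM (\nVALUES\n" + rows + ")\ntbl(dt)"

print(distinct_dates_query(3))
```

Running the generated query at, say, n = 10, 20, 40 and timing the planning phase would show whether the cost grows super-linearly in the size of the VALUES clause.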

> Query HANG - SELECT DISTINCT over date data
> ---
>
> Key: DRILL-4678
> URL: https://issues.apache.org/jira/browse/DRILL-4678
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.7.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Priority: Critical
> Attachments: hung_Date_Query.log
>
>
> Below query hangs
> {noformat}
> 2016-05-16 10:33:57,506 [28c65de9-9f67-dadb-5e4e-e1a12f8dda49:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 28c65de9-9f67-dadb-5e4e-e1a12f8dda49: SELECT DISTINCT dt FROM (
> VALUES(CAST('1964-03-07' AS DATE)),
>   (CAST('2002-03-04' AS DATE)),
>   (CAST('1966-09-04' AS DATE)),
>   (CAST('1993-08-18' AS DATE)),
>   (CAST('1970-06-11' AS DATE)),
>   (CAST('1970-06-11' AS DATE)),
>   (CAST('1970-06-11' AS DATE)),
>   (CAST('1970-06-11' AS DATE)),
>   (CAST('1970-06-11' AS DATE)),
>   (CAST('1959-10-23' AS DATE)),
>   (CAST('1992-01-14' AS DATE)),
>   (CAST('1994-07-24' AS DATE)),
>   (CAST('1979-11-25' AS DATE)),
>   (CAST('1945-01-14' AS DATE)),
>   (CAST('1982-07-25' AS DATE)),
>   (CAST('1966-09-06' AS DATE)),
>   (CAST('1989-05-01' AS DATE)),
>   (CAST('1996-03-08' AS DATE)),
>   (CAST('1998-08-19' AS DATE)),
>   (CAST('2013-08-13' AS DATE)),
>   (CAST('2013-08-13' AS DATE)),
>   (CAST('2013-08-13' AS DATE)),
>   (CAST('2013-08-13' AS DATE)),
>   (CAST('2013-08-13' AS DATE)),
>   (CAST('2013-08-13' AS DATE)),
> (CAST('1999-07-20' AS DATE)),
> (CAST('1962-07-03' AS DATE)),
>   (CAST('2011-08-17' AS DATE)),
>   (CAST('2011-05-16' AS DATE)),
>   (CAST('1946-05-08' AS DATE)),
>   (CAST('1994-02-13' AS DATE)),
>   (CAST('1978-08-09' AS DATE)),
>   (CAST('1978-08-09' AS DATE)),
>   (CAST('1978-08-09' AS DATE)),
>   (CAST('1978-08-09' AS DATE)),
>   (CAST('1958-02-06' AS DATE)),
>   (CAST('2012-06-11' AS DATE)),
>   (CAST('2012-06-11' AS DATE)),
>   (CAST('2012-06-11' AS DATE)),
>   (CAST('2012-06-11' AS DATE)),
>   (CAST('1998-03-26' AS DATE)),
>   (CAST('1996-11-04' AS DATE)),
>   (CAST('1953-09-25' AS DATE)),
>   (CAST('2003-06-17' AS DATE)),
>   (CAST('2003-06-17' AS DATE)),
>   (CAST('2003-06-17' AS DATE)),
>   (CAST('2003-06-17' AS DATE)),
>   (CAST('2003-06-17' AS DATE)),
>   (CAST('1980-07-05' AS DATE)),
>   (CAST('1982-06-15' AS DATE)),
>   (CAST('1951-05-16' AS DATE)))
> tbl(dt)
> {noformat}
> Details from Web UI Profile tab, please note that the query is still in 
> STARTING state
> {noformat}
> Running Queries
> Time  UserQuery   State   Foreman
> 05/16/2016 10:33:57   
> mapr
>  SELECT DISTINCT dt FROM ( VALUES(CAST('1964-03-07' AS DATE)), 
> (CAST('2002-03-04' AS DATE)), (CAST('1966-09-04' AS DATE)), (CAST('199
> STARTING
> centos-01.qa.lab
> {noformat}
> There is no other useful information in drillbit.log. jstack output is 
> attached here for your reference.
> The same query works fine on Postgres 9.3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4175) calcite parse sql error

2016-04-07 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230861#comment-15230861
 ] 

Julian Hyde commented on DRILL-4175:


[~huntersjm], I suspected that CALCITE-1009 was a duplicate of CALCITE-440 but 
I couldn't figure out exactly how they were linked. Now that I've read your 
analysis, I agree that that is the cause. Very nice work!

Both bugs are fixed in Calcite 1.6.0, so this should be fixed in Drill now. 
[~jnadeau], can you mark this fixed for whatever version of Drill incorporates 
Calcite 1.6.0?

> calcite parse sql error
> ---
>
> Key: DRILL-4175
> URL: https://issues.apache.org/jira/browse/DRILL-4175
> Project: Apache Drill
>  Issue Type: Bug
> Environment: distribution
>Reporter: huntersjm
>
> I queried a SQL statement like `select v from table limit 1` and got an error:
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IndexOutOfBoundsException: Index: 68, Size: 67
> After debugging, I found a bug in Calcite's parsing.
> First, look at line 72 in org.apache.calcite.rex.RexProgramBuilder:
> {noformat}
>registerInternal(RexInputRef.of(i, fields), false);
> {noformat}
> There we get a RexInputRef from RexInputRef.of, which uses a method named 
> createName(int index); NAMES there is a SelfPopulatingList. SelfPopulatingList 
> is described as a thread-safe list, but in fact it is thread-unsafe: when 
> NAMES.get(index) is called concurrently, it can return wrong results. We 
> expect NAMES to be {$0, $1, $2, ... $n}, but when it is populated concurrently 
> it may become {$0, $1, ... $29, $30, ... $59, $30, $31, ... $59, ...}.
> Now look at the method registerInternal:
> {noformat}
> private RexLocalRef registerInternal(RexNode expr, boolean force) {
> expr = simplify(expr);
> RexLocalRef ref;
> final Pair key;
> if (expr instanceof RexLocalRef) {
>   key = null;
>   ref = (RexLocalRef) expr;
> } else {
>   key = RexUtil.makeKey(expr);
>   ref = exprMap.get(key);
> }
> if (ref == null) {
>   if (validating) {
> validate(
> expr,
> exprList.size());
>   }
> {noformat}
> Here makeKey(expr) is expected to produce a distinct key for each expression, 
> but it produces the same key for different expressions, so addExpr(expr) is 
> called fewer times than it should be. In that method,
> {noformat}
> RexLocalRef ref;
> final int index = exprList.size();
> exprList.add(expr);
> ref =
> new RexLocalRef(
> index,
> expr.getType());
> localRefList.add(ref);
> return ref;
> {noformat}
> localRefList ends up with the wrong size, so at line 939,
> {noformat}
> final RexLocalRef ref = localRefList.get(index);
> {noformat}
> an IndexOutOfBoundsException is thrown.
> Bug fix:
> We can't change Calcite's code before they fix this bug, but we can 
> pre-initialize NAMES in RexInputRef at startup. Just add 
> {noformat}
> RexInputRef.createName(2048);
> {noformat}
> at bootstrap.
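The race described in the report can be sketched in miniature (a hedged illustration in Python, not Calcite's actual code; the class and method names are mine). A lazily self-populating name list must guard its growth, or concurrent callers can interleave appends and corrupt the index-to-name mapping; pre-populating it once at startup, as the workaround proposes, avoids concurrent growth entirely:

```python
import threading

class SelfPopulatingNames:
    """Lazily grows a list of names $0, $1, ... on demand.
    Unsynchronized growth from several threads can interleave appends and
    produce duplicate or out-of-order names; the lock here (or eager
    pre-population, as the reported workaround does) prevents that."""

    def __init__(self):
        self._names = []
        self._lock = threading.Lock()

    def get(self, index):
        with self._lock:                      # guard the grow-on-demand step
            while len(self._names) <= index:
                self._names.append("$" + str(len(self._names)))
            return self._names[index]

names = SelfPopulatingNames()
print(names.get(3))   # "$3" -- $0..$2 are populated as a side effect
```

The `RexInputRef.createName(2048)` workaround plays the role of calling `get(2048)` once before any threads start, so no later lookup ever needs to grow the list.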



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4460) Provide feature that allows fall back to sort aggregation

2016-03-01 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174345#comment-15174345
 ] 

Julian Hyde commented on DRILL-4460:


An "external" algorithm is one that uses disk to complete if there is not 
enough memory. An "adaptive" algorithm is one that can start using memory and 
switch to external in the same run, without losing data. A "hybrid" algorithm 
is one that puts as much data as possible in memory and puts the rest in 
external, and therefore tends to gracefully degrade as input increases.

I wanted to point out that there are adaptive, external algorithms based on 
sort as well as hash. This paper describes adaptive hybrid hash join but 
adaptive hybrid hash aggregation is similar (and in fact simpler). 
http://www.vldb.org/conf/1990/P186.PDF

To be clear, external hashing is not currently implemented in Drill.
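The hybrid idea can be illustrated with a small sketch (Python for brevity; this is not Drill's implementation, and the function name and threshold are made up). The aggregation keeps as many groups in memory as fit, diverts overflow rows to a spill partition, and aggregates that partition in a later pass, so behavior degrades gracefully as input grows:

```python
def hybrid_hash_agg(rows, max_groups=2):
    """Sum values per key, keeping at most max_groups groups in memory.
    Rows whose key doesn't fit are spilled and aggregated in a later
    pass (a real engine would spill to disk, partitioned by hash)."""
    groups, spilled = {}, []
    for key, value in rows:
        if key in groups:
            groups[key] += value          # key already resident: cheap update
        elif len(groups) < max_groups:
            groups[key] = value           # room left in memory: admit the key
        else:
            spilled.append((key, value))  # degrade gracefully: spill the row
    if spilled:
        # Spilled keys are disjoint from resident keys, so the passes merge
        # without conflict; recursion stands in for processing the partition.
        groups.update(hybrid_hash_agg(spilled, max_groups))
    return groups

print(hybrid_hash_agg([("a", 1), ("b", 2), ("c", 3), ("a", 4), ("c", 5)]))
```

Note how rows for already-resident keys are absorbed even after memory fills; only rows for new keys spill, which is what makes the degradation gradual rather than a cliff.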

> Provide feature that allows fall back to sort aggregation
> -
>
> Key: DRILL-4460
> URL: https://issues.apache.org/jira/browse/DRILL-4460
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.5.0
>Reporter: John Omernik
>
> Currently, the default setting for Drill is to use a Hash (in Memory) model 
> for aggregations (set by planner.enable_hashagg = true as default).  This 
> works well, but it's memory dependent and an out of memory condition will 
> cause a query failure.  At this point, a user can alter session set 
> `planner.enable_hashagg` = false and run the query again. If memory is a 
> challenge again, the sort based approach will spill to disk allowing the 
> query to complete (slower).
> What I am requesting is a feature, that defaults to be off (so Drill default 
> behavior will be the same after this feature is added) that would allow a 
> query that tried hash aggregation and failed due to out of memory to restart 
> the same query with sort aggregation.  Basically, allowing the query to 
> succeed, it will try hash first, then go to sort.  This would make for a 
> better user experience in that the query would succeed. Perhaps a warning 
> could be set for the user that would allow them to understand that this 
> occurred, so they could just go to a sort based query by default in the 
> future. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4460) Provide feature that allows fall back to sort aggregation

2016-03-01 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174268#comment-15174268
 ] 

Julian Hyde commented on DRILL-4460:


Falling back to external hashing would be another viable solution to the 
problem. It's a little more expensive to switch from in-memory hashing to 
external hashing when you discover that the data set is larger than you 
expected (hashing uses a different data structure for external data, whereas 
sorting uses essentially the same data structure).

> Provide feature that allows fall back to sort aggregation
> -
>
> Key: DRILL-4460
> URL: https://issues.apache.org/jira/browse/DRILL-4460
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.5.0
>Reporter: John Omernik
>
> Currently, the default setting for Drill is to use a Hash (in Memory) model 
> for aggregations (set by planner.enable_hashagg = true as default).  This 
> works well, but it's memory dependent and an out of memory condition will 
> cause a query failure.  At this point, a user can alter session set 
> `planner.enable_hashagg` = false and run the query again. If memory is a 
> challenge again, the sort based approach will spill to disk allowing the 
> query to complete (slower).
> What I am requesting is a feature, that defaults to be off (so Drill default 
> behavior will be the same after this feature is added) that would allow a 
> query that tried hash aggregation and failed due to out of memory to restart 
> the same query with sort aggregation.  Basically, allowing the query to 
> succeed, it will try hash first, then go to sort.  This would make for a 
> better user experience in that the query would succeed. Perhaps a warning 
> could be set for the user that would allow them to understand that this 
> occurred, so they could just go to a sort based query by default in the 
> future. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4400) Cannot apply 'NOT' to arguments of type 'NOT'.

2016-02-17 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15151410#comment-15151410
 ] 

Julian Hyde commented on DRILL-4400:


OK, please check the precedence of NOT and = in the SQL standard and log a bug 
against Calcite if we're in the wrong.

> Cannot apply 'NOT' to arguments of type 'NOT'.
> ---
>
> Key: DRILL-4400
> URL: https://issues.apache.org/jira/browse/DRILL-4400
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.5.0
>Reporter: N Campbell
>
> select tjoin2.rnum, tjoin1.c1, tjoin1.c2, tjoin2.c2 as c2j2 from 
> postgres.public.tjoin1 inner join postgres.public.tjoin2 on ( tjoin1.c1 = 
> tjoin2.c1 and not tjoin2.c2 = 'AA' )
> Error: VALIDATION ERROR: From line 1, column 154 to line 1, column 163: 
> Cannot apply 'NOT' to arguments of type 'NOT'. Supported form(s): 
> 'NOT'
> [Error Id: f781f07a-2361-4f3d-8f03-0a3f1ddec8f0 on centos1:31010]
>   (org.apache.calcite.tools.ValidationException) 
> org.apache.calcite.runtime.CalciteContextException: From line 1, column 154 
> to line 1, column 163: Cannot apply 'NOT' to arguments of type 
> 'NOT'. Supported form(s): 'NOT'
> org.apache.calcite.prepare.PlannerImpl.validate():189
> org.apache.calcite.prepare.PlannerImpl.validateAndGetType():198
> 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateNode():451
> 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert():198
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():167
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():199
> org.apache.drill.exec.work.foreman.Foreman.runSQL():924
> org.apache.drill.exec.work.foreman.Foreman.run():250
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745
>   Caused By (org.apache.calcite.runtime.CalciteContextException) From line 1, 
> column 154 to line 1, column 163: Cannot apply 'NOT' to arguments of type 
> 'NOT'. Supported form(s): 'NOT'
> sun.reflect.GeneratedConstructorAccessor66.newInstance():-1
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance():45
> java.lang.reflect.Constructor.newInstance():422
> org.apache.calcite.runtime.Resources$ExInstWithCause.ex():405
> org.apache.calcite.sql.SqlUtil.newContextException():714
> org.apache.calcite.sql.SqlUtil.newContextException():702
> org.apache.calcite.sql.validate.SqlValidatorImpl.newValidationError():3931
> org.apache.calcite.sql.SqlCallBinding.newValidationSignatureError():275
> 
> org.apache.calcite.sql.type.FamilyOperandTypeChecker.checkSingleOperandType():92
> 
> org.apache.calcite.sql.type.FamilyOperandTypeChecker.checkOperandTypes():109
> org.apache.calcite.sql.SqlOperator.checkOperandTypes():563
> org.apache.calcite.sql.SqlOperator.validateOperands():420
> org.apache.calcite.sql.SqlOperator.deriveType():487
> 
> org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit():4268
> 
> org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit():4255
> org.apache.calcite.sql.SqlCall.accept():130
> org.apache.calcite.sql.validate.SqlValidatorImpl.deriveTypeImpl():1495
> org.apache.calcite.sql.validate.SqlValidatorImpl.deriveType():1478
> org.apache.calcite.sql.type.InferTypes$1.inferOperandTypes():51
> org.apache.calcite.sql.validate.SqlValidatorImpl.inferUnknownTypes():1672
> org.apache.calcite.sql.validate.SqlValidatorImpl.inferUnknownTypes():1678
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateWhereOrOn():3370
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateJoin():2814
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom():2772
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect():2986
> org.apache.calcite.sql.validate.SelectNamespace.validateImpl():60
> org.apache.calcite.sql.validate.AbstractNamespace.validate():86
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():877
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():863
> org.apache.calcite.sql.SqlSelect.validate():210
> 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression():837
> org.apache.calcite.sql.validate.SqlValidatorImpl.validate():551
> org.apache.calcite.prepare.PlannerImpl.validate():187
> org.apache.calcite.prepare.PlannerImpl.validateAndGetType():198
> 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateNode():451
> 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert():198
> org.apache.drill.exec.planner.sql.h

[jira] [Updated] (DRILL-4407) Group by subquery causes Java NPE

2016-02-17 Thread Julian Hyde (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Hyde updated DRILL-4407:
---
Summary: Group by subquery causes Java NPE  (was: Group by subsquery causes 
Java NPE)

> Group by subquery causes Java NPE
> -
>
> Key: DRILL-4407
> URL: https://issues.apache.org/jira/browse/DRILL-4407
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.5.0
>Reporter: N Campbell
>
> select count(*) from postgres.public.tjoin2  group by ( select c1 from 
> postgres.public.tjoin1 where rnum = 0)
> Error: VALIDATION ERROR: java.lang.NullPointerException
> [Error Id: d3453085-d77c-484e-8df7-f5fadc7bcc7d on centos1:31010]
>   (org.apache.calcite.tools.ValidationException) 
> java.lang.NullPointerException
> org.apache.calcite.prepare.PlannerImpl.validate():189
> org.apache.calcite.prepare.PlannerImpl.validateAndGetType():198
> 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateNode():451
> 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert():198
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():167
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():199
> org.apache.drill.exec.work.foreman.Foreman.runSQL():924
> org.apache.drill.exec.work.foreman.Foreman.run():250
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745
>   Caused By (java.lang.NullPointerException) null
> 
> org.apache.calcite.sql.validate.SqlValidatorUtil$ExpansionAndDeepCopier.visit():633
> 
> org.apache.calcite.sql.validate.SqlValidatorUtil$ExpansionAndDeepCopier.visit():619
> org.apache.calcite.sql.SqlIdentifier.accept():274
> org.apache.calcite.sql.validate.SqlValidatorUtil$DeepCopier.visit():676
> org.apache.calcite.sql.validate.SqlValidatorUtil$DeepCopier.visit():663
> org.apache.calcite.sql.SqlNodeList.accept():152
> 
> org.apache.calcite.sql.util.SqlShuttle$CallCopyingArgHandler.visitChild():134
> 
> org.apache.calcite.sql.util.SqlShuttle$CallCopyingArgHandler.visitChild():101
> org.apache.calcite.sql.SqlOperator.acceptCall():720
> org.apache.calcite.sql.SqlSelectOperator.acceptCall():128
> 
> org.apache.calcite.sql.validate.SqlValidatorUtil$DeepCopier.visitScoped():686
> org.apache.calcite.sql.validate.SqlScopedShuttle.visit():50
> org.apache.calcite.sql.validate.SqlScopedShuttle.visit():32
> org.apache.calcite.sql.SqlCall.accept():130
> org.apache.calcite.sql.validate.SqlValidatorUtil$DeepCopier.visit():676
> org.apache.calcite.sql.validate.SqlValidatorUtil$DeepCopier.visit():663
> org.apache.calcite.sql.SqlNodeList.accept():152
> 
> org.apache.calcite.sql.validate.SqlValidatorUtil$ExpansionAndDeepCopier.copy():626
> org.apache.calcite.sql.validate.AggregatingSelectScope.():92
> org.apache.calcite.sql.validate.SqlValidatorImpl.registerQuery():2200
> org.apache.calcite.sql.validate.SqlValidatorImpl.registerQuery():2122
> 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression():835
> org.apache.calcite.sql.validate.SqlValidatorImpl.validate():551
> org.apache.calcite.prepare.PlannerImpl.validate():187
> org.apache.calcite.prepare.PlannerImpl.validateAndGetType():198
> 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateNode():451
> 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert():198
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():167
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():199
> org.apache.drill.exec.work.foreman.Foreman.runSQL():924
> org.apache.drill.exec.work.foreman.Foreman.run():250
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745
> SQLState:  null
> ErrorCode: 0
> create table TJOIN1 (RNUM integer   not null , C1 integer, C2 integer);
> create table TJOIN2 (RNUM integer   not null , C1 integer, C2 char(2));



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4407) Group by subsquery causes Java NPE

2016-02-17 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15151231#comment-15151231
 ] 

Julian Hyde commented on DRILL-4407:


For what it's worth, this may be fixed in the latest Calcite. 
https://github.com/julianhyde/calcite/commit/d1546712f6cae64021e717cb8e36ae6685e4ffbe
 

> Group by subsquery causes Java NPE
> --
>
> Key: DRILL-4407
> URL: https://issues.apache.org/jira/browse/DRILL-4407
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.5.0
>Reporter: N Campbell
>
> select count(*) from postgres.public.tjoin2  group by ( select c1 from 
> postgres.public.tjoin1 where rnum = 0)
> Error: VALIDATION ERROR: java.lang.NullPointerException
> [Error Id: d3453085-d77c-484e-8df7-f5fadc7bcc7d on centos1:31010]
>   (org.apache.calcite.tools.ValidationException) 
> java.lang.NullPointerException
> org.apache.calcite.prepare.PlannerImpl.validate():189
> org.apache.calcite.prepare.PlannerImpl.validateAndGetType():198
> 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateNode():451
> 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert():198
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():167
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():199
> org.apache.drill.exec.work.foreman.Foreman.runSQL():924
> org.apache.drill.exec.work.foreman.Foreman.run():250
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745
>   Caused By (java.lang.NullPointerException) null
> 
> org.apache.calcite.sql.validate.SqlValidatorUtil$ExpansionAndDeepCopier.visit():633
> 
> org.apache.calcite.sql.validate.SqlValidatorUtil$ExpansionAndDeepCopier.visit():619
> org.apache.calcite.sql.SqlIdentifier.accept():274
> org.apache.calcite.sql.validate.SqlValidatorUtil$DeepCopier.visit():676
> org.apache.calcite.sql.validate.SqlValidatorUtil$DeepCopier.visit():663
> org.apache.calcite.sql.SqlNodeList.accept():152
> 
> org.apache.calcite.sql.util.SqlShuttle$CallCopyingArgHandler.visitChild():134
> 
> org.apache.calcite.sql.util.SqlShuttle$CallCopyingArgHandler.visitChild():101
> org.apache.calcite.sql.SqlOperator.acceptCall():720
> org.apache.calcite.sql.SqlSelectOperator.acceptCall():128
> 
> org.apache.calcite.sql.validate.SqlValidatorUtil$DeepCopier.visitScoped():686
> org.apache.calcite.sql.validate.SqlScopedShuttle.visit():50
> org.apache.calcite.sql.validate.SqlScopedShuttle.visit():32
> org.apache.calcite.sql.SqlCall.accept():130
> org.apache.calcite.sql.validate.SqlValidatorUtil$DeepCopier.visit():676
> org.apache.calcite.sql.validate.SqlValidatorUtil$DeepCopier.visit():663
> org.apache.calcite.sql.SqlNodeList.accept():152
> 
> org.apache.calcite.sql.validate.SqlValidatorUtil$ExpansionAndDeepCopier.copy():626
> org.apache.calcite.sql.validate.AggregatingSelectScope.():92
> org.apache.calcite.sql.validate.SqlValidatorImpl.registerQuery():2200
> org.apache.calcite.sql.validate.SqlValidatorImpl.registerQuery():2122
> 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression():835
> org.apache.calcite.sql.validate.SqlValidatorImpl.validate():551
> org.apache.calcite.prepare.PlannerImpl.validate():187
> org.apache.calcite.prepare.PlannerImpl.validateAndGetType():198
> 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateNode():451
> 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert():198
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():167
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():199
> org.apache.drill.exec.work.foreman.Foreman.runSQL():924
> org.apache.drill.exec.work.foreman.Foreman.run():250
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745
> SQLState:  null
> ErrorCode: 0
> create table TJOIN1 (RNUM integer   not null , C1 integer, C2 integer);
> create table TJOIN2 (RNUM integer   not null , C1 integer, C2 char(2));



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4400) Cannot apply 'NOT' to arguments of type 'NOT'.

2016-02-17 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15151221#comment-15151221
 ] 

Julian Hyde commented on DRILL-4400:


Not a bug. I believe that 'NOT' has higher precedence than '='. If you change 
{{not tjoin2.c2 = 'AA'}} to {{not (tjoin2.c2 = 'AA')}} it should work.
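For comparison, many engines give NOT lower precedence than '=', so the unparenthesized form already means NOT (c2 = 'AA'). A quick check against SQLite illustrates that common precedence (this demonstrates SQLite's parser only, not Calcite's behavior discussed above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# If NOT binds more loosely than '=', "NOT 2 = 1" means NOT (2 = 1) -> 1.
# If NOT bound tighter (as the error above suggests Calcite parses it),
# it would mean (NOT 2) = 1 -> 0. SQLite takes the first reading:
loose = conn.execute("SELECT NOT 2 = 1").fetchone()[0]
tight = conn.execute("SELECT (NOT 2) = 1").fetchone()[0]
print(loose, tight)   # 1 0
```

The two readings disagree here precisely because the operand of NOT differs, which is why the workaround of adding explicit parentheses resolves the validation error.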

> Cannot apply 'NOT' to arguments of type 'NOT'.
> ---
>
> Key: DRILL-4400
> URL: https://issues.apache.org/jira/browse/DRILL-4400
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.5.0
>Reporter: N Campbell
>
> select tjoin2.rnum, tjoin1.c1, tjoin1.c2, tjoin2.c2 as c2j2 from 
> postgres.public.tjoin1 inner join postgres.public.tjoin2 on ( tjoin1.c1 = 
> tjoin2.c1 and not tjoin2.c2 = 'AA' )
> Error: VALIDATION ERROR: From line 1, column 154 to line 1, column 163: 
> Cannot apply 'NOT' to arguments of type 'NOT'. Supported form(s): 
> 'NOT'
> [Error Id: f781f07a-2361-4f3d-8f03-0a3f1ddec8f0 on centos1:31010]
>   (org.apache.calcite.tools.ValidationException) 
> org.apache.calcite.runtime.CalciteContextException: From line 1, column 154 
> to line 1, column 163: Cannot apply 'NOT' to arguments of type 
> 'NOT'. Supported form(s): 'NOT'
> org.apache.calcite.prepare.PlannerImpl.validate():189
> org.apache.calcite.prepare.PlannerImpl.validateAndGetType():198
> 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateNode():451
> 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert():198
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():167
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():199
> org.apache.drill.exec.work.foreman.Foreman.runSQL():924
> org.apache.drill.exec.work.foreman.Foreman.run():250
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745
>   Caused By (org.apache.calcite.runtime.CalciteContextException) From line 1, 
> column 154 to line 1, column 163: Cannot apply 'NOT' to arguments of type 
> 'NOT'. Supported form(s): 'NOT'
> sun.reflect.GeneratedConstructorAccessor66.newInstance():-1
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance():45
> java.lang.reflect.Constructor.newInstance():422
> org.apache.calcite.runtime.Resources$ExInstWithCause.ex():405
> org.apache.calcite.sql.SqlUtil.newContextException():714
> org.apache.calcite.sql.SqlUtil.newContextException():702
> org.apache.calcite.sql.validate.SqlValidatorImpl.newValidationError():3931
> org.apache.calcite.sql.SqlCallBinding.newValidationSignatureError():275
> 
> org.apache.calcite.sql.type.FamilyOperandTypeChecker.checkSingleOperandType():92
> 
> org.apache.calcite.sql.type.FamilyOperandTypeChecker.checkOperandTypes():109
> org.apache.calcite.sql.SqlOperator.checkOperandTypes():563
> org.apache.calcite.sql.SqlOperator.validateOperands():420
> org.apache.calcite.sql.SqlOperator.deriveType():487
> 
> org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit():4268
> 
> org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit():4255
> org.apache.calcite.sql.SqlCall.accept():130
> org.apache.calcite.sql.validate.SqlValidatorImpl.deriveTypeImpl():1495
> org.apache.calcite.sql.validate.SqlValidatorImpl.deriveType():1478
> org.apache.calcite.sql.type.InferTypes$1.inferOperandTypes():51
> org.apache.calcite.sql.validate.SqlValidatorImpl.inferUnknownTypes():1672
> org.apache.calcite.sql.validate.SqlValidatorImpl.inferUnknownTypes():1678
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateWhereOrOn():3370
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateJoin():2814
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom():2772
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect():2986
> org.apache.calcite.sql.validate.SelectNamespace.validateImpl():60
> org.apache.calcite.sql.validate.AbstractNamespace.validate():86
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace():877
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery():863
> org.apache.calcite.sql.SqlSelect.validate():210
> 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression():837
> org.apache.calcite.sql.validate.SqlValidatorImpl.validate():551
> org.apache.calcite.prepare.PlannerImpl.validate():187
> org.apache.calcite.prepare.PlannerImpl.validateAndGetType():198
> 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateNode():451
> 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert():198
> org.

[jira] [Commented] (DRILL-4130) Ability to set settings at Table or View level rather than SESSION or SYSTEM

2015-11-25 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15027464#comment-15027464
 ] 

Julian Hyde commented on DRILL-4130:


I agree: properties set at the query level would clearly override those set at 
the session level. The principle of least surprise is restored.

My philosophy is that one should be able to set any property at any level above 
where it is actually used. If you set it at a high level (e.g. set field 
delimiter at system level) it merely becomes the default for where it is used 
at a lower level. Some properties only apply at high levels (say system) and it 
should be illegal to override them at lower levels.
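That override hierarchy amounts to a chained lookup, sketched below (names are illustrative; Drill's actual option machinery differs): a setting resolves at the most specific level that defines it, and the levels above act only as defaults.

```python
def resolve(name, layers):
    """Return the value of `name` from the first layer that defines it.
    `layers` is ordered most specific first, e.g. [query, session, system],
    so a lower-level override wins and higher levels are mere defaults."""
    for layer in layers:
        if name in layer:
            return layer[name]
    raise KeyError(name)

system_opts  = {"store.json.all_text_mode": False, "exec.errors.verbose": False}
session_opts = {"store.json.all_text_mode": True}   # session override wins
query_opts   = {}

print(resolve("store.json.all_text_mode", [query_opts, session_opts, system_opts]))
```

A system-only property would simply be rejected before it ever reaches a lower layer's dictionary, matching the "illegal to override at lower levels" rule.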

> Ability to set settings at Table or View level rather than SESSION or SYSTEM
> 
>
> Key: DRILL-4130
> URL: https://issues.apache.org/jira/browse/DRILL-4130
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Metadata
>Affects Versions: 1.3.0
> Environment: All
>Reporter: John Omernik
>  Labels: administration, settings
> Fix For: Future
>
>
> There are a number of settings within drill for handling data that due to low 
> level of granularity there may be unintended data reading consequences. A few 
> examples include:
> store.json.read_numbers_as_double
> and
> store.json.all_text_mode
> (There are likely more, these are some I've worked with)
> The documentation on https://drill.apache.org/docs/json-data-model/ outlines 
> how when dealing with certain types of data, that these settings can be 
> helpful for reading, and indeed some queries fail with a suggestion to change 
> these settings. 
> A few points here. 1. The documentation suggests alter system commands.  This 
> is not ideal as it changes the default way drill handles data for all users 
> AND not all users will (should) have the privs to enter this command.  The 
> documentation at a minimum should show alter session (or provide a clearer 
> understanding of the difference) 
> But even with alter session, that affects reads for all JSON files for that 
> session, when in reality, the reasoning behind the setting is to be able to 
> read a specific table that has poorly formed JSON.  Thus, issuing a command 
> that alters how Drill reads all JSON in order to read one table of JSON could 
> have unintended consequences, especially for a user who just wants to be able 
> to read things and issues commands without thinking things through. 
> Now as an administrator, there are two use cases here.  One is I have a table 
> of poorly formed JSON that requires one of these settings, and I can't change 
> the source, therefore, can I create a view that makes it so all reads of this 
> table are done with the more permissive  setting? Setting these in a view 
> would be very helpful from an administrator perspective for known bad data 
> sources.  Keep users from having to think about it, and let them do their 
> exploration. 
> The other use case, is the ability for a user to set a session level read 
> that only applies for the table being read.  alter session set 
> "%tablename%.store.json.read_numbers_as_double = true" (and have the errors 
> that display use that as the default suggestion) that way, the user can issue 
> the command, but not have downstream consequences in their session while 
> reading other tables. 
> Either case is valuable to an administrator, and could help prevent data read 
> issues. 
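The workaround available today can be sketched with Drill's existing 
session-level syntax (a minimal sketch; the table-scoped form is the 
hypothetical syntax proposed by this issue and is not implemented):

{code:sql}
-- Today: scope the permissive reader to the session. This affects ALL JSON
-- reads in the session, not just the one badly formed table.
ALTER SESSION SET `store.json.read_numbers_as_double` = true;

-- Hypothetical table-scoped variant proposed in this issue (not implemented):
-- ALTER SESSION SET `%tablename%.store.json.read_numbers_as_double` = true;

-- Reset afterwards (where the Drill version supports RESET) so later JSON
-- reads in the session are unaffected.
ALTER SESSION RESET `store.json.read_numbers_as_double`;
{code}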



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4130) Ability to set settings at Table or View level rather than SESSION or SYSTEM

2015-11-25 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15027432#comment-15027432
 ] 

Julian Hyde commented on DRILL-4130:


Suppose that there is a system property P, and table T has overridden it, and 
the current session has overridden it also. It's not clear to me whether the 
table's setting or the session's setting should win.

You seem to have in mind that the table's setting would win, and no doubt you 
have a use case in mind where it makes sense that the table's setting would win.

But there are other properties where the user would legitimately expect the 
session to override the table. If we implement this feature as written we would 
violate the principle of least surprise.

> Ability to set settings at Table or View level rather than SESSION or SYSTEM
> 
>
> Key: DRILL-4130
> URL: https://issues.apache.org/jira/browse/DRILL-4130
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Metadata
>Affects Versions: 1.3.0
> Environment: All
>Reporter: John Omernik
>  Labels: administration, settings
> Fix For: Future
>
>





[jira] [Closed] (DRILL-4087) Error parsing JSON - Invalid numeric value: Leading zeroes not allowed

2015-11-21 Thread Julian Hyde (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Hyde closed DRILL-4087.
--
Resolution: Invalid

Changed the resolution from Fixed to Invalid - there was never a problem with 
Drill. 

> Error parsing JSON - Invalid numeric value: Leading zeroes not allowed
> --
>
> Key: DRILL-4087
> URL: https://issues.apache.org/jira/browse/DRILL-4087
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.2.0
> Environment: Hadoop 2.7.1 cluster running on AWS staging instance 
> t4.medium 
> Apache Drill - 1.2.0
>Reporter: Shankar
>
> jdbc:drill:> SELECT count(`timestamp`) FROM dfs.`/tmp/drill-s/` limit 10;
> Error: DATA_READ ERROR: Error parsing JSON - Invalid numeric value: Leading 
> zeroes not allowed
> is there any solution for this error ?





[jira] [Reopened] (DRILL-4087) Error parsing JSON - Invalid numeric value: Leading zeroes not allowed

2015-11-21 Thread Julian Hyde (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Hyde reopened DRILL-4087:


> Error parsing JSON - Invalid numeric value: Leading zeroes not allowed
> --
>
> Key: DRILL-4087
> URL: https://issues.apache.org/jira/browse/DRILL-4087
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.2.0
> Environment: Hadoop 2.7.1 cluster running on AWS staging instance 
> t4.medium 
> Apache Drill - 1.2.0
>Reporter: Shankar
>





[jira] [Commented] (DRILL-1491) Support for JDK 8

2015-11-19 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15015000#comment-15015000
 ] 

Julian Hyde commented on DRILL-1491:


[~khfaraaz] wrote:

{noformat}
Functional tests were executed against Drill 1.3 RPM and JDK8. We see an
NPE for which DRILL-4035 was reported.
We still need to run TPCH tests + TPCDS tests and performance tests using
Drill and JDK8.
{noformat}

> Support for JDK 8
> -
>
> Key: DRILL-1491
> URL: https://issues.apache.org/jira/browse/DRILL-1491
> Project: Apache Drill
>  Issue Type: Task
>  Components: Tools, Build & Test
>Reporter: Aditya Kishore
> Fix For: Future
>
> Attachments: DRILL-1491.1.patch.txt
>
>
> This will be the umbrella JIRA used to track and fix issues with JDK 8 
> support.





[jira] [Commented] (DRILL-1491) Support for JDK 8

2015-11-19 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014868#comment-15014868
 ] 

Julian Hyde commented on DRILL-1491:


What's left to do for JDK 1.8 support?

> Support for JDK 8
> -
>
> Key: DRILL-1491
> URL: https://issues.apache.org/jira/browse/DRILL-1491
> Project: Apache Drill
>  Issue Type: Task
>  Components: Tools, Build & Test
>Reporter: Aditya Kishore
> Fix For: Future
>
> Attachments: DRILL-1491.1.patch.txt
>
>





[jira] [Created] (DRILL-4107) Broken links in web site

2015-11-17 Thread Julian Hyde (JIRA)
Julian Hyde created DRILL-4107:
--

 Summary: Broken links in web site
 Key: DRILL-4107
 URL: https://issues.apache.org/jira/browse/DRILL-4107
 Project: Apache Drill
  Issue Type: Bug
Reporter: Julian Hyde
 Attachments: Screenshot from 2015-11-17 14-26-46.png

Following CALCITE-979 I ran http://www.brokenlinkcheck.com and found 40 broken 
links at http://drill.apache.org. Most of them are shown in the attached 
screenshot.





[jira] [Updated] (DRILL-4107) Broken links in web site

2015-11-17 Thread Julian Hyde (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Hyde updated DRILL-4107:
---
Attachment: Screenshot from 2015-11-17 14-26-46.png

> Broken links in web site
> 
>
> Key: DRILL-4107
> URL: https://issues.apache.org/jira/browse/DRILL-4107
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Julian Hyde
> Attachments: Screenshot from 2015-11-17 14-26-46.png
>
>





[jira] [Commented] (DRILL-3929) Support the ability to query database tables using external indices

2015-11-15 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006312#comment-15006312
 ] 

Julian Hyde commented on DRILL-3929:


When you say "Index selection is part of physical database design and is a core 
step in access path selection which occurs at physical planning stage" how have 
you decided what should occur at logical and physical planning stage? You make 
it sound as if there is a dictionary definition of "physical planning stage" 
whereas it is just a design decision which decisions are made when. And indeed 
every DBMS will have different definitions of "logical" and "physical".

I recall that "Access path selection" is a phase in query optimization in a 
transactional DBMS but is not so clear cut in a modern analytic DBMS. For 
example Vertica has no indexes, only relations, so you choose access path and 
relations simultaneously.

"I don't see how all indexes can be modeled as [materialized views]". - The 
contents of any index can be modeled as a query on the table (or tables for 
join indexes), and the physical layout can be modeled too. A table whose 
contents are identically equal to a query -- that's the very definition of a 
materialized view.
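That equivalence can be written down directly (a hedged sketch: the table and 
column names are illustrative, and the syntax is Calcite-style materialized 
view DDL, not anything Drill ships today):

{code:sql}
-- A covering index on emp(deptno) is, logically, a table whose contents
-- equal this query, stored sorted on the indexed key:
CREATE MATERIALIZED VIEW emp_by_deptno AS
SELECT deptno, empid, ename
FROM emp
ORDER BY deptno;

-- A planner that matches materialized views can rewrite
--   SELECT ename FROM emp WHERE deptno = 20
-- into a range scan of emp_by_deptno, chosen purely on cost.
{code}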

> Support the ability to query database tables using external indices   
> --
>
> Key: DRILL-3929
> URL: https://issues.apache.org/jira/browse/DRILL-3929
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - Relational Operators, Query Planning & 
> Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>
> This is a placeholder for adding support in Drill to query database tables 
> using external indices.  I will add more details about the use case and a 
> preliminary design proposal.  





[jira] [Commented] (DRILL-3929) Support the ability to query database tables using external indices

2015-11-10 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999592#comment-14999592
 ] 

Julian Hyde commented on DRILL-3929:


More concretely: I think that physical optimization is too late a phase to 
consider indexes. By then you have already (I presume) decided sorting 
(collation) and distribution. But the existence of indexes should affect your 
choice of sorting & distribution.

It is worrying to me how much query planning Drill leaves until physical (Prel) 
time. By then only peephole optimizations are possible.

> Support the ability to query database tables using external indices   
> --
>
> Key: DRILL-3929
> URL: https://issues.apache.org/jira/browse/DRILL-3929
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - Relational Operators, Query Planning & 
> Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>





[jira] [Commented] (DRILL-3929) Support the ability to query database tables using external indices

2015-11-10 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999223#comment-14999223
 ] 

Julian Hyde commented on DRILL-3929:


I think the requirement is to build something that will adapt when the 
requirements change. :)

But seriously, we know that Elasticsearch indexes are not going to be the last 
use case we see in this area. We shouldn't over-fit for that use case.

> Support the ability to query database tables using external indices   
> --
>
> Key: DRILL-3929
> URL: https://issues.apache.org/jira/browse/DRILL-3929
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - Relational Operators, Query Planning & 
> Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>





[jira] [Commented] (DRILL-3929) Support the ability to query database tables using external indices

2015-11-09 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14997866#comment-14997866
 ] 

Julian Hyde commented on DRILL-3929:


[~amansinha100], your key argument against modeling indexes as Calcite MVs is 
that it will "increase the search space". I do acknowledge that managing 
the planner's search space is a problem in Calcite. It is in every query 
planner.

But secondary indexes inflate the search space because they create so many more 
possibilities for execution plans. This is good! 

You state "External secondary indexes can be of two types: covering index and 
non-covering index". Phoenix also has local and global indexes. Some systems 
have hash indexes. Vertica and Druid have sorted projection tables. These are 
all forms of index, and there are more kinds of index that I haven't thought of 
or haven't been invented yet. They can all be modeled as MVs, then chosen based 
on cost, but I think your scheme would run out of road very quickly if the 
requirements were changed.

Also, consider the ways that a query can use several indexes. Some types of 
indexes, in particular bitmap indexes on the same table, can be intersected and 
unioned before generating a stream of rowids into the table scan. A rule-based 
approach would have difficulty choosing the best valid combination of indexes.

Lastly, consider summary tables, which I am sure Drill will use at some point. 
Summary tables are a kind of index (similar to sort-project index with optional 
aggregate), but summary tables can have indexes too! If you model summary 
tables and indexes as different concepts from each other and from base 
relations, your search space just got not only larger but a lot more complicated. 
Pragmatically that means that rules you have written for recognizing indexes on 
base tables won't work for indexes on summary tables; and you will have to 
write special rules that treat a sorted summary table as a non-covering index.

We should not use Volcano, with all rules enabled simultaneously, to optimize 
these queries; the search space will be too large. But by not modeling indexes 
as what they are -- relations containing useful denormalized data in a useful 
physical layout -- we are turning our back on many of the possibilities that 
they offer.

> Support the ability to query database tables using external indices   
> --
>
> Key: DRILL-3929
> URL: https://issues.apache.org/jira/browse/DRILL-3929
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - Relational Operators, Query Planning & 
> Optimization
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>





[jira] [Commented] (DRILL-4039) Query fails when non-ascii characters are used in string literals

2015-11-05 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992449#comment-14992449
 ] 

Julian Hyde commented on DRILL-4039:


Alright, I don't know what should happen in those cases. But now you know about 
SQL's support for non-ascii characters maybe you can see what other databases 
do. IIRC the Convert function is supposed to deal with character-set conversion.

> Query fails when non-ascii characters are used in string literals
> -
>
> Key: DRILL-4039
> URL: https://issues.apache.org/jira/browse/DRILL-4039
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.1.0
> Environment: Linux lnxx64r6 2.6.32-131.0.15.el6.x86_64 #1 SMP Tue May 
> 10 15:42:40 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Sergio Lob
>
> The following query against DRILL returns this error:
> SYSTEM ERROR: CalciteException: Failed to encode  'НАСТРОЕние' in character 
> set 'ISO-8859-1'
>  cc39118a-cde6-4a6e-a1d6-4b6b7e847b8a on maprd
> Query is:
>     SELECT
>    T1.`F01INT`,
>    T1.`F02UCHAR_10`,
>    T1.`F03UVARCHAR_10`
>     FROM
>    DPRV64R6_TRDUNI01T T1
>     WHERE
>    (T1.`F03UVARCHAR_10` =  'НАСТРОЕние')
>     ORDER BY
>    T1.`F01INT`;
> This issue looks similar to jira HIVE-12207.
> Is there a fix or workaround for this?





[jira] [Commented] (DRILL-4039) Query fails when non-ascii characters are used in string literals

2015-11-05 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992134#comment-14992134
 ] 

Julian Hyde commented on DRILL-4039:


Not a bug. Calcite (and standard SQL) have several ways to create character 
literals of non-ASCII characters: n'foo' or _iso-8859-1'foo' or _utf16'foo' or 
u&'foo' or u&'foo' uescape '0'.
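Spelled out with the literal from this report (standard SQL syntax; how much 
of it each dialect accepts varies, so treat these as hedged examples):

{code:sql}
SELECT n'НАСТРОЕние';                    -- national character set literal
SELECT _UTF16'НАСТРОЕние';               -- character-set introducer
SELECT u&'\041D\0410\0421';              -- Unicode literal, \xxxx escapes ("НАС")
SELECT u&'!041D!0410!0421' uescape '!';  -- same, with a custom escape character
{code}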

> Query fails when non-ascii characters are used in string literals
> -
>
> Key: DRILL-4039
> URL: https://issues.apache.org/jira/browse/DRILL-4039
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.1.0
> Environment: Linux lnxx64r6 2.6.32-131.0.15.el6.x86_64 #1 SMP Tue May 
> 10 15:42:40 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Sergio Lob
>





[jira] [Commented] (DRILL-4021) Cannot subract or add between two timestamps

2015-11-04 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14990155#comment-14990155
 ] 

Julian Hyde commented on DRILL-4021:


That list does not include {{  }} Why? Because a {{datetime - datetime}} yields an interval, not a 
datetime. Which is why I referred to the {{}} syntax.

Your move. :)

> Cannot subract or add between two timestamps
> 
>
> Key: DRILL-4021
> URL: https://issues.apache.org/jira/browse/DRILL-4021
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Reporter: Krystal
> Attachments: Screen Shot 2015-11-03 at 3.44.30 PM.png, Screen Shot 
> 2015-11-04 at 7.35.07 AM.png, Screen Shot 2015-11-04 at 8.12.34 AM.png
>
>
> The following subtraction between 2 now() function works:
> select now() - now()from voter_hive limit 1;
> +---------+
> | EXPR$0  |
> +---------+
> | PT0S    |
> +---------+
>  
> However, the following queries fail:
> select now() - create_time from voter_hive where voter_id=1;
> Error: VALIDATION ERROR: From line 1, column 8 to line 1, column 26: Cannot 
> apply '-' to arguments of type ' - '. Supported form(s): 
> ' - '
> ' - '
> ' - '
> select create_time - cast('1997-02-12 15:18:31.072' as timestamp) from 
> voter_hive where voter_id=1;
> Error: VALIDATION ERROR: From line 1, column 8 to line 1, column 65: Cannot 
> apply '-' to arguments of type ' - '. Supported 
> form(s): ' - '
> ' - '
> ' - '





[jira] [Updated] (DRILL-4021) Cannot subract or add between two timestamps

2015-11-04 Thread Julian Hyde (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Hyde updated DRILL-4021:
---
Attachment: Screen Shot 2015-11-04 at 7.35.07 AM.png

> Cannot subract or add between two timestamps
> 
>
> Key: DRILL-4021
> URL: https://issues.apache.org/jira/browse/DRILL-4021
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Reporter: Krystal
> Attachments: Screen Shot 2015-11-03 at 3.44.30 PM.png, Screen Shot 
> 2015-11-04 at 7.35.07 AM.png
>
>





[jira] [Commented] (DRILL-4021) Cannot subract or add between two timestamps

2015-11-04 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14989734#comment-14989734
 ] 

Julian Hyde commented on DRILL-4021:


For the record, and your table 7 notwithstanding, SQL-2011 only allows 
subtraction of a timestamp from another using the '(ts1 - ts2) interval' syntax 
(see attached).

But it's fine that Drill goes beyond the standard as long as the semantics are 
clear. As Postgres does.

> Cannot subract or add between two timestamps
> 
>
> Key: DRILL-4021
> URL: https://issues.apache.org/jira/browse/DRILL-4021
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Reporter: Krystal
> Attachments: Screen Shot 2015-11-03 at 3.44.30 PM.png
>
>





[jira] [Commented] (DRILL-4021) Cannot subract or add between two timestamps

2015-11-03 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988403#comment-14988403
 ] 

Julian Hyde commented on DRILL-4021:


I'll ask a rhetorical question: If you subtract two timestamps, what units 
would you expect the result to have? Milliseconds? Days?

Standard SQL says that if you subtract two date-time values you need to say 
what type of interval you want back (e.g. an interval in seconds). Thus you must 
write {code}t2 - t1 second{code} or {code}t2 - t1 month to year{code}.

Standard SQL does not allow you to add two date-time values at all. What would you 
expect 2015-01-01 + 2015-01-01 to return? I suppose you could say 'a value 
twice as far from the 1970-01-01 epoch as 2015-01-01' but then you are assuming 
an epoch.

If you want to add to a date-time value, add an interval. {code}date 
'2015-01-01' + interval '2' year{code} should work.

So, this is not a bug.
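The distinction can be shown side by side (a hedged sketch; t1 and t2 stand 
for timestamp columns of some table events):

{code:sql}
-- Ambiguous under standard SQL: what unit would the result have?
-- SELECT t2 - t1 FROM events;

-- Standard SQL: name the interval type you want back.
SELECT (t2 - t1) SECOND FROM events;
SELECT (t2 - t1) DAY TO HOUR FROM events;

-- To move a date-time value, add an interval, not another date-time.
SELECT date '2015-01-01' + interval '2' year;
{code}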

> Cannot subract or add between two timestamps
> 
>
> Key: DRILL-4021
> URL: https://issues.apache.org/jira/browse/DRILL-4021
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Reporter: Krystal
>





[jira] [Commented] (DRILL-3993) Rebase Drill on Calcite 1.5.0 release

2015-10-28 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14979538#comment-14979538
 ] 

Julian Hyde commented on DRILL-3993:


[~sudheeshkatkam], "Catching up" is always necessary when you separate two 
components into modules and version them separately. Changes in module A don't 
break module B's nightly builds, but B needs to periodically sync up, at a time 
of its choosing.

I think that we put a lot of valuable features into Calcite that benefit Drill 
(some of them contributed by people who are also Drill committers), and I think 
we do a pretty good job at controlling change, so that things that do not 
directly benefit Drill at least do not break it. For example, we follow 
semantic versioning and do not remove APIs except in a major release.

We have discovered with other projects that asking the downstream projects to 
kick the tires of a Calcite release in the run-up to a release is an effective 
way to find problems, and efficient in terms of time and effort for both 
projects.

If there is anything else we can do in Calcite to make the process more 
efficient for Drill, let me know.

> Rebase Drill on Calcite 1.5.0 release
> -
>
> Key: DRILL-3993
> URL: https://issues.apache.org/jira/browse/DRILL-3993
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.2.0
>Reporter: Sudheesh Katkam
>
> Calcite keeps moving, and now we need to catch up to Calcite 1.5, and ensure 
> there are no regressions.
> Also, how do we resolve this 'catching up' issue in the long term?





[jira] [Commented] (DRILL-3989) Create a sys.queries table

2015-10-28 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14978979#comment-14978979
 ] 

Julian Hyde commented on DRILL-3989:


I don’t think Oracle has a clear answer. Their nearest equivalent is v$sql, but 
they also have v$process.

JDBC defines the terminology for most people, and they call it statement. 
Albeit a JDBC statement can be executed multiple times. I don't know whether 
Drill gives each execution a new id, or uses the same statement id for each.

MySQL gets it mixed up: “KILL QUERY terminates the statement the connection is 
currently executing, but leaves the connection itself intact.” 
https://dev.mysql.com/doc/refman/5.0/en/kill.html

I'd define it as "things that are running that a DBA would like to kill". This 
includes SELECT queries, DML and DDL statements. Collectively, statements. 
Certainly, INSERT and CREATE TABLE AS SELECT can potentially take as much time 
& resources as queries.

> Create a sys.queries table
> --
>
> Key: DRILL-3989
> URL: https://issues.apache.org/jira/browse/DRILL-3989
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Reporter: Jacques Nadeau
>  Labels: newbie
>
> We should create a sys.queries table that provides a clusterwide view of 
> active queries. It could include the following columns:
> queryid, user, sql, current status, number of nodes involved, number of total 
> fragments, number of fragments completed, start time
> This should be a pretty straightforward task as we should be able to leverage 
> the capabilities around required affinity. A great model to build off of are 
> the sys.memory and sys.threads tables.
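Once such a table exists, a DBA session might look like this (hypothetical: 
the table and the column names below are the proposal, not a shipped API):

{code:sql}
-- Find long-running statements across the cluster:
SELECT queryId, `user`, `sql`, currentStatus, startTime
FROM sys.queries
WHERE currentStatus = 'RUNNING'
ORDER BY startTime;
{code}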





[jira] [Commented] (DRILL-3989) Create a sys.queries table

2015-10-28 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14978874#comment-14978874
 ] 

Julian Hyde commented on DRILL-3989:


You should call it "statements". You never know...

> Create a sys.queries table
> --
>
> Key: DRILL-3989
> URL: https://issues.apache.org/jira/browse/DRILL-3989
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Reporter: Jacques Nadeau
>  Labels: newbie
>





[jira] [Commented] (DRILL-3962) Add support of ROLLUP, CUBE, GROUPING_SETS, GROUPING, GROUPING_ID, GROUP_ID support

2015-10-21 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967953#comment-14967953
 ] 

Julian Hyde commented on DRILL-3962:


GROUPING SETS, not GROUPING_SETS

> Add support of ROLLUP, CUBE, GROUPING_SETS, GROUPING, GROUPING_ID, GROUP_ID 
> support
> ---
>
> Key: DRILL-3962
> URL: https://issues.apache.org/jira/browse/DRILL-3962
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Jinfeng Ni
>
> These functions are important for BI analytical workloads. Currently, Calcite 
> supports these functions, but neither the planning nor the execution in Drill 
> supports them. 
> DRILL-3802 blocks those functions in Drill planning. But we should provide 
> the support for those functions in both planning and execution of Drill. 





[jira] [Commented] (DRILL-3942) IS NOT NULL filter is not pushed pass aggregation

2015-10-16 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960358#comment-14960358
 ] 

Julian Hyde commented on DRILL-3942:


I'd be surprised if FilterAggregateTransposeRule cannot handle this. So I 
wonder what's going on.

> IS NOT NULL filter is not pushed pass aggregation
> -
>
> Key: DRILL-3942
> URL: https://issues.apache.org/jira/browse/DRILL-3942
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.2.0
>Reporter: Victoria Markman
>
> It seems to me that we should be able to do that, x is a grouping column:
> {code}
> 0: jdbc:drill:schema=dfs> explain plan for select x, y, z from ( select 
> ss_sold_date_sk, ss_customer_sk, avg(ss_quantity) from store_sales group by 
> ss_sold_date_sk, ss_customer_sk ) as sq(x, y, z) where x is not null;
> +--+--+
> | text | json |
> +--+--+
> | 00-00Screen
> 00-01  Project(x=[$0], y=[$1], z=[$2])
> 00-02UnionExchange
> 01-01  Project(x=[$0], y=[$1], z=[$2])
> 01-02Project(ss_sold_date_sk=[$0], ss_customer_sk=[$1], 
> EXPR$2=[CAST(/(CastHigh(CASE(=($3, 0), null, $2)), $3)):ANY NOT NULL])
> 01-03  SelectionVectorRemover
> 01-04Filter(condition=[IS NOT NULL($0)])
> 01-05  HashAgg(group=[{0, 1}], agg#0=[$SUM0($2)], 
> agg#1=[$SUM0($3)])
> 01-06Project(ss_sold_date_sk=[$0], ss_customer_sk=[$1], 
> $f2=[$2], $f3=[$3])
> 01-07  HashToRandomExchange(dist0=[[$0]], dist1=[[$1]])
> 02-01UnorderedMuxExchange
> 03-01  Project(ss_sold_date_sk=[$0], 
> ss_customer_sk=[$1], $f2=[$2], $f3=[$3], 
> E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($1, hash64AsDouble($0)))])
> 03-02HashAgg(group=[{0, 1}], agg#0=[$SUM0($2)], 
> agg#1=[COUNT($2)])
> 03-03  Project(ss_sold_date_sk=[$2], 
> ss_customer_sk=[$1], ss_quantity=[$0])
> 03-04Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:///tpcds1/parquet/store_sales]], 
> selectionRoot=maprfs:/tpcds1/parquet/store_sales, numFiles=1, 
> usedMetadataFile=false, columns=[`ss_sold_date_sk`, `ss_customer_sk`, 
> `ss_quantity`]]])
> {code}
> If I add another not null filter, it is pushed down:
> {code}
> 0: jdbc:drill:schema=dfs> explain plan for select x, y, z from ( select 
> ss_sold_date_sk, ss_customer_sk, avg(ss_quantity) from store_sales group by 
> ss_sold_date_sk, ss_customer_sk ) as sq(x, y, z) where x is not null and y is 
> not null;
> +--+--+
> | text | json |
> +--+--+
> | 00-00Screen
> 00-01  Project(x=[$0], y=[$1], z=[$2])
> 00-02UnionExchange
> 01-01  Project(x=[$0], y=[$1], z=[$2])
> 01-02Project(ss_sold_date_sk=[$0], ss_customer_sk=[$1], 
> EXPR$2=[CAST(/(CastHigh(CASE(=($3, 0), null, $2)), $3)):ANY NOT NULL])
> 01-03  HashAgg(group=[{0, 1}], agg#0=[$SUM0($2)], 
> agg#1=[$SUM0($3)])
> 01-04Project(ss_sold_date_sk=[$0], ss_customer_sk=[$1], 
> $f2=[$2], $f3=[$3])
> 01-05  HashToRandomExchange(dist0=[[$0]], dist1=[[$1]])
> 02-01UnorderedMuxExchange
> 03-01  Project(ss_sold_date_sk=[$0], ss_customer_sk=[$1], 
> $f2=[$2], $f3=[$3], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($1, 
> hash64AsDouble($0)))])
> 03-02HashAgg(group=[{0, 1}], agg#0=[$SUM0($2)], 
> agg#1=[COUNT($2)])
> 03-03  SelectionVectorRemover
> 03-04Filter(condition=[AND(IS NOT NULL($0), IS 
> NOT NULL($1))])
> 03-05  Project(ss_sold_date_sk=[$2], 
> ss_customer_sk=[$1], ss_quantity=[$0])
> 03-06Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:///tpcds1/parquet/store_sales]], 
> selectionRoot=maprfs:/tpcds1/parquet/store_sales, numFiles=1, 
> usedMetadataFile=false, columns=[`ss_sold_date_sk`, `ss_customer_sk`, 
> `ss_quantity`]]])
> {code}
> IS NULL filter is pushed down:
> {code}
> 0: jdbc:drill:schema=dfs> explain plan for select x, y, z from ( select 
> ss_sold_date_sk, ss_customer_sk, avg(ss_quantity) from store_sales group by 
> ss_sold_date_sk, ss_customer_sk ) as sq(x, y, z) where x is null;
> +--+--+
> | text | json |
> +--+--+
> | 00-00Screen
> 00-01  Project(x=[$0], y=[$1], z=[$2])
> 00-02Project(x=[$0], y=[$1], z=[$2])
> 00-03  Project(ss_sold_date_sk=[$0], ss_customer_sk=[$1], 
> EXPR$2=[CAST(/(CastHigh(CASE(=($3, 0), null, $2)), $3)):ANY NOT NULL])
> 00-04HashAgg(group=[{0, 1}], agg#0=[$SUM0($2)], agg#1=[$SUM0(

[jira] [Commented] (DRILL-3912) Common subexpression elimination in code generation

2015-10-07 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947733#comment-14947733
 ] 

Julian Hyde commented on DRILL-3912:


The search space for common relational expressions is so much larger (and so 
different) that it needs a different approach. In Calcite we are planning to 
introduce a Spool operator to create temporary tables and then re-use them 
throughout the query as if they were materialized views. Furthermore, the 
temporary tables might be virtual (i.e. never fully materialized, but with two 
or more subscribers to the stream of records).

CALCITE-481 is the placeholder for that work.

> Common subexpression elimination in code generation
> ---
>
> Key: DRILL-3912
> URL: https://issues.apache.org/jira/browse/DRILL-3912
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
>
> Drill currently will evaluate the full expression tree, even if there are 
> redundant subtrees. Many of these redundant evaluations can be eliminated by 
> reusing the results from previously evaluated expression trees.
> For example,
> {code}
> select a + 1, (a + 1)* (a - 1) from t
> {code}
> Will compute the entire (a + 1) expression twice. With CSE, it will only be 
> evaluated once.
> The benefit will be reducing the work done when evaluating expressions, as 
> well as reducing the amount of code that is generated, which could also lead 
> to better JIT optimization.





[jira] [Commented] (DRILL-3912) Common subexpression elimination

2015-10-07 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947663#comment-14947663
 ] 

Julian Hyde commented on DRILL-3912:


Take a look at Calcite's RexProgram. It internally removes sub-expressions, 
topologically sorting them so that simplest expressions occur first. Also, if 
you have combined project and filter into one operation, it evaluates 
expressions used in the filter first.

> Common subexpression elimination
> 
>
> Key: DRILL-3912
> URL: https://issues.apache.org/jira/browse/DRILL-3912
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
>
> Drill currently will evaluate the full expression tree, even if there are 
> redundant subtrees. Many of these redundant evaluations can be eliminated by 
> reusing the results from previously evaluated expression trees.
> For example,
> {code}
> select a + 1, (a + 1)* (a - 1) from t
> {code}
> Will compute the entire (a + 1) expression twice. With CSE, it will only be 
> evaluated once.
> The benefit will be reducing the work done when evaluating expressions, as 
> well as reducing the amount of code that is generated, which could also lead 
> to better JIT optimization.





[jira] [Commented] (DRILL-3882) Time and Timestamp vectors should not return timezone-based data

2015-10-02 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940845#comment-14940845
 ] 

Julian Hyde commented on DRILL-3882:


I guess it's up to you how you implement it, but you are correct about the 
behavior of date and timestamp values in the SQL standard, and they do seem to 
map very well to Joda's LocalTime.

> Time and Timestamp vectors should not return timezone-based data
> 
>
> Key: DRILL-3882
> URL: https://issues.apache.org/jira/browse/DRILL-3882
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Affects Versions: 1.1.0
>Reporter: Andrew
>Assignee: Andrew
> Fix For: 1.2.0
>
>
> TimeVector, NullableTimeVector, TimestampVector, and NullableTimestampVector 
> should not return values that contain timezone information. Each of these 
> classes implements:
> {code}
> public DateTime getObject()
> {code}
> I believe the correct method should be
> {code}
> public LocalTime getObject()
> {code}
> The rationale for this change is that the "time" and "timestamp" types are not 
> timezone-aware and therefore are more closely modeled by Joda's LocalTime 
> class.
> Additionally, the way it is now makes testing harder because 
> {code}TestBuilder{code} wants to use {code}DateTime{code} objects to compare 
> results from JDBC storage engines, but the storage engines return no 
> timezone information for such types.





[jira] [Commented] (DRILL-3875) Reserved words in option names

2015-09-30 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14939057#comment-14939057
 ] 

Julian Hyde commented on DRILL-3875:


You can't make JOIN non-reserved. If you do, people will be entitled to expect 
the parser to be able to handle queries like {code}select * from join join join 
join join using (join){code}. (Here 'join' is a table, table alias and column 
name.)

If you get rid of reserved words, the parser cannot make progress if it sees an 
identifier that might also be part of the "structure" of the query.

I would strongly recommend against making SQL standard reserved words 
non-reserved. 

> Reserved words in option names
> --
>
> Key: DRILL-3875
> URL: https://issues.apache.org/jira/browse/DRILL-3875
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Reporter: Sudheesh Katkam
>
> Either remove keywords from option names or create a list of non-reserved 
> keywords.
> The list of keywords that we need to add to the non-reserved list: ["EXEC", 
> "JOIN", "WINDOW", "LARGE", "PARTITION", "OLD", "COLUMN"].
> See comments [here|https://github.com/apache/drill/pull/159].





[jira] [Commented] (DRILL-3853) Get off Sqlline fork

2015-09-29 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14935786#comment-14935786
 ] 

Julian Hyde commented on DRILL-3853:


+ ability to read email

> Get off Sqlline fork
> 
>
> Key: DRILL-3853
> URL: https://issues.apache.org/jira/browse/DRILL-3853
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Parth Chandra
>
> Drill has its own forked version of sqlline that includes customizations for 
> displaying the drill version, drill QOTD, removing names of unsupported 
> commands and removing JDBC drivers not shipped with Drill.
> To get off the fork, we need to parameterize these features in sqlline and 
> have them driven from a properties file. The changes should be merged back 
> into sqlline and Drill packaging should then provide a properties file to 
> customize the stock sqlline distribution.





[jira] [Commented] (DRILL-3853) Get off Sqlline fork

2015-09-29 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14935569#comment-14935569
 ] 

Julian Hyde commented on DRILL-3853:


+1 to un-forking.

Please log bugs at https://github.com/julianhyde/sqlline/issues so we can track.

> Get off Sqlline fork
> 
>
> Key: DRILL-3853
> URL: https://issues.apache.org/jira/browse/DRILL-3853
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Parth Chandra
>
> Drill has its own forked version of sqlline that includes customizations for 
> displaying the drill version, drill QOTD, removing names of unsupported 
> commands and removing JDBC drivers not shipped with Drill.
> To get off the fork, we need to parameterize these features in sqlline and 
> have them driven from a properties file. The changes should be merged back 
> into sqlline and Drill packaging should then provide a properties file to 
> customize the stock sqlline distribution.





[jira] [Commented] (DRILL-3838) Ability to use UDFs in the directory pruning process

2015-09-28 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933807#comment-14933807
 ] 

Julian Hyde commented on DRILL-3838:


A wise Oracle performance expert criticized an application that used a lot of 
tables with identical column names. He said, "you're using the data dictionary 
as an index".

That pattern comes up a lot in RDBMS: do you use a lot of tables with identical 
structure, and put a UNION ALL view on top, or create one table with a TYPE 
column? And it comes up even more in Hadoop-like systems which store data in 
files. Hive, for instance, has something mid-way between the data dictionary 
and the data: the metastore. And here we are talking about the file system as 
an index.

So, using so-called metadata (the catalog, the metastore, the file system) as 
data, efficiently, is an old ask, but a really useful one.

> Ability to use UDFs in the directory pruning process
> 
>
> Key: DRILL-3838
> URL: https://issues.apache.org/jira/browse/DRILL-3838
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Query Planning & Optimization
>Affects Versions: 1.2.0
>Reporter: Stefán Baxter
>
> This feature request is about allowing UDFs to participate in the 
> Directory/Partition pruning process at runtime rather than at 
> planning/optimization time.
> For this a UDF needs:
>  - filename
>  - full path (not just dirN)
>  - to be able to throw an IgnoreFile exception
>  - to be able to throw an IgnoreDirectory exception
> I think the naming is pretty self-explanatory and hopefully this brief 
> description is enough.
> _Stefan 





[jira] [Commented] (DRILL-3838) Ability to use UDFs in the directory pruning process

2015-09-25 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908563#comment-14908563
 ] 

Julian Hyde commented on DRILL-3838:


I like the idea of making directory scanning a relational operation. (Hive 
suffers, I think, because its operations on the metastore are neither true 
queries nor metadata operations, but somewhere in between; the number of 
partitions can be truly huge, so it would benefit from being optimized and 
executed as if it were a query on a novel data source, namely the metastore. 
Scanning the file system is analogous to scanning the metastore.)

Once directory scanning is a relational operation, the usual relational 
optimizations follow: pushing down filters, and "sideways information passing" 
join optimizations like bloom filters.

So that would mean modeling a table scan either as (1) having a parameter, 
which is the name of the current file, and set by a nested loop join above it 
which is fed by a directory scan, or (2) giving the table scan an input, which 
is a stream of file names.

> Ability to use UDFs in the directory pruning process
> 
>
> Key: DRILL-3838
> URL: https://issues.apache.org/jira/browse/DRILL-3838
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Query Planning & Optimization
>Affects Versions: 1.2.0
>Reporter: Stefán Baxter
>
> This feature request is about allowing UDFs to participate in the 
> Directory/Partition pruning process at runtime rather than at 
> planning/optimization time.
> For this a UDF needs:
>  - filename
>  - full path (not just dirN)
>  - to be able to throw an IgnoreFile exception
>  - to be able to throw an IgnoreDirectory exception
> I think the naming is pretty self-explanatory and hopefully this brief 
> description is enough.
> _Stefan 





[jira] [Comment Edited] (DRILL-3840) Add support for in UDF two-phased aggregate merging for UDAFs

2015-09-25 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908554#comment-14908554
 ] 

Julian Hyde edited comment on DRILL-3840 at 9/25/15 7:32 PM:
-

Suggest you build on SqlSplittableAggFunction, introduced in CALCITE-751.

And by the way I wouldn't classify this as a "codegen" problem. It needs to be 
a logical rewrite. The two phases of aggregation might be done by different 
engines, e.g. phoenix and drill.


was (Author: julianhyde):
Suggest you build on SqlSplittableAggFunction, introduced in CALCITE-751.

> Add support for in UDF two-phased aggregate merging for UDAFs
> -
>
> Key: DRILL-3840
> URL: https://issues.apache.org/jira/browse/DRILL-3840
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Codegen
>Reporter: Jacques Nadeau
>






[jira] [Commented] (DRILL-3840) Add support for in UDF two-phased aggregate merging for UDAFs

2015-09-25 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908554#comment-14908554
 ] 

Julian Hyde commented on DRILL-3840:


Suggest you build on SqlSplittableAggFunction, introduced in CALCITE-751.

> Add support for in UDF two-phased aggregate merging for UDAFs
> -
>
> Key: DRILL-3840
> URL: https://issues.apache.org/jira/browse/DRILL-3840
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Codegen
>Reporter: Jacques Nadeau
>






[jira] [Commented] (DRILL-3180) Apache Drill JDBC storage plugin to query rdbms systems such as MySQL and Netezza from Apache Drill

2015-09-21 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901390#comment-14901390
 ] 

Julian Hyde commented on DRILL-3180:


I don't often see non-join conditions in the ON clause, but I take 
[~magnusp]'s point. If that were an outer join, we would be able to push the 
date condition down to the salary table, whereas if it were in the WHERE clause 
we could not.

Anyway, the JDBC adapter's goal is not to format the SQL to follow any "best 
practice" or to look nice for humans to read. It is to communicate with the 
target DB's query optimizer, ideally in a form that the optimizer is unlikely 
to screw up, and most importantly to preserve semantics.

Sometimes there is a danger that Calcite will "over optimize" the query, e.g.

{code}select *
FROM mp.employees.`employees` e
INNER JOIN  (
  SELECT * FROM mp.employees.`salaries` s
  WHERE s.`to_date` > CURRENT_DATE) AS s
ON e.`EMP_NO` = s.`EMP_NO`{code}

is valid and efficient but the new query block might confuse optimizers like 
MySQL's.

> Apache Drill JDBC storage plugin to query rdbms systems such as MySQL and 
> Netezza from Apache Drill
> ---
>
> Key: DRILL-3180
> URL: https://issues.apache.org/jira/browse/DRILL-3180
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Other
>Affects Versions: 1.0.0
>Reporter: Magnus Pierre
>Assignee: Jacques Nadeau
>  Labels: Drill, JDBC, plugin
> Fix For: 1.2.0
>
> Attachments: patch.diff, pom.xml, storage-mpjdbc.zip
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> I have developed the base code for a JDBC storage-plugin for Apache Drill. 
> The code is primitive but constitutes a good starting point for further 
> coding. Today it provides primitive support for SELECT against RDBMS with 
> JDBC. 
> The goal is to provide complete SELECT support against RDBMS with push down 
> capabilities.
> Currently the code is using standard JDBC classes.





[jira] [Commented] (DRILL-3077) sqlline's return code is 0 even when it force exits due to failed sql command

2015-05-14 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544375#comment-14544375
 ] 

Julian Hyde commented on DRILL-3077:


You should upgrade sqlline. It is fixed in 1.1.7. 
https://github.com/julianhyde/sqlline/commit/066bfd3d7572f9df314c35d7834b75bb5c4c1e88

> sqlline's return code is 0 even when it force exits due to failed sql command
> -
>
> Key: DRILL-3077
> URL: https://issues.apache.org/jira/browse/DRILL-3077
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Victoria Markman
>
> My SQL script looks like this:
> {code}
> select * from sys.options limit 1;
> select * sys.options; <--- from clause is missing
> select * from sys.options limit 1;
> {code}
> sqlline correctly exits (--force is set to true by default).
> However, return code is '0', which makes scripting challenging.
> It should be set to 1.
> {code}
> [Wed May 13 17:49:39 root@~ ] # ${DRILL_HOME}/bin/sqlline -u 
> "jdbc:drill:schema=dfs.ctas_parquet"  --run=/root/script.sql
> 1/5  select * from sys.options limit 1;
> +++++++++
> |name|kind|type|   status   |  num_val   | string_val 
> |  bool_val  | float_val  |
> +++++++++
> | drill.exec.rpc.bit.server.retry.delay | LONG   | BOOT   | BOOT  
>  | 500| null   | null   | null   |
> +++++++++
> 1 row selected (0.247 seconds)
> 2/5  
> 3/5  select * sys.options;
> Error: PARSE ERROR: Encountered "." at line 1, column 13.
> Was expecting one of:
> "FROM" ...
> "," ...
> [Error Id: 9da00514-6a96-4d9a-b90a-c903d006c060 on atsqa4-133.qa.lab:31010] 
> (state=,code=0)
> Aborting command set because "force" is false and command failed: "select * 
> sys.options;"
> Closing: org.apache.drill.jdbc.DrillJdbc41Factory$DrillJdbc41Connection
> sqlline version 1.1.6
> [Wed May 13 17:53:56 root@~ ] # echo $?
> 0
> {code}





[jira] [Commented] (DRILL-2870) Fix return type of aggregate functions to be nullable

2015-04-27 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14515846#comment-14515846
 ] 

Julian Hyde commented on DRILL-2870:


For this case, Calcite wraps the aggregate function in a {{CASE COUNT(*) WHEN 0 
THEN NULL ELSE ... END}} call. Thus you get the efficiency of a not-null 
accumulator.
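In SQL terms, the rewrite described above looks roughly like this sketch (illustrative only, using a hypothetical table {code}t{code} and non-nullable column {code}x{code}; not Calcite's exact output):

{code}
-- Original: SUM over a possibly-empty, non-nullable column
SELECT SUM(x) FROM t;

-- Rewritten so an empty input yields NULL while the accumulator stays not-null
SELECT CASE COUNT(*) WHEN 0 THEN NULL ELSE SUM(x) END FROM t;
{code}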

> Fix return type of aggregate functions to be nullable
> -
>
> Key: DRILL-2870
> URL: https://issues.apache.org/jira/browse/DRILL-2870
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Mehant Baid
>Assignee: Mehant Baid
>Priority: Critical
> Fix For: 1.0.0
>
>
> The output type of aggregate functions is required if the input is required, 
> which does not always hold true. Consider the case where the input is a 
> required type and we are performing sum(); if all the rows are filtered out, 
> we should return null instead of 0. This holds true for all aggregate 
> functions (except count). 
> As part of DRILL-2277 we are fixing the case when we have an aggregate 
> function (without group by) and the input batch is empty we still need to 
> produce one record with null as output (count function is an exception to 
> this). If we don't fix the return type of aggregate functions then we will 
> return wrong results in the case where we have an empty input with required 
> columns. 





[jira] [Comment Edited] (DRILL-2870) Fix return type of aggregate functions to be nullable

2015-04-27 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14515846#comment-14515846
 ] 

Julian Hyde edited comment on DRILL-2870 at 4/27/15 11:30 PM:
--

For this case, Calcite wraps the aggregate function in a {{CASE COUNT( * ) WHEN 
0 THEN NULL ELSE ... END}} call. Thus you get the efficiency of a not-null 
accumulator.


was (Author: julianhyde):
For this case, Calcite wraps the aggregate function in a {{CASE COUNT(*) WHEN 0 
THEN NULL ELSE ... END}} call. Thus you get the efficiency of a not-null 
accumulator.

> Fix return type of aggregate functions to be nullable
> -
>
> Key: DRILL-2870
> URL: https://issues.apache.org/jira/browse/DRILL-2870
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Mehant Baid
>Assignee: Mehant Baid
>Priority: Critical
> Fix For: 1.0.0
>
>
> The output type of aggregate functions is required if the input is required, 
> which does not always hold true. Consider the case where the input is a 
> required type and we are performing sum(); if all the rows are filtered out, 
> we should return null instead of 0. This holds true for all aggregate 
> functions (except count). 
> As part of DRILL-2277 we are fixing the case when we have an aggregate 
> function (without group by) and the input batch is empty we still need to 
> produce one record with null as output (count function is an exception to 
> this). If we don't fix the return type of aggregate functions then we will 
> return wrong results in the case where we have an empty input with required 
> columns. 





[jira] [Commented] (DRILL-2880) non-syntax error has unexpected category "PARSE ERROR"

2015-04-26 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513277#comment-14513277
 ] 

Julian Hyde commented on DRILL-2880:


Might be a sign that the parser is trying to do too much validation. IMHO the 
parser should return a complete AST before [semantic] validation starts. That 
means the AST needs to be able to refer to a file that does not exist, and 
indeed to numbers which are too large to be held in BIGINTs.

> non-syntax error has unexpected category "PARSE ERROR"
> --
>
> Key: DRILL-2880
> URL: https://issues.apache.org/jira/browse/DRILL-2880
> Project: Apache Drill
>  Issue Type: Bug
>  Components: SQL Parser
>Reporter: Daniel Barclay (Drill)
>Assignee: Aman Sinha
>Priority: Minor
>
> Some errors that are not syntax errors (failures to match the productions of 
> the SQL grammar), such as referring to a non-existent table:
> SELECT * FROM NoSuchFile;
> yield error messages that unexpectedly refer to parsing, such as:
> PARSE ERROR: From line 1, column 15 to line 1, column 26: Table 'NoSuchFile' 
> not found
>  
> That "PARSE ERROR" sounds like it's saying it's specifically a parsing error, 
> when it's really an error downstream of parsing.  
> (A parsing error requires changing the query to conform to the grammar.  In 
> the above case, creating a file or view would resolve the problem without 
> changing the query, so the problem is not a parsing error.)
>  
> It's not clear whether that error should be reported using a different 
> existing or new error category, or whether the category "PARSE ERROR" should 
> be renamed (to cover both real parsing errors and this level of errors).
> Note, however, that to avoid confusion, errors that can be fixed by changing 
> something other than the query (i.e., the data store state) probably should 
> not be reported using the same category as true syntax/parsing errors.





[jira] [Commented] (DRILL-2837) Resolve what Statement.cancel() really should do

2015-04-21 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506377#comment-14506377
 ] 

Julian Hyde commented on DRILL-2837:


execute (or executeQuery or executeUpdate) should throw. cancel should just 
return. It doesn't know whether it succeeded in issuing a cancel.

> Resolve what Statement.cancel() really should do
> 
>
> Key: DRILL-2837
> URL: https://issues.apache.org/jira/browse/DRILL-2837
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Reporter: Daniel Barclay (Drill)
>Assignee: Daniel Barclay (Drill)
>
> It is not clear exactly what JDBC's Statement.cancel() method is supposed to 
> do.
> The Javadoc method description for cancel() says only:
> bq. Cancels this Statement object if both the DBMS and driver support 
> aborting an SQL statement. This method can be used by one thread to cancel a 
> statement that is being executed by another thread.
> In particular, it's not clear what "cancels this Statement" really means.  
> (The JDBC PDF specification doesn't say anything about it.)
>  
> It seems reasonable to think that calling cancel() on a Statement cancels any 
> associated query that has not already completed, leaves any associated 
> ResultSet closed, and leaves the statement closed.
> However, JDBC doesn't actually specify any of that, AvaticaStatement.cancel() 
> does not close the Statement, and it's not clear whether SQLLine expects the 
> above interpretation or not.





[jira] [Commented] (DRILL-2837) Resolve what Statement.cancel() really should do

2015-04-21 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505339#comment-14505339
 ] 

Julian Hyde commented on DRILL-2837:


It is my belief that cancel does not close the statement. If the statement is 
executing, I would expect it to throw an exception with a particular SQLCode. 
But it is also possible and valid that the cancel does not arrive in time and 
the statement completes normally.

One strange thing about cancel is that it needs to be thread-safe. Well 
obviously, you need to cancel on one thread while another is blocked calling 
execute or executeQuery or executeUpdate. But this is the exception to usual 
JDBC semantics - no other operations are guaranteed thread-safe on the same 
statement or even statements from the same connection.

> Resolve what Statement.cancel() really should do
> 
>
> Key: DRILL-2837
> URL: https://issues.apache.org/jira/browse/DRILL-2837
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Reporter: Daniel Barclay (Drill)
>Assignee: Daniel Barclay (Drill)
>
> It is not clear exactly what JDBC's Statement.cancel() method is supposed to 
> do.
> The Javadoc method description for cancel() says only:
> bq. Cancels this Statement object if both the DBMS and driver support 
> aborting an SQL statement. This method can be used by one thread to cancel a 
> statement that is being executed by another thread.
> In particular, it's not clear what "cancels this Statement" really means.  
> (The JDBC PDF specification doesn't say anything about it.)
>  
> It seems reasonable to think that calling cancel() on a Statement cancels any 
> associated query that has not already completed, leaves any associated 
> ResultSet closed, and leaves the statement closed.
> However, JDBC doesn't actually specify any of that, AvaticaStatement.cancel() 
> does not close the Statement, and it's not clear whether SQLLine expects the 
> above interpretation or not.





[jira] [Commented] (DRILL-2738) Offset with casting a column to timestamp not working

2015-04-13 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492947#comment-14492947
 ] 

Julian Hyde commented on DRILL-2738:


Lazy evaluation, baby!

> Offset with casting a column to timestamp not working
> -
>
> Key: DRILL-2738
> URL: https://issues.apache.org/jira/browse/DRILL-2738
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Venkata krishnan Sowrirajan
>
> In the query below, it should skip the first row, which is a header, and 
> cast one of the columns to timestamp. But it tries to parse the first row 
> to cast it to timestamp. Without the cast to timestamp, a simple offset 
> query works fine.
> "select cast(columns[0] as timestamp) from 
> `guts-csv/CSV/guts_run_lab-app002.csv` offset 1;"
> So I did explain plan on the above query
> explain plan without implementation for select cast(columns[0] as timestamp) 
> from `guts-csv/CSV/guts_run_lab-app002.csv` offset 1;
> DrillScreenRel
>   DrillLimitRel(offset=[1])
> DrillProjectRel(EXPR$0=[CAST(ITEM($0, 0)):TIMESTAMP(0)])
>   DrillScanRel(table=[[fs, drill, guts-csv/CSV/guts_run_lab-app002.csv]], 
> groupscan=[EasyGroupScan 
> [selectionRoot=/mapr/yarn-test/drill/guts-csv/CSV/guts_run_lab-app002.csv, 
> numFiles=1, columns=[`columns`[0]], 
> files=[file:/mapr/yarn-test/drill/guts-csv/CSV/guts_run_lab-app002.csv]]])
> In the plan, it looks like it applies the cast to timestamp before the 
> offset operation, which is why it is failing.
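The effect of operator order in the plan above can be sketched outside Drill with Java streams (illustrative only, not Drill code; `java.sql.Timestamp.valueOf` stands in for the cast, and the two-row list stands in for the CSV file with its header row):

```java
import java.sql.Timestamp;
import java.util.List;
import java.util.stream.Collectors;

// Sketch of why plan order matters: with the cast (map) below the
// offset (skip), the header row is cast first and the cast fails.
class OffsetBeforeCast {
    public static void main(String[] args) {
        List<String> rows = List.of("timestamp_col", "2015-04-13 10:00:00");

        // Offset first, then cast: the header row never reaches the cast.
        List<Timestamp> ok = rows.stream()
            .skip(1)
            .map(Timestamp::valueOf)
            .collect(Collectors.toList());
        System.out.println("offset-then-cast rows: " + ok.size());

        // Cast first, then offset (the shape of the plan above): the
        // header row flows through the cast before skip can discard it.
        try {
            rows.stream()
                .map(Timestamp::valueOf)
                .skip(1)
                .collect(Collectors.toList());
        } catch (IllegalArgumentException e) {
            System.out.println("header row failed the cast");
        }
    }
}
```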





[jira] [Commented] (DRILL-2597) Sqlline fails when script contains comments

2015-03-27 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383483#comment-14383483
 ] 

Julian Hyde commented on DRILL-2597:


SQL supports both kinds of comments:

{code}<comment> ::=
  <simple comment>
  | <bracketed comment>

<simple comment> ::=
  <simple comment introducer> [ <comment character> ... ] <newline>

<simple comment introducer> ::=
  <minus sign><minus sign>

<bracketed comment> ::=
  <bracketed comment introducer>
  <bracketed comment contents>
  <bracketed comment terminator>

<bracketed comment introducer> ::=
  /*

<bracketed comment terminator> ::=
  */{code}

So sqlline should send comments to JDBC as if they were commands, and Drill's 
SQL parser should deal with them.
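A tiny recognizer for lines that are entirely simple comments, sketching how a client could detect such a line and forward or skip it instead of aborting the script (hypothetical helper, not sqlline or Drill code):

```java
// Hypothetical helper: classify a script line as a <simple comment>
// per the SQL grammar quoted above (the line's first non-blank
// characters are two minus signs), so the client can decide to
// forward it to JDBC or pass over it rather than fail the script.
class SimpleComments {
    /** True if the line contains nothing but a simple comment. */
    static boolean isSimpleComment(String line) {
        return line.stripLeading().startsWith("--");
    }

    public static void main(String[] args) {
        String[] script = {"-- comments", "drop view abc;", "create view abc;"};
        for (String line : script) {
            System.out.println((isSimpleComment(line) ? "comment: " : "sql:     ") + line);
        }
    }
}
```

Note this only covers whole-line simple comments, the case that trips up sqlline below; trailing `-- ...` after SQL text and bracketed `/* ... */` comments would need real lexing, since `--` may occur inside a string literal.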

> Sqlline fails when script contains comments
> ---
>
> Key: DRILL-2597
> URL: https://issues.apache.org/jira/browse/DRILL-2597
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - CLI
>Affects Versions: 0.8.0
>Reporter: Abhishek Girish
>Assignee: Daniel Barclay (Drill)
>
> Sqlline fails if option -f or --run= is used, and the DDL file contains a 
> comment. 
> *File contents:*
> {code}
> -- comments
> drop view abc;
> create view abc;
> {code}
> *Fails to recognize comments starting with -- *
> {code}
> ${DRILL_HOME}/bin/sqlline -u "jdbc:drill:schema=dfs.tmp"  --run=abc.sql
> Drill log directory: /opt/mapr/drill/drill-0.8.0/logs
> 1/50 -- comments
> Aborting command set because "force" is false and command failed: "-- 
> comments "
> Closing: org.apache.drill.jdbc.DrillJdbc41Factory$DrillJdbc41Connection
> sqlline version 1.1.6
> # ${DRILL_HOME}/bin/sqlline -u "jdbc:drill:schema=dfs.tmp"  -f abc.sql
> Drill log directory: /opt/mapr/drill/drill-0.8.0/logs
> 1/50 -- comments
> Aborting command set because "force" is false and command failed: "-- comment"
> Closing: org.apache.drill.jdbc.DrillJdbc41Factory$DrillJdbc41Connection
> sqlline version 1.1.6
> {code}
> However, it does recognize comments enclosed within /* ... */





[jira] [Updated] (DRILL-412) FoodMart data (account.json) cause JsonParseException

2015-03-09 Thread Julian Hyde (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Hyde updated DRILL-412:
--
Attachment: DRILL-412.1.patch.txt

> FoodMart data (account.json) cause JsonParseException
> -
>
> Key: DRILL-412
> URL: https://issues.apache.org/jira/browse/DRILL-412
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Chun Chang
>Priority: Minor
> Fix For: Future
>
> Attachments: DRILL-412.1.patch.txt
>
>
> The account.json file contains a "[" character that is not properly escaped. 
> {"account_id":3100,"account_parent":3000,"account_description":"Gross 
> Sales","account_type":"Income","account_rollup":"+","Custom_Members":"LookUpCube("[Sales]","(Measures.[Store
>  Sales],"+time.currentmember.UniqueName+","+ 
> Store.currentmember.UniqueName+")")"}
> This caused the following query failure:
> 0: jdbc:drill:> select * from `account.json` limit 5;
> Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while 
> running query.[error_id: "f6f296e6-d145-4b27-9622-fcbf6aa9e11d"
> endpoint {
>   address: "qa-node117.qa.lab"
>   user_port: 31010
>   control_port: 31011
>   data_port: 31012
> }
> error_type: 0
> message: "Failure while running fragment. < NullPointerException"
> ]
> Error: exception while executing query (state=,code=0)
> 0: jdbc:drill:> select * from `account.json` limit 5;
> Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while 
> running query.[error_id: "aa455c46-94b0-43b4-8aaa-4bf224d3956d"
> endpoint {
>   address: "qa-node117.qa.lab"
>   user_port: 31010
>   control_port: 31011
>   data_port: 31012
> }
> error_type: 0
> message: "Failure while running fragment. < NullPointerException"
> ]
> Error: exception while executing query (state=,code=0)
> And here is the stack trace:
> 12:07:53.370 [WorkManager-6] ERROR o.a.d.e.s.easy.json.JSONRecordReader - 
> Error reading next in Json reader
> com.fasterxml.jackson.core.JsonParseException: Unexpected character ('[' 
> (code 91)): was expecting comma to separate OBJECT entries
>  at [Source: org.apache.hadoop.fs.FSDataInputStream@22c09510; line: 4, 
> column: 154]
>   at 
> com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1369) 
> ~[jackson-core-2.2.0.jar:2.2.0]
>   at 
> com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:599)
>  ~[jackson-core-2.2.0.jar:2.2.0]
>   at 
> com.fasterxml.jackson.core.base.ParserMinimalBase._reportUnexpectedChar(ParserMinimalBase.java:520)
>  ~[jackson-core-2.2.0.jar:2.2.0]
>   at 
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:655)
>  ~[jackson-core-2.2.0.jar:2.2.0]
>   at 
> org.apache.drill.exec.store.easy.json.JSONRecordReader$ReadType.readRecord(JSONRecordReader.java:316)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>   at 
> org.apache.drill.exec.store.easy.json.JSONRecordReader.next(JSONRecordReader.java:143)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:94) 
> [drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.next(AbstractSingleRecordBatch.java:42)
>  
> [drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.next(AbstractSingleRecordBatch.java:42)
>  
> [drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.next(LimitRecordBatch.java:86)
>  
> [drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.next(AbstractSingleRecordBatch.java:42)
>  
> [drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.next(ScreenCreator.java:80)
>  
> [drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:83)
>  
> [drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_45]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_45]
>   at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> 12:

[jira] [Commented] (DRILL-412) FoodMart data (account.json) cause JsonParseException

2015-03-06 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351408#comment-14351408
 ] 

Julian Hyde commented on DRILL-412:
---

I have fixed the data set in 
https://github.com/julianhyde/foodmart-data-json/commit/b7585e7854a40572f41031de29844264395a8dc6
 and released {groupId: net.hydromatic, artifactId: foodmart-data-json, 
version: 0.4}. If you upgrade to that version of the data set, this issue should 
be fixed.

> FoodMart data (account.json) cause JsonParseException
> -
>
> Key: DRILL-412
> URL: https://issues.apache.org/jira/browse/DRILL-412
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Chun Chang
>Priority: Minor
> Fix For: Future
>
>
> The account.json file contains a "[" character that is not properly escaped. 
> {"account_id":3100,"account_parent":3000,"account_description":"Gross 
> Sales","account_type":"Income","account_rollup":"+","Custom_Members":"LookUpCube("[Sales]","(Measures.[Store
>  Sales],"+time.currentmember.UniqueName+","+ 
> Store.currentmember.UniqueName+")")"}
> This caused the following query failure:
> 0: jdbc:drill:> select * from `account.json` limit 5;
> Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while 
> running query.[error_id: "f6f296e6-d145-4b27-9622-fcbf6aa9e11d"
> endpoint {
>   address: "qa-node117.qa.lab"
>   user_port: 31010
>   control_port: 31011
>   data_port: 31012
> }
> error_type: 0
> message: "Failure while running fragment. < NullPointerException"
> ]
> Error: exception while executing query (state=,code=0)
> 0: jdbc:drill:> select * from `account.json` limit 5;
> Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while 
> running query.[error_id: "aa455c46-94b0-43b4-8aaa-4bf224d3956d"
> endpoint {
>   address: "qa-node117.qa.lab"
>   user_port: 31010
>   control_port: 31011
>   data_port: 31012
> }
> error_type: 0
> message: "Failure while running fragment. < NullPointerException"
> ]
> Error: exception while executing query (state=,code=0)
> And here is the stack trace:
> 12:07:53.370 [WorkManager-6] ERROR o.a.d.e.s.easy.json.JSONRecordReader - 
> Error reading next in Json reader
> com.fasterxml.jackson.core.JsonParseException: Unexpected character ('[' 
> (code 91)): was expecting comma to separate OBJECT entries
>  at [Source: org.apache.hadoop.fs.FSDataInputStream@22c09510; line: 4, 
> column: 154]
>   at 
> com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1369) 
> ~[jackson-core-2.2.0.jar:2.2.0]
>   at 
> com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:599)
>  ~[jackson-core-2.2.0.jar:2.2.0]
>   at 
> com.fasterxml.jackson.core.base.ParserMinimalBase._reportUnexpectedChar(ParserMinimalBase.java:520)
>  ~[jackson-core-2.2.0.jar:2.2.0]
>   at 
> com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:655)
>  ~[jackson-core-2.2.0.jar:2.2.0]
>   at 
> org.apache.drill.exec.store.easy.json.JSONRecordReader$ReadType.readRecord(JSONRecordReader.java:316)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>   at 
> org.apache.drill.exec.store.easy.json.JSONRecordReader.next(JSONRecordReader.java:143)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:94) 
> [drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.next(AbstractSingleRecordBatch.java:42)
>  
> [drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.next(AbstractSingleRecordBatch.java:42)
>  
> [drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.next(LimitRecordBatch.java:86)
>  
> [drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>   at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.next(AbstractSingleRecordBatch.java:42)
>  
> [drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>   at 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.next(ScreenCreator.java:80)
>  
> [drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:83)
>  
> [drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
>   at 
> java.util.co

[jira] [Commented] (DRILL-367) FoodMart data (category.json) packaged with Drill does not conform with JSON specification

2015-03-06 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351407#comment-14351407
 ] 

Julian Hyde commented on DRILL-367:
---

I have fixed the data set in 
https://github.com/julianhyde/foodmart-data-json/commit/c475d45ff7dd2c1eb8bef00e95eb7148f2e7f606
 and released {groupId: net.hydromatic, artifactId: foodmart-data-json, 
version: 0.4}. If you upgrade to that version of the data set, this issue should 
be fixed.

> FoodMart data (category.json) packaged with Drill does not conform with JSON 
> specification 
> ---
>
> Key: DRILL-367
> URL: https://issues.apache.org/jira/browse/DRILL-367
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Chun Chang
>Priority: Trivial
> Fix For: Future
>
>
> The JSON file category.json packaged within 
> mondrian-data-foodmart-json-0.2.jar contains data that does not conform to 
> the JSON specification (a doubled double quote is used as an escape; in 
> JSON a single quote can be used directly and does not need to be escaped):
> [root@ mondrian-data]# cat category.json
> {"category_id":"ACTUAL","category_parent":null,"category_description":"Current
>  Year""s Actuals","category_rollup":null}
> {"category_id":"ADJUSTMENT","category_parent":null,"category_description":"Adjustment
>  for Budget input","category_rollup":null}
> {"category_id":"BUDGET","category_parent":null,"category_description":"Current
>  Year""s Budget","category_rollup":null}
> {"category_id":"FORECAST","category_parent":null,"category_description":"Forecast","category_rollup":null}





[jira] [Commented] (DRILL-2116) Add non-reserved keywords to non-reserved keyword list in parser

2015-01-29 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14297338#comment-14297338
 ] 

Julian Hyde commented on DRILL-2116:


The following statement will be ambiguous if you make LEFT non-reserved: 
{code}select * from t1 left join t2 rite on left.x = rite.y{code}

> Add non-reserved keywords to non-reserved keyword list in parser
> 
>
> Key: DRILL-2116
> URL: https://issues.apache.org/jira/browse/DRILL-2116
> Project: Apache Drill
>  Issue Type: Bug
>  Components: SQL Parser
>Reporter: Jacques Nadeau
>Assignee: Aman Sinha
>
> There are a number of keywords in Drill that shouldn't be considered reserved 
> when parsing.  Calcite allows us to customize the list of un-reserved 
> keywords and we should update the list to allow more words.  Things that I've 
> run across include value, user, left, etc.  
> This is a very common usability problem.





[jira] [Created] (DRILL-1898) SELECT DISTINCT fails when applied to boolean column

2014-12-18 Thread Julian Hyde (JIRA)
Julian Hyde created DRILL-1898:
--

 Summary: SELECT DISTINCT fails when applied to boolean column
 Key: DRILL-1898
 URL: https://issues.apache.org/jira/browse/DRILL-1898
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Julian Hyde


SELECT DISTINCT fails when applied to a boolean column.

{code}
0: jdbc:drill:zk=local> select distinct `bool_val` from `sys`.`options`;
Query failed: Query failed: Failure while running fragment., Failure finding 
function that runtime code generation expected.  Signature: compare_to( 
BIT:OPTIONALBIT:OPTIONAL,  ) returns INT:REQUIRED [ 
af1536ff-eca4-4592-b0ac-2b625362bf2f on 10.11.4.182:31010 ]
[ af1536ff-eca4-4592-b0ac-2b625362bf2f on 10.11.4.182:31010 ]
{code}





[jira] [Commented] (DRILL-391) IN-list predicate with 20 or more elements gives UnsupportedOperationException

2014-12-13 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14245571#comment-14245571
 ] 

Julian Hyde commented on DRILL-391:
---

Systems that generate large IN lists are not going to stop at 200. (Mondrian is 
one such system.) Meanwhile, IN lists with 100 or so elements are going to 
perform poorly. So, this is a poor compromise. I would not be inclined to 
accept the Calcite patch.


> IN-list predicate with 20 or more elements gives UnsupportedOperationException
> --
>
> Key: DRILL-391
> URL: https://issues.apache.org/jira/browse/DRILL-391
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Aman Sinha
>Assignee: Aman Sinha
> Fix For: 0.7.0
>
> Attachments: 
> 0001-DRILL-391-Test-case-for-large-IN-list-actual-fix-wor.patch, 
> 0001-Increase-IN-list-threshold-from-20-to-200-for-system.patch
>
>
> select _MAP['N_REGIONKEY'], _MAP['N_NATIONKEY'] FROM 
> "/tmp/parquet/nation.parquet" where cast(_MAP['N_NATIONKEY'] as int) in (1, 
> 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20);
> java.lang.UnsupportedOperationException
>   at 
> org.apache.drill.optiq.DrillValuesRel.implement(DrillValuesRel.java:51)
>   at 
> org.apache.drill.optiq.DrillImplementor.visitChild(DrillImplementor.java:143)
>   at 
> org.apache.drill.optiq.DrillAggregateRel.implement(DrillAggregateRel.java:62)
>   at 
> org.apache.drill.optiq.DrillImplementor.visitChild(DrillImplementor.java:143)
>   at 
> org.apache.drill.optiq.DrillJoinRel.implementInput(DrillJoinRel.java:98)
>   at org.apache.drill.optiq.DrillJoinRel.implement(DrillJoinRel.java:75)
>   at 
> org.apache.drill.optiq.DrillImplementor.visitChild(DrillImplementor.java:143)
>   at 
> org.apache.drill.optiq.DrillProjectRel.implement(DrillProjectRel.java:63)


