[jira] [Assigned] (DRILL-5681) Incorrect query result when query uses star and correlated subquery
[ https://issues.apache.org/jira/browse/DRILL-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vitalii Diravka reassigned DRILL-5681:
--------------------------------------

    Assignee: Vitalii Diravka  (was: Jinfeng Ni)

> Incorrect query result when query uses star and correlated subquery
> -------------------------------------------------------------------
>
>                 Key: DRILL-5681
>                 URL: https://issues.apache.org/jira/browse/DRILL-5681
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Jinfeng Ni
>            Assignee: Vitalii Diravka
>
> The following repro is based on a test case provided by Arjun Rajan (ara...@mapr.com).
> Drill returns an incorrect query result when the query has a correlated subquery and runs against a view defined with select *, or against a subquery with select *.
>
> Case 1: Querying a view with select * + correlated subquery
> {code}
> create view dfs.tmp.nation_view as select * from cp.`tpch/nation.parquet`;
> {code}
> // Q1: returns 25 rows. The correct answer is 0 rows.
> {code}
> SELECT n_nationkey, n_name
> FROM dfs.tmp.nation_view a
> WHERE NOT EXISTS (SELECT 1
>     FROM cp.`tpch/region.parquet` b
>     WHERE b.r_regionkey = a.n_regionkey
>   )
> +--------------+----------------+
> | n_nationkey  |     n_name     |
> +--------------+----------------+
> | 0            | ALGERIA        |
> | 1            | ARGENTINA      |
> | 2            | BRAZIL         |
> ...
> | 24           | UNITED STATES  |
> +--------------+----------------+
> 25 rows selected (0.614 seconds)
> {code}
> // Q2: returns 0 rows. The correct answer is 25 rows.
> {code}
> SELECT n_nationkey, n_name
> FROM dfs.tmp.nation_view a
> WHERE EXISTS (SELECT 1
>     FROM cp.`tpch/region.parquet` b
>     WHERE b.r_regionkey = a.n_regionkey
>   )
> +--------------+---------+
> | n_nationkey  | n_name  |
> +--------------+---------+
> +--------------+---------+
> No rows selected (0.4 seconds)
> {code}
>
> Case 2: Querying a table expression with select *
> // Q3: returns 25 rows. The correct result is 0 rows.
> {code}
> SELECT n_nationkey, n_name
> FROM (
>   SELECT * FROM cp.`tpch/nation.parquet`
> ) a
> WHERE NOT EXISTS (SELECT 1
>     FROM cp.`tpch/region.parquet` b
>     WHERE b.r_regionkey = a.n_regionkey
>   )
> +--------------+----------------+
> | n_nationkey  |     n_name     |
> +--------------+----------------+
> | 0            | ALGERIA        |
> | 1            | ARGENTINA      |
> ...
> | 24           | UNITED STATES  |
> +--------------+----------------+
> 25 rows selected (0.451 seconds)
> {code}
> // Q4: returns 0 rows. The correct result is 25 rows.
> {code}
> SELECT n_nationkey, n_name
> FROM (
>   SELECT * FROM cp.`tpch/nation.parquet`
> ) a
> WHERE EXISTS (SELECT 1
>     FROM cp.`tpch/region.parquet` b
>     WHERE b.r_regionkey = a.n_regionkey
>   )
> +--------------+---------+
> | n_nationkey  | n_name  |
> +--------------+---------+
> +--------------+---------+
> No rows selected (0.515 seconds)
> {code}
>
> All cases can be reproduced without a view; a sub-select with star is enough.
> For example:
> {code}
> SELECT n_nationkey, n_name
> FROM (select * from cp.`tpch/nation.parquet`) a
> WHERE NOT EXISTS (SELECT 1
>     FROM cp.`tpch/region.parquet` b
>     WHERE b.r_regionkey = a.n_regionkey
>   )
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
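For reference, the expected EXISTS / NOT EXISTS results can be modelled outside Drill. The sketch below is only a hypothetical reference model, not Drill code: the class name, the method name and the handful of sample rows are assumptions made for illustration. It assumes, as the issue implies, that every n_regionkey has a matching r_regionkey.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical reference model, not Drill code.
final class ExistsSemanticsSketch {

    // Sample rows as {n_nationkey, n_regionkey}; assumed values for illustration.
    static final int[][] SAMPLE_NATIONS = {{0, 0}, {1, 1}, {2, 1}, {24, 1}};

    // Assumed r_regionkey values of the region table.
    static final Set<Integer> REGION_KEYS = new HashSet<>(Arrays.asList(0, 1, 2, 3, 4));

    // Returns the n_nationkey values kept by
    // WHERE [NOT] EXISTS (SELECT 1 FROM region b WHERE b.r_regionkey = a.n_regionkey).
    static List<Integer> filter(int[][] nations, Set<Integer> regionKeys, boolean notExists) {
        List<Integer> kept = new ArrayList<>();
        for (int[] nation : nations) {
            boolean exists = regionKeys.contains(nation[1]);
            // EXISTS keeps rows with a match, NOT EXISTS keeps rows without one.
            if (exists != notExists) {
                kept.add(nation[0]);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println("EXISTS keeps:     " + filter(SAMPLE_NATIONS, REGION_KEYS, false));
        System.out.println("NOT EXISTS keeps: " + filter(SAMPLE_NATIONS, REGION_KEYS, true));
    }
}
```

With every nation row matching a region, EXISTS keeps all rows and NOT EXISTS keeps none, which is the opposite of what Q1 to Q4 return.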
[jira] [Assigned] (DRILL-5683) Incorrect query result when query uses NOT(IS NOT NULL) expression
[ https://issues.apache.org/jira/browse/DRILL-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vitalii Diravka reassigned DRILL-5683:
--------------------------------------

    Assignee: Vitalii Diravka  (was: Jinfeng Ni)

> Incorrect query result when query uses NOT(IS NOT NULL) expression
> ------------------------------------------------------------------
>
>                 Key: DRILL-5683
>                 URL: https://issues.apache.org/jira/browse/DRILL-5683
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Jinfeng Ni
>            Assignee: Vitalii Diravka
>
> The following repro was modified from a test case provided by Arjun Rajan (ara...@mapr.com).
>
> 1. Prepare a dataset with nulls.
> {code}
> create table dfs.tmp.t1 as
> select r_regionkey, r_name, case when mod(r_regionkey, 3) > 0 then mod(r_regionkey, 3) else null end as flag
> from cp.`tpch/region.parquet`;
> select * from dfs.tmp.t1;
> +--------------+--------------+-------+
> | r_regionkey  |    r_name    | flag  |
> +--------------+--------------+-------+
> | 0            | AFRICA       | null  |
> | 1            | AMERICA      | 1     |
> | 2            | ASIA         | 2     |
> | 3            | EUROPE       | null  |
> | 4            | MIDDLE EAST  | 1     |
> +--------------+--------------+-------+
> {code}
> 2. Query with a NOT(IS NOT NULL) expression in the filter.
> {code}
> select * from dfs.tmp.t1 where NOT (flag IS NOT NULL);
> +--------------+---------+-------+
> | r_regionkey  | r_name  | flag  |
> +--------------+---------+-------+
> | 0            | AFRICA  | null  |
> | 3            | EUROPE  | null  |
> +--------------+---------+-------+
> {code}
> 3. Switch the run-time code compiler from the default to 'JDK', and get a wrong result.
> {code}
> alter system set `exec.java_compiler` = 'JDK';
> +-------+------------------------------+
> |  ok   |           summary            |
> +-------+------------------------------+
> | true  | exec.java_compiler updated.  |
> +-------+------------------------------+
> select * from dfs.tmp.t1 where NOT (flag IS NOT NULL);
> +--------------+--------------+-------+
> | r_regionkey  |    r_name    | flag  |
> +--------------+--------------+-------+
> | 0            | AFRICA       | null  |
> | 1            | AMERICA      | 1     |
> | 2            | ASIA         | 2     |
> | 3            | EUROPE       | null  |
> | 4            | MIDDLE EAST  | 1     |
> +--------------+--------------+-------+
> {code}
> 4. A wrong result can also occur when NOT(IS NOT NULL) appears in the Project operator.
> {code}
> select r_regionkey, r_name, NOT(flag IS NOT NULL) as exp1 from dfs.tmp.t1;
> +--------------+--------------+-------+
> | r_regionkey  |    r_name    | exp1  |
> +--------------+--------------+-------+
> | 0            | AFRICA       | true  |
> | 1            | AMERICA      | true  |
> | 2            | ASIA         | true  |
> | 3            | EUROPE       | true  |
> | 4            | MIDDLE EAST  | true  |
> +--------------+--------------+-------+
> {code}
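As a reference for what steps 2 to 4 should return, here is a minimal Java sketch of the expected semantics. It is not Drill's generated code; the class and method names are hypothetical. It relies only on the fact that IS NOT NULL never evaluates to SQL NULL, so NOT (flag IS NOT NULL) must agree with flag IS NULL.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reference check, not Drill code: models the nullable `flag`
// column of dfs.tmp.t1 and evaluates NOT (flag IS NOT NULL) for each row.
final class NotIsNotNullCheck {

    // IS NOT NULL always yields true or false, so the outer NOT is a
    // plain boolean negation and must agree with "flag IS NULL".
    static boolean notIsNotNull(Integer flag) {
        boolean isNotNull = flag != null;
        return !isNotNull;
    }

    // Same construction as the CASE expression in step 1:
    // mod(key, 3) when it is greater than 0, otherwise null.
    static Integer flagFor(int regionKey) {
        int m = regionKey % 3;
        return m > 0 ? Integer.valueOf(m) : null;
    }

    // Region keys (0..4) that the filter in steps 2 and 3 should keep.
    static List<Integer> expectedFilteredKeys() {
        List<Integer> keys = new ArrayList<>();
        for (int key = 0; key <= 4; key++) {
            if (notIsNotNull(flagFor(key))) {
                keys.add(key);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println("filter keeps r_regionkey: " + expectedFilteredKeys());
        for (int key = 0; key <= 4; key++) {
            System.out.println(key + " -> exp1 = " + notIsNotNull(flagFor(key)));
        }
    }
}
```

Under this model the filter keeps only keys 0 and 3, and exp1 is true only for those two rows, so the all-true column in step 4 is wrong for keys 1, 2 and 4.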
[jira] [Commented] (DRILL-5952) Implement "CREATE TABLE IF NOT EXISTS"
[ https://issues.apache.org/jira/browse/DRILL-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16265423#comment-16265423 ] ASF GitHub Bot commented on DRILL-5952: --- Github user prasadns14 commented on a diff in the pull request: https://github.com/apache/drill/pull/1033#discussion_r153004699 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/SqlCreateView.java --- @@ -47,20 +47,19 @@ public SqlCall createCall(SqlLiteral functionQualifier, SqlParserPos pos, SqlNod private SqlIdentifier viewName; private SqlNodeList fieldList; private SqlNode query; - private boolean replaceView; + private SqlLiteral createViewType; - public SqlCreateView(SqlParserPos pos, SqlIdentifier viewName, SqlNodeList fieldList, - SqlNode query, SqlLiteral replaceView) { -this(pos, viewName, fieldList, query, replaceView.booleanValue()); + public enum SqlCreateViewType { +SIMPLE, ORREPLACE, IFNOTEXISTS --- End diff -- done > Implement "CREATE TABLE IF NOT EXISTS" > -- > > Key: DRILL-5952 > URL: https://issues.apache.org/jira/browse/DRILL-5952 > Project: Apache Drill > Issue Type: Improvement > Components: SQL Parser >Affects Versions: 1.11.0 >Reporter: Prasad Nagaraj Subramanya >Assignee: Prasad Nagaraj Subramanya > Fix For: Future > > > Currently, if a table/view with the same name exists CREATE TABLE fails with > VALIDATION ERROR > Having "IF NOT EXISTS" support for CREATE TABLE will ensure that query > succeeds -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5952) Implement "CREATE TABLE IF NOT EXISTS"
[ https://issues.apache.org/jira/browse/DRILL-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16265424#comment-16265424 ] ASF GitHub Bot commented on DRILL-5952: --- Github user prasadns14 commented on the issue: https://github.com/apache/drill/pull/1033 @arina-ielchiieva, please review > Implement "CREATE TABLE IF NOT EXISTS" > -- > > Key: DRILL-5952 > URL: https://issues.apache.org/jira/browse/DRILL-5952 > Project: Apache Drill > Issue Type: Improvement > Components: SQL Parser >Affects Versions: 1.11.0 >Reporter: Prasad Nagaraj Subramanya >Assignee: Prasad Nagaraj Subramanya > Fix For: Future > > > Currently, if a table/view with the same name exists CREATE TABLE fails with > VALIDATION ERROR > Having "IF NOT EXISTS" support for CREATE TABLE will ensure that query > succeeds -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (DRILL-5963) Canceling a query hung in planning state, leaves the query in ENQUEUED state for ever.
[ https://issues.apache.org/jira/browse/DRILL-5963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arina Ielchiieva updated DRILL-5963:
------------------------------------
    Fix Version/s: 1.13.0

> Canceling a query hung in planning state, leaves the query in ENQUEUED state for ever.
> --------------------------------------------------------------------------------------
>
>                 Key: DRILL-5963
>                 URL: https://issues.apache.org/jira/browse/DRILL-5963
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.12.0
>         Environment: Drill 1.12.0-SNAPSHOT, commit: 4a718a0bd728ae02b502ac93620d132f0f6e1b6c
>            Reporter: Khurram Faraaz
>            Assignee: Arina Ielchiieva
>            Priority: Critical
>             Fix For: 1.13.0
>
>         Attachments: enqueued-2.png
>
>
> Canceling the query below, which is hung in the planning state, leaves the query in the ENQUEUED state forever.
> Here is the query that is hung in the planning state:
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select 1 || ',' || 2 || ',' || 3 || ',' || 4 || ',' || 5 || ',' || 6 || ',' || 7 || ',' || 8 || ',' || 9 || ',' || 0 || ',' AS CSV_DATA from (values(1));
> +--+
> |  |
> +--+
> +--+
> No rows selected (304.291 seconds)
> {noformat}
> Explain plan for that query also just hangs.
> {noformat}
> explain plan for select 1 || ',' || 2 || ',' || 3 || ',' || 4 || ',' || 5 || ',' || 6 || ',' || 7 || ',' || 8 || ',' || 9 || ',' || 0 || ',' AS CSV_DATA from (values(1));
> ...
> {noformat}
> The above issues show the following problems:
>
> *1. A simple query with a reasonable number of concat functions hangs.*
> In reality the query does not hang; it just takes a lot of time to execute. The root cause is that, during planning, the DrillFuncHolderExpr return type is used extensively to determine the matching function, matching type, etc. Although this type is retrieved via a [getter|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/DrillFuncHolderExpr.java#L41], complex logic is actually executed beneath it, for example for the [concat function|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/output/ConcatReturnTypeInference.java#L47].
> Since the function return type cannot change during the DrillFuncHolderExpr lifetime, it is safe to cache it.
>
> *2. No mechanism to cancel a query during the ENQUEUED state.*
> Currently Drill has no mechanism to cancel a query before the STARTING / RUNNING states. In addition, the ENQUEUED state actually covers two phases: PLANNING and ENQUEUED.
> Also, the mechanism for submitting a query to the queue is blocking, which makes the foreman wait until enqueueing is done. Making it non-blocking avoids consuming threads that just sit idle in a busy system, and is also important when we move to a real admission control solution.
>
> The following changes were made to address the above issues:
> a. Two new states were added: PREPARING (when the foreman is initialized) and PLANNING (includes logical and / or physical planning).
> b. The process of query enqueuing was made non-blocking. Once the query is enqueued, the fragments runner is called to submit fragments locally and remotely.
> c. The ability to cancel a query during the planning and enqueued states was added.
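To make the caching argument in point 1 concrete, here is a minimal sketch. It is not Drill's DrillFuncHolderExpr: the class names, the counter, and the assumption that a caller consults each argument's return type twice are all hypothetical, chosen only to show how repeated inference over a left-nested concat chain grows and how memoizing the immutable result removes that growth.

```java
// Hypothetical sketch, not Drill code: compares repeated return type
// inference with a memoized variant on a left-nested chain of concat calls.
final class ReturnTypeCacheSketch {

    // Counts how many times the (pretend) inference logic runs.
    static long inferenceRuns = 0;

    static final class ConcatExpr {
        private final ConcatExpr left;   // null for a leaf literal
        private final boolean useCache;
        private String cached;           // null until first computed

        ConcatExpr(ConcatExpr left, boolean useCache) {
            this.left = left;
            this.useCache = useCache;
        }

        String getReturnType() {
            if (useCache && cached != null) {
                return cached;           // safe: the type cannot change later
            }
            inferenceRuns++;
            if (left != null) {
                // Assumption: the argument type is consulted more than once,
                // e.g. once to match the function and once to match the type.
                left.getReturnType();
                left.getReturnType();
            }
            cached = "VARCHAR";
            return cached;
        }
    }

    // Builds a chain with `depth` nested concat calls and counts inference runs.
    static long countRuns(int depth, boolean useCache) {
        inferenceRuns = 0;
        ConcatExpr expr = new ConcatExpr(null, useCache);
        for (int i = 0; i < depth; i++) {
            expr = new ConcatExpr(expr, useCache);
        }
        expr.getReturnType();
        return inferenceRuns;
    }

    public static void main(String[] args) {
        System.out.println("without cache: " + countRuns(20, false));
        System.out.println("with cache:    " + countRuns(20, true));
    }
}
```

In this model the uncached count doubles with every extra level of nesting (2^(depth+1) - 1 runs), while the cached variant runs the inference once per expression. The sketch is single-threaded; a shared expression would need a thread-safe variant.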
[jira] [Created] (DRILL-5991) Performance improvements for Hive tables with skip header / footer logic
Arina Ielchiieva created DRILL-5991:
---------------------------------------

             Summary: Performance improvements for Hive tables with skip header / footer logic
                 Key: DRILL-5991
                 URL: https://issues.apache.org/jira/browse/DRILL-5991
             Project: Apache Drill
          Issue Type: Improvement
          Components: Storage - Hive
    Affects Versions: 1.12.0
            Reporter: Arina Ielchiieva


Currently, when a Hive table has a header / footer, all input splits of the file are processed by one reader. This hurts performance. A better way would be to keep one reader per split and find a way to tell each reader how many rows it should skip.

To create a reader for each input split and still maintain the skip header / footer functionality, we need to know how many rows each input split contains. Unfortunately, an input split does not hold that information, only the [number of bytes|https://hadoop.apache.org/docs/r2.7.0/api/org/apache/hadoop/mapred/FileSplit.html]. We also can't simply apply skip header to the first input split and skip footer to the last one, since we don't know how many rows would be skipped: we might need to skip the whole first input split and part of the second. In addition, we use the [Hadoop reader|https://hadoop.apache.org/docs/r2.7.0/api/org/apache/hadoop/mapred/RecordReader.html] for the data and don't have information about the number of rows in an input split.

Possible improvements:
1. For a table with a header only: before creating readers, start skipping the header and, when done, create a reader at that position. For the other, untouched input splits, create separate readers, though all readers will be on the same node.
2. Consider using the Drill text reader instead of the Hadoop one (as we do for parquet files), which might provide more flexibility in terms of offsetting bytes etc. This should be investigated further.
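The point about a header spilling over into the second split can be illustrated with a small sketch. This is a hypothetical helper, not Drill or Hadoop code; it assumes the per-split row counts are known, which is exactly the information the issue says an input split does not provide.

```java
import java.util.Arrays;

// Hypothetical sketch, not Drill or Hadoop code: distributes the header rows
// to skip across input splits, given the number of rows in each split.
final class HeaderSkipSketch {

    // rowsPerSplit[i] is the (assumed known) number of rows in split i.
    // Returns how many leading rows the reader for each split would skip.
    static int[] rowsToSkipPerSplit(int[] rowsPerSplit, int headerRows) {
        int[] skip = new int[rowsPerSplit.length];
        int remaining = headerRows;
        for (int i = 0; i < rowsPerSplit.length && remaining > 0; i++) {
            // A split can absorb at most as many header rows as it contains.
            skip[i] = Math.min(rowsPerSplit[i], remaining);
            remaining -= skip[i];
        }
        return skip;
    }

    public static void main(String[] args) {
        // Three splits with 2, 5 and 5 rows and a 3-row header:
        // the whole first split and one row of the second are skipped.
        System.out.println(Arrays.toString(rowsToSkipPerSplit(new int[] {2, 5, 5}, 3)));
    }
}
```

Without the row counts, a reader for the second split has no way to tell that one of its rows still belongs to the header, which is why the proposal above skips the header first and only then creates a reader at that position.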
[jira] [Commented] (DRILL-5952) Implement "CREATE TABLE IF NOT EXISTS"
[ https://issues.apache.org/jira/browse/DRILL-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16265095#comment-16265095 ] ASF GitHub Bot commented on DRILL-5952: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1033#discussion_r152929802 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/SqlCreateView.java --- @@ -47,20 +47,19 @@ public SqlCall createCall(SqlLiteral functionQualifier, SqlParserPos pos, SqlNod private SqlIdentifier viewName; private SqlNodeList fieldList; private SqlNode query; - private boolean replaceView; + private SqlLiteral createViewType; - public SqlCreateView(SqlParserPos pos, SqlIdentifier viewName, SqlNodeList fieldList, - SqlNode query, SqlLiteral replaceView) { -this(pos, viewName, fieldList, query, replaceView.booleanValue()); + public enum SqlCreateViewType { +SIMPLE, ORREPLACE, IFNOTEXISTS --- End diff -- Please use underscore to separate words: `OR_REPLACE`, `IF_NOT_EXISTS`. > Implement "CREATE TABLE IF NOT EXISTS" > -- > > Key: DRILL-5952 > URL: https://issues.apache.org/jira/browse/DRILL-5952 > Project: Apache Drill > Issue Type: Improvement > Components: SQL Parser >Affects Versions: 1.11.0 >Reporter: Prasad Nagaraj Subramanya >Assignee: Prasad Nagaraj Subramanya > Fix For: Future > > > Currently, if a table/view with the same name exists CREATE TABLE fails with > VALIDATION ERROR > Having "IF NOT EXISTS" support for CREATE TABLE will ensure that query > succeeds -- This message was sent by Atlassian JIRA (v6.4.14#64029)
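A minimal sketch of the enum with the requested underscore naming follows. Only the three constant names come from the review comment; the wrapper class, the helper method and its string results are hypothetical and are not taken from the pull request.

```java
// Hypothetical sketch, not the code from the pull request.
final class CreateViewTypeSketch {

    // Constant names follow the review comment: words separated by underscores.
    enum SqlCreateViewType {
        SIMPLE, OR_REPLACE, IF_NOT_EXISTS
    }

    // Hypothetical helper: what to do when a view with the same name already exists.
    static String actionWhenViewExists(SqlCreateViewType type) {
        switch (type) {
            case OR_REPLACE:
                return "replace";   // CREATE OR REPLACE VIEW overwrites the view
            case IF_NOT_EXISTS:
                return "skip";      // the statement succeeds without changes
            default:
                return "fail";      // plain CREATE reports a validation error
        }
    }

    public static void main(String[] args) {
        for (SqlCreateViewType type : SqlCreateViewType.values()) {
            System.out.println(type + " -> " + actionWhenViewExists(type));
        }
    }
}
```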
[jira] [Commented] (DRILL-5919) Add non-numeric support for JSON processing
[ https://issues.apache.org/jira/browse/DRILL-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16265061#comment-16265061 ]

Volodymyr Tkach commented on DRILL-5919:
----------------------------------------

The following math functions result in a NumberFormatException, because Calcite and Drill currently handle the FLOAT and DOUBLE types using the BigDecimal class, which doesn't support NaN and Infinity values.
I am currently investigating the problem and looking for a different way to handle NaN and Infinity values.
*Query example:*
_select sin(cast('NaN' as float)) from (values(1))_
* div
* divide
* add
* multiply
* tanh
* sin
* asin
* cos
* cot
* acos
* sqrt
* ceil
* negative
* castFLOAT4
* abs
* floor
* exp
* subtract
* sinh
* cbrt
* mod
* degrees
* trunc
* trunc
* casthigh
* log
* log
* power
* atan
* tan
* radians
* cosh
* round
* round
* convertToNullableFLOAT8
* convertToNullableFLOAT4

> Add non-numeric support for JSON processing
> -------------------------------------------
>
>                 Key: DRILL-5919
>                 URL: https://issues.apache.org/jira/browse/DRILL-5919
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - JSON
>    Affects Versions: 1.11.0
>            Reporter: Volodymyr Tkach
>            Assignee: Volodymyr Tkach
>              Labels: doc-impacting
>             Fix For: Future
>
>
> Add session options that allow Drill to work with non-standard JSON number literals such as NaN, Infinity and -Infinity. By default these options will be switched off; the user will be able to toggle them during a working session.
> *For documentation*
> 1. Added two session options, {{store.json.reader.non_numeric_numbers}} and {{store.json.reader.non_numeric_numbers}}, that allow reading / writing NaN and Infinity as numbers. By default these options are set to false.
> 2. Extended the signature of the {{convert_toJSON}} and {{convert_fromJSON}} functions by adding a second, optional parameter that enables reading / writing NaN and Infinity.
> For example:
> {noformat}
> select convert_fromJSON('{"key": NaN}') from (values(1)); will result in a JsonParseException, but
> select convert_fromJSON('{"key": NaN}', true) from (values(1)); will parse NaN as a number.
> {noformat}
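The BigDecimal limitation mentioned in the comment above can be reproduced with plain JDK classes. The sketch below is not Drill or Calcite code, and the class and method names are hypothetical; it only shows that the java.math.BigDecimal(double) constructor rejects NaN and infinite values with a NumberFormatException.

```java
import java.math.BigDecimal;

// Hypothetical sketch, not Drill or Calcite code: shows which double values
// the BigDecimal(double) constructor accepts.
final class BigDecimalNaNSketch {

    // Returns true when the value can be represented as a BigDecimal.
    static boolean fitsInBigDecimal(double value) {
        try {
            new BigDecimal(value);
            return true;
        } catch (NumberFormatException e) {
            // Thrown by the JDK for NaN, Infinity and -Infinity.
            return false;
        }
    }

    public static void main(String[] args) {
        double[] samples = {1.5, Double.NaN, Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY};
        for (double sample : samples) {
            System.out.println(sample + " fits in BigDecimal: " + fitsInBigDecimal(sample));
        }
    }
}
```

Any code path that converts a FLOAT or DOUBLE value through BigDecimal therefore needs a separate branch for these three values, for example a check with Double.isNaN and Double.isInfinite before the conversion.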