[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files
[ https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933866#comment-16933866 ] ASF GitHub Bot commented on DRILL-7177: --- cgivre commented on issue #1749: DRILL-7177: Format Plugin for Excel Files URL: https://github.com/apache/drill/pull/1749#issuecomment-533351781 @vdiravka, @arina-ielchiieva this PR is ready for review with the EVF. Could someone please take a look? Thx This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Format Plugin for Excel Files > - > > Key: DRILL-7177 > URL: https://issues.apache.org/jira/browse/DRILL-7177 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.17.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Labels: doc-impacting > Fix For: 1.17.0 > > > This pull request adds the functionality which enables Drill to query > Microsoft Excel files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (DRILL-7381) Query to a map field returns nulls with hive native reader
[ https://issues.apache.org/jira/browse/DRILL-7381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko reassigned DRILL-7381: --- Assignee: Igor Guzenko > Query to a map field returns nulls with hive native reader > -- > > Key: DRILL-7381 > URL: https://issues.apache.org/jira/browse/DRILL-7381 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.17.0 >Reporter: Anton Gozhiy >Assignee: Igor Guzenko >Priority: Major > Attachments: customer_complex.zip > > > *Query:* > {code:sql} > select t.c_nation.n_region.r_name from hive.customer_complex t limit 5 > {code} > *Expected results:* > {noformat} > AFRICA > MIDDLE EAST > AMERICA > MIDDLE EAST > AMERICA > {noformat} > *Actual results:* > {noformat} > null > null > null > null > null > {noformat} > *Workaround:* > {code:sql} > set store.hive.optimize_scan_with_native_readers = false; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (DRILL-7381) Query to a map field returns nulls with hive native reader
Anton Gozhiy created DRILL-7381: --- Summary: Query to a map field returns nulls with hive native reader Key: DRILL-7381 URL: https://issues.apache.org/jira/browse/DRILL-7381 Project: Apache Drill Issue Type: Bug Affects Versions: 1.17.0 Reporter: Anton Gozhiy Attachments: customer_complex.zip *Query:* {code:sql} select t.c_nation.n_region.r_name from hive.customer_complex t limit 5 {code} *Expected results:* {noformat} AFRICA MIDDLE EAST AMERICA MIDDLE EAST AMERICA {noformat} *Actual results:* {noformat} null null null null null {noformat} *Workaround:* {code:sql} set store.hive.optimize_scan_with_native_readers = false; {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (DRILL-7380) Query of a field inside of an array of structs returns null
[ https://issues.apache.org/jira/browse/DRILL-7380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Guzenko reassigned DRILL-7380: --- Assignee: Igor Guzenko > Query of a field inside of an array of structs returns null > --- > > Key: DRILL-7380 > URL: https://issues.apache.org/jira/browse/DRILL-7380 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.17.0 >Reporter: Anton Gozhiy >Assignee: Igor Guzenko >Priority: Major > Attachments: customer_complex.zip > > > *Query:* > {code:sql} > select t.c_orders[0].o_orderstatus from hive.customer_complex t limit 10; > {code} > *Expected results (given from Hive):* > {noformat} > OK > O > F > NULL > O > O > NULL > O > O > NULL > F > {noformat} > *Actual results:* > {noformat} > null > null > null > null > null > null > null > null > null > null > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (DRILL-7380) Query of a field inside of an array of structs returns null
Anton Gozhiy created DRILL-7380: --- Summary: Query of a field inside of an array of structs returns null Key: DRILL-7380 URL: https://issues.apache.org/jira/browse/DRILL-7380 Project: Apache Drill Issue Type: Bug Affects Versions: 1.17.0 Reporter: Anton Gozhiy Attachments: customer_complex.zip *Query:* {code:sql} select t.c_orders[0].o_orderstatus from hive.customer_complex t limit 10; {code} *Expected results (given from Hive):* {noformat} OK O F NULL O O NULL O O NULL F {noformat} *Actual results:* {noformat} null null null null null null null null null null {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (DRILL-2000) Hive generated parquet files with maps show up in drill as map(key value)
[ https://issues.apache.org/jira/browse/DRILL-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bohdan Kazydub resolved DRILL-2000. --- Fix Version/s: (was: Future) Resolution: Fixed Fixed in scope of DRILL-7096 > Hive generated parquet files with maps show up in drill as map(key value) > - > > Key: DRILL-2000 > URL: https://issues.apache.org/jira/browse/DRILL-2000 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Parquet >Affects Versions: 0.7.0 >Reporter: Ramana Inukonda Nagaraj >Assignee: Bohdan Kazydub >Priority: Major > > Created a parquet file in hive having the following DDL > hive> desc alltypesparquet; > OK > c1 int > c2 boolean > c3 double > c4 string > c5 array > c6 map > c7 map > c8 struct > c9 tinyint > c10 smallint > c11 float > c12 bigint > c13 array> > c15 struct> > c16 array,n:int>> > Time taken: 0.076 seconds, Fetched: 15 row(s) > Columns which are maps such as c6 map > show up as > 0: jdbc:drill:> select c6 from `/user/hive/warehouse/alltypesparquet`; > ++ > | c6 | > ++ > | {"map":[]} | > | {"map":[]} | > | {"map":[{"key":1,"value":"eA=="},{"key":2,"value":"eQ=="}]} | > ++ > 3 rows selected (0.078 seconds) > hive> select c6 from alltypesparquet; > NULL > NULL > {1:"x",2:"y"} > Ignore the wrong values, I have raised DRILL-1997 for the same. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (DRILL-7379) Planning error
benj created DRILL-7379: --- Summary: Planning error Key: DRILL-7379 URL: https://issues.apache.org/jira/browse/DRILL-7379 Project: Apache Drill Issue Type: Bug Components: Functions - Drill Affects Versions: 1.16.0 Reporter: benj With data as: {code:sql} SELECT id, tags FROM `example_parquet`; +++ | id |tags| +++ | 7b8808 | ["peexe","signed","overlay"] | | 55a4ae | ["peexe","signed","upx","overlay"] | +++ {code} The next request is OK {code:sql} SELECT id, flatten(tags) tag FROM ( SELECT id, any_value(tags) tags FROM `example_parquet` GROUP BY id ) LIMIT 2; +++ | id | tag | +++ | 55a4ae | peexe | | 55a4ae | signed | +++ {code} But unexpectedly, the next query failed: {code:sql} SELECT tag, count(*) FROM ( SELECT flatten(tags) tag FROM ( SELECT id, any_value(tags) tags FROM `example_parquet` GROUP BY id ) ) GROUP BY tag; Error: SYSTEM ERROR: UnsupportedOperationException: Map, Array, Union or repeated scalar type should not be used in group by, order by or in a comparison operator. Drill does not support compare between MAP:REPEATED and MAP:REPEATED. /* Or other error with another set of data : Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming schema. Errors: Error in expression at index 0. Error: Missing function implementation: [hash32asdouble(MAP-REPEATED, INT-REQUIRED)]. Full expression: null.. */ {code} These errors are incomprehensible because, the aggregate is on VARCHAR. More, the request works if decomposed in 2 request with with the creation of an intermediate table like below: {code:sql} CREATE TABLE `tmp.parquet` AS ( SELECT id, flatten(tags) tag FROM ( SELECT id, any_value(tags) tags FROM `example_parquet` GROUP BY id )); SELECT tag, count(*) c FROM `tmp_parquet` GROUP BY tag; +-+---+ | tag | c | +-+---+ | overlay | 2 | | peexe | 2 | | signed | 2 | | upx | 1 | +-+---+ {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (DRILL-7378) Allowing less outer/inner select
benj created DRILL-7378: --- Summary: Allowing less outer/inner select Key: DRILL-7378 URL: https://issues.apache.org/jira/browse/DRILL-7378 Project: Apache Drill Issue Type: Improvement Components: Functions - Drill Affects Versions: 1.16.0 Reporter: benj Currently, it's not possible to exploit the result of some function like _kvgen_ or _flatten_ and an inner/outer select is needed for some operations. It will be easiest to allow the use of the results of theses functions directly. Example: {code:sql} CONVERT_FROM('{"Tuesday":{"close":"22:00"},"Friday":{"close":"23:00"}}','JSON') j; +--+ |j | +--+ | {"Tuesday":{"close":"22:00"},"Friday":{"close":"23:00"}} | +--+ {code} But it's not possible to simply do {code:sql} SELECT kvgen(CONVERT_FROM('{"Tuesday":{"close":"22:00"},"Friday":{"close":"23:00"}}','JSON')); Error: PLAN ERROR: Failure while materializing expression in constant expression evaluator [CONVERT_FROM('{"Tuesday":{"close":"22:00"},"Friday":{"close":"23:00"}}', 'JSON')]. Errors: Error in expression at index -1. Error: Only ProjectRecordBatch could have complex writer function. You are using complex writer function convert_fromJSON in a non-project operation!. Full expression: --UNKNOWN EXPRESSION--. {code} It's only possible to do {code:sql} SELECT kvgen(c) AS k FROM (SELECT CONVERT_FROM('{"Tuesday":{"close":"22:00"},"Friday":{"close":"23:00"}}','JSON') c); +--+ |k | +--+ | [{"key":"Tuesday","value":{"close":"22:00"}},{"key":"Friday","value":{"close":"23:00"}}] | +--+ {code} Its possible to cascade with flatten: {code:sql} SELECT flatten(kvgen(c)) f FROM (SELECT CONVERT_FROM('{"Tuesday":{"close":"22:00"},"Friday":{"close":"23:00"}}','JSON') c); +-+ | f | +-+ | {"key":"Tuesday","value":{"close":"22:00"}} | | {"key":"Friday","value":{"close":"23:00"}} | +-+ {code} But it's not possible to use directly use the result of flatten to select key or value {code:sql} SELECT (flatten(kvgen(r.c))).key f FROM (SELECT CONVERT_FROM('{"Tuesday":{"close":"22:00"},"Friday":{"close":"23:00"}}','JSON') c) r; Error: VALIDATION ERROR: From line 1, column 9 to line 1, column 27: Incompatible types {code} You have to inner/outer select like: {code:sql} SELECT r.f.key k FROM (SELECT flatten(kvgen(c)) f FROM (SELECT CONVERT_FROM('{"Tuesday":{"close":"22:00"},"Friday":{"close":"23:00"}}','JSON') c)) r; +-+ |k| +-+ | Tuesday | | Friday | +-+ {code} it would be useful to be able to write/read shorter and simpler queries with limiting when it's possible the need of inner/outer SELECT. -- This message was sent by Atlassian Jira (v8.3.4#803005)