[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-09-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933866#comment-16933866
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

cgivre commented on issue #1749: DRILL-7177: Format Plugin for Excel Files
URL: https://github.com/apache/drill/pull/1749#issuecomment-533351781
 
 
   @vdiravka, @arina-ielchiieva this PR is ready for review with the EVF.   
Could someone please take a look?  Thx
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Format Plugin for Excel Files
> -
>
> Key: DRILL-7177
> URL: https://issues.apache.org/jira/browse/DRILL-7177
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.17.0
>
>
> This pull request adds the functionality which enables Drill to query 
> Microsoft Excel files. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (DRILL-7381) Query to a map field returns nulls with hive native reader

2019-09-19 Thread Igor Guzenko (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko reassigned DRILL-7381:
---

Assignee: Igor Guzenko

> Query to a map field returns nulls with hive native reader
> --
>
> Key: DRILL-7381
> URL: https://issues.apache.org/jira/browse/DRILL-7381
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.17.0
>Reporter: Anton Gozhiy
>Assignee: Igor Guzenko
>Priority: Major
> Attachments: customer_complex.zip
>
>
> *Query:*
> {code:sql}
> select t.c_nation.n_region.r_name from hive.customer_complex t limit 5
> {code}
> *Expected results:*
> {noformat}
> AFRICA
> MIDDLE EAST
> AMERICA
> MIDDLE EAST
> AMERICA
> {noformat}
> *Actual results:*
> {noformat}
> null
> null
> null
> null
> null
> {noformat}
> *Workaround:*
> {code:sql}
> set store.hive.optimize_scan_with_native_readers = false;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (DRILL-7381) Query to a map field returns nulls with hive native reader

2019-09-19 Thread Anton Gozhiy (Jira)
Anton Gozhiy created DRILL-7381:
---

 Summary: Query to a map field returns nulls with hive native reader
 Key: DRILL-7381
 URL: https://issues.apache.org/jira/browse/DRILL-7381
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.17.0
Reporter: Anton Gozhiy
 Attachments: customer_complex.zip

*Query:*
{code:sql}
select t.c_nation.n_region.r_name from hive.customer_complex t limit 5
{code}

*Expected results:*
{noformat}
AFRICA
MIDDLE EAST
AMERICA
MIDDLE EAST
AMERICA
{noformat}

*Actual results:*
{noformat}
null
null
null
null
null
{noformat}

*Workaround:*

{code:sql}
set store.hive.optimize_scan_with_native_readers = false;
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (DRILL-7380) Query of a field inside of an array of structs returns null

2019-09-19 Thread Igor Guzenko (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko reassigned DRILL-7380:
---

Assignee: Igor Guzenko

> Query of a field inside of an array of structs returns null
> ---
>
> Key: DRILL-7380
> URL: https://issues.apache.org/jira/browse/DRILL-7380
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.17.0
>Reporter: Anton Gozhiy
>Assignee: Igor Guzenko
>Priority: Major
> Attachments: customer_complex.zip
>
>
> *Query:*
> {code:sql}
> select t.c_orders[0].o_orderstatus from hive.customer_complex t limit 10;
> {code}
> *Expected results (given from Hive):*
> {noformat}
> OK
> O
> F
> NULL
> O
> O
> NULL
> O
> O
> NULL
> F
> {noformat}
> *Actual results:*
> {noformat}
> null
> null
> null
> null
> null
> null
> null
> null
> null
> null
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (DRILL-7380) Query of a field inside of an array of structs returns null

2019-09-19 Thread Anton Gozhiy (Jira)
Anton Gozhiy created DRILL-7380:
---

 Summary: Query of a field inside of an array of structs returns 
null
 Key: DRILL-7380
 URL: https://issues.apache.org/jira/browse/DRILL-7380
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.17.0
Reporter: Anton Gozhiy
 Attachments: customer_complex.zip

*Query:*
{code:sql}
select t.c_orders[0].o_orderstatus from hive.customer_complex t limit 10;
{code}

*Expected results (given from Hive):*
{noformat}
OK
O
F
NULL
O
O
NULL
O
O
NULL
F
{noformat}

*Actual results:*
{noformat}
null
null
null
null
null
null
null
null
null
null
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (DRILL-2000) Hive generated parquet files with maps show up in drill as map(key value)

2019-09-19 Thread Bohdan Kazydub (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bohdan Kazydub resolved DRILL-2000.
---
Fix Version/s: (was: Future)
   Resolution: Fixed

Fixed in scope of DRILL-7096

> Hive generated parquet files with maps show up in drill as map(key value)
> -
>
> Key: DRILL-2000
> URL: https://issues.apache.org/jira/browse/DRILL-2000
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Affects Versions: 0.7.0
>Reporter: Ramana Inukonda Nagaraj
>Assignee: Bohdan Kazydub
>Priority: Major
>
> Created a parquet file in hive having the following DDL
> hive> desc alltypesparquet; 
> OK
> c1 int 
> c2 boolean 
> c3 double 
> c4 string 
> c5 array 
> c6 map 
> c7 map 
> c8 struct
> c9 tinyint 
> c10 smallint 
> c11 float 
> c12 bigint 
> c13 array>  
> c15 struct>
> c16 array,n:int>> 
> Time taken: 0.076 seconds, Fetched: 15 row(s)
> Columns which are maps such as c6 map 
> show up as 
> 0: jdbc:drill:> select c6 from `/user/hive/warehouse/alltypesparquet`;
> ++
> | c6 |
> ++
> | {"map":[]} |
> | {"map":[]} |
> | {"map":[{"key":1,"value":"eA=="},{"key":2,"value":"eQ=="}]} |
> ++
> 3 rows selected (0.078 seconds)
> hive> select c6 from alltypesparquet;   
> NULL
> NULL
> {1:"x",2:"y"}
> Ignore the wrong values, I have raised DRILL-1997 for the same. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (DRILL-7379) Planning error

2019-09-19 Thread benj (Jira)
benj created DRILL-7379:
---

 Summary: Planning error
 Key: DRILL-7379
 URL: https://issues.apache.org/jira/browse/DRILL-7379
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill
Affects Versions: 1.16.0
Reporter: benj


With data as:
{code:sql}
SELECT id, tags FROM `example_parquet`;
+++
|   id   |tags|
+++
| 7b8808 | ["peexe","signed","overlay"]   |
| 55a4ae | ["peexe","signed","upx","overlay"] |
+++
{code}
The next request is OK
{code:sql}
SELECT id, flatten(tags) tag 
FROM ( 
  SELECT id, any_value(tags) tags 
  FROM `example_parquet` 
  GROUP BY id 
) LIMIT 2;
+++
|   id   |  tag   |
+++
| 55a4ae | peexe  |
| 55a4ae | signed |
+++
{code}
But unexpectedly, the next query failed:
{code:sql}
SELECT tag, count(*) 
FROM (
  SELECT flatten(tags) tag 
  FROM (
SELECT id, any_value(tags) tags 
FROM `example_parquet`
GROUP BY id 
  )
) GROUP BY tag;
Error: SYSTEM ERROR: UnsupportedOperationException: Map, Array, Union or 
repeated scalar type should not be used in group by, order by or in a 
comparison operator. Drill does not support compare between MAP:REPEATED and 
MAP:REPEATED.

/* Or other error with another set of data :
Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize 
incoming schema.  Errors:
 
Error in expression at index 0.  Error: Missing function implementation: 
[hash32asdouble(MAP-REPEATED, INT-REQUIRED)].  Full expression: null..
*/
{code}
These errors are incomprehensible because, the aggregate is on VARCHAR.

More, the request works if decomposed in 2 request with with the creation of an 
intermediate table like below:
{code:sql}
CREATE TABLE `tmp.parquet` AS (
  SELECT id, flatten(tags) tag 
  FROM ( 
SELECT id, any_value(tags) tags 
FROM `example_parquet` 
GROUP BY id 
));

SELECT tag, count(*) c FROM `tmp_parquet` GROUP BY tag;
+-+---+
|   tag   | c |
+-+---+
| overlay | 2 |
| peexe   | 2 |
| signed  | 2 |
| upx | 1 |
+-+---+
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (DRILL-7378) Allowing less outer/inner select

2019-09-19 Thread benj (Jira)
benj created DRILL-7378:
---

 Summary: Allowing less outer/inner select
 Key: DRILL-7378
 URL: https://issues.apache.org/jira/browse/DRILL-7378
 Project: Apache Drill
  Issue Type: Improvement
  Components: Functions - Drill
Affects Versions: 1.16.0
Reporter: benj


Currently, it's not possible to exploit the result of some function like 
_kvgen_ or _flatten_ and an inner/outer select is needed for some operations.
It will be easiest to allow the use of the results of theses functions directly.
Example:
{code:sql}
CONVERT_FROM('{"Tuesday":{"close":"22:00"},"Friday":{"close":"23:00"}}','JSON') 
j;
+--+
|j |
+--+
| {"Tuesday":{"close":"22:00"},"Friday":{"close":"23:00"}} |
+--+
{code}
But it's not possible to simply do
{code:sql}
SELECT 
kvgen(CONVERT_FROM('{"Tuesday":{"close":"22:00"},"Friday":{"close":"23:00"}}','JSON'));
Error: PLAN ERROR: Failure while materializing expression in constant 
expression evaluator 
[CONVERT_FROM('{"Tuesday":{"close":"22:00"},"Friday":{"close":"23:00"}}', 
'JSON')].  Errors: 
Error in expression at index -1.  Error: Only ProjectRecordBatch could have 
complex writer function. You are using complex writer function convert_fromJSON 
in a non-project operation!.  Full expression: --UNKNOWN EXPRESSION--.
{code}
It's only possible to do
{code:sql}
SELECT kvgen(c) AS k FROM (SELECT 
CONVERT_FROM('{"Tuesday":{"close":"22:00"},"Friday":{"close":"23:00"}}','JSON') 
c);
+--+
|k  
   |
+--+
| 
[{"key":"Tuesday","value":{"close":"22:00"}},{"key":"Friday","value":{"close":"23:00"}}]
 |
+--+
{code}

Its possible to cascade with flatten:
{code:sql}
SELECT flatten(kvgen(c)) f FROM (SELECT 
CONVERT_FROM('{"Tuesday":{"close":"22:00"},"Friday":{"close":"23:00"}}','JSON') 
c);
+-+
|  f  |
+-+
| {"key":"Tuesday","value":{"close":"22:00"}} |
| {"key":"Friday","value":{"close":"23:00"}}  |
+-+
{code}
But it's not possible to use directly use the result of flatten to select key 
or value
{code:sql}
SELECT (flatten(kvgen(r.c))).key f FROM (SELECT 
CONVERT_FROM('{"Tuesday":{"close":"22:00"},"Friday":{"close":"23:00"}}','JSON') 
c) r;
Error: VALIDATION ERROR: From line 1, column 9 to line 1, column 27: 
Incompatible types
{code}
You have to inner/outer select like:
{code:sql}
SELECT r.f.key k FROM (SELECT flatten(kvgen(c)) f FROM (SELECT 
CONVERT_FROM('{"Tuesday":{"close":"22:00"},"Friday":{"close":"23:00"}}','JSON') 
c)) r;
+-+
|k|
+-+
| Tuesday |
| Friday  |
+-+
{code}

it would be useful to be able to write/read shorter and simpler queries with 
limiting when it's possible the need of inner/outer SELECT.





--
This message was sent by Atlassian Jira
(v8.3.4#803005)