[ 
https://issues.apache.org/jira/browse/DRILL-7825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253653#comment-17253653
 ] 

ian commented on DRILL-7825:
----------------------------

-rw-r--r-- 1 *** None {color:#FF0000}83548516{color} Dec 22 11:45 uuid-string.parquet
-rw-r--r-- 1 *** None {color:#FF0000}39575254{color} Dec 22 11:45 uuid.parquet
From my simplistic test, the penalty is about 111%. My mistake above: the 
theoretical penalty would be 125%, not 225%. I think this outcome results 
from the random nature of UUIDs. Because they are pseudo-random, the 
resulting strings don't compress very well, probably only slightly better 
than the binary.
Good thought, but probably not a practical workaround for any needs at scale. 
Thanks again and best.
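
For reference, the two percentages above can be reproduced with a short sketch. Python is used only for illustration, the helper name penalty_percent is hypothetical, and the file sizes are copied from the listing above. "Penalty" is taken here to mean the extra size of the text form relative to the binary form.

```python
import uuid

# File sizes in bytes, copied from the directory listing above.
TEXT_SIZE = 83548516    # uuid-string.parquet
BINARY_SIZE = 39575254  # uuid.parquet


def penalty_percent(larger: int, smaller: int) -> float:
    """Extra size of `larger` relative to `smaller`, as a percentage."""
    return (larger / smaller - 1.0) * 100.0


u = uuid.uuid4()
text_len = len(str(u))     # canonical hyphenated text form: 36 characters
binary_len = len(u.bytes)  # raw binary form: 16 bytes

theoretical = penalty_percent(text_len, binary_len)
measured = penalty_percent(TEXT_SIZE, BINARY_SIZE)

print(f"theoretical penalty: {theoretical:.0f}%")  # 125%
print(f"measured penalty:    {measured:.0f}%")     # 111%
```

The theoretical figure compares uncompressed sizes only (36 characters against 16 bytes); the measured figure also reflects whatever encoding and compression the Parquet writer applied.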

> Error: SYSTEM ERROR: RuntimeException: Unknown logical type <LogicalType 
> UUID:UUIDType()>
> -----------------------------------------------------------------------------------------
>
>                 Key: DRILL-7825
>                 URL: https://issues.apache.org/jira/browse/DRILL-7825
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.17.0
>         Environment: Windows 10 single local node.
>            Reporter: ian
>            Assignee: Vitalii Diravka
>            Priority: Critical
>             Fix For: 1.19.0
>
>         Attachments: uuid.parquet
>
>
> Parquet logical type UUID fails on read.  The only workaround is to store 
> it as text, a 125% penalty. 
> Here is the schema dump for the attached test parquet file.  I can read the 
> file okay from R and natively through C++.
> {code:java}
> 3961 $ parquet-dump-schema uuid.parquet
> required group field_id=0 schema {
>  required fixed_len_byte_array(16) field_id=1 uuid_req1 (UUID);
>  optional fixed_len_byte_array(16) field_id=2 uuid_opt1 (UUID);
>  required fixed_len_byte_array(16) field_id=3 uuid_req2 (UUID);
> }{code}
> I'm new. I put this as MAJOR from reading the severity definitions, but 
> gladly defer to those who know better how to classify.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
