I've been experimenting with transferring information from Oracle RDBMS to 
Parquet files, and then querying the Parquet using Drill.  Oracle tables often 
contain a lot of DECIMAL information, since that seems to be a very common way 
to represent numbers there.  My initial approach was to archive the DECIMAL 
values to variable length "bytes" in the Parquet.  However, I found that Drill 
does  not expect to see DECIMAL values as "bytes" (variable length), but rather 
as one of the fixed length types.  See 
https://issues.apache.org/jira/browse/DRILL-4184 for more info on this.

I then tried archiving DECIMAL values to fixed length byte arrays in Parquet 
(rather than variable length ones), with the length being the full precision 
(size) of each DECIMAL field.  This increases my Parquet file sizes by about 
10%, but now Drill queries with a where clause on the DECIMAL values work, but 
with some oddities as shown here:

0: jdbc:drill:zk=local> select acct_no, curbal from dfs.`c:/dao/ 
acct2M_after.parquet` where acct_no = 70000008.0;
+-----------+----------+
|  acct_no  |  curbal  |
+-----------+----------+
| 70000008  | 7000.00  |
+-----------+----------+
1 row selected (25.411 seconds)
0: jdbc:drill:zk=local> select acct_no, curbal from 
dfs.`c:/dao/acct2M_after.parquet` where acct_no = 70000008;
+----------+---------+
| acct_no  | curbal  |
+----------+---------+
+----------+---------+
No rows selected (24.331 seconds)
0: jdbc:drill:zk=local> select acct_no, curbal from 
dfs.`c:/dao/acct2M_after.parquet` where cast(acct_no AS DECIMAL) = 70000008;
+-----------+----------+
|  acct_no  |  curbal  |
+-----------+----------+
| 70000008  | 7000.00  |
+-----------+----------+
1 row selected (25.742 seconds)
0: jdbc:drill:zk=local>

Note how the query only finds the row if the value in the where clause has ".0" 
appended, or the value is explicitly casted to DECIMAL.

Two questions:

1)      Are there any plans to allow for the more efficient storage of DECIMAL 
as "bytes" (rather than fixed length byte arrays, at the full/maximum 
precision)?

2)      Is it expected that the ".0" must be appended in the where clause (or 
an explicit cast applied to DECIMAL), or should I create a JIRA for that?  
Intuitively speaking, a number is a number (whether it's decimal or not) so 
this behavior was a bit unexpected to me.

Thanks,
Dave Oshinsky





***************************Legal Disclaimer***************************
"This communication may contain confidential and privileged material for the
sole use of the intended recipient. Any unauthorized review, use or distribution
by others is strictly prohibited. If you have received the message by mistake,
please advise the sender by reply email and delete the message. Thank you."
**********************************************************************

Reply via email to