[ 
https://issues.apache.org/jira/browse/HIVE-14568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15453811#comment-15453811
 ] 

Xuefu Zhang commented on HIVE-14568:
------------------------------------

I think this is mostly by design. You have two columns: decimal(p1, s1) and 
decimal(p2, s2). We need to statically derive the type for the product of the 
two columns based on s = s1 + s2 and p = p1 + p2 + 1. Since s1 = 28 and s2 = 10 
in your case, s = 38. Similarly, p = 38 (it is capped at the max). Thus, 
the result column has type decimal(38, 38). This basically means that the 
result cannot have any integer part. On the other hand, if the result type were 
set to decimal(38, 18), I could certainly construct example data showing that 
the product of the two columns loses the scale that I was expecting.
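
As a rough illustration of the derivation above, here is a minimal Python 
sketch (not Hive's actual code). The helper names product_type and fits are 
hypothetical, and capping the scale at 38 is an assumption on my part; the 
check also ignores rounding of fractional digits and only looks at whether 
the integer part fits.

```python
from decimal import Decimal

MAX_PRECISION = 38  # maximum decimal precision, as stated in the comment


def product_type(p1, s1, p2, s2):
    """Derive (precision, scale) for decimal(p1,s1) * decimal(p2,s2).

    Uses s = s1 + s2 and p = p1 + p2 + 1, both capped at MAX_PRECISION
    (the cap on the scale is an assumption of this sketch).
    """
    scale = min(s1 + s2, MAX_PRECISION)
    precision = min(p1 + p2 + 1, MAX_PRECISION)
    return precision, scale


def fits(value, precision, scale):
    """Return True if the integer part of value fits in precision - scale digits."""
    whole = abs(int(value))
    integer_digits = len(str(whole)) if whole != 0 else 0
    return integer_digits <= precision - scale


# The reporter's columns: prc decimal(38,28), vol decimal(38,10)
p, s = product_type(38, 28, 38, 10)
cost = Decimal("1.2") * Decimal("200")  # 240.0, which needs 3 integer digits

print(p, s)              # derived result type
print(fits(cost, p, s))  # no room for an integer part in decimal(38,38)
print(fits(cost, 38, 18))
```

With decimal(38, 38) there are 38 - 38 = 0 digits left for the integer part, 
so a product such as 240 cannot be represented, which matches the NULLs in 
the report.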

I understand that NULL may have been surprising to people. However, I wonder 
why a column defined as decimal(38,28) is being used to store data like 1.2, 
1.44, etc. Would it be reasonable to use a smaller precision/scale?

This sounds like a data modeling issue. The metadata needs to closely describe 
the data.
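
To make the modeling point concrete, the Python sketch below applies the 
same rule (s = s1 + s2, p = p1 + p2 + 1, capped at 38) to narrower 
declarations and reports how many integer digits the product type leaves. 
The types decimal(10,4) and decimal(14,0) are hypothetical choices for 
illustration only, not a recommendation taken from the report, and the 
helper name is made up.

```python
MAX_PRECISION = 38  # maximum decimal precision, as stated in the comment


def integer_digits_available(p1, s1, p2, s2):
    """Integer digits left in the product type of decimal(p1,s1) * decimal(p2,s2).

    Applies s = s1 + s2 and p = p1 + p2 + 1, each capped at MAX_PRECISION
    (capping the scale is an assumption of this sketch).
    """
    precision = min(p1 + p2 + 1, MAX_PRECISION)
    scale = min(s1 + s2, MAX_PRECISION)
    return precision - scale


# Columns as declared in the report: no integer digits remain.
print(integer_digits_available(38, 28, 38, 10))

# Hypothetical narrower declarations: plenty of room for the integer part.
print(integer_digits_available(10, 4, 14, 0))
```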

It's a good point that an ERROR here might be better, so that NULL doesn't slip 
in unnoticed. I believe MySQL has a strict mode which, when on, generates an 
error in this case. We don't have such a mode defined in Hive, but it may make 
sense to introduce one.

> Hive Decimal Returns NULL
> -------------------------
>
>                 Key: HIVE-14568
>                 URL: https://issues.apache.org/jira/browse/HIVE-14568
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.0.0, 1.2.0
>         Environment: Centos 6.7, Hadoop 2.7.2,hive 1.0.0,2.0
>            Reporter: gurmukh singh
>            Assignee: Xuefu Zhang
>
> Hi
> I was under the impression that the bug: 
> https://issues.apache.org/jira/browse/HIVE-5022 got fixed. But, I see the 
> same issue in Hive 1.0 and hive 1.2 as well.
> hive> desc mul_table;
> OK
> prc                           decimal(38,28)
> vol                           decimal(38,10)
> Time taken: 0.068 seconds, Fetched: 2 row(s)
> hive> select prc, vol, prc*vol as cost from mul_table;
> OK
> 1.2           200     NULL
> 1.44          200     NULL
> 2.14          100     NULL
> 3.004         50      NULL
> 1.2           200     NULL
> Time taken: 0.048 seconds, Fetched: 5 row(s)
> Rather than returning NULL, it should give an error or round off.
> I understand that I can use double instead of decimal, or cast the result, 
> but returning NULL will still let many things go unnoticed.
> hive> desc mul_table2;
> OK
> prc                           double
> vol                           decimal(14,10)
> Time taken: 0.049 seconds, Fetched: 2 row(s)
> hive> select * from mul_table2;
> OK
> 1.4           200
> 1.34          200
> 7.34          100
> 7454533.354544        100
> Time taken: 0.028 seconds, Fetched: 4 row(s)
> hive> select prc, vol, prc*vol  as cost from mul_table3;
> OK
> 7.34          100     734.0
> 7.34          1000    7340.0
> 1.0004        1000    1000.4
> 7454533.354544        100     7.454533354544E8       <----- Wrong result
> 7454533.354544        1000    7.454533354544E9       <----- Wrong result
> Time taken: 0.025 seconds, Fetched: 5 row(s)
> Casting:
> hive> select prc, vol, cast(prc*vol as decimal(38,38)) as cost from 
> mul_table3;
> OK
> 7.34          100     NULL
> 7.34          1000    NULL
> 1.0004        1000    NULL
> 7454533.354544        100     NULL
> 7454533.354544        1000    NULL
> Time taken: 0.033 seconds, Fetched: 5 row(s)
> hive> select prc, vol, cast(prc*vol as decimal(38,10)) as cost from 
> mul_table3;
> OK
> 7.34          100     734
> 7.34          1000    7340
> 1.0004        1000    1000.4
> 7454533.354544        100     745453335.4544
> 7454533.354544        1000    7454533354.544
> Time taken: 0.026 seconds, Fetched: 5 row(s) 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
