[ https://issues.apache.org/jira/browse/HIVE-5774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Szehon Ho updated HIVE-5774:
----------------------------
    Assignee:     (was: Szehon Ho)

> INSERT OVERWRITE DYNAMIC PARTITION on LARGE DATA
> ------------------------------------------------
>
>                 Key: HIVE-5774
>                 URL: https://issues.apache.org/jira/browse/HIVE-5774
>             Project: Hive
>          Issue Type: Bug
>          Components: Database/Schema
>         Environment: debian 6.0.7
>            Reporter: Danny Teok
>            Priority: Critical
>              Labels: dynamic, hive, insert, overwrite, partition
>
> After several rounds of forensic analysis, we are convinced that there is a
> bug when rebuilding using dynamic partition over more than 30 days. Row
> counts do not match.
> In detail:
> Part A -- original_table
>   2013-01-01; 394,755 rows
>   2013-01-02; 424,448
>   2013-01-03; 427,201
>   ...
>   2013-10-30; 3,234,472
> Part B -- copy_of_original_table_new
>   2013-01-01; 372,628 rows
>   2013-01-02; 400,553
>   2013-01-03; 403,495
>   ...
>   2013-10-30; 2,865,877
> The query used to populate the original table is the same as the one used to
> populate the "copy_of_original_table_new" table. When we rebuilt for 1 day,
> e.g. 2013-01-01, the row count of copy_of_original_table_new matched
> original_table exactly.
> When we rebuilt for 7 days, the row counts matched up exactly.
> When we rebuilt for 15 days, the row counts matched up exactly.
> When we rebuilt for 303 days (10 months), everything broke. No matches.
> When we rebuilt for 35 days, 80% matched up exactly. The other 20% were out
> by hundreds to tens of thousands of rows (a variance of up to 3%).
> In other words, the more days that are specified in WHERE dt BETWEEN
> dateStart AND dateEnd, the more dates are out, i.e. their row counts do not
> match original_table.
> However, we rebuilt each of the 20% that were out statically, with the
> corresponding date. The result was surprising -- they matched the
> original_table row count!
> Apologies in advance if this is not technical enough, but I hope the message
> is clear. We believe there is a bug. We are not sure how to check our Hive
> version, but our Hadoop version is "Hadoop 2.0.0-cdh4.1.1".
> For a glimpse of the INSERT OVERWRITE SQL, see http://pastebin.com/g1qxsUm2



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
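
The per-date comparison described in the report can be sketched as a small script. This is only an illustration under stated assumptions, not the reporter's tooling: the function name find_mismatches is hypothetical, the counts are the four dates quoted in the report, and in practice the two dictionaries would presumably be filled from a per-partition count (for example a COUNT(*) grouped by the dt column) run against each table.

```python
# Row counts per partition date, copied from the figures quoted in the report.
original_table = {
    "2013-01-01": 394_755,
    "2013-01-02": 424_448,
    "2013-01-03": 427_201,
    "2013-10-30": 3_234_472,
}
copy_of_original_table_new = {
    "2013-01-01": 372_628,
    "2013-01-02": 400_553,
    "2013-01-03": 403_495,
    "2013-10-30": 2_865_877,
}


def find_mismatches(expected, actual):
    """Hypothetical helper: list the dates whose row counts differ.

    Returns {date: (expected_rows, actual_rows, missing_rows, missing_pct)}.
    A date absent from `actual` is treated as having 0 rows.
    missing_pct is None when the expected count is 0.
    """
    mismatches = {}
    for dt in sorted(expected):
        exp = expected[dt]
        act = actual.get(dt, 0)
        if exp == act:
            continue
        missing = exp - act
        pct = round(100.0 * missing / exp, 2) if exp else None
        mismatches[dt] = (exp, act, missing, pct)
    return mismatches


if __name__ == "__main__":
    result = find_mismatches(original_table, copy_of_original_table_new)
    for dt, (exp, act, missing, pct) in result.items():
        print(f"{dt}: expected {exp}, got {act}, missing {missing} ({pct}%)")
```

Running the same check after each rebuild window (1, 7, 15, 35, 303 days) would show at which range the dynamic-partition rebuild starts to lose rows, and rerunning it after a static rebuild of a single date would confirm whether that date recovers.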