Yeah, that's really surprising. The row count is reported using Hadoop counters. We haven't seen any discrepancies there so far (we use hadoop-17), but a counter problem is one possibility.
But the count(1) is the more important one to resolve, since that should definitely be correct. Are the count(1) results non-deterministic as well?

-----Original Message-----
From: Bob Schulze [mailto:b.schu...@ecircle.com]
Sent: Friday, March 20, 2009 7:44 AM
To: hive-user@hadoop.apache.org
Subject: getting different row counts on each import

Hi all,

When I do a "from t1 insert overwrite table t14 select *;" multiple times, I get a different response every time:

"Loading data to table t14
81324734 Rows loaded to t14"

This row number varies with every attempt. A "select count(1) from t14" returns yet another number (here 84968986).

The record count in the original file is 82518636, but even if some records are dropped during import, perhaps because of format errors, I'd expect to get the same number on every attempt.

What could be wrong?