RE: getting different row counts on each import

Joydeep Sen Sarma Fri, 20 Mar 2009 09:10:45 -0700

Yeah - that's really really surprising.

The row count is reported using Hadoop counters. We haven't seen any
discrepancies so far (we use hadoop-17), but that's one possibility.

But the count(1) is the more important one to resolve - that should definitely 
be correct. Are the count results non-deterministic as well?
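
One way to check this, as a rough sketch only: run the same count a few times
without re-importing in between, then compare against the source table. The
table names t1 and t14 are taken from the original message below; it assumes
nothing else writes to either table while the queries run.

```sql
-- Run the identical count twice against the unchanged target table.
-- If the two results differ, the count itself is non-deterministic.
select count(1) from t14;
select count(1) from t14;

-- Count the source table for comparison with the target.
select count(1) from t1;
```

If the repeated counts on t14 agree with each other but not with t1, the
problem is more likely in the insert than in the count.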

-----Original Message-----
From: Bob Schulze [mailto:b.schu...@ecircle.com] 
Sent: Friday, March 20, 2009 7:44 AM
To: hive-user@hadoop.apache.org
Subject: getting different row counts on each import

Hi all,

When I do a
"from t1 insert overwrite  table t14 select *;"
multiple times, I get different responses every time:

"Loading data to table t14 81324734 Rows loaded to t14"

This row count varies on every attempt. A "select count(1) from t14"
returns yet another number (here 84968986).

The record count in the original file is 82518636, but even if some
records are dropped during import, perhaps because of format errors, I'd
expect to get the same number on every attempt.

What could be wrong?

?!
