Ping, this is a known issue. The row count reported at the end of INSERT OVERWRITE 
is obtained from Hadoop counters, which are not very reliable and can be 
inaccurate when tasks fail or are speculatively executed.
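To illustrate how this kind of drift can arise, here is a toy Python model. It is not Hive's or Hadoop's actual counter code. It assumes, hypothetically, that every task attempt (including failed and speculative ones) adds to one shared "rows written" counter, while only one attempt per task commits its output file. The task IDs and row numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class TaskAttempt:
    task_id: str
    rows_written: int
    committed: bool  # True only for the attempt whose output file is kept


def counter_total(attempts):
    """Naive aggregate: sum the rows-written counter over every attempt."""
    return sum(a.rows_written for a in attempts)


def committed_total(attempts):
    """Rows from committed attempts only, i.e. what ends up in the table's files."""
    return sum(a.rows_written for a in attempts if a.committed)


# Hypothetical job: one task fails and is retried, one has a speculative duplicate.
attempts = [
    TaskAttempt("m_000000", 1000, committed=True),
    TaskAttempt("m_000001", 400, committed=False),   # failed midway, then retried
    TaskAttempt("m_000001", 1000, committed=True),   # successful retry
    TaskAttempt("m_000002", 1000, committed=True),
    TaskAttempt("m_000002", 700, committed=False),   # speculative duplicate, killed
]

print(counter_total(attempts))    # 4100
print(committed_total(attempts))  # 3000
```

In this model the counter-based figure overstates the real row count by exactly the rows written by attempts whose output was discarded, which is the gap that "select count(1)" would expose.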

If you are using the latest trunk, you may want to try the new feature that 
automatically gathers statistics during INSERT OVERWRITE TABLE. You need to 
set up a MySQL database or an HBase instance for publishing and aggregating 
partial stats. You can find the design doc at http://wiki.apache.org/hadoop/Hive/StatsDev.

Note that stats gathering is still at an experimental stage, so please feel free to 
report bugs or suggestions here or to hive-...@hadoop.apache.org.

On Oct 1, 2010, at 10:30 AM, Ping Zhu wrote:

I have had this issue on different versions of Hadoop/Hive. I am currently using 
Hadoop 0.20.2 with Hive 0.7; I previously used Hadoop 0.20.0 with Hive 0.5.

Ping

On Fri, Oct 1, 2010 at 10:23 AM, Ping Zhu <p...@sharethis.com> wrote:
Hi,

  I ran a simple Hive query that inserts data from a source table into a target 
table. The number of records loaded into the target table as reported by the 
query (call it A) differs from the number returned by "select count(1) from 
target" (call it B). I also counted the rows in the target table's HDFS files 
with "hadoop fs -cat /root/hive/metastore_db/ptarget/* | wc -l", which returned 
B. So I believe B is the actual number of rows in the target table.
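The same cross-check can be sketched in Python once the table's files have been copied locally (for example with "hadoop fs -get /root/hive/metastore_db/ptarget /tmp/ptarget"; the local path is hypothetical). This is a minimal sketch assuming plain, uncompressed text files with one row per line. Like wc -l, it counts newline characters, so a final row without a trailing newline would not be counted. It also skips names starting with "_" or ".", on the assumption that such files are treated as hidden by Hadoop's input formats and so are not read by "select count(1)".

```python
import os


def count_rows(directory):
    """Count newline characters across a table's data files, like `cat dir/* | wc -l`.

    Assumes uncompressed text files with one row per line.
    """
    total = 0
    for name in sorted(os.listdir(directory)):
        # Assumption: names starting with "_" or "." are hidden/bookkeeping files.
        if name.startswith(("_", ".")):
            continue
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as f:
            # Read in 1 MiB chunks so large part files don't need to fit in memory.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                total += chunk.count(b"\n")
    return total
```

Usage: count_rows("/tmp/ptarget") should match B if the files are plain text and each row ends with a newline.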

  I had this issue intermittently. Any comments?

  Thank you very much.

  Ping

