Ping, this is a known issue. The number reported at the end of INSERT OVERWRITE is obtained from Hadoop counters, which are not very reliable and can be inaccurate because of failed tasks and speculative execution.
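As a rough illustration of why a counter-based total can drift from the real row count, here is a toy Python model. It is not Hadoop's or Hive's actual code: the `TaskAttempt` class and both functions are made up for this sketch, and it assumes the reported number adds up a per-attempt counter across every attempt, while only one successful attempt per task actually commits its output.

```python
from dataclasses import dataclass


@dataclass
class TaskAttempt:
    task_id: int
    rows_written: int
    succeeded: bool  # False for failed or killed (e.g. losing speculative) attempts


def naive_row_count(attempts):
    """Sum the counter from every attempt, including failed and duplicate ones."""
    return sum(a.rows_written for a in attempts)


def committed_row_count(attempts):
    """Count rows only from the first successful attempt of each task."""
    committed = {}
    for a in attempts:
        if a.succeeded and a.task_id not in committed:
            committed[a.task_id] = a.rows_written
    return sum(committed.values())


# Hypothetical job: three tasks, one retried after a failure,
# one with a speculative duplicate that was killed.
attempts = [
    TaskAttempt(0, 100, True),
    TaskAttempt(1, 40, False),   # failed partway through
    TaskAttempt(1, 100, True),   # retry of task 1
    TaskAttempt(2, 100, False),  # speculative duplicate, killed
    TaskAttempt(2, 100, True),
]

print(naive_row_count(attempts))      # 440
print(committed_row_count(attempts))  # 300
```

In this model the second number plays the role of what "select count(1)" would return, and the first is the kind of inflated figure a counter can report.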
If you are using the latest trunk, you may want to try the feature that automatically gathers statistics during INSERT OVERWRITE TABLE. You need to set up MySQL or HBase for partial stats publishing/aggregation. You can find the design doc at http://wiki.apache.org/hadoop/Hive/StatsDev. Note that stats gathering is still at an experimental stage, so please feel free to report bugs/suggestions here or to hive-...@hadoop.apache.org.

On Oct 1, 2010, at 10:30 AM, Ping Zhu wrote:

I had such issues on different versions of hadoop/hive:

The version of hadoop/hive I am using now is hadoop 0.20.2/hive 0.7.
The version of hadoop/hive I once used is hadoop 0.20.0/hive 0.5.

Ping

On Fri, Oct 1, 2010 at 10:23 AM, Ping Zhu <p...@sharethis.com> wrote:

Hi,

I ran a simple Hive query inserting data into a target table from a source table. The number of records loaded into the target table (say number A), which is reported when the query finishes, is different from the number (say number B) returned by running the query "select count(1) from target".

I checked the number of rows in the target table's HDFS files by running the command "hadoop fs -cat /root/hive/metastore_db/ptarget/* | wc -l". The number returned is number B, so I believe number B is the actual number of rows in the target table.

I have had this issue intermittently. Any comments?

Thank you very much.

Ping