Note that Hive doesn’t track individual files, only the directory in which each table stores its files, so we wouldn’t expect this to work. The real bug is that Hive doesn’t detect that two tables are trying to use the same directory. I’m not sure we’re anxious to fix this, since it would mean that whenever a table is created Hive would have to search all existing tables to make sure none of them already uses the directory the new table wants.
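To make the cost of such a check concrete, here is a minimal sketch in Python. It is not Hive's metastore code: the `catalog` dictionary (table name to location) and the `find_location_conflicts` helper are hypothetical stand-ins, and real locations would be full filesystem URIs rather than bare paths. The point is only that every CREATE TABLE would require a scan (or an index lookup) over all existing table locations.

```python
import posixpath


def normalize(location):
    # Treat '/a/b', '/a/b/' and '/a//b' as the same directory.
    return posixpath.normpath(location)


def find_location_conflicts(catalog, new_location):
    """Return the names of existing tables whose directory equals new_location.

    `catalog` is a hypothetical mapping of table name -> storage directory.
    """
    target = normalize(new_location)
    # Linear scan over every known table: this is the extra work a
    # create-time check would add.
    return sorted(
        name for name, loc in catalog.items() if normalize(loc) == target
    )


# The scenario from the thread: test1 already owns the directory
# that test2 is about to be created on.
catalog = {"test1": "/apps/hive/warehouse/test1"}
conflicts = find_location_conflicts(catalog, "/apps/hive/warehouse/test1/")
print(conflicts)
```

With a check like this, the second CREATE TABLE in the report below could be rejected or flagged with a warning instead of silently sharing files with test1.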
Alan.

> On Aug 30, 2016, at 04:17, Sergey Shelukhin <ser...@hortonworks.com> wrote:
>
> This is a bug, or rather an unexpected usage. I suspect the correct count
> value is coming from statistics.
> Can you file a JIRA?
>
> On 16/8/29, 00:51, "naveen mahadevuni" <nmahadev...@gmail.com> wrote:
>
>> Hi,
>>
>> Is the following behavior a bug? I believe at least one part of it is a
>> bug. I created two Hive tables at the same location and inserted rows in
>> two tables. count(*) returns the correct count for each individual table,
>> but SELECT * on one tables reads the rows from other table files too.
>>
>> CREATE TABLE test1 (col1 INT, col2 INT)
>> stored as orc
>> LOCATION '/apps/hive/warehouse/test1';
>>
>> insert into test1 values(1,2);
>> insert into test1 values(3,4);
>>
>> hive> select count(*) from test1;
>> OK
>> 2
>> Time taken: 0.177 seconds, Fetched: 1 row(s)
>>
>>
>> CREATE TABLE test2 (col1 INT, col2 INT)
>> stored as orc
>> LOCATION '/apps/hive/warehouse/test1';
>>
>> insert into test2 values(1,2);
>> insert into test2 values(3,4);
>>
>> hive> select count(*) from test2;
>> OK
>> 2
>> Time taken: 2.683 seconds, Fetched: 1 row(s)
>>
>> -- SELECT * fetches 4 records where as COUNT(*) above returns count of 2.
>>
>> hive> select * from test2;
>> OK
>> 1 2
>> 3 4
>> 1 2
>> 3 4
>> Time taken: 0.107 seconds, Fetched: 4 row(s)
>> hive> select * from test1;
>> OK
>> 1 2
>> 3 4
>> 1 2
>> 3 4
>> Time taken: 0.054 seconds, Fetched: 4 row(s)
>>
>> Thanks,
>> Naveen
>