Naveen, Can you please verify if you create these tables as external tables the results are correct ? In case of managed tables, the assumption is that there is a 1:1 mapping between tables and the locations and all update to the table are through hive. With that assumption, it relies on stats to return results in queries like count(*) .
On Tue, Aug 30, 2016 at 4:18 AM, Abhishek Somani <abhisheksoman...@gmail.com > wrote: > For the 2nd table(after both inserts are over), isn't the return count > expected to be 4? In that case, isn't the the bug that the count was > returned wrong(maybe from the stats as mentioned) rather the fact that > another table was allowed to be created at the same location? > > I might be very wrong, so pardon my ignorance. > > On Tue, Aug 30, 2016 at 3:06 AM, Alan Gates <alanfga...@gmail.com> wrote: > > > Note that Hive doesn’t track individual files, just which directory a > > table stores its files in. So we wouldn’t expect this to work. The bug > is > > more that Hive doesn’t detect that two tables are trying to use the same > > directory. I’m not sure we’re anxious to fix this since it would mean > when > > creating a table Hive would need to search all existing tables to make > sure > > none of them are using the directory the new table wants to use. > > > > Alan. > > > > > On Aug 30, 2016, at 04:17, Sergey Shelukhin <ser...@hortonworks.com> > > wrote: > > > > > > This is a bug, or rather an unexpected usage. I suspect the correct > count > > > value is coming from statistics. > > > Can you file a JIRA? > > > > > > On 16/8/29, 00:51, "naveen mahadevuni" <nmahadev...@gmail.com> wrote: > > > > > >> Hi, > > >> > > >> Is the following behavior a bug? I believe at least one part of it is > a > > >> bug. I created two Hive tables at the same location and inserted rows > in > > >> two tables. count(*) returns the correct count for each individual > > table, > > >> but SELECT * on one tables reads the rows from other table files too. > > >> > > >> CREATE TABLE test1 (col1 INT, col2 INT) > > >> stored as orc > > >> LOCATION '/apps/hive/warehouse/test1'; > > >> > > >> insert into test1 values(1,2); > > >> insert into test1 values(3,4); > > >> > > >> hive> select count(*) from test1; > > >> OK > > >> 2 > > >> Time taken: 0.177 seconds, Fetched: 1 row(s) > > >> > > >> > > >> CREATE TABLE test2 (col1 INT, col2 INT) > > >> stored as orc > > >> LOCATION '/apps/hive/warehouse/test1'; > > >> > > >> insert into test2 values(1,2); > > >> insert into test2 values(3,4); > > >> > > >> hive> select count(*) from test2; > > >> OK > > >> 2 > > >> Time taken: 2.683 seconds, Fetched: 1 row(s) > > >> > > >> -- SELECT * fetches 4 records where as COUNT(*) above returns count of > > 2. > > >> > > >> hive> select * from test2; > > >> OK > > >> 1 2 > > >> 3 4 > > >> 1 2 > > >> 3 4 > > >> Time taken: 0.107 seconds, Fetched: 4 row(s) > > >> hive> select * from test1; > > >> OK > > >> 1 2 > > >> 3 4 > > >> 1 2 > > >> 3 4 > > >> Time taken: 0.054 seconds, Fetched: 4 row(s) > > >> > > >> Thanks, > > >> Naveen > > > > > > > >