Naveen,
Can you please verify if you create these tables as external tables the
results are correct ?
In case of managed tables, the assumption is that there is a 1:1 mapping
between tables and the locations and all update to the table are through
hive. With that assumption, it relies on stats to return results in queries
like count(*) .


On Tue, Aug 30, 2016 at 4:18 AM, Abhishek Somani <abhisheksoman...@gmail.com
> wrote:

> For the 2nd table(after both inserts are over), isn't the return count
> expected to be 4? In that case, isn't the the bug that the count was
> returned wrong(maybe from the stats as mentioned) rather the fact that
> another table was allowed to be created at the same location?
>
> I might be very wrong, so pardon my ignorance.
>
> On Tue, Aug 30, 2016 at 3:06 AM, Alan Gates <alanfga...@gmail.com> wrote:
>
> > Note that Hive doesn’t track individual files, just which directory a
> > table stores its files in.  So we wouldn’t expect this to work.  The bug
> is
> > more that Hive doesn’t detect that two tables are trying to use the same
> > directory.  I’m not sure we’re anxious to fix this since it would mean
> when
> > creating a table Hive would need to search all existing tables to make
> sure
> > none of them are using the directory the new table wants to use.
> >
> > Alan.
> >
> > > On Aug 30, 2016, at 04:17, Sergey Shelukhin <ser...@hortonworks.com>
> > wrote:
> > >
> > > This is a bug, or rather an unexpected usage. I suspect the correct
> count
> > > value is coming from statistics.
> > > Can you file a JIRA?
> > >
> > > On 16/8/29, 00:51, "naveen mahadevuni" <nmahadev...@gmail.com> wrote:
> > >
> > >> Hi,
> > >>
> > >> Is the following behavior a bug? I believe at least one part of it is
> a
> > >> bug. I created two Hive tables at the same location and inserted rows
> in
> > >> two tables. count(*) returns the correct count for each individual
> > table,
> > >> but SELECT * on one tables reads the rows from other table files too.
> > >>
> > >> CREATE TABLE test1 (col1 INT, col2 INT)
> > >> stored as orc
> > >> LOCATION '/apps/hive/warehouse/test1';
> > >>
> > >> insert into test1 values(1,2);
> > >> insert into test1 values(3,4);
> > >>
> > >> hive> select count(*) from test1;
> > >> OK
> > >> 2
> > >> Time taken: 0.177 seconds, Fetched: 1 row(s)
> > >>
> > >>
> > >> CREATE TABLE test2 (col1 INT, col2 INT)
> > >> stored as orc
> > >> LOCATION '/apps/hive/warehouse/test1';
> > >>
> > >> insert into test2 values(1,2);
> > >> insert into test2 values(3,4);
> > >>
> > >> hive> select count(*) from test2;
> > >> OK
> > >> 2
> > >> Time taken: 2.683 seconds, Fetched: 1 row(s)
> > >>
> > >> -- SELECT * fetches 4 records where as COUNT(*) above returns count of
> > 2.
> > >>
> > >> hive> select * from test2;
> > >> OK
> > >> 1       2
> > >> 3       4
> > >> 1       2
> > >> 3       4
> > >> Time taken: 0.107 seconds, Fetched: 4 row(s)
> > >> hive> select * from test1;
> > >> OK
> > >> 1       2
> > >> 3       4
> > >> 1       2
> > >> 3       4
> > >> Time taken: 0.054 seconds, Fetched: 4 row(s)
> > >>
> > >> Thanks,
> > >> Naveen
> > >
> >
> >
>

Reply via email to