Re: More than one table created at the same location

2016-08-29 Thread Alan Gates
Note that Hive doesn’t track individual files, just which directory a table 
stores its files in.  So we wouldn’t expect this to work.  The bug is more that 
Hive doesn’t detect that two tables are trying to use the same directory.  I’m 
not sure we’re anxious to fix this since it would mean when creating a table 
Hive would need to search all existing tables to make sure none of them are 
using the directory the new table wants to use.
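To make the cost concrete, here is a minimal sketch of the check Hive does not perform. It is not Hive or metastore code. The in-memory map stands in for the metastore's table-to-location records, and `LocationCheck` and `findConflictingTable` are hypothetical names. Only exact directory matches are flagged, ignoring a trailing slash.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class LocationCheck {
    // Drop trailing slashes so "/a/b/" and "/a/b" compare as the same directory.
    static String normalize(String path) {
        String p = path;
        while (p.length() > 1 && p.endsWith("/")) {
            p = p.substring(0, p.length() - 1);
        }
        return p;
    }

    // Scans every known table: an O(number of tables) lookup on each CREATE TABLE.
    static Optional<String> findConflictingTable(Map<String, String> tableLocations,
                                                 String newLocation) {
        String target = normalize(newLocation);
        for (Map.Entry<String, String> e : tableLocations.entrySet()) {
            if (normalize(e.getValue()).equals(target)) {
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the metastore's table -> location records.
        Map<String, String> tables = new LinkedHashMap<>();
        tables.put("test1", "/apps/hive/warehouse/test1");
        System.out.println(findConflictingTable(tables, "/apps/hive/warehouse/test1/")); // Optional[test1]
        System.out.println(findConflictingTable(tables, "/apps/hive/warehouse/test2"));  // Optional.empty
    }
}
```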

Alan.

> On Aug 30, 2016, at 04:17, Sergey Shelukhin  wrote:
> 
> This is a bug, or rather an unexpected usage. I suspect the correct count
> value is coming from statistics.
> Can you file a JIRA?
> 



[jira] [Created] (HIVE-14667) Eliminate OrcRecordUpdate.OPERATION

2016-08-29 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-14667:
-

 Summary: Eliminate OrcRecordUpdate.OPERATION
 Key: HIVE-14667
 URL: https://issues.apache.org/jira/browse/HIVE-14667
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 2.2.0
Reporter: Eugene Koifman


With HIVE-14035, for ACID tables with 'transactional_properties'='default', it 
should be possible to eliminate the event-type property (at least on disk), 
since each type of delta file contains events of exactly one type.

Also, for insert deltas as well as base files, all rows have 
originalTxnId=currentTxnId.

Not sure if it's worth the effort given RLE in ORC.
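A minimal sketch of why RLE may make this moot: in an insert delta the operation column holds the same value on every row, and a run-length encoder collapses such a column to a single run. This uses plain (value, length) RLE, not ORC's actual RLE v2 encoding (which we assume behaves similarly on a constant column), and the value 0 is an arbitrary placeholder for the insert event type; `RleSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

public class RleSketch {
    // One run: 'length' consecutive copies of 'value'.
    static final class Run {
        final long value;
        final int length;

        Run(long value, int length) {
            this.value = value;
            this.length = length;
        }

        @Override
        public String toString() {
            return "(" + value + "," + length + ")";
        }
    }

    // Plain run-length encoding of an integer column.
    static List<Run> encode(long[] column) {
        List<Run> runs = new ArrayList<>();
        int i = 0;
        while (i < column.length) {
            int j = i;
            while (j < column.length && column[j] == column[i]) {
                j++;
            }
            runs.add(new Run(column[i], j - i));
            i = j;
        }
        return runs;
    }

    public static void main(String[] args) {
        // Every row of a (hypothetical) insert delta carries the same event type.
        long[] operation = new long[10_000];
        System.out.println(encode(operation)); // [(0,10000)]

        // A column mixing event types needs one run per change of value.
        long[] mixed = {0, 0, 1, 1, 1, 2};
        System.out.println(encode(mixed)); // [(0,2), (1,3), (2,1)]
    }
}
```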




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14666) LEFT OUTER JOIN - ON CLAUSE

2016-08-29 Thread Dharmendra Shavkani (JIRA)
Dharmendra Shavkani created HIVE-14666:
--

 Summary: LEFT OUTER JOIN - ON CLAUSE
 Key: HIVE-14666
 URL: https://issues.apache.org/jira/browse/HIVE-14666
 Project: Hive
  Issue Type: Bug
  Components: Beeline, CLI
Affects Versions: 1.1.0
Reporter: Dharmendra Shavkani


When we execute the SQL below, it fails in Hive.

SELECT T3.facility_name   AS Facility_Name,
 Count(DISTINCT ORDERS_SRL.tc_order_id) AS Count_of_Orders,
 SUM(order_line_item.order_qty) AS Order_Line_Quantity
FROM   order_line_item ORDER_LINE_ITEM  join orders ORDERS_SRL ON (order_line_item.order_id = ORDERS_SRL.order_id )
left outer join (facility T2  join facility_alias T3 ON T2.facility_id = T3.facility_id)   ON  (ORDERS_SRL.o_facility_id = T2.facility_id)
GROUP  BY T3.facility_name;

Error --> Error: Error while compiling statement: FAILED: ParseException line 
5:97 cannot recognize input near 'ON' 'ORDERS_SRL' '.' in expression 
specification (state=42000,code=4)


The same query works when rewritten as below.

Working SQL

SELECT TAB2.Facility_Name, TAB1.Count_of_Orders, TAB1.Order_Line_Quantity
FROM
 (SELECT ORDERS_SRL.o_facility_id AS o_facility_id,
         Count(DISTINCT ORDERS_SRL.tc_order_id) AS Count_of_Orders,
         SUM(order_line_item.order_qty) AS Order_Line_Quantity
  FROM   order_line_item ORDER_LINE_ITEM
  join   orders ORDERS_SRL ON order_line_item.order_id = ORDERS_SRL.order_id
  GROUP BY ORDERS_SRL.o_facility_id) TAB1
 left outer join
 (SELECT T3.facility_name, T2.facility_id
  FROM   facility T2
  join   facility_alias T3 ON T2.facility_id = T3.facility_id) TAB2
 ON TAB1.o_facility_id = TAB2.facility_id;





[jira] [Created] (HIVE-14665) vector_join_part_col_char.q failure

2016-08-29 Thread Hari Sankar Sivarama Subramaniyan (JIRA)
Hari Sankar Sivarama Subramaniyan created HIVE-14665:


 Summary: vector_join_part_col_char.q failure
 Key: HIVE-14665
 URL: https://issues.apache.org/jira/browse/HIVE-14665
 Project: Hive
  Issue Type: Sub-task
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan


Happens 100% of the time. Looks like a missed golden file update from 
HIVE-14502.





[jira] [Created] (HIVE-14664) Move QueryLifeTimeHook out of compilation lock

2016-08-29 Thread Chao Sun (JIRA)
Chao Sun created HIVE-14664:
---

 Summary: Move QueryLifeTimeHook out of compilation lock
 Key: HIVE-14664
 URL: https://issues.apache.org/jira/browse/HIVE-14664
 Project: Hive
  Issue Type: Bug
Reporter: Chao Sun
Assignee: Chao Sun


QueryLifeTimeHook currently fires inside the {{Driver#compile}} method, which is 
protected by the global compile lock. This is a potential problem: since a hook 
can do anything, including running a Hive query, it may cause a deadlock. To 
resolve this, we should consider moving it out of that method, perhaps into 
{{Driver#compileInternal}}.
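A minimal sketch of the deadlock and the proposed reordering, assuming the hook waits on a query that another session compiles and that this compilation needs the same global lock. `CompileLockSketch`, `COMPILE_LOCK`, and the method names are hypothetical and not Hive's `Driver` API; a one-second timeout stands in for the hang so the example terminates.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;

public class CompileLockSketch {
    // Hypothetical stand-in for the global compile lock around Driver#compile.
    static final ReentrantLock COMPILE_LOCK = new ReentrantLock();

    // "Another session"; a daemon thread so a stuck task cannot keep the JVM alive.
    static final ExecutorService OTHER_SESSION = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "other-session");
        t.setDaemon(true);
        return t;
    });

    // Compiling any query needs the global lock.
    static String compile(String sql) {
        COMPILE_LOCK.lock();
        try {
            return "compiled: " + sql;
        } finally {
            COMPILE_LOCK.unlock();
        }
    }

    // A hook that runs a query in another session and waits for the result.
    static String hook() {
        Future<String> result = OTHER_SESSION.submit(() -> compile("SELECT 1"));
        try {
            return result.get(1, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            result.cancel(true);
            return "timed out (would deadlock)";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } catch (ExecutionException e) {
            return "failed: " + e.getCause();
        }
    }

    // Current shape: the hook fires while the compile lock is held.
    static String hookInsideLock() {
        COMPILE_LOCK.lock();
        try {
            return hook();
        } finally {
            COMPILE_LOCK.unlock();
        }
    }

    // Proposed shape: release the compile lock first, then fire the hook.
    static String hookOutsideLock() {
        COMPILE_LOCK.lock();
        try {
            // ... compilation work would happen here ...
        } finally {
            COMPILE_LOCK.unlock();
        }
        return hook();
    }

    public static void main(String[] args) {
        System.out.println(hookInsideLock());  // timed out (would deadlock)
        System.out.println(hookOutsideLock()); // compiled: SELECT 1
    }
}
```

Inside the lock, the other session can never acquire it while the hook is waiting, so only the timeout ends the wait; outside the lock, the nested compile proceeds normally.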





Re: More than one table created at the same location

2016-08-29 Thread Sergey Shelukhin
This is a bug, or rather an unexpected usage. I suspect the correct count
value is coming from statistics.
Can you file a JIRA?




[jira] [Created] (HIVE-14663) Change ptest java language version to 1.7, other version changes and fixes

2016-08-29 Thread Siddharth Seth (JIRA)
Siddharth Seth created HIVE-14663:
-

 Summary: Change ptest java language version to 1.7, other version 
changes and fixes
 Key: HIVE-14663
 URL: https://issues.apache.org/jira/browse/HIVE-14663
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Siddharth Seth








Re: Review Request 49782: HIVE-14170: Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used

2016-08-29 Thread Sergio Pena

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49782/#review147140
---


Ship it!




Ship It!

- Sergio Pena


On July 19, 2016, 9:50 p.m., Sahil Takiar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/49782/
> ---
> 
> (Updated July 19, 2016, 9:50 p.m.)
> 
> 
> Review request for hive, Sergio Pena, Thejas Nair, and Vaibhav Gumashta.
> 
> 
> Bugs: HIVE-14170
> https://issues.apache.org/jira/browse/HIVE-14170
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> * Added a new BeeLine option called `incrementalBufferRows`, which controls 
> the number of `Row`s the `IncrementalRows` class buffers; the default is 1000
> * Modified `BufferedRows` so that it can accept a limit on the number of 
> `Row`s it buffers
> * Modified `IncrementalRows` to read the value of `incrementalBufferRows` and 
> buffer rows as per HIVE-14170
> * The class delegates all buffering work to a `BufferedRows` class
> * This has the advantage that all the width calculation that spans multiple 
> rows can be encapsulated in the `BufferedRows` class; there is no need to 
> re-implement the logic in `IncrementalRows`
> * `IncrementalRows` will buffer `incrementalBufferRows` rows at a time; when 
> the buffer is depleted, it will fetch the next buffer and re-calculate the 
> width for that buffer
> 
> 
> Diffs
> -
> 
>   beeline/pom.xml 5503add 
>   beeline/src/java/org/apache/hive/beeline/BeeLine.java 66185f6 
>   beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java e2bbbea 
>   beeline/src/java/org/apache/hive/beeline/BufferedRows.java 5604742 
>   beeline/src/java/org/apache/hive/beeline/IncrementalRows.java 8aef976 
>   beeline/src/java/org/apache/hive/beeline/IncrementalRowsWithNormalization.java PRE-CREATION 
>   beeline/src/java/org/apache/hive/beeline/Rows.java 453f685 
>   beeline/src/main/resources/BeeLine.properties 7500df9 
>   beeline/src/test/org/apache/hive/beeline/TestIncrementalRowsWithNormalization.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/49782/diff/
> 
> 
> Testing
> ---
> 
> * Unit Test added for `IncrementalRows`
> * Tested locally
> 
> 
> Thanks,
> 
> Sahil Takiar
> 
>
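A minimal sketch of the buffering scheme described in the review request: rows are read `bufferRows` at a time and column widths are recomputed per buffer, so widths can differ between buffers. `IncrementalBufferSketch` and its methods are hypothetical and are not BeeLine's `IncrementalRows`/`BufferedRows` classes; a buffer of 2 is used for illustration, whereas the patch defaults `incrementalBufferRows` to 1000.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IncrementalBufferSketch {
    // Right-pad a cell with spaces to the given width.
    static String pad(String s, int width) {
        StringBuilder sb = new StringBuilder(s);
        while (sb.length() < width) {
            sb.append(' ');
        }
        return sb.toString();
    }

    // Column widths computed only from the rows in the current buffer.
    static int[] widths(List<String[]> buffer, int columns) {
        int[] w = new int[columns];
        for (String[] row : buffer) {
            for (int c = 0; c < columns; c++) {
                w[c] = Math.max(w[c], row[c].length());
            }
        }
        return w;
    }

    // Read up to bufferRows rows, recompute widths for that buffer, emit it, repeat.
    static List<String> render(Iterator<String[]> rows, int columns, int bufferRows) {
        if (bufferRows < 1) {
            throw new IllegalArgumentException("bufferRows must be >= 1");
        }
        List<String> out = new ArrayList<>();
        while (rows.hasNext()) {
            List<String[]> buffer = new ArrayList<>();
            while (rows.hasNext() && buffer.size() < bufferRows) {
                buffer.add(rows.next());
            }
            int[] w = widths(buffer, columns);
            for (String[] row : buffer) {
                StringBuilder line = new StringBuilder("|");
                for (int c = 0; c < columns; c++) {
                    line.append(' ').append(pad(row[c], w[c])).append(" |");
                }
                out.add(line.toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> data = Arrays.asList(
                new String[] {"1", "a"},
                new String[] {"22", "bb"},
                new String[] {"333", "c"});
        for (String line : render(data.iterator(), 2, 2)) {
            System.out.println(line);
        }
        // | 1  | a  |
        // | 22 | bb |
        // | 333 | c |
    }
}
```

With a buffer size of 2, the first two rows share widths computed from those two rows, while the third row, alone in the next buffer, gets widths of its own.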



Re: Review Request 51468: HIVE-14532 Enable qtests from IDE

2016-08-29 Thread Gabor Szadovszky


> On Aug. 27, 2016, 12:25 a.m., Gabor Szadovszky wrote:
> > pom.xml, line 195
> > 
> >
> > Wouldn't it infect the other modules (production) as well?
> 
> Zoltan Haindrich wrote:
> I don't think so... this property will affect everything where it's 
> used... when the ide profile is inactive, it will work as before.
> 
> Gabor Szadovszky wrote:
> Then, the comment "this will enable by default the shade plugin" in the 
> root pom is a bit misleading to me.
> 
> Zoltan Haindrich wrote:
> Disabling the shade plugin was not easy... controlling its execution phase 
> seemed the most straightforward way to control whether it is skipped...
> it's not good to have a misleading comment...
> 
> * do you have any suggestion?
> * should I remove the comment? I think the property name explains what it 
> does, and that way you can look up its uses and figure out its function from 
> the actual configuration.

For me, the comment means that the shade plugin will run for all the submodules, 
which is not true. I would say something like "Used for enabling/disabling the 
shade plugin where it is relevant. Enabled by default." What do you think?


- Gabor


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51468/#review147046
---


On Aug. 26, 2016, 10:13 p.m., Zoltan Haindrich wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51468/
> ---
> 
> (Updated Aug. 26, 2016, 10:13 p.m.)
> 
> 
> Review request for hive, Balint Molnar, Lefty Leverenz, and Prasanth_J.
> 
> 
> Bugs: HIVE-14532
> https://issues.apache.org/jira/browse/HIVE-14532
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Draft wiki page on how to execute qtests from an IDE:
> http://hastebin.com/paxicutive.vhdl
> 
> The patch itself contains:
> 
> * some automatic property settings that configure qtest-related things so 
> that qtests can be executed
> * a Maven profile to avoid invoking the shade plugin during IDE project 
> generation
> * some test.src.tables-related changes
> 
> 
> Diffs
> -
> 
>   itests/util/src/main/java/org/apache/hadoop/hive/cli/control/AbstractCliConfig.java efbd4657f22e856b9c9ba5f74472ad5fd9f9a5b5 
>   itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CliConfigs.java 319a205225c67f123ba35a2811ee117650d46dc3 
>   itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 4d4a929c159c61f9f4af3238d4b7baff146d346e 
>   jdbc/pom.xml b29739b3f8577c6e363b5c8ee39b63e53a17c907 
>   pom.xml 9ed1c19f3e312ee1f1d9932ee2ba228589a7cb49 
>   ql/pom.xml 02ddb805a228ed23694c8a81953dd2400d7308c6 
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/EnforceReadOnlyTables.java 4569ed5e41f2f4db8f3d539d7cfb693c442ba910 
> 
> Diff: https://reviews.apache.org/r/51468/diff/
> 
> 
> Testing
> ---
> 
> I've tested the draft using Eclipse:
> 3.0 -> 3.1 -> TestCliDriver(combine2,alter_concatenate_indexed_table)
> 3.0 -> 3.2 -> TestCliDriver(combine2)
> 
> I think this should work with IDEA too... but someone should probably check 
> it ;)
> 
> 
> Thanks,
> 
> Zoltan Haindrich
> 
>



[jira] [Created] (HIVE-14662) Wrong Class Instance When Using Custom SERDE

2016-08-29 Thread Nemon Lou (JIRA)
Nemon Lou created HIVE-14662:


 Summary: Wrong Class Instance When Using Custom SERDE
 Key: HIVE-14662
 URL: https://issues.apache.org/jira/browse/HIVE-14662
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Nemon Lou
Assignee: Nemon Lou


Using the [SerDe for 
MongoDB|https://github.com/mongodb/mongo-hadoop/blob/master/hive/src/main/java/com/mongodb/hadoop/hive/BSONSerDe.java] 
with this DDL:
{noformat}
create external table mytable (ID STRING..) 
ROW FORMAT SERDE  'com.mongodb.hadoop.hive.BSONSerDe' 
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"ID":"_id",.. }')
STORED AS INPUTFORMAT 'com.mongodb.hadoop.mapred.BSONFileInputFormat'
OUTPUTFORMAT 'com.mongodb.hadoop.hive.output.HiveBSONFileOutputFormat'
LOCATION 'hdfs:///mypath'; 
{noformat}
Open Beeline and run the following statements, then open another Beeline 
session and run them again. The second run fails.
{noformat}
add jar hdfs:///tmp/mongo-hadoop-hive-1.4.2_new.jar;
add jar hdfs:///tmp/mongo-java-driver-3.0.4.jar;
add jar hdfs:///tmp/mongo-hadoop-core-1.4.2_new.jar;
select * from mytable limit 1;
{noformat}

Error log:
{noformat}
2016-08-25 09:30:34,475 | WARN  | HiveServer2-Handler-Pool: Thread-11972 | Error fetching results:  | org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:1058)
org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: class com.mongodb.hadoop.hive.BSONSerDerequires a BSONWritable object, notclass com.mongodb.hadoop.io.BSONWritable
    at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:366)
    at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:251)
    at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:710)
    at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
    at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
    at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1673)
    at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
    at com.sun.proxy.$Proxy20.fetchResults(Unknown Source)
    at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:451)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:1049)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:692)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: class com.mongodb.hadoop.hive.BSONSerDerequires a BSONWritable object, notclass com.mongodb.hadoop.io.BSONWritable
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:507)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414)
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140)
    at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1756)
    at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:361)
    ... 24 more
Caused by: org.apache.hadoop.hive.serde2.SerDeException: class com.mongodb.hadoop.hive.BSONSerDerequires a BSONWritable object, notclass com.mongodb.hadoop.io.BSONWritable
    at com.mongodb.hadoop.hive.BSONSerDe.deserialize(BSONSerDe.java:196)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:488)
    ... 28 more
{noformat}

Note: must make sure the table is not 

More than one table created at the same location

2016-08-29 Thread naveen mahadevuni
Hi,

Is the following behavior a bug? I believe at least one part of it is a
bug. I created two Hive tables at the same location and inserted rows into
both tables. count(*) returns the correct count for each individual table,
but SELECT * on one table reads the rows from the other table's files too.

CREATE TABLE test1 (col1 INT, col2 INT)
stored as orc
LOCATION '/apps/hive/warehouse/test1';

insert into test1 values(1,2);
insert into test1 values(3,4);

hive> select count(*) from test1;
OK
2
Time taken: 0.177 seconds, Fetched: 1 row(s)


CREATE TABLE test2 (col1 INT, col2 INT)
stored as orc
LOCATION '/apps/hive/warehouse/test1';

insert into test2 values(1,2);
insert into test2 values(3,4);

hive> select count(*) from test2;
OK
2
Time taken: 2.683 seconds, Fetched: 1 row(s)

-- SELECT * fetches 4 records, whereas COUNT(*) above returns a count of 2.

hive> select * from test2;
OK
1   2
3   4
1   2
3   4
Time taken: 0.107 seconds, Fetched: 4 row(s)
hive> select * from test1;
OK
1   2
3   4
1   2
3   4
Time taken: 0.054 seconds, Fetched: 4 row(s)

Thanks,
Naveen