[jira] [Created] (HIVE-10907) Hive on Tez: Classcast exception in some cases with SMB joins

2015-06-03 Thread Vikram Dixit K (JIRA)
Vikram Dixit K created HIVE-10907:
-

 Summary: Hive on Tez: Classcast exception in some cases with SMB 
joins
 Key: HIVE-10907
 URL: https://issues.apache.org/jira/browse/HIVE-10907
 Project: Hive
  Issue Type: Bug
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K


In cases where there is a mix of map-side work and reduce-side work, we get a 
ClassCastException because the code assumes homogeneity. We need to fix this 
properly; for now this is a workaround.
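To illustrate the failure mode described above (a toy model only; the class names mirror Hive's BaseWork/MapWork/ReduceWork hierarchy, but this is not Hive code): a blind cast that assumes every work item is map-side work fails on a reduce-side item, while dispatching on the actual subtype handles the mixed case.

```java
// Toy sketch of the homogeneity assumption behind the ClassCastException.
public class WorkDispatch {
    public static class BaseWork {}
    public static class MapWork extends BaseWork {}
    public static class ReduceWork extends BaseWork {}

    // Buggy pattern: assumes every work item is map-side work.
    public static String assumeMap(BaseWork w) {
        MapWork mw = (MapWork) w;  // throws ClassCastException for ReduceWork
        return "map:" + mw.getClass().getSimpleName();
    }

    // Defensive pattern: dispatch on the actual subtype.
    public static String dispatch(BaseWork w) {
        if (w instanceof MapWork) {
            return "map";
        } else if (w instanceof ReduceWork) {
            return "reduce";
        }
        return "unknown";
    }
}
```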



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10912) LLAP: Exception in InputInitializer when creating HiveSplitGenerator

2015-06-03 Thread Siddharth Seth (JIRA)
Siddharth Seth created HIVE-10912:
-

 Summary: LLAP: Exception in InputInitializer when creating 
HiveSplitGenerator
 Key: HIVE-10912
 URL: https://issues.apache.org/jira/browse/HIVE-10912
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth


{code}
2015-06-03 13:46:32,212 ERROR [Dispatcher thread: Central] exec.Utilities: 
Failed to load plan: 
hdfs://localhost:8020/tmp/hive/sseth/9c4ce145-f7f4-49c4-a615-28ce154f7f1d/hive_2015-06-03_13-46-29_283_23518
java.lang.NullPointerException
  at 
org.apache.hadoop.hive.ql.exec.GlobalWorkMapFactory.get(GlobalWorkMapFactory.java:85)
  at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:389)
  at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:299)
  at 
org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.init(HiveSplitGenerator.java:94)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
  at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
  at 
org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:69)
  at 
org.apache.tez.common.ReflectionUtils.createClazzInstance(ReflectionUtils.java:98)
  at 
org.apache.tez.dag.app.dag.RootInputInitializerManager.createInitializer(RootInputInitializerManager.java:137)
  at 
org.apache.tez.dag.app.dag.RootInputInitializerManager.runInputInitializers(RootInputInitializerManager.java:114)
  at 
org.apache.tez.dag.app.dag.impl.VertexImpl.setupInputInitializerManager(VertexImpl.java:4422)
  at org.apache.tez.dag.app.dag.impl.VertexImpl.access$4300(VertexImpl.java:200)
  at 
org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.handleInitEvent(VertexImpl.java:3271)
  at 
org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:3221)
  at 
org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:3202)
  at 
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
  at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
  at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
  at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
  at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
  at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1850)
  at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:199)
  at 
org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2001)
  at 
org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1987)
  at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
  at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
  at java.lang.Thread.run(Thread.java:745)
{code}





[jira] [Created] (HIVE-10908) Hive on tez: SMB join needs to work with different type of work items (map side with reduce side)

2015-06-03 Thread Vikram Dixit K (JIRA)
Vikram Dixit K created HIVE-10908:
-

 Summary: Hive on tez: SMB join needs to work with different type 
of work items (map side with reduce side)
 Key: HIVE-10908
 URL: https://issues.apache.org/jira/browse/HIVE-10908
 Project: Hive
  Issue Type: Improvement
  Components: Tez
Affects Versions: 1.3.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K


This is related to HIVE-10907. This is going to be the actual enhancement/fix.





[jira] [Created] (HIVE-10910) Alter table drop partition queries in encrypted zone failing to remove data from HDFS

2015-06-03 Thread Aswathy Chellammal Sreekumar (JIRA)
Aswathy Chellammal Sreekumar created HIVE-10910:
---

 Summary: Alter table drop partition queries in encrypted zone 
failing to remove data from HDFS
 Key: HIVE-10910
 URL: https://issues.apache.org/jira/browse/HIVE-10910
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.0
Reporter: Aswathy Chellammal Sreekumar
Assignee: Eugene Koifman


An alter table query that drops a partition removes the partition's metadata but 
fails to remove the data from HDFS:

hive> create table table_1(name string, age int, gpa double) partitioned by (b 
string) stored as textfile;
OK
Time taken: 0.732 seconds
hive> alter table table_1 add partition (b='2010-10-10');
OK
Time taken: 0.496 seconds
hive> show partitions table_1;
OK
b=2010-10-10
Time taken: 0.781 seconds, Fetched: 1 row(s)
hive> alter table table_1 drop partition (b='2010-10-10');
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask. Got exception: java.io.IOException 
Failed to move to trash: 
hdfs://ip-address:8020/warehouse-dir/table_1/b=2010-10-10
hive> show partitions table_1;
OK
Time taken: 0.622 seconds





Re: Creating branch-1

2015-06-03 Thread Vinod Kumar Vavilapalli
Hadoop uses a Target Version field. Not sure if this was done for all 
projects.

+Vinod

On Jun 3, 2015, at 9:16 AM, Alan Gates alanfga...@gmail.com wrote:

I don't think using Affects Version will work because it is used to list which 
versions of Hive the bug affects, unless you're proposing being able to parse 
the affected version into a branch (i.e. 1.3.0 => branch-1).

I like the idea of customizing JIRA, though I don't know how hard it is.

We could also use the labels field.  It would run against master by default and 
you could also add a label to run against an additional branch.  It would have 
to find a patch matching that branch in order to run.

Alan.

Thejas Nair thejas.n...@gmail.com
June 3, 2015 at 7:51
Thanks for the insights Sergio!
Using 'Affects Version' sounds like a good idea. However, for the case
where it needs to be executed against both branch-1 and master, I
think it would be more intuitive to use
Affects Version/s: branch-master & branch-1, as the version
number in master branch will keep increasing.

We might be able to request for a custom field in jira (say Test
branches) for this as well. But we could probably start with the
'Affects Version' approach.
Sergio Pena sergio.p...@cloudera.com
June 2, 2015 at 15:03
Hi Alan,

Currently, the test system executes tests on a specific branch only if
there is a Jenkins job assigned to it, like trunk or spark. Any other
branch will not work. We will need to create a job for branch-1, modify the
jenkins-submit-build.sh to add the new profile, and add a new properties
file to the Jenkins instance that contains branch information.

This is a little tedious for every branch we create.

Also, I don't think the test system will grab two patches (branch-1 & 
master) to execute the tests on different branches. It will get the latest
one you uploaded.

What about if we use the 'Affects Version/s' field of the ticket to specify
which branches the patch needs to be executed? Or as you said, use hints on
the comments.

For instance:
- Affects Version/s: branch-1 # Tests on branch-1 only
- Affects Version/s: 2.0.0 branch-1 # Tests on branch-1 and master
- Affects Version/s: branch-spark # Tests on branch-spark only

If we use 'branch-xxx' as a naming convention for our branches, then we can
detect the branch from the ticket details. And if x.x.x version is
specified, then just execute them from master.
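The detection logic above could be sketched as follows (an illustrative sketch only, not the actual jenkins-common.sh code; the class and method names are invented): values matching the proposed 'branch-xxx' convention name a branch directly, while a plain x.x.x version number means "run against master".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Toy sketch: map 'Affects Version/s' values to the branches to test,
// following the conventions proposed in this thread.
public class BranchDetector {
    private static final Pattern BRANCH = Pattern.compile("^branch-[\\w.-]+$");
    private static final Pattern VERSION = Pattern.compile("^\\d+(\\.\\d+)*$");

    public static List<String> branchesFor(String affectsVersions) {
        List<String> branches = new ArrayList<>();
        for (String v : affectsVersions.trim().split("\\s+")) {
            if (BRANCH.matcher(v).matches()) {
                branches.add(v);             // e.g. branch-1, branch-spark
            } else if (VERSION.matcher(v).matches() && !branches.contains("master")) {
                branches.add("master");      // e.g. 2.0.0 => master
            }
        }
        return branches;
    }

    public static void main(String[] args) {
        System.out.println(branchesFor("2.0.0 branch-1")); // [master, branch-1]
    }
}
```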

Also, branch-1 would need to be executed with MR1, right? Then the patch
file would need to be named 'HIVE--mr1.patch' so that it uses the MR1
environment.

Right now the code that parses this info is on process_jira function on
'jenkins-common.sh', and it is called by 'jenkins-submit-build.sh'. We can
parse different branches there, and let jenkins-submit-build.sh call the
correct job with specific branch details.

Any other ideas?

- Sergio



Alan Gates alanfga...@gmail.com
June 1, 2015 at 16:19
Based on our discussion and vote last week I'm working on creating branch-1.   
I plan to make the branch tomorrow.  If anyone has a large commit they don't 
want to have to commit twice and they are close to committing it let me know so 
I can make sure it gets in before I branch.

I'll also be updating 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute to clarify how 
to handle feature and bug fix patches on master and branch-1.

Also, we will need to make sure patches can be tested against master and 
branch-1.  If I understand correctly the test system today will run a patch 
against a branch instead of master if the patch is named with the branch name.  
There are a couple of issues with this.  One, people will often want to submit 
two versions of patches and have them both tested (one against master and one 
against branch-1) rather than one or the other.  The second is we will want a 
way for one patch to be tested against both when appropriate.  The first case 
could be handled by the system picking up both branch-1 and master patches and 
running them automatically.  The second could be handled by hints in the 
comments so the system needs to run both.  I'm open to other suggestions as 
well.  Can someone familiar with the testing code point to where I'd look to 
see what it would take to make this work?

Alan.



[jira] [Created] (HIVE-10911) Add support for date datatype in the value based windowing function

2015-06-03 Thread Aihua Xu (JIRA)
Aihua Xu created HIVE-10911:
---

 Summary: Add support for date datatype in the value based 
windowing function
 Key: HIVE-10911
 URL: https://issues.apache.org/jira/browse/HIVE-10911
 Project: Hive
  Issue Type: Sub-task
  Components: PTF-Windowing
Reporter: Aihua Xu


Currently the date datatype is not supported in value-based windowing functions.

For the following query, where hiredate is of date type, an exception is 
thrown:

{{select deptno, ename, hiredate, sal, sum(sal) over (partition by deptno order 
by hiredate range 90 preceding) from emp;}}

It would be valuable to support this type, with the number of days as the value 
difference.
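The intended semantics could look like the sketch below (illustrative only, not Hive's windowing implementation): for "range 90 preceding" over a date column, a candidate row falls in the window when the current row's date minus the candidate's date is between 0 and 90 days.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Sketch of value-based windowing over dates, using the day count
// as the value difference for "range N preceding".
public class DateRangeWindow {
    public static boolean inPrecedingRange(LocalDate current, LocalDate candidate, long amt) {
        long diffDays = ChronoUnit.DAYS.between(candidate, current);
        return diffDays >= 0 && diffDays <= amt;
    }

    public static void main(String[] args) {
        LocalDate cur = LocalDate.of(1981, 6, 9);
        System.out.println(inPrecedingRange(cur, LocalDate.of(1981, 4, 2), 90));   // true: 68 days back
        System.out.println(inPrecedingRange(cur, LocalDate.of(1980, 12, 17), 90)); // false: 174 days back
    }
}
```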





[jira] [Created] (HIVE-10909) Make TestFilterHooks robust

2015-06-03 Thread Ashutosh Chauhan (JIRA)
Ashutosh Chauhan created HIVE-10909:
---

 Summary: Make TestFilterHooks robust
 Key: HIVE-10909
 URL: https://issues.apache.org/jira/browse/HIVE-10909
 Project: Hive
  Issue Type: Test
  Components: Metastore, Tests
Affects Versions: 1.2.0
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Ashutosh Chauhan


Currently it sometimes fails when run in sequential order because of leftover 
state from previous tests.





[jira] [Created] (HIVE-10913) LLAP: cache QF counters have wrong counters

2015-06-03 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-10913:
---

 Summary: LLAP: cache QF counters have wrong counters
 Key: HIVE-10913
 URL: https://issues.apache.org/jira/browse/HIVE-10913
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin


Also not enough data





[jira] [Created] (HIVE-10917) ORC fails to read table with a 38Gb ORC file

2015-06-03 Thread Gopal V (JIRA)
Gopal V created HIVE-10917:
--

 Summary: ORC fails to read table with a 38Gb ORC file
 Key: HIVE-10917
 URL: https://issues.apache.org/jira/browse/HIVE-10917
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 1.3.0
Reporter: Gopal V


{code}

hive> set mapreduce.input.fileinputformat.split.maxsize=1;
hive> set mapreduce.input.fileinputformat.split.maxsize=1;
hive> alter table lineitem concatenate;
..
hive> dfs -ls /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem;
Found 12 items
-rwxr-xr-x   3 gopal supergroup 41368976599 2015-06-03 15:49 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/00_0
-rwxr-xr-x   3 gopal supergroup 36226719673 2015-06-03 15:48 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/01_0
-rwxr-xr-x   3 gopal supergroup 27544042018 2015-06-03 15:50 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/02_0
-rwxr-xr-x   3 gopal supergroup 23147063608 2015-06-03 15:44 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/03_0
-rwxr-xr-x   3 gopal supergroup 21079035936 2015-06-03 15:44 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/04_0
-rwxr-xr-x   3 gopal supergroup 13813961419 2015-06-03 15:43 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/05_0
-rwxr-xr-x   3 gopal supergroup  8155299977 2015-06-03 15:40 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/06_0
-rwxr-xr-x   3 gopal supergroup  6264478613 2015-06-03 15:40 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/07_0
-rwxr-xr-x   3 gopal supergroup  4653393054 2015-06-03 15:40 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/08_0
-rwxr-xr-x   3 gopal supergroup  3621672928 2015-06-03 15:39 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/09_0
-rwxr-xr-x   3 gopal supergroup  1460919310 2015-06-03 15:38 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/10_0
-rwxr-xr-x   3 gopal supergroup   485129789 2015-06-03 15:38 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/11_0
{code}

Errors occur even without PPD.

The suspicion is ORC stripe padding and stream offsets in the stream 
information when concatenating.

{code}
Caused by: java.io.EOFException: Read past end of RLE integer from compressed 
stream Stream for column 1 kind DATA position: 1608840 length: 1608840 range: 0 
offset: 1608840 limit: 1608840 range 0 = 0 to 1608840 uncompressed: 36845 to 
36845
at 
org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:56)
at 
org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:302)
at 
org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:346)
at 
org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$LongTreeReader.nextVector(TreeReaderFactory.java:582)
at 
org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StructTreeReader.nextVector(TreeReaderFactory.java:2026)
at 
org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1070)
... 25 more
{code}





[jira] [Created] (HIVE-10921) Change trunk pom version to reflect the branch-1 split

2015-06-03 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-10921:
---

 Summary: Change trunk pom version to reflect the branch-1 split
 Key: HIVE-10921
 URL: https://issues.apache.org/jira/browse/HIVE-10921
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-10921.patch







[jira] [Created] (HIVE-10923) encryption_join_with_different_encryption_keys.q fails on CentOS 6

2015-06-03 Thread Pengcheng Xiong (JIRA)
Pengcheng Xiong created HIVE-10923:
--

 Summary: encryption_join_with_different_encryption_keys.q fails on 
CentOS 6
 Key: HIVE-10923
 URL: https://issues.apache.org/jira/browse/HIVE-10923
 Project: Hive
  Issue Type: Bug
Reporter: Pengcheng Xiong


Here is the stack trace:
{code}
Task with the most failures(4):
-
Task ID:
  task_1433377676690_0015_m_00

URL:
  
http://ip-10-0-0-249.ec2.internal:44717/taskdetails.jsp?jobid=job_1433377676690_0015tipid=task_1433377676690_0015_m_00
-
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row {key:238,value:val_238}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:172)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row {key:238,value:val_238}
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:518)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163)
... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
org.apache.hadoop.hive.ql.metadata.HiveException: 
org.apache.hadoop.ipc.RemoteException(java.io.IOException): 
java.security.InvalidKeyException: Illegal key size
at 
org.apache.hadoop.crypto.JceAesCtrCryptoCodec$JceAesCtrCipher.init(JceAesCtrCryptoCodec.java:116)
at 
org.apache.hadoop.crypto.key.KeyProviderCryptoExtension$DefaultCryptoExtension.generateEncryptedKey(KeyProviderCryptoExtension.java:264)
at 
org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.generateEncryptedKey(KeyProviderCryptoExtension.java:371)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.generateEncryptedDataEncryptionKey(FSNamesystem.java:2489)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2620)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2519)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:566)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:394)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
Caused by: java.security.InvalidKeyException: Illegal key size
at javax.crypto.Cipher.checkCryptoPerm(Cipher.java:1024)
at javax.crypto.Cipher.implInit(Cipher.java:790)
at javax.crypto.Cipher.chooseProvider(Cipher.java:849)
at javax.crypto.Cipher.init(Cipher.java:1348)
at javax.crypto.Cipher.init(Cipher.java:1282)
at 
org.apache.hadoop.crypto.JceAesCtrCryptoCodec$JceAesCtrCipher.init(JceAesCtrCryptoCodec.java:113)
... 16 more

at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:577)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:675)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
at 
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
... 9 more
Caused by: 

[jira] [Created] (HIVE-10924) add support for MERGE statement

2015-06-03 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-10924:
-

 Summary: add support for MERGE statement
 Key: HIVE-10924
 URL: https://issues.apache.org/jira/browse/HIVE-10924
 Project: Hive
  Issue Type: Bug
  Components: Query Planning, Query Processor
Affects Versions: 1.2.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


add support for 
MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN ...






[jira] [Created] (HIVE-10916) ORC fails to read table with a 38Gb ORC file

2015-06-03 Thread Gopal V (JIRA)
Gopal V created HIVE-10916:
--

 Summary: ORC fails to read table with a 38Gb ORC file
 Key: HIVE-10916
 URL: https://issues.apache.org/jira/browse/HIVE-10916
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 1.3.0
Reporter: Gopal V


{code}

hive> set mapreduce.input.fileinputformat.split.maxsize=1;
hive> set mapreduce.input.fileinputformat.split.maxsize=1;
hive> alter table lineitem concatenate;
..
hive> dfs -ls /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem;
Found 12 items
-rwxr-xr-x   3 gopal supergroup 41368976599 2015-06-03 15:49 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/00_0
-rwxr-xr-x   3 gopal supergroup 36226719673 2015-06-03 15:48 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/01_0
-rwxr-xr-x   3 gopal supergroup 27544042018 2015-06-03 15:50 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/02_0
-rwxr-xr-x   3 gopal supergroup 23147063608 2015-06-03 15:44 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/03_0
-rwxr-xr-x   3 gopal supergroup 21079035936 2015-06-03 15:44 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/04_0
-rwxr-xr-x   3 gopal supergroup 13813961419 2015-06-03 15:43 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/05_0
-rwxr-xr-x   3 gopal supergroup  8155299977 2015-06-03 15:40 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/06_0
-rwxr-xr-x   3 gopal supergroup  6264478613 2015-06-03 15:40 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/07_0
-rwxr-xr-x   3 gopal supergroup  4653393054 2015-06-03 15:40 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/08_0
-rwxr-xr-x   3 gopal supergroup  3621672928 2015-06-03 15:39 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/09_0
-rwxr-xr-x   3 gopal supergroup  1460919310 2015-06-03 15:38 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/10_0
-rwxr-xr-x   3 gopal supergroup   485129789 2015-06-03 15:38 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/11_0
{code}

Errors occur even without PPD.

The suspicion is ORC stripe padding and stream offsets in the stream 
information when concatenating.

{code}
Caused by: java.io.EOFException: Read past end of RLE integer from compressed 
stream Stream for column 1 kind DATA position: 1608840 length: 1608840 range: 0 
offset: 1608840 limit: 1608840 range 0 = 0 to 1608840 uncompressed: 36845 to 
36845
at 
org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:56)
at 
org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:302)
at 
org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:346)
at 
org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$LongTreeReader.nextVector(TreeReaderFactory.java:582)
at 
org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StructTreeReader.nextVector(TreeReaderFactory.java:2026)
at 
org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1070)
... 25 more
{code}





[jira] [Created] (HIVE-10915) ORC fails to read table with a 38Gb ORC file

2015-06-03 Thread Gopal V (JIRA)
Gopal V created HIVE-10915:
--

 Summary: ORC fails to read table with a 38Gb ORC file
 Key: HIVE-10915
 URL: https://issues.apache.org/jira/browse/HIVE-10915
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 1.3.0
Reporter: Gopal V


{code}

hive> set mapreduce.input.fileinputformat.split.maxsize=1;
hive> set mapreduce.input.fileinputformat.split.maxsize=1;
hive> alter table lineitem concatenate;
..
hive> dfs -ls /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem;
Found 12 items
-rwxr-xr-x   3 gopal supergroup 41368976599 2015-06-03 15:49 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/00_0
-rwxr-xr-x   3 gopal supergroup 36226719673 2015-06-03 15:48 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/01_0
-rwxr-xr-x   3 gopal supergroup 27544042018 2015-06-03 15:50 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/02_0
-rwxr-xr-x   3 gopal supergroup 23147063608 2015-06-03 15:44 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/03_0
-rwxr-xr-x   3 gopal supergroup 21079035936 2015-06-03 15:44 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/04_0
-rwxr-xr-x   3 gopal supergroup 13813961419 2015-06-03 15:43 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/05_0
-rwxr-xr-x   3 gopal supergroup  8155299977 2015-06-03 15:40 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/06_0
-rwxr-xr-x   3 gopal supergroup  6264478613 2015-06-03 15:40 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/07_0
-rwxr-xr-x   3 gopal supergroup  4653393054 2015-06-03 15:40 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/08_0
-rwxr-xr-x   3 gopal supergroup  3621672928 2015-06-03 15:39 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/09_0
-rwxr-xr-x   3 gopal supergroup  1460919310 2015-06-03 15:38 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/10_0
-rwxr-xr-x   3 gopal supergroup   485129789 2015-06-03 15:38 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/11_0
{code}

Errors occur even without PPD.

Suspicious offsets in the stream information:

{code}
Caused by: java.io.EOFException: Read past end of RLE integer from compressed 
stream Stream for column 1 kind DATA position: 1608840 length: 1608840 range: 0 
offset: 1608840 limit: 1608840 range 0 = 0 to 1608840 uncompressed: 36845 to 
36845
at 
org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:56)
at 
org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:302)
at 
org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:346)
at 
org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$LongTreeReader.nextVector(TreeReaderFactory.java:582)
at 
org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StructTreeReader.nextVector(TreeReaderFactory.java:2026)
at 
org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1070)
... 25 more
{code}





Re: Creating branch-1

2015-06-03 Thread Sergey Shelukhin
Meanwhile, should we switch trunk HiveQA builds to use hadoop-2 profile? I see 
they are still running hadoop-1 :)

From: Vinod Kumar Vavilapalli vino...@hortonworks.com
Reply-To: dev@hive.apache.org
Date: Wednesday, June 3, 2015 at 12:17
To: dev@hive.apache.org
Subject: Re: Creating branch-1




Re: Creating branch-1

2015-06-03 Thread Thejas Nair
Do the hadoop jenkins scripts use some regex match on 'target version' to
identify the branch to be used ?


On Wed, Jun 3, 2015 at 12:17 PM, Vinod Kumar Vavilapalli 
vino...@hortonworks.com wrote:

  Hadoop uses a Target Version field. Not sure if this was done for all
 projects.

  +Vinod

  On Jun 3, 2015, at 9:16 AM, Alan Gates alanfga...@gmail.com wrote:

  I don't think using Affects Version will work because it is used to list
 which versions of Hive the bug affects, unless you're proposing being able
 to parse affected version into branch (i.e. 1.3.0 => branch-1).

 I like the idea of customizing JIRA, though I don't know how hard it is.

 We could also use the labels field.  It would run against master by
 default and you could also add a label to run against an additional
 branch.  It would have to find a patch matching that branch in order to run.

 Alan.

Thejas Nair thejas.n...@gmail.com
 June 3, 2015 at 7:51
   Thanks for the insights Sergio!
 Using 'Affects Version' sounds like a good idea. However, for the case
 where it needs to be executed against both branch-1 and master, I
 think it would be more intuitive to use
 Affects Version/s: branch-master branch-1  , as the version
 number in master branch will keep increasing.

 We might be able to request for a custom field in jira (say Test
 branches) for this as well. But we could probably start with the
 'Affects Version' approach.
Sergio Pena sergio.p...@cloudera.com
 June 2, 2015 at 15:03
   Hi Alan,

 Currently, the test system executes tests on a specific branch only if
 there is a Jenkins job assigned to it, like trunk or spark. Any other
 branch will not work. We will need to create a job for branch-1, modify the
 jenkins-submit-build.sh to add the new profile, and add a new properties
 file to the Jenkins instance that contains branch information.

 This is a little tedious for every branch we create.

 Also, I don't think the test system will grab two patches (branch-1 
 master) to execute the tests on different branches. It will get the latest
 one you uploaded.

 What about if we use the 'Affects Version/s' field of the ticket to specify
 which branches the patch needs to be executed? Or as you said, use hints on
 the comments.

 For instance:
 - Affects Version/s: branch-1 # Tests on branch-1 only
 - Affects Version/s: 2.0.0 branch-1 # Tests on branch-1 and master
 - Affects Version/s: branch-spark # Tests on branch-spark only

 If we use 'branch-xxx' as a naming convention for our branches, then we can
 detect the branch from the ticket details. And if x.x.x version is
 specified, then just execute them from master.

 Also, branch-1 would need to be executed with MR1, right? Then the patch
 file would need to be named 'HIVE--mr1.patch' so that it uses the MR1
 environment.

 Right now the code that parses this info is on process_jira function on
 'jenkins-common.sh', and it is called by 'jenkins-submit-build.sh'. We can
 parse different branches there, and let jenkins-submit-build.sh call the
 correct job with specific branch details.

 Any other ideas?

 - Sergio



Alan Gates alanfga...@gmail.com
 June 1, 2015 at 16:19
   Based on our discussion and vote last week I'm working on creating
 branch-1.   I plan to make the branch tomorrow.  If anyone has a large
 commit they don't want to have to commit twice and they are close to
 committing it let me know so I can make sure it gets in before I branch.

 I'll also be updating
 https://cwiki.apache.org/confluence/display/Hive/HowToContribute to
 clarify how to handle feature and bug fix patches on master and branch-1.

 Also, we will need to make sure patches can be tested against master and
 branch-1.  If I understand correctly the test system today will run a patch
 against a branch instead of master if the patch is named with the branch
 name.  There are a couple of issues with this.  One, people will often want
 to submit two versions of patches and have them both tested (one against
 master and one against branch-1) rather than one or the other.  The second
 is we will want a way for one patch to be tested against both when
 appropriate.  The first case could be handled by the system picking up both
 branch-1 and master patches and running them automatically.  The second
 could be handled by hints in the comments so the system needs to run both.
 I'm open to other suggestions as well.  Can someone familiar with the
 testing code point to where I'd look to see what it would take to make this
 work?

 Alan.





[jira] [Created] (HIVE-10918) ORC fails to read table with a 38Gb ORC file

2015-06-03 Thread Gopal V (JIRA)
Gopal V created HIVE-10918:
--

 Summary: ORC fails to read table with a 38Gb ORC file
 Key: HIVE-10918
 URL: https://issues.apache.org/jira/browse/HIVE-10918
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 1.3.0
Reporter: Gopal V


{code}

hive> set mapreduce.input.fileinputformat.split.maxsize=1;
hive> alter table lineitem concatenate;
..
hive> dfs -ls /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem;
Found 12 items
-rwxr-xr-x   3 gopal supergroup 41368976599 2015-06-03 15:49 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/00_0
-rwxr-xr-x   3 gopal supergroup 36226719673 2015-06-03 15:48 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/01_0
-rwxr-xr-x   3 gopal supergroup 27544042018 2015-06-03 15:50 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/02_0
-rwxr-xr-x   3 gopal supergroup 23147063608 2015-06-03 15:44 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/03_0
-rwxr-xr-x   3 gopal supergroup 21079035936 2015-06-03 15:44 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/04_0
-rwxr-xr-x   3 gopal supergroup 13813961419 2015-06-03 15:43 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/05_0
-rwxr-xr-x   3 gopal supergroup  8155299977 2015-06-03 15:40 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/06_0
-rwxr-xr-x   3 gopal supergroup  6264478613 2015-06-03 15:40 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/07_0
-rwxr-xr-x   3 gopal supergroup  4653393054 2015-06-03 15:40 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/08_0
-rwxr-xr-x   3 gopal supergroup  3621672928 2015-06-03 15:39 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/09_0
-rwxr-xr-x   3 gopal supergroup  1460919310 2015-06-03 15:38 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/10_0
-rwxr-xr-x   3 gopal supergroup   485129789 2015-06-03 15:38 
/apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/11_0
{code}

Errors without PPD

Suspicions about ORC stripe padding and stream offsets in the stream 
information, when concatenating.

{code}
Caused by: java.io.EOFException: Read past end of RLE integer from compressed 
stream Stream for column 1 kind DATA position: 1608840 length: 1608840 range: 0 
offset: 1608840 limit: 1608840 range 0 = 0 to 1608840 uncompressed: 36845 to 
36845
at 
org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:56)
at 
org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:302)
at 
org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:346)
at 
org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$LongTreeReader.nextVector(TreeReaderFactory.java:582)
at 
org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StructTreeReader.nextVector(TreeReaderFactory.java:2026)
at 
org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1070)
... 25 more
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10922) In HS2 doAs=false mode, file system related errors in one query causes other failures

2015-06-03 Thread Thejas M Nair (JIRA)
Thejas M Nair created HIVE-10922:


 Summary: In HS2 doAs=false mode, file system related errors in one 
query causes other failures
 Key: HIVE-10922
 URL: https://issues.apache.org/jira/browse/HIVE-10922
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0, 1.0.0, 1.1.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair


The Warehouse class has a few methods that close the file system object on errors.
With doAs=false, since all queries use the same HS2 UGI, the filesystem object 
is shared across queries/threads. When one query calls close on its filesystem 
object, the same object in use by other threads is closed as well, and any files 
registered for deletion on exit also get deleted.

There is also no close being done on the happy code path.
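The failure mode above comes from Hadoop's FileSystem cache: FileSystem.get() returns a cached instance keyed (among other things) by the UGI, so with doAs=false every thread holds the same object. A plain-Java analogue of such a cache illustrates it; the Connection class and cache here are an illustrative stand-in, not Hadoop's actual implementation:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SharedCacheCloseSketch {
    // Illustrative stand-in for a cached, shared resource such as a Hadoop
    // FileSystem: get() hands every caller with the same key the SAME
    // instance, so close() by one caller poisons it for all the others.
    static class Connection {
        private volatile boolean closed = false;
        void close() { closed = true; }
        boolean isOpen() { return !closed; }
    }

    private static final Map<String, Connection> CACHE = new ConcurrentHashMap<>();

    static Connection get(String key) {
        return CACHE.computeIfAbsent(key, k -> new Connection());
    }

    public static void main(String[] args) {
        Connection queryA = get("hs2-ugi");  // thread handling query A
        Connection queryB = get("hs2-ugi");  // thread handling query B

        // Query A hits an error and closes "its" connection...
        queryA.close();

        // ...but query B was holding the very same cached instance.
        if (queryB.isOpen()) {
            throw new AssertionError("expected shared instance to be closed");
        }
        System.out.println("queryB broken by queryA's close");
    }
}
```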




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10914) LLAP: fix hadoop-1 build for good by removing llap-server from hadoop-1 build

2015-06-03 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-10914:
---

 Summary: LLAP: fix hadoop-1 build for good by removing llap-server 
from hadoop-1 build
 Key: HIVE-10914
 URL: https://issues.apache.org/jira/browse/HIVE-10914
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin


LLAP won't ever work with hadoop 1, so no point in building it



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10920) LLAP: elevator reads some useless data even if all RGs are eliminated by SARG

2015-06-03 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-10920:
---

 Summary: LLAP: elevator reads some useless data even if all RGs 
are eliminated by SARG
 Key: HIVE-10920
 URL: https://issues.apache.org/jira/browse/HIVE-10920
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: llap






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10919) Windows: create table with JsonSerDe failed via beeline unless you add hcatalog core jar to classpath

2015-06-03 Thread Hari Sankar Sivarama Subramaniyan (JIRA)
Hari Sankar Sivarama Subramaniyan created HIVE-10919:


 Summary: Windows: create table with JsonSerDe failed via beeline 
unless you add hcatalog core jar to classpath
 Key: HIVE-10919
 URL: https://issues.apache.org/jira/browse/HIVE-10919
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan


Before we run HiveServer2 tests, we create table via beeline.
And 'create table' with JsonSerDe failed on Windows. It works on Linux:

{noformat}
0: jdbc:hive2://localhost:10001> create external table all100kjson(
0: jdbc:hive2://localhost:10001> s string,
0: jdbc:hive2://localhost:10001> i int,
0: jdbc:hive2://localhost:10001> d double,
0: jdbc:hive2://localhost:10001> m map<string, string>,
0: jdbc:hive2://localhost:10001> bb array<struct<a: int, b: string>>,
0: jdbc:hive2://localhost:10001> t timestamp)
0: jdbc:hive2://localhost:10001> row format serde 
'org.apache.hive.hcatalog.data.JsonSerDe'
0: jdbc:hive2://localhost:10001> WITH SERDEPROPERTIES 
('timestamp.formats'='yyyy-MM-dd\'T\'HH:mm:ss')
0: jdbc:hive2://localhost:10001> STORED AS TEXTFILE
0: jdbc:hive2://localhost:10001> location '/user/hcat/tests/data/all100kjson';
Error: Error while processing statement: FAILED: Execution Error, return code 1 
from org.apache.hadoop.hive.ql.exec.DDLT
ask. Cannot validate serde: org.apache.hive.hcatalog.data.JsonSerDe 
(state=08S01,code=1)
{noformat}

hive.log shows:
{noformat}
2015-05-21 21:59:17,004 ERROR operation.Operation (SQLOperation.java:run(209)) 
- Error running hive query: 

org.apache.hive.service.cli.HiveSQLException: Error while processing statement: 
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask. Cannot validate serde: 
org.apache.hive.hcatalog.data.JsonSerDe

at 
org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:315)

at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:156)

at 
org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71)

at 
org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:415)

at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)

at 
org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218)

at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)

at java.util.concurrent.FutureTask.run(FutureTask.java:262)

at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:745)

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Cannot validate 
serde: org.apache.hive.hcatalog.data.JsonSerDe

at 
org.apache.hadoop.hive.ql.exec.DDLTask.validateSerDe(DDLTask.java:3871)

at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4011)

at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:306)

at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)

at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)

at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1650)

at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1409)

at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1192)

at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)

at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1054)

at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154)

... 11 more

Caused by: java.lang.ClassNotFoundException: Class 
org.apache.hive.hcatalog.data.JsonSerDe not found

at 
org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)

at 
org.apache.hadoop.hive.ql.exec.DDLTask.validateSerDe(DDLTask.java:3865)

... 21 more
{noformat}

If you do add the hcatalog jar to classpath, it works:
{noformat}
0: jdbc:hive2://localhost:10001> add jar 
hdfs:///tmp/testjars/hive-hcatalog-core-1.2.0.2.3.0.0-2079.jar;
INFO  : converting to local 
hdfs:///tmp/testjars/hive-hcatalog-core-1.2.0.2.3.0.0-2079.jar
INFO  : Added 
[/C:/Users/hadoop/AppData/Local/Temp/bc941dac-3bca-4287-a490-8a65c2dac220_resources/hive-hcatalog-core-1.2
.0.2.3.0.0-2079.jar] to class path
INFO  : Added resources: 
[hdfs:///tmp/testjars/hive-hcatalog-core-1.2.0.2.3.0.0-2079.jar]
No rows affected (0.304 seconds)
0: jdbc:hive2://localhost:10001> create external table all100kjson(
0: jdbc:hive2://localhost:10001> s string,
0: 

Re: Review Request 34752: Beeline-CLI: Implement CLI source command using Beeline functionality

2015-06-03 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34752/#review86368
---

Ship it!


Ship It!

- Xuefu Zhang


On June 1, 2015, 3:17 a.m., cheng xu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34752/
 ---
 
 (Updated June 1, 2015, 3:17 a.m.)
 
 
 Review request for hive, chinna and Xuefu Zhang.
 
 
 Bugs: HIVE-10821
 https://issues.apache.org/jira/browse/HIVE-10821
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Add source command support for CLI using beeline
 
 
 Diffs
 -
 
   beeline/src/java/org/apache/hive/beeline/BeeLine.java 4a82635 
   beeline/src/java/org/apache/hive/beeline/Commands.java 4c60525 
   beeline/src/test/org/apache/hive/beeline/cli/TestHiveCli.java cc0b598 
 
 Diff: https://reviews.apache.org/r/34752/diff/
 
 
 Testing
 ---
 
 Newly created UT passed
 
 
 Thanks,
 
 cheng xu
 




[jira] [Created] (HIVE-10903) Add hive.in.test for HoS tests [Spark Branch]

2015-06-03 Thread Rui Li (JIRA)
Rui Li created HIVE-10903:
-

 Summary: Add hive.in.test for HoS tests [Spark Branch]
 Key: HIVE-10903
 URL: https://issues.apache.org/jira/browse/HIVE-10903
 Project: Hive
  Issue Type: Test
Reporter: Rui Li






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10905) QuitExit fails ending with ';' [beeline-cli Branch]

2015-06-03 Thread Chinna Rao Lalam (JIRA)
Chinna Rao Lalam created HIVE-10905:
---

 Summary: QuitExit fails ending with ';' [beeline-cli Branch]
 Key: HIVE-10905
 URL: https://issues.apache.org/jira/browse/HIVE-10905
 Project: Hive
  Issue Type: Bug
Affects Versions: beeline-cli-branch
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam


In the old CLI, quit and exit expect a trailing ';'.

In the updated CLI, quit and exit work without a trailing ';', but quit and 
exit ending with ';' throw an exception. Support quit and exit ending with 
';' for compatibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Creating branch-1

2015-06-03 Thread Thejas Nair
Thanks for the insights Sergio!
Using 'Affects Version' sounds like a good idea. However, for the case
where it needs to be executed against both branch-1 and master, I
think it would be more intuitive to use
Affects Version/s: branch-master branch-1 , as the version
number in master branch will keep increasing.

We might be able to request for a custom field in jira (say Test
branches) for this as well. But we could probably start with the
'Affects Version' approach.

On Tue, Jun 2, 2015 at 3:03 PM, Sergio Pena sergio.p...@cloudera.com wrote:
 Hi Alan,

 Currently, the test system executes tests on a specific branch only if
 there is a Jenkins job assigned to it, like trunk or spark. Any other
 branch will not work. We will need to create a job for branch-1, modify the
 jenkins-submit-build.sh to add the new profile, and add a new properties
 file to the Jenkins instance that contains branch information.

 This is a little tedious for every branch we create.

 Also, I don't think the test system will grab two patches (branch-1 
 master) to execute the tests on different branches. It will get the latest
 one you uploaded.

 What about if we use the 'Affects Version/s' field of the ticket to specify
 which branches the patch needs to be executed? Or as you said, use hints on
 the comments.

 For instance:
 - Affects Version/s: branch-1  # Tests on branch-1 only
 - Affects Version/s: 2.0.0 branch-1   # Tests on branch-1 and master
 - Affects Version/s: branch-spark # Tests on branch-spark only

 If we use 'branch-xxx' as a naming convention for our branches, then we can
 detect the branch from the ticket details. And if x.x.x version is
 specified, then just execute them from master.

 Also, branch-1 would need to be executed with MR1, right? Then the patch
 file would need to be named 'HIVE--mr1.patch' so that it uses the MR1
 environment.

 Right now the code that parses this info is on process_jira function on
 'jenkins-common.sh', and it is called by 'jenkins-submit-build.sh'. We can
 parse different branches there, and let jenkins-submit-build.sh call the
 correct job with specific branch details.

 Any other ideas?

 - Sergio


 On Mon, Jun 1, 2015 at 6:19 PM, Alan Gates alanfga...@gmail.com wrote:

 Based on our discussion and vote last week I'm working on creating
 branch-1.   I plan to make the branch tomorrow.  If anyone has a large
 commit they don't want to have to commit twice and they are close to
 committing it let me know so I can make sure it gets in before I branch.

 I'll also be updating
 https://cwiki.apache.org/confluence/display/Hive/HowToContribute to
 clarify how to handle feature and bug fix patches on master and branch-1.

 Also, we will need to make sure patches can be tested against master and
 branch-1.  If I understand correctly the test system today will run a patch
 against a branch instead of master if the patch is named with the branch
 name.  There are a couple of issues with this.  One, people will often want
 to submit two versions of patches and have them both tested (one against
 master and one against branch-1) rather than one or the other.  The second
 is we will want a way for one patch to be tested against both when
 appropriate.  The first case could be handled by the system picking up both
 branch-1 and master patches and running them automatically.  The second
 could be handled by hints in the comments so the system needs to run both.
 I'm open to other suggestions as well.  Can someone familiar with the
 testing code point to where I'd look to see what it would take to make this
 work?

 Alan.



[jira] [Created] (HIVE-10904) Use beeline-log4j.properties for migrated CLI [beeline-cli Branch]

2015-06-03 Thread Chinna Rao Lalam (JIRA)
Chinna Rao Lalam created HIVE-10904:
---

 Summary: Use beeline-log4j.properties for migrated CLI 
[beeline-cli Branch]
 Key: HIVE-10904
 URL: https://issues.apache.org/jira/browse/HIVE-10904
 Project: Hive
  Issue Type: Bug
Affects Versions: beeline-cli-branch
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam


The updated CLI prints logs to the console. Use beeline-log4j.properties to 
redirect them to a file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Creating branch-1

2015-06-03 Thread Alan Gates
I don't think using Affects Version will work because it is used to list 
which versions of Hive the bug affects, unless you're proposing being 
able to parse affected version into branch (i.e. 1.3.0 => branch-1).


I like the idea of customizing JIRA, though I don't know how hard it is.

We could also use the labels field.  It would run against master by 
default and you could also add a label to run against an additional 
branch.  It would have to find a patch matching that branch in order to run.


Alan.


Thejas Nair mailto:thejas.n...@gmail.com
June 3, 2015 at 7:51
Thanks for the insights Sergio!
Using 'Affects Version' sounds like a good idea. However, for the case
where it needs to be executed against both branch-1 and master, I
think it would be more intuitive to use
Affects Version/s: branch-master branch-1  , as the version
number in master branch will keep increasing.

We might be able to request for a custom field in jira (say Test
branches) for this as well. But we could probably start with the
'Affects Version' approach.
Sergio Pena mailto:sergio.p...@cloudera.com
June 2, 2015 at 15:03
Hi Alan,

Currently, the test system executes tests on a specific branch only if
there is a Jenkins job assigned to it, like trunk or spark. Any other
branch will not work. We will need to create a job for branch-1, 
modify the

jenkins-submit-build.sh to add the new profile, and add a new properties
file to the Jenkins instance that contains branch information.

This is a little tedious for every branch we create.

Also, I don't think the test system will grab two patches (branch-1 
master) to execute the tests on different branches. It will get the latest
one you uploaded.

What about if we use the 'Affects Version/s' field of the ticket to 
specify
which branches the patch needs to be executed? Or as you said, use 
hints on

the comments.

For instance:
- Affects Version/s: branch-1 # Tests on branch-1 only
- Affects Version/s: 2.0.0 branch-1 # Tests on branch-1 and master
- Affects Version/s: branch-spark # Tests on branch-spark only

If we use 'branch-xxx' as a naming convention for our branches, then 
we can

detect the branch from the ticket details. And if x.x.x version is
specified, then just execute them from master.

Also, branch-1 would need to be executed with MR1, right? Then the patch
file would need to be named 'HIVE--mr1.patch' so that it uses the MR1
environment.

Right now the code that parses this info is on process_jira function on
'jenkins-common.sh', and it is called by 'jenkins-submit-build.sh'. We can
parse different branches there, and let jenkins-submit-build.sh call the
correct job with specific branch details.

Any other ideas?

- Sergio



Alan Gates mailto:alanfga...@gmail.com
June 1, 2015 at 16:19
Based on our discussion and vote last week I'm working on creating 
branch-1.   I plan to make the branch tomorrow.  If anyone has a large 
commit they don't want to have to commit twice and they are close to 
committing it let me know so I can make sure it gets in before I branch.


I'll also be updating 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute to 
clarify how to handle feature and bug fix patches on master and branch-1.


Also, we will need to make sure patches can be tested against master 
and branch-1.  If I understand correctly the test system today will 
run a patch against a branch instead of master if the patch is named 
with the branch name.  There are a couple of issues with this.  One, 
people will often want to submit two versions of patches and have them 
both tested (one against master and one against branch-1) rather than 
one or the other.  The second is we will want a way for one patch to 
be tested against both when appropriate.  The first case could be 
handled by the system picking up both branch-1 and master patches and 
running them automatically.  The second could be handled by hints in 
the comments so the system needs to run both.  I'm open to other 
suggestions as well.  Can someone familiar with the testing code point 
to where I'd look to see what it would take to make this work?


Alan.


Hive-0.14 - Build # 973 - Fixed

2015-06-03 Thread Apache Jenkins Server
Changes for Build #972

Changes for Build #973



No tests ran.

The Apache Jenkins build system has built Hive-0.14 (build #973)

Status: Fixed

Check console output at https://builds.apache.org/job/Hive-0.14/973/ to view 
the results.

[jira] [Created] (HIVE-10925) Non-static threadlocals in metastore code can potentially cause memory leak

2015-06-03 Thread Vaibhav Gumashta (JIRA)
Vaibhav Gumashta created HIVE-10925:
---

 Summary: Non-static threadlocals in metastore code can potentially 
cause memory leak
 Key: HIVE-10925
 URL: https://issues.apache.org/jira/browse/HIVE-10925
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.2.0, 1.0.0, 0.14.0, 0.12.0, 0.11.0, 1.1.0, 0.13
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta


There are many places where non-static threadlocals are used. I can't see a 
good reason for using them. They can potentially leak objects if, for example, 
they are created in a long-running thread every time the thread handles a new 
session.
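The difference between the two patterns can be shown in plain Java. The Handler classes below are hypothetical, not Hive's actual metastore classes: a non-static ThreadLocal means every Handler instance registers its own entry in the worker thread's ThreadLocalMap, so a long-lived thread that creates a new Handler per session accumulates entries until the handlers are garbage-collected; a static ThreadLocal gives all instances one shared per-thread slot that can be cleared with remove().

```java
public class ThreadLocalLeakSketch {
    // Anti-pattern: non-static ThreadLocal — one ThreadLocalMap entry per
    // Handler instance, lingering until the Handler (and its ThreadLocal
    // key) is collected and the stale entry is expunged.
    static class LeakyHandler {
        private final ThreadLocal<byte[]> buffer =
            ThreadLocal.withInitial(() -> new byte[1024]);
        byte[] buffer() { return buffer.get(); }
    }

    // Preferred pattern: one static ThreadLocal shared by all instances,
    // with an explicit remove() when the session is done.
    static class SafeHandler {
        private static final ThreadLocal<byte[]> BUFFER =
            ThreadLocal.withInitial(() -> new byte[1024]);
        byte[] buffer() { return BUFFER.get(); }
        static void release() { BUFFER.remove(); }
    }

    public static void main(String[] args) {
        // Same worker thread handling several "sessions".
        byte[] leak1 = new LeakyHandler().buffer();
        byte[] leak2 = new LeakyHandler().buffer();
        if (leak1 == leak2) {
            throw new AssertionError("non-static ThreadLocals are distinct per instance");
        }

        byte[] safe1 = new SafeHandler().buffer();
        byte[] safe2 = new SafeHandler().buffer();
        if (safe1 != safe2) {
            throw new AssertionError("static ThreadLocal is shared across instances");
        }
        SafeHandler.release();  // clear the thread's slot at session end
        System.out.println("ok");
    }
}
```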



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10906) Value based UDAF function throws NPE

2015-06-03 Thread Aihua Xu (JIRA)
Aihua Xu created HIVE-10906:
---

 Summary: Value based UDAF function throws NPE
 Key: HIVE-10906
 URL: https://issues.apache.org/jira/browse/HIVE-10906
 Project: Hive
  Issue Type: Sub-task
  Components: PTF-Windowing
Reporter: Aihua Xu


The following query throws NPE.
{noformat}
select key, value, min(value) over (partition by key range between unbounded 
preceding and current row) from small;
FAILED: NullPointerException null


2015-06-03 13:48:09,268 ERROR [main]: ql.Driver 
(SessionState.java:printError(957)) - FAILED: NullPointerException null
java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.parse.WindowingSpec.validateValueBoundary(WindowingSpec.java:293)
at 
org.apache.hadoop.hive.ql.parse.WindowingSpec.validateWindowFrame(WindowingSpec.java:281)
at 
org.apache.hadoop.hive.ql.parse.WindowingSpec.validateAndMakeEffective(WindowingSpec.java:155)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genWindowingPlan(SemanticAnalyzer.java:11965)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:8910)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8868)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9713)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9606)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10079)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:327)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10090)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:208)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1124)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1172)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1061)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1051)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)