RE: Hive on Tez query failed with “wrong key class”

2015-07-28 Thread Bikas Saha

Also, I believe you are comparing the Tez code for IFile (which is intermediate 
data) vs code for SequenceFile (which is the final output or initial input from 
stable storage like HDFS). So they may not be related.

-Original Message-
From: Gopal Vijayaraghavan [mailto:go...@hortonworks.com] On Behalf Of Gopal Vijayaraghavan
Sent: Monday, July 27, 2015 9:20 PM
To: u...@tez.apache.org; user@hive.apache.org
Cc: Jim Green openkbi...@gmail.com
Subject: Re: Hive on Tez query failed with “wrong key class”

> The Java code that creates the sequence file sets the key class to
> NullWritable.class:
> job.setOutputKeyClass(org.apache.hadoop.io.NullWritable.class);
...
> I think that caused the mismatch:
> wrong key class: org.apache.hadoop.io.BytesWritable is not class
> org.apache.hadoop.io.NullWritable

In all likelihood, the exception you're hitting originates from here:

https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/SequenceFile.java#L2328

> Does anyone know why Tez checks the key and value classes when sorting?

As I said in my earlier mail, if you can check the SequenceFile headers and
they look like my pasted pair, then we know it's the same as the known issue.

Cheers,
Gopal

Re: Hive on Tez query failed with “wrong key class”

2015-07-27 Thread Jim Green

Hi Team,

Some clues:
The Java code that creates the sequence file sets the key class to
NullWritable.class:
job.setOutputKeyClass(org.apache.hadoop.io.NullWritable.class);

However, per the Hive source code, the key class for the sequence file
writer should be BytesWritable.
HiveSequenceFileOutputFormat.java:
final SequenceFile.Writer outStream = Utilities.createSequenceWriter(jc,
fs, finalOutPath, BytesWritable.class, valueClass, isCompressed);

I think that caused the mismatch:
wrong key class: org.apache.hadoop.io.BytesWritable is not class
org.apache.hadoop.io.NullWritable

Then I looked into the Tez source code and found the reason is in:
tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/sort/impl/IFile.java
/**
 * Send key/value to be appended to IFile. To represent same key as previous
 * one, send IFile.REPEAT_KEY as key parameter. Should not call this method
 * with IFile.REPEAT_KEY as the first key.
 *
 * @param key
 * @param value
 * @throws IOException
 */
public void append(Object key, Object value) throws IOException {
  checkArgument((key == REPEAT_KEY || key.getClass() == keyClass),
      WRONG_KEY_CLASS,
      key.getClass(), keyClass);

The IFile code above should be specific to Tez. Hive does not have that code
to check the key class and value class.
Does anyone know why Tez checks the key and value classes when sorting?

Thanks.




Re: Hive on Tez query failed with “wrong key class”

2015-07-27 Thread Gopal Vijayaraghavan




> The Java code that creates the sequence file sets the key class to
> NullWritable.class:
> job.setOutputKeyClass(org.apache.hadoop.io.NullWritable.class);
...
> I think that caused the mismatch:
> wrong key class: org.apache.hadoop.io.BytesWritable is not class
> org.apache.hadoop.io.NullWritable

In all likelihood, the exception you're hitting originates from here:

https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/SequenceFile.java#L2328

> Does anyone know why Tez checks the key and value classes when sorting?

As I said in my earlier mail, if you can check the SequenceFile headers
and they look like my pasted pair, then we know it's the same as the known
issue.
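
For checking the headers without extra tooling, here is a minimal sketch in plain Java. It assumes the usual header layout (the `SEQ` magic, one version byte, then the key and value class names, each with a one-byte length prefix). `SeqHeaderPeek` and its methods are hypothetical names, not Hadoop APIs, and older format versions or class names longer than 127 bytes are not handled.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of Hadoop): peeks at a SequenceFile header. */
final class SeqHeaderPeek {
    private SeqHeaderPeek() {}

    /** Returns {keyClassName, valueClassName} parsed from the first bytes of a file. */
    static String[] keyValueClasses(byte[] head) {
        ByteBuffer buf = ByteBuffer.wrap(head);
        try {
            if (buf.get() != 'S' || buf.get() != 'E' || buf.get() != 'Q') {
                throw new IllegalArgumentException("missing SEQ magic, not a SequenceFile");
            }
            buf.get(); // format version byte, not interpreted in this sketch
            String keyClass = readShortString(buf);
            String valueClass = readShortString(buf);
            return new String[] { keyClass, valueClass };
        } catch (BufferUnderflowException e) {
            throw new IllegalArgumentException("header is truncated", e);
        }
    }

    /** Reads a length-prefixed UTF-8 string; only one-byte lengths (0..127) are handled. */
    private static String readShortString(ByteBuffer buf) {
        int len = buf.get();
        if (len < 0) {
            throw new IllegalArgumentException("multi-byte length prefix is not handled here");
        }
        byte[] bytes = new byte[len];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Feed it the first few hundred bytes of each file under the table directory; any file whose key or value class differs from the rest is the one to look at. In a raw dump such as `SEQ!org.apache.hadoop.io.LongWritable...`, the `!` (0x21 = 33) is the length prefix of the 33-character key class name.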

Cheers,
Gopal

Re: Hive on Tez query failed with “wrong key class”

2015-07-21 Thread Jim Green

Sample stack trace:
[Error: Failure while running task:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: wrong key class: org.apache.hadoop.io.BytesWritable is not class org.apache.hadoop.io.NullWritable
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: wrong key class: org.apache.hadoop.io.BytesWritable is not class org.apache.hadoop.io.NullWritable
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:71)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:294)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:163)
    ... 13 more
Caused by: java.io.IOException: java.io.IOException: wrong key class: org.apache.hadoop.io.BytesWritable is not class org.apache.hadoop.io.NullWritable
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:363)
    at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
    at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
    at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:126)
    at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:61)
    ... 15 more
Caused by: java.io.IOException: wrong key class: org.apache.hadoop.io.BytesWritable is not class org.apache.hadoop.io.NullWritable
    at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2495)
    at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:82)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:358)
    ... 21 more
],



-- 
Thanks,
www.openkb.info
(Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)

RE: Hive on Tez query failed with “wrong key class”

2015-07-21 Thread Bikas Saha

A full stack trace would help determine if this is a Tez issue or a Hive issue.

From: Jim Green [mailto:openkbi...@gmail.com]
Sent: Tuesday, July 21, 2015 11:12 AM
To: u...@tez.apache.org; user@hive.apache.org
Subject: Hive on Tez query failed with “wrong key class”

Hi Team,

Env: Hive 1.0 on Tez 0.5.3
The query is a simple group-by on top of a sequence file table.

It fails with the error below in Tez mode:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException:
java.io.IOException: java.io.IOException: wrong key class:
org.apache.hadoop.io.BytesWritable is not class
org.apache.hadoop.io.NullWritable

And it works fine in MR mode.
Has anyone met this issue before?

--
Thanks,
www.openkb.info
(Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)

Re: Hive on Tez query failed with “wrong key class”

2015-07-21 Thread Gopal Vijayaraghavan


> The query is a simple group-by on top of a sequence file table.
...
> java.io.IOException: java.io.IOException: wrong key class:
> org.apache.hadoop.io.BytesWritable is not class
> org.apache.hadoop.io.NullWritable

I have seen this issue when mixing Sequence files written by Pig with
Sequence files written by Hive - primarily because the data ingestion
wasn't done properly via HCatalog writers.

In the last report, the first sequence file had as its header

M?.io.LongWritableorg.apache.hadoop.io.BytesWritable)org.apache.hadoop.io.compress.SnappyCodec??


and the second one had

SEQ!org.apache.hadoop.io.LongWritableorg.apache.hadoop.io.Text)org.apache.hadoop.io.compress.SnappyCodec?


You can cross-check the exception trace and make sure that the exception
is coming from the RecordReader as the k-v pairs change types between
files.

This mostly doesn't show up in Hive on MR at small scale, but it can
happen with both MR and Tez.

To hit this via CombineInputFormat, you need a file which has been split
up between machines and two such files to generate a combined split of
mismatched schema.

Tez is more aggressive at splitting, since it relies on the file format
splits, not HDFS locations.
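
A toy sketch of that scenario in plain Java, under the assumption that the reader for a combined split creates its key object to match the first file and then reuses it for every other file in the group. All names here (`GroupedSplitSketch`, `FakeSequenceFile`, `BytesKey`, `NullKey`) are stand-ins, not Hadoop, Hive, or Tez classes; the check in `next` only imitates the kind of key class comparison seen at the bottom of the stack trace.

```java
import java.util.List;
import java.util.function.Supplier;

/** Toy model only: none of these are Hadoop, Hive, or Tez classes. */
final class GroupedSplitSketch {
    private GroupedSplitSketch() {}

    /** Stand-ins for BytesWritable and NullWritable. */
    static final class BytesKey {}
    static final class NullKey {}

    /** A "file" reduced to the key class recorded in its header. */
    static final class FakeSequenceFile {
        private final Supplier<Object> keyFactory;
        private final Class<?> keyClass;

        FakeSequenceFile(Supplier<Object> keyFactory) {
            this.keyFactory = keyFactory;
            this.keyClass = keyFactory.get().getClass();
        }

        Object createKey() {
            return keyFactory.get();
        }

        /** Rejects a caller-supplied key whose class differs from the header's. */
        void next(Object key) {
            if (key.getClass() != keyClass) {
                // Concatenating a Class object prints "class <name>".
                throw new IllegalStateException(
                    "wrong key class: " + key.getClass().getName() + " is not " + keyClass);
            }
        }
    }

    /** Reads all files with one key object created for the first file only. */
    static String readGroupedSplit(List<FakeSequenceFile> files) {
        Object key = files.get(0).createKey();
        try {
            for (FakeSequenceFile file : files) {
                file.next(key); // same key object reused across files
            }
            return "ok";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }
}
```

Grouping two `BytesKey` files reads cleanly, while grouping a `BytesKey` file with a `NullKey` file fails on the second file with a message of the same shape as the one reported ("wrong key class: ...BytesKey is not class ...NullKey"). Read one file per split, the same pair would not trip the check.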

If you confirm that this is indeed the cause of the issue, I might have an
idea how to fix it.

Cheers,
Gopal
