RE: Hive on Tez query failed with “wrong key class
Also, I believe you are comparing the Tez code for IFile (which holds intermediate data) with the code for SequenceFile (which holds the final output, or the initial input from stable storage such as HDFS). So the two may not be related.

-Original Message-
From: Gopal Vijayaraghavan [mailto:go...@hortonworks.com] On Behalf Of Gopal Vijayaraghavan
Sent: Monday, July 27, 2015 9:20 PM
To: u...@tez.apache.org; user@hive.apache.org
Cc: Jim Green openkbi...@gmail.com
Subject: Re: Hive on Tez query failed with “wrong key class

> From the java code which creates the sequence file, it has set the key class to NullWritable.class:
> job.setOutputKeyClass(org.apache.hadoop.io.NullWritable.class);
...
> I think that caused the mismatch:
> wrong key class: org.apache.hadoop.io.BytesWritable is not class org.apache.hadoop.io.NullWritable

In all likelihood, the exception you're hitting originates from here:

https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/SequenceFile.java#L2328

> Anyone knows why Tez will check the key and value class when doing sort stuff?

As I said in my earlier mail, if you can check the SequenceFile headers and they look like my pasted pair, then we know it's the same as the known issue.

Cheers,
Gopal
Re: Hive on Tez query failed with “wrong key class
Hi Team,

Some clues:

The Java code that creates the sequence file sets the key class to NullWritable.class:

    job.setOutputKeyClass(org.apache.hadoop.io.NullWritable.class);

However, per the Hive source code, the key class for the sequence file writer should be BytesWritable.

HiveSequenceFileOutputFormat.java:

    final SequenceFile.Writer outStream = Utilities.createSequenceWriter(jc, fs, finalOutPath,
        BytesWritable.class, valueClass, isCompressed);

I think that caused the mismatch:

    wrong key class: org.apache.hadoop.io.BytesWritable is not class org.apache.hadoop.io.NullWritable

Then I looked into the Tez source code and found this check in:

tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/sort/impl/IFile.java

    /**
     * Send key/value to be appended to IFile. To represent same key as previous
     * one, send IFile.REPEAT_KEY as key parameter. Should not call this method with
     * IFile.REPEAT_KEY as the first key.
     *
     * @param key
     * @param value
     * @throws IOException
     */
    public void append(Object key, Object value) throws IOException {
      checkArgument((key == REPEAT_KEY || key.getClass() == keyClass),
          WRONG_KEY_CLASS, key.getClass(), keyClass);

The IFile code above should be specific to Tez. Hive does not have code like that to check the key class and value class.

Does anyone know why Tez checks the key and value classes when sorting?

Thanks.
On Tue, Jul 21, 2015 at 5:26 PM, Jim Green openkbi...@gmail.com wrote:

> Sample stacktrace is:
>
> [Error: Failure while running task:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: wrong key class: org.apache.hadoop.io.BytesWritable is not class org.apache.hadoop.io.NullWritable
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
>     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: wrong key class: org.apache.hadoop.io.BytesWritable is not class org.apache.hadoop.io.NullWritable
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:71)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:294)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:163)
>     ... 13 more
> Caused by: java.io.IOException: java.io.IOException: wrong key class: org.apache.hadoop.io.BytesWritable is not class org.apache.hadoop.io.NullWritable
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>     at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:363)
>     at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
>     at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
>     at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
>     at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:126)
>     at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:61)
>     ... 15 more
> Caused by: java.io.IOException: wrong key class: org.apache.hadoop.io.BytesWritable is not class org.apache.hadoop.io.NullWritable
>     at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2495)
>     at
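For readers following the IFile.java excerpt in this message, here is a minimal sketch of the same pattern: an append call that accepts either a "repeat the previous key" sentinel or a key whose class is exactly the configured key class. `TypedAppender` and its fields are hypothetical stand-ins written only for illustration; this is not the Tez class, and the error text is merely modelled on the message seen in this thread.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a typed intermediate-data writer; not the Tez IFile class. */
class TypedAppender {
    /** Sentinel meaning "same key as the previous record". */
    static final Object REPEAT_KEY = new Object();

    private final Class<?> keyClass;
    private final List<Object[]> records = new ArrayList<>();
    private Object lastKey;

    TypedAppender(Class<?> keyClass) {
        this.keyClass = keyClass;
    }

    void append(Object key, Object value) throws IOException {
        if (key == REPEAT_KEY) {
            if (lastKey == null) {
                throw new IOException("REPEAT_KEY cannot be the first key");
            }
            key = lastKey;
        } else if (key.getClass() != keyClass) {
            // Exact class comparison: even a subclass of keyClass is rejected.
            throw new IOException("wrong key class: " + key.getClass().getName()
                    + " is not " + keyClass);
        }
        lastKey = key;
        records.add(new Object[] {key, value});
    }

    int size() {
        return records.size();
    }
}
```

A writer like this rejects a mismatched key at write time, which is a different place from the read-side failure shown in the stack trace.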
Re: Hive on Tez query failed with “wrong key class
> From the java code which creates the sequence file, it has set the key class to NullWritable.class:
> job.setOutputKeyClass(org.apache.hadoop.io.NullWritable.class);
...
> I think that caused the mismatch:
> wrong key class: org.apache.hadoop.io.BytesWritable is not class org.apache.hadoop.io.NullWritable

In all likelihood, the exception you're hitting originates from here:

https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/SequenceFile.java#L2328

> Anyone knows why Tez will check the key and value class when doing sort stuff?

As I said in my earlier mail, if you can check the SequenceFile headers and they look like my pasted pair, then we know it's the same as the known issue.

Cheers,
Gopal
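To make the read-side failure concrete, here is a small sketch under stated assumptions. It assumes the reader compares the class of the caller-supplied key object with the key class declared in the file header and throws when they differ, which is what the "X is not class Y" message in the trace suggests. `KeyA`, `KeyB`, `HeaderCheckedReader` and `GroupedSplitDemo` are hypothetical names; none of this is Hadoop code.

```java
import java.io.IOException;

/** Hypothetical key types standing in for two different Writable classes. */
class KeyA {}
class KeyB {}

/** Hypothetical reader that only models the header-versus-caller class check. */
class HeaderCheckedReader {
    private final Class<?> keyClass; // key class as declared in the file header

    HeaderCheckedReader(Class<?> keyClassFromHeader) {
        this.keyClass = keyClassFromHeader;
    }

    /** Rejects a caller-supplied key object whose class differs from the header's. */
    boolean next(Object key) throws IOException {
        if (key.getClass() != keyClass) {
            throw new IOException("wrong key class: " + key.getClass().getName()
                    + " is not " + keyClass);
        }
        return true; // a real reader would deserialize the next record into key
    }
}

class GroupedSplitDemo {
    /** Reuses one key object across several files; returns the error message, or null. */
    static String readAll(Object reusedKey, Class<?>... headerKeyClasses) {
        try {
            for (Class<?> headerKeyClass : headerKeyClasses) {
                new HeaderCheckedReader(headerKeyClass).next(reusedKey);
            }
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }
}
```

With this model, `GroupedSplitDemo.readAll(new KeyA(), KeyA.class, KeyB.class)` fails on the second file, because the key object created for the first file no longer matches what the second header declares.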
Re: Hive on Tez query failed with “wrong key class
Sample stacktrace is:

[Error: Failure while running task:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: wrong key class: org.apache.hadoop.io.BytesWritable is not class org.apache.hadoop.io.NullWritable
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: wrong key class: org.apache.hadoop.io.BytesWritable is not class org.apache.hadoop.io.NullWritable
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:71)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:294)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:163)
    ... 13 more
Caused by: java.io.IOException: java.io.IOException: wrong key class: org.apache.hadoop.io.BytesWritable is not class org.apache.hadoop.io.NullWritable
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:363)
    at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
    at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
    at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:126)
    at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:61)
    ... 15 more
Caused by: java.io.IOException: wrong key class: org.apache.hadoop.io.BytesWritable is not class org.apache.hadoop.io.NullWritable
    at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2495)
    at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:82)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:358)
    ... 21 more
]

On Tue, Jul 21, 2015 at 11:26 AM, Bikas Saha bi...@hortonworks.com wrote:

> A full stack trace would help determine if this is a Tez issue or a Hive issue.
>
> From: Jim Green [mailto:openkbi...@gmail.com]
> Sent: Tuesday, July 21, 2015 11:12 AM
> To: u...@tez.apache.org; user@hive.apache.org
> Subject: Hive on Tez query failed with “wrong key class
>
> Hi Team,
>
> Env: Hive 1.0 on Tez 0.5.3
>
> Query is a simple group-by on top of sequence table. It fails with below error on tez mode:
>
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: wrong key class: org.apache.hadoop.io.BytesWritable is not class org.apache.hadoop.io.NullWritable
>
> And it works fine in MR mode. Anyone met this issue before?
>
> --
> Thanks,
> www.openkb.info
> (Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)

--
Thanks,
www.openkb.info
(Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)
RE: Hive on Tez query failed with “wrong key class
A full stack trace would help determine if this is a Tez issue or a Hive issue.

From: Jim Green [mailto:openkbi...@gmail.com]
Sent: Tuesday, July 21, 2015 11:12 AM
To: u...@tez.apache.org; user@hive.apache.org
Subject: Hive on Tez query failed with “wrong key class

Hi Team,

Env: Hive 1.0 on Tez 0.5.3

Query is a simple group-by on top of sequence table. It fails with below error on tez mode:

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: wrong key class: org.apache.hadoop.io.BytesWritable is not class org.apache.hadoop.io.NullWritable

And it works fine in MR mode. Anyone met this issue before?

--
Thanks,
www.openkb.info
(Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)
Re: Hive on Tez query failed with “wrong key class
> Query is a simple group-by on top of sequence table.
...
> java.io.IOException: java.io.IOException: wrong key class: org.apache.hadoop.io.BytesWritable is not class org.apache.hadoop.io.NullWritable

I have seen this issue when mixing Sequence files written by Pig with Sequence files written by Hive, primarily because the data ingestion wasn't done properly via HCatalog writers.

In the last report, the first sequence file had as its header

M?.io.LongWritableorg.apache.hadoop.io.BytesWritable)org.apache.hadoop.io.compress.SnappyCodec??

and the second one had

SEQ!org.apache.hadoop.io.LongWritableorg.apache.hadoop.io.Text)org.apache.hadoop.io.compress.SnappyCodec?

You can cross-check the exception trace and make sure that the exception is coming from the RecordReader as the k-v pairs change types between files.

This mostly doesn't show up in Hive on MR at small scale, but it can happen with both MR and Tez. To hit it via CombineInputFormat, you need a file that has been split up between machines, and two such files to generate a combined split with mismatched schemas. Tez is more aggressive at splitting, since it relies on the file format splits, not HDFS locations.

If you confirm that this is indeed the cause of the issue, I might have an idea how to fix it.

Cheers,
Gopal
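For anyone who wants to compare headers the way the pasted pair above does, here is a rough sketch that reads the key and value class names from the first bytes of a file. It assumes the header layout is the `SEQ` magic, one version byte, then the two class names as length-prefixed strings, and that each length fits in a single byte (names under 128 bytes). That matches the pasted header, where `!` (33) and `)` (41) are the lengths of the strings that follow them. `SeqHeaderPeek` is a hypothetical helper and not part of Hadoop; on a cluster with the Hadoop libraries available, the key and value class accessors on `SequenceFile.Reader` would be the more robust route.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that peeks at the start of a SequenceFile; not part of Hadoop. */
class SeqHeaderPeek {
    /** Returns {keyClassName, valueClassName} read from the start of the stream. */
    static String[] readKeyValueClasses(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[3];
        in.readFully(magic);
        if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q') {
            throw new IOException("not a SequenceFile: missing SEQ magic");
        }
        in.readByte(); // format version, ignored in this sketch
        return new String[] {readShortString(in), readShortString(in)};
    }

    // Assumption: the length prefix fits in one byte (names under 128 bytes).
    private static String readShortString(DataInputStream in) throws IOException {
        int len = in.readByte();
        if (len < 0) {
            throw new IOException("multi-byte length prefix not handled in this sketch");
        }
        byte[] buf = new byte[len];
        in.readFully(buf);
        return new String(buf, StandardCharsets.UTF_8);
    }
}
```

Running this over the first few hundred bytes of each file under the table location and comparing the returned pairs should show whether the files disagree on key or value class.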