Looks like it hasn't been committed yet. I think you're interested in this PR: https://github.com/apache/parquet-mr/pull/280
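For later readers: the root cause in the quoted trace below is the NullPointerException in MapRedCounterAdapter.increment, which suggests the mapred counter that the adapter wraps is null in this code path. A minimal sketch of the kind of null guard such a fix involves — the names ICounter, SimpleCounter, and NullSafeCounter are illustrative only, not the actual parquet-mr classes:

```java
// Illustrative sketch only: guard counter increments against a missing
// (null) delegate, the defensive pattern behind the NPE in the trace.
interface ICounter {
    void increment(long value);
    long value();
}

// A plain in-memory counter standing in for a real mapred counter.
class SimpleCounter implements ICounter {
    private long count;
    public void increment(long v) { count += v; }
    public long value() { return count; }
}

// Wrapper that tolerates a null delegate instead of throwing an NPE.
class NullSafeCounter implements ICounter {
    private final ICounter delegate; // may be null outside a task context
    NullSafeCounter(ICounter delegate) { this.delegate = delegate; }
    public void increment(long v) {
        if (delegate != null) {      // the guard the failing adapter lacks
            delegate.increment(v);
        }
    }
    public long value() { return delegate == null ? 0 : delegate.value(); }
}

public class CounterGuardDemo {
    public static void main(String[] args) {
        ICounter noContext = new NullSafeCounter(null);
        noContext.increment(42);                  // silently dropped, no NPE
        System.out.println(noContext.value());    // prints 0

        ICounter withContext = new NullSafeCounter(new SimpleCounter());
        withContext.increment(42);
        System.out.println(withContext.value());  // prints 42
    }
}
```

The real fix lives in parquet-mr's counter classes; this sketch only illustrates the defensive pattern, not the patch in the PR above.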
rb

On Mon, Mar 21, 2016 at 4:31 AM, Santlal J Gupta <[email protected]> wrote:

> Hi Ryan,
>
> Did you get a chance to look into this query? I am still waiting for your
> reply.
>
> Could you tell me in which version Reuben fixed the issue?
>
> Thanks
> Santlal
>
>
> From: Santlal J Gupta
> Sent: Monday, February 29, 2016 6:02 PM
> To: '[email protected]'
> Cc: Gurdit Singh
> Subject: HashJoin throws ParquetDecodingException with input as ParquetTupleScheme
>
> Hi Ryan,
>
> Currently I am using the following versions:
>
> Hadoop : 2.6.0
> Cascading : 3.0.1
> Parquet : 1.6.0
> Parquet-cascading : 1.6.0
> Parquet-hadoop : 1.6.0
> Parquet-column : 1.6.0
>
> Thanks
> Santlal
>
> ------------------------------------------------------------------------
>
> Santlal,
>
> What version of Parquet are you using? I think this was recently fixed by
> Reuben.
>
> rb
>
> On Tue, Feb 16, 2016 at 5:16 AM, Santlal J Gupta
> <[email protected]> wrote:
>
> > Hi,
> >
> > I am facing a problem while using *HashJoin* with input using
> > *ParquetTupleScheme*. I have two source taps, one using the
> > *TextDelimited* scheme and the other using *ParquetTupleScheme*. I am
> > performing a *HashJoin* and writing the data out as a delimited file.
> > The program runs successfully in local mode, but when I try to run it
> > on the cluster it gives the following error:
> >
> > parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file hdfs://Hostname:8020/user/username/testData/lookup-file.parquet
> >   at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:211)
> >   at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:144)
> >   at parquet.hadoop.mapred.DeprecatedParquetInputFormat$RecordReaderWrapper.<init>(DeprecatedParquetInputFormat.java:91)
> >   at parquet.hadoop.mapred.DeprecatedParquetInputFormat.getRecordReader(DeprecatedParquetInputFormat.java:42)
> >   at cascading.tap.hadoop.io.MultiRecordReaderIterator.makeReader(MultiRecordReaderIterator.java:123)
> >   at cascading.tap.hadoop.io.MultiRecordReaderIterator.getNextReader(MultiRecordReaderIterator.java:172)
> >   at cascading.tap.hadoop.io.MultiRecordReaderIterator.hasNext(MultiRecordReaderIterator.java:133)
> >   at cascading.tuple.TupleEntrySchemeIterator.<init>(TupleEntrySchemeIterator.java:94)
> >   at cascading.tap.hadoop.io.HadoopTupleEntrySchemeIterator.<init>(HadoopTupleEntrySchemeIterator.java:49)
> >   at cascading.tap.hadoop.io.HadoopTupleEntrySchemeIterator.<init>(HadoopTupleEntrySchemeIterator.java:44)
> >   at cascading.tap.hadoop.Hfs.openForRead(Hfs.java:439)
> >   at cascading.tap.hadoop.Hfs.openForRead(Hfs.java:108)
> >   at cascading.flow.stream.element.SourceStage.map(SourceStage.java:82)
> >   at cascading.flow.stream.element.SourceStage.run(SourceStage.java:66)
> >   at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:139)
> >   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
> >   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> >   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
> >   at java.security.AccessController.doPrivileged(Native Method)
> >   at javax.security.auth.Subject.doAs(Subject.java:415)
> >   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> >   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> > Caused by: java.lang.NullPointerException
> >   at parquet.hadoop.util.counters.mapred.MapRedCounterAdapter.increment(MapRedCounterAdapter.java:34)
> >   at parquet.hadoop.util.counters.BenchmarkCounter.incrementTotalBytes(BenchmarkCounter.java:75)
> >   at parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:349)
> >   at parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:114)
> >   at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:191)
> >   ... 21 more
> >
> > *Below is the use case:*
> >
> >     public static void main(String[] args) throws IOException {
> >         Configuration conf = new Configuration();
> >
> >         String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
> >
> >         String argsString = "";
> >         for (String arg : otherArgs) {
> >             argsString = argsString + " " + arg;
> >         }
> >         System.out.println("After processing, arguments are:" + argsString);
> >
> >         Properties properties = new Properties();
> >         properties.putAll(conf.getValByRegex(".*"));
> >
> >         String outputPath = "testData/BasicEx_Output";
> >         Class[] types1 = { String.class, String.class, String.class };
> >         Fields f1 = new Fields("id1", "city1", "state");
> >
> >         Tap source = new Hfs(new TextDelimited(f1, "|", "", types1, false), "main-txt-file.dat");
> >         Pipe pipe = new Pipe("ReadWrite");
> >
> >         Scheme pScheme = new ParquetTupleScheme();
> >         Tap source2 = new Hfs(pScheme, "testData/lookup-file.parquet");
> >         Pipe pipe2 = new Pipe("ReadWrite2");
> >
> >         Pipe tokenPipe = new HashJoin(pipe, new Fields("id1"), pipe2, new Fields("id"), new LeftJoin());
> >
> >         Tap sink = new Hfs(new TextDelimited(f1, true, "|"), outputPath, SinkMode.REPLACE);
> >
> >         FlowDef flowDef1 = FlowDef.flowDef()
> >             .addSource(pipe, source)
> >             .addSource(pipe2, source2)
> >             .addTailSink(tokenPipe, sink);
> >         new Hadoop2MR1FlowConnector(properties).connect(flowDef1).complete();
> >     }
> >
> > I have attached the input files for reference. Please help me solve
> > this issue.
> >
> > I asked the same question on the cascading-user Google group, and below
> > is the response I received:
> >
> > André Kelpe wrote:
> >
> > > This looks like a bug caused by a wrong assumption in parquet. I fixed
> > > a similar thing 2 years ago in parquet:
> > > https://github.com/Parquet/parquet-mr/pull/388/ Can you check with the
> > > upstream project? It looks like it is their problem and not a problem
> > > in Cascading.
> > >
> > > - André
> > >
> > > --
> > > André Kelpe
> > > [email protected]
> > > http://concurrentinc.com
> >
> > Thanks
> >
> > Santlal
>
> --
> Ryan Blue
> Software Engineer
> Netflix

--
Ryan Blue
Software Engineer
Netflix
