[
https://issues.apache.org/jira/browse/CRUNCH-338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13893846#comment-13893846
]
Laxmikanth Samudrala edited comment on CRUNCH-338 at 2/6/14 9:49 PM:
---------------------------------------------------------------------
I am including our pseudocode to show how we are constructing the TupleN:
PCollection<Record> entity1to1Data1 = .....;
PCollection<Record> entity1to1Data2 = .....;
PCollection<Record> entity1to1Data3 = .....;
PCollection<Record> entity1to1Data4 = .....;
PCollection<Record> entity1toNData1 = .....;
PCollection<Record> entity1toNData2 = .....;
PCollection<Record> entity1toNData3 = .....;
PTable<String, Record> entity1To1Map1 =
    entity1to1Data1.by("1 to 1 grouping", new EntityGroup1Fn(), Avros.strings());
PTable<String, Record> entity1To1Map2 =
    entity1to1Data2.by("1 to 1 grouping", new EntityGroup2Fn(), Avros.strings());
PTable<String, Record> entity1To1Map3 =
    entity1to1Data3.by("1 to 1 grouping", new EntityGroup3Fn(), Avros.strings());
PTable<String, Record> entity1To1Map4 =
    entity1to1Data4.by("1 to 1 grouping", new EntityGroup4Fn(), Avros.strings());
PTable<String, Record> entity1ToNMap1 =
    entity1toNData1.by("1 to n grouping", new EntityGroup5Fn(), Avros.strings());
PTable<String, Record> entity1ToNMap2 =
    entity1toNData2.by("1 to n grouping", new EntityGroup6Fn(), Avros.strings());
PTable<String, Record> entity1ToNMap3 =
    entity1toNData3.by("1 to n grouping", new EntityGroup7Fn(), Avros.strings());
PTable<String, TupleN> entityGrouper =
    Cogroup.cogroup(entity1To1Map1, entity1To1Map2, entity1To1Map3,
        entity1To1Map4, entity1ToNMap1, entity1ToNMap2, entity1ToNMap3);
entityGrouper.parallelDo .....
entityGrouper.parallelDo ....
Note: the strange part is that calling parallelDo on the entityGrouper table
once runs with no problems, but calling it twice fails the process with a
ClassCastException.
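To illustrate the once-versus-twice behavior, here is a minimal plain-Java sketch. It is not Crunch code: FakeRecord, Entity, deepCopy, emit and failsWith are hypothetical stand-ins, and the rule "copy only when there is more than one downstream consumer" is an assumption inferred from the stack trace in the issue (IntermediateEmitter.emit calling getDetachedValue), not a confirmed description of the library.

```java
import java.util.ArrayList;
import java.util.List;

public class DeepCopySketch {

    /** Hypothetical stand-in for a single Avro record (not the real IndexedRecord). */
    interface FakeRecord {
        FakeRecord copy();
    }

    /** Hypothetical record type used only for this illustration. */
    static final class Entity implements FakeRecord {
        final String name;
        Entity(String name) { this.name = name; }
        public FakeRecord copy() { return new Entity(name); }
    }

    /** Copies each tuple slot, assuming every slot holds one record. */
    static Object[] deepCopy(Object[] tuple) {
        Object[] out = new Object[tuple.length];
        for (int i = 0; i < tuple.length; i++) {
            // Throws ClassCastException when the slot actually holds a collection.
            out[i] = ((FakeRecord) tuple[i]).copy();
        }
        return out;
    }

    /** Hands the tuple to each consumer; copies only when there are several consumers (assumption). */
    static int emit(Object[] tuple, int consumers) {
        int delivered = 0;
        for (int i = 0; i < consumers; i++) {
            Object[] value = consumers > 1 ? deepCopy(tuple) : tuple;
            if (value != null) {
                delivered++;
            }
        }
        return delivered;
    }

    /** Returns true if emitting a tuple whose slot is a collection fails with a ClassCastException. */
    static boolean failsWith(int consumers) {
        List<Object> grouped = new ArrayList<>();
        grouped.add(new Entity("a"));
        grouped.add(new Entity("b"));
        Object[] tuple = { grouped }; // the slot is a collection, as in a cogroup result
        try {
            emit(tuple, consumers);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("one consumer fails: " + failsWith(1));  // false
        System.out.println("two consumers fail: " + failsWith(2)); // true
    }
}
```

With a single consumer the value is passed through untouched, so the wrong assumption about the slot type is never exercised; with two consumers the copy runs and the cast fails.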
> TupleDeepCopier throws java.lang.ClassCastException: java.util.ArrayList
> cannot be cast to org.apache.avro.generic.IndexedRecord
> --------------------------------------------------------------------------------------------------------------------------------
>
> Key: CRUNCH-338
> URL: https://issues.apache.org/jira/browse/CRUNCH-338
> Project: Crunch
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.8.2
> Reporter: Laxmikanth Samudrala
> Assignee: Josh Wills
> Attachments: ClassCastExceptionInDeepCopierIT.java, stack-trace.log
>
>
> When a PTable<String, TupleN> is used twice, each time with a parallelDo, it
> causes a java.lang.ClassCastException. Using the same PTable<String, TupleN>
> once for parallelDo does not cause the exception, and neither does converting
> the PTable<String, TupleN> to
> PTable<String, Pair<Tuple4.Collect<?, ?, ?, ?>, Tuple3.Collect<?, ?, ?>>>
> and using the paired PTable twice for parallelDo.
> Note: The root cause seems to lie in expressing whether the items passed to
> the TupleN are collections or single instances. Surprisingly, performing a
> single parallelDo operation on the PTable<String, TupleN> works with no
> exceptions, while performing parallelDo twice seems to make use of
> TupleDeepCopier.deepCopy, which triggers the exception.
> Template of code:
> Failure case:
> PTable<String, TupleN> entityData = .....;
> entityData.parallelDo(.....);
> entityData.parallelDo(.....);
> Success case:
> PTable<String, TupleN> entityData = .....;
> entityData.parallelDo(.....);
> Another success case:
> PTable<String, Pair<Tuple4.Collect<?, ?, ?, ?>, Tuple3.Collect<?, ?, ?>>>
>     entityData;
> entityData.parallelDo(.....);
> entityData.parallelDo(.....);
> Stack trace for reference:
> org.apache.crunch.CrunchRuntimeException: Error while deep copying avro value
> [.........]
> at org.apache.crunch.types.avro.AvroDeepCopier.deepCopy(AvroDeepCopier.java:195)
> at org.apache.crunch.types.avro.AvroDeepCopier$AvroSpecificDeepCopier.deepCopy(AvroDeepCopier.java:83)
> at org.apache.crunch.types.avro.AvroType.getDetachedValue(AvroType.java:217)
> at org.apache.crunch.types.TupleDeepCopier.deepCopy(TupleDeepCopier.java:60)
> at org.apache.crunch.types.TupleDeepCopier.deepCopy(TupleDeepCopier.java:32)
> at org.apache.crunch.types.avro.AvroType.getDetachedValue(AvroType.java:217)
> at org.apache.crunch.lib.PTables.getDetachedValue(PTables.java:191)
> at org.apache.crunch.types.avro.AvroTableType.getDetachedValue(AvroTableType.java:149)
> at org.apache.crunch.types.avro.AvroTableType.getDetachedValue(AvroTableType.java:36)
> at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:54)
> at org.apache.crunch.MapFn.process(MapFn.java:34)
> at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:99)
> at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:56)
> at org.apache.crunch.MapFn.process(MapFn.java:34)
> at org.apache.crunch.impl.mr.run.RTNode.process(RTNode.java:99)
> at org.apache.crunch.impl.mr.run.RTNode.processIterable(RTNode.java:114)
> at org.apache.crunch.impl.mr.run.CrunchReducer.reduce(CrunchReducer.java:57)
> at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
> at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:610)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:444)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:447)
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)