-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8463/
-----------------------------------------------------------
(Updated Dec. 13, 2012, 1:46 a.m.)
Review request for crunch and Gabriel Reid.
Changes
-------
Came up with a much better way to do this: we add the SourceTargets that the
DoFn needs in order to run to the parallelDo call itself, which eliminates the
possibility of cyclic dependencies. Also updated the mapside join IT to verify
that we run the right number of MapReduce jobs, even if the mapside joins are
applied out of order.
Description
-------
This involves updating the PCollectionImpl class to track any SourceTarget
instances that must exist before any Target that depends on this
PCollectionImpl can be created, and modifying the MSCRPlanner to check for
this information and build the jobs to incorporate these dependencies.
This isn't the prettiest implementation of this idea, but I think it'll turn
out to be a useful thing to have.
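
As a rough illustration of the dependency idea described above (not the code in
this diff), here is a minimal, self-contained sketch. The names Stage, plan and
the string-valued targets are hypothetical stand-ins, not Crunch classes; the
real change works on PCollectionImpl, SourceTarget and MSCRPlanner. Each stage
declares up front which targets must exist before it can run, and the planner
orders the stages so that producers come first, whatever order they were
declared in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: Stage and plan() are hypothetical names, not Crunch APIs.
class DependencySketch {

    /** A unit of work that writes one target and may require others to exist first. */
    static final class Stage {
        final String name;
        final String output;               // target this stage produces
        final Set<String> requiredTargets; // targets that must exist before it runs

        Stage(String name, String output, String... requiredTargets) {
            this.name = name;
            this.output = output;
            this.requiredTargets = new LinkedHashSet<>(Arrays.asList(requiredTargets));
        }
    }

    /** Returns stage names ordered so every stage follows the producers of its required targets. */
    static List<String> plan(List<Stage> stages) {
        // Targets that some stage in this plan produces; anything else is assumed to exist already.
        Set<String> produced = new HashSet<>();
        for (Stage s : stages) {
            produced.add(s.output);
        }

        Set<String> available = new HashSet<>(); // targets written by stages scheduled so far
        List<Stage> pending = new ArrayList<>(stages);
        List<String> ordered = new ArrayList<>();

        while (!pending.isEmpty()) {
            boolean progressed = false;
            for (Iterator<Stage> it = pending.iterator(); it.hasNext();) {
                Stage s = it.next();
                if (isReady(s, produced, available)) {
                    ordered.add(s.name);
                    available.add(s.output);
                    it.remove();
                    progressed = true;
                }
            }
            if (!progressed) {
                throw new IllegalStateException("cyclic dependency among stages");
            }
        }
        return ordered;
    }

    private static boolean isReady(Stage s, Set<String> produced, Set<String> available) {
        for (String target : s.requiredTargets) {
            // A required target produced inside this plan must already have been written.
            if (produced.contains(target) && !available.contains(target)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Joins are declared before the stages that produce their inputs.
        List<Stage> stages = Arrays.asList(
            new Stage("joinB", "outB", "small2"),
            new Stage("loadSmall2", "small2"),
            new Stage("joinA", "outA", "small1"),
            new Stage("loadSmall1", "small1"));
        System.out.println(plan(stages));
    }
}
```

In this sketch the required targets are fixed when the stage is constructed,
which loosely mirrors passing them on the parallelDo call: a stage can only
name targets, it cannot be re-pointed at something created later.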
This addresses bug CRUNCH-128.
https://issues.apache.org/jira/browse/CRUNCH-128
Diffs (updated)
-----
crunch/src/it/java/org/apache/crunch/lib/join/MapsideJoinIT.java 297680e
crunch/src/main/java/org/apache/crunch/PCollection.java f5a3465
crunch/src/main/java/org/apache/crunch/Pipeline.java bcf8727
crunch/src/main/java/org/apache/crunch/impl/mem/MemPipeline.java 77c41ce
crunch/src/main/java/org/apache/crunch/impl/mem/collect/MemCollection.java
61bb1e7
crunch/src/main/java/org/apache/crunch/impl/mr/MRPipeline.java 60950f3
crunch/src/main/java/org/apache/crunch/impl/mr/collect/DoCollectionImpl.java
1f4fea2
crunch/src/main/java/org/apache/crunch/impl/mr/collect/DoTableImpl.java
1d19580
crunch/src/main/java/org/apache/crunch/impl/mr/collect/PCollectionImpl.java
f0d8187
crunch/src/main/java/org/apache/crunch/impl/mr/collect/PTableBase.java
9183784
crunch/src/main/java/org/apache/crunch/impl/mr/plan/MSCRPlanner.java 7fe2809
crunch/src/main/java/org/apache/crunch/io/ReadableSourceTarget.java 95c90aa
crunch/src/main/java/org/apache/crunch/lib/join/MapsideJoin.java 0ca1ab3
crunch/src/main/java/org/apache/crunch/materialize/MaterializableIterable.java
3830616
Diff: https://reviews.apache.org/r/8463/diff/
Testing
-------
Updated the mapside join IT to use the new code and fixed the in-memory impl to
work properly.
Thanks,
Josh Wills