-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8463/
-----------------------------------------------------------

(Updated Dec. 13, 2012, 1:46 a.m.)


Review request for crunch and Gabriel Reid.


Changes
-------

Came up with a much better way to do this: we now add the SourceTargets that
the DoFn needs in order to run to the parallelDo call itself, which eliminates
the possibility of cyclic dependencies. I also updated the mapside join IT to
verify that we run the right number of MapReduce jobs, even when the mapside
joins are applied out of order.
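One way to see why declaring the DoFn's SourceTargets on the parallelDo call rules out cycles: anything passed to the call has to exist already, so every dependency edge points from a newer node to an older one, and a graph whose edges all point backward in creation order cannot contain a cycle. This is a rough sketch of that argument only. `Node`, `SideInput`, `parallelDo`, `edgesPointBackward`, and the `/tmp/right` path are hypothetical stand-ins, not Crunch's actual `PCollectionImpl`/`SourceTarget` API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model; these are NOT Crunch's real classes.
class ParallelDoDeps {

  /** Stand-in for a SourceTarget that a DoFn reads as a side input. */
  static final class SideInput {
    final String path;
    final Node producer; // node whose output ends up at this path

    SideInput(String path, Node producer) {
      this.path = path;
      this.producer = producer;
    }
  }

  /** Stand-in for a PCollectionImpl node in the logical plan. */
  static final class Node {
    private static int nextId = 0; // creation order; not thread-safe, fine for a sketch

    final int id;
    final String name;
    final Node parent;              // null for a source read
    final List<SideInput> required; // fixed once, at construction time

    Node(String name, Node parent, List<SideInput> required) {
      this.id = nextId++;
      this.name = name;
      this.parent = parent;
      this.required = Collections.unmodifiableList(new ArrayList<>(required));
    }
  }

  /** Hypothetical parallelDo: the DoFn's side inputs are declared with the call itself. */
  static Node parallelDo(String name, Node parent, SideInput... sideInputs) {
    return new Node(name, parent, Arrays.asList(sideInputs));
  }

  /** True if every edge out of n (parent or side input) points at an older node. */
  static boolean edgesPointBackward(Node n) {
    if (n.parent != null && n.parent.id >= n.id) {
      return false;
    }
    for (SideInput s : n.required) {
      if (s.producer.id >= n.id) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Node left = new Node("read(left)", null, Collections.emptyList());
    Node right = new Node("read(right)", null, Collections.emptyList());
    SideInput rightOnDisk = new SideInput("/tmp/right", right);
    Node joined = parallelDo("mapsideJoin", left, rightOnDisk);
    System.out.println(edgesPointBackward(joined)); // prints true
  }
}
```

Because a `Node`'s dependencies are fixed in its constructor and never mutated afterwards, there is no way in this model to add an edge that points forward in creation order.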


Description
-------

This involves updating the PCollectionImpl class to track any SourceTarget
instances that must exist before any Target that depends on this
PCollectionImpl can be created, and updating the MSCRPlanner to check for this
information and build the jobs so that they respect these dependencies.
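To make the planner side a bit more concrete, here's a rough sketch of the ordering rule under some loud simplifying assumptions: a map-only plan (no shuffles), a DoFn chained onto a computed parent runs in the same job as that parent, and source reads already exist on disk. A node that needs a SourceTarget produced by another computed node has to run in a strictly later job. `DependencyJobs`, `readyAfter`, and `jobCount` are hypothetical names; this is not the actual MSCRPlanner logic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the ordering rule; NOT the real MSCRPlanner.
// Assumptions: map-only plan (no shuffles), a DoFn chained onto a computed
// parent runs in the same job as that parent, and source reads already exist.
class DependencyJobs {

  static final class Node {
    final String name;
    final Node parent; // null for a source read that already exists on disk
    final List<Node> requiredProducers = new ArrayList<>(); // outputs that must be written first

    Node(String name, Node parent) {
      this.name = name;
      this.parent = parent;
    }

    /** Declares that this node's DoFn reads producer's output as a SourceTarget. */
    Node requires(Node producer) {
      requiredProducers.add(producer);
      return this;
    }
  }

  /**
   * Index of the job after which n's output exists; -1 means it exists before
   * any job runs (a source read). Results are memoized per node.
   */
  static int readyAfter(Node n, Map<Node, Integer> memo) {
    if (n.parent == null) {
      return -1;
    }
    Integer cached = memo.get(n);
    if (cached != null) {
      return cached;
    }
    // Run no earlier than the job that computes the parent (job 0 if the parent is a source).
    int job = Math.max(0, readyAfter(n.parent, memo));
    for (Node p : n.requiredProducers) {
      // A required SourceTarget must be completely written by an earlier job.
      job = Math.max(job, readyAfter(p, memo) + 1);
    }
    memo.put(n, job);
    return job;
  }

  /** Number of sequential jobs needed to produce every output in the list. */
  static int jobCount(List<Node> outputs) {
    Map<Node, Integer> memo = new HashMap<>();
    int last = -1;
    for (Node out : outputs) {
      last = Math.max(last, readyAfter(out, memo));
    }
    return last + 1;
  }

  public static void main(String[] args) {
    Node a = new Node("read(a)", null);
    Node b = new Node("read(b)", null);
    Node c = new Node("read(c)", null);
    // join1 reads b as a side input; join2 reads join1's output as a side input.
    Node join1 = new Node("join(a, b)", a).requires(b);
    Node join2 = new Node("join(c, join1)", c).requires(join1);
    System.out.println(jobCount(List.of(join1, join2))); // prints 2
  }
}
```

Asking for the outputs in the opposite order (`List.of(join2, join1)`) gives the same count, which is loosely the kind of order-independence the updated IT checks for the real planner.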

This isn't the prettiest implementation of this idea, but I think it'll turn 
out to be a useful thing to have.


This addresses bug CRUNCH-128.
    https://issues.apache.org/jira/browse/CRUNCH-128


Diffs (updated)
-----

  crunch/src/it/java/org/apache/crunch/lib/join/MapsideJoinIT.java 297680e 
  crunch/src/main/java/org/apache/crunch/PCollection.java f5a3465 
  crunch/src/main/java/org/apache/crunch/Pipeline.java bcf8727 
  crunch/src/main/java/org/apache/crunch/impl/mem/MemPipeline.java 77c41ce 
  crunch/src/main/java/org/apache/crunch/impl/mem/collect/MemCollection.java 61bb1e7
  crunch/src/main/java/org/apache/crunch/impl/mr/MRPipeline.java 60950f3 
  crunch/src/main/java/org/apache/crunch/impl/mr/collect/DoCollectionImpl.java 1f4fea2
  crunch/src/main/java/org/apache/crunch/impl/mr/collect/DoTableImpl.java 1d19580
  crunch/src/main/java/org/apache/crunch/impl/mr/collect/PCollectionImpl.java f0d8187
  crunch/src/main/java/org/apache/crunch/impl/mr/collect/PTableBase.java 9183784
  crunch/src/main/java/org/apache/crunch/impl/mr/plan/MSCRPlanner.java 7fe2809 
  crunch/src/main/java/org/apache/crunch/io/ReadableSourceTarget.java 95c90aa 
  crunch/src/main/java/org/apache/crunch/lib/join/MapsideJoin.java 0ca1ab3 
  crunch/src/main/java/org/apache/crunch/materialize/MaterializableIterable.java 3830616

Diff: https://reviews.apache.org/r/8463/diff/


Testing
-------

Updated the mapside join IT to use the new code, and fixed the in-memory
implementation to work properly.


Thanks,

Josh Wills
