-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8463/#review14317
-----------------------------------------------------------


In general it seems pretty good to me. It would be nice if there were some more 
tests on this functionality specifically (outside of being dependent on the 
MapsideJoinIT), but I'm also having a hard time thinking of some good test 
cases. One idea of an additional test that *is* related to MapsideJoins could 
be to ensure that multiple mapside joins run in parallel do their setup in the 
same job (instead of each being run in separate jobs like in the past).

I did run into one issue by accident that put the planner in an infinite loop, 
but I'll put the details of that on JIRA.


crunch/src/it/java/org/apache/crunch/lib/join/MapsideJoinIT.java
<https://reviews.apache.org/r/8463/#comment30490>

    This test should probably be called testMapsideJoin_MemPipeline instead 
now. Previously the point of it was to point out that anything that wasn't a 
MRPipeline wouldn't work, whereas now the point is to show that it specifically 
does work with a MemPipeline.



crunch/src/main/java/org/apache/crunch/impl/mr/collect/PCollectionImpl.java
<https://reviews.apache.org/r/8463/#comment30492>

    Is there a reason that a non-immutable Set isn't just used from in the 
start? When you see this statement (instantiating a new set if the existing one 
is empty), it looks like a mistake at first.


- Gabriel Reid


On Dec. 11, 2012, 7:26 a.m., Josh Wills wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8463/
> -----------------------------------------------------------
> 
> (Updated Dec. 11, 2012, 7:26 a.m.)
> 
> 
> Review request for crunch and Gabriel Reid.
> 
> 
> Description
> -------
> 
> This involves updating the PCollectionImpl class to be able to track any 
> SourceTarget instances that it needs to exist before any Target that depends 
> on this PCollectionImpl can be created, and optimizing the MSCRPlanner to 
> check for this information and build the jobs to incorporate these 
> dependencies.
> 
> This isn't the prettiest implementation of this idea, but I think it'll turn 
> out to be a useful thing to have.
> 
> 
> This addresses bug CRUNCH-128.
>     https://issues.apache.org/jira/browse/CRUNCH-128
> 
> 
> Diffs
> -----
> 
>   crunch/src/it/java/org/apache/crunch/lib/join/MapsideJoinIT.java 297680e 
>   crunch/src/main/java/org/apache/crunch/Pipeline.java bcf8727 
>   crunch/src/main/java/org/apache/crunch/impl/mem/MemPipeline.java 77c41ce 
>   crunch/src/main/java/org/apache/crunch/impl/mr/MRPipeline.java 60950f3 
>   crunch/src/main/java/org/apache/crunch/impl/mr/collect/PCollectionImpl.java 
> f0d8187 
>   crunch/src/main/java/org/apache/crunch/impl/mr/plan/MSCRPlanner.java 
> 7fe2809 
>   crunch/src/main/java/org/apache/crunch/io/ReadableSourceTarget.java 95c90aa 
>   crunch/src/main/java/org/apache/crunch/lib/join/MapsideJoin.java 0ca1ab3 
>   
> crunch/src/main/java/org/apache/crunch/materialize/MaterializableIterable.java
>  3830616 
> 
> Diff: https://reviews.apache.org/r/8463/diff/
> 
> 
> Testing
> -------
> 
> Updated the mapside join IT to use the new code and fixed the in-memory impl 
> to work properly.
> 
> 
> Thanks,
> 
> Josh Wills
> 
>

Reply via email to