[ https://issues.apache.org/jira/browse/CRUNCH-361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Micah Whitacre updated CRUNCH-361:
----------------------------------
Summary: Adjust the planner to handle non-existent SourceTargets (was: Illegal State Exception)
> Adjust the planner to handle non-existent SourceTargets
> -------------------------------------------------------
>
> Key: CRUNCH-361
> URL: https://issues.apache.org/jira/browse/CRUNCH-361
> Project: Crunch
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.9.0, 0.8.2
> Reporter: Jinal Shah
> Assignee: Josh Wills
> Priority: Minor
>
> I was trying to use ParallelDoOptions to tell the planner to handle a step
> in a certain way. When you pass a SourceTarget to it and then do a union or
> co-group on the resulting PCollection in a later step, the planner tries to
> find the size of the parent source, which has not been generated yet. Here
> are the steps to reproduce it:
> {code}
> PCollection<U> collection = afterSomeOperation();
> // this could be any SourceTarget implementation
> SourceTarget<U> marker = new SourceTarget<U>(pathThatDoesNotExist);
> pipeline.write(collection, marker);
> PCollection<U> collection2 = pipeline.read(marker);
> PCollection<V> collection3 = collection2.parallelDo(
>     DoFn, PType, ParallelDoOptions.builder().sources(marker).build());
> doSomeMoreOperation();
> PCollection<V> union = collection3.union(SomePCollectionOfV);
> {code}
> This throws the exception because the union cannot find the size of the
> marker, which has not been generated yet. The planner should recognize that
> the Source does not exist yet and that a job in the pipeline will generate
> it.
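
The requested behaviour could be sketched roughly as follows. This is a plain-Java illustration only, not Crunch's planner code: the class SourceSizeEstimator, its methods, and the idea of a caller-supplied fallback size are all hypothetical names and assumptions made for this example. It models a size lookup that returns a placeholder estimate when a path is missing but a pending job is known to write it, and only raises IllegalStateException when nothing will produce the path.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of a planner's source-size lookup (not Crunch's implementation). */
class SourceSizeEstimator {
    // Sizes in bytes of paths that already exist on the (simulated) file system.
    private final Map<String, Long> existingSizes = new HashMap<>();
    // Paths that an earlier, not-yet-run job in the same pipeline will write.
    private final Set<String> pendingOutputs = new HashSet<>();

    void addExisting(String path, long sizeBytes) {
        existingSizes.put(path, sizeBytes);
    }

    void addPendingOutput(String path) {
        pendingOutputs.add(path);
    }

    /**
     * Returns the known size if the path exists, the caller-supplied fallback
     * if a pending job will create it, and throws otherwise.
     */
    long estimateSize(String path, long fallbackBytes) {
        Long size = existingSizes.get(path);
        if (size != null) {
            return size;
        }
        if (pendingOutputs.contains(path)) {
            // The source is not generated yet, but a job in the pipeline will write it.
            return fallbackBytes;
        }
        throw new IllegalStateException(
            "Path does not exist and no pending job writes it: " + path);
    }
}
```

In this sketch the marker path from the steps above would be registered through addPendingOutput when pipeline.write is planned, so a later union over it would get the fallback estimate instead of failing.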
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)