[ https://issues.apache.org/jira/browse/CRUNCH-361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jinal Shah updated CRUNCH-361:
------------------------------
Description:
I was using ParallelDoOptions to tell the planner that a parallelDo step
depends on a SourceTarget. When you pass the SourceTarget to it and then do a
union or co-group in a later step on the resulting PCollection, the planner
tries to find the size of the parent source, which has not been generated yet.
Here are the steps to reproduce it:
{code}
PCollection<U> collection = afterSomeOperation();
// marker could be any SourceTarget implementation
SourceTarget<U> marker = new SourceTarget<U>(pathThatDoesNotExist);
pipeline.write(collection, marker);
PCollection<U> collection2 = pipeline.read(marker);
PCollection<V> collection3 =
collection2.parallelDo(DoFn,PType,ParallelDoOptions.builder().sources(marker).build());
doSomeMoreOperation();
PCollection<V> union = collection3.union(SomePCollectionOfV);
{code}
This throws an IllegalStateException because the union cannot determine the
size of the marker, which has not been generated yet. The planner should
recognize that the Source does not exist yet and that a job earlier in the
pipeline will generate it.
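To make the requested behavior concrete, here is a minimal sketch of a size lookup that falls back to a pending write when the source does not exist yet. It is a toy model and not Crunch's planner code: the class PendingSourceSizeSketch, its methods, and the example paths are all hypothetical and exist only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: every name here is hypothetical and none of this is Crunch's API.
final class PendingSourceSizeSketch {
    // Sizes, in bytes, of sources that already exist.
    private final Map<String, Long> materialized = new HashMap<>();
    // Estimated sizes, in bytes, of paths that an earlier job in the same
    // pipeline is scheduled to write but has not written yet.
    private final Map<String, Long> pendingWrites = new HashMap<>();

    public void addMaterialized(String path, long bytes) {
        materialized.put(path, bytes);
    }

    public void addPendingWrite(String path, long estimatedBytes) {
        pendingWrites.put(path, estimatedBytes);
    }

    // Returns the real size when the source exists, otherwise the estimate
    // recorded for the job that will produce it. Fails only when neither is known.
    public long estimateSize(String path) {
        Long onDisk = materialized.get(path);
        if (onDisk != null) {
            return onDisk;
        }
        Long pending = pendingWrites.get(path);
        if (pending != null) {
            return pending;
        }
        throw new IllegalStateException("No size information for source: " + path);
    }

    public static void main(String[] args) {
        PendingSourceSizeSketch sizes = new PendingSourceSizeSketch();
        // The marker does not exist yet, but a write to it is already planned.
        sizes.addPendingWrite("/tmp/marker", 200L);
        System.out.println(sizes.estimateSize("/tmp/marker"));
    }
}
```

In this model, a lookup for the marker succeeds because the planned write is consulted before giving up. Only a path with no existing data and no planned write raises IllegalStateException.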
> Illegal State Exception
> -----------------------
>
> Key: CRUNCH-361
> URL: https://issues.apache.org/jira/browse/CRUNCH-361
> Project: Crunch
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.9.0, 0.8.2
> Reporter: Jinal Shah
> Assignee: Josh Wills
> Priority: Minor
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)