For the sake of prototyping, can you use a util that simply materializes
the intermediate result in a file system (using typeInfo input and output
formats) ?

On Tue, Apr 7, 2015 at 6:21 PM, Maximilian Michels <m...@apache.org> wrote:

> On Mon, Apr 6, 2015 at 2:37 PM, Stephan Ewen <se...@apache.org> wrote:
>
> > BTW: Should "print()" be also an "eager" statement? I think it needs to
> be,
> > if we want to print to the driver's std out
>
>
> Yes, if we change print() to print on the Client, then it needs to execute
> eagerly.
>
> On Thu, Apr 2, 2015 at 6:59 PM, Alexander Alexandrov <
> alexander.s.alexand...@gmail.com> wrote:
>
> > I would like to run a dataflow up to a particular point and materialize
> (in
> > memory) the intermediate result. Is this possible at the moment?
> >
>
> Blocking execution mode is already implemented but the results are
> currently not cached. That means that they are lost after they are
> consumed.
>
> On Mon, Apr 6, 2015 at 2:37 PM, Stephan Ewen <se...@apache.org> wrote:
>
> > count() and collect() need to immediately trigger an execution, because
> the
> > driver program cannot proceed otherwise. They are "eager".
> >
> > Regular sinks are "lazy", they wait until the program is triggered
> anyways.
> >
> > BTW: Should "print()" be also an "eager" statement? I think it needs to
> be,
> > if we want to print to the driver's std out.
> >
> > On Thu, Apr 2, 2015 at 5:51 PM, Aljoscha Krettek <aljos...@apache.org>
> > wrote:
> >
> > > In my opinion it should not be handled like print. The idea behind
> > > count()/collect() is that they immediately return the result which can
> > > then be used in further flink operations.
> > >
> > > Right now, this is not properly/efficiently implemented but once we
> > > have support for intermediate results on this level they start making
> > > more sense. Also, in such a case an execute would not be required
> > > after a collect()/count() if only the result of that call is required.
> > >
> > > On Thu, Apr 2, 2015 at 5:33 PM, Felix Neutatz <neut...@googlemail.com>
> > > wrote:
> > > > Hi,
> > > >
> > > > I have run the following program:
> > > >
> > > > final ExecutionEnvironment env =
> > > ExecutionEnvironment.getExecutionEnvironment();
> > > >
> > > > List l = Arrays.asList(new Tuple1<Long>(1L));
> > > > TypeInformation t = TypeInfoParser.parse("Tuple1<Long>");
> > > > DataSet<Tuple1<Long>> data = env.fromCollection(l, t);
> > > >
> > > > long value = data.count();
> > > > System.out.println(value);
> > > >
> > > > env.execute("example");
> > > >
> > > >
> > > > Since there is no "real" data sink, I get the following:
> > > > Exception in thread "main" java.lang.RuntimeException: No data sinks
> > have
> > > > been created yet. A program needs at least one sink that consumes
> data.
> > > > Examples are writing the data set or printing it.
> > > >
> > > > In my opinion, we should handle count() and collect() like print().
> > > >
> > > > What do you think?
> > > >
> > > > Best regards,
> > > >
> > > > Felix
> > >
> >
>

Reply via email to