Re: Tuple performance and the curious JIT compiler

Stephan Ewen Wed, 09 Mar 2016 09:35:54 -0800

Thanks for posting this.

I think it is not super urgent (in the sense of weeks or a few months), so
results around midsummer are probably fine.
The background in LLVM is a very good base for this!


On Wed, Mar 9, 2016 at 3:56 PM, Gábor Horváth <[email protected]> wrote:

> Hi,
>
> In the meantime I sent out the current version of the proposal draft [1].
> Hopefully it will help you triage this task and contribute to the
> discussion of the problem.
> How urgent is this issue? In what time frame should there be results?
>
> Best Regards,
> Gábor
>
> [1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/GSoC-Project-Proposal-Draft-Code-Generation-in-Serializers-td10702.html
>
> On 9 March 2016 at 14:49, Stephan Ewen <[email protected]> wrote:
>
> > Do we have consensus that we want to "reserve" this topic for a GSoC
> > student?
> >
> > This feature is becoming more important. To see whether we can "hold
> > off" on working on it, it would be good to know a bit more, like
> >   - when is it decided whether this project takes place?
> >   - when would results be there?
> >   - can we expect the results to be usable, i.e., how good is the
> >     student?
> > (no offence, but so far the results in GSoC were everywhere between very
> > good and super bad)
> >
> > Greetings,
> > Stephan
> >
> >
> > On Tue, Mar 8, 2016 at 4:28 PM, Márton Balassi <[email protected]> wrote:
> >
> > > @Fabian: That is my bad, but I think we should still be on time. Pinged
> > > Uli just to make sure. Proposal from Gabor and Jira from me are coming
> > > soon.
> > >
> > > On Tue, Mar 8, 2016 at 11:43 AM, Fabian Hueske <[email protected]> wrote:
> > >
> > > > Hi Gabor,
> > > >
> > > > I did not find any Flink proposals for this year's GSoC in JIRA (should
> > > > be labeled with gsoc2016).
> > > > I am also not sure if any of the Flink committers signed up as a GSoC
> > > > mentor.
> > > > Maybe there is still time to do that, but as it looks right now there
> > > > are no GSoC projects offered by Flink.
> > > >
> > > > Best, Fabian
> > > >
> > > > 2016-03-08 11:22 GMT+01:00 Gábor Horváth <[email protected]>:
> > > >
> > > > > Hi!
> > > > >
> > > > > I am planning to do GSoC and I would like to work on the serializers.
> > > > > More specifically, I would like to implement code generation. I am
> > > > > planning to send the first draft of the proposal to the mailing list
> > > > > early next week. If everything goes well, it will include some
> > > > > preliminary benchmarks of how much performance gain can be expected
> > > > > from hand-written serializers.
> > > > >
> > > > > Best regards,
> > > > > Gábor
> > > > >
> > > > > On 8 March 2016 at 10:47, Stephan Ewen <[email protected]> wrote:
> > > > >
> > > > > > Ah, very good, that makes sense!
> > > > > >
> > > > > > I would guess that this performance difference could probably be seen
> > > > > > at various points where generic serializers and comparators are used
> > > > > > (also for Comparable, Writable) or where the TupleSerializer delegates
> > > > > > to a sequence of other TypeSerializers.
> > > > > >
> > > > > > I guess creating more specialized serializers would solve some of
> > > > > > these problems, like in your IntValue vs LongValue case.
> > > > > >
> > > > > > The best way to solve that would probably be through code generation
> > > > > > in the serializers. That has actually been my wish for quite a while.
> > > > > > If you are also into these kinds of low-level performance topics, we
> > > > > > could start a discussion on that.
> > > > > >
> > > > > > Greetings,
> > > > > > Stephan
> > > > > >
> > > > > >
> > > > > > On Mon, Mar 7, 2016 at 11:25 PM, Greg Hogan <[email protected]> wrote:
> > > > > >
> > > > > > > The issue is not with the Tuple hierarchy (running Gelly examples had
> > > > > > > no effect on runtime, and as you note there aren't any subclass
> > > > > > > overrides) but with CopyableValue. I had been using IntValue
> > > > > > > exclusively but had switched to using LongValue for graph generation.
> > > > > > > CopyableValueComparator and CopyableValueSerializer are now working
> > > > > > > with multiple types.
> > > > > > >
> > > > > > > If I create IntValue- and LongValue-specific versions of
> > > > > > > CopyableValueComparator and CopyableValueSerializer and modify
> > > > > > > ValueTypeInfo to return these then I see the expected performance.
> > > > > > >
> > > > > > > Greg
> > > > > > >
> > > > > > > On Mon, Mar 7, 2016 at 5:18 AM, Stephan Ewen <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hi Greg!
> > > > > > > >
> > > > > > > > Sounds very interesting.
> > > > > > > >
> > > > > > > > Do you have a hunch what "virtual" Tuple methods are being used that
> > > > > > > > become less jit-able? In many cases, tuples use only field accesses
> > > > > > > > (like "value.f1") in the user functions.
> > > > > > > >
> > > > > > > > I have to dig into the serializers, to see if they could suffer from
> > > > > > > > that. The "getField(pos)" method for example should always have many
> > > > > > > > overrides (though few would be loaded at any time, because one
> > > > > > > > usually does not use all Tuple classes at the same time).
> > > > > > > >
> > > > > > > > Greetings,
> > > > > > > > Stephan
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Mar 4, 2016 at 11:37 PM, Greg Hogan <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > I am noticing what looks like the same drop-off in performance when
> > > > > > > > > introducing TupleN subclasses as expressed in "Understanding the JIT
> > > > > > > > > and tuning the implementation" [1].
> > > > > > > > >
> > > > > > > > > I start my single-node cluster, run an algorithm which relies purely
> > > > > > > > > on Tuples, and measure the runtime. I execute a separate jar which
> > > > > > > > > executes essentially the same algorithm but using Gelly's Edge (which
> > > > > > > > > subclasses Tuple3 but does not add any extra fields) and now both the
> > > > > > > > > Tuple and Edge algorithms take twice as long.
> > > > > > > > >
> > > > > > > > > Has this been previously discussed? If not I can work up a
> > > > > > > > > demonstration.
> > > > > > > > >
> > > > > > > > > [1] https://flink.apache.org/news/2015/09/16/off-heap-memory.html
> > > > > > > > >
> > > > > > > > > Greg
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
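
For readers following the thread, Greg's observation that CopyableValueComparator and CopyableValueSerializer "are now working with multiple types" can be illustrated with a minimal Java sketch. Every name in it (Val, IntVal, LongVal, GenericSerializer, IntValSerializer, LongValSerializer, capture) is a hypothetical stand-in and not Flink's actual API. The sketch only shows the shape of the call sites: the generic serializer has one call site that receives several concrete types, while each specialized serializer only ever sees one. It does not measure anything, and how the JIT treats either form depends on the JVM.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class SerializerSketch {

    // Hypothetical stand-in for a CopyableValue-like type; not Flink's API.
    interface Val {
        void write(DataOutputStream out) throws IOException;
    }

    static final class IntVal implements Val {
        final int v;
        IntVal(int v) { this.v = v; }
        @Override public void write(DataOutputStream out) throws IOException { out.writeInt(v); }
    }

    static final class LongVal implements Val {
        final long v;
        LongVal(long v) { this.v = v; }
        @Override public void write(DataOutputStream out) throws IOException { out.writeLong(v); }
    }

    // Generic form: a single call site that receives every Val implementation.
    static final class GenericSerializer {
        void serialize(Val value, DataOutputStream out) throws IOException {
            value.write(out); // receiver type varies from call to call
        }
    }

    // Specialized forms: each call site only ever sees one final class.
    static final class IntValSerializer {
        void serialize(IntVal value, DataOutputStream out) throws IOException {
            value.write(out);
        }
    }

    static final class LongValSerializer {
        void serialize(LongVal value, DataOutputStream out) throws IOException {
            value.write(out);
        }
    }

    // Small helper so callers do not have to handle checked exceptions.
    interface Body {
        void run(DataOutputStream out) throws IOException;
    }

    static byte[] capture(Body body) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            body.run(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        GenericSerializer generic = new GenericSerializer();
        IntValSerializer ints = new IntValSerializer();
        LongValSerializer longs = new LongValSerializer();

        byte[] viaGeneric = capture(out -> {
            generic.serialize(new IntVal(7), out);
            generic.serialize(new LongVal(7L), out);
        });
        byte[] viaSpecialized = capture(out -> {
            ints.serialize(new IntVal(7), out);
            longs.serialize(new LongVal(7L), out);
        });

        // Both paths write the same bytes; only the call-site shape differs.
        System.out.println(Arrays.equals(viaGeneric, viaSpecialized));
    }
}
```

Generating such per-type classes automatically instead of writing them by hand is, roughly, the idea behind the code generation in the serializers discussed in this thread.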
