Han-Wen Nienhuys <hanw...@gmail.com> writes:

> On Fri, Apr 10, 2020 at 1:07 PM David Kastrup <d...@gnu.org> wrote:
>> > * memory use: each SCM value takes 8 bytes, and there are 416 grob
>> > properties today, for a total of 3328 bytes. Morgenlied is single page
>> > of music and has 3704 grobs. So storage for the vectors (which will be
>> > mostly empty) will take up 12M / page. This likely will result in
>> > doubling the memory use of LilyPond.
>>
>> You haven't understood the scheme.  A grob only contains the properties
>> seminal to its type.  That's why there is a double indirection.  A full
>> array of 416 entries is needed per grob _type_, not per grob.
>
> I still don't see why this requires an overall overhaul of the
> internal calling convention. You can have a global registry of symbol
> => uint16 (numbering all the symbols),

It would be pointless to have a single registry for music properties,
grob properties, stream event properties, and context properties, thus
significantly blowing up the second-level indexing tables and making for
a large number of cache misses.

> memoize that uint16 like we memoize the symbol today (I think this is
> what you suggested already).  Then use a uint16 => value hashtable in
> the grob/prob/etc. A custom hash implementation can save space by
> storing the uint16 key rather than the 8 byte SCM symbol.
>
> This would be a very conservative, localized change that doesn't
> require setting up machinery to convert types to vectors,

A vector is cheaper to index than a hashtable.  It makes no sense
talking aobut "setting up machinery" with regard to a vector while
taking hash tables for granted.

> and it keeps working if grob types change which grob-interfaces they
> support halfway through.

I don't think there is any point in discussing this patch based on what
you surmise about an ultimate solution.

> There are many types (stem, axis-group, beam, hairpin) that properties
> that are mostly unset. A fixed encoding will waste CPU cache and main
> memory to allow these properties to be set even if it is not
> necessary, so aside from increased complexity, it's not obvious that
> there will be a performance gain.

We can discuss this until the cows go home, but the authoritive answer
will be benchmarks.  They may indicate the necessity for further work,
and doing further work is facilitated by not having to track any added
C++ code accessing properties.  Hence this patch.

I actually expect more gains to be realised by rewriting
break-substitution.cc rather than the actual property access, but having
property access made cheap is certainly not undesirable either.

>> Hashing outside of a closed set still requires storing the actual symbol
>> for corroboration.  Hashing inside of a closed set amounts to the
>> first-stage lookup in my scheme, except that a small number guaranteed
>> to be unique comes out.  This lookup can be memoised and will work
>> permanently as opposed to hashing in a growing set.
>>
>> > We could experiment with this without requiring a giant rewrite of the
>> > source code.
>>
>> What you call a "giant rewrite" is basically this patch.  It's a purely
>> syntactical patch only extending rather than reducing possible
>> approaches and perfectly compatible with different implementations.  The
>> work for that patch is already done, the impact is similar to what
>> happened when changing around the syntax of unsmob a few times ian the
>> past.  It will allow for prolonged work on possibly different
>> implementations without having to constantly deal with merge conflicts.
>>
>> It does not depend on picking a particular implementation.  You
>> expressed interest in what I was planning to do with the opportunity
>> this patch opens, I answered.  I do not, however, see the point in
>> discussing and dissing (partially) unwritten code based on an
>> incomplete understanding in a handwaving discussion.
>
> I think it is not out of the ordinary to discuss plans invasive plans
> before implementing them.

But we are not discussing an invasive plan here.  This here is an
extensive, not an invasive change.  There is no point in dissing
unwritten code that take no project-wide resources.  The point of time
to review code is after it is proposed.

> I think this change is invasive because it changes the idiom across
> the source code, and I disagree that this change is net neutral. C++
> method calls are a more natural idiom plain function calls.

We are not talking about C++ method calls here.  We are talking about
macro calls.  Memoisation of any kind can only be implemented at the
macro call level when diversifying on a string literal since those are
not available as template arguments, otherwise one could transfer the
job to template specialisation like it is done in a few other places.
So the macro needs information to work with.  This change makes what
actually happens more rather than less transparent and idiomatic.

> If we can get a significant performance boost, that may be worth it,
> but I think we should first understand if that is the case.

This patch only increases the options for changing implementations.

> As said (see above), I am skeptical.
>
>> It is more efficient to discuss objections of the "you haven't thought
>> this through" kind on working code.
>>
>> Nothing in this patch here enforces picking a particular route.  It is
>> just paving the ground for more work, and it does so in a manner that
>> isn't so much of an eye sore that it can only be tolerated in
>> expectation of immense future gains.

-- 
David Kastrup

Reply via email to