On Sat, Apr 11, 2020 at 3:55 PM David Kastrup <d...@gnu.org> wrote: > >> You haven't understood the scheme. A grob only contains the properties > >> seminal to its type. That's why there is a double indirection. A full > >> array of 416 entries is needed per grob _type_, not per grob. > > > > I still don't see why this requires an overall overhaul of the > > internal calling convention. You can have a global registry of symbol > > => uint16 (numbering all the symbols), > > It would be pointless to have a single registry for music properties, > grob properties, stream event properties, and context properties, thus > significantly blowing up the second-level indexing tables and making for > a large number of cache misses.
there are 400 grob properties, and about 800 properties in total (including music and context). 400 requires 9 bits, 800 requires 10 bits. It doesn't appreciably change the problem size. > > memoize that uint16 like we memoize the symbol today (I think this is > > what you suggested already). Then use a uint16 => value hashtable in > > the grob/prob/etc. A custom hash implementation can save space by > > storing the uint16 key rather than the 8 byte SCM symbol. > > > > This would be a very conservative, localized change that doesn't > > require setting up machinery to convert types to vectors, > > A vector is cheaper to index than a hashtable. It makes no sense > talking aobut "setting up machinery" with regard to a vector while > taking hash tables for granted. > > > and it keeps working if grob types change which grob-interfaces they > > support halfway through. > > I don't think there is any point in discussing this patch based on what > you surmise about an ultimate solution. Have a look at https://github.com/hanwen/lilypond/tree/grob-dict It explores different proposals, all of them with neglible to negative performance impact. The last series introduces a 'symid' (a global, memoized uint16 identifier for a property), with the optimized hashtable. I think you could implement what you want to do try by doing the symid => per grob type vector index lookup in Grob::internal_xx_property. I still don't see what you'd do with the infromation you gain by reconstructing the call sites. Even if you know the C++ type (Grob vs Context) in the call site, you still won't know the actual grob type (Stem vs. NoteHead) in the call sites, which you'd need to do further memoization. I looked a bit closer at your call-site reorganization. It introduces 'this' as parameter in many places, which I find ugly, and certainly is nonstandard idiom in C++. I suggest we proceed only if you can produce a measurable performance improvement. If that happens, I'd be happy to help deal with any merge conflicts. Until that time, you can work on the change by not rebasing it. > >> Hashing outside of a closed set still requires storing the actual symbol > >> for corroboration. Hashing inside of a closed set amounts to the > >> first-stage lookup in my scheme, except that a small number guaranteed > >> to be unique comes out. This lookup can be memoised and will work > >> permanently as opposed to hashing in a growing set. > >> > >> > We could experiment with this without requiring a giant rewrite of the > >> > source code. > >> > >> What you call a "giant rewrite" is basically this patch. It's a purely > >> syntactical patch only extending rather than reducing possible > >> approaches and perfectly compatible with different implementations. The > >> work for that patch is already done, the impact is similar to what > >> happened when changing around the syntax of unsmob a few times ian the > >> past. It will allow for prolonged work on possibly different > >> implementations without having to constantly deal with merge conflicts. > >> > >> It does not depend on picking a particular implementation. You > >> expressed interest in what I was planning to do with the opportunity > >> this patch opens, I answered. I do not, however, see the point in > >> discussing and dissing (partially) unwritten code based on an > >> incomplete understanding in a handwaving discussion. > > > > I think it is not out of the ordinary to discuss plans invasive plans > > before implementing them. > > But we are not discussing an invasive plan here. This here is an > extensive, not an invasive change. There is no point in dissing > unwritten code that take no project-wide resources. The point of time > to review code is after it is proposed. > > > I think this change is invasive because it changes the idiom across > > the source code, and I disagree that this change is net neutral. C++ > > method calls are a more natural idiom plain function calls. > > We are not talking about C++ method calls here. We are talking about > macro calls. Memoisation of any kind can only be implemented at the > macro call level when diversifying on a string literal since those are > not available as template arguments, otherwise one could transfer the > job to template specialisation like it is done in a few other places. > So the macro needs information to work with. This change makes what > actually happens more rather than less transparent and idiomatic. > > > If we can get a significant performance boost, that may be worth it, > > but I think we should first understand if that is the case. > > This patch only increases the options for changing implementations. > > > As said (see above), I am skeptical. > > > >> It is more efficient to discuss objections of the "you haven't thought > >> this through" kind on working code. > >> > >> Nothing in this patch here enforces picking a particular route. It is > >> just paving the ground for more work, and it does so in a manner that > >> isn't so much of an eye sore that it can only be tolerated in > >> expectation of immense future gains. > > -- > David Kastrup -- Han-Wen Nienhuys - hanw...@gmail.com - http://www.xs4all.nl/~hanwen