Hi Tomas,

On Tue, 2011-12-13 at 00:52 +0100, Tomas Hlavaty wrote:
> > that they are tiny,
>
> What does "tiny" mean?

Well - you're going to find it hard to make it bigger than the existing
rdb files ;-) but by tiny I really mean fast to read from disk and fast
to parse.

> Currently, rdb files are giant.

Sure; they are a disaster :-)

> I'm not sure why. If I simply concatenate all idl definitions for
> udkapi and offapi into one preprocessed file I get smaller file while
> still being a valid idl file containing all the information:

Yep; this is well known. It is all done re-using some code not intended
for this purpose, which has been tweaked to the maximum to try to make
it suit it better, but it still doesn't ;-)

> Is 200kB considered tiny?

Sounds fine :-)

> And this is just original concatenated idl files.

Sure - sounds fine; if we can parse it fast.

> How long does reading the type information take at the moment?

That's quite hard to say; access to it is extremely scattered across
the code. callgrind gives 1.5% in libreg, 0.6% in libstore and some
lowish proportion of the 32% in libuno_sal; say perhaps 2.5% in total.
That IMHO hides its true cost - we have to force page-in of all that
data before startup to avoid the horrible I/O patterns mmap gives us as
we seek about in those big files.

> What do we get to do a lot at startup? I thought we simply load it and
> that's it.

Sure; we load it & that is it *but* we would really like to start in
under a second in total; at the least, choices that hurt that goal on a
fast PC are almost certain to also hurt the goal of working well on
mobile devices etc. :-)

> If the new format is a text format (I would prefer text format over
> another binary one), there needs to be some parsing. unoidl2 can parse
> the allpp.idl file (containing all type information) and print the
> syntax tree in about 200ms:
>
> $ rm allpp.ast
> $ time make allpp.ast
> cat allpp.idl | ./unoidl2ast >allpp.ast
>
> real 0m0.247s
> user 0m0.170s
> sys 0m0.100s

250ms is a -really- long time IMHO; particularly since we have to parse
the entire file before startup.
As Stephan says, perhaps we can overcome this by inlining more in the
generated C++, which may make that acceptable later (after all,
bootstrapping python takes a good long time itself anyway).

> If 200ms is slow, we could split the allpp.idl file into something
> smaller required at startup and the rest loaded lazily.

Possibly; or we could invent yet another format for this type
information. Personally, I'd like to keep the number of representations
of the same information as low as possible: we already have IDL, we
have the binaryurp format [ used for IPC on the wire ] (potentially we
could re-use that?), do we have an XML/text IPC protocol ? I suspect we
will want that for the remote Javascript/websockets magic - possibly we
could use a condensed XML format for this that'd be quicker to parse ?
unclear.

Stephan - do you have some ideas ? as soon as I see a yacc parser, I
see "slow" and "busts the branch predictor" - but perhaps I'm
paranoid ;-)

> We could have a binary format, something like a mmap dump. That would
> be instant but rather ugly.

Sure - that'd be bad :-) I like the 'concatenate text files' approach
for building the database (personally).

> Are there any other requirements? Like functionality related to
> rdbmerge and how extensibility works? Or is that not relevant anymore?

rdbmerge is/was IIRC just a compile-time tool. Clearly we need to
continue to be able to read old types.rdb files for some time to come,
but that can be de-coupled and removed later I think.

> I was under impression that these projects somehow depend on the rdb
> code, but if they depend on the typedescription api, then it is better
> than I hoped (if that typedescription api is somehow separate from the
> rdb file code).

Sure - there is only one place that we go grubbing with that nasty rdb
format - and it's at the bottom of the stack :-) if we can hot plug
that out with something else, life is good :-)

Thanks,

Michael.
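
For illustration only, here is a minimal Python sketch of how the 'concatenate text files' approach might combine with the "load the rest lazily" idea quoted above. Everything in it is an assumption made up for the example: the `//@type` separator line is hypothetical and not part of UNOIDL, and `build_index` and `LazyTypes` are invented names, not existing LibreOffice code. It only shows that a concatenated text file can be indexed by byte offset so that an individual definition is read on first use rather than parsed up front.

```python
import io

# Hypothetical separator line placed before each definition when the
# text files are concatenated; this is NOT part of the UNOIDL syntax.
MARKER = b"//@type "


def build_index(data: bytes) -> dict[str, tuple[int, int]]:
    """Map each type name to the (offset, length) of its definition text."""
    index = {}
    name, body_start, pos = None, 0, 0
    for line in data.splitlines(keepends=True):
        if line.startswith(MARKER):
            if name is not None:
                # Close the previous definition at the start of this marker.
                index[name] = (body_start, pos - body_start)
            name = line[len(MARKER):].strip().decode("ascii")
            body_start = pos + len(line)
        pos += len(line)
    if name is not None:
        index[name] = (body_start, pos - body_start)
    return index


class LazyTypes:
    """Read a definition from the stream only the first time it is asked for."""

    def __init__(self, stream, index):
        self._stream = stream
        self._index = index
        self._cache = {}
        self.reads = 0  # number of actual reads from the stream

    def get(self, name: str) -> str:
        if name not in self._cache:
            offset, length = self._index[name]
            self._stream.seek(offset)
            self._cache[name] = self._stream.read(length).decode("ascii")
            self.reads += 1
        return self._cache[name]


# Tiny made-up input standing in for a concatenated definitions file.
SAMPLE = (
    b"//@type demo.XFoo\n"
    b"interface XFoo { void ping(); };\n"
    b"//@type demo.Bar\n"
    b"struct Bar { long value; };\n"
)

index = build_index(SAMPLE)
types = LazyTypes(io.BytesIO(SAMPLE), index)
print(types.get("demo.Bar"), end="")
```

In a real setup the index would presumably be written out at build time, so that startup does not have to scan the whole file; whether that is fast enough is exactly the open question in this thread.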
-- 
michael.me...@suse.com <><, Pseudo Engineer, itinerant idiot