Hi Michael,

On 28 January 2011 04:04, Michael Meeks <michael.me...@novell.com> wrote:
>
> Licensing wise - I'd like to add the standard LGPLv3+/MPL header to it
> (see bootstrap/) but having MIT too is fine if you want.

This patch adds the (c) header from the template to idxdict.cpp, although I had to tweak the year to 2011.

> > I have no idea how this would be integrated into the build process as I'm
> > not even sure where it is called from, but happy if someone wants to
> > take up the challenge and/or incorporate it as an installer process.
>
> So - the installer process is more exciting on Windows I think - we'll
> need to see how the setup_native/ tools are called and be inspired by
> that I think.

To do any work on the Windows installer I would first have to work out how to set up a Windows build environment. I currently only have one set up on my Ubuntu machine.

> > The same set of files using th_gen_idx.pl took around 5 seconds (although
> > some basic fixups got it done to 3.5 seconds).
>
> Great - its trivial; indeed - it rather makes you wonder whether we
> need the indexes at all ? [ I wonder what they are good for, and/or what
> code loads and uses them ;-]. We may discover that in fact there is no
> need for them to be indexed - any chance of a dig around ?

I imagine my timings are a bit skewed by the machine I tested on and the number of times I ran the test. I'm sure all the dictionaries were well and truly in the buffer cache, so there was no I/O during the test. On slower machines (are you targeting these?) or slower disks, there is a chance the index files offer a performance improvement. Here is the same test after I dropped my buffer cache:

real    0m2.300s
user    0m0.700s
sys     0m0.150s

> > These range from having the entry count incorrect, causing the index
> > process to miss a word (lots of these in some dictionaries), to having
> > words apparently duplicated either as the next entry, or sometimes a long
> > way apart.
>
> That is bad; we should mail the l10n list to ask them to have a look I
> suppose.

I wasn't aware there was such a list, and I can't find one on freedesktop.org. Is it a LibreOffice-specific l10n list, or are these dictionaries sourced from another project?

> > I have not attempted to fix these dictionary issues, but if they are
> > serious it might be worth having a perl script that is able to validate
> > the dictionaries are internally consistent. Unfortunately, it would have
> > to use heuristics as the file format makes it difficult to tell in general
> > what kind of line is being processed.
>
> Right; we should validate them as we compile the index perhaps - or at
> least, look at the parser and see how it has traditionally interpreted
> them.

If a utility were written that can validate the files, would it be possible to have it reject a commit when it detects errors?

> > Having multiple entries for a word when loaded into libreoffice?
> The native code thing is great; it'd be wonderful if you had some time
> to look at hooking it into the build process in dictionaries/ (?)

Yep... I will have to figure out how the build works first, though.

Back to the wiki; at least I've worked out how to make git work across the multiple checkouts now.

--
Regards,
Steven Butler
copyright.patch
Description: Binary data
_______________________________________________
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libreoffice