[framework-issues] [Issue 66939] hspell Hebrew spell checker freezes several seconds upon loading
To comment on the following update, log in, then open the issue: http://www.openoffice.org/issues/show_bug.cgi?id=66939

--- Additional comments from nem...@openoffice.org Fri Feb 27 09:11:45 + 2009 ---

You are welcome. I'm very glad of the success, too, especially because this was an old and serious problem.

I have used the he_IL version 1.0 of the OOo extension he-IL-dict.oxt. I have checked the new dictionary with the following commands (tab-terminated input and the option -1 are needed for correct tokenization):

Check the base dictionary:

$ cat he_IL.dic | sed 's#/.*$##' | awk '{print $0 "\t"}' | LC_ALL=C ~/hunspell-1.2.8/src/tools/hunspell -d new_he_IL -1 -l

Check samples:

$ unmunch he_IL.dic he_IL.aff >he_IL.all # need 1.3 GB disk space
$ time sed -n '1~1000p' he_IL.all >sample # sample with 10 words
$ cat sample | awk '{print $0 "\t"}' | LC_ALL=C time ~/hunspell-1.2.8/src/tools/hunspell -d new_he_IL -1 -l

You can add the doubleaffixcompress and affixcompress scripts to your Hspell distribution (doubleaffixcompress will be a standard tool of the next Hunspell release, too). You may also need to extend the affix file to handle Niqqud with the IGNORE or the new ICONV/OCONV features of Hunspell.

(A further optimization could be to use the hzip Hunspell compressed format:

$ ~/hunspell-1.2.8/src/tools/hzip *
$ ls -lh new_he_IL_alias.*
-rw-r--r-- 1 laci laci 634K 2009-02-27 09:55 new_he_IL_alias.aff.hz
-rw-r--r-- 1 laci laci 116K 2009-02-27 09:55 new_he_IL_alias.dic.hz

The Hunspell library searches for the hzip compressed files if the given .dic and .aff files are missing:

$ rm *aff *dic
$ ~/hunspell-1.2.8/src/tools/hunspell -d new_he_IL_alias
Hunspell 1.2.8
a
& a 7 0: ו, ה, כ, ש, ב, מ, ל

But the OpenOffice.org 3.1 and Firefox 3.1 extension formats likely don't support hzip installation.)

Regards, László

- Please do not reply to this automatically generated notification from Issue Tracker. Please log onto the website and enter your comments.
http://qa.openoffice.org/issue_handling/project_issues.html#notification - To unsubscribe, e-mail: issues-unsubscr...@framework.openoffice.org For additional commands, e-mail: issues-h...@framework.openoffice.org - To unsubscribe, e-mail: allbugs-unsubscr...@openoffice.org For additional commands, e-mail: allbugs-h...@openoffice.org
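The input preparation used in the dictionary check above can be tried without Hunspell installed. The following is only a minimal sketch: the three dictionary entries and the function names make_sample_dic and stems_with_tab are made up for illustration and are not part of he_IL.dic or of any Hunspell tool. It strips the affix flags after the slash and appends a tab to every line, which is the tab-terminated form the comment above says the -1 option needs.

```shell
#!/bin/sh
# Hypothetical miniature .dic file: the first line is the entry count,
# the remaining lines are stems followed by "/" and their affix flags.
make_sample_dic() {
    printf '%s\n' '3' 'abolish/LZRSDG' 'abolisher/M' 'abolishment/MS'
}

# Remove everything from the first slash to the end of the line,
# then terminate each remaining line with a tab character.
stems_with_tab() {
    sed 's#/.*$##' | awk '{ print $0 "\t" }'
}

# Show the prepared input; in the real check this stream is piped
# into "hunspell -d new_he_IL -1 -l" instead of being printed.
make_sample_dic | stems_with_tab
```

Note that the count line at the top of the .dic file passes through this pipeline as well, just as it does in the original command.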
[framework-issues] [Issue 66939] hspell Hebrew spell checker freezes several seconds upon loading

--- Additional comments from n...@openoffice.org Thu Feb 26 09:01:43 + 2009 ---

nemeth, this is fantastic news! I'm looking forward to browsing the doubleaffixcompress source and seeing whether I can incorporate it, or something similar, in the hspell distribution (so that doing make hunspell will create an optimal hunspell dictionary).

The numbers you report (5.5 MB runtime memory use, 2 MB disk use and a few tenths of a second) are very encouraging. For the average user, they are not too far from the optimum we've been able to achieve in hspell (as I described in previous comments): 4 MB, 0.1 MB and 0.05 seconds respectively.
[framework-issues] [Issue 66939] hspell Hebrew spell checker freezes several seconds upon loading

--- Additional comments from nadav...@openoffice.org Thu Feb 26 09:11:58 + 2009 ---

Very good news! (I disabled the automatic spell checker because of that.)
[framework-issues] [Issue 66939] hspell Hebrew spell checker freezes several seconds upon loading

--- Additional comments from nem...@openoffice.org Thu Feb 26 07:39:04 + 2009 ---

Created an attachment (id=60497) optimized he_IL dictionary
[framework-issues] [Issue 66939] hspell Hebrew spell checker freezes several seconds upon loading

User nemeth changed the following:

What             |Old value |New value
Status           |STARTED   |RESOLVED
Resolution       |          |FIXED
Target milestone |OOo 3.x   |OOo 3.1

--- Additional comments from nem...@openoffice.org Thu Feb 26 07:44:00 + 2009 ---

I have attached a compressed he_IL dictionary. I posted a letter to the Lingucomponent development list about this improvement:

...

Languages with complex morphology can use the second-level affixation of Hunspell. There is a new tool, doubleaffixcompress (http://downloads.sourceforge.net/hunspell/doubleaffixcompress), to compress the output dictionary of the affixcompress script or other Hunspell dictionaries using second-level affixes. For example, on the old en_US dictionary of OpenOffice.org we got a 50% compression rate:

$ doubleaffixcompress en_US
$ wc -l en_US.dic new_en_US.dic
62157 en_US.dic
30442 new_en_US.dic
$ grep abolish en_US.dic
abolisher/M
abolish/LZRSDG
abolishment/MS
$ grep abolish new_en_US.dic
abolish/5193,6535,64991,64993,64995,64996,64997,65001
$ grep '\(5193\|6535\)' new_en_US.aff
SFX 5193 Y 1
SFX 5193 0 er/64999 .
SFX 6535 Y 1
SFX 6535 0 ment/64997,64999 .

A more important result was obtained on the (too big) he_IL dictionary. (This dictionary recognizes more than 100 million Hebrew word forms.)

$ LC_ALL=C doubleaffixcompress he_IL
$ wc he_IL.dic new_he_IL.dic
329237 328996 3212113 he_IL.dic
37913 37879 1940612 new_he_IL.dic
$ LC_ALL=C ~/hunspell-1.2.8/src/tools/makealias new_he_IL.{dic,aff}
output: new_he_IL_alias.dic, new_he_IL_alias.aff

Memory usage has been reduced from 19 MB to 5.5 MB by doubleaffixcompress and makealias.

...

The long loading time has also been reduced to a few tenths of a second, so this issue has been fixed.
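The size of the improvement can be read off the figures quoted in the comment above. As a rough sketch (the helper name reduction is made up here, and the inputs are simply the wc line counts and memory figures from that comment), the percentage reductions can be computed like this:

```shell
#!/bin/sh
# Percentage reduction from a "before" value to an "after" value,
# printed with one decimal place.
reduction() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (1 - after / before) * 100 }'
}

# Line counts of the .dic files as reported by wc in the comment above.
echo "en_US entries: $(reduction 62157 30442)% fewer"
echo "he_IL entries: $(reduction 329237 37913)% fewer"

# Runtime memory in MB before and after doubleaffixcompress and makealias.
echo "he_IL memory:  $(reduction 19 5.5)% less"
```

This gives roughly 51% fewer entries for en_US, about 88% fewer for he_IL, and about 71% less memory for he_IL, in line with the "50% compression rate" mentioned for en_US.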
[framework-issues] [Issue 66939] hspell Hebrew spell checker freezes several seconds upon loading

User nadavkav changed the following:

What |Old value   |New value
CC   |'bizna,nyh' |'bizna,nadavkav,nyh'

--- Additional comments from [EMAIL PROTECTED] Sat May 3 21:07:07 + 2008 ---

I am using version 2.4.0 (Debian Sid) on a Compaq Presario 1500 laptop, 1.24 GB RAM, P4 2.4 GHz. I use Hunspell (which comes by default) with a Hebrew dictionary.

I opened a 13-page Hebrew (.doc) document in 7 sec. It took 8 more seconds for the spell checker (OOo froze). It took the same time overall to open it with the spell checker enabled (15 sec +/- 1 sec). Obviously it was pre-cached in memory, but that is true for all the tests. Just opening Writer takes 3-4 sec.

These are just numbers, but I think it is quite quick? Do you see an issue here? Did you fix it already, prior to 2.4.x? Is this an hspell issue? Did you change spell-checking engines?

(A lot of questions :-) I am just curious to know ;-)
[framework-issues] [Issue 66939] hspell Hebrew spell checker freezes several seconds upon loading

User wise_ferret changed the following:

What    |Old value |New value
Version |OOo 2.0.3 |OOo 2.2

--- Additional comments from [EMAIL PROTECTED] Wed Sep 19 01:06:00 + 2007 ---

Still exists in 2.2 under Ubuntu Feisty.
[framework-issues] [Issue 66939] hspell Hebrew spell checker freezes several seconds upon loading

--- Additional comments from [EMAIL PROTECTED] Thu Sep 21 00:54:18 -0700 2006 ---

nyh: Secondary affix compression is a new Hunspell feature. I will make a secondary affix conversion script for MySpell dictionaries that you can add to the Hspell distribution.

Unfortunately, I can build the Hspell OpenOffice.org component only for Linux. Especially if Hspell has special non-portable features for Hebrew (BTW, I'm interested in them), we need Hspell OpenOffice.org 2.x components for Windows, too.

Thanks, Laci
[framework-issues] [Issue 66939] hspell Hebrew spell checker freezes several seconds upon loading

--- Additional comments from [EMAIL PROTECTED] Thu Sep 21 01:15:15 -0700 2006 ---

nemeth: thanks, I'd love to see this script.

About the component: the Hspell project produces not just the word lists, but also a sample implementation of a spell checker which is very efficient (the entire Hebrew word list is compressed with a general-purpose, non-Hebrew-specific algorithm to 100K on disk, and can be read into memory in a fraction of a second). About two years ago, Alan Yaniger (who I guess is part of the OpenOffice project) took a snapshot of this code and a snapshot of the compressed dictionaries, and wrapped them into an OpenOffice component.

I never understood why this was Linux-specific, but I guess that it's mostly because I use zlib to read compressed files, and Windows doesn't have that. Looking at the package you pointed to, I see that it actually contains compiled code of zlib and of hspell (an ancient snapshot thereof), so obviously it won't run as-is on Windows. You can take the new version of Hspell from our site and try to compile it on Windows to see what breaks.

But I think that instead of trying to revive this old component, it's better to focus on that secondary compression. One of the primary goals of the Hspell project was to create a full word list that a general-purpose multilingual spell checker (such as ispell, aspell, myspell, hunspell, etc.) could use without needing Hebrew-specific algorithms to be inserted into it. Once we released the myspell version of our word list (the openoffice target in our makefile), this goal became reality. If we have secondary affix compression of the type you described (and of which I wasn't aware), the dictionary can be drastically shrunk, and this would be even better.
[framework-issues] [Issue 66939] hspell Hebrew spell checker freezes several seconds upon loading

--- Additional comments from [EMAIL PROTECTED] Wed Sep 20 05:51:54 -0700 2006 ---

Thanks Nemeth. If you figure out how to do this secondary affix compression (is this a hunspell-specific feature, or can it be done with ordinary myspell?) and explain to me ([EMAIL PROTECTED]) how to do it, I'll include it in future versions of Hspell, whose makefile already has an openoffice target. If this works, it could be great.

About using the Hspell code rather than the MySpell dictionary in OpenOffice: if you do this, please at least update the dictionaries. In the two years since Alan Yaniger created that component, a lot of water has passed under the bridge, and Hspell's vocabulary has been considerably enlarged. Three major releases of Hspell have been made since, and a fourth one is on the way. If you need help in updating that component, please let me know.

Thanks, Nadav.
[framework-issues] [Issue 66939] hspell Hebrew spell checker freezes several seconds upon loading

User sba changed the following:

What             |Old value                                        |New value
Assigned to      |sba                                              |nemeth
Ever confirmed   |                                                 |1
Status           |UNCONFIRMED                                      |NEW
Summary          |hspell Hebrew spell checker freezes upon loading |hspell Hebrew spell checker freezes several seconds upon loading
Target milestone |---                                              |OOo 2.x

--- Additional comments from [EMAIL PROTECTED] Mon Sep 18 04:09:20 -0700 2006 ---

SBA-nyh: The fact that using too many dictionaries at once costs performance is clearly stated in the upcoming dialog of DictOOO. So THAT problem is well known (and should have been known to the cooks of that OOo distribution), and will therefore be disregarded within this issue. Feel free to submit another issue with your ideas for speeding up the use of all dictionaries.

SBA-Nemeth: Confirming with OOo 2.04 RC (OOD680m4_Build9070) on Windows 2000 and SUSE Linux. CPU goes up to 100% whenever Hebrew Lingu is called for the first time in an office session. Of course, the delay time is system dependent (summary adjusted, as it is not an eternal freeze). The other languages do not seem to have this problem.
[framework-issues] [Issue 66939] hspell Hebrew spell checker freezes several seconds upon loading

User nemeth changed the following:

What   |Old value |New value
Status |NEW       |STARTED

--- Additional comments from [EMAIL PROTECTED] Mon Sep 18 04:51:33 -0700 2006 ---

The problem is that there are too many word stems in the Hebrew MySpell dictionary (330 thousand). I will solve the problem with secondary affix compression soon.

Another solution is to build the Hspell OOo component under OOo 2.0 (source: http://www.openoffice.org.il/hspell_src.tar.gz on http://www.ivrix.org.il/projects/spell-checker/download.html).

Thanks for the report. Laci