[framework-issues] [Issue 66939] hspell Hebrew spell checker freezes several seconds upon loading

2009-02-27 Thread nemeth
To comment on the following update, log in, then open the issue:
http://www.openoffice.org/issues/show_bug.cgi?id=66939





--- Additional comments from nem...@openoffice.org Fri Feb 27 09:11:45 
+ 2009 ---
You are welcome. I'm very glad of the success, too, especially because this was
an old and serious problem.

I used version 1.0 of the he_IL OOo extension, he-IL-dict.oxt. I checked the
new dictionary with the following commands (tab-terminated input and the -1
option are needed for correct tokenization):

Check the base dictionary: 

$ cat he_IL.dic | sed 's#/.*$##' | awk '{print $0 "\t"}' | LC_ALL=C \
~/hunspell-1.2.8/src/tools/hunspell -d new_he_IL -1 -l

Check samples:

$ unmunch he_IL.dic he_IL.aff > he_IL.all  # needs 1.3 GB of disk space
$ time sed -n '1~1000p' he_IL.all > sample # sample: every 1000th word
$ cat sample | awk '{print $0 "\t"}' | LC_ALL=C time \
~/hunspell-1.2.8/src/tools/hunspell -d new_he_IL -1 -l
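
For readers without sed and awk at hand, the input preparation in the commands
above can be sketched in Python. This is only an illustration under my own
assumptions: the helper names strip_flags, as_hunspell_input and every_nth are
hypothetical and belong to neither Hunspell nor Hspell. The sketch covers the
two steps: cutting the affix flags after the first '/' and terminating each
word with a tab (so that hunspell -1 treats the word as the first field), and
taking every 1000th line the way GNU sed's '1~1000p' address does.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def strip_flags(line: str) -> str:
    """Drop the line ending and everything from the first '/' on,
    like sed 's#/.*$##' (hypothetical helper)."""
    return line.rstrip("\n").split("/", 1)[0]


def as_hunspell_input(lines: Iterable[str]) -> Iterator[str]:
    """Yield each stem followed by a tab, like awk '{print $0 "\\t"}'."""
    for line in lines:
        yield strip_flags(line) + "\t"


def every_nth(items: Iterable[T], step: int = 1000) -> Iterator[T]:
    """Yield items 1, 1+step, 1+2*step, ... (1-based),
    mirroring the GNU sed address '1~step'."""
    for index, item in enumerate(items):
        if index % step == 0:
            yield item
```

The generators keep memory use flat, which matters here because the unmunched
word list is over a gigabyte.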

You can add the doubleaffixcompress and affixcompress scripts to your Hspell
distribution (doubleaffixcompress will be a standard tool of the next Hunspell
release, too). You may also need to extend the affix file to handle Niqqut,
using the IGNORE or the new ICONV/OCONV features of Hunspell.
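
To make the Niqqut remark concrete, here is a minimal Python sketch of the
effect such a rule is meant to have: vowel points are dropped before the word
is looked up, so a pointed word matches its unpointed dictionary entry. This is
not how Hunspell implements IGNORE or ICONV; the function name strip_niqqud and
the choice of code point range (the nonspacing marks between U+0591 and U+05C7)
are my assumptions for illustration.

```python
import unicodedata


def strip_niqqud(word: str) -> str:
    """Remove Hebrew points and accents from a word.

    ASSUMPTION: "Niqqut" is taken to mean every nonspacing mark (Unicode
    category Mn) in the range U+0591..U+05C7. Letters and Hebrew punctuation
    in that range are left untouched.
    """
    return "".join(
        ch
        for ch in word
        if not (0x0591 <= ord(ch) <= 0x05C7 and unicodedata.category(ch) == "Mn")
    )


# "shalom" written with points (shin dot, qamats, holam), as escapes
pointed = "\u05e9\u05c1\u05b8\u05dc\u05d5\u05b9\u05dd"
unpointed = strip_niqqud(pointed)
```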

(A further optimization could be to use the hzip compressed format of Hunspell:
$ ~/hunspell-1.2.8/src/tools/hzip *
$ ls -lh new_he_IL_alias.*
-rw-r--r-- 1 laci laci 634K 2009-02-27 09:55 new_he_IL_alias.aff.hz
-rw-r--r-- 1 laci laci 116K 2009-02-27 09:55 new_he_IL_alias.dic.hz
The Hunspell library looks for the hzip-compressed files if the given .dic and
.aff files are missing:
$ rm *aff *dic
$ ~/hunspell-1.2.8/src/tools/hunspell -d new_he_IL_alias
Hunspell 1.2.8
a
& a 7 0: ו, ה, כ, ש, ב, מ, ל
But the OpenOffice.org 3.1 and Firefox 3.1 extension formats likely don't
support hzip installation.)
Regards, László

-
Please do not reply to this automatically generated notification from
Issue Tracker. Please log onto the website and enter your comments.
http://qa.openoffice.org/issue_handling/project_issues.html#notification

-
To unsubscribe, e-mail: issues-unsubscr...@framework.openoffice.org
For additional commands, e-mail: issues-h...@framework.openoffice.org


-
To unsubscribe, e-mail: allbugs-unsubscr...@openoffice.org
For additional commands, e-mail: allbugs-h...@openoffice.org



[framework-issues] [Issue 66939] hspell Hebrew spell checker freezes several seconds upon loading

2009-02-26 Thread nyh

--- Additional comments from n...@openoffice.org Thu Feb 26 09:01:43 + 
2009 ---
nemeth, this is fantastic news!

I'm looking forward to browsing the doubleaffixcompress source, and seeing if
I can incorporate it or something similar in the hspell distribution (so that
doing make hunspell will create an optimal hunspell dictionary).

The numbers you report, 5.5 MB runtime memory use, 2 MB disk use and a few
tenths of a second, are very encouraging. For the average user, they are not
too far from the optimum we've been able to achieve in hspell (as I described
in previous comments): 4 MB, 0.1 MB and 0.05 seconds respectively.


[framework-issues] [Issue 66939] hspell Hebrew spell checker freezes several seconds upon loading

2009-02-26 Thread nadavkav

--- Additional comments from nadav...@openoffice.org Thu Feb 26 09:11:58 
+ 2009 ---
Very good news!
(I had disabled the automatic spell checker because of this.)


[framework-issues] [Issue 66939] hspell Hebrew spell checker freezes several seconds upon loading

2009-02-25 Thread nemeth

--- Additional comments from nem...@openoffice.org Thu Feb 26 07:39:04 
+ 2009 ---
Created an attachment (id=60497)
optimized he_IL dictionary



[framework-issues] [Issue 66939] hspell Hebrew spell checker freezes several seconds upon loading

2009-02-25 Thread nemeth

User nemeth changed the following:

            What|Old value                 |New value
          Status|STARTED                   |RESOLVED
      Resolution|                          |FIXED
Target milestone|OOo 3.x                   |OOo 3.1





--- Additional comments from nem...@openoffice.org Thu Feb 26 07:44:00 
+ 2009 ---
I have attached a compressed he_IL dictionary. I posted a letter to the
Lingucomponent development list about this improvement:

...
Languages with complex morphology can use the second-level affixation
of Hunspell. There is a new tool, doubleaffixcompress
(http://downloads.sourceforge.net/hunspell/doubleaffixcompress), that
compresses the output dictionary of the affixcompress script, or other
Hunspell dictionaries, using second-level affixes. For example, on the
old en_US dictionary of OpenOffice.org we got a 50% compression rate:

$ doubleaffixcompress en_US
$ wc -l en_US.dic new_en_US.dic
 62157 en_US.dic
 30442 new_en_US.dic
$ grep abolish en_US.dic
abolisher/M
abolish/LZRSDG
abolishment/MS
$ grep abolish new_en_US.dic
abolish/5193,6535,64991,64993,64995,64996,64997,65001
$ grep '\(5193\|6535\)' new_en_US.aff
SFX  5193 Y 1
SFX  5193 0 er/64999 .
SFX  6535 Y 1
SFX  6535 0 ment/64997,64999 .

A more important result is for the (too big) he_IL dictionary, which
recognizes more than 100 million Hebrew word forms:

$ LC_ALL=C doubleaffixcompress he_IL
$ wc he_IL.dic new_he_IL.dic
 329237  328996 3212113 he_IL.dic
 37913   37879 1940612 new_he_IL.dic
$ LC_ALL=C ~/hunspell-1.2.8/src/tools/makealias new_he_IL.{dic,aff}
output: new_he_IL_alias.dic, new_he_IL_alias.aff

Memory usage has been reduced from 19 MB to 5.5 MB by
doubleaffixcompress and makealias.
...

The long loading time has also been reduced to a few tenths of a second, so
this issue has been fixed.
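
To illustrate why second-level affixes shrink the stem list, here is a small
Python sketch of two-level suffix expansion, modelled on the abolish example
in the quoted letter. It is a toy model, not Hunspell's code. The rules for
flags 5193 and 6535 are copied from the new_en_US.aff excerpt; the meanings I
give to the continuation flags 64997 (plural "s") and 64999 (possessive "'s")
are assumptions, guessed from the old entries abolisher/M and abolishment/MS.
The other flags on the abolish entry (64991 and so on) are left out because
their rules are not shown in the letter.

```python
from typing import Dict, List, Set, Tuple

# flag -> list of (suffix to append, continuation flags on the result)
Rules = Dict[str, List[Tuple[str, List[str]]]]

RULES: Rules = {
    "5193": [("er", ["64999"])],             # from the .aff excerpt
    "6535": [("ment", ["64997", "64999"])],  # from the .aff excerpt
    "64997": [("s", [])],                    # ASSUMPTION: plural
    "64999": [("'s", [])],                   # ASSUMPTION: possessive
}


def expand(stem: str, flags: List[str], rules: Rules, depth: int = 2) -> Set[str]:
    """Return the stem plus every form reachable with at most `depth`
    stacked suffixes (Hunspell allows two suffix levels)."""
    forms = {stem}
    if depth == 0:
        return forms
    for flag in flags:
        for suffix, continuation in rules.get(flag, []):
            forms |= expand(stem + suffix, continuation, rules, depth - 1)
    return forms


forms = expand("abolish", ["5193", "6535"], RULES)
```

With these rules the single entry abolish yields abolisher and abolishment
together with their own suffixed forms, so the three old entries (abolish,
abolisher, abolishment) collapse into one stem. The same effect, repeated
across the dictionary, is what reduces the he_IL stem count from 329237 to
37913 lines.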



[framework-issues] [Issue 66939] hspell Hebrew spell checker freezes several seconds upon loading

2008-05-03 Thread nadavkav

User nadavkav changed the following:

What|Old value                 |New value
  CC|'bizna,nyh'               |'bizna,nadavkav,nyh'





--- Additional comments from [EMAIL PROTECTED] Sat May  3 21:07:07 + 
2008 ---
I am using version 2.4.0 (Debian Sid) on a Compaq Presario 1500 laptop with
1.24 GB RAM and a 2.4 GHz P4. I use Hunspell (the default) with a Hebrew
dictionary.

I opened a 13-page Hebrew (.doc) document in 7 seconds. The spell checker then
took 8 more seconds (OOo froze). Opening the document with the spell checker
already enabled took the same time overall (15 seconds, +/- 1 second).
Obviously the document was pre-cached in memory, but that is true for all the
tests.

Just opening Writer takes 3-4 seconds. These are just numbers, but I think that
is quite quick? Do you see an issue here? Did you already fix it prior to
2.4.x? Is this an Hspell issue? Did you change spell-checking engines?
(A lot of questions :-) I am just curious to know ;-)


[framework-issues] [Issue 66939] hspell Hebrew spell checker freezes several seconds upon loading

2007-09-18 Thread wise_ferret

User wise_ferret changed the following:

   What|Old value                 |New value
Version|OOo 2.0.3                 |OOo 2.2





--- Additional comments from [EMAIL PROTECTED] Wed Sep 19 01:06:00 + 
2007 ---
Still exists in 2.2 under Ubuntu Feisty.


[framework-issues] [Issue 66939] hspell Hebrew spell checker freezes several seconds upon loading

2006-09-21 Thread nemeth

--- Additional comments from [EMAIL PROTECTED] Thu Sep 21 00:54:18 -0700 
2006 ---
nyh: Secondary affix compression is a new Hunspell feature. I will make a
secondary affix conversion script for MySpell dictionaries that you can add to
the Hspell distribution. Unfortunately, I can build the Hspell OpenOffice.org
component only for Linux. Especially if Hspell has special non-portable features
for Hebrew (BTW, I'm interested in them), we need Hspell OpenOffice.org 2.x
components for Windows, too. Thanks, Laci



[framework-issues] [Issue 66939] hspell Hebrew spell checker freezes several seconds upon loading

2006-09-21 Thread nyh

--- Additional comments from [EMAIL PROTECTED] Thu Sep 21 01:15:15 -0700 
2006 ---
nemeth: thanks, I'd love to see this script. About the component: The Hspell
project produces not just the word lists, but also a sample implementation of a
spell-checker which is very efficient (the entire Hebrew word list is compressed
with a general-purpose, non-Hebrew-specific algorithm to 100K on disk, and can
be read into memory in a fraction of a second). Alan Yaniger (who I guess is
part of the OpenOffice project) took, about two years ago, a snapshot of this
code, and a snapshot of the compressed dictionaries, and wrapped it into an
openoffice component. I never understood why this was Linux specific, but I
guess that it's mostly because I use zlib to read compressed files, and Windows
doesn't have that. Looking at the package you pointed to, I see that it actually
contains compiled code - of zlib and of hspell (an ancient snapshot thereof), so
obviously it won't run as-is on Windows. You can take the new version of Hspell
from our site and try to compile it on Windows to see what breaks.

But I think that, instead of trying to revive this old component, it's better
to focus on secondary affix compression. One of the primary goals of the
Hspell project was to create a full word list that a general-purpose
multilingual spell checker (such as ispell, aspell, MySpell, Hunspell, etc.)
could use without needing Hebrew-specific algorithms inserted into it.
Once we released the MySpell version of our word list (the openoffice target in
our makefile), this goal became reality. If we get secondary affix
compression of the type you described (of which I wasn't aware), the
dictionary can be shrunk drastically, which would be even better.


[framework-issues] [Issue 66939] hspell Hebrew spell checker freezes several seconds upon loading

2006-09-20 Thread nyh

--- Additional comments from [EMAIL PROTECTED] Wed Sep 20 05:51:54 -0700 
2006 ---
Thanks Nemeth. If you figure out how to do this secondary affix compression (is
this a Hunspell-specific feature, or can it be done with ordinary MySpell?) and
explain to me ([EMAIL PROTECTED]) how to do it, I'll include it in
future versions of Hspell, whose makefile already has an openoffice target. If
this works, it could be great.

About using the Hspell code rather than the MySpell dictionary in OpenOffice: if
you do this, please at least update the dictionaries. In the two years since
Alan Yaniger created that component, a lot of water has passed under the bridge,
and Hspell's vocabulary has been enlarged considerably. Three major releases of
Hspell have been made since, and a fourth is on the way. If you need help
updating that component, please let me know.

Thanks,
Nadav.


[framework-issues] [Issue 66939] hspell Hebrew spell checker freezes several seconds upon loading

2006-09-18 Thread sba

User sba changed the following:

            What|Old value                 |New value
     Assigned to|sba                       |nemeth
  Ever confirmed|                          |1
          Status|UNCONFIRMED               |NEW
         Summary|hspell Hebrew spell       |hspell Hebrew spell
                |checker freezes upon      |checker freezes several
                |loading                   |seconds upon loading
Target milestone|---                       |OOo 2.x





--- Additional comments from [EMAIL PROTECTED] Mon Sep 18 04:09:20 -0700 
2006 ---
SBA-nyh: That loading too many dictionaries at once costs performance is
clearly stated in the pop-up dialog of DictOOO. So THAT problem is well
known (and should have been known to the cooks of that OOo distribution),
and will be disregarded within this issue. Feel free to submit another issue
with your ideas for speeding up the use of all dictionaries.

SBA-Nemeth: Confirmed with OOo 2.04 RC (OOD680m4_Build9070) on Windows 2000
and SUSE Linux. CPU usage goes up to 100% whenever the Hebrew Lingu is called
for the first time in an office session. Of course, the delay is system
dependent. (Summary adjusted, as it is not an eternal freeze.)
The other languages do not seem to have this problem.


[framework-issues] [Issue 66939] hspell Hebrew spell checker freezes several seconds upon loading

2006-09-18 Thread nemeth

User nemeth changed the following:

  What|Old value                 |New value
Status|NEW                       |STARTED





--- Additional comments from [EMAIL PROTECTED] Mon Sep 18 04:51:33 -0700 
2006 ---
The problem is that there are too many word stems in the Hebrew MySpell
dictionary (330 thousand). I will solve the problem with secondary affix
compression soon. Another solution is to build the Hspell OOo component under
OOo 2.0 (source: http://www.openoffice.org.il/hspell_src.tar.gz on
http://www.ivrix.org.il/projects/spell-checker/download.html). Thanks for the
report. Laci



