Re: Announce: Hspell 1.1
On Fri, Jan 01, 2010, E L wrote about "Re: Announce: Hspell 1.1":
> I think it should be done in the following order:
> - If Hspell doesn't already have it, add for each word whether it's a verb, adjective, and so on.

Hspell already does this, and more. This is known as a morphological analyzer. It is explained on our site, and you can also find a link to a live demo there.

> - Grammatical analyzer (I saw a doc work about it that was released under the GPL long ago).
> - Grammatical fixer (maybe better spelling suggestions based on grammar).
> - Independently of that, we need a list of words and their nikud (I also saw one in that doc work).
> - Nikud checker.
> - Nakdan.

Eli, I think this discussion is starting to get a little too specific for this list, and we should continue it elsewhere. I opened a new mailing list for Hspell, at hspell-de...@lists.sourceforge.net. If you're interested, please join this list (via the web interface at https://lists.sourceforge.net/lists/listinfo/hspell-devel) and we can continue this discussion, and other Hspell-related technical discussions, there.

Everybody who is interested in contributing to Hspell, whether to its current capabilities or to completely new ones, is very welcome to subscribe to this list.

> Does anyone know where would be a good place to start getting a word list with nikud?

MILA (the Knowledge Center for Processing Hebrew, http://www.mila.cs.technion.ac.il/) started something like this (a word list with niqqud). They created a word list that was originally forked from Hspell's (and has since grown independently), and later they started adding niqqud to the base words, but only did it for part of the lexicon. This is a far cry, however, from knowing how to inflect these base words with the correct niqqud, and I don't believe they ever did that.

Nadav.
--
Nadav Har'El | Sunday, Jan 3 2010, 18 Tevet 5770
n...@math.technion.ac.il | Phone +972-523-790466, ICQ 13349191
http://nadav.harel.org.il | I used to work in a pickle factory, until I got canned.
___ Linux-il mailing list Linux-il@cs.huji.ac.il http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il
Re: Announce: Hspell 1.1
Who said anything about *few* rules? There are many rules, they are complex, and they have a gazillion exceptions. But they exist, and putting them into effect in Hspell's inflection scripts is doable, albeit requiring a lot of meticulous work.

The classical references for niqqud are Luah HaShemot HaShalem and Luah HaP`alim HaShalem by Shaul Bakali. These tables include all the rules and all the exceptions needed to add the correct niqqud to Hebrew words.

On Fri, Jan 01, 2010 at 02:02:21AM +0200, Ely Levy wrote:
> But no one could provide those few rules you are talking about.

--
Dan Kenigsberg  http://www.cs.technion.ac.il/~danken  ICQ 162180901
Re: Announce: Hspell 1.1
I think it should be done in the following order:
- If Hspell doesn't already have it, add for each word whether it's a verb, adjective, and so on.
- Grammatical analyzer (I saw a doc work about it that was released under the GPL long ago).
- Grammatical fixer (maybe better spelling suggestions based on grammar).
- Independently of that, we need a list of words and their nikud (I also saw one in that doc work).
- Nikud checker.
- Nakdan.

Does anyone know where would be a good place to start getting a word list with nikud? Or where can I find the doc work that produced a grammatical analyzer?

Ely

On Fri, Jan 1, 2010 at 10:18 AM, Dan Kenigsberg dan...@cs.technion.ac.il wrote:
> Who said anything about *few* rules? There are many rules, they are complex, and they have a gazillion exceptions. But they exist, and putting them into effect in Hspell's inflection scripts is doable, albeit requiring a lot of meticulous work.
Re: Announce: Hspell 1.1
Nadav and Dan: It's great to hear about the new release. Thank you for the hard work and time you have put into this project over the years. Without Hspell, Hebrew free software would not be what it is today.

Ely: Dan suggested earlier Luah HaShemot HaShalem and Luah HaP`alim HaShalem by Shaul Bakali. I will add: http://culmus.sourceforge.net/dictionary/index.html

Anyway, you are missing the point: *you* should do the things you listed. All of them are doable, but doing them takes hard work and time. If you think these things are important, just do them. They are possible to do, and when you start you will see that a lot of the work has already been done by people in the community. Hspell is not just a spell checker but also a grammatical analyzer that can tell you a word's tense, number, type, gender and hataye. These options in Hspell are a big step toward achieving your goal, and Culmus's dictionary project is another; I'm sure you will find other sources when you start the work. I hope to hear about your project when it has some working code to show.

kobi

2010/1/1 Ely Levy elyl...@cs.huji.ac.il:
> I think it should be done in the following order:
> - If Hspell doesn't already have it, add for each word whether it's a verb, adjective, and so on.
> - Grammatical analyzer (I saw a doc work about it that was released under the GPL long ago).
> - Grammatical fixer (maybe better spelling suggestions based on grammar).
> - Independently of that, we need a list of words and their nikud (I also saw one in that doc work).
> - Nikud checker.
> - Nakdan.
>
> Does anyone know where would be a good place to start getting a word list with nikud? Or where can I find the doc work that produced a grammatical analyzer?
Re: Announce: Hspell 1.1
Cool :) Any news on grammar checking/nikud checking?

Ely

On Thu, Dec 31, 2009 at 2:32 PM, Nadav Har'El n...@math.technion.ac.il wrote:
> We are proud to present version 1.1 of Hspell, the free Hebrew spell-checker and morphological analyzer. You can find the new release on the project's homepage: http://hspell.ivrix.org.il/
>
> Over three years have passed since our previous release. In that time, we continued to improve Hspell's vocabulary and added 900 more base words to it. Hspell is now closer to full coverage of the modern Hebrew language than it has ever been.
>
> We've always been proud of Hspell's accuracy and its compliance with the spelling standard set by the Academy of the Hebrew Language. Nevertheless, we are continually asked why Hspell spells certain words the way it does. So, starting with this release, Hspell includes a document which describes its spelling standard and discusses the numerous spelling questions we had to answer while developing Hspell. This document is still a work in progress, but even in its present form it is already quite readable and, we hope, educational. It is available in Hspell's tarball, and also online: http://hspell.ivrix.org.il/niqqudless.pdf
>
> People who download Hspell from our site are not the only ones who will benefit from this release. For several years now, only a minority of Hspell's users have downloaded it from our site. Hspell has become the de facto standard Hebrew spell-checker in the free software world and beyond; it is available in Linux distributions, in Aspell's and Hunspell's dictionary collections, and as OpenOffice and Firefox plugins. Even Google's hugely popular mail service, GMail, uses Hspell as its Hebrew spell-checker. We expect the new Hspell release to propagate soon to all these applications, so that their users will also be able to enjoy the improved vocabulary of Hspell 1.1.
>
> Enjoy Hspell 1.1. No further releases are expected this year ;-)
>
> Nadav Har'El and Dan Kenigsberg.
>
> --
> Nadav Har'El | Thursday, Dec 31 2009, 14 Tevet 5770
> n...@math.technion.ac.il | Phone +972-523-790466, ICQ 13349191
> http://nadav.harel.org.il | Computers are useless. They can only give you answers. -- Pablo Picasso
Re: Announce: Hspell 1.1
On Thu, Dec 31, 2009, E L wrote about "Re: Announce: Hspell 1.1":
> Cool :) Any news on grammar checking/nikud checking?

Not really... Do you (or anyone else) want to volunteer to help us work on it?

Nadav.
--
Nadav Har'El | Thursday, Dec 31 2009, 14 Tevet 5770
n...@math.technion.ac.il | Phone +972-523-790466, ICQ 13349191
http://nadav.harel.org.il | Two wrongs may not make a right, but three rights make a left.
Re: Announce: Hspell 1.1
On Thu, Dec 31, 2009 at 02:46:37PM +0200, Ely Levy wrote:
> Cool :) Any news on grammar checking/nikud checking?

No, we are constantly too busy releasing Hspell versions to deal with that :)
Re: Announce: Hspell 1.1
I think the main problem is knowing what needs to be done, not the manpower to program it. If someone knows what rules grammar or nikud checkers should follow, I'm sure it won't be a big deal to program one.

Ely

On Thu, Dec 31, 2009 at 2:56 PM, Nadav Har'El n...@math.technion.ac.il wrote:
> Not really... Do you (or anyone else) want to volunteer to help us work on it?
Re: Announce: Hspell 1.1
On Thu, Dec 31, 2009, E L wrote about "Re: Announce: Hspell 1.1":
> I think the main problem is knowing what needs to be done, not the manpower to program it. If someone knows what rules grammar or nikud checkers should follow, I'm sure it won't be a big deal to program one.

I beg to differ.

First of all, most of the needed knowledge already exists, published in numerous papers and books, and demonstrated by several pieces of commercial software. One doesn't need to come with advanced knowledge of the topic, any more than I had to be a spell-checking expert before I started Hspell. All one needs is a willingness to learn and, of course, the resourcefulness to put it to good use.

Second, while the work on Hspell had many very interesting theoretical sides and problems to solve (in linguistics, language, compression, etc.), most of the work was actually the mundane and almost endless task of making lists of words (a task which, as you can see, still isn't done 10 years after starting the project). For niqqud checking, there is also a lot of similar mundane work to be done (writing the right niqqud for each word), and that takes a lot of time.

For grammar checking, it depends on what you call grammar. If you also want to include semantics, and not just grammar, as Prof. Uzzi Ornan did in his text-to-speech and niqqud research (and product), there's also tons of work to be done on creating classes of nouns, listing the arguments of verbs, and so on. I guess you can start with just grammar, though, and in that case you're right: it should be doable without too much data collection, so maybe this is indeed a good project to start with.

This is all very interesting work. Unfortunately, I do not see myself starting it in the near future. If anyone is interested in taking a shot at it, I'd love to advise; please contact me and/or Dan privately.

Nadav.
--
Nadav Har'El | Thursday, Dec 31 2009, 14 Tevet 5770
n...@math.technion.ac.il | Phone +972-523-790466, ICQ 13349191
http://nadav.harel.org.il | I couldn't afford a cool signature, so I just got this one.
Re: Announce: Hspell 1.1
On Thu, Dec 31, 2009 at 03:31:59PM +0200, Ely Levy wrote:
> I think the main problem is knowing what needs to be done, not the manpower to program it. If someone knows what rules grammar or nikud checkers should follow, I'm sure it won't be a big deal to program one.

For grammar you might be right. However, the case for niqqud is very different. The rules for niqqud are very clear and strict (in most cases). Applying them to all verb and noun inflections is mostly hard work.
Re: Announce: Hspell 1.1
2009/12/31 Nadav Har'El n...@math.technion.ac.il:
> We are proud to present version 1.1 of Hspell, the free Hebrew spell-checker and morphological analyzer.

Congratulations! I was under the impression that Hspell development had pretty much stalled indefinitely. This release is great news.

I have a long word list that can be added to Hspell, mostly words such as Ubuntu, Linux, and such. I do not speak English/Russian/Greek as Hebrew and certainly would not add those words to a dictionary.

--
Dotan Cohen
http://what-is-what.com
http://gibberish.co.il
Re: Announce: Hspell 1.1
I can only talk from my own experience: I couldn't find any good source for rules about nikud and grammar in a simple form. I did find a GPLed word list with nikud, and I think I even talked to the people at MILA. But no one could provide those few rules you are talking about. (And I'm still confused about the difference between old and modern grammar/nikud...)

Ely

On Thu, Dec 31, 2009 at 4:11 PM, Nadav Har'El n...@math.technion.ac.il wrote:
> I beg to differ. First of all, most of the needed knowledge already exists, published in numerous papers and books, and demonstrated by several pieces of commercial software.