Re: Help with unmunch and Icelandic + Galician

2014-10-27 Thread Marco A.G.Pinto

Hello!

The "Tags" are the extra information, which I have only found in the pt_PT
dictionary, telling whether each word is masculine, feminine, singular,
plural, etc.


As for the rules, they are here:


For each wrong result you get in the extracted list, you can check here 
if it is a rule or a tool bug.


Thanks!

Kind regards,
   >Marco A.G.Pinto
 --



On 27/10/2014 20:46, R.J. Baars wrote:

The first thing I notice is that flags and word are not separated on the
screen.

I added a picture to show that.

When I click edit, it is the same. The / is apparently not seen as a flag
indicator in the dictionary.
In the dic, you can find flags after the / , comments after # and extra
data after a tab (like yy:xxx, which can be ignored for spelling purposes;
it is word-type info, a bit like postags)

The data that appears after clicking the edit is unclear to me; I don't
understand what it is supposed to mean, even though I have quite good
knowledge of the Hunspell format and options.

Ruud



--
--
___
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel


Re: Help with unmunch and Icelandic + Galician

2014-10-27 Thread R.J. Baars
The first thing I notice is that flags and word are not separated on the
screen.

I added a picture to show that.

When I click edit, it is the same. The / is apparently not seen as a flag
indicator in the dictionary.
In the .dic file, you can find flags after the /, comments after #, and
extra data after a tab (like yy:xxx, which can be ignored for spelling
purposes; it is word-type information, a bit like POS tags).
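
As a rough illustration of that layout, the bare words could be pulled out of a .dic file along these lines. This is only a sketch: the function name dic_words is made up here, and it assumes that the first line is the entry count, that comments start at a # character, and that words contain no escaped slashes.

```shell
# dic_words: read .dic entries on stdin, print only the bare word of each.
# Hypothetical helper, not part of Hunspell or of Proofing Tool GUI.
dic_words() {
  tab=$(printf '\t')
  # 1) drop the leading entry count, 2) drop tab-separated extra data,
  # 3) drop # comments, 4) drop /flags, 5) drop lines that ended up empty
  sed -e '1{/^[0-9][0-9]*$/d;}' \
      -e "s/${tab}.*\$//" \
      -e 's/[[:space:]]*#.*$//' \
      -e 's#/.*$##' \
      -e '/^$/d'
}

# Typical use (is_IS.dic is the file name mentioned in this thread):
#   dic_words < is_IS.dic > is_IS.words
```

The order of the expressions matters: the tab-separated data is removed before the flags, so a slash inside the extra data cannot be mistaken for the flag separator.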

The data that appears after clicking the edit is unclear to me; I don't
understand what it is supposed to mean, even though I have quite good
knowledge of the Hunspell format and options.

Ruud

> Dear Ruud,
>
> To see if it is a tool bug or a rule bug, just edit the word(s) in the
> "Dictionary" tab of my tool and it will show a tab containing each rule
> that generates the derivates.
>
> You can edit the words with a double-click or with right-click+EDIT.
>
> :-P
>
> I am feeling so eager!
>
> PS->You must use V3.0 (beta) of Proofing Tool GUI since V2.x is
> outdated. I wanted to
>  release an official 3.0 but I want to make several improvements
> to it and I will
>  only have free time in January.
>
> Thanks!
>
> Kind regards from your friend,
>   >Marco A.G.Pinto
> ---
>
> On 27/10/2014 20:02, R.J. Baars wrote:
>> In the output of the tool are also unmunch errors.
>>
>> Ab0 as the derivative if Abel e.g.
>>
>> After exporting and processing into a words list, out of the 2.7 Mb, 2.3
>> Mb was accepted as a correct word by the same spellchecker.
>>
>> So the 'bag of trics' might still be useful after unmunching using this
>> tool, to collect suggestions and add them to the list, for another round
>> of spell checking. (Hunspell suggests words that are not accepted by
>> it...)
>>
>> Ruud
>>
>


Re: Help with unmunch and Icelandic + Galician

2014-10-27 Thread Marco A.G.Pinto

Dear Ruud,

To see whether it is a tool bug or a rule bug, just edit the word(s) in the
"Dictionary" tab of my tool and it will show a tab containing each rule
that generates the derivatives.


You can edit the words with a double-click or with right-click+EDIT.

:-P

I am feeling so eager!

PS->You must use V3.0 (beta) of Proofing Tool GUI since V2.x is
outdated. I wanted to release an official 3.0, but I want to make
several improvements to it first and I will only have free time in
January.

Thanks!

Kind regards from your friend,
 >Marco A.G.Pinto
   ---

On 27/10/2014 20:02, R.J. Baars wrote:

In the output of the tool are also unmunch errors.

Ab0 as the derivative if Abel e.g.

After exporting and processing into a words list, out of the 2.7 Mb, 2.3
Mb was accepted as a correct word by the same spellchecker.

So the 'bag of trics' might still be useful after unmunching using this
tool, to collect suggestions and add them to the list, for another round
of spell checking. (Hunspell suggests words that are not accepted by
it...)

Ruud





Re: Help with unmunch and Icelandic + Galician

2014-10-27 Thread R.J. Baars
The output of the tool also contains unmunch errors.

For example, Ab0 as a derivative of Abel.

After exporting and processing the output into a word list, 2.3 MB out of
the 2.7 MB was accepted as correct by the same spellchecker.

So the 'bag of tricks' might still be useful after unmunching with this
tool, to collect suggestions and add them to the list for another round
of spell checking. (Hunspell suggests words that it does not accept
itself...)
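
To make the suggestion-collecting step concrete, here is a minimal sketch of turning the pipe-mode output of hunspell -a into a plain word list. It assumes the ispell-style report format, in which a misspelled word with suggestions is printed as "& word count offset: sugg1, sugg2"; the function name and the file names in the usage comment are made up for illustration.

```shell
# suggestions_to_words: read `hunspell -a` output on stdin and print every
# suggested word on its own line, sorted and without duplicates.
suggestions_to_words() {
  grep '^&' |                 # keep only the lines that carry suggestions
    sed 's/^&[^:]*: //' |     # drop the "& word count offset: " prefix
    tr ',' '\n' |             # one suggestion per line
    sed 's/^ //' |            # drop the space that followed each comma
    sort -u
}

# Typical use (assumes hunspell and an is_IS dictionary are installed):
#   hunspell -i utf-8 -d is_IS -a words.txt | suggestions_to_words > more.txt
```

The resulting list still has to go through the spellchecker again, since not every suggestion is accepted as a word.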

Ruud







Re: Help with unmunch and Icelandic + Galician

2014-10-27 Thread R.J. Baars
The tool seems to work.

I will check whether it is better than the bag of tricks. It looks very
promising, though it requires further processing.

Ruud


> You have to use V3.0 build 64. From the menu "Dictionary Tools", choose
> "Extract wordlist". It worked for me.
>
> Am 27.10.2014 16:38, schrieb Daniel Naber:
>> On 2014-10-27 13:48, Marco A.G.Pinto wrote:
>>
>>>  To unmunch .DIC + .AFF use my tool, Proofing Tool GUI:
>>>  http://marcoagpinto.cidadevirtual.pt/proofingtoolgui.html [3]
>>
>> How does it work? I couldn't find an "unmunch" menu item or similar.
>> What does it do differently to unmunch command line program?
>>
>> Regards
>>   Daniel
>>
>>


Re: Help with unmunch and Icelandic + Galician

2014-10-27 Thread Jan Schreiber
You have to use V3.0 build 64. From the menu "Dictionary Tools", choose
"Extract wordlist". It worked for me.

Am 27.10.2014 16:38, schrieb Daniel Naber:
> On 2014-10-27 13:48, Marco A.G.Pinto wrote:
> 
>>  To unmunch .DIC + .AFF use my tool, Proofing Tool GUI:
>>  http://marcoagpinto.cidadevirtual.pt/proofingtoolgui.html [3]
> 
> How does it work? I couldn't find an "unmunch" menu item or similar. 
> What does it do differently to unmunch command line program?
> 
> Regards
>   Daniel
> 
> 


Re: Help with unmunch and Icelandic + Galician

2014-10-27 Thread R.J. Baars
Below is the full bag of tricks:

#!/bin/bash
# Usage: pass the name of the Hunspell dictionary without its extension.
if [ -z "$1" ] ; then
  echo "ENTER THE NAME OF THE DICTIONARY FILE WITHOUT .DIC AS A PARAMETER"
else
  if [ -f "$1.dic" ] ; then
    if [ -f "$1.aff" ] ; then
      # DICT rather than LANG: assigning LANG would also change the
      # locale seen by sort and hunspell
      DICT=$1
      # try to unmunch
      echo "UNMUNCHING"
      unmunch "$DICT.dic" "$DICT.aff" | sed "s/\/.*$//" > "$DICT.txt"
      # use dictionary as input anyway, removing flags
      echo "MAKING DICTIONARY WITHOUT FLAGS"
      sed "s/\/.*$//" "$DICT.dic" > "$DICT.tmp"
      # add a small modifier to the dic words to generate more alternatives
      echo "ADDING MODIFIED DICTIONARY ITEMS FOR MORE SUGGESTIONS"
      cat "$DICT.tmp" >> "$DICT.txt"
      sed "s/.$/x/" "$DICT.tmp" >> "$DICT.txt"
      rm "$DICT.tmp"
      # add random words (languages that look a bit like it, or just noise)
      echo "ADDING OTHER WORDS IF AVAILABLE"
      if [ -f other.txt ] ; then
        cat other.txt >> "$DICT.txt"
      else
        echo "  no other.txt present; add one (from similar language) for better results"
      fi
      # sorting and getting unique
      echo "MAKING WORDS IN LIST UNIQUE"
      sort -u "$DICT.txt" > "$DICT.in"
      rm "$DICT.txt"
      # use the input to generate suggestions
      echo "USING HUNSPELL TO GET SUGGESTIONS (TERRIBLY SLOW!)"
      hunspell -i utf-8 -d "$DICT" -a "$DICT.in" | grep "^&" > "$DICT.suggestions"
      # edit the suggestions into words (\n in the replacement needs GNU sed)
      echo "ADD SUGGESTIONS TO WORDS LIST"
      sed "s/^&.*: //" "$DICT.suggestions" | sed "s/, /\n/g" | sort -u >> "$DICT.in"
      rm "$DICT.suggestions"
      # get all correct words
      echo "SPELLCHECK ALL WORDS TO GET CORRECT WORDS"
      hunspell -i utf-8 -G -d "$DICT" "$DICT.in" > "$DICT.okay"
    else
      echo "$1.aff IS MISSING"
    fi
  else
    echo "$1.dic IS MISSING"
  fi
fi






Re: Help with unmunch and Icelandic + Galician

2014-10-27 Thread R.J. Baars
Apart from the trick I am applying now, a good option for more valid
output could be to use the words from Wikipedia and Tatoeba as extra
input, if the language is in those databases.
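
A minimal sketch of how such text could be reduced to candidate words before it is offered to the spellchecker. The function name and the file names are hypothetical, and splitting on ASCII whitespace, punctuation and digits is an assumption that will also split words containing an apostrophe or hyphen.

```shell
# text_to_words: read running text on stdin, print unique candidate words.
# The C locale makes tr and sort work on bytes, so the multi-byte UTF-8
# letters pass through untouched and only ASCII separators are split on.
text_to_words() {
  LC_ALL=C tr -s '[:space:][:punct:][:digit:]' '\n' |
    sed '/^$/d' |
    LC_ALL=C sort -u
}

# Typical use (dump.txt stands for any plain-text export; DICT is the
# dictionary name without extension):
#   text_to_words < dump.txt > candidates.txt
#   hunspell -i utf-8 -d "$DICT" -G candidates.txt > accepted.txt
```

Only the words that survive the hunspell -G pass would be added, so the extra input cannot introduce anything the spellchecker itself rejects.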

Galician grew to > 3 MB fast enough when Spanish and Portuguese were used
as input. These words could also be taken from an unmunch of those
languages or from the LT dictionaries.

Use whatever is acceptable as a source for creating the list.

Ruud






Re: Help with unmunch and Icelandic + Galician

2014-10-27 Thread Daniel Naber
On 2014-10-27 13:48, Marco A.G.Pinto wrote:

>  To unmunch .DIC + .AFF use my tool, Proofing Tool GUI:
>  http://marcoagpinto.cidadevirtual.pt/proofingtoolgui.html [3]

How does it work? I couldn't find an "unmunch" menu item or similar.
What does it do differently from the unmunch command-line program?

Regards
  Daniel




Re: Help with unmunch and Icelandic + Galician

2014-10-27 Thread Marco A.G.Pinto

Daniel and friends,

To unmunch .DIC + .AFF use my tool, Proofing Tool GUI:
http://marcoagpinto.cidadevirtual.pt/proofingtoolgui.html

But please note that the files must be in UTF-8 and not obfuscated.

Thanks!

Kind regards,
 >Marco A.G.Pinto
   --


On 27/10/2014 08:50, Daniel Naber wrote:

Hi,

I tried to switch Icelandic and Galician to hunspell (as documented at
http://wiki.languagetool.org/hunspell-support#toc3), but I ran into
problems:

For Icelandic, words like 'virkar' and 'texta' do not get recognized,
simply because hunspell's unmunch doesn't create them. Does anybody have
an idea why that might be? In other words, how can I get a complete list
of Icelandic words from is_IS.aff and is_IS.dic?

For Galician, unmunch returns entries like "construíu/102,103,104|".
This seems to be caused by "recursive" definitions like "SFX 232 oñer
ón/104 poñer", where a suffix is not simply replaced by another suffix,
but by a suffix plus another tag. Can anybody confirm that? Is there a
workaround?

Any help is welcome.

Regards
   Daniel




Re: Help with unmunch and Icelandic + Galician

2014-10-27 Thread R.J. Baars
If you don't want the words from my own list added, I will leave them out.
No issue. But it will mean, since the source is not unmunchable, you might
be missing quite common Icelandic words, because the other tricks did not
generate them.

But it is already running, without any input other than the Hunspell
files, for both Icelandic and Galician.

It will take time, however, since generating suggestions is slow.

I created a simple Bash script to do the entire process as well.

If that one generates a workable list, I will supply that as well.

Ruud

> On 2014-10-27 11:37, R.J. Baars wrote:
>
>> That is what these trick do. There is no word added that is not
>> accepted
>> by the spellchecker.
>
> I understand that, I'd also like to understand where 'virkar' and
> 'texta' come from: from your unmunch output or from the step you call
> "Then I added my own collection of icelandic words".
>
> Regards
>   Daniel
>
>


Re: Help with unmunch and Icelandic + Galician

2014-10-27 Thread Daniel Naber
On 2014-10-27 11:37, R.J. Baars wrote:

> That is what these trick do. There is no word added that is not 
> accepted
> by the spellchecker.

I understand that, I'd also like to understand where 'virkar' and 
'texta' come from: from your unmunch output or from the step you call 
"Then I added my own collection of icelandic words".

Regards
  Daniel




Re: Help with unmunch and Icelandic + Galician

2014-10-27 Thread R.J. Baars
Anyway, the words you wanted checked were in the dictionary before unmunch.

Ruud


> There is no list to go from, so how should I know? If htere was such a
> list, there was no need to use unmunch, right?
>
> Doing an unmunch, you add lots of words to the dictionary, being all
> derivatives. That is what makes them that big.
>
> When the source list is not there, the only thing you can do is what a
> user does, use the spell checker as a correctness filter.
>
> That is what these trick do. There is no word added that is not accepted
> by the spellchecker.
>
> I think you will not have to maintain those lists at all. You could just
> try to get the sources if they are still being maintained. If it is no
> longer maintained, a new maintainer will have a good start by having a
> words list and word frequencies, not just Hunspell codings.
>
> Ruud
>
>
>> On 2014-10-27 10:53, R.J. Baars wrote:
>>
>>> I first changed it into utf-8;
>>> I removed the po: flags
>>> I changed the tab chars into spaces
>>> Then I unmunched.
>>> I used sed to remove the trailing flags, which are created, as well as
>>> trailing numbers
>>> Then I added my own collection of icelandic words
>>
>> Had 'virkar' and 'texta' been in the list before you added your words?
>> The thing is that I don't want to maintain a dictionary, this is really
>> the task of the dictionary maintainers (outside LT). So adding words
>> leads to problems, as it creates a dictionary that we cannot easily
>> update anymore when the original dictionary gets an update.
>>
>> The only way I "extend" the German dictionary is by adding words to a
>> separate file ("ignore.txt"), and these are supposed to become part of
>> the original dictionary in the long term.
>>
>> Regards
>>   Daniel
>>
>>


Re: Help with unmunch and Icelandic + Galician

2014-10-27 Thread R.J. Baars
There is no list to go from, so how should I know? If there were such a
list, there would be no need to use unmunch, right?

When you unmunch, you add lots of words to the dictionary, namely all the
derivatives. That is what makes the lists that big.

When the source list is not there, the only thing you can do is what a
user does: use the spell checker as a correctness filter.

That is what these tricks do. No word is added that is not accepted
by the spellchecker.

I think you will not have to maintain those lists at all. You could just
try to get the sources if they are still being maintained. If a
dictionary is no longer maintained, a new maintainer will have a good
start by having a word list and word frequencies, not just Hunspell
codings.

Ruud


> On 2014-10-27 10:53, R.J. Baars wrote:
>
>> I first changed it into utf-8;
>> I removed the po: flags
>> I changed the tab chars into spaces
>> Then I unmunched.
>> I used sed to remove the trailing flags, which are created, as well as
>> trailing numbers
>> Then I added my own collection of icelandic words
>
> Had 'virkar' and 'texta' been in the list before you added your words?
> The thing is that I don't want to maintain a dictionary, this is really
> the task of the dictionary maintainers (outside LT). So adding words
> leads to problems, as it creates a dictionary that we cannot easily
> update anymore when the original dictionary gets an update.
>
> The only way I "extend" the German dictionary is by adding words to a
> separate file ("ignore.txt"), and these are supposed to become part of
> the original dictionary in the long term.
>
> Regards
>   Daniel
>
>


Re: Help with unmunch and Icelandic + Galician

2014-10-27 Thread Daniel Naber
On 2014-10-27 10:53, R.J. Baars wrote:

> I first changed it into utf-8;
> I removed the po: flags
> I changed the tab chars into spaces
> Then I unmunched.
> I used sed to remove the trailing flags, which are created, as well as
> trailing numbers
> Then I added my own collection of icelandic words

Had 'virkar' and 'texta' been in the list before you added your words? 
The thing is that I don't want to maintain a dictionary, this is really 
the task of the dictionary maintainers (outside LT). So adding words 
leads to problems, as it creates a dictionary that we cannot easily 
update anymore when the original dictionary gets an update.

The only way I "extend" the German dictionary is by adding words to a 
separate file ("ignore.txt"), and these are supposed to become part of 
the original dictionary in the long term.

Regards
  Daniel




Re: Help with unmunch and Icelandic + Galician

2014-10-27 Thread R.J. Baars
Galician will be doable as well.
It accepts a lot of Spanish and Portuguese words (already 4 MB). Add the
suggestions to it, and it will be a workable list.

My computer will be doing that for the next few days (generating
suggestions is slow).

By the way, would it not be a good idea to have the full dictionaries
editable online, and generate the word lists from that database?

Ruud


> On 2014-10-27 10:26, R.J. Baars wrote:
>
>> I was able to make a file though. It is 3 Mb uncompressed.
>>
>> You can download it from dev.taaltik.nl/is.okay.zip
>
> Thanks, what was the exact command you used to create this list?
>
> Regards
>   Daniel
>
>


Re: Help with unmunch and Icelandic + Galician

2014-10-27 Thread R.J. Baars
> On 2014-10-27 10:26, R.J. Baars wrote:
>
>> I was able to make a file though. It is 3 Mb uncompressed.
>>
>> You can download it from dev.taaltik.nl/is.okay.zip
>
> Thanks, what was the exact command you used to create this list?

Multiple. And manual editing.

I first converted it to UTF-8;
I removed the po: flags
I changed the tab chars into spaces
Then I unmunched.
I used sed to remove the trailing flags that unmunch creates, as well as
trailing numbers
Then I added my own collection of Icelandic words
And finally I used hunspell -G to generate an accepted list from it.
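
The two text-cleaning steps in that list could look roughly like the sketch below. These are not the exact commands that were used (part of the work was manual editing); the function names are made up, and the source encoding in the usage comment is an assumption that should be checked against the SET line of the .aff file.

```shell
# strip_po: remove "po:..." fields from .dic lines and turn the remaining
# tabs into spaces (to be applied before unmunch).
strip_po() {
  tab=$(printf '\t')
  sed -e "s/[ ${tab}]*po:[^ ${tab}]*//g" \
      -e "s/${tab}/ /g" \
      -e 's/ *$//'
}

# strip_trailing: remove trailing /flags and trailing digits from unmunch
# output, then drop lines that ended up empty.
strip_trailing() {
  sed -e 's#/.*$##' -e 's/[0-9][0-9]*$//' -e '/^$/d'
}

# Typical use (ISO-8859-1 is only an assumed source encoding):
#   iconv -f ISO-8859-1 -t UTF-8 is_IS.dic | strip_po > is.dic
#   unmunch is.dic is.aff | strip_trailing > is.txt
#   hunspell -i utf-8 -d is -G is.txt > is.okay
```

Stripping digits turns a junk form such as Ab0 into Ab, which is harmless only because the final hunspell -G pass discards anything the dictionary does not accept.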

There is another trick to enhance it a bit more; You could throw all
collected words of all languages through Hunspell using Icelandic, and
catch the suggestions to add those to the list again.

I did not do that.


I had a go at Galician, but that one is even worse. I am quite sure it is
not as good either, since I saw quite a few Dutch proper names in it.

Nevertheless, I can generate something for Galician using Spanish and
Portuguese as input, catching the suggestions and using them as words.
It is the best I can do with this set...



Ruud







>
> Regards
>   Daniel
>
>


Re: Help with unmunch and Icelandic + Galician

2014-10-27 Thread Daniel Naber
On 2014-10-27 10:26, R.J. Baars wrote:

> I was able to make a file though. It is 3 Mb uncompressed.
> 
> You can download it from dev.taaltik.nl/is.okay.zip

Thanks, what was the exact command you used to create this list?

Regards
  Daniel




Re: Help with unmunch and Icelandic + Galician

2014-10-27 Thread R.J. Baars

Icelandic really creates a lot of junk with unmunch, even after removing
some newer attributes from the .dic file.
It looks like unmunch cannot handle the numeric flags either.

I was able to make a file though. It is 3 Mb uncompressed.

You can download it from dev.taaltik.nl/is.okay.zip

Ruud

> Unmunch does not support the newer functionalities of Hunspell. It might
> generate rubbish even.
>
> There are ways to do this, more or less.
>
> Generating the list using unmunch is still an option, even when it
> generates rubbish. Add a list of found Icelandic words to that list.
> The use hunspell with the -G to generate a list of correct words from
> this.
>
> (Even then, it might be usefull to use the Hunspell with -L on it too; you
> might see it rejects a few words. Unexpected, but it does.)
>
> I could do this for you, done it before ..
> Ruud
>
>> Hi,
>>
>> I tried to switch Icelandic and Galician to hunspell (as documented at
>> http://wiki.languagetool.org/hunspell-support#toc3), but I ran into
>> problems:
>>
>> For Icelandic, words like 'virkar' and 'texta' do not get recognized,
>> simply because hunspell's unmunch doesn't create them. Does anybody have
>> an idea why that might be? In other words, how can I get a complete list
>> of Icelandic words from is_IS.aff and is_IS.dic?
>>
>> For Galician, unmunch returns entries like "construíu/102,103,104|".
>> This seems to be caused by "recursive" definitions like "SFX 232 oñer
>> ón/104 poñer", where a suffix is not simply replaced by another
>> suffix,
>> but by a suffix plus another tag. Can anybody confirm that? Is there a
>> workaround?
>>
>> Any help is welcome.
>>
>> Regards
>>   Daniel
>>
>>


Re: Help with unmunch and Icelandic + Galician

2014-10-27 Thread R.J. Baars
Unmunch does not support the newer features of Hunspell. It might
even generate rubbish.

There are ways to do this, more or less.

Generating the list using unmunch is still an option, even when it
generates rubbish. Add a list of found Icelandic words to that list.
Then use hunspell with -G to generate a list of correct words from this.

(Even then, it might be useful to run Hunspell with -L on it too; you
might see it rejects a few words. Unexpected, but it does.)
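
To see which candidates were filtered out, the list that went in can simply be compared with the list that came out. A small sketch, with a made-up function name and file names; it only compares two word lists and does not call Hunspell itself.

```shell
# rejected_words: print the lines of the candidate list ($1) that do not
# occur as whole lines in the accepted list ($2).
rejected_words() {
  grep -vxF -f "$2" "$1"
}

# Typical use (file names are placeholders):
#   hunspell -i utf-8 -d is_IS -G candidates.txt > accepted.txt
#   rejected_words candidates.txt accepted.txt > rejected.txt
```

With -F the accepted words are treated as fixed strings and with -x only whole-line matches count, so a short accepted word does not mask a longer rejected one.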

I could do this for you; I have done it before.
Ruud

> Hi,
>
> I tried to switch Icelandic and Galician to hunspell (as documented at
> http://wiki.languagetool.org/hunspell-support#toc3), but I ran into
> problems:
>
> For Icelandic, words like 'virkar' and 'texta' do not get recognized,
> simply because hunspell's unmunch doesn't create them. Does anybody have
> an idea why that might be? In other words, how can I get a complete list
> of Icelandic words from is_IS.aff and is_IS.dic?
>
> For Galician, unmunch returns entries like "construíu/102,103,104|".
> This seems to be caused by "recursive" definitions like "SFX 232 oñer
> ón/104 poñer", where a suffix is not simply replaced by another suffix,
> but by a suffix plus another tag. Can anybody confirm that? Is there a
> workaround?
>
> Any help is welcome.
>
> Regards
>   Daniel
>
>


Help with unmunch and Icelandic + Galician

2014-10-27 Thread Daniel Naber
Hi,

I tried to switch Icelandic and Galician to hunspell (as documented at 
http://wiki.languagetool.org/hunspell-support#toc3), but I ran into 
problems:

For Icelandic, words like 'virkar' and 'texta' do not get recognized, 
simply because hunspell's unmunch doesn't create them. Does anybody have 
an idea why that might be? In other words, how can I get a complete list 
of Icelandic words from is_IS.aff and is_IS.dic?

For Galician, unmunch returns entries like "construíu/102,103,104|". 
This seems to be caused by "recursive" definitions like "SFX 232 oñer 
ón/104 poñer", where a suffix is not simply replaced by another suffix, 
but by a suffix plus another tag. Can anybody confirm that? Is there a 
workaround?
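
As far as the affix format goes, the quoted rule reads as: strip "oñer", append "ón", give the result flag 104, and apply this only to words ending in "poñer". The Hunspell documentation calls a flag attached to an affix a continuation class. To find out which flags are left behind in the unmunch output, something like the following sketch could be used; the function name and the file name are hypothetical, and the trailing "|" is handled only because it appears in the example above.

```shell
# leftover_flags: read unmunch output on stdin and print every flag that is
# still attached to a word, with the number of entries carrying it
# (format: flag<TAB>count, sorted by flag).
leftover_flags() {
  grep '/' |                              # entries that still have flags
    sed -e 's#^[^/]*/##' -e 's/|$//' |    # keep only the flag list
    tr ',' '\n' |                         # one flag per line
    LC_ALL=C sort |
    uniq -c |
    awk '{ print $2 "\t" $1 }'
}

# Typical use (gl.txt stands for the saved unmunch output):
#   leftover_flags < gl.txt
```

If the same few flags dominate the count, those are the suffix classes to look up in the .aff file first.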

Any help is welcome.

Regards
  Daniel

