Re: Hunspell unmunching question

2014-12-24 Thread Andrea Pescetti

On 04/12/2014 Marco A.G.Pinto wrote:

What this means is that I probably need to change the code of my tool,
maybe create three arrays:
1st - to store the words with suffixes
2nd - to store the codes of the prefixes
3rd - to store 1st plus all its combinations with the prefixes (it would
apply prefixes to 1st and store them in 3rd )


Displaying all combinations would be highly impractical, since the list 
would indeed explode. Maybe you could rearrange the GUI so that it 
displays something like "unsubscribe (and derivatives)" when it handles 
prefixes.
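
A minimal sketch of that kind of condensed display, in Python rather than
in the tool's own language. The function name, its parameters and the
exact label wording are hypothetical, chosen only for illustration:

```python
def summarise(stem, prefixes, suffixed_forms):
    """Return one display line per prefix instead of every combination.

    suffixed_forms is assumed to hold the bare stem first, followed by
    its suffixed variants.
    """
    derivatives = len(suffixed_forms) - 1  # everything except the bare stem
    return [f"{prefix}{stem} (and {derivatives} derivatives)"
            for prefix in prefixes]


forms = ["subscribe", "subscribing", "subscribed", "subscribes"]
for line in summarise("subscribe", ["re", "over", "un"], forms):
    print(line)
```

This keeps the display at one line per prefix, however many suffixes the
entry carries.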


Regards,
  Andrea.




Hunspell unmunching question

2014-12-04 Thread Marco A.G.Pinto

Hello!

Around a week ago, Peter from England sent me an e-mail suggesting new 
words to be added to en_GB.


One of them was unsubscribe.

Here is what appears in Proofing Tool GUI:


The strange thing is that I tried the variants in Mozilla and OpenOffice 
and none of them was marked as a typo.


I started thinking about it and wondered whether, in Hunspell, the 
prefixes attach themselves to every suffixed form as well.


Today I made a test, please see the archive: 
https://dl.dropboxusercontent.com/u/30674540/hunspell_issue_marcoagpinto_20141204.zip
It contains the extracted wordlists both in PTG and Unmunch and also the 
.DIC + .AFF I created for the tests.


In my PTG 3.0 build 67 I get:

subscribe
resubscribe
subscribing
oversubscribe
subscribes
subscribed
unsubscribe
000
subscribe
unsubscribe
resubscribe
subscribing
oversubscribe
subscribes
subscribed

In Unmunch for Linux I got:

subscribe
subscribing
subscribed
subscribes
resubscribing
oversubscribing
unsubscribing
resubscribed
oversubscribed
unsubscribed
resubscribes
oversubscribes
unsubscribes
resubscribe
oversubscribe
unsubscribe
000
subscribe
subscribing
subscribed
subscribes
resubscribing
oversubscribing
unsubscribing
resubscribed
oversubscribed
unsubscribed
resubscribes
oversubscribes
unsubscribes
resubscribe
oversubscribe
unsubscribe

I placed a 000 to divide the same word with an exchanged order of the 
code U to make sure it would produce the same results, no matter its 
position.


What this means is that I probably need to change the code of my tool, 
maybe create three arrays:

1st - to store the words with suffixes
2nd - to store the codes of the prefixes
3rd - to store 1st plus all its combinations with the prefixes (it would 
apply prefixes to 1st and store them in 3rd )
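
The three-array idea can be sketched roughly as follows, in Python rather
than in PTG's own language. The affix tables are hypothetical and heavily
simplified (no conditions, and every rule is assumed to allow cross
product); they are not the real en_GB .aff rules, and the flag letters are
only illustrative:

```python
# Hypothetical, simplified affix tables; NOT the real en_GB .aff rules.
# Suffix rule: (ending to strip, ending to add). Prefix rule: text to prepend.
SUFFIXES = {"G": ("e", "ing"), "D": ("e", "ed"), "S": ("", "s")}
PREFIXES = {"A": "re", "O": "over", "U": "un"}


def expand(stem, flags):
    """Expand one .dic entry, assuming every rule allows cross product."""
    # 1st array: the stem and its suffixed forms.
    suffixed = [stem]
    for flag in flags:
        if flag in SUFFIXES:
            strip, add = SUFFIXES[flag]
            if stem.endswith(strip):
                suffixed.append(stem[: len(stem) - len(strip)] + add)

    # 2nd array: the prefix codes found on the entry.
    prefix_codes = [flag for flag in flags if flag in PREFIXES]

    # 3rd array: the 1st array plus every prefix applied to each of its words.
    combined = list(suffixed)
    for code in prefix_codes:
        combined.extend(PREFIXES[code] + word for word in suffixed)
    return combined


words = expand("subscribe", "AOUGDS")
print(len(words))  # 4 base forms x (1 + 3 prefixes) = 16
```

With these assumed tables the result contains the same 16 forms that
Unmunch listed, and moving the U code elsewhere in the flag string changes
only the order of the output, not its contents.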


Then, should I display the prefixed forms at the bottom in PTG, 
regardless of the order of the codes?


What this also means is that there are hundreds of combinations that do 
not appear in the wordlist I always publish as .txt in the project's 
GitHub repository, but that are accepted by Hunspell in Mozilla 
(Firefox, Thunderbird and SeaMonkey) and Apache OpenOffice.
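
To see which combinations are missing, the two extracted lists can be
compared as sets. This is only a sketch in Python using the words from the
test above; the function name is hypothetical:

```python
def missing_from(tool_words, reference_words):
    """Return the words present in reference_words but absent from tool_words."""
    return sorted(set(reference_words) - set(tool_words))


# Unique words from the PTG extraction in the test above.
ptg_words = [
    "subscribe", "resubscribe", "subscribing", "oversubscribe",
    "subscribes", "subscribed", "unsubscribe",
]

# Unique words from the Unmunch extraction in the test above.
unmunch_words = [
    "subscribe", "subscribing", "subscribed", "subscribes",
    "resubscribing", "oversubscribing", "unsubscribing",
    "resubscribed", "oversubscribed", "unsubscribed",
    "resubscribes", "oversubscribes", "unsubscribes",
    "resubscribe", "oversubscribe", "unsubscribe",
]

missing = missing_from(ptg_words, unmunch_words)
print(len(missing))  # 9 prefixed-and-suffixed forms
print(missing)
```

For this single entry the difference is the nine forms that carry both a
prefix and a suffix, such as unsubscribed and resubscribing.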


Thanks for your time!

Kind regards,
  Marco A.G.Pinto