Re: [libreoffice-l10n] English Dictionaries Project - Introduction by Marco Pinto

2016-05-05 Thread Kruno

On 04.05.2016 at 22:43, toki wrote:



On 03/05/2016 19:14, Kruno wrote:

And there is now way any language smaller then English build 
something like outside some institute or outside funded

  project of some sort.

Please either rephrase that sentence, or write it in your native 
language.


I'm trying to figure out if you mean that English is the only language 
in which an N-Gram based grammar checker can be created, or if that is 
the only language for which adequate funding for such a critter can be 
found.



I think it would have made more sense for you to comment on my first 
mail; that's where I could (and would) rephrase.






For small languages even having a spell checker is huge. There's quite a


When working with evidential grammars, or noun class grammars, spell 
checking falls apart, because the entire word is rewritten according to 
the evidential particle, or noun class.


jonathon



How does that reflect on the situation with dictionaries in LO?

No grammar checker or spelling checker can fix dadaism. Nor can they 
think for the user. The only thing you can do is fix typical grammar or 
spelling mistakes, with the spelling checker and grammar checker working 
together.


Without a grammar checker you can still catch typos. So having a spelling 
checker is still better than not having one.


I really didn't want to start this kind of discussion, but can't 
those languages still build word lists (maybe with an affix file), and 
isn't that better than not having that option at all?
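
A minimal sketch of the "word list plus affix file" idea mentioned above. 
The `stem/FLAGS` entry form follows the Hunspell .dic convention, but the 
rule table, the flag names `S` and `D`, and the helper names are 
hypothetical and heavily simplified; this is not the real Hunspell parser.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical suffix rules: flag -> list of (strip, add, condition regex).
# A rule applies only when the stem matches its condition.
RULES: Dict[str, List[Tuple[str, str, str]]] = {
    "S": [("y", "ies", r"[^aeiou]y$"), ("", "s", r"[^sxzhy]$")],
    "D": [("", "ed", r"[^ey]$"), ("", "d", r"e$")],
}


def expand(entry: str) -> Set[str]:
    """Expand one 'stem/FLAGS' entry into the stem plus its affixed forms."""
    stem, _, flags = entry.partition("/")
    forms = {stem}
    for flag in flags:
        for strip, add, cond in RULES.get(flag, []):
            if re.search(cond, stem):
                base = stem[: len(stem) - len(strip)] if strip else stem
                forms.add(base + add)
    return forms


def build_lexicon(entries: Iterable[str]) -> Set[str]:
    """Collect every accepted surface form from a flagged word list."""
    lexicon: Set[str] = set()
    for entry in entries:
        lexicon |= expand(entry)
    return lexicon


LEXICON = build_lexicon(["city/S", "walk/SD", "note/SD"])


def is_known(word: str) -> bool:
    """Plain membership test, which is all a basic spell check needs."""
    return word.lower() in LEXICON
```

Three flagged entries cover eight surface forms here, which is the point 
of an affix file: the word list stays short and the rules do the rest.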


You know what you are talking about and you have the whole system in 
your head (phonology, morphology...); you know what can possibly go 
wrong and what to correct -- but you are only working with Hunspell (and 
maybe LanguageTool) -- so you work with that and forget about the stuff 
you read in doctoral theses.


My point was that a shift in how dictionaries are (or will be) built 
should not be expected. I'm surprised that there are that many of them 
(overall quality aside).


And I can't build even decent grammar checking with LanguageTool without 
a corpus available. There is one, but I'm not doing this on my own, 
alone -- forget it. So let it be just a spelling checker (broken or not).


Kruno

--
To unsubscribe e-mail to: l10n+unsubscr...@global.libreoffice.org
Problems? http://www.libreoffice.org/get-help/mailing-lists/how-to-unsubscribe/
Posting guidelines + more: http://wiki.documentfoundation.org/Netiquette
List archive: http://listarchives.libreoffice.org/global/l10n/
All messages sent to this list will be publicly archived and cannot be deleted


Re: [libreoffice-l10n] English Dictionaries Project - Introduction by Marco Pinto

2016-05-04 Thread Michael Bauer



toki wrote the following on 04/05/2016 at 21:43:



On 03/05/2016 19:14, Kruno wrote:

And there is now way any language smaller then English build 
something like outside some institute or outside funded

  project of some sort.

Please either rephrase that sentence, or write it in your native 
language.


It's not that hard to understand. What he said is that unless you happen 
to be lucky and have an institute or some funding mechanism, as a small 
language you often don't have the means to go and do the really fancy 
stuff that would be really nice to have. English has a massive amount of 
research and resources to throw at its linguistic problems. A language 
like Scottish Gaelic mostly works on the back of dedicated volunteers 
(or just a volunteer in some cases) donating time and/or expertise.



For small languages even having a spell checker is huge. There's quite a


When working with evidential grammars, or noun class grammars, spell 
checking falls apart, because the entire word is rewritten according to 
the evidential particle, or noun class.


That's a strange and rather defeatist argument. Whatever those are 
(really never heard of evidential grammars, I'm guessing with noun class 
grammars you mean languages like Bantu) I have yet to come across a 
language for which spellchecking is practically or theoretically 
impossible. Ideographic scripts like Chinese perhaps where you need to 
take longer chunks or semantics into account to account for 水 vs 氷 being 
in the wrong place but there are few systems like that. Sure, coverage 
is an issue in morphologically complex languages but it's by no means 
impossible. Basque has a dozen or so cases and a myriad of suffixes 
which can be combined in lots of different ways but oddly enough, 
spellchecking is possible. You just have to be clever about how you go 
about creating them.


No need to throw out the baby with the bathwater.

Michael

--
*Akerbeltz *
Gaelic resources on the web
Phone: +44-141-946 4437
Fax: +44-141-945 2701

*Your computer speaks Gaelic, go on, try it!*
Many things, from office programs and browsers to predictive texting,
games and much more. Visit us at www.iGàidhlig.net





Re: [libreoffice-l10n] English Dictionaries Project - Introduction by Marco Pinto

2016-05-04 Thread toki



On 03/05/2016 19:14, Kruno wrote:


And there is now way any language smaller then English build something like 
outside some institute or outside funded

  project of some sort.

Please either rephrase that sentence, or write it in your native language.

I'm trying to figure out if you mean that English is the only language 
in which an N-Gram based grammar checker can be created, or if that is 
the only language for which adequate funding for such a critter can be 
found.



For small languages even having a spell checker is huge. There's quite a


When working with evidential grammars, or noun class grammars, spell 
checking falls apart, because the entire word is rewritten according to 
the evidential particle, or noun class.


jonathon




Re: [libreoffice-l10n] English Dictionaries Project - Introduction by Marco Pinto

2016-05-03 Thread Kruno




For small languages even having a spell checker is huge. There are quite 
a few English dictionaries out there to help you with this or that, 
but when a whole country has a population equivalent to only one 
(average) US city, everything is extra hard.



That is why such a tool is _more_ than welcome.



Re: [libreoffice-l10n] English Dictionaries Project - Introduction by Marco Pinto

2016-05-03 Thread Kruno



On 03.05.2016 at 20:35, toki wrote:

On 03/05/2016 15:51, Kruno wrote:

not doubled maintained word lists by multiple maintainers (not 
knowing each other)

will not and can not be resolved.


With a central repository for working on dictionaries, it is far 
easier for two individuals interested in the same dictionary to find 
each other, than if they are working on two different sites, in 
different locations.


Yes, I agree, I never argued that. But now we are talking about my point 
from my first mail: you build a place for people involved with LO and 
provide them with tools to make better dictionaries. I was saying that 
it makes more sense to tie it to LO, for LO, than to think that you are 
making a repository for Hunspell -- you are making one for LibreOffice 
and that's it.


My point was that you can't build a repository for 'official' Hunspell 
dictionaries, only for 'official' LO dictionaries. I was just saying 
it was communicated in a slightly blurry and unclear way (to me).


(And you explained it to me in the last part of your mail).




Who's dictionary to include to that single repository, how to merge


As a practical matter, a repository that only allows for one 
dictionary per language, is not viable. At a minimum, you'll have 
specialized dictionaries.


[Starting new discussion (sic!)]

Which languages? Half of them don't even have a decent affix file (mine 
included, and that's nobody's fault).


I'm not trying to discourage or sabotage (I really hope this comes 
alive), but who do you think will build such dictionaries for languages 
that don't have them already? Who can maintain that?


Are all those specialized dictionaries sharing an affix file?

That was my second (and last) point: it sounds like the goals are set too 
high. Not in terms of possibilities, but of actual interest.


And we are talking about different things here; having two for the same 
language was not what I meant. We are starting a new discussion here, so 
back on topic:


I was more concerned about how such a system would work. I'm not telling 
anyone what and how (nor even suggesting) because I don't understand all 
of that. I just wanted to know. It sounded so unreal.


[Yes, have that possibility -- we agree on practically everything, but 
you are pressing where it hurts here ;) ]




how to merge affix files with different affix classes (that will be a 
mess).


I've seen some tools for automating the creation of affix files.
I don't know how well they work, though.


No, no and - no! No scripts with any natural language if you don't 
already have a finished dictionary for cross-referencing. No, no way. 
(And small and not so small languages don't have access to those, or 
they simply don't exist).





This goes back to my claim that spell checking without built-in 
grammar checking is useless.


>Why you think that included dictionary is 'standard' and is better 
then the other one?


Any dictionary project has to include the ability to have the same 
language in at least two different writing systems --- Braille (^1) 
and the standard writing system for the language.


>The other guy will give up his work?

The proposal does not require the other guy to give up his project.

I wouldn't be surprised to see the other guy create a more specialized 
dictionary.


* John Doe creates a general purpose dictionary;
* Jane Doe creates a name and places dictionary;
* John Roe creates a scientific terminology dictionary;
* Jane Roe creates a basic words dictionary;



It sure will be easier than it is now.





Who will hunt all those 'other' guys telling them 'Yo, dude, leave 
that, do this shit!'


As far as existing spell checking and wordlist projects go, nobody is 
going to tell them to "leave that, do this".


Yes, exactly: so again, you can invite people -- people you already 
know, people who are already doing this stuff (and those are few).


So having some sort of Bugzilla for missing or wrong words has more 
potential for regular users (even integrated into the UI, so it's just 
reporting to a matching language in that repository of some sort).


The dictionary building tool can help the ones already doing it to do it 
better.


(Not trying to make a discussion of this)


What might happen, is that known, existing projects, are offered 
space, etc in the proposed repository/incubator, but they will stay 
where they currently are, due to how their workflow operates.


How will such a repository resolve competition between two English 
dictionaries?


Since you specifically mentioned English, there currently are versions 
of English for a dozen locales, plus around half a dozen specialist 
dictionaries.


Most users won't choose the English (OED) variant, because it has too 
many words in it. Too many words means that words that are wrongly 
used get flagged as correctly spelled. The "Eye right withe aye pin" 
phenomenon.


How many languages have that problem?




This proposal is about non-technical types being able to _easily_ 
create vi

Re: [libreoffice-l10n] English Dictionaries Project - Introduction by Marco Pinto

2016-05-03 Thread Kruno

On 03.05.2016 at 20:49, Michael Bauer wrote:
Totally disagree from experience. Of course, both is better but you 
try working in a language with not even a spellchecker and then get 
someone to count the errors. Even mediocre spellcheck coverage kills a 
good % of typos. I just have to take a random Gaelic page off the BBC 
and dump it in LO and count the hits.


Michael

toki wrote the following on 03/05/2016 at 19:35:
This goes back to my claim that spell checking without built-in 
grammar checking is useless. 




I agree. Otherwise you can say that no grammar checker is good if it's 
not n-gram based or such. And there is now way any language smaller then 
English build something like outside some institute or outside funded 
project of some sort.


For small languages even having a spell checker is huge. There are quite 
a few English dictionaries out there to help you with this or that, but 
when a whole country has a population equivalent to only one (average) 
US city, everything is extra hard.


We all know the downsides of spelling checkers but it's just the way it is.

And yet, spelling checkers (dumb as they are) and grammar checkers (poor 
as they are) still do a lot of good.


It's easier to teach people how to write than to make a decent grammar 
checker (and that's just the way it is).




Re: [libreoffice-l10n] English Dictionaries Project - Introduction by Marco Pinto

2016-05-03 Thread Michael Bauer
Totally disagree from experience. Of course, both is better but you try 
working in a language with not even a spellchecker and then get someone 
to count the errors. Even mediocre spellcheck coverage kills a good % of 
typos. I just have to take a random Gaelic page off the BBC and dump it 
in LO and count the hits.
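
The "dump a page in and count the hits" experiment can be approximated 
outside LO with a plain word list. This is only a sketch: the function 
name, the tokenising regex and the sample data are assumptions made for 
illustration, and a real check would go through Hunspell with the 
language's own dictionary.

```python
import re
from typing import Iterable, List


def unknown_words(text: str, wordlist: Iterable[str]) -> List[str]:
    """Return the tokens of `text` that are missing from `wordlist`."""
    known = {w.lower() for w in wordlist}
    # Runs of Unicode letters, optionally joined by an apostrophe or hyphen.
    tokens = re.findall(r"[^\W\d_]+(?:['’-][^\W\d_]+)*", text)
    return [t for t in tokens if t.lower() not in known]


# Made-up sample data, not taken from any real dictionary or page.
WORDLIST = ["the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"]
PAGE = "The quick brwon fox jumps over teh lazy dog"

hits = unknown_words(PAGE, WORDLIST)
print(len(hits), hits)
```

Even a short list like this flags both typos in the sample, which is the 
sense in which mediocre coverage still catches a good share of errors.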


Michael

toki wrote the following on 03/05/2016 at 19:35:
This goes back to my claim that spell checking without built-in 
grammar checking is useless. 





Re: [libreoffice-l10n] English Dictionaries Project - Introduction by Marco Pinto

2016-05-03 Thread toki

On 03/05/2016 15:51, Kruno wrote:


not doubled maintained word lists by multiple maintainers (not knowing each 
other)

will not and can not be resolved.


With a central repository for working on dictionaries, it is far easier 
for two individuals interested in the same dictionary to find each 
other, than if they are working on two different sites, in different 
locations.



Who's dictionary to include to that single repository, how to merge


As a practical matter, a repository that only allows for one dictionary 
per language, is not viable. At a minimum, you'll have specialized 
dictionaries.



how to merge affix files with different affix classes (that will be a mess).


I've seen some tools for automating the creation of affix files.
I don't know how well they work, though.

This goes back to my claim that spell checking without built-in grammar 
checking is useless.


>Why you think that included dictionary is 'standard' and is better 
then the other one?


Any dictionary project has to include the ability to have the same 
language in at least two different writing systems --- Braille (^1) and 
the standard writing system for the language.


>The other guy will give up his work?

The proposal does not require the other guy to give up his project.

I wouldn't be surprised to see the other guy create a more specialized 
dictionary.


* John Doe creates a general purpose dictionary;
* Jane Doe creates a name and places dictionary;
* John Roe creates a scientific terminology dictionary;
* Jane Roe creates a basic words dictionary;


Who will hunt all those 'other' guys telling them 'Yo, dude, leave that, do 
this shit!'


As far as existing spell checking and wordlist projects go, nobody is 
going to tell them to "leave that, do this". What might happen, is that 
known, existing projects, are offered space, etc in the proposed 
repository/incubator, but they will stay where they currently are, due 
to how their workflow operates.



How will such a repository resolve competition between two English dictionaries?


Since you specifically mentioned English, there currently are versions 
of English for a dozen locales, plus around half a dozen specialist 
dictionaries.


Most users won't choose the English (OED) variant, because it has too 
many words in it. Too many words means that words that are wrongly used 
get flagged as correctly spelled. The "Eye right withe aye pin" phenomenon.
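
A small sketch of why a larger word list can hide more mistakes. Both 
word sets below are invented for illustration (they are not taken from 
the OED or from any shipped dictionary); the only assumption is that the 
larger list also contains rare but valid words such as "withe" and "aye".

```python
from typing import List, Set

# Hypothetical everyday vocabulary.
COMMON: Set[str] = {"i", "eye", "write", "right", "with", "a", "pen", "pin"}
# Hypothetical comprehensive vocabulary: everything above plus rare words.
LARGE: Set[str] = COMMON | {"withe", "aye"}


def flagged(sentence: str, lexicon: Set[str]) -> List[str]:
    """Return the words a pure word-list check would underline."""
    return [w for w in sentence.lower().split() if w not in lexicon]


SENTENCE = "Eye right withe aye pin"  # intended: "I write with a pen"

print(flagged(SENTENCE, COMMON))
print(flagged(SENTENCE, LARGE))
```

With the smaller list two of the five wrong words are at least 
underlined; with the larger list nothing is, because every token is a 
real word somewhere in the language.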



Nobody can (or should) just declare 'we are building dictionary
repository - here use this, not that' just because being in position of
power to do that.


The proposal does not mandate that only the proposed space/workflow/etc 
be used. In an ideal world, existing groups would be able to drop their 
work-product into the repository, with only one change to their workflow 
--- a bot that automatically uploads their new, verified, approved work 
product into the repository. Furthermore, this change would occur, if, 
and only if the existing group wanted to do so.


This proposal is about non-technical types being able to _easily_ create 
viable dictionaries for their specific use-case. It doesn't matter if 
that use-case is a dictionary in Pondo, or a dictionary of people and 
places in Bharat, or a dictionary in Moon.


The other part of the proposal is that even if the original dictionary 
creator abandons the dictionary, it can still be maintained, and updated.


The third part of the proposal is that whilst it is initially for LibO, 
the hope is that it becomes the source for dictionaries for FLOSS projects.


#

Hypothetical situation. One of Kevin Scannell's students decides that 
what the world needs is dictionaries in each of the 2,500 languages 
that have been reduced to a writing system.  So said student walks thru 
Kevin's word lists, and creates a dictionary project for each of the 
2,000 languages that Kevin maintains word lists for. A year later, said 
student graduates, and forgets about their dictionaries.


Under the current scenarios, when said student abandons their 
dictionaries, the only way other people can update them, is by forking 
them --- assuming that the license allows forking.


Under the proposed scenario, if said student creates the dictionaries in 
the repository, when said student abandons them, other people can still 
update the dictionaries, which can then be distributed to LibO, etc.


I'll grant that were said student to create 2,000+ dictionaries for 
LibO, it would break the UI. However, as far as the proposal goes, that 
breakage is irrelevant.



use of hunspell features correctly (not simple word lists, but by logic)

what this mean?


For non-techies, creating a HunSpell dictionary is a non-starter, 
because they don't understand the vocabulary that it uses.


For techies, the technical description is, at best, off-putting.


features but those dictionaries who do just word list will continue do
just the word list because is purgatory (or hell) to do this right if
you 

Re: [libreoffice-l10n] English Dictionaries Project - Introduction by Marco Pinto

2016-05-03 Thread Marco A.G.Pinto
Michael and people,

I give a brief explanation of how .AFF files work in the Proofing Tool GUI
manual:
http://marcoagpinto.cidadevirtual.pt/proofingtoolgui_files/ProofingToolGUI_manual_V30.html

But it is an unfinished manual since I haven't had much free time to
work on everything.

Kind regards,
Marco A.G.Pinto
  

On 03/05/2016 18:55, Michael Bauer wrote:
> There are many things about Hunspell that make me shake my head in
> disbelief ;) All the more so given how widely used it is. Like the
> lack of documentation to the outside world. I mean, like a user
> friendly "Here is how you make a spellchecker for your language" and
> "All about affixes" page. But their github space aside, there doesn't
> seem to be much. Not at a casual websearch anyway.
>
> The old FOSS caltrop? Great idea, machinery and people, shaky end-user
> strategy?
>
> M
>
> Kruno wrote the following on 03/05/2016 at 17:26:
>> I see... (and we shall see).
>>
>> It's always seemed logical to me for Hunspell project to host
>> 'official' dictionaries. Since it's lacking this feature -- here is a
>> chance for LO to play this card right.
>>
>> No doubt, it could become huge.
>>
>> Thanks for clarifications,
>> Kruno 
>
>




Re: [libreoffice-l10n] English Dictionaries Project - Introduction by Marco Pinto

2016-05-03 Thread Michael Bauer
There are many things about Hunspell that make me shake my head in 
disbelief ;) All the more so given how widely used it is. Like the lack 
of documentation to the outside world. I mean, like a user friendly 
"Here is how you make a spellchecker for your language" and "All about 
affixes" page. But their github space aside, there doesn't seem to be 
much. Not at a casual websearch anyway.


The old FOSS caltrop? Great idea, machinery and people, shaky end-user 
strategy?


M

Kruno wrote the following on 03/05/2016 at 17:26:

I see... (and we shall see).

It's always seemed logical to me for Hunspell project to host 
'official' dictionaries. Since it's lacking this feature -- here is a 
chance for LO to play this card right.


No doubt, it could become huge.

Thanks for clarifications,
Kruno 





Re: [libreoffice-l10n] English Dictionaries Project - Introduction by Marco Pinto

2016-05-03 Thread Kruno



On 03.05.2016 at 17:58, Michael Bauer wrote:



Kruno wrote the following on 03/05/2016 at 16:51:


will not and can not be resolved.

Who's dictionary to include to that single repository, how to merge 
them, how to resolve different concepts (what to include, what to 
exclude), how to merge affix files with different affix classes (that 
will be a mess). Why you think that included dictionary is 'standard' 
and is better then the other one? How to introduce those guys not 
knowing each other? The other guy will give up his work? Who will 
hunt all those 'other' guys telling them 'Yo, dude, leave that, do 
this shit!'


How will such a repository resolve competition between two English 
dictionaries?
I think this is less of a problem than it might seem at first glance. 
To begin with, there are not that many dictionaries which have active 
competing teams. Even en-GB (where you might expect a flurry of 
competition) only has a single maintainer.


First I think you gather in all those projects willing to participate, 
then you rescue the dead ones and then you try and work out 
arrangements with competing dictionaries. I don't see a reason why 
such a resource could not host multiple dictionaries for the same 
locale, even if it somehow selects one for default inclusion. Many 
locales have pre and post spelling reform variants anyway so you have 
to allow for multiples anyway so if you had 3 competing en-US 
dictionaries, you just label them differently if the differences 
cannot be reconciled.





My point: it should be repository for maintaining hunspell's 
dictionaries and building extensions for other project - that's fine, 
but don't expect it to be lively as Pootle and translations - it's 
just not gonna happen, that's not realistic.
It won't and that's a good thing, here too many cooks will certainly 
spoil the broth


Did I get it all wrong?

Hit me hard,
No, I think I had thoughts similar to yours going through my head but I 
think the concept still has legs if we pull together.


Michael


I see... (and we shall see).

It's always seemed logical to me for the Hunspell project to host 
'official' dictionaries. Since it's lacking this feature -- here is a 
chance for LO to play this card right.


No doubt, it could become huge.

Thanks for the clarifications,
Kruno




Re: [libreoffice-l10n] English Dictionaries Project - Introduction by Marco Pinto

2016-05-03 Thread Michael Bauer



Kruno wrote the following on 03/05/2016 at 16:51:


will not and can not be resolved.

Who's dictionary to include to that single repository, how to merge 
them, how to resolve different concepts (what to include, what to 
exclude), how to merge affix files with different affix classes (that 
will be a mess). Why you think that included dictionary is 'standard' 
and is better then the other one? How to introduce those guys not 
knowing each other? The other guy will give up his work? Who will hunt 
all those 'other' guys telling them 'Yo, dude, leave that, do this shit!'


How will such a repository resolve competition between two English 
dictionaries?
I think this is less of a problem than it might seem at first glance. To 
begin with, there are not that many dictionaries which have active 
competing teams. Even en-GB (where you might expect a flurry of 
competition) only has a single maintainer.


First I think you gather in all those projects willing to participate, 
then you rescue the dead ones and then you try and work out arrangements 
with competing dictionaries. I don't see a reason why such a resource 
could not host multiple dictionaries for the same locale, even if it 
somehow selects one for default inclusion. Many locales have pre and 
post spelling reform variants anyway so you have to allow for multiples 
anyway so if you had 3 competing en-US dictionaries, you just label them 
differently if the differences cannot be reconciled.





My point: it should be repository for maintaining hunspell's 
dictionaries and building extensions for other project - that's fine, 
but don't expect it to be lively as Pootle and translations - it's 
just not gonna happen, that's not realistic.
It won't and that's a good thing, here too many cooks will certainly 
spoil the broth


Did I get it all wrong?

Hit me hard,
No, I think I had thoughts similar to yours going through my head but I 
think the concept still has legs if we pull together.


Michael




Re: [libreoffice-l10n] English Dictionaries Project - Introduction by Marco Pinto

2016-05-03 Thread Kruno

On 03.05.2016 at 13:29, Dennis Roczek wrote:

And there is more in this paragraph than some might imagine. Marco
is doing a great job, but because the hunspell features were
underestimated and git was used "wrongly", he is not able to update the
bundled dictionary of LibreOffice (though the bundled one in AOO was
updated).

My proposal for the new project would solve many of the problems Marco
has. (if he would have used the project from the beginning, I'm sure we
can solve the actual situation with the en_GB dictionary).

I have started a wiki page for tracking proposals, ideas, pros and cons
together on one page.
https://wiki.documentfoundation.org/User:Dennisroczek/CDP

Please feel free to improve the page!


I've read that wiki page and, as somebody who has just started to 
maintain a dictionary, I have to say that it all sounds so good to me, 
to the point of being fantastic (science fiction fantastic).


First:

not doubled maintained word lists by multiple maintainers (not knowing 
each other)


will not and can not be resolved.

Whose dictionary to include in that single repository, how to merge 
them, how to resolve different concepts (what to include, what to 
exclude), how to merge affix files with different affix classes (that 
will be a mess). Why do you think that the included dictionary is 
'standard' and better than the other one? How to introduce those guys 
who don't know each other? Will the other guy give up his work? Who will 
hunt all those 'other' guys telling them 'Yo, dude, leave that, do this 
shit!'


How will such a repository resolve competition between two English 
dictionaries?


You can only make one of them LO's default and hope it will get 
maintained regularly and well (and that the other guy will help).


But again, you can resolve which one to include and which one to care 
about. What you will get is other people contributing to a (new) default 
LO dictionary, and that's where it might end.


Nobody can (or should) just declare 'we are building dictionary 
repository - here use this, not that' just because they are in a 
position of power to do that.


You can leave everything as is but it should be communicated 
differently. It's blurry like this, at least it is to me.


It sounds like the equation goes 'LO dictionary maintainer = hunspell 
dictionary maintainer', but that does not evaluate to true (it's so 
obvious that I risk being called a fool here).


Maybe I misunderstood something (my English is bad).

The Croatian dictionary hasn't been updated since it was released in 
2001, that's fifteen years. I'm desperate for somebody to help me, but 
the whole concept feels a little bit problematic. It's the concept 
that's bugging me...


What should be done is a central repository for the _LO_ dictionary, in 
the hope that _that instance_ -- that particular dictionary -- will 
become widely used (Firefox etc.).


Next:


use of hunspell features correctly (not simple word lists, but by logic)


what this mean? OK, every dictionary should be build using all the 
features but those dictionaries who do just word list will continue do 
just the word list because is purgatory (or hell) to do this right if 
you were not doing it right from the beginning.


If people start randomly adding affix classes, pretty soon nobody will 
be able to maintain those dictionaries. That cannot be done easily if 
the maintainer doesn't have access to some other dictionary so he can 
automate this job. It's a pain and I often regret that I took it on 
myself. There is no guy in the world who can add a word with affixes to 
that dictionary without me, because it would take him three days to 
study it (I need to write some kind of manual for that as soon as 
possible).
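
To illustrate the difference between a plain word list and an entry that 
uses affix classes, here is a minimal sketch. It is not how Hunspell 
itself is implemented and it does not parse real .aff files; the rule 
table SUFFIX_RULES, the flag "S" and the function expand are all made up 
for this example. It only mimics the idea of a suffix rule with a strip 
part, an add part and a condition.

```python
import re

# Hypothetical, simplified model of Hunspell-style suffix rules:
# flag -> list of (strip, add, condition). "0" means "strip nothing",
# mirroring the convention used in .aff files.
SUFFIX_RULES = {
    "S": [
        ("y", "ies", r"[^aeiou]y$"),  # party -> parties
        ("0", "s", r"[aeiou]y$"),     # day -> days
        ("0", "es", r"[sxzh]$"),      # box -> boxes
        ("0", "s", r"[^sxzhy]$"),     # word -> words
    ],
}

def expand(entry):
    """Expand a 'stem/FLAGS' entry into all surface forms it covers."""
    stem, _, flags = entry.partition("/")
    forms = {stem}
    for flag in flags:
        for strip, add, condition in SUFFIX_RULES.get(flag, []):
            # A rule applies only if the stem matches its condition.
            if not re.search(condition, stem):
                continue
            base = stem if strip == "0" else stem[: -len(strip)]
            forms.add(base + add)
    return sorted(forms)

print(expand("party/S"))
print(expand("word"))
```

One entry with a flag covers several forms, which is the gain; the cost 
is that every flag has to be assigned by somebody who knows which rules 
really apply to that stem.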


My point: it should be a repository for maintaining hunspell 
dictionaries and building extensions for other projects - that's fine, 
but don't expect it to be as lively as Pootle and translations - it's 
just not gonna happen, that's not realistic.


Maintaining a dictionary is devil's work and only the doomed are willing 
to do it; let others participate, but only a few will care for the 
technical part, even without git or gerrit. Others will just want to add 
a word or two, report a bad suggestion and such, but generally this 
repository will make the job easier only for those who are already doing 
this, and that's where it might end (but that's a lot).


(This is not criticism of the wiki page; I wanted to post this three 
days ago...)


Did I get it all wrong?

Hit me hard,
Kruno

--
To unsubscribe e-mail to: l10n+unsubscr...@global.libreoffice.org
Problems? http://www.libreoffice.org/get-help/mailing-lists/how-to-unsubscribe/
Posting guidelines + more: http://wiki.documentfoundation.org/Netiquette
List archive: http://listarchives.libreoffice.org/global/l10n/
All messages sent to this list will be publicly archived and cannot be deleted


Re: [libreoffice-l10n] English Dictionaries Project - Introduction by Marco Pinto

2016-05-03 Thread Dennis Roczek

Hi *,

On 01.05.2016 at 16:25, Marco A.G.Pinto wrote:
> Hello!
> 
> This is what I have been doing for nearly three years.
> 
> I and other people realised that the bundled dictionaries were totally
> outdated and I had some difficulties setting up the technical stuff
> (Git, Gerrit, etc.) and I had a discussion on IRC that a push is very
> likely not going to be merged because of some licence problems and such.
There is more in this paragraph than some might imagine. Marco is doing
a great job, but because the hunspell features were underestimated and
git was used "wrongly", he is not able to update the bundled dictionary
of LibreOffice (though the bundled one in AOO was updated).

My proposal for the new project would solve many of the problems Marco
has. (If he had used the project from the beginning, I'm sure we could
solve the current situation with the en_GB dictionary.)

I have started a wiki page for tracking proposals, ideas, pros and cons
together on one page.
https://wiki.documentfoundation.org/User:Dennisroczek/CDP

Please feel free to improve the page!

> Thanks!
> 
> Kind regards,
>  >Marco A.G.Pinto
Dennis Roczek



Re: [libreoffice-l10n] English Dictionaries Project - Introduction by Marco Pinto

2016-05-03 Thread Michael Bauer

Appreciate your good work Marco, many thanks!

Michael

Marco A.G.Pinto wrote the following on 01/05/2016 at 15:25:

My forked en_GB is the best British dictionary around and I have even
bought a Gold Account in Oxford Dictionaries in order to test the words
so that I may add them if needed. I first paste texts into Thunderbird
and, if words are flagged as typos, I check them in Oxford and add them
if they are valid words. Since the en_GB that came with AOO was
obfuscated, I grabbed the one from Mozilla around three years ago.

I do monthly releases of en_GB.

This is what I have been doing for nearly three years.





[libreoffice-l10n] English Dictionaries Project - Introduction by Marco Pinto

2016-05-01 Thread Marco A.G.Pinto
Hello!

I am Marco from Portugal and I have been involved in several projects in
the past.

I translated Pretty Good Privacy 2.6.3i into pt_PT back in 1997,
which was my first translation.

I have also translated Gpg4win, OpenSlides, sites and other documentation.

Lately I have dedicated most of my free time to my PhD project/thesis,
to the British Dictionary, to my tool Proofing Tool GUI and LanguageTool.

Around three years ago I had the idea of creating a place to store the
most up-to-date English dictionaries so that I could create OXTs for AOO
and LO and so that people could download the files from there to other
projects.

I have been trying to find the original authors and get their most
recent files, but most of them have been gone for a long time, hence my
fork of en_GB.

I created a GitHub repository and there are already people/companies
that download the files from there using scripts.

When I created my tool, Proofing Tool GUI, the first goal was to create
a thesaurus editor and the second to create a dictionary editor.

Both have been accomplished.

I offered to improve the pt_PT thesaurus for Minho University, but they
never replied, so the project was halted (I will resume it sometime in
the future).

But other people are developing thesauri using Proofing Tool GUI.

I have also forked and been improving en_GB, have so far added around
21K words, and am releasing it as extensions for Mozilla, OpenOffice
and LibreOffice:

*Mozilla (British):*
https://addons.mozilla.org/en-US/firefox/addon/british-english-dictionary-2

*OpenOffice:*
http://extensions.openoffice.org/en/project/english-dictionaries-apache-openoffice

*LibreOffice:*
http://extensions.libreoffice.org/extension-center/english-dictionaries

Note that for AOO and LO I am releasing OXTs with several English
dictionaries.

My forked en_GB is the best British dictionary around and I have even
bought a Gold Account in Oxford Dictionaries in order to test the words
so that I may add them if needed. I first paste texts into Thunderbird
and, if words are flagged as typos, I check them in Oxford and add them
if they are valid words. Since the en_GB that came with AOO was
obfuscated, I grabbed the one from Mozilla around three years ago.
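
The first step of that workflow (finding words the dictionary does not
know yet) could also be sketched in a few lines of Python. This is only
an illustration under assumptions: the function name review_candidates
and the in-memory set of known words are invented here, and the real
check is done by the spell checker in Thunderbird, not by this code. The
result is just a list of candidates; each one still has to be verified
by hand before it is added.

```python
import re
from collections import Counter

def review_candidates(text, known_words):
    """Return words in `text` missing from `known_words`, most frequent first."""
    # Very rough tokenizer: runs of ASCII letters, optionally with one apostrophe.
    tokens = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    known = {word.lower() for word in known_words}
    unknown = Counter(t.lower() for t in tokens if t.lower() not in known)
    return [word for word, _ in unknown.most_common()]

sample = "The colour of the wug is wuggish, said the wug."
print(review_candidates(sample, {"the", "colour", "of", "is", "said"}))
```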

I do monthly releases of en_GB.

This is what I have been doing for nearly three years.

Other people and I realised that the bundled dictionaries were totally
outdated. I had some difficulties setting up the technical stuff
(Git, Gerrit, etc.), and in a discussion on IRC I was told that a push
is very likely not going to be merged because of some licence problems
and such.

Thanks!

Kind regards,
 >Marco A.G.Pinto