Re: Roadmap for Spanish

2016-04-07 Thread Pablo Saratxaga
Kaixo,

On Thu, Apr 07, 2016 at 01:50:42PM +0200, Juan Martorell wrote:

>  For example "morirteme" would be wrong.
> 
>This counterexample is not the best IMHO. Google gives some results. The

>Consider: "Por muy enfermo que estés, ni se te ocurra morírteme ahora."

Indeed!
And the reason it makes sense is that it is used in a figurative sense (in
that case the real meaning is not just "to die" but "to willingly die", e.g.
to kill oneself); I didn't think about that possibility.

Anyway, my point was that the possible pronouns depend on meaning;
and some meanings just don't make sense.

As for whether it is a problem or not: meaningless combinations are not
very likely to occur in a text; but if those automatically generated
combinations are used for spell checker suggestions, then some
wrong/unwanted suggestions will appear, and may even push aside better ones.

If meaning or "existence" of a form is handled separately, and only
"grammatical possibility" is intended, then indeed the broader definition
of inflection mechanisms is better (as that is also the main way neologisms
are created).

Can the dictionaries used for POS tagging and for providing suggestions
be different?

-- 
Ki ça vos våye bén,
Pablo Saratxaga

http://chanae.walon.org/pablo/  PGP Key available, key ID: 0xD9B85466

--
___
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel


Re: Roadmap for Spanish

2016-04-07 Thread Juan Martorell
Thanks, Pablo, for your points.

On 7 April 2016 at 11:07, Pablo Saratxaga wrote:

> On Wed, Apr 06, 2016 at 02:55:52PM +0200, Juan Martorell wrote:
>
> >It is quite common to attach some pronouns to the verb thus including
> >information about direct and/or indirect object, or passive/impersonal
> >voice. Combinations are huge, some like:
>
> but for a proper synthesis, the verb itself has to be correctly
> tagged first, so as to know if a pronoun can be added, or if two can be added.
>
> blind automatic generation will lead to a huge mass of incorrect forms.
>

I'd like to see how huge it will be. Some rare or infrequent forms will
surely do no harm, but I'm not sure how harmful some incorrect forms can be
for our purpose.
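
A rough way to get a feel for the size is to generate the forms blindly and count them. The sketch below is only an illustration under simplifying assumptions: the clitic inventory, the `blind_enclitic_forms` name and the restriction to infinitives with at most two enclitics are all choices made here, not anything LanguageTool does.

```python
from itertools import permutations

# Assumed clitic inventory; blind generation ignores ordering rules and meaning.
CLITICS = ["me", "te", "se", "nos", "os", "lo", "la", "le", "los", "las", "les"]
ACUTE = {"a": "á", "e": "é", "i": "í"}


def blind_enclitic_forms(infinitive):
    """Return the infinitive followed by every one- or two-clitic sequence."""
    # One enclitic: stress stays on the penultimate syllable, no written accent.
    forms = [infinitive + clitic for clitic in CLITICS]
    # Two enclitics: stress falls on the antepenultimate syllable, so the
    # last vowel of the infinitive takes a written accent (morir -> morír-).
    stem, vowel = infinitive[:-2], infinitive[-2]
    accented = stem + ACUTE.get(vowel, vowel) + "r"
    forms += [accented + a + b for a, b in permutations(CLITICS, 2)]
    return forms


forms = blind_enclitic_forms("morir")
print(len(forms))
print("morírteme" in forms, "morírmete" in forms)
```

With this inventory each infinitive yields 121 candidates, and the list contains both *morírteme* and wrongly ordered sequences such as *morírmete*, which is the kind of noise under discussion.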


> For example "morirteme" would be wrong.
>

This counterexample is not the best IMHO. Google gives some results. The
correct spelling includes a diacritic (*morírteme*), but the point is that
this word, though rare, is grammatically correct and makes full sense in
context.

Consider: "*Por muy enfermo que estés, ni se te ocurra morírteme ahora.*" "*Con
el viento que hace, si sales así vestido vas a morírteme de frío. Ponte una
chaqueta, anda.*"


>
> With the prefixes it is even more difficult, as the adequacy of
> a prefix depends not only on grammatical properties, but also on
> meaning and usage.
>
> For example, while desforestar, deshacer are ok; desmorir, descaer are odd.
> I think automatic use of prefixes (that is, adding them to *ALL* verbs) would
> be wrong.
>

Agreed to some extent. Even though "*desforestar*" is valid, "*deforestar*"
is the preferred spelling. "*descaer*" is in the RAE's dictionary, "*decaer*"
being the most used spelling.
Following your example, even though *desmorir* is not in the RAE's
dictionary, it may be a neologism with figurative content that conveys sense
to the reader. I mean, in religious or philosophical texts, *desmorir* (to
*undie*) can make sense when talking about alternative timelines, where
"*resucitar*" (to resurrect) fits worse, being active and losing the
sense of undoing that *undying* carries.

Bottom line: the point of grammar proofreading is syntax more than
spelling or semantics, so it would be worth allowing some flexibility
while giving a mild warning for rare forms. This setting may be tuned via
category activation.
This is a good case for statistical insertion:

   1. produce the word
   2. check it against the word database created from a large corpus
   3. decide whether to insert it based on its frequency
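
The three steps above could be sketched roughly as follows. This is a minimal illustration, not existing LanguageTool code: the function names, the `accept_at` and `warn_at` thresholds and the toy corpus are all assumptions, and the "rare" outcome stands for the mild warning mentioned above.

```python
from collections import Counter


def build_word_database(corpus_tokens):
    """Step 2 prerequisite: count case-folded tokens from a corpus."""
    return Counter(token.lower() for token in corpus_tokens)


def decide_insertion(candidate, word_db, accept_at=5, warn_at=1):
    """Step 3: classify a generated form by its corpus frequency.

    Thresholds are arbitrary placeholders; real values would need tuning.
    """
    count = word_db[candidate.lower()]  # Counter returns 0 for unseen words
    if count >= accept_at:
        return "accept"
    if count >= warn_at:
        return "rare"  # insert, but flag with a mild warning
    return "reject"


# Toy corpus standing in for a large one.
corpus = ["deshacer"] * 7 + ["morírteme"] * 2 + ["Deshacer"]
word_db = build_word_database(corpus)

# Step 1 (producing the candidates) is assumed to have happened already.
for candidate in ["deshacer", "morírteme", "desmorir"]:
    print(candidate, decide_insertion(candidate, word_db))
```

Keeping a middle "rare" band, rather than a single cut-off, is what would let a form like *morírteme* be accepted while still being kept out of the top suggestions.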



> My approach would be to define some tags to apply to verbs (nouns, etc)
> that can accept a given prefix.
>

This is compatible with statistical insertion, IMO.


Re: Roadmap for Spanish

2016-04-07 Thread Pablo Saratxaga
Kaixo,

On Wed, Apr 06, 2016 at 02:55:52PM +0200, Juan Martorell wrote:

>It is quite common to attach some pronouns to the verb thus including
>information about direct and/or indirect object, or passive/impersonal
>voice. Combinations are huge, some like:

but for a proper synthesis, the verb itself has to be correctly
tagged first, so as to know if a pronoun can be added, or if two can be added.

blind automatic generation will lead to a huge mass of incorrect forms.
For example "morirteme" would be wrong.

With the prefixes it is even more difficult, as the adequacy of
a prefix depends not only on grammatical properties, but also on
meaning and usage.

For example, while desforestar, deshacer are ok; desmorir, descaer are odd.
I think automatic use of prefixes (that is, adding them to *ALL* verbs) would
be wrong.
My approach would be to define some tags to apply to verbs (nouns, etc)
that can accept a given prefix.
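
A minimal sketch of what such tagging could look like, assuming a lexicon that maps each lemma to the set of prefixes it accepts. The `PREFIX_TAGS` table, its entries and the `derive` function are hypothetical and only illustrate the idea; they are not an existing LanguageTool format.

```python
# Hypothetical lexicon: each lemma is tagged with the prefixes it accepts.
# The tag assignments are illustrative only.
PREFIX_TAGS = {
    "hacer": {"des", "re"},
    "forestar": {"des"},
    "morir": set(),
    "caer": set(),
}


def derive(lemma, prefix, lexicon=PREFIX_TAGS):
    """Return the prefixed form only if the lemma is tagged for that prefix."""
    if prefix in lexicon.get(lemma, set()):
        return prefix + lemma
    return None  # untagged or unknown lemma: do not generate anything


print(derive("hacer", "des"))
print(derive("morir", "des"))
```

Unknown lemmas generate nothing by default, so a prefixed form only exists once someone has tagged the base word for that prefix.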

At least in Latin-derived languages, it seems that prefixes are
related to meaning, while suffixes carry both meaning and grammatical
function (and so wrong combinations are easier to discard,
as you pointed out with the -mente suffix, which applies only to adjectives
and produces adverbs).



-- 
Ki ça vos våye bén,
Pablo Saratxaga

http://chanae.walon.org/pablo/  PGP Key available, key ID: 0xD9B85466



Re: Roadmap for Spanish

2016-04-07 Thread Marcin Miłkowski
On 06.04.2016 at 14:57, Juan Martorell wrote:
> On 5 April 2016 at 16:29, Jaume Ortolà i Font wrote:
>
>
> 2014-06-06 20:45 GMT+02:00 Juan Martorell:
>
>
> *1st and foremost: disambiguator:*
>
> My current strategy for disambiguation is starting with the longer
> constructions and then working down to the two-token
> constructions. Positive and negative examples should be included.
>
>
> I can point out some strategies for disambiguation.  I will try to
> make a summary.
>
>
> That's a great opportunity for Wiki improvement!

I added some today:

http://wiki.languagetool.org/developing-a-disambiguator

Best regards,
Marcin
