Re: Morphologic Analyser to solve concordance issue for Portuguese

2014-07-08 Thread Jaume Ortolà i Font
2014-07-08 19:18 GMT+02:00 Marco A.G.Pinto :

>  Marcin, Jaume, can someone commit/add the tagger please?
>

I'll do it.

Jaume





> On 08/07/2014 18:08, Marcin Miłkowski wrote:
>
> On 2014-07-08 17:34, Marco A.G.Pinto wrote:
>
>  Hello!
>
> I have contacted my Minho University friends who make the pt_PT
> dictionaries for Mozilla and OpenOffice/LibreOffice.
>
> They said they can create the postag dictionary and help.
>
>  But you're reinventing the wheel. Why? There is a good dictionary
> already available in FreeLing. I can add the tagger dictionary in 15
> minutes if you want. Creating the dictionary from hunspell is a *BAD*
> idea if you already have a tagged wordlist.
>
> Regards,
> Marcin
>
>
>  :-P
>
> Kind regards,
>  >Marco A.G.Pinto
>---
>
>
> On 08/07/2014 10:08, Jaume Ortolà i Font wrote:
>
>  2014-07-08 9:37 GMT+02:00 Marcin Miłkowski <list-addr...@wp.pl>:
>
>
> The Portuguese dictionary is already built. We simply haven't included
> it yet because we usually start from a certain number of rules,
> and then
> add the tagger. Using the tags in rules is a very good idea overall.
>
>
> I agree with Marcin. The most sensible thing to do is to add the
> Freeling POS tag dictionary for Portuguese. As the same tags are used
> in other languages, existing rules can be used as models, or those who
> are familiar with them can help readily.
>
> As an example, I have created a rule in the online rule editor for
> non-agreement (determinant plural - noun singular) in Galician.
>
>
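> [Rule XML stripped by the list archive; surviving fragments:]
>   pattern: three elements with postag_regexp='yes' (tag values lost),
>            plus one regexp='yes' element matching que|de
>   message: Error de concordancia
>   example: Os amigo -> Os amigos
>   example: os dous termos
>   example: os que son requiridos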
>
>
> Regards,
> Jaume Ortolà
>
>
>  --
>
>
> --
> Open source business process management suite built on Java and Eclipse
> Turn processes into business applications with Bonita BPM Community Edition
> Quickly connect people, data, and systems into organized workflows
> Winner of BOSSIE, CODIE, OW2 and Gartner awards
> http://p.sf.net/sfu/Bonitasoft
>
>
>
> ___
> Languagetool-devel mailing list
> Languagetool-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
>


Re: Morphologic Analyser to solve concordance issue for Portuguese

2014-07-08 Thread Marco A.G.Pinto

Marcin, Jaume, can someone commit/add the tagger please?

Thanks!

On 08/07/2014 18:08, Marcin Miłkowski wrote:

On 2014-07-08 17:34, Marco A.G.Pinto wrote:

Hello!

I have contacted my Minho University friends who make the pt_PT
dictionaries for Mozilla and OpenOffice/LibreOffice.

They said they can create the postag dictionary and help.

But you're reinventing the wheel. Why? There is a good dictionary
already available in FreeLing. I can add the tagger dictionary in 15
minutes if you want. Creating the dictionary from hunspell is a *BAD*
idea if you already have a tagged wordlist.

Regards,
Marcin


:-P

Kind regards,
  >Marco A.G.Pinto
---


On 08/07/2014 10:08, Jaume Ortolà i Font wrote:

2014-07-08 9:37 GMT+02:00 Marcin Miłkowski <list-addr...@wp.pl>:


 The Portuguese dictionary is already built. We simply haven't included
 it yet because we usually start from a certain number of rules,
 and then
 add the tagger. Using the tags in rules is a very good idea overall.


I agree with Marcin. The most sensible thing to do is to add the
Freeling POS tag dictionary for Portuguese. As the same tags are used
in other languages, existing rules can be used as models, or those who
are familiar with them can help readily.

As an example, I have created a rule in the online rule editor for
non-agreement (determinant plural - noun singular) in Galician.


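[Rule XML stripped by the list archive; surviving fragments:]
  pattern: three elements with postag_regexp='yes' (tag values lost),
           plus one regexp='yes' element matching que|de
  message: Error de concordancia
  example: Os amigo -> Os amigos
  example: os dous termos
  example: os que son requiridos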



Regards,
Jaume Ortolà





Re: Morphologic Analyser to solve concordance issue for Portuguese

2014-07-08 Thread Marco A.G.Pinto

Could you commit it, please?

:-P

I will still ask Minho University for help creating the rule. Once 
the rule is done, I will start adding the regular rules I add in every 
release.


Thank you very much!

Kind regards,
   >Marco A.G.Pinto
 ---


On 08/07/2014 17:55, Jaume Ortolà i Font wrote:
2014-07-08 17:34 GMT+02:00 Marco A.G.Pinto <marcoagpi...@mail.telepac.pt>:


Hello!

I have contacted my Minho University friends who make the pt_PT
dictionaries for Mozilla and OpenOffice/LibreOffice.

They said they can create the postag dictionary and help.


Hi Marco,

What Marcin and I are trying to say is that there is no need to create a 
postag dictionary. It already exists, and you can download it here 
[1]. The first lines of the dictionary (after merging the grammatical 
categories, which are kept in separate files) look like this:


wordform // lemma // POS tag

aa a NCMP000
à a+a SPS00+*
aacheniana aacheniano NCFS000
aachenianas aacheniano NCFP000
aacheniano aacheniano NCMS000
aachenianos aacheniano NCMP000
aais aal NCMP000
aal aal NCMS000
aaleniana aaleniano AQ0FS0
aalenianas aaleniano AQ0FP0
aaleniano aaleniano AQ0MS0
aaleniano aaleniano NCMS000
aalenianos aaleniano AQ0MP0
aalenianos aaleniano NCMP000
a a NCMS000
a a SPS00
aba aba NCFS000
aba aba NCMS000
aba abar VMIP3S0
aba abar VMM02S0
abá abar VMN
abá abar VMN01S0
abá abar VMN03S0
ababá ababá AQ0CS0
ababá ababá NCCS000

The next steps for building a dictionary that can be used in 
LanguageTool are described here [2]. You need at least the "binary POS 
dictionary". The "binary synthesizer dictionary" is not essential (it 
is used only for generating suggestions; for example, the plural 
amigo -> amigos).


If you want, I can build and commit the dictionary in a few minutes. If 
you learn to do it, you'll be able to improve the dictionary yourself in 
the future.


Regards,
Jaume Ortolà


[1] 
http://nlp.lsi.upc.edu/freeling/index.php?option=com_content&task=view&id=23&Itemid=58
Download page: 
http://devel.cpl.upc.edu/freeling/downloads?order=time&desc=1

Download and extract freeling-3.1.tar.gz and go to /data/pt/entries.

[2] http://wiki.languagetool.org/developing-a-tagger-dictionary







Re: Morphologic Analyser to solve concordance issue for Portuguese

2014-07-08 Thread Marcin Miłkowski
On 2014-07-08 17:34, Marco A.G.Pinto wrote:
> Hello!
>
> I have contacted my Minho University friends who make the pt_PT
> dictionaries for Mozilla and OpenOffice/LibreOffice.
>
> They said they can create the postag dictionary and help.

But you're reinventing the wheel. Why? There is a good dictionary 
already available in FreeLing. I can add the tagger dictionary in 15 
minutes if you want. Creating the dictionary from hunspell is a *BAD* 
idea if you already have a tagged wordlist.

Regards,
Marcin

>
> :-P
>
> Kind regards,
>  >Marco A.G.Pinto
>---
>
>
> On 08/07/2014 10:08, Jaume Ortolà i Font wrote:
>> 2014-07-08 9:37 GMT+02:00 Marcin Miłkowski:
>>
>>
>> The Portuguese dictionary is already built. We simply haven't included
>> it yet because we usually start from a certain number of rules,
>> and then
>> add the tagger. Using the tags in rules is a very good idea overall.
>>
>>
>> I agree with Marcin. The most sensible thing to do is to add the
>> Freeling POS tag dictionary for Portuguese. As the same tags are used
>> in other languages, existing rules can be used as models, or those who
>> are familiar with them can help readily.
>>
>> As an example, I have created a rule in the online rule editor for
>> non-agreement (determinant plural - noun singular) in Galician.
>>
>>
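>> [Rule XML stripped by the list archive; surviving fragments:]
>>   pattern: three elements with postag_regexp='yes' (tag values lost),
>>            plus one regexp='yes' element matching que|de
>>   message: Error de concordancia
>>   example: Os amigo -> Os amigos
>>   example: os dous termos
>>   example: os que son requiridos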
>>
>>
>> Regards,
>> Jaume Ortolà
>>
>
>


Re: Morphologic Analyser to solve concordance issue for Portuguese

2014-07-08 Thread Jaume Ortolà i Font
2014-07-08 17:34 GMT+02:00 Marco A.G.Pinto :

> Hello!
>
> I have contacted my Minho University friends who make the pt_PT
> dictionaries for Mozilla and OpenOffice/LibreOffice.
>
> They said they can create the postag dictionary and help.
>
>
Hi Marco,

What Marcin and I are trying to say is that there is no need to create a
postag dictionary. It already exists, and you can download it here [1]. The
first lines of the dictionary (after merging the grammatical categories,
which are kept in separate files) look like this:

wordform // lemma // POS tag

aa a NCMP000
à a+a SPS00+*
aacheniana aacheniano NCFS000
aachenianas aacheniano NCFP000
aacheniano aacheniano NCMS000
aachenianos aacheniano NCMP000
aais aal NCMP000
aal aal NCMS000
aaleniana aaleniano AQ0FS0
aalenianas aaleniano AQ0FP0
aaleniano aaleniano AQ0MS0
aaleniano aaleniano NCMS000
aalenianos aaleniano AQ0MP0
aalenianos aaleniano NCMP000
a a NCMS000
a a SPS00
aba aba NCFS000
aba aba NCMS000
aba abar VMIP3S0
aba abar VMM02S0
abá abar VMN
abá abar VMN01S0
abá abar VMN03S0
ababá ababá AQ0CS0
ababá ababá NCCS000
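
The three-column layout above (word form, lemma, POS tag) is easy to load into a lookup table. The following is only a minimal Python sketch, not LanguageTool's own code: the function name `load_entries` and the inline `SAMPLE` text are made up for illustration, and the columns are assumed to be separated by whitespace.

```python
from collections import defaultdict

# A few lines copied from the sample above (whitespace-separated columns assumed).
SAMPLE = """\
aal aal NCMS000
aaleniano aaleniano AQ0MS0
aaleniano aaleniano NCMS000
aba aba NCFS000
aba abar VMIP3S0
"""


def load_entries(text):
    """Map each word form to the list of (lemma, tag) readings found for it."""
    readings = defaultdict(list)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        wordform, lemma, tag = parts
        readings[wordform].append((lemma, tag))
    return dict(readings)


entries = load_entries(SAMPLE)
print(entries["aba"])
```

An ambiguous form such as "aba" keeps all of its readings, which is what a tagger dictionary needs: disambiguation happens later.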

The next steps for building a dictionary that can be used in LanguageTool
are described here [2]. You need at least the "binary POS dictionary". The
"binary synthesizer dictionary" is not essential (it is used only for
generating suggestions; for example, plural of amigo > amigos).
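
To illustrate what the synthesizer direction means, here is a rough Python sketch that inverts the same data: given a lemma and a tag, it returns the word form. The helper names `build_synth_index` and `pluralize` are hypothetical, and the assumption that the number sits in the fourth character of noun tags is read off the sample lines (NCMS000 / NCMP000) rather than taken from a tagset manual.

```python
def build_synth_index(entries):
    """Index (lemma, tag) -> list of word forms, the reverse of tagging."""
    index = {}
    for wordform, lemma, tag in entries:
        index.setdefault((lemma, tag), []).append(wordform)
    return index


# Entries taken from the dictionary sample in this thread.
ENTRIES = [
    ("aacheniano", "aacheniano", "NCMS000"),
    ("aachenianos", "aacheniano", "NCMP000"),
    ("aal", "aal", "NCMS000"),
    ("aais", "aal", "NCMP000"),
]


def pluralize(index, lemma, singular_tag):
    """Look up the plural form by switching the number field from S to P.

    Assumption: in tags like NCMS000 the number is the character at index 3.
    """
    plural_tag = singular_tag[:3] + "P" + singular_tag[4:]
    return index.get((lemma, plural_tag), [])


index = build_synth_index(ENTRIES)
print(pluralize(index, "aal", "NCMS000"))
```

This is the kind of lookup a suggestion needs: the rule knows the lemma and the tag it wants, and the synthesizer supplies the spelling.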

If you want, I can build and commit the dictionary in a few minutes. If you
learn to do it, you'll be able to improve the dictionary yourself in the
future.

Regards,
Jaume Ortolà


[1]
http://nlp.lsi.upc.edu/freeling/index.php?option=com_content&task=view&id=23&Itemid=58
Download page: http://devel.cpl.upc.edu/freeling/downloads?order=time&desc=1
Download and extract freeling-3.1.tar.gz and go to /data/pt/entries.

[2] http://wiki.languagetool.org/developing-a-tagger-dictionary


Re: Morphologic Analyser to solve concordance issue for Portuguese

2014-07-08 Thread Marco A.G.Pinto

Hello!

I have contacted my Minho University friends who make the pt_PT 
dictionaries for Mozilla and OpenOffice/LibreOffice.


They said they can create the postag dictionary and help.

:-P

Kind regards,
>Marco A.G.Pinto
  ---


On 08/07/2014 10:08, Jaume Ortolà i Font wrote:
2014-07-08 9:37 GMT+02:00 Marcin Miłkowski:



The Portuguese dictionary is already built. We simply haven't included
it yet because we usually start from a certain number of rules,
and then
add the tagger. Using the tags in rules is a very good idea overall.


I agree with Marcin. The most sensible thing to do is to add the 
Freeling POS tag dictionary for Portuguese. As the same tags are used 
in other languages, existing rules can be used as models, or those who 
are familiar with them can help readily.


As an example, I have created a rule in the online rule editor for 
non-agreement (determinant plural - noun singular) in Galician.



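[Rule XML stripped by the list archive; surviving fragments:]
  pattern: three elements with postag_regexp='yes' (tag values lost),
           plus one regexp='yes' element matching que|de
  message: Error de concordancia
  example: Os amigo -> Os amigos
  example: os dous termos
  example: os que son requiridos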



Regards,
Jaume Ortolà






Re: Morphologic Analyser to solve concordance issue for Portuguese

2014-07-08 Thread Jaume Ortolà i Font
2014-07-08 9:37 GMT+02:00 Marcin Miłkowski :

>
> The Portuguese dictionary is already built. We simply haven't included
> it yet because we usually start from a certain number of rules, and then
> add the tagger. Using the tags in rules is a very good idea overall.
>
>
I agree with Marcin. The most sensible thing to do is to add the FreeLing
POS tag dictionary for Portuguese. As the same tags are used in other
languages, existing rules can be used as models, and those who are familiar
with them can readily help.

As an example, I have created a rule in the online rule editor for
non-agreement (plural determiner + singular noun) in Galician.


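[Rule XML stripped by the list archive; surviving fragments:]
  pattern: three elements with postag_regexp='yes' (tag values lost),
           plus one regexp='yes' element matching que|de
  message: Error de concordancia
  example: Os amigo -> Os amigos
  example: os dous termos
  example: os que son requiridos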
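
Since the rule XML did not survive in the archive, here is a hedged Python sketch of the same idea (plural determiner followed by singular noun), working on already tagged tokens. It is not the LanguageTool rule itself. The determiner tag `DA0MP0` and the positions of the number field (index 4 for determiners, index 3 for common nouns) are assumptions about the EAGLES-style tagset, and `number_of` and `find_disagreements` are invented names.

```python
def number_of(tag):
    """Return the number character ('S' or 'P') for the tag types handled here.

    Assumed layout: determiners D?0?N? (number at index 4),
    common nouns NC?N000 (number at index 3).
    """
    if tag.startswith("D") and len(tag) >= 5:
        return tag[4]
    if tag.startswith("NC") and len(tag) >= 4:
        return tag[3]
    return None


def find_disagreements(tagged):
    """Return (determiner, noun) pairs where a plural determiner meets a singular noun.

    `tagged` is a list of (word, [tags]) pairs. A pair is flagged only when
    every determiner reading is plural and every noun reading is singular.
    """
    hits = []
    for (w1, tags1), (w2, tags2) in zip(tagged, tagged[1:]):
        det_numbers = {number_of(t) for t in tags1 if t.startswith("D")}
        noun_numbers = {number_of(t) for t in tags2 if t.startswith("NC")}
        if det_numbers == {"P"} and noun_numbers == {"S"}:
            hits.append((w1, w2))
    return hits


wrong = [("Os", ["DA0MP0"]), ("amigo", ["NCMS000"])]
right = [("Os", ["DA0MP0"]), ("amigos", ["NCMP000"])]
print(find_disagreements(wrong))
print(find_disagreements(right))
```

Requiring all readings to agree on the number keeps ambiguous words from triggering the check, which is roughly the job exceptions do in a real rule.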



Regards,
Jaume Ortolà


Re: Morphologic Analyser to solve concordance issue for Portuguese

2014-07-08 Thread Marcin Miłkowski
On 2014-07-08 08:15, R.J. Baars wrote:
> I think the best way is not to use Hunspell directly, but to use Hunspell
> to help you create a PT_PT postag dictionary.
>
> That will help, not just in the current rules, but in lots of others as
> well. Postags are quite easy to use in the grammar xml.
>
> Marcin, Daniel, Am I correct in this?

Well, if you're asking about hunspell, then converting hunspell is the 
most tedious way. Marco can simply use a ready-made dictionary from FreeLing. 
Looking into hunspell is like trying to build a car from beer cans. In 
principle, you can do this, but it's a lengthy process and a costly one. 
And there's a finished file up for grabs.

The Portuguese dictionary is already built. We simply haven't included 
it yet because we usually start from a certain number of rules, and then 
add the tagger. Using the tags in rules is a very good idea overall.

Marcin

>
> Ruud
>
>> Yes, the pt_PT .DIC has lots of
>>
>> +CAT=adj,G=m,N=s
>>
>>
>> but, I don't know how to code that into grammar.xml *yet* :-P
>>
>> I hope that Catalan does, so that I can see how it works?
>>
>> Thanks!
>>
>> Kind regards,
>>>Marco A.G.Pinto
>>  ---
>>
>> On 08/07/2014 07:00, R.J. Baars wrote:
>>> Looks like you have been using the .dic-file mostly.
>>>
>>> When I look at the line in the aff that says:
>>>
>>> SFX r   ogia   ógico   logia   +CAT=adj,G=m,N=s
>>>
>>> I see that the suffix coded r in this case seems to be an adjective,
>>> masculine, singular?
>>> (I don't know any Portuguese, but I do know Hunspell.)
>>>
>>>
>>> I am quite sure that this file can be used to tag a huge list of
>>> Portuguese words.
>>>
>>> This way, a Portuguese tagging dictionary could be generated.
>>>
>>> One would need a big Portuguese word list (which I have), this affix
>>> file, and more knowledge of Portuguese.
>>>
>>> Sounds feasible ...
>>>
>>> Ruud
>>>
>>>
>>>
 Ruud!

 Thanks for your help.

 I have seen .AFF files since I am adding words to the en_GB for Mozilla
 and OpenOffice.

 But, basically all I do in en_GB is to add words and codes in front of
 them to generate more words, for example:
 *store/S* will generate:
 1) store
 2) stores
 and I have a user guide for each letter code.

 I noticed that the pt_PT Hunspell .DIC has lots of capital letters in
 front of the words+codes, after a kind of TAB character which is used
 to
 separate them.

 This is all I know :-P

 I guess I must try Catalan... I wanted to do it tomorrow but I have the
 dentist appointment in the morning and in the afternoon I will be at
 the
 university.

 I will try to have a look at it the moment I have some free time.

 Thank you all once again!

 Kind regards,
   >Marco A.G.Pinto
 ---


 On 07/07/2014 21:14, R.J. Baars wrote:
> There is some basic morphological data in the affix file of Hunspell.
> The Hunspell flags seem to be made on a word type basis.
>
> If that has been done correctly, postags could be derived from the
> flags ...
> It might be rough, but may also be just enough.
>
> If you never read an affix file, feel free to ask. Have a look at
> suffixes, these are probably the most useful.
>
>
> Ruud
>>

Re: Morphologic Analyser to solve concordance issue for Portuguese

2014-07-07 Thread R.J. Baars

There are loads of them in the wiki, in the other languages' files, etc.

It is quite simple, instead of checking the word itself, you check the
'type of word'.
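
As a rough illustration of that difference, here is a small Python sketch (not LanguageTool code) in which each pattern element matches either the word itself or its POS tag. The function `matches` and the pattern format are invented for this example, and the tags are the ones from the FreeLing sample posted in this thread.

```python
import re


def matches(pattern, tokens):
    """Check a token sequence against a pattern.

    pattern: list of ("word" | "postag", regex) pairs
    tokens:  list of (word, tag) pairs of the same length
    """
    if len(pattern) != len(tokens):
        return False
    for (kind, regex), (word, tag) in zip(pattern, tokens):
        target = word if kind == "word" else tag
        if re.fullmatch(regex, target) is None:
            return False
    return True


# Checking the words themselves: covers exactly one pair of words.
by_word = [("word", "aal"), ("word", "aalenianos")]

# Checking the type of word: any singular common noun followed by a plural adjective.
by_type = [("postag", "NC.S.*"), ("postag", "AQ0.P.")]

first = [("aal", "NCMS000"), ("aalenianos", "AQ0MP0")]
second = [("aba", "NCFS000"), ("aalenianas", "AQ0FP0")]

print(matches(by_word, first), matches(by_word, second))
print(matches(by_type, first), matches(by_type, second))
```

The word-based pattern has to be repeated for every word pair, while the tag-based one covers every pair the dictionary knows about.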

Ruud

> But, how?
>
> Is there any documentation around with examples?
>
>
> On 08/07/2014 07:15, R.J. Baars wrote:
>> I think the best way is not to use Hunspell directly, but to use
>> Hunspell
>> to help you create a PT_PT postag dictionary.
>>
>> That will help, not just in the current rules, but in lots of others as
>> well. Postags are quite easy to use in the grammar xml.
>>
>> Marcin, Daniel, Am I correct in this?
>>
>> Ruud
>>
>


Re: Morphologic Analyser to solve concordance issue for Portuguese

2014-07-07 Thread Marco A.G.Pinto

But, how?

Is there any documentation around with examples?


On 08/07/2014 07:15, R.J. Baars wrote:

I think the best way is not to use Hunspell directly, but to use Hunspell
to help you create a PT_PT postag dictionary.

That will help, not just in the current rules, but in lots of others as
well. Postags are quite easy to use in the grammar xml.

Marcin, Daniel, Am I correct in this?

Ruud





Re: Morphologic Analyser to solve concordance issue for Portuguese

2014-07-07 Thread R.J. Baars
I think the best way is not to use Hunspell directly, but to use Hunspell
to help you create a PT_PT postag dictionary.

That will help, not just in the current rules, but in lots of others as
well. Postags are quite easy to use in the grammar xml.

Marcin, Daniel, Am I correct in this?

Ruud

> Yes, the pt_PT .DIC has lots of
>
> +CAT=adj,G=m,N=s
>
>
> but, I don't know how to code that into grammar.xml *yet* :-P
>
> I hope that Catalan does, so that I can see how it works?
>
> Thanks!
>
> Kind regards,
>   >Marco A.G.Pinto
> ---
>
> On 08/07/2014 07:00, R.J. Baars wrote:
>> Looks like you have been using the .dic-file mostly.
>>
>> When I look at the line in the aff that says:
>>
>> SFX r   ogia   ógico   logia   +CAT=adj,G=m,N=s
>>
>> I see that the suffix coded r in this case seems to be an adjective,
>> masculine, singular?
>> (I don't know any Portuguese, but I do know Hunspell.)
>>
>>
>> I am quite sure that this file can be used to tag a huge list of
>> Portuguese words.
>>
>> This way, a Portuguese tagging dictionary could be generated.
>>
>> One would need a big Portuguese word list (which I have), this affix
>> file, and more knowledge of Portuguese.
>>
>> Sounds feasible ...
>>
>> Ruud
>>
>>
>>
>>> Ruud!
>>>
>>> Thanks for your help.
>>>
>>> I have seen .AFF files since I am adding words to the en_GB for Mozilla
>>> and OpenOffice.
>>>
>>> But, basically all I do in en_GB is to add words and codes in front of
>>> them to generate more words, for example:
>>> *store/S* will generate:
>>> 1) store
>>> 2) stores
>>> and I have a user guide for each letter code.
>>>
>>> I noticed that the pt_PT Hunspell .DIC has lots of capital letters in
>>> front of the words+codes, after a kind of TAB character which is used
>>> to
>>> separate them.
>>>
>>> This is all I know :-P
>>>
>>> I guess I must try Catalan... I wanted to do it tomorrow but I have the
>>> dentist appointment in the morning and in the afternoon I will be at
>>> the
>>> university.
>>>
>>> I will try to have a look at it the moment I have some free time.
>>>
>>> Thank you all once again!
>>>
>>> Kind regards,
>>>  >Marco A.G.Pinto
>>>---
>>>
>>>
>>> On 07/07/2014 21:14, R.J. Baars wrote:
 There is some basic morphological data in the affix file of Hunspell.
 The Hunspell flags seem to be made on a word type basis.

 If that has been done correctly, postags could be derived from the
 flags ...
 It might be rough, but may also be just enough.

 If you never read an affix file, feel free to ask. Have a look at
 suffixes, these are probably the most useful.


 Ruud
>


Re: Morphologic Analyser to solve concordance issue for Portuguese

2014-07-07 Thread Marco A.G.Pinto

Yes, the pt_PT .DIC has lots of

+CAT=adj,G=m,N=s


but, I don't know how to code that into grammar.xml *yet* :-P

I hope that Catalan does, so that I can see how it works?

Thanks!

Kind regards,
 >Marco A.G.Pinto
   ---

On 08/07/2014 07:00, R.J. Baars wrote:

Looks like you have been using the .dic-file mostly.

When I look at the line in the aff that says:

SFX r   ogia   ógico   logia   +CAT=adj,G=m,N=s

I see that the suffix coded r in this case seems to be an adjective,
masculine, singular?
(I don't know any Portuguese, but I do know Hunspell.)


I am quite sure that this file can be used to tag a huge list of
Portuguese words.

This way, a Portuguese tagging dictionary could be generated.

One would need a big Portuguese word list (which I have), this affix
file, and more knowledge of Portuguese.

Sounds feasible ...

Ruud




Ruud!

Thanks for your help.

I have seen .AFF files since I am adding words to the en_GB for Mozilla
and OpenOffice.

But, basically all I do in en_GB is to add words and codes in front of
them to generate more words, for example:
*store/S* will generate:
1) store
2) stores
and I have a user guide for each letter code.

I noticed that the pt_PT Hunspell .DIC has lots of capital letters in
front of the words+codes, after a kind of TAB character which is used to
separate them.

This is all I know :-P

I guess I must try Catalan... I wanted to do it tomorrow but I have the
dentist appointment in the morning and in the afternoon I will be at the
university.

I will try to have a look at it the moment I have some free time.

Thank you all once again!

Kind regards,
 >Marco A.G.Pinto
   ---


On 07/07/2014 21:14, R.J. Baars wrote:

There is some basic morphological data in the affix file of Hunspell.
The Hunspell flags seem to be made on a word type basis.

If that has been done correctly, postags could be derived from the flags
...
It might be rough, but may also be just enough.

If you never read an affix file, feel free to ask. Have a look at
suffixes, these are probably the most useful.


Ruud




Re: Morphologic Analyser to solve concordance issue for Portuguese

2014-07-07 Thread R.J. Baars

Looks like you have been using the .dic-file mostly.

When I look at the line in the aff that says:

SFX r   ogia   ógico   logia   +CAT=adj,G=m,N=s

I see that the suffix coded r in this case seems to be an adjective,
masculine, singular?
(I don't know any Portuguese, but I do know Hunspell.)


I am quite sure that this file can be used to tag a huge list of
Portuguese words.

This way, a Portuguese tagging dictionary could be generated.

One would need a big Portuguese word list (which I have), this affix
file, and more knowledge of Portuguese.
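
To make the idea concrete, here is a rough Python sketch of how one such affix line could be turned into a tagged form. Several things are assumptions: the archived line runs the strip and add fields together, and the sketch reads them as "ogia" and "ógico"; the condition is treated as a plain word ending (real Hunspell conditions can contain character classes); the mapping from CAT/G/N to an adjective tag like AQ0MS0 is a guess modelled on the FreeLing tags; and the test word "biologia" and the helper names are made up.

```python
# The affix line, with strip and add fields assumed to be "ogia" and "ógico".
LINE = "SFX r ogia ógico logia +CAT=adj,G=m,N=s"


def parse_sfx(line):
    """Split one suffix rule into its fields and its morphological features."""
    kind, flag, strip, add, condition, morph = line.split()
    if kind != "SFX":
        raise ValueError("not a suffix rule: " + line)
    features = dict(item.split("=") for item in morph.lstrip("+").split(","))
    return {
        "flag": flag,
        "strip": "" if strip == "0" else strip,  # "0" means nothing to strip
        "add": add,
        "condition": condition,
        "features": features,
    }


def apply_sfx(rule, word):
    """Return (derived form, features), or None if the condition does not hold.

    Simplification: the condition is compared as a literal word ending.
    """
    if not word.endswith(rule["condition"]):
        return None
    stem = word[: len(word) - len(rule["strip"])] if rule["strip"] else word
    return stem + rule["add"], rule["features"]


def adjective_tag(features):
    """Guess an adjective tag (e.g. AQ0MS0) from CAT/G/N; other categories give None."""
    if features.get("CAT") != "adj":
        return None
    return "AQ0" + features["G"].upper() + features["N"].upper() + "0"


rule = parse_sfx(LINE)
form, features = apply_sfx(rule, "biologia")
print(form, adjective_tag(features))
```

Running every word/flag pair of the .dic file through rules like this would give a tagged word list, but each category needs its own mapping, which is where the knowledge of Portuguese comes in.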

Sounds feasible ...

Ruud



> Ruud!
>
> Thanks for your help.
>
> I have seen .AFF files since I am adding words to the en_GB for Mozilla
> and OpenOffice.
>
> But, basically all I do in en_GB is to add words and codes in front of
> them to generate more words, for example:
> *store/S* will generate:
> 1) store
> 2) stores
> and I have a user guide for each letter code.
>
> I noticed that the pt_PT Hunspell .DIC has lots of capital letters in
> front of the words+codes, after a kind of TAB character which is used to
> separate them.
>
> This is all I know :-P
>
> I guess I must try Catalan... I wanted to do it tomorrow but I have the
> dentist appointment in the morning and in the afternoon I will be at the
> university.
>
> I will try to have a look at it the moment I have some free time.
>
> Thank you all once again!
>
> Kind regards,
> >Marco A.G.Pinto
>   ---
>
>
> On 07/07/2014 21:14, R.J. Baars wrote:
>> There is some basic morphological data in the affix file of Hunspell.
>> The Hunspell flags seem to be assigned on a word-type basis.
>>
>> If that has been done correctly, postags could be derived from the flags
>> ...
>> It might be rough, but may also be just enough.
>>
>> If you have never read an affix file, feel free to ask. Have a look at the
>> suffixes, these are probably the most useful.
>>
>>
>> Ruud
>>
>>
>
>





Re: Morphologic Analyser to solve concordance issue for Portuguese

2014-07-07 Thread Marco A.G.Pinto

Ruud!

Thanks for your help.

I have seen .AFF files, since I add words to the en_GB dictionary for
Mozilla and OpenOffice.


But basically all I do in en_GB is add words with codes after them
to generate more words, for example:

*store/S* will generate:
1) store
2) stores
and I have a user guide for each letter code.
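As a small Python sketch of what happens behind an entry like store/S: the flag selects a group of suffix rules, and every rule whose condition matches the word produces one more form. The rule table below is a simplified assumption in the style of English Hunspell affix files, not a copy of the real en_GB .aff, and expand_entry is a made-up helper name.

```python
import re

# Illustrative plural rules for flag "S": (strip, add, condition regex).
# Assumed for this example; the real en_GB rules may differ.
S_RULES = [
    ("y", "ies", r"[^aeiou]y$"),
    ("", "s", r"[aeiou]y$"),
    ("", "es", r"[sxzh]$"),
    ("", "s", r"[^sxzhy]$"),
]


def expand_entry(entry: str) -> list:
    """Expand a .dic entry such as 'store/S' into the words it generates."""
    word, _, flags = entry.partition("/")
    forms = [word]
    if "S" in flags:
        for strip, add, condition in S_RULES:
            if re.search(condition, word):
                forms.append(word[: len(word) - len(strip)] + add)
    return forms


print(expand_entry("store/S"))  # ['store', 'stores']
print(expand_entry("city/S"))   # ['city', 'cities']
```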

I noticed that the pt_PT Hunspell .DIC has lots of capital letters
after the words+codes, following a kind of TAB character which is used to
separate them.


This is all I know :-P

I guess I must try Catalan... I wanted to do it tomorrow but I have the 
dentist appointment in the morning and in the afternoon I will be at the 
university.


I will try to have a look at it the moment I have some free time.

Thank you all once again!

Kind regards,
   >Marco A.G.Pinto
 ---


On 07/07/2014 21:14, R.J. Baars wrote:

There is some basic morphological data in the affix file of Hunspell. The
Hunspell flags seem to be assigned on a word-type basis.

If that has been done correctly, postags could be derived from the flags ...
It might be rough, but may also be just enough.

If you have never read an affix file, feel free to ask. Have a look at the
suffixes, these are probably the most useful.


Ruud







Re: Morphologic Analyser to solve concordance issue for Portuguese

2014-07-07 Thread R.J. Baars
There is some basic morphological data in the affix file of Hunspell. The
Hunspell flags seem to be assigned on a word-type basis.

If that has been done correctly, postags could be derived from the flags ...
It might be rough, but may also be just enough.

If you have never read an affix file, feel free to ask. Have a look at the
suffixes, these are probably the most useful.


Ruud




> I need to be able to check if the word after "o" or "os" in Portuguese
> is singular or plural.
>
> I could use a workaround if no solution is available, but even for a
> workaround I need some help regarding the message I sent last night:
> *Subject: Need help improving concordance rules*
> The code I provided in last night's message gives lots of errors when I
> use: *testrules pt*
>
> Thanks for your help!
>
> Kind regards,
>   >Marco A.G.Pinto
> ---
>
>
> On 07/07/2014 17:45, R.J. Baars wrote:
>> What do you really need?
>>
>>> Hello!
>>>
>>> This is still the *"o"*->*"os"* and *"os"*->*"o"* issue which is
>>> generating too many false positives.
>>>
>>> I was thinking about a way of adding possible exceptions to the XML in
>>> order to fix most of the false positives.
>>>
>>> I contacted my Minho University friends and they replied:
>>> *"**the correct way of achieving that would be to have access to a
>>> **morphologic **analyser and accessing the Hunspell dictionary, if it
>>> was possible for the Hunspell dictionary to have such information
>>> regarding the number of substantives/adjectives.**"*
>>>
>>> Daniel Naber told me this morning:
>>> *"I suggest you have a look at the hunspell files to see if the
>>> information is in there. If it's not, there might be other resources on
>>> the internet with this information. "*
>>>
>>> Can someone help me?
>>>
>>> What should I do?
>>>
>>> Thanks!
>>>
>>> Kind regards,
>>> >Marco A.G.Pinto
>>>   ---
>





Re: Morphologic Analyser to solve concordance issue for Portuguese

2014-07-07 Thread Jim O'Regan
On 7 July 2014 18:12, Marcin Miłkowski  wrote:
> On 2014-07-07 16:25, Marco A.G.Pinto wrote:
>> Hello!
>>
>> This is still the *"o"*->*"os"* and *"os"*->*"o"* issue which is
>> generating too many false positives.
>>
>> I was thinking about a way of adding possible exceptions to the XML in
>> order to fix most of the false positives.
>>
>> I contacted my Minho University friends and they replied:
>> *"**the correct way of achieving that would be to have access to a
>> **morphologic **analyser and accessing the Hunspell dictionary, if it
>> was possible for the Hunspell dictionary to have such information
>> regarding the number of substantives/adjectives.**"*
>>
>> Daniel Naber told me this morning:
>> *"I suggest you have a look at the hunspell files to see if the
>> information is in there. If it's not, there might be other resources on
>> the internet with this information. "*
>>
>> Can someone help me?
>
> Actually, you might try to see what results you get with our Galician
> module. I believe it comes from FreeLing, and FreeLing also has a big
> Portuguese (probably Brazilian) analyzer we could reuse.

It's pt_PT (FreeLing is an Iberian project, after all), but they have a
version that's adapted to pt_BR (see
http://gramatica.usc.es/pln/tools/freeling.html).

-- 
 Are any of the mentors around?
 yes, they're the ones trolling you



Re: Morphologic Analyser to solve concordance issue for Portuguese

2014-07-07 Thread Marcin Miłkowski
On 2014-07-07 16:25, Marco A.G.Pinto wrote:
> Hello!
>
> This is still the *"o"*->*"os"* and *"os"*->*"o"* issue which is
> generating too many false positives.
>
> I was thinking about a way of adding possible exceptions to the XML in
> order to fix most of the false positives.
>
> I contacted my Minho University friends and they replied:
> *"**the correct way of achieving that would be to have access to a
> **morphologic **analyser and accessing the Hunspell dictionary, if it
> was possible for the Hunspell dictionary to have such information
> regarding the number of substantives/adjectives.**"*
>
> Daniel Naber told me this morning:
> *"I suggest you have a look at the hunspell files to see if the
> information is in there. If it's not, there might be other resources on
> the internet with this information. "*
>
> Can someone help me?

Actually, you might try to see what results you get with our Galician
module. I believe it comes from FreeLing, and FreeLing also has a big
Portuguese (probably Brazilian) analyzer we could reuse. But you might
start your experiments with Galician.

See:

http://nlp.lsi.upc.edu/freeling/index.php?option=com_content&task=view&id=23&Itemid=58

Regards,
Marcin



Re: Morphologic Analyser to solve concordance issue for Portuguese

2014-07-07 Thread Marco A.G.Pinto
I need to be able to check if the word after "o" or "os" in Portuguese 
is singular or plural.
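To illustrate the kind of check meant here, a minimal Python sketch: look up the number of the word after "o" or "os" and report a mismatch only when the lexicon knows the word. The mini-lexicon is hypothetical and stands in for a real tagger dictionary; the sketch also ignores that "o" and "os" can be pronouns, which a real rule would have to handle.

```python
# Hypothetical mini-lexicon: word -> possible number values ("s" or "p").
NUMBER = {
    "amigo": {"s"},
    "amigos": {"p"},
    "lápis": {"s", "p"},  # invariable noun: same form in singular and plural
}

ARTICLE_NUMBER = {"o": "s", "os": "p"}


def agreement_errors(tokens):
    """Return (index, article, word) for each article/noun number mismatch."""
    errors = []
    for i in range(len(tokens) - 1):
        expected = ARTICLE_NUMBER.get(tokens[i].lower())
        numbers = NUMBER.get(tokens[i + 1].lower())
        if expected is None or numbers is None:
            continue  # not an article, or unknown word: stay silent
        if expected not in numbers:
            errors.append((i, tokens[i], tokens[i + 1]))
    return errors


print(agreement_errors("Os amigo chegaram".split()))   # [(0, 'Os', 'amigo')]
print(agreement_errors("Os amigos chegaram".split()))  # []
```

Staying silent on unknown words trades missed errors for fewer false positives, which is the problem with the current rules.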


I could use a workaround if no solution is available, but even for a 
workaround I need some help regarding the message I sent last night:

*Subject: Need help improving concordance rules*
The code I provided in last night's message gives lots of errors when I 
use: *testrules pt*


Thanks for your help!

Kind regards,
 >Marco A.G.Pinto
   ---


On 07/07/2014 17:45, R.J. Baars wrote:

What do you really need?


Hello!

This is still the *"o"*->*"os"* and *"os"*->*"o"* issue which is
generating too many false positives.

I was thinking about a way of adding possible exceptions to the XML in
order to fix most of the false positives.

I contacted my Minho University friends and they replied:
*"**the correct way of achieving that would be to have access to a
**morphologic **analyser and accessing the Hunspell dictionary, if it
was possible for the Hunspell dictionary to have such information
regarding the number of substantives/adjectives.**"*

Daniel Naber told me this morning:
*"I suggest you have a look at the hunspell files to see if the
information is in there. If it's not, there might be other resources on
the internet with this information. "*

Can someone help me?

What should I do?

Thanks!

Kind regards,
>Marco A.G.Pinto
  ---




Re: Morphologic Analyser to solve concordance issue for Portuguese

2014-07-07 Thread R.J. Baars
What do you really need?

> Hello!
>
> This is still the *"o"*->*"os"* and *"os"*->*"o"* issue which is
> generating too many false positives.
>
> I was thinking about a way of adding possible exceptions to the XML in
> order to fix most of the false positives.
>
> I contacted my Minho University friends and they replied:
> *"**the correct way of achieving that would be to have access to a
> **morphologic **analyser and accessing the Hunspell dictionary, if it
> was possible for the Hunspell dictionary to have such information
> regarding the number of substantives/adjectives.**"*
>
> Daniel Naber told me this morning:
> *"I suggest you have a look at the hunspell files to see if the
> information is in there. If it's not, there might be other resources on
> the internet with this information. "*
>
> Can someone help me?
>
> What should I do?
>
> Thanks!
>
> Kind regards,
>>Marco A.G.Pinto
>  ---
>
>





Morphologic Analyser to solve concordance issue for Portuguese

2014-07-07 Thread Marco A.G.Pinto

Hello!

This is still the *"o"*->*"os"* and *"os"*->*"o"* issue which is 
generating too many false positives.


I was thinking about a way of adding possible exceptions to the XML in
order to fix most of the false positives.


I contacted my Minho University friends and they replied:
*"the correct way of achieving that would be to have access to a
morphologic analyser and accessing the Hunspell dictionary, if it
was possible for the Hunspell dictionary to have such information
regarding the number of substantives/adjectives."*


Daniel Naber told me this morning:
*"I suggest you have a look at the hunspell files to see if the 
information is in there. If it's not, there might be other resources on 
the internet with this information. "*


Can someone help me?

What should I do?

Thanks!

Kind regards,
  >Marco A.G.Pinto
---

