Re: [Apertium-stuff] English-Santali Plural form not working

2021-12-20 Thread Ramansh Sharma
Someone please unsubscribe me from this mailing list.

--
Ram


On Tue, Dec 21, 2021 at 12:55 PM Kevin Brubeck Unhammer wrote:

> Make sure you compile (`make -j langs`) before testing, and preferably
> before each commit as well. The version on github doesn't compile right
> now:
>
> apertium-eng-sat.eng-sat.t1x:69: element rule: validity error :
> Element rule content does not follow the DTD, expecting (pattern , action),
> got (pattern action rule )
>
> You could check the various stages of the pipeline to see where the
> problem happens. Do `ls modes/eng-sat*` to see the eng-sat-related
> debug modes, you can do e.g.
>
> echo Gregor woke up | apertium -d . eng-sat-lex
>
> to see up until lexical selection (typically the step right before
> transfer).
>
> Prasanta Hembram wrote:
>
> > Thanks for the clarification, sir. I tried changing my rules and pardef,
> > but with no luck: no forms that use a pardef work in the bilingual
> > English-Santali dictionary; they only work in the monolingual dictionary.
> > The form is detected as plural in the bilingual dictionary, but the noun
> > should then be inflected as ᱰᱟᱹᱝᱜᱽᱨᱤ ᱠᱚ for "cows". I can't find where the
> > problem lies, since in the monolingual Santali dictionary it works fine.
> > Deleting the dual form did not help either. I'm following this book while
> > trying to write the rules :-). Luckily I found it on the Internet.
> >  OCRA-Glimpes-of-Santali-Grammar.pdf
> > <https://drive.google.com/file/d/1y7y4-YtkL1tIV29Ri_swY6u1RbTA2HT4/view?usp=drive_web>
> > with best regards
> > Prasanta Hembram
___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff


Re: [Apertium-stuff] English-Santali Plural form not working

2021-12-20 Thread Kevin Brubeck Unhammer
Make sure you compile (`make -j langs`) before testing, and preferably
before each commit as well. The version on GitHub doesn't compile right
now:

apertium-eng-sat.eng-sat.t1x:69: element rule: validity error : Element 
rule content does not follow the DTD, expecting (pattern , action), got 
(pattern action rule )

You can check the various stages of the pipeline to see where the
problem happens. Run `ls modes/eng-sat*` to list the eng-sat-related
debug modes; then you can do e.g.

echo Gregor woke up | apertium -d . eng-sat-lex

to see the output up to lexical selection (typically the step right
before transfer).
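
The stage-by-stage check can also be scripted. The following is only a
minimal sketch: it assumes it is run from the language-pair directory
after compiling, that `apertium` is on the PATH, and that the debug
modes are the `modes/eng-sat*.mode` files matched by the `ls` pattern.
The helper names `debug_modes`, `run_mode` and `show_pipeline` are made
up for illustration.

```python
import subprocess
from pathlib import Path


def debug_modes(modes_dir, pair="eng-sat"):
    """Return mode names for `pair`, e.g. 'eng-sat-lex' for eng-sat-lex.mode.

    The names are sorted alphabetically, which is NOT the pipeline order.
    """
    return sorted(p.stem for p in Path(modes_dir).glob(f"{pair}*.mode"))


def run_mode(text, mode, pair_dir="."):
    """Pipe `text` through `apertium -d <pair_dir> <mode>` and return stdout."""
    result = subprocess.run(
        ["apertium", "-d", pair_dir, mode],
        input=text + "\n",
        capture_output=True,
        text=True,
        check=False,  # a failing stage should be shown, not abort the loop
    )
    return result.stdout.strip()


def show_pipeline(text, pair_dir="."):
    """Print the output of every debug mode so the failing stage stands out."""
    for mode in debug_modes(Path(pair_dir) / "modes"):
        print(f"{mode}: {run_mode(text, mode, pair_dir)}")
```

Calling `show_pipeline("cows")` from the pair directory prints one line
per mode; the first mode whose output loses the expected information is
the stage to look at.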

Prasanta Hembram wrote:

> Thanks for the clarification, sir. I tried changing my rules and pardef,
> but with no luck: no forms that use a pardef work in the bilingual
> English-Santali dictionary; they only work in the monolingual dictionary.
> The form is detected as plural in the bilingual dictionary, but the noun
> should then be inflected as ᱰᱟᱹᱝᱜᱽᱨᱤ ᱠᱚ for "cows". I can't find where the
> problem lies, since in the monolingual Santali dictionary it works fine.
> Deleting the dual form did not help either. I'm following this book while
> trying to write the rules :-). Luckily I found it on the Internet.
>  OCRA-Glimpes-of-Santali-Grammar.pdf
> <https://drive.google.com/file/d/1y7y4-YtkL1tIV29Ri_swY6u1RbTA2HT4/view?usp=drive_web>
> with best regards
> Prasanta Hembram




Re: [Apertium-stuff] Questions about lexical selection

2021-12-20 Thread Hèctor Alòs i Font
On Tue, 21 Dec 2021 at 7:57, Daniel Swanson wrote:

> Hi Greg,
>
> The file where you want to write rules for this is
> https://github.com/apertium/apertium-pol/blob/master/apertium-pol.pol.rlx
>
> If you want something like "tacy is a determiner before a noun", you could
> get that with
>
> SELECT DET IF (0 DET) (0 NOUN) (1 NOUN) ;
>

The problem with this rule is that the word matched by (1 NOUN) is not
necessarily a noun, only something that can be analysed as a noun at the
moment the rule is executed. Similarly, the word at position 0 may be
correctly analysed as something else, such as an adjective. So a more
cautious rule could be, for instance:

REMOVE NOUN IF (0 DET) (0 NOUN) (1C NOUN) ;

The drawback of this variant is that it matches less often than the first
one. It may not solve cases that Daniel's version solves, although it
probably makes fewer wrong decisions. Your knowledge of the language, and
testing on a corpus, should help you decide which is better, or you may
settle on something in between. You can fine-tune by adding a few rules,
placed before the general one, for frequent words or cases.
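
To make the difference between (1 NOUN) and (1C NOUN) concrete, here is
a toy Python model of the two rules. It is only a sketch of the matching
logic, not how CG-3 is implemented: a cohort is modelled as a list of
readings, each reading as a set of tags, and the function names and the
example analyses of "tacy" and the following word are invented for
illustration.

```python
def has(cohort, tag):
    """Plain context such as `1 NOUN`: at least one reading carries the tag."""
    return any(tag in reading for reading in cohort)


def has_careful(cohort, tag):
    """Careful context such as `1C NOUN`: every reading carries the tag."""
    return bool(cohort) and all(tag in reading for reading in cohort)


def select_det(cohorts, i):
    """Toy version of: SELECT DET IF (0 DET) (0 NOUN) (1 NOUN) ;"""
    nxt = cohorts[i + 1] if i + 1 < len(cohorts) else []
    if has(cohorts[i], "DET") and has(cohorts[i], "NOUN") and has(nxt, "NOUN"):
        return [r for r in cohorts[i] if "DET" in r]  # keep only DET readings
    return cohorts[i]


def remove_noun(cohorts, i):
    """Toy version of: REMOVE NOUN IF (0 DET) (0 NOUN) (1C NOUN) ;"""
    nxt = cohorts[i + 1] if i + 1 < len(cohorts) else []
    if has(cohorts[i], "DET") and has(cohorts[i], "NOUN") and has_careful(nxt, "NOUN"):
        return [r for r in cohorts[i] if "NOUN" not in r]  # drop NOUN readings
    return cohorts[i]


# Hypothetical analyses: "tacy" is three-way ambiguous.
tacy = [{"DET"}, {"ADJ"}, {"NOUN"}]
next_ambiguous = [{"NOUN"}, {"VERB"}]  # may or may not be a noun
next_clear = [{"NOUN"}]                # unambiguously a noun

print(select_det([tacy, next_ambiguous], 0))
print(remove_noun([tacy, next_ambiguous], 0))
print(remove_noun([tacy, next_clear], 0))
```

In this model the SELECT rule commits to the determiner reading even
when the next word is only possibly a noun, while the REMOVE rule leaves
"tacy" untouched in that case and, when the next word is unambiguously a
noun, drops only the noun reading and keeps the adjective one.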

Hèctor




Re: [Apertium-stuff] Questions about lexical selection

2021-12-20 Thread Daniel Swanson
Hi Greg,

The file where you want to write rules for this is
https://github.com/apertium/apertium-pol/blob/master/apertium-pol.pol.rlx

If you want something like "tacy is a determiner before a noun", you could get that with

SELECT DET IF (0 DET) (0 NOUN) (1 NOUN) ;

Daniel



[Apertium-stuff] Questions about lexical selection

2021-12-20 Thread Grzegorz Kulik
Hello all,

I haven't contacted you for some time; I hope you are all well. I
developed the pol-szl pair and although the translation is quite
reasonable, I decided to make it better by improving the lexical
selection. I've been reading the documentation and managed to write
several rules for forms that need disambiguation and are the same parts
of speech. However, I cannot find any information anywhere about what
to do if there is a form that can mean two completely different things.
Example in Polish:

tacy (such) = taki
tacy (of a tablet) = taca/taca/taca

The first meaning is obviously much more frequent but the translator
chooses the second one, which is less than desirable.

What can I do to remedy this? Can I write rules for that manually?
Should I train the tagger? If so, what method would be the best?
There are multiple training methods and I don't know which one to choose
for my pair. Could you recommend the best approach?

Thank you in advance
Greg


Re: [Apertium-stuff] English-Santali Plural form not working

2021-12-20 Thread Prasanta Hembram
Thanks for the clarification, sir. I tried changing my rules and pardef,
but with no luck: no forms that use a pardef work in the bilingual
English-Santali dictionary; they only work in the monolingual dictionary.
The form is detected as plural in the bilingual dictionary, but the noun
should then be inflected as ᱰᱟᱹᱝᱜᱽᱨᱤ ᱠᱚ for "cows". I can't find where the
problem lies, since in the monolingual Santali dictionary it works fine.
Deleting the dual form did not help either. I'm following this book while
trying to write the rules :-). Luckily I found it on the Internet.
 OCRA-Glimpes-of-Santali-Grammar.pdf
<https://drive.google.com/file/d/1y7y4-YtkL1tIV29Ri_swY6u1RbTA2HT4/view?usp=drive_web>
with best regards
Prasanta Hembram



On Sun, Dec 19, 2021 at 11:59 PM Hèctor Alòs i Font wrote:

> Hi Prasanta,
>
> I'm delighted you are working on Santali.
>
> I have looked at your code on GitHub and seen nothing that could
> generate this kind of error. So I downloaded your code, fixed the
> eng-sat.t1x file because it has a syntax error, and ran the
> translation of "cows"... but I can't get anything, seemingly because of a
> problem in the postchunk module (although it is standard):
>
> $ echo "cow" | apertium -d . eng-sat-interchunk
> ^ᱰᱟᱹᱝᱜᱽᱨᱤ$^sent{^᱾$}$
> $ echo "cow" | apertium -d . eng-sat-postchunk
> ^᱾$
>
> In any case, if you somehow get something of the type "X/Y" (like
> "ᱰᱟᱹᱝᱜᱽᱨᱤ ᱠᱤᱱ/ ᱰᱟᱹᱝᱜᱽᱨᱤ ᱠᱚ") this is because in the target dictionary there
> are two generations for  ᱰᱟᱹᱝᱜᱽᱨᱤ. Probably you have:
>
> 
>   
>  ᱠᱤᱱ 
>  ᱠᱚ 
> 
>
> instead of:
>
> 
>   
>  ᱠᱤᱱ 
>  ᱠᱚ 
> 
>
> Regards,
> Hèctor
>
>
> On Sun, 19 Dec 2021 at 19:09, Prasanta Hembram wrote:
>
>> Hi, I'm working on a new language pair, English-Santali, and trying
>> to learn every day how I can improve it. Today I have a few doubts.
>>
>> Doubt 1: Plural rules are not working when translating from English to
>> Santali. The English plural form "Cows" returns the output "ᱰᱟᱹᱝᱜᱽᱨᱤ" instead
>> of "ᱰᱟᱹᱝᱜᱽᱨᱤ ᱠᱤᱱ/ ᱰᱟᱹᱝᱜᱽᱨᱤ ᱠᱚ". I have set up a pardef, but no luck. How do
>> I get the correct output?
>>
>> The correct forms are as follows:
>> Cow = ᱰᱟᱹᱝᱜᱽᱨᱤ
>> Cows (dual, i.e. two cows) = ᱰᱟᱹᱝᱜᱽᱨᱤ ᱠᱤᱱ
>> Cows (plural) = ᱰᱟᱹᱝᱜᱽᱨᱤ ᱠᱚ
>>
>> My Santali Monolingual Dictionary link:
>> https://github.com/Prasanta-Hembram/apertium-sat
>>
>> English-Santali Bilingual Dictionary link :
>> https://github.com/Prasanta-Hembram/apertium-eng-sat
>>
>> echo "cows" | apertium -d . eng-sat-transfer
>>
>> returns wrong output: ^ᱰᱟᱹᱝᱜᱽᱨᱤ$^sent{^᱾$}$
>>
>>
>>
>>
>>
>>
>> 
>>  
>>ᱠᱤᱱ  
>>ᱠᱚ 
>> 
>>
>> --
>> Thanks
>> with best regards
>> Prasanta Hembram


-- 
Thanks
with best regards
Prasanta Hembram