from:"Sevilay Bayatlı"

Re: [Apertium-stuff] GSoC 2023 Mentors & Ideas?

2023-01-29 Thread Sevilay Bayatlı

I am willing to mentor too, and I would like to add a few project ideas to
the list.

Sevilay

On Sun, Jan 29, 2023 at 12:20 AM Jonathan Washington <
jonathan.n.washing...@gmail.com> wrote:

> I'm also happy to mentor!  (And help with recursive transfer:)  I made
> sure my name was on all the projects I'd like to mentor.
>
> --
> Jonathan
>
> On Fri, 27 Jan 2023 at 22:29, Hèctor Alòs i Font wrote:
> >
> > On Fri, 27 Jan 2023 at 23:41, Kevin Brubeck Unhammer wrote:
> >>
> >> > As far as rewriting the
> >> > transfer rules using apertium-recursive is concerned, a co-mentor with
> >> > experience in the module would be highly desirable.
> >>
> >> I can try to assist :)
> >
> >
> > Great, thanks, Kevin! It now remains to be seen whether all the
> > conditions are in place for there to be a solid proposal in this sense
> > (starting with Apertium being chosen by Google this year).
> >
> > Hèctor
> > ___
> > Apertium-stuff mailing list
> > Apertium-stuff@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/apertium-stuff

Re: [Apertium-stuff] Contributing to a new Language Pair

2022-02-05 Thread Sevilay Bayatlı

Hi Yash Gupta,

Here are a few websites you can start with. For quick questions and
answers, join us on IRC.

https://wiki.apertium.org/wiki/Installation

https://wiki.apertium.org/wiki/Main_Page

https://github.com/apertium


Best,

Sevilay




On Sat, Feb 5, 2022 at 3:24 PM Yash Gupta  wrote:

> Hello out there,
>
> I am new to the community. I have looked around a bit, and I am highly
> interested in contributing towards a new English-Marwadi language pair.
>
> It would be wonderful if I could get some help in understanding how I
> should begin.
>
> Regards

Re: [Apertium-stuff] Proposal: A Research in Semantic Domains and Arabic Collocations

2022-01-29 Thread Sevilay Bayatlı

Hi Anas,

I am happy to help you. I am a native Arabic speaker and also work as an
academic. Let me know more about your work.

Sevilay

On Sat, Jan 29, 2022 at 3:16 PM  wrote:

> Dear sirs,
>
> I'm interested in submitting a doctoral thesis in semantic domains and
> Arabic collocations, hoping it will be a contribution to the Apertium project.
>
> I'm looking for an Apertium developer in an academic context that would
> be in a position to be able to supervise my work.
>
> Maybe it's recommended also to be a native Arabic speaker, to evaluate
> the Arabic collocations sub project.
>
> Best regards.
>
> — Anas R.
>
> https://richstyle.org/

[Apertium-stuff] Fwd: [GSoC Mentors] GSoC 2022 Org Applications open Feb 7-21

2022-01-10 Thread Sevilay Bayatlı

-- Forwarded message -
From: 'sttaylor' via Google Summer of Code Mentors List <
google-summer-of-code-mentors-l...@googlegroups.com>
Date: Mon, Jan 10, 2022 at 10:16 AM
Subject: [GSoC Mentors] GSoC 2022 Org Applications open Feb 7-21
To: Google Summer of Code Mentors List <
google-summer-of-code-mentors-l...@googlegroups.com>

Happy New Year!

We are excited to announce the full GSoC 2022 timeline. With
Organization applications opening in a few weeks, February 7 - 21, we
wanted to remind everyone of the important changes to think about when
considering your application.

   1. Be sure to have both ~175 hour projects and ~350 hour projects that you
      would like GSoC contributors to work on included in your Ideas List.
   2. Reach out to your community members to ask if they would like to be
      mentors for the program. This year, with the option of extending projects
      for up to 22 weeks, we hope some of your community members that weren’t
      able to participate before will be able to make the more flexible
      schedule work for them.

Having a thorough and well thought out list of Project Ideas is the most
important part of your application.

You can check out the 2022 GSoC Org Application Short Answer Questions if
you’d like to start preparing your responses in advance so you can copy
over your answers once the org applications open February 7 at g.co/gsoc -
there are a couple of different questions this year.

Please encourage other open source orgs to apply -- if you know of other
open source projects that may be interested in applying to GSoC as a first
time org please remind them to check out the available resources below and
have them put your org (or you) down as a reference when they apply.

Open source projects can apply to be
mentoring organizations from February 7 - 21 at 1800 UTC.

Resources:

Mentor Guide

Timeline

FAQs

Roles and Responsibilities

Marketing Materials (slide deck, flyers)

Videos

Best,

Stephanie

GSoC Program Admin


Re: [Apertium-stuff] GSoC

2021-04-18 Thread Sevilay Bayatlı

Hi Kamush,

You need to work on the coding challenge as soon as possible, which
consists of:

   - Installing Apertium
   - Initializing the kaz-uzb pair
   - Sending a first PR that can translate a small sample text.

As I saw in your work plan, you are going to collect some resources such as
parallel corpora. Why is that? Is this just for testing?
Another point: you are going to build a kaz-uzb system, so your main focus
should be on transfer rules, in addition to the bidix and lexical selection
rules. However, as I saw, you are going to spend just one or two weeks on
transfer rules, and this is not enough. It would be better to work on all
of them simultaneously.

Sevilay


On Thu, Apr 15, 2021 at 7:39 PM Barno Kutlimuratova <
kutlimuratovab0...@gmail.com> wrote:

> Assalamu Aleikum Sevilay bayatli,
>
> I am also grateful for the quick contact and your will to mentor me.
>
> Could you please have a look at my wiki proposal page now? I have just
> created the work plan, but it needs to be reviewed.
>
> So, the major distribution of my work plan is as follows:
> * The first week I devoted to improving the Uzbek lexicon (monolingual
> data), as I can see that there are so many stems repeated (ones created by
> a bot?) and many of them have to be corrected.
>  * I cannot do the same for Kazakh, as I'm not proficient in Kazakh,
> and it looks pretty clean too.
> * The next three weeks I devoted to creating the kaz-uzb bilingual dictionary
> and lexical selection rules;
> * The three weeks after that should be enough for transfer rules;
> * The last two weeks are for creating testvoc.
>
> If there is any part that I might have missed (most probably I did),
> please let me know.
> Also, if you think that time distribution is well-distributed, could you
> please help me organise it?
>
> Best,
>
> Kamush
>
>
>
>
> On Thu, 15 Apr 2021 at 16:04, Sevilay Bayatlı 
> wrote:
>
>> Hi Kamush,
>>
>> Please upload your work plan here
>> https://wiki.apertium.org/wiki/User:Kamush/GSoC2021Proposal as soon as
>> possible. I am happy to mentor you, but first, we want to see your work
>> plan.
>>
>> Sevilay
>>
>>

Re: [Apertium-stuff] GSOC proposal draft - Create a usable version of these language pair: English--Igbo

2021-04-10 Thread Sevilay Bayatlı

Hi Okonkwo,

How many words will you be able to add to the monodix (and how many to
the bidix), and what is your WER goal?

Do you think 1 week is enough to work on transfer rules?

Another thing: as I understand from your proposal, your main focus is on
the bilingual dictionary. The monodix needs more focus, otherwise you can't
get good results; alternatively, you have to work on both simultaneously.


Sevilay






On Fri, Apr 9, 2021 at 12:42 PM Okonkwo Ifeanyichukwu <
ifeanyijaspe...@gmail.com> wrote:

> My name is Okonkwo Ifeanyichukwu, a final year student at the University of
> Buea. I am interested in participating in GSoC 2021, on the project -
> "Create a usable version of these language pair: English--Igbo".
>
> I am planning to build an English(eng)-Ibo(ibo) MT pair. I have added some
> ibo words to the ibo pair and have a pull request open. I have done some of
> the coding challenge, but it still needs improvement. I will upload the
> translated story to Github with my work on the GitHub repository mentioned
> in the proposal draft. It would be of great help if I could get some
> feedback before I make the final submission.
>
>  Link to my proposal draft:
> https://docs.google.com/document/d/1iK_9VTqb5ZHH1bEjl5UAqBP77ijaNKm6p4HIT2JO5qk/edit?usp=sharing
>
> Sincerely,
> Okonkwo
> IRC: Ifeanyi

Re: [Apertium-stuff] GSOC proposal draft - building a prototype MT system

2021-04-09 Thread Sevilay Bayatlı

Hi Anuradha,
You need to update your proposal based on what Hèctor suggested. Yes, it is
better to work on both the monodix and the bidix simultaneously, but for a
good lexicon you need to take a small corpus, analyse the sentences, and
add the words.

Sevilay

On Thu, Apr 8, 2021 at 9:24 AM Anuradha Pandey 
wrote:

> Thank you for your response, Hèctor. I read the proposal for the
> Hindi-Bengali translator. There aren't open-source dictionaries for the
> Bhojpuri language (though there are resources for getting a Bhojpuri
> corpus), so I was using a hardcopy of a BHO-HIN dictionary for manually
> adding the pairs. I did some rough calculations, and I shall be able to add
> at least 8,000 words to the monodix. And, based on my experience with
> Apertium, I think simultaneously adding words in the bidix makes the work
> easier, so I think roughly the same number of words in the bidix too. But,
> I don't think I will be able to achieve a WER below 20% with 8000 words.
> Should I aim for a WER of nearly 30% then?
>
> Since the time for GSoC has been reduced, I am planning to modify my
> proposal and the inputs from mentors would be extremely helpful.
>
> On Wed, 7 Apr 2021 at 20:24, Hèctor Alòs i Font 
> wrote:
>
>> Hi, Anuradha.
>>
>> Thanks for your proposal draft. First, I would like to tell you that if
>> Apertium is a rule-based translation system, it is because this paradigm
>> still makes sense for many languages (indeed, for the vast majority of
>> them). If Bhojpuri has extensive electronic language resources and,
>> particularly, bilingual linguistic corpora, then Apertium is probably not
>> the best approach. But this is probably not the case. If it was, it would
>> probably already be on Google Translate.
>>
>> As for the project. I would advise you to look at Gourab Chakraborty's
>> proposal for a Hindi-Bengali translator and the comments on it. Most of the
>> comments apply to your proposal as well. The following message would be
>> useful to you, for instance:
>> https://sourceforge.net/p/apertium/mailman/message/37251899/
>>
>> Your proposal seems to me unrealistic. 10,000 words in the monodix (and
>> how many in the bidix?) are not enough for a WER below 20%, I think (except
>> maybe for two extremely closely related languages).
>>
>> To better evaluate your proposal, I'd like to find the answers to some
>> basic questions:
>>
>> * What is the current state of the Bhojpuri language and, eventually,
>> the Bhojpuri-Hindi language pair in Apertium?
>> * Would you have to write a whole Bhojpuri morphological analyser from
>> scratch and, afterwards, add some 10,000 words, manually assigning them
>> to a given paradigm? How much time will you need for this?
>> * Where would you get the bilingual dictionary from? Would you have to
>> create it yourself? Are there freely available bilingual electronic
>> dictionaries (like e.g. Wiktionary)?
>> * Would you work on a Bhojpuri-to-Hindi translator or on a
>> Hindi-to-Bhojpuri one? In any case there will be quite a lot of work in
>> the morphological disambiguation. But for one side you'll have it only
>> once. If both Hindi-to-Bhojpuri and Hindi-to-Bengali are chosen (which is
>> entirely possible), this work can be divided between the two projects.
>>
>> There is nothing wrong with doing all this work by hand, if needed. It
>> depends on the state of the language resources for the given language. But
>> it is necessary to know to what extent you will have to do this
>> time-consuming work.
>>
>> Even when we had twice the time, in most cases the projects did not
>> manage to create a working translator for a new language pair. In the
>> current conditions, it is even more difficult.
>>
>> Hèctor
>>
>>
>>
>>
>> On Wed, 7 Apr 2021 at 16:28, Anuradha Pandey wrote:
>>
>>> Hello everyone,
>>> I am Anuradha Pandey, a sophomore student at BITS Pilani. I am
>>> interested in participating in GSoC 2021, on the project - "*Develop a
>>> prototype MT system for a strategic language pair*".
>>>
>>> I have prepared a rough draft for the same and I am planning to build
>>> Bhojpuri(BHO)-Hindi(HIN) MT pair. I am improving my translation system for
>>> the coding challenge and I will update my work on the GitHub repository
>>> mentioned in the draft. It would be really helpful if I could get some
>>> feedback before I make the final submission.
>>>
>>> Link to the draft -
>>>
>>> https://docs.google.com/document/d/1U19gJ3TMKYkYsp-FRthrvXkCRJUnNYSYKi46XhvZGOE/edit?usp=sharing
>>>
>>> Thanks & Regards,
>>> Anuradha Pandey
>>> IRC: Anuradha_Pandey

Re: [Apertium-stuff] comments sought on Iraqi Türkman ISO code

2020-10-02 Thread Sevilay Bayatlı

Dear Apertiumers,

In addition to what Jonathan said, Iraqi Türkman is a third official
language in Iraq. In Turkey and in many Turkic countries, Iraqi Türkman
poems, songs, fiction and stories have attracted a lot of interest and
admiration. So, I believe ISO 639-3 registration of the language will be of
great benefit to the Iraqi Türkman people. People can study in their mother
tongue, and this will help them develop more easily in different areas,
whether in the language field or in other fields. It will also unite the
people there in maintaining their own culture and customs :)


Your support by commenting will be appreciated.

Sincerely,

Sevilay



On Fri, Oct 2, 2020 at 8:28 PM Jonathan Washington <
jonathan.n.washing...@gmail.com> wrote:

> Dear colleagues (apologies for cross-posting),
>
> Sevilay (CCed) and I have submitted an application to the ISO 639-3
> registrar for a new three-letter code for Sevilay's native language,
> Iraqi Türkman, to be added to the standard:
> https://iso639-3.sil.org/request/2020-039
>
> The registration authority is currently accepting comments from the
> public (until December 15th), which are taken into consideration when
> the decision is made to approve the request or not.  We would like to
> ask you to consider submitting a comment.
>
> Because of how the world works, an ISO code is the next step towards
> recognition of the existence of the language among academics and
> industry.  Hence it is also a major prerequisite for providing access
> to language technology, which in turn has the potential to reinforce
> continued use and intergenerational transmission of the language.
>
> One concern those reviewing the application might have is the
> similarity of the language to other Western Oghuz varieties, like
> Turkish and Azerbaycani.  This is a valid concern—there is some level
> of mutual intelligibility of the spoken varieties, and many speakers
> of Iraqi Türkman do have some level of exposure to Turkish.  However,
> the varieties are linguistically rather divergent, and there are
> distinct literary traditions.  Furthermore, official classification of
> Iraqi Türkman as a dialect of Turkish (i.e., denial of the application
> along these lines) runs the risk of denying speakers of Iraqi Türkman
> access to materials in their own language, whether already existing or
> yet to be created.
>
> Please feel free to contact Sevilay and/or me with any questions about
> any of this.
>
> --
> Jonathan

Re: [Apertium-stuff] Fwd: CFP: Resources and Representations for Under-resourced Languages and Domains

2020-09-05 Thread Sevilay Bayatlı

Hi everybody,

We can participate using some Turkic languages; apertium-tki could be
one of them. What do you think?


Sevilay


On Sat, Sep 5, 2020 at 4:13 PM Jonathan Washington <
jonathan.n.washing...@gmail.com> wrote:

> Apertiumers,
>
> See the following information about a workshop of relevance.
>
> --
> Jonathan
>
> =
>
> Dear ALL,
> We are organizing RESOURCEFUL-2020 (RESOURCEs and Representations For
> Under-resourced Languages and Domains) which will be collocated with the
> Eighth Swedish Language Technology Conference (SLTC), organized by the
> University of Gothenburg, Sweden on 25th November 2020. The conference and
> the workshop will be held online.
>
> The aim of the workshop is to create a forum for researchers in the area
> of resource creation and representation learning in limited or low-resource
> environments. You can find more details about the questions we would like
> to address:
>  https://gu-clasp.github.io/resourceful-2020/
> Best regards,
> Tewodros
>
> On behalf of the organizers:
>
> Tewodros Gebreselassie, University of Gothenburg
> Simon Dobnik, University of Gothenburg
> Barbara Plank, IT University Copenhagen
> Lars Borin, University of Gothenburg
>
> Contact: resourceful2...@easychair.org

Re: [Apertium-stuff] Potential replacement for GCI

2020-06-12 Thread Sevilay Bayatlı

Could you add me to the group?

Sevilay

On Fri, Jun 12, 2020 at 5:45 AM Samuel Sloniker 
wrote:

> Does anyone have any good name ideas? I thought about ReCodeIn, but there
> could be trademark issues (and the "re" could confuse those not familiar
> with GCI). I also thought of FOSS for Teens (FFT), but that doesn't seem
> very creative.
>
> On Thu, Jun 11, 2020 at 10:22 AM Samuel Sloniker 
> wrote:
>
>> Sure! Just added you.
>>
>> On Thu, Jun 11, 2020, 10:08 abhishek chopra 
>> wrote:
>>
>>> Can I join the Google group?
>>>
>>> On Thu, 11 Jun, 2020, 8:30 pm Samuel Sloniker, 
>>> wrote:
>>>
>>>> I created a Google Group (private to avoid spam), anyone who wants to
>>>> join can ask.
>>>>
>>>> On Thu, Jun 11, 2020 at 7:49 AM Samuel Sloniker
>>>> wrote:
>>>>
>>>>> By GitHub integration, I mean a PR could be linked to a task, and when
>>>>> the PR was merged, the task would be completed.
>>>>>
>>>>> On Thu, Jun 11, 2020 at 7:31 AM Samuel Sloniker
>>>>> wrote:
>>>>>
>>>>>> Cool idea! However, I don't do social media; perhaps someone else
>>>>>> could post there?
>>>>>> Also, I'm working on ReCodeIn, an experimental GCI-type task tracker,
>>>>>> but it's a long way from completion. If we end up using it, there could
>>>>>> be GitHub integration, as well as a Begiak module.
>>>>>>
>>>>>> On Thu, Jun 11, 2020, 05:25 Marc Bogonovich <
>>>>>> marc.bogonov...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Samuel,
>>>>>>> Bad news about Google Code-in.
>>>>>>> Perhaps one group you could advertise to would be the FOSS server on
>>>>>>> Mastodon. Maybe many of you are already on there? If you're not
>>>>>>> familiar with Mastodon, it is a "Federated" or decentralized
>>>>>>> microblogging platform (kind of like Twitter but with independent but
>>>>>>> communicating servers). I'm in the FOSStodon community (ID below,
>>>>>>> please hit me up).
>>>>>>>
>>>>>>> Recently, in the Fosstodon instance of Mastodon, I heard someone talk
>>>>>>> about how they had only just learned about the Apertium project, and
>>>>>>> they wanted to share about how awesome it is. There are around ~10k
>>>>>>> people in Fosstodon, and over a million in Mastodon in general.
>>>>>>>
>>>>>>> Specifically join the Fosstodon instance, here:
>>>>>>> https://fosstodon.org/about
>>>>>>> Or join another instance:
>>>>>>> https://joinmastodon.org/
>>>>>>>
>>>>>>> @m...@fosstodon.org
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jun 11, 2020 at 12:25 PM Samuel Sloniker <
>>>>>>> scoopgra...@gmail.com> wrote:
>>>>>>> >
>>>>>>> > Hi,
>>>>>>> > Google has cancelled Google Code-in. I was wondering if Apertium
>>>>>>> > could start a replacement. Perhaps we could partner with some other
>>>>>>> > FOSS organizations?

Re: [Apertium-stuff] GSOC2020-English-Swahili pair

2020-03-31 Thread Sevilay Bayatlı

Hi Eden,

I have reviewed your proposal; it looks fine.

Best,
Sevilay


On Tue, Mar 31, 2020 at 4:43 PM Eden Grace  wrote:

> Hi all,
> I wrote a proposal to create a usable version for the Swahili-English pair
> here.
> Any feedback/suggestion/question will be greatly appreciated.
> Also, here is my PR
> for the Swahili-English bilingual dictionary and Swahili's transducer is
> here.
>
> Thanks
> Eden

Re: [Apertium-stuff] Fwd: [Corpora-List] Postdoc fellowship in Indigenous Language Documentation and Technology

2020-03-23 Thread Sevilay Bayatlı

Thank you, Tommi.

Sevilay

On Mon, Mar 23, 2020 at 7:55 PM Tommi A Pirinen <
tommi.antero.piri...@uni-hamburg.de> wrote:

> Sorry for spam guys, but I think this position should be of specific
> interest to many apertiumers.
>
> - Forwarded message from Antti Arppe  -
>
> Subject: [Corpora-List] Postdoc fellowship in Indigenous Language
> Documentation and Technology
> From: Antti Arppe 
> To: i...@list.arizona.edu, ling...@listserv.linguistlist.org,
> corp...@uib.no, w...@cla-acl.ca,
> webmas...@ssila.org, r-n-...@lists.unimelb.edu.au
> Cc: Antti Arppe , 21st Century Tools for Indigenous
> Languages <21ct...@gmail.com>
> Date: Sun, 22 Mar 2020 13:56:02 -0600
>
> [apologies for cross-postings]
>
> Dear colleagues,
>
> We'd appreciate if you shared the posting below on our postdoc position
> through your networks. Also, if you know of recent or forthcoming PhD's
> who'd be interested in the creation of language technological resources
> and in Indigenous languages, we'd be most happy if they applied.
>
> -Antti Arppe
> ---
> OPEN POSITION: POSTDOCTORAL FELLOWSHIP IN INDIGENOUS LANGUAGE DOCUMENTATION
> AND TECHNOLOGY
>
> WEBPAGE:
>
> https://altlab.artsrn.ualberta.ca/21c/2020/03/14/postdoc-language-documentation-2020/
>
> DESCRIPTION
>
> The university-community Partnership “21st Century Tools for Indigenous
> Languages” invites applications for a full-time Postdoctoral Fellowship in
> Indigenous Language Documentation and Technology, beginning in summer/fall
> 2020, in this research project funded by the Social Sciences and Humanities
> Research Council (SSHRC) of Canada. The start date is negotiable, and the
> appointment is tenable for 2 years, subject to review after the 1st year.
>
> This 7-year Partnership is led by the Alberta Language Technology Lab
> (ALTLab) in the Department of Linguistics, University of Alberta, and it
> has as partner organizations 13 institutions and Indigenous language
> communities and 31 individual researchers and educators in Canada, the
> United States, and Norway. Further details of our Partnership and the host
> organization can be found at:
> https://altlab.artsrn.ualberta.ca/21c/ and
> http://altlab.artsrn.ualberta.ca
>
> Members of our Partnership have been developing computational models of the
> phonetics, morphology, lexis, and syntax of Indigenous languages in Canada
> and North America, starting with the Algonquian and the Dene language
> families, to create software applications that support their continued use
> in daily life by both speakers and learners. These include intelligent
> electronic dictionaries, spell-checkers, linguistically analyzed text
> collections, computer-aided language learning tools, as well as
> text-to-speech synthesizers and optical character recognition. The
> languages we have gotten the furthest with are Plains Cree (Algonquian)
> and Tsuut’ina (Dene), see:
> https://altlab.artsrn.ualberta.ca/tools-applications/ and
> https://altlab.artsrn.ualberta.ca/publications/ .
>
> DUTIES
>
> The tasks of the Postdoc will include the following, allowing for variation
> based on the successful applicant’s competences and interests:
>
> 1. participation in/responsibility for the continued development of our
> existing computational morphological and phonetic models and end-user
> applications for the Algonquian and/or Dene and/or other Indigenous
> languages we are already working on;
>
> 2. participation in/responsibility for the development of new computational
> morphological and phonetic models and applications for Indigenous languages
> other than the ones we are working on, preferably spoken in Canada;
>
> 3. partial training and supervision of undergraduate and graduate students
> (M.A/Ph.D level) in developing models, applications and resources for
> Indigenous languages;
>
> 4. engagement with Indigenous community consultants on collecting primary
> linguistic data and gathering feedback from community members;
>
> 5. other administrative responsibilities.
>
> The fellowship comes with an annual salary (in CAD) in line with SSHRC
> policies
> (https://www.sshrc-crsh.gc.ca/funding-financement/programs-programmes/fellowships/postdoctoral-postdoctorale-eng.aspx),
> and benefits.
>
> The postdoc is expected to work with and support the activities of multiple
> Partners in the Partnership, and may be co-located or based at other
> Partners for part of their tenure
> (https://altlab.artsrn.ualberta.ca/21c/people/). To this end, the
> Partnership has allocated

Re: [Apertium-stuff] Working on the bn-en language pair

2020-03-15 Thread Sevilay Bayatlı

You also have to work on the coding challenge here:
http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Adopt_a_language_pair#Coding_challenge

Best,

Sevilay

On Sun, Mar 15, 2020 at 5:55 PM Saurabh Rai  wrote:

> Hello Sourabh,
> For planning a Proposal you can have a look at the previous proposals by
> the students.
>
> http://wiki.apertium.org/wiki/Category:Student_proposals_for_the_Google_Summer_of_Code
> And have a look at this page as well.
> http://wiki.apertium.org/wiki/Top_tips_for_GSOC_applications
>
> On Sun, Mar 15, 2020, 8:18 PM Sourabh Raj  wrote:
>
>> Hi,
>>
>> I have been working on the English-Bengali pair. What should my work plan
>> look like for this pair? I have already started reading the
>> recommended wikis and the documentation, and have started working on the
>> dictionaries.

Re: [Apertium-stuff] Apertium elections coming up.

2020-03-04 Thread Sevilay Bayatlı

I can help to run the election.

Sevilay

On Thu, 5 Mar 2020, 06:15 Tanmai Khanna,  wrote:

> I volunteer to help conduct the election.
>
> Tanmai
>
> Sent from my iPhone
>
> On 05-Mar-2020, at 02:40, Diogo  wrote:
>
> 
> I must say the fact that the second list was generated by me, but I'm not
> eligible to vote according to those lists is quite ironic
>
> On Wed, 4 Mar 2020 at 22:01, Tino Didriksen wrote:
>
>> It's now been a week. From what I can see, nobody volunteered to run the
>> election, and we only have 3 who for sure want to be part of the PMC.
>>
>> Everyone, please report in if you want to run the election or be part of
>> the PMC.
>>
>> For a census of those eligible to vote,
>> https://github.com/apertium/apertium-packaging/blob/master/authors.json
>> (sections 1, 20, and 25) plus
>> https://github.com/apertium/family-visualizations/blob/master/scrapers/.mailmap
>> will serve with minimal editing.
>>
>> -- Tino Didriksen
>>
>>
>> On Wed, 26 Feb 2020 at 10:50, Mikel L. Forcada  wrote:
>>
>>> Dear all,
>>>
>>> (1) According to our by-laws  [http://wiki.apertium.org/wiki/Bylaws],
>>> article 8, "The Assembly of Committers elects the Project Management
>>> Committee of Apertium every two years or whenever a vacancy occurs". The
>>> last election occurred in December 2017. We are therefore late.
>>>
>>> (2) According to article 23a, "An Election Board of 3 committers (with
>>> one substitute each) will run the election.". We need to appoint this
>>> board, preferably made up of people who are not running for the election.
>>> I would appreciate it very much if six committers volunteered to run it and
>>> one of them led the process.
>>>
>>> (3) I announce that I will not run for president this time (I haven't
>>> decided about running for a position in the PMC yet). My role in Apertium
>>> in the last year has been testimonial and I believe it is time for someone
>>> else who is more active to chair the PMC.
>>>
>>> (4) As I am the only one who is able to operate the Apertium account(s),
>>> one of the first things we would have to do is for the new PMC to open a
>>> new account to which I would transfer the balance after updating the
>>> reporting I keep, which is outdated by about 4 months [1].
>>>
>>> All the best,
>>>
>>> Mikel
>>>
>>>
>>> [1]
>>> https://docs.google.com/spreadsheets/d/1bOBwjJF-lLGwYJtxiNLGtqa9LygQsiQ81tgVgDl4yU0/edit?usp=sharing
>>>
>>> --
>>> Mikel L. Forcada  http://www.dlsi.ua.es/~mlf/
>>> Departament de Llenguatges i Sistemes Informàtics
>>> Universitat d'Alacant
>>> E-03690 Sant Vicent del Raspeig
>>> Spain
>>> Office: +34 96 590 9776
>>>

Re: [Apertium-stuff] GSOC-2020

2020-02-28 Thread Sevilay Bayatlı

Hello,

You can choose an idea to work on from this page:
http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code

Sevilay

On Fri, 28 Feb 2020, 19:14 Himanshu choudhary, <
himanshuchoudhary_bt2...@dtu.ac.in> wrote:

> My name is Himanshu, I am currently studying at Delhi College of
> Engineering India. I want to know how I can contribute to Apertium. I am
> highly interested in machine translation tasks and want to work on an
> open-source project. My research papers have also addressed some of the
> same issues as Apertium.
>
> Previously I have also worked on Neural machine translation for
> less-resourced morphologically rich Indian languages. One of my papers got
> accepted in EMNLP (WMT) -2018 and the current one in LREC-2020.
>
> Could you please guide me on how to start working on and contributing to
> similar tasks in Apertium? I am experienced in working on machine
> translation models. Please help me so that I can work efficiently,
> contribute, and put together a strong application for GSoC 2020.
>
>

Re: [Apertium-stuff] GSoC 2020 Ideas Page

2020-02-21 Thread Sevilay Bayatlı

Congratulations!

Regarding weighted transfer rules, we plan to finish them within the next
few months, so it would be better not to include them as a task for GSoC.
What do you think?

Sevilay

On Fri, 21 Feb 2020, 17:12 Tino Didriksen,  wrote:

> Apertium is in GSoC 2020!
>
> Time to update the
> http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code page.
>
> What projects were actually completed to the mentors' satisfaction last
> year?
>
> What new projects do people want to add?
>
> -- Tino Didriksen
>

Re: [Apertium-stuff] Need your guidance for GSoC 2020

2020-02-20 Thread Sevilay Bayatlı

Hi Arzoo,

Welcome to Apertium. Start by reading about Apertium at
https://github.com/apertium, then decide what you would like to contribute.

Best regards,

Sevilay

On Thu, Feb 20, 2020 at 10:39 PM Arzoo  wrote:

> Good evening everyone,
>
> My name is Arzoo. I am a fifth-year student at the National Institute of
> Technology, Hamirpur pursuing a dual degree program (B.Tech + M.Tech) in
> Computer Science and Engineering. I am looking to take part in GSoC 2020,
> and for that I need a mentor for guidance. Please help and give directions
> so that I can contribute something solid to this GSoC.
>
> Thanks and Regards,
> Arzoo
>
> --
> Arzoo
> Computer Science and Engineering Department
> National Institute of Technology, Hamirpur
> +91-9971718061

Re: [Apertium-stuff] Apertium GSoC 2020? Deadline Feb 5th

2020-02-06 Thread Sevilay Bayatlı

I can be a mentor.

Sevilay

On Thu, 6 Feb 2020, 14:35 Tommi A Pirinen, <
tommi.antero.piri...@uni-hamburg.de> wrote:

> On Tue, Feb 04, 2020 at 01:51:38PM -0500, Jonathan Washington wrote:
>
> > Everyone else interested in mentoring, please let me know.
>
> I can co-mentor again this year, but with a very unpredictable schedule,
> as I may be moving between jobs or places.
>
> --
> Doktor Tommi A Pirinen, Computational Linguist, Universität Hamburg,
> Hamburger Zentrum für Sprachkorpora. CLARIN-D Entwickler. President of
> ACL SIGUR SIG for Uralic languages.
> I tend to follow inline-posting style in desktop e-mail messages.

Re: [Apertium-stuff] Special Issue on Machine Translation for Low-Resource Languages (MT Journal)

2019-11-26 Thread Sevilay Bayatlı

Hi,
I think we do; we already have the list of authors and a draft of the paper.
We need someone to submit a title, a list of authors, and
a short description.

Sevilay


On Wed, Nov 20, 2019 at 8:05 PM Jonathan Washington <
jonathan.n.washing...@gmail.com> wrote:

> Hi all,
>
> This is just a reminder that the expression of interest for this
> volume is due in less than a week!
>
> The expression of interest is easy: just a title, list of authors, and
> a short description.
>
> If anyone else would like to help out with the updated Apertium paper
> that we're planning to submit, then please get in touch.
>
> --
> Jonathan
>
> On Fri, 1 Nov 2019 at 22:21, Jonathan Washington wrote:
> >
> > Hi all,
> >
> > Below please find a revised CFP for the Machine Translation Special
> > Issue on MT for Low-Resource Languages.
> >
> > =
> > CALL FOR PAPERS: Machine Translation Journal
> > Special Issue on Machine Translation for Low-Resource Languages
> > https://www.springer.com/computer/ai/journal/10590/
> >
> > GUEST EDITORS (Listed alphabetically)
> > • Alina Karakanta (FBK-Fondazione Bruno Kessler)
> > • Audrey N. Tong (NIST)
> > • Chao-Hong Liu (ADAPT Centre/Dublin City University)
> > • Ian Soboroff (NIST)
> > • Jonathan Washington (Swarthmore College)
> > • Oleg Aulov (NIST)
> > • Xiaobing Zhao (Minzu University of China)
> >
> > Machine translation (MT) technologies have been improved significantly
> > in the last two decades, with developments in phrase-based statistical
> > MT (SMT) and recently neural MT (NMT). However, most of these methods
> > rely on the availability of large parallel data for training the MT
> > systems, resources which are not available for the majority of
> > language pairs, and hence current technologies often fall short in
> > their ability to be applied to low-resource languages. Developing MT
> > technologies using relatively small corpora still presents a major
> > challenge for the MT community. In addition, many methods for
> > developing MT systems still rely on several natural language
> > processing (NLP) tools to pre-process texts in source languages and
> > post-process MT outputs in target languages. The performance of these
> > tools often has a great impact on the quality of the resulting
> > translation. The availability of MT technologies and NLP tools can
> > facilitate equal access to information for the speakers of a language
> > and determine on which side of the digital divide they will end up.
> > The lack of these technologies for many of the world's languages
> > provides opportunities both for the field to grow and for making tools
> > available for speakers of low-resource languages.
> >
> > In recent years, several workshops and evaluations have been organized
> > to promote research on low-resource languages. NIST has been
> > conducting Low Resource Human Language Technology evaluations
> > (LoReHLT) annually from 2016 to 2019. In LoReHLT evaluations, there is
> > no training data in the evaluation language. Participants receive
> > training data in related languages, but need to bootstrap systems in
> > the surprise evaluation language at the start of the evaluation.
> > Methods for this include pivoting approaches and taking advantage of
> > linguistic universals. The evaluations are supported by DARPA's Low
> > Resource Languages for Emergent Incidents (LORELEI) program, which
> > seeks to advance technologies that are less dependent on large data
> > resources and that can be quickly pivoted to new languages within a
> > very short amount of time so that information from any language can be
> > extracted in a timely manner to provide situation awareness to
> > emergent incidents. There are also the Workshop on Technologies for MT
> > of Low-Resource Languages (LoResMT) and the Workshop on Deep Learning
> > Approaches for Low-Resource Natural Language Processing (DeepLo),
> > which provide a venue for sharing research and working on the research
> > and development in this field.
> >
> > This special issue solicits original research papers on MT
> > systems/methods and related NLP tools for low-resource languages in
> > general. LoReHLT, LORELEI, LoResMT and DeepLo participants are very
> > welcome to submit their work to the special issue. Summary papers on
> > MT research for specific low-resource languages, as well as extended
> > versions (>40% difference) of published papers from relevant
> > conferences/workshops are also welcome.
> >
> > Topics of the special issue include but are not limited to:
> >  * Research and review papers of MT systems/methods for low-resource
> >    languages
> >  * Research and review papers of pre-processing and/or post-processing
> >    NLP tools for MT
> >  * Word tokenizers/de-tokenizers for low-resource languages
> >  * Word/morpheme segmenters for low-resource languages
> >  * Use of morphological analyzers and/or morpheme segmenters in MT
> >  * Multilingual/cross-lingual NLP tools for MT
> >  * Review of available

Re: [Apertium-stuff] Special Issue on Machine Translation for Low-Resource Languages (MT Journal)

2019-11-21 Thread Sevilay Bayatlı

Ilnar,
I have added you; we also need your affiliation and email address.

Sevilay


On Thu, Nov 21, 2019 at 7:32 AM Ilnar Salimzianov 
wrote:

> Hey,
>
> I don't remember whether I have said so already, but I'm in :)
>
> Best,
>
> Ilnar
>

Re: [Apertium-stuff] Fwd: [Mt-list] Special Issue on Machine Translation for Low-Resource Languages

2019-11-04 Thread Sevilay Bayatlı

Hi,
I have prepared the structure of the paper. It is just a start; here is the
link:

https://www.overleaf.com/8679142877hkvxvhpsyvyz

If you agree with this, I will keep working on it, and anyone who wants to
contribute by editing or adding material can access the content.

Sevilay



On Wed, Oct 23, 2019 at 6:11 PM Sevilay Bayatlı 
wrote:

> 1- If the paper presents translation between low-resource, closely related
> languages, then we have to provide proof that MT between related languages
> is easier for all systems (and provide citations), and so the current
> approach tests the new system on closely related languages.
>
> 2- Regarding comparing Apertium with corpus-based systems: as you know,
> for that we have to use some parallel data for training and testing the
> systems. Previously I used data from OPUS, but their data has a lot of
> problems: for example, more than half of the sentences were repeated, and
> most of the target sentences were unrelated to their source sentences. So
> what I did was train the systems on OPUS and test them on my own test data.
> I did the evaluation by comparing the output with a post-edited version of
> my test data. However, the reviewers complained about bias and said it was
> not fair, so I want to know how others did it. Also, if there are recently
> published papers comparing NMT and Apertium, could you give us the links?
>
> 3- I think including "something about all the technological improvements",
> as Flammie pointed out, is a great idea. We could have a section for each
> module, with a diagram explaining its process by applying it to one pair.
>
> Sevilay
>
> On Wed, Oct 23, 2019 at 5:39 PM Juan Pablo  wrote:
>
>> Same here!
>> I think submitting a paper to this Special Issue is a great idea, as
>> coverage for low-resource languages is one of the distinctive traits of
>> Apertium as well as of this community.
>> If there is something I can contribute, I would be happy to collaborate
>> in it (no need to be in the authors' list). Please, keep me informed.
>> Best,
>> Juan Pablo
>> On 23/10/2019 13:36, Hèctor Alòs i Font wrote:
>>
>> Message from Tommi A Pirinen on Wed, 23 Oct 2019 at 14:16:
>>
>>> On Mon, Oct 21, 2019 at 10:57:28PM +0300, Sevilay Bayatlı wrote:
>>> > Hi,
>>> > It's my pleasure to participate too. If you want, I can prepare the
>>> > Overleaf project and send it to you. But first, let's discuss the
>>> > content of the paper here.
>>>
>>> Ok. I have two things in mind that could be included:
>>>
>>> * some kind of description of all low-resource languages done in past 10
>>>   years, most of these have publications to cite etc.
>>> * something about all the technological improvements, all the gsoc
>>>   stuffs that's been used now, etc.
>>>
>>> Some of the publications e.g. in loresmt this year also have really nice
>>> comparison between apertium and state-of-the-art NMT that should be
>>> replicated in this article.
>>>
>>>
>> If the article deals in a relevant way with Apertium and low-resource
>> languages in recent years, I would be happy to collaborate in it. If it's
>> about technical issues, I can't contribute anything of interest.
>>
>> Hèctor
>>
>>

Re: [Apertium-stuff] Help with is error

2019-11-03 Thread Sevilay Bayatlı

Hi,
First make sure the monolingual packages are compiled (including their
./configure step), then be sure to run autogen.sh for the pair as

./autogen.sh --with-lang1=../apertium-xxx --with-lang2=../apertium-yyy

Sevilay
On Sun, Nov 3, 2019 at 11:29 AM kiran srigiri  wrote:

> I am running ./autogen.sh after compiling the eng-hau pair. I am stuck with
> this error; please help:
> configure: error: Package requirements (apertium-eng) were not met:
>
> No package 'apertium-eng' found
>
> Consider adjusting the PKG_CONFIG_PATH environment variable if you
> installed software in a non-standard prefix.
>
> Alternatively, you may set the environment variables APERTIUM_ENG_CFLAGS
> and APERTIUM_ENG_LIBS to avoid the need to call pkg-config.
> See the pkg-config man page for more details.

Re: [Apertium-stuff] Help with error

2019-10-25 Thread Sevilay Bayatlı

Be sure you have first compiled the monolingual packages with those
commands, and then the bilingual pair.

Sevilay

On Fri, Oct 25, 2019 at 5:56 PM kiran srigiri  wrote:

> I did the same, but I still get the same error with ./configure
>
> On Fri, Oct 25, 2019 at 8:24 PM Sevilay Bayatlı 
> wrote:
>
>> As I remember, I had this problem with apertium-eng-spa and solved it by
>> compiling it like this:
>>
>> ./autogen.sh
>> ./configure
>> make
>>
>> Sevilay
>>
>>
>> On Fri, Oct 25, 2019 at 5:42 PM kiran srigiri  wrote:
>>
>>> Cleared that error by doing
>>> $ touch NEWS AUTHORS ChangeLog
>>> $ bash autogen.sh --prefix=/home/fran/local
>>> --with-lang1=/home/fran/source/apertium/languages/apertium-eng
>>> --with-lang2=../apertium-hau
>>>
>>> but I am stuck at:
>>> checking for a BSD-compatible install... /usr/bin/install -c
>>> checking whether build environment is sane... yes
>>> checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
>>> checking for gawk... gawk
>>> checking whether make sets $(MAKE)... yes
>>> checking whether make supports nested variables... yes
>>> checking whether ln -s works... yes
>>> checking for gawk... (cached) gawk
>>> checking for pkg-config... /usr/bin/pkg-config
>>> checking pkg-config is at least version 0.9.0... yes
>>> checking for APERTIUM... yes
>>> checking for cg-comp... /usr/bin/cg-comp
>>> checking for cg-proc... /usr/bin/cg-proc
>>> checking for lrx-comp... /usr/bin/lrx-comp
>>> checking for lrx-proc... /usr/bin/lrx-proc
>>> checking for gzcat... no
>>> checking for zcat... /usr/bin/zcat
>>> checking for APERTIUM_ENG... no
>>> configure: error: Package requirements (apertium-eng) were not met:
>>>
>>> No package 'apertium-eng' found
>>>
>>> Consider adjusting the PKG_CONFIG_PATH environment variable if you
>>> installed software in a non-standard prefix.
>>>
>>> Alternatively, you may set the environment variables APERTIUM_ENG_CFLAGS
>>> and APERTIUM_ENG_LIBS to avoid the need to call pkg-config.
>>> See the pkg-config man page for more details.
>>>
>>>
>>> please help!!
>>>
>>> On Fri, Oct 25, 2019 at 6:29 PM kiran srigiri 
>>> wrote:
>>>
>>>> I have compiled the eng-hau translation package, but when trying to run
>>>> autogen.sh I get this error message:
>>>> Makefile.am: error: required file './NEWS' not found
>>>> Makefile.am: error: required file './AUTHORS' not found
>>>> Makefile.am: error: required file './ChangeLog' not found
>>>> autoreconf: automake failed with exit status: 1
>>>>
>>>> Please help
>>>>

Re: [Apertium-stuff] Help with error

2019-10-25 Thread Sevilay Bayatlı

As I remember, I had this problem with apertium-eng-spa and solved it by
compiling it like this:

./autogen.sh
./configure
make

Sevilay


On Fri, Oct 25, 2019 at 5:42 PM kiran srigiri  wrote:

> Cleared that error by doing
> $ touch NEWS AUTHORS ChangeLog
> $ bash autogen.sh --prefix=/home/fran/local
> --with-lang1=/home/fran/source/apertium/languages/apertium-eng
> --with-lang2=../apertium-hau
>
> but I am stuck at:
> checking for a BSD-compatible install... /usr/bin/install -c
> checking whether build environment is sane... yes
> checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
> checking for gawk... gawk
> checking whether make sets $(MAKE)... yes
> checking whether make supports nested variables... yes
> checking whether ln -s works... yes
> checking for gawk... (cached) gawk
> checking for pkg-config... /usr/bin/pkg-config
> checking pkg-config is at least version 0.9.0... yes
> checking for APERTIUM... yes
> checking for cg-comp... /usr/bin/cg-comp
> checking for cg-proc... /usr/bin/cg-proc
> checking for lrx-comp... /usr/bin/lrx-comp
> checking for lrx-proc... /usr/bin/lrx-proc
> checking for gzcat... no
> checking for zcat... /usr/bin/zcat
> checking for APERTIUM_ENG... no
> configure: error: Package requirements (apertium-eng) were not met:
>
> No package 'apertium-eng' found
>
> Consider adjusting the PKG_CONFIG_PATH environment variable if you
> installed software in a non-standard prefix.
>
> Alternatively, you may set the environment variables APERTIUM_ENG_CFLAGS
> and APERTIUM_ENG_LIBS to avoid the need to call pkg-config.
> See the pkg-config man page for more details.
>
>
> please help!!
>
> On Fri, Oct 25, 2019 at 6:29 PM kiran srigiri  wrote:
>
>> I have compiled the eng-hau translation package, but when trying to run
>> autogen.sh I get this error message:
>> Makefile.am: error: required file './NEWS' not found
>> Makefile.am: error: required file './AUTHORS' not found
>> Makefile.am: error: required file './ChangeLog' not found
>> autoreconf: automake failed with exit status: 1
>>
>> Please help
>>

Re: [Apertium-stuff] Fwd: [Mt-list] Special Issue on Machine Translation for Low-Resource Languages

2019-10-23 Thread Sevilay Bayatlı

1- If the paper presents translation between low-resource, closely related
languages, then we have to provide proof that MT between related languages
is easier for all systems (and provide citations), and so the current
approach tests the new system on closely related languages.

2- Regarding comparing Apertium with corpus-based systems: as you know,
for that we have to use some parallel data for training and testing the
systems. Previously I used data from OPUS, but their data has a lot of
problems: for example, more than half of the sentences were repeated, and
most of the target sentences were unrelated to their source sentences. So
what I did was train the systems on OPUS and test them on my own test data.
I did the evaluation by comparing the output with a post-edited version of
my test data. However, the reviewers complained about bias and said it was
not fair, so I want to know how others did it. Also, if there are recently
published papers comparing NMT and Apertium, could you give us the links?

3- I think including "something about all the technological improvements",
as Flammie pointed out, is a great idea. We could have a section for each
module, with a diagram explaining its process by applying it to one pair.
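
A minimal sketch of the kind of clean-up that point 2 calls for with
OPUS-style data, assuming the corpus is available as a list of
(source, target) sentence pairs. The names normalise, clean_parallel and
drop_test_overlap are hypothetical and not part of Apertium or OPUS. The
filter only catches empty lines and exact repeats (after whitespace and case
normalisation); spotting target sentences that are unrelated to their source
would need a separate alignment-quality check.

```python
def normalise(sentence):
    """Collapse whitespace and lowercase, so trivial variants compare equal."""
    return " ".join(sentence.split()).lower()


def clean_parallel(pairs):
    """Drop empty and repeated (source, target) pairs, keeping first occurrences."""
    seen = set()
    kept = []
    for src, tgt in pairs:
        key = (normalise(src), normalise(tgt))
        if not key[0] or not key[1]:
            continue  # one side is empty
        if key in seen:
            continue  # this pair was already seen
        seen.add(key)
        kept.append((src.strip(), tgt.strip()))
    return kept


def drop_test_overlap(train_pairs, test_pairs):
    """Remove training pairs whose source sentence also occurs in the test set."""
    test_sources = {normalise(src) for src, _ in test_pairs}
    return [(s, t) for s, t in train_pairs if normalise(s) not in test_sources]


if __name__ == "__main__":
    corpus = [
        ("Hello world", "Merhaba dünya"),
        ("hello   world", "merhaba dünya"),  # repeat of the first pair
        ("", "orphan target"),               # empty source side
        ("Good morning", "Günaydın"),
    ]
    test_set = [("Good morning", "Günaydın")]
    train = drop_test_overlap(clean_parallel(corpus), test_set)
    print(len(train), "training pairs kept")
```

Removing test sentences from the training data does not by itself answer the
reviewers' concern about bias, but it at least rules out one obvious source
of it.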

Sevilay

On Wed, Oct 23, 2019 at 5:39 PM Juan Pablo  wrote:

> Same here!
> I think submitting a paper to this Special Issue is a great idea, as
> coverage for low-resource languages is one of the distinctive traits of
> Apertium as well as of this community.
> If there is something I can contribute, I would be happy to collaborate in
> it (no need to be in the authors' list). Please, keep me informed.
> Best,
> Juan Pablo
> On 23/10/2019 13:36, Hèctor Alòs i Font wrote:
>
> Message from Tommi A Pirinen on Wed, 23 Oct 2019 at 14:16:
>
>> On Mon, Oct 21, 2019 at 10:57:28PM +0300, Sevilay Bayatlı wrote:
>> > Hi,
>> > It's my pleasure to participate too. If you want, I can prepare the
>> > Overleaf project and send it to you. But first, let's discuss the
>> > content of the paper here.
>>
>> Ok. I have two things in mind that could be included:
>>
>> * some kind of description of all low-resource languages done in past 10
>>   years, most of these have publications to cite etc.
>> * something about all the technological improvements, all the gsoc
>>   stuffs that's been used now, etc.
>>
>> Some of the publications e.g. in loresmt this year also have really nice
>> comparison between apertium and state-of-the-art NMT that should be
>> replicated in this article.
>>
>>
> If the article deals in a relevant way with Apertium and low-resource
> languages in recent years, I would be happy to collaborate in it. If it's
> about technical issues, I can't contribute anything of interest.
>
> Hèctor
>
>

Re: [Apertium-stuff] Fwd: [Mt-list] Special Issue on Machine Translation for Low-Resource Languages

2019-10-21 Thread Sevilay Bayatlı

Hi,
It's my pleasure to participate too. If you want, I can prepare the Overleaf
project and send it to you. But first, let's discuss the content of the
paper here.

Sevilay

On Mon, Oct 21, 2019 at 7:24 PM Tommi A Pirinen <
tommi.antero.piri...@uni-hamburg.de> wrote:

> On Sat, Oct 19, 2019 at 03:36:29AM +0100, Francis Tyers wrote:
> > We should definitely send a paper on Apertium to this. It has been 10
> years
> > since the last one and a lot of things have happened since then!
>
> I can probably help too. Should we organise a git or overleaf and just
> start throwing things together?
>
>
> --
> Doktor Tommi A Pirinen, Computational Linguist, Universität Hamburg,
> Hamburger Zentrum für Sprachkorpora. CLARIN-D Entwickler. President of
> ACL SIGUR SIG for Uralic languages.
> I tend to follow inline-posting style in desktop e-mail messages.

[Apertium-stuff] similar --state-of-the-art systems

2019-05-27 Thread Sevilay Bayatlı

Hi,

I need to compare the apertium-ambiguous results with similar state-of-the-art
systems. I have already compared them with OpenNMT; do you have any other
suggestions?

Best wishes,
Sevilay

Re: [Apertium-stuff] Improve apertium-ambiguous

2019-04-23 Thread Sevilay Bayatlı

Thanks Fran :)

Sevilay

On Tue, 23 Apr 2019, 22:45 Francis Tyers,  wrote:

> On 2019-04-23 20:32, Sevilay Bayatlı wrote:
> > One more thing you could try is doing a "semi-oracle" system:
> >
> > Make the translations, and choose the one that is closest to the
> > reference translation. What is the best score you can get?
> >
> > Thanks for your comments; they are useful. Regarding the comment
> > above, how can we choose the translation closest to the reference?
> > Could you give an example?
> >
> > Sevilay
>
> You could use WER or BLEU on all of the possible translations and
> pick the highest scoring one.
>
> Fran
>
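
Fran's suggestion could be sketched roughly as below. This is only an
illustration under stated assumptions: each source sentence has a list of
candidate translations (one per combination of ambiguous rules) and a single
reference translation, both as plain whitespace-tokenised strings. The
function names are hypothetical and not part of apertium-ambiguous. With WER
the "highest scoring" candidate is the one with the lowest error rate; with
BLEU it would be the one with the highest score.

```python
def word_edit_distance(hyp, ref):
    """Levenshtein distance between two lists of word tokens."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            cur.append(min(prev[j] + 1,          # drop a hypothesis word
                           cur[j - 1] + 1,       # add a reference word
                           prev[j - 1] + cost))  # match or substitute
        prev = cur
    return prev[-1]


def wer(hypothesis, reference):
    """Word error rate of one hypothesis against one reference."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        return 0.0 if not hyp else 1.0
    return word_edit_distance(hyp, ref) / len(ref)


def oracle_pick(candidates, reference):
    """Return the candidate with the lowest WER against the reference."""
    return min(candidates, key=lambda cand: wer(cand, reference))


def oracle_wer(all_candidates, references):
    """Average WER when the best candidate is always chosen (the upper bound)."""
    best = [wer(oracle_pick(cands, ref), ref)
            for cands, ref in zip(all_candidates, references)]
    return sum(best) / len(best)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    candidates = ["the cat is on the mat", "a cat sits on a mat", "cat mat"]
    print(oracle_pick(candidates, reference))
```

The averaged oracle score tells you how good the system could be if the
learned model always chose the best available rule combination, which is the
"best score you can get" in the question above.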

Re: [Apertium-stuff] Improve apertium-ambiguous

2019-04-23 Thread Sevilay Bayatlı

> One more thing you could try is doing a "semi-oracle" system:
>
> Make the translations, and choose the one that is closest to the
> reference translation. What is the best score you can get?


Thanks for your comments, they are useful. Regarding the comment above, how can
we choose the translation closest to the reference? Could you give an example?

Sevilay

On Tue, Apr 23, 2019 at 4:44 PM Francis Tyers  wrote:

> El 2019-04-23 10:27, Sevilay Bayatlı escribió:
> > Hi everyone,
> >
> > We want to improve apertium-ambiguous to get better results. There is
> > more than one option for that: either improving it linguistically or
> > using a new learning method.
> >
> > These solutions are possible under the following conditions:
> >
> > 1- Linguistic improvement takes a lot of time (for adding more vocabulary
> > and writing transfer rules). Since I understand all languages of the Oghuz
> > Turkic group and some languages of the Kipchak group, I can choose one
> > system and improve it. However, based on my experiments, this improves the
> > system only when there are many more ambiguous rules and no
> > out-of-vocabulary words, and only if I have the time.
> >
> > 2- Using a new learning method. This would be a step that replaces
> > maximum entropy. We talked with Aboelhamd about using scikit-learn, but
> > have not yet decided on a good formulation for our problem.
> >
> > Dear Apertiumers, we would like to hear your suggestions for choosing a
> > new method to replace maximum entropy.
> >
>
> My thoughts would be to start by characterising what the problem is
> with maximum entropy, and then start to look at other methods.
>
> For example, determine what role amount of data plays. Try with 10%,
> 25%, 50%, 75%, 100% and look at the learning curve, if it doesn't
> seem to be plateauing then perhaps try adding more data.


> Another thing would be look at the number of ambiguous rules, try
> with 1, 2, 5, 10, ... and see what the learning curve is. How much
> difference does each rule ambiguity add?
>
> In addition, you could think of adding more features, for example,
> tags as well as lemmas.
>
> One more thing you could try is doing a "semi-oracle" system:
>
> Make the translations, and choose the one that is closest to the
> reference translation. What is the best score can you get?
>
> After doing this I think it would be worthwhile looking at other
> methods. SVM is one option, as are CRF and RNNs, but remember
> that for RNN a lot of data is needed, so I'm not sure how much
> sense it makes looking at that unless you are able to process
> a lot more data more efficiently.
>
> Best regards,
>
> Francis M. Tyers
>
>
>
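
A rough sketch of the learning-curve experiment suggested above, using the
fractions from the message (10%, 25%, 50%, 75%, 100%). The callable
train_and_score is a hypothetical stand-in for "train the model on this
subset and return its score on a fixed test set"; nothing here is taken from
the apertium-ambiguous code, and the plateau tolerance is an arbitrary
assumption.

```python
import random


def learning_curve(examples, train_and_score,
                   fractions=(0.10, 0.25, 0.50, 0.75, 1.00), seed=0):
    """Score the model on growing subsets of the training data.

    Returns a list of (fraction, subset_size, score) tuples.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed so runs are comparable
    curve = []
    for fraction in fractions:
        size = max(1, round(fraction * len(shuffled)))
        curve.append((fraction, size, train_and_score(shuffled[:size])))
    return curve


def is_plateauing(curve, tolerance=0.005):
    """True if the last increase in data gained less than `tolerance`."""
    if len(curve) < 2:
        return False
    return (curve[-1][2] - curve[-2][2]) < tolerance
```

If is_plateauing returns False at 100% of the data, adding more data may
still help; the same loop could be run over the number of ambiguous rules
(1, 2, 5, 10, ...) instead of the data fraction.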
___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff

[Apertium-stuff] Improve apertium-ambiguous

2019-04-23 Thread Sevilay Bayatlı

Hi everyone,

We want to improve apertium-ambiguous to get better results. There is more
than one option for that: either improving it linguistically or using a new
learning method.

These solutions are possible under the following conditions:

1- Linguistic improvement takes a lot of time (for adding more vocabulary and
writing transfer rules). Since I understand all languages of the Oghuz Turkic
group and some languages of the Kipchak group, I can choose one system and
improve it. However, based on my experiments, this improves the system only
when there are many more ambiguous rules and no out-of-vocabulary words, and
only if I have the time.

2- Using a new learning method. This would be a step that replaces maximum
entropy. We talked with Aboelhamd about using scikit-learn, but have not yet
decided on a good formulation for our problem.


Dear Apertiumers, we would like to hear your suggestions for choosing a new
method to replace maximum entropy.

Best wishes,

Sevilay
___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff

Re: [Apertium-stuff] Extend weighted transfer rules GSoC proposal

2019-04-21 Thread Sevilay Bayatlı

I agree with replacing the n-gram LM, but with which one: a plain RNN or a
GRU? From what I see in the literature, GRUs have more advantages than plain RNNs.

Sevilay

On Mon, Apr 22, 2019 at 12:09 AM Aboelhamd Aly 
wrote:

> Hi Sevilay,
>
> I think a new language model that could distinguish the best ambiguous
> combination(s) of a translation would eliminate our need for the max entropy
> model or any other method.
> But whether that is the case with RNN LMs, I don't know yet.
> For now, do you agree that we need to change the LM first? Or do you
> prefer going straight to an alternative method for max entropy? And do you
> have any idea for such an alternative method?
> In my opinion, fixing all the bugs, evaluating our current system, and
> then changing the n-gram LM to RNNs is the priority for the next two weeks
> or so.
> After that we can focus the research on what's next, if the accuracy is
> not good enough or there is room for improvement.
> Do you agree with this?
>
> Regards,
> Aboelhamd
>
> On Sun, Apr 21, 2019 at 10:48 PM Sevilay Bayatlı 
> wrote:
>
>> Aboelhamd,
>>
>> I think using Gated Recurrent Units (GRUs) instead of the n-gram language
>> model is a good idea, and we can probably achieve some gain. However, the
>> most important part here is replacing the maximum entropy model.
>>
>> Let's see what Fran thinks about it.
>>
>> Regards,
>>
>> Sevilay
>>
>>
>>
>>
>> On Fri, Apr 19, 2019 at 10:29 PM Aboelhamd Aly <
>> aboelhamd.abotr...@gmail.com> wrote:
>>
>>> Hi Sevilay. Hi Francis,
>>>
>>> Unfortunately, Sevilay reported that the evaluation results of kaz-tur
>>> and spa-eng pairs were very bad with 30% of the tested sentences were good,
>>> compared to apertium LRLM resolution.
>>> So we discussed what to do next and it is to utilize the breakthrough of
>>> deep learning neural networks in NLP and especially machine translations.
>>> Also we discussed about using different values of n more than 5 in the
>>> already used n-gram language model. And to evaluate the result of
>>> increasing value of n, which could give us some more insights in what to do
>>> next and how to do it.
>>>
>>> Since I have an intro to deep learning subject this term in college, I
>>> waited this past two weeks to be introduced to the application of deep
>>> learning in NLP and MTs.
>>> Now, I have the basics of knowledge in Recurrent Neural Networks (RNNs)
>>> and why to use it instead of the standard network in NLP, beside
>>> understanding the different architectures of it and the math done in the
>>> forward and back propagation.
>>> Also besides knowing how to build a simple language model, and avoiding
>>> the problem of (vanishing gradient) leading to not capturing long
>>> dependencies, by using Gated Recurrent Units (GRus) and Long Short Term
>>> Memory (LSTM) network.
>>>
>>> For next step, we will consider working only on the language model and
>>> to let the max entropy part for later discussions.
>>> So along with trying different n values in the n-gram language model and
>>> evaluate the results, I will try either to use a ready RNNLM or to build a
>>> new one from scratch from what I learnt so far. Honestly I prefer the last
>>> choice because it will increase my experience in applying what I have
>>> learnt.
>>> In last 2 weeks I implemented RNNs with GRUs and LSTM and also
>>> implemented a character based language model as two assignments and they
>>> were very fun to do. So implementing a RNNs word based character LM will
>>> not take much time, though it may not be close to the state-of-the-art
>>> model and this is the disadvantage of it.
>>>
>>> Using NNLM instead of the n-gram LM has these possible advantages :
>>> - Automatically learn such syntactic and semantic features.
>>> - Overcome the curse of dimensionality by generating better
>>> generalizations.
>>>
>>> --
>>>
>>> I tried using n=8 instead of 5 in the n-gram LM, but the scores weren't
>>> that different as Sevilay pointed out in our discussion.
>>> I knew that NNLM is better than statistical one, also that using machine
>>> learning instead of maximum entropy model will give better performance.
>>> *But* the evaluation results were very very disappointing, unexpected
>>> and illogical, so I thought there might be a bug in the code.
>>> And after some search, I found that I did a very very silly *mistake*
&

Re: [Apertium-stuff] Extend weighted transfer rules GSoC proposal

2019-04-21 Thread Sevilay Bayatlı

Aboelhamd,

I think using Gated Recurrent Units (GRUs) instead of the n-gram language
model is a good idea, and we can probably achieve some gain. However, the
most important part here is replacing the maximum entropy model.

Let's see what Fran thinks about it.

Regards,

Sevilay




On Fri, Apr 19, 2019 at 10:29 PM Aboelhamd Aly 
wrote:

> Hi Sevilay. Hi Francis,
>
> Unfortunately, Sevilay reported that the evaluation results for the kaz-tur and
> spa-eng pairs were very bad, with only 30% of the tested sentences being good
> compared to Apertium's LRLM resolution.
> So we discussed what to do next, which is to make use of the breakthroughs of
> deep learning in NLP, and especially in machine translation.
> We also discussed using values of n greater than 5 in the
> n-gram language model we already use, and evaluating the effect of
> increasing n, which could give us more insight into what to do
> next and how to do it.
>
> Since I am taking an introductory deep learning course this term in college, I
> waited these past two weeks to be introduced to the applications of deep
> learning in NLP and MT.
> Now I have basic knowledge of Recurrent Neural Networks (RNNs)
> and of why to use them instead of standard networks in NLP, and I
> understand their different architectures and the math done in
> forward and back propagation.
> I also know how to build a simple language model and how to avoid
> the vanishing gradient problem, which prevents capturing long
> dependencies, by using Gated Recurrent Units (GRUs) and Long Short-Term
> Memory (LSTM) networks.
>
> For the next step, we will consider working only on the language model and
> leave the max entropy part for later discussion.
> So, along with trying different n values in the n-gram language model and
> evaluating the results, I will try either to use a ready-made RNNLM or to build a
> new one from scratch from what I have learnt so far. Honestly, I prefer the latter
> option because it will increase my experience in applying what I have
> learnt.
> In the last 2 weeks I implemented RNNs with GRUs and LSTMs, and also implemented
> a character-based language model, as two assignments, and they were very fun
> to do. So implementing a word-based RNN LM will not take much
> time, though it may not be close to the state-of-the-art models, which is
> its disadvantage.
>
> Using NNLM instead of the n-gram LM has these possible advantages :
> - Automatically learn such syntactic and semantic features.
> - Overcome the curse of dimensionality by generating better
> generalizations.
>
> --
>
> I tried using n=8 instead of 5 in the n-gram LM, but the scores weren't
> that different, as Sevilay pointed out in our discussion.
> I knew that an NNLM is better than a statistical one, and that using machine
> learning instead of the maximum entropy model should give better performance.
> *But* the evaluation results were very disappointing, unexpected and
> illogical, so I thought there might be a bug in the code.
> After some searching, I found that I had made a very silly *mistake* in
> normalizing the LM scores. The scores are log base 10 of the sentence
> probability, so a score with higher magnitude means a lower probability, but
> what I did was the inverse of that, and that was the cause of the very bad
> results.
>
> I am fixing this now and then will re-evaluate the results with Sevilay.
>
> Regards,
> Aboelhamd
>
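
To illustrate the normalization issue described above: the message does not
say which formula the project uses, so the following is only a sketch under
the assumption that each candidate has a log10 sentence probability and that
the goal is a set of weights summing to 1. The function name
normalize_log10_scores is hypothetical.

```python
def normalize_log10_scores(log10_scores):
    """Turn log10 sentence probabilities into weights that sum to 1.

    Scores are negative; a larger magnitude means a LOWER probability,
    so -10.0 must end up with more weight than -12.0.
    """
    if not log10_scores:
        return []
    best = max(log10_scores)  # subtract the maximum to avoid underflow
    unnormalized = [10.0 ** (score - best) for score in log10_scores]
    total = sum(unnormalized)
    return [value / total for value in unnormalized]
```

Normalizing by magnitude alone (for example abs(score) / sum of magnitudes)
would invert the ranking, which matches the kind of mistake reported in the
message.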
>
> On Sun, Apr 7, 2019 at 6:46 PM Aboelhamd Aly 
> wrote:
>
>> Thanks Sevilay for your feedback, and thanks for the resources.
>>
>> On Sun, 7 Apr 2019, 18:42 Sevilay Bayatlı > wrote:
>>
>>> hi Aboelhamd,
>>>
>>> Your proposal looks good. I found these resources, which may be of benefit.
>>>
>>> Multi-source neural translation
>>> https://arxiv.org/abs/1601.00710
>>>
>>> Neural machine translation with extended context
>>> https://arxiv.org/abs/1708.05943
>>>
>>> Handling homographs in neural machine translation
>>> https://arxiv.org/abs/1708.06510
>>>
>>>
>>>
>>> Sevilay
>>>
>>> On Sun, Apr 7, 2019 at 7:14 PM Aboelhamd Aly <
>>> aboelhamd.abotr...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I got a not solid yet idea as an alternative to yasmet and max entropy
>

Re: [Apertium-stuff] Extend weighted transfer rules GSoC proposal

2019-04-21 Thread Sevilay Bayatlı

Hi Aboelhamd,

For now it is OK to record your progress day by day, but later you can switch
to week by week and put it in a table.

Sevilay

On Sun, Apr 21, 2019 at 1:37 PM Aboelhamd Aly 
wrote:

> Hi,
>
> I am uploading the summary of each day of work in this wiki page
> <http://wiki.apertium.org/wiki/User:Aboelhamd/progress>.
> Please, take a look and let me know if there is something else I could do
> instead.
>
> Thanks.
>
> On Fri, Apr 19, 2019 at 9:42 PM Aboelhamd Aly <
> aboelhamd.abotr...@gmail.com> wrote:
>
>> According to the timeline I put in my proposal, I am supposed to start
>> phase 1 today.
>> I want to know which procedures to do to document my work, day by day and
>> week by week.
>> Do I create a page in wiki to save my progress ?
>> Or is there another way ?
>>
>> Thanks
>>
>> On Fri, Apr 19, 2019 at 9:27 PM Aboelhamd Aly <
>> aboelhamd.abotr...@gmail.com> wrote:
>>
>>> Hi Sevilay. Hi Francis,
>>>
>>> Unfortunately, Sevilay reported that the evaluation results of kaz-tur
>>> and spa-eng pairs were very bad with 30% of the tested sentences were good,
>>> compared to apertium LRLM resolution.
>>> So we discussed what to do next and it is to utilize the breakthrough of
>>> deep learning neural networks in NLP and especially machine translations.
>>> Also we discussed about using different values of n more than 5 in the
>>> already used n-gram language model. And to evaluate the result of
>>> increasing value of n, which could give us some more insights in what to do
>>> next and how to do it.
>>>
>>> Since I have an intro to deep learning subject this term in college, I
>>> waited this past two weeks to be introduced to the application of deep
>>> learning in NLP and MTs.
>>> Now, I have the basics of knowledge in Recurrent Neural Networks (RNNs)
>>> and why to use it instead of the standard network in NLP, beside
>>> understanding the different architectures of it and the math done in the
>>> forward and back propagation.
>>> Also besides knowing how to build a simple language model, and avoiding
>>> the problem of (vanishing gradient) leading to not capturing long
>>> dependencies, by using Gated Recurrent Units (GRus) and Long Short Term
>>> Memory (LSTM) network.
>>>
>>> For next step, we will consider working only on the language model and
>>> to let the max entropy part for later discussions.
>>> So along with trying different n values in the n-gram language model and
>>> evaluate the results, I will try either to use a ready RNNLM or to build a
>>> new one from scratch from what I learnt so far. Honestly I prefer the last
>>> choice because it will increase my experience in applying what I have
>>> learnt.
>>> In last 2 weeks I implemented RNNs with GRUs and LSTM and also
>>> implemented a character based language model as two assignments and they
>>> were very fun to do. So implementing a RNNs word based character LM will
>>> not take much time, though it may not be close to the state-of-the-art
>>> model and this is the disadvantage of it.
>>>
>>> Using NNLM instead of the n-gram LM has these possible advantages :
>>> - Automatically learn such syntactic and semantic features.
>>> - Overcome the curse of dimensionality by generating better
>>> generalizations.
>>>
>>> --
>>>
>>> I tried using n=8 instead of 5 in the n-gram LM, but the scores weren't
>>> that different as Sevilay pointed out in our discussion.
>>> I knew that NNLM is better than statistical one, also that using machine
>>> learning instead of maximum entropy model will give better performance.
>>> *But* the evaluation results were very very disappointing, unexpected
>>> and illogical, so I thought there might be a bug in the code.
>>> And after some search, I found that I did a very very silly *mistake*
>>> in normalizing the LM scores. As the scores are log base 10 of the sentence
>>> probability, then the higher in magnitude has the lower probability, but I
>>> what I did was the inverse of that, and that was the cause of the very bad
>>> results.
>>>
>>> I am fixing this now and then will re-evaluate the results with Sevilay.
>>>
>>> Regards,
>>> Aboelhamd
>>>
>>>
>>> On Sun, Apr 7, 2019 at 6:46 PM Aboelhamd Aly <
>>> aboelhamd.abotr...@gmail.com> wrote:
>>>
>>>> Th

Re: [Apertium-stuff] GSOC proposal

2019-04-08 Thread Sevilay Bayatlı

Then change it to *Bring a released language pair up to state-of-the-art
quality*

Sevilay

On Mon, Apr 8, 2019 at 6:24 PM ogabek yusupov 
wrote:

> My language pair already exists:
> https://github.com/apertium/apertium-tur-uzb
>
> On Mon, Apr 8, 2019 at 8:12 PM Sevilay Bayatlı 
> wrote:
>
>> hi Ogabek,
>> Your project falls under *Adopt an unreleased language pair*; please change
>> it from *Extend weighted transfer rules*.
>>
>> Sevilay
>>
>>
>> On Mon, Apr 8, 2019 at 5:59 PM ogabek yusupov 
>> wrote:
>>
>>> Hello everyone, please look at my proposal.
>>>
>>> http://wiki.apertium.org/wiki/User:Ogabek
___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff

Re: [Apertium-stuff] GSOC proposal

2019-04-08 Thread Sevilay Bayatlı

hi Ogabek,
Your project falls under *Adopt an unreleased language pair*; please change it
from *Extend weighted transfer rules*.

Sevilay


On Mon, Apr 8, 2019 at 5:59 PM ogabek yusupov 
wrote:

> Hello everyone, please look at my proposal.
>
> http://wiki.apertium.org/wiki/User:Ogabek
___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff

Re: [Apertium-stuff] Confusion on selection the problem

2019-04-06 Thread Sevilay Bayatlı

Sorry for the previous email; I meant you should work in
https://github.com/apertium/apertium-tel

Sevilay

On Sat, Apr 6, 2019 at 6:58 PM Sevilay Bayatlı 
wrote:

> In my opinion, I recommend that you choose to work in
> https://github.com/apertium/apertium-eng-tel
>
> Sevilay
>
> On Sat, Apr 6, 2019 at 6:30 PM Aboelhamd Aly 
> wrote:
>
>> Hi Sanmitra,
>> Some mentors responded there to you and others too.
>>
>> On Sat, Apr 6, 2019 at 5:21 PM Sanmitra Dharmavarapu <
>> goodfriend2...@gmail.com> wrote:
>>
>>> Ok Sevilay.
>>>
>>> But I am unable to find much engagement on it. Are there any office
>>> hours?
>>>
>>> with regards,
>>> Sanmitra D.
>>>
>>> On Sat, 6 Apr 2019 at 20:28, Sevilay Bayatlı 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Please join IRC #apertium to discuss your topic.
>>>>
>>>> Sevilay
>>>>
>>>> On Sat, 6 Apr 2019, 10:51 Sanmitra Dharmavarapu, <
>>>> goodfriend2...@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>> I want to contribute to the Telugu language at Apertium.
>>>>> I am unable to choose between contributing to the
>>>>>  'English-Telugu <https://github.com/apertium/apertium-eng-tel>'
>>>>> translation pair
>>>>> or
>>>>> "Linguistic data for Telugu <https://github.com/apertium/apertium-tel>"
>>>>>
>>>>> I am more interested in the second one. I am meeting with and collecting
>>>>> data from universities like Andhra University, where I found a lot of
>>>>> digitised Telugu data. There is also a lot of newspaper data
>>>>> and many dictionaries.
>>>>>
>>>>> As of 2013, Telugu was the 16th most spoken language in the world, with
>>>>> 94 million speakers.
>>>>>
>>>>> Can anyone help me by explaining the differences between the two above,
>>>>> so that I can complete my proposal for GSoC?
>>>>>
>>>>> Love for my mother tongue is what made me want to work on this.
>>>>>
>>>>> Waiting for your reply,
>>>>>
>>>>> With regards,
>>>>> Sanmitra D.
___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff

Re: [Apertium-stuff] Confusion on selection the problem

2019-04-06 Thread Sevilay Bayatlı

In my opinion, I recommend that you choose to work in
https://github.com/apertium/apertium-eng-tel

Sevilay

On Sat, Apr 6, 2019 at 6:30 PM Aboelhamd Aly 
wrote:

> Hi Sanmitra,
> Some mentors responded there to you and others too.
>
> On Sat, Apr 6, 2019 at 5:21 PM Sanmitra Dharmavarapu <
> goodfriend2...@gmail.com> wrote:
>
>> Ok Sevilay.
>>
>> But I am unable to find much engagement on it. Are there any office
>> hours?
>>
>> with regards,
>> Sanmitra D.
>>
>> On Sat, 6 Apr 2019 at 20:28, Sevilay Bayatlı 
>> wrote:
>>
>>> Hi,
>>>
>>> Please join IRC #apertium to discuss your topic.
>>>
>>> Sevilay
>>>
>>> On Sat, 6 Apr 2019, 10:51 Sanmitra Dharmavarapu, <
>>> goodfriend2...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>> I want to contribute to the Telugu language at Apertium.
>>>> I am unable to choose between contributing to the
>>>>  'English-Telugu <https://github.com/apertium/apertium-eng-tel>'
>>>> translation pair
>>>> or
>>>> "Linguistic data for Telugu <https://github.com/apertium/apertium-tel>"
>>>>
>>>> I am more interested in the second one. I am meeting with and collecting
>>>> data from universities like Andhra University, where I found a lot of
>>>> digitised Telugu data. There is also a lot of newspaper data
>>>> and many dictionaries.
>>>>
>>>> As of 2013, Telugu was the 16th most spoken language in the world, with
>>>> 94 million speakers.
>>>>
>>>> Can anyone help me by explaining the differences between the two above,
>>>> so that I can complete my proposal for GSoC?
>>>>
>>>> Love for my mother tongue is what made me want to work on this.
>>>>
>>>> Waiting for your reply,
>>>>
>>>> With regards,
>>>> Sanmitra D.
___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff

Re: [Apertium-stuff] Extend weighted transfer rules GSoC proposal

2019-04-05 Thread Sevilay Bayatlı

Hi Aboelhamd,

There are some points about your proposal:

First, I do not think "splitting sentence" is a good idea. Each language has
a different syntax, so how would you know where to split the sentence?

Second, regarding "substitute yasmet with other method": I do not think the
results will be any better if you substitute it with a statistical method.

Sincerely,

Sevilay

On Fri, Apr 5, 2019 at 7:41 PM Francis Tyers  wrote:

> El 2019-04-05 13:37, Aboelhamd Aly escribió:
> > Dear Francis,
> >
> > I thought that 30-40 hours per week are enough for GSoC and that's no
> > problem with any other activities, as long as I am able to preserve
> > that time. But if it's a problem, I will consider leaving the
> > part-time job when start in phase one.
> >
>
> Dear Aboelhamd,
>
> We consider GSoC to be a full-time job. It is extremely unlikely that
> you will be selected if you are planning to have other paid employment
> at the same time.
>
> Regards,
>
> Fran
>
>
>
___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff

Re: [Apertium-stuff] Contributing to Telugu Language

2019-04-04 Thread Sevilay Bayatlı

Hi,
Welcome to Apertium! Here you can find the ideas for GSoC:
http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code

and here is how to get started:
http://wiki.apertium.org/wiki/Getting_started_with_induction_tools

Best wishes,
Sevilay

On Fri, Apr 5, 2019 at 7:12 AM Sanmitra Dharmavarapu <
goodfriend2...@gmail.com> wrote:

> Hello all,
> I want to contribute to the Telugu language at Apertium.
> Please guide me; I am new here and am getting started with GSoC, as I completed
> my exams yesterday.
>
> With regards,
> Sanmitra D.
___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff

[Apertium-stuff] Adding Ambiguous Rules to Transfer File

2019-04-03 Thread Sevilay Bayatlı

Dear GSoC students,

If you are working on a new language pair, we recommend that you add
ambiguous rules to your language pair proposal. Before that, could you
choose any language pair from Apertium, add some ambiguous rules to its
transfer file, and then show us how it works?

Ambiguous rules arise when more than one transfer rule applies to the same
pattern. This can be either two rules of the same length, or one longer rule
and two shorter rules.
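
As a rough illustration of the two cases above, here is a small sketch that
looks for them in a set of rules. It assumes each rule is reduced to a name
and a non-empty tuple of category names; this representation, the rule names
and the function find_ambiguities are all hypothetical and much simpler than
a real Apertium transfer file.

```python
from itertools import combinations


def find_ambiguities(rules):
    """Find ambiguous rule sets in `rules` (name -> tuple of categories).

    Returns two lists:
      - pairs of rules with exactly the same pattern (same length), and
      - triples (long, first, second) where the long rule's pattern equals
        the first short pattern followed by the second one.
    """
    names = sorted(rules)
    same_pattern = [(a, b) for a, b in combinations(names, 2)
                    if rules[a] == rules[b]]
    long_vs_two_short = [
        (long_rule, first, second)
        for long_rule in names
        for first in names
        for second in names
        if rules[long_rule] == rules[first] + rules[second]
    ]
    return same_pattern, long_vs_two_short


# Hypothetical rules, for illustration only.
example_rules = {
    "det_nom": ("det", "nom"),
    "det_nom_alt": ("det", "nom"),
    "det_nom_adj": ("det", "nom", "adj"),
    "adj": ("adj",),
}
same, split = find_ambiguities(example_rules)
```

Here det_nom and det_nom_alt compete on the same pattern, and det_nom_adj
competes with det_nom (or det_nom_alt) followed by adj.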

Best wishes,

Sevilay
___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff

Re: [Apertium-stuff] spa-eng rule's issue

2019-03-26 Thread Sevilay Bayatlı

Hi,

I think there is a problem in interchunk, because when this rule applies to the
sentence "Aún hay preguntas esenciales sin respuesta.", it gives me the
translation "Still it #there is essential questions without response."
Could you please check it out? Below is the output of both systems.

Weighted: Still it #there is essential questions without response.


^adv-interc{^Still$}$ ^verbcj{^there
is$}$ ^nom_adj{^essential$
^question$}$ ^sin{^without$}$
^nom{^response$}$^punt{^.$}$

Unweighted: Still there are essential questions without response.

^Adv-interc{^still$}$
^haverhi{^there$ ^be<5>$}$
^nom_adj{^essential$ ^question<3>$}$
^sin{^without$}$
^nom{^response<3>$}$^punt{^.$}$^punt{^.$}$

Sevilay

On Mon, Mar 25, 2019 at 4:18 AM Francis Tyers  wrote:

> El 2019-03-25 00:50, Sevilay Bayatlı escribió:
> > After transferring all *.t*.x files from the ambiguous-rules branch into
> > master (to make the results the same for both systems), I now get this result
> > from the master branch. I just want to be sure: are you getting the
> > same result or not?
> >
> > sevilay@sevilay-linux:/media/sevilay/SAMSUNG/master/apertium-eng-spa$
> > git branch
> > * master
> >
> > /master/apertium-eng-spa$ echo "hace . seis . años" | apertium -d .
> > spa-eng-postchunk | lt-proc -g spa-eng.autogen.bin
> > It #do . Six . Years.
> >
> > thanks for your help
> >
> > Sevilay
> >
> > On Mon, Mar 25, 2019 at 1:34 AM Sevilay Bayatlı
> >  wrote:
> >
> >> thanks.
> >>
> >> Sevilay
> >>
> >> On Mon, Mar 25, 2019 at 1:29 AM Francis Tyers 
> >> wrote:
> >>
> >>> El 2019-03-24 22:06, Sevilay Bayatlı escribió:
> >>>> I cloned ambiguous-rules and compiled it,
> >>>>
> >>>> but when I ran
> >>>> sevilay@sevilay-linux:~/Documents/apertium-eng-spa$ git branch
> >>>> * master
> >>>>
> >>>> Why does it show me master? How else should I download ambiguous-rules?
> >>>> Sevilay
> >>>>
> >>>> On Sun, Mar 24, 2019 at 10:22 PM Francis Tyers
> >>> 
> >>>> wrote:
> >>>>
> >>>>> El 2019-03-24 19:09, Sevilay Bayatlı escribió:
> >>>>>> echo "hace . seis . años" | apertium -d . spa-eng-postchunk
> >>>>>>
> >>>>>> ^Prpers$ ^do$
> >>>>> ^.$
> >>>>>> ^Six$ ^.$ ^Year$^.$
> >>>>>>
> >>>>>> echo "hace . seis . años" | apertium -d . spa-eng-generador
> >>>>>> It #do . Six . Years
> >>>>>>
> >>>>>> unweighted (apertium):
> >>>>>>
> >>>>>> echo "hace seis años" | apertium -d . spa-eng
> >>>>>> Six years ago
> >>>>>> 
> >>>>>> ^Hace_num_nom{^six$ ^year$
> >>>>>> ^ago$}$^punt{^.$}$
> >>>>>>
> >>>>>> Also, when I run it with unweighted (apertium) as below, the following
> >>>>>> rules are applied:
> >>>>>>
> >>>>>> echo "hace . seis . años" | apertium -d . spa-eng
> >>>>>> It #do . Six . Years
> >>>>>> 
> >>>>>>
> >>>>>> 
> >>>>>>
> >>>>>> 
> >>>>>>
> >>>>>> ^Verbcj{^do$}$
> >>>>>> ^punt{^.$}$ ^Num{^six$}$
> >>>>>> ^punt{^.$}$
> >>>>>> ^Nom{^year<3>$}$^punt{^.$}$
> >>>>>>
> >>&

Re: [Apertium-stuff] spa-eng rule's issue

2019-03-24 Thread Sevilay Bayatlı

After transferring all *.t*.x files from the ambiguous-rules branch into master
(to make the results the same for both systems), I now get this result from the
master branch. I just want to be sure: are you getting the same result or not?

sevilay@sevilay-linux:/media/sevilay/SAMSUNG/master/apertium-eng-spa$ git
branch
* master

/master/apertium-eng-spa$ echo "hace . seis . años" | apertium -d .
spa-eng-postchunk | lt-proc -g spa-eng.autogen.bin
It #do . Six . Years.

thanks for your help

Sevilay


On Mon, Mar 25, 2019 at 1:34 AM Sevilay Bayatlı 
wrote:

> thanks.
>
> Sevilay
>
> On Mon, Mar 25, 2019 at 1:29 AM Francis Tyers  wrote:
>
>> El 2019-03-24 22:06, Sevilay Bayatlı escribió:
>> > I cloned ambiguous-rules and compiled it,
>> >
>> > but when I ran
>> > sevilay@sevilay-linux:~/Documents/apertium-eng-spa$ git branch
>> > * master
>> >
>> > Why does it show me master? How else should I download ambiguous-rules?
>> > Sevilay
>> >
>> > On Sun, Mar 24, 2019 at 10:22 PM Francis Tyers 
>> > wrote:
>> >
>> >> El 2019-03-24 19:09, Sevilay Bayatlı escribió:
>> >>> echo "hace . seis . años" | apertium -d . spa-eng-postchunk
>> >>>
>> >>> ^Prpers$ ^do$
>> >> ^.$
>> >>> ^Six$ ^.$ ^Year$^.$
>> >>>
>> >>> echo "hace . seis . años" | apertium -d . spa-eng-generador
>> >>> It #do . Six . Years
>> >>>
>> >>> unweighted (apertium):
>> >>>
>> >>> echo "hace seis años" | apertium -d . spa-eng
>> >>> Six years ago
>> >>> 
>> >>> ^Hace_num_nom{^six$ ^year$
>> >>> ^ago$}$^punt{^.$}$
>> >>>
>> >>> Also, when I run it with unweighted (apertium) as below, the following
>> >>> rules are applied:
>> >>>
>> >>> echo "hace . seis . años" | apertium -d . spa-eng
>> >>> It #do . Six . Years
>> >>> 
>> >>>
>> >>> 
>> >>>
>> >>> 
>> >>>
>> >>> ^Verbcj{^do$}$
>> >>> ^punt{^.$}$ ^Num{^six$}$
>> >>> ^punt{^.$}$
>> >>> ^Nom{^year<3>$}$^punt{^.$}$
>> >>>
>> >>> ==
>> >>> weighted(apertium-ambiguous):
>> >>>
>> >>> a youngster *italoamericano of 33 years that from #do six years
>> >> #live
>> >>> in the runner of the death of Virginia's prison
>> >>>
>> >>> 
>> >>> 
>> >>>
>> >>> ^verbcj{^do$}$
>> >>> ^num_year{^six$ ^year$}$
>> >>>
>> >>> echo "un joven italoamericano de 33 años que desde hace seis
>> >> años
>> >>> vive en el corredor de la muerte de una cárcel de Virginia"|
>> >> apertium
>> >>> -d . spa-eng-biltrans
>> >>> ^uno/a$
>> >>> ^joven/youngster$
>> >> ^*italoamericano/*italoamericano$
>> >>> ^de/of/from$ ^33/33$
>> >>> ^año/year$
>> >>> ^que/that$ ^desde/from$
>> >>> ^hacer/do$
>> >>> ^seis/six$ ^año/year$
>> >>> ^vivir/live$
>> >>> ^en/in/on$ ^el/the$
>> >>> ^corredor/runner$ ^de/of/from$
>> >>> ^el/the$
>> >>> ^muerte/death$ ^de/of/from$
>> >>> ^uno/a$
>> >>> ^cárcel/prison$ ^de/of/from$
>> >>>
>> >> ^Virginia/Virginia$^./.$
>> >>>
>> >>> ==

Re: [Apertium-stuff] spa-eng rule's issue

2019-03-24 Thread Sevilay Bayatlı

thanks.

Sevilay

On Mon, Mar 25, 2019 at 1:29 AM Francis Tyers  wrote:

> On 2019-03-24 22:06, Sevilay Bayatlı wrote:
> > I clone ambiguous-rules and compile it
> >
> > but when I did sevilay@sevilay-linux:~/Documents/apertium-eng-spa$ git
> > branch
> > * master
> >
> > Why does it show master? How else should I download ambiguous-rules?
> > Sevilay
> >
> > On Sun, Mar 24, 2019 at 10:22 PM Francis Tyers 
> > wrote:
> >
> >> On 2019-03-24 19:09, Sevilay Bayatlı wrote:
> >>> echo "hace . seis . años" | apertium -d . spa-eng-postchunk
> >>>
> >>> ^Prpers$ ^do$
> >> ^.$
> >>> ^Six$ ^.$ ^Year$^.$
> >>>
> >>> echo "hace . seis . años" | apertium -d . spa-eng-generador
> >>> It #do . Six . Years
> >>>
> >>> uweighted(apertium):
> >>>
> >>> echo "hace seis años" | apertium -d . spa-eng
> >>> Six years ago
> >>> 
> >>> ^Hace_num_nom{^six$ ^year$
> >>> ^ago$}$^punt{^.$}$
> >>>
> >>> Also when I run it with unweighted(apertium) as below he following
> >>> rules are applying
> >>>
> >>> echo "hace . seis . años" | apertium -d . spa-eng
> >>> It #do . Six . Years
> >>> 
> >>>
> >>> 
> >>>
> >>> 
> >>>
> >>> ^Verbcj{^do$}$
> >>> ^punt{^.$}$ ^Num{^six$}$
> >>> ^punt{^.$}$
> >>> ^Nom{^year<3>$}$^punt{^.$}$
> >>>
> >>> ==
> >>> weighted(apertium-ambiguous):
> >>>
> >>> a youngster *italoamericano of 33 years that from #do six years
> >> #live
> >>> in the runner of the death of Virginia's prison
> >>>
> >>> 
> >>> 
> >>>
> >>> ^verbcj{^do$}$
> >>> ^num_year{^six$ ^year$}$
> >>>
> >>> echo "un joven italoamericano de 33 años que desde hace seis
> >> años
> >>> vive en el corredor de la muerte de una cárcel de Virginia"|
> >> apertium
> >>> -d . spa-eng-biltrans
> >>> ^uno/a$
> >>> ^joven/youngster$
> >> ^*italoamericano/*italoamericano$
> >>> ^de/of/from$ ^33/33$
> >>> ^año/year$
> >>> ^que/that$ ^desde/from$
> >>> ^hacer/do$
> >>> ^seis/six$ ^año/year$
> >>> ^vivir/live$
> >>> ^en/in/on$ ^el/the$
> >>> ^corredor/runner$ ^de/of/from$
> >>> ^el/the$
> >>> ^muerte/death$ ^de/of/from$
> >>> ^uno/a$
> >>> ^cárcel/prison$ ^de/of/from$
> >>>
> >> ^Virginia/Virginia$^./.$
> >>>
> >>> ==
>

Re: [Apertium-stuff] spa-eng rule's issue

2019-03-24 Thread Sevilay Bayatlı

echo "hace . seis . años" | apertium -d . spa-eng-postchunk
^Prpers$ ^do$ ^.$
^Six$ ^.$ ^Year$^.$
echo "hace . seis . años" | apertium -d . spa-eng-generador
It #do . Six . Years

unweighted (apertium):
echo "hace seis años" | apertium -d . spa-eng
Six years ago

^Hace_num_nom{^six$ ^year$
^ago$}$^punt{^.$}$

Also, when I run it with the unweighted system (apertium) as below, the following
rules apply:
echo "hace . seis . años" | apertium -d . spa-eng
It #do . Six . Years

^Verbcj{^do$}$
^punt{^.$}$ ^Num{^six$}$
^punt{^.$}$
^Nom{^year<3>$}$^punt{^.$}$
==
weighted(apertium-ambiguous):
a youngster *italoamericano of 33 years that from #do six years #live in
the runner of the death of Virginia's prison

^verbcj{^do$}$
^num_year{^six$ ^year$}$

echo "un joven italoamericano de 33 años que desde hace seis años vive en
el corredor de la muerte de una cárcel de Virginia"| apertium -d .
spa-eng-biltrans
^uno/a$ ^joven/youngster$
^*italoamericano/*italoamericano$ ^de/of/from$
^33/33$ ^año/year$
^que/that$ ^desde/from$
^hacer/do$
^seis/six$ ^año/year$
^vivir/live$ ^en/in/on$
^el/the$
^corredor/runner$ ^de/of/from$
^el/the$ ^muerte/death$
^de/of/from$ ^uno/a$
^cárcel/prison$ ^de/of/from$
^Virginia/Virginia$^./.$


So my question is: how can I solve the # problem with the word "do"?

Sevilay

On Sun, Mar 24, 2019 at 9:44 PM Francis Tyers  wrote:

> On 2019-03-24 18:30, Sevilay Bayatlı wrote:
> > un joven italoamericano de 33 años que desde hace seis años vive en
> > el corredor de la muerte de una cárcel de Virginia
> >
> > uweighted(apertium):
> >
> > echo "hace seis años" | apertium -d . spa-eng
> > Six years ago
> > 
> > ^Hace_num_nom{^six$ ^year$
> > ^ago$}$^punt{^.$}$
> >
> > Also when I run it with unweighted(apertium) as below he following
> > rules are applying
> >
> > echo "hace . seis . años" | apertium -d . spa-eng
> > It #do . Six . Years
> > 
> >
> > 
> >
> > 
> >
> > ^Verbcj{^do$}$
> > ^punt{^.$}$ ^Num{^six$}$
> > ^punt{^.$}$
> > ^Nom{^year<3>$}$^punt{^.$}$
> >
> > ==
> > weighted(apertium-ambiguous):
> >
> > a youngster *italoamericano of 33 years that from #do six years #live
> > in the runner of the death of Virginia's prison
> >
> >  
> > 
> >
> > ^verbcj{^do$}$
> > ^num_year{^six$ ^year$}$
> >
> > echo "un joven italoamericano de 33 años que desde hace seis años
> > vive en el corredor de la muerte de una cárcel de Virginia"| apertium
> > -d . spa-eng-biltrans
> > ^uno/a$
> > ^joven/youngster$ ^*italoamericano/*italoamericano$
> > ^de/of/from$ ^33/33$
> > ^año/year$
> > ^que/that$ ^desde/from$
> > ^hacer/do$
> > ^seis/six$ ^año/year$
> > ^vivir/live$
> > ^en/in/on$ ^el/the$
> > ^corredor/runner$ ^de/of/from$
> > ^el/the$
> > ^muerte/death$ ^de/of/from$
> > ^uno/a$
> > ^cárcel/prison$ ^de/of/from$
> > ^Virginia/Virginia$^./.$
> >
> > so my question here how to solve # problem with word "do"?
>
> Give the output of postchunk and the generator.
>
> Fran
>
>
> ___
> Apertium-stuff mailing list
> Apertium-stuff@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>

[Apertium-stuff] spa-eng rule's issue

2019-03-24 Thread Sevilay Bayatlı

un joven italoamericano de 33 años que desde hace seis años vive en el
corredor de la muerte de una cárcel de Virginia

unweighted (apertium):
echo "hace seis años" | apertium -d . spa-eng
Six years ago

^Hace_num_nom{^six$ ^year$
^ago$}$^punt{^.$}$

Also, when I run it with the unweighted system (apertium) as below, the following
rules apply:
echo "hace . seis . años" | apertium -d . spa-eng
It #do . Six . Years



^Verbcj{^do$}$
^punt{^.$}$ ^Num{^six$}$
^punt{^.$}$
^Nom{^year<3>$}$^punt{^.$}$
==
weighted(apertium-ambiguous):
a youngster *italoamericano of 33 years that from #do six years #live in
the runner of the death of Virginia's prison

 


^verbcj{^do$}$
^num_year{^six$ ^year$}$

echo "un joven italoamericano de 33 años que desde hace seis años vive en
el corredor de la muerte de una cárcel de Virginia"| apertium -d .
spa-eng-biltrans
^uno/a$ ^joven/youngster$
^*italoamericano/*italoamericano$ ^de/of/from$
^33/33$ ^año/year$
^que/that$ ^desde/from$
^hacer/do$
^seis/six$ ^año/year$
^vivir/live$ ^en/in/on$
^el/the$
^corredor/runner$ ^de/of/from$
^el/the$ ^muerte/death$
^de/of/from$ ^uno/a$
^cárcel/prison$ ^de/of/from$
^Virginia/Virginia$^./.$
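
To line the biltrans output above up with the words that end up marked in the translation, it can help to split the stream into source/target pairs. The following is only a minimal sketch: it assumes units of the form ^source/target1/target2$ with no escaped ^, $ or / inside them (which holds for the output as pasted here), and the name parse_biltrans is made up for illustration, not part of Apertium.

```python
import re

# Matches one lexical unit of the form ^source/target1/target2$.
# Assumption: no escaped ^, $ or / inside a unit.
LU_PATTERN = re.compile(r"\^([^$^]*)\$")


def parse_biltrans(stream):
    """Return a list of (source, [targets]) pairs from biltrans-style output."""
    pairs = []
    for unit in LU_PATTERN.findall(stream):
        # The first field is the source side; the rest are target alternatives.
        source, *targets = unit.split("/")
        pairs.append((source, targets))
    return pairs


sample = "^hacer/do$ ^seis/six$ ^año/year$ ^de/of/from$"
for source, targets in parse_biltrans(sample):
    print(source, "->", targets)
```

With the pairs in hand, it is easy to check which source lemma a marked target word such as "do" came from.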

So my question is: how can I solve the # problem with the word "do"?

Sevilay

Re: [Apertium-stuff] Proposal

2019-03-20 Thread Sevilay Bayatlı

Hi,

Here is how to get started:
http://wiki.apertium.org/wiki/Getting_started_with_induction_tools
You will also need an Apertium wiki account to write your proposal.

best,

Sevilay


On Wed, Mar 20, 2019 at 12:52 PM Mohit Raj  wrote:

> Hi all,
> Here is my proposal for GSOC.
>
> I am Mohit Raj, doing my post graduation (4th sem) in linguistics from
> Dr. B.R. Ambedkar University , K.M.I, Agra.
>
>
> My area of interest is Machine Translation and Natural Language
> Processing. Previously i have completed courses on XML, Python programming,
> Language Technologies and Machine Translation. I have worked towards the
> development of parser for Magahi, in collaboration with my classmate Neerav
> Mathur, for course projects. I took participation in following workshop :-
>
>
> 1. 9th IASNLP-2018: IIIT-Hyderabad Advanced School on Natural Language
> Processing
>
> 2. SOIL-Tech: Towards Digital India at JNU, New Delhi
>
> 3. Hands on workshop on Statistical Machine Translation with Moses at
> K.M.I, Agra
>
>
> During the Machine Translation Workshop, Atul Kumar Ojha introduced us
> Rule Based Machine Translation system Apertium and in this period he also
> informed us about GSOC.
>
>
> I am interested in working on English-Magahi language pair for Machine
> Translation. Magahi belongs to the Indo-Aryan language family and it is also my
> native language. I have been suggested that in Machine Translation,
> Morphological Analyzer plays an important role in improving the system’s
> performance for morphologically rich language like Magahi. So I am
> interested in developing morph analyzer of Magahi.
>
>
> So, please give your feedback, Your feedback is greatly appreciated.
>
>
> Thanks,
>
>
> Mohit Raj
>

Re: [Apertium-stuff] rule's issue

2019-03-14 Thread Sevilay Bayatlı

Hi Fran,
Do you think there is no problem with the rules, or with how they are applied, in
the issues above? If so, please let me know so that I can start training.

thank you,

Sevilay

On Thu, Mar 14, 2019 at 10:30 AM Sevilay Bayatlı 
wrote:

> I have pasted the biltrans for all of the sentences below.
>
> Sevilay
>
> On Thu, Mar 14, 2019 at 7:07 AM Francis Tyers  wrote:
>
>> On 2019-03-13 12:54, Sevilay Bayatlı wrote:
>> > I think we should not depend on rule's id to solve thess issues,
>> > please can you explain it depend on the rule's comments because rules
>> > id not right in the ambiguous-rules branch there are more than 10
>> > rules in ambiguous-branch have commented and they had ids.
>> >
>> >  All the following words  have # with the output of weighted system:
>> >
>> > #chaos
>> > Weighted:> > ^nom{^chaos$}$
>> >
>> > Unweighted:
>> > ^det_nom{^a<3>$ ^chaos<3>$}$
>>
>
>  Biltrans:
>  Sin embargo, antes de que aparezca este nuevo orden, puede que el
> mundo se enfrente a un desorden cada vez más profundo, si es que no
> directamente a un caos.
> ^Sin embargo/However$^,/,$ ^antes de
> que/before$
> ^aparecer/appear$
> ^este/this$
> ^nuevo/new$
> ^orden/order$^,/,$
> ^poder/can$
> ^que/that$ ^el/the$
> ^mundo/world$ ^se/$
> ^enfrentar/confront/pit$
> ^a/to$ ^uno/a$
> ^desorden/disorder$ ^cada vez más/increasingly$
> ^profundo/deep$^,/,$
> ^si/if$ ^ser/be$
> ^que/that$ ^no/no$
> ^directamente/directly/straight$ ^a/to$
> ^uno/a$
> ^caos/chaos$^./.$^./.$
>
>> >
>> > ===
>> > #there
>> > Weighted: 
>> > ^verbcj{^there is$}$
>> >
>> > Unweighted:
>> > ^Haverhi{^there$
>> > ^be<5>$}$
>>
>
>Biltrans:
>Por fortuna, existe una opción multilateral y ya hay un precedente.
>^Por fortuna/Fortunately$^,/,$
> ^existir/exist$
> ^uno/a$ ^opción/option$
> ^*multilateral/*multilateral$ ^y/and$
> ^ya/already$ ^hay/there
> is$ ^uno/a$
> ^precedente/precedent$^./.$^./.$
>
>> > ==
>> > #focus
>> > Weighted:> > ^nom{^focus$}$
>> >
>> > Unweighted:
>> > ^det_nom{^his<3>$ ^focus<3>$}$
>>
>
> Biltrans:
> La característica del pacto propuesto que genera mayor entusiasmo -su
> foco en las barreras regulatorias como estándares de productos
> obligatorios- en realidad es la que debería incitar la mayor preocupación.
> ^El/The$
> ^característica/characteristic$ ^de/of/from$
> ^el/the$ ^pacto/pact$
> ^proponer/propose$
> ^que/that$
> ^generar/generate$
> ^mayor/main/great$
> ^entusiasmo/enthusiasm$
> ^-/-$^suyo/his$
> ^foco/focus$ ^en/in/on$
> ^el/the$
> ^barrera/barrier$ ^*regulatorias/*regulatorias$
> ^como/like$ ^estándar/standard$
> ^de/of/from$ ^producto/product$
> ^obligatorio/compulsory/mandatory$^-/-$
> ^en realidad/in reality$
> ^ser/be$ ^el
> que/the one who$
> ^deber/have# to$
> ^incitar/incite$
> ^el/the$
> ^mayor/main/great$
> ^preocupación/worry$^./.$^./.$
>
>> >
>> > ==
>> > #time
>> > Weighted:
>> > > > ^nom{^time$
>> >
>> > Unweighted:
>> > 
>> > ^nom{^a<3>$ ^time<3>$
>>
>
>Biltrans:
> Ha entrado a la historia como el comienzo de algo nuevo, una nueva era tal
> vez, pero en cualquier caso un tiempo de cambios.
>
> ^Haber/Have$
> ^entrar/go# in$ ^a/to$
> ^el/the$
> ^historia/history/story$ ^como/like$
> ^el/the$
> ^comienzo/beginning$ ^de/of/from$
> ^algo/something$
> ^nuevo/new$^,/,$
> ^uno/a$
> ^nuevo/new$
> ^ser/be$ ^tal
> vez/maybe$^,/,$ ^pero/but$
> ^en/in/on$ ^cualquier/any$
> ^caso/case$ ^uno/a$
> ^tiempo/time/weather$ ^de/of/from$
> ^cambio/change$^./.$^./.$
>
>> >
>> >
>> ===
>> > #crisis
>> > Weghited:  de las crisis financieras= of the #crisis
>> >
>> > , > > g"
>> >
>> > ^de{^of$}$ ^det{^the$}$
>> > ^nom{^crisis$}$
>> >
>> > Unweighted:
>> > 
>> > ^det_nom_adj{^the$ ^financial$
>> > ^crisis<3>$}$
>>
> Biltrans:
>   echo "PARÍS – A medida que la crisis económica se profundiza y
> amplía, el mundo busca analogías históricas como ayuda para comprender lo
> que ha ocurrido.
>
> ^PARÍS/PARIS$ ^–/–$
> ^A/To$ ^medida/measure$
> ^que/that$
> ^el/the$ ^crisis/crisis$
> ^económico/economic/economical$
> ^se/$
> ^profundizar/deepen$
> ^y/and$
> ^ampliar/expand$^,/,$
> ^el/the$ ^mundo/world$
> ^buscar/look# for$
> ^analogía/analogy$
> ^histórico/historical$ ^como/like$
> ^ayuda/help$ ^para/for$
> ^comprender/comprise$ ^lo
> que/what$
> ^haber/have$
> ^ocurrir/occur$^./.$^./.$
>
> >
>> ===
>> >
>>
>> Thanks!
>>
>> Can you give the input sentence (biltrans) for each of those rules?
>>
>> Fran
>>
>

Re: [Apertium-stuff] Fwd: RBMT from Kazakh to Turkish

2019-03-14 Thread Sevilay Bayatlı

Kazakh and Turkish are not considered completely similar in terms of affixes,
which is why we have structural transfer rules between the two languages. More
transfer rules are still needed to improve the quality of translation.

Sevilay

On Thu, Mar 14, 2019 at 6:16 PM Hèctor Alòs i Font 
wrote:

> Hi Daniyar,
>
> No, there is no problem with having different pipelines for different
> language pairs. In fact, there are lots of different possibilities. Take a
> look at the mode files of different language pairs (especially old and new
> ones) and you'll see. In any case, I'm not sure that there should be any
> substantial differences between the Kazakh-Russian and the Kazakh-Turkish
> pipelines, except that the Turkish morphological analysis and generation will
> be done with HFST and the Russian with lttoolbox.
>
> Hèctor
>
>
> On Thu, 14 Mar 2019, 17:46, Daniyar Nariman via Apertium-stuff <
> apertium-stuff@lists.sourceforge.net> wrote:
>
>> Hi Sevilay,
>>
>> In my message, I meant that Kazakh and Turkish languages are similar in
>> terms of affixes and sentence structure, and Kazakh and Russian are more
>> different. So if I will increase the translation quality of the first pair,
>> by adding some additional functionality to the pipeline, there is a chance
>> that the same might not work on the second pair. Finally, the question is:
>> does this pipeline have to be the same for all language pairs, or can it
>> differ?
>> --
>> *From:* Sevilay Bayatlı 
>> *Sent:* Thursday, March 14, 2019 1:13:18 PM
>> *To:* apertium-stuff@lists.sourceforge.net
>> *Subject:* Re: [Apertium-stuff] Fwd: RBMT from Kazakh to Turkish
>>
>> Hi Daniyar,
>>
>> Could you tell us how modifying some parts of the pipeline can increase
>> accuracy on one pair and decrease it on another?
>>
>> Sevilay
>>
>>
>> On Thu, Mar 14, 2019 at 11:26 AM Ilnar Salimzianov 
>> wrote:
>>
>>>
>>>
>>>
>>>  Forwarded Message 
>>> Subject:RBMT from Kazakh to Turkish
>>> Date:   Wed, 13 Mar 2019 19:07:42 +
>>> From:   Daniyar Nariman 
>>> To: il...@selimcan.org 
>>>
>>>
>>>
>>> Dear Ilnar Salimzianov,
>>>
>>>
>>> My name is Nariman. I am a third-year bachelor student at
>>> Innopolis University(Russia, Tatarstan). I am studying Data Science and
>>> really interested in disciplines such as machine learning, natural
>>> language processing, information retrieval etc.
>>>
>>>
>>> Recently I read your paper, RBMT from Kazakh to Turkish, which was
>>> published in EAMT 2018. It was really interesting to read. The thing is,
>>> I am applying to GSoC(Google Summer of Code) this year to Apertium, but
>>> I am still thinking on the topic which I would like to deal with. One of
>>> the topics was to bring the defined language pair to state-of-the-art
>>> quality and I would like to deal with Kazakh-Turkish pair as the
>>> Kazakh language my mother tongue and I studied the Turkish language in
>>> the high school for 5 years.
>>>
>>>
>>> I would like to ask If there any restrictions on how to increase the
>>> quality of this pair?
>>>
>>> Excluding adding a large number of rules or by expanding the
>>> dictionary(taken for granted). For instance by optimizing the algorithms
>>> given in the pipeline. I am asking this question because by modifying
>>> some part of the pipeline, we can increase accuracy on our pair of
>>> languages, but decrease on another pair and constructing a different
>>> pipeline for different pairs is not a good idea in my opinion.
>>>
>>>
>>>
>>> Thanks in advance!
>>>
>>>
>>> Best Regards,
>>>
>>> Daniyar Nariman
>>>
>>>
>>>
>>>
>>>

Re: [Apertium-stuff] Fwd: RBMT from Kazakh to Turkish

2019-03-14 Thread Sevilay Bayatlı

Hi Daniyar,

Could you tell us how modifying some parts of the pipeline can increase accuracy
on one pair and decrease it on another?

Sevilay


On Thu, Mar 14, 2019 at 11:26 AM Ilnar Salimzianov 
wrote:

>
>
>
>  Forwarded Message 
> Subject:RBMT from Kazakh to Turkish
> Date:   Wed, 13 Mar 2019 19:07:42 +
> From:   Daniyar Nariman 
> To: il...@selimcan.org 
>
>
>
> Dear Ilnar Salimzianov,
>
>
> My name is Nariman. I am a third-year bachelor student at
> Innopolis University(Russia, Tatarstan). I am studying Data Science and
> really interested in disciplines such as machine learning, natural
> language processing, information retrieval etc.
>
>
> Recently I read your paper, RBMT from Kazakh to Turkish, which was
> published in EAMT 2018. It was really interesting to read. The thing is,
> I am applying to GSoC(Google Summer of Code) this year to Apertium, but
> I am still thinking on the topic which I would like to deal with. One of
> the topics was to bring the defined language pair to state-of-the-art
> quality and I would like to deal with Kazakh-Turkish pair as the
> Kazakh language my mother tongue and I studied the Turkish language in
> the high school for 5 years.
>
>
> I would like to ask If there any restrictions on how to increase the
> quality of this pair?
>
> Excluding adding a large number of rules or by expanding the
> dictionary(taken for granted). For instance by optimizing the algorithms
> given in the pipeline. I am asking this question because by modifying
> some part of the pipeline, we can increase accuracy on our pair of
> languages, but decrease on another pair and constructing a different
> pipeline for different pairs is not a good idea in my opinion.
>
>
>
> Thanks in advance!
>
>
> Best Regards,
>
> Daniyar Nariman
>
>
>
>
>

Re: [Apertium-stuff] rule's issue

2019-03-14 Thread Sevilay Bayatlı

I have pasted the biltrans for all of the sentences below.

Sevilay

On Thu, Mar 14, 2019 at 7:07 AM Francis Tyers  wrote:

> On 2019-03-13 12:54, Sevilay Bayatlı wrote:
> > I think we should not depend on rule's id to solve thess issues,
> > please can you explain it depend on the rule's comments because rules
> > id not right in the ambiguous-rules branch there are more than 10
> > rules in ambiguous-branch have commented and they had ids.
> >
> >  All the following words  have # with the output of weighted system:
> >
> > #chaos
> > Weighted: > ^nom{^chaos$}$
> >
> > Unweighted:
> > ^det_nom{^a<3>$ ^chaos<3>$}$
>

 Biltrans:
 Sin embargo, antes de que aparezca este nuevo orden, puede que el
mundo se enfrente a un desorden cada vez más profundo, si es que no
directamente a un caos.
^Sin embargo/However$^,/,$ ^antes de
que/before$
^aparecer/appear$
^este/this$
^nuevo/new$
^orden/order$^,/,$
^poder/can$
^que/that$ ^el/the$
^mundo/world$ ^se/$
^enfrentar/confront/pit$
^a/to$ ^uno/a$
^desorden/disorder$ ^cada vez más/increasingly$
^profundo/deep$^,/,$
^si/if$ ^ser/be$
^que/that$ ^no/no$
^directamente/directly/straight$ ^a/to$
^uno/a$
^caos/chaos$^./.$^./.$

> >
> > ===
> > #there
> > Weighted: 
> > ^verbcj{^there is$}$
> >
> > Unweighted:
> > ^Haverhi{^there$
> > ^be<5>$}$
>

   Biltrans:
   Por fortuna, existe una opción multilateral y ya hay un precedente.
   ^Por fortuna/Fortunately$^,/,$
^existir/exist$
^uno/a$ ^opción/option$
^*multilateral/*multilateral$ ^y/and$
^ya/already$ ^hay/there
is$ ^uno/a$
^precedente/precedent$^./.$^./.$

> > ==
> > #focus
> > Weighted: > ^nom{^focus$}$
> >
> > Unweighted:
> > ^det_nom{^his<3>$ ^focus<3>$}$
>

Biltrans:
La característica del pacto propuesto que genera mayor entusiasmo -su
foco en las barreras regulatorias como estándares de productos
obligatorios- en realidad es la que debería incitar la mayor preocupación.
^El/The$
^característica/characteristic$ ^de/of/from$
^el/the$ ^pacto/pact$
^proponer/propose$
^que/that$
^generar/generate$
^mayor/main/great$
^entusiasmo/enthusiasm$
^-/-$^suyo/his$
^foco/focus$ ^en/in/on$
^el/the$
^barrera/barrier$ ^*regulatorias/*regulatorias$
^como/like$ ^estándar/standard$
^de/of/from$ ^producto/product$
^obligatorio/compulsory/mandatory$^-/-$
^en realidad/in reality$
^ser/be$ ^el
que/the one who$
^deber/have# to$
^incitar/incite$
^el/the$
^mayor/main/great$
^preocupación/worry$^./.$^./.$

> >
> > ==
> > #time
> > Weighted:
> >  > ^nom{^time$
> >
> > Unweighted:
> > 
> > ^nom{^a<3>$ ^time<3>$
>

   Biltrans:
Ha entrado a la historia como el comienzo de algo nuevo, una nueva era tal
vez, pero en cualquier caso un tiempo de cambios.

^Haber/Have$
^entrar/go# in$ ^a/to$
^el/the$
^historia/history/story$ ^como/like$
^el/the$
^comienzo/beginning$ ^de/of/from$
^algo/something$
^nuevo/new$^,/,$
^uno/a$
^nuevo/new$
^ser/be$ ^tal
vez/maybe$^,/,$ ^pero/but$
^en/in/on$ ^cualquier/any$
^caso/case$ ^uno/a$
^tiempo/time/weather$ ^de/of/from$
^cambio/change$^./.$^./.$

> >
> >
> ===
> > #crisis
> > Weghited:  de las crisis financieras= of the #crisis
> >
> > ,  > g"
> >
> > ^de{^of$}$ ^det{^the$}$
> > ^nom{^crisis$}$
> >
> > Unweighted:
> > 
> > ^det_nom_adj{^the$ ^financial$
> > ^crisis<3>$}$
>
Biltrans:
  echo "PARÍS – A medida que la crisis económica se profundiza y
amplía, el mundo busca analogías históricas como ayuda para comprender lo
que ha ocurrido.

^PARÍS/PARIS$ ^–/–$ ^A/To$
^medida/measure$
^que/that$
^el/the$ ^crisis/crisis$
^económico/economic/economical$
^se/$
^profundizar/deepen$
^y/and$
^ampliar/expand$^,/,$
^el/the$ ^mundo/world$
^buscar/look# for$
^analogía/analogy$
^histórico/historical$ ^como/like$
^ayuda/help$ ^para/for$
^comprender/comprise$ ^lo
que/what$
^haber/have$
^ocurrir/occur$^./.$^./.$

>
> ===
> >
>
> Thanks!
>
> Can you give the input sentence (biltrans) for each of those rules?
>
> Fran
>

Re: [Apertium-stuff] rule's issue

2019-03-13 Thread Sevilay Bayatlı

I think we should not depend on the rule ids to solve these issues. Could you
please explain it in terms of the rules' comments instead? The rule ids are not
reliable in the ambiguous-rules branch: more than 10 rules there have been
commented out, and they had ids.

All of the following words get a # in the output of the weighted system:

#chaos
Weighted:{^chaos$}$

Unweighted:
^det_nom{^a<3>$ ^chaos<3>$}$

===
#there
Weighted: 
^verbcj{^there is$}$

Unweighted:
^Haverhi{^there$ ^be<5>$}$
==
#focus
Weighted:{^focus$}$

Unweighted:
^det_nom{^his<3>$ ^focus<3>$}$

==
#time
Weighted:
{^time$

Unweighted:

^nom{^a<3>$ ^time<3>$

===
#crisis
Weighted:  de las crisis financieras = of the #crisis

, {^of$}$ ^det{^the$}$
^nom{^crisis$}$

Unweighted:

^det_nom_adj{^the$ ^financial$
^crisis<3>$}$
===

thanks in advance,
Sevilay


On Wed, Mar 13, 2019 at 1:37 PM Sevilay Bayatlı 
wrote:

> I think we should not depend on rule's id to solve this issue, please can
> you explain it depend on rule's comments because rules id not right with
> you there are more than 10 rules in ambiguous-branch commented and have id.
>
> hay diferencias evidentes=There are evident differences
>
> Weighted:   and   id="xx" comment="REGLA: NOM ADJ">
>
> output= it #there is evident differences
>
> ^verbcj{^there is$}$
>
> unweighted output:
>
> ^Haverhi{^there$ ^be<5>$}$
> ^nom{^difference<3>$}$^punt{^.$}$
>
> Sevilay
>
>
> On Wed, Mar 13, 2019 at 1:04 PM Francis Tyers  wrote:
>
>> On 2019-03-13 09:06, Sevilay Bayatlı wrote:
>> > ok but in both systems different rules applying, in unweighted 35
>> > applying  and in weighted 37 applying
>> >
>> > Sevilay
>> >
>> > On Wed, Mar 13, 2019 at 12:02 PM Francis Tyers 
>> > wrote:
>> >
>> >> On 2019-03-13 08:28, Sevilay Bayatlı wrote:
>> >>> Please are there any problem in the applying rules or in rules
>> >>> themselves for new system (Weighted), I dont see problem with
>> >> applying
>> >>> rules but I am not sure if there is problem in  rule itself or not
>> >>>
>> >>> #there
>> >>> #crisis
>> >>> 
>> >>> hay diferencias evidentes=There are evident differences
>> >>>
>> >>> Weighted:   and
>> >> > >>> id="14" comment="REGLA: NOM ADJ">
>> >>>
>> >>> output= it #there is evident differences
>> >>>
>> >>> ^verbcj{^there is$}$
>> >>>
>> >>> Unweighted:
>> >>>
>> >>> echo "hay diferencias"| apertium -d . spa-eng-chunker
>> >>>
>> >>> apertium-transfer: Rule 35 hay/there
>> >>> is
>> >>>
>> >>> apertium-transfer: Rule 1 diferencia/difference
>> >>>
>> >>> apertium-transfer: Rule 114 ./.
>> >>> ^Haverhi{^there$
>> >>> ^be<5>$}$
>> >>> ^nom{^difference<3>$}$^punt{^.$}$###
>> >>
>> >> Here is your answer:
>> >>
>> >> Your code is doing:
>> >>
>> >> ^verbcj{^there is$}$
>> >>
>> >> The unweighted code is doing:
>> >>
>> >> ^verbcj{^there is$}$
>> >>
>> >> Make the weighted code do the same as the unweighted code.
>> >>
>> 
>>
>> 
>>
>> How can these rules apply to that verb pattern?
>>
>> F.
>>
>

[Apertium-stuff] rule's issue

2019-03-13 Thread Sevilay Bayatlı

Is there a problem in how the rules are applied, or in the rules themselves, for
the new (weighted) system? I don't see a problem with how the rules are applied,
but I am not sure whether there is a problem in the rules themselves.

#there
#crisis

hay diferencias evidentes=There are evident differences

Weighted:   and  

output= it #there is evident differences

^verbcj{^there is$}$

Unweighted:

 echo "hay diferencias"| apertium -d . spa-eng-chunker

apertium-transfer: Rule 35 hay/there
is

apertium-transfer: Rule 1 diferencia/difference

apertium-transfer: Rule 114 ./.
^Haverhi{^there$ ^be<5>$}$
^nom{^difference<3>$}$^punt{^.$}$
==

Weighted:  de las crisis financieras = of the #crisis

, /of/from

apertium-transfer: Rule 33 el/the

apertium-transfer: Rule 10 el/the
crisis/crisis

apertium-transfer: Rule 16 el/the
crisis/crisis financiero/financial

apertium-transfer: Rule 114 ./.
^De{^of$}$ ^det_nom_adj{^the$
^financial$ ^crisis<3>$}$^punt{^.$}$

Sevilay

[Apertium-stuff] spa-eng pair

2019-03-12 Thread Sevilay Bayatlı

Hi,

Who is the author of the spa-eng pair, or can anyone else help? There are
some issues with applying ambiguous rules, and I need help solving them: I
have limited time, and I urgently need the results.

For example, in "la crisis economica = the #crisis" the translation
contains a # after the rules below are applied. By the way, we run it with
the ambiguous-rules branch and with the attached transfer file. Could you
please check it?


Source :  PARÍS – A medida que la crisis económica se profundiza y amplía,
el mundo busca analogías históricas como ayuda para comprender lo que ha
ocurrido.

Target :  PARIS – To measure that the #crisis #deepen and #expand, the
world #look historical analogies like help to comprise what #have occurred.


w1 w2   w3

r33 => w1
r1 => w2
r29 => w3
r11 & r10 => w1 w2
r14 => w2 w3
r16 => w1 w2 w3
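
One way to read the table above is as a set of rule matches over word
positions. The sketch below (Python, illustrative only) lists every way to
cover w1 w2 w3 with non-overlapping matches. It assumes that the
ambiguous-rules branch weighs such alternative coverings against each
other; that assumption, and the names MATCHES and coverings, are not taken
from the branch itself.

```python
from typing import List, Tuple

# (rule id, first word index, index after last word), copied from the table:
# w1 -> position 0, w2 -> position 1, w3 -> position 2.
MATCHES: List[Tuple[str, int, int]] = [
    ("r33", 0, 1),
    ("r1", 1, 2),
    ("r29", 2, 3),
    ("r11", 0, 2),
    ("r10", 0, 2),
    ("r14", 1, 3),
    ("r16", 0, 3),
]


def coverings(matches: List[Tuple[str, int, int]], n_words: int,
              start: int = 0) -> List[List[str]]:
    """Return every sequence of rules that covers words start..n_words-1
    exactly once, left to right, with no gaps and no overlaps."""
    if start == n_words:
        return [[]]  # nothing left to cover: one empty continuation
    result: List[List[str]] = []
    for rule, begin, end in matches:
        if begin == start:
            # Use this rule here, then cover the remaining words.
            for rest in coverings(matches, n_words, end):
                result.append([rule] + rest)
    return result


if __name__ == "__main__":
    for combo in coverings(MATCHES, 3):
        print(" ".join(combo))
```

For the table above this gives five candidate combinations: r33 r1 r29,
r33 r14, r11 r29, r10 r29 and r16.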

I should mention that there are not many issues, just 4 or 5, but the
words marked with # recur often, and that hurts the results badly.

Sevilay

Re: [Apertium-stuff] OpenNMT

2019-03-11 Thread Sevilay Bayatlı

I have read through its documentation, but I could not work out what the
validation data is. Does it use a subset of the training data (source
sentences) for validation? Could you please point me in the right direction?

Sevilay

On Mon, Mar 11, 2019 at 4:49 PM Tommi A Pirinen <
tommi.antero.piri...@uni-hamburg.de> wrote:

> On Sun, Mar 10, 2019 at 11:09:19PM +0300, Sevilay Bayatlı wrote:
>
> > Do you know of any open neural MT system? I need one for a comparison
> > with kaz-tur MT.
>
> I found the python version to be the easiest to build a baseline with,
> it has step-by-step instructions:
>
> https://github.com/OpenNMT/OpenNMT-py#quickstart
>
> I have not gotten it to produce the BLEU scores that it should, though...
>
> --
> Doktor Tommi A Pirinen, Computational Linguist,
> <https://flammie.github.io/purplemonkeydishwasher/>, Universität
> Hamburg, Hamburger Zentrum für Sprachkorpora <http://hzsk.de>. CLARIN-D
> Entwickler.  President of ACL SIGUR SIG for Uralic languages
> <http://gtweb.uit.no/sigur/>.
> I tend to follow inline-posting style in desktop e-mail messages.
>

[Apertium-stuff] OpenNMT

2019-03-10 Thread Sevilay Bayatlı

Hi,

Do you know of any open neural MT system? I need one for a comparison
with kaz-tur MT.

Thanks in advance,

Sevilay

Re: [Apertium-stuff] Regarding getting started with induction tools

2019-03-09 Thread Sevilay Bayatlı

Hi,

Did you try this?

git clone https://github.com/apertium/lttoolbox.git

See also:
http://wiki.apertium.org/wiki/Install_Apertium_core_by_compiling

Sevilay

On Sun, Mar 10, 2019 at 12:21 AM shashank tiwari 
wrote:

> Sir,
> I wanted to install lttoolbox, but the link given below doesn't work:
> http://wiki.apertium.org/wiki/Getting_started_with_induction_tools
> Can you give me more guidance?
>
> Thanks
> Shashank Tiwari
>

[Apertium-stuff] part="tags"/

2019-02-24 Thread Sevilay Bayatlı

The part attribute of a clip tag has the unexpected value "tags", which is
neither one of the four literals in the documentation nor defined in
def-attrs. What should we do?



Sevilay

Re: [Apertium-stuff] Current GSOC ideas

2019-02-15 Thread Sevilay Bayatlı

Hello,

You can find a fuller explanation of what to do in this project here:
http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code#Extend_weighted_transfer_rules

Sevilay

On Fri, Feb 15, 2019 at 12:28 PM Shivanshu Sharma <
sharma.shivansh...@gmail.com> wrote:

> Hello, I would love to work on 1.7 to implement weighted transfer rules on
> a new language pair, hopefully, Hindi-Sanskrit pair. Could someone guide me
> on how to get started?
>
> - Shivanshu
>
> On Mon, Jan 28, 2019 at 10:43 PM Francis Tyers 
> wrote:
>
>> Here is my run-down on the current GSOC ideas page:
>>
>>  1.1 Anaphora resolution for machine translation
>>
>> Nice project idea, but not sure in 3 months.
>>
>>  1.2 Bring a released language pair up to state-of-the-art quality
>>
>> Always needed
>>
>>  1.3 Robust tokenisation in lttoolbox
>>
>> Up for grabs, we need this
>>
>>  1.4 Adopt an unreleased language pair
>>
>> Always needed
>>
>>  1.5 Extend lttoolbox to have the power of HFST
>>
>> I think getting this one is unlikely and requires more than 3 months.
>>
>>  1.6 Robust recursive transfer
>>
>> Keep, this would be really great. I got asked to run a workshop on
>> Apertium
>>   recently and then unasked when they found out that the formalisms
>> didn't
>> actually create parse trees :)
>>
>>  1.7 Extend weighted transfer rules
>>
>> There is ongoing work in this, it would need to be supervised carefully:
>>
>> https://github.com/sevilaybayatli/apertium-ambiguous
>>
>> I would say a nice project would be to really use this on a new language
>> pair
>>
>>  1.8 Improvements to the Apertium website
>>
>> Not sure
>>
>>  1.9 User-friendly lexical selection training
>>
>> I think getting this one is unlikely and requires more than 3 months.
>> Also has
>> been tried several times without luck.
>>
>>  1.10 Light alternative format for all XML files in an Apertium
>> language pair
>>
>> I'm not sure about this one.
>>
>>  1.11 Bilingual dictionary enrichment via graph completion
>>
>> There is code for this, it was a GSOC project last year but wasn't
>> merged, I'm
>> not sure how well it works.
>>
>>  1.12 UD and Apertium integration
>>
>> This is a very useful project. If we can take advantage of UD corpora we
>> can
>> make supervised taggers for around 70% of our languages.
>>
>>  1.13 Add weights to lttoolbox
>>
>> This was done last year. A nice project would be to actually make use of
>> it.
>>
>>  1.14 Improving language pairs mining Mediawiki Content Translation
>> postedits
>>  1.15 Unsupervised weighting of automata
>>
>> Open
>>
>>  1.16 Improvements to UD Annotatrix
>>
>> This is a really useful tool.
>>
>>  1.17 apertium-separable language-pair integration
>>
>> Agree, but I think that it should not just be apertium-separable, but
>> perhaps
>> something like "upgrade a language pair to use all the latest apertium
>> tricks"
>>
>>  1.18 Create FST-based module for disambiguating
>>
>> I like this idea, but I'm not sure three months is enough time, without
>> someone
>> who really knows what they are doing with both the FST library and
>> apertium.
>>
>>  1.19 Python API/library for Apertium
>>
>> This was mostly done right? I think this is still a really important
>> project
>>
>>  1.20 TIPP functionality for Apertium
>>
>> Not sure
>>
>> There is a lot of functionality that is not used widely that could be
>> really
>> used to improve performance of language pairs.
>>
>> * apertium-separable
>> * weights in lttoolbox
>> * weighted transfer
>>
>> Fran
>>
>>
>>
>

[Apertium-stuff] spanish->english- generate issue

2019-02-01 Thread Sevilay Bayatlı

Hi,

I am using this command to generate the English sentences for the
Spanish->English pair. Are the errors below a problem, or is this fine?

apertium-postchunk
/media/sevilay/SAMSUNG/apertium-eng-spa/apertium-eng-spa.spa-eng.t3x
/media/sevilay/SAMSUNG/apertium-eng-spa/spa-eng.t3x.bin
interchunk_ax00.txt| lt-proc -g
/media/sevilay/SAMSUNG/apertium-eng-spa/spa-eng.autogen.bin | lt-proc -p
/media/sevilay/SAMSUNG/apertium-eng-spa/spa-eng.autopgen.bin >
transfer_ax00.txt


Error in
/media/sevilay/SAMSUNG/apertium-eng-spa/apertium-eng-spa.spa-eng.t3x: line
182: index > limit
Error in
/media/sevilay/SAMSUNG/apertium-eng-spa/apertium-eng-spa.spa-eng.t3x: line
182: index > limit
Error in
/media/sevilay/SAMSUNG/apertium-eng-spa/apertium-eng-spa.spa-eng.t3x: line
182: index > limit
Error in
/media/sevilay/SAMSUNG/apertium-eng-spa/apertium-eng-spa.spa-eng.t3x: line
182: index > limit
Error in
/media/sevilay/SAMSUNG/apertium-eng-spa/apertium-eng-spa.spa-eng.t3x: line
228: index > limit
Error in
/media/sevilay/SAMSUNG/apertium-eng-spa/apertium-eng-spa.spa-eng.t3x: line
228: index > limit
Error in
/media/sevilay/SAMSUNG/apertium-eng-spa/apertium-eng-spa.spa-eng.t3x: line
228: index > limit
Error in
/media/sevilay/SAMSUNG/apertium-eng-spa/apertium-eng-spa.spa-eng.t3x: line
182: index > limit
Error in
/media/sevilay/SAMSUNG/apertium-eng-spa/apertium-eng-spa.spa-eng.t3x: line
182: index > limit
Error in
/media/sevilay/SAMSUNG/apertium-eng-spa/apertium-eng-spa.spa-eng.t3x: line
182: index > limit
Error in
/media/sevilay/SAMSUNG/apertium-eng-spa/apertium-eng-spa.spa-eng.t3x: line
182: index > limit
Error in
/media/sevilay/SAMSUNG/apertium-eng-spa/apertium-eng-spa.spa-eng.t3x: line
182: index > limit
Error in
/media/sevilay/SAMSUNG/apertium-eng-spa/apertium-eng-spa.spa-eng.t3x: line
182: index > limit
Error in
/media/sevilay/SAMSUNG/apertium-eng-spa/apertium-eng-spa.spa-eng.t3x: line
182: index > limit
Error in
/media/sevilay/SAMSUNG/apertium-eng-spa/apertium-eng-spa.spa-eng.t3x: line
182: index > limit
Error in
/media/sevilay/SAMSUNG/apertium-eng-spa/apertium-eng-spa.spa-eng.t3x: line
182: index > limit
Error in
/media/sevilay/SAMSUNG/apertium-eng-spa/apertium-eng-spa.spa-eng.t3x: line
182: index > limit
Error in
/media/sevilay/SAMSUNG/apertium-eng-spa/apertium-eng-spa.spa-eng.t3x: line
182: index > limit
Error in
/media/sevilay/SAMSUNG/apertium-eng-spa/apertium-eng-spa.spa-eng.t3x: line
182: index > limit
Error in /media/sevilay/SAMSUNG/a



Sevilay
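
A log like the one above is easier to read once the repeated messages are
tallied per line of the rule file. The following is a minimal Python
sketch; the names ERROR_RE and summarize are made up for illustration, and
it only counts the messages, it does not explain their cause.

```python
import re
from collections import Counter

# The message wraps across lines in the pasted log ("...: line\n182: index > limit"),
# so whitespace between "line" and the number may include a newline.
ERROR_RE = re.compile(r"line\s+(\d+):\s+index > limit")


def summarize(log: str) -> Counter:
    """Count 'index > limit' messages per line number of the rule file."""
    return Counter(int(number) for number in ERROR_RE.findall(log))


if __name__ == "__main__":
    sample = (
        "Error in\n/path/to/rules.t3x: line\n182: index > limit\n"
        "Error in\n/path/to/rules.t3x: line\n228: index > limit\n"
        "Error in\n/path/to/rules.t3x: line\n182: index > limit\n"
    )
    for line_number, count in sorted(summarize(sample).items()):
        print(f"line {line_number}: {count} message(s)")
```

The line numbers it reports are the places in the .t3x file to inspect
first.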

[Apertium-stuff] apertium-eng-spa-eng-spa

2019-02-01 Thread Sevilay Bayatlı

Hi,

In the apertium-eng-spa.eng-spa file, link-to is used differently than in
apertium-eng-spa.spa-eng. The rule below has link-to="4", but there are
only three tags, not four. I think this is wrong: shouldn't this number be
equal to the number of tags? Could you please point me in the right
direction?



  

  
  

  


  

  
  
  


  
  
  
  
  

  

  


Sevilay

Re: [Apertium-stuff] Current GSOC ideas

2019-01-29 Thread Sevilay Bayatlı

Hi Hèctor,

I am looking forward to helping you with weighted transfer. I would also
like your help looking over the results for the eng-spa pair, which I ran
through apertium-ambiguous, if you are willing, of course :)

Best wishes,

Sevilay

On Tue, Jan 29, 2019 at 5:24 AM Francis Tyers  wrote:

> On 2019-01-28 19:31, Hèctor Alòs i Font wrote:
> > Message from Francis Tyers on Mon, 28 Jan 2019 at 21:13:
> >
> >> There is a lot of functionality that is not used widely that could
> >> be
> >> really
> >> used to improve performance of language pairs.
> >>
> >> * apertium-separable
> >> * weights in lttoolbox
>
> This is a question for Abinash:
>
> https://github.com/Techievena
>
> >> * weighted transfer
>
> I'm sure Sevilay will be happy to explain how it works:
>
> https://github.com/sevilaybayatli
>
> Fran
>
>
>

Re: [Apertium-stuff] eng-spa pair issue

2019-01-27 Thread Sevilay Bayatlı

I already knew all of these commands; they are like those of other pairs.
My question was how to get the full sentence without the transfer mode, and
I have since found out that the eng-spa pair can produce the full sentence
without it.

Sevilay

On Sun, 27 Jan 2019, 21:29 Francis Tyers wrote:

> On 2019-01-27 17:20, Hèctor Alòs i Font wrote:
> > I don't have now access to the files, but I'm pretty sure that eng-spa
> > has 3 transfer steps. That's the standard form since Apertium 2.0.
> > More than 3 steps is relatively unusual.
> >
> > Message from Sevilay Bayatlı on Sun, 27 Jan 2019 at 21:09:
> >
> >> The eng-spa pair doesn't have a .t4x file. How can we generate the
> >> file of transferred sentences?
> >>
> >> For kaz-tur I use apertium-transfer with the .t4x file, and it gives
> >> me full sentences, like this:
> >>
> >> apertium-transfer -n
> >>
> >
> /home/sevilay/linguistic-data/apertium-kaz-tur/apertium-kaz-tur.kaz-tur.t4x
> >> /home/sevilay/linguistic-data/apertium-kaz-tur/kaz-tur.t4x.bin
> >> deneme.txt | lt-proc -g
> >> /home/sevilay/linguistic-data/apertium-kaz-tur/kaz-tur.autogen.bin |
> >> lt-proc -p
> >> /home/sevilay/linguistic-data/apertium-kaz-tur/kaz-tur.autopgen.bin
> >>> detransfer.txt
> >>
> >> But with eng-spa I don't have this mode.
> >>
>
> fran@ipek:~/source/apertium/pairs/apertium-eng-spa$ cat
> modes/eng-spa.mode
>
>
>  lt-proc
> '/home/fran/source/apertium/pairs/apertium-eng-spa/eng-spa.automorf.bin'
> | apertium-tagger -g $2
> '/home/fran/source/apertium/pairs/apertium-eng-spa/eng-spa.prob' |
> apertium-pretransfer| apertium-transfer -n
> '/home/fran/source/apertium/pairs/apertium-eng-spa/apertium-eng-spa.eng-spa.genitive.t1x'
>
>
> '/home/fran/source/apertium/pairs/apertium-eng-spa/eng-spa.genitive.bin'
> | lt-proc -b
> '/home/fran/source/apertium/pairs/apertium-eng-spa/eng-spa.autobil.bin'
> | lrx-proc -m
> '/home/fran/source/apertium/pairs/apertium-eng-spa/eng-spa.autolex.bin'
> | apertium-transfer -b
> '/home/fran/source/apertium/pairs/apertium-eng-spa/apertium-eng-spa.eng-spa.t1x'
>
>   '/home/fran/source/apertium/pairs/apertium-eng-spa/eng-spa.t1x.bin' |
> apertium-interchunk
> '/home/fran/source/apertium/pairs/apertium-eng-spa/apertium-eng-spa.eng-spa.t2x'
>
>   '/home/fran/source/apertium/pairs/apertium-eng-spa/eng-spa.t2x.bin' |
> apertium-postchunk
> '/home/fran/source/apertium/pairs/apertium-eng-spa/apertium-eng-spa.eng-spa.t3x'
>
>   '/home/fran/source/apertium/pairs/apertium-eng-spa/eng-spa.t3x.bin' |
> lt-proc $1
> '/home/fran/source/apertium/pairs/apertium-eng-spa/eng-spa.autogen.bin'
> | lt-proc -p
> '/home/fran/source/apertium/pairs/apertium-eng-spa/eng-spa.autopgen.bin'
>
>
> You just need to remove the first parts of the pipeline, e.g. to get a
> full sentence from this, given the output of lexical selection, you
> could use:
>
> apertium-transfer -b
> '/home/fran/source/apertium/pairs/apertium-eng-spa/apertium-eng-spa.eng-spa.t1x'
>
>   '/home/fran/source/apertium/pairs/apertium-eng-spa/eng-spa.t1x.bin' |
> apertium-interchunk
> '/home/fran/source/apertium/pairs/apertium-eng-spa/apertium-eng-spa.eng-spa.t2x'
>
>   '/home/fran/source/apertium/pairs/apertium-eng-spa/eng-spa.t2x.bin' |
> apertium-postchunk
> '/home/fran/source/apertium/pairs/apertium-eng-spa/apertium-eng-spa.eng-spa.t3x'
>
>   '/home/fran/source/apertium/pairs/apertium-eng-spa/eng-spa.t3x.bin' |
> lt-proc $1
> '/home/fran/source/apertium/pairs/apertium-eng-spa/eng-spa.autogen.bin'
> | lt-proc -p
> '/home/fran/source/apertium/pairs/apertium-eng-spa/eng-spa.autopgen.bin'
>
>
> Fran
>
>
>
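
The shortened pipeline that Fran describes above can also be assembled
programmatically. This is a hedged Python sketch: the function name
transfer_tail_command and its parameters are invented for illustration, the
file-naming pattern is inferred from the paths quoted in this thread, and
-g is used where the mode file has $1.

```python
import shlex
from pathlib import PurePosixPath


def transfer_tail_command(pair_dir: str, pair: str, direction: str) -> str:
    """Build the shell pipeline from chunker (t1x) to post-generation.

    pair_dir  -- directory of the language pair, e.g. ".../apertium-eng-spa"
    pair      -- pair name used in source file names, e.g. "eng-spa"
    direction -- translation direction, e.g. "eng-spa" or "spa-eng"
    """
    d = PurePosixPath(pair_dir)
    stages = [
        ["apertium-transfer", "-b",
         d / f"apertium-{pair}.{direction}.t1x", d / f"{direction}.t1x.bin"],
        ["apertium-interchunk",
         d / f"apertium-{pair}.{direction}.t2x", d / f"{direction}.t2x.bin"],
        ["apertium-postchunk",
         d / f"apertium-{pair}.{direction}.t3x", d / f"{direction}.t3x.bin"],
        ["lt-proc", "-g", d / f"{direction}.autogen.bin"],
        ["lt-proc", "-p", d / f"{direction}.autopgen.bin"],
    ]
    # Quote every argument so that paths with spaces survive the shell.
    return " | ".join(
        " ".join(shlex.quote(str(arg)) for arg in stage) for stage in stages
    )


if __name__ == "__main__":
    print(transfer_tail_command(
        "/home/fran/source/apertium/pairs/apertium-eng-spa",
        "eng-spa", "eng-spa"))
```

The resulting string expects the output of lexical selection on standard
input, as in Fran's example.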

[Apertium-stuff] eng-spa pair issue

2019-01-27 Thread Sevilay Bayatlı

The eng-spa pair doesn't have a .t4x file. How can we generate the file of
transferred sentences?

For kaz-tur I use apertium-transfer with the .t4x file, and it gives me
full sentences, like this:

apertium-transfer -n
/home/sevilay/linguistic-data/apertium-kaz-tur/apertium-kaz-tur.kaz-tur.t4x
/home/sevilay/linguistic-data/apertium-kaz-tur/kaz-tur.t4x.bin deneme.txt |
lt-proc -g
/home/sevilay/linguistic-data/apertium-kaz-tur/kaz-tur.autogen.bin |
lt-proc -p
/home/sevilay/linguistic-data/apertium-kaz-tur/kaz-tur.autopgen.bin >
detransfer.txt

But with eng-spa I don't have this mode.

Sevilay

[Apertium-stuff] long sentence issue

2018-12-27 Thread Sevilay Bayatlı

Hi,

If we split long sentences into two and analyze each part separately, will
the split make Apertium get stuck? I tried it with some sentences and it
caused no problems, but I am not sure whether it will work with all the
sentences. Does anyone have any ideas to share?

Thanks in advance for your help,

Sevilay

[Apertium-stuff] let issue

2018-11-30 Thread Sevilay Bayatlı

Hi,

In the let element, I know that the second ("right") part can be anything
that generates a string, such as get-case-from, case-of, b, var, lit,
lit-tag, concat and clip. But for the first ("left") part, the
documentation says it can be a clip, a var, etc. I can't see anything other
than clip and var that the right part could be assigned to. So what is
meant by "etc."?

Here is the passage from the Documentation of the Open-Source
Shallow-Transfer Machine Translation Platform Apertium:

3.5.4.37 Element for assignment <let>
The assignment instruction <let> assigns the value of the right part of
the assignment (a literal string, a clip, a variable, etc.) to the left
part (a clip, a variable, etc.).

Sevilay

Re: [Apertium-stuff] transfer rule's stuff

2018-11-29 Thread Sevilay Bayatlı

Thank you for your response.

Sevilay

On Thu, 29 Nov 2018, 09:26 Hèctor Alòs i Font wrote:

> Hi Sevilay,
> Regarding the "when" and "otherwise" commands, yes: the first "when" works
> as an "if", the following as "elsif" and the ending "otherwise" as an
> "else" (or at least I have done so many times and apparently it works as I
> expected).
> Hèctor
>
> Message from Sevilay Bayatlı on Thu, 29 Nov 2018 at 6:00:
>
>> Hi,
>>
>> A choose element may have one or more conditional options (<when>) and
>> an alternative option (<otherwise>). If there is more than one "when",
>> are they treated like "if" and "else if", or like independent "if"
>> statements without "else if"?
>>
>> I ask because the Documentation of the Open-Source Shallow-Transfer
>> Machine Translation Platform Apertium says "The selection instruction
>> consists of one or more conditional options (<when>)", but I couldn't
>> tell whether that means a sequence of independent if statements or an
>> if / else if chain.
>>
>> Also for  ,  ,  , these
>> elements have no explanation in the documentation?
>>
>> Thanks for your help in advance,
>>
>> Sevilay
>>
>>
>>
>>
>
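
The behaviour Hèctor describes above (first "when" as "if", later ones as
"elsif", "otherwise" as "else") can be modelled in a few lines. This is a
Python sketch of that description only, not of Apertium's code; the names
choose, whens and ctx are invented for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A "when" branch: a test over some context plus the action it selects.
When = Tuple[Callable[[Dict[str, str]], bool], str]


def choose(whens: List[When], otherwise: Optional[str],
           ctx: Dict[str, str]) -> Optional[str]:
    """Return the action of the first branch whose test succeeds.

    Branches are tried in order and evaluation stops at the first match,
    which is what makes the chain behave like if / elif / else.
    """
    for test, action in whens:
        if test(ctx):
            return action  # later "when" branches are not even tested
    return otherwise       # may be None when there is no "otherwise"


if __name__ == "__main__":
    whens = [
        (lambda c: c["nbr"] == "sg", "use-singular"),
        (lambda c: c["gen"] == "f", "use-feminine"),
    ]
    # Both tests would succeed here, but only the first branch is taken.
    print(choose(whens, "use-default", {"nbr": "sg", "gen": "f"}))
```

Under independent "if" statements both branches in the example would run;
under the chain reading only the first one does.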

[Apertium-stuff] transfer rule's stuff

2018-11-28 Thread Sevilay Bayatlı

Hi,

A choose element may have one or more conditional options (<when>) and an
alternative option (<otherwise>). If there is more than one "when", are
they treated like "if" and "else if", or like independent "if" statements
without "else if"?

I ask because the Documentation of the Open-Source Shallow-Transfer
Machine Translation Platform Apertium says "The selection instruction
consists of one or more conditional options (<when>)", but I couldn't tell
whether that means a sequence of independent if statements or an
if / else if chain.

Also for  ,  ,  , these
elements have no explanation in the documentation?

Thanks for your help in advance,

Sevilay

[Apertium-stuff] case-of work

2018-11-28 Thread Sevilay Bayatlı

Hi all,

How should <case-of> work inside <modify-case>?
It seems illogical to place it there, since clip would do the job: case-of
only returns one of three strings: aa, AA or Aa.
All that could happen is that the string returned by case-of is changed by
the right part of modify-case, which is useless.

Sevilay
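
For reference, the three strings mentioned above can be modelled as
follows. This is a simplified Python sketch of the idea, not Apertium's
implementation; the function names case_of and apply_case are made up for
illustration.

```python
def case_of(word: str) -> str:
    """Classify the capitalisation pattern of a word as "aa", "Aa" or "AA"."""
    if not word or not word[0].isupper():
        return "aa"  # starts in lower case (or is empty)
    if len(word) > 1 and word[-1].isupper():
        return "AA"  # first and last characters are upper case
    return "Aa"      # only the first character is upper case


def apply_case(pattern: str, word: str) -> str:
    """Re-case a word according to one of the three patterns."""
    if pattern == "AA":
        return word.upper()
    if pattern == "Aa":
        return word[:1].upper() + word[1:].lower()
    return word.lower()


if __name__ == "__main__":
    # Copy the capitalisation of a source word onto a target word.
    print(apply_case(case_of("Hay"), "there"))
```

In this model the value of case_of is only a pattern name, so it is useful
as something to copy from one word to another rather than as text to edit.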
___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff

[Apertium-stuff] apertium-transfer

2018-10-16 Thread Sevilay Bayatlı

Hi everyone,

We have a problem with this command:
 apertium-transfer -n  $HOME/apertium-kaz-tur/apertium-kaz-tur.kaz-tur.t4x
$HOME/apertium-kaz-tur/kaz-tur.t4x.bin postchunkOut.txt | lt-proc -g
$HOME/apertium-kaz-tur/kaz-tur.autogen.bin  | lt-proc -p
$HOME/apertium-kaz-tur/kaz-tur.autopgen.bin > transferOut.txt

The problem is that the file fed to the transfer has 19886 lines, but the
output has just 3748 lines. It should have the same number, 19886.

I also tried this part on its own, outside the program:

 apertium-transfer -n
 $HOME/apertium-kaz-tur/apertium-kaz-tur.kaz-tur.t4x
$HOME/apertium-kaz-tur/kaz-tur.t4x.bin postchunkOut.txt

Even when I also gave it an output file, it got stuck or produced
incomplete output. Is there an alternative to this transfer command?

best wishes,
Sevilay
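
The line-count check described above can be automated, so that a truncated
run is noticed immediately. A minimal Python sketch follows; the function
names are invented for illustration, and the file names in the usage part
are the ones from this message.

```python
from pathlib import Path


def count_lines(text: str) -> int:
    """Number of lines in text; a last line without a newline still counts."""
    return len(text.splitlines())


def line_count_report(input_text: str, output_text: str) -> str:
    """Compare the number of lines going into and coming out of a stage."""
    n_in, n_out = count_lines(input_text), count_lines(output_text)
    if n_in == n_out:
        return f"ok: {n_in} lines in, {n_out} lines out"
    return f"mismatch: {n_in} lines in, {n_out} lines out (difference {n_in - n_out})"


if __name__ == "__main__":
    source, target = Path("postchunkOut.txt"), Path("transferOut.txt")
    if source.exists() and target.exists():
        print(line_count_report(source.read_text(encoding="utf-8"),
                                target.read_text(encoding="utf-8")))
```

Running the same comparison after each stage of the pipeline shows which
stage is the first to drop lines.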

[Apertium-stuff] language model

2018-08-15 Thread Sevilay Bayatlı

Hi all,

In my project I need a language model to calculate the probability of each
sentence. I have to use KenLM (https://kheafield.com/code/kenlm/), but it
gives me wrong probabilities for sentences that contain unanalysed words,
like this:

Курчатов – Шар темір жолы салынды, ал оның жалғасы (ұзындығы 150 км) 2007 ж.


#Kurçatov – *Шар demir yolu yapıldı, o #devam (uzunluku 150 km) 2007 #yıl
||| 0
 #Kurçatov – *Шар demir yolu yapıldı, o onun devamı (uzunluku 150 km) 2007
#yıl ||| -0.880188

Has anyone used it before who can tell me how to deal with this problem, or
is there another toolkit you can point me to?

Best regards,

Sevilay
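
One thing that may be worth ruling out (an assumption, not a confirmed
cause) is that the marks Apertium leaves on problem words make those tokens
look unknown to the language model. The Python sketch below strips the
leading marks and counts the flagged words before scoring; the names MARKS
and strip_marks are invented for illustration, and no language-model call
is shown.

```python
from typing import Tuple

# Marks Apertium puts in front of problem words:
#   *  unknown word
#   #  the generator could not produce the surface form
#   @  lemma missing from the bilingual dictionary
MARKS = "*#@"


def strip_marks(sentence: str) -> Tuple[str, int]:
    """Return the sentence without leading marks and the number of flagged words."""
    tokens = sentence.split()
    flagged = sum(1 for token in tokens if token[0] in MARKS)
    cleaned = [token.lstrip(MARKS) for token in tokens]
    # Drop tokens that consisted of marks only.
    return " ".join(token for token in cleaned if token), flagged


if __name__ == "__main__":
    text = "#Kurçatov – *Шар demir yolu yapıldı, o #devam (uzunluku 150 km) 2007 #yıl"
    cleaned, flagged = strip_marks(text)
    print(flagged, cleaned)
```

The cleaned string can then be passed to the language model, with the count
of flagged words kept as a separate signal when ranking candidates.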
