[Moses-support] PhD Position in Edinburgh
FULLY FUNDED FOUR-YEAR PHD STUDENTSHIPS - UKRI CENTRE FOR DOCTORAL TRAINING IN NATURAL LANGUAGE PROCESSING

Based at the University of Edinburgh, in conjunction with the School of Informatics and the School of Philosophy, Psychology and Language Sciences.

Deadlines:
* Non-UK: 25th November 2022
* UK: 27th January 2023

Applications are now sought for the UKRI CDT in NLP's fifth and final cohort of students, which will start in September 2023.

* * *

The CDT in NLP offers unique, tailored doctoral training comprising both taught courses and a doctoral dissertation over four years. Each student will take a set of courses designed to complement their existing expertise and give them an interdisciplinary perspective on NLP. The studentships are fully funded for the four years and come with a generous allowance for travel, equipment and research costs.

The CDT brings together researchers in NLP, speech, linguistics, cognitive science and design informatics from across the University of Edinburgh. Students will be supervised by a world-class faculty of almost 60 supervisors and will benefit from cutting-edge computing and experimental facilities, including a large GPU cluster and eye-tracking, speech, virtual reality and visualisation labs. The CDT involves a number of industrial partners, including Amazon, Facebook, Huawei, Microsoft, Naver, Toshiba and the BBC. Links also exist with the Alan Turing Institute and the Bayes Centre.
A wide range of research topics fall within the remit of the CDT:
* Natural language processing and computational linguistics
* Speech technology
* Dialogue, multimodal interaction, language and vision
* Information retrieval and visualization, computational social science
* Computational models of human cognition and behaviour, including language and speech processing
* Human-computer interaction, design informatics, assistive and educational technology
* Psycholinguistics, language acquisition, language evolution, language variation and change
* Linguistic foundations of language and speech processing

The next cohort of CDT students will start in September 2023. Around 12 studentships are available, covering maintenance at the UKRI rate (currently £17,668 per year) plus tuition fees. Studentships are open to all nationalities, and we are particularly keen to receive applications from women, members of minority groups, and members of other groups that are underrepresented in technology. Applicants in possession of other funding, scholarships or industry funding are also welcome to apply; please provide details of your funding source on your application.

Applicants should have an undergraduate or master's degree in computer science, linguistics, cognitive science, AI or a related discipline, or have a breadth of relevant experience in industry, academia, the public sector, etc.

Further details, including the application procedure, can be found at: https://edin.ac/cdt-in-nlp

Application Deadlines: Early application is encouraged, but completed applications must be received at the latest by 25th November 2022 (non-UK applicants) or 27th January 2023 (UK applicants).

Enquiries: Please direct any enquiries to the CDT admissions team at: cdt-nlp-i...@inf.ed.ac.uk

CDT in NLP Virtual Open Day: Find out more about the programme by attending the PG Virtual Open Week in November.
Click here to register: https://www.ed.ac.uk/studying/postgraduate/open-days-events-visits/open-days/postgraduate-virtual-open-days

--
School of Informatics
University of Edinburgh
Phone +44 (0)131 650-4453

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
https://mailman.mit.edu/mailman/listinfo/moses-support
[Moses-support] Second Call for Papers for 1st Workshop on Neural Machine Translation
Description

The 1st Workshop on Neural Machine Translation (https://sites.google.com/site/acl17nmt/) is a new annual workshop that will be co-located with ACL 2017 (Vancouver, July 30-August 4, 2017). Neural Machine Translation (NMT) is a simple new architecture for getting machines to learn to translate. Despite being relatively recent, NMT has demonstrated promising results and attracted much interest, achieving state-of-the-art results on a number of shared tasks. This workshop aims to cultivate research in neural machine translation and other aspects of machine translation and multilinguality that utilize neural models.

The workshop is broad in scope and invites original research contributions on topics that include, but are not limited to, the following:
- Incorporating linguistic insights: syntax, alignment, reordering, etc.
- Combining NMT & SMT
- Handling resource-limited domains
- Utilizing more data in NMT: monolingual, multilingual resources
- Multi-task learning for NMT
- NMT for mobile devices
- Analysis and visualization of NMT models
- Beyond sentence-level translation
- Beyond maximum-likelihood estimation
- Neural machine generation

Submissions

We are soliciting submissions in three categories of papers: full workshop submissions, extended abstracts, and cross-submissions. All submissions will be made through Softconf (http://softconf.com/acl2017/nmt/).

Full Workshop Papers

Authors should submit a long paper of up to 8 pages, with up to 2 additional pages for references, following the ACL 2017 formatting requirements (see the ACL 2017 Call for Papers for reference: http://acl2017.org/calls/papers/). The reported research should be original work. All papers will be presented as posters, and a few selected papers may also be presented orally at the discretion of the committee. All accepted papers will appear in the workshop proceedings, which will be archived by the ACL Anthology (http://aclweb.org/anthology/).
Best Paper Awards

From the submitted full workshop papers, between zero and two papers will be selected for best paper awards at the discretion of the program committee.

Extended Abstracts

Preliminary ideas or results may also be submitted as extended abstracts, with a length of 2 to 4 pages plus references. Like full papers, these abstracts will follow the ACL formatting requirements, be submitted through Softconf, and be reviewed by the program committee. Accepted abstracts will be presented as posters but will not be included in the workshop proceedings.

Cross-submissions

We also accept cross-submissions of work that has already been published or presented at other venues, for consideration as poster presentations; this allows authors who have presented at other venues to discuss their work with and get feedback from NMT researchers. These submissions will not appear in the proceedings, and there is no restriction on their format (in other words, it is fine to submit a paper from a different venue as-is). Papers in this track will be submitted through Softconf to the cross-submission track and reviewed by the program committee.

Schedule

All deadlines are 11:59 PM Pacific time.
- Deadline for paper submission: Friday, April 21, 2017
- Notification of acceptance: Friday, May 19, 2017
- Camera-ready submission due: Friday, May 26, 2017
- Early registration deadline (ACL'17): TBD
- Workshop: August 3 or 4, 2017

Workshop Organizers
- Alexandra Birch (Edinburgh)
- Andrew Finch (NICT)
- Thang Luong (Google)
- Graham Neubig (CMU)

--
School of Informatics
University of Edinburgh
Phone +44 (0)131 650-8286
Re: [Moses-support] Low accuracy
Hi Fatma,

Models are routinely trained with millions of parallel sentence pairs. You need more data. Please read the Moses software documentation, and Philipp Koehn's book for more background.

Lexi

On Thu, Jul 2, 2015 at 9:42 AM, fatma elzahraa Eltaher <fatmaelta...@gmail.com> wrote:
> Dear All,
>
> I trained my model with 4,633 words, tested it with 342 words, and only 99
> words were right. How can I increase the accuracy of my model?
>
> Note: I use 1,000 words for tuning.
>
> Thank you,
>
> Fatma El-Zahraa El-Taher
> Teaching Assistant, Computer & Systems Department
> Faculty of Engineering, Azhar University
> Email: fatmaelta...@gmail.com
> Mobile: +201141600434
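For a sense of scale: the figures reported in the question (99 of 342 test words correct) work out to well under a third correct, which a quick check confirms:

```shell
# Word accuracy implied by the reported figures: 99 correct out of 342 tested.
awk 'BEGIN { printf "%.1f%%\n", 100 * 99 / 342 }'
```

This prints 28.9%, which underlines the point that the training set (4,633 words) is orders of magnitude too small.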
[Moses-support] Segfaulting with WordTranslationFeature
Hi there,

I have a segfault with a normal master branch of Moses from one month ago, on a normal-seeming test sentence. This was an en-cs system, and it translated the first 6000+ sentences fine. It also translates a short version of the sentence fine. So "Daniel , the previous owner" translates fine, but:

"Daniel , the previous owner , supported the author cinema on the complex premises after having himself financed its construction"

segfaults! If I remove the WordTranslation feature, i.e. if I delete:

  [feature]
  WordTranslationFeature name=WT input-factor=0 output-factor=0 simple=1 source-context=0 target-context=0

from the moses.ini file, then it stops segfaulting. Has anyone had this happen to them? Does anyone know how much this feature helps?

Lexi
Re: [Moses-support] Devlin et al. 2014
OK, here are answers to 1-4:

1. You would normally train the bilingual LM on the same corpus as the SMT model, but it is not required.
2. Yes, but there are also other ways to make training faster which you might want to explore.
3. Yes, it is important that the bilingual LM corpus matches the format that will be passed to it by the decoder at decoding time, or it will not work correctly.
4. Yes, it can include the sentences which were filtered out by the training scripts. You just need to have word alignments for them, and they do need to be reasonably good translations of each other. So filter out the junk.

Lexi

On Wed, Nov 26, 2014 at 3:44 PM, Tom Hoar <tah...@precisiontranslationtools.com> wrote:
> Thanks again. It's very useful feedback. We're now preparing to move from
> v1.0 to 3.x. We skipped Moses 2.x, so I'm not familiar with the new
> moses.ini syntax.
>
> Here are some more questions to help us get started playing with the
> extract_training.py options:
>
> 1. I'm assuming corpus.e and corpus.f are the same prepared corpus files as used in train-model.perl?
> 2. Is it possible for corpus.e and corpus.f to be different from the train-model.perl corpus, for example a smaller random sampling?
> 3. The corpus files are tokenized, lower-cased and escaped the same.
> 4. Do the corpus files also need to enforce the clean-corpus-n.perl max tokens (100) and ratio (9:1) for src & tgt? These address (M)GIZA++ limits and might not apply to BilingualLM. However, are there advantages to using the limits or disadvantages to overriding them? I.e. can these corpus files include lines that are filtered out by clean-corpus-n.perl?
> 5. What is the --align value? Is it the output of train-model.perl step 3, or a file with word alignments for each line of the corpus.e and corpus.f pair?
> 6. Re --prune-source-vocab & --prune-target-vocab, do these thresholds set the size of the vocabulary you reference in #4 below (i.e. 16K, 500K, etc.)?
> 7. Re --source-context & --target-context, are these the BilingualLM equivalents of a typical LM's order or n-grams for each side?
> 8. Re --tagged-corpus, is this for POS-factored corpora?
>
> Thanks.
>
> On 11/26/2014 09:27 PM, Nikolay Bogoychev wrote:
>> Hey, Tom
>>
>> 1) It's independent. You just add -with-oxlm and -with-nplm to the stack.
>> 2) Yes, they are both thread-safe; you can run the decoder with however many threads you wish.
>> 3) It doesn't create a separate binary. The compilation flag adds a new feature inside Moses that is called BilingualNPLM, and you have to add it to your moses.ini with a weight.
>> 4) That depends on the vocabulary size used. With 16k source and 16k target, about 100 megabytes. With 50k, about 1.5 gigabytes.
>>
>> Beware that the memory requirements during decoding are much larger, because of premultiplication. If you have memory issues, supply "premultiply=false" on the BilingualNPLM line in moses.ini, but this is likely to slow down decoding by a lot.
>>
>> Cheers,
>> Nick
>>
>> On Wed, Nov 26, 2014 at 2:09 PM, Tom Hoar <tah...@precisiontranslationtools.com> wrote:
>>> Thanks Nikolay! This is a great start. I have a few clarification questions.
>>>
>>> 1) Does this replace or run independently of traditional language models like KenLM? I.e. when compiling, we can use -with-kenlm, -with-irstlm, -with-randlm and -with-srilm together. Are -with-oxlm and -with-nplm added to the stack, or are they exclusive?
>>>
>>> 2) It looks like your branch of NPLM is thread-safe. Is OxLM also thread-safe?
>>>
>>> 3) You say, "To run it in moses as a feature function..." Does that mean compiling with your above option(s) creates a new runtime binary "BilingualNPLM" that replaces the moses binary, much like moseschart and mosesserver? Or does BilingualNPLM run in a separate process that the Moses binary accesses during runtime?
>>>
>>> 4) How large do these LM files become? Are they comparable to traditional ARPA files, larger or smaller? Also, are they binarized with mmap reads, or do they have to load into RAM?
>>>
>>> Thanks,
>>> Tom
>>>
>>> On 11/26/2014 08:04 PM, Nikolay Bogoychev wrote:
>>>> Fix formatting...
>>>>
>>>> Hey,
>>>>
>>>> BilingualLM is implemented and as of last week resides within moses master:
>>>> https://github.com/moses-smt/mosesdecoder/blob/master/moses/LM/BilingualLM.cpp
>>>>
>>>> To compile it you need a neural network backend for it. Currently there are two supported: OxLM and NPLM. Adding a new backend is relatively easy; you need to implement the interface as shown here:
>>>> https://github.com/moses-smt/mosesdecoder/blob/master/moses/LM/bilingual-lm/BiLM_NPLM.h
>>>>
>>>> To compile with the OxLM backend you need to compile Moses with the switch -with-oxlm=/path/to/oxlm.
>>>> To compile with the NPLM backend you need to compile Moses with the switch -with-nplm=/path/to/nplm (you need this fork of NPLM: https://github.com/rsennri
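The clean-corpus-n.perl limits asked about in question 4 (at most 100 tokens per side, length ratio at most 9:1) amount to a simple per-pair length check. A minimal awk sketch of that same filter, run here on a toy two-line corpus (file names are hypothetical; real runs would use corpus.e/corpus.f):

```shell
# Toy parallel corpus: the second pair has a 10:1 token ratio.
printf 'a b c\n1 2 3 4 5 6 7 8 9 10\n' > toy.e
printf 'x\ny\n' > toy.f

# Keep a pair only if both sides are non-empty, at most 100 tokens long,
# and neither side is more than 9 times longer than the other.
paste toy.e toy.f | awk -F'\t' '{
  ns = split($1, s, " "); nt = split($2, t, " ")
  if (ns >= 1 && nt >= 1 && ns <= 100 && nt <= 100 && ns <= 9 * nt && nt <= 9 * ns)
    print
}'
```

Only the first pair survives; the second is dropped for exceeding the 9:1 ratio, which is the behaviour the thread discusses overriding for BilingualLM training.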
Re: [Moses-support] Meaning to language arguments for train-model.perl?
Hi Kenneth,

In train-model.perl, the -e and -f arguments are used to determine filenames and extensions, so they could easily be changed to "src" and "tgt" within the script. But Tom has a handle on how this could be painful to change in the wrapper code. clean-corpus-n.perl doesn't have -e and -f arguments; the src and tgt languages are passed as positional arguments.

Lexi

On Thu, Nov 13, 2014 at 3:04 PM, Kenneth Heafield wrote:
> Dear Moses,
>
> Do the -e and -f arguments to train-model.perl and clean-corpus-n.perl
> actually get interpreted by anything? Or are they just there as file
> name extensions that could just as easily be "src" and "tgt"? I think
> it doesn't matter.
>
> Kenneth
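A trivial illustration of the point above: the -e/-f values are only glued onto the corpus root to form file names, so any pair of extensions works (the paths and extension values below are hypothetical):

```shell
# train-model.perl derives its parallel-data paths as $corpus.$f and
# $corpus.$e; the extensions carry no linguistic meaning to the script.
corpus=work/corpus; f=src; e=tgt
echo "parallel data: ${corpus}.${f} and ${corpus}.${e}"
```

This prints "parallel data: work/corpus.src and work/corpus.tgt"; substituting language codes like "en"/"fr" changes only the file names it looks for.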
Re: [Moses-support] KenLM memory usage
Hi Ken,

Yes, different models, different languages. Thanks! Yes, lazy loading is absolutely dead slow.

Lexi

On Thu, Mar 20, 2014 at 4:53 PM, Kenneth Heafield wrote:
> Hi Lexi,
>
> I take it that these models are different, not the same model loaded
> into each process (in which case they would have shared). I'd really
> recommend trying to compress things more (e.g. trie -a 64 -q 8) before
> going to lazy loading.
>
> Kenneth
>
> On 03/20/14 08:13, Marcin Junczys-Dowmunt wrote:
>> Hi,
>> Since KenLM uses shared memory, four instances should take up the same
>> amount of memory as only one instance (I ran 8 instances yesterday with 8
>> threads each with a 99GB LM on a 128GB machine). If the model fits into
>> memory for a single instance, it should work, provided you have enough
>> memory left for all the phrase tables and the translation process itself
>> (I guess this is actually the problem). Lazy loading was unbearably slow
>> for me with the above-mentioned configuration, but I was using 64 threads
>> in total, so a lot of concurrent disk access was happening; no wonder there.
>> Best,
>> Marcin
>>
>> On 20.03.2014 14:35, Alexandra Birch wrote:
>>> I have found the answer on the KenLM web page and it seems to be working:
>>>
>>> Full or lazy loading
>>>
>>> KenLM supports lazy loading via mmap. This allows you to further
>>> reduce memory usage, especially with trie, which has good memory
>>> locality. In Moses, this is controlled by the language model number in
>>> moses.ini. Using language model number 8 will load the full model into
>>> memory (MAP_POPULATE on Linux and read() on other OSes). Language
>>> model number 9 will lazily load the model using mmap. I recommend
>>> fully loading if you have the RAM for it; it actually takes less time
>>> to load the full model and use it, because the disk does not have to
>>> seek during decoding. Lazy loading works best with local disk and is
>>> not recommended for networked filesystems.
>>>
>>> On Thu, Mar 20, 2014 at 2:32 PM, Alexandra Birch <lexi.bi...@gmail.com> wrote:
>>>> Hi there,
>>>>
>>>> I want to run 4 MT servers at the same time on a machine with
>>>> limited memory. KenLM seems to reserve the amount of memory which
>>>> the language model would have taken if it had been loaded into
>>>> memory. So I don't have enough memory to run all these servers, and
>>>> the machine grinds to a halt if I try. Is there any flag I could
>>>> use which would limit the amount of memory reserved?
>>>>
>>>> Lexi
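Kenneth's "trie -a 64 -q 8" suggestion refers to the quantization (-q) and pointer-compression (-a) options of KenLM's build_binary tool. A sketch of the kind of invocation meant, with hypothetical paths (check build_binary's usage output for the exact flag order on your version):

```
# Build a trie-format binary LM, quantizing probabilities/backoffs to
# 8 bits (-q 8) and compressing pointers (-a 64) to shrink the file
# before resorting to lazy loading. Paths are hypothetical.
build_binary -a 64 -q 8 trie model.arpa model.binary
```

Quantization trades a small amount of model accuracy for a substantially smaller memory footprint, which is usually a better first step than lazy mmap loading when several decoders must share one machine.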
Re: [Moses-support] KenLM memory usage
I have found the answer on the KenLM web page and it seems to be working:

Full or lazy loading

KenLM supports lazy loading via mmap. This allows you to further reduce memory usage, especially with trie, which has good memory locality. In Moses, this is controlled by the language model number in moses.ini. Using language model number 8 will load the full model into memory (MAP_POPULATE on Linux and read() on other OSes). Language model number 9 will lazily load the model using mmap. I recommend fully loading if you have the RAM for it; it actually takes less time to load the full model and use it, because the disk does not have to seek during decoding. Lazy loading works best with local disk and is not recommended for networked filesystems.

On Thu, Mar 20, 2014 at 2:32 PM, Alexandra Birch wrote:
> Hi there,
>
> I want to run 4 MT servers at the same time on a machine with limited
> memory. KenLM seems to reserve the amount of memory which the language
> model would have taken if it had been loaded into memory. So I don't have
> enough memory to run all these servers, and the machine grinds to a halt if
> I try. Is there any flag I could use which would limit the amount of memory
> reserved?
>
> Lexi
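For reference, in the old moses.ini syntax this passage refers to, the "language model number" is the first field of each [lmodel-file] entry. A hedged sketch, with hypothetical factor, order and path values:

```
[lmodel-file]
# fields: implementation factor order filename
# implementation 8 = KenLM, fully loaded; 9 = KenLM, lazy mmap
8 0 5 /path/to/lm.binary
```

Changing the leading 8 to 9 in such an entry is what switches that model to lazy loading.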
[Moses-support] KenLM memory usage
Hi there,

I want to run 4 MT servers at the same time on a machine with limited memory. KenLM seems to reserve the amount of memory which the language model would have taken if it had been loaded into memory. So I don't have enough memory to run all these servers, and the machine grinds to a halt if I try. Is there any flag I could use which would limit the amount of memory reserved?

Lexi
Re: [Moses-support] Optimizing with lr-score
Hi Yvette,

Barry was spot on as usual. The code is in a branch:

  svn co https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/branches/mert-mtm5/ moseslrscore

You could also use the latest version of Moses and just take the files I changed for this branch; there weren't too many.

There are a few more parameters to use MERT with LRscore. In the config.ini file put:

  ### tuning script to be used
  tuning-script = "$moses-script-dir/training/mert-moses-new.pl --jobs 10"
  refalign = $working-dir/tuning/source/Tune.berkeley
  tuning-settings = "-mertdir $moses-src-dir/mert -mertargs=' --sctype KENDALL,BLEU --scconfig refalign:$refalign.0+$refalign.1+$refalign.2+$refalign.3,source:$input,weights:0.2623+0.7377 ' "

So you need to align your dev set source/translation; in this config those alignments are called $working-dir/tuning/source/Tune.berkeley.0-3, and they are in the format:

  0-0 11-16 3-4 8-11 1-2 4-6 9-13 8-10 0-1 2-3 5-7 6-7 7-9

--testalign is the alignment info between the decoded translations and the source. --refalign gives the alignments between each reference and the source.

For the source/reference alignments, you could either use a test set with gold-standard alignments, or you could align using the Berkeley/GIZA++ aligner trained on the training set. For the test set, you could either get your decoder to output alignments, or again automatically align using Berkeley/GIZA++. I have used Berkeley because you can train it once and then run it separately with different test sets. Philipp also reports slightly better results with Berkeley.

On Tue, Jun 12, 2012 at 8:52 AM, Barry Haddow wrote:
> Hi Yvette
>
> The LRScore metric was implemented in the mert-mtm5 branch (see
> PermutationScorer), but it doesn't look like it was merged into trunk. You'd
> also need to use InterpolatedScorer to interpolate the permutation metric
> with BLEU.
>
> MERT in general seems to be missing some documentation, and in particular
> the alternative scorers are not documented (as far as I can see). However,
> you can use a scorer by passing the "--sctype TYPE" argument to mert, where
> the scorers are listed in ScorerFactory.cpp.
>
> Cheers - Barry
>
> On Monday 11 June 2012 15:15:58 ygra...@computing.dcu.ie wrote:
>> Hi there,
>>
>> I want to optimize for LRscore instead of BLEU using hierarchical Moses.
>> Could you tell me if there are instructions available anywhere?
>>
>> Thanks a lot,
>> Yvette
>
> --
> Barry Haddow
> University of Edinburgh
> +44 (0) 131 651 3173