[Moses-support] AMTA 2020 Student Research Workshop: Call for Papers

2020-06-26 Thread Matt Post
AMTA 2020 | 2nd Call for Student Research Workshop Papers

The 14th biennial conference of the Association for Machine Translation in the 
Americas has been rescheduled to OCTOBER 6-10 and will be held as a virtual 
conference using Microsoft Teams, a powerful, enterprise collaboration 
platform. It was previously scheduled for September 8-12 in Orlando, Florida. 

This year, AMTA will hold a Student Research Workshop together with the main 
conference and invites submissions from student participants at all stages of 
their education.

## Important Dates

All deadlines are in the “Anywhere on Earth” time zone (UTC–12).

• Submission deadline: Monday, 13 July 2020
• Notification: Monday, 3 August 2020
• Final “camera-ready” versions: Monday, 31 August 2020
• Main conference (virtual): 6–10 October, 2020

The purpose of the Student Research Workshop is to provide students with a 
special opportunity to present their work and receive focused, intentional 
feedback from international experts in the field of machine translation. 
Accepted work will be assigned at least one experienced member from government, 
industry, or academia with knowledge of the student's particular research 
area. These senior members will prepare comments and questions ahead of time 
and will work with the student to provide an outside perspective on the 
work's impact and potential.

## Submissions

We invite two types of submissions:

–  Research papers must describe original, unpublished work and follow the 
submission criteria described in the Call for MT Research Papers. They may 
include multiple authors, but the first and primary author must be a student. 
These papers will be reviewed blind and, if accepted, presented along with 
main conference papers. They will be published in the AMTA Student Research 
Workshop volume of the conference proceedings and hosted in the ACL 
Anthology. 

–  Research proposals may contain previously published work. They should 
describe a proposed research trajectory, ideally (but not necessarily) rooted 
in the student's existing work, whether already completed or in progress. 
Research proposals must have only a single author. 

All submitted papers must be in PDF. Papers must be submitted to the START 
system at https://www.softconf.com/amta2020/srw/ by Monday, 13 July 2020 (AOE).

For Research Papers, please follow the submission guidelines found in the Call 
for MT Research Papers, noting the alternate submission URL above. Research 
Proposals should be no longer than five (5) pages, not counting references, 
which have no page limit. An additional page will be allowed for accepted papers.

Due to generous sponsorship from Microsoft, accepted submissions will receive 
free registration to the virtual conference for the primary student involved in 
the work. Additionally, the first one hundred students to register will pay 
only a reduced $10 registration fee.
———
https://amtaweb.org/amta-2020-second-call-for-student-research-papers/
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] MTEVAL

2020-05-04 Thread Matt Post
Hi,

I suggest you use sacrebleu, which is a Python port of mteval-v*.pl that makes 
all of this easier:

https://github.com/mjpost/sacrebleu 
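For reference, the XML wrapping that mteval-v14.pl expects (the point of confusion in the quoted question below) can be generated from plain text with a short script. The srcset/refset/tstset tag and attribute names here follow common mteval usage, but treat this as a sketch and double-check against the script's own documentation:

```python
def wrap_sgml(lines, set_tag, setid, srclang, trglang, sysid):
    """Wrap plain-text segments in NIST mteval-style SGML.

    set_tag is one of 'srcset', 'refset', or 'tstset'. The attribute
    names used here follow common mteval usage; verify them against
    mteval-v14.pl itself before relying on them.
    """
    out = ['<{} setid="{}" srclang="{}" trglang="{}">'.format(
        set_tag, setid, srclang, trglang)]
    out.append('<doc docid="doc1" sysid="{}">'.format(sysid))
    for i, line in enumerate(lines, 1):
        # one <seg> per input line, numbered from 1
        out.append('<seg id="{}">{}</seg>'.format(i, line.strip()))
    out.append('</doc>')
    out.append('</{}>'.format(set_tag))
    return "\n".join(out)

print(wrap_sgml(["Hello world ."], "tstset", "example", "de", "en", "my-mt"))
```

The same function covers all three file types; only set_tag and sysid change between the reference, source, and test files.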


matt


> On May 4, 2020, at 10:22 AM, Moses Visperas wrote:
> 
> I am trying to use the mteval-v14 Perl script (after reading the disclaimer 
> from multi-bleu.pl). After finding a README on the net, it says I just need 
> a reference file, a source file, and a test file. So if I understood it 
> correctly, it should be like this:
> 
> reference - the "proper" translation of the source language
> source - the text to be translated
> test - the output of your machine translator that will be evaluated
> 
> All of these are wrapped in an XML tag, but I keep getting errors when 
> running it. Can someone please help me, or at least link a tutorial that I 
> can follow?

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] europarl v9 - released or not? Can be used?

2020-03-30 Thread Matt Post
Incidentally, you can split the fields more simply using the “unpaste” command:

cat file_de-en.tsv | unpaste file.{de,en}

Unpaste is available here:

https://github.com/mjpost/bin/blob/master/unpaste
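The unpaste idea is simple enough to sketch in a few lines of Python: read tab-separated lines on stdin and write each column to its own file. This is a stdlib-only illustration of the same column-splitting behavior, not the actual tool:

```python
import sys

def unpaste(lines, out_paths):
    """Split tab-separated input lines into one output file per column."""
    outs = [open(p, "w", encoding="utf-8") for p in out_paths]
    try:
        for line in lines:
            fields = line.rstrip("\n").split("\t")
            # write the i-th field to the i-th output file
            for f, field in zip(outs, fields):
                f.write(field + "\n")
    finally:
        for f in outs:
            f.close()

if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python unpaste.py file.de file.en < file_de-en.tsv
    unpaste(sys.stdin, sys.argv[1:])
```

Unlike the two cut passes quoted below, this reads the TSV only once.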

matt (from my phone)

> On 30 Mar 2020 at 21:01, Artem Shevchenko wrote:
> 
> 
> found how to split fields in tab-separated de-en sentences.
> just if someone needs it, do it with cut -f 1 or 2:
> cat file_de-en.tsv | cut -f 1 > file.de
> cat file_de-en.tsv | cut -f 2 > file.en
> 
> so the only question, is europarl v9 better than v8 or v7.
> 
> On Tue, 31 Mar 2020 at 02:21, Artem Shevchenko wrote:
>> Hello, 
>> 
>> thank you very much for your reply.
>> my target is to rebuild the translation memory for the de-en pair while 
>> keeping truecase in the German phrase table. 
>> In the models released with 4.0 for de-en, everything is lowercased, which 
>> makes it impossible to distinguish between e.g. a noun (das Wissen) and a 
>> verb (zu wissen), or sie (she) and Sie (you).
>> I observe the file extension is tsv, different from v7; it is a 
>> tab-separated de-en text file, so I need to split it into two.
>> What would be the best way? Is there a Python script for it?
>> 
>> Is v9 better than v8 and v7?
>> 
>> Thanks!
>> Artem Shevchenko
>> 
>> 
>> 
>> On Mon, 30 Mar 2020 at 21:50, Philipp Koehn wrote:
>>> Hi,
>>> 
>>> you are free to use this data - v9 has only been generated for some
>>> language pairs, since the amount of translations has not increased
>>> significantly for a few years now.
>>> 
>>> -phi
>>> 
>>> On Mon, Mar 30, 2020 at 6:50 AM Artem Shevchenko  wrote:
 Hello,
 
 I have found this:
 http://www.statmt.org/europarl/v9/ dated 2019-02
 It contains parallel corpus v9?
 
 However no mentioning of v9 elsewhere.
 Is it released?
 Can it be used?
 
 Thank you!
 Artem Shevchenko
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] get paid to help preserve the MT Archive

2020-02-06 Thread Matt Post
Hi everyone,

If you’ve been around a while, you are probably aware of how hard it can be to 
find and cite old MT papers. Many of these can only be found on the MT Archive, 
which has not been maintained for some years.

https://www.aclweb.org/anthology/
http://www.mt-archive.info

As Director of the ACL Anthology, I am looking for someone to help move the MT 
Archive into the ACL Anthology. This conversion is a paid position (with 
funding from IAMT) with a goal completion date of April 15, 2020, so that the 
results can be demonstrated at EAMT.

I am personally very excited about this conversion project. We’ve put a lot of 
work into the Anthology over the past year, and all of this could come together 
very quickly. It is satisfying to watch the ingestions and changes go live, and 
putting this wealth of data in a place where it can be easily searched, 
exported, and cited will be immensely rewarding!

If you are interested, please contact me. You can see more information in the 
job advertisement below.

# Seeking assistance to help in the conversion of the Machine Translation 
Archive

February 6, 2020

The Association for Computational Linguistics (ACL) is seeking assistance in 
the task of ingesting the Machine Translation Archive (www.mt-archive.info) 
into the ACL Anthology (www.aclweb.org/anthology). This job is funded by the 
International Association for Machine Translation (IAMT) with the goal of 
preserving and disseminating the wealth of information present in the Archive, 
much of which is available exclusively there.

## Job Description

The Machine Translation Archive (hereafter, “Archive”) was created by John 
Hutchins in 2004 and currently contains about 12,000 entries. All of the 
archive, including various portals and indexes, is hand-crafted HTML written 
using Microsoft Word, and all of the papers are stored as PDF files. It is the 
single most important source of papers about machine translation, with emphasis 
on historical MT papers.

The main task is to convert the information in the MT Archive into the XML 
format used by the Anthology. The steps, which will be done in close 
collaboration with the Anthology Director, are:

• Producing a spreadsheet of conference proceedings and journals in the 
MT Archive, and obtaining identifiers for each of them from the Anthology team.
• Semi-automatically transforming each of these proceedings into the 
XML metadata format used by the Anthology. This will only include abstracts 
when they have already been extracted from the PDFs in the Archive.
• Renaming all the PDFs into the format required by the Anthology.
• Where not already extant, incorporating the conference program into 
the frontmatter (for example, for AMTA 2008).
• (Time permitting) Converting the following additional 
manually-curated metadata from the Archive into a structured object that refers 
to the new Anthology identifiers:
  • Languages and language pairs
  • System and project names
  • Organizations and affiliations
  • Methods, techniques, applications, and uses
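As an illustration of the XML transformation step, metadata for a single paper can be emitted in the Anthology's style with a few lines of Python. Element names here mirror the example XML later in this posting; the exact schema would need to be confirmed with the Anthology team, so treat this as a sketch:

```python
import xml.etree.ElementTree as ET

def paper_to_xml(paper_id, title, authors, pages):
    """Build an Anthology-style <paper> element.

    Element names mirror the P19.xml example in this posting; confirm the
    exact schema against the Anthology repository before bulk conversion.
    """
    paper = ET.Element("paper", id=str(paper_id))
    ET.SubElement(paper, "title").text = title
    for first, last in authors:
        author = ET.SubElement(paper, "author")
        ET.SubElement(author, "first").text = first
        ET.SubElement(author, "last").text = last
    ET.SubElement(paper, "pages").text = pages
    return paper

xml = ET.tostring(paper_to_xml(1, "An Example Title",
                               [("Ada", "Lovelace")], "1-11"),
                  encoding="unicode")
print(xml)
```

A semi-automatic converter would call something like this once per row of the proceedings spreadsheet produced in the first step.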

We hope to complete the conversion by April 15, 2020. Hourly salary will be 
negotiated at time of hiring. Timesheets will be signed and approved by the 
Anthology Director and paid biweekly from the ACL.

To apply, please send an email to anthol...@aclweb.org, with a subject of 
“Application for the MT Archive Ingestion Position”. In the body of the email, 
please provide the following information:

• Personal Information: A curriculum vitae.
• Job Times: When you are able to start working; hours available per 
week; estimated completion date.
• Qualifications: A paragraph describing your qualifications; an email 
address for one or two references.
• Plan: A paragraph or two summarizing your intended technical approach.

## Appendix: Detailed Information

### Main XML format

The Anthology repository is open source and hosted at 
https://github.com/acl-org/acl-anthology. The paper metadata for the Anthology 
is hosted in the data/xml directory, with XML files roughly corresponding to 
events. For example, the proceedings of ACL 2019 are in data/xml/P19.xml, and 
look like this:



 
   
<?xml version='1.0' encoding='UTF-8'?>
<collection id="P19">
  <volume id="P19-1">
    <meta>
      <booktitle>Proceedings of the 57th Annual Meeting of the Association
        for Computational Linguistics</booktitle>
      <editor><first>Anna</first><last>Korhonen</last></editor>
      <editor><first>David</first><last>Traum</last></editor>
      <editor><first>Lluís</first><last>Màrquez</last></editor>
      <publisher>Association for Computational Linguistics</publisher>
      <address>Florence, Italy</address>
      <month>July</month>
      <year>2019</year>
    </meta>
    <frontmatter>
      <url>P19-1000</url>
    </frontmatter>
    <paper id="1">
      <title>One Time of Interaction May Not Be Enough: Go Deep with an
        Interaction-over-Interaction Network for Response Selection in
        Dialogues</title>
      <author><first>Chongyang</first><last>Tao</last></author>
      <author><first>Wei</first><last>Wu</last></author>
      <author><first>Can</first><last>Xu</last></author>
      <author><first>Wenpeng</first><last>Hu</last></author>
      <author><first>Dongyan</first><last>Zhao</last></author>
      <author><first>Rui</first><last>Yan</last></author>
      <pages>1–11</pages>
      <abstract>Currently, researchers have paid great attention to
        retrieval-based dialogues in open-domain. In particular, people study
        the problem by investigating context-response

Re: [Moses-support] Dual Licensing or relicensing Moses

2018-05-29 Thread Matt Post
Hi Lane,

I'm not really involved with Moses or NLTK and never meant to take that on 
personally. However, it still seems to me like a reasonable and achievable goal.

matt

> On May 29, 2018, at 4:53 PM, Lane Schwartz <dowob...@gmail.com> wrote:
> 
> Matt,
> 
> Did you ever track down the people who contributed to the tokenizer? It seems 
> like we should be able to dual license that script. It would be very nice to 
> be able to include the Moses tokenizer and detokenizer as part of NLTK.
> 
> Lane
> 
> 
> On Fri, Apr 20, 2018 at 12:38 AM, liling tan <alvati...@gmail.com> wrote:
> Dear Moses Devs and Community,
> 
> Sorry for the delayed response. 
> 
> We've repackaged the MosesTokenizer Python code as a library and made it 
> pip-installable:
> https://github.com/alvations/sacremoses
> 
> I hope that's okay with the Moses community and the license compliance is 
> good with this now.
> 
> Regards,
> Liling
> 
> 
> 
> On Wed, Apr 11, 2018 at 1:41 AM, Matt Post <p...@cs.jhu.edu> wrote:
> Seems worth a shot. I suggest contacting each of them with individual emails 
> until (and if) you get a “no”. 
> 
> matt (from my phone)
> 
> On 10 Apr 2018 at 19:26, liling tan <alvati...@gmail.com> wrote:
> 
>> @Matt I'm not sure whether that'll work.
>> 
>> 
>> For tokenizer, that'll include:
>>  
>>  phikoehn <https://github.com/phikoehn>
>>  hieuhoang <https://github.com/hieuhoang>
>>  bhaddow <https://github.com/bhaddow>
>>  jimregan <https://github.com/jimregan>
>>  kpu <https://github.com/kpu>
>>  ugermann <https://github.com/ugermann>
>>  pjwilliams <https://github.com/pjwilliams>
>>  jgwinnup <https://github.com/jgwinnup>
>>  mhuck <https://github.com/mhuck>
>>  tofula <https://github.com/tofula>
>>  a455bcd9 <https://github.com/a455bcd9>
>> 
>> And these for the detokenizer:
>> 
>> 
>>  phikoehn <https://github.com/phikoehn>
>>  flammie <https://github.com/flammie>
>>  hieuhoang <https://github.com/hieuhoang>
>>  pjwilliams <https://github.com/pjwilliams>
>>  bhaddow <https://github.com/bhaddow>
>>  alvations <https://github.com/alvations>
>> 
>> Not sure if everyone agrees though.
>> 
>> Regards,
>> Liling
>> 
>> On Wed, Apr 11, 2018 at 12:39 AM, Matt Post <p...@cs.jhu.edu> wrote:
>> Liling—Would it work to get the permission of just those people who are in 
>> the commit log of the specific scripts you want to port?
>> 
>> matt (from my phone)
>> 
>> On 10 Apr 2018 at 18:19, liling tan <alvati...@gmail.com> wrote:
>> 
>>> Got it. 
>>> 
>>> So I think we'll just remove the MosesTokenizer and MosesDetokenizer 
>>> function from NLTK and maybe create a PR to put it in 
>>> mosesdecoder/scripts/tokenizer 
>>> 
>>> Thank you for the clarification!
>>> Liling
>>> 
>>> On Wed, Apr 11, 2018 at 12:17 AM, Hieu Hoang <hieuho...@gmail.com> wrote:
>>> Still the same problem - everyone owns Moses so you need everyone's 
>>> permission, not just mine. So no
>>> 
>>> Hieu Hoang
>>> http://moses-smt.org/
>>> 
>>> 
>>> On 10 April 2018 at 17:13, liling tan <alvati...@gmail.com> wrote:
>>> I understand. 
>>> 
>>> Could we have permission that it's okay to derive work from Moses with 
>>> respect to the (de-)tokenizer and possibly other scripts under an 
>>> MIT/Apache tool? 
>>> 
>>> Legally it's a restriction, but for what it's worth, I think having mutual 
>>> agreement between the OSS projects is sufficient to keep any port of the 
>>> LGPL work until someone starts to enforce legal action, at which point it's 
>>> safe to fall back to taking down these functionalities in the Apache/MIT 
>>> code. 
>>> 
>>> Regards,
>>> Liling
>>> 
>>> On Wed, Apr 11, 2018 at 12:09 AM, Hieu Hoang <hieuho...@gmail.com> wrote:
>>> we can't change the license, or dual license it, without the agreement of 
>>> everyone who's contributed to Moses. Too much work 
>>> 
>>> Hieu Hoang
>>> http://moses-smt.org/
>>> 
>>> 
>>> On 10 April 2018 at 15:47, 

Re: [Moses-support] Dual Licensing or relicensing Moses

2018-04-10 Thread Matt Post
Seems worth a shot. I suggest contacting each of them with individual emails 
until (and if) you get a “no”. 

matt (from my phone)

> On 10 Apr 2018 at 19:26, liling tan <alvati...@gmail.com> wrote:
> 
> @Matt I'm not sure whether that'll work.
> 
> 
> For tokenizer, that'll include:
>  
>  phikoehn
>  hieuhoang
>  bhaddow
>  jimregan
>  kpu
>  ugermann
>  pjwilliams
>  jgwinnup
>  mhuck
>  tofula
>  a455bcd9
> 
> And these for the detokenizer:
> 
> 
>  phikoehn
>  flammie
>  hieuhoang
>  pjwilliams
>  bhaddow
>  alvations
> 
> 
> Not sure if everyone agrees though.
> 
> Regards,
> Liling
> 
>> On Wed, Apr 11, 2018 at 12:39 AM, Matt Post <p...@cs.jhu.edu> wrote:
>> Liling—Would it work to get the permission of just those people who are in 
>> the commit log of the specific scripts you want to port?
>> 
>> matt (from my phone)
>> 
>>> On 10 Apr 2018 at 18:19, liling tan <alvati...@gmail.com> wrote:
>>> 
>>> Got it. 
>>> 
>>> So I think we'll just remove the MosesTokenizer and MosesDetokenizer 
>>> function from NLTK and maybe create a PR to put it in 
>>> mosesdecoder/scripts/tokenizer 
>>> 
>>> Thank you for the clarification!
>>> Liling
>>> 
>>>> On Wed, Apr 11, 2018 at 12:17 AM, Hieu Hoang <hieuho...@gmail.com> wrote:
>>>> Still the same problem - everyone owns Moses so you need everyone's 
>>>> permission, not just mine. So no
>>>> 
>>>> Hieu Hoang
>>>> http://moses-smt.org/
>>>> 
>>>> 
>>>>> On 10 April 2018 at 17:13, liling tan <alvati...@gmail.com> wrote:
>>>>> I understand. 
>>>>> 
>>>>> Could we have permission that it's okay to derive work from Moses with 
>>>>> respect to the (de-)tokenizer and possibly other scripts under an 
>>>>> MIT/Apache tool? 
>>>>> 
>>>>> Legally it's a restriction, but for what it's worth, I think having 
>>>>> mutual agreement between the OSS projects is sufficient to keep any port 
>>>>> of the LGPL work until someone starts to enforce legal action, at which 
>>>>> point it's safe to fall back to taking down these functionalities in the 
>>>>> Apache/MIT code. 
>>>>> 
>>>>> Regards,
>>>>> Liling
>>>>> 
>>>>>> On Wed, Apr 11, 2018 at 12:09 AM, Hieu Hoang <hieuho...@gmail.com> wrote:
>>>>>> we can't change the license, or dual license it, without the agreement 
>>>>>> of everyone who's contributed to Moses. Too much work 
>>>>>> 
>>>>>> Hieu Hoang
>>>>>> http://moses-smt.org/
>>>>>> 
>>>>>> 
>>>>>>> On 10 April 2018 at 15:47, liling tan <alvati...@gmail.com> wrote:
>>>>>>> Dear Moses Dev,
>>>>>>> 
>>>>>>> NLTK has a Python port of the word tokenizer in Moses. The tokenizer 
>>>>>>> works well in Python and creates good synergy, bridging Python users 
>>>>>>> to the code that Moses developers have spent years to hone. 
>>>>>>> 
>>>>>>> But it seemed to have hit a wall with some licensing issues. 
>>>>>>> https://github.com/nltk/nltk/issues/2000 
>>>>>>> 
>>>>>>> A general port of LGPL code is considered derivative work and is 
>>>>>>> incompatible with the Apache or MIT license. I understand that the 
>>>>>>> LGPL keeps derivatives from being proprietary, but it's a little less 
>>>>>>> permissive than non-copyleft licenses like Apache and MIT. 
>>>>>>> 
>>>>>>> Note that this licensing issue might also affect Marian, which is MIT 
>>>>>>> licensed and also incompatible with the LGPL; so although technically 
>>>>>>> users can chain code from different libraries, Marian couldn't have 
>>>>>>> any dependencies on the Moses components. (But we do know that none of 
>>>>>>> our models built with Marian would work without the Moses tokenizer, 
>>>>>>> which is LGPL.) 
>>>>>>> 
>>>>>>> Would there be a possibility to dual-license the Moses repository 
>>>>>>> under LGPL and an Apache/BSD/MIT license? I'm not sure whether it's 
>>>>>>> allowed to dual-license with LGPL and Apache/BSD/MIT, though; we might 
>>>>>>> have to check with proper legal personnel.
>>>>>>> 
>>>>>>> If a dual license is not possible, would it be possible to relicense 
>>>>>>> the code under a BSD/Apache/MIT license? That way it's more permissive 
>>>>>>> for derivative work.
>>>>>>> 
>>>>>>> I think the last scenario is for NLTK to drop the Python port of Moses 
>>>>>>> code entirely from Apache license repository but I think that'll remove 
>>>>>>> the synergy between various OSS. 
>>>>>>> 
>>>>>>> Hope to hear from Moses devs soon!
>>>>>>> 
>>>>>>> Regards,
>>>>>>> Liling
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
> 
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Dual Licensing or relicensing Moses

2018-04-10 Thread Matt Post
Liling—Would it work to get the permission of just those people who are in the 
commit log of the specific scripts you want to port?

matt (from my phone)

> On 10 Apr 2018 at 18:19, liling tan wrote:
> 
> Got it. 
> 
> So I think we'll just remove the MosesTokenizer and MosesDetokenizer function 
> from NLTK and maybe create a PR to put it in mosesdecoder/scripts/tokenizer 
> 
> Thank you for the clarification!
> Liling
> 
>> On Wed, Apr 11, 2018 at 12:17 AM, Hieu Hoang  wrote:
>> Still the same problem - everyone owns Moses so you need everyone's 
>> permission, not just mine. So no
>> 
>> Hieu Hoang
>> http://moses-smt.org/
>> 
>> 
>>> On 10 April 2018 at 17:13, liling tan  wrote:
>>> I understand. 
>>> 
>>> Could we have permission that it's okay to derive work from Moses with 
>>> respect to the (de-)tokenizer and possibly other scripts under an 
>>> MIT/Apache tool? 
>>> 
>>> Legally it's a restriction, but for what it's worth, I think having mutual 
>>> agreement between the OSS projects is sufficient to keep any port of the 
>>> LGPL work until someone starts to enforce legal action, at which point it's 
>>> safe to fall back to taking down these functionalities in the Apache/MIT 
>>> code. 
>>> 
>>> Regards,
>>> Liling
>>> 
 On Wed, Apr 11, 2018 at 12:09 AM, Hieu Hoang  wrote:
 we can't change the license, or dual license it, without the agreement of 
 everyone who's contributed to Moses. Too much work 
 
 Hieu Hoang
 http://moses-smt.org/
 
 
> On 10 April 2018 at 15:47, liling tan  wrote:
> Dear Moses Dev,
> 
> NLTK has a Python port of the word tokenizer in Moses. The tokenizer 
> works well in Python and creates good synergy, bridging Python users to 
> the code that Moses developers have spent years to hone. 
> 
> But it seemed to have hit a wall with some licensing issues. 
> https://github.com/nltk/nltk/issues/2000 
> 
> A general port of LGPL code is considered derivative work and is 
> incompatible with the Apache or MIT license. I understand that the LGPL 
> keeps derivatives from being proprietary, but it's a little less permissive 
> than non-copyleft licenses like Apache and MIT. 
> 
> Note that this licensing issue might also affect Marian, which is MIT 
> licensed and also incompatible with the LGPL; so although technically users 
> can chain code from different libraries, Marian couldn't have any 
> dependencies on the Moses components. (But we do know that none of 
> our models built with Marian would work without the Moses tokenizer, which 
> is LGPL.) 
> 
> Would there be a possibility to dual-license the Moses repository under 
> LGPL and an Apache/BSD/MIT license? I'm not sure whether it's allowed to 
> dual-license with LGPL and Apache/BSD/MIT, though; we might have to check 
> with proper legal personnel.
> 
> If a dual license is not possible, would it be possible to relicense the 
> code under a BSD/Apache/MIT license? That way it's more permissive for 
> derivative work.
> 
> I think the last scenario is for NLTK to drop the Python port of Moses 
> code entirely from Apache license repository but I think that'll remove 
> the synergy between various OSS. 
> 
> Hope to hear from Moses devs soon!
> 
> Regards,
> Liling
> 
> 
> 
 
>>> 
>> 
> 
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] EMS for the neural age?

2017-11-26 Thread Matt Post
Shuoyang Ding put this together recently:

https://github.com/shuoyangd/tape4nmt

matt


> On Nov 26, 2017, at 2:31 PM, Marcin Junczys-Dowmunt wrote:
> 
> Hi Ondrej,
> you do not seem confident enough to recommend Eman :)
> 
> I now took another look at ducttape. That does not look too bad: 
> basically Make with multi-targets and easier reuse of existing recipes 
> (which is a nightmare in GNU make).
> Is anyone still using ducttape? The commit dates are from two years ago.
> 
> On 26.11.2017 at 13:30, Ondrej Bojar wrote:
>> Hi, Marcin.
>> 
>> I am afraid you are correct. I have my Eman, and a couple of my students are 
>> using it (we have Neural Monkey, Nematus, t2t, and probably by now also 
>> Marian in it), but it has a rather steep learning curve and generally has 
>> other bells and whistles than what someone with data and a desire for a 
>> single model would ask for.
>> 
>> There were also Makefiles for Moses, but I never tried those.
>> 
>> And Neural Monkey has most of the pre-processing and evaluation in itself.
>> 
>> I guess that commented one-liner snippets are the best thing you can do.
>> 
>> Cheers, O.
>> 
>> 
>> On 26 November 2017 at 10:41 CET, Marcin Junczys-Dowmunt wrote:
>>> Hi list,
>>> 
>>> I am preparing a couple of usage examples for my NMT toolkit and got hung 
>>> up on all the preprocessing and other evil stuff. I am wondering: is 
>>> there now anything decent around for doing preprocessing, running 
>>> experiments, and evaluation? Or is the best thing still GNU make (isn't 
>>> that embarrassing)?
>>> 
>>> Best,
>>> 
>>> Marcin
>>> 
>> 
> 
> 


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] sacréBLEU

2017-11-12 Thread Matt Post
Hi, yes, I could add this easily. There are currently "wmt16/B" and "wmt17/B" 
test sets that include just the second reference. Do you anticipate using 
*just* the second reference? If so, I can create new "wmt16/2" and "wmt17/2" 
test sets that use both references. If you don't care about using just 
the second reference, I will repurpose the "*/B" sets to use both.

matt


> On Nov 12, 2017, at 10:47 AM, Jörg Tiedemann <jorg.tiedem...@lingfil.uu.se> wrote:
> 
> 
> This is nice! Could your tool even support an option that makes use of the 
> multi-reference test sets that are available for English-Finnish in 2016 and 
> 2017? They would finally be used for something if there were a simple option 
> to download and use those sets for standard evaluation. Thanks!
> 
> Jörg
> 
> **
> Jörg Tiedemann
> Department of Modern Languages, University of Helsinki
> http://blogs.helsinki.fi/tiedeman/
> http://blogs.helsinki.fi/language-technology/
> 
> 
> 
>> On 11 Nov 2017, at 12:37, Matt Post <p...@cs.jhu.edu> wrote:
>> 
>> Hi,
>>  
>> I’ve written a BLEU scoring tool called “sacreBLEU” that may be of use to 
>> people here. The goal is to get people to start reporting WMT-matrix 
>> compatible scores in their papers (i.e., scoring on detokenized outputs with 
>> a fixed reference tokenization) so that numbers can be compared directly, in 
>> the spirit of Rico Sennrich's multi-bleu-detok.pl. The nice part for you is 
>> that it auto-downloads WMT datasets and makes it so you no longer have to 
>> deal with SGML. You can install it via pip:
>>  
>> pip3 install sacrebleu
>>  
>> For starters, you can use it to easily download datasets:
>>  
>> sacrebleu -t wmt17 -l en-de --echo src > wmt17.en-de.en
>> sacrebleu -t wmt17 -l en-de --echo ref > wmt17.en-de.de
>>  
>> You don’t need to download the reference, though. You can just score against 
>> it using sacreBLEU directly. After decoding and detokenizing, try:
>>  
>> cat output.detok.txt | sacrebleu -t wmt17 -l en-de
>>  
>> I have tested and it produces the exact same numbers as Moses' 
>> mteval-v13a.pl, which is the official scoring script for WMT. It computes 
>> the exact same numbers for all 153 WMT17 system submissions (column 
>> BLEU-cased at matrix.statmt.org). For example:
>>  
>> $ cat newstest2017.uedin-nmt.4722.en-de | sacrebleu -t wmt17 -l en-de
>> 
>> BLEU+case.mixed+lang.en-de+numrefs.1+smooth.exp+test.wmt17+tok.13a+version.1.1.4
>>  = 28.30 59.9/34.0/21.8/14.4 (BP = 1.000 ratio = 1.026 hyp_len = 62873 
>> ref_len = 61287)
>> 
>> This means numbers computed with it are directly comparable across papers. 
>> As you can see, in addition to the score, it outputs a version string that 
>> records the exact BLEU parameters used. The output string is compatible with 
>> the output of multi-bleu.pl, so your old code for parsing the BLEU score out 
>> of multi-bleu.pl should still work.
>>  
>> You can also use the tool in a backward compatible mode with arbitrary 
>> references, the same way 
>>  
>> cat output.detok.txt | sacrebleu ref1 [ref2 …]
>>  
>> The official code is in sockeye (Amazon’s NMT system):
>> 
>> github.com/awslabs/sockeye/tree/master/contrib/sacrebleu
>> 
>> I will also likely maintain a clone here:
>> 
>> github.com/mjpost/sacreBLEU
>>  
>> matt
> 

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] sacréBLEU

2017-11-11 Thread Matt Post
Hi,
 
I’ve written a BLEU scoring tool called “sacreBLEU” that may be of use to 
people here. The goal is to get people to start reporting WMT-matrix compatible 
scores in their papers (i.e., scoring on detokenized outputs with a fixed 
reference tokenization) so that numbers can be compared directly, in the spirit 
of Rico Sennrich's multi-bleu-detok.pl. The nice part for you is that it 
auto-downloads WMT datasets and makes it so you no longer have to deal with 
SGML. You can install it via pip:
 
pip3 install sacrebleu
 
For starters, you can use it to easily download datasets:
 
sacrebleu -t wmt17 -l en-de --echo src > wmt17.en-de.en
sacrebleu -t wmt17 -l en-de --echo ref > wmt17.en-de.de

 
You don’t need to download the reference, though. You can just score against it 
using sacreBLEU directly. After decoding and detokenizing, try:
 
cat output.detok.txt | sacrebleu -t wmt17 -l en-de
 
I have tested and it produces the exact same numbers as Moses' mteval-v13a.pl, 
which is the official scoring script for WMT. It computes the exact same 
numbers for all 153 WMT17 system submissions (column BLEU-cased at 
matrix.statmt.org). For example:
 
$ cat newstest2017.uedin-nmt.4722.en-de | sacrebleu -t wmt17 -l en-de

BLEU+case.mixed+lang.en-de+numrefs.1+smooth.exp+test.wmt17+tok.13a+version.1.1.4
 = 28.30 59.9/34.0/21.8/14.4 (BP = 1.000 ratio = 1.026 hyp_len = 62873 ref_len 
= 61287)

This means numbers computed with it are directly comparable across papers. As 
you can see, in addition to the score, it outputs a version string that records 
the exact BLEU parameters used. The output string is compatible with the output 
of multi-bleu.pl, so your old code for parsing the BLEU score out of 
multi-bleu.pl should still work.
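Pulling the score back out of such a line needs only a small regex. This parser is a sketch for illustration, not something shipped with sacreBLEU, and the pattern is my own:

```python
import re

# matches "<signature> = <bleu> <p1/p2/p3/p4> (BP = <bp> ..." score lines
SCORE_RE = re.compile(r"(\S+)\s*=\s*([\d.]+)\s+([\d./]+)\s+\(BP = ([\d.]+)")

def parse_score_line(line):
    """Parse a sacreBLEU/multi-bleu-style score line into its parts."""
    m = SCORE_RE.match(line)
    if m is None:
        raise ValueError("not a recognized BLEU score line")
    signature, score, precisions, bp = m.groups()
    return {
        "signature": signature,
        "bleu": float(score),
        "precisions": [float(p) for p in precisions.split("/")],
        "bp": float(bp),
    }

line = ("BLEU+case.mixed+lang.en-de+numrefs.1+smooth.exp+test.wmt17"
        "+tok.13a+version.1.1.4 = 28.30 59.9/34.0/21.8/14.4 "
        "(BP = 1.000 ratio = 1.026 hyp_len = 62873 ref_len = 61287)")
result = parse_score_line(line)
print(result["bleu"])  # 28.3
```

Keeping the signature string alongside the score makes it easy to spot when two reported numbers were computed with different parameters.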
 
You can also use the tool in a backward compatible mode with arbitrary 
references, the same way 
 
cat output.detok.txt | sacrebleu ref1 [ref2 …]
 
The official code is in sockeye (Amazon’s NMT system):

https://github.com/awslabs/sockeye/tree/master/contrib/sacrebleu

I will also likely maintain a clone here:
 
https://github.com/mjpost/sacreBLEU
 
matt

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Advanced Topics documentation

2017-07-06 Thread Matt Post
Philipp and some others have also put quite an amount of work into the "SMT 
Research Survey Wiki", which you might find helpful:

http://www.statmt.org/survey/

matt


> On Jul 6, 2017, at 7:01 AM, Matthias Huck  wrote:
> 
> Hi,
> 
> Philipp Koehn's textbook is a nice introduction to SMT: 
> http://www.cambridge.org/catalogue/catalogue.asp?isbn=0521874157
> http://www.statmt.org/book/
> 
> For advanced topics, it's best to read the primary literature (i.e.,
> research papers published in conference proceedings and scientific
> journals).
> 
> Cheers,
> Matthias
> 
> 
> On Thu, 2017-07-06 at 02:59 +0530, Sasi Kiran Patha wrote:
>> hi Team,
>> 
>> Can you please suggest any book in the market to understand the
>> concepts for implementing Advanced topics like incremental learning,
>> Dictionary model.
>> 
>> Regards,
>> Sasi Kiran P.
>> 
>>> On Sat, Jul 1, 2017 at 9:30 PM,  wrote:
>> 
>>> 
>>> Send Moses-support mailing list submissions to
>>> moses-support@mit.edu
>>> 
>>> To subscribe or unsubscribe via the World Wide Web, visit
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>> or, via email, send a message with subject or body 'help' to
>>> moses-support-requ...@mit.edu
>>> 
>>> You can reach the person managing the list at
>>> moses-support-ow...@mit.edu
>>> 
>>> When replying, please edit your Subject line so it is more specific
>>> than "Re: Contents of Moses-support digest..."
>>> 
>>> 
>>> Today's Topics:
>>> 
>>>1. Advanced Topics documentation (Sasi Kiran Patha)
>>>2. Working of moses2 (Pritesh Ranjan)
>>> 
>>> 
>>> --
>>> 
>>> Message: 1
>>> Date: Sat, 1 Jul 2017 12:57:15 +0530
> From: Sasi Kiran Patha 
>>> Subject: [Moses-support] Advanced Topics documentation
>>> To: moses-support@mit.edu
>>> Message-ID:
> >> gmail.com>
>>> Content-Type: text/plain; charset="utf-8"
>>> 
>>> Hi,
>>> 
>>> The advanced topics implementations looks too precise to me from
>>> documentation
>>> on website. I may not understand it until i go through the code.
>>> Can you please specify if there is any book with more documentation on
>>> topics like
>>> Syntax models, Incremental Learning and Dictionary.
>>> 
>>> Regards,
>>> Sasi Kiran P
>>> -- next part --
>>> An HTML attachment was scrubbed...
>>> URL: http://mailman.mit.edu/mailman/private/moses-support/
>>> attachments/20170701/0f3724ed/attachment-0001.html
>>> 
>>> --
>>> 
>>> Message: 2
>>> Date: Sat, 1 Jul 2017 14:24:48 +0530
> From: Pritesh Ranjan 
>>> Subject: [Moses-support] Working of moses2
>>> To: moses-support@mit.edu
>>> Message-ID:
> 

Re: [Moses-support] integration of efmaral word alignment in Moses pipeline/EMS

2016-12-07 Thread Matt Post
Hi,

From the GitHub pages it appears that eflomal supersedes efmaral — is there any 
purpose therefore in using efmaral? Also, the linked PBML paper has no mention 
of eflomal — how does it perform in downstream BLEU tasks? Is it comparable to 
what you reported in Table 4?

matt


> On Dec 7, 2016, at 2:50 AM, Jorg Tiedemann  wrote:
> 
> 
> efmaral and eflomal are efficient Markov chain word aligners using Gibbs 
> sampling that can be used to replace GIZA++/fast_align in the typical Moses 
> training pipelines:
> 
> https://github.com/robertostling/efmaral
> https://github.com/robertostling/eflomal
> 
> Would anyone be interested in adding support in the Moses pipelines and 
> experiment.perl?
> Input and output formats are compatible with fast_align and Moses formats.
> 
> The tools could also be mentioned at statmt.org/moses
> 
> All the best,
> Jörg
> 
> —
> Jörg Tiedemann
> Department of Modern Languages
> University of Helsinki
> http://blogs.helsinki.fi/language-technology/
> —
> 
> 
> 
> 
> 
> 
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] cube pruning question

2016-08-17 Thread Matt Post
Great, thanks.


> On Aug 17, 2016, at 6:46 PM, Hieu Hoang <hieuho...@gmail.com> wrote:
> 
> -cube-pruning-lazy-scoring
> 
> is implemented in the hiero/syntax model in moses and the pb model in moses2.
> 
> On 17/08/2016 23:13, Kenneth Heafield wrote:
>> Moses: pass -cube-pruning-lazy-scoring and it will call the LM as items
>> come out of the queue.  Default is before they go into the queue.
>> 
>> mtplz is both and everything in between.  Initially they go into the
>> queue with no LM, then items get incremental updates as they surface.  A
>> completely refined hypothesis goes into the queue one last time and has
>> to bubble up to the top before it gets output to the chart.
>> 
>> On 08/17/2016 11:06 PM, Matt Post wrote:
>>> In Moses / Moses2 / mtplz, does the computation of a hyperedge cost 
>>> (particularly the LM) occur when items are pushed onto the candidates list, 
>>> or when they are popped off?
>>> ___
>>> Moses-support mailing list
>>> Moses-support@mit.edu
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>> 
>> ___
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>> 
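Ken's description above (items enter the queue with a cheap estimate; the expensive LM score is applied only when an item surfaces, and the refined item is pushed back until a fully refined hypothesis reaches the top) can be sketched as a toy best-first loop. This is an illustration of the idea only, not Moses or mtplz code; it assumes the cheap estimates are optimistic (estimate >= true score), which is what makes the lazy scheme safe.

```python
import heapq

def lazy_kbest(items, true_score, k):
    """Toy sketch of lazy LM scoring in cube pruning: pop an item; if it
    only carries its cheap estimate, compute the expensive score and push
    it back; emit it only when it pops in fully refined form."""
    heap = [(-est, item, False) for est, item in items]  # max-heap via negation
    heapq.heapify(heap)
    output = []
    while heap and len(output) < k:
        neg, item, refined = heapq.heappop(heap)
        if refined:
            output.append((item, -neg))          # fully scored: emit
        else:
            heapq.heappush(heap, (-true_score(item), item, True))
    return output

# Cheap estimates vs. true (LM-inclusive) scores for three toy hypotheses.
est = [(5.0, "a"), (4.0, "b"), (3.0, "c")]
true = {"a": 2.0, "b": 3.5, "c": 2.9}
print(lazy_kbest(est, true.get, 1))  # prints [('b', 3.5)]
```

Note how "a" looks best under its estimate but loses once its true score is computed, without ever scoring "c" fully.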


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] cube pruning question

2016-08-17 Thread Matt Post
In Moses / Moses2 / mtplz, does the computation of a hyperedge cost 
(particularly the LM) occur when items are pushed onto the candidates list, or 
when they are popped off?
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Finding the top 5 most ambiguous words

2016-05-13 Thread Matt Post
gzip -cd model/phrase-table.gz | cut -d\| -f1 | sort | uniq -c | sort -nr | 
head -n5

(according to one definition of "ambiguous")
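The same count can be sketched in Python, assuming the usual Moses phrase-table format with ` ||| `-separated fields (the lines below are made-up examples, not real phrase-table entries):

```python
from collections import Counter

# Count how many phrase-table entries share each source phrase; under this
# definition, the sources with the most entries (distinct translations)
# are the most "ambiguous".
lines = [
    "bank ||| bank ||| 0.5",
    "bank ||| shore ||| 0.3",
    "bank ||| riverbank ||| 0.2",
    "house ||| house ||| 0.9",
]
counts = Counter(line.split(" ||| ")[0] for line in lines)
print(counts.most_common(2))  # prints [('bank', 3), ('house', 1)]
```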

> On May 11, 2016, at 2:53 AM, Joe Jean  wrote:
> 
> Hello, 
> 
> How would you go about finding the top 5 most ambiguous words in a 
> translation system just by looking at the phrase table and the lexical 
> translation tables? Thanks.
> 
>  
> ___
> Moses-support mailing list
> Moses-support@mit.edu 
> http://mailman.mit.edu/mailman/listinfo/moses-support 
> 
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] g++: error: unrecognized command line option '-no-cpp-precomp'

2015-09-01 Thread Matt Post
You do not need gcc; Apple's stock compiler (installed via Xcode) is fine. If 
you've installed it, I'd recommend uninstalling it, and if you can't, make sure 
that /opt/local/bin is last in your path, so that /usr/bin/gcc is found first.

I've also had a lot of trouble with the Macports boost installation, which uses 
the "--layout=tagged" argument to the boost installer, instead of the default 
"--layout=system". The difference is the tagged layout adds compile options to 
the library name (e.g., "-mt"). However, I think the Moses compilation tool 
figures this out.

matt


> On Sep 1, 2015, at 3:58 PM, Jorg Tiedemann  wrote:
> 
> 
> This is kind of frustrating … so, the recommended way is to use apples clang 
> and to built boost from source, is that correct?
> I thought I could pull gcc and boost out of macpots (as I used to do) and 
> they would understand each other, but this does not seem to work. Why not?
> 
> Well, thanks anyway. I will try with a fresh boost built ...
> Jörg
> 
> 
> 
> 
>> On 01 Sep 2015, at 15:21, Hieu Hoang wrote:
>> 
>> My advice on osx is don't install GCC. Clang is the ordained compiler now, 
>> you'll be fighting apple every step of the way. Don't think different!
>> 
>> Hieu Hoang
>> Sent while bumping into things
>> 
>> On 31 Aug 2015 5:14 pm, "Jorg Tiedemann" wrote:
>> 
>> Well, I have /opt/local/ search paths in various environment variables to 
>> get macports to work.
>> I deleted all this paths and tried again but I still get the same problem.
>> 
>> I am confused. And why is gcc not working anymore when installed via 
>> macports? I also installed boost with macports. Is that a problem as well?
>> 
>> I have also some problems with kenlm but part of it compiles and links fine. 
>> build_binary and query seems to compile fine but lmplz does not link because 
>> of some undefined symbols:
>> Undefined symbols for architecture x86_64:
>>   
>> "boost::program_options::value_semantic_codecvt_helper<char>::parse(boost::any&, std::vector<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, bool) const", 
>> referenced from:
>> ….
>> 
>> I also had to link /opt/local/lib to /opt/local/lib64 (which didn’t exist in 
>> my setup).
>> I am afraid that I started to make quite a mess on my system but what did I 
>> do wrong?
>> 
>> Is macports not working properly anymore?
>> As I said, I have gcc 5.2.0 and boost 1.59.0 via macports on my system. Is 
>> that bad?
>> 
>> Thanks for helping!
>> Jörg
>> 
>> 
>> 
>> 
>>> On 31 Aug 2015, at 16:19, Hieu Hoang wrote:
>>> 
>>> the errors for clang looks like it's coming from the stl library. Have you 
>>> fiddled with the PATH variable or otherwise tried to make gcc on OSX work? 
>>> You shouldn't do that, it will just mess up the compilation environment on 
>>> your machine
>>> 
>>> On 31/08/2015 10:28, Jorg Tiedemann wrote:
 
 Unfortunately, this didn’t work for me either. I attach both logfiles - one 
 for clang and one for gcc (which I installed via macports).
 What can I do? Thanks!
 
 Jörg
 
 
 
 
 
 
 
 
> On 30 Aug 2015, at 11:33, Hieu Hoang hieuho...@gmail.com wrote:
> 
> Add
>toolset=clang
> to the bjam compile command. Osx no longer has gcc
> 
> Hieu Hoang
> Sent while bumping into things
> 
> On 29 Aug 2015 11:56 pm, "Jorg Tiedemann"  > wrote:
> Hi,
> 
> I tried to make a fresh install of Moses on my new Mac and I get the 
> following error
> g++: error: unrecognized command line option '-no-cpp-precomp'
> 
> What’s wrong? I have gcc5 and boost 1.59 on my machine via macports ...
> 
> Thanks for your help!
> Jörg
> 
> 
> 
> 
> ___
> Moses-support mailing list
> Moses-support@mit.edu 
> http://mailman.mit.edu/mailman/listinfo/moses-support 
> 
> 
> ___
> Moses-support mailing list
> Moses-support@mit.edu 
> http://mailman.mit.edu/mailman/listinfo/moses-support 
> 
 
>>> 
>>> -- 
>>> Hieu Hoang
>>> Researcher
>>> New York University, Abu Dhabi
>>> http://www.hoang.co.uk/hieu 
> 
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> 

Re: [Moses-support] Major bug found in Moses

2015-06-17 Thread Matt Post
When you filter the TM, you reported that you used the fourth weight. When you 
translate with the full TM, what weights did you assign to the TM? If you used 
the default, I believe it would equally weight all the phrasal features (i.e., 
1 1 1 1). This would explain why decoding with the full TM does not give the 
same result as filtering first. The moses.ini in your unfiltered translation 
experiment should assign weights of 0 0 0 1 to the TM features.
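The effect of those weight vectors can be illustrated with a toy dot-product scorer (made-up feature values; this is a sketch of linear-model scoring, not decoder code): equal weights 1 1 1 1 average all four TM features, while 0 0 0 1 selects on the fourth feature alone, matching the filtering criterion.

```python
def best_pair(pairs, weights):
    """Pick the candidate with the highest weighted feature score
    (a toy stand-in for the decoder's TM scoring; features are log probs)."""
    return max(pairs, key=lambda p: sum(w * f for w, f in zip(weights, p[1])))

# Two candidate translations with four TM features each (invented values);
# the fourth feature plays the role of the one used by the filtering step.
pairs = [("cand1", [-0.1, -0.2, -0.3, -2.5]),
         ("cand2", [-2.0, -1.5, -1.0, -0.3])]

print(best_pair(pairs, [1, 1, 1, 1])[0])  # prints cand1 (equal weighting)
print(best_pair(pairs, [0, 0, 0, 1])[0])  # prints cand2 (fourth feature only)
```

The two weightings pick different winners, which is the point: the full TM under default weights is not optimizing the same objective as the filtered run.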


 On Jun 17, 2015, at 1:52 PM, Read, James C jcr...@essex.ac.uk wrote:
 
 The analogy doesn't seem to be helping me understand just how exactly it is a 
 desirable quality of a TM to 
 
 a) completely break down if no LM is used (thank you for showing that such is 
 not always the case)
 b) be dependent on a tuning step to help it find the higher scoring 
 translations
 
 What you seem to be essentially saying is that the TM cannot find the higher 
 scoring translations because I didn't pretune the system to do so. And I am 
 supposed to accept that such is a desirable quality of a system whose very 
 job is to find the higher scoring translations.
 
 Further, I am still unclear which features you prequire a system to be tuned 
 on. At the very least it seems that I have discovered the selection process 
 that tuning seems to be making up for in some unspecified and altogether 
 opaque way.
 
 James
 
 
 
 From: Hieu Hoang hieuho...@gmail.com
 Sent: Wednesday, June 17, 2015 8:34 PM
 To: Read, James C; Kenneth Heafield; moses-support@mit.edu
 Cc: Arnold, Doug
 Subject: Re: [Moses-support] Major bug found in Moses
 
 4 BLEU is nothing to sniff at :) I was answering Ken's tangential assertion
 that LMs are needed for tuning.
 
 I have some sympathy for you. You're looking at ways to improve
 translation by reducing the search space. I've bashed my head against
 this wall for a while as well without much success.
 
 However, as everyone is telling you, you haven't understood the role of
 tuning. Without tuning, you're pointing your lab rat to some random part
 of the search space, instead of away from the furry animal with whiskers
 and towards the yellow cheesy thing.
 
 On 17/06/2015 20:45, Read, James C wrote:
 Doesn't look like the LM is contributing all that much then does it?
 
 James
 
 
 From: moses-support-boun...@mit.edu moses-support-boun...@mit.edu on 
 behalf of Hieu Hoang hieuho...@gmail.com
 Sent: Wednesday, June 17, 2015 7:35 PM
 To: Kenneth Heafield; moses-support@mit.edu
 Subject: Re: [Moses-support] Major bug found in Moses
 
 On 17/06/2015 20:13, Kenneth Heafield wrote:
 I'll bite.
 
 The moses.ini files ship with bogus feature weights.  One is required to
 tune the system to discover good weights for their system.  You did not
 tune.  The results of an untuned system are meaningless.
 
 So for example if the feature weights are all zeros, then the scores are
 all zero.  The system will arbitrarily pick some awful translation from
 a large space of translations.
 
 The filter looks at one feature p(target | source).  So now you've
 constrained the awful untuned model to a slightly better region of the
 search space.
 
 In other words, all you've done is a poor approximation to manually
 setting the weight to 1.0 on p(target | source) and the rest to 0.
 
 The problem isn't that you are running without a language model (though
 we generally do not care what happens without one).  The problem is that
 you did not tune the feature weights.
 
 Moreover, as Marcin is pointing out, I wouldn't necessarily expect
 tuning to work without an LM.
 Tuning does work without a LM. The results aren't half bad. fr-en
 europarl (pb):
with LM: 22.84
retuned without LM: 18.33
 On 06/17/15 11:56, Read, James C wrote:
 Actually the approximation I expect to be:
 
 p(e|f)=p(f|e)
 
 Why would you expect this to give poor results if the TM is well trained? 
 Surely the results of my filtering experiments prove otherwise.
 
 James
 
 
 From: moses-support-boun...@mit.edu moses-support-boun...@mit.edu on 
 behalf of Rico Sennrich rico.sennr...@gmx.ch
 Sent: Wednesday, June 17, 2015 5:32 PM
 To: moses-support@mit.edu
 Subject: Re: [Moses-support] Major bug found in Moses
 
 Read, James C jcread@... writes:
 
 I have been unable to find a logical explanation for this behaviour other
 than to conclude that there must be some kind of bug in Moses which causes 
 a
 TM only run of Moses to perform poorly in finding the most likely
 translations according to the TM when
   there are less likely phrase pairs included in the race.
 I may have overlooked something, but you seem to have removed the language
 model from your config, and used default weights. your default model will
 thus (roughly) implement the following model:
 
 p(e|f) = p(e|f)*p(f|e)
 
 which is obviously wrong, and will give you poor results. This is not a bug
 in the code, but a 

Re: [Moses-support] lattice decoding

2015-03-15 Thread Matt Post
Hi Hieu,

I was just looking for technical details about particulars as a reference. No 
problem that there isn't one, thanks.

matt

 On Mar 14, 2015, at 9:33 AM, Hieu Hoang hieuho...@gmail.com wrote:
 
 There's only the web page at the moment. What kind of detail are you looking 
 for?
 
 On 12 Mar 2015 21:37, Matt Post p...@cs.jhu.edu mailto:p...@cs.jhu.edu 
 wrote:
 Hi,
 
 Is there a technical writeup of Moses' phrase-based lattice decoding? The 
 only real description I could find is Chris Dyer's Generalizing Word Lattice 
 Translation paper, and that is quite high level.
 
 Matt
 ___
 Moses-support mailing list
 Moses-support@mit.edu mailto:Moses-support@mit.edu
 http://mailman.mit.edu/mailman/listinfo/moses-support 
 http://mailman.mit.edu/mailman/listinfo/moses-support

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] lattice decoding

2015-03-12 Thread Matt Post
Hi,

Is there a technical writeup of Moses' phrase-based lattice decoding? The only 
real description I could find is Chris Dyer's Generalizing Word Lattice 
Translation paper, and that is quite high level.

Matt
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] kbmira segfault

2015-03-05 Thread Matt Post
Yes, passing --dense-init worked, though it seems to ignore the feature names: 
as long as I have enough lines to match the number of dense parameters, it 
works, and it always outputs the following:

477/3000 updates, avg loss = 0.36341, BLEU = 0.356527
F0 3.663
F1 0.221152
F2 0.186323
F3 1.41851
F4 2.38853
F5 -0.162657
F6 0.430753
F7 3.93281

Does that sound correct?


 On Mar 5, 2015, at 10:34 AM, Barry Haddow bhad...@staffmail.ed.ac.uk wrote:
 
 Hi Matt
 
 This was part of the changes to support hypergraph mira, since the 
 hypergraphs don't have the FEATURES_TXT_BEGIN_0 sections. In fact they don't 
 differentiate between sparse and dense features.
 
 Does it work correctly when you use the --dense-init parameter?
 
 cheers - Barry
 
 On 05/03/15 15:18, Matt Post wrote:
 Okay, the old kbmira works, so this must be part of the 3.0 changes.
 
 It seems that the names of features in the header line 
 (FEATURES_TXT_BEGIN_0) are ignored entirely. The 2.1 kbmira would output 
 dense feature weights using names F1..FN, which I would then re-map back to 
 the list in the header. In kbmira 3.0, it uses the file passed in, as Barry 
 pointed out.
 
 Thanks for your help!
 
 matt
 
 
 On Feb 27, 2015, at 1:21 PM, Matt Post p...@cs.jhu.edu 
 mailto:p...@cs.jhu.edu wrote:
 
 Although, those old successful runs might have been with the old Moses 
 kbmira. I'll look into this and report back.
 
 matt
 
 
 On Feb 27, 2015, at 12:19 PM, Matt Post p...@cs.jhu.edu 
 mailto:p...@cs.jhu.edu wrote:
 
 Hi Barry — Thanks for the response. I don't think that's it, because I use 
 the exact same approach for lots of other tuning runs. Isn't it the header 
 line of the features file that lists dense features? I've been using this 
 format, where dense features are listed in each header line, and then 
 sparse features in the individual lines:
 
 FEATURES_TXT_BEGIN_0 0 300 9 lm_0 lm_1 tm_pt_1 tm_pt_3 tm_pt_0 tm_pt_2 
 WordPenalty PhrasePenalty Distortion
 -82.183 -72.639 -79.162 -41.493 -60.118 -28.509 -10.857 19 -8
 -82.183 -72.639 -79.162 -41.493 -60.118 -28.509 -10.857 19 -8 
 OOVPenalty=-100
 
 This works in lots of places (although, it also raises a separate 
 question, of whether kbmira actually distinguishes between sparse and 
 dense features? I seem to remember Colin once saying that there is a 
 single group weight between the two groups, but I've never been able to 
 find this in the code).
 
 matt
 
 
 On Feb 26, 2015, at 5:35 PM, Barry Haddow bhad...@staffmail.ed.ac.uk 
 mailto:bhad...@staffmail.ed.ac.uk wrote:
 
 Hi Matt
 
 When mert-moses.pl runs kbmira, it always supplies a list of the dense 
 features (and their initial values) using the --dense-init parameter. I 
 think this is your problem. I've attached a typical file used for this 
 feature list.
 
 Of course, kbmira should have a sensible message rather than a segfault. 
 This is probably my doing,
 
 cheers - Barry
 
 On 26/02/15 22:18, Matt Post wrote:
 kbmira segfaults on the following command:
 
 kbmira run --ffile run1.features.dat --scfile run1.scores.dat -o mert.out
 
 Where run1.features.dat (30 MB) and run1.scores.dat (14 MB) can be 
 downloaded here:
 
 https://www.dropbox.com/s/yim7ub1bmq5jv2g/run1.features.dat?dl=0
 https://www.dropbox.com/s/kkek36o7aflgzuu/run1.scores.dat?dl=0
 
 I tracked it down to this line of mert/FeatureStats.cpp.
 
 std::string SparseVector::decode(std::size_t id)
 {
 return m_id_to_name[id];
 }
 
 Any obvious ideas before I go down this rabbit hole? I verified there 
 are no blank lines or anything else funny with the formatting, at least 
 as far as I can tell (all dense features, plus one sparse feature, 
 OOVPenalty=-100, showing up occasionally).
 
 matt
 
 
 
 
 
 ___
 Moses-support mailing list
 Moses-support@mit.edu
 http://mailman.mit.edu/mailman/listinfo/moses-support
 
 run1.dense
 
 ___
 Moses-support mailing list
 Moses-support@mit.edu mailto:Moses-support@mit.edu
 http://mailman.mit.edu/mailman/listinfo/moses-support
 
 ___
 Moses-support mailing list
 Moses-support@mit.edu mailto:Moses-support@mit.edu
 http://mailman.mit.edu/mailman/listinfo/moses-support
 
 
 
 -- 
 The University of Edinburgh is a charitable body, registered in
 Scotland, with registration number SC005336.


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] kbmira segfault

2015-03-05 Thread Matt Post
Okay, the old kbmira works, so this must be part of the 3.0 changes.

It seems that the names of features in the header line (FEATURES_TXT_BEGIN_0) 
are ignored entirely. The 2.1 kbmira would output dense feature weights using 
names F1..FN, which I would then re-map back to the list in the header. In 
kbmira 3.0, it uses the file passed in, as Barry pointed out.

Thanks for your help!

matt


 On Feb 27, 2015, at 1:21 PM, Matt Post p...@cs.jhu.edu wrote:
 
 Although, those old successful runs might have been with the old Moses 
 kbmira. I'll look into this and report back.
 
 matt
 
 
 On Feb 27, 2015, at 12:19 PM, Matt Post p...@cs.jhu.edu 
 mailto:p...@cs.jhu.edu wrote:
 
 Hi Barry — Thanks for the response. I don't think that's it, because I use 
 the exact same approach for lots of other tuning runs. Isn't it the header 
 line of the features file that lists dense features? I've been using this 
 format, where dense features are listed in each header line, and then sparse 
 features in the individual lines:
 
  FEATURES_TXT_BEGIN_0 0 300 9 lm_0 lm_1 tm_pt_1 tm_pt_3 tm_pt_0 tm_pt_2 
 WordPenalty PhrasePenalty Distortion 
  -82.183 -72.639 -79.162 -41.493 -60.118 -28.509 -10.857 19 -8 
  -82.183 -72.639 -79.162 -41.493 -60.118 -28.509 -10.857 19 -8 
 OOVPenalty=-100
 
 This works in lots of places (although, it also raises a separate question, 
 of whether kbmira actually distinguishes between sparse and dense features? 
 I seem to remember Colin once saying that there is a single group weight 
 between the two groups, but I've never been able to find this in the code).
 
 matt
 
 
 On Feb 26, 2015, at 5:35 PM, Barry Haddow bhad...@staffmail.ed.ac.uk 
 mailto:bhad...@staffmail.ed.ac.uk wrote:
 
 Hi Matt
 
 When mert-moses.pl runs kbmira, it always supplies a list of the dense 
 features (and their initial values) using the --dense-init parameter. I 
 think this is your problem. I've attached a typical file used for this 
 feature list.
 
 Of course, kbmira should have a sensible message rather than a segfault. 
 This is probably my doing,
 
 cheers - Barry
 
 On 26/02/15 22:18, Matt Post wrote:
 kbmira segfaults on the following command:
 
 
 kbmira run --ffile run1.features.dat --scfile run1.scores.dat -o 
 mert.out
 
 Where run1.features.dat (30 MB) and run1.scores.dat (14 MB) can be 
 downloaded here:
 
 
 https://www.dropbox.com/s/yim7ub1bmq5jv2g/run1.features.dat?dl=0
 
 https://www.dropbox.com/s/kkek36o7aflgzuu/run1.scores.dat?dl=0
 
 I tracked it down to this line of mert/FeatureStats.cpp.
 
 std::string SparseVector::decode(std::size_t id)
 {
   return m_id_to_name[id];
 }
 
 Any obvious ideas before I go down this rabbit hole? I verified there are 
 no blank lines or anything else funny with the formatting, at least as far 
 as I can tell (all dense features, plus one sparse feature, 
 OOVPenalty=-100, showing up occasionally).
 
 matt
 
 
 
 
 
 ___
 Moses-support mailing list
 Moses-support@mit.edu mailto:Moses-support@mit.edu
 http://mailman.mit.edu/mailman/listinfo/moses-support 
 http://mailman.mit.edu/mailman/listinfo/moses-support
 
 run1.dense
 
 ___
 Moses-support mailing list
 Moses-support@mit.edu mailto:Moses-support@mit.edu
 http://mailman.mit.edu/mailman/listinfo/moses-support
 
 ___
 Moses-support mailing list
 Moses-support@mit.edu
 http://mailman.mit.edu/mailman/listinfo/moses-support

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] kbmira segfault

2015-02-27 Thread Matt Post
Hi Barry — Thanks for the response. I don't think that's it, because I use the 
exact same approach for lots of other tuning runs. Isn't it the header line of 
the features file that lists dense features? I've been using this format, where 
dense features are listed in each header line, and then sparse features in the 
individual lines:

FEATURES_TXT_BEGIN_0 0 300 9 lm_0 lm_1 tm_pt_1 tm_pt_3 tm_pt_0 tm_pt_2 
WordPenalty PhrasePenalty Distortion 
-82.183 -72.639 -79.162 -41.493 -60.118 -28.509 -10.857 19 -8 
-82.183 -72.639 -79.162 -41.493 -60.118 -28.509 -10.857 19 -8 
OOVPenalty=-100

This works in lots of places (although, it also raises a separate question, of 
whether kbmira actually distinguishes between sparse and dense features? I seem 
to remember Colin once saying that there is a single group weight between the 
two groups, but I've never been able to find this in the code).
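For reference, the header line above can be parsed in a few lines. This sketch assumes the field layout inferred from the example (marker, shard id, number of feature lines, number of dense features, then the dense feature names); it is an illustration, not mert code:

```python
def parse_header(header):
    """Split a FEATURES_TXT_BEGIN_0 header into the dense feature count
    and the dense feature names (layout inferred from the example above)."""
    fields = header.split()
    num_dense = int(fields[3])
    names = fields[4:4 + num_dense]
    return num_dense, names

header = ("FEATURES_TXT_BEGIN_0 0 300 9 lm_0 lm_1 tm_pt_1 tm_pt_3 tm_pt_0 "
          "tm_pt_2 WordPenalty PhrasePenalty Distortion")
n, names = parse_header(header)
print(n, names[0], names[-1])  # prints 9 lm_0 Distortion
```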

matt


 On Feb 26, 2015, at 5:35 PM, Barry Haddow bhad...@staffmail.ed.ac.uk wrote:
 
 Hi Matt
 
 When mert-moses.pl runs kbmira, it always supplies a list of the dense 
 features (and their initial values) using the --dense-init parameter. I think 
 this is your problem. I've attached a typical file used for this feature list.
 
 Of course, kbmira should have a sensible message rather than a segfault. This 
 is probably my doing,
 
 cheers - Barry
 
 On 26/02/15 22:18, Matt Post wrote:
 kbmira segfaults on the following command:
 
 
 kbmira run --ffile run1.features.dat --scfile run1.scores.dat -o 
 mert.out
 
 Where run1.features.dat (30 MB) and run1.scores.dat (14 MB) can be 
 downloaded here:
 
 
 https://www.dropbox.com/s/yim7ub1bmq5jv2g/run1.features.dat?dl=0
 
 https://www.dropbox.com/s/kkek36o7aflgzuu/run1.scores.dat?dl=0
 
 I tracked it down to this line of mert/FeatureStats.cpp.
 
 std::string SparseVector::decode(std::size_t id)
 {
   return m_id_to_name[id];
 }
 
 Any obvious ideas before I go down this rabbit hole? I verified there are no 
 blank lines or anything else funny with the formatting, at least as far as I 
 can tell (all dense features, plus one sparse feature, OOVPenalty=-100, 
 showing up occasionally).
 
 matt
 
 
 
 
 
 ___
 Moses-support mailing list
 Moses-support@mit.edu mailto:Moses-support@mit.edu
 http://mailman.mit.edu/mailman/listinfo/moses-support 
 http://mailman.mit.edu/mailman/listinfo/moses-support
 
 run1.dense

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] kbmira segfault

2015-02-27 Thread Matt Post
Although, those old successful runs might have been with the old Moses kbmira. 
I'll look into this and report back.

matt


 On Feb 27, 2015, at 12:19 PM, Matt Post p...@cs.jhu.edu wrote:
 
 Hi Barry — Thanks for the response. I don't think that's it, because I use 
 the exact same approach for lots of other tuning runs. Isn't it the header 
 line of the features file that lists dense features? I've been using this 
 format, where dense features are listed in each header line, and then sparse 
 features in the individual lines:
 
   FEATURES_TXT_BEGIN_0 0 300 9 lm_0 lm_1 tm_pt_1 tm_pt_3 tm_pt_0 tm_pt_2 
 WordPenalty PhrasePenalty Distortion 
   -82.183 -72.639 -79.162 -41.493 -60.118 -28.509 -10.857 19 -8 
   -82.183 -72.639 -79.162 -41.493 -60.118 -28.509 -10.857 19 -8 
 OOVPenalty=-100
 
 This works in lots of places (although, it also raises a separate question, 
 of whether kbmira actually distinguishes between sparse and dense features? I 
 seem to remember Colin once saying that there is a single group weight 
 between the two groups, but I've never been able to find this in the code).
 
 matt
 
 
 On Feb 26, 2015, at 5:35 PM, Barry Haddow bhad...@staffmail.ed.ac.uk 
 mailto:bhad...@staffmail.ed.ac.uk wrote:
 
 Hi Matt
 
 When mert-moses.pl runs kbmira, it always supplies a list of the dense 
 features (and their initial values) using the --dense-init parameter. I 
 think this is your problem. I've attached a typical file used for this 
 feature list.
 
 Of course, kbmira should have a sensible message rather than a segfault. 
 This is probably my doing,
 
 cheers - Barry
 
 On 26/02/15 22:18, Matt Post wrote:
 kbmira segfaults on the following command:
 
 
 kbmira run --ffile run1.features.dat --scfile run1.scores.dat -o 
 mert.out
 
 Where run1.features.dat (30 MB) and run1.scores.dat (14 MB) can be 
 downloaded here:
 
 
 https://www.dropbox.com/s/yim7ub1bmq5jv2g/run1.features.dat?dl=0
 
 https://www.dropbox.com/s/kkek36o7aflgzuu/run1.scores.dat?dl=0
 
 I tracked it down to this line of mert/FeatureStats.cpp.
 
 std::string SparseVector::decode(std::size_t id)
 {
   return m_id_to_name[id];
 }
 
 Any obvious ideas before I go down this rabbit hole? I verified there are 
 no blank lines or anything else funny with the formatting, at least as far 
 as I can tell (all dense features, plus one sparse feature, 
 OOVPenalty=-100, showing up occasionally).
 
 matt
 
 
 
 
 
 ___
 Moses-support mailing list
 Moses-support@mit.edu mailto:Moses-support@mit.edu
 http://mailman.mit.edu/mailman/listinfo/moses-support 
 http://mailman.mit.edu/mailman/listinfo/moses-support
 
 run1.dense
 
 ___
 Moses-support mailing list
 Moses-support@mit.edu
 http://mailman.mit.edu/mailman/listinfo/moses-support

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] compilation problems

2015-02-16 Thread Matt Post
Okay, that worked. The whole project now builds, EXCEPT there is no bin/lmplz; 
fragment, build_binary, and query all exist, but not lmplz. It is not mentioned 
in the logs...

But that's okay, because I'll just copy it from KenLM directly.

Thanks for your help.

matt


 On Feb 16, 2015, at 1:01 PM, Kenneth Heafield mo...@kheafield.com wrote:
 
 Tests to be dynamically linked must be _compiled_ using
 -DBOOST_TEST_DYN_LINK .  The build system did this prior to Hieu's change.
 
 After reverting Hieu's change, force it to recompile the object file:
 
 rm util/bin/gcc-4.9.2/release/file_piece_test.o
 
 (or just run with -a and rebuild everything).
 
 Kenneth
 
 On 02/16/2015 12:53 PM, Matt Post wrote:
 Hmm; I got a bit further, but still have linking errors for the tests.
 build_binary built, but still no lmplz, and it's not mentioned at all in
 the log output.
 
 I was able to get lmplz to build by modifying Jamroot to build only the
 lm project, but still get all these linking errors when I try a full
 build.
 
 Maybe this is my environment? Or a Boost change? What version do you all
 build against?
 
 gcc.link util/bin/file_piece_test.test/gcc-4.9.2/release/file_piece_test
 
 g++ -L/opt/boost/lib -L/opt/boost/lib64 -Wl,-R -Wl,/opt/boost/lib
 -Wl,-R -Wl,/opt/boost/lib64 -Wl,-rpath-link -Wl,/opt/boost/lib
 -Wl,-rpath-link -Wl,/opt/boost/lib64
 -o util/bin/file_piece_test.test/gcc-4.9.2/release/file_piece_test
 -Wl,--start-group
 util/bin/gcc-4.9.2/release/file_piece_test.o
 util/bin/gcc-4.9.2/release/parallel_read.o
 util/bin/gcc-4.9.2/release/read_compressed.o
 util/double-conversion/bin/gcc-4.9.2/release/cached-powers.o
 util/double-conversion/bin/gcc-4.9.2/release/double-conversion.o
 util/double-conversion/bin/gcc-4.9.2/release/diy-fp.o
 util/double-conversion/bin/gcc-4.9.2/release/fast-dtoa.o
 util/double-conversion/bin/gcc-4.9.2/release/bignum.o
 util/double-conversion/bin/gcc-4.9.2/release/bignum-dtoa.o
 util/double-conversion/bin/gcc-4.9.2/release/strtod.o
 util/double-conversion/bin/gcc-4.9.2/release/fixed-dtoa.o
 util/bin/gcc-4.9.2/release/bit_packing.o
 util/bin/gcc-4.9.2/release/ersatz_progress.o
 util/bin/gcc-4.9.2/release/exception.o
 util/bin/gcc-4.9.2/release/file.o
 util/bin/gcc-4.9.2/release/file_piece.o
 util/bin/gcc-4.9.2/release/mmap.o
 util/bin/gcc-4.9.2/release/murmur_hash.o
 util/bin/gcc-4.9.2/release/pool.o
 util/bin/gcc-4.9.2/release/scoped.o
 util/bin/gcc-4.9.2/release/string_piece.o
 util/bin/gcc-4.9.2/release/usage.o
 -Wl,-Bstatic  -Wl,-Bdynamic -lboost_unit_test_framework -llzma -lbz2 -lz
 -lrt -ldl -lboost_system -lboost_filesystem -Wl,--end-group
 
 
 /usr/lib/../lib64/crt1.o: In function `_start':
 (.text+0x20): undefined reference to `main'
 collect2: error: ld returned 1 exit status
 
 From: Kenneth Heafield moses@...
 Subject: Re: compilation problems
 Newsgroups: gmane.comp.nlp.moses.user
 Date: 2015-02-16 16:50:04 GMT
 Hi Matt,
 
 lmplz should be compiling anyway, despite the tests failing.  Try
 reverting this commit, which broke shared compilation for tests:
 
 commit d7f5bb41faaac5ca93b9cbb723ad558b2c67d3c2
 Author: Hieu Hoang hieuhoang@...
 Date:   Tue Jan 27 16:22:15 2015 +
 
 Regarding boost_filesystem we'll probably have to add that dependency
 since Boost doesn't really document which of their libraries depend on
 other libraries.
 
 Kenneth
 
 On Feb 16, 2015, at 11:42 AM, Matt Post p...@cs.jhu.edu wrote:
 
 Hi,
 
 I am running into a number of problems compiling Moses 3.0. I am
 using GCC 4.9.2 and a custom (correct) install of Boost 1.57.0.
 
 1. First, I had to add this:
 
 <library>boost_filesystem
 
 to line 174 of Jamroot (per this
 discussion: https://github.com/moses-smt/mosesdecoder/issues/89 )
 
 2. Things like lmplz do not compile, and aren't even attempted,
 perhaps because all of the tests fail.
 
 ./bjam --max-factors=1 --max-kenlm-order=5 debug-symbols=off -j4 -d2
 --with-boost=/opt/boost threading=single --notrace link=shared
 --without-libsegfault
 [snip]
 ...failed updating 30 targets...
 ...skipped 36 targets...
 
 It seems like something with boost unit tests?  e.g.,
 
 g++ -L/opt/boost/lib -L/opt/boost/lib64 -Wl,-R
 -Wl,/home/hltcoe/mpost/code/mosesdecoder/mert/bin/gcc-4.9.2/release
 -Wl,-R -Wl,/opt/boost/lib -Wl,-R -Wl,/opt/boost/lib64
 -Wl,-rpath-link
 -Wl,/home/hltcoe/mpost/code/mosesdecoder/mert/bin/gcc-4.9.2/release
 -Wl,-rpath-link -Wl,/opt/boost/lib -Wl,-rpath-link
 -Wl,/opt/boost/lib64 -o mert/bin/gcc-4.9.2/release/timer_test
 -Wl,--start-group
 mert/bin/gcc-4.9.2/release/TimerTest.o 
 mert/bin/gcc-4.9.2/release/libmert_lib.so  -Wl,-Bstatic  -Wl,-Bdynamic
 -lboost_unit_test_framework -ldl -lboost_system

Re: [Moses-support] compilation problems

2015-02-16 Thread Matt Post
I did threading=single because multi used to demand the -mt variant of the 
boost libraries, which I've never quite understood. But I just tried again with 
threading=multi and everything works, with that commit below. Thanks again.

matt


 On Feb 16, 2015, at 1:56 PM, Kenneth Heafield mo...@kheafield.com wrote:
 
 Hi,
 
   Adding Fabienne because this was the same problem.
 
   I've pushed commit 93ab057eda69a7915efbc9fa92d4ce6341e6ca02 which will
 hopefully handle BOOST_TEST_DYN_LINK correctly.
 
   Still unclear what the behavior should be for threading=single.
 Compile it anyway, forcing two compiles of kenlm?  Warning message that
 will probably be ignored?
 
 Kenneth
 
 On 02/16/2015 01:41 PM, Kenneth Heafield wrote:
 Actually, that's by design.  Your command line has threading=single and
 lmplz doesn't have a single-threaded option.
 
 Kenneth
 
 On 02/16/2015 01:29 PM, Matt Post wrote:
 Okay, that worked. The whole project now builds, EXCEPT there is no 
 bin/lmplz. fragment, build_binary, and query all exist, but not lmplz. It 
 is not mentioned in the logs...
 
 But that's okay, because I'll just copy it from KenLM directly.
 
 Thanks for your help.
 
 matt
 
 
 On Feb 16, 2015, at 1:01 PM, Kenneth Heafield mo...@kheafield.com wrote:
 
 Tests to be dynamically linked must be _compiled_ using
 -DBOOST_TEST_DYN_LINK .  The build system did this prior to Hieu's change.
 
 After reverting Hieu's change, force it to recompile the object file:
 
 rm util/bin/gcc-4.9.2/release/file_piece_test.o
 
 (or just run with -a and rebuild everything).
 
 Kenneth
 
 On 02/16/2015 12:53 PM, Matt Post wrote:
 Hmm; I got a bit further, but still have linking errors for the tests.
 build_binary built, but still no lmplz, and it's not mentioned at all in
 the log output.
 
 I was able to get lmplz to build by modifying Jamroot to build only the
 lm project, but still get all these linking errors when I try a full
 build.
 
 Maybe this is my environment? Or a Boost change? What version do you all
 build against?
 
 gcc.link util/bin/file_piece_test.test/gcc-4.9.2/release/file_piece_test
 
   g++ -L/opt/boost/lib -L/opt/boost/lib64 -Wl,-R -Wl,/opt/boost/lib
 -Wl,-R -Wl,/opt/boost/lib64 -Wl,-rpath-link -Wl,/opt/boost/lib
 -Wl,-rpath-link -Wl,/opt/boost/lib64
 -o util/bin/file_piece_test.test/gcc-4.9.2/release/file_piece_test
 -Wl,--start-group
 util/bin/gcc-4.9.2/release/file_piece_test.o
 util/bin/gcc-4.9.2/release/parallel_read.o
 util/bin/gcc-4.9.2/release/read_compressed.o
 util/double-conversion/bin/gcc-4.9.2/release/cached-powers.o
 util/double-conversion/bin/gcc-4.9.2/release/double-conversion.o
 util/double-conversion/bin/gcc-4.9.2/release/diy-fp.o
 util/double-conversion/bin/gcc-4.9.2/release/fast-dtoa.o
 util/double-conversion/bin/gcc-4.9.2/release/bignum.o
 util/double-conversion/bin/gcc-4.9.2/release/bignum-dtoa.o
 util/double-conversion/bin/gcc-4.9.2/release/strtod.o
 util/double-conversion/bin/gcc-4.9.2/release/fixed-dtoa.o
 util/bin/gcc-4.9.2/release/bit_packing.o
 util/bin/gcc-4.9.2/release/ersatz_progress.o
 util/bin/gcc-4.9.2/release/exception.o
 util/bin/gcc-4.9.2/release/file.o
 util/bin/gcc-4.9.2/release/file_piece.o
 util/bin/gcc-4.9.2/release/mmap.o
 util/bin/gcc-4.9.2/release/murmur_hash.o
 util/bin/gcc-4.9.2/release/pool.o
 util/bin/gcc-4.9.2/release/scoped.o
 util/bin/gcc-4.9.2/release/string_piece.o
 util/bin/gcc-4.9.2/release/usage.o
 -Wl,-Bstatic  -Wl,-Bdynamic -lboost_unit_test_framework -llzma -lbz2 -lz
 -lrt -ldl -lboost_system -lboost_filesystem -Wl,--end-group
 
 
 /usr/lib/../lib64/crt1.o: In function `_start':
 (.text+0x20): undefined reference to `main'
 collect2: error: ld returned 1 exit status
 
 From: Kenneth Heafield moses@...
 Subject: Re: compilation problems
 Newsgroups: gmane.comp.nlp.moses.user
 Date: 2015-02-16 16:50:04 GMT
 Hi Matt,
 
  lmplz should be compiling anyway, despite the tests failing.  Try
 reverting this commit, which broke shared compilation for tests:
 
 commit d7f5bb41faaac5ca93b9cbb723ad558b2c67d3c2
 Author: Hieu Hoang hieuhoang@...
 Date:   Tue Jan 27 16:22:15 2015 +
 
 Regarding boost_filesystem we'll probably have to add that dependency
 since Boost doesn't really document which of their libraries depend on
 other libraries.
 
 Kenneth
 
 On Feb 16, 2015, at 11:42 AM, Matt Post p...@cs.jhu.edu wrote:
 
 Hi,
 
 I am running into a number of problems compiling Moses 3.0. I am
 using GCC 4.9.2 and a custom (correct) install of Boost 1.57.0.
 
 1. First, I had to add this:
 
 <library>boost_filesystem
 
 to line 174 of Jamroot (per this
 discussion: https://github.com/moses-smt/mosesdecoder/issues/89 )
 
 2. Things like

Re: [Moses-support] multiple LMs in moses

2013-06-07 Thread Matt Post
Sure thing, thanks!

moses.hiero2
Description: Binary data
On Jun 7, 2013, at 3:33 PM, Hieu Hoang hieu.ho...@ed.ac.uk wrote:

oops. Can you send me your ini file.

the format has recently been changed. There's a conversion routine from the old to new format but it probably fails for things i don't often see.

i'll document it shortly
On 7 June 2013 20:25, Kenneth Heafield heafi...@cmu.edu wrote:
Hi,

Moses does support multiple LMs. This looks like an issue with the recent changes to configuration, which is Hieu's area.

Kenneth

On 06/07/13 20:08, Matt Post wrote:

Hi Kenneth,

Can you tell me whether Moses supports multiple LMs, and how to do it? I have looked all over and can't find any documentation on this. See the error I get below.

I've tried just listing them all:

[lmodel-file]
8 0 5 /home/hltcoe/mpost/expts/gigaword.kenlm.v5
8 0 5 /home/hltcoe/mpost/expts/wmt12/data/monolingual/training-monolingual/lm.en.kenlm.v5
8 0 5 /home/hltcoe/mpost/expts/wmt13/runs/hiero/de-en/8/lm.kenlm

and then the weights:

[weight-l]
0.0326538592710674
0.126667925925625
0.0265181580850064

but then it dies with this:

line=KENLM factor=0 order=5 num-features=1 lazyken=0 path=/home/hltcoe/mpost/expts/gigaword.kenlm.v5
FeatureFunction: KENLM0 start: 0 end: 1
WEIGHT KENLM0=0.012,
line=KENLM factor=1 order=5 num-features=1 lazyken=0 path=/home/hltcoe/mpost/expts/wmt12/data/monolingual/training-monolingual/lm.en.kenlm.v5
FeatureFunction: KENLM1 start: 1 end: 2

WEIGHT KENLM1=
Check scores.size() == indexes.second - indexes.first failed in ./moses/ScoreComponentCollection.h:235
Aborted

matt



-- 
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] multiple LMs in moses

2013-06-07 Thread Matt Post
Wow, that's very different. Is Moses backwards-compatible with the old file 
format? And if so, is there a way to specify multiple LM weights in the old 
format?

Thanks,
Matt

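[Editor's note: for readers hitting the same error, the new-style moses.ini replaces the positional `[lmodel-file]` / `[weight-l]` sections with named feature and weight lines. A hedged sketch of how two of the LMs above might look in the new format — syntax from memory of the post-2013 Moses configuration, and the `LM0`/`LM1` names are illustrative, not required:]

```ini
[feature]
KENLM name=LM0 factor=0 order=5 path=/home/hltcoe/mpost/expts/gigaword.kenlm.v5
KENLM name=LM1 factor=0 order=5 path=/home/hltcoe/mpost/expts/wmt13/runs/hiero/de-en/8/lm.kenlm

[weight]
LM0= 0.0326538592710674
LM1= 0.0265181580850064
```

Each LM gets its own feature line and a matching named weight, so adding a third LM is just one more line in each section.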

On Jun 7, 2013, at 4:12 PM, Hieu Hoang hieu.ho...@ed.ac.uk wrote:

 try now, it was a simple fix
   
 https://github.com/moses-smt/mosesdecoder/commit/e8b1eb047cf067441a6a1a12316cffe027f35abb
 
 the attached file is your ini file, updated
 
 
 On 7 June 2013 21:06, Matt Post p...@cs.jhu.edu wrote:
 Hieu -- any update? Do you mind just telling me the format of the numbers 
 (assuming it's a simple change)?
 
 Thanks,
 matt
 
 
 On Jun 7, 2013, at 3:37 PM, Matt Post p...@cs.jhu.edu wrote:
 
 Sure thing, thanks!
 
 moses.hiero2
 
 On Jun 7, 2013, at 3:33 PM, Hieu Hoang hieu.ho...@ed.ac.uk wrote:
 
 oops. Can you send me your ini file.
 
 the format has recently been changed. There's a conversion routine from the 
 old to new format but it probably fails for things i don't often see.
 
 i'll document it shortly
 
 
 
 
 On 7 June 2013 20:25, Kenneth Heafield heafi...@cmu.edu wrote:
 Hi,
 
 Moses does support multiple LMs.  This looks like an issue with the 
 recent changes to configuration, which is Hieu's area.
 
 Kenneth
 
 On 06/07/13 20:08, Matt Post wrote:
 Hi Kenneth,
 
 Can you tell me whether Moses supports multiple LMs, and how to do it? I 
 have looked all over and can't find any documentation on this. See the 
 error I get below.
 
 I've tried just listing them all:
 
 [lmodel-file]
 8 0 5 /home/hltcoe/mpost/expts/gigaword.kenlm.v5
 8 0 5 
 /home/hltcoe/mpost/expts/wmt12/data/monolingual/training-monolingual/lm.en.kenlm.v5
 8 0 5 /home/hltcoe/mpost/expts/wmt13/runs/hiero/de-en/8/lm.kenlm
 
 and then the weights:
 
 [weight-l]
 0.0326538592710674
 0.126667925925625
 0.0265181580850064
 
 but then it dies with this:
 
 line=KENLM factor=0 order=5 num-features=1 lazyken=0 
 path=/home/hltcoe/mpost/expts/gigaword.kenlm.v5
 FeatureFunction: KENLM0 start: 0 end: 1
 WEIGHT KENLM0=0.012,
 line=KENLM factor=1 order=5 num-features=1 lazyken=0 
 path=/home/hltcoe/mpost/expts/wmt12/data/monolingual/training-monolingual/lm.en.kenlm.v5
 FeatureFunction: KENLM1 start: 1 end: 2
 
 WEIGHT KENLM1=
 Check scores.size() == indexes.second - indexes.first failed in 
 ./moses/ScoreComponentCollection.h:235
 Aborted
 
 matt
 
 
 
 
 
 -- 
 Hieu Hoang
 Research Associate
 University of Edinburgh
 http://www.hoang.co.uk/hieu
 
 
 
 
 
 
 -- 
 Hieu Hoang
 Research Associate
 University of Edinburgh
 http://www.hoang.co.uk/hieu
 
 moses.ini.new

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support