[Moses-support] Fwd: TweetMT @ SEPLN 2015

2015-03-20 Thread Cristina
Apologies for multiple postings
TweetMT 2015: Tweet Translation Workshop at SEPLN 2015

TweetMT is a workshop and shared task on machine translation applied to
tweets. It will take place in September, 2015, in Alicante, co-located with
SEPLN 2015 (to be confirmed). The objective of the task is to bring
together interested researchers to join forces to experiment with and
compare different approaches to tweet MT. This workshop is a follow-up to
two other workshops organized previously also at SEPLN: TweetNorm2013 and
TweetLID2014.

The machine translation of tweets is a complex task that greatly depends on
the type of data we work with. Translating tweets is very different from
translating well-edited text posted, for instance, through a content
manager. Tweets are often written on mobile devices, which exacerbates
poor spelling, and they contain errors, symbols and diacritics. The texts
also vary in structure and include tweet-specific features such as
hashtags, user mentions, and retweets, among others. The translation of
tweets can be tackled as a direct translation (tweet-to-tweet) or as an
indirect translation: tweet normalization to standard text (Kaufmann &
Kalita, 2011), text translation and, if needed, tweet generation. Although
the first approach looks attractive, the lack of parallel or comparable
tweets for the working languages (Petrovic et al., 2010) tends to lead us
towards an indirect approach. Some authors also try to gather similar
tweets in other languages using cross-language information retrieval
(CLIR).
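
As a rough illustration of the indirect route, the sketch below masks
tweet-specific tokens before normalization and translation and restores them
afterwards. It is only a sketch under stated assumptions: the placeholder
scheme, the function names and the `normalize` and `translate` callables are
hypothetical and are not part of the shared task or of any particular system.

```python
import re

# Tweet-specific tokens (URLs, @mentions, #hashtags) that we assume should
# pass through translation untouched. The pattern is a simplification.
SPECIAL = re.compile(r"(https?://\S+|[@#]\w+)")


def mask(tweet):
    """Replace URLs, mentions and hashtags with numbered placeholders."""
    kept = []

    def repl(match):
        kept.append(match.group(0))
        return "__TOK%d__" % (len(kept) - 1)

    return SPECIAL.sub(repl, tweet), kept


def unmask(text, kept):
    """Put the original tokens back in place of the placeholders."""
    return re.sub(r"__TOK(\d+)__", lambda m: kept[int(m.group(1))], text)


def translate_indirect(tweet, normalize, translate):
    """Indirect route: mask, normalize, translate, then restore tokens.

    `normalize` and `translate` are caller-supplied functions (hypothetical
    stand-ins for a normalizer and an MT system).
    """
    masked, kept = mask(tweet)
    return unmask(translate(normalize(masked)), kept)
```

Any normalizer or MT system plugged in here would have to leave the
placeholders intact, which is an assumption a real pipeline would need to
enforce.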

Work in this area is scarce in the literature but a growing interest is
evident (Gotti et al., 2013). An important point of reference is the work
done to translate SMS texts during the Haiti earthquake (Munro, 2010).

The current task will focus on MT of tweets between languages of the
Iberian Peninsula (Basque, Catalan, Galician, Portuguese and Spanish), as
well as English. The organizing committee will release development data
including parallel tweets that will enable participants to train their
systems. For the final evaluation participants will have to submit the
automatic translation of a number of tweet corpora in a short period of
time. The evaluation will be carried out using automatic distances to the
reference corpora.

These corpora are not meant to be representative of all types of messages
that can be observed in informal communication. This is instead an initial
attempt at tackling part of the task which starts by addressing one of its
simplest parts. We are planning to use more informal and varied corpora in
future tasks as we make progress on these initial issues.

The workshop aims to be a forum where researchers will have a chance to
compare their methods, systems and results.
Important dates

   - *March 1*: Registration opened
   - *April 17*: Release of the development set
   - *May 12*: Registration deadline
   - *May 19*: Release of the test set
   - *May 21*: Result submission deadline
   - *May 22-June 12*: Manual evaluation. Publication of results
   - *July 3*: Short paper submission deadline
   - *July 31*: Camera-ready version of papers
   - *September 14 or 15*: Workshop

Organizing Committee
Iñaki Alegria (UPV/EHU)
Nora Aranberri (UPV/EHU)
Cristina España-Bonet (UPC)
Pablo Gamallo (USC)
Eva Martínez (UPC)
Hugo Oliveira (Universidade de Coimbra)
Iñaki San Vicente (Elhuyar)
Antonio Toral (DCU, Dublin)
Arkaitz Zubiaga (University of Warwick)
Proceedings
The papers of the workshop will be published in the proceedings of the “XXXI
Congreso de la Sociedad Española de Procesamiento de lenguaje natural”.
The workshop proceedings will also be published through the CEUR Workshop
Proceedings digital publication service.
Additional information
http://komunitatea.elhuyar.org/tweetmt
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Problem

2015-03-20 Thread Hieu Hoang
The xmlrpc-c problems have been solved, so please run git pull and then run
bjam again.


On 20/03/2015 13:32, Hieu Hoang wrote:

There's a problem with compiling with xmlrpc-c at the moment.
https://github.com/moses-smt/mosesdecoder/issues/99
It's being looked at, but in the meantime, try compiling without xmlrpc-c

On 20/03/2015 03:31, qinmaoyuan wrote:

Hi everyone,

I am new to mosesdecoder and have run into a problem.

1. I installed Ubuntu Kylin 14.04.02 in VMware Workstation, along with the
following packages:


g++
git
subversion
automake
libtool
zlib1g-dev
libboost-all-dev
libbz2-dev
liblzma-dev
python-dev
libtcmalloc-minimal4

2. I then installed Boost from boost_1_56_0.tar.gz, which I downloaded myself
and unpacked into my own directory. It installed successfully.


Commands:
cd boost_1_55_0/
./bootstrap.sh
./b2 -j2 --prefix=$PWD --libdir=$PWD/lib64 --layout=tagged \
    link=static threading=multi,single install || echo FAILURE


3. Install xmlrpc-c
I downloaded the source instead of using apt-get.
Commands:

wget http://svn.code.sf.net/p/xmlrpc-c/code
REPOS=http://svn.code.sf.net/p/xmlrpc-c/code/stable
svn checkout $REPOS xmlrpc-c
./configure --prefix=/usr/local/lib/xml-rpc
make
make install

4. Install mosesdecoder

I used git to download mosesdecoder from GitHub, with the following commands:

git clone https://github.com/moses-smt/mosesdecoder.git

cd /usr/local/lib/mosesdecoder/

./bjam --with-xmlrpc-c=/usr/local/lib/xml-rpc

Here, the build always fails with an error.

Console output:








Please help me.


Kind regards,

Qin














Re: [Moses-support] Chinese segmentation/tokenization

2015-03-20 Thread Jeremy Gwinnup
We’ve had reasonable luck with the Stanford Chinese segmenter - I think the ctb 
model did better than the pku one for our use case

 On Fri, 20 Mar 2015 13:19:02 +0100, Marcin Junczys-Dowmunt junc...@amu.edu.pl wrote:
 
 
 
 Hi, 
 
 questions appear from time to time on the list concerning Chinese
 segmentation/tokenization. I saw Barry mention Lingpipe and other tools.
 Is there a favourite tool you guys prefer to use over others? 
 
 Thanks, 
 
 Marcin 




Re: [Moses-support] Problem

2015-03-20 Thread Hieu Hoang

There's a problem with compiling with xmlrpc-c at the moment.
https://github.com/moses-smt/mosesdecoder/issues/99
It's being looked at, but in the meantime, try compiling without xmlrpc-c



Re: [Moses-support] Chinese segmentation/tokenization

2015-03-20 Thread Венцислав Жечев (Ventsislav Zhechev)
Hi Marcin,

At Autodesk we’ve been successfully using KyTea since 2011. The main reason we 
chose this specific tool is that it has readily available models for both 
Chinese and Japanese, which simplified the integration in our workflows.
At least for Japanese, we also evaluated Mecab in 2011, but found KyTea to 
serve us better.

Keep in mind, though, that we are not very interested in the quality of the 
segmentation per se; instead we need the MT to be of sufficient quality, 
regardless of whether the segmentation tool's output makes sense on its own.


Cheers,

Ventzi

–––
Dr. Ventsislav Zhechev
Computational Linguist, Certified ScrumMaster®
Platform Architecture and Technologies
Localisation Services

MAIN +41 32 723 91 22
FAX +41 32 723 93 99

http://VentsislavZhechev.eu

Autodesk, Inc.
Rue de Puits-Godet 6
2000 Neuchâtel, Switzerland
www.autodesk.com






[Moses-support] CALL FOR BOOK PROPOSALS

2015-03-20 Thread Rohit Gupta
[apologies for cross-posting]



CALL FOR BOOK PROPOSALS

John Benjamins' NATURAL LANGUAGE PROCESSING Book Series invites new book
proposals to respond to the growing demand for Natural Language Processing
(NLP) literature. Three general types of books are considered for
publication:

--
MONOGRAPHS
--
- original, leading and cutting-edge research (the monograph could be based
on an outstanding PhD thesis)
- surveys of the state of the art in specific NLP tasks or applications

--
COLLECTIONS
--
- books focusing on a particular NLP area (e.g. emerging from successful
NLP workshops or as a result of editors’ calls for papers)
- books which include papers covering a wide range of topics (e.g. emerging
from competitive NLP conferences or as a result of proposals for books of
the type Reading In NLP)

-
COURSE BOOKS
-
- general NLP course books
- books on a particular key area of NLP (e.g. Speech Processing,
Computational Syntax/Parsing)



Authors are encouraged to append supplementary materials such as
demonstration programs, NLP software, corpora and so on if applicable, and
to indicate websites and computational language resources where
appropriate. This call invites proposals from potential authors of the
types of books described above. Proposals on any topic related to Natural
Language Processing are welcome.

Interested authors should submit proposals by email (plain text or pdf
files) to the series editor:
Prof. Dr. Ruslan Mitkov
Email r.mit...@wlv.ac.uk
with a copy to Emma Franklin (emma.frank...@wlv.ac.uk), the series
editorial assistant.

The proposals should include an outline of the book (1-2 pages), a
preliminary table of contents, the target readership, related publications,
how the book will differ from other similar books in the area (if
applicable), time-scale and information about the prospective author
(relevant experience in the field, publications etc.).

Each proposal will be reviewed by members of the advisory board or
additional reviewers.



For more information on the series, visit:
https://benjamins.com/#catalog/books/nlp/main






*Rohit Gupta*

*Marie Curie Early Stage Researcher, EXPERT Project*
Research Group in Computational Linguistics
Research Institute of Information and Language Processing
University of Wolverhampton


[Moses-support] Chinese segmentation/tokenization

2015-03-20 Thread Marcin Junczys-Dowmunt
 

Hi, 

questions appear from time to time on the list concerning Chinese
segmentation/tokenization. I saw Barry mention Lingpipe and other tools.
Is there a favourite tool you guys prefer to use over others? 

Thanks, 

Marcin 


Re: [Moses-support] Chinese segmentation/tokenization

2015-03-20 Thread Tom Hoar
We also use the Stanford Segmenter most of the time, but have also used 
many others. Surprisingly, LDC's manseg also gives very good results with 
SMT, and it's much faster to load than Stanford's.


As Ventzi says, a segmenter's absolute accuracy relative to a human's 
idea of what counts as a word is not the most important factor when 
using it as a tokenizer for SMT. It's much more important for the tool 
to give consistent co-occurrence results relative to the paired 
language tokens. In from-ZH environments, the segmented/tokenized form 
is never seen by humans. In to-ZH environments, the 
recaser/detokenizer method(s) can actually repair errors and restore 
the string to what it should be.
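
To make the point concrete, here is a toy sketch (not the Stanford Segmenter,
manseg or KyTea): a greedy forward maximum-matching segmenter with a made-up
vocabulary. The function names, the vocabularies and the example string are
assumptions chosen for illustration only. Two different vocabularies give two
different segmentations, yet detokenizing either one restores the same string,
so what matters for SMT is that the same deterministic segmentation is applied
to all data.

```python
def segment(text, vocab, max_len=4):
    """Greedy forward maximum matching.

    At each position take the longest substring (up to max_len characters)
    found in vocab; fall back to a single character if nothing matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def detokenize(tokens):
    """Undo the segmentation by joining tokens without spaces."""
    return "".join(tokens)


# Two hypothetical vocabularies that disagree on word boundaries.
coarse = {"机器翻译", "系统"}
fine = {"机器", "翻译", "系统"}

sentence = "机器翻译系统"
print(segment(sentence, coarse))
print(segment(sentence, fine))
print(detokenize(segment(sentence, fine)) == sentence)
```

This detokenizer is deliberately naive: text that mixes Chinese with
space-separated Latin words would need more care.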


@Ventzi, thanks for mentioning KyTea. We'll test and compare.

Tom





Re: [Moses-support] Chinese segmentation/tokenization

2015-03-20 Thread Marcin Junczys-Dowmunt

Hi all,
thank you all for the tips. I am going with Stanford then.

I am currently producing a language model from Christian's raw 
Chinese CommonCrawl data (www.statmt.org/ngrams). Once I am done I will 
be happy to share it back.

Best,
Marcin



[Moses-support] Translator Model Parameter Clarification

2015-03-20 Thread Jer Yango
Hi there,

As seen in the training reference on the website, the --lm parameter
accepts three inputs: factor, order, and filename.


   - --lm -- language model: factor:order:filename (option can be
   repeated)

But the baseline system uses 4 inputs:

-lm 0:3:$HOME/lm/news-commentary-v8.fr-en.blm.en:8


I would like to know what the factor input is, and what the fourth input
means. Thanks!


Re: [Moses-support] Forbidden link to binaries

2015-03-20 Thread Per Tunedal
Hi Nick, Thank you! Yours, Per Tunedal


On Thu, Mar 19, 2015, at 15:46, Nikolay Bogoychev wrote:
 Hey Per,

 The link seems to be outdated, as it points to RELEASE-1.0. You can
 find the current ones here:
 http://www.statmt.org/moses/RELEASE-3.0/binaries/

 Cheers,

 Nick

 On Thu, Mar 19, 2015 at 2:22 PM, Per Tunedal
 per.tune...@operamail.com wrote:
 Hi,

 I just read the page http://www.statmt.org/moses/?n=Moses.Releases and
 tried the link to the binaries:

 All the binary executables are made available for download for users who
 do not wish to compile their own version.

 Clicking on download gets me to the page
 http://www.statmt.org/moses/RELEASE-1.0/binaries/
 showing the message:

 Forbidden

 You don't have permission to access /moses/RELEASE-1.0/binaries/ on this
 server.

 Yours,
 Per Tunedal



Re: [Moses-support] LMs for factors unused make the decoder fail

2015-03-20 Thread Hieu Hoang
Your output only has factors 0 and 2, so an LM over factor 1 or 3 will result
in a segfault.
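
A small pre-flight check along these lines could catch the mismatch before
the decoder is started. This is only a sketch: the function names are
hypothetical, it looks at the translation factor mapping alone (output factors
produced by generation steps are ignored), and it assumes several mappings may
be joined with "+".

```python
def output_factors(translation_factors):
    """Collect the output factors of a mapping such as '0,2-0,2'.

    Each mapping has the form 'input-output'; several mappings are assumed
    to be joined with '+', e.g. '1-1+3-2'.
    """
    factors = set()
    for mapping in translation_factors.split("+"):
        _source, target = mapping.split("-")
        factors.update(int(f) for f in target.split(","))
    return factors


def unusable_lms(translation_factors, lm_factors):
    """Return the LM factors that the translation step never produces."""
    available = output_factors(translation_factors)
    return [f for f in lm_factors if f not in available]


# The setup from this thread: factors 0 and 2 are produced, so LMs over
# factors 1 and 3 have nothing to score.
print(unusable_lms("0,2-0,2", [0, 1, 2, 3]))
```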

Hieu Hoang
Research Associate (until March 2015)
University of Edinburgh
http://www.hoang.co.uk/hieu

On 19 March 2015 at 08:01, Stanislav Kuřík standa.ku...@gmail.com wrote:

 Hello,

 when I train a model with 0,2-0,2 translation factors and also attach
 LMs for a different factor (1 or 3 in this case), running the decoder
 fails. Commenting these other LMs out in the INI file fixes this.

 It's not a critical issue, it just strikes me as odd that LMs which
 should not be used in the decoding process at all (yes, they are loaded,
 but they should not be consulted at all, if I am correct) make it fail.

 Regards,
 Stanislav K.



Re: [Moses-support] Translator Model Parameter Clarification

2015-03-20 Thread Hieu Hoang
The 4th input (8) is the LM implementation you would like the decoder to
use: 0=SRILM, 1=IRSTLM, 8=KENLM.

The factor (0) is the factor in the output sentence you want the LM to use. If
you don't use factors, then it's always 0.
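
For illustration, the four fields can be pulled apart as below. This is a
sketch and not Moses code: the function and type names are hypothetical, only
the three implementation codes named above are mapped, and a missing fourth
field is simply reported as None rather than guessing a default. It also
assumes the filename itself contains no colon.

```python
from collections import namedtuple

# Implementation codes as listed in this thread.
LM_IMPLEMENTATIONS = {0: "SRILM", 1: "IRSTLM", 8: "KENLM"}

LMSpec = namedtuple("LMSpec", "factor order filename implementation")


def parse_lm_spec(spec):
    """Split 'factor:order:filename[:implementation]' into named fields."""
    parts = spec.split(":")
    if len(parts) not in (3, 4):
        raise ValueError("expected 3 or 4 colon-separated fields: %r" % spec)
    factor, order, filename = int(parts[0]), int(parts[1]), parts[2]
    implementation = None
    if len(parts) == 4:
        code = int(parts[3])
        implementation = LM_IMPLEMENTATIONS.get(code, "unknown (%d)" % code)
    return LMSpec(factor, order, filename, implementation)


# The baseline example, with a made-up home directory in place of $HOME.
print(parse_lm_spec("0:3:/home/user/lm/news-commentary-v8.fr-en.blm.en:8"))
```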

Hieu Hoang
Research Associate (until March 2015)
University of Edinburgh
http://www.hoang.co.uk/hieu
