Re: [Moses-support] Moses-support Digest, Vol 95, Issue 8

2014-09-05 Thread Yusup Ashrap
Hi Tom,
Thanks for the reply.

1. Yes, I did run this with the moses binary. Here are the command and the result.



$ echo '我 要 去 学校' |
    /work/mosesdecoder/bin/moses -xml-input exclusive \
    -f /tmp/shared/moses-training/chuy_0826-03:04/model/moses.ini

I found that Chinese and Arabic characters were not displayed properly,
so I added my command to the last comment in the issue:
https://github.com/moses-smt/mosesdecoder/issues/73


\
FeatureFunction: UnknownWordPenalty0 start: 0 end: 0
line=WordPenalty
FeatureFunction: WordPenalty0 start: 1 end: 1
line=PhrasePenalty
FeatureFunction: PhrasePenalty0 start: 2 end: 2
line=PhraseDictionaryMemory name=TranslationModel0 table-limit=20
num-features=4
path=/tmp/shared/moses-training/chuy_0826-03:04/model/rule-table.gz
input-factor=0 output-factor=0
FeatureFunction: TranslationModel0 start: 3 end: 6
line=PhraseDictionaryMemory name=TranslationModel1 num-features=1
path=/tmp/shared/moses-training/chuy_0826-03:04/model/glue-grammar
input-factor=0 output-factor=0
FeatureFunction: TranslationModel1 start: 7 end: 7
line=SRILM name=LM0 factor=0 path=/tmp/shared/27uy.o5.lm order=5
FeatureFunction: LM0 start: 8 end: 8
Loading UnknownWordPenalty0
Loading WordPenalty0
Loading PhrasePenalty0
Loading LM0
/tmp/shared/27uy.o5.lm: line 3483: warning: non-zero probability for <unk> in closed-vocabulary LM
Loading TranslationModel0
Start loading text phrase table. Moses  format : [4.729] seconds
Reading /tmp/shared/moses-training/chuy_0826-03:04/model/rule-table.gz
5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100

Loading TranslationModel1
Start loading text phrase table. Moses  format : [760.530] seconds
Reading /tmp/shared/moses-training/chuy_0826-03:04/model/glue-grammar
5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100

max-chart-span: 20
max-chart-span: 1000
IO from STDOUT/STDIN
Created input-output object : [760.594] seconds
*Unknown input type: 3*
Name:moses  VmPeak:22480296 kB  VmRSS:22437712 kB
RSSMax:2245 kB  user:762.409sys:20.947  CPU:783.356
real:783.449
\


2. The text does not include non-printing LTR or RTL characters.
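
One way to check this is to scan the input for Unicode bidirectional control characters. The following is a minimal Python sketch, not part of Moses; the helper name find_bidi_controls and the exact character set (marks, embeddings, overrides, and isolates) are assumptions chosen for illustration.

```python
import unicodedata

# Unicode bidirectional control characters: marks, embeddings/overrides, isolates.
# This set is an assumption for illustration; extend it if your data needs more.
BIDI_CONTROLS = {
    "\u200e",  # LEFT-TO-RIGHT MARK
    "\u200f",  # RIGHT-TO-LEFT MARK
    "\u061c",  # ARABIC LETTER MARK
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # embeddings and overrides
    "\u2066", "\u2067", "\u2068", "\u2069",            # isolates
}


def find_bidi_controls(text):
    """Return (index, code point, name) for each bidi control character in text."""
    hits = []
    for index, ch in enumerate(text):
        if ch in BIDI_CONTROLS:
            name = unicodedata.name(ch, "UNKNOWN")
            hits.append((index, "U+%04X" % ord(ch), name))
    return hits


if __name__ == "__main__":
    # The sentence from this thread, as it appears in the archive.
    print(find_bidi_controls("我 要 去 学校"))
```

An empty list means none of the listed control characters are present in the string.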



Many Thanks.




> Message: 4
> Date: Fri, 05 Sep 2014 19:28:56 +0700
> From: Tom Hoar 
> Subject: Re: [Moses-support] mosesserver -xml-input exclusive
> segmentation fault
> To: moses-support@mit.edu
> Message-ID: <5409ac88.1050...@precisiontranslationtools.com>
> Content-Type: text/plain; charset="utf-8"
>
> Have you run this through the standard moses binary to see if it's a
> mosesserver or shared xml-input problem?
>
> Also, you're forcing Chinese characters to Arabic/Persian script. Does
> your text include the non-printing left-to-right and right-to-left
> characters?
>
>
>
>
>
> On 09/05/2014 06:54 PM, Yusup Ashrap wrote:
> > Hi All,
> >
> > I am having problems using mosesserver with the -xml-input flag.
> > mosesserver works fine on input without tagged words, but with tagged
> > words it quits with a segmentation fault.
> >
> > I added a detailed issue on GitHub:
> > https://github.com/moses-smt/mosesdecoder/issues/73
> >
> > I started the server using the following command:
> > #./mosesserver -xml-input exclusive -f  moses.ini
> >
> > and I invoked the translation service with my Python XML-RPC code.
> >
> > |  '我 要 去  学校'|
> >
> > After this, mosesserver quit with a segmentation fault and without any
> > detailed stack trace.
> >
> > The error is just one line, and I could not find any related information
> > on Google or the mailing list.
> >
> > Could you tell me what might have gone wrong, or what I need to do to
> > make this work?
> >
> >
>
>
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] Incremental retraining

2014-09-05 Thread Sandipan Dandapat
Hi,
During incremental retraining, I specified the following line in moses.ini:
PhraseDictionaryBitextSampling name=PT0 output-factor=0 num-features=9
path=/home/sandipan/inc_retrain/MT_sys/EnPl/mtdata_pro/train. L1=en L2=pl
pfwd=g pbwd=g smooth=0 sample=1000 workers=1

This generates the error:
Feature function PT0 specified 9 dense scores or weights. Actually has 0.

Changing num-features to '0' resolves this, but then produces the error
below:

Exception: moses/TranslationModel/UG/mmsapt.cpp:381 in virtual void
Moses::Mmsapt::Load() threw util::Exception because
`this->m_feature_names.size() != this->m_numScoreComponents'.
At moses/TranslationModel/UG/mmsapt.cpp:381: number of feature values
provided by Phrase table (7) does not match number specified in Moses
config file (0)!
Setting num-features to 7 does not help either.

I have also tried:
Mmsapt name=PT0 output-factor=0 num-features=0
base=/home/sandipan/inc_retrain/MT_sys/EnPl/mtdata_pro/train. L1=en L2=pl

but that does not work either.
What do I need to do at this stage of retraining with Moses?

Thanks and regards,
sandipan


Re: [Moses-support] mosesserver -xml-input exclusive segmentation fault

2014-09-05 Thread Tom Hoar
Have you run this through the standard moses binary to see if it's a 
mosesserver or shared xml-input problem?


Also, you're forcing Chinese characters to Arabic/Persian script. Does 
your text include the non-printing left-to-right and right-to-left 
characters?






On 09/05/2014 06:54 PM, Yusup Ashrap wrote:

> Hi All,
>
> I am having problems using mosesserver with the -xml-input flag.
> mosesserver works fine on input without tagged words, but with tagged
> words it quits with a segmentation fault.
>
> I added a detailed issue on GitHub:
> https://github.com/moses-smt/mosesdecoder/issues/73
>
> I started the server using the following command:
> #./mosesserver -xml-input exclusive -f  moses.ini
>
> and I invoked the translation service with my Python XML-RPC code.
>
> |  '我 要 去  学校'|
>
> After this, mosesserver quit with a segmentation fault and without any
> detailed stack trace.
>
> The error is just one line, and I could not find any related information
> on Google or the mailing list.
>
> Could you tell me what might have gone wrong, or what I need to do to
> make this work?
>
> --
> Best Regards
> Yusup Ashrap





[Moses-support] mosesserver -xml-input exclusive segmentation fault

2014-09-05 Thread Yusup Ashrap
Hi All,

I am having problems using mosesserver with the -xml-input flag.
mosesserver works fine on input without tagged words, but with tagged words
it quits with a segmentation fault.

I added a detailed issue on GitHub:
https://github.com/moses-smt/mosesdecoder/issues/73

I started the server using the following command:
#./mosesserver -xml-input exclusive -f  moses.ini

and I invoked the translation service with my Python XML-RPC code.



 '我 要 去  学校'


After this, mosesserver quit with a segmentation fault and without any
detailed stack trace.

The error is just one line, and I could not find any related information
on Google or the mailing list.

Could you tell me what might have gone wrong, or what I need to do to make
this work?
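
For reference, the kind of XML-RPC call described above can be sketched as follows. This is not the code from the original message, which did not survive in the archive. It is a minimal Python 3 sketch that assumes mosesserver is listening on its default port 8080 at /RPC2 and exposes a translate method taking a struct with a "text" field. The tagged sentence is a hypothetical example in the style of the Moses manual, and the helper names are made up for illustration.

```python
import xmlrpc.client

# Assumed endpoint: mosesserver on its default port 8080, path /RPC2.
SERVER_URL = "http://localhost:8080/RPC2"


def build_params(text):
    """Build the struct passed to the server's translate method."""
    return {"text": text}


def encode_request(text):
    """Serialize the call without sending it, to inspect what goes on the wire."""
    return xmlrpc.client.dumps((build_params(text),), methodname="translate")


def translate(text, url=SERVER_URL):
    """Send one sentence to a running mosesserver and return the output text."""
    proxy = xmlrpc.client.ServerProxy(url)
    result = proxy.translate(build_params(text))
    return result["text"]


if __name__ == "__main__":
    # Hypothetical tagged input; the tag name and translation are illustrative.
    tagged = 'das ist <np translation="a cute place">ein kleines haus</np>'
    print(encode_request(tagged))
    # print(translate(tagged))  # uncomment when a server is running
```

Printing the encoded request is a quick way to confirm that the markup is escaped inside the XML-RPC payload and reaches the server as literal text.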






-- 
Best Regards
Yusup Ashrap


[Moses-support] Parallel FDA5 WMT'14 Datasets

2014-09-05 Thread Ergun Bicici
Parallel FDA5 WMT'14 Datasets

Dear moses-list,

We are making available, for research purposes, the English, Czech, French,
German, and Russian datasets we used to build the Parallel FDA5 Moses SMT
systems:
https://github.com/bicici/ParFDA5WMT

Results are presented in the citation provided below.

Citation:

Ergun Biçici, Qun Liu, and Andy Way. Parallel FDA5 for Fast Deployment of
Accurate Statistical Machine Translation Systems. In Proceedings of the
Ninth Workshop on Statistical Machine Translation, Baltimore, USA, June
2014. Association for Computational Linguistics.


The datasets and the SMT results can serve as a benchmark for SMT research
where further linguistic processing can be performed to see whether the
results can be improved. The datasets allow fast deployment of accurate SMT
systems and can be used for benchmarking the performance of SMT systems.

Language models were built using SRILM
(http://www.speech.sri.com/projects/srilm/). The language model corpora
contain 15M sentences, some of which were selected from the LDC Gigaword
corpora by the Parallel FDA5 algorithm:

[4 use the LDC English Gigaword 5th edition]
- Czech - English: 2.13 million sentences from LDC English Gigaword, ~1.69%
- French - English: 2.49 million sentences from LDC English Gigaword, ~1.97%
- German - English: 2.57 million sentences from LDC English Gigaword, ~2.03%
- Russian - English: 3.34 million sentences from LDC English Gigaword, ~2.64%

[1 use the LDC French Gigaword 3rd edition]
- English - French: 0.47 million sentences from LDC French Gigaword, ~1.93%
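
As a rough consistency check on these figures, the corpus size implied by each line can be computed from the sentence count and the percentage. This is a small Python sketch under the assumption that each percentage is the share of the corresponding Gigaword corpus that was selected; the message does not state this explicitly, and the dictionary keys are labels made up for illustration.

```python
# (selected sentences in millions, reported percentage), copied from the list above.
SELECTIONS = {
    "cs-en": (2.13, 1.69),
    "fr-en": (2.49, 1.97),
    "de-en": (2.57, 2.03),
    "ru-en": (3.34, 2.64),
    "en-fr": (0.47, 1.93),
}


def implied_total_millions(selected_millions, percent):
    """Corpus size in millions implied by a selected count and its percentage."""
    return selected_millions / (percent / 100.0)


if __name__ == "__main__":
    for pair, (selected, percent) in SELECTIONS.items():
        total = implied_total_millions(selected, percent)
        print("%s: about %.1f million sentences implied" % (pair, total))
```

Under that assumption, the four English Gigaword lines all imply a total of roughly 126 million sentences, and the French line implies roughly 24 million.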


Work using the datasets:

- Ergun Biçici, Qun Liu, and Andy Way. Parallel FDA5 for Fast Deployment of
Accurate Statistical Machine Translation Systems. In Proceedings of the
Ninth Workshop on Statistical Machine Translation, Baltimore, USA, June
2014. Association for Computational Linguistics.
- Ergun Biçici (contributor), “Quality Estimation for Extending Good
Translations”, QTLaunchPad Deliverable:
http://www.qt21.eu/launchpad/deliverable/quality-estimation-extending-good-translations
- Ergun Biçici, High Quality Machine Translation with ITERPE, 2014. Note:
Dublin City University Invention Disclosure.


LICENSE:
CNGL License for Open Data allowing use for research and academic purposes.


Best Regards,
Ergun

Ergun Biçici, CNGL, School of Computing, DCU, www.cngl.ie, Phone:
+353-1-700-6711
http://www.computing.dcu.ie/~ebicici/


[Moses-support] Post-doctoral Researcher Job Advertisement

2014-09-05 Thread Ergun Bicici
We have a post-doctoral researcher position for the following project:

   - *Monolingual and Bilingual Text Quality Judgments with Translation
   Performance Prediction, 2014-2015*
   
   SFI project funded by Technology Innovation Development Award (TIDA).

The position is for 6 months. The salary is 37,750 euros per year. The
application deadline is 19 September 2014.

http://www.dcu.ie/sites/default/files/hr/Postdoctoral%20Researcher%20CNGL.pdf


Regards,
Ergun

Ergun Bicici