Hi Per,
the script that prints that warning message is the tokeniser provided with
Moses (you can find it here:
https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/tokenizer.perl).
The abbreviation lists live in a sub-folder called *nonbreaking_prefixes*,
which holds one file per language. This sub-folder is also provided with
Moses:
https://github.com/moses-smt/mosesdecoder/tree/master/scripts/share/nonbreaking_prefixes.
I don't know how to create these files automatically, but if you look at
that sub-folder in the Moses repository, you will see that there is already
one for Swedish (
https://github.com/moses-smt/mosesdecoder/blob/master/scripts/share/nonbreaking_prefixes/nonbreaking_prefix.sv),
so you only have to download it and put it in the corresponding sub-folder
on your system.
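
A rough sketch of that download step, in case it helps. I have not run this
on your setup, so treat it as an assumption-laden example: BASE_URL assumes
that GitHub's raw-file host serves the same repository path as the links
above, prefix_url and install_prefix are helper names I made up, and the
target directory is a placeholder, because I don't know where your
installation keeps its copy of the tokeniser.

```shell
#!/bin/sh
# Sketch only: fetch a Moses non-breaking prefix file for one language.
# BASE_URL is an assumption: the raw-file form of the repository path above.
BASE_URL="https://raw.githubusercontent.com/moses-smt/mosesdecoder/master/scripts/share/nonbreaking_prefixes"

# Print the assumed download URL for a language code such as "sv".
prefix_url() {
    printf '%s/nonbreaking_prefix.%s\n' "$BASE_URL" "$1"
}

# Hypothetical helper: download the file for language $1 into directory $2.
install_prefix() {
    mkdir -p "$2" && curl -fsSL -o "$2/nonbreaking_prefix.$1" "$(prefix_url "$1")"
}

# Usage (the directory is a placeholder; use the folder that already holds
# nonbreaking_prefix.en on your system, which you can locate with find):
#   find / -name 'nonbreaking_prefix.en' 2>/dev/null
#   install_prefix sv /path/to/nonbreaking_prefixes
```

Putting the Swedish file next to the existing English one is the safest
choice, since that is evidently the folder the tokeniser falls back to.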
Best,
Miquel.
2014-03-03 12:59 GMT+01:00 Per Tunedal <per.tune...@operamail.com>:
> Hi again Miquel,
> I've manually replaced the variables and the script bitextor-builddics.sh
> works like a charm!
>
> I've got a complaint about a missing list of Swedish abbreviations though:
>
> TOKENISING THE CORPUS...
> WARNING: No known abbreviations for language 'sv', attempting fall-back to
> English version...
>
> Where do I find those lists of abbreviations (what program, what folder)?
> It would be quite easy for me to supply such a list, as I've already done
> so for Apertium-sv-da and for bligner.py.
>
> Yours,
> Per Tunedal
>
> On Thu, Feb 20, 2014, at 19:48, Miquel Esplà wrote:
>
> Well, of course you can try to replace the variables manually with paths
> (as I told you, you have to replace the variables starting and ending with
> __). I don't think I can help you much more because I have never done this,
> but I'm sure that with a bit of patience you will manage ;) Good luck!
>
> Cheers,
>
> Miquel.
>
> ---snip---
>
>
> > >
> > > I'm sorry, I didn't explain it well: as I said, bitextor-builddics.in
> > > is only the template of the script. What I didn't say is that you need
> > > to compile the project to get the actual script. If you look at the
> > > code of the template, you will see that there are many variables
> > > starting and ending with "__" (such as __PREFFIX__). These variables
> > > are replaced by the corresponding paths at compilation time. So, to use
> > > the script, you have to download the whole trunk directory and then run:
> > > ./autogen.sh
> > > ./configure
> > > make
> > > make install
> > >
> > > As you know, you can use the option --prefix=LOCALDIR when running
> > > ./configure to install bitextor in a specific path (for example,
> > > LOCALDIR could be /home/per/local/).
> > >
> > > Best,
> > >
> > > Miquel.
> > >
> > >
> > >
> > > Yours,
> > > Per Tunedal
> > >
> > > On Tue, Feb 18, 2014, at 12:38, Miquel Esplà wrote:
> > >
> > > Hi Per,
> > >
> > > I think that the explanation in this website:
> > > http://rali.iro.umontreal.ca/rali/?q=en/node/1325 is quite useful. It
> > > helps a lot to understand the structure and the content of each file
> > > generated by OmegaT.
> > >
> > > About the script, in the last release of bitextor we included a script
> > > called "bitextor-builddics" (you can find the template of this script
> > > here: https://svn.code.sf.net/p/bitextor/code/trunk/bitextor-builddics.in)
> > > which uses GIZA++ to obtain a plain-text bilingual dictionary, but only
> > > including pairs of words that fulfil two conditions: a) both words occur
> > > at least 10 times in the corpus, and b) the harmonic mean of their
> > > probabilities in both probabilistic dictionaries (S -> T and T -> S) is
> > > higher than 0.2. If you want to use this, I recommend using the version
> > > in the trunk, which fixes some minor bugs still present in the release.
> > >
> > > Best,
> > >
> > > Miquel.
> > >--snip---
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff