Re: [Moses-support] Looking for a tool for training csv delimited and aligned data

Vincent Nguyen Wed, 26 Apr 2017 01:18:56 -0700

I think you mixed up input and output, because in your example at the end you would like to get the pronunciation of a given new word.

The input is the left-hand side and the output is the pronunciation.

If you are able to rework the right-hand side of your data a little (you need to separate the phones one by one, based on the unique ones),

then the tool you are looking for is this one: https://github.com/sequitur-g2p/sequitur-g2p
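
A minimal sketch of that rework, assuming the data is one `word=pronunciation` pair per line as in the quoted message below, and assuming the target is a lexicon with one word per line followed by its phones separated by whitespace (check the Sequitur README for the exact format it expects). The names `INVENTORY`, `split_phones` and `to_lexicon_line` are hypothetical, and the phone inventory shown is only a small illustration; a real one would have to be built from the unique phones in the full data set.

```python
# Hypothetical phone inventory: every unique phone in the data, including
# multi-character ones (long vowels, diphthongs). Illustrative only.
INVENTORY = {
    "n", "iː", "θ", "t", "w", "ɒ", "z", "eɪ", "b", "m",
    "ɪ", "k", "s", "l", "d", "ʊ", "ə", "ʃ", "ʌ", "e",
}

# Stress marks and hyphens are dropped, on the assumption that stress
# does not matter for this task.
IGNORED = "ˈˌ-"


def split_phones(ipa, inventory):
    """Split an IPA string into phones by greedy longest match."""
    cleaned = "".join(ch for ch in ipa if ch not in IGNORED)
    longest = max(len(phone) for phone in inventory)
    phones = []
    i = 0
    while i < len(cleaned):
        # Try the longest candidate first so "eɪ" wins over "e" + "ɪ".
        for size in range(min(longest, len(cleaned) - i), 0, -1):
            chunk = cleaned[i:i + size]
            if chunk in inventory:
                phones.append(chunk)
                i += size
                break
        else:
            raise ValueError(f"unknown symbol {cleaned[i]!r} in {ipa!r}")
    return phones


def to_lexicon_line(line, inventory):
    """Turn 'word=pronunciation' into 'word<TAB>p1 p2 p3 ...'."""
    word, ipa = line.strip().split("=", 1)
    return word + "\t" + " ".join(split_phones(ipa, inventory))


if __name__ == "__main__":
    for raw in ["ˈneath=niːθ", "A-bomb=ˈeɪ-bɒm", "ˈtwixt=twɪkst"]:
        print(to_lexicon_line(raw, INVENTORY))
```

Greedy longest match is one simple choice here; it fails loudly on any symbol missing from the inventory, which makes gaps in the phone set easy to find before training.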



On 26/04/2017 at 04:57, doc wrote:

Many thanks for your kind interest.

Basically, my aim was to find out whether a training tool exists which can train on data in the following format:

abc=def
where the right-hand side is the input and the left-hand side is the output.

I chose English to UK IPA as an example. I have 80,000+ strings of English to IPA. An example is given below:

ˈem=əm
ˈneath=niːθ
ˈshun=ʃʌn
ˈtwas=twɒz
ˈtwen=twiːn
ˈtwen-decks=ˈtwiːn-deks
ˈtwere=twɜːr
ˈtwill=twɪl
ˈtwixt=twɪkst
ˈtwould=twʊd
ˈun=ən
A=eiː
Aˈs=eɪz
A-bomb=ˈeɪ-bɒm
If I had an unknown word, say
superpose
the tool after training should be able to predict
suːpəˈpəʊz
or something like that. Stress does not matter.
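
As a rough illustration of how a prediction such as the one above could be scored when stress does not matter, here is a minimal sketch that strips the stress marks and computes a character-level error rate with a plain Levenshtein distance. The function names are hypothetical, and counting single characters rather than whole phones is a simplifying assumption.

```python
def strip_stress(ipa):
    """Remove primary and secondary IPA stress marks."""
    return ipa.replace("ˈ", "").replace("ˌ", "")


def edit_distance(a, b):
    """Levenshtein distance between two sequences (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (x != y),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def symbol_error_rate(predicted, reference):
    """Edit distance divided by reference length, ignoring stress."""
    pred, ref = strip_stress(predicted), strip_stress(reference)
    if not ref:
        raise ValueError("reference pronunciation is empty")
    return edit_distance(pred, ref) / len(ref)


if __name__ == "__main__":
    # A prediction without the stress mark counts as fully correct.
    print(symbol_error_rate("suːpəpəʊz", "suːpəˈpəʊz"))
```

Averaging this rate over a held-out part of the 80,000 pairs would give one simple measure of how well a trained model generalises to unknown words.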

I wonder if someone in the MOSES team could come up with a small tool which can be used for this kind of work. As a linguist, I can assure you that such a tool would be immensely popular and render great service to the community.

I need not add that I will download GIZApp and try it. I am trying out SciKit, but the results are not very encouraging.

Many thanks

On Wed, Apr 26, 2017 at 3:23 AM, Allen Smith <allen.w.smi...@gmail.com> wrote:


    Are you wanting to train something to align letters and sounds, or
    to figure out the sounds given the letters? (As it happens, I've
    been working on using GIZApp to do the former.)

    -Allen (Allen W. Smith, Ph.D.)

    On Tue, Apr 25, 2017 at 4:01 AM, doc <raymond.doc...@gmail.com> wrote:

        Hello,
        I am looking for a tool for training on data using either
        statistical methods or even a CNN/RNN.
        Basically, the tool would allow the user to train on simple data
        and then, once trained, it could be deployed to predict unknown
        data.
        As an example, I have around 80,000 words in English converted
        to IPA [text aligned in CSV format] and would like to train
        the tool on this data, to predict the IPA for new words.
        Using Moses is like using a surgeon's scalpel to saw wood.
        And since I work in a Windows environment, installing Moses is
        not very easy.
        In any case, a large number of linguists like me would prefer
        to have a tool that is easy to use. I am sure that if such a tool
        is made available, it will be one of the most popular tools.

        Thanks in advance for any help.



        _______________________________________________
        Moses-support mailing list
        Moses-support@mit.edu
        http://mailman.mit.edu/mailman/listinfo/moses-support





_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

