Hello!

> I don't think so, I think Mansur wants the tagger to disambiguate according
> to the context, but have it in line-by-line output, like TreeTagger or UDpipe

Fran, no, I don't think so; Kevin was right :) I think the tagger should
not disambiguate across lines. In a corpus, different lines are sometimes
taken from different texts, so each line should be completely independent
for the tagger.

By the way, I found an example of lines actually being merged:

һәм бу очракта "җиң сызганып" туры мәгънәдә 😉
Кибеткә бара идем.

^һәм/һәм<cnjcoo>$ ^бу/бу<prn><dem><nom>$ ^очракта/очрак<n><sg><sg><loc>$
^"/"<sent>$^җиң сызганып/җиң сызган<v><tv><gna_perf>$^"/"<sent>$
^туры/туры<adj>$ ^мәгънәдә/мәгънә<n><sg><sg><loc>$
�^Кибеткә/Кибет<n><sg><sg><dat>$ ^бара/бар<v><tv><pres><p3><sg>$
^идем/и<cop><ifi><p1><sg>$^./.<sent>$
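
One way to check for this kind of leakage is to build a reference output in
which every line is tagged in a separate process, and then diff it against
the normal output. Below is a minimal sketch, assuming the tagger can be run
as a command that reads stdin and writes stdout. The function name per_line
and the wrapper script ./tag.sh are hypothetical, not part of Apertium.

```shell
# per_line CMD [ARGS...]
# Reads stdin line by line and runs CMD once per line, so nothing can
# carry over from one line to the next. This is slow (one process start
# per line), so it is meant as a reference for comparison only.
per_line() {
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line" | "$@"
    done
}

# Hypothetical usage, where ./tag.sh wraps the tagging pipeline:
#   per_line ./tag.sh < corpus.txt > reference.txt
#   ./tag.sh < corpus.txt | diff - reference.txt
```

Any difference reported by diff would point at a line whose analysis
depended on its neighbours.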

Best!
Mansur


On Thu, 8 Nov 2018 at 23:05, saurabh dubey <sauvzi13...@gmail.com> wrote:

> Hello sir,
> I am a student from JIIT Noida, India. Currently, I'm working on deep
> learning, specifically on NLP (natural language processing) and NMT
> (neural machine translation).
> As your open source organization has been contributing to this field for a
> very long time, you could be a great mentor for me, and your guidance would
> be really valuable. I really want to work in this field and
> learn more.
> *I have a little knowledge of the field. I have already worked on a
> few small projects of my own, as listed below:*
> *-Sentiment analysis*
> *-Created a chatbot using a deep NLP model in TensorFlow and Python.*
> * - A few things I learned in the process:*
> *   1. Types of Natural Language Processing*
> *   2. Seq2Seq Architecture & Training*
> *   3. End to End Deep learning models*
> *   4. Beam search decoding.*
>
> I would love to learn and then contribute to Apertium.
>
> *There are some Ideas on which we can work:*
>
> * 1. A chatbot for your website for Q&A.*
> * 2. India has about 23 official languages, and I would love to work
> on any of them to extend your spectrum.*
> * 3. An additional toolbox with the following features:*
> *      -Copy*
> *      -Share*
> *      -Text-to-speech recognition.*
>
> Kindly assist me in this process, as I am really dedicated to and focused on
> this field and would love to assure you of my commitment.
> *I hope you acknowledge my efforts. *
>
> On Thu, Nov 8, 2018 at 7:39 PM Kevin Brubeck Unhammer <unham...@fsfe.org>
> wrote:
>
>> Francis Tyers <fty...@prompsit.com> wrote:
>>
>> [...]
>>
>> >>> That would be a good feature, but wouldn't get past the issue of the
>> >>> tagger/cg. E.g. if we do that then the tagger can't take into account
>> >>> context.
>> >>
>> >> Isn't that the whole point? (Ie. treat each line as completely
>> >> independent, no context.)
>> >
>> > I don't think so, I think Mansur wants the tagger to disambiguate
>> > according
>> > to the context, but have it in line-by-line output, like TreeTagger or
>> > UDpipe
>> > etc.
>>
>> Well, it's only lt-proc doing the moving, so just move the NUL-deletion
>> before cg-proc:
>>
>>    cat corpus.txt                     \
>>    | tr -d '\0'                       \
>>    | apertium-deshtml -n              \
>>    | sed 's/\[$/[][/; s/^]/]\x00/'    \
>>    | lt-proc -z -w 'tat.automorf.bin' \
>>    | tr -d '\0'                       \
>>    | cg-proc -z  'tat.rlx.bin'        \
>>    | cg-proc -z -w -1 'dev/mansur.bin' \
>>    | apertium-rehtml-noent
>>
>> Now only lt-proc should treat end-of-line as a stream delimiter.
>> _______________________________________________
>> Apertium-stuff mailing list
>> Apertium-stuff@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>
