Hello!

> I don't think so, I think Mansur wants the tagger to disambiguate
> according to the context, but have it in line-by-line output, like
> TreeTagger or UDpipe
Fran, no, I don't think so; Kevin was right :) I think the tagger should not
disambiguate across lines. In the corpus, different lines are sometimes taken
from different texts, so lines should be completely independent for the tagger.

By the way, I found an example of actual line merging:

һәм бу очракта "җиң сызганып" туры мәгънәдә 😉
Кибеткә бара идем.

^һәм/һәм<cnjcoo>$ ^бу/бу<prn><dem><nom>$ ^очракта/очрак<n><sg><sg><loc>$ ^"/"<sent>$^җиң сызганып/җиң сызган<v><tv><gna_perf>$^"/"<sent>$ ^туры/туры<adj>$ ^мәгънәдә/мәгънә<n><sg><sg><loc>$ �^Кибеткә/Кибет<n><sg><sg><dat>$ ^бара/бар<v><tv><pres><p3><sg>$ ^идем/и<cop><ifi><p1><sg>$^./.<sent>$

Best!
Mansur

On Thu, 8 Nov 2018 at 23:05, saurabh dubey <sauvzi13...@gmail.com> wrote:

> Hello sir,
> I am a student from JIIT Noida, India. Currently, I'm working on Deep
> learning and Specifically on NLP( Natural language processing) and NMT(
> Neural machine translation).
> As your open source organization already contributing in this field from a
> very long time, So you can be a great mentor for me and your guidance will
> be really valuable for me. I really want to work in this field and want to
> learn more.
> *I have little knowledge in the field. I have already worked on a
> few small projects of my own as mentioned:*
> *-Sentimental analysis*
> *-Created a chatbot by using Deep NLP model in Tenserflow and python. *
> * - Few things learned during the process are:*
> * 1. Type of Natural Language Processing*
> * 2. Seq2Seq Architecture & Training*
> * 3. End to End Deep learning models*
> * 4. Beam search decoding.*
>
> I would love to learn and then contribute to Apertium.
>
> *There are some Ideas on which we can work:*
>
> * 1. A chatbot for your website for Q&A.*
> * 2. India there is about 23 official languages and I would love to work
> for any of them to extend your spectrum.*
> * 3. Additional toolbox with the given feature:*
> * -Copy*
> * -Share*
> * -Text-to-speech recognition.*
>
> Kindly assist me in this process as I really dedicated and focused towards
> this field and would love to assure my commitment.
> *I hope you acknowledge my efforts. *
>
> On Thu, Nov 8, 2018 at 7:39 PM Kevin Brubeck Unhammer <unham...@fsfe.org>
> wrote:
>
>> Francis Tyers <fty...@prompsit.com> wrote:
>>
>> [...]
>>
>> >>> That would be a good feature, but wouldn't get past the issue of the
>> >>> tagger/cg. E.g. if we do that then the tagger can't take into account
>> >>> context.
>> >>
>> >> Isn't that the whole point? (Ie. treat each line as completely
>> >> independent, no context.)
>> >
>> > I don't think so, I think Mansur wants the tagger to disambiguate
>> > according to the context, but have it in line-by-line output, like
>> > TreeTagger or UDpipe etc.
>>
>> Well, it's only lt-proc doing the moving, so just move the NUL-deletion
>> before cg-proc:
>>
>>     cat corpus.txt \
>>         | tr -d '\0' \
>>         | apertium-deshtml -n \
>>         | sed 's/\[$/[][/; s/^]/]\x00/' \
>>         | lt-proc -z -w 'tat.automorf.bin' \
>>         | tr -d '\0' \
>>         | cg-proc -z 'tat.rlx.bin' \
>>         | cg-proc -z -w -1 'dev/mansur.bin' \
>>         | apertium-rehtml-noent
>>
>> Now only lt-proc should treat end-of-line as a stream delimiter.
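
A quick way to catch the kind of line merging shown in the example above is to compare the number of lines going into the pipeline with the number coming out. The following is only a minimal sketch: `same_line_count` is a hypothetical helper, not an Apertium tool, and `tagged.txt` is an assumed name for a file holding the saved output of the pipeline run on `corpus.txt`.

```shell
#!/bin/sh
# Hypothetical helper (not part of Apertium): succeed only when two files
# have the same number of lines, i.e. no lines were merged or split.
same_line_count() {
    # Arithmetic expansion strips the padding some wc implementations add.
    in_lines=$(( $(wc -l < "$1") ))
    out_lines=$(( $(wc -l < "$2") ))
    if [ "$in_lines" -eq "$out_lines" ]; then
        echo "ok: $in_lines lines"
    else
        echo "mismatch: input has $in_lines lines, output has $out_lines"
        return 1
    fi
}

# Assumed usage, with the pipeline output saved to tagged.txt:
#   same_line_count corpus.txt tagged.txt
```

A matching count does not prove that every line was analysed independently, but a mismatch is a cheap signal that some lines were joined somewhere in the pipeline.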
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff