One more example:

- Фәнис Яруллин �
- Фәнис Яруллинга багышланган чараларның һәрберсендә катнашырга тырышам, - диде әдипнең дусты Мохтар Афзалов.

^-/-<guio>$ ^Фәнис/Фәнис<np><ant><m><nom>$ ^Яруллин/Яруллин<np><cog><m><nom>$ �-/-<guio>$ ^Фәнис/Фәнис<np><ant><m><nom>$ ^Яруллинга/Яруллин<np><cog><m><dat>$ ^багышланган/багышла<v><tv><pass><gpr_past>$ ^чараларның/чара<n><pl><gen>$ ^һәрберсендә/*һәрберсендә$ ^катнашырга/катнаш<v><tv><inf>$ ^тырышам/тырыш<v><tv><pres><p1><sg>$^,/,<cm>$ ^-/-<guio>$ ^диде/ди<v><tv><ifi><p3><sg>$ ^әдипнең/әдип<n><sg><gen>$ ^дусты/дуст<n><sg><px3sp><nom>$ ^Мохтар/Мохтар<np><ant><m><nom>$ ^Афзалов/Афзалов<np><cog><m><nom>+и<cop><aor><p3><sg>$^./.<sent>$

Here the lines are merged because of some broken character... But why?

On Fri, 9 Nov 2018 at 10:24, mansur <6688...@gmail.com> wrote:

> Hello!
>
> > I don't think so, I think Mansur wants the tagger to disambiguate
> > according
> > to the context, but have it in line-by-line output, like TreeTagger or
> > UDpipe
>
> Fran, no, no, I don't think so, Kevin was right :) I think the tagger should
> not disambiguate across lines. In a corpus, different lines are sometimes
> taken from different texts, so lines should be absolutely independent
> for a tagger.
>
> By the way, I found an example of actual line merging:
>
> һәм бу очракта "җиң сызганып" туры мәгънәдә 😉
> Кибеткә бара идем.
>
> ^һәм/һәм<cnjcoo>$ ^бу/бу<prn><dem><nom>$ ^очракта/очрак<n><sg><sg><loc>$
> ^"/"<sent>$^җиң сызганып/җиң сызган<v><tv><gna_perf>$^"/"<sent>$
> ^туры/туры<adj>$ ^мәгънәдә/мәгънә<n><sg><sg><loc>$
> �^Кибеткә/Кибет<n><sg><sg><dat>$ ^бара/бар<v><tv><pres><p3><sg>$
> ^идем/и<cop><ifi><p1><sg>$^./.<sent>$
>
> Best!
> Mansur
>
> On Thu, 8 Nov 2018 at 23:05, saurabh dubey <sauvzi13...@gmail.com> wrote:
>
>> Hello sir,
>> I am a student from JIIT Noida, India. Currently, I'm working on deep
>> learning, specifically on NLP (natural language processing) and NMT
>> (neural machine translation).
>> As your open source organization has been contributing to this field for
>> a very long time, you could be a great mentor for me, and your
>> guidance would be really valuable. I really want to work in this
>> field and want to learn more.
>> I have a little knowledge in the field. I have already worked on a
>> few small projects of my own:
>> - Sentiment analysis
>> - Created a chatbot using a deep NLP model in TensorFlow and Python.
>> - A few things learned during the process:
>>   1. Types of natural language processing
>>   2. Seq2Seq architecture and training
>>   3. End-to-end deep learning models
>>   4. Beam search decoding
>>
>> I would love to learn and then contribute to Apertium.
>>
>> There are some ideas we could work on:
>>
>>   1. A chatbot for your website for Q&A.
>>   2. India has about 23 official languages, and I would love to work
>>      on any of them to extend your spectrum.
>>   3. An additional toolbox with the following features:
>>      - Copy
>>      - Share
>>      - Text-to-speech recognition
>>
>> Kindly assist me in this process, as I am really dedicated to and focused
>> on this field and would love to assure you of my commitment.
>> I hope you acknowledge my efforts.
>>
>> On Thu, Nov 8, 2018 at 7:39 PM Kevin Brubeck Unhammer <unham...@fsfe.org>
>> wrote:
>>
>>> Francis Tyers <fty...@prompsit.com> wrote:
>>>
>>> [...]
>>>
>>> >>> That would be a good feature, but wouldn't get past the issue of the
>>> >>> tagger/cg. E.g. if we do that then the tagger can't take into account
>>> >>> context.
>>> >>
>>> >> Isn't that the whole point? (I.e. treat each line as completely
>>> >> independent, no context.)
>>> >
>>> > I don't think so, I think Mansur wants the tagger to disambiguate
>>> > according
>>> > to the context, but have it in line-by-line output, like TreeTagger or
>>> > UDpipe
>>> > etc.
>>>
>>> Well, it's only lt-proc doing the moving, so just move the NUL-deletion
>>> before cg-proc:
>>>
>>> cat corpus.txt \
>>>   | tr -d '\0' \
>>>   | apertium-deshtml -n \
>>>   | sed 's/\[$/[][/; s/^]/]\x00/' \
>>>   | lt-proc -z -w 'tat.automorf.bin' \
>>>   | tr -d '\0' \
>>>   | cg-proc -z 'tat.rlx.bin' \
>>>   | cg-proc -z -w -1 'dev/mansur.bin' \
>>>   | apertium-rehtml-noent
>>>
>>> Now only lt-proc should treat end-of-line as a stream delimiter.
>>> _______________________________________________
>>> Apertium-stuff mailing list
>>> Apertium-stuff@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
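
On the broken-character question at the top of this message: a minimal shell sketch for finding candidate lines in the corpus before it goes through the pipeline. It only locates suspicious lines; it does not explain why they get merged. Assumptions that are mine and not from the thread: the function name check_corpus is hypothetical, iconv is installed, and the corpus is meant to be UTF-8. It flags lines that are not valid UTF-8, lines that contain U+FFFD, and lines with a 4-byte character whose lead byte is 0xF0 (U+10000 to U+3FFFF, which includes the emoji in Mansur's example; lead bytes 0xF1 to 0xF4 are not checked).

```shell
#!/bin/sh
# check_corpus FILE: hypothetical helper, not part of Apertium.
# Prints "LINENO: reason" for each line that may be worth inspecting.
check_corpus() {
    n=0
    # The '|| [ -n "$line" ]' part keeps a final line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        if ! printf '%s\n' "$line" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
            # iconv exits non-zero when the bytes are not valid UTF-8.
            echo "$n: invalid UTF-8"
        elif printf '%s\n' "$line" | LC_ALL=C grep -qF "$(printf '\357\277\275')"; then
            # EF BF BD is the UTF-8 encoding of U+FFFD (replacement character).
            echo "$n: contains U+FFFD"
        elif printf '%s\n' "$line" | LC_ALL=C grep -qF "$(printf '\360')"; then
            # In valid UTF-8, byte F0 only occurs as the lead byte of U+10000..U+3FFFF.
            echo "$n: contains a 4-byte character"
        fi
    done < "$1"
}
```

Usage, assuming the same corpus file as in the pipeline above: check_corpus corpus.txt. If the reported line numbers match the places where the output lines are merged, that would at least narrow the problem down to those characters.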