Here are some published papers on how character embeddings are used for classification.
https://arxiv.org/abs/1810.03595
https://lsm.media.mit.edu/papers/tweet2vec_vvr.pdf
https://papers.nips.cc/paper/5782-character-level-convolutional-networks-for-text-classification.pdf

We have just finished writing a paper on this and have obtained better results than the ones in the papers mentioned above. The dataset is collected from SentiWordNet, as I mentioned earlier.

I am not on the IRC; I will join it then.

Best,
Rajarshi

On Fri, Feb 28, 2020, 01:01 Tanmai Khanna <khanna.tan...@gmail.com> wrote:

> How exactly can characters predict sentiment? Don’t you still need some
> training data for pairs? English, Hindi, and Bangla aren’t really
> low-resource languages.
>
> Anyway, we can continue this discussion on the IRC so that it’ll be
> easier and more people can contribute to the discussion.
>
> Tanmai
>
> Sent from my iPhone
>
> On 28-Feb-2020, at 00:52, Rajarshi Roychoudhury <rroychoudhu...@gmail.com> wrote:
>
> To answer the question of how to analyse sentiment in a low-resource
> language, I think character embeddings would be the best option. The
> words in a corpus are not exhaustive, but the number of unique characters
> is certainly fixed and well defined. We can learn an embedding weight for
> each character and apply it to a number of NLP tasks, not just sentiment
> analysis. The downside of a low-resource language can be somewhat
> mitigated that way.
>
> On Fri, Feb 28, 2020, 00:46 Rajarshi Roychoudhury <rroychoudhu...@gmail.com> wrote:
>
>> As I mentioned earlier, I would like to work on English-Hindi or
>> English-Bengali translation. The dataset can be obtained from
>> SentiWordNet for Indian languages,
>> https://amitavadas.com/sentiwordnet.php
>> which is by far the most resourceful dataset available for sentiment
>> analysis. It contains data for both Hindi and Bengali.
>>
>> I cannot give an example specific to Apertium because whenever I try to
>> translate a word from English in the interface, the available languages
>> for translation are beyond my knowledge.
>> I am not sure if I am right, but Hindi/Bengali is probably not among
>> the languages into which an English word can be translated. Correct me
>> if I am wrong.
>>
>> On Fri, Feb 28, 2020, 00:31 Tanmai Khanna <khanna.tan...@gmail.com> wrote:
>>
>>> Hi, I have a few questions about this:
>>> 1. How would you analyse the sentiment of the source text, considering
>>> that the language pairs Apertium deals with are low-resource languages?
>>> 2. As Tino mentions, is there a problem of sentiment loss in Apertium?
>>> Any examples of this?
>>> 3. Doesn't sentiment analysis of a language require a decent amount of
>>> training data? Where would this data be found for low-resource
>>> languages?
>>>
>>> Tanmai
>>>
>>> On Fri, Feb 28, 2020 at 12:02 AM Rajarshi Roychoudhury <rroychoudhu...@gmail.com> wrote:
>>>
>>>> The effect won't be very evident on simple sentences; I think it
>>>> would be more evident on sentences where the choice of words can
>>>> decide the quality of the translation. It's not about whether "Watch
>>>> out" could become "be careful"; it's about choosing words that can
>>>> retain the urgency of "watch out". Sentiment information about the
>>>> original sentence can help with that.
>>>>
>>>> On Thu, Feb 27, 2020, 23:47 Scoop Gracie <scoopgra...@gmail.com> wrote:
>>>>
>>>>> So, "Watch out!" could become "Be careful"?
>>>>>
>>>>> On Thu, Feb 27, 2020, 10:13 Rajarshi Roychoudhury <rroychoudhu...@gmail.com> wrote:
>>>>>
>>>>>> It is not just about minimizing the loss of sentiment; it is about
>>>>>> using that information for better translation. A trivial example
>>>>>> would be that in some situations sentences can project a strong
>>>>>> sentiment, and a plain translation may not always yield the best
>>>>>> result. However, if we can use knowledge of the sentiment to choose
>>>>>> the words, it might give a better result.
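The character-embedding idea raised earlier in the thread can be illustrated with a deliberately tiny sketch: one learned weight per character, a word scored by summing the weights of its characters, trained as a plain logistic regression. Everything below (the two-word training set, the alphabet, the labels) is invented for illustration; this is not the model from the linked papers, which use convolutional or recurrent architectures.

```python
import math

def char_features(word, alphabet):
    """Bag-of-characters vector: how often each alphabet character occurs."""
    return [word.count(c) for c in alphabet]

def train(examples, alphabet, epochs=500, lr=0.5):
    """Logistic regression over character counts. examples: (word, label) pairs."""
    w = [0.0] * len(alphabet)
    b = 0.0
    for _ in range(epochs):
        for word, label in examples:
            x = char_features(word, alphabet)
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))   # predicted P(positive)
            err = p - label                  # gradient of log-loss w.r.t. z
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(word, w, b, alphabet):
    """1 = positive sentiment, 0 = negative."""
    x = char_features(word, alphabet)
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1 if z > 0 else 0

# Toy, invented training data: one positive and one negative word.
alphabet = "abdgo"
examples = [("good", 1), ("bad", 0)]
w, b = train(examples, alphabet)
print(predict("good", w, b, alphabet), predict("bad", w, b, alphabet))  # -> 1 0
```

The point of the sketch is only that the parameter set scales with the alphabet, not with the vocabulary, which is the property being claimed for low-resource settings.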
>>>>>>
>>>>>> As far as the code is concerned, I need to study the source code,
>>>>>> or detailed documentation, before proposing a feasible solution.
>>>>>>
>>>>>> Best,
>>>>>> Rajarshi
>>>>>>
>>>>>> On Thu, Feb 27, 2020, 23:21 Tino Didriksen <m...@tinodidriksen.com> wrote:
>>>>>>
>>>>>>> My first question would be: is this actually a problem for
>>>>>>> rule-based machine translation? I am not a linguist, but given how
>>>>>>> RBMT works I can't really see where sentiment would be lost in the
>>>>>>> process, especially because Apertium is designed for related
>>>>>>> languages, where sentiment is mostly the same. But even for less
>>>>>>> related languages, it would come down to the quality of the
>>>>>>> source-language analysis.
>>>>>>>
>>>>>>> Beyond that, please learn how Apertium specifically works, not
>>>>>>> just RBMT in general. http://wiki.apertium.org/wiki/Documentation
>>>>>>> is a good start, but our IRC channel is the best place to ask
>>>>>>> technical questions.
>>>>>>>
>>>>>>> One major issue specific to Apertium is that the source
>>>>>>> information is no longer available in the target-generation step.
>>>>>>>
>>>>>>> E.g., since you mention English-Hindi, you could install
>>>>>>> apertium-eng-hin and see how each part of the pipe works. We have
>>>>>>> precompiled binaries for common platforms. Again, see the wiki and
>>>>>>> IRC.
>>>>>>>
>>>>>>> -- Tino Didriksen
>>>>>>>
>>>>>>> On Thu, 27 Feb 2020 at 08:16, Rajarshi Roychoudhury <rroychoudhu...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Formally, I present my idea in this form.
>>>>>>>>
>>>>>>>> From my understanding, an RBMT system contains:
>>>>>>>>
>>>>>>>> - an *SL morphological analyser* - analyses a source-language
>>>>>>>> word and provides its morphological information;
>>>>>>>> - an *SL parser* - a syntax analyser which analyses
>>>>>>>> source-language sentences;
>>>>>>>> - a *translator* - translates a source-language word into the
>>>>>>>> target language;
>>>>>>>> - a *TL morphological generator* - generates appropriate
>>>>>>>> target-language words for the given grammatical information;
>>>>>>>> - a *TL parser* - composes suitable target-language sentences.
>>>>>>>>
>>>>>>>> I propose a sixth component of the RBMT system: a
>>>>>>>> *sentiment-based TL morphological generator*.
>>>>>>>>
>>>>>>>> I propose that we do word-level sentiment analysis of the source
>>>>>>>> and target languages. For the time being I want to work on
>>>>>>>> English-Hindi translation. We do not need neural-network-based
>>>>>>>> translation; however, to get the sentiment associated with each
>>>>>>>> word we might use NLTK, or develop a character-level embedding
>>>>>>>> just to find the sentiment associated with each word, and form a
>>>>>>>> dictionary out of it. I have written a paper on this and obtained
>>>>>>>> good results. So, during the final application development we
>>>>>>>> will just have the dictionary, with no neural-network
>>>>>>>> dependencies. This can easily be done with Python. I just need a
>>>>>>>> good corpus of English and Hindi words (the sentiment datasets
>>>>>>>> are available online).
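The "dictionary with no neural-network dependencies" deployment described in the proposal could be sketched as follows: whatever model produces the per-word sentiment scores runs only offline, and the shipped artifact is a plain word-to-score mapping. All words and scores below are invented placeholders, not values from any real lexicon.

```python
import json

# Offline step (NLTK, a character-level model, etc.): produce word scores once.
# These entries are invented for illustration.
scores = {"good": 0.7, "bad": -0.7, "okay": 0.1}

# Ship the dictionary as plain data (e.g. a JSON file)...
blob = json.dumps(scores)

# ...and at application time just load it and look words up,
# defaulting to neutral (0.0) for out-of-vocabulary words.
lexicon = json.loads(blob)

def word_sentiment(word):
    return lexicon.get(word.lower(), 0.0)

print(word_sentiment("Good"), word_sentiment("unseen"))  # -> 0.7 0.0
```

The design point is that the runtime cost is a hash lookup, so the component adds essentially nothing to the pipeline's footprint.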
>>>>>>>>
>>>>>>>> The *sentiment-based TL morphological generator* will generate
>>>>>>>> the list of possible words, and we will take the word whose
>>>>>>>> sentiment is closest to that of the source-language word.
>>>>>>>> This is a novel method that has probably not been applied before,
>>>>>>>> and it might generate better results.
>>>>>>>>
>>>>>>>> Please provide your valuable feedback and suggest any necessary
>>>>>>>> changes that need to be made.
>>>>>>>> Best,
>>>>>>>> Rajarshi
>>>
>>> --
>>> *Khanna, Tanmai*
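The selection step in the proposal above, picking from the generator's candidate list the target word whose sentiment score is closest to the source word's, reduces to a nearest-score search. The candidate words and all scores below are invented for illustration; in practice they would come from a lexicon such as SentiWordNet.

```python
def pick_closest_sentiment(source_score, candidates):
    """candidates: dict mapping target-language word -> sentiment score.
    Returns the word minimising |score - source_score|."""
    return min(candidates, key=lambda word: abs(candidates[word] - source_score))

# Hypothetical example: translating an urgent "watch out!" where several
# near-synonymous candidates differ in intensity (scores are invented).
candidates = {
    "dekhna": 0.0,       # neutral "look"
    "savdhan": -0.6,     # strong warning
    "dhyan dena": -0.2,  # mild "pay attention"
}
source_score = -0.7  # invented score for the urgency of "watch out!"
print(pick_closest_sentiment(source_score, candidates))  # -> savdhan
```

With ties broken by dictionary order, this makes the generator's choice deterministic given the two lexicons.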
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff