Hello OpenNLP Devs,

I am working on text classification using word embeddings such as
GloVe/Word2Vec together with LSTM networks.
It would be interesting to see whether we can use this approach as a
document categorizer in OpenNLP, especially for sentiment analysis.

I have already raised a PR to the sandbox repo -
https://github.com/apache/opennlp-sandbox/pull/3

This is the first version, and I would appreciate feedback from the dev
community to make it work for everyone.

Here are the design choices I have made for the initial version:

   - Using pre-trained GloVe vectors: the GloVe vector format is clean and
   easily customizable in terms of dimensions and vocabulary size (and I
   have been reading a lot about them from the Stanford NLP group).
      - Training GloVe vectors isn't hard either; we can do it using the
      original C library as well as with DL4J.
   - Using DL4J's multi-layer networks with LSTM instead of reinventing
   this on the JVM for OpenNLP.
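
To illustrate why the GloVe text format is easy to work with, here is a
minimal sketch of a parser. It assumes the standard plain-text layout, one
token per line followed by its space-separated vector components. The class
name GloveVectors and the method parse are hypothetical names chosen for
this example; they are not taken from the PR or from DL4J.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the PR or of DL4J.
final class GloveVectors {

    // Parses lines in the GloVe text format: "<token> <v1> <v2> ... <vn>".
    // Returns a map from each token to its embedding vector.
    static Map<String, float[]> parse(List<String> lines) {
        Map<String, float[]> vectors = new HashMap<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = line.split(" ");
            float[] vector = new float[parts.length - 1];
            for (int i = 1; i < parts.length; i++) {
                vector[i - 1] = Float.parseFloat(parts[i]);
            }
            vectors.put(parts[0], vector); // first field is the token itself
        }
        return vectors;
    }
}
```

The vector dimension is simply the number of fields after the token, so
switching to a different dimensionality or vocabulary only means pointing
the parser at a different file.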


Please share your feedback here or on the GitHub page:
https://github.com/apache/opennlp-sandbox/pull/3


Thanks,
TG


--
*Thamme Gowda *
@thammegowda <https://twitter.com/thammegowda> |
http://scf.usc.edu/~tnarayan/
~Sent via somebody's Webmail server
