Hello OpenNLP Devs,

I am working on text classification using word embeddings such as GloVe/Word2Vec together with LSTM networks. It would be interesting to see whether we can use this approach as a document categorizer in OpenNLP, especially for sentiment analysis.
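To make the idea concrete, here is a minimal sketch in plain Java of the first step such a categorizer needs. It is an illustration only, not the actual implementation. It parses the GloVe text format, where each line holds a token followed by its space-separated vector components, and turns a tokenized document into the sequence of word vectors an LSTM would consume. The class and method names (GloveSketch, parse, toSequence), the lowercasing of tokens, and the zero vector for unknown tokens are hypothetical choices made for this sketch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not part of OpenNLP or DL4J. */
public class GloveSketch {

    /** Parses GloVe text format: each line is a token followed by its vector components. */
    static Map<String, float[]> parse(BufferedReader reader) throws IOException {
        Map<String, float[]> vectors = new HashMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length < 2) {
                continue; // skip blank or malformed lines
            }
            float[] vec = new float[parts.length - 1];
            for (int i = 1; i < parts.length; i++) {
                vec[i - 1] = Float.parseFloat(parts[i]);
            }
            vectors.put(parts[0], vec);
        }
        return vectors;
    }

    /**
     * Maps each token to its vector. Tokens are lowercased and unknown tokens
     * become zero vectors of length dim (both are assumptions for this sketch).
     */
    static List<float[]> toSequence(String[] tokens, Map<String, float[]> vectors, int dim) {
        List<float[]> seq = new ArrayList<>();
        for (String token : tokens) {
            float[] vec = vectors.get(token.toLowerCase());
            seq.add(vec != null ? vec : new float[dim]);
        }
        return seq;
    }

    public static void main(String[] args) throws IOException {
        // Tiny 3-dimensional stand-in for a real pre-trained GloVe file.
        String glove = "good 0.5 0.1 -0.2\nmovie 0.3 -0.4 0.9\n";
        Map<String, float[]> vectors = parse(new BufferedReader(new StringReader(glove)));
        List<float[]> seq = toSequence(new String[] {"Good", "movie", "!"}, vectors, 3);
        for (float[] v : seq) {
            System.out.println(Arrays.toString(v));
        }
    }
}
```

Running main prints one vector per token: [0.5, 0.1, -0.2], [0.3, -0.4, 0.9], and [0.0, 0.0, 0.0] for the unknown "!". In a real setup the reader would wrap a downloaded GloVe file, and the sequence would be packed into the input of an LSTM (for example with DL4J); that step is left out to keep the sketch dependency-free.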
I have already raised a PR against the sandbox repo: https://github.com/apache/opennlp-sandbox/pull/3

This is a first version, and I hope to get feedback from the dev community to make it work for everyone. Here are the design choices I made for the initial version:

- Using pre-trained GloVe vectors: I felt the GloVe vector format is clean and easily customizable in terms of dimensions and vocabulary size (also, I have been reading a lot about them from the Stanford NLP group).
- Training GloVe vectors isn't hard either; we can do it with the original C library as well as with DL4J.
- Using DL4J's multi-layer networks with LSTM instead of reinventing this on the JVM for OpenNLP.

Please share your feedback here or on the GitHub page: https://github.com/apache/opennlp-sandbox/pull/3

Thanks,
TG
--
*Thamme Gowda*
@thammegowda <https://twitter.com/thammegowda> | http://scf.usc.edu/~tnarayan/