Re: Document Categorizer based on Glove + LSTM (powered by DL4J)

Chris Mattmann Wed, 05 Jul 2017 08:27:09 -0700

Thamme, great job! 

(proud academic dad)


Cheers,
Chris




On 7/5/17, 12:31 AM, "Joern Kottmann" <[email protected]> wrote:

    +1 to merge this once it implements the Document Categorizer; then we
    can also use those tools to train and evaluate it
    
    Jörn
    
    On Wed, Jul 5, 2017 at 9:28 AM, Rodrigo Agerri <[email protected]> wrote:
    > Hello again,
    >
    > @Thamme, out of curiosity, do you have evaluation numbers on the
    > Stanford Large Movie Review dataset?
    >
    > Best,
    >
    > Rodrigo
    >
    > On Wed, Jul 5, 2017 at 9:25 AM, Rodrigo Agerri <[email protected]> wrote:
    >> +1 to Tommaso's comment. This would be very nice to have in the project.
    >>
    >> R
    >>
    >> On Wed, Jul 5, 2017 at 9:19 AM, Tommaso Teofili
    >> <[email protected]> wrote:
    >>> thanks Thamme for bringing this to the list!
    >>>
    >>>
    >>> On Wed, Jul 5, 2017 at 3:49 AM, Thamme Gowda <[email protected]> wrote:
    >>>
    >>>> Hello OpenNLP Devs,
    >>>>
    >>>> I am working on text classification using word embeddings like
    >>>> GloVe/Word2Vec and LSTM networks.
    >>>> It would be interesting to see if we can use it as a document categorizer,
    >>>> especially for sentiment analysis in OpenNLP.
    >>>>
    >>>> I have already raised a PR to the sandbox repo -
    >>>> https://github.com/apache/opennlp-sandbox/pull/3
    >>>>
    >>>> This is the first version, and I expect to receive feedback from the dev
    >>>> community to make it work for everyone.
    >>>>
    >>>> Here are the design choices I have made for the initial version:
    >>>>
    >>>>    - Using pre-trained GloVe vectors - I felt the GloVe vector format is
    >>>>    clean and easily customizable in terms of dimensions and vocabulary
    >>>>    size (also, I have been reading a lot about them from the Stanford
    >>>>    NLP group).
    >>>>       - Training GloVe vectors isn't hard either; we can do it using the
    >>>>       original C library as well as DL4J.
    >>>>    - Using DL4J's multi-layer networks with LSTM instead of reinventing
    >>>>    this on the JVM for OpenNLP.
    >>>>
    >>>>
    >>>> Please share your feedback here or on the GitHub page:
    >>>> https://github.com/apache/opennlp-sandbox/pull/3
    >>>>
    >>>>
    >>> I think the approach outlined here sounds good, and we could
    >>> incorporate the PR as soon as it implements the Doccat API.
    >>> Then we may see whether and how it makes sense to adjust it to use other
    >>> types of embeddings (e.g. paragraph vectors) and / or different network
    >>> setups (e.g. more hidden layers, bidirectional LSTM, etc.).
    >>>
    >>> Looking forward to seeing this move forward,
    >>> Regards,
    >>> Tommaso
    >>>
    >>>
    >>>>
    >>>> Thanks,
    >>>> TG
    >>>>
    >>>>
    >>>> --
    >>>> *Thamme Gowda *
    >>>> @thammegowda <https://twitter.com/thammegowda> |
    >>>> http://scf.usc.edu/~tnarayan/
    >>>> ~Sent via somebody's Webmail server
    >>>>
