Re: clojure-opennlp
On Feb 11, 2012, at 7:20 AM, Jim foo.bar wrote:
> Hi everyone,
>
> I was just wondering whether anyone has used the clojure-opennlp
> wrapper for multi-word named entity recognition (NER)? [...]

I have asked on the OpenNLP mailing list whether there is a way to train a tokenizer not to split automatically on spaces. If I hear back about a way to do it, I will add it to clojure-opennlp.

- Lee

--
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: clojure-opennlp
Just for the record, it seems this issue has been fixed today:
https://github.com/dakrone/clojure-opennlp/commit/887add29a1fbc3b4aac7d12f5cbc52c43c6a7dcd

Try out the new 0.1.8 version.

On Feb 11, 9:20 am, "Jim foo.bar" wrote:
> Hi everyone,
>
> I was just wondering whether anyone has used the clojure-opennlp
> wrapper for multi-word named entity recognition (NER)? [...]
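This may help anyone who sees the same symptom on an older release. One plausible cause of getting "Folic" back without "acid" is a span-to-string step that keeps only the token at each span's start index. The correct behaviour is to join every token the span covers. This may not be exactly what the linked commit changes. The sketch is plain Clojure and assumes spans shaped like OpenNLP's `Span`, with a start index and an exclusive end index. The span maps and the helper names `span->first-token` and `span->text` are hypothetical and are not clojure-opennlp API.

```clojure
(require '[clojure.string :as str])

;; Hypothetical illustration, not clojure-opennlp code. A span is modelled as
;; a map {:start s :end e} where :end is exclusive, like OpenNLP's Span.

(defn span->first-token
  "Keeps only the token at the span's start index (loses multi-word names)."
  [tokens {:keys [start]}]
  (nth (vec tokens) start))

(defn span->text
  "Joins every token covered by [start, end) with a single space."
  [tokens {:keys [start end]}]
  (str/join " " (subvec (vec tokens) start end)))

(def tokens ["Take" "Folic" "acid" "daily" "."])
(def spans  [{:start 1 :end 3}])

(map #(span->first-token tokens %) spans) ;; => ("Folic")
(map #(span->text tokens %) spans)        ;; => ("Folic acid")
```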
clojure-opennlp
Hi everyone,

I was just wondering whether anyone has used the clojure-opennlp wrapper for multi-word named entity recognition (NER)? I am using it to train a drug finder from my private corpus. With the command-line tool of Apache OpenNLP I get the correct behaviour, but when I use the API I only get single-word entities recognised! I've opened a thread on the official mailing list because initially I thought there was a genuine problem with OpenNLP. Since the command-line tool does exactly what I want, I'm starting to think it might not be OpenNLP's fault, and that the problem lies either in my code or in the Clojure wrapper...

I've followed both the official tutorials and the wrapper documentation, so I am doing everything as instructed. I know the name finder expects tokenized sentences, and I am indeed passing tokenized sentences like this:

    ;; get-sentences, tokenize and drug-find are the sentence detector,
    ;; tokenizer and name finder, defined elsewhere
    (defn find-names-model [text]
      (map #(drug-find (tokenize %))
           (get-sentences text)))

It is very strange, because I am getting back "Folic" but not "Folic acid", even though I am using the exact same model I used with the command-line tool...

Any help will be greatly appreciated...

Regards,
Jim
Re: clojure-opennlp
On Nov 17, 10:15 am, labwor...@gmail.com wrote:
> That's probably an OpenNLP question, but here goes. Is there a way to
> tell the tokenizer to make tokens of more than one word according to a
> multi-word lexicon?
>
> Thanks for any ideas.
> melipone

I'm not 100% sure I understand what you're trying to get at, but you should be able to train the tokenizer to split words however you'd like. Take a look at the training documentation[1], and feel free to email me if you run into any snags.

- Lee Hinman

[1]: https://github.com/dakrone/clojure-opennlp/blob/master/TRAINING.markdown
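For reference, the OpenNLP manual describes the tokenizer training format as one sentence per line. Tokens are separated either by whitespace or by a `<SPLIT>` tag where two tokens touch in the original text, e.g. `Pierre Vinken<SPLIT>, 61 years old<SPLIT>.`. The sketch below produces such lines and assumes you already have each sentence and its desired tokens in order. `tokens->training-line` is a hypothetical helper, not part of clojure-opennlp. In this format whitespace always marks a token boundary, so it does not appear able to express a single token that contains a space, such as "Folic acid".

```clojure
(require '[clojure.string :as str])

(defn tokens->training-line
  "Hypothetical helper: builds one line of OpenNLP tokenizer training data.
  Tokens separated by whitespace in `sentence` are joined with a space;
  tokens that touch in the original text are joined with <SPLIT>.
  Assumes the tokens appear verbatim and in order in `sentence`, and that
  anything between consecutive tokens is whitespace."
  [sentence tokens]
  (loop [pos       0        ; where to start searching for the next token
         out       []       ; pieces of the training line built so far
         remaining tokens]
    (if-let [tok (first remaining)]
      (let [idx (str/index-of sentence tok pos)]
        (when-not idx
          (throw (ex-info "Token not found in sentence" {:token tok :from pos})))
        (let [sep (cond (empty? out) ""          ; no separator before the first token
                        (> idx pos)  " "         ; gap in the original text
                        :else        "<SPLIT>")] ; tokens touch in the original
          (recur (+ idx (count tok)) (conj out sep tok) (rest remaining))))
      (apply str out))))

(tokens->training-line "Pierre Vinken, 61 years old."
                       ["Pierre" "Vinken" "," "61" "years" "old" "."])
;; => "Pierre Vinken<SPLIT>, 61 years old<SPLIT>."
```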
clojure-opennlp
That's probably an OpenNLP question, but here goes. Is there a way to tell the tokenizer to make tokens of more than one word according to a multi-word lexicon?

Thanks for any ideas.
melipone
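If the tokenizer itself cannot emit multi-word tokens, one workaround is to post-process its output. At each position, the sketch merges the longest run of tokens that matches a lexicon entry. It is plain Clojure and assumes the lexicon is a set of token vectors with exact, case-sensitive matching. `merge-multiword-tokens` is a hypothetical helper, not an OpenNLP or clojure-opennlp function.

```clojure
(require '[clojure.string :as str])

(defn merge-multiword-tokens
  "Hypothetical helper: greedily merges runs of tokens that match an entry in
  `lexicon` (a set of token vectors), preferring the longest match at each
  position. A matched run becomes one token joined with a single space."
  [lexicon tokens]
  (let [tokens  (vec tokens)
        max-len (reduce max 1 (map count lexicon))
        n       (count tokens)]
    (loop [i 0, out []]
      (if (>= i n)
        out
        ;; try candidate lengths from longest to 2; fall back to 1 token
        (let [len (or (first (filter #(contains? lexicon (subvec tokens i (+ i %)))
                                     (range (min max-len (- n i)) 1 -1)))
                      1)]
          (recur (+ i len)
                 (conj out (str/join " " (subvec tokens i (+ i len))))))))))

(merge-multiword-tokens #{["Folic" "acid"]} ["Take" "Folic" "acid" "daily" "."])
;; => ["Take" "Folic acid" "daily" "."]
```

A name finder trained on unmerged tokens will see a different token sequence if you merge before calling it. This step therefore fits best after name finding, or with models trained on merged tokens.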