Re: clojure-opennlp

2012-02-13 Thread Lee Hinman
On Feb 11, 2012, at 7:20 AM, Jim foo.bar wrote:

> I was just wondering whether anyone has used the clojure-opennlp
> wrapper for multi-word named entity recognition (NER). [...]
> It is very strange because I am getting back "Folic" but not "Folic
> acid", even though I am using the exact same model I used with the
> command-line tool...

I have inquired on the OpenNLP mailing list about a way to train a tokenizer
not to automatically split on spaces; if I hear back about a way to do it, I
will add it to clojure-opennlp.

- Lee

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: clojure-opennlp

2012-02-12 Thread Nicolas Buduroi
Just for the record, it seems this issue has been fixed today:

https://github.com/dakrone/clojure-opennlp/commit/887add29a1fbc3b4aac7d12f5cbc52c43c6a7dcd

Try out the new 0.1.8 version.
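For anyone following along, picking up that release should only require bumping the dependency version. A minimal sketch of a Leiningen `project.clj`, assuming the library is published on Clojars as `clojure-opennlp`; the project name `drug-finder` and the Clojure version are placeholders, not taken from this thread:

```clojure
;; project.clj (the project name and Clojure version are placeholders)
(defproject drug-finder "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.3.0"]
                 [clojure-opennlp "0.1.8"]])
```

After editing the file, re-fetch dependencies (for example with `lein deps`) so the new jar is on the classpath.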


On Feb 11, 9:20 am, "Jim foo.bar" wrote:
> I was just wondering whether anyone has used the clojure-opennlp
> wrapper for multi-word named entity recognition (NER). [...]
> It is very strange because I am getting back "Folic" but not "Folic
> acid", even though I am using the exact same model I used with the
> command-line tool...



clojure-opennlp

2012-02-11 Thread Jim foo.bar
Hi everyone,

I was just wondering whether anyone has used the clojure-opennlp
wrapper for multi-word named entity recognition (NER). I am using it
to train a drug finder on my private corpus, and even though I get
the correct behavior from the Apache OpenNLP command-line tool, when
I use the API I only get single-word entities recognised! I've
opened a thread on the official mailing list because initially I
thought there was a genuine problem with OpenNLP, but since the
command-line tool does exactly what I want, I'm starting to think
the fault is not OpenNLP's but lies either in my code or in the
Clojure wrapper...

I've followed both the official tutorials and the wrapper
documentation, so I am doing everything as instructed. I know the
name finder expects tokenized sentences, and I am indeed passing
tokenized sentences like this:

(defn find-names-model [text]
  (map #(drug-find (tokenize %))
       (get-sentences text)))

It is very strange because I am getting back "Folic" but not "Folic
acid", even though I am using the exact same model I used with the
command-line tool...

Any help will be greatly appreciated...
Regards,
Jim
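A note on the symptom: an OpenNLP name finder reports each entity as a span over token indices, so a two-token entity like "Folic acid" only comes back whole if the span is turned into text by joining every token it covers. The sketch below illustrates that conversion with plain Clojure vectors. `span->string`, `spans->strings` and `first-token-only` are hypothetical helpers, not clojure-opennlp functions, and this is not a claim about what the 0.1.8 fix actually changed. Spans are assumed to be `[start end]` pairs with an exclusive `end`, assumed to match OpenNLP's `Span` convention.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helpers, not part of clojure-opennlp. A span is a
;; [start end] pair of token indices with `end` exclusive.

(defn span->string
  "Join every token covered by the span, e.g. [1 3] -> tokens 1 and 2."
  [tokens [start end]]
  (str/join " " (subvec (vec tokens) start end)))

(defn spans->strings
  "Convert each span found in a sentence into its (possibly multi-word) text."
  [tokens spans]
  (map #(span->string tokens %) spans))

(defn first-token-only
  "Deliberately wrong variant: keeps only the token at the span start,
  so a two-token entity like \"Folic acid\" comes back as \"Folic\"."
  [tokens [start _end]]
  (nth (vec tokens) start))
```

Usage: with `tokens` bound to `["Take" "Folic" "acid" "daily" "."]`, `(spans->strings tokens [[1 3]])` returns `("Folic acid")`, while `(first-token-only tokens [1 3])` returns `"Folic"`, which is exactly the reported behavior. Joining with a single space is a simplification; it loses the original spacing around punctuation, which a fuller implementation would recover from character offsets.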



Re: clojure-opennlp

2011-11-17 Thread Lee Hinman
On Nov 17, 10:15 am, labwor...@gmail.com wrote:
> That's probably an OpenNLP question, but here it goes. Is there a way to
> tell the tokenizer to make tokens of more than one word according to a
> multi-word lexicon?
>
> Thanks for any ideas.
> melipone

Not sure I understand 100% what you're trying to get at, but you
should be able to train the tokenizer to split words however you'd
like. Take a look at the training documentation[1], and feel free to
email me if you run into any snags.

- Lee Hinman

[1]: https://github.com/dakrone/clojure-opennlp/blob/master/TRAINING.markdown
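The tokenizer training data used in [1] follows the OpenNLP format: one sentence per line, with tokens separated either by whitespace or, where two tokens touch in the original text, by a `<SPLIT>` tag. A minimal sketch for generating such lines from raw text plus the desired token sequence; `training-line` is a hypothetical helper, and it assumes the tokens appear in order and cover the text apart from whitespace (`clojure.string/index-of` needs Clojure 1.8 or later).

```clojure
(require '[clojure.string :as str])

(defn training-line
  "Render one sentence in OpenNLP's tokenizer training format.
  `text` is the raw sentence and `tokens` the desired tokens, in order.
  Tokens separated by whitespace in `text` are joined with a space;
  tokens that touch in `text` are joined with <SPLIT>.
  Assumes the tokens cover `text` apart from whitespace."
  [text tokens]
  (loop [pos 0, toks (seq tokens), parts []]
    (if-let [tok (first toks)]
      (let [start (str/index-of text tok pos)] ; nil when not found
        (when (nil? start)
          (throw (ex-info "token not found in text" {:token tok :from pos})))
        (recur (+ start (count tok))
               (next toks)
               (cond-> parts
                 ;; separator only between tokens, never before the first
                 (seq parts) (conj (if (= start pos) "<SPLIT>" " "))
                 true        (conj tok))))
      (apply str parts))))
```

Usage: `(training-line "Pierre Vinken, 61 years old." ["Pierre" "Vinken" "," "61" "years" "old" "."])` returns `"Pierre Vinken<SPLIT>, 61 years old<SPLIT>."`. Note that whitespace always acts as a token boundary in this format, which fits with Lee's later question to the OpenNLP list about training a tokenizer not to split on spaces.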



clojure-opennlp

2011-11-17 Thread labwork07
That's probably an OpenNLP question, but here goes: is there a way to
tell the tokenizer to make tokens of more than one word according to a
multi-word lexicon?


Thanks for any ideas.
melipone
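Assuming the OpenNLP tokenizer never joins text across whitespace (consistent with Lee's later question about training it not to split on spaces), one common workaround is to tokenize normally and then merge adjacent tokens that spell out a lexicon entry, taking the longest match at each position. This dictionary longest-match merge is a swapped-in technique, not an OpenNLP or clojure-opennlp feature. A minimal sketch; `lexicon->token-set` and `merge-multiword` are hypothetical names, and matching is case-sensitive:

```clojure
(require '[clojure.string :as str])

(defn lexicon->token-set
  "Turn multi-word phrases such as \"Folic acid\" into a set of token
  vectors, e.g. #{[\"Folic\" \"acid\"]}."
  [phrases]
  (set (map #(str/split (str/trim %) #"\s+") phrases)))

(defn merge-multiword
  "Walk `tokens` left to right; wherever a run of 2+ tokens matches an
  entry in `lexicon` (a set of token vectors), emit the run as one
  space-joined token, preferring the longest match. Other tokens pass
  through unchanged."
  [lexicon tokens]
  (let [tokens  (vec tokens)
        n       (count tokens)
        max-len (reduce max 1 (map count lexicon))]
    (loop [i 0, out []]
      (if (>= i n)
        out
        (let [;; candidate lengths, longest first, never running past the end
              lens (range (min max-len (- n i)) 1 -1)
              len  (or (first (filter #(contains? lexicon (subvec tokens i (+ i %)))
                                      lens))
                       1)]
          (recur (+ i len)
                 (conj out (str/join " " (subvec tokens i (+ i len))))))))))
```

Usage: `(merge-multiword (lexicon->token-set ["Folic acid" "vitamin B12"]) ["Take" "Folic" "acid" "and" "vitamin" "B12" "."])` returns `["Take" "Folic acid" "and" "vitamin B12" "."]`. Trying the longest candidate first means an entry like "a b c" wins over a shorter overlapping entry "a b".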
