FW: [jira] [Commented] (TIKA-1787) Include Stanford Name Entity Recognition in Tika
Build back to normal after Thamme and I fixed this.

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++

-Original Message-
From: "Hudson (JIRA)"
Date: Tuesday, November 17, 2015 at 11:33 PM
To: jpluser
Subject: [jira] [Commented] (TIKA-1787) Include Stanford Name Entity Recognition in Tika

> [ https://issues.apache.org/jira/browse/TIKA-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010200#comment-15010200 ]
>
> Hudson commented on TIKA-1787:
>
> SUCCESS: Integrated in tika-trunk-jdk1.7 #889 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/889/])
> Fix for TIKA-1787: Include Stanford Name Entity Recognition in Tika contributed by Thamme Gowda N and Yueheng He this closes #61 this closes #62 (mattmann: [http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1714931])
> * trunk/tika-parsers/pom.xml
>
>> Include Stanford Name Entity Recognition in Tika
>>
>> Key: TIKA-1787
>> URL: https://issues.apache.org/jira/browse/TIKA-1787
>> Project: Tika
>> Issue Type: Improvement
>> Components: mime, parser
>> Affects Versions: 1.12
>> Environment: Java 1.8, Mac OSX 10.11
>> Reporter: Yueheng He
>> Assignee: Chris A. Mattmann
>> Labels: features, newbie, test
>> Fix For: 1.12
>>
>> Original Estimate: 168h
>> Remaining Estimate: 168h
>>
>> Using the Stanford Name Entity Recognition, Tika will be able to extract name entities like PERSON, ORGANIZATION, LOCATION, etc from the given text. The extracted name entities will be added to the metadata
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
[jira] [Commented] (TIKA-1787) Include Stanford Name Entity Recognition in Tika
[ https://issues.apache.org/jira/browse/TIKA-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010200#comment-15010200 ]

Hudson commented on TIKA-1787:

SUCCESS: Integrated in tika-trunk-jdk1.7 #889 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/889/])
Fix for TIKA-1787: Include Stanford Name Entity Recognition in Tika contributed by Thamme Gowda N and Yueheng He this closes #61 this closes #62 (mattmann: [http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1714931])
* trunk/tika-parsers/pom.xml
Re: FW: [jira] [Commented] (TIKA-1787) Include Stanford Name Entity Recognition in Tika
--
- ThammeGowda N
FW: [jira] [Commented] (TIKA-1787) Include Stanford Name Entity Recognition in Tika
Thamme, can you have a look here:

https://builds.apache.org/job/tika-trunk-jdk1.7/887/org.apache.tika$tika-parsers/testReport/junit/org.apache.tika.parser.ner/NamedEntityParserTest/testParse/

Tests seem to be failing (worked for me locally maybe b/c I had already downloaded the models?)

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++

-Original Message-
From: "Hudson (JIRA)"
Date: Tuesday, November 17, 2015 at 12:48 PM
To: jpluser
Subject: [jira] [Commented] (TIKA-1787) Include Stanford Name Entity Recognition in Tika

> [ https://issues.apache.org/jira/browse/TIKA-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009116#comment-15009116 ]
>
> Hudson commented on TIKA-1787:
>
> UNSTABLE: Integrated in tika-trunk-jdk1.7 #887 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/887/])
> Fix for TIKA-1787: Include Stanford Name Entity Recognition in Tika contributed by Thamme Gowda N and Yueheng He this closes #61 this closes #62 (mattmann: [http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1714835])
> * trunk/.gitignore
> * trunk/CHANGES.txt
> * trunk/tika-parsers/pom.xml
> * trunk/tika-parsers/src/main/java/org/apache/tika/parser/ner
> * trunk/tika-parsers/src/main/java/org/apache/tika/parser/ner/NERecogniser.java
> * trunk/tika-parsers/src/main/java/org/apache/tika/parser/ner/NamedEntityParser.java
> * trunk/tika-parsers/src/main/java/org/apache/tika/parser/ner/corenlp
> * trunk/tika-parsers/src/main/java/org/apache/tika/parser/ner/corenlp/CoreNLPNERecogniser.java
> * trunk/tika-parsers/src/main/java/org/apache/tika/parser/ner/opennlp
> * trunk/tika-parsers/src/main/java/org/apache/tika/parser/ner/opennlp/OpenNLPNERecogniser.java
> * trunk/tika-parsers/src/main/java/org/apache/tika/parser/ner/opennlp/OpenNLPNameFinder.java
> * trunk/tika-parsers/src/main/java/org/apache/tika/parser/ner/regex
> * trunk/tika-parsers/src/main/java/org/apache/tika/parser/ner/regex/RegexNERecogniser.java
> * trunk/tika-parsers/src/main/resources/org/apache/tika/parser/ner
> * trunk/tika-parsers/src/main/resources/org/apache/tika/parser/ner/regex
> * trunk/tika-parsers/src/main/resources/org/apache/tika/parser/ner/regex/ner-regex.txt
> * trunk/tika-parsers/src/test/java/org/apache/tika/parser/ner
> * trunk/tika-parsers/src/test/java/org/apache/tika/parser/ner/NamedEntityParserTest.java
> * trunk/tika-parsers/src/test/java/org/apache/tika/parser/ner/regex
> * trunk/tika-parsers/src/test/java/org/apache/tika/parser/ner/regex/RegexNERecogniserTest.java
> * trunk/tika-parsers/src/test/resources/org/apache/tika/parser
> * trunk/tika-parsers/src/test/resources/org/apache/tika/parser/ner
> * trunk/tika-parsers/src/test/resources/org/apache/tika/parser/ner/opennlp
> * trunk/tika-parsers/src/test/resources/org/apache/tika/parser/ner/opennlp/ModelGetter.groovy
> * trunk/tika-parsers/src/test/resources/org/apache/tika/parser/ner/opennlp/get-models.sh
> * trunk/tika-parsers/src/test/resources/org/apache/tika/parser/ner/regex
> * trunk/tika-parsers/src/test/resources/org/apache/tika/parser/ner/regex/ner-regex.txt
> * trunk/tika-parsers/src/test/resources/org/apache/tika/parser/ner/tika-config.xml
[jira] [Commented] (TIKA-1787) Include Stanford Name Entity Recognition in Tika
[ https://issues.apache.org/jira/browse/TIKA-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009116#comment-15009116 ]

Hudson commented on TIKA-1787:

UNSTABLE: Integrated in tika-trunk-jdk1.7 #887 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/887/])
Fix for TIKA-1787: Include Stanford Name Entity Recognition in Tika contributed by Thamme Gowda N and Yueheng He this closes #61 this closes #62 (mattmann: [http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1714835])
* trunk/.gitignore
* trunk/CHANGES.txt
* trunk/tika-parsers/pom.xml
* trunk/tika-parsers/src/main/java/org/apache/tika/parser/ner
* trunk/tika-parsers/src/main/java/org/apache/tika/parser/ner/NERecogniser.java
* trunk/tika-parsers/src/main/java/org/apache/tika/parser/ner/NamedEntityParser.java
* trunk/tika-parsers/src/main/java/org/apache/tika/parser/ner/corenlp
* trunk/tika-parsers/src/main/java/org/apache/tika/parser/ner/corenlp/CoreNLPNERecogniser.java
* trunk/tika-parsers/src/main/java/org/apache/tika/parser/ner/opennlp
* trunk/tika-parsers/src/main/java/org/apache/tika/parser/ner/opennlp/OpenNLPNERecogniser.java
* trunk/tika-parsers/src/main/java/org/apache/tika/parser/ner/opennlp/OpenNLPNameFinder.java
* trunk/tika-parsers/src/main/java/org/apache/tika/parser/ner/regex
* trunk/tika-parsers/src/main/java/org/apache/tika/parser/ner/regex/RegexNERecogniser.java
* trunk/tika-parsers/src/main/resources/org/apache/tika/parser/ner
* trunk/tika-parsers/src/main/resources/org/apache/tika/parser/ner/regex
* trunk/tika-parsers/src/main/resources/org/apache/tika/parser/ner/regex/ner-regex.txt
* trunk/tika-parsers/src/test/java/org/apache/tika/parser/ner
* trunk/tika-parsers/src/test/java/org/apache/tika/parser/ner/NamedEntityParserTest.java
* trunk/tika-parsers/src/test/java/org/apache/tika/parser/ner/regex
* trunk/tika-parsers/src/test/java/org/apache/tika/parser/ner/regex/RegexNERecogniserTest.java
* trunk/tika-parsers/src/test/resources/org/apache/tika/parser
* trunk/tika-parsers/src/test/resources/org/apache/tika/parser/ner
* trunk/tika-parsers/src/test/resources/org/apache/tika/parser/ner/opennlp
* trunk/tika-parsers/src/test/resources/org/apache/tika/parser/ner/opennlp/ModelGetter.groovy
* trunk/tika-parsers/src/test/resources/org/apache/tika/parser/ner/opennlp/get-models.sh
* trunk/tika-parsers/src/test/resources/org/apache/tika/parser/ner/regex
* trunk/tika-parsers/src/test/resources/org/apache/tika/parser/ner/regex/ner-regex.txt
* trunk/tika-parsers/src/test/resources/org/apache/tika/parser/ner/tika-config.xml
[jira] [Commented] (TIKA-1787) Include Stanford Name Entity Recognition in Tika
[ https://issues.apache.org/jira/browse/TIKA-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009110#comment-15009110 ]

ASF GitHub Bot commented on TIKA-1787:

Github user asfgit closed the pull request at: https://github.com/apache/tika/pull/61
[jira] [Commented] (TIKA-1787) Include Stanford Name Entity Recognition in Tika
[ https://issues.apache.org/jira/browse/TIKA-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009111#comment-15009111 ]

ASF GitHub Bot commented on TIKA-1787:

Github user asfgit closed the pull request at: https://github.com/apache/tika/pull/62
[jira] [Commented] (TIKA-1787) Include Stanford Name Entity Recognition in Tika
[ https://issues.apache.org/jira/browse/TIKA-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001042#comment-15001042 ]

Thamme Gowda N commented on TIKA-1787:

With #61, the CoreNLP NER can be activated with the following steps:

- Add the CoreNLP jars and models to the classpath. If you are using Maven, add:
{code}
<dependency>
  <groupId>edu.stanford.nlp</groupId>
  <artifactId>stanford-corenlp</artifactId>
  <version>${corenlp.version}</version>
</dependency>
<dependency>
  <groupId>edu.stanford.nlp</groupId>
  <artifactId>stanford-corenlp</artifactId>
  <version>${corenlp.version}</version>
  <classifier>models</classifier>
</dependency>
{code}
- Set the system property "ner.impl.class" to "org.apache.tika.parser.ner.corenlp.CoreNLPNERecogniser". You can do this either by calling `System.setProperty()` before instantiating the Tika parsers in code, or on the command line by passing "-Dner.impl.class=org.apache.tika.parser.ner.corenlp.CoreNLPNERecogniser" when launching the JVM.
- Activate the NamedEntityParser.

A demo project setup is at: https://github.com/thammegowda/tika-ner-corenlp
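The system-property step in the comment above can be illustrated with a small, self-contained sketch. It only shows the general mechanism of choosing an implementation class by name at runtime; the `Recogniser` interface and both implementations are hypothetical stand-ins and are not Tika's actual classes. The property name "ner.impl.class" is the one given in the comment.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Sketch only: pick an implementation class by name from a system property. */
public class NerImplSelector {

    /** Hypothetical contract for illustration; not Tika's NERecogniser interface. */
    public interface Recogniser {
        List<String> recognise(String text);
    }

    /** Hypothetical fallback used when the property is not set. */
    public static class NoOpRecogniser implements Recogniser {
        @Override
        public List<String> recognise(String text) {
            return Collections.emptyList();
        }
    }

    /** Hypothetical alternative: returns every token that starts with an upper-case letter. */
    public static class CapitalisedWordRecogniser implements Recogniser {
        @Override
        public List<String> recognise(String text) {
            List<String> hits = new ArrayList<>();
            for (String token : text.split("\\s+")) {
                if (!token.isEmpty() && Character.isUpperCase(token.charAt(0))) {
                    hits.add(token);
                }
            }
            return hits;
        }
    }

    /** Reads "ner.impl.class" and instantiates that class through its no-arg constructor. */
    public static Recogniser load() {
        String className = System.getProperty("ner.impl.class", NoOpRecogniser.class.getName());
        try {
            return (Recogniser) Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load recogniser " + className, e);
        }
    }

    public static void main(String[] args) {
        // Has the same effect as launching the JVM with -Dner.impl.class=<class name>.
        System.setProperty("ner.impl.class", CapitalisedWordRecogniser.class.getName());
        Recogniser recogniser = load();
        System.out.println(recogniser.getClass().getSimpleName()
                + " -> " + recogniser.recognise("Tom Brady visited California"));
    }
}
```

The property has to be set before `load()` runs, which mirrors the advice to call `System.setProperty()` before the parsers are instantiated.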
[jira] [Commented] (TIKA-1787) Include Stanford Name Entity Recognition in Tika
[ https://issues.apache.org/jira/browse/TIKA-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993250#comment-14993250 ]

Yueheng He commented on TIKA-1787:

Oh sorry about not noticing that. Thank you for pointing that out, Professor! Please let me know if there is anything I can do.
[jira] [Commented] (TIKA-1787) Include Stanford Name Entity Recognition in Tika
[ https://issues.apache.org/jira/browse/TIKA-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993220#comment-14993220 ]

Chris A. Mattmann commented on TIKA-1787:

Great work as a start, [~Yueheng]! The thing is, directly binding to the library isn't possible due to the NLTK license (GPL): http://nlp.stanford.edu/software/CRF-NER.shtml#Download

However, we can include NLTK in the form that [~thammegowda] did in #61 on Github. That is, he and I talked about a command-line invocation of the tool that we could host on Github and then have Tika call at runtime, which means we wouldn't have to bind to the license. Let me think about this. Thank you!
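The runtime-invocation idea in this comment can be sketched roughly as follows. This is an illustration under assumptions, not the approach that was committed: the external command is whatever the caller supplies, and the tab-separated "TYPE<TAB>entity" output format is invented here purely for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: call an external NER tool as a separate process and parse its output. */
public class ExternalNerRunner {

    /** Starts the given command and returns the lines it writes to standard output. */
    public static List<String> run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Let the child's stderr go to our stderr so it cannot fill up and block the child.
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = builder.start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("External tool exited with status " + exitCode);
        }
        return lines;
    }

    /** Parses lines of the assumed form "TYPE<TAB>entity"; lines without that shape are skipped. */
    public static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> entities = new LinkedHashMap<>();
        for (String line : lines) {
            int tab = line.indexOf('\t');
            if (tab <= 0 || tab == line.length() - 1) {
                continue;
            }
            entities.computeIfAbsent(line.substring(0, tab), k -> new ArrayList<>())
                    .add(line.substring(tab + 1));
        }
        return entities;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java ExternalNerRunner <command> [args...]");
            return;
        }
        // The command is supplied by the caller; no particular tool is assumed here.
        System.out.println(parse(run(Arrays.asList(args))));
    }
}
```

Keeping the process handling (`run`) apart from the output parsing (`parse`) means the parsing can be exercised without the external tool being installed.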
[jira] [Commented] (TIKA-1787) Include Stanford Name Entity Recognition in Tika
[ https://issues.apache.org/jira/browse/TIKA-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992696#comment-14992696 ]

ASF GitHub Bot commented on TIKA-1787:

GitHub user TaichiHo opened a pull request:

https://github.com/apache/tika/pull/62

fix for TIKA-1787 contributed by Yueheng He

Builds successfully with Java 1.8.0_65. To see the effect, create a text file like the following.

```
Good afternoon Rajat Raina, how are you today? Hi, I am Tom Brady. I go to school at Stanford University, which is located in California.
```

Save it as test.ner and feed it to Tika.

```
java -classpath tika-app/target/tika-app-1.12-SNAPSHOT.jar org.apache.tika.cli.TikaCLI -m test.ner
```

The result should look like this:

```
Content-Length: 137
Content-Type: application/stanford-ner
LOCATION: [California]
ORGANIZATION: [Stanford University]
PERSON: [Rajat Raina, Tom Brady]
X-Parsed-By: org.apache.tika.parser.DefaultParser
X-Parsed-By: org.apache.tika.parser.stanfordNer.StanfordNerParser
resourceName: test.ner
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/TaichiHo/tika TIKA-1787

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/tika/pull/62.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #62

commit b94331ece262bb8d8408dda7b22b6dc0bb69557e
Author: Taichi
Date: 2015-11-05T22:47:22Z

fix for TIKA-1787 contributed by Yueheng He
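To make the idea of "extracted entities added to the metadata" concrete, here is a toy sketch that groups recognised names by entity type, in the same shape as the PERSON / ORGANIZATION / LOCATION keys in the pull request's sample output. It deliberately swaps in a few hard-coded regular expressions for the Stanford model, so the patterns and the class name are assumptions made for illustration and say nothing about how the real recogniser works.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy sketch: regex-based entity lookup collected into a multi-valued, metadata-like map. */
public class ToyEntityMetadata {

    // Hypothetical patterns chosen to fit the sample sentence; a real recogniser uses a trained model.
    private static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();
    static {
        PATTERNS.put("PERSON", Pattern.compile("\\b(?:Rajat Raina|Tom Brady)\\b"));
        PATTERNS.put("ORGANIZATION", Pattern.compile("\\b[A-Z][a-z]+ University\\b"));
        PATTERNS.put("LOCATION", Pattern.compile("\\bCalifornia\\b"));
    }

    /** Returns entity type -> distinct matches in order of appearance, with types sorted by name. */
    public static Map<String, Set<String>> extract(String text) {
        Map<String, Set<String>> metadata = new TreeMap<>();
        for (Map.Entry<String, Pattern> entry : PATTERNS.entrySet()) {
            Matcher matcher = entry.getValue().matcher(text);
            while (matcher.find()) {
                metadata.computeIfAbsent(entry.getKey(), k -> new LinkedHashSet<>())
                        .add(matcher.group());
            }
        }
        return metadata;
    }

    public static void main(String[] args) {
        String text = "Good afternoon Rajat Raina, how are you today? Hi, I am Tom Brady. "
                + "I go to school at Stanford University, which is located in California.";
        // Print one "KEY: [values]" line per entity type.
        for (Map.Entry<String, Set<String>> entry : extract(text).entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
```

Each entity type maps to a set of values, which is why PERSON can carry both names from the sample text under a single key.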