FW: [jira] [Commented] (TIKA-1787) Include Stanford Name Entity Recognition in Tika

2015-11-17 Thread Mattmann, Chris A (3980)
Build back to normal after Thamme and I fixed this.

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++







[jira] [Commented] (TIKA-1787) Include Stanford Name Entity Recognition in Tika

2015-11-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010200#comment-15010200
 ] 

Hudson commented on TIKA-1787:
--

SUCCESS: Integrated in tika-trunk-jdk1.7 #889 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/889/])
Fix for TIKA-1787: Include Stanford Name Entity Recognition in Tika contributed 
by Thamme Gowda N and Yueheng He this closes #61 this closes #62 (mattmann: 
[http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1714931])
* trunk/tika-parsers/pom.xml


> Include Stanford Name Entity Recognition in Tika
> 
>
>                 Key: TIKA-1787
>                 URL: https://issues.apache.org/jira/browse/TIKA-1787
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime, parser
>    Affects Versions: 1.12
>         Environment: Java 1.8, Mac OSX 10.11
>            Reporter: Yueheng He
>            Assignee: Chris A. Mattmann
>              Labels: features, newbie, test
>             Fix For: 1.12
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Using the Stanford Name Entity Recognition, Tika will be able to extract name 
> entities like PERSON, ORGANIZATION, LOCATION, etc from the given text. The 
> extracted name entities will be added to the metadata



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: FW: [jira] [Commented] (TIKA-1787) Include Stanford Name Entity Recognition in Tika

2015-11-17 Thread Thamme Gowda N.


-- 
-
ThammeGowda N


FW: [jira] [Commented] (TIKA-1787) Include Stanford Name Entity Recognition in Tika

2015-11-17 Thread Mattmann, Chris A (3980)
Thamme, can you have a look here:

https://builds.apache.org/job/tika-trunk-jdk1.7/887/org.apache.tika$tika-parsers/testReport/junit/org.apache.tika.parser.ner/NamedEntityParserTest/testParse/


Tests seem to be failing (worked for me locally maybe b/c I had
already downloaded the models?)

Cheers,
Chris



[jira] [Commented] (TIKA-1787) Include Stanford Name Entity Recognition in Tika

2015-11-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009116#comment-15009116
 ] 

Hudson commented on TIKA-1787:
--

UNSTABLE: Integrated in tika-trunk-jdk1.7 #887 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/887/])
Fix for TIKA-1787: Include Stanford Name Entity Recognition in Tika contributed 
by Thamme Gowda N and Yueheng He this closes #61 this closes #62 (mattmann: 
[http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1714835])
* trunk/.gitignore
* trunk/CHANGES.txt
* trunk/tika-parsers/pom.xml
* trunk/tika-parsers/src/main/java/org/apache/tika/parser/ner
* trunk/tika-parsers/src/main/java/org/apache/tika/parser/ner/NERecogniser.java
* trunk/tika-parsers/src/main/java/org/apache/tika/parser/ner/NamedEntityParser.java
* trunk/tika-parsers/src/main/java/org/apache/tika/parser/ner/corenlp
* trunk/tika-parsers/src/main/java/org/apache/tika/parser/ner/corenlp/CoreNLPNERecogniser.java
* trunk/tika-parsers/src/main/java/org/apache/tika/parser/ner/opennlp
* trunk/tika-parsers/src/main/java/org/apache/tika/parser/ner/opennlp/OpenNLPNERecogniser.java
* trunk/tika-parsers/src/main/java/org/apache/tika/parser/ner/opennlp/OpenNLPNameFinder.java
* trunk/tika-parsers/src/main/java/org/apache/tika/parser/ner/regex
* trunk/tika-parsers/src/main/java/org/apache/tika/parser/ner/regex/RegexNERecogniser.java
* trunk/tika-parsers/src/main/resources/org/apache/tika/parser/ner
* trunk/tika-parsers/src/main/resources/org/apache/tika/parser/ner/regex
* trunk/tika-parsers/src/main/resources/org/apache/tika/parser/ner/regex/ner-regex.txt
* trunk/tika-parsers/src/test/java/org/apache/tika/parser/ner
* trunk/tika-parsers/src/test/java/org/apache/tika/parser/ner/NamedEntityParserTest.java
* trunk/tika-parsers/src/test/java/org/apache/tika/parser/ner/regex
* trunk/tika-parsers/src/test/java/org/apache/tika/parser/ner/regex/RegexNERecogniserTest.java
* trunk/tika-parsers/src/test/resources/org/apache/tika/parser
* trunk/tika-parsers/src/test/resources/org/apache/tika/parser/ner
* trunk/tika-parsers/src/test/resources/org/apache/tika/parser/ner/opennlp
* trunk/tika-parsers/src/test/resources/org/apache/tika/parser/ner/opennlp/ModelGetter.groovy
* trunk/tika-parsers/src/test/resources/org/apache/tika/parser/ner/opennlp/get-models.sh
* trunk/tika-parsers/src/test/resources/org/apache/tika/parser/ner/regex
* trunk/tika-parsers/src/test/resources/org/apache/tika/parser/ner/regex/ner-regex.txt
* trunk/tika-parsers/src/test/resources/org/apache/tika/parser/ner/tika-config.xml




[jira] [Commented] (TIKA-1787) Include Stanford Name Entity Recognition in Tika

2015-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009110#comment-15009110
 ] 

ASF GitHub Bot commented on TIKA-1787:
--

Github user asfgit closed the pull request at:

https://github.com/apache/tika/pull/61




[jira] [Commented] (TIKA-1787) Include Stanford Name Entity Recognition in Tika

2015-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009111#comment-15009111
 ] 

ASF GitHub Bot commented on TIKA-1787:
--

Github user asfgit closed the pull request at:

https://github.com/apache/tika/pull/62




[jira] [Commented] (TIKA-1787) Include Stanford Name Entity Recognition in Tika

2015-11-11 Thread Thamme Gowda N (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001042#comment-15001042
 ] 

Thamme Gowda N commented on TIKA-1787:
--

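With #61, the CoreNLP NER can be activated with the following steps:

- Add the CoreNLP jars and models to the classpath. If you are using Maven, add:
{code}
<dependency>
  <groupId>edu.stanford.nlp</groupId>
  <artifactId>stanford-corenlp</artifactId>
  <version>${corenlp.version}</version>
</dependency>
<dependency>
  <groupId>edu.stanford.nlp</groupId>
  <artifactId>stanford-corenlp</artifactId>
  <version>${corenlp.version}</version>
  <classifier>models</classifier>
</dependency>
{code}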

- Set the system property "ner.impl.class" to "org.apache.tika.parser.ner.corenlp.CoreNLPNERecogniser".
   You can do this either by calling `System.setProperty()` before instantiating the Tika parsers in code, or on the command line by passing "-Dner.impl.class=org.apache.tika.parser.ner.corenlp.CoreNLPNERecogniser" when launching the JVM.

- Activate the NamedEntityParser

A demo project setup is at : https://github.com/thammegowda/tika-ner-corenlp
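
The property-setting step above can be sketched in Java roughly as follows. This is only a minimal sketch: the property name and recogniser class name are taken from the steps above, while the class `NerSetup` and the method `selectCoreNlpRecogniser` are hypothetical names used for illustration. The Tika parser construction that would follow is deliberately left out.

```java
// Hypothetical helper; not part of Tika.
final class NerSetup {
    // Property name and value as given in the steps above.
    static final String NER_IMPL_PROPERTY = "ner.impl.class";
    static final String CORENLP_RECOGNISER =
            "org.apache.tika.parser.ner.corenlp.CoreNLPNERecogniser";

    /** Sets the system property and returns the value now in effect. */
    static String selectCoreNlpRecogniser() {
        System.setProperty(NER_IMPL_PROPERTY, CORENLP_RECOGNISER);
        return System.getProperty(NER_IMPL_PROPERTY);
    }

    public static void main(String[] args) {
        // This has to run before any Tika parser is instantiated.
        String active = selectCoreNlpRecogniser();
        System.out.println(NER_IMPL_PROPERTY + "=" + active);
        // Tika parser construction would follow here (not shown).
    }
}
```

Passing the "-D" flag on the command line has the same effect without any code change, which is usually the simpler choice when the application is not under your control.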







[jira] [Commented] (TIKA-1787) Include Stanford Name Entity Recognition in Tika

2015-11-05 Thread Yueheng He (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993250#comment-14993250
 ] 

Yueheng He commented on TIKA-1787:
--

Oh sorry about not noticing that. Thank you for pointing that out, Professor! 

Please let me know if there is anything I can do.



[jira] [Commented] (TIKA-1787) Include Stanford Name Entity Recognition in Tika

2015-11-05 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993220#comment-14993220
 ] 

Chris A. Mattmann commented on TIKA-1787:
-

Great work as a start, [~Yueheng]! The thing is, directly binding to the 
library isn't possible due to the NLTK license (GPL): 
http://nlp.stanford.edu/software/CRF-NER.shtml#Download

However, we can include NLTK in the form that [~thammegowda] did in #61 on 
GitHub. That is, he and I talked about a command-line invocation of the tool 
that we could host on GitHub and then have Tika call at runtime, which means 
we wouldn't have to bind to the license. 

Let me think about this. Thank you!
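
A rough Java sketch of the runtime command-line invocation idea described above, assuming the external tool reads text on stdin and writes its result to stdout. The class `ExternalNerRunner`, the method `run`, and whatever command is passed in are hypothetical; this is not how Tika implements it, and it only suits short inputs because stdin is written fully before stdout is read.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper; not part of Tika.
final class ExternalNerRunner {
    /** Runs the command, sends text to its stdin, and returns its stdout. */
    static String run(List<String> command, String text) {
        try {
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true) // merge stderr into stdout
                    .start();
            // Closing stdin signals end of input to the tool.
            try (OutputStream stdin = process.getOutputStream()) {
                stdin.write(text.getBytes(StandardCharsets.UTF_8));
            }
            String output = new String(
                    process.getInputStream().readAllBytes(),
                    StandardCharsets.UTF_8);
            int exit = process.waitFor();
            if (exit != 0) {
                throw new IllegalStateException("exit " + exit + ": " + output);
            }
            return output;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
    }
}
```

Because the tool runs in a separate process, the calling code depends only on its command-line contract rather than on its classes.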



[jira] [Commented] (TIKA-1787) Include Stanford Name Entity Recognition in Tika

2015-11-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992696#comment-14992696
 ] 

ASF GitHub Bot commented on TIKA-1787:
--

GitHub user TaichiHo opened a pull request:

https://github.com/apache/tika/pull/62

fix for TIKA-1787 contributed by Yueheng He

Builds successfully with Java 1.8.0_65. 
To see the effect, create a text file like the following. 
```
Good afternoon Rajat Raina, how are you today? Hi, I am Tom Brady. I go to school at Stanford University, which is located in California.
```
Save it as test.ner and feed it to Tika. 
```
java -classpath tika-app/target/tika-app-1.12-SNAPSHOT.jar org.apache.tika.cli.TikaCLI -m test.ner
```
The result should look like this
```
Content-Length: 137
Content-Type: application/stanford-ner
LOCATION: [California]
ORGANIZATION: [Stanford University]
PERSON: [Rajat Raina, Tom Brady]
X-Parsed-By: org.apache.tika.parser.DefaultParser
X-Parsed-By: org.apache.tika.parser.stanfordNer.StanfordNerParser
resourceName: test.ner
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/TaichiHo/tika TIKA-1787

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/tika/pull/62.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #62


commit b94331ece262bb8d8408dda7b22b6dc0bb69557e
Author: Taichi 
Date:   2015-11-05T22:47:22Z

fix for TIKA-1787 contributed by Yueheng He



