The "GrobidQuantitiesParser" page has been changed by GirishRao:
https://wiki.apache.org/tika/GrobidQuantitiesParser

Comment:
Adding new page for grobid quantities parser with tika

New page:
= Grobid Quantities with Tika =

Grobid Quantities is a Java library that recognizes expressions of 
measurement (e.g. pressure, temperature) in text documents, then parses, 
normalizes, and converts them to SI units. It can be used on technical and 
scientific articles (text, XML and PDF input) and patents (text and XML 
input). To use it with Tika, you must install and run the Grobid Quantities 
REST server, which extracts measurements from the text passed to it.


<<TableOfContents()>>

== Installation ==

 '''Steps to install''': Install Grobid Quantities by following the steps on 
[[https://github.com/kermitt2/grobid-quantities|GitHub]], and make sure the 
quantity model is trained as per the instructions provided there.

After installing and training the model, start the REST server as described 
in the next section.


== Start Grobid Quantities Server ==
{{{
$ mvn -Dmaven.test.skip=true jetty:run-war
}}}

By default the server listens on port 8080 and can be reached at 
http://127.0.0.1:8080.
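
Before moving on, it can help to confirm that something is actually answering at that address. The following is only a minimal sketch: `check_grobid_server` is a hypothetical helper name, and it assumes `curl` is installed. It treats any HTTP response as "up" and does not verify that the quantities model is loaded.

```shell
#!/bin/sh
# Hypothetical helper (not part of Grobid or Tika): report whether anything
# answers HTTP at the given URL. Defaults to the address mentioned above.
check_grobid_server() {
    url="${1:-http://127.0.0.1:8080}"
    # --max-time keeps the check from hanging if the port is filtered
    if curl --silent --max-time 5 --output /dev/null "$url"; then
        echo "up: $url"
    else
        echo "down: $url"
        return 1
    fi
}

# Do not abort the shell session if the server is not running yet
check_grobid_server || true
```

If this prints `down`, check the Maven output for errors before continuing with the Tika setup.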

== Preparing resources for Grobid Quantities in Tika-App ==
Complete the two steps below.

 1. '''Activate Named Entity Parser'''
 To use any of the 
[[https://wiki.apache.org/tika/TikaAndNER#Activate_Named_Entity_Parser|NamedEntityParser]]
 implementations in Tika,
 the parser responsible for named entity recognition must be 
enabled. This can be done with a Tika Config XML file, as follows:
 {{{
 <?xml version="1.0" encoding="UTF-8"?>
 <properties>
     <parsers>
         <parser class="org.apache.tika.parser.ner.NamedEntityParser">
             <mime>text/plain</mime>
             <mime>text/html</mime>
             <mime>application/xhtml+xml</mime>
         </parser>
     </parsers>
 </properties>
 }}}
 This configuration is needed in the later steps, so save it as 
'tika-config.xml'.


 2. '''Supply GrobidServer.properties file'''

 Tika needs to know where the ''grobid-quantities-server'' is running. 
By default Tika assumes the server runs on port 8080.
 To specify any other port, you must supply a GrobidServer.properties 
file. A sample GrobidServer.properties file looks like the following:
 {{{
grobid.server.url=http://localhost:8080
grobid.endpoint.text=/processQuantityText
 }}}

 In a nutshell:
 {{{
 # Create a directory for keeping the config and properties files.
 export GROBID_QUANTITIES_RES="$HOME/GrobidQuantitiesRest-resources"
 mkdir -p "$GROBID_QUANTITIES_RES"
 cd "$GROBID_QUANTITIES_RES"
 # tika-config.xml must be stored in this directory (printed by pwd)
 pwd

 # The properties file must sit under the parser's package path
 export PATH_PREFIX="$GROBID_QUANTITIES_RES/org/apache/tika/parser/ner/grobid"
 mkdir -p "$PATH_PREFIX"
 # Create and edit the properties file
 vim "$PATH_PREFIX/GrobidServer.properties"
 }}}
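
If you would rather not open an editor, the properties file can also be written from the shell. This is a sketch under the same directory layout as above; `GROBID_URL` is a hypothetical variable introduced here only so the server address can be overridden. Note that it overwrites any existing GrobidServer.properties.

```shell
#!/bin/sh
# Same layout as the steps above; falls back to the default location
# if GROBID_QUANTITIES_RES is not already exported.
GROBID_QUANTITIES_RES="${GROBID_QUANTITIES_RES:-$HOME/GrobidQuantitiesRest-resources}"
PATH_PREFIX="$GROBID_QUANTITIES_RES/org/apache/tika/parser/ner/grobid"

# GROBID_URL is an assumption of this sketch, not something Tika reads.
GROBID_URL="${GROBID_URL:-http://localhost:8080}"

mkdir -p "$PATH_PREFIX"

# Write the two keys shown in the sample file (overwrites an existing file)
cat > "$PATH_PREFIX/GrobidServer.properties" <<EOF
grobid.server.url=$GROBID_URL
grobid.endpoint.text=/processQuantityText
EOF

# Show the result for a quick visual check
cat "$PATH_PREFIX/GrobidServer.properties"
```

To point Tika at a server on another port, run it as, for example, `GROBID_URL=http://localhost:9090 sh write-props.sh` (the script name is arbitrary).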


== Running Grobid Quantities with Tika ==
{{{

export TIKA_APP={your/path/to/tika-app}/target/tika-app-1.13-SNAPSHOT.jar

# Set the system property to use the GrobidNERecogniser class.
# -m prints only the metadata of the parsed document.
java -Dner.impl.class=org.apache.tika.parser.ner.grobid.GrobidNERecogniser \
  -classpath "$GROBID_QUANTITIES_RES:$TIKA_APP" org.apache.tika.cli.TikaCLI \
  --config="$GROBID_QUANTITIES_RES/tika-config.xml" -m \
  https://en.wikipedia.org/wiki/Time

}}}
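
If the Tika run shows no measurements, it may be worth querying the Grobid endpoint directly, using the same GrobidServer.properties file that Tika reads. The sketch below makes several assumptions: `prop` is a hypothetical helper, the sample sentence is made up, and the request shape (a POST with a form-encoded `text` field) is my understanding of how the endpoint is called rather than something this page documents, so check the Grobid Quantities README if the request is rejected.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one key from a Java-style
# properties file ($1 = key, $2 = file). Only handles simple key=value lines.
prop() {
    awk -F= -v k="$1" '$1 == k { sub(/^[^=]*=/, ""); print; exit }' "$2"
}

GROBID_QUANTITIES_RES="${GROBID_QUANTITIES_RES:-$HOME/GrobidQuantitiesRest-resources}"
PROPS="$GROBID_QUANTITIES_RES/org/apache/tika/parser/ner/grobid/GrobidServer.properties"

if [ -f "$PROPS" ]; then
    # Build the endpoint URL from the two keys in the properties file
    url="$(prop grobid.server.url "$PROPS")$(prop grobid.endpoint.text "$PROPS")"
    # Assumption: the endpoint accepts a form-encoded "text" field via POST
    curl --silent --max-time 10 \
        --data-urlencode "text=The pressure was 10 kPa." "$url" \
        || echo "request to $url failed" >&2
else
    echo "no properties file at $PROPS" >&2
fi
```

A failed request here points at the server or the properties file rather than at the Tika configuration.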
