Author: jukka
Date: Wed Oct  5 13:17:51 2011
New Revision: 1179211

URL: http://svn.apache.org/viewvc?rev=1179211&view=rev
Log:
site: Update 0.10 instructions

Modified:
    tika/site/pom.xml
    tika/site/publish/0.10/gettingstarted.html
    tika/site/publish/download.html
    tika/site/publish/index.html
    tika/site/src/site/apt/0.10/gettingstarted.apt
    tika/site/src/site/apt/download.apt
    tika/site/src/site/apt/index.apt

Modified: tika/site/pom.xml
URL: http://svn.apache.org/viewvc/tika/site/pom.xml?rev=1179211&r1=1179210&r2=1179211&view=diff
==============================================================================
--- tika/site/pom.xml (original)
+++ tika/site/pom.xml Wed Oct  5 13:17:51 2011
@@ -28,7 +28,7 @@
   <parent>
     <groupId>org.apache.tika</groupId>
     <artifactId>tika-parent</artifactId>
-    <version>0.9</version>
+    <version>0.10</version>
   </parent>
 
   <artifactId>tika-site</artifactId>

Modified: tika/site/publish/0.10/gettingstarted.html
URL: http://svn.apache.org/viewvc/tika/site/publish/0.10/gettingstarted.html?rev=1179211&r1=1179210&r2=1179211&view=diff
==============================================================================
--- tika/site/publish/0.10/gettingstarted.html (original)
+++ tika/site/publish/0.10/gettingstarted.html Wed Oct  5 13:17:51 2011
@@ -122,33 +122,61 @@
   ... &lt;!-- your other classpath entries --&gt;
   &lt;pathelement location=&quot;path/to/tika-core-0.10.jar&quot;/&gt;
   &lt;pathelement location=&quot;path/to/tika-parsers-0.10.jar&quot;/&gt;
+  &lt;pathelement location=&quot;path/to/netcdf-4.2-min.jar&quot;/&gt;
+  &lt;pathelement location=&quot;path/to/slf4j-api-1.5.6.jar&quot;/&gt;
+  &lt;pathelement location=&quot;path/to/apache-mime4j-core-0.7.jar&quot;/&gt;
+  &lt;pathelement location=&quot;path/to/apache-mime4j-dom-0.7.jar&quot;/&gt;
+  &lt;pathelement location=&quot;path/to/commons-compress-1.1.jar&quot;/&gt;
+  &lt;pathelement location=&quot;path/to/commons-codec-1.4.jar&quot;/&gt;
+  &lt;pathelement location=&quot;path/to/pdfbox-1.6.0.jar&quot;/&gt;
+  &lt;pathelement location=&quot;path/to/fontbox-1.6.0.jar&quot;/&gt;
+  &lt;pathelement location=&quot;path/to/jempbox-1.6.0.jar&quot;/&gt;
   &lt;pathelement location=&quot;path/to/commons-logging-1.1.1.jar&quot;/&gt;
-  &lt;pathelement location=&quot;path/to/commons-compress-1.0.jar&quot;/&gt;
-  &lt;pathelement location=&quot;path/to/pdfbox-0.10.0-incubating.jar&quot;/&gt;
-  &lt;pathelement location=&quot;path/to/fontbox-0.10.0-incubator.jar&quot;/&gt;
-  &lt;pathelement location=&quot;path/to/jempbox-0.10.0-incubator.jar&quot;/&gt;
-  &lt;pathelement location=&quot;path/to/poi-3.6.jar&quot;/&gt;
-  &lt;pathelement location=&quot;path/to/poi-scratchpad-3.6.jar&quot;/&gt;
-  &lt;pathelement location=&quot;path/to/poi-ooxml-3.6.jar&quot;/&gt;
-  &lt;pathelement location=&quot;path/to/poi-ooxml-schemas-3.6.jar&quot;/&gt;
+  &lt;pathelement location=&quot;path/to/poi-3.8-beta4.jar&quot;/&gt;
+  &lt;pathelement location=&quot;path/to/poi-scratchpad-3.8-beta4.jar&quot;/&gt;
+  &lt;pathelement location=&quot;path/to/poi-ooxml-3.8-beta4.jar&quot;/&gt;
+  &lt;pathelement location=&quot;path/to/poi-ooxml-schemas-3.8-beta4.jar&quot;/&gt;
   &lt;pathelement location=&quot;path/to/xmlbeans-2.3.0.jar&quot;/&gt;
   &lt;pathelement location=&quot;path/to/dom4j-1.6.1.jar&quot;/&gt;
-  &lt;pathelement location=&quot;path/to/xml-apis-1.0.b2.jar&quot;/&gt;
  &lt;pathelement location=&quot;path/to/geronimo-stax-api_1.0_spec-1.0.jar&quot;/&gt;
-  &lt;pathelement location=&quot;path/to/tagsoup-1.2.jar&quot;/&gt;
+  &lt;pathelement location=&quot;path/to/tagsoup-1.2.1.jar&quot;/&gt;
   &lt;pathelement location=&quot;path/to/asm-3.1.jar&quot;/&gt;
-  &lt;pathelement location=&quot;path/to/log4j-1.2.14.jar&quot;/&gt;
  &lt;pathelement location=&quot;path/to/metadata-extractor-2.4.0-beta-1.jar&quot;/&gt;
+  &lt;pathelement location=&quot;path/to/boilerpipe-1.1.0.jar&quot;/&gt;
+  &lt;pathelement location=&quot;path/to/rome-0.9.jar&quot;/&gt;
+  &lt;pathelement location=&quot;path/to/jdom-1.0.jar&quot;/&gt;
 &lt;/classpath&gt;</pre></div><p>An easy way to gather all these libraries is 
to run &quot;mvn dependency:copy-dependencies&quot; in the tika-parsers source 
directory. This will copy all Tika dependencies to the 
<tt>target/dependencies</tt> directory.</p><p>Alternatively you can simply drop 
the entire tika-app jar to your classpath to get all of the above dependencies 
in a single archive.</p></div><div class="section"><h2>Using Tika as a command 
line utility<a name="Using_Tika_as_a_command_line_utility"></a></h2><p>The Tika 
application jar (tika-app-0.10.jar) can be used as a command line utility for 
extracting text content and metadata from all sorts of files. This runnable jar 
contains all the dependencies it needs, so you don't need to worry about 
classpath settings to run it.</p><p>The usage instructions are shown 
below.</p><div><pre>usage: java -jar tika-app-0.10.jar [option] [file]
 
 Options:
-    -? or --help       Print this usage message
-    -v or --verbose    Print debug level messages
-    -g or --gui        Start the Apache Tika GUI
-    -x or --xml        Output XHTML content (default)
-    -h or --html       Output HTML content
-    -t or --text       Output plain text content
-    -m or --metadata   Output only metadata
+    -?  or --help          Print this usage message
+    -v  or --verbose       Print debug level messages
+
+    -g  or --gui           Start the Apache Tika GUI
+    -s  or --server        Start the Apache Tika server
+
+    -x  or --xml           Output XHTML content (default)
+    -h  or --html          Output HTML content
+    -j  or --json          Output JSON content
+    -t  or --text          Output plain text content
+    -T  or --text-main     Output plain text content (main content only)
+    -m  or --metadata      Output only metadata
+    -l  or --language      Output only language
+    -d  or --detect        Detect document type
+    -eX or --encoding=X    Use output encoding X
+    -z  or --extract       Extract all attachments into current directory
+    -r  or --pretty-print  For XML and XHTML outputs, adds newlines and
+                           whitespace, for better readability
+
+    --create-profile=X
+         Create NGram profile, where X is a profile name
+    --list-parsers
+         List the available document parsers
+    --list-parser-details
+         List the available document parsers, and their supported mime types
+    --list-met-models
+         List the available metadata models, and their supported keys
+    --list-supported-types
+         List all known media types and related information
 
 Description:
     Apache Tika will parse the file(s) specified on the
@@ -160,12 +188,21 @@ Description:
 
     If no file name or URL is specified (or the special
     name &quot;-&quot; is used), then the standard input stream
-    is parsed.
+    is parsed. If no arguments were given and no input
+    data is available, the GUI is started instead.
+
+- GUI mode
+
+    Use the &quot;--gui&quot; (or &quot;-g&quot;) option to start the
+    Apache Tika GUI. You can drag and drop files from
+    a normal file explorer to the GUI window to extract
+    text content and metadata from the files.
+
+- Server mode
 
-    Use the &quot;--gui&quot; (or &quot;-g&quot;) option to start
-    the Apache Tika GUI. You can drag and drop files
-    from a normal file explorer to the GUI window to
-    extract text content and metadata from the files.</pre></div><p>You can 
also use the jar as a component in a Unix pipeline or as an external tool in 
many scripting languages.</p><div><pre># Check if an Internet resource contains 
a specific keyword
+    Use the &quot;--server&quot; (or &quot;-s&quot;) option to start the
+    Apache Tika server. The server will listen to the
+    ports you specify as one or more arguments.</pre></div><p>You can also use 
the jar as a component in a Unix pipeline or as an external tool in many 
scripting languages.</p><div><pre># Check if an Internet resource contains a 
specific keyword
 curl http://.../document.doc \
   | java -jar tika-app-0.10.jar --text \
   | grep -q keyword</pre></div></div>
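
The "external tool" usage described in the page text above can also be sketched from Java with the standard ProcessBuilder API. This is only an illustration under stated assumptions: the TikaCli class and its method names are hypothetical, the jar path is wherever you saved tika-app-0.10.jar, and the output is decoded with the platform default charset. Only the --text option from the usage message is relied on.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Tika: runs tika-app as a child process.
public class TikaCli {

    // Builds "java -jar <jarPath> <option> <file>", matching the usage line
    // "java -jar tika-app-0.10.jar [option] [file]".
    static List<String> command(String jarPath, String option, String file) {
        return Arrays.asList("java", "-jar", jarPath, option, file);
    }

    // Runs the command and returns everything the process wrote to stdout.
    static String run(String jarPath, String option, String file)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command(jarPath, option, file));
        // Let error messages from the child process show up on our stderr.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = pb.start();

        StringBuilder out = new StringBuilder();
        // Assumption: the output uses the platform default charset.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        int exit = process.waitFor();
        if (exit != 0) {
            throw new IOException("tika-app exited with status " + exit);
        }
        return out.toString();
    }

    // Usage: java TikaCli path/to/tika-app-0.10.jar document.doc keyword
    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("usage: TikaCli <tika-app jar> <file> <keyword>");
            System.exit(2);
        }
        String text = run(args[0], "--text", args[1]);
        System.out.println(text.contains(args[2]) ? "found" : "not found");
    }
}
```

Keeping the command construction in its own method makes it easy to swap --text for another option from the usage message, such as --metadata.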

Modified: tika/site/publish/download.html
URL: http://svn.apache.org/viewvc/tika/site/publish/download.html?rev=1179211&r1=1179210&r2=1179211&view=diff
==============================================================================
--- tika/site/publish/download.html (original)
+++ tika/site/publish/download.html Wed Oct  5 13:17:51 2011
@@ -84,7 +84,7 @@
                 width="387" height="100"/></a>
       </div>
       <div id="content">
-        <!-- Licensed to the Apache Software Foundation (ASF) under one or 
more --><!-- contributor license agreements.  See the NOTICE file distributed 
with --><!-- this work for additional information regarding copyright 
ownership. --><!-- The ASF licenses this file to You under the Apache License, 
Version 2.0 --><!-- (the "License"); you may not use this file except in 
compliance with --><!-- the License.  You may obtain a copy of the License at 
--><!--  --><!-- http://www.apache.org/licenses/LICENSE-2.0 --><!--  --><!-- 
Unless required by applicable law or agreed to in writing, software --><!-- 
distributed under the License is distributed on an "AS IS" BASIS, --><!-- 
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
--><!-- See the License for the specific language governing permissions and 
--><!-- limitations under the License. --><div class="section"><h2>Download 
Apache Tika<a name="Download_Apache_Tika"></a></h2><p>Apache Tika 0.10 is now 
available. See the <a class="externalLink" 
href="http://www.apache.org/dist/tika/CHANGES-0.10.txt";>CHANGES.txt</a> file 
for more information on the list of updates in this initial 
release.</p><ul><li><a class="externalLink" 
href="http://www.apache.org/dyn/closer.cgi/tika/apache-tika-0.10-src.zip";>apache-tika-0.10-src.zip</a>
 (source archive, <a class="externalLink" 
href="http://www.apache.org/dist/tika/apache-tika-0.10-src.zip.asc";>PGP 
signature</a>)<br />SHA1: <tt>355d0b2fa0de232672e4760941ea0dcf641a82ad</tt><br 
/>MD5: <tt>96fb7db1b0c93d1e958a2ee52c4bd02f</tt></li><li><a 
class="externalLink" 
href="http://www.apache.org/dyn/closer.cgi/tika/0.10/tika-app-0.10.jar";>tika-app-0.10.jar</a>
 (runnable jar, <a class="externalLink" 
href="http://www.apache.org/dist/tika/0.10/tika-app-0.10.jar.asc";>PGP 
signature</a>)<br />SHA1: <tt>e1ad4e6cc4601c1c1367c646b2fbc57788664bed</tt><br 
/>MD5: <tt>d4b1136ddedc3ae2f9af778cea42c219</tt></li></ul><p>Apache Tika 
releases are available under the <a class="externalLink" 
href="http://www.apache.org/licenses/LICENSE-2.0">Apache 
License, Version 2.0</a>. See the NOTICE.txt file contained in each release 
artifact for applicable copyright attribution notices.</p><p>If you are looking 
for previous releases of Apache Tika, have a look in the <a 
class="externalLink" 
href="http://archive.apache.org/dist/tika/";>archives</a>.</p><p>If you are 
looking for releases of Apache Tika from the Apache Lucene project (pre-0.8 
releases), have a look in the <a class="externalLink" 
href="http://archive.apache.org/dist/lucene/tika/";>lucene archives</a>. If you 
are looking for releases of ApacheTika from the Apache Incubator (pre-0.2 
releases), have a look in the <a class="externalLink" 
href="http://archive.apache.org/dist/incubator/tika/";>incubator 
archives</a>.</p></div><div class="section"><h2>Export control<a 
name="Export_control"></a></h2><p>Apache Tika includes cryptographic software. 
The country in which you currently reside may have restrictions on the 
import, possession, use, and/or re-export to another country, of 
encryption software. BEFORE using any encryption software, please check your 
country's laws, regulations and policies concerning the import, possession, or 
use, and re-export of encryption software, to see if this is permitted. See 
&lt;<a class="externalLink" 
href="http://www.wassenaar.org/";>http://www.wassenaar.org/</a>&gt; for more 
information.</p><p>The U.S. Government Department of Commerce, Bureau of 
Industry and Security (BIS), has classified this software as Export Commodity 
Control Number (ECCN) 5D002.C.1, which includes information security software 
using or performing cryptographic functions with asymmetric algorithms. The 
form and manner of this Apache Software Foundation distribution makes it 
eligible for export under the License Exception ENC Technology Software 
Unrestricted (TSU) exception (see the BIS Export Administration Regulations, 
Section 740.13) for both object code and source 
 code.</p><p>The following provides more details on the included cryptographic 
software:</p><ul><li>Apache Tika uses the Bouncy Castle generic encryption 
libraries for extracting text content and metadata from encrypted PDF files. 
See <a class="externalLink" 
href="http://www.bouncycastle.org/";>http://www.bouncycastle.org/</a> for more 
details on Bouncy Castle.</li></ul></div><div class="section"><h2>Verify<a 
name="Verify"></a></h2><p>It is essential that you verify the integrity of the 
downloaded files using the PGP signatures. Please read <a class="externalLink" 
href="http://httpd.apache.org/dev/verification.html";>Verifying Apache HTTP 
Server Releases</a> for more information on why you should verify our 
releases.</p><p>The PGP signatures can be verified using PGP or GPG. First 
download the KEYS file as well as the .asc signature files for the relevant 
release packages. Make sure you get these files from the main distribution 
directory, rather than from a mirror. Then verify the signatures 
using</p><div class="source"><pre>% pgpk -a KEYS
+        <!-- Licensed to the Apache Software Foundation (ASF) under one or 
more --><!-- contributor license agreements.  See the NOTICE file distributed 
with --><!-- this work for additional information regarding copyright 
ownership. --><!-- The ASF licenses this file to You under the Apache License, 
Version 2.0 --><!-- (the "License"); you may not use this file except in 
compliance with --><!-- the License.  You may obtain a copy of the License at 
--><!--  --><!-- http://www.apache.org/licenses/LICENSE-2.0 --><!--  --><!-- 
Unless required by applicable law or agreed to in writing, software --><!-- 
distributed under the License is distributed on an "AS IS" BASIS, --><!-- 
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
--><!-- See the License for the specific language governing permissions and 
--><!-- limitations under the License. --><div class="section"><h2>Download 
Apache Tika<a name="Download_Apache_Tika"></a></h2><p>Apache Tika 0.10 is now 
available. See the <a class="externalLink" 
href="http://www.apache.org/dist/tika/CHANGES-0.10.txt";>CHANGES.txt</a> file 
for more information on the list of updates in this initial 
release.</p><ul><li><a class="externalLink" 
href="http://www.apache.org/dyn/closer.cgi/tika/apache-tika-0.10-src.zip";>apache-tika-0.10-src.zip</a>
 (source archive, <a class="externalLink" 
href="http://www.apache.org/dist/tika/apache-tika-0.10-src.zip.asc";>PGP 
signature</a>)<br />SHA1: <tt>355d0b2fa0de232672e4760941ea0dcf641a82ad</tt><br 
/>MD5: <tt>96fb7db1b0c93d1e958a2ee52c4bd02f</tt></li><li><a 
class="externalLink" 
href="http://www.apache.org/dyn/closer.cgi/tika/tika-app-0.10.jar";>tika-app-0.10.jar</a>
 (runnable jar, <a class="externalLink" 
href="http://www.apache.org/dist/tika/tika-app-0.10.jar.asc";>PGP 
signature</a>)<br />SHA1: <tt>e1ad4e6cc4601c1c1367c646b2fbc57788664bed</tt><br 
/>MD5: <tt>d4b1136ddedc3ae2f9af778cea42c219</tt></li></ul><p>Apache Tika 
releases are available under the <a class="externalLink" 
href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, 
Version 2.0</a>. See the NOTICE.txt file contained in each release artifact for 
applicable copyright attribution notices.</p><p>If you are looking for previous 
releases of Apache Tika, have a look in the <a class="externalLink" 
href="http://archive.apache.org/dist/tika/";>archives</a>.</p><p>If you are 
looking for releases of Apache Tika from the Apache Lucene project (pre-0.8 
releases), have a look in the <a class="externalLink" 
href="http://archive.apache.org/dist/lucene/tika/";>lucene archives</a>. If you 
are looking for releases of ApacheTika from the Apache Incubator (pre-0.2 
releases), have a look in the <a class="externalLink" 
href="http://archive.apache.org/dist/incubator/tika/";>incubator 
archives</a>.</p></div><div class="section"><h2>Export control<a 
name="Export_control"></a></h2><p>Apache Tika includes cryptographic software. 
The country in which you currently reside may have restrictions on the 
import, possession, use, and/or re-export to another country, of encryption 
software. BEFORE using any encryption software, please check your country's 
laws, regulations and policies concerning the import, possession, or use, and 
re-export of encryption software, to see if this is permitted. See &lt;<a 
class="externalLink" 
href="http://www.wassenaar.org/";>http://www.wassenaar.org/</a>&gt; for more 
information.</p><p>The U.S. Government Department of Commerce, Bureau of 
Industry and Security (BIS), has classified this software as Export Commodity 
Control Number (ECCN) 5D002.C.1, which includes information security software 
using or performing cryptographic functions with asymmetric algorithms. The 
form and manner of this Apache Software Foundation distribution makes it 
eligible for export under the License Exception ENC Technology Software 
Unrestricted (TSU) exception (see the BIS Export Administration Regulations, 
Section 740.13) for both object code and source code.</p>
<p>The following provides more details on the included cryptographic 
software:</p><ul><li>Apache Tika uses the Bouncy Castle generic encryption 
libraries for extracting text content and metadata from encrypted PDF files. 
See <a class="externalLink" 
href="http://www.bouncycastle.org/";>http://www.bouncycastle.org/</a> for more 
details on Bouncy Castle.</li></ul></div><div class="section"><h2>Verify<a 
name="Verify"></a></h2><p>It is essential that you verify the integrity of the 
downloaded files using the PGP signatures. Please read <a class="externalLink" 
href="http://httpd.apache.org/dev/verification.html";>Verifying Apache HTTP 
Server Releases</a> for more information on why you should verify our 
releases.</p><p>The PGP signatures can be verified using PGP or GPG. First 
download the KEYS file as well as the .asc signature files for the relevant 
release packages. Make sure you get these files from the main distribution 
directory, rather than from a mirror. Then verify the signatures 
using</p><div class="source"><pre>% pgpk -a KEYS
 % pgpv apache-tika-X.Y.Z.tar.gz.asc</pre></div><p>or</p><div 
class="source"><pre>% pgp -ka KEYS
 % pgp apache-tika-X.Y.Z.tar.gz.asc</pre></div><p>or</p><div 
class="source"><pre>% gpg --import KEYS
 % gpg --verify apache-tika-X.Y.Z.tar.gz.asc</pre></div></div>
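
As a complement to the PGP check described above, the SHA1 digests published on the download page can be compared against a locally computed digest. The sketch below uses only the JDK's MessageDigest; the Sha1Check class name and its command-line arguments are hypothetical, and a matching checksum does not replace verifying the PGP signature.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Tika: compares a file's SHA-1 digest
// with an expected hex string such as the one listed on the download page.
public class Sha1Check {

    // Streams the input through SHA-1 and returns the lowercase hex digest.
    static String sha1Hex(InputStream in)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-1");
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            digest.update(buffer, 0, read);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    // Usage: java Sha1Check tika-app-0.10.jar <expected SHA1 from download page>
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: Sha1Check <file> <expected-sha1-hex>");
            System.exit(2);
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            String actual = sha1Hex(in);
            boolean ok = actual.equalsIgnoreCase(args[1]);
            System.out.println(ok ? "OK" : "MISMATCH: " + actual);
            System.exit(ok ? 0 : 1);
        }
    }
}
```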

Modified: tika/site/publish/index.html
URL: http://svn.apache.org/viewvc/tika/site/publish/index.html?rev=1179211&r1=1179210&r2=1179211&view=diff
==============================================================================
--- tika/site/publish/index.html (original)
+++ tika/site/publish/index.html Wed Oct  5 13:17:51 2011
@@ -84,7 +84,7 @@
                 width="387" height="100"/></a>
       </div>
       <div id="content">
-        <!-- Licensed to the Apache Software Foundation (ASF) under one or 
more --><!-- contributor license agreements.  See the NOTICE file distributed 
with --><!-- this work for additional information regarding copyright 
ownership. --><!-- The ASF licenses this file to You under the Apache License, 
Version 2.0 --><!-- (the "License"); you may not use this file except in 
compliance with --><!-- the License.  You may obtain a copy of the License at 
--><!--  --><!-- http://www.apache.org/licenses/LICENSE-2.0 --><!--  --><!-- 
Unless required by applicable law or agreed to in writing, software --><!-- 
distributed under the License is distributed on an "AS IS" BASIS, --><!-- 
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
--><!-- See the License for the specific language governing permissions and 
--><!-- limitations under the License. --><div class="section"><h2>Apache Tika 
- a content analysis toolkit<a 
name="Apache_Tika_-_a_content_analysis_toolkit"></a></h2><p>The Apache Tika&#x2122; toolkit detects and extracts metadata and 
structured text content from various documents using existing parser libraries. 
You can find the latest release on the <a href="./download.html">download 
page</a>. See the <a href="./0.9/gettingstarted.html">Getting Started</a> guide 
for instructions on how to start using Tika.</p><p>Tika is a project of the <a 
class="externalLink" href="http://www.apache.org/";>Apache Software 
Foundation</a>, and was formerly a subproject of <a class="externalLink" 
href="http://lucene.apache.org/";>Apache Lucene</a>.</p></div><div 
class="section"><h2>Latest News<a name="Latest_News"></a></h2><dl><dt>7-11 
November 2011 - Tika at US ApacheCon</dt><dd> ApacheCon NA is coming to 
Vancouver, British Columbia, at the Westin Bayshore, and Chris Mattmann will be 
giving a <a class="externalLink" 
href="http://na11.apachecon.com/talks/19391";>talk</a> on the forthcoming 1.0 
release of Tika as part of the <a class="externalLink" 
href="http://na11.apachecon.com/talk/by_track/1400">Content Technologies 
track</a> on Thursday November 10th, 2011. The talk will cover the history of 
Tika, its genesis, its inception as a top-level project, and where it's headed 
1.0 and beyond. Come out and support Tika by attending the talk! </dd><dt>30 
September 2011: Apache Tika Release</dt><dd> Apache Tika 0.10 has been 
released. This release includes new parser support for CHM files, bugfixes to 
RTF parsing, an improved GUI and more. Please see the download page for more 
details. </dd><dt>16 February 2011: Apache Tika Release</dt><dd> Apache Tika 
0.9 has been released. This release includes several important bugfixes and new 
features. Please see the download page for more details. </dd><dt>12 November 
2010: Apache Tika Release</dt><dd> Apache Tika 0.8 has been released. Please 
see the download page for more details. This is our first release as a TLP. 
We're excited!</dd><dt>1-5 November 2010 - Tika at US ApacheCon</dt><dd> 
ApacheCon NA is coming to Atlanta, Georgia, at the Westin Peachtree, and 
Tika is being repped as part of the <a class="externalLink" 
href="http://us.apachecon.com/c/acna2010/schedule/2010/11/05";>Lucene and 
friends track</a> on Friday, November 5th, 2010. Chris Mattmann will give a 
talk on how Tika is being used at NASA and in the context of other projects in 
the Apache ecosystem.<p>Friday, Nov. 5th, 2010:</p><ul><li><a 
class="externalLink" 
href="http://us.apachecon.com/c/acna2010/sessions/538";>Scientific data curation 
and processing with Apache Tika</a> - Chris Mattmann @ 
9:00am</li></ul></dd><dt>April 2010: Tika Graduates to TLP</dt><dd> Apache Tika 
was voted into TLP status by a resolution submitted to the Apache Board. We are 
in the process of updating the site and moving things around. If you notice 
anything out of place, let us know.</dd><dt>April 2010: Apache Tika 
Release</dt><dd> Apache Tika 0.7 has been released. Please see the download 
page for more details.</dd>
 <dt>January 2010: Apache Tika Release</dt><dd> Apache Tika 0.6 has been 
released. Please see the download page for more details.</dd><dt>November 2009: 
Apache Tika Release</dt><dd> Apache Tika 0.5 has been released. Please see the 
download page for more details.</dd><dt>14 August 2009 - Lucene at US 
ApacheCon</dt><dd> ApacheCon US is once again in the Bay Area and Lucene is 
coming along for the ride! The Lucene community has planned two full days of 
talks, plus a meetup and the usual bevy of training. With a well-balanced mix 
of first time and veteran ApacheCon speakers, the <a class="externalLink" 
href="http://www.us.apachecon.com/c/acus2009/schedule#lucene";>Lucene track</a> 
at ApacheCon US promises to have something for everyone. Be sure not to 
miss:<p>Training:</p><ul><li><a class="externalLink" 
href="http://www.us.apachecon.com/c/acus2009/sessions/437";>Lucene Boot Camp</a> 
- A two day training session, Nov. 2nd &amp; 3rd</li><li><a 
class="externalLink" 
href="http://www.us.apachecon.com/c/acus2009/sessions/375">Solr Day</a> - A one day training 
session, Nov. 2nd</li></ul><p>Thursday, Nov. 5th:</p><ul><li><a 
class="externalLink" 
href="http://www.us.apachecon.com/c/acus2009/sessions/428";>Introduction to the 
Lucene Ecosystem</a> - Grant Ingersoll @ 9:00</li><li><a class="externalLink" 
href="http://www.us.apachecon.com/c/acus2009/sessions/461";>Lucene Basics and 
New Features</a> - Michael Busch @ 10:00</li><li><a class="externalLink" 
href="http://www.us.apachecon.com/c/acus2009/sessions/331";>Apache Solr: Out of 
the Box</a> - Chris Hostetter @ 14:00</li><li><a class="externalLink" 
href="http://www.us.apachecon.com/c/acus2009/sessions/427";>Introduction to 
Nutch</a> - Andrzej Bialecki @ 15:00</li><li><a class="externalLink" 
href="http://www.us.apachecon.com/c/acus2009/sessions/430";>Lucene and Solr 
Performance Tuning</a> - Mark Miller @ 16:30</li></ul><p>Friday, Nov. 
6th:</p><ul><li><a class="externalLink" 
href="http://www.us.apachecon.com/c/acus2009/sessions/332">Implementing an 
Information Retrieval Framework for an 
Organizational Repository</a> - Sithu D Sudarsan @ 9:00</li><li><a 
class="externalLink" 
href="http://www.us.apachecon.com/c/acus2009/sessions/333";>Apache Mahout - 
Going from raw data to Information</a> - Isabel Drost @ 10:00</li><li><a 
class="externalLink" 
href="http://www.us.apachecon.com/c/acus2009/sessions/334";>MIME Magic with 
Apache Tika</a> - Jukka Zitting @ 11:30</li><li><a class="externalLink" 
href="http://www.us.apachecon.com/c/acus2009/sessions/335";>Building Intelligent 
Search Applications with the Lucene Ecosystem</a> - Ted Dunning @ 
14:00</li><li><a class="externalLink" 
href="http://www.us.apachecon.com/c/acus2009/sessions/462";>Realtime Search</a> 
- Jason Rutherglen @ 15:00</li></ul></dd><dt>July 2009: Apache Tika 
Release</dt><dd> Apache Tika 0.4 has been released. Please see the download 
page for more details.</dd><dt>March 2009: Apache Tika Release</dt><dd> Apache 
Tika 0.3 has been released. Please see the download page for more details.</dd><dt>February 2009: Lucene at 
ApacheCon Europe 2009 in Amsterdam</dt><dd> Lucene will be extremely well 
represented at <a class="externalLink" 
href="http://www.eu.apachecon.com/c/aceu2009/";>ApacheCon EU 2009</a> in 
Amsterdam, Netherlands this March 23-27, 2009:<ul><li><a class="externalLink" 
href="http://eu.apachecon.com/c/aceu2009/sessions/197";>Lucene Boot Camp</a> - A 
two day training session, March 23 &amp; 24th</li><li><a class="externalLink" 
href="http://eu.apachecon.com/c/aceu2009/sessions/201";>Solr Boot Camp</a> - A 
one day training session, March 24th</li><li><a class="externalLink" 
href="http://eu.apachecon.com/c/aceu2009/sessions/136";>Introducing Apache 
Mahout</a> - Grant Ingersoll. March 25th @ 10:30</li><li><a 
class="externalLink" 
href="http://eu.apachecon.com/c/aceu2009/sessions/137";>Lucene/Solr Case 
Studies</a> - Erik Hatcher. March 25th @ 11:30</li><li><a class="externalLink" 
href="http://eu.apachecon.com/c/aceu2009/sessions/138">Advanced Indexing Techniques with Apache Lucene</a> - 
Michael Busch. March 25th @ 14:00</li><li><a class="externalLink" 
href="http://eu.apachecon.com/c/aceu2009/sessions/251";>Apache Solr - A Case 
Study</a> - Uri Boness. March 26th @ 17:30</li><li><a class="externalLink" 
href="http://eu.apachecon.com/c/aceu2009/sessions/250";>Best of breed - httpd, 
forrest, solr and droids</a> - Thorsten Scherler. March 27th @ 17:30</li><li><a 
class="externalLink" 
href="http://eu.apachecon.com/c/aceu2009/sessions/165";>Apache Droids - an 
intelligent standalone robot framework</a> - Thorsten Scherler. March 26th @ 
15:00</li></ul></dd><dt>December 2008: Apache Tika Release</dt><dd> Apache Tika 
0.2 has been released. Please see the download page for more 
details.</dd><dt>November 2008: User mailing list created</dt><dd> A new 
mailing list, [email protected], has been created for discussion 
about the use of the Tika toolkit. You can subscribe this mailing list by sending 
a message to [email protected].</dd><dt>October 2008: 
Tika graduates to a Lucene subproject</dt><dd> Tika has graduated form the 
Incubator to become a subproject of Apache Lucene. The project infrastructure 
will be migrated from incubator.apache.org to 
lucene.apache.org.</dd><dt>October 2008: Apache Tika status report</dt><dd> 
Dave Meikle was just voted in as a new committer.<p>Paolo Mottadelli will 
present Tika at ApacheCon US.</p><p>Tika 0.2 should be released 
soon.</p><p>Usage documentation has been added to the website.</p></dd><dt>July 
2008: Apache Tika status report</dt><dd> Tika community remains relatively 
small, with just a handful of active members<p>Work towards Tika 0.2 continues, 
Chris Mattman has volunteered to be the release manager</p></dd><dt>April 2008: 
Apache Tika status report</dt><dd> Niall Pemberton joined the project as a 
committer and PPMC member<p>The number of issues reported by external 
contributors is growing gradually.</p><p>There was a Fast Feather Talk on Tika in ApacheCon EU 2008</p><p>We have good 
contacts especially with Apache POI and PDFBox</p><p>We are working towards 
Tika 0.2</p><p>Metadata handling improvements are being 
discussed</p></dd><dt>January 2008: Apache Tika status report</dt><dd> No new 
committers since the last report, activity has been moderate but steady, 
leading to the 0.1 release.<p>Tika 0.1 (incubating) has just been 
released.</p><p>Chris Mattmann intends to use that release in Nutch, That's 
good progress towards Tika's goal of providing data extraction functionality 
to other projects.</p><p>A new Tika logo was created by Google Highly Open 
Participation student, hasn't been integrated yet.</p></dd><dt>December 27th, 
2007: Tika 0.1-incubating Released!</dt><dd> Tika has made its first official 
release, titled 0.1-incubating. See the <a class="externalLink" 
href="http://www.apache.org/dist/incubator/tika/CHANGES-0.1-incubating.txt">CHANGES.txt</a>
 file for more information on the list of updates in this initial release. Thanks to all who 
contributed! You can download the official source tarball <a 
class="externalLink" 
href="http://www.apache.org/dyn/closer.cgi/incubator/tika";>here</a>.</dd><dt>October
 8th, 2007: Welcome Keith Bennett!</dt><dd> The Tika PPMC has <a 
class="externalLink" 
href="http://www.nabble.com/Please-welcome-Keith-Bennett-as-a-Tika-committer%21-tf4586151.html#a13107428";>elected</a>
 Keith Bennett as our new committer. Welcome!</dd><dt>March 22nd, 2007: Apache 
Tika project started</dt><dd> The Apache Tika project was formally started when 
the <a class="externalLink" 
href="http://wiki.apache.org/incubator/TikaProposal";>Tika proposal</a> was <a 
class="externalLink" 
href="http://mail-archives.apache.org/mod_mbox/incubator-general/200703.mbox/%[email protected]%3e";>accepted</a>
 by the <a class="externalLink" href="http://incubator.apache.org/";>Apache 
Incubator PMC</a>. </dd></dl></div>
+        <!-- Licensed to the Apache Software Foundation (ASF) under one or 
more --><!-- contributor license agreements.  See the NOTICE file distributed 
with --><!-- this work for additional information regarding copyright 
ownership. --><!-- The ASF licenses this file to You under the Apache License, 
Version 2.0 --><!-- (the "License"); you may not use this file except in 
compliance with --><!-- the License.  You may obtain a copy of the License at 
--><!--  --><!-- http://www.apache.org/licenses/LICENSE-2.0 --><!--  --><!-- 
Unless required by applicable law or agreed to in writing, software --><!-- 
distributed under the License is distributed on an "AS IS" BASIS, --><!-- 
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
--><!-- See the License for the specific language governing permissions and 
--><!-- limitations under the License. --><div class="section"><h2>Apache Tika 
- a content analysis toolkit<a 
name="Apache_Tika_-_a_content_analysis_toolkit"></a></h2><p>The Apache
Tika&#x2122; toolkit detects and extracts metadata and
structured text content from various documents using existing parser libraries. 
You can find the latest release on the <a href="./download.html">download 
page</a>. See the <a href="./0.10/gettingstarted.html">Getting Started</a> 
guide for instructions on how to start using Tika.</p><p>Tika is a project of 
the <a class="externalLink" href="http://www.apache.org/">Apache Software
Foundation</a>, and was formerly a subproject of <a class="externalLink"
href="http://lucene.apache.org/">Apache Lucene</a>.</p></div><div
class="section"><h2>Latest News<a name="Latest_News"></a></h2><dl><dt>7-11 
November 2011 - Tika at US ApacheCon</dt><dd> ApacheCon NA is coming to 
Vancouver, British Columbia, at the Westin Bayshore, and Chris Mattmann will be 
giving a <a class="externalLink" 
href="http://na11.apachecon.com/talks/19391">talk</a> on the forthcoming 1.0
release of Tika as part of the <a class="externalLink"
href="http://na11.apachecon.com/talk/by_track/1400">Content Technologies
track</a> on Thursday November 10th, 2011. The talk will cover the history of 
Tika, its genesis, its inception as a top-level project, and where it's headed 
1.0 and beyond. Come out and support Tika by attending the talk! </dd><dt>30 
September 2011: Apache Tika Release</dt><dd> Apache Tika 0.10 has been 
released. This release includes new parser support for CHM files, bugfixes to 
RTF parsing, an improved GUI and more. Please see the download page for more 
details. </dd><dt>16 February 2011: Apache Tika Release</dt><dd> Apache Tika 
0.9 has been released. This release includes several important bugfixes and new 
features. Please see the download page for more details. </dd><dt>12 November 
2010: Apache Tika Release</dt><dd> Apache Tika 0.8 has been released. Please 
see the download page for more details. This is our first release as a TLP. 
We're excited!</dd><dt>1-5 November 2010 - Tika at US ApacheCon</dt><dd>
ApacheCon NA is coming to Atlanta, Georgia, at the Westin Peachtree, and
Tika is being repped as part of the <a class="externalLink"
href="http://us.apachecon.com/c/acna2010/schedule/2010/11/05">Lucene and
friends track</a> on Friday, November 5th, 2010. Chris Mattmann will give a 
talk on how Tika is being used at NASA and in the context of other projects in 
the Apache ecosystem.<p>Friday, Nov. 5th, 2010:</p><ul><li><a 
class="externalLink" 
href="http://us.apachecon.com/c/acna2010/sessions/538">Scientific data curation
and processing with Apache Tika</a> - Chris Mattmann @ 
9:00am</li></ul></dd><dt>April 2010: Tika Graduates to TLP</dt><dd> Apache Tika 
was voted into TLP status by a resolution submitted to the Apache Board. We are 
in the process of updating the site and moving things around. If you notice 
anything out of place, let us know.</dd><dt>April 2010: Apache Tika 
Release</dt><dd> Apache Tika 0.7 has been released. Please see the download 
page for more details.</dd><dt>January 2010: Apache Tika
Release</dt><dd> Apache Tika 0.6 has been
released. Please see the download page for more details.</dd><dt>November
2009: Apache Tika Release</dt><dd> Apache Tika 0.5 has been released. Please
see the download page for more details.</dd><dt>14 August 2009 - Lucene at US
ApacheCon</dt><dd> ApacheCon US is once again in the Bay Area and Lucene is
coming along for the ride! The Lucene community has planned two full days of
talks, plus a meetup and the usual bevy of training. With a well-balanced mix
of first time and veteran ApacheCon speakers, the <a class="externalLink"
href="http://www.us.apachecon.com/c/acus2009/schedule#lucene">Lucene
track</a> at ApacheCon US promises to have something for everyone. Be sure
not to miss:<p>Training:</p><ul><li><a class="externalLink"
href="http://www.us.apachecon.com/c/acus2009/sessions/437">Lucene Boot
Camp</a> - A two day training session, Nov. 2nd &amp; 3rd</li><li><a
class="externalLink"
href="http://www.us.apachecon.com/c/acus2009/sessions/375">Solr Day</a> - A one day training
session, Nov. 2nd</li></ul><p>Thursday, Nov. 5th:</p><ul><li><a
class="externalLink"
href="http://www.us.apachecon.com/c/acus2009/sessions/428">Introduction to the
Lucene Ecosystem</a> - Grant Ingersoll @ 9:00</li><li><a class="externalLink"
href="http://www.us.apachecon.com/c/acus2009/sessions/461">Lucene Basics and
New Features</a> - Michael Busch @ 10:00</li><li><a class="externalLink"
href="http://www.us.apachecon.com/c/acus2009/sessions/331">Apache Solr: Out of
the Box</a> - Chris Hostetter @ 14:00</li><li><a class="externalLink"
href="http://www.us.apachecon.com/c/acus2009/sessions/427">Introduction to
Nutch</a> - Andrzej Bialecki @ 15:00</li><li><a class="externalLink"
href="http://www.us.apachecon.com/c/acus2009/sessions/430">Lucene and Solr
Performance Tuning</a> - Mark Miller @ 16:30</li></ul><p>Friday, Nov.
6th:</p><ul><li><a class="externalLink"
href="http://www.us.apachecon.com/c/acus2009/sessions/332">Implementing an Information Retrieval Framework for an
Organizational Repository</a> - Sithu D Sudarsan @ 9:00</li><li><a
class="externalLink"
href="http://www.us.apachecon.com/c/acus2009/sessions/333">Apache Mahout -
Going from raw data to Information</a> - Isabel Drost @ 10:00</li><li><a
class="externalLink"
href="http://www.us.apachecon.com/c/acus2009/sessions/334">MIME Magic with
Apache Tika</a> - Jukka Zitting @ 11:30</li><li><a class="externalLink"
href="http://www.us.apachecon.com/c/acus2009/sessions/335">Building Intelligent
Search Applications with the Lucene Ecosystem</a> - Ted Dunning @
14:00</li><li><a class="externalLink"
href="http://www.us.apachecon.com/c/acus2009/sessions/462">Realtime Search</a>
- Jason Rutherglen @ 15:00</li></ul></dd><dt>July 2009: Apache Tika 
Release</dt><dd> Apache Tika 0.4 has been released. Please see the download 
page for more details.</dd><dt>March 2009: Apache Tika Release</dt><dd> Apache 
Tika 0.3 has been released. 
 Please see the download page for more details.</dd><dt>February 2009: Lucene 
at ApacheCon Europe 2009 in Amsterdam</dt><dd> Lucene will be extremely well 
represented at <a class="externalLink" 
href="http://www.eu.apachecon.com/c/aceu2009/">ApacheCon EU 2009</a> in
Amsterdam, Netherlands this March 23-27, 2009:<ul><li><a class="externalLink"
href="http://eu.apachecon.com/c/aceu2009/sessions/197">Lucene Boot Camp</a> - A
two day training session, March 23 &amp; 24th</li><li><a class="externalLink"
href="http://eu.apachecon.com/c/aceu2009/sessions/201">Solr Boot Camp</a> - A
one day training session, March 24th</li><li><a class="externalLink"
href="http://eu.apachecon.com/c/aceu2009/sessions/136">Introducing Apache
Mahout</a> - Grant Ingersoll. March 25th @ 10:30</li><li><a
class="externalLink"
href="http://eu.apachecon.com/c/aceu2009/sessions/137">Lucene/Solr Case
Studies</a> - Erik Hatcher. March 25th @ 11:30</li><li><a class="externalLink"
href="http://eu.apachecon.com/c/aceu2009/sessions/138">Advanced Indexing Techniques with Apache Lucene</a> -
Michael Busch. March 25th @ 14:00</li><li><a class="externalLink"
href="http://eu.apachecon.com/c/aceu2009/sessions/251">Apache Solr - A Case
Study</a> - Uri Boness. March 26th @ 17:30</li><li><a class="externalLink"
href="http://eu.apachecon.com/c/aceu2009/sessions/250">Best of breed - httpd,
forrest, solr and droids</a> - Thorsten Scherler. March 27th @ 17:30</li><li><a
class="externalLink"
href="http://eu.apachecon.com/c/aceu2009/sessions/165">Apache Droids - an
intelligent standalone robot framework</a> - Thorsten Scherler. March 26th @ 
15:00</li></ul></dd><dt>December 2008: Apache Tika Release</dt><dd> Apache Tika 
0.2 has been released. Please see the download page for more 
details.</dd><dt>November 2008: User mailing list created</dt><dd> A new 
mailing list, [email protected], has been created for discussion 
about the use of the Tika toolkit. You can subscribe this mailing list by
sending a message to [email protected].</dd><dt>October 2008:
Tika graduates to a Lucene subproject</dt><dd> Tika has graduated form the 
Incubator to become a subproject of Apache Lucene. The project infrastructure 
will be migrated from incubator.apache.org to 
lucene.apache.org.</dd><dt>October 2008: Apache Tika status report</dt><dd> 
Dave Meikle was just voted in as a new committer.<p>Paolo Mottadelli will 
present Tika at ApacheCon US.</p><p>Tika 0.2 should be released 
soon.</p><p>Usage documentation has been added to the website.</p></dd><dt>July 
2008: Apache Tika status report</dt><dd> Tika community remains relatively 
small, with just a handful of active members<p>Work towards Tika 0.2 continues, 
Chris Mattman has volunteered to be the release manager</p></dd><dt>April 2008: 
Apache Tika status report</dt><dd> Niall Pemberton joined the project as a 
committer and PPMC member<p>The number of issues reported by external 
contributors is growing gradually.</p><p>There was a Fast Feather Talk on
Tika in ApacheCon EU 2008</p><p>We have
good contacts especially with Apache POI and PDFBox</p><p>We are working 
towards Tika 0.2</p><p>Metadata handling improvements are being 
discussed</p></dd><dt>January 2008: Apache Tika status report</dt><dd> No new 
committers since the last report, activity has been moderate but steady, 
leading to the 0.1 release.<p>Tika 0.1 (incubating) has just been 
released.</p><p>Chris Mattmann intends to use that release in Nutch, That's 
good progress towards Tika's goal of providing data extraction functionality to 
other projects.</p><p>A new Tika logo was created by Google Highly Open 
Participation student, hasn't been integrated yet.</p></dd><dt>December 27th, 
2007: Tika 0.1-incubating Released!</dt><dd> Tika has made its first official 
release, titled 0.1-incubating. See the <a class="externalLink" 
href="http://www.apache.org/dist/incubator/tika/CHANGES-0.1-incubating.txt">CHANGES.txt</a>
file for more information on the list of updates in this initial release. Thanks to all who
contributed! You can download the official source tarball <a
class="externalLink"
href="http://www.apache.org/dyn/closer.cgi/incubator/tika">here</a>.</dd><dt>October
8th, 2007: Welcome Keith Bennett!</dt><dd> The Tika PPMC has <a
class="externalLink"
href="http://www.nabble.com/Please-welcome-Keith-Bennett-as-a-Tika-committer%21-tf4586151.html#a13107428">elected</a>
Keith Bennett as our new committer. Welcome!</dd><dt>March 22nd, 2007: Apache
Tika project started</dt><dd> The Apache Tika project was formally started when
the <a class="externalLink"
href="http://wiki.apache.org/incubator/TikaProposal">Tika proposal</a> was <a
class="externalLink"
href="http://mail-archives.apache.org/mod_mbox/incubator-general/200703.mbox/%[email protected]%3e">accepted</a>
by the <a class="externalLink" href="http://incubator.apache.org/">Apache
Incubator PMC</a>. </dd></dl></div>
       </div>
       <div id="sidebar">
         <div id="navigation">

Modified: tika/site/src/site/apt/0.10/gettingstarted.apt
URL: 
http://svn.apache.org/viewvc/tika/site/src/site/apt/0.10/gettingstarted.apt?rev=1179211&r1=1179210&r2=1179211&view=diff
==============================================================================
--- tika/site/src/site/apt/0.10/gettingstarted.apt (original)
+++ tika/site/src/site/apt/0.10/gettingstarted.apt Wed Oct  5 13:17:51 2011
@@ -137,23 +137,29 @@ Using Tika in an Ant project
   ... <!-- your other classpath entries -->
   <pathelement location="path/to/tika-core-0.10.jar"/>
   <pathelement location="path/to/tika-parsers-0.10.jar"/>
+  <pathelement location="path/to/netcdf-4.2-min.jar"/>
+  <pathelement location="path/to/slf4j-api-1.5.6.jar"/>
+  <pathelement location="path/to/apache-mime4j-core-0.7.jar"/>
+  <pathelement location="path/to/apache-mime4j-dom-0.7.jar"/>
+  <pathelement location="path/to/commons-compress-1.1.jar"/>
+  <pathelement location="path/to/commons-codec-1.4.jar"/>
+  <pathelement location="path/to/pdfbox-1.6.0.jar"/>
+  <pathelement location="path/to/fontbox-1.6.0.jar"/>
+  <pathelement location="path/to/jempbox-1.6.0.jar"/>
   <pathelement location="path/to/commons-logging-1.1.1.jar"/>
-  <pathelement location="path/to/commons-compress-1.0.jar"/>
-  <pathelement location="path/to/pdfbox-0.10.0-incubating.jar"/>
-  <pathelement location="path/to/fontbox-0.10.0-incubator.jar"/>
-  <pathelement location="path/to/jempbox-0.10.0-incubator.jar"/>
-  <pathelement location="path/to/poi-3.6.jar"/>
-  <pathelement location="path/to/poi-scratchpad-3.6.jar"/>
-  <pathelement location="path/to/poi-ooxml-3.6.jar"/>
-  <pathelement location="path/to/poi-ooxml-schemas-3.6.jar"/>
+  <pathelement location="path/to/poi-3.8-beta4.jar"/>
+  <pathelement location="path/to/poi-scratchpad-3.8-beta4.jar"/>
+  <pathelement location="path/to/poi-ooxml-3.8-beta4.jar"/>
+  <pathelement location="path/to/poi-ooxml-schemas-3.8-beta4.jar"/>
   <pathelement location="path/to/xmlbeans-2.3.0.jar"/>
   <pathelement location="path/to/dom4j-1.6.1.jar"/>
-  <pathelement location="path/to/xml-apis-1.0.b2.jar"/>
   <pathelement location="path/to/geronimo-stax-api_1.0_spec-1.0.jar"/>
-  <pathelement location="path/to/tagsoup-1.2.jar"/>
+  <pathelement location="path/to/tagsoup-1.2.1.jar"/>
   <pathelement location="path/to/asm-3.1.jar"/>
-  <pathelement location="path/to/log4j-1.2.14.jar"/>
   <pathelement location="path/to/metadata-extractor-2.4.0-beta-1.jar"/>
+  <pathelement location="path/to/boilerpipe-1.1.0.jar"/>
+  <pathelement location="path/to/rome-0.9.jar"/>
+  <pathelement location="path/to/jdom-1.0.jar"/>
 </classpath>
 ---
 
@@ -178,13 +184,35 @@ Using Tika as a command line utility
 usage: java -jar tika-app-0.10.jar [option] [file]
 
 Options:
-    -? or --help       Print this usage message
-    -v or --verbose    Print debug level messages
-    -g or --gui        Start the Apache Tika GUI
-    -x or --xml        Output XHTML content (default)
-    -h or --html       Output HTML content
-    -t or --text       Output plain text content
-    -m or --metadata   Output only metadata
+    -?  or --help          Print this usage message
+    -v  or --verbose       Print debug level messages
+
+    -g  or --gui           Start the Apache Tika GUI
+    -s  or --server        Start the Apache Tika server
+
+    -x  or --xml           Output XHTML content (default)
+    -h  or --html          Output HTML content
+    -j  or --json          Output JSON content
+    -t  or --text          Output plain text content
+    -T  or --text-main     Output plain text content (main content only)
+    -m  or --metadata      Output only metadata
+    -l  or --language      Output only language
+    -d  or --detect        Detect document type
+    -eX or --encoding=X    Use output encoding X
+    -z  or --extract       Extract all attachements into current directory
+    -r  or --pretty-print  For XML and XHTML outputs, adds newlines and
+                           whitespace, for better readability
+
+    --create-profile=X
+         Create NGram profile, where X is a profile name
+    --list-parsers
+         List the available document parsers
+    --list-parser-details
+         List the available document parsers, and their supported mime types
+    --list-met-models
+         List the available metadata models, and their supported keys
+    --list-supported-types
+         List all known media types and related information
 
 Description:
     Apache Tika will parse the file(s) specified on the
@@ -196,12 +224,21 @@ Description:
 
     If no file name or URL is specified (or the special
     name "-" is used), then the standard input stream
-    is parsed.
+    is parsed. If no arguments were given and no input
+    data is available, the GUI is started instead.
+
+- GUI mode
+
+    Use the "--gui" (or "-g") option to start the
+    Apache Tika GUI. You can drag and drop files from
+    a normal file explorer to the GUI window to extract
+    text content and metadata from the files.
+
+- Server mode
 
-    Use the "--gui" (or "-g") option to start
-    the Apache Tika GUI. You can drag and drop files
-    from a normal file explorer to the GUI window to
-    extract text content and metadata from the files.
+    Use the "-server" (or "-s") option to start the
+    Apache Tika server. The server will listen to the
+    ports you specify as one or more arguments.
 ---
 
  You can also use the jar as a component in a Unix pipeline or
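
Note on the usage text above: as a rough illustration of driving the runnable jar from another program, the sketch below assembles a `java -jar tika-app-0.10.jar [option] [file]` command line and launches it. The class name `TikaAppCommand`, the helper `build`, and the jar location in the working directory are assumptions made for this example; only the `--text` option and the "-" stdin convention come from the usage message.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Tika: builds and runs a tika-app command line.
final class TikaAppCommand {

    // Assemble "java -jar <jar> [option] [input]"; null parts are skipped.
    static List<String> build(String jarPath, String option, String input) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-jar");
        cmd.add(jarPath);
        if (option != null) {
            cmd.add(option);
        }
        if (input != null) {
            cmd.add(input);
        }
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: tika-app-0.10.jar sits in the current directory.
        // With no argument, "-" asks Tika to parse standard input.
        String input = args.length > 0 ? args[0] : "-";
        List<String> cmd = build("tika-app-0.10.jar", "--text", input);
        // inheritIO() wires the child's stdin/stdout/stderr to this process.
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        System.exit(process.waitFor());
    }
}
```

Keeping the argument assembly in its own method makes it easy to check the command line without actually starting a JVM.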

Modified: tika/site/src/site/apt/download.apt
URL: 
http://svn.apache.org/viewvc/tika/site/src/site/apt/download.apt?rev=1179211&r1=1179210&r2=1179211&view=diff
==============================================================================
--- tika/site/src/site/apt/download.apt (original)
+++ tika/site/src/site/apt/download.apt Wed Oct  5 13:17:51 2011
@@ -28,8 +28,8 @@ Download Apache Tika
      SHA1: <<<355d0b2fa0de232672e4760941ea0dcf641a82ad>>>\
      MD5: <<<96fb7db1b0c93d1e958a2ee52c4bd02f>>>
 
-   * 
{{{http://www.apache.org/dyn/closer.cgi/tika/0.10/tika-app-0.10.jar}tika-app-0.10.jar}}
-     (runnable jar, 
{{{http://www.apache.org/dist/tika/0.10/tika-app-0.10.jar.asc}PGP signature}})\
+   * 
{{{http://www.apache.org/dyn/closer.cgi/tika/tika-app-0.10.jar}tika-app-0.10.jar}}
+     (runnable jar, 
{{{http://www.apache.org/dist/tika/tika-app-0.10.jar.asc}PGP signature}})\
      SHA1: <<<e1ad4e6cc4601c1c1367c646b2fbc57788664bed>>>\
      MD5: <<<d4b1136ddedc3ae2f9af778cea42c219>>>
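
Note on the checksums listed above: a downloaded jar can be compared against the published SHA1 value with a few lines of standard Java. This is a minimal sketch, not an official verification tool; the class name `Sha1Check` and its methods are made up for the example, and it reads the whole file into memory, which is assumed to be acceptable for a jar of this size. It does not replace checking the PGP signature.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: compares a file's SHA-1 digest with a published hex value.
final class Sha1Check {

    // Return the SHA-1 digest of the data as lowercase hex.
    static String sha1Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every Java platform.
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    // True if the data hashes to the expected hex string (case-insensitive).
    static boolean matches(byte[] data, String expectedHex) {
        return sha1Hex(data).equalsIgnoreCase(expectedHex.trim());
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to the downloaded file; args[1]: published SHA1 value.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(matches(bytes, args[1]) ? "OK" : "MISMATCH");
    }
}
```

It would be run as `java Sha1Check tika-app-0.10.jar <published SHA1>`, with the second argument copied from the download page.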
 

Modified: tika/site/src/site/apt/index.apt
URL: 
http://svn.apache.org/viewvc/tika/site/src/site/apt/index.apt?rev=1179211&r1=1179210&r2=1179211&view=diff
==============================================================================
--- tika/site/src/site/apt/index.apt (original)
+++ tika/site/src/site/apt/index.apt Wed Oct  5 13:17:51 2011
@@ -23,7 +23,7 @@ Apache Tika - a content analysis toolkit
    structured text content from various documents using existing parser
    libraries. You can find the latest release on the
    {{{./download.html}download page}}. See the
-   {{{./0.9/gettingstarted.html}Getting Started}} guide for instructions on
+   {{{./0.10/gettingstarted.html}Getting Started}} guide for instructions on
    how to start using Tika.
 
    Tika is a project of the

