Author: nick
Date: Sat Dec 20 06:52:15 2014
New Revision: 1646920
URL: http://svn.apache.org/r1646920
Log:
Include more examples in the website TIKA-1390
Modified:
tika/site/publish/1.7/examples.html
tika/site/src/site/apt/1.7/examples.apt
Modified: tika/site/publish/1.7/examples.html
URL:
http://svn.apache.org/viewvc/tika/site/publish/1.7/examples.html?rev=1646920&r1=1646919&r2=1646920&view=diff
==============================================================================
--- tika/site/publish/1.7/examples.html (original)
+++ tika/site/publish/1.7/examples.html Sat Dec 20 06:52:15 2014
@@ -93,7 +93,13 @@
<ul>
<li><a href="#Parsing">Parsing</a>
<ul>
-<li><a href="#Parsing_using_the_Tika_Facade">Parsing using the Tika
Facade</a></li></ul></li></ul></li></ul>
+<li><a href="#Parsing_using_the_Tika_Facade">Parsing using the Tika
Facade</a></li></ul></li>
+<li><a href="#Custom_Content_Handlers">Custom Content Handlers</a>
+<ul>
+<li><a href="#Extract_Phone_Numbers_from_Content_into_the_Metadata">Extract
Phone Numbers from Content into the Metadata</a></li></ul></li>
+<li><a href="#Translation">Translation</a>
+<ul>
+<li><a href="#Translation_using_the_Microsoft_Translation_API">Translation
using the Microsoft Translation API</a></li></ul></li></ul></li></ul>
<div class="section">
<h3><a name="Parsing">Parsing</a></h3>
<p>TODO Explain the options</p>
@@ -102,7 +108,19 @@
<p>TODO Explain about using this</p><style type="text/css">
@import url('attached-includes/css/shCoreDefault.css');
</style>
-<div id="highlighter_260411" class="syntaxhighlighter nogutter java"><table
border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div
class="container"><div class="line number37 index0 alt2"><code class="java
keyword">public</code> <code class="java plain">String parseToStringExample()
</code><code class="java keyword">throws</code> <code class="java
plain">IOException, SAXException, TikaException {</code></div><div class="line
number38 index1 alt1"><code class="java
spaces"> </code><code class="java plain">InputStream
stream = ParsingExample.</code><code class="java keyword">class</code><code
class="java plain">.getResourceAsStream(</code><code class="java
string">"test.doc"</code><code class="java plain">);</code></div><div
class="line number39 index2 alt2"><code class="java
spaces"> </code><code class="java plain">Tika tika =
</code><code class="java keyword">new</code> <code class="java
plain">Tika();</code></div>
<div class="line number40 index3 alt1"><code class="java
spaces"> </code><code class="java keyword">try</code>
<code class="java plain">{</code></div><div class="line number41 index4
alt2"><code class="java
spaces"> </code><code
class="java keyword">return</code> <code class="java
plain">tika.parseToString(stream);</code></div><div class="line number42 index5
alt1"><code class="java spaces"> </code><code
class="java plain">} </code><code class="java keyword">finally</code> <code
class="java plain">{</code></div><div class="line number43 index6 alt2"><code
class="java
spaces"> </code><code
class="java plain">stream.close();</code></div><div class="line number44 index7
alt1"><code class="java spaces"> </code><code
class="java plain">}</code></div><div class="line number45 index8 alt2"><code
class="java plain">}</code></div>
</div></td></tr></tbody></table></div></div></div></div>
+<div id="highlighter_882359" class="syntaxhighlighter nogutter java"><table
border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div
class="container"><div class="line number37 index0 alt2"><code class="java
keyword">public</code> <code class="java plain">String parseToStringExample()
</code><code class="java keyword">throws</code> <code class="java
plain">IOException, SAXException, TikaException {</code></div><div class="line
number38 index1 alt1"><code class="java
spaces"> </code><code class="java plain">InputStream
stream = ParsingExample.</code><code class="java keyword">class</code><code
class="java plain">.getResourceAsStream(</code><code class="java
string">"test.doc"</code><code class="java plain">);</code></div><div
class="line number39 index2 alt2"><code class="java
spaces"> </code><code class="java plain">Tika tika =
</code><code class="java keyword">new</code> <code class="java
plain">Tika();</code></div>
<div class="line number40 index3 alt1"><code class="java
spaces"> </code><code class="java keyword">try</code>
<code class="java plain">{</code></div><div class="line number41 index4
alt2"><code class="java
spaces"> </code><code
class="java keyword">return</code> <code class="java
plain">tika.parseToString(stream);</code></div><div class="line number42 index5
alt1"><code class="java spaces"> </code><code
class="java plain">} </code><code class="java keyword">finally</code> <code
class="java plain">{</code></div><div class="line number43 index6 alt2"><code
class="java
spaces"> </code><code
class="java plain">stream.close();</code></div><div class="line number44 index7
alt1"><code class="java spaces"> </code><code
class="java plain">}</code></div><div class="line number45 index8 alt2"><code
class="java plain">}</code></div>
</div></td></tr></tbody></table></div></div></div>
+<div class="section">
+<h3><a name="Custom_Content_Handlers">Custom Content Handlers</a></h3>
+<p>The textual output of parsing a file with Tika is returned via the SAX <a
class="externalLink"
href="http://docs.oracle.com/javase/7/docs/api/org/xml/sax/ContentHandler.html">ContentHandler</a>
you pass to the parse method. You can customise parsing by supplying your own
ContentHandler that performs additional processing on the content it receives.</p>
+<div class="section">
+<h4><a name="Extract_Phone_Numbers_from_Content_into_the_Metadata">Extract
Phone Numbers from Content into the Metadata</a></h4>
+<p>By using the <a
href="./apidocs/org/apache/tika/sax/PhoneExtractingContentHandler.html">PhoneExtractingContentHandler</a>,
you can have any phone numbers found in the textual content of the document
extracted and placed into the Metadata object for you.</p><div
id="highlighter_956409" class="syntaxhighlighter nogutter java"><table
border="0" cellpadding="0" cellspacing="0"><tbody><tr><td class="code"><div
class="container"><div class="line number69 index0 alt2"><code class="java
keyword">public</code> <code class="java keyword">static</code> <code
class="java keyword">void</code> <code class="java plain">process(File file)
</code><code class="java keyword">throws</code> <code class="java
plain">Exception {</code></div><div class="line number70 index1 alt1"><code
class="java spaces"> </code><code class="java
plain">Parser parser = </code><code class="java keyword">new</code> <code
class="java plain">AutoDetectParser();</code></div><div class="line number71
index2 alt2"><code class="java spaces"> </code><code
class="java plain">Metadata metadata = </code><code class="java
keyword">new</code> <code class="java plain">Metadata();</code></div><div
class="line number72 index3 alt1"><code class="java
spaces"> </code><code class="java comments">// The
PhoneExtractingContentHandler will examine any characters for phone numbers
before passing them</code></div><div class="line number73 index4 alt2"><code
class="java spaces"> </code><code class="java
comments">// to the underlying Handler.</code></div><div class="line number74
index5 alt1"><code class="java spaces"> </code><code
class="java plain">PhoneExtractingContentHandler handler = </code><code
class="java keyword">new</code> <code class="java
plain">PhoneExtractingContentHandler(</code><code class="java
keyword">new</code> <code class="java plain">BodyContentHandler(),
metadata);</code></div>
<div class="line number75 index6 alt2"><code class="java
spaces"> </code><code class="java plain">InputStream
stream = </code><code class="java keyword">new</code> <code class="java
plain">FileInputStream(file);</code></div><div class="line number76 index7
alt1"><code class="java spaces"> </code><code
class="java keyword">try</code> <code class="java plain">{</code></div><div
class="line number77 index8 alt2"><code class="java
spaces"> </code><code
class="java plain">parser.parse(stream, handler, metadata, </code><code
class="java keyword">new</code> <code class="java
plain">ParseContext());</code></div><div class="line number78 index9
alt1"><code class="java spaces"> </code><code
class="java plain">}</code></div><div class="line number79 index10 alt2"><code
class="java spaces"> </code><code class="java
keyword">finally</code> <code class="java plain">{
</code></div><div class="line number80 index11 alt1"><code class="java
spaces"> </code><code
class="java plain">stream.close();</code></div><div class="line number81
index12 alt2"><code class="java spaces"> </code><code
class="java plain">}</code></div><div class="line number82 index13 alt1"><code
class="java spaces"> </code><code class="java
plain">String[] numbers = metadata.getValues(</code><code class="java
string">"phonenumbers"</code><code class="java plain">);</code></div><div
class="line number83 index14 alt2"><code class="java
spaces"> </code><code class="java keyword">for</code>
<code class="java plain">(String number : numbers) {</code></div><div
class="line number84 index15 alt1"><code class="java
spaces"> </code><code
class="java plain">phoneNumbers.add(number);</code></div><div class="line
number85 index16 alt2"><code class="java spaces"> </code><code class="java
plain">}</code></div><div class="line number86 index17 alt1"><code class="java
plain">}</code></div></div></td></tr></tbody></table></div></div></div>
+<div class="section">
+<h3><a name="Translation">Translation</a></h3>
+<p>Tika provides a pluggable Translation system, which allows you to send the
results of parsing off to an external system or program to have the text
translated into another language.</p>
+<div class="section">
+<h4><a name="Translation_using_the_Microsoft_Translation_API">Translation
using the Microsoft Translation API</a></h4>
+<p>In order to use the Microsoft Translation API, you need to sign up for an
account and obtain an id and secret, then pass them to Tika before requesting
the translation.</p><div id="highlighter_451559" class="syntaxhighlighter nogutter
java"><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td
class="code"><div class="container"><div class="line number23 index0
alt2"><code class="java keyword">public</code> <code class="java plain">String
microsoftTranslateToFrench(String text) {</code></div><div class="line number24
index1 alt1"><code class="java spaces"> </code><code
class="java plain">MicrosoftTranslator translator = </code><code class="java
keyword">new</code> <code class="java
plain">MicrosoftTranslator();</code></div><div class="line number25 index2
alt2"><code class="java spaces"> </code><code
class="java comments">// Change the id and secret! See <a
href="http://msdn.microsoft.com/en-us/library/hh454950.aspx.">http://msdn.microsoft.com/en-us/library/hh454950.aspx.</a></code></div><div class="line
number26 index3 alt1"><code class="java
spaces"> </code><code class="java
plain">translator.setId(</code><code class="java string">"dummy-id"</code><code
class="java plain">);</code></div><div class="line number27 index4 alt2"><code
class="java spaces"> </code><code class="java
plain">translator.setSecret(</code><code class="java
string">"dummy-secret"</code><code class="java plain">);</code></div><div
class="line number28 index5 alt1"><code class="java
spaces"> </code><code class="java keyword">try</code>
<code class="java plain">{</code></div><div class="line number29 index6
alt2"><code class="java
spaces"> </code><code
class="java keyword">return</code> <code class="java
plain">translator.translate(text, </code><code class="java
string">"fr"</code><code class="java plain">);</code></div>
<div class="line number30 index7 alt1"><code class="java
spaces"> </code><code class="java plain">} </code><code
class="java keyword">catch</code> <code class="java plain">(Exception e)
{</code></div><div class="line number31 index8 alt2"><code class="java
spaces"> </code><code
class="java keyword">return</code> <code class="java string">"Error while
translating."</code><code class="java plain">;</code></div><div class="line
number32 index9 alt1"><code class="java
spaces"> </code><code class="java
plain">}</code></div><div class="line number33 index10 alt2"><code class="java
plain">}</code></div></div></td></tr></tbody></table></div></div></div></div>
</div>
<div id="sidebar">
<div id="navigation">
Modified: tika/site/src/site/apt/1.7/examples.apt
URL:
http://svn.apache.org/viewvc/tika/site/src/site/apt/1.7/examples.apt?rev=1646920&r1=1646919&r2=1646920&view=diff
==============================================================================
--- tika/site/src/site/apt/1.7/examples.apt (original)
+++ tika/site/src/site/apt/1.7/examples.apt Sat Dec 20 06:52:15 2014
@@ -37,3 +37,32 @@ Apache Tika API Usage Examples
TODO Explain about using this
%{include|source=src/examples-src/main/java/org/apache/tika/example/ParsingExample.java|snippet=aj:..parseToStringExample()|show-gutter=false}
+
+* {Custom Content Handlers}
+
+ The textual output of parsing a file with Tika is returned via the SAX
+
{{{http://docs.oracle.com/javase/7/docs/api/org/xml/sax/ContentHandler.html}ContentHandler}}
+ you pass to the parse method. You can customise parsing by supplying your
+ own ContentHandler that performs additional processing on the content it
+ receives.
+
+** {Extract Phone Numbers from Content into the Metadata}
+
+ By using the
+
{{{./apidocs/org/apache/tika/sax/PhoneExtractingContentHandler.html}PhoneExtractingContentHandler}},
+ you can have any phone numbers found in the textual content of the document
extracted and placed
+ into the Metadata object for you.
+
+%{include|source=src/examples-src/main/java/org/apache/tika/example/GrabPhoneNumbersExample.java|snippet=aj:..process(..File)|show-gutter=false}
+
+* {Translation}
+
+ Tika provides a pluggable Translation system, which allows you to send the
+ results of parsing off to an external system or program to have the text
+ translated into another language.
+
+** {Translation using the Microsoft Translation API}
+
+ In order to use the Microsoft Translation API, you need to sign up for an
+ account and obtain an id and secret, then pass them to Tika before
+ requesting the translation.
+
+%{include|source=src/examples-src/main/java/org/apache/tika/example/TranslatorExample.java|snippet=aj:..microsoftTranslateToFrench(..String)|show-gutter=false}
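
The Custom Content Handlers text added in this revision describes wrapping one SAX ContentHandler in another that inspects the character data on its way through. The following is a minimal sketch of that idea using only the JDK's SAX classes. It is not Tika's PhoneExtractingContentHandler: the class name `PhoneCollectingHandlerSketch`, the `extract` helper, and the deliberately naive `NNN-NNN-NNNN` pattern are all assumptions made for illustration, and the sketch keeps matches in a plain list rather than a Tika Metadata object.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical decorator: collects phone-like strings from character data and
// forwards characters() and endDocument() to a wrapped handler.
// Other SAX events are ignored here to keep the sketch short.
class PhoneCollectingHandlerSketch extends DefaultHandler {
    // Naive pattern chosen for illustration only (assumption): NNN-NNN-NNNN.
    private static final Pattern PHONE = Pattern.compile("\\b\\d{3}-\\d{3}-\\d{4}\\b");

    private final ContentHandler delegate;
    private final StringBuilder text = new StringBuilder();
    private final List<String> numbers = new ArrayList<>();

    PhoneCollectingHandlerSketch(ContentHandler delegate) {
        this.delegate = delegate;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // A SAX parser may split one text run across several calls, so buffer it.
        text.append(ch, start, length);
        delegate.characters(ch, start, length);
    }

    @Override
    public void endDocument() throws SAXException {
        // Scan the buffered text once the whole document has been seen.
        Matcher matcher = PHONE.matcher(text);
        while (matcher.find()) {
            numbers.add(matcher.group());
        }
        delegate.endDocument();
    }

    List<String> getNumbers() {
        return numbers;
    }

    // Convenience driver: parses an XML string with the JDK SAX parser.
    static List<String> extract(String xml) throws Exception {
        PhoneCollectingHandlerSketch handler =
                new PhoneCollectingHandlerSketch(new DefaultHandler());
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.getNumbers();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extract("<doc><p>Call 555-123-4567 or 555-765-4321.</p></doc>"));
    }
}
```

Because the sketch concatenates text from adjacent elements without a separator, a number that ends one element and is followed directly by text in the next may be missed or mis-joined; a fuller handler would track element boundaries as well.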