-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22219/#review44846
-----------------------------------------------------------


I'm new to Tika and late to this feature so I don't have any background on the 
targeted use case. Are Tika users looking to machine-translate metadata, parsed 
content or bot? Do they really benefit by having this in Tika and not use their 
existing libraries (they need to squeeze them into a Translator interface impl 
anyway)? Also, org.apache.tika.Tika handles detection and parsing of 
files/binary streams, it seems a bit awkward to now also have a method to 
translate Strings from one language to another.

The implementation looks ok. Would it make sense to have more fine grained 
exception handling for translations? The generic Exception throwing in the 
Translator interface and the catch(Exception) and rethrow IllegalStateException 
in Tika#translate seems a bit weird.

- Matthias Krueger


On June 5, 2014, 4:19 p.m., Tyler Palsulich wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22219/
> -----------------------------------------------------------
> 
> (Updated June 5, 2014, 4:19 p.m.)
> 
> 
> Review request for tika and Chris Mattmann.
> 
> 
> Repository: tika
> 
> 
> Description
> -------
> 
> This patch adds basic language translation functionality to Tika. Translation 
> is provided by a Microsoft API, but accessed through Apache 2 licensed 
> com.memetix.microsoft-translator-java-api 
> (https://code.google.com/p/microsoft-translator-java-api/ ). If a user wants 
> to use the translation feature, they have to add a client id and client 
> secret to the 
> tika-core/src/main/resources/org/apache/tika/language/translator.properties 
> file (see http://msdn.microsoft.com/en-us/library/hh454950.aspx ). I added 
> com.memetix as a dependency in tika-core. I put the Translator class in 
> org.apache.tika.language. There is no integration with the server or CLI, 
> yet. Further, only Strings are translated right now -- if you pass in a full 
> document with xml tags, the structure will be mangled. But, I think that 
> would be a cool feature -- translate the body, title, subtitle, etc, but not 
> the structural elements. 
> 
> There is still more work to do, but I wanted some more eyes on this to make 
> sure I'm heading in the right direction and this is a desired feature. Let me 
> know what you think!
> 
> 
> Diffs
> -----
> 
>   trunk/pom.xml 1600565 
>   trunk/tika-core/src/main/java/org/apache/tika/Tika.java 1600565 
>   trunk/tika-core/src/main/java/org/apache/tika/config/TikaConfig.java 
> 1600565 
>   
> trunk/tika-core/src/main/java/org/apache/tika/language/translate/DefaultTranslator.java
>  PRE-CREATION 
>   
> trunk/tika-core/src/main/java/org/apache/tika/language/translate/Translator.java
>  PRE-CREATION 
>   trunk/tika-translate/pom.xml PRE-CREATION 
>   
> trunk/tika-translate/src/main/java/org/apache/tika/language/translate/MicrosoftTranslator.java
>  PRE-CREATION 
>   
> trunk/tika-translate/src/main/resources/META-INF/services/org.apache.tika.language.translate.Translator
>  PRE-CREATION 
>   
> trunk/tika-translate/src/main/resources/org/apache/tika/language/translator.microsoft.properties
>  PRE-CREATION 
>   
> trunk/tika-translate/src/test/java/org/apache/tika/language/translate/MicrosoftTranslatorTest.java
>  PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/22219/diff/
> 
> 
> Testing
> -------
> 
> There are two simple unit tests for now which translate "hello" to French 
> ("salut"). One for inputting the source and target languages, one for 
> inputing just the target language (and detecting the source language 
> automatically).
> 
> 
> Thanks,
> 
> Tyler Palsulich
> 
>

Reply via email to