----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22219/#review44846 -----------------------------------------------------------
I'm new to Tika and late to this feature so I don't have any background on the targeted use case. Are Tika users looking to machine-translate metadata, parsed content or bot? Do they really benefit by having this in Tika and not use their existing libraries (they need to squeeze them into a Translator interface impl anyway)? Also, org.apache.tika.Tika handles detection and parsing of files/binary streams, it seems a bit awkward to now also have a method to translate Strings from one language to another. The implementation looks ok. Would it make sense to have more fine grained exception handling for translations? The generic Exception throwing in the Translator interface and the catch(Exception) and rethrow IllegalStateException in Tika#translate seems a bit weird. - Matthias Krueger On June 5, 2014, 4:19 p.m., Tyler Palsulich wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/22219/ > ----------------------------------------------------------- > > (Updated June 5, 2014, 4:19 p.m.) > > > Review request for tika and Chris Mattmann. > > > Repository: tika > > > Description > ------- > > This patch adds basic language translation functionality to Tika. Translation > is provided by a Microsoft API, but accessed through Apache 2 licensed > com.memetix.microsoft-translator-java-api > (https://code.google.com/p/microsoft-translator-java-api/ ). If a user wants > to use the translation feature, they have to add a client id and client > secret to the > tika-core/src/main/resources/org/apache/tika/language/translator.properties > file (see http://msdn.microsoft.com/en-us/library/hh454950.aspx ). I added > com.memetix as a dependency in tika-core. I put the Translator class in > org.apache.tika.language. There is no integration with the server or CLI, > yet. Further, only Strings are translated right now -- if you pass in a full > document with xml tags, the structure will be mangled. But, I think that > would be a cool feature -- translate the body, title, subtitle, etc, but not > the structural elements. > > There is still more work to do, but I wanted some more eyes on this to make > sure I'm heading in the right direction and this is a desired feature. Let me > know what you think! > > > Diffs > ----- > > trunk/pom.xml 1600565 > trunk/tika-core/src/main/java/org/apache/tika/Tika.java 1600565 > trunk/tika-core/src/main/java/org/apache/tika/config/TikaConfig.java > 1600565 > > trunk/tika-core/src/main/java/org/apache/tika/language/translate/DefaultTranslator.java > PRE-CREATION > > trunk/tika-core/src/main/java/org/apache/tika/language/translate/Translator.java > PRE-CREATION > trunk/tika-translate/pom.xml PRE-CREATION > > trunk/tika-translate/src/main/java/org/apache/tika/language/translate/MicrosoftTranslator.java > PRE-CREATION > > trunk/tika-translate/src/main/resources/META-INF/services/org.apache.tika.language.translate.Translator > PRE-CREATION > > trunk/tika-translate/src/main/resources/org/apache/tika/language/translator.microsoft.properties > PRE-CREATION > > trunk/tika-translate/src/test/java/org/apache/tika/language/translate/MicrosoftTranslatorTest.java > PRE-CREATION > > Diff: https://reviews.apache.org/r/22219/diff/ > > > Testing > ------- > > There are two simple unit tests for now which translate "hello" to French > ("salut"). One for inputting the source and target languages, one for > inputing just the target language (and detecting the source language > automatically). > > > Thanks, > > Tyler Palsulich > >