[ https://issues.apache.org/jira/browse/TIKA-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017830#comment-14017830 ]
Lewis John McGibbney commented on TIKA-1319:
--------------------------------------------

I think this is a nice (and pretty cunning) contribution... which essentially takes Tika in yet another direction... which I think is great. It is entirely new, affects very little of the existing codebase, and adds a nice translation service where users can either specify both the source and target languages, or specify only a target and leave it to the underlying client to detect the source.
Licenses all check out.
The Javadocs are excellent, especially the comma-separated list of language options.
I like the addition of Tika.detect; however, I also agree with you, [~tpalsulich], that there are improvements to be made regarding translation of markup content.
I personally think that integration with the server component could come at a later stage.
Good work [~tpalsulich].

> Translation
> -----------
>
>                 Key: TIKA-1319
>                 URL: https://issues.apache.org/jira/browse/TIKA-1319
>             Project: Tika
>          Issue Type: New Feature
>            Reporter: Tyler Palsulich
>            Priority: Minor
>
> I just opened up a review on reviews.apache.org --
> https://reviews.apache.org/r/22219/. I copied the description below.
> This patch adds basic language translation functionality to Tika. Translation
> is provided by a Microsoft API, but accessed through Apache 2 licensed
> com.memetix.microsoft-translator-java-api
> (https://code.google.com/p/microsoft-translator-java-api/ ). If a user wants
> to use the translation feature, they have to add a client id and client
> secret to the
> tika-core/src/main/resources/org/apache/tika/language/translator.properties
> file (see http://msdn.microsoft.com/en-us/library/hh454950.aspx ). I added
> com.memetix as a dependency in tika-core. I put the Translator class in
> org.apache.tika.language. There is no integration with the server or CLI,
> yet.
> Further, only Strings are translated right now -- if you pass in a full
> document with XML tags, the structure will be mangled. But, I think that
> would be a cool feature -- translate the body, title, subtitle, etc., but not
> the structural elements.
> There is still more work to do, but I wanted some more eyes on this to make
> sure I'm heading in the right direction and this is a desired feature. Let me
> know what you think!
> There are two simple unit tests for now which translate "hello" to French
> ("salut"). One for inputting the source and target languages, one for
> inputting just the target language (and detecting the source language
> automatically).



--
This message was sent by Atlassian JIRA
(v6.2#6252)
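
The two call styles described in the issue (explicit source and target languages, or a target only with the source detected automatically) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Translator` interface, `DictionaryTranslator`, `addEntry` and `detect` shown here are hypothetical names and do not reproduce the class in the patch, and a small in-memory dictionary stands in for the Microsoft service so that no client id or client secret is needed. Returning the input unchanged when nothing matches is likewise an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: none of these types are the classes from the patch.
class TranslationSketch {
    public static void main(String[] args) {
        DictionaryTranslator translator = new DictionaryTranslator();
        translator.addEntry("en", "fr", "hello", "salut");

        // Style 1: caller supplies both source and target languages.
        System.out.println(translator.translate("hello", "en", "fr")); // prints "salut"

        // Style 2: caller supplies only the target; the source is detected.
        System.out.println(translator.translate("hello", "fr")); // prints "salut"
    }
}

// Hypothetical interface mirroring the two call styles described in the issue.
interface Translator {
    String translate(String text, String sourceLanguage, String targetLanguage);

    String translate(String text, String targetLanguage);
}

// In-memory stand-in for a remote translation service (assumption of this sketch).
final class DictionaryTranslator implements Translator {
    // "source|target|text" -> translation
    private final Map<String, String> entries = new HashMap<>();
    // source language -> known words, used for naive source detection
    private final Map<String, Set<String>> vocabulary = new LinkedHashMap<>();

    void addEntry(String source, String target, String text, String translation) {
        entries.put(key(source, target, text), translation);
        vocabulary.computeIfAbsent(source, k -> new HashSet<>()).add(normalize(text));
    }

    @Override
    public String translate(String text, String sourceLanguage, String targetLanguage) {
        // Unknown text is returned unchanged rather than failing.
        return entries.getOrDefault(key(sourceLanguage, targetLanguage, text), text);
    }

    @Override
    public String translate(String text, String targetLanguage) {
        // Detect the source first, then delegate to the explicit overload.
        return detect(text)
                .map(source -> translate(text, source, targetLanguage))
                .orElse(text);
    }

    // Naive detection: the first language whose vocabulary contains the text.
    Optional<String> detect(String text) {
        String needle = normalize(text);
        for (Map.Entry<String, Set<String>> language : vocabulary.entrySet()) {
            if (language.getValue().contains(needle)) {
                return Optional.of(language.getKey());
            }
        }
        return Optional.empty();
    }

    private static String normalize(String text) {
        return text.trim().toLowerCase(Locale.ROOT);
    }

    private static String key(String source, String target, String text) {
        return source + "|" + target + "|" + normalize(text);
    }
}
```

Keeping the target-only overload as a thin wrapper around the explicit one means detection can be swapped out (for example, for a real language detector) without touching the translation path.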