[ https://issues.apache.org/jira/browse/TIKA-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris A. Mattmann updated TIKA-1425:
------------------------------------
    Fix Version/s:     (was: 1.7)
                   1.8

- push to 1.8

> Automatic batching of Microsoft service calls
> ---------------------------------------------
>
>                 Key: TIKA-1425
>                 URL: https://issues.apache.org/jira/browse/TIKA-1425
>             Project: Tika
>          Issue Type: Improvement
>          Components: translation
>    Affects Versions: 1.6
>            Reporter: Lewis John McGibbney
>             Fix For: 1.8
>
>
> Right now, when I use the following code, I get the stack trace at the bottom
> of this description. This seems to be because the Request URI is too large to
> make the service request (the server answers with HTTP 414). We need a
> mechanism within the call to Tika.translate which will, on a
> service-by-service basis, determine the maximum Request URI size that can be
> sent. I believe this should be on the Tika side, as how else am I meant to
> know the maximum request size?
> {code:title=translator.java|borderStyle=solid}
> +    Translator translate = new MicrosoftTranslator();
> +    ((MicrosoftTranslator) translate).setId("...");
> +    ((MicrosoftTranslator) translate).setSecret("...");
>      for (java.util.Map.Entry<Text, Parse> entry : parseResult) {
>        Parse parse = entry.getValue();
>        LOG.info("---------\nUrl\n---------------\n");
> @@ -201,7 +207,7 @@
>        System.out.print(parse.getData().toString());
>        if (dumpText) {
>          LOG.info("---------\nParseText\n---------\n");
> -        System.out.print(parse.getText());
> +        System.out.print(translate.translate(parse.getText(), "fr"));
>        }
> {code}
> {code:title=stacktrace.log|borderStyle=solid}
> Exception in thread "main" java.lang.Exception: [microsoft-translator-api] Error retrieving translation : Server returned HTTP response code: 414 for URL: http://api.microsofttranslator.com/V2/Ajax.svc/Translate?&from=&to=fr&text=%D0%A4%D0...
> ...
>     at com.memetix.mst.MicrosoftTranslatorAPI.retrieveString(MicrosoftTranslatorAPI.java:202)
>     at com.memetix.mst.translate.Translate.execute(Translate.java:61)
>     at com.memetix.mst.translate.Translate.execute(Translate.java:76)
>     at org.apache.tika.language.translate.MicrosoftTranslator.translate(MicrosoftTranslator.java:104)
>     at org.apache.nutch.parse.ParserChecker.run(ParserChecker.java:210)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.nutch.parse.ParserChecker.main(ParserChecker.java:228)
> Caused by: java.io.IOException: Server returned HTTP response code: 414 for URL: http://api.microsofttranslator.com/V2/Ajax.svc/Translate?&from=&to=fr&text=%D0%A4%D0%BE%D1%80%D1%83%D0%B...
> ...
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>     at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1675)
>     at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1673)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1671)
>     at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1244)
>     at com.memetix.mst.MicrosoftTranslatorAPI.retrieveResponse(MicrosoftTranslatorAPI.java:178)
>     at com.memetix.mst.MicrosoftTranslatorAPI.retrieveString(MicrosoftTranslatorAPI.java:199)
>     ... 6 more
> Caused by: java.io.IOException: Server returned HTTP response code: 414 for URL: http://api.microsofttranslator.com/V2/Ajax.svc/Translate?&from=&to=fr&text=%D0%A4%D0%BE...
> ...
>     at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1626)
>     at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
>     at com.memetix.mst.MicrosoftTranslatorAPI.retrieveResponse(MicrosoftTranslatorAPI.java:177)
>     ... 7 more
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
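
For illustration only, the batching requested in the issue could be approached by splitting the input so that each piece, once percent-encoded, stays under a per-service budget. The sketch below is not Tika code: the class name `UriBatcher`, the method `split`, and the `maxEncodedLength` parameter are all hypothetical, and the real limit for the Microsoft service is not stated in the issue, so the caller would have to supply it (minus the length of the base URL and other query parameters).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Tika or the memetix client.
final class UriBatcher {

    // Length of the text once percent-encoded for use in a query string.
    static int encodedLength(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8").length();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Splits text into chunks whose encoded form fits maxEncodedLength.
    // Breaks on whitespace where possible; a word that is too long on its
    // own is broken between code points. A single code point that exceeds
    // the budget is still emitted as its own (oversized) chunk.
    static List<String> split(String text, int maxEncodedLength) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            String candidate = current.length() == 0 ? word : current + " " + word;
            if (encodedLength(candidate) <= maxEncodedLength) {
                current.setLength(0);
                current.append(candidate);
                continue;
            }
            // The word does not fit next to what we have: flush first.
            if (current.length() > 0) {
                chunks.add(current.toString());
                current.setLength(0);
            }
            // Add the word code point by code point, flushing when full.
            int i = 0;
            while (i < word.length()) {
                int cp = word.codePointAt(i);
                String ch = new String(Character.toChars(cp));
                if (current.length() > 0
                        && encodedLength(current + ch) > maxEncodedLength) {
                    chunks.add(current.toString());
                    current.setLength(0);
                }
                current.append(ch);
                i += Character.charCount(cp);
            }
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }
}
```

Each chunk could then be passed to `translate.translate(chunk, "fr")` in turn and the results joined. Measuring the encoded length rather than the raw length matters here because non-ASCII text, such as the Cyrillic in the failing URL, grows several times over when percent-encoded.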