lewismc opened a new pull request #444:
URL: https://github.com/apache/tika/pull/444


   This PR addresses https://issues.apache.org/jira/browse/TIKA-3403
   In addition to implementing the example file, it proposes the following 
improvements:
   * minor upgrade of the AWS libraries to `1.11.1018`
   * adds a new configuration option for the AWS transcriber that lets the 
client write to a specific region, cf. `transcribe.REGION`
   * makes use of 
[SelectObjectContentRequest](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/model/SelectObjectContentRequest.html)
 which filters the contents of an Amazon S3 object (the transcription) based on 
a simple Structured Query Language (SQL) statement. Along with the SQL 
expression, the request specifies JSON as the data serialization format of the 
object. Amazon S3 uses this to parse the object data into records and returns 
only the records that match the SQL expression. In our case this means we ONLY 
return the transcription text, which reduces the amount of data we egress from 
S3 to the client by orders of magnitude.
   * the implementation will now automatically create the bucket (to store the 
transcription) if one does not already exist. This is merely a utility 
feature.
   * introduces a LOT of exception handling and checks which will assist the 
client in debugging errors/anomalies. 
   * Reformatted GoogleTranslator.java with 4-space indents.
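
To illustrate the `transcribe.REGION` option, here is a minimal sketch (not the 
code in this PR) of how a region lookup with a fallback could work. It assumes 
the option is read from a `java.util.Properties` object; the class name 
`TranscribeRegionExample`, the `resolveRegion` helper and the `us-east-1` 
fallback are hypothetical and only serve the example.

```java
import java.util.Properties;

/** Hypothetical helper for illustration; not the implementation in this PR. */
class TranscribeRegionExample {

    /** Configuration key named in the description above. */
    static final String REGION_KEY = "transcribe.REGION";

    /** Assumed fallback region, chosen only for this example. */
    static final String DEFAULT_REGION = "us-east-1";

    /** Returns the configured region, or the fallback when the key is absent or blank. */
    static String resolveRegion(Properties config) {
        String region = config.getProperty(REGION_KEY);
        if (region == null || region.trim().isEmpty()) {
            return DEFAULT_REGION;
        }
        return region.trim();
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty(REGION_KEY, "eu-west-1");
        System.out.println(resolveRegion(config)); // prints eu-west-1
    }
}
```

Falling back to a default keeps existing configurations working when the new 
key is not set.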
   
   Thanks.
   
   CC @rohan2810 FYI


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

