lewismc opened a new pull request #444: URL: https://github.com/apache/tika/pull/444
This pull request addresses https://issues.apache.org/jira/browse/TIKA-3403

In addition to implementing the example file, it proposes the following improvements:

* Minor upgrade of the AWS libraries to `1.11.1018`.
* Adds a new configuration option for the AWS transcriber that lets the client write to a specific region, cf. `transcribe.REGION`.
* Makes use of [SelectObjectContentRequest](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/model/SelectObjectContentRequest.html), which filters the contents of an Amazon S3 object (the transcription) based on a simple Structured Query Language (SQL) statement. In the request, along with the SQL expression, we specify JSON as the data serialization format of the object. Amazon S3 uses this to parse the object data into records and returns only the records that match the specified SQL expression. In our case this means we return ONLY the transcription text, which dramatically (by orders of magnitude) reduces the amount of data we egress from S3 to the client.
* The implementation now automatically creates the bucket (to store the transcription) if one does not already exist. This is merely a utility feature.
* Introduces a LOT of exception handling and checks, which will assist the client in debugging errors/anomalies.
* Reformats GoogleTranslator.java with 4-space indents.

Thanks.

CC @rohan2810 FYI
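To illustrate the kind of fail-fast configuration checks described above, here is a minimal, standard-library-only sketch. It is not the code in this pull request: the class `TranscribeSettings` and its methods are hypothetical, and treating `transcribe.REGION` as a `java.util.Properties` key is an assumption made purely for illustration. The bucket check follows the general S3 naming rules (3-63 characters; lowercase letters, digits, dots and hyphens; starting and ending with a letter or digit).

```java
import java.util.Properties;

/** Hypothetical helper for illustration only; not part of the pull request. */
final class TranscribeSettings {

    // Option name taken from the PR description; using it as a
    // Properties key is an assumption of this sketch.
    static final String REGION_KEY = "transcribe.REGION";

    private TranscribeSettings() {
    }

    /** Returns the configured region, or the fallback when the key is absent or blank. */
    public static String resolveRegion(Properties props, String fallback) {
        String value = props.getProperty(REGION_KEY);
        if (value == null || value.trim().isEmpty()) {
            return fallback;
        }
        return value.trim();
    }

    /** Fails fast with a descriptive message for bucket names that S3 would reject. */
    public static String requireValidBucket(String bucket) {
        // One leading and one trailing alphanumeric plus 1-61 middle
        // characters gives a total length of 3-63.
        if (bucket == null || !bucket.matches("[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")) {
            throw new IllegalArgumentException("Invalid S3 bucket name: " + bucket
                    + " (expected 3-63 characters: lowercase letters, digits, dots, hyphens)");
        }
        return bucket;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(REGION_KEY, "us-west-2");
        System.out.println(resolveRegion(props, "us-east-1"));
        System.out.println(requireValidBucket("tika-transcriptions"));
    }
}
```

Validating these values before any AWS call means a misconfigured region or bucket surfaces as a clear local error rather than as a remote service exception.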