Re: Amazon Athena

2017-06-07 Thread Seth Wiesman
Seems straightforward. The biggest challenge is that you don’t want Athena 
picking up partially written or otherwise corrupt files. The issue with S3 is 
that you cannot allow Flink to perform delete, truncate, or rename operations, 
because Flink moves faster than S3 can become consistent. I think the simplest 
solution would be to use the bucketing sink to write files out to HDFS and then 
add an additional operator or auxiliary process that copies them to S3 when 
they move from pending to complete. If you do this, you only need 
at-least-once copies to S3, because overwriting a file with itself is the only 
consistent overwrite condition.  

Seth  
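
A minimal sketch of the auxiliary copy step described above, not Seth's actual 
implementation. It assumes the bucketing sink's default naming (in-progress and 
pending files start with "_" and end with ".in-progress" or ".pending"), uses a 
local directory as a stand-in for the HDFS output path, and takes a 
hypothetical `upload` callable in place of a real S3 client:

```python
import os

# Assumption: BucketingSink default markers for files that are not yet final.
# Adjust these if your sink overrides the prefixes or suffixes.
TEMP_PREFIX = "_"
TEMP_SUFFIXES = (".in-progress", ".pending")


def is_complete(filename):
    """Return True if the name looks like a finalized part file."""
    return not filename.startswith(TEMP_PREFIX) and not filename.endswith(TEMP_SUFFIXES)


def copy_completed(src_root, upload):
    """Hand every completed file under src_root to upload(local_path, key).

    The key is the path relative to src_root, so a retried copy writes the
    same bytes to the same key. That is why at-least-once copying is enough.
    """
    copied = []
    for dirpath, _dirs, files in os.walk(src_root):
        for name in sorted(files):
            if not is_complete(name):
                continue  # skip in-progress and pending files
            local_path = os.path.join(dirpath, name)
            key = os.path.relpath(local_path, src_root).replace(os.sep, "/")
            upload(local_path, key)
            copied.append(key)
    return sorted(copied)
```

In practice `upload` could wrap something like boto3's 
`upload_file(Filename, Bucket, Key)`. The copy never deletes, truncates, or 
renames anything on S3; it only writes whole, finalized files.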

On 6/6/17, 10:03 AM, "Aljoscha Krettek"  wrote:

Hi,

I don’t have any experience with Athena, but this sounds doable. It seems 
that you only need some way of writing into S3, and then Athena will pick up 
the data in S3 when running queries. Multiple folks have used Flink to write 
data from Kafka into S3; the most recent case I know of from the mailing 
lists is probably Seth’s (in cc). Seth, could you maybe comment if you find 
some time?

Best,
Aljoscha

> On 31. May 2017, at 04:10, Madhukar Thota wrote:
> 
> Has anyone used Amazon Athena with Apache Flink?
> 
> I have a use case where I want to write streaming data (which is in Avro 
> format) from Kafka to S3 by converting it into Parquet format, and update 
> the S3 location with daily partitions on an Athena table.
> 
> Any guidance is appreciated.
> 





Re: Amazon Athena

2017-06-06 Thread Aljoscha Krettek
Hi,

I don’t have any experience with Athena, but this sounds doable. It seems that 
you only need some way of writing into S3, and then Athena will pick up the 
data in S3 when running queries. Multiple folks have used Flink to write data 
from Kafka into S3; the most recent case I know of from the mailing lists is 
probably Seth’s (in cc). Seth, could you maybe comment if you find some time?

Best,
Aljoscha
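
For the daily-partition part of the question, a hedged sketch of how a new 
day's S3 prefix might be registered with a partitioned Athena table. The 
table name, bucket, and partition column `dt` are hypothetical; the statement 
built here is Athena's `ALTER TABLE ... ADD IF NOT EXISTS PARTITION` DDL, and 
how it gets submitted (console, JDBC, or API) is left open:

```python
from datetime import date


def partition_location(base_uri, day):
    """Build a Hive-style daily prefix, e.g. s3://bucket/path/dt=2017-05-30/."""
    return f"{base_uri.rstrip('/')}/dt={day.isoformat()}/"


def add_partition_ddl(table, base_uri, day):
    """Return the DDL that registers one daily partition.

    IF NOT EXISTS keeps the statement safe to re-run for the same day.
    """
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (dt = '{day.isoformat()}') "
        f"LOCATION '{partition_location(base_uri, day)}'"
    )


# Hypothetical table and bucket names, for illustration only.
print(add_partition_ddl("events", "s3://example-bucket/events", date(2017, 5, 30)))
```

With a `dt=YYYY-MM-DD` layout like this, `MSCK REPAIR TABLE` is an alternative 
way to have Athena discover new partitions.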

> On 31. May 2017, at 04:10, Madhukar Thota  wrote:
> 
> Has anyone used Amazon Athena with Apache Flink?
> 
> I have a use case where I want to write streaming data (which is in Avro 
> format) from Kafka to S3 by converting it into Parquet format, and update 
> the S3 location with daily partitions on an Athena table.
> 
> Any guidance is appreciated.
> 



Amazon Athena

2017-05-30 Thread Madhukar Thota
Has anyone used Amazon Athena with Apache Flink?

I have a use case where I want to write streaming data (which is in Avro
format) from Kafka to S3 by converting it into Parquet format, and update
the S3 location with daily partitions on an Athena table.

Any guidance is appreciated.