Re: [DISCUSS] FLIP-19: Improved BLOB storage architecture

2017-06-20 Thread Till Rohrmann
Hi Biao, you're right. What you've described is a totally valid use case, and we should design the interfaces so that you can have specialized implementations for cases where you can exploit things like a common DFS. I think Nico's design should include this. Cheers, Till On Fri, Jun 16,

Re: [DISCUSS] FLIP-19: Improved BLOB storage architecture

2017-06-16 Thread Biao Liu
Hi Till, I agree with you about Flink's DC. It is indeed another topic. I just thought that we could think more about it before refactoring the BLOB service, to make sure that the DC is easy to implement on top of the refactored architecture. I have another question about the BLOB service. Can we abstract the BLOB

Re: [DISCUSS] FLIP-19: Improved BLOB storage architecture

2017-06-15 Thread Till Rohrmann
Hi Biao, you're right that the BlobServer won't live in the JM in FLIP-6. Instead, it will be part of either the RM or the dispatcher component, depending on the actual implementation. The requirements for the BlobServer should, however, be the same. Concerning the question about Flink's

Re: [DISCUSS] FLIP-19: Improved BLOB storage architecture

2017-06-15 Thread Biao Liu
I share Chesnay Schepler's concern. AFAIK, Flink does not support the DC as well as MapReduce and Spark do. We only support the DC in the DataSet API, and the DC in Flink does not support local files. Is this a good chance to refactor the DC too? I have another concern: currently, the BLOB server has some conflicts

Re: [DISCUSS] FLIP-19: Improved BLOB storage architecture

2017-06-14 Thread Chesnay Schepler
The DC does delete local files. The user requests the file through the RuntimeContext, but the download and local caching are completely handled by Flink. The problems you mention are exactly why I'm suggesting to rebuild the DC on top of the Blob service. If a user currently wants to use the

Re: [DISCUSS] FLIP-19: Improved BLOB storage architecture

2017-06-14 Thread Nico Kruber
Actually, I don't see the automatic cleanup you referred to, but from what I can see around the DistributedCache class and its use, it is simply a registry for user files whose life cycle is completely managed by the user (upload, download, delete). Files may even reside outside of Flink's

Re: [DISCUSS] FLIP-19: Improved BLOB storage architecture

2017-06-14 Thread Chesnay Schepler
One thing I have been wondering about for a long time is why the distributed cache is not implemented via the blob store. The DC is essentially just a copy routine with local caching and automatic cleanup, so basically what the blob store is supposed to do (I guess). On 13.06.2017 16:31, Nico Kruber

[DISCUSS] FLIP-19: Improved BLOB storage architecture

2017-06-13 Thread Nico Kruber
Hi all, I'd like to initiate a discussion on some architectural changes to the BLOB server that finally add a proper cleanup story, remove some dead code, and extend the BLOB store's use to off-loaded (large) RPC messages. Please have a look at FLIP-19, which details the proposed changes: