Actually, I don't see the automatic cleanup you referred to, but from what I
can see around the DistributedCache class and its use, it is simply a registry
for user files whose life cycle is completely managed by the user (upload,
download, delete). Files may even reside outside of Flink's control and survive
job termination.

Therefore, it is a different use case which can be (mis-)used as some sort of 
blob storage. It...
(1) will not work if there is no distributed/commonly-accessible file system,
(2) does not provide life cycle management for distributed files,
(3) does not delete local files (this gets especially complicated if files are
shared, e.g. multiple tasks on one TaskManager running the same job with the
same libraries),
...
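
To illustrate why (3) gets complicated once files are shared: a local copy may
only be deleted after the last task using it has finished, so some form of
reference counting is needed. Below is a minimal sketch of that idea. The class
SharedLocalFiles and its methods are hypothetical names made up for this
example; they are not part of Flink or of the DistributedCache.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not a Flink class): reference-counts local files shared by tasks. */
class SharedLocalFiles {
    // Number of tasks currently using each local file.
    private final Map<Path, Integer> refCounts = new HashMap<>();

    /** Called when a task starts using the local copy. */
    synchronized void acquire(Path localFile) {
        refCounts.merge(localFile, 1, Integer::sum);
    }

    /**
     * Called when a task is done with the local copy. The file is deleted only
     * when the last user releases it. Returns true if this call deleted the file.
     */
    synchronized boolean release(Path localFile) throws IOException {
        Integer count = refCounts.get(localFile);
        if (count == null) {
            throw new IllegalStateException("release without acquire: " + localFile);
        }
        if (count > 1) {
            // Other tasks still use the file, so only decrement the counter.
            refCounts.put(localFile, count - 1);
            return false;
        }
        refCounts.remove(localFile);
        return Files.deleteIfExists(localFile);
    }
}
```

With two tasks on one TaskManager sharing the same library, the first release()
leaves the file in place and only the second one removes it.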


Nico

On Wednesday, 14 June 2017 12:29:55 CEST Chesnay Schepler wrote:
> One thing i was wondering for a long time now is why the distributed
> cache is not implemented
> via the blob store.
> 
> The DC is essentially just a copy routine with local caching and
> automatic cleanup, so basically what
> the blob store is supposed to do (i guess).
> 
> On 13.06.2017 16:31, Nico Kruber wrote:
> > Hi all,
> > I'd like to initiate a discussion on some architectural changes for the
> > BLOB server which finally add a proper cleanup story, remove some dead
> > code, and extend the BLOB store's use for off-loaded (large) RPC
> > messages.
> > 
> > Please have a look at FLIP-19 that details the proposed changes:
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-19%3A+Improved+BLOB+storage+architecture
> > 
> > 
> > Regards
> > Nico
> > 
> > PS: While doing the re-write, I'd also like to fix some concurrency issues
> > in the current code.
