[ https://issues.apache.org/jira/browse/HADOOP-9639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843068#comment-13843068 ]

Steve Loughran commented on HADOOP-9639:
----------------------------------------

Some quick comments on this:

# The upload mechanism assumes that rename() is atomic. This should be 
spelled out, to avoid people trying to use blobstores as their cache 
infrastructure; see the sketch after this list.
# Obviously: add a specific exception to indicate this kind of race 
condition.
# The shared-cache-enabled flags are obviously things that admins would have 
the right to set and mark final in yarn-site.xml files; clients need to 
handle this without problems.
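
To spell out the first point, here's a minimal sketch of the write-then-rename
commit pattern the uploader depends on (class and method names are mine, not
from the patch); the comment marks exactly where the atomicity assumption
lives:

{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CacheUploader {
  /**
   * Commit a resource into the cache by writing to a temporary file
   * and renaming it into place.
   */
  public static void commitToCache(Configuration conf, Path tmp, Path dest,
      byte[] payload) throws IOException {
    FileSystem fs = FileSystem.get(conf);
    try (FSDataOutputStream out = fs.create(tmp, true)) {
      out.write(payload);
    }
    // The atomicity assumption: readers see either no file at dest, or the
    // complete one. A blobstore "rename" is a copy + delete, so readers may
    // observe a partial or missing entry.
    if (!fs.rename(tmp, dest)) {
      throw new IOException("Failed to commit " + tmp + " to " + dest);
    }
  }
}
{code}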


Security:

# You also have to think about preserving the security of files I don't want 
to share with others: either by allowing me to mix cached with uncached files 
(keeping configuration resources with sensitive information out of the shared 
cache), or even by not letting others in the cluster know what binaries I'm 
pushing around. Presumably clusters that care about such things will just 
disable the cache altogether, but there is the use case of "shared cache for 
most data, some private resources". If that use case is not to be supported, 
we should at least call it out; if it is, a sketch of how it might look 
follows below.
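
One plausible shape for that mixed use case (my assumption, not anything in
the design doc) is to key shared-cache eligibility off the existing
LocalResourceVisibility levels, so sensitive resources never become cache
candidates:

{code:java}
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.LocalResourceType;
import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
import org.apache.hadoop.yarn.api.records.URL;

public class ResourceVisibilityExample {
  // Public jars would be candidates for the shared cache...
  static LocalResource sharedJar(URL url, long size, long timestamp) {
    return LocalResource.newInstance(url, LocalResourceType.FILE,
        LocalResourceVisibility.PUBLIC, size, timestamp);
  }

  // ...while sensitive configs stay APPLICATION-scoped and bypass it.
  static LocalResource privateConfig(URL url, long size, long timestamp) {
    return LocalResource.newInstance(url, LocalResourceType.FILE,
        LocalResourceVisibility.APPLICATION, size, timestamp);
  }
}
{code}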

Co-ordination-wise:

# I (personally) think we should all just embrace the presence of a ZK quorum 
on the cluster as core infrastructure, since the HA systems need it anyway. 
It would stop everyone trying to write their own "let's use the filesystem as 
a way to synchronize clients, based on the assumption that 
FileSystem.create() with overwrite==false guarantees unique access". That's 
just an opinion, and I don't see that a side-feature should force the 
decision; but since the cache is optional, ZK could be made a prerequisite 
for caching. It would fundamentally change how confident we could be that the 
system is correct, even on filesystems that break the assumptions of POSIX 
more significantly than HDFS does.
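
For reference, here's a minimal sketch of that filesystem-as-lock idiom
against the stock FileSystem API (class and method names are mine; the exact
exception raised on a losing create varies across implementations), showing
where the correctness assumption sits:

{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FileAlreadyExistsException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsLock {
  /**
   * Try to acquire a "lock" by creating a marker file with
   * overwrite == false. On HDFS the namenode makes this an atomic
   * check-and-create; on blobstores and other non-POSIX stores,
   * two clients can both believe they won.
   */
  public static boolean tryLock(FileSystem fs, Path lockFile)
      throws IOException {
    try {
      fs.create(lockFile, false).close();
      return true;
    } catch (FileAlreadyExistsException e) {
      // Some FileSystem implementations throw a plain IOException here
      // instead, which makes the idiom even harder to get right.
      return false;
    }
  }
}
{code}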

* 
[HADOOP-9361|https://github.com/steveloughran/hadoop-trunk/tree/stevel/HADOOP-9361-filesystem-contract/hadoop-common-project/hadoop-common/src/site/markdown/filesystem]
 is attempting to formally define the semantics of a Hadoop-compatible 
filesystem. If you could use that as the foundational assumptions, and perhaps 
even its 
[notation|https://github.com/steveloughran/hadoop-trunk/blob/stevel/HADOOP-9361-filesystem-contract/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/notation.md]
 for defining your own behavior, the analysis on p7 could be proved more 
rigorously.

* The semantics of {{happens-before}} come from [Lamport78], [Time, Clocks 
and the Ordering of Events in a Distributed 
System|http://research.microsoft.com/en-us/um/people/lamport/pubs/time-clocks.pdf],
 so that should be used as the citation: it is more appropriate than the 
memory models of Java or out-of-order CPUs.

* Script-wise, I've been evolving a [generic YARN service 
launcher|https://github.com/hortonworks/hoya/tree/master/hoya-core/src/main/java/org/apache/hadoop/yarn/service/launcher],
 which is nearly ready to submit as YARN-679. If the cleaner service were 
implemented as a YARN service, it could be invoked as a run-once command 
line, or deployed in a YARN container service which provides cron-like 
scheduling.
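
To make that concrete, a minimal sketch of the cleaner wrapped in the
standard hadoop-common service lifecycle (the class name and the eviction
method are hypothetical placeholders, not from the design doc):

{code:java}
import org.apache.hadoop.service.AbstractService;

/**
 * Sketch: the cache cleaner wrapped in the standard service lifecycle,
 * so a generic launcher can run it once from the command line or keep
 * it resident under a scheduling service.
 */
public class CleanerService extends AbstractService {

  public CleanerService() {
    super("CleanerService");
  }

  @Override
  protected void serviceStart() throws Exception {
    super.serviceStart();
    scanAndEvict();  // hypothetical: one pass over the cache root
  }

  // Placeholder for the actual eviction logic from the design doc:
  // walk cache entries, delete those past retention or unreferenced.
  private void scanAndEvict() {
  }
}
{code}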

> truly shared cache for jars (jobjar/libjar)
> -------------------------------------------
>
>                 Key: HADOOP-9639
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9639
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: filecache
>    Affects Versions: 2.0.4-alpha
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>         Attachments: shared_cache_design.pdf, shared_cache_design_v2.pdf, 
> shared_cache_design_v3.pdf, shared_cache_design_v4.pdf
>
>
> Currently there is the distributed cache that enables you to cache jars and 
> files so that attempts from the same job can reuse them. However, sharing is 
> limited with the distributed cache because it is normally on a per-job basis. 
> On a large cluster, sometimes copying of jobjars and libjars becomes so 
> prevalent that it consumes a large portion of the network bandwidth, not to 
> mention defeating the purpose of "bringing compute to where data is". This 
> is wasteful because in most cases code doesn't change much across many jobs.
> I'd like to propose and discuss the feasibility of introducing a truly 
> shared cache so that multiple jobs from multiple users can share and cache 
> jars. This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
