Attila Sasvari created OOZIE-2821:
-------------------------------------

             Summary: Using Hadoop Archives for Oozie ShareLib
                 Key: OOZIE-2821
                 URL: https://issues.apache.org/jira/browse/OOZIE-2821
             Project: Oozie
          Issue Type: New Feature
            Reporter: Attila Sasvari


Oozie ShareLib is a collection of jar files required by Oozie actions. Right
now, these jars are uploaded one by one during Oozie ShareLib installation.
There can be hundreds of such jars, and many of them are quite small,
significantly smaller than an HDFS block. Storing a large number of small files
in HDFS is inefficient, for example because the NameNode keeps an object in
memory for every file and every block, and each small jar occupies a block of
its own while filling only a fraction of the configured block size. When an
action is executed, these jar files are copied to the distributed cache.
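
To make the NameNode cost concrete, here is a back-of-the-envelope sketch in
Java. It assumes one namespace object per file plus one per block, ignores
directories, assumes a 128 MB block size, and models a HAR as a single part
file plus its _index and _masterindex files. The ShareLib of 300 jars of
200 KB each is a hypothetical example, not a measurement.

```java
import java.util.Collections;
import java.util.List;

/**
 * Rough estimate of NameNode namespace objects for a ShareLib.
 * Assumptions: one object per file plus one per block; directories ignored.
 */
public class ShareLibObjectEstimate {

    // Assumed dfs.blocksize of 128 MB.
    static final long BLOCK_SIZE = 128L * 1024 * 1024;

    // Number of blocks a file of the given size occupies (zero for an empty file).
    static long blocksFor(long sizeBytes) {
        if (sizeBytes == 0) {
            return 0;
        }
        return (sizeBytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    // Loose jars: every jar is its own file with its own block(s).
    static long looseObjects(List<Long> jarSizes) {
        long objects = 0;
        for (long size : jarSizes) {
            objects += 1 + blocksFor(size);
        }
        return objects;
    }

    // Simplified HAR model (assumption): all jar bytes go into one part file,
    // plus _index and _masterindex, each assumed to fit in a single block.
    static long harObjects(List<Long> jarSizes) {
        long totalBytes = 0;
        for (long size : jarSizes) {
            totalBytes += size;
        }
        long partFile = 1 + blocksFor(totalBytes);
        long indexFiles = 2 * (1 + 1);
        return partFile + indexFiles;
    }

    public static void main(String[] args) {
        // Hypothetical ShareLib: 300 jars of 200 KB each.
        List<Long> jars = Collections.nCopies(300, 200L * 1024);
        System.out.println("loose jars: " + looseObjects(jars) + " objects");
        System.out.println("single HAR: " + harObjects(jars) + " objects");
    }
}
```

With these made-up numbers the program prints 600 objects for the loose jars
versus 6 for the archive. Real archives may be split into several part files,
so the HAR figure is a lower bound under this model.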

It would be worth investigating the possibility of using [Hadoop
archives|http://hadoop.apache.org/docs/r2.6.5/hadoop-mapreduce-client/hadoop-mapreduce-client-core/HadoopArchives.html]
for handling Oozie ShareLib files, because packing the jars into a few archive
files could result in better utilisation of HDFS, with far fewer files and
blocks for the NameNode to track.
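
If the ShareLib were packed into an archive (for example with the
`hadoop archive` tool described in the linked documentation), its jars would
be addressed through har:// URIs. The documented form is
har://scheme-hostname:port/archivepath/fileinarchive, or
har:///archivepath/fileinarchive when the underlying default filesystem is
meant. The Java sketch below builds such URIs with java.net.URI; the helper
name harUri and the archive path /user/oozie/share/lib/sharelib.har are
hypothetical, and how Oozie would wire this into ShareLib resolution is left
open.

```java
import java.net.URI;

public class HarUris {

    // Hypothetical helper: builds a har:// URI for a file inside a Hadoop
    // Archive, following the documented form
    // har://scheme-hostname:port/archivepath/fileinarchive, or
    // har:///archivepath/fileinarchive when no scheme/host is given.
    static URI harUri(URI archive, String fileInArchive) {
        String archivePath = archive.getPath();
        if (archivePath == null || !archivePath.endsWith(".har")) {
            throw new IllegalArgumentException("not a .har path: " + archive);
        }
        // Encode the underlying filesystem as "scheme-host[:port]".
        String authority = "";
        if (archive.getScheme() != null && archive.getHost() != null) {
            authority = archive.getScheme() + "-" + archive.getHost();
            if (archive.getPort() != -1) {
                authority += ":" + archive.getPort();
            }
        }
        String entry = fileInArchive.startsWith("/")
                ? fileInArchive.substring(1)
                : fileInArchive;
        return URI.create("har://" + authority + archivePath + "/" + entry);
    }

    public static void main(String[] args) {
        URI archive = URI.create("hdfs://namenode:8020/user/oozie/share/lib/sharelib.har");
        System.out.println(harUri(archive, "pig/pig.jar"));
        System.out.println(harUri(URI.create("/user/oozie/share/lib/sharelib.har"), "pig/pig.jar"));
    }
}
```

Run as-is, this prints
har://hdfs-namenode:8020/user/oozie/share/lib/sharelib.har/pig/pig.jar and
har:///user/oozie/share/lib/sharelib.har/pig/pig.jar.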

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
