Attila Sasvari created OOZIE-2821:
-------------------------------------

             Summary: Using Hadoop Archives for Oozie ShareLib
                 Key: OOZIE-2821
                 URL: https://issues.apache.org/jira/browse/OOZIE-2821
             Project: Oozie
          Issue Type: New Feature
            Reporter: Attila Sasvari

Oozie ShareLib is a collection of jar files required by Oozie actions. Right now, these jars are uploaded one by one during Oozie ShareLib installation. There can be hundreds of such jars, and many of them are quite small, significantly smaller than an HDFS block. Storing a large number of small files in HDFS is inefficient: for example, the NameNode keeps an object in memory for each file and for each of its blocks, no matter how small the file is.

When an action is executed, these jar files are copied to the distributed cache.

It would be worth investigating the possibility of using [Hadoop archives|http://hadoop.apache.org/docs/r2.6.5/hadoop-mapreduce-client/hadoop-mapreduce-client-core/HadoopArchives.html] for handling Oozie ShareLib files, because it could result in better utilisation of HDFS.
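
To give a rough feel for the NameNode overhead mentioned above, here is a back-of-the-envelope sketch. Everything in it is an assumption made for illustration: the figure of about 150 bytes per NameNode object is a commonly quoted rule of thumb rather than a measured value, the 128 MiB block size is just a typical setting, the 500 jars of 200 KiB each are made-up numbers, and the archive is modelled in a simplified way as one data file plus two small index files (a real archive may be split into several part files). Directories are ignored on both sides.

```python
import math

# Assumptions for illustration only (not taken from Oozie or measured on a cluster).
BYTES_PER_NAMENODE_OBJECT = 150       # rule-of-thumb estimate per file/block object
BLOCK_SIZE = 128 * 1024 * 1024        # assumed HDFS block size: 128 MiB


def blocks_needed(size, block_size=BLOCK_SIZE):
    """Number of HDFS blocks a file of `size` bytes spans."""
    return math.ceil(size / block_size)


def objects_for_separate_files(file_sizes, block_size=BLOCK_SIZE):
    """One file object plus one object per block, for every individual file."""
    return sum(1 + blocks_needed(size, block_size) for size in file_sizes)


def objects_for_single_archive(file_sizes, block_size=BLOCK_SIZE, index_files=2):
    """Simplified archive model: one data file holding all bytes,
    plus `index_files` small index files that each fit in one block."""
    data_objects = 1 + blocks_needed(sum(file_sizes), block_size)
    index_objects = index_files * (1 + 1)  # one file object + one block each
    return data_objects + index_objects


# Hypothetical ShareLib: 500 jars of 200 KiB each.
jar_sizes = [200 * 1024] * 500

separate = objects_for_separate_files(jar_sizes)
archived = objects_for_single_archive(jar_sizes)

print("separate files:", separate, "objects,",
      separate * BYTES_PER_NAMENODE_OBJECT, "bytes (estimated)")
print("single archive:", archived, "objects,",
      archived * BYTES_PER_NAMENODE_OBJECT, "bytes (estimated)")
```

Under these assumptions the individually uploaded jars account for 1000 NameNode objects, while the archived layout needs 6. The absolute numbers are small for a single ShareLib, so the point is the ratio, not the byte count.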