[ 
https://issues.apache.org/jira/browse/OOZIE-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15027449#comment-15027449
 ] 

Robert Kanter commented on OOZIE-2402:
--------------------------------------

The overall approach looks good to me.  This should be helpful in speeding 
things up.
Some minor things:
# Please remove the trailing whitespace from the 7 affected lines
# Can you update the docs?  Update this section 
(http://oozie.apache.org/docs/4.2.0/AG_Install.html#Oozie_Server_Setup) with 
the new usage info.  You can find the source for this page in AG_Install.twiki 
in the code.
# In {{concurrentCopyFromLocal}}, I think we should put the call to 
{{copyFolderRecursively}} in a try-catch-finally block.  For instance, we 
should make sure to always call {{threadpool.shutdown()}}.
# It would be nice if {{checkCopyResults}} could print out the exception for 
each failure, instead of just one.  Otherwise, if there are multiple problems, 
the user will have to rerun the command after resolving each issue.
# In {{copyFolderRecursively}}, the Streams should be closed in a finally 
block, not a catch block.  Otherwise, they're only closed if an Exception occurs.
# The description for the concurrency option should say what the default is 
(e.g. "(default=1)")
# Why use Streams instead of {{fs.copyFromLocalFile}}?
# Can you add/update unit tests in {{TestOozieSharelibCLI}}?
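
To illustrate the points above about always calling {{threadpool.shutdown()}} 
and reporting every failure, here is a minimal sketch.  It is not the code from 
the patch: the class name ConcurrentCopySketch, the method runAll, and the use 
of Callable tasks are assumptions made for illustration only, and the real copy 
logic is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; names do not come from the OOZIE-2402 patch.
class ConcurrentCopySketch {

    // Runs every task on a fixed-size pool and returns one Throwable per
    // failed task, in submission order, instead of stopping at the first.
    static List<Throwable> runAll(List<Callable<Void>> tasks, int concurrency)
            throws InterruptedException {
        ExecutorService threadpool = Executors.newFixedThreadPool(concurrency);
        List<Throwable> failures = new ArrayList<>();
        try {
            List<Future<Void>> futures = new ArrayList<>();
            for (Callable<Void> task : tasks) {
                futures.add(threadpool.submit(task));
            }
            for (Future<Void> future : futures) {
                try {
                    future.get();
                } catch (ExecutionException e) {
                    // Keep the underlying cause so each one can be printed.
                    failures.add(e.getCause());
                }
            }
        } finally {
            // Reached on success, on failure, and on interruption.
            threadpool.shutdown();
        }
        return failures;
    }
}
```

A caller could then loop over the returned list and print each exception, so 
the user sees all the problems in one run.  The same always-runs guarantee is 
what the finally block gives the Streams in {{copyFolderRecursively}}.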

> oozie-setup.sh sharelib create takes a long time on large clusters
> ------------------------------------------------------------------
>
>                 Key: OOZIE-2402
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2402
>             Project: Oozie
>          Issue Type: Improvement
>          Components: tools
>    Affects Versions: 4.2.0
>            Reporter: Illya Yalovyy
>            Assignee: Illya Yalovyy
>         Attachments: OOZIE-2402-1.patch
>
>
> When a cluster has 256+ nodes, it can take up to 5 minutes to create a sharelib. 
> Copying the tarball itself takes only around 10 seconds. It seems like 
> performance could be improved by loading files concurrently in many threads.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
