Bernd Mathiske created MESOS-1667:
-------------------------------------

             Summary: Extract from URI while downloading into work dir
                 Key: MESOS-1667
                 URL: https://issues.apache.org/jira/browse/MESOS-1667
             Project: Mesos
          Issue Type: Improvement
          Components: slave
    Affects Versions: 0.20.0
         Environment: Every
            Reporter: Bernd Mathiske


When the fetcher downloads an extractable archive, e.g. a tar file, it 
currently downloads it completely and only then starts extracting from it. But 
only the extracted result is needed for execution, so the disk space used for 
the downloaded copy of the archive is wasted. This can become critical for 
large archives.

The general idea for solving this issue is to perform the extraction while 
downloading, without storing intermediate results on disk. Possibly, this can 
be achieved by arranging process pipes or by streaming the data through some 
extraction library code.
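
As a rough illustration of the library-based variant (not the fetcher's actual 
code, which is C++), the following Python sketch extracts a tar archive from a 
forward-only stream, so the archive itself never lands on disk. The function 
names extract_stream and fetch_and_extract are hypothetical; the sketch relies 
on the standard tarfile module's "r|*" stream mode and on urllib.request for 
the download.

```python
import tarfile
import urllib.request


def extract_stream(fileobj, dest_dir):
    """Extract a (possibly compressed) tar archive from a forward-only stream.

    fileobj only needs a read() method; it is never seeked, so the archive
    does not have to exist as a file on disk.
    """
    # Newer Python versions offer extraction filters; use the conservative
    # "data" filter when available, otherwise fall back to the default.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    # Mode "r|*" reads the archive as a sequential stream of blocks and
    # detects the compression (none, gzip, bzip2, xz) transparently.
    with tarfile.open(fileobj=fileobj, mode="r|*") as archive:
        archive.extractall(path=dest_dir, **kwargs)


def fetch_and_extract(uri, work_dir):
    """Hypothetical helper: download uri and extract it while downloading."""
    with urllib.request.urlopen(uri) as response:
        extract_stream(response, work_dir)
```

The process-pipe variant would be analogous: the download process writes to a 
pipe and an extraction process such as tar reads the archive from its standard 
input.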

However, with in-stream extraction, every use of an archive may require a 
fresh download, whereas with the existing but not yet committed patch for 
MESOS-336 (https://reviews.apache.org/r/21316/), the fetcher cache could just 
repeat the extraction without downloading more than once. Thus choosing 
in-stream extraction might result in an overall performance loss. We should 
therefore give users extra options in CommandInfo.URI to choose how to handle 
this.

In some cases, it could be possible to reuse the extracted assets directly, 
also forgoing the repeat extraction. This could be handled with symlinks. Then 
extraction can happen during downloading, and neither repeat downloading nor 
repeat extraction occurs. The user has to be aware of the safety issue, 
though: any post-extraction modifications to the downloaded assets are 
visible to subsequent tasks. So an explicit flag in CommandInfo.URI is called 
for here as well.
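
The symlink idea could look roughly like the Python sketch below. It is only 
an illustration under assumptions: the function link_cached_extraction, the 
cache layout keyed by a hash of the URI, the ".partial" staging directory and 
the link name "assets" are all hypothetical and not part of Mesos or of the 
MESOS-336 patch. The extract argument is a callback that fills a directory, 
for example by in-stream extraction.

```python
import hashlib
import os
from pathlib import Path


def link_cached_extraction(uri, cache_root, work_dir, extract):
    """Extract uri at most once and symlink the result into work_dir.

    extract is a callable taking the target directory to populate.
    Returns the path of the symlink created in work_dir.
    """
    key = hashlib.sha256(uri.encode("utf-8")).hexdigest()
    cached = Path(cache_root).resolve() / key
    if not cached.exists():
        # Extract into a staging directory first, then rename it, so a
        # half-extracted tree is never mistaken for a valid cache entry.
        staging = cached.with_name(key + ".partial")
        staging.mkdir(parents=True)
        extract(staging)
        staging.rename(cached)
    # Every task shares the same extracted tree: changes made through one
    # link are visible through all others (the safety issue noted above).
    link = Path(work_dir) / "assets"
    os.symlink(cached, link, target_is_directory=True)
    return link
```

A second task requesting the same URI gets a link to the already extracted 
tree, so neither the download nor the extraction is repeated.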

Ideally, this issue would be solved as a follow-up of MESOS-336, because some 
of the described benefits depend on it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
