[ 
https://issues.apache.org/jira/browse/JCRVLT-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069278#comment-16069278
 ] 

Tobias Bocanegra commented on JCRVLT-50:
----------------------------------------

[~kwin] yes, you're right. the {{InputStreamZipArchive}} was not implemented, 
but rather the {{ZipStreamArchive}}. The latter uses an internal memory buffer 
(1MB) [0], after which it switches to a temp file. 
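
for illustration, the buffer-then-spill behaviour described above can be sketched roughly like this (a simplified, hypothetical sketch, not the actual {{ZipStreamArchive}} code; the class name and the configurable threshold are made up):

```java
import java.io.*;

// rough sketch of the buffer-then-spill strategy: writes accumulate in an
// in-memory buffer until a threshold is exceeded, then everything moves to
// a temp file. (illustrative only, not the real FileVault implementation.)
class SpoolBuffer implements Closeable {
    private final int threshold;
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private File tempFile;
    private OutputStream out;

    SpoolBuffer(int threshold) {
        this.threshold = threshold;
        this.out = memory;
    }

    void write(byte[] data) throws IOException {
        if (tempFile == null && memory.size() + data.length > threshold) {
            // limit exceeded: switch to a temp file and flush the buffer into it
            tempFile = File.createTempFile("vltspool", ".dat");
            OutputStream fileOut = new BufferedOutputStream(new FileOutputStream(tempFile));
            memory.writeTo(fileOut);
            memory = null;
            out = fileOut;
        }
        out.write(data);
    }

    boolean isInMemory() {
        return tempFile == null;
    }

    @Override
    public void close() throws IOException {
        out.close();
        if (tempFile != null) {
            tempFile.delete();
        }
    }
}
```

the real {{ZipStreamArchive}} uses a fixed 1MB buffer [0]; here the threshold is a constructor parameter just for the sketch.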

the fundamental problem with our current serialisation format is that it is 
not streamable by default. i.e. the importer first builds an internal tree of 
import aggregates and then writes them to the repository, which means the 
entire stream needs to be re-readable anyway. the other flaw is that the 
order of the entries in the zip stream is not guaranteed to follow a 
depth-first traversal. usually it does, because of the way the packages are 
assembled, but in theory a tool could arrange the files differently and/or 
put the META-INF/vault files at the end of the archive. My idea is to support 
a marker property specifying that the files are in optimal order, so the 
importer could consume them in smaller chunks and prepare/commit subtrees.
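
a check for that ordering assumption could look something like the following (hypothetical helper, not FileVault API; directory entries are assumed to end with '/'):

```java
import java.util.*;

// hypothetical helper: verify that archive entry names arrive in depth-first
// order, i.e. once a directory's subtree has been left, no later entry
// re-enters it. (illustrative sketch only.)
class EntryOrderCheck {
    static boolean isDepthFirst(List<String> entries) {
        Deque<String> open = new ArrayDeque<>(); // directories still being read
        Set<String> closed = new HashSet<>();    // directories already left behind
        for (String entry : entries) {
            // leave every open directory that does not contain this entry
            while (!open.isEmpty() && !entry.startsWith(open.peek())) {
                closed.add(open.pop());
            }
            // an entry inside an already-closed directory breaks depth-first order
            for (int i = entry.indexOf('/'); i >= 0 && i < entry.length() - 1;
                    i = entry.indexOf('/', i + 1)) {
                if (closed.contains(entry.substring(0, i + 1))) {
                    return false;
                }
            }
            if (entry.endsWith("/")) {
                open.push(entry);
            }
        }
        return true;
    }
}
```

a marker-aware importer could run such a check (or trust the marker property) before deciding whether to commit subtrees incrementally or fall back to buffering the whole archive.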

Btw, the import is so complicated because it's based on the aggregate tree, 
which was originally invented for the command line support, not for 
packaging. It might be better to invent a new format that cuts some 
filesystem-mapping corner cases in favour of a streamable format. Or the 
developer could mark certain trees as machine-only, so the format could 
switch (for example, serializing large content/dam trees doesn't really need 
to be super filesystem friendly).

\[0\] 
https://github.com/apache/jackrabbit-filevault/blob/trunk/vault-core/src/main/java/org/apache/jackrabbit/vault/fs/io/ZipStreamArchive.java#L61

> Add support for "hollow" packages
> ---------------------------------
>
>                 Key: JCRVLT-50
>                 URL: https://issues.apache.org/jira/browse/JCRVLT-50
>             Project: Jackrabbit FileVault
>          Issue Type: New Feature
>          Components: Packaging
>            Reporter: Tobias Bocanegra
>            Assignee: Tobias Bocanegra
>             Fix For: 3.1.40
>
>
> when installing a customer package that is 3GB, I noticed that installing 
> it requires about 10GB...
> The root cause seems to be that it doesn't use the file directly, but 
> instead performs multiple copies before actually starting to extract it.
> 1. it copies the package from crx-quickstart/install to the datastore 
> 2. from the datastore it first copies it to /tmp, e.g. 
> vaultpack7793665768596308927.zip
> 3. in /tmp it creates a second copy, __vlttmpbuffer2535888623024233693.dat
> So in the end it will have used a lot of disk space, which in the case of a 
> large package is not efficient.
> It would be nice if it used the original file (datastore or /install 
> folder) all the time.
> ---
> I think we should have 2 improvements:
> # add option for streaming install which would:
> ** create a "hollow" package that does not store the package content in the 
> repository (basically, a 0-byte jcr:data). I think it's still good to have 
> the package node so that you can see it was installed
> ** not create a snapshot by default
> ** not allow uninstalling
> ** install directly from the stream, w/o the need for a tmp file (there 
> might be a problem with large zips that still need a tmp file - but with 
> java7 that might not be a problem anymore)
> # add option to select a file on the server disk for installation. this can 
> be done independently of the "hollow" package support above, but would 
> allow scp-ing the files to the server and then installing them w/o a 
> browser connection that has to stay open.
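
the second improvement quoted above (installing from a file already on the server disk) could avoid the temp copies entirely, since {{java.util.zip.ZipFile}} gives random access to the archive in place. a minimal sketch (class and method names are illustrative, not FileVault API):

```java
import java.io.*;
import java.util.*;
import java.util.zip.*;

// sketch: open the package file where it already lives on disk instead of
// copying the stream to a temp file first. ZipFile reads entries via random
// access, so no re-readable copy of the archive is needed.
class InPlacePackage {
    // return the entry names without copying the archive anywhere
    static List<String> entryNames(File packageFile) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(packageFile)) {
            Enumeration<? extends ZipEntry> e = zip.entries();
            while (e.hasMoreElements()) {
                names.add(e.nextElement().getName());
            }
        }
        return names;
    }
}
```

this only works for the file-on-disk case, of course; a true streaming install from an upload would still need the ordering guarantees discussed in the comment above.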



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
