Answers--
>The problem of chunks is that they are
>- not self-describing (what size?)
When you parse a multipart request, you get the chunk size from the uploaded 
part. In a single-part request, Content-Length gives the chunk size.
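
Roughly, server-side, something like the following sketch - the parameter name 
"file" is only an example for illustration, not the actual API:

    // Minimal sketch: determining the chunk size on the server.
    // The parameter name "file" is an assumption for illustration.
    import org.apache.sling.api.SlingHttpServletRequest;
    import org.apache.sling.api.request.RequestParameter;

    public class ChunkSizeExample {

        long chunkSize(SlingHttpServletRequest request) {
            RequestParameter file = request.getRequestParameter("file");
            if (file != null && !file.isFormField()) {
                // multipart request: the part carries its own size
                return file.getSize();
            }
            // single-part request: Content-Length is the chunk size
            return request.getContentLength();
        }
    }
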
>- must all have the same length
Not mandatory. Why do you think so? During the merge, chunks are appended 
serially until the last chunk is found.
> - introduce an arbitrary numbering scheme that you cannot break out of
Please explain this in more detail.
>- if problems arise for one chunk, the actual order might become very 
>different, so "last chunk" is not a fixed thing
I don't see this as an issue. The "last chunk" is not required to be a fixed 
thing, since the client explicitly marks the last chunk request.
>- as noted above, not in line with existing HTTP concepts for this (even if 
>they currently only apply to GETs)
AFAICS, AWS S3 uses a chunk-number approach, and we believe AWS is doing well 
in terms of scalability and concurrency. That is one of the primary reasons we 
push for the chunk-number approach.

> Do you mean the file would be deferred or the other parameters? I'd say only 
> the file, because you probably only sent the other ones with the first file 
> snippet (and the client is free to chose when to send them along) and 
> secondly making all params defer is going to be very complex.
Sling cannot create an nt:file without jcr:data because that would throw a node 
type constraint violation exception. So node creation and the processing of the 
other parameters have to happen on the last request. The client is required to 
send the other parameters in the last request, but is free to send or omit them 
in the first and intermediate requests; Sling ignores other parameters sent in 
the first and intermediate requests.
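
In pseudo-code, that server-side behaviour would look roughly like the sketch 
below; the ":lastChunk" marker parameter is hypothetical, not the actual 
parameter name:

    // Rough sketch of the behaviour described above, not the actual
    // SlingPostServlet code. The ":lastChunk" parameter name is hypothetical.
    import org.apache.sling.api.SlingHttpServletRequest;

    public class DeferredParamsSketch {

        void handleChunk(SlingHttpServletRequest request) throws Exception {
            boolean lastChunk = "true".equals(request.getParameter(":lastChunk"));
            storeChunk(request);          // persist the binary chunk
            if (lastChunk) {
                // only now create the nt:file (jcr:data is complete)
                // and process the remaining form parameters
                createFileAndApplyParameters(request);
            }
            // on first/intermediate chunks all other parameters are ignored
        }

        void storeChunk(SlingHttpServletRequest request) throws Exception { /* ... */ }

        void createFileAndApplyParameters(SlingHttpServletRequest request) throws Exception { /* ... */ }
    }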

> - have a configurable size limit for those partial files (e.g. 50 GB)
>- once the limit is hit, clean up expired files; if not enough, clean up the 
>oldest files
IMO, a size check on each chunk upload request is not required; it adds 
complexity. We already have a scheduled job which can be configured to do the 
necessary cleanup.
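
For illustration only (this is not the existing job), such a cleanup job can be 
wired through the Sling Commons Scheduler whiteboard with a configurable cron 
expression:

    // Hypothetical sketch of a scheduled cleanup job. The Sling Commons
    // Scheduler whiteboard runs Runnable services that carry a
    // scheduler.expression property.
    import org.osgi.service.component.annotations.Component;

    @Component(
        service = Runnable.class,
        property = { "scheduler.expression=0 0 3 * * ?" } // every night at 3:00
    )
    public class ChunkCleanupJob implements Runnable {

        @Override
        public void run() {
            // remove partial uploads whose configured expiry time has passed
        }
    }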

> Then a protocol question arises: What to respond when a client uploads the 
> "final" byte range, but the previously uploaded ones are no longer present on 
> the server?
On the first chunk upload, Sling sends a "Location" header in the response. 
Subsequent upload requests use this header as the upload ID. 
Sling sends a 404 when no resumable upload is found.
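
From the client side the flow looks roughly like the sketch below - only the 
Location header and the 404 behaviour come from the description above, the rest 
is illustrative:

    // Client-side sketch of the flow described above. URLs and the restart
    // logic are assumptions.
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class ResumableUploadClient {

        String uploadFirstChunk(String serverUrl, byte[] chunk) throws Exception {
            HttpURLConnection c = (HttpURLConnection) new URL(serverUrl).openConnection();
            c.setRequestMethod("POST");
            c.setDoOutput(true);
            try (OutputStream out = c.getOutputStream()) {
                out.write(chunk);
            }
            // the Location header identifies the resumable upload
            return c.getHeaderField("Location");
        }

        boolean uploadNextChunk(String uploadLocation, byte[] chunk) throws Exception {
            HttpURLConnection c = (HttpURLConnection) new URL(uploadLocation).openConnection();
            c.setRequestMethod("POST");
            c.setDoOutput(true);
            try (OutputStream out = c.getOutputStream()) {
                out.write(chunk);
            }
            // 404 means the partial upload is no longer known: start over
            return c.getResponseCode() != 404;
        }
    }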


-----Original Message-----
From: Alexander Klimetschek [mailto:aklim...@adobe.com] 
Sent: 26 February 2013 21:35
To: dev@sling.apache.org
Subject: Re: [POST] Servlet resolution for non existing resource

Beware ;-) XXL mail with multiple proposals, the more interesting ones coming 
later...

On 25.02.2013, at 14:48, Shashank Gupta <shgu...@adobe.com> wrote:
> This would make it simpler to switch to a Range-header based upload if it 
> might be standardized around HTTP in the future.
> 
> [SG] Range-based upload is not a standard and was declined by Julian.

Yes :-) What I mean is if it's going to be standardized in a new HTTP version 
or extension, it will very very likely be byte-range based as well.

> Introduction of a new type "sling:partialFile" complicates things. How does 
> it solve the "modify binary" use case? I will not take this approach unless 
> there is consensus for it.

Avoiding nt:file?:

I discussed that with Felix, and he pointed out that we'd need to avoid having 
something that looks like an nt:file (either is one or extends from it), so 
that JCR events are not thrown and generic application code does not try to 
read the file before it is finished. Any specific marker we introduce 
(properties or a node type such as sling:PartialFile < nt:file) would need to 
be handled in all the existing code, which is not feasible.

So if we go this route, sling:PartialFile must not extend from nt:file.
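
As a sketch only, such a node type could be registered through the JCR 
NodeTypeManager with a supertype other than nt:file; the chosen supertype here 
(nt:hierarchyNode) is just an example:

    // Sketch only: registering a sling:PartialFile node type that does NOT
    // extend nt:file. The supertype and the lack of properties are assumptions.
    import javax.jcr.Session;
    import javax.jcr.nodetype.NodeTypeManager;
    import javax.jcr.nodetype.NodeTypeTemplate;

    public class PartialFileNodeType {

        void register(Session session) throws Exception {
            NodeTypeManager ntm = session.getWorkspace().getNodeTypeManager();
            NodeTypeTemplate ntt = ntm.createNodeTypeTemplate();
            ntt.setName("sling:PartialFile");
            // deliberately not "nt:file", so generic nt:file handling ignores it
            ntt.setDeclaredSuperTypeNames(new String[] { "nt:hierarchyNode" });
            ntm.registerNodeType(ntt, true);
        }
    }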

Clean up of temp files:

But since we need to work around the data store issue (especially since this 
feature is targeted at large files), it's probably better to start with storing 
the partial chunks in the file system. The difficult part here is the cleanup, 
mostly because the expiry time for those files needs to be quite long: imagine 
a user starting to upload a big file, then going home, the upload failing over 
night or over the weekend, and the next day saying "resume upload"... this 
gives a typical expiry time of at least one day (ignoring automatic resumes here).

Felix and I discussed this:
- store partial files, including metadata: jcr path + total length
- once full file range is covered, create nt:file in jcr (and clean up partial 
files)
- have a configurable size limit for those partial files (e.g. 50 GB)
- have a configurable expiry time (e.g. 2 days)
- once the limit is hit, clean up expired files; if not enough, clean up the 
oldest files
- run cleanup periodically as well (expiry time or 1/2 expiry time or ...); a 
rough sketch of this policy follows below
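
A minimal sketch of that policy for file-system storage; the limits are the 
example values from above, in reality they would come from configuration:

    // Minimal sketch of the cleanup policy listed above, for file-system
    // storage. Limits are example values, not the real configuration.
    import java.io.File;
    import java.util.Arrays;
    import java.util.Comparator;

    public class PartialFileCleanup {

        static final long SIZE_LIMIT = 50L * 1024 * 1024 * 1024; // 50 GB
        static final long EXPIRY_MS  = 2L * 24 * 60 * 60 * 1000; // 2 days

        void cleanup(File dir) {
            File[] files = dir.listFiles();
            if (files == null) return;
            long now = System.currentTimeMillis();
            long total = 0;
            for (File f : files) total += f.length();

            // 1. remove expired files
            for (File f : files) {
                if (now - f.lastModified() > EXPIRY_MS) {
                    long len = f.length();
                    if (f.delete()) total -= len;
                }
            }
            // 2. if still over the limit, remove the oldest files first
            if (total > SIZE_LIMIT) {
                File[] remaining = dir.listFiles();
                if (remaining == null) return;
                Arrays.sort(remaining, Comparator.comparingLong(File::lastModified));
                for (File f : remaining) {
                    if (total <= SIZE_LIMIT) break;
                    long len = f.length();
                    if (f.delete()) total -= len;
                }
            }
        }
    }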

Then a protocol question arises: What to respond when a client uploads the 
"final" byte range, but the previously uploaded ones are no longer present on 
the server? Do we need an additional acknowledgement that the file was 
successfully uploaded (a HEAD request on the file returning 200?)?

> What if the second chunk failed and you want to repeat it? While all the 
> others including "lastChunk" were successful. I think chunks where the 
> server doesn't know the real byte coordinates for every single request won't 
> work. You need to specify exactly what is uploaded - and byte ranges + full 
> length are IMHO more generic than to say chunk X out of N, size of chunk = M.
> 
> [SG] I think we are not discussing parallel chunk upload here, which 
> invalidates your point. The spec and impl are for simple, resumable, serial 
> chunk upload. 
> Querying the chunk upload provides the chunk number and the bytes uploaded up 
> to the failure point. The client will resume from the next chunk number and 
> byte offset. 

The problem of chunks is that they are
- not self-describing (what size?)
- must all have the same length
- introduce an arbitrary numbering scheme that you cannot break out of
- if problems arise for one chunk, the actual order might become very 
different, so "last chunk" is not a fixed thing
- as noted above, not in line with existing HTTP concepts for this (even if 
they currently only apply to GETs)

Hence my -1 on indexed chunks.

> [SG] too complex. We have to live with the current datastore implementation 
> at least for the time being.

Data store & Oak:

I had a chat with Thomas Müller (works on the Jackrabbit & Oak team). He said 
that
a) the data store in oak will be improved and share binaries on smaller 2 MB 
blocks (instead of entire files)
b) for the existing JR2 FileDataStore, we should not care about the additional 
space overhead, the garbage collector will take care of it (just a matter of 
enough disk space and gc configuration)

This means we could put the structure into the repository right away (e.g. 
/tmp) and then combine the chunks into the final file. This would happen via 
some kind of SequenceInputStream that combines multiple input streams in a 
fixed sequence into one stream (this actually exists already). Doing so now 
would mean we would basically duplicate the binary in the data store (all 
chunks in /tmp + the final file), but that shouldn't be an issue.
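
Sketched, the merge step could look like this; the /tmp layout and node names 
are assumptions:

    // Sketch of the merge step described above: chunks stored under a /tmp
    // node are combined via java.io.SequenceInputStream into the final
    // jcr:data binary. The /tmp layout and node names are assumptions.
    import java.io.InputStream;
    import java.io.SequenceInputStream;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import javax.jcr.Binary;
    import javax.jcr.Node;
    import javax.jcr.NodeIterator;
    import javax.jcr.Session;

    public class ChunkMerger {

        void merge(Session session, String tmpPath, String filePath) throws Exception {
            Node tmp = session.getNode(tmpPath);
            List<InputStream> streams = new ArrayList<>();
            for (NodeIterator it = tmp.getNodes(); it.hasNext();) {
                // assumes chunk nodes are ordered and each carries a jcr:data property
                streams.add(it.nextNode().getProperty("jcr:data").getBinary().getStream());
            }
            InputStream combined = new SequenceInputStream(Collections.enumeration(streams));

            Node content = session.getNode(filePath).getNode("jcr:content");
            Binary binary = session.getValueFactory().createBinary(combined);
            content.setProperty("jcr:data", binary);
            session.save();

            tmp.remove();   // clean up the partial chunks
            session.save();
        }
    }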

Later, Oak with its updated data store could optimize here: we replace the 
input stream with a SequenceBinaryInputStream that gives a single input stream 
for the input streams of multiple binaries. It would hold the list of binaries 
and it would be part of the jackrabbit/oak API. The Oak implementation could 
detect that and instead of reading the input stream (and thus copying 
everything, taking time), resolve the binaries and use their internal 
representation to aggregate them into the new one.
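
Purely hypothetical, such a SequenceBinaryInputStream could be as small as this 
sketch (nothing like this exists yet):

    // Hypothetical sketch of the proposed API. The idea is that Oak could
    // recognise this stream type and aggregate the underlying binaries
    // without copying their content.
    import java.io.InputStream;
    import java.util.List;
    import javax.jcr.Binary;

    public abstract class SequenceBinaryInputStream extends InputStream {

        /** The binaries, in order, whose content this stream concatenates. */
        public abstract List<Binary> getBinaries();
    }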

This way the Sling solution is more or less the same (only importing a 
different API class later), and the underlying persistence layer improves by 
itself.

When putting things into /tmp and with Jackrabbit 2 we'd need a similar cleanup 
mechanism as with the file system, but at least it would count as normal 
repository content. With Oak, this would even be less of an issue since the 
partial binaries and the final ones would share their data store snippets - so 
it's only a matter of cleaning up /tmp in JCR for the sake of structure 
cleanup, not space savings.

Streaming end-to-end:

Finally, there would actually be a nice use case for updating an nt:file in 
place (no /tmp): streaming formats such as video or audio. Imagine an encoder 
streams a real time video feed into JCR using the partial upload feature - and 
different clients on the other side streaming that video from the JCR, using 
the existing HTTP GET Range support in Sling (which is used e.g. by most modern 
video streaming solutions).

In this case the file could really be nt:file-like: if the file is unreadable 
because it is not complete yet, consumers would simply try again on the next 
modification and basically succeed with the final one. Such files would carry a 
marker to say they are not finished, and applications are somewhat forced to 
know about it. And for events: if applications get one on every modification, 
they would simply handle it; failure handling could easily check the flag and 
fail fast - as those apps are also most likely the ones that ask for the 
resumable upload in the first place.


> This would apply to files in the form request only, so that is already 
> handled specifically in the sling post servlet (i.e. updating a binary 
> property). If the request contains other sling post servlet params, they 
> would be processed normally.
> [SG] Yes they would be *but* it would be deferred till Sling receives the 
> last chunk. 

Do you mean the file would be deferred or the other parameters? I'd say only 
the file, because you probably only sent the other ones with the first file 
snippet (and the client is free to choose when to send them along), and 
secondly making all params deferred is going to be very complex.

> The question (also for 2.+3.) is: does Sling have to know when everything was 
> uploaded?
> - if it's temporarily stored on the file system, then yes (to move it to jcr)
> - if it's stored in a special content structure in jcr, then yes (to convert 
> it to a nt:file)
> - if it's stored in nt:file in jcr, then no
> 
> [SG] You break the modify-binary use case if you append in place. The binary 
> would be corrupted unless all chunks have arrived.

I guess you refer to the last point (nt:file): of course you'd update the files 
properly (get content, update the appropriate byte range) - not just naively 
append...
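
For illustration only (not how this would actually be implemented), "update the 
appropriate byte range" could mean: spool the current binary to a temp file, 
overwrite just that range, and store the result back:

    // Sketch of updating a byte range of an existing nt:file. Illustrative only.
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.io.RandomAccessFile;
    import java.nio.file.Files;
    import java.nio.file.StandardCopyOption;
    import javax.jcr.Node;
    import javax.jcr.Session;

    public class ByteRangeUpdate {

        void updateRange(Session session, String filePath, long offset, byte[] data)
                throws Exception {
            Node content = session.getNode(filePath).getNode("jcr:content");

            // 1. spool the current binary to a temporary file
            File tmp = File.createTempFile("range-update", ".bin");
            try (InputStream in = content.getProperty("jcr:data").getBinary().getStream()) {
                Files.copy(in, tmp.toPath(), StandardCopyOption.REPLACE_EXISTING);
            }
            // 2. overwrite just the requested byte range
            try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw")) {
                raf.seek(offset);
                raf.write(data);
            }
            // 3. store the result back as the new binary
            try (InputStream updated = new FileInputStream(tmp)) {
                content.setProperty("jcr:data",
                        session.getValueFactory().createBinary(updated));
                session.save();
            }
            tmp.delete();
        }
    }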

> The last two would probably require some investigation in data store 
> enhancements to avoid wasting space.
> [SG] AFAIK, additive "CUD" is very fundamental to tar persistence and the 
> datastore, so we have to live with it at least till Oak. 

See above.

Cheers,
Alex
