Hi

That was my first thought, too: nothing prevents the Binary implementation from 
checking whether the InputStream is a FileInputStream and then accessing the 
FileChannel from it.
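To make that concrete, here is a minimal, self-contained sketch of such a check (the helper name channelOf is made up for illustration; it is not an existing Sling or Oak API):

```java
import java.io.*;
import java.nio.channels.FileChannel;
import java.nio.file.*;

public class ChannelCheck {
    // Hypothetical helper: if the stream is file backed, expose its channel,
    // otherwise signal that a plain stream based copy is needed.
    static FileChannel channelOf(InputStream in) {
        if (in instanceof FileInputStream) {
            return ((FileInputStream) in).getChannel();
        }
        return null; // fall back to stream based copying
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("upload", ".bin");
        Files.write(tmp, "hello".getBytes());
        try (InputStream in = new FileInputStream(tmp.toFile())) {
            System.out.println(channelOf(in) != null ? "file backed" : "not file backed");
        } finally {
            Files.delete(tmp);
        }
    }
}
```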

In the concrete case of Sling, RequestParameter.getInputStream() delegates to 
the Commons Upload FileItem.getInputStream() method, which returns such a 
FileInputStream if the item is actually stored in the filesystem (otherwise a 
ByteArrayInputStream is returned).

Regards
Felix

On 18.02.2014 at 08:46, Ian Boston <i...@tfd.co.uk> wrote:

> Hi,
> Is there a reason you would not use the Commons Upload streaming API to
> connect the target output stream to the request stream? IIRC you can test
> whether both have NIO channels and, if they do, just connect the two. I have
> used this in the past to eliminate all GC activity and spooling. The streaming
> API is sensitive to the order of the multipart items: you must consume them as
> they appear and not expect to be able to treat request parameters as a map. In
> addition, it is sensitive to other frameworks buffering or accessing the
> request input stream.
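For the "connect the two channels" part, a minimal stdlib-only sketch is below. The actual Commons Upload streaming API needs a live servlet request, so this only demonstrates the FileChannel.transferTo connection, with made-up temp files standing in for the request body and the target:

```java
import java.io.*;
import java.nio.channels.FileChannel;
import java.nio.file.*;

public class ChannelConnect {
    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("src", ".bin"); // stands in for the multipart body
        Path dst = Files.createTempFile("dst", ".bin"); // stands in for the target store
        Files.write(src, "multipart body".getBytes());
        // Connect the two channels directly; transferTo can use the OS's
        // zero-copy path, so the bytes need not pass through the Java heap.
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE)) {
            long pos = 0, size = in.size();
            while (pos < size) {
                pos += in.transferTo(pos, size - pos, out);
            }
        }
        System.out.println(new String(Files.readAllBytes(dst)));
        Files.delete(src);
        Files.delete(dst);
    }
}
```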
> 
> Best regards
> Ian
> 
> On Tuesday, February 18, 2014, Chetan Mehrotra <chetan.mehro...@gmail.com>
> wrote:
> 
>> Hi,
>> 
>> Currently, in a Sling based application, when a user uploads a file to
>> the JCR the following sequence of steps is executed:
>> 
>> 1. The user uploads the file via an HTTP request, mostly using a
>> multi-part form data based upload
>> 
>> 2. Sling uses Commons File Upload to parse the multi-part request,
>> which uses a DiskFileItemFactory and writes the binary content to a
>> temporary file (for file sizes > 256 KB) [1]
>> 
>> 3. The servlet then accesses the JCR Session and creates a Binary
>> value by extracting the InputStream
>> 
>> 4. The file content is then spooled into the BlobStore
>> 
>> Effect of different BlobStores
>> ----------------------------------------
>> 
>> Now, depending on the type of BlobStore, one of the following code
>> flows occurs:
>> 
>> A - JR2 DataStores - The InputStream is copied to a file
>> B - S3DataStore - The AWS SDK creates a temporary file and
>> then streams that file's content to S3
>> C - Segment - Content from the InputStream is stored as part of
>> various segments
>> D - MongoBlobStore - Content from the InputStream is pushed to the
>> remote MongoDB via multiple remote calls
>> 
>> Things to note in the above sequence:
>> 
>> 1. The uploaded content is copied twice.
>> 2. The whole content is spooled via an InputStream through the JVM heap
>> 
>> Possible areas of Improvement
>> --------------------------------
>> 
>> 1. If the BlobStore ultimately uses a File (on the same local disk, not
>> NFS), then it might be better to *move* the file which was created during
>> upload. This would help the local FileDataStore and the S3DataStore
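A sketch of what such a move could look like with java.nio.file.Files.move; the paths, the blob name, and the atomic-move fallback are illustrative assumptions, not existing DataStore code:

```java
import java.io.IOException;
import java.nio.file.*;

public class MoveIntoStore {
    public static void main(String[] args) throws IOException {
        // Stand-ins for the upload temp file and the data store directory.
        Path upload = Files.createTempFile("upload", ".bin");
        Files.write(upload, "binary content".getBytes());
        Path store = Files.createTempDirectory("datastore");
        Path target = store.resolve("blob-0001"); // hypothetical blob name
        try {
            // On the same filesystem this is a rename: no bytes are copied.
            Files.move(upload, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            // Across filesystems (e.g. NFS) we must fall back to a real copy.
            Files.move(upload, target, StandardCopyOption.REPLACE_EXISTING);
        }
        System.out.println(Files.exists(target) && !Files.exists(upload));
        Files.delete(target);
        Files.delete(store);
    }
}
```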
>> 
>> 2. Avoid spooling via an InputStream if possible. Spooling via an IS is
>> slow [3]. Though in most cases we use an efficient buffered copy, which is
>> only marginally slower than the NIO based variants, avoiding moving
>> byte[] through the heap might reduce pressure on the GC (probably!)
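For reference, the buffered copy path described here looks roughly like the following; every byte crosses the JVM heap via the byte[] buffer (the buffer and payload sizes are made up):

```java
import java.io.*;

public class BufferedSpool {
    // The spooling path described above: every byte passes through a
    // byte[] buffer on the JVM heap before reaching the sink.
    static long spool(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[300 * 1024]; // e.g. a 300 KB upload
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        System.out.println(spool(new ByteArrayInputStream(data), sink));
    }
}
```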
>> 
>> Changes required
>> ------------------------
>> 
>> If we can have a way to create JCR Binary implementations which
>> enable the DataStore/BlobStore to transfer content efficiently, that
>> would help.
>> 
>> For example, for a file based DataStore the Binary created can keep a
>> reference to the source File object, and that Binary is used via the JCR
>> API. Eventually the FileDataStore can treat it differently and move
>> the file.
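A rough sketch of that idea. The SourceAwareBinary interface below is entirely made up to keep the example self-contained; it only mimics the getStream()/getSize() shape of javax.jcr.Binary and adds the extra file hook a DataStore could exploit:

```java
import java.io.*;
import java.nio.file.*;

public class FileBackedBinaryDemo {
    // Hypothetical stand-in for javax.jcr.Binary: a binary value that
    // remembers its source file, so a FileDataStore could move the file
    // instead of streaming its content.
    interface SourceAwareBinary {
        InputStream getStream() throws IOException;
        long getSize() throws IOException;
        Path getSourceFile(); // the extra hook a DataStore could use
    }

    static SourceAwareBinary wrap(Path file) {
        return new SourceAwareBinary() {
            public InputStream getStream() throws IOException {
                return Files.newInputStream(file);
            }
            public long getSize() throws IOException {
                return Files.size(file);
            }
            public Path getSourceFile() {
                return file;
            }
        };
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("upload", ".bin");
        Files.write(f, "abc".getBytes());
        SourceAwareBinary b = wrap(f);
        System.out.println(b.getSize() + " " + (b.getSourceFile() != null));
        Files.delete(f);
    }
}
```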
>> 
>> Another example is the S3DataStore: in some cases the file has already
>> been transferred to S3 through other means, and the user wants to
>> transfer the S3 file from their bucket to our bucket. A Binary
>> implementation which can simply wrap the S3 URL would enable the
>> S3DataStore to transfer the content without streaming all the content
>> again [4]
>> 
>> Any thoughts on the best way to enable users of Oak to create Binaries
>> by other means (compared to the current model, which only allows creation
>> via an InputStream) and to enable the DataStores to make use of such
>> binaries?
>> 
>> Chetan Mehrotra
>> 
>> [1]
>> https://github.com/apache/sling/blob/trunk/bundles/engine/src/main/java/org/apache/sling/engine/impl/parameters/ParameterSupport.java#L190
>> [2]
>> http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/model/PutObjectRequest.html
>> [3] http://www.baptiste-wicht.com/2010/08/file-copy-in-java-benchmark/3/
>> [4]
>> http://stackoverflow.com/questions/9664904/best-way-to-move-files-between-s3-buckets
>> 
