[ https://issues.apache.org/jira/browse/SLING-5948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15417521#comment-15417521 ]
Ian Boston commented on SLING-5948:
-----------------------------------

{{X-sling-uploadmode}} yes, no problem.

> Support Streaming uploads.
> --------------------------
>
>                 Key: SLING-5948
>                 URL: https://issues.apache.org/jira/browse/SLING-5948
>             Project: Sling
>          Issue Type: Bug
>          Components: Engine, Servlets
>    Affects Versions: Servlets Post 2.3.12, Engine 2.5.0
>            Reporter: Ian Boston
>            Assignee: Ian Boston
>         Attachments: SLING-5948-Proposal1-illustration.patch, SLING-5948-Proposal2v2.patch, SLING-5948-Proposal2v3.patch, TarMKDSNotStreamed.png, TarMKDSStreamed.png, TarMKStreamed.png
>
>
> Currently, multipart POST requests made to Sling use the Commons File Upload component, which parses the request fully before processing. Small uploads are held in a byte[]; uploads over a configurable limit are written to disk. This creates additional IO overhead, increases heap usage and increases upload time.
> Having searched the Sling JIRA and sling-dev, I have failed to find an issue relating to this area, although it has been discussed in the past.
> I have 2 proposals.
> The SlingMain Servlet processes all requests, identifying the request type and parsing the request body. If the body is multipart, the Commons File Upload library is used to process the request body in full when the SlingServletRequest is created or the first parameter is requested. To enable streaming of a request this behaviour needs to be modified. Unfortunately, processing a streamed request requires that the ultimate processor request the multipart parts in the correct order, otherwise streaming is lost. Streaming behaviour will therefore not be suitable for most POST requests and can only be used if the ultimate Servlet has been written to process a stream rather than a map of parameters.
> Both proposals need to identify requests that should be processed as a stream.
> This identification must happen in the headers or URI, as any identification later than the headers may be too late. A custom header (x-uploadmode: stream), a query string (?uploadmode=stream) or possibly a selector (/path/to/target.stream) would work; each has advantages and disadvantages.
> h1. Proposal 1
> When a POST request is identified as multipart and streaming, create a LazyParameterMap that uses the Commons File Upload Streaming API (https://commons.apache.org/proper/commons-fileupload/streaming.html) to process the request on demand as parameters are requested. If parameters are requested out of sequence, do something sensible that attempts to maintain streaming behaviour, but if the code really breaks streaming, throw an exception to alert the servlet developer early.
> h2. Pros
> * Follows a pattern similar to the current use of the Servlet API.
> h2. Cons
> * [] params will be hard to support when the [] is out of order, and almost impossible if the [] is an upload body.
> * May not work when a request is routed incorrectly, as getParameter requests will be out of streaming sequence.
> h1. Proposal 2
> When a POST request is identified as multipart and streaming, create a NullParameterMap that returns null for all parameter get operations. In addition, set a request attribute containing an Iterator<Part> that allows access to the request stream in a similar way to the Commons File Upload Streaming API. Servlets that process upload streams will use the Iterator<Part> object retrieved from the request. Part is the Servlet 3 Part, https://tomcat.apache.org/tomcat-7.0-doc/servletapi/javax/servlet/http/Part.html. IIUC this API is already used in the Sling Engine and exported by a bundle.
> h2. Pros
> * Won't get broken by existing getParameter calls, which all return null and do no harm to the stream.
> * Far simpler implementation, as the Servlet implementation has to get the request data in streaming order.
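
To make the Proposal 2 flow a little more concrete, here is a rough sketch of how a consumer might walk the parts in arrival order, keeping small form fields in memory and copying upload bodies straight to the target stream. It is only an illustration under assumptions: StreamedPart is a hypothetical, cut-down stand-in for javax.servlet.http.Part so the example needs nothing beyond the JDK, and StreamedUploadSketch, process and part are made-up names, not code from the attached patches. In a real servlet the iterator would come from the request attribute described above, whose name is not fixed here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical, cut-down stand-in for javax.servlet.http.Part. */
interface StreamedPart {
    String getName();
    String getFileName(); // null for an ordinary form field
    InputStream getInputStream() throws IOException;
}

final class StreamedUploadSketch {

    /** Helper for trying the sketch out: builds an in-memory part. */
    static StreamedPart part(final String name, final String fileName, final String body) {
        return new StreamedPart() {
            public String getName() { return name; }
            public String getFileName() { return fileName; }
            public InputStream getInputStream() {
                return new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8));
            }
        };
    }

    /**
     * Consumes parts strictly in the order the iterator yields them.
     * Form fields are read into the returned map; file bodies are copied
     * to target through a fixed-size buffer, never held in full.
     */
    static Map<String, String> process(Iterator<StreamedPart> parts, OutputStream target) {
        Map<String, String> fields = new LinkedHashMap<>();
        try {
            while (parts.hasNext()) {
                StreamedPart p = parts.next();
                try (InputStream in = p.getInputStream()) {
                    if (p.getFileName() == null) {
                        // Ordinary field: assumed small enough to buffer.
                        ByteArrayOutputStream small = new ByteArrayOutputStream();
                        copy(in, small);
                        fields.put(p.getName(),
                                new String(small.toByteArray(), StandardCharsets.UTF_8));
                    } else {
                        // Upload body: stream it through to the target.
                        copy(in, target);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fields;
    }

    private static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    }
}
```

Note that any field the consumer needs in order to decide where an upload goes has to arrive before the file part, since nothing here can look ahead in the stream.
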
> h2. Cons
> * Needs a custom Sling Upload Operation that understands how to process the Iterator<Part>.
> * Can't use the adaptTo mechanism on the request, as request.adaptTo(Iterator.class) doesn't make sense, being too generic. Would need a new API to make this work: request.adaptTo(PartsIterator.class), where PartsIterator extends Iterator.
> * Supporting the full breadth of the Sling Operation protocol in the Sling Upload Operation will require wide-scale duplication of code from the ModifyOperation implementation, as the ModifyOperation expects RequestProperty maps and won't work with a streamed part.
> * Forces the Sling Post bundle to depend on Servlet 3 to get the Part API, requiring some patches to the existing test classes.
> To support both methods, a standard Servlet to handle streamed uploads would be needed, connecting the file request stream to the Resource output stream. In some cases (Oak S3 DS Async Uploads, Mongo DS) this won't entirely eliminate local disk IO, although in most cases the Resource output stream wraps the final output stream. To maintain streaming, a save operation may need to be performed for each upload to cause the request stream to be read.
> If this is a duplicate issue, please link.
> If you have input, please share.
> Have some patches in progress, would prefer Proposal 2, as Proposal 1 looks messy at the moment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)