Github user ekovacs commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2991#discussion_r219117891
  
    --- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/HandleHttpRequest.java
 ---
    @@ -521,161 +553,221 @@ public void onTrigger(final ProcessContext context, 
final ProcessSession session
     
             final long start = System.nanoTime();
             final HttpServletRequest request = container.getRequest();
    -        FlowFile flowFile = session.create();
    -        try (OutputStream flowFileOut = session.write(flowFile)) {
    -            StreamUtils.copy(request.getInputStream(), flowFileOut);
    -        } catch (final IOException e) {
    -            // There may be many reasons which can produce an IOException 
on the HTTP stream and in some of them, eg.
    -            // bad requests, the connection to the client is not closed. 
In order to address also these cases, we try
    -            // and answer with a BAD_REQUEST, which lets the client know 
that the request has not been correctly
    -            // processed and makes it aware that the connection can be 
closed.
    -            getLogger().error("Failed to receive content from HTTP Request 
from {} due to {}",
    -                    new Object[]{request.getRemoteAddr(), e});
    -            session.remove(flowFile);
     
    -            try {
    -                HttpServletResponse response = container.getResponse();
    -                response.sendError(Status.BAD_REQUEST.getStatusCode());
    -                response.flushBuffer();
    -                container.getContext().complete();
    -            } catch (final IOException ioe) {
    -                getLogger().warn("Failed to send HTTP response to {} due 
to {}",
    -                        new Object[]{request.getRemoteAddr(), ioe});
    +        if (!Strings.isNullOrEmpty(request.getContentType()) && 
request.getContentType().contains(MIME_TYPE__MULTIPART_FORM_DATA)) {
    +          final long maxRequestSize = 
context.getProperty(MAX_REQUEST_SIZE).asLong();
    +          request.setAttribute(Request.__MULTIPART_CONFIG_ELEMENT, new 
MultipartConfigElement("/tmp", maxRequestSize, maxRequestSize, 0));
    --- End diff --
    
    Hi @markap14 
    Thank you for reviewing my changes. 
    Please see my responses inline.
    Best regards,
    Endre
    
    > As-is, this is going to buffer the entire request into memory and then 
throw an Exception if the request reaches some configured threshold. 
Unfortunately, I think this is going to be too limiting for many use cases. We 
should not request that the entire file upload fit into java's heap.
    
    The original ticket: https://issues.apache.org/jira/browse/NIFI-3469
    Explicitly mentioned to avoid writing to local disk, thus i took this 
approach. This was achieved by passing 0 as the last parameter for: `new 
MultipartConfigElement("/tmp", maxRequestSize, maxRequestSize, 0)`
    
    
    > I think the best approach (longer term) would be to create our own MIME 
Parser (or find one that already exists?) that is capable of processing the 
data in a streaming fashion. I presume that Jetty doesn't allow that because 
they want to allow you to obtain all Part's and then access them individually. 
However, we're not doing that and so their API is rather limiting.
    
    Prior to implementing, I read lots of advices/tips on the web.  eg.: warns: 
https://stackoverflow.com/questions/2422468/how-to-upload-files-to-server-using-jsp-servlet/2424824#2424824
    As this SO post warns:
    * _Don't manually parse it_
    * _ You shouldn't try to do this on your own or copypaste some homegrown 
library-less code found elsewhere on the Internet_
    * _You should rather use a real library which is used (and implicitly 
tested!) by millions of users for years. Such a library has proven its 
robustness._
    
    I believe the return on asset would not be optimal, as it would open up a 
world of bugs.
    
    
    > For the shorter term, though, I think it makes more sense to expose each 
of these options - a temporary directory to write files to (we cannot use 
"/tmp" because it may not exist on many operating systems. Should perhaps 
default to the value of System.getProperty("java.io.tmpdir")), max request size 
(explain that this is used to prevent Denial of Service attacks that would fill 
disk space), and a threshold at which point we would stream the data to the 
temporary file and then re-read it (what jetty refers to as the 
"fileSizeThreshold", i.e., the last parameter).
    
    I agree. 
    
    > While streaming the data to a temporary file on disk is expensive and 
less than ideal, I think it's much better to give the user the option of doing 
that than to simply reject the request if it is too large.
    
    Indeed it makes more sense. I'll make the necessary changes for this.



---

Reply via email to