Github user ekovacs commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2991#discussion_r219117891

--- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/HandleHttpRequest.java ---
@@ -521,161 +553,221 @@ public void onTrigger(final ProcessContext context, final ProcessSession session
         final long start = System.nanoTime();
         final HttpServletRequest request = container.getRequest();

-        FlowFile flowFile = session.create();
-        try (OutputStream flowFileOut = session.write(flowFile)) {
-            StreamUtils.copy(request.getInputStream(), flowFileOut);
-        } catch (final IOException e) {
-            // There may be many reasons which can produce an IOException on the HTTP stream and in some of them, eg.
-            // bad requests, the connection to the client is not closed. In order to address also these cases, we try
-            // and answer with a BAD_REQUEST, which lets the client know that the request has not been correctly
-            // processed and makes it aware that the connection can be closed.
-            getLogger().error("Failed to receive content from HTTP Request from {} due to {}",
-                    new Object[]{request.getRemoteAddr(), e});
-            session.remove(flowFile);
-            try {
-                HttpServletResponse response = container.getResponse();
-                response.sendError(Status.BAD_REQUEST.getStatusCode());
-                response.flushBuffer();
-                container.getContext().complete();
-            } catch (final IOException ioe) {
-                getLogger().warn("Failed to send HTTP response to {} due to {}",
-                        new Object[]{request.getRemoteAddr(), ioe});
+        if (!Strings.isNullOrEmpty(request.getContentType()) && request.getContentType().contains(MIME_TYPE__MULTIPART_FORM_DATA)) {
+            final long maxRequestSize = context.getProperty(MAX_REQUEST_SIZE).asLong();
+            request.setAttribute(Request.__MULTIPART_CONFIG_ELEMENT, new MultipartConfigElement("/tmp", maxRequestSize, maxRequestSize, 0));
--- End diff --

Hi @markap14

Thank you for reviewing my changes. Please see my responses inline.
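For context, the guard added in the diff above can be sketched in isolation like this (class and method names here are hypothetical; the actual patch uses Guava's `Strings.isNullOrEmpty` and the processor's `MIME_TYPE__MULTIPART_FORM_DATA` constant):

```java
// Standalone sketch of the Content-Type check that decides whether the
// incoming request should be parsed as a multipart form upload.
public class MultipartCheckSketch {

    static final String MULTIPART_FORM_DATA = "multipart/form-data";

    // Returns true when the request's Content-Type indicates a multipart
    // form upload, e.g. "multipart/form-data; boundary=----abc".
    // Content-Type may carry parameters, so a contains() check is used
    // rather than an exact match.
    static boolean isMultipart(String contentType) {
        return contentType != null
                && !contentType.isEmpty()
                && contentType.contains(MULTIPART_FORM_DATA);
    }

    public static void main(String[] args) {
        System.out.println(isMultipart("multipart/form-data; boundary=----abc")); // true
        System.out.println(isMultipart("application/json"));                      // false
        System.out.println(isMultipart(null));                                    // false
    }
}
```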
Best regards,
Endre

> As-is, this is going to buffer the entire request into memory and then throw an Exception if the request reaches some configured threshold. Unfortunately, I think this is going to be too limiting for many use cases. We should not request that the entire file upload fit into java's heap.

The original ticket, https://issues.apache.org/jira/browse/NIFI-3469, explicitly asked to avoid writing to local disk, so I took this approach. It is achieved by passing 0 as the last parameter (Jetty's fileSizeThreshold) in:

    new MultipartConfigElement("/tmp", maxRequestSize, maxRequestSize, 0)

> I think the best approach (longer term) would be to create our own MIME Parser (or find one that already exists?) that is capable of processing the data in a streaming fashion. I presume that Jetty doesn't allow that because they want to allow you to obtain all Part's and then access them individually. However, we're not doing that and so their API is rather limiting.

Before implementing, I read a lot of advice on the web, e.g. this Stack Overflow answer: https://stackoverflow.com/questions/2422468/how-to-upload-files-to-server-using-jsp-servlet/2424824#2424824

As that post warns:

* _Don't manually parse it._
* _You shouldn't try to do this on your own or copy-paste some homegrown library-less code found elsewhere on the Internet._
* _You should rather use a real library which is used (and implicitly tested!) by millions of users for years. Such a library has proven its robustness._

I believe the return on investment of a homegrown MIME parser would not be worth it, as it would open up a whole world of bugs.

> For the shorter term, though, I think it makes more sense to expose each of these options - a temporary directory to write files to (we cannot use "/tmp" because it may not exist on many operating systems. Should perhaps default to the value of System.getProperty("java.io.tmpdir")), max request size (explain that this is used to prevent Denial of Service attacks that would fill disk space), and a threshold at which point we would stream the data to the temporary file and then re-read it (what jetty refers to as the "fileSizeThreshold", i.e., the last parameter).

I agree.

> While streaming the data to a temporary file on disk is expensive and less than ideal, I think it's much better to give the user the option of doing that than to simply reject the request if it is too large.

Indeed, that makes more sense. I'll make the necessary changes.
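To make sure I understood the suggestion, a rough sketch of the configuration you describe could look like the following (class name, field names, and the sample sizes are placeholders of mine, not the final implementation; the four values mirror the parameters of `MultipartConfigElement`):

```java
// Hypothetical sketch of the suggested multipart settings: a portable temp
// directory, a max request size to bound disk usage (DoS protection), and an
// exposed fileSizeThreshold above which parts spill from memory to disk.
public class MultipartConfigSketch {

    final String location;        // temp dir for spilled part data
    final long maxFileSize;       // upper bound per uploaded part
    final long maxRequestSize;    // upper bound for the whole request
    final int fileSizeThreshold;  // bytes kept in memory before spilling

    MultipartConfigSketch(String location, long maxFileSize,
                          long maxRequestSize, int fileSizeThreshold) {
        this.location = location;
        this.maxFileSize = maxFileSize;
        this.maxRequestSize = maxRequestSize;
        this.fileSizeThreshold = fileSizeThreshold;
    }

    // Portable default instead of a hard-coded "/tmp", as suggested above.
    static String defaultLocation() {
        return System.getProperty("java.io.tmpdir");
    }

    public static void main(String[] args) {
        MultipartConfigSketch cfg = new MultipartConfigSketch(
                defaultLocation(),
                1_000_000L,      // sample per-part cap
                1_000_000L,      // sample whole-request cap
                512 * 1024);     // sample in-memory threshold: 512 KiB
        System.out.println(cfg.location);
    }
}
```

With fileSizeThreshold exposed as a property, a user who cannot tolerate disk writes can still set it high enough to keep parts in memory, while everyone else gets the spill-to-disk behavior.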
---