[ https://issues.apache.org/jira/browse/NIFI-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17269619#comment-17269619 ]
ASF subversion and git services commented on NIFI-6999:
-------------------------------------------------------

Commit 1c361d45ae94f155b6e2def7bd4430b1c9ca8b3b in nifi's branch refs/heads/main from Nathan Gough
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=1c361d4 ]

NIFI-6999 - Made changes to load flow.xml files using streams. Updated tests.
NIFI-6999 - Slight change to test to check for WARN message.
NIFI-6999 - Removed very large flow file and test that uses it. This test ran for about 2 minutes so was excessive to keep in. The other changed tests to handle streams proves the functionality. A large file can be used on the command line to manually test large flow files. Some other cleanup.
NIFI-6999 - Removed comments and altered the code a little bit for readability as per code review.
NIFI-6999 - Removed commented code
NIFI-6999 - Renamed variable and removed assert comment.

Signed-off-by: Nathan Gough <thena...@gmail.com>

This closes #4715.

> Encrypt Config Toolkit fails on very large flow.xml.gz files
> ------------------------------------------------------------
>
> Key: NIFI-6999
> URL: https://issues.apache.org/jira/browse/NIFI-6999
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Tools and Build
> Affects Versions: 1.2.0, 1.10.0
> Reporter: Andy LoPresto
> Assignee: Nathan Gough
> Priority: Critical
> Labels: documentation, encryption, heap, security, streaming, toolkit
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> A user reported failure when using the encrypt config toolkit to process (encrypt) a large {{flow.xml.gz}}. The compressed file was 49 MB, but was 687 MB uncompressed. It contained 545 encrypted values, and approximately 90 templates. This caused the toolkit to fail during {{loadFlowXml()}} unless the toolkit invocation set the heap to 8 GB via {{-Xms2g -Xmx8g}}.
> Even with the expanded heap, the serialization of the newly-encrypted flow XML to the file system fails with the following exception:
> {code}
> Exception in thread "main" java.lang.OutOfMemoryError: Requested array size exceeds VM limit
> at java.lang.StringCoding.encode(StringCoding.java:350)
> at java.lang.String.getBytes(String.java:941)
> at org.apache.commons.io.IOUtils.write(IOUtils.java:1857)
> at org.apache.commons.io.IOUtils$write$0.call(Unknown Source)
> at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
> at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
> at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:141)
> at org.apache.nifi.properties.ConfigEncryptionTool$_writeFlowXmlToFile_closure5$_closure20.doCall(ConfigEncryptionTool.groovy:692)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
> at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
> at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:294)
> at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1019)
> at groovy.lang.Closure.call(Closure.java:426)
> at groovy.lang.Closure.call(Closure.java:442)
> at org.codehaus.groovy.runtime.IOGroovyMethods.withCloseable(IOGroovyMethods.java:1622)
> at org.codehaus.groovy.runtime.NioGroovyMethods.withCloseable(NioGroovyMethods.java:1754)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.codehaus.groovy.runtime.metaclass.ReflectionMetaMethod.invoke(ReflectionMetaMethod.java:54)
> at org.codehaus.groovy.runtime.metaclass.NewInstanceMetaMethod.invoke(NewInstanceMetaMethod.java:56)
> at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoMetaMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:274)
> at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:56)
> at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
> at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
> at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
> at org.apache.nifi.properties.ConfigEncryptionTool$_writeFlowXmlToFile_closure5.doCall(ConfigEncryptionTool.groovy:691)
> {code}
> The immediate fix was to remove the duplicated template definitions in the flow definition, returning the file to a reasonable size. However, if run as an inline replacement, this can cause the {{flow.xml.gz}} to be overwritten with an empty file, potentially leading to data loss. The following steps should be taken:
> # Guard against loading/operating on/serializing large files (log statements, simple conditional checks)
> # Handle large files internally (change from direct {{String}} access to {{BufferedInputStream}}, etc.)
> # Document the internal memory usage of the toolkit in the toolkit guide
> # Document best practices and steps to resolve issue in the toolkit guide

--
This message was sent by Atlassian Jira (v8.3.4#803005)
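
For readers following the issue, the first two remediation steps listed in the description (guarding against large files, and moving from whole-String access to buffered streams) might look roughly like the Java sketch below. This is not the code from the commit. The class name, method names, and the warning threshold are hypothetical and chosen only for illustration. The sketch also writes to a temporary sibling file before replacing the original, which is one possible way to avoid the empty-file overwrite the description mentions. It is an assumption, not something the issue prescribes.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class StreamingFlowCopy {
    // Hypothetical threshold for the "guard" step; not a value taken from the toolkit.
    static final long WARN_THRESHOLD_BYTES = 100L * 1024 * 1024;

    /** Returns true when the compressed file is large enough to warrant a warning. */
    static boolean isLarge(Path gz) throws IOException {
        return Files.size(gz) > WARN_THRESHOLD_BYTES;
    }

    /**
     * Copies one gzip file to another through a fixed-size buffer, so memory use
     * does not grow with the uncompressed size. Returns the uncompressed byte count.
     */
    static long copyGzip(Path source, Path target) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        try (InputStream in = new GZIPInputStream(
                     new BufferedInputStream(Files.newInputStream(source)));
             OutputStream out = new GZIPOutputStream(
                     new BufferedOutputStream(Files.newOutputStream(target)))) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                // A real tool would transform the data here (for example, re-encrypt values).
                out.write(buffer, 0, read);
                total += read;
            }
        }
        return total;
    }

    /**
     * Rewrites a file "in place" by first writing a temporary sibling file and only
     * replacing the original once the copy has completed without error.
     */
    static long rewriteInPlace(Path flow) throws IOException {
        Path dir = flow.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "flow", ".tmp");
        try {
            long copied = copyGzip(flow, tmp);
            Files.move(tmp, flow, StandardCopyOption.REPLACE_EXISTING);
            return copied;
        } finally {
            // No-op after a successful move; removes the partial file after a failure.
            Files.deleteIfExists(tmp);
        }
    }
}
```

Because the buffer is a fixed 8 KB, neither the read nor the write side ever materializes the whole document as a String or byte array, which is the allocation that fails in the stack trace above.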