[ https://issues.apache.org/jira/browse/MAPREDUCE-6628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mariappan Asokan updated MAPREDUCE-6628:
----------------------------------------
    Status: Patch Available  (was: Open)

Potential memory leak in CryptoOutputStream
-------------------------------------------

                Key: MAPREDUCE-6628
                URL: https://issues.apache.org/jira/browse/MAPREDUCE-6628
            Project: Hadoop Map/Reduce
         Issue Type: Bug
         Components: security
   Affects Versions: 2.6.4
           Reporter: Mariappan Asokan
           Assignee: Mariappan Asokan
        Attachments: MAPREDUCE-6628.001.patch, MAPREDUCE-6628.002.patch, MAPREDUCE-6628.003.patch, MAPREDUCE-6628.004.patch,
                     MAPREDUCE-6628.005.patch, MAPREDUCE-6628.006.patch, MAPREDUCE-6628.007.patch, MAPREDUCE-6628.008.patch

There is a potential memory leak in {{CryptoOutputStream.java}}. It allocates two direct byte buffers ({{inBuffer}} and {{outBuffer}}) that are freed only when the {{close()}} method is called. Most of the time, {{close()}} is called. However, when writing to the intermediate Map output file or the spill files in {{MapTask}}, {{close()}} is never called, since doing so would close the underlying stream, which is not desirable. There is a single underlying physical stream that contains multiple logical streams, one per partition of the Map output.

By default, the amount of memory allocated per byte buffer is 128 KB, so the total allocated per stream is 256 KB. This may not sound like much. However, if the number of partitions (or number of reducers) is large (in the hundreds) and/or there are spill files created in {{MapTask}}, this can grow into a few hundred MB (for example, 500 partitions at 256 KB each is already 125 MB, before counting spill files).

I can think of two ways to address this issue:

h2. Possible Fix - 1

According to the JDK documentation:
{quote}
The contents of direct buffers may reside outside of the normal garbage-collected heap, and so their impact upon the memory footprint of an application might not be obvious. It is therefore recommended that direct buffers be allocated primarily for large, long-lived buffers that are subject to the underlying system's native I/O operations. In general it is best to allocate direct buffers only when they yield a measurable gain in program performance.
{quote}
It is not clear to me whether there is any benefit to allocating direct byte buffers in {{CryptoOutputStream.java}}. In fact, there is a slight CPU overhead in moving data from {{outBuffer}} to a temporary byte array, as per the following code in {{CryptoOutputStream.java}}:
{code}
/*
 * If underlying stream supports {@link ByteBuffer} write in future, needs
 * refine here.
 */
final byte[] tmp = getTmpBuf();
outBuffer.get(tmp, 0, len);
out.write(tmp, 0, len);
{code}
Even if the underlying stream supports direct byte buffer IO (or direct IO in OS parlance), it is not clear whether it would yield any measurable performance gain.

The fix would be to allocate a {{ByteBuffer}} on the heap for {{inBuffer}} and wrap a byte array in a {{ByteBuffer}} for {{outBuffer}}. By the way, {{inBuffer}} and {{outBuffer}} have to remain {{ByteBuffer}} instances, as demanded by the {{encrypt()}} method in {{Encryptor}}.
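For illustration, here is a minimal sketch of what this change could look like, assuming the existing field names ({{inBuffer}}, {{outBuffer}}, {{bufferSize}}); it is not the actual patch:
{code}
// Sketch only: heap-backed buffers instead of ByteBuffer.allocateDirect(bufferSize).
this.inBuffer = ByteBuffer.allocate(bufferSize);         // heap-backed ByteBuffer
this.outBuffer = ByteBuffer.wrap(new byte[bufferSize]);  // plain byte[] wrapped as a ByteBuffer

// Both buffers are still ByteBuffers, so Encryptor.encrypt(inBuffer, outBuffer)
// is unaffected. Writing out could then read the backing array directly and skip
// the getTmpBuf() copy (the position bookkeeping done by outBuffer.get() would
// still need to be performed explicitly):
out.write(outBuffer.array(), outBuffer.arrayOffset() + outBuffer.position(), len);
{code}
Whether dropping the temporary copy buys anything would need measurement; the sketch only illustrates that the buffers stay {{ByteBuffer}} instances as {{encrypt()}} requires.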
h2. Possible Fix - 2

Assuming that we want to keep the buffers as direct byte buffers, we can add a new constructor to {{CryptoOutputStream}} that takes a boolean flag {{ownOutputStream}} to indicate whether the underlying stream is owned by {{CryptoOutputStream}}. If it is true, calling the {{close()}} method will close the underlying stream. Otherwise, when {{close()}} is called, only the direct byte buffers will be freed and the underlying stream will not be closed.

The scope of changes for this fix is somewhat wider: {{MapTask.java}}, {{CryptoUtils.java}}, and {{CryptoFSDataOutputStream.java}} would need to be modified as well to pass the ownership flag mentioned above. A rough sketch of the intended {{close()}} behavior is appended at the end of this comment.

I can post a patch for either of the above. I welcome any other ideas from developers to fix this issue.
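As mentioned above, here is a rough sketch of how the ownership flag could behave; the constructor parameters and the {{closed}}/{{freeBuffers()}} names are assumptions for illustration, not the existing API:
{code}
// Sketch only: an assumed additional constructor plus the intended close() logic.
public CryptoOutputStream(OutputStream out, CryptoCodec codec, byte[] key,
    byte[] iv, boolean ownOutputStream) throws IOException {
  this(out, codec, key, iv);            // delegate to an existing constructor
  this.ownOutputStream = ownOutputStream;
}

@Override
public void close() throws IOException {
  if (closed) {
    return;
  }
  try {
    flush();
    if (ownOutputStream) {
      out.close();                      // close the underlying stream only if we own it
    }
  } finally {
    freeBuffers();                      // always release inBuffer and outBuffer
    closed = true;
  }
}
{code}
With {{ownOutputStream}} set to false for the map-side intermediate output and spill streams, each logical per-partition stream could release its 256 KB of direct buffers without disturbing the shared physical stream.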