[ https://issues.apache.org/jira/browse/DRILL-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Rogers updated DRILL-5416:
-------------------------------
    Fix Version/s:     (was: 1.11.0)

> Vectors read from disk report incorrect memory sizes
> ----------------------------------------------------
>
>                 Key: DRILL-5416
>                 URL: https://issues.apache.org/jira/browse/DRILL-5416
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>
> The external sort and revised hash agg operators spill to disk using a vector
> serialization mechanism. This mechanism serializes each vector as a (length,
> bytes) pair.
> Before spilling, checking the memory used by a vector (with the new
> {{RecordBatchSizer}} class) reports the actual memory the vector consumes,
> including any unused space in the vector.
> If we spill the vector and then reread it, the reported storage size is wrong.
> On reading, the code allocates a buffer sized from the saved length, rounded
> up to the next power of two. Then, when building the vector, we "slice" the
> read buffer, setting the memory size to the data size.
> For example, suppose we save 20 1-byte fields. The size on disk is 20 bytes.
> The read buffer is rounded up to 32 bytes (the size of the original, pre-spill
> buffer). We read the 20 bytes and create a vector. Creating the vector
> reports the memory size as 20, "hiding" the extra, unused 12 bytes.
> As a result, memory-size calculations return incorrect numbers. Working with
> understated sizes means that the code cannot safely operate within a memory
> budget, so the user can receive an unexpected OOM error.
> As it turns out, the code path that does the slicing is used only for reads
> from disk. This ticket asks to remove the slicing step and use the allocated
> buffer directly, so that the vector read from disk reports the correct memory
> usage, the same as the vector before spilling.
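The 20-byte example from the description can be modeled in a few lines of Java. This is a minimal sketch, not Drill code: `java.nio.ByteBuffer` stands in for Drill's own buffer class, the names `SpillSliceDemo`, `nextPowerOfTwo` and `readIntoRoundedBuffer` are hypothetical, and it assumes the read buffer is sized by rounding the saved length up to the next power of two, as the issue describes.

```java
import java.nio.ByteBuffer;

// Illustrative model only: java.nio.ByteBuffer stands in for Drill's buffer type,
// and the helper names below are hypothetical, not Drill APIs.
public class SpillSliceDemo {

    // Smallest power of two >= n (assumed rounding rule for the read buffer).
    static int nextPowerOfTwo(int n) {
        if (n <= 1) {
            return 1;
        }
        // highestOneBit(n - 1) is the largest power of two <= n - 1;
        // doubling it gives the smallest power of two >= n.
        return Integer.highestOneBit(n - 1) << 1;
    }

    // Simulate reading a (length, bytes) pair back from disk into a rounded-up buffer.
    static ByteBuffer readIntoRoundedBuffer(byte[] saved) {
        ByteBuffer buf = ByteBuffer.allocate(nextPowerOfTwo(saved.length));
        buf.put(saved); // fill the first saved.length bytes
        buf.flip();     // position = 0, limit = saved.length
        return buf;
    }

    public static void main(String[] args) {
        byte[] saved = new byte[20]; // 20 one-byte fields as written to disk

        ByteBuffer read = readIntoRoundedBuffer(saved);
        // Slicing exposes only the data bytes, so the reported size shrinks.
        ByteBuffer sliced = read.slice();

        int allocated = read.capacity();
        int reportedWithSlice = sliced.capacity();
        int hidden = allocated - reportedWithSlice;

        System.out.println("allocated=" + allocated);
        System.out.println("reportedWithSlice=" + reportedWithSlice);
        System.out.println("hidden=" + hidden);
        // Using the buffer directly (no slice) reports the full allocation.
        System.out.println("reportedWithoutSlice=" + read.capacity());
    }
}
```

Running `main` prints an allocation of 32 bytes, a sliced size of 20 and 12 hidden bytes. Dropping the `slice()` call, as the ticket proposes, makes the reported size equal the 32-byte allocation, matching the pre-spill buffer.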