Quanlong Huang created ORC-1131:
-----------------------------------

             Summary: [C++] getMemoryUsage() is incorrect on string vector batches
                 Key: ORC-1131
                 URL: https://issues.apache.org/jira/browse/ORC-1131
             Project: ORC
          Issue Type: Bug
    Affects Versions: 1.6.0
            Reporter: Quanlong Huang
            Assignee: Quanlong Huang

The C++ client produces two kinds of string vector batches: StringVectorBatch and EncodedStringVectorBatch. Both currently return incorrect results from getMemoryUsage().

After ORC-501, the blob was moved from StringColumnReader to StringVectorBatch. However, StringVectorBatch::getMemoryUsage() was not updated to account for it.

{code:cpp}
uint64_t StringVectorBatch::getMemoryUsage() {
  return ColumnVectorBatch::getMemoryUsage()
      + static_cast<uint64_t>(data.capacity() * sizeof(char*)
      + length.capacity() * sizeof(int64_t));
}
{code}

EncodedStringVectorBatch inherits from StringVectorBatch but doesn't override getMemoryUsage(). The members it adds, such as the index buffer, are therefore not counted, and the reported usage is wrong.

{code:cpp}
struct EncodedStringVectorBatch : public StringVectorBatch {
  EncodedStringVectorBatch(uint64_t capacity, MemoryPool& pool);
  virtual ~EncodedStringVectorBatch();
  std::string toString() const;
  void resize(uint64_t capacity);
  std::shared_ptr<StringDictionary> dictionary;

  // index for dictionary entry
  DataBuffer<int64_t> index;
};
{code}
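
One possible shape for a fix is sketched below. It is an illustration only, not the actual patch. The *Sketch types are hypothetical stand-ins that use std::vector in place of ORC's DataBuffer, and getMemoryUsage() is marked const here for convenience. The idea is that StringVectorBatch adds the blob capacity to its total, and EncodedStringVectorBatch overrides getMemoryUsage() to add the index buffer. Whether the shared dictionary should also be counted is left open.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for ColumnVectorBatch (not the real ORC class).
struct ColumnVectorBatchSketch {
  std::vector<char> notNull;
  virtual ~ColumnVectorBatchSketch() = default;
  virtual uint64_t getMemoryUsage() const {
    return static_cast<uint64_t>(notNull.capacity() * sizeof(char));
  }
};

// Hypothetical stand-in for StringVectorBatch; std::vector replaces DataBuffer.
struct StringVectorBatchSketch : ColumnVectorBatchSketch {
  std::vector<char*> data;
  std::vector<int64_t> length;
  std::vector<char> blob;  // the buffer moved into the batch by ORC-501

  uint64_t getMemoryUsage() const override {
    return ColumnVectorBatchSketch::getMemoryUsage()
        + static_cast<uint64_t>(data.capacity() * sizeof(char*)
        + length.capacity() * sizeof(int64_t)
        + blob.capacity() * sizeof(char));  // previously missing term
  }
};

// Hypothetical stand-in for EncodedStringVectorBatch.
struct EncodedStringVectorBatchSketch : StringVectorBatchSketch {
  std::vector<int64_t> index;  // index for dictionary entry

  // Override so the index buffer is counted on top of the base class total.
  // The shared dictionary is deliberately not counted in this sketch.
  uint64_t getMemoryUsage() const override {
    return StringVectorBatchSketch::getMemoryUsage()
        + static_cast<uint64_t>(index.capacity() * sizeof(int64_t));
  }
};
```

Because getMemoryUsage() is virtual, a caller holding a base-class reference to an encoded batch would get the larger, corrected total without any change at the call site.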