Quanlong Huang created ORC-1131:
-----------------------------------

             Summary: [C++] getMemoryUsage() is incorrect on string vector batches
                 Key: ORC-1131
                 URL: https://issues.apache.org/jira/browse/ORC-1131
             Project: ORC
          Issue Type: Bug
    Affects Versions: 1.6.0
            Reporter: Quanlong Huang
            Assignee: Quanlong Huang


The C++ client produces two kinds of string vector batches: 
StringVectorBatch and EncodedStringVectorBatch. getMemoryUsage() currently 
returns incorrect results for both.

ORC-501 moved the blob from StringColumnReader to StringVectorBatch. 
However, StringVectorBatch::getMemoryUsage() was not updated to account for it:
{code:cpp}
uint64_t StringVectorBatch::getMemoryUsage() {
  return ColumnVectorBatch::getMemoryUsage()
        + static_cast<uint64_t>(data.capacity() * sizeof(char*)
        + length.capacity() * sizeof(int64_t));
}
{code}
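The gap can be illustrated with a small standalone model. This is only a sketch, not the actual ORC code or the eventual patch: FakeBuffer and StringBatchModel are hypothetical stand-ins, the blob is assumed to be a char buffer owned by the batch, and the ColumnVectorBatch base contribution is left out for brevity.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a data buffer: it only records its capacity
// (in elements), which is all the accounting below needs.
template <typename T>
struct FakeBuffer {
  std::size_t cap;
  explicit FakeBuffer(std::size_t c) : cap(c) {}
  std::size_t capacity() const { return cap; }
};

// Simplified model of a string batch that owns its blob.
// The member layout is an assumption made for illustration.
struct StringBatchModel {
  FakeBuffer<char*> data;            // pointer to the start of each string
  FakeBuffer<std::int64_t> length;   // length of each string
  FakeBuffer<char> blob;             // string bytes owned by the batch

  StringBatchModel(std::size_t rows, std::size_t blobBytes)
      : data(rows), length(rows), blob(blobBytes) {}

  // Accounting equivalent to the snippet quoted above: blob is ignored.
  std::uint64_t reportedUsage() const {
    return static_cast<std::uint64_t>(data.capacity() * sizeof(char*)
                                      + length.capacity() * sizeof(std::int64_t));
  }

  // One possible correction: also add the bytes reserved for the blob.
  std::uint64_t usageWithBlob() const {
    return reportedUsage()
        + static_cast<std::uint64_t>(blob.capacity() * sizeof(char));
  }
};
```

In this model, a batch of 1024 rows with a 4096-byte blob under-reports its usage by exactly 4096 bytes.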
EncodedStringVectorBatch inherits from StringVectorBatch but doesn't 
override getMemoryUsage(), so the memory held by its own members (for 
example the index buffer) is not counted:
{code:cpp}
struct EncodedStringVectorBatch : public StringVectorBatch { 
  EncodedStringVectorBatch(uint64_t capacity, MemoryPool& pool);
  virtual ~EncodedStringVectorBatch();
  std::string toString() const;
  void resize(uint64_t capacity);

  std::shared_ptr<StringDictionary> dictionary;
  // index for dictionary entry
  DataBuffer<int64_t> index;
};
{code}
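A possible shape for the missing override, again as a standalone sketch rather than the real ORC classes: StringBatchModel and EncodedStringBatchModel are hypothetical, buffers are reduced to their capacities, and the shared dictionary is deliberately not counted here because whether a batch should account for a shared object is a separate design decision.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical base model: counts only the data and length buffers.
struct StringBatchModel {
  std::size_t dataCapacity;    // number of char* slots
  std::size_t lengthCapacity;  // number of int64_t slots

  explicit StringBatchModel(std::size_t rows)
      : dataCapacity(rows), lengthCapacity(rows) {}
  virtual ~StringBatchModel() = default;

  virtual std::uint64_t getMemoryUsage() {
    return static_cast<std::uint64_t>(dataCapacity * sizeof(char*)
                                      + lengthCapacity * sizeof(std::int64_t));
  }
};

// Hypothetical derived model with an extra index buffer.
struct EncodedStringBatchModel : public StringBatchModel {
  std::size_t indexCapacity;   // number of int64_t dictionary indexes

  explicit EncodedStringBatchModel(std::size_t rows)
      : StringBatchModel(rows), indexCapacity(rows) {}

  // Override so the index buffer is included on top of the base total.
  // The shared dictionary is intentionally left out of this sketch.
  std::uint64_t getMemoryUsage() override {
    return StringBatchModel::getMemoryUsage()
        + static_cast<std::uint64_t>(indexCapacity * sizeof(std::int64_t));
  }
};
```

Because the method is virtual, callers holding a base-class reference get the derived total without any change on their side.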



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
