jinchengchenghh opened a new pull request, #8893:
URL: https://github.com/apache/incubator-gluten/pull/8893

   Add a new flag in the header to indicate that the column is a dictionary vector.
   The approach: first create a dictionary map to track the dictionaryValues. 
Whenever an insert into the map succeeds, append the value to distinctValues (BufferPtr), 
then convert it to a FlatVector used as the dictionaryValues of the DictionaryVector.
    
   The indices are taken from the map (it->second).
   Nulls are split as before.
    
   Only supports RowVector(DictionaryVector<Simple>)
    
   For the string type, the key type is StringView, which is also supported as a map key.
   Based on the following code, I'm not sure whether the encoding of the input 
RowVectors is always dictionary or not.
   ```
   template <typename T>
   DictionaryVectorPtr<EvalType<T>> VectorMaker::dictionaryVector(
       const std::vector<std::optional<T>>& data) {
     using TEvalType = EvalType<T>;
     // Encodes the data saving distinct values on `distinctValues` and their
     // respective indices on `indices`.
     std::vector<TEvalType> distinctValues;
     std::unordered_map<TEvalType, int32_t> indexMap;
     BufferPtr indices = AlignedBuffer::allocate<int32_t>(data.size(), pool_);
     auto rawIndices = indices->asMutable<int32_t>();
     BufferPtr nulls =
         AlignedBuffer::allocate<bool>(data.size(), pool_, bits::kNotNull);
     auto rawNulls = nulls->asMutable<uint64_t>();
     vector_size_t nullCount = 0;
     for (auto i = 0; i < data.size(); ++i) {
       auto val = data[i];
       if (val == std::nullopt) {
         ++nullCount;
         bits::setNull(rawNulls, i, true);
       } else {
         const auto& [it, inserted] = indexMap.emplace(*val, indexMap.size());
         if (inserted) {
           distinctValues.push_back(*val);
         }
         *rawIndices = it->second;
       }
       ++rawIndices;
     }
     auto values = flatVector(distinctValues);
     auto stats = genVectorMakerStats(data);
     auto dictionaryVector = std::make_unique<DictionaryVector<TEvalType>>(
         pool_,
         nullCount ? nulls : nullptr,
         data.size(),
         std::move(values),
         std::move(indices),
         stats.asSimpleVectorStats(),
         indexMap.size(),
         nullCount,
         stats.isSorted);
    
     return dictionaryVector;
   }
   ```
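
To illustrate the encoding idea outside of Velox, here is a minimal sketch using only the standard library. It is not the code in this PR: `EncodedColumn`, `encodeDictionary` and `decodeDictionary` are hypothetical names, `std::string` stands in for StringView, and plain `std::vector`s stand in for the BufferPtr-backed indices and nulls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical container for one dictionary-encoded column (not a Velox type).
template <typename T>
struct EncodedColumn {
  std::vector<T> distinctValues; // dictionary values, in first-seen order
  std::vector<int32_t> indices;  // one entry per row; left at 0 for null rows
  std::vector<bool> nulls;       // true where the row is null
};

// Encodes `data` by storing each distinct value once and recording, per row,
// the position of its value in `distinctValues`.
template <typename T>
EncodedColumn<T> encodeDictionary(const std::vector<std::optional<T>>& data) {
  EncodedColumn<T> out;
  out.indices.assign(data.size(), 0);
  out.nulls.assign(data.size(), false);
  std::unordered_map<T, int32_t> indexMap;
  for (std::size_t i = 0; i < data.size(); ++i) {
    if (!data[i].has_value()) {
      out.nulls[i] = true;
      continue;
    }
    // A new value gets the next free index; an existing one reuses its index.
    const auto [it, inserted] =
        indexMap.emplace(*data[i], static_cast<int32_t>(indexMap.size()));
    if (inserted) {
      out.distinctValues.push_back(*data[i]);
    }
    out.indices[i] = it->second;
  }
  return out;
}

// Reverses encodeDictionary: looks up each non-null row in the dictionary.
template <typename T>
std::vector<std::optional<T>> decodeDictionary(const EncodedColumn<T>& col) {
  std::vector<std::optional<T>> out(col.indices.size());
  for (std::size_t i = 0; i < col.indices.size(); ++i) {
    if (col.nulls[i]) {
      continue; // row stays std::nullopt
    }
    const auto idx = static_cast<std::size_t>(col.indices[i]);
    assert(idx < col.distinctValues.size());
    out[i] = col.distinctValues[idx];
  }
  return out;
}
```

With the input {"a", null, "b", "a"}, this sketch keeps two distinct values and reuses index 0 for the repeated "a". Decoding the result gives back the original rows, including the null.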
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]