[ https://issues.apache.org/jira/browse/ORC-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16668957#comment-16668957 ]

ASF GitHub Bot commented on ORC-420:
------------------------------------

majetideepak commented on a change in pull request #326: ORC-420: [C++] 
Implement string dictionary encoding for C++ writer
URL: https://github.com/apache/orc/pull/326#discussion_r229376031
 
 

 ##########
 File path: c++/src/ColumnWriter.cc
 ##########
 @@ -850,27 +977,76 @@ namespace orc {
 
     virtual void recordPosition() const override;
 
+    virtual void createRowIndexEntry() override;
+
+    virtual void writeDictionary() override;
+
+    virtual void reset() override;
+
+  private:
+    /**
+     * dictionary related functions
+     */
+    bool checkDictionaryKeyRatio();
+    void createDirectStreams();
+    void createDictStreams();
+    void deleteDictStreams();
+    void fallbackToDirectEncoding();
+
   protected:
-    std::unique_ptr<RleEncoder> lengthEncoder;
-    std::unique_ptr<AppendOnlyBufferedStream> dataStream;
     RleVersion rleVersion;
+    bool useCompression;
+    const StreamsFactory& streamsFactory;
+    bool alignedBitPacking;
+
+    // direct encoding streams
+    std::unique_ptr<RleEncoder> directLengthEncoder;
+    std::unique_ptr<AppendOnlyBufferedStream> directDataStream;
+
+    // dictionary encoding streams
+    std::unique_ptr<RleEncoder> dictDataEncoder;
+    std::unique_ptr<RleEncoder> dictLengthEncoder;
+    std::unique_ptr<AppendOnlyBufferedStream> dictStream;
+
+    /**
+     * dictionary related variables
+     */
+    StringDictionary dictionary;
+    // whether or not dictionary checking is done
+    bool doneDictionaryCheck;
+    // whether or not it should be used
+    bool useDictionary;
+    // keys in the dictionary should not exceed this ratio
+    double dictSizeThreshold;
+    // record index of insertion order in the dictionary for not-null rows
+    std::vector<int64_t> idxInDictBuffer;
 
 Review comment:
   In my opinion, the StringColumnWriter must not have any state corresponding 
to the dictionary encoding.
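Below is a minimal sketch of the separation the reviewer asks for. It is not code from the PR or from the ORC code base. It uses a hypothetical `DictionaryEncodingState` class that owns the dictionary, the per-row ids (the role of `idxInDictBuffer` in the diff), and the key-ratio check (the role of `dictSizeThreshold` and `checkDictionaryKeyRatio()`). The writer then keeps a single pointer and releases it on fallback. `StringWriterSketch` is also hypothetical. The ratio test (distinct keys compared against threshold times non-null rows) is an assumption drawn from the comment on `dictSizeThreshold`. The sketch requires C++14 for `std::make_unique`.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper that owns every piece of dictionary-encoding state.
class DictionaryEncodingState {
 public:
  explicit DictionaryEncodingState(double keyRatioThreshold)
      : threshold_(keyRatioThreshold) {}

  // Records one non-null value and returns its insertion-order id.
  int64_t add(const std::string& value) {
    auto result = ids_.emplace(value, static_cast<int64_t>(keys_.size()));
    if (result.second) keys_.push_back(value);  // first time this key is seen
    rowIds_.push_back(result.first->second);
    return result.first->second;
  }

  // Assumed check: distinct keys must not exceed threshold * non-null rows.
  bool keyRatioAcceptable() const {
    if (rowIds_.empty()) return true;
    return static_cast<double>(keys_.size()) <=
           threshold_ * static_cast<double>(rowIds_.size());
  }

  const std::vector<std::string>& keys() const { return keys_; }
  const std::vector<int64_t>& rowIds() const { return rowIds_; }

 private:
  double threshold_;
  std::unordered_map<std::string, int64_t> ids_;  // key -> insertion-order id
  std::vector<std::string> keys_;                 // keys in insertion order
  std::vector<int64_t> rowIds_;                   // one id per non-null row
};

// Hypothetical writer: its only dictionary-related member is one pointer.
class StringWriterSketch {
 public:
  explicit StringWriterSketch(double keyRatioThreshold)
      : dict_(std::make_unique<DictionaryEncodingState>(keyRatioThreshold)) {}

  void add(const std::string& value) {
    if (dict_) {
      dict_->add(value);
    } else {
      direct_.push_back(value);
    }
  }

  // Run once (e.g. after the first row group): if the dictionary does not pay
  // off, replay the buffered rows as direct values and release the state.
  void checkDictionary() {
    if (dict_ && !dict_->keyRatioAcceptable()) {
      for (int64_t id : dict_->rowIds()) {
        direct_.push_back(dict_->keys()[static_cast<std::size_t>(id)]);
      }
      dict_.reset();
    }
  }

  bool usesDictionary() const { return dict_ != nullptr; }
  std::size_t directCount() const { return direct_.size(); }

 private:
  std::unique_ptr<DictionaryEncodingState> dict_;  // null after fallback
  std::vector<std::string> direct_;                // stand-in for direct streams
};
```

With a threshold of 0.5, four rows holding two distinct values keep the dictionary. Four rows holding three distinct values fall back to direct encoding, and the writer no longer carries any dictionary state.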



> [C++] Implement string dictionary encoding for C++ writer
> ---------------------------------------------------------
>
>                 Key: ORC-420
>                 URL: https://issues.apache.org/jira/browse/ORC-420
>             Project: ORC
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Gang Wu
>            Assignee: Gang Wu
>            Priority: Major
>
> The scope of this Jira is to add string dictionary encoding support to the C++ writer.
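The following is a rough sketch of what string dictionary encoding produces for one stripe. Each distinct non-null string is stored once, and every row stores a small integer id in its place. The stream names in the comments (DICTIONARY_DATA, LENGTH, DATA) follow the ORC specification's dictionary encoding for string columns. `DictionaryStreams` and `encodeDictionary` are hypothetical names. Nulls (the PRESENT stream) are left out, and the integer streams are shown as plain vectors rather than RLE-encoded. Ids are first assigned in insertion order, as the `idxInDictBuffer` comment in the diff describes. They are then remapped so that keys are emitted in sorted byte order. That remapping is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical container for the non-PRESENT dictionary streams.
struct DictionaryStreams {
  std::string dictionaryData;    // concatenated key bytes (DICTIONARY_DATA)
  std::vector<int64_t> lengths;  // byte length of each key (LENGTH)
  std::vector<int64_t> data;     // key id for each non-null row (DATA)
};

// Encodes non-null values: ids are assigned in insertion order first, then
// remapped so that keys appear in sorted byte order.
DictionaryStreams encodeDictionary(const std::vector<std::string>& values) {
  std::unordered_map<std::string, int64_t> insertionId;
  std::vector<std::string> keys;  // keys in insertion order
  std::vector<int64_t> rowIds;    // insertion-order id per row
  for (const std::string& v : values) {
    auto result = insertionId.emplace(v, static_cast<int64_t>(keys.size()));
    if (result.second) keys.push_back(v);
    rowIds.push_back(result.first->second);
  }

  // order[pos] = insertion id of the key at sorted position pos
  std::vector<int64_t> order(keys.size());
  for (std::size_t i = 0; i < order.size(); ++i) {
    order[i] = static_cast<int64_t>(i);
  }
  std::sort(order.begin(), order.end(), [&keys](int64_t a, int64_t b) {
    return keys[static_cast<std::size_t>(a)] < keys[static_cast<std::size_t>(b)];
  });

  std::vector<int64_t> sortedId(keys.size());  // insertion id -> sorted id
  DictionaryStreams out;
  for (std::size_t pos = 0; pos < order.size(); ++pos) {
    const std::size_t ins = static_cast<std::size_t>(order[pos]);
    sortedId[ins] = static_cast<int64_t>(pos);
    out.dictionaryData += keys[ins];
    out.lengths.push_back(static_cast<int64_t>(keys[ins].size()));
  }
  for (int64_t id : rowIds) {
    out.data.push_back(sortedId[static_cast<std::size_t>(id)]);
  }
  return out;
}
```

For the rows `pear, apple, pear, fig`, the sketch yields the dictionary bytes `applefigpear`, the lengths `5, 3, 4`, and the row ids `2, 0, 2, 1`.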



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
