lwhite1 commented on code in PR #14213:
URL: https://github.com/apache/arrow/pull/14213#discussion_r979051563


##########
docs/source/java/vector.rst:
##########
@@ -268,6 +268,82 @@ For example, the code below shows how to build a 
:class:`ListVector` of int's us
      }
   }
 
+Dictionary Encoding
+===================
+
+A :class:`FieldVector` can be dictionary encoded for performance or improved 
memory efficiency. While this is most often done with :class:`VarCharVector`, 
nearly any type of vector might be encoded if there are many values, but few 
unique values.
+
+There are a few steps involved in the encoding process:
+
+1. Create a regular, un-encoded vector and populate it
+2. Create a dictionary vector of the same type as the un-encoded vector. This 
vector must have the same values, but each unique value in the un-encoded 
vector need appear here only once.
+3. Create a :class:`Dictionary`. It will contain the dictionary vector, plus a 
:class:`DictionaryEncoding` object that holds the encoding's metadata and 
settings values.
+4. Create a :class:`DictionaryEncoder`.
+5. Call the encode() method on the :class:`DictionaryEncoder` to produce an 
encoded version of the original vector.
+6. (Optional) Call the decode() method on the encoded vector to re-create the 
original values.
+
+The encoded values will be integers. Depending on how many unique values you 
have, you can use either TinyIntVector, SmallIntVector, or IntVector to hold 
them. You specify the type when you create your :class:`DictionaryEncoding` 
instance. You might wonder where those integers come from: the dictionary 
vector is a regular vector, so the value's index position in that vector is 
used as its encoded value.

Review Comment:
   Java does not support unsigned indexes. Unsigned vectors in Java work with 
signed values (except UInt2, which uses char arrays IIRC).
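
   To illustrate the point, here is a minimal plain-Java sketch (no Arrow classes involved). It assumes a dictionary index stored in a single byte, as an unsigned 8-bit vector would hold it. The class name `UnsignedIndexSketch` and the helper `toIndex` are hypothetical and exist only for this example.

```java
public class UnsignedIndexSketch {

    // Hypothetical helper: reinterpret a signed Java byte as an unsigned
    // value in the range 0..255 so it can be used as an array index.
    static int toIndex(byte raw) {
        return Byte.toUnsignedInt(raw);
    }

    public static void main(String[] args) {
        // Stand-in for a dictionary with 256 entries.
        String[] dictionary = new String[256];
        for (int i = 0; i < dictionary.length; i++) {
            dictionary[i] = "value-" + i;
        }

        // The bit pattern 0xC8 means 200 when read as unsigned,
        // but Java's byte type is signed, so it reads back as -56.
        byte raw = (byte) 200;
        System.out.println(raw);                      // -56
        System.out.println(toIndex(raw));             // 200
        System.out.println(dictionary[toIndex(raw)]); // value-200

        // char is Java's only unsigned primitive (16 bits wide).
        char c = (char) 0xFFFF;
        System.out.println((int) c);                  // 65535
    }
}
```

   Using the raw byte directly as an index would throw `ArrayIndexOutOfBoundsException`, so the docs probably should not suggest unsigned index types without this caveat.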


