lidavidm commented on code in PR #14213:
URL: https://github.com/apache/arrow/pull/14213#discussion_r979055429


##########
docs/source/java/vector.rst:
##########
@@ -268,6 +268,82 @@ For example, the code below shows how to build a :class:`ListVector` of int's us
      }
   }
 
+Dictionary Encoding
+===================
+
+A :class:`FieldVector` can be dictionary encoded for performance or improved memory efficiency. While this is most often done with :class:`VarCharVector`, nearly any type of vector might be encoded if there are many values, but few unique values.
+
+There are a few steps involved in the encoding process:
+
+1. Create a regular, un-encoded vector and populate it
+2. Create a dictionary vector of the same type as the un-encoded vector. This vector must have the same values, but each unique value in the un-encoded vector need appear here only once.
+3. Create a :class:`Dictionary`. It will contain the dictionary vector, plus a :class:`DictionaryEncoding` object that holds the encoding's metadata and settings values.
+4. Create a :class:`DictionaryEncoder`.
+5. Call the encode() method on the :class:`DictionaryEncoder` to produce an encoded version of the original vector.
+6. (Optional) Call the decode() method on the encoded vector to re-create the original values.
+
+The encoded values will be integers. Depending on how many unique values you have, you can use either TinyIntVector, SmallIntVector, or IntVector to hold them. You specify the type when you create your :class:`DictionaryEncoding` instance. You might wonder where those integers come from: the dictionary vector is a regular vector, so the value's index position in that vector is used as its encoded value.

Review Comment:
   It's probably easiest to just use double backticks to put it in a monospace font. What I'm saying is that the `:class:` markup won't actually link it properly, since there's no integration between Javadoc and Sphinx.
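The six steps in the quoted text can be sketched end to end with the Arrow Java API. This is a minimal sketch, not code from the PR. It assumes a recent Arrow Java release in which `DictionaryEncoder` has a `(Dictionary, BufferAllocator)` constructor. The class name `DictionaryEncodingSketch`, the helper `fill`, the dictionary id `1L`, and the sample strings are illustrative only. Note that `decode` is a method of `DictionaryEncoder` rather than of the encoded vector. Here its static form is used, which takes the encoded vector and the `Dictionary`.

```java
import java.nio.charset.StandardCharsets;

import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.ValueVector;
import org.apache.arrow.vector.VarCharVector;
import org.apache.arrow.vector.dictionary.Dictionary;
import org.apache.arrow.vector.dictionary.DictionaryEncoder;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.DictionaryEncoding;

// Illustrative sketch; the class and method names are hypothetical.
public class DictionaryEncodingSketch {

  // Writes the strings into the vector as UTF-8 bytes.
  static void fill(VarCharVector vector, String[] values) {
    vector.allocateNew(values.length);
    for (int i = 0; i < values.length; i++) {
      vector.setSafe(i, values[i].getBytes(StandardCharsets.UTF_8));
    }
    vector.setValueCount(values.length);
  }

  // Encodes `values` against `dictValues`, checks the round trip, and returns the indices.
  public static int[] encode(String[] values, String[] dictValues) {
    try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
         VarCharVector unencoded = new VarCharVector("unencoded", allocator);
         VarCharVector dictVector = new VarCharVector("dictionary", allocator)) {
      // Step 1: a regular, un-encoded vector.
      fill(unencoded, values);
      // Step 2: the dictionary vector; each distinct value appears once.
      fill(dictVector, dictValues);
      // Step 3: id 1, unordered, signed 32-bit indices (so the result is an IntVector).
      Dictionary dictionary = new Dictionary(dictVector,
          new DictionaryEncoding(1L, false, new ArrowType.Int(32, true)));
      // Step 4: the encoder is built over the dictionary.
      DictionaryEncoder encoder = new DictionaryEncoder(dictionary, allocator);
      // Step 5: each value is replaced by its position in the dictionary vector.
      try (ValueVector encoded = encoder.encode(unencoded)) {
        IntVector indices = (IntVector) encoded;
        int[] result = new int[indices.getValueCount()];
        for (int i = 0; i < result.length; i++) {
          result[i] = indices.get(i);
        }
        // Step 6 (optional): decoding looks each index up in the dictionary again.
        try (ValueVector decoded = DictionaryEncoder.decode(encoded, dictionary)) {
          VarCharVector strings = (VarCharVector) decoded;
          for (int i = 0; i < values.length; i++) {
            String s = new String(strings.get(i), StandardCharsets.UTF_8);
            if (!s.equals(values[i])) {
              throw new IllegalStateException("round trip mismatch at index " + i);
            }
          }
        }
        return result;
      }
    }
  }
}
```

For example, encoding `{"apple", "banana", "apple", "cherry", "banana"}` against the dictionary `{"apple", "banana", "cherry"}` should yield the indices `{0, 1, 0, 2, 1}`. This matches the point in the quoted text that a value's position in the dictionary vector becomes its encoded value.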



Review Comment:
   (You should probably have seen build warnings to this effect)
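The last paragraph of the quoted hunk explains where the encoded integers come from and how to choose between `TinyIntVector`, `SmallIntVector` and `IntVector`. That idea can be illustrated in plain Java without Arrow. This sketch is hypothetical. `DictionaryIndexSketch`, `buildDictionary`, `encode`, `decode` and `indexBitWidth` are illustrative names, not Arrow APIs. The width rule also assumes signed indices, as with `new ArrowType.Int(bits, true)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of positional dictionary codes; not an Arrow API.
public class DictionaryIndexSketch {

  // Collects each distinct value once, in first-seen order.
  public static List<String> buildDictionary(String[] values) {
    return new ArrayList<>(new LinkedHashSet<>(Arrays.asList(values)));
  }

  // Replaces each value with its index position in the dictionary.
  public static int[] encode(String[] values, List<String> dictionary) {
    Map<String, Integer> positions = new HashMap<>();
    for (int i = 0; i < dictionary.size(); i++) {
      positions.put(dictionary.get(i), i);
    }
    int[] codes = new int[values.length];
    for (int i = 0; i < values.length; i++) {
      Integer code = positions.get(values[i]);
      if (code == null) {
        throw new IllegalArgumentException("value not in dictionary: " + values[i]);
      }
      codes[i] = code;
    }
    return codes;
  }

  // Maps codes back to the original values (the inverse of encode).
  public static String[] decode(int[] codes, List<String> dictionary) {
    String[] out = new String[codes.length];
    for (int i = 0; i < codes.length; i++) {
      out[i] = dictionary.get(codes[i]);
    }
    return out;
  }

  // Smallest signed width whose maximum value can address every dictionary position.
  public static int indexBitWidth(int distinctCount) {
    if (distinctCount <= Byte.MAX_VALUE + 1) {
      return 8;   // would correspond to TinyIntVector
    }
    if (distinctCount <= Short.MAX_VALUE + 1) {
      return 16;  // would correspond to SmallIntVector
    }
    return 32;    // would correspond to IntVector
  }
}
```

In Arrow Java, the chosen width would then be passed as `new ArrowType.Int(bits, true)` when constructing the `DictionaryEncoding`.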



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
