lidavidm commented on a change in pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113#discussion_r778826343
##########
File path: java/source/create.rst
##########
@@ -0,0 +1,134 @@
+======================
+Creating arrow objects
+======================
+
+A vector is the basic unit in the java arrow columnar format.
+Vectors are provided by java arrow for the interface FieldVector that extends ValueVector.
Review comment:
Can we link to Javadocs? Is there an extension that can automate this in
the same way intersphinx does for Sphinx docs?
##########
File path: java/source/data.rst
##########
@@ -0,0 +1,316 @@
+=================
+Data manipulation
+=================
+
+Recipes related to compare, filtering or transforming data.
+
+.. contents::
+
+We are going to use this util for data manipulation:
+
+.. code-block:: java
+
+ import org.apache.arrow.algorithm.sort.VectorValueComparator;
+ import org.apache.arrow.memory.RootAllocator;
+ import org.apache.arrow.vector.IntVector;
+ import org.apache.arrow.vector.VarCharVector;
+
+ void setVector(IntVector vector, Integer... values) {
+ final int length = values.length;
+ vector.allocateNew(length);
+ for (int i = 0; i < length; i++) {
+ if (values[i] != null) {
+ vector.set(i, values[i]);
+ }
+ }
+ vector.setValueCount(length);
+ }
+
+ class TestVarCharSorter extends VectorValueComparator<VarCharVector> {
Review comment:
Sorter -> Comparator?
##########
File path: java/source/create.rst
##########
@@ -0,0 +1,134 @@
+======================
+Creating arrow objects
+======================
+
+A vector is the basic unit in the java arrow columnar format.
+Vectors are provided by java arrow for the interface FieldVector that extends ValueVector.
Review comment:
Maybe we should explain that vectors in Java are intended to be mutable,
since that will also be foreign to users of other Arrow libraries.
##########
File path: java/source/create.rst
##########
@@ -0,0 +1,134 @@
+======================
+Creating arrow objects
+======================
+
+A vector is the basic unit in the java arrow columnar format.
Review comment:
```suggestion
A vector is the basic unit in the Arrow Java library.
```
##########
File path: java/source/conf.py
##########
@@ -0,0 +1,55 @@
+# Configuration file for the Sphinx documentation builder.
+#
+# This file only contains a selection of the most common options. For a full
+# list see the documentation:
+# https://www.sphinx-doc.org/en/master/usage/configuration.html
+
+# -- Path setup --------------------------------------------------------------
+
+# If extensions (or modules to document with autodoc) are in another directory,
+# add these directories to sys.path here. If the directory is relative to the
+# documentation root, use os.path.abspath to make it absolute, like shown here.
+#
+# import os
+# import sys
+# sys.path.insert(0, os.path.abspath('.'))
+
+
+# -- Project information -----------------------------------------------------
+
+project = 'java-cookbook'
+copyright = '2021, apache arrow'
+author = 'apache arrow'
+
+# The full version, including alpha/beta/rc tags
+release = 'arrow cookbook'
Review comment:
It seems other cookbooks don't have 'release' and have different values
for copyright/author, can we be consistent? (Also, it's 2022 now.)
##########
File path: java/source/demo/.cp.txt
##########
@@ -0,0 +1 @@
+/Users/dsusanibar/.m2/repository/com/google/code/findbugs/jsr305/3.0.2/jsr305-3.0.2.jar:/Users/dsusanibar/.m2/repository/org/apache/arrow/arrow-memory-netty/6.0.0/arrow-memory-netty-6.0.0.jar:/Users/dsusanibar/.m2/repository/com/fasterxml/jackson/core/jackson-core/2.11.4/jackson-core-2.11.4.jar:/Users/dsusanibar/.m2/repository/org/apache/arrow/arrow-algorithm/6.0.0/arrow-algorithm-6.0.0.jar:/Users/dsusanibar/.m2/repository/com/fasterxml/jackson/core/jackson-annotations/2.11.4/jackson-annotations-2.11.4.jar:/Users/dsusanibar/.m2/repository/org/apache/arrow/arrow-vector/6.0.0/arrow-vector-6.0.0.jar:/Users/dsusanibar/.m2/repository/org/slf4j/slf4j-api/1.7.25/slf4j-api-1.7.25.jar:/Users/dsusanibar/.m2/repository/org/apache/arrow/arrow-memory-core/6.0.0/arrow-memory-core-6.0.0.jar:/Users/dsusanibar/.m2/repository/com/google/flatbuffers/flatbuffers-java/1.12.0/flatbuffers-java-1.12.0.jar:/Users/dsusanibar/.m2/repository/io/netty/netty-common/4.1.68.Final/netty-common-4.1.68.Final.jar:/Users/dsusanibar/.m2/repository/io/netty/netty-buffer/4.1.68.Final/netty-buffer-4.1.68.Final.jar:/Users/dsusanibar/.m2/repository/com/fasterxml/jackson/core/jackson-databind/2.11.4/jackson-databind-2.11.4.jar:/Users/dsusanibar/.m2/repository/commons-codec/commons-codec/1.10/commons-codec-1.10.jar:/Users/dsusanibar/.m2/repository/org/apache/arrow/arrow-format/6.0.0/arrow-format-6.0.0.jar
Review comment:
Was this meant to be committed?
##########
File path: java/source/create.rst
##########
@@ -0,0 +1,134 @@
+======================
+Creating arrow objects
+======================
+
+A vector is the basic unit in the java arrow columnar format.
+Vectors are provided by java arrow for the interface FieldVector that extends ValueVector.
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+ import org.apache.arrow.memory.RootAllocator;
+ import org.apache.arrow.vector.BitVectorHelper;
+ import org.apache.arrow.vector.IntVector;
+ import org.apache.arrow.vector.VarCharVector;
+ import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+ import org.apache.arrow.vector.complex.ListVector;
+ import org.apache.arrow.vector.types.Types;
+ import org.apache.arrow.vector.types.pojo.FieldType;
+
+ import java.util.List;
+
+
+ void setVector(IntVector vector, Integer... values) {
+ final int length = values.length;
+ vector.allocateNew(length);
+ for (int i = 0; i < length; i++) {
+ if (values[i] != null) {
+ vector.set(i, values[i]);
+ }
+ }
+ vector.setValueCount(length);
+ }
+
+ void setVector(VarCharVector vector, byte[]... values) {
+ final int length = values.length;
+ vector.allocateNewSafe();
+ for (int i = 0; i < length; i++) {
+ if (values[i] != null) {
+ vector.set(i, values[i]);
+ }
+ }
+ vector.setValueCount(length);
+ }
+
+ void setVector(ListVector vector, List<Integer>... values) {
+ vector.allocateNewSafe();
+ Types.MinorType type = Types.MinorType.INT;
+ vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+ IntVector dataVector = (IntVector) vector.getDataVector();
+ dataVector.allocateNew();
+
+ // set underlying vectors
+ int curPos = 0;
+ vector.getOffsetBuffer().setInt(0, curPos);
+ for (int i = 0; i < values.length; i++) {
+ if (values[i] == null) {
+ BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+ } else {
+ BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+ for (int value : values[i]) {
+ dataVector.setSafe(curPos, value);
+ curPos += 1;
+ }
+ }
+ vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+ }
+ dataVector.setValueCount(curPos);
+ vector.setLastSet(values.length - 1);
+ vector.setValueCount(values.length);
+ }
+
+ RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Array of int
+============
+
+.. code-block:: java
+ :emphasize-lines: 4
+
+ import org.apache.arrow.vector.IntVector;
+
+ // create int vector
+ IntVector intVector = new IntVector("intVector", rootAllocator);
Review comment:
It might be helpful to explain that this gets used as the field name?
(Coming off of Python/C++, this will be rather foreign.)
##########
File path: java/source/create.rst
##########
@@ -0,0 +1,134 @@
+======================
+Creating arrow objects
+======================
+
+A vector is the basic unit in the java arrow columnar format.
+Vectors are provided by java arrow for the interface FieldVector that extends ValueVector.
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+ import org.apache.arrow.memory.RootAllocator;
+ import org.apache.arrow.vector.BitVectorHelper;
+ import org.apache.arrow.vector.IntVector;
+ import org.apache.arrow.vector.VarCharVector;
+ import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+ import org.apache.arrow.vector.complex.ListVector;
+ import org.apache.arrow.vector.types.Types;
+ import org.apache.arrow.vector.types.pojo.FieldType;
+
+ import java.util.List;
+
+
+ void setVector(IntVector vector, Integer... values) {
+ final int length = values.length;
+ vector.allocateNew(length);
+ for (int i = 0; i < length; i++) {
+ if (values[i] != null) {
+ vector.set(i, values[i]);
+ }
+ }
+ vector.setValueCount(length);
+ }
+
+ void setVector(VarCharVector vector, byte[]... values) {
+ final int length = values.length;
+ vector.allocateNewSafe();
+ for (int i = 0; i < length; i++) {
+ if (values[i] != null) {
+ vector.set(i, values[i]);
+ }
+ }
+ vector.setValueCount(length);
+ }
+
+ void setVector(ListVector vector, List<Integer>... values) {
+ vector.allocateNewSafe();
+ Types.MinorType type = Types.MinorType.INT;
+ vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+ IntVector dataVector = (IntVector) vector.getDataVector();
+ dataVector.allocateNew();
+
+ // set underlying vectors
+ int curPos = 0;
+ vector.getOffsetBuffer().setInt(0, curPos);
Review comment:
Wow, I didn't realize ListVector required you to manipulate the offsets
directly.
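For context, here is a minimal plain-Java sketch of the layout that helper fills in by hand. It does not use the Arrow API: the class `ListLayoutSketch` and its fields are hypothetical names, and plain arrays stand in for the offset buffer, validity bitmap, and child data vector. The idea it illustrates is that list `i` occupies the child values from `offsets[i]` up to `offsets[i + 1]`, and a null list repeats the previous offset so it spans nothing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of a list column's layout; no Arrow classes are used.
class ListLayoutSketch {
    final int[] offsets;      // offsets[i]..offsets[i + 1] delimits list i in `values`
    final boolean[] validity; // false marks a null list
    final int[] values;       // all child values, concatenated

    ListLayoutSketch(List<List<Integer>> lists) {
        offsets = new int[lists.size() + 1];
        validity = new boolean[lists.size()];
        List<Integer> flat = new ArrayList<>();
        for (int i = 0; i < lists.size(); i++) {
            List<Integer> list = lists.get(i);
            validity[i] = list != null;
            if (list != null) {
                flat.addAll(list);
            }
            // A null list ends where it starts, so it covers no child values.
            offsets[i + 1] = flat.size();
        }
        values = flat.stream().mapToInt(Integer::intValue).toArray();
    }

    // Reads list i back out of the flat arrays, or returns null for a null slot.
    List<Integer> get(int i) {
        if (!validity[i]) {
            return null;
        }
        List<Integer> out = new ArrayList<>();
        for (int p = offsets[i]; p < offsets[i + 1]; p++) {
            out.add(values[p]);
        }
        return out;
    }

    public static void main(String[] args) {
        ListLayoutSketch col = new ListLayoutSketch(
            Arrays.asList(Arrays.asList(1, 2, 3), null, Arrays.asList(4, 5)));
        System.out.println(Arrays.toString(col.offsets));
        System.out.println(col.get(2));
    }
}
```

In the recipe itself the offsets live in a byte buffer, which is why the write index is multiplied by `BaseRepeatedValueVector.OFFSET_WIDTH`.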
##########
File path: java/source/create.rst
##########
@@ -0,0 +1,134 @@
+======================
+Creating arrow objects
+======================
+
+A vector is the basic unit in the java arrow columnar format.
+Vectors are provided by java arrow for the interface FieldVector that extends ValueVector.
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+ import org.apache.arrow.memory.RootAllocator;
+ import org.apache.arrow.vector.BitVectorHelper;
+ import org.apache.arrow.vector.IntVector;
+ import org.apache.arrow.vector.VarCharVector;
+ import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+ import org.apache.arrow.vector.complex.ListVector;
+ import org.apache.arrow.vector.types.Types;
+ import org.apache.arrow.vector.types.pojo.FieldType;
+
+ import java.util.List;
+
+
+ void setVector(IntVector vector, Integer... values) {
+ final int length = values.length;
+ vector.allocateNew(length);
+ for (int i = 0; i < length; i++) {
+ if (values[i] != null) {
+ vector.set(i, values[i]);
+ }
+ }
+ vector.setValueCount(length);
+ }
+
+ void setVector(VarCharVector vector, byte[]... values) {
+ final int length = values.length;
+ vector.allocateNewSafe();
+ for (int i = 0; i < length; i++) {
+ if (values[i] != null) {
+ vector.set(i, values[i]);
+ }
+ }
+ vector.setValueCount(length);
+ }
+
+ void setVector(ListVector vector, List<Integer>... values) {
+ vector.allocateNewSafe();
+ Types.MinorType type = Types.MinorType.INT;
+ vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+ IntVector dataVector = (IntVector) vector.getDataVector();
+ dataVector.allocateNew();
+
+ // set underlying vectors
+ int curPos = 0;
+ vector.getOffsetBuffer().setInt(0, curPos);
+ for (int i = 0; i < values.length; i++) {
+ if (values[i] == null) {
+ BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+ } else {
+ BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+ for (int value : values[i]) {
+ dataVector.setSafe(curPos, value);
+ curPos += 1;
+ }
+ }
+ vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+ }
+ dataVector.setValueCount(curPos);
+ vector.setLastSet(values.length - 1);
+ vector.setValueCount(values.length);
+ }
+
+ RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Array of int
+============
+
+.. code-block:: java
+ :emphasize-lines: 4
+
+ import org.apache.arrow.vector.IntVector;
+
+ // create int vector
Review comment:
Is the comment helpful here? The example itself is pretty minimal.
##########
File path: Makefile
##########
@@ -1,7 +1,7 @@
all: html
-html: py r
+html: py r j
Review comment:
nit: why not write out `java` instead of `j`? (though I realize Python
is abbreviated `py`)
##########
File path: java/source/create.rst
##########
@@ -0,0 +1,134 @@
+======================
+Creating arrow objects
Review comment:
```suggestion
Creating Arrow Objects
```
##########
File path: java/source/data.rst
##########
@@ -0,0 +1,316 @@
+=================
+Data manipulation
+=================
+
+Recipes related to compare, filtering or transforming data.
+
+.. contents::
+
+We are going to use this util for data manipulation:
+
+.. code-block:: java
+
+ import org.apache.arrow.algorithm.sort.VectorValueComparator;
+ import org.apache.arrow.memory.RootAllocator;
+ import org.apache.arrow.vector.IntVector;
+ import org.apache.arrow.vector.VarCharVector;
+
+ void setVector(IntVector vector, Integer... values) {
+ final int length = values.length;
+ vector.allocateNew(length);
+ for (int i = 0; i < length; i++) {
+ if (values[i] != null) {
+ vector.set(i, values[i]);
+ }
+ }
+ vector.setValueCount(length);
+ }
+
+ class TestVarCharSorter extends VectorValueComparator<VarCharVector> {
+ @Override
+ public int compareNotNull(int index1, int index2) {
+ byte b1 = vector1.get(index1)[0];
+ byte b2 = vector2.get(index2)[0];
+ return b1 - b2;
+ }
+
+ @Override
+ public VectorValueComparator<VarCharVector> createNew() {
+ return new TestVarCharSorter();
+ }
+ }
+ RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Compare fields on the array
+===========================
+
+.. code-block:: java
+ :emphasize-lines: 10
+
+ import org.apache.arrow.vector.IntVector;
+ import org.apache.arrow.vector.compare.TypeEqualsVisitor;
+
+ IntVector right = new IntVector("int", rootAllocator);
+ IntVector left1 = new IntVector("int", rootAllocator);
+ IntVector left2 = new IntVector("int2", rootAllocator);
+
+ setVector(right, 10,20,30);
+
+ TypeEqualsVisitor visitor = new TypeEqualsVisitor(right); // equal or unequal
Review comment:
I'm not sure the comment helps here.
##########
File path: java/source/data.rst
##########
@@ -0,0 +1,316 @@
+=================
+Data manipulation
+=================
+
+Recipes related to compare, filtering or transforming data.
+
+.. contents::
+
+We are going to use this util for data manipulation:
+
+.. code-block:: java
+
+ import org.apache.arrow.algorithm.sort.VectorValueComparator;
+ import org.apache.arrow.memory.RootAllocator;
+ import org.apache.arrow.vector.IntVector;
+ import org.apache.arrow.vector.VarCharVector;
+
+ void setVector(IntVector vector, Integer... values) {
+ final int length = values.length;
+ vector.allocateNew(length);
+ for (int i = 0; i < length; i++) {
+ if (values[i] != null) {
+ vector.set(i, values[i]);
+ }
+ }
+ vector.setValueCount(length);
+ }
+
+ class TestVarCharSorter extends VectorValueComparator<VarCharVector> {
+ @Override
+ public int compareNotNull(int index1, int index2) {
+ byte b1 = vector1.get(index1)[0];
+ byte b2 = vector2.get(index2)[0];
+ return b1 - b2;
+ }
+
+ @Override
+ public VectorValueComparator<VarCharVector> createNew() {
+ return new TestVarCharSorter();
+ }
+ }
+ RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Compare fields on the array
Review comment:
Should we try to be consistent about calling it a "vector" instead of an
"array"?
##########
File path: java/source/create.rst
##########
@@ -0,0 +1,134 @@
+======================
+Creating arrow objects
Review comment:
If this is meant to be analogous to the pages for C++/Python, can we add
a subheading for Vectors, and then possibly a subheading for VectorSchemaRoot?
##########
File path: java/source/data.rst
##########
@@ -0,0 +1,316 @@
+=================
+Data manipulation
+=================
+
+Recipes related to compare, filtering or transforming data.
+
+.. contents::
+
+We are going to use this util for data manipulation:
+
+.. code-block:: java
+
+ import org.apache.arrow.algorithm.sort.VectorValueComparator;
+ import org.apache.arrow.memory.RootAllocator;
+ import org.apache.arrow.vector.IntVector;
+ import org.apache.arrow.vector.VarCharVector;
+
+ void setVector(IntVector vector, Integer... values) {
+ final int length = values.length;
+ vector.allocateNew(length);
+ for (int i = 0; i < length; i++) {
+ if (values[i] != null) {
+ vector.set(i, values[i]);
+ }
+ }
+ vector.setValueCount(length);
+ }
+
+ class TestVarCharSorter extends VectorValueComparator<VarCharVector> {
+ @Override
+ public int compareNotNull(int index1, int index2) {
+ byte b1 = vector1.get(index1)[0];
+ byte b2 = vector2.get(index2)[0];
+ return b1 - b2;
+ }
+
+ @Override
+ public VectorValueComparator<VarCharVector> createNew() {
+ return new TestVarCharSorter();
+ }
+ }
+ RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Compare fields on the array
+===========================
+
+.. code-block:: java
+ :emphasize-lines: 10
+
+ import org.apache.arrow.vector.IntVector;
+ import org.apache.arrow.vector.compare.TypeEqualsVisitor;
+
+ IntVector right = new IntVector("int", rootAllocator);
+ IntVector left1 = new IntVector("int", rootAllocator);
+ IntVector left2 = new IntVector("int2", rootAllocator);
+
+ setVector(right, 10,20,30);
+
+ TypeEqualsVisitor visitor = new TypeEqualsVisitor(right); // equal or unequal
+
+Comparing vector fields:
+
+.. code-block:: java
+ :emphasize-lines: 1-4
+
+ jshell> visitor.equals(left1); visitor.equals(left2);
Review comment:
nit: for clarity, why not separate lines?
##########
File path: java/source/create.rst
##########
@@ -0,0 +1,134 @@
+======================
+Creating arrow objects
+======================
+
+A vector is the basic unit in the java arrow columnar format.
+Vectors are provided by java arrow for the interface FieldVector that extends ValueVector.
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+ import org.apache.arrow.memory.RootAllocator;
+ import org.apache.arrow.vector.BitVectorHelper;
+ import org.apache.arrow.vector.IntVector;
+ import org.apache.arrow.vector.VarCharVector;
+ import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+ import org.apache.arrow.vector.complex.ListVector;
+ import org.apache.arrow.vector.types.Types;
+ import org.apache.arrow.vector.types.pojo.FieldType;
+
+ import java.util.List;
+
+
+ void setVector(IntVector vector, Integer... values) {
+ final int length = values.length;
+ vector.allocateNew(length);
+ for (int i = 0; i < length; i++) {
+ if (values[i] != null) {
+ vector.set(i, values[i]);
+ }
+ }
+ vector.setValueCount(length);
+ }
+
+ void setVector(VarCharVector vector, byte[]... values) {
+ final int length = values.length;
+ vector.allocateNewSafe();
+ for (int i = 0; i < length; i++) {
+ if (values[i] != null) {
+ vector.set(i, values[i]);
+ }
+ }
+ vector.setValueCount(length);
+ }
+
+ void setVector(ListVector vector, List<Integer>... values) {
+ vector.allocateNewSafe();
+ Types.MinorType type = Types.MinorType.INT;
+ vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+ IntVector dataVector = (IntVector) vector.getDataVector();
+ dataVector.allocateNew();
+
+ // set underlying vectors
+ int curPos = 0;
+ vector.getOffsetBuffer().setInt(0, curPos);
+ for (int i = 0; i < values.length; i++) {
+ if (values[i] == null) {
+ BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+ } else {
+ BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+ for (int value : values[i]) {
+ dataVector.setSafe(curPos, value);
+ curPos += 1;
+ }
+ }
+ vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+ }
+ dataVector.setValueCount(curPos);
+ vector.setLastSet(values.length - 1);
+ vector.setValueCount(values.length);
+ }
+
+ RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Array of int
Review comment:
maybe ```Array of ``Int`` (32-bit integer)``` to assume less familiarity
with the Java API?
##########
File path: Makefile
##########
@@ -1,7 +1,7 @@
all: html
-html: py r
+html: py r j
Review comment:
The same goes below for paths.
##########
File path: java/source/create.rst
##########
@@ -0,0 +1,134 @@
+======================
+Creating arrow objects
+======================
+
+A vector is the basic unit in the java arrow columnar format.
+Vectors are provided by java arrow for the interface FieldVector that extends ValueVector.
Review comment:
What is the distinction between FieldVector and ValueVector?
##########
File path: java/source/create.rst
##########
@@ -0,0 +1,134 @@
+======================
+Creating arrow objects
+======================
+
+A vector is the basic unit in the java arrow columnar format.
Review comment:
I'm not going to go through and mark all of these, but let's please make
sure to capitalize names properly.