lidavidm commented on a change in pull request #113:
URL: https://github.com/apache/arrow-cookbook/pull/113#discussion_r778846417



##########
File path: java/source/io.rst
##########
@@ -0,0 +1,354 @@
+========================
+Reading and writing data
+========================
+
+Recipes related to reading and writing data from disk using
+Apache Arrow.
+
+.. contents::
+
+Writing array
+=============
+
+It is possible to dump data in the raw arrow format which allows 
+direct memory mapping of data from disk. This format is called
+the Arrow IPC format. There are two option: Random access format
+& Streaming format.

Review comment:
       Can we explain or link to docs about the difference between the two?
   
   Also, I think we usually call the "random access format" the "file" format 
(vs. the "stream" format), especially since it's called the file format below.
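
   One concrete difference the text could mention: a file in the IPC file 
   format begins and ends with the ASCII magic string `ARROW1` and carries a 
   footer that allows random access to record batches, while the stream format 
   is a plain sequence of messages. A minimal sketch of checking for that magic 
   prefix with only the JDK is below. The class name, method name, and sample 
   bytes are hypothetical and are not part of the Arrow Java API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Arrow: checks the leading bytes of a buffer.
final class ArrowMagicSketch {
    // The IPC file format starts (and ends) with this ASCII magic string.
    private static final byte[] MAGIC = "ARROW1".getBytes(StandardCharsets.US_ASCII);

    /** Returns true when the buffer starts with the file-format magic. */
    static boolean looksLikeIpcFile(byte[] head) {
        return head.length >= MAGIC.length
            && Arrays.equals(Arrays.copyOfRange(head, 0, MAGIC.length), MAGIC);
    }

    public static void main(String[] args) {
        // Made-up sample bytes for illustration only.
        byte[] fileLike = {'A', 'R', 'R', 'O', 'W', '1', 0, 0};
        byte[] notFileLike = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF};
        System.out.println(looksLikeIpcFile(fileLike));    // true
        System.out.println(looksLikeIpcFile(notFileLike)); // false
    }
}
```

   This only inspects the prefix; it does not validate the footer or the 
   messages in between.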

##########
File path: java/source/data.rst
##########
@@ -0,0 +1,316 @@
+=================
+Data manipulation
+=================
+
+Recipes related to compare, filtering or transforming data.
+
+.. contents::
+
+We are going to use this util for data manipulation:
+
+.. code-block:: java
+
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+
+   void setVector(IntVector vector, Integer... values) {
+      final int length = values.length;
+      vector.allocateNew(length);
+      for (int i = 0; i < length; i++) {
+          if (values[i] != null) {
+              vector.set(i, values[i]);
+          }
+      }
+      vector.setValueCount(length);
+   }
+
+  class TestVarCharSorter extends VectorValueComparator<VarCharVector> {
+    @Override
+    public int compareNotNull(int index1, int index2) {
+        byte b1 = vector1.get(index1)[0];
+        byte b2 = vector2.get(index2)[0];
+        return b1 - b2;
+    }
+
+    @Override
+    public VectorValueComparator<VarCharVector> createNew() {
+        return new TestVarCharSorter();
+    }
+  }
+  RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal 
with byte buffer allocation
+
+Compare fields on the array
+===========================
+
+.. code-block:: java
+   :emphasize-lines: 10
+
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.compare.TypeEqualsVisitor;
+
+   IntVector right = new IntVector("int", rootAllocator);
+   IntVector left1 = new IntVector("int", rootAllocator);
+   IntVector left2 = new IntVector("int2", rootAllocator);
+
+   setVector(right, 10,20,30);
+
+   TypeEqualsVisitor visitor = new TypeEqualsVisitor(right); // equal or 
unequal
+
+Comparing vector fields:
+
+.. code-block:: java
+   :emphasize-lines: 1-4
+
+   jshell> visitor.equals(left1); visitor.equals(left2);
+
+   true
+   false
+
+Compare values on the array
+===========================
+
+.. code-block:: java
+   :emphasize-lines: 15-17
+
+   import org.apache.arrow.algorithm.sort.StableVectorComparator;
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.vector.VarCharVector;
+
+   // compare two values at the given indices in the vectors.
+   // comparing org.apache.arrow.algorithm.sort.VectorValueComparator on 
algorithm
+   VarCharVector vec = new VarCharVector("valueindexcomparator", 
rootAllocator);
+   vec.allocateNew(100, 5);
+   vec.setValueCount(10);
+   vec.set(0, "ba".getBytes());
+   vec.set(1, "abc".getBytes());
+   vec.set(2, "aa".getBytes());
+   vec.set(3, "abc".getBytes());
+   vec.set(4, "a".getBytes());
+   VectorValueComparator<VarCharVector> comparatorValues = new 
TestVarCharSorter(); // less than, equal to, greater than
+   VectorValueComparator<VarCharVector> stableComparator = new 
StableVectorComparator<>(comparatorValues);//Stable comparator only supports 
comparing values from the same vector
+   stableComparator.attachVector(vec);
+
+Comparing two values at the given indices in the vectors:
+
+.. code-block:: java
+   :emphasize-lines: 1-8
+
+   jshell> stableComparator.compare(0, 1) > 0; stableComparator.compare(1, 2) 
< 0; stableComparator.compare(2, 3) < 0; stableComparator.compare(1, 3) < 0; 
stableComparator.compare(3, 1) > 0; stableComparator.compare(3, 3) == 0;

Review comment:
       Here as well, we should definitely use separate statements.
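
   To illustrate what separate statements could look like, here is a sketch 
   that uses only the JDK. `StableCompareSketch` is a hypothetical stand-in, 
   not the Arrow class: it compares first bytes like `TestVarCharSorter` and 
   breaks ties by index, which is my assumption about how 
   `StableVectorComparator` behaves.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the comparator in the recipe; no Arrow dependency.
final class StableCompareSketch {
    private final byte[][] values;

    StableCompareSketch(byte[][] values) {
        this.values = values;
    }

    /** Compares the first byte of each value; ties are broken by index. */
    int compare(int index1, int index2) {
        int result = values[index1][0] - values[index2][0];
        return result != 0 ? result : index1 - index2;
    }

    public static void main(String[] args) {
        // Same five values as the recipe: "ba", "abc", "aa", "abc", "a".
        String[] words = {"ba", "abc", "aa", "abc", "a"};
        byte[][] data = new byte[words.length][];
        for (int i = 0; i < words.length; i++) {
            data[i] = words[i].getBytes(StandardCharsets.UTF_8);
        }
        StableCompareSketch comparator = new StableCompareSketch(data);

        // One statement per comparison, so each result is easy to match up.
        System.out.println(comparator.compare(0, 1) > 0);  // true
        System.out.println(comparator.compare(1, 2) < 0);  // true
        System.out.println(comparator.compare(2, 3) < 0);  // true
        System.out.println(comparator.compare(1, 3) < 0);  // true
        System.out.println(comparator.compare(3, 1) > 0);  // true
        System.out.println(comparator.compare(3, 3) == 0); // true
    }
}
```

   In the recipe itself, each `stableComparator.compare(...)` call would go on 
   its own line in the same way.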

##########
File path: java/source/io.rst
##########
@@ -0,0 +1,354 @@
+========================
+Reading and writing data
+========================
+
+Recipes related to reading and writing data from disk using
+Apache Arrow.
+
+.. contents::
+
+Writing array
+=============
+
+It is possible to dump data in the raw arrow format which allows 
+direct memory mapping of data from disk. This format is called
+the Arrow IPC format. There are two option: Random access format

Review comment:
       > Arrow vectors can be serialized to disk as the Arrow IPC format. Such 
files can be directly memory-mapped when read.
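
   If it helps readers who have not met memory mapping before, the idea can be 
   shown with plain `java.nio`. This is only a sketch of mapping a file 
   read-only; it does not use Arrow's reader classes, and `MmapSketch` and 
   `mapReadOnly` are hypothetical names.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Arrow: maps a file into memory read-only.
final class MmapSketch {
    /** Maps the whole file read-only and returns the mapped buffer. */
    static MappedByteBuffer mapReadOnly(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            // The mapping remains valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    public static void main(String[] args) throws IOException {
        // Write a small throwaway file so the sketch runs on its own.
        Path tmp = Files.createTempFile("mmap-sketch", ".bin");
        Files.write(tmp, new byte[] {1, 2, 3, 4});

        MappedByteBuffer buffer = mapReadOnly(tmp);
        System.out.println(buffer.remaining()); // 4
        System.out.println(buffer.get(2));      // 3
    }
}
```

   The bytes are read through the mapping instead of being copied into a heap 
   array first, which is the property the suggested wording refers to.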

##########
File path: java/source/io.rst
##########
@@ -0,0 +1,354 @@
+========================
+Reading and writing data
+========================
+
+Recipes related to reading and writing data from disk using
+Apache Arrow.
+
+.. contents::
+
+Writing array
+=============
+
+It is possible to dump data in the raw arrow format which allows 
+direct memory mapping of data from disk. This format is called
+the Arrow IPC format. There are two option: Random access format
+& Streaming format.
+
+We are going to use this util for reading and writing data:
+
+.. code-block:: java
+   :name: Util
+   :emphasize-lines: 114
+
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.VectorSchemaRoot;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+   import org.apache.arrow.vector.types.pojo.Schema;
+
+   import java.util.ArrayList;
+   import java.util.HashMap;
+   import java.util.List;
+   import java.util.Map;
+
+   import static java.util.Arrays.asList;
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * 
BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   VectorSchemaRoot createVectorSchemaRoot(){
+       // create a column data type
+       Field name = new Field("name", FieldType.nullable(new 
ArrowType.Utf8()), null);
+
+       Map<String, String> metadata = new HashMap<>();
+       metadata.put("A", "Id card");
+       metadata.put("B", "Passport");
+       metadata.put("C", "Visa");
+       Field document = new Field("document", new FieldType(true, new 
ArrowType.Utf8(), null, metadata), null);
+
+       Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, 
true)), null);
+
+       FieldType intType = new FieldType(true, new ArrowType.Int(32, true), 
/*dictionary=*/null);
+       FieldType listType = new FieldType(true, new ArrowType.List(), 
/*dictionary=*/null);
+       Field childField = new Field("intCol", intType, null);
+       List<Field> childFields = new ArrayList<>();
+       childFields.add(childField);
+       Field points = new Field("points", listType, childFields);
+
+       // create a definition
+       Schema schemaPerson = new Schema(asList(name, document, age, points));
+
+       RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // 
deal with byte buffer allocation
+       VectorSchemaRoot vectorSchemaRoot = 
VectorSchemaRoot.create(schemaPerson, rootAllocator);
+
+       // getting field vectors
+       VarCharVector nameVectorOption1 = (VarCharVector) 
vectorSchemaRoot.getVector("name"); //interface FieldVector
+       VarCharVector documentVectorOption1 = (VarCharVector) 
vectorSchemaRoot.getVector("document"); //interface FieldVector
+       IntVector ageVectorOption1 = (IntVector) 
vectorSchemaRoot.getVector("age");
+       ListVector pointsVectorOption1 = (ListVector) 
vectorSchemaRoot.getVector("points");
+
+       // add values to the field vectors
+       setVector(nameVectorOption1, "david".getBytes(), "gladis".getBytes(), 
"juan".getBytes());
+       setVector(documentVectorOption1, "A".getBytes(), "B".getBytes(), 
"C".getBytes());
+       setVector(ageVectorOption1, 10,20,30);
+       setVector(pointsVectorOption1, asList(1,3,5,7,9), asList(2,4,6,8,10), 
asList(1,2,3,5,8));
+       vectorSchemaRoot.setRowCount(3);
+
+       return vectorSchemaRoot;
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal 
with byte buffer allocation
+
+   VectorSchemaRoot vectorSchemaRoot = createVectorSchemaRoot();
+
+
+.. code-block:: java
+   :emphasize-lines: 1-6
+
+   jshell> System.out.println(vectorSchemaRoot.contentToTSVString())
+
+   name     document age   points
+   david    A        10    [1,3,5,7,9]
+   gladis   B        20    [2,4,6,8,10]
+   juan     C        30    [1,2,3,5,8]
+
+Writing arrays with the IPC file format
+***************************************
+
+Write - Random access to file

Review comment:
       This is a little confusing: it makes it sound like we can write 
batches in random order (we cannot). Also, we've already stated in the section 
title that this is for the IPC file format. Maybe this can just be "Write to 
File" (and then "Write to In-Memory Buffer" below)?

##########
File path: java/source/schema.rst
##########
@@ -0,0 +1,330 @@
+===================
+Working with schema
+===================
+
+Common definition of table has an schema. Java arrow is columnar oriented and 
it also has an schema representation. 
+Consider that each name on the schema maps to a columns for a predefined data 
type
+
+
+.. contents::
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * 
BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal 
with byte buffer allocation
+
+Define data type
+================
+
+Definition of columnar fields for string (name), integer (age) and array 
(points):
+
+.. code-block:: java
+   :emphasize-lines: 6,8,12,15
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type
+   Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), 
null);
+
+   Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, 
true)), null);
+
+   FieldType intType = new FieldType(true, new ArrowType.Int(32, true), 
/*dictionary=*/null);
+   FieldType listType = new FieldType(true, new ArrowType.List(), 
/*dictionary=*/null);
+   Field childField = new Field("intCol", intType, null);
+   List<Field> childFields = new ArrayList<>();
+   childFields.add(childField);
+   Field points = new Field("points", listType, childFields);
+
+.. code-block:: java
+   :emphasize-lines: 1-5
+
+   jshell> name; age; points;
+
+   name ==> name: Utf8
+   age ==> age: Int(32, true)
+   points ==> points: List<intCol: Int(32, true)>
+
+Define metadata
+===============
+
+In case we need to add metadata to our definition we could use:

Review comment:
       This adds metadata to a field, right?
   
   Also, how do we add metadata to a schema?

##########
File path: java/source/schema.rst
##########
@@ -0,0 +1,330 @@
+===================
+Working with schema
+===================
+
+Common definition of table has an schema. Java arrow is columnar oriented and 
it also has an schema representation. 
+Consider that each name on the schema maps to a columns for a predefined data 
type
+
+
+.. contents::
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * 
BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal 
with byte buffer allocation
+
+Define data type
+================
+
+Definition of columnar fields for string (name), integer (age) and array 
(points):
+
+.. code-block:: java
+   :emphasize-lines: 6,8,12,15
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type
+   Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), 
null);
+
+   Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, 
true)), null);
+
+   FieldType intType = new FieldType(true, new ArrowType.Int(32, true), 
/*dictionary=*/null);
+   FieldType listType = new FieldType(true, new ArrowType.List(), 
/*dictionary=*/null);
+   Field childField = new Field("intCol", intType, null);
+   List<Field> childFields = new ArrayList<>();
+   childFields.add(childField);
+   Field points = new Field("points", listType, childFields);
+
+.. code-block:: java
+   :emphasize-lines: 1-5
+
+   jshell> name; age; points;
+
+   name ==> name: Utf8
+   age ==> age: Int(32, true)
+   points ==> points: List<intCol: Int(32, true)>
+
+Define metadata
+===============
+
+In case we need to add metadata to our definition we could use:
+
+.. code-block:: java
+   :emphasize-lines: 10
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type + metadata
+   Map<String, String> metadata = new HashMap<>();
+   metadata.put("A", "Id card");
+   metadata.put("B", "Passport");
+   metadata.put("C", "Visa");
+   Field document = new Field("document", new FieldType(true, new 
ArrowType.Utf8(), null, metadata), null);
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> document
+
+   document ==> document: Utf8
+
+Create the schema
+=================
+
+Tables detain multiple columns, each with its own name
+and type. The union of types and names is what defines a schema.
+
+.. code-block:: java
+   :emphasize-lines: 5
+
+   import org.apache.arrow.vector.types.pojo.Schema;
+   import static java.util.Arrays.asList;
+
+   // create a definition
+   Schema schemaPerson = new Schema(asList(name, document, age, points));
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> schemaPerson
+
+   schemaPerson ==> Schema<name: Utf8, document: Utf8, age: Int(32, true), 
points: List<intCol: Int(32, true)>>
+
+Populate data

Review comment:
       Did we ever explain what a VectorSchemaRoot is?

##########
File path: java/source/schema.rst
##########
@@ -0,0 +1,330 @@
+===================
+Working with schema
+===================
+
+Common definition of table has an schema. Java arrow is columnar oriented and 
it also has an schema representation. 
+Consider that each name on the schema maps to a columns for a predefined data 
type
+
+
+.. contents::
+
+We are going to use this util for creating arrow objects:

Review comment:
       These utility helpers take many lines of code, don't save all that many 
lines (especially since we have only a few values anyway), and force people to 
scroll back and forth.

##########
File path: java/source/schema.rst
##########
@@ -0,0 +1,330 @@
+===================
+Working with schema
+===================
+
+Common definition of table has an schema. Java arrow is columnar oriented and 
it also has an schema representation. 
+Consider that each name on the schema maps to a columns for a predefined data 
type
+
+
+.. contents::
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * 
BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal 
with byte buffer allocation
+
+Define data type
+================
+
+Definition of columnar fields for string (name), integer (age) and array 
(points):
+
+.. code-block:: java
+   :emphasize-lines: 6,8,12,15
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type

Review comment:
       It might help here, for instance, to note below that we're creating a 
nested type, or that the `Int` definition accepts a bit width and a flag 
indicating signed/unsigned.

##########
File path: java/source/schema.rst
##########
@@ -0,0 +1,330 @@
+===================
+Working with schema
+===================
+
+Common definition of table has an schema. Java arrow is columnar oriented and 
it also has an schema representation. 
+Consider that each name on the schema maps to a columns for a predefined data 
type
+
+
+.. contents::
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * 
BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal 
with byte buffer allocation
+
+Define data type
+================
+
+Definition of columnar fields for string (name), integer (age) and array 
(points):
+
+.. code-block:: java
+   :emphasize-lines: 6,8,12,15
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type
+   Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), 
null);
+
+   Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, 
true)), null);
+
+   FieldType intType = new FieldType(true, new ArrowType.Int(32, true), 
/*dictionary=*/null);
+   FieldType listType = new FieldType(true, new ArrowType.List(), 
/*dictionary=*/null);
+   Field childField = new Field("intCol", intType, null);
+   List<Field> childFields = new ArrayList<>();
+   childFields.add(childField);
+   Field points = new Field("points", listType, childFields);
+
+.. code-block:: java
+   :emphasize-lines: 1-5
+
+   jshell> name; age; points;
+
+   name ==> name: Utf8
+   age ==> age: Int(32, true)
+   points ==> points: List<intCol: Int(32, true)>
+
+Define metadata
+===============
+
+In case we need to add metadata to our definition we could use:
+
+.. code-block:: java
+   :emphasize-lines: 10
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type + metadata
+   Map<String, String> metadata = new HashMap<>();
+   metadata.put("A", "Id card");
+   metadata.put("B", "Passport");
+   metadata.put("C", "Visa");
+   Field document = new Field("document", new FieldType(true, new 
ArrowType.Utf8(), null, metadata), null);
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> document
+
+   document ==> document: Utf8
+
+Create the schema
+=================
+
+Tables detain multiple columns, each with its own name
+and type. The union of types and names is what defines a schema.
+
+.. code-block:: java
+   :emphasize-lines: 5
+
+   import org.apache.arrow.vector.types.pojo.Schema;
+   import static java.util.Arrays.asList;
+
+   // create a definition
+   Schema schemaPerson = new Schema(asList(name, document, age, points));
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> schemaPerson
+
+   schemaPerson ==> Schema<name: Utf8, document: Utf8, age: Int(32, true), 
points: List<intCol: Int(32, true)>>
+
+Populate data
+=============
+
+.. code-block:: java
+   :emphasize-lines: 3,12-15
+
+   import org.apache.arrow.vector.*;
+
+   VectorSchemaRoot vectorSchemaRoot = VectorSchemaRoot.create(schemaPerson, 
rootAllocator);
+
+   // getting field vectors
+   VarCharVector nameVectorOption1 = (VarCharVector) 
vectorSchemaRoot.getVector("name"); //interface FieldVector
+   VarCharVector documentVectorOption1 = (VarCharVector) 
vectorSchemaRoot.getVector("document"); //interface FieldVector
+   IntVector ageVectorOption1 = (IntVector) vectorSchemaRoot.getVector("age");
+   ListVector pointsVectorOption1 = (ListVector) 
vectorSchemaRoot.getVector("points");
+
+   // add values to the field vectors
+   setVector(nameVectorOption1, "david".getBytes(), "gladis".getBytes(), 
"juan".getBytes());
+   setVector(documentVectorOption1, "A".getBytes(), "B".getBytes(), 
"C".getBytes());
+   setVector(ageVectorOption1, 10,20,30);
+   setVector(pointsVectorOption1, asList(1,3,5,7,9), asList(2,4,6,8,10), 
asList(1,2,3,5,8));
+
+   vectorSchemaRoot.setRowCount(3);
+
+Render data & metadata:
+
+.. code-block:: java
+
+   jshell> System.out.println(vectorSchemaRoot.contentToTSVString());
+
+   name    document    age  points
+   david   A            10  [1,3,5,7,9]
+   gladis  B            20  [2,4,6,8,10]
+   juan    C            30  [1,2,3,5,8]
+
+   jshell> System.out.println(documentVectorOption1.getField().getMetadata());
+
+   {A=Id card, B=Passport, C=Visa}
+
+Create the schema from json
+===========================
+
+For this json definition:

Review comment:
       Also, I'm not sure about promoting a separate serialization format for 
schemas. What about instead demonstrating how to serialize/deserialize a 
schema? (Though note that `Schema.serialize` is _not_ compatible with Python/C++.)

##########
File path: java/source/io.rst
##########
@@ -0,0 +1,354 @@
+========================
+Reading and writing data
+========================
+
+Recipes related to reading and writing data from disk using
+Apache Arrow.
+
+.. contents::
+
+Writing array
+=============
+
+It is possible to dump data in the raw arrow format which allows 
+direct memory mapping of data from disk. This format is called
+the Arrow IPC format. There are two option: Random access format
+& Streaming format.
+
+We are going to use this util for reading and writing data:
+
+.. code-block:: java
+   :name: Util
+   :emphasize-lines: 114
+
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.VectorSchemaRoot;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+   import org.apache.arrow.vector.types.pojo.Schema;
+
+   import java.util.ArrayList;
+   import java.util.HashMap;
+   import java.util.List;
+   import java.util.Map;
+
+   import static java.util.Arrays.asList;
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   VectorSchemaRoot createVectorSchemaRoot(){
+       // create a column data type
+       Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+       Map<String, String> metadata = new HashMap<>();
+       metadata.put("A", "Id card");
+       metadata.put("B", "Passport");
+       metadata.put("C", "Visa");
+       Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+       Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+       FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+       FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+       Field childField = new Field("intCol", intType, null);
+       List<Field> childFields = new ArrayList<>();
+       childFields.add(childField);
+       Field points = new Field("points", listType, childFields);
+
+       // create a definition
+       Schema schemaPerson = new Schema(asList(name, document, age, points));
+
+       RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+       VectorSchemaRoot vectorSchemaRoot = VectorSchemaRoot.create(schemaPerson, rootAllocator);
+
+       // getting field vectors
+       VarCharVector nameVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("name"); //interface FieldVector
+       VarCharVector documentVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("document"); //interface FieldVector
+       IntVector ageVectorOption1 = (IntVector) vectorSchemaRoot.getVector("age");
+       ListVector pointsVectorOption1 = (ListVector) vectorSchemaRoot.getVector("points");
+
+       // add values to the field vectors
+       setVector(nameVectorOption1, "david".getBytes(), "gladis".getBytes(), "juan".getBytes());
+       setVector(documentVectorOption1, "A".getBytes(), "B".getBytes(), "C".getBytes());
+       setVector(ageVectorOption1, 10,20,30);
+       setVector(pointsVectorOption1, asList(1,3,5,7,9), asList(2,4,6,8,10), asList(1,2,3,5,8));
+       vectorSchemaRoot.setRowCount(3);
+
+       return vectorSchemaRoot;
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+   VectorSchemaRoot vectorSchemaRoot = createVectorSchemaRoot();
+
+
+.. code-block:: java
+   :emphasize-lines: 1-6
+
+   jshell> System.out.println(vectorSchemaRoot.contentToTSVString())
+
+   name     document age   points
+   david    A        10    [1,3,5,7,9]
+   gladis   B        20    [2,4,6,8,10]
+   juan     C        30    [1,2,3,5,8]
+
+Writing arrays with the IPC file format
+***************************************
+
+Write - Random access to file
+-----------------------------
+
+.. code-block:: java
+   :emphasize-lines: 9
+
+   import org.apache.arrow.vector.ipc.*;

Review comment:
       Hmm. Can we avoid wildcard imports? We've avoided them so far, and 
they obscure what comes from where.
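
A JDK-only sketch of the style I mean, loosely mirroring the stream and channel setup in the buffer recipe. The Arrow writer is left out so the snippet runs on a plain JDK; `ExplicitImportsSketch` and `writeToBuffer` are hypothetical names, not part of the PR.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

// Hypothetical illustration: each type used below maps to exactly one import line.
final class ExplicitImportsSketch {

    // Writes the payload through a channel into an in-memory buffer and returns the bytes.
    static byte[] writeToBuffer(byte[] payload) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (WritableByteChannel channel = Channels.newChannel(out)) {
            ByteBuffer buffer = ByteBuffer.wrap(payload);
            // A single write call is not guaranteed to drain the buffer, so loop.
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(writeToBuffer(new byte[] {1, 2, 3}).length); // prints 3
    }
}
```

Every type in the body can be traced to a single import line, which is the property the wildcard form loses.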

##########
File path: java/source/data.rst
##########
@@ -0,0 +1,316 @@
+=================
+Data manipulation
+=================
+
+Recipes related to compare, filtering or transforming data.
+
+.. contents::
+
+We are going to use this util for data manipulation:
+
+.. code-block:: java
+
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+
+   void setVector(IntVector vector, Integer... values) {
+      final int length = values.length;
+      vector.allocateNew(length);
+      for (int i = 0; i < length; i++) {
+          if (values[i] != null) {
+              vector.set(i, values[i]);
+          }
+      }
+      vector.setValueCount(length);
+   }
+
+  class TestVarCharSorter extends VectorValueComparator<VarCharVector> {
+    @Override
+    public int compareNotNull(int index1, int index2) {
+        byte b1 = vector1.get(index1)[0];
+        byte b2 = vector2.get(index2)[0];
+        return b1 - b2;
+    }
+
+    @Override
+    public VectorValueComparator<VarCharVector> createNew() {
+        return new TestVarCharSorter();
+    }
+  }
+  RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Compare fields on the array
+===========================
+
+.. code-block:: java
+   :emphasize-lines: 10
+
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.compare.TypeEqualsVisitor;
+
+   IntVector right = new IntVector("int", rootAllocator);
+   IntVector left1 = new IntVector("int", rootAllocator);
+   IntVector left2 = new IntVector("int2", rootAllocator);
+
+   setVector(right, 10,20,30);
+
+   TypeEqualsVisitor visitor = new TypeEqualsVisitor(right); // equal or unequal
+
+Comparing vector fields:
+
+.. code-block:: java
+   :emphasize-lines: 1-4
+
+   jshell> visitor.equals(left1); visitor.equals(left2);
+
+   true
+   false
+
+Compare values on the array
+===========================
+
+.. code-block:: java
+   :emphasize-lines: 15-17
+
+   import org.apache.arrow.algorithm.sort.StableVectorComparator;
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.vector.VarCharVector;
+
+   // compare two values at the given indices in the vectors.
+   // comparing org.apache.arrow.algorithm.sort.VectorValueComparator on algorithm
+   VarCharVector vec = new VarCharVector("valueindexcomparator", rootAllocator);
+   vec.allocateNew(100, 5);
+   vec.setValueCount(10);
+   vec.set(0, "ba".getBytes());
+   vec.set(1, "abc".getBytes());
+   vec.set(2, "aa".getBytes());
+   vec.set(3, "abc".getBytes());
+   vec.set(4, "a".getBytes());
+   VectorValueComparator<VarCharVector> comparatorValues = new TestVarCharSorter(); // less than, equal to, greater than
+   VectorValueComparator<VarCharVector> stableComparator = new StableVectorComparator<>(comparatorValues);//Stable comparator only supports comparing values from the same vector
+   stableComparator.attachVector(vec);
+
+Comparing two values at the given indices in the vectors:
+
+.. code-block:: java
+   :emphasize-lines: 1-8
+
+   jshell> stableComparator.compare(0, 1) > 0; stableComparator.compare(1, 2) < 0; stableComparator.compare(2, 3) < 0; stableComparator.compare(1, 3) < 0; stableComparator.compare(3, 1) > 0; stableComparator.compare(3, 3) == 0;
+
+   true
+   true
+   true
+   true
+   true
+   true
+
+Search values on the array
+==========================
+
+Linear search - O(n)
+********************
+
+Algorithm: org.apache.arrow.algorithm.search.VectorSearcher#linearSearch - O(n)
+
+.. code-block:: java
+   :emphasize-lines: 27
+
+   import org.apache.arrow.algorithm.search.VectorSearcher;
+   import org.apache.arrow.algorithm.sort.DefaultVectorComparators;
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.vector.IntVector;
+
+   // search values on the array
+   // linear search org.apache.arrow.algorithm.search.VectorSearcher#linearSearch - O(n)
+   IntVector rawVector = new IntVector("", rootAllocator);
+   IntVector negVector = new IntVector("", rootAllocator);
+   rawVector.allocateNew(10);
+   rawVector.setValueCount(10);
+   negVector.allocateNew(1);
+   negVector.setValueCount(1);
+   for (int i = 0; i < 10; i++) { // prepare data in sorted order
+    if (i == 0) {

Review comment:
       Can we try to be consistent about indent spacing?

##########
File path: java/source/io.rst
##########
@@ -0,0 +1,354 @@
+========================
+Reading and writing data
+========================
+
+Recipes related to reading and writing data from disk using
+Apache Arrow.
+
+.. contents::
+
+Writing array
+=============
+
+It is possible to dump data in the raw arrow format which allows 
+direct memory mapping of data from disk. This format is called
+the Arrow IPC format. There are two option: Random access format
+& Streaming format.
+
+We are going to use this util for reading and writing data:
+
+.. code-block:: java
+   :name: Util
+   :emphasize-lines: 114
+
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.VectorSchemaRoot;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+   import org.apache.arrow.vector.types.pojo.Schema;
+
+   import java.util.ArrayList;
+   import java.util.HashMap;
+   import java.util.List;
+   import java.util.Map;
+
+   import static java.util.Arrays.asList;
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   VectorSchemaRoot createVectorSchemaRoot(){
+       // create a column data type
+       Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+       Map<String, String> metadata = new HashMap<>();
+       metadata.put("A", "Id card");
+       metadata.put("B", "Passport");
+       metadata.put("C", "Visa");
+       Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+       Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+       FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+       FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+       Field childField = new Field("intCol", intType, null);
+       List<Field> childFields = new ArrayList<>();
+       childFields.add(childField);
+       Field points = new Field("points", listType, childFields);
+
+       // create a definition
+       Schema schemaPerson = new Schema(asList(name, document, age, points));
+
+       RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+       VectorSchemaRoot vectorSchemaRoot = VectorSchemaRoot.create(schemaPerson, rootAllocator);
+
+       // getting field vectors
+       VarCharVector nameVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("name"); //interface FieldVector
+       VarCharVector documentVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("document"); //interface FieldVector
+       IntVector ageVectorOption1 = (IntVector) vectorSchemaRoot.getVector("age");
+       ListVector pointsVectorOption1 = (ListVector) vectorSchemaRoot.getVector("points");
+
+       // add values to the field vectors
+       setVector(nameVectorOption1, "david".getBytes(), "gladis".getBytes(), "juan".getBytes());
+       setVector(documentVectorOption1, "A".getBytes(), "B".getBytes(), "C".getBytes());
+       setVector(ageVectorOption1, 10,20,30);
+       setVector(pointsVectorOption1, asList(1,3,5,7,9), asList(2,4,6,8,10), asList(1,2,3,5,8));
+       vectorSchemaRoot.setRowCount(3);
+
+       return vectorSchemaRoot;
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+   VectorSchemaRoot vectorSchemaRoot = createVectorSchemaRoot();
+
+
+.. code-block:: java
+   :emphasize-lines: 1-6
+
+   jshell> System.out.println(vectorSchemaRoot.contentToTSVString())
+
+   name     document age   points
+   david    A        10    [1,3,5,7,9]
+   gladis   B        20    [2,4,6,8,10]
+   juan     C        30    [1,2,3,5,8]
+
+Writing arrays with the IPC file format
+***************************************
+
+Write - Random access to file
+-----------------------------
+
+.. code-block:: java
+   :emphasize-lines: 9
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+
+   // random access format
+   // write - random access to file

Review comment:
       These comments just restate the section title.

##########
File path: java/source/io.rst
##########
@@ -0,0 +1,354 @@
+========================
+Reading and writing data
+========================
+
+Recipes related to reading and writing data from disk using
+Apache Arrow.
+
+.. contents::
+
+Writing array
+=============
+
+It is possible to dump data in the raw arrow format which allows 
+direct memory mapping of data from disk. This format is called
+the Arrow IPC format. There are two option: Random access format
+& Streaming format.
+
+We are going to use this util for reading and writing data:
+
+.. code-block:: java
+   :name: Util
+   :emphasize-lines: 114
+
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.VectorSchemaRoot;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+   import org.apache.arrow.vector.types.pojo.Schema;
+
+   import java.util.ArrayList;
+   import java.util.HashMap;
+   import java.util.List;
+   import java.util.Map;
+
+   import static java.util.Arrays.asList;
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   VectorSchemaRoot createVectorSchemaRoot(){
+       // create a column data type
+       Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+       Map<String, String> metadata = new HashMap<>();
+       metadata.put("A", "Id card");
+       metadata.put("B", "Passport");
+       metadata.put("C", "Visa");
+       Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+       Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+       FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+       FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+       Field childField = new Field("intCol", intType, null);
+       List<Field> childFields = new ArrayList<>();
+       childFields.add(childField);
+       Field points = new Field("points", listType, childFields);
+
+       // create a definition
+       Schema schemaPerson = new Schema(asList(name, document, age, points));
+
+       RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+       VectorSchemaRoot vectorSchemaRoot = VectorSchemaRoot.create(schemaPerson, rootAllocator);
+
+       // getting field vectors
+       VarCharVector nameVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("name"); //interface FieldVector
+       VarCharVector documentVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("document"); //interface FieldVector
+       IntVector ageVectorOption1 = (IntVector) vectorSchemaRoot.getVector("age");
+       ListVector pointsVectorOption1 = (ListVector) vectorSchemaRoot.getVector("points");
+
+       // add values to the field vectors
+       setVector(nameVectorOption1, "david".getBytes(), "gladis".getBytes(), "juan".getBytes());
+       setVector(documentVectorOption1, "A".getBytes(), "B".getBytes(), "C".getBytes());
+       setVector(ageVectorOption1, 10,20,30);
+       setVector(pointsVectorOption1, asList(1,3,5,7,9), asList(2,4,6,8,10), asList(1,2,3,5,8));
+       vectorSchemaRoot.setRowCount(3);
+
+       return vectorSchemaRoot;
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+   VectorSchemaRoot vectorSchemaRoot = createVectorSchemaRoot();
+
+
+.. code-block:: java
+   :emphasize-lines: 1-6

Review comment:
       If we're emphasizing all lines, I don't think there's a point.

##########
File path: java/source/io.rst
##########
@@ -0,0 +1,354 @@
+========================
+Reading and writing data
+========================
+
+Recipes related to reading and writing data from disk using
+Apache Arrow.
+
+.. contents::
+
+Writing array
+=============
+
+It is possible to dump data in the raw arrow format which allows 
+direct memory mapping of data from disk. This format is called
+the Arrow IPC format. There are two option: Random access format
+& Streaming format.
+
+We are going to use this util for reading and writing data:
+
+.. code-block:: java
+   :name: Util
+   :emphasize-lines: 114
+
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.VectorSchemaRoot;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+   import org.apache.arrow.vector.types.pojo.Schema;
+
+   import java.util.ArrayList;
+   import java.util.HashMap;
+   import java.util.List;
+   import java.util.Map;
+
+   import static java.util.Arrays.asList;
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   VectorSchemaRoot createVectorSchemaRoot(){
+       // create a column data type
+       Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+       Map<String, String> metadata = new HashMap<>();
+       metadata.put("A", "Id card");
+       metadata.put("B", "Passport");
+       metadata.put("C", "Visa");
+       Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+       Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+       FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+       FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+       Field childField = new Field("intCol", intType, null);
+       List<Field> childFields = new ArrayList<>();
+       childFields.add(childField);
+       Field points = new Field("points", listType, childFields);
+
+       // create a definition
+       Schema schemaPerson = new Schema(asList(name, document, age, points));
+
+       RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+       VectorSchemaRoot vectorSchemaRoot = VectorSchemaRoot.create(schemaPerson, rootAllocator);
+
+       // getting field vectors
+       VarCharVector nameVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("name"); //interface FieldVector
+       VarCharVector documentVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("document"); //interface FieldVector
+       IntVector ageVectorOption1 = (IntVector) vectorSchemaRoot.getVector("age");
+       ListVector pointsVectorOption1 = (ListVector) vectorSchemaRoot.getVector("points");
+
+       // add values to the field vectors
+       setVector(nameVectorOption1, "david".getBytes(), "gladis".getBytes(), "juan".getBytes());
+       setVector(documentVectorOption1, "A".getBytes(), "B".getBytes(), "C".getBytes());
+       setVector(ageVectorOption1, 10,20,30);
+       setVector(pointsVectorOption1, asList(1,3,5,7,9), asList(2,4,6,8,10), asList(1,2,3,5,8));
+       vectorSchemaRoot.setRowCount(3);
+
+       return vectorSchemaRoot;
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+   VectorSchemaRoot vectorSchemaRoot = createVectorSchemaRoot();
+
+
+.. code-block:: java
+   :emphasize-lines: 1-6
+
+   jshell> System.out.println(vectorSchemaRoot.contentToTSVString())
+
+   name     document age   points
+   david    A        10    [1,3,5,7,9]
+   gladis   B        20    [2,4,6,8,10]
+   juan     C        30    [1,2,3,5,8]
+
+Writing arrays with the IPC file format
+***************************************
+
+Write - Random access to file
+-----------------------------
+
+.. code-block:: java
+   :emphasize-lines: 9
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+
+   // random access format
+   // write - random access to file
+   File file = new File("randon_access.arrow");
+   FileOutputStream fileOutputStream = new FileOutputStream(file);
+   ArrowFileWriter writer = new ArrowFileWriter(vectorSchemaRoot, null, fileOutputStream.getChannel());
+   writer.start();
+   writer.writeBatch();
+   writer.end();
+
+Write - random access to buffer
+-------------------------------
+
+.. code-block:: java
+   :emphasize-lines: 8
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+   import java.nio.channels.Channels;
+
+   // write - random access to buffer
+   ByteArrayOutputStream out = new ByteArrayOutputStream();
+   ArrowFileWriter writerBuffer = new ArrowFileWriter(vectorSchemaRoot, null, Channels.newChannel(out));
+   writerBuffer.start();
+   writerBuffer.writeBatch();
+   writerBuffer.end();
+
+
+Writing arrays with the IPC streamed format
+*******************************************
+
+Write - Streaming to file
+-------------------------
+
+.. code-block:: java
+   :emphasize-lines: 9
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+
+   // streaming format
+   // write - streaming to file
+   File fileStream = new File("streaming.arrow");
+   FileOutputStream fileOutputStreamforStream = new FileOutputStream(fileStream);
+   ArrowStreamWriter writerStream = new ArrowStreamWriter(vectorSchemaRoot, null, fileOutputStreamforStream);
+   writerStream.start();
+   writerStream.writeBatch();
+   writerStream.end();
+
+Write - Streaming to buffer
+---------------------------
+
+.. code-block:: java
+   :emphasize-lines: 8
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+
+   // write - streaming to buffer
+   ByteArrayOutputStream outBuffer = new ByteArrayOutputStream();
+   ArrowStreamWriter writerStreamBuffer = new ArrowStreamWriter(vectorSchemaRoot, null, outBuffer);
+   writerStreamBuffer.start();
+   writerStreamBuffer.writeBatch();
+   writerStreamBuffer.end();
+
+Read array
+==========
+
+Arrow vectors that have been written to disk in the Arrow IPC
+format can be memory mapped back directly from the disk. There 
+are two option: Random access format & Streaming format
+
+Read arrays with the IPC file format
+************************************
+
+Read - random access to file
+----------------------------
+
+Consider: Before to run next code you need to write array to file with `Write - random access to file`_.
+
+.. code-block:: java
+   :emphasize-lines: 7
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+
+   // read - random access to file
+   FileInputStream fileInputStream = new FileInputStream(file);
+   ArrowFileReader reader = new ArrowFileReader(fileInputStream.getChannel(), rootAllocator);

Review comment:
       Does this guarantee memory mapping?
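
For comparison, a minimal JDK-only sketch of what an explicit memory map looks like. In plain `java.nio`, a mapping is created by calling `FileChannel.map`; obtaining a channel from a `FileInputStream` does not create one by itself. Whether `ArrowFileReader` maps the file internally is the open question here, and this sketch does not answer it. `MappedReadSketch`, `writeTemp` and `readMapped` are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration of an explicit read-only memory map using only the JDK.
final class MappedReadSketch {

    // Writes the content to a temporary file so there is something to map.
    static Path writeTemp(String content) {
        try {
            Path path = Files.createTempFile("mapped-sketch", ".bin");
            path.toFile().deleteOnExit();
            Files.write(path, content.getBytes(StandardCharsets.UTF_8));
            return path;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Maps the whole file read-only and decodes the mapped bytes as UTF-8.
    static String readMapped(Path path) {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            // This call is what actually requests the mapping.
            MappedByteBuffer mapped =
                channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            byte[] bytes = new byte[mapped.remaining()];
            mapped.get(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path path = writeTemp("arrow");
        System.out.println(readMapped(path)); // prints arrow
    }
}
```

If the recipe's prose is going to promise memory mapping, the code should show where the mapping happens; otherwise the prose should only say the file is read through a channel.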

##########
File path: java/source/io.rst
##########
@@ -0,0 +1,354 @@
+========================
+Reading and writing data
+========================
+
+Recipes related to reading and writing data from disk using
+Apache Arrow.
+
+.. contents::
+
+Writing array
+=============
+
+It is possible to dump data in the raw arrow format which allows 
+direct memory mapping of data from disk. This format is called
+the Arrow IPC format. There are two option: Random access format
+& Streaming format.
+
+We are going to use this util for reading and writing data:
+
+.. code-block:: java
+   :name: Util
+   :emphasize-lines: 114
+
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.VectorSchemaRoot;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+   import org.apache.arrow.vector.types.pojo.Schema;
+
+   import java.util.ArrayList;
+   import java.util.HashMap;
+   import java.util.List;
+   import java.util.Map;
+
+   import static java.util.Arrays.asList;
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   VectorSchemaRoot createVectorSchemaRoot(){
+       // create a column data type
+       Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+       Map<String, String> metadata = new HashMap<>();
+       metadata.put("A", "Id card");
+       metadata.put("B", "Passport");
+       metadata.put("C", "Visa");
+       Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+       Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+       FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+       FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+       Field childField = new Field("intCol", intType, null);
+       List<Field> childFields = new ArrayList<>();
+       childFields.add(childField);
+       Field points = new Field("points", listType, childFields);
+
+       // create a definition
+       Schema schemaPerson = new Schema(asList(name, document, age, points));
+
+       RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+       VectorSchemaRoot vectorSchemaRoot = VectorSchemaRoot.create(schemaPerson, rootAllocator);
+
+       // getting field vectors
+       VarCharVector nameVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("name"); //interface FieldVector
+       VarCharVector documentVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("document"); //interface FieldVector
+       IntVector ageVectorOption1 = (IntVector) vectorSchemaRoot.getVector("age");
+       ListVector pointsVectorOption1 = (ListVector) vectorSchemaRoot.getVector("points");
+
+       // add values to the field vectors
+       setVector(nameVectorOption1, "david".getBytes(), "gladis".getBytes(), "juan".getBytes());
+       setVector(documentVectorOption1, "A".getBytes(), "B".getBytes(), "C".getBytes());
+       setVector(ageVectorOption1, 10,20,30);
+       setVector(pointsVectorOption1, asList(1,3,5,7,9), asList(2,4,6,8,10), asList(1,2,3,5,8));
+       vectorSchemaRoot.setRowCount(3);
+
+       return vectorSchemaRoot;
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+   VectorSchemaRoot vectorSchemaRoot = createVectorSchemaRoot();
+
+
+.. code-block:: java
+   :emphasize-lines: 1-6
+
+   jshell> System.out.println(vectorSchemaRoot.contentToTSVString())
+
+   name     document age   points
+   david    A        10    [1,3,5,7,9]
+   gladis   B        20    [2,4,6,8,10]
+   juan     C        30    [1,2,3,5,8]
+
+Writing arrays with the IPC file format
+***************************************
+
+Write - Random access to file
+-----------------------------
+
+.. code-block:: java
+   :emphasize-lines: 9
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+
+   // random access format
+   // write - random access to file
+   File file = new File("randon_access.arrow");
+   FileOutputStream fileOutputStream = new FileOutputStream(file);
+   ArrowFileWriter writer = new ArrowFileWriter(vectorSchemaRoot, null, fileOutputStream.getChannel());
+   writer.start();
+   writer.writeBatch();
+   writer.end();
+
+Write - random access to buffer
+-------------------------------
+
+.. code-block:: java
+   :emphasize-lines: 8
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+   import java.nio.channels.Channels;
+
+   // write - random access to buffer
+   ByteArrayOutputStream out = new ByteArrayOutputStream();
+   ArrowFileWriter writerBuffer = new ArrowFileWriter(vectorSchemaRoot, null, Channels.newChannel(out));
+   writerBuffer.start();
+   writerBuffer.writeBatch();
+   writerBuffer.end();
+
+
+Writing arrays with the IPC streamed format
+*******************************************
+
+Write - Streaming to file
+-------------------------
+
+.. code-block:: java
+   :emphasize-lines: 9
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+
+   // streaming format
+   // write - streaming to file
+   File fileStream = new File("streaming.arrow");
+   FileOutputStream fileOutputStreamforStream = new FileOutputStream(fileStream);
+   ArrowStreamWriter writerStream = new ArrowStreamWriter(vectorSchemaRoot, null, fileOutputStreamforStream);
+   writerStream.start();
+   writerStream.writeBatch();
+   writerStream.end();
+
+Write - Streaming to buffer
+---------------------------
+
+.. code-block:: java
+   :emphasize-lines: 8
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+
+   // write - streaming to buffer
+   ByteArrayOutputStream outBuffer = new ByteArrayOutputStream();
+   ArrowStreamWriter writerStreamBuffer = new ArrowStreamWriter(vectorSchemaRoot, null, outBuffer);
+   writerStreamBuffer.start();
+   writerStreamBuffer.writeBatch();
+   writerStreamBuffer.end();
+
+Read array
+==========
+
+Arrow vectors that have been written to disk in the Arrow IPC
+format can be memory mapped back directly from the disk. There 
+are two option: Random access format & Streaming format

Review comment:
       Do we need to repeat this?

##########
File path: java/source/schema.rst
##########
@@ -0,0 +1,330 @@
+===================
+Working with schema
+===================
+
+Common definition of table has an schema. Java arrow is columnar oriented and it also has an schema representation.
+Consider that each name on the schema maps to a columns for a predefined data type
+
+
+.. contents::
+
+We are going to use this util for creating arrow objects:

Review comment:
       So at this point, I'm not sure if these utilities are actually helpful, vs. just manually calling `vector.setSafe(0, 1);` inline in the examples.

##########
File path: java/source/io.rst
##########
@@ -0,0 +1,354 @@
+========================
+Reading and writing data
+========================
+
+Recipes related to reading and writing data from disk using
+Apache Arrow.
+
+.. contents::
+
+Writing array
+=============
+
+It is possible to dump data in the raw arrow format which allows 
+direct memory mapping of data from disk. This format is called
+the Arrow IPC format. There are two option: Random access format
+& Streaming format.
+
+We are going to use this util for reading and writing data:
+
+.. code-block:: java
+   :name: Util
+   :emphasize-lines: 114
+
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.VectorSchemaRoot;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+   import org.apache.arrow.vector.types.pojo.Schema;
+
+   import java.util.ArrayList;
+   import java.util.HashMap;
+   import java.util.List;
+   import java.util.Map;
+
+   import static java.util.Arrays.asList;
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   VectorSchemaRoot createVectorSchemaRoot(){
+       // create a column data type
+       Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+       Map<String, String> metadata = new HashMap<>();
+       metadata.put("A", "Id card");
+       metadata.put("B", "Passport");
+       metadata.put("C", "Visa");
+       Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+       Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+       FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+       FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+       Field childField = new Field("intCol", intType, null);
+       List<Field> childFields = new ArrayList<>();
+       childFields.add(childField);
+       Field points = new Field("points", listType, childFields);
+
+       // create a definition
+       Schema schemaPerson = new Schema(asList(name, document, age, points));
+
+       RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+       VectorSchemaRoot vectorSchemaRoot = VectorSchemaRoot.create(schemaPerson, rootAllocator);
+
+       // getting field vectors
+       VarCharVector nameVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("name"); //interface FieldVector
+       VarCharVector documentVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("document"); //interface FieldVector
+       IntVector ageVectorOption1 = (IntVector) vectorSchemaRoot.getVector("age");
+       ListVector pointsVectorOption1 = (ListVector) vectorSchemaRoot.getVector("points");
+
+       // add values to the field vectors
+       setVector(nameVectorOption1, "david".getBytes(), "gladis".getBytes(), "juan".getBytes());
+       setVector(documentVectorOption1, "A".getBytes(), "B".getBytes(), "C".getBytes());
+       setVector(ageVectorOption1, 10,20,30);
+       setVector(pointsVectorOption1, asList(1,3,5,7,9), asList(2,4,6,8,10), asList(1,2,3,5,8));
+       vectorSchemaRoot.setRowCount(3);
+
+       return vectorSchemaRoot;
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+   VectorSchemaRoot vectorSchemaRoot = createVectorSchemaRoot();
+
+
+.. code-block:: java
+   :emphasize-lines: 1-6
+
+   jshell> System.out.println(vectorSchemaRoot.contentToTSVString())
+
+   name     document age   points
+   david    A        10    [1,3,5,7,9]
+   gladis   B        20    [2,4,6,8,10]
+   juan     C        30    [1,2,3,5,8]
+
+Writing arrays with the IPC file format
+***************************************
+
+Write - Random access to file
+-----------------------------
+
+.. code-block:: java
+   :emphasize-lines: 9
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+
+   // random access format
+   // write - random access to file
+   File file = new File("randon_access.arrow");
+   FileOutputStream fileOutputStream = new FileOutputStream(file);
+   ArrowFileWriter writer = new ArrowFileWriter(vectorSchemaRoot, null, fileOutputStream.getChannel());

Review comment:
       Don't we need to close the writer, file, etc.?
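
A minimal sketch of the try-with-resources shape this could take. `SketchWriter` is a hypothetical stand-in for `ArrowFileWriter` (which, as far as I know, is `AutoCloseable` through `ArrowWriter`); it only records calls, so the closing order is visible without an Arrow dependency. `CloseOrderSketch` and `writeAndClose` are likewise made-up names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

class CloseOrderSketch {
    // Hypothetical stand-in for ArrowFileWriter: same start/writeBatch/end/close shape.
    static class SketchWriter implements AutoCloseable {
        private final OutputStream out;
        private final List<String> log;

        SketchWriter(OutputStream out, List<String> log) {
            this.out = out;
            this.log = log;
        }

        void start() { log.add("start"); }

        void writeBatch() throws IOException {
            out.write(1); // placeholder payload, not the Arrow IPC format
            log.add("writeBatch");
        }

        void end() { log.add("end"); }

        @Override
        public void close() { log.add("close writer"); }
    }

    static List<String> writeAndClose() throws IOException {
        List<String> log = new ArrayList<>();
        // Both resources are closed automatically, even if writeBatch() throws.
        try (OutputStream out = new ByteArrayOutputStream() {
                 @Override
                 public void close() throws IOException {
                     log.add("close stream");
                     super.close();
                 }
             };
             SketchWriter writer = new SketchWriter(out, log)) {
            writer.start();
            writer.writeBatch();
            writer.end();
        }
        return log;
    }
}
```

Resources declared in one `try (...)` header are closed in reverse declaration order, so the writer is closed before the underlying stream. In the recipe, the same block would presumably wrap the `FileOutputStream` and the `ArrowFileWriter`.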

##########
File path: java/source/create.rst
##########
@@ -0,0 +1,134 @@
+======================
+Creating arrow objects
+======================
+
+A vector is the basic unit in the java arrow columnar format.
+Vectors are provided by java arrow for the interface FieldVector that extends ValueVector.
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);

Review comment:
       It's unexplained what allocateNew, set, setSafe, setValueCount, etc. actually do and why you might need them.

##########
File path: java/source/schema.rst
##########
@@ -0,0 +1,330 @@
+===================
+Working with schema
+===================
+
+Common definition of table has an schema. Java arrow is columnar oriented and it also has an schema representation.
+Consider that each name on the schema maps to a columns for a predefined data type
+
+
+.. contents::
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Define data type
+================
+
+Definition of columnar fields for string (name), integer (age) and array (points):
+
+.. code-block:: java
+   :emphasize-lines: 6,8,12,15
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type

Review comment:
       As a general point, I don't think most of the code comments here have clarified things.

##########
File path: java/source/schema.rst
##########
@@ -0,0 +1,330 @@
+===================
+Working with schema
+===================
+
+Common definition of table has an schema. Java arrow is columnar oriented and it also has an schema representation.
+Consider that each name on the schema maps to a columns for a predefined data type
+
+
+.. contents::
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Define data type
+================
+
+Definition of columnar fields for string (name), integer (age) and array (points):
+
+.. code-block:: java
+   :emphasize-lines: 6,8,12,15
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type
+   Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+   Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+   FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+   FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+   Field childField = new Field("intCol", intType, null);
+   List<Field> childFields = new ArrayList<>();
+   childFields.add(childField);
+   Field points = new Field("points", listType, childFields);
+
+.. code-block:: java
+   :emphasize-lines: 1-5
+
+   jshell> name; age; points;
+
+   name ==> name: Utf8
+   age ==> age: Int(32, true)
+   points ==> points: List<intCol: Int(32, true)>
+
+Define metadata
+===============
+
+In case we need to add metadata to our definition we could use:
+
+.. code-block:: java
+   :emphasize-lines: 10
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type + metadata
+   Map<String, String> metadata = new HashMap<>();
+   metadata.put("A", "Id card");
+   metadata.put("B", "Passport");
+   metadata.put("C", "Visa");
+   Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> document
+
+   document ==> document: Utf8
+
+Create the schema
+=================
+
+Tables detain multiple columns, each with its own name
+and type. The union of types and names is what defines a schema.
+
+.. code-block:: java
+   :emphasize-lines: 5
+
+   import org.apache.arrow.vector.types.pojo.Schema;
+   import static java.util.Arrays.asList;
+
+   // create a definition
+   Schema schemaPerson = new Schema(asList(name, document, age, points));
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> schemaPerson
+
+   schemaPerson ==> Schema<name: Utf8, document: Utf8, age: Int(32, true), points: List<intCol: Int(32, true)>>
+
+Populate data
+=============
+
+.. code-block:: java
+   :emphasize-lines: 3,12-15
+
+   import org.apache.arrow.vector.*;
+
+   VectorSchemaRoot vectorSchemaRoot = VectorSchemaRoot.create(schemaPerson, rootAllocator);
+
+   // getting field vectors
+   VarCharVector nameVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("name"); //interface FieldVector
+   VarCharVector documentVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("document"); //interface FieldVector
+   IntVector ageVectorOption1 = (IntVector) vectorSchemaRoot.getVector("age");
+   ListVector pointsVectorOption1 = (ListVector) vectorSchemaRoot.getVector("points");
+
+   // add values to the field vectors
+   setVector(nameVectorOption1, "david".getBytes(), "gladis".getBytes(), "juan".getBytes());
+   setVector(documentVectorOption1, "A".getBytes(), "B".getBytes(), "C".getBytes());
+   setVector(ageVectorOption1, 10,20,30);
+   setVector(pointsVectorOption1, asList(1,3,5,7,9), asList(2,4,6,8,10), asList(1,2,3,5,8));
+
+   vectorSchemaRoot.setRowCount(3);
+
+Render data & metadata:
+
+.. code-block:: java
+
+   jshell> System.out.println(vectorSchemaRoot.contentToTSVString());
+
+   name    document    age  points
+   david   A            10  [1,3,5,7,9]
+   gladis  B            20  [2,4,6,8,10]
+   juan    C            30  [1,2,3,5,8]
+
+   jshell> System.out.println(documentVectorOption1.getField().getMetadata());
+
+   {A=Id card, B=Passport, C=Visa}
+
+Create the schema from json
+===========================
+
+For this json definition:

Review comment:
       Hmm. Is this JSON format defined somewhere/stable?

##########
File path: java/source/usecase.rst
##########
@@ -0,0 +1,277 @@
+========
+Use Case

Review comment:
       It seems these could all go under data manipulation.

##########
File path: java/source/schema.rst
##########
@@ -0,0 +1,330 @@
+===================
+Working with schema
+===================
+
+Common definition of table has an schema. Java arrow is columnar oriented and it also has an schema representation.
+Consider that each name on the schema maps to a columns for a predefined data type
+
+
+.. contents::
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Define data type
+================
+
+Definition of columnar fields for string (name), integer (age) and array (points):
+
+.. code-block:: java
+   :emphasize-lines: 6,8,12,15
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type
+   Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+   Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+   FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+   FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+   Field childField = new Field("intCol", intType, null);
+   List<Field> childFields = new ArrayList<>();
+   childFields.add(childField);
+   Field points = new Field("points", listType, childFields);
+
+.. code-block:: java
+   :emphasize-lines: 1-5
+
+   jshell> name; age; points;
+
+   name ==> name: Utf8
+   age ==> age: Int(32, true)
+   points ==> points: List<intCol: Int(32, true)>
+
+Define metadata
+===============
+
+In case we need to add metadata to our definition we could use:
+
+.. code-block:: java
+   :emphasize-lines: 10
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type + metadata
+   Map<String, String> metadata = new HashMap<>();
+   metadata.put("A", "Id card");
+   metadata.put("B", "Passport");
+   metadata.put("C", "Visa");
+   Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> document
+
+   document ==> document: Utf8
+
+Create the schema
+=================
+
+Tables detain multiple columns, each with its own name
+and type. The union of types and names is what defines a schema.

Review comment:
       `union` is an overloaded word since it's also a type. Maybe `A Schema is a list of Fields, where each Field is a name and a type.`

##########
File path: java/source/data.rst
##########
@@ -0,0 +1,316 @@
+=================
+Data manipulation
+=================
+
+Recipes related to compare, filtering or transforming data.
+
+.. contents::
+
+We are going to use this util for data manipulation:
+
+.. code-block:: java
+
+   import org.apache.arrow.algorithm.sort.VectorValueComparator;
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+
+   void setVector(IntVector vector, Integer... values) {
+      final int length = values.length;
+      vector.allocateNew(length);
+      for (int i = 0; i < length; i++) {
+          if (values[i] != null) {
+              vector.set(i, values[i]);
+          }
+      }
+      vector.setValueCount(length);
+   }
+
+  class TestVarCharSorter extends VectorValueComparator<VarCharVector> {
+    @Override
+    public int compareNotNull(int index1, int index2) {
+        byte b1 = vector1.get(index1)[0];
+        byte b2 = vector2.get(index2)[0];
+        return b1 - b2;
+    }
+
+    @Override
+    public VectorValueComparator<VarCharVector> createNew() {
+        return new TestVarCharSorter();
+    }
+  }
+  RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Compare fields on the array

Review comment:
       Also, the title is a little unclear to me. Maybe "Compare Vectors for Field Equality"?

##########
File path: java/source/create.rst
##########
@@ -0,0 +1,134 @@
+======================
+Creating arrow objects
+======================
+
+A vector is the basic unit in the java arrow columnar format.
+Vectors are provided by java arrow for the interface FieldVector that extends ValueVector.
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);

Review comment:
       There's existing documentation about this: 
https://arrow.apache.org/docs/java/vector.html
   
   Is it possible to adapt that, or link to it? (If we link to it, we should 
set up the intersphinx plugin for this cookbook.)

##########
File path: java/source/io.rst
##########
@@ -0,0 +1,354 @@
+========================
+Reading and writing data
+========================
+
+Recipes related to reading and writing data from disk using
+Apache Arrow.
+
+.. contents::
+
+Writing array
+=============
+
+It is possible to dump data in the raw arrow format which allows 
+direct memory mapping of data from disk. This format is called
+the Arrow IPC format. There are two option: Random access format
+& Streaming format.
+
+We are going to use this util for reading and writing data:
+
+.. code-block:: java
+   :name: Util
+   :emphasize-lines: 114
+
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.VectorSchemaRoot;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+   import org.apache.arrow.vector.types.pojo.Schema;
+
+   import java.util.ArrayList;
+   import java.util.HashMap;
+   import java.util.List;
+   import java.util.Map;
+
+   import static java.util.Arrays.asList;
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   VectorSchemaRoot createVectorSchemaRoot(){
+       // create a column data type
+       Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+       Map<String, String> metadata = new HashMap<>();
+       metadata.put("A", "Id card");
+       metadata.put("B", "Passport");
+       metadata.put("C", "Visa");
+       Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+       Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+       FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+       FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+       Field childField = new Field("intCol", intType, null);
+       List<Field> childFields = new ArrayList<>();
+       childFields.add(childField);
+       Field points = new Field("points", listType, childFields);
+
+       // create a definition
+       Schema schemaPerson = new Schema(asList(name, document, age, points));
+
+       RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+       VectorSchemaRoot vectorSchemaRoot = VectorSchemaRoot.create(schemaPerson, rootAllocator);
+
+       // getting field vectors
+       VarCharVector nameVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("name"); //interface FieldVector
+       VarCharVector documentVectorOption1 = (VarCharVector) vectorSchemaRoot.getVector("document"); //interface FieldVector
+       IntVector ageVectorOption1 = (IntVector) vectorSchemaRoot.getVector("age");
+       ListVector pointsVectorOption1 = (ListVector) vectorSchemaRoot.getVector("points");
+
+       // add values to the field vectors
+       setVector(nameVectorOption1, "david".getBytes(), "gladis".getBytes(), "juan".getBytes());
+       setVector(documentVectorOption1, "A".getBytes(), "B".getBytes(), "C".getBytes());
+       setVector(ageVectorOption1, 10,20,30);
+       setVector(pointsVectorOption1, asList(1,3,5,7,9), asList(2,4,6,8,10), asList(1,2,3,5,8));
+       vectorSchemaRoot.setRowCount(3);
+
+       return vectorSchemaRoot;
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+   VectorSchemaRoot vectorSchemaRoot = createVectorSchemaRoot();
+
+
+.. code-block:: java
+   :emphasize-lines: 1-6
+
+   jshell> System.out.println(vectorSchemaRoot.contentToTSVString())
+
+   name     document age   points
+   david    A        10    [1,3,5,7,9]
+   gladis   B        20    [2,4,6,8,10]
+   juan     C        30    [1,2,3,5,8]
+
+Writing arrays with the IPC file format
+***************************************
+
+Write - Random access to file
+-----------------------------
+
+.. code-block:: java
+   :emphasize-lines: 9
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+
+   // random access format
+   // write - random access to file
+   File file = new File("randon_access.arrow");
+   FileOutputStream fileOutputStream = new FileOutputStream(file);
+   ArrowFileWriter writer = new ArrowFileWriter(vectorSchemaRoot, null, fileOutputStream.getChannel());
+   writer.start();
+   writer.writeBatch();
+   writer.end();
+
+Write - random access to buffer
+-------------------------------
+
+.. code-block:: java
+   :emphasize-lines: 8
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+   import java.nio.channels.Channels;
+
+   // write - random access to buffer
+   ByteArrayOutputStream out = new ByteArrayOutputStream();
+   ArrowFileWriter writerBuffer = new ArrowFileWriter(vectorSchemaRoot, null, Channels.newChannel(out));
+   writerBuffer.start();
+   writerBuffer.writeBatch();
+   writerBuffer.end();
+
+
+Writing arrays with the IPC streamed format
+*******************************************
+
+Write - Streaming to file
+-------------------------
+
+.. code-block:: java
+   :emphasize-lines: 9
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+
+   // streaming format
+   // write - streaming to file
+   File fileStream = new File("streaming.arrow");
+   FileOutputStream fileOutputStreamforStream = new FileOutputStream(fileStream);
+   ArrowStreamWriter writerStream = new ArrowStreamWriter(vectorSchemaRoot, null, fileOutputStreamforStream);
+   writerStream.start();
+   writerStream.writeBatch();
+   writerStream.end();
+
+Write - Streaming to buffer
+---------------------------
+
+.. code-block:: java
+   :emphasize-lines: 8
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+
+   // write - streaming to buffer
+   ByteArrayOutputStream outBuffer = new ByteArrayOutputStream();
+   ArrowStreamWriter writerStreamBuffer = new ArrowStreamWriter(vectorSchemaRoot, null, outBuffer);
+   writerStreamBuffer.start();
+   writerStreamBuffer.writeBatch();
+   writerStreamBuffer.end();
+
+Read array
+==========
+
+Arrow vectors that have been written to disk in the Arrow IPC
+format can be memory mapped back directly from the disk. There 
+are two option: Random access format & Streaming format
+
+Read arrays with the IPC file format
+************************************
+
+Read - random access to file
+----------------------------
+
+Consider: Before to run next code you need to write array to file with `Write - random access to file`_.
+
+.. code-block:: java
+   :emphasize-lines: 7
+
+   import org.apache.arrow.vector.ipc.*;
+
+   import java.io.*;
+
+   // read - random access to file
+   FileInputStream fileInputStream = new FileInputStream(file);
+   ArrowFileReader reader = new ArrowFileReader(fileInputStream.getChannel(), rootAllocator);

Review comment:
       If not, how do we enable it?

##########
File path: java/source/schema.rst
##########
@@ -0,0 +1,330 @@
+===================
+Working with schema
+===================
+
+Common definition of table has an schema. Java arrow is columnar oriented and it also has an schema representation.
+Consider that each name on the schema maps to a columns for a predefined data type
+
+
+.. contents::
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Define data type
+================
+
+Definition of columnar fields for string (name), integer (age) and array (points):
+
+.. code-block:: java
+   :emphasize-lines: 6,8,12,15
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type
+   Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+   Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+   FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+   FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+   Field childField = new Field("intCol", intType, null);
+   List<Field> childFields = new ArrayList<>();
+   childFields.add(childField);
+   Field points = new Field("points", listType, childFields);
+
+.. code-block:: java
+   :emphasize-lines: 1-5
+
+   jshell> name; age; points;
+
+   name ==> name: Utf8
+   age ==> age: Int(32, true)
+   points ==> points: List<intCol: Int(32, true)>
+
+Define metadata
+===============
+
+In case we need to add metadata to our definition we could use:
+
+.. code-block:: java
+   :emphasize-lines: 10
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type + metadata
+   Map<String, String> metadata = new HashMap<>();
+   metadata.put("A", "Id card");
+   metadata.put("B", "Passport");
+   metadata.put("C", "Visa");
+   Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> document
+
+   document ==> document: Utf8
+
+Create the schema
+=================
+
+Tables detain multiple columns, each with its own name

Review comment:
       contain?

##########
File path: java/source/schema.rst
##########
@@ -0,0 +1,330 @@
+===================
+Working with schema
+===================
+
+Common definition of table has an schema. Java arrow is columnar oriented and it also has an schema representation.
+Consider that each name on the schema maps to a columns for a predefined data type
+
+
+.. contents::
+
+We are going to use this util for creating arrow objects:
+
+.. code-block:: java
+
+   import org.apache.arrow.memory.RootAllocator;
+   import org.apache.arrow.vector.BitVectorHelper;
+   import org.apache.arrow.vector.IntVector;
+   import org.apache.arrow.vector.VarCharVector;
+   import org.apache.arrow.vector.complex.BaseRepeatedValueVector;
+   import org.apache.arrow.vector.complex.ListVector;
+   import org.apache.arrow.vector.types.Types;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   import java.util.List;
+
+
+   void setVector(IntVector vector, Integer... values) {
+       final int length = values.length;
+       vector.allocateNew(length);
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(VarCharVector vector, byte[]... values) {
+       final int length = values.length;
+       vector.allocateNewSafe();
+       for (int i = 0; i < length; i++) {
+           if (values[i] != null) {
+               vector.set(i, values[i]);
+           }
+       }
+       vector.setValueCount(length);
+   }
+
+   void setVector(ListVector vector, List<Integer>... values) {
+       vector.allocateNewSafe();
+       Types.MinorType type = Types.MinorType.INT;
+       vector.addOrGetVector(FieldType.nullable(type.getType()));
+
+       IntVector dataVector = (IntVector) vector.getDataVector();
+       dataVector.allocateNew();
+
+       // set underlying vectors
+       int curPos = 0;
+       vector.getOffsetBuffer().setInt(0, curPos);
+       for (int i = 0; i < values.length; i++) {
+           if (values[i] == null) {
+               BitVectorHelper.unsetBit(vector.getValidityBuffer(), i);
+           } else {
+               BitVectorHelper.setBit(vector.getValidityBuffer(), i);
+               for (int value : values[i]) {
+                   dataVector.setSafe(curPos, value);
+                   curPos += 1;
+               }
+           }
+           vector.getOffsetBuffer().setInt((i + 1) * BaseRepeatedValueVector.OFFSET_WIDTH, curPos);
+       }
+       dataVector.setValueCount(curPos);
+       vector.setLastSet(values.length - 1);
+       vector.setValueCount(values.length);
+   }
+
+   RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE); // deal with byte buffer allocation
+
+Define data type
+================
+
+Definition of columnar fields for string (name), integer (age) and array (points):
+
+.. code-block:: java
+   :emphasize-lines: 6,8,12,15
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type
+   Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
+
+   Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
+
+   FieldType intType = new FieldType(true, new ArrowType.Int(32, true), /*dictionary=*/null);
+   FieldType listType = new FieldType(true, new ArrowType.List(), /*dictionary=*/null);
+   Field childField = new Field("intCol", intType, null);
+   List<Field> childFields = new ArrayList<>();
+   childFields.add(childField);
+   Field points = new Field("points", listType, childFields);
+
+.. code-block:: java
+   :emphasize-lines: 1-5
+
+   jshell> name; age; points;
+
+   name ==> name: Utf8
+   age ==> age: Int(32, true)
+   points ==> points: List<intCol: Int(32, true)>
+
+Define metadata
+===============
+
+In case we need to add metadata to our definition we could use:
+
+.. code-block:: java
+   :emphasize-lines: 10
+
+   import org.apache.arrow.vector.types.pojo.ArrowType;
+   import org.apache.arrow.vector.types.pojo.Field;
+   import org.apache.arrow.vector.types.pojo.FieldType;
+
+   // create a column data type + metadata
+   Map<String, String> metadata = new HashMap<>();
+   metadata.put("A", "Id card");
+   metadata.put("B", "Passport");
+   metadata.put("C", "Visa");
+   Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), null, metadata), null);
+
+.. code-block:: java
+   :emphasize-lines: 1-3
+
+   jshell> document
+
+   document ==> document: Utf8
+
+Create the schema
+=================
+
+Tables detain multiple columns, each with its own name

Review comment:
       Also, "Tables" aren't a concept in the Java library.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
