lidavidm commented on a change in pull request #12634:
URL: https://github.com/apache/arrow/pull/12634#discussion_r829123862



##########
File path: docs/source/java/vector_schema_root.rst
##########
@@ -15,21 +15,79 @@
 .. specific language governing permissions and limitations
 .. under the License.
 
-================
+.. default-domain:: java
+.. highlight:: java
+
+============
+Tabular Data
+============
+
+While arrays (aka: :doc:`ValueVector <./vector>`) represent a one-dimensional sequence of
+homogeneous values, data often comes in the form of two-dimensional sets of
+heterogeneous data (such as database tables, CSV files...). Arrow provides
+several abstractions to handle such data conveniently and efficiently.
+
+Fields
+======
+
+Fields are used to denote the particular columns of a table.
+A field, i.e. an instance of `Field`_, holds together a field name, a data
+type, and some optional key-value metadata.
+
+.. code-block:: Java
+
+    // Create a column "document" of string type with metadata
+    import org.apache.arrow.vector.types.pojo.ArrowType;
+    import org.apache.arrow.vector.types.pojo.Field;
+    import org.apache.arrow.vector.types.pojo.FieldType;
+
+    Map<String, String> metadata = new HashMap<>();
+    metadata.put("A", "Id card");
+    metadata.put("B", "Passport");
+    metadata.put("C", "Visa");
+    Field document = new Field("document", new FieldType(true, new ArrowType.Utf8(), /*dictionary*/ null, metadata), /*children*/ null);
+
+Schemas
+=======
+
+A `Schema`_ describes the overall structure consisting of any number of columns. It holds a sequence of fields together
+with some optional schema-wide metadata (in addition to per-field metadata).
+
+.. code-block:: Java
+
+    // Create a schema describing datasets with two columns:
+    // a int32 column "A" and a utf8-encoded string column "B"
+    import org.apache.arrow.vector.types.pojo.ArrowType;
+    import org.apache.arrow.vector.types.pojo.Field;
+    import org.apache.arrow.vector.types.pojo.FieldType;
+    import org.apache.arrow.vector.types.pojo.Schema;
+    import static java.util.Arrays.asList;
+
+    Map<String, String> metadata = new HashMap<>();
+    metadata.put("K1", "V1");
+    metadata.put("K2", "V2");
+    Field a = new Field("A", FieldType.nullable(new ArrowType.Int(32, true)), null);
+    Field b = new Field("B", FieldType.nullable(new ArrowType.Utf8()), null);
+    Schema schema = new Schema(asList(a, b), metadata);
+
 VectorSchemaRoot
 ================
+
+.. note::
+
+    VectorSchemaRoot is somewhat analogous to tables and record batches in the other Arrow implementations
+    in that they all are 2D datasets, but the usage is different.
+
 A :class:`VectorSchemaRoot` is a container that can hold batches, batches flow through :class:`VectorSchemaRoot`

Review comment:
       Not so sure about this since we haven't introduced batches yet at this point… but we can revisit this later.

##########
File path: docs/source/java/vector_schema_root.rst
##########
@@ -15,21 +15,79 @@
 .. specific language governing permissions and limitations
 .. under the License.
 
-================
+.. default-domain:: java
+.. highlight:: java
+
+============
+Tabular Data
+============
+
+While arrays (aka: :doc:`ValueVector <./vector>`) represent a one-dimensional sequence of
+homogeneous values, data often comes in the form of two-dimensional sets of
+heterogeneous data (such as database tables, CSV files...). Arrow provides
+several abstractions to handle such data conveniently and efficiently.
+
+Fields
+======
+
+Fields are used to denote the particular columns of a table.

Review comment:
       Wait, sorry I missed this. We should not talk about tables in Java. Since we haven't introduced VectorSchemaRoot yet, we can talk about "tabular data" abstractly.



