alxp1982 commented on code in PR #24488:
URL: https://github.com/apache/beam/pull/24488#discussion_r1054001849


##########
learning/tour-of-beam/learning-content/java/schema-based-transforms/schema-concept/creating-schema/description.md:
##########
@@ -0,0 +1,153 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Overview
+
+Most structured records share some common characteristics:
+
+→  They can be subdivided into separate named fields. Fields usually have 
string names, but sometimes - as in the case of indexed tuples - have numerical 
indices instead.
+
+→  There is a limited set of primitive types that a field can have. These 
often match the primitive types found in most programming languages: int, long, 
string, etc.
+
+→  Often a field type can be marked as optional (sometimes referred to as 
nullable) or required.
+
+Records often have a nested structure. A nested structure occurs when a field 
itself has subfields, so the type of the field itself has a schema. Fields with 
array or map types are also common in these structured records.
+
+For example, consider the following schema, representing actions in a 
fictitious e-commerce company:
+
+**Purchase**
+
+```
+Field Name              Field Type
+userId                  STRING
+itemId                  INT64
+shippingAddress         ROW(ShippingAddress)
+cost                    INT64
+transactions            ARRAY[ROW(Transaction)]
+```
+
+**ShippingAddress**
+
+```
+Field Name              Field Type
+streetAddress           STRING
+city                    STRING
+state                   nullable STRING
+country                 STRING
+postCode                STRING
+```
+
+**Transaction**
+
+```
+Field Name              Field Type
+bank                    STRING
+purchaseAmount          DOUBLE
+```
+
+Schemas give us a type system for Beam records that is independent of any 
specific programming-language type. Several Java classes might all have the 
same schema (for example, a Protocol Buffer class or a POJO class), and Beam 
lets us convert seamlessly between these types. Schemas also provide a simple 
way to reason about types across different programming-language APIs.
+
+A `PCollection` with a schema does not need to have a `Coder` specified, as 
Beam knows how to encode and decode Schema rows; Beam uses a special coder to 
encode schema types.
+
+### Creating Schemas
+
+While schemas themselves are language independent, they are designed to embed 
naturally into the programming languages of the Beam SDK being used. This 
allows Beam users to continue using native types while reaping the advantage of 
having Beam understand their element schemas.

Review Comment:
   While schemas are language-independent, they are designed to be embedded 
naturally into the programming languages supported by the Beam SDKs. You can 
continue using native Java types with Beam while taking advantage of 
schema-based transforms.
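
To illustrate the "native Java types" point, here is a minimal sketch of plain Java classes whose public fields mirror the Purchase, ShippingAddress and Transaction schemas quoted in the diff above. It deliberately does not depend on Beam, so the wrapper class `PurchaseSketch`, the `sample()` helper and all sample values are assumptions made for illustration. With the Beam SDK on the classpath, such classes would typically also carry the `@DefaultSchema(JavaFieldSchema.class)` annotation so that Beam can infer the schema from the fields.

```java
import java.util.ArrayList;
import java.util.List;

// Plain Java stand-ins for the three schemas in the quoted description.
// Nothing here uses Beam; the names PurchaseSketch and sample() are hypothetical.
final class PurchaseSketch {

  static class ShippingAddress {
    public String streetAddress; // STRING
    public String city;          // STRING
    public String state;         // nullable STRING: may be left as null
    public String country;       // STRING
    public String postCode;      // STRING
  }

  static class Transaction {
    public String bank;           // STRING
    public double purchaseAmount; // DOUBLE
  }

  static class Purchase {
    public String userId;                   // STRING
    public long itemId;                     // INT64
    public ShippingAddress shippingAddress; // ROW(ShippingAddress)
    public long cost;                       // INT64
    public List<Transaction> transactions = new ArrayList<>(); // ARRAY[ROW(Transaction)]
  }

  // Builds one made-up record; every value below is illustrative only.
  static Purchase sample() {
    ShippingAddress address = new ShippingAddress();
    address.streetAddress = "1 Example Street";
    address.city = "Exampleville";
    address.state = null; // the only field the schema marks as nullable
    address.country = "Exampleland";
    address.postCode = "00000";

    Transaction transaction = new Transaction();
    transaction.bank = "Example Bank";
    transaction.purchaseAmount = 19.99;

    Purchase purchase = new Purchase();
    purchase.userId = "user-1";
    purchase.itemId = 42L;
    purchase.shippingAddress = address;
    purchase.cost = 1999L;
    purchase.transactions.add(transaction);
    return purchase;
  }

  public static void main(String[] args) {
    Purchase p = sample();
    System.out.println(
        p.userId + " bought item " + p.itemId
            + " with " + p.transactions.size() + " transaction(s)");
  }
}
```

The nested `ShippingAddress` field and the `List<Transaction>` field correspond to the ROW and ARRAY[ROW] entries in the schema tables. In a real pipeline, the `state` field would also need to be marked as nullable for Beam to accept a null value.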



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
