[jira] [Work logged] (BEAM-3437) Support schema in PCollections

ASF GitHub Bot (JIRA) Sun, 01 Apr 2018 22:09:12 -0700

     [ https://issues.apache.org/jira/browse/BEAM-3437?focusedWorklogId=86479&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-86479 ]


ASF GitHub Bot logged work on BEAM-3437:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 02/Apr/18 05:08
            Start Date: 02/Apr/18 05:08
    Worklog Time Spent: 10m 
      Work Description: reuvenlax commented on a change in pull request #4964: [BEAM-3437] Introduce Schema class, and use it in BeamSQL
URL: https://github.com/apache/beam/pull/4964#discussion_r178487905
 
 

 ##########
 File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamAggregationTransforms.java
 ##########
 @@ -177,21 +171,30 @@ public AggregationAdaptor(List<AggregateCall> aggregationCalls, RowType sourceRo
           int refIndexKey = call.getArgList().get(0);
           int refIndexValue = call.getArgList().get(1);
 
+          FieldTypeDescriptor keyDescriptor =
+              sourceSchema.getField(refIndexKey).getTypeDescriptor();
           BeamSqlInputRefExpression sourceExpKey = new BeamSqlInputRefExpression(
-                  CalciteUtils.getFieldCalciteType(sourceRowType, refIndexKey), refIndexKey);
+              CalciteUtils.toSqlTypeName(keyDescriptor.getType(), keyDescriptor.getMetadata()),
+              refIndexKey);
+
+          FieldTypeDescriptor valueDescriptor =
+              sourceSchema.getField(refIndexValue).getTypeDescriptor();
           BeamSqlInputRefExpression sourceExpValue = new BeamSqlInputRefExpression(
-                  CalciteUtils.getFieldCalciteType(sourceRowType, refIndexValue), refIndexValue);
+              CalciteUtils.toSqlTypeName(valueDescriptor.getType(), valueDescriptor.getMetadata()),
+                  refIndexValue);
 
           sourceFieldExps.add(KV.of(sourceExpKey, sourceExpValue));
         } else {
           int refIndex = call.getArgList().size() > 0 ? call.getArgList().get(0) : 0;
+          FieldTypeDescriptor typeDescriptor = sourceSchema.getField(refIndex).getTypeDescriptor();
 
 Review comment:
   > **akedin** wrote:
   > It's unclear where we're supposed to be using `FieldTypeDescriptor` vs
   > `FieldType`. Can they be combined? So that, for example, all fields in
   > `FieldType` become instances of `FieldTypeDescriptor`. Do we need both?
   
   FieldType is an enum that identifies the type of a field. FieldTypeDescriptor
   contains the extra information needed to resolve the type (e.g., the type of the
   component element or the schema of the row).
   
   I think it is possible to merge the classes (since Java enums are just
   classes), but I think it's better to maintain a clean separation between
   primitive field types and the recursive spec needed to resolve a type. Also,
   while it's OK to add some extra convenience functionality to an enum class,
   making it a fully recursive type seems like an abuse of enums.
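The separation described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the code from PR #4964: the enum constants, the `Field`/`Schema` helpers, and the `of`/`arrayOf`/`rowOf` factories are hypothetical, chosen only to show why the recursive part sits in its own class rather than inside the enum.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Flat, non-recursive type names (hypothetical set, for illustration only).
enum FieldType { INT32, INT64, STRING, BOOLEAN, ARRAY, ROW }

// A named field whose full type is given by a descriptor (hypothetical helper).
final class Field {
  final String name;
  final FieldTypeDescriptor typeDescriptor;

  Field(String name, FieldTypeDescriptor typeDescriptor) {
    this.name = Objects.requireNonNull(name);
    this.typeDescriptor = Objects.requireNonNull(typeDescriptor);
  }
}

// An ordered list of fields, looked up by index (hypothetical helper).
final class Schema {
  private final List<Field> fields;

  Schema(List<Field> fields) {
    this.fields = Collections.unmodifiableList(new ArrayList<>(fields));
  }

  Field getField(int index) { return fields.get(index); }

  int getFieldCount() { return fields.size(); }
}

// Recursive descriptor: the enum names the kind of type, while this class carries
// what is needed to resolve it (element type for ARRAY, nested schema for ROW).
final class FieldTypeDescriptor {
  private final FieldType type;
  private final FieldTypeDescriptor componentType; // set only for ARRAY
  private final Schema rowSchema;                  // set only for ROW

  private FieldTypeDescriptor(FieldType type, FieldTypeDescriptor componentType, Schema rowSchema) {
    this.type = type;
    this.componentType = componentType;
    this.rowSchema = rowSchema;
  }

  static FieldTypeDescriptor of(FieldType type) {
    if (type == FieldType.ARRAY || type == FieldType.ROW) {
      throw new IllegalArgumentException(type + " needs a component type or a row schema");
    }
    return new FieldTypeDescriptor(type, null, null);
  }

  static FieldTypeDescriptor arrayOf(FieldTypeDescriptor component) {
    return new FieldTypeDescriptor(FieldType.ARRAY, Objects.requireNonNull(component), null);
  }

  static FieldTypeDescriptor rowOf(Schema schema) {
    return new FieldTypeDescriptor(FieldType.ROW, null, Objects.requireNonNull(schema));
  }

  FieldType getType() { return type; }

  // Renders the fully resolved type, recursing into array elements and nested rows.
  @Override
  public String toString() {
    switch (type) {
      case ARRAY:
        return "ARRAY<" + componentType + ">";
      case ROW: {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < rowSchema.getFieldCount(); i++) {
          Field f = rowSchema.getField(i);
          parts.add(f.name + " " + f.typeDescriptor);
        }
        return "ROW<" + String.join(", ", parts) + ">";
      }
      default:
        return type.name();
    }
  }
}

class SchemaSketchDemo {
  public static void main(String[] args) {
    Schema address = new Schema(Arrays.asList(
        new Field("city", FieldTypeDescriptor.of(FieldType.STRING)),
        new Field("zip", FieldTypeDescriptor.of(FieldType.INT32))));
    FieldTypeDescriptor addresses = FieldTypeDescriptor.arrayOf(FieldTypeDescriptor.rowOf(address));
    System.out.println(addresses.getType()); // prints ARRAY (the enum alone)
    System.out.println(addresses);           // prints ARRAY<ROW<city STRING, zip INT32>>
  }
}
```

Because each enum constant is a singleton, `ARRAY` by itself cannot record which element type it holds; keeping that in the descriptor lets the enum stay a flat list of type names.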
   
   For reference, the two classes are _roughly_ equivalent to the SqlTypeName
   and RelDataType in Calcite. Would it be clearer if I renamed FieldType ->
   TypeName?
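To make the Calcite analogy concrete, here is a hypothetical sketch of a mapping in the spirit of the `CalciteUtils.toSqlTypeName(type, metadata)` call in the diff. The diff does not show that method's body, so everything below is an assumption: it returns plain strings named after Calcite `SqlTypeName` constants instead of the Calcite enum itself, and it treats the metadata as an optional string hint whose values (`"CHAR"`, `"DATE"`, `"TIME"`) are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins; these are not the Beam or Calcite classes.
enum FieldType { INT32, INT64, STRING, DATETIME, BOOLEAN }

final class FieldTypeDescriptor {
  private final FieldType type;
  private final String metadata; // optional refinement hint, may be null (assumption)

  FieldTypeDescriptor(FieldType type, String metadata) {
    this.type = type;
    this.metadata = metadata;
  }

  FieldType getType() { return type; }

  String getMetadata() { return metadata; }
}

final class SqlTypes {
  // The enum alone picks a default SQL type name (like SqlTypeName);
  // the hypothetical metadata hint refines it where one enum value covers several SQL types.
  static String toSqlTypeName(FieldType type, String metadata) {
    switch (type) {
      case INT32: return "INTEGER";
      case INT64: return "BIGINT";
      case BOOLEAN: return "BOOLEAN";
      case STRING: return "CHAR".equals(metadata) ? "CHAR" : "VARCHAR";
      case DATETIME:
        if ("DATE".equals(metadata)) return "DATE";
        if ("TIME".equals(metadata)) return "TIME";
        return "TIMESTAMP";
      default:
        throw new IllegalArgumentException("Unsupported type: " + type);
    }
  }
}

class SqlTypeDemo {
  public static void main(String[] args) {
    // Field types of a hypothetical source schema, resolved by index as the diff does.
    List<FieldTypeDescriptor> sourceFields = Arrays.asList(
        new FieldTypeDescriptor(FieldType.INT64, null),
        new FieldTypeDescriptor(FieldType.STRING, "CHAR"),
        new FieldTypeDescriptor(FieldType.DATETIME, "DATE"));
    for (int refIndex = 0; refIndex < sourceFields.size(); refIndex++) {
      FieldTypeDescriptor d = sourceFields.get(refIndex);
      // prints "0 -> BIGINT", "1 -> CHAR", "2 -> DATE"
      System.out.println(refIndex + " -> " + SqlTypes.toSqlTypeName(d.getType(), d.getMetadata()));
    }
  }
}
```

In this sketch the enum plays the role of a type name only; anything needed to pick among closely related SQL types travels alongside it in the descriptor, mirroring the name-versus-full-type split the comment draws between SqlTypeName and RelDataType.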



Issue Time Tracking
-------------------

    Worklog Id:     (was: 86479)
    Time Spent: 3h 10m  (was: 3h)

> Support schema in PCollections
> ------------------------------
>
>                 Key: BEAM-3437
>                 URL: https://issues.apache.org/jira/browse/BEAM-3437
>             Project: Beam
>          Issue Type: Wish
>          Components: beam-model
>            Reporter: Jean-Baptiste Onofré
>            Assignee: Jean-Baptiste Onofré
>            Priority: Major
>          Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> As discussed with some people in the team, it would be great to add schema
> support in {{PCollections}}. It will allow us to:
> 1. Expect some data types in {{PTransforms}}.
> 2. Improve some runners with additional features (I'm thinking about the Spark
> runner with data frames, for instance).
> A technical draft document has been created: 
> https://docs.google.com/document/d/1tnG2DPHZYbsomvihIpXruUmQ12pHGK0QIvXS1FOTgRc/edit?disco=AAAABhykQIs&ts=5a203b46&usp=comment_email_document
> I also started a PoC on a branch; I will update this Jira with a "discussion"
> PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
