piotr-szuberski commented on pull request #13319:
URL: https://github.com/apache/beam/pull/13319#issuecomment-727823590


   > My current question is, for the problem you find that ARRAY is not well supported, what part of design have it affected?
   
   It affects the ability to read multiple cells with fields other than `value`, e.g.:
   ```
   key VARCHAR,
   family1 ROW<
     column1 ARRAY<ROW<val BOOLEAN, timestampMicros BIGINT>>
   >
   ```
   For now it's only possible to get an array of the cells' values.
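   For comparison, this is roughly the shape that is currently supported (a sketch; the element type is illustrative):
   ```
   key VARCHAR,
   family1 ROW<
     column1 ARRAY<BOOLEAN>
   >
   ```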
   
   In general, Flink and Spark don't support an array of cell values. Instead they provide only the most recent value, without metadata like `timestampMicros` or `labels`.
   
   Flink's approach is already implemented.
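   For illustration, the Flink-style schema looks roughly like this (a sketch; family and column names and types are made up), with each column holding only its latest value:
   ```
   key VARCHAR,
   family1 ROW<
     column1 BOOLEAN,
     column2 BIGINT
   >,
   family2 ROW<
     column3 VARCHAR
   >
   ```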
   
   Spark's approach is to skip families and columns in the schema definition, provide just the values, and then map each value to its family and column via TBLPROPERTIES, in a format like:
   ```
   schemaField1 BIGINT,
   schemaField2 VARCHAR
   TBLPROPERTIES { "hbase.columns.mapping": "familyName:columnName:schemaField1,familyName:columnName:schemaField2" }
   ```
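   In Beam SQL's DDL that could look something like the sketch below (the `TYPE`, the `LOCATION` placeholder, and the property name are hypothetical, only to show the shape of a flattened schema plus a mapping property):
   ```
   CREATE EXTERNAL TABLE exampleTable (
     key VARCHAR,
     schemaField1 BIGINT,
     schemaField2 VARCHAR
   )
   TYPE 'bigtable'
   LOCATION '...'
   TBLPROPERTIES '{"columnsMapping": "familyName:columnName:schemaField1,familyName:columnName:schemaField2"}'
   ```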
   
   On the other hand, BigQuery does support it.
   
   Maybe, for simplicity's sake, we should just stick to the Flink and Spark approach?
   
   I can see that in the Pubsub table, for some reason, it's only possible to use a flattened schema for the Write operation. If it's not possible to do so with ROW, then Spark's approach would fit it well.
   

