piotr-szuberski commented on pull request #13319: URL: https://github.com/apache/beam/pull/13319#issuecomment-727823590

> My current question is: for the problem you found, that ARRAY is not well supported, what part of the design has it affected?

It affects the possibility of reading multiple cells with more fields than `value`, e.g.:

```
key VARCHAR,
family1 ROW<
  column1 ARRAY<ROW<val BOOLEAN, timestampMicros BIGINT>>
>
```

For now it's only possible to get the array of cell values.

In general, Flink and Spark don't support an array of values. Instead, they provide only the most recent value, without metadata like `timestampMicros` or `labels`. Flink's approach is already implemented. Spark's approach is to skip families and columns in the schema definition, provide just the values, and then add a mapping for each value in TBLPROPERTIES, in the format:

```
schemaField1 BIGINT,
schemaField2 VARCHAR
TBLPROPERTIES {
  "hbase.columns.mapping": "familyName:columnName:schemaField1,familyName:columnName:schemaField2"
}
```

On the other hand, BigQuery does support it. Maybe for simplicity's sake we should just stick to the Flink and Spark versions?

I can see that in the Pubsub table, for some reason, it's only possible to use a flattened schema for the Write operation. If it's not possible to do so with ROW, then Spark's approach would fit it well.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
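To make the Spark-style mapping concrete, here is a minimal sketch of how the quoted `hbase.columns.mapping` property could be parsed into per-field `(family, column)` pairs. The function name and return shape are purely illustrative assumptions, not part of the Beam, Spark, or Flink APIs:

```python
def parse_columns_mapping(mapping: str) -> dict:
    """Parse a Spark-style "family:column:field" mapping string
    (comma-separated entries) into {schema_field: (family, column)}.

    Hypothetical helper for illustration only.
    """
    result = {}
    for entry in mapping.split(","):
        family, column, field = entry.strip().split(":")
        result[field] = (family, column)
    return result


mapping = ("familyName:columnName:schemaField1,"
           "familyName:columnName:schemaField2")
print(parse_columns_mapping(mapping))
# {'schemaField1': ('familyName', 'columnName'),
#  'schemaField2': ('familyName', 'columnName')}
```

With such a mapping, the table schema stays flat (`schemaField1 BIGINT, schemaField2 VARCHAR`) and the family/column routing lives entirely in the table properties, which is what makes this approach attractive when nested ROW/ARRAY types are hard to support.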
> My current question is, for the problem you find that ARRAY is not well supported, what part of design have it affected? It affects the possibility of reading multiple cells with more fields than `value`, e.g.: ``` key VARCHAR, family1 ROW< column1 ARRAY<ROW<val BOOLEAN, timestampMicros BIGINT>> > ``` For now it's possible only to get the array of cell's values. In general Flink and Spark don't support the array of values. Instead they provide only recent values without the metadata like `timestampMicros` or `labels`. The Flink's approach is already implemented. Spark's approach is to skip families and columns in the schema definition, provide just values and then add mapping to each value in the TBLPROPERTIES in format ``` schemaField1 BIGINT, schemaField2 VARCHAR TBLPROPERTIES { "hbase.columns.mapping": "familyName:columnName:schemaField1,familyName:columnName:schemaField2"` ``` On the other hand BigQuery does support it. Maybe for the simplicity's sake should we just stick to the Flink and Spark version? I can see that in Pubsub table for some reason it's possible to use only flattened schema for the Write operation. If it's not possible to do so with ROW then Spark's approach would fit well for it. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org