[ https://issues.apache.org/jira/browse/ARROW-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17222490#comment-17222490 ]
Bryan Cutler commented on ARROW-8714:
-------------------------------------

+1 on the proposal of having a list array for the data (of the same type as the tensor) and a second array for the shape. For the shape, a list array of ints would work, but it could also be possible to modify Tensor.fbs slightly to have a TensorShape message. That might have some benefit in keeping the size down for lots of small tensors, but I'm not sure it's worth the added complexity.

I also had another thought: if the shape for each tensor added an additional outer dimension to represent how many records are in each tensor, that would allow us to use a single tensor extension type for both variable and constant dimensions. For example, say you have 10 tensors of shape (2, 3) stacked in a single ndarray of (10, 2, 3); then the shape array would have a single entry {{[(10, 2, 3)]}}. If you have 10 tensors of varying shapes, then each one will have a 1 added as the outer dimension, so 10 entries with {{[(1, 2, 3), (1, 5, 3), (1, 4, 3), ...]}}. It would be a little redundant having the 1's in this case, but this would also allow combining smaller batches: say, 10 tensors where 5 have the same dims would give you {{[(5, 2, 3), (5, 4, 6)]}}.

What do you think of this [~chrish42] and [~jorisvandenbossche] ?

> [C++] Add a Tensor logical value type with varying dimensions, implemented
> using ExtensionType
> ----------------------------------------------------------------------------------------------
>
>                 Key: ARROW-8714
>                 URL: https://issues.apache.org/jira/browse/ARROW-8714
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Format
>            Reporter: Christian Hudon
>            Priority: Major
>
> Support for tensor in Table, RecordBatch, etc. where each row is a tensor of
> a different shape (e.g images of different sizes), but of the same underlying
> type (e.g. int32). Implemented as an ExtensionType, so no need to change the
> format.
> I don't see needing each row being a tensor with a different number of
> dimensions, so if the implementation for that falls out easily of the use
> case with each row in the table having a tensor with the same number of
> dimensions, great. If it adds a lot of complexity, that case would be
> postponed.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
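
For reference, the outer-dimension idea from the comment above can be sketched in plain Python. This is only an illustration under stated assumptions, not Arrow code: the helper names `batch_shapes` and `entry_offsets` are hypothetical, per-tensor shapes are assumed to arrive as tuples, only consecutive tensors with equal shapes are merged, and the data is assumed to live in one flat, row-major buffer.

```python
from itertools import groupby
from math import prod


def batch_shapes(shapes):
    """Collapse runs of consecutive equal shapes into (count, *shape) entries.

    Hypothetical helper: it mirrors the shape array described in the comment,
    where the extra outer dimension records how many tensors share a shape.
    """
    return [(len(list(run)), *shape) for shape, run in groupby(shapes)]


def entry_offsets(batched):
    """Element offsets into an assumed flat data buffer, one per batched entry.

    The product of an entry's dimensions (including the outer count) is the
    number of elements that entry occupies.
    """
    offsets = [0]
    for entry in batched:
        offsets.append(offsets[-1] + prod(entry))
    return offsets


# 10 tensors of shape (2, 3): a single entry, as for a constant-shape column.
constant = batch_shapes([(2, 3)] * 10)

# Tensors whose shapes all differ: each entry carries an outer dimension of 1.
varying = batch_shapes([(2, 3), (5, 3), (4, 3)])

# 5 tensors of one shape followed by 5 of another: two entries.
mixed = batch_shapes([(2, 3)] * 5 + [(4, 6)] * 5)

print(constant)              # [(10, 2, 3)]
print(varying)               # [(1, 2, 3), (1, 5, 3), (1, 4, 3)]
print(mixed)                 # [(5, 2, 3), (5, 4, 6)]
print(entry_offsets(mixed))  # [0, 30, 150]
```

Merging only consecutive runs keeps the original row order intact, which is why the all-different case degenerates to one entry per tensor with a redundant leading 1.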