[ 
https://issues.apache.org/jira/browse/ARROW-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17222490#comment-17222490
 ] 

Bryan Cutler commented on ARROW-8714:
-------------------------------------

+1 on the proposal of having a list array for the data (of the same type as the 
tensor) and a second array for the shape. For the shape, a list array of ints 
would work, but it would also be possible to modify Tensor.fbs slightly to add 
a TensorShape message. That might help keep the size down for lots of small 
tensors, but I'm not sure it's worth the added complexity.
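
To make the two-array layout concrete, here is a minimal sketch in plain Python rather than the Arrow API. The helper names {{to_columns}} and {{element}} are hypothetical, and row-major element order is an assumption; nothing in this issue fixes it.

```python
from math import prod


def to_columns(tensors):
    """Split (shape, flat_values) pairs into a data column and a shape column.

    Each row of the data column holds one tensor's values flattened in
    row-major order (an assumption); the shape column holds that row's dims.
    """
    data_col, shape_col = [], []
    for shape, values in tensors:
        values = list(values)
        if len(values) != prod(shape):
            raise ValueError(f"shape {tuple(shape)} does not match {len(values)} values")
        data_col.append(values)
        shape_col.append(list(shape))
    return data_col, shape_col


def element(data_col, shape_col, row, index):
    """Read one element of the tensor stored in `row` at a multi-dimensional index."""
    shape = shape_col[row]
    if len(index) != len(shape):
        raise IndexError("index rank does not match tensor rank")
    flat = 0
    for dim, i in zip(shape, index):
        if not 0 <= i < dim:
            raise IndexError("index out of bounds")
        flat = flat * dim + i  # row-major flattening
    return data_col[row][flat]


# Two rows: a (2, 3) tensor and a (1, 2) tensor of the same value type.
data, shapes = to_columns([((2, 3), range(6)), ((1, 2), [10, 11])])
```

With this layout every row can have a different shape, at the cost of storing one shape entry per row.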

I also had another thought: if the shape for each tensor included an additional 
outer dimension giving the number of records in that tensor, we could use a 
single tensor extension type for both variable and constant dimensions. For 
example, say you have 10 tensors of shape (2, 3) stacked in a single ndarray of 
shape (10, 2, 3); then the shape array would have a single entry, 
{{[(10, 2, 3)]}}. If you have 10 tensors of varying shapes, then each one gets 
a 1 as its outer dimension, giving 10 entries: {{[(1, 2, 3), (1, 5, 3), (1, 4, 
3), ...]}}. The 1's would be a little redundant in this case, but this would 
also allow combining smaller batches: say, 10 tensors where 5 have one shape 
and 5 have another would give you {{[(5, 2, 3), (5, 4, 6)]}}. What do you think 
of this, [~chrish42] and [~jorisvandenbossche]?
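
The outer-dimension idea can be sketched in plain Python as follows. The function names {{batch_shapes}} and {{unbatch_shapes}} are hypothetical, and merging only consecutive tensors with identical shapes is an assumption about how batches would be formed.

```python
from itertools import groupby


def batch_shapes(shapes):
    """Collapse runs of consecutive equal shapes into (count, *shape) entries."""
    return [
        (len(list(run)), *shape)
        for shape, run in groupby(map(tuple, shapes))
    ]


def unbatch_shapes(batched):
    """Expand (count, *shape) entries back into one shape per tensor."""
    shapes = []
    for count, *shape in batched:
        shapes.extend([tuple(shape)] * count)
    return shapes


# Constant shape: ten (2, 3) tensors collapse to a single entry.
constant = batch_shapes([(2, 3)] * 10)

# Fully variable: every entry keeps a leading 1.
variable = batch_shapes([(2, 3), (5, 3), (4, 3)])

# Mixed: two runs of five tensors each.
mixed = batch_shapes([(2, 3)] * 5 + [(4, 6)] * 5)
```

Here {{constant}} is {{[(10, 2, 3)]}}, {{variable}} is {{[(1, 2, 3), (1, 5, 3), (1, 4, 3)]}}, and {{mixed}} is {{[(5, 2, 3), (5, 4, 6)]}}, matching the three cases described above.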

> [C++] Add a Tensor logical value type with varying dimensions, implemented 
> using ExtensionType
> ----------------------------------------------------------------------------------------------
>
>                 Key: ARROW-8714
>                 URL: https://issues.apache.org/jira/browse/ARROW-8714
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Format
>            Reporter: Christian Hudon
>            Priority: Major
>
> Support for tensor in Table, RecordBatch, etc. where each row is a tensor of 
> a different shape (e.g images of different sizes), but of the same underlying 
> type (e.g. int32). Implemented as an ExtensionType, so no need to change the 
> format. 
> I don't see needing each row being a tensor with a different number of 
> dimensions, so if the implementation for that falls out easily of the use 
> case with each row in the table having a tensor with the same number of 
> dimensions, great. If it adds a lot of complexity, that case would be 
> postponed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
