rok commented on issue #12553: URL: https://github.com/apache/arrow/issues/12553#issuecomment-1151378310
> So now my only question is, while this seems like an optimal generalized solution for storage, how much computation is required to explode back out to the dense form in memory to do computation?

I've not really benchmarked the conversion when implementing, but I think it will depend heavily on your non-null distribution and even on dimension order (!). It should be pretty easy to benchmark though: just time `sparse_tensor = pa.SparseCSFTensor.from_dense_numpy(np_array)`.

> In our simple implementation since we are going by whole dimensions only, we can just use broadcast when necessary and then collapse back so the underlying data is just normal numpy arrays?

I want to say yes, but I'm not 100% sure what you mean. Going from `pa.Tensor` to `np.array` and back should be zero-copy AFAIK. Someone correct me if I'm wrong, please!
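A minimal benchmarking sketch along those lines, assuming `pyarrow` and `numpy` are installed; the array shape and density values are illustrative choices, not from this thread:

```python
import time

import numpy as np
import pyarrow as pa

rng = np.random.default_rng(42)

for density in (0.01, 0.1, 0.5):
    # Build a dense array where roughly `density` of the entries are non-zero.
    dense = rng.random((200, 200, 50))
    dense[dense > density] = 0.0

    # Time the dense -> SparseCSFTensor conversion.
    start = time.perf_counter()
    sparse_tensor = pa.SparseCSFTensor.from_dense_numpy(dense)
    elapsed = time.perf_counter() - start
    print(f"density={density:.2f}: from_dense_numpy took {elapsed:.4f}s, "
          f"non-zeros={sparse_tensor.non_zero_length}")

# Dense pa.Tensor <-> np.ndarray round-trip, which should be zero-copy.
tensor = pa.Tensor.from_numpy(dense)
roundtrip = tensor.to_numpy()
print("shares memory with original:", np.shares_memory(dense, roundtrip))
```

The last check prints whether the round-tripped array shares its buffer with the original, which is one way to confirm the zero-copy claim empirically.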
