rok commented on issue #12553: URL: https://github.com/apache/arrow/issues/12553#issuecomment-1151378310
> So now my only question is, while this seems like an optimal generalized solution for storage, how much computation is required to explode back out to the dense form in memory to do computation?

I've not really benchmarked the conversion when implementing, but I think it will depend heavily on your non-null distribution and even on dimension order (!). It should be pretty easy to benchmark though: just time `sparse_tensor = pa.SparseCSFTensor.from_dense_numpy(np_array)`.

> In our simple implementation since we are going by whole dimensions only, we can just use broadcast when necessary and then collapse back so the underlying data is just normal numpy arrays?

I want to say yes, but I'm not 100% sure what you mean. Going from `pa.Tensor` to `np.array` and back should be zero-copy AFAIK. Someone correct me if I'm wrong, please!
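A minimal benchmarking sketch along those lines, assuming `pyarrow` and `numpy` are installed; the array shape and density values are illustrative choices, not from this thread:

```python
import time

import numpy as np
import pyarrow as pa

rng = np.random.default_rng(42)

for density in (0.01, 0.1, 0.5):
    # Build a dense array where roughly `density` of the entries are non-zero.
    dense = rng.random((200, 200, 50))
    dense[dense > density] = 0.0

    # Time the dense -> SparseCSFTensor conversion.
    start = time.perf_counter()
    sparse_tensor = pa.SparseCSFTensor.from_dense_numpy(dense)
    elapsed = time.perf_counter() - start
    print(f"density={density:.2f}: from_dense_numpy took {elapsed:.4f}s, "
          f"non-zeros={sparse_tensor.non_zero_length}")

# Dense pa.Tensor <-> np.ndarray round-trip, which should be zero-copy.
tensor = pa.Tensor.from_numpy(dense)
roundtrip = tensor.to_numpy()
print("shares memory with original:", np.shares_memory(dense, roundtrip))
```

The last check prints whether the round-tripped array shares its buffer with the original, which is one way to confirm the zero-copy claim empirically.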
