velvia edited a comment on pull request #9376:
URL: https://github.com/apache/arrow/pull/9376#issuecomment-778693797


   Commenting since @maxburke pinged me.
   
   On the surface I think this is a great change from a performance 
perspective. I totally agree that being able to work with scalars instead of 
only arrays opens up a lot of room for optimization. I have long suspected 
that always having to materialize intermediate arrays slows DataFusion down 
significantly in certain cases, and it is also cache-unfriendly.
   
   On the other hand, I hear what @alamb and others are saying: it adds 
complexity to code that is already nontrivial. I agree with that.
   
   I wonder if it is possible to get the best of both worlds by extending the 
`Array` trait slightly and adding an implementation of `Array` that denotes a 
scalar, say a `ScalarArray`, which needs no `Buffer` storage and just 
represents a constant value. Functions would then only need to deal with 
`Array`, but could recognize a `ScalarArray` and optimize for it. This is a 
very half-formed thought at the moment. The train of thought is simply: what 
if an `Array` were not strictly a buffer-based representation but just a way 
to access columnar data, which in certain cases represents a scalar?
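   A minimal, self-contained Rust sketch of that idea, under loud assumptions: 
`ColumnarArray`, `BufferArray` and `add` are hypothetical names, the shape of 
`ScalarArray` is my guess, and the trait is a tiny stand-in rather than arrow's 
real `Array` trait (which has a much larger surface). It shows a kernel written 
only against the trait that still detects a constant input through an 
`as_scalar` hook and broadcasts it instead of reading a per-row buffer.

```rust
// Hypothetical sketch: none of these names are Arrow or DataFusion APIs.

/// Tiny stand-in for a columnar-access trait (hypothetical).
trait ColumnarArray {
    /// Number of logical rows.
    fn len(&self) -> usize;
    /// Logical value at row `i`.
    fn value(&self, i: usize) -> i64;
    /// `Some(v)` when every row holds the same constant `v`.
    /// Buffer-backed arrays keep this default of `None`.
    fn as_scalar(&self) -> Option<i64> {
        None
    }
}

/// Buffer-backed column: one stored value per row.
struct BufferArray {
    values: Vec<i64>,
}

impl ColumnarArray for BufferArray {
    fn len(&self) -> usize {
        self.values.len()
    }
    fn value(&self, i: usize) -> i64 {
        self.values[i]
    }
}

/// Constant column: one value plus a logical length, no per-row buffer.
struct ScalarArray {
    value: i64,
    len: usize,
}

impl ColumnarArray for ScalarArray {
    fn len(&self) -> usize {
        self.len
    }
    fn value(&self, _i: usize) -> i64 {
        self.value
    }
    fn as_scalar(&self) -> Option<i64> {
        Some(self.value)
    }
}

/// Add kernel that only sees the trait but specializes on constant inputs.
fn add(left: &dyn ColumnarArray, right: &dyn ColumnarArray) -> Vec<i64> {
    assert_eq!(left.len(), right.len(), "inputs must have equal length");
    match (left.as_scalar(), right.as_scalar()) {
        // Both constant: compute the sum once, then fill.
        (Some(a), Some(b)) => vec![a + b; left.len()],
        // One side constant: broadcast it, no intermediate array needed.
        (Some(a), None) => (0..right.len()).map(|i| a + right.value(i)).collect(),
        (None, Some(b)) => (0..left.len()).map(|i| left.value(i) + b).collect(),
        // General case: element-wise over both inputs.
        (None, None) => (0..left.len()).map(|i| left.value(i) + right.value(i)).collect(),
    }
}

fn main() {
    let col = BufferArray { values: vec![1, 2, 3] };
    let five = ScalarArray { value: 5, len: 3 };
    println!("{:?}", add(&col, &five)); // prints [6, 7, 8]
}
```

The hook defaults to `None`, so existing buffer-backed implementations need no 
changes, and a kernel that ignores it still gets correct results via `value(i)`.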

