I was looking at compression in Arrow and had a couple of questions.

If I've understood compression correctly, it is currently only used 'in
flight', in either IPC or Arrow Flight, using block compression, and the data
is still decoded into RAM at the destination in full array form. Is this
correct?


Given that Arrow is a columnar format, has any thought been given to an option
to keep the data compressed both in memory and in flight, using some of the
columnar compression techniques?
As I deal primarily with numerical time-series data, I was thinking that some
of the algorithms from the Gorilla paper [1], such as XOR encoding for floats
and delta-of-delta encoding for timestamps, or something similar, might be
appropriate.
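
To make the timestamp idea concrete, here is a minimal Python sketch of
delta-of-delta encoding. It is only an illustration: the function names
dod_encode and dod_decode are hypothetical, nothing here is an Arrow API, and
the variable-length bit packing that Gorilla [1] applies to the resulting
small integers is left out.

```python
from typing import List, Tuple


def dod_encode(timestamps: List[int]) -> Tuple[int, List[int]]:
    """Return (first timestamp, list of delta-of-delta values).

    The first delta is stored as a delta-of-delta against an assumed
    previous delta of 0, so the output needs no special first-delta field.
    """
    if not timestamps:
        raise ValueError("need at least one timestamp")
    dods: List[int] = []
    prev = timestamps[0]
    prev_delta = 0
    for t in timestamps[1:]:
        delta = t - prev
        dods.append(delta - prev_delta)  # near zero for regular sampling
        prev_delta = delta
        prev = t
    return timestamps[0], dods


def dod_decode(first: int, dods: List[int]) -> List[int]:
    """Rebuild the original timestamps from the encoded form."""
    values = [first]
    delta = 0
    for d in dods:
        delta += d                       # recover the running delta
        values.append(values[-1] + delta)
    return values


# Samples taken roughly every 60 seconds, with one sample a second late.
ts = [1000, 1060, 1120, 1180, 1241, 1301]
first, dods = dod_encode(ts)
restored = dod_decode(first, dods)
```

For regularly sampled data most of the delta-of-delta values are zero or very
small, which is what makes a variable-length bit encoding effective.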

The interface functions could still iterate over the data and produce raw
values, so the compression would be transparent to users of the data, while
the data blocks/arrays in memory would actually be compressed.
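
As a rough sketch of what such a transparent interface might look like, the
Python class below keeps a float column as the XOR of consecutive bit patterns
(the idea behind Gorilla's float encoding [1]) and decodes values only while
iterating. The class name XorFloatColumn is hypothetical and not an Arrow API,
and the XOR results are held as plain Python ints here, so the bit packing
that would actually save memory is omitted.

```python
import struct
from typing import Iterable, Iterator, List


def _float_to_bits(x: float) -> int:
    """Reinterpret a 64-bit float as an unsigned 64-bit integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def _bits_to_float(b: int) -> float:
    """Reinterpret an unsigned 64-bit integer as a 64-bit float."""
    return struct.unpack(">d", struct.pack(">Q", b))[0]


class XorFloatColumn:
    """Hypothetical column that stores floats XOR-ed against their predecessor."""

    def __init__(self, values: Iterable[float]) -> None:
        self._xors: List[int] = []
        prev = 0
        for v in values:
            bits = _float_to_bits(v)
            self._xors.append(bits ^ prev)  # 0 when the value repeats
            prev = bits

    def __len__(self) -> int:
        return len(self._xors)

    def __iter__(self) -> Iterator[float]:
        # Decode lazily: callers see raw floats, storage stays encoded.
        prev = 0
        for x in self._xors:
            prev ^= x
            yield _bits_to_float(prev)


col = XorFloatColumn([1.5, 1.5, 2.0, 2.25])
decoded = list(col)
```

A consumer simply iterates over col and never sees the encoded form; repeated
or slowly changing values give XOR results with many zero bits, which a real
implementation would pack tightly.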

With this method, blocks could come out of a database or other source, pass
through the data service, travel across the wire (Flight), and land in the
consuming application's memory without ever being decompressed or processed
until final use.


Crazy thought?


Regards

Mark. 


[1]: https://www.vldb.org/pvldb/vol8/p1816-teller.pdf
