liusitan commented on issue #13875:
URL: https://github.com/apache/arrow/issues/13875#issuecomment-1214694078

   The reason I am hacking on the Arrow IPC format is that I am currently 
implementing a FUSE filesystem for vineyard, an immutable storage manager 
that uses a columnar format, as Arrow does. 
   
   We decided to let our clients access vineyard objects by reading them in 
the Arrow IPC format. This means that when a client wants to access objects 
stored in vineyard, the FUSE filesystem serializes the corresponding vineyard 
objects on the fly, keeps the result in the FUSE process, and serves the 
serialized, Arrow-formatted objects to the client. However, this approach 
may lead to heavy memory usage.
   
   We are wondering whether it is possible to create a mapping between the 
Arrow IPC format and the vineyard objects. Judging by the information stored 
in the vineyard objects' metadata, this is entirely possible, especially for 
dataframes, which we also store column by column. 
   
   In theory, if a user wants to access bytes 100 to 200 of an Arrow-formatted 
vineyard object, and that range conceptually falls within the first column, 
my implementation can work out from the byte range which data it represents, 
fetch that data from vineyard, serialize it, and return what the client 
asked for.  
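
   As a minimal sketch of that lookup, assuming the layout of the serialized 
stream were already known: the `SEGMENTS` table, its offsets and sizes, and 
the `locate` helper below are all hypothetical illustrations, not part of 
vineyard or Arrow.

```python
from bisect import bisect_right

# Hypothetical layout of one serialized stream: (name, start_offset, length).
# The numbers are made up for illustration; real offsets would come from a
# precomputed description of the Arrow IPC stream.
SEGMENTS = [
    ("schema message", 0, 64),
    ("batch metadata", 64, 32),
    ("column 0 values", 96, 400),
    ("column 1 values", 496, 400),
]


def locate(segments, start, end):
    """Map the byte range [start, end) to (name, offset_in_segment, length) pieces."""
    starts = [seg_start for _, seg_start, _ in segments]
    # Index of the last segment that begins at or before `start`.
    i = max(bisect_right(starts, start) - 1, 0)
    pieces = []
    while i < len(segments) and segments[i][1] < end:
        name, seg_start, seg_len = segments[i]
        lo = max(start, seg_start)
        hi = min(end, seg_start + seg_len)
        if lo < hi:
            pieces.append((name, lo - seg_start, hi - lo))
        i += 1
    return pieces


# A read of bytes 100..200 falls entirely inside the first column's values.
print(locate(SEGMENTS, 100, 200))
```

   With such a table, a FUSE read would only need to materialize the pieces 
that the requested range touches, rather than the whole serialized object.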
   
   In practice, I have not yet found a way to precompute the size of each 
part of the serialized Arrow IPC format from the documentation available so 
far. After condensing the problem, I raised the question above.
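
   For the body portion of a record batch message, a rough sketch is possible 
under one assumption: that buffers are written end to end and each is padded 
to an 8-byte boundary, as the Arrow columnar format document describes. The 
helper names `pad8` and `body_layout` and the example buffer sizes are 
hypothetical, and the sketch does not cover the size of the Flatbuffers 
metadata that precedes each body.

```python
def pad8(n: int) -> int:
    """Round n up to the next multiple of 8."""
    return (n + 7) & ~7


def body_layout(buffer_lengths):
    """Return ([(offset, length), ...], total_body_length) for one batch body.

    Assumes each buffer starts at the next 8-byte boundary after the
    previous one, so the total body length is also a multiple of 8.
    """
    layout = []
    pos = 0
    for length in buffer_lengths:
        layout.append((pos, length))
        pos += pad8(length)
    return layout, pos


# Hypothetical batch: one int32 column with 10 rows and a validity bitmap.
# Validity bitmap: ceil(10 / 8) = 2 bytes; values: 10 * 4 = 40 bytes.
layout, body_length = body_layout([2, 40])
print(layout, body_length)
```

   Whether a writer emits a validity bitmap for a column without nulls, and 
which alignment it uses, would have to be checked against the actual 
implementation before relying on numbers like these.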


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
