pbower commented on issue #11239:
URL: https://github.com/apache/arrow/issues/11239#issuecomment-2122344991

   Hi, apologies for resurrecting this discussion so late, but I have a 
general question about it, and there don't seem to be many other places 
to discuss it.
   
   I might be missing something, but isn't the whole point of PyArrow to be 
a leading data interchange format? If we're pickling objects and sending them 
over the network, doesn't that presume that the language on the other end 
is Python? The whole point (at least for me) of looking for this 
functionality was to read the data into a PyArrow object on the other side and 
have it handle the mapping for the various data types that 
PyArrow supports.
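   
   To illustrate the concern, here is a minimal sketch using only the Python 
standard library. The `record` payload is hypothetical, and JSON merely stands 
in for "any language-neutral encoding"; it is not a claim about what PyArrow 
does internally.

```python
import json
import pickle

# Hypothetical payload, used only for illustration.
record = {"id": 1, "values": [1.5, 2.5]}

# Pickle emits a Python-specific opcode stream. Protocol 2 and later
# begin with the PROTO opcode (0x80), which only a pickle reader understands.
pickled = pickle.dumps(record)
print(pickled[:1])

# A language-neutral encoding of the same data (JSON here) can be parsed
# by a consumer written in any language.
encoded = json.dumps(record).encode("utf-8")
print(encoded)
```

   The pickled bytes round-trip only through a Python unpickler, whereas the 
second payload could be read by a consumer written in any language.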
   
   Now, one can say 'PyArrow supports Flight RPC', or 'protobuf these things'. 
That is true, but I don't believe either actually solves the use case.
   
   Because:
   
   1. Flight RPC doesn't determine what one serialises the objects as. That is, 
pickles of non-DataFrame objects are still incompatible with the other end.
   2. Custom protobuf for every data and collection type adds more complexity 
to an application than serialising everything to bytes, so there is a 
convenience aspect to it.
   3. Sure, one can serialise to disk, but then one has file-locking and/or 
concurrency issues when constantly reading and writing.
   
   The use case I came here for: "PyArrow supports a lot of data types, 
including tensors, dictionaries, etc. Brilliant." Then: "Oh, I can't serialise 
them, even though it's a data interchange format, unless I assume the other 
end is Python."
   
   I might have misunderstood something, because it doesn't make sense to me 
that this would be the case for such an otherwise strong framework.
   
   Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
