Re: [PR] GH-45263: [MATLAB] Add ability to construct `RecordBatchStreamReader` from `uint8` array [arrow]

via GitHub Thu, 16 Jan 2025 11:04:09 -0800


kevingurney commented on code in PR #45274:
URL: https://github.com/apache/arrow/pull/45274#discussion_r1919044845



##########
matlab/src/matlab/+arrow/+io/+ipc/RecordBatchStreamReader.m:
##########
@@ -26,14 +26,34 @@
         Schema
     end
 
-    methods
-        function obj = RecordBatchStreamReader(filename)

Review Comment:
   That's a good question. We could certainly do that if we wanted to and 
initially were leaning in this direction.
   
   However, there are a few reasons why we decided to go with the current 
approach:
   
   1. The majority of MATLAB classes that wrap a `libmexclass.proxy.Proxy` 
object have a single argument constructor that expects a `Proxy` instance, so 
separating out the "construction" functions this way allows 
`RecordBatchStreamReader` to follow an existing pattern that is used elsewhere 
in the MATLAB interface.
   
   2. If we end up wanting to support many different ways of constructing a 
particular Arrow type, then having a single constructor for that type won't 
always allow us to easily differentiate between similar input configurations. 
For example - if you wanted to support constructing a "CSV Reader" object from 
a filename *OR* an "in-memory buffer", it would be reasonable for both of these 
input configurations to be represented as scalar MATLAB `string` arrays. 
However, if this were the case - we wouldn't be able to distinguish one case 
from the other (i.e. they have identical function signatures).
   
   3. Technically, we could support both the `fromFile` and `fromBytes` 
functions *AND* dispatch to those functions in the constructor based on the 
input types. However, this is likely to add additional maintenance burden, and 
it is not clear if it will improve usability significantly for users.
   
   4. On a somewhat related note - we currently have `arrow.array(varargin)` - 
which is a "convenience" function - that can take in many different MATLAB 
array types and return a corresponding `arrow.array.Array` subclass. At the 
same time, we also have `arrow.array.[Type]Array.fromMATLAB(varargin)` - which 
achieves the same general goal, but does so in a slightly more explicit way for 
"developer-oriented" workflows. If we wanted to, we could always add a similar 
"convenience function" in the future like `arrow.ipcstream2table()`, 
`arrow.ipc2table()`, or `arrow.stream2table()` - which could, in theory, 
support multiple different input argument types and dispatch to the 
appropriate, corresponding static "construction" function under the hood (i.e. 
`fromBytes` or `fromFile`). However, this still wouldn't mitigate the 
limitation noted above about not being able to differentiate between two 
different "meanings" for the same input type (i.e. filename vs. in-memory 
buffer with a MATLAB `st
 ring` array).
   
   5. One more thing that might be worth thinking about is that - from a 
"discoverability" perspective - it seems like it might be easier for a new user 
to search for something like "Read Arrow IPC Stream format from bytes in 
MATLAB" and find a "fromBytes" function in the documentation, rather than 
having to explicitly check whether the `RecordBatchStreamReader` constructor 
supports the particular configuration that their workflow requires. In 
addition, it seems like this could potentially get more difficult over time if 
the number of distinct function signatures supported by the constructor 
increases.
   
   Hopefully, this reasoning makes some sense. A lot of these points are 
definitely debatable, and we can certainly see valid arguments against this 
design, as well. If anyone has strong opinions against this, we are happy to 
modify the design accordingly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] GH-45263: [MATLAB] Add ability to construct `RecordBatchStreamReader` from `uint8` array [arrow]

Reply via email to