kevingurney commented on code in PR #45274:
URL: https://github.com/apache/arrow/pull/45274#discussion_r1919044845
##########
matlab/src/matlab/+arrow/+io/+ipc/RecordBatchStreamReader.m:
##########
@@ -26,14 +26,34 @@
Schema
end
- methods
- function obj = RecordBatchStreamReader(filename)
Review Comment:
That's a good question. We could certainly do that if we wanted to and
initially were leaning in this direction.
However, there are a few reasons why we decided to go with the current
approach:
1. The majority of MATLAB classes that wrap a `libmexclass.proxy.Proxy`
object have a single argument constructor that expects a `Proxy` instance, so
separating out the "construction" functions this way allows
`RecordBatchStreamReader` to follow an existing pattern that is used elsewhere
in the MATLAB interface.
2. If we end up wanting to support many different ways of constructing a
particular Arrow type, then having a single constructor for that type won't
always allow us to easily differentiate between similar input configurations.
For example - if you wanted to support constructing a "CSV Reader" object from
a filename *OR* an "in-memory buffer", it would be reasonable for both of these
input configurations to be represented as scalar MATLAB `string` arrays.
However, if this were the case - we wouldn't be able to distinguish one case
from the other (i.e. they have identical function signatures).
3. Technically, we could support both the `fromFile` and `fromBytes`
functions *AND* dispatch to those functions in the constructor based on the
input types. However, this is likely to add additional maintenance burden, and
it is not clear if it will improve usability significantly for users.
4. On a somewhat related note - we currently have `arrow.array(varargin)` -
which is a "convenience" function - that can take in many different MATLAB
array types and return a corresponding `arrow.array.Array` subclass. At the
same time, we also have `arrow.array.[Type]Array.fromMATLAB(varargin)` - which
achieves the same general goal, but does so in a slightly more explicit way for
"developer-oriented" workflows. If we wanted to, we could always add a similar
"convenience function" in the future like `arrow.ipcstream2table()`,
`arrow.ipc2table()`, or `arrow.stream2table()` - which could, in theory,
support multiple different input argument types and dispatch to the
appropriate, corresponding static "construction" function under the hood (i.e.
`fromBytes` or `fromFile`). However, this still wouldn't mitigate the
limitation noted above about not being able to differentiate between two
different "meanings" for the same input type (i.e. filename vs. in-memory
buffer with a MATLAB `st
ring` array).
5. One more thing that might be worth thinking about is that - from a
"discoverability" perspective - it seems like it might be easier for a new user
to search for something like "Read Arrow IPC Stream format from bytes in
MATLAB" and find a "fromBytes" function in the documentation, rather than
having to explicitly check whether the `RecordBatchStreamReader` constructor
supports the particular configuration that their workflow requires. In
addition, it seems like this could potentially get more difficult over time if
the number of distinct function signatures supported by the constructor
increases.
Hopefully, this reasoning makes some sense. A lot of these points are
definitely debatable, and we can certainly see valid arguments against this
design, as well. If anyone has strong opinions against this, we are happy to
modify the design accordingly.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]