HI François, This sounds great. I would hope that as part of this we document the invariants (and possible sharp edges like zero length/all null and no null Arrays).
Is your intent to allow languages binding to the C++ library go through the new API or will they still be able to use the "dangerous" ones? -Micah On Fri, Mar 8, 2019 at 6:16 PM Francois Saint-Jacques < fsaintjacq...@gmail.com> wrote: > Greetings, > > I noted that the current C++ API permits constructing core objects breaking > said classes invariants. The following recent issues were affected by this: > > - ARROW-4766: segfault due to invalid ArrayData with nullptr buffer > - ARROW-4774: segfault due to invalid Table with columns of different > length > > Consumers often assumes that the objects respect the invariants, e.g. by > dereferencing `array_data->buffers[i]->data()` or > `Array::GetValue(table.n_rows - 1)` . > > Sample of classes which requires invariant in the constructor: > - ArrayData/Array: number and size of buffers depending on type > - ChunkedArray: Arrays of same type > - Column: same as ChunkedArray and Field must match array's type > - RecordBatch: number of columns and schema must match, columns of same > size > - Table: columns must be of same size > > Some classes provide static factory methods, notably: > - Most `shared_ptr<T> *::Make` methods, but they lack the Status return > type to indicate > failure, the consumer can at least check for nullptr > - `Status Table::FromRecordBatches(..., shared_ptr<T> *out)` is a good > candidate to follow > > I suspect that mis-usage is only going to grow with the number of users and > language that binds to C++. I would like to propose a plan to tackle for > the > 0.14.0 release > > 1. Implement `StatusOr` (ARROW-4800), providing a cleaner API by minimizing > output parameters. > 2. Refactor normalized factory methods for each core object (ArrayData, > ChunkedArray, Column, RecordBatch, Table) > - Common naming, I suggest we stay with `Make`. > - Common return type, `StatusOr<shared_ptr<T>>` > 3. Refactor existing Make methods to use new methods but preserve original > signature by losing error message, on top of marking them deprecated. > 4. Mark non-validating constructors as deprecated and ideailly make every > "dangerous" constructor non-public. > > We'd give 1-2 release for consumers to stop using the deprecated > methods/constructors. > > François >