HI François,
This sounds great.  I would hope that as part of this we document the
invariants (and possible sharp edges like zero length/all null and no null
Arrays).

Is your intent to allow languages binding to the C++ library go through the
new API or will they still be able to use the "dangerous" ones?

-Micah

On Fri, Mar 8, 2019 at 6:16 PM Francois Saint-Jacques <
fsaintjacq...@gmail.com> wrote:

> Greetings,
>
> I noted that the current C++ API permits constructing core objects breaking
> said classes invariants. The following recent issues were affected by this:
>
> - ARROW-4766: segfault due to invalid ArrayData with nullptr buffer
> - ARROW-4774: segfault due to invalid Table with columns of different
> length
>
> Consumers often assumes that the objects respect the invariants, e.g. by
> dereferencing `array_data->buffers[i]->data()` or
> `Array::GetValue(table.n_rows - 1)` .
>
> Sample of classes which requires invariant in the constructor:
> - ArrayData/Array: number and size of buffers depending on type
> - ChunkedArray: Arrays of same type
> - Column: same as ChunkedArray and Field must match array's type
> - RecordBatch: number of columns and schema must match, columns of same
> size
> - Table: columns must be of same size
>
> Some classes provide static factory methods, notably:
> - Most `shared_ptr<T> *::Make` methods, but they lack the Status return
> type to indicate
>   failure, the consumer can at least check for nullptr
> - `Status Table::FromRecordBatches(..., shared_ptr<T> *out)` is a good
> candidate to follow
>
> I suspect that mis-usage is only going to grow with the number of users and
> language that binds to C++. I would like to propose a plan to tackle for
> the
> 0.14.0 release
>
> 1. Implement `StatusOr` (ARROW-4800), providing a cleaner API by minimizing
>    output parameters.
> 2. Refactor normalized factory methods for each core object (ArrayData,
>    ChunkedArray, Column, RecordBatch, Table)
>     - Common naming, I suggest we stay with `Make`.
>     - Common return type, `StatusOr<shared_ptr<T>>`
> 3. Refactor existing Make methods to use new methods but preserve original
>    signature by losing error message, on top of marking them deprecated.
> 4. Mark non-validating constructors as deprecated and ideailly make every
> "dangerous" constructor non-public.
>
> We'd give 1-2 release for consumers to stop using the deprecated
> methods/constructors.
>
> François
>

Reply via email to