I have created a WIP PR for initial feedback on the approach of validating
ArrayData upon creation[1]. If there are no objections to the approach I
will complete the implementation over the next few days

The approach that Sergey describes of `get` and `unsafe get_unchecked`
sounds like a good one to me if performance testing shows we need a bypass.

Andrew

[1] https://github.com/apache/arrow-rs/pull/810

On Thu, Sep 30, 2021 at 5:28 PM Sergey Davidoff <shnat...@gmail.com> wrote:

> I believe feature flags are not the right choice here. The problem is that
> feature flags are toggled globally, so the behavior of your crate can be
> affected by the behavior of some other crate that toggles the feature.
>
> A better approach is to provide two methods, one safe and the other
> unsafe, see for example:
> https://doc.rust-lang.org/stable/std/primitive.slice.html#method.get
>
> https://doc.rust-lang.org/stable/std/primitive.slice.html#method.get_unchecked
>
> Speaking of O(1) vs O(n), a correct program would perform these checks at
> least once anyway, and so the complexity is unavoidable. However, you can
> avoid re-checking bounds for the same buffer over and over if the state of
> the check is encoded in the type system - for example, the `&str` standard
> library type is just a slice that's already been checked and found to be
> valid UTF-8.
>
> Although I struggle to imagine bounds checks being a performance
> bottleneck when done once on object creation, given that removing them from
> hot loops usually wins you only a couple % in performance.
>
> чт, 30 сент. 2021 г. в 22:28, Andrew Lamb <al...@influxdata.com>:
>
>> I understand the need to avoid sacrificing performance as much as
>> possible. I have begun looking into adding validation into ArrayData::new
>> as you suggest.
>>
>> I am making progress, but haven't fully figured out the nested types yet.
>> Hope to have a PR up in the next day or two.
>>
>> Should we come up with something that adds unacceptable performance
>> overhead, we can also feature flag it (so that the check is done by
>> default, but can be disabled if needed)
>>
>> Andrew
>>
>> On Thu, Sep 30, 2021 at 5:33 AM Jörn Horstmann <
>> joern.horstm...@signavio.com> wrote:
>>
>>> Most of these issues seem to originate from creating arrays from
>>> ArrayData.
>>> While we could validate buffers in the XArray::from implementations, that
>>> would have some performance overhead and also some edge cases for nested
>>> data. Thinking about List<List<Utf8>>>, we recursively would need to
>>> validate and know which buffers are used for offsets or utf8 data. Since
>>> the root cause is always ArrayData::new (and possibly
>>> ArrayDataBuilder::finish) I would suggest marking those functions as
>>> unsafe.
>>>
>>> Adding checks for the top-level buffer length would still be useful, I'm
>>> just a bit opposed to having to iterate over whole buffers for
>>> validation,
>>> turning these operations from O(1) to O(n).
>>>
>>> Slightly related, GenericListArray::try_new_from_array_data currently
>>> validates that the first offset is 0. I think this check could actually
>>> be
>>> relaxed and the offset only needs to be in bounds for the child array.
>>>
>>> Jörn
>>>
>>>
>>> On Wed, Sep 29, 2021 at 9:12 PM Andrew Lamb <al...@influxdata.com>
>>> wrote:
>>>
>>> > Dear Rust Developers,
>>> >
>>> > As a heads up, several pre-existing security tickets filed against
>>> arrow-rs
>>> > have been added[1][2] to the RUSTSEC database[1][2] a few hours ago.
>>> The
>>> > author says he plans to file additional RUSTSEC entries for the other
>>> > security tickets [3].
>>> >
>>> > The criteria used for adding the arrow issues to the RUSTSEC database
>>> is
>>> > not clear to me,  but given widely used tools such as `cargo audit`
>>> report
>>> > such issues, it is likely that this will become an visible issue for
>>> our
>>> > users soon.
>>> >
>>> > Given this, I will likely start looking into existing security issues
>>> [4]
>>> > reported against arrow-rs and any help would be appreciated.
>>> >
>>> > Andrew
>>> >
>>> > [1] https://github.com/rustsec/advisory-db/pull/1057
>>> > [2] https://github.com/rustsec/advisory-db/pull/1059
>>> > [3]
>>> >
>>> https://github.com/rustsec/advisory-db/pull/1057#issuecomment-930455127
>>> > [4] https://github.com/apache/arrow-rs/labels/security
>>> >
>>>
>>>
>>> --
>>> *Jörn Horstmann* | Senior Backend Engineer
>>>
>>> www.signavio.com
>>> Kurfürstenstraße 111, 10787 Berlin, Germany
>>>
>>> Work with us! <https://hubs.ly/H0wwzcr0>
>>>
>>>
>>>
>>> <https://www.linkedin.com/company/signavio/>
>>> <https://www.twitter.com/signavio>   <https://www.facebook.com/signavio>
>>> <https://www.youtube.com/user/signavio>
>>> <https://www.xing.com/companies/signaviogmbh>
>>>
>>> [image: BPI Forum] <
>>> https://t-eu.xink.io/Tracking/Index/sSMAALBtAAAnu0MA0>
>>>
>>> HRB 121584 B Charlottenburg District Court, VAT ID: DE265675123
>>> Managing Directors: Dr. Gero Decker, Rouven Morato Adam
>>>
>>

Reply via email to