[ 
https://issues.apache.org/jira/browse/ARROW-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16179053#comment-16179053
 ] 

Wes McKinney commented on ARROW-1589:
-------------------------------------

> So the current way method should rather be prefixed w/ 
> "trusted"/"unsafe"/"fast".

This seems a bit like overkill to me -- if this were the norm for function 
naming we would see these naming conventions in Avro, Thrift, Protocol Buffers, 
Flatbuffers, and any other protocol / file format library. I think we can 
improve things in the short term by making the untrustedness explicit in the 
doxygen documentation / code comments. For example, there is no note of 
trustedness in

http://arrow.apache.org/docs/cpp/classarrow_1_1ipc_1_1_record_batch_stream_reader.html

That is easy to change. 

> A tiny example that already segfaults is the creation and read-out of an 
> empty stream, which IMHO should not happen. 

I agree; this should not be difficult to test for. The distinction I had hoped 
to draw was between failures arising through normal use of the software (bugs 
caused by Arrow developers implementing something incorrectly) and failures 
caused by bugs in third party systems (e.g. passing an empty string or buffer 
to a function). I agree that we should test and fix the most obvious causes of 
segfaults that may affect users of these functions.
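
As a rough illustration of the kind of check and fuzz target being discussed, 
here is a minimal sketch. It does not use Arrow's API: the toy format (a 4-byte 
little-endian length followed by a payload) and the names ParseStatus and 
ParseToyMessage are hypothetical stand-ins for a real stream reader. Only 
LLVMFuzzerTestOneInput is the actual libFuzzer entry point. The point is that 
empty or truncated input yields an error code rather than an out-of-bounds read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical status type for this sketch; not part of Arrow.
enum class ParseStatus { kOk, kInvalid };

// Hypothetical toy format: 4-byte little-endian payload length, then payload.
// Returns kInvalid instead of reading out of bounds on malformed input.
ParseStatus ParseToyMessage(const uint8_t* data, size_t size,
                            uint32_t* payload_len) {
  assert(payload_len != nullptr);
  constexpr size_t kHeaderSize = 4;
  if (data == nullptr || size < kHeaderSize) {
    return ParseStatus::kInvalid;  // empty buffer or truncated header
  }
  const uint32_t len = static_cast<uint32_t>(data[0]) |
                       (static_cast<uint32_t>(data[1]) << 8) |
                       (static_cast<uint32_t>(data[2]) << 16) |
                       (static_cast<uint32_t>(data[3]) << 24);
  if (len > size - kHeaderSize) {
    return ParseStatus::kInvalid;  // declared length exceeds available bytes
  }
  *payload_len = len;
  return ParseStatus::kOk;
}

// libFuzzer entry point: called repeatedly with generated inputs.
// Any crash or sanitizer report while parsing counts as a finding.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  uint32_t payload_len = 0;
  (void)ParseToyMessage(data, size, &payload_len);
  return 0;
}
```

With clang, a target like this would typically be built with 
-fsanitize=fuzzer,address so that libFuzzer supplies main() and address 
sanitizer reports any invalid memory access.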

Please understand that this software we are discussing is primarily the work of 
a single volunteer developer (me). The fact that there are not more tests for 
the cases you're describing is definitely not due to a failure on my part to 
think outside the box -- if you look at my GitHub history you can see that I am 
operating at maximum output capacity 100% of the time. As a result of not 
having more development help, I have had to make tradeoffs: prioritizing more 
features / usability / integration with other projects over testing vs. 
concerning myself with more esoteric matters. 

> [C++] Fuzzing for certain input formats
> ---------------------------------------
>
>                 Key: ARROW-1589
>                 URL: https://issues.apache.org/jira/browse/ARROW-1589
>             Project: Apache Arrow
>          Issue Type: Test
>            Reporter: Marco Neumann
>            Assignee: Marco Neumann
>
> The arrow lib should have fuzzing tests for certain input formats, e.g. for 
> reading record batches from streams. Ideally, malformed input must not crash 
> the system but must report a proper error. This could easily be implemented 
> e.g. w/ [libfuzzer|https://llvm.org/docs/LibFuzzer.html] in combination with 
> address sanitizer (that's already implemented by Arrow's build system).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
