[jira] [Created] (ARROW-8532) [C++][CSV] Add support for sentinel values.

2020-04-20 Thread Ravil Bikbulatov (Jira)
Ravil Bikbulatov created ARROW-8532:
---

 Summary: [C++][CSV] Add support for sentinel values.
 Key: ARROW-8532
 URL: https://issues.apache.org/jira/browse/ARROW-8532
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Ravil Bikbulatov


Some systems still use sentinel values to represent nulls. It would be good if 
read_csv could write sentinel values directly, so that users would not need to 
convert null bitmaps to sentinel values themselves.

Adding this support doesn't contradict the Arrow specification, since the 
values stored in null slots are undefined. It also wouldn't add any overhead 
to read_csv. Since Arrow is a general-purpose framework, I think we can 
relieve users of the pain of converting bitmaps to sentinel values.
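
For illustration, the bitmap-to-sentinel conversion that users currently have 
to do by hand might look roughly like the sketch below. It works on plain 
std::vector buffers rather than Arrow arrays, and FillNullsWithSentinel and 
IsValid are hypothetical helper names, not Arrow APIs. The only Arrow-specific 
assumption is that validity bitmaps use least-significant-bit numbering.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if slot i is valid (non-null).
// Assumes Arrow's validity bitmap layout: slot i is stored in
// bit (i % 8) of byte (i / 8), least-significant bit first.
inline bool IsValid(const std::vector<std::uint8_t>& validity, std::size_t i) {
  return ((validity[i / 8] >> (i % 8)) & 1) != 0;
}

// Hypothetical helper (not part of Arrow): overwrite every null slot
// in `values` with `sentinel`, leaving valid slots untouched.
template <typename T>
void FillNullsWithSentinel(std::vector<T>& values,
                           const std::vector<std::uint8_t>& validity,
                           T sentinel) {
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (!IsValid(validity, i)) {
      values[i] = sentinel;
    }
  }
}
```

This is an extra pass over every column after parsing, which is the cost the 
proposal would avoid by writing the sentinel while the value is converted.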



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8527) [C++][CSV] Add support for ReadOptions::skip_rows >= block_size

2020-04-20 Thread Ravil Bikbulatov (Jira)
Ravil Bikbulatov created ARROW-8527:
---

 Summary: [C++][CSV] Add support for ReadOptions::skip_rows >= block_size
 Key: ARROW-8527
 URL: https://issues.apache.org/jira/browse/ARROW-8527
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Ravil Bikbulatov


The current implementation throws an error in reader.cc:286 when 
skip_rows > header. However, in some workloads skip_rows is used not only to 
skip the header but also to skip the first n rows. In this case the block-size 
constraint gets in the way. I think this constraint could be removed without 
reducing performance.
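
One way the constraint could be lifted, sketched under simplifying 
assumptions: keep the remaining skip count as state and carry it from one 
block to the next, so the skipped rows no longer have to fit in the first 
block. SkipRows and SkipRowsAcrossBlocks are hypothetical names, not the code 
in reader.cc, and the sketch treats every '\n' as a row terminator, ignoring 
quoted newlines.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: drop up to `rows_to_skip` leading rows from `block`.
// `rows_to_skip` is decremented once per complete row consumed, so the
// caller can pass the same counter to the next block.
std::string SkipRows(const std::string& block, std::size_t& rows_to_skip) {
  std::size_t pos = 0;
  while (rows_to_skip > 0) {
    std::size_t eol = block.find('\n', pos);
    if (eol == std::string::npos) {
      // The row being skipped continues in the next block: the whole
      // remainder of this block is discarded, the counter is unchanged.
      return std::string();
    }
    pos = eol + 1;
    --rows_to_skip;
  }
  return block.substr(pos);
}

// Applies SkipRows to a sequence of blocks and concatenates what is left.
std::string SkipRowsAcrossBlocks(const std::vector<std::string>& blocks,
                                 std::size_t rows_to_skip) {
  std::string remaining;
  for (const std::string& block : blocks) {
    remaining += SkipRows(block, rows_to_skip);
  }
  return remaining;
}
```

Each skipped byte is scanned once, so the extra work is proportional to the 
size of the skipped prefix and independent of how the input is split into 
blocks.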


