n3world commented on a change in pull request #10255:
URL: https://github.com/apache/arrow/pull/10255#discussion_r630155517



##########
File path: cpp/src/arrow/csv/reader_test.cc
##########
@@ -216,5 +216,83 @@ TEST(StreamingReaderTests, NestedParallelism) {
   TestNestedParallelism(thread_pool, table_factory);
 }
 
+TEST(ReaderOptionsTests, SkipRowsAfterNames) {

Review comment:
   Actually, after looking at it a bit more, it doesn't have to be moved out 
of the reader, but I don't think it can use SkipRows. The SkipRows 
implementation is very simple: it doesn't actually skip rows, it skips lines 
in the file. I make the distinction because a row can contain values with 
newlines. If a row contains a quoted or escaped newline, SkipRows will treat 
it as two lines rather than one row.
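
   To illustrate the difference, here is a minimal standalone sketch. It is 
not Arrow code: `CountLines` and `CountRows` are hypothetical names, and the 
sketch assumes double-quote quoting only (no escape character, no `\r` 
handling).

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Counts physical lines: every '\n' ends a line, whether quoted or not.
// This is the "skip lines" behaviour described above.
std::size_t CountLines(std::string_view data) {
  std::size_t lines = 0;
  for (char c : data) {
    if (c == '\n') ++lines;
  }
  return lines;
}

// Counts CSV rows: a '\n' inside a double-quoted field does not end the row.
// A doubled quote ("") toggles the flag twice, so the state stays correct.
std::size_t CountRows(std::string_view data) {
  std::size_t rows = 0;
  bool in_quotes = false;
  for (char c : data) {
    if (c == '"') {
      in_quotes = !in_quotes;
    } else if (c == '\n' && !in_quotes) {
      ++rows;
    }
  }
  return rows;
}
```

   For an input whose second row holds a quoted newline, such as 
`a,b\n"x\ny",1\n2,3\n`, the two functions disagree: the line count is one 
higher than the row count.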
   
   I was thinking it might be better to add a FirstN method to Chunker that 
finds the Nth occurrence of a line ending. This could be integrated into the 
BlockReader implementations so that rows can be skipped even beyond the first 
block. This could also solve ARROW-8527.
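
   Roughly what I have in mind, as a sketch only. `RowSkipper` and `Consume` 
are hypothetical names, not the existing Chunker or BlockReader API, and the 
sketch again assumes double-quote quoting with no escape character. The point 
is that the remaining row count and the quote state are carried from one 
block to the next.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not part of Arrow): skips a fixed number of CSV rows
// across consecutive blocks, carrying quote state between blocks.
class RowSkipper {
 public:
  explicit RowSkipper(std::size_t rows_to_skip) : remaining_(rows_to_skip) {}

  // Returns the offset in `block` of the first byte after the skipped rows.
  // Returns block.size() if the whole block was consumed while skipping.
  std::size_t Consume(std::string_view block) {
    std::size_t pos = 0;
    while (remaining_ > 0 && pos < block.size()) {
      const char c = block[pos++];
      if (c == '"') {
        in_quotes_ = !in_quotes_;
      } else if (c == '\n' && !in_quotes_) {
        --remaining_;  // only an unquoted newline ends a row
      }
    }
    return pos;
  }

  // True once all requested rows have been skipped.
  bool done() const { return remaining_ == 0; }

 private:
  std::size_t remaining_;
  bool in_quotes_ = false;
};
```

   A reader would call `Consume` on each block until `done()` returns true 
and then start parsing at the returned offset. A row whose quoted newline 
falls on a block boundary is still counted as a single row.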




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

