toppyy commented on pull request #12083:
URL: https://github.com/apache/arrow/pull/12083#issuecomment-1007915879


   Thanks for copy-pasting the discussion - it didn't occur to me. I tried 
option 2, but ended up having to change the class CsvReadOptions also (see 
below why). So the latest commits really implement options one and two. 
   
   To me it still seems that it's impossible to get the value of _skip_rows_ 
later on if user has set it like this:
   > `open_dataset(
   >   td,
   >   format = 'csv',
   >   read_options = arrow::CsvReadOptions$create(
   >     skip_rows = 1
   >   )
   > )`
   
   Because of this I had to implement CsvReadOptions$skip_rows to be able to 
retrieve the value in `csv_file_format_read_opts`. I could be wrong though.
   
   The functionality is as follows:
   - Check if column_names are given as a separate argument. If so, use them.
   - Check if column_names are given in read_options. If so, use them.
   - Check if schema is set. If so, use names from schema as column names. 
   
   For skip rows, we do the same (with the exception of using schema).
   
   The code is a bit verbose and may be improved by some simplification. Also, 
if any other options (in addition to column_names & skip_rows) are introduced, 
it needs an update - so not ideal in that sense.
   
   Let me know what you think!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to