korowa commented on code in PR #12263:
URL: https://github.com/apache/datafusion/pull/12263#discussion_r1740119199


##########
datafusion/core/src/datasource/physical_plan/csv.rs:
##########
@@ -493,6 +517,7 @@ impl CsvConfig {
 
 impl CsvConfig {
     fn open<R: Read>(&self, reader: R) -> Result<csv::Reader<R>> {
+        dbg!(&self.terminator);

Review Comment:
   Should this line be cleaned up?



##########
datafusion/core/src/datasource/physical_plan/csv.rs:
##########
@@ -529,6 +556,7 @@ impl CsvOpener {
         config: Arc<CsvConfig>,
         file_compression_type: FileCompressionType,
     ) -> Self {
+        dbg!(&config);

Review Comment:
   This one, probably, should be cleaned up too



##########
datafusion/core/src/datasource/physical_plan/csv.rs:
##########
@@ -112,6 +114,7 @@ impl CsvExecBuilder {
             has_header: false,
             delimiter: b',',
             quote: b'"',
+            terminator: None,

Review Comment:
   I wonder if this option should have any interactions with 
`newlines_in_values`, or somehow affect execution of CSVExec  (if 
`newlines_in_values` used anywhere to actually check input data -- I wasn't 
able to find such a usages for it, but still not sure).



##########
datafusion/core/src/datasource/physical_plan/csv.rs:
##########
@@ -1210,6 +1244,44 @@ mod tests {
         crate::assert_batches_eq!(expected, &result);
     }
 
+    #[tokio::test]
+    async fn test_terminator() {

Review Comment:
   Perhaps there also should be added sqllogictests for reading files with 
customized line terminator (also checking that nothing is broken when 
terminator used along with newlines_in_values).



##########
datafusion/common/src/config.rs:
##########
@@ -1696,6 +1697,13 @@ impl CsvOptions {
         self
     }
 
+    /// The character that terminates a row.
+    /// - default to '\n'

Review Comment:
   If it's None as stated above -- won't it be CRLF by default (which is 
slightly more than `\n` according to 
[csv_core](https://docs.rs/csv-core/latest/csv_core/enum.Terminator.html) docs 
used internally by arrow-csv)? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to