Re: [I] [Bug] [Connector-File] UTF-8 BOM causes the first column of data to be null during CSV parsing [seatunnel]

via GitHub Sat, 24 Jan 2026 18:15:16 -0800


DanielCarter-stack commented on issue #10374:
URL: https://github.com/apache/seatunnel/issues/10374#issuecomment-3795837781


   Thanks for reporting this bug. I can confirm the issue exists in 
`CsvReadStrategy.java` (lines 120-135).
   
   **Root cause**: The BOM-skipping logic (lines 130-135) runs **after** the 
`CSVParser` is created (line 129). With `withFirstRecordAsHeader()` enabled, 
the parser has already read the header line by that point, so the first column 
header is `"\uFEFFid"` instead of `"id"`. The column name lookup in 
`CsvReadStrategy.java:149-161` then fails (`indexOf` returns -1) and the 
column is skipped, which leaves its values null.
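
The mismatch can be reproduced with the JDK alone. The sketch below is an illustration, not the SeaTunnel code: `BomHeaderMismatch` and `decodeHeader` are hypothetical names, and the naive `split(",")` stands in for the real header parsing.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

class BomHeaderMismatch {
    // Hypothetical helper: decodes a header line and splits it on commas.
    // Java's UTF-8 decoder keeps a leading BOM as the character U+FEFF.
    static List<String> decodeHeader(byte[] firstLine) {
        String line = new String(firstLine, StandardCharsets.UTF_8);
        return Arrays.asList(line.split(","));
    }

    public static void main(String[] args) {
        // "id,name" preceded by the UTF-8 BOM bytes EF BB BF.
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF,
                          'i', 'd', ',', 'n', 'a', 'm', 'e'};
        List<String> headers = decodeHeader(withBom);

        System.out.println(headers.indexOf("id"));        // -1: the BOM is part of the name
        System.out.println(headers.indexOf("\uFEFFid"));  // 0
        System.out.println(headers.indexOf("name"));      // 1: later columns are unaffected
    }
}
```

Only the first column is affected because the BOM appears once, at the very start of the file.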
   
   **Suggested fix**: Move the BOM handling before `CSVParser` creation. 
Consider wrapping `actualInputStream` with Apache Commons IO's 
`BOMInputStream`, or manually skipping the BOM bytes before creating the 
`BufferedReader`.
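
A minimal sketch of the manual option, using only the JDK. The names `BomSkipSketch`, `skipUtf8Bom`, and `readHeader` are hypothetical and do not come from `CsvReadStrategy.java`; the actual fix would apply the same idea to `actualInputStream` before the reader and parser are built.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

class BomSkipSketch {
    // Returns a stream positioned after a leading UTF-8 BOM (EF BB BF), if present.
    // If there is no BOM, the bytes that were peeked are pushed back unchanged.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int read = 0;
        while (read < 3) {
            int r = pushback.read(head, read, 3 - read);
            if (r < 0) {
                break; // stream is shorter than three bytes
            }
            read += r;
        }
        boolean hasBom = read == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && read > 0) {
            pushback.unread(head, 0, read);
        }
        return pushback;
    }

    // Stand-in for header parsing: the BOM is stripped BEFORE the reader is created.
    static String[] readHeader(InputStream in) throws IOException {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(skipUtf8Bom(in), StandardCharsets.UTF_8));
        String line = reader.readLine();
        return line == null ? new String[0] : line.split(",");
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF,
                          'i', 'd', ',', 'n', 'a', 'm', 'e', '\n'};
        String[] header = readHeader(new ByteArrayInputStream(withBom));
        System.out.println(header[0].equals("id")); // true
    }
}
```

The ordering is the point: whichever approach is used, the BOM has to be consumed before anything reads the header line.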
   
   **Workaround**: Save CSV files without a BOM, or set 
`csv_use_header_line = false` and define the column order explicitly in the 
schema.
   
   This would be a great contribution! Please modify `CsvReadStrategy.java` and 
add a test case in `CsvReadStrategyTest.java` with a UTF-8 BOM CSV file.
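
For the test fixture, a BOM-prefixed CSV file can be generated rather than checked in. This is a sketch under assumptions: `BomCsvFixture` and `writeBomCsv` are hypothetical names, and how the resulting file is fed into `CsvReadStrategyTest.java` depends on that test class's existing setup.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class BomCsvFixture {
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Writes the given CSV text to a temp file, prefixed with the UTF-8 BOM.
    static Path writeBomCsv(String csvBody) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(UTF8_BOM);
        out.write(csvBody.getBytes(StandardCharsets.UTF_8));
        Path file = Files.createTempFile("bom-", ".csv");
        Files.write(file, out.toByteArray());
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path file = writeBomCsv("id,name\n1,alice\n");
        System.out.println("Fixture written to " + file);
        Files.delete(file);
    }
}
```

A regression test would then read this file with the header line enabled and assert that the first column (`id` here) is populated rather than null.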


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
