xuang7 commented on issue #4041: URL: https://github.com/apache/texera/issues/4041#issuecomment-3514541946
During the discussion, we found that many tools treat lines starting with “#” as comments by convention, even though the official CSV spec (RFC 4180) does not define comments (there is no “#” or other comment marker).

In this case, the header row of the Pokémon dataset begins with “#”, which it uses as the name of the ID column. The CSV File Scanner operator treated that line as a comment, which broke header parsing and used the second data row as the header when Header = ON. After replacing the leading “#” with a normal ID (or anything other than “#”), the operator parsed the file correctly with Header = ON.

<img width="801" height="298" alt="Image" src="https://github.com/user-attachments/assets/61cde668-6cb6-4bc5-b514-9c41e022dfe9" />

Suggested fixes:
- Change the file: remove or replace the leading “#” (e.g., use “ID” or a plain value).
- Parse with Header = OFF, then rename columns.
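
The behavior described in this comment, and both suggested fixes, can be sketched in a few lines of Python. This is a minimal illustration only, not the CSV File Scanner operator's actual code: `read_rows`, `replace_leading_hash`, the three-column sample, and the column names are hypothetical, and comment handling is simulated by dropping every line that starts with “#”.

```python
import csv

# Hypothetical three-column excerpt shaped like the Pokémon file:
# the header row itself starts with "#".
SAMPLE = "#,Name,Type 1\n1,Bulbasaur,Grass\n2,Ivysaur,Grass\n"


def read_rows(text, comment_char="#"):
    """Parse CSV text after dropping lines that start with comment_char.

    This mimics a parser that follows the "#" comment convention;
    it is an assumption about the behavior, not the operator's code.
    """
    lines = [ln for ln in text.splitlines() if not ln.startswith(comment_char)]
    return list(csv.reader(lines))


def replace_leading_hash(text, new_name="ID"):
    """Fix 1: rename a leading "#" in the first line to a plain column name."""
    first, sep, rest = text.partition("\n")
    if first.startswith("#"):
        first = new_name + first[1:]
    return first + sep + rest


# Problem: the real header is skipped, so a data row is taken as the header.
broken_header = read_rows(SAMPLE)[0]

# Fix 1: with the leading "#" replaced, the header survives comment skipping.
fixed_header = read_rows(replace_leading_hash(SAMPLE))[0]

# Fix 2: treat every remaining row as data (Header = OFF), then name the
# columns explicitly.
columns = ["ID", "Name", "Type 1"]
records = [dict(zip(columns, row)) for row in read_rows(SAMPLE)]
```

In this sketch, `broken_header` holds the values of the first data row, `fixed_header` holds the intended column names, and `records` keeps every data row because no row is consumed as a header.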
