xuang7 commented on issue #4041:
URL: https://github.com/apache/texera/issues/4041#issuecomment-3514541946

   During the discussion, we found that many tools treat lines starting with 
“#” as comments by convention, even though the official CSV spec (RFC 4180) 
does not define comments at all (neither “#” nor any other comment marker).
   
   In this case, the Pokémon dataset begins with “#” as an ID marker. The CSV 
File Scanner operator treated that line as a comment, so header parsing 
broke and the second data row was used as the header when Header = ON. After 
we replaced the leading “#” with a normal ID (or anything other than “#”), 
the operator parsed the file correctly with Header = ON.
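
   The failure mode can be reproduced outside Texera. The following is a 
minimal sketch in Python, not the CSV File Scanner's actual code: the sample 
text, the `read_rows` helper, and its `comment_char` parameter are all 
hypothetical and only imitate a parser that drops lines starting with “#”.

```python
import csv

# Hypothetical sample that imitates a file whose header starts with "#".
SAMPLE = "#,Name,Type 1\n1,Bulbasaur,Grass\n2,Ivysaur,Grass\n"


def read_rows(text, comment_char=None):
    """Parse CSV text, optionally dropping lines that start with comment_char."""
    lines = text.splitlines()
    if comment_char is not None:
        # Imitates the comment convention; RFC 4180 defines no such rule.
        lines = [line for line in lines if not line.startswith(comment_char)]
    return list(csv.reader(lines))


strict = read_rows(SAMPLE)                     # no comment handling
lenient = read_rows(SAMPLE, comment_char="#")  # "#" lines are skipped

print(strict[0])   # the header as written in the file
print(lenient[0])  # a data row now sits in the header position
```

   With comment skipping enabled, the “#” line never reaches the header 
logic, so a data row is promoted to the header instead.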
   
   
   <img width="801" height="298" alt="Image" src="https://github.com/user-attachments/assets/61cde668-6cb6-4bc5-b514-9c41e022dfe9" />
   
   Suggested fixes:
   - Change the file: remove or replace the leading “#” (e.g., use “ID” or a 
plain value).
   - Parse with Header = OFF, then rename columns.
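
   The first fix can be applied with a short script before uploading the 
file. This is only a sketch under assumptions: `replace_leading_hash` is a 
hypothetical helper, “ID” is just one possible replacement name, and the 
sample text stands in for the real dataset.

```python
def replace_leading_hash(text, new_name="ID"):
    """Return CSV text whose first line no longer starts with '#'."""
    first, sep, rest = text.partition("\n")
    if first.startswith("#"):
        # Swap only the leading "#"; the rest of the header is left untouched.
        first = new_name + first[1:]
    return first + sep + rest


fixed = replace_leading_hash("#,Name,Type 1\n1,Bulbasaur,Grass\n")
print(fixed.splitlines()[0])
```

   Only the first line is touched, so data rows that legitimately contain 
“#” elsewhere are left as they are.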


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

