xuang7 opened a new issue, #4041: URL: https://github.com/apache/texera/issues/4041
### What happened?

When loading the "pokemon-dataset" (https://hub.texera.io/dashboard/hub/dataset/result/detail/222) with CSV Scan operator, the first data row is incorrectly treated as the header even when "Header" is enabled, causing a runtime error: `Attribute name '49' already exists in the schema`. The file can be parsed when "Header" is disabled.

<img width="803" height="309" alt="Image" src="https://github.com/user-attachments/assets/6f6fe1f9-520d-4c55-941d-e83db1b73c34" />

The dataset preview page correctly displays the header, which suggests the file structure is recognized in the preview context. However, this may indicate either:
- A potential issue with how the CSV file structure is parsed by the CSV Scan operator, or
- An issue with the CSV file structure itself, or
- A need for better error messaging to help users diagnose header-related configuration issues

<img width="1009" height="715" alt="Image" src="https://github.com/user-attachments/assets/378e4997-64e5-4e22-b514-b1af0bc10364" />

**Expected Behavior**
The operator should recognize the first line as the header:
`#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary`
And the second line should be treated as the first data row:
`1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False`

**Actual Behavior**
The first data row is incorrectly used as the schema/header:
`1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False`

### How to reproduce?

1. Add CSV File Scan operator to workflow and select pokemon-dataset/v1/Pokemon Data CSV.csv
2. Enable "Header" option
3. Run the workflow

### Version

1.1.0-incubating (Pre-release/Master)

### Commit Hash (Optional)

_No response_

### What browsers are you seeing the problem on?

_No response_

### Relevant log output

```shell

```

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
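The error text is consistent with the Actual Behavior described in the issue: scanning the first data row from left to right, `49` is the first value that repeats. The minimal Python sketch below checks this against the two lines quoted in the issue. It is not Texera code: `read_header`, `first_duplicate`, and the `comment_char` parameter are hypothetical names introduced for illustration. The `comment_char="#"` case models one unconfirmed possibility, namely that a parser which skips lines beginning with `#` as comments would drop this file's header. The issue does not establish that the CSV Scan operator behaves this way.

```python
import csv

# First two lines of the file, copied from the issue text.
SAMPLE = (
    "#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary\n"
    "1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False\n"
)


def read_header(text, comment_char=None):
    """Return the first row that is not skipped as a comment line.

    comment_char is a hypothetical setting for this sketch, not a Texera option.
    """
    lines = text.splitlines()
    if comment_char is not None:
        # Assumption being modeled: lines starting with comment_char are ignored.
        lines = [line for line in lines if not line.startswith(comment_char)]
    return next(csv.reader(lines))


def first_duplicate(names):
    """Return the first name that repeats when scanning left to right, else None."""
    seen = set()
    for name in names:
        if name in seen:
            return name
        seen.add(name)
    return None


real_header = read_header(SAMPLE)
shifted_header = read_header(SAMPLE, comment_char="#")

print(first_duplicate(real_header))     # None: the real header names are unique
print(first_duplicate(shifted_header))  # 49: the name reported in the error
```

If the hypothesis holds, a file whose header does not start with `#` should load correctly with "Header" enabled, which would be a quick way to confirm or rule it out.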
