Pedrinhonitz commented on issue #65379:
URL: https://github.com/apache/airflow/issues/65379#issuecomment-4537497549

   Hello, I'm curious about the problem and would like to understand it better.
   
   Thinking about it, it may be possible to work around this without modifying 
Airflow. I might be wrong, but if you build the delimiter inside the task 
instead of in the DAG body or `default_args`, I think you won't hit this problem.
   
   I did a local test and it worked; maybe this will help.
   
   I created this DAG, which assigns the delimiters to variables in the DAG 
body, as you described:
   ```python
   from datetime import datetime
   
   from airflow import DAG
   from airflow.providers.standard.operators.python import PythonOperator
   
   field_del = '@@\0@@'
   record_del = '^^\0^^'
   
   
   def process_csv(field_del, record_del, **context):
       print(f"field_del repr: {field_del!r}")
       print(f"record_del repr: {record_del!r}")
       return {"ok": True}
   
   
   with DAG(
       dag_id="old_csv_delimiter_dag",
       start_date=datetime(2024, 1, 1),
       schedule=None,
       catchup=False,
       tags=["testing", "issue-65379"]
   ) as dag:
       PythonOperator(
           task_id="process_csv",
           python_callable=process_csv,
           op_kwargs={"field_del": field_del, "record_del": record_del}
       )
   
   ```
   
   
   Then I created this second DAG, which builds the delimiters inside the task 
(this one worked locally):
   
   ```python
   from datetime import datetime
   
   from airflow import DAG
   from airflow.providers.standard.operators.python import PythonOperator
   
   
   def _delimiters():
       nul = chr(0)
       # Same values as '@@\0@@' and '^^\0^^', built only when the task runs.
       return f"@@{nul}@@", f"^^{nul}^^"
   
   
   def process_csv(**context):
       field_del, record_del = _delimiters()
       print(f"field_del repr: {field_del!r}")
       print(f"record_del repr: {record_del!r}")
       assert '\x00' in field_del
       assert '\x00' in record_del
       return {"ok": True}
   
   
   with DAG(
       dag_id="fix_csv_delimiter_dag",
       start_date=datetime(2024, 1, 1),
       schedule=None,
       catchup=False,
       tags=["testing", "issue-65379"]
   ) as dag:
       PythonOperator(
           task_id="process_csv",
           python_callable=process_csv
       )
   ```
   
   I might be wrong, but this keeps the delimiters out of the parameters the 
DAG passes to the task and builds them inside the task at run time, so the null 
bytes never go through the DAG definition; at least it worked here. If it 
doesn't work for you, please provide more details about the error.
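
   A minimal sketch of why this might help, assuming the failure comes from 
the null byte ending up in task arguments that get serialized (I have not 
confirmed that this is the cause). It uses plain `json` as a stand-in, not 
Airflow's own serializer, and `build_field_del` and the two dictionaries are 
hypothetical names used only for illustration:

```python
import json

# Delimiter written as a literal at module level, as in the first DAG.
literal_field_del = "@@\0@@"


def build_field_del():
    # Built only when called, as in the second DAG (hypothetical helper name).
    return f"@@{chr(0)}@@"


# Both approaches give the same string at run time.
assert build_field_del() == literal_field_del

# Hypothetical stand-ins for the keyword arguments a task might carry.
kwargs_with_delimiter = {"field_del": literal_field_del}
kwargs_without_delimiter = {}

# json escapes the null byte as \u0000, so it shows up in the serialized text
# only when the delimiter is passed as an argument.
serialized_with = json.dumps(kwargs_with_delimiter)
serialized_without = json.dumps(kwargs_without_delimiter)

print(serialized_with)
print(serialized_without)
```

   The task sees the same delimiter either way; the only difference is whether 
the null byte is part of anything that has to be serialized along with the task.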
   
   **This may not be the best solution, but it might let you migrate to the 
new version sooner than waiting for a fix in Airflow.**
   
   **_Cursor AI was used to assist in reviewing your issue with the 
claude-4.6-sonnet-medium model; there may be defects._**


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
