gmarendaz opened a new issue, #35979:
URL: https://github.com/apache/airflow/issues/35979

   ### Apache Airflow version
   
   2.7.3
   
   ### What happened
   
   I was in the middle of migrating an Apache Airflow installation from a Windows WSL environment to an Ubuntu environment.
   The DAG was working correctly on Windows WSL, but it now fails on Ubuntu.
   
   ### What you think should happen instead
   
   The code should work as expected because the file and DAG structure did not 
change.
   
   ### How to reproduce
   
   The problem can't easily be reproduced, as it only occurs in my environment.
   
   ### Operating System
   
   Ubuntu 22.04.3 LTS
   
   ### Versions of Apache Airflow Providers
   
   ai-operator @ file:///home/apache/airflow/packages/ai_package/dist/ai_operator-0.0.0-py3-none-any.whl
   aiohttp==3.8.6
   aiosignal==1.3.1
   alembic==1.12.1
   annotated-types==0.6.0
   anyio==4.0.0
   apache-airflow==2.7.3
   apache-airflow-providers-common-sql==1.8.0
   apache-airflow-providers-ftp==3.6.0
   apache-airflow-providers-http==4.6.0
   apache-airflow-providers-imap==3.4.0
   apache-airflow-providers-mysql==5.2.1
   apache-airflow-providers-sqlite==3.5.0
   apispec==6.3.0
   argcomplete==3.1.3
   asgiref==3.7.2
   async-timeout==4.0.3
   attrs==23.1.0
   Automat==20.2.0
   Babel==2.13.1
   backoff==1.10.0
   bcrypt==3.2.0
   blinker==1.6.3
   cachelib==0.9.0
   cattrs==23.1.2
   certifi==2023.7.22
   cffi==1.16.0
   chardet==4.0.0
   charset-normalizer==3.3.2
   click==8.1.7
   clickclick==20.10.2
   cloud-init==23.2.2
   cmake==3.27.1
   colorama==0.4.6
   colorlog==4.8.0
   command-not-found==0.3
   configobj==5.0.6
   ConfigUpdater==3.1.1
   connexion==2.14.2
   constantly==15.1.0
   contourpy==1.1.0
   cron-descriptor==1.4.0
   croniter==2.0.1
   cryptography==41.0.5
   cycler==0.11.0
   dbus-python==1.2.18
   Deprecated==1.2.14
   dill==0.3.1.1
   distro==1.7.0
   distro-info==1.1+ubuntu0.1
   dnspython==2.4.2
   docutils==0.20.1
   email-validator==1.3.1
   exceptiongroup==1.1.2
   filelock==3.12.2
   Flask==2.2.5
   Flask-AppBuilder==4.3.6
   Flask-Babel==2.0.0
   Flask-Caching==2.1.0
   Flask-JWT-Extended==4.5.3
   Flask-Limiter==3.5.0
   Flask-Login==0.6.3
   Flask-Session==0.5.0
   Flask-SQLAlchemy==2.5.1
   Flask-WTF==1.2.1
   fonttools==4.42.0
   frozenlist==1.4.0
   gevent==23.7.0
   google-re2==1.1
   googleapis-common-protos==1.61.0
   graphviz==0.20.1
   greenlet==3.0.1
   grpcio==1.59.2
   gunicorn==21.2.0
   h11==0.14.0
   httpcore==0.16.3
   httplib2==0.20.2
   httpx==0.23.3
   hyperlink==21.0.0
   idna==3.4
   importlib-metadata==6.8.0
   importlib-resources==6.1.0
   imutils==0.5.4
   incremental==21.3.0
   inflection==0.5.1
   itsdangerous==2.1.2
   jeepney==0.7.1
   Jinja2==3.1.2
   jsonpatch==1.32
   jsonpointer==2.0
   jsonschema==4.19.2
   jsonschema-specifications==2023.7.1
   keyring==23.5.0
   kiwisolver==1.4.4
   launchpadlib==1.10.16
   lazr.restfulclient==0.14.4
   lazr.uri==1.0.6
   lazy-object-proxy==1.9.0
   limits==3.6.0
   linkify-it-py==2.0.2
   lit==16.0.6
   lockfile==0.12.2
   Mako==1.2.4
   Markdown==3.5.1
   markdown-it-py==3.0.0
   MarkupSafe==2.1.3
   marshmallow==3.20.1
   marshmallow-enum==1.5.1
   marshmallow-oneofschema==3.0.1
   marshmallow-sqlalchemy==0.26.1
   matplotlib==3.7.2
   mdit-py-plugins==0.4.0
   mdurl==0.1.2
   more-itertools==8.10.0
   mpmath==1.3.0
   multidict==6.0.4
   mutils==1.0.5
   mysql-connector-python==8.1.0
   mysqlclient==2.1.1
   netifaces==0.11.0
   networkx==3.1
   numpy==1.25.2
   nvidia-cublas-cu11==11.10.3.66
   nvidia-cuda-cupti-cu11==11.7.101
   nvidia-cuda-nvrtc-cu11==11.7.99
   nvidia-cuda-runtime-cu11==11.7.99
   nvidia-cudnn-cu11==8.5.0.96
   nvidia-cufft-cu11==10.9.0.58
   nvidia-curand-cu11==10.2.10.91
   nvidia-cusolver-cu11==11.4.0.1
   nvidia-cusparse-cu11==11.7.4.91
   nvidia-nccl-cu11==2.14.3
   nvidia-nvtx-cu11==11.7.91
   oauthlib==3.2.0
   opencv-python==4.8.0.76
   opentelemetry-api==1.20.0
   opentelemetry-exporter-otlp==1.20.0
   opentelemetry-exporter-otlp-proto-common==1.20.0
   opentelemetry-exporter-otlp-proto-grpc==1.20.0
   opentelemetry-exporter-otlp-proto-http==1.20.0
   opentelemetry-proto==1.20.0
   opentelemetry-sdk==1.20.0
   opentelemetry-semantic-conventions==0.41b0
   ordered-set==4.1.0
   packaging==23.2
   pandas==2.0.3
   pathspec==0.11.2
   pendulum==2.1.2
   pexpect==4.8.0
   Pillow==10.0.0
   pluggy==1.3.0
   prison==0.2.1
   protobuf==4.24.4
   psutil==5.9.6
   ptyprocess==0.7.0
   pyasn1==0.4.8
   pyasn1-modules==0.2.1
   pycparser==2.21
   pydantic==2.4.2
   pydantic_core==2.10.1
   Pygments==2.16.1
   PyGObject==3.42.1
   PyHamcrest==2.0.2
   PyJWT==2.8.0
   pyOpenSSL==21.0.0
   pyparsing==3.0.9
   pyrsistent==0.18.1
   pyserial==3.5
   python-apt==2.4.0+ubuntu2
   python-daemon==3.0.1
   python-dateutil==2.8.2
   python-debian==0.1.43+ubuntu1.1
   python-magic==0.4.24
   python-nvd3==0.15.0
   python-slugify==8.0.1
   pytz==2023.3.post1
   pytzdata==2020.1
   PyYAML==6.0.1
   referencing==0.30.2
   requests==2.31.0
   requests-toolbelt==1.0.0
   rfc3339-validator==0.1.4
   rfc3986==1.5.0
   rich==13.6.0
   rich-argparse==1.4.0
   rpds-py==0.10.6
   scipy==1.11.1
   seaborn==0.12.2
   SecretStorage==3.3.1
   service-identity==18.1.0
   setproctitle==1.3.3
   six==1.16.0
   sniffio==1.3.0
   sos==4.5.6
   SQLAlchemy==1.4.50
   SQLAlchemy-JSONField==1.0.1.post0
   SQLAlchemy-Utils==0.41.1
   sqlparse==0.4.4
   ssh-import-id==5.11
   sympy==1.12
   systemd-python==234
   tabulate==0.9.0
   tenacity==8.2.3
   termcolor==2.3.0
   text-unidecode==1.3
   torch==2.0.1
   torchvision==0.15.2
   tqdm==4.66.1
   triton==2.0.0
   Twisted==22.1.0
   typing_extensions==4.8.0
   tzdata==2023.3
   ubuntu-advantage-tools==8001
   ubuntu-drivers-common==0.0.0
   uc-micro-py==1.0.2
   ufw==0.36.1
   unattended-upgrades==0.1
   unicodecsv==0.14.1
   urllib3==1.26.18
   wadllib==1.3.6
   Werkzeug==2.2.3
   wrapt==1.15.0
   WTForms==3.0.1
   xkit==0.0.0
   xlrd==2.0.1
   yarl==1.9.2
   zipp==3.17.0
   zope.event==5.0
   zope.interface==5.4.0
   
   ### Deployment
   
   Virtualenv installation
   
   ### Deployment details
   
   - Miniconda latest version
   - Apache Airflow 2.7.3
   
   ### Anything else
   
   Full traceback:
   *** Found local files:
   ***   * /home/apache/airflow/logs/dag_id=MASP_Slave/run_id=manual__2023-11-28T00:01:58.619294+00:00/task_id=transform/attempt=1.log
   [2023-11-30, 12:29:58 CET] {taskinstance.py:1159} INFO - Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: MASP_Slave.transform manual__2023-11-28T00:01:58.619294+00:00 [queued]>
   [2023-11-30, 12:29:58 CET] {taskinstance.py:1159} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: MASP_Slave.transform manual__2023-11-28T00:01:58.619294+00:00 [queued]>
   [2023-11-30, 12:29:58 CET] {taskinstance.py:1361} INFO - Starting attempt 1 of 1
   [2023-11-30, 12:29:58 CET] {taskinstance.py:1382} INFO - Executing <Task(PythonOperator): transform> on 2023-11-28 00:01:58.619294+00:00
   [2023-11-30, 12:29:58 CET] {standard_task_runner.py:57} INFO - Started process 3168332 to run task
   [2023-11-30, 12:29:58 CET] {standard_task_runner.py:84} INFO - Running: ['airflow', 'tasks', 'run', 'MASP_Slave', 'transform', 'manual__2023-11-28T00:01:58.619294+00:00', '--job-id', '890530', '--raw', '--subdir', 'DAGS_FOLDER/manufacturing/tests/MASP_Slave.py', '--cfg-path', '/tmp/tmp6usvkgvc']
   [2023-11-30, 12:29:58 CET] {standard_task_runner.py:85} INFO - Job 890530: Subtask transform
   [2023-11-30, 12:29:59 CET] {task_command.py:416} INFO - Running <TaskInstance: MASP_Slave.transform manual__2023-11-28T00:01:58.619294+00:00 [running]> on host clb-34a01
   [2023-11-30, 12:29:59 CET] {taskinstance.py:1662} INFO - Exporting env vars: AIRFLOW_CTX_DAG_OWNER='IMEDA' AIRFLOW_CTX_DAG_ID='MASP_Slave' AIRFLOW_CTX_TASK_ID='transform' AIRFLOW_CTX_EXECUTION_DATE='2023-11-28T00:01:58.619294+00:00' AIRFLOW_CTX_TRY_NUMBER='1' AIRFLOW_CTX_DAG_RUN_ID='manual__2023-11-28T00:01:58.619294+00:00'
   [2023-11-30, 12:29:59 CET] {taskinstance.py:1937} ERROR - Task failed with exception
   Traceback (most recent call last):
     File "/home/apache/.local/lib/python3.10/site-packages/airflow/models/xcom.py", line 681, in _deserialize_value
       return pickle.loads(result.value)
   _pickle.UnpicklingError: pickle data was truncated
   During handling of the above exception, another exception occurred:
   Traceback (most recent call last):
     File "/home/apache/.local/lib/python3.10/site-packages/airflow/operators/python.py", line 192, in execute
       return_value = self.execute_callable()
     File "/home/apache/.local/lib/python3.10/site-packages/airflow/operators/python.py", line 209, in execute_callable
       return self.python_callable(*self.op_args, **self.op_kwargs)
     File "/home/apache/airflow/dags/manufacturing/tests/MASP_Slave.py", line 108, in transform
       extracting_res = [output for output in task_outputs if output is not None]
     File "/home/apache/airflow/dags/manufacturing/tests/MASP_Slave.py", line 108, in <listcomp>
       extracting_res = [output for output in task_outputs if output is not None]
     File "/home/apache/.local/lib/python3.10/site-packages/airflow/models/xcom.py", line 720, in __next__
       return XCom.deserialize_value(next(self._it))
     File "/home/apache/.local/lib/python3.10/site-packages/airflow/models/xcom.py", line 693, in deserialize_value
       return BaseXCom._deserialize_value(result, False)
     File "/home/apache/.local/lib/python3.10/site-packages/airflow/models/xcom.py", line 683, in _deserialize_value
       return json.loads(result.value.decode("UTF-8"), cls=XComDecoder, object_hook=object_hook)
   UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
   [2023-11-30, 12:29:59 CET] {taskinstance.py:1400} INFO - Marking task as FAILED. dag_id=MASP_Slave, task_id=transform, execution_date=20231128T000158, start_date=20231130T112958, end_date=20231130T112959
   [2023-11-30, 12:29:59 CET] {standard_task_runner.py:104} ERROR - Failed to execute job 890530 for task transform ('utf-8' codec can't decode byte 0x80 in position 0: invalid start byte; 3168332)
   [2023-11-30, 12:29:59 CET] {local_task_job_runner.py:228} INFO - Task exited with return code 1
   [2023-11-30, 12:29:59 CET] {taskinstance.py:2778} INFO - 0 downstream tasks scheduled from follow-on schedule check
   
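   For what it's worth, the decoding failure itself can be reproduced outside Airflow. Any value pickled with protocol 2 or higher starts with the pickle PROTO opcode 0x80, which is not a valid UTF-8 start byte, so handing the raw blob to json.loads() fails exactly like the log above. This is only an illustration of the error, not the actual stored XCom row:

   import json
   import pickle

   import pandas as pd

   # Illustrative only: a pickled object (protocol >= 2) begins with b'\x80'.
   blob = pickle.dumps(pd.DataFrame({"a": [1, 2]}))
   print(hex(blob[0]))  # 0x80

   try:
       json.loads(blob.decode("UTF-8"))
   except UnicodeDecodeError as exc:
       # 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
       print(exc)
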
   The function where it fails:
   def transform(**kwargs):
       ti = kwargs['ti']
       task_outputs = ti.xcom_pull(task_ids=["old_extract", "new_extract"])
       extracting_res = [output for output in task_outputs if output is not None]
       df = extracting_res[0]
       df = df.rename(columns={"data_n":"info_data_n"})
       schema = _template_slave.get_schema("test_wafer")
       df = concat_time_date(df)
       df = _template_slave.filter_(df, schema)
       df = _template_slave.cast(df, schema)
       df["location"] = kwargs["dag_run"].conf['location']
       df["src_path"] = kwargs["dag_run"].conf['path']
       return df
   
   The function where the data comes from:
   def old_extract(**kwargs):
       path = kwargs["dag_run"].conf['path']
       header_size = _template_slave.header(path, "Puce N°")
       raw_df = pd.read_csv(path, header=header_size, sep="\t", encoding='iso-8859-1', engine='python')
       df = raw_df[raw_df.count(axis=1) > 6][1:]
       df = get_header_fields(path, df, comment = "")
       str_columns = [col for col in df.columns if isinstance(df[col], str)]
       df[str_columns] = df[str_columns].rename(columns=str.lower)\
                                        .rename(columns=_template_slave.remove_accents)\
                                        .rename(columns=_template_slave.remove_special_characters)
       return df
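
   The combination of errors (UnpicklingError "pickle data was truncated", then a UTF-8/JSON failure on the same row) could point either to the two environments disagreeing on the [core] enable_xcom_pickling setting, or to the XCom BLOBs having been truncated when the metadata database was moved to the new machine. As a purely diagnostic sketch (it only reads the config, no changes), the effective setting can be compared on both hosts like this, or with `airflow config get-value core enable_xcom_pickling`:

   from airflow.configuration import conf

   # Compare this value between the old WSL environment and the new Ubuntu one;
   # it controls whether XCom values are stored as pickles or as JSON.
   print(conf.getboolean("core", "enable_xcom_pickling"))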
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

