sjyangkevin commented on issue #50752:
URL: https://github.com/apache/airflow/issues/50752#issuecomment-2906361119

   Hi @Felix-neko,
   
   After some experiments and research, I think the root cause of the issue is 
very likely the serialization and deserialization that happens when `context` 
is passed into the virtual environment. One straightforward fix is to use 
`pendulum>=3` in the virtual environment, if that is feasible for your use 
case. Let me explain what I've found.
   
   First, you can get your code working by removing `**kwargs` from `def 
generate_message(*args, **kwargs)`, as shown in the screenshot below. When the 
function accepts `**kwargs`, Airflow implicitly passes the `context` into it. 
To run the function in the virtual environment, Airflow has to serialize the 
function and its arguments, and that is where the problem occurs.
   
   
![Image](https://github.com/user-attachments/assets/4e7b2178-b8ac-4cbb-9e16-5494f00e4967)
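   
   To illustrate why the signature matters, here is a simplified sketch of 
signature-based dispatch. It is not Airflow's actual code: 
`call_with_optional_context`, `generate_message_no_kwargs`, and `fake_context` 
are hypothetical names, and the context keys are only examples. It shows that a 
function without `**kwargs` never receives the context entries, so nothing from 
the context has to be serialized for it.

```python
import inspect


def accepts_var_kwargs(func):
    """Return True if func declares a **kwargs-style parameter."""
    return any(
        p.kind is inspect.Parameter.VAR_KEYWORD
        for p in inspect.signature(func).parameters.values()
    )


def call_with_optional_context(func, context):
    """Hypothetical dispatcher: forward context only when func takes **kwargs."""
    if accepts_var_kwargs(func):
        return func(**context)
    return func()


def generate_message(*args, **kwargs):
    # Receives every context entry, so each of them would have to be serialized.
    return sorted(kwargs)


def generate_message_no_kwargs(*args):
    # Receives nothing from the context, so nothing extra is serialized.
    return "no context"


# Stand-in for the Airflow context; keys and values are illustrative only.
fake_context = {"ds": "2025-05-24", "logical_date": "<datetime object>"}

with_kwargs = call_with_optional_context(generate_message, fake_context)
without_kwargs = call_with_optional_context(generate_message_no_kwargs, fake_context)
```

   In this sketch only the `**kwargs` variant ever sees the datetime-like 
entries, which matches the observation that dropping `**kwargs` avoids the 
error.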
   
   ### About `pendulum<3` and `pendulum>=3`
   
   The `Timezone` class in `pendulum` 2.1.2 inherits from `datetime.tzinfo`, and 
`tzinfo` defines a 
[`__reduce__`](https://github.com/stub42/pytz/blob/82e0891730a38fdcf8c9c680af34712d45a97fde/src/pytz/tzinfo.py#L521)
 method, which plays a crucial role in object serialization. You can think of 
it as custom serialization that needs a matching deserialization step. In 
`pendulum>=3`, the `Timezone` class no longer inherits from `tzinfo` directly; 
see [Timezone in 
pendulum>=3](https://github.com/python-pendulum/pendulum/blob/fc386be2623f711364c599df3e208eceb4dfa23b/src/pendulum/tz/timezone.py#L53).
   
   ### Why can datetime cause this error?
   If your Airflow deployment uses `pendulum>=3`, the datetime objects in the 
`context` are serialized without that custom serialization. However, they are 
then passed into the virtual environment and deserialized there, where the 
`pendulum` version is `<3`. The deserialization process looks for the callable 
and arguments packaged by the `__reduce__` method. Since `pendulum>=3` doesn't 
use this method anymore, the datetime objects in `context` cannot be 
deserialized, and the following error is raised: 
`AttributeError: type object 'Timezone' has no attribute '_unpickle'`.
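   
   This kind of failure can be reproduced with the standard library alone. The 
sketch below does not use pendulum's real classes: `DumpSideTimezone`, 
`LoadSideTimezone`, `install_as_timezone`, and the module name `fake_tz` are 
hypothetical stand-ins. It only shows the general mechanism: a pickle produced 
through `__reduce__` stores a reference to a callable by name, and loading 
fails when the class found under that name on the receiving side lacks it.

```python
import pickle
import sys
import types

# Hypothetical module standing in for "the module that defines Timezone".
fake_tz = types.ModuleType("fake_tz")
sys.modules["fake_tz"] = fake_tz


def install_as_timezone(cls):
    """Publish cls as fake_tz.Timezone so pickle can locate it by name."""
    cls.__module__ = "fake_tz"
    cls.__name__ = cls.__qualname__ = "Timezone"
    fake_tz.Timezone = cls
    return cls


class DumpSideTimezone:
    """Stand-in for the class in the environment that pickles the object."""

    def __init__(self, key):
        self.key = key

    @classmethod
    def _unpickle(cls, key):
        return cls(key)

    def __reduce__(self):
        # The pickle stores a reference to Timezone._unpickle plus its arguments.
        return (self.__class__._unpickle, (self.key,))


class LoadSideTimezone:
    """Stand-in for the class in the environment that unpickles; no _unpickle."""

    def __init__(self, key):
        self.key = key


# "Serializing" environment: Timezone provides the _unpickle helper.
install_as_timezone(DumpSideTimezone)
payload = pickle.dumps(DumpSideTimezone("UTC"))

# "Deserializing" environment: same name, but a different class definition.
install_as_timezone(LoadSideTimezone)
try:
    pickle.loads(payload)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

   Loading fails here with an `AttributeError` that mentions `_unpickle`, 
because the pickle refers to the helper by name and the class on the loading 
side does not define it. Keeping the same `pendulum` major version on both 
sides avoids this kind of mismatch.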
   
   @potiuk also mentioned in this 
[Stack Overflow thread](https://stackoverflow.com/questions/68886365/airflow-pythonvirtualenvoperators-access-to-context-datetime)
 that the datetime fields in `context` will be serialized.
   
   Let me know if I can help further. I would also appreciate feedback if 
anything in my findings is inaccurate.

