gaogaotiantian commented on code in PR #55526:
URL: https://github.com/apache/spark/pull/55526#discussion_r3139884880
##########
python/pyspark/testing/mlutils.py:
##########
@@ -99,11 +100,6 @@ def tearDownClass(cls):
class MockDataset(DataFrame):
Review Comment:
Because it is supposed to use the classic dataframe. The "parent" dataframe
is not supposed to be used directly.
```python
# HACK ALERT!! this is to reduce the backward compatibility concern, and returns
# Spark Classic DataFrame by default. This is NOT an API, and NOT supposed to
# be directly invoked. DO NOT use this constructor.
```
We have a really weird way of keeping backward compatibility: the parent class
calls the child's `__new__`. If we do not inherit from the classic dataframe,
we need to write a `__new__` method for `MockDataset`, because the parent
dataframe's `__new__` expects certain arguments that `MockDataset` does not
provide. Both the classic and connect `DataFrame` are fine because they have
`__init__` to indicate type annotations.
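
To illustrate the point, here is a minimal, self-contained sketch of the pattern. It is not the pyspark code: `ParentFrame`, `ClassicFrame`, and both mock classes are hypothetical stand-ins, and the permissive `__new__` on the child is an assumption made only to show why subclassing the child avoids the problem.

```python
class ParentFrame:
    """Hypothetical stand-in for the "parent" DataFrame."""

    def __new__(cls, jdf, session):
        # The parent's __new__ requires (jdf, session) and always builds
        # the classic child, whatever class it was called on.
        return ClassicFrame.__new__(ClassicFrame, jdf, session)


class ClassicFrame(ParentFrame):
    """Hypothetical stand-in for the classic DataFrame."""

    def __new__(cls, *args, **kwargs):
        # Assumption for this sketch: the child overrides __new__, so its
        # subclasses are not bound to the parent's (jdf, session) signature.
        return object.__new__(cls)

    def __init__(self, jdf, session):
        self.jdf = jdf
        self.session = session


class MockFromParent(ParentFrame):
    """A mock that inherits the parent's strict __new__."""

    def __init__(self):
        self.index = 0


class MockFromClassic(ClassicFrame):
    """A mock that inherits the child's __new__ instead."""

    def __init__(self):
        self.index = 0


# Constructing the parent yields the classic child.
df = ParentFrame("jdf", "session")
print(type(df).__name__)

# The mock derived from the parent fails before __init__ ever runs.
try:
    MockFromParent()
except TypeError as exc:
    print("TypeError:", exc)

# The mock derived from the classic stand-in constructs without arguments.
print(MockFromClassic().index)
```

In this sketch, `MockFromParent()` fails because Python resolves `__new__` to the parent's version, which demands two arguments the mock never passes. Deriving from the child sidesteps that without a hand-written `__new__` in the mock.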
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]