gaogaotiantian commented on code in PR #55526:
URL: https://github.com/apache/spark/pull/55526#discussion_r3139884880


##########
python/pyspark/testing/mlutils.py:
##########
@@ -99,11 +100,6 @@ def tearDownClass(cls):
 
 
 class MockDataset(DataFrame):

Review Comment:
   Because `MockDataset` is supposed to use the Classic DataFrame. The "parent" DataFrame is not supposed to be used directly.
   
   ```python
       # HACK ALERT!! this is to reduce the backward compatibility concern, and returns
       # Spark Classic DataFrame by default. This is NOT an API, and NOT supposed to
       # be directly invoked. DO NOT use this constructor.
   ```
   
   We keep backward compatibility in a really weird way: the parent class's `__new__` calls the child class's `__new__`. If `MockDataset` does not inherit from the Classic DataFrame, we need to write a `__new__` method for it, because the parent DataFrame's `__new__` expects arguments that `MockDataset` does not provide. Both the Classic and Connect `DataFrame` classes are fine because they define `__init__` to provide the type annotations.
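The constructor dispatch described above can be modeled with a few plain Python classes. This is a simplified, hypothetical sketch: `ParentDataFrame`, `ClassicDataFrame`, and the three `Mock*` classes are illustrative stand-ins, not PySpark's real classes, and the assumption that the Classic `__new__` can be called without arguments is made only for this model. It shows why a mock that inherits the parent's `__new__` fails, and the two ways around it: override `__new__` in the mock, or inherit from the Classic class.

```python
# Hypothetical, simplified model of the pattern above; NOT PySpark's real classes.


class ParentDataFrame:
    """Stand-in for the shared parent DataFrame."""

    def __new__(cls, jdf, session):
        # Backward-compatibility hack: constructing the parent actually builds
        # the Classic child, so ParentDataFrame(jdf, session) keeps working.
        return ClassicDataFrame.__new__(ClassicDataFrame, jdf, session)


class ClassicDataFrame(ParentDataFrame):
    """Stand-in for the Classic DataFrame."""

    def __new__(cls, *args, **kwargs):
        # Assumption for this sketch: allocate an instance of whichever subclass
        # is being built and ignore the arguments (they go to __init__).
        return object.__new__(cls)

    def __init__(self, jdf, session):
        self._jdf = jdf
        self._session = session


class MockFromParent(ParentDataFrame):
    # Inherits the parent's __new__, which requires (jdf, session).
    def __init__(self):
        self.index = 0


class MockFromParentWithNew(ParentDataFrame):
    # Works only because it overrides __new__ to bypass the parent's hack.
    def __new__(cls):
        return object.__new__(cls)

    def __init__(self):
        self.index = 0


class MockFromClassic(ClassicDataFrame):
    # Inherits the Classic __new__, so no extra __new__ is needed.
    def __init__(self):
        self.index = 0


df = ParentDataFrame("jdf", "spark")
print(type(df).__name__)  # ClassicDataFrame

try:
    MockFromParent()
except TypeError as exc:
    # The parent's __new__ is called without jdf/session.
    print("MockFromParent failed:", exc)

print(MockFromParentWithNew().index)  # 0
print(MockFromClassic().index)  # 0
```

In this model, inheriting from the Classic class keeps the mock to a bare `__init__`, whereas inheriting from the parent forces the mock to carry its own `__new__`.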



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
