Gandagorn commented on a change in pull request #1323:
URL: https://github.com/apache/systemds/pull/1323#discussion_r663786508



##########
File path: src/main/python/tests/examples/tutorials/test_adult.py
##########
@@ -387,6 +387,11 @@ def test_level2(self):
         
################################################################################################################
         X1, M1 = X1.transform_encode(spec=jspec)
 
+        # better alternative for encoding
+        # X1, M = F1.transform_encode(spec=jspec)
+        # X2 = F2.transform_apply(spec=jspec, meta=M)
+        # testX2 = X2.compute(True)

Review comment:
       @Baunsgaard
   Hi, we tried to implement a better version of the encoding using 
transform_apply, because otherwise the imputation statistics would be computed 
on the whole dataset instead of just the training data.
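
   To illustrate the concern, here is a minimal sketch in plain Python (not 
SystemDS code). The column values, the helper names `fit_mean` and 
`apply_impute`, and the choice of mean imputation are assumptions made only 
for this example; the point is that the fill value should be fitted on the 
training split and then merely applied to the test split.

```python
from statistics import mean

def fit_mean(values):
    """Fit step: mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    return mean(observed)

def apply_impute(values, fill):
    """Apply step: replace missing entries with a previously fitted value."""
    return [fill if v is None else v for v in values]

# Hypothetical toy column, split into train and test.
train_age = [20, 40, None]
test_age = [60, None]

# Statistic fitted on the training split only.
train_mean = fit_mean(train_age)

# Statistic fitted on train + test: the test value 60 leaks into it.
leaky_mean = fit_mean(train_age + test_age)

# The test split is imputed with the train-only statistic.
test_imputed = apply_impute(test_age, train_mean)
```

   With these toy values the two fitted means differ, which is exactly the 
leak that a fit-on-train, apply-on-test encoding is meant to avoid.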
   Unfortunately, we ran into the problem that the labels in the training data 
differ slightly from the labels in the test data ("<=50K" != "<=50K."), which 
prevents us from reusing the encoding M for the test data. We tried several 
ways of correcting the test labels before encoding, but the main problem is 
that we have not found a good way to modify the values in a frame column, and 
we are also unable to pass frames as arguments to a DML script function (this 
does not seem to be supported from Python yet?).
   The simplest solution would be to remove the trailing "." from the test 
labels in the file itself. Other possible workarounds using only SystemDS 
functions seem quite complex and would probably defeat the main goal of 
this tutorial.
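
   As a rough sketch of the mismatch and of the proposed cleanup, again in 
plain Python rather than SystemDS: `fit_recode` and `strip_trailing_dot` are 
hypothetical helpers, and the recode map only stands in for the role that the 
metadata M plays. It is not how transform_encode stores its metadata.

```python
def fit_recode(labels):
    """Assign 1-based codes to distinct labels in order of first appearance."""
    codes = {}
    for label in labels:
        codes.setdefault(label, len(codes) + 1)
    return codes

def strip_trailing_dot(label):
    """Drop surrounding whitespace and any trailing '.' from a label."""
    return label.strip().rstrip(".")

# Toy labels mirroring the mismatch described above.
train_labels = ["<=50K", ">50K", "<=50K"]
test_labels = ["<=50K.", ">50K."]

# The encoding is fitted on the training labels only.
codes = fit_recode(train_labels)

# Raw test labels have no entry in the fitted encoding.
unmatched = [label for label in test_labels if label not in codes]

# After the cleanup, the fitted encoding applies to the test labels.
cleaned = [strip_trailing_dot(label) for label in test_labels]
encoded = [codes[label] for label in cleaned]
```

   The same cleanup could be done once on the test file itself, which would 
keep the tutorial code free of this step.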
   How should we proceed?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


