Gandagorn commented on a change in pull request #1323:
URL: https://github.com/apache/systemds/pull/1323#discussion_r663786508
##########
File path: src/main/python/tests/examples/tutorials/test_adult.py
##########
@@ -387,6 +387,11 @@ def test_level2(self):
################################################################################################################
X1, M1 = X1.transform_encode(spec=jspec)
+ # better alternative for encoding
+ # X1, M = F1.transform_encode(spec=jspec)
+ # X2 = F2.transform_apply(spec=jspec, meta=M)
+ # testX2 = X2.compute(True)
Review comment:
@Baunsgaard
Hi, we tried to implement a better version of the encoding using
transform_apply, because otherwise the imputation statistics would be computed
on the whole data set instead of only on the training data.
Unfortunately, the labels in the training data differ slightly from the labels
in the test data ("<=50K" != "<=50K."), which prevents us from reusing the
encoding metadata M for the test data. We tried several ways of correcting the
test labels before encoding, but the main problem is that we have not found a
good way to modify a single column of a frame, and we were also unable to pass
frames as arguments to a DML script function (this does not seem to be
supported from Python yet?).
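To illustrate the mismatch independently of SystemDS, here is a minimal
plain-Python sketch. The helpers fit_recode and apply_recode are hypothetical
stand-ins for the fit-on-train / apply-on-test pattern and are not SystemDS
API; the label lists are made-up examples.

```python
def fit_recode(values):
    """Build a value -> code map from the training values (codes start at 1)."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping) + 1
    return mapping


def apply_recode(values, mapping):
    """Encode values with an existing map; values not seen in training become None."""
    return [mapping.get(value) for value in values]


# Made-up samples that mimic the two label spellings.
train_labels = ["<=50K", ">50K", "<=50K"]
test_labels = ["<=50K.", ">50K."]

# The map is fitted on the training labels only.
meta = fit_recode(train_labels)

# Applying it to the raw test labels finds no matching entries.
encoded_raw = apply_recode(test_labels, meta)

# After removing the trailing "." the test labels match the training map.
encoded_fixed = apply_recode([label.rstrip(".") for label in test_labels], meta)
```

In this sketch the raw test labels cannot be encoded at all, while the
corrected ones map onto the codes learned from the training labels.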
The simplest solution would be to remove the trailing "." from the test
labels in the file itself. Other possible workarounds that use only SystemDS
functions seem quite complex and would probably distract from the main goal of
this tutorial.
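A possible one-off cleanup of the file, as a sketch in plain Python: it
assumes the label is the last field on each line and that no other line ends
with a ".". The function names and the file paths passed to clean_file are
hypothetical.

```python
def strip_label_dot(line):
    """Remove one trailing '.' from a CSV line while keeping its line ending."""
    body = line.rstrip("\r\n")
    ending = line[len(body):]
    stripped = body.rstrip()
    if stripped.endswith("."):
        # Drop trailing whitespace together with the final '.'.
        body = stripped[:-1]
    return body + ending


def clean_file(src_path, dst_path):
    """Write a copy of src_path to dst_path with trailing '.' removed per line."""
    with open(src_path, "r", newline="") as src, \
            open(dst_path, "w", newline="") as dst:
        for line in src:
            dst.write(strip_label_dot(line))
```

Lines that do not end with "." are written back unchanged, so running the
cleanup on an already corrected file has no effect.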
How should we proceed?