codeyeeter commented on a change in pull request #1323:
URL: https://github.com/apache/systemds/pull/1323#discussion_r663380211
##########
File path: src/main/python/tests/examples/tutorials/test_adult.py
##########
@@ -386,51 +386,35 @@ def test_level2(self):
"""""
################################################################################################################
- X1, M1 = X1.transform_encode(spec=jspec).compute()
+ X1, M1 = X1.transform_encode(spec=jspec)
################################################################################################################
""""
- First we re-split out data into a training and a test set with the corresponding labels. We can then simply transform
- the numpy array of the training data back to SystemDS matrix by using "sds.from_numpy()".
- The SystemDS scale function takes a matrix as an input and returns three output parameters:
- # Y Matrix --- Output feature matrix with K columns
- # ColMean Matrix --- The column means of the input, subtracted if Center was TRUE
- # ScaleFactor Matrix --- The Scaling of the values, to make each dimension have similar value ranges
- If we want to retransform a SystemDs Matrix to a Numpy array we can do so by using the np.array() function.
+ First we re-split out data into a training and a test set with the corresponding labels.
"""""
################################################################################################################
- col_length = len(X1[0])
- X = X1[0:train_count, 0:col_length - 1]
- Y = X1[0:train_count, col_length - 1:col_length].flatten()
- # Test data
- Xt = X1[train_count:train_count + test_count, 0:col_length - 1]
- Yt = X1[train_count:train_count + test_count, col_length - 1:col_length].flatten()
+ PREPROCESS_package = self.sds.source(self.preprocess_src_path, "preprocess", print_imported_methods=True)
+ X = PREPROCESS_package.get_X(X1, train_count)
+ Y = PREPROCESS_package.get_Y(X1, train_count)
+ #We lose the column count information after using the Preprocess Package. This triggers an error on multilogregpredict. Otherwise its working
+ Xt = self.sds.from_numpy(np.array(PREPROCESS_package.get_Xt(X1, train_count).compute()))
Review comment:
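We lose the column count information after splitting the matrix in a
sourced DML file, which triggers an error in multilogregpredict. Is there
a way around this that does not rely on the workaround above (computing
the result and re-wrapping it with sds.from_numpy())?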
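
For reference, a minimal NumPy-only sketch of the split that the removed
lines performed, where the column count stays available through `.shape`.
The helper name `split_features_labels` and the toy data are assumptions
made for illustration; this does not use the SystemDS API or the sourced
DML functions.

```python
import numpy as np


def split_features_labels(data, train_count, test_count):
    """Split a 2-D array whose last column holds the labels.

    Hypothetical helper mirroring the slicing in the removed lines.
    """
    col_length = data.shape[1]
    # Training rows: all columns except the last are features.
    X = data[0:train_count, 0:col_length - 1]
    Y = data[0:train_count, col_length - 1]
    # Test rows follow directly after the training rows.
    Xt = data[train_count:train_count + test_count, 0:col_length - 1]
    Yt = data[train_count:train_count + test_count, col_length - 1]
    return X, Y, Xt, Yt


# Toy data: 8 rows, 4 feature columns plus 1 label column.
data = np.arange(40, dtype=float).reshape(8, 5)
X, Y, Xt, Yt = split_features_labels(data, train_count=5, test_count=3)
print(X.shape, Y.shape, Xt.shape, Yt.shape)
```

Here the feature matrices keep their known column count (4) after the
split, which is the information that seems to go missing on the DML side.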