Niketan Pansare created SYSTEMML-1123:
-----------------------------------------
Summary: Refactor scikit-learn to make it scalable by using Python DSL
Key: SYSTEMML-1123
URL: https://issues.apache.org/jira/browse/SYSTEMML-1123
Project: SystemML
Issue Type: New Feature
Reporter: Niketan Pansare

1. Eliminate the explicit conversion of SystemML matrices to NumPy arrays: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/validation.py#L382
2. Use scalable SystemML operations whenever possible.

The following code should work:

{code:java}
from sklearn import datasets, neighbors, linear_model
import systemml as sml

# Training and test data are held as SystemML matrices, not NumPy arrays
X_train = sml.matrix( ... )
y_train = sml.matrix( ... )
X_test = sml.matrix( ... )
y_test = sml.matrix( ... )

knn = neighbors.KNeighborsClassifier()
logistic = linear_model.LogisticRegression()

# fit() and score() should accept the SystemML matrices directly
print('KNN score: %f' % knn.fit(X_train, y_train).score(X_test, y_test))
print('LogisticRegression score: %f' % logistic.fit(X_train, y_train).score(X_test, y_test))
{code}

[~mwdus...@us.ibm.com] [~iyounus] [~freiss] [~reinwald]
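To illustrate item 1, here is a minimal sketch of an input-validation helper that returns a scalable matrix unchanged instead of converting it to a local array. All names in it are assumptions made for illustration: `LazyMatrix` is a hypothetical stand-in for the object returned by `sml.matrix(...)`, and `check_array_sketch` is a much-simplified stand-in for the validation routine linked above, not scikit-learn's actual implementation.

```python
# Hypothetical stand-in for a distributed matrix such as the one returned by
# sml.matrix(...); it is NOT the real SystemML class.
class LazyMatrix:
    def __init__(self, rows):
        self._rows = rows
        self.shape = (len(rows), len(rows[0]) if rows else 0)
        self.materialized = False  # flips to True once data is pulled locally

    def to_local(self):
        """Collect all data locally (the expensive step the issue wants to avoid)."""
        self.materialized = True
        return [list(r) for r in self._rows]


def check_array_sketch(X, passthrough_types=(LazyMatrix,)):
    """Validate X, but leave scalable matrix types unconverted.

    Simplified illustration only; a real validation routine performs many
    more checks (dtype, finiteness, sparsity, and so on).
    """
    if isinstance(X, passthrough_types):
        if len(X.shape) != 2:
            raise ValueError("expected a 2-D matrix, got shape %r" % (X.shape,))
        return X  # same object is returned; no local copy is made
    # Fallback for ordinary Python sequences: convert eagerly to nested lists
    rows = [list(r) for r in X]
    if not rows or any(len(r) != len(rows[0]) for r in rows):
        raise ValueError("expected a non-empty rectangular 2-D input")
    return rows


X = LazyMatrix([[1.0, 2.0], [3.0, 4.0]])
checked = check_array_sketch(X)
print(checked is X, X.materialized)
```

The point of the sketch is only that validation can check shape metadata without calling `to_local()`. Whether the estimators themselves can then operate on the unconverted matrix is what item 2 would have to address.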