Niketan Pansare created SYSTEMML-1123:
-----------------------------------------

             Summary: Refactor scikit-learn to make it scalable by using Python DSL
                 Key: SYSTEMML-1123
                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1123
             Project: SystemML
          Issue Type: New Feature
            Reporter: Niketan Pansare


1. Eliminate explicit conversion of SystemML matrices to NumPy arrays: 
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/validation.py#L382
2. Use scalable SystemML operations whenever possible

The following code should work:

{code:python}
from sklearn import datasets, neighbors, linear_model
import systemml as sml
X_train = sml.matrix( ... )
y_train = sml.matrix( ... )
X_test = sml.matrix( ... )
y_test = sml.matrix( ... )

knn = neighbors.KNeighborsClassifier()
logistic = linear_model.LogisticRegression()

print('KNN score: %f' % knn.fit(X_train, y_train).score(X_test, y_test))
print('LogisticRegression score: %f'
      % logistic.fit(X_train, y_train).score(X_test, y_test))
{code}
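
To illustrate item 1, here is a minimal, pure-Python sketch of the difference between a validation step that always copies its input into local memory and one that lets a lazy matrix pass through untouched. Every name in it (LazyMatrix, to_local, check_array_eager, check_array_passthrough) is hypothetical and chosen only for illustration; none of them is a scikit-learn or SystemML API, and nested lists stand in for NumPy arrays.

```python
# Hypothetical sketch: these names are NOT scikit-learn or SystemML APIs.

class LazyMatrix:
    """Stand-in for a distributed matrix whose data should stay remote."""

    def __init__(self, rows):
        self._rows = rows
        self.materialized = 0  # how many times the data was pulled locally

    @property
    def shape(self):
        n_cols = len(self._rows[0]) if self._rows else 0
        return (len(self._rows), n_cols)

    def to_local(self):
        """Expensive step: copy every value into local memory."""
        self.materialized += 1
        return [list(row) for row in self._rows]


def check_array_eager(X):
    """Validation that always converts its input to a local nested list."""
    if isinstance(X, LazyMatrix):
        return X.to_local()
    return [list(row) for row in X]


def check_array_passthrough(X):
    """Validation that checks the shape only and returns lazy inputs as-is."""
    if isinstance(X, LazyMatrix):
        n_rows, n_cols = X.shape
        if n_rows == 0 or n_cols == 0:
            raise ValueError("expected a non-empty 2D matrix")
        return X  # no data is copied
    return [list(row) for row in X]


A = LazyMatrix([[1.0, 2.0], [3.0, 4.0]])
check_array_eager(A)
print(A.materialized)  # 1: the whole matrix was copied locally

B = LazyMatrix([[1.0, 2.0], [3.0, 4.0]])
print(check_array_passthrough(B) is B, B.materialized)  # True 0
```

With the pass-through variant, an estimator could go on to call scalable operations on the returned object (item 2) instead of working on a local copy.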

[~mwdus...@us.ibm.com] [~iyounus] [~freiss] [~reinwald]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
