date:20190312

Re: [scikit-learn] Difference in prediction accuracy using SGDClassifier and Cross validation scores.

2019-03-12 Thread Joel Nothman

You are calculating recall, not accuracy.
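
To make the distinction concrete, here is a minimal sketch that rebuilds the confusion-matrix counts from the numbers printed in the quoted output below. The total of 60000 training samples is an assumption (the standard MNIST training split); the other three counts are taken directly from the output. It is plain arithmetic, not a rerun of the original script.

```python
# Counts reported in the quoted output. The total of 60000 is an
# assumption (standard MNIST training split), not taken from the output.
total = 60000
actual_pos = 5421      # "Total y_train_5 [Only 5s]"
predicted_pos = 3863   # "Predicted 5s"
true_pos = 3574        # "Correct Predicted"

false_pos = predicted_pos - true_pos   # non-5s predicted as 5
false_neg = actual_pos - true_pos      # 5s predicted as not-5
true_neg = total - true_pos - false_pos - false_neg

# The script divides correct positives by actual positives: that is recall.
recall = true_pos / actual_pos
precision = true_pos / predicted_pos
# Accuracy counts every correct prediction, including correct "not 5"s.
accuracy = (true_pos + true_neg) / total

print(f"recall:    {recall:.4f}")
print(f"precision: {precision:.4f}")
print(f"accuracy:  {accuracy:.4f}")
```

Under that assumption the recall comes out at roughly 0.66, matching the figure the script labels "Accuracy", while the accuracy on the training set is roughly 0.96, which is in the same range as the cross_val_score results. Note that cross_val_score evaluates on held-out folds, so the two accuracy figures are not expected to match exactly. With label arrays at hand, sklearn.metrics.accuracy_score and sklearn.metrics.recall_score compute the same quantities.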

On Sun, 10 Mar 2019 at 05:36, Rajnish kamboj wrote:
>
> Hi
>
> I have recently started learning machine learning, and this is my first
> query. It is about prediction accuracy.
>
> There is a difference between the prediction accuracy I get from
> SGDClassifier and the cross-validation scores.
>
> import numpy as np
> from sklearn.datasets import fetch_openml
> from sklearn.linear_model import SGDClassifier
>
> mnist = fetch_openml('mnist_784', version=1, cache=True)
> X, y = mnist['data'], mnist['target']
> X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]
> shuffled_index = np.random.permutation(60000) # shuffle the 0 - 60000 range
> X_train, y_train = X_train[shuffled_index], y_train[shuffled_index]
>
> y_train_5 = (y_train == '5')
> y_test_5 = (y_test == '5')
>
> sgd_clf = SGDClassifier(random_state=42, tol=1e-3, max_iter=1000)
> sgd_clf.fit(X_train, y_train_5)
>
> # Predicting for all 5s
> print("### PREDICTION STATS ##")
> y_train_5_pred = sgd_clf.predict(X_train)
>
> print("Total y_train_5 [False|True both]]:", len(y_train_5))
> print("Total y_train_5 [Only 5s]:", sum(y_train_5))
>
> # some other digit may be predicted as 5 and some 5s may be predicted as not 5
> print("Predicted 5s:", sum(y_train_5_pred))
>
> correctly_predicted = sum(np.logical_and(y_train_5_pred, y_train_5))
> print("Correct Predicted", correctly_predicted)
> print("Accuracy:", correctly_predicted/sum(y_train_5) * 100)
>
> from sklearn.model_selection import cross_val_score
> cross_val_score(sgd_clf, X_train, y_train_5, cv=3, scoring='accuracy')
>
> MY Output
>
> ### PREDICTION STATS ##
> Total y_train_5 [False|True both]]: 60000
> Total y_train_5 [Only 5s]: 5421
> Predicted 5s: 3863
> Correct Predicted 3574
> Accuracy: 65.9287954251983
> array([0.9323 , 0.96805, 0.9641 ])
> ###
>
> So, as per my observation, there is a difference. Why?
>
> SGDClassifier is ~65.92% accurate
> cross_val_score results are ~95%
>
> Am I comparing them in the wrong way, or am I missing something?
>
>
> Thanks
>
> Rajnish
>
> ___
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

[scikit-learn] [WiMLDS scikit-learn] open source sprint in Nairobi, Kenya

2019-03-12 Thread Reshama Shaikh

I am an organizer of the New York City chapter of WiMLDS (Women in Machine
Learning & Data Science) (http://wimlds.org).

We would like to organize a scikit-learn sprint for our Nairobi chapter
(https://www.meetup.com/topics/wimlds/all/), which is also our 4th largest
of 51 chapters, with 2100+ members.

Reference:  Impact Report for WiMLDS Scikit-learn Sprints (
https://reshamas.github.io/impact-report-for-wimlds-scikit-learn-sprints/)

Would anyone be available to facilitate this event?  It would be on a
Saturday in June 2019.  I can be reached at resh...@wimlds.org for more
information.

Best,
Reshama
--
Reshama Shaikh 

NYC WiMLDS

NYC PyLadies

--
