Re: [scikit-learn] question regarding 'RANSACRegressor' object has no attribute 'inlier_mask_'

2022-07-29 Thread Shang-Rou Hsieh via scikit-learn
 Thanks. I will give it a try.

On Friday, July 29, 2022 at 03:06:19 PM PDT, Guillaume Lemaître 
 wrote:  
 
 You need to fit the estimator to access the fitted attribute:
In [1]: from sklearn.linear_model import RANSACRegressor
   ...: from sklearn.datasets import make_regression
   ...: X, y = make_regression(
   ...:     n_samples=200, n_features=2, noise=4.0, random_state=0)
   ...: reg = RANSACRegressor(random_state=0).fit(X, y)

In [2]: reg.inlier_mask_
Out[2]: array([ True,  True,  True, ...,  True,  True,  True])
Cheers,
--
Guillaume Lemaitre
Scikit-learn @ Inria Foundation
https://glemaitre.github.io/

On 29 Jul 2022, at 23:27, Shang-Rou Hsieh via scikit-learn 
 wrote:
To whom it may concern,
Below is the code:
-  - - - - 
from sklearn.linear_model import LinearRegression, RANSACRegressor

ransac = RANSACRegressor(LinearRegression(),
                         max_trials=100,            # default
                         min_samples=0.95,
                         loss='absolute_error',     # default
                         residual_threshold=None,   # default
                         random_state=123)
inlier_mask = ransac.inlier_mask_



- - - - 
Here is the error message: 

AttributeError: 'RANSACRegressor' object has no attribute 'inlier_mask_'

So I checked the attributes of RANSACRegressor using dir(RANSACRegressor) and
I do not find 'inlier_mask_'.
 

Any advice?
Henry


___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


  ___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] question regarding 'RANSACRegressor' object has no attribute 'inlier_mask_'

2022-07-29 Thread Guillaume Lemaître
You need to fit the estimator to access the fitted attribute:

In [1]: from sklearn.linear_model import RANSACRegressor
   ...: from sklearn.datasets import make_regression
   ...: X, y = make_regression(
   ...: n_samples=200, n_features=2, noise=4.0, random_state=0)
   ...: reg = RANSACRegressor(random_state=0).fit(X, y)


In [2]: 

In [2]: reg.inlier_mask_
Out[2]: 
array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
True,  True,  True,  True,  True,  True,  True,  True,  True,
True,  True,  True,  True,  True,  True,  True,  True,  True,
True,  True,  True,  True,  True,  True,  True,  True,  True,
True,  True,  True,  True,  True,  True,  True,  True,  True,
True,  True,  True,  True,  True,  True,  True,  True,  True,
True,  True,  True,  True,  True,  True,  True,  True,  True,
True,  True,  True,  True,  True,  True,  True,  True,  True,
True,  True,  True,  True,  True,  True,  True,  True,  True,
True,  True,  True,  True,  True,  True,  True,  True,  True,
True,  True,  True,  True,  True,  True,  True,  True,  True,
True,  True,  True,  True,  True,  True,  True,  True,  True,
True,  True,  True,  True,  True,  True,  True,  True,  True,
True,  True,  True,  True,  True,  True,  True,  True,  True,
True,  True,  True,  True,  True,  True,  True,  True,  True,
True,  True,  True,  True,  True,  True,  True,  True,  True,
True,  True,  True,  True,  True,  True,  True,  True,  True,
True,  True,  True,  True,  True,  True,  True,  True,  True,
True,  True,  True,  True,  True,  True,  True,  True,  True,
True,  True,  True,  True,  True,  True,  True,  True,  True,
True,  True,  True,  True,  True,  True,  True,  True,  True,
True,  True,  True,  True,  True,  True,  True,  True,  True,
True,  True])
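
For completeness, the snippet from the original post needs the same fit step
before the attribute exists. A minimal sketch keeping the posted parameters
(the LinearRegression import and the synthetic data are additions here):

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, RANSACRegressor

X, y = make_regression(n_samples=200, n_features=2, noise=4.0,
                       random_state=123)
ransac = RANSACRegressor(LinearRegression(),
                         max_trials=100,            # default
                         min_samples=0.95,
                         loss='absolute_error',     # default
                         residual_threshold=None,   # default
                         random_state=123)
ransac.fit(X, y)                    # fitting creates the trailing-underscore attributes
inlier_mask = ransac.inlier_mask_   # boolean mask over the training samples
outlier_mask = ~inlier_mask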

Cheers,
--
Guillaume Lemaitre
Scikit-learn @ Inria Foundation
https://glemaitre.github.io/

> On 29 Jul 2022, at 23:27, Shang-Rou Hsieh via scikit-learn 
>  wrote:
> 
> To whom it may concern,
> 
> Below is the code:
> 
> -  - - - - 
> from sklearn.linear_model import LinearRegression, RANSACRegressor
> 
> ransac = RANSACRegressor(LinearRegression(),
>                          max_trials=100,            # default
>                          min_samples=0.95,
>                          loss='absolute_error',     # default
>                          residual_threshold=None,   # default
>                          random_state=123)
> 
> inlier_mask = ransac.inlier_mask_
> 
> 
> 
> - - - - 
> Here is the error message: 
> 
> AttributeError: 'RANSACRegressor' object has no attribute 'inlier_mask_'
> 
> So I checked the attributes of RANSACRegressor using dir(RANSACRegressor)
> and I do not find 'inlier_mask_'.
> 
> 
> Any advice?
> Henry 
> 
> 
> ___
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


[scikit-learn] question regarding 'RANSACRegressor' object has no attribute 'inlier_mask_'

2022-07-29 Thread Shang-Rou Hsieh via scikit-learn
To whom it may concern,
Below is the code:
-  - - - - 
from sklearn.linear_model import LinearRegression, RANSACRegressor

ransac = RANSACRegressor(LinearRegression(),
                         max_trials=100,            # default
                         min_samples=0.95,
                         loss='absolute_error',     # default
                         residual_threshold=None,   # default
                         random_state=123)
inlier_mask = ransac.inlier_mask_



- - - - 
Here is the error message: 

AttributeError: 'RANSACRegressor' object has no attribute 'inlier_mask_'

So I checked the attributes of RANSACRegressor using dir(RANSACRegressor) and
I do not find 'inlier_mask_'.
 

Any advice?
Henry 


___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Question RE: skLearn Logistic Regression

2020-10-31 Thread serafim loukas
These are not numpy arrays.

Try:

X = np.array([-3,-2,-1,0,1,2,3]).reshape(-1,1)

And

y = np.array([0, 0, 0, 1, 1, 1, 1]).reshape(-1,1)

Makis
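
Putting the pieces together, a minimal runnable sketch (two details beyond
the reshape: the estimator must be instantiated before calling fit, and y
can in fact stay 1-D):

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([-3, -2, -1, 0, 1, 2, 3]).reshape(-1, 1)  # (7, 1): one feature column
y = np.array([0, 0, 0, 1, 1, 1, 1])                    # 1-D targets are fine

clf = LogisticRegression()   # instantiate first; calling LogisticRegression.fit(X, y)
clf.fit(X, y)                # on the class itself is what "eats" the y argument
print(clf.predict(X))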


On 31 Oct 2020, at 17:51, The Helmbolds via scikit-learn 
 wrote:

I have a case with binary results and 1-D features, like:

X = np.array(-3,-2,-1,0,1,2,3,)

and

y = np.array(0, 0, 0, 1, 1, 1, 1)

only longer arrays (about 180 entries in each array) of this general type.

So this should be the "simplest" case.

Although I've tried several variations of the Logistic input formats, in

   LogisticRegression.fit(X, y)

they keep being rejected with the most common error message being

   Missing argument y

I assure you I do indeed have an array "y" that is passed to "fit"

So, what do I have to do to get Logistic Regression to accept 1-D features?


___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn

___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


[scikit-learn] Question RE: skLearn Logistic Regression

2020-10-31 Thread The Helmbolds via scikit-learn
I have a case with binary results and 1-D features, like:
    X = np.array(-3,-2,-1,0,1,2,3,)
and
    y = np.array(0, 0, 0, 1, 1, 1, 1)
only longer arrays (about 180 entries in each array) of this general type. 

So this should be the "simplest" case.
Although I've tried several variations of the Logistic input formats, in
   LogisticRegression.fit(X, y)
they keep being rejected with the most common error message being
   Missing argument y
I assure you I do indeed have an array "y" that is passed to "fit"
So, what do I have to do to get Logistic Regression to accept 1-D features?

___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


[scikit-learn] Question about n_jobs in KMeans function

2020-09-14 Thread Mingzhe Hu
Hi scikit-learn contributors,

I hope you are all doing well these days. I am now working on the KMeans
clustering acceleration algorithm and I would like to use your library as a
reference. Your code is amazing and inspires me a lot in developing a more
efficient solution. But I still have a question about a certain parameter
in sklearn.cluster.KMeans().fit().

The installed version of sklearn is 0.23.0. I noticed there is a parameter
named n_jobs. When I set it to 1, the function runs faster than when it is
set to -1, which is confusing to me. I posted the question with more details
on Stack Overflow and here is the link:
https://stackoverflow.com/questions/63887062/questions-about-n-jobs-in-sklearn-cluster-kmeans.
You can check it and we can have more discussions.

Thank you for your time and looking forward to your reply.

All the best,
Mingzhe HU

-- 
Mingzhe HU
Columbia University in the City of New York
M.S. in Electrical Engineering
mh4...@columbia.edu
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Question regarding regression models

2020-06-11 Thread serafim loukas
Hi Kelden,

I answered your SO question but for the record this is what happens:

date_index is a scalar, and calling date_index.columns on it raises the error.

So you just need this:

def predict_price(dates, price):
    date_index = np.where(date_format.columns == dates)[0][0]

    x = np.zeros(len(date_format.columns))
    if date_index >= 0:
        x[date_index] = 1

    return prediction.predict([x])[0]

predict_price('Feb 20, 2018', 1000)

Bests,
Makis


On 11 Jun 2020, at 15:12, Kelden Dorji  wrote:



Hi scikit-learn,
I have a question related to regression models. Please find my question in the 
link below. I am still new to this and would appreciate any help. Thank you and 
have a nice day!

https://stackoverflow.com/questions/62325079/issues-with-regression-model-giving-inverse-relationship

Kelden Dradul Dorji
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn

___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] question

2019-10-20 Thread ahmad qassemi
Thanks a lot guys for your great hints.

I've tried using only the magnitude or only the phase, but neither works in
my case; I need to consider both simultaneously to get a correct result. I
have also considered converting into two columns (real + imaginary). The
problem is that, after bi-clustering, an imaginary column and its
corresponding real column can end up in different clusters, and it is not
clear how to assign them to the same cluster. In other words, for each
complex value the real and imaginary parts would most likely land in
different clusters, and it is not easy to bring them back into the same
cluster. What do you think? Is it possible to modify the scikit-learn code
to work with complex values? Or ...?



On Sun, 20 Oct 2019 at 10:09, serafim loukas  wrote:

> I would take the magnitude.
> Otherwise you will have to modify the source code to make it work with
> complex values.
>
> Bests,
> Makis
>
> On Oct 20, 2019, at 15:55, Fernando Marcos Wittmann <
> fernando.wittm...@gmail.com> wrote:
>
> 
> What about converting into two columns? One with the real projection and
> the other with the complex projection?
>
> On Sat, Oct 19, 2019, 3:44 PM ahmad qassemi 
> wrote:
>
>> Dear Mr/Mrs,
>>
>>  I'm a PhD student in DS. I'm trying to use your provided code on *Spectral
>> CoClustering *and *Spectral Biclustering* to bi-cluster my data matrix (
>> https://scikit-learn.org/stable/modules/biclustering.html). Since my
>> data has complex values, i.e., matrix elements are complex, your modules
>> don't work on my data. It seems that the reason is your K-means' code
>> doesn't work with complex numbers. I will really appreciate it if you take
>> some time and tell me how should I apply your codes on my complex data.
>> Thanks a lot in advance.
>>
>> Sincerely,
>> Ahmad Qassemi
>> ___
>> scikit-learn mailing list
>> scikit-learn@python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
> ___
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
> ___
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] question

2019-10-20 Thread serafim loukas
I would take the magnitude.
Otherwise you will have to modify the source code to make it work with complex 
values.

Bests,
Makis

On Oct 20, 2019, at 15:55, Fernando Marcos Wittmann 
 wrote:


What about converting into two columns? One with the real projection and the 
other with the complex projection?

On Sat, Oct 19, 2019, 3:44 PM ahmad qassemi  wrote:
Dear Mr/Mrs,

 I'm a PhD student in DS. I'm trying to use your provided code on Spectral 
CoClustering and Spectral Biclustering to bi-cluster my data matrix 
(https://scikit-learn.org/stable/modules/biclustering.html). Since my data has 
complex values, i.e., matrix elements are complex, your modules don't work on 
my data. It seems that the reason is your K-means' code doesn't work with 
complex numbers. I will really appreciate it if you take some time and tell me 
how should I apply your codes on my complex data. Thanks a lot in advance.

Sincerely,
Ahmad Qassemi
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] question

2019-10-20 Thread Fernando Marcos Wittmann
What about converting into two columns? One with the real projection and
the other with the complex projection?

On Sat, Oct 19, 2019, 3:44 PM ahmad qassemi  wrote:

> Dear Mr/Mrs,
>
>  I'm a PhD student in DS. I'm trying to use your provided code on *Spectral
> CoClustering *and *Spectral Biclustering* to bi-cluster my data matrix (
> https://scikit-learn.org/stable/modules/biclustering.html). Since my data
> has complex values, i.e., matrix elements are complex, your modules don't
> work on my data. It seems that the reason is your K-means' code doesn't
> work with complex numbers. I will really appreciate it if you take some
> time and tell me how should I apply your codes on my complex data. Thanks a
> lot in advance.
>
> Sincerely,
> Ahmad Qassemi
> ___
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] question

2019-10-19 Thread federico vaggi
Your options are to either pick a clustering algorithm that supports a
pre-computed distance matrix, or find some kind of projection from C -> R,
embed your data in R, then cluster your embedded data and transfer the
labels back to C.
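
For instance, a toy sketch of the magnitude projection (the random matrix
here is purely illustrative):

import numpy as np
from sklearn.cluster import SpectralCoclustering

rng = np.random.RandomState(0)
Z = rng.normal(size=(30, 8)) + 1j * rng.normal(size=(30, 8))  # complex data

X = np.abs(Z)                       # one possible projection C -> R
model = SpectralCoclustering(n_clusters=3, random_state=0).fit(X)
rows, cols = model.get_indices(0)   # indices map straight back to Z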

On Sat, Oct 19, 2019 at 11:44 AM ahmad qassemi 
wrote:

> Dear Mr/Mrs,
>
>  I'm a PhD student in DS. I'm trying to use your provided code on *Spectral
> CoClustering *and *Spectral Biclustering* to bi-cluster my data matrix (
> https://scikit-learn.org/stable/modules/biclustering.html). Since my data
> has complex values, i.e., matrix elements are complex, your modules don't
> work on my data. It seems that the reason is your K-means' code doesn't
> work with complex numbers. I will really appreciate it if you take some
> time and tell me how should I apply your codes on my complex data. Thanks a
> lot in advance.
>
> Sincerely,
> Ahmad Qassemi
> ___
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


[scikit-learn] question

2019-10-19 Thread ahmad qassemi
Dear Mr/Mrs,

 I'm a PhD student in DS. I'm trying to use your provided code on *Spectral
CoClustering *and *Spectral Biclustering* to bi-cluster my data matrix (
https://scikit-learn.org/stable/modules/biclustering.html). Since my data
has complex values, i.e., matrix elements are complex, your modules don't
work on my data. It seems that the reason is your K-means' code doesn't
work with complex numbers. I will really appreciate it if you take some
time and tell me how should I apply your codes on my complex data. Thanks a
lot in advance.

Sincerely,
Ahmad Qassemi
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Question about Kmeans implementation in sklearn

2019-08-05 Thread Chris Aridas
Hey Serafim,

In this line
https://github.com/scikit-learn/scikit-learn/blob/1495f69242646d239d89a5713982946b8ffcf9d9/sklearn/cluster/k_means_.py#L303
you can see that a RandomState object is constructed, and that object is
passed into the for loop you are referring to, not the integer value that
was passed to the function.
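
A quick way to see the behaviour (a sketch of the check_random_state
semantics, not the k-means code itself):

from sklearn.utils import check_random_state

rs = check_random_state(0)     # int seed -> one shared RandomState object
print(rs.randint(10, size=3))  # what the first "init" might draw
print(rs.randint(10, size=3))  # the next run continues the same stream,
                               # so every n_init restart sees fresh numbers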

Cheers,
Chris

On Mon, 5 Aug 2019 20:58 serafim loukas,  wrote:

> Dear Sklearn community,
>
>
> I have a simple question concerning the implementation of the KMeans
> clustering algorithm.
> Two of the input arguments are the “n_init” and “random_state”.
>
> Consider a case where  *“n_init=10” and “random_state=0”.*
>
> By looking at the source code (
> https://github.com/scikit-learn/scikit-learn/blob/1495f69242646d239d89a5713982946b8ffcf9d9/sklearn/cluster/k_means_.py#L187),
> we have the following:
>
> for it in range(n_init):
>     # run a k-means once
>     labels, inertia, centers, n_iter_ = kmeans_single(
>         X, sample_weight, n_clusters, max_iter=max_iter, init=init,
>         verbose=verbose, precompute_distances=precompute_distances,
>         tol=tol, x_squared_norms=x_squared_norms,
>         random_state=random_state)
>
>
> My question is: why are the results not the same for all `n_init`
> iterations, since `random_state` is fixed?
>
>
> Bests,
> Makis
> ___
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


[scikit-learn] Question about Kmeans implementation in sklearn

2019-08-05 Thread serafim loukas
Dear Sklearn community,


I have a simple question concerning the implementation of the KMeans
clustering algorithm.
Two of the input arguments are the “n_init” and “random_state”.

Consider a case where  “n_init=10” and “random_state=0”.

By looking at the source code 
(https://github.com/scikit-learn/scikit-learn/blob/1495f69242646d239d89a5713982946b8ffcf9d9/sklearn/cluster/k_means_.py#L187),
 we have the following:

for it in range(n_init):
    # run a k-means once
    labels, inertia, centers, n_iter_ = kmeans_single(
        X, sample_weight, n_clusters, max_iter=max_iter, init=init,
        verbose=verbose, precompute_distances=precompute_distances,
        tol=tol, x_squared_norms=x_squared_norms,
        random_state=random_state)


My question is: why are the results not the same for all `n_init`
iterations, since `random_state` is fixed?


Bests,
Makis
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] question using GridSearchCV

2019-07-24 Thread Glenn Schultz via scikit-learn
Thank you for answering ... makes sense now that you point it out.

Sent from my iPhone


> On Jul 24, 2019, at 2:57 PM, Andreas Mueller  wrote:
> 
> scoring is not a parameter.
> It needs to be passed to GridSearchCV
> 
> selfCLF = GridSearchCV(GradientBoostingClassifier(), parameters,
>                        verbose=3, n_jobs=4, scoring='roc_auc')
> 
> 
> 
>> On 7/24/19 1:24 PM, Glenn Schultz via scikit-learn wrote:
>> I am using GradientBoostingClassifier. The code below works if I use the
>> default accuracy, but it fails using roc_auc or roc_auc_score. I have found
>> many examples to work from, but for the life of me I can't get it to work
>> with roc_auc. What am I doing wrong?
>> 
>> from sklearn.ensemble import GradientBoostingClassifier
>> from sklearn.model_selection import GridSearchCV
>> from sklearn.metrics import auc, roc_auc_score
>> 
>> y_train = LoansTrainData['event']
>> x_train = LoansTrainData[LoansTrainData.columns.drop('event')]
>> 
>> parameters = {'loss': ['deviance'],
>>               'scoring': ['roc_auc'],
>>               'learning_rate': [.1, .05]}
>> 
>> selfCLF = GridSearchCV(GradientBoostingClassifier(), parameters,
>>                        verbose=3, n_jobs=4)
>> selfCLF.fit(x_train, y_train)
>> ___
>> scikit-learn mailing list
>> scikit-learn@python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
> 
> ___
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] question using GridSearchCV

2019-07-24 Thread Andreas Mueller

scoring is not a parameter.
It needs to be passed to GridSearchCV

selfCLF = GridSearchCV(GradientBoostingClassifier(), parameters,
                       verbose=3, n_jobs=4, scoring='roc_auc')
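
An end-to-end sketch with the scoring argument in the right place (synthetic
data stands in for the LoansTrainData frame, which is not available here):

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, random_state=0)
parameters = {'learning_rate': [0.1, 0.05]}   # estimator parameters only

search = GridSearchCV(GradientBoostingClassifier(), parameters,
                      scoring='roc_auc', verbose=3, n_jobs=4)
search.fit(X, y)
print(search.best_params_, search.best_score_)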



On 7/24/19 1:24 PM, Glenn Schultz via scikit-learn wrote:

I am using GradientBoostingClassifier. The code below works if I use the
default accuracy, but it fails using roc_auc or roc_auc_score. I have found
many examples to work from, but for the life of me I can't get it to work
with roc_auc. What am I doing wrong?

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import auc, roc_auc_score

y_train = LoansTrainData['event']
x_train = LoansTrainData[LoansTrainData.columns.drop('event')]

parameters = {'loss': ['deviance'],
              'scoring': ['roc_auc'],
              'learning_rate': [.1, .05]}

selfCLF = GridSearchCV(GradientBoostingClassifier(), parameters,
                       verbose=3, n_jobs=4)
selfCLF.fit(x_train, y_train)
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


[scikit-learn] question using GridSearchCV

2019-07-24 Thread Glenn Schultz via scikit-learn
I am using GradientBoostingClassifier. The code below works if I use the
default accuracy, but it fails using roc_auc or roc_auc_score. I have found
many examples to work from, but for the life of me I can't get it to work
with roc_auc. What am I doing wrong?

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import auc, roc_auc_score

y_train = LoansTrainData['event']
x_train = LoansTrainData[LoansTrainData.columns.drop('event')]

parameters = {'loss': ['deviance'],
              'scoring': ['roc_auc'],
              'learning_rate': [.1, .05]}

selfCLF = GridSearchCV(GradientBoostingClassifier(), parameters,
                       verbose=3, n_jobs=4)
selfCLF.fit(x_train, y_train)
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


[scikit-learn] [Question & Help]The criterion of data size for choosing a right algorithm.

2019-02-14 Thread skim22
Dear Sir or Madam,

Good morning. My name is Steven Kim, and I am a graduate student at the
University of Memphis. Recently, I found the "choosing the right estimator"
page on the official website
(https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html).
It was greatly helpful for deciding which algorithms I should use.
But I would like to know something on that page in more detail.
My question concerns the several sample-size criteria, such as ">50" and
"<100k", that appear before each decision. Could you let me know the grounds
(e.g., academic papers) for these sample sizes? It would help me understand
them more deeply.

Regards,
Steven Kim


___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Question about contributing to scikit-learn

2018-12-10 Thread parker x
Hi Emmanuel and Joel,

Thanks very much for your advice. I will take a look at small issues first
and see what to contribute from there.

Best,
Parker

eamanu15  wrote on Sun, 9 Dec 2018, 6:17 AM:

> Hello Parker,
>
> I can tell you my experience.
>
> I started contributing to sklearn two months ago, and I started with code
> review; this way I could learn how sklearn is written and what the
> workflow is, read issues, and try to solve them. Then I made some PRs.
>
> I can tell you that the core devs are very friendly and always helpful.
> In particular, I had the most contact with Joel Nothman and Andreas Mueller
> (thanks guys).
>
> So, I hope this helps you in some way =)
>
> Regards!
> Emmanuel
> ___
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Question about contributing to scikit-learn

2018-12-09 Thread eamanu15
Hello Parker,

I can tell you my experience.

I started contributing to sklearn two months ago, and I started with code
review; this way I could learn how sklearn is written and what the workflow
is, read issues, and try to solve them. Then I made some PRs.

I can tell you that the core devs are very friendly and always helpful.
In particular, I had the most contact with Joel Nothman and Andreas Mueller
(thanks guys).

So, I hope this helps you in some way =)

Regards!
Emmanuel
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Question about contributing to scikit-learn

2018-12-08 Thread Joel Nothman
Hi Parker,

We strongly urge new contributors to start with small issues
(documentation, small fixes, etc.) to gain confidence in the contribution
procedure, etc. Once you've worked on small issues and understand better
what comes through the issue tracker, you can consider bigger contributions.

We have indeed proposed support for imblearn-like Pipeline extensions (
https://github.com/scikit-learn/scikit-learn/issues/3855#issuecomment-357949997).
And yes, we're in need of a contributor there, but I would rather review
and merge smaller pieces of your work, before finding a large one that
needs a lot of changes before merge.

Joel

On Wed, 5 Dec 2018 at 12:15, parker x  wrote:

> Dear scikit-learn developers,
>
> My name is Parker, and I'm a data scientist.
>
> Scikit-learn is a great ML library that I use frequently for work and
> personal projects. I have always wanted to contribute something to the
> scikit-learn community, and I am wondering if you could give some opinions
> on the following two ideas for contribution.
>
> My first idea is to integrate another python library 'imbalanced-learn'
> into scikit-learn so that people could also use scikit-learn to deal with
> imbalance issues.
>
> Another idea is to combine those scikit-learn built-in feature selection
> functions into one automated feature selection function that might benefit
> those users who are not familiar with feature selection process.
>
> Looking forward to your suggestions! And thank you very much for your time!
>
> Best,
> Parker
> ___
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


[scikit-learn] Question about contributing to scikit-learn

2018-12-04 Thread parker x
Dear scikit-learn developers,

My name is Parker, and I'm a data scientist.

Scikit-learn is a great ML library that I use frequently for work and
personal projects. I have always wanted to contribute something to the
scikit-learn community, and I am wondering if you could give some opinions
on the following two ideas for contribution.

My first idea is to integrate another python library 'imbalanced-learn'
into scikit-learn so that people could also use scikit-learn to deal with
imbalance issues.

Another idea is to combine those scikit-learn built-in feature selection
functions into one automated feature selection function that might benefit
those users who are not familiar with feature selection process.

Looking forward to your suggestions! And thank you very much for your time!

Best,
Parker
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Question about get_params / set_params

2018-10-28 Thread Guillaume Lemaître
On Sun, 28 Oct 2018 at 09:31, Louis Abraham via scikit-learn <
scikit-learn@python.org> wrote:

> Hi,
>
> According to
> http://scikit-learn.org/0.16/developers/index.html#get-params-and-set-params
> ,
> get_params and set_params are used to clone estimators.
>

sklearn.base.clone is the function used for cloning. get_params and
set_params are accessors to the attributes of an estimator and are defined
by BaseEstimator. For Pipeline and FeatureUnion, those accessors rely on
_BaseComposition, which manages access to the attributes of the
sub-estimators.


> However, I don't understand how it is used in FeatureUnion:
> `return self._get_params('transformer_list', deep=deep)`
>

transformer_list contains all the estimators used in the FeatureUnion, and
_BaseComposition allows you to access the parameters of each transformer.


>
> Why doesn't it contain other arguments like n_jobs and transformer_weights?
>

The first line in _get_params in _BaseComposition lists the attributes of
FeatureUnion:
https://github.com/scikit-learn/scikit-learn/blob/06ac22d06f54353ea5d5bba244371474c7baf938/sklearn/utils/metaestimators.py#L26

For instance:

In [5]: trans = FeatureUnion([('trans1', StandardScaler()),
   ...:                       ('trans2', MinMaxScaler())])

In [6]: trans.get_params()
Out[6]:
{'n_jobs': None,
 'transformer_list': [('trans1',
   StandardScaler(copy=True, with_mean=True, with_std=True)),
  ('trans2', MinMaxScaler(copy=True, feature_range=(0, 1)))],
 'transformer_weights': None,
 'trans1': StandardScaler(copy=True, with_mean=True, with_std=True),
 'trans2': MinMaxScaler(copy=True, feature_range=(0, 1)),
 'trans1__copy': True,
 'trans1__with_mean': True,
 'trans1__with_std': True,
 'trans2__copy': True,
 'trans2__feature_range': (0, 1)}

Then, n_jobs and transformer_weights are accessible.
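
And the same dotted names work in the other direction; a small sketch
continuing the session above (the values are arbitrary):

In [7]: trans.set_params(n_jobs=2, trans1__with_mean=False)  # '__' reaches into trans1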


>
> Best
> Louis
>
> ___
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>


-- 
Guillaume Lemaitre
INRIA Saclay - Parietal team
Center for Data Science Paris-Saclay
https://glemaitre.github.io/
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


[scikit-learn] Question about get_params / set_params

2018-10-28 Thread Louis Abraham via scikit-learn
Hi,

According to
http://scikit-learn.org/0.16/developers/index.html#get-params-and-set-params,
get_params and set_params are used to clone estimators.
However, I don't understand how it is used in FeatureUnion:
`return self._get_params('transformer_list', deep=deep)`

Why doesn't it contain other arguments like n_jobs and transformer_weights?

Best
Louis

___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


[scikit-learn] Question about dummy coding using DictVectorizer or FeatureHasher: generating correlated dimensions

2017-11-06 Thread Yusuke Nishioka
Hello,

I have a question about dummy coding using DictVectorizer or FeatureHasher.

```
>>> from sklearn.feature_extraction import DictVectorizer, FeatureHasher
>>> D = [{'age': 23, 'gender': 'm'}, {'age': 34, 'gender': 'f'},
...      {'age': 18, 'gender': 'f'}, {'age': 50, 'gender': 'm'}]
>>> m1 = FeatureHasher(n_features=10)
>>> m1.fit_transform(D).toarray()
array([[  0.,   0.,  -1.,   0.,   0.,   0.,   0.,   0.,   0.,  23.],
   [  0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   1.,  34.],
   [  0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   1.,  18.],
   [  0.,   0.,  -1.,   0.,   0.,   0.,   0.,   0.,   0.,  50.]])
>>> m2 = DictVectorizer(sparse=False)
>>> m2.fit_transform(D)
array([[ 23.,   0.,   1.],
   [ 34.,   1.,   0.],
   [ 18.,   1.,   0.],
   [ 50.,   0.,   1.]])
>>> m2.feature_names_
['age', 'gender=f', 'gender=m']
```

Since both DictVectorizer and FeatureHasher generate dimensions for
'gender=m' and 'gender=f',
these dimensions are perfectly correlated.
This is because DictVectorizer and FeatureHasher by default generate n
dimensions for n categorical values of 1 feature.

My questions are as follows:

1. My expectation is for them to generate n-1 dimensions for n categorical
   values; is there any way to do this using DictVectorizer or FeatureHasher?
2. How should I handle these correlated dimensions?
   In my understanding, training on data with collinearity will make the
   predictions unstable.
   Will L1 or L2 regularization work for this problem?

If there is any issue or article related to these questions,
would you please tell me the URL? Thank you.
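
For the record, DictVectorizer and FeatureHasher have no drop-first option,
but the same n-1 coding later became available via OneHotEncoder(drop='first')
in scikit-learn 0.21+; dropping one level also removes the perfect
correlation. A minimal sketch:

from sklearn.preprocessing import OneHotEncoder

enc = OneHotEncoder(drop='first')
X = enc.fit_transform([['m'], ['f'], ['f'], ['m']]).toarray()
print(X.ravel())        # [1. 0. 0. 1.]: a single gender=m column remains
print(enc.categories_)  # 'f' becomes the baseline (dropped) level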


Regards,
Yusuke
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] question for using GridSearchCV on LocalOutlierFactor

2017-10-22 Thread Hristo Georgiev
Hi,

As other members have indicated, estimators such as
``LocalOutlierFactor`` do not expose a ``predict`` method by design.

However, if you would nevertheless like to keep experimenting with making
predictions on "unseen" data, you could simply create a sub-class with a
``predict()`` wrapper, as in:
https://gist.github.com/hristog/b6151d21aa38a6c80d80d160b7771ce9

Hristo
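
For reference, later scikit-learn releases (0.20+) also added a built-in
route: LocalOutlierFactor(novelty=True) exposes a public predict() for
unseen data. A minimal sketch:

import numpy as np
from sklearn.neighbors import LocalOutlierFactor

X_train = np.random.RandomState(0).normal(size=(100, 2))  # "normal" data only
lof = LocalOutlierFactor(novelty=True).fit(X_train)

X_new = np.array([[0.0, 0.1], [6.0, 6.0]])
print(lof.predict(X_new))   # +1 = inlier, -1 = outlier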



> On 10/06/2017 12:53 AM, Lifan Xu wrote:
>
>> Hi,
>>
>> I was trying to train a model for anomaly detection. I only have the
>> normal data which are all labeled as 1. Here is my code:
>>
>>
>> clf = sklearn.model_selection.GridSearchCV(
>>     sklearn.neighbors.LocalOutlierFactor(),
>>     parameters,
>>     scoring="accuracy",
>>     cv=kfold,
>>     n_jobs=10)
>> clf.fit(vectors, labels)
>>
>>
>> But it complains "AttributeError: 'LocalOutlierFactor' object has no
>> attribute 'predict'".
>>
>> It looks like LocalOutlierFactor only has fit_predict(), but no
>> predict().
>>
>> My question is will predict() be implemented?
>>
>>
>> Thanks!
>>
>> ___
>> scikit-learn mailing list
>> scikit-learn@python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
>
> ___
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Question about LDA's coef_ attribute

2017-10-16 Thread Serafeim Loukas
Dear Alex,

Thank you for the prompt response.

Are the eigenvectors stored in some variable?
Does the lda.scalings_ attribute contain the eigenvectors?

Best,
Serafeim

> On 16 Oct 2017, at 16:57, Alexandre Gramfort  
> wrote:
> 
> no it stores the direction of the decision function to match the API of
> linear models.
> 
> HTH
> Alex
> 
> On Mon, Oct 16, 2017 at 3:27 PM, Serafeim Loukas  wrote:
>> Dear Scikit-learn community,
>> 
>> Since the documentation of the LDA
>> (http://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html)
>> is not so clear, I would like to ask if the lda.coef_ attribute stores the
>> eigenvectors from the SVD decomposition.
>> 
>> Thank you in advance,
>> Serafeim
>> 
>> ___
>> scikit-learn mailing list
>> scikit-learn@python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>> 
> ___
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Question about LDA's coef_ attribute

2017-10-16 Thread Alexandre Gramfort
no it stores the direction of the decision function to match the API of
linear models.

HTH
Alex

On Mon, Oct 16, 2017 at 3:27 PM, Serafeim Loukas  wrote:
> Dear Scikit-learn community,
>
> Since the documentation of the LDA
> (http://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html)
> is not so clear, I would like to ask if the lda.coef_ attribute stores the
> eigenvectors from the SVD decomposition.
>
> Thank you in advance,
> Serafeim
>
> ___
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


[scikit-learn] Question about LDA's coef_ attribute

2017-10-16 Thread Serafeim Loukas
Dear Scikit-learn community,

Since the documentation of the LDA
(http://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html)
is not so clear, I would like to ask if the lda.coef_ attribute stores the
eigenvectors from the SVD decomposition.

Thank you in advance,
Serafeim
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] question for using GridSearchCV on LocalOutlierFactor

2017-10-09 Thread Andreas Mueller

What are you trying to achieve with this code?
If you label everything as 1, the highest accuracy will be obtained if 
everything is labeled as 1.

So even if the interface was implemented, the result would not be helpful.


On 10/06/2017 12:53 AM, Lifan Xu wrote:

Hi,

    I was trying to train a model for anomaly detection. I only have 
the normal data which are all labeled as 1. Here is my code:



clf = sklearn.model_selection.GridSearchCV(
    sklearn.neighbors.LocalOutlierFactor(),
    parameters,
    scoring="accuracy",
    cv=kfold,
    n_jobs=10)
clf.fit(vectors, labels)


    But it complains "AttributeError: 'LocalOutlierFactor' object has 
no attribute 'predict'".


    It looks like LocalOutlierFactor only has fit_predict(), but no 
predict().


    My question is will predict() be implemented?


    Thanks!

___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] question for using GridSearchCV on LocalOutlierFactor

2017-10-08 Thread Albert Thomas
Hi,

As Joel said LOF is not designed to be applied on unseen data. Therefore
there is no public predict.

Albert

On Sun 8 Oct 2017 at 06:17, Joel Nothman  wrote:

> actually I'm probably wrong there, but you may not be able to use accuracy
> ___
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] question for using GridSearchCV on LocalOutlierFactor

2017-10-07 Thread Joel Nothman
actually I'm probably wrong there, but you may not be able to use accuracy
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] question for using GridSearchCV on LocalOutlierFactor

2017-10-07 Thread Joel Nothman
I don't think LOF is designed to apply to unseen data.
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


[scikit-learn] question for using GridSearchCV on LocalOutlierFactor

2017-10-05 Thread Lifan Xu

Hi,

I was trying to train a model for anomaly detection. I only have 
the normal data which are all labeled as 1. Here is my code:



clf = sklearn.model_selection.GridSearchCV(
    sklearn.neighbors.LocalOutlierFactor(),
    parameters,
    scoring="accuracy",
    cv=kfold,
    n_jobs=10)
clf.fit(vectors, labels)


But it complains "AttributeError: 'LocalOutlierFactor' object has 
no attribute 'predict'".


It looks like LocalOutlierFactor only has fit_predict(), but no 
predict().


My question is will predict() be implemented?


Thanks!

___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Question-Early Stopping MLPClassifer RandomizedSearchCV

2017-08-14 Thread Andreas Mueller

Yes, you understood correctly.
You can see the implementation in the code:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/neural_network/multilayer_perceptron.py#L491

It calls ``train_test_split``, so it's a random subset of the data. 
Currently the API doesn't allow providing your own validation set.

What is the use-case for that?

Andy
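
For reference, the two knobs discussed, in one place (values are
illustrative only):

from sklearn.neural_network import MLPClassifier

clf = MLPClassifier(early_stopping=True,      # hold out part of the training data
                    validation_fraction=0.1)  # split off internally, at random,
                                              # via train_test_split
# clf.fit(X_train, y_train) would then stop once the held-out score plateaus.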

On 08/11/2017 05:57 PM, fabian.si...@gmx.net wrote:

Hello Scikit-Learn Team,
I've got a question concerning the implementation of early stopping in
MLPClassifier. I am using it in combination with RandomizedSearchCV.
The fraction used for validation in early stopping is set with the
validation_fraction parameter of MLPClassifier. How is the validation
set extracted from the training set? Does the function simply take
the last X % of the training set? Is there a possibility to
manually set this validation set?
I wonder whether I understand the functionality correctly: the neural
net is trained on the training data and the performance is evaluated
after every epoch on the validation set (which is internally selected
by the MLPClassifier)? And once the net stops training, the performance
on the left-out data (the "cv" parameter in RandomizedSearchCV) is determined?

Thank you very much for your help !
Kind Regards,
Fabian Sippl


___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


[scikit-learn] Question-Early Stopping MLPClassifer RandomizedSearchCV

2017-08-11 Thread fabian . sippl

Hello Scikit-Learn Team,

 

I've got a question concerning the implementation of early stopping in MLPClassifier. I am using it in combination with RandomizedSearchCV. The fraction used for validation in early stopping is set with the validation_fraction parameter of MLPClassifier. How is the validation set extracted from the training set? Does the function simply take the last X % of the training set? Is there a possibility to manually set this validation set?

 

I wonder whether I understand the functionality correctly: the neural net is trained on the training data and the performance is evaluated after every epoch on the validation set (which is internally selected by the MLPClassifier)? And once the net stops training, the performance on the left-out data (the "cv" parameter in RandomizedSearchCV) is determined?

 

Thank you very much for your help !

 

 

Kind Regards,

Fabian Sippl

___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] question about class_weights in LogisticRegression

2017-08-03 Thread Tom DLT
The class weights and sample weights are used in the same way, as a factor
specific to each sample in the loss function.
In LogisticRegression, this is equivalent to incorporating the factor into a
regularization parameter C specific to each sample.

Tom
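
The 'balanced' factors themselves are easy to inspect; a short sketch of the
docstring formula n_samples / (n_classes * np.bincount(y)):

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y = np.array([0, 0, 0, 0, 1])   # imbalanced toy labels
w = compute_class_weight('balanced', classes=np.array([0, 1]), y=y)
print(w)                        # [0.625 2.5] = 5/(2*4) and 5/(2*1)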

2017-08-01 18:30 GMT+02:00 Johnson, Jeremiah :

> Right, I know how the class_weight calculation is performed. But then
> those class weights are utilized during the model fit process in some way
> in liblinear, and that's what I am interested in. libSVM does
> class_weight[I] * C (https://www.csie.ntu.edu.tw/~cjlin/libsvm/); is the
> implementation in liblinear the same?
>
> Best,
> Jeremiah
>
>
>
> On 8/1/17, 12:19 PM, "scikit-learn on behalf of Stuart Reynolds"
>  stu...@stuartreynolds.net> wrote:
>
> >I hope not. And not according to the docs...
> >https://github.com/scikit-learn/scikit-learn/blob/ab93d65/sklearn/linear_model/logistic.py#L947
> >
> >class_weight : dict or 'balanced', optional
> >Weights associated with classes in the form ``{class_label: weight}``.
> >If not given, all classes are supposed to have weight one.
> >The "balanced" mode uses the values of y to automatically adjust
> >weights inversely proportional to class frequencies in the input data
> >as ``n_samples / (n_classes * np.bincount(y))``.
> >Note that these weights will be multiplied with sample_weight (passed
> >through the fit method) if sample_weight is specified.
> >
> >On Tue, Aug 1, 2017 at 9:03 AM, Johnson, Jeremiah
> > wrote:
> >> Hello all,
> >>
> >> I'm looking for confirmation on an implementation detail that is
> >> somewhere in liblinear, but I haven't found documentation for yet. When
> >> the class_weights='balanced' parameter is set in LogisticRegression,
> >> then the regularisation parameter for an observation from class I is
> >> class_weight[I] * C, where C is the usual regularization parameter - is
> >> this correct?
> >>
> >> Thanks,
> >> Jeremiah
> >>
> >>
> >> ___
> >> scikit-learn mailing list
> >> scikit-learn@python.org
> >>
> >>https://mail.python.org/mailman/listinfo/scikit-learn
> >>
> >___
> >scikit-learn mailing list
> >scikit-learn@python.org
> >https://mail.python.org/mailman/listinfo/scikit-learn
>
> ___
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] question about class_weights in LogisticRegression

2017-08-01 Thread Johnson, Jeremiah
Right, I know how the class_weight calculation is performed. But then
those class weights are utilized during the model fit process in some way
in liblinear, and that's what I am interested in. libSVM does
class_weight[I] * C (https://www.csie.ntu.edu.tw/~cjlin/libsvm/); is the
implementation in liblinear the same?

Best,
Jeremiah



On 8/1/17, 12:19 PM, "scikit-learn on behalf of Stuart Reynolds"
 wrote:

>I hope not. And not according to the docs...
>https://github.com/scikit-learn/scikit-learn/blob/ab93d65/sklearn/linear_model/logistic.py#L947
>
>class_weight : dict or 'balanced', optional
>Weights associated with classes in the form ``{class_label: weight}``.
>If not given, all classes are supposed to have weight one.
>The "balanced" mode uses the values of y to automatically adjust
>weights inversely proportional to class frequencies in the input data
>as ``n_samples / (n_classes * np.bincount(y))``.
>Note that these weights will be multiplied with sample_weight (passed
>through the fit method) if sample_weight is specified.
>
>On Tue, Aug 1, 2017 at 9:03 AM, Johnson, Jeremiah
> wrote:
>> Hello all,
>>
>> I'm looking for confirmation on an implementation detail that is somewhere
>> in liblinear, but I haven't found documentation for yet. When the
>> class_weights='balanced' parameter is set in LogisticRegression, then the
>> regularisation parameter for an observation from class I is class_weight[I]
>> * C, where C is the usual regularization parameter - is this correct?
>>
>> Thanks,
>> Jeremiah
>>
>>
>> ___
>> scikit-learn mailing list
>> scikit-learn@python.org
>> 
>>https://mail.python.org/mailman/listinfo/scikit-learn
>>
>___
>scikit-learn mailing list
>scikit-learn@python.org
>https://mail.python.org/mailman/listinfo/scikit-learn

___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] question about class_weights in LogisticRegression

2017-08-01 Thread Stuart Reynolds
I hope not. And not according to the docs...
https://github.com/scikit-learn/scikit-learn/blob/ab93d65/sklearn/linear_model/logistic.py#L947

class_weight : dict or 'balanced', optional
Weights associated with classes in the form ``{class_label: weight}``.
If not given, all classes are supposed to have weight one.
The "balanced" mode uses the values of y to automatically adjust
weights inversely proportional to class frequencies in the input data
as ``n_samples / (n_classes * np.bincount(y))``.
Note that these weights will be multiplied with sample_weight (passed
through the fit method) if sample_weight is specified.

On Tue, Aug 1, 2017 at 9:03 AM, Johnson, Jeremiah
 wrote:
> Hello all,
>
> I’m looking for confirmation on an implementation detail that is somewhere
> in liblinear, but I haven’t found documentation for yet. When the
> class_weights=‘balanced’ parameter is set in LogisticRegression, then the
> regularisation parameter for an observation from class I is class_weight[I]
> * C, where C is the usual regularization parameter – is this correct?
>
> Thanks,
> Jeremiah
>
>
> ___
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


[scikit-learn] question about class_weights in LogisticRegression

2017-08-01 Thread Johnson, Jeremiah
Hello all,

I'm looking for confirmation on an implementation detail that is somewhere in 
liblinear, but I haven't found documentation for yet. When the 
class_weights='balanced' parameter is set in LogisticRegression, then the 
regularisation parameter for an observation from class I is class_weight[I] * 
C, where C is the usual regularization parameter - is this correct?

Thanks,
Jeremiah

___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


[scikit-learn] Question about the Library of “sklearn.neural_network.BernoulliRBM” that Creates Highly Correlated Features.

2017-07-27 Thread Masanari Kondo
Dear all,

I’m using the sklearn library to generate new features of a dataset
using a Restricted Boltzmann Machine (RBM,
sklearn.neural_network.BernoulliRBM). I use the following environment:

python 3.5.0
numpy==1.11.1
scikit-learn==0.18


I have already tried a large number of iterations (n_iter=6000) and a low
learning rate (0.0001) on all training data (373 samples). However, the new
features generated by the RBM are all highly correlated. Can anyone explain
why this happens?


Below is a MWE:


import numpy as np
import csv
from sklearn.neural_network import BernoulliRBM

# train data: 373 samples x 20 normalized features (only the first row is
# reproduced here; the rest of the matrix was truncated in the archive)
train_data = np.array(
    [[0.0326086956522, 0.0, 0.0, 0.0200400801603, 0.0674157303371,
      0.000805152979066, 0.00200803212851, 0.243243243243, 0.0123456790123,
      0.55, 0.0233428760185, 0.0, 0.0, 0.0, 0.4, 0.0, 0.0, 0.157556270138,
      0.0188679245283, 0.0983652512615],
     ...])

Re: [scikit-learn] question about scikit-learn

2017-05-04 Thread Andreas Mueller



On 05/03/2017 08:05 AM, 熊瑶 wrote:

Dear professor,

scikit-learn is really useful for my machine-learning work. Here I have two 
questions:

1) For 5-fold cross-validation, StratifiedKFold gives me stratified folds in 
which each fold contains approximately the same percentage of samples of each 
target class as the complete set, while GroupKFold ensures that the same group 
is not represented in both the testing and training sets. Is there a method 
that combines these two?

Not implemented (yet), I think because it was a bit unclear what the best 
thing to do is.




2) When I use GridSearchCV to do a parameter search, I use scoring="accuracy" 
to choose the best parameters, and I find that I can only get the accuracy 
score from the 5-fold cross-validation. What can I do if I want to get other 
scores such as sensitivity, specificity, and MCC *at the same time*? That is, 
I want to use accuracy to choose the best parameters and also get the values 
of several other scoring metrics during the same 5-fold cross-validation.

Multi-metric scoring is being worked on here:
https://github.com/scikit-learn/scikit-learn/pull/7388
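Until that lands, one workaround is to pick the best parameters with accuracy 
and then derive the extra scores from cross-validated predictions yourself. A 
minimal sketch (best_estimator, X, and y stand for your own tuned estimator 
and data; binary labels assumed, since sensitivity and specificity are binary 
notions):

```py
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix, matthews_corrcoef

# 5-fold cross-validated predictions from the tuned model
y_pred = cross_val_predict(best_estimator, X, y, cv=5)

# derive extra scores (binary case) from the pooled predictions
tn, fp, fn, tp = confusion_matrix(y, y_pred).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
mcc = matthews_corrcoef(y, y_pred)
print(sensitivity, specificity, mcc)
```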
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


[scikit-learn] question about scikit-learn

2017-05-03 Thread 熊瑶
Dear professor,


scikit-learn is really useful for my machine-learning work. Here I have two 
questions:

1) For 5-fold cross-validation, StratifiedKFold gives me stratified folds in 
which each fold contains approximately the same percentage of samples of each 
target class as the complete set, while GroupKFold ensures that the same group 
is not represented in both the testing and training sets. I want to know 
whether there is a method that combines these two.

2) When I use GridSearchCV to do a parameter search, I use scoring="accuracy" 
to choose the best parameters, and I find that I can only get the accuracy 
score from the 5-fold cross-validation. What can I do if I want to get other 
scores such as sensitivity, specificity, and MCC at the same time? That is, I 
want to use accuracy to choose the best parameters and also get the values of 
several other scoring metrics during the same 5-fold cross-validation.


Thank you.





熊瑶

北京大学深圳研究生院

化学生物学与生物技术学院

 

XIONG Yao

G301, School of Chemical Biology & Biotechnology

Peking University Shenzhen Graduate School

Shenzhen 518055, Guangdong, P.R. China

E-mail: xiong...@pku.edu.cn or xiongy20121...@foxmail.com

 ___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn




Re: [scikit-learn] question in using Scikit-learn MLPClassifier?

2016-12-06 Thread Sebastian Raschka
Hi,
typically, you need to play around with the hyperparameters to get something 
useful out of an MLP; they rarely work out of the box, since good 
hyperparameter values are very context-dependent.

> However, the accuracy rate is not satisfied comparing to the result in Matlab 
> which use BP algorithm too, I wonder if I should tune the parameter of MLP 
> for better?

Things you may want to try first:

a) Check whether training converged: i.e., check clf.loss_ for, e.g., 200, 
2000, and 5000 iterations. If the loss is noticeably smaller after 5000 
iterations than after 2000, training hadn't converged yet. Stochastic 
gradient descent in particular is very sensitive to the initial learning 
rate, so try different values for it as well. Also, use a fixed random seed 
for reproducibility between runs, e.g., random_state=123.
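
A minimal sketch of that convergence check (X_train and y_train stand for 
your own training split):

```py
from sklearn.neural_network import MLPClassifier

# compare the final training loss across increasing iteration budgets;
# if the loss keeps dropping, training had not converged yet
for n_iter in (200, 2000, 5000):
    clf = MLPClassifier(max_iter=n_iter, random_state=123)
    clf.fit(X_train, y_train)
    print(n_iter, clf.loss_)
```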

b) If you are using stochastic gradient descent with a logistic activation 
function, you may want to scale your input features with the StandardScaler 
so that they are centered at 0 with a standard deviation of 1. E.g.,

from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)  # fit the scaler on training data only
X_test_scaled = sc.transform(X_test)        # reuse the training statistics

Good luck!
Sebastian



> On Dec 6, 2016, at 6:12 AM, lin...@ruijie.com.cn wrote:
> 
> Hi all:
> I use the ‘Car Evaluation’ dataset from 
> http://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data to test 
> the effect of MLP. (I convert some categorical values in the data to digits, 
> e.g. ‘low’ to 1, ‘med’ to 2, ‘high’ to 3; the final dataset has 6 input 
> features and a 4-class output label.)
> However, the accuracy is not satisfactory compared to the result in 
> Matlab, which also uses a backpropagation (BP) algorithm. I wonder if I 
> should tune the parameters of the MLP to do better?
>  
> Attachment:
>  
> main code in Matlab: accuracy 100% after training
> net = newff([-1 1;-1 1;-1 1;-1 1;-1 1;-1 1;], [10 4], {'tansig','logsig'}, 'trainlm');
>  
> main code with scikit-learn’s MLPClassifier: accuracy 70% after fit
> clf = MLPClassifier(solver='sgd', activation='logistic', max_iter=2000, 
>  learning_rate='adaptive', warm_start=True)
>  
>  
>  
>  
>  
> ___
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


[scikit-learn] question in using Scikit-learn MLPClassifier?

2016-12-06 Thread linjia
Hi all:
I use the ‘Car Evaluation’ dataset from 
http://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data to test 
the effect of MLP. (I convert some categorical values in the data to digits, 
e.g. ‘low’ to 1, ‘med’ to 2, ‘high’ to 3; the final dataset has 6 input 
features and a 4-class output label.)
However, the accuracy is not satisfactory compared to the result in 
Matlab, which also uses a backpropagation (BP) algorithm. I wonder if I should 
tune the parameters of the MLP to do better?

Attachment:

main code in Matlab: accuracy 100% after training
net = newff([-1 1;-1 1;-1 1;-1 1;-1 1;-1 1;], [10 4], {'tansig','logsig'}, 'trainlm');

main code with scikit-learn’s MLPClassifier: accuracy 70% after fit
clf = MLPClassifier(solver='sgd', activation='logistic', max_iter=2000,
                    learning_rate='adaptive', warm_start=True)





___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] question about using sklearn.neural_network.MLPClassifier?

2016-11-23 Thread Sebastian Raschka
> If you keep everything at their default values, it seems to work -
>  
> ```py
> from sklearn.neural_network import MLPClassifier
> X = [[0, 0], [0, 1], [1, 0], [1, 1]]
> y = [0, 1, 1, 0]
> clf = MLPClassifier(max_iter=1000)
> clf.fit(X, y)  
> res = clf.predict([[0, 0], [0, 1], [1, 0], [1, 1]])
> print(res)
> ```

The default is 100 units in the hidden layer, but theoretically it should 
work with 2 hidden logistic units (I think that's the typical textbook/class 
example). I think what happens is that it gets stuck in local minima depending 
on the random weight initialization. E.g., the following works just fine:

from sklearn.neural_network import MLPClassifier
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]
clf = MLPClassifier(solver='lbfgs', 
activation='logistic', 
alpha=0.0, 
hidden_layer_sizes=(2,),
learning_rate_init=0.1,
max_iter=1000,
random_state=20)
clf.fit(X, y)  
res = clf.predict([[0, 0], [0, 1], [1, 0], [1, 1]])
print(res)
print(clf.loss_)


but changing the random seed to 1 leads to:

[0 1 1 1]
0.34660921283
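
To make the initialization sensitivity concrete, here is a minimal sketch 
that reuses the settings above and just sweeps the seed (the range of 30 
seeds is an arbitrary illustrative choice):

```py
from sklearn.neural_network import MLPClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

# count how many random initializations actually learn XOR with 2 hidden units
solved = 0
for seed in range(30):
    clf = MLPClassifier(solver='lbfgs', activation='logistic', alpha=0.0,
                        hidden_layer_sizes=(2,), max_iter=1000,
                        random_state=seed)
    clf.fit(X, y)
    solved += int(list(clf.predict(X)) == y)
print("%d/30 seeds learn XOR exactly" % solved)
```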

For comparison, I used a more vanilla MLP (1 hidden layer with 2 units and 
logistic activation as well; 
https://github.com/rasbt/python-machine-learning-book/blob/master/code/ch12/ch12.ipynb),
which essentially ran into the same problem.

> On Nov 23, 2016, at 6:26 AM, lin...@ruijie.com.cn wrote:
> 
> Yes, you are right @Raghav R V, thanks!
> 
> However, I found that the key parameter is ‘hidden_layer_sizes=[2]’; I wonder 
> if I misunderstand the meaning of the hidden_layer_sizes parameter?
>  
> Is it related to this topic: 
> http://stackoverflow.com/questions/36819287/mlp-classifier-of-scikit-neuralnetwork-not-working-for-xor
>  
>  
> From: scikit-learn 
> [mailto:scikit-learn-bounces+linjia=ruijie.com...@python.org] on behalf of Raghav R V
> Sent: 23 November 2016, 19:04
> To: Scikit-learn user and developer mailing list
> Subject: Re: [scikit-learn] question about using 
> sklearn.neural_network.MLPClassifier?
>  
> Hi,
>  
> If you keep everything at their default values, it seems to work -
>  
> ```py
> from sklearn.neural_network import MLPClassifier
> X = [[0, 0], [0, 1], [1, 0], [1, 1]]
> y = [0, 1, 1, 0]
> clf = MLPClassifier(max_iter=1000)
> clf.fit(X, y)  
> res = clf.predict([[0, 0], [0, 1], [1, 0], [1, 1]])
> print(res)
> ```
> 
> On Wed, Nov 23, 2016 at 10:27 AM,  wrote:
> Hi everyone
>  
>   I tried sklearn.neural_network.MLPClassifier on the XOR 
> operation, but the result is not satisfactory. The following is my code; 
> can you tell me if I am using the library incorrectly?
>  
> from sklearn.neural_network import MLPClassifier
> X = [[0, 0], [0, 1], [1, 0], [1, 1]]
> y = [0, 1, 1, 0]
> clf = MLPClassifier(solver='adam', activation='logistic', alpha=1e-3, 
> hidden_layer_sizes=(2,), max_iter=1000)
> clf.fit(X, y)  
> res = clf.predict([[0, 0], [0, 1], [1, 0], [1, 1]])
> print(res)
>  
>  
> #result is [0 0 0 0], score is 0.5
> 
> ___
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
> 
> 
> 
>  
> -- 
> Raghav RV
> https://github.com/raghavrv
>  
> ___
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] question about using sklearn.neural_network.MLPClassifier?

2016-11-23 Thread Raghav R V
Hi,

If you keep all parameters at their default values, it seems to work:

```py
from sklearn.neural_network import MLPClassifier
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]
clf = MLPClassifier(max_iter=1000)
clf.fit(X, y)
res = clf.predict([[0, 0], [0, 1], [1, 0], [1, 1]])
print(res)
```

On Wed, Nov 23, 2016 at 10:27 AM,  wrote:

> Hi everyone
>
>
>
>   I tried sklearn.neural_network.MLPClassifier on the XOR
> operation, but the result is not satisfactory. The following is my code;
> can you tell me if I am using the library incorrectly?
>
>
>
> from sklearn.neural_network import MLPClassifier
>
> X = [[0, 0], [0, 1], [1, 0], [1, 1]]
>
> y = [0, 1, 1, 0]
>
> clf = MLPClassifier(solver='adam', activation='logistic', alpha=1e-3,
> hidden_layer_sizes=(2,), max_iter=1000)
>
> clf.fit(X, y)
>
> res = clf.predict([[0, 0], [0, 1], [1, 0], [1, 1]])
>
> print(res)
>
>
>
>
>
> #result is [0 0 0 0], score is 0.5
>
> ___
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
>


-- 
Raghav RV
https://github.com/raghavrv
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


[scikit-learn] question about using sklearn.neural_network.MLPClassifier?

2016-11-23 Thread linjia
Hi everyone

  I tried sklearn.neural_network.MLPClassifier on the XOR operation, but the 
result is not satisfactory. The following is my code; can you tell me if I am 
using the library incorrectly?

from sklearn.neural_network import MLPClassifier
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]
clf = MLPClassifier(solver='adam', activation='logistic', alpha=1e-3, 
hidden_layer_sizes=(2,), max_iter=1000)
clf.fit(X, y)
res = clf.predict([[0, 0], [0, 1], [1, 0], [1, 1]])
print(res)


#result is [0 0 0 0], score is 0.5
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Question about Python's L2-Regularized Logistic Regression

2016-09-29 Thread Michael Eickenberg
That should totally depend on your dataset. Maybe it is an "easy" dataset
and not much regularization is needed.

Maybe use PCA(n_components=2) or an LDA transform to take a look at your
data in 2D. Maybe they are easily linearly separable?

Sklearn does not do any feature selection if you don't ask it to.

What C values are you using? Try an np.logspace grid, but go much farther out
on both sides than you think reasonable. Then plot AUC as a function of C to
get a global idea of what is going on.
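
A minimal sketch of that sweep (X_train, y_train, X_test, y_test stand for
your own split; the grid bounds are illustrative):

```py
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# sweep C over many orders of magnitude and record the test AUC for each value
Cs = np.logspace(-6, 6, 25)
for C in Cs:
    lr = LogisticRegression(penalty='l2', C=C, solver='liblinear')
    lr.fit(X_train, y_train)
    auc = roc_auc_score(y_test, lr.decision_function(X_test))
    print("C=%g  AUC=%.3f" % (C, auc))
```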

hth,
Michael

On Friday, September 30, 2016, Kristen M. Altenburger 
wrote:

> Hi All,
>
> I am trying to understand scikit-learn's code [function ‘_fit_liblinear' in
> https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/base.py]
> for fitting an L2-regularized logistic regression with the ‘liblinear’
> solver. More specifically, my [approximately balanced class] dataset is such
> that the # of predictors [p=2000] >> # of observations [n=100]. I am
> therefore confused about why, when I increase C [and thus decrease the
> regularization strength] while fitting the logistic regression model to my
> training data, I still obtain high AUC results when the model is applied to
> my testing data. Is scikit-learn internally doing a feature selection when
> fitting this model for high C values? Or why do the almost unregularized
> model [high C values] and the regularized model [C selected by
> cross-validation] give similar AUC and accuracy results on the testing data?
> Should I be coding my predictors as +1/-1?
>
> Any pointers/explanations would be much appreciated!
>
> Thanks,
> Kristen
> ___
> scikit-learn mailing list
> scikit-learn@python.org 
> https://mail.python.org/mailman/listinfo/scikit-learn
>
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Question about Python's L2-Regularized Logistic Regression

2016-09-29 Thread Sebastian Raschka
Hi, Kristen,
there shouldn’t be any internal feature selection going on behind the scenes. 
You may want to compare the weight coefficients of your regularized vs. 
unregularized model; if they are exactly the same, that would be an indicator 
that something funny is going on. Otherwise, it could be that both the 
strongly regularized and the non-regularized models are equally good or bad 
on that dataset (by the way, what value do you get for the ROC AUC?).

You can access the weight coefficients via the “coef_” attribute after fitting. 
I.e.,

from sklearn.linear_model import LogisticRegression

lr = LogisticRegression(...)  # your chosen settings
lr.fit(X_train, y_train)
lr.coef_
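
And a minimal sketch of that comparison (X_train and y_train stand for your 
own training data; the two C values are just illustrative extremes):

```py
import numpy as np
from sklearn.linear_model import LogisticRegression

# fit one strongly and one weakly regularized model on the same training data
lr_strong = LogisticRegression(C=1e-4).fit(X_train, y_train)
lr_weak = LogisticRegression(C=1e4).fit(X_train, y_train)

# identical coefficients across such different C values would be suspicious
print(np.allclose(lr_strong.coef_, lr_weak.coef_))
```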

> Should I be coding my predictors as +1/-1? 

0 and 1 are just fine; that is the expected default encoding.

Best,
Sebastian

> On Sep 29, 2016, at 6:09 PM, Kristen M. Altenburger  
> wrote:
> 
> Hi All,
> 
> I am trying to understand scikit-learn's code [function ‘_fit_liblinear' in 
> https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/base.py] 
> for fitting an L2-regularized logistic regression with the ‘liblinear’ 
> solver. More specifically, my [approximately balanced class] dataset is such 
> that the # of predictors [p=2000] >> # of observations [n=100]. I am 
> therefore confused about why, when I increase C [and thus decrease the 
> regularization strength] while fitting the logistic regression model to my 
> training data, I still obtain high AUC results when the model is applied to 
> my testing data. Is scikit-learn internally doing a feature selection when 
> fitting this model for high C values? Or why do the almost unregularized 
> model [high C values] and the regularized model [C selected by 
> cross-validation] give similar AUC and accuracy results on the testing data? 
> Should I be coding my predictors as +1/-1? 
> 
> Any pointers/explanations would be much appreciated!
> 
> Thanks,
> Kristen
> ___
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


[scikit-learn] Question about Python's L2-Regularized Logistic Regression

2016-09-29 Thread Kristen M. Altenburger
Hi All,

I am trying to understand scikit-learn's code [function ‘_fit_liblinear' in 
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/base.py] 
for fitting an L2-regularized logistic regression with the ‘liblinear’ solver. 
More specifically, my [approximately balanced class] dataset is such that the 
# of predictors [p=2000] >> # of observations [n=100]. I am therefore confused 
about why, when I increase C [and thus decrease the regularization strength] 
while fitting the logistic regression model to my training data, I still 
obtain high AUC results when the model is applied to my testing data. Is 
scikit-learn internally doing a feature selection when fitting this model for 
high C values? Or why do the almost unregularized model [high C values] and 
the regularized model [C selected by cross-validation] give similar AUC and 
accuracy results on the testing data? Should I be coding my predictors as 
+1/-1? 

Any pointers/explanations would be much appreciated!

Thanks,
Kristen
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


[scikit-learn] Fwd: [Scikit-learn-general] MultinomialNB Scikit-learn question

2016-09-26 Thread Bharat Didwania 4-Yr B.Tech. Electrical Engg.
-- Forwarded message --
From: Bharat Didwania 4-Yr B.Tech. Electrical Engg. <
bharat.didwania.ee...@itbhu.ac.in>
Date: Mon, Sep 26, 2016 at 11:04 PM
Subject: Re: [Scikit-learn-general] MultinomialNB Scikit-learn question
To: scikit-learn-gene...@lists.sourceforge.net


It seems you are using sample_weight with a one-vs-all classifier. I think
that could be the issue: the base classifier, i.e. MultinomialNB, supports
sample_weight, but the one-vs-all wrapper does not.
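
If that is the case, one workaround is to fit the base estimator directly,
since MultinomialNB handles multiclass problems natively and accepts
per-sample weights; a minimal sketch (X, y, w stand for your data, labels,
and weights):

```py
from sklearn.naive_bayes import MultinomialNB

clf = MultinomialNB()
clf.fit(X, y, sample_weight=w)  # the base estimator accepts per-sample weights
```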




On Mon, Sep 26, 2016 at 4:10 PM, Diego Vergara 
wrote:

> Hi Scikit-learn developer team. I have a query in which I need help.
>
> How does sample_weight work in MultinomialNB? Is there any
> documentation or an equation?
>
> In the original equation
> (http://scikit-learn.org/stable/modules/naive_bayes.html#multinomial-naive-bayes),
> applying weights to individual samples gives me bad results.
> Best regards, thank you for your answers.
>
> 
> --
>
> ___
> Scikit-learn-general mailing list
> scikit-learn-gene...@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>




___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


Re: [scikit-learn] Question regarding kernel PCA implementation in scikit-learn

2016-07-15 Thread Mathieu Blondel
Forwarding your question to the mailing-list.

On Thu, Jul 14, 2016 at 10:33 PM, Christos Lataniotis <
latanio...@ibk.baug.ethz.ch> wrote:

> Dear Mathieu Blondel,
>
> I am a PhD student working on some machine-learning aspects related to
> dimensionality reduction. One of the methods of interest to me is
> kernel PCA, so I tested the implementation offered by scikit-learn,
> which I think is the most complete of the ones I could find on the web.
>
> I would like to ask for some clarification regarding the way you
> implemented the inverse transform, i.e. solving the pre-image problem.
>
> Although the paper by Bakir et al., 2004 is cited, I think there is some
> difference between your implementation and the methodology discussed in
> that paper. Bakir suggests ‘learning' the pre-image map by solving a kernel
> ridge regression problem with some kernel function, say l, that is
> different from the kernel function, say k, used in kernel PCA. However,
> going through the source code of your implementation, I believe the kernel
> functions l and k coincide. Is that correct? If yes, is there some
> justification (e.g. empirical) for making such an assumption? I am asking
> because, as far as I have read, selecting the kernel function l is still an
> open question in the literature, so I would expect it to be a parameter the
> user can select on top of the kernel function for kernel PCA.
>
> Thank you for your time in advance.
>
> Best Regards,
> Christos
>
>
> --
> Christos Lataniotis
> Institute of Structural Engineering
> Chair of Risk, Safety and Uncertainty Quantification ETH Zürich - HIL E
> 35.1
> Wolfgang-Pauli-Str. 15
> CH-8093 Zürich, Switzerland
> Tel: +41 44 633 06 70
> E-Mail: latanio...@ibk.baug.ethz.ch
>
>
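
For reference, scikit-learn's pre-image path is enabled via
fit_inverse_transform=True, which fits a kernel ridge regression from the
embedding back to the input space; a minimal sketch (the rbf kernel and the
gamma/alpha values are illustrative choices, not recommendations):

```py
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

X, _ = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# fit_inverse_transform=True learns the pre-image map via kernel ridge regression
kpca = KernelPCA(n_components=2, kernel='rbf', gamma=10.0,
                 fit_inverse_transform=True, alpha=0.1)
X_kpca = kpca.fit_transform(X)
X_back = kpca.inverse_transform(X_kpca)  # approximate pre-images in input space
```

As the question observes, this path reuses the kernel of the forward
transform rather than exposing a separate kernel for the pre-image map.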
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


[scikit-learn] Question about error of LLE and backtransformation of coordinates

2016-06-16 Thread Matthieu Brucher
Hi!

The errors are all quite small, close to machine precision. The reduction is
an approximation of the underlying manifold, and not an "isotropic" one either
(you can see in the example that red points are less squashed together than
blue ones), so you won't get a perfect reconstruction.
Put another way: if you can reproduce in the reduced space the same distances
(or barycenters, for LLE) as in the original space, then you can have a
perfect reconstruction (still subject to floating-point precision). For the
sphere, you can't: take 4 points; can you express the fourth as a barycenter
of the other 3? No. That is the error you are seeing.

Cheers,

Matthieu


2016-06-16 10:12 GMT+01:00 Unger, Jörg :
>
> I've tried the example that is available here:
>
> http://scikit-learn.org/stable/auto_examples/manifold/plot_manifold_sphere.html
>
> These are essentially points on a 3D sphere, so the dimension of the
> embedded manifold is two.
>
> I've changed the example a little bit to extract the error as well. So
> instead of
>
> trans_data = manifold\
>     .LocallyLinearEmbedding(n_neighbors, 2,
>                             method=method).fit_transform(sphere_data).T
>
> I've done something like
>
> solver = manifold.LocallyLinearEmbedding(n_neighbors, dim_y, method=method)
> trans_data = solver.fit_transform(sphere_data).T
> error = solver.reconstruction_error_
>
> I would have expected the error to be significant for dim_y=1, since I
> can't reproduce the results with just a single coordinate. For dim_y=2, I
> expected a significant decrease, and for dim_y=3, I expected to exactly
> recover the original result.
>
> What I get is (for standard LLE):
>
> dim_y = 1 : error = 1.6203157e-07
> dim_y = 2 : error = 1.79465538543e-06
> dim_y = 3 : error = 7.00280676182e-06
>
> Could anyone explain why I do not get the expected results?
>
> Furthermore, is there an option to transform the coordinates back from the
> local dimension to the global dimension? I'm interested in transforming the
> original global samples to local coordinates (this is done via the
> transform method), but then I would like to transform samples from
> coordinates in the embedded space back into the global space.
>
>
>
> Best regards,
>
> Jörg F. Unger
>
>
>
> ___
> Scikit-learn-general mailing list
> scikit-learn-gene...@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>



-- 
Information System Engineer, Ph.D.
Blog: http://blog.audio-tk.com/
LinkedIn: http://www.linkedin.com/in/matthieubrucher



___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn