It seems the estimator.fit method needs the true labels, and that I
shouldn't pass either the true labels or the predicted labels to
v_measure_score myself (passing either triggers an AttributeError).  So
now I'm running with

# Make a scoring function for the pipeline
v_measure_scorer = make_scorer(v_measure_score, labels_pred=kmeans.predict)

# Parameters of pipelines are set using ‘__’ separated parameter names:
estimator = GridSearchCV(pipe, dict(kpca__gamma=gammas),
scoring=v_measure_scorer)
estimator.fit(D_scaled,D_labels)
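
A note on the snippet above, offered as a sketch rather than a verified fix: a scorer built by make_scorer calls predict on the fitted pipeline itself and passes (y_true, y_pred) to the metric, so the extra labels_pred=kmeans.predict keyword is probably not needed and may clash with the y_pred the scorer already supplies. The version below assumes a recent scikit-learn (GridSearchCV imported from sklearn.model_selection; at the time of this thread it lived in sklearn.grid_search), synthetic make_blobs data standing in for D_scaled and D_labels, 1-D integer labels, and a deliberately small gamma grid.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import KernelPCA
from sklearn.metrics import make_scorer, v_measure_score
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic stand-ins for D_scaled and D_labels (assumption: 1-D labels).
X, y = make_blobs(n_samples=150, centers=3, random_state=0)

# kPCA -> kmeans pipeline; n_components=2 is an arbitrary choice for the sketch.
pipe = Pipeline(steps=[
    ("kpca", KernelPCA(kernel="rbf", n_components=2)),
    ("kmeans", KMeans(n_clusters=3, n_init=10, random_state=0)),
])

# No extra keyword arguments: the scorer calls pipe.predict(X_test) on each
# test split and hands (y_true, y_pred) to v_measure_score.
v_measure_scorer = make_scorer(v_measure_score)

gammas = np.logspace(-3, 1, num=5)  # small grid to keep the sketch fast
estimator = GridSearchCV(pipe, dict(kpca__gamma=gammas),
                         scoring=v_measure_scorer, cv=3)
estimator.fit(X, y)

print(estimator.best_params_, estimator.best_score_)
```

The true labels are only passed to fit; GridSearchCV slices them per split and forwards the test portion to the scorer.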

It's been running overnight, hopefully I get a result this morning.
Thanks for all your help,

L.

On Wed, May 14, 2014 at 11:12 AM, Lee Zamparo <[email protected]> wrote:
> Combining the helpful suggestions of Andy & Joel I'm trying the following:
>
> # Make a scoring function for the pipeline
> v_measure_scorer = make_scorer(v_measure_score, labels_true=labels[:,0],
>                                labels_pred=kmeans.predict)
>
> # Parameters of pipelines are set using ‘__’ separated parameter names:
> estimator = GridSearchCV(pipe, dict(kpca__gamma=gammas),
> scoring=v_measure_scorer)
> estimator.fit(D_scaled)
>
> Was this what you were referring to Andy?
>
> Thanks,
>
> Lee.
>
> On Wed, May 14, 2014 at 1:27 AM, Andreas Mueller <[email protected]> wrote:
>> I think you should use the make_scorer function. Using labels_ will not
>> work, as it will only have labels for the training split, while the
>> performance is measured on the test split.
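
To illustrate the point about labels_, here is a minimal sketch under assumed conditions (synthetic make_blobs data and a single hand-made train/test split instead of the splits GridSearchCV creates). It shows that labels_ covers only the samples seen by fit, while predict assigns clusters to held-out samples.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

# 90 synthetic samples, split into 60 for training and 30 for testing.
X, y = make_blobs(n_samples=90, centers=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=30, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_train)

# labels_ has one entry per training sample, so it cannot be compared
# against the true labels of the test split.
print(len(kmeans.labels_), len(y_test))

# predict assigns each held-out sample to its nearest learned centroid,
# giving labels that line up with y_test.
test_labels = kmeans.predict(X_test)
print(len(test_labels))
```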
>>
>> On May 14, 2014 2:28 AM, "Joel Nothman" <[email protected]> wrote:
>>>
>>> Hi Lee,
>>>
>>> The scoring parameter, if not an existing scoring name, needs to be a
>>> function with the signature:
>>>
>>> fn(estimator, X, y_true) -> score which increases with goodness
>>>
>>> So I think you want to define:
>>>
>>> def score_clusters(estimator, X, y):
>>>     return v_measure_score(y[:,0], kmeans.labels_)
>>>
>>> Then construct the GridSearchCV as:
>>>
>>> estimator = GridSearchCV(pipe, dict(kpca__gamma=gammas),
>>> scoring=score_clusters)
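
A hedged variant of the callable above, combining it with the advice elsewhere in the thread to score on the test split: it uses the estimator argument and predict(X) rather than the module-level kmeans object and labels_. The data is synthetic, the manual train/test slicing is only there to call the scorer directly, and y is assumed to be 1-D (with a 2-D label array as in the thread, y[:, 0] would be used instead).

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import KernelPCA
from sklearn.metrics import v_measure_score
from sklearn.pipeline import Pipeline


def score_clusters(estimator, X, y):
    """Scorer with signature (estimator, X, y_true) -> score, higher is better."""
    # estimator is the pipeline fitted on the training split; predict runs
    # the kPCA transform and then assigns clusters to the samples in X.
    return v_measure_score(y, estimator.predict(X))


X, y = make_blobs(n_samples=150, centers=3, random_state=0)

pipe = Pipeline(steps=[
    ("kpca", KernelPCA(kernel="rbf", n_components=2, gamma=0.1)),
    ("kmeans", KMeans(n_clusters=3, n_init=10, random_state=0)),
])

# Fit on the first 100 samples, score on the remaining 50.
pipe.fit(X[:100])
score = score_clusters(pipe, X[100:], y[100:])
print(score)
```

The same function can be passed as scoring=score_clusters, in which case GridSearchCV supplies the fitted estimator and the test split on each call.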
>>>
>>> It seems like there should be more predefined scorers available for
>>> clustering...
>>>
>>> Cheers,
>>>
>>> - Joel
>>>
>>>
>>> On 14 May 2014 09:10, Lee Zamparo <[email protected]> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I'm trying to use GridSearchCV and Pipeline to tune the gamma
>>>> parameter of kernel PCA.  I'd like to use kernel PCA to transform the
>>>> data, followed by kmeans to cluster the data, followed by v-measure to
>>>> measure the goodness of fit of the clustering.
>>>>
>>>> Here's the relevant snippet of my script
>>>> -----
>>>> # Set up the kPCA -> kmeans -> v-measure pipeline
>>>> kpca = KernelPCA(kernel="rbf")
>>>> kmeans = KMeans(n_clusters=3)
>>>> pipe = Pipeline(steps=[('kpca', kpca), ('kmeans', kmeans)])
>>>>
>>>> # Range of parameters to consider for gamma in the RBF kernel for kPCA
>>>> gammas = np.logspace(-10,2,num=100)
>>>>
>>>> # Parameters of pipelines are set using ‘__’ separated parameter names:
>>>> estimator = GridSearchCV(pipe, dict(kpca__gamma=gammas),
>>>> scoring=v_measure_score(labels[:,0],kmeans.labels_))
>>>> estimator.fit(D_scaled)
>>>>
>>>> -----
>>>>
>>>> Yet I get an AttributeError claiming that the kmeans object has no
>>>> labels_ attribute.
>>>>
>>>> File "/home/lee/projects/SdA_reduce/utils/kernel_pca_pipeline.py",
>>>> line 86, in <module>
>>>>   estimator = GridSearchCV(pipe, dict(kpca__gamma=gammas),
>>>> scoring=v_measure_score(labels[:,0],kmeans.labels_))
>>>>
>>>> AttributeError: 'KMeans' object has no attribute 'labels_'
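
A short sketch of why this error appears, under the assumption that it comes from evaluating kmeans.labels_ before anything has been fitted: the scoring=... expression is computed when the GridSearchCV line runs, and a KMeans object only gains labels_ once fit has been called. The data here is synthetic.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)

# Before fit, the fitted attribute does not exist yet.
has_labels_before = hasattr(kmeans, "labels_")

X, _ = make_blobs(n_samples=60, centers=3, random_state=0)
kmeans.fit(X)

# After fit, labels_ holds one cluster index per training sample.
has_labels_after = hasattr(kmeans, "labels_")

print(has_labels_before, has_labels_after)
```

Note also that GridSearchCV fits clones of the estimator it is given, so the original kmeans object would stay unfitted even after the search has run.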
>>>>
>>>> Does anyone have any tips on how I should restructure my snippet to
>>>> get my desired outcome?
>>>>
>>>> Thanks,
>>>>
>>>> Lee.
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> "Accelerate Dev Cycles with Automated Cross-Browser Testing - For FREE
>>>> Instantly run your Selenium tests across 300+ browser/OS combos.
>>>> Get unparalleled scalability from the best Selenium testing platform
>>>> available
>>>> Simple to use. Nothing to install. Get started now for free."
>>>> http://p.sf.net/sfu/SauceLabs
>>>> _______________________________________________
>>>> Scikit-learn-general mailing list
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>
>>>
>>
>>

