It seems the estimator.fit method needs the true labels, and that I shouldn't pass either the true labels or the predicted labels to v_measure_score (passing either triggers an AttributeError). So now I'm running with:

# Make a scoring function for the pipeline
v_measure_scorer = make_scorer(v_measure_score, labels_pred=kmeans.predict)

# Parameters of pipelines are set using ‘__’ separated parameter names:
estimator = GridSearchCV(pipe, dict(kpca__gamma=gammas), scoring=v_measure_scorer)
estimator.fit(D_scaled,D_labels)

It's been running overnight; hopefully I get a result this morning.

Thanks for all your help,

L.

On Wed, May 14, 2014 at 11:12 AM, Lee Zamparo <[email protected]> wrote:
> Combining the helpful suggestions of Andy & Joel I'm trying the following:
>
> # Make a scoring function for the pipeline
> v_measure_scorer = make_scorer(v_measure_score,labels_true=labels[:,0],labels_pred=kmeans.predict)
>
> # Parameters of pipelines are set using ‘__’ separated parameter names:
> estimator = GridSearchCV(pipe, dict(kpca__gamma=gammas),
>     scoring=v_measure_scorer)
> estimator.fit(D_scaled)
>
> Was this what you were referring to Andy?
>
> Thanks,
>
> Lee.
>
> On Wed, May 14, 2014 at 1:27 AM, Andreas Mueller <[email protected]> wrote:
>> I think you should use the make_scorer function. Using labels_ will not
>> work, as it will only have labels for the training split, while the
>> performance is measured on the test split.
>>
>> On May 14, 2014 2:28 AM, "Joel Nothman" <[email protected]> wrote:
>>>
>>> Hi Lee,
>>>
>>> The scoring parameter, if not an existing scoring name, needs to be a
>>> function with the signature:
>>>
>>> fn(estimator, X, y_true) -> score which increases with goodness
>>>
>>> So I think you want to define:
>>>
>>> def score_clusters(estimator, X, y):
>>>     return v_measure_score(y[:,0], kmeans.labels_)
>>>
>>> Then construct the GridSearchCV as:
>>>
>>> estimator = GridSearchCV(pipe, dict(kpca__gamma=gammas),
>>>     scoring=score_clusters)
>>>
>>> It seems like there should be more predefined scorers available for
>>> clustering...
>>>
>>> Cheers,
>>>
>>> - Joel
>>>
>>>
>>> On 14 May 2014 09:10, Lee Zamparo <[email protected]> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I'm trying to use GridSearchCV and Pipeline to tune the gamma
>>>> parameter of kernel PCA. I'd like to use kernel PCA to transform the
>>>> data, followed by kmeans to cluster the data, followed by v-measure to
>>>> measure the goodness of fit of the clustering.
>>>>
>>>> Here's the relevant snippet of my script
>>>> -----
>>>> # Set up the kPCA -> kmeans -> v-measure pipeline
>>>> kpca = KernelPCA(kernel="rbf")
>>>> kmeans = KMeans(n_clusters=3)
>>>> pipe = Pipeline(steps=[('kpca', kpca), ('kmeans', kmeans)])
>>>>
>>>> # Range of parameters to consider for gamma in the RBF kernel for kPCA
>>>> gammas = np.logspace(-10,2,num=100)
>>>>
>>>> # Parameters of pipelines are set using ‘__’ separated parameter names:
>>>> estimator = GridSearchCV(pipe, dict(kpca__gamma=gammas),
>>>>     scoring=v_measure_score(labels[:,0],kmeans.labels_))
>>>> estimator.fit(D_scaled)
>>>>
>>>> -----
>>>>
>>>> Yet I get an AttributeError claiming that the kmeans object has no
>>>> labels_ attribute.
>>>>
>>>> File "/home/lee/projects/SdA_reduce/utils/kernel_pca_pipeline.py",
>>>> line 86, in <module>
>>>>     estimator = GridSearchCV(pipe, dict(kpca__gamma=gammas),
>>>>     scoring=v_measure_score(labels[:,0],kmeans.labels_))
>>>>
>>>> AttributeError: 'KMeans' object has no attribute 'labels_'
>>>>
>>>> Does anyone have any tips on how I should restructure my snippet to
>>>> get my desired outcome?
>>>>
>>>> Thanks,
>>>>
>>>> Lee.
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> "Accelerate Dev Cycles with Automated Cross-Browser Testing - For FREE
>>>> Instantly run your Selenium tests across 300+ browser/OS combos.
>>>> Get unparalleled scalability from the best Selenium testing platform
>>>> available
>>>> Simple to use. Nothing to install. Get started now for free."
>>>> http://p.sf.net/sfu/SauceLabs
>>>> _______________________________________________
>>>> Scikit-learn-general mailing list
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
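
A minimal sketch of the scorer signature Joel describes in the quoted thread, fn(estimator, X, y_true) -> score, written against a current scikit-learn release (an assumption: the thread predates sklearn.model_selection). The synthetic make_blobs data stands in for D_scaled and D_labels, which are not shown in the thread, and the short gamma grid, cv=3, n_init and random_state values are illustrative choices only. The scorer calls predict on the held-out split rather than reading kmeans.labels_, following Andreas's point that labels_ only covers the training split.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import KernelPCA
from sklearn.metrics import make_scorer, v_measure_score
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical stand-in for D_scaled / D_labels (the real data is not in the thread).
X, y = make_blobs(n_samples=150, centers=3, random_state=0)

# Same kPCA -> kmeans pipeline as in the original question.
pipe = Pipeline(steps=[
    ("kpca", KernelPCA(kernel="rbf")),
    ("kmeans", KMeans(n_clusters=3, n_init=10, random_state=0)),
])


def score_clusters(estimator, X, y_true):
    """Scorer with the (estimator, X, y_true) signature GridSearchCV expects."""
    # predict() assigns each held-out sample to the nearest fitted centroid,
    # so the predicted labels line up with y_true for this split.
    return v_measure_score(y_true, estimator.predict(X))


# A much shorter grid than the thread's 100 values, purely to keep the sketch fast.
gammas = np.logspace(-3, 1, num=5)

search = GridSearchCV(pipe, {"kpca__gamma": gammas}, scoring=score_clusters, cv=3)
search.fit(X, y)  # y is passed so each test split has true labels to score against

best_gamma = search.best_params_["kpca__gamma"]
best_score = search.best_score_
print("best gamma:", best_gamma, "mean V-measure:", best_score)

# make_scorer with no extra keyword arguments wraps metric(y_true, y_pred)
# and obtains y_pred from the estimator's predict method.
v_measure_scorer = make_scorer(v_measure_score)
```

Passing scoring=v_measure_scorer instead of scoring=score_clusters should give the same scores here. The labels_true and labels_pred keyword arguments tried earlier in the thread are not needed, because the scorer supplies both arguments to the metric itself.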
