Hi David,

When train_test_split is applied to the sample space, each row is treated as if it came from a separate subject. I am looking for a function like train_test_split that can handle pairs of rows (two rows per subject) without biasing the accuracy. We are studying memory, and for each subject we have one row of features for successful memory encoding and a second row for unsuccessful memory encoding. The target is 1 for successful and 0 for unsuccessful encoding. How would you recommend splitting this data set to get a reasonable, unbiased accuracy estimate?
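One way to keep both rows of a subject on the same side of the split is to pass a per-subject group label to a group-aware splitter. The sketch below assumes scikit-learn 0.18 or later, where `GroupShuffleSplit` and `GroupKFold` live in `sklearn.model_selection`. The toy feature matrix, the `subject_ids` array and the choice of `LogisticRegression` are hypothetical placeholders, not the data or model from this thread.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, GroupShuffleSplit, cross_val_score

rng = np.random.RandomState(0)

# Hypothetical toy data: 30 subjects with 2 rows each
# (one row for successful encoding, one for unsuccessful encoding).
n_subjects, n_features = 30, 5
subject_ids = np.repeat(np.arange(n_subjects), 2)  # [0, 0, 1, 1, 2, 2, ...]
y = np.tile([1, 0], n_subjects)                    # [1, 0, 1, 0, ...]
X = rng.randn(2 * n_subjects, n_features) + y[:, None]  # class-shifted noise

# One train/test split in which both rows of a subject land on the same side.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=subject_ids))

clf = LogisticRegression().fit(X[train_idx], y[train_idx])
test_accuracy = clf.score(X[test_idx], y[test_idx])

# Cross-validated accuracy with subject-wise folds (no subject in two folds).
cv_scores = cross_val_score(
    LogisticRegression(), X, y, groups=subject_ids, cv=GroupKFold(n_splits=5)
)

print("held-out accuracy:", test_accuracy)
print("subject-wise CV accuracy:", cv_scores.mean())
```

`LeaveOneGroupOut` in the same module is another option if one subject should be held out per fold. In scikit-learn 0.17 and earlier, the corresponding iterators in `sklearn.cross_validation` used "Label" in their names (for example `LabelKFold` and `LabelShuffleSplit`).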
Thanks,
Afarin

________________________________________
From: scikit-learn <scikit-learn-bounces+afarin.famili=utsouthwestern....@python.org> on behalf of scikit-learn-requ...@python.org <scikit-learn-requ...@python.org>
Sent: Monday, September 26, 2016 2:43 PM
To: scikit-learn@python.org
Subject: scikit-learn Digest, Vol 6, Issue 40

Today's Topics:

   1. header intact (Afarin Famili)
   2. Is there a built-in function for pairs of data? (Afarin Famili)
   3. Re: Is there a built-in function for pairs of data? (Pedro Pazzini)
   4. Re: Is there a built-in function for pairs of data? (David Nicholson)
   5. Large computation time for homogeneous data with agglomerative clustering (Md. Khairullah)

----------------------------------------------------------------------

Message: 2
Date: Mon, 26 Sep 2016 18:06:49 +0000
From: Afarin Famili <afarin.fam...@utsouthwestern.edu>
To: "scikit-learn@python.org" <scikit-learn@python.org>
Subject: [scikit-learn] Is there a built-in function for pairs of data?

Dear Scikit-learn team,

We need to deal with pairs of data in our classification task. I was wondering if there is already a built-in function in Scikit-learn that can partition the pairs of data into train and test sets?

Regards,
Afarin

UT Southwestern Medical Center
The future of medicine, today.

------------------------------

Message: 3
Date: Mon, 26 Sep 2016 15:47:26 -0300
From: Pedro Pazzini <pedropazz...@gmail.com>
To: Scikit-learn user and developer mailing list <scikit-learn@python.org>
Subject: Re: [scikit-learn] Is there a built-in function for pairs of data?

Like this?
http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.train_test_split.html
------------------------------

Message: 4
Date: Mon, 26 Sep 2016 14:53:05 -0400
From: David Nicholson <nichol...@gmail.com>
To: Scikit-learn user and developer mailing list <scikit-learn@python.org>
Subject: Re: [scikit-learn] Is there a built-in function for pairs of data?

Do you mean like train_test_split?
http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.train_test_split.html

------------------------------

Message: 5
Date: Mon, 26 Sep 2016 21:43:05 +0200
From: "Md. Khairullah" <md.khairul...@gmail.com>
To: scikit-learn@python.org
Subject: [scikit-learn] Large computation time for homogeneous data with agglomerative clustering

Dear Scikit-learners,

This is my first post here, and I hope you experts can help me.

We are using agglomerative clustering with Ward's linkage and a connectivity constraint. The data set has around 205,000 samples, each a single scalar feature. The data set is dynamic (it evolves in time), and we need to apply clustering at different times throughout the process. Initially all values are 0, and they increase gradually. In other words, in the early stage the data are more homogeneous, and the heterogeneity among the data increases gradually.

If the clustering is applied at the final stage (the most heterogeneous data, but of course still having patterns/clusters), requesting 20 clusters takes only 61 s of CPU time. But if the clustering is run at an early stage (more homogeneous data, though not all 0, and of course with patterns/clusters in the data) with the same settings, the time rises to 1 h 5 min. The CPU time falls between these two if the data come from an intermediate time step. I also tried the other linkage options, but the situation does not improve.

My understanding is that the homogeneity is playing a role. Have you experienced this too? What solution do you suggest?

Thanks in advance for your attention and help.

--
Best regards
Md. Khairullah
PhD Student, KU Leuven
Numerical Analysis and Applied Mathematics Section
Celestijnenlaan 200a - box 2402
3001 Leuven
room: 03.18
tel. +32 16 37 39 66
fax +32 16 3 27996
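To check whether the slowdown tracks how homogeneous the input is, the setup described in Message 5 can be reproduced at a smaller scale and timed. This is only a sketch under stated assumptions: the message does not say what the connectivity graph is, so a hypothetical 1-D chain (each sample linked to its neighbours in index order) stands in for it, and the "early" and "late" arrays are synthetic stand-ins, not the original data. The helpers `chain_connectivity` and `time_ward` are hypothetical names.

```python
import time

import numpy as np
from scipy import sparse
from sklearn.cluster import AgglomerativeClustering


def chain_connectivity(n):
    """Hypothetical connectivity: sample i is linked to samples i-1 and i+1."""
    off = np.ones(n - 1)
    return sparse.diags([off, off], [-1, 1], shape=(n, n), format="csr")


def time_ward(values, n_clusters=20):
    """Fit Ward clustering on 1-D values; return (labels, CPU seconds)."""
    X = values.reshape(-1, 1)  # one scalar feature per sample
    model = AgglomerativeClustering(
        n_clusters=n_clusters,
        linkage="ward",
        connectivity=chain_connectivity(len(values)),
    )
    start = time.process_time()  # CPU time, as reported in the message
    labels = model.fit_predict(X)
    return labels, time.process_time() - start


rng = np.random.RandomState(0)
n = 5000  # far smaller than 205,000 to keep the sketch quick

# Synthetic "early" snapshot: small, nearly homogeneous values.
early = 1e-3 * rng.rand(n)
# Synthetic "late" snapshot: 20 piecewise-constant levels plus small noise.
late = np.repeat(rng.rand(20) * 10, n // 20) + 0.1 * rng.rand(n)

for name, values in [("early", early), ("late", late)]:
    labels, seconds = time_ward(values)
    print(name, len(np.unique(labels)), "clusters,", round(seconds, 3), "s CPU")
```

Replacing the chain graph and the synthetic arrays with the real connectivity matrix and real snapshots would show whether the timing gap reproduces at this scale; the sketch itself implies no particular result.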
------------------------------

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn

------------------------------

End of scikit-learn Digest, Vol 6, Issue 40
*******************************************