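Statistical inference for group differences on groups determined from the data yields incorrect results: the groups were chosen because they look different, so standard tests give p-values that are far too small. Groups must be prespecified.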
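The simulation below illustrates this point. It is a minimal sketch in Python (standard library only) rather than R, and the names `two_means_1d`, `welch_t` and `rejection_rates` are hypothetical. Every simulated sample comes from a single normal source, so there is no real group difference to find.

Two ways of forming groups are compared:

- Splitting each sample with 2-means clustering and then testing the cluster means rejects the null almost every time.
- A split fixed in advance (first half vs. second half) rejects at roughly the nominal 5% rate.

The cutoff 1.96 is a normal approximation to the t critical value, assumed adequate for groups of about 100.

```python
import random
import statistics


def two_means_1d(xs, max_iter=100):
    """Split 1-D data into two clusters with Lloyd's k-means (k = 2)."""
    c1, c2 = min(xs), max(xs)  # start the two centres at the extremes
    for _ in range(max_iter):
        g1 = [x for x in xs if abs(x - c1) <= abs(x - c2)]
        g2 = [x for x in xs if abs(x - c1) > abs(x - c2)]
        new = (statistics.fmean(g1), statistics.fmean(g2))
        if new == (c1, c2):  # assignments have stabilised
            break
        c1, c2 = new
    return g1, g2


def welch_t(a, b):
    """Welch two-sample t statistic for a difference in means."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.fmean(a) - statistics.fmean(b)) / se


def rejection_rates(n_sims=200, n=200, crit=1.96, seed=1):
    """Share of simulations with |t| > crit, for data-driven vs prespecified groups.

    All samples come from ONE N(0, 1) source, so every rejection is a false
    positive; a valid 5% test should reject about 5% of the time.
    """
    rng = random.Random(seed)
    post_hoc = prespecified = 0
    for _ in range(n_sims):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        g1, g2 = two_means_1d(xs)  # groups chosen by looking at the data
        post_hoc += abs(welch_t(g1, g2)) > crit
        half = n // 2  # groups fixed before looking at the data
        prespecified += abs(welch_t(xs[:half], xs[half:])) > crit
    return post_hoc / n_sims, prespecified / n_sims


if __name__ == "__main__":
    data_driven, fixed = rejection_rates()
    print(f"groups found by clustering: {data_driven:.2f} of tests rejected")
    print(f"groups fixed in advance:    {fixed:.2f} of tests rejected")
```

The first-half/second-half split stands in for any grouping decided before the data are inspected.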
Bert

On Jan 24, 2012, at 2:55 PM, "HARROLD, Tim" <th...@doh.health.nsw.gov.au> wrote:

> You might want to provide an example? It's a pretty vague problem at the
> moment.
>
> If the data can be easily picked out by human eyes, you might want to think
> about the criteria you're using to pick out a contaminated result. If you
> can express them in such a way that you don't need to scan each observation
> (e.g. if a snapper weighs >= 300000kg then somebody entered that data
> incorrectly), then you can create an indicator variable and continue with your
> analysis.
>
> Other than that, some sort of cluster analysis might be able to pick up on 2
> distinct groups, provided there's a reasonable level of homogeneity within
> each group. From there, you can do a basic inference test on the group
> means to detect whether there are significant differences between the
> groups.
>
> Cheers,
> Tim
>
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of Michael
> Sent: Wednesday, 25 January 2012 9:31 AM
> To: r-help
> Subject: Re: [R] detecting noise in data?
>
> Hi all,
>
> I just wanted to add that I am looking for a solution in R to handle this.
>
> Also, in a given sample, the correct data are the majority and the
> noise is the minority.
>
> Thank you!
>
> On Tue, Jan 24, 2012 at 4:09 PM, Michael <comtech....@gmail.com> wrote:
>
>> Hi all,
>>
>> I have data which are unfortunately contaminated by noise.
>>
>> We know that the noise is at a different level from the correct data, i.e.
>> the noise can easily be picked out by human eyes.
>>
>> It looks as if two people generated two very different sets of data
>> with different mean levels, and the two sets got mixed together.
>>
>> i.e. assuming both data sets follow an unknown distribution DF,
>> and the two mean levels are u1 and u2...
>>
>> Then the correct data are generated by DF(u1),
>> the noise is generated by DF(u2),
>> and they got mixed...
>>
>> Now, how do I flag the suspicious data? Or at least, is there a way to
>> answer the question:
>>
>> Given a sample of mixed data, were these data generated by the two
>> sources described above, or by one source only?
>>
>> i.e. are there two substantially distinct species in the given data?
>>
>> Thanks a lot!
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> This email has been scanned for the NSW Ministry of Health by the Websense
> Hosted Email Security System.
> Emails and attachments are monitored to ensure compliance with the NSW
> Ministry of Health's Electronic Messaging Policy.
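Michael asks whether a sample comes from one source or two. One common approach, not proposed in the thread, compares models instead of clustering first and then testing the cluster means:

- Fit both a single distribution and a two-component mixture.
- Compare the two fits with an information criterion such as BIC.

The Python sketch below uses only the standard library, and all function names are hypothetical. It makes three choices:

- It assumes the unknown DF is normal, which is a strong assumption.
- It fits the mixture by EM.
- It flags observations whose estimated probability of coming from the smaller component exceeds 0.5.

Two caveats apply. The usual regularity conditions fail when choosing the number of mixture components, so the BIC comparison is a heuristic rather than a formal test. Also, a skewed or heavy-tailed single source can favour two normal components. In R, mixture-modelling packages such as mclust fit Gaussian mixtures by EM with BIC-based model choice.

```python
import math
import random
import statistics


def normal_logpdf(x, mu, sd):
    """Log density of a normal distribution with mean mu and standard deviation sd."""
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * ((x - mu) / sd) ** 2


def _e_step(xs, w, mu, sd):
    """Membership probabilities and total log-likelihood under the current parameters."""
    resp, loglik = [], 0.0
    for x in xs:
        logs = [math.log(w[k]) + normal_logpdf(x, mu[k], sd[k]) for k in (0, 1)]
        top = max(logs)  # log-sum-exp trick to avoid underflow
        p = [math.exp(v - top) for v in logs]
        total = sum(p)
        resp.append([pk / total for pk in p])
        loglik += top + math.log(total)
    return resp, loglik


def fit_two_normals(xs, n_iter=300):
    """Fit a two-component normal mixture by EM (fixed number of iterations)."""
    n = len(xs)
    w, mu = [0.5, 0.5], [min(xs), max(xs)]  # start the means at the extremes
    sd = [statistics.pstdev(xs)] * 2
    sd_floor = 1e-3 * sd[0]  # floor so a component cannot collapse onto one point
    for _ in range(n_iter):
        resp = _e_step(xs, w, mu, sd)[0]
        for k in (0, 1):  # M-step: weighted weight, mean and sd of each component
            rk = max(sum(r[k] for r in resp), 1e-12)
            w[k] = max(rk / n, 1e-12)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / rk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / rk
            sd[k] = max(math.sqrt(var), sd_floor)
    resp, loglik = _e_step(xs, w, mu, sd)
    return loglik, w, resp


def one_or_two_sources(xs):
    """Compare BIC = -2*loglik + p*ln(n); the model with the lower BIC is preferred.

    One normal has p = 2 parameters; a two-normal mixture has p = 5
    (two means, two sds, one mixing weight). Also returns the indices whose
    estimated probability of belonging to the smaller component exceeds 0.5.
    """
    n = len(xs)
    mu, sd = statistics.fmean(xs), statistics.pstdev(xs)  # ML estimates
    ll_one = sum(normal_logpdf(x, mu, sd) for x in xs)
    ll_two, w, resp = fit_two_normals(xs)
    bic_one = -2 * ll_one + 2 * math.log(n)
    bic_two = -2 * ll_two + 5 * math.log(n)
    minority = 0 if w[0] < w[1] else 1
    flagged = [i for i, r in enumerate(resp) if r[minority] > 0.5]
    return bic_one, bic_two, flagged


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical data: 180 "correct" values near 10 plus 20 "noise" values near 20
    xs = [rng.gauss(10, 1) for _ in range(180)] + [rng.gauss(20, 1) for _ in range(20)]
    bic_one, bic_two, flagged = one_or_two_sources(xs)
    print(f"BIC, one source: {bic_one:.1f}   BIC, two sources: {bic_two:.1f}")
    print(f"{len(flagged)} observations flagged as the minority source")
```

With well-separated hypothetical data like this, the two-source model should have the clearly lower BIC. Following Bert's point, the flagged observations should not then be t-tested against the rest as though the groups had been prespecified.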