Could you provide an example? It's a pretty vague problem at the moment.

If the data can be easily picked out by human eyes, you might want to think 
about the criteria you're using to pick out a contaminated result. If you can 
express them as a rule, so that you don't need to scan each observation by eye 
(e.g. if a snapper weighs >= 300000kg then somebody entered that data 
incorrectly), then you can create an indicator variable and continue with your 
analysis.
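
As a rough sketch of the indicator-variable idea in R (the data frame `fish`, 
its column names and the values in it are made up for illustration; only the 
300000kg cut-off comes from the example above):

```r
# Hypothetical data: snapper weights in kg, with two obvious entry errors
fish <- data.frame(
  id        = 1:6,
  weight_kg = c(2.1, 3.4, 350000, 1.8, 2.9, 410000)
)

# Assumed rule, taken from the example above: >= 300000 kg is an entry error
threshold_kg <- 300000
fish$suspect <- fish$weight_kg >= threshold_kg

# Carry on with the analysis using only the rows that passed the check
clean <- fish[!fish$suspect, ]
mean(clean$weight_kg)
```

Keeping the flag as a column, rather than deleting rows straight away, means 
you can still go back and inspect what was excluded.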

Other than that, some sort of cluster analysis might be able to pick up on 2 
distinct groups, provided there's a reasonable level of homogeneity within each 
group. From there, you can do a basic inference test on the group means to 
check whether the groups differ significantly.
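
One possible way to try this in R, assuming a single numeric variable and 
using k-means as the clustering method (that choice is mine, other methods 
would do; the simulated vector `x` is made up and only stands in for your 
data):

```r
set.seed(1)  # reproducible simulated data and k-means starts

# Simulated stand-in for the mixed sample (all values here are made up):
# 80 "correct" observations around 10 and 20 "noise" observations around 25
x <- c(rnorm(80, mean = 10, sd = 1), rnorm(20, mean = 25, sd = 1))

# Split into two clusters; nstart tries several random starting points
km <- kmeans(x, centers = 2, nstart = 10)

# The noise is said to be the minority, so flag the smaller cluster
noise_cluster <- which.min(km$size)
suspect <- km$cluster == noise_cluster

table(suspect)

# Welch two-sample t-test comparing the two group means
t.test(x[suspect], x[!suspect])
```

Bear in mind that the groups were formed by the clustering itself, so the 
p-value from the test will be optimistic; I'd treat it as a rough check rather 
than a formal result.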

Cheers,
Tim



-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Michael
Sent: Wednesday, 25 January 2012 9:31 AM
To: r-help
Subject: Re: [R] detecting noise in data?

Hi all,

I just wanted to add that I am looking for a solution that's in R ... to
handle this...

And also, in a given sample, the correct data are the majority and the
noise is the minority.

Thank you!

On Tue, Jan 24, 2012 at 4:09 PM, Michael <comtech....@gmail.com> wrote:

> Hi all,
>
> I have data which are unfortunately contaminated by noise.
>
> We know that the noise is at a different level than the correct data, i.e.
> the noise data can be easily picked out by human eyes.
>
> It looks as if two people generated two very different sets of data
> with different mean levels, and they got mixed together.
>
> i.e. assuming the two sets of data follow an unknown distribution DF,
>
> and the two mean levels are u1 and u2... (unknown)
>
> Then the correct data are generated by DF(u1)
>
> and the noise are generated by DF(u2),
>
> and they got mixed...
>
> Now, how do I flag those suspicious data? At least, is there a way I could
> answer the question:
>
> Given a sample of mixed data - are these data generated from the
> above-mentioned two sources, or are the data indeed generated from one
> source only?
>
> i.e. are there two substantially distinct species in the given data?
>
> Thanks a lot!
>
>


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


______________________________________________________________________________________________________________________
This email has been scanned for the NSW Ministry of Health by the Websense 
Hosted Email Security System. 
Emails and attachments are monitored to ensure compliance with the NSW Ministry 
of Health's Electronic Messaging Policy.
______________________________________________________________________________________________________________________


______________________________________________________________________________________________________________________
Disclaimer: This message is intended for the addressee named and may contain 
confidential information. 
If you are not the intended recipient, please delete it and notify the sender. 
Views expressed in this message are those of the individual sender, and are not 
necessarily the views of the NSW Ministry of Health.
______________________________________________________________________________________________________________________