Re: [R] Algorithmic Question on Array Filtration
John Kane wrote:
> I think we need a bit more information and perhaps a small example data
> set to see what you want. I am not familiar with term mass window. Is
> this a confidence interval around the mass value?

Thanks for your time.

Please find a small example below - the real data is MUCH bigger. If you look at rows 5 and 6 of this and calculate the mass precision window I have to deal with (5 ppm), you'll find the following:

Row  Lower 5ppm  Mass      Higher 5ppm  Intensity
5    312.9419    312.9435  312.9451     20236.181
6    312.9422    312.9438  312.9454     14404.502

The precision windows here obviously overlap and I need to get rid of one of them, which in this case should be row 6, since it has the lower intensity associated with it.

For now I resort to sorting by intensity and, descending through the list, populating a fresh data.frame with entries that do not have any overlap, skipping those that do. If somebody has any sounder ideas, I'd appreciate hearing about them.

Thanks, Joh

Mass      Intensity
304.9117   35595.780
305.1726   18760.413
311.0636   24047.307
312.9303   12886.216
312.9435   20236.181
312.9438   14404.502
313.1763   61033.830
313.1766   50788.418
316.9118    5908.166
317.2805   14084.841
317.2833   25603.689
317.2837   22866.578
318.0114   37929.855
318.9274   27883.295
318.9889    4496.716
321.2784    3893.165
326.1166   23745.851
327.2894    5318.226
328.8852   60934.030
329.1517   31985.486
331.0426   14883.231
332.0268   55126.078
332.2798   47364.519
333.2813   11423.807
337.1990    5330.360
339.2144   38450.804
339.2867    4065.709
340.9561   54101.844
340.9770   28172.160
345.0583   17945.025
345.0583   17877.900
347.1742    7359.428
347.2407  204792.999
353.2302   87864.153
353.2302  129691.696
363.0161   20453.771
363.0943   19481.234
363.2142    9238.244
363.2315   23323.527
363.2533   20039.607
363.2534   22068.718
364.8918   16857.488
364.9368    9527.642
366.9029   18174.233
373.2197    7730.009
385.1147   27907.070
385.1148   19383.655
393.2913   11860.719
396.9074   10793.823
400.8792   10750.249
402.8729   12411.966
407.2771   11270.566
442.8689   18101.972
442.8697   10671.199
447.3470   35927.046
449.2347    6959.247
456.9339   50402.820
461.1670    8636.998
461.1670    8151.706
473.2985   13782.291
490.9224   18510.760

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
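A minimal sketch of the intensity-sorted approach described in the message above, as one possible reading of it rather than the poster's actual code. The function name `filter_greedy` and the `ppm` argument are hypothetical; the column names `Mass` and `Intensity` and the five sample rows (rows 4 to 8) come from the posted data.

```r
# Greedy filter (hypothetical helper, not from the thread): visit peaks from
# strongest to weakest and accept a peak only if its ppm window does not
# overlap the window of any peak accepted so far.
filter_greedy <- function(x, ppm = 5) {
  lower <- x$Mass * (1 - ppm / 1e6)
  upper <- x$Mass * (1 + ppm / 1e6)
  ord <- order(x$Intensity, decreasing = TRUE)
  keep <- integer(0)
  for (i in ord) {
    # Two closed intervals overlap when each starts before the other ends.
    clash <- any(lower[i] <= upper[keep] & lower[keep] <= upper[i])
    if (!clash) keep <- c(keep, i)
  }
  # Return the survivors in their original row order.
  x[sort(keep), , drop = FALSE]
}

# Rows 4 to 8 of the posted example data.
peaks <- data.frame(
  Mass      = c(312.9303, 312.9435, 312.9438, 313.1763, 313.1766),
  Intensity = c(12886.216, 20236.181, 14404.502, 61033.830, 50788.418)
)
result <- filter_greedy(peaks)
print(result)
```

Each candidate is compared against every accepted peak, so the cost grows roughly quadratically with the number of rows. That may matter for the much bigger real data mentioned above.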
Re: [R] Algorithmic Question on Array Filtration
This will determine where the overlaps are and delete them. You can add some more code to determine which ones you want to delete.

# add the 5ppm to the dataframe
x$lower <- x$Mass * (1 - 5e-6)
x$upper <- x$Mass * (1 + 5e-6)

# create a matrix for determining overlap by adding 1 at the lower value of a row
# and subtracting 1 at the upper value.
overlap <- rbind(
    cbind(index=seq(nrow(x)), value=x$lower, oper=1),
    cbind(index=seq(nrow(x)), value=x$upper, oper=-1))

# sort in 'value' order to determine overlap
overlap[] <- overlap[order(overlap[,'value'], overlap[, 'oper']),]

# 'qsize' should be 0/1 if no overlap
overlap <- cbind(overlap, qsize=cumsum(overlap[, 'oper']))

# find the qsize > 1 indicating overlap and use the index of that one and the one
# after as the ones to delete. You could add code to determine which one to keep
o.index <- which(overlap[,'qsize'] > 1)

# determine the indices to delete
i.delete <- unique(c(overlap[o.index,'index'], overlap[o.index+1, 'index']))

# create the new matrix with overlaps deleted
new.x <- x[-i.delete,]
head(new.x,10)

       Mass Intensity    lower    upper
1  304.9117 35595.780 304.9102 304.9132
2  305.1726 18760.413 305.1711 305.1741
3  311.0636 24047.307 311.0620 311.0652
4  312.9303 12886.216 312.9287 312.9319
9  316.9118  5908.166 316.9102 316.9134
13 318.0114 37929.855 318.0098 318.0130
14 318.9274 27883.295 318.9258 318.9290
15 318.9889  4496.716 318.9873 318.9905
16 321.2784  3893.165 321.2768 321.2800
17 326.1166 23745.851 326.1150 326.1182

On 7/14/07, Johannes Graumann [EMAIL PROTECTED] wrote:
> Thanks for your time. Please find a small example below - the real data
> is MUCH bigger.
> [...]
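The reply above removes every row involved in an overlap and leaves open how to choose which one to keep. One possible way to fill that gap, sketched here under assumptions: sort by mass, chain overlapping windows into clusters, and keep the most intense row of each cluster. The function name `filter_by_cluster` and the `ppm` argument are hypothetical; the sample rows (rows 9 to 13) are taken from the posted data. This is a different technique from the +1/-1 sweep in the reply, not a completion of it.

```r
# Cluster-based filter (hypothetical helper, not from the thread).
filter_by_cluster <- function(x, ppm = 5) {
  if (nrow(x) < 2) return(x)
  x <- x[order(x$Mass), , drop = FALSE]
  lower <- x$Mass * (1 - ppm / 1e6)
  upper <- x$Mass * (1 + ppm / 1e6)
  # A new cluster starts when a window begins after every earlier window has ended.
  reach <- cummax(upper)
  starts_new <- c(TRUE, lower[-1] > reach[-length(reach)])
  cluster <- cumsum(starts_new)
  # Within each cluster keep the row with the highest intensity (first one on ties).
  best <- tapply(seq_len(nrow(x)), cluster,
                 function(i) i[which.max(x$Intensity[i])])
  x[as.integer(best), , drop = FALSE]
}

# Rows 9 to 13 of the posted example data.
peaks <- data.frame(
  Mass      = c(316.9118, 317.2805, 317.2833, 317.2837, 318.0114),
  Intensity = c(5908.166, 14084.841, 25603.689, 22866.578, 37929.855)
)
result <- filter_by_cluster(peaks)
print(result)
```

Note the design choice: clusters are transitive. In this sample the windows of 317.2805 and 317.2837 do not overlap each other directly, but both overlap the window of 317.2833, so all three form one cluster and only 317.2833 survives. Whether chained overlaps should be merged this way depends on the application.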
[R] Algorithmic Question on Array Filtration
Dear All,

I have a data frame with the columns Mass and Intensity (this is mass spectrometry stuff). Each of the mass values gives rise to a mass window of 5 ppm around the individual mass (from mass - mass/1E6*5 to mass + mass/1E6*5). I need to filter the array such that in case these mass windows overlap I retain the mass/intensity pair with the highest intensity.

I apologize for this question, but I have no formal IT education and would highly value any nudges toward favorable algorithmic solutions.

Thanks for any help, Joh
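As a small illustration of the window definition in the question, the sketch below computes the 5 ppm window for two masses that appear elsewhere in this thread (312.9435 and 312.9438) and checks whether the windows overlap. The helper names `ppm_window` and `windows_overlap` are hypothetical and not part of any package.

```r
# Hypothetical helper: symmetric ppm window around each mass,
# i.e. from mass - mass/1E6*ppm to mass + mass/1E6*ppm.
ppm_window <- function(mass, ppm = 5) {
  delta <- mass * ppm / 1e6
  data.frame(lower = mass - delta, mass = mass, upper = mass + delta)
}

# Hypothetical helper: two closed intervals overlap when each one
# starts before the other one ends.
windows_overlap <- function(lower1, upper1, lower2, upper2) {
  lower1 <= upper2 & lower2 <= upper1
}

w <- ppm_window(c(312.9435, 312.9438))
print(round(w, 4))

overlap_5_6 <- windows_overlap(w$lower[1], w$upper[1], w$lower[2], w$upper[2])
print(overlap_5_6)  # TRUE: the two windows overlap
```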
Re: [R] Algorithmic Question on Array Filtration
Sorry, this sounds like a fairly basic question that can be resolved by which() and possibly ifelse(). There are no details in your email. I am afraid you have to learn the basics of R or ask the question with more details (e.g. example data). Or ask someone locally.

Regards, Adai

Johannes Graumann wrote:
> Dear All, I have a data frame with the columns Mass and Intensity (this
> is mass spectrometry stuff).
> [...]
Re: [R] Algorithmic Question on Array Filtration
I think we need a bit more information and perhaps a small example data set to see what you want. I am not familiar with the term mass window. Is this a confidence interval around the mass value?

--- Johannes Graumann [EMAIL PROTECTED] wrote:
> Dear All, I have a data frame with the columns Mass and Intensity (this
> is mass spectrometry stuff).
> [...]