Re: [R] Algorythmic Question on Array Filtration

2007-07-14 Thread Johannes Graumann
John Kane wrote:
Thanks for your time.

Please find a small example below - the real data is MUCH bigger.
If you look at rows 5 and 6 of this and calculate the mass precision window
I have to deal with (5 ppm), you'll find the following:

Row  Lower 5ppm  Mass      Higher 5ppm  Intensity
5    312.9419    312.9435  312.9451     20236.181
6    312.9422    312.9438  312.9454     14404.502

The precision windows here obviously overlap and I need to get rid of one of
them, which in this case should be row 6, since it has the lower intensity
associated with it.
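
The window arithmetic for rows 5 and 6 can be checked with a few lines of R.
This is only a sketch: the variable names are made up for illustration, and
the window is assumed to be symmetric, 5 ppm on either side of the mass.

```r
# Masses of rows 5 and 6 from the example data
mass <- c(312.9435, 312.9438)
ppm  <- 5

# Symmetric window (assumption): mass -/+ mass * ppm / 1e6
lower <- mass * (1 - ppm / 1e6)
upper <- mass * (1 + ppm / 1e6)

print(round(cbind(lower, mass, upper), 4))

# Two windows overlap when each one starts before the other one ends
windows_overlap <- lower[1] <= upper[2] && lower[2] <= upper[1]
print(windows_overlap)
```

Row 6 starts (about 312.9422) well before row 5 ends (about 312.9451), which
is why the two rows count as overlapping.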

For now I resort to sorting by intensity and, descending through the list,
populating a fresh data.frame with the entries that do not overlap anything
already kept, skipping those that do. If somebody has any sounder ideas, I'd
appreciate hearing about them.
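
The intensity-sorted pass described above could look roughly like this in R.
It is a sketch under assumptions, not the code actually used: the function
name filter_by_intensity and its ppm argument are invented here, and the
window is taken to be symmetric at 5 ppm.

```r
# Hypothetical helper: keep the most intense peak among overlapping windows.
filter_by_intensity <- function(x, ppm = 5) {
  # Visit peaks from highest to lowest intensity
  x <- x[order(x$Intensity, decreasing = TRUE), ]
  lower <- x$Mass * (1 - ppm / 1e6)
  upper <- x$Mass * (1 + ppm / 1e6)
  keep <- logical(nrow(x))
  for (i in seq_len(nrow(x))) {
    kept <- which(keep)
    # Keep peak i only if its window misses every window kept so far
    if (!any(lower[i] <= upper[kept] & upper[i] >= lower[kept])) {
      keep[i] <- TRUE
    }
  }
  out <- x[keep, ]
  out[order(out$Mass), ]  # return the survivors in mass order
}

# Rows 4 to 8 of the example data
peaks <- data.frame(
  Mass      = c(312.9303, 312.9435, 312.9438, 313.1763, 313.1766),
  Intensity = c(12886.216, 20236.181, 14404.502, 61033.830, 50788.418)
)
print(filter_by_intensity(peaks))
```

Each candidate is compared against every peak kept so far, so the loop is
quadratic in the worst case, which may matter for much bigger data.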

Thanks, Joh

Mass     Intensity
304.9117 35595.780
305.1726 18760.413
311.0636 24047.307
312.9303 12886.216
312.9435 20236.181
312.9438 14404.502
313.1763 61033.830
313.1766 50788.418
316.9118 5908.166
317.2805 14084.841
317.2833 25603.689
317.2837 22866.578
318.0114 37929.855
318.9274 27883.295
318.9889 4496.716
321.2784 3893.165
326.1166 23745.851
327.2894 5318.226
328.8852 60934.030
329.1517 31985.486
331.0426 14883.231
332.0268 55126.078
332.2798 47364.519
333.2813 11423.807
337.1990 5330.360
339.2144 38450.804
339.2867 4065.709
340.9561 54101.844
340.9770 28172.160
345.0583 17945.025
345.0583 17877.900
347.1742 7359.428
347.2407 204792.999
353.2302 87864.153
353.2302 129691.696
363.0161 20453.771
363.0943 19481.234
363.2142 9238.244
363.2315 23323.527
363.2533 20039.607
363.2534 22068.718
364.8918 16857.488
364.9368 9527.642
366.9029 18174.233
373.2197 7730.009
385.1147 27907.070
385.1148 19383.655
393.2913 11860.719
396.9074 10793.823
400.8792 10750.249
402.8729 12411.966
407.2771 11270.566
442.8689 18101.972
442.8697 10671.199
447.3470 35927.046
449.2347 6959.247
456.9339 50402.820
461.1670 8636.998
461.1670 8151.706
473.2985 13782.291
490.9224 18510.760

> I think we need a bit more information and perhaps a
> small example data set to see what you want.
>
> I am not familiar with the term mass window. Is this a
> confidence interval around the mass value?

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Algorythmic Question on Array Filtration

2007-07-14 Thread jim holtman
This will determine where the overlaps are and delete them.  You can
add some more code to determine which ones you want to delete.

# x is the data frame with the columns Mass and Intensity
# add the 5 ppm window bounds to the data frame
x$lower <- x$Mass * (1 - 5e-6)
x$upper <- x$Mass * (1 + 5e-6)
# create a matrix for determining overlap by adding 1 at the lower value
# of a row and subtracting 1 at the upper value
overlap <- rbind(
    cbind(index=seq(nrow(x)), value=x$lower, oper=1),
    cbind(index=seq(nrow(x)), value=x$upper, oper=-1))
# sort in 'value' order to determine overlap
overlap[] <- overlap[order(overlap[,'value'], overlap[, 'oper']),]
# 'qsize' should be 0/1 if no overlap
overlap <- cbind(overlap, qsize=cumsum(overlap[, 'oper']))
# find the qsize > 1 indicating overlap and use the index of that one and
# the one after as the ones to delete.  You could add code to determine
# which one to keep
o.index <- which(overlap[,'qsize'] > 1)
# determine the indices to delete
i.delete <- unique(c(overlap[o.index,'index'], overlap[o.index+1, 'index']))
# create the new data frame with overlaps deleted; setdiff() is used
# because x[-i.delete,] would return no rows at all if nothing overlapped
new.x <- x[setdiff(seq(nrow(x)), i.delete),]



head(new.x, 10)
       Mass Intensity    lower    upper
1  304.9117 35595.780 304.9102 304.9132
2  305.1726 18760.413 305.1711 305.1741
3  311.0636 24047.307 311.0620 311.0652
4  312.9303 12886.216 312.9287 312.9319
9  316.9118  5908.166 316.9102 316.9134
13 318.0114 37929.855 318.0098 318.0130
14 318.9274 27883.295 318.9258 318.9290
15 318.9889  4496.716 318.9873 318.9905
16 321.2784  3893.165 321.2768 321.2800
17 326.1166 23745.851 326.1150 326.1182
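
One possible way to add the "which one to keep" step is sketched below. It
is not part of the code above and rests on assumptions: the helper name
keep_most_intense is invented, the window is symmetric at 5 ppm, and windows
that overlap in a chain are treated as one group with a single survivor.

```r
# Hypothetical helper: one survivor per run of chained overlapping windows.
keep_most_intense <- function(x, ppm = 5) {
  x <- x[order(x$Mass), ]
  n <- nrow(x)
  lower <- x$Mass * (1 - ppm / 1e6)
  upper <- x$Mass * (1 + ppm / 1e6)
  # With masses sorted, 'upper' is increasing as well, so a window can only
  # overlap an earlier one if it overlaps its immediate predecessor.
  # A new group starts wherever a window begins after the previous one ends.
  new_group <- c(TRUE, lower[-1] > upper[-n])
  group <- cumsum(new_group)
  # Within each group, pick the row with the highest intensity
  best <- tapply(seq_len(n), group, function(i) i[which.max(x$Intensity[i])])
  x[as.integer(best), ]
}

# Rows 10 to 13 of the example data
peaks <- data.frame(
  Mass      = c(317.2805, 317.2833, 317.2837, 318.0114),
  Intensity = c(14084.841, 25603.689, 22866.578, 37929.855)
)
print(keep_most_intense(peaks))
```

Grouping by chains means that three windows a, b, c are merged even when
only a/b and b/c overlap. That can give a different result from an
intensity-sorted pass, which would keep both a and c if b is the weakest.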



[R] Algorythmic Question on Array Filtration

2007-07-13 Thread Johannes Graumann
Dear All,

I have a data frame with the columns Mass and Intensity (this is mass
spectrometry stuff). Each of the mass values gives rise to a mass window of
5 ppm around the individual mass (from mass - mass/1E6*5 to mass +
mass/1E6*5). I need to filter the array such that in case these mass
windows overlap I retain the mass/intensity pair with the highest
intensity.
I apologize for this question, but I have no formal IT education and would
value any nudges toward favorable algorithmic solutions highly.
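
For illustration, the overlap condition can be reduced to a single
comparison. Assuming a symmetric window of mass * 5e-6 on either side, two
masses m1 <= m2 have overlapping windows when m2 * (1 - d) <= m1 * (1 + d),
which rearranges to m2 - m1 <= d * (m1 + m2) with d = 5e-6. The function
name masses_overlap below is hypothetical.

```r
# Hypothetical helper: TRUE when the ppm windows of two masses overlap.
# Derived from m2 * (1 - d) <= m1 * (1 + d), assuming a symmetric window.
masses_overlap <- function(m1, m2, ppm = 5) {
  d <- ppm / 1e6
  abs(m2 - m1) <= d * (m1 + m2)
}

# Masses 0.0003 apart, allowed distance about 0.0031: windows overlap
close_pair <- masses_overlap(312.9435, 312.9438)
# Masses 0.0132 apart, allowed distance about 0.0031: windows do not overlap
far_pair <- masses_overlap(312.9303, 312.9435)
print(c(close_pair, far_pair))
```

The function is vectorised, so it can be applied to neighbouring entries of
a mass-sorted column in one call.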

Thanks for any help,

Joh



Re: [R] Algorythmic Question on Array Filtration

2007-07-13 Thread Adaikalavan Ramasamy
Sorry, this sounds like a fairly basic question that can be resolved by
which() and possibly ifelse(). There are no details in your email.

I am afraid you have to learn the basics of R or ask the question with more
details (e.g. example data).

Or ask someone locally.

Regards, Adai





Re: [R] Algorythmic Question on Array Filtration

2007-07-13 Thread John Kane
I think we need a bit more information and perhaps a
small example data set to see what you want.

I am not familiar with the term mass window. Is this a
confidence interval around the mass value?

