Re: [R] a workaround for indexing a function?

2014-06-18 Thread Adam Kustka
I noticed a missing line in the code associated with my question.  I've
re-supplied the question now with that line.

Sorry for the trouble,
Adam

 



I have a dataset of peptide 15N/14N ratios from four experiments (where
cells are grown under different conditions and were fed either 15N or 14N
nitrate aka plant food if you are interested) and I want to normalize each
of these ratios to the median 15N/14N value from all 6000 peptides from
each experiment.  Then, since these ratios (from 2 or more peptides) are
used to calculate the protein 15N/14N abundance, I want to take the median
value of all peptides belonging to one protein to calculate the overall
protein 15N/14N abundance within that experiment.



I’ve managed to break the data set into small pieces (by protein and by
experiment), which will lead to about 500 observations.



A dummy – small – dataset could look like this (below), where V1 lists the
peptides that make up a protein, V2 is the protein ID, V3 is the metric of
interest (the 15N/14N ratio of each peptide), and V4 indicates which
experiment was run.



The stumbling block here is that I cannot index the median function.



Here is what I have done so far.



y-read.table(“nameoffile.txt”)



h-split(y,y$V4)



for(n in names(h))

write.table(h[[n]], 
file=paste(y[[n]][,”V4”][4],”_”,n,”.TXT”,sep=””))



Then I (not elegantly but OK with brute force method for four files) went
into each file and calculated the median





vls1-read.table(“VLS5584_VLS5584.TXT”)  # for example, but did this four
times for each file



newcol=vls1[,3]/median(vls1[,3]) #make a new column of 15N/14N data now
normalized to median value… here this is shown for one of four experiments.




normvls1=cbind(vls1,newcol) #added to existing data with global median
normalized numbers.



nvls1prot-split(normvls1,normvls1$V2) #splits up this experiment into many
files.. one for each protein.



for(n in
names(nvls1prot))write.table(nvls1prot[[n]],file=paste(nvls1prot[[n]][,V4][1],_,
n,.TXT,sep=))  #writes these files



Then I took this output in windows explorer and moved it to a new folder.
Set working directory to that folder.





And, lifted from a help page and tweaked to my liking..



# batch import text files (files must be in working directory);
'pattern' is case-sensitive

txtfiles = list.files(pattern=*.TXT)



for (i in 1:length(txtfiles)){

  tmp = read.table(txtfiles[i])

  xyz = data.frame(tmp)

  normHL = xyz[[5]]

median[i] = median(normHL)

}





But after that, I get the following error message.



Error in median[i] = median(normHL) :

  object of type 'closure' is not subsettable



Thanks,

Adam Kustka



SEAEFVNYQVR

B5YLU3

0.226

VLS5584

LLQTEPGTR

B5YLU3

0.199

VLS5584

SEAEFVNYQVR

B5YLU3

0.216

VLS5585

LLQTEPGTR

B5YLU3

0.183

VLS5585

SEAEFVNYQVR

B5YLU3

0.266

VLS5586

FAQMAVLGFIIPEK

B5YLU3

.2

VLS5586

LLQTEPGTR

B5YLU3

0.203

VLS5586

SEAEFVNYQVR

B5YLU3

0.516

VLS5587

FAQMAVLGFIIPEK

B5YLU3

0.764

VLS5587

IEGLGWRPK

B5YLU3

.2

VLS5587

LLQTEPGTR

B5YLU3

0.338

VLS5587

LLQTEPGTR

B5YLU3

.2

VLS5587

FSCAYLVDNPR

B5YLV4

0.131

VLS5584

ITCQDPSDDFPTYR

B5YLV4

0.122

VLS5584

SPLPEDLMPEFSFR

B5YLV4

0.105

VLS5584

ITCQDPSDDFPTYR

B5YLV4

.2

VLS5585

FLEVPMYLK

B5YLV4

.2

VLS5585

ITCQDPSDDFPTYR

B5YLV4

0.113

VLS5585

ITCQDPSDDFPTYR

B5YLV4

0.112

VLS5586

FSCAYLVDNPR

B5YLV4

0.128

VLS5586

SPLPEDLMPEFSFR

B5YLV4

0.126

VLS5586

FLEVPMYLKCLDR

B5YLV4

0.406

VLS5586

LYYNVETLK

B5YLV4

0.11

VLS5586

GEDGYLTTK

B5YLV4

0.144

VLS5586

FLEVPMYLK

B5YLV4

0.134

VLS5586

ITCQDPSDDFPTYR

B5YLV4

.2

VLS5586

FSCAYLVDNPR

B5YLV4

.2

VLS5586

FKDDFFLK

B5YLV4

0.121

VLS5586

ITCQDPSDDFPTYRDLEK

B5YLV4

0.138

VLS5586

FLEVPMYLK

B5YLV4

0.123

VLS5586

CFDDAFVR

B5YLV4

0.15

VLS5586

VFFTHGMYYTGGNLVAQVK

B5YLV4

0.25

VLS5586

FKDDFFLK

B5YLV4

0.141

VLS5586

CGFDEVHAR

B5YLV4

0.108

VLS5586

YPENYNIEELPAAGQSYIHPDTYVQR

B5YLV4

0.153

VLS5587

SPLPEDLMPEFSFR

B5YLV4

0.153

VLS5587

FSCAYLVDNPR

B5YLV4

0.137

VLS5587

LYYNVETLK

B5YLV4

0.157

VLS5587

CGFDEVHAR

B5YLV4

0.145

VLS5587

FLEVPMYLK

B5YLV4

0.142

VLS5587




On Wed, Jun 18, 2014 at 9:12 AM, Adam Kustka kus...@andromeda.rutgers.edu
wrote:



 I have a dataset of peptide 15N/14N ratios from four experiments (where
 cells are grown under different conditions and were fed either 15N or 14N
 nitrate aka plant food if you are interested) and I want to normalize each
 of these ratios to the median 15N/14N value from all 6000 peptides from
 each experiment.  Then, since these ratios (from 2 or more peptides) are
 used to calculate the protein 15N/14N abundance, I want to take the median
 value of all peptides belonging to one protein to calculate the overall
 protein 15N/14N abundance within that experiment.



 I’ve managed to break the data set into small pieces (by protein and by
 experiment), which will lead to about 500 observations.



 A dummy â

[R] a workaround for indexing a function?

2014-06-18 Thread Adam Kustka
I have a dataset of peptide 15N/14N ratios from four experiments (where
cells are grown under different conditions and were fed either 15N or 14N
nitrate aka plant food if you are interested) and I want to normalize each
of these ratios to the median 15N/14N value from all 6000 peptides from
each experiment.  Then, since these ratios (from 2 or more peptides) are
used to calculate the protein 15N/14N abundance, I want to take the median
value of all peptides belonging to one protein to calculate the overall
protein 15N/14N abundance within that experiment.



I’ve managed to break the data set into small pieces (by protein and by
experiment), which will lead to about 500 observations.



A dummy – small – dataset could look like this (below), where V1 lists the
peptides that make up a protein, V2 is the protein ID, V3 is the metric of
interest (the 15N/14N ratio of each peptide), and V4 indicates which
experiment was run.



The stumbling block here is that I cannot index the median function.



Here is what I have done so far.



y-read.table(“nameoffile.txt”)



h-split(y,y$V4)



for(n in names(h))

write.table(h[[n]], 
file=paste(y[[n]][,”V4”][4],”_”,n,”.TXT”,sep=””))



Then I (not elegantly but OK with brute force method for four files) went
into each file and calculated the median





vls1-read.table(“VLS5584_VLS5584.TXT”)  # for example, but did this four
times for each file



newcol=vls1[,3]/median(vls1[,3])



normvls1=cbind(vls1,newcol)



nvls1prot-split(normvls1,normvls1$V2)



for(n in
names(nvls1prot))write.table(nvls1prot[[n]],file=paste(nvls1prot[[n]][,V4][1],_,
n,.TXT,sep=))



Then I took this output in windows explorer and moved it to a new folder.
Set working directory to that folder.



And, lifted from a help page and tweaked to my liking..

for (i in 1:length(txtfiles)){

  tmp = read.table(txtfiles[i])

  xyz = data.frame(tmp)

  normHL = xyz[[5]]

median[i] = median(normHL)

}





But after that, I get the following error message.



Error in median[i] = median(normHL) :

  object of type 'closure' is not subsettable



Thanks,

Adam Kustka



SEAEFVNYQVR

B5YLU3

0.226

VLS5584

LLQTEPGTR

B5YLU3

0.199

VLS5584

SEAEFVNYQVR

B5YLU3

0.216

VLS5585

LLQTEPGTR

B5YLU3

0.183

VLS5585

SEAEFVNYQVR

B5YLU3

0.266

VLS5586

FAQMAVLGFIIPEK

B5YLU3

.2

VLS5586

LLQTEPGTR

B5YLU3

0.203

VLS5586

SEAEFVNYQVR

B5YLU3

0.516

VLS5587

FAQMAVLGFIIPEK

B5YLU3

0.764

VLS5587

IEGLGWRPK

B5YLU3

.2

VLS5587

LLQTEPGTR

B5YLU3

0.338

VLS5587

LLQTEPGTR

B5YLU3

.2

VLS5587

FSCAYLVDNPR

B5YLV4

0.131

VLS5584

ITCQDPSDDFPTYR

B5YLV4

0.122

VLS5584

SPLPEDLMPEFSFR

B5YLV4

0.105

VLS5584

ITCQDPSDDFPTYR

B5YLV4

.2

VLS5585

FLEVPMYLK

B5YLV4

.2

VLS5585

ITCQDPSDDFPTYR

B5YLV4

0.113

VLS5585

ITCQDPSDDFPTYR

B5YLV4

0.112

VLS5586

FSCAYLVDNPR

B5YLV4

0.128

VLS5586

SPLPEDLMPEFSFR

B5YLV4

0.126

VLS5586

FLEVPMYLKCLDR

B5YLV4

0.406

VLS5586

LYYNVETLK

B5YLV4

0.11

VLS5586

GEDGYLTTK

B5YLV4

0.144

VLS5586

FLEVPMYLK

B5YLV4

0.134

VLS5586

ITCQDPSDDFPTYR

B5YLV4

.2

VLS5586

FSCAYLVDNPR

B5YLV4

.2

VLS5586

FKDDFFLK

B5YLV4

0.121

VLS5586

ITCQDPSDDFPTYRDLEK

B5YLV4

0.138

VLS5586

FLEVPMYLK

B5YLV4

0.123

VLS5586

CFDDAFVR

B5YLV4

0.15

VLS5586

VFFTHGMYYTGGNLVAQVK

B5YLV4

0.25

VLS5586

FKDDFFLK

B5YLV4

0.141

VLS5586

CGFDEVHAR

B5YLV4

0.108

VLS5586

YPENYNIEELPAAGQSYIHPDTYVQR

B5YLV4

0.153

VLS5587

SPLPEDLMPEFSFR

B5YLV4

0.153

VLS5587

FSCAYLVDNPR

B5YLV4

0.137

VLS5587

LYYNVETLK

B5YLV4

0.157

VLS5587

CGFDEVHAR

B5YLV4

0.145

VLS5587

FLEVPMYLK

B5YLV4

0.142

VLS5587

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] adonis output not significant even with dummy input?

2014-06-01 Thread Adam Kustka
I used adonis to test tested for differences in the relative species (OTU)
abundances from incubation experiments with a common starting water source
and duplicates for each of three treatments (a one way test with three
levels in the independent variable).  For full disclosure (but  it may not
be relevant) I came up with a metric to reflect change in rel species
abundance from t0 to t9 days, weighted to not overly inflate significance
of rare species. The Bray Curtis similarity table looked compelling (where
A and B represent replicates).  I was surprised to see p ~0.06.  So, I made
a dummy file with a blatant differences (dummy file below) and only
obtained a p value of 0.05. I’m clearly missing something here.  BTW I also
noticed if you purposefully mismatch your env file you can still get a
“result”.  I'm so grateful for your help!!!





B-C output from real data (three treatments each replicated twice)



St24FeA

St24FeB

St24NoFeA

St24NoFeB

St24EntA

St24FeB

0.081774

St24NoFeA

0.322509

0.282081

St24NoFeB

0.343357

0.301258

0.064281

St24EntA

0.268338

0.221075

0.098628

0.161846

St24EntB

0.203852

0.194518

0.166727

0.221682

0.096111



Dummy file used as

OTU1

OTU2

OTU3

OTU5

St24FeA

100

20

100

20

St24FeB

100

25

100

25

St24NoFeA

5

80

5

80

St24NoFeB

5

80

5

80

St24EntA

7

100

7

100

St24EntB

7

100

7

100



BC of dummy file (BCdum) is even more compelling than the real data (no
transforms done for this).



Output from adonis using BCdum as input.



Call:

adonis(formula = Bcdum ~ treatment data=st24.env, permutation=2000)

Terms

added

sequentially

(first

to

last)

Df

SumsOfSqs

MeanSqs

F.Model

R2

Pr(F)

treatment

2

0.73764

0.36882

5313.2

0.99972

0.05147

Residuals

3

0.00021

0.7

0.00028

Total

5

0.73785

1


The entire script is here.  Note the “abundance” data is not transformed in
R.  Rather, I calculate a metric of change in percent abundance



t= 9 d(counts(i)/counts(total) /

t = 0 d(counts(i)/counts(total)



which is then normalized so small abs changes in rare species don’t have an
undue influence.





St24Y-read.table(st24relchangeforR.txt)

BC24-vegdist(St24Y,method=”bray”,binary=FALSE, diag=FALSE,
upper=FALSE,na.rm = FALSE)

st24.env-read.table(st24envfileforR.txt)

adonis(BC24~ treatment,st24.env,perm=2000)

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.