[R] question on aggregate
Hi,

I am trying to sum a column across a list of 3 data frames. For example:

    ID Traversed    ID Traversed    ID Traversed
     1         5     1         7     1         8
     2         8     2        11     2         7
     3        11     3        22     3        16

I want to sum the Traversed variable across all three instances so that I get a total value of Traversed for each ID. The end output should look like this:

    ID Traversed
     1        20
     2        26
     3        49

Basically, just sum up the columns by ID. I am using the aggregate function, but it is not working. This is what I do:

    agW <- aggregate(rres, rres[c('ID')], sum)   # rres is the list

and this is the error message I get:

    Error in FUN(X[[1L]], ...) : argument "INDEX" is missing, with no default

Can anyone help? Thanks in advance.

Mark

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
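One possible approach, sketched here in base R, is to stack the data frames into a single one and aggregate that. It assumes rres is a list of data frames that each have ID and Traversed columns; the sample values are the ones from the post, and combined is just an illustrative name.

```r
# Sample data taken from the post: three data frames with the same columns.
rres <- list(
  data.frame(ID = 1:3, Traversed = c(5, 8, 11)),
  data.frame(ID = 1:3, Traversed = c(7, 11, 22)),
  data.frame(ID = 1:3, Traversed = c(8, 7, 16))
)

# aggregate() expects a data frame, not a list of them, so stack the rows first.
combined <- do.call(rbind, rres)

# Sum Traversed within each ID.
agW <- aggregate(Traversed ~ ID, data = combined, FUN = sum)
print(agW)
```

With the sample values this gives totals of 20, 26 and 49 for IDs 1, 2 and 3, matching the desired output.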
[R] Help with functions within a list
I will try to be clearer about what I want. I have a list, say res, with 1000 elements. For example:

    res[[1]]           res[[2]]          ... until res[[1000]]

    Id   X Tick        Id   X Tick
     1 2.2    1         1 1.4    1
     2 3.1    1         2 3.2    1
     1 1.2    2         1 1.1    2
     2 2.2    2         2 3.0    2

Now, say I want the mean of X for Id=1 or Id=2 over all instances where Tick=1 or Tick=2. In the example, the result for X when Id=1 and Tick=1 is 1.8 (i.e., the average of 2.2 and 1.4 in res[[1]] and res[[2]]). In reality I would calculate the mean over all 1000 instances of Tick=1 and Id=1, and I would do this for every Id and Tick value.

So this is what I would like to do for all 1000 elements in my list: take the Id and Tick values of each element and find the mean of the X value over all 1000 instances that occur for the given Id and Tick values. It would be nice to get the result in a form such as:

    result <- (some function that does what I want for all Id and Tick values in the 1000-element list)
    result

    Id meanX Tick
     1   1.8    1
     2  3.15    1
     1  1.15    2
     2   2.6    2

This is what I tried in order to get the mean for all Tick values less than 601 and Ids greater than 0:

    weightX <- sapply(res, function(.df) {mean(.df$X[.df$Id > 0 & .df$Tick < 601])})

This does not work, as it does not seem to provide the mean across all the elements and Ids included in the conditional. I think it just overwrites the previous answer, so my final result is not as large as I would expect.

Thanks again in advance.

Mark

-----Original Message-----
From: baptiste auguie [mailto:ba...@exeter.ac.uk]
Sent: Fri 3/20/2009 4:32 AM
To: Altaweel, Mark R.
Cc: r-help@r-project.org
Subject: Re: [R] functions within a list

Hi,

you could have a look at the doBy package, which makes these operations easier. Hadley's plyr package is another option.

baptiste

_
Baptiste Auguié
School of Physics
University of Exeter
Stocker Road, Exeter, Devon, EX4 4QL, UK
Phone: +44 1392 264187
http://newton.ex.ac.uk/research/emag
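A minimal base-R sketch of the grouped mean described above, as an alternative to the doBy and plyr suggestions. It assumes every element of res is a data frame with Id, X and Tick columns; the two sample data frames use the values from the post, and all_rows is an illustrative name. Note that sapply() over res returns one value per list element, which may be why the attempt above did not give one mean per Id and Tick.

```r
# Two sample elements with the values shown in the post.
res <- list(
  data.frame(Id = c(1, 2, 1, 2), X = c(2.2, 3.1, 1.2, 2.2), Tick = c(1, 1, 2, 2)),
  data.frame(Id = c(1, 2, 1, 2), X = c(1.4, 3.2, 1.1, 3.0), Tick = c(1, 1, 2, 2))
)

# Stack all list elements into one data frame so groups span the whole list.
all_rows <- do.call(rbind, res)

# Mean of X for every Id/Tick combination.
result <- aggregate(X ~ Id + Tick, data = all_rows, FUN = mean)
names(result)[names(result) == "X"] <- "meanX"
print(result)
```

For the sample data the meanX values are 1.8, 3.15, 1.15 and 2.6, in the same row order as the desired result.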
[R] plotting two variables with a third used for color
I have two columns of data that I can plot simply with:

    plot(wV[0:15,3], wY[0:15,3])

This produces my desired plot. Now, say I have a third variable that I would like to use to set different colors in the plot. In this case, say I want values greater than 0 to be blue and values less than 0 to be red.

Basically, my question is: how can one plot something like the call shown above, but with color indicating a third variable? Something to the effect of:

    if (weightComply[0:15,3] > 0)                   # if the values from 0 to 15 in column 3 are greater than 0
        plot(wV[0:15,3], wY[0:15,3], col="blue")    # then plot the dots as blue
    else
        plot(wV[0:15,3], wY[0:15,3], col="red")     # otherwise plot the dots red

I know this syntax is wrong, but I think it shows what I generally want to do.

Mark
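One way to do this is to build a vector of colors, one per point, and pass it to col. The sketch below uses short made-up vectors in place of wV[0:15,3], wY[0:15,3] and weightComply[0:15,3]; point_cols is an illustrative name.

```r
# Hypothetical stand-ins for the three columns in the post.
wV <- c(1.0, 2.0, 3.0, 4.0, 5.0)
wY <- c(2.1, 3.9, 6.2, 8.1, 9.8)
weightComply <- c(0.5, -1.2, 0.3, -0.7, 2.0)

# ifelse() is vectorized, so it yields one color per point.
point_cols <- ifelse(weightComply > 0, "blue", "red")

# col accepts a vector; each point gets the matching color.
plot(wV, wY, col = point_cols, pch = 19)
```

With this rule a value of exactly 0 is drawn in red; a third color would need a nested ifelse() or similar.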
[R] functions within a list
Hi,

I am trying to apply various functions to a list with a number of elements. For example, I would like to take the mean of different variable values over the entire list. Say I have a list with 1000 elements and variables called Id and Tick. What I would like to do is take the mean of a variable called X for each Tick in the data elements. There can be, say, 1 to 600 Tick values per element in the list, so I would like to find the 600 mean values, one for each Tick value, across the 1000 elements.

I tried a simple attempt below, but I am sure it is way off, as it didn't produce what I expected:

    # res = res[[1..1000]]
    weightX <- sapply(res, function(.df) {mean(.df$X[.df$Id > 0 & .df$Tick < 601])})

Basically, I was trying to get the mean for all Tick values up to 600 that have an Id variable greater than 0. Since there are 600 ticks, I would like to return a result with 600 mean values, one for each of the 600 ticks, that takes into account all 1000 occurrences of each tick, starting from 1.

I hope this is clear. Thanks in advance.

Mark
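A small sketch of one mean per Tick with the Id and Tick filter applied, using tapply() on the stacked rows. The two data frames are invented for illustration; all_rows, keep and meanX_by_tick are illustrative names.

```r
# Hypothetical sample data: each element has Id, Tick and X columns.
res <- list(
  data.frame(Id = c(1, 0, 2), Tick = c(1, 1, 2), X = c(2.0, 9.0, 4.0)),
  data.frame(Id = c(3, 1, 0), Tick = c(1, 2, 2), X = c(4.0, 6.0, 9.0))
)

# Stack the list so each Tick group draws on every element.
all_rows <- do.call(rbind, res)

# Element-wise condition: Id greater than 0 and Tick less than 601.
keep <- all_rows$Id > 0 & all_rows$Tick < 601

# One mean of X per Tick value, over the rows that pass the filter.
meanX_by_tick <- tapply(all_rows$X[keep], all_rows$Tick[keep], mean)
print(meanX_by_tick)
```

The result is a named vector whose names are the Tick values, so with 600 ticks it would have 600 entries.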
[R] Kruskal.test() on lists
Hi,

I am trying to do a Kruskal-Wallis test on two lists, fVisited and cSN:

    fVisited[[1]]
    [1] 0.17097623 0.30376141 0.17577266 0.14951855 0.03959753 0.08096217 0.05744888 0.02196257

    cSN[[1]]
    [1] 0.08557303 0.36477791 0.19601252 0.12981040 0.05351320 0.10385542 0.03539577 0.03106175

If I want to do a test on just one of the entries, this is simple enough:

    kruskal.test(fVisited[[1]], cSN[[1]])

            Kruskal-Wallis rank sum test

    data:  fVisited[[1]] and cSN[[1]]
    Kruskal-Wallis chi-squared = 7, df = 7, p-value = 0.4289

However, if I try to do it over the entire list I get a problem. I wanted to do a test comparing each pair of distributions, so I thought of something such as:

    kT <- sapply(fVisited, function(.df){sapply(cSN, function(.vecs){kruskal.test(.df, .vecs)})})

But that produces this:

    Error in kruskal.test.default(.df, .vecs) :
      'x' and 'g' must have the same length

However, the values do have the same length. Can anyone see what I am doing wrong here?

Thanks again for everyone's help.
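A point that may be relevant here: in kruskal.test(x, g) the second argument is a grouping vector, so passing two samples as x and g treats the second sample as group labels (which would explain df = 7 above). To compare two samples, kruskal.test() also accepts a list of numeric vectors. The sketch below assumes that is the intended comparison; the first element of each list is taken from the post, the second elements are hypothetical values added for illustration.

```r
fVisited <- list(
  c(0.17097623, 0.30376141, 0.17577266, 0.14951855,
    0.03959753, 0.08096217, 0.05744888, 0.02196257),
  c(0.25, 0.20, 0.15, 0.12, 0.10, 0.08, 0.06, 0.04)   # hypothetical
)
cSN <- list(
  c(0.08557303, 0.36477791, 0.19601252, 0.12981040,
    0.05351320, 0.10385542, 0.03539577, 0.03106175),
  c(0.30, 0.22, 0.14, 0.11, 0.09, 0.07, 0.04, 0.03)   # hypothetical
)

# Compare element k of fVisited with element k of cSN.
p_matched <- mapply(function(x, y) kruskal.test(list(x, y))$p.value,
                    fVisited, cSN)

# Compare every element of fVisited with every element of cSN.
# Rows index cSN, columns index fVisited.
p_all <- sapply(fVisited, function(x) {
  sapply(cSN, function(y) kruskal.test(list(x, y))$p.value)
})

print(p_matched)
print(p_all)
```

Extracting $p.value keeps the result a plain numeric vector or matrix; dropping it would return the full test objects instead.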
[R] Line of best
Hi,

I have a scatter plot, and the equation that best fits it is 1/x^.6. I know that for ordinary linear regression lines you can use the abline() command; however, since my best-fit line is not linear, how can I draw it on the scatter plot in a similar fashion to abline()?

Thanks for everyone's help again. I appreciate this board's advice.

Mark
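One option in base graphics is curve() with add = TRUE, which overlays a function of x on the current plot much as abline() overlays a straight line. The sketch below uses simulated points, since the original data are not shown; fit_fun is an illustrative name for the 1/x^.6 relationship.

```r
# The fitted relationship from the post, wrapped as a function of x.
fit_fun <- function(x) 1 / x^0.6

# Simulated scatter data around that curve (not the poster's data).
set.seed(1)
x <- seq(0.5, 10, length.out = 50)
y <- fit_fun(x) + rnorm(length(x), sd = 0.05)

plot(x, y)

# Draw the nonlinear curve on top of the existing scatter plot.
curve(fit_fun, from = min(x), to = max(x), add = TRUE, col = "red")
```

An alternative with the same effect is lines(x, fit_fun(x)) on a sorted grid of x values.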
[R] conditional with and operators
Hi,

I have a problem in which I am parsing data from a list. I am simply trying to return data for which several conditions are true. Here is my syntax:

    d <- sapply(res, function(.df){(.df$TimesVisited[.df$Tick > 912 && .df$Id > 0])})
    # res is the list, and I am trying to return a result that meets two conditions
    # (the variable Tick should be greater than 912 and Id should be greater than 0)

This returns an array of 10 with integer values of 0, which is the incorrect result. However, if I use the same syntax but remove the && statement (i.e. the second conditional), then the result makes sense: all values for which Tick is greater than 912.

Can someone let me know how I can set this up so that I can have two or more conditionals in a function that is looking at an array?

Thanks in advance.

Mark
Re: [R] conditional with and operators
Hi,

OK, that worked. I'm from a Java background, so I guess I'm used to && rather than &.

Thanks!

Mark

-----Original Message-----
From: Steven McKinney [mailto:[EMAIL PROTECTED]
Sent: Tue 8/19/2008 3:33 PM
To: Altaweel, Mark R.; r-help@r-project.org
Subject: RE: [R] conditional with and operators

Did you try it with the vector '&' operator?

    d <- sapply(res, function(.df){(.df$TimesVisited[.df$Tick > 912 & .df$Id > 0])})

(The '&&' operator is designed for use in e.g. if() clauses where you want a scalar logical answer)

HTH

Steve McKinney
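A small sketch of the element-wise & in a subsetting condition, on made-up data with the column names from the thread. It uses lapply() so that elements with different numbers of matches stay separate; the data values are hypothetical.

```r
# Hypothetical list of data frames with the columns named in the thread.
res <- list(
  data.frame(Tick = c(900, 913, 950), Id = c(1, 0, 2), TimesVisited = c(3, 4, 5)),
  data.frame(Tick = c(920, 930, 100), Id = c(1, 2, 3), TimesVisited = c(6, 7, 8))
)

# & compares element by element, giving one TRUE/FALSE per row,
# which is what a subsetting index needs.
d <- lapply(res, function(.df) {
  .df$TimesVisited[.df$Tick > 912 & .df$Id > 0]
})
print(d)
```

As far as I know, from R 4.3.0 onward using && on operands longer than one is an error rather than a silent use of the first element, so & is the only form that works in this position.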
[R] matrix question
Hi,

I have a vector and a list with data I would like to multiply together. For instance, I have a vector s:

    [[1]]
    [1] 44308

    [[2]]
    [1] 4371

Also, I have a list d:

    [[1]]
     [1] 1201 6170 2036 2927 1625 1391 2074 1453 3172 3027 4691 3719 1185  320 2071 1027 1046 1186 1403  580 1382 4408  174

    [[2]]
     [1] 6521  688 2678 3409 3033 1608 3618 1461 1836 2104  879 1095 2630 1591 2986  703 2548  913 1426  753  256  869  106

I want to multiply them together and put the results in a matrix. This is my syntax:

    for(i in 1:length(s))
        for(j in 1:length(d))
            m <- d[[j]][j]/s[[i]]   # m is the matrix of dimensions set to X (e.g. 10X10)

However, it seems I only get one result when I get m, which is the last value of d/j, which is m[1]=0.04 in this case. I am sure I'm doing something wrong here, but can't quite find the solution.

Thanks.

Mark
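The loop above assigns to m itself on every pass, so only the last value survives. A sketch of two possible fixes follows. The first keeps the poster's expression and stores each result in its own cell; the second assumes the goal is to divide every element of d[[k]] by s[[k]]. Which reading is intended is not clear from the post. The data are the first three values of each vector shown above, and ratios is an illustrative name.

```r
s <- list(44308, 4371)
d <- list(c(1201, 6170, 2036), c(6521, 688, 2678))  # shortened from the post

# Fix 1: preallocate the matrix and index into it, so nothing is overwritten.
m <- matrix(NA_real_, nrow = length(s), ncol = length(d))
for (i in seq_along(s)) {
  for (j in seq_along(d)) {
    m[i, j] <- d[[j]][j] / s[[i]]
  }
}

# Fix 2 (assumed alternative goal): divide each vector in d by the matching
# total in s; mapply() returns one column per list element, so transpose
# to get one row per element.
ratios <- t(mapply(function(v, total) v / total, d, s))

print(m)
print(ratios)
```

seq_along() is used instead of 1:length() so the loops do nothing on an empty list.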
[R] Conditional statement used in sapply()
Hi,

I have data stored in a list that I would like to aggregate and run some basic stats on. However, I would like to apply conditional statements so that not all the data are used. Basically, I want to take a specific variable and apply a basic function (such as a mean), but only to the data in each element that match the condition. The code I used is below:

    result <- sapply(res, function(.df) {        # res is the list containing file data
        if(.df$Volume > 0) mean(.df$Volume)      # only have the mean function calculate on values greater than 0
    })

I did get a numeric output; however, when I checked the output value, the conditional had been ignored (i.e. it did not change the calculation). I also got these warnings:

    Warning messages:
    1: In if (.df$Volume > 0) mean(.df$Volume) :
      the condition has length > 1 and only the first element will be used
    2: In if (.df$Volume > 0) mean(.df$Volume) :
      the condition has length > 1 and only the first element will be used

Please let me know what I am doing wrong and how I can apply a conditional statement inside the sapply function.

Thanks

Mark
Re: [R] Conditional statement used in sapply()
Hi,

Yes, that's it. I got the correct results. Thanks everyone for their help once again. This is a great help board.

Mark

-----Original Message-----
From: Steven McKinney [mailto:[EMAIL PROTECTED]
Sent: Wed 8/13/2008 5:29 PM
To: Altaweel, Mark R.; r-help@r-project.org
Subject: RE: [R] Conditional statement used in sapply()

You probably want something such as

    result <- sapply(res, function(.df) { mean(.df$Volume[.df$Volume > 0]) })

HTH

Steve McKinney
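To illustrate the difference: if() looks at a single TRUE/FALSE, whereas a logical index inside [ ] filters element by element. A short sketch on invented data, assuming each list element has a Volume column:

```r
# Hypothetical list of data frames, each with a Volume column.
res <- list(
  data.frame(Volume = c(0, 2, 4)),
  data.frame(Volume = c(-1, 0, 3, 5))
)

# Subset to the positive values first, then take the mean of what is left.
result <- sapply(res, function(.df) mean(.df$Volume[.df$Volume > 0]))
print(result)
```

For the sample data the means are taken over (2, 4) and (3, 5), so the zero and negative volumes no longer pull the result down.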
[R] Parsing array data
Hi,

I read in csv files with the following code:

    res <- vector(mode="list", length=3)
    for(i in 1:length(res))
        res[[i]] <- read.csv(file=paste("/Users/markaltaweel/Desktop/Output/HydroDataOutput", i, ".csv", sep=""), header=T, sep=",")

This loads the data into an array of length 3, with the res array containing my data from the csv files. I would like to parse the data in the res array so that I can access specific data columns. For instance, I have fields called 'Discharge', 'Volume', and 'Id'. I would like to aggregate these fields over all the loaded files in the array so that I can perform some basic comparisons of distributions, etc.

Does anyone have example code that would let me parse the array so that I can extract the columns by name and perform some data aggregation and basic stats? Currently, even if I run the following code I get a NULL response:

    a=res[1]
    a[['Discharge']]   # this returns [[1]] NULL

Please let me know if there is a solution to this or what I am doing wrong.

Thank you in advance.

Mark
Re: [R] Parsing array data
Hi,

Great, that does solve most of my problem. I loaded the files and did this:

    a <- agg[[1]]
    meanD = mean(a[['Discharge']])   # returning the mean of the variable Discharge for the first data element in agg

Now, if I wanted to get the mean of, or just aggregate, the variable Discharge over the entire array (which is of length 3), is there an easy way to do that? I think the function aggregate(x, ...) might have something to do with it, as I saw examples there. However, I have not been able to get the data to aggregate.

Thanks again.

Mark

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Tue 8/12/2008 1:05 PM
To: Altaweel, Mark R.
Subject: RE: [R] Parsing array data

Try a <- res[[1]] instead of a <- res[1]. You need to access the data frame and, to do that, you need to access WHAT'S IN THE FIRST COMPONENT OF THE LIST, NOT THE FIRST LIST COMPONENT ITSELF. That's why you need the double brackets.

It takes time to get one's head around this list concept and the different ways of bracketing but, when you do, it's a powerful mechanism. I also recommend staying on this list and watching and trying solutions if you want to get into R more deeply. I'm trying, and I find that's the best way to reach that goal. Good luck.

mark
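The single versus double bracket distinction can be seen on a small example. The data frames below are invented stand-ins for the loaded csv files, with the column names from the thread; wrapped and first_df are illustrative names.

```r
# Hypothetical stand-ins for the data frames read from the csv files.
res <- list(
  data.frame(Id = 1:3, Discharge = c(10, 20, 30), Volume = c(1, 2, 3)),
  data.frame(Id = 1:3, Discharge = c(40, 50, 60), Volume = c(4, 5, 6))
)

wrapped  <- res[1]    # single bracket: still a list, of length 1
first_df <- res[[1]]  # double bracket: the data frame stored inside

print(class(wrapped))    # a list
print(class(first_df))   # a data frame

# The one-element list has no component named "Discharge", hence NULL.
print(wrapped[["Discharge"]])

# The data frame does have that column.
print(mean(first_df[["Discharge"]]))
```

Here first_df$Discharge would work equally well; [[ ]] with a string is handy when the column name is held in a variable.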
Re: [R] Parsing array data
Great, that works. Thank you.

Mark

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Tue 8/12/2008 2:49 PM
To: Altaweel, Mark R.
Cc: [EMAIL PROTECTED]; r-help@r-project.org
Subject: RE: [R] Parsing array data

Hi: you can do

    result <- lapply(agg, function(.df) { mean(.df$Discharge) })

This will give you the mean of the Discharge column in the various data frames. aggregate is used more when you want to do calculations on various subsets defined by another variable. If you want to do that, you can. Say there was an ID variable in addition to Discharge. Then you can do

    result <- lapply(agg, function(.df) { aggregate(.df$Discharge, by=list(.df$ID), mean, na.rm=TRUE) })
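A short sketch that puts the per-file mean next to a mean over the entire list, which was the original question. The data are invented, with ID and Discharge columns as in the reply; per_file_mean, all_rows, overall_mean and by_id are illustrative names.

```r
# Hypothetical data frames standing in for the loaded files.
agg <- list(
  data.frame(ID = c(1, 1, 2), Discharge = c(10, 20, 30)),
  data.frame(ID = c(1, 2, 2), Discharge = c(40, NA, 60))
)

# One mean per list element; sapply() simplifies the result to a vector.
per_file_mean <- sapply(agg, function(.df) mean(.df$Discharge, na.rm = TRUE))

# Stack everything to get statistics over the whole list at once.
all_rows <- do.call(rbind, agg)
overall_mean <- mean(all_rows$Discharge, na.rm = TRUE)

# Mean Discharge per ID over all files; the formula interface drops
# rows with NA by default.
by_id <- aggregate(Discharge ~ ID, data = all_rows, FUN = mean)

print(per_file_mean)
print(overall_mean)
print(by_id)
```

Stacking with do.call(rbind, ...) assumes every file has the same columns; if they differ, the columns of interest would need to be selected first.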