Is this something like what you were asking for? The output of a 'split' will be a list of the dataframe subsets for the categories you have specified.
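For the Category/SubCategory case, here is a minimal sketch of the same idea applied to a pair of columns. The data frame 'a' and its values are invented for illustration (only the column names come from your file), so treat this as an example under those assumptions, not as output from your data:

```r
# Hypothetical data: the column names follow the thread,
# the values are made up for illustration.
a <- data.frame(
  Category    = c("A", "A", "B", "B", "A"),
  SubCategory = c("a", "b", "a", "b", "a"),
  Quantity    = c(10, 20, 30, 40, 50)
)

# Split on the (Category, SubCategory) pair.
# drop = TRUE omits combinations that never occur in the data.
by_pair <- split(a, list(a$Category, a$SubCategory), drop = TRUE)

# Elements are named "<Category>.<SubCategory>", so one pair
# can be looked up by name without picking up "B.a".
by_pair[["A.a"]]

# The same grouping, but returning row indices instead of the data.
idx <- split(seq_len(nrow(a)), list(a$Category, a$SubCategory), drop = TRUE)
idx[["A.a"]]
```

Note that by_pair[1] is still a list (of length one), while by_pair[[1]] or by_pair[["A.a"]] gives the data frame itself; that difference may be why c[2] did not look right to you.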

> x <- data.frame(g1=sample(LETTERS[1:2],30,TRUE),
+     g2=sample(letters[1:2], 30, TRUE),
+     g3=1:30)
> y <- split(x, list(x$g1, x$g2))
> str(y)
List of 4
 $ A.a:'data.frame': 7 obs. of 3 variables:
  ..$ g1: Factor w/ 2 levels "A","B": 1 1 1 1 1 1 1
  ..$ g2: Factor w/ 2 levels "a","b": 1 1 1 1 1 1 1
  ..$ g3: int [1:7] 3 4 6 8 9 13 24
 $ B.a:'data.frame': 7 obs. of 3 variables:
  ..$ g1: Factor w/ 2 levels "A","B": 2 2 2 2 2 2 2
  ..$ g2: Factor w/ 2 levels "a","b": 1 1 1 1 1 1 1
  ..$ g3: int [1:7] 10 11 16 17 18 20 25
 $ A.b:'data.frame': 6 obs. of 3 variables:
  ..$ g1: Factor w/ 2 levels "A","B": 1 1 1 1 1 1
  ..$ g2: Factor w/ 2 levels "a","b": 2 2 2 2 2 2
  ..$ g3: int [1:6] 2 12 23 26 27 29
 $ B.b:'data.frame': 10 obs. of 3 variables:
  ..$ g1: Factor w/ 2 levels "A","B": 2 2 2 2 2 2 2 2 2 2
  ..$ g2: Factor w/ 2 levels "a","b": 2 2 2 2 2 2 2 2 2 2
  ..$ g3: int [1:10] 1 5 7 14 15 19 21 22 28 30
> y
$A.a
   g1 g2 g3
3   A  a  3
4   A  a  4
6   A  a  6
8   A  a  8
9   A  a  9
13  A  a 13
24  A  a 24

$B.a
   g1 g2 g3
10  B  a 10
11  B  a 11
16  B  a 16
17  B  a 17
18  B  a 18
20  B  a 20
25  B  a 25

$A.b
   g1 g2 g3
2   A  b  2
12  A  b 12
23  A  b 23
26  A  b 26
27  A  b 27
29  A  b 29

$B.b
   g1 g2 g3
1   B  b  1
5   B  b  5
7   B  b  7
14  B  b 14
15  B  b 15
19  B  b 19
21  B  b 21
22  B  b 22
28  B  b 28
30  B  b 30

> y[[2]]
   g1 g2 g3
10  B  a 10
11  B  a 11
16  B  a 16
17  B  a 17
18  B  a 18
20  B  a 20
25  B  a 25
>


On Sat, Jul 12, 2008 at 8:51 PM, <[EMAIL PROTECTED]> wrote:
> OK. Now I know that I am dealing with a data frame. One last question on this
> topic. a <- read.csv() gives me a dataframe. If I have 'c <- split(x,
> x$Category), then what is returned by split in this case? c[1] seems to be
> OK but c[2] is not right in my mind. If I run ci <- split(nrow(a),
> a$Category). And then ci[1] seems to be the rows associated with the first
> category, c[2] is the indices/rows associated with the second category, etc.
> But this seems different than c[1], c[2], etc.
>
> Using the techniques below I can get the information on the categories.
Now
> as an extra level of complexity there are SubCategories within each Category.
> Assume that the SubCategory names are not unique within the dataset so if I
> want the SubCategory data I need to retrieve the indices (or data) for the
> Category and SubCategory pair. In other words if I have a Category that
> ranges from 'A' to 'Z', it is possible that I might have a subcategory A a, A
> b (where a and b are the sub category names). I also might have B a, B b. I
> want all of the sub categories A a. NOT the subcategories a (because that
> might include B a which would be different). I am guessing that this will
> take more than a simple 'split'.
>
> Thank you.
>
> Kevin
>
> ---- Duncan Murdoch <[EMAIL PROTECTED]> wrote:
>> On 12/07/2008 3:59 PM, [EMAIL PROTECTED] wrote:
>> > I am sorry but if read.csv returns a dataframe and a dataframe is like a
>> > matrix and I have a set of input like below and a[1,] gives me the first
>> > row, what is the second index? From what I read and your input I am
>> > guessing that it is the column number. So a[1,1] would return the
>> > DayOfYear column for the first row, right? What does a$DayOfYear return?
>>
>> a$DayOfYear would be the same as a[,1] or a[,"DayOfYear"], i.e. it would
>> return the entire first column.
>>
>> Duncan Murdoch
>>
>> >
>> > Thank you for your patience.
>> >
>> > Kevin
>> >
>> > ---- Duncan Murdoch <[EMAIL PROTECTED]> wrote:
>> >> On 12/07/2008 12:31 PM, [EMAIL PROTECTED] wrote:
>> >>> I am using a simple R statement to read in the file:
>> >>>
>> >>> a <- read.csv("Sample.dat", header=TRUE)
>> >>>
>> >>> There is a lot of data but the first few lines look like:
>> >>>
>> >>> DayOfYear,Quantity,Fraction,Category,SubCategory
>> >>> 1,82,0.0000390392720794458,(Unknown),(Unknown)
>> >>> 2,78,0.0000371349173438631,(Unknown),(Unknown)
>> >>> . . .
>> >>> 71,2,0.0000009521773677913,WOMEN,Piratesses
>> >>> 72,4,0.0000019043547355827,WOMEN,Piratesses
>> >>> 73,3,0.0000014282660516870,WOMEN,Piratesses
>> >>> 74,14,0.0000066652415745395,WOMEN,Piratesses
>> >>> 75,2,0.0000009521773677913,WOMEN,Piratesses
>> >>>
>> >>> If I read the data in as above, the command
>> >>>
>> >>> a[1]
>> >>>
>> >>> results in the output
>> >>>
>> >>> [ reached getOption("max.print") -- omitted 16193 rows ]]
>> >>>
>> >>> Shouldn't this be the first row?
>> >> No, the first row would be a[1,]. read.csv() returns a dataframe, and
>> >> those are indexed with two indices to treat them like a matrix, or with
>> >> one index to treat them like a list of their columns.
>> >>
>> >> Duncan Murdoch
>> >>
>> >>> a$Category[1]
>> >>>
>> >>> results in the output
>> >>>
>> >>> [1] (Unknown)
>> >>> 4464 Levels: Tags ... WOMEN
>> >>>
>> >>> But
>> >>>
>> >>> a$Category[365]
>> >>>
>> >>> gives me:
>> >>>
>> >>> [1] 7 Plates (Dessert),Western\n120,5,0.0000023804434194784,7 Plates
>> >>> (Dessert)
>> >>> 4464 Levels: Tags ... WOMEN
>> >>>
>> >>> There is something fundamental about either vectors or the read.csv
>> >>> command that I am missing here.
>> >>>
>> >>> Thank you.
>> >>>
>> >>> Kevin
>> >>>
>> >>> ---- jim holtman <[EMAIL PROTECTED]> wrote:
>> >>>> Please provide commented, minimal, self-contained, reproducible code,
>> >>>> or at least a before/after of what your data would look like.
Taking a
>> >>>> guess at what you are asking, here is one way of doing it:
>> >>>>
>> >>>>
>> >>>>> x <- data.frame(cat=sample(LETTERS[1:3],20,TRUE),a=1:20, b=runif(20))
>> >>>>> x
>> >>>>    cat  a          b
>> >>>> 1    B  1 0.65472393
>> >>>> 2    C  2 0.35319727
>> >>>> 3    B  3 0.27026015
>> >>>> 4    A  4 0.99268406
>> >>>> 5    C  5 0.63349326
>> >>>> 6    A  6 0.21320814
>> >>>> 7    C  7 0.12937235
>> >>>> 8    A  8 0.47811803
>> >>>> 9    A  9 0.92407447
>> >>>> 10   A 10 0.59876097
>> >>>> 11   A 11 0.97617069
>> >>>> 12   A 12 0.73179251
>> >>>> 13   B 13 0.35672691
>> >>>> 14   C 14 0.43147369
>> >>>> 15   C 15 0.14821156
>> >>>> 16   C 16 0.01307758
>> >>>> 17   B 17 0.71556607
>> >>>> 18   B 18 0.10318424
>> >>>> 19   C 19 0.44628435
>> >>>> 20   B 20 0.64010105
>> >>>>> # create a list of the indices of the data grouped by 'cat'
>> >>>>> split(seq(nrow(x)), x$cat)
>> >>>> $A
>> >>>> [1]  4  6  8  9 10 11 12
>> >>>>
>> >>>> $B
>> >>>> [1]  1  3 13 17 18 20
>> >>>>
>> >>>> $C
>> >>>> [1]  2  5  7 14 15 16 19
>> >>>>
>> >>>>> # or do you want the data
>> >>>>> split(x, x$cat)
>> >>>> $A
>> >>>>    cat  a         b
>> >>>> 4    A  4 0.9926841
>> >>>> 6    A  6 0.2132081
>> >>>> 8    A  8 0.4781180
>> >>>> 9    A  9 0.9240745
>> >>>> 10   A 10 0.5987610
>> >>>> 11   A 11 0.9761707
>> >>>> 12   A 12 0.7317925
>> >>>>
>> >>>> $B
>> >>>>    cat  a         b
>> >>>> 1    B  1 0.6547239
>> >>>> 3    B  3 0.2702601
>> >>>> 13   B 13 0.3567269
>> >>>> 17   B 17 0.7155661
>> >>>> 18   B 18 0.1031842
>> >>>> 20   B 20 0.6401010
>> >>>>
>> >>>> $C
>> >>>>    cat  a          b
>> >>>> 2    C  2 0.35319727
>> >>>> 5    C  5 0.63349326
>> >>>> 7    C  7 0.12937235
>> >>>> 14   C 14 0.43147369
>> >>>> 15   C 15 0.14821156
>> >>>> 16   C 16 0.01307758
>> >>>> 19   C 19 0.44628435
>> >>>>
>> >>>>
>> >>>> On Sat, Jul 12, 2008 at 3:32 AM, <[EMAIL PROTECTED]> wrote:
>> >>>>> I have searched the archive and I could not find what I need so I will
>> >>>>> try to ask the question here.
>> >>>>>
>> >>>>> I read a table in (read.table)
>> >>>>>
>> >>>>> a <- read.table(.....)
>> >>>>>
>> >>>>> The table has column names like DayOfYear, Quantity, and Category.
>> >>>>>
>> >>>>> The values in the row for Category are strings (characters).
>> >>>>>
>> >>>>> I want to get all of the rows grouped by Category. The number of
>> >>>>> unique category names could be around 50. Say for argument's sake the
>> >>>>> number of categories is exactly 50. Can I somehow get a vector of
>> >>>>> length 50 containing the rows corresponding to the category (another
>> >>>>> vector)? I realize I can access any row a[i]$Category (right?). But I
>> >>>>> want a vector containing the rows corresponding to each distinct
>> >>>>> Category name.
>> >>>>>
>> >>>>> Thank you.
>> >>>>>
>> >>>>> Kevin
>> >>>>>
>> >>>>> ______________________________________________
>> >>>>> R-help@r-project.org mailing list
>> >>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>> >>>>> PLEASE do read the posting guide
>> >>>>> http://www.R-project.org/posting-guide.html
>> >>>>> and provide commented, minimal, self-contained, reproducible code.
>> >>>>>
>> >>>>
>> >>>> --
>> >>>> Jim Holtman
>> >>>> Cincinnati, OH
>> >>>> +1 513 646 9390
>> >>>>
>> >>>> What is the problem you are trying to solve?
>> >
>

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?