Re: [R] Associative array?

jim holtman Sat, 12 Jul 2008 18:50:35 -0700

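Is this something like what you were asking for? The output of split()
is a list of data frame subsets, one for each combination of the
categories you specify. The elements are named Category.SubCategory
(A.a, B.a, ...), so A a and B a end up in separate elements.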


> x <- data.frame(g1=sample(LETTERS[1:2],30,TRUE),
+     g2=sample(letters[1:2], 30, TRUE),
+     g3=1:30)
> y <- split(x, list(x$g1, x$g2))
> str(y)
List of 4
 $ A.a:'data.frame':    7 obs. of  3 variables:
  ..$ g1: Factor w/ 2 levels "A","B": 1 1 1 1 1 1 1
  ..$ g2: Factor w/ 2 levels "a","b": 1 1 1 1 1 1 1
  ..$ g3: int [1:7] 3 4 6 8 9 13 24
 $ B.a:'data.frame':    7 obs. of  3 variables:
  ..$ g1: Factor w/ 2 levels "A","B": 2 2 2 2 2 2 2
  ..$ g2: Factor w/ 2 levels "a","b": 1 1 1 1 1 1 1
  ..$ g3: int [1:7] 10 11 16 17 18 20 25
 $ A.b:'data.frame':    6 obs. of  3 variables:
  ..$ g1: Factor w/ 2 levels "A","B": 1 1 1 1 1 1
  ..$ g2: Factor w/ 2 levels "a","b": 2 2 2 2 2 2
  ..$ g3: int [1:6] 2 12 23 26 27 29
 $ B.b:'data.frame':    10 obs. of  3 variables:
  ..$ g1: Factor w/ 2 levels "A","B": 2 2 2 2 2 2 2 2 2 2
  ..$ g2: Factor w/ 2 levels "a","b": 2 2 2 2 2 2 2 2 2 2
  ..$ g3: int [1:10] 1 5 7 14 15 19 21 22 28 30
> y
$A.a
   g1 g2 g3
3   A  a  3
4   A  a  4
6   A  a  6
8   A  a  8
9   A  a  9
13  A  a 13
24  A  a 24

$B.a
   g1 g2 g3
10  B  a 10
11  B  a 11
16  B  a 16
17  B  a 17
18  B  a 18
20  B  a 20
25  B  a 25

$A.b
   g1 g2 g3
2   A  b  2
12  A  b 12
23  A  b 23
26  A  b 26
27  A  b 27
29  A  b 29

$B.b
   g1 g2 g3
1   B  b  1
5   B  b  5
7   B  b  7
14  B  b 14
15  B  b 15
19  B  b 19
21  B  b 21
22  B  b 22
28  B  b 28
30  B  b 30

> y[[2]]
   g1 g2 g3
10  B  a 10
11  B  a 11
16  B  a 16
17  B  a 17
18  B  a 18
20  B  a 20
25  B  a 25
>
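
For the Category/SubCategory question (quoted below), here is a minimal sketch of the same idea on a small made-up data frame. The column names Category, SubCategory and Quantity mirror the file described in the thread, but the values are invented. Splitting on the pair keys each group by both names, so the rows for A a are never mixed with those for B a, and drop = TRUE leaves out pairs that never occur.

```r
# Hypothetical stand-in for the data frame returned by read.csv()
a <- data.frame(
  Category    = c("A", "A", "B", "B", "A", "B"),
  SubCategory = c("a", "b", "a", "a", "a", "a"),
  Quantity    = c(10, 20, 30, 40, 50, 60)
)

# Group the rows on the Category/SubCategory pair;
# drop = TRUE discards pairs with no rows (here B.b)
groups <- split(a, list(a$Category, a$SubCategory), drop = TRUE)

names(groups)     # "A.a" "B.a" "A.b"

# Only the rows for the pair "A a" (rows for "B a" are not included)
groups[["A.a"]]

# The same grouping as row indices instead of data
idx <- split(seq_len(nrow(a)), list(a$Category, a$SubCategory), drop = TRUE)
idx[["A.a"]]      # 1 5
```

Because the names are built as Category.SubCategory, a category name that itself contains a dot could make two names look alike; passing a different separator to split(), e.g. sep = "/", should avoid that.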

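On what split() returns: it is a named list with one element per category, and the usual list rules apply. Single brackets give back a shorter list, while double brackets give the element itself, which is why c[1] prints differently from a plain data frame. A sketch on a hypothetical data frame; the name byCat is only for illustration (a result called c works too, but is easy to confuse with the c() function when reading code).

```r
# Hypothetical stand-in for the data frame returned by read.csv()
a <- data.frame(
  Category = c("X", "Y", "X", "Z", "Y"),
  Quantity = c(1, 2, 3, 4, 5)
)

byCat <- split(a, a$Category)   # named list: one data frame per Category

class(byCat)        # "list"
class(byCat[1])     # "list": single brackets keep the list wrapper
class(byCat[[1]])   # "data.frame": double brackets extract the element

# An element can also be pulled out by its category name
byCat[["Y"]]
```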

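Duncan's two indexing modes (two indices for matrix-style access, one index for list-of-columns access) can be checked on a toy version of the file. This is only a sketch: the three rows below are a hypothetical stand-in using the DayOfYear, Quantity and Category columns from the sample, not the real Sample.dat.

```r
# Hypothetical three-row stand-in for read.csv("Sample.dat", header=TRUE)
a <- data.frame(
  DayOfYear = 1:3,
  Quantity  = c(82, 78, 2),
  Category  = c("(Unknown)", "(Unknown)", "WOMEN")
)

a[1, ]     # first row, all columns (matrix-style: row 1)
a[1]       # one-column data frame holding DayOfYear (list-style: column 1)
a[[1]]     # the DayOfYear column as a plain vector
a[1, 1]    # DayOfYear value in the first row

# Three equivalent ways to get the whole DayOfYear column
identical(a$DayOfYear, a[, 1])             # TRUE
identical(a$DayOfYear, a[, "DayOfYear"])   # TRUE
```
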
On Sat, Jul 12, 2008 at 8:51 PM,  <[EMAIL PROTECTED]> wrote:
> OK. Now I know that I am dealing with a data frame. One last question on this 
> topic. a <- read.csv() gives me a data frame. If I have c <- split(a, 
> a$Category), then what is returned by split in this case? c[1] seems to be 
> OK but c[2] is not right in my mind. If I run ci <- split(seq(nrow(a)), 
> a$Category), then ci[1] seems to be the rows associated with the first 
> category, ci[2] is the indices/rows associated with the second category, etc. 
> But this seems different than c[1], c[2], etc.
>
> Using the techniques below I can get the information on the categories. Now 
> as an extra level of complexity there are SubCategories within each Category. 
> Assume that the SubCategory names are not unique within the dataset, so if I 
> want the SubCategory data I need to retrieve the indices (or data) for the 
> Category and SubCategory pair. In other words, if I have a Category that 
> ranges from 'A' to 'Z', it is possible that I might have subcategories A a and 
> A b (where a and b are the subcategory names). I also might have B a and B b. 
> I want all of the rows for A a, NOT all of the rows for subcategory a (because 
> that might include B a, which would be different). I am guessing that this 
> will take more than a simple 'split'.
>
> Thank you.
>
> Kevin
>
> ---- Duncan Murdoch <[EMAIL PROTECTED]> wrote:
>> On 12/07/2008 3:59 PM, [EMAIL PROTECTED] wrote:
>> > I am sorry, but if read.csv returns a data frame, a data frame is like a 
>> > matrix, and for input like the sample below a[1,] gives me the first 
>> > row, what is the second index? From what I read and your input I am 
>> > guessing that it is the column number. So a[1,1] would return the 
>> > DayOfYear value for the first row, right? What does a$DayOfYear return?
>>
>> a$DayOfYear would be the same as a[,1] or a[,"DayOfYear"], i.e. it would
>> return the entire first column.
>>
>> Duncan Murdoch
>>
>> >
>> > Thank you for your patience.
>> >
>> > Kevin
>> >
>> > ---- Duncan Murdoch <[EMAIL PROTECTED]> wrote:
>> >> On 12/07/2008 12:31 PM, [EMAIL PROTECTED] wrote:
>> >>> I am using a simple R statement to read in the file:
>> >>>
>> >>> a <- read.csv("Sample.dat", header=TRUE)
>> >>>
>> >>> There is a lot of data, but the first few lines look like:
>> >>>
>> >>> DayOfYear,Quantity,Fraction,Category,SubCategory
>> >>> 1,82,0.0000390392720794458,(Unknown),(Unknown)
>> >>> 2,78,0.0000371349173438631,(Unknown),(Unknown)
>> >>> . . .
>> >>> 71,2,0.0000009521773677913,WOMEN,Piratesses
>> >>> 72,4,0.0000019043547355827,WOMEN,Piratesses
>> >>> 73,3,0.0000014282660516870,WOMEN,Piratesses
>> >>> 74,14,0.0000066652415745395,WOMEN,Piratesses
>> >>> 75,2,0.0000009521773677913,WOMEN,Piratesses
>> >>>
>> >>> If I read the data in as above, the command
>> >>>
>> >>> a[1]
>> >>>
>> >>> results in the output
>> >>>
>> >>> [ reached getOption("max.print") -- omitted 16193 rows ]]
>> >>>
>> >>> Shouldn't this be the first row?
>> >> No, the first row would be a[1,].  read.csv() returns a dataframe, and
>> >> those are indexed with two indices to treat them like a matrix, or with
>> >> one index to treat them like a list of their columns.
>> >>
>> >> Duncan Murdoch
>> >>
>> >>> a$Category[1]
>> >>>
>> >>> results in the output
>> >>>
>> >>> [1] (Unknown)
>> >>> 4464 Levels:   Tags ... WOMEN
>> >>>
>> >>> But
>> >>>
>> >>> a$Category[365]
>> >>>
>> >>> gives me:
>> >>>
>> >>> [1] 7 Plates   (Dessert),Western\n120,5,0.0000023804434194784,7 Plates   
>> >>> (Dessert)
>> >>> 4464 Levels:   Tags ... WOMEN
>> >>>
>> >>> There is something fundamental about either vectors or the read.csv 
>> >>> command that I am missing here.
>> >>>
>> >>> Thank you.
>> >>>
>> >>> Kevin
>> >>>
>> >>> ---- jim holtman <[EMAIL PROTECTED]> wrote:
>> >>>> Please provide commented, minimal, self-contained, reproducible code,
>> >>>> or at least a before/after of what your data would look like. Taking a
>> >>>> guess at what you are asking, here is one way of doing it:
>> >>>>
>> >>>>
>> >>>>> x <- data.frame(cat=sample(LETTERS[1:3],20,TRUE),a=1:20, b=runif(20))
>> >>>>> x
>> >>>>    cat  a          b
>> >>>> 1    B  1 0.65472393
>> >>>> 2    C  2 0.35319727
>> >>>> 3    B  3 0.27026015
>> >>>> 4    A  4 0.99268406
>> >>>> 5    C  5 0.63349326
>> >>>> 6    A  6 0.21320814
>> >>>> 7    C  7 0.12937235
>> >>>> 8    A  8 0.47811803
>> >>>> 9    A  9 0.92407447
>> >>>> 10   A 10 0.59876097
>> >>>> 11   A 11 0.97617069
>> >>>> 12   A 12 0.73179251
>> >>>> 13   B 13 0.35672691
>> >>>> 14   C 14 0.43147369
>> >>>> 15   C 15 0.14821156
>> >>>> 16   C 16 0.01307758
>> >>>> 17   B 17 0.71556607
>> >>>> 18   B 18 0.10318424
>> >>>> 19   C 19 0.44628435
>> >>>> 20   B 20 0.64010105
>> >>>>> # create a list of the indices of the data grouped by 'cat'
>> >>>>> split(seq(nrow(x)), x$cat)
>> >>>> $A
>> >>>> [1]  4  6  8  9 10 11 12
>> >>>>
>> >>>> $B
>> >>>> [1]  1  3 13 17 18 20
>> >>>>
>> >>>> $C
>> >>>> [1]  2  5  7 14 15 16 19
>> >>>>
>> >>>>> # or do you want the data
>> >>>>> split(x, x$cat)
>> >>>> $A
>> >>>>    cat  a         b
>> >>>> 4    A  4 0.9926841
>> >>>> 6    A  6 0.2132081
>> >>>> 8    A  8 0.4781180
>> >>>> 9    A  9 0.9240745
>> >>>> 10   A 10 0.5987610
>> >>>> 11   A 11 0.9761707
>> >>>> 12   A 12 0.7317925
>> >>>>
>> >>>> $B
>> >>>>    cat  a         b
>> >>>> 1    B  1 0.6547239
>> >>>> 3    B  3 0.2702601
>> >>>> 13   B 13 0.3567269
>> >>>> 17   B 17 0.7155661
>> >>>> 18   B 18 0.1031842
>> >>>> 20   B 20 0.6401010
>> >>>>
>> >>>> $C
>> >>>>    cat  a          b
>> >>>> 2    C  2 0.35319727
>> >>>> 5    C  5 0.63349326
>> >>>> 7    C  7 0.12937235
>> >>>> 14   C 14 0.43147369
>> >>>> 15   C 15 0.14821156
>> >>>> 16   C 16 0.01307758
>> >>>> 19   C 19 0.44628435
>> >>>>
>> >>>>
>> >>>> On Sat, Jul 12, 2008 at 3:32 AM,  <[EMAIL PROTECTED]> wrote:
>> >>>>> I have searched the archive and I could not find what I need, so I will 
>> >>>>> try to ask the question here.
>> >>>>>
>> >>>>> I read a table in (read.table)
>> >>>>>
>> >>>>> a <- read.table(.....)
>> >>>>>
>> >>>>> The table has column names like DayOfYear, Quantity, and Category.
>> >>>>>
>> >>>>> The values in the Category column are strings (characters).
>> >>>>>
>> >>>>> I want to get all of the rows grouped by Category. The number of 
>> >>>>> unique category names could be around 50. Say for argument's sake the 
>> >>>>> number of categories is exactly 50. Can I somehow get a vector of 
>> >>>>> length 50 containing the rows corresponding to the category (another 
>> >>>>> vector)? I realize I can access any row a[i]$Category (right?). But I 
>> >>>>> want a vector containing the rows corresponding to each distinct 
>> >>>>> Category name.
>> >>>>>
>> >>>>> Thank you.
>> >>>>>
>> >>>>> Kevin
>> >>>>>
>> >>>>> ______________________________________________
>> >>>>> R-help@r-project.org mailing list
>> >>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>> >>>>> PLEASE do read the posting guide 
>> >>>>> http://www.R-project.org/posting-guide.html
>> >>>>> and provide commented, minimal, self-contained, reproducible code.
>> >>>>>
>> >>>>
>> >>>> --
>> >>>> Jim Holtman
>> >>>> Cincinnati, OH
>> >>>> +1 513 646 9390
>> >>>>
>> >>>> What is the problem you are trying to solve?
>>
>
>


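The index lists from the earliest reply in the thread (split(seq(nrow(x)), x$cat)) can be used to pull rows back out or to count how many rows fall in each category. A minimal sketch with fixed, made-up data in place of sample(). It uses seq_len(nrow(x)) rather than seq(nrow(x)) because seq_len() also behaves correctly for a data frame with zero rows. lengths() needs R 3.2.0 or later; sapply(idx, length) gives the same counts in older versions.

```r
# Hypothetical fixed data (instead of sample()) so the result is repeatable
x <- data.frame(cat = c("B", "C", "B", "A", "C", "A", "B"), a = 1:7)

# Row numbers grouped by category
idx <- split(seq_len(nrow(x)), x$cat)

idx$A          # 4 6
x[idx$A, ]     # the rows of x whose cat is "A"
lengths(idx)   # rows per category: A 2, B 3, C 2
```
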

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
