I think there is a problem with my file or with 'read.csv'. As you said, a[1,] returns the first row:

a[1,]
  DayOfYear Quantity              Fraction  Category SubCategory
1         1       82 0.0000390392720794458 (Unknown)   (Unknown)

a[2,] returns the second row:

a[2,]
  DayOfYear Quantity              Fraction  Category SubCategory
2         2       78 0.0000371349173438631 (Unknown)   (Unknown)

This seems to continue up to row 348, after which I get something like:

But when I issue the command for what I would suspect to be the 365th row, I get:

a[365,]
    DayOfYear Quantity              Fraction
365        82        4 0.0000019043547355827
                                                                       Category
365 7 Plates (Dessert),Western\n120,5,0.0000023804434194784,7 Plates (Dessert)
    SubCategory
365     Western

If I bring up WinEdt and look at this transition:

355,1,0.0000004760886838956,(Unknown),(Unknown)
362,15,0.0000071413302584352,(Unknown),(Unknown)
363,1,0.0000004760886838956,(Unknown),(Unknown)
1,2,0.0000009521773677913,7" Plates (Dessert),Elmo Loves You/Hooray For Elmo
7,3,0.0000014282660516870,7" Plates (Dessert),Elmo Loves You/Hooray For Elmo
18,8,0.0000038087094711654,7" Plates (Dessert),Elmo Loves You/Hooray For Elmo

Could the " character cause read.csv to get confused?

Thank you.

Kevin

---- [EMAIL PROTECTED] wrote:
> I am sorry but if read.csv returns a dataframe and a dataframe is like a
> matrix and I have a set of input like below and a[1,] gives me the first row,
> what is the second index? From what I read and your input I am guessing that
> it is the column number. So a[1,1] would return the DayOfYear column for the
> first row, right? What does a$DayOfYear return?
>
> Thank you for your patience.
>
> Kevin
>
> ---- Duncan Murdoch <[EMAIL PROTECTED]> wrote:
> > On 12/07/2008 12:31 PM, [EMAIL PROTECTED] wrote:
> > > I am using a simple R statement to read in the file:
> > >
> > > a <- read.csv("Sample.dat", header=TRUE)
> > >
> > > There is a lot of data but the first few lines look like:
> > >
> > > DayOfYear,Quantity,Fraction,Category,SubCategory
> > > 1,82,0.0000390392720794458,(Unknown),(Unknown)
> > > 2,78,0.0000371349173438631,(Unknown),(Unknown)
> > > . . .
> > > 71,2,0.0000009521773677913,WOMEN,Piratesses
> > > 72,4,0.0000019043547355827,WOMEN,Piratesses
> > > 73,3,0.0000014282660516870,WOMEN,Piratesses
> > > 74,14,0.0000066652415745395,WOMEN,Piratesses
> > > 75,2,0.0000009521773677913,WOMEN,Piratesses
> > >
> > > If I read the data in as above, the command
> > >
> > > a[1]
> > >
> > > results in the output
> > >
> > > [ reached getOption("max.print") -- omitted 16193 rows ]]
> > >
> > > Shouldn't this be the first row?
> >
> > No, the first row would be a[1,]. read.csv() returns a dataframe, and
> > those are indexed with two indices to treat them like a matrix, or with
> > one index to treat them like a list of their columns.
> >
> > Duncan Murdoch
> >
> > >
> > > a$Category[1]
> > >
> > > results in the output
> > >
> > > [1] (Unknown)
> > > 4464 Levels: Tags ... WOMEN
> > >
> > > But
> > >
> > > a$Category[365]
> > >
> > > gives me:
> > >
> > > [1] 7 Plates (Dessert),Western\n120,5,0.0000023804434194784,7 Plates
> > > (Dessert)
> > > 4464 Levels: Tags ... WOMEN
> > >
> > > There is something fundamental about either vectors or the read.csv
> > > command that I am missing here.
> > >
> > > Thank you.
> > >
> > > Kevin
> > >
> > > ---- jim holtman <[EMAIL PROTECTED]> wrote:
> > >> Please provide commented, minimal, self-contained, reproducible code,
> > >> or at least a before/after of what your data would look like. Taking a
> > >> guess at what you are asking, here is one way of doing it:
> > >>
> > >>
> > >>> x <- data.frame(cat=sample(LETTERS[1:3],20,TRUE),a=1:20, b=runif(20))
> > >>> x
> > >>    cat  a          b
> > >> 1    B  1 0.65472393
> > >> 2    C  2 0.35319727
> > >> 3    B  3 0.27026015
> > >> 4    A  4 0.99268406
> > >> 5    C  5 0.63349326
> > >> 6    A  6 0.21320814
> > >> 7    C  7 0.12937235
> > >> 8    A  8 0.47811803
> > >> 9    A  9 0.92407447
> > >> 10   A 10 0.59876097
> > >> 11   A 11 0.97617069
> > >> 12   A 12 0.73179251
> > >> 13   B 13 0.35672691
> > >> 14   C 14 0.43147369
> > >> 15   C 15 0.14821156
> > >> 16   C 16 0.01307758
> > >> 17   B 17 0.71556607
> > >> 18   B 18 0.10318424
> > >> 19   C 19 0.44628435
> > >> 20   B 20 0.64010105
> > >>> # create a list of the indices of the data grouped by 'cat'
> > >>> split(seq(nrow(x)), x$cat)
> > >> $A
> > >> [1]  4  6  8  9 10 11 12
> > >>
> > >> $B
> > >> [1]  1  3 13 17 18 20
> > >>
> > >> $C
> > >> [1]  2  5  7 14 15 16 19
> > >>
> > >>> # or do you want the data
> > >>> split(x, x$cat)
> > >> $A
> > >>    cat  a         b
> > >> 4    A  4 0.9926841
> > >> 6    A  6 0.2132081
> > >> 8    A  8 0.4781180
> > >> 9    A  9 0.9240745
> > >> 10   A 10 0.5987610
> > >> 11   A 11 0.9761707
> > >> 12   A 12 0.7317925
> > >>
> > >> $B
> > >>    cat  a         b
> > >> 1    B  1 0.6547239
> > >> 3    B  3 0.2702601
> > >> 13   B 13 0.3567269
> > >> 17   B 17 0.7155661
> > >> 18   B 18 0.1031842
> > >> 20   B 20 0.6401010
> > >>
> > >> $C
> > >>    cat  a          b
> > >> 2    C  2 0.35319727
> > >> 5    C  5 0.63349326
> > >> 7    C  7 0.12937235
> > >> 14   C 14 0.43147369
> > >> 15   C 15 0.14821156
> > >> 16   C 16 0.01307758
> > >> 19   C 19 0.44628435
> > >>
> > >>
> > >> On Sat, Jul 12, 2008 at 3:32 AM, <[EMAIL PROTECTED]> wrote:
> > >>> I have searched the archive and I could not find what I need so I will
> > >>> try to ask the question here.
> > >>>
> > >>> I read a table in (read.table)
> > >>>
> > >>> a <- read.table(.....)
> > >>>
> > >>> The table has column names like DayOfYear, Quantity, and Category.
> > >>>
> > >>> The values in the row for Category are strings (characters).
> > >>>
> > >>> I want to get all of the rows grouped by Category. The number of unique
> > >>> category names could be around 50. Say for argument's sake the number of
> > >>> categories is exactly 50. Can I somehow get a vector of length 50
> > >>> containing the rows corresponding to the category (another vector)? I
> > >>> realize I can access any row a[i]$Category (right?). But I want a vector
> > >>> containing the rows corresponding to each distinct Category name.
> > >>>
> > >>> Thank you.
> > >>>
> > >>> Kevin
> > >>>
> > >>> ______________________________________________
> > >>> R-help@r-project.org mailing list
> > >>> https://stat.ethz.ch/mailman/listinfo/r-help
> > >>> PLEASE do read the posting guide
> > >>> http://www.R-project.org/posting-guide.html
> > >>> and provide commented, minimal, self-contained, reproducible code.
> > >>>
> > >>
> > >>
> > >> --
> > >> Jim Holtman
> > >> Cincinnati, OH
> > >> +1 513 646 9390
> > >>
> > >> What is the problem you are trying to solve?
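
On the question at the top of the thread (could the " character confuse read.csv?), here is a minimal sketch. It assumes the lone inch mark in 7" Plates (Dessert) is what makes rows run together. read.csv() uses quote = "\"" by default, so an unmatched " opens a quoted field that continues across line ends until the next " appears. Passing quote = "" switches quote handling off. The sample rows are copied from the thread and the text connection only stands in for Sample.dat; whether quote = "" is safe for the real file depends on whether any of its fields are legitimately quoted.

```r
# Sample rows copied from the thread; 'text =' stands in for Sample.dat.
csv_text <- c(
  "DayOfYear,Quantity,Fraction,Category,SubCategory",
  "363,1,0.0000004760886838956,(Unknown),(Unknown)",
  "1,2,0.0000009521773677913,7\" Plates (Dessert),Elmo Loves You/Hooray For Elmo",
  "7,3,0.0000014282660516870,7\" Plates (Dessert),Elmo Loves You/Hooray For Elmo"
)

# Default quoting: the " in one row pairs up with the " in the next row,
# so the text between them is read as a single quoted field.
with_quotes <- read.csv(text = csv_text, header = TRUE)

# Quote handling disabled: every " is kept as an ordinary character.
no_quotes <- read.csv(text = csv_text, header = TRUE, quote = "")

# Compare the row counts and look at the Category column of each result.
print(nrow(with_quotes))
print(nrow(no_quotes))
print(as.character(no_quotes$Category))
```

If the two row counts differ for the real file as well, that would point to the quote character rather than to the indexing.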