Aha! So to expand that from the original expression,

> table(table(d$filename, d$email_addr))

  0   1   2   3
253  20   8   9

I think that is exactly what I'm looking for. I knew it must be simple!!!

What does the 0 column represent?

Also, does this tell me the same thing, filtered by Japan?

> table(table(d$filename, d$email_addr, d$country_residence)[d$country_residence=="Japan"])

  0   1   2   3
958   5   2   1

How does that differ logically from this?

> table(table(d$filename, d$email_addr)[d$country_residence=="Japan"])

 0  1  2  3
51  4  2  1

I don't understand why that produces different results. The first one
adds a third dimension to the table, but limits that third dimension
to a single element, Japan. Shouldn't it be the same? And again,
what's that zero column?

Thank you,
Matt

On 6/18/07, jim holtman <[EMAIL PROTECTED]> wrote:
> If you are running on windows, make sure you have 'recording' checked in the
> history window of the graphics. You can also put the output to a pdf file
> and view it later.
>
> If you use table on the counts matrix:
>
> > table(counts)
> counts
>   0   1   2   3
> 253  20   8   9
> >
>
> this shows that there were 20 single tries, 8 files downloaded twice and 9
> three times. Is this what you want?
>
> You can also get the indices of the non-zero entries by:
>
> > which(counts != 0, arr.ind=TRUE)
>       row col
> file1   1   1
> file5   6   2
> file1   1   3
> file2   3   3
> file7   8   4
> file8   9   4
> file1   1   5
> file2   3   5
> file2   3   6
> .........
> >
>
>
> On 6/18/07, Matthew Trunnell <[EMAIL PROTECTED]> wrote:
> > Jim,
> > Thanks for the quick reply! When I run your code, I end up with a
> > single barplot of one datapoint, file9 vs email20 == 2.0. I see the
> > call to barplot is inside a for loop... maybe it's zooming through the
> > display of many barplots, but all I see is the last one?
> >
> > In any case, I need to figure out the distribution of the retries, such as
> > No. Retries    Count
> > 1              6
> > 2              13
> > 3              5
> > 4              3
> > 5              2
> > 6              1
> >
> > That is, 6 people retried the download once; 13 people retried the
> > download twice, etc.
> > So it would be counting the frequency of the
> > email-filename combination, and grouping those together by the number
> > of retries. Does that make sense?
> >
> > When I look at the counts object from your code, I can see that it's
> > close to what I need. How do I access the properties of the counts
> > object-- it's a table, right? If I look at counts[1,1], that returns
> > 1. But how do I get at the row/col name of that cell? Is that cell
> > an object? rownames(counts[1,1]) returns null.
> >
> > Thanks,
> > Matt
> >
> >
> > On 6/18/07, jim holtman <[EMAIL PROTECTED]> wrote:
> > > You should be using barplot and not hist. I think this produces what you
> > > want:
> > >
> > > x <- "filename,last_modified,email_addr,country_residence
> > > file1,3/4/2006 13:54,email1,Korea (South)
> > > file2,3/4/2006 14:33,email2,United States
> > > file2,3/4/2006 16:03,email2,United States
> > > file2,3/4/2006 16:17,email3,United States
> > > file2,3/4/2006 16:28,email3,United States
> > > file3,3/4/2006 19:13,email4,United States
> > > file2,3/4/2006 21:22,email5,India
> > > file4,3/4/2006 21:46,email6,United States
> > > file1,3/4/2006 22:04,email7,Japan
> > > file2,3/4/2006 22:09,email8,Croatia
> > > file1,3/4/2006 22:22,email7,Japan
> > > file1,3/4/2006 22:29,email9,India
> > > file1,3/4/2006 23:06,email6,United States
> > > file1,3/4/2006 23:33,email6,United States
> > > file5,3/4/2006 23:44,email10,China
> > > file1,3/5/2006 0:13,email9,India
> > > file2,3/5/2006 0:52,email8,Croatia
> > > file2,3/5/2006 0:54,email8,Croatia
> > > file2,3/5/2006 1:10,email5,India
> > > file6,3/5/2006 2:17,email9,India
> > > file2,3/5/2006 2:24,email11,Italy
> > > file7,3/5/2006 2:36,email12,Italy
> > > file8,3/5/2006 2:52,email12,Italy
> > > file2,3/5/2006 3:09,email13,United Kingdom
> > > file2,3/5/2006 4:02,email14,India
> > > file2,3/5/2006 4:07,email14,India
> > > file2,3/5/2006 4:14,email14,India
> > > file2,3/5/2006 4:37,email5,India
> > > file2,3/5/2006 4:44,email15,Belgium
> > > file1,3/5/2006 5:02,email9,India
> > > file1,3/5/2006 5:24,email16,Taiwan
> > > file2,3/5/2006 6:06,email17,Saudi Arabia
> > > file2,3/5/2006 7:32,email17,Saudi Arabia
> > > file2,3/5/2006 8:12,email18,Brazil
> > > file2,3/5/2006 8:26,email18,Brazil
> > > file2,3/5/2006 9:49,email19,United Kingdom
> > > file1,3/5/2006 10:49,email11,Italy
> > > file1,3/5/2006 11:16,email13,United Kingdom
> > > file1,3/5/2006 11:16,email13,United Kingdom
> > > file1,3/5/2006 11:45,email13,United Kingdom
> > > file1,3/5/2006 14:34,email20,Australia
> > > file9,3/5/2006 14:56,email20,Australia
> > > file9,3/5/2006 14:56,email20,Australia
> > > file5,3/5/2006 16:43,email21,United States
> > > file1,3/5/2006 17:17,email7,Japan
> > > file2,3/5/2006 17:26,email22,Japan
> > > file2,3/5/2006 17:27,email22,Japan
> > > file2,3/5/2006 17:33,email23,China
> > > file1,3/5/2006 17:45,email22,Japan
> > > file2,3/5/2006 17:45,email22,Japan
> > > file2,3/5/2006 17:59,email23,China
> > > file1,3/5/2006 18:27,email24,Japan
> > > file1,3/5/2006 18:47,email25,Taiwan
> > > file2,3/5/2006 18:48,email26,New Zealand
> > > file2,3/5/2006 19:15,email27,Canada
> > > file2,3/5/2006 19:23,email28,Canada
> > > file2,3/5/2006 19:24,email28,Canada
> > > file10,3/5/2006 19:49,email29,Japan
> > > file10,3/5/2006 19:52,email29,Japan
> > > file10,3/5/2006 19:57,email29,Japan
> > > file2,3/5/2006 20:01,email29,Japan
> > > file2,3/5/2006 20:02,email29,Japan
> > > file2,3/5/2006 20:06,email29,Japan"
> > > d <- read.csv(textConnection(x))
> > > barplot(table(d$filename), main="All Files", las=2)  # plot counts for all the files
> > > # generate plots for each file name showing which emails used them
> > > counts <- table(d$filename, d$email_addr)
> > > for (i in seq(nrow(counts))){
> > >     .index <- which(counts[i,] > 0)
> > >     barplot(counts[i, .index], las=2,
> > >         names.arg=colnames(counts)[.index], main=rownames(counts)[i])
> > > }
> > >
> > >
> > > On 6/18/07, Matthew Trunnell <[EMAIL PROTECTED]> wrote:
> > > >
> > > > Hello R gurus,
> > > >
> > > > I just spent my first weekend wrestling with R, but so far have come
> > > > up empty handed.
> > > >
> > > > I have a dataset that represents file downloads; it has 4 dimensions:
> > > > date, filename, email, and country. (sample data below)
> > > >
> > > > My first goal is to get an idea of the frequency of repeated
> > > > downloads. Let me explain that. Some people tend to download
> > > > multiple times, e.g. if the download fails they keep trying over and
> > > > over. I'm trying to build a histogram that shows the repeat count
> > > > along the x-axis, that is, how many people downloaded once, twice,
> > > > three times, etc. I plan to compare the median of that before and
> > > > after we switched ISPs.
> > > >
> > > > To accomplish this, I'm assuming that I'll first need to combine the
> > > > email and filename columns so as to represent a single download
> > > > attempt by an individual. Does that sound right? Later, it would be
> > > > nice to limit the histogram to a single filename, country, or company.
> > > > I can probably figure that out myself after I understand how to write
> > > > this funky histogram expression.
> > > >
> > > > With the help of Verzani's introductory text, I've learned how to read
> > > > in the CSV data and do some simple tables, like this:
> > > >
> > > > hist(table(d$filename))
> > > > hist(table(d$filename[substring(d$filename, 1, 5)=="file1"]))
> > > > hist(sort(table(d$filename[substring(d$filename, 1, 5)=="file1"])))
> > > >
> > > > Obviously, these commands count the frequency of the files. What I'd
> > > > like to see are the repeats grouped along the x-axis; I'd like to
> > > > find, for all files, the distribution of retries. I hope that makes
> > > > sense. :)
> > > >
> > > > Can someone point me in the right direction? I'm very new to R and to
> > > > statistics, but I write code for a living.
> > > > At this point I'd almost
> > > > be better off writing a program to do this kind of simple counting... but
> > > > I have a feeling R would be so useful if I could just get past the
> > > > initial learning curve.
> > > >
> > > > Thank you in advance,
> > > > Matt
> > > >
> > > > Here's some real data, with the private info replaced :)
> > > >
> > > > d<-read.table(file="C:\\users\\trunnellm\\downloads\\statistics\\downloads.csv",
> > > >     sep=",", quote="\"", header=TRUE)
> > > >
> > > > filename,last_modified,email_addr,country_residence
> > > > file1,3/4/2006 13:54,email1,Korea (South)
> > > > file2,3/4/2006 14:33,email2,United States
> > > > file2,3/4/2006 16:03,email2,United States
> > > > file2,3/4/2006 16:17,email3,United States
> > > > file2,3/4/2006 16:28,email3,United States
> > > > file3,3/4/2006 19:13,email4,United States
> > > > file2,3/4/2006 21:22,email5,India
> > > > file4,3/4/2006 21:46,email6,United States
> > > > file1,3/4/2006 22:04,email7,Japan
> > > > file2,3/4/2006 22:09,email8,Croatia
> > > > file1,3/4/2006 22:22,email7,Japan
> > > > file1,3/4/2006 22:29,email9,India
> > > > file1,3/4/2006 23:06,email6,United States
> > > > file1,3/4/2006 23:33,email6,United States
> > > > file5,3/4/2006 23:44,email10,China
> > > > file1,3/5/2006 0:13,email9,India
> > > > file2,3/5/2006 0:52,email8,Croatia
> > > > file2,3/5/2006 0:54,email8,Croatia
> > > > file2,3/5/2006 1:10,email5,India
> > > > file6,3/5/2006 2:17,email9,India
> > > > file2,3/5/2006 2:24,email11,Italy
> > > > file7,3/5/2006 2:36,email12,Italy
> > > > file8,3/5/2006 2:52,email12,Italy
> > > > file2,3/5/2006 3:09,email13,United Kingdom
> > > > file2,3/5/2006 4:02,email14,India
> > > > file2,3/5/2006 4:07,email14,India
> > > > file2,3/5/2006 4:14,email14,India
> > > > file2,3/5/2006 4:37,email5,India
> > > > file2,3/5/2006 4:44,email15,Belgium
> > > > file1,3/5/2006 5:02,email9,India
> > > > file1,3/5/2006 5:24,email16,Taiwan
> > > > file2,3/5/2006 6:06,email17,Saudi Arabia
> > > > file2,3/5/2006 7:32,email17,Saudi Arabia
> > > > file2,3/5/2006 8:12,email18,Brazil
> > > > file2,3/5/2006 8:26,email18,Brazil
> > > > file2,3/5/2006 9:49,email19,United Kingdom
> > > > file1,3/5/2006 10:49,email11,Italy
> > > > file1,3/5/2006 11:16,email13,United Kingdom
> > > > file1,3/5/2006 11:16,email13,United Kingdom
> > > > file1,3/5/2006 11:45,email13,United Kingdom
> > > > file1,3/5/2006 14:34,email20,Australia
> > > > file9,3/5/2006 14:56,email20,Australia
> > > > file9,3/5/2006 14:56,email20,Australia
> > > > file5,3/5/2006 16:43,email21,United States
> > > > file1,3/5/2006 17:17,email7,Japan
> > > > file2,3/5/2006 17:26,email22,Japan
> > > > file2,3/5/2006 17:27,email22,Japan
> > > > file2,3/5/2006 17:33,email23,China
> > > > file1,3/5/2006 17:45,email22,Japan
> > > > file2,3/5/2006 17:45,email22,Japan
> > > > file2,3/5/2006 17:59,email23,China
> > > > file1,3/5/2006 18:27,email24,Japan
> > > > file1,3/5/2006 18:47,email25,Taiwan
> > > > file2,3/5/2006 18:48,email26,New Zealand
> > > > file2,3/5/2006 19:15,email27,Canada
> > > > file2,3/5/2006 19:23,email28,Canada
> > > > file2,3/5/2006 19:24,email28,Canada
> > > > file10,3/5/2006 19:49,email29,Japan
> > > > file10,3/5/2006 19:52,email29,Japan
> > > > file10,3/5/2006 19:57,email29,Japan
> > > > file2,3/5/2006 20:01,email29,Japan
> > > > file2,3/5/2006 20:02,email29,Japan
> > > > file2,3/5/2006 20:06,email29,Japan
> > > >
> > > > ______________________________________________
> > > > R-help@stat.math.ethz.ch mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained, reproducible code.
> > > >
> > >
> > >
> > > --
> > > Jim Holtman
> > > Cincinnati, OH
> > > +1 513 646 9390
> > >
> > > What is the problem you are trying to solve?
> >
> >
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem you are trying to solve?
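
On the zero column and the row/column names: table(d$filename, d$email_addr) has one cell for every possible filename/email pairing, so the 0 entry in table(counts) appears to be the number of pairings that never occur in the data (253 of the 290 cells here), not a number of retries. Below is a minimal sketch of dropping those cells, using a few rows abridged from the sample data. The names all_cells, retry_dist, by_key and cells are made up for illustration.

```r
# A few rows abridged from the sample data in this thread (illustration only).
d <- data.frame(
  filename   = c("file1", "file2", "file2", "file1", "file1", "file1",
                 "file2", "file2", "file2", "file1", "file1"),
  email_addr = c("email1", "email2", "email2", "email7", "email7", "email7",
                 "email22", "email22", "email22", "email22", "email24"),
  stringsAsFactors = FALSE
)

# One cell per possible filename/email pairing, including pairings never seen.
counts <- table(d$filename, d$email_addr)

# Tabulating every cell counts the empty pairings too: that is the "0" column.
all_cells <- table(counts)

# Keep only pairings that actually occur, then tabulate how often each count appears.
retry_dist <- table(counts[counts > 0])

# Same distribution without building the full grid: count rows per combined key.
by_key <- table(table(paste(d$filename, d$email_addr, sep = "|")))

# counts[1, 1] is a bare number and carries no names; the long form keeps them.
cells <- as.data.frame(counts, stringsAsFactors = FALSE)
names(cells) <- c("filename", "email_addr", "n")
cells <- cells[cells$n > 0, ]

print(all_cells)
print(retry_dist)
print(cells)
```

barplot(retry_dist) should then plot only pairings that were actually downloaded, and dimnames(counts) gives the row and column labels of the original table directly.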
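
On why the two Japan expressions disagree: d$country_residence=="Japan" has one entry per row of d (63 in the sample data), while the tables have one entry per cell (290 for the two-way table, 4350 for the three-way one with 15 countries). Indexing a table with that logical vector recycles it across the cells in storage order, so it seems to pick cells by position rather than by country. With 14 Japan rows, that selects 58 of the 290 cells and 966 of the 4350, which matches the totals of the two outputs (51+4+2+1 and 958+5+2+1). Below is a sketch of two ways to filter that should agree with each other, on a few rows abridged from the sample data. The names is_japan, japan, japan_counts, dist_subset, counts3, japan_slice and dist_slice are hypothetical.

```r
# A few rows abridged from the sample data in this thread (illustration only).
d <- data.frame(
  filename   = c("file1", "file2", "file2", "file1", "file1", "file1",
                 "file2", "file2", "file2", "file1", "file1"),
  email_addr = c("email1", "email2", "email2", "email7", "email7", "email7",
                 "email22", "email22", "email22", "email22", "email24"),
  country_residence = c("Korea (South)", "United States", "United States",
                        rep("Japan", 8)),
  stringsAsFactors = FALSE
)

is_japan <- d$country_residence == "Japan"
counts <- table(d$filename, d$email_addr)

# The mask has one entry per data row; the table has one entry per cell.
# The two do not line up, so counts[is_japan] does not mean "the Japan cells".
mask_length <- length(is_japan)
cell_count  <- length(counts)

# Option 1: filter the rows first, then tabulate what is left.
japan <- d[is_japan, ]
japan_counts <- table(japan$filename, japan$email_addr)
dist_subset <- table(japan_counts[japan_counts > 0])

# Option 2: build the three-way table and take the "Japan" slice by name.
counts3 <- table(d$filename, d$email_addr, d$country_residence)
japan_slice <- counts3[, , "Japan"]
dist_slice <- table(japan_slice[japan_slice > 0])

print(dist_subset)
print(dist_slice)
```

The "> 0" filter is kept in both options because, when the columns are factors, unused levels can remain in the subset and show up as extra zero cells.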