Re: [R] Histograms with strings, grouped by repeat count (w/ data)

jim holtman Mon, 18 Jun 2007 18:44:29 -0700

You should be using barplot and not hist.  I think this produces what you
want:


x <- "filename,last_modified,email_addr,country_residence
file1,3/4/2006 13:54,email1,Korea (South)
file2,3/4/2006 14:33,email2,United States
file2,3/4/2006 16:03,email2,United States
file2,3/4/2006 16:17,email3,United States
file2,3/4/2006 16:28,email3,United States
file3,3/4/2006 19:13,email4,United States
file2,3/4/2006 21:22,email5,India
file4,3/4/2006 21:46,email6,United States
file1,3/4/2006 22:04,email7,Japan
file2,3/4/2006 22:09,email8,Croatia
file1,3/4/2006 22:22,email7,Japan
file1,3/4/2006 22:29,email9,India
file1,3/4/2006 23:06,email6,United States
file1,3/4/2006 23:33,email6,United States
file5,3/4/2006 23:44,email10,China
file1,3/5/2006 0:13,email9,India
file2,3/5/2006 0:52,email8,Croatia
file2,3/5/2006 0:54,email8,Croatia
file2,3/5/2006 1:10,email5,India
file6,3/5/2006 2:17,email9,India
file2,3/5/2006 2:24,email11,Italy
file7,3/5/2006 2:36,email12,Italy
file8,3/5/2006 2:52,email12,Italy
file2,3/5/2006 3:09,email13,United Kingdom
file2,3/5/2006 4:02,email14,India
file2,3/5/2006 4:07,email14,India
file2,3/5/2006 4:14,email14,India
file2,3/5/2006 4:37,email5,India
file2,3/5/2006 4:44,email15,Belgium
file1,3/5/2006 5:02,email9,India
file1,3/5/2006 5:24,email16,Taiwan
file2,3/5/2006 6:06,email17,Saudi Arabia
file2,3/5/2006 7:32,email17,Saudi Arabia
file2,3/5/2006 8:12,email18,Brazil
file2,3/5/2006 8:26,email18,Brazil
file2,3/5/2006 9:49,email19,United Kingdom
file1,3/5/2006 10:49,email11,Italy
file1,3/5/2006 11:16,email13,United Kingdom
file1,3/5/2006 11:16,email13,United Kingdom
file1,3/5/2006 11:45,email13,United Kingdom
file1,3/5/2006 14:34,email20,Australia
file9,3/5/2006 14:56,email20,Australia
file9,3/5/2006 14:56,email20,Australia
file5,3/5/2006 16:43,email21,United States
file1,3/5/2006 17:17,email7,Japan
file2,3/5/2006 17:26,email22,Japan
file2,3/5/2006 17:27,email22,Japan
file2,3/5/2006 17:33,email23,China
file1,3/5/2006 17:45,email22,Japan
file2,3/5/2006 17:45,email22,Japan
file2,3/5/2006 17:59,email23,China
file1,3/5/2006 18:27,email24,Japan
file1,3/5/2006 18:47,email25,Taiwan
file2,3/5/2006 18:48,email26,New Zealand
file2,3/5/2006 19:15,email27,Canada
file2,3/5/2006 19:23,email28,Canada
file2,3/5/2006 19:24,email28,Canada
file10,3/5/2006 19:49,email29,Japan
file10,3/5/2006 19:52,email29,Japan
file10,3/5/2006 19:57,email29,Japan
file2,3/5/2006 20:01,email29,Japan
file2,3/5/2006 20:02,email29,Japan
file2,3/5/2006 20:06,email29,Japan"
d <- read.csv(textConnection(x))
# plot counts for all the files
barplot(table(d$filename), main="All Files", las=2)
# generate plots for each file name showing which emails used them
counts <- table(d$filename, d$email_addr)
for (i in seq_len(nrow(counts))){
    .index <- which(counts[i,] > 0)
    barplot(counts[i, .index], las=2,
        names.arg=colnames(counts)[.index], main=rownames(counts)[i])
}
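
The plots above show counts per file and per email.  If what is wanted is
the distribution of repeat counts (how many people downloaded once, twice,
three times, ...), the following is a minimal sketch.  It assumes that one
"download attempt by an individual" is an (email_addr, filename) pair, as
suggested in the original question, and it uses only a small subset of the
sample data so that it runs on its own.  The names per_pair, attempts and
repeat_dist are made up for this illustration.

```r
# Small subset of the sample data, so the sketch is self-contained.
x <- "filename,last_modified,email_addr,country_residence
file2,3/4/2006 14:33,email2,United States
file2,3/4/2006 16:03,email2,United States
file2,3/4/2006 16:17,email3,United States
file2,3/4/2006 16:28,email3,United States
file3,3/4/2006 19:13,email4,United States
file2,3/5/2006 4:02,email14,India
file2,3/5/2006 4:07,email14,India
file2,3/5/2006 4:14,email14,India"
d <- read.csv(text = x, stringsAsFactors = FALSE)

# Downloads per (email, file) pair; keep only pairs that actually occur.
per_pair <- table(d$email_addr, d$filename)
attempts <- per_pair[per_pair > 0]

# Tabulate the counts themselves: how many pairs were seen once, twice, ...
repeat_dist <- table(attempts)
barplot(repeat_dist, main = "Repeat downloads",
        xlab = "Downloads per email/file pair", ylab = "Number of pairs")

# Median number of downloads per pair.
median(attempts)
```

To limit this to a single file or country, subset d before tabulating,
for example d[d$filename == "file2", ].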


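For the before/after comparison of medians mentioned in the question, one
possible sketch is below.  The cutoff date is purely hypothetical (the
thread does not say when the ISP was switched), and repeat_counts is a
helper invented for this example.  Note that a run of retries that
straddles the cutoff is split between the two periods by this approach.

```r
# Small subset of the sample data, so the sketch is self-contained.
x <- "filename,last_modified,email_addr,country_residence
file2,3/4/2006 14:33,email2,United States
file2,3/4/2006 16:03,email2,United States
file2,3/4/2006 16:17,email3,United States
file2,3/4/2006 16:28,email3,United States
file3,3/4/2006 19:13,email4,United States
file2,3/5/2006 4:02,email14,India
file2,3/5/2006 4:07,email14,India
file2,3/5/2006 4:14,email14,India
file2,3/5/2006 6:06,email17,Saudi Arabia
file2,3/5/2006 7:32,email17,Saudi Arabia"
d <- read.csv(text = x, stringsAsFactors = FALSE)

# Parse the month/day/year timestamps; UTC is an assumption.
when <- as.POSIXct(d$last_modified, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Hypothetical switch-over time; replace with the real one.
cutoff <- as.POSIXct("2006-03-05 00:00", format = "%Y-%m-%d %H:%M", tz = "UTC")
d$period <- ifelse(when < cutoff, "before", "after")

# Downloads per (email, file) pair within one data frame.
repeat_counts <- function(df) {
  tab <- table(df$email_addr, df$filename)
  as.vector(tab[tab > 0])
}

# Median repeat count for each period.
medians <- sapply(split(d, d$period), function(df) median(repeat_counts(df)))
print(medians)
```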

On 6/18/07, Matthew Trunnell <[EMAIL PROTECTED]> wrote:
>
> Hello R gurus,
>
> I just spent my first weekend wrestling with R, but so far have come
> up empty handed.
>
> I have a dataset that represents file downloads; it has 4 dimensions:
> date, filename, email, and country.  (sample data below)
>
> My first goal is to get an idea of the frequency of repeated
> downloads.  Let me explain that.  Some people tend to download
> multiple times, e.g. if the download fails they keep trying over and
> over.  I'm trying to build a histogram that shows the repeat count
> along the x-axis, that is, how many people downloaded once, twice,
> three times, etc.  I plan to compare the median of that before and
> after we switched ISPs.
>
> To accomplish this, I'm assuming that I'll first need to combine the
> email and filename columns so as to represent a single download
> attempt by an individual.  Does that sound right?  Later, it would be
> nice to limit the histogram to a single filename, country, or company.
> I can probably figure that out myself after I understand how to write
> this funky histogram expression.
>
> With the help of Verzani's introductory text, I've learned how to read
> in the CSV data and do some simple tables, like this:
>
> hist(table(d$filename))
> hist(table(d$filename[substring(d$filename, 1, 5)=="file1"]))
> hist(sort(table(d$filename[substring(d$filename, 1, 5)=="file1"])))
>
> Obviously, these commands count the frequency of the files.  What I'd
> like to see are the repeats grouped along the x-axis;  I'd like to
> find, for all files, the distribution of retries.  I hope that makes
> sense. :)
>
> Can someone point me in the right direction?  I'm very new to R and to
> statistics, but I write code for a living.  At this point I'd almost
> be better off writing a program to do this kind of simple counting... but
> I have a feeling R would be so useful if I could just get past the
> initial learning curve.
>
> Thank you in advance,
> Matt
>
> Here's some real data, with the private info replaced :)
>
> d <- read.table(file="C:\\users\\trunnellm\\downloads\\statistics\\downloads.csv",
>                 sep=",", quote="\"", header=TRUE)
>
> filename,last_modified,email_addr,country_residence
> file1,3/4/2006 13:54,email1,Korea (South)
> file2,3/4/2006 14:33,email2,United States
> file2,3/4/2006 16:03,email2,United States
> file2,3/4/2006 16:17,email3,United States
> file2,3/4/2006 16:28,email3,United States
> file3,3/4/2006 19:13,email4,United States
> file2,3/4/2006 21:22,email5,India
> file4,3/4/2006 21:46,email6,United States
> file1,3/4/2006 22:04,email7,Japan
> file2,3/4/2006 22:09,email8,Croatia
> file1,3/4/2006 22:22,email7,Japan
> file1,3/4/2006 22:29,email9,India
> file1,3/4/2006 23:06,email6,United States
> file1,3/4/2006 23:33,email6,United States
> file5,3/4/2006 23:44,email10,China
> file1,3/5/2006 0:13,email9,India
> file2,3/5/2006 0:52,email8,Croatia
> file2,3/5/2006 0:54,email8,Croatia
> file2,3/5/2006 1:10,email5,India
> file6,3/5/2006 2:17,email9,India
> file2,3/5/2006 2:24,email11,Italy
> file7,3/5/2006 2:36,email12,Italy
> file8,3/5/2006 2:52,email12,Italy
> file2,3/5/2006 3:09,email13,United Kingdom
> file2,3/5/2006 4:02,email14,India
> file2,3/5/2006 4:07,email14,India
> file2,3/5/2006 4:14,email14,India
> file2,3/5/2006 4:37,email5,India
> file2,3/5/2006 4:44,email15,Belgium
> file1,3/5/2006 5:02,email9,India
> file1,3/5/2006 5:24,email16,Taiwan
> file2,3/5/2006 6:06,email17,Saudi Arabia
> file2,3/5/2006 7:32,email17,Saudi Arabia
> file2,3/5/2006 8:12,email18,Brazil
> file2,3/5/2006 8:26,email18,Brazil
> file2,3/5/2006 9:49,email19,United Kingdom
> file1,3/5/2006 10:49,email11,Italy
> file1,3/5/2006 11:16,email13,United Kingdom
> file1,3/5/2006 11:16,email13,United Kingdom
> file1,3/5/2006 11:45,email13,United Kingdom
> file1,3/5/2006 14:34,email20,Australia
> file9,3/5/2006 14:56,email20,Australia
> file9,3/5/2006 14:56,email20,Australia
> file5,3/5/2006 16:43,email21,United States
> file1,3/5/2006 17:17,email7,Japan
> file2,3/5/2006 17:26,email22,Japan
> file2,3/5/2006 17:27,email22,Japan
> file2,3/5/2006 17:33,email23,China
> file1,3/5/2006 17:45,email22,Japan
> file2,3/5/2006 17:45,email22,Japan
> file2,3/5/2006 17:59,email23,China
> file1,3/5/2006 18:27,email24,Japan
> file1,3/5/2006 18:47,email25,Taiwan
> file2,3/5/2006 18:48,email26,New Zealand
> file2,3/5/2006 19:15,email27,Canada
> file2,3/5/2006 19:23,email28,Canada
> file2,3/5/2006 19:24,email28,Canada
> file10,3/5/2006 19:49,email29,Japan
> file10,3/5/2006 19:52,email29,Japan
> file10,3/5/2006 19:57,email29,Japan
> file2,3/5/2006 20:01,email29,Japan
> file2,3/5/2006 20:02,email29,Japan
> file2,3/5/2006 20:06,email29,Japan
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


