Hello,
I have logging information for multiple machines, which I am trying to
summarize and graph. So far, I process each host individually, but I would
like to summarize the user count across multiple hosts. I want to answer the
question "How many unique users logged in on a certain day across a group of
machines?"
I'm not sure how to extend the data frame and the analysis to cover
multiple hosts, though. I'm still getting a feel for using R.
Here is a snippet of data for one host. The user_count column is generated
from the users column using my custom function usercount(). The samples
are taken roughly once per minute, and only samples that differ from the
previous one are recorded (so I use na.locf() to uncompress the data). Two
samples may occur within the same minute, and sample times rarely line up
exactly.
Here is the original data before I turn it into a zoo series and run
na.locf() over it so I can aggregate a single host by day. I'm open to a
better way.
> foo
                users            datetime user_count
1        user1& user2 2007-03-29 19:16:30          2
2        user1& user2 2007-03-31 00:04:46          2
3        user1& user2 2007-04-02 11:49:20          2
4        user1& user2 2007-04-02 12:02:04          2
5        user1& user2 2007-04-02 12:44:02          2
6 user1& user2& user3 2007-04-02 16:34:05          3
> dput(foo)
structure(list(users = c("user1& user2", "user1& user2", "user1& user2",
    "user1& user2", "user1& user2", "user1& user2& user3"),
    datetime = structure(c(1175210190, 1175313886, 1175528960,
        1175529724, 1175532242, 1175546045), class = c("POSIXt", "POSIXct"),
        tzone = "US/Eastern"),
    user_count = c(2, 2, 2, 2, 2, 3)),
    .Names = c("users", "datetime", "user_count"),
    row.names = c(NA, 6L), class = "data.frame")
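Summing user_count across hosts would count a user who is logged into two machines twice, so counting unique users needs the user names themselves. Below is a minimal base-R sketch of one possible approach, under stated assumptions: every host's data frame has the same users and datetime columns as foo, a sample stays in effect until that host's next sample (the same idea as filling forward with na.locf()), and the hosts are passed as a named list. The helper names split_users(), expand_host() and unique_users_per_day() are hypothetical, not existing functions.

```r
# Sketch: unique users per calendar day across several hosts (base R only).
# Assumptions (not from the original post):
#   * each host's data frame has `users` ("user1& user2" style strings)
#     and `datetime` (POSIXct) columns, like foo;
#   * a sample stays in effect until that host's next sample, so it covers
#     every day from its own date through the next sample's date;
#     the last sample covers only its own day.

# Hypothetical helper: split "user1& user2" into c("user1", "user2").
split_users <- function(s) {
  trimws(strsplit(as.character(s), "&", fixed = TRUE)[[1]])
}

# Expand one host's change-only log to one row per (date, user).
expand_host <- function(df, host, tz) {
  df <- df[order(df$datetime), , drop = FALSE]
  day <- as.Date(df$datetime, tz = tz)
  n <- nrow(df)
  # Day of the next sample; the last row ends on its own day.
  end <- c(day[-1], day[n])
  rows <- lapply(seq_len(n), function(i) {
    users <- split_users(df$users[i])
    days <- seq(day[i], end[i], by = "day")
    data.frame(date = rep(days, each = length(users)),
               user = rep(users, times = length(days)),
               host = host, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# `hosts` must be a *named* list of data frames, one per machine.
# Returns one row per date with the number of distinct user names.
unique_users_per_day <- function(hosts, tz) {
  long <- do.call(rbind, Map(expand_host, hosts, names(hosts),
                             MoreArgs = list(tz = tz)))
  counts <- tapply(long$user, long$date, function(u) length(unique(u)))
  data.frame(date = as.Date(names(counts)), users = as.vector(counts))
}
```

For example, with a second host's data frame bar (hypothetical), unique_users_per_day(list(host1 = foo, host2 = bar), tz = "US/Eastern") would return one row per calendar day. Expanding to days rather than to a per-minute grid keeps the data small; the trade-off is that a sample recorded exactly at midnight also credits its predecessor's users to that day.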
Thanks,
Jason
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help