That's correct: those users have been logged in or have had processes running on this machine for four days. The machines in question are time-sharing Linux servers used by college students and professors, and multi-day jobs are common.

The "last" command does what you suggest, but it doesn't capture processes left running in the background after a user logs out. This data is simpler and is collected across both Windows and Linux hosts. Sessions are somewhat ambiguous for our purposes; we just care about who is running processes on a machine at a given time. We also record the name of each process a user is running so that we can gauge how often applications are used, and by whom. For this analysis, though, I'm not worried about which processes are running, only about unique users per day. I have almost four years of historical data for some machines in this format.

We have multiple tools written in different languages that parse this data. I'm writing one that does better graphing.

On 01/11/2011 11:39 AM, jim holtman wrote:
I am not sure exactly what your data represents.  For example, from
looking at the data it appears that user1 and user2 have been logged
on for about 4 days; is that what the data is saying?  If you are
keeping track of users, why not write out a file that has the
start/end time for each user's session.  The first time you see them,
put an entry in a table and as soon as they don't show up in your
sample, write out a record for them.  With that information it is easy
to create a report of the number of unique people over time.
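
For example, something like this (the session table and its column names
are made up for illustration, with times truncated to whole days) turns
session intervals into a daily unique-user count:

## one row per (user, start, end) session interval
sessions <- data.frame(
  user  = c("user1", "user1", "user2"),
  start = as.Date(c("2007-03-29", "2007-04-02", "2007-03-29")),
  end   = as.Date(c("2007-03-31", "2007-04-02", "2007-04-02"))
)
## expand each interval to the calendar days it covers
days <- do.call(rbind, lapply(seq_len(nrow(sessions)), function(i) {
  data.frame(user = sessions$user[i],
             day  = seq(sessions$start[i], sessions$end[i], by = "day"))
}))
## count unique users per day
aggregate(user ~ day, data = unique(days), FUN = length)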

On Tue, Jan 11, 2011 at 10:47 AM, Jason Edgecombe
<ja...@rampaginggeek.com>  wrote:
Hello,

I have logging information for multiple machines, which I am trying to
summarize and graph. So far, I process each host individually, but I would
like to summarize the user count across multiple hosts. I want to answer the
question, "How many unique users logged in on a certain day across a group
of machines?"

I'm not quite sure how to scale the data frame and analysis to summarize
multiple hosts, though. I'm still getting a feel for using R.

Here is a snippet of data for one host. The user_count column is generated
from the users column by my custom function usercount(). Samples are taken
roughly once per minute, and only unique samples are recorded (i.e., use
na.locf() to uncompress the data). Samples may occur twice in the same
minute and rarely align on the same timestamp.
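
usercount() isn't shown here, but a minimal version consistent with the
data would split the users string on "&" and count the trimmed names:

usercount <- function(users) {
  ## users is a character vector like "user1&  user2&  user3"
  sapply(strsplit(users, "&"), function(u) length(unique(trimws(u))))
}
usercount("user1&  user2&  user3")   # 3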

Here is the original data before I turn it into a zoo series and run
na.locf() over it so that I can aggregate a single host by day. I'm open to
a better way.
foo
                  users            datetime user_count
1         user1&  user2 2007-03-29 19:16:30          2
2         user1&  user2 2007-03-31 00:04:46          2
3         user1&  user2 2007-04-02 11:49:20          2
4         user1&  user2 2007-04-02 12:02:04          2
5         user1&  user2 2007-04-02 12:44:02          2
6 user1&  user2&  user3 2007-04-02 16:34:05          3

dput(foo)
structure(list(users = c("user1&  user2", "user1&  user2",
    "user1&  user2", "user1&  user2", "user1&  user2",
    "user1&  user2&  user3"),
    datetime = structure(c(1175210190, 1175313886, 1175528960,
        1175529724, 1175532242, 1175546045),
        class = c("POSIXt", "POSIXct"), tzone = "US/Eastern"),
    user_count = c(2, 2, 2, 2, 2, 3)),
    .Names = c("users", "datetime", "user_count"),
    row.names = c(NA, 6L), class = "data.frame")
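
To make the goal concrete, here is a rough sketch of the kind of summary
I'm after, assuming each host's log is a data frame shaped like foo (bar
below stands in for a second, hypothetical host):

library(zoo)

## expand one host's compressed log into unique (day, user) pairs;
## assumes timestamps are unique within a host
day_users <- function(d) {
  z <- zoo(d$users, d$datetime)
  ## merge in a daily grid and carry the last sample forward so that
  ## multi-day sessions are counted on every day they span
  grid <- seq(start(z), end(z), by = "day")
  zz <- na.locf(merge(z, zoo(, grid)))
  per_day <- strsplit(as.character(coredata(zz)), "&")
  unique(data.frame(
    day  = rep(as.Date(index(zz), tz = "US/Eastern"),
               sapply(per_day, length)),
    user = trimws(unlist(per_day))
  ))
}

## unique users per day across a group of machines
both <- unique(rbind(day_users(foo), day_users(bar)))
aggregate(user ~ day, data = both, FUN = length)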


Thanks,
Jason
