We have a server hit by millions of users. Sever log files contains
the user ids of all of them. How do we find the frequency of login of
each user. What will the most efficient way to store the users, and
access them to find their frequency(The log files are very huge!!)

I thought of using B+ tree indexing with user ids as the key. Leaf
nodes will have the pointers to bucket of user ids. One item of bucket
will contain user id and frequency of this user.
  For insertion, search complexity will be ~O(logn)

Any potential problem with approach? Are there any better approach to
tackle this problem?

Not problems really, but opportunities. Here're a few things:
1. They're append-only (no inserts, no deletes). Why then have a B+ tree on
the log file itself, which is a structure optimized for both retrieval and
inserts?
2. If all that you want to query is the frequency of login of each user, one
might ask the following questions:
    i) How often do you want this frequency data? Is the data required to be
updated in real-time?
  ii) Is it okay to query this data each time your log's rotated?
If you require the data as and when it's created (meaning as soon as it is
logged), it may be a good idea to change the application receiving the call
to explicitly update the frequency, in say, a database.

On the other hand, if you want it periodically, you can simply run a
script/program that takes the "old" size of the log file as input, and
begins scanning the log from that offset, thus saving you time.
The frequency data itself can be stored in a hash file / b+ tree or whatever
else you'd like (so that it can be updated).

If you still wish to index your log files, search for Write-Once Read-Many
(WORM) data structures on the web.

Apologies for the non-algorithmic nature of my response, if it isn't apt for
the group.

On Wed, Dec 9, 2009 at 3:54 PM, Rohit Saraf <rohit.kumar.sa...@gmail.com>wrote:

> if you don't know abt fibonacci heaps then check out
> http://en.wikipedia.org/wiki/Fibonacci_heap
>
>
>
>
>
> Rohit Saraf
> Sophomore
> Computer Science and Engineering
> IIT Bombay
>
>
>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "Algorithm Geeks" group.
> To post to this group, send email to algoge...@googlegroups.com.
> To unsubscribe from this group, send email to
> algogeeks+unsubscr...@googlegroups.com<algogeeks%2bunsubscr...@googlegroups.com>
> .
> For more options, visit this group at
> http://groups.google.com/group/algogeeks?hl=en.
>

--

You received this message because you are subscribed to the Google Groups 
"Algorithm Geeks" group.
To post to this group, send email to algoge...@googlegroups.com.
To unsubscribe from this group, send email to 
algogeeks+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/algogeeks?hl=en.


Reply via email to