We have a server hit by millions of users. Sever log files contains the user ids of all of them. How do we find the frequency of login of each user. What will the most efficient way to store the users, and access them to find their frequency(The log files are very huge!!)
I thought of using B+ tree indexing with user ids as the key. Leaf nodes will have the pointers to bucket of user ids. One item of bucket will contain user id and frequency of this user. For insertion, search complexity will be ~O(logn) Any potential problem with approach? Are there any better approach to tackle this problem? Not problems really, but opportunities. Here're a few things: 1. They're append-only (no inserts, no deletes). Why then have a B+ tree on the log file itself, which is a structure optimized for both retrieval and inserts? 2. If all that you want to query is the frequency of login of each user, one might ask the following questions: i) How often do you want this frequency data? Is the data required to be updated in real-time? ii) Is it okay to query this data each time your log's rotated? If you require the data as and when it's created (meaning as soon as it is logged), it may be a good idea to change the application receiving the call to explicitly update the frequency, in say, a database. On the other hand, if you want it periodically, you can simply run a script/program that takes the "old" size of the log file as input, and begins scanning the log from that offset, thus saving you time. The frequency data itself can be stored in a hash file / b+ tree or whatever else you'd like (so that it can be updated). If you still wish to index your log files, search for Write-Once Read-Many (WORM) data structures on the web. Apologies for the non-algorithmic nature of my response, if it isn't apt for the group. On Wed, Dec 9, 2009 at 3:54 PM, Rohit Saraf <rohit.kumar.sa...@gmail.com>wrote: > if you don't know abt fibonacci heaps then check out > http://en.wikipedia.org/wiki/Fibonacci_heap > > > > > > Rohit Saraf > Sophomore > Computer Science and Engineering > IIT Bombay > > > > > -- > You received this message because you are subscribed to the Google Groups > "Algorithm Geeks" group. > To post to this group, send email to algoge...@googlegroups.com. > To unsubscribe from this group, send email to > algogeeks+unsubscr...@googlegroups.com<algogeeks%2bunsubscr...@googlegroups.com> > . > For more options, visit this group at > http://groups.google.com/group/algogeeks?hl=en. > -- You received this message because you are subscribed to the Google Groups "Algorithm Geeks" group. To post to this group, send email to algoge...@googlegroups.com. To unsubscribe from this group, send email to algogeeks+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/algogeeks?hl=en.