Re: [robinhood-support] [robinhood-devel] 2.5 Database Schema - Externel Tool - Traverse Filesystem

LEIBOVICI Thomas Thu, 12 Dec 2013 07:52:25 -0800

Hi Adam,

You can inspire from the requests performed by rbh-report
(it displays them when running it with -l FULL).


It can sort directories using various criteria (count, avg file size in 
the directory, etc...)
rbh-report --topdirs [--by-count | --by-avg-size] [--count-min xxx] -l FULL

Maybe you will find interesting things for your need - or if you have a 
better request, you can  suggest it for rbh-report.

Let me now when your multifs interface is almost ready. I'll be happy to 
test it (even with rbh2.4).

Regards,
Thomas


On 12/12/13 01:57, Adam Brenner wrote:
> After some head banging and going outside (too bright), I found a much
> simpler, and quicker solution!
>
> Rather than traversing the filesystem from the head node, down. I took
> advantage of the type column in the database and grabbed every user's
> directory. From that point, I simply did a lookup on the index
> parent_id where each users's directory was the key. A simple MySQL
> COUNT() and SUM() did the trick.
>
> To put this into a human time perspective, 8+ hours, to ~3+ minutes.
>
> -Adam
>
> --
> Adam Brenner
> Computer Science, Undergraduate Student
> Donald Bren School of Information and Computer Sciences
>
> Research Computing Support
> Office of Information Technology
> http://www.oit.uci.edu/rcs/
>
> University of California, Irvine
> www.ics.uci.edu/~aebrenne/
> [email protected]
>
>
> On Wed, Dec 11, 2013 at 2:25 PM, Adam Brenner <[email protected]> wrote:
>> Howdy All,
>>
>> I am building a tool that uses RBH's database scheme to traverse our
>> scanned filesystem and report the users, along with all the folders
>> that contain over N amount of entries in them (1000 for example). In
>> addition, we will be reporting on old file usage based on mtime (we do
>> not support atime on our filesystem). These different reports will
>> then automatically email the user, us, etc. about it.
>>
>> We have 75 million entires in total spread across a few RBH databases
>> (one database for each filesystem).
>>
>>
>>
>> The part I am having trouble with, is _efficiently_ starting from the
>> lowest parent_id (6BCDE980:2), the root of the filesystem, and
>> traversing all the way to the lowest node on the graph. Effectlying
>> saying, given this parent_id of 6BCDE980:2, output all of its
>> children, and their children until no more children exists.
>>
>> I believe part of the issue is perhaps my lack of understanding self
>> referencing databases, where id of one entry, is a parent_id of
>> another entry. I do not believe MySQL can handle this sort of data
>> structure. Is this correct? Have others done something similar,
>> traversing the RBH scheme from the root node to the bottom?
>>
>> Below is the pseudocode that I have written which I is the brute force
>> / naive solution for RBH 2.4.3 SCHEME. Any sort of comments will be
>> greatly appreciated.
>>
>> function wrapper (string startPath) {
>>
>>      query := SELECT usr.id, usr.fullpath AS fullpath FROM ENTRIES AS
>> grp LEFT JOIN ENTRIES AS usr ON usr.parent_id = grp.id WHERE
>> grp.fullpath = 'startPath' AND usr.fullpath != 'startPath'
>>      // ^^ Produces one level depth of children on the parent_id of startPath
>>
>>      foreach result in query {
>>          find1000Entries(result)
>>          output what is on the stack and email users
>>      }
>>
>> }
>>
>> function find1000Entries (string startPath) {
>>
>>      query := SELECT COUNT(usr.fullpath) AS zotCount FROM ENTRIES AS
>> grp LEFT JOIN ENTRIES AS usr ON usr.parent_id = grp.id WHERE
>> grp.fullpath = 'startPath'
>>      // ^^ Produces one level depth of children on the parent_id of startPath
>>
>>      if(zotCount from query >= 1000)
>>          report a match and push to stack
>>
>>      queryNext := SELECT usr.id, usr.fullpath AS fullpath FROM ENTRIES
>> AS grp LEFT JOIN ENTRIES AS usr ON usr.parent_id = grp.id WHERE
>> grp.fullpath = 'startPath' AND usr.type = 'dir'
>>      // ^^ Produces one level depth of children on the parent_id of startPath
>>      // that are only directories
>>
>>      foreach result in queryNext {
>>          find1000Entries(result)
>>          ^^^ Call this function again, but now on directory given by 
>> queryNext
>>          ^^^ keep doing this till no more entires exist
>>
>>      }
>>
>> }
>>
>> --
>> Adam Brenner
>> Computer Science, Undergraduate Student
>> Donald Bren School of Information and Computer Sciences
>>
>> Research Computing Support
>> Office of Information Technology
>> http://www.oit.uci.edu/rcs/
>>
>> University of California, Irvine
>> www.ics.uci.edu/~aebrenne/
>> [email protected]
> ------------------------------------------------------------------------------
> Rapidly troubleshoot problems before they affect your business. Most IT
> organizations don't have a clear picture of how application performance
> affects their revenue. With AppDynamics, you get 100% visibility into your
> Java,.NET, & PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro!
> http://pubads.g.doubleclick.net/gampad/clk?id=84349831&iu=/4140/ostg.clktrk
> _______________________________________________
> robinhood-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/robinhood-devel


------------------------------------------------------------------------------
Rapidly troubleshoot problems before they affect your business. Most IT 
organizations don't have a clear picture of how application performance 
affects their revenue. With AppDynamics, you get 100% visibility into your 
Java,.NET, & PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro!
http://pubads.g.doubleclick.net/gampad/clk?id=84349831&iu=/4140/ostg.clktrk
_______________________________________________
robinhood-support mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/robinhood-support

Re: [robinhood-support] [robinhood-devel] 2.5 Database Schema - Externel Tool - Traverse Filesystem

Reply via email to