Hi Adam, You can inspire from the requests performed by rbh-report (it displays them when running it with -l FULL).
It can sort directories using various criteria (count, avg file size in the directory, etc...) rbh-report --topdirs [--by-count | --by-avg-size] [--count-min xxx] -l FULL Maybe you will find interesting things for your need - or if you have a better request, you can suggest it for rbh-report. Let me now when your multifs interface is almost ready. I'll be happy to test it (even with rbh2.4). Regards, Thomas On 12/12/13 01:57, Adam Brenner wrote: > After some head banging and going outside (too bright), I found a much > simpler, and quicker solution! > > Rather than traversing the filesystem from the head node, down. I took > advantage of the type column in the database and grabbed every user's > directory. From that point, I simply did a lookup on the index > parent_id where each users's directory was the key. A simple MySQL > COUNT() and SUM() did the trick. > > To put this into a human time perspective, 8+ hours, to ~3+ minutes. > > -Adam > > -- > Adam Brenner > Computer Science, Undergraduate Student > Donald Bren School of Information and Computer Sciences > > Research Computing Support > Office of Information Technology > http://www.oit.uci.edu/rcs/ > > University of California, Irvine > www.ics.uci.edu/~aebrenne/ > [email protected] > > > On Wed, Dec 11, 2013 at 2:25 PM, Adam Brenner <[email protected]> wrote: >> Howdy All, >> >> I am building a tool that uses RBH's database scheme to traverse our >> scanned filesystem and report the users, along with all the folders >> that contain over N amount of entries in them (1000 for example). In >> addition, we will be reporting on old file usage based on mtime (we do >> not support atime on our filesystem). These different reports will >> then automatically email the user, us, etc. about it. >> >> We have 75 million entires in total spread across a few RBH databases >> (one database for each filesystem). >> >> >> >> The part I am having trouble with, is _efficiently_ starting from the >> lowest parent_id (6BCDE980:2), the root of the filesystem, and >> traversing all the way to the lowest node on the graph. Effectlying >> saying, given this parent_id of 6BCDE980:2, output all of its >> children, and their children until no more children exists. >> >> I believe part of the issue is perhaps my lack of understanding self >> referencing databases, where id of one entry, is a parent_id of >> another entry. I do not believe MySQL can handle this sort of data >> structure. Is this correct? Have others done something similar, >> traversing the RBH scheme from the root node to the bottom? >> >> Below is the pseudocode that I have written which I is the brute force >> / naive solution for RBH 2.4.3 SCHEME. Any sort of comments will be >> greatly appreciated. >> >> function wrapper (string startPath) { >> >> query := SELECT usr.id, usr.fullpath AS fullpath FROM ENTRIES AS >> grp LEFT JOIN ENTRIES AS usr ON usr.parent_id = grp.id WHERE >> grp.fullpath = 'startPath' AND usr.fullpath != 'startPath' >> // ^^ Produces one level depth of children on the parent_id of startPath >> >> foreach result in query { >> find1000Entries(result) >> output what is on the stack and email users >> } >> >> } >> >> function find1000Entries (string startPath) { >> >> query := SELECT COUNT(usr.fullpath) AS zotCount FROM ENTRIES AS >> grp LEFT JOIN ENTRIES AS usr ON usr.parent_id = grp.id WHERE >> grp.fullpath = 'startPath' >> // ^^ Produces one level depth of children on the parent_id of startPath >> >> if(zotCount from query >= 1000) >> report a match and push to stack >> >> queryNext := SELECT usr.id, usr.fullpath AS fullpath FROM ENTRIES >> AS grp LEFT JOIN ENTRIES AS usr ON usr.parent_id = grp.id WHERE >> grp.fullpath = 'startPath' AND usr.type = 'dir' >> // ^^ Produces one level depth of children on the parent_id of startPath >> // that are only directories >> >> foreach result in queryNext { >> find1000Entries(result) >> ^^^ Call this function again, but now on directory given by >> queryNext >> ^^^ keep doing this till no more entires exist >> >> } >> >> } >> >> -- >> Adam Brenner >> Computer Science, Undergraduate Student >> Donald Bren School of Information and Computer Sciences >> >> Research Computing Support >> Office of Information Technology >> http://www.oit.uci.edu/rcs/ >> >> University of California, Irvine >> www.ics.uci.edu/~aebrenne/ >> [email protected] > ------------------------------------------------------------------------------ > Rapidly troubleshoot problems before they affect your business. Most IT > organizations don't have a clear picture of how application performance > affects their revenue. With AppDynamics, you get 100% visibility into your > Java,.NET, & PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro! > http://pubads.g.doubleclick.net/gampad/clk?id=84349831&iu=/4140/ostg.clktrk > _______________________________________________ > robinhood-devel mailing list > [email protected] > https://lists.sourceforge.net/lists/listinfo/robinhood-devel ------------------------------------------------------------------------------ Rapidly troubleshoot problems before they affect your business. Most IT organizations don't have a clear picture of how application performance affects their revenue. With AppDynamics, you get 100% visibility into your Java,.NET, & PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro! http://pubads.g.doubleclick.net/gampad/clk?id=84349831&iu=/4140/ostg.clktrk _______________________________________________ robinhood-support mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/robinhood-support
