On 01/16/14 22:02, Brock Palen wrote:
> Actually for most of the scan I see stats like this:
>
> 2014/01/14 09:18:02 robinhood@flux-xfer1[22595/1]: STATS | ==== EntryProcessor Pipeline Stats ===
> 2014/01/14 09:18:02 robinhood@flux-xfer1[22595/1]: STATS | Threads waiting: 7
> 2014/01/14 09:18:02 robinhood@flux-xfer1[22595/1]: STATS | Id constraints count: 10000 (hash min=0/max=6/avg=1.3)
> 2014/01/14 09:18:02 robinhood@flux-xfer1[22595/1]: STATS | 0: STAGE_GET_FID | Wait: 0 | Curr: 0 | Done: 0 | Total: 0 | ms/op: 0.00
> 2014/01/14 09:18:02 robinhood@flux-xfer1[22595/1]: STATS | 1: STAGE_GET_INFO_DB | Wait: 0 | Curr: 0 | Done: 0 | Total: 122707382 | ms/op: 0.69
> 2014/01/14 09:18:02 robinhood@flux-xfer1[22595/1]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 0 | Done: 0 | Total: 122707382 | ms/op: 0.10
> 2014/01/14 09:18:02 robinhood@flux-xfer1[22595/1]: STATS | 3: STAGE_REPORTING | Wait: 0 | Curr: 0 | Done: 0 | Total: 122707382 | ms/op: 0.00
> 2014/01/14 09:18:02 robinhood@flux-xfer1[22595/1]: STATS | 4: STAGE_PRE_APPLY | Wait: 0 | Curr: 0 | Done: 0 | Total: 122707382 | ms/op: 0.00
> 2014/01/14 09:18:02 robinhood@flux-xfer1[22595/1]: STATS | 5: STAGE_DB_APPLY | Wait: 9391 | Curr: 1 | Done: 0 | Total: 122697382, batches: 3565010 (avg size: 33) | ms/op: 0.46
> 2014/01/14 09:18:02 robinhood@flux-xfer1[22595/1]: STATS | 6: STAGE_RM_OLD_ENTRIES | Wait: 0 | Curr: 0 | Done: 0 | Total: 0 | ms/op: 0.00
> 2014/01/14 09:18:02 robinhood@flux-xfer1[22595/1]: STATS | DB ops: get=105370281/ins=13064956/upd=109632426/rm=0
>
> Should I still change the thread count?

As it looks, the DB is the limiting factor here, so you will see no gain from changing the number of scan threads.
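
A rough back-of-the-envelope from the stats above supports this (it assumes the ms/op figure for STAGE_DB_APPLY is per entry and that the DB-apply work is largely serialized on the database):

    122,697,382 entries x 0.46 ms/op  =  about 56,400 s  =  about 15.7 hours

which already accounts for most of the 17h 09min scan duration reported below, while every other stage shows Wait: 0 and 9391 entries are queued in front of STAGE_DB_APPLY.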
Regards,
Thomas

> The end reported results were:
>
> [root@flux-xfer1 scratch]# rbh-report-scr -a
>
> Filesystem scan activity:
>
>     Current scan interval:   7.0d
>
>     Previous filesystem scan:
>         start:               2014/01/13 16:25:12
>         duration:            07min 21s
>
>     Last filesystem scan:
>         status:              done
>         start:               2014/01/13 16:33:01
>         end:                 2014/01/14 09:41:56
>         duration:            17h 08min 55s
>
>     Statistics:
>         entries scanned:     123692129
>         errors:              2
>         timeouts:            0
>         # threads:           8
>         average speed:       2030.46 entries/sec
>
> Which is good enough for us.
>
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> [email protected]
> (734)936-1985
>
>
> On Jan 15, 2014, at 3:02 AM, LEIBOVICI Thomas <[email protected]> wrote:
>
>> That's good if you get the performance you expect with MyISAM.
>> There is no problem using MyISAM, except that its transaction management is weaker, so you have a higher risk of corrupting the DB or creating inconsistencies in case of a DB host crash.
>>
>> As you can see, the DB is not the limiting factor in your case (the pipeline is empty).
>> For the next scans, you can try decreasing the number of scan threads (you currently use 8):
>> in some cases I have noticed FS client saturation with so many threads, so you might get better performance with just 2 or 4 scan threads.
>>
>> Regarding InnoDB, I'm still surprised by such low performance.
>> If you give it another try one day, I would be interested in the pipeline statistics and the output of http://mysqltuner.pl after some scan time, to analyze the DB stats.
>>
>> Regards,
>> Thomas
>>
>> On 01/13/14 23:12, Brock Palen wrote:
>>> Oh, also the current thread stats from the log; interestingly, nothing is waiting:
>>>
>>> 2014/01/13 17:05:53 robinhood@flux-xfer1[22595/2]: STATS | ======== FS scan statistics =========
>>> 2014/01/13 17:05:53 robinhood@flux-xfer1[22595/2]: STATS | current scan interval = 7.0d
>>> 2014/01/13 17:05:53 robinhood@flux-xfer1[22595/2]: STATS | scan is running:
>>> 2014/01/13 17:05:53 robinhood@flux-xfer1[22595/2]: STATS | started at : 2014/01/13 16:33:01 (32.9min ago)
>>> 2014/01/13 17:05:53 robinhood@flux-xfer1[22595/2]: STATS | last action: 2014/01/13 17:05:52 (01s ago)
>>> 2014/01/13 17:05:53 robinhood@flux-xfer1[22595/2]: STATS | progress   : 5214490 entries scanned (2 errors)
>>> 2014/01/13 17:05:53 robinhood@flux-xfer1[22595/2]: STATS | avg. speed : 3.02 ms/entry/thread -> 2647.49 entries/sec
>>> 2014/01/13 17:05:53 robinhood@flux-xfer1[22595/2]: STATS | inst. speed: 4.68 ms/entry/thread -> 1710.82 entries/sec
>>> 2014/01/13 17:05:53 robinhood@flux-xfer1[22595/2]: STATS | ==== EntryProcessor Pipeline Stats ===
>>> 2014/01/13 17:05:53 robinhood@flux-xfer1[22595/2]: STATS | Threads waiting: 8
>>> 2014/01/13 17:05:53 robinhood@flux-xfer1[22595/2]: STATS | Id constraints count: 0 (hash min=0/max=0/avg=0.0)
>>> 2014/01/13 17:05:53 robinhood@flux-xfer1[22595/2]: STATS | 0: STAGE_GET_FID | Wait: 0 | Curr: 0 | Done: 0 | Total: 0 | ms/op: 0.00
>>> 2014/01/13 17:05:53 robinhood@flux-xfer1[22595/2]: STATS | 1: STAGE_GET_INFO_DB | Wait: 0 | Curr: 0 | Done: 0 | Total: 5219385 | ms/op: 0.59
>>> 2014/01/13 17:05:53 robinhood@flux-xfer1[22595/2]: STATS | 2: STAGE_GET_INFO_FS | Wait: 0 | Curr: 0 | Done: 0 | Total: 5219385 | ms/op: 0.01
>>> 2014/01/13 17:05:53 robinhood@flux-xfer1[22595/2]: STATS | 3: STAGE_REPORTING | Wait: 0 | Curr: 0 | Done: 0 | Total: 5219385 | ms/op: 0.00
>>> 2014/01/13 17:05:53 robinhood@flux-xfer1[22595/2]: STATS | 4: STAGE_PRE_APPLY | Wait: 0 | Curr: 0 | Done: 0 | Total: 5219385 | ms/op: 0.00
>>> 2014/01/13 17:05:53 robinhood@flux-xfer1[22595/2]: STATS | 5: STAGE_DB_APPLY | Wait: 0 | Curr: 0 | Done: 0 | Total: 5219385, batches: 95027 (avg size: 52) | ms/op: 0.33
>>> 2014/01/13 17:05:53 robinhood@flux-xfer1[22595/2]: STATS | 6: STAGE_RM_OLD_ENTRIES | Wait: 0 | Curr: 0 | Done: 0 | Total: 0 | ms/op: 0.00
>>> 2014/01/13 17:05:53 robinhood@flux-xfer1[22595/2]: STATS | DB ops: get=5172474/ins=39076/upd=5180309/rm=0
>>>
>>> Brock Palen
>>> www.umich.edu/~brockp
>>> CAEN Advanced Computing
>>> XSEDE Campus Champion
>>> [email protected]
>>> (734)936-1985
>>>
>>>
>>> On Jan 10, 2014, at 4:27 AM, LEIBOVICI Thomas <[email protected]> wrote:
>>>
>>>> Sorry Brock, I noticed an issue with the latest code you got, which results in setting a wrong config default for the entry processor.
>>>> It only impacts you if you have no EntryProcessor block at all in your config file.
>>>> A simple workaround is to define an empty block.
>>>>
>>>> Or if you want, the fix is available in git (commit 33cfa01416d7d55a275a35a34fab2beed66f7dc8).
>>>>
>>>> I believe that 2.5 may be slower than 2.3, as it introduces new information in the DB.
>>>> However, I'm still surprised it is 2-3x slower.
>>>>
>>>> I understood you tried both MyISAM and InnoDB with rbh 2.5.
>>>> What performance difference do you get between the 2 engines?
>>>>
>>>> Regards,
>>>> Thomas
>>>>
>>>>
>>>> On 01/09/14 22:45, Brock Palen wrote:
>>>>> Thomas,
>>>>>
>>>>> I tried the new build. While it was slightly faster (200/sec rather than 50/sec) and the errors from the current 2.5.0 beta were resolved, I enabled innodb=disabled. I am still getting lower performance than 2.3, by about 2-3x, but it is still 2000-3000/second and this will work for us.
>>>>>
>>>>> Brock Palen
>>>>> www.umich.edu/~brockp
>>>>> CAEN Advanced Computing
>>>>> XSEDE Campus Champion
>>>>> [email protected]
>>>>> (734)936-1985
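
For reference, the empty-block workaround Thomas mentions above is just a bare block declared in the robinhood configuration file, so that the broken built-in default no longer applies (a minimal sketch; keep whatever other blocks you already have):

    EntryProcessor
    {
    }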
>>>>>
>>>>>
>>>>> On Jan 9, 2014, at 10:31 AM, LEIBOVICI Thomas <[email protected]> wrote:
>>>>>
>>>>>> Hi Brock,
>>>>>>
>>>>>> On 01/09/14 15:49, Brock Palen wrote:
>>>>>>> Adam,
>>>>>>>
>>>>>>> I tried adding innodb=disabled; to ListManager, but the 2.5.0 beta barfs on it:
>>>>>>>
>>>>>>> Config Check | WARNING: unknown parameter 'innodb' in block 'ListManager' line 74
>>>>>> This parameter must be specified in the MySQL subblock (not directly in ListManager), as it is MySQL related:
>>>>>>
>>>>>> ListManager {
>>>>>>     ...
>>>>>>     MySQL {
>>>>>>         ...
>>>>>>         innodb = disabled;
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>>> I sadly found out that, due to a mistake on my part, 2.5.0 was built without Lustre support, so no stripe info was being collected.
>>>>>>>
>>>>>>> When I rebuilt, performance was only 50 entries/sec :-( Scans never finish, and you can't run queries.
>>>>>>>
>>>>>>> My solution for now is to roll all the way back to 2.3.x, the last version of robinhood that worked at full speed for us. We will look at this all again when we have Lustre + changelogs. It is sad because we really wanted the post-scan script hook; we are working on a system where we use Sqoop to pull all the data from robinhood into Hive/Pig and then run a bunch of stats on our system over time. The problem is that robinhood isn't fast enough on the hardware I have available (7200 RPM SATA).
>>>>>> I suggest you give the latest version of the code from the git repository a try.
>>>>>> If you run it, drop any previous tuning in the EntryProcessor block. You can also re-enable the acct_* parameters.
>>>>>>
>>>>>> With the current code, we get a nice scan speed (6.7M entries per hour, ~1800/sec) and no more deadlock issues.
>>>>>> It reached a changelog processing speed of 36k/sec without any sign of saturation.
>>>>>> The DB is also on a local spinning disk and we use the InnoDB engine.
>>>>>>
>>>>>> Here are the tunings in /etc/my.cnf.
>>>>>> We recently found that "innodb_log_file_size" has a major impact (see URL below).
>>>>>>
>>>>>> key_buffer_size=512M
>>>>>> thread_cache_size=64
>>>>>> query_cache_size=512M
>>>>>> query_cache_limit=512M
>>>>>> sort_buffer_size=512M
>>>>>> read_rnd_buffer_size=1M
>>>>>> table_cache=8K
>>>>>> tmp_table_size=1G
>>>>>> max_heap_table_size=1G
>>>>>>
>>>>>> innodb_file_per_table
>>>>>> innodb_buffer_pool_size=128G   # up to 80% of physical memory
>>>>>> innodb_max_dirty_pages_pct=20
>>>>>>
>>>>>> # see the following tutorial to tune innodb_log_file_size:
>>>>>> # http://www.mysqlperformanceblog.com/2008/11/21/how-to-calculate-a-good-innodb-log-file-size
>>>>>> innodb_log_file_size=900M
>>>>>>
>>>>>> Regards,
>>>>>> Thomas
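
The linked tutorial's approach, roughly, is to measure how much redo log InnoDB writes during a busy period (e.g. while a scan is running) and size the log files to hold about an hour of that traffic. One simple way to measure it (a sketch, not taken verbatim from the article) is the standard Innodb_os_log_written status counter, which is in bytes; the division by 2 assumes the default innodb_log_files_in_group=2:

    mysql> SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written';
    -- wait 60 seconds, then run the same query again and take the difference
    mysql> SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written';

    bytes_per_hour        = (second_value - first_value) * 60
    innodb_log_file_size ~= bytes_per_hour / 2

Note that on older MySQL versions (5.5 and earlier), changing innodb_log_file_size requires a clean shutdown and removing the old ib_logfile* files before restarting mysqld with the new value.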
