Hi Fabio,

The results you reported are very interesting.

My opinion is that DB request batching is only interesting for high-latency
drives (spinning disks), and you show an important result here: with SSD
drives, a multithreaded strategy works better.
With your best tuning (11k entries/sec), the robinhood pipeline appears
almost empty, which means the DB is no longer the bottleneck in this case;
the limiting factor must be the scanning speed.
In the other cases, the DB is clearly the bottleneck (10,000 ops in the
pipeline, all busy with DB operations, as in the DB_APPLY stage of the
stats dump quoted below).

It is also very good to know that XFS is a better backend for MySQL than
ext4 (at least on SSD).

About user_acct and group_acct:
- they are mandatory for the webUI.
- for the command line, they make it possible to get instantaneous (O(1))
results for reports like top users, user info, group info, and fs info,
because robinhood computes them on the fly when scanning / processing
changelogs (see the example commands below).
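
For instance, something along these lines (a sketch; the exact rbh-report
option names may vary between robinhood versions):

   rbh-report --fs-info          # filesystem-wide summary
   rbh-report --top-users        # top space consumers
   rbh-report --user-info=foo    # stats for user 'foo' (placeholder name)
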
By the way, I forgot to tell you: you must drop the ACCT_STATS table after
disabling the acct options (see the command below).
In that case, reports will still work and will still be consistent, but each
report request performs a full DB scan, which takes longer as the filesystem
grows. It depends on how long you can wait for reports...
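
Something like this (a sketch; replace 'robinhood_db' with your robinhood
database name):

   # run once, after disabling the user_acct/group_acct options:
   mysql -u root -p robinhood_db -e "DROP TABLE ACCT_STATS;"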

In the future, a solution we could implement to improve the DB ingestion
rate while still maintaining the acct info is to take it out of the "hard"
DB transactions and update the acct info asynchronously.
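
To illustrate the idea (a hypothetical sketch only, not the current
implementation; the ENTRIES/ACCT_STATS schema details here are assumed):

   # rebuild the accounting info periodically (e.g. from cron), outside the
   # insert transactions; assumes a (owner, gr_name) primary key on ACCT_STATS
   mysql robinhood_db -e "REPLACE INTO ACCT_STATS (owner, gr_name, count, size)
     SELECT owner, gr_name, COUNT(*), SUM(size) FROM ENTRIES GROUP BY owner, gr_name;"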

Regards,
Thomas


On 06/13/14 09:30, Verzelloni Fabio wrote:
> ## BEST RESULT  with user_acct group_acct disabled / multithreads DB :
>
>           Statistics:
>              entries scanned: 2523759
>              # threads:       32
>              average speed:   11428.57 entries/sec
>
> ## RESULT with user_acct group_acct enabled / multithreads DB  :
>
>           Statistics:
>              entries scanned: 2523760
>              # threads:       32
>              average speed:   1923.08 entries/sec
>
> ## RESULT with user_acct group_acct enabled / batch DB :
>
>           Statistics:
>              entries scanned: 2523760
>              # threads:       32
>              average speed:   2862.25 entries/sec
>
> ## RESULT with user_acct group_acct disabled / batch DB :
>
>           Statistics:
>              entries scanned: 2523760
>              # threads:       32
>              average speed:   4790.42 entries/sec
> #########################################
>
> As you can see, the result with multithreaded DB is much higher than with
> batch DB: 11428.57 entries/sec vs. 4790.42 entries/sec (about 2.4x).
>
> Something that I think would be helpful in the documentation is that
> formatting the filesystem where MySQL runs as XFS gives better scan
> performance. Indeed, with ext4 I reached at most 3800 entries/sec with
> user_acct and group_acct disabled, whereas with XFS I reached 11000
> entries/sec.
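>
> For reference, the reformatting went along these lines (a sketch: /dev/sdX
> is a placeholder for the SSD device; mysqld stopped and the datadir backed
> up first):
>
>           mkfs.xfs -f /dev/sdX
>           mount -o noatime /dev/sdX /var/lib/mysql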
>
> Is it correct to say that user_acct and group_acct are only needed for the
> web GUI? Because, if I'm not wrong, from the command line, even if I clean
> up the DB and scan without user_acct and group_acct, I still get all the
> needed data; I just cannot use the web GUI.
>
>
> Cheers
> Fabio
>
> --
> - Fabio Verzelloni - CSCS - Swiss National Supercomputing Centre
> via Trevano 131 - 6900 Lugano, Switzerland
> Tel: +41 (0)91 610 82 04
>
>
> ________________________________________
> From: LEIBOVICI Thomas [[email protected]]
> Sent: Wednesday, June 11, 2014 4:38 PM
> To: Verzelloni  Fabio; [email protected]
> Subject: Re: [robinhood-support] question about 'best configuration practice' 
> & 'mysql errors'
>
> Hi Fabio,
>
> Congratulations. I'm impressed by your in-depth tuning of robinhood,
> mysql and your system.
>
> Robinhood should dump stats in its logs at regular intervals.
> It would help identify bottlenecks if you could send an extract of
> them, like this:
>
>
> 2014/05/21 21:08:34 [13362/4] STATS | ======== FS scan statistics =========
> 2014/05/21 21:08:34 [13362/4] STATS | scan is running:
> 2014/05/21 21:08:34 [13362/4] STATS |      started at : 2014/05/21 09:08:26 (12.0h ago)
> 2014/05/21 21:08:34 [13362/4] STATS |      last action: 2014/05/21 21:08:33 (01s ago)
> 2014/05/21 21:08:34 [13362/4] STATS |      progress   : 6349237 entries scanned (0 errors)
> 2014/05/21 21:08:34 [13362/4] STATS |      avg. speed : 103.84 ms/entry/thread -> 154.09 entries/sec
> 2014/05/21 21:08:34 [13362/4] STATS |      inst. speed: 41.30 ms/entry/thread -> 387.39 entries/sec
> 2014/05/21 21:08:34 [13362/4] STATS | ==== EntryProcessor Pipeline Stats ===
> 2014/05/21 21:08:34 [13362/4] STATS | Idle threads: 15
> 2014/05/21 21:08:34 [13362/4] STATS | Id constraints count: 10000 (hash min=0/max=7/avg=1.3)
> 2014/05/21 21:08:34 [13362/4] STATS | Stage              | Wait | Curr | Done |   Total | ms/op |
> 2014/05/21 21:08:34 [13362/4] STATS |  0: GET_FID        |    0 |    0 |    0 |       0 |  0.00 |
> 2014/05/21 21:08:34 [13362/4] STATS |  1: GET_INFO_DB    |    0 |    0 |    0 | 6603345 |  0.25 |
> 2014/05/21 21:08:34 [13362/4] STATS |  2: GET_INFO_FS    |    0 |    0 |    0 | 6603345 |  0.03 |
> 2014/05/21 21:08:34 [13362/4] STATS |  3: REPORTING      |    0 |    0 |    0 | 6603345 |  0.00 |
> 2014/05/21 21:08:34 [13362/4] STATS |  4: PRE_APPLY      |    0 |    0 |    0 | 6603345 |  0.00 |
> 2014/05/21 21:08:34 [13362/4] STATS |  5: DB_APPLY       | 9999 |    1 |    0 | 6593345 |  6.54 | 98.35% batched (avg batch size: 85.1)
> 2014/05/21 21:08:34 [13362/4] STATS |  6: RM_OLD_ENTRIES |    0 |    0 |    0 |       0 |  0.00 |
>
> I noticed that you disabled DB batching to use multithreaded DB
> operations. Did you get better results this way?
>
> Your hardware looks appropriate. The stats dump will give us more
> information.
>
> Regards
>
>
> On 06/06/14 09:29, Verzelloni Fabio wrote:
>> Hello folks,
>>      I'm doing some tests on robinhood in our environment. Some details
>> regarding the hardware in use:
>>
>> ## Robinhood server ##
>>
>> IBM M3550 X4
>> 128 GB RAM
>> 256 GB SSD for MySQL
>> 2x Intel Xeon Processor E5-2650v2 8C 2.6GHz 20MB Cache 1866MHz
>> Lustre 2.5.1
>> MySQL 5.5
>>
>> ## Lustre Version in production ##
>> Lustre 2.1
>>
>> ## Robinhood.conf ##
>>
>> FS_Scan {
>>           nb_threads_scan = 32;
>>           nb_prealloc_tasks=10000;
>> }
>>
>> EntryProcessor {
>>           nb_threads = 32;
>>           STAGE_GET_FID_threads_max = 16;
>>           STAGE_GET_INFO_DB_threads_max = 4;
>>           STAGE_GET_INFO_FS_threads_max = 4;
>>           STAGE_REPORTING_threads_max = 1;
>>           STAGE_DB_APPLY_threads_max = 16;
>>           STAGE_CHGLOG_CLR_threads_max = 1;
>>           STAGE_RM_OLD_ENTRIES_threads_max = 1;
>>           max_pending_operations = 1000;
>>           max_batch_size=1;
>> }
>>
>> ## My.cnf ##
>>
>> [mysqld]
>> large-pages
>> # kernel must be configured for support
>> datadir=/var/lib/mysql
>> socket=/var/lib/mysql/mysql.sock
>> user=mysql
>> # Disabling symbolic-links is recommended to prevent assorted security risks
>> symbolic-links=0
>> innodb_flush_log_at_trx_commit = 0
>> max_connections= 512
>> # possibly the most important setting; ~50% of memory
>> innodb_buffer_pool_size= 60G
>> innodb_max_dirty_pages_pct= 15
>> innodb_thread_concurrency= 32
>> innodb_log_file_size= 100M
>> innodb_log_buffer_size= 50M
>> innodb_data_file_path= ibdata1:1G:autoextend
>> table-open-cache= 2000
>> sort-buffer-size= 32M
>> read-buffer-size= 16M
>> read-rnd-buffer-size= 4M
>> thread-cache-size= 128
>> query-cache-size= 40M
>> query-cache-limit= 1M
>> tmp-table-size= 16M
>>
>> [mysqld_safe]
>> log-error=/var/log/mysqld.log
>> pid-file=/var/run/mysqld/mysqld.pid
>>
>> ## vm.nr_hugepages ##
>>
>> vm.nr_hugepages = 50000
>> vm.nr_hugepages_mempolicy = 50000
>> vm.hugetlb_shm_group = 27
>> vm.hugepages_treat_as_movable = 0
>> vm.nr_overcommit_hugepages = 0
>>
>> ## sysctl ##
>>
>> kernel.shmmax = 118111600640
>> kernel.shmall = 118111600640
>>
>> ## limits.conf ##
>>
>> mysql hard memlock unlimited
>> mysql soft memlock unlimited
>> ---
>>
>> I'm trying to find the configuration that yields the best "entries/sec";
>> with the configuration above, the best number I can get is ~2600 entries/sec.
>> Do you think that, based on the HW in use, it is possible to improve the
>> scan speed?
>> What is the best practice for configuring the server to get the fastest
>> scan?
>>
>> While I'm running the initial scan, I see a lot of the following messages:
>>
>> ...
>> 2014/06/06 08:34:20 [10535/15] ListMgr | Retryable DB error in 
>> ListMgr_Insert l.218. Restarting transaction in 1 sec...
>> 2014/06/06 08:34:21 [10535/15] ListMgr | DB deadlock detected
>> ...
>>
>> I was hoping to reach 4000-5000 entries/sec; do you think that with the HW
>> I have available I can manage to reach that result? Suggestions or
>> questions are welcome.
>>
>> Regards
>> Fabio
>>
>>
>> --
>> - Fabio Verzelloni - CSCS - Swiss National Supercomputing Centre
>> via Trevano 131 - 6900 Lugano, Switzerland
>> Tel: +41 (0)91 610 82 04
>>
>>

