DB_APPLY is actually the slowest stage, at 0.40 ms per operation.

The other waiting operations we see are stacked in the pipeline, waiting for a worker thread to process them (the queue length is controlled by the max_pending_operations parameter you set in your config). It is normal for many of them to sit in the first stage when the changelog is full.
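A minimal sketch of how the two readings can be told apart from the STATS lines: the stage with the highest ms/op versus the stage with the most waiting operations. It assumes the pipe-separated row layout of the log quoted further down; the names parse_stats and summarize are hypothetical helpers, not part of robinhood.

```python
import re

# One pipeline stage row, e.g. "1: GET_INFO_DB |99998 | 0 | 0 | 28642026 | 0.31 |"
# (layout assumed from the STATS output quoted in this thread).
STAGE_ROW = re.compile(
    r"(\d+):\s*([A-Z_]+)\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\|\s*([\d.]+)\s*\|"
)


def parse_stats(text):
    """Return one dict per stage row found anywhere in the log text."""
    stages = []
    for m in STAGE_ROW.finditer(text):
        idx, name, wait, curr, done, total, ms = m.groups()
        stages.append({
            "index": int(idx),
            "stage": name,
            "wait": int(wait),
            "curr": int(curr),
            "done": int(done),
            "total": int(total),
            "ms_per_op": float(ms),
        })
    return stages


def summarize(stages):
    """Return (stage with highest ms/op, stage with most waiting operations)."""
    slowest = max(stages, key=lambda s: s["ms_per_op"])
    most_waiting = max(stages, key=lambda s: s["wait"])
    return slowest["stage"], most_waiting["stage"]


# Rows copied from the STATS dump in this thread (log prefix shortened).
SAMPLE = """
STATS | 1: GET_INFO_DB |99998 | 0 | 0 | 28642026 | 0.31 |
STATS | 2: GET_INFO_FS | 0 | 0 | 0 | 10440925 | 0.24 |
STATS | 5: DB_APPLY | 0 | 0 | 0 | 7813312 | 0.40 |
STATS | 6: CHGLOG_CLR | 1 | 0 | 0 | 21389216 | 0.02 |
"""

if __name__ == "__main__":
    slowest, most_waiting = summarize(parse_stats(SAMPLE))
    print("slowest stage:", slowest)
    print("most waiting :", most_waiting)
```

The point of reporting both values is that a long wait queue at one stage does not by itself mean that stage has the highest cost per operation.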

The read speed is a good indicator of how fast the processing goes.
Did you get better results with the latest code and disabling accounting?
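To compare two runs on read speed, one option is to average the "read speed" samples while skipping the first sample after a restart, since that one is noticeably higher in both runs quoted below. This is only a sketch: the helper name steady_state_mean and the choice to skip exactly one sample are assumptions, and the sample values are copied from the logs in this thread.

```python
import re
from statistics import mean

# Matches lines such as "STATS | read speed = 2729.07 record/sec".
READ_SPEED = re.compile(r"read speed = ([\d.]+) record/sec")


def steady_state_mean(log_text, skip=1):
    """Mean read speed in record/sec, ignoring the first `skip` samples.

    Returns None when no sample is left after skipping.
    """
    speeds = [float(v) for v in READ_SPEED.findall(log_text)]
    remaining = speeds[skip:]
    return mean(remaining) if remaining else None


AUTOCOMMIT_LOG = """
STATS | read speed = 2729.07 record/sec
STATS | read speed = 1917.30 record/sec
STATS | read speed = 1965.80 record/sec
STATS | read speed = 1295.02 record/sec
"""

TRANSACTION_LOG = """
STATS | read speed = 3088.33 record/sec
STATS | read speed = 1866.95 record/sec
STATS | read speed = 1787.73 record/sec
STATS | read speed = 2105.38 record/sec
STATS | read speed = 1273.35 record/sec
"""

if __name__ == "__main__":
    print("autocommit :", steady_state_mean(AUTOCOMMIT_LOG))
    print("transaction:", steady_state_mean(TRANSACTION_LOG))
```

With only a handful of one-minute samples per run, small differences between the two means should not be over-interpreted.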

On 05/06/15 13:42, Carmelo Ponti (CSCS) wrote:
Hi Thomas

Thank you for your prompt answer.

On Wed, 2015-05-06 at 10:52 +0200, LEIBOVICI Thomas wrote:
Hi Carmelo,

Check in the robinhood logs which operation is the slowest in the robinhood
pipeline (grep STATS ...), and where the operations are stacked
(waiting status).
The slowest operation is GET_INFO_DB with 99999 waiting:

2015/05/06 13:02:47 robinhood@daintrbh01[22725/2] STATS | Stage             | Wait  | Curr | Done | Total    | ms/op |
2015/05/06 13:02:47 robinhood@daintrbh01[22725/2] STATS | 0: GET_FID        |     0 |    0 |    0 |        0 |  0.00 |
2015/05/06 13:02:47 robinhood@daintrbh01[22725/2] STATS | 1: GET_INFO_DB    | 99998 |    0 |    0 | 28642026 |  0.31 |
2015/05/06 13:02:47 robinhood@daintrbh01[22725/2] STATS | 2: GET_INFO_FS    |     0 |    0 |    0 | 10440925 |  0.24 |
2015/05/06 13:02:47 robinhood@daintrbh01[22725/2] STATS | 3: REPORTING      |     0 |    0 |    0 |    53424 |  0.00 |
2015/05/06 13:02:47 robinhood@daintrbh01[22725/2] STATS | 4: PRE_APPLY      |     0 |    0 |    0 |  7813312 |  0.00 |
2015/05/06 13:02:47 robinhood@daintrbh01[22725/2] STATS | 5: DB_APPLY       |     0 |    0 |    0 |  7813312 |  0.40 |
2015/05/06 13:02:47 robinhood@daintrbh01[22725/2] STATS | 6: CHGLOG_CLR     |     1 |    0 |    0 | 21389216 |  0.02 |
2015/05/06 13:02:47 robinhood@daintrbh01[22725/2] STATS | 7: RM_OLD_ENTRIES |     0 |    0 |    0 |        0 |  0.00 |

- If the limiting point is the DB access (DB_APPLY stage), consider this:

I compiled robinhood 2.5.5 and I applied the changes suggested.

Do you get better performance with "autocommit" compared to "transaction"?

I don't see much difference:

autocommit (2729.07 only right after the robinhood restart):

2015/05/06 13:15:17 robinhood@daintrbh01[11264/1] STATS | read speed = 2729.07 record/sec
2015/05/06 13:16:17 robinhood@daintrbh01[11264/1] STATS | read speed = 1917.30 record/sec
2015/05/06 13:17:17 robinhood@daintrbh01[11264/1] STATS | read speed = 1965.80 record/sec
2015/05/06 13:18:17 robinhood@daintrbh01[11264/1] STATS | read speed = 1295.02 record/sec
and then it continues at around 1200 record/sec

transaction (3088.33 only right after the robinhood restart):

2015/05/06 13:28:26 robinhood@daintrbh01[11864/1] STATS | read speed = 3088.33 record/sec
2015/05/06 13:29:26 robinhood@daintrbh01[11864/1] STATS | read speed = 1866.95 record/sec
2015/05/06 13:30:29 robinhood@daintrbh01[11864/1] STATS | read speed = 1787.73 record/sec
2015/05/06 13:31:29 robinhood@daintrbh01[11864/1] STATS | read speed = 2105.38 record/sec
2015/05/06 13:32:29 robinhood@daintrbh01[11864/1] STATS | read speed = 1273.35 record/sec
and then it continues at around 1200 record/sec

>      match_classes = TRUE;

If you don't care about fileclass reports (rbh-report --class-info) you
can disable "match_classes".

I'm keeping it for the moment.

>      Ignore
>      {
>          type == directory
>          and
>          ( name == ".snapdir" or name == ".snapshot" )
>      }
This is useless with Lustre.
Removed

> # ChangeLog Reader configuration
> # Parameters for processing MDT changelogs :
> ChangeLog
> {
> ...
>      queue_max_size   = 1000 ;
>      queue_max_age    = 5s ;
>      queue_check_interval = 1s ;
> }
You can try increasing the max size and max age (x2?) to improve the chances
of eliminating redundant changelog records.

Done

> Purge_Trigger
> {
>      trigger_on         = global_usage ;
Triggering purge on OST_usage is more efficient, and it is safer for avoiding
ENOSPC errors for users.

Done

GET_INFO_DB is still at 99998 waiting. This appears only when the changelog is full. Usually it is 0, or 500 at most.

Carmelo

--
----------------------------------------------------------------------
Carmelo Ponti           System Engineer
CSCS                    Swiss Center for Scientific Computing
Via Trevano 131         Email: [email protected]
CH-6900 Lugano          http://www.cscs.ch
                         Phone: +41 91 610 82 15/Fax: +41 91 610 82 82
----------------------------------------------------------------------

