[ 
https://issues.apache.org/jira/browse/KUDU-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-3195:
------------------------------
    Summary: Make DMS flush policy more robust when maintenance threads are 
idle  (was: Make DMS flush policy more robust when maitenance threads are idle)

> Make DMS flush policy more robust when maintenance threads are idle
> -------------------------------------------------------------------
>
>                 Key: KUDU-3195
>                 URL: https://issues.apache.org/jira/browse/KUDU-3195
>             Project: Kudu
>          Issue Type: Improvement
>          Components: tserver
>    Affects Versions: 1.13.0
>            Reporter: Alexey Serbin
>            Priority: Major
>
> In one scenario I observed very long bootstrap times of tablet servers 
> (between 45 and 60 minutes) even though the tablet servers had a relatively 
> small amount of data under management (~80GByte).  It turned out the time 
> was spent replaying WAL segments, with {{kudu cluster ksck}} reporting 
> something like the following throughout the bootstrap:
> {noformat}
>   b0a20b117a1242ae9fc15620a6f7a524 (tserver-6.local.site:7050): not running
>     State:       BOOTSTRAPPING
>     Data state:  TABLET_DATA_READY
>     Last status: Bootstrap replaying log segment 21/37 (2.28M/7.85M this 
> segment, stats: ops{read=27374 overwritten=0 applied=25016 ignored=657} 
> inserts{seen=5949247 
> ignored=0} mutations{seen=0 ignored=0} orphaned_commits=7)
> {noformat}
> The workload I ran before shutting down the tablet servers consisted of many 
> small UPSERT operations, but the cluster had been idle for a long time (a few 
> hours or so) after the workload was terminated.  The workload was generated by
> {noformat}
> kudu perf loadgen \
>   --table_name=$TABLE_NAME \
>   --num_rows_per_thread=800000000 \
>   --num_threads=4 \
>   --use_upsert \
>   --use_random_pk \
>   $MASTER_ADDR
> {noformat}
> The table that the UPSERT workload was running against had been pre-populated 
> by the following:
> {noformat}
> kudu perf loadgen \
>   --table_num_replicas=3 \
>   --keep-auto-table \
>   --table_num_hash_partitions=5 \
>   --table_num_range_partitions=5 \
>   --num_rows_per_thread=800000000 \
>   --num_threads=4 \
>   $MASTER_ADDR
> {noformat}
> As it turned out, the tablet servers had accumulated a huge number of DMSes 
> that required flushing/compaction, but once the memory pressure subsided, the 
> compaction policy scheduled just one operation per tablet every 120 seconds 
> (that interval is controlled by {{\-\-flush_threshold_secs}}).  In fact, the 
> tablet servers could have flushed those rowsets non-stop, since the 
> maintenance threads were otherwise completely idle and no active workload 
> was running against the cluster.  Those DMSes had been around for a long 
> time (much longer than 120 seconds) and were anchoring a lot of WAL 
> segments, so the operations from those WAL segments had to be replayed once 
> I restarted the tablet servers.
> It would be great to update the flushing/compaction policy to allow tablet 
> servers to run {{FlushDeltaMemStoresOp}} as soon as a DMS becomes older than 
> the age specified by {{\-\-flush_threshold_secs}}, whenever the maintenance 
> threads are not otherwise busy.
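
A rough back-of-the-envelope model of the behavior described above (this is an illustrative sketch, not Kudu's actual maintenance-manager code; the per-DMS flush cost of 2 seconds and the DMS count of 25 are assumed numbers chosen for illustration):

```python
# Hypothetical model of DMS drain time on an otherwise idle tablet server.
# Current policy: the time-based flush op fires roughly once per
# --flush_threshold_secs interval, so stale DMSes drain one interval apart.
# Proposed policy: once a DMS crosses the age threshold, flush it
# immediately since the maintenance threads have nothing else to do.

FLUSH_THRESHOLD_SECS = 120  # Kudu's --flush_threshold_secs default
FLUSH_DURATION_SECS = 2     # assumed per-DMS flush cost (illustrative)

def drain_time_current(num_dms):
    """Seconds to drain all stale DMSes at one flush per interval."""
    return num_dms * FLUSH_THRESHOLD_SECS

def drain_time_proposed(num_dms):
    """Seconds to drain them back-to-back once the age threshold passes."""
    return FLUSH_THRESHOLD_SECS + num_dms * FLUSH_DURATION_SECS

# e.g. 25 stale DMSes anchoring WAL segments:
print(drain_time_current(25))   # 3000 s (~50 min of anchored WAL)
print(drain_time_proposed(25))  # 170 s
```

Under these assumed numbers the current policy keeps WAL segments anchored for tens of minutes, which is consistent with the 45-60 minute replay times reported above.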



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
