[jira] [Commented] (KUDU-2638) kudu cluster restart very long time to reused

Adar Dembo (JIRA) Wed, 19 Dec 2018 11:15:17 -0800


    [ https://issues.apache.org/jira/browse/KUDU-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16725276#comment-16725276 ]


Adar Dembo commented on KUDU-2638:
----------------------------------

Thank you for the log.

Although each server has 12 disks, it would seem that Kudu is configured to use 
just one for its data directories:
bq. --fs_data_dirs=/data1/data/kudu/tserver-new

This will have a dramatic impact on overall Kudu performance. Firstly, Kudu 
will only bootstrap one tablet at a time (see the documentation for 
{{--num_tablets_to_open_simultaneously}}), which helps explain why your tablets 
take so long to bootstrap. Secondly, your overall disk bandwidth is very low, 
so maintenance manager flush/compact operations are much slower than they 
otherwise would be.
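A rough way to see why a single data directory stretches restarts out: the Python sketch below counts the entries in an {{--fs_data_dirs}} value and derives the number of bootstrap threads from it, assuming the documented default of {{--num_tablets_to_open_simultaneously=0}} (thread count taken from the number of data directories). The restart estimate is a toy model in which every tablet takes the same time to bootstrap; the helper names, the 600-tablet count, the 30-second per-tablet figure, and the /data2 through /data12 mount points are all hypothetical.

```python
import math


def count_data_dirs(fs_data_dirs: str) -> int:
    """Count the non-empty comma-separated entries in an --fs_data_dirs value."""
    return len([d for d in fs_data_dirs.split(",") if d.strip()])


def bootstrap_threads(num_tablets_to_open_simultaneously: int, num_data_dirs: int) -> int:
    """Threads used to open tablets at startup.

    Assumption: 0 (the default) means one thread per data directory;
    any positive value is used as-is.
    """
    if num_tablets_to_open_simultaneously > 0:
        return num_tablets_to_open_simultaneously
    return max(1, num_data_dirs)


def estimated_startup_seconds(num_tablets: int, threads: int, secs_per_tablet: float) -> float:
    """Toy model: tablets bootstrap in waves of `threads`, each wave taking
    `secs_per_tablet` seconds (real bootstrap times vary per tablet)."""
    return math.ceil(num_tablets / threads) * secs_per_tablet


current = "/data1/data/kudu/tserver-new"
# Hypothetical layout spreading data directories over all 12 disks.
all_disks = ",".join(f"/data{i}/data/kudu/tserver-new" for i in range(1, 13))

for label, flag in [("current", current), ("12 disks", all_disks)]:
    threads = bootstrap_threads(0, count_data_dirs(flag))
    secs = estimated_startup_seconds(num_tablets=600, threads=threads, secs_per_tablet=30.0)
    print(f"{label}: {threads} bootstrap thread(s), ~{secs / 60:.0f} min")
```

With these made-up inputs it prints about 300 minutes for the current single-directory layout versus about 25 minutes with twelve directories. Real bootstrap times depend on, among other things, how much WAL each tablet has to replay, so the ratio matters more than the absolute numbers.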

If you upgrade to Kudu 1.7 or 1.8 and rebuild your tservers (one at a time), 
Kudu's metadata will be stored on the same disk as the WALs rather than the 
first data directory. That matters in your case: with only one data directory, 
the metadata currently shares a disk with all of Kudu's data, which makes every 
flush/compact operation slower (each one has to rewrite the tablet superblocks).
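The metadata placement rule can be written down as a small decision function. This Python sketch is a toy model of the behavior described above plus an assumed reading of the 1.7+ {{--fs_metadata_dir}} flag (when it is unset, existing metadata in the first data directory keeps being used; otherwise metadata goes next to the WALs). The function name and the WAL path in the example are hypothetical, so check the flag documentation for your exact version.

```python
from typing import Optional, Sequence, Tuple


def metadata_root(version: Tuple[int, int],
                  fs_wal_dir: str,
                  fs_data_dirs: Sequence[str],
                  fs_metadata_dir: Optional[str] = None,
                  existing_metadata_in_first_data_dir: bool = False) -> str:
    """Return the directory assumed to hold tablet metadata (superblocks).

    Toy model, not Kudu's actual code:
    - before 1.7, metadata lives in the first data directory;
    - from 1.7, an explicit --fs_metadata_dir wins;
    - otherwise an upgraded-in-place tserver keeps using the first data
      directory, while a rebuilt one puts metadata with the WALs.
    """
    if version < (1, 7):
        return fs_data_dirs[0]
    if fs_metadata_dir:
        return fs_metadata_dir
    if existing_metadata_in_first_data_dir:
        return fs_data_dirs[0]
    return fs_wal_dir


data_dirs = ["/data1/data/kudu/tserver-new"]
wal_dir = "/wal/kudu/tserver-new"  # hypothetical WAL location

print(metadata_root((1, 6), wal_dir, data_dirs))  # pre-1.7 layout
print(metadata_root((1, 8), wal_dir, data_dirs,
                    existing_metadata_in_first_data_dir=True))  # upgraded in place
print(metadata_root((1, 8), wal_dir, data_dirs))  # rebuilt on 1.8
```

A rebuilt tserver has no metadata in its first data directory, which is why the rebuild step, not the upgrade alone, is what moves the superblocks onto the WAL disk in this model.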

Another thing that stands out to me is the relative size of each Kudu data 
block:
{quote}
1 data directories: /data1/data/kudu/tserver-new/data
Total live blocks: 19299871
Total live bytes: 102086799764
Total live bytes (after alignment): 176281313280
Total number of LBM containers: 226 (17 full)
{quote}

This works out to about 5 KB of live data per block on average (roughly 9 KB 
after alignment). Ideally data blocks would be 
larger, closer to 1 MB each. Having so many small data blocks means more 
overhead elsewhere in the system.
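The per-block arithmetic can be reproduced directly from the summary quoted above. This Python sketch parses the "Total ...: N" lines (the regex is an assumption about the summary's format, checked only against the lines shown here) and compares the average block size with the ~1 MB target, approximated as 1 MiB.

```python
import re

REPORT = """\
1 data directories: /data1/data/kudu/tserver-new/data
Total live blocks: 19299871
Total live bytes: 102086799764
Total live bytes (after alignment): 176281313280
Total number of LBM containers: 226 (17 full)
"""


def parse_totals(report: str) -> dict:
    """Map each 'Total ...: N' label in the summary to its leading integer."""
    totals = {}
    for line in report.splitlines():
        m = re.match(r"(Total [^:]+):\s*(\d+)", line.strip())
        if m:
            totals[m.group(1)] = int(m.group(2))
    return totals


totals = parse_totals(REPORT)
blocks = totals["Total live blocks"]
avg_bytes = totals["Total live bytes"] / blocks
avg_aligned = totals["Total live bytes (after alignment)"] / blocks
target = 1 << 20  # ~1 MB per block, approximated as 1 MiB

print(f"average block: {avg_bytes / 1024:.1f} KiB "
      f"({avg_aligned / 1024:.1f} KiB after alignment)")
print(f"roughly {target / avg_bytes:.0f}x smaller than a 1 MiB block")
```

On this report it works out to roughly 5 KiB of live data per block, on the order of 200 times smaller than the target.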

Finally, as you pointed out, the number of delta compaction operations is quite 
high, as is the number of DMS flushes. What kind of workload is this? It seems 
to be dominated by UPDATEs, which isn't optimal for Kudu.

> kudu cluster restart very long time to reused
> ---------------------------------------------
>
>                 Key: KUDU-2638
>                 URL: https://issues.apache.org/jira/browse/KUDU-2638
>             Project: Kudu
>          Issue Type: Improvement
>            Reporter: jiaqiyang
>            Priority: Major
>             Fix For: n/a
>
>         Attachments: kudu16.tc.tablet.png, tserverLog.tar.gz
>
>
> When I restart my Kudu cluster, all tablets are unavailable.
> Running kudu cluster ksck shows:
> Table Summary
>  Name |   Status    | Total Tablets | Healthy | Under-replicated | Unavailable
> ------+-------------+---------------+---------+------------------+-------------
>  t1   | HEALTHY     | 1             | 1       | 0                | 0
>  t2   | UNAVAILABLE | 5             | 0       | 1                | 4
>  t3   | UNAVAILABLE | 6             | 2       | 0                | 4
>  t3   | UNAVAILABLE | 3             | 0       | 0                | 3



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
