[ https://issues.apache.org/jira/browse/KUDU-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adar Dembo updated KUDU-1549:
-----------------------------
    Summary: LBM should start up faster  (was: recovery speed of kudu-tserver should be faster.)

I'm repurposing this JIRA for the general problem of "LBM startup is too damn slow."

Some potential improvements:
# Identify and delete LBM containers that are full but have no live blocks. This can happen at startup time, at last-live-block-deletion time, periodically (perhaps via maintenance manager scheduling), or some combination of the above.
# Identify LBM containers that are full and have very few live blocks. "Defragment" the container and make it available for writing again. Probably best to do this periodically; it may get expensive to do it at startup or when the container becomes full.
# Compact LBM container metadata by identifying and removing CREATE/DELETE pairs of records. Probably best to restrict this to full containers. Not sure when it's best to do it.

> LBM should start up faster
> --------------------------
>
>                 Key: KUDU-1549
>                 URL: https://issues.apache.org/jira/browse/KUDU-1549
>             Project: Kudu
>          Issue Type: Improvement
>          Components: tablet, tserver
>         Environment: cpu: Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz
> mem: 252 G
> disk: single ssd 1.5 T left.
>            Reporter: zhangsong
>              Labels: data-scalability
>         Attachments: a14844513e5243a993b2b84bf0dcec4c.short.txt
>
>
> After a physical node crash, we found that the recovery/start speed of
> kudu-tserver is slower than in the usual restart case. There are some
> messages like "Found partial trailing metadata" in the kudu-tserver log, and it
> seems to take more than 20 minutes to recover this metadata.
> According to adar, it should be this slow.
> The attachment is the start log.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
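
The three improvements listed in the comment above can be illustrated with a small Python sketch. This is not Kudu's implementation: the `Record` shape, the function names, and the `few_live_threshold` parameter are all hypothetical, and the sketch assumes block IDs are never reused within a container. It only shows how CREATE/DELETE pairs could be dropped from a container's metadata and how the live-block count could drive the choice between deleting, defragmenting, or compacting a full container.

```python
from collections import namedtuple

# Hypothetical record shape for illustration only; the real LBM
# container metadata format is not described in this issue.
Record = namedtuple("Record", ["op", "block_id"])  # op: "CREATE" or "DELETE"


def live_block_ids(records):
    """Return IDs of blocks that were created and never deleted."""
    created = {r.block_id for r in records if r.op == "CREATE"}
    deleted = {r.block_id for r in records if r.op == "DELETE"}
    return created - deleted


def compact_records(records):
    """Remove matched CREATE/DELETE pairs, preserving the order of the rest.

    Assumes block IDs are not reused, so a block with both a CREATE and a
    DELETE record is dead and neither record is needed at startup.
    """
    created = {r.block_id for r in records if r.op == "CREATE"}
    deleted = {r.block_id for r in records if r.op == "DELETE"}
    paired = created & deleted
    return [r for r in records if r.block_id not in paired]


def suggest_action(records, is_full, few_live_threshold=2):
    """Pick one of the proposed actions for a container.

    The threshold is an arbitrary illustrative value, not a Kudu setting.
    """
    if not is_full:
        return "keep"              # only full containers are considered
    live = len(live_block_ids(records))
    if live == 0:
        return "delete"            # improvement 1: full, no live blocks
    if live <= few_live_threshold:
        return "defragment"        # improvement 2: full, very few live blocks
    return "compact_metadata"      # improvement 3: drop CREATE/DELETE pairs


if __name__ == "__main__":
    recs = [
        Record("CREATE", 1),
        Record("CREATE", 2),
        Record("DELETE", 1),
        Record("CREATE", 3),
    ]
    print(compact_records(recs))
    print(suggest_action(recs, is_full=True))
```

Keeping unmatched DELETE records in `compact_records` is a deliberately conservative choice in this sketch: a record is dropped only when both halves of the pair are present in the same container.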