[jira] [Commented] (KUDU-1973) Coalesce RPCs destined for the same server
[ https://issues.apache.org/jira/browse/KUDU-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967161#comment-15967161 ] Todd Lipcon commented on KUDU-1973: --- I think this should be more specific to the consensus system, not to krpc in general. The thing here is that all of the tablets have separate heartbeat timers, so each heartbeat (empty UpdateConsensus) RPC ends up causing a wakeup and context switch on the server side, etc. But there isn't really any particular reason not to coalesce them so they all get sent and arrive at the same time -- so long as it arrives before the election timer expires, it's fine to shift a heartbeat later by hundreds of milliseconds, for example. As such, we could bundle a bunch of empty UpdateConsensus RPCs destined for the same node into a single RPC and avoid the extra wakeups. > Coalesce RPCs destined for the same server > -- > > Key: KUDU-1973 > URL: https://issues.apache.org/jira/browse/KUDU-1973 > Project: Kudu > Issue Type: Sub-task > Components: rpc, tserver >Affects Versions: 1.4.0 >Reporter: Adar Dembo > Labels: data-scalability > > The krpc subsystem ensures that only one _connection_ exists between any pair > of nodes, but it doesn't coalesce the _RPCs_ themselves. In clusters with > dense nodes (especially with a lot of tablets), there's often a great number > of RPCs sent between pairs of nodes. > We should explore ways of coalescing those RPCs. I don't know whether that > would happen within the krpc system itself (i.e. in a payload-agnostic way), > or whether we'd only coalesce RPCs known to be "hot" (like UpdateConsensus). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
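The bundling idea Todd describes can be sketched as follows. This is an illustrative model only: the class, the `send_batch` hook, and the tablet/server names are invented, and Kudu's real RPC layer looks nothing like this. The point is just that heartbeats park in a per-destination buffer and flush as one RPC per server, instead of one wakeup per tablet.

```python
from collections import defaultdict

class HeartbeatCoalescer:
    """Sketch only (not Kudu's RPC layer): collect the empty per-tablet
    UpdateConsensus heartbeats destined for one server and send them as
    a single batched RPC."""

    def __init__(self, send_batch):
        # send_batch(server, tablet_ids) is a hypothetical transport hook.
        self.send_batch = send_batch
        self.pending = defaultdict(list)  # server -> tablet ids awaiting send

    def enqueue(self, server, tablet_id):
        # Instead of sending immediately, park the heartbeat; delaying is
        # safe as long as the batch flushes before any election timer fires.
        self.pending[server].append(tablet_id)

    def flush(self):
        # One wakeup/RPC per destination server, not one per tablet.
        for server, tablets in self.pending.items():
            self.send_batch(server, tablets)
        count = len(self.pending)
        self.pending.clear()
        return count
```

A flush driven by a timer bounded well below the election timeout would preserve the "arrives before the election timer expires" constraint from the comment.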
[jira] [Resolved] (KUDU-1969) Please tidy up incubator distribution files
[ https://issues.apache.org/jira/browse/KUDU-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved KUDU-1969. --- Resolution: Fixed Assignee: Todd Lipcon Fix Version/s: n/a Done, sorry for the delay. > Please tidy up incubator distribution files > --- > > Key: KUDU-1969 > URL: https://issues.apache.org/jira/browse/KUDU-1969 > Project: Kudu > Issue Type: Bug > Environment: http://www.apache.org/dist/incubator/kudu/ >Reporter: Sebb >Assignee: Todd Lipcon > Fix For: n/a > > > Please remove the old incubator releases as per: > http://incubator.apache.org/guides/graduation.html#dist -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (KUDU-1972) Explore ways to reduce maintenance manager CPU load
[ https://issues.apache.org/jira/browse/KUDU-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated KUDU-1972: -- Description: The current design of the maintenance manager includes a dedicated thread that wakes up every so often (default to 250 ms), looks for work to do, and schedules it to be done on helper threads. On a large tablet server, "look for work to do" can be very CPU intensive. We should explore ways to mitigate this. Additionally, if we identify "cold" tablets (i.e. those not servicing any writes), we should be able to further reduce their scheduling load, perhaps by not running the compaction knapsack solver on them at all. was: The current design of the maintenance manager includes a dedicated thread that wakes up every so often (default to 250 ms), looks for work to do, and schedules it to be done on helper threads. On a large cluster, "look for work to do" can be very CPU intensive. We should explore ways to mitigate this. Additionally, if we identify "cold" tablets (i.e. those not servicing any writes), we should be able to further reduce their scheduling load, perhaps by not running the compaction knapsack solver on them at all. > Explore ways to reduce maintenance manager CPU load > --- > > Key: KUDU-1972 > URL: https://issues.apache.org/jira/browse/KUDU-1972 > Project: Kudu > Issue Type: Sub-task > Components: tserver >Affects Versions: 1.4.0 >Reporter: Adar Dembo > Labels: data-scalability > > The current design of the maintenance manager includes a dedicated thread > that wakes up every so often (default to 250 ms), looks for work to do, and > schedules it to be done on helper threads. On a large tablet server, "look > for work to do" can be very CPU intensive. We should explore ways to mitigate > this. > Additionally, if we identify "cold" tablets (i.e. 
those not servicing any > writes), we should be able to further reduce their scheduling load, perhaps > by not running the compaction knapsack solver on them at all. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
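The cold-tablet idea in the description above can be sketched as follows. The function, field, and threshold names are all invented for illustration; Kudu's maintenance manager does not have this API.

```python
import time

def pick_best_op(tablets, score_fn, cold_after_secs=300, now=None):
    """Sketch of the cold-tablet idea: skip the expensive per-tablet
    scoring (the "knapsack solver") entirely for tablets that have seen
    no writes recently, reducing the scheduler's CPU load."""
    now = time.time() if now is None else now
    best, best_score = None, float("-inf")
    for t in tablets:
        # Cold tablet: don't even run the scoring step on it.
        if now - t["last_write_time"] > cold_after_secs:
            continue
        score = score_fn(t)
        if score > best_score:
            best, best_score = t, score
    return best
```

On a server where most tablets are idle, this turns each 250 ms scan from "score everything" into "score the handful of hot tablets".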
[jira] [Created] (KUDU-1974) Improve web UI experience with many tablets
Adar Dembo created KUDU-1974: Summary: Improve web UI experience with many tablets Key: KUDU-1974 URL: https://issues.apache.org/jira/browse/KUDU-1974 Project: Kudu Issue Type: Sub-task Components: supportability, tserver Affects Versions: 1.4.0 Reporter: Adar Dembo On nodes with many tablets, the web UI is...not great. There are several pages that display something for each tablet, and those pages become unwieldy with thousands of tablets. We should look into either: # Removing those pages (if they aren't adding value), or # Collapsing the data and making it expandable/searchable, or # Exposing the same data in a different way (i.e. perhaps it'd be more relevant if aggregated anyway). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (KUDU-1973) Coalesce RPCs destined for the same server
Adar Dembo created KUDU-1973: Summary: Coalesce RPCs destined for the same server Key: KUDU-1973 URL: https://issues.apache.org/jira/browse/KUDU-1973 Project: Kudu Issue Type: Sub-task Components: rpc, tserver Affects Versions: 1.4.0 Reporter: Adar Dembo The krpc subsystem ensures that only one _connection_ exists between any pair of nodes, but it doesn't coalesce the _RPCs_ themselves. In clusters with dense nodes (especially with a lot of tablets), there's often a great number of RPCs sent between pairs of nodes. We should explore ways of coalescing those RPCs. I don't know whether that would happen within the krpc system itself (i.e. in a payload-agnostic way), or whether we'd only coalesce RPCs known to be "hot" (like UpdateConsensus). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (KUDU-1972) Explore ways to reduce maintenance manager CPU load
Adar Dembo created KUDU-1972: Summary: Explore ways to reduce maintenance manager CPU load Key: KUDU-1972 URL: https://issues.apache.org/jira/browse/KUDU-1972 Project: Kudu Issue Type: Sub-task Components: tserver Affects Versions: 1.4.0 Reporter: Adar Dembo The current design of the maintenance manager includes a dedicated thread that wakes up every so often (default to 250 ms), looks for work to do, and schedules it to be done on helper threads. On a large cluster, "look for work to do" can be very CPU intensive. We should explore ways to mitigate this. Additionally, if we identify "cold" tablets (i.e. those not servicing any writes), we should be able to further reduce their scheduling load, perhaps by not running the compaction knapsack solver on them at all. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (KUDU-1971) Explore reducing number of data blocks by tuning existing parameters
[ https://issues.apache.org/jira/browse/KUDU-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo updated KUDU-1971: - Labels: data-scalability (was: ) > Explore reducing number of data blocks by tuning existing parameters > > > Key: KUDU-1971 > URL: https://issues.apache.org/jira/browse/KUDU-1971 > Project: Kudu > Issue Type: Sub-task > Components: tablet >Affects Versions: 1.4.0 >Reporter: Adar Dembo > Labels: data-scalability > > One way to scale to larger on-disk data sets is to reduce the ratio between > data blocks and data; that is, to make data blocks larger. Two existing > parameters control for this: > * budgeted_compaction_target_rowset_size: within a given flush or compaction > operation, stipulates the size of each rowset. Currently 32M. > * tablet_compaction_budget_mb: stipulates the amount of data that should be > included in any given compaction. Currently 128M. > It might be interesting to explore raising these. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KUDU-1913) Tablet server runs out of threads when creating lots of tablets
[ https://issues.apache.org/jira/browse/KUDU-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967031#comment-15967031 ] Adar Dembo commented on KUDU-1913: -- A couple of notes about this: We currently rely on there being a single prepare thread per tablet in order to serialize writes via Raft replication. If these threads were aggregated across the tserver, we'd want a way to ensure that writes from the same tablet are processed serially. Chromium's [Sequenced Worker Pool|https://cs.chromium.org/chromium/src/base/threading/sequenced_worker_pool.h?q=base::SequencedWorkerPool&sq=package:chromium&l=72&type=cs] might be a good fit for this. [MultiRaft|https://www.cockroachlabs.com/blog/scaling-raft/] is an approach adopted by CockroachDB to improve Raft scalability when a server has many tablets. It could be worth exploring for our purposes too, though I see CockroachDB is [now using etcd's Raft implementation|https://github.com/cockroachdb/cockroach/issues/20]; I don't know if it implements MultiRaft or not. > Tablet server runs out of threads when creating lots of tablets > --- > > Key: KUDU-1913 > URL: https://issues.apache.org/jira/browse/KUDU-1913 > Project: Kudu > Issue Type: Sub-task > Components: consensus, log >Reporter: Juan Yu > Labels: data-scalability > > When adding lots of range partitions, all tablet servers crashed with the > following error: > F0308 14:51:04.109369 12952 raft_consensus.cc:1985] Check failed: _s.ok() Bad > status: Runtime error: Could not create thread: Resource temporarily > unavailable (error 11) > The tablet server should handle errors/failures more gracefully instead of crashing. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
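The per-tablet ordering guarantee Adar mentions can be sketched as follows. This is a minimal model of the idea behind Chromium's SequencedWorkerPool, with invented names, not the actual API: tasks sharing a sequence token (e.g. a tablet id) run serially and in submission order, so a dedicated prepare thread per tablet is no longer required.

```python
import threading
from collections import deque

class SequencedPool:
    """Sketch: per-token serial execution without a thread per token."""

    def __init__(self):
        self._lock = threading.Lock()
        self._queues = {}     # token -> deque of pending tasks
        self._active = set()  # tokens currently being drained
        self._threads = []

    def post(self, token, task):
        with self._lock:
            self._queues.setdefault(token, deque()).append(task)
            if token in self._active:
                return  # the existing drainer will pick the task up
            self._active.add(token)
            t = threading.Thread(target=self._drain, args=(token,))
            self._threads.append(t)
            t.start()

    def _drain(self, token):
        # At most one drainer per token exists, so tasks for one token
        # never overlap and never reorder.
        while True:
            with self._lock:
                queue = self._queues[token]
                if not queue:
                    self._active.discard(token)
                    return
                task = queue.popleft()
            task()

    def join(self):
        for t in self._threads:
            t.join()
```

In a real server the drainers would come from a fixed-size worker pool rather than ad-hoc threads; the point here is only the per-token ordering guarantee.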
[jira] [Updated] (KUDU-383) Massive numbers of threads used by log append and GC
[ https://issues.apache.org/jira/browse/KUDU-383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo updated KUDU-383: Issue Type: Bug (was: Sub-task) Parent: (was: KUDU-1967) > Massive numbers of threads used by log append and GC > > > Key: KUDU-383 > URL: https://issues.apache.org/jira/browse/KUDU-383 > Project: Kudu > Issue Type: Bug > Components: log >Affects Versions: M4.5 >Reporter: Mike Percy > Labels: data-scalability > Fix For: n/a > > Attachments: create-table-stress-test-10697.txt.gz > > > {noformat} > $ ../../build-support/stacktrace-thread-summary.pl > create-table-stress-test-10697.txt | awk '{print $3}' | sort | uniq -c | sort > -n > 1 kudu::KernelStackWatchdog::RunThread() > 1 kudu::MaintenanceManager::RunSchedulerThread() > 1 kudu::master::CatalogManagerBgTasks::Run() > 1 kudu::tserver::Heartbeater::Thread::RunThread() > 1 kudu::tserver::ScannerManager::RunRemovalThread() > 1 main > 1 timer_helper_thread > 1 timer_sigev_thread > 2 kudu::rpc::AcceptorPool::RunThread() > 2 master_thread > 4 kudu::ThreadPool::DispatchThread(bool) > 12 kudu::rpc::ReactorThread::RunThread() > 20 kudu::rpc::ServicePool::RunThread() >3291 kudu::log::Log::AppendThread::RunThread() >3291 kudu::tablet::TabletPeer::RunLogGC() > W0626 02:09:16.853266 27997 messenger.cc:187] Unable to handle RPC call: > Service unavailable: TSHeartbeat request on kudu.master.MasterService from > 127.0.0.1:46937 dropped due to backpressure. The service queue is full; it > has 50 items. > W0626 02:09:16.854862 28074 heartbeater.cc:278] Failed to heartbeat: Remote > error: Failed to send heartbeat: Service unavailable: TSHeartbeat request on > kudu.master.MasterService from 127.0.0.1:46937 dropped due to backpressure. > The service queue is full; it has 50 items. > W0626 02:09:16.882686 27997 messenger.cc:187] Unable to handle RPC call: > Service unavailable: TSHeartbeat request on kudu.master.MasterService from > 127.0.0.1:46937 dropped due to backpressure. 
The service queue is full; it > has 50 items. > W0626 02:09:16.884294 28074 heartbeater.cc:278] Failed to heartbeat: Remote > error: Failed to send heartbeat: Service unavailable: TSHeartbeat request on > kudu.master.MasterService from 127.0.0.1:46937 dropped due to backpressure. > The service queue is full; it has 50 items. > F0626 02:09:31.747577 10965 test_main.cc:63] Maximum unit test time exceeded > (900 sec) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (KUDU-383) Massive numbers of threads used by log append and GC
[ https://issues.apache.org/jira/browse/KUDU-383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo resolved KUDU-383. - Resolution: Duplicate Fix Version/s: n/a Yes, I'm closing a bug that's far older than its duplicate, but KUDU-1913 does a better job of describing the problem. > Massive numbers of threads used by log append and GC > > > Key: KUDU-383 > URL: https://issues.apache.org/jira/browse/KUDU-383 > Project: Kudu > Issue Type: Sub-task > Components: log >Affects Versions: M4.5 >Reporter: Mike Percy > Labels: data-scalability > Fix For: n/a > > Attachments: create-table-stress-test-10697.txt.gz > > > {noformat} > $ ../../build-support/stacktrace-thread-summary.pl > create-table-stress-test-10697.txt | awk '{print $3}' | sort | uniq -c | sort > -n > 1 kudu::KernelStackWatchdog::RunThread() > 1 kudu::MaintenanceManager::RunSchedulerThread() > 1 kudu::master::CatalogManagerBgTasks::Run() > 1 kudu::tserver::Heartbeater::Thread::RunThread() > 1 kudu::tserver::ScannerManager::RunRemovalThread() > 1 main > 1 timer_helper_thread > 1 timer_sigev_thread > 2 kudu::rpc::AcceptorPool::RunThread() > 2 master_thread > 4 kudu::ThreadPool::DispatchThread(bool) > 12 kudu::rpc::ReactorThread::RunThread() > 20 kudu::rpc::ServicePool::RunThread() >3291 kudu::log::Log::AppendThread::RunThread() >3291 kudu::tablet::TabletPeer::RunLogGC() > W0626 02:09:16.853266 27997 messenger.cc:187] Unable to handle RPC call: > Service unavailable: TSHeartbeat request on kudu.master.MasterService from > 127.0.0.1:46937 dropped due to backpressure. The service queue is full; it > has 50 items. > W0626 02:09:16.854862 28074 heartbeater.cc:278] Failed to heartbeat: Remote > error: Failed to send heartbeat: Service unavailable: TSHeartbeat request on > kudu.master.MasterService from 127.0.0.1:46937 dropped due to backpressure. > The service queue is full; it has 50 items. 
> W0626 02:09:16.882686 27997 messenger.cc:187] Unable to handle RPC call: > Service unavailable: TSHeartbeat request on kudu.master.MasterService from > 127.0.0.1:46937 dropped due to backpressure. The service queue is full; it > has 50 items. > W0626 02:09:16.884294 28074 heartbeater.cc:278] Failed to heartbeat: Remote > error: Failed to send heartbeat: Service unavailable: TSHeartbeat request on > kudu.master.MasterService from 127.0.0.1:46937 dropped due to backpressure. > The service queue is full; it has 50 items. > F0626 02:09:31.747577 10965 test_main.cc:63] Maximum unit test time exceeded > (900 sec) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (KUDU-1971) Explore reducing number of data blocks by tuning existing parameters
Adar Dembo created KUDU-1971: Summary: Explore reducing number of data blocks by tuning existing parameters Key: KUDU-1971 URL: https://issues.apache.org/jira/browse/KUDU-1971 Project: Kudu Issue Type: Sub-task Components: tablet Affects Versions: 1.4.0 Reporter: Adar Dembo One way to scale to larger on-disk data sets is to reduce the ratio between data blocks and data; that is, to make data blocks larger. Two existing parameters control for this: * budgeted_compaction_target_rowset_size: within a given flush or compaction operation, stipulates the size of each rowset. Currently 32M. * tablet_compaction_budget_mb: stipulates the amount of data that should be included in any given compaction. Currently 128M. It might be interesting to explore raising these. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KUDU-1442) LBM should log startup progress periodically
[ https://issues.apache.org/jira/browse/KUDU-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967020#comment-15967020 ] Jean-Daniel Cryans commented on KUDU-1442: -- bq. That might be tricky to orchestrate across data dir threads; maybe once per thread per x number of containers? Yeah that sounds pretty good. Dunno what x should be though. > LBM should log startup progress periodically > > > Key: KUDU-1442 > URL: https://issues.apache.org/jira/browse/KUDU-1442 > Project: Kudu > Issue Type: Sub-task > Components: tablet >Reporter: zhangsong >Priority: Trivial > Labels: data-scalability, newbie > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (KUDU-38) bootstrap should not replay logs that are known to be fully flushed
[ https://issues.apache.org/jira/browse/KUDU-38?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo updated KUDU-38: --- Issue Type: Sub-task (was: Improvement) Parent: KUDU-1967 > bootstrap should not replay logs that are known to be fully flushed > --- > > Key: KUDU-38 > URL: https://issues.apache.org/jira/browse/KUDU-38 > Project: Kudu > Issue Type: Sub-task > Components: tablet >Affects Versions: M3 >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Labels: data-scalability, startup-time > > Currently the bootstrap process will process all of the log segments, > including those that can be trivially determined to contain only durable > edits. This makes startup unnecessarily slow. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (KUDU-1830) Reduce Kudu WAL log disk usage
[ https://issues.apache.org/jira/browse/KUDU-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo reassigned KUDU-1830: Assignee: (was: Adar Dembo) > Reduce Kudu WAL log disk usage > -- > > Key: KUDU-1830 > URL: https://issues.apache.org/jira/browse/KUDU-1830 > Project: Kudu > Issue Type: Sub-task > Components: consensus, log >Reporter: Juan Yu > Labels: data-scalability > > The WAL can take significant disk space. So far there are some configs to > limit it, but it can go very high. > WAL size = #tablets * log_segment_size_mb * log segments (1 if there are > write ops to this tablet, can go up to log_max_segments_to_retain) > Logs are retained even if there are no writes for a while. > We could reduce WAL usage by: > - reducing min_segments_to_retain to 1 instead of 2, and > - reducing the steady-state consumption of idle tablets: roll a WAL if it has had > no writes for a few minutes and is more than a MB or two in size, so that "idle" > tablets have 0 WAL space consumed -- This message was sent by Atlassian JIRA (v6.3.15#6346)
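The sizing formula quoted in the description works out as follows; the example numbers are illustrative, not Kudu's defaults.

```python
def worst_case_wal_mb(num_tablets, segment_size_mb, segments_retained):
    """Direct transcription of the formula above:
    WAL size = #tablets * log_segment_size_mb * segments retained."""
    return num_tablets * segment_size_mb * segments_retained

# e.g. 2000 tablets each retaining two 8 MB segments:
# 2000 * 8 * 2 = 32000 MB (~31 GiB), even if most tablets are idle.
```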
[jira] [Assigned] (KUDU-383) Massive numbers of threads used by log append and GC
[ https://issues.apache.org/jira/browse/KUDU-383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo reassigned KUDU-383: --- Assignee: (was: Adar Dembo) > Massive numbers of threads used by log append and GC > > > Key: KUDU-383 > URL: https://issues.apache.org/jira/browse/KUDU-383 > Project: Kudu > Issue Type: Sub-task > Components: log >Affects Versions: M4.5 >Reporter: Mike Percy > Labels: data-scalability > Attachments: create-table-stress-test-10697.txt.gz > > > {noformat} > $ ../../build-support/stacktrace-thread-summary.pl > create-table-stress-test-10697.txt | awk '{print $3}' | sort | uniq -c | sort > -n > 1 kudu::KernelStackWatchdog::RunThread() > 1 kudu::MaintenanceManager::RunSchedulerThread() > 1 kudu::master::CatalogManagerBgTasks::Run() > 1 kudu::tserver::Heartbeater::Thread::RunThread() > 1 kudu::tserver::ScannerManager::RunRemovalThread() > 1 main > 1 timer_helper_thread > 1 timer_sigev_thread > 2 kudu::rpc::AcceptorPool::RunThread() > 2 master_thread > 4 kudu::ThreadPool::DispatchThread(bool) > 12 kudu::rpc::ReactorThread::RunThread() > 20 kudu::rpc::ServicePool::RunThread() >3291 kudu::log::Log::AppendThread::RunThread() >3291 kudu::tablet::TabletPeer::RunLogGC() > W0626 02:09:16.853266 27997 messenger.cc:187] Unable to handle RPC call: > Service unavailable: TSHeartbeat request on kudu.master.MasterService from > 127.0.0.1:46937 dropped due to backpressure. The service queue is full; it > has 50 items. > W0626 02:09:16.854862 28074 heartbeater.cc:278] Failed to heartbeat: Remote > error: Failed to send heartbeat: Service unavailable: TSHeartbeat request on > kudu.master.MasterService from 127.0.0.1:46937 dropped due to backpressure. > The service queue is full; it has 50 items. > W0626 02:09:16.882686 27997 messenger.cc:187] Unable to handle RPC call: > Service unavailable: TSHeartbeat request on kudu.master.MasterService from > 127.0.0.1:46937 dropped due to backpressure. The service queue is full; it > has 50 items. 
> W0626 02:09:16.884294 28074 heartbeater.cc:278] Failed to heartbeat: Remote > error: Failed to send heartbeat: Service unavailable: TSHeartbeat request on > kudu.master.MasterService from 127.0.0.1:46937 dropped due to backpressure. > The service queue is full; it has 50 items. > F0626 02:09:31.747577 10965 test_main.cc:63] Maximum unit test time exceeded > (900 sec) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (KUDU-1913) Tablet server runs out of threads when creating lots of tablets
[ https://issues.apache.org/jira/browse/KUDU-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo reassigned KUDU-1913: Assignee: (was: Adar Dembo) > Tablet server runs out of threads when creating lots of tablets > --- > > Key: KUDU-1913 > URL: https://issues.apache.org/jira/browse/KUDU-1913 > Project: Kudu > Issue Type: Sub-task > Components: consensus, log >Reporter: Juan Yu > Labels: data-scalability > > When adding lots of range partitions, all tablet servers crashed with the > following error: > F0308 14:51:04.109369 12952 raft_consensus.cc:1985] Check failed: _s.ok() Bad > status: Runtime error: Could not create thread: Resource temporarily > unavailable (error 11) > The tablet server should handle errors/failures more gracefully instead of crashing. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (KUDU-1549) LBM should start up faster
[ https://issues.apache.org/jira/browse/KUDU-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo updated KUDU-1549: - Issue Type: Sub-task (was: Improvement) Parent: KUDU-1967 > LBM should start up faster > -- > > Key: KUDU-1549 > URL: https://issues.apache.org/jira/browse/KUDU-1549 > Project: Kudu > Issue Type: Sub-task > Components: tablet, tserver > Environment: cpu: Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz > mem: 252 G > disk: single ssd 1.5 T left. >Reporter: zhangsong > Labels: data-scalability > Attachments: a14844513e5243a993b2b84bf0dcec4c.short.txt > > > After a physical node crash, recovery/startup of the kudu-tserver was > observed to be slower than a usual restart. There are messages like "Found > partial trailing metadata" in the kudu-tserver log, and it seems to cost more > than 20 minutes to recover this metadata. > According to adar, it shouldn't be this slow. > The attachment is the startup log. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (KUDU-1549) LBM should start up faster
[ https://issues.apache.org/jira/browse/KUDU-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo updated KUDU-1549: - Summary: LBM should start up faster (was: recovery speed of kudu-tserver should be faster.) I'm repurposing this JIRA for the general problem of "LBM startup is too damn slow." Some potential improvements: # Identify and delete LBM containers that are full but have no live blocks. This can happen at startup time, at last-live-block-deletion time, periodically (perhaps via maintenance manager scheduling), or some combination of the above. # Identify LBM containers that are full and have very few live blocks. "Defragment" the container and make it available for writing again. Probably best to do this periodically; it may get expensive to do it at startup or when the container becomes full. # Compact LBM container metadata by identifying and removing CREATE/DELETE pairs of records. Probably best to restrict this to full containers. Not sure when it's best to do it. > LBM should start up faster > -- > > Key: KUDU-1549 > URL: https://issues.apache.org/jira/browse/KUDU-1549 > Project: Kudu > Issue Type: Improvement > Components: tablet, tserver > Environment: cpu: Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz > mem: 252 G > disk: single ssd 1.5 T left. >Reporter: zhangsong > Labels: data-scalability > Attachments: a14844513e5243a993b2b84bf0dcec4c.short.txt > > > After a physical node crash, recovery/startup of the kudu-tserver was > observed to be slower than a usual restart. There are messages like "Found > partial trailing metadata" in the kudu-tserver log, and it seems to cost more > than 20 minutes to recover this metadata. > According to adar, it shouldn't be this slow. > The attachment is the startup log. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
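Improvement #3 above (compacting container metadata by removing CREATE/DELETE record pairs) can be sketched as follows. The (op, block_id) tuple format is invented for illustration; the real LBM metadata is a protobuf record log.

```python
def compact_container_records(records):
    """Sketch: rewrite a full container's metadata so it describes only
    live blocks, dropping CREATE/DELETE pairs for dead blocks."""
    deleted = {block_id for op, block_id in records if op == "DELETE"}
    # Keep only CREATE records for blocks that were never deleted.
    return [(op, block_id) for op, block_id in records
            if op == "CREATE" and block_id not in deleted]
```

The fewer records survive the rewrite, the less metadata the LBM has to replay at startup, which is the connection to "LBM should start up faster".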
[jira] [Updated] (KUDU-1442) LBM should log startup progress periodically
[ https://issues.apache.org/jira/browse/KUDU-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo updated KUDU-1442: - Labels: data-scalability newbie (was: data-scalability) Summary: LBM should log startup progress periodically (was: kudu should have more log when starting up, user need to check the progress of starting.) > LBM should log startup progress periodically > > > Key: KUDU-1442 > URL: https://issues.apache.org/jira/browse/KUDU-1442 > Project: Kudu > Issue Type: Sub-task > Components: tablet >Reporter: zhangsong >Priority: Trivial > Labels: data-scalability, newbie > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KUDU-1442) kudu should have more log when starting up, user need to check the progress of starting.
[ https://issues.apache.org/jira/browse/KUDU-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967012#comment-15967012 ] Adar Dembo commented on KUDU-1442: -- Apart from fixing KUDU-1192, we should (as [~bruceSz] said) log more progress information in the log block manager. Perhaps a periodic progress message every second that describes how many containers we've loaded thus far? That might be tricky to orchestrate across data dir threads; maybe once per thread per x number of containers? > kudu should have more log when starting up, user need to check the progress > of starting. > > > Key: KUDU-1442 > URL: https://issues.apache.org/jira/browse/KUDU-1442 > Project: Kudu > Issue Type: Sub-task > Components: tablet >Reporter: zhangsong >Priority: Trivial > Labels: data-scalability > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
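The "once per thread per x number of containers" throttling suggested above can be sketched as follows; the helper and its parameters are hypothetical, not anything in the Kudu codebase.

```python
def load_containers(container_ids, open_fn, log_every=100, log=print):
    """Sketch: each data-dir thread logs progress once per `log_every`
    containers (plus once at the end), instead of once per container or
    on a wall-clock timer that would need cross-thread coordination."""
    total = len(container_ids)
    for i, cid in enumerate(container_ids, start=1):
        open_fn(cid)
        if i % log_every == 0 or i == total:
            log(f"loaded {i}/{total} containers")
```

Because each thread throttles itself by count, no shared timer state is needed, which sidesteps the orchestration concern raised in the comment.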
[jira] [Updated] (KUDU-1442) kudu should have more log when starting up, user need to check the progress of starting.
[ https://issues.apache.org/jira/browse/KUDU-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo updated KUDU-1442: - Issue Type: Sub-task (was: Improvement) Parent: KUDU-1967 > kudu should have more log when starting up, user need to check the progress > of starting. > > > Key: KUDU-1442 > URL: https://issues.apache.org/jira/browse/KUDU-1442 > Project: Kudu > Issue Type: Sub-task > Components: tablet >Reporter: zhangsong >Priority: Trivial > Labels: data-scalability > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (KUDU-1125) Reduce impact of enabling fsync on the master
[ https://issues.apache.org/jira/browse/KUDU-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo updated KUDU-1125: - Issue Type: Sub-task (was: Improvement) Parent: KUDU-1967 > Reduce impact of enabling fsync on the master > - > > Key: KUDU-1125 > URL: https://issues.apache.org/jira/browse/KUDU-1125 > Project: Kudu > Issue Type: Sub-task > Components: master >Affects Versions: Feature Complete >Reporter: Jean-Daniel Cryans >Priority: Critical > Labels: data-scalability > > First time running ITBLL since we enabled fsync in the master and I'm now > seeing RPCs timing out because the master is always ERROR_SERVER_TOO_BUSY. In > the log I can see a lot of elections going on and the queue is always full. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (KUDU-1830) Reduce Kudu WAL log disk usage
[ https://issues.apache.org/jira/browse/KUDU-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo reassigned KUDU-1830: Assignee: Adar Dembo > Reduce Kudu WAL log disk usage > -- > > Key: KUDU-1830 > URL: https://issues.apache.org/jira/browse/KUDU-1830 > Project: Kudu > Issue Type: Sub-task > Components: consensus, log >Reporter: Juan Yu >Assignee: Adar Dembo > Labels: data-scalability > > The WAL can take significant disk space. So far there are some configs to > limit it, but it can go very high. > WAL size = #tablets * log_segment_size_mb * log segments (1 if there are > write ops to this tablet, can go up to log_max_segments_to_retain) > Logs are retained even if there are no writes for a while. > We could reduce WAL usage by: > - reducing min_segments_to_retain to 1 instead of 2, and > - reducing the steady-state consumption of idle tablets: roll a WAL if it has had > no writes for a few minutes and is more than a MB or two in size, so that "idle" > tablets have 0 WAL space consumed -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (KUDU-1807) GetTableSchema() is O(n) in the number of tablets
[ https://issues.apache.org/jira/browse/KUDU-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo updated KUDU-1807: - Issue Type: Sub-task (was: Bug) Parent: KUDU-1967 > GetTableSchema() is O(n) in the number of tablets > - > > Key: KUDU-1807 > URL: https://issues.apache.org/jira/browse/KUDU-1807 > Project: Kudu > Issue Type: Sub-task > Components: master, perf >Affects Versions: 1.2.0 >Reporter: Todd Lipcon >Priority: Critical > Labels: data-scalability > > GetTableSchema calls TableInfo::IsCreateTableDone. This method checks each > tablet for whether it is in the correct state, which requires acquiring the > RWC lock for every tablet. This is somewhat slow for large tables with > thousands of tablets, and this is actually a relatively hot path because > every task in an Impala query ends up calling GetTableSchema() when it opens > its scanner. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
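One possible mitigation can be sketched as follows, with invented names (this is not Kudu's code): for a fixed set of tablets, the answer to IsCreateTableDone() can only flip from false to true, so it can be cached and the O(n) per-tablet scan skipped on every later GetTableSchema() call.

```python
class TableInfo:
    """Sketch: memoize the create-table-done check."""

    def __init__(self, tablet_states):
        self.tablet_states = tablet_states  # tablet id -> state string
        self._create_done = False           # sticky once True
        self.full_scans = 0                 # for illustration only

    def is_create_table_done(self):
        if self._create_done:
            return True  # fast path: no per-tablet locking or scanning
        self.full_scans += 1
        self._create_done = all(
            s == "RUNNING" for s in self.tablet_states.values())
        return self._create_done
```

A real implementation would need to invalidate the cached flag when tablets are added (e.g. new range partitions), which this sketch ignores.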
[jira] [Assigned] (KUDU-1913) Tablet server runs out of threads when creating lots of tablets
[ https://issues.apache.org/jira/browse/KUDU-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo reassigned KUDU-1913: Assignee: Adar Dembo > Tablet server runs out of threads when creating lots of tablets > --- > > Key: KUDU-1913 > URL: https://issues.apache.org/jira/browse/KUDU-1913 > Project: Kudu > Issue Type: Sub-task > Components: consensus, log >Reporter: Juan Yu >Assignee: Adar Dembo > Labels: data-scalability > > When adding lots of range partitions, all tablet servers crashed with the > following error: > F0308 14:51:04.109369 12952 raft_consensus.cc:1985] Check failed: _s.ok() Bad > status: Runtime error: Could not create thread: Resource temporarily > unavailable (error 11) > The tablet server should handle errors/failures more gracefully instead of crashing. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (KUDU-1830) Reduce Kudu WAL log disk usage
[ https://issues.apache.org/jira/browse/KUDU-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo updated KUDU-1830: - Issue Type: Sub-task (was: Improvement) Parent: KUDU-1967 > Reduce Kudu WAL log disk usage > -- > > Key: KUDU-1830 > URL: https://issues.apache.org/jira/browse/KUDU-1830 > Project: Kudu > Issue Type: Sub-task > Components: consensus, log >Reporter: Juan Yu > Labels: data-scalability > > The WAL can take significant disk space. There are some configs to > limit it, but usage can still grow very high. > WAL size = #tablets * log_segment_size_mb * segments retained (1 if there are > write ops to this tablet; can go up to log_max_segments_to_retain). > Logs are retained even if there are no writes for a while. > We could reduce WAL usage by: > - reducing min_segments_to_retain to 1 instead of 2, and > - reducing the steady-state consumption of idle tablets: roll the WAL if a tablet has had > no writes for a few minutes and its segment is more than a MB or two, so that "idle" > tablets consume no WAL space -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (KUDU-383) Massive numbers of threads used by log append and GC
[ https://issues.apache.org/jira/browse/KUDU-383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo updated KUDU-383: Issue Type: Sub-task (was: Bug) Parent: KUDU-1967 > Massive numbers of threads used by log append and GC > > > Key: KUDU-383 > URL: https://issues.apache.org/jira/browse/KUDU-383 > Project: Kudu > Issue Type: Sub-task > Components: log >Affects Versions: M4.5 >Reporter: Mike Percy >Assignee: Adar Dembo > Labels: data-scalability > Attachments: create-table-stress-test-10697.txt.gz > > > {noformat} > $ ../../build-support/stacktrace-thread-summary.pl > create-table-stress-test-10697.txt | awk '{print $3}' | sort | uniq -c | sort > -n > 1 kudu::KernelStackWatchdog::RunThread() > 1 kudu::MaintenanceManager::RunSchedulerThread() > 1 kudu::master::CatalogManagerBgTasks::Run() > 1 kudu::tserver::Heartbeater::Thread::RunThread() > 1 kudu::tserver::ScannerManager::RunRemovalThread() > 1 main > 1 timer_helper_thread > 1 timer_sigev_thread > 2 kudu::rpc::AcceptorPool::RunThread() > 2 master_thread > 4 kudu::ThreadPool::DispatchThread(bool) > 12 kudu::rpc::ReactorThread::RunThread() > 20 kudu::rpc::ServicePool::RunThread() >3291 kudu::log::Log::AppendThread::RunThread() >3291 kudu::tablet::TabletPeer::RunLogGC() > W0626 02:09:16.853266 27997 messenger.cc:187] Unable to handle RPC call: > Service unavailable: TSHeartbeat request on kudu.master.MasterService from > 127.0.0.1:46937 dropped due to backpressure. The service queue is full; it > has 50 items. > W0626 02:09:16.854862 28074 heartbeater.cc:278] Failed to heartbeat: Remote > error: Failed to send heartbeat: Service unavailable: TSHeartbeat request on > kudu.master.MasterService from 127.0.0.1:46937 dropped due to backpressure. > The service queue is full; it has 50 items. > W0626 02:09:16.882686 27997 messenger.cc:187] Unable to handle RPC call: > Service unavailable: TSHeartbeat request on kudu.master.MasterService from > 127.0.0.1:46937 dropped due to backpressure. 
The service queue is full; it > has 50 items. > W0626 02:09:16.884294 28074 heartbeater.cc:278] Failed to heartbeat: Remote > error: Failed to send heartbeat: Service unavailable: TSHeartbeat request on > kudu.master.MasterService from 127.0.0.1:46937 dropped due to backpressure. > The service queue is full; it has 50 items. > F0626 02:09:31.747577 10965 test_main.cc:63] Maximum unit test time exceeded > (900 sec) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (KUDU-383) Massive numbers of threads used by log append and GC
[ https://issues.apache.org/jira/browse/KUDU-383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo updated KUDU-383: Labels: data-scalability (was: ) > Massive numbers of threads used by log append and GC > > > Key: KUDU-383 > URL: https://issues.apache.org/jira/browse/KUDU-383 > Project: Kudu > Issue Type: Bug > Components: log >Affects Versions: M4.5 >Reporter: Mike Percy >Assignee: Mike Percy > Labels: data-scalability > Attachments: create-table-stress-test-10697.txt.gz > > > {noformat} > $ ../../build-support/stacktrace-thread-summary.pl > create-table-stress-test-10697.txt | awk '{print $3}' | sort | uniq -c | sort > -n > 1 kudu::KernelStackWatchdog::RunThread() > 1 kudu::MaintenanceManager::RunSchedulerThread() > 1 kudu::master::CatalogManagerBgTasks::Run() > 1 kudu::tserver::Heartbeater::Thread::RunThread() > 1 kudu::tserver::ScannerManager::RunRemovalThread() > 1 main > 1 timer_helper_thread > 1 timer_sigev_thread > 2 kudu::rpc::AcceptorPool::RunThread() > 2 master_thread > 4 kudu::ThreadPool::DispatchThread(bool) > 12 kudu::rpc::ReactorThread::RunThread() > 20 kudu::rpc::ServicePool::RunThread() >3291 kudu::log::Log::AppendThread::RunThread() >3291 kudu::tablet::TabletPeer::RunLogGC() > W0626 02:09:16.853266 27997 messenger.cc:187] Unable to handle RPC call: > Service unavailable: TSHeartbeat request on kudu.master.MasterService from > 127.0.0.1:46937 dropped due to backpressure. The service queue is full; it > has 50 items. > W0626 02:09:16.854862 28074 heartbeater.cc:278] Failed to heartbeat: Remote > error: Failed to send heartbeat: Service unavailable: TSHeartbeat request on > kudu.master.MasterService from 127.0.0.1:46937 dropped due to backpressure. > The service queue is full; it has 50 items. > W0626 02:09:16.882686 27997 messenger.cc:187] Unable to handle RPC call: > Service unavailable: TSHeartbeat request on kudu.master.MasterService from > 127.0.0.1:46937 dropped due to backpressure. 
The service queue is full; it > has 50 items. > W0626 02:09:16.884294 28074 heartbeater.cc:278] Failed to heartbeat: Remote > error: Failed to send heartbeat: Service unavailable: TSHeartbeat request on > kudu.master.MasterService from 127.0.0.1:46937 dropped due to backpressure. > The service queue is full; it has 50 items. > F0626 02:09:31.747577 10965 test_main.cc:63] Maximum unit test time exceeded > (900 sec) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (KUDU-383) Massive numbers of threads used by log append and GC
[ https://issues.apache.org/jira/browse/KUDU-383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo reassigned KUDU-383: --- Assignee: Adar Dembo (was: Mike Percy) > Massive numbers of threads used by log append and GC > > > Key: KUDU-383 > URL: https://issues.apache.org/jira/browse/KUDU-383 > Project: Kudu > Issue Type: Bug > Components: log >Affects Versions: M4.5 >Reporter: Mike Percy >Assignee: Adar Dembo > Labels: data-scalability > Attachments: create-table-stress-test-10697.txt.gz > > > {noformat} > $ ../../build-support/stacktrace-thread-summary.pl > create-table-stress-test-10697.txt | awk '{print $3}' | sort | uniq -c | sort > -n > 1 kudu::KernelStackWatchdog::RunThread() > 1 kudu::MaintenanceManager::RunSchedulerThread() > 1 kudu::master::CatalogManagerBgTasks::Run() > 1 kudu::tserver::Heartbeater::Thread::RunThread() > 1 kudu::tserver::ScannerManager::RunRemovalThread() > 1 main > 1 timer_helper_thread > 1 timer_sigev_thread > 2 kudu::rpc::AcceptorPool::RunThread() > 2 master_thread > 4 kudu::ThreadPool::DispatchThread(bool) > 12 kudu::rpc::ReactorThread::RunThread() > 20 kudu::rpc::ServicePool::RunThread() >3291 kudu::log::Log::AppendThread::RunThread() >3291 kudu::tablet::TabletPeer::RunLogGC() > W0626 02:09:16.853266 27997 messenger.cc:187] Unable to handle RPC call: > Service unavailable: TSHeartbeat request on kudu.master.MasterService from > 127.0.0.1:46937 dropped due to backpressure. The service queue is full; it > has 50 items. > W0626 02:09:16.854862 28074 heartbeater.cc:278] Failed to heartbeat: Remote > error: Failed to send heartbeat: Service unavailable: TSHeartbeat request on > kudu.master.MasterService from 127.0.0.1:46937 dropped due to backpressure. > The service queue is full; it has 50 items. > W0626 02:09:16.882686 27997 messenger.cc:187] Unable to handle RPC call: > Service unavailable: TSHeartbeat request on kudu.master.MasterService from > 127.0.0.1:46937 dropped due to backpressure. 
The service queue is full; it > has 50 items. > W0626 02:09:16.884294 28074 heartbeater.cc:278] Failed to heartbeat: Remote > error: Failed to send heartbeat: Service unavailable: TSHeartbeat request on > kudu.master.MasterService from 127.0.0.1:46937 dropped due to backpressure. > The service queue is full; it has 50 items. > F0626 02:09:31.747577 10965 test_main.cc:63] Maximum unit test time exceeded > (900 sec) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (KUDU-1913) Tablet server runs out of threads when creating lots of tablets
[ https://issues.apache.org/jira/browse/KUDU-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo updated KUDU-1913: - Issue Type: Sub-task (was: Bug) Parent: KUDU-1967 > Tablet server runs out of threads when creating lots of tablets > --- > > Key: KUDU-1913 > URL: https://issues.apache.org/jira/browse/KUDU-1913 > Project: Kudu > Issue Type: Sub-task > Components: consensus, log >Reporter: Juan Yu > Labels: data-scalability > > When adding lots of range partitions, all tablet servers crashed with the > following error: > F0308 14:51:04.109369 12952 raft_consensus.cc:1985] Check failed: _s.ok() Bad > status: Runtime error: Could not create thread: Resource temporarily > unavailable (error 11) > The tablet server should handle errors/failures more gracefully instead of crashing. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (KUDU-1970) Integration test for data scalability
Adar Dembo created KUDU-1970: Summary: Integration test for data scalability Key: KUDU-1970 URL: https://issues.apache.org/jira/browse/KUDU-1970 Project: Kudu Issue Type: Sub-task Components: master, tserver Affects Versions: 1.4.0 Reporter: Adar Dembo Assignee: Adar Dembo To help test data scalability fixes, we need a way to easily produce an environment that exhibits our current scalability issues. I'm sure one of our long-running workloads would be up to the task, but aside from taking a long time, it'd also fill up the disk, which makes it unusable on most developer machines. Ultimately, data isn't really the root cause of our scalability woes; it's the metadata necessary to maintain the data that hurts us. So an idealized environment would be heavy on the metadata. Here's a not-so-exhaustive list: * Many tablets. * Many columns per tablet. * Many rowsets per tablet. * Many data blocks. * Many tables (tservers don't care about this, but maybe the master does?) Let's write an integration test that swamps the machine with the above. It should use an external mini cluster to simplify isolating master and tserver performance characteristics, but it needn't have more than one instance of each. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (KUDU-1837) kudu-jepsen reports non-linearizable history for the tserver-majorities-ring scenario
[ https://issues.apache.org/jira/browse/KUDU-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15966918#comment-15966918 ] Alexey Serbin edited comment on KUDU-1837 at 4/13/17 12:34 AM: --- This bug is not valid because kudu-jepsen ran with the wrong client model: a separate Kudu client per test actor. That model would require propagating timestamps between clients for sound test results, but that was not done. The client model has been updated by changelist 678a309b5d88e2fe9c6a0674ba7de00daee35cac: since then, all test actors share the same Kudu client instance, so timestamps are propagated automatically. Since the client model was updated, no errors have occurred. If any new issues are found by the new kudu-jepsen code, please open a new JIRA item. was (Author: aserbin): This bug is not valid because kudu-jepsen ran with the wrong client model: a separate Kudu client per test actor. That model would require propagating timestamps between clients for sound test results, but that was not done. The client model has been updated by changelist 678a309b5d88e2fe9c6a0674ba7de00daee35cac: since then, all test actors share the same Kudu client instance, so timestamps are propagated automatically. > kudu-jepsen reports non-linearizable history for the tserver-majorities-ring > scenario > - > > Key: KUDU-1837 > URL: https://issues.apache.org/jira/browse/KUDU-1837 > Project: Kudu > Issue Type: Bug >Affects Versions: 1.2.0 >Reporter: Alexey Serbin > Labels: kudu-jepsen > Fix For: 1.4.0 > > Attachments: 20170115T142504.000-0800.tar.bz2 > > > The kudu-jepsen test has found an instance of non-linearizable history for > the tserver-majorities-ring scenario. The artifacts from the failed scenario > are attached. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (KUDU-1825) kudu-jepsen reports non-linearizable history for the kill-restart-all-tservers scenario
[ https://issues.apache.org/jira/browse/KUDU-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15966917#comment-15966917 ] Alexey Serbin edited comment on KUDU-1825 at 4/13/17 12:33 AM: --- This bug is not valid because kudu-jepsen ran with the wrong client model: a separate Kudu client per test actor. That model would require propagating timestamps between clients for sound test results, but that was not done. The client model has been updated by changelist 678a309b5d88e2fe9c6a0674ba7de00daee35cac: since then, all test actors share the same Kudu client instance, so timestamps are propagated automatically. Since the client model was updated, no errors have occurred. If any new issues are found by the new kudu-jepsen code, please open a new JIRA item. was (Author: aserbin): This bug is not valid because kudu-jepsen ran with the wrong client model: a separate Kudu client per test actor. That model would require propagating timestamps between clients for sound test results, but that was not done. The client model has been updated by changelist 678a309b5d88e2fe9c6a0674ba7de00daee35cac: since then, all test actors share the same Kudu client instance, so timestamps are propagated automatically. > kudu-jepsen reports non-linearizable history for the > kill-restart-all-tservers scenario > --- > > Key: KUDU-1825 > URL: https://issues.apache.org/jira/browse/KUDU-1825 > Project: Kudu > Issue Type: Bug >Affects Versions: 1.2.0 >Reporter: Alexey Serbin >Assignee: Alexey Serbin > Labels: kudu-jepsen > Fix For: 1.4.0 > > Attachments: history.txt, linear.svg, master.log, ts-1.log, ts-2.log, > ts-3.log, ts-4.log, ts-5.log > > > The kudu-jepsen test found a non-linearizable history of operations for the > kill-restart-all-tservers scenario. > The artifacts of the failed scenario are attached. > It's necessary to create a reproducible scenario and fix the problem, if possible. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (KUDU-1842) kudu-jepsen reports non-linearizable history for the 'all-random-halves' scenario
[ https://issues.apache.org/jira/browse/KUDU-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15966919#comment-15966919 ] Alexey Serbin edited comment on KUDU-1842 at 4/13/17 12:33 AM: --- This bug is not valid because kudu-jepsen ran with the wrong client model: a separate Kudu client per test actor. That model would require propagating timestamps between clients for sound test results, but that was not done. The client model has been updated by changelist 678a309b5d88e2fe9c6a0674ba7de00daee35cac: since then, all test actors share the same Kudu client instance, so timestamps are propagated automatically. Since the client model was updated, no errors have occurred. If any new issues are found by the new kudu-jepsen code, please open a new JIRA item. was (Author: aserbin): This bug is not valid because kudu-jepsen ran with the wrong client model: a separate Kudu client per test actor. That model would require propagating timestamps between clients for sound test results, but that was not done. The client model has been updated by changelist 678a309b5d88e2fe9c6a0674ba7de00daee35cac: since then, all test actors share the same Kudu client instance, so timestamps are propagated automatically. > kudu-jepsen reports non-linearizable history for the 'all-random-halves' > scenario > - > > Key: KUDU-1842 > URL: https://issues.apache.org/jira/browse/KUDU-1842 > Project: Kudu > Issue Type: Bug > Components: consensus, test >Affects Versions: 1.2.0 >Reporter: Alexey Serbin > Labels: kudu-jepsen > Fix For: 1.4.0 > > Attachments: 20170119T023411.000-0800.tar.bz2 > > > The kudu-jepsen test has found an instance of non-linearizable history for > the 'all-random-halves' nemesis scenario. The artifacts from the failed > scenario are attached. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (KUDU-1838) kudu-jepsen reports non-linearizable history for the hammer-3-tservers scenario
[ https://issues.apache.org/jira/browse/KUDU-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15966914#comment-15966914 ] Alexey Serbin edited comment on KUDU-1838 at 4/13/17 12:32 AM: --- This bug is not valid because kudu-jepsen ran with the wrong client model: a separate Kudu client per test actor. That model would require propagating timestamps between clients for sound test results, but that was not done. The client model has been updated by changelist 678a309b5d88e2fe9c6a0674ba7de00daee35cac: since then, all test actors share the same Kudu client instance, so timestamps are propagated automatically. Since the client model was updated, no errors have occurred. If any new issues are found by the new kudu-jepsen code, please open a new JIRA item. was (Author: aserbin): This bug is not valid because kudu-jepsen ran with the wrong client model: a separate Kudu client per test actor. That model would require propagating timestamps between clients for sound test results, but that was not done. The client model has been updated by changelist 678a309b5d88e2fe9c6a0674ba7de00daee35cac: since then, all test actors share the same Kudu client instance, so timestamps are propagated automatically. > kudu-jepsen reports non-linearizable history for the hammer-3-tservers > scenario > --- > > Key: KUDU-1838 > URL: https://issues.apache.org/jira/browse/KUDU-1838 > Project: Kudu > Issue Type: Bug >Affects Versions: 1.2.0 >Reporter: Alexey Serbin > Labels: kudu-jepsen > Fix For: 1.4.0 > > Attachments: 20170115T165415.000-0800.tar.bz2 > > > The kudu-jepsen test has found an instance of non-linearizable history for > the hammer-3-tservers scenario. The artifacts from the failed scenario are > attached. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KUDU-1838) kudu-jepsen reports non-linearizable history for the hammer-3-tservers scenario
[ https://issues.apache.org/jira/browse/KUDU-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15966921#comment-15966921 ] David Alves commented on KUDU-1838: --- I should add that since that was fixed, this error no longer occurs. > kudu-jepsen reports non-linearizable history for the hammer-3-tservers > scenario > --- > > Key: KUDU-1838 > URL: https://issues.apache.org/jira/browse/KUDU-1838 > Project: Kudu > Issue Type: Bug >Affects Versions: 1.2.0 >Reporter: Alexey Serbin > Labels: kudu-jepsen > Fix For: 1.4.0 > > Attachments: 20170115T165415.000-0800.tar.bz2 > > > The kudu-jepsen test has found an instance of non-linearizable history for > the hammer-3-tservers scenario. The artifacts from the failed scenario are > attached. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (KUDU-1837) kudu-jepsen reports non-linearizable history for the tserver-majorities-ring scenario
[ https://issues.apache.org/jira/browse/KUDU-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Serbin resolved KUDU-1837. - Resolution: Invalid Fix Version/s: 1.4.0 This bug is not valid because kudu-jepsen ran with the wrong client model: a separate Kudu client per test actor. That model would require propagating timestamps between clients for sound test results, but that was not done. The client model has been updated by changelist 678a309b5d88e2fe9c6a0674ba7de00daee35cac: since then, all test actors share the same Kudu client instance, so timestamps are propagated automatically. > kudu-jepsen reports non-linearizable history for the tserver-majorities-ring > scenario > - > > Key: KUDU-1837 > URL: https://issues.apache.org/jira/browse/KUDU-1837 > Project: Kudu > Issue Type: Bug >Affects Versions: 1.2.0 >Reporter: Alexey Serbin > Labels: kudu-jepsen > Fix For: 1.4.0 > > Attachments: 20170115T142504.000-0800.tar.bz2 > > > The kudu-jepsen test has found an instance of non-linearizable history for > the tserver-majorities-ring scenario. The artifacts from the failed scenario > are attached. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (KUDU-1842) kudu-jepsen reports non-linearizable history for the 'all-random-halves' scenario
[ https://issues.apache.org/jira/browse/KUDU-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Serbin resolved KUDU-1842. - Resolution: Invalid Fix Version/s: 1.4.0 This bug is not valid because kudu-jepsen ran with the wrong client model: a separate Kudu client per test actor. That model would require propagating timestamps between clients for sound test results, but that was not done. The client model has been updated by changelist 678a309b5d88e2fe9c6a0674ba7de00daee35cac: since then, all test actors share the same Kudu client instance, so timestamps are propagated automatically. > kudu-jepsen reports non-linearizable history for the 'all-random-halves' > scenario > - > > Key: KUDU-1842 > URL: https://issues.apache.org/jira/browse/KUDU-1842 > Project: Kudu > Issue Type: Bug > Components: consensus, test >Affects Versions: 1.2.0 >Reporter: Alexey Serbin > Labels: kudu-jepsen > Fix For: 1.4.0 > > Attachments: 20170119T023411.000-0800.tar.bz2 > > > The kudu-jepsen test has found an instance of non-linearizable history for > the 'all-random-halves' nemesis scenario. The artifacts from the failed > scenario are attached. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (KUDU-1825) kudu-jepsen reports non-linearizable history for the kill-restart-all-tservers scenario
[ https://issues.apache.org/jira/browse/KUDU-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Serbin resolved KUDU-1825. - Resolution: Invalid Fix Version/s: 1.4.0 This bug is not valid because kudu-jepsen ran with the wrong client model: a separate Kudu client per test actor. That model would require propagating timestamps between clients for sound test results, but that was not done. The client model has been updated by changelist 678a309b5d88e2fe9c6a0674ba7de00daee35cac: since then, all test actors share the same Kudu client instance, so timestamps are propagated automatically. > kudu-jepsen reports non-linearizable history for the > kill-restart-all-tservers scenario > --- > > Key: KUDU-1825 > URL: https://issues.apache.org/jira/browse/KUDU-1825 > Project: Kudu > Issue Type: Bug >Affects Versions: 1.2.0 >Reporter: Alexey Serbin >Assignee: Alexey Serbin > Labels: kudu-jepsen > Fix For: 1.4.0 > > Attachments: history.txt, linear.svg, master.log, ts-1.log, ts-2.log, > ts-3.log, ts-4.log, ts-5.log > > > The kudu-jepsen test found a non-linearizable history of operations for the > kill-restart-all-tservers scenario. > The artifacts of the failed scenario are attached. > It's necessary to create a reproducible scenario and fix the problem, if possible. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (KUDU-1838) kudu-jepsen reports non-linearizable history for the hammer-3-tservers scenario
[ https://issues.apache.org/jira/browse/KUDU-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Serbin resolved KUDU-1838. - Resolution: Invalid Fix Version/s: 1.4.0 This bug is not valid because kudu-jepsen ran with the wrong client model: a separate Kudu client per test actor. That model would require propagating timestamps between clients for sound test results, but that was not done. The client model has been updated by changelist 678a309b5d88e2fe9c6a0674ba7de00daee35cac: since then, all test actors share the same Kudu client instance, so timestamps are propagated automatically. > kudu-jepsen reports non-linearizable history for the hammer-3-tservers > scenario > --- > > Key: KUDU-1838 > URL: https://issues.apache.org/jira/browse/KUDU-1838 > Project: Kudu > Issue Type: Bug >Affects Versions: 1.2.0 >Reporter: Alexey Serbin > Labels: kudu-jepsen > Fix For: 1.4.0 > > Attachments: 20170115T165415.000-0800.tar.bz2 > > > The kudu-jepsen test has found an instance of non-linearizable history for > the hammer-3-tservers scenario. The artifacts from the failed scenario are > attached. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (KUDU-1966) Data directories can be removed erroneously
[ https://issues.apache.org/jira/browse/KUDU-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dan Burkert resolved KUDU-1966. --- Resolution: Fixed Fix Version/s: 1.4.0 > Data directories can be removed erroneously > --- > > Key: KUDU-1966 > URL: https://issues.apache.org/jira/browse/KUDU-1966 > Project: Kudu > Issue Type: Bug > Components: fs >Affects Versions: 1.4.0 >Reporter: Adar Dembo >Assignee: Dan Burkert > Fix For: 1.4.0 > > > Kudu data directories can be removed in between starts of a server, which > will lead to tablet bootstrap failures. There exists logic to protect against > this (see > [PathInstanceMetadataFile::CheckIntegrity()|https://github.com/apache/kudu/blob/master/src/kudu/fs/block_manager_util.h#L78]), > but it was missing a key check to ensure that no directory had been removed > from the set. > To be clear, we do want to support removing data directories, but in a more > structured and protected manner. For the time being, we should close this > loophole. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KUDU-1966) Data directories can be removed erroneously
[ https://issues.apache.org/jira/browse/KUDU-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15966899#comment-15966899 ] Dan Burkert commented on KUDU-1966: --- Resolved by [bd24f04fb43db9f1fcbf9a60ecc31824c3c79bfd|https://github.com/apache/kudu/commit/bd24f04fb43db9f1fcbf9a60ecc31824c3c79bfd]. > Data directories can be removed erroneously > --- > > Key: KUDU-1966 > URL: https://issues.apache.org/jira/browse/KUDU-1966 > Project: Kudu > Issue Type: Bug > Components: fs >Affects Versions: 1.4.0 >Reporter: Adar Dembo >Assignee: Dan Burkert > > Kudu data directories can be removed in between starts of a server, which > will lead to tablet bootstrap failures. There exists logic to protect against > this (see > [PathInstanceMetadataFile::CheckIntegrity()|https://github.com/apache/kudu/blob/master/src/kudu/fs/block_manager_util.h#L78]), > but it was missing a key check to ensure that no directory had been removed > from the set. > To be clear, we do want to support removing data directories, but in a more > structured and protected manner. For the time being, we should close this > loophole. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
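The "missing key check" that KUDU-1966 describes, verifying no data directory was removed from the recorded set, can be sketched in a few lines. The function name and return format below are illustrative, not Kudu's PathInstanceMetadataFile API:

```python
def check_integrity(recorded_dirs, present_dirs):
    """Hedged sketch of the missing check: flag any data directory that was
    recorded in the path instance metadata but is no longer present on disk.
    Names and the error format are illustrative, not Kudu's actual API."""
    missing = sorted(set(recorded_dirs) - set(present_dirs))
    if missing:
        # Refuse to start rather than bootstrap tablets with missing data.
        return "IOError: data directories removed: " + ", ".join(missing)
    return "OK"
```

Failing fast at startup turns a silent tablet bootstrap failure into an explicit, actionable error, while still leaving room for a future structured directory-removal workflow.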
[jira] [Resolved] (KUDU-1968) Aborted tablet copies delete live blocks
[ https://issues.apache.org/jira/browse/KUDU-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved KUDU-1968. --- Resolution: Fixed Fix Version/s: 1.4.0 1.3.1 Resolved by reverting the above-mentioned patch. Will fast-track a 1.3.1 release. > Aborted tablet copies delete live blocks > > > Key: KUDU-1968 > URL: https://issues.apache.org/jira/browse/KUDU-1968 > Project: Kudu > Issue Type: Bug > Components: tserver >Affects Versions: 1.3.0 >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Blocker > Fix For: 1.3.1, 1.4.0 > > > 72541b47eb55b2df4eab5d6050f517476ed6d370 (KUDU-1853) caused a serious > regression in the case of a failed tablet copy. As of that patch, the > following sequence happens: > - we fetch the remote tablet's metadata, and set our local metadata to match > it (including the remote block IDs) > - as we download blocks, we replace remote block IDs with local block IDs > - if we fail in the middle, we call DeleteTablet > -- this means that, since we still have some remote block IDs in the > metadata, the DeleteTablet call deletes local blocks based on remote block > IDs. These block IDs are likely to belong to other live tablets locally! > This can cause pretty serious data loss, and has the tendency to cascade > around a cluster, since later attempts to copy a tablet with missing blocks > will get aborted as well. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
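The failure sequence in the KUDU-1968 description can be modeled in a few lines: after a partial copy, the tablet's metadata mixes local IDs (for downloaded blocks) with leftover remote IDs, and DeleteTablet deletes every local block those IDs name. All identifiers below are made up for illustration; this is a toy model of the regression, not Kudu code:

```python
def blocks_deleted_on_abort(tablet_metadata_ids, other_tablets_live_ids):
    """Toy model: DeleteTablet removes every block the metadata names that
    exists locally. Leftover *remote* IDs that collide with blocks owned by
    other local tablets get deleted too -- that is the data loss."""
    return sorted(set(tablet_metadata_ids) & set(other_tablets_live_ids))

# Scenario mirroring the sequence above (all block IDs are made up):
remote_ids = {1, 2, 3}        # block IDs copied from the source replica
downloaded = {1: 101}         # only block 1 was re-written under a local ID
metadata = {downloaded.get(b, b) for b in remote_ids}   # {101, 2, 3}
other_live = {2, 3, 50}       # blocks owned by unrelated local tablets
deleted = blocks_deleted_on_abort(metadata, other_live)  # [2, 3]: data loss
```

The model also shows why the revert works: if the local metadata is only updated atomically once the copy succeeds, an aborted copy never holds IDs that alias other tablets' blocks.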
[jira] [Reopened] (KUDU-1853) Error during tablet copy may orphan a bunch of stuff
[ https://issues.apache.org/jira/browse/KUDU-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo reopened KUDU-1853: -- With the revert of 72541b47eb55b2df4eab5d6050f517476ed6d370 (see KUDU-1968), this bug is no longer fixed. Well, it's still fixed for 1.3.0 (which has already been released), but not for any subsequent release. > Error during tablet copy may orphan a bunch of stuff > > > Key: KUDU-1853 > URL: https://issues.apache.org/jira/browse/KUDU-1853 > Project: Kudu > Issue Type: Bug > Components: tablet, tserver >Affects Versions: 1.2.0 >Reporter: Adar Dembo >Assignee: Mike Percy >Priority: Critical > Fix For: 1.3.0 > > > Currently, a failure during tablet copy may leave behind a number of > different things: > # Downloaded superblock (if the failure falls after TabletCopyClient::Start()) > # Downloaded data blocks (if the failure falls during > TabletCopyClient::FetchAll()) > # Downloaded WAL segments (if the failure falls during > TabletCopyClient::FetchAll()) > # Downloaded cmeta file (if the failure falls during > TabletCopyClient::Finish()) > The next time the tserver starts, it'll see that this tablet's state is still > TABLET_DATA_COPYING and will tombstone it. That takes care of #1, #3, and #4 > (well, it leaves the cmeta file behind as the tombstone, but that's > intentional). > Unfortunately, all data blocks are orphaned, because the on-disk superblock > has no record of the new blocks, and so they aren't deleted. > We're already tracking a general purpose GC mechanism for data blocks in > KUDU-829, but I think this separate JIRA for describing the problem with > tablet copy is useful, if only as a reference for users. > Separately, it may be worth addressing these issues for failures that don't > result in tserver crashes, such as intermittent network outages between > tservers. 
A long-lived tserver won't GC for some time, and it'd be nice to > reclaim the disk space used by these orphaned objects in the interim, not to > mention that implementing this kind of "GC" for data blocks is a lot easier > than a general-purpose GC. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (KUDU-1969) Please tidy up incubator distribution files
[ https://issues.apache.org/jira/browse/KUDU-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb updated KUDU-1969: --- Description: Please remove the old incubator releases as per: http://incubator.apache.org/guides/graduation.html#dist was: Please remove the old incubator releases as per: http://incubator.apache.org/guides/graduation.html Transferring Resources Distribution mirrors 6. After you have a release at your new home (/dist/${project}/ area), remove any distribution artefacts from your old /dist/incubator/${project}/ area. Remember from the mirror guidelines that everything is automatically added to archive.apache.org anyway. > Please tidy up incubator distribution files > --- > > Key: KUDU-1969 > URL: https://issues.apache.org/jira/browse/KUDU-1969 > Project: Kudu > Issue Type: Bug > Environment: http://www.apache.org/dist/incubator/kudu/ >Reporter: Sebb > > Please remove the old incubator releases as per: > http://incubator.apache.org/guides/graduation.html#dist -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (KUDU-1969) Please tidy up incubator distribution files
Sebb created KUDU-1969: -- Summary: Please tidy up incubator distribution files Key: KUDU-1969 URL: https://issues.apache.org/jira/browse/KUDU-1969 Project: Kudu Issue Type: Bug Environment: http://www.apache.org/dist/incubator/kudu/ Reporter: Sebb Please remove the old incubator releases as per: http://incubator.apache.org/guides/graduation.html Transferring Resources Distribution mirrors 6. After you have a release at your new home (/dist/${project}/ area), remove any distribution artefacts from your old /dist/incubator/${project}/ area. Remember from the mirror guidelines that everything is automatically added to archive.apache.org anyway. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (KUDU-1968) Aborted tablet copies delete live blocks
[ https://issues.apache.org/jira/browse/KUDU-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans updated KUDU-1968: - Code Review: https://gerrit.cloudera.org/#/c/6613/ > Aborted tablet copies delete live blocks > > > Key: KUDU-1968 > URL: https://issues.apache.org/jira/browse/KUDU-1968 > Project: Kudu > Issue Type: Bug > Components: tserver >Affects Versions: 1.3.0 >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Blocker > > 72541b47eb55b2df4eab5d6050f517476ed6d370 (KUDU-1853) caused a serious > regression in the case of a failed tablet copy. As of that patch, the > following sequence happens: > - we fetch the remote tablet's metadata, and set our local metadata to match > it (including the remote block IDs) > - as we download blocks, we replace remote block IDs with local block IDs > - if we fail in the middle, we call DeleteTablet > -- this means that, since we still have some remote block IDs in the > metadata, the DeleteTablet call deletes local blocks based on remote block > IDs. These block IDs are likely to belong to other live tablets locally! > This can cause pretty serious data loss, and has the tendency to cascade > around a cluster, since later attempts to copy a tablet with missing blocks > will get aborted as well. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (KUDU-1891) Uploading 100,000 rows x 20 columns results in not enough mutation buffer space when uploading data using Python
[ https://issues.apache.org/jira/browse/KUDU-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved KUDU-1891. --- Resolution: Not A Problem Fix Version/s: n/a > Uploading 100,000 rows x 20 columns results in not enough mutation buffer > space when uploading data using Python > > > Key: KUDU-1891 > URL: https://issues.apache.org/jira/browse/KUDU-1891 > Project: Kudu > Issue Type: Bug > Components: python >Affects Versions: 1.2.0 > Environment: Ubuntu 16.04 >Reporter: Roger > Fix For: n/a > > > The table had one timestamp column and 19 single-precision columns with only > the timestamp as the primary key. > The tuples were uploaded in the following way: > {code} > table = client.table('new_table') > session = client.new_session() > for t in tuples[:10]: > session.apply(table.new_insert(t)) > {code} > Please note that the default flush mode in Python is manual. > This resulted in the below error: > {code} > --- > KuduBadStatus Traceback (most recent call last) > in () > 2 session = client.new_session() > 3 for t in tuples[:10]: > > 4 session.apply(table.new_insert(t)) > 5 > 6 try: > /root/anaconda3/envs/sifr-repository/lib/python3.5/site-packages/kudu/client.pyx > in kudu.client.Session.apply (kudu/client.cpp:15185)() > /root/anaconda3/envs/sifr-repository/lib/python3.5/site-packages/kudu/client.pyx > in kudu.client.WriteOperation.add_to_session (kudu/client.cpp:27992)() > /root/anaconda3/envs/sifr-repository/lib/python3.5/site-packages/kudu/errors.pyx > in kudu.errors.check_status (kudu/errors.cpp:1314)() > KuduBadStatus: b'Incomplete: not enough mutation buffer space remaining for > operation: required additional 225 when 7339950 of 7340032 already used' > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KUDU-1891) Uploading 100,000 rows x 20 columns results in not enough mutation buffer space when uploading data using Python
[ https://issues.apache.org/jira/browse/KUDU-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15966496#comment-15966496 ] Todd Lipcon commented on KUDU-1891: --- Yea, I was suggesting that the user use the other mode, not that we make a code change. I'll resolve as not-a-bug since it's been several weeks with no response. > Uploading 100,000 rows x 20 columns results in not enough mutation buffer > space when uploading data using Python > > > Key: KUDU-1891 > URL: https://issues.apache.org/jira/browse/KUDU-1891 > Project: Kudu > Issue Type: Bug > Components: python >Affects Versions: 1.2.0 > Environment: Ubuntu 16.04 >Reporter: Roger > > The table had one timestamp column and 19 single-precision columns with only > the timestamp as the primary key. > The tuples were uploaded in the following way: > {code} > table = client.table('new_table') > session = client.new_session() > for t in tuples[:10]: > session.apply(table.new_insert(t)) > {code} > Please note that the default flush mode in Python is manual. > This resulted in the below error: > {code} > --- > KuduBadStatus Traceback (most recent call last) > in () > 2 session = client.new_session() > 3 for t in tuples[:10]: > > 4 session.apply(table.new_insert(t)) > 5 > 6 try: > /root/anaconda3/envs/sifr-repository/lib/python3.5/site-packages/kudu/client.pyx > in kudu.client.Session.apply (kudu/client.cpp:15185)() > /root/anaconda3/envs/sifr-repository/lib/python3.5/site-packages/kudu/client.pyx > in kudu.client.WriteOperation.add_to_session (kudu/client.cpp:27992)() > /root/anaconda3/envs/sifr-repository/lib/python3.5/site-packages/kudu/errors.pyx > in kudu.errors.check_status (kudu/errors.cpp:1314)() > KuduBadStatus: b'Incomplete: not enough mutation buffer space remaining for > operation: required additional 225 when 7339950 of 7340032 already used' > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
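The arithmetic behind the error is straightforward: in manual flush mode nothing drains the mutation buffer, so it fills after a fixed number of operations. A back-of-the-envelope sketch using only the numbers in the error message (the ~225-byte per-op cost is an assumption read off that message, not a documented constant):

```python
# Why MANUAL flush mode overflows: the error reports a 7340032-byte
# (7 MiB) mutation buffer, with 7339950 bytes used when an op needing
# 225 more bytes arrived. Assuming ~225 bytes per op (taken from the
# error message, not a documented constant):
BUFFER_BYTES = 7340032
OP_BYTES = 225

def ops_until_full(buffer_bytes=BUFFER_BYTES, op_bytes=OP_BYTES):
    """How many buffered ops fit before apply() raises Incomplete."""
    return buffer_bytes // op_bytes

# With nothing draining the buffer, ~32k ops of this size fill it --
# far short of the 100,000 rows the reporter tried to apply.
limit = ops_until_full()
```

Note that 32622 ops of 225 bytes is exactly the 7339950 bytes reported as used. The fix suggested above is to drain the buffer: either call `session.flush()` periodically inside the loop, or create the session with an automatic flush mode (the exact flush-mode names accepted by kudu-python have varied between releases, so check the client docs for your version).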
[jira] [Commented] (KUDU-1968) Aborted tablet copies delete live blocks
[ https://issues.apache.org/jira/browse/KUDU-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15966491#comment-15966491 ] Todd Lipcon commented on KUDU-1968: --- I should note that I verified that, with a revert, it went back to the old behavior of orphaning blocks, as expected; that's preferable to deleting the wrong ones. > Aborted tablet copies delete live blocks > > > Key: KUDU-1968 > URL: https://issues.apache.org/jira/browse/KUDU-1968 > Project: Kudu > Issue Type: Bug > Components: tserver >Affects Versions: 1.3.0 >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Blocker > > 72541b47eb55b2df4eab5d6050f517476ed6d370 (KUDU-1853) caused a serious > regression in the case of a failed tablet copy. As of that patch, the > following sequence happens: > - we fetch the remote tablet's metadata, and set our local metadata to match > it (including the remote block IDs) > - as we download blocks, we replace remote block IDs with local block IDs > - if we fail in the middle, we call DeleteTablet > -- this means that, since we still have some remote block IDs in the > metadata, the DeleteTablet call deletes local blocks based on remote block > IDs. These block IDs are likely to belong to other live tablets locally! > This can cause pretty serious data loss, and has the tendency to cascade > around a cluster, since later attempts to copy a tablet with missing blocks > will get aborted as well. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KUDU-1968) Aborted tablet copies delete live blocks
[ https://issues.apache.org/jira/browse/KUDU-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15966490#comment-15966490 ] Todd Lipcon commented on KUDU-1968: --- I'm able to repro with the following sequence: {code} rm -Rf /tmp/m /tmp/ts-{1,2,3} ninja -C build/release kudu-tserver kudu-master kudu build/latest/bin/kudu-master -fs_wal_dir /tmp/m & build/latest/bin/kudu-tserver -fs_wal_dir /tmp/ts-1 -rpc_bind_addresses=0.0.0.0:7001 -webserver_port=8001 -flush_threshold_secs=10 -unlock-experimental-flags & build/latest/bin/kudu-tserver -fs_wal_dir /tmp/ts-2 -rpc_bind_addresses=0.0.0.0:7002 -webserver_port=8002 -flush_threshold_secs=10 -unlock-experimental-flags -unlock-unsafe-flags -fault-crash-on-handle-tc-fetch-data=0.2 & sleep 5 # wait for servers to all start build/latest/bin/kudu test loadgen localhost -keep_auto_table -num_rows_per_thread=100 sleep 20 # wait for flush tablet=$(ls -1 /tmp/ts-2/tablet-meta/* | head -1 | xargs basename) build/latest/bin/kudu remote_replica copy $tablet localhost:7002 localhost:7001 build/latest/bin/kudu fs check -fs_wal_dir /tmp/ts-1/ {code} We should revert the patch in trunk and branch-1.3 and release 1.3.1 ASAP. > Aborted tablet copies delete live blocks > > > Key: KUDU-1968 > URL: https://issues.apache.org/jira/browse/KUDU-1968 > Project: Kudu > Issue Type: Bug > Components: tserver >Affects Versions: 1.3.0 >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Blocker > > 72541b47eb55b2df4eab5d6050f517476ed6d370 (KUDU-1853) caused a serious > regression in the case of a failed tablet copy. 
As of that patch, the > following sequence happens: > - we fetch the remote tablet's metadata, and set our local metadata to match > it (including the remote block IDs) > - as we download blocks, we replace remote block IDs with local block IDs > - if we fail in the middle, we call DeleteTablet > -- this means that, since we still have some remote block IDs in the > metadata, the DeleteTablet call deletes local blocks based on remote block > IDs. These block IDs are likely to belong to other live tablets locally! > This can cause pretty serious data loss, and has the tendency to cascade > around a cluster, since later attempts to copy a tablet with missing blocks > will get aborted as well. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
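The failure sequence described in this issue can be sketched as a toy model (the names below are illustrative stand-ins, not Kudu's actual classes): the copy client rewrites remote block IDs to local ones one block at a time, so an abort mid-copy leaves metadata that still names remote IDs, and a delete that trusts that metadata removes blocks belonging to other live tablets.

```python
# Toy model of the KUDU-1968 sequence; structures are hypothetical.

def abort_mid_copy(remote_ids, local_ids, downloaded):
    """Metadata after downloading only `downloaded` blocks: rewritten
    entries point at local IDs, the rest still at remote IDs."""
    return local_ids[:downloaded] + remote_ids[downloaded:]

# Local block store: ID 7 belongs to a different live tablet, but
# happens to collide with a remote block ID of the tablet being copied.
live_blocks = {7: "other-tablet", 101: "copy", 102: "copy"}
remote_ids = [5, 6, 7]          # IDs as recorded on the source server
local_ids = [101, 102, 103]     # IDs assigned as blocks are downloaded

metadata = abort_mid_copy(remote_ids, local_ids, downloaded=2)
# DeleteTablet removes every block the metadata names -- including
# remote ID 7, which here is a live block of another tablet.
for block_id in metadata:
    live_blocks.pop(block_id, None)
```

The collision is likely in practice because both servers draw block IDs from similarly-seeded ID spaces, which is why the bug tends to hit live tablets rather than missing IDs.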
[jira] [Created] (KUDU-1968) Aborted tablet copies delete live blocks
Todd Lipcon created KUDU-1968: - Summary: Aborted tablet copies delete live blocks Key: KUDU-1968 URL: https://issues.apache.org/jira/browse/KUDU-1968 Project: Kudu Issue Type: Bug Components: tserver Affects Versions: 1.3.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Blocker 72541b47eb55b2df4eab5d6050f517476ed6d370 (KUDU-1853) caused a serious regression in the case of a failed tablet copy. As of that patch, the following sequence happens: - we fetch the remote tablet's metadata, and set our local metadata to match it (including the remote block IDs) - as we download blocks, we replace remote block IDs with local block IDs - if we fail in the middle, we call DeleteTablet -- this means that, since we still have some remote block IDs in the metadata, the DeleteTablet call deletes local blocks based on remote block IDs. These block IDs are likely to belong to other live tablets locally! This can cause pretty serious data loss, and has the tendency to cascade around a cluster, since later attempts to copy a tablet with missing blocks will get aborted as well. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (KUDU-463) Add checksumming to cfile and other on-disk formats
[ https://issues.apache.org/jira/browse/KUDU-463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke reassigned KUDU-463: Assignee: Grant Henke (was: Adar Dembo) > Add checksumming to cfile and other on-disk formats > --- > > Key: KUDU-463 > URL: https://issues.apache.org/jira/browse/KUDU-463 > Project: Kudu > Issue Type: Sub-task > Components: cfile, tablet >Affects Versions: Private Beta >Reporter: Todd Lipcon >Assignee: Grant Henke > Labels: kudu-roadmap > > We should add CRC32C checksums to cfile blocks, metadata blocks, etc, to > protect against silent disk corruption. We should probably do this prior to a > public release, since it will likely have a negative performance impact, and > we don't want to have a public regression. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
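The per-block checksumming proposed in KUDU-463 can be sketched as a footer appended at write time and verified at read time. Kudu's design calls for CRC32C (the Castagnoli polynomial); Python's stdlib only ships plain CRC-32 via `zlib.crc32`, which is used here purely as a stand-in for illustration (a real CRC32C needs a third-party package such as google-crc32c):

```python
# Sketch of per-block checksumming to catch silent disk corruption.
# zlib.crc32 (plain CRC-32) stands in for the CRC32C Kudu proposes.
import struct
import zlib

def write_block(payload: bytes) -> bytes:
    """Append a 4-byte little-endian checksum footer to the payload."""
    return payload + struct.pack("<I", zlib.crc32(payload))

def read_block(block: bytes) -> bytes:
    """Verify the footer; raise if the block was silently corrupted."""
    payload, footer = block[:-4], block[-4:]
    (expected,) = struct.unpack("<I", footer)
    if zlib.crc32(payload) != expected:
        raise IOError("block checksum mismatch: disk corruption?")
    return payload

block = write_block(b"cfile data block")
assert read_block(block) == b"cfile data block"

# Flip one bit to simulate silent disk corruption: verification fails.
corrupted = bytes([block[0] ^ 0x01]) + block[1:]
```

The performance concern noted in the issue comes from computing the checksum on every block read and write; CRC32C is the usual choice precisely because SSE4.2 and equivalent instructions make it cheap in hardware.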