[ https://issues.apache.org/jira/browse/MESOS-10008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16947020#comment-16947020 ]
Andrei Sekretenko commented on MESOS-10008:
-------------------------------------------

Note also that `operator <=` uses `convertToFixed()`. With the current approach, it looks like values larger than (1<<63)/1000.0 should not be accepted as valid scalars, and we should take care to avoid creating them.

> Invalid quota config can crash master
> -------------------------------------
>
>                 Key: MESOS-10008
>                 URL: https://issues.apache.org/jira/browse/MESOS-10008
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Andrei Sekretenko
>            Priority: Major
>
> We are observing the following crash on the 1.9.1 master:
> {code}
> I1008 10:12:15.148486 4687 http.cpp:1115] HTTP POST for /master/api/v1?_ts=1570529541073&UPDATE_QUOTA from 10.0.7.253:35410 with User-Agent='Mozilla/5.0 (Windows NT 6.1; Win64; x64) Ap>
> I1008 10:12:15.148665 4687 http.cpp:263] Processing call UPDATE_QUOTA
> I1008 10:12:15.148756 4687 quota_handler.cpp:1136] Authorizing principal 'bootstrapuser' to update quota config for role 's1'
> I1008 10:12:15.149169 4685 registrar.cpp:487] Applied 1 operations in 56277ns; attempting to update the registry
> I1008 10:12:15.149338 4681 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 13
> I1008 10:12:15.149467 4689 replica.cpp:541] Replica received write request for position 13 from __req_res__(29)@10.0.7.253:5050
> I1008 10:12:15.151820 4683 replica.cpp:695] Replica received learned notice for position 13 from log-network(2)@10.0.7.253:5050
> I1008 10:12:15.153559 4679 registrar.cpp:544] Successfully updated the registry in 4.348928ms
> I1008 10:12:15.153592 4678 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 14
> I1008 10:12:15.153715 4679 hierarchical.cpp:1619] Updated quota for role 's1', guarantees: {} limits: cpus:2; disk:-9.22337203685478e+15; gpus:3; mem:1000000000000
> I1008 10:12:15.153796 4677 replica.cpp:541] Replica received write request for position 14 from __req_res__(30)@10.0.7.253:5050
> I1008 10:12:15.155380 4691 replica.cpp:695] Replica received learned notice for position 14 from log-network(2)@10.0.7.253:5050
> I1008 10:12:15.249722 4677 authenticator.cpp:324] dstip=10.0.7.253 type=audit timestamp=2019-10-08 10:12:15.249673984+00:00 reason="Valid authentication token" uid="bootstrapuser" obje>
> I1008 10:12:15.249956 4682 http.cpp:1115] HTTP GET for /master/state-summary?_ts=1570529541169 from 10.0.7.253:35414 with User-Agent='Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebK>
> I1008 10:12:15.250633 4691 http.cpp:1132] HTTP GET for /master/state-summary?_ts=1570529541169 from 10.0.7.253:35414: '200 OK' after 1.72621ms
> I1008 10:12:15.570379 4689 hierarchical.cpp:1908] Before allocation, required quota headroom is {} and available quota headroom is cpus:0.9; disk:75853; mem:5507
> F1008 10:12:15.570580 4689 resource_quantities.cpp:330] Check failed: scalar >= Value::Scalar() (-9.22337203685478e+15 vs. 0)
> *** Check failure stack trace: ***
>     @ 0x7fc786f0148d google::LogMessage::Fail()
>     @ 0x7fc786f036e8 google::LogMessage::SendToLog()
>     @ 0x7fc786f01023 google::LogMessage::Flush()
>     @ 0x7fc786f04029 google::LogMessageFatal::~LogMessageFatal()
>     @ 0x7fc785954dfa mesos::ResourceQuantities::add()
>     @ 0x7fc785954fb6 mesos::ResourceQuantities::fromScalarResource()
>     @ 0x7fc78595e135 mesos::shrinkResources()
>     @ 0x7fc785a874a9 mesos::internal::master::allocator::internal::HierarchicalAllocatorProcess::__allocate()
>     @ 0x7fc785a88089 mesos::internal::master::allocator::internal::HierarchicalAllocatorProcess::_allocate()
>     @ 0x7fc785a93882 _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8dispatchI7NothingN5mesos8internal6master9allocator8internal28Hier>
>     @ 0x7fc786e49e21 process::ProcessBase::consume()
>     @ 0x7fc786e6141b process::ProcessManager::resume()
>     @ 0x7fc786e670b6 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
>     @ 0x7fc782a28b22 (unknown)
>     @ 0x7fc7821be94a (unknown)
>     @ 0x7fc781eef07f clone
> {code}
> Note that the value of the disk quota limit is *logged* as "negative".
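
To make the bound from the comment concrete, here is a minimal sketch of a range check. It assumes scalars are converted to a signed 64-bit fixed-point value by multiplying by 1000 and rounding, as the (1<<63)/1000.0 bound suggests. The names `fitsInFixedPoint` and `toFixed` are hypothetical and are not Mesos functions, and rejecting negative values is also an assumption, motivated by the failed `scalar >= Value::Scalar()` check in the log. For context, `std::llround` yields an unspecified result when the rounded value does not fit in `long long`, and the logged -9.22337203685478e+15 is consistent with -(1<<63)/1000.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper, not part of Mesos: returns true if `value` can be
// scaled by 1000 and rounded into a signed 64-bit integer without overflow.
// Assumption: negative, NaN and infinite values are treated as invalid.
bool fitsInFixedPoint(double value)
{
  // 2^63 is exactly representable as a double, so the comparison below is
  // exact; anything at or above it cannot fit in a signed 64-bit integer.
  const double limit = std::ldexp(1.0, 63);

  return std::isfinite(value) && value >= 0.0 && value * 1000.0 < limit;
}

// Hypothetical conversion with three decimal digits of precision.
// Callers are expected to validate with fitsInFixedPoint() first, because
// std::llround gives an unspecified result for out-of-range arguments.
long long toFixed(double value)
{
  assert(fitsInFixedPoint(value));
  return std::llround(value * 1000.0);
}
```

With such a check at validation time, a quota limit like 9.3e15 would be rejected up front, while 9.2e15 would still be accepted, instead of the out-of-range value reaching the allocator.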