[jira] [Updated] (MESOS-8090) Mesos 1.4.0 crashes with 1.3.x agent with oversubscription

2017-10-22 Thread Michael Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Park updated MESOS-8090:

Fix Version/s: 1.4.1

> Mesos 1.4.0 crashes with 1.3.x agent with oversubscription
> --
>
> Key: MESOS-8090
> URL: https://issues.apache.org/jira/browse/MESOS-8090
> Project: Mesos
> Issue Type: Bug
> Components: master, oversubscription
> Affects Versions: 1.4.0
> Reporter: Zhitao Li
> Assignee: Zhitao Li
> Fix For: 1.4.1, 1.5.0
>
>
> We are seeing a crash in the 1.4.0 master when it receives {{updateSlave}} from an 
> oversubscription-enabled agent running 1.3.1 code.
> The crash line is:
> {code:none}
> resources.cpp:1050] Check failed: !resource.has_role() cpus{REV}:19
> {code}
> Stack trace in gdb:
> {panel:title=My title}
> #0  0x7f22f3553067 in __GI_raise (sig=sig@entry=6) at 
> ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> #1  0x7f22f3554448 in __GI_abort () at abort.c:89
> #2  0x7f22f615cd79 in google::DumpStackTraceAndExit () at 
> src/utilities.cc:147
> #3  0x7f22f6154a4d in google::LogMessage::Fail () at src/logging.cc:1458
> #4  0x7f22f61566cd in google::LogMessage::SendToLog (this=<optimized out>) at src/logging.cc:1412
> #5  0x7f22f6154612 in google::LogMessage::Flush (this=0x18ac7) at 
> src/logging.cc:1281
> #6  0x7f22f61570b9 in google::LogMessageFatal::~LogMessageFatal 
> (this=<optimized out>, __in_chrg=<optimized out>) at src/logging.cc:1984
> #7  0x7f22f527e133 in mesos::Resources::isEmpty (resource=...) at 
> /mesos/src/common/resources.cpp:1051
> #8  0x7f22f527e1e5 in mesos::Resources::Resource_::isEmpty 
> (this=this@entry=0x7f22e713d2e0) at /mesos/src/common/resources.cpp:1173
> #9  0x7f22f527e20c in mesos::Resources::add (this=0x7f22e713d400, 
> that=...) at /mesos/src/common/resources.cpp:1993
> #10 0x7f22f527f860 in mesos::Resources::operator+= 
> (this=this@entry=0x7f22e713d400, that=...) at 
> /mesos/src/common/resources.cpp:2016
> #11 0x7f22f527f91d in mesos::Resources::operator+= 
> (this=this@entry=0x7f22e713d400, that=...) at 
> /mesos/src/common/resources.cpp:2025
> #12 0x7f22f527fa4b in mesos::Resources::Resources (this=0x7f22e713d400, 
> _resources=...) at /mesos/src/common/resources.cpp:1277
> #13 0x7f22f548b812 in mesos::internal::master::Master::updateSlave 
> (this=0x558137bbae70, message=...) at /mesos/src/master/master.cpp:6681
> #14 0x7f22f550adc1 in 
> ProtobufProcess::_handlerM
>  (t=0x558137bbae70, method=
> (void 
> (mesos::internal::master::Master::*)(mesos::internal::master::Master * const, 
> const mesos::internal::UpdateSlaveMessage &)) 0x7f22f548b6d0 
>   const&)>, 
> data="\n)\n'07ba28cc-d9fa-44fb-8d6b-f8c5c90f8a90-S1\022\030\n\004cpus\020\000\032\t\t\000\000\000\000\000\000\063@2\001*J")
> at /mesos/3rdparty/libprocess/include/process/protobuf.hpp:799
> #15 0x7f22f54c8791 in 
> ProtobufProcess::visit (this=0x558137bbae70, 
> event=...) at /mesos/3rdparty/libprocess/include/process/protobuf.hpp:104
> #16 0x7f22f54572d4 in mesos::internal::master::Master::_visit 
> (this=this@entry=0x558137bbae70, event=...) at 
> /mesos/src/master/master.cpp:1643
> #17 0x7f22f547014d in mesos::internal::master::Master::visit 
> (this=0x558137bbae70, event=...) at /mesos/src/master/master.cpp:1575
> #18 0x7f22f60b7169 in serve (event=..., this=0x558137bbbf28) at 
> /mesos/3rdparty/libprocess/include/process/process.hpp:87
> #19 process::ProcessManager::resume (this=<optimized out>, 
> process=0x558137bbbf28) at /mesos/3rdparty/libprocess/src/process.cpp:3346
> #20 0x7f22f60bd056 in operator() (__closure=0x558137aa3218) at 
> /mesos/3rdparty/libprocess/src/process.cpp:2881
> #21 _M_invoke<> (this=0x558137aa3218) at /usr/include/c++/4.9/functional:1700
> #22 operator() (this=0x558137aa3218) at /usr/include/c++/4.9/functional:1688
> #23 
> std::thread::_Impl()>
>  >::_M_run(void) (this=0x558137aa3200) at /usr/include/c++/4.9/thread:115
> #24 0x7f22f40b3970 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #25 0x7f22f38d1064 in start_thread (arg=0x7f22e713e700) at 
> pthread_create.c:309
> #26 0x7f22f360662d in clone () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> {panel}
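
The `data=` argument in frame #14 is the serialized UpdateSlaveMessage, so it can be decoded by hand to see what the agent sent. The following is a minimal sketch of a protobuf wire-format reader, not Mesos code. The field-number mapping used in the comments (1 = name, 2 = type, 3 = scalar, 6 = role, 9 = revocable) is an assumption based on the Mesos `Resource` message and is not stated in this report. The printed string is one byte shorter than the declared length of the nested message, so the sketch reports the final tag as a truncated field instead of guessing its payload.

```python
import struct

# Bytes copied from the data= argument of frame #14 (gdb octal escapes kept).
DATA = (b"\n)\n'07ba28cc-d9fa-44fb-8d6b-f8c5c90f8a90-S1"
        b"\022\030\n\004cpus\020\000\032\t\t\000\000\000\000\000\000\063@2\001*J")


def read_varint(buf, pos):
    """Decode a base-128 varint at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode(buf):
    """Return [(field_number, wire_type, value)] for one message level."""
    fields, pos = [], 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x7
        if pos >= len(buf):
            # A tag with nothing after it: the printed string was cut short.
            fields.append((number, wire_type, None))
            break
        if wire_type == 0:      # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:    # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:    # length-delimited
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:    # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((number, wire_type, value))
    return fields


# Top level: field 1 is the agent ID message, field 2 is one resource.
slave_id_msg, resource_msg = (value for _, _, value in decode(DATA))
slave_id = decode(slave_id_msg)[0][2].decode()

# Assumed Resource field numbers: 1 name, 2 type, 3 scalar, 6 role, 9 revocable.
resource = {number: value for number, _, value in decode(resource_msg)}
cpus = struct.unpack("<d", decode(resource[3])[0][2])[0]

print(slave_id, resource[1], cpus, resource.get(6), 9 in resource)
```

Under those assumptions the payload is a revocable `cpus` scalar of 19 that carries an explicit role field set to `*`, which is consistent with the failed `!resource.has_role()` check and the `cpus{REV}:19` in the crash line.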



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8090) Mesos 1.4.0 crashes with 1.3.x agent with oversubscription

2017-10-13 Thread Zhitao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhitao Li updated MESOS-8090:
-
Description: 
We are seeing a crash in the 1.4.0 master when it receives {{updateSlave}} from an 
oversubscription-enabled agent running 1.3.1 code.

The crash line is:

{code:none}
resources.cpp:1050] Check failed: !resource.has_role() cpus{REV}:19
{code}

Stack trace in gdb:

{panel:title=My title}
#0  0x7f22f3553067 in __GI_raise (sig=sig@entry=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x7f22f3554448 in __GI_abort () at abort.c:89
#2  0x7f22f615cd79 in google::DumpStackTraceAndExit () at 
src/utilities.cc:147
#3  0x7f22f6154a4d in google::LogMessage::Fail () at src/logging.cc:1458
#4  0x7f22f61566cd in google::LogMessage::SendToLog (this=<optimized out>) 
at src/logging.cc:1412
#5  0x7f22f6154612 in google::LogMessage::Flush (this=0x18ac7) at 
src/logging.cc:1281
#6  0x7f22f61570b9 in google::LogMessageFatal::~LogMessageFatal 
(this=<optimized out>, __in_chrg=<optimized out>) at src/logging.cc:1984
#7  0x7f22f527e133 in mesos::Resources::isEmpty (resource=...) at 
/mesos/src/common/resources.cpp:1051
#8  0x7f22f527e1e5 in mesos::Resources::Resource_::isEmpty 
(this=this@entry=0x7f22e713d2e0) at /mesos/src/common/resources.cpp:1173
#9  0x7f22f527e20c in mesos::Resources::add (this=0x7f22e713d400, that=...) 
at /mesos/src/common/resources.cpp:1993
#10 0x7f22f527f860 in mesos::Resources::operator+= 
(this=this@entry=0x7f22e713d400, that=...) at 
/mesos/src/common/resources.cpp:2016
#11 0x7f22f527f91d in mesos::Resources::operator+= 
(this=this@entry=0x7f22e713d400, that=...) at 
/mesos/src/common/resources.cpp:2025
#12 0x7f22f527fa4b in mesos::Resources::Resources (this=0x7f22e713d400, 
_resources=...) at /mesos/src/common/resources.cpp:1277
#13 0x7f22f548b812 in mesos::internal::master::Master::updateSlave 
(this=0x558137bbae70, message=...) at /mesos/src/master/master.cpp:6681
#14 0x7f22f550adc1 in 
ProtobufProcess::_handlerM
 (t=0x558137bbae70, method=
(void (mesos::internal::master::Master::*)(mesos::internal::master::Master 
* const, const mesos::internal::UpdateSlaveMessage &)) 0x7f22f548b6d0 
, 
data="\n)\n'07ba28cc-d9fa-44fb-8d6b-f8c5c90f8a90-S1\022\030\n\004cpus\020\000\032\t\t\000\000\000\000\000\000\063@2\001*J")
at /mesos/3rdparty/libprocess/include/process/protobuf.hpp:799
#15 0x7f22f54c8791 in 
ProtobufProcess::visit (this=0x558137bbae70, 
event=...) at /mesos/3rdparty/libprocess/include/process/protobuf.hpp:104
#16 0x7f22f54572d4 in mesos::internal::master::Master::_visit 
(this=this@entry=0x558137bbae70, event=...) at /mesos/src/master/master.cpp:1643
#17 0x7f22f547014d in mesos::internal::master::Master::visit 
(this=0x558137bbae70, event=...) at /mesos/src/master/master.cpp:1575
#18 0x7f22f60b7169 in serve (event=..., this=0x558137bbbf28) at 
/mesos/3rdparty/libprocess/include/process/process.hpp:87
#19 process::ProcessManager::resume (this=<optimized out>, 
process=0x558137bbbf28) at /mesos/3rdparty/libprocess/src/process.cpp:3346
#20 0x7f22f60bd056 in operator() (__closure=0x558137aa3218) at 
/mesos/3rdparty/libprocess/src/process.cpp:2881
#21 _M_invoke<> (this=0x558137aa3218) at /usr/include/c++/4.9/functional:1700
#22 operator() (this=0x558137aa3218) at /usr/include/c++/4.9/functional:1688
#23 
std::thread::_Impl()>
 >::_M_run(void) (this=0x558137aa3200) at /usr/include/c++/4.9/thread:115
#24 0x7f22f40b3970 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#25 0x7f22f38d1064 in start_thread (arg=0x7f22e713e700) at 
pthread_create.c:309
#26 0x7f22f360662d in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:111
{panel}



  was:
We are seeing a crash in the 1.4.0 master when it receives {{updateSlave}} from an 
oversubscription-enabled agent running 1.3.1 code.

The crash line is:


{panel:title=My title}
resources.cpp:1050] Check failed: !resource.has_role() cpus{REV}:19
{panel}

Stack trace in gdb:

{panel:title=My title}
#0  0x7f22f3553067 in __GI_raise (sig=sig@entry=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x7f22f3554448 in __GI_abort () at abort.c:89
#2  0x7f22f615cd79 in google::DumpStackTraceAndExit () at 
src/utilities.cc:147
#3  0x7f22f6154a4d in google::LogMessage::Fail () at src/logging.cc:1458
#4  0x7f22f61566cd in google::LogMessage::SendToLog (this=<optimized out>) 
at src/logging.cc:1412
#5  0x7f22f6154612 in google::LogMessage::Flush (this=0x18ac7) at 
src/logging.cc:1281
#6  0x7f22f61570b9 in google::LogMessageFatal::~LogMessageFatal 
(this=<optimized out>, __in_chrg=<optimized out>) at src/logging.cc:1984
#7  0x7f22f527e133 in mesos::Resources::isEmpty (resource=...) at 
/mesos/src/common/resources.cpp:1051
#8  0x7f22f527e1e5 in 

[jira] [Updated] (MESOS-8090) Mesos 1.4.0 crashes with 1.3.x agent with oversubscription

2017-10-13 Thread Zhitao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhitao Li updated MESOS-8090:
-
Description: 
We are seeing a crash in the 1.4.0 master when it receives {{updateSlave}} from an 
oversubscription-enabled agent running 1.3.1 code.

The crash line is:


{panel:title=My title}
resources.cpp:1050] Check failed: !resource.has_role() cpus{REV}:19
{panel}

Stack trace in gdb:

{panel:title=My title}
#0  0x7f22f3553067 in __GI_raise (sig=sig@entry=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x7f22f3554448 in __GI_abort () at abort.c:89
#2  0x7f22f615cd79 in google::DumpStackTraceAndExit () at 
src/utilities.cc:147
#3  0x7f22f6154a4d in google::LogMessage::Fail () at src/logging.cc:1458
#4  0x7f22f61566cd in google::LogMessage::SendToLog (this=<optimized out>) 
at src/logging.cc:1412
#5  0x7f22f6154612 in google::LogMessage::Flush (this=0x18ac7) at 
src/logging.cc:1281
#6  0x7f22f61570b9 in google::LogMessageFatal::~LogMessageFatal 
(this=<optimized out>, __in_chrg=<optimized out>) at src/logging.cc:1984
#7  0x7f22f527e133 in mesos::Resources::isEmpty (resource=...) at 
/mesos/src/common/resources.cpp:1051
#8  0x7f22f527e1e5 in mesos::Resources::Resource_::isEmpty 
(this=this@entry=0x7f22e713d2e0) at /mesos/src/common/resources.cpp:1173
#9  0x7f22f527e20c in mesos::Resources::add (this=0x7f22e713d400, that=...) 
at /mesos/src/common/resources.cpp:1993
#10 0x7f22f527f860 in mesos::Resources::operator+= 
(this=this@entry=0x7f22e713d400, that=...) at 
/mesos/src/common/resources.cpp:2016
#11 0x7f22f527f91d in mesos::Resources::operator+= 
(this=this@entry=0x7f22e713d400, that=...) at 
/mesos/src/common/resources.cpp:2025
#12 0x7f22f527fa4b in mesos::Resources::Resources (this=0x7f22e713d400, 
_resources=...) at /mesos/src/common/resources.cpp:1277
#13 0x7f22f548b812 in mesos::internal::master::Master::updateSlave 
(this=0x558137bbae70, message=...) at /mesos/src/master/master.cpp:6681
#14 0x7f22f550adc1 in 
ProtobufProcess::_handlerM
 (t=0x558137bbae70, method=
(void (mesos::internal::master::Master::*)(mesos::internal::master::Master 
* const, const mesos::internal::UpdateSlaveMessage &)) 0x7f22f548b6d0 
, 
data="\n)\n'07ba28cc-d9fa-44fb-8d6b-f8c5c90f8a90-S1\022\030\n\004cpus\020\000\032\t\t\000\000\000\000\000\000\063@2\001*J")
at /mesos/3rdparty/libprocess/include/process/protobuf.hpp:799
#15 0x7f22f54c8791 in 
ProtobufProcess::visit (this=0x558137bbae70, 
event=...) at /mesos/3rdparty/libprocess/include/process/protobuf.hpp:104
#16 0x7f22f54572d4 in mesos::internal::master::Master::_visit 
(this=this@entry=0x558137bbae70, event=...) at /mesos/src/master/master.cpp:1643
#17 0x7f22f547014d in mesos::internal::master::Master::visit 
(this=0x558137bbae70, event=...) at /mesos/src/master/master.cpp:1575
#18 0x7f22f60b7169 in serve (event=..., this=0x558137bbbf28) at 
/mesos/3rdparty/libprocess/include/process/process.hpp:87
#19 process::ProcessManager::resume (this=<optimized out>, 
process=0x558137bbbf28) at /mesos/3rdparty/libprocess/src/process.cpp:3346
#20 0x7f22f60bd056 in operator() (__closure=0x558137aa3218) at 
/mesos/3rdparty/libprocess/src/process.cpp:2881
#21 _M_invoke<> (this=0x558137aa3218) at /usr/include/c++/4.9/functional:1700
#22 operator() (this=0x558137aa3218) at /usr/include/c++/4.9/functional:1688
#23 
std::thread::_Impl()>
 >::_M_run(void) (this=0x558137aa3200) at /usr/include/c++/4.9/thread:115
#24 0x7f22f40b3970 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#25 0x7f22f38d1064 in start_thread (arg=0x7f22e713e700) at 
pthread_create.c:309
#26 0x7f22f360662d in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:111
{panel}



  was:
We are seeing a crash in the 1.4.0 master when it receives {{updateSlave}} from an 
oversubscription-enabled agent running 1.3.1 code.

The crash line is:

resources.cpp:1050] Check failed: !resource.has_role() cpus{REV}:19

Stack trace in gdb:

{panel:title=My title}
#0  0x7f22f3553067 in __GI_raise (sig=sig@entry=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x7f22f3554448 in __GI_abort () at abort.c:89
#2  0x7f22f615cd79 in google::DumpStackTraceAndExit () at 
src/utilities.cc:147
#3  0x7f22f6154a4d in google::LogMessage::Fail () at src/logging.cc:1458
#4  0x7f22f61566cd in google::LogMessage::SendToLog (this=<optimized out>) 
at src/logging.cc:1412
#5  0x7f22f6154612 in google::LogMessage::Flush (this=0x18ac7) at 
src/logging.cc:1281
#6  0x7f22f61570b9 in google::LogMessageFatal::~LogMessageFatal 
(this=<optimized out>, __in_chrg=<optimized out>) at src/logging.cc:1984
#7  0x7f22f527e133 in mesos::Resources::isEmpty (resource=...) at 
/mesos/src/common/resources.cpp:1051
#8  0x7f22f527e1e5 in mesos::Resources::Resource_::isEmpty 

[jira] [Updated] (MESOS-8090) Mesos 1.4.0 crashes with 1.3.x agent with oversubscription

2017-10-13 Thread Zhitao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhitao Li updated MESOS-8090:
-
Affects Version/s: 1.4.0

> Mesos 1.4.0 crashes with 1.3.x agent with oversubscription
> --
>
> Key: MESOS-8090
> URL: https://issues.apache.org/jira/browse/MESOS-8090
> Project: Mesos
> Issue Type: Bug
> Components: master, oversubscription
> Affects Versions: 1.4.0
> Reporter: Zhitao Li
> Assignee: Michael Park
>
> We are seeing a crash in the 1.4.0 master when it receives {{updateSlave}} from an 
> oversubscription-enabled agent running 1.3.1 code.
> The crash line is:
> resources.cpp:1050] Check failed: !resource.has_role() cpus{REV}:19
> Stack trace in gdb:
> {panel:title=My title}
> #0  0x7f22f3553067 in __GI_raise (sig=sig@entry=6) at 
> ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> #1  0x7f22f3554448 in __GI_abort () at abort.c:89
> #2  0x7f22f615cd79 in google::DumpStackTraceAndExit () at 
> src/utilities.cc:147
> #3  0x7f22f6154a4d in google::LogMessage::Fail () at src/logging.cc:1458
> #4  0x7f22f61566cd in google::LogMessage::SendToLog (this=<optimized out>) at src/logging.cc:1412
> #5  0x7f22f6154612 in google::LogMessage::Flush (this=0x18ac7) at 
> src/logging.cc:1281
> #6  0x7f22f61570b9 in google::LogMessageFatal::~LogMessageFatal 
> (this=<optimized out>, __in_chrg=<optimized out>) at src/logging.cc:1984
> #7  0x7f22f527e133 in mesos::Resources::isEmpty (resource=...) at 
> /mesos/src/common/resources.cpp:1051
> #8  0x7f22f527e1e5 in mesos::Resources::Resource_::isEmpty 
> (this=this@entry=0x7f22e713d2e0) at /mesos/src/common/resources.cpp:1173
> #9  0x7f22f527e20c in mesos::Resources::add (this=0x7f22e713d400, 
> that=...) at /mesos/src/common/resources.cpp:1993
> #10 0x7f22f527f860 in mesos::Resources::operator+= 
> (this=this@entry=0x7f22e713d400, that=...) at 
> /mesos/src/common/resources.cpp:2016
> #11 0x7f22f527f91d in mesos::Resources::operator+= 
> (this=this@entry=0x7f22e713d400, that=...) at 
> /mesos/src/common/resources.cpp:2025
> #12 0x7f22f527fa4b in mesos::Resources::Resources (this=0x7f22e713d400, 
> _resources=...) at /mesos/src/common/resources.cpp:1277
> #13 0x7f22f548b812 in mesos::internal::master::Master::updateSlave 
> (this=0x558137bbae70, message=...) at /mesos/src/master/master.cpp:6681
> #14 0x7f22f550adc1 in 
> ProtobufProcess::_handlerM
>  (t=0x558137bbae70, method=
> (void 
> (mesos::internal::master::Master::*)(mesos::internal::master::Master * const, 
> const mesos::internal::UpdateSlaveMessage &)) 0x7f22f548b6d0 
>   const&)>, 
> data="\n)\n'07ba28cc-d9fa-44fb-8d6b-f8c5c90f8a90-S1\022\030\n\004cpus\020\000\032\t\t\000\000\000\000\000\000\063@2\001*J")
> at /mesos/3rdparty/libprocess/include/process/protobuf.hpp:799
> #15 0x7f22f54c8791 in 
> ProtobufProcess::visit (this=0x558137bbae70, 
> event=...) at /mesos/3rdparty/libprocess/include/process/protobuf.hpp:104
> #16 0x7f22f54572d4 in mesos::internal::master::Master::_visit 
> (this=this@entry=0x558137bbae70, event=...) at 
> /mesos/src/master/master.cpp:1643
> #17 0x7f22f547014d in mesos::internal::master::Master::visit 
> (this=0x558137bbae70, event=...) at /mesos/src/master/master.cpp:1575
> #18 0x7f22f60b7169 in serve (event=..., this=0x558137bbbf28) at 
> /mesos/3rdparty/libprocess/include/process/process.hpp:87
> #19 process::ProcessManager::resume (this=<optimized out>, 
> process=0x558137bbbf28) at /mesos/3rdparty/libprocess/src/process.cpp:3346
> #20 0x7f22f60bd056 in operator() (__closure=0x558137aa3218) at 
> /mesos/3rdparty/libprocess/src/process.cpp:2881
> #21 _M_invoke<> (this=0x558137aa3218) at /usr/include/c++/4.9/functional:1700
> #22 operator() (this=0x558137aa3218) at /usr/include/c++/4.9/functional:1688
> #23 
> std::thread::_Impl()>
>  >::_M_run(void) (this=0x558137aa3200) at /usr/include/c++/4.9/thread:115
> #24 0x7f22f40b3970 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #25 0x7f22f38d1064 in start_thread (arg=0x7f22e713e700) at 
> pthread_create.c:309
> #26 0x7f22f360662d in clone () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> {panel}


