Re: Single host VM limit when using RBD

2013-01-17 Thread Andrey Korolyov
Hi Matthew,

It seems that the value in /proc/sys/kernel/threads-max is too low.
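One way to test this suggestion is to compare the configured ceiling with the number of threads the host is running. The following is only a sketch: the helper names `read_int` and `count_threads` are made up for illustration, and it assumes a Linux procfs mounted at /proc.

```python
from pathlib import Path


def read_int(path):
    """Return the integer stored in a /proc file, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def count_threads(proc_root="/proc"):
    """Sum the 'Threads:' field of every <proc_root>/<pid>/status file."""
    try:
        entries = list(Path(proc_root).iterdir())
    except OSError:
        return 0  # no procfs here (for example, not running on Linux)
    total = 0
    for entry in entries:
        if not entry.name.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        try:
            status = (entry / "status").read_text()
        except OSError:
            continue  # the process exited while we were scanning
        for line in status.splitlines():
            if line.startswith("Threads:"):
                total += int(line.split()[1])
                break
    return total


if __name__ == "__main__":
    print("threads-max:   ", read_int("/proc/sys/kernel/threads-max"))
    print("threads in use:", count_threads())
```

If the number of threads in use is nowhere near threads-max, this limit is probably not the one being hit.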

On Thu, Jan 17, 2013 at 12:37 PM, Matthew Anderson wrote:
> I've run into a limit on the maximum number of RBD-backed VMs that I'm able
> to run on a single host. I have 20 VMs (21 RBD volumes open) running on a
> single host, and when booting the 21st machine I get the error below from
> libvirt/QEMU. I'm able to shut down a VM and start another in its place, so
> there seems to be a hard limit on the number of volumes I'm able to have
> open. I did some googling, and error 11 from pthread_create seems to mean
> 'resource unavailable', so I'm probably running into a thread limit of some
> sort. I did try increasing the max_thread kernel option, but nothing changed.
> I moved a few VMs to a different, empty host and they start with no issues at
> all.
>
> This machine has 4 OSDs running on it in addition to the 20 VMs. It runs
> kernel 3.7.1, Ceph 0.56.1 and QEMU 1.3.0. There is currently 65GB of 96GB RAM
> free and no swap.
>
> Can anyone suggest where the limit might be or anything I can do to narrow 
> down the problem?
>
> Thanks
> -Matt
> -
>
> Error starting domain: internal error Process exited while reading console 
> log output: char device redirected to /dev/pts/23
> Thread::try_create(): pthread_create failed with error 11common/Thread.cc: In 
> function 'void Thread::create(size_t)' thread 7f4eb5a65960 time 2013-01-17 
> 02:32:58.096437
> common/Thread.cc: 110: FAILED assert(ret == 0)
> ceph version 0.56.1 (e4a541624df62ef353e754391cbbb707f54b16f7)
> 1: (()+0x2aaa8f) [0x7f4eb2de8a8f]
> 2: (SafeTimer::init()+0x95) [0x7f4eb2cd2575]
> 3: (librados::RadosClient::connect()+0x72c) [0x7f4eb2c689dc]
> 4: (()+0xa0290) [0x7f4eb5b27290]
> 5: (()+0x879dd) [0x7f4eb5b0e9dd]
> 6: (()+0x87c1b) [0x7f4eb5b0ec1b]
> 7: (()+0x87ae1) [0x7f4eb5b0eae1]
> 8: (()+0x87d50) [0x7f4eb5b0ed50]
> 9: (()+0xb37b2) [0x7f4eb5b3a7b2]
> 10: (()+0x1e83eb) [0x7f4eb5c6f3eb]
> 11: (()+0x1ab54a) [0x7f4eb5c3254a]
> 12: (main()+0x9da) [0x7f4eb5c72a3a]
> 13: (__libc_start_main()+0xfd) [0x7f4eb1ab4cdd]
> 14: (()+0x710b9) [0x7f4eb5af80b9]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> interpret this.
> terminate called after
>
> Traceback (most recent call last):
>   File "/usr/share/virt-manager/virtManager/asyncjob.py", line 96, in 
> cb_wrapper
> callback(asyncjob, *args, **kwargs)
>   File "/usr/share/virt-manager/virtManager/asyncjob.py", line 117, in tmpcb
> callback(*args, **kwargs)
>   File "/usr/share/virt-manager/virtManager/domain.py", line 1090, in startup
> self._backend.create()
>   File "/usr/lib/python2.7/dist-packages/libvirt.py", line 620, in create
> if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
> libvirtError: internal error Process exited while reading console log output: 
> char device redirected to /dev/pts/23
> Thread::try_create(): pthread_create failed with error 11common/Thread.cc: In 
> function 'void Thread::create(size_t)' thread 7f4eb5a65960 time 2013-01-17 
> 02:32:58.096437
> common/Thread.cc: 110: FAILED assert(ret == 0)
> ceph version 0.56.1 (e4a541624df62ef353e754391cbbb707f54b16f7)
> 1: (()+0x2aaa8f) [0x7f4eb2de8a8f]
> 2: (SafeTimer::init()+0x95) [0x7f4eb2cd2575]
> 3: (librados::RadosClient::connect()+0x72c) [0x7f4eb2c689dc]
> 4: (()+0xa0290) [0x7f4eb5b27290]
> 5: (()+0x879dd) [0x7f4eb5b0e9dd]
> 6: (()+0x87c1b) [0x7f4eb5b0ec1b]
> 7: (()+0x87ae1) [0x7f4eb5b0eae1]
> 8: (()+0x87d50) [0x7f4eb5b0ed50]
> 9: (()+0xb37b2) [0x7f4eb5b3a7b2]
> 10: (()+0x1e83eb) [0x7f4eb5c6f3eb]
> 11: (()+0x1ab54a) [0x7f4eb5c3254a]
> 12: (main()+0x9da) [0x7f4eb5c72a3a]
> 13: (__libc_start_main()+0xfd) [0x7f4eb1ab4cdd]
> 14: (()+0x710b9) [0x7f4eb5af80b9]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> interpret this.
> terminate called after
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Single host VM limit when using RBD

2013-01-17 Thread Matthew Anderson
Hi Andrey,

I did try your suggestion beforehand and it doesn't appear to fix the issue. 

[root@KVM04 ~]# cat  /proc/sys/kernel/threads-max 
2549635
[root@KVM04 ~]# echo 5549635 > /proc/sys/kernel/threads-max
[root@KVM04 ~]# virsh start EX03
error: Failed to start domain EX03
error: internal error Process exited while reading console log output: char 
device redirected to /dev/pts/23
Thread::try_create(): pthread_create failed with error 11common/Thread.cc: In 
function 'void Thread::create(size_t)' thread 7f5ec9706960 time 2013-01-17 
16:46:50.935681
common/Thread.cc: 110: FAILED assert(ret == 0)
 ceph version 0.56.1 (e4a541624df62ef353e754391cbbb707f54b16f7)
 1: (()+0x2aaa8f) [0x7f5ec6a89a8f]
 2: (SafeTimer::init()+0x95) [0x7f5ec6973575]
 3: (librados::RadosClient::connect()+0x72c) [0x7f5ec69099dc]
 4: (()+0xa0290) [0x7f5ec97c8290]
 5: (()+0x879dd) [0x7f5ec97af9dd]
 6: (()+0x87c1b) [0x7f5ec97afc1b]
 7: (()+0x87ae1) [0x7f5ec97afae1]
 8: (()+0x87d50) [0x7f5ec97afd50]
 9: (()+0xb37b2) [0x7f5ec97db7b2]
 10: (()+0x1e83eb) [0x7f5ec99103eb]
 11: (()+0x1ab54a) [0x7f5ec98d354a]
 12: (main()+0x9da) [0x7f5ec9913a3a]
 13: (__libc_start_main()+0xfd) [0x7f5ec5755cdd]
 14: (()+0x710b9) [0x7f5ec97990b9]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.
terminate called after
    



Re: Single host VM limit when using RBD

2013-01-17 Thread Dan Mick
How about RLIMIT_NPROC, or memory exhaustion?
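On Linux, RLIMIT_NPROC is accounted per user and counts threads as well as processes, so the value that matters is the one applied to the user running the QEMU processes, which may differ from the limit in a root shell. As a rough sketch, the limit of a running process can be read from /proc/<pid>/limits. The function names below are hypothetical, and the PID of a QEMU process would have to be supplied by hand. Memory exhaustion is not covered here.

```python
from pathlib import Path


def parse_max_processes(limits_text):
    """Return (soft, hard) from the 'Max processes' row of a /proc/<pid>/limits
    dump. An 'unlimited' entry becomes None; a missing row returns None."""
    for line in limits_text.splitlines():
        if line.startswith("Max processes"):
            fields = line.split()
            # fields: ['Max', 'processes', <soft>, <hard>, 'processes']
            return tuple(None if f == "unlimited" else int(f) for f in fields[2:4])
    return None


def max_processes_for(pid="self"):
    """Read the limit of a running process, or None if it cannot be read."""
    try:
        return parse_max_processes(Path(f"/proc/{pid}/limits").read_text())
    except OSError:
        return None


if __name__ == "__main__":
    # "self" inspects this interpreter; pass the PID of a QEMU process instead
    # to see the limit that applies to the VMs.
    print("Max processes (soft, hard):", max_processes_for("self"))
```

A soft limit close to the total number of threads owned by that user would be consistent with pthread_create returning error 11.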


Re: Single host VM limit when using RBD

2013-01-17 Thread Jim Schutt
On 01/17/2013 11:36 AM, Dan Mick wrote:
> How about RLIMIT_NPROC, or memory exhaustion?

Also, check /proc/sys/kernel/pid_max.

I've solved a similar pthread_create problem by increasing this
to 256k, up from 32k.
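Every thread on Linux takes its ID from the same number space as process IDs, so a host running many heavily threaded processes can run out of IDs long before it reaches threads-max. The sketch below estimates the remaining headroom. The helper names are hypothetical, and reading "32k" as 32768 and "256k" as 262144 is an assumption.

```python
from pathlib import Path


def read_pid_max(path="/proc/sys/kernel/pid_max"):
    """Return the configured pid_max, or None if the file cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def count_task_ids(proc_root="/proc"):
    """Count <proc_root>/<pid>/task/<tid> directories, one per thread."""
    try:
        pids = [p for p in Path(proc_root).iterdir() if p.name.isdigit()]
    except OSError:
        return 0  # no procfs available
    total = 0
    for pid_dir in pids:
        try:
            total += sum(1 for t in (pid_dir / "task").iterdir() if t.name.isdigit())
        except OSError:
            continue  # the process exited while we were scanning
    return total


def pid_headroom(ids_in_use, pid_max):
    """Rough number of IDs still available before pid_max is reached."""
    return max(pid_max - ids_in_use, 0)


if __name__ == "__main__":
    limit = read_pid_max()
    used = count_task_ids()
    print("pid_max:", limit, "IDs in use:", used)
    if limit is not None:
        print("approximate headroom:", pid_headroom(used, limit))
```

The figure is only approximate, since the kernel does not hand out every value below pid_max.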

-- Jim
