Re: [Openstack] [Cinder] HP MSA array as a secondary backend is only used when the user is an admin

2018-10-18 Thread Jim Okken
Just a bump: can anyone offer any advice on this Cinder driver,
cinder.volume.drivers.san.hp.hpmsa_fc.HPMSAFCDriver?

thanks!
-- Jim


On Thu, Oct 11, 2018 at 4:08 PM Jim Okken  wrote:

> hi All,
>
> not sure if I can find an answer here to this specific situation with the
> cinder backend driver cinder.volume.drivers.san.hp.hpmsa_fc.HPMSAFCDriver.
> If not how can I get in touch with someone more familiar with
> cinder.volume.drivers.san.hp.hpmsa_fc.HPMSAFCDriver
>
> we have a HP MSA storage array connected to most of our compute nodes and
> we are using the cinder driver
> cinder.volume.drivers.san.hp.hpmsa_fc.HPMSAFCDriver as a second backend so
> that openstack can, if directed by metadata, create volumes on it during
> instance creation. Openstack creates volumes using this MSA backend if the
> metadata of the image selected contains "cinder_image_volume_type=MSA".
> This second MSA type of volume was added to cinder.
>
> We use a CentOS-6-x86_64-GenericCloud-1707.qcow2 image which has this
> metadata added. Without this metadata RBD/CEPH images are made
>
> This works great for the admin user but not for a regular _member_ user.
>
> With the admin user volumes created show Type=*MSA* and
> Host=node-44.domain.com@*MSA#A*. (correct)
>
> With the _member_ user volumes created show Type=*MSA* but
> Host=rbd:volumes@RBD-backend#*RBD-backend (this is CEPH, incorrect!)*.
>
> And I can confirm the volume is not on the MSA. Correct RBD/CEPH volumes
> show Type=*volumes_ceph* and Host=rbd:volumes@RBD-backend#*RBD-backend*.
>
> This happens if the cinder volume type is created as a Private type or a
> Public type.
>
> I have tried to set the properties on the cinder MSA volume type for the
> specific project we want to use this volume type in, and to set the
> project-domain for this volume type. nothing has helped.
>
> can anyone shed any light on this behavior or point out anything helpful
> in the logs pls?
>
> Looking at the logs I do see the _member_ user is a non-default-domain
> user while admin is obviously the default domain. other than that I can't
> make heads or tails of the logs.
>
> Here are logs if anyone wants to look at them:
> a bad _member_ volume creation was UUID
> fb9047c3-1b6b-4d2b-bae8-5177e86eb1f2 https://pastebin.com/bmFAy6RR
>
> a good admin volume creation was UUID b49e33db-8ab8-489f-b7cb-092f421178c1
> https://pastebin.com/5SAecNJ2
>
> We are using Newton, thanks!!!
>
>
> -- Jim
>
___
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack


Re: [Openstack] [Cinder] HP MSA array as a secondary backend is only used when the user is an admin

2018-10-11 Thread Jim Okken
hi All,

I'm not sure if I can find an answer here to this specific situation with
the Cinder backend driver
cinder.volume.drivers.san.hp.hpmsa_fc.HPMSAFCDriver. If not, how can I get
in touch with someone more familiar with that driver?

We have an HP MSA storage array connected to most of our compute nodes, and
we are using the Cinder driver
cinder.volume.drivers.san.hp.hpmsa_fc.HPMSAFCDriver as a second backend so
that OpenStack can, when directed by image metadata, create volumes on it
during instance creation. OpenStack creates volumes on this MSA backend if
the metadata of the selected image contains "cinder_image_volume_type=MSA".
This second MSA volume type was added to Cinder.

We use a CentOS-6-x86_64-GenericCloud-1707.qcow2 image which has this
metadata added. Without this metadata, RBD/Ceph volumes are created.
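
For anyone reproducing this setup, attaching that hint to an image is a
single CLI call. This is a hedged sketch (the image name is the one
mentioned in this thread; note that upstream Cinder documents this image
property as cinder_img_volume_type, so it is worth confirming which
spelling your deployment actually honors):

```shell
# Sketch: tag the image so Cinder picks the MSA volume type for
# image-backed volumes. Image name and property spelling are taken
# from this thread -- adjust to your own deployment.
openstack image set \
  --property cinder_image_volume_type=MSA \
  CentOS-6-x86_64-GenericCloud-1707

# Confirm the property landed:
openstack image show CentOS-6-x86_64-GenericCloud-1707 -c properties
```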

This works great for the admin user but not for a regular _member_ user.

With the admin user volumes created show Type=*MSA* and
Host=node-44.domain.com@*MSA#A*. (correct)

With the _member_ user volumes created show Type=*MSA* but
Host=rbd:volumes@RBD-backend#*RBD-backend (this is CEPH, incorrect!)*.

And I can confirm the volume is not on the MSA. Correct RBD/CEPH volumes
show Type=*volumes_ceph* and Host=rbd:volumes@RBD-backend#*RBD-backend*.

This happens whether the cinder volume type is created as a Private type or
a Public type.

I have tried to set the properties on the Cinder MSA volume type for the
specific project we want to use this volume type in, and to set the
project domain for this volume type; nothing has helped.
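
For completeness, these are the CLI checks I would suggest for a type that
only schedules correctly for admin; a hedged sketch only (the project UUID
is a placeholder, and type-access-add matters only while the type is
Private):

```shell
# Confirm the type's extra spec points at the MSA backend section
# defined in cinder.conf (the backend name here is an assumption):
cinder type-key MSA set volume_backend_name=MSA
cinder extra-specs-list

# For a Private type, grant the tenant's project explicit access
# (replace the placeholder with the real project UUID):
cinder type-access-add --volume-type MSA --project-id <project-uuid>
cinder type-access-list --volume-type MSA
```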

Can anyone shed any light on this behavior, or point out anything helpful
in the logs, please?

Looking at the logs, I do see that the _member_ user is in a non-default
domain, while admin is obviously in the default domain. Other than that I
can't make heads or tails of the logs.

Here are the logs if anyone wants to look at them.
The bad _member_ volume creation was UUID
fb9047c3-1b6b-4d2b-bae8-5177e86eb1f2: https://pastebin.com/bmFAy6RR

The good admin volume creation was UUID
b49e33db-8ab8-489f-b7cb-092f421178c1: https://pastebin.com/5SAecNJ2

We are using Newton, thanks!!!


-- Jim


Re: [Openstack] [Fuel] add custom settings to a fuel deploy

2018-05-01 Thread Jim Okken
hi Jitendra,
thanks very much for your reply!

We deploy with the UI; right now the environment is set and we only add
compute nodes.
In the past we deployed one compute node at a time, but now that we
understand the process we deploy multiple, 3 or 5 compute nodes at a time.
Honestly, though, going forward it could be one or multiple nodes at a time.
This is a growing internal-use environment; we have 23 compute nodes right
now, so we are going to be growing it more slowly going forward.

Right now we have some simple shell scripts which we run after a successful
deploy; these set the settings in the config files and restart OpenStack
services. But until those scripts are run, the environment is missing those
needed additions and is not really usable.
That is not a huge problem for an internal-use environment, but we would
like to have no downtime.
Also, this is an HA environment, so we have three controllers.
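
Since those scripts are simple key=value edits, a minimal idempotent helper
along these lines can be re-run after every deploy without duplicating
settings. This is a sketch only (pure POSIX shell plus awk, no crudini
dependency; the option name below is illustrative, not a recommendation):

```shell
#!/bin/sh
# Minimal sketch of an idempotent "re-apply our overrides" helper, safe
# to run after every Fuel deploy.
set_ini() {   # set_ini FILE SECTION KEY VALUE
  file=$1 section=$2 key=$3 value=$4
  awk -v s="[$section]" -v k="$key" -v v="$value" '
    $0 == s { in_s = 1; print; next }                 # entered our section
    /^\[/   { if (in_s && !done) { print k " = " v; done = 1 } in_s = 0 }
    in_s && $0 ~ "^" k "[ =]" {                       # replace existing key
      if (!done) { print k " = " v; done = 1 }
      next
    }
    { print }
    END { if (in_s && !done) print k " = " v }        # append if missing
  ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Example on a scratch copy (the option name is illustrative only):
printf '[DEFAULT]\ndebug = False\n' > /tmp/dhcp_agent.ini
set_ini /tmp/dhcp_agent.ini DEFAULT dnsmasq_lease_max 16777216
set_ini /tmp/dhcp_agent.ini DEFAULT dnsmasq_lease_max 16777216  # re-run: no dup
cat /tmp/dhcp_agent.ini
```

Each service restart can then be gated on whether the file actually
changed, which helps with the no-downtime goal on the three controllers.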

thanks!!


-- Jim

On Tue, May 1, 2018 at 3:39 PM, Jitendra Kumar Bhaskar <
jitendr...@pramati.com> wrote:

> Hi Jim,
>
> I can help you on that, but before that I wanted to understand how you are
> deploying the additional computes:
> 1. If CLI then share the command that you used to deploy.
> 2. If UI then are you deploying only one node after selection ?
>
>
> Regards
> Jitendra Bhaskar
>
> Regards
> Bhaskar
> +1-469-514-7986
>
>
>
>
>
> On Tue, May 1, 2018 at 12:21 PM, Jim Okken <j...@jokken.com> wrote:
>
>> Hi list,
>>
>>
>>
>> We’ve created a pretty large openstack Newton HA environment using fuel.
>> After initial hiccups with deployment (not all fuel troubles) we can now
>> add additional compute nodes to the environment with ease!
>>
>> Thank you for all who’ve worked on all the projects to make this product.
>>
>>
>>
>> My question has to do with something I think I should know already: How
>> can we get fuel to stop overwriting custom settings in our environment?
>> When we deploy new compute nodes, original openstack settings on all nodes
>> are re-deployed/re-set.
>>
>>
>>
>> For example we have changes to settings in these files on the controller
>> nodes.
>>
>>
>>
>> /etc/nova/nova.conf
>>
>> /etc/neutron/dhcp_agent.ini
>>
>> /etc/neutron/plugins/ml2/openvswitch_agent.ini
>>
>> /etc/openstack-dashboard/local_settings.py
>>
>> /etc/keystone/keystone.conf
>>
>> /etc/cinder/cinder.conf
>>
>> /etc/neutron/neutron.conf
>>
>>
>>
>> I’m guessing the method to resolve this is not to stop fuel from
>> overwriting settings, but to add to fuel some tasks that sets these custom
>> settings again near the end of each deploy.
>>
>>
>>
>> I’m sure this is something I am supposed to know already, but so far in
>> my route thru Openstack land experience with this has escaped me.
>>
>> Can you send me some advice, pointers, places to start?
>>
>>
>>
>> Thanks!
>>
>>
>>
>> --jim
>>
>>
>>
>>
>


[Openstack] [Fuel] add custom settings to a fuel deploy

2018-05-01 Thread Jim Okken
Hi list,



We’ve created a pretty large OpenStack Newton HA environment using Fuel.
After initial hiccups with deployment (not all of them Fuel troubles) we
can now add additional compute nodes to the environment with ease!

Thank you to all who’ve worked on all the projects to make this product.



My question has to do with something I think I should know already: how can
we get Fuel to stop overwriting custom settings in our environment? When we
deploy new compute nodes, the original OpenStack settings on all nodes are
re-deployed and reset.



For example, we have changed settings in these files on the controller
nodes:



/etc/nova/nova.conf

/etc/neutron/dhcp_agent.ini

/etc/neutron/plugins/ml2/openvswitch_agent.ini

/etc/openstack-dashboard/local_settings.py

/etc/keystone/keystone.conf

/etc/cinder/cinder.conf

/etc/neutron/neutron.conf



I’m guessing the method to resolve this is not to stop Fuel from
overwriting settings, but to add to Fuel some tasks that set these custom
settings again near the end of each deploy.
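
If that guess is right, the shape to aim for is a post-deployment task that
calls the existing script. This is a hedged sketch only; the task id,
script path, and graph anchor names are assumptions to verify against the
Fuel plugin documentation for your release:

```yaml
# deployment_tasks.yaml (sketch; exact fields vary by Fuel release --
# some releases use `groups:` instead of `role:`)
- id: reapply_custom_settings
  type: shell
  role: ['controller', 'compute']
  requires: [post_deployment_end]
  parameters:
    cmd: /usr/local/bin/reapply-settings.sh   # hypothetical path to your script
    timeout: 180
```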



I’m sure this is something I am supposed to know already, but so far on my
route through OpenStack land, experience with this has escaped me.

Can you send me some advice, pointers, places to start?



Thanks!



--jim


Re: [Openstack] compute nodes down

2017-12-29 Thread Jim Okken
I believe this issue turned out to be caused by the device we are using for
shared storage to each compute node.

It had an access issue, and access attempts to one instance's vHD files
hung forever and never timed out.
That makes sense for one node having Nova issues, but could it cause the
nova services on all compute nodes to stop after some time? (In a
shared-storage setup, does each node access/query each vHD on the storage
periodically?)
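
As background, the "down" flag is just a timestamp comparison: a service is
reported down when its last heartbeat (Updated_at) is older than
service_down_time, 60 seconds by default. A hedged sketch of that check
(GNU date assumed for the ISO-8601 parsing):

```shell
#!/bin/sh
# Sketch: reproduce nova's "service down?" test. A service shows down
# when (now - Updated_at) > service_down_time (default 60 seconds).
SERVICE_DOWN_TIME=60                    # [DEFAULT]/service_down_time in nova.conf
updated="2017-12-18T00:16:01"           # stale heartbeat from this thread
now=$(date -u +%s)
ts=$(date -u -d "$updated" +%s)         # GNU date ISO-8601 parsing
age=$(( now - ts ))
if [ "$age" -gt "$SERVICE_DOWN_TIME" ]; then
  echo "heartbeat is ${age}s old -> reported as down"
else
  echo "heartbeat is fresh -> reported as up"
fi
```

A hung storage access that blocks the compute service's periodic tasks
would stop the heartbeat in exactly this way, which fits the shared-storage
theory; Tobias's NTP point is the other half of the same check, since a
clock skewed by more than service_down_time produces the same symptom.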

thanks!

-- Jim

On Tue, Dec 19, 2017 at 3:45 AM, Tobias Urdin <tobias.ur...@crystone.com>
wrote:

> Enable debug in nova.conf and check conductor and compute logs.
>
> Check that your clock is in-sync with NTP or you might experience that the
> alive checks in the database exceeds the service_down_time config value.
>
> On 12/19/2017 12:09 AM, Jim Okken wrote:
>
> hi list,
>
> hoping someone could shed some light on this issue I just started seeing
> today
>
> all my compute nodes started showing as "Down" in the Horizon ->
> Hypervisors -> Compute Nodes tab
>
>
> root@node-1:~# nova service-list
> +-+--+---+--+---
> --+---++-+
> | Id  | Binary   | Host  | Zone | Status  | State
> | Updated_at | Disabled Reason |
> +-+--+---+--+---
> --+---++-+
> | 325 | nova-compute | node-9.mydom.com  | nova | enabled | down
> | 2017-12-18T21:59:38.00 | -   |
> | 448 | nova-compute | node-14.mydom.com | nova | enabled | up
> | 2017-12-18T22:41:42.00 | -   |
> | 451 | nova-compute | node-17.mydom.com | nova | enabled | up
> | 2017-12-18T22:42:04.00 | -   |
> | 454 | nova-compute | node-11.mydom.com | nova | enabled | up
> | 2017-12-18T22:42:02.00 | -   |
> | 457 | nova-compute | node-12.mydom.com | nova | enabled | up
> | 2017-12-18T22:42:12.00 | -   |
> | 472 | nova-compute | node-16.mydom.com | nova | enabled | down
> | 2017-12-18T00:16:01.00 | -   |
> | 475 | nova-compute | node-10.mydom.com | nova | enabled | down
> | 2017-12-18T00:26:09.00 | -   |
> | 478 | nova-compute | node-13.mydom.com | nova | enabled | down
> | 2017-12-17T23:54:06.00 | -   |
> | 481 | nova-compute | node-15.mydom.com | nova | enabled | up
> | 2017-12-18T22:41:34.00 | -   |
> | 484 | nova-compute | node-8.mydom.com  | nova | enabled | down
> | 2017-12-17T23:55:50.00 | -   |
>
>
> if I stop and the start nova-compute on the down nodes the stop will take
> several minutes and then the start will be quick and fine. but after about
> 2 hours the nova-compute service will show down again.
>
> i am not seeing any ERRORS in nova logs.
>
> I get this for the status of a node that is showing as "UP"
>
>
>
> root@node-14:~# systemctl status nova-compute.service
> ● nova-compute.service - OpenStack Compute
>Loaded: loaded (/lib/systemd/system/nova-compute.service; enabled;
> vendor preset: enabled)
>Active: active (running) since Mon 2017-12-18 21:57:10 UTC; 35min ago
>  Docs: man:nova-compute(1)
>   Process: 32193 ExecStartPre=/bin/chown nova:adm /var/log/nova
> (code=exited, status=0/SUCCESS)
>   Process: 32190 ExecStartPre=/bin/chown nova:nova /var/lock/nova
> /var/lib/nova (code=exited, status=0/SUCCESS)
>   Process: 32187 ExecStartPre=/bin/mkdir -p /var/lock/nova /var/log/nova
> /var/lib/nova (code=exited, status=0/SUCCESS)
>  Main PID: 32196 (nova-compute)
>CGroup: /system.slice/nova-compute.service
>└─32196 /usr/bin/python /usr/bin/nova-compute
> --config-file=/etc/nova/nova-compute.conf --config-file=/etc/nova/nova.conf
> --log-file=/var/log/nova/nova-compute.log
>
> Dec 18 22:31:47 node-14.mydom.com nova-compute[32196]: 2017-12-18
> 22:31:47.570 32196 DEBUG oslo_messaging._drivers.amqpdriver
> [req-f30b2331-2097-4981-89c8-acea4a81f7f2 - - - - -] CALL msg_id:
> 2877b9707da144f3a91e7b80e2705fb3 exchange 'nova' topic 'conductor' _send
> /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:448
> Dec 18 22:31:47 node-14.mydom.com nova-compute[32196]: 2017-12-18
> 22:31:47.604 32196 DEBUG oslo_messaging._drivers.amqpdriver [-] received
> reply msg_id: 2877b9707da144f3a91e7b80e2705fb3 __call__
> /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:296
> Dec 18 22:31:47 node-14.mydom.com nova-compute[32196]: 2017-12-18
> 22:31:47.605 32196 INFO nova.compute.resource_tracker
> [req-f30b2

[Openstack] compute nodes down

2017-12-18 Thread Jim Okken
hi list,

I'm hoping someone could shed some light on this issue I just started
seeing today.

All my compute nodes started showing as "Down" in the Horizon ->
Hypervisors -> Compute Nodes tab.


root@node-1:~# nova service-list
+-----+--------------+-------------------+------+---------+-------+------------------------+-----------------+
| Id  | Binary       | Host              | Zone | Status  | State | Updated_at             | Disabled Reason |
+-----+--------------+-------------------+------+---------+-------+------------------------+-----------------+
| 325 | nova-compute | node-9.mydom.com  | nova | enabled | down  | 2017-12-18T21:59:38.00 | -               |
| 448 | nova-compute | node-14.mydom.com | nova | enabled | up    | 2017-12-18T22:41:42.00 | -               |
| 451 | nova-compute | node-17.mydom.com | nova | enabled | up    | 2017-12-18T22:42:04.00 | -               |
| 454 | nova-compute | node-11.mydom.com | nova | enabled | up    | 2017-12-18T22:42:02.00 | -               |
| 457 | nova-compute | node-12.mydom.com | nova | enabled | up    | 2017-12-18T22:42:12.00 | -               |
| 472 | nova-compute | node-16.mydom.com | nova | enabled | down  | 2017-12-18T00:16:01.00 | -               |
| 475 | nova-compute | node-10.mydom.com | nova | enabled | down  | 2017-12-18T00:26:09.00 | -               |
| 478 | nova-compute | node-13.mydom.com | nova | enabled | down  | 2017-12-17T23:54:06.00 | -               |
| 481 | nova-compute | node-15.mydom.com | nova | enabled | up    | 2017-12-18T22:41:34.00 | -               |
| 484 | nova-compute | node-8.mydom.com  | nova | enabled | down  | 2017-12-17T23:55:50.00 | -               |
+-----+--------------+-------------------+------+---------+-------+------------------------+-----------------+


If I stop and then start nova-compute on the down nodes, the stop takes
several minutes and then the start is quick and fine, but after about two
hours the nova-compute service shows as down again.

I am not seeing any ERRORS in the Nova logs.

I get this for the status of a node that is showing as "UP":



root@node-14:~# systemctl status nova-compute.service
● nova-compute.service - OpenStack Compute
   Loaded: loaded (/lib/systemd/system/nova-compute.service; enabled;
vendor preset: enabled)
   Active: active (running) since Mon 2017-12-18 21:57:10 UTC; 35min ago
 Docs: man:nova-compute(1)
  Process: 32193 ExecStartPre=/bin/chown nova:adm /var/log/nova
(code=exited, status=0/SUCCESS)
  Process: 32190 ExecStartPre=/bin/chown nova:nova /var/lock/nova
/var/lib/nova (code=exited, status=0/SUCCESS)
  Process: 32187 ExecStartPre=/bin/mkdir -p /var/lock/nova /var/log/nova
/var/lib/nova (code=exited, status=0/SUCCESS)
 Main PID: 32196 (nova-compute)
   CGroup: /system.slice/nova-compute.service
           └─32196 /usr/bin/python /usr/bin/nova-compute
--config-file=/etc/nova/nova-compute.conf --config-file=/etc/nova/nova.conf
--log-file=/var/log/nova/nova-compute.log

Dec 18 22:31:47 node-14.mydom.com nova-compute[32196]: 2017-12-18
22:31:47.570 32196 DEBUG oslo_messaging._drivers.amqpdriver
[req-f30b2331-2097-4981-89c8-acea4a81f7f2 - - - - -] CALL msg_id:
2877b9707da144f3a91e7b80e2705fb3 exchange 'nova' topic 'conductor' _send
/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:448
Dec 18 22:31:47 node-14.mydom.com nova-compute[32196]: 2017-12-18
22:31:47.604 32196 DEBUG oslo_messaging._drivers.amqpdriver [-] received
reply msg_id: 2877b9707da144f3a91e7b80e2705fb3 __call__
/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:296
Dec 18 22:31:47 node-14.mydom.com nova-compute[32196]: 2017-12-18
22:31:47.605 32196 INFO nova.compute.resource_tracker
[req-f30b2331-2097-4981-89c8-acea4a81f7f2 - - - - -] Total usable vcpus:
40, total allocated vcpus: 0
Dec 18 22:31:47 node-14.mydom.com nova-compute[32196]: 2017-12-18
22:31:47.606 32196 INFO nova.compute.resource_tracker
[req-f30b2331-2097-4981-89c8-acea4a81f7f2 - - - - -] Final resource view:
name=node-14.mydom.com phys_ram=128812MB used_ram=512MB phys_disk=6691GB
used_disk=0GB total_vcpus=40 used_vcpus=0 pci_stats=[]
Dec 18 22:31:47 node-14.mydom.com nova-compute[32196]: 2017-12-18
22:31:47.610 32196 DEBUG oslo_messaging._drivers.amqpdriver
[req-f30b2331-2097-4981-89c8-acea4a81f7f2 - - - - -] CALL msg_id:
ad32abe833f4440d86c15b911aa35c43 exchange 'nova' topic 'conductor' _send
/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:448
Dec 18 22:31:47 node-14.mydom.com nova-compute[32196]: 2017-12-18
22:31:47.632 32196 DEBUG oslo_messaging._drivers.amqpdriver [-] received
reply msg_id: ad32abe833f4440d86c15b911aa35c43 __call__
/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:296
Dec 18 22:31:47 node-14.mydom.com nova-compute[32196]: 2017-12-18
22:31:47.633 32196 WARNING nova.scheduler.client.report
[req-f30b2331-2097-4981-89c8-acea4a81f7f2 - - - - -] Unable to refresh my
resource provider record
Dec 

Re: [Openstack] soft lockup on Newton compute nodes

2017-11-20 Thread Jim Okken
Just an update for this question:
the issue is resolved with a kernel update.

Upgrading multiple compute nodes from kernel 4.4.0-93 to 4.4.0-98 fixed the
soft lockup issue. Also, this kernel change does not seem to have broken
anything else in OpenStack.
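
For anyone checking their own fleet, the version comparison is easy to
script. A hedged sketch (sort -V from GNU coreutils; the 4.4.0-98
threshold is only what fixed it on our HP hardware, so check the Ubuntu
changelog for your own):

```shell
#!/bin/sh
# Sketch: flag kernels that predate the version which fixed our lockups.
predates() {   # predates RUNNING FIXED -> true if RUNNING is older
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

fixed="4.4.0-98"
for running in 4.4.0-93 4.4.0-98; do
  if predates "$running" "$fixed"; then
    echo "$running predates $fixed: upgrade candidate"
  else
    echo "$running is at or past $fixed"
  fi
done
# On a real node, feed it: uname -r | cut -d- -f1-2
```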

-- Jim

On Fri, Nov 10, 2017 at 9:50 AM, Jim Okken <j...@jokken.com> wrote:

> = UPDATE 11/10 ==
> hi again,
>
> based on some advice from a member of this mailing list we've been looking
> into kernel and driver versions of our compute nodes
>
> We also have plain non openstack "KVM on Ubuntu" servers for testing.
>
> I looked at driver and kernel differences between these Ubuntu 16 w/ KVM
> systems and our openstack compute nodes. I found Ubuntu 16 w/ KVM was at
> kernel version 4.4.0-87 and that the openstack compute nodes were at
> 4.4.0-93. So I upgraded the Ubuntu 16 w/ KVM to 4.4.0-93 and was able to
> reproduce this problem (but only on the exact HP hardware that is our
> openstack compute nodes, and not on other hardware).
> Next I updated these Ubuntu 16 w/ KVM to 4.4.0-98 and the problem no
> longer occurred!
>
> I need to upgrade a few openstack compute nodes to 4.4.0-98 and test. Do
> anyone think this kernel change could break openstack?
>
> In the kernel change log I found a fix for a specific HP server in
> 4.4.0-98 (not the same as our server but somewhat similar)
>
> thanks!
>
> -- Jim
>
> On Mon, Oct 23, 2017 at 10:25 PM, Jim Okken <j...@jokken.com> wrote:
>
>> = UPDATE 10/23 ==
>>
>> We have been trying different things to get better debug info; we
>> disabled rate-limiting in order to get more detail in /var/log/messages.
>> For some reason (maybe unrelated) we didn't get the soft lockup during
>> this test, but this time we got openvswitch, br_netfilter, etc. in the
>> call trace in /var/log/messages.
>>
>> Please advise in any way! thx!!
>>
>> basically we are running various types of SIP/RTP test traffic between 2
>> instances (on different compute nodes). This time instead of one hypervisor
>> getting the errors both hypervisors did, but neither got the soft lockup.
>>
>> log snippets below, full logs here:
>>
>> www.jokken.com/downloads/node-68.txt
>>
>> www.jokken.com/downloads/node-90.txt
>>
>>
>> *node-68*
>>
>> 2017-10-20T17:48:37.031741+00:00 node-68 rsyslogd-2177: imuxsock[pid
>> 5085]: 40 messages lost due to rate-limiting
>>
>> 2017-10-20T17:58:36.281069+00:00 node-68 rsyslogd-2177: imuxsock[pid
>> 5085]: begin to drop messages due to rate-limiting
>>
>> 2017-10-20T17:58:37.548500+00:00 node-68 rsyslogd-2177: imuxsock[pid
>> 5085]: 41 messages lost due to rate-limiting
>>
>> 2017-10-20T18:08:36.180377+00:00 node-68 rsyslogd-2177: imuxsock[pid
>> 5085]: begin to drop messages due to rate-limiting
>>
>> 2017-10-20T18:08:37.058861+00:00 node-68 rsyslogd-2177: imuxsock[pid
>> 5085]: 40 messages lost due to rate-limiting
>>
>> 2017-10-20T18:18:36.175797+00:00 node-68 rsyslogd-2177: imuxsock[pid
>> 5085]: begin to drop messages due to rate-limiting
>>
>> 2017-10-20T18:18:37.583237+00:00 node-68 rsyslogd-2177: imuxsock[pid
>> 5085]: 41 messages lost due to rate-limiting
>>
>> 2017-10-20T18:28:36.172090+00:00 node-68 rsyslogd-2177: imuxsock[pid
>> 5085]: begin to drop messages due to rate-limiting
>>
>> 2017-10-20T18:28:37.125346+00:00 node-68 rsyslogd-2177: imuxsock[pid
>> 5085]: 40 messages lost due to rate-limiting
>>
>>
>>
>> ps -aef | grep 5080
>>
>> ceilome+ 5080 3502 0 Oct03 ? 01:32:57 ceilometer-polling - AgentManager(0)
>>
>>
>>
>> 2017-10-20T18:35:10.759230+00:00 node-68 rsyslogd: [origin
>> software="rsyslogd" swVersion="8.16.0" x-pid="3431" x-info="
>> http://www.rsyslog.com"] exiting on signal 15.
>>
>> 2017-10-20T18:35:10.790611+00:00 node-68 rsyslogd: [origin
>> software="rsyslogd" swVersion="8.16.0" x-pid="23851" x-info="
>> http://www.rsyslog.com"] start
>>
>> 2017-10-20T18:35:10.790395+00:00 node-68 rsyslogd: rsyslogd's groupid
>> changed to 108
>>
>> 2017-10-20T18:35:10.790455+00:00 node-68 rsyslogd: rsyslogd's userid
>> changed to 104
>>
>> 2017-10-20T18:35:10.790491+00:00 node-68 rsyslogd-2357: queue "action 0
>> queue": high water mark is set quite low at 8000. You should only set it
>> below 60% (60) if you have a good reason for this. [v8.16.0 try
>> http://www.rsyslog.com/e/2357 ]
>>
>>
>>
>> Test starts: Fri Oct 20 18:52:48 2017
>&

Re: [Openstack] [Fuel] node name issue

2017-11-20 Thread Jim Okken
Update to an old question: I have gotten around this issue.

I'm not sure exactly how I got around it, but my theory is based on
something I noticed quite by accident.

The servers I was having these troubles on use 80GB hard drives, but they
also have a flash drive in them, for small OS deployments. I stumbled
across a /dev/sda6 partition on these flash drives.

On this partition I found 2 files: meta-data and user-data. In those files
was the old node name I mentioned in the original post. This partition must
have been detected, and these stale files were used by fuel-agent when the
newly provisioned node first booted, even though they were booting from the
80GB drive which had its own /dev/sda6 partition.

I suspect the stale/incorrect /dev/sda6 probably dates from Fuel 8, when we
tried to deploy on some of these flash drives...

Once I deleted the stale/incorrect /dev/sda6, the provisioning and
deployment went perfectly.
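
To catch a stale seed before it bites again, the check can be scripted. A
hedged sketch: in practice each candidate partition would be mounted
read-only and check_seed pointed at the mountpoint; here it runs on a
scratch directory standing in for the stale /dev/sda6:

```shell
#!/bin/sh
# Sketch: detect a leftover cloud-init NoCloud seed (meta-data/user-data)
# that fuel-agent could pick up from a stale partition.
check_seed() {   # check_seed DIR
  if [ -f "$1/meta-data" ] && [ -f "$1/user-data" ]; then
    echo "stale seed in $1 -> $(grep -i '^hostname' "$1/meta-data")"
  fi
}

# Demo on a scratch directory (simulating a mounted stale partition):
d=$(mktemp -d)
printf 'hostname: node-11\n' > "$d/meta-data"
: > "$d/user-data"
check_seed "$d"          # prints the stale hostname record
rm -r "$d"
```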

thanks

-- Jim

On Thu, Sep 28, 2017 at 5:02 PM, Jim Okken <j...@jokken.com> wrote:

> I ran  "fuel2 node update -H blade13 20" just to get out of the node-*
> naming convention, as someone suggested
>
>
>
> The deploy still names the node node-11 and provisioning fails.
>
> digging a little more, i see it might have to do with the fuel-agent
> cloud-init scripts.
>
> in the cloud-init.log on the new node I see the node name being set to
> node-11!
>
>
>
> this isnt node-11. this was node-20, but I renamed it to blade13 with the
> command "fuel2 node update -H blade13 20"
>
>
>
> i also noted that after the cloud-init scripts ran at the end the first
> boot of the new provisioned OS, that in the Fuel GUI, the FQDN field became
> node-11.ourdomain.com (before it was bootstrap.ourdomain.com)
>
> (in the same window Hostname still show as blade13)
>
>
>
> But FQDN in the Fuel2 CLI output still shows  node-20.ourdomain.com!!!
>
>
>
> fuel2 node show 20
>
> | id            | 20                   |
> | name          | Untitled (68:58)     |
> | status        | ready                |
> | os_platform   | ubuntu               |
> | roles         | [u'compute']         |
> | kernel_params | None                 |
> | pending_roles | []                   |
> | hostname      | node-20              |
> | fqdn          | node-20.dialogic.com |
> | platform_name | ProLiant BL460c Gen9 |
>
> where can i find the cloud init settings which are deploy to new nodes?
>
> i guess this has something to do with this file:
> /usr/share/fuel-agent/cloud-init-templates/cloud_config_ubuntu.jinja2
>
> in that file I see
>
> hostname: {{ common.hostname }}
>
> fqdn: {{ common.fqdn }}
>
>
>
> Please help me with any info you might have, or let me know what populates
> those 2 parts of the template.
>
>
>
> Is there a database these values are all stored in on the fuel server?
>
>
>
> Thanks
>
>
>
> --Jim
>
> -- Jim
>
> On Tue, Sep 26, 2017 at 12:00 PM, Jim Okken <j...@jokken.com> wrote:
>
>> Also I should add, I don't have the original hard drives in the system, so
>> it isn't because it is booting the old OS where these node names were set;
>> this is definitely the newly installed OS being given the wrong hostname.
>>
>>
>>
>> is there a database this is all kept in? maybe I could look around and
>> find where these old node names are being saved?
>>
>> thanks!
>>
>> -- Jim
>>
>> On Mon, Sep 25, 2017 at 6:03 PM, Jim Okken <j...@jokken.com> wrote:
>>
>>> hi all,
>>>
>>> I am using Fuel 10.
>>>
>>> i have 2 nodes I am trying to deploy as compute nodes. at one time in
>>> the past I was attempting to deploy them too. I assume back then their node
>>> names were node-11 and node-20.
>>>
>>> they were never successfully deploy and now I've worked out their
>>> hardware issues and are attempting to 

Re: [Openstack] soft lockup on Newton compute nodes

2017-11-10 Thread Jim Okken
= UPDATE 11/10 ==
hi again,

based on some advice from a member of this mailing list we've been looking
into kernel and driver versions of our compute nodes

We also have plain, non-OpenStack "KVM on Ubuntu" servers for testing.

I looked at driver and kernel differences between these Ubuntu 16 w/ KVM
systems and our OpenStack compute nodes. I found Ubuntu 16 w/ KVM was at
kernel version 4.4.0-87 and that the OpenStack compute nodes were at
4.4.0-93. So I upgraded the Ubuntu 16 w/ KVM systems to 4.4.0-93 and was
able to reproduce this problem (but only on the exact HP hardware that our
OpenStack compute nodes use, and not on other hardware).
Next I updated these Ubuntu 16 w/ KVM systems to 4.4.0-98 and the problem
no longer occurred!

I need to upgrade a few OpenStack compute nodes to 4.4.0-98 and test. Does
anyone think this kernel change could break OpenStack?

In the kernel change log I found a fix for a specific HP server in 4.4.0-98
(not the same as our server but somewhat similar)

thanks!

-- Jim

On Mon, Oct 23, 2017 at 10:25 PM, Jim Okken <j...@jokken.com> wrote:

> = UPDATE 10/23 ==
>
> We have been trying different things to get better debug info; we disabled
> rate-limiting in order to get more detail in /var/log/messages. For some
> reason (maybe unrelated) we didn't get the soft lockup during this test,
> but this time we got openvswitch, br_netfilter, etc. in the call trace in
> /var/log/messages.
>
> Please advise in any way! thx!!
>
> basically we are running various types of SIP/RTP test traffic between 2
> instances (on different compute nodes). This time instead of one hypervisor
> getting the errors both hypervisors did, but neither got the soft lockup.
>
> log snippets below, full logs here:
>
> www.jokken.com/downloads/node-68.txt
>
> www.jokken.com/downloads/node-90.txt
>
>
> *node-68*
>
> 2017-10-20T17:48:37.031741+00:00 node-68 rsyslogd-2177: imuxsock[pid
> 5085]: 40 messages lost due to rate-limiting
>
> 2017-10-20T17:58:36.281069+00:00 node-68 rsyslogd-2177: imuxsock[pid
> 5085]: begin to drop messages due to rate-limiting
>
> 2017-10-20T17:58:37.548500+00:00 node-68 rsyslogd-2177: imuxsock[pid
> 5085]: 41 messages lost due to rate-limiting
>
> 2017-10-20T18:08:36.180377+00:00 node-68 rsyslogd-2177: imuxsock[pid
> 5085]: begin to drop messages due to rate-limiting
>
> 2017-10-20T18:08:37.058861+00:00 node-68 rsyslogd-2177: imuxsock[pid
> 5085]: 40 messages lost due to rate-limiting
>
> 2017-10-20T18:18:36.175797+00:00 node-68 rsyslogd-2177: imuxsock[pid
> 5085]: begin to drop messages due to rate-limiting
>
> 2017-10-20T18:18:37.583237+00:00 node-68 rsyslogd-2177: imuxsock[pid
> 5085]: 41 messages lost due to rate-limiting
>
> 2017-10-20T18:28:36.172090+00:00 node-68 rsyslogd-2177: imuxsock[pid
> 5085]: begin to drop messages due to rate-limiting
>
> 2017-10-20T18:28:37.125346+00:00 node-68 rsyslogd-2177: imuxsock[pid
> 5085]: 40 messages lost due to rate-limiting
>
>
>
> ps -aef | grep 5080
>
> ceilome+ 5080 3502 0 Oct03 ? 01:32:57 ceilometer-polling - AgentManager(0)
>
>
>
> 2017-10-20T18:35:10.759230+00:00 node-68 rsyslogd: [origin
> software="rsyslogd" swVersion="8.16.0" x-pid="3431" x-info="
> http://www.rsyslog.com"] exiting on signal 15.
>
> 2017-10-20T18:35:10.790611+00:00 node-68 rsyslogd: [origin
> software="rsyslogd" swVersion="8.16.0" x-pid="23851" x-info="
> http://www.rsyslog.com"] start
>
> 2017-10-20T18:35:10.790395+00:00 node-68 rsyslogd: rsyslogd's groupid
> changed to 108
>
> 2017-10-20T18:35:10.790455+00:00 node-68 rsyslogd: rsyslogd's userid
> changed to 104
>
> 2017-10-20T18:35:10.790491+00:00 node-68 rsyslogd-2357: queue "action 0
> queue": high water mark is set quite low at 8000. You should only set it
> below 60% (60) if you have a good reason for this. [v8.16.0 try
> http://www.rsyslog.com/e/2357 ]
>
>
>
> Test starts: Fri Oct 20 18:52:48 2017
>
>
>
> 2017-10-20T18:56:20.408532+00:00 node-68 kernel: [1458996.797708]
> [ cut here ]
>
> 2017-10-20T18:56:20.408571+00:00 node-68 kernel: [1458996.797728]
> WARNING: CPU: 27 PID: 0 at /build/linux-YyUNAI/linux-4.4.0/net/core/dev.c:2445
> skb_warn_bad_offload+0xd1/0x120()
>
> 2017-10-20T18:56:20.408574+00:00 node-68 kernel: [1458996.797732]
> qvofd385f05-cb: caps=(0x0184075b59e9, 0x) len=2636
> data_len=2594 gso_size=1480 gso_type=6 ip_summed=0
>
> 2017-10-20T18:56:20.408576+00:00 node-68 kernel: [1458996.797735] Modules
> linked in: bonding binfmt_misc nf_conntrack_netlink vhost_net vhost macvtap
> macvlan xt_mac xt_tcpudp xt_physdev br_netfilter xt_set ip_set_hash_net
> ip_set nfnetli

Re: [Openstack] soft lockup on Newton compute nodes

2017-10-23 Thread Jim Okken
2017-10-20T19:00:19.698100+00:00 node-90 kernel: [97583.653163]
[] ? generic_exec_single+0x85/0x120

2017-10-20T19:00:19.698101+00:00 node-90 kernel: [97583.653167]
[] ? be_eq_notify+0x60/0x70 [be2net]

2017-10-20T19:00:19.698101+00:00 node-90 kernel: [97583.653168]
[] __netif_receive_skb+0x18/0x60

2017-10-20T19:00:19.698102+00:00 node-90 kernel: [97583.653170]
[] process_backlog+0xa8/0x150

2017-10-20T19:00:19.698104+00:00 node-90 kernel: [97583.653171]
[] net_rx_action+0x21e/0x360

2017-10-20T19:00:19.698105+00:00 node-90 kernel: [97583.653173]
[] __do_softirq+0x101/0x290

2017-10-20T19:00:19.698106+00:00 node-90 kernel: [97583.653175]
[] do_softirq_own_stack+0x1c/0x30

2017-10-20T19:00:19.698107+00:00 node-90 kernel: [97583.653176]  
[] do_softirq.part.19+0x38/0x40

2017-10-20T19:00:19.698108+00:00 node-90 kernel: [97583.653179]
[] do_softirq+0x1d/0x20

2017-10-20T19:00:19.698110+00:00 node-90 kernel: [97583.653181]
[] netif_rx_ni+0x33/0x80

2017-10-20T19:00:19.698111+00:00 node-90 kernel: [97583.653184]
[] tun_get_user+0x506/0x880

2017-10-20T19:00:19.698112+00:00 node-90 kernel: [97583.653185]
[] tun_sendmsg+0x51/0x70

2017-10-20T19:00:19.698112+00:00 node-90 kernel: [97583.653188]
[] handle_tx+0x306/0x4e0 [vhost_net]

2017-10-20T19:00:19.698113+00:00 node-90 kernel: [97583.653190]
[] handle_tx_kick+0x15/0x20 [vhost_net]

2017-10-20T19:00:19.698113+00:00 node-90 kernel: [97583.653193]
[] vhost_worker+0xf3/0x190 [vhost]

2017-10-20T19:00:19.698115+00:00 node-90 kernel: [97583.653195]
[] ? vhost_poll_wakeup+0x30/0x30 [vhost]

2017-10-20T19:00:19.698116+00:00 node-90 kernel: [97583.653198]
[] kthread+0xe5/0x100

2017-10-20T19:00:19.698117+00:00 node-90 kernel: [97583.653199]
[] ? kthread_create_on_node+0x1e0/0x1e0

2017-10-20T19:00:19.698117+00:00 node-90 kernel: [97583.653203]
[] ret_from_fork+0x3f/0x70

2017-10-20T19:00:19.698118+00:00 node-90 kernel: [97583.653204]
[] ? kthread_create_on_node+0x1e0/0x1e0

2017-10-20T19:00:19.698123+00:00 node-90 kernel: [97583.653206] ---[ end
trace d7e73079b38e57b4 ]---
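[Editor's note: a hedged sketch of a common mitigation, not a confirmed fix for this lockup — the earlier skb_warn_bad_offload warning implicates GSO on the qvo/be2net path, so one experiment is disabling segmentation offloads on the uplink NIC. The interface name is an assumption; substitute your be2net interface from `ip link`. The commands are only echoed here for review.]

```shell
# Build the ethtool commands that would turn GSO/TSO/GRO off on IFACE.
IFACE="${IFACE:-eth0}"
CMDS=""
for feature in gso tso gro; do
  CMDS="$CMDS ethtool -K $IFACE $feature off;"
done
echo "$CMDS"   # review, then run with: eval "$CMDS" on the compute node
```
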





-- Jim

On Wed, Oct 18, 2017 at 11:37 PM, Jim Okken <j...@jokken.com> wrote:

> hi all,
>
> please help us out with an issue we are seeing on multiple compute nodes
> running Newton (Ubuntu 16.04.3 Kernel 4.4.0). After about 1 hour of running
> our VOIP test application the instances become non-responsive and can't be
> pinged, and neither can the compute nodes.
>
> messages appear on the compute node console screens. a screen shot of that
> is hosted here:
>
> http://www.jokken.com/downloads/console.png
>
> i'll try to attach it also.
>
> The first compute node this was seen on was running 2 instances, the
> second was running only 1 instance. They were using only a portion of the
> total 40 vCPUs available, and the load was moderate. Cold boot these nodes
> and all is well again, until we run our application for about 1 hour.
>
> please let us know what you think, thanks!
>
> not a lot is shown in DEBUG logging of Nova and Neutron on the compute node
>
> these logs are here:
>
> http://www.jokken.com/downloads/logs.zip
>
> i'll try to attach them too.
>
> https://ask.openstack.org/en/question/110748/soft-lockup-
> on-newton-compute-nodes/
>
> /var/log/messages on the compute node shows many repeats of these messages:
>
> 2017-10-18T20:49:26.462309+00:00 node-58 kernel: [1297007.624935] Modules
> linked in: binfmt_misc nf_conntrack_netlink vhost_net vhost macvtap macvlan
> ip6table_raw xt_mac xt_tcpudp xt_physdev br_netfilter xt_set
> ip_set_hash_net ip_set nfnetlink veth ebtable_filter ebtables openvswitch
> ocfs2 quota_tree ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager
> ocfs2_stackglue configfs ip6table_filter ip6_tables xt_multiport
> xt_conntrack iptable_filter xt_comment xt_CT iptable_raw ip_tables x_tables
> xfs ipmi_ssif 8021q garp mrp intel_rapl x86_pkg_temp_thermal
> intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
> aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd
> serio_raw bridge stp llc sb_edac edac_core hpilo ioatdma lpc_ich shpchp dca
> ipmi_si 8250_fintek ipmi_msghandler acpi_power_meter mac_hid kvm_intel kvm
> irqbypass ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr
> iscsi_tcp libiscsi_tcp nf_conntrack_proto_gre nf_conntrack_ipv6
> nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack autofs4 raid10
> raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor
> raid6_pq libcrc32c raid1 raid0 multipath linear dm_round_robin ses
> enclosure uas usb_storage psmouse ahci lpfc be2iscsi libahci be2net
> iscsi_boot_sysfs libiscsi vxlan scsi_transport_fc ip6_udp_tunnel
> scsi_transport_iscsi udp_tunnel wmi fjes scsi_dh_emc scsi_dh_rdac
> scsi_dh_alua dm_multipath
>
> 2017-10-18T20:49:26.462311+00:00 node-58 kernel: [1297007.625008] 

Re: [Openstack] [Fuel] node name issue

2017-09-28 Thread Jim Okken
I ran  "fuel2 node update -H blade13 20" just to get out of the node-*
naming convention, as someone suggested



The deploy still names the node node-11 and provisioning fails.

digging a little more, I see it might have to do with the fuel-agent
cloud-init scripts.

in the cloud-init.log on the new node I see the node name being set to
node-11!



this isn't node-11. this was node-20, but I renamed it to blade13 with the
command "fuel2 node update -H blade13 20"



I also noted that after the cloud-init scripts ran at the end of the first
boot of the newly provisioned OS, in the Fuel GUI the FQDN field became
node-11.ourdomain.com (before it was bootstrap.ourdomain.com)

(in the same window Hostname still show as blade13)



But FQDN in the Fuel2 CLI output still shows  node-20.ourdomain.com!!!



fuel2 node show 20

| id            | 20                   |
| name          | Untitled (68:58)     |
| status        | ready                |
| os_platform   | ubuntu               |
| roles         | [u'compute']         |
| kernel_params | None                 |
| pending_roles | []                   |
| hostname      | node-20              |
| fqdn          | node-20.dialogic.com |
| platform_name | ProLiant BL460c Gen9 |











where can I find the cloud-init settings which are deployed to new nodes?

i guess this has something to do with this file:
/usr/share/fuel-agent/cloud-init-templates/cloud_config_ubuntu.jinja2

in that file I see

hostname: {{ common.hostname }}

fqdn: {{ common.fqdn }}



please help me with any info you might have, or let me know what populates
those 2 parts of the template?



Is there a database these values are all stored in on the fuel server?
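[Editor's note: a hedged sketch — Fuel's inventory lives in a PostgreSQL database named "nailgun" on the Fuel master, but the table and column names below are my assumption; inspect the schema with `\d` in psql before relying on them.]

```shell
# Query the node inventory that fuel2/the GUI read from.
SQL='SELECT id, hostname FROM nodes ORDER BY id;'
echo "$SQL"   # run it on the master with: sudo -u postgres psql nailgun -c "$SQL"
```
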



Thanks



--Jim

-- Jim

On Tue, Sep 26, 2017 at 12:00 PM, Jim Okken <j...@jokken.com> wrote:

> also I should add, I don't have the original hard drives in the system so
> it isn't because it is booting the old OS where these node names were set.
> this is definitely the newly installed OS being given the wrong hostname
>
>
>
> is there a database this is all kept in? maybe I could look around and
> find where these old node names are being saved?
>
> thanks!
>
> -- Jim
>
> On Mon, Sep 25, 2017 at 6:03 PM, Jim Okken <j...@jokken.com> wrote:
>
>> hi all,
>>
>> I am using Fuel 10.
>>
>> i have 2 nodes I am trying to deploy as compute nodes. at one time in the
>> past I was attempting to deploy them too. I assume back then their node
>> names were node-11 and node-20.
>>
>> they were never successfully deployed and now I've worked out their
>> hardware issues and am attempting to deploy them again. now Fuel has given
>> them the names node-80 and node-81.
>> (i may be at 80 in my node names but I only have 17 nodes so far)
>>
>> the deploy of these 2 nodes does not get past installing Ubuntu. The
>> nodes reboot after Ubuntu is installed and come up incorrectly as node-11
>> and node-20. After that Fuel sits for a long while and then gives an error
>> (pasted at the end of email). I assume the nodes come up with the wrong
>> name/ip/ssh-key and Fuel can't contact them.
>>
>> I'm a novice at using the fuel and fuel2 cli's but I've tried deleting
>> these nodes and removing from database. Then re-PXE boot the nodes and
>> start a fresh deploy just to have them named node11 and 20 again. Fuel cli
>> does show the correct host name for these nodes, but I've tried anyway to
>> (re)set the host name for these nodes with no effect.
>>
>> If I try to delete node-11 and node-20 I get this error
>> 404 Client Error: Not Found for url: http://10.20.243.1:8000/api/v1
>> /nodes/?ids=11 (NodeCollection not found)
>>
>> what can I do to get past this please?
>>
>>
>>
>> Errors from the Fuel Astute log:
>> 2017-09-25 21:06:28 ERROR [1565] Error running provisioning:
>> # ,
>> trace: ["/usr/share/gems/gems/astute-10.0.0/lib/astute/mclient.rb:178:in
>> `rescue in initialize_mclient'", "/usr/share/gems/gems/astute-1
>> 0.0.0/lib/astute/mclient.rb:161:in `initialize_mclient'",
>> "/usr/share/gems/gems/astute-10.0.0/lib/astute/mclient.rb:51:in
>> `initialize'", "/usr/share/gems/gems/astute-1
>> 0.0.0/lib/astute/nailgun_hooks.rb:421:in `new'",
>> "/usr/share/gems/gems/astute-10.0.0/lib/astute/nailgun_hooks.rb:421:in
>> `run_shell_without_check'", "/usr/share/gems/gems/astute-1
>> 0.0.0/lib/astute/nailgun_hooks.rb:449:in `update_node_status'",
>> "/usr/share/gems/gems/astute-10.0.0/lib/astute/nailgun_hooks.rb:313:in
>> `reboot_hook'", "/usr/share/gems/gems/astute-1
>> 0.0.0/lib/astute/nailg

Re: [Openstack] [Fuel] node name issue

2017-09-26 Thread Jim Okken
also I should add, I don't have the original hard drives in the system so it
isn't because it is booting the old OS where these node names were set.
this is definitely the newly installed OS being given the wrong hostname



is there a database this is all kept in? maybe I could look around and find
where these old node names are being saved?

thanks!

-- Jim

On Mon, Sep 25, 2017 at 6:03 PM, Jim Okken <j...@jokken.com> wrote:

> hi all,
>
> I am using Fuel 10.
>
> i have 2 nodes I am trying to deploy as compute nodes. at one time in the
> past I was attempting to deploy them too. I assume back then their node
> names were node-11 and node-20.
>
> they were never successfully deployed and now I've worked out their hardware
> issues and am attempting to deploy them again. now Fuel has given them the
> names node-80 and node-81.
> (i may be at 80 in my node names but I only have 17 nodes so far)
>
> the deploy of these 2 nodes does not get past installing Ubuntu. The nodes
> reboot after Ubuntu is installed and come up incorrectly as node-11 and
> node-20. After that Fuel sits for a long while and then gives an error
> (pasted at the end of email). I assume the nodes come up with the wrong
> name/ip/ssh-key and Fuel can't contact them.
>
> I'm a novice at using the fuel and fuel2 cli's but I've tried deleting
> these nodes and removing from database. Then re-PXE boot the nodes and
> start a fresh deploy just to have them named node11 and 20 again. Fuel cli
> does show the correct host name for these nodes, but I've tried anyway to
> (re)set the host name for these nodes with no effect.
>
> If I try to delete node-11 and node-20 I get this error
> 404 Client Error: Not Found for url: http://10.20.243.1:8000/api/
> v1/nodes/?ids=11 (NodeCollection not found)
>
> what can I do to get past this please?
>
>
>
> Errors from the Fuel Astute log:
> 2017-09-25 21:06:28 ERROR [1565] Error running provisioning:
> # ,
> trace: ["/usr/share/gems/gems/astute-10.0.0/lib/astute/mclient.rb:178:in
> `rescue in initialize_mclient'", "/usr/share/gems/gems/astute-
> 10.0.0/lib/astute/mclient.rb:161:in `initialize_mclient'",
> "/usr/share/gems/gems/astute-10.0.0/lib/astute/mclient.rb:51:in
> `initialize'", 
> "/usr/share/gems/gems/astute-10.0.0/lib/astute/nailgun_hooks.rb:421:in
> `new'", "/usr/share/gems/gems/astute-10.0.0/lib/astute/nailgun_hooks.rb:421:in
> `run_shell_without_check'", "/usr/share/gems/gems/astute-
> 10.0.0/lib/astute/nailgun_hooks.rb:449:in `update_node_status'",
> "/usr/share/gems/gems/astute-10.0.0/lib/astute/nailgun_hooks.rb:313:in
> `reboot_hook'", 
> "/usr/share/gems/gems/astute-10.0.0/lib/astute/nailgun_hooks.rb:38:in
> `block in process'", "/usr/share/gems/gems/astute-
> 10.0.0/lib/astute/nailgun_hooks.rb:26:in `each'",
> "/usr/share/gems/gems/astute-10.0.0/lib/astute/nailgun_hooks.rb:26:in
> `process'", 
> "/usr/share/gems/gems/astute-10.0.0/lib/astute/image_provision.rb:117:in
> `reboot'", "/usr/share/gems/gems/astute-10.0.0/lib/astute/provision.rb:273:in
> `soft_reboot'", 
> "/usr/share/gems/gems/astute-10.0.0/lib/astute/provision.rb:240:in
> `provision_piece'", "/usr/share/gems/gems/astute-
> 10.0.0/lib/astute/provision.rb:126:in `block (3 levels) in
> provision_and_watch_progress'", "/usr/share/gems/gems/astute-
> 10.0.0/lib/astute/provision.rb:309:in `call'",
> "/usr/share/gems/gems/astute-10.0.0/lib/astute/provision.rb:309:in
> `sleep_not_greater_than'", "/usr/share/gems/gems/astute-
> 10.0.0/lib/astute/provision.rb:120:in `block (2 levels) in
> provision_and_watch_progress'", "/usr/share/gems/gems/astute-
> 10.0.0/lib/astute/provision.rb:119:in `loop'",
> "/usr/share/gems/gems/astute-10.0.0/lib/astute/provision.rb:119:in `block
> in provision_and_watch_progress'", "/usr/share/gems/gems/astute-
> 10.0.0/lib/astute/provision.rb:118:in `catch'",
> "/usr/share/gems/gems/astute-10.0.0/lib/astute/provision.rb:118:in
> `provision_and_watch_progress'", "/usr/share/gems/gems/astute-
> 10.0.0/lib/astute/provision.rb:52:in `provision'",
> "/usr/share/gems/gems/astute-10.0.0/lib/astute/orchestrator.rb:109:in
> `provision'", 
> "/usr/share/gems/gems/astute-10.0.0/lib/astute/server/dispatcher.rb:46:in
> `provision'", 
> "/usr/share/gems/gems/astute-10.0.0/lib/astute/server/dispatcher.rb:37:in
> `image_provision'", 
> "/usr/share/gems/gems/astute-10.0.0/lib/astute/server/server.rb:172:in
> `dispatch_message'", "/usr/share/gems/gems/astute-
> 10.0.0/lib/astute/server/server.rb:131:in `block in dispatch'&

[Openstack] [Fuel] node name issue

2017-09-25 Thread Jim Okken
hi all,

I am using Fuel 10.

i have 2 nodes I am trying to deploy as compute nodes. at one time in the
past I was attempting to deploy them too. I assume back then their node
names were node-11 and node-20.

they were never successfully deployed and now I've worked out their hardware
issues and am attempting to deploy them again. now Fuel has given them the
names node-80 and node-81.
(i may be at 80 in my node names but I only have 17 nodes so far)

the deploy of these 2 nodes does not get past installing Ubuntu. The nodes
reboot after Ubuntu is installed and come up incorrectly as node-11 and
node-20. After that Fuel sits for a long while and then gives an error
(pasted at the end of email). I assume the nodes come up with the wrong
name/ip/ssh-key and Fuel can't contact them.

I'm a novice at using the fuel and fuel2 cli's but I've tried deleting
these nodes and removing from database. Then re-PXE boot the nodes and
start a fresh deploy just to have them named node11 and 20 again. Fuel cli
does show the correct host name for these nodes, but I've tried anyway to
(re)set the host name for these nodes with no effect.

If I try to delete node-11 and node-20 I get this error
404 Client Error: Not Found for url:
http://10.20.243.1:8000/api/v1/nodes/?ids=11 (NodeCollection not found)

what can I do to get past this please?



Errors from the Fuel Astute log:
2017-09-25 21:06:28 ERROR [1565] Error running provisioning:
# ,
trace: ["/usr/share/gems/gems/astute-10.0.0/lib/astute/mclient.rb:178:in
`rescue in initialize_mclient'",
"/usr/share/gems/gems/astute-10.0.0/lib/astute/mclient.rb:161:in
`initialize_mclient'",
"/usr/share/gems/gems/astute-10.0.0/lib/astute/mclient.rb:51:in
`initialize'",
"/usr/share/gems/gems/astute-10.0.0/lib/astute/nailgun_hooks.rb:421:in
`new'",
"/usr/share/gems/gems/astute-10.0.0/lib/astute/nailgun_hooks.rb:421:in
`run_shell_without_check'",
"/usr/share/gems/gems/astute-10.0.0/lib/astute/nailgun_hooks.rb:449:in
`update_node_status'",
"/usr/share/gems/gems/astute-10.0.0/lib/astute/nailgun_hooks.rb:313:in
`reboot_hook'",
"/usr/share/gems/gems/astute-10.0.0/lib/astute/nailgun_hooks.rb:38:in
`block in process'",
"/usr/share/gems/gems/astute-10.0.0/lib/astute/nailgun_hooks.rb:26:in
`each'",
"/usr/share/gems/gems/astute-10.0.0/lib/astute/nailgun_hooks.rb:26:in
`process'",
"/usr/share/gems/gems/astute-10.0.0/lib/astute/image_provision.rb:117:in
`reboot'",
"/usr/share/gems/gems/astute-10.0.0/lib/astute/provision.rb:273:in
`soft_reboot'",
"/usr/share/gems/gems/astute-10.0.0/lib/astute/provision.rb:240:in
`provision_piece'",
"/usr/share/gems/gems/astute-10.0.0/lib/astute/provision.rb:126:in `block
(3 levels) in provision_and_watch_progress'",
"/usr/share/gems/gems/astute-10.0.0/lib/astute/provision.rb:309:in `call'",
"/usr/share/gems/gems/astute-10.0.0/lib/astute/provision.rb:309:in
`sleep_not_greater_than'",
"/usr/share/gems/gems/astute-10.0.0/lib/astute/provision.rb:120:in `block
(2 levels) in provision_and_watch_progress'",
"/usr/share/gems/gems/astute-10.0.0/lib/astute/provision.rb:119:in `loop'",
"/usr/share/gems/gems/astute-10.0.0/lib/astute/provision.rb:119:in `block
in provision_and_watch_progress'",
"/usr/share/gems/gems/astute-10.0.0/lib/astute/provision.rb:118:in
`catch'",
"/usr/share/gems/gems/astute-10.0.0/lib/astute/provision.rb:118:in
`provision_and_watch_progress'",
"/usr/share/gems/gems/astute-10.0.0/lib/astute/provision.rb:52:in
`provision'",
"/usr/share/gems/gems/astute-10.0.0/lib/astute/orchestrator.rb:109:in
`provision'",
"/usr/share/gems/gems/astute-10.0.0/lib/astute/server/dispatcher.rb:46:in
`provision'",
"/usr/share/gems/gems/astute-10.0.0/lib/astute/server/dispatcher.rb:37:in
`image_provision'",
"/usr/share/gems/gems/astute-10.0.0/lib/astute/server/server.rb:172:in
`dispatch_message'",
"/usr/share/gems/gems/astute-10.0.0/lib/astute/server/server.rb:131:in
`block in dispatch'",
"/usr/share/gems/gems/astute-10.0.0/lib/astute/server/task_queue.rb:64:in
`call'",
"/usr/share/gems/gems/astute-10.0.0/lib/astute/server/task_queue.rb:64:in
`block in each'",
"/usr/share/gems/gems/astute-10.0.0/lib/astute/server/task_queue.rb:56:in
`each'",
"/usr/share/gems/gems/astute-10.0.0/lib/astute/server/task_queue.rb:56:in
`each'",
"/usr/share/gems/gems/astute-10.0.0/lib/astute/server/server.rb:128:in
`each_with_index'",
"/usr/share/gems/gems/astute-10.0.0/lib/astute/server/server.rb:128:in
`dispatch'",
"/usr/share/gems/gems/astute-10.0.0/lib/astute/server/server.rb:106:in
`block in perform_main_job'"]
2017-09-25 21:06:26 ERROR [1565] Error occured while provisioning:
# >
2017-09-25 21:06:26 ERROR [1565] No more retries for MCollective client
instantiation after exception:
["/usr/share/gems/gems/mcollective-client-2.8.4/lib/mcollective/rpc/client.rb:507:in
`discover'",
"/usr/share/gems/gems/astute-10.0.0/lib/astute/mclient.rb:167:in
`initialize_mclient'",
"/usr/share/gems/gems/astute-10.0.0/lib/astute/mclient.rb:51:in
`initialize'",

[Openstack] [Fuel] Danube Fuel 10 compute node base system partition size

2017-09-13 Thread Jim Okken
Hi all,




In Danube disk provisioning for a compute node, the smallest disk/partition
size for the base system is 54GB.



After I deploy a compute node I see 44GB free of the 54GB. So it seems
something smaller than 54GB can be used.



Can I somehow change the setting for the smallest disk/partition size to
something smaller so I can have Fuel deploy the base OS to a smaller drive?

I have 14 HP blades with an internal 32GB disk which I would prefer to use
for the base system.







See /dev/mapper/os-root:



Filesystem   Size  Used Avail Use%
Mounted on

udev  63G 0   63G   0%
/dev

tmpfs 13G   49M   13G   1%
/run

/dev/mapper/os-root   50G  3.0G   44G   7% /

tmpfs 63G 0   63G   0%
/dev/shm

tmpfs5.0M 0  5.0M   0%
/run/lock

tmpfs 63G 0   63G   0%
/sys/fs/cgroup

/dev/sda3196M   58M  129M  32%
/boot

/dev/mapper/vm-nova  318G   33M  318G   1% /var/lib/nova

cgmfs100K 0  100K   0%
/run/cgmanager/fs

/dev/mapper/3600c0ff0001ea00fa8a1b6590300-part1  280G  4.5G  275G   2%
/mnt/MSA_FC_Vol1

tmpfs 13G 0   13G   0%
/run/user/0




thanks!
___
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack


Re: [Openstack] [Fuel] storage question. (Fuel 10 Newton deploy with storage nodes)

2017-09-06 Thread Jim Okken
thanks for the help once again Eddie!

I'm sure you remember I have that fiber channel SAN configuration


This system has a 460GB disk mapped to it from the fiber channel SAN. As
far as I can tell this disk isn't much different to the OS than a local
SATA drive.
There is also a internal 32GB USB/Flash drive in this system which isn't
even shown in the Fuel 10 GUI

In the bootstrap OS I see:

ls /dev/disk/by-path:
pci-:00:14.0-usb-0:3.1:1.0-scsi-0:0:0:0
pci-:09:00.0-fc-0x247000c0ff25ce6d-lun-12
pci-:09:00.0-fc-0x207000c0ff25ce6d-lun-12

both those xxx-lun-12 devices are the same drive.


I also see one /dev/dm-X device
 lsblk /dev/dm-0
NAME  MAJ:MIN RM   SIZE RO TYPE
 MOUNTPOINT
3600c0ff0001ea00f5d1fa4590100 252:00 429.3G  0 mpath


there are 3 /dev/sdX devices

1.
(parted) select /dev/sda
Using /dev/sda
(parted) print
Model: HP iLO Internal SD-CARD (scsi)
Disk /dev/sdd: 32.1GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:
Number  Start  End  Size  Type  File system  Flags


2.
(parted) select /dev/sdb
Using /dev/sdb
(parted) print
Error: /dev/sdb: unrecognised disk label
Model: HP MSA 2040 SAN (scsi)
Disk /dev/sdb: 461GB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags:


3.
(parted) select /dev/sdc
Using /dev/sdc
(parted) print
Error: /dev/sdc: unrecognised disk label
Model: HP MSA 2040 SAN (scsi)
Disk /dev/sdc: 461GB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags:


dev sdb and sdc are the same disk.




I see this bug, but wouldn't know how to even start applying a patch if it
applies to my situation.
https://bugs.launchpad.net/fuel/+bug/1652788


thanks!

-- Jim

On Mon, Sep 4, 2017 at 2:34 AM, Eddie Yen <missile0...@gmail.com> wrote:

> Hi
>
> Can you describe your disk configuration and partitioning?
>
> 2017-09-02 4:57 GMT+08:00 Jim Okken <j...@jokken.com>:
>
>> Hi all,
>>
>>
>>
>> Can you offer any insight into this failure I get when deploying 2 compute
>> nodes using Fuel 10, please? (controller etc. nodes are all deployed/working)
>>
>>
>>
>> fuel_agent.cmd.agent PartitionNotFoundError: Partition
>> /dev/mapper/3600c0ff0001ea00f521fa4590100-part2 not found after
>> creation fuel_agent.cmd.agent [-] Partition 
>> /dev/mapper/3600c0ff0001ea00f521fa4590100-part2
>> not found after creation
>>
>>
>>
>>
>>
>> ls -al /dev/mapper
>>
>> 600c0ff0001ea00f521fa4590100 -> ../dm-0
>>
>> 600c0ff0001ea00f521fa4590100-part1 -> ../dm-1
>>
>> 600c0ff0001ea00f521fa4590100p2 -> ../dm-2
>>
>>
>>
>> Why the 2nd partition was created and actually named "...000p2" rather
>> than "...000-part2" is beyond me.
>>
>>
>>
>>  More logging if it helps, lots of failures:
>>
>>
>>
>> 2017-09-01 18:42:32ERRpuppet-user[3642]:  /bin/bash
>> "/etc/puppet/shell_manifests/provision_56_command.sh" returned 255
>> instead of one of [0]
>>
>> 2017-09-01 18:42:32NOTICE puppet-user[3642]:
>> (/Stage[main]/Main/Exec[provision_56_shell]/returns) Partition
>> /dev/mapper/3600c0ff0001ea00f5d1fa4590100-part2 not found after
>> creation
>>
>> 2017-09-01 18:42:32NOTICE puppet-user[3642]:
>> (/Stage[main]/Main/Exec[provision_56_shell]/returns) Unexpected error
>>
>> 2017-09-01 18:42:32NOTICE puppet-user[3642]:
>> (/Stage[main]/Main/Exec[provision_56_shell]/returns) /bin/bash: warning:
>> setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
>>
>> 2017-09-01 18:42:31WARNING   systemd-udevd[4982]:
>> Process '/sbin/kpartx -u -p -part /dev/dm-0' failed with exit code 1.
>>
>> 2017-09-01 18:42:31INFO  multipathd[1012]:  dm-3: remove map
>> (uevent)
>>
>> 2017-09-01 18:42:31WARNING   systemd-udevd[4964]:
>> Process '/usr/bin/partx -d --nr 1-1024 /dev/sdc' failed with exit code 1.
>>
>> 2017-09-01 18:42:31WARNING   systemd-udevd[4963]:
>> Process '/usr/bin/partx -d --nr 1-1024 /dev/sdb' failed with exit code 1.
>>
>> 2017-09-01 18:42:31ERRmultipath:  /dev/sda: can't store
>> path info
>>
>> 2017-09-01 18:42:30WARNING   systemd-udevd[4889]:
>> Process '/sbin/kpartx -u -p -part /dev/dm-0' failed with exit code 1.
>>
>> 2017-09-01 18:42:29INFO  multipathd[1012]:  dm-3: remove map
>> (uevent)
>>
>> 2017-09-01 18:42:29WARNING   systemd-udevd[4866]:
>> Process '/usr/bin/partx -d --nr 1-1024 /dev/

Re: [Openstack] [Fuel] storage question. (Fuel 10 Newton deploy with storage nodes)

2017-09-01 Thread Jim Okken
Hi all,



Can you offer any insight into this failure I get when deploying 2 compute
nodes using Fuel 10, please? (controller etc. nodes are all deployed/working)



fuel_agent.cmd.agent PartitionNotFoundError: Partition
/dev/mapper/3600c0ff0001ea00f521fa4590100-part2 not found after
creation fuel_agent.cmd.agent [-] Partition
/dev/mapper/3600c0ff0001ea00f521fa4590100-part2 not found after creation





ls -al /dev/mapper

600c0ff0001ea00f521fa4590100 -> ../dm-0

600c0ff0001ea00f521fa4590100-part1 -> ../dm-1

600c0ff0001ea00f521fa4590100p2 -> ../dm-2



Why the 2nd partition was created and actually named "...000p2" rather than
"...000-part2" is beyond me.
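[Editor's note: a hedged guess at the naming split, not a verified diagnosis — device-mapper partition links get a "-part<N>" suffix when kpartx runs with "-p -part" (as the udev rules in the log do) and a bare "p<N>" when the delimiter is defaulted, so mixed invocations during provisioning could yield the mismatched name fuel_agent can't find. A tiny classifier for the two styles seen above:]

```shell
# Classify a /dev/mapper partition name by its delimiter style.
classify() {
  case "$1" in
    *-part[0-9]*) echo "dash-part" ;;   # kpartx -p -part (udev rule style)
    *p[0-9]*)     echo "bare-p" ;;      # kpartx default "p" delimiter
  esac
}
A=$(classify 3600c0ff0001ea00f521fa4590100-part2)
B=$(classify 3600c0ff0001ea00f521fa4590100p2)
echo "$A $B"
```
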



 More logging if it helps, lots of failures:



2017-09-01 18:42:32ERRpuppet-user[3642]:  /bin/bash
"/etc/puppet/shell_manifests/provision_56_command.sh" returned 255 instead
of one of [0]

2017-09-01 18:42:32NOTICE puppet-user[3642]:
(/Stage[main]/Main/Exec[provision_56_shell]/returns) Partition
/dev/mapper/3600c0ff0001ea00f5d1fa4590100-part2 not found after creation

2017-09-01 18:42:32NOTICE puppet-user[3642]:
(/Stage[main]/Main/Exec[provision_56_shell]/returns) Unexpected error

2017-09-01 18:42:32NOTICE puppet-user[3642]:
(/Stage[main]/Main/Exec[provision_56_shell]/returns) /bin/bash: warning:
setlocale: LC_ALL: cannot change locale (en_US.UTF-8)

2017-09-01 18:42:31WARNING   systemd-udevd[4982]:  Process
'/sbin/kpartx -u -p -part /dev/dm-0' failed with exit code 1.

2017-09-01 18:42:31INFO  multipathd[1012]:  dm-3: remove map
(uevent)

2017-09-01 18:42:31WARNING   systemd-udevd[4964]:  Process
'/usr/bin/partx -d --nr 1-1024 /dev/sdc' failed with exit code 1.

2017-09-01 18:42:31WARNING   systemd-udevd[4963]:  Process
'/usr/bin/partx -d --nr 1-1024 /dev/sdb' failed with exit code 1.

2017-09-01 18:42:31ERRmultipath:  /dev/sda: can't store
path info

2017-09-01 18:42:30WARNING   systemd-udevd[4889]:  Process
'/sbin/kpartx -u -p -part /dev/dm-0' failed with exit code 1.

2017-09-01 18:42:29INFO  multipathd[1012]:  dm-3: remove map
(uevent)

2017-09-01 18:42:29WARNING   systemd-udevd[4866]:  Process
'/usr/bin/partx -d --nr 1-1024 /dev/sdb' failed with exit code 1.

2017-09-01 18:42:29WARNING   systemd-udevd[4867]:  Process
'/usr/bin/partx -d --nr 1-1024 /dev/sdc' failed with exit code 1.

2017-09-01 18:42:29ERRmultipath:  /dev/sda: can't store
path info

2017-09-01 18:42:28WARNING   systemd-udevd[4791]:  Process
'/sbin/kpartx -u -p -part /dev/dm-0' failed with exit code 1.

2017-09-01 18:42:28INFO  multipathd[1012]:  dm-3: remove map
(uevent)

2017-09-01 18:42:28WARNING   systemd-udevd[4773]:  Process
'/usr/bin/partx -d --nr 1-1024 /dev/sdb' failed with exit code 1.

2017-09-01 18:42:28WARNING   systemd-udevd[4774]:  Process
'/usr/bin/partx -d --nr 1-1024 /dev/sdc' failed with exit code 1.

2017-09-01 18:42:28ERRmultipath:  /dev/sda: can't store
path info

2017-09-01 18:42:28INFO  multipathd[1012]:  dm-2: remove map
(uevent)

2017-09-01 18:42:27WARNING   systemd-udevd[4655]:  Process
'/usr/bin/partx -d --nr 1-1024 /dev/sdc' failed with exit code 1.

2017-09-01 18:42:27WARNING   systemd-udevd[4654]:  Process
'/usr/bin/partx -d --nr 1-1024 /dev/sdb' failed with exit code 1.

2017-09-01 18:42:27ERRmultipath:  /dev/sda: can't store
path info

2017-09-01 18:42:26WARNING   systemd-udevd[4576]:  Process
'/usr/bin/partx -d --nr 1-1024 /dev/sdb' failed with exit code 1.

2017-09-01 18:42:26WARNING   systemd-udevd[4577]:  Process
'/usr/bin/partx -d --nr 1-1024 /dev/sdc' failed with exit code 1.

2017-09-01 18:42:26ERRmultipath:  /dev/sda: can't store
path info

2017-09-01 18:42:26INFO  multipathd[1012]:  dm-2: remove map
(uevent)

2017-09-01 18:42:25NOTICE nailgun-agent:  I,
[2017-09-01T18:42:21.541001 #3601]  INFO -- : Wrote data to file
'/etc/nailgun_uid'. Data: 56

2017-09-01 18:42:24WARNING   systemd-udevd[4114]:  Process
'/usr/bin/partx -d --nr 1-1024 /dev/sdb' failed with exit code 1.

2017-09-01 18:42:24WARNING   systemd-udevd[4115]:  Process
'/usr/bin/partx -d --nr 1-1024 /dev/sdc' failed with exit code 1.

2017-09-01 18:42:24ERRmultipath:  /dev/sda: can't store
path info

2017-09-01 18:42:24INFO  multipathd[1012]:  dm-2: remove map
(uevent)

2017-09-01 18:42:24NOTICE nailgun-agent:  I,
[2017-09-01T18:42:20.153616 #3601]  INFO -- : API URL is
https://10.20.243.1:8443/api

2017-09-01 18:42:24ERRmultipath:  /dev/sda: can't store
path info

2017-09-01 18:42:24WARNING   systemd-udevd[3965]:  Process
'/usr/bin/partx -d --nr 1-1024 /dev/sdc' failed 

Re: [Openstack] [Fuel] storage question. (Fuel 10 Newton deploy with storage nodes)

2017-08-25 Thread Jim Okken
thanks Mike for the info

yes I do want very fast VM provisioning and all the useful features that
come with having all 3 glance/cinder/ephemeral in CEPH on the storage node.

But I can't afford to have my vHD (either as a cinder volume or as an
ephemeral volume) over the network on the storage node.

Do any Fuel experts know exactly what the "Ceph RBD for ephemeral volumes
(Nova)" option in Fuel 10 does?
Does it move the running instances' vHD off the hypervisors and onto the
storage node? (aka: move ephemeral from local IO to network IO?)
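[Editor's note: a hedged sketch of what that option typically drives, not Fuel-specific documentation — in a generic Nova+Ceph setup, RBD-backed ephemeral storage means nova-compute's libvirt driver is pointed at an RBD pool, so instance vHDs live on the Ceph nodes and go over the network. The pool name "compute" is an assumption; check the deployed nova.conf on a compute node.]

```shell
# The nova.conf [libvirt] fragment that RBD ephemeral normally implies.
NOVA_LIBVIRT=$(cat <<'EOF'
[libvirt]
images_type = rbd
images_rbd_pool = compute
images_rbd_ceph_conf = /etc/ceph/ceph.conf
EOF
)
echo "$NOVA_LIBVIRT"
```

If the Fuel option behaves like this generic setup, then yes: enabling it moves ephemeral disks from local hypervisor disk to network IO against the Ceph cluster.
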

thanks!


-- Jim

On Fri, Aug 25, 2017 at 12:08 AM, Mike Smith <mism...@overstock.com> wrote:

> Ceph is basically a ‘swiss army knife of storage’.  It can play multiple
> roles in an Openstack deployment, which is one reason why it is so popular
> among this crowd.  It can be used as storage for:
>
> - nova ephemeral disks (Ceph RBD)
> - replacement for swift (Ceph Object)
> - cinder volume backend (Ceph RBD)
> - glance image backend (Ceph RBD)
> - gnocchi metrics storage (Ceph Object)
> - generic filesystem (CephFS)
>
> …and probably a few more that I’m missing.
>
> The combination of Ceph as backend for glance and nova ephemeral and/or
> cinder volumes is gorgeous because it’s an ‘instance clone’ of the glance
> image into the disk/volume which means very fast VM provisioning.  Some
> people boot instances off of nova ephemeral storage, some prefer to boot
> off of cinder volumes.  It depends if you want features like QoS (I/O
> limiting), snapshots, backup, and whether you want the data to be able to
> ‘persist’ as a volume after the VM that uses it is removed or if you want
> it to disappear when the VM is deleted (i.e. ‘ephemeral’)
>
> I’m not a ‘fuel/mirantis guy’ so I can’t tell you specifically what those
> options in their installer do, but generally Ceph storage is often housed
> on separate servers dedicated to Ceph regardless of how you want to use
> it.  Some people colocate Ceph onto their compute nodes and have them
> perform double duty (i.e. ‘hyperconverged’)
>
> Hopefully this gives you a little bit of information regarding how Ceph is
> used.
>
>
> Mike Smith
> Lead Cloud System Architect
> Overstock.com
>
>
>
> On Aug 24, 2017, at 9:22 PM, Jim Okken <j...@jokken.com> wrote:
>
> I've been learning a bit more about storage. Let me share what I think I
> know and ask a more specific question. Please correct me if I am off on
> what I think I know.
>
>
> Glance Images and Cinder Volumes are traditionally stored on the storage
> node. Ephemeral volumes (Nova managed, traditionally on the compute node)
> are the copy of the Glance image that has been copied to the compute node
> and booted as an instance's vHD. Cinder volumes can (among other things) be
> added to an instance as additional storage besides this Glance Image.
>
>
> In Fuel I set the "Ceph RBD for volumes (Cinder)" and "Ceph RBD for images
> (Glance)" settings, which will set up Glance and Cinder on the CEPH OSD
> storage nodes.
>
> But I am not sure what the setting "Ceph RBD for ephemeral volumes
> (Nova)" will do.
>
> Would selecting it move the running instances' vHDs off the hypervisors and
> onto the storage nodes? (aka: move ephemeral from local disk to over the
> network?)
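[Editor: in a typical Ceph-backed Newton deployment, an "RBD for ephemeral volumes" option boils down to pointing nova's libvirt driver at an RBD pool. The fragment below is a sketch of that; the pool name, user name, and secret UUID are placeholders:]

```ini
# nova.conf on the compute nodes (values illustrative)
[libvirt]
images_type = rbd
images_rbd_pool = compute
images_rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_user = compute
rbd_secret_uuid = <libvirt secret uuid>
```

With `images_type = rbd`, the instance's root disk is created in the Ceph pool (i.e. on the OSD/storage nodes) and accessed over the network, rather than as a file under /var/lib/nova/instances on the hypervisor's local disk.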
>
>
> Thanks
>
>
> --jim
>
>
>
> On Thu, Aug 24, 2017 at 12:14 PM, Jim Okken <j...@jokken.com> wrote:
>
>> Hi all,
>>
>>
>> We have a pretty complicated storage setup and I am not sure how to
>> configure Fuel for deployment of the storage nodes. I'm using Fuel
>> 10/Newton. Plus I'm a bit confused on some of the storage aspects
>> (image/glance, volume/cinder, ephemeral/?.)
>>
>>
>> We have 3 nodes dedicated to be storage nodes, for HA.
>>
>> We’re using fiber channel extents and need to use the CEPH filesystem.
>>
>>
>> I’ll try to simplify the storage situation at first to ask my initial
>> question without too many details.
>>
>>
>> We have a fast and a slow storage location. Management tells me they want
>> the slow location for the Glance images and the fast location for the place
>> where the instances actually run. (assume compute nodes with slow hard
>> drives but access to a fast fiber channel volume.)
>>
>>
>> Where is “the place where the instances actually run”? It isn’t via
>> Glance or Cinder, is it?
>>
>> When I configure the storage for CEPH OSD node I see volume settings for
>> Base System, CEPH and CEPH journal. (I see my slow storage and my fast
>> storage disks).
>>
>>
>> When I configure the storage for a Compute node I see volume settings for
>> Base system and Virtual Storage. Is this Ephemeral storage? How does a
>> Virtual Storage volume here compare to the CEPH volume on the CEPH OSD?
>>
>> I have seen an openstack instance whose .xml file on the compute node
>> shows the vHD as a CEPH path (ie:
>> rbd:compute/f63e4d30-7706-40be-8eda-b74e91b9dac1_disk). Is this a CEPH
>> local to the compute node or CEPH on the storage node? (Is this Ephemeral
>> storage?)
>>
>> Thanks for any help you might have, I’m a bit confused
>>
>> thanks
>>
>> -- Jim

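[Editor: disk source strings of the form `rbd:<pool>/<image>` in a libvirt domain XML indicate an RBD-backed (network) disk. A tiny helper to pick such a string apart:]

```python
def parse_rbd_source(source: str) -> tuple:
    """Split a libvirt RBD disk source like
    'rbd:compute/<uuid>_disk' into (pool, image)."""
    if not source.startswith("rbd:"):
        raise ValueError("not an RBD-backed disk source")
    pool, _, image = source[len("rbd:"):].partition("/")
    return pool, image

# The pool name tells you where the disk really lives: 'compute'
# here is a pool on the Ceph cluster (the storage nodes), so the
# disk is *not* local to the hypervisor.
print(parse_rbd_source("rbd:compute/f63e4d30-7706-40be-8eda-b74e91b9dac1_disk"))
# -> ('compute', 'f63e4d30-7706-40be-8eda-b74e91b9dac1_disk')
```

So a vHD path like the one quoted above is RBD-backed ephemeral storage living on the Ceph/storage nodes, not anything local to the compute node.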
___
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack

