Re: [ceph-users] help troubleshooting some osd communication problems
On Fri, Apr 29, 2016 at 9:34 AM, Mike Lovell wrote:
> On Fri, Apr 29, 2016 at 5:54 AM, Alexey Sheplyakov <asheplya...@mirantis.com> wrote:
>> Hi,
>>
>>> i also wonder if just taking 148 out of the cluster (probably just marking it out) would help
>>
>> As far as I understand this can only harm your data. The acting set of PG 17.73 is [41, 148], so after stopping/taking out OSD 148, OSD 41 will store the only copy of objects in PG 17.73 (so it won't accept writes any more).
>>
>>> since there are other osds in the up set (140 and 5)
>>
>> These OSDs are not in the acting set, they are missing (at least some of) the objects from PG 17.73, and are copying the missing objects from OSDs 41 and 148. Naturally this slows down or even blocks writes to PG 17.73.
>
> k. i didn't know if it could just use the members of the up set that are not in the acting set for completing writes. when thinking through it in my head it seemed reasonable, but i could also see pitfalls with doing it. that's why i was asking if it was possible.
>
>>> the only thing holding things together right now is a while loop doing a 'ceph osd down 41' every minute
>>
>> As far as I understand this disturbs the backfilling and further delays writes to that poor PG.
>
> it definitely does seem to have an impact similar to that. the only upside is that it clears the slow io messages, though i don't know if it actually lets the client io complete. recovery doesn't make any progress in between the down commands. it's not making any progress on its own anyways.

i went to check things this morning and noticed that the number of objects misplaced had dropped from what i was expecting, and was occasionally seeing lines from ceph -w saying a number of objects were recovering. the only PG in a state other than active+clean was the one that 41 and 148 were bickering about, so it looks like they were now passing traffic.

it appeared to start just after one of the osd down events that was happening in the loop i had running. a little while after the backfill started making progress, it completed. so it's fine now. i would still like to try and find out the cause since this has happened twice now, but at least it's not an emergency for me at the moment.

one other thing that was odd was that i saw the misplaced objects go negative during the backfill. this is one of the lines from ceph -w:

2016-04-29 10:38:15.011241 mon.0 [INF] pgmap v27055697: 6144 pgs: 6143 active+clean, 1 active+undersized+degraded+remapped+backfilling; 123 TB data, 372 TB used, 304 TB / 691 TB avail; 130 MB/s rd, 135 MB/s wr, 11210 op/s; 14547/93845634 objects degraded (0.016%); -13959/93845634 objects misplaced (-0.015%); 27358 kB/s, 7 objects/s recovering

it seemed to complete around the point where it got to -14.5k misplaced. i'm guessing this is just a reporting error, but i immediately started a deep-scrub on the pg just to make sure things are consistent.

mike
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
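the negative misplaced counter can be caught mechanically when watching ceph -w. a small sketch of doing that (the parsing pattern is an assumption based on the line format quoted above, not a ceph API):

```python
import re

# One of the "pgmap" status lines quoted above, abbreviated, with the
# impossible negative misplaced count.
LINE = ("pgmap v27055697: 6144 pgs: ...; "
        "14547/93845634 objects degraded (0.016%); "
        "-13959/93845634 objects misplaced (-0.015%)")

def parse_object_counters(line):
    """Pull "<count>/<total> objects <state>" counters out of a status
    line; a negative count signals a reporting bug, not real objects."""
    counters = {}
    for m in re.finditer(r"(-?\d+)/(\d+) objects (\w+)", line):
        counters[m.group(3)] = (int(m.group(1)), int(m.group(2)))
    return counters

for state, (count, total) in parse_object_counters(LINE).items():
    flag = "  <-- negative, suspicious" if count < 0 else ""
    print(f"{state}: {count}/{total}{flag}")
```

on the line above this flags only the misplaced counter, which matches the guess that it was a reporting error rather than real object movement.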
Re: [ceph-users] help troubleshooting some osd communication problems
On Fri, Apr 29, 2016 at 5:54 AM, Alexey Sheplyakov wrote:
> Hi,
>
>> i also wonder if just taking 148 out of the cluster (probably just marking it out) would help
>
> As far as I understand this can only harm your data. The acting set of PG 17.73 is [41, 148], so after stopping/taking out OSD 148, OSD 41 will store the only copy of objects in PG 17.73 (so it won't accept writes any more).
>
>> since there are other osds in the up set (140 and 5)
>
> These OSDs are not in the acting set, they are missing (at least some of) the objects from PG 17.73, and are copying the missing objects from OSDs 41 and 148. Naturally this slows down or even blocks writes to PG 17.73.

k. i didn't know if it could just use the members of the up set that are not in the acting set for completing writes. when thinking through it in my head it seemed reasonable, but i could also see pitfalls with doing it. that's why i was asking if it was possible.

>> the only thing holding things together right now is a while loop doing a 'ceph osd down 41' every minute
>
> As far as I understand this disturbs the backfilling and further delays writes to that poor PG.

it definitely does seem to have an impact similar to that. the only upside is that it clears the slow io messages, though i don't know if it actually lets the client io complete. recovery doesn't make any progress in between the down commands. it's not making any progress on its own anyways.

mike
Re: [ceph-users] help troubleshooting some osd communication problems
Hi,

> i also wonder if just taking 148 out of the cluster (probably just marking it out) would help

As far as I understand this can only harm your data. The acting set of PG 17.73 is [41, 148], so after stopping/taking out OSD 148, OSD 41 will store the only copy of objects in PG 17.73 (so it won't accept writes any more).

> since there are other osds in the up set (140 and 5)

These OSDs are not in the acting set, they are missing (at least some of) the objects from PG 17.73, and are copying the missing objects from OSDs 41 and 148. Naturally this slows down or even blocks writes to PG 17.73.

> the only thing holding things together right now is a while loop doing a 'ceph osd down 41' every minute

As far as I understand this disturbs the backfilling and further delays writes to that poor PG.

Best regards,
Alexey

On Fri, Apr 29, 2016 at 8:06 AM, Mike Lovell wrote:
> i attempted to grab some logs from the two osds in question with debug_ms and debug_osd at 20. i have looked through them a little bit, but digging through the logs at this verbosity is something i don't have much experience with. hopefully someone on the list can help make sense of it. the logs are at these urls.
>
> http://stuff.dev-zero.net/ceph-osd.148.debug.log.gz
> http://stuff.dev-zero.net/ceph-osd.41.debug.log.gz
> http://stuff.dev-zero.net/ceph.mon.log.gz
>
> the last one is a trimmed portion of the ceph.log from one of the monitors for the time frame the osd logs cover. to make these, i moved the existing log file, set the increased verbosity, had the osds reopen their log files, gave it a few minutes, moved the log files again, and had the osds reopen their logs a second time. this resulted in something that is hopefully just enough context to see what's going on.
>
> i did a 'ceph osd down 41' at about the 20:40:06 mark and the cluster seems to report normal data for the next 30 seconds. after that, the slow io messages from both osds about ops from each other start appearing. i tried tracing a few ops in both logs but couldn't make sense of it. can anyone help me with taking a look and/or pointers about how to understand what's going on?
>
> oh. this is 0.94.5. the basic cluster layout is two racks with 9 nodes in each rack with either 12 or 14 osds per node. ssd cache tiering is being used. the pools are just replicated ones with a size of 3. here is the data from pg dump for the pg that isn't making progress on recovery, which i'm guessing is a result of the same problem. the workload is a bunch of vms with rbd.
>
> pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
> 17.73 14545 0 14545 14547 0 62650182662 10023 10023 active+undersized+degraded+remapped+backfilling 2016-04-29 01:59:42.148644 161768'26740604 161768:37459478 [140,5,41] 140 [41,148] 41 55547'11246156 2015-09-15 08:53:32.724322 53282'7470580 2015-09-01 07:19:45.054261
>
> i also wonder if just taking 148 out of the cluster (probably just marking it out) would help. the min size is 2 but, since there are other osds in the up set (140 and 5), will the cluster keep working? or will it block until the PG has finished with recovery to the new osds?
>
> thanks in advance. hopefully someone can help soon because the only thing holding things together right now is a while loop doing a 'ceph osd down 41' every minute. :(
>
> mike
>
> On Thu, Apr 28, 2016 at 5:49 PM, Samuel Just wrote:
>> I'd guess that to make any progress we'll need debug ms = 20 on both sides of the connection when a message is lost.
>> -Sam
>>
>> On Thu, Apr 28, 2016 at 2:38 PM, Mike Lovell wrote:
>>> there was a problem on one of the clusters i manage a couple weeks ago where pairs of OSDs would wait indefinitely on subops from the other OSD in the pair. we used a liberal dose of "ceph osd down ##" on the osds and eventually things just sorted themselves out a couple days later.
>>>
>>> it seems to have come back today and co-workers and i are stuck on trying to figure out why this is happening. here are the details that i know. currently 2 OSDs, 41 and 148, keep waiting on subops from each other, resulting in lines such as the following in ceph.log.
>>>
>>> 2016-04-28 13:29:26.875797 osd.41 10.217.72.22:6802/3769 56283 : cluster [WRN] slow request 30.642736 seconds old, received at 2016-04-28 13:28:56.233001: osd_op(client.11172360.0:516946146 rbd_data.36bfe359c4998.0d08 [set-alloc-hint object_size 4194304 write_size 4194304,write 1835008~143360] 17.3df49873 RETRY=1 ack+ondisk+retry+write+redirected+known_if_redirected e159001) currently waiting for subops from 5,140,148
>>>
>>> 2016-04-28 13:29:28.031452 osd.148 10.217.72.11:6820/6487 25324 : cluster [WRN] slow request 30.960922 seconds old, rece
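the up-versus-acting distinction in this exchange can be summarized in a few lines. an illustrative helper (not ceph code) using the sets from the pg dump in the quoted message:

```python
def backfill_targets(up, acting):
    """OSDs that will hold the PG once recovery finishes but do not yet
    have a complete, writable copy: members of up that are not acting."""
    return [osd for osd in up if osd not in acting]

# PG 17.73 from the pg dump: up = [140, 5, 41], acting = [41, 148].
# Writes are served only by the acting set, so 140 and 5 cannot stand in
# for 148 until backfill completes and they join the acting set.
print(backfill_targets([140, 5, 41], [41, 148]))  # [140, 5]
```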
Re: [ceph-users] help troubleshooting some osd communication problems
i attempted to grab some logs from the two osds in question with debug_ms and debug_osd at 20. i have looked through them a little bit, but digging through the logs at this verbosity is something i don't have much experience with. hopefully someone on the list can help make sense of it. the logs are at these urls.

http://stuff.dev-zero.net/ceph-osd.148.debug.log.gz
http://stuff.dev-zero.net/ceph-osd.41.debug.log.gz
http://stuff.dev-zero.net/ceph.mon.log.gz

the last one is a trimmed portion of the ceph.log from one of the monitors for the time frame the osd logs cover. to make these, i moved the existing log file, set the increased verbosity, had the osds reopen their log files, gave it a few minutes, moved the log files again, and had the osds reopen their logs a second time. this resulted in something that is hopefully just enough context to see what's going on.

i did a 'ceph osd down 41' at about the 20:40:06 mark and the cluster seems to report normal data for the next 30 seconds. after that, the slow io messages from both osds about ops from each other start appearing. i tried tracing a few ops in both logs but couldn't make sense of it. can anyone help me with taking a look and/or pointers about how to understand what's going on?

oh. this is 0.94.5. the basic cluster layout is two racks with 9 nodes in each rack with either 12 or 14 osds per node. ssd cache tiering is being used. the pools are just replicated ones with a size of 3. here is the data from pg dump for the pg that isn't making progress on recovery, which i'm guessing is a result of the same problem. the workload is a bunch of vms with rbd.

pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
17.73 14545 0 14545 14547 0 62650182662 10023 10023 active+undersized+degraded+remapped+backfilling 2016-04-29 01:59:42.148644 161768'26740604 161768:37459478 [140,5,41] 140 [41,148] 41 55547'11246156 2015-09-15 08:53:32.724322 53282'7470580 2015-09-01 07:19:45.054261

i also wonder if just taking 148 out of the cluster (probably just marking it out) would help. the min size is 2 but, since there are other osds in the up set (140 and 5), will the cluster keep working? or will it block until the PG has finished with recovery to the new osds?

thanks in advance. hopefully someone can help soon because the only thing holding things together right now is a while loop doing a 'ceph osd down 41' every minute. :(

mike

On Thu, Apr 28, 2016 at 5:49 PM, Samuel Just wrote:
> I'd guess that to make any progress we'll need debug ms = 20 on both sides of the connection when a message is lost.
> -Sam
>
> On Thu, Apr 28, 2016 at 2:38 PM, Mike Lovell wrote:
>> there was a problem on one of the clusters i manage a couple weeks ago where pairs of OSDs would wait indefinitely on subops from the other OSD in the pair. we used a liberal dose of "ceph osd down ##" on the osds and eventually things just sorted themselves out a couple days later.
>>
>> it seems to have come back today and co-workers and i are stuck on trying to figure out why this is happening. here are the details that i know. currently 2 OSDs, 41 and 148, keep waiting on subops from each other, resulting in lines such as the following in ceph.log.
>>
>> 2016-04-28 13:29:26.875797 osd.41 10.217.72.22:6802/3769 56283 : cluster [WRN] slow request 30.642736 seconds old, received at 2016-04-28 13:28:56.233001: osd_op(client.11172360.0:516946146 rbd_data.36bfe359c4998.0d08 [set-alloc-hint object_size 4194304 write_size 4194304,write 1835008~143360] 17.3df49873 RETRY=1 ack+ondisk+retry+write+redirected+known_if_redirected e159001) currently waiting for subops from 5,140,148
>>
>> 2016-04-28 13:29:28.031452 osd.148 10.217.72.11:6820/6487 25324 : cluster [WRN] slow request 30.960922 seconds old, received at 2016-04-28 13:28:57.070471: osd_op(client.24127500.0:2960618 rbd_data.38178d8adeb4d.10f8 [set-alloc-hint object_size 8388608 write_size 8388608,write 3194880~4096] 17.fb41a37c RETRY=1 ack+ondisk+retry+write+redirected+known_if_redirected e159001) currently waiting for subops from 41,115
>>
>> from digging in the logs, it appears like some messages are being lost between the OSDs. this is what osd.41 sees:
>> -
>> 2016-04-28 13:28:56.233702 7f3b171e0700 1 -- 10.217.72.22:6802/3769 <== client.11172360 10.217.72.41:0/6031968 6 osd_op(client.11172360.0:516946146 rbd_data.36bfe359c4998.0d08 [set-alloc-hint object_size 4194304 write_size 4194304,write 1835008~143360] 17.3df49873 RETRY=1 ack+ondisk+retry+write+redirected+known_if_redirected e159001) v5 236+0+143360 (781016428 0 3953649960) 0x1d551c00 con 0x1a78d9c0
>> 2016-04-28 13:28:56.233983 7f3b49020700 1 -- 10.217.8
Re: [ceph-users] help troubleshooting some osd communication problems
I'd guess that to make any progress we'll need debug ms = 20 on both sides of the connection when a message is lost.
-Sam

On Thu, Apr 28, 2016 at 2:38 PM, Mike Lovell wrote:
> there was a problem on one of the clusters i manage a couple weeks ago where pairs of OSDs would wait indefinitely on subops from the other OSD in the pair. we used a liberal dose of "ceph osd down ##" on the osds and eventually things just sorted themselves out a couple days later.
>
> it seems to have come back today and co-workers and i are stuck on trying to figure out why this is happening. here are the details that i know. currently 2 OSDs, 41 and 148, keep waiting on subops from each other, resulting in lines such as the following in ceph.log.
>
> 2016-04-28 13:29:26.875797 osd.41 10.217.72.22:6802/3769 56283 : cluster [WRN] slow request 30.642736 seconds old, received at 2016-04-28 13:28:56.233001: osd_op(client.11172360.0:516946146 rbd_data.36bfe359c4998.0d08 [set-alloc-hint object_size 4194304 write_size 4194304,write 1835008~143360] 17.3df49873 RETRY=1 ack+ondisk+retry+write+redirected+known_if_redirected e159001) currently waiting for subops from 5,140,148
>
> 2016-04-28 13:29:28.031452 osd.148 10.217.72.11:6820/6487 25324 : cluster [WRN] slow request 30.960922 seconds old, received at 2016-04-28 13:28:57.070471: osd_op(client.24127500.0:2960618 rbd_data.38178d8adeb4d.10f8 [set-alloc-hint object_size 8388608 write_size 8388608,write 3194880~4096] 17.fb41a37c RETRY=1 ack+ondisk+retry+write+redirected+known_if_redirected e159001) currently waiting for subops from 41,115
>
> from digging in the logs, it appears like some messages are being lost between the OSDs. this is what osd.41 sees:
> -
> 2016-04-28 13:28:56.233702 7f3b171e0700 1 -- 10.217.72.22:6802/3769 <== client.11172360 10.217.72.41:0/6031968 6 osd_op(client.11172360.0:516946146 rbd_data.36bfe359c4998.0d08 [set-alloc-hint object_size 4194304 write_size 4194304,write 1835008~143360] 17.3df49873 RETRY=1 ack+ondisk+retry+write+redirected+known_if_redirected e159001) v5 236+0+143360 (781016428 0 3953649960) 0x1d551c00 con 0x1a78d9c0
> 2016-04-28 13:28:56.233983 7f3b49020700 1 -- 10.217.89.22:6825/313003769 --> 10.217.89.18:6806/1010441 -- osd_repop(client.11172360.0:516946146 17.73 3df49873/rbd_data.36bfe359c4998.0d08/head//17 v 159001'26722799) v1 -- ?+46 0x1d6db200 con 0x21add440
> 2016-04-28 13:28:56.234017 7f3b49020700 1 -- 10.217.89.22:6825/313003769 --> 10.217.89.11:6810/4543 -- osd_repop(client.11172360.0:516946146 17.73 3df49873/rbd_data.36bfe359c4998.0d08/head//17 v 159001'26722799) v1 -- ?+46 0x1d6dd000 con 0x21ada000
> 2016-04-28 13:28:56.234046 7f3b49020700 1 -- 10.217.89.22:6825/313003769 --> 10.217.89.11:6812/43006487 -- osd_repop(client.11172360.0:516946146 17.73 3df49873/rbd_data.36bfe359c4998.0d08/head//17 v 159001'26722799) v1 -- ?+144137 0x14becc00 con 0xf2cd4a0
> 2016-04-28 13:28:56.243555 7f3b35976700 1 -- 10.217.89.22:6825/313003769 <== osd.140 10.217.89.11:6810/4543 23 osd_repop_reply(client.11172360.0:516946146 17.73 ondisk, result = 0) v1 83+0+0 (494696391 0 0) 0x28ea7b00 con 0x21ada000
> 2016-04-28 13:28:56.257816 7f3b27d9b700 1 -- 10.217.89.22:6825/313003769 <== osd.5 10.217.89.18:6806/1010441 35 osd_repop_reply(client.11172360.0:516946146 17.73 ondisk, result = 0) v1 83+0+0 (2393425574 0 0) 0xfe82fc0 con 0x21add440
>
> this, however, is what osd.148 sees:
> -
> [ulhglive-root@ceph1 ~]# grep :516946146 /var/log/ceph/ceph-osd.148.log
> 2016-04-28 13:29:33.470156 7f195fcfc700 1 -- 10.217.72.11:6820/6487 <== client.11172360 10.217.72.41:0/6031968 460 osd_op(client.11172360.0:516946146 rbd_data.36bfe359c4998.0d08 [set-alloc-hint object_size 4194304 write_size 4194304,write 1835008~143360] 17.3df49873 RETRY=2 ack+ondisk+retry+write+redirected+known_if_redirected e159002) v5 236+0+143360 (129493315 0 3953649960) 0x1edf2300 con 0x24dc0d60
>
> also, due to the ceph osd down commands, there is recovery that needs to happen for a PG shared between these OSDs that is never making any progress. it's probably due to whatever is causing the repops to fail.
>
> i did some tcpdump on both sides, limiting things to the ip addresses and ports being used by these two OSDs, and see packets flowing between the two osds. i attempted to have wireshark decode the actual ceph traffic but it was only able to get bits and pieces of the ceph protocol, but at least for the moment i'm blaming that on the ceph dissector for wireshark. there aren't any dropped or error packets on any of the network interfaces involved.
>
> does anyone have any ideas of where to look next or other tips for this? we've put debug_ms and debug_osd at 1/1 to get the bits of inf
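a mutually blocked pair like 41/148 can be spotted by folding the slow-request warnings by who is waiting on whom. a rough sketch over lines shaped like the ceph.log excerpts above (the field layout is assumed from those excerpts):

```python
import re
from collections import defaultdict

# Two slow-request warnings, abbreviated from the ceph.log lines above.
LOG = """\
2016-04-28 13:29:26.875797 osd.41 10.217.72.22:6802/3769 56283 : cluster [WRN] slow request 30.642736 seconds old ... currently waiting for subops from 5,140,148
2016-04-28 13:29:28.031452 osd.148 10.217.72.11:6820/6487 25324 : cluster [WRN] slow request 30.960922 seconds old ... currently waiting for subops from 41,115
"""

def subop_waits(log_text):
    """Map each reporting OSD to the set of OSDs it is waiting on."""
    waits = defaultdict(set)
    pat = re.compile(r"osd\.(\d+) .*waiting for subops from ([\d,]+)")
    for m in pat.finditer(log_text):
        waits[int(m.group(1))].update(int(x) for x in m.group(2).split(","))
    return waits

waits = subop_waits(LOG)
# A pair that blocks on each other shows up in both directions.
mutual = sorted({tuple(sorted((a, b))) for a in list(waits) for b in waits[a]
                 if b in waits and a in waits[b]})
print(mutual)  # [(41, 148)]
```

run over a full ceph.log this gives candidate pairs to focus the debug_ms captures on, instead of eyeballing every warning.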
[ceph-users] help troubleshooting some osd communication problems
there was a problem on one of the clusters i manage a couple weeks ago where pairs of OSDs would wait indefinitely on subops from the other OSD in the pair. we used a liberal dose of "ceph osd down ##" on the osds and eventually things just sorted themselves out a couple days later.

it seems to have come back today and co-workers and i are stuck on trying to figure out why this is happening. here are the details that i know. currently 2 OSDs, 41 and 148, keep waiting on subops from each other, resulting in lines such as the following in ceph.log.

2016-04-28 13:29:26.875797 osd.41 10.217.72.22:6802/3769 56283 : cluster [WRN] slow request 30.642736 seconds old, received at 2016-04-28 13:28:56.233001: osd_op(client.11172360.0:516946146 rbd_data.36bfe359c4998.0d08 [set-alloc-hint object_size 4194304 write_size 4194304,write 1835008~143360] 17.3df49873 RETRY=1 ack+ondisk+retry+write+redirected+known_if_redirected e159001) currently waiting for subops from 5,140,148

2016-04-28 13:29:28.031452 osd.148 10.217.72.11:6820/6487 25324 : cluster [WRN] slow request 30.960922 seconds old, received at 2016-04-28 13:28:57.070471: osd_op(client.24127500.0:2960618 rbd_data.38178d8adeb4d.10f8 [set-alloc-hint object_size 8388608 write_size 8388608,write 3194880~4096] 17.fb41a37c RETRY=1 ack+ondisk+retry+write+redirected+known_if_redirected e159001) currently waiting for subops from 41,115

from digging in the logs, it appears like some messages are being lost between the OSDs. this is what osd.41 sees:
-
2016-04-28 13:28:56.233702 7f3b171e0700 1 -- 10.217.72.22:6802/3769 <== client.11172360 10.217.72.41:0/6031968 6 osd_op(client.11172360.0:516946146 rbd_data.36bfe359c4998.0d08 [set-alloc-hint object_size 4194304 write_size 4194304,write 1835008~143360] 17.3df49873 RETRY=1 ack+ondisk+retry+write+redirected+known_if_redirected e159001) v5 236+0+143360 (781016428 0 3953649960) 0x1d551c00 con 0x1a78d9c0
2016-04-28 13:28:56.233983 7f3b49020700 1 -- 10.217.89.22:6825/313003769 --> 10.217.89.18:6806/1010441 -- osd_repop(client.11172360.0:516946146 17.73 3df49873/rbd_data.36bfe359c4998.0d08/head//17 v 159001'26722799) v1 -- ?+46 0x1d6db200 con 0x21add440
2016-04-28 13:28:56.234017 7f3b49020700 1 -- 10.217.89.22:6825/313003769 --> 10.217.89.11:6810/4543 -- osd_repop(client.11172360.0:516946146 17.73 3df49873/rbd_data.36bfe359c4998.0d08/head//17 v 159001'26722799) v1 -- ?+46 0x1d6dd000 con 0x21ada000
2016-04-28 13:28:56.234046 7f3b49020700 1 -- 10.217.89.22:6825/313003769 --> 10.217.89.11:6812/43006487 -- osd_repop(client.11172360.0:516946146 17.73 3df49873/rbd_data.36bfe359c4998.0d08/head//17 v 159001'26722799) v1 -- ?+144137 0x14becc00 con 0xf2cd4a0
2016-04-28 13:28:56.243555 7f3b35976700 1 -- 10.217.89.22:6825/313003769 <== osd.140 10.217.89.11:6810/4543 23 osd_repop_reply(client.11172360.0:516946146 17.73 ondisk, result = 0) v1 83+0+0 (494696391 0 0) 0x28ea7b00 con 0x21ada000
2016-04-28 13:28:56.257816 7f3b27d9b700 1 -- 10.217.89.22:6825/313003769 <== osd.5 10.217.89.18:6806/1010441 35 osd_repop_reply(client.11172360.0:516946146 17.73 ondisk, result = 0) v1 83+0+0 (2393425574 0 0) 0xfe82fc0 con 0x21add440

this, however, is what osd.148 sees:
-
[ulhglive-root@ceph1 ~]# grep :516946146 /var/log/ceph/ceph-osd.148.log
2016-04-28 13:29:33.470156 7f195fcfc700 1 -- 10.217.72.11:6820/6487 <== client.11172360 10.217.72.41:0/6031968 460 osd_op(client.11172360.0:516946146 rbd_data.36bfe359c4998.0d08 [set-alloc-hint object_size 4194304 write_size 4194304,write 1835008~143360] 17.3df49873 RETRY=2 ack+ondisk+retry+write+redirected+known_if_redirected e159002) v5 236+0+143360 (129493315 0 3953649960) 0x1edf2300 con 0x24dc0d60

also, due to the ceph osd down commands, there is recovery that needs to happen for a PG shared between these OSDs that is never making any progress. it's probably due to whatever is causing the repops to fail.

i did some tcpdump on both sides, limiting things to the ip addresses and ports being used by these two OSDs, and see packets flowing between the two osds. i attempted to have wireshark decode the actual ceph traffic but it was only able to get bits and pieces of the ceph protocol, but at least for the moment i'm blaming that on the ceph dissector for wireshark. there aren't any dropped or error packets on any of the network interfaces involved.

does anyone have any ideas of where to look next or other tips for this? we've put debug_ms and debug_osd at 1/1 to get the bits of info mentioned. putting them at 20 probably isn't going to be helpful, so does anyone have a suggestion on another level to put it at that might be useful?

go figure that this would happen while i'm at the openstack summit and it would keep me from paying attention to some interesting presentations. thanks in advance for any help.

mike
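tracing a single op across both logs, as with the grep above, can be scripted so the RETRY count and epoch at each sighting line up side by side. a sketch (sample lines abbreviated from the excerpts above; the messenger field layout is an assumption, not a stable ceph format):

```python
import re

OP_ID = "client.11172360.0:516946146"

# Abbreviated messenger lines from the two OSD logs above.
OSD41_LOG = ("2016-04-28 13:28:56.233702 7f3b171e0700 1 -- 10.217.72.22:6802/3769 <== "
             "client.11172360 10.217.72.41:0/6031968 6 osd_op(client.11172360.0:516946146 "
             "... RETRY=1 ack+ondisk+retry+write+redirected+known_if_redirected e159001) v5")
OSD148_LOG = ("2016-04-28 13:29:33.470156 7f195fcfc700 1 -- 10.217.72.11:6820/6487 <== "
              "client.11172360 10.217.72.41:0/6031968 460 osd_op(client.11172360.0:516946146 "
              "... RETRY=2 ack+ondisk+retry+write+redirected+known_if_redirected e159002) v5")

def op_sightings(log_text, op_id):
    """Return (timestamp, retry, epoch) for each osd_op carrying op_id."""
    pat = re.compile(r"(\S+ \S+) .*osd_op\(" + re.escape(op_id)
                     + r".*RETRY=(\d+).* e(\d+)\)")
    return [(m.group(1), int(m.group(2)), int(m.group(3)))
            for m in pat.finditer(log_text)]

for name, log in (("osd.41", OSD41_LOG), ("osd.148", OSD148_LOG)):
    for ts, retry, epoch in op_sightings(log, OP_ID):
        print(name, ts, f"RETRY={retry}", f"epoch={epoch}")
```

on these two excerpts, osd.148 first logs the op roughly 37 seconds after osd.41, at RETRY=2 in a newer epoch, which is consistent with the original repop never arriving.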