[Openstack] [Swift] Object replication...

2016-12-13 Thread Shyam Prasad N
Hi,

I have an openstack swift cluster with 2 nodes, and a replication count of
2.
So, theoretically, during a PUT request, both replicas are updated
synchronously, and only then does the request return success. Please correct
me if I'm wrong on this.

I have a script that periodically does a PUT to a small object with some
random data, and then immediately GETs the object. On some occasions, I'm
getting older data during the GET.

Is my expectation above correct? Or is there some other setting needed to
make the replication synchronous?
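
Roughly, the test looks like this (a sketch using the python-swiftclient CLI;
the container and object names are placeholders, and auth is assumed to come
from the usual OS_* environment variables):

    # write a small object with random data, then read it straight back
    head -c 1024 /dev/urandom > testobj
    swift upload testcontainer testobj
    swift download testcontainer testobj --output - | md5sum
    md5sum testobj   # compare with the local copy to spot a stale read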

-- 
-Shyam


[Openstack] instance's provider network ip can not be accessed from outside.

2016-12-13 Thread walterxj
Hi,


  I'm following the Newton install guide for CentOS 7
(http://docs.openstack.org/newton/install-guide-rdo/neutron.html). Everything
seems OK, but when I ping the VM's IP (on the provider network) from a node
(call it nodeA) on the provider physical network, it returns unreachable. Yet
nodeA can reach the provider network's DHCP and gateway IPs, and the VM can
reach the DHCP server, the gateway, and nodeA's IP.

  After a lot of research I found that the problem comes from the compute
node's iptables. There is an iptables chain for each bridge, with a rule like:

-A neutron-linuxbri-i7f605f37-f -m comment --comment "Send unmatched traffic to
the fallback chain." -j neutron-linuxbri-sg-fallback

When I delete this rule, the VM's provider network IP can be reached and
everything works. Is this a bug, or have I misconfigured something? Any advice
is appreciated!
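
For anyone hitting the same thing: the drop can be confirmed by watching the
packet counters on the fallback chain while pinging from nodeA (commands are
illustrative; the per-port chain name comes from the rule above):

    # on the compute node, while the ping is running
    iptables -L neutron-linuxbri-sg-fallback -n -v   # DROP counters increasing
                                                     # => the security group
                                                     # firewall is dropping it
    iptables -S | grep i7f605f37                     # rules for this port's chain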


walterxj



Re: [Openstack] instance's provider network ip can not be accessed from outside.

2016-12-13 Thread walterxj
Yep...
   I just found the security group issue too. My fault; thank you, and sorry
about that.


walterxj
From: Jorge Luiz Correa
Date: 2016-12-13 17:43
To: walterxj
CC: openstack
Subject: Re: [Openstack] instance's provider network ip can not be accessed from outside.

Hum, have you checked the security group rules? By default, all traffic can
go out from the VMs, but we need to create rules to let traffic in from the
outside to the VMs.

I'm just making a bet: maybe this iptables rule is the one that drops the
packets when there is no rule to pass them from outside to inside.

:)

- JLC

On Tue, Dec 13, 2016 at 6:55 AM, walterxj  wrote:

Hi,


  I'm following the Newton install guide for CentOS 7
(http://docs.openstack.org/newton/install-guide-rdo/neutron.html). Everything
seems OK, but when I ping the VM's IP (on the provider network) from a node
(call it nodeA) on the provider physical network, it returns unreachable. Yet
nodeA can reach the provider network's DHCP and gateway IPs, and the VM can
reach the DHCP server, the gateway, and nodeA's IP.

  After a lot of research I found that the problem comes from the compute
node's iptables. There is an iptables chain for each bridge, with a rule like:

-A neutron-linuxbri-i7f605f37-f -m comment --comment "Send unmatched traffic to
the fallback chain." -j neutron-linuxbri-sg-fallback

When I delete this rule, the VM's provider network IP can be reached and
everything works. Is this a bug, or have I misconfigured something? Any advice
is appreciated!


walterxj



Re: [Openstack] instance's provider network ip can not be accessed from outside.

2016-12-13 Thread Jorge Luiz Correa
Hum, have you checked the security group rules? By default, all traffic can
go out from the VMs, but we need to create rules to let traffic in from the
outside to the VMs.

I'm just making a bet: maybe this iptables rule is the one that drops the
packets when there is no rule to pass them from outside to inside.
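
For example, something along these lines opens ICMP and SSH from anywhere in
the default security group (illustrative; exact flags can vary a little
between client versions):

    openstack security group rule create --proto icmp default
    openstack security group rule create --proto tcp --dst-port 22 default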

:)

- JLC

On Tue, Dec 13, 2016 at 6:55 AM, walterxj  wrote:

> Hi,
>
> I'm following the Newton install guide for CentOS 7
> (http://docs.openstack.org/newton/install-guide-rdo/neutron.html). Everything
> seems OK, but when I ping the VM's IP (on the provider network) from a node
> (call it nodeA) on the provider physical network, it returns unreachable. Yet
> nodeA can reach the provider network's DHCP and gateway IPs, and the VM can
> reach the DHCP server, the gateway, and nodeA's IP.
> After a lot of research I found that the problem comes from the compute
> node's iptables. There is an iptables chain for each bridge, with a rule like:
> -A neutron-linuxbri-i7f605f37-f -m comment --comment "Send unmatched traffic
> to the fallback chain." -j neutron-linuxbri-sg-fallback
> When I delete this rule, the VM's provider network IP can be reached and
> everything works. Is this a bug, or have I misconfigured something? Any
> advice is appreciated!
>
> --
> walterxj
>


Re: [Openstack] [Swift] Object replication...

2016-12-13 Thread John Dickinson


On 13 Dec 2016, at 0:21, Shyam Prasad N wrote:

> Hi,
>
> I have an openstack swift cluster with 2 nodes, and a replication count of
> 2.
> So, theoretically, during a PUT request, both replicas are updated
> synchronously. Only then the request will return a success. Please correct
> me if I'm wrong on this.
>
> I have a script that periodically does a PUT to a small object with some
> random data, and then immediately GETs the object. On some occasions, I'm
> getting older data during the GET.
>
> Is my expectation above correct? Or is there some other setting needed to
> make the replication synchronous?


This is an interesting case of both Swift and your expectations being correct. 
But wait! How can that be when they seem to be at odds? Therein lies the fun* 
of Distributed Systems. Yay.

(*actually, not that much fun)

Ok, here's how it works. I'm assuming you have more than one hard drive on each 
of your two servers. When Swift gets the PUT request, the proxy will determine 
where the object data is supposed to be in the cluster. It does this via 
hashing and ring lookups (this is deterministic, but the details of that 
process aren't important here). The proxy will look for <replica count> places 
to put the data. In your case, this is 2. Because of the way the ring works, it 
will look for one drive on each of your two servers first. It will not put the 
data on two drives on one server. So in the Happy Path, the client makes a PUT 
request, the proxy sends the data to both replicas, and after both have been 
fsync'd, the client gets a 201 Created response. [1]

This is well and good, and the greatest part about it is that Swift can 
guarantee read-your-creates. That is, when you create a new object, you are 
immediately able to read it. However, what you describe is slightly different. 
You're overwriting an existing object, and sometimes you're getting back the 
older version of the object on a subsequent read. This is normal and expected. 
Read on for why.

The above process is the Happy Path for when there are no failures in the 
system. A failure could be a hardware failure, but it could also be some part 
of the system being overloaded. Spinning drives have very real physical limits 
to the amount of data they can read and write per unit time. An overloaded hard 
drive can cause a read or write request to time out, thus becoming a "failure" 
in the cluster.

So when you overwrite an object in Swift, the exact same process happens: the 
proxy finds the right locations, sends the data to all those locations, and 
returns a success if a quorum successfully fsync'd the data to disk.

However, what happens if there's a failure?

When the proxy determines the correct location for the object, it chooses what 
we call "primary" nodes. These are the canonical locations where the data is 
supposed to be right now. All the other drives in the cluster are called 
"handoff" nodes. For a given object, some nodes (<replica count> of them, to be 
exact) are primary nodes, and all the rest in the cluster are handoffs. For 
another object, a different set of nodes will be primary, and all the rest in 
the cluster are handoffs. This is the same regardless of how many replicas 
you're using or how many drives you have in the cluster.
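
(As an aside, you can see exactly which devices are primary and which are 
handoffs for any given object with the swift-get-nodes tool that ships with 
Swift; the account/container/object names below are just placeholders.)

    swift-get-nodes /etc/swift/object.ring.gz AUTH_test mycontainer myobject
    # prints the primary server/device locations first, then the handoff
    # locations, plus ready-made curl/ssh commands for poking at each copy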

So when there's a failure in the cluster and a write request comes in, what 
happens? Again, the proxy finds the primary nodes for the object and it tries 
to connect to them. However, if one (or more) can't be connected to, then the 
proxy will start trying to connect to handoff nodes. After the proxy gets 
<replica count> successful connections, it sends the data to those storage 
nodes, the data is fsync'd, and the client gets a successful response code 
(assuming at least a quorum were able to be fsync'd). Note that in your case 
with two replicas, if the primary nodes were extra busy (e.g. serving other 
requests) or actually failing (drives do that, pretty often, in fact), then the 
proxy will choose a handoff location to write the data. This means that even 
when the cluster has issues, your writes are still completely durably 
written.[2]

The read request path is very similar: primary nodes are chosen, one is 
selected at random, if the data is there, it's returned. If the data isn't 
there, the next primary is chosen, etc etc.

Ok, we're finally able to get down to answering your question.

Let's assume you have a busy drive in the cluster. You (over)write your object, 
the proxy looks up the primary nodes, sees that one is busy (i.e. gets a 
timeout), chooses a handoff location, writes the data, and you get a 201 
response. Since this is an overwrite, you've got an old version of the object 
on one primary, a new version on another primary, and a new version on a 
handoff node. Now you do the immediate GET. The proxy finds the primary nodes, 
randomly chooses one, and oh no! it chose the one with the old data. Since 
there's data there, that version of the object gets returned to the client, and 
you see the older data.

[Openstack] DVR ARP cache update loop delaying launch of metadata proxy

2016-12-13 Thread Gustavo Randich
Hi Openstackers,

We have the following issue (using Mitaka / DVR / Xenial); perhaps someone
can help ;)

When our hosts boot up, the ARP cache population loop of the L3 agent delays
the start of neutron-ns-metadata-proxy for around a minute (see logs below);
then, when nova-compute launches VMs, all of the cloud-init runs fail with a
timeout when reading metadata.

To work around this, we've made a systemd unit that nova-compute depends on;
the unit waits for the ns-metadata-proxy process to appear, and only then does
nova-compute start (see the sketch below).
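
The wait itself is just a small script along these lines (a sketch; the path
and the unit wiring are ours and purely illustrative), run by a oneshot unit
that nova-compute is ordered after:

    #!/bin/bash
    # /usr/local/bin/wait-for-metadata-proxy.sh (illustrative path)
    # block until a neutron-ns-metadata-proxy process shows up
    until pgrep -f neutron-ns-metadata-proxy > /dev/null; do
        sleep 1
    done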

Curiously, in dvr_local_router.py, in _update_arp_entry function, there is
a comment saying "# TODO(mrsmith): optimize the calls below for bulk
calls"...

By now we have a single virtual router with 170 VMs, but the number of VMs
will grow, so my questions are:

Should this be an issue of concern?

Is there a better / faster / bulk way to execute those "ip neigh" commands?

Or simply, metadata proxy should launch before ARP cache population?
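
On the bulk question: one thing that looks feasible is feeding all the entries
to a single "ip -batch" run inside the namespace instead of one rootwrap call
per neighbour (a sketch of the idea, not what the L3 agent currently does;
addresses taken from the log below):

    # build one batch file with all the neighbour entries...
    {
      echo "neigh replace 10.96.0.100 lladdr fa:16:3e:1b:d6:cd nud permanent dev qr-24f3070a-d4"
      echo "neigh replace 10.96.0.101 lladdr fa:16:3e:b4:12:28 nud permanent dev qr-24f3070a-d4"
    } > /tmp/arp-batch
    # ...and apply them all with one command in the router namespace
    ip netns exec qrouter-6149559f-fa54-493c-bf37-7d1827181228 ip -batch /tmp/arp-batch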




PS: I've also seen (obviously) this ARP cache population in the L3 agent of
the Neutron nodes, and I hope it does not affect / delay the HA failover
mechanism... (didn't test that yet)




# journalctl -u neutron-l3-agent | grep "COMMAND=/usr/bin/neutron-rootwrap
/etc/neutron/rootwrap.conf" | sed 's,neutron : TTY=unknown ;
PWD=/var/lib/neutron ; USER=root ; COMMAND=/usr/bin/neutron-rootwrap
/etc/neutron/rootwrap.conf,,g' | head -25

Dec 13 13:33:43 e71-host15 sudo[20157]:   ip netns exec
qrouter-6149559f-fa54-493c-bf37-7d1827181228 neutron-ns-metadata-proxy
--pid_file=/var/lib/neutron/external/pids/6149559f-fa54-493c-bf37-7d1827181228.pid
--metadata_proxy_socket=/var/
Dec 13 13:33:55 e71-host15 sudo[20309]:   ip -o netns list
Dec 13 13:33:55 e71-host15 sudo[20315]:   ip netns exec
qrouter-6149559f-fa54-493c-bf37-7d1827181228 sysctl -w net.ipv4.ip_forward=1
Dec 13 13:33:55 e71-host15 sudo[20322]:   ip netns exec
qrouter-6149559f-fa54-493c-bf37-7d1827181228 sysctl -w
net.ipv6.conf.all.forwarding=1
Dec 13 13:33:56 e71-host15 sudo[20331]:   ip netns exec
qrouter-6149559f-fa54-493c-bf37-7d1827181228 ip -o link show rfp-6149559f-f
Dec 13 13:33:56 e71-host15 sudo[20336]:   ip netns exec
qrouter-6149559f-fa54-493c-bf37-7d1827181228 ip -o link show qr-24f3070a-d4
Dec 13 13:33:56 e71-host15 sudo[20342]:   ip netns exec
qrouter-6149559f-fa54-493c-bf37-7d1827181228 ip -o link show qr-24f3070a-d4
Dec 13 13:33:56 e71-host15 sudo[20345]:   ip netns exec
qrouter-6149559f-fa54-493c-bf37-7d1827181228 ip addr show qr-24f3070a-d4
permanent
Dec 13 13:33:56 e71-host15 sudo[20348]:   ip netns exec
qrouter-6149559f-fa54-493c-bf37-7d1827181228 ip -4 route list dev
qr-24f3070a-d4 scope link
Dec 13 13:33:56 e71-host15 sudo[20354]:   ip netns exec
qrouter-6149559f-fa54-493c-bf37-7d1827181228 ip -6 route list dev
qr-24f3070a-d4 scope link
Dec 13 13:33:56 e71-host15 sudo[20357]:   ip netns exec
qrouter-6149559f-fa54-493c-bf37-7d1827181228 arping -A -I qr-24f3070a-d4 -c
3 -w 4.5 10.96.0.1
Dec 13 13:33:57 e71-host15 sudo[20368]:   ip netns exec
qrouter-6149559f-fa54-493c-bf37-7d1827181228 ip -o link show qr-24f3070a-d4
Dec 13 13:33:57 e71-host15 sudo[20372]:   ip netns exec
qrouter-6149559f-fa54-493c-bf37-7d1827181228 ip -4 neigh replace
10.96.0.100 lladdr fa:16:3e:1b:d6:cd nud permanent dev qr-24f3070a-d4
Dec 13 13:33:57 e71-host15 sudo[20375]:   ip netns exec
qrouter-6149559f-fa54-493c-bf37-7d1827181228 ip -o link show qr-24f3070a-d4
Dec 13 13:33:57 e71-host15 sudo[20378]:   ip netns exec
qrouter-6149559f-fa54-493c-bf37-7d1827181228 ip -4 neigh replace
10.96.0.101 lladdr fa:16:3e:b4:12:28 nud permanent dev qr-24f3070a-d4
Dec 13 13:33:58 e71-host15 sudo[20384]:   ip netns exec
qrouter-6149559f-fa54-493c-bf37-7d1827181228 ip -o link show qr-24f3070a-d4
Dec 13 13:33:58 e71-host15 sudo[20387]:   ip netns exec
qrouter-6149559f-fa54-493c-bf37-7d1827181228 ip -4 neigh replace
10.96.0.102 lladdr fa:16:3e:3f:bb:58 nud permanent dev qr-24f3070a-d4
Dec 13 13:33:58 e71-host15 sudo[20390]:   ip netns exec
qrouter-6149559f-fa54-493c-bf37-7d1827181228 ip -o link show qr-24f3070a-d4
Dec 13 13:33:58 e71-host15 sudo[20393]:   ip netns exec
qrouter-6149559f-fa54-493c-bf37-7d1827181228 ip -4 neigh replace
10.96.0.103 lladdr fa:16:3e:5a:90:67 nud permanent dev qr-24f3070a-d4
Dec 13 13:33:58 e71-host15 sudo[20399]:   ip netns exec
qrouter-6149559f-fa54-493c-bf37-7d1827181228 ip -o link show qr-24f3070a-d4
Dec 13 13:33:58 e71-host15 sudo[20402]:   ip netns exec
qrouter-6149559f-fa54-493c-bf37-7d1827181228 ip -4 neigh replace
10.96.0.104 lladdr fa:16:3e:ba:fc:f3 nud permanent dev qr-24f3070a-d4
Dec 13 13:33:58 e71-host15 sudo[20405]:   ip netns exec
qrouter-6149559f-fa54-493c-bf37-7d1827181228 ip -o link show qr-24f3070a-d4
Dec 13 13:33:59 e71-host15 sudo[20411]:   ip netns exec
qrouter-6149559f-fa54-493c-bf37-7d1827181228 ip -4 neigh replace
10.96.0.105 lladdr fa:16:3e:0a:16:d1 nud permanent dev qr-24f3070a-d4
...
...
...
# journalctl -u neutron-l3-agent | grep "COMMAND=/usr/bin/neutron-rootwrap
/etc/n

Re: [Openstack] [Swift] How to calculate rsync max connections

2016-12-13 Thread Mark Kirkwood

Any thoughts on this one guys?


On 06/12/16 09:56, Mark Kirkwood wrote:

Hi,

Is there a way to calculate rsync max connections (i.e. based on the number 
of hosts, devices, object server workers, etc.)?


Some context: we have recently gone live with a Swift 2.7.0 cluster (2 
regions, 6 hosts overall, each with 16 cores/32 threads and 4 drives). We 
have rsync max connections set to 25 (which I think is being set by 
puppet-swift and is the default value). We are seeing errors in the rsync 
log: 'max connections (25) reached'. I'd like to be able to say to the 
operations guys: set 'max connections to x, because <formula for calculating 
it>'.



Cheers


Mark




Re: [Openstack] [Swift] How to calculate rsync max connections

2016-12-13 Thread Clay Gerrard
I strongly prefer to configure rsyncd with a module per disk:

https://github.com/openstack/swift/blob/c0640f87107d84d262c20bdc1250b805ae8f9482/etc/rsyncd.conf-sample#L25

and then tune the per-disk connection limit to 2-4
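
i.e. something along the lines of the sample linked above (a sketch only;
device names, the path and the lock-file location are illustrative):

    [object_sda]
    max connections = 4
    path = /srv/node
    read only = false
    lock file = /var/lock/object_sda.lock

with the replicator's rsync_module option pointed at the per-device module
(e.g. something like {replication_ip}::object_{device}).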

There's not really a hard and fast rule; in some sense it's related to
replicator concurrency (the producer of the rsyncs that *want* those
connections) - targeting roughly:

   replicator_concurrency * #nodes ~== (max_connections * #nodes or
max_connections_per_module_per_disk * #disks)
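
As a purely illustrative back-of-the-envelope with made-up numbers (say 6
nodes, 4 disks each, and an assumed replicator concurrency of 4):

    nodes=6; disks_per_node=4; concurrency=4
    echo $(( concurrency * nodes ))                             # ~24 rsync connections cluster-wide
    echo $(( concurrency * nodes / (nodes * disks_per_node) ))  # ~1 per per-disk module on average

so a per-module cap of 2-4 sits a little above that average and should only
fire when several replicators gang up on the same disk at once.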

But max connections is about enforcing a sane limit; if you hit that limit
*a lot* it may be limiting throughput (and can lead to other
inefficiencies). But having a few max-connection errors fire now and again is
probably a good thing, since there's no guarantee that several replicator
coros across the cluster won't all decide they should be talking to
partitions that just so happen to have a replica on this one disk at
this one time. You generally don't want too many connections hammering a
single spindle because of the await it introduces for client requests that
might need to hit that disk (but there's auditor configuration management
as well, and the new ionice tuning options might help as well).

-Clay

On Tue, Dec 13, 2016 at 3:11 PM, Mark Kirkwood <
mark.kirkw...@catalyst.net.nz> wrote:

> Any thoughts on this one guys?
>
>
>
> On 06/12/16 09:56, Mark Kirkwood wrote:
>
>> Hi,
>>
>> Is there a way to calculate rsync max connections (i.e. based on the number
>> of hosts, devices, object server workers, etc.)?
>>
>> Some context: we have recently gone live with a Swift 2.7.0 cluster (2
>> regions, 6 hosts overall, each with 16 cores/32 threads and 4 drives). We
>> have rsync max connections set to 25 (which I think is being set by
>> puppet-swift and is the default value). We are seeing errors in the rsync
>> log: 'max connections (25) reached'. I'd like to be able to say to the
>> operations guys: set 'max connections to x, because <formula for
>> calculating it>'.
>>
>>
>> Cheers
>>
>>
>> Mark
>>
>>


Re: [Openstack] [Swift] How to calculate rsync max connections

2016-12-13 Thread Mark Kirkwood

Thanks Clay, very helpful!


On 14/12/16 13:06, Clay Gerrard wrote:

I strongly prefer to configure rsyncd with a module per disk:

https://github.com/openstack/swift/blob/c0640f87107d84d262c20bdc1250b805ae8f9482/etc/rsyncd.conf-sample#L25

and then tune the per-disk connection limit to 2-4

There's not really a hard and fast rule; in some sense it's related to 
replicator concurrency (the producer of the rsyncs that *want* those 
connections) - targeting roughly:

   replicator_concurrency * #nodes ~== (max_connections * #nodes or 
max_connections_per_module_per_disk * #disks)


But max connections is about enforcing a sane limit, if you hit that 
limit *a lot* it may be limiting throughput (and can lead to other 
inefficiencies).  But having a few max connections fire now and again 
is probably a good thing, since there's no guarantee that every 
replicator coro in the cluster might not be thinking it should be 
talking to a partition that just so happen to all have a replica on 
this one disk at this one time.  You generally don't want too many 
connections hammering a single spindle because of the await it 
introduces for client requests that might need to hit that disk (but 
there's auditor configuration management as well, and the new ionice 
options for tuning might be an option as well).


-Clay

On Tue, Dec 13, 2016 at 3:11 PM, Mark Kirkwood 
<mark.kirkw...@catalyst.net.nz> wrote:


Any thoughts on this one guys?



On 06/12/16 09:56, Mark Kirkwood wrote:

Hi,

Is there a way to calculate rsync max connections (i.e. based on
the number of hosts, devices, object server workers, etc.)?

Some context: we have recently gone live with a Swift 2.7.0
cluster (2 regions, 6 hosts overall, each with 16 cores/32
threads and 4 drives). We have rsync max connections set to 25
(which I think is being set by puppet-swift and is the default
value). We are seeing errors in the rsync log: 'max
connections (25) reached'. I'd like to be able to say to the
operations guys: set 'max connections to x, because <formula
for calculating it>'.


Cheers


Mark




[Openstack] devstack support for neutron sfc?

2016-12-13 Thread Michael Gale
Hello,


Does anyone know if neutron-sfc is working in devstack?

I started to follow the instructions here:
https://wiki.openstack.org/wiki/Neutron/ServiceInsertionAndChaining#Single_Host_networking-sfc_installation_steps_and_testbed_setup

I ended up with a working devstack instance using:
- Ubuntu 16.04
- newton stable for devstack and networking-sfc
- I set an environment variable to disable the OVS recompile, since 16.04
comes with OVS 2.5.0 and the recompile was failing during the build.


I could build VMs and networks, and I believe I set up an SFC implementation
correctly (port pairs, port pair groups, flow classifiers, etc.; roughly the
steps sketched below). I created a ServiceVM on the same internal network as
my source VM and used a neutron router to access the outside world. I tried
to route all outbound traffic on port 80 through my ServiceVM.
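
Roughly the chain setup I mean (a sketch with placeholder port IDs and names;
the exact flags follow the networking-sfc wiki linked above and may differ a
bit on newton):

    # port pair = the ServiceVM's ingress/egress neutron ports
    neutron port-pair-create --ingress SVC_PORT_IN --egress SVC_PORT_OUT pp1
    neutron port-pair-group-create --port-pair pp1 ppg1
    # classify outbound HTTP coming from the source VM's port
    neutron flow-classifier-create --protocol tcp --destination-port 80:80 \
        --logical-source-port SRC_VM_PORT fc1
    # chain them together
    neutron port-chain-create --port-pair-group ppg1 --flow-classifier fc1 pc1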

The issue I ran into was that my ServiceVM would only see the initial
outbound SYNs; after that, the return traffic and data packets would always
flow directly between the source VM and the external web server.

From the different test scenarios I ran, I could always see the initial
outbound SYN packets, but it always seems that the neutron router routes the
return packets back via the normal routing rules and ignores my SFC setup.

Michael


Re: [Openstack] [Swift] Object replication...

2016-12-13 Thread Shyam Prasad N
Thanks, John, for that excellent description. (Perhaps this should make it
into one of the FAQ pages.) :)

I didn't know that swift was an eventually consistent object storage
system. With the (replica_count/2)+1 synchronous PUT model, I always
thought swift was strictly consistent.

So going by what you're saying, even with a replica count of 3, the storage
system will still be eventually consistent, not strictly. I can only
increase my chances of getting consistent data by increasing the disk count
(so as to even out the load), but I cannot be absolutely certain.

Will I be able to somehow achieve a strict consistency model on a swift
cluster, then? Will reducing the replica count to 1 help? Will that ensure
that every overwrite updates the object location mapping? Or is there still a
chance of a handoff when the mapped location is busy, so that a later GET is
mapped to an older version?

Please note that I'm okay to sacrifice high availability, if that ensures
that the data is strictly consistent.

Regards,
Shyam


On Dec 13, 2016 22:46, "John Dickinson"  wrote:

On 13 Dec 2016, at 0:21, Shyam Prasad N wrote:

Hi,

I have an openstack swift cluster with 2 nodes, and a replication count of
2.
So, theoretically, during a PUT request, both replicas are updated
synchronously. Only then the request will return a success. Please correct
me if I'm wrong on this.

I have a script that periodically does a PUT to a small object with some
random data, and then immediately GETs the object. On some occasions, I'm
getting older data during the GET.

Is my expectation above correct? Or is there some other setting needed to
make the replication synchronous?

This is an interesting case of both Swift and your expectations being
correct. But wait! How can that be when they seem to be at odds? Therein
lies the fun* of Distributed Systems. Yay.

(*actually, not that much fun)

Ok, here's how it works. I'm assuming you have more than one hard drive on
each of your two servers. When Swift gets the PUT request, the proxy will
determine where the object data is supposed to be in the cluster. It does
this via hashing and ring lookups (this is deterministic, but the details
of that process aren't important here). The proxy will look for <replica
count> places to put the data. In your case, this is 2. Because of the way
the ring works, it will look for one drive on each of your two servers
first. It will not put the data on two drives on one server. So in the
Happy Path, the client makes a PUT request, the proxy sends the data to
both replicas, and after both have been fsync'd, the client gets a 201
Created response. [1]

This is well and good, and the greatest part about it is that Swift can
guarantee read-your-creates. That is, when you create a new object, you are
immediately able to read it. However, what you describe is slightly
different. You're overwriting an existing object, and sometimes you're
getting back the older version of the object on a subsequent read. This is
normal and expected. Read on for why.

The above process is the Happy Path for when there are no failures in the
system. A failure could be a hardware failure, but it could also be some
part of the system being overloaded. Spinning drives have very real
physical limits to the amount of data they can read and write per unit
time. An overloaded hard drive can cause a read or write request to time
out, thus becoming a "failure" in the cluster.

So when you overwrite an object in Swift, the exact same process happens:
the proxy finds the right locations, sends the data to all those locations,
and returns a success if a quorum successfully fsync'd the data to disk.

However, what happens if there's a failure?

When the proxy determines the correct location for the object, it chooses
what we call "primary" nodes. These are the canonical locations where the
data is supposed to be right now. All the other drives in the cluster are
called "handoff" nodes. For a given object, some nodes (<replica count> of
them, to be exact) are primary nodes, and all the rest in the cluster are
handoffs. For another object, a different set of nodes will be primary, and
all the rest in the cluster are handoffs. This is the same regardless of
how many replicas you're using or how many drives you have in the cluster.

So when there's a failure in the cluster and a write request comes in, what
happens? Again, the proxy finds the primary nodes for the object and it
tries to connect to them. However, if one (or more) can't be connected to,
then the proxy will start trying to connect to handoff nodes. After the
proxy gets <replica count> successful connections, it sends the data to
those storage nodes, the data is fsync'd, and the client gets a successful
response code (assuming at least a quorum were able to be fsync'd). Note
that in your case with two replicas, if the primary nodes were extra busy
(e.g. serving other requests) or actually failing (drives do that, pretty
often, in fact), then the proxy will choose a handoff location to write the data.

[Openstack] Upgrade from Mitaka to Newton. Nova 13 does not want to work with Nova 14

2016-12-13 Thread Evgeniy Ivanov

Hello!

I'm experiencing some kind of a bug when I have one controller on Mitaka and
one controller on Newton.
When I upgraded the first part of my cluster to Newton, the second part
(nova-api) stopped working.


I got this error:
oslo_service.service ServiceTooOld: This service is older (v9) than the 
minimum (v15) version of the rest of the deployment. Unable to continue.


So, as we can see, all those discussions, manuals, docs, and posts talking
about compatibility between two releases are wrong, aren't they?


I will be happy to read some thoughts from you guys, thanks!

How to reproduce:
Install nova 13 on 2 nodes
Disable the first node
Upgrade the first node to nova 14
Restart nova-api on the second node (13 ver.)

--
Best Regards,
Evgeniy Ivanov




Re: [Openstack] Upgrade from Mitaka to Newton. Nova 13 does not want to work with Nova 14

2016-12-13 Thread Evgeniy Ivanov

Sorry for the duplicate; please ignore this thread.

On 14/12/16 10:35, Evgeniy Ivanov wrote:

Hello!

I'm experiencing some kind of a bug when I have one controller on Mitaka and
one controller on Newton.
When I upgraded the first part of my cluster to Newton, the second part
(nova-api) stopped working.

I got this error:
oslo_service.service ServiceTooOld: This service is older (v9) than the
minimum (v15) version of the rest of the deployment. Unable to continue.

So, as we can see, all those discussions, manuals, docs, and posts talking
about compatibility between two releases are wrong, aren't they?

I will be happy to read some thoughts from you guys, thanks!

How to reproduce:
Install nova 13 on 2 nodes
Disable the first node
Upgrade the first node to nova 14
Restart nova-api on the second node (13 ver.)



--
Best Regards,
Evgeniy Ivanov




[Openstack] Upgrade Mitaka to Newton. Nova 13 does not work with Nova 14

2016-12-13 Thread Evgeniy Ivanov

Hello!

I'm experiencing some kind of a bug when I have one controller on Mitaka and
one controller on Newton.
When I upgraded the first part of my cluster to Newton, the second part
(nova-api) stopped working.


I got this error:
oslo_service.service ServiceTooOld: This service is older (v9) than the 
minimum (v15) version of the rest of the deployment. Unable to continue.


So, as we can see, all those discussions, manuals, docs, and posts talking
about compatibility between two releases are wrong, aren't they?


I will be happy to read some thoughts from you guys, thanks!

How to reproduce:
Install nova 13 on 2 nodes
Disable the first node
Upgrade the first node to nova 14
Restart nova-api on the second node (13 ver.)

--
Best Regards,
Evgeniy Ivanov

