Re: [Openstack] Several questions about HOW SWIFT WORKS

2012-01-06 Thread John Dickinson
The best technical description of the ring was written by the person who had
the biggest role in writing it for Swift:
http://www.tlohg.com/p/building-consistent-hashing-ring.html

--John






Re: [Openstack] Several questions about HOW SWIFT WORKS

2012-01-06 Thread Alejandro Comisario

Thanks for the answers, John!
Below are a couple more questions.

On 01/03/2012 07:03 PM, John Dickinson wrote:

Answers inline.

On Jan 3, 2012, at 11:32 AM, Alejandro Comisario wrote:



So, let's get down to business.

# 1 We have the memcache service running on each proxy, and as far as we know,
memcache caches keystone tokens and object paths as requests (PUT, GET) enter
the proxy. But, for example, if we restart one proxy server so its memcached
is empty, will the restarted proxy node go to a neighbor's memcache on the
next request, look up what it needs, and cache the answer on itself so that
the next query is resolved locally?


Memcache works as a distributed lookup. So the keys that were stored on the 
server that was restarted are no longer cached. The proxies share a memcache 
pool (at least in the example proxy config), so requests are fetched from that 
pool. Since the keys are balanced across the entire memcache pool, roughly 1/N 
memcache requests will be local (where N == the number of proxy servers).
That's clear. So the proxy that was restarted does its lookups against the
memcache pool, but the memcache on this proxy itself is empty; when a new key
needs to be cached, is the empty memcached preferred for the next cache?
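A minimal sketch of how a client might map keys onto a shared memcache pool;
the host names and the simple hash-mod scheme below are assumptions for the
example, not Swift's actual client code:

    import hashlib

    # Illustrative pool: one memcached per proxy node (hypothetical hosts).
    MEMCACHE_POOL = ['proxy1:11211', 'proxy2:11211']

    def server_for_key(key):
        # Every proxy hashes a key the same way against the same pool, so a
        # given key always lands on the same server. With N servers, roughly
        # 1/N of a proxy's lookups happen to hit its local memcached.
        digest = hashlib.md5(key.encode('utf-8')).hexdigest()
        return MEMCACHE_POOL[int(digest, 16) % len(MEMCACHE_POOL)]

    print(server_for_key('AUTH_test/container/object'))

In a scheme like this the choice of server depends only on the key, never on
how full or empty any one cache is; a restarted node simply refills with
whatever share of new keys happens to hash to it.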




# 2 The documentation says: "For each request, it will look up the location of
the account, container, or object in the ring (see below) and route the
request accordingly". In what way does the proxy actually do the look-up of
WHERE an object / container lives in the cluster? Does it connect to some
datanode asking for the object's location? Does the proxy keep any locally
stored data?


The proxy does not store any data locally (not even to buffer reads or writes). 
The proxy uses the ring to determine how to handle the read or write. The ring 
is a mapping of the storage volumes that, given an account, container, and 
object, provides the final location of where the data is to be stored. The 
proxy then uses this information to either read or write the object.
What actually IS the ring? Speaking at a lower level, is "using the ring" a
matter of looking up the ports opened on the dataNodes by the object, account,
and container services? I just want to understand what the ring really is and
how the proxies use it.
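A rough sketch of the kind of lookup the ring supports, assuming an
illustrative partition power and an already-built partition-to-devices table
(the real ring also mixes a cluster-wide hash suffix into the MD5 and tracks
zones and weights):

    import hashlib
    import struct

    PART_POWER = 16  # illustrative; 2**16 = 65,536 partitions

    # Stand-in for the ring's precomputed table: partition -> the devices
    # (node IP, port, disk) that hold that partition's replicas.
    part_to_devices = {}

    def get_partition(account, container=None, obj=None):
        # Hash the storage path and keep the top PART_POWER bits. This is a
        # purely local computation: no network call, no database query.
        path = '/' + '/'.join(p for p in (account, container, obj) if p)
        digest = hashlib.md5(path.encode('utf-8')).digest()
        return struct.unpack('>I', digest[:4])[0] >> (32 - PART_POWER)

    part = get_partition('AUTH_test', 'photos', 'cat.jpg')
    replicas = part_to_devices.get(part, [])  # who to read from / write to

In other words, the ring is a static lookup table shipped to every node, and
"using the ring" is just hashing a name and indexing into that table.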





# 3 Maybe this has to do with the previous question, but: does every dataNode
know everything that is stored on the cluster (container service), or does it
only know the objects it holds itself, plus the replicas of its objects?


Things are stored in swift deterministically, so data nodes don't know where
everything is stored, but they know how to find where it should be stored
(i.e. the ring).
Again, how do the dataNodes use THE RING to know where an object or a
container should be? Maybe you can give me a very technical answer, like: "we
first connect to this port for this service and do a lookup", or "we look up
our local SQLite database", etc.
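As a hedged illustration of that determinism, here is roughly how an object
server could derive an on-disk location from nothing but the name and the
partition (simplified from Swift's actual layout, which also salts the hash
and stores a timestamped file inside this directory):

    import hashlib
    import os

    def object_dir(devices_root, device, partition, account, container, obj):
        # The directory follows from the name hash and partition alone, so
        # no node needs a global index of what the cluster stores.
        name_hash = hashlib.md5(
            ('/%s/%s/%s' % (account, container, obj)).encode('utf-8')
        ).hexdigest()
        return os.path.join(devices_root, device, 'objects',
                            str(partition), name_hash[-3:], name_hash)

    print(object_dir('/srv/node', 'sdb1', 12345,
                     'AUTH_test', 'photos', 'cat.jpg'))

Container listings are a separate mechanism: each container server keeps an
SQLite database listing only the objects of the containers it hosts.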





# 4 We are building a production cluster of 24 datanodes with 6 drives each
(144 drives to start). We know that a good default number of partitions per
drive is 100, so the math for creating the ring would be (24 nodes * 6 drives
* 100 partitions), but we also know that by the end of the year the number of
datanodes (and drives) could be 2x or 3x that. So, for the initial setup, can
we build the RING with our 144 drives and 100 partitions per drive and then
modify the ring / partitions later and rebalance? Or is it safer to plan for
the future infrastructure growth and build the ring with those numbers in
mind?


Your partition power should take into account the largest size your cluster can 
be. You cannot change the partition power after you deploy the ring unless you 
migrate everything in your cluster (a manual process of GET from the old ring 
and PUT to the new ring), so it is important to select the proper partition 
power up front.
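A back-of-the-envelope example with the numbers from the question, assuming
the cluster might triple in size:

    import math

    drives_now = 24 * 6           # 144 drives today
    growth = 3                    # assume up to 3x more drives later
    parts_per_drive = 100         # target at full size

    needed = drives_now * growth * parts_per_drive    # 43,200 partitions
    part_power = int(math.ceil(math.log(needed, 2)))  # -> 16
    print(part_power, 2 ** part_power)       # 16, i.e. 65,536 partitions
    print((2 ** part_power) // drives_now)   # ~455 per drive on day one

Having more partitions per drive than the target early on is the cheap side
of this trade-off; having too few at full size is the situation that cannot
be fixed without the GET/PUT migration John describes.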

That's very clear!





# 5 When we put a new object into the cluster, the proxy decides where to
write the object (is it in a round-robin manner?). Does the proxy server give
a "Created" response when the 1st replica is actually written and recorded in
the account and container SQLite databases? Or is there an OK only once the
OBJECT service has actually written the data to disk?


The proxy sends the write to 3 object servers. The object servers write to disk 
and then send a request to the container servers to update the container 
listing. The object servers then return success to the proxy. After 2 object 
servers have returned success, the proxy can return success to the client.
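A toy sketch of that quorum behaviour, with write_replica as a hypothetical
stand-in for the object-server PUT described above:

    import random

    REPLICAS = 3
    QUORUM = REPLICAS // 2 + 1   # a majority: 2 of 3

    def write_replica(node, data):
        # Stand-in: the real object server writes to disk, triggers the
        # container-listing update, then reports success to the proxy.
        if random.random() < 0.1:                  # simulate a failed node
            raise IOError('%s unreachable' % node)

    def put_object(nodes, data):
        successes = 0
        for node in nodes:
            try:
                write_replica(node, data)
                successes += 1
            except IOError:
                continue  # a missed replica is healed later by replication
        # The proxy answers the client once a majority has succeeded.
        return '201 Created' if successes >= QUORUM else '503 Service Unavailable'

    print(put_object(['node1', 'node2', 'node3'], b'...'))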

Perfectly understood!





Hope we can shed some light on these doubts.


There are obviously some details I've glossed over in the short answers above. 
Much of the complexity in swift comes from failure scenarios. Please ask if you 
need more detail.


--John




Re: [Openstack] Several questions about HOW SWIFT WORKS

2012-01-04 Thread Chmouel Boudjnah
On Tue, Jan 3, 2012 at 6:32 PM, Alejandro Comisario
 wrote:
> # 1 we have memcache service running on each proxy, so as far as we know,
> memcache actually caches keystone tokens and object paths as the request (

W.r.t. the keystone token caching: in trunk, the swift/keystone middleware
will use the configuration from keystone.conf (via the keystone auth_token
middleware) and not from Swift's proxy-server.conf. It supports multiple
memcached servers as described by John, but you probably want to make sure
the configuration matches between the two.
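For example, the proxy side takes its pool from the cache filter in the
standard example config; whatever list the keystone auth_token side uses
should name the same servers (the IPs here are placeholders, and the exact
option name on the keystone side depends on your version, so treat that as
an assumption):

    [filter:cache]
    use = egg:swift#memcache
    memcache_servers = 10.0.0.1:11211,10.0.0.2:11211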

Chmouel.





[Openstack] Several questions about HOW SWIFT WORKS

2012-01-03 Thread Alejandro Comisario

Hi everyone!

Since we have been using Swift for some time now, we would like to understand
in some depth how a few things actually work in Swift.


Imagine that the setup all these doubts refer to is as follows:
+ 2 proxyNodes
+ 10 dataNodes ( 5 zones )

So, let's get down to business.

# 1 We have the memcache service running on each proxy, and as far as we
know, memcache caches keystone tokens and object paths as requests (PUT, GET)
enter the proxy. But, for example, if we restart one proxy server so its
memcached is empty, will the restarted proxy node go to a neighbor's memcache
on the next request, look up what it needs, and cache the answer on itself so
that the next query is resolved locally?


# 2 The documentation says: "For each request, it will look up the location
of the account, container, or object in the ring (see below) and route the
request accordingly". In what way does the proxy actually do the look-up of
WHERE an object / container lives in the cluster? Does it connect to some
datanode asking for the object's location? Does the proxy keep any locally
stored data?


# 3 Maybe this has to do with the previous question, but: does every dataNode
know everything that is stored on the cluster (container service), or does it
only know the objects it holds itself, plus the replicas of its objects?


# 4 We are building a production cluster of 24 datanodes with 6 drives each
(144 drives to start). We know that a good default number of partitions per
drive is 100, so the math for creating the ring would be (24 nodes * 6 drives
* 100 partitions), but we also know that by the end of the year the number of
datanodes (and drives) could be 2x or 3x that. So, for the initial setup, can
we build the RING with our 144 drives and 100 partitions per drive and then
modify the ring / partitions later and rebalance? Or is it safer to plan for
the future infrastructure growth and build the ring with those numbers in
mind?


# 5 When we put a new object into the cluster, the proxy decides where to
write the object (is it in a round-robin manner?). Does the proxy server give
a "Created" response when the 1st replica is actually written and recorded in
the account and container SQLite databases? Or is there an OK only once the
OBJECT service has actually written the data to disk?


Hope we can shed some light on these doubts.
Thanks!

Cheers.

--
Alex 