Re: [Openstack] Several questions about HOW SWIFT WORKS
The best technical description of the ring was written by the person who had the biggest role in writing it for Swift: http://www.tlohg.com/p/building-consistent-hashing-ring.html

--John
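For background on why the ring is built the way it is: naive hash(key) % num_nodes placement reshuffles almost every key whenever a node is added, which is the problem a consistent hashing ring with a fixed partition count avoids. A minimal sketch (illustrative only, not Swift code) that demonstrates the problem:

    import hashlib

    def node_for(key, num_nodes):
        """Naive placement: hash the key and take it modulo the node count."""
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % num_nodes

    keys = ["/AUTH_test/container/obj%d" % i for i in range(10000)]

    # Placement with 10 nodes vs. 11 nodes.
    before = [node_for(k, 10) for k in keys]
    after = [node_for(k, 11) for k in keys]

    moved = sum(1 for b, a in zip(before, after) if b != a)
    print("%.1f%% of keys moved after adding one node" % (100.0 * moved / len(keys)))
    # With plain modulo hashing roughly 90% of keys land on a different
    # node after one node is added, which is what the ring design avoids.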
Re: [Openstack] Several questions about HOW SWIFT WORKS
Thanks for the answers, John! Below are a couple more questions.

On 01/03/2012 07:03 PM, John Dickinson wrote:
> Answers inline.
>
> On Jan 3, 2012, at 11:32 AM, Alejandro Comisario wrote:
>> So, let's get down to business.
>>
>> # 1 We have the memcache service running on each proxy, so as far as we know, memcache caches keystone tokens and object paths as requests (PUT, GET) enter the proxy. But, for example, if we restart one proxy server so its memcached is empty, does the restarted proxy node go to a neighbor's memcache on the next request, look up what it needs, and cache the answer locally so the next query is solved locally?
>
> Memcache works as a distributed lookup. So the keys that were stored on the server that was restarted are no longer cached. The proxies share a memcache pool (at least in the example proxy config), so requests are fetched from that pool. Since the keys are balanced across the entire memcache pool, roughly 1/N memcache requests will be local (where N == the number of proxy servers).

That's clear. So the proxy that was restarted does its lookups against the memcache pool, but the memcache on that proxy is empty; when a new key needs to be cached, is the empty memcached preferred for the next cache?

>> # 2 The documentation says "For each request, it will look up the location of the account, container, or object in the ring (see below) and route the request accordingly". In what way does the proxy actually do the lookup of WHERE an object / container is in the cluster? Does it connect to a datanode asking for an object's location? Does the proxy have any locally stored data?
>
> The proxy does not store any data locally (not even to buffer reads or writes). The proxy uses the ring to determine how to handle the read or write. The ring is a mapping of the storage volumes that, given an account, container, and object, provides the final location of where the data is to be stored. The proxy then uses this information to either read or write the object.

What is the RING, actually? Speaking on a lower level, does "using the ring" mean looking up the ports opened on the dataNodes by the object, account and container services? I just want to understand what the RING actually is and how the proxies use it.

>> # 3 Maybe it has to do with the previous question, but does every dataNode know everything that is stored on the cluster (container service), or does it only know the objects it has itself, and the replicas of its objects?
>
> Things are stored in swift deterministically, so data nodes don't know where everything is stored, but they know how to find where it should be stored (ie the ring).

Again, how do the dataNodes use THE RING to know where an object or a container should be? Maybe you can give me a very technical answer, like: we first connect to this port for this service and do a lookup, or we look up our local SQLite database, etc.
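To make John's description of the ring concrete, here is a minimal, illustrative sketch of the lookup (not Swift's actual code; the device table, ports, and partition power are invented for the example). The ring file is essentially this data: a device list plus a replica-to-partition-to-device table, distributed to every proxy and storage node, so a lookup is pure local computation with no network call to any datanode.

    import hashlib
    from array import array

    PART_POWER = 16  # example value; real clusters pick this at ring-build time

    # Hypothetical device table: each entry is one disk on a storage node.
    devices = [
        {"id": 0, "ip": "10.0.1.1", "port": 6000, "device": "sdb1"},
        {"id": 1, "ip": "10.0.1.2", "port": 6000, "device": "sdb1"},
        {"id": 2, "ip": "10.0.1.3", "port": 6000, "device": "sdb1"},
    ]

    # One row per replica: partition number -> device id.
    # A real ring builder assigns these so replicas land in different zones.
    replica2part2dev = [
        array("H", [(p + r) % len(devices) for p in range(2 ** PART_POWER)])
        for r in range(3)
    ]

    def get_nodes(account, container=None, obj=None):
        """Deterministically map a path to a partition and its replica devices."""
        path = "/" + "/".join(x for x in (account, container, obj) if x)
        # Swift hashes the path (plus a secret suffix) with md5 and keeps the
        # top PART_POWER bits of the first 4 bytes as the partition number.
        digest = hashlib.md5(path.encode("utf-8")).digest()
        part = int.from_bytes(digest[:4], "big") >> (32 - PART_POWER)
        return part, [devices[row[part]] for row in replica2part2dev]

    part, nodes = get_nodes("AUTH_test", "photos", "cat.jpg")
    print(part, [(n["ip"], n["port"], n["device"]) for n in nodes])

The same computation runs on the proxies and on the storage nodes (replicators, auditors, etc.), which is how every node can answer "where should this be?" without asking anyone.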
>> # 4 We are building a production cluster of 24 datanodes with 6 drives each (144 drives to start). We know that a good default is 100 partitions per drive, so the math for creating the ring would be (24 nodes * 6 drives * 100 partitions), but we know that by the end of the year the number of datanodes (and drives) could be 2x or 3x more. So, for the initial setup, can we build the RING with our 144 drives and 100 partitions per drive and then modify the ring / partitions later and rebalance? Or is it safer to plan for the future infrastructure increase and build the ring with those numbers in mind?
>
> Your partition power should take into account the largest size your cluster can be. You cannot change the partition power after you deploy the ring unless you migrate everything in your cluster (a manual process of GET from the old ring and PUT to the new ring), so it is important to select the proper partition power up front.

That's very clear!

>> # 5 We put a new object into the cluster and the proxy decides where to write the object (is it in a round-robin manner?). Does the proxy server give a "Created" response when the 1st replica is actually written and recorded in the account and container SQLite databases, or is there an OK only once the OBJECT service has actually written the data to disk?
>
> The proxy sends the write to 3 object servers. The object servers write to disk and then send a request to the container servers to update the container listing. The object servers then return success to the proxy. After 2 object servers have returned success, the proxy can return success to the client.

Perfectly understood!

>> Hope we can shed some light on these doubts.
>
> There are obviously some details I've glossed over in the short answers above. Much of the complexity in swift comes from failure scenarios. Please ask if you need more detail.
>
> --John
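A minimal sketch of the write path described above, with hypothetical helper names (put_to_object_server and proxy_put are made up for illustration; the real proxy streams the object to the replica nodes concurrently, and the container-listing update is issued by the object servers, not the proxy):

    import concurrent.futures

    REPLICAS = 3
    QUORUM = 2  # majority of 3; the proxy answers once this many writes succeed

    def put_to_object_server(node, obj_path, body):
        """Hypothetical stand-in for the HTTP PUT the proxy sends to one
        object server. The real object server writes the data to disk and,
        on success, tells the container server to add the object to its
        listing (an SQLite database)."""
        # ... perform the PUT, return True on 201 Created ...
        return True

    def proxy_put(nodes, obj_path, body):
        """Write to all replicas concurrently; succeed once a quorum succeeds."""
        successes = 0
        with concurrent.futures.ThreadPoolExecutor(max_workers=REPLICAS) as pool:
            futures = [pool.submit(put_to_object_server, n, obj_path, body)
                       for n in nodes[:REPLICAS]]
            for fut in concurrent.futures.as_completed(futures):
                if fut.result():
                    successes += 1
                if successes >= QUORUM:
                    return "201 Created"   # client sees success here
        return "503 Service Unavailable"   # fewer than 2 replicas were written

So no round-robin is involved: the target nodes come from the ring, and "Created" means at least two replicas are on disk, with the container database updates happening on the object servers' side.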
Re: [Openstack] Several questions about HOW SWIFT WORKS
On Tue, Jan 3, 2012 at 6:32 PM, Alejandro Comisario wrote:
> # 1 We have the memcache service running on each proxy, so as far as we know, memcache caches keystone tokens and object paths as the request (PUT, GET) enters the proxy [...]

W.r.t. the keystone token caching: in trunk, the swift/keystone middleware will use the configuration from keystone.conf (via the keystone auth_token middleware) and not from swift/proxy_server.conf. It supports multiple memcached servers, as described by John, but you probably want to make sure the configurations match between the two.

Chmouel.
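To illustrate why the two server lists need to match (and John's point that roughly 1/N of memcache requests end up local), here is a minimal sketch of how a memcache client picks the server that owns a key; the hashing scheme, addresses, and key are made up for the example and are not the exact scheme the real client library uses:

    import hashlib

    # The same pool must be listed in both the swift proxy's memcache config
    # and the keystone auth_token middleware config, or token lookups will
    # land on different servers and miss each other's cache entries.
    MEMCACHE_SERVERS = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]

    def server_for(key, servers=MEMCACHE_SERVERS):
        """Pick the memcache server that owns this key (illustrative hashing)."""
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return servers[int(digest, 16) % len(servers)]

    # Every proxy maps the same key to the same server, so a proxy whose own
    # memcached was just restarted still finds tokens cached elsewhere in the
    # pool; only the ~1/N of keys owned by the restarted server are lost.
    print(server_for("AUTH_/token/abc123"))

With this kind of scheme, an empty memcached is not "preferred" for new keys; it simply refills as the keys that hash to it get written again.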
Re: [Openstack] Several questions about HOW SWIFT WORKS
Answers inline.

On Jan 3, 2012, at 11:32 AM, Alejandro Comisario wrote:
>
> So, let's get down to business.
>
> # 1 We have the memcache service running on each proxy, so as far as we know, memcache caches keystone tokens and object paths as requests (PUT, GET) enter the proxy. But, for example, if we restart one proxy server so its memcached is empty, does the restarted proxy node go to a neighbor's memcache on the next request, look up what it needs, and cache the answer locally so the next query is solved locally?

Memcache works as a distributed lookup. So the keys that were stored on the server that was restarted are no longer cached. The proxies share a memcache pool (at least in the example proxy config), so requests are fetched from that pool. Since the keys are balanced across the entire memcache pool, roughly 1/N memcache requests will be local (where N == the number of proxy servers).

> # 2 The documentation says "For each request, it will look up the location of the account, container, or object in the ring (see below) and route the request accordingly". In what way does the proxy actually do the lookup of WHERE an object / container is in the cluster? Does it connect to a datanode asking for an object's location? Does the proxy have any locally stored data?

The proxy does not store any data locally (not even to buffer reads or writes). The proxy uses the ring to determine how to handle the read or write. The ring is a mapping of the storage volumes that, given an account, container, and object, provides the final location of where the data is to be stored. The proxy then uses this information to either read or write the object.

> # 3 Maybe it has to do with the previous question, but does every dataNode know everything that is stored on the cluster (container service), or does it only know the objects it has itself, and the replicas of its objects?

Things are stored in swift deterministically, so data nodes don't know where everything is stored, but they know how to find where it should be stored (ie the ring).

> # 4 We are building a production cluster of 24 datanodes with 6 drives each (144 drives to start). We know that a good default is 100 partitions per drive, so the math for creating the ring would be (24 nodes * 6 drives * 100 partitions), but we know that by the end of the year the number of datanodes (and drives) could be 2x or 3x more. So, for the initial setup, can we build the RING with our 144 drives and 100 partitions per drive and then modify the ring / partitions later and rebalance? Or is it safer to plan for the future infrastructure increase and build the ring with those numbers in mind?

Your partition power should take into account the largest size your cluster can be. You cannot change the partition power after you deploy the ring unless you migrate everything in your cluster (a manual process of GET from the old ring and PUT to the new ring), so it is important to select the proper partition power up front.
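As a worked example of sizing the partition power up front, using the figures from the question (144 drives now, possibly 3x by year end) and the 100-partitions-per-drive rule of thumb the questioner cites, the arithmetic below is only a sketch; check the Swift deployment guide for the exact guidance:

    import math

    # Figures from the question: 24 nodes * 6 drives today, maybe 3x by year end.
    drives_now = 24 * 6            # 144
    drives_max = drives_now * 3    # plan for the largest expected cluster
    parts_per_drive = 100          # the rule of thumb cited in the question

    # Using the question's own math (drives * partitions-per-drive) at the
    # MAXIMUM expected size, rounded up to a power of two, since the ring's
    # partition count is always 2 ** part_power and cannot be changed later.
    target_partitions = drives_max * parts_per_drive        # 43,200
    part_power = math.ceil(math.log2(target_partitions))    # 16 -> 65,536 parts

    print(part_power, 2 ** part_power)

    # The builder file would then be created with something like:
    #   swift-ring-builder object.builder create 16 3 1
    # (partition power, replica count, min_part_hours)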
> # 5 We put a new object into the cluster and the proxy decides where to write the object (is it in a round-robin manner?). Does the proxy server give a "Created" response when the 1st replica is actually written and recorded in the account and container SQLite databases, or is there an OK only once the OBJECT service has actually written the data to disk?

The proxy sends the write to 3 object servers. The object servers write to disk and then send a request to the container servers to update the container listing. The object servers then return success to the proxy. After 2 object servers have returned success, the proxy can return success to the client.

> Hope we can shed some light on these doubts.

There are obviously some details I've glossed over in the short answers above. Much of the complexity in swift comes from failure scenarios. Please ask if you need more detail.

--John
[Openstack] Several questions about HOW SWIFT WORKS
Hi everyone!

Since we have been using swift for some time now, we would like to understand, in some depth, how a few things actually work in SWIFT. Imagine the setup behind all of these doubts is as follows:

+ 2 proxyNodes
+ 10 dataNodes (5 zones)

So, let's get down to business.

# 1 We have the memcache service running on each proxy, so as far as we know, memcache caches keystone tokens and object paths as requests (PUT, GET) enter the proxy. But, for example, if we restart one proxy server so its memcached is empty, does the restarted proxy node go to a neighbor's memcache on the next request, look up what it needs, and cache the answer locally so the next query is solved locally?

# 2 The documentation says "For each request, it will look up the location of the account, container, or object in the ring (see below) and route the request accordingly". In what way does the proxy actually do the lookup of WHERE an object / container is in the cluster? Does it connect to a datanode asking for an object's location? Does the proxy have any locally stored data?

# 3 Maybe it has to do with the previous question, but does every dataNode know everything that is stored on the cluster (container service), or does it only know the objects it has itself, and the replicas of its objects?

# 4 We are building a production cluster of 24 datanodes with 6 drives each (144 drives to start). We know that a good default is 100 partitions per drive, so the math for creating the ring would be (24 nodes * 6 drives * 100 partitions), but we know that by the end of the year the number of datanodes (and drives) could be 2x or 3x more. So, for the initial setup, can we build the RING with our 144 drives and 100 partitions per drive and then modify the ring / partitions later and rebalance? Or is it safer to plan for the future infrastructure increase and build the ring with those numbers in mind?

# 5 We put a new object into the cluster and the proxy decides where to write the object (is it in a round-robin manner?). Does the proxy server give a "Created" response when the 1st replica is actually written and recorded in the account and container SQLite databases, or is there an OK only once the OBJECT service has actually written the data to disk?

Hope we can shed some light on these doubts.

Thanks! Cheers.

-- Alex