Re: [Gluster-devel] Data classification proposal
> Am I right if I understood that the value for media-type is not
> interpreted beyond the scope of matching rules? That is to say, we
> don't need/have any notion of media-types that type-check internally
> for forming (sub)volumes using the rules specified.

Exactly. To us it's just an opaque ID.

> Should the number of bricks or lower-level subvolumes that match the
> rule be an exact multiple of group-size?

Good question. I think users see the current requirement to add bricks in
multiples of the replica/stripe size as an annoyance, and this will only get
worse with erasure coding, where the group size is larger. On the other hand,
we do need to make sure that members of a group are on different machines.
This is why I think we need to be able to split bricks, so that we can use
overlapping replica/erasure sets. For example, if we have five bricks and
two-way replication, we can split bricks to get a multiple of two and life's
good again. So *long term* I think we can/should remove any restriction on
users, but there are a whole bunch of unsolved issues around brick splitting.
I'm not sure what to do in the short term.

Here's a more complex example that adds replication and erasure coding to the
mix:

    # Assume 20 hosts, four fast and sixteen slow (named appropriately).
    rule tier-1
        select *fast*
        group-size 2
        type cluster/afr

    rule tier-2
        # special pattern matching otherwise-unused bricks
        select %{unclaimed}
        group-size 8
        type cluster/ec parity=2
        # i.e. two groups, each six data plus two parity

    rule all
        select tier-1
        select tier-2
        type features/tiering

> In the above example we would have 2 subvolumes, each containing 2 bricks,
> that would be aggregated by rule tier-1. Let's call those subvolumes
> tier-1-fast-0 and tier-1-fast-1. Both of these subvolumes are AFR-based
> two-way replicated subvolumes. Are these instances of tier-1-* composed
> using cluster/dht by the default semantics?

Yes.
Any time we have multiple subvolumes and no other specified way to combine
them into one, we just slap DHT on top. We do this already at the top level;
with data classification we might do it at lower levels too.

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel
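Concretely, that default composition could look something like the following
volfile fragment (the subvolume and brick names here are invented for
illustration; the actual generated graph may differ):

```
# Hypothetical generated graph: each tier-1 replica pair becomes an AFR
# subvolume, and DHT is layered on top because nothing else combines them.
volume tier-1-fast-0
    type cluster/afr
    subvolumes host1-fast-brick host2-fast-brick
end-volume

volume tier-1-fast-1
    type cluster/afr
    subvolumes host3-fast-brick host4-fast-brick
end-volume

volume tier-1
    type cluster/dht
    subvolumes tier-1-fast-0 tier-1-fast-1
end-volume
```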
Re: [Gluster-devel] Change in glusterfs[master]: epoll: Handle client and server FDs in a separate event pool
Hi Jeff,

Missed to add this: SSL_pending was 0 before calling SSL_read, and hence
SSL_get_error returned 'SSL_ERROR_WANT_READ'.

Thanks,
Vijay

On Tuesday 24 June 2014 05:15 PM, Vijaikumar M wrote:
> Hi Jeff,
>
> This is regarding the patch http://review.gluster.org/#/c/3842/ (epoll:
> edge triggered and multi-threaded epoll). The testcase
> './tests/bugs/bug-873367.t' hangs with this fix (please find the stack
> trace below). In the code snippet below we found that 'SSL_pending' was
> returning 0. I have added a condition here to return from the function
> when there is no data available. Please suggest whether it is OK to do it
> this way, or whether we need to restructure this function for
> multi-threaded epoll.
>
> code: socket.c
> 178 static int
> 179 ssl_do (rpc_transport_t *this, void *buf, size_t len, SSL_trinary_func *func)
> 180 {
> ...
> 211         switch (SSL_get_error(priv->ssl_ssl, r)) {
> 212         case SSL_ERROR_NONE:
> 213                 return r;
> 214         case SSL_ERROR_WANT_READ:
> 215                 if (SSL_pending(priv->ssl_ssl) == 0)
> 216                         return r;
> 217                 pfd.fd = priv->sock;
> ...
> 221                 if (poll(&pfd, 1, -1) < 0) {
> /code
>
> Thanks,
> Vijay
>
> On Tuesday 24 June 2014 03:55 PM, Vijaikumar M wrote:
>> From the stack trace we found that the function 'socket_submit_request'
>> is waiting on a mutex lock. The lock is held by the function 'ssl_do',
>> and that function is blocked in the poll syscall.
(gdb) bt
#0  0x003daa80822d in pthread_join () from /lib64/libpthread.so.0
#1  0x7f3b94eea9d0 in event_dispatch_epoll (event_pool=<value optimized out>) at event-epoll.c:632
#2  0x00407ecd in main (argc=4, argv=0x7fff160a4528) at glusterfsd.c:2023

(gdb) info threads
  10 Thread 0x7f3b8d483700 (LWP 26225)  0x003daa80e264 in __lll_lock_wait () from /lib64/libpthread.so.0
   9 Thread 0x7f3b8ca82700 (LWP 26226)  0x003daa80f4b5 in sigwait () from /lib64/libpthread.so.0
   8 Thread 0x7f3b8c081700 (LWP 26227)  0x003daa80b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
   7 Thread 0x7f3b8b680700 (LWP 26228)  0x003daa80b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
   6 Thread 0x7f3b8a854700 (LWP 26232)  0x003daa4e9163 in epoll_wait () from /lib64/libc.so.6
   5 Thread 0x7f3b89e53700 (LWP 26233)  0x003daa4e9163 in epoll_wait () from /lib64/libc.so.6
   4 Thread 0x7f3b833eb700 (LWP 26241)  0x003daa4df343 in poll () from /lib64/libc.so.6
   3 Thread 0x7f3b82130700 (LWP 26245)  0x003daa80e264 in __lll_lock_wait () from /lib64/libpthread.so.0
   2 Thread 0x7f3b8172f700 (LWP 26247)  0x003daa80e75d in read () from /lib64/libpthread.so.0
*  1 Thread 0x7f3b94a38700 (LWP 26224)  0x003daa80822d in pthread_join () from /lib64/libpthread.so.0

(gdb) thread 3
[Switching to thread 3 (Thread 0x7f3b82130700 (LWP 26245))]
#0  0x003daa80e264 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x003daa80e264 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x003daa809508 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x003daa8093d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x7f3b8aa74524 in socket_submit_request (this=0x7f3b7c0505c0, req=0x7f3b8212f0b0) at socket.c:3134
#4  0x7f3b94c6b7d5 in rpc_clnt_submit (rpc=0x7f3b7c029ce0, prog=<value optimized out>, procnum=<value optimized out>,
    cbkfn=0x7f3b892364b0 <client3_3_lookup_cbk>, proghdr=0x7f3b8212f410, proghdrcount=1, progpayload=0x0,
    progpayloadcount=0, iobref=<value optimized out>, frame=0x7f3b93d2a454, rsphdr=0x7f3b8212f4c0, rsphdr_count=1,
    rsp_payload=0x0, rsp_payload_count=0, rsp_iobref=0x7f3b700010d0) at rpc-clnt.c:1556
#5  0x7f3b892243b0 in client_submit_request (this=0x7f3b7c005ef0, req=<value optimized out>, frame=0x7f3b93d2a454,
    prog=0x7f3b894525a0, procnum=27, cbkfn=0x7f3b892364b0 <client3_3_lookup_cbk>, iobref=0x0, rsphdr=0x7f3b8212f4c0,
    rsphdr_count=1, rsp_payload=0x0, rsp_payload_count=0, rsp_iobref=0x7f3b700010d0,
    xdrproc=0x7f3b94a4ede0 <xdr_gfs3_lookup_req>) at client.c:243
#6  0x7f3b8922fa42 in client3_3_lookup (frame=0x7f3b93d2a454, this=0x7f3b7c005ef0, data=0x7f3b8212f660) at client-rpc-fops.c:3119

(gdb) p priv->lock
$1 = {__data = {__lock = 2, __count = 0, __owner = 26241, __nusers = 1, __kind = 0, __spins = 0, __list = {
      __prev = 0x0, __next = 0x0}},
  __size = "\002\000\000\000\000\000\000\000\201f\000\000\001", '\000' <repeats 26 times>, __align = 2}

(gdb) thread 4
[Switching to thread 4 (Thread 0x7f3b833eb700 (LWP 26241))]
#0  0x003daa4df343 in poll () from /lib64/libc.so.6
(gdb) bt
#0  0x003daa4df343 in poll () from /lib64/libc.so.6
#1  0x7f3b8aa71fff in ssl_do (this=0x7f3b7c0505c0, buf=0x7f3b7c051264, len=4, func=0x3db2441570 <SSL_read>) at socket.c:216
#2  0x7f3b8aa7277b in __socket_ssl_readv (this=<value optimized out>,
Re: [Gluster-devel] Change in glusterfs[master]: epoll: Handle client and server FDs in a separate event pool
Hi Jeff,

This is regarding the patch http://review.gluster.org/#/c/3842/ (epoll: edge
triggered and multi-threaded epoll). The testcase './tests/bugs/bug-873367.t'
hangs with this fix (please find the stack trace below). In the code snippet
below we found that 'SSL_pending' was returning 0. I have added a condition
here to return from the function when there is no data available. Please
suggest whether it is OK to do it this way, or whether we need to restructure
this function for multi-threaded epoll.

code: socket.c
178 static int
179 ssl_do (rpc_transport_t *this, void *buf, size_t len, SSL_trinary_func *func)
180 {
...
211         switch (SSL_get_error(priv->ssl_ssl, r)) {
212         case SSL_ERROR_NONE:
213                 return r;
214         case SSL_ERROR_WANT_READ:
215                 if (SSL_pending(priv->ssl_ssl) == 0)
216                         return r;
217                 pfd.fd = priv->sock;
...
221                 if (poll(&pfd, 1, -1) < 0) {
/code

Thanks,
Vijay

On Tuesday 24 June 2014 03:55 PM, Vijaikumar M wrote:
> From the stack trace we found that the function 'socket_submit_request' is
> waiting on a mutex lock. The lock is held by the function 'ssl_do', and
> that function is blocked in the poll syscall.
(gdb) bt
#0  0x003daa80822d in pthread_join () from /lib64/libpthread.so.0
#1  0x7f3b94eea9d0 in event_dispatch_epoll (event_pool=<value optimized out>) at event-epoll.c:632
#2  0x00407ecd in main (argc=4, argv=0x7fff160a4528) at glusterfsd.c:2023

(gdb) info threads
  10 Thread 0x7f3b8d483700 (LWP 26225)  0x003daa80e264 in __lll_lock_wait () from /lib64/libpthread.so.0
   9 Thread 0x7f3b8ca82700 (LWP 26226)  0x003daa80f4b5 in sigwait () from /lib64/libpthread.so.0
   8 Thread 0x7f3b8c081700 (LWP 26227)  0x003daa80b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
   7 Thread 0x7f3b8b680700 (LWP 26228)  0x003daa80b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
   6 Thread 0x7f3b8a854700 (LWP 26232)  0x003daa4e9163 in epoll_wait () from /lib64/libc.so.6
   5 Thread 0x7f3b89e53700 (LWP 26233)  0x003daa4e9163 in epoll_wait () from /lib64/libc.so.6
   4 Thread 0x7f3b833eb700 (LWP 26241)  0x003daa4df343 in poll () from /lib64/libc.so.6
   3 Thread 0x7f3b82130700 (LWP 26245)  0x003daa80e264 in __lll_lock_wait () from /lib64/libpthread.so.0
   2 Thread 0x7f3b8172f700 (LWP 26247)  0x003daa80e75d in read () from /lib64/libpthread.so.0
*  1 Thread 0x7f3b94a38700 (LWP 26224)  0x003daa80822d in pthread_join () from /lib64/libpthread.so.0

(gdb) thread 3
[Switching to thread 3 (Thread 0x7f3b82130700 (LWP 26245))]
#0  0x003daa80e264 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x003daa80e264 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x003daa809508 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x003daa8093d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x7f3b8aa74524 in socket_submit_request (this=0x7f3b7c0505c0, req=0x7f3b8212f0b0) at socket.c:3134
#4  0x7f3b94c6b7d5 in rpc_clnt_submit (rpc=0x7f3b7c029ce0, prog=<value optimized out>, procnum=<value optimized out>,
    cbkfn=0x7f3b892364b0 <client3_3_lookup_cbk>, proghdr=0x7f3b8212f410, proghdrcount=1, progpayload=0x0,
    progpayloadcount=0, iobref=<value optimized out>, frame=0x7f3b93d2a454, rsphdr=0x7f3b8212f4c0, rsphdr_count=1,
    rsp_payload=0x0, rsp_payload_count=0, rsp_iobref=0x7f3b700010d0) at rpc-clnt.c:1556
#5  0x7f3b892243b0 in client_submit_request (this=0x7f3b7c005ef0, req=<value optimized out>, frame=0x7f3b93d2a454,
    prog=0x7f3b894525a0, procnum=27, cbkfn=0x7f3b892364b0 <client3_3_lookup_cbk>, iobref=0x0, rsphdr=0x7f3b8212f4c0,
    rsphdr_count=1, rsp_payload=0x0, rsp_payload_count=0, rsp_iobref=0x7f3b700010d0,
    xdrproc=0x7f3b94a4ede0 <xdr_gfs3_lookup_req>) at client.c:243
#6  0x7f3b8922fa42 in client3_3_lookup (frame=0x7f3b93d2a454, this=0x7f3b7c005ef0, data=0x7f3b8212f660) at client-rpc-fops.c:3119

(gdb) p priv->lock
$1 = {__data = {__lock = 2, __count = 0, __owner = 26241, __nusers = 1, __kind = 0, __spins = 0, __list = {
      __prev = 0x0, __next = 0x0}},
  __size = "\002\000\000\000\000\000\000\000\201f\000\000\001", '\000' <repeats 26 times>, __align = 2}

(gdb) thread 4
[Switching to thread 4 (Thread 0x7f3b833eb700 (LWP 26241))]
#0  0x003daa4df343 in poll () from /lib64/libc.so.6
(gdb) bt
#0  0x003daa4df343 in poll () from /lib64/libc.so.6
#1  0x7f3b8aa71fff in ssl_do (this=0x7f3b7c0505c0, buf=0x7f3b7c051264, len=4, func=0x3db2441570 <SSL_read>) at socket.c:216
#2  0x7f3b8aa7277b in __socket_ssl_readv (this=<value optimized out>, opvector=<value optimized out>,
    opcount=<value optimized out>) at socket.c:335
#3  0x7f3b8aa72c26 in __socket_cached_read (this=<value optimized out>, vector=<value optimized out>,
[Gluster-devel] Glusterfs Help needed
Dear All,

I am building GlusterFS on shared storage. I have a disk array with two SAS
controllers: one controller is connected to node A and the other to node B.

Can I create a GlusterFS volume between these two nodes (A and B) without
replication, but with data readable and writable on both nodes (for better
performance)? In case node A fails, the data should be accessible from
node B.

Please suggest.

Regards,
Chandrahasa S
Tata Consultancy Services
Mailto: chandrahas...@tcs.com
Website: http://www.tcs.com

From: jenk...@build.gluster.org (Gluster Build System)
To: gluster-us...@gluster.org, gluster-devel@gluster.org
Date: 06/24/2014 03:46 PM
Subject: [Gluster-users] glusterfs-3.5.1 released

SRC: http://bits.gluster.org/pub/gluster/glusterfs/src/glusterfs-3.5.1.tar.gz

This release is made off jenkins-release-73

-- Gluster Build System
Re: [Gluster-devel] Data classification proposal
It's possible to express your example using lists if their entries are
allowed to overlap. I see that you wanted a way to express a matrix
(overlapping rules) with Gluster's tree-like syntax as a backdrop. A polytree
(a DAG without cycles) may be a better term than matrix, i.e. when there are
overlaps, a node in the graph gets multiple in-arcs.

Syntax aside, we seem to part on where to solve the problem: config file or
UX. I prefer that the UX have the logic to build the configuration file,
given how complex it can be. My preference would be for the config file to be
mostly read-only, with extremely simple syntax. I'll put some more thought
into this; I believe this discussion has illuminated some good points.

Brick: host1:/SSD1 SSD1
Brick: host1:/SSD2 SSD2
Brick: host2:/SSD3 SSD3
Brick: host2:/SSD4 SSD4
Brick: host1:/DISK1 DISK1

rule rack4:
    select SSD1, SSD2, DISK1

# some files should go on ssds in rack 4
rule A:
    option filter-condition *.lock
    select SSD1, SSD2

# some files should go on ssds anywhere
rule B:
    option filter-condition *.out
    select SSD1, SSD2, SSD3, SSD4

# some files should go anywhere in rack 4
rule C:
    option filter-condition *.c
    select rack4

# some files we just don't care
rule D:
    option filter-condition *.h
    select SSD1, SSD2, SSD3, SSD4, DISK1

volume:
    option filter-condition A, B, C, D

- Original Message -
From: Jeff Darcy <jda...@redhat.com>
To: Dan Lambright <dlamb...@redhat.com>
Cc: Gluster Devel <gluster-devel@gluster.org>
Sent: Monday, June 23, 2014 7:11:44 PM
Subject: Re: [Gluster-devel] Data classification proposal

> Rather than using the keyword unclaimed, my instinct was to explicitly
> list which bricks have not been claimed. Perhaps you have something more
> subtle in mind; it is not apparent to me from your response. Can you
> provide an example of why it is necessary, and where a list could not be
> provided in its place?
> If the list is somehow difficult to figure out, due to a particularly
> complex setup or some such, I'd prefer a CLI/GUI build that list rather
> than having sysadmins hand-edit this file.

It's not *difficult* to make sure every brick has been enumerated by some
rule, and that there are no overlaps, but it's certainly tedious and
error-prone. Imagine that a user has bricks on four machines, using names
like serv1-b1, serv1-b2, ..., serv4-b6. Accordingly, they've set up rules to
put serv1* into one set and serv[234]* into another set (which is already
more flexibility than I think your proposal gave them). Now when they add
serv5 they need an extra step to add it to the tiering config, which wouldn't
have been necessary if we supported defaults. What percentage of users would
forget that step at least once? I don't know for sure, but I'd guess it's
pretty high. Having a CLI or GUI create configs just means that we have to
add support for defaults there instead. We'd still have to implement the same
logic; they'd still have to specify the same thing. That just seems like
moving the problem around instead of solving it.

> The key-value piece seems like syntactic sugar - an alias. If so, let the
> name itself be the alias. No notions of SSD or physical location need be
> inserted. Unless I am missing that it *is* necessary, I stand by that
> value judgement as a philosophy of not putting anything into the
> configuration file that you don't require. Can you provide an example of
> where it is necessary?

OK...

Brick: SSD1
Brick: SSD2
Brick: SSD3
Brick: SSD4
Brick: DISK1

rack4: SSD1, SSD2, DISK1

filter A: SSD1, SSD2
filter B: SSD1, SSD2, SSD3, SSD4
filter C: rack4
filter D: SSD1, SSD2, SSD3, SSD4, DISK1

meta-filter: filter A, filter B, filter C, filter D

* some files should go on ssds in rack 4
* some files should go on ssds anywhere
* some files should go anywhere in rack 4
* some files we just don't care

Notice how the rules *overlap*. We can't support that if our syntax only
allows the user to express a list (or list of lists). If the list is ordered
by type, we can't also support location-based rules. If the list is ordered
by location, we lose type-based rules instead. Brick properties create a
matrix with an unknown number of dimensions (e.g. security level, tenant ID,
and so on, as well as type and location). The logical way to represent such a
space for rule-matching purposes is to let users define however many
dimensions (keys) they want, and as many values for each dimension as they
want. Whether the exact string "type" or "unclaimed" appears anywhere isn't
the issue. What matters is that the *semantics* of assigning properties to a
brick have to be more sophisticated than just assigning each a position in a
list, and we need a syntax that supports those semantics. Otherwise we'll end
up solving the same UX problems again and again each time we add a feature
that involves treating bricks or data differently. Each time we'll probably
do it a little differently and confuse users a little
[Gluster-devel] regarding inode-unref on root inode
Does anyone know why inode_unref is a no-op for the root inode? I see the
following code in inode.c:

static inode_t *
__inode_unref (inode_t *inode)
{
        if (!inode)
                return NULL;

        if (__is_root_gfid(inode->gfid))
                return inode;
        ...
}

Pranith
Re: [Gluster-devel] Data classification proposal
> It's possible to express your example using lists if their entries are
> allowed to overlap. I see that you wanted a way to express a matrix
> (overlapping rules) with Gluster's tree-like syntax as a backdrop. A
> polytree (a DAG without cycles) may be a better term than matrix, i.e.
> when there are overlaps, a node in the graph gets multiple in-arcs.
>
> Syntax aside, we seem to part on where to solve the problem: config file
> or UX. I prefer that the UX have the logic to build the configuration
> file, given how complex it can be. My preference would be for the config
> file to be mostly read-only, with extremely simple syntax. I'll put some
> more thought into this; I believe this discussion has illuminated some
> good points.
>
> Brick: host1:/SSD1 SSD1
> Brick: host1:/SSD2 SSD2
> Brick: host2:/SSD3 SSD3
> Brick: host2:/SSD4 SSD4
> Brick: host1:/DISK1 DISK1
>
> rule rack4:
>     select SSD1, SSD2, DISK1
>
> # some files should go on ssds in rack 4
> rule A:
>     option filter-condition *.lock
>     select SSD1, SSD2
>
> # some files should go on ssds anywhere
> rule B:
>     option filter-condition *.out
>     select SSD1, SSD2, SSD3, SSD4
>
> # some files should go anywhere in rack 4
> rule C:
>     option filter-condition *.c
>     select rack4
>
> # some files we just don't care
> rule D:
>     option filter-condition *.h
>     select SSD1, SSD2, SSD3, SSD4, DISK1
>
> volume:
>     option filter-condition A, B, C, D

This seems to leave us with two options. One option is that select supports
only explicit enumeration, so that adding a brick means editing multiple
rules that apply to it. The other option is that select supports wildcards.
Using a regex to match parts of a name is effectively the same as matching
the explicit tags we started with, except that expressing complex Boolean
conditions using a regex can get more than a bit messy. As Jamie Zawinski
famously said: "Some people, when confronted with a problem, think 'I know,
I'll use regular expressions.' Now they have two problems."
I think it's nice to support regexes instead of plain strings in lower-level
rules, but relying on them alone to express complex higher-level policies
would IMO be a mistake. Likewise, defining a proper syntax for a config file
seems both more flexible and easier than defining one for a CLI, where the
parsing options are even more limited. What happens when someone wants to use
Puppet (for example) to set this up? Then the user would express their intent
in Puppet syntax, which would have to be converted to our CLI syntax, which
would in turn be converted to our config-file syntax. Why not allow them to
skip a step where information might get lost or mangled in translation? We
can still have CLI commands to do the most common kinds of manipulation, as
we do for volfiles, but the final form can be more extensible. It will still
be more comprehensible than Ceph's CRUSH maps.