On Wed, Jan 30, 2019 at 9:44 AM Raghavendra Gowdappa <rgowd...@redhat.com> wrote:
> > > On Wed, Jan 30, 2019 at 7:35 AM Li, Deqian (NSB - CN/Hangzhou) < > deqian...@nokia-sbell.com> wrote: > >> Hi, >> >> >> >> Could you help to check this coredump? >> >> We are using glusterfs 3.12-3(3 replicated bricks solution ) to do >> stability testing under high CPU load like 80% by stress and doing I/O. >> >> After several hours, coredump happened in glusterfs side . >> >> >> >> [Current thread is 1 (Thread 0x7ffff37d2700 (LWP 3696))] >> >> Missing separate debuginfos, use: dnf debuginfo-install >> rcp-pack-glusterfs-1.8.1_11_g99e9ca6-RCP2.wf28.x86_64 >> >> (gdb) bt >> >> #0 0x00007ffff0d5c845 in wb_fulfill (wb_inode=0x7fffd406b3b0, >> liabilities=0x7fffdc234b50) at write-behind.c:1148 >> >> #1 0x00007ffff0d5e4d5 in wb_process_queue (wb_inode=0x7fffd406b3b0) at >> write-behind.c:1718 >> >> #2 0x00007ffff0d5eda7 in wb_writev (frame=0x7fffe0086290, >> this=0x7fffec014b00, fd=0x7fffe4034070, vector=0x7fffdc445720, count=1, >> offset=67108863, flags=32770, iobref=0x7fffdc00d550, xdata=0x0) >> >> at write-behind.c:1825 >> >> #3 0x00007ffff0b51fcb in du_writev_resume (ret=0, frame=0x7fffdc0305a0, >> opaque=0x7fffdc0305a0) at disk-usage.c:490 >> >> #4 0x00007ffff7b3510d in synctask_wrap () at syncop.c:377 >> >> #5 0x00007ffff60d0660 in ?? () from /lib64/libc.so.6 >> >> #6 0x0000000000000000 in ?? () >> >> (gdb) p wb_inode >> >> $1 = (wb_inode_t *) 0x7fffd406b3b0 >> >> (gdb) frame 2 >> >> #2 0x00007ffff0d5eda7 in wb_writev (frame=0x7fffe0086290, >> this=0x7fffec014b00, fd=0x7fffe4034070, vector=0x7fffdc445720, count=1, >> offset=67108863, flags=32770, iobref=0x7fffdc00d550, xdata=0x0) >> >> at write-behind.c:1825 >> >> 1825 in write-behind.c >> >> (gdb) p *fd >> >> $2 = {pid = 18154, flags = 32962, refcount = 0, inode_list = {next = >> 0x7fffe4034080, prev = 0x7fffe4034080}, inode = 0x0, lock = {spinlock = 0, >> mutex = {__data = {__lock = 0, __count = 0, __owner = 0, >> >> __nusers = 0, __kind = -1, __spins = 0, __elision = 0, __list = >> {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 16 times>, >> "\377\377\377\377", '\000' <repeats 19 times>, __align = 0}}, >> >> _ctx = 0x7fffe4022930, xl_count = 17, lk_ctx = 0x7fffe40350e0, >> anonymous = _gf_false} >> >> (gdb) p fd >> >> $3 = (fd_t *) 0x7fffe4034070 >> >> >> >> (gdb) p wb_inode->this >> >> $1 = (xlator_t *) 0xffffffffffffff00 >> >> >> >> After adding test log I found the FOP sequence in write-behind xlator >> side was mass as bellow showing. In the FUSE side the FLUSH is after >> write2, but in the WB side, FLUSH is between write2 ‘wb_do_unwinds’ and >> ‘wb_fulfill’. >> >> So I think this should has problem. >> > > wb_do_unwinds the write after caching it. > * wb_do_unwinds unwind the write after caching it. Once the write response is unwound, kernel issues a FLUSH. So, this is a > valid sequence of operations and nothing wrong. > > I think it’s possible that the FLUSH and later RELEASE operation will >> destroy the fd , it will cause ‘wb_in->this(0xffffffffffffff00)’. Do you >> think so? >> >> And I think our new adding disk-usage xlator’s synctask_new will dealy >> the write operation, but the FLUSH operation without this delay(because not >> invoked the disk-usage xlator). >> > > Flush and release wait for completion of any on-going writes. So, unless > write-behind has unwound the write, it won't see a flush. By the time write > is unwound, write-behind makes sure to take references on objects it uses > (like fds, iobref etc). So, I don't see a problem there. > > >> >> Do you agree with my speculation ? and how to fix?(we don’t want to move >> the disk-usage xlator) >> > > I've still not found the RCA. We can discuss about the fix once RCA is > found. > > >> >> >> >> Problematic FOP sequence : >> >> >> >> FUSE side: WB side: >> >> >> >> Write 1 write1 >> >> Write2 do unwind >> >> Write 2 FLUSH >> >> Release(destroy fd) >> >> FLUSH write2 (wb_fulfill) then coredump. >> >> Release >> >> >> >> >> >> int >> >> wb_fulfill (wb_inode_t *wb_inode, list_head_t *liabilities) >> >> { >> >> wb_request_t *req = NULL; >> >> wb_request_t *head = NULL; >> >> wb_request_t *tmp = NULL; >> >> wb_conf_t *conf = NULL; >> >> off_t expected_offset = 0; >> >> size_t curr_aggregate = 0; >> >> size_t vector_count = 0; >> >> int ret = 0; >> >> >> >> conf = wb_inode->this->private; à this line coredump >> >> >> >> list_for_each_entry_safe (req, tmp, liabilities, winds) { >> >> list_del_init (&req->winds); >> >> >> >> …. >> >> >> >> >> >> volume ccs-write-behind >> >> 68: type performance/write-behind >> >> 69: subvolumes ccs-dht >> >> 70: end-volume >> >> 71: >> >> * 72: volume ccs-disk-usage **à we add a new xlator >> here for write op ,just for checking if disk if full. And synctask_new for >> write.* >> >> *73: type performance/disk-usage* >> >> *74: subvolumes ccs-write-behind* >> >> *75: end-volume* >> >> 76: >> >> 77: volume ccs-read-ahead >> >> 78: type performance/read-ahead >> >> 79: subvolumes ccs-disk-usage >> >> 80: end-volume >> >> >> >> >> >> >> >> Ps. Part of Our new translator code >> >> >> >> int >> >> du_writev (call_frame_t *frame, xlator_t *this, fd_t *fd, >> >> struct iovec *vector, int count, off_t off, uint32_t flags, >> >> struct iobref *iobref, dict_t *xdata) >> >> { >> >> int op_errno = -1; >> >> int ret = -1; >> >> du_local_t *local = NULL; >> >> loc_t tmp_loc = {0,}; >> >> >> >> VALIDATE_OR_GOTO (frame, err); >> >> VALIDATE_OR_GOTO (this, err); >> >> VALIDATE_OR_GOTO (fd, err); >> >> >> >> tmp_loc.gfid[15] = 1; >> >> tmp_loc.inode = fd->inode; >> >> tmp_loc.parent = fd->inode; >> >> local = du_local_init (frame, &tmp_loc, fd, GF_FOP_WRITE); >> >> if (!local) { >> >> >> >> op_errno = ENOMEM; >> >> goto err; >> >> } >> >> local->vector = iov_dup (vector, count); >> >> local->offset = off; >> >> local->count = count; >> >> local->flags = flags; >> >> local->iobref = iobref_ref (iobref); >> >> >> >> ret = synctask_new(this->ctx->env, >> du_get_du_info,du_writev_resume,frame,frame); >> >> if(ret) >> >> { >> >> op_errno = -1; >> >> gf_log (this->name, GF_LOG_WARNING,"synctask_new return >> failure ret(%d) ",ret); >> >> goto err; >> >> } >> >> return 0; >> >> err: >> >> op_errno = (op_errno == -1) ? errno : op_errno; >> >> DU_STACK_UNWIND (writev, frame, -1, op_errno, NULL, NULL, NULL); >> >> return 0; >> >> } >> >> >> >> Br, >> >> Li Deqian >> >
_______________________________________________ Gluster-users mailing list Gluster-users@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users