On 01/22/2016 09:13 AM, Raghavendra Talur wrote:
Hi all,

I am sure there are many tricks hidden up the sleeves of many Gluster
developers.
I realized this when speaking to new developers. It would be good to have a
searchable thread of such tricks.

Just reply on this thread with the tricks that you have and I
promise I will collate them and add them to the developer guide.


Looking forward to being amazed!



Things that I normally do:


1. Visualizing the flow through the stack is one of the first steps I take when debugging. Tracing a call from the origin (application) to the underlying filesystem usually helps in isolating most problems. Looking at logs from the endpoints of each stack (fuse/nfs etc. + client protocols, server + posix) helps in identifying the stack that might be the source of a problem. To understand the nature of the fops (file operations) happening, you can use the wireshark plugin or the trace translator at appropriate locations in the graph.

2. Use statedump/meta for understanding internal state. For servers, statedump is the only recourse; on fuse clients you can also get a meta view of the filesystem by

cd /mnt/point/.meta

and you get to view the statedump information in a hierarchical fashion (including information from individual xlators).
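
To skim that tree quickly, something like the sketch below can help. It is only an illustration: meta_snapshot is a helper name I made up, /mnt/point is the placeholder mount point from above, and it assumes the entries under .meta can be read like ordinary files and that your find supports -maxdepth (GNU and BSD find do).

```shell
# meta_snapshot: hypothetical helper, not part of Gluster.
# Prints the first few lines of every regular file found up to two
# levels below the given directory.
meta_snapshot() {
    root=${1%/}
    find "$root" -maxdepth 2 -type f | sort | while IFS= read -r f; do
        printf '== %s\n' "${f#"$root"/}"
        head -n 5 "$f"
    done
}

# Usage (assumes a fuse mount at /mnt/point):
#   meta_snapshot /mnt/point/.meta
```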

3. Reproduce a problem by minimizing the number of nodes in a graph. This can be done by disabling the translators that can be turned off through the volume set interface and by using custom volume files.
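
As a rough sketch of the volume set route, the loop below switches off the client-side performance translators one option at a time. The performance.* names are existing volume options, but the list is not exhaustive. The function name and the GLUSTER override (handy for a dry run with echo) are my own assumptions, not something shipped with Gluster.

```shell
# disable_perf_xlators: hypothetical helper, not part of Gluster.
# GLUSTER can be overridden (e.g. GLUSTER=echo) to preview the commands
# instead of running them against a real volume.
disable_perf_xlators() {
    vol=$1
    for opt in performance.quick-read performance.io-cache \
               performance.read-ahead performance.write-behind \
               performance.stat-prefetch performance.open-behind; do
        "${GLUSTER:-gluster}" volume set "$vol" "$opt" off || return 1
    done
}

# Usage (VOLNAME is a placeholder for your volume):
#   GLUSTER=echo disable_perf_xlators VOLNAME   # dry run
#   disable_perf_xlators VOLNAME                # apply
```

Re-enable the options one by one afterwards to find the translator whose presence brings the problem back.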

4. Use error-gen while developing new code to inject faults into fops.

5. If a problem happens only at scale, try reproducing the problem by reducing default limits in code (timeouts, inode table limits etc.). Some of them do require re-compilation of code.

6. Use the wealth of tools available on *nix systems for understanding a performance problem better. This infographic [1] and page [2] by Brendan Gregg are quite handy for picking the right tool at the right layer.

7. For isolating regression test failures:

- use tests/utils/testn.sh to quickly identify the failing test
- grep for the last "Test Summary Report" in the jenkins report for a failed regression run. That usually provides a pointer to the failing test.
- In case of a failure due to a core, the gdb command provided in the jenkins report is quite handy to get a backtrace after downloading the core and its runtime to your laptop.
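
The search for the last "Test Summary Report" can be scripted once the console output is saved to a file. A minimal sketch follows. last_test_summary and console.log are names I picked for illustration, and it assumes the summary runs from that line to the end of the log.

```shell
# last_test_summary: hypothetical helper; prints everything from the
# last "Test Summary Report" line to the end of the given log file.
last_test_summary() {
    awk '/Test Summary Report/ { found = 1; buf = "" }
         found                 { buf = buf $0 "\n" }
         END                   { printf "%s", buf }' "$1"
}

# Usage (console.log is a locally saved copy of the jenkins console output):
#   last_test_summary console.log
```

If the log has no such line, the function prints nothing.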

8. Get necessary information to debug a problem as soon as a new bug is logged (a day or two is ideal). If we miss that opportunity, users could have potentially moved on to other things and obtaining information can prove to be difficult.

9. Be paranoid about any code that you write ;-). Anything that is not tested by us will come back to haunt us sometime in the future.

10. Use terminator [3] for concurrently executing the same command on multiple nodes.

Will fill in more when I recollect something useful.

-Vijay

[1] http://www.brendangregg.com/Perf/linux_observability_tools.png

[2] http://www.brendangregg.com/perf.html

[3] https://launchpad.net/terminator


_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel