On 01/22/2016 09:13 AM, Raghavendra Talur wrote:
Hi All,
I am sure there are many tricks hidden up the sleeves of many Gluster
developers.
I realized this when speaking to new developers. It would be good to have a
searchable thread of such tricks.
Just reply on this thread with the tricks that you have and I
promise I will collate them and add them to the developer guide.
Looking forward to being amazed!
Things that I normally do:
1. Visualizing the flow through the stack is one of the first steps I
use in debugging. Tracing a call from its origin (the application) down to the
underlying filesystem usually helps isolate most problems.
Looking at the logs from the endpoints of each stack (fuse/nfs
etc. + client protocols, server + posix) helps identify the stack
that might be the source of a problem. To understand which fops are
happening, you can use the wireshark plugin or the trace translator
at appropriate locations in the graph.
2. Use statedump/meta for understanding internal state. For servers,
statedump is the only recourse; on fuse clients you can get a meta view
of the filesystem by running
cd /mnt/point/.meta
and you can then view the statedump information in a hierarchical fashion
(including information from individual xlators).
3. Reproduce a problem with as few nodes in the graph as possible.
This can be done by disabling translators through the volume set
interface (for those that can be disabled) and by using custom volume files.
4. Use error-gen while developing new code to inject faults into fops
and exercise the error-handling paths.
5. If a problem happens only at scale, try reproducing it by
reducing default limits in code (timeouts, inode table limits etc.).
Some of these require recompiling the code.
6. Use the wealth of tools available on *nix systems to understand a
performance problem better. This infographic [1] and page [2] by Brendan
Gregg are quite handy for picking the right tool at the right layer.
7. For isolating regression test failures:
- use tests/utils/testn.sh to quickly identify the failing test
- grep for the last "Test Summary Report" in the Jenkins report
for a failed regression run. That usually points to the
failing test.
- In case of a failure due to a core, the gdb command provided in
the Jenkins report is quite handy for getting a backtrace after downloading
the core and its runtime to your laptop.
8. Gather the information needed to debug a problem as soon as a new bug is
logged (within a day or two is ideal). If we miss that window, users may
have moved on to other things and obtaining the information can
prove difficult.
9. Be paranoid about any code that you write ;-). Anything that we do not
test will come back to haunt us at some point in the future.
10. Use terminator [3] to run the same command concurrently on
multiple nodes.
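For item 1 above, one low-tech way to follow a call across endpoints (say a fuse client log and a brick log) is to interleave the logs into a single timeline. This is a minimal sketch, assuming each log line starts with a bracketed timestamp like "[2016-01-22 09:13:00.123456]"; adjust TS_RE if your logs differ. The file names in the usage comment are placeholders.

```python
import heapq
import re
from datetime import datetime
from typing import Dict, Iterable, Iterator, List, Tuple

# Assumption: log lines start with a bracketed timestamp such as
# "[2016-01-22 09:13:00.123456]". Lines without one (continuations,
# backtraces) inherit the timestamp of the previous line in the same file.
TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{1,6})\]")


def tagged_lines(source: str, lines: Iterable[str]) -> Iterator[Tuple[datetime, str, str]]:
    """Yield (timestamp, source, line) for one log, in file order."""
    last = datetime.min
    for line in lines:
        match = TS_RE.match(line)
        if match:
            last = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        yield last, source, line.rstrip("\n")


def merge_logs(logs: Dict[str, Iterable[str]]) -> List[str]:
    """Interleave several logs (name -> lines) into one timeline."""
    streams = [tagged_lines(name, lines) for name, lines in logs.items()]
    # heapq.merge expects every stream to be sorted already, which holds
    # for the log of a single process; ties keep the order of the dict.
    merged = heapq.merge(*streams, key=lambda item: item[0])
    return [f"{source}: {line}" for _, source, line in merged]


# Usage (paths are placeholders):
# with open("client.log") as c, open("brick.log") as b:
#     for line in merge_logs({"client": c, "brick": b}):
#         print(line)
```

Timestamps from different machines are only as comparable as their clocks, so keep the nodes time-synced when merging client and server logs.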
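For item 2 above, because .meta is browsable like an ordinary directory tree, a small script can dump the whole of it in one go, which is handy for attaching to a bug or diffing before and after a test. This sketch assumes the entries under .meta can be read like regular files; directory symlinks are not followed so the walk does not revisit parts of the tree. The mount path in the usage line is the same placeholder used in item 2.

```python
import os


def dump_tree(root: str, max_bytes: int = 4096) -> str:
    """Return an indented listing of every file under root with its contents.

    Reads at most max_bytes per file, since some entries may be large.
    """
    out = []
    for dirpath, dirnames, filenames in os.walk(root):  # symlinks not followed
        dirnames.sort()  # deterministic traversal order
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        indent = "  " * depth
        out.append(f"{indent}{os.path.basename(dirpath) or dirpath}/")
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    data = fh.read(max_bytes).decode("utf-8", "replace")
            except OSError as exc:
                data = f"<unreadable: {exc.strerror}>"
            out.append(f"{indent}  {name}")
            out.extend(f"{indent}    {text}" for text in data.splitlines())
    return "\n".join(out)


# Usage:
# print(dump_tree("/mnt/point/.meta"))
```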
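For item 3 above, the volume set route can be scripted so that translators are switched off one at a time while you re-run the reproducer, which narrows down the translator that matters. A sketch, assuming the performance translator keys listed below exist in the release you are testing (confirm with "gluster volume set help"); the volume name is a placeholder. Commands are only printed unless dry_run is turned off.

```python
import shlex
import subprocess
from typing import List, Sequence

# Assumption: these keys toggle client-side performance translators in the
# Gluster release under test; verify with "gluster volume set help".
PERF_XLATOR_KEYS = (
    "performance.quick-read",
    "performance.io-cache",
    "performance.read-ahead",
    "performance.write-behind",
    "performance.stat-prefetch",
    "performance.open-behind",
)


def disable_commands(volname: str, keys: Sequence[str] = PERF_XLATOR_KEYS) -> List[List[str]]:
    """Build one "gluster volume set <vol> <key> off" argv per translator."""
    return [["gluster", "volume", "set", volname, key, "off"] for key in keys]


def apply(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failure


# Usage ("testvol" is a placeholder volume name):
# apply(disable_commands("testvol"))                 # review first
# apply(disable_commands("testvol"), dry_run=False)  # then run
```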
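For item 4 above, error-gen itself is a translator loaded into the graph; the snippet below is not its interface, only the same idea in Python for illustration: wrap an operation so that a configurable fraction of calls fail with an errno, forcing the caller's error paths to run. Passing a seeded random.Random makes a failing run replayable. The write_block name in the usage comment is hypothetical.

```python
import errno
import os
import random
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def inject_faults(op: Callable[..., T], failure_rate: float,
                  errnos: Sequence[int] = (errno.EIO,),
                  rng: Optional[random.Random] = None) -> Callable[..., T]:
    """Wrap op so that roughly failure_rate of calls raise OSError instead."""
    if rng is None:
        rng = random.Random()

    def wrapped(*args, **kwargs) -> T:
        if rng.random() < failure_rate:
            code = rng.choice(errnos)
            raise OSError(code, os.strerror(code))
        return op(*args, **kwargs)

    return wrapped


# Usage: make about a quarter of the (hypothetical) write_block calls fail
# with EIO or ENOSPC, using a fixed seed so the run can be replayed.
# flaky_write = inject_faults(write_block, 0.25,
#                             (errno.EIO, errno.ENOSPC), random.Random(42))
```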
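For item 7 above, the "grep for the last Test Summary Report" step can be automated. This sketch assumes the summary is the one printed by prove (TAP::Harness), i.e. a "Test Summary Report" header, one "<file>.t (Wstat: ...)" line per failing test, and a closing "Files=..." totals line; if the Jenkins output looks different, adjust FAILED_FILE_RE. The saved console-log file name in the usage comment is a placeholder.

```python
import re
from typing import List

# Assumption: prove-style summary, e.g.
#   Test Summary Report
#   -------------------
#   ./tests/basic/foo.t (Wstat: 0 Tests: 10 Failed: 1)
#     Failed test:  5
#   Files=1, Tests=10, ...
FAILED_FILE_RE = re.compile(r"(\S+\.t)\s+\(Wstat:")


def last_summary(log_text: str) -> List[str]:
    """Return the lines of the last "Test Summary Report" block, if any."""
    lines = log_text.splitlines()
    starts = [i for i, line in enumerate(lines) if "Test Summary Report" in line]
    if not starts:
        return []
    block = []
    for line in lines[starts[-1]:]:
        block.append(line)
        if line.lstrip().startswith("Files="):  # closing totals line
            break
    return block


def failing_tests(log_text: str) -> List[str]:
    """Extract the .t files named in the last summary block."""
    found = []
    for line in last_summary(log_text):
        match = FAILED_FILE_RE.search(line)
        if match:
            found.append(match.group(1))
    return found


# Usage (placeholder file holding the saved Jenkins console output):
# with open("console.log") as fh:
#     print(failing_tests(fh.read()))
```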
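For item 10 above, terminator broadcasts keystrokes to several interactive terminals. When you only need the output rather than an interactive session, a different and non-interactive technique (not terminator itself) is to fan the command out over ssh in parallel and collect the results per node. This sketch assumes password-less ssh to every node; the host names in the usage comment are placeholders.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Optional, Sequence


def ssh_runner(node: str, command: str) -> subprocess.CompletedProcess:
    """Run command on node via ssh; BatchMode makes ssh fail instead of prompting."""
    return subprocess.run(["ssh", "-o", "BatchMode=yes", node, command],
                          capture_output=True, text=True)


def run_everywhere(nodes: Sequence[str], command: str,
                   runner: Callable[[str, str], Any] = ssh_runner,
                   max_workers: Optional[int] = None) -> Dict[str, Any]:
    """Start command on every node at once and collect results per node."""
    workers = max_workers or max(1, len(nodes))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {node: pool.submit(runner, node, command) for node in nodes}
        return {node: future.result() for node, future in futures.items()}


# Usage (host names are placeholders):
# for node, res in run_everywhere(["server1", "server2"], "gluster peer status").items():
#     print(f"== {node} (exit {res.returncode})\n{res.stdout}")
```

The runner parameter keeps the transport swappable, e.g. for a dry run that only records which node would receive which command.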
Will fill in more when I recollect something useful.
-Vijay
[1] http://www.brendangregg.com/Perf/linux_observability_tools.png
[2] http://www.brendangregg.com/perf.html
[3] https://launchpad.net/terminator
_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel