Hi Bob,

I have the impression the biggest impact is to metadata-type operations rather than throughput, but don't quote me on that because I have very little data to back it up. While testing an upgrade from GPFS 3.5 to 4.1, we ran fio on some 1000 nodes against a filesystem in our test environment, which sustained about 60-80k IOPS on the filesystem's metadata LUNs. At one point I couldn't understand why I was struggling to get more than about 13k IOPS, and then realized tracing was turned on on some subset of the NSD servers (which are also manager nodes). After turning it off, the throughput immediately shot back up to where I expected it to be.

Also, during testing we were tracking down a bug for which I needed to run tracing *everywhere* and then turn it off when one of the manager nodes saw a particular error. I used a script IBM had sent me a while back to help with this, to which I made some tweaks. I've attached it in case it's helpful. In a nutshell the process looks like:

- start tracing everywhere (/usr/lpp/mmfs/bin/mmdsh -N all /usr/lpp/mmfs/bin/mmtrace start). Doing it this way avoids the need to change the mmsdrfs file, which depending on your cluster size may or may not have some benefits.
- run a command that watches for the event in question and, when triggered, runs /usr/lpp/mmfs/bin/mmdsh -N all /usr/lpp/mmfs/bin/mmtrace stop
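In sketch form, the watch-and-stop step is just a count-and-compare loop. Here's a self-contained illustration; the temp log file, the shortened error string, and the timings are stand-ins for demonstration only (on a real cluster you would grep /var/log/messages and the echo would be the mmdsh/mmtrace stop command above):

```shell
#!/bin/sh
# Illustrative sketch of "watch for the event, then stop tracing".
# Stand-in log file and pattern; nothing GPFS-specific actually runs here.
log=$(mktemp)
pattern="return code 301"
baseCount=$(grep -c "$pattern" "$log")

# Simulate the error of interest arriving a moment later.
( sleep 1; echo "unmounted by the system with return code 301" >> "$log" ) &

while :; do
  sleep 1
  currentCount=$(grep -c "$pattern" "$log")
  # -gt does a numeric comparison of the two counts
  if [ "$currentCount" -gt "$baseCount" ]; then
    # On a real cluster this would be:
    #   /usr/lpp/mmfs/bin/mmdsh -N all /usr/lpp/mmfs/bin/mmtrace stop
    echo "event seen: stop tracing everywhere"
    break
  fi
done
```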

If the condition could present itself on multiple nodes in quick succession (as was the case for me), you can wrap the mmdsh that stops tracing in an flock, using an arbitrary node that stores the lock locally:

ssh $stopHost flock -xn /tmp/mmfsTraceStopLock -c "'/usr/lpp/mmfs/bin/mmdsh -N all /usr/lpp/mmfs/bin/mmtrace stop'"

Wrapping it in an flock avoids multiple attempts to format the trace.
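The one-shot behavior is easy to see in a tiny standalone demo (util-linux flock; the lock file name here is arbitrary): while one caller holds the lock, a second non-blocking (-n) attempt exits immediately instead of running the command again.

```shell
#!/bin/sh
# Demo of the one-shot behavior flock -xn provides.
# Arbitrary lock file; nothing GPFS-specific here.
lock=/tmp/mmfsTraceStopLock.demo

# First caller grabs the lock and holds it (sleep simulates the real work).
flock -xn "$lock" -c 'sleep 3' &
sleep 1

# A second caller racing in finds the lock held and skips the command.
result=$(flock -xn "$lock" -c 'echo ran' || echo skipped)
echo "$result"
wait
```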

-Aaron

On 3/7/17 3:32 PM, Oesterlin, Robert wrote:
I’m considering enabling trace on all nodes all the time, doing
something like this:

mmtracectl --set --trace=def --trace-recycle=global
--tracedev-write-mode=overwrite --tracedev-overwrite-buffer-size=256M
mmtracectl --start

My questions are:

- What is the performance penalty of leaving this on 100% of the time on
a node?

- Does anyone have any suggestions on automation on stopping trace when
a particular event occurs?

- What other issues, if any?

Bob Oesterlin
Sr Principal Storage Engineer, Nuance
507-269-0413

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776
#!/usr/bin/ksh
stopHost=loremds20
mmtrace=/usr/lpp/mmfs/bin/mmtrace
mmtracectl=/usr/lpp/mmfs/bin/mmtracectl

# Seconds to sleep between checks.
secondsToSleep=2

# Flag to know when tripped or stopped
tripped=0

# mmfs log file to monitor
logToGrep=/var/log/messages

# Path to mmfs bin directory
MMFSbin=/usr/lpp/mmfs/bin

# Trip file.  Will exist if trap is sprung
trapHasSprung=/tmp/mmfsTrapHasSprung

rm $trapHasSprung 2>/dev/null

# No automatic start of mmtrace on this node; uncomment to start it here.
#${mmtrace} start

# Initial count of the expelled message in the mmfs log
baseCount=$(grep -c "unmounted by the system with return code 301 reason code" $logToGrep)

# Loop while the trip file does not exist
while [[ ! -f $trapHasSprung ]]
do
  sleep $secondsToSleep

  # Get the current count of expelled messages to check against the initial one.
  currentCount=$(grep -c "unmounted by the system with return code 301 reason code" $logToGrep)

  # -gt forces a numeric comparison; > would compare the counts as strings.
  if [[ $currentCount -gt $baseCount ]]
  then
   tripped=1
   /usr/lpp/mmfs/bin/mmdsh -N managernodes,quorumnodes touch $trapHasSprung
   # To stop from the cluster manager instead of a fixed node:
   #stopHost=$(/usr/lpp/mmfs/bin/tslsmgr | grep '^Cluster manager' | awk '{ print $NF }' | sed -e 's/[()]//g')
   ssh $stopHost flock -xn /tmp/mmfsTraceStopLock -c "'/usr/lpp/mmfs/bin/mmdsh -N all -f128 /usr/lpp/mmfs/bin/mmtrace stop noformat'"
  fi

done