On 11/10/2010 04:25, chris moden wrote:
Hi Phil,

Sorry I haven't replied sooner; I've been away.

I can sort of see what you're saying, but I thought that dtrace was triggered by the relevant event in the kernel.
This program works when I test it, and it is normally left to run indefinitely; I'm not continuously stopping and starting it.
The error only occurred after about 1 1/2 weeks of continuous running.
Would the condition you mentioned still obtain?
Am I missing something here?

Cheers
Chris

Sorry Chris, I'd assumed it was urgent, which is why I replied from my iPhone. As I mentioned, I didn't read the script that thoroughly (all the lines were wrapped exceedingly short). The info about how you run it, and that your bug only shows up after 1 1/2 weeks, would have been useful data.

Let's add some context...
Just got this error

<code>
dtrace: error on enabled probe ID 6 (ID 11792: syscall::rename:return): invalid address (0x0) in action #1 at DIF offset 28
</code>
which I thought I'd solved above. Apparently not :( Any pointers gratefully received.

And your code...

#################################
# --- DTrace ---
#
# NB: seem to need the single quotes around the DTrace code ...

That's a shell script thing, not a DTrace issue.

# This also means that even the contents of comment blocks CANNOT have single quotes
# in them eg don't, won't etc... (sigh...)

Again, this is a shell script thing. If you want a literal ' inside the single-quoted block, you can always write '"'"' (close the quote, add a double-quoted ', then reopen the quote).
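
A minimal sketch of that quoting trick in plain shell, with nothing DTrace-specific in it (the variable name msg is only for illustration):

```shell
# Embed a literal single quote in a single-quoted string:
#   '    closes the current single-quoted section
#   "'"  supplies one single quote, protected by double quotes
#   '    reopens the single-quoted section
msg='don'"'"'t panic'

# The shell concatenates the three pieces into one word.
printf '%s\n' "$msg"    # prints: don't panic
```

The same sequence should work inside the block passed to dtrace -n, since the shell has finished its quote processing before dtrace ever sees the text.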

/usr/sbin/dtrace -n  '

... or you can put the D in a separate script and start it with...

#!/usr/sbin/dtrace -s

... and use ' and " to your heart's content.
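
As a rough sketch of the separate-script approach (not a drop-in replacement for your script): the temporary file created with mktemp and the BEGIN message are my own placeholders, and I am assuming the directory name is passed as the first argument. In a D script, $$1 expands to the first command-line argument as a quoted string, so DIRNAME no longer needs to be spliced in by the shell.

```shell
# Placeholder location for the D script; any writable path would do.
script=$(mktemp)

# The quoted 'EOF' delimiter stops the shell from expanding anything in
# the here-document, so the D source can use ' and " freely and the
# macro argument reaches dtrace untouched.
cat > "$script" <<'EOF'
#!/usr/sbin/dtrace -s

#pragma D option quiet

/* First command-line argument, taken as a string. */
inline string DIRNAME = $$1;

dtrace:::BEGIN
{
    /* Quotes inside comments and strings don't need escaping here. */
    printf("Watching for '%s'\n", DIRNAME);
}
EOF

chmod +x "$script"
# Then, with sufficient privileges:  "$script" mydir
```

The probe clauses from the original script would go after the BEGIN clause unchanged.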

  /* Params from shell input */
  inline string DIRNAME  = "'$dirname'";

  #pragma D option quiet
  #pragma D option switchrate=10hz

  /*
   * Print header
   */
  dtrace:::BEGIN
  {
     /* print main headers: We cannot line up final arg hdr exactly
      * because the cmd len varies
      */
     printf("%-20s %-12s %5s  %5s  %6s  %6s  %s ->  %s\n",
            "TIME", "ZONE", "GID", "UID", "PID", "PPID", "CMD", "TARGET") ;
  }

  /*
   *  Check exec event type
   */

syscall::unlink:entry
{
     /* Grab the dirname in qn to test later: remove any preceding path */
     /* Experiment seems to indicate unlink will not have this value in the
      * return state; contrast with rmdir below which may not have it in
      * entry state
      */
     tgt = basename(copyinstr(arg0));
}

/* http://docs.sun.com/app/docs/doc/817-6223/6mlkidlrg?l=en&a=view#indexterm-458 :
  * Avoiding Errors
  * The copyin() and copyinstr() subroutines cannot read from user addresses
  * which have not yet been touched so even a valid address may cause an
  * error if the page containing that address has not yet been faulted in
  * by being accessed.
  * To resolve this issue, wait for kernel or application to use the data
  * before tracing it.
  * For example, you might wait until the system call returns to apply
  * copyinstr()
  */

This can apply to unlink as well; your experiments were not exhaustive.

syscall::rmdir:entry, syscall::rename:entry
{
     /* Try saving a ptr to the relevant value for later, otherwise it gives
      * invalid addr error in return section below
      */
     self->file = arg0;
}

syscall::rmdir:return, syscall::rename:return
{
     /* Grab the dirname in qn to test later: remove any preceding path */
     tgt = basename(copyinstr(self->file));
}

The return probe needs to be matched with the entry probe. Because tgt is a global variable, in a race between two or more threads you cannot predict what tgt will contain.

/* Not matching on zone because tgt dir can be deleted from global,
  * although the users should not be able to get in there.
  */
syscall::rmdir:return, syscall::rename:return, syscall::unlink:return
/ DIRNAME == tgt /
{
     /* Print the field values. The TARGET tends not to line up as we
        print the cmd and the target name for completeness. For a shell level
        cmd, we will get the target name in the CMD field as well. For an
        "internal" cmd, eg rmdir() from within perl, the CMD field does not
        contain the target value.
     */
     printf("%-20Y %-12s %5d  %5d  %6d  %6d  %s ->  %s\n",
             walltimestamp, zonename, gid, uid, pid, ppid,
             curpsinfo->pr_psargs, tgt ) ;

     /* Clear the self->file ptr to avoid dynamic variable drop errors */
     self->file = 0;
}

Again, this needs to match the corresponding entry probe.

I suggest something like the following ...

syscall::rmdir:entry, syscall::rename:entry, syscall::unlink:entry
/arg0/
{
    self->file = arg0;
}

syscall::rmdir:return, syscall::rename:return, syscall::unlink:return
/self->file/
{
    self->tgt = basename(copyinstr(self->file));
}

syscall::rmdir:return, syscall::rename:return, syscall::unlink:return
/self->tgt == DIRNAME/
{
    printf("%-20Y %-12s %5d %5d %6d %6d %s -> %s\n",
        walltimestamp, zonename, gid, uid, pid, ppid,
        curpsinfo->pr_psargs, self->tgt);
}

syscall::rmdir:return, syscall::rename:return, syscall::unlink:return
/self->file/
{
    self->tgt = "";
    self->file = 0;
}

Because you failed to match entry and return probes, you may have missed the event you were looking for in a race between two or more threads. I think the issue you saw was probably that something called rename(2) with arg0 set to zero, which is why I predicate on arg0 being non-zero.

Phil

_______________________________________________
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org
