Hmm. Just speculating since I don't know enough about sun4v, but
could it be something to do with the processor switching to another
CMT thread when the process blocks waiting for memory?

Since almost all of the user time is spent in one function (as
determined by the profile provider), I started doing some timing of
this particular function. I discovered that there are three distinct
times that the thread can take through the code path. The shortest
was about the same on the two processors, but the longest was quite
a bit longer on the sun4v. I suspect that this longer time represents
the case where the data is not in the cache. My understanding of CMT
is that when a thread stalls on memory, the processor switches to a
different hardware thread. That would imply a different address
mapping space as well, wouldn't it? Perhaps there is a window where
the PC points one place, but the address space is pointing another?
Just a WAG.
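
For reference, a minimal sketch of how the per-call timing can be
done ("myfunc" is just a placeholder for the real function; run the
script with -p against the ssh pid):

#!/usr/sbin/dtrace -s
/* Sketch only: "myfunc" stands in for the actual hot function. */
pid$target::myfunc:entry
{
        self->ts = timestamp;
}

pid$target::myfunc:return
/self->ts/
{
        /* Elapsed time per call; three distinct code-path times
           should show up as three bands in the quantize output. */
        @["ns per call"] = quantize(timestamp - self->ts);
        self->ts = 0;
}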


Jon Haslam wrote:
> 
>> One more thing. On the T1000, I run this script:
>>
>> #!/usr/sbin/dtrace -s
>> profile-997
>> /arg1 && execname == "ssh"/
>> {
>>          @[execname, pid, ustack()] = count();
>> }
>> profile-997
>> /arg1 && execname == "sshd"/
>> {
>>          @[execname, pid, ustack()] = count();
>> }
>> profile-50s
>> {
>>          exit(0);
>> }
>> END
>> {
>>          trunc(@, 1000);
>> }
>>
>> and I get a whole bunch of errors that look like this:
>>
>> dtrace: error on enabled probe ID 1 (ID 1853: profile:::profile-997): 
>> invalid address (0x3bb98000) in action #4
>>
>> The only difference between them is that the address is different
>> each time. I don't get these messages on the v220 at all. Any ideas
>> what the problem is?
>>   
> 
> I had a quick look at this a while ago and here are a few
> observations I made. Firstly though, what version of Solaris are
> you running? Do you see the errors sporadically or all the time?
> 
> It could be down to the fact that early on in a process's lifecycle,
> and periodically during its lifetime (for MMU synchronisation),
> we can have an invalid user MMU context. When we go and walk the
> user stack we'll have an invalid secondary context and you'll
> see these errors.
> 
> Also, it could be that you're catching a process very early in
> its lifecycle, when we have an address space set up but no
> mappings loaded yet. When we walk the stack, the loads induce
> user pagefaults and we get the invalid address errors.
> 
> I would expect to see these errors on all SPARC, not just sun4v,
> so I'm not sure why you don't see them on sun4u. It could be
> something completely different, though...
> 
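> If you want to see exactly which fault you're hitting, the
> dtrace:::ERROR probe fires on each of those errors; a minimal
> sketch (untested) that counts them by fault type and address:
> 
> dtrace:::ERROR
> {
>         /* arg3 is the fault type; for a bad-address fault,
>            arg4 is the offending address. */
>         @faults[arg3, arg4] = count();
> }
> 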
> Jon.
> 
>>
>> Brian Utterback wrote:
>>  
>>> I have been doing some profiling using the profile provider. I have
>>> a command that runs more slowly on the T1000 than it does on prior
>>> systems and I am trying to find out why. Using the profile provider
>>> at 1000 Hz and aggregating on the ustack output, I find that the
>>> same function appears at the top of the stack on both platforms,
>>> but on each platform there are specific instruction locations
>>> within the function that appear most often, and these are different
>>> on the two platforms. They are consistent on a platform: when I
>>> re-run the test on one platform, about 4 specific PC locations will
>>> appear in the top spots, and on that platform it is always the same
>>> 4, but the 4 are different between the platforms.
>>>
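>>> The per-PC counts can be pulled out directly with something
>>> like this ("ssh" stands in for the actual command, and %A
>>> formats a user address symbolically):
>>>
>>> profile-997
>>> /arg1 && execname == "ssh"/
>>> {
>>>         @[arg1] = count();
>>> }
>>> END
>>> {
>>>         printa("%@8u  %A\n", @);
>>> }
>>>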
>>> So, I am trying to figure out whether there is something special
>>> happening at those locations, or just before or just after, or
>>> whether they are just artifacts of how the profile provider works.
>>> There are two function calls within this function, but neither set
>>> of 4 locations seems to be near the calls. And if the time were
>>> really spent inside the next level down, wouldn't that have been
>>> reflected in the ustack output?
>>>     
>>
>>   
> 

-- 
blu

There are two rules in life:
Rule 1- Don't tell people everything you know
----------------------------------------------------------------------
Brian Utterback - Solaris RPE, Sun Microsystems, Inc.
Ph:877-259-7345, Em:brian.utterback-at-ess-you-enn-dot-kom