> Hmm. Just speculating since I don't know enough about sun4v, but could it
> be something to do with the processor switching to another CMT thread
> when the process blocks waiting for memory?
>
> Since almost all of the user time is spent in one function (as
> determined by the profile provider) I started doing some timing of
> this particular function. I discovered that there are three distinct
> times that the thread can take through the code path. The shortest was
> about the same on the two processors, but the longest was quite a bit
> longer on the sun4v. I suspect that this longer time represents the
> time when the data is not in the cache. My understanding of CMT is
> that when the thread blocks on memory, the processor switches to a
> different thread. That would imply a different address mapping space
> as well, wouldn't it? Perhaps there is a window where the PC points
> one place, but the address space is pointing another? Just a WAG.
Yeah, I wouldn't like to say without looking at the particular
issue, although I can see how you could end up with the behaviour
you're seeing. At some point I've got to look at another mmu
context/ustack() related issue on 4v and I'll verify the source of
this then.
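
In the meantime, for the timing you were doing by hand, the pid
provider will give you a per-call latency distribution directly.
This is just a sketch and untested; "hot_function" is a stand-in
for whichever function you've identified, and you'd run it with -p
against the ssh pid:

#!/usr/sbin/dtrace -s

/*
 * On-CPU time per call of the hot function ("hot_function" is a
 * placeholder name; vtimestamp excludes time spent off-CPU).
 */
pid$target::hot_function:entry
{
        self->ts = vtimestamp;
}

pid$target::hot_function:return
/self->ts/
{
        @["on-CPU ns per call"] = quantize(vtimestamp - self->ts);
        self->ts = 0;
}

If the three distinct times you're measuring are real, they should
show up as separate bands in the quantize() output.
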
Jon.
>
> Jon Haslam wrote:
>
>>> One more thing. On the T1000, I run this script:
>>>
>>> #!/usr/sbin/dtrace -s
>>>
>>> profile-997
>>> /arg1 && execname == "ssh"/
>>> {
>>>         @[execname, pid, ustack()] = count();
>>> }
>>>
>>> profile-997
>>> /arg1 && execname == "sshd"/
>>> {
>>>         @[execname, pid, ustack()] = count();
>>> }
>>>
>>> profile-50s
>>> {
>>>         exit(0);
>>> }
>>>
>>> END
>>> {
>>>         trunc(@, 1000);
>>> }
>>>
>>> and I get a whole bunch of errors that look like this:
>>>
>>> dtrace: error on enabled probe ID 1 (ID 1853: profile:::profile-997):
>>> invalid address (0x3bb98000) in action #4
>>>
>>> The only difference between them is that the address is different
>>> each time. I don't get these messages on the v220 at all. Any ideas
>>> what the problem is?
>>>
>>>
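
For what it's worth, one way to get more detail than the default
error lines is to enable the dtrace ERROR probe yourself and
aggregate on its arguments; if I remember right, arg3 is the fault
type and arg4 is the faulting address for bad-address faults.
Just a sketch, untested:

dtrace:::ERROR
/execname == "ssh" || execname == "sshd"/
{
        /* fault type and faulting address per process */
        @faults[execname, arg3, arg4] = count();
}

Running that alongside the profile script should at least tell you
whether all the errors are the same fault type and whether the
addresses repeat.
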
>> I had a quick look at this a while ago and here are a few
>> observations I made. Firstly though, what version of Solaris are
>> you running? Do you see the errors spasmodically or all the time?
>>
>> It could be down to the fact that early on in a process's lifecycle,
>> and periodically during its lifetime (for mmu synchronisation),
>> we can have an invalid user mmu context. When we go to walk the
>> user stack we'll have an invalid secondary context and you'll
>> see these errors.
>>
>> It could also be that you're catching a process very early in its
>> lifecycle, when we have an address space set up but no mappings
>> loaded yet. When we walk the stack, the loads induce user pagefaults
>> and we get the invalid address errors.
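
A rough way to test the lifecycle theory on your box would be to
record how old the process is whenever one of these faults fires.
Another sketch, untested; the age here is wall-clock milliseconds
since the process started:

dtrace:::ERROR
/execname == "ssh" || execname == "sshd"/
{
        /* process age in ms at the time of the fault */
        @age[execname] = quantize((walltimestamp -
            ((uint64_t)curpsinfo->pr_start.tv_sec * 1000000000 +
            curpsinfo->pr_start.tv_nsec)) / 1000000);
}

If the theory is right, the distribution should be weighted heavily
towards very young processes.
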
>>
>> I would expect to see these errors on all SPARC, though, not just
>> 4v, so I'm not sure why you don't see them on 4u. Then again, it
>> could be something completely different...
>>
>> Jon.
>>
>>
>>> Brian Utterback wrote:
>>>
>>>
>>>> I have been doing some profiling using the profile provider. I have a
>>>> command that runs more slowly on the T1000 than it does on prior
>>>> systems and I am trying to find out why. Using the profile provider
>>>> at 1000 hz, and aggregating on the ustack output, I find that the
>>>> same function appears at the top of the stack on both platforms, but
>>>> on each there are specific instruction locations within the function
>>>> that appear most often and these are different on the two platforms.
>>>> They are consistent on a given platform. That is, when I re-run the
>>>> test on one platform, roughly 4 specific PC locations appear in the
>>>> top spots; it is always the same 4 on that platform, and the 4 are
>>>> different between the two platforms.
>>>>
>>>> So, I am trying to figure out whether there is something special
>>>> happening at those locations, or just before or just after them, or
>>>> whether they are just artifacts of how the profile provider works.
>>>> There are two function calls within this function, but neither set
>>>> of 4 locations seems to be near the calls. And if the time were
>>>> really being spent in the next level down, wouldn't that have been
>>>> reflected in the ustack output?
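
On the specific PC locations: rather than reading them off the top
frames of ustack(), you could aggregate on the sampled PC itself
(when arg1 is non-zero in the profile probes it's the user PC).
Untested sketch; uaddr() should print function+offset for each hit:

profile-997
/arg1 && execname == "ssh"/
{
        @pcs[uaddr(arg1)] = count();
}

That should show directly whether the samples cluster on a handful
of instructions and where those sit relative to the two calls.
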
>>>>
>>>>
>>>
>>>
>
>
_______________________________________________
dtrace-discuss mailing list
[email protected]