On Wed, Sep 28, 2011 at 10:54 AM, sparsh mittal <[email protected]> wrote:
> On Wed, Sep 28, 2011 at 11:55 AM, avadh patel <[email protected]> wrote:
>
>> Even if you are running multi-prog workloads, the kernel will share some
>> data. For your scheme to work, you need to implement a special WT-L1 and
>> L2 cache that makes sure that on each write the cache line is locked in
>> the shared L2, so the write update is seen as atomic and hence you
>> serialize all memory updates for correctness.
>>
>> I have done some implementation of multi p2p.
>
> 1. If I only take user-level stats, do I still need to implement what you
> said above?
>

Statistics are totally different from actual execution. Even if you are only
interested in user-level stats, the simulator will still run kernel-level
code and you need to make sure that it is executed correctly. So you need to
make sure that you serialize all the memory accesses.

> 2. Now, my understanding of how to apply your comment above is: when an L1
> cache writes to L2, I need to make sure that
>
> a. No other L1 cache is updating L2 at that time, and no other L1 can
> replace that block before it is written.
>
> b. Something else?
>

For correctness, before a write is performed on the L2 cache, L2 has to send
an 'invalidate' signal to all other L1 caches so that they can remove their
outdated cache line; on the next read/write they will request the updated
data from the L2 cache. With this method you send an invalidate request to
all L1 caches on every write, which will fill up all the queues quickly. So
it would be a bad idea to implement this, because the penalty is huge.

The other method is to keep a 'present' bit per L1 in each L2 cache line to
keep track of which L1 caches have that line, and to send the invalidate
message only to those caches. With this method it looks more like
directory-based coherence, so it may be better to just use cache coherence.
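
Very roughly, the 'present' bit idea looks like the sketch below. This is
only a sketch: the class and function names are made up for illustration and
are not the actual MARSS/PTLsim code.

#include <cstdint>

// Sketch of per-line sharer tracking in a shared L2 (the 'present' bit
// method). All names here are illustrative only.
struct L2Line {
    uint64_t tag = 0;
    uint32_t presentMask = 0;   // bit i set => L1 cache i may hold this line
};

class L2Directory {
    int numL1s;

public:
    explicit L2Directory(int l1Count) : numL1s(l1Count) {}

    // When L1 'l1Id' fetches a line from L2, remember it as a sharer.
    void onL1Fill(L2Line& line, int l1Id) {
        line.presentMask |= (1u << l1Id);
    }

    // Before a write from L1 'writerId' updates the line in L2, invalidate
    // only the L1s whose present bit is set, instead of broadcasting an
    // invalidate to every L1 on every write.
    void onWrite(L2Line& line, int writerId) {
        for (int i = 0; i < numL1s; ++i)
            if (i != writerId && (line.presentMask & (1u << i)))
                sendInvalidate(i, line.tag);
        line.presentMask = (1u << writerId);   // only the writer holds it now
    }

private:
    // Stub: in the simulator this would enqueue an invalidate message to
    // the given L1 cache controller.
    void sendInvalidate(int /*l1Id*/, uint64_t /*tag*/) {}
};

On every L1 fill the corresponding bit is set, so a write invalidates only
the caches that can actually hold a stale copy, which is exactly why this
starts to look like directory-based coherence.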
Before you invest more time in implementing either of these methods, I
suggest that you run some simulations and collect some statistics about how
much coherence is affecting your benchmarks. As you said earlier, most of
the data is private, so the overhead of having cache coherence will be
minimal. You may save some power with write-through caches, but I think
either of these methods will also add some extra power consumption on top of
simple write-through caches.

- Avadh

> Thanks a lot for these insights.
>
>> - Avadh
>>
>>> Thanks and Regards
>>> Sparsh Mittal
>>>
>>> On Wed, Sep 28, 2011 at 12:10 AM, DRAM Ninjas <[email protected]> wrote:
>>>
>>>> Certainly you can do this, but unless I'm misunderstanding something,
>>>> it won't actually be 'correct'. If you have multiple L1 caches, you
>>>> have to enforce consistency on them -- which is precisely why the bus
>>>> there can only be a MESI bus. If you just have all of the private
>>>> caches write through, then who knows what value will end up going back
>>>> to memory and what each processor will see as the 'right' value.
>>>>
>>>> From a simulation standpoint it obviously won't matter, since there's
>>>> no data involved in the memoryHierarchy objects, but I don't know if
>>>> realism is a concern for you.
>>>>
>>>> On Wed, Sep 28, 2011 at 12:09 AM, avadh patel <[email protected]> wrote:
>>>>
>>>>> On Tue, Sep 27, 2011 at 2:59 PM, sparsh mittal
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> Hello
>>>>>> This is regarding a previous discussion (copied below):
>>>>>>
>>>>>> 1. So can we say that for multi-core, the L1 cache can only be mesi,
>>>>>> as per the code existing now?
>>>>>
>>>>> Now you can also use the moesi cache.
>>>>>
>>>>>> 2. If I think of implementing p2p_multi, would you give some hints on
>>>>>> which files/code sections would be affected (some hints you have
>>>>>> already given below)? Any precautions?
>>>>>
>>>>> Take a look at the ptlsim/cache/p2p.* files. You can add an array (use
>>>>> 'dynarray') in p2p which can store multiple upper-level controllers
>>>>> when they are registered. And when it receives a request, it can find
>>>>> the 'dest' controller in the array and forward the request to that
>>>>> controller.
>>>>>
>>>>> - Avadh
>>>>>
>>>>>>> Summary of my experiments:
>>>>>>> 1. The configuration with L1 = mesi runs fine, which you pointed out.
>>>>>>> 2. The config with L1 = write-through and L1-L2 as mesi does not
>>>>>>> work (copied below, item 1, reduced log file attached).
>>>>>>
>>>>>> I looked at the configuration and logfile, and the reason it is not
>>>>>> working is that the bus interconnect is designed for MESI only. So
>>>>>> when you attach write-through caches it doesn't work, because the bus
>>>>>> waits for a snoop response from all connected controllers. The WT
>>>>>> caches are not designed to perform snoop operations and send a
>>>>>> response back, so they ignore the request and don't respond at all.
>>>>>> (Look for 'responseReceived' in the logfile; it's at the end.) Due to
>>>>>> this behavior the cores wait for the cache request to complete and
>>>>>> never make any progress.
>>>>>>
>>>>>>> 3. The config with L1 = write-through and L1-L2 as p2p does not work
>>>>>>> (copied below, item 2, the log file has almost nothing).
>>>>>>
>>>>>> In this configuration, the first thing is that you used the 'p2p'
>>>>>> interconnect to attach all the L1 I/D caches and the L2 cache, but
>>>>>> 'p2p' supports connecting only 2 controllers.
>>>>>>
>>>>>> The solution to this issue is to create a new interconnect module
>>>>>> that allows you to send messages directly to the lower cache and send
>>>>>> the response back. Developing such a module should not take too long,
>>>>>> as you are not going to buffer any requests; you will just pass the
>>>>>> request from source to destination, just like in the 'p2p'
>>>>>> interconnect. But unlike 'p2p', this interface will support multiple
>>>>>> upper connections and a single lower connection. I suggest you take a
>>>>>> look at the p2p interconnect design to see how it passes a message
>>>>>> from one side to the other, and create a new interconnect, let's call
>>>>>> it 'p2p_multi', that allows multiple upper connections.
>>>>>>
>>>>>> Thanks and Regards
>>>>>> Sparsh Mittal
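
P.S. For the 'p2p_multi' interconnect described in the quoted discussion
above, the shape of the module would be roughly as in the sketch below. This
is only a sketch: 'Controller', 'Message', and the method names are invented
for illustration, and std::vector stands in for PTLsim's 'dynarray', so this
is not the actual MARSS/PTLsim interface.

#include <vector>

// Rough sketch of a 'p2p_multi'-style interconnect: several upper-level
// (L1) controllers share one lower-level (L2) controller, and the
// interconnect just forwards each message to its destination without
// buffering it. All types and names are illustrative only.
struct Controller { /* stands in for a cache controller */ };

struct Message {
    Controller* source = nullptr;
    Controller* dest   = nullptr;
    // ... address, request type, etc. would go here
};

class P2PMulti {
    std::vector<Controller*> upperControllers;  // multiple L1 I/D caches
    Controller* lowerController = nullptr;      // single L2 cache

public:
    void registerUpper(Controller* c) { upperControllers.push_back(c); }
    void registerLower(Controller* c) { lowerController = c; }

    // Pass a message straight through, like 'p2p', but pick the right
    // destination: requests from any upper controller go down to the single
    // L2, and responses from the L2 go back up to the controller named in
    // the message's 'dest' field.
    bool receiveMessage(Message* msg) {
        if (msg->source == lowerController) {
            for (Controller* upper : upperControllers)
                if (upper == msg->dest)
                    return deliver(upper, msg);  // response going up
            return false;                        // unknown destination
        }
        msg->dest = lowerController;             // request going down
        return deliver(lowerController, msg);
    }

private:
    // Stub: in the real module this would invoke the destination
    // controller's request handler.
    bool deliver(Controller* /*c*/, Message* /*msg*/) { return true; }
};

The point is just that the registration step keeps a list of upper
controllers, so requests can be forwarded directly to the single lower cache
without any buffering inside the interconnect itself.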
_______________________________________________ http://www.marss86.org Marss86-Devel mailing list [email protected] https://www.cs.binghamton.edu/mailman/listinfo/marss86-devel
