As a follow-up on my earlier email about the no snoop setting: if you
haven't checked that out yet, you probably should (cf. the last post in
  https://software.intel.com/en-us/forums/topic/401498).

I think that what you describe below might point to that. From a quick
look at the 82574L spec it looks like no snoop is disabled by default on
that card (i.e. the respective PCIe transactions will be snooped by the
cache), while the x540 spec seems to indicate that no snoop is enabled
by default (which was also my experience with the 82599).

I'm not quite sure about the behavior with mfences, though. In any case
I would rule out the no snoop issue first (it basically involves setting
one bit, iirc, and I think I found that in the FreeBSD ixgbe driver).
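
To make the "one bit" concrete, here is a rough C sketch of what I have
in mind. The register offset (CTRL_EXT at 0x18) and the bit (NS_DIS,
0x00010000) are from my memory of the ixgbe headers, so treat them as
assumptions and check them against the X540 datasheet. The bar0 pointer
and the helper names are made up for illustration.

```c
#include <stdint.h>

/* Assumed values, recalled from the ixgbe driver headers; verify them
 * against the X540 datasheet before relying on them. */
#define CTRL_EXT_OFFSET 0x00018u      /* Extended Device Control register */
#define CTRL_EXT_NS_DIS 0x00010000u   /* "No Snoop Disable" bit */

/* Hypothetical helpers: 32-bit accesses into the mapped BAR0 region. */
static inline uint32_t mmio_read32(volatile uint8_t *bar0, uint32_t off)
{
    return *(volatile uint32_t *)(bar0 + off);
}

static inline void mmio_write32(volatile uint8_t *bar0, uint32_t off,
                                uint32_t val)
{
    *(volatile uint32_t *)(bar0 + off) = val;
}

/* Set the No Snoop Disable bit so that the card's DMA writes are
 * snooped by the CPU caches, leaving all other bits untouched. */
static void x540_disable_no_snoop(volatile uint8_t *bar0)
{
    uint32_t ctrl_ext = mmio_read32(bar0, CTRL_EXT_OFFSET);

    ctrl_ext |= CTRL_EXT_NS_DIS;
    mmio_write32(bar0, CTRL_EXT_OFFSET, ctrl_ext);

    /* Read back so the posted write reaches the device. */
    (void)mmio_read32(bar0, CTRL_EXT_OFFSET);
}
```

In your assembly driver this should boil down to a read, an or, and a
write on that register during initialization, before you enable receive.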

On Thu, Apr 23 02:53, Brandon Falk wrote:
> This is verrrrry strange indeed. The mfence works but the lfence does not.
> On top of these I have tried *many* other different operations which I
> thought may have an effect. You can find the entire code snippet for these
> tests at the end of this email. When the code was being tested each
> 'section' was individually uncommented and then the result of that test is
> placed in a comment above the test.
> 
> After doing these tests and scratching my head, I decided to do a test with
> my old 82574L driver. I removed the X540 from the test machine and in the
> same PCIe slot placed in the 82574L card which I have previously used (also
> using the same ethernet connection). Since the descriptor format for legacy
> descriptors is identical in the X540 and 82574L, I used the exact same code
> (posted below) as I used on the X540. On this card *no fences* were needed,
> indicating that either A. I'm initializing the X540 in a manner that somehow
> makes this behaviour possible. B. the X540 (or maybe my specific one) has a
> bug that is causing a need to force these fences. C. Maybe it's not a bug but
> something that needs to be documented. I still find it very strange that a
> write fence changes how things operate when no writes are actually being
> done where I'm fencing.
> 
> Some other tests I have done:
> 
> - Have other processors spin and do mfences while the main core does not do
> an mfence. This did not make it work, and this is expected as an mfence
> only should locally change behaviour.
> - mfences all over the X540 initialization and prior to doing the DD
> polling. Did not fix the problem.
> 
> Some things I could think of that would cause this problem:
> 
> - It's just a bug in the X540, or mine specifically. (If I'm not too lazy
> maybe I'll swap my X540s around between machines and try on another one).
> - It's a bug in my initialization, but I would find this strange as my
> 82574L driver initializes in almost an identical fashion.
> - It's a bug in my motherboard/CPU, but only on >=8x channel PCIe cards,
> which would explain why it didn't affect the 82574L.
> 
> --------- Example code ---------------------
> 
> ; Get the rx entry
> mov rdx, qword [gs:thread_local.x540_rx_ring_base]
> mov rax, qword [gs:thread_local.x540_rx_head]
> shl rax, 4
> 
> ; XXX: Temporary, used for the loop counter for testing.
> xor ebp, ebp
> 
> ; Putting an mfence/lfence/sfence here has no effect on the result.
> 
> mov rsi, qword [rdx + rax + 0] ; pointer to packet contents
> 
> ; Putting an mfence/lfence/sfence here has no effect on the result.
> 
> .lewp:
> ; Putting an mfence/lfence/sfence here has no effect on the result.
> 
> ; Wait until a packet is present here by polling the DD bit
> test dword [rdx + rax + 8 + 4], 1
> jz   short .lewp
> 
> ; Without anything here, we fail with nothing getting printed.
> 
> ; Successful
> ;mfence
> 
> ; Failure, never prints
> ;lfence
> 
> ; Successful
> ;sfence
> 
> ; Successful (the pushes and pops do not change the result, fails without
> ; rdtsc)
> ;push rax
> ;push rdx
> ;rdtsc
> ;pop rdx
> ;pop rax
> 
> ; Failure, prints out 3 to the screen. Meaning we read the value 3 times
> ; before it became accurate. On a second attempt it printed 3 as well.
> ;clflush [rsi + (udp_template_10g.ulen - udp_template_10g)]
> 
> ; Successful
> ;wbinvd
> 
> ; Failure, never prints
> ; mov rcx, 1024
> ;.simple_pause:
> ; dec rcx
> ; jnz short .simple_pause
> 
> ; Failure, never prints
> ; mov rcx, 1024
> ;.do_some_reads:
> ; pop r15
> ; dec rcx
> ; jnz short .do_some_reads
> 
> ; Failure, never prints
> ; mov rcx, 1024
> ;.do_some_writes:
> ; push r15
> ; dec rcx
> ; jnz short .do_some_writes
> 
> ; Successful
> ;invlpg [rsi + (udp_template_10g.ulen - udp_template_10g)]
> 
> ; Failure, never prints
> ;prefetch [rsi + (udp_template_10g.ulen - udp_template_10g)]
> 
> ; Failure, never prints
> ;lock inc qword [rsp]
> 
> ; Spinloop and keep track of how many spins we have done in ebp. We spin
> ; until the packet indicates a UDP length of 0x14 bytes.
> .spin:
> inc ebp
> cmp word [rsi + (udp_template_10g.ulen - udp_template_10g)], 0x1400
> jne short .spin
> 
> ; Print out the value in ebp to the screen (count from the loop above).
> mov  edx, ebp
> call outhexq
> 
> cli
> hlt


-- 
Antoine Kaufmann
<antoi...@cs.washington.edu>

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit
http://communities.intel.com/community/wired