Re: problems with em(4) since update to driver 7.2.2

2011-05-05 Thread Arnaud Lacombe
Hi,

On Thu, May 5, 2011 at 1:20 AM, Arnaud Lacombe lacom...@gmail.com wrote:
 Hi,

 On Wed, May 4, 2011 at 5:38 PM, Jack Vogel jfvo...@gmail.com wrote:
 I have had my validation engineer busy all day, we have tried both
 a 9 kernel as well as 8.2,  using the code from HEAD, and we
 cannot reproduce this problem.

 Actually, it can be trivially reproduced by tainting `error'. As it is
 uninitialized in HEAD, its value can be _anything_, so let's mark it
 as explicitly invalid.

 diff -u ./if_em.c /data/src/freebsd/em-7.2.2/src/if_em.c
 --- ./if_em.c   2011-02-18 01:18:23.0 -0500
 +++ /data/src/freebsd/em-7.2.2/src/if_em.c      2011-05-05 01:12:01.0 -0400
 @@ -3912,7 +3912,7 @@
        struct  adapter         *adapter = rxr->adapter;
        struct em_buffer        *rxbuf;
        bus_dma_segment_t       seg[1];
 -       int                     i, j, nsegs, error;
 +       int                     i, j, nsegs, error = -1;

 The error pointed out in this thread pops up in the next boot.

I put a call to kdb_enter() at the beginning of the function and, with
the help of a textdump, got the backtrace [0] for every call to
em_setup_receive_ring(). All are exactly the same:

kdb_enter_why(0,c09f6511,f391aaa8,c09be1e2,c09f6511,...) at kdb_enter_why+0x3b
kdb_enter(c09f6511,0,3810,,5dc,...) at kdb_enter+0x19
em_setup_receive_ring(c3c8d600,c3c8d7a4,c3c96004,31fa,c3c8d600,...) at em_setup_receive_ring+0x22
em_setup_receive_structures(c3c96000,f15f2000,38,8100,3,...) at em_setup_receive_structures+0x26
em_init_locked(c3c96000,0,c09f5de5,414,1,...) at em_init_locked+0x2f2
em_ioctl(c3c7d000,80206934,c3ce9d00,c07b7a0b,c3f2a230,...) at em_ioctl+0x1c3
ifhwioctl(c3f2a230,f391ac34,c07b7a0b,c3f3e3d0,c08df1c0,...) at ifhwioctl+0x4b8
ifioctl(c3f3e3d0,80206934,c3ce9d00,c3f2a230,c3f2a230,...) at ifioctl+0x82
kern_ioctl(c3f2a230,3,80206934,c3ce9d00,c3ce9d00,...) at kern_ioctl+0xa8
ioctl(c3f2a230,f391acf8,c,c,f391ad2c,...) at ioctl+0xc5
syscall(f391ad38) at syscall+0x17d
Xint0x80_syscall() at Xint0x80_syscall+0x20
--- syscall (54, FreeBSD ELF32, ioctl), eip = 0x4816ee23, esp = 0xbfbfe67c, ebp = 0xbfbfe698 ---

This fully explains why the main loop in em_setup_receive_ring() is
never entered: we always have `j == rxr->next_to_check' (provided
that the mbufs have been refreshed if some packets were transferred), so
the function returns whatever value is on the stack. As of now, besides
changing the call site of em_setup_receive_ring() to ensure it is never
re-entered, I'd guess that the patch I sent earlier today is the only way
to ensure that no junk is returned.

I'd guess that the driver _would_ be able to transmit if the code did not
explicitly call em_stop() upon em_setup_receive_structures()
failure.
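
For anyone following along, here is a minimal user-space sketch of the failure mode described above. It is not the driver code: `ring_model`, `refill_one()` and `setup_ring_model()` are made-up names, and the loop shape is an assumption based only on the description in this mail. The point is simply that when the loop condition is already false on entry, the function returns whatever `error` started out as.

```c
/* Hypothetical stand-in for struct rx_ring; only the fields discussed here. */
struct ring_model {
	int next_to_check;	/* index at which the setup loop stops */
	int num_desc;		/* ring size */
};

/* Hypothetical per-slot refill; always succeeds in this model. */
static int
refill_one(struct ring_model *r, int slot)
{
	(void)r;
	(void)slot;
	return (0);
}

/*
 * Model of the setup loop: walk from `start' until next_to_check.
 * `initial_error' plays the role of the initializer of `error'.
 * In HEAD there is no initializer, so the real value is indeterminate.
 */
static int
setup_ring_model(struct ring_model *r, int start, int initial_error)
{
	int j, error = initial_error;

	for (j = start; j != r->next_to_check; j = (j + 1) % r->num_desc) {
		error = refill_one(r, j);
		if (error != 0)
			break;
	}
	/* If start == next_to_check, the loop body never ran. */
	return (error);
}
```

With `initial_error` set to -1 (the tainting patch) a skipped loop reports failure every time, which is what makes the problem reproducible; with 0 a skipped loop reports success.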

 - Arnaud

[0]: I wish that had been as easy as in Linux, where a WARN()
call does the whole job automatically, but still, I should not hope for
that much unless I am the one implementing it ... yes, free whining,
it's 2 a.m. ...

  - Arnaud

 The data your netstat -m shows suggests to me that what's happening
 is that setup of the receive ring is somehow running more than once, maybe?

 You asked at one point how this could go into STABLE. Well, because
 not only here at Intel, but at lots of external customers, this code has
 been used and tested thoroughly.

 I am not calling into question your problem, but until I understand what it
 is I cannot fix it :)

 The thing I am guessing right now is that the culprit is the setup code. The
 reason is that when I ported it to the igb driver I found that it did not
 work on our newer hardware, and so I went back to the older version of setup
 for igb. Now, even though I have not seen hardware fail with em, maybe there
 is some.

 To help me, give me a complete pciconf -lv, and if it's a name-brand system
 tell me that, including all hardware in it.

 If you like, Olivier, I can make a version of em for you that also reverts
 the setup code the way I did for igb, to see if that fixes it for you?

 Thanks for your patience,

 Jack
 ___
 freebsd-current@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/freebsd-current
 To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org




bsdlabel showing value zero on fsize, bsize and bps/cpg for all partitions

2011-05-05 Thread Sergi Seira

Hello,

On FreeBSD versions 6 and 7 I was relying on bsdlabel to get the block size:

# bsdlabel /dev/mirror/gm4s1
# /dev/mirror/gm4s1:
8 partitions:
#          size     offset    fstype   [fsize bsize bps/cpg]
  a:    4194304          0    4.2BSD     2048 16384 28528
  b:    8388608    4194304      swap
  c:  293041602          0    unused        0     0         # raw part, don't edit
  d:    2097152   12582912    4.2BSD     2048 16384 28528
  e:   52428800   14680064    4.2BSD     2048 16384 28528
  f:  225932738   67108864    4.2BSD     2048 16384 28528

but on 8.1 and 8.2 I get zero values :

# bsdlabel /dev/mirror/gm6s1
# /dev/mirror/gm6s1:
8 partitions:
#           size      offset    fstype   [fsize bsize bps/cpg]
  a:     4194304           0    4.2BSD        0     0     0
  b:    17186816     4194304      swap
  c:  1953525105           0    unused        0     0         # raw part, don't edit
  d:     2097152    21381120    4.2BSD        0     0     0
  e:    83886080    23478272    4.2BSD        0     0     0
  f:  1846160753   107364352    4.2BSD        0     0     0

Has anyone seen this?
Is it some step I missed on install?
Is there another command to get block size?

Thanks for your help,
regards,
Sergi


Re: problems with em(4) since update to driver 7.2.2

2011-05-05 Thread Jack Vogel
OK, but what this does not explain is why I do not see this if
it's so easily reproduced. What causes the failure case, any idea?

As I said, given the code was not feasible for igb anyway I would not
be unhappy about returning to the old way of doing things.

Jack




Re: bsdlabel showing value zero on fsize, bsize and bps/cpg for all partitions

2011-05-05 Thread Andrey V. Elsukov
On 05.05.2011 10:35, Sergi Seira wrote:
 On FreeBSD versions 6 and 7 I was relying on bsdlabel to get the block size:
 Has anyone seen this?
 Is it some step I missed on install?
 Is there another command to get block size?

I think dumpfs(8) is a better tool for doing that.
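
If a programmatic check is preferred over parsing command output, statvfs(3) is one option for a mounted file system. A minimal sketch follows; `fs_sizes()` is a made-up helper name, and whether `f_bsize` and `f_frsize` correspond exactly to the bsize and fsize columns that bsdlabel used to print is an assumption that should be verified against dumpfs(8) output.

```c
#include <sys/statvfs.h>

/*
 * Query the sizes reported by statvfs(3) for the file system that
 * contains `path'.  Returns 0 on success, -1 on failure (errno is set
 * by statvfs).  The mapping of these fields onto UFS bsize/fsize is an
 * assumption, not something confirmed in this thread.
 */
static int
fs_sizes(const char *path, unsigned long *bsize, unsigned long *frsize)
{
	struct statvfs sv;

	if (statvfs(path, &sv) != 0)
		return (-1);
	*bsize = (unsigned long)sv.f_bsize;
	*frsize = (unsigned long)sv.f_frsize;
	return (0);
}
```

This only works on a mounted file system, unlike dumpfs(8), which reads the superblock from the device directly.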

-- 
WBR, Andrey V. Elsukov





My problems with stability on -current

2011-05-05 Thread Doug Barton
This is long, sorry. I wish I could condense things down to just the 
answer, or even just the question, but here goes. I've used HEAD on my 
main workstation(s) for many years. It's common for there to be ups and 
downs, and that's fine. Lately however the problems have been debilitating.


First a timeline. Since sometime before January 2008 I've been using a 
Dell Latitude D620 laptop as my primary system. It has a Core 2 Duo 
running at 2.33 GHz and 2 GB of RAM. I quad-boot it with Windows XP, FreeBSD 
current (amd64), another FreeBSD (usually 8.N-RELEASE i386) and Ubuntu. 
On the first and last I don't do a lot of compiling obviously, but even 
under heavy load on 8.2-RELEASE I'm not seeing problems, so the problems 
I _am_ seeing are not hardware related.


I keep my system very close to stock. My kernel config is GENERIC minus 
devices I don't have, and plus the following:


options EXT2FS
options IEEE80211_DEBUG # enable debug msgs
options VESA
device  atapicam
device  sound
device  snd_hda
device  snp

I was building with clang for a while, but when the problems started I 
went back to gcc. I still have INVARIANTS on but I disabled WITNESS 
because with all the known+unfixed LORs it's kind of pointless. Nothing 
interesting in make/src.conf either, the latter is just a list of stuff 
not to build, KERNCONF, and MODULES_OVERRIDE.


Starting around December 2009 I started having problems under load with 
-current. Often I reported them, sometimes problems were found, 
sometimes not. In the course of trying to debug those problems I 
disabled throttling, which helped. Switching to SCHED_4BSD also helped 
quite a bit with interactivity under load, although it was still worse 
than on 8.x.


In October of 2010 I was lucky enough to receive a donation of a Dell 
Optiplex desktop that I started using as my primary workstation. Around 
that same time there was some work being done in the scheduler(s) and 
various related systems, and my desktop (which had a slightly faster 
Core 2 Duo and 4 GB of RAM) was running great. I assumed that the problems 
were solved.


Then 2 months ago I packed up the desktop system and pulled out the 
laptop again. I updated to the latest -current on the laptop, and all 
heck broke loose. I couldn't do anything on my laptop that created even 
a mediocre load without it crashing. Trying to do something like a 
buildworld (even without -j) would cause the system to absolutely crawl. 
I'd get tons of the dreaded calcru messages about time going 
backwards, and the system clock would lose literally minutes of wall 
clock time. At one point when I could keep it up long enough to build 
the world without crashing it had lost 40 minutes of wall clock time 
when it finished. I think that specific problem happened sometime 
between March 15 and r220282.


In trying to find that problem, I uncovered another, deeper problem with 
the one-shot timers from r212541. In order to make my binary search 
easier for the problem described above I was using a -current snapshot 
CD from August 2010 that I had laying around. I could easily build world 
with -j2, run X, do normal desktop stuff (firefox, thunderbird, pidgin, 
etc.) all at the same time. When I got closer to the more recent 
-current, it would crash as soon as I put a load on it. I eventually 
bisected down to that exact commit. I've been running on 212540 for 
over a week now without any problems, including lots of port builds with 
FORCE_MAKE_JOBS, etc.
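
For readers who have not done this kind of search before, the idea can be sketched as a plain binary search over revision numbers. This is only an illustration under stated assumptions: `first_bad_revision()` and `bad_from()` are made-up names, the predicate stands in for "build, boot, put a load on it, see if it crashes", and the search is only valid if every revision after the first bad one is also bad (which may not hold when two separate problems overlap, as described above).

```c
/* Predicate type: returns nonzero if the given revision shows the problem. */
typedef int (*rev_is_bad_fn)(long rev, void *ctx);

/*
 * Find the first bad revision in (good, bad], assuming `good' is known
 * good, `bad' is known bad, and badness is monotonic in between.
 * Needs roughly log2(bad - good) test builds.
 */
static long
first_bad_revision(long good, long bad, rev_is_bad_fn is_bad, void *ctx)
{
	while (bad - good > 1) {
		long mid = good + (bad - good) / 2;

		if (is_bad(mid, ctx))
			bad = mid;	/* problem already present at mid */
		else
			good = mid;	/* problem introduced after mid */
	}
	return (bad);
}

/* Illustrative predicate only: bad from a fixed revision onward. */
static int
bad_from(long rev, void *ctx)
{
	return (rev >= *(const long *)ctx);
}
```

In real use the predicate is a human (or a script) reporting the result of each test build, so the halving is what keeps the number of buildworlds manageable.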


Alexander suggested some knobs to twist for the timers, and I'll be glad 
to do that once he gets back to me with more concrete suggestions now 
that he knows more about my specific problems.



Doug

--

Nothin' ever doesn't change, but nothin' changes much.
-- OK Go

Breadth of IT experience, and depth of knowledge in the DNS.
Yours for the right price.  :)  http://SupersetSolutions.com/



Re: Interrupt storm with MSI in combination with em1

2011-05-05 Thread Daan Vreeken
Hi Jack,

On Thursday 05 May 2011 02:25:39 Jack Vogel wrote:
 OK, but the reason you see the multiple cases of irq 16 is that that's the
 bridge; once you are using MSIX, as vmstat shows, it's using other vectors.

 Can you capture the messages file with the actual storm happening?

I'll do that as soon as I witness another storm. Right now the system has been 
up over half a day (with MSI/MSIX enabled) and everything seems to be working 
as it should.

 I noticed some complaints about checksums in the dmesg, have you
 checked on BIOS upgrades or something like that on your motherboard?

Not yet. I'll reboot the machine later today when I have physical access to it 
to check the BIOS version. I'll keep you informed as soon as I get another 
storm going.
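
When the next storm does show up, the rate can be cross-checked from two raw counter samples rather than read off by eye. A small sketch, assuming the samples are taken from the per-IRQ total that vmstat -i prints; `irq_rate()` and `looks_like_storm()` are made-up helper names and the threshold is left to the caller.

```c
#include <stdint.h>

/*
 * Average interrupt rate between two samples of a per-IRQ counter,
 * in interrupts per second.  Returns 0.0 if the interval is not
 * positive or the counter went backwards (e.g. after a reboot).
 */
static double
irq_rate(uint64_t count_before, uint64_t count_after, double seconds)
{
	if (seconds <= 0.0 || count_after < count_before)
		return (0.0);
	return ((double)(count_after - count_before) / seconds);
}

/* Nonzero if the measured rate exceeds the caller-chosen threshold. */
static int
looks_like_storm(double rate, double threshold)
{
	return (rate > threshold);
}
```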


 On Wed, May 4, 2011 at 4:27 PM, Daan Vreeken d...@vehosting.nl wrote:
  On Thursday 05 May 2011 00:15:43 you wrote:
   This all looks completely kosher,  what IRQ is the storm on??

  IRQ 16. Further down this email there is a list of devices that share the
  IRQ according to 'dmesg'.

   On Wed, May 4, 2011 at 3:04 PM, Daan Vreeken d...@vehosting.nl wrote:
    Hi,

    On Wednesday 04 May 2011 20:47:32 Jack Vogel wrote:
     Will you please set it back to a default and then boot and capture
     the message for me?

    No problem. Here's the output with MSI/MSIX enabled :

    http://vehosting.nl/pub_diffs/dmesg_plantje2_with_msix_2011_05_04.txt

    I've also added the output of vmstat -i a couple of minutes after a
    reboot with MSI enabled :
    http://vehosting.nl/pub_diffs/vmstat_i_2011_05_04.txt

    Note that in the above vmstat -i dump the interrupt storm hasn't
    started yet. For some reason the storm doesn't always start directly
    at boot. I haven't been able (yet) to pinpoint what's triggering it
    to start.

     On Wed, May 4, 2011 at 11:19 AM, Daan Vreeken d...@vehosting.nl wrote:
      Hi Jack,

      Wednesday 04 May 2011 19:46:05 Jack Vogel wrote:
       Who makes your motherboard? The problem you are having is that MSIX
       AND MSI are both failing as em0 comes up, so it falls back to Legacy
       interrupt mode, and must be having some issue with sharing the line,
       causing the storm.

      The motherboard is an Asus P7H55-M.

      Sorry, I should have mentioned that the dmesg output is from booting
      with :
       hw.pci.enable_msix=0
       hw.pci.enable_msi=0

      .. in loader.conf.

      With those lines in loader.conf, MSI and MSIX is disabled, both cards
      work like they should and there is no interrupt storm.

      With MSI/MSIX enabled, both cards work like they should and I see the
      counters of the MSI interrupts increase (in small amounts, like they
      should), but at boot-time an interrupt storm starts on 'legacy' IRQ 16.

      Because the only difference between disabling/enabling MSI/MSIX seems
      to be in the way em0/em1 are used, and because 'em1' shares IRQ 16
      according to the dmesg, I'm suspecting 'em1' is causing the
      storm. (But please correct me if I'm wrong :)

      What can I do to help track this problem down?

        According to dmesg the following devices share IRQ 16 :
         pcib1: ACPI PCI-PCI bridge irq 16 at device 1.0 on pci0
         em0: Intel(R) PRO/1000 Network Connection 7.2.3 port 0xcc00-0xcc1f mem 0xf7de-0xf7df,0xf7d0-0xf7d7,0xf7ddc000-0xf7dd irq 16 at device 0.0 on pci1
         vgapci0: VGA-compatible display port 0xbc00-0xbc07 mem 0xf780-0xf7bf,0xe000-0xefff irq 16 at device 2.0 on pci0
         ehci0: Intel PCH USB 2.0 controller USB-B mem 0xf7cfa000-0xf7cfa3ff irq 16 at device 26.0 on pci0
         em1: Intel(R) PRO/1000 Network Connection 7.2.3 port 0xec00-0xec1f mem 0xf7fe-0xf7ff,0xf7f0-0xf7f7,0xf7fdc000-0xf7fd irq 16 at device 0.0 on pci4
         pcib4: ACPI PCI-PCI bridge irq 16 at device 28.5 on pci0

        During a storm vmstat -i shows a rate of about 220.000
        interrupts/sec. MSI interrupt delivery to both 'em0' and 'em1'
        seems to work correctly during a storm, as I see their counters
        increase normally in the vmstat -i output.
        As only 'em0' and 'em1' seem to be using MSI interrupts, my guess
        is that the e1000 driver is causing this problem. Could it be that
        the driver forgets to clear/mask legacy interrupts when attaching
        the MSI interrupts perhaps?

        Any tips on how to debug and/or fix this?

        The full output of dmesg can be found here :

        http://vehosting.nl/pub_diffs/dmesg_plantje2_2011_05_04.txt

        And the

Re: problems with em(4) since update to driver 7.2.2

2011-05-05 Thread Olivier Smedts
Hello,

2011/5/4 Arnaud Lacombe lacom...@gmail.com:
 Hi,

 On Wed, May 4, 2011 at 3:58 AM, Olivier Smedts oliv...@gid0.org wrote:
 em0: Using an MSI interrupt
 em0: Ethernet address: d4:85:64:b2:aa:f5
 em0: Could not setup receive structures
 em0: Could not setup receive structures

 What can we do to help you debug this ?

 At some point in time, in late February, I had the same issue on a
 6-interface machine. I tracked this down to the fact that the main
 loop in em_setup_receive_ring() was not being entered. This resulted
 in junk being returned, as `error' is not explicitly initialized. At
 the time, the following patch worked for me. Without it the driver was
 unable to initialize with an RX/TX ring size of 512. With it, a ring
 size of 1024 initialized fine.

 diff --git a/sys/dev/e1000/if_em.c b/sys/dev/e1000/if_em.c
 index fb6ed67..f02059a 100644
 --- a/sys/dev/e1000/if_em.c
 +++ b/sys/dev/e1000/if_em.c
 @@ -3901,7 +3901,7 @@ em_setup_receive_ring(struct rx_ring *rxr)
        struct  adapter         *adapter = rxr->adapter;
        struct em_buffer        *rxbuf;
        bus_dma_segment_t       seg[1];
 -       int                     i, j, nsegs, error;
 +       int                     i, j, nsegs, error = 0;

This patch did the trick for me. I'll post what Jack asked for in the
following mail.

 I did not dig much more at the time, but I was definitely seeing an
 odd behavior. Anyhow, I am no longer able to reproduce this with
 7.2.3, so I cannot dig into more details.

 Btw, I wish you all luck, it took me nearly two full months to
 convince Jack (and other FreeBSD devs) that there was a bug in the
 mbuf refresh code.

  - Arnaud




-- 
Olivier Smedts                                                 _
                                        ASCII ribbon campaign ( )
e-mail: oliv...@gid0.org        - against HTML email  vCards  X
www: http://www.gid0.org    - against proprietary attachments / \

  There are only 10 kinds of people in the world:
  those who understand binary,
  and those who don't.


Re: problems with em(4) since update to driver 7.2.2

2011-05-05 Thread Olivier Smedts
Hello,

(sorry for dual posting)

2011/5/4 Jack Vogel jfvo...@gmail.com:
 I have had my validation engineer busy all day, we have tried both
 a 9 kernel as well as 8.2,  using the code from HEAD, and we
 cannot reproduce this problem.

 The data your netstat -m shows suggests to me that what's happening
 is that setup of the receive ring is somehow running more than once, maybe?

 You asked at one point how this could go into STABLE. Well, because
 not only here at Intel, but at lots of external customers, this code has
 been used and tested thoroughly.

 I am not calling into question your problem, but until I understand what it
 is I cannot fix it :)

 The thing I am guessing right now is that the culprit is the setup code. The
 reason is that when I ported it to the igb driver I found that it did not
 work on our newer hardware, and so I went back to the older version of setup
 for igb. Now, even though I have not seen hardware fail with em, maybe there
 is some.

 To help me, give me a complete pciconf -lv, and if it's a name-brand system
 tell me that, including all hardware in it.

The computer is a HP Compaq 8100 Elite Convertible Minitower PC.

Here is what I have with the new driver and Arnaud Lacombe's patch.

%uname -a
FreeBSD zozo.afpicl.lan 9.0-CURRENT FreeBSD 9.0-CURRENT #0 r219752:221420: Wed May  4 11:16:37 CEST 2011 r...@zozo.afpicl.lan:/usr/obj/usr/src/sys/CORE  amd64

%pciconf -lv
hostb0@pci0:0:0:0:  class=0x06 card=0x304b103c chip=0xd1318086 rev=0x11 hdr=0x00
    vendor     = 'Intel Corporation'
    class      = bridge
    subclass   = HOST-PCI
pcib1@pci0:0:3:0:   class=0x060400 card=0x304b103c chip=0xd1388086 rev=0x11 hdr=0x01
    vendor     = 'Intel Corporation'
    class      = bridge
    subclass   = PCI-PCI
none0@pci0:0:8:0:   class=0x088000 card=0x004b003c chip=0xd1558086 rev=0x11 hdr=0x00
    vendor     = 'Intel Corporation'
    class      = base peripheral
none1@pci0:0:8:1:   class=0x088000 card=0x004b003c chip=0xd1568086 rev=0x11 hdr=0x00
    vendor     = 'Intel Corporation'
    class      = base peripheral
none2@pci0:0:8:2:   class=0x088000 card=0x004b003c chip=0xd1578086 rev=0x11 hdr=0x00
    vendor     = 'Intel Corporation'
    class      = base peripheral
none3@pci0:0:8:3:   class=0x088000 card=0x004b003c chip=0xd1588086 rev=0x11 hdr=0x00
    vendor     = 'Intel Corporation'
    class      = base peripheral
none4@pci0:0:16:0:  class=0x088000 card=0x004b003c chip=0xd1508086 rev=0x11 hdr=0x00
    vendor     = 'Intel Corporation'
    class      = base peripheral
none5@pci0:0:16:1:  class=0x088000 card=0x004b003c chip=0xd1518086 rev=0x11 hdr=0x00
    vendor     = 'Intel Corporation'
    class      = base peripheral
none6@pci0:0:22:0:  class=0x078000 card=0x304b103c chip=0x3b648086 rev=0x06 hdr=0x00
    vendor     = 'Intel Corporation'
    class      = simple comms
none7@pci0:0:22:3:  class=0x070002 card=0x304b103c chip=0x3b678086 rev=0x06 hdr=0x00
    vendor     = 'Intel Corporation'
    class      = simple comms
    subclass   = UART
em0@pci0:0:25:0:    class=0x02 card=0x304b103c chip=0x10ef8086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    class      = network
    subclass   = ethernet
ehci0@pci0:0:26:0:  class=0x0c0320 card=0x304b103c chip=0x3b3c8086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    class      = serial bus
    subclass   = USB
hdac1@pci0:0:27:0:  class=0x040300 card=0x304b103c chip=0x3b568086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    class      = multimedia
    subclass   = HDA
pcib2@pci0:0:28:0:  class=0x060400 card=0x304b103c chip=0x3b428086 rev=0x05 hdr=0x01
    vendor     = 'Intel Corporation'
    class      = bridge
    subclass   = PCI-PCI
pcib3@pci0:0:28:4:  class=0x060400 card=0x304b103c chip=0x3b4a8086 rev=0x05 hdr=0x01
    vendor     = 'Intel Corporation'
    class      = bridge
    subclass   = PCI-PCI
pcib4@pci0:0:28:6:  class=0x060400 card=0x304b103c chip=0x3b4e8086 rev=0x05 hdr=0x01
    vendor     = 'Intel Corporation'
    class      = bridge
    subclass   = PCI-PCI
ehci1@pci0:0:29:0:  class=0x0c0320 card=0x304b103c chip=0x3b348086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    class      = serial bus
    subclass   = USB
pcib5@pci0:0:30:0:  class=0x060401 card=0x304b103c chip=0x244e8086 rev=0xa5 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '82801 Family (ICH2/3/4/5/6/7/8/9,63xxESB) Hub Interface to PCI Bridge'
    class      = bridge
    subclass   = PCI-PCI
isab0@pci0:0:31:0:  class=0x060100 card=0x304b103c chip=0x3b0a8086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    class      = bridge
    subclass   = PCI-ISA
ahci0@pci0:0:31:2:  class=0x010601 card=0x304b103c chip=0x3b228086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'IBEX AHCI Controller(6Port) (Intel Q57 Express)'
    class      = mass storage
    subclass   = SATA
vgapci0@pci0:1:0:0: class=0x03 card=0x10021002 chip=0x94981002 rev=0x00 hdr=0x00
   

Re: Switch from legacy ata(4) to CAM-based ATA

2011-05-05 Thread Sergey Kandaurov
2011/4/20 Alexander Motin m...@freebsd.org:
 Hi.

 With the 9.0 release approaching quickly, I believe now is the best time to
 manage the migration from the legacy ata(4) to the new CAM-based ATA. The new
 ATA code has been present in the tree for more than a year now, is used by
 many people, and has proved its superior functionality and reliability. The
 only major issue with it now is the migration process. Sooner or later we
 have to go through it, but due to major UI and API changes we can't do it
 after the 9.0 release. So I propose to do it next Sunday (April 24) to
 have as much time for troubleshooting as possible.

 I have prepared the following patch to do it:
 http://people.freebsd.org/~mav/ata_switch.patch

 I haven't added geom_raid to the kernel configurations because we have
 no other GEOM classes there. But tell me if you think I should.

 If somebody has any problems with the new ATA stack, please repeat your
 tests with the latest HEAD code and contact me if the problem is still there.
 I am going to dedicate the next three weeks before BSDCan to fixing any
 remaining issues.


XENHVM uses its own naming scheme and can name disks as daN or adN,
depending on the virtual block device id. The atapci0/ata0/ata1 devices are
still present there (such as in Bruce Cran's dmesg), but no disks are attached
to them: instead, all of them hang off device/vbd/N.
[In non-XENHVM mode they are attached to ataN channels, as usual.]

/*
 * Translate Linux major/minor to an appropriate name and unit
 * number. For HVM guests, this allows us to use the same drive names
 * with blkfront as the emulated drives, easing transition slightly.
 */

xenbusb_front0: Xen Frontend Devices on xenstore0
xenbusb_back0: Xen Backend Devices on xenstore0
xctrl0: Xen Control Device on xenstore0
xbd0: 17000MB Virtual Block Device at device/vbd/768 on xenbusb_front0
xbd0: attaching as ad0
GEOM: ad0s1: geometry does not match label (16h,63s != 255h,63s).
xbd1: 3812MB Virtual Block Device at device/vbd/2048 on xenbusb_front0
xbd1: attaching as da0
xbd2: 114439MB Virtual Block Device at device/vbd/2064 on xenbusb_front0
xbd2: attaching as da1

Probably, /sys/dev/xen/blkfront/blkfront.c needs updating by s/ad/ada/g;
or such. I believe, xen generates sequential numbers starting from zero
(or rather such numbers that can be converted to sequential numbers),
similar to what ATA_CAM does.
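
To make the naming concrete, here is a simplified sketch of how the vbd ids in the dmesg above map to names, assuming the Linux major/minor encoding that the quoted comment refers to (id = major << 8 | minor). It is not the blkfront.c code: `vbd_to_name()` is a made-up function, only majors 3 and 8 are handled, and the `ide_prefix` parameter exists purely to model the proposed ad to ada rename.

```c
#include <stdio.h>

/*
 * Decode a Xen virtual block device id into a FreeBSD-style disk name.
 *   major 3 (first IDE channel): 64 minors per disk
 *   major 8 (SCSI disks):        16 minors per disk
 * `ide_prefix' is "ad" today; passing "ada" models the suggested change.
 * Returns 0 on success, -1 for an unhandled major or a too-small buffer.
 */
static int
vbd_to_name(int vbd, const char *ide_prefix, char *buf, size_t len)
{
	int major = (vbd >> 8) & 0xff;
	int minor = vbd & 0xff;
	int n;

	switch (major) {
	case 3:
		n = snprintf(buf, len, "%s%d", ide_prefix, minor >> 6);
		break;
	case 8:
		n = snprintf(buf, len, "da%d", minor >> 4);
		break;
	default:
		return (-1);
	}
	return ((n < 0 || (size_t)n >= len) ? -1 : 0);
}
```

Under these assumptions 768 decodes to ad0, 2048 to da0 and 2064 to da1, matching the "attaching as" lines above; with "ada" as the prefix, 768 would become ada0.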

-- 
wbr,
pluknet


Re: Clang error make buildworld

2011-05-05 Thread O. Hartmann

On 05/04/11 16:20, Dimitry Andric wrote:

On 2011-05-04 15:44, Manfred Antar wrote:
...

src.conf:

WITHOUT_DYNAMICROOT=yes
WITH_IDEA=yes
.if !defined(CC) || ${CC} == "cc"
CC=clang
.endif
.if !defined(CXX) || ${CXX} == "c++"
CXX=clang++
.endif
#Don't die on warnings
NO_WERROR=
WERROR=


Aha. Please move the clang-related stuff to make.conf instead, e.g.
this fragment:

.if !defined(CC) || ${CC} == "cc"
CC=clang
.endif
.if !defined(CXX) || ${CXX} == "c++"
CXX=clang++
.endif
#Don't die on warnings
NO_WERROR=
WERROR=




On a notebook (DELL Latitude E6510) I tried compiling world with CLANG. 
So far, so good. It worked. But after rebooting I got a strange 
misbehaviour of the xdm login window (black/white instead of coloured), 
but this was only a superficial symptom. The whole system seems to 
be corrupted. Hitting the tab key acts like typing exit in the console. 
The gcc 4.2.1 system compiler isn't capable of producing binaries, see 
the message below. At this very moment, the box isn't usable anymore; I 
can't even compile a world with cc (see the error below, which was generated 
by trying to compile a kernel, and I'm really confused why cc is used 
instead of clang).


Well, the boxes I reported errors from prior to this are desktop systems 
with nVidia (Fermi based) graphics boards using a driver BLOB 270.XX.XX 
which is also used by the notebook.


The desktop boxes use C2D based Intel chips, the notebook uses a 
Core-i5 based chip. All systems were compiled with the option


CPUTYPE?=native

I guess the first compilation with CLANG destroyed the base system's 
compiler; at this moment I'm incapable of switching back. Floating like 
a dead man in the water.



Any suggestions?

Regards and thanks in advance,
Oliver
---
awk -f /usr/src/sys/tools/usbdevs2h.awk /usr/src/sys/dev/usb/usbdevs -h
awk -f /usr/src/sys/tools/usbdevs2h.awk /usr/src/sys/dev/usb/usbdevs -d
rpcgen -hM /usr/src/sys/kgssapi/gssd.x | grep -v pthread.h > gssd.h
cc1: internal compiler error: Bus error: 10
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
rpcgen -c /usr/src/sys/kgssapi/gssd.x -o gssd_xdr.c
cc1: internal compiler error: Bus error: 10
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
*** Error code 1

Stop in /usr/obj/usr/src/sys/MUNIN.
*** Error code 1

Stop in /usr/src.
*** Error code 1

Stop in /usr/src.


Re: Clang error make buildworld

2011-05-05 Thread Olivier Smedts
2011/5/5 O. Hartmann ohart...@zedat.fu-berlin.de:
 On 05/04/11 16:20, Dimitry Andric wrote:

 On 2011-05-04 15:44, Manfred Antar wrote:
 ...

 src.conf:

 WITHOUT_DYNAMICROOT=yes
 WITH_IDEA=yes
 .if !defined(CC) || ${CC} == cc
 CC=clang
 .endif
 .if !defined(CXX) || ${CXX} == c++
 CXX=clang++
 .endif
 #Don't die on warnings
 NO_WERROR=
 WERROR=

 Aha. Please move the clang-related stuff to make.conf instead, e.g.
 this fragment:

 .if !defined(CC) || ${CC} == cc
 CC=clang
 .endif
 .if !defined(CXX) || ${CXX} == c++
 CXX=clang++
 .endif
 #Don't die on warnings
 NO_WERROR=
 WERROR=



 On a notebook (DELL Latitude E6510) I tried compiling world with CLANG. So
 far, so good. It worked. But after rebooting I got a strange misbehaviour of
 the xdm login window (black/white instead of coloured), but this was only
 some superficial symptome. The whole system seems to be corrupted. Hitting
 tab key results like hitting exit in the console. The gcc 4.2.1 system
 compiler isn't capable of producing binaries, see message below. At this
 very moment, the box isn't usable anymore, I can't even compile a world with
 cc (see error below, that was generated by trying to compile a kernel and
 I'm really confused why cc is used instead of clang).

 Well, the boxes I reported errors from prior to this are desktop systems
 with nVidia (Fermi based) graphics boards using a driver BLOB 270.XX.XX
 which is also used by the notebook.

 The desktop boxes uses C2D based intel chips, the notebook uses a Core-i5
 based chip. All systems got compiled with option

 CPUTYPE?=native

Can you try without CPUTYPE native, or with another value?
native is not a supported value in /usr/share/mk/bsd.cpu.mk.

With gcc I used:
CPUTYPE?=core2
CFLAGS=-O2 -pipe -march=native
NO_CPU_CFLAGS=yes
COPTFLAGS=-O2 -pipe -march=native
NO_CPU_COPTFLAGS=yes

So that /usr/share/mk/bsd.cpu.mk could set the right variables and I
could set my own -march value in CFLAGS for gcc.

But now for HEAD (which has a newer gcc and clang) I use:
CPUTYPE?=core2
CFLAGS=-O2 -pipe -march=core2
NO_CPU_CFLAGS=yes
COPTFLAGS=-O2 -pipe -march=core2
NO_CPU_COPTFLAGS=yes

Because with clang, -march=native often breaks buildworld, while
-march=core2 is ok.

First, try to see if your buildworld is still broken with a different
(or empty!) make.conf.

 I guess the first compilation with CLANG destroyed the base' system
 compiler, at this moment I'm incapable of switching back. Floating like a
 dead man in the water.


 Any suggestions?

 Regards and thanks in advance,
 Oliver
 ---
 awk -f /usr/src/sys/tools/usbdevs2h.awk /usr/src/sys/dev/usb/usbdevs -h
 awk -f /usr/src/sys/tools/usbdevs2h.awk /usr/src/sys/dev/usb/usbdevs -d
 rpcgen -hM /usr/src/sys/kgssapi/gssd.x | grep -v pthread.h  gssd.h
 cc1: internal compiler error: Bus error: 10
 Please submit a full bug report,
 with preprocessed source if appropriate.
 See URL:http://gcc.gnu.org/bugs.html for instructions.
 rpcgen -c /usr/src/sys/kgssapi/gssd.x -o gssd_xdr.c
 cc1: internal compiler error: Bus error: 10
 Please submit a full bug report,
 with preprocessed source if appropriate.
 See URL:http://gcc.gnu.org/bugs.html for instructions.
 *** Error code 1

 Stop in /usr/obj/usr/src/sys/MUNIN.
 *** Error code 1

 Stop in /usr/src.
 *** Error code 1

 Stop in /usr/src.




-- 
Olivier Smedts                                                 _
                                        ASCII ribbon campaign ( )
e-mail: oliv...@gid0.org        - against HTML email  vCards  X
www: http://www.gid0.org    - against proprietary attachments / \

  There are only 10 kinds of people in the world:
  those who understand binary,
  and those who don't.


atkbdc broken on current ?

2011-05-05 Thread Damjan Marion

Hi,

I have an issue with an old HP DL380G3 server that I manage through the iLO 
virtual console. It seems that 9-CURRENT fails to detect atkbdc.
When I boot 8.2-RELEASE it works well.

8.2 dmesg shows:

atkbdc0: Keyboard controller (i8042) port 0x60,0x64 irq 1 on acpi0

9.0:

atkbdc0: Keyboard controller (i8042) failed to probe at port 0x60 on isa0

Is this a known issue?

Should I enable some additional outputs, like KBDIO_DEBUG?

Thanks,

Damjan


Re: Clang error make buildworld

2011-05-05 Thread Roman Divacky
 Because with clang, -march=native often breaks buildworld, while
 -march=core2 is ok.

Can you be more specific about this claim? On what CPU are you seeing
this breakage?

Anyway, can you compile and run this on that machine:

http://lev.vlakno.cz/~rdivacky/Host.cpp

It's the LLVM CPU autodetection code; it will print the name of
your CPU. I wonder what the difference from core2 is.
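
For readers unfamiliar with how this kind of detection usually starts, here is a minimal sketch, not taken from Host.cpp. It assumes an x86 CPU and decodes the family/model/stepping fields from the EAX value of CPUID leaf 1, following the usual rule that the extended model bits are added for families 6 and 15. The names cpu_sig and decode_cpuid_leaf1 are made up for this illustration, and mapping the resulting numbers to a name such as core2 or corei7 is left out.

```c
#include <stdint.h>

/* Hypothetical container for the decoded CPUID leaf 1 signature. */
struct cpu_sig {
	unsigned family;
	unsigned model;
	unsigned stepping;
};

/* Decode the EAX value returned by CPUID leaf 1 (x86 only). */
static struct cpu_sig
decode_cpuid_leaf1(uint32_t eax)
{
	struct cpu_sig s;
	unsigned base_family = (eax >> 8) & 0xf;
	unsigned base_model = (eax >> 4) & 0xf;
	unsigned ext_family = (eax >> 20) & 0xff;
	unsigned ext_model = (eax >> 16) & 0xf;

	s.stepping = eax & 0xf;

	/* The extended family is only added when the base family is 15. */
	s.family = base_family;
	if (base_family == 0xf)
		s.family += ext_family;

	/* The extended model is only added for base families 6 and 15. */
	s.model = base_model;
	if (base_family == 0x6 || base_family == 0xf)
		s.model += ext_model << 4;

	return (s);
}
```

With GCC or clang on x86 the raw EAX value can be read with __get_cpuid() from <cpuid.h>; the sketch takes it as a parameter so the decoding can be checked on any machine.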

Thank you. roman


Re: Clang error make buildworld

2011-05-05 Thread Olivier Smedts
2011/5/5 Roman Divacky rdiva...@freebsd.org:
 Because with clang, -march=native often breaks buildworld, while
 -march=core2 is ok.

 Can you be more specific about this claim? On what CPU are seeing
 this breakage?

On a Core2 Quad Q9450 and a Core i7 860.
I use core2 on both because that's the closest value supported by
both bsd.cpu.mk and gcc in HEAD.
I reverted from -march=native to -march=core2 for two reasons. The
first is that gcc didn't use the right -mtune when using
-march=native (I think it was internally using -mtune=generic).
I'll try to be more specific if I can find the tests I was using at
that time. The second reason is that with -march=native my
buildworld often failed with clang, and since I switched to
-march=core2 I have had no issues. I'll try a buildworld with
-march=native and report back.

 Anyway, can you compile and run on that machine this:

        http://lev.vlakno.cz/~rdivacky/Host.cpp

Compiled with gcc and clang, both output (on one of the two computers
I use most) :
roman = corei7

 It's the LLVM CPU autodetection code, it will print the name of
 your CPU. I wonder whats the difference to core2.

 Thank you. roman

Cheers

-- 
Olivier Smedts                                                 _
                                        ASCII ribbon campaign ( )
e-mail: oliv...@gid0.org        - against HTML email  vCards  X
www: http://www.gid0.org    - against proprietary attachments / \

  There are only 10 kinds of people in the world:
  those who understand binary,
  and those who don't.


Re: problems with em(4) since update to driver 7.2.2

2011-05-05 Thread Arnaud Lacombe
Hi,

On Thu, May 5, 2011 at 2:59 AM, Jack Vogel jfvo...@gmail.com wrote:
 OK, but what this does not explain is why I do not see this if
 its so easily reproduced, what causes the failure case, any idea?

It is completely random, as it depends on the content of the stack. I
spent 3 or 4 hours trying to reproduce it using different approaches on
different platforms, with different versions of the code, and failed. And
once `error' was explicitly colored, it popped up. That's the beauty
of errors related to uninitialized variables.
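
A minimal sketch of that failure mode, assuming a much simplified shape for em_setup_receive_ring() (the real function does far more). RING_SIZE, setup_ring_sketch and the slot callbacks are hypothetical names used only for illustration. When the two indices are equal the loop body never runs, so the initializer is the only thing that defines the return value.

```c
#define RING_SIZE 8	/* hypothetical ring size, for illustration only */

/*
 * Walk the slots from next_to_refresh up to next_to_check and "refresh"
 * each one. refresh_slot is a hypothetical callback returning 0 on success.
 */
static int
setup_ring_sketch(int next_to_refresh, int next_to_check,
    int (*refresh_slot)(int))
{
	int j, error = 0;	/* without "= 0" the final return is indeterminate */

	j = next_to_refresh;
	while (j != next_to_check) {
		error = refresh_slot(j);
		if (error != 0)
			return (error);
		j = (j + 1) % RING_SIZE;
	}
	/* If the loop body never ran, only the initializer defines error. */
	return (error);
}

/* Hypothetical slot callbacks used to exercise the sketch. */
static int
slot_ok(int slot)
{
	(void)slot;
	return (0);
}

static int
slot_fail_at_3(int slot)
{
	return (slot == 3 ? 55 : 0);
}
```

Calling setup_ring_sketch(5, 5, slot_ok) takes the empty-loop path; with the initializer removed, the caller would see whatever happened to be on the stack, which matches the random behaviour described above.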

 - Arnaud

 As I said, given the code was not feasible for igb anyway I would not
 be unhappy about returning to the old way of doing things.

I am not sure what you mean by the old way of doing things, but I'd guess
that the ring only needs to be set up on a few occasions, like
initialization and MTU transition. I'm not sure either how other
drivers manage their rings.

 Jack


 On Wed, May 4, 2011 at 11:03 PM, Arnaud Lacombe lacom...@gmail.com wrote:

 Hi,

 On Thu, May 5, 2011 at 1:20 AM, Arnaud Lacombe lacom...@gmail.com wrote:
  Hi,
 
  On Wed, May 4, 2011 at 5:38 PM, Jack Vogel jfvo...@gmail.com wrote:
  I have had my validation engineer busy all day, we have tried both
  a 9 kernel as well as 8.2,  using the code from HEAD, and we
  cannot reproduce this problem.
 
  Actually, it can be trivially reproduced by tainting `error'. As it is
  uninitialized in HEAD, it's value can be _anything_, so let's mark it
  as explicitly invalid.
 
  diff -u ./if_em.c /data/src/freebsd/em-7.2.2/src/if_em.c
  --- ./if_em.c   2011-02-18 01:18:23.0 -0500
  +++ /data/src/freebsd/em-7.2.2/src/if_em.c      2011-05-05
  01:12:01.0 -0400
  @@ -3912,7 +3912,7 @@
         struct  adapter         *adapter = rxr->adapter;
         struct em_buffer        *rxbuf;
         bus_dma_segment_t       seg[1];
  -       int                     i, j, nsegs, error;
  +       int                     i, j, nsegs, error = -1;
 
  The error pointed out in this thread pops up in the next boot.
 
 I put a call to kdb_enter() at the beginning of the function, helped
 with some textdump I got all the backtrace [0] for all the time
 em_setup_receive_ring() is called. All are exactly the same:

 kdb_enter_why(0,c09f6511,f391aaa8,c09be1e2,c09f6511,...) at
 kdb_enter_why+0x3b
 kdb_enter(c09f6511,0,3810,,5dc,...) at kdb_enter+0x19
 em_setup_receive_ring(c3c8d600,c3c8d7a4,c3c96004,31fa,c3c8d600,...)
 at em_setup_receive_ring+0x22
 em_setup_receive_structures(c3c96000,f15f2000,38,8100,3,...) at
 em_setup_receive_structures+0x26
 em_init_locked(c3c96000,0,c09f5de5,414,1,...) at em_init_locked+0x2f2
 em_ioctl(c3c7d000,80206934,c3ce9d00,c07b7a0b,c3f2a230,...) at
 em_ioctl+0x1c3
 ifhwioctl(c3f2a230,f391ac34,c07b7a0b,c3f3e3d0,c08df1c0,...) at
 ifhwioctl+0x4b8
 ifioctl(c3f3e3d0,80206934,c3ce9d00,c3f2a230,c3f2a230,...) at ifioctl+0x82
 kern_ioctl(c3f2a230,3,80206934,c3ce9d00,c3ce9d00,...) at kern_ioctl+0xa8
 ioctl(c3f2a230,f391acf8,c,c,f391ad2c,...) at ioctl+0xc5
 syscall(f391ad38) at syscall+0x17d
 Xint0x80_syscall() at Xint0x80_syscall+0x20
 --- syscall (54, FreeBSD ELF32, ioctl), eip = 0x4816ee23, esp =
 0xbfbfe67c, ebp = 0xbfbfe698 ---

 This fully explains why the main loop in em_setup_receive_ring() is
 never entered, as we always verify `j == rxr->next_to_check' (provided
 that mbufs have been refreshed if some packets were transferred) and
 return the value on the stack. As of now, besides changing the
 call-site of em_setup_receive_ring() to ensure it is never re-entered,
 I'd guess that the patch I sent earlier today is the only way to
 ensure that no junk is returned.

 I'd guess that the driver _is_ able to transmit, if the code was not
 explicitly calling em_stop() upon em_setup_receive_structures()
 failure.

  - Arnaud

 [0]: I wish that would have been as easy as in Linux, where a WARN()
 call do all the job automatically, but still, I should not hope for
 that much unless I am the one implementing it ... yes, free whining,
 it's 2a.m. ...

   - Arnaud
 
  The data your netstat -m shows suggests to me that what's happening
  is somehow setup of the receive ring is running more than once maybe??
 
  You asked at one point how this could go into STABLE, well, because
  not only here at Intel, but at lots of external customers this code has
  been
  used and tested thoroughly.
 
  I am not calling into question your problem, but until I understand
  what it
  is I cannot fix it :)
 
  The thing I am guessing right now is the culprit is the setup code, the
  reason
  is that when I ported to the igb driver I found that it did not work on
  our
  newer
  hardware, and so I went back to the older version of setup for igb.
  Now,
  even
  though I have not seen hardware fail with em, maybe there is some.
 
  To help me give me a complete pciconf -lv, and if its a namebrand
  system
  tell me that, including all hardware in it.
 
  If you like Olivier I can make a version of em for you that also
  reverts the
  

Re: Clang error make buildworld

2011-05-05 Thread O. Hartmann

On 05/05/11 15:46, Olivier Smedts wrote:

2011/5/5 O. Hartmann ohart...@zedat.fu-berlin.de:

On 05/04/11 16:20, Dimitry Andric wrote:


On 2011-05-04 15:44, Manfred Antar wrote:
...


src.conf:

WITHOUT_DYNAMICROOT=yes
WITH_IDEA=yes
.if !defined(CC) || ${CC} == cc
CC=clang
.endif
.if !defined(CXX) || ${CXX} == c++
CXX=clang++
.endif
#Don't die on warnings
NO_WERROR=
WERROR=


Aha. Please move the clang-related stuff to make.conf instead, e.g.
this fragment:

.if !defined(CC) || ${CC} == cc
CC=clang
.endif
.if !defined(CXX) || ${CXX} == c++
CXX=clang++
.endif
#Don't die on warnings
NO_WERROR=
WERROR=




On a notebook (DELL Latitude E6510) I tried compiling world with CLANG. So
far, so good. It worked. But after rebooting I got a strange misbehaviour of
the xdm login window (black/white instead of coloured), but this was only
some superficial symptome. The whole system seems to be corrupted. Hitting
tab key results like hitting exit in the console. The gcc 4.2.1 system
compiler isn't capable of producing binaries, see message below. At this
very moment, the box isn't usable anymore, I can't even compile a world with
cc (see error below, that was generated by trying to compile a kernel and
I'm really confused why cc is used instead of clang).

Well, the boxes I reported errors from prior to this are desktop systems
with nVidia (Fermi based) graphics boards using a driver BLOB 270.XX.XX
which is also used by the notebook.

The desktop boxes uses C2D based intel chips, the notebook uses a Core-i5
based chip. All systems got compiled with option

CPUTYPE?=native


Can you try without CPUTYPE native, or with another value ?
native is not a supported value in /usr/share/mk/bsd.cpu.mk

With gcc I used :
CPUTYPE?=core2
CFLAGS=-O2 -pipe -march=native
NO_CPU_CFLAGS=yes
COPTFLAGS=-O2 -pipe -march=native
NO_CPU_COPTFLAGS=yes

So that /usr/share/mk/bsd.cpu.mk could set the right variables and I
could set my own -march value in CFLAGS for gcc.

But now for HEAD (which has a newer gcc and clang) I use :
CPUTYPE?=core2
CFLAGS=-O2 -pipe -march=core2
NO_CPU_CFLAGS=yes
COPTFLAGS=-O2 -pipe -march=core2
NO_CPU_COPTFLAGS=yes

Because with clang, -march=native often breaks buildworld, while
-march=core2 is ok.

First, try to see if you buildworld is still broken with a different
(or empty!) make.conf.



Well, I would like to do as suggested, but I cannot even build a 
system/kernel anymore. Using clang, the build process dies when it gets 
to rpcgen as shown below; it uses cc (fixed) and cc doesn't work 
properly anymore.




I guess the first compilation with CLANG destroyed the base' system
compiler, at this moment I'm incapable of switching back. Floating like a
dead man in the water.


Any suggestions?

Regards and thanks in advance,
Oliver
---
awk -f /usr/src/sys/tools/usbdevs2h.awk /usr/src/sys/dev/usb/usbdevs -h
awk -f /usr/src/sys/tools/usbdevs2h.awk /usr/src/sys/dev/usb/usbdevs -d
rpcgen -hM /usr/src/sys/kgssapi/gssd.x | grep -v pthread.h > gssd.h
cc1: internal compiler error: Bus error: 10
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
rpcgen -c /usr/src/sys/kgssapi/gssd.x -o gssd_xdr.c
cc1: internal compiler error: Bus error: 10
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
*** Error code 1

Stop in /usr/obj/usr/src/sys/MUNIN.
*** Error code 1

Stop in /usr/src.
*** Error code 1

Stop in /usr/src.




Re: problems with em(4) since update to driver 7.2.2

2011-05-05 Thread Arnaud Lacombe
Hi,

On Wed, May 4, 2011 at 3:00 AM, Alastair Hogge a...@fastmail.fm wrote:
 [.]
 I also tried 2x,  4x 25600 for max mbuff clusters via kern.ipc.nmbclusters.
 This didn't help.

For the record, I did the math yesterday and checked the code. By
default, a machine with 6 82574L-backed em(4) interfaces, with only 3
used (i.e. brought up), initializes and works just fine with as few as
3076 mbuf clusters (1024*3 + 2). It has been transferring about 28k
pps or 20Mbps of traffic (ICMP ping flood) for the last 10h.
Here is the `netstat -m' output:

# netstat -m
2879/916/3795 mbufs in use (current/cache/total)
2877/199/3076/3076 mbuf clusters in use (current/cache/total/max)
2877/199 mbuf+clusters out of packet secondary zone in use (current/cache)
0/2/2/1537 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/768 9k jumbo clusters in use (current/cache/total/max)
0/0/0/384 16k jumbo clusters in use (current/cache/total/max)
6473K/635K/7108K bytes allocated to network (current/cache/total)
0/540580029/268859859 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/5/6656 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines

and, yes, allocation denials have sky-rocketed, but besides that the
driver is stable. In that case, the uninitialized-variable issue did not
happen when the system booted.

The complete machine should be able to initialize properly with 6146 clusters.

 - Arnaud


Re: Clang error make buildworld

2011-05-05 Thread Olivier Smedts
2011/5/5 Roman Divacky rdiva...@freebsd.org:
 Because with clang, -march=native often breaks buildworld, while
 -march=core2 is ok.

 Can you be more specific about this claim? On what CPU are seeing
 this breakage?

Ok, with latest HEAD...

%echo | gcc -march=native -E -v -x c -### -
Using built-in specs.
Target: amd64-undermydesk-freebsd
Configured with: FreeBSD/amd64 system compiler
Thread model: posix
gcc version 4.2.2 20070831 prerelease [FreeBSD]
 /usr/libexec/cc1 -E -quiet -v -D_LONGLONG - -march=core2 -mtune=generic

With -march=native, gcc adds -mtune=generic, while the man page
says -march=xxx sets -mtune=xxx.

%echo | gcc -march=core2 -E -v -x c -### -
Using built-in specs.
Target: amd64-undermydesk-freebsd
Configured with: FreeBSD/amd64 system compiler
Thread model: posix
gcc version 4.2.2 20070831 prerelease [FreeBSD]
 /usr/libexec/cc1 -E -quiet -v -D_LONGLONG - -march=core2

With -march=core2, gcc doesn't add -mtune=generic, so it should
use -mtune=core2 as suggested by its man page.

That's why I use -march=core2 for gcc. Now for clang...

With -march=core2, my buildworld compiles just fine on my Core2
Quad, whereas with -march=native (without -jX) it fails on:
=== libexec/atrun (all)
clang -O2 -pipe -march=native -fomit-frame-pointer
-DATJOB_DIR=\/var/at/jobs/\  -DLFILE=\/var/at/jobs/.lockfile\
-DLOADAVG_MX=1.5 -DATSPOOL_DIR=\/var/at/spool\  -DVERSION=\2.9\
-DDAEMON_UID=1 -DDAEMON_GID=1  -DDEFAULT_BATCH_QUEUE=\'E\'
-DDEFAULT_AT_QUEUE=\'c\' -DPERM_PATH=\/var/at/\
-I/usr/src/libexec/atrun/../../usr.bin/at -I/usr/src/libexec/atrun
-DLOGIN_CAP -DPAM -std=gnu99 -fstack-protector -Wsystem-headers -Wall
-Wno-format-y2k -Wno-uninitialized -Wno-pointer-sign -c
/usr/src/libexec/atrun/atrun.c
clang -O2 -pipe -march=native -fomit-frame-pointer
-DATJOB_DIR=\/var/at/jobs/\  -DLFILE=\/var/at/jobs/.lockfile\
-DLOADAVG_MX=1.5 -DATSPOOL_DIR=\/var/at/spool\  -DVERSION=\2.9\
-DDAEMON_UID=1 -DDAEMON_GID=1  -DDEFAULT_BATCH_QUEUE=\'E\'
-DDEFAULT_AT_QUEUE=\'c\' -DPERM_PATH=\/var/at/\
-I/usr/src/libexec/atrun/../../usr.bin/at -I/usr/src/libexec/atrun
-DLOGIN_CAP -DPAM -std=gnu99 -fstack-protector -Wsystem-headers -Wall
-Wno-format-y2k -Wno-uninitialized -Wno-pointer-sign -c
/usr/src/libexec/atrun/gloadavg.c
clang -O2 -pipe -march=native -fomit-frame-pointer
-DATJOB_DIR=\/var/at/jobs/\  -DLFILE=\/var/at/jobs/.lockfile\
-DLOADAVG_MX=1.5 -DATSPOOL_DIR=\/var/at/spool\  -DVERSION=\2.9\
-DDAEMON_UID=1 -DDAEMON_GID=1  -DDEFAULT_BATCH_QUEUE=\'E\'
-DDEFAULT_AT_QUEUE=\'c\' -DPERM_PATH=\/var/at/\
-I/usr/src/libexec/atrun/../../usr.bin/at -I/usr/src/libexec/atrun
-DLOGIN_CAP -DPAM -std=gnu99 -fstack-protector -Wsystem-headers -Wall
-Wno-format-y2k -Wno-uninitialized -Wno-pointer-sign  -o atrun atrun.o
gloadavg.o -lpam -lutil
clang: warning: argument unused during compilation: '-std=gnu99'
/usr/obj/usr/src/tmp/usr/lib/crt1.o: In function `_start':
/usr/src/lib/csu/amd64/crt1.c:(.text+0x5d): undefined reference to `atexit'
/usr/src/lib/csu/amd64/crt1.c:(.text+0x64): undefined reference to `_init_tls'
/usr/src/lib/csu/amd64/crt1.c:(.text+0x6e): undefined reference to `atexit'
/usr/src/lib/csu/amd64/crt1.c:(.text+0x88): undefined reference to `exit'
atrun.o: In function `perr':
/usr/src/libexec/atrun/atrun.c:(.text+0x65): undefined reference to `strlen'
/usr/src/libexec/atrun/atrun.c:(.text+0xac): undefined reference to `vwarn'
/usr/src/libexec/atrun/atrun.c:(.text+0xb6): undefined reference to `exit'
/usr/src/libexec/atrun/atrun.c:(.text+0xd5): undefined reference to `snprintf'
/usr/src/libexec/atrun/atrun.c:(.text+0xe6): undefined reference to `vsyslog'
/usr/src/libexec/atrun/atrun.c:(.text+0xf0): undefined reference to `exit'
atrun.o: In function `perrx':
/usr/src/libexec/atrun/atrun.c:(.text+0x19f): undefined reference to `vwarnx'
/usr/src/libexec/atrun/atrun.c:(.text+0x1a9): undefined reference to `exit'
/usr/src/libexec/atrun/atrun.c:(.text+0x1be): undefined reference to `vsyslog'
/usr/src/libexec/atrun/atrun.c:(.text+0x1c8): undefined reference to `exit'
atrun.o: In function `main':
/usr/src/libexec/atrun/atrun.c:(.text+0x224): undefined reference to `geteuid'
/usr/src/libexec/atrun/atrun.c:(.text+0x239): undefined reference to `getegid'
/usr/src/libexec/atrun/atrun.c:(.text+0x24a): undefined reference to `setegid'
/usr/src/libexec/atrun/atrun.c:(.text+0x255): undefined reference to `seteuid'
/usr/src/libexec/atrun/atrun.c:(.text+0x269): undefined reference to `openlog'
/usr/src/libexec/atrun/atrun.c:(.text+0x26f): undefined reference to `opterr'
/usr/src/libexec/atrun/atrun.c:(.text+0x292): undefined reference to `getopt'
/usr/src/libexec/atrun/atrun.c:(.text+0x2ac): undefined reference to `optarg'
/usr/src/libexec/atrun/atrun.c:(.text+0x2bb): undefined reference to `sscanf'
/usr/src/libexec/atrun/atrun.c:(.text+0x2e7): undefined reference to `__stderrp'
/usr/src/libexec/atrun/atrun.c:(.text+0x2fb): undefined reference to `fwrite'
/usr/src/libexec/atrun/atrun.c:(.text+0x305): undefined reference to `exit'

Re: Clang error make buildworld

2011-05-05 Thread Roman Divacky
 clang -O2 -pipe -march=native -fomit-frame-pointer
 -DATJOB_DIR=\/var/at/jobs/\  -DLFILE=\/var/at/jobs/.lockfile\
 -DLOADAVG_MX=1.5 -DATSPOOL_DIR=\/var/at/spool\  -DVERSION=\2.9\
 -DDAEMON_UID=1 -DDAEMON_GID=1  -DDEFAULT_BATCH_QUEUE=\'E\'
 -DDEFAULT_AT_QUEUE=\'c\' -DPERM_PATH=\/var/at/\
 -I/usr/src/libexec/atrun/../../usr.bin/at -I/usr/src/libexec/atrun
 -DLOGIN_CAP -DPAM -std=gnu99 -fstack-protector -Wsystem-headers -Wall
 -Wno-format-y2k -Wno-uninitialized -Wno-pointer-sign -c
 /usr/src/libexec/atrun/gloadavg.c
 clang -O2 -pipe -march=native -fomit-frame-pointer
 -DATJOB_DIR=\/var/at/jobs/\  -DLFILE=\/var/at/jobs/.lockfile\
 -DLOADAVG_MX=1.5 -DATSPOOL_DIR=\/var/at/spool\  -DVERSION=\2.9\
 -DDAEMON_UID=1 -DDAEMON_GID=1  -DDEFAULT_BATCH_QUEUE=\'E\'
 -DDEFAULT_AT_QUEUE=\'c\' -DPERM_PATH=\/var/at/\
 -I/usr/src/libexec/atrun/../../usr.bin/at -I/usr/src/libexec/atrun
 -DLOGIN_CAP -DPAM -std=gnu99 -fstack-protector -Wsystem-headers -Wall
 -Wno-format-y2k -Wno-uninitialized -Wno-pointer-sign  -o atrun atrun.o
 gloadavg.o -lpam -lutil
 clang: warning: argument unused during compilation: '-std=gnu99'
 /usr/obj/usr/src/tmp/usr/lib/crt1.o: In function `_start':
 /usr/src/lib/csu/amd64/crt1.c:(.text+0x5d): undefined reference to `atexit'


Can you invoke this very same command (ie. linking) with -### and show me?
Does it work when you try to link the same .o files without specifying
-march=native ?


Re: Clang error make buildworld

2011-05-05 Thread Olivier Smedts
2011/5/5 Roman Divacky rdiva...@freebsd.org:
 clang -O2 -pipe -march=native -fomit-frame-pointer
 -DATJOB_DIR=\/var/at/jobs/\  -DLFILE=\/var/at/jobs/.lockfile\
 -DLOADAVG_MX=1.5 -DATSPOOL_DIR=\/var/at/spool\  -DVERSION=\2.9\
 -DDAEMON_UID=1 -DDAEMON_GID=1  -DDEFAULT_BATCH_QUEUE=\'E\'
 -DDEFAULT_AT_QUEUE=\'c\' -DPERM_PATH=\/var/at/\
 -I/usr/src/libexec/atrun/../../usr.bin/at -I/usr/src/libexec/atrun
 -DLOGIN_CAP -DPAM -std=gnu99 -fstack-protector -Wsystem-headers -Wall
 -Wno-format-y2k -Wno-uninitialized -Wno-pointer-sign -c
 /usr/src/libexec/atrun/gloadavg.c
 clang -O2 -pipe -march=native -fomit-frame-pointer
 -DATJOB_DIR=\/var/at/jobs/\  -DLFILE=\/var/at/jobs/.lockfile\
 -DLOADAVG_MX=1.5 -DATSPOOL_DIR=\/var/at/spool\  -DVERSION=\2.9\
 -DDAEMON_UID=1 -DDAEMON_GID=1  -DDEFAULT_BATCH_QUEUE=\'E\'
 -DDEFAULT_AT_QUEUE=\'c\' -DPERM_PATH=\/var/at/\
 -I/usr/src/libexec/atrun/../../usr.bin/at -I/usr/src/libexec/atrun
 -DLOGIN_CAP -DPAM -std=gnu99 -fstack-protector -Wsystem-headers -Wall
 -Wno-format-y2k -Wno-uninitialized -Wno-pointer-sign  -o atrun atrun.o
 gloadavg.o -lpam -lutil
 clang: warning: argument unused during compilation: '-std=gnu99'
 /usr/obj/usr/src/tmp/usr/lib/crt1.o: In function `_start':
 /usr/src/lib/csu/amd64/crt1.c:(.text+0x5d): undefined reference to `atexit'


 Can you invoke this very same command (ie. linking) with -### and show me?
 Does it work when you try to link the same .o files without specifying
 -march=native ?

I'm going to try. In the meantime, I did other tests on this machine,
which is detected by clang as -march=corei7.

Compiling this with the system's clang (which has been compiled with
-march=core2) and -march=core2 is OK.
Compiling this with the system's clang (which has been compiled with
-march=core2) and -march=native is OK.
Compiling this with the bootstrap clang (which has been compiled with
-march=native) and -march=native FAILS.

The problem seems to be inside the clang compiled with -march=native.
Next, I'm going to try with a bootstrap clang compiled with
-march=corei7.

-- 
Olivier Smedts                                                 _
                                        ASCII ribbon campaign ( )
e-mail: oliv...@gid0.org        - against HTML email  vCards  X
www: http://www.gid0.org    - against proprietary attachments / \

  There are only 10 kinds of people in the world:
  those who understand binary,
  and those who don't.


Re: problems with em(4) since update to driver 7.2.2

2011-05-05 Thread Jack Vogel
On Thu, May 5, 2011 at 7:21 AM, Arnaud Lacombe lacom...@gmail.com wrote:

 Hi,

 On Thu, May 5, 2011 at 2:59 AM, Jack Vogel jfvo...@gmail.com wrote:
  OK, but what this does not explain is why I do not see this if
  its so easily reproduced, what causes the failure case, any idea?
 
 It is completely random as it depends on the content of the stack. I
 spent 3 or 4 hours trying to reproduce it using different approach on
 different platform, with different version of the code and failed. And
 once `error' was explicitly colored, it popped up. That's the beauty
 of error related with uninitialized variable.

  - Arnaud

  As I said, given the code was not feasible for igb anyway I would not
  be unhappy about returning to the old way of doing things.
 
 I am not sure what you mean by old way of doing thing, but I'd guess
 that the ring only need to be setup on a few occasion, like
 initialization and MTU transition. I'm not sure either how other
 driver manage their ring.


The old way is what the code in igb does now: on each entry to this
setup it would completely wipe the descriptor memory, then release
all mbufs, and initialize from scratch. It's only because of this
lazy reinit, meaning only the range from next_to_refresh to
next_to_check is reset, that this problem can happen.
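
To make the difference concrete, here is a rough sketch that only counts how many descriptor slots each strategy touches. It assumes a circular ring of RING_SIZE entries; the constant and both function names are hypothetical and none of this is driver code.

```c
#define RING_SIZE 1024	/* hypothetical descriptor count */

/* Slots a lazy reinit visits: next_to_refresh up to next_to_check. */
static int
lazy_reinit_slots(int next_to_refresh, int next_to_check)
{
	int j, visited = 0;

	for (j = next_to_refresh; j != next_to_check; j = (j + 1) % RING_SIZE)
		visited++;
	return (visited);
}

/* A full reinit wipes and repopulates every slot regardless of indices. */
static int
full_reinit_slots(void)
{
	return (RING_SIZE);
}
```

When the two indices are equal the lazy variant visits nothing at all, which is the case where the setup function's loop is skipped.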

For igb the reason this will not work is that it requires you to set
E1000_RDH(i) to next_to_check, and in fact the hardware
prohibits the write; it's ALWAYS 0 after a reset. The reason
for this is that the hardware wishes to manage the head
index itself rather than leave it to software.

Anyway, I see the problematic code path; it's only when
you skip the while loop altogether. I'm surprised the compiler
did not complain about this, it's usually so anal.

Jack


Re: Nasty non-recursive lockmgr panic on softdep only enabled UFS partition when filesystem full

2011-05-05 Thread Garrett Cooper
On May 4, 2011, at 2:07 AM, Kostik Belousov wrote:

 On Tue, May 03, 2011 at 11:58:49PM -0700, Garrett Cooper wrote:
 On Tue, May 3, 2011 at 11:42 PM, Garrett Cooper yaneg...@gmail.com wrote:
 On Tue, May 3, 2011 at 10:59 PM, Kirk McKusick mckus...@mckusick.com 
 wrote:
 Date: Tue, 3 May 2011 22:40:26 -0700
 Subject: Nasty non-recursive lockmgr panic on softdep only enabled UFS
  partition when filesystem full
 From: Garrett Cooper yaneg...@gmail.com
 To: Jeff Roberson j...@freebsd.org,
 Marshall Kirk McKusick mckus...@mckusick.com
 Cc: FreeBSD Current freebsd-current@freebsd.org
 
 Hi Jeff and Dr. McKusick,
 Ran into this panic when /usr ran out of space doing a make
 universe on amd64/r221219 (it took ~15 minutes for the panic to occur
 after the filesystem ran out of space -- wasn't quite sure what it was
 doing at the time):
 
 ...
 
 Let me know what other commands you would like for me to run in kgdb.
 Thanks,
 -Garrett
 
 You did not indicate whether you are running an 8.X system or a 9-current
 system. It would be helpful to know that.
 
 I've actually been running CURRENT for a few years now, but you're right --
 I didn't mention that part.
 
 Jeff thinks that there may be a potential race in the locking code for
 softdep_request_cleanup. If so, this patch for 9-current should fix it:
 
 Index: ffs_softdep.c
 ===
 --- ffs_softdep.c   (revision 221385)
 +++ ffs_softdep.c   (working copy)
 @@ -11380,7 +11380,8 @@
continue;
}
MNT_IUNLOCK(mp);
 -   if (vget(lvp, LK_EXCLUSIVE | LK_INTERLOCK, 
 curthread)) {
 +   if (vget(lvp, LK_EXCLUSIVE | LK_NOWAIT | 
 LK_INTERLOCK,
 +   curthread)) {
MNT_ILOCK(mp);
continue;
}
 
 If you are running an 8.X system, hopefully you will be able to apply it.
 
I've applied it, rebuilt and installed the kernel, and trying to
 repro the case again. Will let you know how things go!
 
Happened again with the change. It's really easy to repro:
 
 1. Get a filesystem with UFS+SU
 2. Execute something that does a large number of small writes to a partition.
 3. 'dd if=/dev/zero of=FOO bs=10m' on the same partition
 
The kernel will panic with the issue I discussed above.
 Thanks!
 
 Jeff's change is required to avoid LORs, but it is not sufficient to
 prevent recursion. We must skip the vnode supplied as a parameter to
 softdep_request_cleanup(). Theoretically, other vnodes might be also
 locked by curthread, thus I think the change below is needed. Try this.
 
 diff --git a/sys/ufs/ffs/ffs_softdep.c b/sys/ufs/ffs/ffs_softdep.c
 index a6d4441..25fa5d6 100644
 --- a/sys/ufs/ffs/ffs_softdep.c
 +++ b/sys/ufs/ffs/ffs_softdep.c
 @@ -11380,7 +11380,9 @@ retry:
   continue;
   }
   MNT_IUNLOCK(mp);
 - if (vget(lvp, LK_EXCLUSIVE | LK_INTERLOCK, curthread)) {
 + if (VOP_ISLOCKED(lvp) ||
 + vget(lvp, LK_EXCLUSIVE | LK_INTERLOCK | LK_NOWAIT,
 + curthread)) {
   MNT_ILOCK(mp);
   continue;
   }
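
The reasoning behind the patch above can be illustrated with a small
userspace model. This is only a sketch under stated assumptions: toy_vnode,
toy_trylock(), toy_unlock() and cleanup_pass() are hypothetical names, not
FreeBSD kernel APIs, and the toy lock merely stands in for a non-recursive
exclusive lock. It shows a scan that skips anything already locked (the role
VOP_ISLOCKED() plays in the patch) and never sleeps on a busy lock (the role
of LK_NOWAIT).

```c
#include <stddef.h>

/* Toy stand-in for a vnode with a non-recursive exclusive lock. */
struct toy_vnode {
    int locked;   /* 1 if some thread holds the lock */
    int owner;    /* id of the holding thread, valid only when locked */
};

/* Non-blocking acquire: returns 0 on success, -1 if the lock is busy. */
static int toy_trylock(struct toy_vnode *vp, int self)
{
    if (vp->locked)
        return -1;
    vp->locked = 1;
    vp->owner = self;
    return 0;
}

static void toy_unlock(struct toy_vnode *vp)
{
    vp->locked = 0;
}

/*
 * Walk all vnodes and "flush" those that can be locked without waiting.
 * Returns the number of vnodes flushed.
 */
static int cleanup_pass(struct toy_vnode *vns, size_t n, int self)
{
    int flushed = 0;

    for (size_t i = 0; i < n; i++) {
        /* Already locked, possibly by us: taking it again could recurse. */
        if (vns[i].locked)
            continue;
        /* Lost a race with another locker: skip instead of sleeping. */
        if (toy_trylock(&vns[i], self) != 0)
            continue;
        flushed++;
        toy_unlock(&vns[i]);
    }
    return flushed;
}
```

With three vnodes where the caller (thread 1) holds the second and another
thread holds the third, cleanup_pass() touches only the first and leaves both
held locks untouched.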

Ran into the same panic after I applied the patch above with the repro 
steps I described before. One thing that I noticed is that the issue isn't as 
easy to reproduce unless you add the dd in parallel with the make operation.
Thanks,
-Garrett
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Clang error make buildworld

2011-05-05 Thread Olivier Smedts
2011/5/5 Roman Divacky rdiva...@freebsd.org:
 Can you invoke this very same command (ie. linking) with -### and show me?
 Does it work when you try to link the same .o files without specifying
 -march=native ?

My system has previously been compiled with clang and -march=core2.
It's a corei7.

With -march=native in make.conf, after the failed buildworld I cd in
/usr/obj/usr/src/libexec/atrun/ and :

# clang -O2 -pipe -march=native -fomit-frame-pointer
-DATJOB_DIR=\"/var/at/jobs/\" -DLFILE=\"/var/at/jobs/.lockfile\"
-DLOADAVG_MX=1.5 -DATSPOOL_DIR=\"/var/at/spool\" -DVERSION=\"2.9\"
-DDAEMON_UID=1 -DDAEMON_GID=1 -DDEFAULT_BATCH_QUEUE=\'E\'
-DDEFAULT_AT_QUEUE=\'c\' -DPERM_PATH=\"/var/at/\"
-I/usr/src/libexec/atrun/../../usr.bin/at -I/usr/src/libexec/atrun
-DLOGIN_CAP -DPAM -std=gnu99 -fstack-protector -Wsystem-headers -Wall
-Wno-format-y2k -Wno-uninitialized -Wno-pointer-sign -o atrun atrun.o
gloadavg.o -lpam -lutil
clang: warning: argument unused during compilation: '-std=gnu99'

OK

# /usr/obj/usr/src/tmp/usr/bin/clang -O2 -pipe -march=native
-fomit-frame-pointer -DATJOB_DIR=\"/var/at/jobs/\"
-DLFILE=\"/var/at/jobs/.lockfile\" -DLOADAVG_MX=1.5
-DATSPOOL_DIR=\"/var/at/spool\" -DVERSION=\"2.9\" -DDAEMON_UID=1
-DDAEMON_GID=1 -DDEFAULT_BATCH_QUEUE=\'E\' -DDEFAULT_AT_QUEUE=\'c\'
-DPERM_PATH=\"/var/at/\" -I/usr/src/libexec/atrun/../../usr.bin/at
-I/usr/src/libexec/atrun -DLOGIN_CAP -DPAM -std=gnu99
-fstack-protector -Wsystem-headers -Wall -Wno-format-y2k
-Wno-uninitialized -Wno-pointer-sign -o atrun atrun.o gloadavg.o -lpam
-lutil

FAIL (clang: error: linker command failed with exit code 1 (use -v to
see invocation))

# /usr/obj/usr/src/tmp/usr/bin/clang -O2 -pipe -march=native
-fomit-frame-pointer -DATJOB_DIR=\"/var/at/jobs/\"
-DLFILE=\"/var/at/jobs/.lockfile\" -DLOADAVG_MX=1.5
-DATSPOOL_DIR=\"/var/at/spool\" -DVERSION=\"2.9\" -DDAEMON_UID=1
-DDAEMON_GID=1 -DDEFAULT_BATCH_QUEUE=\'E\' -DDEFAULT_AT_QUEUE=\'c\'
-DPERM_PATH=\"/var/at/\" -I/usr/src/libexec/atrun/../../usr.bin/at
-I/usr/src/libexec/atrun -DLOGIN_CAP -DPAM -std=gnu99
-fstack-protector -Wsystem-headers -Wall -Wno-format-y2k
-Wno-uninitialized -Wno-pointer-sign -o atrun atrun.o gloadavg.o -lpam
-lutil -###
FreeBSD clang version 3.0 (trunk 130700) 20110502
Target: x86_64-undermydesk-freebsd9.0
Thread model: posix
clang: warning: argument unused during compilation: '-std=gnu99'
 /usr/obj/usr/src/tmp/usr/bin/ld --eh-frame-hdr -dynamic-linker
/libexec/ld-elf.so.1 -o atrun
/usr/obj/usr/src/tmp/usr/lib/crt1.o
/usr/obj/usr/src/tmp/usr/lib/crti.o
/usr/obj/usr/src/tmp/usr/lib/crtbegin.o
-L/usr/obj/usr/src/tmp/usr/lib atrun.o gloadavg.o -lpam
-lutil -lgcc --as-needed -lgcc_s --no-as-needed -lc
-lgcc --as-needed -lgcc_s --no-as-needed
/usr/obj/usr/src/tmp/usr/lib/crtend.o
/usr/obj/usr/src/tmp/usr/lib/crtn.o

Using the bootstrap clang (compiled with -march=native) and trying to
compile atrun, this time using -march=core2 :

# /usr/obj/usr/src/tmp/usr/bin/clang -O2 -pipe -march=core2
-fomit-frame-pointer -DATJOB_DIR=\"/var/at/jobs/\"
-DLFILE=\"/var/at/jobs/.lockfile\" -DLOADAVG_MX=1.5
-DATSPOOL_DIR=\"/var/at/spool\" -DVERSION=\"2.9\" -DDAEMON_UID=1
-DDAEMON_GID=1 -DDEFAULT_BATCH_QUEUE=\'E\' -DDEFAULT_AT_QUEUE=\'c\'
-DPERM_PATH=\"/var/at/\" -I/usr/src/libexec/atrun/../../usr.bin/at
-I/usr/src/libexec/atrun -DLOGIN_CAP -DPAM -std=gnu99
-fstack-protector -Wsystem-headers -Wall -Wno-format-y2k
-Wno-uninitialized -Wno-pointer-sign -o atrun atrun.o gloadavg.o -lpam
-lutil

FAIL (same error)

When trying to compile the Host.cpp you provided (which compiled
fine with my system's clang and gcc), still with the bootstrap clang :

# /usr/obj/usr/src/tmp/usr/bin/clang -v Host.cpp
FreeBSD clang version 3.0 (trunk 130700) 20110502
Target: x86_64-undermydesk-freebsd9.0
Thread model: posix
 /usr/obj/usr/src/tmp/usr/bin/clang -cc1 -triple
x86_64-undermydesk-freebsd9.0 -emit-obj -mrelax-all -disable-free
-main-file-name Host.cpp -mrelocation-model static -mdisable-fp-elim
-masm-verbose -mconstructor-aliases -munwind-tables -target-cpu x86-64
-momit-leaf-frame-pointer -v -resource-dir
/usr/obj/usr/src/tmp/usr/bin/../lib/clang/3.0 -fdeprecated-macro
-ferror-limit 19 -fmessage-length 236 -fcxx-exceptions -fexceptions
-fgnu-runtime -fdiagnostics-show-option -fcolor-diagnostics -o
/tmp/cc-6ijoGC.o -x c++ Host.cpp
clang -cc1 version 3.0 based upon llvm 3.0svn hosted on
x86_64-undermydesk-freebsd9.0
ignoring nonexistent directory
/usr/obj/usr/src/tmp/usr/include/c++/4.2/backward/backward
ignoring nonexistent directory
/usr/obj/usr/src/tmp/usr/bin/../lib/clang/3.0/include
ignoring duplicate directory /usr/obj/usr/src/tmp/usr/include/c++/4.2
ignoring duplicate directory /usr/obj/usr/src/tmp/usr/include/c++/4.2/backward
ignoring duplicate directory /usr/obj/usr/src/tmp/usr/include/c++/4.2/backward
#include "..." search starts here:
#include <...> search starts here:
 /usr/obj/usr/src/tmp/usr/include/c++/4.2
 /usr/obj/usr/src/tmp/usr/include/c++/4.2/backward
 /usr/obj/usr/src/tmp/usr/include/clang/3.0
 /usr/obj/usr/src/tmp/usr/include

Re: Nasty non-recursive lockmgr panic on softdep only enabled UFS partition when filesystem full

2011-05-05 Thread Kostik Belousov
On Thu, May 05, 2011 at 10:23:47AM -0700, Garrett Cooper wrote:
 On May 4, 2011, at 2:07 AM, Kostik Belousov wrote:
 
  On Tue, May 03, 2011 at 11:58:49PM -0700, Garrett Cooper wrote:
  On Tue, May 3, 2011 at 11:42 PM, Garrett Cooper yaneg...@gmail.com wrote:
  On Tue, May 3, 2011 at 10:59 PM, Kirk McKusick mckus...@mckusick.com 
  wrote:
  Date: Tue, 3 May 2011 22:40:26 -0700
  Subject: Nasty non-recursive lockmgr panic on softdep only enabled UFS
   partition when filesystem full
  From: Garrett Cooper yaneg...@gmail.com
  To: Jeff Roberson j...@freebsd.org,
  Marshall Kirk McKusick mckus...@mckusick.com
  Cc: FreeBSD Current freebsd-current@freebsd.org
  
  Hi Jeff and Dr. McKusick,
  Ran into this panic when /usr ran out of space doing a make
  universe on amd64/r221219 (it took ~15 minutes for the panic to occur
  after the filesystem ran out of space -- wasn't quite sure what it was
  doing at the time):
  
  ...
  
  Let me know what other commands you would like for me to run in 
  kgdb.
  Thanks,
  -Garrett
  
  You did not indicate whether you are running an 8.X system or a 9-current
  system. It would be helpful to know that.
  
  I've actually been running CURRENT for a few years now, but you're right 
  --
  I didn't mention that part.
  
  Jeff thinks that there may be a potential race in the locking code for
  softdep_request_cleanup. If so, this patch for 9-current should fix it:
  
  Index: ffs_softdep.c
  ===
  --- ffs_softdep.c   (revision 221385)
  +++ ffs_softdep.c   (working copy)
  @@ -11380,7 +11380,8 @@
 continue;
 }
 MNT_IUNLOCK(mp);
  -   if (vget(lvp, LK_EXCLUSIVE | LK_INTERLOCK, 
  curthread)) {
  +   if (vget(lvp, LK_EXCLUSIVE | LK_NOWAIT | 
  LK_INTERLOCK,
  +   curthread)) {
 MNT_ILOCK(mp);
 continue;
 }
  
  If you are running an 8.X system, hopefully you will be able to apply it.
  
 I've applied it, rebuilt and installed the kernel, and trying to
  repro the case again. Will let you know how things go!
  
 Happened again with the change. It's really easy to repro:
  
  1. Get a filesystem with UFS+SU
  2. Execute something that does a large number of small writes to a 
  partition.
  3. 'dd if=/dev/zero of=FOO bs=10m' on the same partition
  
 The kernel will panic with the issue I discussed above.
  Thanks!
  
  Jeff's change is required to avoid LORs, but it is not sufficient to
  prevent recursion. We must skip the vnode supplied as a parameter to
  softdep_request_cleanup(). Theoretically, other vnodes might also be
  locked by curthread, thus I think the change below is needed. Try this.
  
  diff --git a/sys/ufs/ffs/ffs_softdep.c b/sys/ufs/ffs/ffs_softdep.c
  index a6d4441..25fa5d6 100644
  --- a/sys/ufs/ffs/ffs_softdep.c
  +++ b/sys/ufs/ffs/ffs_softdep.c
  @@ -11380,7 +11380,9 @@ retry:
  continue;
  }
  MNT_IUNLOCK(mp);
  -   if (vget(lvp, LK_EXCLUSIVE | LK_INTERLOCK, curthread)) {
  +   if (VOP_ISLOCKED(lvp) ||
  +   vget(lvp, LK_EXCLUSIVE | LK_INTERLOCK | LK_NOWAIT,
  +   curthread)) {
  MNT_ILOCK(mp);
  continue;
  }
 
   Ran into the same panic after I applied the patch above with the repro 
 steps I described before. One thing that I noticed is that the issue isn't as 
 easy to reproduce unless you add the dd in parallel with the make operation.

Well, I misread your original report. Also, there is another issue
that is easily reproducible in a similar situation. The latest patch
is below.

diff --git a/sys/sys/mount.h b/sys/sys/mount.h
index 231e3d6..f064053 100644
--- a/sys/sys/mount.h
+++ b/sys/sys/mount.h
@@ -366,6 +366,8 @@ void          __mnt_vnode_markerfree(struct vnode **mvp, struct mount *mp);
 #define MNT_LAZY       3       /* push data not written by filesystem syncer */
 #define MNT_SUSPEND    4       /* Suspend file system after sync */
 
+#define        MNT_WAIT_ADV    0x1000      /* MNT_WAIT prevent deadlock */
+
 /*
  * Generic file handle
  */
diff --git a/sys/ufs/ffs/ffs_alloc.c b/sys/ufs/ffs/ffs_alloc.c
index e60514d..87837cc 100644
--- a/sys/ufs/ffs/ffs_alloc.c
+++ b/sys/ufs/ffs/ffs_alloc.c
@@ -420,13 +420,13 @@ nospace:
 */
if (reclaimed == 0) {
reclaimed = 1;
-   softdep_request_cleanup(fs, vp, cred, FLUSH_BLOCKS_WAIT);
-   UFS_UNLOCK(ump);
if (bp) {
+   UFS_UNLOCK(ump);
brelse(bp);
bp = NULL;
+   UFS_LOCK(ump);
}
- 

Re: Clang error make buildworld

2011-05-05 Thread Roman Divacky
 # /usr/obj/usr/src/tmp/usr/bin/clang -O2 -pipe -march=native
 -fomit-frame-pointer -DATJOB_DIR=\"/var/at/jobs/\"
 -DLFILE=\"/var/at/jobs/.lockfile\" -DLOADAVG_MX=1.5
 -DATSPOOL_DIR=\"/var/at/spool\" -DVERSION=\"2.9\" -DDAEMON_UID=1
 -DDAEMON_GID=1 -DDEFAULT_BATCH_QUEUE=\'E\' -DDEFAULT_AT_QUEUE=\'c\'
 -DPERM_PATH=\"/var/at/\" -I/usr/src/libexec/atrun/../../usr.bin/at
 -I/usr/src/libexec/atrun -DLOGIN_CAP -DPAM -std=gnu99
 -fstack-protector -Wsystem-headers -Wall -Wno-format-y2k
 -Wno-uninitialized -Wno-pointer-sign -o atrun atrun.o gloadavg.o -lpam
 -lutil
 
 FAIL (clang: error: linker command failed with exit code 1 (use -v to
 see invocation))

Can you run this in gdb and show me backtrace? Also, what version is your
binutils?


Re: atkbdc broken on current ?

2011-05-05 Thread John Baldwin
On Thursday, May 05, 2011 9:21:04 am Damjan Marion wrote:
 
 Hi,
 
 I have an issue with an old HP DL380G3 server when I use the iLO virtual console to
 manage it: 9-CURRENT seems to fail to detect atkbdc.
 When I boot 8.2-RELEASE it works well.
 
 8.2 dmesg shows:
 
 atkbdc0: Keyboard controller (i8042) port 0x60,0x64 irq 1 on acpi0
 
 9.0:
 
 atkbdc0: Keyboard controller (i8042) failed to probe at port 0x60 on isa0
 
 Is this a known issue?
 
 Should I enable some additional outputs, like KBDIO_DEBUG?

I suspect this is a resource issue stemming from changes I made to the acpi(4) 
bus driver quite a while ago to make it use rman_reserve_resource().  Can you
capture a full verbose dmesg from 9 along with devinfo -rv and devinfo -ur 
output from 9?

-- 
John Baldwin


Re: problems with em(4) since update to driver 7.2.2

2011-05-05 Thread Olivier Smedts
2011/5/5 Jack Vogel jfvo...@gmail.com:
 Anyway, I see the problematic code path, it's only when
 you skip the while loop altogether. I'm surprised the compiler
 did not complain about this, it's usually so anal.

Could it be related to the compiler (clang) or some optimization flags ?
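
The code path Jack describes can be shown with a minimal sketch. The names
setup_ring() and fill_slot() are hypothetical stand-ins, not the em(4)
driver's functions. When the loop body never runs, the only value the caller
can see is whatever `error` held at declaration, so it has to be initialized
there. Initializing to 0 treats "nothing to do" as success, which is an
assumption of this sketch. Whether a compiler warns about the uninitialized
variant depends on the compiler, the warning flags and the optimization level.

```c
#include <stddef.h>

/* Hypothetical per-slot setup; returns 0 on success, nonzero on failure. */
static int fill_slot(int *slot)
{
    *slot = 1;
    return 0;
}

/*
 * Set up slots [start, n). When start == n the loop body never runs,
 * so the initializer on `error` is the only value the caller can see.
 */
static int setup_ring(int *slots, size_t start, size_t n)
{
    int error = 0;   /* without this, the empty-loop path returns garbage */

    for (size_t j = start; j < n; j++) {
        error = fill_slot(&slots[j]);
        if (error != 0)
            break;
    }
    return error;
}
```

Calling setup_ring(slots, 4, 4) skips the loop entirely and still returns a
well-defined 0.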

-- 
Olivier Smedts                                                 _
                                        ASCII ribbon campaign ( )
e-mail: oliv...@gid0.org        - against HTML email  vCards  X
www: http://www.gid0.org    - against proprietary attachments / \

  There are only 10 kinds of people in the world:
  those who understand binary,
  and those who don't.


Re: problems with em(4) since update to driver 7.2.2

2011-05-05 Thread Jack Vogel
Not sure, I wondered if those seeing this had some special sequence of
actions they took for granted that is different than what we do in house...

In any case, the init really is ultimately a correctness thing, so let's
just call it good :)

Jack


On Thu, May 5, 2011 at 11:16 AM, Olivier Smedts oliv...@gid0.org wrote:

 2011/5/5 Jack Vogel jfvo...@gmail.com:
  Anyway, I see the problematic code path, it's only when
  you skip the while loop altogether. I'm surprised the compiler
  did not complain about this, it's usually so anal.

 Could it be related to the compiler (clang) or some optimization flags ?

 --
 Olivier Smedts _
 ASCII ribbon campaign ( )
 e-mail: oliv...@gid0.org- against HTML email  vCards  X
 www: http://www.gid0.org- against proprietary attachments / \

   There are only 10 kinds of people in the world:
   those who understand binary,
   and those who don't.



Re: Nasty non-recursive lockmgr panic on softdep only enabled UFS partition when filesystem full

2011-05-05 Thread Garrett Cooper
On Thu, May 5, 2011 at 10:36 AM, Kostik Belousov kostik...@gmail.com wrote:
 On Thu, May 05, 2011 at 10:23:47AM -0700, Garrett Cooper wrote:
 On May 4, 2011, at 2:07 AM, Kostik Belousov wrote:

  On Tue, May 03, 2011 at 11:58:49PM -0700, Garrett Cooper wrote:
  On Tue, May 3, 2011 at 11:42 PM, Garrett Cooper yaneg...@gmail.com 
  wrote:
  On Tue, May 3, 2011 at 10:59 PM, Kirk McKusick mckus...@mckusick.com 
  wrote:
  Date: Tue, 3 May 2011 22:40:26 -0700
  Subject: Nasty non-recursive lockmgr panic on softdep only enabled UFS
   partition when filesystem full
  From: Garrett Cooper yaneg...@gmail.com
  To: Jeff Roberson j...@freebsd.org,
          Marshall Kirk McKusick mckus...@mckusick.com
  Cc: FreeBSD Current freebsd-current@freebsd.org
 
  Hi Jeff and Dr. McKusick,
      Ran into this panic when /usr ran out of space doing a make
  universe on amd64/r221219 (it took ~15 minutes for the panic to occur
  after the filesystem ran out of space -- wasn't quite sure what it was
  doing at the time):
 
  ...
 
      Let me know what other commands you would like for me to run in 
  kgdb.
  Thanks,
  -Garrett
 
  You did not indicate whether you are running an 8.X system or a 
  9-current
  system. It would be helpful to know that.
 
  I've actually been running CURRENT for a few years now, but you're right 
  --
  I didn't mention that part.
 
  Jeff thinks that there may be a potential race in the locking code for
  softdep_request_cleanup. If so, this patch for 9-current should fix it:
 
  Index: ffs_softdep.c
  ===
  --- ffs_softdep.c       (revision 221385)
  +++ ffs_softdep.c       (working copy)
  @@ -11380,7 +11380,8 @@
                                 continue;
                         }
                         MNT_IUNLOCK(mp);
  -                       if (vget(lvp, LK_EXCLUSIVE | LK_INTERLOCK, 
  curthread)) {
  +                       if (vget(lvp, LK_EXCLUSIVE | LK_NOWAIT | 
  LK_INTERLOCK,
  +                           curthread)) {
                                 MNT_ILOCK(mp);
                                 continue;
                         }
 
  If you are running an 8.X system, hopefully you will be able to apply 
  it.
 
     I've applied it, rebuilt and installed the kernel, and trying to
  repro the case again. Will let you know how things go!
 
     Happened again with the change. It's really easy to repro:
 
  1. Get a filesystem with UFS+SU
  2. Execute something that does a large number of small writes to a 
  partition.
  3. 'dd if=/dev/zero of=FOO bs=10m' on the same partition
 
     The kernel will panic with the issue I discussed above.
  Thanks!
 
  Jeff's change is required to avoid LORs, but it is not sufficient to
  prevent recursion. We must skip the vnode supplied as a parameter to
  softdep_request_cleanup(). Theoretically, other vnodes might also be
  locked by curthread, thus I think the change below is needed. Try this.
 
  diff --git a/sys/ufs/ffs/ffs_softdep.c b/sys/ufs/ffs/ffs_softdep.c
  index a6d4441..25fa5d6 100644
  --- a/sys/ufs/ffs/ffs_softdep.c
  +++ b/sys/ufs/ffs/ffs_softdep.c
  @@ -11380,7 +11380,9 @@ retry:
                              continue;
                      }
                      MNT_IUNLOCK(mp);
  -                   if (vget(lvp, LK_EXCLUSIVE | LK_INTERLOCK, curthread)) 
  {
  +                   if (VOP_ISLOCKED(lvp) ||
  +                       vget(lvp, LK_EXCLUSIVE | LK_INTERLOCK | LK_NOWAIT,
  +                       curthread)) {
                              MNT_ILOCK(mp);
                              continue;
                      }

       Ran into the same panic after I applied the patch above with the repro 
 steps I described before. One thing that I noticed is that the issue isn't 
 as easy to reproduce unless you add the dd in parallel with the make 
 operation.

 Well, I misread your original report. Also, there is another issue
 that is easily reproducible in a similar situation. The latest patch
 is below.

 diff --git a/sys/sys/mount.h b/sys/sys/mount.h
 index 231e3d6..f064053 100644
 --- a/sys/sys/mount.h
 +++ b/sys/sys/mount.h
 @@ -366,6 +366,8 @@ void          __mnt_vnode_markerfree(struct vnode **mvp, 
 struct mount *mp);
  #define MNT_LAZY       3       /* push data not written by filesystem syncer 
 */
  #define MNT_SUSPEND    4       /* Suspend file system after sync */

 +#define        MNT_WAIT_ADV    0x1000      /* MNT_WAIT prevent deadlock 
 */
 +
  /*
  * Generic file handle
  */
 diff --git a/sys/ufs/ffs/ffs_alloc.c b/sys/ufs/ffs/ffs_alloc.c
 index e60514d..87837cc 100644
 --- a/sys/ufs/ffs/ffs_alloc.c
 +++ b/sys/ufs/ffs/ffs_alloc.c
 @@ -420,13 +420,13 @@ nospace:
         */
        if (reclaimed == 0) {
                reclaimed = 1;
 -               softdep_request_cleanup(fs, vp, cred, FLUSH_BLOCKS_WAIT);
 -               UFS_UNLOCK(ump);
                if (bp) {
 +                       UFS_UNLOCK(ump);
                        

Re: responsiveness during IO tasks

2011-05-05 Thread Alexander Motin
Alexander Motin wrote:
 Julian Elischer wrote:
 Doug Barton wrote:
 No problem, just let's hunt things down. I'll wait for that larger post.
 In meantime, if it is related to eventtimers, it would be good to
 collect more detailed information. You could try to make timer run
 during idle (kern.eventtimer.idletick). You could try to switch timer
 from one-shot to periodic mode (kern.eventtimer.periodic). You could
 also try to switch to another timer (kern.eventtimer.timer).
 kern.eventtimer.periodic needs to be disabled to run 9.x on xen
 (as of a few months ago)
 
 Yes, but it needs to be enabled (it is disabled by default). I remember
 about it and am going to experiment with it soon.

The Xen HVM freeze in one-shot mode is worked around by r221508.
Also, judging by the Xen 4.1 sources, the problem seems to have already
been fixed on their side.

-- 
Alexander Motin


Re: Building FreeBSD 9.0-CUR/amd64 with CLANG fails

2011-05-05 Thread Mark Linimon
On Wed, May 04, 2011 at 09:17:23AM +0200, O. Hartmann wrote:
 I guess the ports-tree isn't mature for clang.

That's correct.

mcl


Re: Interrupt storm with MSI in combination with em1

2011-05-05 Thread Daan Vreeken
Hi Peter,

On Thursday 05 May 2011 21:28:02 Peter Jeremy wrote:
 On 2011-May-05 13:22:59 +0200, Daan Vreeken d...@vehosting.nl wrote:
 Not yet. I'll reboot the machine later today when I have physical access
  to it to check the BIOS version. I'll keep you informed as soon as I get
  another storm going.

 Depending on the quality of your BIOS (competence of the vendor), you
 might find that kenv(8) reports the BIOS version without needing a reboot.
 (Look at smbios.bios.* in the output).

Great! I didn't know that :)

# kenv
...
smbios.bios.reldate=07/15/2010
...
smbios.bios.version=0303   
...
smbios.planar.maker=ASUSTeK Computer INC.
smbios.planar.product=P7H55-M LX


Version 0402 is the latest and greatest, so it's time to upgrade. According 
to Asus it "Improves system stability", so let's see if this 'cures' IRQ 16.


Thanks,
-- 
Daan Vreeken
VEHosting
http://VEHosting.nl
tel: +31-(0)40-7113050 / +31-(0)6-46210825
KvK nr: 17174380


Re: Interrupt storm with MSI in combination with em1

2011-05-05 Thread Jack Vogel
Cool, thanks for the update! Good luck.

Jack


On Thu, May 5, 2011 at 1:17 PM, Daan Vreeken d...@vehosting.nl wrote:

 Hi Peter,

 On Thursday 05 May 2011 21:28:02 Peter Jeremy wrote:
  On 2011-May-05 13:22:59 +0200, Daan Vreeken d...@vehosting.nl wrote:
  Not yet. I'll reboot the machine later today when I have physical access
   to it to check the BIOS version. I'll keep you informed as soon as I
 get
   another storm going.
 
  Depending on the quality of your BIOS (competence of the vendor), you
  might find that kenv(8) reports the BIOS version without needing a
 reboot.
  (Look at smbios.bios.* in the output).

 Great! I didn't know that :)

 # kenv
 ...
 smbios.bios.reldate=07/15/2010
 ...
 smbios.bios.version=0303   
 ...
 smbios.planar.maker=ASUSTeK Computer INC.
 smbios.planar.product=P7H55-M LX


 Version 0402 is the latest and greatest, so it's time to upgrade. According
 to Asus it "Improves system stability", so let's see if this 'cures' IRQ 16.


 Thanks,
 --
 Daan Vreeken
 VEHosting
 http://VEHosting.nl
 tel: +31-(0)40-7113050 / +31-(0)6-46210825
 KvK nr: 17174380



Processes in swapped out states in recent CURRENT?

2011-05-05 Thread Garrett Cooper
I was watching top output on my dev box and I noticed that there
are more swapped out processes present on the system shortly after
boot (which doesn't make sense given that I'm not low on resources on
the box). Also, when I run os.waitpid() in python, the OS claims that
the child doesn't exist, so I'm wondering if there's an issue with the
processes reported via ps, top, etc.
I'm noting this because it's a behavior change over my
'stable'-ish workstation, running CURRENT/r220089/amd64, which is
spec'ed out the same as the dev box, minus some multimedia hardware.
Thanks,
-Garrett

# uname -a
FreeBSD fallout.local 9.0-CURRENT FreeBSD 9.0-CURRENT #0 r221219M: Thu
May  5 12:09:37 PDT 2011
root@fallout.local:/usr/obj/usr/src/sys/FALLOUT  amd6
# fstat -p 1832
USER CMD  PID   FD MOUNT  INUM MODE SZ|DV R/W
root sshd1832 root / 2 drwxr-xr-x1024  r
root sshd1832   wd / 2 drwxr-xr-x1024  r
root sshd1832 text /usr 730118 -r-xr-xr-x  240944  r
root sshd18320 /dev  6 crw-rw-rw-null  r
root sshd18321 /dev  6 crw-rw-rw-null rw
root sshd18322 /dev  6 crw-rw-rw-null rw
root sshd18323* internet stream tcp fe01e56cf000
root sshd18324* pseudo-terminal master  pts/1 rw
root sshd18325* local stream fe0008f79960 -
fe0008f79a50
# fstat -p 149
USER CMD  PID   FD MOUNT  INUM MODE SZ|DV R/W
root adjkerntz149 root / 2 drwxr-xr-x1024  r
root adjkerntz149   wd / 2 drwxr-xr-x1024  r
root adjkerntz149 text /329805 -r-xr-xr-x8792  r
root adjkerntz1490 /dev  6 crw-rw-rw-null rw
root adjkerntz1491 /dev  6 crw-rw-rw-null rw
root adjkerntz1492 /dev  6 crw-rw-rw-null rw
# fstat -p 1479
USER CMD  PID   FD MOUNT  INUM MODE SZ|DV R/W
root syslogd 1479 root / 2 drwxr-xr-x1024  r
root syslogd 1479   wd / 2 drwxr-xr-x1024  r
root syslogd 1479 text /usr 739002 -r-xr-xr-x   39008  r
root syslogd 14790 /dev  6 crw-rw-rw-null rw
root syslogd 14791 /dev  6 crw-rw-rw-null rw
root syslogd 14792 /dev  6 crw-rw-rw-null rw
root syslogd 14793 /var 353301 -rw---   4  w
root syslogd 14794* local dgram fe0008cd31e0
root syslogd 14795* local dgram fe0008cd30f0
root syslogd 14796* internet6 dgram udp fe0008ced540
root syslogd 14797* internet dgram udp fe0008ced3f0
root syslogd 14798 /dev 29 crw---klog  r
root syslogd 1479   10 /var 1389613 -rw-r--r--   25389  w
root syslogd 1479   11 /var 1389579 -rw---  62  w
root syslogd 1479   12 /var 1389572 -rw---   10164  w
root syslogd 1479   13 /var 1389601 -rw-r-2814  w
root syslogd 1479   14 /var 1389575 -rw-r--r--  62  w
root syslogd 1479   15 /var 1389580 -rw---  62  w
root syslogd 1479   16 /var 1389577 -rw---   57212  w
root syslogd 1479   17 /var 1389606 -rw---   38046  w
root syslogd 1479   18 /var 1389578 -rw-r-  62  w
# fstat -p 1829
USER CMD  PID   FD MOUNT  INUM MODE SZ|DV R/W
gcooper  sh  1829 root / 2 drwxr-xr-x1024  r
gcooper  sh  1829   wd /usr 1884160 drwxr-xr-x1024  r
gcooper  sh  1829 text /212057 -r-xr-xr-x  131784  r
gcooper  sh  18290 /dev127 crw--w   pts/0 rw
gcooper  sh  18291 /dev127 crw--w   pts/0 rw
gcooper  sh  18292 /dev127 crw--w   pts/0 rw
gcooper  sh  1829   10 /dev127 crw--w   pts/0 rw

# python -c 'import os; os.waitpid(1825, 0)'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
OSError: [Errno 10] No child processes
# ps auxww | grep 1825
root 1825   0.0  0.0  47952  0  ??  IWs  - 0:00.00
sshd: gcooper [priv] (sshd)
root88213   0.0  0.0  16340   1356   3  S+1:25PM   0:00.00 grep 1825
# top -b
last pid: 96740;  load averages:  1.07,  0.98,  0.92  up 0+01:15:3213:27:04
50 processes:  2 running, 48 sleeping

Mem: 56M Active, 23M Inact, 795M Wired, 1848K Cache, 1237M Buf, 11G Free
Swap: 24G Total, 832K Used, 24G Free


  PID USERNAME  THR PRI NICE   SIZERES STATE   C   TIME   WCPU
COMMAND
 1828 gcooper 1  200 47952K  3372K select  6   0:02  0.00% sshd
26295 root1  200  9972K   888K kqread  2   0:01  0.00% tail
95888 root1  520 14472K  8092K wait1   0:00  0.00% make
 1729 root1  200 20368K  3000K 
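
A note on the os.waitpid() check above: waitpid(2) only accepts children of
the calling process, and for any other PID it fails with ECHILD ("No child
processes"), whether or not that PID exists. The following is a sketch in C
of how the two questions can be told apart. The helper names pid_exists()
and is_waitable_child() are hypothetical. It relies only on the POSIX
behaviour of kill() with signal 0 and of waitpid().

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if a process with this pid exists, 0 otherwise.
 * Signal 0 performs the existence and permission checks without
 * delivering anything; EPERM means "exists, but not ours to signal".
 */
static int pid_exists(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return 1;
    return errno == EPERM;
}

/*
 * Returns 1 if waitpid() accepts pid as a child of the caller,
 * 0 if it reports ECHILD. Note that a child which has already
 * exited is reaped as a side effect of this probe.
 */
static int is_waitable_child(pid_t pid)
{
    int status;
    pid_t r = waitpid(pid, &status, WNOHANG);

    return !(r == -1 && errno == ECHILD);
}
```

For a PID such as the sshd process in the transcript, which the python
process did not fork, pid_exists() can return 1 while is_waitable_child()
returns 0, so the OSError by itself does not show that ps or top are
misreporting.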

Re: My problems with stability on -current

2011-05-05 Thread Alexander Motin
Doug Barton wrote:
 Alexander suggested some knobs to twist for the timers, and I'll be glad
 to do that once he gets back to me with more concrete suggestions now
 that he knows more about my specific problems.

OK, I am all here. While this post is indeed larger than the previous one, it is
not much more informative. Sorry. :(

I see several possibly unrelated problems there:
 - crashes are always crashes. They should be debugged.
 - calcru going backwards could have the same roots as lost wall clock
time. If there are some problems with timer interrupts, timecounters
could wrap unnoticed, which would cause random time jumps.
 - interactivity problems. I can't prove they are unrelated, but I have no
real ideas now.

I would start with the most obvious problems. I need to know more about
the crashes. As usual: how to trigger them, stack backtraces, etc.

As for the time problems, I would try to collect more data:
 - show `sysctl kern.eventtimer`, `sysctl kern.timecounter` and verbose
dmesg outputs;
 - which eventtimer is used now, and does it help to switch to another
one with the kern.eventtimer.timer sysctl?
 - does the timer run in periodic or one-shot mode, and does it help to
switch to the other mode?
 - if full CPU load makes time stop, try to track what is going on
with timer interrupts using `vmstat -i` and `systat -vm 1`. Under full
CPU load in one-shot mode you should have a stable timer interrupt rate
of about hz+stathz.
 - if timer interrupts are not working well, you can build a kernel with
options         KTR
options         ALQ
options         KTR_ALQ
options         KTR_COMPILE=(KTR_SPARE2)
options         KTR_ENTRIES=131072
options         KTR_MASK=(KTR_SPARE2)
to track event timer operation, and use ktrdump to save the trace when the
problem exists (preferably when it begins).

And let's experiment with fresh CURRENT.

-- 
Alexander Motin


Re: atkbdc broken on current ?

2011-05-05 Thread Damjan Marion

On May 5, 2011, at 7:43 PM, John Baldwin wrote:

 On Thursday, May 05, 2011 9:21:04 am Damjan Marion wrote:
 
 Hi,
 
 I have an issue with an old HP DL380G3 server when I use the iLO virtual console to
 manage it: 9-CURRENT seems to fail to detect atkbdc.
 When I boot 8.2-RELEASE it works well.
 
 8.2 dmesg shows:
 
 atkbdc0: Keyboard controller (i8042) port 0x60,0x64 irq 1 on acpi0
 
 9.0:
 
 atkbdc0: Keyboard controller (i8042) failed to probe at port 0x60 on isa0
 
 Is this a known issue?
 
 Should I enable some additional outputs, like KBDIO_DEBUG?
 
 I suspect this is a resource issue stemming from changes I made to the 
 acpi(4) 
 bus driver quite a while ago to make it use rman_reserve_resource().  Can you
 capture a full verbose dmesg from 9 along with devinfo -rv and devinfo -ur 
 output from 9?

Here it is:

http://web.me.com/dmarion/atkbdc.txt

Thanks,

Damjan


Re: Nasty non-recursive lockmgr panic on softdep only enabled UFS partition when filesystem full

2011-05-05 Thread Kostik Belousov
On Thu, May 05, 2011 at 12:49:48PM -0700, Garrett Cooper wrote:
 Things look ok with that patch and the one that Jeff provided for
 the LOR, taking into account your style change with the flag list.
 Thanks!

I do not understand your response. Jeff's patch was included in the
cumulative change I sent you, with a slight modification.

What 'style change with the flag list' are you referring to?




Re: Interrupt storm with MSI in combination with em1

2011-05-05 Thread Peter Jeremy
On 2011-May-05 13:22:59 +0200, Daan Vreeken d...@vehosting.nl wrote:
Not yet. I'll reboot the machine later today when I have physical access to it 
to check the BIOS version. I'll keep you informed as soon as I get another 
storm going.

Depending on the quality of your BIOS (competence of the vendor), you
might find that kenv(8) reports the BIOS version without needing a reboot.
(Look at smbios.bios.* in the output).

-- 
Peter Jeremy




Using Dtrace for Performance Evaluation

2011-05-05 Thread David Christensen
I was looking at using dtrace to help characterize performance
for the new bxe(4) driver but I'm having problems with the very
simple task of capturing time spent in a function.  The D script
I'm using looks like the following:

#pragma D option quiet

fbt:if_bxe::entry
{
	self->in = timestamp;
}

fbt:if_bxe::return
{

	@callouts[((struct callout *)arg0)->c_func] = sum(timestamp -
	    self->in);
}

tick-10sec
{
	printa("%40a %10@d\n", @callouts);
	clear(@callouts);
	printf("\n");
}

BEGIN
{
	printf("%40s | %s\n", "function", "nanoseconds per second");
}

After building dtrace into the kernel and loading the dtraceall
kernel module, when I load my bxe kernel module and run dtrace -l
to list all supported probes I notice that many functions have an 
entry probe but no exit probe.  This effectively prevents me from
calculating timestamps on fbt:if_bxe::return probes.  Why am I
seeing this behavior?

Dave



RE: Using Dtrace for Performance Evaluation

2011-05-05 Thread David Christensen
  After building dtrace into the kernel and loading the dtraceall
  kernel module, when I load my bxe kernel module and run dtrace -l
  to list all supported probes I notice that many functions have an
  entry probe but no exit probe.  This effectively prevents me from
  calculating timestamps on fbt:if_bxe::return probes.  Why am I
  seeing this behavior?
 
 Tail call optimization could do that to you:
 http://en.wikipedia.org/wiki/Tail_call

How do I disable tail call optimization when building my driver?

Dave



Re: Using Dtrace for Performance Evaluation

2011-05-05 Thread Artem Belevich
On Thu, May 5, 2011 at 4:33 PM, David Christensen davi...@broadcom.com wrote:
  After building dtrace into the kernel and loading the dtraceall
  kernel module, when I load my bxe kernel module and run dtrace -l
  to list all supported probes I notice that many functions have an
  entry probe but no exit probe.  This effectively prevents me from
  calculating timestamps on fbt:if_bxe::return probes.  Why am I
  seeing this behavior?

 Tail call optimization could do that to you:
 http://en.wikipedia.org/wiki/Tail_call

 How do I disable tail call optimization when building my driver?

Google is your friend:

Either compile with -O0/-O1, or use -fno-optimize-sibling-calls.
http://stackoverflow.com/questions/3679435/how-do-i-disable-tailcall-optimizations-in-gcc
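
To illustrate what the flag changes, here is a minimal C sketch. The
function names (demo_inner, demo_outer, demo_outer_no_tail) are
hypothetical and not taken from bxe(4), and the noinline attribute is a
GCC/Clang extension used only to keep the callee as a separate function.

```c
#include <assert.h>

/* Hypothetical callee; noinline (GCC/Clang extension) keeps it out of line. */
static __attribute__((noinline)) int
demo_inner(int x)
{
	assert(x >= 0);		/* illustrative precondition only */
	return (x * 2);
}

/*
 * The call below is in tail position: nothing is left to do in
 * demo_outer() once demo_inner() returns.  With sibling-call
 * optimization enabled (GCC enables it at -O2 and above), the compiler
 * may emit a jump to demo_inner() instead of a call followed by a
 * return, so demo_outer() can end up with no return instruction of
 * its own.
 */
int
demo_outer(int x)
{
	return (demo_inner(x + 1));
}

/*
 * Here the addition happens after demo_inner() returns, so the call is
 * not in tail position and the function returns through its own
 * return instruction regardless of the flag.
 */
int
demo_outer_no_tail(int x)
{
	return (demo_inner(x + 1) + 1);
}
```

Compiling this file to assembly with -O2, once with and once without
-fno-optimize-sibling-calls, and comparing the body of demo_outer()
should show whether the call was turned into a jump. That jump is the
likely reason a function shows an entry probe but no return probe.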

--Artem


Re: Using Dtrace for Performance Evaluation

2011-05-05 Thread Artem Belevich
On Thu, May 5, 2011 at 1:08 PM, David Christensen davi...@broadcom.com wrote:
 I was looking at using dtrace to help characterize performance
 for the new bxe(4) driver but I'm having problems with the very
 simple task of capturing time spent in a function.  The D script
 I'm using looks like the following:

 #pragma D option quiet

 fbt:if_bxe::entry
 {
        self->in = timestamp;
 }

 fbt:if_bxe::return
 {
        @callouts[((struct callout *)arg0)->c_func] = sum(timestamp -
            self->in);
 }

 tick-10sec
 {
        printa("%40a %10@d\n", @callouts);
        clear(@callouts);
        printf("\n");
 }

 BEGIN
 {
        printf("%40s | %s\n", "function", "nanoseconds per second");
 }

 After building dtrace into the kernel and loading the dtraceall
 kernel module, when I load my bxe kernel module and run dtrace -l
 to list all supported probes I notice that many functions have an
 entry probe but no exit probe.  This effectively prevents me from
 calculating timestamps on fbt:if_bxe::return probes.  Why am I
 seeing this behavior?

Tail call optimization could do that to you:
http://en.wikipedia.org/wiki/Tail_call

--Artem


 Dave
