Bug#1041715: chkrootkit runs at very inopportune times (when the server is loaded)

2023-07-22 Thread James Bottomley
On Sat, 2023-07-22 at 16:19 +0100, Richard Lewis wrote:
> On Sat, 22 Jul 2023 at 15:48, james.bottom...@hansenpartnership.com
>  wrote:
> > The systemd chkrootkit.timer has this line:
> > 
> > OnBootSec=30min
> > 
> > Which means it runs 30 minutes after a reboot.  I tend to upgrade
> > my servers in the early morning, which means it's still running when
> > people start using the services (and it is very disk heavy so they
> > notice the slowdown).
> > 
> > Ideally this should run from cron.daily so it can be sequenced with
> > all the other daily services.  However, if you insist on running it
> > from systemd, can it at least have an OnCalendar timer set from a
> > config file, so I can sequence it to begin at night?
> 
> Hi - you should be able to do this without any changes to the
> package:
> 
> systemctl edit chkrootkit.timer
> 
> and add/change the settings in a drop-in file directly to have it run
> when you like - there's no point duplicating such things in the
> chkrootkit config file. Using systemd's built-in methods is more
> flexible and avoids having to edit dpkg conffiles and get prompts on
> future upgrades.

Well, I did do this with vi to add the OnCalendar entry I suggested. 
The problem is most sysadmins have trouble figuring out the syntax. 
Plus it's a chase around three manual pages to figure out that what you
need is OnCalendar.  Perhaps adding a commented out OnCalendar to the
file would save others the archaeology?
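For reference, here is a minimal sketch of the kind of drop-in under discussion. It renders the text that would go in `/etc/systemd/system/chkrootkit.timer.d/override.conf`, the file `systemctl edit chkrootkit.timer` creates. It assumes the packaged timer's boot-relative trigger is the `OnBootSec=30min` line quoted above, and the 02:00 start time is only an example.

```python
from pathlib import Path

# Location used by `systemctl edit chkrootkit.timer` for its drop-in.
DROPIN_PATH = Path("/etc/systemd/system/chkrootkit.timer.d/override.conf")


def render_timer_override(on_calendar: str = "*-*-* 02:00:00") -> str:
    """Return drop-in text replacing the boot-relative trigger with a nightly one.

    The default time is only an example (assumption); pick whatever fits
    the local schedule.
    """
    lines = [
        "[Timer]",
        # An empty assignment resets the timer list inherited from the
        # packaged unit, dropping its OnBootSec=30min trigger.
        "OnBootSec=",
        # Fire at a fixed wall-clock time instead.
        f"OnCalendar={on_calendar}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print rather than write, so the result can be reviewed before use.
    print(f"# {DROPIN_PATH}")
    print(render_timer_override(), end="")
```

The calendar expression can be checked with `systemd-analyze calendar '*-*-* 02:00:00'`. If the file is written by hand rather than through `systemctl edit`, a `systemctl daemon-reload` is needed before the timer picks it up.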

> You can also disable the .timer entirely and make a local script to
> run from cron.daily:
> 
> systemctl disable chkrootkit.timer
> ln -s /usr/sbin/chkrootkit-daily /etc/cron.daily/local-chkrootkit
> # untested, but you get the idea
> 
> For better or worse, debian has chosen to make systemd the default.
> This does require doing things in different ways, but it is actually
> a lot more flexible.
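One pitfall with the cron.daily route is that Debian's cron runs that directory through run-parts(8). In its default mode, run-parts silently skips any file whose name contains characters other than ASCII letters, digits, underscores and hyphens. That means `chkrootkit.sh` would never run, while `local-chkrootkit` is fine. The following Python sketch checks that naming rule. It only checks the name; the file must also be executable.

```python
import re

# run-parts(8) without --lsbsysinit or --regex only accepts names made of
# ASCII letters, digits, underscores and hyphens.
_VALID_NAME = re.compile(r"[A-Za-z0-9_-]+")


def run_parts_accepts(name: str) -> bool:
    """Return True if run-parts (default mode) would consider this file name."""
    return _VALID_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("local-chkrootkit", "chkrootkit.sh", "chkrootkit~"):
        verdict = "runs" if run_parts_accepts(candidate) else "skipped"
        print(f"{candidate!r}: {verdict}")
```

On the real system, `run-parts --test /etc/cron.daily` lists what would actually be run.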

Yes, the problem I have is that wanting periodic services to all start
in the evening and run reasonably sequentially apparently isn't something
systemd can cope with ... but I get that's not a chkrootkit problem.

James



Bug#1016359: [edk2-devel] [Patch 1/2] OvmfPkg: Change default to disable MptScsi and PvScsi

2022-12-07 Thread James Bottomley
On Wed, 2022-12-07 at 17:04 +0100, Ard Biesheuvel wrote:
> On Wed, 7 Dec 2022 at 17:02, Gerd Hoffmann  wrote:
> > 
> > On Wed, Dec 07, 2022 at 09:14:39AM -0500, James Bottomley wrote:
> > > On Wed, 2022-12-07 at 15:09 +0100, Ard Biesheuvel wrote:
> > > > So at some point, these drivers will be removed rather than kept
> > > > alive by the core team unless someone steps up.
> > > 
> > > How important is keeping them alive?
> > 
> > Most common use case is probably booting images created on other
> > hypervisors on qemu.  Otherwise there is little reason to use
> > something which is not virtio-scsi.
> > 
> > > I can volunteer to "maintain" them which I anticipate won't be
> > > much effort (plus I'm used to looking after obsolete SCSI
> > > equipment).  The hardware is obsolete, so the mechanics of their
> > > emulation isn't going to change, the only potential risk is
> > > changes in the guest to host transmission layer that breaks
> > > something.
> > 
> 
> Thanks James, that would be very helpful.
> 
> > Yes, I don't expect it to be much effort, but knowing oldish scsi
> > stuff certainly helps understanding the driver code if needed.  If
> > you want to step up, send a patch updating Maintainers.txt accordingly.
> > 
> 
> Having the informed opinion of a domain expert should allow us to
> diagnose issues related to these drivers with more confidence, and
> also give us insight into how obsolete those drivers actually are.
> 
> I can send the patch if you prefer.

Sure, who can resist someone else doing all the work.

I note we do have a maintained LSI driver: OvmfPkg/LsiScsiDxe.  It
seems to be based on the 53c896 which is really only a marginal subset
of the 1030 ... if I'm remembering correctly the 1030 did Low Voltage
Differential (so a faster SCSI Parallel bus), but since that's a SCSI
Bus protocol, it should have no real impact on the utility of the
emulation.  Is the LsiScsiDxe usable by Debian?

James



Bug#1016359: [edk2-devel] [Patch 1/2] OvmfPkg: Change default to disable MptScsi and PvScsi

2022-12-07 Thread James Bottomley
On Wed, 2022-12-07 at 15:09 +0100, Ard Biesheuvel wrote:
> So at some point, these drivers will be removed rather than kept
> alive by the core team unless someone steps up.

How important is keeping them alive?  I can volunteer to "maintain"
them which I anticipate won't be much effort (plus I'm used to looking
after obsolete SCSI equipment).  The hardware is obsolete, so the
mechanics of their emulation isn't going to change, the only potential
risk is changes in the guest to host transmission layer that breaks
something.  On the other hand, I've got to say I use virtio-scsi in all
my VM testing environments, so the maintenance will likely only be
reacting when someone else reports a problem.

James



Bug#1025453: Bug#1024093: The problem is still present in 0.3.61-1

2022-12-04 Thread James Bottomley
On Sun, 2022-12-04 at 23:49 +0100, Dylan Aïssi wrote:
> On Sun, 4 Dec 2022 at 23:28, James Bottomley
>
> So please restore the state of this bug report and open your own bug
> report.

I'm not really as expert in the bug control system as debian developers
are supposed to be but I think I got this done.

James



Bug#1024093: The problem is still present in 0.3.61-1

2022-12-04 Thread James Bottomley
On Sun, 2022-12-04 at 22:42 +0100, Dylan Aïssi wrote:
> On Sat, 3 Dec 2022 at 15:45, James Bottomley
> wrote:
> > 
> > I just tested this out with the same results as previously reported
> > 
> 
> Do you mean you don't have sound at all using pipewire 0.3.61 in a
> QEMU VM?

No, this is a physical system.  The sound problems are only partial as
described at

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1024093#45

James



Bug#1024093: The problem is still present in 0.3.61-1

2022-12-03 Thread James Bottomley
I just tested this out with the same results as previously reported

James



Bug#1024093: Just broke for me on upgrade as well

2022-11-26 Thread James Bottomley
mythtv still works fine, but google-chrome (used for streaming)
doesn't.  I'm still using pulseaudio with pipewire-pulse and what
pavucontrol shows me is google-chrome having difficulty connecting to
the audio output (keeps appearing and disappearing in the playback
tab).

Falling back to 0.3.58 fixes everything



Bug#959069: linux-image-5.5.0-2-amd64 won't boot in a AMD SEV Virtual Machine

2020-04-28 Thread James Bottomley
Subject: linux-image-5.5.0-2-amd64 won't boot in a AMD SEV Virtual Machine
Package: src:linux
Version: 5.5.17-1
Severity: important

The boot failure is total: not even a console log can be seen, and
seems to be due to the necessary memory encryption option not being set
in the debian kernel: 

# CONFIG_AMD_MEM_ENCRYPT is not set

In spite of the fact that the (host side) KVM SEV option is set:

CONFIG_KVM_AMD_SEV=y

So I'm reporting this on the assumption that it is supposed to work out
of the box and not setting AMD_MEM_ENCRYPT was an oversight.  Not
setting this means that all the I/O devices are sending encrypted
memory pages through to QEMU which is what's causing the hang.  With
this set, the kernel would bounce all the encrypted pages into
unencrypted pages before sending them to devices.
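One way to catch this class of problem early is to check the shipped kernel config for the options a guest needs. Debian installs the config as /boot/config-<release>. The following Python sketch parses that format and reports which required options are missing. It only checks CONFIG_AMD_MEM_ENCRYPT as an example. Any other guest-side SEV requirements are outside what this sketch covers.

```python
import platform
import re
from pathlib import Path
from typing import Dict, Iterable, List

_SET = re.compile(r"(CONFIG_[A-Za-z0-9_]+)=(.*)")
_UNSET = re.compile(r"# (CONFIG_[A-Za-z0-9_]+) is not set")


def parse_kconfig(text: str) -> Dict[str, str]:
    """Map each CONFIG_ symbol to its value; '# ... is not set' maps to 'n'."""
    options: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = _SET.fullmatch(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.fullmatch(line)
        if match:
            options[match.group(1)] = "n"
    return options


def missing_options(text: str, required: Iterable[str]) -> List[str]:
    """Return required symbols that are neither built in (=y) nor modules (=m)."""
    options = parse_kconfig(text)
    return [sym for sym in required if options.get(sym, "n") not in ("y", "m")]


if __name__ == "__main__":
    config = Path(f"/boot/config-{platform.release()}")
    try:
        print(missing_options(config.read_text(), ["CONFIG_AMD_MEM_ENCRYPT"]))
    except OSError as err:
        print(f"cannot read {config}: {err}")
```

Fed the two lines quoted in this report, the check flags CONFIG_AMD_MEM_ENCRYPT and passes CONFIG_KVM_AMD_SEV.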

James



Bug#941611: linux-image-5.2.0-2-amd64: Kernel 5.2 has terrible performance under load

2019-10-03 Thread James Bottomley
On Wed, 2019-10-02 at 22:07 +0200, Salvatore Bonaccorso wrote:
> > Linux Kernel 5.2 is completely unusable on most of my systems.  The
> > problem seems to be something to do with memory compaction causing
> > intervals where the system becomes unresponsive.
> > 
> > This is definitely an upstream issue (my laptop running the
> > upstream kernel is displaying the problem as well) so this bug is
> > really just a warning not to deploy the 5.2 kernel until a fix is
> > found.
> 
> If so, could you point to where it was reported upstream so we can set
> accordingly where it has been forwarded to?

Well, the initial incarnation of this upstream patch set

https://marc.info/?t=15676268933

seems to fix the problem in my testbeds.  I'm testing out the first two
patches only at the moment.

James



Bug#941611: linux-image-5.2.0-2-amd64: Kernel 5.2 has terrible performance under load

2019-10-02 Thread James Bottomley
Package: src:linux
Version: 5.2.9-2
Severity: important
Tags: upstream

Dear Maintainer,

Linux Kernel 5.2 is completely unusable on most of my systems.  The problem
seems to be something to do with memory compaction causing intervals where
the system becomes unresponsive.

This is definitely an upstream issue (my laptop running the upstream
kernel is displaying the problem as well) so this bug is really just a
warning not to deploy the 5.2 kernel until a fix is found.

-- Package-specific info:
** Kernel log: boot messages should be attached

** Model information
sys_vendor: 
product_name: 
product_version: 
chassis_vendor: 
chassis_version: 
bios_vendor: Intel Corp.
bios_version: BX97510J.86A.1209.2006.0601.1340
board_vendor: Intel Corporation
board_name: D975XBX
board_version: AAD27094-305

** PCI devices:
00:00.0 Host bridge [0600]: Intel Corporation 82975X Memory Controller Hub [8086:277c]
Subsystem: Intel Corporation 82975X Memory Controller Hub [8086:5842]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR-
Kernel modules: i82975x_edac

00:01.0 PCI bridge [0604]: Intel Corporation 82975X PCI Express Root Port [8086:277d] (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [88] Subsystem: Intel Corporation 82975X PCI Express Root Port [8086:]
Capabilities: [80] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
Address:   Data:
Capabilities: [a0] Express (v1) Root Port (Slot+), MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0
ExtTag- RBE-
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #2, Speed 2.5GT/s, Width x16, ASPM L0s, Exit Latency L0s <256ns
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s (ok), Width x16 (ok)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
Slot #0, PowerLimit 75.000W; Interlock- NoCompl-
SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
Control: AttnInd Off, PwrInd On, Power- Interlock-
SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
Changed: MRL- PresDet+ LinkState-
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
RootCap: CRSVisible-
RootSta: PME ReqID , PMEStatus- PMEPending-
Capabilities: [100 v1] Virtual Channel
Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
Arb:Fixed+ WRR32- WRR64- WRR128-
Ctrl:   ArbSelect=Fixed
Status: InProgress-
VC0:Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb:Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
Status: NegoPending- InProgress-
Capabilities: [140 v1] Root Complex Link
Desc:   PortNumber=02 ComponentID=01 EltType=Config
Link0:  Desc:   TargetPort=00 TargetComponent=01 AssocRCRB- LinkType=MemMapped LinkValid+
Addr:   fed19000
Link1:  Desc:   TargetPort=03 TargetComponent=01 AssocRCRB- LinkType=Config LinkValid+
Addr:   00:03.0  CfgSpace=00018000
Kernel driver in use: pcieport

00:1b.0 Audio device [0403]: Intel Corporation NM10/ICH7 Family High Definition Audio Controller [8086:27d8] (rev 01)
Subsystem: Intel Corporation NM10/ICH7 Family High Definition Audio Controller [8086:0417]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort-

Bug#925082: postfix: "Chunk exceeds message size limit" when message_size_limit = 0

2019-03-25 Thread James Bottomley
On Tue, 19 Mar 2019 21:53:41 + Scott Kitterman  wrote:
> This is fixed in a new upstream release that I expect to package
> shortly.

I'm affected by this too, and waiting for the fixed version 3.4.4 to be
packaged.  Since postfix is actually rejecting email permanently,
shouldn't this be a grave bug?

James



Bug#920533: asterisk: on upgrade from 13.23.1 to 16.1.1 RTP streams get misdirected to NAT devices

2019-02-10 Thread James Bottomley
On Tue, 2019-01-29 at 10:54 +0100, Bernhard Schmidt wrote:
> Hi James,
> 
> thanks. Have you raised this issue with upstream somehow? I know
> chan_sip is deprecated, but I doubt a bug this severe would be
> undetected for that long.
> 
> I'll try to whip together a test for this (my test installation is
> using chan_pjsip and IPv6).

Actually, you can close this; it turns out to be a firewall issue: the
asterisk 16 update apparently changed the rtp.conf file, moving the RTP
listening port range outside of the matching range on the firewall,
hence an intermittent audio loss problem with external connections.
What the patch mostly does is open the conntrack port on the firewall,
allowing the inbound RTP stream.  The correct fix is, of course, to make
the two port ranges match each other again.
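The mismatch described above can be caught mechanically by comparing the rtpstart/rtpend range in rtp.conf with whatever UDP range the firewall admits. The following Python sketch does that comparison under two assumptions. First, the firewall range is passed in by hand; reading it from iptables or nftables is out of scope. Second, rtp.conf is simple enough for configparser to read, which may not hold for files using Asterisk-specific extensions such as #include or templates. The port numbers in the demo are hypothetical.

```python
import configparser
from typing import Tuple

PortRange = Tuple[int, int]


def asterisk_rtp_range(rtp_conf_text: str) -> PortRange:
    """Read rtpstart/rtpend from the [general] section of an rtp.conf."""
    parser = configparser.ConfigParser(
        inline_comment_prefixes=(";",),  # Asterisk uses ';' for comments
        interpolation=None,              # don't treat '%' specially
        strict=False,                    # tolerate duplicate keys
    )
    parser.read_string(rtp_conf_text)
    return parser.getint("general", "rtpstart"), parser.getint("general", "rtpend")


def covered(inner: PortRange, outer: PortRange) -> bool:
    """True if every port in `inner` also lies within `outer` (inclusive)."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


if __name__ == "__main__":
    # Hypothetical values: the firewall admits 10000-20000/udp, but rtp.conf
    # has been changed to a wider range.
    firewall = (10000, 20000)
    conf = "[general]\nrtpstart=10000\nrtpend=30000 ; widened\n"
    rtp = asterisk_rtp_range(conf)
    print(rtp, "covered" if covered(rtp, firewall) else "NOT covered by firewall")
```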

James



Bug#912864: openssl: new version of openssl breaks some openvpn clients

2019-02-07 Thread James Bottomley
On Thu, 2019-02-07 at 22:55 +0100, Jean-Marc wrote:
> On Mon, 26 Nov 2018 23:41:13 +0100 Sebastian Andrzej Siewior <a...@breakpoint.cc> wrote:
> > On 2018-11-04 22:15:04 [+0100], Kurt Roeckx wrote:
> > > > You're implying openvpn doesn't pick up the openssl.cnf changes
> > > > so I have to set tls-version-min 1.0 in the server side
> > > > configuration?  OK, that works too.  
> > > 
> > > Your client doesn't support the settings in the openssl.cnf file.
> > > Your openvpn client by default does TLS 1.0 only. The only way
> > > for your client to do something other than TLS 1.0 is to set the
> > > tls-version-min variable to something. If you set it to 1.0, it
> > > will do any version supported by the openssl library higher than
> > > 1.0.
> > 
> > James, is everything okay/clear? The tls-version-min option for the
> > older OpenVPN version should have fixed things. Is there anything
> > else or can this be considered done?
> > 
> > > Kurt
> > 
> > Sebastian
> 
> Hi James,
> 
> May I ask you if you got all the answers you needed and if it fixed
> the problem.

Yes, I said that in the initial quote: setting tls-version-min in the
openvpn configuration works, and that's what I've done.  It's just
unexpected that you have to update your openvpn config files.

James




Bug#920533: asterisk: on upgrade from 13.23.1 to 16.1.1 RTP streams get misdirected to NAT devices

2019-01-26 Thread James Bottomley
Tags: patch

This is the patch I have applied to my system which fixes the problem
locally

James

---

Index: BUILD/channels/chan_sip.c
===
--- BUILD.orig/channels/chan_sip.c
+++ BUILD/channels/chan_sip.c
@@ -10997,6 +10997,8 @@ static int process_sdp(struct sip_pvt *p
if (req->method == SIP_RESPONSE) {
start_ice(p->rtp, 1);
}
+   if (p->natdetected)
+   ast_sockaddr_copy(sa, &p->recv);
ast_sockaddr_set_port(sa, portno);
ast_rtp_instance_set_remote_address(p->rtp, sa);
if (debug) {
@@ -23891,6 +23893,8 @@ static void handle_response_invite(struc
int rtn;
struct ast_party_connected_line connected;
struct ast_set_party_connected_line update_connected;
+   char domain[MAXHOSTNAMELEN];
+   struct ast_sockaddr addr;
 
if (reinvite) {
ast_debug(4, "SIP response %d to RE-invite on %s call %s\n", 
resp, outgoing ? "outgoing" : "incoming", p->callid);
@@ -23945,6 +23949,9 @@ static void handle_response_invite(struc
if (!reinvite) {
set_pvt_allowed_methods(p, req);
}
+   get_domain(sip_get_header(req, "To"), domain, sizeof(domain));
+   ast_sockaddr_resolve_first(&addr, domain, 0);
+   check_for_nat(&addr, p);
 
switch (resp) {
case 100:   /* Trying */
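The first hunk of the patch can be expressed as a small Python model. The names here are illustrative, not Asterisk's API. When NAT has been detected, the RTP destination host is taken from the address the SIP message actually arrived from, while the port still comes from the SDP. As the follow-up message in this bug says, the underlying cause turned out to be a firewall port-range mismatch, so this is a local workaround rather than the fix.

```python
from typing import NamedTuple


class SockAddr(NamedTuple):
    host: str
    port: int


def rtp_destination(sdp_addr: SockAddr, recv_addr: SockAddr,
                    sdp_port: int, nat_detected: bool) -> SockAddr:
    """Illustrative model of the patched process_sdp() logic.

    Without NAT the SDP connection address is used; with NAT the host the
    SIP packet was received from is used instead, keeping the SDP port.
    """
    host = recv_addr.host if nat_detected else sdp_addr.host
    return SockAddr(host, sdp_port)


if __name__ == "__main__":
    # Hypothetical addresses: a private (RFC 1918) address in the SDP and a
    # documentation-range public address the SIP packet arrived from.
    sdp = SockAddr("192.168.1.20", 0)
    recv = SockAddr("203.0.113.7", 5060)
    print(rtp_destination(sdp, recv, 16384, nat_detected=True))
    print(rtp_destination(sdp, recv, 16384, nat_detected=False))
```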



Bug#913205: python-urllib3: mythtv ttvdb.py fails with latest python-urllib3

2018-11-08 Thread James Bottomley
On Thu, 2018-11-08 at 23:23 +0100, Daniele Tricoli wrote:
> On 11/8/18 10:26 PM, James Bottomley wrote:
> > The reported bug is a well known API incompatibility going from
> > urllib3 1.23 to 1.24 and affects more than just mythtv.
> 
> It's not sure that this is a bug either for mythtv, as I said, the
> problem seems to be related to that cache that use pickle.
> Could you try to clean this cache?

Yes, removing the cache directory seems to allow it to work again.
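The following self-contained Python 3 illustration shows why clearing the cache helps. Pickle stores classes by module path. An entry written while a module existed therefore cannot be loaded after an upgrade removes that module. In this report the missing module is the ordered_dict one named in the traceback, which the newer urllib3 apparently no longer provides. The module name below is a hypothetical stand-in for it. Treating such entries as cache misses is one way a cache could recover on its own. This sketch does not establish whether requests_cache can be configured to do so.

```python
import pickle
import sys
import types

MODULE_NAME = "old_vendored_ordered_dict"  # hypothetical stand-in for a removed module


def install_fake_module() -> types.ModuleType:
    """Register a throwaway module that defines a picklable class."""
    mod = types.ModuleType(MODULE_NAME)
    exec("class OrderedThing:\n    def __init__(self, v):\n        self.v = v\n",
         mod.__dict__)
    sys.modules[MODULE_NAME] = mod
    return mod


def load_cached(blob: bytes):
    """Unpickle a cache entry, treating a missing module or class as a miss."""
    try:
        return pickle.loads(blob)
    except (ImportError, AttributeError):
        return None


if __name__ == "__main__":
    mod = install_fake_module()
    blob = pickle.dumps(mod.OrderedThing(42))  # written by the "old" version
    print(load_cached(blob).v)                 # loads while the module exists
    del sys.modules[MODULE_NAME]               # the "upgrade" removes it
    print(load_cached(blob))                   # stale entry becomes a miss
```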

> Consider, that this bug report is the first one after the upload of
> urllib3 1.24 to experimental, then unstable and urllib3 1.24 was also
> backported in 2018-10-29.

It's not the only bug report, though.  The internet has quite a few of
them; that's how I (eventually) figured out that the version change
from 1.22 to 1.24 of urllib3 was the problem.

> > I get that at some point upstreams do daft stuff like this and a
> > distribution has to roll with it but it's only been 23 days since
> > 1.24 was released so I think debian testing could do with a little
> > more caution, at least to give all the dependent projects time to
> > update their code.
> 
> I can understand your feeling but testing is the development state of
> the next stable Debian distribution, so some breakage can happen:
> yes, I hate that too, and I try to minimize it.
> I'm sorry for you but honestly I don't understand your "debian
> testing could do with a little more caution" related to packages that
> are not even in Debian: what do you suggest, practically?

Well, the usual assumption people work under is that upgrading minor
versions doesn't break stuff ... that means you should be able to
upgrade from 1.22 to 1.24 without issue.  That, unfortunately, doesn't
seem to be true for urllib3, as the trace dump shows.  I suspect the only
way you're going to find out is by bug reports like this one.  So now
that you know and think you have a cause, is there a way to prevent
other people seeing the same issue?

James




Bug#913205: python-urllib3: mythtv ttvdb.py fails with latest python-urllib3

2018-11-08 Thread James Bottomley
On Thu, 2018-11-08 at 22:10 +0100, Daniele Tricoli wrote:
> Thanks for your report!
> Unfortunately package from deb-multimedia.org are not official, so
> you should ask to deb-multimedia.org maintainers as reported in their
> FAQ

I don't think this is necessarily their fault.  The reported bug is a
well known API incompatibility going from urllib3 1.23 to 1.24 and
affects more than just mythtv.

I get that at some point upstreams do daft stuff like this and a
distribution has to roll with it but it's only been 23 days since 1.24
was released so I think debian testing could do with a little more
caution, at least to give all the dependent projects time to update
their code.

James




Bug#913205: python-urllib3: mythtv ttvdb.py fails with latest python-urllib3

2018-11-07 Thread James Bottomley
Package: python-urllib3
Version: 1.24-1
Severity: important

Mythtv version 29.1+fixes20180821.gite5fc66e822-dmo2 is failing with

mythtv@vito:~$ /usr/share/mythtv/metadata/Television/ttvdb.py -B Superstore
Traceback (most recent call last):
  File "/usr/share/mythtv/metadata/Television/ttvdb.py", line 2578, in <module>
    sys.exit(main())
  File "/usr/share/mythtv/metadata/Television/ttvdb.py", line 2278, in main
    userkey=tvdb_account.account_identifier)
  File "/usr/lib/python2.7/dist-packages/MythTV/ttvdb/tvdb_api.py", line 693, in __init__
    self.session.remove_expired_responses()
  File "/usr/lib/python2.7/dist-packages/requests_cache/core.py", line 159, in remove_expired_responses
    self.cache.remove_old_entries(datetime.utcnow() - self._cache_expire_after)
  File "/usr/lib/python2.7/dist-packages/requests_cache/backends/base.py", line 110, in remove_old_entries
    response, created_at = self.responses[key]
  File "/usr/lib/python2.7/dist-packages/requests_cache/backends/storage/dbdict.py", line 163, in __getitem__
    return pickle.loads(bytes(super(DbPickleDict, self).__getitem__(key)))
ImportError: No module named ordered_dict

Meaning that mythtv can no longer pick up programme information from thetvdb.org

Falling back to version 1.22-1 makes everything work again.

-- System Information:
Debian Release: buster/sid
  APT prefers testing-debug
  APT policy: (500, 'testing-debug'), (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 4.18.0-2-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), 
LANGUAGE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages python-urllib3 depends on:
ii  python  2.7.15-3
ii  python-six  1.11.0-2

Versions of packages python-urllib3 recommends:
ii  ca-certificates  20170717
ii  python-cryptography  2.3-1
ii  python-idna  2.6-1
ii  python-ipaddress 1.0.17-1
ii  python-openssl   18.0.0-1

Versions of packages python-urllib3 suggests:
pn  python-ntlm   
pn  python-socks  

-- no debconf information



Bug#912864: [Pkg-openssl-devel] Bug#912864: openssl: new version of openssl breaks some openvpn clients

2018-11-04 Thread James Bottomley
On Sun, 2018-11-04 at 21:30 +0100, Kurt Roeckx wrote:
> On Sun, Nov 04, 2018 at 12:13:43PM -0800, James Bottomley wrote:
> > 
> > No, I'm saying with no client tls-version-min specified at all (the
> > usual default openvpn config) it fails in 1.1.1 and works with
> > 1.1.0
> > 
> > With client tls-version-min set to 1.0 it works with both.
> 
> Yes, and that's totally what I expected, and have been explaining.
> The 2.3.X version only want to do TLS 1.0 unless you specify
> "tls-version-min 1.0", in which case they also do TLS 1.2.

You're implying openvpn doesn't pick up the openssl.cnf changes so I
have to set tls-version-min 1.0 in the server side configuration?  OK,
that works too.  

> So I'm failing to see what this bug report is about.

Upgrading from openssl 1.1.0 to 1.1.1 causes an openvpn connection
failure which following the upgrade instructions doesn't fix.  It also
seems to me there are probably quite a few other openssl.cnf-blind
applications in the system which will fail in a similar fashion.

James



Bug#912864: [Pkg-openssl-devel] Bug#912864: openssl: new version of openssl breaks some openvpn clients

2018-11-04 Thread James Bottomley
On Sun, 2018-11-04 at 21:10 +0100, Kurt Roeckx wrote:
> On Sun, Nov 04, 2018 at 11:39:59AM -0800, James Bottomley wrote:
> > > 
> > > On which side do you use tls-version-min?
> > 
> > client
> > 
> > > Can you please give the version of both openvpn and openssl on
> > > both sides.
> > 
> > Client is openwrt, server is debian testing.  The package of the
> > server was already provided in the bug report, but again it's
> > 
> > openssl 1.1.1-2
> > openvpn 2.4.6-1
> > 
> > Packages on the openwrt client are
> > 
> > libopenssl 1.0.2g-1
> > openvpn-openssl  2.3.6-5
> 
> So you're saying that even with tls-version-min 1.0 on your
> client side and with openssl.cnf changed on the server it's still
> not working?

No, I'm saying with no client tls-version-min specified at all (the
usual default openvpn config) it fails in 1.1.1 and works with 1.1.0

With client tls-version-min set to 1.0 it works with both.

James



Bug#912864: [Pkg-openssl-devel] Bug#912864: openssl: new version of openssl breaks some openvpn clients

2018-11-04 Thread James Bottomley
On Sun, 2018-11-04 at 20:32 +0100, Kurt Roeckx wrote:
> On Sun, Nov 04, 2018 at 11:19:41AM -0800, James Bottomley wrote:
> > On Sun, 2018-11-04 at 20:15 +0100, Kurt Roeckx wrote:
> > > This is not at all how the version negotiation in TLS 1.2 and
> > > below works. The client just indicates the highest version it
> > > supports, so for instance TLS 1.2. It's then up to the server to
> > > pick a version that the client supports, so one that is no higher
> > > than TLS 1.2, and it might pick TLS 1.0 or 1.2. It will then send a
> > > server hello with that version.
> > 
> > OK, so I'm weary of trying to construct a theory of what the bug
> > actually is; why don't you try to come up with one?  The symptoms
> > are that openvpn in openwrt works with server 1.1.0 and fails with
> > server 1.1.1 if you don't specify tls-version-min 1.0 on the command
> > line.
> 
> On which side do you use tls-version-min?

client

>  Can you please give the version of both openvpn and openssl on both
> sides.

Client is openwrt, server is debian testing.  The package of the server
was already provided in the bug report, but again it's

openssl 1.1.1-2
openvpn 2.4.6-1

Packages on the openwrt client are

libopenssl 1.0.2g-1
openvpn-openssl  2.3.6-5

James



Bug#912864: [Pkg-openssl-devel] Bug#912864: openssl: new version of openssl breaks some openvpn clients

2018-11-04 Thread James Bottomley
On Sun, 2018-11-04 at 20:15 +0100, Kurt Roeckx wrote:
> This is not at all how the version negotiation in TLS 1.2 and
> below works. The client just indicates the highest version it
> supports, so for instance TLS 1.2. It's then up to the server to
> pick a version that the client supports, so one that is no higher than
> TLS 1.2, and it might pick TLS 1.0 or 1.2. It will then send a server
> hello with that version.

OK, so I'm weary of trying to construct a theory of what the bug
actually is; why don't you try to come up with one?  The symptoms are
that openvpn in openwrt works with server 1.1.0 and fails with server
1.1.1 if you don't specify tls-version-min 1.0 on the command line.

> So there are normally 2 cases that can be a problem:
> - The client sends TLS 1.0 and the server has 1.2 as minimum, so
>   the server say it's not supported.
> - The client sends TLS 1.2, the server answers with 1.0, the
>   client says 1.0 is too low.
> 
> The error message you showed says that it's the server that is
> rejecting the client's version, and that the server is running a
> 1.1.1 version. Are you sure you've actually restarted the server
> after changing the config file?

Yes, the server got rebooted after the upgrade.
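Here is a toy Python model of the TLS 1.2-and-below negotiation Kurt describes. It makes a simplifying assumption: each side is characterised by a [min, max] version range, and the ClientHello carries only the client's maximum. The model reproduces the two failure cases he lists. The version numbers are the on-the-wire codes. The example endpoints are hypothetical, not measured from the systems in this report.

```python
from enum import IntEnum


class Ver(IntEnum):
    # Wire-format protocol version codes.
    SSLv3 = 0x0300
    TLSv1_0 = 0x0301
    TLSv1_1 = 0x0302
    TLSv1_2 = 0x0303


class HandshakeError(Exception):
    pass


def negotiate(client_min: Ver, client_max: Ver,
              server_min: Ver, server_max: Ver) -> Ver:
    """Return the version both ends accept, or raise HandshakeError."""
    # The ClientHello advertises only client_max; the server picks the
    # highest version it supports that does not exceed it.
    chosen = min(client_max, server_max)
    if chosen < server_min:
        # Case 1: the server rejects the client's version.
        raise HandshakeError("server: unsupported protocol")
    if chosen < client_min:
        # Case 2: the client finds the ServerHello version too low.
        raise HandshakeError("client: server chose too low a version")
    return chosen


if __name__ == "__main__":
    # A client pinned to TLS 1.0 against a server requiring at least TLS 1.2.
    try:
        negotiate(Ver.TLSv1_0, Ver.TLSv1_0, Ver.TLSv1_2, Ver.TLSv1_2)
    except HandshakeError as err:
        print(err)
    # The same server with a client that accepts anything from TLS 1.0 up.
    print(negotiate(Ver.TLSv1_0, Ver.TLSv1_2, Ver.TLSv1_2, Ver.TLSv1_2).name)
```

The first call is Kurt's case 1, the shape of the server-side rejection in the original log. In this model the ClientHello never carries the client's minimum, so a low minimum on the client does not by itself cause a rejection.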

James



Bug#912864: [Pkg-openssl-devel] Bug#912864: openssl: new version of openssl breaks some openvpn clients

2018-11-04 Thread James Bottomley
On Sun, 2018-11-04 at 18:43 +0100, Kurt Roeckx wrote:
> Older versions of openvpn only support TLS 1.0 because they told
> OpenSSL to only use TLS 1.0. Adding the --tls-version-min 1.0
> should make it support all TLS versions since openvpn 2.3.4 or
> something like that, and I think 2.4 or newer should just work.

There's a difference: if you don't specify tls-version-min on the
command line, it actually asks openssl for the minimum version.  If you
do specify it, it takes what you tell it.

> But if you changed the openssl.cnf to say all versions are
> supported, it should work too, I'm not sure why you say otherwise.

Well, obviously because it doesn't work as the log attached in the bug
report shows.

The values I have in openssl.cnf are the recommended ones:

MinProtocol = None
CipherString = DEFAULT

And the change definitely takes effect, because my imap server has an
android client at 0.9.8 which didn't work before I added those lines.

The openssl code looks to use SSL_CTX_get_min_proto_version() if you
don't specify a version, so it finds a protocol below tls 1.0 to
present which causes the error.  From the ordering in openssl, this is
likely to be SSLv3, isn't it?

The bug here is that you shouldn't kill the negotiation just because
the client offers to support SSLv3, you should move on to negotiate a
more secure cipher and only error out if the client can't support any
protocols openssl is told to consider secure.

James



Bug#912864: openssl: new version of openssl breaks some openvpn clients

2018-11-04 Thread James Bottomley
Package: openssl
Version: 1.1.1-2
Severity: important

I've applied all the downgrades recommended to the openssl.cnf file
and most services are now working again with the exception of openvpn.

The only failure seems to be a VPN connection to an openwrt router.
The router is running Chaos Calmer 15.05 and has an upgraded openssl
(to 1.0.2g-1).

The error is on the debian server side and only shows up at openvpn
extreme verbosity:

Sun Nov  4 08:40:04 2018 us=149552 50.35.68.20:56038 OpenSSL: error:14209102:SSL routines:tls_early_post_process_client_hello:unsupported protocol
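The numeric code in that message can be unpacked to confirm where the failure originates. The following sketch assumes the OpenSSL 1.1.x error-code layout: an 8-bit library field, a 12-bit function field and a 12-bit reason field. OpenSSL 3.0 packs codes differently. Library 20 is the one printed as "SSL routines". Function 0x209 and reason 0x102 are the codes behind the `tls_early_post_process_client_hello` and `unsupported protocol` strings in the same line. In other words, the server's ClientHello processing rejected the offered version.

```python
from typing import NamedTuple


class OpenSSLError(NamedTuple):
    lib: int
    func: int
    reason: int


def unpack_error(code: int) -> OpenSSLError:
    """Split an OpenSSL 1.1.x packed error code into its three fields."""
    return OpenSSLError(
        lib=(code >> 24) & 0xFF,    # library, shown as "SSL routines"
        func=(code >> 12) & 0xFFF,  # function that raised the error
        reason=code & 0xFFF,        # reason code
    )


if __name__ == "__main__":
    err = unpack_error(int("14209102", 16))
    print(err)                             # OpenSSLError(lib=20, func=521, reason=258)
    print(hex(err.func), hex(err.reason))  # 0x209 0x102
```

`openssl errstr 14209102` gives the same decoding from the command line.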

The full verbose negotiation is

Sun Nov  4 08:40:04 2018 us=116122 50.35.68.20:56038 Control Channel MTU parms [ L:1621 D:1212 EF:38 EB:0 ET:0 EL:3 ]
Sun Nov  4 08:40:04 2018 us=116160 50.35.68.20:56038 Data Channel MTU parms [ L:1621 D:1450 EF:121 EB:406 ET:0 EL:3 ]
Sun Nov  4 08:40:04 2018 us=116243 50.35.68.20:56038 Local Options String (VER=V4): 'V4,dev-type tun,link-mtu 1557,tun-mtu 1500,proto UDPv4,cipher AES-128-CBC,auth SHA1,keysize 128,key-method 2,tls-server'
Sun Nov  4 08:40:04 2018 us=116263 50.35.68.20:56038 Expected Remote Options String (VER=V4): 'V4,dev-type tun,link-mtu 1557,tun-mtu 1500,proto UDPv4,cipher AES-128-CBC,auth SHA1,keysize 128,key-method 2,tls-client'
RSun Nov  4 08:40:04 2018 us=116336 50.35.68.20:56038 TLS: Initial packet from [AF_INET]50.35.68.20:56038, sid=812b650a 26613bfb
WRRWRSun Nov  4 08:40:04 2018 us=149552 50.35.68.20:56038 OpenSSL: error:14209102:SSL routines:tls_early_post_process_client_hello:unsupported protocol
Sun Nov  4 08:40:04 2018 us=150331 50.35.68.20:56038 TLS_ERROR: BIO read tls_read_plaintext error
Sun Nov  4 08:40:04 2018 us=150984 50.35.68.20:56038 TLS Error: TLS object -> incoming plaintext read error
Sun Nov  4 08:40:04 2018 us=151598 50.35.68.20:56038 TLS Error: TLS handshake failed
Sun Nov  4 08:40:04 2018 us=152357 50.35.68.20:56038 SIGUSR1[soft,tls-error] received, client-instance restarting

Obviously this was all working with 1.1.0 so something seems to have
changed in the tls negotiation routines.

I can fix this in the client by adding the openvpn option
--tls-version-min 1.0, so, given the way openvpn works, it probably
means that the openssl version installed in openwrt offers some strange
TLS version < 1.0.  But clearly openssl shouldn't error out when
presented with lower versions; it should negotiate the mutually
acceptable version.

-- System Information:
Debian Release: buster/sid
  APT prefers testing
  APT policy: (500, 'testing'), (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 4.18.0-2-686 (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), 
LANGUAGE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages openssl depends on:
ii  libc6  2.27-8
ii  libssl1.1  1.1.1-2

openssl recommends no packages.

Versions of packages openssl suggests:
ii  ca-certificates  20170717

-- Configuration Files:
/etc/ssl/openssl.cnf changed [not included]

-- no debconf information



Bug#872375: lirc: irrecord segfaults when recording a button

2018-02-11 Thread James Bottomley
I'm getting this same segfault as well.  My old remote fell apart, so I
was trying to train the IR receiver on a new one (well, the unused VCR
section of the current TV remote).

This is what I get:

Starting program: /usr/bin/irrecord -d /dev/lirc0 -f
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-
gnu/libthread_db.so.1".
Warning: Running as root.
Using raw access on device /dev/lirc0

irrecord -  application for recording IR-codes for usage with lirc
Copyright (C) 1998,1999 Christoph Bartelmus(l...@bartelmus.de)

This program will record the signals from your remote control
and create a config file for lircd.

A proper config file for lircd is maybe the most vital part of this
package, so you should invest some time to create a working config
file. Although I put a good deal of effort in this program it is often
not possible to automatically recognize all features of a remote
control. Often short-comings of the receiver hardware make it nearly
impossible. If you have problems to create a config file READ THE
DOCUMENTATION at https://sf.net/p/lirc-remotes/wiki

If there already is a remote control of the same brand available at
http://sf.net/p/lirc-remotes you might want to try using such a
remote as a template. The config files already contains all
parameters of the protocol used by remotes of a certain brand and
knowing these parameters makes the job of this program much
easier. There are also template files for the most common protocols
available. Templates can be downloaded using irdb-get(1). You use a
template file by providing the path of the file as a command line
parameter.

Please take the time to finish the file as described in
https://sourceforge.net/p/lirc-remotes/wiki/Checklist/ an send it
to   so it can be made available to others.

Press RETURN to continue.




Usually you should not create a new config file for devinput
devices. LIRC is installed with a devinput.lircd.conf file which 
is built for the current system which works with all remotes 
supported by the kernel. There might be a need to update 
this file so it matches the current kernel. For this, use the 
lirc-make-devinput(1) script.

Press RETURN to continue.

Checking for ambient light  creating too much disturbances.
Please don't press any buttons, just wait a few seconds...

No significant noise (received 0 bytes)

Enter name of remote (only ascii, no spaces) :c
Using c.lircd.conf as output filename

Hold down an arbitrary key
...
.
Found gap (7941 us)

Please enter the name for the next button (press <ENTER> to finish
recording)
KEY_A

Now hold down button "KEY_A".

Program received signal SIGSEGV, Segmentation fault.
0x in ?? ()
(gdb) bt
#0  0x in ?? ()
#1  0x779af1eb in record_buttons (btn_state=0x7fffe560, 
last_status=, state=, opts=)
at irrecord.c:1966
#2  0x814d in ?? ()
#3  0x6e4c in ?? ()
#4  0x77613f2a in __libc_start_main (main=0x64c0,
argc=4, 
argv=0x7fffeb58, init=, fini=, 
rtld_fini=, stack_end=0x7fffeb48)
at ../csu/libc-start.c:310
#5  0x74da in ?? ()
(gdb) 

James



Bug#871987: This bug appears to be fixed as of 1.1.0g-2

2017-11-11 Thread James Bottomley
I can confirm that all the previously failing tools (dovecot, stunnel,
etc.) are working again with 1.1.0g-2.

Presumably this also fixes 

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=875423

James



Bug#871987: openssl breaks dovecot

2017-08-16 Thread James Bottomley
On Wed, 2017-08-16 at 08:34 +0200, Sebastian Andrzej Siewior wrote:
> On 2017-08-14 10:46:04 [-0700], James Bottomley wrote:
> > 
> > Just a me too on this: on upgrade, both dovecot and a stunnel based
> > web application got broken for an older android client.
> >  Downgrading to 1.1.0f-3 fixes the problem for both dovecot and
> > stunnel4
> 
> So what are we talking about? Android 4 and the internal mail and web
> client? What happens if you switch to firefox/chrome and k-9
> mail/blue mail?

When you run a system for others, you don't get to dictate tools.
However, from the complaints it seems to be Android 2.3.7 and any
embedded system still using openssl 0.9.8, which must be using TLS 1.0.

James



Bug#871987: openssl breaks dovecot

2017-08-14 Thread James Bottomley
Just a me too on this: on upgrade, both dovecot and a stunnel based web
application got broken for an older android client.  Downgrading
to 1.1.0f-3 fixes the problem for both dovecot and stunnel4

James



Bug#793814: Workaround for this bug

2016-05-11 Thread James Bottomley
On Wed, 2016-05-11 at 12:06 +0200, Michael Biebl wrote:
> Hi James
> 
> On Thu, 15 Oct 2015 10:55:05 -0700 James Bottomley
> <james.bottom...@hansenpartnership.com> wrote:
> > On Thu, 2015-10-15 at 19:45 +0200, Michael Biebl wrote:
> > > Btw, in both cases (#770135 and #793814) the users have been
> > > restarting the dbus daemon. You mentioned that you did not do that.
> > > 
> > > So I'm not sure if it's actually the same issue after all and your
> > > problem should be tracked as a separate issue.
> > 
> > I'm starting to think it's a logind robustness problem.  I traced back
> > through the logs to the first instance after reboot and this is what I
> > find:
> 
> ..
> 
> > Logind is still active and replies later in the trace, so it looks like
> > dbus either dropped a message or did some type of unexpected disconnect.
> > After this, logind works until it can't abandon the session, then it
> > never replies on the bus again.  So I suspect somewhere in the error
> > handling inside logind it doesn't cope with unexpected loss of dbus
> > messages.
> 
> We recently had this upstream bug report, which looks like it could be
> relevant:
> https://github.com/systemd/systemd/issues/1961
> 
> Do you see session scopes piling up on your system?

bedivere:~# ls -ld /run/systemd/system/session-*|wc -l
141

So apparently, yes.
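A rough way to check whether these leftover scopes belong to sessions logind has already forgotten is to compare the session-N.scope entries against the live session IDs. This is a hedged sketch, not part of systemd: the helper name count_stale_scopes is hypothetical, and it assumes the /run/systemd/system/session-N.scope layout that the ls above lists.

```sh
#!/bin/sh
# count_stale_scopes DIR [LIVE_ID...]
# Hypothetical helper: prints how many session-N.scope entries in DIR have an
# ID N that does not appear among the live session IDs given as arguments.
count_stale_scopes() {
    dir=$1
    shift
    live=" $* "                      # space-delimited list for substring matching
    stale=0
    for f in "$dir"/session-*.scope; do
        [ -e "$f" ] || continue      # the glob matched nothing
        id=${f##*/session-}          # drop the directory and "session-" prefix
        id=${id%.scope}              # drop the ".scope" suffix
        case $live in
            *" $id "*) ;;            # logind still knows this session
            *) stale=$((stale + 1)) ;;
        esac
    done
    echo "$stale"
}
```

Usage sketch: count_stale_scopes /run/systemd/system $(loginctl list-sessions --no-legend | awk '{print $1}'). The glob deliberately skips the session-N.scope.d directories, so each session is counted once.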



Bug#775578: systemd kills spamassassin on system start

2016-02-11 Thread James Bottomley
On Thu, 2016-02-11 at 23:41 +0100, Michael Biebl wrote:
> Control: tags -1 + moreinfo
> 
> On Sat, 17 Jan 2015 09:08:32 -0800 James Bottomley
> <james.bottom...@hansenpartnership.com> wrote:
> > Package: systemd
> > Version: 215-8
> > Severity: normal
> > 
> > Almost every time the system reboots, spamassassin fails to start.
> > The systemd logs for this are:
> > 
> > # systemctl status -l spamassassin.service
> > ● spamassassin.service - Perl-based spam filter using text analysis
> >    Loaded: loaded (/lib/systemd/system/spamassassin.service; enabled)
> >    Active: failed (Result: timeout) since Sat 2015-01-17 08:49:04 PST; 3min 45s ago
> >   Process: 528 ExecStart=/usr/sbin/spamd -d --pidfile=/var/run/spamassassin.pid $OPTIONS (code=killed, signal=TERM)
> > 
> > Jan 17 08:48:10 bedivere spamd[528]: logger: removing stderr method
> > Jan 17 08:49:04 bedivere systemd[1]: spamassassin.service start operation timed out. Terminating.
> > Jan 17 08:49:04 bedivere systemd[1]: Failed to start Perl-based spam filter using text analysis.
> > Jan 17 08:49:04 bedivere systemd[1]: Unit spamassassin.service entered failed state.
> > Jan 17 08:49:04 bedivere spamd[748]: spamd: server killed by SIGTERM, shutting down
> > Jan 17 08:49:04 bedivere spamd[748]: spamd: cannot unlink /var/run/spamassassin.pid: No such file or directory
> > 
> > This server is still x86 and a big internet system, so it has lots of
> > intensive processes on start, like fail2ban, clamd and apache.  It
> > looks like because of this, systemd gives spamassassin a few seconds
> > (there's no log of how long; the logger message is from the pre-reboot
> > os) to start and it takes longer.
> > 
> > As far as I can tell, this value doesn't seem to be configurable or
> > even package specific.  It looks remarkably silly for the init system
> > to impose an absolute timeout on service start, particularly when it
> > doesn't take into account the characteristics of the machine or ask
> > the package how long it might reasonably take.
> > 
> > So far, it's only spamassassin, so it's annoying but not serious to
> > have to log in and restart it after every reboot.  However, if systemd
> > did this to a necessary service, it would become a serious bug
> 
> Can you provide instructions how this issue can be reproduced?

You mean reproduce this outside of booting the system? I have no idea
how to do that.  I suspect it's a load issue, so this is the system:

bedivere:~# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping        : 9
microcode       : 0x2e
cpu MHz         : 2813.471
cache size      : 512 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fdiv_bug        : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pebs bts cid xtpr
bugs            :
bogomips        : 5626.94
clflush size    : 64
cache_alignment : 128
address sizes   : 36 bits physical, 32 bits virtual
power management:

bedivere:~# cat /proc/meminfo
MemTotal:        1022016 kB
MemFree:           43816 kB
MemAvailable:     346924 kB
Buffers:           36980 kB
Cached:           295580 kB
SwapCached:        39344 kB
Active:           528756 kB
Inactive:         395168 kB
Active(anon):     321488 kB
Inactive(anon):   306540 kB
Active(file):     207268 kB
Inactive(file):    88628 kB
Unevictable:           0 kB
Mlocked:               0 kB
HighTotal:        131016 kB
HighFree:           7628 kB
LowTotal:         891000 kB
LowFree:           36188 kB
SwapTotal:       1951892 kB
SwapFree:        1582008 kB
Dirty:               120 kB
Writeback:             0 kB
AnonPages:        568992 kB
Mapped:            54324 kB
Shmem:             36664 kB
Slab:              36280 kB
SReclaimable:      21972 kB
SUnreclaim:        14308 kB
KernelStack:        3096 kB
PageTables:         4420 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     2462900 kB
Committed_AS:    1973536 kB
VmallocTotal:     122880 kB
VmallocUsed:       11120 kB
VmallocChunk:     103032 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       4096 kB
DirectMap4k:       86008 kB
DirectMap4M:      823296 kB

You can probably configure a VM roughly to match that.
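For what it's worth, systemd does expose a per-unit start timeout, TimeoutStartSec=, which a local drop-in can raise without touching the packaged unit. Assuming the start timeout is what kills spamd here, a sketch follows; the file name and the 5-minute value are guesses, not something tested on this machine.

```ini
# /etc/systemd/system/spamassassin.service.d/timeout.conf  (hypothetical file name)
[Service]
# Give spamd longer to come up on a loaded boot; the value is an assumption.
TimeoutStartSec=5min
```

Run systemctl daemon-reload afterwards so systemd picks up the drop-in.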

James





Bug#793814: Workaround for this bug

2015-10-15 Thread James Bottomley
On Thu, 2015-10-15 at 07:35 +0200, Stefano wrote:
> Same here. I have a git server with clients connecting every 10 minutes or
> so via SSH, and the problem is exactly the same.
> 
> About 20 hours from reboot logind goes nuts, then it must be restarted at
> very variable intervals, about a couple a day.

What architecture is this?  I only see the problems on a 32 bit x86
system.  I don't see them on a 64 bit x86 in spite of trying to trigger
them there (in a sample size of 1 of each, so it's not statistically
significant ... yet).

I also see that /run/systemd/sessions has about a hundred or so sessions
whose leaders are dead on the 32 bit system.  On the 64 bit system it
appears only still living sessions are in here.

The final curious observation is that on the 32 bit system, the session
id never seems to go over 4k.  On the 64 bit system, it does.

James



Bug#793814: Workaround for this bug

2015-10-15 Thread James Bottomley
On Thu, 2015-10-15 at 19:45 +0200, Michael Biebl wrote:
> Btw, in both cases (#770135 and #793814) the users have been restarting
> the dbus daemon. You mentioned that you did not do that.
> 
> So I'm not sure if it's actually the same issue after all and your
> problem should be tracked as a separate issue.

I'm starting to think it's a logind robustness problem.  I traced back
through the logs to the first instance after reboot and this is what I
find:

Oct 11 04:17:01 bedivere CRON[2185]: pam_unix(cron:session): session opened for user root by (uid=0)
Oct 11 04:17:01 bedivere CRON[2185]: pam_unix(cron:session): session closed for user root
Oct 11 04:17:40 bedivere sshd[2193]: Accepted publickey for clh15 from 184.11.141.41 port 38172 ssh2: DSA SHA256:UK5h9LRMtBthvW0Ncv1SG4WRmSFNs1hPcowPzzyt+iY
Oct 11 04:17:40 bedivere sshd[2193]: pam_unix(sshd:session): session opened for user clh15 by (uid=0)
Oct 11 04:17:40 bedivere systemd-logind[640]: New session 916 of user clh15.
Oct 11 04:17:40 bedivere sshd[2195]: Accepted publickey for clh15 from 184.11.141.41 port 38173 ssh2: DSA SHA256:UK5h9LRMtBthvW0Ncv1SG4WRmSFNs1hPcowPzzyt+iY
Oct 11 04:17:40 bedivere sshd[2195]: pam_unix(sshd:session): session opened for user clh15 by (uid=0)
Oct 11 04:17:40 bedivere systemd-logind[640]: New session 917 of user clh15.
Oct 11 04:17:40 bedivere sshd[2195]: pam_systemd(sshd:session): Failed to create session: Message recipient disconnected from message bus without replying
Oct 11 04:17:40 bedivere systemd-logind[640]: Failed to abandon session scope: Transport endpoint is not connected
Oct 11 04:17:40 bedivere sshd[2204]: Received disconnect from 184.11.141.41: 11: disconnected by user
Oct 11 04:17:40 bedivere sshd[2204]: Disconnected from 184.11.141.41
Oct 11 04:17:40 bedivere sshd[2193]: pam_unix(sshd:session): session closed for user clh15
Oct 11 04:17:40 bedivere sshd[2203]: Received disconnect from 184.11.141.41: 11: disconnected by user
Oct 11 04:17:40 bedivere sshd[2203]: Disconnected from 184.11.141.41
Oct 11 04:17:40 bedivere sshd[2195]: pam_unix(sshd:session): session closed for user clh15
Oct 11 04:18:05 bedivere sshd[2193]: pam_systemd(sshd:session): Failed to release session: Connection timed out
Oct 11 04:18:05 bedivere systemd-logind[640]: Failed to abandon session scope: Transport endpoint is not connected
Oct 11 04:18:05 bedivere dbus[698]: [system] Failed to activate service 'org.freedesktop.login1': timed out

Thereafter everything fails to activate org.freedesktop.login1.

However, it looks like the trouble is here:

Oct 11 04:17:40 bedivere sshd[2195]: pam_systemd(sshd:session): Failed to create session: Message recipient disconnected from message bus without replying

Logind is still active and replies later in the trace, so it looks like
dbus either dropped a message or did some type of unexpected disconnect.
After this, logind works until it can't abandon the session, then it
never replies on the bus again.  So I suspect somewhere in the error
handling inside logind it doesn't cope with unexpected loss of dbus
messages.
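Finding that first failure can be scripted. The sketch below assumes the auth.log line format in the trace above and a log that starts after the reboot in question; the function name is hypothetical, and the pattern only matches the two pam_systemd messages seen in this trace.

```sh
#!/bin/sh
# first_logind_failure [LOGFILE]
# Hypothetical helper: prints the first pam_systemd "Failed to create/release
# session" line, which is where the trace above starts to go wrong.
first_logind_failure() {
    grep -m 1 -E 'pam_systemd\([^)]*\): Failed to (create|release) session' \
        "${1:-/var/log/auth.log}"
}
```

grep -m 1 stops at the first match, so this stays cheap even on a large, unrotated auth.log.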

James





Bug#793814: Workaround for this bug

2015-10-14 Thread James Bottomley
On Thu, 2015-10-15 at 00:30 +0200, Michael Biebl wrote:
> Am 15.10.2015 um 00:16 schrieb James Bottomley:
> > Since there doesn't seem to have been any progress on this from the
> > systemd side, and I'm really tired of my email users yelling at me
> > (justifiably since once the systemd-logind service dies, dovecot sasl
> > fails to work and they can't send email), this is the script I came up
> > with to alleviate the problem:
> 
> So are you restarting dbus as well for some reason?

You can't really restart dbus in a systemd system...  However, the
problem seems to be the endpoint stops listening.  systemd-logind is
still running but strace -p shows it no longer responds when something
contacts the end point.  If you restart it, it will listen for a while
before stopping again.

The script simply detects the dbus timeout and restarts the service.

James





Bug#793814: Workaround for this bug

2015-10-14 Thread James Bottomley
Since there doesn't seem to have been any progress on this from the
systemd side, and I'm really tired of my email users yelling at me
(justifiably since once the systemd-logind service dies, dovecot sasl
fails to work and they can't send email), this is the script I came up
with to alleviate the problem:

---
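#!/bin/bash
##
# shell script to restart systemd-logind when it fails
##
# Follow auth.log across rotations (-F) and inspect each new line; the case
# pattern matches pam_systemd reporting that org.freedesktop.login1 timed out.
tail -F /var/log/auth.log | while read -r line; do
    case $line in
        *pam_systemd*org.freedesktop.login1*timed\ out*)
            echo "$line"
            systemctl restart systemd-logind.service
            ;;
    esac
done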
---

You spawn it from init (redirected to a log) and it will monitor the
authentication log for any indication that systemd-logind has dropped
off the dbus and respawn it.   The effect is that even if they get a
SASL authentication failure one time, it will already be working the
next time they try.
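One hedged way to spawn the script from init on a systemd host is a small service unit. The unit name and the script path /usr/local/sbin/logind-watchdog are assumptions; with no StandardOutput= setting, the script's echo output goes to the journal rather than to a separate log file.

```ini
# /etc/systemd/system/logind-watchdog.service  (hypothetical name and path)
[Unit]
Description=Restart systemd-logind when it drops off D-Bus (local workaround)
After=systemd-logind.service

[Service]
# Assumes the script above is installed, executable, at this path.
ExecStart=/usr/local/sbin/logind-watchdog
# Bring the watcher back if tail or the loop exits.
Restart=always

[Install]
WantedBy=multi-user.target
```

After systemctl daemon-reload, systemctl enable logind-watchdog.service and systemctl start logind-watchdog.service would start it now and on every boot.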

James



Bug#793814: Workaround for this bug

2015-10-14 Thread James Bottomley
On Thu, 2015-10-15 at 01:06 +0200, Michael Biebl wrote:
> Am 15.10.2015 um 00:34 schrieb James Bottomley:
> > On Thu, 2015-10-15 at 00:30 +0200, Michael Biebl wrote:
> >> Am 15.10.2015 um 00:16 schrieb James Bottomley:
> >>> Since there doesn't seem to have been any progress on this from the
> >>> systemd side, and I'm really tired of my email users yelling at me
> >>> (justifiably since once the systemd-logind service dies, dovecot sasl
> >>> fails to work and they can't send email), this is the script I came up
> >>> with to alleviate the problem:
> >>
> >> So are you restarting dbus as well for some reason?
> > 
> > You can't really restart dbus in a systemd system...  However, the
> > problem seems to be the endpoint stops listening.  systemd-logind is
> > still running but strace -p shows it no longer responds when something
> > contacts the end point.  If you restart it, it will listen for a while
> > before stopping again.
> 
> If in your case you are not restarting dbus, can you isolate what's
> causing logind to stop responding on your system?
> I.e., can you provide us with steps how to reproduce the issue.

It happens naturally in a running system.  This one supplies
sparkleshare sync services for shared and backup directories, so there
are a huge number of micro sshd logins doing syncs, SASL via dovecot for
postfix and dovecot itself.  It's not based on LDAP, so everything goes
via the password file based authenticator.

The logs indicate that from reboot to first timeout took about 20 hours.
Thereafter, the timeouts seem to increase in frequency to about two or
three an hour.

James





Bug#801629: asterisk: when run with systemd, asterisk no longer defaults to realtime or allows it to be configured

2015-10-12 Thread James Bottomley
Package: asterisk
Version: 1:13.1.0~dfsg-1.1+b1
Severity: normal

Under SysV init, asterisk is spawned by /etc/init.d/asterisk, which
checks the /etc/default/asterisk file and spawns asterisk in realtime
mode (with the -p flag) unless the default is altered to AST_REALTIME=no.

However, with the change to systemd, the default is no longer to spawn
with the -p flag.  Indeed, it's no longer configurable because the
whole lot is hard-coded in /etc/systemd/system/asterisk.service
as:

ExecStart=/usr/sbin/asterisk -g -f -U asterisk

Firstly, I think we need to put this back to the default of being
realtime unless requested not to, and secondly, I think we need to
respect the configuration options of /etc/default/asterisk.
Unfortunately, systemd is trying to phase these files out (by ignoring
them):

http://0pointer.de/blog/projects/on-etc-sysinit.html

But according to the blog, there is still a way of importing
shell-script-like configuration files via the EnvironmentFile option, or
by spawning an actual script that reads the file.
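The second approach, spawning a script that reads the file, might look like the sketch below. Everything in it is hypothetical: the helper asterisk_flags, the install path /usr/local/sbin/asterisk-start, and the assumption that realtime (-p) stays the default unless /etc/default/asterisk sets AST_REALTIME=no, mirroring the init script behaviour described above.

```sh
#!/bin/sh
# Hypothetical wrapper, not shipped by the asterisk package.
# Prints the asterisk flags implied by a shell-style defaults file.
asterisk_flags() {
    defaults=${1:-/etc/default/asterisk}
    AST_REALTIME=yes                 # realtime unless the file says otherwise
    if [ -r "$defaults" ]; then
        . "$defaults"                # file holds VAR=value shell assignments
    fi
    flags="-g -f -U asterisk"        # the flags the current unit hard-codes
    if [ "$AST_REALTIME" != "no" ]; then
        flags="$flags -p"            # -p: run with realtime priority
    fi
    echo "$flags"
}

# When installed and run as /usr/local/sbin/asterisk-start, exec asterisk.
if [ "${0##*/}" = "asterisk-start" ]; then
    # The command substitution is unquoted on purpose so the flags split
    # into separate arguments.
    exec /usr/sbin/asterisk $(asterisk_flags)
fi

# Matching drop-in (systemctl edit asterisk.service); the empty ExecStart=
# clears the packaged command before setting the new one:
#   [Service]
#   ExecStart=
#   ExecStart=/usr/local/sbin/asterisk-start
```

Keeping the decision in a small function means the defaults-file logic can be checked without starting asterisk at all.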


-- System Information:
Debian Release: stretch/sid
  APT prefers testing
  APT policy: (500, 'testing'), (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 4.2.0-1-686-pae (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages asterisk depends on:
ii  adduser   3.113+nmu3
ii  asterisk-config   1:13.1.0~dfsg-1.1
ii  asterisk-core-sounds-en [asterisk-prompt-en]  1.4.22-1
ii  asterisk-modules  1:13.1.0~dfsg-1.1+b1
ii  asterisk-sounds-main [asterisk-prompt-en] 1:1.6.2.9-2+squeeze10
ii  init-system-helpers   1.23
ii  libc6 2.19-22
ii  libcap2   1:2.24-12
ii  libedit2  3.1-20150325-1
ii  libgcc1   1:5.2.1-21
ii  libjansson4   2.7-3
ii  libpopt0  1.16-10
ii  libsqlite3-0  3.8.11.1-1
ii  libssl1.0.0   1.0.2d-1
ii  libstdc++65.2.1-21
ii  libtinfo5 6.0+20150810-1
ii  libuuid1  2.27-3
ii  libxml2   2.9.2+zdfsg1-4
ii  libxslt1.11.1.28-2+b2

Versions of packages asterisk recommends:
ii  asterisk-moh-opsound-gsm 2.03-1
ii  asterisk-voicemail [asterisk-voicemail-storage]  1:13.1.0~dfsg-1.1+b1
ii  sox  14.4.1-5

Versions of packages asterisk suggests:
pn  asterisk-dahdi   
ii  asterisk-dev 1:13.1.0~dfsg-1.1
ii  asterisk-doc 1:13.1.0~dfsg-1.1
ii  asterisk-ooh323  1:13.1.0~dfsg-1.1+b1
pn  asterisk-vpb 

-- no debconf information



Bug#800087: closed by Michael Biebl <bi...@debian.org> (Re: Bug#800087: systemd lists running daemons as failed after reboot)

2015-10-04 Thread James Bottomley
On Sun, 2015-10-04 at 23:06 +, Debian Bug Tracking System wrote:
> Well, this is apparently an apache configuration problem, causing the
> service to return a non-zero exit code.
> 
> Once you fix that, the service should not be marked as failed.

That's pretty obviously an incorrect deduction, isn't it?  If it were a
config file problem leading to the control process erroring out then the
manual systemctl start apache would also fail; but, as you can see from
the description in the original report, it doesn't.

The logs have this in them:

[ 2015-10-03 10:41:55.6129 1309/b5cfeb40 age/Hel/Main.cpp:455 ]: Signal received. Gracefully shutting down... (send signal 2 more time(s) to force shutdown)
[ 2015-10-03 10:41:55.6160 1319/b6960b40 age/Log/Main.cpp:349 ]: Signal received. Gracefully shutting down... (send signal 2 more time(s) to force shutdown)
[ 2015-10-03 10:45:21.1489 1353/b5dfeb40 age/Hel/Main.cpp:455 ]: Signal received. Gracefully shutting down... (send signal 2 more time(s) to force shutdown)
[ 2015-10-03 10:45:21.1521 1374/b68fab40 age/Log/Main.cpp:349 ]: Signal received. Gracefully shutting down... (send signal 2 more time(s) to force shutdown)

So apache dies because something external kills it during boot up.  Hm,
I wonder what that could be ...

James



Bug#800087: systemd lists running daemons as failed after reboot

2015-10-03 Thread James Bottomley
On Wed, 2015-09-30 at 15:20 +0200, Michael Biebl wrote:
> Control: tags -1 + moreinfo
> 
> Am 26.09.2015 um 18:08 schrieb James Bottomley:
> > Package: systemd
> > Version: 226-3
> > Severity: normal
> > 
> > rebooting the current system often causes listed init system failures.
> > in addition to failing to start spamd (listed under separate bug).
> > Systemd thinks units have failed when, in fact, they're running.  The
> > most common units for this are apache2 and mysql.
> > 
> > This is the state of a recent reboot, where apache2 is listed as failed
> > 
> > bedivere:~# systemctl --failed
> >   UNITLOAD   ACTIVE SUBDESCRIPTION
> > ● apache2.service loaded failed failed LSB: Apache2 web server
> > 
> > LOAD   = Reflects whether the unit definition was properly loaded.
> > ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
> > SUB= The low-level unit activation state, values depend on unit type.
> > 
> > 1 loaded units listed. Pass --all to see loaded but inactive units, too.
> > To show all installed unit files use 'systemctl list-unit-files'.
> > 
> > Whereas ps shows it to be running
> 
> What's the output of
> systemctl status apache2.service ?

OK, so the machine only has a maintenance window on Saturday.  I
rebooted and ran that, and this is the result:

bedivere:~# systemctl status -l apache2.service
● apache2.service - LSB: Apache2 web server
   Loaded: loaded (/etc/init.d/apache2)
   Active: failed (Result: exit-code) since Sat 2015-10-03 10:45:42 PDT; 1min 53s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 693 ExecStart=/etc/init.d/apache2 start (code=exited, status=1/FAILURE)
   CGroup: /system.slice/apache2.service
           ├─1407 /usr/sbin/apache2 -k start
           ├─1410 PassengerAgent watchdog
           ├─1417 PassengerAgent server
           ├─1422 PassengerAgent logger
           ├─1699 /usr/sbin/apache2 -k start
           ├─1701 /usr/sbin/apache2 -k start
           ├─1702 /usr/sbin/apache2 -k start
           ├─1704 /usr/sbin/apache2 -k start
           └─1706 /usr/sbin/apache2 -k start

Oct 03 10:44:38 bedivere systemd[1]: Starting LSB: Apache2 web server...
Oct 03 10:45:06 bedivere apache2[693]: Starting web server: apache2AH00180: WARNING: MaxRequestWorkers of 20 exceeds ServerLimit value of
Oct 03 10:45:06 bedivere apache2[693]: 5 servers, decreasing MaxRequestWorkers to 5.
Oct 03 10:45:06 bedivere apache2[693]: To increase, please see the ServerLimit directive.
Oct 03 10:45:42 bedivere apache2[693]: failed!
Oct 03 10:45:42 bedivere apache2[693]: The apache2 instance did not start within 20 seconds. Please read the log files to discover problems ... (warning).
Oct 03 10:45:42 bedivere systemd[1]: apache2.service: Control process exited, code=exited status=1
Oct 03 10:45:42 bedivere systemd[1]: Failed to start LSB: Apache2 web server.
Oct 03 10:45:42 bedivere systemd[1]: apache2.service: Unit entered failed state.
Oct 03 10:45:42 bedivere systemd[1]: apache2.service: Failed with result 'exit-code'.

James



Bug#793814: I just ran into this on debian testing

2015-09-17 Thread James Bottomley
Package: systemd
Version: 225-1

My fatal symptoms are that Postfix SASL stops working with

Sep 17 07:11:11 bedivere postfix/smtpd[20955]: warning: unknown[x.x.x.x]: SASL PLAIN authentication failed: Connection lost to authentication server

Which is really annoying because it requires fixing by restarting
systemd-logind on the server and then restarting evolution on the
client.

I also get the delayed login problem until systemd-logind is restarted,
but that's minor compared to the postfix failure, which triggers
complaints from my users.

James



Bug#789242: libmyth-0.27-0: metadata fetching for specials no longer works

2015-06-18 Thread James Bottomley
Package: libmyth-0.27-0
Version: 0.27.4+fixes20150606-dmo1
Severity: normal

Upstream bug: https://code.mythtv.org/trac/ticket/12460

A bug has been introduced into mythtv that prevents the grabber from
fetching data for special episodes.  The problem is a check in
metadatadownload.cpp which checks the season for being non-zero before
grabbing specific episode data.  Fix this by only checking the episode
for being non-zero.

-- System Information:
Debian Release: stretch/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 3.16.0-4-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages libmyth-0.27-0 depends on:
ii  iputils-ping  3:20121221-5+b2
ii  libasound21.0.28-1
ii  libass5   1:0.12.2-dmo1
ii  libavahi-compat-libdnssd1 0.6.31-5
ii  libavc1394-0  0.5.4-2
ii  libc6 2.19-18
ii  libcrystalhd3 1:0.0~git20110715.fdd2f19-11
ii  libfftw3-double3  3.3.4-2
ii  libfftw3-single3  3.3.4-2
ii  libfreetype6  2.5.2-4
ii  libgcc1   1:5.1.1-9
ii  libgl1-mesa-glx [libgl1]  10.5.5-1
ii  libhdhomerun1 20140604-2
ii  libiec61883-0 1.2.0-0.2
ii  libjack-jackd2-0 [libjack-0.116]  1.9.10+20140719git3eb0ae6a~dfsg-2
ii  libmp3lame0   1:3.99.5-dmo4
ii  libmythavcodec54  0.27.4+fixes20150606-dmo1
ii  libmythavformat54 0.27.4+fixes20150606-dmo1
ii  libmythavutil52   0.27.4+fixes20150606-dmo1
ii  libmythswresample00.27.4+fixes20150606-dmo1
ii  libmythswscale2   0.27.4+fixes20150606-dmo1
ii  libmythzmq1   0.27.4+fixes20150606-dmo1
ii  libpulse0 6.0-2
ii  libqjson0 0.8.1-3
ii  libqt4-dbus   4:4.8.7+dfsg-1
ii  libqt4-network4:4.8.7+dfsg-1
ii  libqt4-opengl 4:4.8.7+dfsg-1
ii  libqt4-script 4:4.8.7+dfsg-1
ii  libqt4-sql4:4.8.7+dfsg-1
ii  libqt4-sql-mysql  4:4.8.7+dfsg-1
ii  libqt4-xml4:4.8.7+dfsg-1
ii  libqtcore44:4.8.7+dfsg-1
ii  libqtgui4 4:4.8.7+dfsg-1
ii  libqtwebkit4  2.3.4.dfsg-3
ii  libraw1394-11 2.1.1-1
ii  libsamplerate00.1.8-8
ii  libssl1.0.0   1.0.2c-1
ii  libstdc++65.1.1-9
ii  libtag1c2a1.9.1-2.1
ii  libva-glx11.5.1-2
ii  libva-x11-1   1.5.1-2
ii  libva11.5.1-2
ii  libvdpau1 1.1-1
ii  libx11-6  2:1.6.3-1
ii  libxext6  2:1.3.3-1
ii  libxinerama1  2:1.1.3-1+b1
ii  libxml2   2.9.1+dfsg1-5
ii  libxrandr22:1.4.2-1+b1
ii  libxv12:1.0.10-1+b1
ii  libxxf86vm1   1:1.1.4-1
ii  procps2:3.3.9-9
ii  zlib1g1:1.2.8.dfsg-2+b1

Versions of packages libmyth-0.27-0 recommends:
ii  udisks2  2.1.5-3

libmyth-0.27-0 suggests no packages.

-- no debconf information


Index: mythtv-dmo-0.27.4+fixes20150606/libs/libmythmetadata/metadatadownload.cpp
===
--- mythtv-dmo-0.27.4+fixes20150606.orig/libs/libmythmetadata/metadatadownload.cpp
+++ mythtv-dmo-0.27.4+fixes20150606/libs/libmythmetadata/metadatadownload.cpp
@@ -566,7 +566,7 @@ MetadataLookupList MetadataDownload::han
   lookup->GetSubtitle(), lookup, false);
 }
 
-if (list.isEmpty() && lookup->GetSeason() && lookup->GetEpisode())
+if (list.isEmpty() && lookup->GetEpisode())
 {
 list = grabber.LookupData(lookup->GetInetref(), lookup->GetSeason(),
   lookup->GetEpisode(), lookup);


Bug#788898: grub2-common: Jessie install on 4TB disk fails to reboot

2015-06-15 Thread James Bottomley
Package: grub2-common
Version: 2.02~beta2-23
Severity: important
usertags: debian-b...@lists.debian.org

Doing a fresh install of Jessie netinst on a brand-new 4TB disk
failed to reboot.  The system completely refuses to boot after the USB
key is removed.  The problem, as it turns out on my system, is that
nothing in the install process set the boot flag in the protective master
boot record, and without that, the system refuses to boot the disk.

Fixing it up with parted disk_set pmbr_boot on was simple enough (after
about an hour of trying to work out what the problem was), but it will
appear to be an install failure to most users.

Most modern motherboards will recognise a GPT-labelled partition table and
boot it regardless of the state of the protective master boot record boot
flag, but for the benefit of those which don't, the debian installer
should set this flag on all disks.
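A hedged sketch of checking and setting the flag from a shell follows. It assumes parted's print output lists disk-level flags on a "Disk Flags:" line, as parted 3.x does; the helper name has_pmbr_boot and the device /dev/sdX are placeholders.

```sh
#!/bin/sh
# has_pmbr_boot: hypothetical helper. Reads `parted -s DEVICE print` output on
# stdin and succeeds if pmbr_boot appears among the disk flags.
has_pmbr_boot() {
    grep -q '^Disk Flags:.*pmbr_boot'
}

# Usage sketch (needs root; replace /dev/sdX with the actual GPT disk):
#   parted -s /dev/sdX print | has_pmbr_boot ||
#       parted -s /dev/sdX disk_set pmbr_boot on
```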


-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#787814: get-iplayer: unable to download anything

2015-06-07 Thread James Bottomley
On Fri, 05 Jun 2015 15:24:09 +0100 Peter J Ross <peadar.ru...@gmail.com> wrote:
> On Fri, 05 Jun 2015 12:04:52 +0100 Juha Jäykkä
> <ju...@iki.fi> wrote:
> > Not a SCALAR reference at /usr/bin/get-iplayer line 7099.
> 
> This has been fixed upstream in version 2.93.
> 
> Current upstream version is 2.94.

I confirm this.  Simply copying get_iplayer version 2.94 from the git
repository over the existing binary file makes everything work again.

Can we update the version please?  Having a non-working package doesn't
really help anyone.





Bug#784072: fail2ban recidive jail no longer works

2015-05-02 Thread James Bottomley
Package: fail2ban
Version: 0.9.1-1
Severity: important

The recidive jail is spewing lines like this into fail2ban.log:

2015-05-02 11:30:38,076 fail2ban.action [26155]: ERROR   iptables -N f2b-recidive
iptables -A f2b-recidive -j RETURN
iptables -I INPUT -p all -m multiport --dports all -j f2b-recidive -- stderr: biptables v1.4.21: multiport needs `-p tcp', `-p udp', `-p udplite', `-p sctp' or `-p dccp'\nTry `iptables -h' or 'iptables --help' for more information.\n
2015-05-02 11:30:38,077 fail2ban.action [26155]: ERROR   iptables -N f2b-recidive
iptables -A f2b-recidive -j RETURN
iptables -I INPUT -p all -m multiport --dports all -j f2b-recidive -- returned 2

The reason seems to be this in jail.conf

[recidive]
logpath  = /var/log/fail2ban.log
port = all
protocol = all
...

Adding a jail.local entry

[recidive]
enabled = true
banaction = iptables-allports

fixes the error, so perhaps the banaction line should be added to the
[recidive] section of jail.conf.

-- System Information:
Debian Release: 8.0
  APT prefers testing
  APT policy: (500, 'testing'), (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 3.16.0-4-686-pae (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages fail2ban depends on:
ii  init-system-helpers  1.22
ii  lsb-base             4.1+Debian13+nmu1
ii  python3              3.4.2-2
pn  python3:any          <none>

Versions of packages fail2ban recommends:
ii  iptables           1.4.21-2+b1
ii  python3-pyinotify  0.9.5-1
ii  whois              5.2.7

Versions of packages fail2ban suggests:
ii  bsd-mailx [mailx]            8.1.2-0.20141216cvs-2
ii  mailutils [mailx]            1:2.99.98-2
ii  mailx                        1:20081101-2
pn  python3-systemd              <none>
ii  rsyslog [system-log-daemon]  8.4.2-1

-- no debconf information





Bug#783309: dovecot-core: dovecot 2.2.13 segfaults when selecting virtual folders

2015-04-25 Thread James Bottomley
Package: dovecot-core
Version: 1:2.2.13-1
Severity: important

Evolution and Squirrelmail misbehave badly with Dovecot virtual search
folders (to the extent that Evolution doesn't really function).  This
is because the imap process segfaults when the folder is selected.  The
logs show this:

Apr 25 02:10:22 bedivere dovecot: imap(jejb): Fatal: master: service(imap): 
child 19837 killed with signal 11 (core dumps disabled)

The virtual namespace containing the crashing folder is configured as:

namespace {
  prefix = virtual/
  separator = /
  location = virtual:~/Maildir/virtual
}

The corresponding virtual/openstack/dovecot-virtual file is:

Lists/openstack-dev
  inthread refs or from james bottomley keyword thread

It's a standard search for threads I've either replied to or marked with
the IMAP keyword 'thread'.

The IMAP command

0 status virtual/openstack (MESSAGES UNSEEN RECENT)

crashes most of the time, as this gdb session shows:

gdb /usr/lib/dovecot/imap
GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i586-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/lib/dovecot/imap...Reading symbols
from /usr/lib/debug//usr/lib/dovecot/imap...done.
done.
(gdb) r
Starting program: /usr/lib/dovecot/imap 
process 7868 is executing new program: /usr/bin/doveconf
process 7868 is executing new program: /usr/lib/dovecot/imap
* PREAUTH [CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID
ENABLE IDLE SORT SORT=DISPLAY THREAD=REFERENCES THREAD=REFS
THREAD=ORDEREDSUBJECT MULTIAPPEND URL-PARTIAL CATENATE UNSELECT CHILDREN
NAMESPACE UIDPLUS LIST-EXTENDED I18NLEVEL=1 CONDSTORE QRESYNC ESEARCH
ESORT SEARCHRES WITHIN CONTEXT=SEARCH LIST-STATUS SPECIAL-USE BINARY
MOVE] Logged in as jejb
0 status virtual/openstack (MESSAGES UNSEEN RECENT)

Program received signal SIGSEGV, Segmentation fault.
0xb7f0814b in mail_search_args_deinit_sub (args=args@entry=0x8005caf0, 
arg=0x8005cb08) at mail-search.c:169
169 mail-search.c: No such file or directory.
(gdb) bt
#0  0xb7f0814b in mail_search_args_deinit_sub
(args=args@entry=0x8005caf0, 
arg=0x8005cb08) at mail-search.c:169
#1  0xb7f08aea in mail_search_args_deinit (args=0x8005caf0)
at mail-search.c:193
#2  0xb7d9ecf1 in virtual_mailbox_close_internal
(mbox=mbox@entry=0x800585d0)
at virtual-storage.c:253
#3  0xb7d9ed7b in virtual_mailbox_close (box=0x800585d0)
at virtual-storage.c:307
#4  0xb7f403ff in mail_thread_mailbox_close (box=0x800585d0)
at index-thread.c:628
#5  0xb7f0d3fe in mailbox_close (box=0x800585d0) at mail-storage.c:1182
#6  0xb7f0d472 in mailbox_free (_box=0xb878) at mail-storage.c:1197
#7  0x8001e0ae in imap_status_get (cmd=0x80053948, ns=0x80052220, 
mailbox=0x8003a250 "virtual/openstack", items=0xb8f8, 
result_r=0xb900) at imap-status.c:96
#8  0x80011f64 in cmd_status (cmd=0x80053948) at cmd-status.c:40
#9  0x80016847 in command_exec (cmd=0x80053948) at imap-commands.c:158
#10 0x800155cb in client_command_input (cmd=0x80053948) at
imap-client.c:778
#11 0x80015719 in client_command_input (cmd=0x80053948) at
imap-client.c:839
#12 0x800159cd in client_handle_next_command (remove_io_r=<synthetic pointer>, 
client=0x80052ee0) at imap-client.c:877
#13 client_handle_input (client=0x80052ee0) at imap-client.c:889
#14 0x80015dd5 in client_input (client=0x80052ee0) at imap-client.c:931
---Type <return> to continue, or q <return> to quit---
#15 0xb7e202e3 in io_loop_call_io (io=0x800538c8) at ioloop.c:441
#16 0xb7e214be in io_loop_handler_run_internal (ioloop=0x80042458)
at ioloop-epoll.c:220
#17 0xb7e2036a in io_loop_handler_run (ioloop=0x80042458) at
ioloop.c:488
#18 0xb7e203f9 in io_loop_run (ioloop=0x80042458) at ioloop.c:465
#19 0xb7dc6985 in master_service_run (service=0x80042380, 
callback=0x800200b0 <client_connected>) at master-service.c:566
#20 0x80008b44 in main (argc=1, argv=0xbd84) at main.c:400

I thought it might be fixed by this patch:

http://hg.dovecot.org/dovecot-2.2/rev/5c6f49e2d8d9

but after applying it I still get the same segfault.

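The fault is raised on this assertion in mail-search.c (line 169 in the
backtrace):

   case SEARCH_INTHREAD:
	i_assert(arg->value.search_args->refcount > 0);

so the INTHREAD sub-search arguments appear to be invalid by the time
they are deinitialised; there is probably some problem with how the
search itself is executed.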

The bug isn't present in the experimental package 2.2.15, so the fix
must be in the dovecot tree somewhere between 2.2.13 and 2.2.15, but I
can't find it.

-- Package-specific

Bug#769692: libnet-dns-perl has non-numeric version number 0.80_2

2014-11-15 Thread James Bottomley
Package: libnet-dns-perl
Version: 0.80.2-2
Severity: important

This causes bugs in any package which checks the version; for instance:

/etc/cron.daily/amavisd-new:
Argument "0.80_2" isn't numeric in numeric ge (>=) at 
/usr/share/perl5/Mail/SpamAssassin/Plugin/AskDNS.pm line 214.
Argument "0.80_2" isn't numeric in numeric lt (<) at 
/usr/share/perl5/Mail/SpamAssassin/Dns.pm line 521.
Argument "0.80_2" isn't numeric in numeric ge (>=) at 
/usr/share/perl5/Mail/SpamAssassin/Plugin/AskDNS.pm line 214.

The fix is simple: change VERSION in DNS.pm from 0.80_2 to 0.8002.
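Perl converts a string to a number by taking its longest leading decimal prefix and warning about anything left over. Underscores are only accepted as digit separators in numeric literals in source code, not in strings. So "0.80_2" compares as 0.8, with the warning seen above, while "0.8002" is a clean number. The hypothetical helper below approximates that rule for plain decimal strings only. It is a sketch, not Perl's actual implementation, and it ignores exponents, hex, and special values such as "inf".

```python
import re
import warnings

# Optional leading whitespace and sign, then digits with an optional fraction, or a bare fraction.
_LEADING_NUMBER = re.compile(r"\s*([-+]?(?:\d+(?:\.\d*)?|\.\d+))")


def perl_numify(s: str) -> float:
    """Approximate Perl's string-to-number conversion for plain decimal strings."""
    m = _LEADING_NUMBER.match(s)
    if m is None:
        warnings.warn(f'Argument "{s}" isn\'t numeric')
        return 0.0
    if s[m.end():].strip():
        # Anything other than trailing whitespace triggers the warning,
        # but the numeric prefix is still used.
        warnings.warn(f'Argument "{s}" isn\'t numeric')
    return float(m.group(1))
```

Under this rule every numeric version comparison sees 0.8 rather than the intended 0.8002, which is why renumbering the version fixes the callers.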

-- System Information:
Debian Release: jessie/sid
  APT prefers testing
  APT policy: (500, 'testing'), (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 3.16.0-4-686-pae (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages libnet-dns-perl depends on:
ii  libc6                       2.19-13
ii  libdigest-hmac-perl         1.03+dfsg-1
ii  libio-socket-inet6-perl     2.72-1
ii  libnet-ip-perl              1.26-1
ii  perl                        5.20.1-2
ii  perl-base [perlapi-5.20.1]  5.20.1-2

libnet-dns-perl recommends no packages.

libnet-dns-perl suggests no packages.

-- no debconf information

-- debsums errors found:
debsums: changed file /usr/lib/i386-linux-gnu/perl5/5.20/Net/DNS.pm (from 
libnet-dns-perl package)





Bug#759807: spamassassin fails to start after perl-5.20 upgrade

2014-08-30 Thread James Bottomley
Package: spamassassin
Version: 3.4.0-2
Severity: important

It fails with:

 bedivere spamd[2472]: Can't locate Mail/SpamAssassin/CompiledRegexps/body_0.pm 
in @INC (you may need to install the 
Mail::SpamAssassin::CompiledRegexps::body_0 module) (@INC contains: 
/var/lib/spamassassin/compiled/5.020/3.004000 
/var/lib/spamassassin/compiled/5.020/3.004000/auto lib /usr/share/perl5 
/etc/perl /usr/local/lib/i386-linux-gnu/perl/5.20.0 
/usr/local/share/perl/5.20.0 /usr/lib/i386-linux-gnu/perl5/5.20 
/usr/lib/i386-linux-gnu/perl/5.20 /usr/share/perl/5.20 
/usr/local/lib/site_perl) at (eval 1027) line 1.

This is because the upgrade changed the path under which it looks for
the compiled regexps.

The problem can be fixed by running sa-compile and then restarting
spamassassin.

Since this is going to happen on every upgrade of perl, either
spamassassin should be fixed to recompile the rules automatically in
this case, or the spamassassin package should install some perl upgrade
hook to ensure the recompile gets done.
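The compiled-rules directory in the @INC list above encodes both the perl version (5.020) and the SpamAssassin version (3.004000). That is why a perl upgrade leaves spamd looking in a directory for which nothing has been compiled yet. A hook of the kind suggested could check for that directory before deciding to run sa-compile. Below is a minimal sketch with hypothetical helper names. The naming scheme is inferred from that single log line only, including the assumption that the perl patch level is not part of the name, and it is not taken from SpamAssassin's code.

```python
import os

COMPILED_BASE = "/var/lib/spamassassin/compiled"  # as seen in the @INC list


def compiled_rules_dir(perl_version, sa_version, base=COMPILED_BASE):
    """Build the per-version directory name (layout inferred from one log line)."""
    perl_major, perl_minor = perl_version[0], perl_version[1]
    sa_major, sa_minor, sa_patch = sa_version
    return os.path.join(
        base,
        f"{perl_major}.{perl_minor:03d}",                # e.g. 5.20.x -> "5.020"
        f"{sa_major}.{sa_minor:03d}{sa_patch:03d}",      # e.g. 3.4.0  -> "3.004000"
    )


def needs_recompile(perl_version, sa_version, base=COMPILED_BASE):
    """True when no compiled rules exist yet for this perl/SpamAssassin pair."""
    return not os.path.isdir(compiled_rules_dir(perl_version, sa_version, base))
```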

-- System Information:
Debian Release: jessie/sid
  APT prefers testing
  APT policy: (500, 'testing'), (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 3.14-2-686-pae (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages spamassassin depends on:
ii  adduser                         3.113+nmu3
ii  init-system-helpers             1.21
pn  libarchive-tar-perl             <none>
ii  libhtml-parser-perl             3.71-1+b2
ii  libnet-dns-perl                 0.79-1
ii  libnetaddr-ip-perl              4.075+dfsg-1+b1
ii  libsocket6-perl                 0.25-1+b1
ii  libsys-hostname-long-perl       1.4-3
ii  libwww-perl                     6.08-1
ii  perl                            5.20.0-4
ii  perl-modules [libio-zlib-perl]  5.20.0-4

Versions of packages spamassassin recommends:
ii  gnupg  1.4.18-2
ii  libio-socket-inet6-perl2.72-1
ii  libmail-spf-perl   2.9.0-3
ii  perl [libsys-syslog-perl]  5.20.0-4
ii  sa-compile 3.4.0-2
ii  spamc  3.4.0-2

Versions of packages spamassassin suggests:
ii  libdbi-perl  1.631-3+b1
ii  libio-compress-perl [libcompress-zlib-perl]  2.064-1
ii  libio-socket-ssl-perl1.997-2
ii  libmail-dkim-perl0.40-1
ii  perl [libcompress-zlib-perl] 5.20.0-4
ii  pyzor1:0.5.0-2
ii  razor1:2.85-4.1+b1

-- Configuration Files:
/etc/default/spamassassin changed [not included]
/etc/spamassassin/v310.pre changed [not included]

-- no debconf information





Bug#744774: fail2ban recidive no longer sends lines matching IP in fail2ban.log

2014-04-14 Thread James Bottomley
Package: fail2ban
Version: 0.8.13-1
Severity: normal

The regular expression for reporting the actual failing lines in
sendmail-whois-lines.conf does not match the ban lines in fail2ban.log
that the recidive jail triggers on.  The reason is that the IP address
appears at the end of the line, so the grep

grep '[^0-9]<ip>[^0-9]' <logpath>

does not match (end of line is not a matchable character).  The fix is
to use an extended grep matching either a non-digit or end of line:

egrep '[^0-9]<ip>([^0-9]|$)' <logpath>
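The difference between the two patterns can be checked with Python's `re`, which treats them the same way as POSIX extended grep. In the sketch below, the log lines are made up in the style of fail2ban's ban messages and the helper names are hypothetical. It escapes the IP with `re.escape`, whereas fail2ban substitutes the raw address, so there the dots match any character.

```python
import re


def old_pattern(ip):
    # As shipped: a non-digit must follow the IP, so an IP at end of line never matches.
    return re.compile(r"[^0-9]" + re.escape(ip) + r"[^0-9]")


def new_pattern(ip):
    # Proposed fix: either a non-digit or the end of the line may follow the IP.
    return re.compile(r"[^0-9]" + re.escape(ip) + r"([^0-9]|$)")


def matching_lines(pattern, lines):
    """Mimic grep: return the lines in which the pattern matches anywhere."""
    return [line for line in lines if pattern.search(line)]


log = [
    "2014-04-14 10:00:00,000 fail2ban.actions: WARNING [ssh] Ban 192.0.2.7",
    "2014-04-14 10:05:00,000 fail2ban.actions: WARNING [ssh] Ban 192.0.2.70",
]
print(matching_lines(old_pattern("192.0.2.7"), log))  # []
print(matching_lines(new_pattern("192.0.2.7"), log))  # just the first line
```

The `[^0-9]` alternative is kept so that 192.0.2.7 still does not match 192.0.2.70.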

-- System Information:
Debian Release: jessie/sid
  APT prefers testing
  APT policy: (500, 'testing'), (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 3.13-1-686-pae (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages fail2ban depends on:
ii  lsb-base    4.1+Debian12
pn  python:any  <none>

Versions of packages fail2ban recommends:
ii  iptables          1.4.21-1
pn  python-pyinotify  <none>
ii  whois             5.1.1

Versions of packages fail2ban suggests:
ii  bsd-mailx [mailx]            8.1.2-0.20131005cvs-1
ii  mailutils [mailx]            1:2.99.98-1.1
ii  mailx                        1:20081101-2
pn  python-gamin                 <none>
ii  rsyslog [system-log-daemon]  7.6.3-1

-- Configuration Files:
/etc/fail2ban/action.d/iptables-allports.conf changed:
[INCLUDES]
before = iptables-blocktype.conf
[Definition]
actionstart = iptables -N fail2ban-<name>
  iptables -A fail2ban-<name> -j RETURN
  iptables -I <chain> -p <protocol> -j fail2ban-<name>
actionstop = iptables -D <chain> -p <protocol> -j fail2ban-<name>
 iptables -F fail2ban-<name>
 iptables -X fail2ban-<name>
actioncheck = iptables -n -L <chain> | grep -q 'fail2ban-<name>[ \t]'
actionban = iptables -I fail2ban-<name> 1 -s <ip> -j <blocktype>
actionunban = iptables -D fail2ban-<name> -s <ip> -j <blocktype>
[Init]
name = default
protocol = all
chain = INPUT

/etc/fail2ban/action.d/sendmail-whois-lines.conf changed:
[INCLUDES]
before = sendmail-common.conf
[Definition]
actionstart = printf %%b Subject: [Fail2Ban] name: started on `uname -n`
  Date: `LC_TIME=C date -u +%%a, %%d %%h %%Y %%T +`
  From: sendername sender
  To: dest\n
  Hi,\n
  The jail name has been started successfully.\n
  Regards,\n
  Fail2Ban | /usr/sbin/sendmail -f sender dest
actionstop = printf %%b Subject: [Fail2Ban] name: stopped on `uname -n`
 Date: `LC_TIME=C date -u +%%a, %%d %%h %%Y %%T +`
 From: sendername sender
 To: dest\n
 Hi,\n
 The jail name has been stopped.\n
 Regards,\n
 Fail2Ban | /usr/sbin/sendmail -f sender dest
actioncheck = 
actionban = printf %%b Subject: [Fail2Ban] name: banned ip from `uname -n`
Date: `LC_TIME=C date -u +%%a, %%d %%h %%Y %%T +`
From: sendername sender
To: dest\n
Hi,\n
The IP ip has just been banned by Fail2Ban after
failures attempts against name.\n\n
Here is more information about ip:\n
`/usr/bin/whois ip || echo missing whois program`\n\n
Lines containing IP:ip in logpath\n
`egrep '[^0-9]ip([^0-9]|$)' logpath`\n\n
Regards,\n
Fail2Ban | /usr/sbin/sendmail -f sender dest
actionunban = 
[Init]
name = default
logpath = /dev/null

/etc/fail2ban/filter.d/asterisk.conf changed:
[INCLUDES]
before = common.conf
[Definition]
_daemon = asterisk
__pid_re = (?:\[\d+\])
log_prefix= (?:NOTICE|SECURITY)%(__pid_re)s:?(?:\[C-[\da-f]*\])? \S+:\d*( in 
\w+:)?
failregex = ^(%(__prefix_line)s|\[\]\s*)%(log_prefix)s Registration from 
'[^']*' failed for 'HOST(:\d+)?' - (Wrong password|Username/auth name 
mismatch|No matching peer found|Not a local domain|Device does not match 
ACL|Peer is not supposed to register|ACL error \(permit/deny\)|Not a local 
domain)$
^(%(__prefix_line)s|\[\]\s*)%(log_prefix)s Call from '[^']*' 
\(HOST:\d+\) to extension '\d+' rejected because extension not found in 
context '.*'\.$
^(%(__prefix_line)s|\[\]\s*)%(log_prefix)s Host HOST failed to 
authenticate as '[^']*'$
^(%(__prefix_line)s|\[\]\s*)%(log_prefix)s No registration for peer 
'[^']*' \(from HOST\)$
^(%(__prefix_line)s|\[\]\s*)%(log_prefix)s Host HOST failed MD5 
authentication for '[^']*' \([^)]+\)$
^(%(__prefix_line)s|\[\]\s*)%(log_prefix)s Failed to authenticate 
(user|device) [^@]+@HOST\S*$
^(%(__prefix_line)s|\[\]\s*)%(log_prefix)s 
(?:handle_request_subscribe: )?Sending fake auth rejection for (device|user) 
\d*sip:[^@]+@HOST;tag=\w+\S*$
^(%(__prefix_line)s|\[\]\s*)%(log_prefix)s 

Bug#672851: netbase 5.0: network connection does not come up upon reboot

2012-05-20 Thread James Bottomley
On Sat, 2012-05-19 at 22:40 +0100, Adam D. Barratt wrote:
 On Sat, 2012-05-19 at 19:55 +0100, James Bottomley wrote:
  On Sat, 2012-05-19 at 16:19 +0200, Marco d'Itri wrote:
   There is no bug, just don't do stupid things to your system.
  
  Yes, there is.  The bug is that testing breaks networking on an upgrade
  and will do so for at least the next six days.
 
 No, it won't.
 
   ifupdown |0.7~rc3 |   testing | source, amd64, armel, armhf, i386, 
 ia64, kfreebsd-amd64, kfreebsd-i386, mips, mipsel, powerpc, s390, s390x, sparc

Great ... that's all I asked for: acceleration of ifupdown or backoff of
netbase in testing.

This bug can be closed now.

James








Bug#672851: netbase 5.0: network connection does not come up upon reboot

2012-05-19 Thread James Bottomley
Package: netbase
Version: 5.0
Followup-For: Bug #672851

This bug is still present in testing.

It means that anyone who does a dist-upgrade -t testing gets no
networking.  This just happened to me with a co-lo box resulting in a
wasted hour investigating the source of an apparent boot failure.

This is a really serious problem.  It may be fixed by ifupdown-0.7~rc3
but that's stuck in unstable for another 6 days according to

http://bjorn.haxx.se/debian/testing.pl?package=ifupdown

Leaving networking broken on a dist-upgrade to testing for at least
another six days looks irresponsible, to say the least.  Please
revert the netbase update or accelerate ifupdown into testing.

-- System Information:
Debian Release: wheezy/sid
  APT prefers testing
  APT policy: (500, 'testing'), (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 3.2.0-2-686-pae (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages netbase depends on:
ii  lsb-base  4.1+Debian3

Versions of packages netbase recommends:
ii  ifupdown  0.7~alpha5+really0.6.16

netbase suggests no packages.

-- no debconf information






Bug#672851: netbase 5.0: network connection does not come up upon reboot

2012-05-19 Thread James Bottomley
On Sat, 2012-05-19 at 16:19 +0200, Marco d'Itri wrote:
 On May 19, James Bottomley james.bottom...@hansenpartnership.com wrote:
 
  It means that anyone who does a dist-upgrade -t testing gets no
  networking.  This just happened to me with a co-lo box resulting in a
 Don't do it then ffs.

Interesting idea, but back in the real world, where people actually
upgrade their systems, telling them not to isn't so hot for all those
new-release things people do.

 There is no bug, just don't do stupid things to your system.

Yes, there is.  The bug is that testing breaks networking on an upgrade
and will do so for at least the next six days.

James








Bug#647900: mptspi init failure on Sparc SMP in Linux 3.0

2011-12-14 Thread James Bottomley
OK, so rather than try to guess based on x86, let's involve the experts;
I've cc'd the sparclinux list.

Sparc people,

The failure alluded to below occurs on an SMP system, but not on a UP
one.  The symptom appears to indicate the failure of interrupt delivery
on SMP, which is why the MPT card fails to initialise.  How do we debug
this on a sparc?

Thanks,

James


On Wed, 2011-12-14 at 09:31 +0100, Biblioteka UR wrote:
 Praveen,
 
 Thank you for your response.
 
 This is a Sparc server and it uses silo, not grub, so I do not know how 
 well I applied your suggestions.
 
 root@fire:/boot# ls -l
 total 28506
 lrwxrwxrwx 1 root root        1 Nov  5 08:28 boot -> .
 -rw-r--r-- 1 root root    88515 Nov 14 16:35 config-3.1.0-1-sparc64
 -rw-r--r-- 1 root root    88956 Nov 14 16:47 config-3.1.0-1-sparc64-smp
 lrwxrwxrwx 1 root root        1 Nov  5 08:28 etc -> .
 -rw-r--r-- 1 root root     1024 Aug 26  2010 fd.b
 -rw-r--r-- 1 root root      512 Aug 26  2010 first.b
 -rw-r--r-- 1 root root     1024 Aug 26  2010 generic.b
 -rw-r--r-- 1 root root      692 Aug 26  2010 ieee32.b
 lrwxrwxrwx 1 root root       30 Nov 23 08:13 initrd.img -> initrd.img-3.1.0-1-sparc64-smp
 -rw-r--r-- 1 root root 10201990 Nov 23 08:05 initrd.img-3.1.0-1-sparc64
 -rw-r--r-- 1 root root 10293974 Dec 12 10:34 initrd.img-3.1.0-1-sparc64-smp
 lrwxrwxrwx 1 root root       26 Nov 23 08:04 initrd.img.old -> initrd.img-3.1.0-1-sparc64
 -rw-r--r-- 1 root root     7704 Aug 26  2010 isofs.b
 drwxr-xr-x 2 root root    12288 Nov  5 08:22 lost+found
 -rw-r--r-- 1 root root     7680 Nov  5 08:28 old.b
 -rw-r--r-- 1 root root    78336 Nov  5 08:28 second.b
 -rw-r--r-- 1 root root      199 Dec 14 08:36 silo.conf
 -rw-r--r-- 1 root root    76387 Aug 26  2010 silotftp.b
 -rw-r--r-- 1 root root  1629480 Nov 14 16:35 System.map-3.1.0-1-sparc64
 -rw-r--r-- 1 root root  1676706 Nov 14 16:47 System.map-3.1.0-1-sparc64-smp
 -rw-r--r-- 1 root root      512 Aug 26  2010 ultra.b
 lrwxrwxrwx 1 root root       27 Nov 23 08:13 vmlinuz -> vmlinuz-3.1.0-1-sparc64-smp
 -rw-r--r-- 1 root root  2385979 Nov 14 16:34 vmlinuz-3.1.0-1-sparc64
 -rw-r--r-- 1 root root  2504212 Nov 14 16:46 vmlinuz-3.1.0-1-sparc64-smp
 lrwxrwxrwx 1 root root       23 Nov 23 08:04 vmlinuz.old -> vmlinuz-3.1.0-1-sparc64
 
 I tried:
 root@fire:/boot# cat silo.conf
 root=/dev/sda2
 partition=1
 default=LinuxOLD
 read-only
 timeout=100
 
 image=/vmlinuz
  label=Linux
  initrd=/initrd.img
  append=pci=routeirq
 
 image=/vmlinuz.old
  label=LinuxOLD
  initrd=/initrd.img.old
 
 and
 
 root@fire:/boot# cat silo.conf
 root=/dev/sda2
 partition=1
 default=LinuxOLD
 read-only
 timeout=100
 
 image=/vmlinuz
  label=Linux
  initrd=/initrd.img
  append=irqpoll
 
 image=/vmlinuz.old
  label=LinuxOLD
  initrd=/initrd.img.old
 
 
 These changes have not helped. I have the same error messages.
 
 Unfortunately I can't collect the kernel messages with the SMP linux-image. 
 I set this in rsyslog.conf:
 
 kern.debug  -/var/log/kern.debug
 
 I had already set
 
 *.=debug;\
  auth,authpriv.none;\
  news.none;mail.none -/var/log/debug
 
 but there are no log messages from the broken boot. I can only collect 
 messages from putty.
 
 Regards,
 Mariusz
 
 PS. Sorry for my English. :)
 
 W dniu 2011-12-13 19:36, Krishnamoorthy, Praveen pisze:
  Mariusz,
 
  [   68.319518] mptbase: ioc0: WARNING - Issuing Reset from
  mpt_config!!, doorbell=0x2400
  [   69.175505] mptbase: ioc0: Attempting Retry Config request type
  0x3, page 0x, action 0
  [   84.267524] mptbase: ioc0: WARNING - Issuing Reset from
  mpt_config!!, doorbell=0x2400
  As Nagalakshmi pointed out, the series of resets happens because the config 
  request for reading the page header fails. This is the first time the 
  message queues are used when the card is coming up. Given that the same 
  driver and the same card work perfectly on a non-SMP Linux kernel, I am 
  guessing that the config request was sent successfully and that the 
  firmware processed it and raised an interrupt on the IRQ line assigned to 
  this card, but the interrupt is somehow not being routed to our driver's 
  interrupt service routine. Could you therefore try the following to check 
  whether either of them works?
 
  1. add pci=routeirq to the kernel boot parameters in /boot/grub/menu.lst
  Eg) 
  title  Debian XYZ
  root   (hdX,X)
  kernel /boot/vmlinuz-XYZ root=XX ro quiet splash pci=routeirq
  initrd /boot/initrd.img-XYZ
 
  2. add irqpoll to the kernel boot parameters in /boot/grub/menu.lst
  Eg) 
  title  Debian XYZ
  root   (hdX,X)
  kernel /boot/vmlinuz-XYZ root=XX ro quiet splash irqpoll
  initrd /boot/initrd.img-XYZ
 
  Regards,
  Praveen
 
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-scsi in
 the body of a message to majord...@vger.kernel.org

Bug#632518: asterisk: Placing calls on hold fails with some IP phones (aastra 9133i, sipdroid)

2011-07-02 Thread James Bottomley
Package: asterisk
Version: 1:1.8.4.3-1
Severity: important
Tags: upstream patch

Upstream bugzilla: https://issues.asterisk.org/jira/browse/ASTERISK-18086

Any phone that places a call on hold by sending an INVITE with a
connection address of 0.0.0.0 fails with asterisk 1.8.  This works in
asterisk 1.6, so this is a feature regression.

The problem looks to have been introduced by replacing the hold check
for 0.0.0.0 with ast_sockaddr_isnull(), which doesn't return true for
0.0.0.0.

The attached fix works by making ast_sockaddr_resolve_first_af()
explicitly set a zero length in the 0.0.0.0 case.
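For reference, an SDP body carrying `c=IN IP4 0.0.0.0` is the older RFC 2543 way of signalling hold. RFC 3264 prefers direction attributes such as `a=sendonly`, which is why only some phones depend on this check. Below is a minimal Python sketch of the test the patch restores. The helper names are hypothetical, and it mirrors the AF_INET/INADDR_ANY comparison in the patch rather than reproducing Asterisk's code.

```python
import ipaddress


def c_line_address(line: str) -> str:
    """Return the address from an SDP connection line such as 'c=IN IP4 0.0.0.0'."""
    if not line.startswith("c="):
        raise ValueError("not an SDP c= line")
    _nettype, _addrtype, address = line[2:].split()
    return address.split("/")[0]  # drop any multicast TTL/count suffix


def is_legacy_hold(address: str) -> bool:
    """True only for IPv4 0.0.0.0 (INADDR_ANY), the legacy hold marker."""
    ip = ipaddress.ip_address(address)
    return ip.version == 4 and ip.is_unspecified
```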

-- System Information:
Debian Release: wheezy/sid
  APT prefers testing
  APT policy: (500, 'testing'), (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 2.6.38-2-686 (SMP w/1 CPU core)
Locale: LANG=en_GB.utf8, LC_CTYPE=en_GB.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages asterisk depends on:
ii  adduser 3.113add and remove users and groups
ii  asterisk-config 1:1.8.4.3-1  Configuration files for Asterisk
ii  asterisk-modules1:1.8.4.3-1  loadable modules for the Asterisk 
ii  asterisk-sounds-mai 1:1.6.2.9-2+squeeze2 Core Sound files for Asterisk (Eng
ii  libc6   2.13-7   Embedded GNU C Library: Shared lib
ii  libcap2 1:2.21-1 support for getting/setting POSIX.
ii  libgcc1 1:4.6.0-10   GCC support library
ii  libncurses5 5.9-1shared libraries for terminal hand
ii  libssl1.0.0 1.0.0d-3 SSL shared libraries
ii  libstdc++6  4.6.0-10 The GNU Standard C++ Library v3
ii  libxml2 2.7.8.dfsg-3 GNOME XML library

Versions of packages asterisk recommends:
ii  asterisk-moh-opsound-gsm 2.03-1  asterisk extra sound files - Engli
ii  asterisk-voicemail   1:1.8.4.3-1 simple voicemail support for the A
ii  sox  14.3.2-1Swiss army knife of sound processi

Versions of packages asterisk suggests:
ii  asterisk-dahdi   1:1.8.4.3-1 DAHDI devices support for the Aste
ii  asterisk-dev 1:1.8.4.3-1 Development files for Asterisk
ii  asterisk-doc 1:1.8.4.3-1 Source code documentation for Aste
pn  asterisk-ooh323  <none>  (no description available)

-- Configuration Files:
/etc/default/asterisk changed [not included]

-- no debconf information
--- channels/chan_sip.c.old 2011-07-02 12:11:56.0 -0500
+++ channels/chan_sip.c 2011-07-02 21:10:05.0 -0500
@@ -28439,7 +28439,14 @@
         ast_debug(1, "Multiple addresses, using the first one only\n");
     }
 
-   ast_sockaddr_copy(addr, &addrs[0]);
+   if (addrs[0].ss.ss_family == AF_INET &&
+       ((struct sockaddr_in *)&addrs[0].ss)->sin_addr.s_addr
+       == INADDR_ANY)
+       /* treat 0.0.0.0 as NULL address; needed to get
+        * hold to work in some circumstances */
+       addr->len = 0;
+   else
+       ast_sockaddr_copy(addr, &addrs[0]);
 
ast_free(addrs);
return 0;


Bug#626191: Patch in comment number 5 is correct

2011-05-28 Thread James Bottomley
Upstream already noted the issue:

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6392#c1

And committed the same fix:

http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Dns.pm?r1=926883&r2=928958&pathrev=928958&diff_format=h

So it should be safe to apply to Debian.

James








Bug#622997: libata-sff/pata_cmd64x problem with hardwired configurations

2011-04-19 Thread James Bottomley
On Tue, 2011-04-19 at 11:20 +0200, Bartlomiej Zolnierkiewicz wrote:
 From: Bartlomiej Zolnierkiewicz bzoln...@gmail.com
 Subject: [PATCH v2] pata_cmd64x: add enablebits checking
 
 Fixes IDE - libata regression.
 
 IDE's cmd64x host driver has been supporting enablebits checking
 since the initial driver's merge.

Actually, the thread discussing the proposed patches is here:

http://marc.info/?t=13031522715

I much prefer the dummy interface approach to the prereset one because
it prevents any possible poke at the registers which will crash the box.

James








Bug#622997: libata-sff/pata_cmd64x problem with hardwired configurations

2011-04-18 Thread James Bottomley
On Mon, 2011-04-18 at 14:12 +0400, Sergei Shtylyov wrote:
 Hello.
 
 On 18-04-2011 3:58, James Bottomley wrote:
 
  I've got a parisc system where the DVD drive is hardwired to a silicon
  image controller:
 
  00:02.0 IDE interface: Silicon Image, Inc. SiI 0649 Ultra ATA/100 PCI to
  ATA Host Controller (rev 02) (prog-if 8f [Master SecP SecO PriP PriO])
   Subsystem: Silicon Image, Inc. SiI 0649 Ultra ATA/100 PCI to ATA
  Host Controller
   Flags: bus master, medium devsel, latency 64, IRQ 69
   I/O ports at 0d18 [size=8]
   I/O ports at 0d24 [size=4]
   I/O ports at 0d10 [size=8]
   I/O ports at 0d20 [size=4]
   I/O ports at 0d00 [size=16]
   Capabilities: [60] Power Management version 2
   Kernel driver in use: pata_cmd64x
 
  The specific problem is that any access to the registers where the
  secondary port should be causes an instant fault on the box (I think
  because the second port just isn't wired up internally, so the memory
  doesn't respond), so the default libata-sff driver that pata_cmd64x is
  attached to causes this by insisting on probing both ports.
 
 Perhaps the secondary port is disabled (though it's strange that your 
 lspci dump shows I/O resources for both ports allocated).

It's a last minute wedgie into an enterprise system because they wanted
a DVD and there are no SCSI ones ... although why they didn't do USB 

  I can get all of this working by fixing up all the hard coded knowledge
  in libata-sff only to use a single port.
 
  However, I can't fix the libata-sff driver until I know how to tell
  there's only one port wired.  Does anyone with cmd649 knowledge have any
  idea how I might tell this?
 
 The secondary port is enabled in the PCI config. space: register 0x51 bit 
 3 controls this. Unfortunately, pata_cmd64x driver still doesn't check the 
 channel enable bits; the cmd64x driver does though, so it might be worth 
 trying...

So this is the enablebits code in drivers/ide that's missing from the
libata drivers?  Should this be generic in libata-sff? ... I mean even
on an x86, where arbitrary memory can be poked without consequence,
trying to activate a disabled port will still produce lots of noise.
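Sergei's description implies a check along these lines: read byte 0x51 of the controller's PCI configuration space and test bit 3 (mask 0x08) before touching the secondary port. Below is a user-space sketch for inspecting a live box, with hypothetical function names. The default sysfs path assumes PCI domain 0000 plus the 00:02.0 address from the lspci output. Reading past the first 64 bytes of config space through sysfs needs root. Inside the kernel the equivalent read would use pci_read_config_byte().

```python
CMD64X_ENABLE_REG = 0x51   # PCI config register holding the channel enable bits
SECONDARY_ENABLE = 0x08    # bit 3: secondary port enabled


def secondary_port_enabled(config_space: bytes) -> bool:
    """Check the CMD64x secondary-channel enable bit in a PCI config space dump."""
    if len(config_space) <= CMD64X_ENABLE_REG:
        # Unprivileged sysfs reads stop at 64 bytes, so register 0x51 needs root.
        raise ValueError("config space dump too short to contain register 0x51")
    return bool(config_space[CMD64X_ENABLE_REG] & SECONDARY_ENABLE)


def read_config_space(bdf: str = "0000:00:02.0") -> bytes:
    """Read a device's config space through sysfs (hypothetical helper; run as root)."""
    with open(f"/sys/bus/pci/devices/{bdf}/config", "rb") as f:
        return f.read(256)
```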

James








Bug#622997: Debian bug 622997

2011-04-17 Thread James Bottomley
On Sun, 2011-04-17 at 11:11 -0400, John David Anglin wrote:
  On Sat, 2011-04-16 at 19:35 -0400, John David Anglin wrote: 
   On Sat, 16 Apr 2011, James Bottomley wrote:
   
Strike that one ... I enabled USB in my 2.6.39-rc3 build and it inserts
the OHCI module and discovers the ports just fine.
   
   Boot 2.6.39-rc3 fails for me with attached config.
  
  I can't quite build it.  With gcc version 4.2.4 (Debian 4.2.4-6) I'm
  getting an ICE:
  
  net/wireless/reg.c: In function 'freq_reg_info_regd':
  net/wireless/reg.c:645: internal compiler error: in expand_expr_real_1,
  at expr.c:8744
  Please submit a full bug report,
  with preprocessed source if appropriate.
  See URL:http://gcc.gnu.org/bugs.html for instructions.
  For Debian GNU/Linux specific bug reporting instructions,
  see URL:file:///usr/share/doc/gcc-4.2/README.Bugs.
  make[2]: *** [net/wireless/reg.o] Error 1
 
 This is probably fixed as it doesn't occur with 
 gcc version 4.5.3 20110101 (prerelease) [gcc-4_5-branch revision 168387] 
 (GCC) .
 GCC 4.2 and 4.3 are no longer maintained and there won't be any further
 releases from these branches.
 
 Without looking at the above, it's hard to tell whether the bug is a
 middle-end or backend bug.  Many middle-end bugs are fixed in more recent
 GCC versions.  Although newer versions may bring their own problems,
 we can get help in fixing problems particularly if they are regressions. 
 
 The asm delay slot bug affected all GCC versions.  I backported the fix to
 the 4.3, 4.4 and 4.5 branches.  This is a problem in the kernel because of
 the following:
 
 ** The __asm__ op below simple prevents gcc/ld from reordering
 ** instructions across the mb() call.
 */
 #define mb()  __asm__ __volatile__("":::"memory")  /* barrier() */
 
 It's just a matter of chance whether a barrier ends up in the delay slot
 of a branch in a critical location.

I'll redo optimisation on that one and see if I can avoid this.

  Plus there's a bug in my kernel code:
  
  drivers/usb/host/xhci-pci.c: In function 'xhci_pci_setup':
  drivers/usb/host/xhci-pci.c:61: error: implicit declaration of function
  'kzalloc'
  
  If I correct for these (add missing slab.h include and disable wireless)
 
 I had to add missing slab.h as well.  However, I didn't touch wireless
 with 4.5.3.
 
  and build, the last message I see is
  
  turn off boot console ttyB0
  
  Which indicates it's got a problem with the console configuration (I
  don't see any console registration for the DIVA serial port on ttyS1 in
  the boot log).
 
 Comparing the console output that I recorded for the debian kernel, I
 see udev starts much earlier.  It only has the initial message from the
 tg3 driver and SCSI subsystem.

It's most likely a driver module that's getting loaded which is turned
off in the booting configuration ... finding it isn't going to be easy,
though ...

James





-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#622997: Debian bug 622997

2011-04-17 Thread James Bottomley
On Sun, 2011-04-17 at 10:23 -0500, James Bottomley wrote:
 It's most likely a driver module that's getting loaded which is turned
 off in the booting configuration ... finding it isn't going to be easy,
 though ...

Finally got a build (had to swap out -Os for -O2).

I traced the module loads and successful inits and found it; it's
pata_cmd64x  ... it loads but never returns from init.  I bet it's
trying to poke into ISA space which causes the HPMC.

Removing this one module from the system allows it to boot again.

I'd suggest just disabling it in the parisc config for now.  Using an
ATA-based CD/DVD instead of a SCSI one is a very recent thing.  I'll see if
I can get it working, but ATA controllers tend to be somewhat nasty and
x86-specific ...

James






Bug#622997: Debian bug 622997

2011-04-17 Thread James Bottomley
On Sun, 2011-04-17 at 20:37 +0100, Ben Hutchings wrote:
 On Sun, 2011-04-17 at 14:28 -0500, James Bottomley wrote:
  I traced the module loads and successful inits and found it; it's
  pata_cmd64x  ... it loads but never returns from init.  I bet it's
  trying to poke into ISA space which causes the HPMC.
  
  Removing this one module from the system allows it to boot again.
 [...]
 
 We also had a recent report that this driver is also bust on some sparc
 systems.  We could swap back to cmd64x on these architectures but I
 would rather get pata_cmd64x fixed.

Well, I've got a working pata_cmd64x (and now a working CD drive).

The specific issue on parisc (and probably sparc) is that we're using
this siimage chip hard wired to a single DVD drive.  We have no use for
a secondary port, so there isn't one.  The registers for the secondary
port are pointing off into empty space.  When libata-sff tries to touch
the secondary port, we get an instant High Priority Machine Check
because on most non-x86 systems, it's a fault to touch non-responding
memory.

I got it to work by making libata-sff only probe a single port.  Now,
here's the problem: the libata-sff driver is hardwired to probe two
ports, so it will require major surgery to check dynamically how many
ports there are ... and the second problem is that I don't even know how
to check this.  I'll ask about this on linux-ide.
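The fix described above amounts to driving the probe loop from a per-host port count instead of a fixed two. The following is a sketch in plain C. `struct sff_host`, `probe_port` and `probe_host` are hypothetical and are not libata's real data structures. How to discover `n_ports` in the first place is the open question in this thread.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_PORTS 2

/* Hypothetical description of an SFF-style controller: generic code
 * assumes MAX_PORTS channels, but a board may wire fewer. */
struct sff_host {
    int n_ports;               /* ports actually wired on this board */
    int probed[MAX_PORTS];     /* set to 1 when a port is touched    */
};

/* Hypothetical stand-in for touching a port's registers.  On the
 * hardware described above, touching an unwired port would fault. */
static void probe_port(struct sff_host *host, int port)
{
    assert(port < host->n_ports);   /* never touch an unwired port */
    host->probed[port] = 1;
    printf("probing port %d\n", port);
}

/* Probe only the wired ports and return how many were probed. */
int probe_host(struct sff_host *host)
{
    int port;

    assert(host->n_ports >= 0 && host->n_ports <= MAX_PORTS);
    for (port = 0; port < host->n_ports; port++)
        probe_port(host, port);
    return port;
}
```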

James






Bug#622997: libata-sff/pata_cmd64x problem with hardwired configurations

2011-04-17 Thread James Bottomley
I've got a parisc system where the DVD drive is hardwired to a silicon
image controller:

00:02.0 IDE interface: Silicon Image, Inc. SiI 0649 Ultra ATA/100 PCI to
ATA Host Controller (rev 02) (prog-if 8f [Master SecP SecO PriP PriO])
Subsystem: Silicon Image, Inc. SiI 0649 Ultra ATA/100 PCI to ATA
Host Controller
Flags: bus master, medium devsel, latency 64, IRQ 69
I/O ports at 0d18 [size=8]
I/O ports at 0d24 [size=4]
I/O ports at 0d10 [size=8]
I/O ports at 0d20 [size=4]
I/O ports at 0d00 [size=16]
Capabilities: [60] Power Management version 2
Kernel driver in use: pata_cmd64x
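The `prog-if 8f` value above can be decoded bit by bit. In this sketch the macro names are made up for illustration. The bit meanings follow the PCI IDE programming-interface layout that lspci abbreviates in brackets. On this board the byte advertises a native-mode secondary channel (SecO), so it apparently cannot, by itself, tell software that the second port is not wired.

```c
#include <assert.h>
#include <stdio.h>

/* Bits of the PCI IDE programming-interface byte, named after the
 * abbreviations lspci prints. */
#define IDE_PRI_NATIVE        0x01  /* PriO: primary channel in native mode   */
#define IDE_PRI_PROGRAMMABLE  0x02  /* PriP: primary mode can be switched     */
#define IDE_SEC_NATIVE        0x04  /* SecO: secondary channel in native mode */
#define IDE_SEC_PROGRAMMABLE  0x08  /* SecP: secondary mode can be switched   */
#define IDE_BUS_MASTER        0x80  /* Master: bus-master DMA supported       */

/* Print the flags in the same order lspci uses. */
void print_prog_if(unsigned int prog_if)
{
    assert(prog_if <= 0xff);
    printf("prog-if %02x:", prog_if);
    if (prog_if & IDE_BUS_MASTER)       printf(" Master");
    if (prog_if & IDE_SEC_PROGRAMMABLE) printf(" SecP");
    if (prog_if & IDE_SEC_NATIVE)       printf(" SecO");
    if (prog_if & IDE_PRI_PROGRAMMABLE) printf(" PriP");
    if (prog_if & IDE_PRI_NATIVE)       printf(" PriO");
    printf("\n");
}

/* Nonzero if the byte claims the secondary channel runs in native mode. */
int secondary_claims_native(unsigned int prog_if)
{
    return (prog_if & IDE_SEC_NATIVE) != 0;
}
```

For 0x8f this prints `prog-if 8f: Master SecP SecO PriP PriO`, which matches the lspci line.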

The specific problem is that any access to the registers where the
secondary port should be causes an instant fault on the box (I think
because the second port just isn't wired up internally, so the memory
doesn't respond), so the default libata-sff driver that pata_cmd64x is
attached to causes this by insisting on probing both ports.

I can get all of this working by fixing up all the hard-coded knowledge
in libata-sff to use only a single port.

However, I can't fix the libata-sff driver until I know how to tell
there's only one port wired.  Does anyone with cmd649 knowledge have any
idea how I might tell this?

James






Bug#622997: Debian bug 622997

2011-04-17 Thread James Bottomley
On Sun, 2011-04-17 at 21:25 -0400, John David Anglin wrote:
 This is excellent detective work.  If I might ask, how did you trace
 the module loads and successful inits?

Heh, you're expecting me to name magic tracing tools?  Well (shuffles
feet) I just put printks in kernel/module.c to do it.  It's basically
impossible to trace a boot problem like this any other way, because we
don't have enough of the system up to use any tools.
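The printk technique described above comes down to logging on both sides of every init call. The following is a user-space sketch of the idea, built on a hypothetical `struct fake_module` table. In the kernel, the equivalent lines would be printks placed around the module loader's call into each module's init function.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical module table standing in for real modules. */
struct fake_module {
    const char *name;
    int (*init)(void);
};

static int ok_init(void) { return 0; }

/* Log before and after each init call.  If the machine dies inside an
 * init, the last "calling" line without a matching "returned" line
 * names the culprit.  Returns the number of inits that returned. */
int run_inits(const struct fake_module *mods, int n)
{
    int i, ret;

    for (i = 0; i < n; i++) {
        assert(mods[i].init != NULL);
        printf("calling %s init\n", mods[i].name);
        fflush(stdout);          /* make sure the line is out before a hang */
        ret = mods[i].init();
        printf("%s init returned %d\n", mods[i].name, ret);
    }
    return i;
}
```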

James






Bug#622997: Debian bug 622997

2011-04-16 Thread James Bottomley
On Sat, 2011-04-16 at 14:07 -0400, John David Anglin wrote:
 I posted this debian bug report because the most recent debian
 SMP kernel build fails to boot on my rp3440:
 http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=622997
 
 I don't think debian kernels have worked since lenny.

Hmm, well upstream ones have: so it's likely a patch debian has but
upstream doesn't, or it could be a toolchain issue ... I didn't think
gcc-4.4.5 worked properly on 64 bit without a few patches?

James






Bug#622997: Debian bug 622997

2011-04-16 Thread James Bottomley
On Sat, 2011-04-16 at 15:29 -0400, John David Anglin wrote:
  On Sat, 2011-04-16 at 14:07 -0400, John David Anglin wrote:
   I posted this debian bug report because the most recent debian
   SMP kernel build fails to boot on my rp3440:
   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=622997
   
   I don't think debian kernels have worked since lenny.
  
  Hmm, well upstream ones have: so it's likely a patch debian has but
  upstream doesn't, or it could be a toolchain issue ... I didn't think
  gcc-4.4.5 worked properly on 64 bit without a few patches?
 
 Yes, but debian tends to build almost everything.  For some reason,
 I've turned off ipv6.  Unlike many kernel bugs, this one is completely
 reproducible.

I suppose it could be USB ... before I got ion, I didn't have any parisc
systems with USB, so it's turned off in my build.  I'll turn it on and
see if there's a problem there.

James






Bug#622997: Debian bug 622997

2011-04-16 Thread James Bottomley
On Sat, 2011-04-16 at 15:48 -0500, James Bottomley wrote:
 On Sat, 2011-04-16 at 15:29 -0400, John David Anglin wrote:
   On Sat, 2011-04-16 at 14:07 -0400, John David Anglin wrote:
I posted this debian bug report because the most recent debian
SMP kernel build fails to boot on my rp3440:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=622997

I don't think debian kernels have worked since lenny.
   
   Hmm, well upstream ones have: so it's likely a patch debian has but
   upstream doesn't, or it could be a toolchain issue ... I didn't think
   gcc-4.4.5 worked properly on 64 bit without a few patches?
  
  Yes, but debian tends to build almost everything.  For some reason,
  I've turned off ipv6.  Unlike many kernel bugs, this one is completely
  reproducible.
 
 I suppose it could be USB ... before I got ion, I didn't have any parisc
 systems with USB, so it's turned off in my build.  I'll turn it on and
 see if there's a problem there.

Strike that one ... I enabled USB in my 2.6.39-rc3 build and it inserts
the OHCI module and discovers the ports just fine.

James






Bug#622997: Debian bug 622997

2011-04-16 Thread James Bottomley
On Sat, 2011-04-16 at 19:35 -0400, John David Anglin wrote: 
 On Sat, 16 Apr 2011, James Bottomley wrote:
 
  Strike that one ... I enabled USB in my 2.6.39-rc3 build and it inserts
  the OHCI module and discovers the ports just fine.
 
 Boot 2.6.39-rc3 fails for me with attached config.

I can't quite build it.  With gcc version 4.2.4 (Debian 4.2.4-6) I'm
getting an ICE:

net/wireless/reg.c: In function 'freq_reg_info_regd':
net/wireless/reg.c:645: internal compiler error: in expand_expr_real_1,
at expr.c:8744
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
For Debian GNU/Linux specific bug reporting instructions,
see <URL:file:///usr/share/doc/gcc-4.2/README.Bugs>.
make[2]: *** [net/wireless/reg.o] Error 1

Plus there's a bug in my kernel code:

drivers/usb/host/xhci-pci.c: In function 'xhci_pci_setup':
drivers/usb/host/xhci-pci.c:61: error: implicit declaration of function
'kzalloc'

If I correct for these (add missing slab.h include and disable wireless)
and build, the last message I see is

turn off boot console ttyB0

Which indicates it's got a problem with the console configuration (I
don't see any console registration for the DIVA serial port on ttyS1 in
the boot log).

James






Bug#619773: mpg321 uses 100% cpu with asterisk in version 0.2.13-1

2011-03-26 Thread James Bottomley
Package: mpg321
Version: 0.2.13-1
Severity: important


After a recent upgrade in testing, mpg321 which is run by asterisk
for music on hold consumes 100% CPU all the time.

Dropping back to version 0.2.12-1 fixes the problem.

-- System Information:
Debian Release: wheezy/sid
  APT prefers testing
  APT policy: (500, 'testing'), (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 2.6.32-5-686 (SMP w/1 CPU core)
Locale: LANG=en_GB.utf8, LC_CTYPE=en_GB.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages mpg321 depends on:
ii  libao4                   1.0.0-5           Cross Platform Audio Output Librar
ii  libaudio-scrobbler-perl  0.01-2.1          perl interface to audioscrobbler.c
ii  libc6                    2.11.2-11         Embedded GNU C Library: Shared lib
ii  libid3tag0               0.15.1b-10        ID3 tag reading library from the M
ii  libmad0                  0.15.1b-5         MPEG audio decoder library
ii  zlib1g                   1:1.2.3.4.dfsg-3  compression library - runtime

mpg321 recommends no packages.

mpg321 suggests no packages.

-- no debconf information




Bug#614968: has now impacted testing

2011-03-06 Thread James Bottomley
On Sat, 2011-03-05 at 10:34 -0600, James Bottomley wrote:
 If I remove the transactions and just do a straight delete, everything
 works (at least in my test environment; I'll try it on the real program
 tonight).  There's actually no real reason for the deletes to be done
 transactionally.  Even if we get a crash, we don't care whether all the
 keys or none are deleted ... if we get partial deletion, the next
 cleanup will get the rest.

OK, so confirmed last night that the cleanup ran correctly without the
transactional logic and that postgrey didn't die:

Mar  6 02:17:23 bedivere postgrey[32282]: cleaning main database finished. 
before: 9349, after: 4484
Mar  6 02:17:24 bedivere postgrey[32282]: cleaning clients database finished. 
before: 2844, after: 2336

The patch below is what I'm running.

James

---

--- /usr/sbin/postgrey.orig 2011-01-11 13:15:43.0 -0800
+++ /usr/sbin/postgrey  2011-03-05 08:40:02.0 -0800
@@ -275,11 +275,7 @@
 $nr_keys_after++;
 }
 }
-my $db_obj = $self->{postgrey}{db_obj};
-my $txn = $db_env->txn_begin();
-$db_obj->Txn($txn);
 for my $key (@old_keys) { delete $db->{$key}; }
-$txn->txn_commit();
 
 $self->mylog(1, "cleaning main database finished. before: $nr_keys_before, after: $nr_keys_after");
 
@@ -299,11 +295,7 @@
 $nr_keys_after++;
 }
 }
-my $db_cawl_obj = $self->{postgrey}{db_cawl_obj};
-$txn = $db_env->txn_begin();
-$db_cawl_obj->Txn($txn);
 for my $key (@old_keys_cawl) { delete $cawl_db->{$key}; }
-$txn->txn_commit();
 
 $self->mylog(1, "cleaning clients database finished. before: $nr_keys_before, after: $nr_keys_after");
 }






Bug#614968: has now impacted testing

2011-03-05 Thread James Bottomley
On Fri, 2011-03-04 at 15:06 -0400, Joey Hess wrote:
 James Bottomley wrote:
  I picked this up on a recent testing upgrade (last Saturday).  My
  postgrey process is now dying every night as well (I've set up a harness
  to restart it, but it's not ideal).
  
  Following what happened in #441069 I tried
  
  db5.1_recover -h /var/lib/postgrey/
  
  but that doesn't fix the problem.  Whatever it is, it looks to be deeper
  than simply an incompatible database update.
 
 My workaround has been to simply disable the broken part on line 249:
 
 if(0 and $hour > 1 and $hour < 7 and
 $now - $self->{postgrey}{last_maint_keys} >= 82800)

Well, that means your database never gets cleaned.  I was going to say
that simply restarting at least meant the main database got cleaned
(because it's crashing on the AWL transaction).  However, I stripped the
cleaning code from postgrey and ran it on a copy of the database.  In
spite of the fact that there's no error on the first $txn->txn_commit(),
the transaction hasn't committed.  The next transaction you try to create
in the $db_env is empty, which is why it crashes.

What's even stranger is that the entire db and db_cawl ties look to be
non-functional after $txn->txn_commit() ... it's like setting up the
transaction actually causes the entire perl DB mechanism to fall over.

This all smells like a libdb5.1 problem.

If I remove the transactions and just do a straight delete, everything
works (at least in my test environment; I'll try it on the real program
tonight).  There's actually no real reason for the deletes to be done
transactionally.  Even if we get a crash, we don't care whether all the
keys or none are deleted ... if we get partial deletion, the next
cleanup will get the rest.
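The argument for dropping the transaction is that the cleanup is idempotent: each delete stands alone, so an interrupted run only leaves work for the next one. The following C sketch shows that property. The in-memory table and `cleanup` are hypothetical and say nothing about the Berkeley DB API. `max_deletes` exists only to simulate a run that stops part way through.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory stand-in for the key database:
 * a slot is live when used != 0. */
struct entry {
    int used;
    long last_seen;   /* timestamp of last activity */
};

/* Delete expired entries one at a time, with no transaction.  At most
 * max_deletes are removed per call, mimicking an interrupted run.
 * Calling it again finishes the job, because already-deleted slots are
 * simply skipped.  Returns the number deleted in this call. */
size_t cleanup(struct entry *db, size_t n, long cutoff, size_t max_deletes)
{
    size_t i, deleted = 0;

    assert(db != NULL || n == 0);
    for (i = 0; i < n && deleted < max_deletes; i++) {
        if (db[i].used && db[i].last_seen < cutoff) {
            db[i].used = 0;
            deleted++;
        }
    }
    return deleted;
}
```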

James






Bug#614968: has now impacted testing

2011-03-04 Thread James Bottomley
I picked this up on a recent testing upgrade (last Saturday).  My
postgrey process is now dying every night as well (I've set up a harness
to restart it, but it's not ideal).

Following what happened in #441069 I tried

db5.1_recover -h /var/lib/postgrey/

but that doesn't fix the problem.  Whatever it is, it looks to be deeper
than simply an incompatible database update.

James






Bug#561203: threads and fork on machine with VIPT-WB cache

2010-04-06 Thread James Bottomley
On Tue, 2010-04-06 at 08:37 -0500, James Bottomley wrote:
  (5) Child process B is waken up and sees old value at x in
 oldpage,
  through different cache line.  B sleeps.
 
 This isn't possible.  At this point, A and B have the same virtual
 address and mapping for oldpage; this means they are the same cache
 colour, so they both see the cached value.

Perhaps to add more detail to this.  In spite of what the arch manual
says (it says the congruence stride is 16MB), the congruence stride on
all manufactured parisc processors is 4MB.  This means that any virtual
addresses, regardless of space id, that are equal modulo 4MB have the
same cache colour.
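The arithmetic is simple enough to sketch. With a 4MB congruence stride, two addresses share a colour when they are equal modulo 4MB, and the space id plays no part. Because the stride is a power of two, the modulo reduces to a mask. The function names are illustrative only, and the stride value is the one stated above, not the architecture manual's 16MB.

```c
#include <assert.h>
#include <stdint.h>

/* Congruence stride as stated above for manufactured PA-RISC parts. */
#define CONGRUENCE_STRIDE (4UL << 20)   /* 4MB */

/* Offset of a virtual address within the congruence stride; addresses
 * with equal values land on the same cache lines (same colour).  No
 * space id is involved. */
uint64_t cache_colour(uint64_t vaddr)
{
    /* A power-of-two stride lets modulo be done with a mask. */
    assert((CONGRUENCE_STRIDE & (CONGRUENCE_STRIDE - 1)) == 0);
    return vaddr & (CONGRUENCE_STRIDE - 1);
}

/* Two virtual addresses share a colour iff equal modulo the stride. */
int same_colour(uint64_t a, uint64_t b)
{
    return cache_colour(a) == cache_colour(b);
}
```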

James
 





Bug#561203: threads and fork on machine with VIPT-WB cache

2010-04-06 Thread James Bottomley
On Tue, 2010-04-06 at 13:57 +0900, NIIBE Yutaka wrote:
 John David Anglin wrote:
  It is interesting that in the case of the Debian bug that
  a thread of the parent process causes the COW break and thereby corrupts
  its own memory.  As far as I can tell, the fork'd child never writes
  to the memory that causes the fault.
 
 Thanks for writing and testing a patch.
 
 The case of #561203 is second scenario.  I think that this case is
 relevant to VIVT-WB machine too (provided kernel does copy by kernel
 address).
 
 James Bottomley wrote:
  So this is going to be a hard sell because of the arch churn. There are,
  however, three ways to do it with the original signature.
 
 Currently, I think that signature change would be inevitable for
 ptep_set_wrprotect.

Well we can't do it by claiming several architectures are wrong in their
implementation.  We might do it by claiming to need vma knowledge ...
however, even if you want the flush, as I said, you don't need to change
the signature.

   1. implement copy_user_highpage ... this allows us to copy through
  the child's page cache (which is coherent with the parent's
  before the cow) and thus pick up any cache changes without a
  flush
 
 Let me think about this way.
 
 Well, this would improve both cases of the first scenario of mine and
 the second scenario.
 
 But... I think that even if we would have copy_user_highpage which
 does copy by user address, we need to flush at ptep_set_wrprotect.  I
 think that we need to keep the condition: no dirty cache for COW page.
 
 Think about third scenario of threads and fork:
 
 (1) In process A, there are multiple threads, and a thread A-1 invokes
 fork.  We have process B, with a different space identifier color.

I don't understand what you mean by space colour ... there's cache
colour, which refers to the line in the cache to which the physical
memory maps.  The way PA is set up, space ID doesn't factor into cache
colour.

 (2) Another thread A-2 in process A runs while A-1 copies memory by
 dup_mmap.  A-2 writes to the address x in a page.  Let's call
 this page oldpage.
 
 (3) We have dirty cache for x by A-2 at the time of
 ptep_set_wrprotect of thread A-1.  Suppose that we don't flush
 here.
 
 (4) A-1 finishes copy, and sleeps.
 
 (5) Child process B is waken up and sees old value at x in oldpage,
 through different cache line.  B sleeps.

This isn't possible.  At this point, A and B have the same virtual
address and mapping for oldpage; this means they are the same cache
colour, so they both see the cached value.

James

 (6) A-2 is waken up.  A-2 touches the memory again, breaks COW.  A-2
 copies data on oldpage to newpage.  OK, newpage is
 consistent with copy_user_highpage by user address.
 
 Note that during this copy, the cache line of x by A-2 is
 flushed out to oldpage.  It invokes another memory fault and COW
 break.  (I think that this memory fault is unhealthy.)
 Then, new value goes to x on oldpage (when it's physically
 tagged cache).
 
 A-2 sleeps.
 
 (7) Child process B is waken up.  When it accesses at x, it sees new
 value suddenly.
 
 
 If we flush cache to oldpage at ptep_set_wrprotect, this couldn't
 occur.
 
 
   *   *   *
 
 
 I know that we should not do threads and fork.  It is difficult to
 define clean semantics.  Because another thread may touch memory while
 a thread which does memory copy for fork, the memory what the child
 process will see may be inconsistent.  For the child, a page might be
 new, while another page might be old.
 
 For VIVT-WB cache machine, I am considering a possibility for the
 child process to have inconsistent memory even within a single page
 (when we have no flush at ptep_set_wrprotect).
 
 It will be needed for me to talk to linux-arch soon or later.






Bug#561203: threads and fork on machine with VIPT-WB cache

2010-04-05 Thread James Bottomley
On Sun, 2010-04-04 at 22:51 -0400, John David Anglin wrote:
  Thanks a lot for the discussion.
  
  James Bottomley wrote:
   So your theory is that the data the kernel sees doing the page copy can
   be stale because of dirty cache lines in userspace (which is certainly
   possible in the ordinary way)?
  
  Yes.
  
   By design that shouldn't happen: the idea behind COW breaking is
   that before it breaks, the page is read only ... this means that
   processes can have clean cache copies of it, but never dirty cache
   copies (because writes are forbidden).
  
  That must be design, I agree.
  
  To keep this condition (no dirty cache for COW page), we need to flush
  cache before ptep_set_wrprotect.  That's my point.
  
  Please look at the code path:
 (kernel/fork.c)
 do_fork -> copy_process -> copy_mm -> dup_mm -> dup_mmap ->
 (mm/memory.c)
 copy_page_range -> copy_p*d_range -> copy_one_pte -> ptep_set_wrprotect
  
  The function flush_cache_dup_mm is called from dup_mmap, that's enough
  for a case of a process with single thread.
  I think that:
  We need to flush cache before ptep_set_wrprotect for a process with
  multiple threads.  Other threads may change memory after a thread
  invokes do_fork and before calling ptep_set_wrprotect.  Specifically,
  a process may sleep at pte_alloc function to get a page.
 
 I agree.  It is interesting that in the case of the Debian bug that
 a thread of the parent process causes the COW break and thereby corrupts
 its own memory.  As far as I can tell, the fork'd child never writes
 to the memory that causes the fault.
 
 My testing indicates that your suggested change fixes the Debian
 bug.  I've attached below my latest test version.  This seems to fix
 the bug on both SMP and UP kernels.
 
 However, it doesn't fix all page/cache related issues on parisc
 SMP kernels that I commonly see.
 
 My first inclination after even before reading your analysis was
 to assume that copy_user_page was broken (i.e, that even if a
 processor cache was dirty when the COW page was write protected,
 it should be possible to do the flush before the page is copied).
 However, this didn't seem to work...  Possibly, there are issues
 with aliased addresses.
 
 I note that sparc flushes the entire cache and purges the entire
 tlb in kmap_atomic/kunmap_atomic for highmem.  Although the breakage
 that I see is not limited to PA8800/PA8900, I'm not convinced
 that we maintain coherency that is required for these processors
 in copy_user_page when we have multiple threads.
 
 As a side note, kmap_atomic/kunmap_atomic seem to lack calls to
 pagefault_disable()/pagefault_enable() on PA8800.
 
 Dave
 -- 
 J. David Anglin  dave.ang...@nrc-cnrc.gc.ca
 National Research Council of Canada  (613) 990-0752 (FAX: 
 952-6602)
 
 diff --git a/arch/parisc/include/asm/pgtable.h b/arch/parisc/include/asm/pgtable.h
 index a27d2e2..b140d5c 100644
 --- a/arch/parisc/include/asm/pgtable.h
 +++ b/arch/parisc/include/asm/pgtable.h
 @@ -14,6 +14,7 @@
  #include <linux/bitops.h>
  #include <asm/processor.h>
  #include <asm/cache.h>
 +extern void flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr, unsigned long pfn);
  
  /*
   * kern_addr_valid(ADDR) tests if ADDR is pointing to valid kernel
 @@ -456,17 +457,22 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
   return old_pte;
  }
  
 -static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
 +static inline void ptep_set_wrprotect(struct vm_area_struct *vma, struct mm_struct *mm, unsigned long addr, pte_t *ptep)
  {
  #ifdef CONFIG_SMP
   unsigned long new, old;
 +#endif
 + pte_t old_pte = *ptep;
 +
 + if (atomic_read(&mm->mm_users) > 1)

Just to verify there's nothing this is hiding, can you make this 

if (pte_dirty(old_pte))

and reverify?  The if clause should only trip on the case where the
parent has dirtied the line between flush and now.

 + flush_cache_page(vma, addr, pte_pfn(old_pte));
  
 +#ifdef CONFIG_SMP
   do {
   old = pte_val(*ptep);
   new = pte_val(pte_wrprotect(__pte (old)));
   } while (cmpxchg((unsigned long *) ptep, old, new) != old);
  #else
 - pte_t old_pte = *ptep;
   set_pte_at(mm, addr, ptep, pte_wrprotect(old_pte));
  #endif
  }
 diff --git a/mm/memory.c b/mm/memory.c
 index 09e4b1b..21c2916 100644
 --- a/mm/memory.c
 +++ b/mm/memory.c
 @@ -616,7 +616,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
* in the parent and the child
*/
   if (is_cow_mapping(vm_flags)) {
 - ptep_set_wrprotect(src_mm, addr, src_pte);
 + ptep_set_wrprotect(vma, src_mm, addr, src_pte);

So this is going to be a hard sell because of the arch churn. There are,
however, three ways to do it with the original signature.

 1. implement copy_user_highpage ... this allows us

Bug#561203: threads and fork on machine with VIPT-WB cache

2010-04-02 Thread James Bottomley
On Fri, 2010-04-02 at 12:48 +0900, NIIBE Yutaka wrote:
 Thanks for your quick reply.
 
 James Bottomley wrote:
  In COW breaking, the page table entry is copied, so A and B no longer
  have page table entries at the same physical location.  If the COW is
  intact, A and B have the same physical page, but it's also accessed by
  the same virtual address, hence no aliasing.
 
 Let me explain more.
 
 In the scenario, I assume:
 
   No aliasing between A and B.
   We have aliasing between kernel access and user access.
 
 Before COW breaking A and B share same data (with no aliasing same
 space identifier color), and B sees data in cache, while memory has
 stale data.
 
 At COW breaking, kernel copies the memory, it doesn't see new data
 in cache because of aliasing.
 
 Isn't it possible?

So your theory is that the data the kernel sees doing the page copy can
be stale because of dirty cache lines in userspace (which is certainly
possible in the ordinary way)?  By design that shouldn't happen: the
idea behind COW breaking is that before it breaks, the page is read
only ... this means that processes can have clean cache copies of it,
but never dirty cache copies (because writes are forbidden).  As soon as
one or other process tries to write to the page, it gets a memory
protection trap long before the data it's trying to write goes into the
cache.  By the time the write is allowed to complete (and the cache
becomes dirty), the process will have the new copy of the page which
belongs exclusively to it.
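The COW design described above can be modelled in a few lines. This sketch uses hypothetical types and is not the kernel's code. It shows only that the copy happens at the trap, before the written byte lands anywhere, so the shared page never holds data written by either process.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE_SIM 64   /* tiny "page" for illustration */

/* Hypothetical physical page with a reference count. */
struct page_sim {
    int refs;
    char data[PAGE_SIZE_SIM];
};

/* A process's view of one page; read-only while shared. */
struct mapping_sim {
    struct page_sim *page;
    int writable;
};

/* After fork: both mappings point at one page, both read-only. */
void fork_share(struct mapping_sim *parent, struct mapping_sim *child)
{
    parent->writable = 0;
    child->page = parent->page;
    child->writable = 0;
    parent->page->refs++;
}

/* A write to a read-only mapping "traps" here before any data is
 * written: the writer gets a private copy, then the write proceeds. */
void write_byte(struct mapping_sim *m, int off, char c)
{
    assert(off >= 0 && off < PAGE_SIZE_SIM);
    if (!m->writable) {
        if (m->page->refs > 1) {
            struct page_sim *copy = malloc(sizeof(*copy));
            assert(copy != NULL);
            memcpy(copy->data, m->page->data, PAGE_SIZE_SIM);
            copy->refs = 1;
            m->page->refs--;
            m->page = copy;
        }
        m->writable = 1;
    }
    m->page->data[off] = c;
}
```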

James






Bug#561203: threads and fork on machine with VIPT-WB cache

2010-04-01 Thread James Bottomley
On Fri, 2010-04-02 at 11:41 +0900, NIIBE Yutaka wrote:
 (9) Process B does read-access on memory, which gets *NEW* data in
 cache (if process space identifier color is same).
 Process B does write-access on memory which causes memory fault,
 as it's COW memory.
 
 Note: Process B sees *NEW* data because it's VIPT-WB cache.
 It shares same memory in this situation.

So I think the bug here is that you're confusing aliasing with SMP cache
coherence.  In an alias situation, the same physical line is mapped to
multiple lines in a processor's cache (at different virtual addresses),
which means you can get a different answer depending on which alias you
read.

In COW breaking, the page table entry is copied, so A and B no longer
have page table entries at the same physical location.  If the COW is
intact, A and B have the same physical page, but it's also accessed by
the same virtual address, hence no aliasing.
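The distinction can be illustrated with a toy model: a direct-mapped, write-back cache indexed by virtual address and tagged by physical address. This is a heavily simplified VIPT arrangement, and all names are hypothetical. Two virtual addresses of the same physical byte that index different lines can see different values. Addresses of the same colour hit the same line and agree.

```c
#include <assert.h>

#define NLINES   4     /* tiny direct-mapped cache, one byte per line */
#define MEMSIZE 16

/* Hypothetical cache line: valid/dirty flags, physical tag, data. */
struct line { int valid, dirty; unsigned paddr; char val; };

static char memory[MEMSIZE];
static struct line cache[NLINES];

static struct line *lookup(unsigned vaddr, unsigned paddr)
{
    struct line *l = &cache[vaddr % NLINES];   /* index by VIRTUAL address */

    if (!l->valid || l->paddr != paddr) {      /* tag check by physical address */
        if (l->valid && l->dirty)
            memory[l->paddr] = l->val;         /* write back the victim */
        l->valid = 1;
        l->dirty = 0;
        l->paddr = paddr;
        l->val = memory[paddr];
    }
    return l;
}

/* Write-back: the new value stays in the cache line, not in memory. */
void cache_write(unsigned vaddr, unsigned paddr, char v)
{
    struct line *l;

    assert(paddr < MEMSIZE);
    l = lookup(vaddr, paddr);
    l->val = v;
    l->dirty = 1;
}

char cache_read(unsigned vaddr, unsigned paddr)
{
    assert(paddr < MEMSIZE);
    return lookup(vaddr, paddr)->val;
}
```

In this model, writing physical byte 3 through virtual address 1 and then reading it through virtual address 2 returns the stale memory value. Reading through virtual address 5, which has the same colour as 1, returns the new value.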

In an SMP incoherent system, A and B could get different results (if on
different CPUs) because the write protect is in the cache of A but not
B.  However, PA is SMP coherent, so the act of B reading a line which is
dirty in A's cache causes a flush before the read completes via the
cache chequerboard logic and B ends up reading the same value A would
have read.

James






Bug#558999: FTBFS [hppa] - recompile with -ffunction-sections

2009-12-01 Thread James Bottomley
 I am not very good at GCC optimizations. Can you please explain why
 this problem is not seen on other architectures? Also can you please
 advise if I should add this compiler option for all arch or just hppa.

It's not really a gcc problem, it's an ELF one.  The ELF spec for HPPA
says that we need to leave symbol resolution as a relative jump.  On
parisc 32 bits, this is a 17-bit relative jump.  We can do longer by
indirecting through a stub section.  However, this problem usually
occurs because the actual text section of the .o file is bigger than
131k (about 17 bits) and so the linker can't insert a reachable stub
into the binary.

-ffunction-sections splits the text section up into one section per
function, so now the linker can insert the stubs in between the
functions and thus the problem is solved (until a single function gets
longer than 131k).
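Whether the linker can reach a stub comes down to a range check on the branch displacement. The following sketch shows that check. `fits_signed` is a hypothetical helper, and the field width is a parameter rather than a claim about the exact HPPA branch encoding. Any scaling the instruction applies, such as counting in words rather than bytes, is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>

/* True if disp fits in a two's-complement field of `bits` bits. */
bool fits_signed(long long disp, int bits)
{
    long long max, min;

    assert(bits >= 2 && bits <= 62);
    max = (1LL << (bits - 1)) - 1;
    min = -(1LL << (bits - 1));
    return disp >= min && disp <= max;
}
```

With one text section per function, each branch only needs to reach a stub placed nearby, between functions, so the displacement stays inside the field until a single function outgrows it.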

Most other architectures have bigger relative jumps, so they likely
won't need -ffunction-sections (unless the file you're compiling gets
much bigger).

James






Bug#519707: Upgrade to new version of postgrey fails to start with FATAL: ERROR: ...

2009-09-11 Thread James Bottomley
On Sun, 2009-09-06 at 12:06 +0100, Antonio Radici wrote:
 thanks for your report, I wrote a preinst script which is running a
 db4.7_upgrade if we are upgrading from a version less than 1.31-1.

That sounds like a good fix, thanks.

 This is fixed in the git repo in collab-maint and it will be included in the
 next release.

Unfortunately, since I've already done this on my box, I don't really
have any way to test.

James






Bug#545229: linux-image-2.6.30-1-parisc: panic on boot

2009-09-05 Thread James Bottomley
Package: linux-image-2.6.30-1-parisc
Version: 2.6.30-6
Severity: critical
Tags: patch
Justification: breaks the whole system



-- Package-specific info:

-- System Information:
Debian Release: squeeze/sid
  APT prefers testing
  APT policy: (650, 'testing'), (500, 'stable')
Architecture: hppa (parisc)

Kernel: Linux 2.6.26-2-parisc
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)
Shell: /bin/sh linked to /bin/bash

Versions of packages linux-image-2.6.30-1-parisc depends on:
ii  debconf [debconf-2.0]          1.5.27      Debian configuration management sy
ii  initramfs-tools [linux-initra  0.93.4      tools for generating an initramfs
ii  module-init-tools              3.10-3      tools for managing Linux kernel mo

linux-image-2.6.30-1-parisc recommends no packages.

Versions of packages linux-image-2.6.30-1-parisc suggests:
pn  linux-doc-2.6.30               <none>      (no description available)
ii  palo                           1.16+nmu1   Linux boot loader for parisc/hppa

-- debconf information:
  linux-image-2.6.30-1-parisc/postinst/kimage-is-a-directory:
  linux-image-2.6.30-1-parisc/postinst/old-initrd-link-2.6.30-1-parisc: true
  linux-image-2.6.30-1-parisc/preinst/lilo-has-ramdisk:
  linux-image-2.6.30-1-parisc/preinst/abort-overwrite-2.6.30-1-parisc:
  linux-image-2.6.30-1-parisc/postinst/old-system-map-link-2.6.30-1-parisc: true
  linux-image-2.6.30-1-parisc/preinst/failed-to-move-modules-2.6.30-1-parisc:
  linux-image-2.6.30-1-parisc/prerm/removing-running-kernel-2.6.30-1-parisc: 
true
  linux-image-2.6.30-1-parisc/postinst/bootloader-test-error-2.6.30-1-parisc:
  linux-image-2.6.30-1-parisc/postinst/create-kimage-link-2.6.30-1-parisc: true
  linux-image-2.6.30-1-parisc/postinst/depmod-error-initrd-2.6.30-1-parisc: 
false
  shared/kernel-image/really-run-bootloader: true
  linux-image-2.6.30-1-parisc/preinst/lilo-initrd-2.6.30-1-parisc: true
  linux-image-2.6.30-1-parisc/postinst/old-dir-initrd-link-2.6.30-1-parisc: true
  linux-image-2.6.30-1-parisc/preinst/elilo-initrd-2.6.30-1-parisc: true
  linux-image-2.6.30-1-parisc/preinst/overwriting-modules-2.6.30-1-parisc: true
  linux-image-2.6.30-1-parisc/preinst/abort-install-2.6.30-1-parisc:
  linux-image-2.6.30-1-parisc/postinst/bootloader-error-2.6.30-1-parisc:
  linux-image-2.6.30-1-parisc/preinst/bootloader-initrd-2.6.30-1-parisc: true
  linux-image-2.6.30-1-parisc/postinst/depmod-error-2.6.30-1-parisc: false
  linux-image-2.6.30-1-parisc/preinst/initrd-2.6.30-1-parisc:
  
linux-image-2.6.30-1-parisc/prerm/would-invalidate-boot-loader-2.6.30-1-parisc: 
true

---

All current Debian 2.6.30-1 kernels panic on boot on parisc systems when
loading the initial modules.

The problem is actually caused by binutils outputting duplicate .text
section names.  This trips a panic on boot because kernel/module.c
has insufficient error checking for this case.

The patches that fix this are:

From 1b364bf438cf337a3818aee77d68c0713f3e1fc4 Mon Sep 17 00:00:00 2001
From: James Bottomley james.bottom...@hansenpartnership.com
Date: Wed, 26 Aug 2009 22:04:12 +0930
Subject: module: workaround duplicate section names

and, to fix up that patch:

From ea6bff368548d79529421a9dc0710fc5330eb504 Mon Sep 17 00:00:00 2001
From: Ingo Molnar mi...@elte.hu
Date: Fri, 28 Aug 2009 10:44:56 +0200
Subject: modules: Fix build error in the !CONFIG_KALLSYMS case

From 1b364bf438cf337a3818aee77d68c0713f3e1fc4 Mon Sep 17 00:00:00 2001
From: James Bottomley james.bottom...@hansenpartnership.com
Date: Wed, 26 Aug 2009 22:04:12 +0930
Subject: module: workaround duplicate section names

The root cause is a duplicate section name (.text); is this legal?
[ Amerigo Wang: AFAIK, yes. ]

However, there's a problem with commit
6d76013381ed28979cd122eb4b249a88b5e384fa in that if you fail to allocate
a mod->sect_attrs (in this case it's null because of the duplication),
it still gets used without checking in add_notes_attrs().

This should fix it

[ This patch leaves other problems, particularly the sections directory,
  but recent parisc toolchains seem to produce these modules and this
  prevents a crash and is a minimal change -- RR ]

Signed-off-by: James Bottomley james.bottom...@suse.de
Signed-off-by: Rusty Russell ru...@rustcorp.com.au
Tested-by: Helge Deller del...@gmx.de
Signed-off-by: Linus Torvalds torva...@linux-foundation.org
---
 kernel/module.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 07c80e6..eccb561 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2355,7 +2355,8 @@ static noinline struct module *load_module(void __user *umod,
 	if (err < 0)
 		goto unlink;
 	add_sect_attrs(mod, hdr->e_shnum, secstrings, sechdrs);
-	add_notes_attrs(mod, hdr->e_shnum, secstrings, sechdrs);
+	if (mod->sect_attrs)
+		add_notes_attrs(mod, hdr->e_shnum, secstrings, sechdrs);
 
 	/* Get rid of temporary copy */
 	vfree(hdr);
-- 
1.6.0.2
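A user-space sketch of the failure and the fix, under a simplified model (not the kernel's code): when two sections share a name the attribute table is never built and its pointer stays NULL, and the notes step, which dereferences that table, is only called when the table exists. `struct sect_attrs`, `build_sect_attrs()`, `add_notes()` and `load_module_model()` are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the per-module section attribute table. */
struct sect_attrs {
	size_t nsections;
};

/* Return 1 if any section name appears more than once (e.g. two ".text"). */
static int has_duplicate_names(const char *const names[], size_t n)
{
	for (size_t i = 0; i < n; i++)
		for (size_t j = i + 1; j < n; j++)
			if (strcmp(names[i], names[j]) == 0)
				return 1;
	return 0;
}

/* Build the table; give up and return NULL on duplicate names. */
static struct sect_attrs *build_sect_attrs(const char *const names[], size_t n)
{
	struct sect_attrs *sa;

	if (has_duplicate_names(names, n))
		return NULL;
	sa = malloc(sizeof(*sa));
	if (sa)
		sa->nsections = n;
	return sa;
}

/* Notes step: dereferences the table, so it must never be given NULL. */
static size_t add_notes(const struct sect_attrs *sa)
{
	return sa->nsections;
}

/* Mirrors the patched load_module(): only add notes if the table exists. */
static size_t load_module_model(const char *const names[], size_t n)
{
	struct sect_attrs *sa = build_sect_attrs(names, n);
	size_t notes = 0;

	if (sa)
		notes = add_notes(sa);
	free(sa);
	return notes;
}
```

With sections { ".text", ".data", ".bss" } the model returns 3; with { ".text", ".text", ".data" } the table is skipped and it returns 0 instead of dereferencing NULL.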



Bug#541702: linux-image-2.6.30-1-686: Kernel fails to start networking because no e100 firmware

2009-08-15 Thread James Bottomley
On Sat, 2009-08-15 at 19:21 +0100, Ben Hutchings wrote:
> On Sat, 2009-08-15 at 10:47 -0700, james.bottom...@hansenpartnership.com
> wrote:
> > Package: linux-image-2.6.30-1-686
> > Version: 2.6.30-5
> > Severity: serious
> > Justification: Policy 2.2.1
>
> That very same section explains why we cannot do what you are
> suggesting!

No, it doesn't ... the decision to put firmware-linux in non-free is
obviously wrong, since the same firmware was shipped as-is in main with
2.6.26-2.

> > On upgrade from 2.6.30-2-686 networking (on a remote machine) failed to
> > start, meaning that a support ticket had to be opened for KVM access.
>
> I don't recommend running unstable on production machines.

If you bother to read the bug report, you'd see it's actually running
testing.

> > Diagnosis revealed that the e100 driver in 2.6.26-2-686 required no
> > firmware, so the firmware-linux package wasn't installed.  Apparently
> > 2.6.30-1-686 was built with external firmware for the e100 so it now
> > depends on the firmware-linux package.
> >
> > This is a serious policy violation because required hardware stops
> > working after the upgrade.
>
> No, most systems do not require the firmware-linux package.

That's not really relevant, is it?  linux-image ships with a ton of
drivers most systems don't use as well.

The point is that what was working before the upgrade didn't work after
it.

> > Fix suggested is to make 2.6.30-1-686 depend on firmware-linux so that
> > on upgrade the necessary firmware is present.
>
> I intend to ensure that firmware-linux is mentioned in the release notes
> for squeeze, but it cannot be recommended or made a dependency.

So this amounts to ... assuming the user can find the notice (because
there's a blizzard of notices that go with each upgrade, particularly if
they're going from lenny -> squeeze) you'll tell them that you broke
their system?

The point here is to try and ensure large numbers of systems don't break
before this exits testing for stable.

James








Bug#541702: linux-image-2.6.30-1-686: Kernel fails to start networking because no e100 firmware

2009-08-15 Thread James Bottomley
OK, so let's go back to basics here.

The point of a bug report is to report a bug.  The bug here is that
large numbers of systems will break on upgrade to this kernel once it
hits stable.  This is the problem that needs fixing.

The fact that you find the suggested fix politically incorrect, or that
you don't think I should have been able to find the bug in the first
place are irrelevant to the fact that the bug exists.

Apart from being appallingly bad release practice, breaking a
significant fraction of users on an upgrade is also a debian policy
violation as I've cited (the package is too buggy to release because of
all the breakage).

Trying to describe this as fixed because you'll put it in the release
notes is wrong in principle because it doesn't prevent the existing
users from suffering breakage a priori.

A pre-upgrade script that detected the problem at run time, by checking
whether the user needs modules whose firmware is now in firmware-linux,
would be acceptable.  Just stop, print the warning and
allow them to OK or cancel.  The list of modules now requiring firmware
surely isn't non-free and it can be derived from the linux build system
fairly easily.
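A minimal sketch of that detection, written in C for illustration (a real preinst would more likely be a shell script): scan text in the /proc/modules format, whose first space-separated field on each line is the module name, and count loaded modules that appear in a list of drivers whose firmware now lives in firmware-linux. The list and the function names are hypothetical; only e100 comes from this report.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical list; a real one would be generated from the kernel build. */
static const char *const fw_modules[] = { "e100" };

/* Return 1 if the module name (not NUL-terminated, length len) is listed. */
static int needs_external_firmware(const char *name, size_t len)
{
	for (size_t i = 0; i < sizeof(fw_modules) / sizeof(fw_modules[0]); i++)
		if (strlen(fw_modules[i]) == len &&
		    strncmp(fw_modules[i], name, len) == 0)
			return 1;
	return 0;
}

/*
 * Count loaded modules (one per line, name first, as in /proc/modules)
 * that will need firmware-linux after the upgrade.
 */
static size_t count_affected_modules(const char *proc_modules)
{
	size_t count = 0;
	const char *line = proc_modules;

	while (*line) {
		size_t len = strcspn(line, " \n");	/* length of the name field */

		if (len > 0 && needs_external_firmware(line, len))
			count++;
		line += strcspn(line, "\n");		/* skip the rest of the line */
		if (*line == '\n')
			line++;
	}
	return count;
}
```

A non-zero count is the cue to stop, print the warning and let the admin OK or cancel.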








Bug#527265: [scott.bai...@eds.com: Bug#527265: linux-image-2.6.29-1-alpha-smp: detached firmware qlogic/1040.bin fails to load for qla1280]

2009-05-06 Thread James Bottomley
On Wed, 2009-05-06 at 16:19 +0200, maximilian attems wrote:
> > [4194023.390744] [ cut here ]
> > [4194023.448362] WARNING: at
> /build/buildd-linux-2.6_2.6.29-3-alpha-bvFcox/linux-2.6-2.6.29/debian/build/source_alpha_none/kernel/so

Is there any way we can find out what that file and line actually are?  It
looks like the kernel build hasn't truncated the path name to the top of the
tree for some reason (did you build with non-standard options?).

I suspect it might just be a lockdep error about calling request_firmware()
with interrupts disabled.

Could you also check to see you have this fix in your kernel:

commit 0ce49d6da993adf8b17b7f3ed9805ade14a6a6f3
Author: David Woodhouse david.woodho...@intel.com
Date:   Wed Apr 8 01:22:36 2009 -0700

qla1280: Fix off-by-some error in firmware loading.

James








Bug#519707: Upgrade to new version of postgrey fails to start with FATAL: ERROR: ...

2009-03-14 Thread James Bottomley
Package: postgrey
Version: 1.32-3
Severity: serious
Justification: Policy 2.2.1 makes package too buggy to release


This looks to be an incidental fault affecting postgrey, but the serious
consequence is that postgrey refuses to start.

What seems to have happened is that on the recent system upgrade
libberkeleydb-perl (a package upon which postgrey relies) was upgraded
to version 0.38-1.  This version moved from db4.6 to db4.7, which appear
to use mutually incompatible formats, causing postgrey to fail instantly with

postgrey: FATAL: ERROR: can't create DB environment: No such file or directory 
(dbdir: /var/lib/postgrey uid/gid: 121,121)  

because it can no longer read the database files in /var/lib/postgrey.

The only solution to this appears to be to dump the postgrey databases before
the upgrade and to restore them afterwards (or simply to remove all the old
files, losing the accumulated data).

I'm also open to this being a severe bug in libberkeleydb-perl, since it
is the root cause, but I think postgrey will still have to acquire
dump and restore package scripts to fix it.


-- System Information:
Debian Release: squeeze/sid
  APT prefers testing
  APT policy: (650, 'testing'), (500, 'stable')
Architecture: hppa (parisc)

Kernel: Linux 2.6.26-1-parisc
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)
Shell: /bin/sh linked to /bin/bash

Versions of packages postgrey depends on:
ii  adduser   3.110  add and remove users and groups
ii  debconf   1.5.25 Debian configuration management sy
ii  libberkeleydb-perl0.38-1 use Berkeley DB 4 databases from P
ii  libnet-dns-perl   0.65-1 Perform DNS queries from a Perl sc
ii  libnet-server-perl0.97-1 An extensible, general perl server
ii  perl  5.10.0-19  Larry Wall's Practical Extraction 
ii  ucf   3.0016 Update Configuration File: preserv

Versions of packages postgrey recommends:
ii  libdigest-sha1-perl   2.11-2+b1  NIST SHA-1 message digest algorith
ii  libnet-rblclient-perl 0.5-2  Queries multiple Realtime Blackhol
ii  libparse-syslog-perl  1.10-1 Perl module for parsing syslog ent
ii  postfix   2.5.5-1.1  High-performance mail transport ag

postgrey suggests no packages.

-- debconf information:
  postgrey/1.32-3_changeport:






Bug#506122: Postgrey port number changes without warning on upgrade

2009-02-21 Thread James Bottomley
This just bit me too on an upgrade to testing now pulling in postgrey.

This is a serious problem because changing the port under postfix gives
you an invalid SMTP configuration ... it won't just time out on the
check, it returns a 451 invalid SMTP config to all callers.  This
doesn't get fixed until the admin checks the log and notices the
problem.

You can't seriously be a server distribution if you're going to do
something as stupid as break a standard
postfix/postgrey/clamav/spamassassin setup without warning:  That's the
gold standard for debian based email servers on the internet.

You cannot allow this to go into stable ... and a warning isn't enough,
it would have to be a preconfig that forces the admin to pay attention
(or preferably something that detects postfix is using a different
port).
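A sketch of the detection suggested above, assuming Postfix points at postgrey with a `check_policy_service inet:host:port` restriction: extract the port from that address and compare it with the port postgrey will listen on after the upgrade. `policy_port()` and `port_mismatch()` are hypothetical helpers, and the port numbers in the usage are only examples.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Extract the TCP port from a Postfix policy-service address of the
 * form "inet:host:port".  Returns -1 if it is not in that form.
 */
static long policy_port(const char *addr)
{
	const char *colon;
	char *end;
	long port;

	if (strncmp(addr, "inet:", 5) != 0)
		return -1;
	colon = strrchr(addr + 5, ':');		/* separator before the port */
	if (colon == NULL || colon[1] == '\0')
		return -1;
	port = strtol(colon + 1, &end, 10);
	if (*end != '\0' || port <= 0 || port > 65535)
		return -1;
	return port;
}

/* Return 1 if Postfix still points at a port postgrey no longer uses. */
static int port_mismatch(const char *postfix_addr, long postgrey_port)
{
	long p = policy_port(postfix_addr);

	return p != -1 && p != postgrey_port;
}
```

For example, `port_mismatch("inet:127.0.0.1:60000", 10023)` returns 1, which is the point at which a preconfig question should fire instead of silently breaking mail delivery.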

James Bottomley








Bug#479612: after upgrade to spandsp 0.0.4pre18 asterisk-app-fax crashes asterisk

2008-05-05 Thread James Bottomley
Package: asterisk-app-fax
Version: 0.0.20070624-1
Severity: critical
Justification: causes serious data loss

Apparently the dependency between asterisk-spandsp-plugins and spandsp-dev
is pretty tight.  It looks like there was a binary-incompatible change
introduced by the upgrade from 0.0.4pre16 to 0.0.4pre18.

I verified that asterisk crashes every time a fax is received after the
upgrade.

Recompiling asterisk-spandsp-plugins_0.0.20070624-1 and reinstalling makes
the crash go away.  It looks like there's a compiled version in unstable that
will go through to testing in 10 days and fix the problem.

The net of this bug report is that either these packages need to be tied
together, or the soname version of spandsp needs increasing for every change
(because in the pre stage the ABI is obviously not stable)
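One way to make such a mismatch fail loudly instead of crashing on the first fax is a load-time check in the plugin. This is a hypothetical sketch, not an existing asterisk or spandsp API: `abi_compatible()` takes the library version the plugin was built against and the version found at run time, demands an exact match while either is a pre-release, and otherwise uses an illustrative "same prefix before the last dot" rule.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if a plugin built against `built` may run against `runtime`. */
static int abi_compatible(const char *built, const char *runtime)
{
	const char *db, *dr;

	/* Pre-release ABI is treated as unstable: only an exact match is safe. */
	if (strstr(built, "pre") != NULL || strstr(runtime, "pre") != NULL)
		return strcmp(built, runtime) == 0;

	db = strrchr(built, '.');
	dr = strrchr(runtime, '.');
	if (db == NULL || dr == NULL)
		return strcmp(built, runtime) == 0;

	/* Illustrative rule: same text before the last '.', e.g. "0.0" == "0.0". */
	return (db - built) == (dr - runtime) &&
	       strncmp(built, runtime, (size_t)(db - built)) == 0;
}
```

Under this rule a plugin built against 0.0.4pre16 refuses to load against 0.0.4pre18, which is the situation in this report.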

-- System Information:
Debian Release: lenny/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: i386 (i686)

Kernel: Linux 2.6.24-1-686 (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages asterisk-app-fax depends on:
ii  asterisk   1:1.4.18.1~dfsg-1 Open Source Private Branch Exchang
ii  libc6  2.7-10GNU C Library: Shared libraries
ii  libspandsp30.0.4pre18-1  Telephony signal processing librar

Versions of packages asterisk-app-fax recommends:
ii  gs-common 0.3.13-0.1 Common files for different Ghostsc
ii  libconfig-tiny-perl   2.12-1 Read/Write .ini style files with a
ii  libfile-sync-perl 0.09-4 Perl interface to sync() and fsync
ii  liblocale-gettext-perl1.05-3 Using libc functions for internati
ii  libmime-lite-perl 3.021-3Perl5 module for convenient genera
ii  libpaper-utils1.1.23 library for handling paper charact
ii  libtiff-tools 3.8.2-8TIFF manipulation and conversion t
ii  psutils   1.17-24A collection of PostScript documen

-- no debconf information






Bug#479612: after upgrade to spandsp 0.0.4pre18 asterisk-app-fax crashes asterisk

2008-05-05 Thread James Bottomley
On Mon, 2008-05-05 at 21:47 +0300, Tzafrir Cohen wrote:
> On Mon, May 05, 2008 at 01:13:40PM -0500, James Bottomley wrote:
> > Package: asterisk-app-fax
> > Version: 0.0.20070624-1
> > Severity: critical
> > Justification: causes serious data loss
> >
> > Apparently the dependency of asterisk-spandsp-plugins and spandsp-dev
> > is pretty tight.  It looks like there was a binary incompatible change
> > introduced by the upgrade from 0.0.4pre16 to 0.0.4pre18
> >
> > I verified that asterisk crashes every time a fax is received after the
> > upgrade
> >
> > Recompiling asterisk-spandsp-plugins_0.0.20070624-1 and reinstalling makes
> > the crash go away.
> >
> > The net of this bug report is that either these packages need to be tied
> > together, or the so version of spandsp needs increasing for every change
> > (because in the pre stage the ABI is obviously not stable)
>
> A new version was uploaded just today: 0.0.20070624-2. It was apparently
> built against a newer spandsp. Does it fix this issue for you?

I think it will; that's why I wrote:

> It looks like there's a compiled version in unstable that
> will go through to testing in 10 days and fix the problem

But I'd have to do an upgrade on the running system to check that, which
means not until tomorrow.

James








Bug#479612: after upgrade to spandsp 0.0.4pre18 asterisk-app-fax crashes asterisk

2008-05-05 Thread James Bottomley
On Mon, 2008-05-05 at 21:30 +0300, Faidon Liambotis wrote:
> James Bottomley wrote:
> > Apparently the dependency of asterisk-spandsp-plugins and spandsp-dev
> > is pretty tight.  It looks like there was a binary incompatible change
> > introduced by the upgrade from 0.0.4pre16 to 0.0.4pre18
> >
> > I verified that asterisk crashes every time a fax is received after the
> > upgrade
> >
> > Recompiling asterisk-spandsp-plugins_0.0.20070624-1 and reinstalling makes
> > the crash go away.  It looks like theres a compiled version in unstable that
> > will go through to testing in 10 days and fix the problem
> >
> > The net of this bug report is that either these packages need to be tied
> > together, or the so version of spandsp needs increasing for every change
> > (because in the pre stage the ABI is obviously not stable)
> It's been a while since I used app-fax -- therefore, help maintaining it
> is welcome.
>
> Are you certain that the crash is because of a spandsp ABI change and
> not because of an asterisk ABI change? We've had one of those as well :(

Yes ... because

 1. There was no asterisk update in the last apt-get update on
    Saturday.
 2. Recompiling asterisk-spandsp-plugins_0.0.20070624-1 against the new
    libspandsp-dev fixes the problem (as I said).

James








Bug#476285: linux-image-2.6.24-1-parisc: panics on boot in cmpxchg_futex_value_locked

2008-04-15 Thread James Bottomley
Package: linux-image-2.6.24-1-parisc
Version: 2.6.24-5
Severity: critical
Tags: patch
Justification: breaks the whole system


This actually isn't just a bug in Debian; it affects every distro which
uses the stable tree as a base.

For instance, the Gentoo bug is here:

http://bugs.gentoo.org/show_bug.cgi?id=217030

The panic is:

backtrace:
 [10587970] init+0x20/0xc4
 [105807e0] kernel_init+0xf4/0x328
 [10109c5c] ret_from_kernel_thread+0x1c/0x24


Kernel Fault: Code=26 regs=8fc241c0 (Addr=)

 YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 0100 Not tainted
r00-03  0004ff0f 104fc140 10587970 f0412000
r04-07   105b57c0  
r08-11   1059b810 105b5810 104c3810
r12-15  10568810 1059b810 8fc24088 3b9aca00
r16-19  f8c4 f17c f174 
r20-23  4000 07ff 10587950 0001
r24-27     104c6010
r28-31  8fc24000 c99f4bdd 8fc241c0 105807e0
sr00-03     
sr04-07     

IASQ:   IAOQ: 101433b8 101433bc 
 IIR: 0f401089ISR:   IOR:   
 CPU:0   CR30: 8fc24000 CR31:   
 ORIG_R28:  
 IAOQ[0]: cmpxchg_futex_value_locked+0x28/0x9c  
 IAOQ[1]: cmpxchg_futex_value_locked+0x2c/0x9c  
 RP(r2): init+0x20/0xc4 
Kernel panic - not syncing: Kernel Fault   


The root cause is a backport of this commit:

commit a0c1e9073ef7428a14309cba010633a6cd6719ea
Author: Thomas Gleixner [EMAIL PROTECTED]
Date:   Sat Feb 23 15:23:57 2008 -0800

futex: runtime enable pi and robust functionality

to the stable tree (it went in for 2.6.24.4).  This breaks parisc because
we weren't set up to handle NULL as a futex cmpxchg address.  We
found and fixed the bug upstream as:

commit c20a84c91048c76c1379011c96b1a5cee5c7d9a0
Author: Kyle McMartin [EMAIL PROTECTED]
Date:   Sat Mar 1 10:25:52 2008 -0800

[PARISC] futex: special case cmpxchg NULL in kernel space

but, because we didn't know tglx had requested a backport, the fix
wasn't backported to stable.

I'll send the necessary patch into stable, but to get parisc working
again on debian it has to be applied on top of the current kernel.

NOTE: This bug was introduced into 2.6.24.4; 2.6.24.3 doesn't have it.


-- System Information:
Debian Release: lenny/sid
  APT prefers testing
  APT policy: (650, 'testing')
Architecture: hppa (parisc)

Kernel: Linux 2.6.22-3-parisc
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)
Shell: /bin/sh linked to /bin/bash

Versions of packages linux-image-2.6.24-1-parisc depends on:
ii  debconf [debconf-2.0]1.5.20  Debian configuration management sy
ii  initramfs-tools [linux-initr 0.91e   tools for generating an initramfs
ii  module-init-tools3.3-pre11-4 tools for managing Linux kernel mo

linux-image-2.6.24-1-parisc recommends no packages.

-- debconf information excluded

*** parisc-cmpxchg-fix.diff
From c8d402df60b3aad85b30cfe7df20f829ef6eb895 Mon Sep 17 00:00:00 2001
From: Kyle McMartin [EMAIL PROTECTED]
Date: Sat, 1 Mar 2008 10:25:52 -0800
Subject: [PARISC] futex: special case cmpxchg NULL in kernel space

Commit a0c1e9073ef7428a14309cba010633a6cd6719ea added code to futex.c
to detect whether futex_atomic_cmpxchg_inatomic was implemented at run
time:

+   curval = cmpxchg_futex_value_locked(NULL, 0, 0);
+   if (curval == -EFAULT)
+   futex_cmpxchg_enabled = 1;

This is bogus on parisc, since page zero in kernel virtual space is the
gateway page for syscall entry, and should not be read from the kernel.
(That, and we really don't like the kernel faulting on its own address
 space...)

Signed-off-by: Kyle McMartin [EMAIL PROTECTED]
---
 include/asm-parisc/futex.h |   10 ++++++++--
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/asm-parisc/futex.h b/include/asm-parisc/futex.h
index dbee6e6..fdc6d05 100644
--- a/include/asm-parisc/futex.h
+++ b/include/asm-parisc/futex.h
@@ -56,6 +56,12 @@ futex_atomic_cmpxchg_inatomic(int __user *uaddr, int oldval, int newval)
 	int err = 0;
 	int uval;
 
+	/* futex.c wants to do a cmpxchg_inatomic on kernel NULL, which is
+	 * our gateway page, and causes no end of trouble...
+	 */
+	if (segment_eq(KERNEL_DS, get_fs()) && !uaddr)
+		return -EFAULT;
+
 	if (!access_ok(VERIFY_WRITE, uaddr, sizeof(int)))
 		return -EFAULT;
 
@@ -67,5 +73,5 @@ futex_atomic_cmpxchg_inatomic(int __user *uaddr, int oldval, int newval)
 	return uval;
 }
 
-#endif
-#endif
+#endif /*__KERNEL__*/
+#endif /*_ASM_PARISC_FUTEX_H*/
-- 
1.5.3.8
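The boot-time probe and the parisc fix can be modelled in user space as a sketch: `model_cmpxchg()` is a hypothetical stand-in for futex_atomic_cmpxchg_inatomic() that refuses a NULL address up front instead of touching it, and `futex_detect_cmpxchg()` mirrors the three-line probe quoted in the patch description. Unlike the kernel it does no fault handling at all; it only shows why returning -EFAULT for NULL is enough for the probe to succeed without reading page zero.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in for the arch cmpxchg helper: returns the old value or -EFAULT. */
static int model_cmpxchg(int *uaddr, int oldval, int newval)
{
	int cur;

	/* The parisc fix: special-case NULL and fail without dereferencing. */
	if (uaddr == NULL)
		return -EFAULT;

	cur = *uaddr;
	if (cur == oldval)
		*uaddr = newval;
	return cur;
}

static int futex_cmpxchg_enabled;

/* Mirrors: curval = cmpxchg(NULL, 0, 0); if (curval == -EFAULT) enable. */
static void futex_detect_cmpxchg(void)
{
	if (model_cmpxchg(NULL, 0, 0) == -EFAULT)
		futex_cmpxchg_enabled = 1;
}
```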




Bug#476292: linux-image-2.6.24-1-parisc64: 64 bit kernel panics on boot in handle_interruption

2008-04-15 Thread James Bottomley
Package: linux-image-2.6.24-1-parisc64
Version: 2.6.24-5
Severity: critical
Tags: patch
Justification: breaks the whole system


The parisc 64 bit kernel panics on boot with this:

  CC  net/ipv4/netfilter/iptable_raw.mod.o
  CC  net/ipv4/tcp_diag.mod.o
  CC  net/ipv4/tunnel4.mod.o
  CC  net/ipv4/xfrm4_mode_beet.mod.o
  CC  net/ipv4/xfrm4_tunnel.mod.o
  CC  net/key/af_key.mod.o
  CC  net/llc/llc.mod.o
  CC  net/llc/llc2.mod.o
  CC  net/netfilter/nfnetlink_log.mod.o
  CC  net/netfilter/nfnetlink.mod.o
  CC  net/netfilter/nfnetlink_queue.mod.o
  CC  net/netfilter/xt_CLASSIFY.mod.o
  CC  net/netfilter/x_tables.mod.o
  CC  net/netfilter/xt_DSCP.mod.o
  CC  net/netfilter/xt_MARK.mod.o
  CC  net/netfilter/xt_NFQUEUE.mod.o
  CC  net/netfilter/xt_comment.mod.o
  CC  net/netfilter/xt_dccp.mod.o
  CC  net/netfilter/xt_dscp.mod.o
  CC  net/netfilter/xt_esp.mod.o
  CC  net/netfilter/xt_length.mod.o
  CC  net/netfilter/xt_limit.mod.o
  CC  net/netfilter/xt_mac.mod.o
  CC  net/netfilter/xt_mark.mod.o
  CC  net/netfilter/xt_multiport.mod.o
  CC  net/netfilter/xt_pkttype.mod.o
  CC  net/netfilter/xt_policy.mod.o
  CC  net/netfilter/xt_realm.mod.o
  CC  net/netfilter/xt_sctp.mod.o
  CC  net/netfilter/xt_string.mod.o
  CC  net/netfilter/xt_tcpmss.mod.o
  CC  net/netfilter/xt_tcpudp.mod.o
  CC  net/packet/af_packet.mod.o
  CC  net/sctp/sctp.mod.o
  CC  net/sunrpc/auth_gss/auth_rpcgss.mod.o
  CC  net/sunrpc/auth_gss/rpcsec_gss_krb5.mod.o
  CC  net/sunrpc/auth_gss/rpcsec_gss_spkm3.mod.o
  CC  net/sunrpc/sunrpc.mod.o
  CC  net/tipc/tipc.mod.o
  CC  net/xfrm/xfrm_user.mod.o
  CC  sound/ac97_bus.mod.o
  CC  sound/core/oss/snd-mixer-oss.mod.o
  CC  sound/core/oss/snd-pcm-oss.mod.o
  CC  sound/core/seq/oss/snd-seq-oss.mod.o
  CC  sound/core/seq/snd-seq-device.mod.o
  CC  sound/core/seq/snd-seq-dummy.mod.o
  CC  sound/core/seq/snd-seq-midi-event.mod.o
  CC  sound/core/seq/snd-seq-midi.mod.o
  CC  sound/core/seq/snd-seq.mod.o
  CC  sound/core/snd-hwdep.mod.o
  CC  sound/core/snd-page-alloc.mod.o
  CC  sound/core/snd-pcm.mod.o
  CC  sound/core/snd-rawmidi.mod.o
  CC  sound/core/snd-timer.mod.o
  CC  sound/core/snd.mod.o
  CC  sound/parisc/snd-harmony.mod.o
  CC  sound/pci/ac97/snd-ac97-codec.mod.o
  CC  sound/pci/rme9652/snd-hdspm.mod.o
  CC  sound/pci/snd-ad1889.mod.o
  LD [M]  crypto/aes_generic.ko
  CC  sound/soundcore.mod.o
  LD [M]  crypto/anubis.ko
  LD [M]  crypto/arc4.ko
  LD [M]  crypto/blkcipher.ko
  LD [M]  crypto/blowfish.ko
  LD [M]  crypto/cast5.ko
  LD [M]  crypto/cast6.ko
  LD [M]  crypto/cbc.ko
  LD [M]  crypto/crc32c.ko
  LD [M]  crypto/crypto_null.ko
  LD [M]  crypto/deflate.ko
  LD [M]  crypto/des_generic.ko
  LD [M]  crypto/ecb.ko
  LD [M]  crypto/khazad.ko
  LD [M]  crypto/gf128mul.ko
  LD [M]  crypto/md4.ko
  LD [M]  crypto/md5.ko
  LD [M]  crypto/michael_mic.ko
  LD [M]  crypto/serpent.ko
  LD [M]  crypto/sha256_generic.ko
  LD [M]  crypto/sha512.ko
  LD [M]  crypto/tcrypt.ko
  LD [M]  crypto/tea.ko
  LD [M]  crypto/tgr192.ko
  LD [M]  crypto/twofish.ko
  LD [M]  crypto/twofish_common.ko
  LD [M]  crypto/wp512.ko
  LD [M]  drivers/base/firmware_class.ko
  LD [M]  drivers/block/aoe/aoe.ko
  LD [M]  drivers/block/cryptoloop.ko
  LD [M]  drivers/block/loop.ko
  LD [M]  drivers/block/pktcdvd.ko
  LD [M]  drivers/block/sx8.ko
  LD [M]  drivers/block/ub.ko
  LD [M]  drivers/block/umem.ko
  LD [M]  drivers/cdrom/cdrom.ko
  LD [M]  drivers/char/lp.ko
  LD [M]  drivers/char/agp/parisc-agp.ko
  LD [M]  drivers/char/raw.ko
  LD [M]  drivers/hid/usbhid/usbhid.ko
  LD [M]  drivers/input/keyboard/hil_kbd.ko
  LD [M]  drivers/input/keyboard/hilkbd.ko
  LD [M]  drivers/input/misc/hp_sdc_rtc.ko
  LD [M]  drivers/input/misc/uinput.ko
  LD [M]  drivers/input/mouse/hil_ptr.ko
  LD [M]  drivers/input/mouse/psmouse.ko
  LD [M]  drivers/input/mouse/sermouse.ko
  LD [M]  drivers/input/serio/parkbd.ko
  LD [M]  drivers/input/serio/pcips2.ko
  LD [M]  drivers/input/serio/serio_raw.ko
  LD [M]  drivers/md/dm-crypt.ko
  LD [M]  drivers/input/serio/serport.ko
  LD [M]  drivers/md/dm-emc.ko
  LD [M]  drivers/md/dm-mirror.ko
  LD [M]  drivers/md/dm-mod.ko
  LD [M]  drivers/md/dm-multipath.ko
  LD [M]  drivers/md/dm-round-robin.ko
  LD [M]  drivers/md/dm-snapshot.ko
  LD [M]  drivers/md/dm-zero.ko
  LD [M]  drivers/md/faulty.ko
  LD [M]  drivers/md/linear.ko
  LD [M]  drivers/md/md-mod.ko
  LD [M]  drivers/md/multipath.ko
  LD [M]  drivers/md/raid1.ko
  LD [M]  drivers/md/raid0.ko
  LD [M]  drivers/md/raid10.ko
  LD [M]  drivers/message/fusion/mptbase.ko
  LD [M]  drivers/message/fusion/mptctl.ko
  LD [M]  drivers/message/fusion/mptfc.ko
  LD [M]  drivers/message/fusion/mptsas.ko
  LD [M]  drivers/message/fusion/mptscsih.ko
  LD [M]  drivers/message/fusion/mptspi.ko
  LD [M]  drivers/net/3c59x.ko
  LD [M]  

Bug#374792: Dell CERC ATA100/4ch support

2007-05-16 Thread James Bottomley
On Wed, 2007-05-16 at 14:36 +0100, Leigh Blackwell wrote:
> I have been looking at the issue with these cerc devices, has this
> bug 374792 been closed based on people reverting the firmware to  6.61.
>
> Unfortunately Dell doesn't support a Firmware version that old on our
> Server, is it possible to re-open this bug? I have been unable to get the
> current etch install to recognize my driver controller with any of the
> megaraid drivers.

Umm, but this is a bug in Dell Support, isn't it?  I don't think there's
a kernel fix for that.

LSI's position is that in current kernels they only support this device
with the new megaraid driver and only for firmware version = 6.61.
Surely you just need to get Dell and LSI on the same page?

James







Bug#391384: linux-image-2.6.18-1-686: Compaq Proliant DL380 fails to boot

2006-10-08 Thread James Bottomley
On Sun, 2006-10-08 at 14:40 -0700, Matt Taggart wrote:
> dann frazier writes...
>
> > hey Grant/James,
> >   It looks like we're still having cpqarray/sym2 conflicts under
> > 2.6.18 - any idea what this problem may be?
>
> This is for dl380. At the very bottom (after the close of the bug) of
>
>   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=380272
>
> someone suggests a fix for dl380.
>
> jejb/ggg,
>
> Does that look like the right fix?

Er ... you mean the email that I sent pointing to a fix in the
scsi-rc-fixes tree?  Then yes, I think it's a correct fix.  It's already
in 2.6.18.

James







Bug#391384: linux-image-2.6.18-1-686: Compaq Proliant DL380 fails to boot

2006-10-08 Thread James Bottomley
On Sun, 2006-10-08 at 21:16 -0600, Grant Grundler wrote:
> I didn't know Compaq used two different 53[cC]510 parts.
> Patch below adds the same tweak to the 0x0010 device ID.
>
> James or willy, this look good to you?

It seems reasonable.

Can we get confirmation from the bug submitter that it actually fixes
the problem?

James







Bug#380272: kernel-image-2.6-686-smp: cpqarray module fails to detect arrays

2006-08-18 Thread James Bottomley
On Fri, 2006-08-18 at 12:39 -0400, Kyle McMartin wrote:
> The problem is because they both claim support for the same PCI IDs:

That's this fix, isn't it?

http://www.kernel.org/git/?p=linux/kernel/git/jejb/scsi-rc-fixes-2.6.git;a=commit;h=b2b3c121076961333977f485f0d54c22121df920
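Two drivers end up fighting over the same controller when their PCI ID tables overlap. A sketch of that overlap check, assuming a simplified ID entry (vendor, device, subsystem vendor, subsystem device) and a wildcard value modelled on the kernel's PCI_ANY_ID; the struct, helpers and the IDs in the usage are illustrative, and this does not claim to be what the referenced commit changes.

```c
#include <stddef.h>

#define ANY_ID 0xffffffffu	/* wildcard, modelled on the kernel's PCI_ANY_ID */

/* Simplified PCI ID table entry. */
struct id_entry {
	unsigned int vendor, device, subvendor, subdevice;
};

/* Two fields can match the same hardware if equal or either is a wildcard. */
static int field_overlaps(unsigned int a, unsigned int b)
{
	return a == ANY_ID || b == ANY_ID || a == b;
}

static int entries_overlap(const struct id_entry *a, const struct id_entry *b)
{
	return field_overlaps(a->vendor, b->vendor) &&
	       field_overlaps(a->device, b->device) &&
	       field_overlaps(a->subvendor, b->subvendor) &&
	       field_overlaps(a->subdevice, b->subdevice);
}

/* Count entry pairs from two drivers' tables that could claim one device. */
static size_t count_conflicts(const struct id_entry *a, size_t na,
			      const struct id_entry *b, size_t nb)
{
	size_t n = 0;

	for (size_t i = 0; i < na; i++)
		for (size_t j = 0; j < nb; j++)
			if (entries_overlap(&a[i], &b[j]))
				n++;
	return n;
}
```

A table that matches a device ID with wildcard subsystem IDs collides with any other driver listing that device; narrowing either side by subsystem vendor removes the conflict.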

James







Bug#345336: package libmail-spf-query-perl in testing has missing dependency on LMAP::CID2SPF

2005-12-30 Thread James Bottomley
Package: libmail-spf-query-perl
Version: 1.997-3

The mail.err log files contain these errors:

Dec 29 06:16:36 redscar spamd[7945]: Can't locate LMAP/CID2SPF.pm in
@INC (@INC
contains: ../lib /usr/share/perl5 /etc/perl /usr/local/lib/perl/5.8.7 
/usr/local/share/perl/5.8.7 /usr/lib/perl5 /usr/lib/perl/5.8 
/usr/share/perl/5.8 /usr/local/lib/site_perl) at 
/usr/share/perl5/Mail/SPF/Query.pm line 1757, GEN462 line 216. 

This is because the LMAP::CID2SPF package (listed as a requirement in
www.openspf.org/download.html) isn't present.






Bug#338089: New aic7xxx driver fails spectacularly on 2940UW

2005-11-28 Thread James Bottomley
On Sun, 2005-11-20 at 21:21 -0500, Graham Knap wrote:
> Sure enough, the kernel now boots. I'll attach the dmesg output here.
>
> Do you guys have a final patch in mind?
>
> Let me know if there are other tests you'd like me to run. Now that I
> know how to do this, I should be able to turn around test results
> fairly quickly.

OK, try the attached.  If it works out, I'll soak it in -mm for a while
and then try to put it in as a bug fix for 2.6.15.
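The decision the attached patch makes can be modelled as a short sketch: len starts at -1 ("not yet known"), the echo-buffer probe is only attempted once DT has actually been negotiated, and any len <= 0 skips the write tests. `struct dv_state`, `probe_echo_buffer()` and `dv_runs_write_tests()` are hypothetical stand-ins, not the driver's code.

```c
/* Hypothetical summary of what domain validation has learned so far. */
struct dv_state {
	int negotiated_dt;	/* did the target really sustain DT? */
	int echo_buffer_len;	/* what an echo-buffer probe would report; 0 = none */
};

/* Stand-in for spi_dv_device_get_echo_buffer(); counts calls for testing. */
static int probe_calls;

static int probe_echo_buffer(const struct dv_state *s)
{
	probe_calls++;
	return s->echo_buffer_len;
}

/* Returns 1 if the SPI pattern write tests would run. */
static int dv_runs_write_tests(const struct dv_state *s)
{
	int len = -1;	/* -1: echo buffer presence not yet determined */

	/* The read-only INQUIRY tests would run here first. */

	/* Only probe for an echo buffer on a device that negotiated DT. */
	if (len == -1 && s->negotiated_dt)
		len = probe_echo_buffer(s);

	return len > 0;
}
```

A non-DT device is never probed at all, which is what keeps the broken devices from seeing the echo-buffer commands.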

James

diff --git a/drivers/scsi/scsi_transport_spi.c b/drivers/scsi/scsi_transport_spi.c
--- a/drivers/scsi/scsi_transport_spi.c
+++ b/drivers/scsi/scsi_transport_spi.c
@@ -812,12 +812,10 @@ spi_dv_device_internal(struct scsi_devic
 	if (!scsi_device_sync(sdev) && !scsi_device_dt(sdev))
 		return;
 
-	/* see if the device has an echo buffer.  If it does we can
-	 * do the SPI pattern write tests */
-
-	len = 0;
-	if (scsi_device_dt(sdev))
-		len = spi_dv_device_get_echo_buffer(sdev, buffer);
+	/* len == -1 is the signal that we need to ascertain the
+	 * presence of an echo buffer before trying to use it.  len ==
+	 * 0 means we don't have an echo buffer */
+	len = -1;
 
  retry:
 
@@ -840,11 +838,23 @@ spi_dv_device_internal(struct scsi_devic
 		if (spi_min_period(starget) == 8)
 			DV_SET(pcomp_en, 1);
 	}
+	/* Do the read only INQUIRY tests */
+	spi_dv_retrain(sdev, buffer, buffer + sdev->inquiry_len,
+		       spi_dv_device_compare_inquiry);
+	/* See if we actually managed to negotiate and sustain DT */
+	if (i->f->get_dt)
+		i->f->get_dt(starget);
+
+	/* see if the device has an echo buffer.  If it does we can do
+	 * the SPI pattern write tests.  Because of some broken
+	 * devices, we *only* try this on a device that has actually
+	 * negotiated DT */
+
+	if (len == -1 && spi_dt(starget))
+		len = spi_dv_device_get_echo_buffer(sdev, buffer);
 
-	if (len == 0) {
+	if (len <= 0) {
 		starget_printk(KERN_INFO, starget, "Domain Validation skipping write tests\n");
-		spi_dv_retrain(sdev, buffer, buffer + len,
-			       spi_dv_device_compare_inquiry);
 		return;
 	}
 

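For readers following the logic rather than the diff, the patched control flow can be modelled as a small pure function. This is a hypothetical sketch, not kernel code. dv_pass(), the enum and the stub probes are invented for illustration. negotiated_dt stands in for spi_dt(starget) after the read-only INQUIRY retrain, and the probe callback stands in for spi_dv_device_get_echo_buffer(). The sketch shows the patch's tri-state len:
- -1 means "not probed yet" and survives the retry: loop.
- 0 means "no echo buffer".
- A device that never negotiates DT is never probed at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the patched flow; names are illustrative only.
 * *echo_len follows the patch's tri-state convention for "len":
 *   -1 = not probed yet, 0 = no echo buffer, >0 = echo buffer size.
 * The caller sets it to -1 once, before the first pass, the way the
 * patch sets len = -1 before the retry: label. */
enum dv_plan { DV_READ_ONLY, DV_READ_AND_WRITE };

/* negotiated_dt stands in for spi_dt(starget) after the INQUIRY retrain;
 * probe stands in for spi_dv_device_get_echo_buffer(). */
static enum dv_plan dv_pass(int *echo_len, bool negotiated_dt,
                            int (*probe)(void))
{
    assert(echo_len != NULL && probe != NULL);

    /* The read-only INQUIRY tests always run first (not modelled here). */

    /* Probe at most once, and only if DT survived negotiation. */
    if (*echo_len == -1 && negotiated_dt)
        *echo_len = probe();

    /* -1 (never probed) and 0 (no buffer) both mean: skip write tests. */
    return *echo_len <= 0 ? DV_READ_ONLY : DV_READ_AND_WRITE;
}

/* Stub probes for trying the model out; each counts how often it ran. */
static int probe_calls;
static int probe_64_bytes(void) { probe_calls++; return 64; }
static int probe_no_buffer(void) { probe_calls++; return 0; }
```

Calling dv_pass() twice with the same len, as a retry would, probes the device only once.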



-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Bug#338089: New aic7xxx driver fails spectacularly on 2940UW

2005-11-13 Thread James Bottomley
On Sun, 2005-11-13 at 12:41 -0500, Doug Ledford wrote:
> If the drive is unaccessible after the DV failure, even on a warm reboot
> (which includes a SCSI bus reset), then the drive is flat hung.
> Something done in the current code is breaking it.  Can you get a boot
> with DV turned off and capture the log messages and post them here
> please?  You already said it didn't help with the problem, but I'd like
> to see the failure scenario with it off, that might help determine the
> true root cause of the issue.

Yes, you're right ... the sequencer code seems to identify the
WRITE_BUFFER as the failing command.  Can you try with the attached
patch, which will force DV to ignore the echo buffer write tests?

Thanks,

James

diff --git a/drivers/scsi/scsi_transport_spi.c b/drivers/scsi/scsi_transport_spi.c
--- a/drivers/scsi/scsi_transport_spi.c
+++ b/drivers/scsi/scsi_transport_spi.c
@@ -816,8 +816,10 @@ spi_dv_device_internal(struct scsi_devic
 	 * do the SPI pattern write tests */
 
 	len = 0;
+#if 0
 	if (scsi_device_dt(sdev))
 		len = spi_dv_device_get_echo_buffer(sdev, buffer);
+#endif
 
  retry:
 







Bug#338089: New aic7xxx driver fails spectacularly on 2940UW

2005-11-13 Thread James Bottomley
On Sun, 2005-11-13 at 13:03 -0500, Graham Knap wrote:
> Doug Ledford [EMAIL PROTECTED] wrote:
> > You already said it didn't help with the problem,
>
> I meant that I don't think I successfully disabled DV, because the boot
> messages were *identical*, except for the line where the kernel shows
> the Kernel command line.
>
> I had added this argument at the end of the line:  aic7xxx=dv:{0}
>
> I've re-read aic7xxx.txt and I'm not sure what I'm doing wrong. If
> you can tell me how to disable DV, I'd be happy to give it a try.

aic7xxx.txt is out of date.  The aic7xxx (and aic79xx) drivers now use
the generic domain validation code rather than the old aic-specific
routines (which is what the dv:{0} option refers to).  If you try the
patch in my previous email, I think that will disable the piece of DV
that's causing the problem.

If the test patch works, the problem is pretty nasty: apparently the
device claims DT support in its INQUIRY data but in fact rejects DT in
the negotiation.  We use that claimed DT support to decide whether to
check for an echo buffer, a check which starts with a READ_BUFFER for
the echo buffer descriptor.  Apparently this device returns a valid
descriptor with a reasonable echo buffer size and then promptly throws
a wobbly when we try to use it.
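The echo buffer descriptor mentioned here is a short block returned by READ BUFFER in its echo buffer descriptor mode. The sketch below decodes only its capacity field. The helper name is hypothetical, and the layout is an assumption taken from SPC that should be checked against the standard: a 13-bit BUFFER CAPACITY made of the low five bits of byte 2 followed by all of byte 3. A sensible decoded size is all the descriptor promises. That is why a device can return one and still fall over on the subsequent WRITE_BUFFER.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode the BUFFER CAPACITY field of a 4-byte echo
 * buffer descriptor.  Assumed layout: bits 4..0 of byte 2 hold the high
 * bits and byte 3 the low byte, giving a 13-bit capacity in bytes.
 * It only parses bytes already read; it issues no SCSI command. */
static unsigned int echo_buffer_capacity(const uint8_t *desc)
{
    assert(desc != NULL);
    return ((unsigned int)(desc[2] & 0x1f) << 8) | desc[3];
}
```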

James







Bug#338089: New aic7xxx driver fails spectacularly on 2940UW

2005-11-13 Thread James Bottomley
On Sun, 2005-11-13 at 14:42 -0500, Doug Ledford wrote:
> The device is on a non-LVD bus.  Certain devices were created back when
> the spec still stated that using PPR negotiation messages on a non-LVD
> bus was a no-no.  As the echo buffer was an addition to support DV, and
> originally DV wasn't intended to be used on non-LVD busses, it might
> stand to reason that this device simply is going tits up because we are
> attempting to use the echo buffer while in SE mode.  Checking that
> PPR/DT is valid (not just between controller and device, but also given
> bus mode) and only using echo buffer DV when all LVD conditions are met
> would likely solve the problem (assuming that the problem is what you
> are referring to).

I think so (pending confirmation that the patch works).  The current DV
code assumes that if the device claims DT support in the INQUIRY data
*and* returns a valid descriptor to the READ_BUFFER descriptor command,
then enhanced DV should be attempted.

What I'm contemplating (which is also what you suggest) is tightening
up the check so that if the standard read-only DV tests produce a
negotiation that doesn't set DT, we won't attempt enhanced DV.
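The difference between the current gate and the tightened one can be written as two predicates over what DV knows about a target. The struct and function names are hypothetical. The require_lvd term models Doug's bus-mode suggestion and is not part of the patch James posted, which checks only negotiated DT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what DV knows about one target. */
struct dv_facts {
    bool inquiry_claims_dt;   /* DT support claimed in the INQUIRY data */
    bool negotiated_dt;       /* DT still set after the read-only tests */
    bool bus_is_lvd;          /* false on a single-ended (SE) bus */
    bool valid_descriptor;    /* READ_BUFFER descriptor looked valid */
};

/* Gate as described for the current code: trust the INQUIRY claim. */
static bool old_gate(const struct dv_facts *f)
{
    assert(f != NULL);
    return f->inquiry_claims_dt && f->valid_descriptor;
}

/* Tightened gate: DT must have survived negotiation.  The optional
 * bus-mode check follows Doug's suggestion (an assumption here). */
static bool tightened_gate(const struct dv_facts *f, bool require_lvd)
{
    assert(f != NULL);
    if (!f->negotiated_dt)
        return false;
    if (require_lvd && !f->bus_is_lvd)
        return false;
    return f->valid_descriptor;
}
```

Under the hypothesis above, the failing drive claims DT, fails to negotiate it on an SE bus, and still returns a valid descriptor. The old gate lets it through to the write tests, and the tightened gate does not.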

James







Bug#338089: New aic7xxx driver fails spectacularly on 2940UW

2005-11-12 Thread James Bottomley
On Tue, 2005-11-08 at 20:47 -0500, Graham Knap wrote:
> Target 0 Negotiation Settings
> User: 40.000MB/s transfers (20.000MHz, offset 127, 16bit)
> Goal: 40.000MB/s transfers (20.000MHz, offset 8, 16bit)
> Curr: 40.000MB/s transfers (20.000MHz, offset 8, 16bit)

That's a bit unfortunate ... it shows that the old kernel ended up with
the same settings that domain validation negotiates now, so the
negotiated settings don't look like the problem.

My best guess would be that the bus is slightly marginal.  The aic7xxx
drivers are notoriously sensitive to bus problems.  Could you try
lowering the bus speed to 10MHz in the aic7xxx BIOS and see if that
helps?
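The figures in that output follow a simple rule: peak rate in MB/s equals the transfer rate (the number printed as MHz) times the bus width in bytes. So 20MHz on a 16-bit bus gives 40MB/s, and lowering the bus to 10MHz as suggested should show 20MB/s. This is a minimal sketch. It assumes single-transition (non-DT) clocking, which is what this single-ended bus uses, and the function name is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: peak SCSI rate in MB/s for non-DT transfers,
 * i.e. one transfer per clock, so MB/s = MHz * (bus width in bytes). */
static unsigned int scsi_rate_mb_per_s(unsigned int mhz, unsigned int width_bits)
{
    assert(width_bits == 8 || width_bits == 16);  /* narrow or wide bus */
    return mhz * (width_bits / 8);
}
```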

Thanks,

James







Bug#338089: New aic7xxx driver fails spectacularly on 2940UW

2005-11-08 Thread James Bottomley
On Tue, 2005-11-08 at 12:31 +0900, Horms wrote:
> On Mon, Nov 07, 2005 at 09:45:23PM -0500, Graham Knap wrote:
> > Package: linux-image-2.6.14-1-686
> > Version: 2.6.14-2
> >
> > Recent versions of the aic7xxx driver will not boot on my secondary PC.
> > The 2.6.8 kernel shipped with sarge works perfectly, but neither the
> > 2.6.12 kernel in testing nor the 2.6.14 kernel in unstable will boot.
> >
> > This is an older system:
> > Asus P2L-B, Celeron 500MHz, 384MB RAM, GeForce2 MX AGP
> > Adaptec 2940UW, IBM DDYS-T09170 (9GB disk)
> >
> > I can't understand what exactly is failing, but I will attach a boot
> > log. (So null modem cables *are* still useful for something!)
> >
> > I've tried adding aic7xxx=dv:{0} to the boot arguments but that
> > doesn't seem to make a difference. Also, aic7xxx=verbose doesn't seem
> > to do anything either.
> >
> > I don't know if this makes a difference but my 2940UW reports its BIOS
> > revision as 1.34.3 during POST.
> >
> > Any help would be much appreciated.
>
> Hi Graham,
>
> Thanks for your detailed report. This does smell a lot like a driver
> bug, and as such, it's probably best passed on to the upstream maintainers.
> As such I've CCed James Bottomley and linux-scsi for comment.
>
> The other main possibility is that perhaps the aic7xxx_old driver would
> work. Or perhaps some other module loading foo, though it seems the
> module is loaded fine, it just doesn't like your card very much.

This is an older drive, so it looks like it passes the (read-only)
domain validation but then chokes on the next command.  On 2.6.8, what
do the transport settings report (i.e. the output of
cat /proc/scsi/aic7xxx/0)?

Thanks,

James



