Re: Bizarre issue: USB 3 disconnecting and dying

2015-07-22 Thread David Fuchs
On Wed, Jul 22, 2015 at 3:17 PM, Håkon Alstadheim 
wrote:

> How long are your USB-cables? What kind of power supply do you have, and
> what else is drawing power?
>
The USB cable the drive is currently on is fairly short - about 80cm if I
had to guess. The drive has its own wall-wart specced at 2A (12V). There's
nothing plugged into the USB ports that would draw power (the only other
device plugged in is a UPS).

>
> I just went from a PSU specced at 5x the needed sustained power to 10x.
> Got a slight but definite improvement in USB stability. Yes, my system is
> drawing ~ 150w from a 1200w supply. Externally powered hubs were no help.
> 5v is specced at 30amps max. I still get drop-outs, very rarely requiring
> hard boot now, except if I try the multi-media-keys on my back-lit usb2.0
> keyboard.
>
The host is powered by a 300W Fortron SFX PSU, which should be about 5-6x
the actual power draw. It's plugged into the UPS so power should be fairly
stable... but you do bring up a good point, the PSU is one element I hadn't
considered. Might try to swap that out, too, when I get a chance.


>
> --
> To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org
> with a subject of "unsubscribe". Trouble? Contact
> listmas...@lists.debian.org
> Archive:
> https://lists.debian.org/fa2b0565-1df4-4fbc-b280-8866b2083...@alstadheim.priv.no
>
>


Re: Bizarre issue: USB 3 disconnecting and dying

2015-07-20 Thread David Fuchs
Thanks Joel and Jape for the responses!

I did indeed have entries for this disk in /etc/fstab (and /etc/crypttab,
since it's an encrypted disk). Per your suggestion I've removed those to
make sure. The issue isn't just that the FS isn't mounted anymore, though -
as far as the host is concerned, the underlying device simply ceases to
exist. The disk doesn't just get renamed by udev either - there's no entry
for it in /dev/sd*, no output whatsoever from dmesg when I plug it out and
back in.

On Mon, Jul 20, 2015 at 4:29 PM, Joel Rees  wrote:

> On Tue, Jul 21, 2015 at 3:46 AM, David Fuchs  wrote:
> > Hi all,
> >
> > I have an issue with an external hard drive that I'm at my wit's end
> with.
> > I'll try to keep it short:
> >
> > My system is connected to an external SATA HD via USB 3 (used for
> backups).
> > For 6+ months, this setup has worked flawlessly.
>
> My instincts are to throw a lot of negativity at you about USB. But I
> will ask one question, did you leave it plugged in all the time? Were
> you careful when you plugged it in and unplugged it?
>
Yes, the device is plugged in all the time, with nobody near it most of the
times the disk decided to disappear. And I'm careful when plugging it in to
make sure it's properly connected, I even took compressed air to both ends
of the connection to make sure there was no dust in there.

>
> > About a week ago, I disconnected the external drive (a Seagate GoFlex
> > docking station + disk combo, but I've since switched to another
> enclosure),
> > put another drive in the dock, and reconnected. Ever since, my system is
> > possessed.
>
> Any chance that you have an entry in /etc/fstab for it (as Jape suggests)?
>
> If so, do the entries specify the disk by something other than UUID or
> label? UUID and label are the only options which should be used any
> more, and UUID is somewhat more to be recommended.
>
I did have some entries in /etc/crypttab specifying the disk by UUID. I've
removed them.

>
> (I'm thinking, if I could read your logs and stay awake, I might be
> able to tell. I'm drifting in and out, so I'm not going to try.)
>
> (Skipping negativity about udev.)
>
> > At random times, the external drive will disconnect for no discernible
> > reason. It can happen in the middle of a write or after the disk has been
> > idle or sleeping for hours. It may happen within minutes or days after
> the
> > device was first connected.
> >
> > The only relevant thing I can find in the logs is a laconic "usb 3-1: USB
> > disconnect, device number 3".
> >
> > Once the system is in this state, things are thoroughly messed up. For
> > starters, the disk will not reconnect (no errors or messages in dmesg)
> if I
> > plug it out and back in. Even rebooting the host will not bring it back!
> > Also, anything assuming the existence of certain USB devices is borked.
> > lsusb just hangs, forever. I can't kill -9 it. Heck, sometimes I can't
> even
> > rmmod xhci_hcd (same thing - just hangs, unkillable.)
> >
> > Weirdly enough, USB 2 devices still work in the USB 3 port. In fact,
> this is
> > where this tale becomes entirely bizarre.
>
> Unfortunately, not really all that bizarre. Or, at least, not more
> bizarre than, erm, let's skip that negativity, I guess.
>
> One possibility that occurs to me here: might you have damaged the
> actual physical connector?
>
That was my first hunch, but since I've used several cables, it seems
unlikely.

>
> > The only way so far that I've
> > found to get the USB 3 back to life is this workaround: I plug in a USB 2
> > device in the USB 3 port (I have a Lexar memory card reader I use for
> this
> > purpose, but presumably, any USB 2 device would do), plug it out, plug
> the
> > disk back in, and voila! It connects. Until it disconnects again, and the
> > insane rain dance begins anew.
> >
> > At this point, I have:
> > * tried 3 different HDDs (from 3 different manufacturers) so it's
> probably
> > not related to the disk.
> > * tried 2 different external enclosures/docks, so it's probably not
> related
> > to the usb-sata adapter.
> > * tried 2 different USB cables with those docks, so it's probably not the
> > cable.
> > * swapped the motherboard (a supermicro A1SAi-2750F) with an identical
> new
> > one, so probably not an electrical or mechanical issue with the board.
>
> My goodness.
>
> > * disabled USB autosuspend (options usbcore autosuspend=-1 and
> > autosuspend_delay_ms=-1)
>
> Yeah.
>

Bizarre issue: USB 3 disconnecting and dying

2015-07-20 Thread David Fuchs
Hi all,

I have an issue with an external hard drive that I'm at my wit's end with.
I'll try to keep it short:

My system is connected to an external SATA HD via USB 3 (used for backups).
For 6+ months, this setup has worked flawlessly.

About a week ago, I disconnected the external drive (a Seagate GoFlex
docking station + disk combo, but I've since switched to another
enclosure), put another drive in the dock, and reconnected. Ever since, my
system is possessed.

At random times, the external drive will disconnect for no discernible
reason. It can happen in the middle of a write or after the disk has been
idle or sleeping for hours. It may happen within minutes or days after the
device was first connected.

The only relevant thing I can find in the logs is a laconic "usb 3-1: USB
disconnect, device number 3".

Once the system is in this state, things are thoroughly messed up. For
starters, the disk will not reconnect (no errors or messages in dmesg) if I
plug it out and back in. Even rebooting the host will not bring it back!
Also, anything assuming the existence of certain USB devices is borked.
lsusb just hangs, forever. I can't kill -9 it. Heck, sometimes I can't even
rmmod xhci_hcd (same thing - just hangs, unkillable.)

Weirdly enough, USB 2 devices still work in the USB 3 port. In fact, this
is where this tale becomes entirely bizarre. The only way so far that I've
found to get the USB 3 back to life is this workaround: I plug in a USB 2
device in the USB 3 port (I have a Lexar memory card reader I use for this
purpose, but presumably, any USB 2 device would do), plug it out, plug the
disk back in, and voila! It connects. Until it disconnects again, and the
insane rain dance begins anew.

At this point, I have:
* tried 3 different HDDs (from 3 different manufacturers) so it's probably
not related to the disk.
* tried 2 different external enclosures/docks, so it's probably not related
to the usb-sata adapter.
* tried 2 different USB cables with those docks, so it's probably not the
cable.
* swapped the motherboard (a supermicro A1SAi-2750F) with an identical new
one, so probably not an electrical or mechanical issue with the board.
* disabled USB autosuspend (options usbcore autosuspend=-1 and
autosuspend_delay_ms=-1)
* upgraded from Wheezy (kernel 3.2.0-4-amd64) to Jessie (3.16.0-4-amd64).
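
One way to confirm that the usbcore autosuspend option from the list above actually took effect is to read the value back from sysfs. What follows is a minimal sketch, not a definitive check: the function name `autosuspend_state` is made up for illustration, and the optional path argument exists only so the function can be pointed at a test file instead of the real sysfs location.

```shell
#!/bin/sh
# Read back the usbcore autosuspend module parameter.
# Assumption: the parameter is exposed at the sysfs path used as default below.
autosuspend_state() {
    param_file="${1:-/sys/module/usbcore/parameters/autosuspend}"
    if [ ! -r "$param_file" ]; then
        echo "unknown"
        return 1
    fi
    value=$(cat "$param_file")
    # A negative value means the kernel never autosuspends USB devices.
    case "$value" in
        -*) echo "disabled" ;;
        *)  echo "enabled (${value}s)" ;;
    esac
}
```

Calling `autosuspend_state` with no argument after a reboot should print `disabled` if the `autosuspend=-1` option was picked up. If it prints something else, the modprobe configuration may not have been applied, for example because usbcore is built into the kernel rather than loaded as a module.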

Could this be a kernel bug introduced somewhere around 3.2.0-4, and still
present in 3.16.0-4? Or is the USB 3 controller on my board just buggy (and
if so, any idea why this has not manifested itself until recently?) Any
workarounds I can try (other than using USB 2)?

Thanks in advance!
- Dave.

Relevant system & log info:

*uname -a*
Linux deepthought 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1 (2015-05-24)
x86_64 GNU/Linux

*lsusb*
Bus 001 Device 005: ID 0557:2419 ATEN International Co., Ltd
Bus 001 Device 004: ID 0557:7000 ATEN International Co., Ltd Hub
Bus 001 Device 002: ID 8087:07db Intel Corp.
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 003 Device 002: ID 174c:55aa ASMedia Technology Inc. ASMedia 2105 SATA
bridge
Bus 003 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 002 Device 003: ID 0764:0601 Cyber Power System, Inc.
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

(the only things plugged into USB are the external disk and a UPS. The ATEN
devices, I believe, are a kb and mouse emulated by the IPMI).

*lspci*
00:00.0 Host bridge: Intel Corporation Atom processor C2000 SoC Transaction
Router (rev 02)
00:01.0 PCI bridge: Intel Corporation Atom processor C2000 PCIe Root Port 1
(rev 02)
00:02.0 PCI bridge: Intel Corporation Atom processor C2000 PCIe Root Port 2
(rev 02)
00:03.0 PCI bridge: Intel Corporation Atom processor C2000 PCIe Root Port 3
(rev 02)
00:0e.0 Host bridge: Intel Corporation Atom processor C2000 RAS (rev 02)
00:0f.0 IOMMU: Intel Corporation Atom processor C2000 RCEC (rev 02)
00:13.0 System peripheral: Intel Corporation Atom processor C2000 SMBus 2.0
(rev 02)
00:14.0 Ethernet controller: Intel Corporation Ethernet Connection I354
(rev 03)
00:14.1 Ethernet controller: Intel Corporation Ethernet Connection I354
(rev 03)
00:14.2 Ethernet controller: Intel Corporation Ethernet Connection I354
(rev 03)
00:14.3 Ethernet controller: Intel Corporation Ethernet Connection I354
(rev 03)
00:16.0 USB controller: Intel Corporation Atom processor C2000 USB Enhanced
Host Controller (rev 02)
00:17.0 SATA controller: Intel Corporation Atom processor C2000 AHCI SATA2
Controller (rev 02)
00:18.0 SATA controller: Intel Corporation Atom processor C2000 AHCI SATA3
Controller (rev 02)
00:1f.0 ISA bridge: Intel Corporation Atom processor C2000 PCU (rev 02)
00:1f.3 SMBus: Intel Corporation Atom processor C2000 PCU SMBus (rev 02)
01:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev
03)
02:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics
Family (rev 30)
03:00.0 USB controller: Renesas Technology Corp. uPD720201 USB 3.0 H

Re: Re: dm-crypt/LUKS performance

2014-11-17 Thread David Fuchs
> Which Debian release? Kernel? Motherboard make/ model? CPU model? RAM
> module(s) make/ model? SSD exact model? Defaults? Customizations?
My initial post was indeed a little light on details, so here's more info:
I'm dealing with a pristine installation of Wheezy. It is running on a
Supermicro A1SAi-2750F with 16GB of Kingston SODIMM RAM and an Intel C2750
Avoton octo-core. The drive I was referring to in my tests is a Samsung 840
PRO with 128 GB.
There's no load on the system whatsoever other than the tests I'm running.

I measured read/write speeds by a) writing/reading directly to/from the
partition, b) cryptsetup luksFormat + cryptsetup luksOpen and then writing
to the corresponding /dev/mapper/ device. No LVM or other
indirections involved as someone else suggested might have been the case.

> That SSD appears to have hardware encryption.  So, why dm-crypt?
Basically, what someone else already said - "hardware encryption" on SSDs
is not really useful or trustworthy.

> I assume those are non-default option values. Predicting and measuring
> how each of those are affected by AES-NI would be a non-trivial task. You
> might want to try some benchmarks using the defaults first, and then go
> from there.
I've tried a few options, mostly different key sizes, with similarly bad
performance penalties.

> Did you do a secure erase on the SSD prior to the write tests? SSD's can
> write to fresh blocks faster than they can reclaim old blocks, erase them,
> and then write.
I didn't erase/trim anything from the drive. However, it is a brand new
drive that hasn't had more than a few GB written to it in its lifetime, so
I wouldn't think this is an issue. Also, I'd expect degrading SSD
performance to affect both encrypted and non-encrypted writes (and it
really shouldn't affect any of the reads).

> Your first write test is 256 MB, but all the other tests are 512 MB. Why
> the unequal size?
As you point out, my tests were certainly not very systematic or scientific
and might even be completely non-representative of any real-world workload.
If I have the time, I will grab another spare drive, and run some more
thorough tests. As crude as my tests were, though, I'm still baffled by the
results.

Thanks,
- Dave.


dm-crypt/LUKS performance

2014-11-16 Thread David Fuchs
Hi all,

First off, I realize this question has been asked here and elsewhere
before, but I can't seem to find any recent relevant numbers on this.

I am setting up a system with an Intel octo-core Avoton, which has AES-NI
support. After doing some crude benchmarking tests with dd, I am surprised
about the huge performance penalty that full-disk encryption apparently has
on read/write throughput.

In short, the write speed plummets to around 160 MB/s, as opposed to 270
MB/s on the naked partition; read speed is at 115 MB/s (slower than writing
- no idea why), as opposed to 465 MB/s on the bare partition. (I've pasted
the results below.)

I encrypted the partition with aes-xts-plain64, sha-512 and a 512 bit key,
but also tried 256 bit key with similar results. The drive in question is a
Samsung 840 pro SSD, but I've fiddled with a couple of spinning drives
before, and the performance penalty was similarly bad.

The system will be used as a home file server, and the results with drive
encryption are still acceptable - but I'm still curious if they are to be
expected, or if there is an obvious culprit for the performance hit. Is it
possible that I'm not using the hardware AES?
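
A quick way to narrow down the hardware-AES question is to look at what the CPU and the kernel report. The sketch below is only a first check under stated assumptions: the function names are made up for illustration, the file arguments default to the usual /proc locations, and it assumes the AES-NI driver is built as the `aesni_intel` module (if it is compiled into the kernel it will not appear in /proc/modules).

```shell
#!/bin/sh
# Succeeds if the CPU advertises the "aes" flag (AES-NI on x86).
cpu_has_aes() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # -w matches "aes" as a whole word, so flags like "vaes" do not count.
    grep -m1 '^flags' "$cpuinfo" | grep -qw aes
}

# Succeeds if aesni_intel appears in a /proc/modules-style listing.
aesni_module_loaded() {
    modules="${1:-/proc/modules}"
    grep -q '^aesni_intel ' "$modules"
}
```

Running `cpu_has_aes && aesni_module_loaded && echo "AES-NI looks active"` gives a rough answer. If the flag is present but the module is not loaded, `modprobe aesni_intel` before opening the LUKS device would be the obvious thing to try. Newer cryptsetup releases also ship a `cryptsetup benchmark` subcommand; whether the Wheezy package includes it is not something this thread establishes.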

Thanks,
- Dave.

Encrypted drive setup:

cryptsetup luksFormat -c aes-xts-plain64 --hash sha512 --iter-time 2000 \
    --use-random -s 512 /dev/sdd6


Results w/o encryption:

# dd bs=1M count=256 if=/dev/zero of=/dev/sdd6 conv=fdatasync
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 0.990527 s, 271 MB/s

# dd bs=1M count=512 if=/dev/sdd6  of=/dev/null
512+0 records in
512+0 records out
536870912 bytes (537 MB) copied, 1.15489 s, 465 MB/s


Results with encryption:

# dd bs=1M count=512 if=/dev/zero  of=/dev/mapper/test conv=fdatasync
512+0 records in
512+0 records out
536870912 bytes (537 MB) copied, 3.26955 s, 164 MB/s

# dd bs=1M count=512 if=/dev/mapper/test  of=/dev/null
512+0 records in
512+0 records out
536870912 bytes (537 MB) copied, 4.66179 s, 115 MB/s
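
The throughput figures above follow directly from the byte counts and elapsed times that dd prints, and the relative penalty can be derived the same way. A small sketch of that arithmetic, with hypothetical helper names `mbps` and `penalty_pct`:

```shell
#!/bin/sh
# Convert a dd result (bytes copied, elapsed seconds) to MB/s, using
# the same convention as dd: 1 MB = 1,000,000 bytes.
mbps() {
    awk -v bytes="$1" -v secs="$2" 'BEGIN { printf "%.0f\n", bytes / secs / 1000000 }'
}

# Percentage of throughput lost going from a baseline rate to an encrypted rate.
penalty_pct() {
    awk -v base="$1" -v enc="$2" 'BEGIN { printf "%.0f\n", (base - enc) / base * 100 }'
}
```

With the numbers above, `mbps 536870912 4.66179` prints 115 and `mbps 268435456 0.990527` prints 271, matching dd. `penalty_pct 271 164` prints 39 for writes and `penalty_pct 465 115` prints 75 for reads. Note that the unencrypted write test used 256 MB while the others used 512 MB, so the write comparison is only approximate.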


Re: boot error messages with custom kernel

2007-05-21 Thread David Fuchs

thanks for the help.

as I mentioned, the modules.dep file is there - but not in the initrd
image that's created.

however, I've solved it by recompiling the kernel with the parallel
port driver as a module rather than into the kernel. it seems it was
this driver that tried to load some additional modules too early.

cheers,
- Dave.

On 5/22/07, Greg Folkert <[EMAIL PROTECTED]> wrote:

On Mon, 2007-05-21 at 22:33 +0200, David Fuchs wrote:
> hi all,
>
> I want to have a grsecurity enabled kernel and thus compiled my own.
> while doing so, I also removed tons of modules from the kernel config
> (drivers I know I'll never need), and chose to compile some into the
> kernel instead of modules (e.g., drivers for my sata disks).
>
> I followed the directions found at
> http://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-kernel-org-package.
>
> now, during the boot process, just after the kernel boots, I get some
> error messages:
>
> FATAL: Could not load /lib/modules/2.6.19.2-grsec.1/modules.dep: No
> such file or directory
>
> apart from this, the system boots perfectly fine and runs normal. the
> file /lib/modules/2.6.19.2-grsec.1/modules.dep does exist, but there
> is no such file in the generated initrd image (neither is there in the
> default kernel's).
>
> so, why exactly is it looking for this file, and how do I get rid of the
> error?

It is looking for the file created and/or updated by "depmod" while
running the following command:

depmod -e -F /boot/System.map-`uname -r` -v `uname -r`

Assuming everything is in place, yours should look like this:

depmod -e -F /boot/System.map-2.6.19.2-grsec.1 -v 2.6.19.2-grsec.1

That should update and give you the following files
in /lib/modules/2.6.19.2-grsec.1/:

modules.alias
modules.ccwmap
modules.dep
modules.ieee1394map
modules.inputmap
modules.isapnpmap
modules.ofmap
modules.pcimap
modules.seriomap
modules.symbols
modules.usbmap

Here is hoping.
--
greg, [EMAIL PROTECTED]
PGP key: 1024D/B524687C  2003-08-05
Fingerprint: E1D3 E3D7 5850 957E FED0  2B3A ED66 6971 B524 687C
Alternate Fingerprint: 09F9 1102 9D74  E35B D841 56C5 6356 88C0










boot error messages with custom kernel

2007-05-21 Thread David Fuchs

hi all,

I want to have a grsecurity enabled kernel and thus compiled my own.
while doing so, I also removed tons of modules from the kernel config
(drivers I know I'll never need), and chose to compile some into the
kernel instead of modules (e.g., drivers for my sata disks).

I followed the directions found at
http://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-kernel-org-package.

now, during the boot process, just after the kernel boots, I get some
error messages:

FATAL: Could not load /lib/modules/2.6.19.2-grsec.1/modules.dep: No
such file or directory

apart from this, the system boots perfectly fine and runs normal. the
file /lib/modules/2.6.19.2-grsec.1/modules.dep does exist, but there
is no such file in the generated initrd image (neither is there in the
default kernel's).

so, why exactly is it looking for this file, and how do I get rid of the error?

thanks,
- Dave.






Re: network interfaces fail to start on boot

2007-05-21 Thread David Fuchs

haha... I found out why I need to manually start my interfaces :)

the init script /etc/init.d/networking is missing!

still, udev tries bringing up the two physical interfaces if they're
set as allow=hotplug - but there's a line

wait_for_lo

in /lib/udev/net.agent which means that no physical interface can be
enabled as long as the loopback device is down.

aptitude reinstall netbase, problem solved :)
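
For anyone hitting the same symptom, a check along these lines would have pointed at the missing script right away. This is only a sketch; the function name `check_init_script` is made up for illustration, and the default path is the one named above.

```shell
#!/bin/sh
# Check that an init script exists and is executable; print a short verdict.
check_init_script() {
    script="${1:-/etc/init.d/networking}"
    if [ -x "$script" ]; then
        echo "ok: $script"
    else
        echo "missing or not executable: $script"
        return 1
    fi
}
```

If the check fails, `dpkg -S /etc/init.d/networking` should name the package that is supposed to ship the file (netbase in this case), which is then the one to reinstall.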

thanks for the help,
- Dave.

On 5/21/07, Andrei Popescu <[EMAIL PROTECTED]> wrote:

On Mon, May 21, 2007 at 12:37:34AM +0200, David Fuchs wrote:

Putting this back on list.

> On 5/21/07, Andrei Popescu <[EMAIL PROTECTED]> wrote:
> >On Sun, May 20, 2007 at 11:58:27PM +0200, David Fuchs wrote:
> >> hi all,
> >>
> >> I just installed Etch on a system with 3 network interfaces: eth0,
> >> eth1, lo. all network interfaces are configured as auto in
> >> /etc/network/interfaces.
> >>
> >> for some reason, the interfaces all fail to start on boot. I have to
> >> manually run ifup -a for them to work. how can I fix this?
> >
> >Can you please post your /etc/network/interfaces and the output of
> >'dmesg | grep eth0'
>
> # dmesg | grep eth
> eth0: RealTek RTL8139 at 0x400, xx:xx:xx:xx:xx:xx, IRQ 18
> eth0:  Identified 8139 chip type `RTL 8139C`
> eth1: VIA Rhine II at 0x18400, xx:xx:xx:xx:xx:xx, IRQ 20
> eth1 MII PHY found at address 1, status 0x786d advertising 01e1 Link 45e1

This looks good, but you could try to have a look through the entire
dmesg, maybe you can spot something.

> 
> auto lo
> iface lo inet loopback
>
> iface eth0 inet dhcp
>
> iface eth1 inet static
>     address 10.0.0.11
>     netmask 255.255.255.0
>     network 10.0.0.0
>     broadcast 10.0.0.255
>     gateway 10.0.0.111
>     dns-nameservers 10.0.0.111
>     dns-search domain.com
>
> auto eth0
> auto eth1
>
> 

Looks ok to me. You could also enable bootlog in /etc/default/bootlogd.
This will keep the boot messages (which are different from dmesg/syslog)
and maybe you can spot something which you are missing at boot time.

> >> another nuisance: every 20 minutes, some process prints "-- MARK --"
> >> to the console. I've never seen this before on any system - what is
> >> this about?
> >
> >That's strange. I know syslog prints this in /var/log/syslog if there's
> >nothing (notable) happening on your computer, but it shouldn't go to the
> >console. Did you change anything from the default setup?
>
> nope. it's a completely pristine installation.

Sorry, I'm out of ideas.

Regards,
Andrei
--
If you can't explain it simply, you don't understand it well enough.
(Albert Einstein)










Re: mounting LVM partitions fails after etch upgrade

2007-05-21 Thread David Fuchs

dear all,

a while back I posted to this list because my file systems on LVM over
RAID1 would not mount cleanly anymore after upgrade from sarge to
etch. this weekend I had time to poke around in the data on both the
disks, and found out what was wrong.

as it turns out, for almost a year, *no* data at all was written to
one of the disks!! that didn't stop mdadm from happily reporting that
everything with the array was in perfect order, though. I rebooted the
system a few times during this period, and not even when assembling
the array did it complain about anything.

due to the upgrade of mdadm, it seems that the s/w raid started using
both disks again and, by writing data to the 'old' disk, corrupted
some of the out-of-date data there. I'm glad I didn't try to fix this
with fsck, it probably would have completely toasted the data on both
disks.

how can such a catastrophic failure of a raid array happen, and worse,
go completely unnoticed? I don't think it's a config issue, it
perfectly mirrored all data before that point. both disks are
physically perfect, not a single bad block.
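
A periodic check of /proc/mdstat is one cheap way to notice an array running on a single member. The sketch below prints the name of every md array whose status string contains an underscore (for example `[U_]` instead of `[UU]`). The function name `degraded_arrays` is made up for illustration. It only catches arrays the kernel itself considers degraded, so in a case like the one described here, where mdadm reported everything as healthy, it is no guarantee.

```shell
#!/bin/sh
# Print the name of every md array with at least one missing or failed member.
degraded_arrays() {
    mdstat="${1:-/proc/mdstat}"
    awk '
        /^md[0-9]+ :/ { name = $1 }      # remember the current array
        /\[[U_]+\]/ && name != "" {
            # the status string is the last field on the blocks line
            if ($NF ~ /_/) print name
            name = ""
        }
    ' "$mdstat"
}
```

Run from cron, something like `[ -z "$(degraded_arrays)" ] || echo "degraded md array" | mail root` would raise a flag. To actually compare the two halves of a mirror, the md driver's check action (writing `check` to /sys/block/md1/md/sync_action and then reading mismatch_cnt in the same directory) is probably closer to what was needed here.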

cheers,
- Dave.

On 5/6/07, Douglas Allan Tutty <[EMAIL PROTECTED]> wrote:

On Sun, May 06, 2007 at 03:25:02PM +0200, David Fuchs wrote:
> I have just upgraded my sarge system to etch, following exactly the upgrade
> instructions at http://www.us.debian.org/releases/etch/i386/release-notes/.
>
> now my system does not boot correctly anymore... I'm using RAID1 with two
> disks, / is on md0 and all other mounts (/home/, /var, /usr etc) are on md1
> using LVM.
>
> the first problem is that during boot, only md0 gets started. I can get
> around this by specifying break=mount on the kernel boot line and manually
> starting md1, but where need I change what so that md1 gets started at this
> point as well?
>
> after manually starting md1 and continuing to boot, I get errors like
>
> Inode 184326 has illegal block(s)
> /var: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY (i.e. without the -a or -o
> options)
>
> ... same for all other partitions on that volume group
>
> fsck died with exit status 4
> A log is being saved in /var/log/fsck/checkfs if that location is
> writable.(it is not)
>
> at this point I get dropped to a maintenance shell. when I select to
> continue the boot process:

What happens if instead of forcing a boot you do what it says: run fsck
without the -a or -o options?

>
> EXT3-fs warning: mounting fs with errors. running e2fsck is recommended
> EXT3 FS on dm-4, internal journal
> EXT3-FS: mounted filesystem with ordered data mode.
> ... same for all mounts (same for dm-3, dm-2, dm-1, dm-0)
>
> EXT3-fs error (device dm-1) in ext3_reserve_inode_write: Journal has aborted
> EXT3-fs error (device dm-1) in ext3_orphan)write: Journal has aborted
> EXT3-fs error (device dm-1) in ext3_orphan_del: Journal has aborted
> EXT3-fs error (device dm-1) in ext3_truncate_write: Journal has aborted
> ext3_abort called.
> EXT3-fs error (device dm-1): ext3_journal_start_sb: Detected aborted
> journal
> Remounting filesystem read-only
>
> and finally I get tons of these:
>
> dm-0: rw-9, want=6447188432, limit=10485760
> attempt to access beyond end of device
>
> the system then stops for a long time (~5 minutes) at "starting systlog
> service" but eventually the login prompt comes up, and I can log in, see all
> my data, and even (to my surprise) write to the partitions on md1...
>
...which probably corrupts the fs even more.

> what the hell is going on here? thanks a lot in advance for any help!
>
What is going on is that you started with a simple booting error that
has propagated into filesystem errors.  Those errors are compounded by
forcing a mount of a filesystem with errors.  Remember that the system
that starts LVM and raid itself exists on the disks.

What you need is a shell with the root fs either totally unmounted or
mounted ro.  Does booting single-user work?  What about telling the
kernel init=/bin/sh? From there, you can check the status of the mds
with:

#/sbin/mdadm -D /dev/md0
#/sbin/mdadm -D /dev/md1
...

check the status of the logical volumes:
#/sbin/lvdisplay [lvname]

and then check the filesystems with:

#/sbin/e2fsck -f -c -c  /dev/...


Only once you get the filesystems fully functional should you attempt to
boot further.

Doug.











network interfaces fail to start on boot

2007-05-20 Thread David Fuchs

hi all,

I just installed Etch on a system with 3 network interfaces: eth0,
eth1, lo. all network interfaces are configured as auto in
/etc/network/interfaces.

for some reason, the interfaces all fail to start on boot. I have to
manually run ifup -a for them to work. how can I fix this?

another nuisance: every 20 minutes, some process prints "-- MARK --"
to the console. I've never seen this before on any system - what is
this about?

thanks,
- Dave.






Re: mounting LVM partitions fails after etch upgrade

2007-05-07 Thread David Fuchs

> yes, / on md0 does get fsck'd cleanly, whether in single boot or
> 'normal' boot. I can get into a root shell w/o any filesystem related
> errors.

Good.  Now if only debian's single-user mode didn't start all kinds of
extras that need /usr and /var

> the problem are all other mounts, which reside on LVM on md1. fsck
> tells me that there are hundreds of inodes with thousands of illegal
> blocks.
>
> I never had any problems related to fs corruption, and I don't see how
> a simple system upgrade could cause this. so, I'm still thinking that
> something with the raid or lvm setup is screwed, but I don't know what
> or why.
>
> as you can probably tell I have never dealt with fixing a broken fs,
> but I'm afraid that running e2fsck would completely screw my data.
> what I primarily want is not a fs w/o errors but rescue as much data
> as possible...

You mean you don't have backups?  On which fs is the non-backed-up data?


I do have backups of the most important files on DVDs. everything else
was (partially, as space allowed...) backed up to other logical
volumes, but everything except / was in volumes on the same volume
group on the same disk array, making this pretty useless :(
I never even remotely imagined the possibility of all file systems
becoming corrupt at once. Mainly I'd like back my /home and
/home/vpopmail partitions.


> >Note that e2fsck can take several passes.  Also, you don't want the -a
> >option (which is the backward-compatible version of -p) which exits
> >with error code 4 if a problem would require human intervention, since
> >you are there to intervene and don't want it to exit.

If I remember previous posts in this thread, this all started after an
upgrade and you got an fsck warning that said to run fsck manually and
instead of following fsck's advice, you forced a normal mount of unclean
filesystems.

that was probably not the smartest of possible actions...


 Any data that's been corrupted has probably already been
corrupted.

Boot into single-user mode and ensure that non-root fs are totally
unmounted.  Then run e2fsck -f as many times as it takes to fix.  This
gets the fs into a consistent state but you may have already lost data.

If you had full backups of your data, at this point it's probably easier
to reinstall.  Remember that some of the data lost will be debian's,
e.g. corrupted files in /usr/bin.  If it were me, I'd get the partition
that had my data fixed, back up the data, then do a clean install.


what I'll do is the following: I'll rip out one of the mirrors so that
I always have all data in its current state no matter what happens.
then I install a fresh disk, boot the system with some live-cd, copy
an image of the old disk to the new one and see how much I can get out
of it with a tool like e2salvage. from there I'll install a clean
system.

and I'll have a look at alternative file systems.


For your next install, you may want to review the list archives for
threads on the choice of filesystems.  Each (except perhaps reiser now)
have people who swear by them.  Personally, I swear by JFS after bad
experience with reiserfs and an experience similar to yours with ext3
where I _did_ follow the instruction to do a manual fsck; it still hosed
my data.  I had backups.

FS corruption is nasty to put it politely.  You have my sympathies.
Good luck.


thanks for all your help!

cheers,
- Dave.






Re: mounting LVM partitions fails after etch upgrade

2007-05-07 Thread David Fuchs

On 5/6/07, Douglas Allan Tutty <[EMAIL PROTECTED]> wrote:

On Sun, May 06, 2007 at 07:27:26PM +0200, David Fuchs wrote:
> I mounted (read-only) some of the virtual volumes, to see if the data is
> still there... it seems as if there is some 'offset' on the file system,
> i.e.
> when looking at some file it contains stuff that should be in a completely
> different file... or it tells me attempt to access beyond end of device.

I guess this confirms that these filesystems are corrupted.
>
> during the 'normal' boot process (i.e. init=/bin/sh not set) this is the
> exact error I get:
>
> [/sbin/fsck.ext3 (1) -- /var ] fsck.ext3 -a -C0 /dev/mapper/volg1-b
> fsck.ext3: no such file or directory while trying to open
> /dev/mapper/volg1-b
> /dev/mapper/volg1-b:
> The superblock could not be read or does not describe a correct ext2
> filesystem. if the device is valid and it really contains an ext2 filesystem
> (and not swap or ufs or something else), then the superblock is corrupt, and
> you might try running e2fsck with an alternate superblock:
>e2fsck -b 8192 

What happens if you now boot single?  Does your root filesystem get
fsck'd cleanly?  This is the first step.  Once you can cleanly get into
a single-user root shell life is much better.  From here you can again
verify that the md arrays and lvs are in good shape automatically and
from there you can fsck the filesystems without having them mounted at
all.


yes, / on md0 does get fsck'd cleanly, whether in single boot or
'normal' boot. I can get into a root shell w/o any filesystem related
errors.

the problem is all the other mounts, which reside on LVM on md1. fsck
tells me that there are hundreds of inodes with thousands of illegal
blocks.

I never had any problems related to fs corruption, and I don't see how
a simple system upgrade could cause this. so, I'm still thinking that
something with the raid or lvm setup is screwed, but I don't know what
or why.

as you can probably tell I have never dealt with fixing a broken fs,
but I'm afraid that running e2fsck would completely screw my data.
what I primarily want is not a fs w/o errors but rescue as much data
as possible...

thanks,
- Dave.



Note that e2fsck can take several passes.  Also, you don't want the -a
option (the backward-compatible spelling of -p), which exits with error
code 4 if a problem would require human intervention.  You are there to
intervene, so you don't want it to exit.
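
For reference, the exit status that fsck reports is a bit mask, so the
"exit status 4" from the boot log can be decoded mechanically.  A minimal
sketch in Python, assuming the bit values listed in the fsck(8) and
e2fsck(8) manual pages; the function name is only for illustration:

```python
# Bit values as listed in the fsck(8) / e2fsck(8) manual pages.
FSCK_EXIT_BITS = {
    1: "file system errors corrected",
    2: "system should be rebooted",
    4: "file system errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking cancelled by user request",
    128: "shared-library error",
}


def decode_fsck_status(status):
    """Return the meanings of all bits set in an fsck exit status."""
    if status == 0:
        return ["no errors"]
    # The status is a sum of bits, so test each documented bit in turn.
    return [text for bit, text in sorted(FSCK_EXIT_BITS.items()) if status & bit]


if __name__ == "__main__":
    print(decode_fsck_status(4))
```

A status of 4 on its own means errors were found and left alone, which
is what -a / -p does when it decides a human has to answer.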

Let us know how you progress.

Doug.






Re: mounting LVM partitions fails after etch upgrade

2007-05-06 Thread David Fuchs

I mounted (read-only) some of the virtual volumes, to see if the data is
still there... it seems as if there is some 'offset' on the file system, i.e.
when looking at some file it contains stuff that should be in a completely
different file... or it tells me attempt to access beyond end of device.

during the 'normal' boot process (i.e. init=/bin/sh not set) this is the
exact error I get:

[/sbin/fsck.ext3 (1) -- /var ] fsck.ext3 -a -C0 /dev/mapper/volg1-b
fsck.ext3: no such file or directory while trying to open
/dev/mapper/volg1-b
/dev/mapper/volg1-b:
The superblock could not be read or does not describe a correct ext2
filesystem. if the device is valid and it really contains an ext2 filesystem
(and not swap or ufs or something else), then the superblock is corrupt, and
you might try running e2fsck with an alternate superblock:
   e2fsck -b 8192 

I really don't know how I'm supposed to fix this, any help greatly
appreciated.

thanks,
- Dave.

On 5/6/07, David Fuchs <[EMAIL PROTECTED]> wrote:


hi all,

I have just upgraded my sarge system to etch, following exactly the upgrade
instructions at http://www.us.debian.org/releases/etch/i386/release-notes/.

now my system does not boot correctly anymore... I'm using RAID1 with two
disks, / is on md0 and all other mounts (/home/, /var, /usr etc) are on md1
using LVM.

the first problem is that during boot, only md0 gets started. I can get
around this by specifying break=mount on the kernel boot line and manually
starting md1, but where need I change what so that md1 gets started at this
point as well?

after manually starting md1 and continuing to boot, I get errors like

[/sbin/fsck.ext3 (1) -- /var] fsck.ext3 -a -C0 /dev/mapper/volg1-b
/var: recovering journal
/var contains a file system with errors, check forced.
/var:
Inode 184326 has illegal block(s)
/var: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY (i.e. without the -a or
-o options)

... same for all other partitions on that volume group

fsck died with exit status 4
A log is being saved in /var/log/fsck/checkfs if that location is
writable. (it is not)

at this point I get dropped to a maintenance shell. when I select to
continue the boot process:

EXT3-fs warning: mounting fs with errors. running e2fsck is recommended
EXT3 FS on dm-4, internal journal
EXT3-FS: mounted filesystem with ordered data mode.

... same for all mounts (same for dm-3, dm-2, dm-1, dm-0)

EXT3-fs error (device dm-1) in ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device dm-1) in ext3_orphan_write: Journal has aborted
EXT3-fs error (device dm-1) in ext3_orphan_del: Journal has aborted
EXT3-fs error (device dm-1) in ext3_truncate_write: Journal has aborted
ext3_abort called.
EXT3-fs error (device dm-1): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only

and finally I get tons of these:

dm-0: rw-9, want=6447188432, limit=10485760
attempt to access beyond end of device

the system then stops for a long time (~5 minutes) at "starting systlog
service" but eventually the login prompt comes up, and I can log in, see all
my data, and even (to my surprise) write to the partitions on md1...

what the hell is going on here? thanks a lot in advance for any help!

best,
- Dave




Re: mounting LVM partitions fails after etch upgrade

2007-05-06 Thread David Fuchs

thanks for the help.

here's what I did: I booted single-user with init=/bin/sh, and md0 mounted
read-only. everything works so far, I get to the shell w/o any errors.

at this point, md1 is not started but I can start it with mdadm -A
--auto=yes /dev/md1. mdadm -D /dev/md{0,1} shows state: clean for both
arrays, and state: active sync for the disks, so I assume the raid arrays
are doing well.

vgdisplay --ignorelockingfailure volg1 and
lvdisplay --ignorelockingfailure volg1

display the correct information about the volume group and all volumes
(although this takes pretty long...?). I can make the volumes available using
vgchange and lvchange.

however, fsck then shows tons of 'illegal block #... in inode ... '
messages.

I am simply at a complete loss as to why my file system should suddenly be
so corrupt???! I've never had any probs like this before.

and, more importantly, is there a safe (i.e. no data loss) way of fixing it?

thanks,
- Dave.




On 5/6/07, Douglas Allan Tutty <[EMAIL PROTECTED]> wrote:


On Sun, May 06, 2007 at 03:25:02PM +0200, David Fuchs wrote:
> I have just upgraded my sarge system to etch, following exactly the upgrade
> instructions at http://www.us.debian.org/releases/etch/i386/release-notes/.
>
> now my system does not boot correctly anymore... I'm using RAID1 with two
> disks, / is on md0 and all other mounts (/home/, /var, /usr etc) are on md1
> using LVM.
>
> the first problem is that during boot, only md0 gets started. I can get
> around this by specifying break=mount on the kernel boot line and manually
> starting md1, but where need I change what so that md1 gets started at this
> point as well?
>
> after manually starting md1 and continuing to boot, I get errors like
>
> Inode 184326 has illegal block(s)
> /var: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY (i.e. without the -a or -o
> options)
>
> ... same for all other partitions on that volume group
>
> fsck died with exit status 4
> A log is being saved in /var/log/fsck/checkfs if that location is
> writable.(it is not)
>
> at this point I get dropped to a maintenance shell. when I select to
> continue the boot process:

What happens if instead of forcing a boot you do what it says: run fsck
without the -a or -o options?

>
> EXT3-fs warning: mounting fs with errors. running e2fsck is recommended
> EXT3 FS on dm-4, internal journal
> EXT3-FS: mounted filesystem with ordered data mode.
> ... same for all mounts (same for dm-3, dm-2, dm-1, dm-0)
>
> EXT3-fs error (device dm-1) in ext3_reserve_inode_write: Journal has aborted
> EXT3-fs error (device dm-1) in ext3_orphan_write: Journal has aborted
> EXT3-fs error (device dm-1) in ext3_orphan_del: Journal has aborted
> EXT3-fs error (device dm-1) in ext3_truncate_write: Journal has aborted
> ext3_abort called.
> EXT3-fs error (device dm-1): ext3_journal_start_sb: Detected aborted journal
> Remounting filesystem read-only
>
> and finally I get tons of these:
>
> dm-0: rw-9, want=6447188432, limit=10485760
> attempt to access beyond end of device
>
> the system then stops for a long time (~5 minutes) at "starting systlog
> service" but eventually the login prompt comes up, and I can log in, see all
> my data, and even (to my surprise) write to the partitions on md1...
>
...which probably corrupts the fs even more.
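
For scale, the want= and limit= figures in that "beyond end of device"
message can be turned into sizes.  A minimal sketch in Python, assuming
both values are counted in 512-byte sectors (the usual unit for this
kernel message); the function names are made up for illustration:

```python
import re

SECTOR_BYTES = 512  # assumption: want/limit are 512-byte sector counts
GIB = 2 ** 30


def parse_beyond_end(line):
    """Pull the want= and limit= numbers out of the kernel message line."""
    match = re.search(r"want=(\d+),\s*limit=(\d+)", line)
    if match is None:
        raise ValueError("no want=/limit= pair found")
    return int(match.group(1)), int(match.group(2))


def describe(line):
    """Convert the sector counts to GiB and show how far off the request is."""
    want, limit = parse_beyond_end(line)
    return {
        "device_gib": limit * SECTOR_BYTES / GIB,
        "requested_gib": want * SECTOR_BYTES / GIB,
        "ratio": want / limit,
    }


if __name__ == "__main__":
    print(describe("dm-0: rw-9, want=6447188432, limit=10485760"))
```

Under that assumption the volume is 5 GiB and the request points roughly
3 TiB into it, which looks more like garbage block numbers than a
slightly wrong volume size.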

> what the hell is going on here? thanks a lot in advance for any help!
>
What is going on is that you started with a simple booting error that
has propagated into filesystem errors.  Those errors are compounded by
forcing a mount of a filesystem with errors.  Remember that the system
that starts LVM and raid itself exists on the disks.

What you need is a shell with the root fs either totally unmounted or
mounted ro.  Does booting single-user work?  What about telling the
kernel init=/bin/sh? From there, you can check the status of the mds
with:

#/sbin/mdadm -D /dev/md0
#/sbin/mdadm -D /dev/md1
...

check the status of the logical volumes:
#/sbin/lvdisplay [lvname]

and then check the filesystems with:

#/sbin/e2fsck -f -c -c  /dev/...


Only once you get the filesystems fully functional should you attempt to
boot further.
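
The order of checks above can be written down as a small plan before
anything is run.  A minimal sketch in Python, assuming a read-only
`e2fsck -n -f` pass is wanted first (that flag combination is a swapped-in
precaution, not one of the commands listed above) and using the device
names from this thread; the helper names are hypothetical:

```python
import subprocess


def build_check_plan(md_devices, volume_group, lv_devices):
    """Return the commands to run, in order, as argv lists."""
    # 1. RAID arrays first: everything else sits on top of them.
    plan = [["/sbin/mdadm", "-D", md] for md in md_devices]
    # 2. Then the LVM layer.
    plan.append(["/sbin/lvdisplay", volume_group])
    # 3. Finally each filesystem, unmounted; -n opens it read-only and
    #    answers "no" to every question, -f forces a full check.
    plan += [["/sbin/e2fsck", "-n", "-f", lv] for lv in lv_devices]
    return plan


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in plan:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=False)


if __name__ == "__main__":
    run_plan(build_check_plan(["/dev/md0", "/dev/md1"], "volg1",
                              ["/dev/mapper/volg1-b"]))
```

With dry_run left at True this only prints the commands, so the list can
be reviewed from the rescue shell before any of it touches the disks.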

Doug.






mounting LVM partitions fails after etch upgrade

2007-05-06 Thread David Fuchs

hi all,

I have just upgraded my sarge system to etch, following exactly the upgrade
instructions at http://www.us.debian.org/releases/etch/i386/release-notes/.

now my system does not boot correctly anymore... I'm using RAID1 with two
disks, / is on md0 and all other mounts (/home/, /var, /usr etc) are on md1
using LVM.

the first problem is that during boot, only md0 gets started. I can get
around this by specifying break=mount on the kernel boot line and manually
starting md1, but where need I change what so that md1 gets started at this
point as well?

after manually starting md1 and continuing to boot, I get errors like

[/sbin/fsck.ext3 (1) -- /var] fsck.ext3 -a -C0 /dev/mapper/volg1-b
/var: recovering journal
/var contains a file system with errors, check forced.
/var:
Inode 184326 has illegal block(s)
/var: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY (i.e. without the -a or -o
options)

... same for all other partitions on that volume group

fsck died with exit status 4
A log is being saved in /var/log/fsck/checkfs if that location is
writable.(it is not)

at this point I get dropped to a maintenance shell. when I select to
continue the boot process:

EXT3-fs warning: mounting fs with errors. running e2fsck is recommended
EXT3 FS on dm-4, internal journal
EXT3-FS: mounted filesystem with ordered data mode.

... same for all mounts (same for dm-3, dm-2, dm-1, dm-0)

EXT3-fs error (device dm-1) in ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device dm-1) in ext3_orphan_write: Journal has aborted
EXT3-fs error (device dm-1) in ext3_orphan_del: Journal has aborted
EXT3-fs error (device dm-1) in ext3_truncate_write: Journal has aborted
ext3_abort called.
EXT3-fs error (device dm-1): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only

and finally I get tons of these:

dm-0: rw-9, want=6447188432, limit=10485760
attempt to access beyond end of device

the system then stops for a long time (~5 minutes) at "starting systlog
service" but eventually the login prompt comes up, and I can log in, see all
my data, and even (to my surprise) write to the partitions on md1...

what the hell is going on here? thanks a lot in advance for any help!

best,
- Dave


Re: harsher kill than kill -9 ?

2001-07-12 Thread David Fuchs
The `D' tells you that the process is in an uninterruptible disk wait 
state.  This can only be caused by a kernel problem with the i/o 
routines, a filesystem problem, or a device driver problem.  There 
really isn't anything you can do but restart the machine to get rid of 
that process unless you're lucky enough to have it return from its wait
(not likely).


   For some explanation, a problem like this is caused when an
application tries to read from or write to disk, but the i/o call never
returns, and it doesn't result in a timeout either.  `kill -9', as most
of you know, is used to send SIGKILL to a process.  The SIGKILL signal
can not be blocked, ignored, or handled by an application in any way,
which means that once your application receives a SIGKILL, it's done
for.  The reason it doesn't work in this situation, however, is that the
signal stays pending until the process can act on it, and the process
can't do that while it's waiting to return from its disk activity.
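
   The same state letter that top shows can also be read straight from
/proc.  A minimal sketch in Python, assuming a Linux /proc filesystem
with the usual "pid (comm) state ..." layout in /proc/<pid>/stat; the
function names are illustrative only:

```python
def parse_proc_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' instead of on whitespace.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def read_state(pid):
    """Read the state of a live process, e.g. 'D' for uninterruptible wait."""
    with open("/proc/%d/stat" % pid) as handle:
        return parse_proc_state(handle.read())


def is_uninterruptible(pid):
    """True while the process is in the 'D' state discussed above."""
    return read_state(pid) == "D"
```

If is_uninterruptible() keeps returning True for the stuck pid, no
signal, SIGKILL included, will be acted on until the i/o call returns.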


   If you'd like something to read, check out "Advanced Programming in 
the UNIX Environment" (Stevens 1992).  It's got some very useful 
information regarding signals and how they're handled.


-David Fuchs

Daniel Patrick Berdine wrote:


On Thu, 12 Jul 2001, Andrei Ivanov wrote:


Well, if its a zombie process, it'll go away by itself after a while.
Can you send an output of ps aux for that process (or top), please?
Andrei



not zombie...

19355 rothaar9   0 90588  88M   476 D 0.0 17.6   0:01 plot.out

from top.

I actually had dpkg do the same thing the other day when I tried to
install a third-party .deb. Eventually got it to work fine, but it got hung
up at dpkg --config.  I thought it was just a fluke of what I was doing at
the time or something, but maybe not...

I run dozens of apps very similar to this every day for my work, the only
thing special about this one was that I was trying to run it with a
certain array as large as possible.  It kept segfaulting because I didn't
have enough memory I guess, until this one, which just sort of sits there,
munching up memory but no CPU (luckily! :) )

The array was 1,000,000 x 2 x 2 x 1 and it was trying to make 100 copies,
so I don't blame the machine for complaining a bit, but I'd really like to
kill the stupid thing...

Thanks,

-Dan







Re: dpkg problem

2001-07-10 Thread David Fuchs

Joost Kooij wrote:


[ouch! next time, please hit enter after +/- 72 characters.]


   Sorry, I've corrected this now...



On Tue, Jul 10, 2001 at 12:08:27AM -0700, David Fuchs wrote:


   I've recently installed Debian (Potato) on a personal computer,
and I'm having some difficulty with the package manager (dpkg) that came
with it.  The problem came up after I installed the Ximian Gnome packages
(via apt-get).  Once that had completed, I had a working copy of Ximian
Gnome (and all its other installed applications).  I then decided to
upgrade XFree86 to v4.1.  During the upgrade, I backed up and removed
the contents of /etc/X11 and /usr/X11R6, just to start fresh.



Don't use apt-get for this directly, use dselect, the proper frontend.



   The Ximian Gnome crew had the option to install via apt-get.  I used 
that option...





Don't use rm -rf for this directly, use dselect, the proper frontend.



   It wasn't exactly `rm -rf' that I used to rename those folders... 
but I see your point.





   Once I had XFree86 4.1 installed, I needed to re-install some of the
Ximian Gnome packages (namely gdm, as its config rested in /etc/X11/gdm).
Upon running `dpkg --install', I found that it never actually installed gdm.
Sure enough, it created the directory structure under /etc/X11/gdm, but
there were no files to speak of.  The backup I had displayed a number
of files (sessions and config data).  I tried `dpkg --install' again,
and put it in the background.  I noticed during the install, that it was
creating the proper files that I was missing, but they were suffixed with
'.dpkg-new'.  Once `dpkg --install' had completed, it removed those files
rather than renaming them (to chop the .dpkg-new off).  Hence my problem.



Don't use dpkg for this directly, use dselect, the proper frontend.

   Alright, sir, I believe we're vitiating the goal of this mailing 
list.  It would be wonderful if you could back up the monomaniacal
inclination toward 'dselect' in your last three statements with some 
explanation.  I'm looking for information on dpkg, not dselect.  If I 
absolutely must use dselect, I would like to know why, and what 
functionality it provides to Debian that dpkg (or one of the dpkg 
utilities such as 'dpkg-deb') does not.  I'm doing this to learn, and 
using the base tools for debian package management will inherently tell 
me how dselect works.





   So the question is, why did dpkg not install the files properly?
Obviously it's keeping track of what's installed (or should I say, what
it *thinks* is installed).  dpkg's assumptions don't help me, however,
and I can't be certain my applications are installed correctly if it
goes removing things after the fact.  How can I force dpkg to *forget*
about what I've already installed, so I can install it again?  Better yet,
is there a way to force a proper re-install with dpkg?



You never completely removed the packages, probably.  There is "remove",
which removes the binaries etc., and there is "purge", which also
removes configfiles.


   Ok.



When you only "remove", and later reinstall, your original configfiles
will still be inplace.  That is, unless you have removed some of these
files while the package was removed, then it will not put a new configfile
in if it finds that the old one is gone.  This is just like when upgrading
an installed package:  if you removed a configfile on the old version,
you don't expect it to reappear after upgrade.
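
The "removed but not purged" state described above is recorded in dpkg's
status database, so it can be listed.  A minimal sketch in Python,
assuming the usual stanza layout of /var/lib/dpkg/status, where a removed
package whose config files remain carries a Status line ending in
"config-files"; the function name is made up for illustration:

```python
def removed_not_purged(status_text):
    """Return names of packages whose state is 'config-files'."""
    names = []
    # Stanzas are separated by blank lines, one stanza per package.
    for stanza in status_text.split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            # Continuation lines start with whitespace; skip them.
            if line[:1].isspace() or ":" not in line:
                continue
            key, value = line.split(":", 1)
            fields[key] = value.strip()
        # Status holds three words: wanted action, error flag, current state.
        status = fields.get("Status", "").split()
        if len(status) == 3 and status[2] == "config-files":
            names.append(fields.get("Package", ""))
    return names


if __name__ == "__main__":
    with open("/var/lib/dpkg/status") as handle:
        for name in removed_not_purged(handle.read()):
            print(name)
```

Anything this prints has been removed but still has its config files
registered, which is the situation a purge clears up.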

   Sounds good.  It looks like the reason dpkg removed those 'dpkg-new' 
files was because dpkg found the package was already installed.  This 
was because it was never purged, only removed.  I will attempt to purge 
everything, and then install.  A purge should remove any list files 
associated with the particular packages, correct?  If so, an install (I 
take it you must do an install, not re-install, after purging) should 
re-construct those list files?  I certainly hope so.





   One way I've found to re-install, is to use `dpkg-deb --extract
 /', but that's not good, as it kind of defeats the point
of dpkg, as dpkg is used to not only extract, but to configure things
you install as well, correct?  I've also found another way, which is to
remove the list files associated with a program (I think I found them in
/var/lib/dpkg/status/info/* ??).  Again, this defeats the point of dpkg,
as dpkg should be a package manager in its own right, so I shouldn't
need to be deleting things.  Of course, after deleting a bunch of those
list files, dpkg complains whenever I run it that it's missing things.
It would be great if someone could tell me how to repair this as well...



Please, for your own sake, do not hack into the package management system
at this level, until you have read all the documentation, in which case
the above explanation would have been more evident from the start.


   Yes, I'll make sure I've read all the dpkg documentation.



Cheers,


Joost



Thanks,

-David Fuchs



dpkg problem

2001-07-09 Thread David Fuchs



Hello Debian users!

    I've recently installed Debian (Potato) on a personal computer,
and I'm having some difficulty with the package manager (dpkg) that came
with it.  The problem came up after I installed the Ximian Gnome packages
(via apt-get).  Once that had completed, I had a working copy of Ximian
Gnome (and all its other installed applications).  I then decided to
upgrade XFree86 to v4.1.  During the upgrade, I backed up and removed
the contents of /etc/X11 and /usr/X11R6, just to start fresh.

    Once I had XFree86 4.1 installed, I needed to re-install some of the
Ximian Gnome packages (namely gdm, as its config rested in /etc/X11/gdm).
Upon running `dpkg --install', I found that it never actually installed gdm.
Sure enough, it created the directory structure under /etc/X11/gdm, but
there were no files to speak of.  The backup I had displayed a number
of files (sessions and config data).  I tried `dpkg --install' again,
and put it in the background.  I noticed during the install, that it was
creating the proper files that I was missing, but they were suffixed with
'.dpkg-new'.  Once `dpkg --install' had completed, it removed those files
rather than renaming them (to chop the .dpkg-new off).  Hence my problem.

    So the question is, why did dpkg not install the files properly?
Obviously it's keeping track of what's installed (or should I say, what
it *thinks* is installed).  dpkg's assumptions don't help me, however,
and I can't be certain my applications are installed correctly if it
goes removing things after the fact.  How can I force dpkg to *forget*
about what I've already installed, so I can install it again?  Better yet,
is there a way to force a proper re-install with dpkg?

    One way I've found to re-install, is to use `dpkg-deb --extract  /',
but that's not good, as it kind of defeats the point of dpkg, as dpkg is
used to not only extract, but to configure things you install as well,
correct?  I've also found another way, which is to remove the list files
associated with a program (I think I found them in
/var/lib/dpkg/status/info/* ??).  Again, this defeats the point of dpkg,
as dpkg should be a package manager in its own right, so I shouldn't
need to be deleting things.  Of course, after deleting a bunch of those
list files, dpkg complains whenever I run it that it's missing things.
It would be great if someone could tell me how to repair this as well...

    So that's it, I hope someone can help me out here.  I've posted to a
local Linux group as well, but to no avail.

-David Fuchs