Re: 3.12-rc5 and overwritten partition table - by powertop?

2014-01-23 Thread John Twideldum
>(CCing linux-ide)
>
>Most likely, either the SATA host controller or the drive
>doesn't play nicely with link power management enabled. Can you post the
>full dmesg boot log?

I found a backup, yay! It is attached.

Compared to Stefan's report: same hardware and same SSD;
only my BIOS and my 840 Pro firmware are each one revision older.

Hope it helps,
J


log_oct29.txt.bz2
Description: BZip2 compressed data


Re: 3.12-rc5 and overwritten partition table - by powertop?

2014-01-20 Thread Robert Hancock

On 10/29/2013 04:32 PM, John Twideldum wrote:
>>>>> The first ~170kb of /dev/sda got blown away with what seems to be a logging output
>>>>> by Powertop, when I was playing with the tuneables.
>>>>
>>>> So did you log the output to some file? I'm just trying to understand how
>>>> it could get onto your disk in the first place...
>>>
>>> Attached a dump of the first 1Mb of the disk, HTH.
>>> It looks like a powertop log?
>>> (I have powertop 2.4)
>>
>> Yes, likely. But it is strange the corruption doesn't even end at any
>> sensible boundary (data ends at offset 0x27b53). Shrug...
>
> My recollection of what I did is this:
>
> I was looking into powertop and observing how -rc5 works now with Haswell.
> I saw the tuneable parameters and quite a few were "bad", so I set them to "good".
> Power usage dropped by about one third - yay!
> However, changing "SATA link power" threw up complaints:
>
> Oct 29 09:09:21 localhost kernel: [ 3697.423868] ata1.00: exception Emask 0x10 SAct 0x1 SErr 0xc action 0x6 frozen
> Oct 29 09:09:21 localhost kernel: [ 3697.423873] ata1.00: irq_stat 0x0800, interface fatal error
> Oct 29 09:09:21 localhost kernel: [ 3697.423877] ata1: SError: { CommWake 10B8B }
> Oct 29 09:09:21 localhost kernel: [ 3697.423880] ata1.00: failed command: WRITE FPDMA QUEUED
> Oct 29 09:09:21 localhost kernel: [ 3697.423886] ata1.00: cmd 61/38:00:01:9e:a4/01:00:00:00:00/40 tag 0 ncq 159744 out
> Oct 29 09:09:21 localhost kernel: [ 3697.423886]  res 50/01:00:01:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
> Oct 29 09:09:21 localhost kernel: [ 3697.423888] ata1.00: status: { DRDY }
> Oct 29 09:09:21 localhost kernel: [ 3697.423894] ata1: hard resetting link
> Oct 29 09:09:22 localhost kernel: [ 3697.743196] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> Oct 29 09:09:22 localhost kernel: [ 3697.744707] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
> Oct 29 09:09:22 localhost kernel: [ 3697.744719] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
> Oct 29 09:09:22 localhost kernel: [ 3697.744725] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
> Oct 29 09:09:22 localhost kernel: [ 3697.744813] ata1.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
> Oct 29 09:09:22 localhost kernel: [ 3697.745212] ata1.00: failed to get NCQ Send/Recv Log Emask 0x1
> Oct 29 09:09:22 localhost kernel: [ 3697.746694] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
> Oct 29 09:09:22 localhost kernel: [ 3697.746705] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
> Oct 29 09:09:22 localhost kernel: [ 3697.746711] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
> Oct 29 09:09:22 localhost kernel: [ 3697.746779] ata1.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
> Oct 29 09:09:22 localhost kernel: [ 3697.747286] ata1.00: failed to get NCQ Send/Recv Log Emask 0x1
> Oct 29 09:09:22 localhost kernel: [ 3697.747432] ata1.00: configured for UDMA/133
> Oct 29 09:09:22 localhost kernel: [ 3697.763181] ata1: EH complete
>
> I did not yet know what "frozen" means, so I did not investigate, and
> I powered down very soon afterwards as I had to leave.
> The next time I powered up, the machine did not boot.
> So the amount of data is probably just that size because of how long I had powertop running...


(CCing linux-ide)

Most likely, either the SATA host controller or the drive
doesn't play nicely with link power management enabled. Can you post the
full dmesg boot log?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 3.12-rc5 and overwritten partition table - by powertop?

2014-01-20 Thread Stefan Agner
On 2013-10-29 21:10, Jan Kara wrote:
>> The first ~170kb of /dev/sda got blown away with what seems to be a logging output
>> by Powertop, when I was playing with the tuneables.
>> (Luckily the first partition starts later :-))
>   So did you log the output to some file? I'm just trying to understand how
> it could get onto your disk in the first place...
>
>> Why that is I don't know, but maybe something goes wrong when turning on
>> the SATA knobs. I'm afraid to try again; I would rather accept higher
>> power use than data loss again :-/

I experienced the same on the very same hardware (Lenovo T440s). Like
John, I turned all those knobs in powertop, including the SATA ones.
Several times I ended up with a broken partition table. Once, even my EFI
System partition (the first partition) was broken. However, since I use EFI
I was able to recover the partition table quite easily (gdisk offers to
recover from the backup partition table, kudos to the designer of the GPT
format!).
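
As an aside on why that recovery works: GPT stores a second copy of its header in the last sector of the disk, so an overwrite of the first sectors leaves the backup untouched. A minimal Python sketch of checking both copies, assuming 512-byte logical sectors and a readable raw image or device; the function name is made up for illustration and only the signature is checked, not the CRCs:

```python
import io

SECTOR = 512                 # assumption: 512-byte logical sectors
GPT_SIGNATURE = b"EFI PART"  # first 8 bytes of every GPT header


def gpt_header_status(disk):
    """Report whether the primary and backup GPT header signatures exist.

    `disk` is any seekable binary file object (a raw image or /dev/sdX).
    """
    disk.seek(0, io.SEEK_END)
    size = disk.tell()

    disk.seek(1 * SECTOR)          # primary header lives at LBA 1
    primary = disk.read(8) == GPT_SIGNATURE

    disk.seek(size - SECTOR)       # backup header lives in the last LBA
    backup = disk.read(8) == GPT_SIGNATURE

    return {"primary": primary, "backup": backup}
```

A result of `{"primary": False, "backup": True}` would match the situation described here, where the start of the disk was overwritten but the end was not.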

This happens on Arch Linux with the stock 3.12.7 kernel as well as with a
mainline 3.13 kernel. I use the latest T440s firmware (2.17).

Is it possible to disable that knob, or to warn the user when it is used
(at least on the Lenovo T440s), so that users are not left with an
unbootable system?
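
Until such a safeguard exists, the setting can at least be inspected by hand. The sketch below assumes that powertop's SATA tunable corresponds to the `link_power_management_policy` attribute that libata exposes per SCSI host in sysfs; that mapping is not stated in this thread, and the helper names are made up. Writing the attribute needs root.

```python
from pathlib import Path

SYSFS_ROOT = Path("/sys/class/scsi_host")  # usual sysfs location on Linux


def read_lpm_policies(root=SYSFS_ROOT):
    """Return {host name: current link power management policy}."""
    policies = {}
    for attr in sorted(Path(root).glob("host*/link_power_management_policy")):
        policies[attr.parent.name] = attr.read_text().strip()
    return policies


def force_max_performance(root=SYSFS_ROOT):
    """Switch link power management off on every host (needs root)."""
    for attr in Path(root).glob("host*/link_power_management_policy"):
        attr.write_text("max_performance")
```

Calling `read_lpm_policies()` before and after toggling the knob in powertop shows whether the policy actually changed; `force_max_performance()` returns to the conservative setting.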

dmesg output:
[2.744398] ata1: SATA max UDMA/133 abar m2048@0xf063c000 port 0xf063c100 irq 59
[2.744400] ata2: DUMMY
[2.744401] ata3: DUMMY
[3.063804] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[3.064532] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[3.064536] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[3.064538] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[3.064606] ata1.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
[3.064926] ata1.00: failed to get NCQ Send/Recv Log Emask 0x1
[3.064929] ata1.00: ATA-9: Samsung SSD 840 PRO Series, DXM05B0Q, max UDMA/133
[3.064931] ata1.00: 500118192 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[3.065256] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[3.065259] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[3.065261] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[3.065286] ata1.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
[3.065545] ata1.00: failed to get NCQ Send/Recv Log Emask 0x1
[3.065605] ata1.00: configured for UDMA/133
...
[  130.578789] ata1.00: exception Emask 0x0 SAct 0x7fff SErr 0x4 action 0x0
[  130.578794] ata1.00: irq_stat 0x4001
[  130.578796] ata1: SError: { CommWake }
[  130.578798] ata1.00: failed command: WRITE FPDMA QUEUED
[  130.578802] ata1.00: cmd 61/10:00:f0:29:05/00:00:00:00:00/40 tag 0 ncq 8192 out
[  130.578804] ata1.00: status: { DRDY ERR }
[  130.578806] ata1.00: error: { ABRT }
...
[  130.579011] ata1.00: failed command: WRITE FPDMA QUEUED
[  130.579014] ata1.00: cmd 61/10:f0:58:7c:0f/00:00:00:00:00/40 tag 30 ncq 8192 out
[  130.579016] ata1.00: status: { DRDY ERR }
[  130.579017] ata1.00: error: { ABRT }
[  130.579207] ata1.00: failed to get NCQ Send/Recv Log Emask 0x1
[  130.579456] ata1.00: failed to get NCQ Send/Recv Log Emask 0x1
[  130.579511] ata1.00: configured for UDMA/133
[  130.579583] ata1: EH complete
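
For readers unfamiliar with the "SStatus 133 SControl 300" pair in the link-up line: libata prints both registers in hex, and the SATA specification splits each into 4-bit fields. The following Python sketch decodes them; the function names are made up, and the field tables cover only the commonly defined values.

```python
def decode_sstatus(value):
    """Split a SATA SStatus value into its DET, SPD and IPM fields."""
    det = value & 0xF          # bits 3:0   device detection
    spd = (value >> 4) & 0xF   # bits 7:4   negotiated link speed
    ipm = (value >> 8) & 0xF   # bits 11:8  interface power state
    det_names = {0: "no device", 1: "device present, no phy link",
                 3: "device present, phy link up", 4: "phy offline"}
    spd_names = {0: "none", 1: "1.5 Gbps", 2: "3.0 Gbps", 3: "6.0 Gbps"}
    ipm_names = {0: "not present", 1: "active", 2: "partial", 6: "slumber"}
    return {
        "det": det_names.get(det, "reserved"),
        "spd": spd_names.get(spd, "reserved"),
        "ipm": ipm_names.get(ipm, "reserved"),
    }


def decode_scontrol_ipm(value):
    """Name the power-state transitions that SControl currently forbids."""
    ipm = (value >> 8) & 0xF   # bits 11:8  allowed power transitions
    names = {0: "no restrictions", 1: "partial disabled",
             2: "slumber disabled", 3: "partial and slumber disabled"}
    return names.get(ipm, "reserved")


print(decode_sstatus(0x133))
print(decode_scontrol_ipm(0x300))
```

Read this way, the line logged at link-up describes an active 6.0 Gbps link on which transitions to the Partial and Slumber power states are disabled.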

--
Stefan


Re: 3.12-rc5 and overwritten partition table - by powertop?

2013-10-29 Thread John Twideldum
>> >> The first ~170kb of /dev/sda got blown away with what seems to be a logging output
>> >> by Powertop, when I was playing with the tuneables.
>> >
>> >So did you log the output to some file? I'm just trying to understand how
>> >it could get onto your disk in the first place...
>>
>> Attached a dump of the first 1Mb of the disk, HTH.
>> It looks like a powertop log?
>> (I have powertop 2.4)
>
>Yes, likely. But it is strange the corruption doesn't even end at any
>sensible boundary (data ends at offset 0x27b53). Shrug...

My recollection of what I did is this:

I was looking into powertop and observing how -rc5 works now with Haswell.
I saw the tuneable parameters and quite a few were "bad", so I set them to "good".
Power usage dropped by about one third - yay!
However, changing "SATA link power" threw up complaints:

Oct 29 09:09:21 localhost kernel: [ 3697.423868] ata1.00: exception Emask 0x10 SAct 0x1 SErr 0xc action 0x6 frozen
Oct 29 09:09:21 localhost kernel: [ 3697.423873] ata1.00: irq_stat 0x0800, interface fatal error
Oct 29 09:09:21 localhost kernel: [ 3697.423877] ata1: SError: { CommWake 10B8B }
Oct 29 09:09:21 localhost kernel: [ 3697.423880] ata1.00: failed command: WRITE FPDMA QUEUED
Oct 29 09:09:21 localhost kernel: [ 3697.423886] ata1.00: cmd 61/38:00:01:9e:a4/01:00:00:00:00/40 tag 0 ncq 159744 out
Oct 29 09:09:21 localhost kernel: [ 3697.423886]  res 50/01:00:01:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
Oct 29 09:09:21 localhost kernel: [ 3697.423888] ata1.00: status: { DRDY }
Oct 29 09:09:21 localhost kernel: [ 3697.423894] ata1: hard resetting link
Oct 29 09:09:22 localhost kernel: [ 3697.743196] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Oct 29 09:09:22 localhost kernel: [ 3697.744707] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
Oct 29 09:09:22 localhost kernel: [ 3697.744719] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
Oct 29 09:09:22 localhost kernel: [ 3697.744725] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
Oct 29 09:09:22 localhost kernel: [ 3697.744813] ata1.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
Oct 29 09:09:22 localhost kernel: [ 3697.745212] ata1.00: failed to get NCQ Send/Recv Log Emask 0x1
Oct 29 09:09:22 localhost kernel: [ 3697.746694] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
Oct 29 09:09:22 localhost kernel: [ 3697.746705] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
Oct 29 09:09:22 localhost kernel: [ 3697.746711] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
Oct 29 09:09:22 localhost kernel: [ 3697.746779] ata1.00: ACPI cmd ef/10:09:00:00:00:a0 (SET FEATURES) succeeded
Oct 29 09:09:22 localhost kernel: [ 3697.747286] ata1.00: failed to get NCQ Send/Recv Log Emask 0x1
Oct 29 09:09:22 localhost kernel: [ 3697.747432] ata1.00: configured for UDMA/133
Oct 29 09:09:22 localhost kernel: [ 3697.763181] ata1: EH complete
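
The "cmd 61/38:00:01:9e:a4/01:00:00:00:00/40 tag 0 ncq 159744 out" entry in the log above can be decoded to see which write failed. libata prints the taskfile as command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device, all in hex. For NCQ commands the sector count sits in the feature registers and the tag in the upper bits of the count register. A Python sketch, with a made-up function name and an assumed 512-byte sector size:

```python
def decode_ncq_taskfile(tf, sector_size=512):
    """Decode the 'cmd' taskfile libata logs for a failed NCQ read/write."""
    command, low, high, _device = tf.split("/")
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in high.split(":")
    )
    opcode = int(command, 16)
    if opcode not in (0x60, 0x61):   # READ / WRITE FPDMA QUEUED
        raise ValueError("not an NCQ read or write")

    # NCQ carries the 16-bit sector count in FEATURES; 0 means 65536.
    sectors = ((hob_feature << 8) | feature) or 65536
    lba = (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
           | lbah << 16 | lbam << 8 | lbal)
    return {
        "write": opcode == 0x61,
        "tag": nsect >> 3,           # tag lives in bits 7:3 of the count
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * sector_size,
    }


print(decode_ncq_taskfile("61/38:00:01:9e:a4/01:00:00:00:00/40"))
```

The decoded length, 312 sectors or 159744 bytes, agrees with the "ncq 159744 out" that the kernel printed on the same line, which is a useful sanity check on the field layout.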

I did not yet know what "frozen" means, so I did not investigate, and
I powered down very soon afterwards as I had to leave.
The next time I powered up, the machine did not boot.
So the amount of data is probably just that size because of how long I had powertop running...

HTH,
J


Re: 3.12-rc5 and overwritten partition table - by powertop?

2013-10-29 Thread Jan Kara
On Tue 29-10-13 22:58:40, John Twideldum wrote:
> >> The first ~170kb of /dev/sda got blown away with what seems to be a logging output
> >> by Powertop, when I was playing with the tuneables.
> >
> >So did you log the output to some file? I'm just trying to understand how
> >it could get onto your disk in the first place...
> 
> Attached a dump of the first 1Mb of the disk, HTH.
> It looks like a powertop log?
> (I have powertop 2.4)
  Yes, likely. But it is strange the corruption doesn't even end at any
sensible boundary (data ends at offset 0x27b53). Shrug...
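
To make the "no sensible boundary" remark concrete, the end offset can be checked against common alignment units. This Python sketch assumes 512-byte sectors and 4096-byte blocks as the boundaries of interest; the function name is made up.

```python
END_OFFSET = 0x27B53  # end of the overwritten data, as reported above


def describe_offset(offset, sector=512, block=4096):
    """Summarise an offset's size and its alignment to sectors and blocks."""
    return {
        "bytes": offset,
        "kib": round(offset / 1024, 1),
        "sector_aligned": offset % sector == 0,
        "block_aligned": offset % block == 0,
        "bytes_into_sector": offset % sector,
    }


print(describe_offset(END_OFFSET))
```

The offset is 162643 bytes, roughly 159 KiB, and it falls 339 bytes into a sector, so the data ends neither on a sector nor on a block boundary.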

> meanwhile, testdisk running in all variations recovered enough clues to
> guess the partitions correctly - the data is fine :-)
  Good.

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: 3.12-rc5 and overwritten partition table - by powertop?

2013-10-29 Thread John Twideldum
>> The first ~170kb of /dev/sda got blown away with what seems to be a logging output
>> by Powertop, when I was playing with the tuneables.
>
>So did you log the output to some file? I'm just trying to understand how
>it could get onto your disk in the first place...

Attached a dump of the first 1Mb of the disk, HTH.
It looks like a powertop log?
(I have powertop 2.4)

Meanwhile, testdisk, run in all its variations, recovered enough clues to
guess the partitions correctly - the data is fine :-)

J


backup_overwrite.dump.bz2
Description: BZip2 compressed data


Re: 3.12-rc5 and overwritten partition table - by powertop?

2013-10-29 Thread Jan Kara
On Tue 29-10-13 15:57:54, John Twideldum wrote:
> replying to myself with more insights...
>
> >000 13.0\nalsa:hw
> >020 C0D0\t99.0\nal
> >040 sa:hwC0D3\t99.000
> >060 00\nbacklight:acp
> >.
>
> The first ~170kb of /dev/sda got blown away with what seems to be a logging output
> by Powertop, when I was playing with the tuneables.
> (Luckily the first partition starts later :-))
  So did you log the output to some file? I'm just trying to understand how
it could get onto your disk in the first place...

> Why that is I don't know, but maybe something goes wrong when turning on
> the SATA knobs. I'm afraid to try again; I would rather accept higher
> power use than data loss again :-/

Honza
-- 
Jan Kara 
SUSE Labs, CR