Re: [Ocfs2-users] remove locks? or copy the whole file?

2012-07-03 Thread Joel Becker
On Tue, Jul 03, 2012 at 11:46:08PM -0700, Aleks Clark wrote:
> any ideas how long this is going to take on a 2tb fs with ~400gb used?
> going on 10 hours of downtime, and it's been doing Pass 0a for the
> past 10 minutes. also, should all the nodes be up (but unmounted) for
> this?

I don't have any good idea how long it will take.  That's a lot
of filesystem.  However, it should take less than 10h!  You don't need
any other nodes up or down.  All that matters is that there are no
mounts.

Joel


Re: [Ocfs2-users] remove locks? or copy the whole file?

2012-07-03 Thread Aleks Clark
any ideas how long this is going to take on a 2tb fs with ~400gb used?
going on 10 hours of downtime, and it's been doing Pass 0a for the
past 10 minutes. also, should all the nodes be up (but unmounted) for
this?


Re: [Ocfs2-users] remove locks? or copy the whole file?

2012-07-03 Thread Joel Becker
Because it's unsafe to do any I/O at that point.  We'd rather you have
to reboot than scribble more bad data on your disk!

Joel


Re: [Ocfs2-users] remove locks? or copy the whole file?

2012-07-03 Thread Aleks Clark
it said 'clean' and exited. Working on bringing the cluster down. Is
there a reason why, after the kernel panics, ocfs2 makes all i/o
block? I can't even unmount the filesystem on any node, I have to
actually reboot it.


Re: [Ocfs2-users] remove locks? or copy the whole file?

2012-07-03 Thread Joel Becker
On Tue, Jul 03, 2012 at 06:57:53PM -0700, Aleks Clark wrote:
> well, by 'clean', it said it was clean. the locks persisted though. I
> seriously can't believe there's no way to force lock removal. is it
> just a file somewhere I can delete?

There's no lock hanging around past a full restart.  This looks like
on-disk corruption.  Did fsck.ocfs2 say that it ran multiple passes, or
just say "clean" and exit?  Please try fsck.ocfs2 with the '-f' flag
(obviously with the filesystem not mounted on ANY node).

Joel
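
For anyone following along, the suggestion above amounts to a forced check
on a volume that is not mounted anywhere. Here is a minimal Python sketch of
that precondition. Only the '-f' flag comes from the thread; the device name
/dev/drbd0 and both helper names are hypothetical. The mount check reads
/proc/mounts-style text for the local node only, so it cannot confirm that
the other cluster nodes have unmounted, and it will miss a device that is
mounted under a different path alias.

```python
def local_mount_points(device, mounts_text):
    """Return the mount points at which `device` appears in /proc/mounts-style text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts format: device, mount point, fs type, options, dump, pass
        if len(fields) >= 2 and fields[0] == device:
            points.append(fields[1])
    return points


def forced_fsck_command(device, mounts_text):
    """Build the forced-check command line, refusing if `device` is mounted locally.

    This covers the local node only; every other node in the cluster still
    has to be checked by hand before the command is actually run.
    """
    mounted = local_mount_points(device, mounts_text)
    if mounted:
        raise RuntimeError(f"{device} is still mounted at {', '.join(mounted)}")
    # '-f' forces a full check even when the filesystem is marked clean.
    return ["fsck.ocfs2", "-f", device]
```

On a real node the second argument would be the contents of /proc/mounts,
and the returned list could be handed to subprocess.run() once every node
has been confirmed unmounted.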


Re: [Ocfs2-users] remove locks? or copy the whole file?

2012-07-03 Thread Aleks Clark
grr. the copy I 'recovered' using dd to copy instead of cp is totally
munged. Would really appreciate some pointers on fixing the ocfs2
issue, I've got data backups but not looking forward to rebuilding the
whole damned VM :/


Re: [Ocfs2-users] remove locks? or copy the whole file?

2012-07-03 Thread Aleks Clark
well, by 'clean', it said it was clean. the locks persisted though. I
seriously can't believe there's no way to force lock removal. is it
just a file somewhere I can delete?


On Tue, Jul 3, 2012 at 6:56 PM, Aleks Clark wrote:
> yep, tried that, returned clean.
>
> On Tue, Jul 3, 2012 at 6:25 PM, herbert van.den.bergh wrote:
>>
>> One more thing: did you try running fsck.ocfs2 on it?
>>
>> Thanks,
>> Herbert.
>>
>>
>> On 7/3/2012 6:23 PM, herbert van.den.bergh wrote:
>>>
>>> Hmm doesn't mean much to me, but maybe to someone else on the list.  But
>>> I bet their first suggestion will be to try a recent kernel...
>>>
>>> Thanks,
>>> Herbert.
>>>
>>> On 7/3/2012 6:19 PM, Aleks Clark wrote:

 Nick, I don't think so, it's a 2tb partition with only 300gb used.

 Herb,


 Jul  3 14:47:26 castor kernel: [3488036.578659]
 (25326,0):ocfs2_rotate_tree_right:2483 ERROR: bug expression:
 path_leaf_bh(left_path) == path_leaf_bh(right_path)
 Jul  3 14:47:26 castor kernel: [3488036.578714]
 (25326,0):ocfs2_rotate_tree_right:2483 ERROR: Owner 18319883: error
 during insert of 15761664 (left path cpos 20725762) results in two
 identical paths ending at 395267
 Jul  3 14:47:26 castor kernel: [3488036.578800] [ cut here ]
 Jul  3 14:47:26 castor kernel: [3488036.578826] kernel BUG at
 /build/buildd-linux-2.6_2.6.32-38-amd64-bk66e4/linux-2.6-2.6.32/debian/build/source_amd64_none/fs/ocfs2/alloc.c:2483!
 Jul  3 14:47:26 castor kernel: [3488036.578881] invalid opcode:  [#1]
 SMP
 Jul  3 14:47:26 castor kernel: [3488036.578909] last sysfs file:
 /sys/devices/virtual/net/lo/operstate
 Jul  3 14:47:26 castor kernel: [3488036.578937] CPU 0
 Jul  3 14:47:26 castor kernel: [3488036.578960] Modules linked in:
 drbd tun ocfs2 jbd2 quota_tree raid0 ip6table_filter ip6_tables
 iptable_filter ip_tables sha1_generic ebtable_nat ebtables hmac
 x_tables lru_cache cn kvm_intel kvm ocfs2_dlmfs ocfs2_stack_o2cb
 ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs bridge stp loop
 md_mod snd_pcm snd_timer snd soundcore snd_page_alloc i2c_i801
 i2c_core pcspkr processor button psmouse joydev evdev serio_raw usbhid
 hid ext3 jbd mbcache dm_mod sd_mod crc_t10dif ahci ehci_hcd libata
 usbcore scsi_mod e1000e nls_base thermal thermal_sys [last unloaded:
 drbd]
 Jul  3 14:47:26 castor kernel: [3488036.579279] Pid: 25326, comm: kvm
 Not tainted 2.6.32-5-amd64 #1 X9SCL/X9SCM
 Jul  3 14:47:26 castor kernel: [3488036.579309] RIP:
 0010:[]  []
 ocfs2_do_insert_extent+0x5dc/0x1aaf [ocfs2]
 Jul  3 14:47:26 castor kernel: [3488036.579363] RSP:
 0018:880014839688  EFLAGS: 00010292
 Jul  3 14:47:26 castor kernel: [3488036.579390] RAX: 00bf
 RBX: 00060803 RCX: 1806
 Jul  3 14:47:26 castor kernel: [3488036.579435] RDX: 
 RSI: 0096 RDI: 0246
 Jul  3 14:47:26 castor kernel: [3488036.579479] RBP: 8800148398a8
 R08: 000209d0 R09: 000a
 Jul  3 14:47:26 castor kernel: [3488036.579524] R10: 
 R11: 0001 R12: 013c4002
 Jul  3 14:47:26 castor kernel: [3488036.579568] R13: 88002a1e4030
 R14: 0001 R15: 88023c153c60
 Jul  3 14:47:26 castor kernel: [3488036.579613] FS:
 7f0cfef83700() GS:880008a0()
 knlGS:
 Jul  3 14:47:26 castor kernel: [3488036.579659] CS:  0010 DS: 002b ES:
 002b CR0: 8005003b
 Jul  3 14:47:26 castor kernel: [3488036.579687] CR2: 7f0d25dbf000
 CR3: 00023ccb6000 CR4: 000426e0
 Jul  3 14:47:26 castor kernel: [3488036.579732] DR0: 
 DR1:  DR2: 
 Jul  3 14:47:26 castor kernel: [3488036.579776] DR3: 
 DR6: 0ff0 DR7: 0400
 Jul  3 14:47:26 castor kernel: [3488036.579821] Process kvm (pid:
 25326, threadinfo 880014838000, task 88023b999c40)
 Jul  3 14:47:26 castor kernel: [3488036.579867] Stack:
 Jul  3 14:47:26 castor kernel: [3488036.579887]  00f08100
 013c4002 00060803 880014839718
 Jul  3 14:47:26 castor kernel: [3488036.579923]<0>   880232abde80
 88023b999c40 88023b999c40 8800148397a8
 Jul  3 14:47:26 castor kernel: [3488036.579977]<0>   8800148397c8
 8800148398a8 88023d8027f8 00f08100
 Jul  3 14:47:26 castor kernel: [3488036.580047] Call Trace:
 Jul  3 14:47:26 castor kernel: [3488036.580074]  []
 ? ocfs2_insert_extent+0x5fb/0x6e6 [ocfs2]
 Jul  3 14:47:26 castor kernel: [3488036.580108]  []
 ? __ocfs2_journal_access+0x261/0x32a [ocfs2]
 Jul  3 14:47:26 castor kernel: [3488036.580156]  []
 ? ocfs2_add_clusters_in_btree+0x35f/0x53c [ocfs2]
 Jul  3 14:47

Re: [Ocfs2-users] remove locks? or copy the whole file?

2012-07-03 Thread Aleks Clark
yep, tried that, returned clean.

On Tue, Jul 3, 2012 at 6:25 PM, herbert van.den.bergh wrote:
>
> One more thing: did you try running fsck.ocfs2 on it?
>
> Thanks,
> Herbert.
>
>
> On 7/3/2012 6:23 PM, herbert van.den.bergh wrote:
>>
>> Hmm doesn't mean much to me, but maybe to someone else on the list.  But
>> I bet their first suggestion will be to try a recent kernel...
>>
>> Thanks,
>> Herbert.
>>
>> On 7/3/2012 6:19 PM, Aleks Clark wrote:
>>>
>>> Nick, I don't think so, it's a 2tb partition with only 300gb used.
>>>
>>> Herb,
>>>
>>>
>>> [quoted kernel oops trace snipped]

Re: [Ocfs2-users] remove locks? or copy the whole file?

2012-07-03 Thread herbert van.den.bergh

Hmm doesn't mean much to me, but maybe to someone else on the list.  But 
I bet their first suggestion will be to try a recent kernel...

Thanks,
Herbert.

On 7/3/2012 6:19 PM, Aleks Clark wrote:
> Nick, I don't think so, it's a 2tb partition with only 300gb used.
>
> Herb,
>
>
> [quoted kernel oops trace snipped]

Re: [Ocfs2-users] remove locks? or copy the whole file?

2012-07-03 Thread herbert van.den.bergh

One more thing: did you try running fsck.ocfs2 on it?

Thanks,
Herbert.

On 7/3/2012 6:23 PM, herbert van.den.bergh wrote:
> Hmm doesn't mean much to me, but maybe to someone else on the list.  But
> I bet their first suggestion will be to try a recent kernel...
>
> Thanks,
> Herbert.
>
> On 7/3/2012 6:19 PM, Aleks Clark wrote:
>> Nick, I don't think so, it's a 2tb partition with only 300gb used.
>>
>> Herb,
>>
>>
>> [quoted kernel oops trace snipped]

Re: [Ocfs2-users] remove locks? or copy the whole file?

2012-07-03 Thread Aleks Clark
Nick, I don't think so, it's a 2tb partition with only 300gb used.

Herb,


Jul  3 14:47:26 castor kernel: [3488036.578659]
(25326,0):ocfs2_rotate_tree_right:2483 ERROR: bug expression:
path_leaf_bh(left_path) == path_leaf_bh(right_path)
Jul  3 14:47:26 castor kernel: [3488036.578714]
(25326,0):ocfs2_rotate_tree_right:2483 ERROR: Owner 18319883: error
during insert of 15761664 (left path cpos 20725762) results in two
identical paths ending at 395267
Jul  3 14:47:26 castor kernel: [3488036.578800] ------------[ cut here ]------------
Jul  3 14:47:26 castor kernel: [3488036.578826] kernel BUG at
/build/buildd-linux-2.6_2.6.32-38-amd64-bk66e4/linux-2.6-2.6.32/debian/build/source_amd64_none/fs/ocfs2/alloc.c:2483!
Jul  3 14:47:26 castor kernel: [3488036.578881] invalid opcode:  [#1] SMP
Jul  3 14:47:26 castor kernel: [3488036.578909] last sysfs file:
/sys/devices/virtual/net/lo/operstate
Jul  3 14:47:26 castor kernel: [3488036.578937] CPU 0
Jul  3 14:47:26 castor kernel: [3488036.578960] Modules linked in:
drbd tun ocfs2 jbd2 quota_tree raid0 ip6table_filter ip6_tables
iptable_filter ip_tables sha1_generic ebtable_nat ebtables hmac
x_tables lru_cache cn kvm_intel kvm ocfs2_dlmfs ocfs2_stack_o2cb
ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs bridge stp loop
md_mod snd_pcm snd_timer snd soundcore snd_page_alloc i2c_i801
i2c_core pcspkr processor button psmouse joydev evdev serio_raw usbhid
hid ext3 jbd mbcache dm_mod sd_mod crc_t10dif ahci ehci_hcd libata
usbcore scsi_mod e1000e nls_base thermal thermal_sys [last unloaded:
drbd]
Jul  3 14:47:26 castor kernel: [3488036.579279] Pid: 25326, comm: kvm
Not tainted 2.6.32-5-amd64 #1 X9SCL/X9SCM
Jul  3 14:47:26 castor kernel: [3488036.579309] RIP:
0010:[]  []
ocfs2_do_insert_extent+0x5dc/0x1aaf [ocfs2]
Jul  3 14:47:26 castor kernel: [3488036.579363] RSP:
0018:880014839688  EFLAGS: 00010292
Jul  3 14:47:26 castor kernel: [3488036.579390] RAX: 00bf
RBX: 00060803 RCX: 1806
Jul  3 14:47:26 castor kernel: [3488036.579435] RDX: 
RSI: 0096 RDI: 0246
Jul  3 14:47:26 castor kernel: [3488036.579479] RBP: 8800148398a8
R08: 000209d0 R09: 000a
Jul  3 14:47:26 castor kernel: [3488036.579524] R10: 
R11: 0001 R12: 013c4002
Jul  3 14:47:26 castor kernel: [3488036.579568] R13: 88002a1e4030
R14: 0001 R15: 88023c153c60
Jul  3 14:47:26 castor kernel: [3488036.579613] FS:
7f0cfef83700() GS:880008a0()
knlGS:
Jul  3 14:47:26 castor kernel: [3488036.579659] CS:  0010 DS: 002b ES:
002b CR0: 8005003b
Jul  3 14:47:26 castor kernel: [3488036.579687] CR2: 7f0d25dbf000
CR3: 00023ccb6000 CR4: 000426e0
Jul  3 14:47:26 castor kernel: [3488036.579732] DR0: 
DR1:  DR2: 
Jul  3 14:47:26 castor kernel: [3488036.579776] DR3: 
DR6: 0ff0 DR7: 0400
Jul  3 14:47:26 castor kernel: [3488036.579821] Process kvm (pid:
25326, threadinfo 880014838000, task 88023b999c40)
Jul  3 14:47:26 castor kernel: [3488036.579867] Stack:
Jul  3 14:47:26 castor kernel: [3488036.579887]  00f08100
013c4002 00060803 880014839718
Jul  3 14:47:26 castor kernel: [3488036.579923] <0> 880232abde80
88023b999c40 88023b999c40 8800148397a8
Jul  3 14:47:26 castor kernel: [3488036.579977] <0> 8800148397c8
8800148398a8 88023d8027f8 00f08100
Jul  3 14:47:26 castor kernel: [3488036.580047] Call Trace:
Jul  3 14:47:26 castor kernel: [3488036.580074]  []
? ocfs2_insert_extent+0x5fb/0x6e6 [ocfs2]
Jul  3 14:47:26 castor kernel: [3488036.580108]  []
? __ocfs2_journal_access+0x261/0x32a [ocfs2]
Jul  3 14:47:26 castor kernel: [3488036.580156]  []
? ocfs2_add_clusters_in_btree+0x35f/0x53c [ocfs2]
Jul  3 14:47:26 castor kernel: [3488036.580205]  []
? ocfs2_add_inode_data+0x62/0x6e [ocfs2]
Jul  3 14:47:26 castor kernel: [3488036.580239]  []
? ocfs2_journal_access_di+0x0/0xf [ocfs2]
Jul  3 14:47:26 castor kernel: [3488036.580272]  []
? ocfs2_write_begin_nolock+0x1376/0x1de2 [ocfs2]
Jul  3 14:47:26 castor kernel: [3488036.580321]  []
? ocfs2_set_buffer_uptodate+0x15/0x60e [ocfs2]
Jul  3 14:47:26 castor kernel: [3488036.580370]  []
? ocfs2_validate_inode_block+0x0/0x1ab [ocfs2]
Jul  3 14:47:26 castor kernel: [3488036.580418]  []
? ocfs2_journal_access_di+0x0/0xf [ocfs2]
Jul  3 14:47:26 castor kernel: [3488036.580451]  []
? ocfs2_write_begin+0x116/0x1d2 [ocfs2]
Jul  3 14:47:26 castor kernel: [3488036.580484]  []
? generic_file_buffered_write+0x118/0x278
Jul  3 14:47:26 castor kernel: [3488036.580515]  []
? __generic_file_aio_write+0x25f/0x293
Jul  3 14:47:26 castor kernel: [3488036.580548]  []
? ocfs2_prepare_inode_for_write+0x683/0x69c [ocfs2]
Jul  3 14:47:26 castor kernel: [3488036.580597]  []
? ocfs2_rw_lock+0x16d/0x239 [ocfs2]
Jul  3 14:47:26 castor kernel: [3488036.580628]  []
?

Re: [Ocfs2-users] remove locks? or copy the whole file?

2012-07-03 Thread Herbert van den Bergh
On 07/03/2012 04:12 PM, Aleks Clark wrote:
> Ok, so I've got this ocfs2 cluster that's been running for a long
> while, hosting my VMs. All of a sudden I'm getting kernel panics
> originating from ocfs2 when trying to spin up one particular file.
> I've determined that there are several locks on this file, one of them
> exclusive. I restarted the whole cluster to try to get rid of it, but
> no go. I also tried to copy the file, both on and off of the cluster,
> but only half of it copied. Any way to get around either issue would
> be appreciated.

The panic stack may be helpful, and any messages that the kernel spit 
out before it.

Thanks,
Herbert.



___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-users


[Ocfs2-users] remove locks? or copy the whole file?

2012-07-03 Thread Aleks Clark
Ok, so I've got this ocfs2 cluster that's been running for a long
while, hosting my VMs. All of a sudden I'm getting kernel panics
originating from ocfs2 when trying to spin up one particular file.
I've determined that there are several locks on this file, one of them
exclusive. I restarted the whole cluster to try to get rid of it, but
no go. I also tried to copy the file, both on and off of the cluster,
but only half of it copied. Any way to get around either issue would
be appreciated.

Regards,

-- 
Aleks Clark
