Re: [Ocfs2-users] OCFS2 Error in the filesystem after of some weeks running ocfs2

2012-02-13 Thread Eduardo Diaz - Gmail
Seriously? I ran many tests in a 32-bit environment and never had any
problem.

I will take another approach and put a switch in the interconnect
(right now I have a crossover cable)..

If I still have problems I will look into migrating to 64 bits.. :-)

Thanks for the tip :)


On Fri, Feb 10, 2012 at 1:54 PM, Adi Kriegisch  wrote:

> Dear Eduardo,
>
> > I shut down the filesystem and ran fsck.ocfs2, and there were many
> > errors in the cluster filesystem. Is there no way to verify that the
> > ocfs2 volume is OK? I can stop it at night, but this is crazy for me,
> > because every two months the filesystem breaks, and if I stop one node
> > the running node goes down...
> >
> > I have the whole system on Debian Squeeze with ocfs2 1.6.3
> >
> > Any ideas??
> We had similar issues (also running Debian Squeeze 32-bit). At the time we
> suspected that not enough LOWMEM was available for the recovery to complete
> successfully.
> Switching to amd64 solved the issue for us. Luckily ocfs2 is able to run
> with mixed 32-bit and 64-bit clients, so we could migrate our servers one by
> one without interrupting production too much.
>
> -- Adi
>
___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users

Re: [Ocfs2-users] OCFS2 Error in the filesystem after of some weeks running ocfs2

2012-02-10 Thread Adi Kriegisch
Dear Eduardo,

> I shut down the filesystem and ran fsck.ocfs2, and there were many
> errors in the cluster filesystem. Is there no way to verify that the
> ocfs2 volume is OK? I can stop it at night, but this is crazy for me,
> because every two months the filesystem breaks, and if I stop one node
> the running node goes down...
> 
> I have the whole system on Debian Squeeze with ocfs2 1.6.3
> 
> Any ideas??
We had similar issues (also running Debian Squeeze 32-bit). At the time we
suspected that not enough LOWMEM was available for the recovery to complete
successfully.
Switching to amd64 solved the issue for us. Luckily ocfs2 is able to run
with mixed 32-bit and 64-bit clients, so we could migrate our servers one by
one without interrupting production too much.
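
A quick way to gauge this on a 32-bit kernel is to look at the low-memory
counters it exposes; a minimal check (generic commands, not from the original
report) could be:

  grep -i '^low' /proc/meminfo   # LowTotal / LowFree on 32-bit kernels with highmem
  free -l -m                     # -l splits the report into low and high memory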

-- Adi

___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users


[Ocfs2-users] OCFS2 Error in the filesystem after of some weeks running ocfs2

2012-02-09 Thread Eduardo Diaz - Gmail
Hi all, I am running a very simple DRBD primary/primary configuration.
I ran all the tests some weeks ago and everything worked very well
(shutting down the nodes, etc. etc.)..

I repeated the tests yesterday and today :(...

I don't know what happened, again!!! but every time I stop one
node (shutdown, not poweroff) the cluster breaks :-(...

I shut down the filesystem and ran fsck.ocfs2, and there were many
errors in the cluster filesystem. Is there no way to verify that the
ocfs2 volume is OK? I can stop it at night, but this is crazy for me,
because every two months the filesystem breaks, and if I stop one node
the running node goes down...

I have the whole system on Debian Squeeze with ocfs2 1.6.3

Any ideas??

 Feb  7 13:58:33 servidoradantra2 kernel: [1864496.744051] block
drbd0: conn( Unconnected -> WFConnection )
Feb  7 13:59:24 servidoradantra2 kernel: [1864547.064015] o2net:
connection to node servidoradantra1 (num 0) at 192.168.2.1: has
been idle for 60.0 seconds, shutting it down.
Feb  7 13:59:24 servidoradantra2 kernel: [1864547.064025]
(0,0):o2net_idle_timer:1495 here are some times that might help debug
the situation: (tmr 1328619504.71832 now 1328619564.71605 dr
1328619504.71815 adv 1328619504.71839:1328619504.71840 func
(18797194:507) 1328619488.80748:1328619488.80749)
Feb  7 13:59:24 servidoradantra2 kernel: [1864547.064048] o2net: no
longer connected to node servidoradantra1 (num 0) at 192.168.2.1:
Feb  7 13:59:31 servidoradantra2 kernel: [1864554.860190]
(2950,0):o2dlm_eviction_cb:269 o2dlm has evicted node 0 from group
F0E244E5687046DBAAF6A928CCDEEEF1
Feb  7 13:59:31 servidoradantra2 kernel: [1864554.874012]
(28219,0):dlm_get_lock_resource:839
F0E244E5687046DBAAF6A928CCDEEEF1:M120766ee68: at
least one node (0) to recover before lock mastery can begin
Feb  7 13:59:32 servidoradantra2 kernel: [1864555.876011]
(28219,0):dlm_get_lock_resource:893
F0E244E5687046DBAAF6A928CCDEEEF1:M120766ee68: at
least one node (0) to recover before lock mastery can begin
Feb  7 13:59:35 servidoradantra2 kernel: [1864558.309527]
(3132,3):dlm_get_lock_resource:839
F0E244E5687046DBAAF6A928CCDEEEF1:$RECOVERY: at least one node (0) to
recover before lock mastery can begin
Feb  7 13:59:35 servidoradantra2 kernel: [1864558.309533]
(3132,3):dlm_get_lock_resource:873 F0E244E5687046DBAAF6A928CCDEEEF1:
recovery map is not empty, but must master $RECOVERY lock now
Feb  7 13:59:35 servidoradantra2 kernel: [1864558.309549]
(3132,3):dlm_do_recovery:523 (3132) Node 1 is the Recovery Master for
the Dead Node 0 for Domain F0E244E5687046DBAAF6A928CCDEEEF1
Feb  7 13:59:43 servidoradantra2 kernel: [1864566.880235]
(28219,0):ocfs2_replay_journal:1607 Recovering node 0 from slot 0 on
device (147,0)
Feb  7 13:59:47 servidoradantra2 kernel: [1864570.884880]
[ cut here ]
Feb  7 13:59:47 servidoradantra2 kernel: [1864570.884902] kernel BUG
at 
/build/buildd-linux-2.6_2.6.32-39squeeze1-i386-F5tMlP/linux-2.6-2.6.32/debian/build/source_i386_none/fs/ocfs2/journal.c:1702!
Feb  7 13:59:47 servidoradantra2 kernel: [1864570.884938] invalid
opcode:  [#1] SMP
Feb  7 13:59:47 servidoradantra2 kernel: [1864570.884960] last sysfs
file: /sys/devices/pci:00/:00:1f.2/host5/target5:0:0/5:0:0:0/model
Feb  7 13:59:47 servidoradantra2 kernel: [1864570.884991] Modules
linked in: ocfs2 jbd2 quota_tree crc32c drbd lru_cache cn pci_stub
vboxpci vboxnetadp vboxnetflt vboxdrv cls_u32 sch_htb sch_ingress
sch_sfq xt_time xt_connlimit xt_realm iptable_raw xt_TPROXY
nf_tproxy_core xt_hashlimit xt_comment xt_owner xt_recent xt_iprange
xt_policy xt_multiport ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP
ipt_MASQUERADE ipt_LOG ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah
ipt_addrtype xt_tcpmss xt_pkttype xt_physdev xt_NFQUEUE xt_MARK
xt_mark xt_mac xt_limit xt_length xt_helper xt_dccp xt_conntrack
xt_CONNMARK xt_connmark xt_CLASSIFY xt_tcpudp xt_state iptable_nat
nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_mangle
nfnetlink iptable_filter ip_tables x_tables ocfs2_dlmfs
ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs
xfs exportfs it87 hwmon_vid coretemp loop firewire_sbp2
snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep nouveau
ttm drm_kms_helper snd_pcm drm snd_timer snd soundcore i2c_i801 i2c_
Feb  7 13:59:47 servidoradantra2 kernel: algo_bit parport_pc i2c_core
snd_page_alloc parport psmouse evdev button pcspkr serio_raw processor
ext3 jbd mbcache dm_mod sg usbhid hid sr_mod cdrom ata_generic sd_mod
crc_t10dif uhci_hcd pata_jmicron firewire_ohci thermal ahci
firewire_core floppy crc_itu_t libata r8169 mii ehci_hcd scsi_mod
thermal_sys sky2 usbcore nls_base [last unloaded: scsi_wait_scan]
Feb  7 13:59:47 servidoradantra2 kernel: [1864570.886462]
Feb  7 13:59:47 servidoradantra2 kernel: [1864570.886477] Pid: 28219,
comm: ocfs2rec Not tainted (2.6.32-5-686-bigmem #1) 965P-DS4
Feb  7 13:59:47 servidoradantra2 kernel: [1864570.886505] EIP:
0060:[] EFLAGS: 00010246 CPU: 0
F

Re: [Ocfs2-users] OCFS2 error.

2010-06-22 Thread Tao Ma
Hi Veeraa,

On 06/23/2010 10:46 AM, veeraa bose wrote:
> Hi Team,
>
> we are getting the below error on a shared disk on a VMware guest operating system.
>
> Jun 23 01:46:12 SCRBXLPDEFRM635 kernel: sd 1:0:3:0: reservation conflict
> Jun 23 01:46:12 SCRBXLPDEFRM635 kernel: sd 1:0:3:0: SCSI error: return
> code = 0x0018
> Jun 23 01:46:12 SCRBXLPDEFRM635 kernel: end_request: I/O error, dev sdg,
> sector 2367
> Jun 23 01:46:12 SCRBXLPDEFRM635 kernel: (swapper,0,0):o2hb_bio_end_io:
> 237 ERROR: IO Error -5
> Jun 23 01:46:12 SCRBXLPDEFRM635 kernel:
> (o2hb-DEEDA3062A,4504,0):o2hb_do_disk_heartbeat:768 ERROR: status = -5
> Jun 23 01:46:12 SCRBXLPDEFRM635 kernel: sd 1:0:3:0: reservation conflict
> Jun 23 01:46:12 SCRBXLPDEFRM635 kernel: sd 1:0:3:0: SCSI error: return
> code = 0x0018
> Jun 23 01:46:12 SCRBXLPDEFRM635 kernel: end_request: I/O error, dev sdg,
> sector 2367
> Jun 23 01:46:12 SCRBXLPDEFRM635 kernel:
> (syslogd,4298,0):o2hb_bio_end_io:237 ERROR: IO Error -5
> Jun 23 01:46:12 SCRBXLPDEFRM635 kernel:
> (o2hb-DEEDA3062A,4504,0):o2hb_do_disk_heartbeat:768 ERROR: status = -5
> Jun 23 01:46:12 SCRBXLPDEFRM635 kernel: sd 1:0:3:0: reservation conflict
> Jun 23 01:46:12 SCRBXLPDEFRM635 kernel: sd 1:0:3:0: SCSI error: return
> code = 0x0018
> Jun 23 01:46:12 SCRBXLPDEFRM635 kernel: end_request: I/O error, dev sdg,
> sector 8259921
> Jun 23 01:46:12 SCRBXLPDEFRM635 kernel: sd 1:0:2:0: reservation conflict
> Jun 23 01:46:12 SCRBXLPDEFRM635 kernel: sd 1:0:2:0: SCSI error: return
> code = 0x0018
> Jun 23 01:46:12 SCRBXLPDEFRM635 kernel: end_request: I/O error, dev sdf,
> sector 8268113
> Jun 23 01:46:12 SCRBXLPDEFRM635 kernel: sd 1:0:2:0: reservation conflict
> Jun 23 01:46:12 SCRBXLPDEFRM635 kernel: sd 1:0:2:0: SCSI error: return
> code = 0x0018
> Jun 23 01:46:12 SCRBXLPDEFRM635 kernel: end_request: I/O error, dev sdf,
> sector 8309329
>
> we are getting Input/output errors on files on the ocfs2 FS when we try to
> copy them. Please let us know what the problem could be.
Yes, you are getting an I/O error (-5 is EIO), so this isn't related to ocfs2;
there is likely a problem with the block device that VMware provides.
Could you please check whether you can write to the device
successfully with 'dd'?
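
A minimal sketch of such a check, assuming /dev/sdg is the shared disk and
/mnt/ocfs2 is a mount point of the volume (both names are placeholders):

  # non-destructive read test straight from the device, bypassing the page cache
  dd if=/dev/sdg of=/dev/null bs=1M count=100 iflag=direct

  # write test against a scratch file on the mounted ocfs2 volume
  dd if=/dev/zero of=/mnt/ocfs2/dd_write_test bs=1M count=100 oflag=direct
  rm /mnt/ocfs2/dd_write_test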

Regards,
Tao

___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users


[Ocfs2-users] OCFS2 error.

2010-06-22 Thread veeraa bose
Hi Team,

we are getting the below error on a shared disk on a VMware guest operating system.

Jun 23 01:46:12 SCRBXLPDEFRM635 kernel: sd 1:0:3:0: reservation conflict
Jun 23 01:46:12 SCRBXLPDEFRM635 kernel: sd 1:0:3:0: SCSI error: return code
= 0x0018
Jun 23 01:46:12 SCRBXLPDEFRM635 kernel: end_request: I/O error, dev sdg,
sector 2367
Jun 23 01:46:12 SCRBXLPDEFRM635 kernel: (swapper,0,0):o2hb_bio_end_io:
237 ERROR: IO Error -5
Jun 23 01:46:12 SCRBXLPDEFRM635 kernel:
(o2hb-DEEDA3062A,4504,0):o2hb_do_disk_heartbeat:768 ERROR: status = -5
Jun 23 01:46:12 SCRBXLPDEFRM635 kernel: sd 1:0:3:0: reservation conflict
Jun 23 01:46:12 SCRBXLPDEFRM635 kernel: sd 1:0:3:0: SCSI error: return code
= 0x0018
Jun 23 01:46:12 SCRBXLPDEFRM635 kernel: end_request: I/O error, dev sdg,
sector 2367
Jun 23 01:46:12 SCRBXLPDEFRM635 kernel: (syslogd,4298,0):o2hb_bio_end_io:237
ERROR: IO Error -5
Jun 23 01:46:12 SCRBXLPDEFRM635 kernel:
(o2hb-DEEDA3062A,4504,0):o2hb_do_disk_heartbeat:768 ERROR: status = -5
Jun 23 01:46:12 SCRBXLPDEFRM635 kernel: sd 1:0:3:0: reservation conflict
Jun 23 01:46:12 SCRBXLPDEFRM635 kernel: sd 1:0:3:0: SCSI error: return code
= 0x0018
Jun 23 01:46:12 SCRBXLPDEFRM635 kernel: end_request: I/O error, dev sdg,
sector 8259921
Jun 23 01:46:12 SCRBXLPDEFRM635 kernel: sd 1:0:2:0: reservation conflict
Jun 23 01:46:12 SCRBXLPDEFRM635 kernel: sd 1:0:2:0: SCSI error: return code
= 0x0018
Jun 23 01:46:12 SCRBXLPDEFRM635 kernel: end_request: I/O error, dev sdf,
sector 8268113
Jun 23 01:46:12 SCRBXLPDEFRM635 kernel: sd 1:0:2:0: reservation conflict
Jun 23 01:46:12 SCRBXLPDEFRM635 kernel: sd 1:0:2:0: SCSI error: return code
= 0x0018
Jun 23 01:46:12 SCRBXLPDEFRM635 kernel: end_request: I/O error, dev sdf,
sector 8309329

we are getting Input/output errors on files on the ocfs2 FS when we try to
copy them. Please let us know what the problem could be.

Thanks
Veera.
___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users

Re: [Ocfs2-users] OCFS2 ERROR: status = - 107

2010-05-26 Thread Sunil Mushran

-107 means the node lost its connection with the other node. The messages
below appear to be cut-and-pasted and not in sequence, so I cannot tell for sure
what happened next. What should have happened is that the node would
then go into quorum mode followed by recovery mode.

Sunil

On 05/26/2010 05:38 AM, Francesco Gabriele wrote:

Good afternoon,
I am facing these errors on the /var/log/messages regarding ocfs:

kernel: (21371,5):dlm_drop_lockres_ref:2295 ERROR: status = -107
kernel: (21371,5):dlm_purge_lockres:189 ERROR: status = -107
kernel: (21371,5):dlm_drop_lockres_ref:2295 ERROR: status = -107
kernel: (21371,5):dlm_purge_lockres:189 ERROR: status = -107
kernel: (21371,5):dlm_drop_lockres_ref:2295 ERROR: status = -107
kernel: (21371,5):dlm_purge_lockres:189 ERROR: status = -107
kernel: (21371,5):dlm_drop_lockres_ref:2295 ERROR: status = -107
kernel: (21371,5):dlm_purge_lockres:189 ERROR: status = -107

(21312,2):dlm_get_lock_resource:966 E66ADD1149B9416E8D2B3CA50809ABE0: 
recovery map is not empty, but must master $RECOVERY lock now
(10571,0):ocfs2_replay_journal:1191 Recovering node 0 from slot 0 on 
device (8,17)
(10577,0):ocfs2_replay_journal:1191 Recovering node 0 from slot 0 on 
device (8,113)
(10576,0):ocfs2_replay_journal:1191 Recovering node 0 from slot 0 on 
device (8,97)
(10575,0):ocfs2_replay_journal:1191 Recovering node 0 from slot 0 on 
device (8,81)
(21372,3):dlm_get_lock_resource:932 
70984185BB314019A43246EB1EDCBEA0:$RECOVERY: at least one node (0) 
to recover before lock mastery can begin
(21372,3):dlm_get_lock_resource:966 70984185BB314019A43246EB1EDCBEA0: 
recovery map is not empty, but must master $RECOVERY lock now
(21360,6):dlm_get_lock_resource:932 
3A7F251A33BD448690BD4967BF9EB992:$RECOVERY: at least one node (0) 
to recover before lock mastery can begin
(21360,6):dlm_get_lock_resource:966 3A7F251A33BD448690BD4967BF9EB992: 
recovery map is not empty, but must master $RECOVERY lock now

kernel: kjournald starting. Commit interval 5 seconds
kjournald starting. Commit interval 5 seconds


Can someone tell me what kind of error is it and which can be the cause?

Thanks.

Regards,
Francesco Gabriele


___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users


___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users

Re: [Ocfs2-users] OCFS2 ERROR: status = - 107

2010-05-26 Thread David Murphy
Last time I ran into this I had to take the cluster offline and run fsck -fy
/dev/ ; this repaired the damage to the FS and the cluster was able to
start.

 

Basically it is not starting the cluster because it noticed a node is "out
of sync".
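
A rough sketch of that procedure, with a placeholder mount point and device
name (adjust both to the actual cluster):

  # on every node: stop the services using the volume, then unmount it
  umount /mnt/ocfs2

  # on one node only: force a full check and answer yes to all repairs
  fsck.ocfs2 -fy /dev/sdX1

  # remount on all nodes once the check is clean
  mount /mnt/ocfs2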

 

David

 

From: ocfs2-users-boun...@oss.oracle.com
[mailto:ocfs2-users-boun...@oss.oracle.com] On Behalf Of Francesco Gabriele
Sent: Wednesday, May 26, 2010 7:38 AM
To: ocfs2-users@oss.oracle.com
Subject: [Ocfs2-users] OCFS2 ERROR: status = - 107

 

Good afternoon,
I am facing these errors on the /var/log/messages regarding ocfs:

kernel: (21371,5):dlm_drop_lockres_ref:2295 ERROR: status = -107
kernel: (21371,5):dlm_purge_lockres:189 ERROR: status = -107
kernel: (21371,5):dlm_drop_lockres_ref:2295 ERROR: status = -107
kernel: (21371,5):dlm_purge_lockres:189 ERROR: status = -107
kernel: (21371,5):dlm_drop_lockres_ref:2295 ERROR: status = -107
kernel: (21371,5):dlm_purge_lockres:189 ERROR: status = -107
kernel: (21371,5):dlm_drop_lockres_ref:2295 ERROR: status = -107
kernel: (21371,5):dlm_purge_lockres:189 ERROR: status = -107

(21312,2):dlm_get_lock_resource:966 E66ADD1149B9416E8D2B3CA50809ABE0:
recovery map is not empty, but must master $RECOVERY lock now
(10571,0):ocfs2_replay_journal:1191 Recovering node 0 from slot 0 on device
(8,17)
(10577,0):ocfs2_replay_journal:1191 Recovering node 0 from slot 0 on device
(8,113)
(10576,0):ocfs2_replay_journal:1191 Recovering node 0 from slot 0 on device
(8,97)
(10575,0):ocfs2_replay_journal:1191 Recovering node 0 from slot 0 on device
(8,81)
(21372,3):dlm_get_lock_resource:932
70984185BB314019A43246EB1EDCBEA0:$RECOVERY: at least one node (0) to recover
before lock mastery can begin
(21372,3):dlm_get_lock_resource:966 70984185BB314019A43246EB1EDCBEA0:
recovery map is not empty, but must master $RECOVERY lock now
(21360,6):dlm_get_lock_resource:932
3A7F251A33BD448690BD4967BF9EB992:$RECOVERY: at least one node (0) to recover
before lock mastery can begin
(21360,6):dlm_get_lock_resource:966 3A7F251A33BD448690BD4967BF9EB992:
recovery map is not empty, but must master $RECOVERY lock now
kernel: kjournald starting. Commit interval 5 seconds
kjournald starting. Commit interval 5 seconds


Can someone tell me what kind of error is it and which can be the cause?

Thanks.

Regards,
Francesco Gabriele 

___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users

[Ocfs2-users] OCFS2 ERROR: status = - 107

2010-05-26 Thread Francesco Gabriele
Good afternoon,
I am facing these errors on the /var/log/messages regarding ocfs:

kernel: (21371,5):dlm_drop_lockres_ref:2295 ERROR: status = -107
kernel: (21371,5):dlm_purge_lockres:189 ERROR: status = -107
kernel: (21371,5):dlm_drop_lockres_ref:2295 ERROR: status = -107
kernel: (21371,5):dlm_purge_lockres:189 ERROR: status = -107
kernel: (21371,5):dlm_drop_lockres_ref:2295 ERROR: status = -107
kernel: (21371,5):dlm_purge_lockres:189 ERROR: status = -107
kernel: (21371,5):dlm_drop_lockres_ref:2295 ERROR: status = -107
kernel: (21371,5):dlm_purge_lockres:189 ERROR: status = -107

(21312,2):dlm_get_lock_resource:966 E66ADD1149B9416E8D2B3CA50809ABE0:
recovery map is not empty, but must master $RECOVERY lock now
(10571,0):ocfs2_replay_journal:1191 Recovering node 0 from slot 0 on device
(8,17)
(10577,0):ocfs2_replay_journal:1191 Recovering node 0 from slot 0 on device
(8,113)
(10576,0):ocfs2_replay_journal:1191 Recovering node 0 from slot 0 on device
(8,97)
(10575,0):ocfs2_replay_journal:1191 Recovering node 0 from slot 0 on device
(8,81)
(21372,3):dlm_get_lock_resource:932
70984185BB314019A43246EB1EDCBEA0:$RECOVERY: at least one node (0) to recover
before lock mastery can begin
(21372,3):dlm_get_lock_resource:966 70984185BB314019A43246EB1EDCBEA0:
recovery map is not empty, but must master $RECOVERY lock now
(21360,6):dlm_get_lock_resource:932
3A7F251A33BD448690BD4967BF9EB992:$RECOVERY: at least one node (0) to recover
before lock mastery can begin
(21360,6):dlm_get_lock_resource:966 3A7F251A33BD448690BD4967BF9EB992:
recovery map is not empty, but must master $RECOVERY lock now
kernel: kjournald starting. Commit interval 5 seconds
kjournald starting. Commit interval 5 seconds


Can someone tell me what kind of error is it and which can be the cause?

Thanks.

Regards,
Francesco Gabriele
___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users

Re: [Ocfs2-users] OCFS2 Error: Group Descriptor Mismatch

2009-03-20 Thread Jari Takkala
Hi Joel,

I emailed you and Srinivas the image file a few days ago. Can you confirm that 
you received the attachment?

Jari

- "Joel Becker"  wrote:

> > I've run o2image against the snapshot. I can email that directly to
> > you, or if there is a private FTP server you want me to upload it to
> > please let me know. It's 5.2MB compressed, 4.2GB uncompressed. Even
> > though it's metadata, I don't think I'll be able to attach it to the
> > bug report for security reasons.
> 
>   Email it to me and copy srinivas.e...@oracle.com.  Don't copy
> ocfs2-users@oss.oracle.com :-)
> 

___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users


Re: [Ocfs2-users] OCFS2 Error: Group Descriptor Mismatch

2009-03-18 Thread Joel Becker
On Wed, Mar 18, 2009 at 02:05:09PM +, Jari Takkala wrote:
> Thanks for your response. My comments are inline below.
> 
> - "Joel Becker"  wrote:
> 
> > First and foremost, can you file a bugzilla bug?  This is great
> > detail, and it should be captured there.  More comments below.
> 
> Done, bug 1090 opened, http://oss.oracle.com/bugzilla/show_bug.cgi?id=1090.

Thanks.

> > I'm guessing this was a global bitmap cluster group based on the
> > function call chain, but I'd like to verify.  Is it possible to get
> > an o2image of the volume for us to look at?  o2image should create an
> > image without data so that it's safe to send to us.
> 
> I've run o2image against the snapshot. I can email that directly to you, or 
> if there is a private FTP server you want me to upload it to please let me 
> know. It's 5.2MB compressed, 4.2GB uncompressed. Even though it's metadata, I 
> don't think I'll be able to attach it to the bug report for security reasons.

Email it to me and copy srinivas.e...@oracle.com.  Don't copy
ocfs2-users@oss.oracle.com :-)

Joel

-- 

Life's Little Instruction Book #497

"Go down swinging."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.bec...@oracle.com
Phone: (650) 506-8127

___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users


Re: [Ocfs2-users] OCFS2 Error: Group Descriptor Mismatch

2009-03-18 Thread Jari Takkala
Hi Joel,

Thanks for your response. My comments are inline below.

- "Joel Becker"  wrote:

>   First and foremost, can you file a bugzilla bug?  This is great
> detail, and it should be captured there.  More comments below.

Done, bug 1090 opened, http://oss.oracle.com/bugzilla/show_bug.cgi?id=1090.

>   All your errors are -5, or EIO.  They appear to all be coming
> from the group descriptor error, but your log is very weird - it's
> almost in the reverse order the functions are called.

>   These errors are self-consistent.  That is, the higher levels of
> the chain agree with the lower levels.  Of course, they all agree on
> the bit count at the lowest level that is wrong.  How it came to be wrong
> is the $64k question.
>   Can you attach the message logs from all nodes to the bugzilla
> bug?  Maybe one of the other nodes did something.

The logs I attached are from /var/log/messages; the order is the same in the 
dmesg buffer. I've attached the same logs to the bugzilla bug. Unfortunately 
there's nothing more that was logged during that time period than what I've 
already posted. The only thing I did not save was the hundreds of lines 
of output from the two fsck runs.

>   I'm guessing this was a global bitmap cluster group based on the
> function call chain, but I'd like to verify.  Is it possible to get
> an o2image of the volume for us to look at?  o2image should create an
> image without data so that it's safe to send to us.

I've run o2image against the snapshot. I can email that directly to you, or if 
there is a private FTP server you want me to upload it to please let me know. 
It's 5.2MB compressed, 4.2GB uncompressed. Even though it's metadata, I don't 
think I'll be able to attach it to the bug report for security reasons.

I did a quick 'fsck.ocfs2 -f' on the snapshot of the volume and it reports the 
group descriptor mismatch problem. I aborted the fsck and didn't make any 
changes. This snapshot was taken with the filesystem offline on all systems. 
Following the snapshot I brought the filesystem back online, started our 
application, and then began the 'rm -rf'.

I can do some more tests on the snapshot if necessary. At this time the 
only modification I've made is to relabel the filesystem so that it does not 
clash with the actual volume.

Thanks!

Jari

___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users


Re: [Ocfs2-users] OCFS2 Error: Group Descriptor Mismatch

2009-03-17 Thread Joel Becker
On Tue, Mar 17, 2009 at 11:37:29AM +, Jari Takkala wrote:
> We recently ran into an issue with another one of our OCFS2 clusters where 
> OCFS2 detected on-disk corruption. The filesystem in question has a capacity 
> of 641GB, and I had attempted to remove about 500GB of files. The number of 
> files that I was removing would have been about 20,000.

First and foremost, can you file a bugzilla bug?  This is great
detail, and it should be captured there.  More comments below.

> I started the 'rm -rf' on host2. Shortly after that the filesystem was 
> automatically mounted read-only and the following errors logged:
> 
> Mar 15 14:31:34 host2 kernel: (2008,1):ocfs2_commit_truncate:6490 ERROR: 
> status = -5
> Mar 15 14:31:34 host2 kernel: (2008,1):ocfs2_delete_inode:974 ERROR: status = 
> -5
> Mar 15 14:31:34 host2 kernel: (2008,1):__ocfs2_flush_truncate_log:5111 ERROR: 
> status = -5
> Mar 15 14:31:34 host2 kernel: (2008,1):ocfs2_free_clusters:1842 ERROR: status 
> = -5
> Mar 15 14:31:34 host2 kernel: (2008,1):ocfs2_free_suballoc_bits:1755 ERROR: 
> status = -5
> Mar 15 14:31:34 host2 kernel: (2008,1):ocfs2_replay_truncate_records:5039 
> ERROR: status = -5
> Mar 15 14:31:34 host2 kernel: (2008,1):ocfs2_truncate_for_delete:562 ERROR: 
> status = -5
> Mar 15 14:31:34 host2 kernel: (2008,1):ocfs2_wipe_inode:733 ERROR: status = -5
> Mar 15 14:31:34 host2 kernel: File system is now read-only due to the 
> potential of on-disk corruption. Please run fsck.ocfs2 once the file system 
> is unmounted.
> Mar 15 14:31:34 host2 kernel: OCFS2: ERROR (device xvdb1): 
> ocfs2_check_group_descriptor: Group descriptor # 12257280 has bit count 32256 
> but claims that 32639 are free
> Mar 15 14:31:36 host2 kernel: (1796,1):__ocfs2_flush_truncate_log:5111 ERROR: 
> status = -5
> Mar 15 14:31:36 host2 kernel: (1796,1):ocfs2_free_clusters:1842 ERROR: status 
> = -5
> Mar 15 14:31:36 host2 kernel: (1796,1):ocfs2_free_suballoc_bits:1755 ERROR: 
> status = -5
> Mar 15 14:31:36 host2 kernel: (1796,1):ocfs2_replay_truncate_records:5039 
> ERROR: status = -5
> Mar 15 14:31:36 host2 kernel: (1796,1):ocfs2_truncate_log_worker:5150 ERROR: 
> status = -5
> Mar 15 14:31:36 host2 kernel: OCFS2: ERROR (device xvdb1): 
> ocfs2_check_group_descriptor: Group descriptor # 12257280 has bit count 32256 
> but claims that 32639 are free

All your errors are -5, or EIO.  They appear to all be coming
from the group descriptor error, but your log is very weird - it's
almost in the reverse order the functions are called.

> I unmounted the filesystem from all three nodes and ran 'ocfs2.fsck -f' on 
> it. I had to run ocfs2.fsck twice before it reported clean. A snippet of 
> the fsck output:
> 
> [GROUP_FREE_BITS] Group descriptor at block 12257280 claims to have 32639 
> free bits which is more than 32238 bits indicated by the bitmap. Drop its 
> free bit count down to the total?  y
> [CHAIN_BITS] Chain 137 in allocator inode 11 has 249271 bits marked free out 
> of 677376 total bits but the block groups in the chain have 248870 free out 
> of 677376 total.  Fix this by updating the chain record?  y
> [CHAIN_GROUP_BITS] Allocator inode 11 has 109425928 bits marked used out of 
> 167772803 total bits but the chains have 109426329 used out of 167772803 
> total.  Fix this by updating the inode counts  y

These errors are self-consistent.  That is, the higher levels of
the chain agree with the lower levels.  Of course, they all agree on the
bit count at the lowest level that is wrong.  How it came to be wrong is
the $64k question.
Can you attach the message logs from all nodes to the bugzilla
bug?  Maybe one of the other nodes did something.

> I have a snapshot of the LUN with the original data, and I can run some tests 
> on it if necessary and try to reproduce the problem. Note that we had another 
> OCFS2 issue recently 
> (http://oss.oracle.com/pipermail/ocfs2-users/2009-February/003369.html), and 
> we're still investigating that. However, that problem was on a database 
> cluster, which is on a different storage array, and it does not seem to be 
> the same problem.

I'm guessing this was a global bitmap cluster group based on the
function call chain, but I'd like to verify.  Is it possible to get an
o2image of the volume for us to look at?  o2image should create an image
without data so that it's safe to send to us.
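
A minimal sketch of creating such an image, run against the unmounted device
or a snapshot of it (the device name follows the xvdb1 device mentioned above;
the output path is just an example):

  o2image /dev/xvdb1 /tmp/xvdb1.o2i
  gzip /tmp/xvdb1.o2i    # the metadata-only image compresses well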

Joel

-- 

"Reality is merely an illusion, albeit a very persistent one."
- Albert Einstien

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.bec...@oracle.com
Phone: (650) 506-8127

___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users


[Ocfs2-users] OCFS2 Error: Group Descriptor Mismatch

2009-03-17 Thread Jari Takkala
Hello,

We recently ran into an issue with another one of our OCFS2 clusters where 
OCFS2 detected on-disk corruption. The filesystem in question has a capacity of 
641GB, and I had attempted to remove about 500GB of files. The number of files 
that I was removing would have been about 20,000.

I started the 'rm -rf' on host2. Shortly after that the filesystem was 
automatically mounted read-only and the following errors logged:

Mar 15 14:31:34 host2 kernel: (2008,1):ocfs2_commit_truncate:6490 ERROR: status 
= -5
Mar 15 14:31:34 host2 kernel: (2008,1):ocfs2_delete_inode:974 ERROR: status = -5
Mar 15 14:31:34 host2 kernel: (2008,1):__ocfs2_flush_truncate_log:5111 ERROR: 
status = -5
Mar 15 14:31:34 host2 kernel: (2008,1):ocfs2_free_clusters:1842 ERROR: status = 
-5
Mar 15 14:31:34 host2 kernel: (2008,1):ocfs2_free_suballoc_bits:1755 ERROR: 
status = -5
Mar 15 14:31:34 host2 kernel: (2008,1):ocfs2_replay_truncate_records:5039 
ERROR: status = -5
Mar 15 14:31:34 host2 kernel: (2008,1):ocfs2_truncate_for_delete:562 ERROR: 
status = -5
Mar 15 14:31:34 host2 kernel: (2008,1):ocfs2_wipe_inode:733 ERROR: status = -5
Mar 15 14:31:34 host2 kernel: File system is now read-only due to the potential 
of on-disk corruption. Please run fsck.ocfs2 once the file system is unmounted.
Mar 15 14:31:34 host2 kernel: OCFS2: ERROR (device xvdb1): 
ocfs2_check_group_descriptor: Group descriptor # 12257280 has bit count 32256 
but claims that 32639 are free
Mar 15 14:31:36 host2 kernel: (1796,1):__ocfs2_flush_truncate_log:5111 ERROR: 
status = -5
Mar 15 14:31:36 host2 kernel: (1796,1):ocfs2_free_clusters:1842 ERROR: status = 
-5
Mar 15 14:31:36 host2 kernel: (1796,1):ocfs2_free_suballoc_bits:1755 ERROR: 
status = -5
Mar 15 14:31:36 host2 kernel: (1796,1):ocfs2_replay_truncate_records:5039 
ERROR: status = -5
Mar 15 14:31:36 host2 kernel: (1796,1):ocfs2_truncate_log_worker:5150 ERROR: 
status = -5
Mar 15 14:31:36 host2 kernel: OCFS2: ERROR (device xvdb1): 
ocfs2_check_group_descriptor: Group descriptor # 12257280 has bit count 32256 
but claims that 32639 are free

I unmounted the filesystem from all three nodes and ran 'ocfs2.fsck -f' on it. 
I had to run ocfs2.fsck twice before it reported clean. A snippet of the 
fsck output:

[GROUP_FREE_BITS] Group descriptor at block 12257280 claims to have 32639 free 
bits which is more than 32238 bits indicated by the bitmap. Drop its free bit 
count down to the total?  y
[CHAIN_BITS] Chain 137 in allocator inode 11 has 249271 bits marked free out of 
677376 total bits but the block groups in the chain have 248870 free out of 
677376 total.  Fix this by updating the chain record?  y
[CHAIN_GROUP_BITS] Allocator inode 11 has 109425928 bits marked used out of 
167772803 total bits but the chains have 109426329 used out of 167772803 total. 
 Fix this by updating the inode counts  y
[CLUSTER_ALLOC_BIT] Cluster 12268148 is marked in the global cluster bitmap but 
it isn't in use.  Clear its bit in the bitmap?  y
[INODE_ORPHANED] Inode 15975682 was found in the orphan directory. Delete its 
contents and unlink it?  y

*See the attachment for all errors logged and some more output from ocfs2.fsck.

Following the fsck, I remounted the filesystem on all nodes and was able to 
delete the remainder of the files. I ran a quick test using our application on 
one of the remaining files and it appeared to be intact, with no data 
corruption.

During the maintenance our application was running, and it may have been 
writing some files to disk, however the amount would have been very small (I 
see a 493K file created at 14:34) compared to our busy times. Our monitoring 
graphs show a much lighter load than during normal operations, so the OCFS2 
filesystem was not under any unusual load apart from the 'rm -rf'.

We are running RHEL 5.2 as Xen guests on all nodes in the cluster, with kernel 
2.6.18-92.el5xen and ocfs2-2.6.18-92.el5xen-1.4.1-1.el5 installed.

I did a quick search on the mailing list for some of the errors we encountered, 
but couldn't find any results that seemed to document a similar issue.

I have a snapshot of the LUN with the original data, and I can run some tests 
on it if necessary and try to reproduce the problem. Note that we had another 
OCFS2 issue recently 
(http://oss.oracle.com/pipermail/ocfs2-users/2009-February/003369.html), and 
we're still investigating that. However, that problem was on a database 
cluster, which is on a different storage array, and it does not seem to be the 
same problem.

Has anyone seen this issue before, or does anyone have any advice on how we can 
troubleshoot it?

Regards,

Jari

[attachment: the same kernel error log plus fsck.ocfs2 output; truncated in the archive]

Re: [Ocfs2-users] OCFS2: ERROR (device sdh1): ocfs2_direct_IO_get_blocks

2009-03-01 Thread Tao Ma
Hi Daniel,

Daniel Keisling wrote:
> Patch was here:
> http://oss.oracle.com/pipermail/ocfs2-devel/2008-September/002787.html
Yes, that patch has been merged into ocfs2-1.4 and should be ready for 
the next release. Also, as Joel said, "If you have the appropriate 
support, you should call support and file that way."

Here is the workaround from the mailing list; I don't know whether it 
is suitable in your case.

I would guess that you are upgrading from ocfs2-1.2 to ocfs2-1.4. If 
that is the case, please find out which file triggers this bug. Use
debugfs.ocfs2 -R "findpath <23693699>" /dev/sdh1 to see what the file is.
If that file isn't a datafile (I think it shouldn't be, since the file is 
only 5120 bytes) and your volume is used for other files (e.g. Oracle 
Home), then please remove "datavolume" from the mount options and instead 
set the init.ora parameter filesystemio_options=directio. This should 
limit Oracle to using the O_DIRECT flag only for the files that need it. 
datavolume is legacy; please refer to the OCFS2 1.4 User's Guide for 
further information about the datavolume option.
I hope this helps.
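
As a rough illustration of that change (the mount point and fstab layout below
are examples only; the real entry depends on the installation):

  # /etc/fstab entry before, with the legacy option:
  /dev/sdh1  /u02  ocfs2  _netdev,datavolume,nointr  0 0

  # after dropping datavolume:
  /dev/sdh1  /u02  ocfs2  _netdev,nointr  0 0

  # and in the database's init.ora (or spfile):
  filesystemio_options=directio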

Regards,
Tao
> 
> [r...@wilracdbdr01 /]# debugfs.ocfs2 -R 'stat <23693699>' /dev/sdh1
> Inode: 23693699   Mode: 0660   Generation: 2707416418
> (0xa15fe562)
> FS Generation: 236416663 (0xe176e97)
> Type: Regular   Attr: 0x0   Flags: Valid
> User: 503 (oracle)   Group: 505 (dba)   Size: 5120
> Links: 1   Clusters: 2
> ctime: 0x49a88566 -- Fri Feb 27 19:29:26 2009
> atime: 0x49a88566 -- Fri Feb 27 19:29:26 2009
> mtime: 0x49a88566 -- Fri Feb 27 19:29:26 2009
> dtime: 0x0 -- Wed Dec 31 19:00:00 1969
> ctime_nsec: 0x222d308b -- 573386891
> atime_nsec: 0x21390e2d -- 557387309
> mtime_nsec: 0x222d308b -- 573386891
> Last Extblk: 0
> Sub Alloc Slot: 0   Sub Alloc Bit: 19
> Tree Depth: 0   Count: 243   Next Free Rec: 1
> ##   Offset   Clusters   Block#     Flags
> 0    0        2          34610869   0x0
> 
>  [r...@wilracdbdr01 /]# debugfs.ocfs2 -R stats /dev/sdh1
> Revision: 0.90
> Mount Count: 0   Max Mount Count: 20
> State: 0   Errors: 0
> Check Interval: 0   Last Check: Fri Feb 27 19:47:46 2009
> Creator OS: 0
> Feature Compat: 1 BackupSuper
> Feature Incompat: 0 None
> Tunefs Incomplete: 0 None
> Feature RO compat: 0 None
> Root Blknum: 5   System Dir Blknum: 6
> First Cluster Group Blknum: 3
> Block Size Bits: 12   Cluster Size Bits: 12
> Max Node Slots: 4
> Label: ph1p_arch
> UUID: 839B5D0925C74CD4920F4E8CC065D180
> Cluster stack: classic o2cb
> Inode: 2   Mode: 00   Generation: 236416663 (0xe176e97)
> FS Generation: 236416663 (0xe176e97)
> Type: Unknown   Attr: 0x0   Flags: Valid System Superblock
> User: 0 (root)   Group: 0 (root)   Size: 0
> Links: 0   Clusters: 39321087
> ctime: 0x48515d46 -- Thu Jun 12 13:30:46 2008
> atime: 0x0 -- Wed Dec 31 19:00:00 1969
> mtime: 0x48515d46 -- Thu Jun 12 13:30:46 2008
> dtime: 0x0 -- Wed Dec 31 19:00:00 1969
> ctime_nsec: 0x -- 0
> atime_nsec: 0x -- 0
> mtime_nsec: 0x -- 0
> Last Extblk: 0
> Sub Alloc Slot: Global   Sub Alloc Bit: 65535
> 
>> -Original Message-
>> From: Joel Becker [mailto:joel.bec...@oracle.com] 
>> Sent: Friday, February 27, 2009 6:50 PM
>> To: Daniel Keisling
>> Cc: ocfs2-users@oss.oracle.com; Sunil Mushran
>> Subject: Re: OCFS2: ERROR (device sdh1): ocfs2_direct_IO_get_blocks
>>
>> On Fri, Feb 27, 2009 at 06:40:38PM -0600, Daniel Keisling wrote:
>>> I am getting the following error when writing to an OCFS2 
>> filesystem:
>>>  
>>>  
>>> Feb 27 19:06:37 wilracdbdr01 kernel: OCFS2: ERROR (device sdh1):
>>> ocfs2_direct_IO_get_blocks: Inode 23693699 has a hole at block 6
>>> Feb 27 19:06:37 wilracdbdr01 kernel: File system is now 
>> read-only due to
>>> the potential of on-disk corruption. Please run fsck.ocfs2 
>> once the file
>>> system is unmounted.
>>  This basically says that your filesystem does not support sparse
>> files, but it does have a hole in an inode - which shouldn't happen if
>> sparse isn't supported.
>>  Can you send the output of "debugfs.ocfs2 -R 'stat <23693699>'
>> /dev/sdh1" and "debugfs.ocfs2 -R stats /dev/sdh1"?
>>
>>> I saw a patch that was released in September 2008.  How do 
>> I get this?
>>
>>  What patch?  Do you have a link?  Without knowing the patch I
>> can't tell you whether that patch affects you.
>>
>>> This is a production system and we are currently unable to 
>> start the DB.
>>
>>  If you have the appropriate support, you should call support and
>> file that way.  Support will also want the information I requested
>> above.
>>
>> Joel
>> -- 
>>
>> "Three o'clo

Re: [Ocfs2-users] OCFS2: ERROR (device sdh1): ocfs2_direct_IO_get_blocks

2009-02-27 Thread Daniel Keisling
Patch was here:
http://oss.oracle.com/pipermail/ocfs2-devel/2008-September/002787.html

[r...@wilracdbdr01 /]# debugfs.ocfs2 -R 'stat <23693699>' /dev/sdh1
Inode: 23693699   Mode: 0660   Generation: 2707416418
(0xa15fe562)
FS Generation: 236416663 (0xe176e97)
Type: Regular   Attr: 0x0   Flags: Valid
User: 503 (oracle)   Group: 505 (dba)   Size: 5120
Links: 1   Clusters: 2
ctime: 0x49a88566 -- Fri Feb 27 19:29:26 2009
atime: 0x49a88566 -- Fri Feb 27 19:29:26 2009
mtime: 0x49a88566 -- Fri Feb 27 19:29:26 2009
dtime: 0x0 -- Wed Dec 31 19:00:00 1969
ctime_nsec: 0x222d308b -- 573386891
atime_nsec: 0x21390e2d -- 557387309
mtime_nsec: 0x222d308b -- 573386891
Last Extblk: 0
Sub Alloc Slot: 0   Sub Alloc Bit: 19
Tree Depth: 0   Count: 243   Next Free Rec: 1
##   Offset   Clusters   Block#     Flags
0    0        2          34610869   0x0

 [r...@wilracdbdr01 /]# debugfs.ocfs2 -R stats /dev/sdh1
Revision: 0.90
Mount Count: 0   Max Mount Count: 20
State: 0   Errors: 0
Check Interval: 0   Last Check: Fri Feb 27 19:47:46 2009
Creator OS: 0
Feature Compat: 1 BackupSuper
Feature Incompat: 0 None
Tunefs Incomplete: 0 None
Feature RO compat: 0 None
Root Blknum: 5   System Dir Blknum: 6
First Cluster Group Blknum: 3
Block Size Bits: 12   Cluster Size Bits: 12
Max Node Slots: 4
Label: ph1p_arch
UUID: 839B5D0925C74CD4920F4E8CC065D180
Cluster stack: classic o2cb
Inode: 2   Mode: 00   Generation: 236416663 (0xe176e97)
FS Generation: 236416663 (0xe176e97)
Type: Unknown   Attr: 0x0   Flags: Valid System Superblock
User: 0 (root)   Group: 0 (root)   Size: 0
Links: 0   Clusters: 39321087
ctime: 0x48515d46 -- Thu Jun 12 13:30:46 2008
atime: 0x0 -- Wed Dec 31 19:00:00 1969
mtime: 0x48515d46 -- Thu Jun 12 13:30:46 2008
dtime: 0x0 -- Wed Dec 31 19:00:00 1969
ctime_nsec: 0x -- 0
atime_nsec: 0x -- 0
mtime_nsec: 0x -- 0
Last Extblk: 0
Sub Alloc Slot: Global   Sub Alloc Bit: 65535

> -Original Message-
> From: Joel Becker [mailto:joel.bec...@oracle.com] 
> Sent: Friday, February 27, 2009 6:50 PM
> To: Daniel Keisling
> Cc: ocfs2-users@oss.oracle.com; Sunil Mushran
> Subject: Re: OCFS2: ERROR (device sdh1): ocfs2_direct_IO_get_blocks
> 
> On Fri, Feb 27, 2009 at 06:40:38PM -0600, Daniel Keisling wrote:
> > I am getting the following error when writing to an OCFS2 
> filesystem:
> >  
> >  
> > Feb 27 19:06:37 wilracdbdr01 kernel: OCFS2: ERROR (device sdh1):
> > ocfs2_direct_IO_get_blocks: Inode 23693699 has a hole at block 6
> > Feb 27 19:06:37 wilracdbdr01 kernel: File system is now 
> read-only due to
> > the potential of on-disk corruption. Please run fsck.ocfs2 
> once the file
> > system is unmounted.
> 
>   This basically says that your filesystem does not support sparse
> files, but it does have a hole in an inode - which shouldn't happen if
> sparse isn't supported.
>   Can you send the output of "debugfs.ocfs2 -R 'stat <23693699>'
> /dev/sdh1" and "debugfs.ocfs2 -R stats /dev/sdh1"?
> 
> > I saw a patch that was released in September 2008.  How do 
> I get this?
> 
>   What patch?  Do you have a link?  Without knowing the patch I
> can't tell you whether that patch affects you.
> 
> > This is a production system and we are currently unable to 
> start the DB.
> 
>   If you have the appropriate support, you should call support and
> file that way.  Support will also want the information I requested
> above.
> 
> Joel
> -- 
> 
> "Three o'clock is always too late or too early for anything you
>  want to do."
> - Jean-Paul Sartre
> 
> Joel Becker
> Principal Software Developer
> Oracle
> E-mail: joel.bec...@oracle.com
> Phone: (650) 506-8127
> 
> 



___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users


[Ocfs2-users] OCFS2: ERROR (device sdh1): ocfs2_direct_IO_get_blocks

2009-02-27 Thread Daniel Keisling
I am getting the following error when writing to an OCFS2 filesystem:
 
 
Feb 27 19:06:37 wilracdbdr01 kernel: OCFS2: ERROR (device sdh1):
ocfs2_direct_IO_get_blocks: Inode 23693699 has a hole at block 6
Feb 27 19:06:37 wilracdbdr01 kernel: File system is now read-only due to
the potential of on-disk corruption. Please run fsck.ocfs2 once the file
system is unmounted.
 
I saw a patch that was released in September 2008.  How do I get this?
I am running the following:
 
[r...@wilracdbdr01 ~]# rpm -qa | grep ocfs
ocfs2console-1.4.1-1.el5
ocfs2-2.6.18-53.el5-1.2.8-2.el5
ocfs2-tools-1.4.1-1.el5
ocfs2-2.6.18-92.1.13.el5-1.4.1-1.el5

Linux wilracdbdr01.wilm.ppdi.com 2.6.18-92.1.13.el5 #1 SMP Thu Sep 4
03:51:21 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

This is a production system and we are currently unable to start the DB.
 
Thanks,

Daniel



___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users

Re: [Ocfs2-users] OCFS2: ERROR (device sdh1): ocfs2_direct_IO_get_blocks

2009-02-27 Thread Joel Becker
On Fri, Feb 27, 2009 at 06:40:38PM -0600, Daniel Keisling wrote:
> I am getting the following error when writing to an OCFS2 filesystem:
>  
>  
> Feb 27 19:06:37 wilracdbdr01 kernel: OCFS2: ERROR (device sdh1):
> ocfs2_direct_IO_get_blocks: Inode 23693699 has a hole at block 6
> Feb 27 19:06:37 wilracdbdr01 kernel: File system is now read-only due to
> the potential of on-disk corruption. Please run fsck.ocfs2 once the file
> system is unmounted.

This basically says that your filesystem does not support sparse
files, but it does have a hole in an inode - which shouldn't happen if
sparse isn't supported.
Can you send the output of "debugfs.ocfs2 -R 'stat <23693699>'
/dev/sdh1" and "debugfs.ocfs2 -R stats /dev/sdh1"?

> I saw a patch that was released in September 2008.  How do I get this?

What patch?  Do you have a link?  Without knowing the patch I
can't tell you whether that patch affects you.

> This is a production system and we are currently unable to start the DB.

If you have the appropriate support, you should call support and
file that way.  Support will also want the information I requested
above.

Joel
-- 

"Three o'clock is always too late or too early for anything you
 want to do."
- Jean-Paul Sartre

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.bec...@oracle.com
Phone: (650) 506-8127

___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users


Re: [Ocfs2-users] OCFS2 "ERROR: bad directory"

2008-01-12 Thread Luke Schierer
On Sat, Jan 12, 2008 at 07:52:13AM -0500, Vincas Čižiūnas wrote:
> We have a problem.  My friend and I have a directory that cannot be removed 
> because it has no . or .. .  Does anyone want to take a guess on what to 
> do?  fsck.ocfs2 does not fix it.  Thank you. I can provide any other 
> information you may need.  I'd like to be able to fix this directory.
>
>
> (23563,0):ocfs2_empty_dir:306 ERROR: bad directory (dir #18886251) - no `.' 
> or `..'
>
> --V

We also cannot move or rename the directory.  If you try to move it,
it complains about moving a directory to a subdirectory of itself, ie:

[EMAIL PROTECTED]:~/todelete$ mv Wedding/ test
mv: cannot move `Wedding/' to a subdirectory of itself, `test'
[EMAIL PROTECTED]:~/todelete$

and rm complains that the directory is not empty even with the -rf
flag.

Any ideas on how to further debug and fix this would be greatly
appreciated.
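
One read-only way to look more closely at the broken directory is with
debugfs.ocfs2, using the inode number from the error message (the device path
below is a placeholder):

  debugfs.ocfs2 -R 'stat <18886251>' /dev/sdX1
  debugfs.ocfs2 -R 'ls <18886251>' /dev/sdX1

This only inspects the inode and its directory entries; actually repairing the
missing '.' and '..' entries would still need fsck.ocfs2 or guidance from the
developers.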

Thanks,
luke


___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users


[Ocfs2-users] OCFS2 "ERROR: bad directory"

2008-01-12 Thread Vincas Čižiūnas
We have a problem.  My friend and I have a directory that cannot be  
removed because it has no . or .. .  Does anyone want to take a guess  
on what to do?  fsck.ocfs2 does not fix it.  Thank you. I can provide  
any other information you may need.  I'd like to be able to fix this  
directory.



(23563,0):ocfs2_empty_dir:306 ERROR: bad directory (dir #18886251) -  
no `.' or `..'


--V


___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users


Re: [Ocfs2-users] ocfs2 error messages

2006-10-31 Thread Sunil Mushran

So it is bug#790. It just may be a case of unnecessary error messages
for you. I am still investigating it.

Matthew Flusche wrote:

Yes, one of the clustered file systems is shared with nfs.

-Original Message-
From: Sunil Mushran [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 31, 2006 12:25 PM

To: Matthew Flusche
Cc: ocfs2-users@oss.oracle.com
Subject: Re: [Ocfs2-users] ocfs2 error messages

Are you using NFS by any chance? I am looking into bug#790
that also encounters the same error (ESTALE).

Matthew Flusche wrote:
  
I received the following error messages in the system logs.  Is this 
anything to be concerned with?


 

kernel: (4074,0):ocfs2_populate_inode:234 ERROR: Invalid dinode: 
i_ino=1293597, i_blkno=1293597, signature = INODE01, flags = 0x0


kernel: (4074,0):ocfs2_read_locked_inode:389 ERROR: populate inode 
failed! i_blkno=1293597, i_ino=1293597


kernel: (4074,0):ocfs2_iget:131 ERROR: status = -116

kernel: (4074,0):ocfs2_iget:141 ERROR: status = -116

kernel: (4074,0):ocfs2_get_dentry:63 ERROR: status = -116

 

This is a three node cluster, no other error messages on any of the 
other nodes. 

 


System Information

RHEL 4U4 2.6.9-42.0.2 kernel

ocfs2console-1.2.1-1

ocfs2-tools-debuginfo-1.2.1-1

ocfs2-2.6.9-42.0.2.ELhugemem-1.2.3-1

ocfs2-tools-1.2.1-1

 


Thanks,

 


Matt





  

___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users
  



___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users


RE: [Ocfs2-users] ocfs2 error messages

2006-10-31 Thread Matthew Flusche
Yes, one of the clustered file systems is shared with nfs.

-Original Message-
From: Sunil Mushran [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 31, 2006 12:25 PM
To: Matthew Flusche
Cc: ocfs2-users@oss.oracle.com
Subject: Re: [Ocfs2-users] ocfs2 error messages

Are you using NFS by any chance? I am looking into bug#790
that also encounters the same error (ESTALE).

Matthew Flusche wrote:
>
> I received the following error messages in the system logs.  Is this 
> anything to be concerned with?
>
>  
>
> kernel: (4074,0):ocfs2_populate_inode:234 ERROR: Invalid dinode: 
> i_ino=1293597, i_blkno=1293597, signature = INODE01, flags = 0x0
>
> kernel: (4074,0):ocfs2_read_locked_inode:389 ERROR: populate inode 
> failed! i_blkno=1293597, i_ino=1293597
>
> kernel: (4074,0):ocfs2_iget:131 ERROR: status = -116
>
> kernel: (4074,0):ocfs2_iget:141 ERROR: status = -116
>
> kernel: (4074,0):ocfs2_get_dentry:63 ERROR: status = -116
>
>  
>
> This is a three node cluster, no other error messages on any of the 
> other nodes. 
>
>  
>
> System Information
>
> RHEL 4U4 2.6.9-42.0.2 kernel
>
> ocfs2console-1.2.1-1
>
> ocfs2-tools-debuginfo-1.2.1-1
>
> ocfs2-2.6.9-42.0.2.ELhugemem-1.2.3-1
>
> ocfs2-tools-1.2.1-1
>
>  
>
> Thanks,
>
>  
>
> Matt
>
>

>
> ___
> Ocfs2-users mailing list
> Ocfs2-users@oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>   

___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users


Re: [Ocfs2-users] ocfs2 error messages

2006-10-31 Thread Sunil Mushran

Are you using NFS by any chance? I am looking into bug#790
that also encounters the same error (ESTALE).

Matthew Flusche wrote:


I received the following error messages in the system logs.  Is this 
anything to be concerned with?


 

kernel: (4074,0):ocfs2_populate_inode:234 ERROR: Invalid dinode: 
i_ino=1293597, i_blkno=1293597, signature = INODE01, flags = 0x0


kernel: (4074,0):ocfs2_read_locked_inode:389 ERROR: populate inode 
failed! i_blkno=1293597, i_ino=1293597


kernel: (4074,0):ocfs2_iget:131 ERROR: status = -116

kernel: (4074,0):ocfs2_iget:141 ERROR: status = -116

kernel: (4074,0):ocfs2_get_dentry:63 ERROR: status = -116

 

This is a three node cluster, no other error messages on any of the 
other nodes. 

 


System Information

RHEL 4U4 2.6.9-42.0.2 kernel

ocfs2console-1.2.1-1

ocfs2-tools-debuginfo-1.2.1-1

ocfs2-2.6.9-42.0.2.ELhugemem-1.2.3-1

ocfs2-tools-1.2.1-1

 


Thanks,

 


Matt



___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users
  


___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users


[Ocfs2-users] ocfs2 error messages

2006-10-31 Thread Matthew Flusche








I received the following error messages in the system
logs.  Is this anything to be concerned with?

 

kernel: (4074,0):ocfs2_populate_inode:234 ERROR: Invalid
dinode: i_ino=1293597, i_blkno=1293597, signature = INODE01, flags = 0x0

kernel: (4074,0):ocfs2_read_locked_inode:389 ERROR: populate
inode failed! i_blkno=1293597, i_ino=1293597

kernel: (4074,0):ocfs2_iget:131 ERROR: status = -116

kernel: (4074,0):ocfs2_iget:141 ERROR: status = -116

kernel: (4074,0):ocfs2_get_dentry:63 ERROR: status = -116

 

This is a three node cluster, no other error messages on any
of the other nodes.  

 

System Information

RHEL 4U4 2.6.9-42.0.2 kernel

ocfs2console-1.2.1-1

ocfs2-tools-debuginfo-1.2.1-1

ocfs2-2.6.9-42.0.2.ELhugemem-1.2.3-1

ocfs2-tools-1.2.1-1

 

Thanks,

 

Matt






___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users