Re: [Ocfs2-users] 'No space left on device' error with plenty of space.

2010-06-25 Thread Jason Price
'mainline' means 'the current release of the Linux kernel', not 'patched down
into the distro level'.

I'm guessing here, but I think the 'production release' would mean OCFS2
v1.4.

--Jason

On Fri, Jun 25, 2010 at 3:12 PM, Eric Raskin  wrote:

> Thanks very much.  I'm not really sure about the difference between
> "mainline" and "production release".  Are we looking at days, weeks, or
> months? :-)

___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users

Re: [Ocfs2-users] 'No space left on device' error with plenty of space.

2010-06-25 Thread Eric Raskin
Thanks very much.  I'm not really sure about the difference between
"mainline" and "production release".  Are we looking at days, weeks, or
months? :-)

   Eric

Joel Becker wrote:
>   That will help for a little while, but you will run into the
> problem as you get nearer full.  The solution is in mainline, we're
> working on bringing it to production releases.
>
> Joel

-- 

Eric H. Raskin                          Voice:  914-765-0500 x120
Professional Advertising Systems Inc.   Fax:    914-765-0503
200 Business Park Dr Ste 304            EMail:  eras...@paslists.com
Armonk, NY 10504                        Web:    www.paslists.com


Re: [Ocfs2-users] 'No space left on device' error with plenty of space.

2010-06-25 Thread Joel Becker
On Fri, Jun 25, 2010 at 02:03:59PM -0400, Eric Raskin wrote:
> In the meantime, I'm planning on moving data off the file system,
> re-creating it, then moving the data back on again.  I'm hoping that
> will "defragment" it enough to allow it to continue working for a while.
> 
> Does anyone know if I'm wasting my time?  Will it just have the same
> problem when I put the data back?

That will help for a little while, but you will run into the
problem as you get nearer full.  The solution is in mainline, we're
working on bringing it to production releases.

Joel

-- 

 Joel's Second Law:

If a code change requires additional user setup, it is wrong.

Joel Becker
Consulting Software Developer
Oracle
E-mail: joel.bec...@oracle.com
Phone: (650) 506-8127



Re: [Ocfs2-users] 'No space left on device' error with plenty of space.

2010-06-25 Thread Joel Becker
On Fri, Jun 25, 2010 at 01:49:28PM -0400, Jason Price wrote:
> I've updated bug #1263.  I am still periodically getting ENOSPC errors
> out of this file system.  Apparently bugzilla isn't accepting attachments at
> this moment, so I'll attach the current stat_sysdir.sh output.
> 
> At this point, I can't continue having these errors in production.  Next
> week, I'll begin migrating away from OCFS2.  If you have any ideas for tests
> or remedies, I'll be happy to run some tests before I begin the migration.

The solution is in the mainline Linux kernel now.  We're working
on bringing it to production releases.

Joel

-- 

"People with narrow minds usually have broad tongues."




Re: [Ocfs2-users] 'No space left on device' error with plenty of space.

2010-06-25 Thread Eric Raskin
Unfortunately, we have also just started having the same issue and are
anxiously awaiting a solution.  So far I've just deleted things I
didn't absolutely need in order to keep our production system running,
but that won't work forever.

The filesystem was created with a size of 750GB.  We've currently got
529GB of data on it (71%).  When it gets near 80%, it stops allowing us
to create files.
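Eric's percentages can be checked directly: 529 GB of 750 GB is about 70.5% (rounded to 71% above), and the ~80% point where file creation started failing corresponds to 600 GB, leaving roughly 71 GB of headroom. A small Python sketch of that arithmetic; the 0.80 threshold is Eric's approximate observation, not a documented OCFS2 limit, and the helper names are hypothetical.

```python
# Figures quoted above: a 750 GB volume holding 529 GB of data, with file
# creation failing as usage approaches roughly 80% (an observed value,
# not a documented OCFS2 limit).
SIZE_GB = 750
USED_GB = 529
FAILURE_THRESHOLD = 0.80


def usage_pct(used_gb: float, size_gb: float) -> float:
    """Percentage of the volume in use."""
    return 100.0 * used_gb / size_gb


def headroom_gb(used_gb: float, size_gb: float, threshold: float) -> float:
    """Data that can still be added before usage reaches `threshold`."""
    return threshold * size_gb - used_gb


print(f"usage: {usage_pct(USED_GB, SIZE_GB):.1f}%")  # usage: 70.5%
print(f"headroom: {headroom_gb(USED_GB, SIZE_GB, FAILURE_THRESHOLD):.0f} GB")  # headroom: 71 GB
```

As Joel notes, that headroom only buys time: the problem returns as the volume fills up again.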

In the meantime, I'm planning on moving data off the file system,
re-creating it, then moving the data back on again.  I'm hoping that
will "defragment" it enough to allow it to continue working for a while.

Does anyone know if I'm wasting my time?  Will it just have the same
problem when I put the data back?

Thanks in advance...

   Eric

Jason Price wrote:
> I've updated bug #1263.  I am still periodically getting ENOSPC
> errors out of this file system.  Apparently bugzilla isn't accepting
> attachments at this moment, so I'll attach the current stat_sysdir.sh
> output.
>
> At this point, I can't continue having these errors in production.
> Next week, I'll begin migrating away from OCFS2.  If you have any
> ideas for tests or remedies, I'll be happy to run some tests before I
> begin the migration.
>
> --Jason



Re: [Ocfs2-users] 'No space left on device' error with plenty of space.

2010-06-25 Thread Jason Price
I've updated bug #1263.  I am still periodically getting ENOSPC errors
out of this file system.  Apparently bugzilla isn't accepting attachments at
this moment, so I'll attach the current stat_sysdir.sh output.

At this point, I can't continue having these errors in production.  Next
week, I'll begin migrating away from OCFS2.  If you have any ideas for tests
or remedies, I'll be happy to run some tests before I begin the migration.

--Jason

Re: [Ocfs2-users] 'No space left on device' error with plenty of space.

2010-06-10 Thread Jason Price
(Sorry Tao, I realized I had replied only to you.)

I just uploaded a third output from stat_sysfs.sh to bug # 1263.  It was
taken while we were experiencing ENOSPC errors.  In my limited testing, I
was able to write a 324k file, then a 1620k file (5x324), but failed to
write a 16200k file (10x1620).
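Jason's probe (a 324k file, then 5x that, then 10x that) is easy to script so it can be rerun while the errors are occurring. A minimal Python sketch, assuming a scratch directory on the affected OCFS2 volume; the function name `try_write_sizes` and the probe file names are hypothetical. Each file is flushed and fsync'd so an ENOSPC that would otherwise only surface at writeback should be caught, and each probe is deleted afterwards so the test does not itself consume space.

```python
import errno
import os


def try_write_sizes(directory, sizes_kib):
    """Write one zero-filled file per size (in KiB) and record the outcome.

    Returns a dict mapping size -> True (written) or False (ENOSPC).
    Any other OSError is re-raised so it is not mistaken for ENOSPC.
    """
    results = {}
    chunk = b"\0" * 1024  # 1 KiB of zeros
    for size in sizes_kib:
        path = os.path.join(directory, f"enospc_probe_{size}k.bin")
        try:
            with open(path, "wb") as f:
                for _ in range(size):
                    f.write(chunk)
                f.flush()
                os.fsync(f.fileno())  # force allocation errors to surface now
            results[size] = True
        except OSError as e:
            if e.errno != errno.ENOSPC:
                raise
            results[size] = False
        finally:
            # Remove the probe so repeated runs don't fill the volume.
            if os.path.exists(path):
                os.remove(path)
    return results
```

For example, `try_write_sizes("/mnt/ocfs2/scratch", [324, 1620, 16200])` (hypothetical mount point) would repeat the test above and return which sizes succeeded.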

I should also put the stat_sysfs outputs in context.  Here's a rough timeline:

Monday morning: we start experiencing ENOSPC errors.  I start researching
while node1 limps along (no traffic on node2), and take a stat_sysfs output
(the one I posted last to bug # 1163; it is also posted under bug # 1159).
This is when I ran the file size test mentioned above.

I found bug # 1159 and scheduled emergency downtime to run
"tunefs.ocfs2 -N 3" on the cluster.  Everything works fine, with traffic
still on node1.  Writing large files (60-70 MB) works just fine at this time.

Wednesday early morning: again we start seeing ENOSPC errors.  I fail traffic
over to node2 and unmount/remount the OCFS2 volume on node1, then take
stat_sysfs.sh outputs on both nodes (these are the first two that I posted to
bug # 1163) and continue researching.  After failing over to node2, writing
large files works again.

Wednesday around 11: ENOSPC errors start appearing again.  I take the
opportunity to upgrade node1 to v1.4.7, fail traffic over to node1, then
upgrade node2 to v1.4.7.  We haven't seen the problem since (granted, that's
less than 24 hours).

This problem mostly affects users attempting to write files via FTP.  The
FTP daemon's log files show 'No space left on device' errors, but they
don't tell me the sizes of the files that are failing.


Re: [Ocfs2-users] 'No space left on device' error with plenty of space.

2010-06-09 Thread Tao Ma
Hi Jason,

On 06/09/2010 11:34 PM, Jason Price wrote:
> And now it's starting to fail again.
What's the situation now?
I checked your stat_sysfs output, and it looks like you have free space for
inodes, extent allocation and the local alloc (though the kernel may not have
flushed its metadata to disk yet, and stat_sysfs only reads what is on disk).
So why are you hitting ENOSPC? Can you describe it in more detail? Do you hit
it when touching a new file, when cat'ing some bytes into a file, or ...?
Once you find the failing scenario, please enable the debugfs.ocfs2 log masks
below so that we can find the real cause:
debugfs.ocfs2 -l INODE allow
debugfs.ocfs2 -l DISK_ALLOC allow
(run your test case here)
debugfs.ocfs2 -l INODE off
debugfs.ocfs2 -l DISK_ALLOC off
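The enable / run / disable sequence above can be wrapped so the log masks are always switched back off, even if the test case fails partway. A minimal Python sketch, assuming `debugfs.ocfs2` is on the PATH and run with sufficient privileges; only the `-l MASK allow|off` arguments come from the commands above, while the function name `with_ocfs2_tracing` and its injectable `run` parameter are hypothetical.

```python
import subprocess

# Log-mask names taken from the commands above.
MASKS = ("INODE", "DISK_ALLOC")


def with_ocfs2_tracing(test_case, run=subprocess.run):
    """Enable the INODE and DISK_ALLOC log masks, call test_case(), and
    switch every mask that was enabled back off, even if test_case() raises.

    `run` defaults to subprocess.run; it is a parameter only so the command
    sequence can be checked without OCFS2 installed.
    """
    enabled = []
    try:
        for mask in MASKS:
            run(["debugfs.ocfs2", "-l", mask, "allow"], check=True)
            enabled.append(mask)
        return test_case()
    finally:
        for mask in enabled:
            # Best effort: keep going so one failure doesn't leave a mask on.
            run(["debugfs.ocfs2", "-l", mask, "off"], check=False)
```

Here `test_case` is any zero-argument callable that reproduces the failing write (for example, a hypothetical `my_failing_write`), so tracing covers exactly that window.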

Regards,
Tao




Re: [Ocfs2-users] 'No space left on device' error with plenty of space.

2010-06-09 Thread Jason Price
And now it's starting to fail again.

--Jason


[Ocfs2-users] 'No space left on device' error with plenty of space.

2010-06-09 Thread Jason Price
I've got a busy FTP/Web cluster running OCFS2 v1.4.4.

I've started getting "No space left on device" errors when users attempt to
write to the file system.  Disk utilization is about 76%, with more than
100 GB free.  Inode utilization is also at 76%.
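The two utilization figures above are the kind of numbers `df` and `df -i` report. A minimal sketch of reading the same values through Python's `os.statvfs`, assuming a Unix system; the helper name `fs_utilization` is hypothetical, and the block percentage roughly follows GNU df's used / (used + available) convention. As this thread shows, healthy-looking numbers here do not rule out ENOSPC.

```python
import os


def fs_utilization(path):
    """Return (block_pct_used, free_gib, inode_pct_used) for the filesystem
    holding `path`. inode_pct_used is None if the filesystem reports no
    inode total (some filesystems report 0 for f_files)."""
    st = os.statvfs(path)
    used = st.f_blocks - st.f_bfree
    # Roughly df's "Use%": used / (used + space available to non-root users).
    denom = used + st.f_bavail
    block_pct = 100.0 * used / denom if denom else 0.0
    # Space available to unprivileged users, converted to GiB.
    free_gib = st.f_bavail * st.f_frsize / 2**30
    inode_pct = None
    if st.f_files:
        inode_pct = 100.0 * (st.f_files - st.f_ffree) / st.f_files
    return block_pct, free_gib, inode_pct
```

Calling `fs_utilization("/mnt/ocfs2")` (hypothetical mount point) gives a quick snapshot to log alongside each ENOSPC occurrence.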

I thought this was a manifestation of bug # 1189, so I decreased the number
of node slots via tunefs.ocfs2 -N from 8 (the default) down to 3 (there are
only 2 nodes in the cluster, with no growth anticipated).

That got me out of the woods on Monday, but this morning the problem
manifested again.

I've opened bug # 1263 about this issue. (link:
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1263 )

Does anyone have other ideas?

I'm more than happy to supply other information.

What seems to happen is that small writes are allowed, but bigger writes
fail.  On Monday, I could write multiple 325 KB files, and I could cat them
together to make one file of ~2 MB, but when I tried to make a ~10 MB
file, it failed.
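The pattern described here (small writes succeed, larger ones fail, while df shows plenty free) is what you would expect if some allocation needs a contiguous run of free space and the free space is fragmented into small holes; Eric's "defragment" plan and Joel's reply later in the thread point in the same direction. The thread does not spell out OCFS2's exact mechanism, so the toy bitmap model below only illustrates the general idea and is not OCFS2's allocator. All names are hypothetical.

```python
def largest_free_run(bitmap):
    """Length of the longest run of free (False) units in an allocation bitmap."""
    best = run = 0
    for used in bitmap:
        run = 0 if used else run + 1
        best = max(best, run)
    return best


def can_allocate_contiguous(bitmap, units):
    """True if `units` contiguous free units exist somewhere in the bitmap."""
    return largest_free_run(bitmap) >= units


# Toy volume of 20 units with every other unit in use: half the space is
# free, but it is scattered in 1-unit holes.
bitmap = [i % 2 == 0 for i in range(20)]
print(bitmap.count(False), largest_free_run(bitmap))  # 10 1
print(can_allocate_contiguous(bitmap, 1))  # True: a small request fits
print(can_allocate_contiguous(bitmap, 4))  # False: no 4-unit hole despite 10 free
```

In this model, copying everything off and back (as Eric plans) coalesces the free space again, but it fragments anew as the volume fills, which matches Joel's answer.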

--Jason