Why max_debt isn't used in ext4's find_group_orlov(...) ?

2007-07-19 Thread Yan Zheng

Hi all

max_debt is used in ext2's find_group_orlov.  In ext4's
find_group_orlov, max_debt is only computed, but not used.  I wonder
whether it's a typo.  Can anyone give me an answer? The kernel source I
read is 2.6.22.

Thanks in advance.


Best Regards

YZ
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Why max_debt isn't used in ext4's find_group_orlov(...) ?

2007-07-19 Thread Kalpak Shah
On Thu, 2007-07-19 at 21:00 +0800, Yan Zheng wrote:
> Hi all
>
> max_debt is used in ext2's find_group_orlov.  In ext4's
> find_group_orlov, max_debt is only computed, but not used.  I wonder
> whether it's a typo.  Can anyone give me an answer? The kernel source I
> read is 2.6.22.

I think you are right, max_debt is unused in the current ext3/4
code. In ext2, max_debt is used along with s_debts (in struct
ext2_sb_info) to decide inode allocations. s_debts is no longer present
in ext3/4.

You can find the reason for removing s_debts here:
http://osdir.com/ml/file-systems.ext2.devel/2004-09/msg00027.html

Basically, the use of s_debts was unsafe for two reasons. It performed
unlocked byte increments/decrements on words which may be accessed
simultaneously on other CPUs. And since it was a dynamic in-memory
table, it had to be extended during online resize, which needed proper
locking.
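
For reference, a minimal sketch of the ext2 logic under discussion: how
max_debt is derived and then compared against the per-group s_debts byte.
It is modelled on fs/ext2/ialloc.c but is not a copy of it. The helper
names compute_max_debt and group_is_over_debt are hypothetical, and the
INODE_COST and BLOCK_COST values are assumptions that should be checked
against the tree being read.

```c
#include <assert.h>

/* Assumed values of the ext2 cost constants; verify against ialloc.c. */
#define INODE_COST 64
#define BLOCK_COST 256

/* Hypothetical helper: the debt ceiling computed for one allocation. */
static int compute_max_debt(int blocks_per_group, int blocks_per_dir,
                            int inodes_per_group)
{
    int divisor, max_debt;

    assert(blocks_per_group > 0 && inodes_per_group > 0);

    divisor = blocks_per_dir > BLOCK_COST ? blocks_per_dir : BLOCK_COST;
    max_debt = blocks_per_group / divisor;

    if (max_debt * INODE_COST > inodes_per_group)
        max_debt = inodes_per_group / INODE_COST;
    if (max_debt > 255)      /* each s_debts entry is a single byte */
        max_debt = 255;
    if (max_debt == 0)
        max_debt = 1;
    return max_debt;
}

/* Hypothetical helper: a group whose debt reached the ceiling is skipped. */
static int group_is_over_debt(const unsigned char *debts, int group,
                              int max_debt)
{
    return debts[group] >= max_debt;
}
```

In ext3/4 only the first half survives: the value is computed, but with
s_debts gone there is nothing left to compare it against.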

Unless s_debts is added back, I think the code relating to max_debt
should be removed.


Thanks,
Kalpak.

 
> Thanks in advance.
>
> Best Regards
>
> YZ


e2fsck bogus error report on orphan-list

2007-07-19 Thread Ryoichi . Kato
Hi,
I hit a problem with orphan-list handling in ext3/e2fsck.

The following sequence produces a bogus e2fsck error report:
/dev/XXX: Inodes that were part of a corrupted orphan linked list found.

   1. Delete a file in an ext3 filesystem in early 1970
   2. Set RTC to 2007, and then mount/write the filesystem.
   3. Run e2fsck (with -f)

This is because the i_dtime (deletion time) field is also used as the
next-pointer of the orphan list (storing an inode number rather than a
time), and e2fsck handles it improperly.
You will have the same problem if you run e2fsck on an ext3
filesystem with 1.2+ billion files in it. (Is it possible?)
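
The ambiguity can be sketched in a few lines of C. This is a simplified
model, not e2fsck source: the function names are hypothetical, and the
only assumption is that a nonzero i_dtime smaller than the inode count
cannot be told apart from an orphan-list link.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: i_dtime holds either a deletion time (seconds since
 * 1970) or the number of the next inode on the orphan list.  A nonzero
 * value below the inode count could be either one.
 */
static int dtime_is_ambiguous(uint32_t i_dtime, uint32_t inodes_count)
{
    assert(inodes_count > 0);
    return i_dtime != 0 && i_dtime < inodes_count;
}

/* Whole days after the epoch during which a deletion time is ambiguous. */
static uint32_t ambiguous_window_days(uint32_t inodes_count)
{
    return inodes_count / 86400u;   /* 86400 seconds per day */
}
```

With one million inodes the window covers roughly the first 11 days of
1970. With 1.2 billion inodes it covers about 13888 days, which reaches
past July 2007 (around 1.18 billion seconds after the epoch), so even a
correct deletion time would look like an inode number.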

For more detail, please take a look at a document I wrote:
 - http://tree.celinuxforum.org/CelfPubWiki/Ext3OrphanedInodeProblem
 - http://tree.celinuxforum.org/CelfPubWiki/JapanTechnicalJamboree15?action=AttachFile&do=get&target=ext3orphaned-inode.ppt
 (Sorry for .PPT)


So, my questions are:

 *Is this really a bug (or design defect)?

 *Which of ext3 or e2fsck is responsible for the problem?
- I feel that e2fsck is, but it needs help from ext3 to solve it elegantly.

 *How should I (we) deal with this problem?
- As a workaround, it can be avoided by setting the RTC
  to 2007 or so before doing any ext3 operation.

Thank you.
--
Ryoichi KATO [EMAIL PROTECTED]
System Design Dept. No4
Audio Development  Engineering Div.
Sony Corporation Audio Business Group
Tel +81-3-3599-3862 / Fax +81-3-3599-3859


Re: e2fsck bogus error report on orphan-list

2007-07-19 Thread Theodore Tso
On Fri, Jul 20, 2007 at 12:39:19AM +0900, [EMAIL PROTECTED] wrote:
> Hi,
> I hit a problem of ext3/e2fsck on orphan-list handling.

Wow, I'm rather impressed that this was sufficient for a presentation
at a conference.  You could have just sent me e-mail.  :-)

 
> The following sequence produces bogus e2fsck error report:
> /dev/XXX: Inodes that were part of a corrupted orphan linked list found.
>
>    1. Delete a file in an ext3 filesystem in early 1970

Dare I ask *why* the system clock was set in the 1970's?  Umm... don't
do that.

>    2. Set RTC to 2007, and then mount/write the filesystem.

There is code that detects when the time is set back in the 1970's
(normally due to a bad clock battery) and thus disables this
particular check.  So it only triggers when the clock was previously
bad, and is now good.
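
A rough model of that interaction, in C. This is a sketch under stated
assumptions, not the e2fsck code: the function names are hypothetical,
and using the inode count as the threshold for "the clock was stuck near
1970" is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: skip the low-dtime check when the superblock's own
 * write time is so small that the clock was evidently stuck near 1970 the
 * last time the filesystem was written.
 */
static int low_dtime_check_enabled(uint32_t sb_wtime, uint32_t inodes_count)
{
    return sb_wtime >= inodes_count;
}

/* Returns 1 if this model would report a corrupted orphan list entry. */
static int flags_inode(uint32_t i_dtime, uint32_t sb_wtime,
                       uint32_t inodes_count)
{
    assert(inodes_count > 0);
    if (!low_dtime_check_enabled(sb_wtime, inodes_count))
        return 0;
    return i_dtime != 0 && i_dtime < inodes_count;
}
```

In this model a file deleted at t=3600 goes unreported while the
superblock time is also in 1970, and is reported once the filesystem has
been written with a 2007 timestamp.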

> This is because i_dtime (deletion time) field is also used as a
> next-pointer of an orphan-list (stores inode number rather than time),
> and e2fsck handles it improperly.
> You will have the same problem if you run e2fsck on an ext3
> filesystem with 1.2+ billion files in it. (Is it possible?)

It's *possible* but in practice no one does it, because the fsck times
if the filesystem had that many inodes would be pretty scary --- and
there will always be times when you must run fsck --- for example, if
you have hardware induced corruption and you need to salvage the
filesystem because your backups had failed (or you weren't doing
backups :-).


The net is that the check is basically a sanity check to make sure any
bugs in the orphaned list handling would be discovered, although it can
also trigger if there is block device corruption where part of the
inode table is corrupted.  I had added heuristics that for most people
meant that it never triggered, so I'm surprised that it actually did
in your environment.  Still, if it did, the easiest thing to do is to
just turn it off.

We haven't had bugs in that area of the code for a long time, and if
it's actually causing you trouble, the simplest thing to do is to just
comment out the check.  That, or just make sure that the time is
correct, which is generally a good idea anyway.  Hmm, maybe I should
add an e2fsck configuration parameter:

[options]
unreliable_system_clock = 1

which would disable the various heuristics that assume that the system
clock can be trusted.

 - Ted


[Take 2]e2fsprogs: Undo I/O manager

2007-07-19 Thread Aneesh Kumar K.V
This patch fixes some bugs found during testing of the large
inode migration patches.

-aneesh




[PATCH 2/2] e2fsprogs: Add undoe2fs

2007-07-19 Thread Aneesh Kumar K.V
From: Aneesh Kumar K.V [EMAIL PROTECTED]

undoe2fs can be used to replay the transactions saved
in the transaction file by the undo I/O manager.

Signed-off-by: Aneesh Kumar K.V [EMAIL PROTECTED]
---
 misc/Makefile.in |   10 +-
 misc/undoe2fs.c  |   77 ++
 2 files changed, 85 insertions(+), 2 deletions(-)
 create mode 100644 misc/undoe2fs.c

diff --git a/misc/Makefile.in b/misc/Makefile.in
index ccad78c..51bb17a 100644
--- a/misc/Makefile.in
+++ b/misc/Makefile.in
@@ -15,7 +15,7 @@ INSTALL = @INSTALL@
 @[EMAIL PROTECTED] e2image.8
 
 SPROGS=mke2fs badblocks tune2fs dumpe2fs blkid logsave \
-   $(E2IMAGE_PROG) @FSCK_PROG@ 
+   $(E2IMAGE_PROG) @FSCK_PROG@  undoe2fs
 USPROGS=   mklost+found filefrag
 SMANPAGES= tune2fs.8 mklost+found.8 mke2fs.8 dumpe2fs.8 badblocks.8 \
e2label.8 findfs.8 blkid.8 $(E2IMAGE_MAN) \
@@ -39,6 +39,7 @@ E2IMAGE_OBJS= e2image.o
 FSCK_OBJS= fsck.o base_device.o
 BLKID_OBJS=blkid.o
 FILEFRAG_OBJS= filefrag.o
+UNDOE2FS_OBJS=  undoe2fs.o
 
 XTRA_CFLAGS=   -I$(srcdir)/../e2fsck -I.
 
@@ -47,7 +48,7 @@ SRCS= $(srcdir)/tune2fs.c $(srcdir)/mklost+found.c $(srcdir)/mke2fs.c \
$(srcdir)/badblocks.c $(srcdir)/fsck.c $(srcdir)/util.c \
$(srcdir)/uuidgen.c $(srcdir)/blkid.c $(srcdir)/logsave.c \
$(srcdir)/filefrag.c $(srcdir)/base_device.c \
-   $(srcdir)/../e2fsck/profile.c
+   $(srcdir)/../e2fsck/profile.c $(srcdir)/undoe2fs.c
 
 LIBS= $(LIBEXT2FS) $(LIBCOM_ERR) 
 DEPLIBS= $(LIBEXT2FS) $(LIBCOM_ERR) 
@@ -108,6 +109,10 @@ e2image: $(E2IMAGE_OBJS) $(DEPLIBS)
@echo  LD $@
@$(CC) $(ALL_LDFLAGS) -o e2image $(E2IMAGE_OBJS) $(LIBS) $(LIBINTL)
 
+undoe2fs: $(UNDOE2FS_OBJS) $(DEPLIBS)
+   @echo  LD $@
+   @$(CC) $(ALL_LDFLAGS) -o undoe2fs $(UNDOE2FS_OBJS) $(LIBS)
+
 base_device: base_device.c
@echo  LD $@
@$(CC) $(ALL_CFLAGS) $(ALL_LDFLAGS) $(srcdir)/base_device.c \
@@ -434,3 +439,4 @@ filefrag.o: $(srcdir)/filefrag.c
 base_device.o: $(srcdir)/base_device.c $(srcdir)/fsck.h
 profile.o: $(srcdir)/../e2fsck/profile.c $(top_srcdir)/lib/et/com_err.h \
  $(srcdir)/../e2fsck/profile.h prof_err.h
+undoe2fs.o: $(srcdir)/undoe2fs.c $(top_srcdir)/lib/ext2fs/tdb.h
diff --git a/misc/undoe2fs.c b/misc/undoe2fs.c
new file mode 100644
index 000..d14d44a
--- /dev/null
+++ b/misc/undoe2fs.c
@@ -0,0 +1,77 @@
+/*
+ * Copyright IBM Corporation, 2007
+ * Author Aneesh Kumar K.V [EMAIL PROTECTED]
+ *
+ * %Begin-Header%
+ * This file may be redistributed under the terms of the GNU Public
+ * License.
+ * %End-Header%
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <fcntl.h>
+#if HAVE_ERRNO_H
+#include <errno.h>
+#endif
+#include "ext2fs/tdb.h"
+
+void usage(char *prg_name)
+{
+	fprintf(stderr,
+		"Usage: %s <transaction file> <filesystem>\n", prg_name);
+	exit(1);
+
+}
+
+
+main(int argc, char *argv[])
+{
+	TDB_CONTEXT *tdb;
+	TDB_DATA key, data;
+	unsigned long  blk_num;
+	unsigned long long int location;
+	int fd, retval;
+
+	if (argc != 3)
+		usage(argv[0]);
+
+	tdb = tdb_open(argv[1], 0, 0, O_RDONLY, 0600);
+
+	if (!tdb) {
+		fprintf(stderr, "Failed tdb_open %s\n",  strerror(errno));
+		exit(1);
+	}
+
+	fd = open(argv[2], O_WRONLY);
+	if (fd  == -1) {
+		fprintf(stderr, "Failed open %s\n", strerror(errno));
+		exit(1);
+	}
+
+	for (key = tdb_firstkey(tdb); key.dptr; key = tdb_nextkey(tdb, key)) {
+		data = tdb_fetch(tdb, key);
+		if (!data.dptr) {
+			fprintf(stderr,
+				"Failed tdb_fetch %s\n", tdb_errorstr(tdb));
+			exit(1);
+		}
+		blk_num = *(unsigned long *)key.dptr;
+		location = blk_num * data.dsize;
+		printf("Replayed transaction of size %d at location %ld\n",
+			data.dsize, blk_num);
+		retval = lseek(fd, location, SEEK_SET);
+		if (retval == -1) {
+			fprintf(stderr, "Failed lseek %s\n", strerror(errno));
+			exit(1);
+		}
+		retval = write(fd, data.dptr, data.dsize);
+		if (retval == -1) {
+			fprintf(stderr, "Failed write %s\n", strerror(errno));
+			exit(1);
+		}
+	}
+	close(fd);
+	tdb_close(tdb);
+
+}
-- 
1.5.3.rc2-dirty
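
The replay loop in misc/undoe2fs.c above writes each saved block back at
byte offset blk_num * data.dsize. The following is a minimal standalone
sketch of that step, assuming a hypothetical struct undo_record in place
of the tdb key/data pair so that it builds without libext2fs. It is not
part of the patch. It computes the offset in off_t, so the product does
not wrap where unsigned long is 32 bits, and it uses pwrite() so that no
separate lseek() is needed.

```c
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64   /* ask for a 64-bit off_t on 32-bit systems */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for one (key, data) pair fetched from the tdb file. */
struct undo_record {
    uint64_t blk_num;    /* block number (the tdb key) */
    const char *data;    /* saved block contents (the tdb value) */
    size_t size;         /* number of bytes saved for this block */
};

/* Write one saved block back at byte offset blk_num * size. */
static int replay_record(int fd, const struct undo_record *rec)
{
    off_t offset = (off_t)rec->blk_num * (off_t)rec->size;
    const char *p = rec->data;
    size_t left = rec->size;

    while (left > 0) {
        ssize_t n = pwrite(fd, p, left, offset);
        if (n <= 0)
            return -1;          /* error, or no progress */
        p += n;
        left -= (size_t)n;
        offset += n;
    }
    return 0;
}

/* Replay every record onto the file at path; 0 on success, -1 on failure. */
static int replay_all(const char *path, const struct undo_record *recs,
                      size_t count)
{
    int fd, rc = 0;
    size_t i;

    assert(path != NULL && (recs != NULL || count == 0));
    fd = open(path, O_WRONLY);
    if (fd == -1)
        return -1;
    for (i = 0; i < count && rc == 0; i++)
        rc = replay_record(fd, &recs[i]);
    if (close(fd) == -1)
        rc = -1;
    return rc;
}
```

Looping on short writes is a design choice of the sketch; the patch
issues a single write() per block.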



Re: e2fsck bogus error report on orphan-list

2007-07-19 Thread Ryoichi KATO
At Thu, 19 Jul 2007 12:55:10 -0400,
Theodore Tso wrote:
> On Fri, Jul 20, 2007 at 12:39:19AM +0900, [EMAIL PROTECTED] wrote:
> > Hi,
> > I hit a problem of ext3/e2fsck on orphan-list handling.
>
> Wow, I'm rather impressed that this was sufficient for a presentation
> at a conference.  You could have just sent me e-mail.  :-)

I know it's a rare case for most people, and I'm not sure
it is a 'bug', but I thought it might happen more often for CE people.
So I asked for the opinions of CE people in a lightning session of the
CELF Technical Jamboree.


> > 1. Delete a file in an ext3 filesystem in early 1970
>
> Dare I ask *why* the system clock was set in the 1970's?  Umm... don't
> do that.

As Tim pointed out, embedded devices often omit the RTC battery.


> > 2. Set RTC to 2007, and then mount/write the filesystem.
>
> There is code that detects when the time is set back in the 1970's
> (normally due to a bad clock battery) and thus disables this
> particular check.  So it only triggers when the clock was previously
> bad, and is now good.

Actually, it's a *real* problem that happened with my car navigation
product. Until a GPS signal is available, its clock is set to 1970.
And for servers and PCs, it's possible that the RTC backup battery runs
out and the clock is then set correctly afterward by, say, NTP.


> The net is that the check is basically a sanity check to make sure any
> bugs in the orphaned list handling would be discovered, although it can
> also trigger if there is block device corruption where part of the
> inode table is corrupted.  I had added heuristics that for most people
> meant that it never triggered, so I'm surprised that it actually did
> in your environment.  Still, if it did, the easiest thing to do is to
> just turn it off.

Now that the cause of the problem is known, it's easy.
But let me point out that:

 * It is very difficult to relate the RTC to the problem.
   There is no clue without digging into the e2fsck source code.

 * The -p (preen) option of e2fsck doesn't fix it automatically.
   I'm not sure, but maybe it's safe to correct the
   problem automatically?


Actually, it took me several weeks to solve, because it is rare.
My system only resets the RTC on a hardware reset or when the main
battery runs out, not on a software reset. But it can happen.


Thank you.
--
Ryoichi KATO [EMAIL PROTECTED]
System Design Dept. No4
Audio Development  Engineering Div.
Sony Corporation Audio Business Group
Tel +81-3-3599-3862 / Fax +81-3-3599-3859


Re: [PATCH] Faster ext2_clear_inode()

2007-07-19 Thread Andrew Morton
On Mon, 9 Jul 2007 22:00:03 +0200
Jörn Engel [EMAIL PROTECTED] wrote:

> On Mon, 9 July 2007 22:01:48 +0400, Alexey Dobriyan wrote:
> >
> > Yes. Note that ext2_clear_inode() is referenced from ext2_sops, so even
> > empty, it leaves traces in resulting kernel.
>
> Is that your opinion or have you actually measured a difference?
> I strongly suspect that compilers are smart enough to optimize away a
> call to an empty static function.
>

It saves a big 16 bytes of text here.