Re: kernel BUG at /usr/src/sources/linux-2.6.16-rc5/fs/reiser4/plugin/file/tail_conversion.c:29

2006-03-15 Thread Alexander Zarochentsev
Hello,

please try the attached patch.

On Wednesday 15 March 2006 10:50, Christian Trefzer wrote:
 Hi everyone,

 I got this half an hour ago, with some processes left in D state,
 namely ooffice.bin and two instances of procmail, as this happened on
 my /home LV:

 kernel BUG at
 /usr/src/sources/linux-2.6.16-rc5/fs/reiser4/plugin/file/tail_conversion.c:29!
 invalid opcode:  [#1]
 PREEMPT
 Modules linked in: mga drm w83781d hwmon_vid hwmon i2c_isa
 snd_seq_midi snd_pcm_oss snd_mixer_oss snd_seq_oss snd_seq_midi_event
 snd_seq snd_cmipci snd_opl3_lib snd_hwdep snd_mpu401_uart ohci_hcd
 floppy sr_mod cdrom pata_via i2c_viapro aic7xxx scsi_transport_spi
 ehci_hcd uhci_hcd 3c59x mii snd_ens1370 gameport snd_rawmidi
 snd_seq_device snd_pcm snd_timer snd_ak4531_codec snd soundcore
 snd_page_alloc via_agp agpgart usbcore xfs exportfs reiser4 ext2 loop
 lp parport_pc parport rtc psmouse reiserfs dm_mod raid5 raid1 xor
 md_mod pata_pdc2027x libata sd_mod scsi_mod unix
 CPU:0
 EIP:0060:[f2daa02d]Not tainted VLI
 EFLAGS: 00010286   (2.6.16-rc5 #10)
 EIP is at get_exclusive_access+0x31/0x44 [reiser4]
 eax: b26d6c04   ebx:    ecx: ec54bbf4   edx: b736fdc0
 esi: 3dbf3000   edi: 6c85   ebp: 6c85   esp: ded36f0c
 ds: 007b   es: 007b   ss: 0068
 Process soffice.bin (pid: 12533, threadinfo=ded36000 task=b3b7b070)
 Stack: 0f2da83da  c52ca544 e52935a8 7000 b014c75f
 c52ca544 e7b3cc80 b0151cd1 b6b54354 b6b5434c 3dbf3000 ed113360
 e6a02160 b26d6bc0 ec54bc4c ec54bbf4  6c85 0001
  ec54bc00  6c85
 Call Trace:
 [f2da83da] write_unix_file+0x1ba/0x60c [reiser4]
 [f2da8220] write_unix_file+0x0/0x60c [reiser4]
 [b0101135] syscall_call+0x7/0xb
 Code: ff 21 e0 8b 00 8b 80 b0 04 00 00 8b 40 40 8b 50 08 85 d2 75 16
 ba 01 00 ff ff 89 c8 0f c1 10 85 d2 75 12 c7 41 24 01 00 00 00 c3
 0f 0b 1d 00 04 8c dc f2 eb e0 51 e8 13 ac 35 bd 59 eb e5 55 89


 I had another occurrence of something that looked similar at first
 glance, repeatedly grinding my laptop to a halt when I was on a trip.
 The only way to make it go away was to wipe the device by dd'ing
 /dev/zero to it. Not even a tar backup followed by mkfs did the job;
 otherwise I could have left out the word "repeatedly"...

 The only explanation I could imagine other than a serious problem in
 the reiser4 code is a soft bad block relocated by the drive upon write,
 but there was nothing like a read error in the logs. Furthermore, I
 wanted the gurus to know, since it has occurred to me more than once.


 Thanks for your time!
 Chris



 FYI, here comes something about the disk, including SMART error log:


 /dev/sda:

 ATA device, with non-removable media
   Model Number:   SAMSUNG SV1203N
   Serial Number:  S01CJ10Y410901
   Firmware Revision:  TQ100-30
 Standards:
   Supported: 7 6 5 4
   Likely used: 7
 Configuration:
   Logical         max     current
   cylinders       16383   16383
   heads           16      16
   sectors/track   63      63
   --
   CHS current addressable sectors:   16514064
   LBA    user addressable sectors:  234493056
   LBA48  user addressable sectors:  234493056
   device size with M = 1024*1024:  114498 MBytes
   device size with M = 1000*1000:  120060 MBytes (120 GB)
 Capabilities:
   LBA, IORDY(can be disabled)
   Queue depth: 1
   Standby timer values: spec'd by Standard, no device specific minimum
   R/W multiple sector transfer: Max = 16  Current = 16
   Recommended acoustic management value: 254, current value: 254
   DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
        Cycle time: min=120ns recommended=120ns
   PIO: pio0 pio1 pio2 pio3 pio4
        Cycle time: no flow control=240ns  IORDY flow control=120ns
 Commands/features:
   Enabled Supported:
  *READ BUFFER cmd
  *WRITE BUFFER cmd
  *Host Protected Area feature set
  *Look-ahead
  *Write cache
  *Power Management feature set
   Security Mode feature set
  *SMART feature set
  *FLUSH CACHE EXT command
  *Mandatory FLUSH CACHE command
  *Device Configuration Overlay feature set
  *48-bit Address feature set
  *Automatic Acoustic Management feature set
   SET MAX security extension
  *DOWNLOAD MICROCODE cmd
  *SMART self-test
  *SMART error logging
 Security:
   Master password revision code = 65534
   supported
   not enabled
   not locked
   not frozen
   not expired: security count
   supported: enhanced erase
   56min for SECURITY ERASE UNIT. 56min for ENHANCED SECURITY ERASE UNIT.
 HW reset results:
   CBLID- above Vih
   Device num = 0 determined by the jumper
 Checksum: correct



 smartctl version 5.33 [i386-pc-linux-gnu] Copyright (C) 

Re: kernel BUG at /usr/src/sources/linux-2.6.16-rc5/fs/reiser4/plugin/file/tail_conversion.c:29

2006-03-15 Thread Christian Trefzer
Hi Alexander,

On Wed, Mar 15, 2006 at 11:11:49AM +0300, Alexander Zarochentsev wrote:
 please try the attached patch.

Wow, that was FAST!

Will do ASAP, but for now I'll have to _hurry_ to work. I'll build a
kernel with your patch right when I come back.

Thanks a bunch!

Chris





Re: kernel BUG at /usr/src/sources/linux-2.6.16-rc5/fs/reiser4/plugin/file/tail_conversion.c:29

2006-03-15 Thread Christian Trefzer
Hi again,

small update and clarification: I wiped the laptop's volume on which the
problem occurred, but my workstation's has _not yet_ been wiped. But:

kernel BUG at 
/usr/src/sources/linux-2.6.16-rc5/fs/reiser4/plugin/file/tail_conversion.c:29!
invalid opcode:  [#1]
PREEMPT 
Modules linked in: mga drm usb_storage libusual w83781d hwmon_vid hwmon i2c_isa 
snd_seq_midi snd_pcm_oss snd_mixer_oss snd_seq_oss snd_seq_midi_event snd_seq 
snd_cmipci snd_opl3_lib snd_hwdep snd_mpu401_uart ohci_hcd floppy sr_mod cdrom 
pata_via i2c_viapro aic7xxx scsi_transport_spi ehci_hcd uhci_hcd 3c59x mii 
snd_ens1370 gameport snd_rawmidi snd_seq_device snd_pcm snd_timer 
snd_ak4531_codec snd soundcore snd_page_alloc via_agp agpgart usbcore xfs 
exportfs reiser4 ext2 loop lp parport_pc parport rtc psmouse reiserfs dm_mod 
raid5 raid1 xor md_mod pata_pdc2027x libata sd_mod scsi_mod unix
CPU:0
EIP:0060:[f2daa02d]Not tainted VLI
EFLAGS: 00010282   (2.6.16-rc5 #10) 
EIP is at get_exclusive_access+0x31/0x44 [reiser4]
eax: bdb66604   ebx:    ecx: d44cd294   edx: d09041e0
esi: 3d762000   edi: 6c85   ebp: 6c85   esp: c33c9f0c
ds: 007b   es: 007b   ss: 0068
Process soffice.bin (pid: 687, threadinfo=c33c9000 task=cfc0da90)
Stack: 0f2da83da  e6578e8c d2f12ee8 7000 b014c75f e6578e8c 
b19e8900 
   b0151cd1 d025fda8 d025fd9c 3d762000 c1465440 b7c775c0 bdb665c0 d44cd2ec 
   d44cd294  6c85 0001  d44cd2a0  6c85 
Call Trace:
 [f2da83da] write_unix_file+0x1ba/0x60c [reiser4]
 [f2da8220] write_unix_file+0x0/0x60c [reiser4]
 [b0101135] syscall_call+0x7/0xb
Code: ff 21 e0 8b 00 8b 80 b0 04 00 00 8b 40 40 8b 50 08 85 d2 75 16 ba 01 00 
ff ff 89 c8 0f c1 10 85 d2 75 12 c7 41 24 01 00 00 00 c3 0f 0b 1d 00 04 8c dc 
f2 eb e0 51 e8 13 ac 35 bd 59 eb e5 55 89 
 6lp0: ECP mode

This is what I got with the laptop's disk in a USB case and the home
volume from there mounted as /home. Processes stuck in D state were all
trying to write to /home. I have to mention that the volume on the
laptop disk which had problems before was _not_ home, so this is not a
repetition of something I've seen before. Basically, I have now reset my
sysctl values regarding the vm subsystem to the default values and will
try to reproduce.

Workstation has 1GB of RAM and is trying to build OOo2 from source at
the time, but writing the build tree to a totally different volume. The
build process also is not interrupted by this BUG.


Thanks,

Chris





Re: fatal not directory found error at pass 3a of reiserfsck

2006-03-15 Thread Alain Knaff

Vitaly Fertman wrote:
if rebuilding the tree has been finished, you do not need to rebuild
it again, you can merely double check the fs consistency with --check
(default) option.



Even though 3.6.19 apparently succeeded with its rebuild-tree, 3.6.20 with 
--check still found errors, so we had to run this with rebuild-tree as well.


But that run finished all right, and succeeded in retrieving one additional 
file, which 3.6.19 hadn't.


Many thanks,

Alain


Re: State of the Reiser4 FS

2006-03-15 Thread Hans Reiser
Avuton Olrich wrote:

On 3/15/06, Hans Reiser [EMAIL PROTECTED] wrote:
  

Avuton Olrich wrote:


I just saw a thread on the LKML a minute ago asking about the state of
getting the patch into vanilla linux. I read Andrew Morton's post
about a month ago stating that it could happen soon, but was unlikely
due to there not actually being a need for it to go into mainline (no
major distro default, etc...).

  

Can you supply a reference to this post?  The only distro which is not
influenced by performance numbers when selecting a filesystem is RedHat,
and most of the rest are just waiting to be sure that politics will not
kill reiser4 inclusion.  I am sure I can come up with a "we will support
it if you let it in" petition of distros if such silliness is needed.



I was referring to this post:
http://marc.theaimsgroup.com/?l=linux-kernel&m=113775878722100&w=2

Thanks for all the answers
--
avuton
--
 Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
  

Oh, well, the overall tone of that email is not all that negative.

We will work on the 4k at a time issue, overcome that issue technically,
and then after that is resolved deal with generating desire for a
filesystem that is 2x (reiser4.0) to 4x (4.1alpha with compression) faster.


Permission denied while accessing pseudos

2006-03-15 Thread Yoanis Gil Delgado
Hi there. I've been playing around with pseudos, and whenever I try to access a
pseudo dir or file I get a "Permission denied" error. For example:
$ ls hello.txt//rwx
Permission denied
I also tried
$ ls hello.txt/..plugins/rwx
Permission denied
I also tried to list a missing plugin and I get the same error. For example:
$ ls hello.txt//pepito
Permission denied

Thanks.




Re: Permission denied while accessing pseudos

2006-03-15 Thread Edward Shishkin

Yoanis Gil Delgado wrote:

Hi there. I've been playing around with pseudos and whenever i try to access a 
pseudo dir or file i get a Permission denied error. For example:

$ ls hello.txt//rwx
Permission denied
I also tried 
	$ ls hello.txt/..plugins/rwx

 Permission denied
I also tried to list a missing plugin and i get the same error. For example
$ ls hello.txt//pepito
Permission denied

Thanks.
 



Hello.
What patch do you use to enable pseudo?



Re: State of the Reiser4 FS

2006-03-15 Thread Andreas Schäfer
On 10:29 Wed 15 Mar , Hans Reiser wrote:
 Tell the mosix guys we would be willing to cooperate with them regarding
 their problem.

If it was that easy... The problem for openMosix is that most devices
fetch data in 4k blocks via copy_from_user(). For migrated processes,
openMosix intercepts these calls and forwards them to the node which
currently hosts the process. This forwarding yields a high latency
penalty.

Obviously there are two ways to get rid of this problem: 

* modify _every_ Linux device driver to use a
  _a_lot_more_than_4k_at_a_time_ approach or

* implement a second read ahead buffer which fetches large blocks via
  the network in the background and answers calls to copy_from_user()
  directly from the local buffer

In my _very_ humble opinion the first approach would be much nicer,
but after you guys had so much trouble with just your filesystem, I
don't see that one coming, not at all.

So I think the long term strategy for oM will be the second, double
buffering approach. At least I couldn't think of any other realistic,
feasible way.
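
The double-buffering idea above can be sketched in user space. This is only an illustration under stated assumptions, not openMosix code: `remote_src`, `ra_buf`, `remote_fetch` and `ra_read` are hypothetical names, the 64 KiB chunk size is an arbitrary choice, and the refill is synchronous here, whereas the message proposes fetching in the background.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SMALL_REQ 4096u          /* request size named in the thread */
#define CHUNK     (64u * 1024u)  /* assumed read-ahead size, illustration only */

/* Hypothetical stand-in for memory that lives on another node. */
struct remote_src {
	const unsigned char *data;
	size_t len;
	unsigned round_trips;	/* counts simulated network fetches */
};

/* Local read-ahead buffer holding one chunk of the remote data. */
struct ra_buf {
	unsigned char chunk[CHUNK];
	size_t start;	/* offset of chunk[0] within the remote data */
	size_t valid;	/* number of valid bytes in chunk */
};

/* Simulated expensive fetch: one round trip per call, whatever the size. */
size_t remote_fetch(struct remote_src *src, size_t off,
		    unsigned char *dst, size_t n)
{
	if (off >= src->len)
		return 0;
	if (n > src->len - off)
		n = src->len - off;
	memcpy(dst, src->data + off, n);
	src->round_trips++;
	return n;
}

/* Serve a small read from the local chunk, refilling it on a miss. */
size_t ra_read(struct ra_buf *ra, struct remote_src *src, size_t off,
	       unsigned char *dst, size_t n)
{
	size_t pos;

	assert(n <= CHUNK);
	if (off < ra->start || off + n > ra->start + ra->valid) {
		ra->start = off;
		ra->valid = remote_fetch(src, off, ra->chunk, CHUNK);
	}
	pos = off - ra->start;
	if (pos >= ra->valid)
		return 0;
	if (n > ra->valid - pos)
		n = ra->valid - pos;
	memcpy(dst, ra->chunk + pos, n);
	return n;
}
```

Reading 256 KiB sequentially in 4k pieces through `ra_read` costs four simulated round trips in this model instead of sixty-four, which is the latency saving the second approach is after.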

BTW: how are you guys planning to solve this 4k issue? Will you revert
to small blocks or will you pretend to perform 4k transfers and
assemble those in the background to, again, process large chunks at
once? If yes, wouldn't this seriously increase CPU usage due to
(most likely) unnecessary data duplication?

Regards
-Andreas


Re: Permission denied while accessing pseudos

2006-03-15 Thread Yoanis Gil Delgado
On Wednesday 15 March 2006 14:04, you wrote:
 Yoanis Gil Delgado wrote:
 Hi there. I've been playing around with pseudos and whenever i try to
  access a pseudo dir or file i get a Permission denied error. For example:
  $ ls hello.txt//rwx
  Permission denied
 I also tried
  $ ls hello.txt/..plugins/rwx
   Permission denied
 I also tried to list a missing plugin and i get the same error. For example
  $ ls hello.txt//pepito
  Permission denied
 
 Thanks.

 Hello.
 What patch do you use to enable pseudo?

I use reiser4-2.6.15-enable-metas.diff, available on the Namesys FTP server.



Re: kernel BUG at /usr/src/sources/linux-2.6.16-rc5/fs/reiser4/plugin/file/tail_conversion.c:29

2006-03-15 Thread Christian Trefzer
Hi zam,

after the first build of OOo has finished (as a gift for my laptop) the
machine is working on its own copy, after a reboot with the exact same
setup that yielded my BUG problems, plus your patch applied. I'll keep an
eye on my syslog. Sorry for taking so long!

Regards,
Chris





Re: Permission denied while accessing pseudos

2006-03-15 Thread Edward Shishkin

Yoanis Gil Delgado wrote:


On Wednesday 15 March 2006 14:04, you wrote:
 


Yoanis Gil Delgado wrote:
   


Hi there. I've been playing around with pseudos and whenever i try to
access a pseudo dir or file i get a Permission denied error. For example:
$ ls hello.txt//rwx
Permission denied
I also tried
$ ls hello.txt/..plugins/rwx
 Permission denied
I also tried to list a missing plugin and i get the same error. For example
$ ls hello.txt//pepito
Permission denied

Thanks.
 


Hello.
What patch do you use to enable pseudo?
   


I use, reiser4-2.6.15-enable-metas.diff available at the namesys ftp.

 



In order to access the pseudos of a regular file, the file itself must
have executable permission.

Set it with chmod(1) and try again.
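
For anyone who wants to do the same from a program rather than from the shell, here is a minimal sketch using the standard stat(2) and chmod(2) calls. The helper name `add_owner_exec` is made up for illustration, and the sketch only sets the owner-execute bit; whether that bit alone is enough for the pseudo lookup is an assumption, since the message only says "executable permission".

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Add the owner-execute bit to path, like `chmod u+x path`.
 * Returns 0 on success, -1 on failure (after reporting the error).
 */
int add_owner_exec(const char *path)
{
	struct stat st;

	assert(path != NULL);
	if (stat(path, &st) != 0) {
		perror("stat");
		return -1;
	}
	/* keep the existing permission bits and add S_IXUSR */
	if (chmod(path, (st.st_mode & 07777) | S_IXUSR) != 0) {
		perror("chmod");
		return -1;
	}
	return 0;
}
```

After `add_owner_exec("hello.txt")` succeeds, the `ls` commands quoted above can be retried.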

Thanks,
Edward.


Re: State of the Reiser4 FS

2006-03-15 Thread Hans Reiser
Jonathan Briggs wrote:

On Tue, 2006-03-14 at 23:14 -0800, Hans Reiser wrote:
[snip]
  

They claim that if we don't use the ext3 code
in our fs then they will be forced to shoulder an extra burden to
maintain our code.  We are not allowed to specify that they should not
maintain our code at all.  I need to read more Kafka I think, it is hard
for me to understand it all.



Err, this actually does make a lot of sense Hans.

The mainline Linux Kernel code is maintained by everyone that can
convince Linus or a sub-maintainer to accept their patch.  In order to
  

I am the reiserfs/reiser4 sub-maintainer.  So, if reiser4 works well,
and is faster than any other Linux FS, and it is,  maintaining it over
time is for me to worry about, not them. 


Re: State of the Reiser4 FS

2006-03-15 Thread Andreas Dilger
On Mar 15, 2006  20:27 +0100, Andreas Schäfer wrote:
 If it was that easy... The problem for openMosix is that most devices
 fetch data in 4k blocks via copy_from_user(). For migrated processes,
 openMosix intercepts these calls and forwards them to the node which
 currently hosts the process. This forwarding yields a high latency
 penalty.
 
 Obviously there are two ways to get rid of this problem: 
 
 * modify _every_ Linux device driver to use a
   _a_lot_more_than_4k_at_a_time_ approach or
 
 * implement a second read ahead buffer which fetches large blocks via
   the network in the background and answers calls to copy_from_user()
   directly from the local buffer

Or you can use a network filesystem like Lustre that handles this
itself ;-).  Sadly, though, it has to do both of these to get
good performance, via {sub,per}version of the VFS/VM.

Clients do delayed-write (writeback cache, with write credits from
the server to account for space) to avoid small RPCs.  They also
do large amounts of readahead (in large chunks) to improve reads
for applications and the VM that breaks up all reads into 4kB chunks.

Servers also do batch block allocation and then large direct writes
instead of going through the VFS/VM.  There are still a number of
device drivers that break up bios into chunks smaller than 1MB, and
that hurts performance.

Having a generic delayed/batch allocation mechanism is definitely
the right way to go, and from my reading of linux-fsdevel this is
underway by some folks at IBM.  Since we have to support customers
dating back to 2.4.21 it will be a while before we can move over to
the newer APIs, once they are available.

 BTW: how are you guys planning to solve this 4k issue? Will you revert
 to small blocks or will you pretend to perform 4k transfers and
 assemble those in the background to, again, process large chunks at
 once? If yes, wouldn't this seriously increase CPU usage due to
 (most likely) unnecessary data duplication?

It doesn't result in data duplication, per se, since the pages are
copied into kernel space only once.  What it does mean is that there
needs to be a duplication of infrastructure in order to reassemble
and track all of these pages.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



Re: State of the Reiser4 FS

2006-03-15 Thread Andreas Schäfer
On 12:43 Wed 15 Mar , Hans Reiser wrote:
 I am the reiserfs/reiser4 sub-maintainer.  So, if reiser4 works well,
 and is faster than any other Linux FS, and it is,  maintaining it over
 time is for me to worry about, not them. 

I feel this thread is about to trail off to shores we all know too
well. AFAICS we do have two completely different issues here: 

* The core maintainers want the whole code to adhere to certain
  standards. This doesn't have anything to do with performance
  etc. It's simply that adhering to this standard is a sign of both
  reliability and maintainability (even in the unlikely case that
  Namesys were to disappear)

* Reiser4 doesn't adhere to some of these standards because they don't
  make much sense from a performance (and design) point of view. 

I think the short term solution should be to adapt Reiser4 to the
standard, but in the long run keep bugging the Linux people to change
some paradigms (as one of Linux' core advantages has always been the
ability and willingness to throw decayed code overboard).

When you think about it, both POVs do make sense. It's just so sad this
whole debate has become much more of a political debate than a style debate.

-Andreas



Static overrun in reiser3

2006-03-15 Thread Jeff Mahoney


Hi Hans -

I've been playing around with the Coverity code checker, and while I
think it still sees a few too many false positives, it's a good tool.

Anyway, one of the potential bugs it came up with in reiserfs was this one:

struct tree_balance contains a number of arrays of size MAX_HEIGHT (5).
In fix_nodes(), line 2502, we see:
	p_s_tb->insert_size[n_h + 1] =
		(DC_SIZE + KEY_SIZE) * (p_s_tb->blknum[n_h] - 1);

I haven't run a thorough analysis, but is it possible for n_h to be 4
there, and then n_h + 1 would be 5, overrunning into the next field of
struct tree_balance? The tool seems to think so, but it also thought
that not checking that dentry->d_inode != NULL after calling
inode->i_op->mkdir was invalid, even though a successful return value
implies that dentry->d_inode != NULL.
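
To make the question concrete, here is a small stand-alone sketch of the pattern. It is not the reiserfs code: `tb_sketch` is a hypothetical stand-in for struct tree_balance with just two adjacent arrays, and the helper names are invented; only MAX_HEIGHT being 5 and the `n_h + 1` index come from the description above.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HEIGHT 5	/* value quoted in the message */

/* Hypothetical stand-in for struct tree_balance: two adjacent arrays. */
struct tb_sketch {
	int insert_size[MAX_HEIGHT];
	int blknum[MAX_HEIGHT];
};

/* Returns 1 if writing insert_size[n_h + 1] stays inside the array. */
int next_level_in_bounds(int n_h)
{
	return n_h >= 0 && n_h + 1 < MAX_HEIGHT;
}

/*
 * Guarded form of the assignment: refuses the write instead of
 * spilling into the field that follows insert_size.
 * Returns 0 on success, -1 if n_h + 1 would be out of range.
 */
int set_next_insert_size(struct tb_sketch *tb, int n_h, int unit)
{
	assert(tb != NULL);
	if (!next_level_in_bounds(n_h))
		return -1;
	tb->insert_size[n_h + 1] = unit * (tb->blknum[n_h] - 1);
	return 0;
}
```

With n_h equal to 3 the write lands in the last valid slot; with n_h equal to 4 the guard rejects it, which is exactly the case the checker is worried about. Whether fix_nodes() can actually reach that point with n_h equal to 4 is the open question.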

-Jeff

--
Jeff Mahoney
SUSE Labs


Re: Static overrun in reiser3

2006-03-15 Thread Hans Reiser
Jeff Mahoney wrote:


 Hi Hans -

 I've been playing around with the Coverity code checker, and while I
 think it still sees a few too many false positives, it's a good tool.

Thanks for doing that work!  If you could do it for V4, that would be
great too.  If not, maybe Edward could do it.


 Anyway, one of the potential bugs it came up with in reiserfs was this
 one:

 struct tree_balance contains a number of arrays of size MAX_HEIGHT (5).
 In fix_nodes(), line 2502, we see:
 	p_s_tb->insert_size[n_h + 1] =
 		(DC_SIZE + KEY_SIZE) * (p_s_tb->blknum[n_h] - 1);

 I haven't run a thorough analysis, but is it possible for n_h to be 4
 there, and then n_h + 1 would be 5, overrunning into the next field of
 struct tree_balance? The tool seems to think so, but it also thought
 that not checking that dentry->d_inode != NULL after calling
 inode->i_op->mkdir was invalid, even though a successful return value
 implies that dentry->d_inode != NULL.

I'll let vs answer this.


 -Jeff

 --
 Jeff Mahoney
 SUSE Labs



Re: Static overrun in reiser3

2006-03-15 Thread Jeff Mahoney

Hans Reiser wrote:
 Jeff Mahoney wrote:
 
 Hi Hans -

 I've been playing around with the Coverity code checker, and while I
 think it still sees a few too many false positives, it's a good tool.
 
 Thanks for doing that work!  If you could do it for V4, that would be
 great too.  If not, maybe Edward could do it.

Ah, sorry, all I can do is review their database. I can't actually run
the checker myself.

-Jeff

--
Jeff Mahoney
SUSE Labs


Re: Static overrun in reiser3

2006-03-15 Thread Hans Reiser
Jeff Mahoney wrote:

 Hans Reiser wrote:

 Jeff Mahoney wrote:

 Hi Hans -
 
 I've been playing around with the Coverity code checker, and while I
 think it still sees a few too many false positives, it's a good tool.

 Thanks for doing that work!  If you could do it for V4, that would be
 great too.  If not, maybe Edward could do it.


 Ah, sorry, all I can do is review their database. I can't actually run
 the checker myself.

Ah, so there is a database somewhere that we can look at?


 -Jeff

 --
 Jeff Mahoney
 SUSE Labs



Re: Static overrun in reiser3

2006-03-15 Thread Jeff Mahoney

Hans Reiser wrote:
 Jeff Mahoney wrote:
 
 Hans Reiser wrote:

 Jeff Mahoney wrote:
 Hi Hans -

 I've been playing around with the Coverity code checker, and while I
 think it still sees a few too many false positives, it's a good tool.
 Thanks for doing that work!  If you could do it for V4, that would be
 great too.  If not, maybe Edward could do it.

 Ah, sorry, all I can do is review their database. I can't actually run
 the checker myself.
 
 Ah, so there is a database somewhere that we can look at?

Yes, you have to register at scan.coverity.com. There was a medium-sized
thread on LKML about it around two weeks ago.

-Jeff

--
Jeff Mahoney
SUSE Labs


4k at a time only works well for an OS that does less per iteration than Linux does

2006-03-15 Thread Hans Reiser
To sum up for akpm what was said previously in this thread:

Linux needs to take the lesson learned from bios, namely that going up and
down through various software layers once per 4k is too expensive, and
generalize it so that, in all of the layers, we are operating on many
pages at a time.  It would be nice for code simplicity if
the same struct, with perhaps some optional attachments to it, was used
the whole trip (rather than pagevecs in one place and bios in another).

For kernel newbies on lkml, let me summarize it this way: for every trip made
through these layers, a whole lot more than 4k of code gets traversed,
all of it seeming to have some legitimate purpose;-), so only handling
4k per trip is more expensive than you might guess.
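
As a rough illustration of the arithmetic, the number of trips through the layers is just the transfer size divided by the amount handled per trip, rounded up. The helper name `trips_needed` is made up for this sketch, and no per-trip cost figure is assumed, since none was given in the thread.

```c
#include <assert.h>

/* Number of passes through the layers needed to move `bytes` bytes
 * when each pass handles at most `per_trip` bytes (rounded up). */
unsigned long trips_needed(unsigned long bytes, unsigned long per_trip)
{
	assert(per_trip > 0);
	return bytes / per_trip + (bytes % per_trip != 0);
}
```

For a 1 MiB write, `trips_needed(1024UL * 1024UL, 4096UL)` gives 256 trips, while handling 64 KiB per trip brings that down to 16; whatever the fixed cost of one trip is, it gets paid that many times.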

Now, for the question that was raised:

Andreas Schäfer wrote:

On 10:29 Wed 15 Mar , Hans Reiser wrote:
  

Tell the mosix guys we would be willing to cooperate with them regarding
their problem.



If it was that easy... The problem for openMosix is that most devices
fetch data in 4k blocks via copy_from_user(). For migrated processes,
openMosix intercepts these calls and forwards them to the node which
currently hosts the process. This forwarding yields a high latency
penalty.

Obviously there are two ways to get rid of this problem: 

* modify _every_ Linux device driver to use a
  _a_lot_more_than_4k_at_a_time_ approach or
  

I suspect that if the code benefited more than just mosix in its
implementation, and I think it would, then akpm might not be opposed to
the first approach above, provided someone wrote it.  There is a rumor he
already understands this is a problem.  Maybe he can comment on the rumor.;-)

* implement a second read ahead buffer which fetches large blocks via
  the network in the background and answers calls to copy_from_user()
  directly from the local buffer

In my _very_ humble opinion the first approach would be much nicer,
but after you guys had so much trouble with just your filesystem, I
don't see that one coming, not at all.

So I think the long term strategy for oM will be the second, double
buffering approach. At least I couldn't think of any other realistic,
feasible way.

BTW: how are you guys planning to solve this 4k issue? Will you revert
to small blocks 

Not sure what that means.  You mean surrender?  Not yet.;-)  First we
fix our code so that reiser4 really does what I am arguing it needs to
do, then we will argue it is the right approach.

or will you pretend to perform 4k transfers and
assemble those in the background to, again, process large chunks at
once? 

Oh, is that ugly...  no.  Dynamically sized pagevecs (I call them
pagezams) are better.

We will process things in large chunks all the way down to the io layer,
and then wait for Nate or others to fix the io layer someday.

If yes, wouldn't this seriously increase CPU usage due to
(most likely) unnecessary data duplication?

Regards
-Andreas


  




Re: kernel BUG at /usr/src/sources/linux-2.6.16-rc5/fs/reiser4/plugin/file/tail_conversion.c:29

2006-03-15 Thread Alexander Zarochentsev
Hello,

On Wednesday 15 March 2006 22:40, Christian Trefzer wrote:
 Hi zam,

 after the first build of OOo has finished (as a gift for my laptop)
 the machine is working on its own copy, after a reboot with the
 exact same setup that yielded my BUG problems, plus your patch applied.
 I'll keep an eye on my syslog. Sorry for taking so long!

I guess it was in release_unix_file where get_exclusive_access is called 
outside reiser4_context.

can you please replace the old patch by the attached one?


 Regards,
 Chris

-- 
Alex.
From: Alexander Zarochentsev [EMAIL PROTECTED]

Have get_exclusive_access() restart transaction before taking r/w semaphore.

There are several places in write_unix_file and extent_balance_dirty_pages
where transaction may be open before calling get_exclusive_access.  It triggers
the deadlock detection BUG_ON inside get_exclusive_access().

This patch fixes the bug by embedding txn_restart into the
get_exclusive_access() code and cleans up other places where txn_restart() was
called right before get_exclusive_access().
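
The ordering this patch enforces can be modelled outside the kernel. The following is only a sketch with invented stand-in types (`ctx_sketch`, `file_sketch`) and `_sketch` function names; it does not reproduce reiser4's real context, atom, or semaphore handling, it only shows "restart the transaction if there is a context, then take the write lock".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-thread reiser4 context. */
struct ctx_sketch {
	int atom_open;	/* nonzero while a transaction atom is attached */
	int restarts;	/* how many times the transaction was restarted */
};

/* Hypothetical stand-in for the per-file info with its rw-semaphore. */
struct file_sketch {
	int write_locked;
	int exclusive_use;
};

/* Stand-in for txn_restart(): detach from the currently open atom. */
void txn_restart_sketch(struct ctx_sketch *ctx)
{
	if (ctx->atom_open) {
		ctx->atom_open = 0;
		ctx->restarts++;
	}
}

/*
 * Mirrors the patched ordering: restart the transaction first (only if
 * a context exists), then take the write lock on the file.
 */
void get_exclusive_access_sketch(struct ctx_sketch *ctx, struct file_sketch *f)
{
	assert(f != NULL && !f->write_locked);
	if (ctx != NULL)
		txn_restart_sketch(ctx);
	f->write_locked = 1;	/* stands in for down_write() */
	f->exclusive_use = 1;
}
```

Callers no longer have to remember to restart the transaction themselves, and a caller without any context (the release_unix_file case mentioned above) passes NULL and still gets the lock.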

Signed-off-by: [EMAIL PROTECTED]

 fs/reiser4/plugin/file/file.c            |   13 -------------
 fs/reiser4/plugin/file/tail_conversion.c |    8 +++++---
 2 files changed, 5 insertions(+), 16 deletions(-)

Index: linux-2.6.16-rc5-mm2/fs/reiser4/plugin/file/file.c
===================================================================
--- linux-2.6.16-rc5-mm2.orig/fs/reiser4/plugin/file/file.c
+++ linux-2.6.16-rc5-mm2/fs/reiser4/plugin/file/file.c
@@ -1451,9 +1451,6 @@ static int commit_file_atoms(struct inod
 	int result;
 	unix_file_info_t *uf_info;
 
-	/* close current transaction */
-	txn_restart_current();
-
 	uf_info = unix_file_inode_data(inode);
 
 	/*
@@ -2174,7 +2171,6 @@ append_and_or_overwrite(hint_t * hint, s
 done_lh(&hint->lh);
 if (!exclusive) {
 	drop_nonexclusive_access(uf_info);
-	txn_restart_current();
 	get_exclusive_access(uf_info);
 }
 result = tail2extent(uf_info);
@@ -2964,15 +2960,6 @@ int delete_object_unix_file(struct inode
 	unix_file_info_t *uf_info;
 	int result;
 
-	/*
-	 * transaction can be open already. For example:
-	 * writeback_inodes->sync_sb_inodes->reiser4_sync_inodes->
-	 * generic_sync_sb_inodes->iput->generic_drop_inode->
-	 * generic_delete_inode->reiser4_delete_inode->delete_object_unix_file.
-	 * So, restart transaction to avoid deadlock with file rw semaphore.
-	 */
-	txn_restart_current();
-
 	if (inode_get_flag(inode, REISER4_NO_SD))
 		return 0;
 
Index: linux-2.6.16-rc5-mm2/fs/reiser4/plugin/file/tail_conversion.c
===================================================================
--- linux-2.6.16-rc5-mm2.orig/fs/reiser4/plugin/file/tail_conversion.c
+++ linux-2.6.16-rc5-mm2/fs/reiser4/plugin/file/tail_conversion.c
@@ -16,17 +16,19 @@
 /* exclusive access to a file is acquired when file state changes: tail2extent, empty2tail, extent2tail, etc */
 void get_exclusive_access(unix_file_info_t * uf_info)
 {
+	reiser4_context * ctx = get_current_context_check();
+
 	assert("nikita-3028", schedulable());
 	assert("nikita-3047", LOCK_CNT_NIL(inode_sem_w));
 	assert("nikita-3048", LOCK_CNT_NIL(inode_sem_r));
 	/*
-	 * deadlock detection: sometimes we commit a transaction under
+	 * deadlock avoidance: sometimes we commit a transaction under
 	 * rw-semaphore on a file. Such commit can deadlock with another
 	 * thread that captured some block (hence preventing atom from being
 	 * committed) and waits on rw-semaphore.
 	 */
-	assert("nikita-3361", get_current_context()->trans->atom == NULL);
-	BUG_ON(get_current_context()->trans->atom != NULL);
+	if (ctx != NULL)
+		txn_restart(ctx);
 	LOCK_CNT_INC(inode_sem_w);
 	down_write(&uf_info->latch);
 	uf_info->exclusive_use = 1;