Re: State of the Reiser4 FS

2006-03-15 Thread Hans Reiser
Avuton Olrich wrote:

On 3/15/06, Hans Reiser [EMAIL PROTECTED] wrote:
  

Avuton Olrich wrote:


I just saw a thread on the LKML a minute ago asking about the state of
getting the patch into vanilla linux. I read Andrew Morton's post
about a month ago stating that it could happen soon, but was unlikely
due to there not actually being a need for it to go into mainline (no
major distro default, etc...).

  

Can you supply a reference to this post?  The only distro which is not
influenced by performance numbers when selecting a filesystem is RedHat,
and most of the rest are just waiting to be sure that politics will not
kill reiser4 inclusion.  I am sure I can come up with a "we will support
it if you let it in" petition from distros if such silliness is needed.



I was referring to this post:
http://marc.theaimsgroup.com/?l=linux-kernel&m=113775878722100&w=2

Thanks for all the answers
--
avuton
--
 Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
  

Oh, well, the overall tone of that email is not all that negative.

We will work on the 4k at a time issue, overcome that issue technically,
and then after that is resolved deal with generating desire for a
filesystem that is 2x (reiser4.0) to 4x (4.1alpha with compression) faster.


Re: State of the Reiser4 FS

2006-03-15 Thread Andreas Schäfer
On 10:29 Wed 15 Mar , Hans Reiser wrote:
 Tell the mosix guys we would be willing to cooperate with them regarding
 their problem.

If it was that easy... The problem for openMosix is that most devices
fetch data in 4k blocks via copy_from_user(). For migrated processes,
openMosix intercepts these calls and forwards them to the node which
currently hosts the process. This forwarding yields a high latency
penalty.

Obviously there are two ways to get rid of this problem: 

* modify _every_ Linux device driver to use a
  _a_lot_more_than_4k_at_a_time_ approach or

* implement a second read ahead buffer which fetches large blocks via
  the network in the background and answers calls to copy_from_user()
  directly from the local buffer
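
The second approach can be sketched roughly as follows. This is only a toy model in Python, not openMosix code: RemoteSource, ReadAheadBuffer and copy_all are hypothetical names, each fetch() stands for one forwarded call over the network, and the refill is synchronous here, whereas the proposal above would prefetch in the background.

```python
PAGE = 4096  # request size discussed in the thread


class RemoteSource:
    """Hypothetical stand-in for the node hosting the migrated process."""

    def __init__(self, data):
        self.data = data
        self.round_trips = 0  # each fetch() models one forwarded call

    def fetch(self, offset, length):
        self.round_trips += 1
        return self.data[offset:offset + length]


class ReadAheadBuffer:
    """Serves 4k requests locally, refilling in large chunks on a miss."""

    def __init__(self, source, chunk=64 * PAGE):
        self.source = source
        self.chunk = chunk
        self.start = 0
        self.buf = b""

    def read(self, offset, length):
        end = offset + length
        if offset < self.start or end > self.start + len(self.buf):
            # Miss: fetch one large chunk instead of a single page.
            # A real implementation would prefetch in the background.
            self.start = offset
            self.buf = self.source.fetch(offset, max(self.chunk, length))
        rel = offset - self.start
        return self.buf[rel:rel + length]


def copy_all(read, total):
    """Consume `total` bytes one page at a time."""
    return b"".join(read(off, PAGE) for off in range(0, total, PAGE))


data = bytes(range(256)) * PAGE  # 1 MiB of sample data

direct = RemoteSource(data)
assert copy_all(direct.fetch, len(data)) == data

remote = RemoteSource(data)
buffered = ReadAheadBuffer(remote)
assert copy_all(buffered.read, len(data)) == data

print(direct.round_trips, remote.round_trips)
```

With 1 MiB of data and a 64-page chunk, the unbuffered path makes 256 forwarded calls and the buffered one makes 4, so the latency penalty is paid once per chunk rather than once per page.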

In my _very_ humble opinion the first approach would be much nicer,
but after you guys had so much trouble with just your filesystem, I
don't see that one coming, not at all.

So I think the long term strategy for oM will be the second, double
buffering approach. At least I couldn't think of any other realistic,
feasible way.

BTW: how are you guys planning to solve this 4k issue? Will you revert
to small blocks or will you pretend to perform 4k transfers and
assemble those in the background to, again, process large chunks at
once? If yes, wouldn't this seriously increase CPU usage due to
(most likely) unnecessary data duplication?

Regards
-Andreas


Re: State of the Reiser4 FS

2006-03-15 Thread Hans Reiser
Jonathan Briggs wrote:

On Tue, 2006-03-14 at 23:14 -0800, Hans Reiser wrote:
[snip]
  

They claim that if we don't use the ext3 code
in our fs then they will be forced to shoulder an extra burden to
maintain our code.  We are not allowed to specify that they should not
maintain our code at all.  I need to read more Kafka I think, it is hard
for me to understand it all.



Err, this actually does make a lot of sense Hans.

The mainline Linux Kernel code is maintained by everyone that can
convince Linus or a sub-maintainer to accept their patch.  In order to
  

I am the reiserfs/reiser4 sub-maintainer.  So, if reiser4 works well,
and is faster than any other Linux FS, and it is,  maintaining it over
time is for me to worry about, not them. 


Re: State of the Reiser4 FS

2006-03-15 Thread Andreas Dilger
On Mar 15, 2006  20:27 +0100, Andreas Schäfer wrote:
 If it was that easy... The problem for openMosix is that most devices
 fetch data in 4k blocks via copy_from_user(). For migrated processes,
 openMosix intercepts these calls and forwards them to the node which
 currently hosts the process. This forwarding yields a high latency
 penalty.
 
 Obviously there are two ways to get rid of this problem: 
 
 * modify _every_ Linux device driver to use a
   _a_lot_more_than_4k_at_a_time_ approach or
 
 * implement a second read ahead buffer which fetches large blocks via
   the network in the background and answers calls to copy_from_user()
   directly from the local buffer

Or you can use a network filesystem like Lustre that handles this
itself ;-).  Sadly, though, it has to do both of these to get
good performance, via {sub,per}version of the VFS/VM.

Clients do delayed-write (writeback cache, with write credits from
the server to account for space) to avoid small RPCs.  They also
do large amounts of readahead (in large chunks) to improve reads
for applications and the VM that breaks up all reads into 4kB chunks.
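
As a rough illustration of the delayed-write idea only (this is not Lustre's actual protocol or API; Server, Client, grant() and write_rpc() are made-up names, and the 1 MiB RPC size is an assumption), a client can hold small writes against space the server has already granted and send them as one large RPC:

```python
class Server:
    """Hypothetical server that hands out space credits and counts RPCs."""

    def __init__(self, free_space):
        self.free_space = free_space
        self.rpcs = 0
        self.stored = 0

    def grant(self, nbytes):
        """Reserve up to nbytes of space; return the amount granted."""
        granted = min(nbytes, self.free_space)
        self.free_space -= granted
        return granted

    def write_rpc(self, payload):
        self.rpcs += 1
        self.stored += len(payload)


class Client:
    """Caches small writes and flushes them in rpc_size batches."""

    def __init__(self, server, rpc_size=1 << 20):
        self.server = server
        self.rpc_size = rpc_size
        self.credits = 0          # bytes the server has promised room for
        self.dirty = bytearray()  # data not yet sent

    def write(self, data):
        if self.credits < len(data):
            self.credits += self.server.grant(self.rpc_size)
        if self.credits < len(data):
            # Simplification: a single grant attempt, then give up.
            raise OSError("no space granted by server")
        self.credits -= len(data)
        self.dirty += data
        if len(self.dirty) >= self.rpc_size:
            self.flush()

    def flush(self):
        if self.dirty:
            self.server.write_rpc(bytes(self.dirty))
            self.dirty.clear()


server = Server(free_space=4 << 20)
client = Client(server)
for _ in range(512):          # 512 writes of 4 kB each = 2 MiB
    client.write(b"x" * 4096)
client.flush()
print(server.rpcs, server.stored)
```

The credit check is what lets the client accept a write without a round trip: it already knows the server has room for the data it is caching.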

Servers also do batch block allocation and then large direct writes
instead of going through the VFS/VM.  There are still a number of
device drivers that break up bios into chunks smaller than 1MB, and
that hurts performance.

Having a generic delayed/batch allocation mechanism is definitely
the right way to go, and from my reading of linux-fsdevel this is
underway by some folks at IBM.  Since we have to support customers
dating back to 2.4.21 it will be a while before we can move over to
the newer APIs, once they are available.

 BTW: how are you guys planning to solve this 4k issue? Will you revert
 to small blocks or will you pretend to perform 4k transfers and
 assemble those in the background to, again, process large chunks at
 once? If yes, wouldn't this seriously increase CPU usage due to
 (most likely) unnecessary data duplication?

It doesn't result in data duplication, per se, since the pages are
copied into kernel space only once.  What it does mean is that there
needs to be a duplication of infrastructure in order to reassemble
and track all of these pages.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



Re: State of the Reiser4 FS

2006-03-15 Thread Andreas Schäfer
On 12:43 Wed 15 Mar , Hans Reiser wrote:
 I am the reiserfs/reiser4 sub-maintainer.  So, if reiser4 works well,
 and is faster than any other Linux FS, and it is,  maintaining it over
 time is for me to worry about, not them. 

I feel this thread is about to trail off to shores we all know too
well. AFAICS we do have two completely different issues here: 

* The core maintainers want the whole code to adhere to certain
  standards. This doesn't have anything to do with performance
  etc. It's just for the fact that this standard is a sign of both
  reliability and maintainability (even for the unlikely case that
  Namesys would disappear)

* Reiser4 doesn't adhere to some of these standards because they don't
  make much sense from a performance (and design) point of view. 

I think the short term solution should be to adapt Reiser4 to the
standard, but in the long run keep bugging the Linux people to change
some paradigms (as one of Linux' core advantages has always been the
ability and willingness to throw decayed code overboard).

When you think about it, both POVs do make sense. It's just so sad this
whole debate has become much more a political than a style debate.

-Andreas



Re: State of the Reiser4 FS

2006-03-14 Thread Vladimir V. Saveliev
Hello

On Tue, 2006-03-14 at 02:41 -0800, Avuton Olrich wrote:
 Hello,
 
 I just saw a thread on the LKML a minute ago asking about the state of
 getting the patch into vanilla linux. I read Andrew Morton's post
 about a month ago stating that it could happen soon, but was unlikely
 due to there not actually being a need for it to go into mainline (no
 major distro default, etc...). I was wondering, myself, earlier in the
 day what the state of the patch was, if anything further has been said
 about getting it into mainline. Was also wondering if there was a lot
 of work going into it right now, or are people tied up doing other
 things?
 
 If anyone has time for an answer it'd be appreciated by everyone I'm sure,
 

AFAIK, the most recent reason why reiser4 does not get included is that
reiser4 developers have to change reiser4 to use generic code to
implement read/write.
AFAICS, reiser4 developers do not work on that.



Re: State of the Reiser4 FS

2006-03-14 Thread Hans Reiser
Clemens Eisserer wrote:

AFAIK, the most recent reason why reiser4 does not get included is that
reiser4 developers have to change reiser4 to use generic code to
implement read/write.
AFAICS, reiser4 developers do not work on that.


Has this really become a reason to not include reiser4 into mainline?
  

Yes, this is the official reason.

I also don't see a reason for that - at least it would bind reiser4
more closely to Linux, making ports to other OSes harder.
  

You are entirely correct.  It is an interesting social phenomenon that
we must do this, yes?  Using the ext3 (err, generic) code makes it much
harder to license and port reiser4.

Furthermore, if it would decrease performance, it's simply no way to go.
  

What we are currently doing is rewriting the reiser4 read and write code
to not operate 4k at a time.  The design specification was that it was
supposed to do as much as possible once per write, and as little as
possible every 4k.  Unfortunately, when I reviewed our code the design
specification had not been adhered to.  After the reiser4 code adheres
to the reiser4 design specification, it will be possible to argue that
the reiser4 design specification is technically superior, and the
generic code should change.   I generally believe that the per 4k
approach used throughout the linux kernel is not as CPU efficient as
sending larger groups of pages through the layers all at once.  In other
words, there is a reason we have bios, and we need to learn the lesson
from them that they teach us, and abstract it into a general design
approach.
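
To make the "once per write" versus "every 4k" distinction concrete, here is a toy count in Python. It is not reiser4 code; per_page_write and per_write are hypothetical names, and "setup" stands for whatever fixed work (locking, lookups, space reservation) a write path has to do.

```python
PAGE = 4096


def per_page_write(nbytes):
    """Repeat the fixed setup work for every 4k page."""
    setup = copies = 0
    for _ in range(0, nbytes, PAGE):
        setup += 1   # hypothetical fixed cost, paid per page
        copies += 1  # copy one page of data
    return setup, copies


def per_write(nbytes):
    """Do the fixed setup once, then only copy data per page."""
    setup = 1
    copies = len(range(0, nbytes, PAGE))
    return setup, copies


print(per_page_write(1 << 20))  # (setup, copies) for a 1 MiB write
print(per_write(1 << 20))
```

For a 1 MiB write both variants copy 256 pages, but the first pays the fixed cost 256 times and the second pays it once.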

We must make reiser4 adhere to the reiser4 design specification before
we can deal with their demand that we change the generic code so that it
does what reiser4 does.  I have no desire to touch their code, but they
require it.  Generally speaking, they don't really like any feature
existing in reiser4 that is not in their code, and ask that we add it to
their code before reiser4 is allowed to have it.  They call the ext3
code the generic code.  They claim that if we don't use the ext3 code
in our fs then they will be forced to shoulder an extra burden to
maintain our code.  We are not allowed to specify that they should not
maintain our code at all.  I need to read more Kafka I think, it is hard
for me to understand it all.

(btw. I think this could be a way to generate some revenue - I think
there is demand for a modern fs which is supported by both Windows
and Linux).
  

There are so many ways to generate revenue by spending revenue I don't
have in my pocket right now.  Forgive me, yes, someday we should do
that and will do that.

lg Clemens


  

If any of you users want to see a reiser4, you have to strenuously
clamor for it to go into mainline, or you simply will not get it. 
Namesys cannot survive indefinitely with it not going into the kernel. 
This is a political issue, and viewing it as otherwise is simply naive. 
It is sad, I chose Linux over BSD to develop for because BSD used to be
like this.

Hans