PCI bridge setup error in linux-2.4.x (anyone of them)

2001-06-29 Thread Martin Dalecki

I have a PC box at hand which contains 8 PCI slots.
Four of them sit behind a PCI bridge.
The error in the new kernel series is that during the
PCI bus setup, if a card is sitting behind the bridge, it
will be miraculously detected TWICE: once in front of the
bridge and once behind the bridge. The initialisation of
the card will then be entirely hosed.

This doesn't happen under the linux-2.2.x kernel series.

Here is the output of lspci on this box after I moved the
card in front of the bridge:

00:02.0 PCI bridge: Intel Corporation 80960RP [i960 RP Microprocessor/Bridge] (rev 03)
00:02.1 I2O: Intel Corporation 80960RP [i960RP Microprocessor] (rev 03)
00:03.0 PCI bridge: Digital Equipment Corporation DECchip 21152 (rev 02)
00:06.0 System peripheral: Hewlett-Packard Company NetServer Smart IRQ Router (rev a0)
00:08.0 VGA compatible controller: Cirrus Logic GD 5446 (rev 45)
00:0f.0 ISA bridge: Intel Corporation 82371AB PIIX4 ISA (rev 02)
00:0f.1 IDE interface: Intel Corporation 82371AB PIIX4 IDE (rev 01)
00:0f.2 USB Controller: Intel Corporation 82371AB PIIX4 USB (rev 01)
00:0f.3 Bridge: Intel Corporation 82371AB PIIX4 ACPI (rev 02)
00:10.0 Host bridge: Intel Corporation 450NX - 82451NX Memory & I/O Controller (rev 03)
00:12.0 Host bridge: Intel Corporation 450NX - 82454NX PCI Expander Bridge (rev 02)
02:04.0 Ethernet controller: 3Com Corporation 3c905C-TX [Fast Etherlink] (rev 74)
oops:~ #
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Bug in 3c905 driver.

2001-06-25 Thread Martin Dalecki

William Park wrote:
> 
> On Mon, Jun 25, 2001 at 08:51:28PM +0200, Martin Dalecki wrote:
> > Just a note...
> >
> > This card gets detected twice by the plain 2.4.5 kernel.
> > It gets listed twice under both lspci and during the kernel boot
> > sequence on a HP LHr3 system.
> 
> I get only one message, I have 3c905CX and 2.4.5 kernel.  Maybe you have
> 2 cards inside? ;-)

Could you send me your .config file? Maybe something is sensitive
to the choices for PCI access, power management, or whatever else
it may be. I would like to confirm the true source of this error.
(Currently I'm guessing at a buggy compiler provided by SuSE, bugs
in the PCI setup code, or some weird kind of BIOS configuration problem.)



Bug in 3c905 driver.

2001-06-25 Thread Martin Dalecki

Just a note...

This card gets detected twice by the plain 2.4.5 kernel.
It gets listed twice under both lspci and during the kernel boot
sequence on a HP LHr3 system.



Re: [OT] Threads, inelegance, and Java

2001-06-20 Thread Martin Dalecki

Mike Harrold wrote:
> So what? Crusoe isn't designed for use in supercomputers. It's designed
> for use in laptops where the user is running an email reader, a web
> browser, a word processor, and where the user couldn't give a cr*p about
> performance as long as it isn't noticeable (20% *isn't* for those types
> of apps), but where the user does give a cr*p about how long his or her
> battery lasts (ie, the entire business day, and not running out of power
> at lunch time).

I'm just too good at remembering the academic discussion about
code morphing being a way to get more performance out of a chip
design. They were claiming that, because they could make
the underlying chip design much simpler and VLIW, the performance lost
to the emulation would be smaller than the performance won
through a superior underlying chip architecture.
This was meant to compensate for the biggest hurdle
of VLIW design: insane code size and, as a result, sometimes huge
memory bus bandwidth requirements. (Why do you think the Itanium
sucks on integer performance?)
After this turned out not to be the case in reality, IBM dropped
the development of code morphing chips. Well, Transmeta then turned
to claiming that the main advantage of its design is much lower
power consumption. But in reality, underclocked modern
designs optimized for power consumption beat the Transmeta
chip easily: the Geode and the recently announced VIA chip, to name a few.
Compared to chip designs specifically targeted at low power consumption,
the Transmeta chip is laughable: think ARM, please! My Psion
beats *ANY* chip from them by a huge margin.

> Yes, it *can* be used in a supercomputer (or more preferably, a cluster
> of Linux machines), or even as a server where performance isn't the
> number one concern and things like power usage (read: anywhere in
> California right now ;-) ), and rack size are important. You can always
> get faster, more efficient hardware, but you'll pay for it.

Well, the Transmeta CPU isn't actually cheap. And if you talk about
supercomputing, hmm, what about some PowerPC CPU variant - they are very
competitive in terms of cost and FPU performance! Transmeta isn't an
adequate choice here.

> Remember, the whole concept of code-morphing is that the majority of
> apps that people run repeat the same slice of code over and over (eg,
> a word processor). Once crusoe has translated it once, it doesn't need
> to do it again. It's the same concept as a JIT java compiler.

Well, both of those concepts fail in terms of optimization for
the same reason: much less information about the structure of
the code is present than during source code compilation.
And therefore the performance of any kind of JIT compiler usually
*sucks* in comparison to classical sophisticated compilers.
Additionally, there may be some performance wins due to the
ability to do runtime profiling (of any kind), however it still remains
to be shown that this performs better than statically analyzed code.

> /Mike - who doesn't work for Transmeta, in case anyone was wondering... :-)

/Marcin - who doesn't bet a penny on Transmeta



Re: [OT] Threads, inelegance, and Java

2001-06-20 Thread Martin Dalecki

Rob Landley wrote:

> The same arguments were made 30 years ago about writing the OS in a high
> level language like C rather than in raw assembly.  And back in the days of
> the sub-1-mhz CPU, that really meant something.

And yet these days we are still writing lots of ASM in kernels...

> I don't know about that.  The 8 bit nature of java bytecode means you can
> suck WAY more instructions in across the memory bus in a given clock cycle,
> and you can also hold an insane amount of them in cache.  These are the real

What about the constant part of instructions? What about the alignment
characteristics of current CPU buses?

> performance limiting factors, since the inside of your processor is clock
> multiplied into double digits nowdays, and that'll only increase as die sizes
> shrink, transistor budgets grow, and cache sizes get bigger.
> 
> In theory, a 2-core RISC or 3-core VLIW processor can execute an interpretive
> JVM pretty darn fast.  Think a jump-table based version (not quite an array

Bullshit! In theory the JVM resembles some very, very old instruction
set well suited for a CISC CPU. In particular, the lack of registers is
even worse than on the i386 arch. And bloody no compiler will be able to
optimize this sanely... And then there arises the problem of local
variable management and so on. There have already been attempts to
design a CPU according to these specs. As far as one can see they have
all failed. Even Sun itself gave up its design. The compact instruction
set is due to Java's inheritance from the embedded world - nothing else.
Overly compact instruction set designs make for very nasty instruction
decoders and therefore slow CPUs. This complexity can be better overcome
by the IBM memory compressor chip than in the instruction set itself.
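
To make the register argument concrete, here is a toy stack-machine
interpreter. It is not the JVM: the opcodes OP_LOAD, OP_ADD, OP_STORE and
OP_HALT and the function run() are invented for this sketch. It only shows
that computing c = a + b on a stack machine takes four instructions, each of
which moves data through an in-memory operand stack, whereas a register
machine that already holds a and b in registers needs a single add.

```c
#include <stdint.h>

/* Invented opcodes for illustration; these are not real JVM bytecodes. */
enum { OP_LOAD, OP_ADD, OP_STORE, OP_HALT };

/*
 * Interpret `code` against the local variable array `locals`.
 * Returns the number of operand-stack pushes and pops performed.
 * No bounds checking is done: this is a sketch, not a safe interpreter.
 */
int run(const uint8_t *code, int *locals)
{
	int stack[16];
	int sp = 0;		/* index of the next free stack slot */
	int traffic = 0;	/* pushes + pops on the operand stack */

	for (;;) {
		/* Compilers often turn a dense switch into a jump table. */
		switch (*code++) {
		case OP_LOAD:	/* push locals[operand] */
			stack[sp++] = locals[*code++];
			traffic += 1;
			break;
		case OP_ADD:	/* pop two values, push their sum */
			sp--;
			stack[sp - 1] += stack[sp];
			traffic += 3;
			break;
		case OP_STORE:	/* pop into locals[operand] */
			locals[*code++] = stack[--sp];
			traffic += 1;
			break;
		case OP_HALT:
		default:
			return traffic;
		}
	}
}
```

With locals {2, 3, 0}, the program LOAD 0, LOAD 1, ADD, STORE 2, HALT
leaves 5 in locals[2] after six stack accesses.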

> Or if you like the idea of a JIT, think about transmeta writing a code
> morphing layer that takes java bytecodes.  Ditch the VM and have the
> processor do it in-cache.

Blah blah blah. The performance of the Transmeta CPU SUCKS ROCKS, no
matter what they try to make you believe. A venerable classical design
like the Geode outperforms them in every respect. There is simply
significant information lost between source code and compiled code.
Therefore no JIT compiler in this world will ever match the optimization
opportunities of a classic C compiler! IBM researched opportunities for
code morphing long before Transmeta came to life - they ditched it for
good reasons. Well, the actual paper states that the theoretical
performance was "just" 20% worse than a comparable normal design. Well,
"just 20%" is half a universe diameter for CPU designers.

> This doesn't mean java is really likely to outperform native code.  But it
> does mean that the theoretical performance problems aren't really that bad.
> Most java programs I've seen were written by rabid monkeys, but that's not
> the fault of the language. [1].

Think garbage collector - this explains nearly 90% of the performance
problems Java code has. The remaining 10% are still a factor of 10 worse
in comparison to classical code. Think zero copy on write - most Java code
induces insane amounts of copying data around for no good reason (the
String class and friends, for example). Java code will never ever
perform well.
 
> How many instructions does your average processor really NEED?  MIT's first
> computer had 4 instructions: load, save, add, and test/jump.  We only need 32
> or 64 bits for the data we're manipulating.  8 bit code is a large part of
> what allowed people to write early video games in 8k of ram.

Tons of them. Please just remember how the RISC instruction set designs
evolved over time. It was no accident!

-- 
- phone: +49 214 8656 283
- job:   eVision-Ventures AG, LEV .de (MY OPINIONS ARE MY OWN!)
- langs: de_DE.ISO8859-1, en_US, pl_PL.ISO8859-2, last ressort:
ru_RU.KOI8-R



Re: How to know HZ from userspace?

2001-05-30 Thread Martin Dalecki

Joel Becker wrote:
> 
> On Wed, May 30, 2001 at 05:24:37PM -0700, Jonathan Lundell wrote:
> > FWIW (perhaps not much in this context), the POSIX way is sysconf(_SC_CLK_TCK)
> >
> > POSIX sysconf is pretty useful for this kind of thing (not just HZ, either).
> 
> Well, how many hundred things on Linux are available from /proc
> but not from sysconf or the like?  :-)

Those hundred things which you either don't need, or which should go to
syslog, or shouldn't be sysconf and nothing else.
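
For reference, the sysconf(_SC_CLK_TCK) call quoted above can be wrapped as
in the following minimal sketch. The helper name clock_ticks_per_second() is
made up for illustration; only sysconf() and _SC_CLK_TCK come from POSIX.
Note that this is the tick unit reported to userspace (the one used by
times()), which is not guaranteed to match the kernel's internal HZ.

```c
#include <unistd.h>

/*
 * Hypothetical helper: returns the number of clock ticks per second as
 * reported by POSIX sysconf(), or -1 if the value cannot be determined.
 */
long clock_ticks_per_second(void)
{
	return sysconf(_SC_CLK_TCK);
}
```

A caller would typically print it with
printf("%ld\n", clock_ticks_per_second()); after including <stdio.h>.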



Re: [PATCH] fix more typos in Configure.help and fs/nls/Config.in

2001-05-30 Thread Martin Dalecki

> Standard is right.
> Believe me as someone who are living in Belarus ;-)

OK. I trust you.

> 
> Official country name: Belarus
> Language/Nationality: Belarusian
> 
> Standard has taken things right as we pronounce them.
> 
> Please apply the patch.
> 
> P. S. Political history had made us 'white russians' approx. hundred years ago.

No, 'white russians' was formerly the Russian term for Poles.

> Real historical name of country is Lithuania - as our neighbour country is
> called now.

Bullshit. It was Poland for most of the time you refer to.



Re: [PATCH] struct char_device

2001-05-23 Thread Martin Dalecki

Linus Torvalds wrote:
> 
> On Tue, 22 May 2001, Jeff Garzik wrote:
> >
> > IMHO it would be nice to (for 2.4) create wrappers for accessing the
> > block arrays, so that we can more easily dispose of the arrays when 2.5
> > rolls around...
> 
> No.
> 
> We do not create wrappers "so that we can easily change the implementation
> when xxx happens".
> 
> That way lies bad implementations.

However Linus please note that in the case of the bould arrays
used in device handling code we have code patterns like this:

if (blah[major]) {
        size = blah[major][minor];
} else
        size = some_default;

And those have to be dragged through all the places where
the arrays get used. Thus making some wrappers (many are already in
place):

1. Prevents typo-style programming errors.

2. Possibly makes the code more explicit.

and please don't forget:

3. Allows changing the underlying implementation at some point soon.

However, I agree that *without* the above arguments such wrappers
would make the overall code as unreadable as C++ code frequently is
when it tries to preserve private: attributes for simple field cases.
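
The lookup pattern and the kind of wrapper argued for above can be sketched
in plain userspace C. Everything here is hypothetical: the names size_table,
MAX_MAJOR, DEFAULT_SIZE and get_size() are invented for illustration and are
not the kernel's actual interface.

```c
#include <stddef.h>

#define MAX_MAJOR    256	/* assumed table size, illustration only */
#define DEFAULT_SIZE 1024	/* assumed fallback when no table is set */

/* Per-major pointer to a per-minor table; NULL means "use the default". */
int *size_table[MAX_MAJOR];

/*
 * Hypothetical wrapper: the NULL check and the fallback live in exactly
 * one place instead of being repeated at every call site.
 * The caller must make sure `minor` is within the per-major table.
 */
int get_size(unsigned int major, unsigned int minor)
{
	if (major < MAX_MAJOR && size_table[major])
		return size_table[major][minor];
	return DEFAULT_SIZE;
}
```

Call sites then shrink to size = get_size(major, minor); and the arrays
can later be replaced without touching the callers.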



Re: [PATCH] struct char_device

2001-05-22 Thread Martin Dalecki

[EMAIL PROTECTED] wrote:
> 
> > IMHO it would be nice to create wrappers for accessing the block arrays
> 
> Last year Linus didnt like that at all. Maybe this year.

Well... the attached patch is in line with this effort: it fixes
some abuses, removes redundant code and so on. Please have a second
look.

diff -urN linux/drivers/block/ll_rw_blk.c new/drivers/block/ll_rw_blk.c
--- linux/drivers/block/ll_rw_blk.c Thu Apr 12 21:15:52 2001
+++ new/drivers/block/ll_rw_blk.c   Mon Apr 30 23:16:03 2001
@@ -85,25 +85,21 @@
 int * blk_size[MAX_BLKDEV];
 
 /*
- * blksize_size contains the size of all block-devices:
+ * blksize_size contains the block size of all block-devices:
  *
  * blksize_size[MAJOR][MINOR]
  *
- * if (!blksize_size[MAJOR]) then 1024 bytes is assumed.
+ * Access to this array should happen through the get_blksize_size() function.
+ * If (!blksize_size[MAJOR]) then 1024 bytes is assumed.
  */
 int * blksize_size[MAX_BLKDEV];
 
 /*
  * hardsect_size contains the size of the hardware sector of a device.
  *
- * hardsect_size[MAJOR][MINOR]
- *
- * if (!hardsect_size[MAJOR])
- * then 512 bytes is assumed.
- * else
- * sector_size is hardsect_size[MAJOR][MINOR]
- * This is currently set by some scsi devices and read by the msdos fs driver.
- * Other uses may appear later.
+ * Access to this array should happen through the get_hardsect_size() function.
+ * The default value is assumed to be 512 unless specified differently by the
+ * corresponding low-level driver.
  */
 int * hardsect_size[MAX_BLKDEV];
 
@@ -992,22 +988,14 @@
 
 void ll_rw_block(int rw, int nr, struct buffer_head * bhs[])
 {
-   unsigned int major;
-   int correct_size;
+   ssize_t correct_size;
int i;
 
if (!nr)
return;
 
-   major = MAJOR(bhs[0]->b_dev);
-
/* Determine correct block size for this device. */
-   correct_size = BLOCK_SIZE;
-   if (blksize_size[major]) {
-   i = blksize_size[major][MINOR(bhs[0]->b_dev)];
-   if (i)
-   correct_size = i;
-   }
+   correct_size = get_blksize_size(bhs[0]->b_dev);
 
/* Verify requested block sizes. */
for (i = 0; i < nr; i++) {
diff -urN linux/drivers/block/loop.c new/drivers/block/loop.c
--- linux/drivers/block/loop.c  Thu Apr 12 04:05:14 2001
+++ new/drivers/block/loop.c    Mon Apr 30 23:30:17 2001
@@ -272,22 +272,10 @@
return desc.error;
 }
 
-static inline int loop_get_bs(struct loop_device *lo)
-{
-   int bs = 0;
-
-   if (blksize_size[MAJOR(lo->lo_device)])
-   bs = blksize_size[MAJOR(lo->lo_device)][MINOR(lo->lo_device)];
-   if (!bs)
-   bs = BLOCK_SIZE;
-
-   return bs;
-}
-
 static inline unsigned long loop_get_iv(struct loop_device *lo,
unsigned long sector)
 {
-   int bs = loop_get_bs(lo);
+   int bs = get_blksize_size(lo->lo_device);
unsigned long offset, IV;
 
IV = sector / (bs >> 9) + lo->lo_offset / bs;
@@ -306,9 +294,9 @@
pos = ((loff_t) bh->b_rsector << 9) + lo->lo_offset;
 
if (rw == WRITE)
-   ret = lo_send(lo, bh, loop_get_bs(lo), pos);
+   ret = lo_send(lo, bh, get_blksize_size(lo->lo_device), pos);
else
-   ret = lo_receive(lo, bh, loop_get_bs(lo), pos);
+   ret = lo_receive(lo, bh, get_blksize_size(lo->lo_device), pos);
 
return ret;
 }
@@ -650,12 +638,7 @@
lo->old_gfp_mask = inode->i_mapping->gfp_mask;
inode->i_mapping->gfp_mask = GFP_BUFFER;
 
-   bs = 0;
-   if (blksize_size[MAJOR(inode->i_rdev)])
-   bs = blksize_size[MAJOR(inode->i_rdev)][MINOR(inode->i_rdev)];
-   if (!bs)
-   bs = BLOCK_SIZE;
-
+   bs = get_blksize_size(inode->i_rdev);
set_blocksize(dev, bs);
 
lo->lo_bh = lo->lo_bhtail = NULL;
diff -urN linux/drivers/char/raw.c new/drivers/char/raw.c
--- linux/drivers/char/raw.c    Fri Apr 27 23:23:25 2001
+++ new/drivers/char/raw.c      Mon Apr 30 22:57:20 2001
@@ -124,22 +124,25 @@
return err;
}
 
-   
-   /* 
-* Don't interfere with mounted devices: we cannot safely set
-    * the blocksize on a device which is already mounted.  
+   /*
+* 29.04.2001 Martin Dalecki:
+*
+* The original comment here was saying:
+*
+* "Don't interfere with mounted devices: we cannot safely set the
+* blocksize on a device which is already mounted."
+*
+* However the code below was setting happily the blocksize
+* disregarding the previous check. I have fixed this, however I'm
+* quite sure, that the statement above isn't right and we should be
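+* able to remove the first arm of the branch below entirely.
 */
-   
-   sector_size = 512;
if (get_super(rdev) != NULL) {
-   if (blksize_size[MAJOR(rdev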
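
The helper get_blksize_size() that the hunks above call is not defined in
the part of the patch shown here. A minimal sketch of what it presumably
does, modelled on the removed loop_get_bs(), might look like the following.
The kdev_t typedef, the MAJOR/MINOR/MKDEV macros and the NR_MAJORS table
size are simplified stand-ins assumed for illustration, not the kernel's
own headers, and the function body is a guess rather than the patch's code:

```c
#include <stddef.h>

/* Simplified stand-ins for the 2.4-era definitions (assumed for this sketch). */
typedef unsigned short kdev_t;

#define MINORBITS     8
#define MINORMASK     ((1U << MINORBITS) - 1)
#define MAJOR(dev)    ((unsigned int)(dev) >> MINORBITS)
#define MINOR(dev)    ((unsigned int)(dev) & MINORMASK)
#define MKDEV(ma, mi) ((kdev_t)(((ma) << MINORBITS) | (mi)))

#define BLOCK_SIZE    1024  /* default soft block size */
#define NR_MAJORS     256   /* sketch value: one slot per 8-bit major */

/* blksize_size[MAJOR][MINOR]; a NULL row means the driver registered nothing. */
static int *blksize_size[NR_MAJORS];

/* Hypothetical body for the helper the patch calls. */
static int get_blksize_size(kdev_t dev)
{
    int bs = 0;

    if (blksize_size[MAJOR(dev)] != NULL)
        bs = blksize_size[MAJOR(dev)][MINOR(dev)];

    /* A missing row and a zero entry both fall back to the default. */
    return bs ? bs : BLOCK_SIZE;
}
```

With a helper of this shape every caller gets the same BLOCK_SIZE fallback,
which is what the open-coded copies in loop.c were each repeating.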

Re: [PATCH] struct char_device

2001-05-22 Thread Martin Dalecki

And while we are on the topic... Those are the places where blk_size[]
gets abused, since it is in fact a property of a FS and not the property
of a particular device... blksect_size is the array describing the physical
access limits of a device, and blk_size usually gets checked against it.
However, due to the bad naming and the fact that this information is
associated with major/minor number usage, some device driver writers got
*very* confused, as you can see below:

./fs/block_dev.c: Here this information should be passed entirely inside
the request.

./fs/partitions/check.c: Here it basically gets reset or ignored


Here it's serving the purpose of a sector size, which is bogus!

./mm/swapfile.c:#include <linux/blkdev.h> /* for blk_size */
./mm/swapfile.c:if (!dev || (blk_size[MAJOR(dev)] &&
./mm/swapfile.c: !blk_size[MAJOR(dev)][MINOR(dev)]))
./mm/swapfile.c:if (blk_size[MAJOR(dev)])
./mm/swapfile.c:swapfilesize = blk_size[MAJOR(dev)][MINOR(dev)]


Here it shouldn't be needed
./drivers/block/ll_rw_blk.c: 


./drivers/block/floppy.c:   blk_size[MAJOR_NR] = floppy_sizes;
./drivers/block/nbd.c:  blk_size[MAJOR_NR] = nbd_sizes;
./drivers/block/rd.c: * and set blk_size for -ENOSPC, Werner Fink
<[EMAIL PROTECTED]>, Apr '99
./drivers/block/amiflop.c:  blk_size[MAJOR_NR] = floppy_sizes;
./drivers/block/loop.c: if (blk_size[MAJOR(lodev)])
./drivers/block/ataflop.c: *   - Set blk_size for proper size checking
./drivers/block/ataflop.c:  blk_size[MAJOR_NR] = floppy_sizes;
./drivers/block/cpqarray.c: drv->blk_size;
./drivers/block/z2ram.c:blk_size[ MAJOR_NR ] = z2_sizes;
./drivers/block/swim3.c:blk_size[MAJOR_NR] = floppy_sizes;
./drivers/block/swim_iop.c: blk_size[MAJOR_NR] = floppy_sizes;
./drivers/char/raw.c:   if (blk_size[MAJOR(dev)])
./drivers/scsi/advansys.c:ASC_DCNTblk_size;
./drivers/scsi/sd.c:blk_size[SD_MAJOR(i)] = NULL;
./drivers/scsi/sr.c:blk_size[MAJOR_NR] = sr_sizes;
./drivers/scsi/sr.c:blk_size[MAJOR_NR] = NULL;
./drivers/sbus/char/jsflash.c:  blk_size[JSFD_MAJOR] = jsfd_sizes;
./drivers/ide/ide-cd.c: blk_size[HWIF(drive)->major] =
HWIF(drive)->gd->sizes;
./drivers/ide/ide-floppy.c: *   Revalidate the new media. Should set
blk_size[]
./drivers/acorn/block/fd1772.c: blk_size[MAJOR_NR] = floppy_sizes;
./drivers/i2o/i2o_block.c:  blk_size[MAJOR_NR] = i2ob_sizes;

In the following they are REALLY confusing it and then compensating for
this misunderstanding in lvm.h by redefining the corresponding default
values.

./drivers/s390/*

And then some minor confusions follow...

./drivers/mtd/mtdblock.c:   blk_size[MAJOR_NR] = NULL;
./drivers/md/md.c:  if (blk_size[MAJOR(dev)])
./arch/m68k/atari/stram.c:blk_size[STRAM_MAJOR] = stram_sizes;

Basically one should just stop setting blk_size[][] inside *ANY* driver
and everything should still work fine unless the driver is broken...

Well, that's the point of another fine kernel experiment I will do, and
I will report whether it really works out like this in reality 8-)
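
Most of the grep hits above either install a per-major table or repeat the
same guarded two-level lookup that mm/swapfile.c does. A wrapper for that
lookup could be sketched roughly as below. This assumes blk_size[][] holds
device sizes in 1 KiB units, as the 2.4 block layer documents it; the helper
name get_blk_size_kb(), the simplified kdev_t and the NR_MAJORS table size
are made up for illustration:

```c
#include <stddef.h>

/* Simplified stand-ins for the 2.4-era definitions (assumed for this sketch). */
typedef unsigned short kdev_t;

#define MINORBITS   8
#define MAJOR(dev)  ((unsigned int)(dev) >> MINORBITS)
#define MINOR(dev)  ((unsigned int)(dev) & 0xffU)
#define NR_MAJORS   256   /* sketch value: one slot per 8-bit major */

/* blk_size[MAJOR][MINOR]: device size in 1 KiB units; NULL row = unknown. */
static int *blk_size[NR_MAJORS];

/*
 * Hypothetical helper: returns 1 and stores the size in *kb when the driver
 * registered a size table for this major, returns 0 when the size is unknown.
 */
static int get_blk_size_kb(kdev_t dev, int *kb)
{
    int *sizes = blk_size[MAJOR(dev)];

    if (sizes == NULL)
        return 0;

    *kb = sizes[MINOR(dev)];
    return 1;
}
```

Callers such as the swap code could then tell "no table registered" apart
from "table says zero" without touching the array directly.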



Re: [PATCH] struct char_device

2001-05-22 Thread Martin Dalecki

[EMAIL PROTECTED] wrote:
> 
> Martin Dalecki writes:
> 
> > Erm... I wasn't talking about the DESIRED state of affairs!
> > I was talking about the CURRENT state of affairs. OK?
> 
> Oh, but in 1995 it was quite possible to compile the kernel
> with kdev_t a pointer type, and I have done it several times since.

Yes, I remember, but unfortunately some big L* did ignore
your *fine* efforts entirely in favour of developing
/proc and /dev/random and other crap maybe?

> The kernel keeps growing, so each time it is more work than
> the previous time.
> 
> > At least you have admitted that you were the one responsible
> > for the design of this MESS.
> 
> Thank you! However, you give me too much honour.

Well ... you ask for it in the corresponding header ;-).
But it isn't your fault indeed, I admit...
I know the discussions from memory, since I have been returning REGULARLY
to this topic at intervals of between 6 and 24 months for maybe 6 years
already!!! Currently they have just started to hurt seriously. And please
remember that the change I have mentioned above wasn't intended as
development but only as an experiment...

Well, let us stop throwing flames at each other.
Please have a close look at the following *EXPERIMENT* I have
already done. It's really really only intended to mark the
places where the full mess shows its ugly head:

http://www.dalecki.de/big-002.diff



Re: [PATCH] struct char_device

2001-05-22 Thread Martin Dalecki

[EMAIL PROTECTED] wrote:
> 
> Martin Dalecki writes:
> 
> > I fully agree with you.
> 
> Good.
> 
> Unfortunately I do not fully agree with you.
> 
> > Most of the places where the kernel is passing kdev_t
> > would be entirely satisfied with only the knowledge of
> > the minor number.
> 
> My kdev_t is a pointer to a structure with device data
> and device operations. Among the operations a routine
> that produces a name. Among the data, in the case of a
> block device, the size, and the sectorsize, ..
> 
> A minor gives no name, and no data.
> 
> Linus' minor is 20-bit if I recall correctly.
> My minor is 32-bit. Neither of the two can be
> used to index arrays.

Erm... I wasn't talking about the DESIRED state of affairs!
I was talking about the CURRENT state of affairs. OK?
The fact still remains that most of the places which I have pointed
out just need the minor nibble of whatever bits you pass to them.

Apparently nobody on this list blabbering about a new improved
minor/major space actually took the time to look into all those places
where the kernel is CURRENTLY relying on minor/major array enumeration.
There are TONS of them. The ugliest are the RAID drivers and all that
MD, LVM and whatever stuff, as well as abused minor number spaces used
as replacements for different majors.

At least you have admitted that you were the one responsible for
the design of this MESS.



Re: [PATCH] struct char_device

2001-05-22 Thread Martin Dalecki

[EMAIL PROTECTED] wrote:
> 
> > They are entirely different. Too different sets of operations.
> 
> Maybe you didnt understand what I meant.
> both bdev and cdev take care of the correspondence
> device number <---> struct with operations.
> 
> The operations are different, but all bdev/cdev code is identical.
> 
> So the choice is between two uglies:
> (i) have some not entirely trivial amount of code twice in the kernel
> (ii) have a union at the point where the struct operations
> is assigned.
> 
> I preferred the union.
> 
> >> And a second remark: don't forget that presently the point where
> >> bdev is introduced is not quite right. We must only introduce it
> >> when we really have a device, not when there only is a device
> >> number (like on a mknod call).
> 
> > That's simply wrong. kdev_t is used for unopened objects quite often.
> 
> Yes, but that was my design mistake in 1995.

I fully agree with you. Most of the places where the kernel is passing
kdev_t would be entirely satisfied with only the knowledge of the minor
number used to distinguish between different device ranges, which is BTW
an abuse in itself as well, since minors were meant for counting instances
of similar devices in Linux...
The places where this is the case are namely:

1. literally: all character devices.

2. The whole scsi stuff.

3. most of the ide stuff.

4. md/lvm and similar culprits.

I "discovered" this by splitting the i_dev field of struct inode
into explicit i_minor and i_major fields and then actually "fixing" my
particular kernel configuration until it worked again. This was
*very* insightful, since it uncovered all the places where kdev_t gets
used but shouldn't be needed any longer anyway.
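
For readers who have not seen the experiment: the split amounts to storing
the two halves of the device number separately instead of the packed
kdev_t. A rough sketch, assuming the 2.4-style 16-bit kdev_t with an 8-bit
major and an 8-bit minor, could look like this. The struct name and the
split_kdev()/join_kdev() helpers are hypothetical and only illustrate the
idea; they are not taken from the patch:

```c
/* Simplified stand-ins for the 2.4-era definitions (assumed for this sketch). */
typedef unsigned short kdev_t;

#define MINORBITS     8
#define MINORMASK     ((1U << MINORBITS) - 1)
#define MAJOR(dev)    ((unsigned int)(dev) >> MINORBITS)
#define MINOR(dev)    ((unsigned int)(dev) & MINORMASK)
#define MKDEV(ma, mi) ((kdev_t)(((ma) << MINORBITS) | (mi)))

/* Hypothetical replacement for a packed i_dev field. */
struct split_dev {
    unsigned char i_major;   /* selects the driver */
    unsigned char i_minor;   /* selects the instance within the driver */
};

/* Unpack a kdev_t into explicit major and minor fields. */
static struct split_dev split_kdev(kdev_t dev)
{
    struct split_dev d;

    d.i_major = (unsigned char)MAJOR(dev);
    d.i_minor = (unsigned char)MINOR(dev);
    return d;
}

/* Pack the two fields back for code that still wants a kdev_t. */
static kdev_t join_kdev(struct split_dev d)
{
    return MKDEV(d.i_major, d.i_minor);
}
```

Once the fields are separate, every place that only reads i_minor is
visibly independent of the major, which is the point made above.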

The remaining places where kdev_t comes into sight are mostly
the places where the kernel is mounting the initial root
device.

In case you would like to have a look at the resulting, rather huge,
patch I can send it to you...

> I think you'll find if you continue on this way,
> as I found and already wrote in kdev_t.h
> that it is bad to carry pointers around for unopened and unknown devices.
> 
> So, I think that the setup must be changed a tiny little bit
> and distinguish meaningless numbers from devices.
> 
> Andries

-- 
- phone: +49 214 8656 283
- job:   eVision-Ventures AG, LEV .de (MY OPINIONS ARE MY OWN!)
- langs: de_DE.ISO8859-1, en_US, pl_PL.ISO8859-2, last ressort:
ru_RU.KOI8-R



Re: [PATCH] SCSI disk minor number cleaning

2001-05-15 Thread Martin Dalecki

Andrzej Krzysztofowicz wrote:
> 
> Hi,
>   The following patch cleans up a bit usage of parameters related to
> number of minors per disk in the SCSI subsystem. This is a preliminary
> patch and it seems to not contain any problematic changes. The full version
> of the patch (that allows to succesfully change SCSI_MINOR_SHIFT and use
> more/less partitions per disk) is available at
> 
> ftp://rudy.mif.pg.gda.pl/pub/People/ankry/patches/scsi-minor/
> 
> Both are against 2.4.4-ac9, but the "shorter" one can be applied to
> 2.4.5-pre series as well.
> 
> Any comments are welcome.

Good stuff!  This is at least tagging where the problems are!



Re: LANANA: To Pending Device Number Registrants

2001-05-15 Thread Martin Dalecki

Linus Torvalds wrote:

> and then use
> 
> fd = open("/dev/fd0/colourspace", O_RDWR);

> This, btw, is Al Viro's wet dream. But I have to agree: using name spaces
> etc is MUCH preferable to ioctl's, makes code more readable and logical,
> and often makes it possible to do things you couldn't sanely do before
> (control these things from scripts etc).
> 
> And using ASCII names ("eject") instead of numbers (see the "FDEJECT" and
> "CDROMEJECT" etc #defines) sure as hell makes for easier maintenance and
> avoids the whole issue of maintaining static numbers (all the same things
> that make me hate device number maintenance makes me also hate the fact
> that we need to maintain this list of ioctl numbers etc). By using
> descriptive names, the "maintenance" simple does not exist.


Blah blah blah. Now we have just one ugly, cluttered, undocumented
(please insert the list of your favourite invectives here) /proc.
This way we would have TONS of them.



Re: LANANA: Getting out of hand?

2001-05-15 Thread Martin Dalecki

Linus Torvalds wrote:
> 
> On Mon, 14 May 2001, Alan Cox wrote:
> >
> > Except that Linus wont hand out major numbers, which means I can't even boot
> > simply off such a device. I bet the vendors in question dont think the sun
> > shines out of linus backside any more.
> 
> Actually, it does. It's just that some people have gotten so blinded by my
> a** that they can no longer see it any more ;)
> 
> The problem I have is that there are lots of _good_ solutions, but they
> all imply a bit more work than the bad ones.
> 
> What does that result in? Everybody continues to use the simple old setup,
> which required no thought at all, but that is a pain to maintain.
> 
> For example, the only thing you need in order to boot is to have a nice
> clean "disk" major number. That's it. Nothing fancy, nothing more.
> 
> Look at what we have now:
> 
>  - ramdisk: major 1. Fair enough - ramdisk is special, in that it doesn't
>have any "real hardware". No problem.
>  - SCSI disks:
> major 8, 65-71,
>  - Compaq smart2:
> major 72-79
>  - Compaq CISS:
> major 104-111
>  - DASD;
> major 94
>  - IDE:
> major 3, 22, 33-34, 56-57, 88-91
> 
> and then the small random ones.
> 
> NONE of these major numbers have _any_ redeeming qualities except for the
> ramdisk. They should all be _one_ major number, namely "disk". There are
> absolutely NO advantages to having separate devices for some strange
> compaq controllers and IDE disks. There is _no_ point in having some SCSI
> disks show up at major 8, while others (who just happen to be attached to
> a scsi bus that is not driven by the generic SCSI layer) show up at major
> 104 or whatever.

And then the IDE stuff is stupid to use the same major number for
in fact entirely different devices, like CD-ROM and IDE disk drives on
the same major... This makes it VERY uncomfortable to guarantee that
for example the sector size and driver read-ahead are properties
tied to the major number alone... In fact Linux is bundling
read-ahead with the major number only, especially inside the RAID
drivers, which is entirely wrong! (see the blksize_size array and the
read_ahead array).

And yes, the RAID drivers are in particular *VERY* stupid in
terms of major/minor number usage.
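
The read-ahead complaint can be made concrete with a small sketch. It
assumes a read_ahead[] table indexed by major number only, as in the 2.4
block layer, and the usual IDE numbering where hda is (3,0) and hdb is
(3,64). The set_read_ahead()/get_read_ahead() helpers, the simplified
kdev_t and the NR_MAJORS table size are hypothetical, for illustration:

```c
/* Simplified stand-ins for the 2.4-era definitions (assumed for this sketch). */
typedef unsigned short kdev_t;

#define MINORBITS     8
#define MAJOR(dev)    ((unsigned int)(dev) >> MINORBITS)
#define MKDEV(ma, mi) ((kdev_t)(((ma) << MINORBITS) | (mi)))
#define NR_MAJORS     256   /* sketch value: one slot per 8-bit major */

/* One read-ahead value per major: the minor never enters the lookup. */
static int read_ahead[NR_MAJORS];

/* Hypothetical setter: whatever device asks, the whole major is changed. */
static void set_read_ahead(kdev_t dev, int sectors)
{
    read_ahead[MAJOR(dev)] = sectors;
}

/* Hypothetical getter: a disk and a CD-ROM on one major share the answer. */
static int get_read_ahead(kdev_t dev)
{
    return read_ahead[MAJOR(dev)];
}
```

Tuning read-ahead for a disk on hda therefore silently changes it for a
CD-ROM on hdb as well, because both resolve to the same slot.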



Re: LANANA: Getting out of hand?

2001-05-15 Thread Martin Dalecki

Linus Torvalds wrote:
 
 On Mon, 14 May 2001, Alan Cox wrote:
 
  Except that Linus wont hand out major numbers, which means I can't even boot
  simply off such a device. I bet the vendors in question dont think the sun
  shines out of linus backside any more.
 
 Actually, it does. It's just that some people have gotten so blinded by my
 a** that they can no longer see it any more ;)
 
 The problem I have is that there are lots of _good_ solutions, but they
 all imply a bit more work than the bad ones.
 
 What does that result in? Everybody continues to use the simple old setup,
 which required no thought at all, but that is a pain to maintain.
 
 For example, the only thing you need in order to boot is to have a nice
 clean disk major number. That's it. Nothing fancy, nothing more.
 
 Look at what we have now:
 
  - ramdisk: major 1. Fair enough - ramdisk is special, in that it doesn't
have any real hardware. No problem.
  - SCSI disks:
 major 8, 65-71,
  - Compaq smart2:
 major 72-79
  - Compaq CISS:
 major 104-111
  - DASD;
 major 94
  - IDE:
 major 3, 22, 33-34, 56-57, 88-91
 
 and then the small random ones.
 
 NONE of these major numbers have _any_ redeeming qualities except for the
 ramdisk. They should all be _one_ major number, namely disk. There are
 absolutely NO advantages to having separate devices for soem strange
 compaq controllers and IDE disks. There is _no_ point in having some SCSI
 disks show up at major 8, while others (who just happen to be attached to
 a scsi bus that is not driven by the generic SCSI layer) show up at major
 104 or whatever.

And then the IDE stuff is stiuoid to use the same major numbers for
in fact entierly different devices like CD-ROM and IDE disk drivers on
the same major... This makes it VERY uncomfortable to guarantee that
for example the sector size and driver read ahead are properties
tighted to the major number alone... In fact Linux is bundling 
read ahead with the major number only, in esp. inside the RAID drivers
which is entierly wrong! (see blksize_size array and read_ahead array).

And yes the RAID drivers are in particular *VERY* stiupid in
terms of major/minor number usage.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: LANANA: To Pending Device Number Registrants

2001-05-15 Thread Martin Dalecki

Linus Torvalds wrote:

 and then use
 
 fd = open(/dev/fd0/colourspace, O_RDWR);

 This, btw, is Al Viro's wet dream. But I have to agree: using name spaces
 etc is MUCH preferable to ioctl's, makes code more readable and logical,
 and often makes it possible to do things you couldn't sanely do before
 (control these things from scripts etc).
 
 And using ASCII names (eject) instead of numbers (see the FDEJECT and
 CDROMEJECT etc #defines) sure as hell makes for easier maintenance and
 avoids the whole issue of maintaining static numbers (all the same things
 that make me hate device number maintenance makes me also hate the fact
 that we need to maintain this list of ioctl numbers etc). By using
 descriptive names, the maintenance simple does not exist.


Blah blah blah... Now we have just one ugly, cluttered, undocumented
(please insert the list of your favourite invectives here) /proc.
This way we would have TONS of them.
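
For readers weighing the two positions, the name-based idea can be
modelled in user space as a lookup table. This is only a sketch: struct
model_drive, the control names and control_by_name are hypothetical and
do not correspond to any real driver or kernel interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical device state; not taken from any real driver. */
struct model_drive {
    int ejected;
    int locked;
};

typedef int (*control_fn)(struct model_drive *);

static int do_eject(struct model_drive *d) { d->ejected = 1; return 0; }
static int do_lock(struct model_drive *d)  { d->locked = 1;  return 0; }

/* Name table: adding a control means adding a row, not allocating a number. */
static const struct {
    const char *name;
    control_fn fn;
} controls[] = {
    { "eject", do_eject },
    { "lock",  do_lock  },
};

/* Look up a control by its ASCII name; returns -1 for an unknown name. */
static int control_by_name(struct model_drive *d, const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(controls) / sizeof(controls[0]); i++)
        if (strcmp(controls[i].name, name) == 0)
            return controls[i].fn(d);
    return -1;
}
```

The trade-off argued about above is visible here: no central number
registry is needed, but every device grows its own set of names.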



Re: [PATCH] SCSI disk minor number cleaning

2001-05-15 Thread Martin Dalecki

Andrzej Krzysztofowicz wrote:
 
> Hi,
>   The following patch cleans up a bit the usage of parameters related to
> the number of minors per disk in the SCSI subsystem. This is a preliminary
> patch and it seems to not contain any problematic changes. The full version
> of the patch (that allows one to successfully change SCSI_MINOR_SHIFT and
> use more/less partitions per disk) is available at
>
> ftp://rudy.mif.pg.gda.pl/pub/People/ankry/patches/scsi-minor/
>
> Both are against 2.4.4-ac9, but the shorter one can be applied to
> 2.4.5-pre series as well.
>
> Any comments are welcome.

Good stuff!  This is at least tagging where the problems are!



Re: [PATCH] fbdev logo (fwd)

2001-05-10 Thread Martin Dalecki

>   - Political fixes:
>   o There were still some penguins left carrying a glass of beer or wine.
> This problem is about 2 years old!

Could you please, for the sake of political correctness, just replace
the beer with a glass of vodka... It tastes better anyway!



Re: blkdev in pagecache

2001-05-09 Thread Martin Dalecki

Andrea Arcangeli wrote:
> 
> On Wed, May 09, 2001 at 11:13:33AM +0200, Martin Dalecki wrote:
> > >   (buffered and direct) to work with a 4096 bytes granularity instead of
> >
> > You mean PAGE_SIZE :-).
> 
> In my first patch it is really 4096 bytes, but yes I agree we should
> change that to PAGE_CACHE_SIZE. The _only_ reason it's 4096 fixed bytes is that
> I wasn't sure all the device drivers out there can digest a bh->b_size of
> 8k/32k/64k (for the non x86 archs) and I checked the minimal PAGE_SIZE
> supported by linux is 4k. If Jens says I can submit 64k b_size without
> any problem for all the relevant blkdevices then I will change that in a
> jiffy ;). Anyways changing that is truly easy, just define
> BUFFERED_BLOCKSIZE to PAGE_CACHE_SIZE instead of 4096 (plus the .._BITS as
> well) and it should do the trick automatically. So for now I only cared
> to make it easy to change that.
> 
> > Exactly, please see my former explanation... BTW, if you are going into
> > the range of PAGE_SIZE, it may be very well possible to remove the
> > whole page associated mechanisms of a buffer_head?
> 
> It wouldn't be that trivial to drop it, not much different than dropping
> it when a fs has a 4k blocksize. I think the dynamic allocation of the
> bh is not that a bad thing, or at least it's an orthogonal problem to
> moving the blkdev in pagecache ;).

I think the only guys who will have a hard time with this will be IBM's
AS/390 people, and maybe a far smaller pile of problems will arise in
the lvm and raid code... As I stated already, the AS/390 code in
particular is the most confused about the semantics of blksize_size vs.
hardsect_size, bh->b_size and so on.
find /usr/src/linux -exec grep blksize_size /dev/null {} \;
shows this nicely, as does the corresponding BLOCK_SIZE redefinition in
the lvm.h file! Well, not much worth caring about, I think... (It will
just *force* them to write cleaner code 8-).

> 
> > Basically this is something which should come down to the strategy
> > routine
> > of the corresponding device and be fixed there... And then we have this
> 
> so you mean the device driver should make sure blk_size is PAGE_CACHE_SIZE
> aligned and to take care of writing zero in the pagecache beyond the end
> of the device? That would be fine from my part but I'm not yet sure
> that's the cleanest manner to handle that.

Yes, that's about it. We *can* afford to expect that an access beyond
the end of a device is handled as an exception and not by checks
beforehand. This should greatly simplify the main code...

> 
> > Some notes about the code:
> >
> >   kdev_t dev = inode->i_rdev;
> > - struct buffer_head * bh, *bufferlist[NBUF];
> > - register char * p;
> > + int err;
> >
> > - if (is_read_only(dev))
> > - return -EPERM;
> > + err = -EIO;
> > + if (iblock >= (blk_size[MAJOR(dev)][MINOR(dev)] >>
> > (BUFFERED_BLOCKSIZE_BITS - BLOCK_SIZE_BITS)))
> >^
> >
> > blk_size[MAJOR(dev)] can very well be equal to NULL! In this case one is
> > supposed to assume blk_size[MAJOR(dev)][MINOR(dev)] to be INT_MAX.
> > Are you sure it's guaranteed here to be already preset?
> >
> > Same question goes for calc_end_index and calc_rsize.
> 
> that's a bug indeed (a minor one at least because all the relevant
> blkdevices initialize such array and if it's not initialized you notice
> before you can make any damage ;), thanks for pointing it out!

Problems of this kind slipping in are the reason for the last tiny
encapsulation patch I sent to Linus and Alan (for inclusion into 2.4.5).
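
The NULL case discussed above can be sketched as a small accessor. This
is a user-space model under stated assumptions: model_blk_size stands in
for the blk_size table, and model_get_blk_size is a hypothetical name,
not the function from the patch mentioned here.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define MODEL_MAX_MAJOR  256
#define MODEL_MAJOR(dev) ((unsigned)(dev) >> 8)
#define MODEL_MINOR(dev) ((unsigned)(dev) & 0xff)

/* Per-major pointer to a per-minor table of device sizes.
 * A NULL entry means the driver registered no size table. */
static int *model_blk_size[MODEL_MAX_MAJOR];

/* Hypothetical accessor: callers never index the raw table themselves. */
static int model_get_blk_size(unsigned dev)
{
    int *table = model_blk_size[MODEL_MAJOR(dev)];

    if (table == NULL)
        return INT_MAX;    /* no table registered: assume "no limit" */
    return table[MODEL_MINOR(dev)];
}
```

A bounds check such as the one in the quoted hunk can then call the
accessor and no longer dereferences a NULL per-major pointer.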



Re: blkdev in pagecache

2001-05-09 Thread Martin Dalecki

Andrea Arcangeli wrote:

> (btw, also the current rawio uses a 512byte bh->b_size granularity that is even
> worse than the 1024byte b_size of the blkdev, O_DIRECT is much smarter
> on this side as it uses the softblocksize of the fs that can be as well
> 4k if you created the fs with -b 4096)

Amen to this. The differentiation between blksize_size and
hardsect_size in Linux is:
a) not quite useful, since blksize_size isn't in reality a property of
the device but more a property of the currently mounted file system.
b) very confusing... see my last patch about ReiserFS, and please
have a look at the AS/390 code, which basically got *very* confused
about the semantics of blksize_size.

> I'll describe here some of the details of the blkdev-pagecache-1 patch:
> 
> - /dev/raw* and drivers/char/raw.c gets obsoleted and replaced by

HURRA! Great stuff!

>   opening the blkdevice with O_DIRECT, it looks much saner and I
>   basically get it for free by just implementing 10 lines of the
>   blkdev_direct_IO callback, of course I didn't removed the /dev/raw*
>   API for compatibility.

PLEASE REMOVE IT AS SOON AS POSSIBLE! It's a really insane API just for
Oracle tuning, and, well, most Oracle deployers don't run on /dev/raw*,
at least not under Linux, where it basically doesn't give you any real
performance gains... Or at least one could make /dev/raw* a configure
option and a module.
 
> - I force the virtual blocksize for all the blkdev I/O
>   (buffered and direct) to work with a 4096 bytes granularity instead of

You mean PAGE_SIZE :-).

>   the current 1024 softblocksize because we need that for getting higher
>   performance, 1024 is too low because it wastes too much ram and too
>   much cpu. So a DBMS won't be able anymore to write 512bytes to the

Exactly, please see my former explanation... BTW, if you are going into
the range of PAGE_SIZE, it may very well be possible to remove the
whole page-associated mechanism of a buffer_head?

>   disk using rawio being sure it will be a single atomic block update.
>   If you use /dev/raw nothing changed of course, only opening blkdev
>   with O_DIRECT enforce a minimal granularity of 4096 bytes in the I/O.
>   I don't think this is a problem, and also O_DIRECT through the fs was
>   just using the fs softblocksize instead of the hardblocksize as unit
>   of the minimal direct-IO granularity.
> 
> - writes to the blockdevice won't end in the buffer cache, so it will
>   be impossible to update the superblock of an ext2 partition mounted ro
>   for example, it must not be mounted at all to update the superblock, I
>   will need to invent an hack to fix this problem or it will get too
>   annoying. One way could simply to change ext2 and have it checking
>   the buffer to be uptodate before marking it dirty again but maybe
>   we could also do it in a generic manner that fixes all the fs at once
>   (OTOH probably not that many fs needs to be fscked online...).
> 
> - mmap should be functional but it's totally untested.
> 
> - currently the last `harddisk_size & 4095' bytes (if any) won't be
>   accessible via the blkdev, to avoid sending to the hardware requests
>   beyond the end of the device. Not sure how/if to solve this. But this is
>   definitely not a new issue, the same thing happens today in 2.2 and
>   2.4 after you mount a 4k filesystem on a blockdevice. OTOH I'm scared
>   a mke2fs -b 1024 could get confused. But I really don't want to
>   decrease the b_size of the buffer header even if we fix this.

Basically this is something which should come down to the strategy
routine of the corresponding device and be fixed there... And then we
have this gross blk_size check in ll_rw_block.c.
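
The arithmetic behind the inaccessible tail is simple enough to sketch.
The helper names below are hypothetical; only the 4096-byte granularity
and the `harddisk_size & 4095' expression come from the quoted text.

```c
#include <assert.h>

#define MODEL_BLOCK_BITS 12                        /* 4096-byte granularity */
#define MODEL_BLOCK_SIZE (1UL << MODEL_BLOCK_BITS)

/* Number of whole 4096-byte blocks that fit on the device. */
static unsigned long whole_blocks(unsigned long long device_bytes)
{
    return (unsigned long)(device_bytes >> MODEL_BLOCK_BITS);
}

/* Trailing bytes that no whole block covers: device_bytes & 4095. */
static unsigned long tail_bytes(unsigned long long device_bytes)
{
    return (unsigned long)(device_bytes & (MODEL_BLOCK_SIZE - 1));
}
```

For a device of 10 MiB plus three 512-byte sectors this gives 2560
whole blocks and 1536 trailing bytes that are not reachable at 4096-byte
granularity.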

Some notes about the code:

kdev_t dev = inode->i_rdev;
-   struct buffer_head * bh, *bufferlist[NBUF];
-   register char * p;
+   int err;
 
-   if (is_read_only(dev))
-   return -EPERM;
+   err = -EIO;
+   if (iblock >= (blk_size[MAJOR(dev)][MINOR(dev)] >>
(BUFFERED_BLOCKSIZE_BITS - BLOCK_SIZE_BITS)))
 ^

blk_size[MAJOR(dev)] can very well be equal to NULL! In this case one is
supposed to assume blk_size[MAJOR(dev)][MINOR(dev)] to be INT_MAX.
Are you sure it's guaranteed here to be already preset?

Same question goes for calc_end_index and calc_rsize.


+   goto out;
 
-   written = write_error = buffercount = 0;
-   blocksize = BLOCK_SIZE;
-   if (blksize_size[MAJOR(dev)] && blksize_size[MAJOR(dev)][MINOR(dev)])
-   blocksize = blksize_size[MAJOR(dev)][MINOR(dev)];



Re: page_launder() bug

2001-05-09 Thread Martin Dalecki

Rusty Russell wrote:
> 
> In message <[EMAIL PROTECTED]> you write:
> >
> > Jonathan Morton writes:
> >  > >-  page_count(page) == (1 + !!page->buffers));
> >  >
> >  > Two inversions in a row?
> >
> > It is the most straightforward way to make a '1' or '0'
> > integer from the NULL state of a pointer.
> 
> Overall, I'd have to say that this:
> 
> -   dead_swap_page =
> -   (PageSwapCache(page) &&
> -page_count(page) == (1 + !!page->buffers));
> -
> 
> Is nicer as:
> 
> int dead_swap_page = 0;
> 
> if (PageSwapCache(page)
> && page_count(page) == (page->buffers ? 1 : 2))
> dead_swap_page = 1;
> 
> After all, the second is what the code *means* (1 and 2 are magic
> numbers).
> 
> That said, anyone who doesn't understand the former should probably
> get some more C experience before commenting on others' code...

Basically Amen.

But maybe there is a better chance that the compiler does a
better job at branch prediction in the second case?
Well, anyway, objdump -S should show it...
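
Whatever the generated code looks like, the two spellings can be
compared directly. A minimal sketch with hypothetical helper names: note
that 1 + !!p is 2 for a non-NULL pointer and 1 for NULL, so the
conditional that matches it is p ? 2 : 1, with the constants the other
way round from the rewrite quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Expected count: 1, plus 1 if buffers are attached (pointer non-NULL). */
static int expected_count_bangbang(const void *buffers)
{
    return 1 + !!buffers;
}

/* The same value spelled out with a conditional expression. */
static int expected_count_ternary(const void *buffers)
{
    return buffers ? 2 : 1;
}
```

Compiling both helpers and comparing the output of objdump -S, as
suggested above, is the way to see whether the compiler treats them
differently.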



Re: iso9660 endianness cleanup patch

2001-05-02 Thread Martin Dalecki

"H. Peter Anvin" wrote:
> 
> Hi guys,
> 
> I was looking over the iso9660 code, and noticed that it was doing
> endianness conversion via ad hoc *functions*, not even inlines; nor did
> it take any advantage of the fact that iso9660 is bi-endian (has "all"
> data in both bigendian and littleendian format.)
> 
> The attached patch fixes both.  It is against 2.4.4, but from the looks
> of it it should patch against -ac as well.

Please beware: there is a can of worms you are opening up here,
since there are many broken CD producer programs out there which
only provide the little-endian data and incorrect big-endian
entries. I had some CDs of this form myself. So the endian neutrality
of iso9660 is present only in theory...
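
A defensive reader for such a field might look like the sketch below.
It assumes the usual iso9660 both-endian layout for 32-bit values (four
little-endian bytes followed by four big-endian bytes); the function
names are hypothetical and this is not the code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Decode 4 bytes stored least-significant byte first. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode 4 bytes stored most-significant byte first. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Read an 8-byte both-endian field. Sets *consistent to 1 when the two
 * copies agree and always returns the little-endian copy, which is the
 * one the broken mastering tools described above still fill in. */
static uint32_t read_both_endian32(const unsigned char *p, int *consistent)
{
    uint32_t le = read_le32(p);
    uint32_t be = read_be32(p + 4);

    *consistent = (le == be);
    return le;
}
```

Picking the copy that matches the host byte order saves a swap, but on
a big-endian host it would trust exactly the half that such discs get
wrong.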



PATCH 2.4.4 some fixes for the usage of blksize_size and others

2001-04-30 Thread Martin Dalecki
;
-
+   bs = get_blksize_size(inode->i_rdev);
set_blocksize(dev, bs);
 
lo->lo_bh = lo->lo_bhtail = NULL;
diff -urN linux/drivers/char/raw.c new/drivers/char/raw.c
--- linux/drivers/char/raw.cFri Apr 27 23:23:25 2001
+++ new/drivers/char/raw.c  Mon Apr 30 22:57:20 2001
@@ -124,22 +124,25 @@
return err;
}
 
-   
-   /* 
-* Don't interfere with mounted devices: we cannot safely set
-* the blocksize on a device which is already mounted.  
+   /*
+* 29.04.2001 Martin Dalecki:
+*
+* The original comment here was saying:
+*
+* "Don't interfere with mounted devices: we cannot safely set the
+* blocksize on a device which is already mounted."
+*
+* However the code below was setting happily the blocksize
+* disregarding the previous check. I have fixed this, however I'm
+* quite sure, that the statement above isn't right and we should be
+* able to remove the first arm of the branch below entierly.
 */
-   
-   sector_size = 512;
if (get_super(rdev) != NULL) {
-   if (blksize_size[MAJOR(rdev)])
-   sector_size = blksize_size[MAJOR(rdev)][MINOR(rdev)];
+   sector_size = get_blksize_size(rdev);
} else {
-   if (hardsect_size[MAJOR(rdev)])
-   sector_size = hardsect_size[MAJOR(rdev)][MINOR(rdev)];
+   sector_size = get_hardsect_size(rdev);
+   set_blocksize(rdev, sector_size);
}
-
-   set_blocksize(rdev, sector_size);
raw_devices[minor].sector_size = sector_size;
 
for (sector_bits = 0; !(sector_size & 1); )
@@ -148,7 +151,7 @@
 
  out:
up(&raw_devices[minor].mutex);
-   
+
return err;
 }
 
diff -urN linux/drivers/md/lvm-snap.c new/drivers/md/lvm-snap.c
--- linux/drivers/md/lvm-snap.c Fri Apr 27 23:23:25 2001
+++ new/drivers/md/lvm-snap.c   Mon Apr 30 23:27:40 2001
@@ -172,20 +172,6 @@
blocks[i] = start++;
 }
 
-inline int lvm_get_blksize(kdev_t dev)
-{
-   int correct_size = BLOCK_SIZE, i, major;
-
-   major = MAJOR(dev);
-   if (blksize_size[major])
-   {
-   i = blksize_size[major][MINOR(dev)];
-   if (i)
-   correct_size = i;
-   }
-   return correct_size;
-}
-
 #ifdef DEBUG_SNAPSHOT
 static inline void invalidate_snap_cache(unsigned long start, unsigned long nr,
 kdev_t dev)
@@ -218,7 +204,7 @@
 
if (is == 0) return;
is--;
-blksize_snap = lvm_get_blksize(lv_snap->lv_block_exception[is].rdev_new);
+blksize_snap = get_blksize_size(lv_snap->lv_block_exception[is].rdev_new);
 is -= is % (blksize_snap / sizeof(lv_COW_table_disk_t));
 
memset(lv_COW_table, 0, blksize_snap);
@@ -262,7 +248,7 @@
snap_phys_dev = lv_snap->lv_block_exception[idx].rdev_new;
snap_pe_start = lv_snap->lv_block_exception[idx - (idx % 
COW_entries_per_pe)].rsector_new - lv_snap->lv_chunk_size;
 
-   blksize_snap = lvm_get_blksize(snap_phys_dev);
+   blksize_snap = get_blksize_size(snap_phys_dev);
 
 COW_entries_per_block = blksize_snap / sizeof(lv_COW_table_disk_t);
 idx_COW_table = idx % COW_entries_per_pe % COW_entries_per_block;
@@ -307,7 +293,7 @@
idx++;
snap_phys_dev = lv_snap->lv_block_exception[idx].rdev_new;
snap_pe_start = lv_snap->lv_block_exception[idx - (idx % 
COW_entries_per_pe)].rsector_new - lv_snap->lv_chunk_size;
-   blksize_snap = lvm_get_blksize(snap_phys_dev);
+   blksize_snap = get_blksize_size(snap_phys_dev);
iobuf->blocks[0] = snap_pe_start >> (blksize_snap >> 10);
} else iobuf->blocks[0]++;
 
@@ -384,8 +370,8 @@
 
iobuf = lv_snap->lv_iobuf;
 
-   blksize_org = lvm_get_blksize(org_phys_dev);
-   blksize_snap = lvm_get_blksize(snap_phys_dev);
+   blksize_org = get_blksize_size(org_phys_dev);
+   blksize_snap = get_blksize_size(snap_phys_dev);
max_blksize = max(blksize_org, blksize_snap);
min_blksize = min(blksize_org, blksize_snap);
max_sectors = KIO_MAX_SECTORS * (min_blksize>>9);
diff -urN linux/drivers/md/lvm-snap.h new/drivers/md/lvm-snap.h
--- linux/drivers/md/lvm-snap.h Mon Jan 29 01:11:20 2001
+++ new/drivers/md/lvm-snap.h   Mon Apr 30 23:26:28 2001
@@ -32,7 +32,6 @@
 #define LVM_SNAP_H
 
 /* external snapshot calls */
-extern inline int lvm_get_blksize(kdev_t);
 extern int lvm_snapshot_alloc(lv_t *);
 extern void lvm_snapshot_fill_COW_page(vg_t *, lv_t *);
 extern int lvm_snapshot_COW(kdev_t, ulong, ulong, ulong, lv_t *);
diff -urN linux/drivers/md/lvm.c new/drivers/md/lvm.c
--- linux/drivers/md/lvm.c  Sat Apr 21 19:37:16 2001
+++ new/drivers/md/lvm.c    Mon Apr 30 23:31:56 2001
@@ -1077,7 +1077,7 @@
memset(bh,0,sizeof bh);
bh.b_rsector = block;
bh.b_dev

Re: [PATCH] cleanup for fixing get_super() races

2001-04-29 Thread Martin Dalecki

Alexander Viro wrote:
> 
> On Sat, 28 Apr 2001, Martin Dalecki wrote:
> 
> > I think in the context in which you are inventing the proposed function,
> > the drivers always have an inode at hand. And contrary to what Linus
> 
> Read the patch. Almost all cases are of the "loop over partitions of foo"
> kind.
> 
> > says, drivers don't just know about the devices they handle, they
> > know about the data they should get - at least in the context
> > of block devices. And then you could as well pass the inode, which
> > already contains a reference to the corresponding sb, and
> > save the whole get_super linear array lookup 8-). I think
> 
> No, you don't. Moreover, inode of device (even if you had it) _doesn't_
> contain a reference to sb of filesystem mounted from that device.

Ohhh, sorry, right. I forgot to check the code before
actually rambling... It must have been a leftover from
some other experiment with the kernel code I did recently that confused
me.
Sorry again...

> > the less kdev_t the better! It's overused already anyway, like
> > for example in the whole SCSI code, where the functions in reality only
> > want to pass the minor number to differentiate their behaviour...

This, however, I still stand by...

> > If you are going to flag the behaviour of the function,
> > then please use a bit pattern of well-defined flags as a parameter,
> > in a similar way to how it's done, for example, in many GUI libraries
> > (GTK, Motif and so on). This would make it far more readable.
> 
> /me looks at From:
> OK, Albert, what have you done with real Martin?
> 
> OK, whoever you are - no, "expandable" interfaces of that sort are
> rotten idea. What we really need is to replace sync_dev with fsync_dev -
> it _is_ correct in such context. That's it - 1 bit of information, no
> bitmaps needed.
> 
> /me is still boggled by the idea of somebody referring to GTK as an
> example of style...

Ehm, only for the way the flags get passed, not the rest of it.
You know, if I see some parameter taking the possible values 0, 1, 2,
then I mostly think that there should be some concrete names given to
them :-).
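
To make the point about named values concrete, here is a minimal user-space sketch in C. All names in it (inval_sync_mode, INVAL_*, DID_*, invalidate_dev_sketch) are made up for illustration and are not taken from the patch under discussion; the function does no real I/O and only records which steps it would have taken.

```c
/* Hypothetical names for the three behaviours mentioned in the thread. */
enum inval_sync_mode {
    INVAL_NO_SYNC = 0,  /* only invalidate, write nothing back first */
    INVAL_SYNC    = 1,  /* sync_dev()-style write-back first */
    INVAL_FSYNC   = 2   /* fsync_dev()-style write-back first */
};

/* Bits recording what the sketch pretended to do. */
#define DID_SYNC       0x1u
#define DID_FSYNC      0x2u
#define DID_INVALIDATE 0x4u

/* Stand-in for a combined "write back, then invalidate" helper. */
unsigned invalidate_dev_sketch(enum inval_sync_mode mode)
{
    unsigned done = 0;

    switch (mode) {
    case INVAL_SYNC:
        done |= DID_SYNC;
        break;
    case INVAL_FSYNC:
        done |= DID_FSYNC;
        break;
    case INVAL_NO_SYNC:
        break;
    }
    return done | DID_INVALIDATE;
}
```

A call site then reads invalidate_dev_sketch(INVAL_FSYNC) rather than passing a bare 2, which is the readability argument being made here.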



Re: [PATCH] cleanup for fixing get_super() races

2001-04-28 Thread Martin Dalecki

Alexander Viro wrote:
> 
> On Fri, 27 Apr 2001, Alexander Viro wrote:
> 
> > Fine with me. Actually in _all_ cases except cdrom.c it's preceded by
> > either sync_dev() or fsync_dev(). What do you think about pulling that
> > into the same function? Actually, that's what I've done in namespace
> > patch (name being invalidate_dev(), BTW ;-) The only problem I see
> > here is the argument telling whether we want sync or fsync (or nothing).
> > OTOH, I seriously suspect that we ought to replace all sync_dev() cases
> > with fsync_dev() anyway... Your opinion?
> >   Al
> 
> PS: last time I've separated that part of patch was a couple months
> ago. See if something similar to the variant below would be OK with
> you (I'll rediff it):

I think in the context in which you are inventing the proposed function,
the drivers always have an inode at hand. And contrary to what Linus
says, drivers don't just know about the devices they handle, they
know about the data they should get - at least in the context
of block devices. And then you could as well pass the inode, which
already contains a reference to the corresponding sb, and
save the whole get_super linear array lookup 8-). I think
the less kdev_t the better! It's overused already anyway, like
for example in the whole SCSI code, where the functions in reality only
want to pass the minor number to differentiate their behaviour...

If you are going to flag the behaviour of the function,
then please use a bit pattern of well-defined flags as a parameter,
in a similar way to how it's done, for example, in many GUI libraries
(GTK, Motif and so on). This would make it far more readable.

-- just my two euro-cents...



Re: [PATCH] SMP race in ext2 - metadata corruption.

2001-04-27 Thread Martin Dalecki

Linus Torvalds wrote:

> Dump was a stupid program in the first place. Leave it behind.

Not quite, Linus - dump/restore are nice tools for building, for example,
automatic over-the-network installation servers, i.e. efficient system
images or such. tar/cpio and friends don't deal properly with

a. holes inside of files.
b. hardlinks between files.

Really, they are not useless. However, I wouldn't recommend them
for backup practices either.
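
As a rough illustration of the two file properties listed above, here is a minimal C sketch of how a file-level archiver could spot them from stat() data. It is not taken from dump, tar or cpio; the function names are hypothetical, and the sparse check assumes st_blocks counts 512-byte units, as it does on Linux.

```c
#include <sys/stat.h>

/* For a regular file: more than one directory entry points at this inode. */
int has_other_links(const struct stat *st)
{
    return st->st_nlink > 1;
}

/*
 * Fewer blocks are allocated than the file size implies, so the file
 * probably contains holes. Assumes 512-byte st_blocks units (Linux).
 */
int looks_sparse(const struct stat *st)
{
    return (long long)st->st_blocks * 512 < (long long)st->st_size;
}
```

A tool that ignores either signal ends up storing a hole as a run of zero bytes, or a hard-linked file as several independent copies.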

Please see for example:

http://www.systime-solutions.de/index.php?topic=produkte&subtopic=setupserver

Well yes, if you understand German...



Re: PATCH for 2.4.3 - tinny mount code cleanup (kernel 0.97 compatibility)

2001-04-27 Thread Martin Dalecki

[EMAIL PROTECTED] wrote:
> 
> From: Martin Dalecki <[EMAIL PROTECTED]>
> 
> The attached patch is fixing gorgeous "backward compatibility"
> in the mount system command. It is removing two useless defines in
> the kernel headers and finally doubles the number of possible
> flags for the mount command.
> 
> Please apply.
> 
> You have it all backwards. Your patch halves the number of
> possible flags. The present kernel can use 32 (or 31) flags.
> 
> @@ -1317,10 +1313,6 @@
>  struct super_block *sb;
>  int retval = 0;
> 
> -/* Discard magic */
> -if ((flags & MS_MGC_MSK) == MS_MGC_VAL)
> -flags &= ~MS_MGC_MSK;
> -
>  /* Basic sanity checks */
> 
>  if (!dir_name || !*dir_name || !memchr(dir_name, 0, PAGE_SIZE))
> 
> You see what this code does: if the top half has this old magic
> (as it has today in 100% of all Linux installations),
> then the top half is ignored.
> If the value is non-conventional, it can be used to mean something.
> 
> Maybe you did not realize that mount still puts that value there?

Oops, maybe a typo in my find ./ ... grep?

Anyway, at least the comment there is at best misleading...
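
For reference, the behaviour described in the quoted explanation can be reproduced in a small user-space C sketch. The two macro values are an assumption derived from the comment in the patch ("a special 16-bit magic number in the high word: 0xC0ED"), and discard_mount_magic is a hypothetical name, not a kernel function.

```c
/* Assumed values: magic 0xC0ED in the high 16 bits of a 32-bit flag word. */
#define MS_MGC_VAL (0xC0EDUL << 16)
#define MS_MGC_MSK (0xFFFFUL << 16)

/*
 * Mirrors the "Discard magic" step: if the high half holds the old
 * magic, clear it; any other high-half value is left untouched and
 * could therefore carry additional flag bits.
 */
unsigned long discard_mount_magic(unsigned long flags)
{
    if ((flags & MS_MGC_MSK) == MS_MGC_VAL)
        flags &= ~MS_MGC_MSK;
    return flags;
}
```

With these values, a mount binary that still ORs in the magic gets its high half cleared, while a caller that sets other high bits keeps them, which matches the point that the present code already allows flags in the upper 16 bits.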

> The mount we use today will be around for many years to come.
> This "discard magic" part cannot be removed within five years.
> 
> Andries

-- 
- phone: +49 214 8656 283
- job:   eVision-Ventures AG, LEV .de (MY OPINIONS ARE MY OWN!)
- langs: de_DE.ISO8859-1, en_US, pl_PL.ISO8859-2, last ressort:
ru_RU.KOI8-R



Re: PATCH: 2.4.3 tinny module interface cleanum

2001-04-26 Thread Martin Dalecki

Ingo Oeser wrote:
> 
> On Thu, Apr 26, 2001 at 10:58:46AM +0200, Martin Dalecki wrote:
> > 1. Help making the module interface cleaner by a tiny margin :-).
> 
> You only help changing the API during a stable[1] series. Wait until 2.5
> for this.
> 
> API cannot change during stable series. (ABI can, BTW)
> So let's just forget about this, ok ;-)

So just show me one module using this function in a non-broken way!



PATCH: 2.4.3 tinny module interface cleanum

2001-04-26 Thread Martin Dalecki

Hello!

The following patch makes the get_empty_super() function
local to the only place where it is used and where it
should be used: fs/super.c

The removal of this symbol from ksyms.c should:

1. Help make the module interface cleaner by a tiny margin :-).

2. Not hurt anything sane.

Please apply... any line-number slop in the patch is due to bla bla and
doesn't hurt...

Thanks

-- 
- phone: +49 214 8656 283
- job:   eVision-Ventures AG, LEV .de (MY OPINIONS ARE MY OWN!)
- langs: de_DE.ISO8859-1, en_US, pl_PL.ISO8859-2, last ressort:
ru_RU.KOI8-R

diff -ur linux/fs/super.c new/fs/super.c
--- linux/fs/super.c  Wed Apr 18 20:41:17 2001
+++ new/fs/super.c  Thu Apr 26 01:08:48 2001
@@ -691,7 +691,7 @@
  * the request.
  */
  
-struct super_block *get_empty_super(void)
+static struct super_block *get_empty_super(void)
 {
struct super_block *s;
 
diff -ur linux/include/linux/fs.h new/include/linux/fs.h
--- linux/include/linux/fs.h  Wed Apr 18 20:41:18 2001
+++ new/include/linux/fs.h  Thu Apr 26 01:03:03 2001
@@ -1291,7 +1285,6 @@
 
 extern struct file_system_type *get_fs_type(const char *name);
 extern struct super_block *get_super(kdev_t);
-struct super_block *get_empty_super(void);
 extern void put_super(kdev_t);
 unsigned long generate_cluster(kdev_t, int b[], int);
 unsigned long generate_cluster_swab32(kdev_t, int b[], int);
diff -ur linux/kernel/ksyms.c new/kernel/ksyms.c
--- linux/kernel/ksyms.c  Wed Apr 18 20:41:19 2001
+++ new/kernel/ksyms.c  Thu Apr 26 00:40:48 2001
@@ -129,7 +129,6 @@
 EXPORT_SYMBOL(update_atime);
 EXPORT_SYMBOL(get_fs_type);
 EXPORT_SYMBOL(get_super);
-EXPORT_SYMBOL(get_empty_super);
 EXPORT_SYMBOL(getname);
 EXPORT_SYMBOL(names_cachep);
 EXPORT_SYMBOL(fput);



PATCH for 2.4.3 - tinny mount code cleanup (kernel 0.97 compatibility)

2001-04-26 Thread Martin Dalecki

The attached patch fixes the gorgeous "backward compatibility"
in the mount system call. It removes two useless defines in
the kernel headers and finally doubles the number of possible
flags for the mount command.

Please apply.

If there are any line count difference warnings when applying this
patch to the vanilla 2.4.3 tree, then please bear with me;
it's only due to the fact that I preferred to edit this patch by hand
out of some other one...

Thanks in advance.

-- 
- phone: +49 214 8656 283
- job:   eVision-Ventures AG, LEV .de (MY OPINIONS ARE MY OWN!)
- langs: de_DE.ISO8859-1, en_US, pl_PL.ISO8859-2, last ressort:
ru_RU.KOI8-R

diff -ur linux/arch/sparc/kernel/sys_sunos.c new/arch/sparc/kernel/sys_sunos.c
--- linux/arch/sparc/kernel/sys_sunos.c Wed Apr 18 20:40:50 2001
+++ new/arch/sparc/kernel/sys_sunos.c   Thu Apr 26 01:01:50 2001
@@ -749,7 +749,7 @@
 asmlinkage int
 sunos_mount(char *type, char *dir, int flags, void *data)
 {
-   int linux_flags = MS_MGC_MSK; /* new semantics */
+   int linux_flags = 0;
int ret = -EINVAL;
char *dev_fname = 0;
char *dir_page, *type_page;
diff -ur linux/arch/sparc64/kernel/sys_sunos32.c new/arch/sparc64/kernel/sys_sunos32.c
--- linux/arch/sparc64/kernel/sys_sunos32.c Wed Apr 18 20:40:50 2001
+++ new/arch/sparc64/kernel/sys_sunos32.c   Thu Apr 26 01:01:46 2001
@@ -717,7 +717,7 @@
 asmlinkage int
 sunos_mount(char *type, char *dir, int flags, void *data)
 {
-   int linux_flags = MS_MGC_MSK; /* new semantics */
+   int linux_flags = 0;
int ret = -EINVAL;
char *dev_fname = 0;
char *dir_page, *type_page;
diff -ur linux/fs/super.c new/fs/super.c
--- linux/fs/super.c  Wed Apr 18 20:41:17 2001
+++ new/fs/super.c  Thu Apr 26 01:08:48 2001
@@ -1297,16 +1297,12 @@
 }
 
 /*
- * Flags is a 16-bit value that allows up to 16 non-fs dependent flags to
+ * Flags is a 32-bit value that allows up to 32 non-fs dependent flags to
  * be given to the mount() call (ie: read-only, no-dev, no-suid etc).
  *
  * data is a (void *) that can point to any structure up to
  * PAGE_SIZE-1 bytes, which can contain arbitrary fs-dependent
  * information (or be NULL).
- *
- * NOTE! As pre-0.97 versions of mount() didn't use this setup, the
- * flags used to have a special 16-bit magic number in the high word:
- * 0xC0ED. If this magic number is present, the high word is discarded.
  */
 long do_mount(char * dev_name, char * dir_name, char *type_page,
  unsigned long flags, void *data_page)
@@ -1317,10 +1313,6 @@
struct super_block *sb;
int retval = 0;
 
-   /* Discard magic */
-   if ((flags & MS_MGC_MSK) == MS_MGC_VAL)
-   flags &= ~MS_MGC_MSK;
- 
/* Basic sanity checks */
 
if (!dir_name || !*dir_name || !memchr(dir_name, 0, PAGE_SIZE))
@@ -1345,12 +1337,6 @@
if (!type_page || !memchr(type_page, 0, PAGE_SIZE))
return -EINVAL;
 
-#if 0  /* Can be deleted again. Introduced in patch-2.3.99-pre6 */
-   /* loopback mount? This is special - requires fewer capabilities */
-   if (strcmp(type_page, "bind")==0)
-   return do_loopback(dev_name, dir_name);
-#endif
-
/* for the rest we _really_ need capabilities... */
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
diff -ur linux/include/linux/fs.h new/include/linux/fs.h
--- linux/include/linux/fs.h  Wed Apr 18 20:41:18 2001
+++ new/include/linux/fs.h  Thu Apr 26 01:03:03 2001
@@ -121,12 +121,6 @@
 #define MS_RMT_MASK(MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|\
MS_SYNCHRONOUS|MS_MANDLOCK|MS_NOATIME|MS_NODIRATIME)
 
-/*
- * Magic mount flag number. Has to be or-ed to the flag values.
- */
-#define MS_MGC_VAL 0xC0ED  /* magic flag number to indicate "new" flags */
-#define MS_MGC_MSK 0x  /* magic flag number mask */
-
 /* Inode flags - they have nothing to superblock flags now */
 
 #define S_SYNC 1   /* Writes are synced at once */



Re: Device Registry (DevReg) Patch 0.2.0

2001-04-24 Thread Martin Dalecki

Tim Jansen wrote:
> 
> On Tuesday 24 April 2001 11:40, Martin Dalecki wrote:
> > Tim Jansen wrote:
> > > The Linux Device Registry (devreg) is a kernel patch that adds a device
> > > database in XML format to the /proc filesystem. It collects all
> > OH SHIT!!  ^^^
> > Why don't you just add postscript output to /proc?
> 
> XML wasn't my first choice. The 0.1.x versions used simple name/value pairs,
> I gave this up after trying to fit the complex USB
> configuration/interface/endpoint data into name/value pairs. Thinking about
> text file formats that allow me to display hierarchical information,  XML was
> the obvious choice for me. Are there alternatives to get complex and
> extendable information out to user space? (see
> http://www.tjansen.de/devreg/devreg.output.txt for an example /proc/devreg
> output)

Yes, filesystem structures. Or just plain binary data with simple
parsing in user space.
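
A minimal sketch of what "filesystem structures" could look like for the hierarchical USB case: each attribute becomes one small file, and the nesting is carried by the directory path instead of by XML elements. The layout and the helper name devreg_attr_path are purely hypothetical and appear in neither the DevReg patch nor the kernel.

```c
#include <stdio.h>

/*
 * Hypothetical layout: devreg/<bus>/<device>/config<N>/interface<M>/<attr>.
 * Returns what snprintf returns: the length of the full path, which is
 * >= len when the buffer was too small and the result was truncated.
 */
int devreg_attr_path(char *buf, size_t len, const char *bus,
                     int dev, int cfg, int intf, const char *attr)
{
    return snprintf(buf, len, "devreg/%s/%d/config%d/interface%d/%s",
                    bus, dev, cfg, intf, attr);
}
```

User space would then need only open() and read() on such paths, with no parser involved, at the cost of one directory entry per attribute.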

> My other ideas were:
> - using a simple binary format, just dump structs. This would break all
> applications every time somebody changes the format, and this should happen
> very often because of the nature of the format
> - using a complicated, extendable binary format, for example chunk-based like
> (a|r)iff file formats. This would add more code in the kernel than XML
> output, is difficult to understand and requires more work in user space
> (because XML parsers are already available)
> - making up a new text-based format with properties similar to XML because I
> knew that many people don't like the idea of XML output in the kernel.. I
> really thought about it, but it does not make much sense.
> 
> The actual code overhead of XML output compared to a format like
> /proc/bus/usb/devices is almost zero, XML is only a little bit more verbose.
> I agree that XML is not perfect for this kind of data, but it is simple to
> generate, well known and I don't see a better alternative.
> 
> bye..
> 

-- 
- phone: +49 214 8656 283
- job:   eVision-Ventures AG, LEV .de (MY OPINIONS ARE MY OWN!)
- langs: de_DE.ISO8859-1, en_US, pl_PL.ISO8859-2, last ressort:
ru_RU.KOI8-R



Re: Device Registry (DevReg) Patch 0.2.0

2001-04-24 Thread Martin Dalecki

Tim Jansen wrote:
> 
> The Linux Device Registry (devreg) is a kernel patch that adds a device
> database in XML format to the /proc filesystem. It collects all information
    OH SHIT!! ^^^

<IRONY>
Why don't you just add postscript output to /proc?
</IRONY>

> about the system's physical devices, creates persistent device ids and
> provides them in the file /proc/devreg.



Re: Device Major max and Disk Max in 2.4.x kernel

2001-04-23 Thread Martin Dalecki

"Dupuis, Don" wrote:
> 
> I have already sent a patch to Alan and Linus on this issue.  Linus has
> never responded and Alan said he would look into it in the middle of April.
> Nothing is new at this point.
> 
> -Original Message-
> From: PhiloVivero [mailto:[EMAIL PROTECTED]]
> Sent: Sunday, April 22, 2001 12:12 AM
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED];
> [EMAIL PROTECTED]
> Subject: Device Major max and Disk Max in 2.4.x kernel
> 
> I have a problem. Trying to write an iostat for Linux (or use an existing
> one):
> 
> From the kernel source:
> 
> [/usr/src/linux-2.4.2/include/linux] :) grep DK_MAX *.h
> kernel_stat.h:#define DK_MAX_MAJOR 16
> kernel_stat.h:#define DK_MAX_DISK 16
> 
> What to notice: MAJOR and DISK max are 16.
> 
> Again, from the kernel source:
> 
> [/usr/src/linux-2.4.2/fs/proc] :) grep -15 DK_MAX proc_misc.c
> <snip>
> for (major = 0; major < DK_MAX_MAJOR; major++) {
>         for (disk = 0; disk < DK_MAX_DISK; disk++) {
>                 int active = kstat.dk_drive[major][disk] +
>                         kstat.dk_drive_rblk[major][disk] +
>                         kstat.dk_drive_wblk[major][disk];
>                 if (active)
>                         len += sprintf(page + len,
>                                 "(%u,%u):(%u,%u,%u,%u,%u) ",
>                                 major, disk,
>                                 kstat.dk_drive[major][disk],
>                                 kstat.dk_drive_rio[major][disk],
>                                 kstat.dk_drive_rblk[major][disk],
>                                 kstat.dk_drive_wio[major][disk],
>                                 kstat.dk_drive_wblk[major][disk]
>                         );
>         }
> }
> 
> What to notice: We are looping up to DK_MAX_MAJOR and DK_MAX_DISK. What
> this means is, any major >= 16 or disk >= 16 won't be listed in /proc/stat under
> the "disk_io" section.
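For an iostat-style tool, one entry of that line could be read back with the same format string the kernel uses to print it. This is only a minimal sketch: the field order is taken from the sprintf() call quoted above, while the struct and function names are made up for illustration and do not exist in any kernel header.

```c
#include <assert.h>
#include <stdio.h>

/* One "(major,disk):(io,rio,rblk,wio,wblk)" entry of the disk_io line.
 * Field order follows the quoted sprintf() call; the struct itself is
 * hypothetical. */
struct disk_io_entry {
	unsigned int major, disk;
	unsigned int io, rio, rblk, wio, wblk;
};

/* Parse a single entry at the start of s.
 * Returns 1 if all seven fields were read, 0 otherwise. */
int parse_disk_io_entry(const char *s, struct disk_io_entry *e)
{
	assert(s != NULL && e != NULL);
	return sscanf(s, "(%u,%u):(%u,%u,%u,%u,%u)",
		      &e->major, &e->disk,
		      &e->io, &e->rio, &e->rblk, &e->wio, &e->wblk) == 7;
}
```

A real tool would walk the whole line entry by entry. The point here is only that devices outside the loop bounds never produce such an entry in the first place.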
> 
> Problem. On my system, which I figure is not too uncommon, I have several
> partitions on two hard drives and a CDROM. They are configured thusly:
> 
> # cat /proc/partitions
> major minor  #blocks  name
>    3     0   20094480 hda
>    3     1    6313513 hda1
>    3     2     401625 hda2
>    3     3   13374112 hda3
>    3    64    4497152 hdb
>   56     0   45034920 hdi
>   56     1   22490968 hdi1
>   56     2   22539195 hdi2
> 
> What to notice: I have a drive on /dev/hdi (never mind why, it actually
> works) that is block major 56. Not only that, my cdrom device on /dev/hdb
> is block major 3, but minor number 64. I am assuming for disks, minor ==
> disk. Sorry if this is an incorrect assumption.
> 
> No stats for /dev/hdi nor /dev/hdb ever show up in /proc/stat. Only for
> /dev/hda. On my other 2.4.2 system, with multiple hard drives under 16/16,
> I get multiple devices under /proc/stat.
> 
> The patch seems relatively easy. Change linux/include/linux/kernel_stat.h to
> allow block major up to 56 (in my case... 64 in general???) and disks up to
> 64 (in my case).
> 
> But we might need more than 64 disks on a block major (there are MANY snips
> in this so-called cut 'n' paste, because I figure you don't want to see them
> all):
> 
> # l /dev/hd* | sort -n
> brw-rw----   1 root disk   3,  79 Feb 22 08:57 /dev/hdb15
> brw-rw----   1 root disk   3,  80 Feb 22 08:57 /dev/hdb16
> brw-rw----   1 root disk  22,  79 Feb 22 08:57 /dev/hdd15
> brw-rw----   1 root disk  22,  80 Feb 22 08:57 /dev/hdd16
> brw-rw----   1 root disk  33,  79 Feb 22 08:57 /dev/hdf15
> brw-rw----   1 root disk  33,  80 Feb 22 08:57 /dev/hdf16
> brw-rw----   1 root disk  34,  79 Feb 22 08:57 /dev/hdh15
> brw-rw----   1 root disk  34,  80 Feb 22 08:57 /dev/hdh16
> brw-rw----   1 root disk  56, 126 Mar 25 17:14 /dev/hdj62
> brw-rw----   1 root disk  56, 127 Mar 25 17:14 /dev/hdj63
> 
> What to notice: We have disks up to 127. I never see any block major over 64
> on my system. The /dev/hdj device isn't used on my system. /dev/hdi and
> /dev/hdj belong to a Promise RAID controller on a new-ish
> ASUS AMD motherboard.
> 
> Let me know if I can be of further service. I must bashfully admit that I'm
> not enough of a guru to recompile my kernel anymore, or I'd tweak the
> kernel_stat.h file and recompile myself to test this.
> 
> This is just hazy recollection, but I think the 2.2.x kernels have the same
> problem.

Just never think about those stats - they are simply broken by design.
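The effect described in the report can be reproduced in user space. What follows is a minimal sketch under the reporter's own assumption that the disk index equals the minor number, which may not be how the kernel actually numbers disks. Only the two DK_MAX_* values come from the quoted kernel_stat.h; the helper names are invented for illustration.

```c
#include <assert.h>

/* Limits as quoted from include/linux/kernel_stat.h in 2.4.2. */
#define DK_MAX_MAJOR 16
#define DK_MAX_DISK 16

/* Hypothetical helper: returns 1 if the pair lies inside the loop ranges
 * major < DK_MAX_MAJOR and disk < DK_MAX_DISK, 0 otherwise. */
int covered_by_disk_io(unsigned int major, unsigned int disk)
{
	return major < DK_MAX_MAJOR && disk < DK_MAX_DISK;
}

/* The devices from the /proc/partitions listing in the report,
 * assuming disk == minor as the reporter does. */
void check_reported_devices(void)
{
	assert(covered_by_disk_io(3, 0));    /* hda: shows up */
	assert(!covered_by_disk_io(3, 64));  /* hdb: disk index too large */
	assert(!covered_by_disk_io(56, 0));  /* hdi: major too large */
}
```

Raising the two constants would widen the window, but every kstat.dk_drive* array is dimensioned by them, so the table grows with the product of the two limits.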



Re: [BUG] lvm beta7 and ac11 problems

2001-04-23 Thread Martin Dalecki

Jeff Chua wrote:
> 
> On Mon, 23 Apr 2001, Martin Dalecki wrote:
> 
> > > > depmod: *** Unresolved symbols in /lib/modules/2.4.3-ac11/kernel/drivers/md/lvm-mod.o
> 
> try this (after you have applied the patch for lvm 0.9.1_beta7) ...
> 
> Jeff
> [[EMAIL PROTECTED]]
> 
> --- /u2/src/linux/drivers/md/lvm.c.org  Mon Apr 23 21:11:32 2001
> +++ /u2/src/linux/drivers/md/lvm.c  Mon Apr 23 21:12:27 2001
> @@ -1791,7 +1791,7 @@
> int max_hardblocksize = 0, hardblocksize;
> 
> for (le = 0; le < lv->lv_allocated_le; le++) {
> -   hardblocksize = get_hardblocksize(lv->lv_current_pe[le].dev);
> +   hardblocksize = get_hardsect_size(lv->lv_current_pe[le].dev);

^
> if (hardblocksize == 0)
> hardblocksize = 512;
^

The two code lines marked above can be dropped, since get_hardsect_size()
already returns the Linux default sector size (namely 512 bytes)
when the driver has not had a chance to set up the hardsect_size[] array
in time (which shouldn't happen anyway).
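The argument can be illustrated with a small user-space model. This is a sketch and not the kernel code: the table, its size and the *_model function names are stand-ins. The behaviour modelled is the one described above (get_hardsect_size() yields 512 when nothing is registered), set against the old get_hardblocksize() convention of returning 0 for "unknown".

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_MAJOR 256
#define DEFAULT_SECTOR_SIZE 512

/* Stand-in for the kernel's per-major hardsect_size[] table:
 * NULL means the driver registered nothing for that major. */
int *hardsect_size_model[MODEL_MAX_MAJOR];

/* Old-style lookup: 0 means "unknown", so every caller needs a fallback.
 * The caller must keep minor inside the registered row. */
int get_hardblocksize_model(unsigned int major, unsigned int minor)
{
	assert(major < MODEL_MAX_MAJOR);
	if (hardsect_size_model[major] != NULL)
		return hardsect_size_model[major][minor];
	return 0;
}

/* New-style lookup: never returns 0, it falls back to 512 itself. */
int get_hardsect_size_model(unsigned int major, unsigned int minor)
{
	int size = get_hardblocksize_model(major, minor);

	return size != 0 ? size : DEFAULT_SECTOR_SIZE;
}
```

With the second function a test such as `if (hardblocksize == 0)` can never be true, which is why the marked lines are dead code once the call is switched over.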


> if (hardblocksize > max_hardblocksize)
> @@ -1801,7 +1801,7 @@
> if (lv->lv_access & LV_SNAPSHOT) {
> for (e = 0; e < lv->lv_remap_end; e++) {
> hardblocksize =
> -   get_hardblocksize(
> +   get_hardsect_size(
> 
> lv->lv_block_exception[e].rdev_new);
> if (hardblocksize == 0)
> hardblocksize = 512;
> 



Re: [BUG] lvm beta7 and ac11 problems

2001-04-23 Thread Martin Dalecki

Jens Axboe wrote:
> 
> On Sat, Apr 21 2001, Ed Tomlinson wrote:
> > Hi,
> >
> > building a kernel with 2.4.3-ac11 and lvm beta7 + vfs_locking_patch-2.4.2 yields:
> >
> > oscar# depmod -ae 2.4.3-ac11
> > depmod: *** Unresolved symbols in /lib/modules/2.4.3-ac11/kernel/drivers/md/lvm-mod.o
> > depmod: get_hardblocksize
> >
> > ideas?
> 
> s/get_hardblocksize/get_hardsect_size

And don't forget to check whether the get_hardblocksize() == 0 test,
or similar ones, can be dropped altogether as well.



PATCH tinny confusion cleanup in 2.4.3

2001-04-18 Thread Martin Dalecki

Hello!

The attached patch removes the get_hardblocksize() function entirely
from the kernel. The function is completely unnecessary given the
existence of get_hardsect_size(), which was introduced to properly
encapsulate accesses to the hardsect_size[] array.
As a side effect this reduces the number of module call entry points
by one, which is a Good Thing (TM).

Please just apply it...

diff -urN linux/fs/buffer.c linux-new/fs/buffer.c
--- linux/fs/buffer.c   Wed Apr 18 20:41:16 2001
+++ linux-new/fs/buffer.c   Wed Apr 18 18:28:52 2001
@@ -555,25 +555,6 @@
return bh;
 }
 
-unsigned int get_hardblocksize(kdev_t dev)
-{
-   /*
-* Get the hard sector size for the given device.  If we don't know
-* what it is, return 0.
-*/
-   if (hardsect_size[MAJOR(dev)] != NULL) {
-   int blksize = hardsect_size[MAJOR(dev)][MINOR(dev)];
-   if (blksize != 0)
-   return blksize;
-   }
-
-   /*
-* We don't know what the hardware sector size for this device is.
-* Return 0 indicating that we don't know.
-*/
-   return 0;
-}
-
 void buffer_insert_inode_queue(struct buffer_head *bh, struct inode *inode)
 {
spin_lock(&lru_list_lock);
diff -urN linux/fs/ext2/super.c linux-new/fs/ext2/super.c
--- linux/fs/ext2/super.c   Fri Dec 29 23:36:44 2000
+++ linux-new/fs/ext2/super.c   Wed Apr 18 19:19:44 2001
@@ -24,6 +24,7 @@
 #include <linux/slab.h>
 #include <linux/init.h>
 #include <linux/locks.h>
+#include <linux/blkdev.h>
 #include <asm/uaccess.h>
 
 
@@ -404,11 +405,9 @@
 * This is important for devices that have a hardware
 * sectorsize that is larger than the default.
 */
-   blocksize = get_hardblocksize(dev);
-   if( blocksize == 0 || blocksize < BLOCK_SIZE )
- {
+   blocksize = get_hardsect_size(dev);
+   if(blocksize < BLOCK_SIZE )
blocksize = BLOCK_SIZE;
- }
 
sb->u.ext2_sb.s_mount_opt = 0;
if (!parse_options ((char *) data, &sb_block, &resuid, &resgid,
@@ -482,11 +481,9 @@
 * Make sure the blocksize for the filesystem is larger
 * than the hardware sectorsize for the machine.
 */
-   hblock = get_hardblocksize(dev);
-   if((hblock != 0)
-   && (sb->s_blocksize < hblock) )
-   {
-   printk("EXT2-fs: blocksize too small for device.\n");
+   hblock = get_hardsect_size(dev);
+   if (sb->s_blocksize < hblock) {
+   printk(KERN_ERR "EXT2-fs: blocksize too small for device.\n");
goto failed_mount;
}
 
diff -urN linux/fs/isofs/inode.c linux-new/fs/isofs/inode.c
--- linux/fs/isofs/inode.c  Wed Apr 18 20:41:16 2001
+++ linux-new/fs/isofs/inode.c  Wed Apr 18 20:23:40 2001
@@ -27,6 +27,7 @@
 #include <linux/nls.h>
 #include <linux/ctype.h>
 #include <linux/smp_lock.h>
+#include <linux/blkdev.h>
 
 #include <asm/system.h>
 #include <asm/uaccess.h>
@@ -493,21 +494,21 @@
printk("iocharset = %s\n", opt.iocharset);
 #endif
 
-   /*
-* First of all, get the hardware blocksize for this device.
-* If we don't know what it is, or the hardware blocksize is
-* larger than the blocksize the user specified, then use
-* that value.
-*/
-   blocksize = get_hardblocksize(dev);
-   if(blocksize > opt.blocksize) {
-   /*
-* Force the blocksize we are going to use to be the
-* hardware blocksize.
-*/
-   opt.blocksize = blocksize;
+   /*
+* First of all, get the hardware blocksize for this device.
+* If we don't know what it is, or the hardware blocksize is
+* larger than the blocksize the user specified, then use
+* that value.
+*/
+   blocksize = get_hardsect_size(dev);
+   if(blocksize > opt.blocksize) {
+   /*
+* Force the blocksize we are going to use to be the
+* hardware blocksize.
+*/
+   opt.blocksize = blocksize;
}
- 
+
blocksize_bits = 0;
{
  int i = opt.blocksize;
diff -urN linux/fs/minix/inode.c linux-new/fs/minix/inode.c
--- linux/fs/minix/inode.c  Wed Apr 18 20:41:16 2001
+++ linux-new/fs/minix/inode.c  Wed Apr 18 20:27:54 2001
@@ -21,6 +21,7 @@
 #include <linux/init.h>
 #include <linux/smp_lock.h>
 #include <linux/highuid.h>
+#include <linux/blkdev.h>
 
 #include <asm/system.h>
 #include <asm/bitops.h>
@@ -179,7 +180,7 @@
const char * errmsg;
struct inode *root_inode;
unsigned int hblock;
-   
+
/* N.B. These should be compile-time tests.
   Unfortunately that is impossible. */
if (32 != sizeof (struct minix_inode))
@@ -187,8 +188,8 @@
if (64 != sizeof(struct minix2_inode))
panic("bad V2 i-node size");
 
-   hblock = get_hardblocksize(dev);
-   if (hblock && hblock > BLOCK_SIZE)
+   hblock = get_hardsect_size(dev);
+   if (hblock > BLOCK_SIZE)
goto out_bad_hblock;
 
set_blocksize(dev, BLOCK_SIZE);
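
Why the simplified callers behave the same can be checked with a small user-space sketch of the first ext2 hunk. Two assumptions are made here: BLOCK_SIZE is 1024, as in include/linux/fs.h, and get_hardsect_size() reports 512 where get_hardblocksize() reported 0. The function names are invented for the example.

```c
#include <assert.h>

#define BLOCK_SIZE 1024          /* assumed value from include/linux/fs.h */
#define DEFAULT_SECTOR_SIZE 512  /* assumed fallback of get_hardsect_size() */

/* Old ext2 logic: raw is the table entry, 0 meaning "unknown",
 * exactly what get_hardblocksize() handed back. */
int ext2_blocksize_old(int raw)
{
	int blocksize = raw;

	if (blocksize == 0 || blocksize < BLOCK_SIZE)
		blocksize = BLOCK_SIZE;
	return blocksize;
}

/* New ext2 logic: the unknown case arrives as 512 instead of 0,
 * so the explicit zero test is no longer needed. */
int ext2_blocksize_new(int raw)
{
	int blocksize = raw != 0 ? raw : DEFAULT_SECTOR_SIZE;

	if (blocksize < BLOCK_SIZE)
		blocksize = BLOCK_SIZE;
	return blocksize;
}

/* Both variants agree for unknown, small and large sector sizes. */
void check_ext2_equivalence(void)
{
	static const int samples[] = { 0, 512, 1024, 2048, 4096 };
	unsigned int i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		assert(ext2_blocksize_old(samples[i]) ==
		       ext2_blocksize_new(samples[i]));
}
```

The same reasoning covers the minix hunk: a default of 512 can never exceed BLOCK_SIZE, so dropping the `hblock &&` guard does not change which devices are rejected.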


Re: Larger dev_t

2001-04-03 Thread Martin Dalecki

> One thing I certainly miss: DevFS is not mandatory (yet).

That's "only" due to the fact that DevFS is an insanely racy and
unstable piece of CRAP. I'm unhappy it's there anyway...



Re: Larger dev_t

2001-04-03 Thread Martin Dalecki

Alan Cox wrote:
> 
> > If anything I'm a *SERIOUS* production user. And I wouldn't allow
> > *ANYBODY* here to run a kernel explicitly tagged as a development
> > kernel in a production environment anyway. That's what releases are
> > for, damn.
> > Or do you think that Linux should preserve DOS compatibility
> > into eternity, as other "popular" systems do?
> 
> You still break 2.4-2.6. Thats a production release jump. Right now I can
> and do run 2.0->2.4 on the same box. If you dont understand why to many
> people that is a requirement please talk to folks who run real business on
> Linux

You possibly have no idea how real the business I do is :-).
What is it worth to be able to run 2.0 and 2.4 on the same box?
I just intended to tell you that there are actually people in the
REAL BUSINESS out there who know about this and are willing to sacrifice
perpetual compatibility for continuous development.

BTW, we don't run much Cyrix 486 hardware here anymore. More like
boxes with a few gigs of RAM, 4 CPUs, RAID and so on...
The single biggest memory hog here is currently Oracle 9i AS.



Re: Larger dev_t

2001-04-03 Thread Martin Dalecki

Alan Cox wrote:
> 
> > So change them as well for a new distribution. What's the problem?
> > There isn't anything out there you can't do by hand.
> > Fortunately so!
> 
> So users cannot go back and forward between new and old kernels. Very good.
> Try explaining that to serious production -users- of a system and see how
> it goes down

If anything I'm a *SERIOUS* production user. And I wouldn't allow
*ANYBODY* here to run a kernel explicitly tagged as a development
kernel in a production environment anyway. That's what releases are
for, damn.
Or do you think that Linux should preserve DOS compatibility
into eternity, as other "popular" systems do?



Re: Larger dev_t

2001-04-03 Thread Martin Dalecki

[EMAIL PROTECTED] wrote:
> 
> OK - everybody back from San Jose - pity I couldnt come -
> and it is no longer April 1st, so we can continue quarreling
> a little.
> 
> Interesting that where I had divided stuff in the trivial part,
> the interesting part and the lot-of-work part we already start
> fighting on the trivial part. Maybe it is not very important -
> still I'd prefer to do things right from the start.
> 
> Yes. We need a larger dev_t as everybody agrees.
> How large?
> 
> What is dev_t used for? It is a communication channel from
> filesystem to user space (via the stat() system call)
> and from user space to filesystem (via the mknod() system call).
> 
> So, it seems the kernel interface must allow passing the values
> that actually occur, in present or future file systems.
> Making the interface narrow is only asking for problems later.
> Are there already any filesystems that use 64-bits?
> I would say that that is irrelevant - what we don't have today
> may come tomorrow - but in fact the NFSv3 interface uses
> a 64-bit device number.
> 
> So glibc comes with 64 bits, the kernel has to hand these bits
> over to NFS but is unwilling to - you are not going to get
> more than 32. Why?
> 
> > I have a holy crusade.
> 
> I fail to see the connection. There is no bloat here, the kernel
> is hardly involved. Some values are passed. If the values are
> larger than the filesystem likes it will probably return EINVAL.
> But the kernel has no business getting in the way.
> 
> There is no matter of efficiency either - mknod is not precisely
> the most frequently used system call, and our stat interface, which
> really is important, is 64 bits today.

I think the only reason for Linus to go for a 12-bit major is the
fact that he then only has to increase the length of the static
array of major device pointers in the kernel and it will just work...
However, the real problem is that the aforementioned array
of pointers shouldn't be there in the first place.

> 
> Not using 64 also gives interesting small problems with Solaris or
> FreeBSD NFS mounts. One uses 14+18, the other 8+24, so with 12+20
> we cannot handle Solaris' majors and we cannot handle FreeBSD's minors.
> 
> [Then there were discussions about naming.
> These are interesting, but independent.
> The current discussion is almost entirely about mknod.]
> 
> Andries
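
The mismatch between the 12+20, 14+18 and 8+24 splits mentioned in the quoted text can be made concrete with a short sketch. It assumes a 32-bit device number with the major in the high bits and the minor in the low bits. That layout and all names below are illustrative only and are not taken from any of the systems discussed.

```c
#include <assert.h>
#include <stdint.h>

/* A major/minor split of a 32-bit device number, e.g. 12+20. */
struct dev_split {
	unsigned int major_bits;
	unsigned int minor_bits;
};

/* Returns 1 if (major, minor) can be represented under the split. */
int dev_fits(struct dev_split s, uint32_t major, uint32_t minor)
{
	assert(s.major_bits > 0 && s.minor_bits > 0);
	assert(s.major_bits + s.minor_bits == 32);
	return (major >> s.major_bits) == 0 && (minor >> s.minor_bits) == 0;
}

/* Packs major into the high bits and minor into the low bits. */
uint32_t dev_pack(struct dev_split s, uint32_t major, uint32_t minor)
{
	assert(dev_fits(s, major, minor));
	return (major << s.minor_bits) | minor;
}
```

A 14-bit major does not fit a 12+20 split and a 24-bit minor does not either, which is the interoperability problem the quoted paragraph points at; a 64-bit interface avoids both cases.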

-- 
- phone: +49 214 8656 283
- job:   eVision-Ventures AG, LEV .de (MY OPINIONS ARE MY OWN!)
- langs: de_DE.ISO8859-1, en_US, pl_PL.ISO8859-2, last ressort:
ru_RU.KOI8-R



Re: Larger dev_t

2001-04-03 Thread Martin Dalecki

Alan Cox wrote:
 
  So change them as well for a new distribution. What's there problem.
  There isn't anything out there you can't do by hand.
  Fortunately so!
 
 So users cannot go back and forward between new and old kernels. Very good.
 Try explaining that to serious production -users- of a system and see how
 it goes down

If anything I'm a *SERIOUS* production user. And I wouldn't allow
*ANYBODY* here to run am explicitly tagged as developement kernel
here anyway in an production enviornment. That's what releases are for
damn.
Or do you think that Linux should still preserve DOS compatibility
in to the eternity as other "popular" systems do?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Larger dev_t

2001-04-03 Thread Martin Dalecki

Alan Cox wrote:
> 
> > If anything, I'm a *SERIOUS* production user. And I wouldn't allow
> > *ANYBODY* here to run a kernel explicitly tagged as a development kernel
> > in a production environment anyway. That's what releases are for,
> > damn it.
> > Or do you think that Linux should preserve DOS compatibility
> > for all eternity, as other "popular" systems do?
> 
> You still break 2.4->2.6. That's a production release jump. Right now I can
> and do run 2.0->2.4 on the same box. If you don't understand why to many
> people that is a requirement please talk to folks who run real business on
> Linux

You possibly have no idea how real the business I do is :-).
What is it worth to be able to run 2.0 and 2.4 on the same box?
I just intended to tell you that there are actually people in the
REAL BUSINESS out there who know about the trade-off and are willing to
sacrifice perpetual compatibility for continuous development.

BTW, we don't run much Cyrix 486 hardware here anymore. More like
boxes with a few gigs of RAM, 4 CPUs, RAID and so on...
The single biggest memory hog here is currently Oracle 9i AS.



Re: Larger dev_t

2001-04-03 Thread Martin Dalecki

> One thing I certainly miss: DevFS is not mandatory (yet).

That's "only" due to the fact that DevFS is an insanely racy and
unstable piece of CRAP. I'm unhappy it's there at all...



Re: Larger dev_t

2001-03-29 Thread Martin Dalecki

Alan Cox wrote:
> 
> > Why do you worry about installers? New distro - new kernel - new
> > installer
> 
> Because the same code tends to be shared with post install configuration
> tools too.

So change them as well for a new distribution. What's their problem?
There isn't anything out there you can't do by hand.
Fortunately so!



Re: Larger dev_t

2001-03-28 Thread Martin Dalecki

Alan Cox wrote:
> 
> > Exactly. It's just that for historical reasons, I think the major for
> > "disk" should be either the old IDE or SCSI one, which just can show more
> > devices. That way old installers etc work without having to suddenly start
> > knowing about /dev/disk0.
> 
> They will mostly break. Installers tend to parse /proc/scsi and have fairly
> complex ioctl based relationships based on knowing ide v scsi.
> 
> /dev/disc/ is a little un-unix but it's clean

Why do you worry about installers? New distro - new kernel - new
installer; it's their job to worry about it. They will change the
installer anyway, and this kind of change is actually going to simplify
the code there a bit, I think.

Just kill the old device major outright and note in the changelog
of the new kernel that the user should mknod the new device node and add
it to /etc/fstab before rebooting into the new kernel. Hey, that's
development anyway :-)
If the developer boots back into the old kernel, only the mounts of the
new nodes in /etc/fstab will fail. No transition problem in sight here...



Re: Larger dev_t

2001-03-28 Thread Martin Dalecki

> what do other vaguely unix-like systems do?  does, say, plan9 have a
> better way of dealing with all this?

Yes.

Normal UNIX has as well. For reference see: block vs. raw
devices on docs.sun.com :-).



Re: Larger dev_t

2001-03-28 Thread Martin Dalecki

Alan Cox wrote:
> 
> > high-end-disks. Rather the reverse. I'm advocating the SCSI layer not
> > hogging a major number, but letting low-level drivers get at _their_
> > requests directly.
> 
> A major for 'disk' generically makes total sense. Classing raid controllers
> as 'scsi' isn't necessarily accurate. A major for 'serial ports' would also
> solve a lot of misery

And IDE disk vs. CD-ROM, and block vs. raw devices,
and so on ad infinitum. Those are the reasons why the
density of majors vs. minors in Solaris is exactly the
reverse of Linus's proposal...

And then we have all those VERY SPARSE static arrays of
per-major, per-minor device information (just look at which cells
of those arrays are used on a running system with maybe about
6-8 devices actually attached!)

The main, sheer practical problem with changing kdev_t is
the HUGE number of in fact entirely different drivers sharing the same
major, splitting up the minor number space, and then hooking
devices with different block sizes and such onto the same major.
Many things in the block device layer could
be simplified significantly if one could assume, for
example, that all the devices on a single major
have the same block size and so on...
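
To put a rough number on the sparseness complained about above, here is a
minimal sketch. It assumes a flat table with an 8-bit major and an 8-bit
minor; the table layout, the names MAJORS, MINORS and used_cells(), and the
block sizes in the usage note are all made up for illustration and are not
the kernel's actual data structures.

```c
#include <assert.h>

/* Assumed table dimensions: 8-bit major, 8-bit minor. */
enum { MAJORS = 256, MINORS = 256 };

/*
 * Count the non-zero cells in a major x minor table of per-device
 * block sizes. A zero cell stands for "no device registered here".
 */
static unsigned used_cells(int table[MAJORS][MINORS])
{
    unsigned used = 0;

    assert(table != 0);
    for (int major = 0; major < MAJORS; major++)
        for (int minor = 0; minor < MINORS; minor++)
            if (table[major][minor] != 0)
                used++;
    return used;
}
```

With, say, three devices registered, used_cells() returns 3 out of
MAJORS * MINORS = 65536 cells, which is the kind of occupancy the paragraph
above is pointing at.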



Re: Larger dev_t

2001-03-28 Thread Martin Dalecki

Linus Torvalds wrote:
> 
> On Tue, 27 Mar 2001, Andre Hedrick wrote:
> >
> > Am I hearing you state you want dynamic device points and dynamic majors?
> 
> Yes and no.
> 
> We need static structures for user space - from a user perspective it
> makes a ton more sense to say "I want to see all disks" than it does to
> know that you have to do /dev/hd*, /dev/sd* plus all the extra magic
> combinations that can happen (USB etc).
> 
> So in a sense what I'm arguing for is for _stricter_ device numbers to the
> outside world.
> 
> But internally, it would be reasonably easy to make a mapping from those
> user-visible numbers to a much looser version.
> 
> One example of this is going to happen very early in 2.5.x: the whole
> "partitioning" stuff is going to go away from the driver, and into the
> ll_rw_block layer as just another disk re-mapping thing. We already do
> those kinds of re-mappings for LVM reasons anyway, and partitioning is not
> something a disk driver should know about, really.
> 
> And that kind of partitioning mapping automatically means that we'd need
> to remap minor numbers, and do it on a per-major basis (because the
> partitioning mapping right now is not actually the same between SCSI and
> IDE: IDE uses six bits of partitioning, while SCSI uses just four bits).
> And once you do that, you might as well start "remapping" major numbers
> too.
> 
> So let's say that you have two separate SCSI controllers - they would both
> show up on major #8, and different minor numbers. Right now, for example,
> controller 1 might have one disk, with minors 0-15 (for the whole disk and
> 15 partitions), and controller 2 might have two disks using minors 16-47.
> 
> As it stands now, the SCSI layer needs to do the remapping, and because
> the SCSI layer does the remapping, nothing but SCSI layer devices can use
> major #8.
> 
> But once you start doing partition mapping in ll_rw_block.c, you might as
> well get rid of the notion that "SCSI is major 8". You could easily have
> many different drivers, with many different queues, and remap them all to
> have major 8 (and different minors) so that it looks simple for a user
> that just wants to see SCSI disks.
> 
> Which is not to say that the same disk might not show up somewhere else
> too, if anybody wants it to. The _driver_ should just know "unit x on
> queue y", and then the driver might do whatever it wants (it might be, for
> example, that the driver actually wants to show multiple controllers as
> one queue, if the driver really wants to for some reason). And it should
> be possible to have two drivers that really have no idea at ALL about each
> other to just share the same major numbers.

Then please, please, please untangle the other cases as well!
IDE is the one giving me the most headaches. SCSI as well...

Granted, I wouldn't mind rebooting once with a new /dev/*!
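
For readers following the minor-number arithmetic in the quoted text, a
minimal sketch of the per-major split might look like this. The bit widths
(six partition bits for IDE, four for SCSI) are the ones quoted above; the
names disk_slot and split_minor() are hypothetical, and an 8-bit minor is
assumed.

```c
#include <assert.h>

/* Partition bits per disk, as quoted in the thread: IDE 6, SCSI 4. */
enum { IDE_PART_BITS = 6, SCSI_PART_BITS = 4 };

struct disk_slot {
    unsigned unit;       /* which disk on this major */
    unsigned partition;  /* 0 means the whole disk */
};

/* Split an 8-bit minor into (unit, partition) for a given partition width. */
static struct disk_slot split_minor(unsigned minor, unsigned part_bits)
{
    struct disk_slot slot;

    assert(minor < 256 && part_bits < 8);
    slot.unit = minor >> part_bits;
    slot.partition = minor & ((1u << part_bits) - 1u);
    return slot;
}
```

With the SCSI width, minor 16 is the whole second disk (unit 1, partition 0)
and minor 47 is partition 15 of the third disk, matching the "minors 16-47"
example in the quote. The same minor means something different under the IDE
width, which is why the remapping has to be done per major.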



Re: Larger dev_t

2001-03-28 Thread Martin Dalecki

"H. Peter Anvin" wrote:
> 
> Alan Cox wrote:
> >
> > > Another example: all the stupid pseudo-SCSI drivers that got their own
> > > major numbers, and wanted their very own names in /dev. They are BAD for
> > > the user. Install-scripts etc used to be able to just test /dev/hd[a-d]
> > > and /dev/sd[0-x] and they'd get all the disks. Deficiencies in the SCSI
> >
> > Sorry here I have to disagree. This is _policy_ and does not belong in the
> > kernel. I can call them all /dev/hdfoo or /dev/disc/blah regardless of
> > major/minor encoding. If you don't mind kernel based policy then devfs
> > with /dev/disc already sorts this out nicely.
> >
> > IMHO more procfs crud is also not the answer. procfs is already poorly
> > managed with arbitrary and semi-random namespace. It's a beautiful example of
> > why adhoc naming is as broken as random dev_t allocations. Maybe Al Viro's
> > per device file systems solve that.
> >
> 
> In some ways, they make matters worse -- now you have to effectively keep
> a device list in /etc/fstab.  Not exactly user friendly.
> 
> devfs -- in the abstract -- really isn't that bad of an idea; after all,

From a design point of view, devfs duplicates the bad /proc
design, this time for devices. If you need a good design for general device
handling with names, network interfaces are the thing to look at.
mount() should be more like a select()... accept()!



Re: Larger dev_t

2001-03-28 Thread Martin Dalecki

"H. Peter Anvin" wrote:
> 
> This is my opinion on the issue.  Short summary: "I'm sick of the
> administrative burden associated with keeping dev_t dense."
> 
> Linus Torvalds wrote:
> >
> > And let's take a look at /dev. Do a "ls -l /dev" and think about it. Every
> > device needs a unique number. Do you ever envision seeing that "ls -l"
> > taking about 500 billion years to complete? I don't. I don't think you do.
> > But that's how ludicrous a 64-bit device number is.
> >
> 
> That's how ludicrous a *dense* 64-bit device number is.  I have to say I
> disagree with you that sparse number spaces are a bad idea.  The
> IPv4->IPv6 transition people have looked at the issues of number spaces
> and how much harder they get to keep dense when the size of the
> numberspace grows, because your lookup operation becomes so much more
> painful.  Any time you have to take a larger number space and squeeze it
> into a smaller number space, you get some serious pain.
> 
> Part of the reason we haven't -- quite -- run out of 8-bit majors yet is
> because I have been an absolute *bastard* with registrants lately.  It
> would cut down on my workload if I could assign majors without worrying
> too much about whether or not that particular driver is really going to
> be made public.
> 
> 64 bits is obviously excessive, but I really don't feel comfortable
> saying that only 12 bits of major is sufficient.  16 I would buy, but I
> don't think 16 bits of minor is sufficient.  Given that, it seems to me
> -- especially since dev_t isn't exactly the most accessed data type in
> the universe -- that the conceptual simplicity of keeping the major and
> minor separate in individual 32-bit words really is just as well.  YES,
> it's overengineering, but the cost is very small; the cost of
> underengineering is having to go through yet another painful transition.
> Unfortunately, the Linux community seems to have some serious problems
> with getting system-wide transitions to happen, especially the ones that
> involve ABI changes.  This needs to be taken into account.
> 
> -hpa

Then please just tell me why the PCI name space is only 32 bits?

Majors are for drivers; minors are for device driver instances.
(Yes, Linux splits minors in a stupid way, for example by
using the same major for IDE disks and IDE CD-ROMs, which are in
fact completely different devices just sharing driver code:
different block sizes, different interface protocols and so on.)

Those are the reasons Solaris is using a 24/12 (major/minor) split,
and they don't have our problems here.
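
The layouts being argued about can be compared in a few lines. This is only
a sketch: the 8/8 packing reflects the 8-bit majors mentioned in the quote
(the 8-bit minor is an assumption here), the 12-bit major is the figure hpa
objects to, and struct wide_dev models his "separate 32-bit words"
suggestion. All function and type names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 16-bit device number: 8 bits of major, 8 bits of minor. */
static uint16_t pack_8_8(unsigned major, unsigned minor)
{
    assert(major < 256 && minor < 256);
    return (uint16_t)((major << 8) | minor);
}

/* Packed 32-bit device number with a chosen number of major bits. */
static uint32_t pack_32(uint32_t major, uint32_t minor, unsigned major_bits)
{
    unsigned minor_bits;

    assert(major_bits > 0 && major_bits < 32);
    minor_bits = 32 - major_bits;
    assert(major < (1u << major_bits) && minor < (1u << minor_bits));
    return (major << minor_bits) | minor;
}

/* Major and minor kept in separate 32-bit words: no packing at all. */
struct wide_dev {
    uint32_t major;
    uint32_t minor;
};
```

For example, pack_8_8(8, 16) gives 0x0810 and pack_32(8, 16, 12) gives
0x00800010. The unpacked struct costs 8 bytes per device number instead of 4,
which is the "very small" cost of overengineering hpa refers to.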

> 
> --
> <[EMAIL PROTECTED]> at work, <[EMAIL PROTECTED]> in private!
> "Unix gives you enough rope to shoot yourself in the foot."
> http://www.zytor.com/~hpa/puzzle.txt

-- 
- phone: +49 214 8656 283
- job:   eVision-Ventures AG, LEV .de (MY OPINIONS ARE MY OWN!)
- langs: de_DE.ISO8859-1, en_US, pl_PL.ISO8859-2, last resort: ru_RU.KOI8-R



Re: OOM killer???

2001-03-27 Thread Martin Dalecki

Ingo Oeser wrote:
> 
> On Tue, Mar 27, 2001 at 03:24:16PM +0200, Martin Dalecki wrote:
> > > @@ -93,6 +95,10 @@
> > > p->uid == 0 || p->euid == 0)
> > > points /= 4;
> > >
> > > +   /* Much the same goes for processes with low UIDs */
> > > +   if(p->uid < 100 || p->euid < 100)
> > > + points /= 2;
> > > +
> >
> > Please change the 100 to 500 - this would make it consistent with
> > the useradd command, which starts adding new users at UID 500
> 
> No, useradd usually reads /etc/login.defs to select the range.
> The oom-killer should have configurables for that, to allow the
> policy decisions in USER space -- where it belongs -- not in KERNEL space

OK, a sysctl would be more appropriate.

> If we use my OOM killer API, this patch would be a module and
> could have module parameters to select that.
> 
> Johnathan: I URGE you to apply my patch before adding OOM killer
>stuff. What's wrong with it, that you cannot use it? ;-)
> 
> It is easy to add configurables to a module and play with them
> WITHOUT recompiling.

It's total overkill and therefore not a good design.

> Dynamic sysctl tables would also be possible, IF we had a value
> that is DEFINED to be invalid for sysctl(2) and only valid for /proc.
> 
> It is also better to include the egid into the decision. There
> are daemons that I definitely want to be killed on a workstation,
> but not on a server.
> 
> e.g. My important matlab calculation, which runs in user mode
> should not be killed. But killing a local webserver, which serves
> my help system is ok (because I will not lose work, and might
> get it over the net, if there is a problem).
> 
> So as Rik stated: The OOM killer cannot suit all people, so it
> has to be configurable, to be OOM kill, not overkill ;-)

Irony: Why then not store this information permanently - inside
the UID of the application?



Re: [PATCH] OOM handling

2001-03-27 Thread Martin Dalecki

Michel Wilson wrote:
> 
> > relative ages.  The major flaw in my code is that a sufficiently
> > long-lived process becomes virtually immortal, even if it happens to
> > spring a serious leak after this time - the flaw in yours is that
> > system processes
> 
> I think this could easily be fixed if you'd 'chop off' the runtime at a
> certain point:
> 
> if(runtime > something_big)
> runtime = something_big;
> 
> This would of course need some tuning. The only thing i don't like about
> this is that it's a kind of 'magical value', but i suppose it's not a very
> good idea to make this configurable, right?

Then after some time the runtime becomes almost irrelevant.
You are basically arguing for what I call normalization by the total
system uptime.
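
The two ideas can be put side by side in a small sketch. The clamp is the
one from the quoted snippet; reading "normalization by the total system
uptime" as runtime divided by uptime is an assumption, and the function
names and the per-thousand scale are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* The quoted proposal: cap the runtime at a tunable ceiling. */
static uint64_t clamp_runtime(uint64_t runtime, uint64_t cap)
{
    return runtime > cap ? cap : runtime;
}

/*
 * One reading of normalization by uptime: express a process's age
 * in parts per thousand of the time the system has been up.
 */
static uint64_t runtime_permille(uint64_t runtime, uint64_t uptime)
{
    assert(uptime > 0 && runtime <= uptime);
    return (runtime * 1000u) / uptime;
}
```

Once every long-lived process has hit the cap, clamp_runtime() no longer
tells them apart, which is the "almost irrelevant" effect. The normalized
value keeps ranking processes by relative age and needs no magic constant,
although it changes for every process as uptime grows.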



Re: [PATCH] OOM handling

2001-03-27 Thread Martin Dalecki

Jonathan Morton wrote:

> 
> Oh and BTW, I think Bit/sqr(seconds) is a perfectly acceptable unit for
> "badness".  Think about it - it increases with pigginess and decreases with
> longevity.  I really don't see a problem with it per se.

Right, it's not a problem per se, but as you already explained,
the problem is in the weighting of the different factors.
It's a matter of principle.



Re: OOM killer???

2001-03-27 Thread Martin Dalecki

Jonathan Morton wrote:
> 
> >Out of Memory: Killed process 117 (sendmail).
> >
> >What we did to run it out of memory, I don't know. But I do know that
> >it shouldn't be killing one process more than once... (the process
> >should not exist after one try...)
> 
> This is a known bug in the Out-of-Memory handler, where it does not count
> the buffer and cache memory as "free" (it should), causing premature OOM
> killing.  It is, however, normal for the OOM killer to attempt to kill a
> process more than once - it takes a few scheduler cycles for the SIGKILL
> to actually reach the process and take effect.
> 
> Also, it probably shouldn't have killed Sendmail, since that is usually a
> long-running, low-UID (and important) process.  The OOM-kill selector is
> another thing that wants fixing, and my patch contains a *very rough*
> beginning to this.
> 
> The following patch should solve your problem for now, until a more
> detailed fix (which also clears up many other problems) is available in
> the stable kernel.
> 
> Alan and/or Linus may wish to apply this patch too...
> 
> (excerpt from my original patch from Saturday follows)
> 
> --- start ---
> diff -u linux-2.4.1.orig/mm/oom_kill.c linux/mm/oom_kill.c
> --- linux-2.4.1.orig/mm/oom_kill.c  Tue Nov 14 18:56:46 2000
> +++ linux/mm/oom_kill.c Sat Mar 24 20:35:20 2001
> @@ -76,7 +76,9 @@
> run_time = (jiffies - p->start_time) >> (SHIFT_HZ + 10);
> 
> points /= int_sqrt(cpu_time);
> -   points /= int_sqrt(int_sqrt(run_time));
> +
> +   /* Long-running processes are *very* important, so don't take the 4th root */
> +   points /= run_time;
> 
> /*
>  * Niced processes are most likely less important, so double
> @@ -93,6 +95,10 @@
> p->uid == 0 || p->euid == 0)
> points /= 4;
> 
> +   /* Much the same goes for processes with low UIDs */
> +   if(p->uid < 100 || p->euid < 100)
> + points /= 2;
> +

Please change the 100 to 500 - this would make it consistent with
the useradd command, which starts adding new users at UID 500.
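
To see what the threshold changes, here is a user-space model of the scoring
steps visible in the quoted excerpt, with the low-UID limit turned into a
parameter. It is a sketch only, not the kernel code: struct proc_model,
isqrt_min1() and badness_model() are hypothetical names, the inputs are
assumed to be pre-scaled, and the guard against a zero run_time is an
addition that is not in the quoted patch.

```c
#include <assert.h>

/* Integer square root that never returns 0, so it is safe as a divisor. */
static unsigned long isqrt_min1(unsigned long x)
{
    unsigned long r = 0;

    /* (r + 1) <= x / (r + 1) is (r + 1)^2 <= x without overflow. */
    while (r + 1 <= x / (r + 1))
        r++;
    return r ? r : 1;
}

/* Hypothetical stand-in for the task fields used by the excerpt. */
struct proc_model {
    unsigned long points;    /* starting score, e.g. memory footprint */
    unsigned long cpu_time;  /* assumed already scaled */
    unsigned long run_time;  /* assumed already scaled */
    unsigned uid, euid;
};

static unsigned long badness_model(const struct proc_model *p,
                                   unsigned low_uid_limit)
{
    unsigned long points;

    assert(p != 0);
    points = p->points;
    points /= isqrt_min1(p->cpu_time);
    points /= p->run_time ? p->run_time : 1;  /* guard added in this sketch */

    if (p->uid == 0 || p->euid == 0)
        points /= 4;                          /* root processes */
    if (p->uid < low_uid_limit || p->euid < low_uid_limit)
        points /= 2;                          /* low-UID processes */
    return points;
}
```

For a process with points 8000, cpu_time 16, run_time 10 and UID 250, the
model gives 200 with a limit of 100 and 100 with a limit of 500. Passing the
limit in as a parameter is one way to model making it configurable rather
than hard-coding either value.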



Re: OOM killer???

2001-03-27 Thread Martin Dalecki

Jonathan Morton wrote:
 
 Out of Memory: Killed process 117 (sendmail).
 
 What we did to run it out of memory, I don't know. But I do know that
 it shouldn't be killing one process more than once... (the process
 should not exist after one try...)
 
 This is a known bug in the Out-of-Memory handler, where it does not count the buffer 
and cache memory as "free" (it should), causing premature OOM killing.  It is, 
however, normal for the OOM killer to attempt to kill a process more than once - it 
takes a few scheduler cycles for the SIGKILL to actually reach the process and take 
effect.
 
 Also, it probably shouldn't have killed Sendmail, since that is usually a 
long-running, low-UID (and important) process.  The OOM-kill selector is another 
thing that wants fixing, and my patch contains a *very rough* beginning to this.
 
 The following patch should solve your problem for now, until a more detailed fix 
(which also clears up many other problems) is available in the stable kernel.
 
 Alan and/or Linus may wish to apply this patch too...
 
 (excerpt from my original patch from Saturday follows)
 
 --- start ---
 diff -u linux-2.4.1.orig/mm/oom_kill.c linux/mm/oom_kill.c
 --- linux-2.4.1.orig/mm/oom_kill.c  Tue Nov 14 18:56:46 2000
 +++ linux/mm/oom_kill.c Sat Mar 24 20:35:20 2001
 @@ -76,7 +76,9 @@
 run_time = (jiffies - p-start_time)  (SHIFT_HZ + 10);
 
 points /= int_sqrt(cpu_time);
 -   points /= int_sqrt(int_sqrt(run_time));
 +
 +   /* Long-running processes are *very* important, so don't take the 4th root */
 +   points /= run_time;
 
 /*
  * Niced processes are most likely less important, so double
 @@ -93,6 +95,10 @@
 p-uid == 0 || p-euid == 0)
 points /= 4;
 
 +   /* Much the same goes for processes with low UIDs */
 +   if(p-uid  100 || p-euid  100)
 + points /= 2;
 +

Plase change to 100 to 500 - this would make it consistant with
the useradd command, which starts adding new users at the UID 500
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH] OOM handling

2001-03-27 Thread Martin Dalecki

Jonathan Morton wrote:

 
> Oh and BTW, I think Bit/sqr(seconds) is a perfectly acceptable unit for
> "badness".  Think about it - it increases with pigginess and decreases with
> longevity.  I really don't see a problem with it per se.

Right, it's not a problem per se, but as you already explained,
the problem is in the weighting of the different factors.
It's a matter of principle.



Re: OOM killer???

2001-03-27 Thread Martin Dalecki

Ingo Oeser wrote:
 
> On Tue, Mar 27, 2001 at 03:24:16PM +0200, Martin Dalecki wrote:
> > > @@ -93,6 +95,10 @@
> > >                 p->uid == 0 || p->euid == 0)
> > >         points /= 4;
> > >
> > > +   /* Much the same goes for processes with low UIDs */
> > > +   if(p->uid < 100 || p->euid < 100)
> > > +       points /= 2;
> > > +
> >
> > Please change the 100 to 500 - this would make it consistent with
> > the useradd command, which starts adding new users at UID 500.
>
> No, useradd usually reads /etc/login.defs to select the range.
> The oom-killer should have configurables for that, to allow the
> policy decisions in USER space -- where they belong -- not in KERNEL space.

OK, a sysctl would be more appropriate.

> If we use my OOM killer API, this patch would be a module and
> could have module parameters to select that.
>
> Jonathan: I URGE you to apply my patch before adding OOM killer
> stuff. What's wrong with it, that you cannot use it? ;-)
>
> It is easy to add configurables to a module and play with them
> WITHOUT recompiling.

It's total overkill and therefore not a good design.

> Dynamic sysctl tables would also be possible, IF we had a value
> that is DEFINED to be invalid for sysctl(2) and only valid for /proc.
>
> It is also better to include the egid in the decision. There
> are daemons that I definitely want to be killed on a workstation,
> but not on a server.
>
> E.g. my important matlab calculation, which runs in user mode,
> should not be killed. But killing a local webserver, which serves
> my help system, is OK (because I will not lose work, and might
> get it over the net if there is a problem).
>
> So as Rik stated: The OOM killer cannot suit all people, so it
> has to be configurable, to be OOM kill, not overkill ;-)

Irony: Why then not store this information permanently - inside
the UID of the application?
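
The point quoted above, that useradd takes its UID range from /etc/login.defs rather than from a fixed constant, can be illustrated with a small user-space sketch in C. This is an assumption-laden illustration and not how useradd or the kernel is implemented: parse_uid_min() is a hypothetical helper that scans login.defs-style text for a UID_MIN entry and falls back to a caller-supplied default.

```c
#include <stdio.h>
#include <string.h>

/*
 * Scan login.defs-style text for a line of the form "UID_MIN <number>".
 * Returns the number found, or fallback if there is no such line.
 * Lines starting with '#' never match, because the pattern must begin
 * with the literal key.
 */
unsigned int parse_uid_min(const char *text, unsigned int fallback)
{
    const char *line = text;

    while (*line) {
        unsigned int value;
        const char *newline;

        /* Skip leading blanks on this line. */
        while (*line == ' ' || *line == '\t')
            line++;

        if (sscanf(line, "UID_MIN %u", &value) == 1)
            return value;

        /* Advance to the start of the next line, if any. */
        newline = strchr(line, '\n');
        if (!newline)
            break;
        line = newline + 1;
    }
    return fallback;
}
```

A user-space tool could read the file, pass its contents to such a helper, and hand the result to the kernel through whatever tunable is eventually chosen, which keeps the policy decision out of kernel space.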



Re: 64-bit block sizes on 32-bit systems

2001-03-26 Thread Martin Dalecki

"Eric W. Biederman" wrote:
> 
> Matthew Wilcox <[EMAIL PROTECTED]> writes:
> 
> > On Mon, Mar 26, 2001 at 10:47:13AM -0700, Andreas Dilger wrote:
> > > What do you mean by problems 5 years down the road?  The real issue is that
> > > this 32-bit block count limit affects composite devices like MD RAID and
> > > LVM today, not just individual disks.  There have been several postings
> > > I have seen with people having a problem _today_ with a 2TB limit on
> > > devices.
> >
> > people who can afford 2TB of disc can afford to buy a 64-bit processor.
> 
> Currently that doesn't solve the problem as block_nr is held in an int.
> And as gcc compiles an int to a 32bit number on a 64bit processor, the
> problem still isn't solved.
> 
> That at least we need to address.

And then you must face the fact that there may be a need for
some off-the-shelf software which isn't well supported on the
corresponding 64-bit architectures either. So the
argument doesn't hold up to reality in any way.
BTW, for many reasons 32-bit architectures are, with
respect to some application schemes, *faster* than 64-bit ones.
The Ultra III in 64-bit mode just crawls in comparison to 32-bit mode.
Alpha is unfortunately an orphaned and dying architecture, which
is not well supported by software vendors...
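
The 2TB figure mentioned in the quoted text follows from simple arithmetic, sketched below in C. The 512-byte block size is an assumption of this sketch (the thread does not state it), and max_device_bytes() is a hypothetical helper, not a kernel function.

```c
#include <stdint.h>

/*
 * Largest device size in bytes that can be addressed when the block
 * number is held in count_bits bits and each block is block_size bytes.
 * Valid for count_bits < 64 and results that fit in 64 bits.
 */
uint64_t max_device_bytes(unsigned int count_bits, uint32_t block_size)
{
    return (UINT64_C(1) << count_bits) * block_size;
}
```

With 32 usable bits and 512-byte blocks this gives 2^41 bytes, i.e. 2 TiB. If the count is a signed int, only 31 bits are usable and the limit drops to 1 TiB. As the quoted reply notes, moving to a 64-bit processor does not change this by itself while the block number stays in an int, because int remains 32 bits wide there.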


