Re: [PATCH] drivers/base: export gpl (un)register_memory_notifier
Dave Hansen <[EMAIL PROTECTED]> wrote on 14.02.2008 18:12:43:
> On Thu, 2008-02-14 at 09:46 +0100, Christoph Raisch wrote:
> > Dave Hansen <[EMAIL PROTECTED]> wrote on 13.02.2008 18:05:00:
> > > On Wed, 2008-02-13 at 16:17 +0100, Jan-Bernd Themann wrote:
> > > > Constraints imposed by HW / FW:
> > > > - eHEA has its own MMU
> > > > - eHEA Memory Regions (MRs) are used by the eHEA MMU to translate
> > > >   virtual addresses to absolute addresses (like DMA mapped memory
> > > >   on a PCI bus)
> > > > - The number of MRs is limited (not enough to have one MR per packet)
> > >
> > > Are there enough to have one per 16MB section?
> >
> > Unfortunately this won't work. This was one of our first ideas we
> > tossed out, but the number of MRs will not be sufficient.
>
> Can you give a ballpark of how many there are to work with?  10? 100?
> 1000?

It depends on HMC configuration, but in the worst case the upper limit is
in the two-digit range.

> > > But, I'm really not convinced that you can actually keep this map
> > > yourselves.  It's not as simple as you think.  What happens if you get
> > > on an LPAR with two sections, one [EMAIL PROTECTED] and another
> > > [EMAIL PROTECTED]  That's quite possible.  I think your vmalloc'd
> > > array will eat all of memory.
> >
> > I'm glad you mention this part. There are many algorithms out there to
> > handle this problem, hashes/trees/... all of these trade speed for a
> > smaller memory footprint.
> > We based the table decision on the existing implementations of the
> > architecture.
> > Do you see such a case coming along for the next generation POWER
> > systems?
>
> Dude.  It exists *TODAY*.  Go take a machine, add tens of gigabytes of
> memory to it.  Then, remove all of the sections of memory in the middle.
> You'll be left with a very sparse memory configuration that we *DO*
> handle today in the core VM.  We handle it quite well, actually.
>
> The hypervisor does not shrink memory from the top down.  It pulls
> things out of the middle and shuffles things around.  In fact, a NUMA
> node's memory isn't even contiguous.
>
> Your code will OOM the machine in this case.  I consider the ehea driver
> buggy in this regard.

Your comment indicates that the upper limit for memory set on the HMC does
not influence the upper limit of the partition's physical address space.
So the base assumption we discussed internally is wrong here
(conclusion: see below).

> > I would guess these drastic changes would also require changes in the
> > base kernel.
>
> No, we actually solved those a couple years ago.
>
> > Will you provide a generic mapping system with a contiguous virtual
> > address space like the ehea_bmap we can query? This would need to be a
> > "stable" part of the implementation, including translation functions
> > from kernel to nextgen_ehea_generic_bmap like virt_to_abs.
>
> Yes, that's a real possibility, especially if some other users for it
> come forward.  We could definitely add something like that to the
> generic code.  But, you'll have to be convincing that what we have now
> is insufficient.
>
> Does this requirement:
> "- MRs cover a contiguous virtual memory block (no holes)"
> come from the hardware?

yes

> Is that *EACH* MR? OR all MRs?

each

> Where does EHEA_BUSMAP_START come from?  Is that defined in the
> hardware?  Have you checked to ensure that no other users might want a
> chunk of memory in that area?

EHEA_BUSMAP_START is a value which has to match between the wqe virtual
addresses and the MR used in them.
Fortunately there's a simple answer to that one: each MR has its own
address space, so there's no need to check.
A HEA MR actually has exactly the same attributes as an InfiniBand MR with
this hardware; send/receive processing is pretty much comparable to an
InfiniBand UD queue.

> Can you query the existing MRs?

no

> Not change them in place, but can you
> query their contents?

no

> > > That's why we have SPARSEMEM_EXTREME and SPARSEMEM_VMEMMAP implemented
> > > in the core, so that we can deal with these kinds of problems, once
> > > and *NOT* in every single little driver out there.
> > >
> > > > Functions to use while building ehea_bmap + MRs:
> > > > - Use either the functions that are used by the memory hotplug
> > > >   system as well, that means using the section defines + functions
> > > >   (section_nr_to_pfn, pfn_valid)
> > >
> > > Basica
ing them
> properly.  You should assume that you can export and use
> walk_memory_resource().

So this seems to come down to a basic question:
new hardware seems to have a tendency to get "private MMUs", which need
private mappings from the kernel address space into a "HW defined address
space with potentially unique characteristics".
RDMA in OpenFabrics with a global MR is the most prominent example heading
there.

> Do you know what other operating systems do with this hardware?

We're not aware of another open source operating system trying to address
this topic.

> In the future, please make an effort to get review from knowledgeable
> people about these kinds of things before using them in your driver.
> Your company has many, many resources available, and all you need to do
> is ask.  All that you have to do is look to the tops of the files of the
> functions you are calling.

So we're glad we finally found the right person who takes responsibility
for this topic!

> -- Dave

Gruss / Regards
Christoph Raisch + Jan-Bernd Themann
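For reference, a rough sketch of the direction hinted at above, i.e.
building the driver's mapping from walk_memory_resource() and keeping it
up to date through a memory hotplug notifier. The helper names
(demo_busmap_add and friends) are made up for illustration, the interfaces
shown are the ones exported around 2.6.25, and this is not the actual ehea
implementation:

#include <linux/memory.h>          /* register_memory_notifier() */
#include <linux/memory_hotplug.h>  /* walk_memory_resource() */
#include <linux/notifier.h>
#include <asm/page.h>
#include <asm/sparsemem.h>         /* MAX_PHYSMEM_BITS (powerpc) */

/* Add [pfn, pfn + nr_pages) to the driver's translation map (stub). */
static int demo_busmap_add(unsigned long pfn, unsigned long nr_pages,
                           void *arg)
{
        /* build up the busmap / MR covering this range here */
        return 0;
}

static int demo_mem_notifier(struct notifier_block *nb,
                             unsigned long action, void *data)
{
        struct memory_notify *arg = data;

        switch (action) {
        case MEM_ONLINE:
                demo_busmap_add(arg->start_pfn, arg->nr_pages, NULL);
                break;
        case MEM_OFFLINE:
                /* remove the range and re-register the MR here */
                break;
        }
        return NOTIFY_OK;
}

static struct notifier_block demo_mem_nb = {
        .notifier_call = demo_mem_notifier,
};

static int demo_init_busmap(void)
{
        int ret;

        /* cover all system RAM present at driver load */
        ret = walk_memory_resource(0,
                        1UL << (MAX_PHYSMEM_BITS - PAGE_SHIFT),
                        NULL, demo_busmap_add);
        if (ret)
                return ret;

        /* follow memory add/remove from now on */
        return register_memory_notifier(&demo_mem_nb);
}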
Re: [PATCH] ehea: Add kdump support
Michael Ellerman wrote on 26.11.2007 09:16:28:
> Solutions that might be better:
>
> a) if there are a finite number of handles and we can predict their
> values, just delete them all in the kdump kernel before the driver
> loads.

Guessing the values does not work because of the handle structure defined
by the hypervisor.

> b) if there are a small & finite number of handles, save their values
> in a device tree property and have the kdump kernel read them and
> delete them before the driver loads.

5*16*nr_ports + 1 + 1 >= 82. An ML16 has 4 adapters with up to 16 ports,
so the number is not small anymore.
The device tree functions are currently not exported.
If you crashdump to a new kernel, will it get the device tree
representation of the crashed kernel, or the initial one from Open
Firmware?

> c) if neither of those work, provide a minimal routine that _only_
> deletes the handles in the crashed kernel.

I would hope this has the highest chance to actually work.
For this we would have to add a proper notifier chain. Do you agree?

> d) Firmware change?

But that's not something you will get very soon.

Christoph R.
Re: [PATCH] ehea: add kexec support
Michael Neuling <[EMAIL PROTECTED]> wrote on 03.11.2007 07:06:31:
> > DD allocates HEA resources and gets firmware_handles for these
> > resources.
> > To free the resources DD needs to use exactly these handles.
> > There's no generic firmware call "clean out all resources".
> > Allocating the same resources twice does not work.
>
> Can we get a new firmware call to do this?

Well, there's no simple answer to this. I'm not working on firmware.
I'm trying to get an answer... but don't expect anything "real soon".

> > So a new kernel can't free the resources allocated by an old kernel,
> > because the numeric values of the handles aren't known anymore.
>
> How many possible handles are there?

Depends on system configuration, between 4 and 64 per port.

> If the handles are lost, is the only way to clear out the HEA resources
> to reset the partition?

Yes, that's exactly the problem.

> > Potential solution:
> > The HEA driver cleanup function hooks into ppc_md.machine_crash_shutdown
> > and frees all firmware resources at shutdown time of the crashed kernel.
>
> This means the crashed kernel now has to be trusted to shut down and
> free up the resources. Isn't trusting the crashing kernel in this way
> against the whole kdump idea?

I would hope that if the cleanup routine only does hcalls and does not
change any kernel memory areas, the risk of damaging anything else in the
kernel should be pretty small.
This should allow us to catch most cases, but as always you can imagine
situations where the kernel memory is broken beyond hope of even
restarting the kdump kernel.

> > crash_kexec continues and loads the new kernel.
> > The new kernel restarts the HEA driver within the kdump kernel, which
> > will work because the resources have been freed before.
> >
> > Michael, would this work?

Is ppc_md.machine_crash_shutdown the right hook?

Gruss/Regards
Christoph R
Re: [PATCH] ehea: add kexec support
Michael Ellerman <[EMAIL PROTECTED]> wrote on 02.11.2007 07:30:08:
> On Wed, 2007-10-31 at 20:48 +0100, Christoph Raisch wrote:
> > Michael Ellerman <[EMAIL PROTECTED]> wrote on 30.10.2007 23:50:36:
>
> If that's really the way it works then eHEA is more or less broken for
> kdump I'm afraid.

We think we have a way to work around this, but let me first try to
explain the base problem.

DD allocates HEA resources and gets firmware_handles for these resources.
To free the resources DD needs to use exactly these handles.
There's no generic firmware call "clean out all resources".
Allocating the same resources twice does not work.

So a new kernel can't free the resources allocated by an old kernel,
because the numeric values of the handles aren't known anymore.

Potential solution:
The HEA driver cleanup function hooks into ppc_md.machine_crash_shutdown
and frees all firmware resources at shutdown time of the crashed kernel.
crash_kexec continues and loads the new kernel.
The new kernel restarts the HEA driver within the kdump kernel, which will
work because the resources have been freed before.

Michael, would this work?

Gruss / Regards
Christoph R.
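To make the idea a bit more concrete, a rough sketch of such a hook. All
names are made up, only hcalls would be issued from the cleanup path, and
whether ppc_md.machine_crash_shutdown is the right place is exactly the
open question:

#include <asm/machdep.h>   /* ppc_md */
#include <asm/ptrace.h>    /* struct pt_regs */

static void (*demo_old_crash_shutdown)(struct pt_regs *regs);

/* Free QPs, CQs, EQs and MRs via hcalls only; do not touch any other
 * kernel state, to keep the risk in a crashed kernel small. */
static void demo_free_fw_handles(void)
{
        /* hcalls to free the firmware handles go here */
}

static void demo_crash_shutdown(struct pt_regs *regs)
{
        demo_free_fw_handles();
        if (demo_old_crash_shutdown)
                demo_old_crash_shutdown(regs);
}

static void demo_install_crash_hook(void)
{
        demo_old_crash_shutdown = ppc_md.machine_crash_shutdown;
        ppc_md.machine_crash_shutdown = demo_crash_shutdown;
}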
Re: [PATCH] ehea: add kexec support
Michael Ellerman <[EMAIL PROTECTED]> wrote on 30.10.2007 23:50:36:
>
> On Tue, 2007-10-30 at 09:39 +0100, Christoph Raisch wrote:
> >
> > Michael Ellerman <[EMAIL PROTECTED]> wrote on 28.10.2007 23:32:17:
> > Hope I didn't miss anything here...
>
> Perhaps. When we kdump the kernel does not call the reboot notifiers, so
> the code Jan-Bernd just added won't get called. So the eHEA resources
> won't be freed. When the kdump kernel tries to load the eHEA driver what
> will happen?

Good point.
If the device driver tries to allocate resources again (in the kdump
kernel) which have been allocated before (in the crashed kernel), the
hcalls will fail because from the hypervisor's view the resources are
still in use.
Currently there's no method to find out the resource handles for the HEA
resources allocated by the crashed kernel within the hypervisor...
So we have to trigger an explicit deregister in the hypervisor before the
driver is started again.

How do you recommend we should trigger this in the kdump process?
Would placing a hook into ppc_md.machine_kexec be an option?

Gruss / Regards
Christoph R.
Re: [PATCH] ehea: add kexec support
Michael Ellerman <[EMAIL PROTECTED]> wrote on 28.10.2007 23:32:17:
>
> How do you plan to support kdump?

When kexec is fully supported, kdump should work out of the box as for any
other Ethernet card (if you load the right eth driver).
There's nothing specific to kdump you have to handle in Ethernet device
drivers.
Hope I didn't miss anything here...

Gruss / Regards
Christoph R
Re: new NAPI interface broken
Benjamin Herrenschmidt <[EMAIL PROTECTED]> wrote on 16.10.2007 11:01:49:
>
> Christoph, have any of you tried it on powerpc ?

No, we didn't try this (yet).
This approach makes a lot of sense.
Why is this not installed by default by both large distros on PPC?
How mature is this for larger SMPs on PPC?

> Cheers,
> Ben.

Gruss / Regards
Christoph R.
Re: new NAPI interface broken for POWER architecture?
David Miller <[EMAIL PROTECTED]> wrote on 12.09.2007 14:50:04:
> From: Jan-Bernd Themann <[EMAIL PROTECTED]>
> Date: Fri, 7 Sep 2007 11:37:02 +0200
>
> > 2) On SMP systems: after netif_rx_complete has been called on CPU1
> >    (+ interrupts enabled), netif_rx_schedule could be called on CPU2
> >    (irq handler) before net_rx_action on CPU1 has checked
> >    NAPI_STATE_SCHED.
> >    In that case the device would be added to the poll lists of CPU1 and
> >    CPU2, as net_rx_action would see NAPI_STATE_SCHED set.
> >    This must not happen. It will be caught when netif_rx_complete is
> >    called the second time (BUG() called)
> >
> > This would mean we have a problem on all SMP machines right now.
>
> This is not a correct statement.
>
> Only on your platform do network device interrupts get moved
> around, no other platform does this.
>
> Sparc64 doesn't, all interrupts stay in one location after
> the cpu is initially chosen.
>
> x86 and x86_64 specifically do not move around network
> device interrupts, even though other device types do
> get dynamic IRQ cpu distribution.
>
> That's why you are the only person seeing this problem.
>
> I agree that it should be fixed, but we should also fix the IRQ
> distribution scheme used on powerpc platforms which is totally
> broken in these cases.

This is definitely not something we can change in the HEA device driver
alone. It could also affect any other networking card on POWER
(e1000, s2io, ...).
Paul, Michael, Arnd, what is your opinion here?

Gruss / Regards
Christoph Raisch
Re: [PATCH 2/2] ehea: Receive SKB Aggregation
Christoph Hellwig wrote on 31.05.2007 15:41:18:
> I'm still very unhappy with having all this in various drivers. There's
> a lot of code that can be turned into generic library functions, and even
> more code that could be made generic with some amount of refactoring.

Yes, we'd also prefer to use a generic function, but we first want to get
some "real world" experience of how our driver behaves with LRO, to be
able to define requirements for such a generic function.
A lot of this is tied into path lengths and caching, and into the question
why that helps compared to a different TCP receive-side processing.
In a perfect world we shouldn't see a difference whether this is enabled
or not, but measurements indicate something completely different at
10 Gbit.

Gruss / Regards
Christoph Raisch
Re: [PATCH 2/2] ehea: Receive SKB Aggregation
Stephen Hemminger wrote on 31.05.2007 18:37:03:
> >
> > +static int try_get_ip_tcp_hdr(struct ehea_cqe *cqe, struct sk_buff *skb,
> > +                             struct iphdr **iph, struct tcphdr **tcph)
> > +{
> > +   int ip_len;
> > +
> > +   /* non tcp/udp packets */
> > +   if (!cqe->header_length)
> > +      return -1;
> > +
> > +   /* non tcp packet */
> > +   *iph = (struct iphdr *)(skb->data);
>
> Why the indirection, copying of headers..

This interacts with the header split function in the hardware.

> > +   if ((*iph)->protocol != IPPROTO_TCP)
> > +      return -1;
> > +
> > +   ip_len = (u8)((*iph)->ihl);
> > +   ip_len <<= 2;
> > +   *tcph = (struct tcphdr *)(((u64)*iph) + ip_len);
> > +
> > +   return 0;
> > +}
> > +
>
> This code seems to be duplicating a lot (but not all) of the TCP/IP
> input path validation checks. This is a security problem if nothing
> else...

We should only do aggregation in the driver if this really is a TCP
header, otherwise things will get worse.
You're right, we should at least check that tcph is within the received
frame.

> Also, how do you prevent DoS attacks from hostile TCP senders that send
> huge number of back to back frames?

Actually a huge number of back-to-back frames is what we would want to
receive at 10 Gbit ;-)
How is it possible to figure out if this is what you want or just DoS?
It doesn't change anything compared to a non-LRO driver: we process a
certain maximum number of frames before waiting for the next interrupt,
and the packet filters/DoS handling (which sit above the driver) still see
all traffic.
Any suggestions how to handle this better/differently?

> --
> Stephen Hemminger

Gruss / Regards
Christoph Raisch
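As a rough idea of the kind of bounds check meant here (the function name
and surrounding details are made up, this is not the actual ehea patch):

#include <linux/ip.h>
#include <linux/skbuff.h>
#include <linux/tcp.h>

/* Return 1 if the IP/TCP headers found by the header split really lie
 * within the linear part of the received frame, 0 otherwise. */
static int demo_tcp_hdr_in_frame(struct sk_buff *skb,
                                 struct iphdr *iph, struct tcphdr *tcph)
{
        unsigned int ip_hlen = iph->ihl * 4;
        unsigned int tcp_off = (unsigned char *)tcph - skb->data;

        if (ip_hlen < sizeof(struct iphdr))
                return 0;       /* malformed IHL */
        if (tcp_off + sizeof(struct tcphdr) > skb_headlen(skb))
                return 0;       /* TCP header not inside the frame */
        if (tcp_off + tcph->doff * 4 > skb_headlen(skb))
                return 0;       /* TCP options run past the frame */
        return 1;
}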
Re: [PATCH 2.6.19-rc3 1/2] ehea: kzalloc GFP_ATOMIC fix
Andrew Morton <[EMAIL PROTECTED]> wrote on 27.10.2006 05:13:13:
> On Wed, 25 Oct 2006 13:11:42 +0200
> Jan-Bernd Themann <[EMAIL PROTECTED]> wrote:
>
> > This patch fixes kzalloc parameters (GFP_ATOMIC instead of GFP_KERNEL)
>
> why?

These few kzallocs run in atomic context in some situations, therefore
GFP_KERNEL is not a good idea.

Christoph R.
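For illustration only (made-up names, not the ehea code): an allocation
made while a spinlock is held runs in atomic context and must not sleep,
so GFP_KERNEL is not allowed there while GFP_ATOMIC is:

#include <linux/errno.h>
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct demo_entry {
        struct list_head list;
};

struct demo_port {
        spinlock_t lock;
        struct list_head queue;
};

static int demo_queue_entry(struct demo_port *port)
{
        struct demo_entry *entry;
        unsigned long flags;

        spin_lock_irqsave(&port->lock, flags);
        /* GFP_KERNEL could sleep here and would be a bug */
        entry = kzalloc(sizeof(*entry), GFP_ATOMIC);
        if (entry)
                list_add_tail(&entry->list, &port->queue);
        spin_unlock_irqrestore(&port->lock, flags);

        return entry ? 0 : -ENOMEM;
}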
Re: [2.6.19 PATCH 2/7] ehea: pHYP interface
> > Hi,
>
> > I asked SO to recount arguments and we've come to a conclusion that
> > there're in fact 19 args not 18 as the name suggests. 19 args is
> > I-N-S-A-N-E.
>
> It will be partially cleaned up by:
>
> http://ozlabs.org/pipermail/linuxppc-dev/2006-July/024556.html
>
> However it doesn't fix the fact someone has architected such a crazy
> interface :(
>
> Anton

Well, just as background info: this is the wrapper around a single
assembly instruction which calls system firmware and takes 9 CPU registers
for input and 9 CPU registers for output parameters.
This definition by the platform architecture won't change in the near
future, but the good news is that with Anton's change the wrapper will
look much nicer.

Gruss / Regards . . . Christoph R
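For illustration, a rough sketch of what such a wrapper looks like on top
of the plpar_hcall9() interface added by the cleanup referenced above.
The opcode H_DEMO_QUERY and the output names are made up; a real caller
packs 9 input registers and unpacks up to 9 output registers:

#include <asm/hvcall.h>    /* plpar_hcall9(), PLPAR_HCALL9_BUFSIZE */

#define H_DEMO_QUERY    0x3FC   /* made-up opcode, for illustration only */

static long demo_h_query(unsigned long adapter_handle,
                         unsigned long resource_handle,
                         unsigned long *out_handle,
                         unsigned long *out_token)
{
        unsigned long retbuf[PLPAR_HCALL9_BUFSIZE];
        long rc;

        rc = plpar_hcall9(H_DEMO_QUERY, retbuf,
                          adapter_handle, resource_handle,
                          0, 0, 0, 0, 0, 0, 0);  /* unused input registers */

        *out_handle = retbuf[0];
        *out_token  = retbuf[1];
        return rc;
}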
Re: [2.6.19 PATCH 3/7] ehea: queue management
> You should really do some measurements to see what the minimal
> queue sizes are that can get you optimal throughput.
>
> Arnd <><

We did. And as always in performance tuning... one size fits all is
unfortunately not the correct answer.
Therefore we'll leave that open to the user, as most other new Ethernet
drivers do as well.

Regards . . . Christoph R
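A minimal sketch of what "leaving it open to the user" can look like via
module parameters; the names and defaults are illustrative only, not the
actual ehea parameters:

#include <linux/module.h>
#include <linux/moduleparam.h>

static int sq_entries = 4096;   /* send queue entries */
static int rq_entries = 4096;   /* receive queue entries */

module_param(sq_entries, int, 0444);
MODULE_PARM_DESC(sq_entries, "Number of send queue entries");
module_param(rq_entries, int, 0444);
MODULE_PARM_DESC(rq_entries, "Number of receive queue entries");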
RE: [PATCH 4/6] ehea: header files
"Jenkins, Clive" wrote on 15.08.2006 12:53:05: > > > You mean the eHEA has its own concept of page size? Separate from > the > > > page size used by the MMU? > > > > > > > yes, the eHEA currently supports only 4K pages for queues > > In that case, I suggest use the kernel's page size, but add a > compile-time > check, and quit with an error message if driver does not support it. eHEA does support other page sizes than 4k, but the HW interface expects to see 4k pages The adaption is done in the device driver, therefore we have a seperate 4k define. Regards . . . Christoph R. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: EHEA: Is there a size limit below 80K for patches?
Well, now I'm confused... two people, two opinions.
Here's a URL for a complete tarball, "sharing" the download location with
our other driver:
http://prdownloads.sourceforge.net/ibmehcad/ehea_EHEA_0002.tgz
We have been waiting for a sourceforge project for 9 days now to put out a
tgz, and it looks like many other people are getting upset by their
response time right now.
So what's the "right"/preferred way to proceed?

Christoph R.

> We're currently developing a new Ethernet device driver for a 10G IBM
> chip for System p. (ppc64)
>
> A later version of the driver should end up in the mainline kernel.
> How should we proceed to get first comments by the community?
> Either post this code as a patch to netdev or

yes

> put a full tarball on for example sourceforge?

nope.  Please read and observe: Documentation/SubmittingPatches
and Section 3 of it, References, for other sources of
expectations/requirements.
The -mm tree also contains Documentation/SubmitChecklist that you may find
useful.

---
~Randy

Jeff Garzik <[EMAIL PROTECTED]> wrote on 08.06.2006 13:34:36:
> Jan-Bernd Themann wrote:
> > Hello,
> >
> > we tried two times to send a patch set. In both cases the second
> > (largest) patch got lost. The first one was a bit above 100k, the
> > second one we tried was like 75K.
> >
> > Any idea what might be the problem?
>
> It might be size, or tripping a spam filter.
>
> For a new driver, do the sane thing and just post a URL.
>
> Jeff
new driver for IBM ethernet chip
We're currently developing a new Ethernet device driver for a 10G IBM chip
for System p. (ppc64)

A later version of the driver should end up in the mainline kernel.
How should we proceed to get first comments by the community?
Either post this code as a patch to netdev, or
put a full tarball on, for example, sourceforge?

Gruss / Regards . . . Christoph Raisch

christoph raisch, HCAD teamlead, IODF2 (d/3627), ibm boeblingen lab,
phone: (+49/0)7031-16 4584, fax: -16 2042, loc: 71032-05-003,
internet: [EMAIL PROTECTED]