Re: [PATCH] fix undefined reference to device_power_up/resume

2007-08-24 Thread Olaf Hering
On Sat, Aug 25, Paul Mackerras wrote:

> Olaf Hering writes:
> 
> > So change even more places from PM to PM_SLEEP to allow linking.
> 
> What config shows these errors?  I presume you need to have CONFIG_PM
> but not CONFIG_PM_SLEEP in order to see them?

The .config below boots on a wallstreet.
atyfb hangs for some reason, pcmcia still broken. I will send separate
mails for these bugs.


#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.23-rc3
# Fri Aug 24 21:05:04 2007
#
# CONFIG_PPC64 is not set

#
# Processor support
#
CONFIG_6xx=y
# CONFIG_PPC_85xx is not set
# CONFIG_PPC_8xx is not set
# CONFIG_40x is not set
# CONFIG_44x is not set
# CONFIG_E200 is not set
CONFIG_PPC_FPU=y
# CONFIG_ALTIVEC is not set
CONFIG_PPC_STD_MMU=y
CONFIG_PPC_STD_MMU_32=y
# CONFIG_PPC_MM_SLICES is not set
# CONFIG_SMP is not set
CONFIG_PPC32=y
CONFIG_PPC_MERGE=y
CONFIG_MMU=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_IRQ_PER_CPU=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_ARCH_HAS_ILOG2_U32=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_GENERIC_FIND_NEXT_BIT=y
# CONFIG_ARCH_NO_VIRT_TO_BUS is not set
CONFIG_PPC=y
CONFIG_EARLY_PRINTK=y
CONFIG_GENERIC_NVRAM=y
CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_PPC_OF=y
CONFIG_OF=y
# CONFIG_PPC_UDBG_16550 is not set
# CONFIG_GENERIC_TBSYNC is not set
CONFIG_AUDIT_ARCH=y
CONFIG_GENERIC_BUG=y
CONFIG_SYS_SUPPORTS_APM_EMULATION=y
# CONFIG_DEFAULT_UIMAGE is not set
# CONFIG_PPC_DCR_NATIVE is not set
# CONFIG_PPC_DCR_MMIO is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION="-wallstreet"
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_USER_NS is not set
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=21
CONFIG_SYSFS_DEPRECATED=y
# CONFIG_RELAY is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLAB=y
# CONFIG_SLUB is not set
# CONFIG_SLOB is not set
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
# CONFIG_MODULES is not set
CONFIG_BLOCK=y
# CONFIG_LBD is not set
# CONFIG_BLK_DEV_IO_TRACE is not set
# CONFIG_LSF is not set
# CONFIG_BLK_DEV_BSG is not set

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
# CONFIG_IOSCHED_AS is not set
# CONFIG_IOSCHED_DEADLINE is not set
# CONFIG_IOSCHED_CFQ is not set
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
# CONFIG_DEFAULT_CFQ is not set
CONFIG_DEFAULT_NOOP=y
CONFIG_DEFAULT_IOSCHED="noop"

#
# Platform support
#
CONFIG_PPC_MULTIPLATFORM=y
# CONFIG_EMBEDDED6xx is not set
# CONFIG_PPC_82xx is not set
# CONFIG_PPC_83xx is not set
# CONFIG_PPC_86xx is not set
CONFIG_CLASSIC32=y
# CONFIG_PPC_CHRP is not set
# CONFIG_PPC_MPC52xx is not set
# CONFIG_PPC_MPC5200 is not set
# CONFIG_PPC_EFIKA is not set
# CONFIG_PPC_LITE5200 is not set
CONFIG_PPC_PMAC=y
# CONFIG_PPC_CELL is not set
# CONFIG_PPC_CELL_NATIVE is not set
# CONFIG_PQ2ADS is not set
CONFIG_PPC_NATIVE=y
CONFIG_MPIC=y
# CONFIG_MPIC_WEIRD is not set
# CONFIG_PPC_I8259 is not set
# CONFIG_PPC_RTAS is not set
# CONFIG_MMIO_NVRAM is not set
CONFIG_PPC_MPC106=y
# CONFIG_PPC_970_NAP is not set
# CONFIG_PPC_INDIRECT_IO is not set
# CONFIG_GENERIC_IOMAP is not set
# CONFIG_CPU_FREQ is not set
# CONFIG_PPC601_SYNC_FIX is not set
# CONFIG_TAU is not set
# CONFIG_CPM2 is not set
# CONFIG_FSL_ULI1575 is not set

#
# Kernel options
#
# CONFIG_HIGHMEM is not set
CONFIG_HZ_100=y
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=100
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_MISC=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
# CONFIG_KEXEC is not set
CONFIG_ARCH_FLATMEM_ENABLE=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_FLATMEM_MANUAL=y
# CONFIG_DISCONTIGMEM_MANUAL is not set
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
# CONFIG_SPARSEMEM_STATIC is not set
CONFIG_SPLIT_PTLOCK_CPUS=4
# CONFIG_RESOURCES_64BIT is not set
CONFIG_ZONE_DMA_FLAG=1
CONFIG_BOUNCE=y
CONFIG_VIRT_TO_BUS=y
CONFIG_PROC_DEVICETREE=y
# CONFIG_CMDLINE_BOOL is not set
CONFIG_PM=y
# CONFIG_PM_LEGACY is not set
# CONFIG_PM_DEBUG is not set
# CONFIG_SUSPEND is not set
# CONFIG_HIBERNATION is not set
CONFIG_APM_EMULATION=y
# CONFIG_SECCOMP is not set
#

Re: [PATCH 1/4] ehea: fix interface to DLPAR tools

2007-08-24 Thread Jeff Garzik
Jan-Bernd Themann wrote:
> Userspace DLPAR tool expects decimal numbers to be written to
> and read from sysfs entries.
> 
> Signed-off-by: Jan-Bernd Themann <[EMAIL PROTECTED]>

applied 1-3


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] ucc_geth: kill unused include

2007-08-24 Thread Jeff Garzik
Kumar Gala wrote:
> The ucc_geth_mii code is based on the gianfar_mii code that used to include
> ocp.h.  ucc never needed this and it causes issues when we want to kill
> arch/ppc includes from arch/powerpc.
> 
> Signed-off-by: Kumar Gala <[EMAIL PROTECTED]>
> ---
> 
> Jeff, if you have an issue with this for 2.6.23, I'd prefer to push this via
> the powerpc.git trees in 2.6.24 as part of a larger cleanup.  Let me know
> one way or the other.
> 
> - k
> 
>  drivers/net/ucc_geth_mii.c |1 -
>  1 files changed, 0 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/net/ucc_geth_mii.c b/drivers/net/ucc_geth_mii.c
> index 6c257b8..df884f0 100644
> --- a/drivers/net/ucc_geth_mii.c
> +++ b/drivers/net/ucc_geth_mii.c
> @@ -32,7 +32,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 

Feel free to push via PPC git




Re: [PATCH 4/4] ehea: show physical port state

2007-08-24 Thread Jeff Garzik
Jan-Bernd Themann wrote:
> Introduces a module parameter to decide whether the physical
> port link state is propagated to the network stack or not.
> It makes sense not to take the physical port state into account
> on machines with multiple logical partitions that communicate
> with each other. This is always possible no matter what the physical
> port state is. Thus eHEA can be considered as a switch there.
> 
> Signed-off-by: Jan-Bernd Themann <[EMAIL PROTECTED]>
> 
> ---
>  drivers/net/ehea/ehea.h  |5 -
>  drivers/net/ehea/ehea_main.c |   14 +-
>  2 files changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ehea/ehea.h b/drivers/net/ehea/ehea.h
> index d67f97b..8d58be5 100644
> --- a/drivers/net/ehea/ehea.h
> +++ b/drivers/net/ehea/ehea.h
> @@ -39,7 +39,7 @@
>  #include 
>  
>  #define DRV_NAME "ehea"
> -#define DRV_VERSION  "EHEA_0073"
> +#define DRV_VERSION  "EHEA_0074"
>  
>  /* eHEA capability flags */
>  #define DLPAR_PORT_ADD_REM 1
> @@ -402,6 +402,8 @@ struct ehea_mc_list {
>  
>  #define EHEA_PORT_UP 1
>  #define EHEA_PORT_DOWN 0
> +#define EHEA_PHY_LINK_UP 1
> +#define EHEA_PHY_LINK_DOWN 0
>  #define EHEA_MAX_PORT_RES 16
>  struct ehea_port {
>   struct ehea_adapter *adapter;/* adapter that owns this port */
> @@ -427,6 +429,7 @@ struct ehea_port {
>   u32 msg_enable;
>   u32 sig_comp_iv;
>   u32 state;
> + u8 phy_link;
>   u8 full_duplex;
>   u8 autoneg;
>   u8 num_def_qps;
> diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
> index db57474..1804c99 100644
> --- a/drivers/net/ehea/ehea_main.c
> +++ b/drivers/net/ehea/ehea_main.c
> @@ -53,17 +53,21 @@ static int rq3_entries = EHEA_DEF_ENTRIES_RQ3;
>  static int sq_entries = EHEA_DEF_ENTRIES_SQ;
>  static int use_mcs = 0;
>  static int num_tx_qps = EHEA_NUM_TX_QP;
> +static int show_phys_link = 0;
>  
>  module_param(msg_level, int, 0);
>  module_param(rq1_entries, int, 0);
>  module_param(rq2_entries, int, 0);
>  module_param(rq3_entries, int, 0);
>  module_param(sq_entries, int, 0);
> +module_param(show_phys_link, int, 0);
>  module_param(use_mcs, int, 0);
>  module_param(num_tx_qps, int, 0);
>  
>  MODULE_PARM_DESC(num_tx_qps, "Number of TX-QPS");
>  MODULE_PARM_DESC(msg_level, "msg_level");
> +MODULE_PARM_DESC(show_phys_link, "Show link state of external port"
> +  "1:yes, 0: no.  Default = 0 ");
>  MODULE_PARM_DESC(rq3_entries, "Number of entries for Receive Queue 3 "
>"[2^x - 1], x = [6..14]. Default = "
>__MODULE_STRING(EHEA_DEF_ENTRIES_RQ3) ")");
> @@ -814,7 +818,9 @@ int ehea_set_portspeed(struct ehea_port *port, u32 port_speed)
>   ehea_error("Failed setting port speed");
>   }
>   }
> - netif_carrier_on(port->netdev);
> + if (!show_phys_link || (port->phy_link == EHEA_PHY_LINK_UP))
> + netif_carrier_on(port->netdev);
> +
>   kfree(cb4);
>  out:
>   return ret;
> @@ -869,13 +875,19 @@ static void ehea_parse_eqe(struct ehea_adapter *adapter, u64 eqe)
>   }
>  
>   if (EHEA_BMASK_GET(NEQE_EXTSWITCH_PORT_UP, eqe)) {
> + port->phy_link = EHEA_PHY_LINK_UP;
>   if (netif_msg_link(port))
>   ehea_info("%s: Physical port up",
> port->netdev->name);
> + if (show_phys_link)
> + netif_carrier_on(port->netdev);
>   } else {
> + port->phy_link = EHEA_PHY_LINK_DOWN;
>   if (netif_msg_link(port))
>   ehea_info("%s: Physical port down",
> port->netdev->name);
> + if (show_phys_link)
> + netif_carrier_off(port->netdev);

I think it's misnamed, calling it "show_xxx", because this (as the 
change description notes) controls propagation of carrier to the network 
stack.

Jeff





[2.6.23 PATCH] Fix SLB initialization at boot time

2007-08-24 Thread Paul Mackerras
This partially reverts edd0622bd2e8f755c960827e15aa6908c3c5aa94.

It turns out that the part of that commit that aimed to ensure that we
created an SLB entry for the kernel stack on secondary CPUs when
starting the CPU didn't achieve its aim, and in fact caused a
regression, because get_paca()->kstack is not initialized at the point
where slb_initialize is called.

This therefore just reverts that part of that commit, while keeping
the change to slb_flush_and_rebolt, which is correct and necessary.

Signed-off-by: Paul Mackerras <[EMAIL PROTECTED]>
---
diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index a73d2d7..ff1811a 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -74,6 +74,22 @@ static inline void slb_shadow_clear(unsigned long entry)
get_slb_shadow()->save_area[entry].esid = 0;
 }
 
+static inline void create_shadowed_slbe(unsigned long ea, unsigned long flags,
+   unsigned long entry)
+{
+   /*
+* Updating the shadow buffer before writing the SLB ensures
+* we don't get a stale entry here if we get preempted by PHYP
+* between these two statements.
+*/
+   slb_shadow_update(ea, flags, entry);
+
+   asm volatile("slbmte  %0,%1" :
+: "r" (mk_vsid_data(ea, flags)),
+  "r" (mk_esid_data(ea, entry))
+: "memory" );
+}
+
 void slb_flush_and_rebolt(void)
 {
/* If you change this make sure you change SLB_NUM_BOLTED
@@ -226,12 +242,16 @@ void slb_initialize(void)
vflags = SLB_VSID_KERNEL | vmalloc_llp;
 
/* Invalidate the entire SLB (even slot 0) & all the ERATS */
-   slb_shadow_update(PAGE_OFFSET, lflags, 0);
-   asm volatile("isync; slbia; sync; slbmte  %0,%1; isync" ::
-"r" (get_slb_shadow()->save_area[0].vsid),
-"r" (get_slb_shadow()->save_area[0].esid) : "memory");
-
-   slb_shadow_update(VMALLOC_START, vflags, 1);
-
-   slb_flush_and_rebolt();
+   asm volatile("isync":::"memory");
+   asm volatile("slbmte  %0,%0"::"r" (0) : "memory");
+   asm volatile("isync; slbia; isync":::"memory");
+   create_shadowed_slbe(PAGE_OFFSET, lflags, 0);
+
+   create_shadowed_slbe(VMALLOC_START, vflags, 1);
+
+   /* We don't bolt the stack for the time being - we're in boot,
+* so the stack is in the bolted segment.  By the time it goes
+* elsewhere, we'll call _switch() which will bolt in the new
+* one. */
+   asm volatile("isync":::"memory");
 }


Re: [PATCH v2] [02/10] pasemi_mac: Stop using the pci config space accessors for register read/writes

2007-08-24 Thread Stephen Rothwell
On Fri, 24 Aug 2007 13:11:04 -0500 Olof Johansson <[EMAIL PROTECTED]> wrote:
>
> On Fri, Aug 24, 2007 at 02:05:31PM +1000, Stephen Rothwell wrote:
> > 
> > It is not documented as such (as far as I can see), but pci_dev_put is
> > safe to call with NULL. And there are other places in the kernel that
> > explicitly use that fact.
> 
> Some places check, others do not. I'll leave it be for now but might take
> care of it during some future cleanup. Thanks for pointing it out, though.

No worries.

-- 
Cheers,
Stephen Rothwell  [EMAIL PROTECTED]
http://www.canb.auug.org.au/~sfr/



Re: [PATCH] fix undefined reference to device_power_up/resume

2007-08-24 Thread Al Viro
On Sat, Aug 25, 2007 at 11:10:13AM +1000, Paul Mackerras wrote:
> Olaf Hering writes:
> 
> > So change even more places from PM to PM_SLEEP to allow linking.
> 
> What config shows these errors?  I presume you need to have CONFIG_PM
> but not CONFIG_PM_SLEEP in order to see them?

E.g. PM, PPC32, SMP.


Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread akepner
On Fri, Aug 24, 2007 at 02:47:11PM -0700, David Miller wrote:

> 
> Someone should reference that thread _now_ before this discussion goes
> too far and we repeat a lot of information ..

Here's part of the thread:
http://marc.info/?t=11159530601&r=1&w=2

Also, Jamal's paper may be of interest - Google for "when napi comes
to town".

-- 
Arthur



Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Bodo Eggert
Linas Vepstas <[EMAIL PROTECTED]> wrote:
> On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:

>> 3) On modern systems the incoming packets are processed very fast. Especially
>> on SMP systems when we use multiple queues we process only a few packets
>> per napi poll cycle. So NAPI does not work very well here and the interrupt
>> rate is still high.
> 
> I saw this too, on a system that is "modern" but not terribly fast, and
> only slightly (2-way) smp. (the spidernet)
> 
> I experimented wih various solutions, none were terribly exciting.  The
> thing that killed all of them was a crazy test case that someone sprung on
> me:  They had written a worst-case network ping-pong app: send one
> packet, wait for reply, send one packet, etc.
> 
> If I waited (indefinitely) for a second packet to show up, the test case
> completely stalled (since no second packet would ever arrive).  And if I
> introduced a timer to wait for a second packet, then I just increased
> the latency in the response to the first packet, and this was noticed,
> and folks complained.

Possible solution / possible brainfart:

Introduce a timer, but don't start to use it to combine packets unless you
receive n packets within the timeframe. If you receive less than m packets
within one timeframe, stop using the timer. The system should now have a
decent response time when the network is idle, and when the network is
busy, nobody will complain about the latency. :-)
-- 
Funny quotes:
22. When everything's going your way, you're in the wrong lane and going
the wrong way.
Friß, Spammer: [EMAIL PROTECTED] [EMAIL PROTECTED]


Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread David Stevens
Stephen Hemminger <[EMAIL PROTECTED]> wrote on 08/24/2007 08:52:03 AM:

> You need hardware support for deferred interrupts. Most devices have it
> (e1000, sky2, tg3) and it interacts well with NAPI. It is not a generic
> thing you want done by the stack, you want the hardware to hold off
> interrupts until X packets or Y usecs have expired.

For generic hardware that doesn't support it, couldn't you use an
estimator and adjust the timer dynamically in software based on sampled
values? Switch to per-packet interrupts when the receive rate is low...
Actually, that's how I thought NAPI worked before I found out otherwise
(ie, before I looked :-)).

The hardware-accelerated one is essentially siloing as done by ancient
serial devices on UNIX systems. If you had a tunable for a target count,
and an estimator for the time interval, then switch to per-packet when
the estimator exceeds a tunable max threshold (and also, I suppose, if
you near overflowing the ring on the min timer granularity), you get
almost all of it, right?

Problem is if it increases rapidly, you may drop packets before you
notice that the ring is full in the current estimated interval.
 +-DLS




Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread akepner
On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:
> ...
> 3) On modern systems the incoming packets are processed very fast.
>    Especially on SMP systems when we use multiple queues we process
>    only a few packets per napi poll cycle. So NAPI does not work very
>    well here and the interrupt rate is still high. What we need would
>    be some sort of timer polling mode which will schedule a device
>    after a certain amount of time for high load situations. With high
>    precision timers this could work well. Current usual timers are too
>    slow. A finer granularity would be needed to keep the latency down
>    (and queue length moderate).
> 

We found the same on ia64-sn systems with tg3 a couple of years 
ago. Using simple interrupt coalescing ("don't interrupt until 
you've received N packets or M usecs have elapsed") worked 
reasonably well in practice. If your h/w supports that (and I'd 
guess it does, since it's such a simple thing), you might try 
it.

-- 
Arthur



Re: [PATCH] fix undefined reference to device_power_up/resume

2007-08-24 Thread Paul Mackerras
Olaf Hering writes:

> So change even more places from PM to PM_SLEEP to allow linking.

What config shows these errors?  I presume you need to have CONFIG_PM
but not CONFIG_PM_SLEEP in order to see them?

Paul.


Re: [PATCH 05/20] bootwrapper: flatdevtree fixes

2007-08-24 Thread David Gibson
On Fri, Aug 24, 2007 at 09:48:37AM -0500, Scott Wood wrote:
> On Fri, Aug 24, 2007 at 11:01:22AM +1000, David Gibson wrote:
> > On Thu, Aug 23, 2007 at 12:48:30PM -0500, Scott Wood wrote:
> > > It's likely to be ugly no matter what, though I'll try to come up with 
> > > something slightly nicer.  If I were doing this code from scratch, I'd 
> > > probably liven the tree first and reflatten it to pass to the kernel.
> > 
> > Eh, probably not worth bothering doing an actual implementation at
> > this stage - I'll have to redo it for libfdt anyway.
> 
> Too late, I already wrote it -- it wasn't as bad as I thought it would
> be.

Well, there you go.

> > flatdevtree uses some of the information it caches in the phandle
> > context stuff to remember who's the parent of a node.  libfdt uses raw
> > offsets into the structure, so the *only* way to implement
> > get_parent() is to rescan the dt from the beginning, keeping track of
> > parents until reaching the given node.
> 
> What is the benefit of doing it that way?

Most other operations are simpler like this - no more futzing around
converting between phandles and offsets and back again at the
beginning and end of most functions.

More importantly, it allows libfdt to be "stateless" in the sense that
you can manipulate the device tree without having to maintain any
context or state structure apart from the device tree blob itself.
That's particularly handy for doing read-only accesses really early
with a minimum of fuss.

In particular, it means libfdt does not need malloc().  That can be
rather useful for some that's supposed to be embeddable in a variety
of strange, constrained environments such as bootloaders and
firmwares.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Linas Vepstas
On Fri, Aug 24, 2007 at 02:44:36PM -0700, David Miller wrote:
> From: David Stevens <[EMAIL PROTECTED]>
> Date: Fri, 24 Aug 2007 09:50:58 -0700
> 
> > Problem is if it increases rapidly, you may drop packets
> > before you notice that the ring is full in the current estimated
> > interval.
> 
> This is one of many reasons why hardware interrupt mitigation
> is really needed for this.

When turning off interrupts, don't turn them *all* off.
Leave the queue-full interrupt always on.

--linas


[PATCH] Handle alignment faults on SPE load/store instructions

2007-08-24 Thread Kumar Gala
This adds code to handle alignment traps generated by the following
SPE (signal processing engine) load/store instructions, by emulating
the instruction in the kernel (as is done for other instructions that
generate alignment traps):

evldd[x] Vector Load Double Word into Double Word [Indexed]
evldw[x] Vector Load Double into Two Words [Indexed]
evldh[x] Vector Load Double into Four Half Words [Indexed]
evlhhesplat[x]   Vector Load Half Word into Half Words Even and Splat [Indexed]
evlhhousplat[x]  Vector Load Half Word into Half Word Odd Unsigned and Splat [Indexed]
evlhhossplat[x]  Vector Load Half Word into Half Word Odd Signed and Splat [Indexed]
evlwhe[x]Vector Load Word into Two Half Words Even [Indexed]
evlwhou[x]   Vector Load Word into Two Half Words Odd Unsigned (zero-extended) [Indexed]
evlwhos[x]   Vector Load Word into Two Half Words Odd Signed (with sign extension) [Indexed]
evlwwsplat[x]Vector Load Word into Word and Splat [Indexed]
evlwhsplat[x]Vector Load Word into Two Half Words and Splat [Indexed]
evstdd[x]Vector Store Double of Double [Indexed]
evstdw[x]Vector Store Double of Two Words [Indexed]
evstdh[x]Vector Store Double of Four Half Words [Indexed]
evstwhe[x]   Vector Store Word of Two Half Words from Even [Indexed]
evstwho[x]   Vector Store Word of Two Half Words from Odd [Indexed]
evstwwe[x]   Vector Store Word of Word from Even [Indexed]
evstwwo[x]   Vector Store Word of Word from Odd [Indexed]

---

Exists in my git, tree posted here for review.

 arch/powerpc/kernel/align.c |  250 +++
 1 files changed, 250 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/align.c b/arch/powerpc/kernel/align.c
index 4c47f9c..e06f75d 100644
--- a/arch/powerpc/kernel/align.c
+++ b/arch/powerpc/kernel/align.c
@@ -46,6 +46,8 @@ struct aligninfo {
 #define S  0x40/* single-precision fp or... */
 #define SX 0x40/* ... byte count in XER */
 #define HARD   0x80/* string, stwcx. */
+#define E4 0x40/* SPE endianness is word */
+#define E8 0x80/* SPE endianness is double word */

 /* DSISR bits reported for a DCBZ instruction: */
 #define DCBZ   0x5f/* 8xx/82xx dcbz faults when cache not enabled */
@@ -392,6 +394,248 @@ static int emulate_fp_pair(struct pt_regs *regs, unsigned char __user *addr,
return 1;   /* exception handled and fixed up */
 }

+#ifdef CONFIG_SPE
+
+static struct aligninfo spe_aligninfo[32] = {
+   { 8, LD+E8 },   /* 0 00 00: evldd[x] */
+   { 8, LD+E4 },   /* 0 00 01: evldw[x] */
+   { 8, LD },  /* 0 00 10: evldh[x] */
+   INVALID,/* 0 00 11 */
+   { 2, LD },  /* 0 01 00: evlhhesplat[x] */
+   INVALID,/* 0 01 01 */
+   { 2, LD },  /* 0 01 10: evlhhousplat[x] */
+   { 2, LD+SE },   /* 0 01 11: evlhhossplat[x] */
+   { 4, LD },  /* 0 10 00: evlwhe[x] */
+   INVALID,/* 0 10 01 */
+   { 4, LD },  /* 0 10 10: evlwhou[x] */
+   { 4, LD+SE },   /* 0 10 11: evlwhos[x] */
+   { 4, LD+E4 },   /* 0 11 00: evlwwsplat[x] */
+   INVALID,/* 0 11 01 */
+   { 4, LD },  /* 0 11 10: evlwhsplat[x] */
+   INVALID,/* 0 11 11 */
+
+   { 8, ST+E8 },   /* 1 00 00: evstdd[x] */
+   { 8, ST+E4 },   /* 1 00 01: evstdw[x] */
+   { 8, ST },  /* 1 00 10: evstdh[x] */
+   INVALID,/* 1 00 11 */
+   INVALID,/* 1 01 00 */
+   INVALID,/* 1 01 01 */
+   INVALID,/* 1 01 10 */
+   INVALID,/* 1 01 11 */
+   { 4, ST },  /* 1 10 00: evstwhe[x] */
+   INVALID,/* 1 10 01 */
+   { 4, ST },  /* 1 10 10: evstwho[x] */
+   INVALID,/* 1 10 11 */
+   { 4, ST+E4 },   /* 1 11 00: evstwwe[x] */
+   INVALID,/* 1 11 01 */
+   { 4, ST+E4 },   /* 1 11 10: evstwwo[x] */
+   INVALID,/* 1 11 11 */
+};
+
+#defineEVLDD   0x00
+#defineEVLDW   0x01
+#defineEVLDH   0x02
+#defineEVLHHESPLAT 0x04
+#defineEVLHHOUSPLAT0x06
+#defineEVLHHOSSPLAT0x07
+#defineEVLWHE  0x08
+#defineEVLWHOU 0x0A
+#defineEVLWHOS 0x0B
+#defineEVLWWSPLAT  0x0C
+#defineEVLWHSPLAT  0x0E
+#defineEVSTDD  0x10
+#defineEVSTDW  0x11
+#defineEVSTDH  0x12
+#defineEVSTWHE 0x18
+#defineEVSTWHO 0x1A
+#defineEVSTWWE 0x1C
+#defineEVSTWWO 0x1E
+
+/*
+ * Emulate SPE loads and stores.
+ * Only Book-E has these instructions.

Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread David Miller
From: James Chapman <[EMAIL PROTECTED]>
Date: Fri, 24 Aug 2007 18:16:45 +0100

> Does hardware interrupt mitigation really interact well with NAPI?

It interacts quite excellently.

There was a long saga about this with tg3 and huge SGI numa
systems with large costs for interrupt processing, and the
fix was to do a minimal amount of interrupt mitigation and
this basically cleared up all the problems.

Someone should reference that thread _now_ before this discussion goes
too far and we repeat a lot of information and people like myself have
to stay up all night correcting the misinformation and
misunderstandings that are basically guarenteed for this topic :)


Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread David Miller
From: David Stevens <[EMAIL PROTECTED]>
Date: Fri, 24 Aug 2007 09:50:58 -0700

> Problem is if it increases rapidly, you may drop packets
> before you notice that the ring is full in the current estimated
> interval.

This is one of many reasons why hardware interrupt mitigation
is really needed for this.


Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread David Miller
From: [EMAIL PROTECTED] (Linas Vepstas)
Date: Fri, 24 Aug 2007 11:45:41 -0500

> In the end, I just let it be, and let the system work as a
> busy-beaver, with the high interrupt rate. Is this a wise thing to
> do?

The tradeoff is always going to be latency vs. throughput.

A sane default should defer enough to catch multiple packets coming in
at something close to line rate, but not so much that latency unduly
suffers.


Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread David Miller
From: Jan-Bernd Themann <[EMAIL PROTECTED]>
Date: Fri, 24 Aug 2007 15:59:16 +0200

> 1) The current implementation of netif_rx_schedule, netif_rx_complete
>    and the net_rx_action have the following problem: netif_rx_schedule
>    sets the NAPI_STATE_SCHED flag and adds the NAPI instance to the
>    poll_list. netif_rx_action checks NAPI_STATE_SCHED, if set it will
>    add the device to the poll_list again (as well). netif_rx_complete
>    clears the NAPI_STATE_SCHED. If an interrupt handler calls
>    netif_rx_schedule on CPU 2 after netif_rx_complete has been called
>    on CPU 1 (and the poll function has not returned yet), the NAPI
>    instance will be added twice to the poll_list (by netif_rx_schedule
>    and net_rx_action). Problems occur when netif_rx_complete is called
>    twice for the device (BUG() called)

Indeed, this is the "who should manage the list" problem.
Probably the answer is that whoever transitions the NAPI_STATE_SCHED
bit from cleared to set should do the list addition.

Patches welcome :-)

> 3) On modern systems the incoming packets are processed very fast.
>    Especially on SMP systems when we use multiple queues we process
>    only a few packets per napi poll cycle. So NAPI does not work very
>    well here and the interrupt rate is still high. What we need would
>    be some sort of timer polling mode which will schedule a device
>    after a certain amount of time for high load situations. With high
>    precision timers this could work well. Current usual timers are too
>    slow. A finer granularity would be needed to keep the latency down
>    (and queue length moderate).

This is why minimal levels of HW interrupt mitigation should be enabled
in your chip.  If it does not support this, you will indeed need to look
into using high resolution timers or other schemes to alleviate this.

I do not think it deserves a generic core networking helper facility,
the chips that can't mitigate interrupts are few and obscure.


Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Linas Vepstas
On Fri, Aug 24, 2007 at 11:11:56PM +0200, Jan-Bernd Themann wrote:
> (when they are available for
> POWER in our case). 

hrtimer worked fine on the powerpc cell arch last summer.
I assume they work on p5 and p6 too, no ??

> I tried to implement something with "normal" timers, but the result
> was everything but great. The timers seem to be far too slow.
> I'm not sure if it helps to increase it from 1000HZ to 2500HZ
> or more.

Heh. Do the math. Even on 1gigabit cards, that's not enough:

(1gigabit/sec) x (byte/8 bits) x (packet/1500bytes) x (sec/1000 jiffy) 

is 83 packets a jiffy (for big packets, even more for small packets, 
and more again for 10 gigabit cards). So polling once per jiffy is a 
latency disaster.

--linas  



Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread David Miller
From: Jan-Bernd Themann <[EMAIL PROTECTED]>
Date: Fri, 24 Aug 2007 15:59:16 +0200

>    It would be nice if it is possible to schedule queues to other CPU's, or
>    at least to use interrupts to put the queue to another cpu (not nice for 
>    as you never know which one you will hit). 
>    I'm not sure how bad the tradeoff would be.

Once the per-cpu NAPI poll queues start needing locks, much of the
gain will be lost.  This is strictly what we want to avoid.

We need real facilities for IRQ distribution policies.  With that none
of this is an issue.

This is also a platform specific problem with IRQ behavior; the IRQ
distribution scheme you mention would never occur on sparc64, for
example.  We use a fixed round-robin distribution of interrupts to
CPUs there; they don't move.

Each scheme has its advantages, but you want a different scheme here
than what is implemented, and the fix is therefore not in the
networking :-)

Furthermore, most cards that will be using multi-queue will be
using hashes on the packet headers to choose the MSI-X interrupt
and thus the CPU to be targeted.  Those cards will want fixed
instead of dynamic interrupt-to-CPU distribution schemes as well,
so your problem is not unique and they'll need the same fix as
you do.


Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread James Chapman
Stephen Hemminger wrote:
> On Fri, 24 Aug 2007 17:47:15 +0200
> Jan-Bernd Themann <[EMAIL PROTECTED]> wrote:
> 
>> Hi,
>>
>> On Friday 24 August 2007 17:37, [EMAIL PROTECTED] wrote:
>>> On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:
>>>> ...
>>>> 3) On modern systems the incoming packets are processed very fast. Especially
>>>>    on SMP systems when we use multiple queues we process only a few packets
>>>>    per napi poll cycle. So NAPI does not work very well here and the interrupt
>>>>    rate is still high. What we need would be some sort of timer polling mode
>>>>    which will schedule a device after a certain amount of time for high load
>>>>    situations. With high precision timers this could work well. Current
>>>>    usual timers are too slow. A finer granularity would be needed to keep the
>>>>    latency down (and queue length moderate).

>>> We found the same on ia64-sn systems with tg3 a couple of years 
>>> ago. Using simple interrupt coalescing ("don't interrupt until 
>>> you've received N packets or M usecs have elapsed") worked 
>>> reasonably well in practice. If your h/w supports that (and I'd 
>>> guess it does, since it's such a simple thing), you might try 
>>> it.
>>>
>> I don't see how this should work. Our latest machines are fast enough that 
>> they
>> simply empty the queue during the first poll iteration (in most cases).
>> Even if you wait until X packets have been received, it does not help for
>> the next poll cycle. The average number of packets we process per poll queue
>> is low. So a timer would be preferable that periodically polls the 
>> queue, without the need of generating a HW interrupt. This would allow us
>> to wait until a reasonable amount of packets have been received in the 
>> meantime
>> to keep the poll overhead low. This would also be useful in combination
>> with LRO.
>>
> 
> You need hardware support for deferred interrupts. Most devices have it 
> (e1000, sky2, tg3)
> and it interacts well with NAPI. It is not a generic thing you want done by 
> the stack,
> you want the hardware to hold off interrupts until X packets or Y usecs have 
> expired.

Does hardware interrupt mitigation really interact well with NAPI? In my 
experience, holding off interrupts for X packets or Y usecs does more 
harm than good; such hardware features are useful only when the OS has 
no NAPI-like mechanism.

When tuning NAPI drivers for packets/sec performance (which is a good 
indicator of driver performance), I make sure that the driver stays in 
NAPI polled mode while it has any rx or tx work to do. If the CPU is 
fast enough that all work is always completed on each poll, I have the 
driver stay in polled mode until dev->poll() is called N times with no 
work being done. This keeps interrupts disabled for reasonable traffic 
levels, while minimizing packet processing latency. No need for hardware 
interrupt mitigation.

> The parameters for controlling it are already in ethtool, the issue is 
> finding a good
> default set of values for a wide range of applications and architectures. 
> Maybe some
> heuristic based on processor speed would be a good starting point. The 
> dynamic irq
> moderation stuff is not widely used because it is too hard to get right.

I agree. It would be nice to find a way for the typical user to derive 
best values for these knobs for his/her particular system. Perhaps a 
tool using pktgen and network device phy internal loopback could be 
developed?

-- 
James Chapman
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development



Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Jan-Bernd Themann
Linas Vepstas schrieb:
> On Fri, Aug 24, 2007 at 09:04:56PM +0200, Bodo Eggert wrote:
>   
>> Linas Vepstas <[EMAIL PROTECTED]> wrote:
>> 
>>> On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:
>>>   
>>>> 3) On modern systems the incoming packets are processed very fast. Especially
>>>>    on SMP systems when we use multiple queues we process only a few packets
>>>>    per napi poll cycle. So NAPI does not work very well here and the interrupt
>>>>    rate is still high.
>>> worst-case network ping-pong app: send one
>>> packet, wait for reply, send one packet, etc.
>>>   
>> Possible solution / possible brainfart:
>>
>> Introduce a timer, but don't start to use it to combine packets unless you
>> receive n packets within the timeframe. If you receive less than m packets
>> within one timeframe, stop using the timer. The system should now have a
>> decent response time when the network is idle, and when the network is
>> busy, nobody will complain about the latency.-)
>> 
>
> Ohh, that was inspirational. Let me free-associate some wild ideas.
>
> Suppose we keep a running average of the recent packet arrival rate.
> Let's say it's 10 per millisecond ("typical" for a gigabit eth running
> flat-out).  If we could poll the driver at a rate of 10-20 per
> millisecond (i.e. letting the OS do other useful work for 0.05 millisec),
> then we could potentially service the card without ever having to enable 
> interrupts on the card, and without hurting latency.
>
> If the packet arrival rate becomes slow enough, we go back to an
> interrupt-driven scheme (to keep latency down).
>
> The main problem here is that, even for HZ=1000 machines, this amounts 
> to 10-20 polls per jiffy.  Which, if implemented in kernel, requires 
> using the high-resolution timers. And, umm, don't the HR timers require
> a cpu timer interrupt to make them go? So it's not clear that this is much
> of a win.
>   
That is indeed a good question. At least for 10G eHEA we see
that the average number of packets/poll cycle is very low.
With high precision timers we could control the poll interval
better and thus make sure we get enough packets on the queue in
high load situations to benefit from LRO while keeping the
latency moderate. When the traffic load is low we could just
stick to plain NAPI. I don't know how expensive hp timers are,
we probably just have to test it (when they are available for
POWER in our case). However, having more packets
per poll run would make LRO more efficient and thus the total
CPU utilization would decrease.

I guess on most systems there are not many different network
cards working in parallel. So if the driver could set the poll
interval for its devices, it could be well optimized depending
on the NICs characteristics.

Maybe it would be good enough to have a timer that schedules
the device for NAPI (and thus triggers SoftIRQs, which will
trigger NAPI). Whether this timer would be used via a generic
interface or would be implemented as a proprietary solution
would depend on whether other drivers want / need this feature
as well. Drivers / NICs that work fine with plain NAPI don't
have to use the timer :-)
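If such a timer were wired up against the 2.6.23-era NAPI interface (where netif_rx_schedule() still operated on a struct net_device; napi_struct only arrived later), the core of it might look like the sketch below. The ehea_* names and the priv->poll_timer field are hypothetical, not a real patch:

```c
/* Timer callback that schedules the device for NAPI, exactly as an
 * rx interrupt would: put the device on the NET_RX_SOFTIRQ poll
 * list, but driven from a timer instead of the hardware. */
static void ehea_poll_timer_fn(unsigned long data)
{
	struct net_device *dev = (struct net_device *)data;

	if (netif_rx_schedule_prep(dev))
		__netif_rx_schedule(dev);
}

/* At init time:
 *	setup_timer(&priv->poll_timer, ehea_poll_timer_fn,
 *		    (unsigned long)dev);
 *
 * and in dev->poll(), under load, instead of re-enabling the irq:
 *	mod_timer(&priv->poll_timer, jiffies + 1);
 */
```

Whether the extra jiffy of delay is acceptable is exactly the latency/granularity trade-off discussed above.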

I tried to implement something with "normal" timers, but the result
was anything but great. The timers seem to be far too slow.
I'm not sure if it would help to increase HZ from 1000 to 2500
or more.

Regards,
Jan-Bernd



Re: [PATCH 2/6] PowerPC 440EPx: Sequoia DTS

2007-08-24 Thread Segher Boessenkool
>>>> address-permutation = <0 1 3 2 4 5 7 6 e f d c a b 9 8>;
>>> Yes, I was contemplating something like that.
>> Let's not define this until we need it though :-)
> Let's not even think of it,

It is good to think about it, for the simple reason that it
validates whether the current design is future-proof or not.

> since this will end up in a "catch all" driver,

Yeah, we shouldn't _define_ anything like this, not until
it is needed anyway.

> and yet this may be not enough when the flash doesn't support 8-bit 
> R/W, for example (I've already quoted it...

Yeah.  There is no need to future-proof to insane designs anyway;
whatever can not fit in the "generic" framework can bloody well
just do its own binding, no need to pollute the generic thing.

>>>> I haven't heard or thought of anything better either.  Using "ranges"
>>>> is conceptually wrong, even ignoring the technical problems that come
>>>> with it.
>>> Why is "ranges" conceptually wrong?
>
>> The flash partitions aren't separate devices sitting on a
>
> Yeah, that's why I decided not to go that way from the very start...
> though wait: I didn't do this simply because they're not devices.
> That led me to an interesting question: does the device tree have
> something for the disk partitions?

Some do.  Most don't.  There is no standardised binding I know of.

The big huge difference here is that disks typically do contain
partitioning information on the disk itself, and flash doesn't.

>> "flash bus", they are "sub-devices" of their parent.
>
> They're quite an abstraction of a device -- although Linux treats
> them as separate devices indeed.

Sure, it's a pseudo-device.  Nothing new there.

>>> To be honest this looks rather to me like another case where having
>>> overlapping 'reg' and 'ranges' would actually make sense.
>
>> It never makes sense.  You should give the "master" device
>> the full "reg" range it covers, and have it define its own
>> address space; "sub-devices" can carve out their little hunk
>> from that.  You don't want more than one device owning the
>> same address range in the same address space.
>
>So, no "ranges" prop in MTD node is necessary? Phew... :-)

Yeah, it would be positively harmful.  They are pseudo-devices
only, the kernel device driver needs to always access the real
device.


Segher



Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Linas Vepstas
On Fri, Aug 24, 2007 at 09:04:56PM +0200, Bodo Eggert wrote:
> Linas Vepstas <[EMAIL PROTECTED]> wrote:
> > On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:
> >> 3) On modern systems the incoming packets are processed very fast. 
> >> Especially
> >> on SMP systems when we use multiple queues we process only a few packets
> >> per napi poll cycle. So NAPI does not work very well here and the interrupt
> >> rate is still high.
> > 
> > worst-case network ping-pong app: send one
> > packet, wait for reply, send one packet, etc.
> 
> Possible solution / possible brainfart:
> 
> Introduce a timer, but don't start to use it to combine packets unless you
> receive n packets within the timeframe. If you receive less than m packets
> within one timeframe, stop using the timer. The system should now have a
> decent response time when the network is idle, and when the network is
> busy, nobody will complain about the latency.-)

Ohh, that was inspirational. Let me free-associate some wild ideas.

Suppose we keep a running average of the recent packet arrival rate.
Let's say it's 10 per millisecond ("typical" for a gigabit eth running
flat-out).  If we could poll the driver at a rate of 10-20 per
millisecond (i.e. letting the OS do other useful work for 0.05 millisec),
then we could potentially service the card without ever having to enable 
interrupts on the card, and without hurting latency.

If the packet arrival rate becomes slow enough, we go back to an
interrupt-driven scheme (to keep latency down).

The main problem here is that, even for HZ=1000 machines, this amounts 
to 10-20 polls per jiffy.  Which, if implemented in kernel, requires 
using the high-resolution timers. And, umm, don't the HR timers require
a cpu timer interrupt to make them go? So it's not clear that this is much
of a win.

The eHEA is a 10 gigabit device, so it can expect 80-100 packets per
millisecond for large packets, and even more, say 1K packets per
millisec, for small packets. (Even the spec for my 1Gb spidernet card
claims its internal rate is 1M packets/sec.) 

Another possibility is to set HZ to 5000 or something humongous
... after all, CPUs are now faster! But, since this might be wasteful,
maybe we could make HZ dynamically variable: have high HZ rates when
there's lots of network/disk activity, and low HZ rates when not. That
means a non-constant jiffy.

If all drivers used interrupt mitigation, then the variable
high-frequency jiffy could take their place, and be more "fair" to everyone.
Most drivers would be polled most of the time when they're busy, and 
only use interrupts when they're not.
 
--linas


[PATCH] fix undefined reference to device_power_up/resume

2007-08-24 Thread Olaf Hering

Current Linus tree fails to link on pmac32:

drivers/built-in.o: In function `pmac_wakeup_devices':
via-pmu.c:(.text+0x5bab4): undefined reference to `device_power_up'
via-pmu.c:(.text+0x5bb08): undefined reference to `device_resume'
drivers/built-in.o: In function `pmac_suspend_devices':
via-pmu.c:(.text+0x5c260): undefined reference to `device_power_down'
via-pmu.c:(.text+0x5c27c): undefined reference to `device_resume'
make[1]: *** [.tmp_vmlinux1] Error 1

changing CONFIG_PM > CONFIG_PM_SLEEP leads to:

drivers/built-in.o: In function `pmu_led_set':
via-pmu-led.c:(.text+0x5cdca): undefined reference to `pmu_sys_suspended'
via-pmu-led.c:(.text+0x5cdce): undefined reference to `pmu_sys_suspended'
drivers/built-in.o: In function `pmu_req_done':
via-pmu-led.c:(.text+0x5ce3e): undefined reference to `pmu_sys_suspended'
via-pmu-led.c:(.text+0x5ce42): undefined reference to `pmu_sys_suspended'
drivers/built-in.o: In function `adb_init':
(.init.text+0x4c5c): undefined reference to `pmu_register_sleep_notifier'
make[1]: *** [.tmp_vmlinux1] Error 1

So change even more places from PM to PM_SLEEP to allow linking.

Signed-off-by: Olaf Hering <[EMAIL PROTECTED]>

---
 drivers/macintosh/adb.c |4 ++--
 drivers/macintosh/via-pmu.c |   34 +-
 include/linux/pmu.h |2 +-
 3 files changed, 20 insertions(+), 20 deletions(-)

--- a/drivers/macintosh/adb.c
+++ b/drivers/macintosh/adb.c
@@ -89,7 +89,7 @@ static int sleepy_trackpad;
 static int autopoll_devs;
 int __adb_probe_sync;
 
-#ifdef CONFIG_PM
+#ifdef CONFIG_PM_SLEEP
 static void adb_notify_sleep(struct pmu_sleep_notifier *self, int when);
 static struct pmu_sleep_notifier adb_sleep_notifier = {
adb_notify_sleep,
@@ -313,7 +313,7 @@ int __init adb_init(void)
printk(KERN_WARNING "Warning: no ADB interface detected\n");
adb_controller = NULL;
} else {
-#ifdef CONFIG_PM
+#ifdef CONFIG_PM_SLEEP
pmu_register_sleep_notifier(&adb_sleep_notifier);
 #endif /* CONFIG_PM */
 #ifdef CONFIG_PPC
--- a/drivers/macintosh/via-pmu.c
+++ b/drivers/macintosh/via-pmu.c
@@ -152,10 +152,10 @@ static spinlock_t pmu_lock;
 static u8 pmu_intr_mask;
 static int pmu_version;
 static int drop_interrupts;
-#if defined(CONFIG_PM) && defined(CONFIG_PPC32)
+#if defined(CONFIG_PM_SLEEP) && defined(CONFIG_PPC32)
 static int option_lid_wakeup = 1;
-#endif /* CONFIG_PM && CONFIG_PPC32 */
-#if (defined(CONFIG_PM)&&defined(CONFIG_PPC32))||defined(CONFIG_PMAC_BACKLIGHT_LEGACY)
+#endif /* CONFIG_PM_SLEEP && CONFIG_PPC32 */
+#if (defined(CONFIG_PM_SLEEP)&&defined(CONFIG_PPC32))||defined(CONFIG_PMAC_BACKLIGHT_LEGACY)
 static int sleep_in_progress;
 #endif
 static unsigned long async_req_locks;
@@ -875,7 +875,7 @@ proc_read_options(char *page, char **sta
 {
char *p = page;
 
-#if defined(CONFIG_PM) && defined(CONFIG_PPC32)
+#if defined(CONFIG_PM_SLEEP) && defined(CONFIG_PPC32)
if (pmu_kind == PMU_KEYLARGO_BASED &&
pmac_call_feature(PMAC_FTR_SLEEP_STATE,NULL,0,-1) >= 0)
p += sprintf(p, "lid_wakeup=%d\n", option_lid_wakeup);
@@ -916,7 +916,7 @@ proc_write_options(struct file *file, co
*(val++) = 0;
while(*val == ' ')
val++;
-#if defined(CONFIG_PM) && defined(CONFIG_PPC32)
+#if defined(CONFIG_PM_SLEEP) && defined(CONFIG_PPC32)
if (pmu_kind == PMU_KEYLARGO_BASED &&
pmac_call_feature(PMAC_FTR_SLEEP_STATE,NULL,0,-1) >= 0)
if (!strcmp(label, "lid_wakeup"))
@@ -1738,7 +1738,7 @@ pmu_present(void)
return via != 0;
 }
 
-#ifdef CONFIG_PM
+#ifdef CONFIG_PM_SLEEP
 
 static LIST_HEAD(sleep_notifiers);
 
@@ -1769,9 +1769,9 @@ pmu_unregister_sleep_notifier(struct pmu
return 0;
 }
 EXPORT_SYMBOL(pmu_unregister_sleep_notifier);
-#endif /* CONFIG_PM */
+#endif /* CONFIG_PM_SLEEP */
 
-#if defined(CONFIG_PM) && defined(CONFIG_PPC32)
+#if defined(CONFIG_PM_SLEEP) && defined(CONFIG_PPC32)
 
 /* Sleep is broadcast last-to-first */
 static void broadcast_sleep(int when)
@@ -2390,7 +2390,7 @@ powerbook_sleep_3400(void)
return 0;
 }
 
-#endif /* CONFIG_PM && CONFIG_PPC32 */
+#endif /* CONFIG_PM_SLEEP && CONFIG_PPC32 */
 
 /*
  * Support for /dev/pmu device
@@ -2573,7 +2573,7 @@ pmu_ioctl(struct inode * inode, struct f
int error = -EINVAL;
 
switch (cmd) {
-#if defined(CONFIG_PM) && defined(CONFIG_PPC32)
+#if defined(CONFIG_PM_SLEEP) && defined(CONFIG_PPC32)
case PMU_IOC_SLEEP:
if (!capable(CAP_SYS_ADMIN))
return -EACCES;
@@ -2601,7 +2601,7 @@ pmu_ioctl(struct inode * inode, struct f
return put_user(0, argp);
else
return put_user(1, argp);
-#endif /* CONFIG_PM && CONFIG_PPC32 */
+#endif /* CONFIG_PM_SLEEP && CONFIG_PPC32 */
 
 #ifdef CONFIG_PMAC_BACKLIGHT_LEGACY
/* Compatibility ioctl's for backlight */
@@ -2757,7 +2757,7 @@ pmu_polled_request(struct adb_

Re: [PATCH 2/6] PowerPC 440EPx: Sequoia DTS

2007-08-24 Thread Sergei Shtylyov
Segher Boessenkool wrote:

>>>address-permutation = <0 1 3 2 4 5 7 6 e f d c a b 9 8>;

>>Yes, I was contemplating something like that.

> Let's not define this until we need it though :-)

Let's not even think of it, since this will end up in a "catch all" driver, 
and yet this may be not enough when the flash doesn't support 8-bit R/W, for 
example (I've already quoted it...

>>>I haven't heard or thought of anything better either.  Using "ranges"
>>>is conceptually wrong, even ignoring the technical problems that come
>>>with it.
>>Why is "ranges" conceptually wrong?

> The flash partitions aren't separate devices sitting on a

Yeah, that's why I decided not to go that way from the very start... though 
wait: I didn't do this simply because they're not devices.
That led me to an interesting question: does the device tree have something 
for the disk partitions?

> "flash bus", they are "sub-devices" of their parent.

They're quite an abstraction of a device -- although Linux treats them as 
separate devices indeed.

>>To be honest this looks rather to me like another case where having
>>overlapping 'reg' and 'ranges' would actually make sense.

> It never makes sense.  You should give the "master" device
> the full "reg" range it covers, and have it define its own
> address space; "sub-devices" can carve out their little hunk
> from that.  You don't want more than one device owning the
> same address range in the same address space.

So, no "ranges" prop in MTD node is necessary? Phew... :-)

> Segher

WBR, Sergei


Re: [PATCH v2] [02/10] pasemi_mac: Stop using the pci config space accessors for register read/writes

2007-08-24 Thread Olof Johansson
On Fri, Aug 24, 2007 at 02:05:31PM +1000, Stephen Rothwell wrote:
> On Thu, 23 Aug 2007 13:13:10 -0500 Olof Johansson <[EMAIL PROTECTED]> wrote:
> >
> >  out:
> > -   pci_dev_put(mac->iob_pdev);
> > -out_put_dma_pdev:
> > -   pci_dev_put(mac->dma_pdev);
> > -out_free_netdev:
> > +   if (mac->iob_pdev)
> > +   pci_dev_put(mac->iob_pdev);
> > +   if (mac->dma_pdev)
> > +   pci_dev_put(mac->dma_pdev);
> 
> It is not documented as such (as far as I can see), but pci_dev_put is
> safe to call with NULL. And there are other places in the kernel that
> explicitly use that fact.

Some places check, others do not. I'll leave it be for now but might take
care of it during some future cleanup. Thanks for pointing it out, though.


-Olof


Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Jan-Bernd Themann
James Chapman schrieb:
> Stephen Hemminger wrote:
>> On Fri, 24 Aug 2007 17:47:15 +0200
>> Jan-Bernd Themann <[EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>>
>>> On Friday 24 August 2007 17:37, [EMAIL PROTECTED] wrote:
>>>> On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:
>>>>> ...
>>>>> 3) On modern systems the incoming packets are processed very fast. Especially
>>>>>    on SMP systems when we use multiple queues we process only a few packets
>>>>>    per napi poll cycle. So NAPI does not work very well here and the interrupt
>>>>>    rate is still high. What we need would be some sort of timer polling mode
>>>>>    which will schedule a device after a certain amount of time for high load
>>>>>    situations. With high precision timers this could work well. Current
>>>>>    usual timers are too slow. A finer granularity would be needed to keep the
>>>>>    latency down (and queue length moderate).
>>>>>
>>>> We found the same on ia64-sn systems with tg3 a couple of years 
>>>> ago. Using simple interrupt coalescing ("don't interrupt until 
>>>> you've received N packets or M usecs have elapsed") worked 
>>>> reasonably well in practice. If your h/w supports that (and I'd 
>>>> guess it does, since it's such a simple thing), you might try it.

>>> I don't see how this should work. Our latest machines are fast 
>>> enough that they
>>> simply empty the queue during the first poll iteration (in most cases).
>>> Even if you wait until X packets have been received, it does not 
>>> help for
>>> the next poll cycle. The average number of packets we process per 
>>> poll queue
>>> is low. So a timer would be preferable that periodically polls the 
>>> queue, without the need of generating a HW interrupt. This would 
>>> allow us
>>> to wait until a reasonable amount of packets have been received in 
>>> the meantime
>>> to keep the poll overhead low. This would also be useful in combination
>>> with LRO.
>>>
>>
>> You need hardware support for deferred interrupts. Most devices have 
>> it (e1000, sky2, tg3)
>> and it interacts well with NAPI. It is not a generic thing you want 
>> done by the stack,
>> you want the hardware to hold off interrupts until X packets or Y 
>> usecs have expired.
>
> Does hardware interrupt mitigation really interact well with NAPI? In 
> my experience, holding off interrupts for X packets or Y usecs does 
> more harm than good; such hardware features are useful only when the 
> OS has no NAPI-like mechanism.
>
> When tuning NAPI drivers for packets/sec performance (which is a good 
> indicator of driver performance), I make sure that the driver stays in 
> NAPI polled mode while it has any rx or tx work to do. If the CPU is 
> fast enough that all work is always completed on each poll, I have the 
> driver stay in polled mode until dev->poll() is called N times with no 
> work being done. This keeps interrupts disabled for reasonable traffic 
> levels, while minimizing packet processing latency. No need for 
> hardware interrupt mitigation.
Yes, that was one idea as well. But the problem with that is that
net_rx_action will call the same poll function over and over again in
a row if there are no further network devices. With that approach you
always poll just a very few packets each time, which does not work
well with LRO, as there are no packets to aggregate... So it would
make more sense to wait for a certain time before trying again.
Second problem: after jiffies has incremented by one in net_rx_action
(after some poll rounds), net_rx_action will quit and return control
to the softIRQ handler. The poll function is called again as the
softIRQ handler thinks there is more work to be done. So even then we
do not wait... Only after some rounds in the softIRQ handler do we
finally wait some time.

>
>> The parameters for controlling it are already in ethtool, the issue 
>> is finding a good
>> default set of values for a wide range of applications and 
>> architectures. Maybe some
>> heuristic based on processor speed would be a good starting point. 
>> The dynamic irq
>> moderation stuff is not widely used because it is too hard to get right.
>
> I agree. It would be nice to find a way for the typical user to derive 
> best values for these knobs for his/her particular system. Perhaps a 
> tool using pktgen and network device phy internal loopback could be 
> developed?
>




Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Shirley Ma
> Just to be clear, in the previous email I posted on this thread, I
> described a worst-case network ping-pong test case (send a packet, wait
> for reply), and found out that a deferred interrupt scheme just damaged
> the performance of the test case. 

When splitting the rx and tx handlers, I found some performance gain by 
deferring interrupts on tx, but not rx, in the IPoIB driver.

Shirley


Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Rick Jones
> Just to be clear, in the previous email I posted on this thread, I
> described a worst-case network ping-pong test case (send a packet, wait
> for reply), and found out that a deferred interrupt scheme just damaged
> the performance of the test case.  Since the folks who came up with the
> test case were adamant, I turned off the deferred interrupts.  
> While deferred interrupts are an "obvious" solution, I decided that 
> they weren't a good solution. (And I have no other solution to offer).

Sounds exactly like the default netperf TCP_RR test and any number of other 
benchmarks.  The "send a request, wait for reply, send next request, etc etc 
etc" is a rather common application behaviour after all.

rick jones


Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Linas Vepstas
On Fri, Aug 24, 2007 at 08:52:03AM -0700, Stephen Hemminger wrote:
> 
> You need hardware support for deferred interrupts. Most devices have it 
> (e1000, sky2, tg3)
> and it interacts well with NAPI. It is not a generic thing you want done by 
> the stack,
> you want the hardware to hold off interrupts until X packets or Y usecs have 
> expired.

Just to be clear, in the previous email I posted on this thread, I
described a worst-case network ping-pong test case (send a packet, wait
for reply), and found out that a deferred interrupt scheme just damaged
the performance of the test case.  Since the folks who came up with the
test case were adamant, I turned off the deferred interrupts.  
While deferred interrupts are an "obvious" solution, I decided that 
they weren't a good solution. (And I have no other solution to offer).

--linas



Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Linas Vepstas
On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:
> 3) On modern systems the incoming packets are processed very fast. Especially
>    on SMP systems when we use multiple queues we process only a few packets
>    per napi poll cycle. So NAPI does not work very well here and the 
> interrupt 
>    rate is still high. 

I saw this too, on a system that is "modern" but not terribly fast, and
only slightly (2-way) smp. (the spidernet)

I experimented with various solutions; none were terribly exciting.  The
thing that killed all of them was a crazy test case that someone sprung on
me: they had written a worst-case network ping-pong app: send one
packet, wait for reply, send one packet, etc.  

If I waited (indefinitely) for a second packet to show up, the test case 
completely stalled (since no second packet would ever arrive).  And if I 
introduced a timer to wait for a second packet, then I just increased 
the latency in the response to the first packet, and this was noticed, 
and folks complained.  

In the end, I just let it be, and let the system work as a busy-beaver, 
with the high interrupt rate. Is this a wise thing to do?  I was
thinking that, if the system is under heavy load, then the interrupt
rate would fall, since (for less pathological network loads) more 
packets would queue up before the poll was serviced.  But I did not
actually measure the interrupt rate under heavy load ... 

--linas


8555CDS BSP on 8548CDS board

2007-08-24 Thread mike zheng
 Hi,

I was told Freescale's 8555CDS board is very similar to the 8548CDS board. I
just wonder what exactly the differences are. Can I just put the 8555CDS BSP
onto the 8548CDS board?

Thanks  in advance,

Mike

Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Stephen Hemminger
On Fri, 24 Aug 2007 17:47:15 +0200
Jan-Bernd Themann <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> On Friday 24 August 2007 17:37, [EMAIL PROTECTED] wrote:
> > On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:
> > > ...
> > > 3) On modern systems the incoming packets are processed very fast. 
> > > Especially
> > >    on SMP systems when we use multiple queues we process only a few 
> > > packets
> > >    per napi poll cycle. So NAPI does not work very well here and the 
> > > interrupt 
> > >    rate is still high. What we need would be some sort of timer polling 
> > > mode 
> > >    which will schedule a device after a certain amount of time for high 
> > > load 
> > >    situations. With high precision timers this could work well. Current
> > >    usual timers are too slow. A finer granularity would be needed to keep 
> > > the
> > >latency down (and queue length moderate).
> > > 
> > 
> > We found the same on ia64-sn systems with tg3 a couple of years 
> > ago. Using simple interrupt coalescing ("don't interrupt until 
> > you've received N packets or M usecs have elapsed") worked 
> > reasonably well in practice. If your h/w supports that (and I'd 
> > guess it does, since it's such a simple thing), you might try 
> > it.
> > 
> 
> I don't see how this should work. Our latest machines are fast enough that 
> they
> simply empty the queue during the first poll iteration (in most cases).
> Even if you wait until X packets have been received, it does not help for
> the next poll cycle. The average number of packets we process per poll queue
> is low. So a timer would be preferable that periodically polls the 
> queue, without the need of generating a HW interrupt. This would allow us
> to wait until a reasonable amount of packets have been received in the 
> meantime
> to keep the poll overhead low. This would also be useful in combination
> with LRO.
> 

You need hardware support for deferred interrupts. Most devices have it
(e1000, sky2, tg3) and it interacts well with NAPI. It is not a generic
thing you want done by the stack; you want the hardware to hold off
interrupts until X packets or Y usecs have expired.

The parameters for controlling it are already in ethtool; the issue is
finding a good default set of values for a wide range of applications and
architectures. Maybe some heuristic based on processor speed would be a
good starting point. The dynamic irq moderation stuff is not widely used
because it is too hard to get right.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Jan-Bernd Themann
Hi,

On Friday 24 August 2007 17:37, [EMAIL PROTECTED] wrote:
> On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:
> > ...
> > 3) On modern systems the incoming packets are processed very fast.
> >    Especially on SMP systems when we use multiple queues we process only
> >    a few packets per napi poll cycle. So NAPI does not work very well
> >    here and the interrupt rate is still high. What we need would be some
> >    sort of timer polling mode which will schedule a device after a
> >    certain amount of time for high load situations. With high precision
> >    timers this could work well. Current usual timers are too slow. A
> >    finer granularity would be needed to keep the latency down (and queue
> >    length moderate).
> > 
> 
> We found the same on ia64-sn systems with tg3 a couple of years 
> ago. Using simple interrupt coalescing ("don't interrupt until 
> you've received N packets or M usecs have elapsed") worked 
> reasonably well in practice. If your h/w supports that (and I'd 
> guess it does, since it's such a simple thing), you might try 
> it.
> 

I don't see how this should work. Our latest machines are fast enough that they
simply empty the queue during the first poll iteration (in most cases).
Even if you wait until X packets have been received, it does not help for
the next poll cycle. The average number of packets we process per poll queue
is low. So a timer would be preferable that periodically polls the 
queue, without the need of generating a HW interrupt. This would allow us
to wait until a reasonable amount of packets have been received in the meantime
to keep the poll overhead low. This would also be useful in combination
with LRO.

Regards,
Jan-Bernd


Re: [PATCH 05/20] bootwrapper: flatdevtree fixes

2007-08-24 Thread Scott Wood
On Fri, Aug 24, 2007 at 11:01:22AM +1000, David Gibson wrote:
> On Thu, Aug 23, 2007 at 12:48:30PM -0500, Scott Wood wrote:
> > It's likely to be ugly no matter what, though I'll try to come up with 
> > something slightly nicer.  If I were doing this code from scratch, I'd 
> > probably liven the tree first and reflatten it to pass to the kernel.
> 
> Eh, probably not worth bothering doing an actual implementation at
> this stage - I'll have to redo it for libfdt anyway.

Too late, I already wrote it -- it wasn't as bad as I thought it would
be.

> flatdevtree uses some of the information it caches in the phandle
> context stuff to remember who's the parent of a node.  libfdt uses raw
> offsets into the structure, so the *only* way to implement
> get_parent() is to rescan the dt from the beginning, keeping track of
> parents until reaching the given node.

What is the benefit of doing it that way?

-Scott


RFC: issues concerning the next NAPI interface

2007-08-24 Thread Jan-Bernd Themann
Hi,

when I tried to get the eHEA driver working with the new interface,
the following issues came up.

1) The current implementation of netif_rx_schedule, netif_rx_complete
   and net_rx_action has the following problem: netif_rx_schedule
   sets the NAPI_STATE_SCHED flag and adds the NAPI instance to the
   poll_list. net_rx_action checks NAPI_STATE_SCHED; if it is set, it will
   add the device to the poll_list again (as well). netif_rx_complete
   clears NAPI_STATE_SCHED.
   If an interrupt handler calls netif_rx_schedule on CPU 2
   after netif_rx_complete has been called on CPU 1 (and the poll function
   has not returned yet), the NAPI instance will be added twice to the
   poll_list (by netif_rx_schedule and net_rx_action). Problems occur when
   netif_rx_complete is then called twice for the device (BUG() is hit).

2) If an ethernet chip supports multiple receive queues, the queues are
   currently all processed on the CPU where the interrupt comes in. This
   is because netif_rx_schedule will always add the rx queue to that CPU's
   napi poll_list. The result under heavy pressure is that all queues will
   gather on the weakest CPU (the one with the highest CPU load) after
   some time, as they will stay there until the entire queue is emptied.
   On SMP systems this behaviour is not desired. It should also work well
   without interrupt pinning.
   It would be nice if it were possible to schedule queues to other CPUs,
   or at least to use interrupts to move the queue to another CPU (not
   nice, as you never know which one you will hit).
   I'm not sure how bad the tradeoff would be.

3) On modern systems the incoming packets are processed very fast. Especially
   on SMP systems when we use multiple queues we process only a few packets
   per napi poll cycle. So NAPI does not work very well here and the interrupt 
   rate is still high. What we need would be some sort of timer polling mode 
   which will schedule a device after a certain amount of time for high load 
   situations. With high precision timers this could work well. Current
   usual timers are too slow. A finer granularity would be needed to keep the
   latency down (and queue length moderate).

What do you think?

Thanks,
Jan-Bernd


Re: [PATCH] Remove barriers from the SLB shadow buffer update

2007-08-24 Thread Josh Boyer
On Fri, 2007-08-24 at 16:58 +1000, Michael Neuling wrote:
> After talking to an IBM POWER hypervisor design and development (PHYP)
> guy, there seems to be no need for memory barriers when updating the SLB
> shadow buffer provided we only update it from the current CPU, which we
> do.
> 
> Also, these guys see no need in the future for these barriers.

Does this result in a significant performance gain?  I'm just curious.

josh



Re: asm-ppc header issues when building ARCH=powerpc

2007-08-24 Thread Geert Uytterhoeven
On Fri, 24 Aug 2007, Kumar Gala wrote:
> On Aug 24, 2007, at 2:10 AM, Geert Uytterhoeven wrote:
> > On Thu, 23 Aug 2007, Brad Boyer wrote:
> > > Just as an extra note, the file drivers/macintosh/adb-iop.c is m68k only,
> > > so you should probably leave that alone as well. It probably doesn't need
> > > that header, but the change should really come from the 68k side of
> > > things.
> > 
> > Thanks, it's indeed not needed.
> 
> If its ok that the removal comes from my patchset that would be great.  I'm
> tired of re-spinning these patches at this point ;)

Sure, less work for me! ;-)

Acked-by: Geert Uytterhoeven <[EMAIL PROTECTED]>

With kind regards,
 
Geert Uytterhoeven
Software Architect

Sony Network and Software Technology Center Europe
The Corporate Village · Da Vincilaan 7-D1 · B-1935 Zaventem · Belgium
 
Phone:+32 (0)2 700 8453 
Fax:  +32 (0)2 700 8622 
E-mail:   [EMAIL PROTECTED] 
Internet: http://www.sony-europe.com/

Sony Network and Software Technology Center Europe  
A division of Sony Service Centre (Europe) N.V. 
Registered office: Technologielaan 7 · B-1840 Londerzeel · Belgium  
VAT BE 0413.825.160 · RPR Brussels  
Fortis Bank Zaventem · Swift GEBABEBB08A · IBAN BE39001382358619

Re: asm-ppc header issues when building ARCH=powerpc

2007-08-24 Thread Kumar Gala

On Aug 24, 2007, at 2:10 AM, Geert Uytterhoeven wrote:

> On Thu, 23 Aug 2007, Brad Boyer wrote:
>> Just as an extra note, the file drivers/macintosh/adb-iop.c is m68k only,
>> so you should probably leave that alone as well. It probably doesn't need
>> that header, but the change should really come from the 68k side of things.
>
> Thanks, it's indeed not needed.

If its ok that the removal comes from my patchset that would be  
great.  I'm tired of re-spinning these patches at this point ;)

- k


Re: asm-ppc header issues when building ARCH=powerpc

2007-08-24 Thread Geert Uytterhoeven
On Thu, 23 Aug 2007, Brad Boyer wrote:
> Just as an extra note, the file drivers/macintosh/adb-iop.c is m68k only,
> so you should probably leave that alone as well. It probably doesn't need
> that header, but the change should really come from the 68k side of things.

Thanks, it's indeed not needed.

With kind regards,
 
Geert Uytterhoeven
Software Architect

Sony Network and Software Technology Center Europe
The Corporate Village · Da Vincilaan 7-D1 · B-1935 Zaventem · Belgium
 
Phone:+32 (0)2 700 8453 
Fax:  +32 (0)2 700 8622 
E-mail:   [EMAIL PROTECTED] 
Internet: http://www.sony-europe.com/

Sony Network and Software Technology Center Europe  
A division of Sony Service Centre (Europe) N.V. 
Registered office: Technologielaan 7 · B-1840 Londerzeel · Belgium  
VAT BE 0413.825.160 · RPR Brussels  
Fortis Bank Zaventem · Swift GEBABEBB08A · IBAN BE39001382358619