Re[4]: [PATCH 02/11][v3] async_tx: add support for asynchronous GF multiplication

2009-01-17 Thread Yuri Tikhonov
Hello Dan,

On Friday, January 16, 2009 you wrote:

 On Fri, Jan 16, 2009 at 4:41 AM, Yuri Tikhonov <y...@emcraft.com> wrote:
 I don't think this will work as we will be mixing Q into the new P and
 P into the new Q.  In order to support (src_cnt > device->max_pq) we
 need to explicitly tell the driver that the operation is being
 continued (DMA_PREP_CONTINUE) and to apply different coefficients to
 P and Q to cancel the effect of including them as sources.

  With DMA_PREP_ZERO_P/Q approach, the Q isn't mixed into new P, and P
 isn't mixed into new Q. For your example of max_pq=4:

  p, q = PQ(src0, src1, src2, src3, src4, COEF({01}, {02}, {04}, {08}, {10}))

  with the current implementation will be split into:

  p, q = PQ(src0, src1, src2, src3, COEF({01}, {02}, {04}, {08}))
  p`, q` = PQ(src4, COEF({10}))

  which will result in the following:

  p = ((dma_flags & DMA_PREP_ZERO_P) ? 0 : old_p) + src0 + src1 + src2 + src3
  q = ((dma_flags & DMA_PREP_ZERO_Q) ? 0 : old_q) + {01}*src0 + {02}*src1 +
      {04}*src2 + {08}*src3

  p` = p + src4
  q` = q + {10}*src4


 Huh?  Does the ppc440spe engine have some notion of flagging a source
 as old_p/old_q?  Otherwise I do not see how the engine will not turn
 this into:

 p` = p + src4 + q
 q` = q + {10}*src4 + {x}*p

 I think you missed the fact that we have passed p and q back in as
 sources.  Unless we have multiple p destinations and multiple q
 destinations, or hardware support for continuations I do not see how
 you can guarantee this split.

 I think I see your point now. What you are missing is that the
destinations for 'p' and 'q' are passed to the device_prep_dma_pq()
method separately from the sources. To put it in your words: we do not
have multiple destinations across the while() loop iterations; the
destinations are the same in each pass.

 Please look at the do_async_pq() implementation more carefully:
'blocks' is a pointer to 'src_cnt' sources _plus_ two destination pages
(as stated in the async_pq() description). Before entering the while()
loop we save the destinations in the dma_dest[] array, and then pass
this array to device_prep_dma_pq() in each (src_cnt/max_pq) pass. That
is, we do not pass the destinations as sources explicitly: we just
clear the DMA_PREP_ZERO_P/Q flags to tell the ADMA level that it has to
XOR the current contents of the destination(s) with the result of the
new operation.
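
 The split described above can be modelled in plain C. This is only a
sketch for a single byte lane, not the driver or async_tx code: the
names gf_mul(), pq_pass(), pq_split() and the SKETCH_ZERO_* flags are
made up for illustration, and the field polynomial 0x11d is assumed to
be the usual RAID-6 one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag names for this sketch; they only mirror the idea of
 * DMA_PREP_ZERO_P/Q and are not the kernel definitions. */
#define SKETCH_ZERO_P 0x1u
#define SKETCH_ZERO_Q 0x2u

/* Multiply two GF(2^8) elements modulo the polynomial 0x11d (assumed). */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t r = 0;

    while (b) {
        if (b & 1)
            r ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
        b >>= 1;
    }
    return r;
}

/* One engine pass: the destinations are separate from the sources and
 * are either zeroed first or accumulated into, depending on the flags. */
static void pq_pass(uint8_t *p, uint8_t *q, const uint8_t *src,
                    const uint8_t *coef, size_t cnt, unsigned int flags)
{
    uint8_t np = (flags & SKETCH_ZERO_P) ? 0 : *p;
    uint8_t nq = (flags & SKETCH_ZERO_Q) ? 0 : *q;
    size_t i;

    for (i = 0; i < cnt; i++) {
        np ^= src[i];
        nq ^= gf_mul(coef[i], src[i]);
    }
    *p = np;
    *q = nq;
}

/* Split a long source list into passes of at most max_pq sources. */
static void pq_split(uint8_t *p, uint8_t *q, const uint8_t *src,
                     const uint8_t *coef, size_t cnt, size_t max_pq)
{
    unsigned int flags = SKETCH_ZERO_P | SKETCH_ZERO_Q;
    size_t off = 0;

    while (off < cnt) {
        size_t n = (cnt - off < max_pq) ? cnt - off : max_pq;

        pq_pass(p, q, src + off, coef + off, n, flags);
        flags = 0; /* later passes XOR into the existing P and Q */
        off += n;
    }
}
```

 With five sources and max_pq=4, pq_split() should give the same P and Q
as a single five-source pass, because P and Q never appear in the source
list of the second pass.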

  I'm afraid that the difference (13/4, 125/32) is very significant, so
 getting rid of DMA_PREP_ZERO_P/Q will eat most of the improvement
 which could be achieved with the current approach.

 Data corruption is a slightly higher cost :-).


  but at this point I do not see a cleaner alternative for engines like
 iop13xx.

  I can't find any description of iop13xx processors at Intel's
 web-site, only 3xx:

 http://www.intel.com/design/iio/index.htm?iid=ipp_embed+embed_io

  So, it's hard for me to make any suggestions. I just wonder: doesn't
 iop13xx allow users to program destination addresses into the source
 fields of descriptors?

 Yes it does, but the engine does not know it is a destination.

 Take a look at page 496 of the following and tell me if you come to a
 different conclusion.
 http://download.intel.com/design/iio/docs/31503602.pdf

 I see. The major difference in the ppc440spe DMA engines' support for
P+Q is that ppc440spe can include (XOR in) the previous contents of
P_Result and/or Q_Result just by setting a corresponding indication in
the destination (P_Result and/or Q_Result) address(es).

 The "5.7.5 P+Q Update Operation" case won't help here since, if I
understand it right, it doesn't allow setting up different multipliers
for Old and New Data.

 So, it looks like your approach:

p', q' = PQ(p, q, q, src4, COEF({00}, {01}, {00}, {10}))

 is the only possible way of including the previous P/Q content into 
the calculation.
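
 To check the arithmetic of that formula, here is a small model of an
engine that has no notion of old P/Q, so every listed source goes into
both results. It is a sketch for a single byte lane only: gf_mul(),
pq_plain() and pq_continue() are hypothetical names, and the polynomial
0x11d is assumed. In P the two copies of q cancel each other, and the
{00} coefficients keep p and the second q out of Q.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply two GF(2^8) elements modulo the polynomial 0x11d (assumed). */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t r = 0;

    while (b) {
        if (b & 1)
            r ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
        b >>= 1;
    }
    return r;
}

/* Engine model without old-P/Q support: every source is XORed into P
 * and, scaled by its coefficient, into Q. */
static void pq_plain(uint8_t *p_out, uint8_t *q_out, const uint8_t *src,
                     const uint8_t *coef, size_t cnt)
{
    uint8_t p = 0, q = 0;
    size_t i;

    for (i = 0; i < cnt; i++) {
        p ^= src[i];
        q ^= gf_mul(coef[i], src[i]);
    }
    *p_out = p;
    *q_out = q;
}

/* Continue an earlier result with one more source, using
 * PQ(p, q, q, src, COEF({00}, {01}, {00}, coef)). */
static void pq_continue(uint8_t *p, uint8_t *q, uint8_t src, uint8_t coef)
{
    const uint8_t srcs[4] = { *p, *q, *q, src };
    const uint8_t coefs[4] = { 0x00, 0x01, 0x00, coef };

    pq_plain(p, q, srcs, coefs, 4);
}
```

 pq_continue() should leave p' = p + src and q' = q + coef*src, whereas
feeding just (p, q, src) to pq_plain() leaves q mixed into the new P.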

 But I still think that this p'/q' hack belongs at the ADMA level, not
in ASYNC_TX. It looks more generic if ASYNC_TX assumes that ADMA is
capable of p'=p+src / q'=q+{}*src. Otherwise, we'll impose an overhead
on the DMA engines which could work without it.

 In your case, the IOP ADMA driver should handle the situation when it
receives 4 sources to be P+Qed with the previous contents of the
destinations, for example, by generating a sequence of 4 descriptors
to process such a request.

 Regards, Yuri

 --
 Yuri Tikhonov, Senior Software Engineer
 Emcraft Systems, www.emcraft.com

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re[4]: [PATCH 03/11][v3] async_tx: add support for asynchronous RAID6 recovery operations

2009-01-17 Thread Yuri Tikhonov
On Friday, January 16, 2009 you wrote:

 On Fri, Jan 16, 2009 at 4:51 AM, Yuri Tikhonov <y...@emcraft.com> wrote:
  The reason why I preferred to use async_pq() instead of async_xor()
 here is to maximize the chance that the whole D+D recovery operation
 will be handled in one ADMA device, i.e. without a channel switch and
 the latency that it introduces.


 This should be a function of the async_tx_find_channel implementation.
  The default version tries to keep a chain of operations on one
 channel.

 struct dma_chan *
 __async_tx_find_channel(struct dma_async_tx_descriptor *depend_tx,
                         enum dma_transaction_type tx_type)
 {
         /* see if we can keep the chain on one channel */
         if (depend_tx &&
             dma_has_cap(tx_type, depend_tx->chan->device->cap_mask))
                 return depend_tx->chan;
         return dma_find_channel(tx_type);
 }

 Right. Then I need to update my ADMA driver and add support for an
explicit DMA_XOR capability on the channels which can process DMA_PQ.
Thanks.
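
 A toy model of why the extra capability matters (a sketch only: the
sketch_* names and the table lookup are hypothetical and stand in for
the real capability mask and dma_find_channel()). The chain stays on
the previous channel only if that channel advertises the requested
operation type.

```c
#include <stddef.h>

/* Hypothetical capability bits and channel type, for illustration only. */
enum sketch_cap {
    SKETCH_CAP_XOR = 1 << 0,
    SKETCH_CAP_PQ  = 1 << 1
};

struct sketch_chan {
    unsigned int caps;
};

/* Prefer the channel of the operation we depend on; otherwise fall back
 * to the first channel in the table that has the capability. */
static const struct sketch_chan *
sketch_find_channel(const struct sketch_chan *prev, enum sketch_cap need,
                    const struct sketch_chan *table, size_t n)
{
    size_t i;

    if (prev && (prev->caps & need))
        return prev;
    for (i = 0; i < n; i++)
        if (table[i].caps & need)
            return &table[i];
    return NULL;
}
```

 In this model a channel that advertises only SKETCH_CAP_PQ loses the
chain as soon as an XOR is requested, while one that advertises both
bits keeps it.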

 Regards, Yuri

 --
 Yuri Tikhonov, Senior Software Engineer
 Emcraft Systems, www.emcraft.com



[PATCH] PS3 ps3av_set_video_mode() make id signed

2009-01-17 Thread Roel Kluin
vi drivers/video/ps3fb.c +618
static int ps3fb_set_par(struct fb_info *info)
{
        struct ps3fb_par *par = info->par;
        ... [ and at line 660 ] ...
        if (ps3av_set_video_mode(par->new_mode_id))

now new_mode_id is an int

vi drivers/video/ps3fb.c +132
struct ps3fb_par {
        ...
        int new_mode_id;
        ...
};

vi drivers/ps3/ps3av.c +844
int ps3av_set_video_mode(u32 id)
-------------------------^^^
{
        ...
        if (... || id < 0) {
                      ^^^
                dev_dbg(&ps3av->dev->core, "%s: error id :%d\n", __func__, id);
                return -EINVAL;
        }
        ...
        id = ps3av_auto_videomode(&ps3av->av_hw_conf);
        if (id < 1) {
---------------^^^
                printk(KERN_ERR "%s: invalid id :%d\n", __func__, id);
                return -EINVAL;
        }
        ...
        ps3av->ps3av_mode = id;

vi drivers/ps3/ps3av.c +763
static int ps3av_auto_videomode()
-------^^^

+42
static struct ps3av {
        ...
        int ps3av_mode;
        ...
};

-8<---8<---
make id signed so a negative id will get noticed

Signed-off-by: Roel Kluin <roel.kl...@gmail.com>
---
diff --git a/drivers/ps3/ps3av.c b/drivers/ps3/ps3av.c
index 5324978..7aa6d41 100644
--- a/drivers/ps3/ps3av.c
+++ b/drivers/ps3/ps3av.c
@@ -838,7 +838,7 @@ static int ps3av_get_hw_conf(struct ps3av *ps3av)
 }
 
 /* set mode using id */
-int ps3av_set_video_mode(u32 id)
+int ps3av_set_video_mode(int id)
 {
 	int size;
 	u32 option;
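
The effect of the parameter type can be seen in isolation with the
sketch below. The two helpers are hypothetical and are not the driver
code; they only show how a negative argument is seen through a u32
parameter and through an int parameter.

```c
#include <stdint.h>

/* Hypothetical helper: a negative argument converted to a 32-bit
 * unsigned parameter arrives as a large positive value. */
static int looks_negative_u32(uint32_t id)
{
    int64_t wide = id; /* zero-extends: the value is 0..4294967295 */

    return wide < 0;   /* therefore never true */
}

/* Hypothetical helper: with a signed parameter the sign survives. */
static int looks_negative_int(int id)
{
    return id < 0;
}
```

Passing -1 to the first helper yields 0, so a check like "id < 0" on a
u32 can never fire; the second helper returns 1 for the same argument.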


[BUG] 2.6.28-git12: powerpc/pci: Reserve legacy regions on PCI broke my G3

2009-01-17 Thread Mikael Pettersson
The 2.6.29-rc kernels hang during boot on my PowerMac G3 (Beige).
The last messages I see on the console are

Kernel command line: ramdisk_size=8192
irq: Found primary Apple PIC /pci/mac-io for 64 irqs
irq: System has 64 possible interrupts

and then it hangs until I reboot it. Unfortunately I don't have a serial cable
compatible with the G3's serial port so I can't set up a serial console.

Eventually I narrowed the cause down to "powerpc/pci: Reserve legacy regions
on PCI" (commit c1f343028d35ba4e88cd4a3c44e0d8b8a84264ee) in 2.6.28-git12.
Reverting that from 2.6.29-rc2 results in a kernel that boots ok and works
reliably.

The problem seems system specific as both my G5 and my G4 had no problems at all
with 2.6.29-rc1.

/Mikael


Re: Re[2]: [PATCH][v4] powerpc 44x: support for 256KB PAGE_SIZE

2009-01-17 Thread Milton Miller

On Jan 16, 2009, at 9:18 AM, Yuri Tikhonov wrote:

On Friday, January 16, 2009 you wrote:

On Jan 12, 2009, at 4:49 PM, Yuri Tikhonov wrote:

This patch adds support for 256KB pages on ppc44x-based boards.

+config STDBINUTILS
+	bool "Using standard binutils settings"
+	depends on 44x
+	default y

I think this should be

config STDBINUTILS
	bool "Using standard binutils settings" if 44x
	default y

that way we imply that all powerpc users are using the standard
binutils instead of only those using a 44x platform.  We still get the
intended effect of asking the user only on 44x.

I haven't looked at the resulting question or config order to see if it
makes sense to leave it here or put it closer to the page size.

 I'm not sure about this. For 44x platforms the STDBINUTILS option
is reasonable, because it's used in the PAGE_SIZE selection process.
But on other powerpc platforms the STDBINUTILS option will do nothing
except add a superfluous line to the configs. Are you sure this
will be better?

Ok, I tried this out in menuconfig.  You are right that the "depends on"
makes sense, as it removes the option from the config file when it is not
relevant.  But right now, to enable 256K pages one has to go to platform
setup to find this dependency, then go to general setup to find the shmem
option at the bottom of the list in the embedded/expert section, and
finally go to the kernel options menu to choose the page size.

Moving this question to just before the page size choice removes one of
those hidden menus, so I suggest that it be moved to just before the
option that it allows to be selected.

milton
