Re[4]: [PATCH 02/11][v3] async_tx: add support for asynchronous GF multiplication
Hello Dan,

On Friday, January 16, 2009 you wrote:

> On Fri, Jan 16, 2009 at 4:41 AM, Yuri Tikhonov y...@emcraft.com wrote:
>>> I don't think this will work as we will be mixing Q into the new P
>>> and P into the new Q. In order to support
>>> (src_cnt > device->max_pq) we need to explicitly tell the driver
>>> that the operation is being continued (DMA_PREP_CONTINUE) and to
>>> apply different coefficients to P and Q to cancel the effect of
>>> including them as sources.
>>
>> With the DMA_PREP_ZERO_P/Q approach, Q isn't mixed into the new P,
>> and P isn't mixed into the new Q. For your example of max_pq=4:
>>
>>   p, q = PQ(src0, src1, src2, src3, src4,
>>             COEF({01}, {02}, {04}, {08}, {10}))
>>
>> with the current implementation will be split into:
>>
>>   p, q = PQ(src0, src1, src2, src3, COEF({01}, {02}, {04}, {08}))
>>   p`,q` = PQ(src4, COEF({10}))
>>
>> which will result in the following:
>>
>>   p = ((dma_flags & DMA_PREP_ZERO_P) ? 0 : old_p) +
>>       src0 + src1 + src2 + src3
>>   q = ((dma_flags & DMA_PREP_ZERO_Q) ? 0 : old_q) +
>>       {01}*src0 + {02}*src1 + {04}*src2 + {08}*src3
>>
>>   p` = p + src4
>>   q` = q + {10}*src4
>
> Huh? Does the ppc440spe engine have some notion of flagging a source
> as old_p/old_q? Otherwise I do not see how the engine will not turn
> this into:
>
>   p` = p + src4 + q
>   q` = q + {10}*src4 + {x}*p
>
> I think you missed the fact that we have passed p and q back in as
> sources. Unless we have multiple p destinations and multiple q
> destinations, or hardware support for continuations I do not see how
> you can guarantee this split.

I guess I've got your point. You are missing the fact that the
destinations for 'p' and 'q' are passed to the device_prep_dma_pq()
method separately from the sources. In your words: we do not have
multiple destinations through the while() cycles; the destinations are
the same in each pass. Please look at the do_async_pq() implementation
more carefully: 'blocks' is a pointer to 'src_cnt' sources _plus_ two
destination pages (as stated in the async_pq() description).
Before entering the while() cycle we save the destinations in the
dma_dest[] array, and then pass this to device_prep_dma_pq() in each
(src_cnt/max_pq) cycle. That is, we do not pass the destinations as
sources explicitly: we just clear the DMA_PREP_ZERO_P/Q flags to notify
the ADMA level that it has to XOR the current content of the
destination(s) with the result of the new operation.

>> I'm afraid that the difference (13/4, 125/32) is very significant, so
>> getting rid of DMA_PREP_ZERO_P/Q will eat most of the improvement
>> which could be achieved with the current approach.
>
> Data corruption is a slightly higher cost :-).
>
>>> but at this point I do not see a cleaner alternative for engines
>>> like iop13xx.
>>
>> I can't find any description of iop13xx processors at Intel's
>> web-site, only 3xx:
>>
>> http://www.intel.com/design/iio/index.htm?iid=ipp_embed+embed_io
>>
>> So, it's hard for me to make any suggestions. I just wonder - doesn't
>> iop13xx allow users to program destination addresses into the source
>> fields of descriptors?
>
> Yes it does, but the engine does not know it is a destination. Take a
> look at page 496 of the following and tell me if you come to a
> different conclusion.
>
> http://download.intel.com/design/iio/docs/31503602.pdf

I see. The major difference in the implementation of support for P+Q in
ppc440spe DMA engines is that ppc440spe allows including (xor) the
previous content of P_Result and/or Q_Result just by setting a
corresponding indication in the destination (P_Result and/or Q_Result)
address(es).

The "5.7.5 P+Q Update Operation" case won't help here, since, if I
understand it right, it doesn't allow setting up different multipliers
for Old and New Data. So, it looks like your approach:

  p', q' = PQ(p, q, q, src4, COEF({00}, {01}, {00}, {10}))

is the only possible way of including the previous P/Q content into the
calculation.

But I still think that this p'/q' hack belongs on the ADMA level, not
in ASYNC_TX.
It looks more generic if ASYNC_TX assumes that ADMA is capable of
p'=p+src / q'=q+{}*src. Otherwise, we'll have an overhead for the DMAs
which could work without this overhead. In your case, the IOP ADMA
driver should handle the situation when it receives 4 sources to be
P+Qed with the previous contents of the destinations, for example, by
generating a sequence of 4 descriptors to process such a request.

Regards, Yuri

--
Yuri Tikhonov, Senior Software Engineer
Emcraft Systems, www.emcraft.com
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev
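The two continuation strategies discussed in this message can be checked on a single byte lane. The following is only a minimal sketch, not driver code: ZERO_P/ZERO_Q, pq_pass(), pq_split() and pq_continue_as_sources() are hypothetical stand-ins for the DMA_PREP_ZERO_P/Q flags and the engine behaviour described above, and the field polynomial 0x11d is an assumption (the usual RAID-6 choice). It models an engine that can accumulate into its destinations (the ppc440spe case) and one that must take p and q back as sources with COEF({00}, {01}, {00}, {10}), and compares both with a single pass over all five sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the DMA_PREP_ZERO_P/Q flags from the thread. */
#define ZERO_P 0x1u
#define ZERO_Q 0x2u

/* Multiply in GF(2^8); polynomial 0x11d is assumed (usual RAID-6 choice). */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t r = 0;

    while (b) {
        if (b & 1)
            r ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0x00));
        b >>= 1;
    }
    return r;
}

/* One engine pass on one byte lane: the destinations are separate from the
 * sources and are either cleared or accumulated into, depending on flags. */
static void pq_pass(uint8_t *p, uint8_t *q, const uint8_t *src,
                    const uint8_t *coef, size_t n, unsigned flags)
{
    uint8_t np = (flags & ZERO_P) ? 0 : *p;
    uint8_t nq = (flags & ZERO_Q) ? 0 : *q;

    for (size_t i = 0; i < n; i++) {
        np ^= src[i];
        nq ^= gf_mul(coef[i], src[i]);
    }
    *p = np;
    *q = nq;
}

/* Split a long source list into passes of at most max_pq sources; only the
 * first pass zeroes the destinations, later passes accumulate into them. */
static void pq_split(uint8_t *p, uint8_t *q, const uint8_t *src,
                     const uint8_t *coef, size_t src_cnt, size_t max_pq)
{
    unsigned flags = ZERO_P | ZERO_Q;

    while (src_cnt) {
        size_t n = src_cnt < max_pq ? src_cnt : max_pq;

        pq_pass(p, q, src, coef, n, flags);
        src += n;
        coef += n;
        src_cnt -= n;
        flags = 0;
    }
}

/* Continuation for an engine that cannot accumulate: feed p and q back in
 * as sources, PQ(p, q, q, src, COEF({00}, {01}, {00}, {c})). */
static void pq_continue_as_sources(uint8_t *p, uint8_t *q, uint8_t src,
                                   uint8_t c)
{
    const uint8_t s[4] = { *p, *q, *q, src };
    const uint8_t k[4] = { 0x00, 0x01, 0x00, c };

    pq_pass(p, q, s, k, 4, ZERO_P | ZERO_Q);
}

/* Returns 1 when both continuation strategies match the one-shot result. */
static int continuation_matches_single_pass(void)
{
    const uint8_t src[5]  = { 0x3c, 0xa5, 0x17, 0xf0, 0x81 };
    const uint8_t coef[5] = { 0x01, 0x02, 0x04, 0x08, 0x10 };
    uint8_t p1 = 0xff, q1 = 0xff;   /* stale destination contents */
    uint8_t p2 = 0xff, q2 = 0xff;
    uint8_t p3 = 0xff, q3 = 0xff;

    pq_pass(&p1, &q1, src, coef, 5, ZERO_P | ZERO_Q);  /* all five at once */
    pq_split(&p2, &q2, src, coef, 5, 4);               /* 4 + 1, accumulate */
    pq_pass(&p3, &q3, src, coef, 4, ZERO_P | ZERO_Q);  /* 4, then feed back */
    pq_continue_as_sources(&p3, &q3, src[4], coef[4]);

    assert(p1 == p2 && q1 == q2);
    assert(p1 == p3 && q1 == q3);
    return p1 == p2 && q1 == q2 && p1 == p3 && q1 == q3;
}
```

In the feed-back form q appears twice so that it cancels out of P, while the {00} coefficients keep p and the extra q out of Q. That is the extra source overhead the message argues should be confined to drivers that need it.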
Re[4]: [PATCH 03/11][v3] async_tx: add support for asynchronous RAID6 recovery operations
On Friday, January 16, 2009 you wrote:

> On Fri, Jan 16, 2009 at 4:51 AM, Yuri Tikhonov y...@emcraft.com wrote:
>> The reason why I preferred to use async_pq() instead of async_xor()
>> here is to maximize the chance that the whole D+D recovery operation
>> will be handled in one ADMA device, i.e. without a channel switch and
>> the latency introduced because of that.
>
> This should be a function of the async_tx_find_channel implementation.
> The default version tries to keep a chain of operations on one
> channel.
>
> struct dma_chan *
> __async_tx_find_channel(struct dma_async_tx_descriptor *depend_tx,
>                         enum dma_transaction_type tx_type)
> {
>         /* see if we can keep the chain on one channel */
>         if (depend_tx &&
>             dma_has_cap(tx_type, depend_tx->chan->device->cap_mask))
>                 return depend_tx->chan;
>         return dma_find_channel(tx_type);
> }

Right. Then I need to update my ADMA driver, and add support for an
explicit DMA_XOR capability on channels which can process DMA_PQ.
Thanks.

Regards, Yuri

--
Yuri Tikhonov, Senior Software Engineer
Emcraft Systems, www.emcraft.com
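The effect of advertising DMA_XOR on a PQ-capable channel can be illustrated with a small user-space model of the selection logic quoted above. This is a sketch under assumptions, not the kernel API: struct chan, has_cap(), find_any() and find_channel() are hypothetical simplifications of dma_chan, dma_has_cap(), dma_find_channel() and __async_tx_find_channel().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for DMA_XOR and DMA_PQ. */
enum tx_type { TX_XOR, TX_PQ };

/* Hypothetical channel: only the capability mask matters here. */
struct chan {
    unsigned cap_mask;
};

static int has_cap(enum tx_type type, unsigned cap_mask)
{
    return (cap_mask >> type) & 1u;
}

/* Fallback lookup: first channel that advertises the capability. */
static struct chan *find_any(struct chan *chans, size_t n, enum tx_type type)
{
    assert(chans != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (has_cap(type, chans[i].cap_mask))
            return &chans[i];
    return NULL;
}

/* Same shape as the quoted default: stay on the channel of the operation
 * we depend on if it can do the new operation, otherwise look elsewhere. */
static struct chan *find_channel(struct chan *prev, struct chan *chans,
                                 size_t n, enum tx_type type)
{
    if (prev && has_cap(type, prev->cap_mask))
        return prev;
    return find_any(chans, n, type);
}

/* Returns 1 if an XOR that follows a PQ step stays on the PQ channel. */
static int xor_after_pq_stays_on_channel(unsigned pq_chan_caps)
{
    struct chan chans[2] = {
        { .cap_mask = 1u << TX_XOR },  /* some other XOR-only channel */
        { .cap_mask = pq_chan_caps },  /* the channel that ran the PQ step */
    };
    struct chan *prev = &chans[1];

    return find_channel(prev, chans, 2, TX_XOR) == prev;
}
```

With only the PQ bit set on the second channel the XOR step moves to the other channel. With both bits set the chain stays where it is, which is the reason given above for adding the explicit XOR capability.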
[PATCH] PS3 ps3av_set_video_mode() make id signed
vi drivers/video/ps3fb.c +618

static int ps3fb_set_par(struct fb_info *info)
{
        struct ps3fb_par *par = info->par;
        ...
[ and at line 660 ]
        ...
        if (ps3av_set_video_mode(par->new_mode_id))

now new_mode_id is an int

vi drivers/video/ps3fb.c +132

struct ps3fb_par {
        ...
        int new_mode_id;
        ...
};

vi drivers/ps3/ps3av.c +844

int ps3av_set_video_mode(u32 id)
-------------------------^^^
{
        ...
        if (... || id < 0) {
                      ^^^
                dev_dbg(&ps3av->dev->core, "%s: error id :%d\n", __func__, id);
                return -EINVAL;
        }
        ...
        id = ps3av_auto_videomode(&ps3av->av_hw_conf);
        if (id < 1) {
---------------^^^
                printk(KERN_ERR "%s: invalid id :%d\n", __func__, id);
                return -EINVAL;
        }
        ...
        ps3av->ps3av_mode = id;

vi drivers/ps3/ps3av.c +763

static int ps3av_auto_videomode()
-------^^^

+42

static struct ps3av {
        ...
        int ps3av_mode;
        ...
};

------------->8-------------8<-------------
make id signed so a negative id will get noticed

Signed-off-by: Roel Kluin <roel.kl...@gmail.com>
---
diff --git a/drivers/ps3/ps3av.c b/drivers/ps3/ps3av.c
index 5324978..7aa6d41 100644
--- a/drivers/ps3/ps3av.c
+++ b/drivers/ps3/ps3av.c
@@ -838,7 +838,7 @@ static int ps3av_get_hw_conf(struct ps3av *ps3av)
 }
 
 /* set mode using id */
-int ps3av_set_video_mode(u32 id)
+int ps3av_set_video_mode(int id)
 {
 	int size;
 	u32 option;
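Why the u32 parameter hides a negative id can be shown outside the kernel. The sketch below is only an illustration: set_video_mode_u32() and set_video_mode_int() are hypothetical reductions of the old and patched prototypes to a single range check, and uint32_t stands in for the kernel's u32.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Old prototype, reduced to the range check: the id arrives unsigned. */
static int set_video_mode_u32(uint32_t id)
{
    if (id < 1)     /* a negative int converts to a huge value, so no hit */
        return -EINVAL;
    return 0;
}

/* Patched prototype: the id stays signed, so negative ids are rejected. */
static int set_video_mode_int(int id)
{
    if (id < 1)
        return -EINVAL;
    return 0;
}

/* Returns 1 if only the signed variant rejects a negative mode id. */
static int negative_id_demo(void)
{
    int new_mode_id = -1;   /* e.g. an error value kept in a signed field */
    /* The cast spells out the conversion the old prototype did implicitly. */
    int old_rc = set_video_mode_u32((uint32_t)new_mode_id);
    int new_rc = set_video_mode_int(new_mode_id);

    assert(old_rc == 0);
    assert(new_rc == -EINVAL);
    return old_rc == 0 && new_rc == -EINVAL;
}
```

The same reasoning applies to the `id < 0` test marked above: with an unsigned id that comparison can never be true.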
[BUG] 2.6.28-git12: powerpc/pci: Reserve legacy regions on PCI broke my G3
The 2.6.29-rc kernels hang during boot on my PowerMac G3 (Beige). The
last messages I see on the console are

Kernel command line: ramdisk_size=8192
irq: Found primary Apple PIC /pci/mac-io for 64 irqs
irq: System has 64 possible interrupts

and then it hangs until I reboot it. Unfortunately I don't have a serial
cable compatible with the G3's serial port so I can't set up a serial
console.

Eventually I narrowed the cause down to "powerpc/pci: Reserve legacy
regions on PCI" (commit c1f343028d35ba4e88cd4a3c44e0d8b8a84264ee) in
2.6.28-git12. Reverting that from 2.6.29-rc2 results in a kernel that
boots ok and works reliably.

The problem seems system specific as both my G5 and my G4 had no
problems at all with 2.6.29-rc1.

/Mikael
Re: Re[2]: [PATCH][v4] powerpc 44x: support for 256KB PAGE_SIZE
On Jan 16, 2009, at 9:18 AM, Yuri Tikhonov wrote:
> On Friday, January 16, 2009 you wrote:
>> On Jan 12, 2009, at 4:49 PM, Yuri Tikhonov wrote:
>>> This patch adds support for 256KB pages on ppc44x-based boards.
>>>
>>> +config STDBINUTILS
>>> +	bool "Using standard binutils settings"
>>> +	depends on 44x
>>> +	default y
>>
>> I think this should be
>>
>> config STDBINUTILS
>> 	bool "Using standard binutils settings" if 44x
>> 	default y
>>
>> that way we imply that all powerpc users are using the standard
>> binutils instead of only those using a 44x platform. We still get
>> the intended effect of asking the user only on 44x.
>>
>> I haven't looked at the resulting question or config order to see if
>> it makes sense to leave it here or put it closer to the page size.
>
> I'm not sure about this. For 44x platforms the STDBINUTILS option is
> reasonable, because it's used in the PAGE_SIZE selection process. But
> for the other powerpcs the STDBINUTILS option will do nothing but
> take up a superfluous string in configs. Are you sure this will be
> better?

Ok, I tried this out in menuconfig. You are right that the "depends on"
makes sense, as it removes the option from the config file when it is
not relevant.

But right now, to enable 256K pages one has to go to platform setup to
find this dependency, then go to general setup to find the shmem option
at the bottom of the list in the embedded/expert section, and then
finally go to the kernel options menu to choose the page size.

Moving this question to just before the page size choice removes one of
those hidden menus, so I suggest that it be moved to just before the
option that it allows to be selected.

milton