Not being a microcode expert, I am guessing here at two more possible reasons:
1. Accessing 8 consecutive bytes that are doubleword aligned is faster than accessing 8 consecutive bytes that are not doubleword aligned. That slowdown would be multiplied by however many processors are trying to access the same 8 bytes simultaneously. In order not to slow down all possibly 64 (now 90?) processors colliding on the same 8 bytes, each processor is required to access the 8 bytes in the minimum possible time, which means doubleword aligned for all CPUs that want to share in the collision.

2. Suppose processor A is accessing bytes 100-107 with a CDS and processor B is accessing bytes 104-10B with a CDS. The PoP says 8 bytes will be kept consistent, but 12 bytes in total are involved. Must all 12 bytes now be kept consistent? Now throw in 50 or 60 other processors accessing one or more of the same bytes with CDS. The logic required to determine which bytes to keep consistent might need a granularity of one byte rather than a doubleword, further complicating the logic of the microcode, expanding the microcode, slowing down the whole process, and possibly requiring more latches.

Bill Fairchild
Rocket Software

-----Original Message-----
From: IBM Mainframe Assembler List [mailto:assembler-l...@listserv.uga.edu] On Behalf Of Alex Kodat
Sent: Tuesday, August 17, 2010 3:10 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: CDS and alignment question

More specifically, the CS, CDS, and CDSG instructions would have been a nightmare to implement if they could cross first-level cache block boundaries. Requiring that the targets of these instructions be word, doubleword, or quadword aligned, respectively, ensures this without tying the instructions to a particular cache block size. It does require that cache blocks be at least multiples of 16 bytes, but that's a pretty good bet for any sane design. AFAIK, cache block size has always been at least a multiple of 128 on 360 and upward machines.