Daniel Drake writes:
> Being spoilt by the luxuries of i386/x86_64 I've never really had a good
> grasp on unaligned memory access problems on other architectures and decided
> it was time to figure it out. As a result I've written this documentation
> which I plan to submit for inclusion as
> Documentation/unaligned_memory_access.txt
>
> Before I do so, any comments on the following?

From the viewpoint of yours truly (and I am a teacher of operating system
classes), this is a long-awaited document, which is going to be very useful,
especially for newbies. My students often make alignment mistakes in their
code, and your article will definitely make my job much easier.

Thank you, Daniel, for your work.

Dmitri

>
> Thanks,
> Daniel
>
>
> UNALIGNED MEMORY ACCESSES
> =========================
>
> Linux runs on a wide variety of architectures which have varying behaviour
> when it comes to memory access. This document presents some details about
> unaligned accesses, why you need to write code that doesn't cause them,
> and how to write such code!
>
>
> What's the definition of an unaligned access?
> =============================================
>
> Unaligned memory accesses occur when you try to read N bytes of data starting
> from an address that is not evenly divisible by N (i.e. addr % N != 0).
> For example, reading 4 bytes of data from address 0x10000004 is fine, but
> reading 4 bytes of data from address 0x10000005 would be an unaligned memory
> access.
>
>
> Why unaligned access is bad
> ===========================
>
> Most architectures are unable to perform unaligned memory accesses. On such
> architectures, any unaligned access causes a processor exception.
>
> Some architectures have an exception handler implemented in the kernel which
> corrects the memory access, but this is very expensive, and not all
> architectures have such a handler. You cannot rely on the exception handler
> to correct your memory accesses.
>
> In summary: if your code causes unaligned memory accesses to happen, your code
> will not work on some platforms, and will perform *very* badly on others.
>
> You may be wondering why you have never seen these problems on your own
> architecture. Some architectures (such as i386 and x86_64) do not have this
> limitation, but nevertheless it is important for you to write portable code
> that works everywhere.
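The addr % N test from the definition above can be written directly in C. This is a minimal user-space sketch: the helper name is_aligned() is hypothetical (not a kernel API), and addresses are modelled as plain uintptr_t values so the two example addresses from the text can be checked without ever dereferencing them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a kernel API): returns 1 if an access of
 * n bytes starting at addr is naturally aligned, i.e. addr % n == 0. */
static int is_aligned(uintptr_t addr, size_t n)
{
    assert(n != 0);             /* addr % 0 would be undefined */
    return addr % n == 0;
}
```

With this helper, is_aligned(0x10000004, 4) returns 1 and is_aligned(0x10000005, 4) returns 0. With n == 1 every address passes, which is why single-byte accesses can never be unaligned.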
>
>
> Natural alignment
> =================
>
> The rule we mentioned earlier forms what we refer to as natural alignment:
> When accessing N bytes of memory, the base memory address must be evenly
> divisible by N, i.e. addr % N == 0.
>
> When writing code, assume the target architecture has natural alignment
> requirements.
>
> Sidenote: in reality, only a few architectures require natural alignment
> on all sizes of memory access. However, again we must consider ALL supported
> architectures; natural alignment is the only way to achieve full portability.
>
>
> Code that doesn't cause unaligned access
> ========================================
>
> At first, the concepts above may seem a little hard to relate to actual
> coding practice. After all, you don't have a great deal of control over
> memory addresses of certain variables, etc.
>
> Fortunately things are not too complex, as in most cases, the compiler
> ensures that things will work for you. For example, take the following
> structure:
>
>     struct foo {
>         u16 field1;
>         u32 field2;
>         u8 field3;
>     };
>
> Let us assume that an instance of the above structure resides in memory
> starting at address 0x10000000. With a basic level of understanding, it would
> not be unreasonable to expect that accessing field2 would cause an unaligned
> access. You'd be expecting field2 to be located at offset 2 bytes into the
> structure, i.e. address 0x10000002, but that address is not evenly divisible
> by 4 (remember, we're reading a 4 byte value here).
>
> Fortunately, the compiler understands the alignment constraints, so in the
> above case it would insert 2 bytes of padding in between field1 and field2.
> Therefore, for standard structure types you can always rely on the compiler
> to pad structures so that accesses to fields are suitably aligned (assuming
> you do not cast the field to a type of different length).
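The padding described above can be observed with offsetof() and sizeof(). This sketch is written for user space, so it assumes <stdint.h> types as stand-ins for the kernel's u8/u16/u32. The exact layout is ABI-dependent; the only property checked in the code is the one the text relies on, namely that field2 ends up at a suitably aligned offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-ins for the kernel's fixed-width types (assumption). */
typedef uint8_t  u8;
typedef uint16_t u16;
typedef uint32_t u32;

struct foo {
    u16 field1;
    u32 field2;
    u8 field3;
};

/* Checks the property the compiler guarantees: field2 starts at an
 * offset that is a multiple of the alignment required for a u32,
 * and it cannot start before field1 ends. */
static void check_foo_layout(void)
{
    assert(offsetof(struct foo, field2) % _Alignof(u32) == 0);
    assert(offsetof(struct foo, field2) >= sizeof(u16));
}
```

On a typical x86_64 build, offsetof(struct foo, field2) is 4 rather than 2 (the 2 bytes of padding mentioned above), and sizeof(struct foo) is 12 because of further padding after field3.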
>
> Similarly, you can also rely on the compiler to align variables and function
> parameters to a naturally aligned scheme, based on the size of the type of
> the variable.
>
> Sidenote: in the above example, you may wish to reorder the fields in the
> above structure so that the overall structure uses less memory. For example,
> moving field3 to sit in between field1 and field2 (where the padding is
> inserted) would shrink the structure from 12 bytes to 8 on ABIs where u32 is
> 4-byte aligned, since the original layout also needs 3 bytes of tail padding
> after field3 to keep field2 aligned in arrays of struct foo:
>
>     struct foo {
>         u16 field1;
>         u8 field3;
>         u32 field2;
>     };
>
> Sidenote: it should be obvious by now, but in case it is not, accessing a
> single byte (u8 or char) can never cause an unaligned access, because all
> memory addresses are evenly divisible by 1.
>
>
> Code that causes unaligned access
> =================================
>
> With the above in mind, let's move on to a real-life example of a function
> that can cause an unaligned memory access. The following function, adapted
> from include/linux/etherdevice.h, is an optimized routine to compare two
> Ethernet MAC addresses for equality (it returns 0 when they are equal).
>
>     unsigned int compare_ether_addr(const u8 *addr1, const u8 *addr2)
>     {
>         const u16 *a = (const u16 *) addr1;
>         const u16 *b = (const u16 *) addr2;
>         return ((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2])) != 0;
>     }
>
> In the above function, the reference to a[0] causes 2 bytes (16 bits) to
> be read from memory starting at address addr1. Think about what would happen
> if addr1 was an odd address, such as 0x10000003. (Hint: it'd be an unaligned
> access.)
>
> Despite the potential unaligned access problems with the above function, it
> is included in the kernel anyway but is documented to only work on
> 16-bit-aligned addresses. It is up to the caller to ensure this alignment or
> not use this function at all. This alignment-unsafe function is still useful
> as it is a decent optimization for the cases when you can ensure alignment.
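If alignment cannot be guaranteed, one option is to keep the XOR/OR trick but load each 16-bit half through memcpy() into a local variable; on architectures that tolerate unaligned access, compilers typically turn such a memcpy() into a single load. This is a user-space sketch under stated assumptions: <stdint.h> types stand in for u8/u16, and the names load_u16(), compare_ether_addr_any() and demo_odd_offsets() are hypothetical, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* User-space stand-ins for the kernel's fixed-width types (assumption). */
typedef uint8_t  u8;
typedef uint16_t u16;

/* Hypothetical helper: read 2 bytes from any address without
 * dereferencing a (possibly misaligned) u16 pointer. */
static u16 load_u16(const u8 *p)
{
    u16 v;
    memcpy(&v, p, sizeof(v));
    return v;
}

/* Same contract as compare_ether_addr(): returns 0 if the two 6-byte
 * MAC addresses are equal and non-zero otherwise, but places no
 * alignment requirement on addr1 or addr2. */
static unsigned int compare_ether_addr_any(const u8 *addr1, const u8 *addr2)
{
    return ((load_u16(addr1) ^ load_u16(addr2)) |
            (load_u16(addr1 + 2) ^ load_u16(addr2 + 2)) |
            (load_u16(addr1 + 4) ^ load_u16(addr2 + 4))) != 0;
}

/* Usage check: MAC addresses stored at offset 1 inside byte buffers,
 * which is misaligned for u16 whenever a buffer starts on an even address. */
static void demo_odd_offsets(void)
{
    static const u8 buf1[7] = { 0, 0x00, 0x11, 0x22, 0x33, 0x44, 0x55 };
    static const u8 buf2[7] = { 0, 0x00, 0x11, 0x22, 0x33, 0x44, 0x55 };

    assert(compare_ether_addr_any(buf1 + 1, buf2 + 1) == 0);
}
```

The trade-off is that the caller no longer has to prove 16-bit alignment, at the cost of possibly slower byte-wise loads on strict-alignment architectures.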
>
>
> Here is another example of code that could cause unaligned accesses:
>
>     void myfunc(u8 *data, u32 value)
>     {
>         [...]
>         *((u32 *) data) = cpu_to_le32(value);
>         [...]
>     }
>
> This code will cause unaligned accesses every time the data parameter points
> to an address that is not evenly divisible by 4.
>
>
> Consider the following structure:
>
>     struct foo {
>         u16 field1;
>         u32 field2;
>         u8 field3;
>     } __attribute__((packed));
>
> It's the same structure as we looked at earlier, but the packed attribute has
> been added. This attribute ensures that the compiler never inserts any padding
> and the structure is laid out in memory exactly as the field declarations
> suggest: field1 at offset 0, field2 at offset 2 and field3 at offset 6.
>
> The packed attribute is useful when you want to use a C struct to represent
> some data that comes in a fixed arrangement 'off the wire'.
>
> It should be clear why accessing fields of an instance of that structure could
> cause unaligned accesses in some situations. Even if the instance started at
> an address such as 0x10000000 where accessing field1 would not cause an
> unaligned access, accessing field2 would be reading 4 bytes from 0x10000002,
> which is an unaligned access. The compiler didn't jump to your rescue and
> insert padding because you asked it not to.
>
>
> In summary, the 3 main scenarios where you may run into unaligned access
> problems involve:
>  1. Recasting variables to types of different lengths
>  2. Pointer arithmetic followed by access to at least 2 bytes of data
>  3. Accessing elements of packed structures
>
>
> Avoiding unaligned accesses
> ===========================
>
> Going back to an earlier example:
>
>     void myfunc(u8 *data, u32 value)
>     {
>         [...]
>         *((u32 *) data) = cpu_to_le32(value);
>         [...]
>     }
>
> To avoid the unaligned memory access, you could rewrite it as follows:
>
>     void myfunc(u8 *data, u32 value)
>     {
>         [...]
>         value = cpu_to_le32(value);
>         memcpy(data, &value, sizeof(value));
>         [...]
>     }
>
> It's safe to assume that memcpy() will never cause an unaligned access: it
> makes no assumptions about the alignment of its arguments and behaves as if
> it copied byte by byte.
>
>
> Recall an example packed structure from earlier:
>
>     struct foo {
>         u16 field1;
>         u32 field2;
>         u8 field3;
>     } __attribute__((packed));
>
> The following code will potentially cause 2 unaligned accesses: writing to
> field2, then reading from field2:
>
>     void myfunc2(u32 some_data)
>     {
>         struct foo myinstance;
>         u32 tmp;
>
>         myinstance.field2 = some_data;
>         tmp = myinstance.field2 * 2;
>     }
>
> When writing this code, you should be aware that field2 accesses are
> potentially unaligned, therefore the above will break on some systems. The
> kernel provides two macros, declared in <asm/unaligned.h>, to simplify
> handling of situations such as the above:
>
>     void myfunc2(u32 some_data)
>     {
>         struct foo myinstance;
>         u32 tmp;
>
>         put_unaligned(some_data, &myinstance.field2);
>         tmp = get_unaligned(&myinstance.field2) * 2;
>     }
>
> These macros work from pointers to the unaligned data, and work for memory
> accesses of any length (not just 32 bits as in the example above). You could
> even use put_unaligned() rather than memcpy() in order to solve the bug in
> the first example (myfunc()) given above.
>
> --
> Author: Daniel Drake <[EMAIL PROTECTED]>
> With help from: Johannes Berg, Uli Kunitz.
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
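Outside the kernel, where the get_unaligned()/put_unaligned() macros are not available, the same idea can be approximated with memcpy(). The sketch below is an assumption-laden user-space illustration, not the kernel's implementation: <stdint.h> types stand in for u8/u32, get_unaligned_u32(), put_unaligned_u32() and demo_packed() are hypothetical names, __attribute__((packed)) requires GCC or Clang, and myfunc2() is changed to return tmp so the result can be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* User-space stand-ins for the kernel's fixed-width types (assumption). */
typedef uint8_t  u8;
typedef uint16_t u16;
typedef uint32_t u32;

/* The packed structure from the text (GCC/Clang extension). */
struct foo {
    u16 field1;
    u32 field2;
    u8 field3;
} __attribute__((packed));

/* Hypothetical 32-bit analogues of get_unaligned()/put_unaligned():
 * memcpy() never requires its arguments to be aligned. */
static u32 get_unaligned_u32(const void *p)
{
    u32 v;
    memcpy(&v, p, sizeof(v));
    return v;
}

static void put_unaligned_u32(u32 v, void *p)
{
    memcpy(p, &v, sizeof(v));
}

/* myfunc2() rewritten with the helpers. field2 is addressed through a
 * byte pointer plus offsetof(), so no misaligned u32 pointer is formed. */
static u32 myfunc2(u32 some_data)
{
    struct foo myinstance;
    u8 *field2 = (u8 *)&myinstance + offsetof(struct foo, field2);
    u32 tmp;

    put_unaligned_u32(some_data, field2);
    tmp = get_unaligned_u32(field2) * 2;
    return tmp;
}

/* Usage check: the layout really is packed, and the round trip works. */
static void demo_packed(void)
{
    assert(offsetof(struct foo, field2) == 2);
    assert(sizeof(struct foo) == 7);
    assert(myfunc2(21) == 42);
}
```

Going through a byte pointer rather than &myinstance.field2 is a deliberate choice here: it sidesteps compiler warnings about taking the address of a packed member.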