On Mon, Sep 12, 2011 at 10:10 AM, pavan tc <pavan...@gmail.com> wrote: > Hi, > > I would like to know if there have been issues with optimized linked > list code with GCC 4.3.2. [optiimization flag : -O2] > > The following is the inlined code that has the problem: > > static inline void > list_add_tail (struct list_head *new, struct list_head *head) > { > new->next = head; > new->prev = head->prev; > > new->prev->next = new; > new->next->prev = new; > } > > The above code has been used in the loop as below: > > pool = GF_CALLOC (count, padded_sizeof_type, gf_common_mt_long); > if (!pool) { > GF_FREE (mem_pool); > return NULL; > } > > for (i = 0; i < count; i++) { > list = pool + (i * (padded_sizeof_type)); > INIT_LIST_HEAD (list); > list_add_tail (list, &mem_pool->list); > ^^^^^^^^^^^^ > } > > '&mem_pool-> list' is used as the list head. mem_pool is a pointer to type : > struct mem_pool { > struct list_head list; > int hot_count; > int cold_count; > gf_lock_t lock; > unsigned long padded_sizeof_type; > void *pool; > void *pool_end; > int real_sizeof_type; > }; > > 'list' is the new member being added to the tail of the list pointed to by > head. > It is a pointer to type: > struct list_head { > struct list_head *next; > struct list_head *prev; > }; > > The generated assembly for the loop (with the linined list_add_tail()) > is as below: > > 40a1c: e8 0f 03 fd ff callq 10d30 <__gf_calloc@plt> > 40a21: 48 85 c0 test %rax,%rax > 40a24: 48 89 c7 mov %rax,%rdi > 40a27: 0f 84 bf 00 00 00 je 40aec <mem_pool_new_fn+0x14c> > 40a2d: 48 8b 73 08 mov 0x8(%rbx),%rsi > 40a31: 4d 8d 44 24 01 lea 0x1(%r12),%r8 > 40a36: 31 c0 xor %eax,%eax > 40a38: b9 01 00 00 00 mov $0x1,%ecx > 40a3d: 0f 1f 00 nopl (%rax) > 40a40: 49 0f af c5 imul %r13,%rax > <=== loop start > 40a44: 48 8d 04 07 lea (%rdi,%rax,1),%rax > 40a48: 48 89 18 mov %rbx,(%rax) > # list->next = head > 40a4b: 48 89 06 mov %rax,(%rsi) > # <!!> head->prev->next = list > 40a4e: 48 8b 10 mov (%rax),%rdx > # rdx holds list->next > 40a51: 48 89 70 08 mov %rsi,0x8(%rax) # > list->prev = head->prev; > 40a55: 48 89 42 08 mov %rax,0x8(%rdx) # > list->next->prev = list > 40a59: 48 89 c8 mov %rcx,%rax > 40a5c: 48 83 c1 01 add $0x1,%rcx > 40a60: 4c 39 c1 cmp %r8,%rcx > 40a63: 75 db jne 40a40 <mem_pool_new_fn+0xa0> > > In the assembly above, %rbx holds the address of 'head'. > %rsi holds the value of head->prev. This is assigned outside the loop and the > compiler classifies it as a loop invariant, which is where, I think, > the problem is. > This line of code should have been inside the loop. > <!!> - %rsi still holds the value of head->prev that was assigned > outside the loop. > > The following experiments eliminate the problem: > > 1. Using 'volatile' on the address that 'head' points to. > 2. Using a function call (logging calls, for example) inside the loop. > 3. Using the direct libc calloc instead of the GF_CALLOC. > [GF_CALLOC does some accounting when accounting is enabled. Calls vanilla > libc calloc() otherwise]. > > So, anything that necessitates a different usage of %rsi seems to be > correcting > the behaviour. > > 4. Using gcc 4.4.3 [ The obvious solution would then be to use 4.4.3, > but I would > like to understand if this is a known problem with 4.3.2. Small > programs written to > emulate this problem do not exhibit the erroneous behaviour.] > > Please let me know if any more details about this behaviour are required. > I'll be glad to provide them.
Use -fno-strict-aliasing. Your code invokes undefined behavior. > TIA, > Pavan >