On Mon, Oct 16, 2017 at 5:00 PM, Bin.Cheng <amker.ch...@gmail.com> wrote: > On Mon, Oct 16, 2017 at 2:56 PM, Bin.Cheng <amker.ch...@gmail.com> wrote: >> On Thu, Oct 12, 2017 at 2:43 PM, Richard Biener >> <richard.guent...@gmail.com> wrote: >>> On Thu, Oct 5, 2017 at 3:17 PM, Bin Cheng <bin.ch...@arm.com> wrote: >>>> Hi, >>>> This patch merges adjacent memset builtin partitions if possible. It is >>>> a useful special case optimization transforming below code: >>>> >>>> #define M (256) >>>> #define N (512) >>>> >>>> struct st >>>> { >>>> int a[M][N]; >>>> int c[M]; >>>> int b[M][N]; >>>> }; >>>> >>>> void >>>> foo (struct st *p) >>>> { >>>> for (unsigned i = 0; i < M; ++i) >>>> { >>>> p->c[i] = 0; >>>> for (unsigned j = N; j > 0; --j) >>>> { >>>> p->a[i][j - 1] = 0; >>>> p->b[i][j - 1] = 0; >>>> } >>>> } >>>> >>>> into a single memset function call, rather than three calls initializing >>>> the structure field by field. >>>> >>>> Bootstrap and test in patch set on x86_64 and AArch64, is it OK? >>> >>> + /* Insertion sort is good enough for the small sub-array. */ >>> + for (k = i + 1; k < j; ++k) >>> + { >>> + part1 = (*partitions)[k]; >>> + for (l = k; l > i; --l) >>> + { >>> + part2 = (*partitions)[l - 1]; >>> + if (part1->builtin->dst_base_offset >>> + >= part2->builtin->dst_base_offset) >>> + break; >>> + >>> + (*partitions)[l] = part2; >>> + } >>> + (*partitions)[l] = part1; >>> + } >>> >>> so we want to sort [i, j[ after dst_base_offset. I realize you don't want >>> to write a qsort helper for this but I can't wrap my head around the above >>> in 5 minutes so ... please! >> Hmm, I thought twice about this and now I believe stable sorting (thus >> insertion sort) >> is required here. Please see below for explanation. >> >>> >>> You don't seem to check the sizes of the memsets given that they >>> obviously cannot overlap(!?) >> Yes, given it's quite special case transformation, I did add code >> checking overlap cases. >>> >>> Also why handle memset and not memcpy/memmove or combinations of >>> them (for sorting)? >>> >>> for () >>> { >>> p->a[i] = 0; >>> p->c[i] = q->c[i]; >>> p->b[i] = 0; >>> } >>> >>> with a and b adjacent. I suppose p->c could be computed by a >>> non-builtin partition as well. >> Yes, the two memset builtin partitions can be merged in this case, but... >>> So don't we want to see if dependences allow sorting all builtin >>> partitions next to each other >>> as much as possible? (maybe we do that already?) >> The answer for this, above partition merging and use of qsort is no. >> I think all the three are the same question here. For now we only do >> topological sort for partitions. To maximize parallelism (either by merging >> normal parallel partitions or merging builtin partitions) requires fine-tuned >> sorting between partitions that doesn't dependence on each other. >> In order to sort all memset/memcpy/memmove, we need check dependence >> between all data references between different partitions. For example, I >> created new test ldist-36.c illustrating sorting memcpy along with memset >> would generate wrong code because dependence is broken. It's the same >> for qsort. In extreme case, if the same array is set twice with different >> rhs >> value, the order between the two sets needs to be preserved. Unfortunately, >> qsort is unstable and could reorder different sets. This would break output >> dependence. >> At the point of this function, dependence graph is destroyed, we can't do >> much in addition to special case handling for memset. Full solution would >> require a customized topological sorting process. >> >> So, this updated patch keeps insertion sort with additional comment >> explaining >> why. Also two test cases added showing when memset partitions should be >> merged (we can't for now) and when memset partitions should not be merged. > Hmm, I can factor out the sorting loop into a function, that might > make it easier > to read. Okay, I will use std::stable_sort in this case, that's exactly what we want for this case.
Thanks, bin