> > Could you submit v3 patch which is v1 with overlap_op_by_pieces field,
> > testcase from v2 and add a few more comments to describe the field?
> >
> > And add an -mtune=ultra-size to make it able to test without change
> > other behavior?
> >
> > Hi Palmer:
> >
> > Are you OK with that?
>
> I'm still not convinced on the performance: like Andrew and I pointed
> out, this is a difficult case for pipelines of this flavor to handle.
> Nobody here knows anything about this pipeline deeply enough to say
> anything definitive, though, so this is really just a guess.

So would an extra field to indicate this resolve that? I believe people
should set overlap_op_by_pieces to true only if they are sure it is
beneficial.

> As I'm not convinced this is an obvious performance win I'm not going to
> merge it without a benchmark. If you're convinced and want to merge it
> that's fine, I don't really care about the performance of the C906 and
> if someone complains we can always just revert it later.

I suppose Christoph has tried this with their internal processor and it
does benefit performance there, but that processor can't be open-sourced
yet, so the v2 patch set uses the C906 to demo and test the change, since
that is the only processor with slow_unaligned_access=False.

I agree on the C906 part: we don't know whether it is a benefit there or
not, so I propose adding an -mtune=ultra-size to make this testable rather
than changing the C906.
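
For anyone following along, here is a minimal sketch of what "overlapping" by-pieces operations mean at the source level. It is only an illustration, not what the patch or GCC actually emits: the function name copy7_overlap is hypothetical, and it assumes the target handles unaligned 4-byte accesses cheaply (the slow_unaligned_access=False case discussed above). A 7-byte copy is done with two 4-byte accesses that overlap on byte 3, instead of a 4 + 2 + 1 byte sequence.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, for illustration only: copy exactly 7 bytes
   using two 4-byte accesses that overlap on byte 3.  Assumes src and
   dst do not overlap each other.  */
static void
copy7_overlap (unsigned char *dst, const unsigned char *src)
{
  uint32_t lo, hi;

  assert (dst != NULL && src != NULL);

  /* Both loads are done before either store.  */
  memcpy (&lo, src, 4);      /* bytes 0..3 */
  memcpy (&hi, src + 3, 4);  /* bytes 3..6, byte 3 is read twice */

  memcpy (dst, &lo, 4);      /* writes bytes 0..3 */
  memcpy (dst + 3, &hi, 4);  /* writes bytes 3..6, byte 3 is rewritten */
}
```

Whether the two overlapping accesses beat the three non-overlapping ones depends on how the pipeline handles the unaligned second access, which is exactly the point that needs a benchmark.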