Hi Hao,

I think there are a few issues here. The biggest is that using
separate registers in the ins/outs list means the register
allocator won't know they must be consecutive, and might decide to
create an instruction like "ld2 {v12.4s, v2.4s}, [x0]".
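
To spell out the constraint: for the multi-register ldN/stN forms
the list has to be consecutive registers, wrapping from v31 back to
v0. A quick Python sketch of that check (purely illustrative;
parse_list and is_consecutive are made-up helpers, not anything in
LLVM):

```python
import re

def parse_list(operand):
    """Extract register numbers from a list like '{v12.4s, v2.4s}'."""
    return [int(n) for n in re.findall(r"v(\d+)\.", operand)]

def is_consecutive(regs):
    """True if each register is the previous one plus 1, modulo 32."""
    return all(b == (a + 1) % 32 for a, b in zip(regs, regs[1:]))

print(is_consecutive(parse_list("{v12.4s, v2.4s}")))   # False
print(is_consecutive(parse_list("{v12.4s, v13.4s}")))  # True
print(is_consecutive(parse_list("{v31.4s, v0.4s}")))   # True (wraps)
```

With independent register operands nothing stops the allocator from
producing the first case.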

I think the ARM backend's approach of modelling the lists as
register pairs, triples and so on is probably the best option.
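
Roughly, the tuple approach works because the allocator then picks
one unit covering the whole list, so a non-consecutive list simply
isn't a member of the class. A toy Python model of that idea
(hypothetical names; it doesn't mirror how TableGen actually defines
register tuples):

```python
NUM_VREGS = 32

def tuple_class(n):
    """Every n-register tuple, each covering consecutive regs mod 32."""
    return [tuple((base + i) % NUM_VREGS for i in range(n))
            for base in range(NUM_VREGS)]

def asm_list(regs, arrangement="4s"):
    """Render a tuple as an assembly register list."""
    return "{" + ", ".join(f"v{r}.{arrangement}" for r in regs) + "}"

def allocate(cls, free):
    """Toy allocator: first tuple whose registers are all free, else None."""
    for t in cls:
        if all(r in free for r in t):
            return t
    return None

pairs = tuple_class(2)
triples = tuple_class(3)

print((12, 2) in pairs)                         # False
print(asm_list(allocate(pairs, {2, 12, 13})))   # {v12.4s, v13.4s}
print(allocate(pairs, {2, 12}))                 # None
print(asm_list(triples[31]))                    # {v31.4s, v0.4s, v1.4s}
```

Even with v2 and v12 both free, the only pair it can hand out is
{v12, v13}; if no consecutive pair is free the toy allocator returns
None rather than inventing {v12, v2}.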

Also, the ld1/st1 instructions should probably have normal patterns
for a single vector load and store as well. While writing a test I
ran into "could not select" errors on simple vector stores.

Finally, I think the SelectVLD (and possibly SelectVST) code doesn't
transfer the chain results properly, which leaves the
@llvm.arm.neon.vldN node in the DAG after selection. The attached
file should demonstrate this once a v16i8 store pattern is added.
It was my attempt to trick regalloc into doing the wrong thing, so
it may be useful for your own tests there, though I didn't get that
far.
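
To illustrate what I mean about the chains, here's a toy model (not
the SelectionDAG API; Node, replace_uses and remove_dead are invented
for the sketch) where the intrinsic's last result is its chain. If
selection only redirects the two vector results, the store still
depends on the old node's chain, so dead-node removal can't delete
it:

```python
class Node:
    def __init__(self, name, num_results, operands=()):
        self.name = name
        self.num_results = num_results
        self.operands = list(operands)  # list of (Node, result_index)

    def users(self, graph):
        return [n for n in graph if any(src is self for src, _ in n.operands)]

def replace_uses(graph, old, new, results):
    """Point uses of old's listed result indices at the same index on new."""
    for n in graph:
        n.operands = [(new, i) if (src is old and i in results) else (src, i)
                      for src, i in n.operands]

def remove_dead(graph, roots):
    """Repeatedly drop non-root nodes that have no users."""
    changed = True
    while changed:
        changed = False
        for n in list(graph):
            if n not in roots and not n.users(graph):
                graph.remove(n)
                changed = True

def select(replace_chain):
    entry = Node("EntryToken", 1)
    vld = Node("llvm.arm.neon.vld2", 3, [(entry, 0)])  # vec0, vec1, chain
    ld2 = Node("LD2 (machine node)", 3, [(entry, 0)])  # same result layout
    store = Node("store", 1, [(vld, 0), (vld, 1), (vld, 2)])
    graph = [entry, vld, ld2, store]
    replace_uses(graph, vld, ld2, {0, 1, 2} if replace_chain else {0, 1})
    remove_dead(graph, roots={store})
    return [n.name for n in graph]

print(select(replace_chain=False))
# ['EntryToken', 'llvm.arm.neon.vld2', 'LD2 (machine node)', 'store']
print(select(replace_chain=True))
# ['EntryToken', 'LD2 (machine node)', 'store']
```

Presumably the fix in SelectVLD is the equivalent of also redirecting
the chain result, though that's an assumption on my part.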

Oh, and one minor naming detail: NeonI_LdStElem. I thought the "Elem"
instructions were the single-lane ones, like "ld1 {v0.s}[2], [x0]".
I'd have called these "Multiple", as in the comment, or something
similar.

Cheers.

Tim.

Attachment: tmp.ll

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits