From: Denys Vlasenko
> Sent: 21 November 2018 13:44
...
> I also tested this while working for string ops code in musl.
>
> I think at least 128 bytes would be the minimum where "REP insn"
> are more efficient. In my testing, it's more like 256 bytes...
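To make the threshold question concrete, here is a minimal sketch of size-based dispatch: copies at or above a cutoff use 'rep movsb', and smaller ones use an 8-bytes-at-a-time loop. The 256-byte cutoff is simply the figure from the quoted testing, and the names (copy_dispatch, REP_MOVSB_THRESHOLD, copy_small, copy_rep_movsb) are hypothetical. This is not the kernel's or musl's actual memcpy, and on non-x86 targets it falls back to plain memcpy() for the large case.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cutoff, taken from the quoted measurement (~256 bytes). */
#define REP_MOVSB_THRESHOLD 256

/* Large copies: 'rep movsb' (RDI = dst, RSI = src, RCX = count).
 * Assumes the direction flag is clear, as the x86 ABIs require on entry. */
static void copy_rep_movsb(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile ("rep movsb"
                      : "+D" (dst), "+S" (src), "+c" (n)
                      :
                      : "memory");
#else
    memcpy(dst, src, n);  /* non-x86 fallback */
#endif
}

/* Small copies: 8 bytes at a time, then a byte tail.
 * memcpy() into a local keeps misaligned accesses well-defined in C. */
static void copy_small(unsigned char *d, const unsigned char *s, size_t n)
{
    while (n >= 8) {
        uint64_t w;
        memcpy(&w, s, 8);
        memcpy(d, &w, 8);
        d += 8;
        s += 8;
        n -= 8;
    }
    while (n--)
        *d++ = *s++;
}

void *copy_dispatch(void *dst, const void *src, size_t n)
{
    if (n >= REP_MOVSB_THRESHOLD)
        copy_rep_movsb(dst, src, n);
    else
        copy_small(dst, src, n);
    return dst;
}
```

Where exactly the cutoff should sit (128, 256, or something else) and how misaligned source/destination pairs behave are precisely what has to be measured per CPU; the sketch only shows the shape of the dispatch.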
What happens for misaligned copies?
I had a feeling that the ERMS 'rep movsb' code used some kind of
barrel shifter in that case.

The other problem with the ERMS copy is that it gets used for
copy_to/from_io(), and 'rep movsb' on uncached locations has to
do byte copies. Byte reads on PCIe are really horrid, since each
one is a separate, full-latency PCIe read.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)