On Fri, Nov 27, 2020 at 09:45:39AM -0800, Andy Lutomirski wrote:
> Is -22 (-EINVAL) the same error it returns if you pass in garbage?

Define garbage.

Yes, if you have a sequence of bytes which you can unambiguously determine
to be

- an invalid instruction in some of the tables

- REX prefix with the wrong bits set

- a byte says that some insn part like ModRM or SIB is following but the
  buffer falls short

- ... other error condition

then maybe you can say, yes, I'm looking at garbage and can error out right
then and there. But you need to have enough bytes of information to
determine that.

For example (those are random bytes):

00000000000011ff <.asm_start>:
    11ff:       95                      xchg   %eax,%ebp
    1200:       14 60                   adc    $0x60,%al
    1202:       77 74                   ja     1278 <__libc_csu_init+0x28>
    1204:       82                      (bad)
    1205:       67 dc 55 35             fcoml  0x35(%ebp)

The 0x82 is usually in opcode group 1 but that opcode is invalid in 64-bit
mode. So if this is a 64-bit executable, you know that that is an invalid
insn.

Another example:

  18:   a0                      .byte 0xa0
  19:   17                      (bad)
  1a:   27                      (bad)
  1b:   ea                      (bad)
  1c:   90                      nop
  1d:   90                      nop
  1e:   90                      nop
  1f:   90                      nop

0xa0 is the opcode for MOV AL, moffset8 where moffset8 is an address-sized
memory offset, which in 64-bit mode is 64-bit. But we have only 7 bytes
after the 0xa0, thus we know that the buffer is truncated.

If it had one byte more, it would be a valid insn:

  18:   a0 17 27 ea 90 90 90    movabs 0x9090909090ea2717,%al
  20:   90 90

I'm sure you get the idea: if you have enough unambiguous bits which tell
you that this cannot be a valid insn, then you can return early from the
decoder and signal that fact.

I'm not sure that you can do that for all possible byte combinations, and
I'm also not sure that it won't ever happen that it perchance misinterprets
garbage data as valid instructions.

> How hard would it be to teach it to return a different error code when
> the buffer is too small?

Yap, see above. Unambiguous cases are clear but I don't know whether it
would work in all cases. For example, let's say you give it a zeroed out
buffer of 8 bytes which doesn't contain anything - you've just zeroed it
out and haven't put any insns in there yet.

But those are perfectly valid insns:

   0:   00 00                   add    %al,(%rax)
   2:   00 00                   add    %al,(%rax)
   4:   00 00                   add    %al,(%rax)
   6:   00 00                   add    %al,(%rax)

So now you go about your merry way working with those although they're not
real instructions which some tool generated.

See what I mean?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
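
To make the distinction above concrete, here is a rough sketch of a toy
length decoder that reports "invalid" and "truncated" as separate errors.
It is not the kernel's insn decoder: toy_insn_len(), toy_modrm_len() and the
TOY_ERR_* names are made up for illustration, -ENODATA is an arbitrary pick
for the short-buffer case, and only the handful of opcodes from the examples
above are modelled, assuming 64-bit mode and no prefixes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error codes, chosen only for this sketch. */
enum {
	TOY_ERR_INVALID   = -EINVAL,	/* bytes cannot form a valid insn */
	TOY_ERR_TRUNCATED = -ENODATA,	/* buffer ends before the insn does */
};

/*
 * Bytes taken by ModRM plus optional SIB and displacement, or
 * TOY_ERR_TRUNCATED. p points at the ModRM byte, len is what is left.
 */
static int toy_modrm_len(const uint8_t *p, size_t len)
{
	size_t need = 1;
	uint8_t mod, rm;

	if (len < need)
		return TOY_ERR_TRUNCATED;

	mod = p[0] >> 6;
	rm  = p[0] & 7;

	if (mod != 3 && rm == 4) {		/* SIB byte follows */
		need++;
		if (len < need)
			return TOY_ERR_TRUNCATED;
		if (mod == 0 && (p[1] & 7) == 5)
			need += 4;		/* no base register: disp32 */
	} else if (mod == 0 && rm == 5) {
		need += 4;			/* RIP-relative disp32 */
	}

	if (mod == 1)
		need += 1;			/* disp8 */
	else if (mod == 2)
		need += 4;			/* disp32 */

	return len >= need ? (int)need : TOY_ERR_TRUNCATED;
}

/* Length of the insn at buf in bytes, or a negative TOY_ERR_* value. */
static int toy_insn_len(const uint8_t *buf, size_t len)
{
	int n;

	assert(buf != NULL);

	if (len == 0)
		return TOY_ERR_TRUNCATED;

	switch (buf[0]) {
	case 0x90:				/* NOP */
	case 0x95:				/* XCHG %eax,%ebp */
		return 1;
	case 0x14:				/* ADC $imm8,%al */
		return len >= 2 ? 2 : TOY_ERR_TRUNCATED;
	case 0x82:				/* invalid in 64-bit mode */
		return TOY_ERR_INVALID;
	case 0xa0:				/* MOV AL,moffs8: opcode + 8 bytes */
		return len >= 9 ? 9 : TOY_ERR_TRUNCATED;
	case 0x00:				/* ADD r/m8,r8: opcode + ModRM */
		n = toy_modrm_len(buf + 1, len - 1);
		return n < 0 ? n : 1 + n;
	default:				/* not modelled by this toy */
		return TOY_ERR_INVALID;
	}
}
```

Feeding it the 0xa0 example with 7 trailing bytes gives TOY_ERR_TRUNCATED,
with 8 trailing bytes it gives 9, and 0x82 gives TOY_ERR_INVALID. A zeroed
buffer still decodes happily as a 2-byte ADD, which is exactly the case no
error code can catch.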