On 10/20/25 12:33, Peter Maydell wrote:
> On Mon, 20 Oct 2025 at 16:01, Sean Anderson <[email protected]> wrote:
>>
>> On 10/18/25 03:21, Heinrich Schuchardt wrote:
>> > On 10/17/25 23:35, Sean Anderson wrote:
>> >> When semihosting 32-bit systems, the return value of FLEN will be stored
>> >> in a 32-bit integer. To prevent wraparound, return -1 and set EOVERFLOW.
>> >> This matches the behavior of stat(2). Static files don't need to be
>> >> checked, since are always small.
>> >>
>> >> Signed-off-by: Sean Anderson <[email protected]>
>> >> ---
>> >>
>> >>   semihosting/arm-compat-semi.c | 17 ++++++++++++++---
>> >>   1 file changed, 14 insertions(+), 3 deletions(-)
>> >>
>> >> diff --git a/semihosting/arm-compat-semi.c b/semihosting/arm-compat-semi.c
>> >> index c5a07cb947..57453ca6be 100644
>> >> --- a/semihosting/arm-compat-semi.c
>> >> +++ b/semihosting/arm-compat-semi.c
>> >> @@ -305,8 +305,19 @@ static uint64_t common_semi_flen_buf(CPUState *cs)
>> >>       return sp - 64;
>> >>   }
>> >>   +static void common_semi_flen_cb(CPUState *cs, uint64_t ret, int err)
>> >> +{
>> >> +    CPUArchState *env = cpu_env(cs);
>> >> +
>> >> +    if (!err && !is_64bit_semihosting(env) && ret > INT32_MAX) {
>> >
>> >
>> > The issue with the current implementation is that files with file sizes 
>> > over 4 GiB will be reported as file size < 4 -GiB on 32bit systems. Thanks 
>> > for addressing this.
>> >
>> > But unfortunately with your change you are additionally dropping support 
>> > for file sizes 2 GiB to 4 GiB on 32bit devices. This should be avoided.
>> >
>> > The semihosting specification specifies that the value returned in r0 
>> > should be -1 if an error occurs. So on 32 bit systems 0xffffffff should be 
>> > returned.
>> >
>> > As file sizes cannot be negative there is not reason to assume that the 
>> > value in r0 has to be interpreted by semihosting clients as signed.
>> >
>> > Please, change your commit to check against 0xffffffff.
>> >
>> > It might make sense to contact ARM to make their specification clearer.
>>
>> stat(2) will return -1/EOVERFLOW on 32-bit hosts for files over 2 GiB. I 
>> believe we should be consistent.
> 
> I can see both sides of this one -- the semihosting spec
> is pretty old and was designed (to the extent it was designed)
> back in an era when 2GB of RAM or a 2GB file was pretty
> implausible sounding. (And today there's not much appetite
> for making updates to it for AArch32, because 32-bit is
> the past, not the future, and in any case you have to deal
> with all the existing implementations of the spec so
> changes are super painful to try to promulgate.)
> 
> Our QEMU implementation of SYS_ISERROR() says "anything that's
> a negative 32-bit integer is an error number" for 32-bit hosts,
> which I suppose you might count on the side of "check
> against INT32_MAX".
> 
> I think overall that if we think that anybody is or might be
> using semihosting with files in that 2..4GB size, we should
> err on the side of preserving that functionality. Otherwise
> somebody will report it as a bug and we'll want to fix it
> as a regression. And it doesn't seem impossible that somebody
> out there is doing so.
> 
> If we report 2..4GB file sizes as if the file size value
> was an unsigned integer, then clients that expect that will
> work, and clients that treat any negative 32-bit value as
> an error will also work in the sense that they'll handle it
> as an error in the same way they would have done for -1.
> So that is the safest approach from a back-compat point
> of view, and I think that's what I lean towards doing.

Actually, some existing targets don't handle "negative" file sizes
properly. In particular, newlib generally treats the result as an int or
an off_t, which is a long (except on cygwin). Both of those types are
32 bits or smaller on (I)LP32. So if you return a 3 GiB size it will be
treated as -1 GiB. This will break lseek with SEEK_END, since newlib
uses the signed result of flen to recompute a new absolute offset.

IMO the spec is unclear, and this is reflected in the varying semantics
of differing implementations. I think returning any negative number
other than -1 is just inviting bugs.

--Sean

Reply via email to