On 8/16/15 09:41, Chen Gang wrote:
> On 8/16/15 02:16, Chen Gang wrote:
>>
>> On 8/15/15 23:47, Richard Henderson wrote:
>>> On Aug 15, 2015 2:56 AM, Chen Gang <xili_gchen_5...@hotmail.com>
>>>> Oh, we are unlucky, after continue gcc testsuite, add/sub floating point
>>>> insns also can be mixed together! The related C code, -save-temps, and
>>>> objdump files are in attachments (is it gcc's issue? I guess not).
>>>>
>>>> So, I guess, we have to 'crack' all floating point insns, precisely, or
>>>> we can not pass gcc testsuite.
>>>>
>>>
>>> If you go back to my first message to you on the subject, you'll find that
>>> my suggestion was to not split the operation at all, using move for pack1.
>>> Which would nicely handle any such interleaving.
>>>
>>
>> OK, thanks, but for float(uns)sisf2 and float(uns)sidf2, we can not only
>> simply move. :-(
>>
>> But what you said is really quite valuable to me!! we can treat the flag
>> as a caller saved context, then can let the caller can use callee freely
>> (in fact, I guess, the real hardware treats it as caller context, too).
>>
>> - we have to define the flag format based on the existing format in the
>>   related docs and tilegx.md (reserve 0-20 and 25-31 bits).
>>
>> - We can only use 21-24 for mark addsub, mul, or typecast result. If
>>   21-24 bits are all zero, it means typecast result. For fsingle: 32-63
>>   bits is the input integer; for fdouble: srca is the input integer.
>>
>> - For addsub and mul result, we use 32-63 bits for an index of resource
>>   handler (like 'fd' returned by open). fsingle_addsub2, fsingle_mul1,
>>   fdouble_mul_flags, fdouble_addsub allocate resource, and pack1 free.
>>
>> But if caller "make mistakes", our implementation can not avoid related
>> resource leak (but the real hardware can, it also lets caller save all
>> related resources; when it needs them, it can let caller pass them to).
>>
>
> If we assume that the optimization for the floating point insns can not
> cross the basic blocks (I guess so), we can reset all related resources
> when start a basic block.
>

Oh, sorry, even qemu itself may split a basic block into 2 basic blocks
when the basic block is too big, so that assumption does not hold. And we
also have to assume that fdouble_pack1 may be called several times,
individually, on the same value.

So for the resource management, we can do it like this:

 - For fsingle, the value can be saved in bits 32-63 of the caller
   context (it is a float32, which is 32-bit).

 - For fdouble, we can allocate a buffer for it (e.g. 8KB, enough for
   1K float64 values). When the count of saved values overflows 1K, let
   the index wrap around to 0 again -- of course, the old value saved 1K
   entries earlier should already be useless by then.

I guess, in this way, we can emulate the tilegx floating point insns!! :-)

Thanks.
-- 
Chen Gang

Open, share, and attitude like air, water, and life which God blessed
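
A rough sketch of the resource management above, in C, only to make the idea concrete. All names here (FDoubleRing, fdouble_ring_save, fdouble_ring_load, flag_set_payload, flag_get_payload) are hypothetical and are not existing QEMU code; the only things taken from the description are the 1K-entry buffer that wraps to 0 and the use of bits 32-63 of the flag for the fsingle value or the buffer index.

```c
#include <stdint.h>

#define FDOUBLE_SLOTS 1024u            /* 1K entries * 8 bytes = 8KB */

/* Hypothetical per-CPU storage; not an existing QEMU structure. */
typedef struct {
    uint64_t slot[FDOUBLE_SLOTS];      /* raw float64 bit patterns */
    uint32_t next;                     /* next slot to overwrite */
} FDoubleRing;

/* Save a float64 result; return the slot index to place in bits 32-63. */
static uint32_t fdouble_ring_save(FDoubleRing *r, uint64_t f64_bits)
{
    uint32_t idx = r->next;

    r->slot[idx] = f64_bits;
    r->next = (idx + 1) % FDOUBLE_SLOTS;   /* wrap to 0 after 1K saves */
    return idx;
}

/* Look up a saved result; nothing is freed, so pack1 may repeat this. */
static uint64_t fdouble_ring_load(const FDoubleRing *r, uint32_t idx)
{
    return r->slot[idx % FDOUBLE_SLOTS];
}

/* Put a 32-bit payload (fsingle bits or a ring index) in bits 32-63,
 * leaving bits 0-31 of the flag untouched. */
static uint64_t flag_set_payload(uint64_t flag, uint32_t payload)
{
    return (flag & 0xffffffffull) | ((uint64_t)payload << 32);
}

/* Read back the payload stored in bits 32-63. */
static uint32_t flag_get_payload(uint64_t flag)
{
    return (uint32_t)(flag >> 32);
}
```

Since a lookup never frees a slot, calling pack1 several times on the same value gives the same result, as long as fewer than 1K other values were saved in between.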