Thank you so much, again!

I can't overstate how useful all these recommendations are, and I'm looking forward to seeing how they play out in simulations. So I guess it's time for me to get to work!


Sincerely yours,

Dani.


On 18/4/25 14:41, Sven Tennie wrote:
Hey Daniel 👋

Thanks a lot for your kind words.

The AArch64 ISA might also be a source of inspiration. AArch64 has some combined instructions which RISC-V lacks, e.g. an ADD of two registers with an included shift. Though, we don't seem to use many of them and I haven't found any usage that wouldn't be well covered in RISC-V NCG. Probably, that's because MachOp (https://hackage.haskell.org/package/ghc-9.12.1/docs/GHC-Cmm-MachOp.html#t:MachOp) is pretty fine-grained.
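
The add-with-shift case can be modelled in a few lines. This is only an illustrative sketch of the instruction semantics, not code taken from either NCG; the function names are made up for the example. It assumes the base RV64 ISA, where the same result takes an `slli` followed by an `add` (the Zba extension adds `sh1add`/`sh2add`/`sh3add` for shift amounts 1 to 3, but Zba is not part of RV64G).

```python
MASK64 = (1 << 64) - 1  # registers are 64 bits wide on both targets


def aarch64_add_lsl(xn, xm, shift):
    """Model of AArch64 `add xd, xn, xm, lsl #shift` (a single instruction)."""
    return (xn + ((xm << shift) & MASK64)) & MASK64


def rv64_slli_add(rs1, rs2, shamt):
    """Model of the RV64 base-ISA pair `slli t0, rs2, shamt; add rd, rs1, t0`."""
    t0 = (rs2 << shamt) & MASK64   # slli t0, rs2, shamt
    return (rs1 + t0) & MASK64     # add  rd, rs1, t0


# Typical use: base address plus an index scaled by 8 (one machine word).
base, index = 0x1000, 5
result_aarch64 = aarch64_add_lsl(base, index, 3)
result_rv64 = rv64_slli_add(base, index, 3)
```

Both functions compute the same value; the difference of interest for an ISA-extension study is only the instruction count.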

A good candidate for investigations could be the CSET pseudo-instruction. I stumbled over it while looking for pseudo-ops which lead to combined instructions in AArch64 NCG. The CSET pseudo-op leads to two instructions in RISC-V NCG and to one in AArch64 NCG:
- https://gitlab.haskell.org/ghc/ghc/-/blob/386f18548e3c66d04f648a9d34f167a086c1328b/compiler/GHC/CmmToAsm/AArch64/Ppr.hs#L443
- https://gitlab.haskell.org/ghc/ghc/-/blob/386f18548e3c66d04f648a9d34f167a086c1328b/compiler/GHC/CmmToAsm/RV64/Ppr.hs#L554

Though, this might be a sub-optimal implementation (in this case we'd be happy to get a ticket ;) ). As CSET is used for comparisons, it should appear pretty frequently.

A bit off-topic, but for the sake of completeness: The Compiler Explorer seems to use DWARF symbols to map assembly instructions to Haskell code lines. At least, it compiles with -g.

VELDT's profile is stated as "RV32I (no FENCE, ECALL, EBREAK)" on their GitHub page. But we target RV64G with both the NCG and LLVM backends. (The main reason not to support simpler profiles is that all hardware on the market that is powerful enough to reasonably run Haskell supports at least RV64G.)

Thanks for the hint about the J-extension. I will take a look at it.

Enjoy your weekend & best regards,

Sven

On Fri, Apr 18, 2025 at 12:13, Daniel Trujillo Viedma <danihacker.vie...@gmail.com> wrote:

    Thank you so much for all the information and the help.


    Seriously, this is much more than I was hoping to get, even the
    suggestion for generating commented assembly code (which is, I
    assume, the method that Compiler Explorer uses to relate the
    high-level Haskell code with the assembly output of the compiler,
    which is really nice). And thank you for your RISC-V NCG, which I found
    easy to understand.


    I guess this is the kind of professionals that Haskell attracts,
    which is a big part of why I love it.


    I will send here an executive summary of my findings, including
    statistics about a couple of programs that I try. I don't know if
    I'll be able to do a statistically significant analysis,
    because I'll still have a lot of things to do (to extend QEMU,
    maybe also gem5, and implement it in a Clash microprocessor
    design, probably VELDT), but maybe in the future I can automate
    more of it and run a more comprehensive analysis. FYI, I have
    found that the RISC-V specs mention Haskell among other languages
    in a still empty J extension section, which will be aimed at
    helping dynamically translated languages as well as
    garbage-collected ones, but I guess RISC-V people are still more
    focused on other things and it will take some time to start work
    on that extension.


    I find also very interesting your suggestion for far-jumping, but
    I'm afraid that will be very unpopular among hardware designers
    because it messes with their highly appreciated and scarce L1
    cache. But funnily enough, I had the impression before starting
    this project that some kind of simple mechanism for complex
    jumping would be a good idea. I will keep this in mind when
    looking for patterns in the assembly code.


    Once more, thank you so much for your work and the help, and I
    hope I can deliver soon information that you all could find
    interesting.


    Have a very nice weekend!

    Cheers,

    Dani.


    On 17/4/25 9:19, Sven Tennie wrote:
    Hey Daniel 👋

    That's really an interesting topic, because we never analyzed the
    emitted RISC-V assembly with statistical measures.

    So, if I may ask for a favour: If you spot anything that could be
    better expressed with the current ISA, please open a ticket and
    label it as RISC-V: https://gitlab.haskell.org/ghc/ghc/-/issues
    (We haven't decided which RISC-V profile to require. I.e.
    requiring the very latest extensions would frustrate people with
    older hardware... However, it's in any case good to have possible
    improvements documented in tickets.)

    I'm wondering if you really have to go through QEMU. Or, if
    feeding assembly code to a parser and then doing the math on that
    wouldn't be sufficient? (Of course, tracing the execution is more
    accurate. However, it's much more complicated as well.)
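
The "parse the assembly and do the math" approach could look roughly like the sketch below. It is a minimal illustration, not an existing tool: it assumes GNU-assembler-style text where labels sit on their own line, directives start with `.`, and `#` starts a comment, and the sample assembly is made up. It counts adjacent mnemonic pairs within label-delimited blocks.

```python
import re
from collections import Counter

# A line consisting only of "name:" is treated as a label (assumption:
# labels are never followed by an instruction on the same line).
LABEL_RE = re.compile(r"^[\w.$]+:$")


def mnemonics_by_block(asm_text):
    """Split assembly text into label-delimited lists of mnemonics."""
    blocks, current = [], []
    for raw in asm_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        if LABEL_RE.match(line):             # a label starts a new block
            if current:
                blocks.append(current)
            current = []
            continue
        if line.startswith("."):             # assembler directive, skip
            continue
        current.append(line.split()[0].lower())
    if current:
        blocks.append(current)
    return blocks


def ngram_counts(blocks, n=2):
    """Count runs of n consecutive mnemonics, never crossing a block boundary."""
    counts = Counter()
    for block in blocks:
        for i in range(len(block) - n + 1):
            counts[tuple(block[i:i + n])] += 1
    return counts


SAMPLE = """
.text
foo:
    slli t0, a2, 3   # scale index
    add  a0, a1, t0
    ld   a0, 0(a0)
    ret
bar:
    slli t0, a1, 3
    add  a0, a0, t0
    ret
"""

blocks = mnemonics_by_block(SAMPLE)
pairs = ngram_counts(blocks, n=2)
```

Here `pairs.most_common()` ranks candidate instruction pairs by how often they appear in the text. These are static counts, so a pair inside a hot loop counts the same as one in cold code, which is the accuracy trade-off against tracing.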

    To attribute assembly instructions to Cmm statements, you may use
    the GHC parameters -ddump-cmm and -dppr-debug (and, to write the
    output to files instead of stdout, -ddump-to-file). This will add
    comments for most Cmm statements to the dumped assembly code.

    At first, I thought that sign-extension / truncation might be a
    good candidate. However, it turned out that this is already
    covered by the RISC-V B-extension. Which led to this new ticket:
    https://gitlab.haskell.org/ghc/ghc/-/issues/25966

    Skimming over the NCG code and watching out for longer or
    repeating instruction lists might be a good strategy to make
    educated guesses.

    From a developer's perspective, I found the immediate sizes
    (usually 12 bits) rather limiting. E.g. the Note [RISCV64 far jumps]
    (https://gitlab.haskell.org/ghc/ghc/-/blob/395e0ad17c0d309637f079a05dbdc23e0d4188f6/compiler/GHC/CmmToAsm/RV64/CodeGen.hs?page=2#L1996)
    tells the story of how we had to work around this limit for
    addresses in conditional jumps.
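
To make the immediate limits concrete, the ranges can be checked with a small sketch. The bit widths come from the RISC-V base instruction formats (I-type, B-type, J-type); the function names are made up for this example and do not correspond to anything in the NCG.

```python
def fits_signed(value, bits):
    """True if value fits in a two's-complement integer of `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def fits_i_type_imm(value):
    # I-type instructions (e.g. ADDI, LD) carry a 12-bit signed immediate:
    # -2048 .. 2047.
    return fits_signed(value, 12)


def fits_branch_offset(offset):
    # B-type conditional branches (e.g. BEQ, BNE) encode a 13-bit signed byte
    # offset whose lowest bit is always zero: -4096 .. 4094, even values only.
    return offset % 2 == 0 and fits_signed(offset, 13)


def fits_jal_offset(offset):
    # JAL encodes a 21-bit signed byte offset, again with an implicit zero
    # lowest bit: roughly +/- 1 MiB.
    return offset % 2 == 0 and fits_signed(offset, 21)


# A branch target 8 KiB away does not fit a conditional branch,
# but it is still within reach of an unconditional JAL.
target_offset = 8192
branch_ok = fits_branch_offset(target_offset)
jal_ok = fits_jal_offset(target_offset)
```

With these numbers `branch_ok` is false and `jal_ok` is true, which shows why a conditional jump to a distant label cannot be encoded in a single branch instruction.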

    So, you could raise the question if - analogous to compressed
    expressions - it wouldn't make sense to have extended expressions
    that cover two words. Such that the first word is the instruction
    and the second its immediate(s). (Hardware designers would
    probably hate that, because it means a bigger change to the
    instruction decoding unit. However, I got asked as a software
    developer ;) )

    Other than that, I've unfortunately got no great ideas.

    Please feel free to keep us in the loop (especially regarding the
    results of your analyses.) And, if you've got any questions
    regarding the RISC-V NCG, please feel free to reach out either
    here or directly to me. There's also a #GHC "room" on Matrix
    where you can quickly drop smaller scoped questions.

    I hope that was of any help. Best regards,

    Sven

    On Wed, Apr 16, 2025 at 10:34, Matthew Pickering
    <matthewtpicker...@gmail.com> wrote:

        Hi Daniel. I think Sven Tennie and the LoongArch contributors
        are the experts in NCG for these kinds of instruction sets. I
        have cced them.

        Cheers,

        Matt

        On Tue, Apr 15, 2025 at 5:40 PM Daniel Trujillo Viedma
        <danihacker.vie...@gmail.com> wrote:

            Hello, ghc-devs! My name is Daniel Trujillo, I'm a
            Haskell enthusiast
            from Spain and I'm trying to make my Master's thesis
            about accelerating
            Haskell programs with a custom ISA extension.


            Right now, my focus is on executing software written in
            Haskell within QEMU in order to get traces that tell me,
            basically, how many times each block (not exactly basic
            blocks, but sort of) of assembly code has been executed,
            with the hope of finding some patterns of RISCV
            instructions that I could implement together as 1
            instruction.
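
The step from per-block execution counts to candidate instruction patterns can be sketched as follows. Everything here is hypothetical: the block names, the mnemonic lists and the counts are invented, and how they would be extracted from a QEMU trace is left open.

```python
from collections import Counter


def weighted_pair_counts(blocks, exec_counts):
    """Weight each adjacent mnemonic pair by how often its block was executed.

    blocks:      {block_id: [mnemonic, ...]}
    exec_counts: {block_id: number of times the block ran}
    """
    totals = Counter()
    for block_id, mnemonics in blocks.items():
        weight = exec_counts.get(block_id, 0)  # blocks never executed weigh 0
        for pair in zip(mnemonics, mnemonics[1:]):
            totals[pair] += weight
    return totals


# Made-up example data: one hot loop body and one cold exit block.
blocks = {
    "loop": ["slli", "add", "ld", "bne"],
    "exit": ["slli", "add", "ret"],
}
exec_counts = {"loop": 1000, "exit": 1}

totals = weighted_pair_counts(blocks, exec_counts)
```

Sorting `totals` with `most_common()` then ranks pairs by dynamic frequency, so a pair in a hot loop outweighs one that merely appears often in the program text.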


            As you can see, my method is a bit crude, and I was
            wondering if the
            people involved with any of the different internal
            representations (STG,
            Cmm...) and/or native code generators (particularly
            RISCV) could provide
            me hints about assembly instructions that would have made
            the work
            easier, by removing the need for "massaging" the Cmm code
            to make CodeGen
            easier, or the need for particular optimizations, or in
            general, dirty
            tricks because of a lack of proper support in the
            standard RISCV ISA.


            And of course, I would also appreciate very much other
            hints from people
            involved in general performance (as opposed to, for
            example, libraries
            for SIMD and parallel execution, or Haskell wrappers to
            lower-level code
            for performance reasons).


            P.S. I'm sorry if I broke any netiquette rule, but I'm
            very new to the
            email list, and haven't yet received any email from it.


            Looking forward to hearing from you!

            Cheers,

            Dani.

            _______________________________________________
            ghc-devs mailing list
            ghc-devs@haskell.org
            http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs