On Tue, 23 Nov 2021 12:39:45 GMT, Pavel Rappo <pra...@openjdk.org> wrote:

>> You may be somewhat missing the point I was trying to make.
>>
>> You have two impls of `CoarseParser`, both of which contain a regular
>> expression for the parsing, hidden inside their private matcher field.
>>
>> The only other functionality of `CoarseParser` is `payloadEnd` and
>> `markupStart`.
>>
>> My suggestion is to start by updating each of the regex with named groups
>> for the payload and markup parts of the line, such that you can derive
>> `payloadEnd` and `markupStart` from the appropriate named groups.
>>
>> At that point, the only thing unique about the impls of `CoarseParser` is
>> their regex, and that regex could become a property of the `Language` object.
>>
>>> As you also note, "end-of-line comments" and "comment lines" differ. I
>>> couldn't quickly come up with a regex that accounts for both of them.
>>
>> To be clear, I am _not_ suggesting a single regex. I am suggesting a regex
>> per supported language.
>
> I abstracted out the mechanics behind the `CoarseParser` precisely because I
> couldn't come up with a simple way to derive `payloadEnd` and `markupStart`
> using only groups, be they named or otherwise. My regex fu is not strong
> enough. If you could find a way that does not look too ugly and passes the
> added `TestLangProperties` test, be my guest.

I think the following regexes with named groups should yield the same results as yours:

    "^(?<payload>.*)//(?<markup>\\s*@\\s*\\w+.+?)$"
    "^(?<payload>[ \t]*([#!].*)?)[#!](?<markup>\\s*@\\s*\\w+.+?)$"

(Note that I didn't try running them through the test, maybe I should have :)

-------------

PR: https://git.openjdk.java.net/jdk/pull/6397
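
For illustration, here is a minimal sketch of how `payloadEnd` and `markupStart` might be derived from the two named-group regexes proposed above. The class `NamedGroupSketch`, the holder `Split`, and the method `split` are hypothetical names invented for this example and are not part of the PR. The sketch assumes that `payloadEnd` corresponds to the end offset of the `payload` group and `markupStart` to the start offset of the `markup` group. As noted above, the regexes were not run through `TestLangProperties`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; none of these names come from the PR.
final class NamedGroupSketch {

    // Per-language regexes, copied from the message above.
    static final Pattern JAVA = Pattern.compile(
            "^(?<payload>.*)//(?<markup>\\s*@\\s*\\w+.+?)$");
    static final Pattern PROPERTIES = Pattern.compile(
            "^(?<payload>[ \t]*([#!].*)?)[#!](?<markup>\\s*@\\s*\\w+.+?)$");

    // Holder for the two offsets (assumed meaning: end of the payload
    // group and start of the markup group within the line).
    static final class Split {
        final int payloadEnd;
        final int markupStart;

        Split(int payloadEnd, int markupStart) {
            this.payloadEnd = payloadEnd;
            this.markupStart = markupStart;
        }
    }

    // Returns the offsets if the line carries markup, or null otherwise.
    static Split split(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new Split(m.end("payload"), m.start("markup"));
    }

    public static void main(String[] args) {
        String line = "int x = 1; // @highlight substring=\"x\"";
        Split s = split(JAVA, line);
        System.out.println("payload: [" + line.substring(0, s.payloadEnd) + "]");
        System.out.println("markup:  [" + line.substring(s.markupStart) + "]");
    }
}
```

With this shape, the only per-language difference is the `Pattern` itself, which is what would let the regex become a property of the `Language` object.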