On Tue, 23 Nov 2021 12:39:45 GMT, Pavel Rappo <pra...@openjdk.org> wrote:

>> You may be somewhat missing the point I was trying to make.
>> 
>> You have two impls of `CoarseParser`, both of which contain a regular 
>> expression for the parsing, hidden inside their private matcher field.
>> 
>> The only other functionality of `CoarseParser` is `payloadEnd` and 
>> `markupStart`.
>> 
>> My suggestion is to start by updating each of the regexes with named groups 
>> for the payload and markup parts of the line, such that you can derive 
>> `payloadEnd` and `markupStart` from the appropriate named groups.
>> 
>> At that point, the only thing unique about the impls of `CoarseParser` is 
>> their regex, and that regex could become a property of the `Language` object.
>> 
>>> As you also note, "end-of-line comments" and "comment lines" differ. I 
>>> couldn't quickly come up with a regex that accounts for both of them.
>> 
>> To be clear, I am _not_ suggesting a single regex.  I am suggesting a regex 
>> per supported language.
>
> I abstracted out the mechanics behind the `CoarseParser` precisely because I 
> couldn't come up with a simple way to derive `payloadEnd` and `markupStart` 
> using only groups, be they named or otherwise. My regex fu is not strong 
> enough. If you could find a way that does not look too ugly and passes the 
> added `TestLangProperties` test, be my guest.

I think the following regexes with named groups should yield the same results 
as yours:

    "^(?<payload>.*)//(?<markup>\\s*@\\s*\\w+.+?)$"

    "^(?<payload>[ \t]*([#!].*)?)[#!](?<markup>\\s*@\\s*\\w+.+?)$"

(Note that I didn't try running them through the test; maybe I should have. :)
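
For illustration, here is a minimal, untested-against-the-PR sketch of how 
`payloadEnd` and `markupStart` might be read off the named groups. It assumes 
that `payloadEnd` corresponds to the end of the `payload` group and 
`markupStart` to the start of the `markup` group; the class, constant, and 
method names (`CoarseParserSketch`, `JAVA`, `PROPERTIES`, `split`) are 
hypothetical and not taken from the PR.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the code from the PR.
final class CoarseParserSketch {

    // The two regexes suggested above, one per supported language.
    static final Pattern JAVA = Pattern.compile(
            "^(?<payload>.*)//(?<markup>\\s*@\\s*\\w+.+?)$");
    static final Pattern PROPERTIES = Pattern.compile(
            "^(?<payload>[ \t]*([#!].*)?)[#!](?<markup>\\s*@\\s*\\w+.+?)$");

    // Returns {payloadEnd, markupStart} for a single line (no line
    // terminator), or null if the line carries no markup comment.
    // Assumption: payloadEnd is the end of the "payload" group and
    // markupStart is the start of the "markup" group.
    static int[] split(Pattern perLanguageRegex, String line) {
        Matcher m = perLanguageRegex.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new int[] { m.end("payload"), m.start("markup") };
    }

    public static void main(String[] args) {
        String line = "int x = 1; // @highlight substring=\"x\"";
        int[] r = split(JAVA, line);
        System.out.println("payload: [" + line.substring(0, r[0]) + "]");
        System.out.println("markup:  [" + line.substring(r[1]) + "]");
    }
}
```

With something like this, the only per-language state is the `Pattern`, which 
is what would let the regex become a property of the `Language` object.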

-------------

PR: https://git.openjdk.java.net/jdk/pull/6397
