https://bz.apache.org/bugzilla/show_bug.cgi?id=70024

Philippe Cloutier <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|RewriteRule: clarify what   |RewriteRule: misleading
                   |Pattern matches and what    |about the Pattern’s subject
                   |Substitution replaces       |string and what
                   |across contexts             |Substitution replaces (not
                   |                            |always “original URL-path”)
         Resolution|FIXED                       |---
             Status|RESOLVED                    |REOPENED

--- Comment #15 from Philippe Cloutier <[email protected]> ---
(In reply to Rich Bowen from comment #7)
> Ok, I'll have to look at the patch tomorrow.
(In reply to Rich Bowen from comment #8)
> I will attempt to review the patch at some point, but unlikely to get to it
> today.
Thanks. Note that this reports a defect of normal importance which has been
present for over 19 years:
https://github.com/apache/httpd/commit/fc5122253ab7368941ab610077bc5948911b9117
There is no urgency to fix, and I would much prefer to be told about a complete
fix without regressions in 1 month than about a rushed fix this week which
turns out to be partial.

> I'm going to attempt to retitle this ticket to reflect what it's
> actually about, but I'm still kind of unsure what it's actually intended to
> address. "Incoherent" does not communicate to me what you're actually
> objecting to.
OK, I replaced with a title pointing more precisely to where I am confident the
error is.

(In reply to Rich Bowen from comment #10)
> 1. Split into smaller patches
> 
> This bundles at least 4 independent changes. Reviewing and committing
> separately would be much easier:
> 
> - (A) Add [!] to the syntax line — trivial, can commit immediately
> - (B) Consolidate the Negation note — low-risk reformatting
> - (C) Rewrite "What is matched?" section
> - (D) Restructure Substitution section from <dl> to nested <ul>
I am an advocate of commit atomicity; this section’s slow evolution is not the
only reason why I tied several changes together. The changes are not
independent. For example, defining when a rule triggers depends on fixing the
description for no match rules (A). But well, feel free to commit as you wish
if you think it's realistic.

> 2. Guide vs. reference doc placement
> 
> The expanded Rewrite Guide (docs/manual/rewrite/) has recently been
> substantially rewritten. Much of what this patch adds to the directive
> synopsis is already covered:
> 
> - VirtualHost vs. per-directory matching → rewrite/htaccess.xml §"What URL
> does the rule see?"
> - Directory-path stripping → rewrite/htaccess.xml §"path-stripping"
> - Sequential rule application → rewrite/tech.xml §"API Phases"
> 
> The httpd docs deliberately separate terse reference (mod/*.xml) from
> explanatory guide content (rewrite/*.xml). I'd prefer to keep the directive
> synopsis concise and cross-reference the guide rather than duplicating
> tutorial-level explanations inline.
I fail to see your point here. The patch’s goal is to fix the reference. If you
see “tutorial-level explanations” that appear added, they must simply have been
moved.

> 3. On "subject-path" terminology
> 
> I'd rather not introduce new terminology here. The relevant concepts already
> have well-defined names in RFC 7230/9110 (request-target, path) and in
> httpd's own vocabulary (URL-path, filesystem path). Coining "subject-path"
> doesn't clarify — it adds a term readers then have to map back to the
> standard ones.
httpd introduces terms like "directory-path" and "file-path" even if these
could be replaced using standard terms like "pathname". Any term does require
readers to learn and remember, although tooltips (a la
https://www.glossaryplugin.com/demo/) would help.

> The current approach of explaining contextually what the
> pattern is matched against is sufficient.
If the current approach was sufficient, we would not see “the original
URL-path” or that kind of shortcut spread through the documentation. Expanding
a proper definition each time requires effort from writers, and lengthens what
readers go through.

> 4. Technical concerns
> 
> - "The subject-path and directives such as DocumentRoot and Alias determine
> the currently mapped filesystem path" — this inverts the causality.
> DocumentRoot/Alias determine the mapped path; the prefix is stripped from
> that to produce what the pattern sees.
As the “What is matched?” section explains, the currently mapped filesystem
path is not only determined by DocumentRoot/Alias:
https://httpd.apache.org/docs/2.4/en/mod/mod_rewrite.html#rewriterule

> - "satisfied if and only if the pattern has at least 1 match" — a regex
> either matches a string or it doesn't.
Not really. Without going into details, if you test a regex against "subject
string", you are testing tens of substrings. If the pattern is "s", there is 1
match for [0,1] and another for [8,9]. See pcre2demo(3) and its "-g" switch.
However, mod_rewrite does not find all of these; a rule triggers as soon as the
subject-path has at least 1 matching substring.

That being said, your comment is valuable, highlighting that mod_rewrite
assumes no more than 1 match. The documentation needs to explain which match is
considered for back-references.

> Suggested path forward:
> 
> - […]
> - For C and D: consider whether the content you want to add is already in
> the Rewrite Guide docs. If it's not there, that's where it should go — with
> a cross-reference from the directive synopsis.
The little content this adds only makes up for {under,mis}-specified parts. It
first belongs in specification, although nothing prevents adding it or adapting
it for the guide.


(In reply to Rich Bowen from comment #11)
> Y'know, the more I look at this, the more I think this is wrong. This patch
> implies that '!' is separate from "the pattern" - ie, the regular expression
> - and it is not. the '!' is part of the regex.
> 
> Meanwhile, the part of the doc that starts with "the NOT character", also
> seems to suggest that ! is somehow part of mod_rewrite's logic, when, in
> fact, it's just a usual part of the pcre engine.
(In reply to Rich Bowen from comment #13)
> `!` is a standard part of regular expression matching, which is done in the
> library that we use in this module.
You are not entirely wrong, but mod_rewrite is !simple. PCRE supports “^” and
“!” for negation, but these are only supported in specific contexts. When the
first argument starts with an exclamation mark, there would be no such context
if the mark was integrated in the pattern, so it would be literal was it not
for special treatment: https://regexr.com/8n498
To avoid further complicating this already non-trivial ticket, this is now
tracked in #70061.


(In reply to Rich Bowen from comment #14)
> Declines the inaccurate portions of the offered patch.
If you see any new inaccuracy which the patch adds, please indicate it.

> r1934224 consolidates the regular expression advice (in particular, the
> discussion of negation with !) to the rewrite/intro.xml doc, rather than
> duplicating it here.
> […] 
> Discussion of how RewriteRule matches requests, and the portions of the HTTP
> request that are (or can be) considered, is covered in detail in
> rewrite/intro.html and rewrite/htaccess.html, and so is not duplicated here.
> 
> Ideally, we'll reduce that duplication even further in the coming days --
> mod/mod_rewrite.html is intended to be the "just the facts" reference
> manual, and rewrite/* is intended to be the in-depth hand-holding tutorial.
Specifying how RewriteRule matches is not (just) hand-holding material. This is
core behavior for the reference manual, not just for a guide which
“supplements” it.

I agree the task is already large enough to avoid duplication, but I am
skeptical that linking to the RewriteRule Basics section, which is inaccurate
and largely redundant with RewriteRule Directive, is desirable.

> Closing FIXED, but will follow up with moving more of this "what is matched"
> content into the Rewrite Guide folder, and leave references to that content
> in the reference manual.
There is some improvement in your changes (in particular, I appreciate that
this better sticks to a specification, moving advice elsewhere), but the issue
this tracks (the 3 or 4 issues Claude described) unfortunately persists.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to