This is an automated email from the ASF dual-hosted git repository.
janhoy pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/solr.git
The following commit(s) were added to refs/heads/main by this push:
new 19ec326349d SOLR-17979 Improve changes2html.py for authors and PR detection (#3831)
19ec326349d is described below
commit 19ec326349d1f31d6735f3b8c11b793503c77f43
Author: Jan Høydahl <[email protected]>
AuthorDate: Tue Nov 4 11:04:17 2025 +0100
SOLR-17979 Improve changes2html.py for authors and PR detection (#3831)
- Authors with url and github nick handled
- Plain PR ref `#123` detected as PR#123 with github link
- Correct a changelog yaml missing JIRA issue
- Fix links in dev-docs/changelog.adoc
- Describe logchangeArchive task.
- Remove mention of Perl as a requirement for build
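As a rough illustration of the behaviour described in the bullets above (a minimal sketch only; the sample input line, the regexes and the URL prefixes are assumptions for illustration, not code taken from this patch), a changelog entry carrying a plain PR ref and an author with a GitHub nick might be rendered roughly like this:

import re

# Hypothetical input line; the real script parses markdown changelog entries.
line = "Improve changelog parsing #123 (Jan Høydahl @janhoy)"

# Detect a plain PR reference such as #123 and render it as PR#123 with a link.
pr = re.search(r"#(\d+)", line)
if pr:
    print(f'<a href="https://github.com/apache/solr/pull/{pr.group(1)}">PR#{pr.group(1)}</a>')

# Detect an author given as "Name @nick" and link the name to the GitHub profile.
author = re.search(r"\(([^@)]+)@(\w+)\)", line)
if author:
    name, nick = author.group(1).strip(), author.group(2)
    print(f'<a href="https://github.com/{nick}">{name}</a>')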
---
...7619 Use logchange for changelog management.yml | 3 +
dev-docs/changelog.adoc | 11 +-
dev-docs/how-to-contribute.adoc | 2 +-
dev-docs/solr-source-code.adoc | 2 +-
.../documentation/changes-to-html/changes2html.py | 258 ++++++++++++++++++---
5 files changed, 243 insertions(+), 33 deletions(-)
diff --git a/changelog/v9.10.0/SOLR-17619 Use logchange for changelog management.yml b/changelog/v9.10.0/SOLR-17619 Use logchange for changelog management.yml
index 63a14ab5278..238a628cab7 100644
--- a/changelog/v9.10.0/SOLR-17619 Use logchange for changelog management.yml
+++ b/changelog/v9.10.0/SOLR-17619 Use logchange for changelog management.yml
@@ -5,3 +5,6 @@ authors:
- name: Jan Høydahl
nick: janhoy
url: https://home.apache.org/phonebook.html?uid=janhoy
+links:
+ - name: SOLR-17619
+ url: https://issues.apache.org/jira/browse/SOLR-17619
diff --git a/dev-docs/changelog.adoc b/dev-docs/changelog.adoc
index fc4aa72cbc5..6e0f2973230 100644
--- a/dev-docs/changelog.adoc
+++ b/dev-docs/changelog.adoc
@@ -37,7 +37,7 @@ solr/
== 3. The YAML format
-Below is an example of a changelog yaml fragment. The full yaml format is xref:https://logchange.dev/tools/logchange/reference/#tasks[documented here], but we normally only need `title`, `type`, `authors` and `links`. For a change without a JIRA, you can add the PR number in `issues`:
+Below is an example of a changelog yaml fragment. The full yaml format is https://logchange.dev/tools/logchange/reference/#yaml-entry-format[documented here], but we normally only need `title`, `type`, `authors` and `links`. For a change without a JIRA, you can add the PR number in `issues`:
[source, yaml]
----
@@ -120,8 +120,13 @@ The logchange gradle plugin offers some tasks, here are the two most important:
| `logchangeRelease`
| Creates a new changelog release by moving files from `changelog/unreleased/` directory to `changelog/vX.Y.Z` directory
+
+| `logchangeArchive`
+| Archives the list of released versions up to (and including) the specified version by transferring their summaries to the `archive.md` file, merging all existing archives, and deleting the corresponding version directories.
|===
+The `logchangeRelease` and `logchangeGenerate` tasks are used by ReleaseWizard. The `logchangeArchive` task can be run once for every major release or when the number of versioned changelog folders grows too large.
+
These are integrated in the Release Wizard.
=== 6.2 Migration tool
@@ -242,5 +247,5 @@ Example report output (Json or Markdown):
== 7. Further Reading
-* xref:https://github.com/logchange/logchange[Logchange web page]
-* xref:https://keepachangelog.com/en/1.1.0/[keepachangelog.com website]
+* https://github.com/logchange/logchange[Logchange web page]
+* https://keepachangelog.com/en/1.1.0/[keepachangelog.com website]
diff --git a/dev-docs/how-to-contribute.adoc b/dev-docs/how-to-contribute.adoc
index 006232b38a8..7621eb7b8e7 100644
--- a/dev-docs/how-to-contribute.adoc
+++ b/dev-docs/how-to-contribute.adoc
@@ -33,7 +33,7 @@ In order to make a new contribution to Solr you will use the fork you have creat
1. Create a new Jira issue in the Solr project: https://issues.apache.org/jira/projects/SOLR/issues
2. Create a new branch in your Solr fork to provide a PR for your contribution on the newly created issue. Make any necessary changes for the given bug/feature in that branch. You can use additional information in these dev-docs to build and test your code as well as ensure it passes code quality checks.
3. Once you are satisfied with your changes, get your branch ready for a PR by running `./gradlew tidy updateLicenses check -x test`. This will format your source code, update licenses of any dependency version changes and run all pre-commit tests. Commit the changes.
-* Note: the `check` command requires `perl` and `python3` to be present on your `PATH` to validate documentation.
+* Note: the `check` command requires `python3` to be present on your `PATH` to validate documentation.
4. Open a PR of your branch against the `main` branch of the apache/solr repository. When you open a PR on your fork, this should be the default option.
* The title of your PR should include the Solr Jira issue that you opened, i.e. `SOLR-12345: New feature`.
* The PR description will automatically populate with a pre-set template that you will need to fill out.
diff --git a/dev-docs/solr-source-code.adoc b/dev-docs/solr-source-code.adoc
index 5874a2e06ec..fd22d4e0c7d 100644
--- a/dev-docs/solr-source-code.adoc
+++ b/dev-docs/solr-source-code.adoc
@@ -34,7 +34,7 @@ To build the documentation, type `./gradlew -p solr documentation`.
`./gradlew check` will assemble Solr and run all validation tasks and unit tests.
-NOTE: the `check` command requires `perl` and `python3` to be present on your `PATH` to validate documentation.
+NOTE: the `check` command requires `python3` to be present on your `PATH` to validate documentation.
To build the final Solr artifacts run `./gradlew assemble`.
diff --git a/gradle/documentation/changes-to-html/changes2html.py b/gradle/documentation/changes-to-html/changes2html.py
index 4827efa40d1..67c2bbb592c 100755
--- a/gradle/documentation/changes-to-html/changes2html.py
+++ b/gradle/documentation/changes-to-html/changes2html.py
@@ -138,40 +138,232 @@ class HTMLGenerator:
self.GITHUB_ISSUE_PREFIX, 'GITHUB#{0}')
]
- def extract_issue_from_text(self, text):
+ def _format_issue_link(self, url_prefix, issue_id, label):
+ """Format a single issue reference as an HTML anchor tag"""
+ return f'<a href="{url_prefix}{issue_id}">{label}</a>'
+
+ def _extract_markdown_issue(self, text):
"""
- Extract the first JIRA/GitHub issue from markdown text.
- Returns (issue_link_html, text_without_issue)
+ Extract markdown-formatted JIRA/GitHub issues like [SOLR-123](url) or [PR#123](url).
+ Returns (issue_link_html, text_without_issue) or (None, text) if not found.
"""
for pattern, url_prefix, label_fmt in self.issue_patterns:
match = re.search(pattern, text)
if match:
issue_id = match.group(1)
label = label_fmt.format(issue_id)
- issue_html = f'<a href="{url_prefix}{issue_id}">{label}</a>'
+ issue_html = self._format_issue_link(url_prefix, issue_id, label)
text_without = (text[:match.start()] + text[match.end():]).strip()
return issue_html, text_without
+
return None, text
+ def _extract_plain_pr_references(self, text):
+ """
+ Extract plain GitHub PR references like #123 or #123 #456.
+ Only matches PRs that appear before the author list (before opening paren or at end).
+ Returns (issue_link_html, text_without_issue) or (None, text) if not found.
+ """
+ # Pattern: #\d+ optionally followed by more #\d+ before opening paren or end of string
+ pattern = r'#(\d+)(?:\s+#(\d+))*\s*(?=\(|$)'
+ match = re.search(pattern, text)
+
+ if not match:
+ return None, text
+
+ # Extract all PR numbers from the matched text
+ pr_numbers = re.findall(r'#(\d+)', match.group(0))
+ if not pr_numbers:
+ return None, text
+
+ # Format each PR as an HTML link and join with commas
+ pr_links = [self._format_issue_link(self.GITHUB_PR_PREFIX, pr_num, f'PR#{pr_num}')
+ for pr_num in pr_numbers]
+ issue_html = ', '.join(pr_links)
+
+ # Remove the PR references from the text
+ text_without = (text[:match.start()] + text[match.end():]).strip()
+ return issue_html, text_without
+
+ def extract_issue_from_text(self, text):
+ """
+ Extract the first issue reference from text.
+ Tries in order: markdown JIRA/GitHub issues, plain GitHub PR references.
+ Returns (issue_link_html, text_without_issue) or (None, text) if not found.
+ """
+ # Try markdown-formatted issues first
+ issue_html, text_without = self._extract_markdown_issue(text)
+ if issue_html:
+ return issue_html, text_without
+
+ # Fall back to plain GitHub PR references
+ return self._extract_plain_pr_references(text)
+
+ def _format_single_author(self, author_text):
+ """
+ Format a single author entry to HTML.
+ Supports:
+ - Plain name: "Jan Høydahl" -> "Jan Høydahl"
+ - Markdown link: "[Jan Høydahl](url)" -> "<a href=\"url\">Jan Høydahl</a>"
+ - Name with GitHub: "Jan Høydahl @janhoy" -> "<a href=\"https://github.com/janhoy\">Jan Høydahl</a>"
+ - Link with GitHub: "[Jan Høydahl](url) @janhoy" -> "<a href=\"url\">Jan Høydahl</a> <a href=\"https://github.com/janhoy\">@janhoy</a>"
+ """
+ author_text = author_text.strip()
+
+ # Extract markdown link: [text](url)
+ markdown_link_match = re.search(r'\[([^\]]+)\]\(([^)]+)\)', author_text)
+ # Extract GitHub handle: @username
+ github_match = re.search(r'@(\w+)', author_text)
+
+ if markdown_link_match:
+ # Has markdown link
+ link_text = markdown_link_match.group(1)
+ link_url = markdown_link_match.group(2)
+ html = f'<a href="{link_url}">{self.escape_html(link_text)}</a>'
+
+ if github_match:
+ # Has both markdown link and GitHub handle
+ github_handle = github_match.group(1)
+ html += f' <a href="https://github.com/{github_handle}">@{github_handle}</a>'
+
+ return html
+ elif github_match:
+ # Has GitHub handle but no markdown link - extract name and link it to GitHub
+ github_handle = github_match.group(1)
+ # Remove the @handle part to get just the name
+ name = author_text.replace(f'@{github_handle}', '').strip()
+ return f'<a href="https://github.com/{github_handle}">{self.escape_html(name)}</a>'
+ else:
+ # Plain name with no links
+ return self.escape_html(author_text)
+
+ def _extract_one_author_group(self, text, start_pos):
+ """
+ Extract one author group starting from start_pos (pointing to an opening paren).
+ Returns (author_content, end_pos) or (None, start_pos) if no valid group.
+ Handles markdown links [text](url) inside the group.
+ """
+ if start_pos >= len(text) or text[start_pos] != '(':
+ return None, start_pos
+
+ paren_depth = 0
+ bracket_depth = 0
+ content = []
+
+ for i in range(start_pos, len(text)):
+ char = text[i]
+
+ # Track brackets to know if we're inside [text]
+ if char == '[' and bracket_depth >= 0:
+ bracket_depth += 1
+ elif char == ']' and bracket_depth > 0:
+ bracket_depth -= 1
+ # Only track paren depth outside brackets
+ elif bracket_depth == 0:
+ if char == '(':
+ paren_depth += 1
+ elif char == ')':
+ paren_depth -= 1
+ if paren_depth == 0:
+ # Found matching closing paren
+ return ''.join(content[1:]).strip(), i # Skip opening paren
+
+ content.append(char)
+
+ return None, start_pos
+
def extract_authors(self, text):
- """Extract authors from trailing parentheses"""
- # Match (author1) (author2) ... at the end
- match = re.search(r'\s*(\([^)]+(?:\)\s*\([^)]+)*\))\s*$', text)
- if match:
- authors_text = match.group(1)
- text_without_authors = text[:match.start()].strip()
-
- # Parse individual authors
- authors = re.findall(r'\(([^)]+)\)', authors_text)
- authors_list = []
- for author_group in authors:
- # Split by comma or "and"
- for author in re.split(r',\s*|\s+and\s+', author_group):
+ """Extract authors from trailing parentheses, handling markdown links [text](url)"""
+ authors_list = []
+
+ # Find all author groups at the end of the text
+ # Work backwards from the end to find opening parentheses
+ i = len(text) - 1
+
+ # Skip trailing whitespace
+ while i >= 0 and text[i] in ' \t\n\r':
+ i -= 1
+
+ if i < 0 or text[i] != ')':
+ return None, text
+
+ # Find all complete author groups by working backwards
+ author_positions = [] # List of (start, end) positions
+
+ while i >= 0:
+ if text[i] == ')':
+ # Find the matching opening paren for this closing paren
+ paren_depth = 1
+ bracket_depth = 0
+ j = i - 1
+
+ while j >= 0 and paren_depth > 0:
+ char = text[j]
+
+ # Track brackets
+ if char == ']':
+ bracket_depth += 1
+ elif char == '[':
+ bracket_depth -= 1
+ # Track parens outside brackets
+ elif bracket_depth == 0:
+ if char == ')':
+ paren_depth += 1
+ elif char == '(':
+ paren_depth -= 1
+
+ j -= 1
+
+ if paren_depth == 0:
+ # Found matching opening paren at j+1
+ start_pos = j + 1
+
+ # Check if this is part of a markdown link [text](url)
+ # Markdown links have ] immediately before the (
+ if start_pos > 0 and text[start_pos - 1] == ']':
+ # This is a markdown link URL, not an author group
+ # Continue searching backwards
+ i = j
+ else:
+ # This is an author group
+ author_positions.insert(0, (start_pos, i))
+
+ # Move past this group
+ i = j
+
+ # Skip whitespace before next potential group
+ while i >= 0 and text[i] in ' \t\n\r':
+ i -= 1
+
+ # Check if there's another author group right before
+ if i >= 0 and text[i] != ')':
+ # No more author groups
+ break
+ else:
+ break
+ else:
+ break
+
+ # Now process the found author groups
+ if author_positions:
+ # Extract text before first author group
+ first_start = author_positions[0][0]
+ text_without_authors = text[:first_start].strip()
+
+ # Extract and format each author group
+ for start_pos, end_pos in author_positions:
+ author_content = text[start_pos + 1:end_pos]
+
+ # Split by comma or "and" for multiple authors in one group
+ for author in re.split(r',\s*|\s+and\s+', author_content):
author = author.strip()
if author:
- authors_list.append(author)
+ formatted_author = self._format_single_author(author)
+ authors_list.append(formatted_author)
+
+ if authors_list:
+ return authors_list, text_without_authors
- return authors_list, text_without_authors
return None, text
def format_changelog_item(self, item_text):
@@ -183,17 +375,27 @@ class HTMLGenerator:
# Extract the issue
issue_html, text_after_issue = self.extract_issue_from_text(item_text)
- if not issue_html:
- return self.linkify_remaining_text(item_text)
+ # Always try to extract authors, whether or not we found an issue
+ authors_list, description = self.extract_authors(text_after_issue if issue_html else item_text)
- # Extract authors and clean description
- authors_list, description = self.extract_authors(text_after_issue)
- description = re.sub(r'^[:\s]+', '', description).strip()
-
- # Build HTML
- html = f'{issue_html}: {self.escape_html(description)}'
+ if issue_html:
+ # We have an issue link
+ description = re.sub(r'^[:\s]+', '', description).strip()
+ html = f'{issue_html}: {self.escape_html(description)}'
+ else:
+ # No issue link found
+ if authors_list:
+ # We have authors but no issue - just use the description part
+ html = self.escape_html(description)
+ else:
+ # No issue and no authors - linkify the full text
+ return self.linkify_remaining_text(item_text)
+
+ # Add authors if we have them
if authors_list:
- html += f'<br /><span class="attrib">({self.escape_html(", ".join(authors_list))})</span>'
+ # Authors are already formatted as HTML, don't escape
+ html += f'<br /><span class="attrib">({", ".join(authors_list)})</span>'
+
return html
def linkify_remaining_text(self, text):