[
https://issues.apache.org/jira/browse/VALIDATOR-501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18018592#comment-18018592
]
Kushal Dixit commented on VALIDATOR-501:
----------------------------------------
Hi [~ggregory] ,
I see you have added test cases in revision [319400a
.|https://github.com/apache/commons-validator/commit/319400a5ab8a694dd8ebc6131ae3fd20c9db0edd].
I'm not sure if a fix is already available, but I'd like to share a potential
solution I've been working on.
While the issue could potentially be solved by restructuring the existing regex
validation and {{unicodeToASCII}} conversion, I believe adding an efficient
pre-validation utility method would be a cleaner approach. This method would
check for RFC-violating hyphens *before* the IDN conversion takes place,
preventing the conversion from masking the original violations.
{code:java}
public boolean isValid(final String domain) {
if (domain == null) {
return false;
}
// Early checking for hyphens (-)
if (hasLeadingOrTrailingHyphensInLabel(domain)) {
return false;
}
// ... existing code
}
private boolean hasLeadingOrTrailingHyphensInLabel(final String domain) {
// early exit
final int n = domain.length();
if (n == 0) {
return false;
}
// remove leading white spaces.
int start = 0;
while (start < n && Character.isWhitespace(domain.charAt(start))) {
start++;
}
if (start == n) {
return false;
}
// remove trailing white spaces.
int end = n - 1;
while (end >= start && Character.isWhitespace(domain.charAt(end))) {
end--;
}
// check first and last char is (-) just for early exit
if (domain.charAt(start) == '-' || domain.charAt(end) == '-') {
return true;
}
for (int i = start + 1; i <= end; i++) {
if (domain.charAt(i) == '.') {
// Find last non-whitespace before (.)
int prevPos = i - 1;
while (prevPos >= start &&
Character.isWhitespace(domain.charAt(prevPos))) {
prevPos--;
}
if (prevPos >= start && domain.charAt(prevPos) == '-') {
return true;
}
// Find first non-whitespace after (.)
int nextPos = i + 1;
while (nextPos < end &&
Character.isWhitespace(domain.charAt(nextPos))) {
nextPos++;
}
if (nextPos < end && domain.charAt(nextPos) == '-') {
return true;
}
}
}
// finally return false if (-) is found no-where.
return false;
}{code}
This is most efficient version of {{hasLeadingOrTrailingHyphensInLabel.}}
If code readability is prioritised over performance, simpler implementations
using {{trim()}} and {{split()}} would also work, but the core principle
remains the same: {*}validate hyphen placement before IDN conversion{*}.
We could also add an early length check in the utility method to fail fast on
obviously invalid domains for further pruning.
{code:java}
if (domain.length() > MAX_DOMAIN_LENGTH) {
return true; // invalid domain
} {code}
Let me know your thoughts on this approach!
{*}Note{*}: I have tested with different scenarios and it seems to be working
for most of the cases, I found white spaces were creating some problem for ex.
{{www. -google.com}} hence I handled it through different while loops.
> DomainValidator accepts hyphens at start/end of domain name with Unicode
> characters
> -----------------------------------------------------------------------------------
>
> Key: VALIDATOR-501
> URL: https://issues.apache.org/jira/browse/VALIDATOR-501
> Project: Commons Validator
> Issue Type: Bug
> Reporter: Victoria Dimitrova
> Priority: Major
> Labels: BUG, validator
>
> The method `DomainValidator.getInstance().isValid()` returns `true` for
> invalid domain names that start or end with a hyphen when the domain name
> contains Unicode characters.
> This behavior is incorrect according to the domain name specifications (RFC
> 1035, RFC 5890), which do not allow:
> - Domain labels starting or ending with a hyphen (`-`)
> Test case that should pass but fails on version 1.10.0:
> {code:java}
> @Test
> void shouldBeInvalid()
> {
> assertAll(() ->
> assertFalse(DomainValidator.getInstance().isValid("-test.fr")),
> () -> assertFalse(DomainValidator.getInstance().isValid("-tést.fr")),
> () -> assertFalse(DomainValidator.getInstance().isValid("test-.fr")),
> () -> assertFalse(DomainValidator.getInstance().isValid("tést-.fr")));
> }
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)