[ 
https://issues.apache.org/jira/browse/VALIDATOR-501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18018592#comment-18018592
 ] 

Kushal Dixit commented on VALIDATOR-501:
----------------------------------------

Hi [~ggregory] ,

I see you have added test cases in revision [319400a 
.|https://github.com/apache/commons-validator/commit/319400a5ab8a694dd8ebc6131ae3fd20c9db0edd].
 I'm not sure if a fix is already available, but I'd like to share a potential 
solution I've been working on.

While the issue could potentially be solved by restructuring the existing regex 
validation and {{unicodeToASCII}} conversion, I believe adding an efficient 
pre-validation utility method would be a cleaner approach. This method would 
check for RFC-violating hyphens *before* the IDN conversion takes place, 
preventing the conversion from masking the original violations.
{code:java}
public boolean isValid(final String domain) {
    if (domain == null) {
        return false;
    }

    // Early checking for hyphens (-)
    if (hasLeadingOrTrailingHyphensInLabel(domain)) {
        return false;
    }

    // ... existing code
}

private boolean hasLeadingOrTrailingHyphensInLabel(final String domain) {
    // early exit
    final int n = domain.length();
    if (n == 0) {
        return false;
    }

    // remove leading white spaces.
    int start = 0;
    while (start < n && Character.isWhitespace(domain.charAt(start))) {
        start++;
    }

    if (start == n) {
        return false;
    }

    // remove trailing white spaces.
    int end = n - 1;
    while (end >= start && Character.isWhitespace(domain.charAt(end))) {
        end--;
    }

    // check first and last char is (-) just for early exit
    if (domain.charAt(start) == '-' || domain.charAt(end) == '-') {
        return true;
    }

    for (int i = start + 1; i <= end; i++) {
        if (domain.charAt(i) == '.') {
            // Find last non-whitespace before (.)
            int prevPos = i - 1;
            while (prevPos >= start && 
Character.isWhitespace(domain.charAt(prevPos))) {
                prevPos--;
            }
            if (prevPos >= start && domain.charAt(prevPos) == '-') {
                return true;
            }

            // Find first non-whitespace after (.)
            int nextPos = i + 1;
            while (nextPos < end && 
Character.isWhitespace(domain.charAt(nextPos))) {
                nextPos++;
            }
            if (nextPos < end && domain.charAt(nextPos) == '-') {
                return true;
            }
        }
    }

    // finally return false if (-) is found no-where.
    return false;
}{code}
This is most efficient version of {{hasLeadingOrTrailingHyphensInLabel.}}

If code readability is prioritised over performance, simpler implementations 
using {{trim()}} and {{split()}} would also work, but the core principle 
remains the same: {*}validate hyphen placement before IDN conversion{*}.
 
We could also add an early length check in the utility method to fail fast on 
obviously invalid domains for further pruning.
{code:java}
if (domain.length() > MAX_DOMAIN_LENGTH) {
    return true; // invalid domain
} {code}
Let me know your thoughts on this approach!

{*}Note{*}: I have tested with different scenarios and it seems to be working 
for most of the cases, I found white spaces were creating some problem for ex. 
{{www.  -google.com}} hence I handled it through different while loops.

> DomainValidator accepts hyphens at start/end of domain name with Unicode 
> characters
> -----------------------------------------------------------------------------------
>
>                 Key: VALIDATOR-501
>                 URL: https://issues.apache.org/jira/browse/VALIDATOR-501
>             Project: Commons Validator
>          Issue Type: Bug
>            Reporter: Victoria Dimitrova
>            Priority: Major
>              Labels: BUG, validator
>
> The method `DomainValidator.getInstance().isValid()` returns `true` for 
> invalid domain names that start or end with a hyphen when the domain name 
> contains Unicode characters.
> This behavior is incorrect according to the domain name specifications (RFC 
> 1035, RFC 5890), which do not allow:
>  - Domain labels starting or ending with a hyphen (`-`)
> Test case that should pass but fails on version 1.10.0:
> {code:java}
> @Test
> void shouldBeInvalid()
> {
>  assertAll(() -> 
> assertFalse(DomainValidator.getInstance().isValid("-test.fr")),  
> () -> assertFalse(DomainValidator.getInstance().isValid("-tést.fr")),
> () -> assertFalse(DomainValidator.getInstance().isValid("test-.fr")),    
> () -> assertFalse(DomainValidator.getInstance().isValid("tést-.fr")));
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to