LohithkumarAV opened a new issue, #981:
URL: https://github.com/apache/directory-scimple/issues/981
# SCIM Filter Parser Fails with Accented Characters
## Summary
The Apache Directory SCIM filter parser fails to parse SCIM search requests
when filter values contain accented/diacritic characters, returning `400 Bad
Request` with error `"Unable to map or parse JSON to SCIM schema"`.
## Environment
- **Library**: `org.apache.directory.scim:scim-spec`
- **Parser**: ANTLR-based filter parser in
`org.apache.directory.scim.spec.filter.Filter`
- **Java Version**: 17+
- **Affected Component**: `GroupService.find()` method calling
`buildFilterTree(filter)`
## Impact
- **Severity**: High
- **Scope**: Blocks SCIM RFC 7644 Section 3.13 compliance for
internationalized string normalization
- **Affected Operations**: All SCIM search operations with accented
characters in filter values
## Steps to Reproduce
### 1. Create a group with accented characters
**Request:**
```json
POST /scim/v2/Groups
{
  "schemas": ["urn:ietf:params:scim:schemas:core:2.0:Group"],
  "displayName": "José's Team"
}
```
**Response:** ✅ Success (201 Created)
```json
{
  "id": "468b6df5-80aa-4c94-ab39-75e36172d859",
  "displayName": "José's Team"
}
```
### 2. Search with exact accented characters
**Request:**
```json
POST /scim/v2/Groups/.search
{
  "schemas": ["urn:ietf:params:scim:api:messages:2.0:SearchRequest"],
  "filter": "displayName eq \"José's Team\"",
  "startIndex": 1,
  "count": 10
}
```
**Response:** ❌ Failure (400 Bad Request)
```json
{
  "status": 400,
  "scimType": "invalidSyntax",
  "error": "Unable to map or parse JSON to SCIM schema. Please check syntax and field types."
}
```
### 3. Search without accents (workaround)
**Request:**
```json
POST /scim/v2/Groups/.search
{
  "schemas": ["urn:ietf:params:scim:api:messages:2.0:SearchRequest"],
  "filter": "displayName eq \"Jose's Team\"",
  "startIndex": 1,
  "count": 10
}
```
**Response:** ✅ Success (200 OK). Returns the group with `displayName`: "José's Team".
## Test Results Matrix
| Filter Value | Expected | Actual | Status |
|-------------|----------|--------|--------|
| "José's Team" | 200 OK | 400 Bad Request | ❌ FAIL |
| "Jose's Team" | 200 OK | 200 OK | ✅ PASS |
| "JOSÉ'S TEAM" | 200 OK | 400 Bad Request | ❌ FAIL |
| "JOSE'S TEAM" | 200 OK | 200 OK | ✅ PASS |
| "Müller's Gruppe" | 200 OK | 400 Bad Request | ❌ FAIL |
| "Muller's Gruppe" | 200 OK | 200 OK | ✅ PASS |
| "Café Équipe" | 200 OK | 400 Bad Request | ❌ FAIL |
| "Cafe Equipe" | 200 OK | 200 OK | ✅ PASS |
| "Ñoño's Tëäm" | 200 OK | 400 Bad Request | ❌ FAIL |
| "Nono's Team" | 200 OK | 200 OK | ✅ PASS |
| "Åse's Øverhead" | 200 OK | 400 Bad Request | ❌ FAIL |
| "åse's øverhead" | 200 OK | 400 Bad Request | ❌ FAIL |
## Affected Character Sets
The parser fails with:
1. **Spanish accents**: José, JOSÉ
2. **German umlauts**: Müller, MÜLLER
3. **French accents**: Café, Équipe
4. **Multiple diacritics**: Ñoño's Tëäm
5. **Nordic characters**: Åse's Øverhead, åse's øverhead
## Root Cause Analysis
The ANTLR grammar used by the SCIM filter parser appears to have issues
tokenizing Unicode characters in the following contexts:
1. **Accented characters combined with apostrophes**: José's, Müller's
2. **Multiple diacritics**: Ñoño's Tëäm
3. **Nordic characters**: Åse's Øverhead
4. **French accents**: Café Équipe
The parser likely treats these as invalid token sequences rather than valid
string literals.
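Because the reported error text mentions JSON mapping rather than filter syntax, one way to narrow this down is to resend the failing request with every non-ASCII character in the filter written as a JSON Unicode escape. After JSON decoding, the filter string is identical, so if the escaped request succeeds the problem is more likely in how the request body is decoded than in the grammar; if it still fails, that points at the filter parser. The helper below is a minimal sketch for producing such a body; `FilterEscaper` and `escapeNonAscii` are hypothetical names and not part of `scim-spec`.

```java
import java.util.Locale;

final class FilterEscaper {
    private FilterEscaper() {}

    /**
     * Replaces every UTF-16 code unit outside the ASCII range with its
     * six-character JSON escape (backslash, 'u', four hex digits).
     * ASCII characters are copied unchanged, so quotes and backslashes that
     * are already JSON-escaped in the input stay as they are.
     */
    static String escapeNonAscii(String jsonText) {
        StringBuilder out = new StringBuilder(jsonText.length());
        for (int i = 0; i < jsonText.length(); i++) {
            char c = jsonText.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                // Surrogate pairs are escaped as two consecutive escapes,
                // which is how JSON represents characters beyond the BMP.
                out.append(String.format(Locale.ROOT, "\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "Jos" + e-acute + "'s Team", written with a Java escape so the
        // source file compiles regardless of its encoding.
        String filter = "displayName eq \\\"Jos\u00e9's Team\\\"";
        System.out.println(escapeNonAscii(filter));
    }
}
```

The output of `main` can be pasted into the `"filter"` field of the search request from step 2 in place of the original value.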
## Expected Behavior
According to **RFC 7644 Section 3.4.2.2** (Filtering):
> String attribute values are compared using case-insensitive matching and
SHOULD be normalized according to Section 3.13.
The filter parser should:
1. Accept any valid Unicode characters in string literals
2. Parse filter values containing accented characters without errors
3. Allow the application layer to perform normalization for comparison
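For context, the application-layer normalization mentioned in point 3 could look roughly like the sketch below. It assumes an accent-insensitive, case-insensitive comparison built on `java.text.Normalizer`, which is consistent with the workaround result above ("Jose's Team" matching "José's Team"), but it is not taken from the reporter's code or from the SCIM specification. `DisplayNameNormalizer`, `normalize`, and `matches` are hypothetical names.

```java
import java.text.Normalizer;
import java.util.Locale;

final class DisplayNameNormalizer {
    private DisplayNameNormalizer() {}

    /** Decomposes to NFD, drops combining marks, then lower-cases. */
    static String normalize(String value) {
        String decomposed = Normalizer.normalize(value, Normalizer.Form.NFD);
        String withoutMarks = decomposed.replaceAll("\\p{M}+", "");
        return withoutMarks.toLowerCase(Locale.ROOT);
    }

    /** True when both values are equal after normalization. */
    static boolean matches(String stored, String filterValue) {
        return normalize(stored).equals(normalize(filterValue));
    }

    public static void main(String[] args) {
        // Accented letters are written as Java escapes so the source file
        // compiles regardless of its encoding.
        String jose = "Jos\u00e9's Team";

        System.out.println(matches(jose, "Jose's Team"));         // true
        System.out.println(matches(jose, "JOS\u00c9'S TEAM"));    // true

        // O-with-stroke (U+00D8) has no canonical decomposition, so mark
        // stripping alone does not fold it to a plain "O".
        System.out.println(
            matches("\u00c5se's \u00d8verhead", "Ase's Overhead")); // false
    }
}
```

None of this logic can run today for accented filter values, because the request is rejected before the application receives a parsed `Filter`.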
## Actual Behavior
The parser rejects filter values containing accented characters with a
generic syntax error, preventing any normalization logic from executing.
## Code Flow Analysis
The failure occurs **before** application code is reached:
1. ❌ Apache Directory SCIM library receives HTTP POST with filter string
2. ❌ ANTLR parser attempts to tokenize: `"displayName eq \"José's Team\""`
3. ❌ Parser fails on accented characters → throws exception
4. ❌ Returns 400 Bad Request
5. ⛔ Application's `find(Filter filter, ...)` method **never called**
6. ⛔ Custom normalization logic **never executes**
**Proof**: When filter has no accents (`"Jose's Team"`), parsing succeeds
and application-level normalization correctly matches groups with accented
names.