saksenasonali opened a new pull request, #672:
URL: https://github.com/apache/atlas/pull/672
What changes were proposed in this pull request?
indexType STRING fields are indexed without tokenization (Mapping.STRING),
so the tokenization issue applies only to TEXT attributes (indexType == null),
not to STRING attributes. qualifiedName uses the default TEXT indexing, whereas
attributes with indexType: STRING (e.g. name, owner) don't have this
tokenization problem.
Update the CONTAINS check so that we fall back to JanusGraph only when
indexType == null and the filter value either exceeds the max token length or
contains tokenization characters. Attributes with indexType STRING keep using
index search for long CONTAINS values.
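A minimal Java sketch of the CONTAINS rule above. The class ContainsRouting, the IndexType enum, the 255-character MAX_TOKEN_LENGTH and the TOKENIZE_CHARS set are hypothetical illustrations, not Atlas's actual code; a null indexType stands for the default TEXT indexing, as in the description.

```java
import java.util.Set;

// Hypothetical sketch of the CONTAINS routing rule; names and constants are
// illustrative assumptions, not Atlas's real classes.
final class ContainsRouting {
    enum IndexType { STRING } // null models the default TEXT indexing

    // Assumed to mirror Solr's default maxTokenLength.
    static final int MAX_TOKEN_LENGTH = 255;

    // Assumed sample of characters that split a TEXT value into several tokens.
    static final Set<Character> TOKENIZE_CHARS = Set.of(' ', '@', '/', ':', '-');

    /** Returns true when a CONTAINS filter should fall back to JanusGraph. */
    static boolean containsNeedsGraphFallback(IndexType indexType, String value) {
        if (indexType != null) {
            // STRING-indexed attributes are not tokenized, so the index stays usable.
            return false;
        }
        if (value.length() > MAX_TOKEN_LENGTH) {
            return true; // the value is longer than the assumed token limit
        }
        for (char c : value.toCharArray()) {
            if (TOKENIZE_CHARS.contains(c)) {
                return true; // the value would be split into several tokens
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String longValue = "x".repeat(300);
        System.out.println(containsNeedsGraphFallback(null, longValue));             // true
        System.out.println(containsNeedsGraphFallback(IndexType.STRING, longValue)); // false
        System.out.println(containsNeedsGraphFallback(null, "orders"));              // false
        System.out.println(containsNeedsGraphFallback(null, "db@primary"));          // true
    }
}
```

Keeping the indexType check first means STRING attributes never pay for the character scan.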
[ATLAS-5032](https://issues.apache.org/jira/browse/ATLAS-5032): Fix basic
search for long qualifiedName with startsWith / endsWith / contains
Problem
Basic search with attribute filters on qualifiedName returns no results when
filter values exceed Solr’s default max token length (255 characters). This affects
startsWith, endsWith, and contains, especially when multiple criteria on the
same attribute are combined with AND (e.g. qualifiedName starts with a long
prefix and ends with @primary).
Root cause: Solr ignores tokens longer than maxTokenLength, so index-based
search does not match even though the entity exists and can be retrieved by
GUID.
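To illustrate the root cause, here is a deliberately simplified model (an assumption, not Solr's real analyzer): it splits on whitespace and '@', lowercases, and drops any token longer than 255 characters, which is the behavior described above. Under this model the long `default.<name>` part of the qualifiedName never reaches the index, so no prefix query on it can match. TokenDropModel and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of "tokens longer than maxTokenLength are ignored"; not Solr code.
final class TokenDropModel {
    static final int MAX_TOKEN_LENGTH = 255; // Solr's default, per the PR text

    /** Splits on whitespace and '@', lowercases, and drops over-long tokens. */
    static List<String> indexedTokens(String value) {
        List<String> tokens = new ArrayList<>();
        for (String part : value.split("[\\s@]+")) {
            if (part.isEmpty() || part.length() > MAX_TOKEN_LENGTH) {
                continue; // over-long tokens never reach the index in this model
            }
            tokens.add(part.toLowerCase());
        }
        return tokens;
    }

    /** True if some indexed token starts with the given prefix. */
    static boolean anyTokenStartsWith(String storedValue, String prefix) {
        String p = prefix.toLowerCase();
        for (String token : indexedTokens(storedValue)) {
            if (token.startsWith(p)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String qualifiedName = "default." + "t".repeat(370) + "@primary";
        System.out.println(indexedTokens(qualifiedName));                  // [primary]
        System.out.println(anyTokenStartsWith(qualifiedName, "default.")); // false
        System.out.println(anyTokenStartsWith(qualifiedName, "primary"));  // true
    }
}
```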
Solution (Approach 2 from
[ATLAS-5032](https://issues.apache.org/jira/browse/ATLAS-5032))
For indexed string attributes, when the filter value length exceeds the
configured Solr token limit, do not use the Solr index for STARTS_WITH,
ENDS_WITH, or CONTAINS. Search falls back to JanusGraph instead.
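The Approach 2 rule can be sketched as a single predicate. The Operator enum, isIndexSearchable and the maxTokenLength parameter are hypothetical names chosen for illustration; in the PR the limit comes from the Solr configuration, whose default is 255.

```java
// Hypothetical sketch of the Approach 2 rule; names are illustrative, not Atlas's API.
final class OperatorRouting {
    enum Operator { EQ, STARTS_WITH, ENDS_WITH, CONTAINS }

    /**
     * Returns false when a pattern filter value is longer than the configured
     * token limit, meaning the criterion should be evaluated in JanusGraph.
     * Other operators are not restricted by this rule (assumed for the sketch).
     */
    static boolean isIndexSearchable(Operator op, String value, int maxTokenLength) {
        switch (op) {
            case STARTS_WITH:
            case ENDS_WITH:
            case CONTAINS:
                return value.length() <= maxTokenLength;
            default:
                return true;
        }
    }

    public static void main(String[] args) {
        int limit = 255; // Solr's default maxTokenLength
        String longPrefix = "default." + "t".repeat(370);
        System.out.println(isIndexSearchable(Operator.STARTS_WITH, longPrefix, limit)); // false: JanusGraph
        System.out.println(isIndexSearchable(Operator.ENDS_WITH, "@primary", limit));   // true: index
    }
}
```

Passing the limit as a parameter keeps the rule tied to the configured value rather than a hard-coded 255.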
Also ensure that the index and graph query paths stay consistent when the same
attribute appears in multiple AND criteria:
- Skip graph filter construction when the criterion is still index-searchable.
- Skip index query construction when the criterion is not index-searchable.
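One way to picture the AND-criteria consistency rule: each criterion is routed to exactly one path, so the index query and the graph filter never both try to handle it. CriteriaSplit, Criterion, Plan and the searchability predicate passed in are hypothetical; the demo uses only a length check as that predicate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch; class and method names are illustrative only.
final class CriteriaSplit {
    static final class Criterion {
        final String attribute;
        final String operator;
        final String value;

        Criterion(String attribute, String operator, String value) {
            this.attribute = attribute;
            this.operator = operator;
            this.value = value;
        }
    }

    /** Routing result: criteria for the index query vs. for the graph filter. */
    static final class Plan {
        final List<Criterion> indexCriteria = new ArrayList<>();
        final List<Criterion> graphCriteria = new ArrayList<>();
    }

    static Plan split(List<Criterion> andCriteria, Predicate<Criterion> indexSearchable) {
        Plan plan = new Plan();
        for (Criterion c : andCriteria) {
            if (indexSearchable.test(c)) {
                plan.indexCriteria.add(c); // no graph filter is built for it
            } else {
                plan.graphCriteria.add(c); // no index query clause is built for it
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        List<Criterion> criteria = List.of(
                new Criterion("qualifiedName", "startsWith", "default." + "t".repeat(370)),
                new Criterion("qualifiedName", "endsWith", "@primary"));
        Plan plan = split(criteria, c -> c.value.length() <= 255);
        System.out.println(plan.indexCriteria.size() + " index, " + plan.graphCriteria.size() + " graph");
    }
}
```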
How was this patch tested?
Unit / module tests
- EntitySearchProcessorTest: 48 tests, including 6 new
  [ATLAS-5032](https://issues.apache.org/jira/browse/ATLAS-5032) scenarios (short
  and long qualifiedName, hive_table and hive_column, tokenized name, CONTAINS +
  ENDS_WITH).
- Full repository module (`mvn -pl repository test`): 2391 tests, 0 failures.
- `mvn -pl common,repository -DskipTests install`: build success.
Manual / REST (local Docker Atlas)
Reproduced the JIRA flow against http://localhost:21000/:
- Created a hive_table with a ~370-character name and qualifiedName
  default.<370-char-name>@primary.
- Ran a basic search with AND:
  - qualifiedName startsWith default.<370-char-name>
  - qualifiedName endsWith @primary
- Result: 1 matching entity (previously empty).
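The manual reproduction can be expressed as a request to Atlas's basic search REST endpoint (POST /api/atlas/v2/search/basic). The sketch below only builds the request and does not send it; the BasicSearchRepro class, the limit value, the placeholder table name and the omitted authentication are assumptions for illustration.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper that builds (but does not send) the basic search request.
final class BasicSearchRepro {
    /** Minimal JSON string quoting; escapes backslashes and double quotes only. */
    static String json(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String criterion(String attribute, String operator, String value) {
        return "{\"attributeName\":" + json(attribute)
                + ",\"operator\":" + json(operator)
                + ",\"attributeValue\":" + json(value) + "}";
    }

    /** Two AND criteria on qualifiedName, as in the manual test. */
    static String requestBody(String longName) {
        return "{\"typeName\":\"hive_table\",\"excludeDeletedEntities\":true,\"limit\":25,"
                + "\"entityFilters\":{\"condition\":\"AND\",\"criterion\":["
                + criterion("qualifiedName", "startsWith", "default." + longName) + ","
                + criterion("qualifiedName", "endsWith", "@primary")
                + "]}}";
    }

    public static void main(String[] args) {
        String longName = "t".repeat(370); // placeholder for the ~370-character table name
        HttpRequest request = HttpRequest
                .newBuilder(URI.create("http://localhost:21000/api/atlas/v2/search/basic"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(requestBody(longName)))
                .build();
        // Authentication is omitted; add it before sending with java.net.http.HttpClient.
        System.out.println(request.method() + " " + request.uri());
    }
}
```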
--
This is an automated message from the Apache Git Service.