[
https://issues.apache.org/jira/browse/ATLAS-4988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18048723#comment-18048723
]
Umesh Patil commented on ATLAS-4988:
------------------------------------
URL: [https://github.com/apache/atlas/pull/490]
*ATLAS-4988*
*What changes were proposed in this pull request?*
Background: In Apache Atlas, deleting a Business Metadata (BM) definition
requires a pre-check to ensure no existing entities are assigned to those
attributes. For attributes with a data type of Array (multi-valued), the
deletion process was taking an exceptionally long time (e.g., over 5 minutes
for ~38k entities), even when the BM was not assigned to any entity.
*Root Cause:*
# Missing Solr Index: Attributes of type Array are generally not indexed in
Solr.
# Inefficient Search: The old logic forced a Solr-based search
(entityDiscoveryService) for all attributes. When an attribute is not indexed,
Solr performs a full collection scan, leading to severe performance degradation.
# Data Payload: The search request was fetching full attribute values
(setAttributes), adding unnecessary I/O overhead for a simple existence check.
*Changes Proposed:*
# Hybrid Validation Strategy: Introduced a conditional check to determine if
an attribute is indexable before deciding the search path.
# Optimized Indexed Search: For indexable attributes, the Solr search was
optimized by using the NOT_NULL operator and clearing requested attributes
(Collections.emptySet()) to return only the GUID, significantly speeding up the
response.
# Direct Graph Fallback: Introduced isBusinessAttributePresentInGraph() for
non-indexable attributes. This method queries the Graph database (JanusGraph)
directly.
# Targeted Lookups: The graph fallback uses Constants.ENTITY_TYPE_PROPERTY_KEY
to narrow the scope to only relevant entity types, avoiding full system scans.
# Integrity Maintenance: Ensured that even unindexed attributes are checked
before deletion to prevent "dangling" metadata, while maintaining sub-second
performance.
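The hybrid strategy described above can be sketched roughly as follows. This is an illustrative, self-contained sketch and not the code from the pull request: the class BusinessMetadataDeleteCheck, the BmAttribute holder, and the two lookup predicates are hypothetical stand-ins for the Solr-backed search and the direct graph query, and do not use the real Atlas APIs.

```java
import java.util.List;
import java.util.Set;
import java.util.function.BiPredicate;

/** Hypothetical sketch of the hybrid pre-delete check; not the code from the PR. */
final class BusinessMetadataDeleteCheck {

    /** Hypothetical stand-in for one business metadata attribute definition. */
    static final class BmAttribute {
        final String name;
        final boolean indexable;
        final Set<String> applicableEntityTypes;

        BmAttribute(String name, boolean indexable, Set<String> applicableEntityTypes) {
            this.name = name;
            this.indexable = indexable;
            this.applicableEntityTypes = applicableEntityTypes;
        }
    }

    // Both lookups take (entityTypeName, attributeName) and report whether
    // at least one entity of that type has a value for the attribute.
    private final BiPredicate<String, String> indexLookup; // assumed: index-backed search
    private final BiPredicate<String, String> graphLookup; // assumed: direct graph query

    BusinessMetadataDeleteCheck(BiPredicate<String, String> indexLookup,
                                BiPredicate<String, String> graphLookup) {
        this.indexLookup = indexLookup;
        this.graphLookup = graphLookup;
    }

    /** Returns true as soon as any attribute is found assigned to any entity. */
    boolean isInUse(List<BmAttribute> attributes) {
        for (BmAttribute attr : attributes) {
            // Indexable attributes go through the index; the rest fall back to the graph.
            BiPredicate<String, String> lookup = attr.indexable ? indexLookup : graphLookup;
            // Only the applicable entity types are checked, never the whole repository.
            for (String entityType : attr.applicableEntityTypes) {
                if (lookup.test(entityType, attr.name)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        BusinessMetadataDeleteCheck check = new BusinessMetadataDeleteCheck(
                (type, attr) -> false,
                (type, attr) -> type.equals("hive_table") && attr.equals("tags"));
        BmAttribute tags = new BmAttribute("tags", false, Set.of("hive_table"));
        System.out.println("in use: " + check.isInUse(List.of(tags)));
    }
}
```

In this sketch a deletion would be rejected whenever isInUse returns true. The early return is what keeps the check an existence test rather than a full fetch of matching entities.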
*Impact:*
Performance Gain: Deletion time for Business Metadata with Array attributes
reduced from ~341 seconds to ~1.5 seconds.
Reliability: Maintains strict data integrity by ensuring no "in-use" Business
Metadata can be deleted.
Scalability: The fix ensures that as the number of entities grows, the deletion
of metadata remains performant.
*How was this patch tested?*
Maven Build:
Build Successful.
*Manual Testing:*
# Creation & Deletion (No Assignment): Created Business Metadata with String
and Array types. Verified deletion is nearly instantaneous (~0.8s to 1.5s).
# Deletion Blocked (With Assignment):
#* Assigned a Business Metadata attribute to a hive_table entity.
#* Attempted to delete the BM definition.
#* Verified the system correctly throws ATLAS-409-00-002: Given type has
references.
# Large Dataset Validation: Tested against a repository containing ~38,000
hive_table entities to confirm the Solr timeout and latency issue is resolved.
> BusinessMetadata with attribute of data type Array takes time to Delete.
> ------------------------------------------------------------------------
>
> Key: ATLAS-4988
> URL: https://issues.apache.org/jira/browse/ATLAS-4988
> Project: Atlas
> Issue Type: Improvement
> Components: atlas-core
> Affects Versions: 3.0.0, 2.4.0
> Reporter: chaitali borole
> Assignee: Umesh Patil
> Priority: Major
>
> This occurs whenever a BM attribute is multi-valued, i.e. has an array data
> type.
> 1. When such a BM is not assigned to any entity and you try to delete it, it
> still tries to fetch all the entities under the respective applicable type
> name, and deletes the BM only after it finds no entity with the BM attribute
> assigned.
> Array types are not indexed in Solr, hence the above issue.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)