[ 
https://issues.apache.org/jira/browse/ATLAS-4988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18048723#comment-18048723
 ] 

Umesh Patil commented on ATLAS-4988:
------------------------------------

URL:- [https://github.com/apache/atlas/pull/490]



*ATLAS-4988*
*What changes were proposed in this pull request?*
Background: In Apache Atlas, deleting a Business Metadata (BM) definition 
requires a pre-check to ensure that no existing entity has values assigned to 
its attributes. For attributes with a data type of Array (multi-valued), the 
deletion process was taking an exceptionally long time (e.g., over 5 minutes 
for ~38k entities), even when the BM was not assigned to any entity.

*Root Cause:*
 # Missing Solr Index: Attributes of type Array are generally not indexed in 
Solr.
 # Inefficient Search: The old logic forced a Solr-based search 
(entityDiscoveryService) for all attributes. When an attribute is not indexed, 
Solr performs a full collection scan, leading to severe performance degradation.
 # Data Payload: The search request was fetching full attribute values 
(setAttributes), adding unnecessary I/O overhead for a simple existence check.

*Changes Proposed:*
 # Hybrid Validation Strategy: Introduced a conditional check to determine if 
an attribute is indexable before deciding the search path.
 # Optimized Indexed Search: For indexable attributes, the Solr search was 
optimized by using the NOT_NULL operator and clearing requested attributes 
(Collections.emptySet()) to return only the GUID, significantly speeding up the 
response.
 # Direct Graph Fallback: Introduced isBusinessAttributePresentInGraph() for 
non-indexable attributes. This method queries the Graph database (JanusGraph) 
directly.
 # Targeted Lookups: The graph fallback uses Constants.ENTITY_TYPE_PROPERTY_KEY 
to narrow the scope to only relevant entity types, avoiding full system scans.
 # Integrity Maintenance: Ensured that even unindexed attributes are checked 
before deletion to prevent "dangling" metadata, while keeping deletion time 
at roughly one second in the tests described below.
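
The branching described above can be sketched as follows. This is a minimal, self-contained illustration and not the code from the pull request: IndexSearch, GraphLookup, Attribute, and every method name here are hypothetical stand-ins for the Solr-backed search and the direct graph lookup, and the attribute names used in main are made up.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: none of these types are Atlas classes.
class BusinessMetadataDeleteCheck {

    // Hypothetical stand-in for an index-backed search (Solr in the PR) that
    // returns only the GUIDs of entities whose attribute is non-null.
    interface IndexSearch {
        List<String> guidsWithNonNullAttribute(String typeName, String attrName, int limit);
    }

    // Hypothetical stand-in for a direct graph query scoped to one entity type.
    interface GraphLookup {
        boolean anyEntityHasAttribute(String typeName, String attrName);
    }

    // Assumed minimal shape of a Business Metadata attribute.
    static final class Attribute {
        final String name;
        final boolean indexable;
        final Set<String> applicableTypes;

        Attribute(String name, boolean indexable, Set<String> applicableTypes) {
            this.name = name;
            this.indexable = indexable;
            this.applicableTypes = applicableTypes;
        }
    }

    private final IndexSearch index;
    private final GraphLookup graph;

    BusinessMetadataDeleteCheck(IndexSearch index, GraphLookup graph) {
        this.index = index;
        this.graph = graph;
    }

    // Existence check: indexable attributes use the index, others use the graph.
    boolean isInUse(Attribute attr) {
        for (String typeName : attr.applicableTypes) {
            boolean found;
            if (attr.indexable) {
                // One GUID is enough to prove the attribute is assigned.
                found = !index.guidsWithNonNullAttribute(typeName, attr.name, 1).isEmpty();
            } else {
                found = graph.anyEntityHasAttribute(typeName, attr.name);
            }
            if (found) {
                return true; // stop at the first entity type with a hit
            }
        }
        return false;
    }

    // Refuses deletion as soon as any attribute is still assigned.
    void assertDeletable(List<Attribute> attrs) {
        for (Attribute attr : attrs) {
            if (isInUse(attr)) {
                throw new IllegalStateException("Given type has references: " + attr.name);
            }
        }
    }

    public static void main(String[] args) {
        IndexSearch noHits = (type, attr, limit) -> Collections.emptyList();
        GraphLookup assigned = (type, attr) -> type.equals("hive_table") && attr.equals("owners");
        BusinessMetadataDeleteCheck check = new BusinessMetadataDeleteCheck(noHits, assigned);

        Attribute owners = new Attribute("owners", false, Collections.singleton("hive_table"));
        Attribute notes = new Attribute("notes", true, Collections.singleton("hive_table"));

        System.out.println("owners in use: " + check.isInUse(owners));
        System.out.println("notes in use: " + check.isInUse(notes));
    }
}
```

The point of the sketch is that both paths answer a yes/no question and stop at the first hit, so neither path needs to load attribute values for every entity of the applicable types.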

*Impact:*

Performance Gain: Deletion time for Business Metadata with Array attributes 
reduced from ~341 seconds to ~1.5 seconds.
Reliability: Maintains strict data integrity by ensuring no "in-use" Business 
Metadata can be deleted.
Scalability: The fix ensures that as the number of entities grows, the deletion 
of metadata remains performant.

*How was this patch tested?*
Maven Build:
Build Successful.

*Manual Testing:*
 # Creation & Deletion (No Assignment): Created Business Metadata with String 
and Array types. Verified deletion is nearly instantaneous (~0.8s to 1.5s).
 # Deletion Blocked (With Assignment):
 #* Assigned a Business Metadata attribute to a hive_table entity.
 #* Attempted to delete the BM definition.
 #* Verified the system correctly throws ATLAS-409-00-002: Given type has 
references.
 # Large Dataset Validation: Tested against a repository containing ~38,000 
hive_table entities to confirm the Solr timeout and latency issue is resolved.

> BusinessMetadata with attribute of data type Array takes time to Delete.
> ------------------------------------------------------------------------
>
>                 Key: ATLAS-4988
>                 URL: https://issues.apache.org/jira/browse/ATLAS-4988
>             Project: Atlas
>          Issue Type: Improvement
>          Components:  atlas-core
>    Affects Versions: 3.0.0, 2.4.0
>            Reporter: chaitali borole
>            Assignee: Umesh Patil
>            Priority: Major
>
> This happens whenever a multi-valued attribute, i.e. an attribute of array 
> data type, is used in a BM.
> 1. When such a BM is not assigned to any entity and you try to delete it, 
> Atlas still tries to fetch all the entities under the respective applicable 
> type name,
> and deletes the BM only after it finds that the BM attribute is not assigned 
> to any of those entities.
> The array type is not indexed in Solr, hence the above issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
