Pinal Shah created ATLAS-4882:
---------------------------------
Summary: Export/Import: Export exits with "Found 0 entities"
Key: ATLAS-4882
URL: https://issues.apache.org/jira/browse/ATLAS-4882
Project: Atlas
Issue Type: Bug
Components: atlas-core
Reporter: Pinal Shah
Assignee: Pinal Shah
*Issue:*
Export during ingestion fails, with "Found 0 entities" in the logs.
Here, ingestion means that Atlas is consuming messages.
*When is the issue seen?*
It occurs when there is a huge amount of data in the backend and Atlas is
consuming messages linked to the entity for which the export is running.
*Analysis to find the root cause:*
* When there is a huge amount of data in the backend, the export FAILS.
* When there is a huge amount of data in the backend but fewer tables under it,
the export also FAILS.
* If background consumption stops, the export PASSES.
* If the messages being consumed are for entities other than those requested in
the export, the export PASSES.
* The export uses the query below to find the starting object; its
has('__guid') clause, which checks that the property exists, is expensive:
g.V().has('__typeName','hive_db').has('Referenceable.qualifiedName','db6@cm').has('__guid').values('__guid')
- has('__guid') runs the index query [(35x_t <> null)]:vertex_index; the time
taken was checked in the Solr logs:
2024-06-14 02:38:56.218 INFO (qtp1158676965-19) [c:vertex_index s:shard1 r:core_node2 x:vertex_index_shard1_replica_n1] o.a.s.c.S.Request [vertex_index_shard1_replica_n1] webapp=/solr path=/select params={q=*:*&_stateVer_=vertex_index:12&fl=id&start=0&fq=35x_t:*+&rows=500000&wt=javabin&version=2} hits=1681928 status=0 QTime=4227
2024-06-14 02:40:23.945 INFO (qtp1158676965-16) [c:vertex_index s:shard1 r:core_node2 x:vertex_index_shard1_replica_n1] o.a.s.c.S.Request [vertex_index_shard1_replica_n1] webapp=/solr path=/select params={q=*:*&_stateVer_=vertex_index:12&fl=id&start=500000&fq=35x_t:*+&rows=500000&wt=javabin&version=2} hits=1682086 status=0 QTime=787
2024-06-14 02:41:37.703 INFO (qtp1158676965-14) [c:vertex_index s:shard1 r:core_node2 x:vertex_index_shard1_replica_n1] o.a.s.c.S.Request [vertex_index_shard1_replica_n1] webapp=/solr path=/select params={q=*:*&_stateVer_=vertex_index:12&fl=id&start=1000000&fq=35x_t:*+&rows=500000&wt=javabin&version=2} hits=1682216 status=0 QTime=1962
2024-06-14 02:42:20.715 INFO (qtp1158676965-20) [c:vertex_index s:shard1 r:core_node2 x:vertex_index_shard1_replica_n1] o.a.s.c.S.Request [vertex_index_shard1_replica_n1] webapp=/solr path=/select params={q=*:*&_stateVer_=vertex_index:12&fl=id&start=1500000&fq=35x_t:*+&rows=500000&wt=javabin&version=2} hits=1682363 status=0 QTime=4465
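The four Solr requests above can be summarized with a small script. This is only a rough sketch and not part of Atlas: the regular expression and the names PATTERN and parse_pages are illustrative, and the log entries are copied from above with each entry joined onto one line and the Jira escape backslash before "{" dropped.

```python
import re
from datetime import datetime

# The four Solr request entries quoted in this issue, one entry per line.
LOG = """\
2024-06-14 02:38:56.218 INFO (qtp1158676965-19) [c:vertex_index s:shard1 r:core_node2 x:vertex_index_shard1_replica_n1] o.a.s.c.S.Request [vertex_index_shard1_replica_n1] webapp=/solr path=/select params={q=*:*&_stateVer_=vertex_index:12&fl=id&start=0&fq=35x_t:*+&rows=500000&wt=javabin&version=2} hits=1681928 status=0 QTime=4227
2024-06-14 02:40:23.945 INFO (qtp1158676965-16) [c:vertex_index s:shard1 r:core_node2 x:vertex_index_shard1_replica_n1] o.a.s.c.S.Request [vertex_index_shard1_replica_n1] webapp=/solr path=/select params={q=*:*&_stateVer_=vertex_index:12&fl=id&start=500000&fq=35x_t:*+&rows=500000&wt=javabin&version=2} hits=1682086 status=0 QTime=787
2024-06-14 02:41:37.703 INFO (qtp1158676965-14) [c:vertex_index s:shard1 r:core_node2 x:vertex_index_shard1_replica_n1] o.a.s.c.S.Request [vertex_index_shard1_replica_n1] webapp=/solr path=/select params={q=*:*&_stateVer_=vertex_index:12&fl=id&start=1000000&fq=35x_t:*+&rows=500000&wt=javabin&version=2} hits=1682216 status=0 QTime=1962
2024-06-14 02:42:20.715 INFO (qtp1158676965-20) [c:vertex_index s:shard1 r:core_node2 x:vertex_index_shard1_replica_n1] o.a.s.c.S.Request [vertex_index_shard1_replica_n1] webapp=/solr path=/select params={q=*:*&_stateVer_=vertex_index:12&fl=id&start=1500000&fq=35x_t:*+&rows=500000&wt=javabin&version=2} hits=1682363 status=0 QTime=4465
"""

# Illustrative pattern: timestamp, paging parameters, and the trailing counters.
PATTERN = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) .*?"
    r"[?&{]start=(?P<start>\d+)&.*?&rows=(?P<rows>\d+)&.*?"
    r"hits=(?P<hits>\d+) status=(?P<status>\d+) QTime=(?P<qtime>\d+)$"
)


def parse_pages(log_text):
    """Return one dict per Solr request line that matches PATTERN."""
    pages = []
    for line in log_text.splitlines():
        match = PATTERN.match(line.strip())
        if match is None:
            continue  # skip anything that is not a paged /select request
        pages.append({
            "ts": datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S.%f"),
            "start": int(match["start"]),
            "rows": int(match["rows"]),
            "hits": int(match["hits"]),
            "qtime_ms": int(match["qtime"]),
        })
    return pages


pages = parse_pages(LOG)
# Sum of the QTime values reported by Solr for the four pages.
total_qtime_ms = sum(page["qtime_ms"] for page in pages)
# Wall-clock time between the first and the last logged request.
elapsed_s = (pages[-1]["ts"] - pages[0]["ts"]).total_seconds()
# Change in the reported hit count while the pages were being fetched.
hits_growth = pages[-1]["hits"] - pages[0]["hits"]

print(f"pages={len(pages)} total QTime={total_qtime_ms} ms")
print(f"first-to-last request={elapsed_s:.3f} s, hits grew by {hits_growth}")
```

On these four entries the summed QTime is far smaller than the time between the first and last request, and the hit count rises from page to page while ingestion continues. If QTime is taken to cover only query execution and not the transfer of each 500000-row page, this suggests most of the time is spent outside Solr's query execution; that is an inference from the log, not something measured in this report.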
- Ran the same query through the Gremlin shell while ingestion was in progress;
it does not fail.
- Time taken for the above Gremlin query in code during ingestion: 214825 ms
- Time taken for the above Gremlin query in the Gremlin shell during ingestion: 104641 ms
- Time taken for the above Gremlin query with no ingestion: 181682 ms
*Workaround:*
- Remove the .has('__guid') clause from the query below; without it, the query
is very quick:
g.V().has('__typeName','hive_db').has('Referenceable.qualifiedName','db6@cm').has('__guid').values('__guid')
--
This message was sent by Atlassian Jira
(v8.20.10#820010)