Hi Team, I'm using apache-atlas-2.3.0 with embedded hbase and solr.

Problem Statement:

- The row count in the 'apache_atlas_janus' table increases exponentially whenever a new typedef or entity is created or updated. This is not an issue with the embedded hbase and solr distribution, but in production we are reaching around 20-30M rows in the janus table.
- While apache-atlas provides control over the TTL of the 'audit table' ( https://issues.apache.org/jira/browse/ATLAS-4768 ), there's no way to control the TTL of the 'janus table' other than setting a TTL on its column families manually using the hbase shell.
- Records in the 'janus table' are not human-readable, since they are stored in a serialized form.
- Setting a TTL manually on the janus table's column families causes atlas to malfunction, evidently because we don't know which rows are getting deleted.

What is required:

-> Some level of control over the janus table TTL, so that older or no-longer-required records can be purged without breaking any other component in atlas.
-> If a TTL is not possible, then at least a way to deserialize the hbase rows in the janus table, so that we can implement our own TTL logic.

What I've tried:

-> Reading the janus hbase table from java code. Ran into this issue: https://github.com/JanusGraph/janusgraph/issues/941
-> Setting a TTL on the vertices in janusgraph using gremlin queries. The problem is that every vertex in atlas is defined with the label 'vertex', and setting up a management object on that label throws the error: 'Name cannot be in protected namespace: vertex'
-> Setting a TTL on a vertex in a local janusgraph instance (without atlas). I didn't see any difference in the row count even after the vertex TTL had expired.
-> Finally, deleting some rows in the janus table based on a timestamp range, for the following scenarios:
   - Deleting the rows for a single update timestamp only
   - Deleting the rows for entity update timestamps only
   - Deleting the rows that were created before the latest entity update
   In all of these cases, the entity disappeared from the UI with the following error:
   No typename found for given entity with guid: c90744dc-7ac6-4b5f-8fd2-ffc6282f5a64

In short, deleting any rows related to the entity in the janus table breaks the entity itself.

Please let me know if there's an existing solution for the above problem, or whether I should reach out to the janusgraph community regarding the serialization/deserialization issue.

Thanks!
