[ https://issues.apache.org/jira/browse/HDDS-11007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17872912#comment-17872912 ]
Devesh Kumar Singh commented on HDDS-11007:
-------------------------------------------
I tested with 3,000 OM write TPS and created 250M keys, then verified in Recon:
Recon synced the metrics fully and successfully. Below are a few observations and
the recommended adjustments to Recon configs:
{{ozone.recon.om.snapshot.task.interval.delay}} -> 1m
{{recon.om.delta.update.loop.limit}} -> 50
{{recon.om.delta.update.limit}} -> 10,000
{{hadoop.hdds.db.rocksdb.WAL_ttl_seconds}} -> 1800 secs
Recommended Recon heap size for an OM at approx. {{5K-8K TPS}} -> 80 GB
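As a rough back-of-the-envelope check on these values, the sketch below assumes a simplified model (not Recon's actual implementation): in each sync interval ({{ozone.recon.om.snapshot.task.interval.delay}}), Recon issues at most {{recon.om.delta.update.loop.limit}} delta fetches of at most {{recon.om.delta.update.limit}} updates each, and one update is treated as roughly one OM write. The function names are hypothetical and only illustrate the arithmetic.

```python
# Back-of-the-envelope check: can Recon's delta sync keep pace with OM writes?
# ASSUMPTION (simplified model, not Recon's implementation): each sync interval,
# Recon issues at most `loop_limit` delta fetches of at most `update_limit`
# updates each, and one update is treated as roughly one OM write.


def max_recon_updates_per_sec(update_limit: int, loop_limit: int,
                              interval_sec: float) -> float:
    """Modelled upper bound on the updates/sec Recon can pull from OM."""
    return update_limit * loop_limit / interval_sec


def keeps_up(om_write_tps: float, update_limit: int, loop_limit: int,
             interval_sec: float) -> bool:
    """True if the modelled Recon pull rate is at least the OM write rate."""
    return max_recon_updates_per_sec(update_limit, loop_limit, interval_sec) >= om_write_tps


if __name__ == "__main__":
    # Values suggested above: delta.update.limit=10,000, loop.limit=50, interval=1m.
    capacity = max_recon_updates_per_sec(10_000, 50, 60)
    print(f"modelled capacity: {capacity:.0f} updates/sec")  # 500,000 / 60 ~ 8333
    for tps in (3_000, 5_000, 8_000):
        status = "keeps up" if keeps_up(tps, 10_000, 50, 60) else "falls behind"
        print(f"{tps} TPS -> {status}")
```

Under this model the suggested values give a ceiling of about 8,333 updates/sec, just above the 5K-8K TPS range quoted for the 80 GB heap recommendation. Real throughput will be lower, since it also depends on how fast Recon can apply each fetched batch.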
Without sufficient heap allocated to Recon, especially in a busy cluster, Recon
may shut down due to memory allocation failures, which can cause sync issues later.
Recon syncs with OM periodically, fetching only a limited batch of records (around
500) in each sync period. If the cluster is very busy with high OM write TPS, Recon
can fall behind OM by a large gap in WAL sequence numbers. Once OM flushes its
memtable data to SST files, the OM RocksDB WAL may no longer hold the sequence
number that Recon requests, which repeatedly forces Recon to fetch a full snapshot.
Full snapshots increase Recon's heap usage and can cause frequent full GCs, which
in turn can lead to JVM allocation failures and eventually an OutOfMemoryError (OOM).
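The failure mode described above can be illustrated with a toy simulation. It assumes, hypothetically, that OM's WAL only retains the most recent N updates (a crude stand-in for memtable flushes plus WAL retention), and that whenever the next sequence number Recon needs has been purged, Recon must fall back to a full snapshot. {{SyncMode}}, {{choose_sync_mode}} and {{simulate}} are illustrative names, not Recon or OM APIs, and the numbers are not measurements from this test.

```python
from __future__ import annotations

from enum import Enum


class SyncMode(Enum):
    DELTA = "delta"
    FULL_SNAPSHOT = "full_snapshot"


def choose_sync_mode(recon_seq: int, oldest_wal_seq: int) -> SyncMode:
    """Delta sync works only if OM still holds the WAL entry Recon needs next."""
    # Recon next needs recon_seq + 1; if that entry is gone, only a snapshot helps.
    return SyncMode.DELTA if recon_seq + 1 >= oldest_wal_seq else SyncMode.FULL_SNAPSHOT


def simulate(om_tps: int, per_sync_limit: int, interval_sec: int,
             wal_retention_updates: int, cycles: int) -> list[SyncMode]:
    """Toy model: OM writes om_tps updates/sec; Recon pulls per_sync_limit per interval.

    HYPOTHETICAL retention model: OM's WAL keeps only the latest
    wal_retention_updates updates.
    """
    om_seq = 0      # latest sequence number written by OM
    recon_seq = 0   # latest sequence number applied by Recon
    modes = []
    for _ in range(cycles):
        om_seq += om_tps * interval_sec
        oldest_wal_seq = max(1, om_seq - wal_retention_updates + 1)
        mode = choose_sync_mode(recon_seq, oldest_wal_seq)
        if mode is SyncMode.DELTA:
            # Pull at most per_sync_limit updates, never past OM's head.
            recon_seq = min(om_seq, recon_seq + per_sync_limit)
        else:
            # A full snapshot jumps Recon to OM's current state.
            recon_seq = om_seq
        modes.append(mode)
    return modes


if __name__ == "__main__":
    # ~500 updates per sync vs. 3,000 writes/sec: periodic full snapshots.
    print(simulate(3_000, 500, 60, 1_000_000, 12))
    # 10,000 x 50 updates per sync: stays on delta updates.
    print(simulate(3_000, 10_000 * 50, 60, 1_000_000, 12))
```

With only ~500 updates pulled per cycle, the retained WAL window moves past Recon's position every few cycles and Recon keeps falling back to full snapshots; with a per-cycle limit matching the suggested 10,000 x 50, the same toy workload stays on delta updates.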
In short, please tune the Recon configs to the suggested values and allocate
sufficient heap along with them.
Concluding that this is not a bug and closing the issue.
> Key count shown in Recon was lagging behind the actual key count in OM
> ----------------------------------------------------------------------
>
> Key: HDDS-11007
> URL: https://issues.apache.org/jira/browse/HDDS-11007
> Project: Apache Ozone
> Issue Type: Bug
> Components: OM, Ozone Recon
> Affects Versions: 1.4.0
> Reporter: Siddhant Sangwan
> Assignee: Devesh Kumar Singh
> Priority: Major
>
> The actual key count as recorded by Ozone Manager was ~102 million keys,
> while Recon was stuck at 67 million keys for more than a day. The "DB Synced
> at" time that's shown in Recon's Overview page was also not updating.
> [~deveshsingh] has been doing some initial analysis on this issue.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)