[ 
https://issues.apache.org/jira/browse/HDDS-11007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17872912#comment-17872912
 ] 

Devesh Kumar Singh commented on HDDS-11007:
-------------------------------------------

Tested with 3000 OM write TPS and created 250M keys. Verified in Recon that 
it synced the metrics fully and successfully. A few observations and 
recommendations on Recon configs that need to be adjusted:

 

{{ozone.recon.om.snapshot.task.interval.delay}} -> 1m
{{recon.om.delta.update.loop.limit}} -> 50
{{recon.om.delta.update.limit}} -> 10,000
{{hadoop.hdds.db.rocksdb.WAL_ttl_seconds}} -> 1800 secs

Recommended Recon heap when OM is handling approximately 5K-8K TPS -> 80 GB
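
As a rough sanity check on the values above, the sketch below compares how many 
updates Recon could pull in one sync cycle against how many OM would generate in 
the same interval. It rests on assumptions that are not stated in this comment: 
that one sync cycle fetches at most (update limit x loop limit) updates, that 
each OM write produces exactly one update, and that sync duration is negligible. 
The function names are hypothetical and are not Ozone APIs.

```python
# Values taken from the recommendations and test run described above.
OM_WRITE_TPS = 3000              # tested OM write rate
SYNC_INTERVAL_SECONDS = 60       # ozone.recon.om.snapshot.task.interval.delay = 1m
DELTA_UPDATE_LIMIT = 10_000      # recon.om.delta.update.limit
DELTA_UPDATE_LOOP_LIMIT = 50     # recon.om.delta.update.loop.limit


def max_updates_per_sync(update_limit: int, loop_limit: int) -> int:
    """Upper bound on updates per sync cycle.

    Assumption: each loop iteration fetches at most `update_limit` updates
    and a cycle runs at most `loop_limit` iterations.
    """
    return update_limit * loop_limit


def updates_generated(write_tps: int, interval_seconds: int) -> int:
    """Updates OM produces during one sync interval.

    Assumption: one OM write corresponds to one update. A real write may
    produce more than one, so treat this as a lower bound.
    """
    return write_tps * interval_seconds


capacity = max_updates_per_sync(DELTA_UPDATE_LIMIT, DELTA_UPDATE_LOOP_LIMIT)
generated = updates_generated(OM_WRITE_TPS, SYNC_INTERVAL_SECONDS)
keeps_up = capacity >= generated

print(f"capacity per sync: {capacity}")
print(f"generated per interval: {generated}")
print(f"Recon keeps up: {keeps_up}")
```

Under these assumptions the per-cycle capacity (500,000) exceeds what OM 
generates per minute at 3000 TPS (180,000), and still covers 8000 TPS (480,000).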

Without sufficient heap, especially in a busy cluster, Recon may shut down 
because of memory allocation failures, which can cause sync issues later. 
Recon syncs with OM periodically and fetches only a small batch of about 500 
records per sync period. If the cluster is very busy with a high OM write TPS, 
Recon can fall far behind OM's WAL sequence number. Once OM flushes the 
memtable data to SSTs, the OM RocksDB WAL may no longer hold the sequence 
number that Recon requests. Recon is then repeatedly forced to fetch a full 
snapshot, which increases its heap usage. Frequent full GCs may follow, and 
JVM allocation failures and OOM errors become possible.
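
The lag described above can be illustrated with a deliberately simplified 
model. It assumes that the OM WAL retains roughly `WAL_ttl_seconds` worth of 
writes, that each write is one update, and that Recon falls back to a full 
snapshot as soon as its backlog is older than that window. Real retention also 
depends on memtable flushes and WAL size limits, so this is only a sketch, and 
the function name is hypothetical.

```python
from typing import Optional


def cycles_until_full_snapshot(
    write_tps: int,
    interval_s: int,
    per_sync_capacity: int,
    wal_ttl_s: int,
) -> Optional[int]:
    """Number of sync cycles before the backlog outgrows the WAL window.

    Returns None when Recon can consume at least as many updates per
    cycle as OM generates, i.e. it never falls behind in this model.
    """
    generated = write_tps * interval_s
    if per_sync_capacity >= generated:
        return None

    backlog = 0  # updates Recon has not applied yet
    cycles = 0
    while True:
        cycles += 1
        backlog += generated - per_sync_capacity
        # Express the backlog as seconds of OM writes and compare it with
        # the assumed WAL retention window.
        if backlog / write_tps > wal_ttl_s:
            return cycles


# About 500 records per sync at 3000 TPS, 1m interval, 1800 s WAL TTL.
print(cycles_until_full_snapshot(3000, 60, 500, 1800))
# The suggested limits (10,000 x 50 per sync) under the same load.
print(cycles_until_full_snapshot(3000, 60, 500_000, 1800))
```

In this model the small batch size falls outside the WAL window after 31 
cycles, while the suggested limits never fall behind at 3000 TPS.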

In essence, please apply the suggested Recon config values and allocate 
sufficient heap alongside them.

Concluding that this is not a bug and closing the issue.

> Key count shown in Recon was lagging behind the actual key count in OM
> ----------------------------------------------------------------------
>
>                 Key: HDDS-11007
>                 URL: https://issues.apache.org/jira/browse/HDDS-11007
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: OM, Ozone Recon
>    Affects Versions: 1.4.0
>            Reporter: Siddhant Sangwan
>            Assignee: Devesh Kumar Singh
>            Priority: Major
>
> The actual key count as recorded by Ozone Manager was ~102 million keys, 
> while Recon was stuck at 67 million keys for more than a day. The "DB Synced 
> at" time that's shown in Recon's Overview page was also not updating. 
> [~deveshsingh] has been doing some initial analysis on this issue.



