aminghadersohi opened a new pull request, #38079:
URL: https://github.com/apache/superset/pull/38079

   ### SUMMARY
   
   The dashboard thumbnail digest computation is non-deterministic, causing 
excessive cache misses and unnecessary Selenium screenshot regeneration. This 
was observed in production with a **4.3% cache hit rate** (24 hits out of 555 
triggers over 14 days) for a single workspace, with one dashboard being 
re-screenshotted **132 times in a single day**.
   
   #### Root cause
   
   1. **`dashboard.datasources` returns a Python `set`**, and 
`_adjust_string_with_rls()` iterates over it to build the hash input string. 
Set iteration order is not guaranteed to match across processes (for example, 
workers running with different `PYTHONHASHSEED` values), so different Gunicorn 
workers can produce different digests for the same dashboard+user → cache miss 
→ Selenium screenshot → all chart queries fire against the data warehouse.
   
   2. **`dashboard.charts`** depends on `self.slices`, a SQLAlchemy 
relationship with no `order_by` clause, adding another source of ordering 
instability.
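
To illustrate why iteration order matters, here is a minimal sketch that hashes the same identifiers in every possible order. It is not Superset code: the identifier strings, the `naive_digest` helper, and the choice of SHA-256 are assumptions made only for this illustration.

```python
import hashlib
from itertools import permutations

# Hypothetical identifiers standing in for a dashboard's datasources.
datasource_uids = ["1__table", "2__table", "3__table"]


def naive_digest(ordered_uids):
    """Hash the identifiers in whatever order they happen to be iterated."""
    return hashlib.sha256("".join(ordered_uids).encode("utf-8")).hexdigest()


# Any order a set could yield is one permutation of its members,
# so hashing every permutation shows the worst case.
digests = {naive_digest(order) for order in permutations(datasource_uids)}
print(len(digests))  # 6 distinct digests for 3 datasources (3! orderings)
```

Each distinct digest would be looked up as a separate cache key, which is how one dashboard can miss the cache repeatedly.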
   
   #### Fix
   
   - Sort datasources by ID before iterating in `_adjust_string_with_rls()`
   - Sort chart names in `get_dashboard_digest()` before including in the hash 
input
   
   These are minimal, targeted changes that ensure digest stability without 
changing any other behavior.
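
The idea behind the fix can be sketched roughly as follows. This is a simplified stand-in, not the actual `_adjust_string_with_rls()` or `get_dashboard_digest()` implementation: the `Datasource` class, its `rls_clause` field, the `stable_digest` function, the `base` argument, and the use of SHA-256 are all hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Datasource:
    """Hypothetical stand-in for a datasource attached to a dashboard."""

    id: int
    rls_clause: str  # assumed: text of the RLS filters that apply to the user


def stable_digest(datasources, chart_names, base="dashboard-1"):
    """Build the hash input from sorted components so order cannot leak in."""
    parts = [base]
    # Sort datasources by ID before iterating, whatever the input order was.
    for ds in sorted(datasources, key=lambda d: d.id):
        parts.append(f"{ds.id}:{ds.rls_clause}")
    # Sort chart names before adding them to the hash input.
    parts.extend(sorted(chart_names))
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


datasources = {
    Datasource(3, "region = 'EU'"),
    Datasource(1, ""),
    Datasource(2, "tenant_id = 7"),
}
digest = stable_digest(datasources, ["Revenue", "Churn", "Signups"])
```

Because both inputs are sorted inside the function, callers can pass a set, a list, or an unordered relationship result and still get the same digest.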
   
   ### BEFORE/AFTER SCREENSHOTS OR COVERAGE URL
   
   N/A - backend-only change, no UI impact.
   
   ### TESTING INSTRUCTIONS
   
   Added `test_dashboard_digest_deterministic_datasource_order` which verifies 
that three different orderings of the same datasources produce identical 
digests.
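
A test of this shape might look roughly like the sketch below. It does not reproduce the test added in this PR: `digest_for` is a hypothetical stand-in for the code under test, and the datasources are plain `SimpleNamespace` objects rather than Superset models.

```python
import hashlib
from types import SimpleNamespace


def digest_for(datasources):
    """Hypothetical stand-in for the digest code under test."""
    ordered = sorted(datasources, key=lambda ds: ds.id)
    payload = ",".join(str(ds.id) for ds in ordered)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def test_digest_is_order_independent():
    ds_a, ds_b, ds_c = (SimpleNamespace(id=i) for i in (1, 2, 3))
    # Three different orderings of the same datasources.
    orderings = [
        [ds_a, ds_b, ds_c],
        [ds_c, ds_a, ds_b],
        [ds_b, ds_c, ds_a],
    ]
    digests = {digest_for(order) for order in orderings}
    # All orderings must collapse to a single digest.
    assert len(digests) == 1


test_digest_is_order_independent()
```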
   
   ### ADDITIONAL INFORMATION
   
   - **Related PRs**: #37895, #37899, #37941 (reduce per-computation DB cost 
but don't fix the digest instability)
   - **Impact**: For a dashboard with N datasources, the set can yield up to N! 
iteration orders, each potentially producing a different digest. 
Sorting reduces this to exactly one deterministic result.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

