Re: [I] [SIP-153] Translating Superset asset data using custom flask-babel extraction methods [superset]

via GitHub Fri, 21 Feb 2025 11:43:40 -0800


mistercrunch commented on issue #32139:
URL: https://github.com/apache/superset/issues/32139#issuecomment-2675387707


   After talking to a few folks, I wanted to capture a handful of requirements 
and thoughts here:
   
   - requirement: zero performance hit for `en`: given that 95%+ (to my
knowledge) of Superset use cases are single-language, and generally `en`, I
would love to have a way to guarantee that serving single-language/`en` has zero
performance hit from both a UX and a systems standpoint. Strictly speaking, this
means no extra lookups and zero extra joins when operating in `en`.
   - readability/searchability: the use of "translated string ids" seems
important for efficient indexing and lookup, meaning a message like "The
dashboard couldn't be created since .... Please ..." probably needs a more
compact id for lookup. Personally, I think it's a requirement to keep the `en`
strings living in code, both for greppability and to serve the
zero-cost-for-`en` requirement stated above. What I'm trying to avoid is a
codebase and/or metadata full of cryptic string ids that require a database
lookup to make sense of.
   - no data translations: while we currently cover translating the app (static 
on a per-release basis), and now want to add support for translating metadata 
(some of the strings stored in the metadata database, slowly changing), I'd 
argue against trying to solve translating the data itself (strings returned by 
the analytics database). This means that the app itself is localized, and 
things like the dashboard title (e.g. "CompanyX Revenue Dashboard"), chart title
(e.g. "Revenue by Country"), and even dimension labels ("Country") are localized,
but the dimension's content ("United States") cannot be translated. The risk
here is that the surface to cover is extremely wide: a single table
visualization could require thousands of lookups. The level of dynamism here is
crazy/unmanageable.
   
   Another consideration @eschutho brought up is editability. What happens when 
you're in the Spanish locale and edit the dashboard? Are you effectively
updating Spanish metadata in this mode as you alter dashboard and chart titles?
My take on this is no: you'd be editing the raw, untranslated string. To edit
any specific locale, you'd go through a different workflow, effectively editing
the `Translation` model/data, wherever that lives.
   
   A bit more about how this could be achieved with Jinja. Where a dashboard
title might have been "CompanyX Revenue Dashboard", someone could input a Jinja
macro, as in "CompanyX {{ i18n(string_id="rev_dash", en_text="Revenue
Dashboard") }}". When locale='en', this macro would simply return en_text
without doing any lookup. If the user's locale is Spanish, it would perform a
lookup against a `Translation` model that simply stores (string_id, locale,
translated_text), and return what it finds. If nothing is found, it returns
en_text.
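
   A minimal Python sketch of that macro's logic, assuming an in-memory SQLite table stands in for the `Translation` model. The table name, `TranslationStore`, and its `lookups` counter are hypothetical; how the function would be registered as a Jinja global, and where the current locale comes from, is not shown and is left as an assumption.

```python
import sqlite3
from functools import partial
from typing import Optional

# Hypothetical table for the proposed `Translation` model: the comment only
# specifies the columns (string_id, locale, translated_text).
SCHEMA = """
CREATE TABLE translation (
    string_id TEXT NOT NULL,
    locale TEXT NOT NULL,
    translated_text TEXT NOT NULL,
    PRIMARY KEY (string_id, locale)
)
"""


class TranslationStore:
    """In-memory SQLite stand-in for the metadata-db `Translation` model."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(SCHEMA)
        self.lookups = 0  # number of DB round-trips, to check the zero-cost-for-en goal

    def add(self, string_id: str, locale: str, text: str) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO translation VALUES (?, ?, ?)",
            (string_id, locale, text),
        )

    def get(self, string_id: str, locale: str) -> Optional[str]:
        self.lookups += 1
        # The composite primary key makes this an indexed k/v lookup.
        row = self.conn.execute(
            "SELECT translated_text FROM translation"
            " WHERE string_id = ? AND locale = ?",
            (string_id, locale),
        ).fetchone()
        return row[0] if row else None


def i18n(store: TranslationStore, locale: str, string_id: str, en_text: str) -> str:
    """Hypothetical body of the `i18n` macro."""
    if locale == "en":
        return en_text  # zero cost for en: no lookup at all
    translated = store.get(string_id, locale)
    # Fall back to the in-code English text when no translation exists.
    return translated if translated is not None else en_text


if __name__ == "__main__":
    store = TranslationStore()
    store.add("rev_dash", "es", "Panel de ingresos")
    # Bind store and locale so the call matches the macro in the comment.
    es_i18n = partial(i18n, store, "es")
    print("CompanyX " + es_i18n(string_id="rev_dash", en_text="Revenue Dashboard"))
    # prints "CompanyX Panel de ingresos"
```

Binding the store and locale with `functools.partial` keeps the template-facing call down to `i18n(string_id=..., en_text=...)`, and the `lookups` counter stays at zero for any number of `en` renders.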
   
   This in theory satisfies the zero-cost-for-`en` requirement. Now, how much
does it cost to serve a different locale? Simply the cost of an indexed k/v
lookup (probably a few ms) times the number of things to look up. In a
large/rich dashboard, that could be dozens, maybe up to a hundred or so, k/v
lookups. Maybe there could be smart caching, memoization and/or acceleration by
using the caching layer (Redis) as opposed to the metadata db. But overall, I
think it's not super bad to do a dozen k/v lookups while serving a dashboard.
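
   One way the caching idea could look, as a sketch: a read-through cache with a TTL placed in front of the per-key lookup. An in-process dict stands in for Redis here (an assumption; a real deployment would presumably use the existing cache layer instead), and `CachedTranslations` is a hypothetical name. Misses are cached too, since most strings are expected to fall back to `en_text`.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Key = Tuple[str, str]  # (string_id, locale)


class CachedTranslations:
    """Read-through TTL cache in front of a slower translation backend.

    The dict below is a stand-in for Redis (assumption, not Superset code).
    """

    def __init__(
        self,
        backend: Callable[[str, str], Optional[str]],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._backend = backend
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (expiry timestamp, cached value; None means "no translation")
        self._entries: Dict[Key, Tuple[float, Optional[str]]] = {}

    def get(self, string_id: str, locale: str) -> Optional[str]:
        key = (string_id, locale)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # fresh entry, possibly a cached miss
        value = self._backend(string_id, locale)  # e.g. the metadata-db k/v lookup
        self._entries[key] = (now + self._ttl, value)
        return value
```

Caching the `None` result matters here: without it, every untranslated string would hit the backend on every dashboard render.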
   
   One more consideration is the idea of "namespacing" strings and/or bundling 
fetches, where extra logic/semantics would exist to fetch a certain set of 
relevant translations all at once. Say you know you're building a certain
dashboard: in theory you could do an efficient lookup to build a
custom/temporary language pack (with all the strings you need for that
dashboard) in a single request. Personally, I tend to think this is overly
complex/taxing. I prefer the simplicity of a single wide namespace with more
atomic lookups over a more deeply engineered solution. Such a solution might
incorporate fields like `object_type` and `object_id` in that Translation
model, and allow for batch retrievals of translations as opposed to atomic
key/value fetches.
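
   For comparison, the batched alternative (the one argued against above) might look roughly like this: hypothetical `object_type`/`object_id` columns on the translation table, plus a single query that builds a per-dashboard language pack. The schema, column types, and `build_language_pack` are illustrative assumptions, not an existing Superset API.

```python
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical schema: the atomic (string_id, locale) key plus the
# object_type/object_id fields mentioned in the comment.
SCHEMA = """
CREATE TABLE translation (
    string_id TEXT NOT NULL,
    locale TEXT NOT NULL,
    translated_text TEXT NOT NULL,
    object_type TEXT NOT NULL,   -- e.g. 'dashboard' or 'chart'
    object_id INTEGER NOT NULL,
    PRIMARY KEY (string_id, locale)
)
"""


def build_language_pack(
    conn: sqlite3.Connection,
    locale: str,
    objects: List[Tuple[str, int]],
) -> Dict[str, str]:
    """Fetch all translations for the given (object_type, object_id) pairs in one query."""
    if not objects:
        return {}
    # Only placeholders are interpolated; all values go through query parameters.
    clauses = " OR ".join("(object_type = ? AND object_id = ?)" for _ in objects)
    params: List[object] = [locale]
    for object_type, object_id in objects:
        params.extend([object_type, object_id])
    rows = conn.execute(
        "SELECT string_id, translated_text FROM translation"
        f" WHERE locale = ? AND ({clauses})",
        params,
    ).fetchall()
    return {string_id: text for string_id, text in rows}
```

Rendering would then read strings from the returned dict instead of issuing one query per string; the cost is a richer model and extra logic to keep the object associations correct, which is the complexity the comment prefers to avoid.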


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@superset.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

