mistercrunch commented on issue #32139: URL: https://github.com/apache/superset/issues/32139#issuecomment-2675387707
After talking to a few folks, I wanted to capture a handful of requirements and thoughts here:

- requirement: zero performance hit for `en`: given that 95%+ (to my knowledge) of Superset use cases are single-language, and generally `en`, I would love a way to guarantee that serving single-language/`en` has zero performance hit from both a UX and a systems standpoint. Strictly speaking, this would mean no extra lookups and zero extra joins when operating in `en`.
- readability/searchability: the use of "translated string ids" seems important for efficient indexing and lookup, meaning a message like "The dashboard couldn't be created since .... Please ..." probably needs some sort of more efficient id for lookup. Personally, I think it's a requirement to keep `en` strings living in code, both to help with code greppability and to serve the zero-cost-for-`en` requirement stated above. What I'm trying to avoid is having the codebase and/or metadata use cryptic string ids that require a database lookup to make sense of.
- no data translations: while we currently cover translating the app (static on a per-release basis), and now want to add support for translating metadata (some of the strings stored in the metadata database, which change slowly), I'd argue against trying to translate the data itself (strings returned by the analytics database). This means that the app itself is localized, and things like the dashboard title (e.g. "CompanyX Revenue Dashboard"), chart title (e.g. "Revenue by Country"), and even dimension labels ("Country") are localized, but the dimension's content ("United States") cannot be translated. The risk here is that the surface to cover is extremely wide, and a single data table visualization could require thousands of lookups. The level of dynamism here is crazy/unmanageable.

Another consideration @eschutho brought up is editability. What happens when you're in the Spanish locale and edit the dashboard?
Are you effectively updating Spanish metadata in this mode as you alter dashboard and chart titles? My take on this is no: you'd be editing the raw, untranslated string. To edit any specific locale, you'd go through a different workflow, effectively editing the `Translations` model/data, wherever that lives.

A bit more about how this could be achieved with Jinja. Where a dashboard title might have been "CompanyX Revenue Dashboard", someone could input a Jinja macro, as in `CompanyX {{ i18n(string_id="rev_dash", en_text="Revenue Dashboard") }}`. When locale='en', this Jinja macro would simply return the en_text without doing any lookup. If the user's locale is Spanish, it would perform a lookup against a `Translation` model that simply stores (string_id, locale, translated_text), and return what it finds. If nothing is found, it returns the en_text. In theory, this satisfies the zero-cost-for-`en` requirement.

Now, how much does it cost to serve a different locale? Simply the cost of an indexed k/v lookup (probably a few ms each), times the number of things to look up. In a large/rich dashboard, that could be dozens, maybe up to a hundred or so, k/v lookups. There could be smart caching, memoization, and/or acceleration using the caching layer (Redis) instead of the metadata db. But overall, I think doing a dozen k/v lookups while serving a dashboard is not that bad.

One more consideration is the idea of "namespacing" strings and/or bundling fetches, where extra logic/semantics would exist to fetch a certain set of relevant translations all at once. Say you know you're building a certain dashboard: in theory, you could do an efficient lookup to build a custom/temporary language pack (with all the strings you need for that particular dashboard) in a single request. Personally, I tend to think this is overly complex/taxing. I prefer the simplicity of a single wide namespace with more atomic lookups over a more deeply engineered solution.
Such a bundled solution might incorporate fields like `object_type` and `object_id` in the `Translation` model, allowing batch retrieval of translations rather than atomic key/value fetches.
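A minimal Python sketch of the `i18n` macro idea described above, for illustration only. It assumes a `Translation`-style table keyed on `(string_id, locale)`, uses an in-memory SQLite database as a stand-in for the metadata db, and memoizes results in a plain dict as a stand-in for Redis or another caching layer. `TranslationStore`, `make_i18n`, `prefetch`, and the `lookups` counter are hypothetical names, not existing Superset APIs. The `en` path returns `en_text` without touching the store; other locales do one indexed lookup per string (cached afterwards) and fall back to `en_text` when no translation exists. `prefetch` sketches the bundled alternative: a single `IN (...)` query that warms the cache for a set of ids.

```python
import sqlite3
from typing import Callable, Dict, List, Optional, Tuple

DEFAULT_LOCALE = "en"  # assumption: en text lives in code/metadata, never in the table


class TranslationStore:
    """Hypothetical k/v store mirroring a (string_id, locale, translated_text) model."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        # The composite primary key doubles as the index for (string_id, locale) lookups.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS translation ("
            " string_id TEXT NOT NULL,"
            " locale TEXT NOT NULL,"
            " translated_text TEXT NOT NULL,"
            " PRIMARY KEY (string_id, locale))"
        )
        self.lookups = 0  # number of database queries issued, for illustration
        # Memoizes hits and misses (None) so each key hits the db at most once.
        self._cache: Dict[Tuple[str, str], Optional[str]] = {}

    def add(self, string_id: str, locale: str, text: str) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO translation VALUES (?, ?, ?)",
            (string_id, locale, text),
        )
        self._cache.pop((string_id, locale), None)  # invalidate any stale entry

    def get(self, string_id: str, locale: str) -> Optional[str]:
        """Atomic lookup: one indexed query per uncached (string_id, locale)."""
        key = (string_id, locale)
        if key not in self._cache:
            self.lookups += 1
            row = self.conn.execute(
                "SELECT translated_text FROM translation"
                " WHERE string_id = ? AND locale = ?",
                key,
            ).fetchone()
            self._cache[key] = row[0] if row else None
        return self._cache[key]

    def prefetch(self, locale: str, string_ids: List[str]) -> None:
        """Bundled alternative: warm the cache for many ids with a single query."""
        # Deduplicate (order-preserving) and skip keys that are already cached.
        ids = [s for s in dict.fromkeys(string_ids) if (s, locale) not in self._cache]
        if not ids:
            return
        self.lookups += 1
        placeholders = ",".join("?" * len(ids))
        rows = self.conn.execute(
            "SELECT string_id, translated_text FROM translation"
            f" WHERE locale = ? AND string_id IN ({placeholders})",
            [locale, *ids],
        ).fetchall()
        found = dict(rows)
        for s in ids:
            self._cache[(s, locale)] = found.get(s)  # cache misses as None too


def make_i18n(store: TranslationStore, locale: str) -> Callable[[str, str], str]:
    """Bind a hypothetical i18n(string_id, en_text) helper to the user's locale."""

    def i18n(string_id: str, en_text: str) -> str:
        if locale == DEFAULT_LOCALE:
            return en_text  # zero-cost path: no lookup at all for en
        translated = store.get(string_id, locale)
        return en_text if translated is None else translated  # fall back to en

    return i18n


if __name__ == "__main__":
    store = TranslationStore(sqlite3.connect(":memory:"))
    store.add("rev_dash", "es", "Panel de ingresos")
    # Prints the en title, the es translation, and the fr fallback to en.
    for user_locale in ("en", "es", "fr"):
        i18n = make_i18n(store, user_locale)
        print(user_locale, "CompanyX " + i18n(string_id="rev_dash", en_text="Revenue Dashboard"))
```

To expose something like this to templates, the bound callable could be registered as a Jinja global (e.g. `env.globals["i18n"] = make_i18n(store, user_locale)` on a `jinja2.Environment`); how it would plug into Superset's own Jinja context is left open here.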