[PR] feat: dataset folders (backend) [superset]

via GitHub Wed, 05 Mar 2025 11:48:20 -0800


betodealmeida opened a new pull request, #32520:
URL: https://github.com/apache/superset/pull/32520


   <!---
   Please write the PR title following the conventions at 
https://www.conventionalcommits.org/en/v1.0.0/
   Example:
   fix(dashboard): load charts correctly
   -->
   
   ### SUMMARY
   <!--- Describe the change below, including rationale and design decisions -->
   
   Tentative implementation of the backend for 
https://github.com/apache/superset/issues/32351.
   
   I ended up doing the simplest thing that works, based on these requirements:
   
   1. We want to reuse existing dataset APIs (there will be no new APIs to 
manage folders).
   2. Order of folders is important, as well as the order of elements within 
folders.
   3. Support for nested folders should be present from the start (new 
requirement).
   
   I started the implementation with a `Folder` model, and mapped relationships:
   
   - `Folder n:1 SqlaTable`
   - `SqlMetric n:1 Folder`
   - `TableColumn n:1 Folder`
   
   One problem with this approach is that order is important, so I had to keep 
track of the order of each folder and each element inside a folder. This 
required additional columns, and complex bookkeeping when a dataset was updated 
— did the positions change? Were metrics removed, columns added, folders 
renamed? Moving the last element in a folder to the first position, for 
example, would require updating the position of all elements inside the folder.
   
   Additionally, representing nested folders would require additional 
relationships to be tracked.
   
   I ditched that approach, and instead opted to serialize the folder structure 
to a new JSON column in the `SqlaTable` model called `folders`. When doing a 
`GET` request to `/api/v1/dataset/` or `/api/v1/explore/` the response now 
includes UUIDs for metrics and columns, and has the new attribute `folders`:
   
   ```python
   {
     ...
     "metrics": [
       {
         "metric_name": "count",
         "uuid": "uuid2",
         ...
       },
       ...
     ],
     "columns": [
       {
         "column_name": "country",
         "uuid": "uuid5",
         ...
       },
       {
         "column_name": "column-not-in-any-folder",
         "uuid": "uuid6",
         ...
       },
       ...
     ],
     "folders": [
       {
         "type": "folder",
         "uuid": "uuid1",
         "name": "My metrics",
         "children": [
           {
             "type": "metric",
             "uuid": "uuid2",
             "name": "count",
           },
         ],
       },
       {
         "type": "folder",
         "uuid": "uuid3",
         "name": "My columns",
         "children": [
           {
             "type": "folder",
             "uuid": "uuid4",
             "name": "Dimensions",
             "children": [
               {
                 "type": "column",
                 "uuid": "uuid5",
                 "name": "country",
               },
             ],
           },
         ],
       },
     ]
   }
   ```
   
   Note that for metrics and columns `type` and `name` are redundant (since 
they're still present in the resposne payload under `metrics` and `columns`, 
respectively). But to make the API clearer I uncluded them in the response, 
even if technically only the UUID is needed.
   
   With this solution the frontend can easily build the UI from this response. 
Note that the payload only includes custom folders, and the `metrics` and 
`columns` attribute in the response are unmodified (meaning a metric that is 
present in a folder will still show up under `metrics`). It's up to the 
frontend to build the existing "Columns" and "Metrics" sections by removing any 
elements that are present in custom folders. This way we can build the feature 
progressively by first enhancing the API, and later adding the custom UI.
   
   To organize the metrics and columns into folders, as well as create new 
folders, the user must edit the dataset (dataset creation doesn't show metrics 
nor columns). After creating folders and adding metrics/columns to them the 
user can save the dataset. The `PUT` request will then send a payload that can 
also be enhanced with the `folders` attribute.
   
   There is no model for folders, since it offered little to no value. Instead, 
the client simply declare the folder name, UUID, and an optional description, 
as well as the tree structure. The UUID is used for external bookkeeping, for 
example, for systems where semantics are defined outside of Superset and 
periodically synced via the API.
   
   ### BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
   <!--- Skip this if not applicable -->
   
   N/A
   
   ### TESTING INSTRUCTIONS
   <!--- Required! What steps can be taken to manually verify the changes? -->
   
   Added tests.
   
   ### ADDITIONAL INFORMATION
   <!--- Check any relevant boxes with "x" -->
   <!--- HINT: Include "Fixes #nnn" if you are fixing an existing issue -->
   - [ ] Has associated issue:
   - [ ] Required feature flags:
   - [ ] Changes UI
   - [ ] Includes DB Migration (follow approval process in 
[SIP-59](https://github.com/apache/superset/issues/13351))
     - [ ] Migration is atomic, supports rollback & is backwards-compatible
     - [ ] Confirm DB migration upgrade and downgrade tested
     - [ ] Runtime estimates and downtime expectations provided
   - [ ] Introduces new feature or API
   - [ ] Removes existing feature or API
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[PR] feat: dataset folders (backend) [superset]

Reply via email to