renaudk opened a new issue, #36798:
URL: https://github.com/apache/superset/issues/36798

   ### Bug description
   
   ### Summary
   When using **ECharts time-series charts (area chart)** with **Time Grain = 
HOUR**, Superset 6.x generates an **incorrect SQL expression** for the hourly 
bucket.
   
   The generated SQL **drops the `DATE()` truncation**, which prevents proper 
aggregation and results in one row per record.
   
   This is a **regression compared to Superset 4.1.4**.
   
   ---
   
   ### Superset version
   - Superset: **6.x** (reproduced on latest 6.0.x)
   - Python: 3.11
   - Chart type: **ECharts – Area Chart**
   - Time grain: **HOUR (PT1H)**
   
   ---
   
   ### Database / Engine
   - Tested with:
     - **StarRocks** (`starrocks://`)
     - **MySQL** (`mysql://`)
   - The issue occurs **independently of the SQLAlchemy backend**.
   
   ---
   
   ### Expected behavior
   For Time Grain = **HOUR**, Superset should generate a bucket that truncates 
minutes and seconds, for example:
   
   ```sql
   DATE_ADD(DATE(col), INTERVAL HOUR(col) HOUR)
   ```
   
   (or any equivalent expression that produces `YYYY-MM-DD HH:00:00`).
   
   This correctly aggregates rows by hour.
   
   ---
   
   ### Actual behavior
   Superset generates:
   
   ```sql
   DATE_ADD(col, INTERVAL (HOUR(col)) HOUR)
   ```
   
   This expression:
   - does **not** truncate minutes/seconds
   - keeps `mm:ss` from the original timestamp
   - results in **no aggregation** (each row remains unique)
   
   ---
   
   ### Evidence / Analysis
   
   - `StarRocksEngineSpec.get_time_grain_expressions()["PT1H"]` correctly 
returns:
   
   ```sql
   DATE_ADD(DATE({col}), INTERVAL HOUR({col}) HOUR)
   ```
   
   - However, for **ECharts charts**, the SQL rendered in the final query 
**drops `DATE()`**.
   - A marker inserted in `get_time_grain_expressions()` confirms that:
     - the mapping **is called**
     - but the expression is later **re-rendered / normalized**, losing `DATE()`
   
   Example observed query fragment:
   
   ```sql
   /* __MARK_PT1H__ */
   GROUP BY
     DATE_ADD(`START_DATETIME_UTC`, INTERVAL (HOUR(`START_DATETIME_UTC`)) HOUR)
   ```
   
   This proves that:
   - the EngineSpec mapping is consulted
   - but the final SQL is reconstructed incorrectly afterward
   
   ---
   
   ### Why this is a bug
   - The expression produced by Superset does **not implement an hourly time 
bucket**
   - It breaks all time-series aggregations at the HOUR grain
   - This affects **all databases**, not only StarRocks
   - Superset 4.1.4 did not exhibit this behavior
   
   ---
   
   ### Workaround
   A workaround is to override the PT1H grain with a non-optimizable 
expression, for example:
   
   ```sql
   DATE_ADD(CAST(DATE({col}) AS DATETIME), INTERVAL HOUR({col}) HOUR)
   ```
   
   This prevents the renderer from dropping the truncation.
   
   Here is the full patch to put in config file:
   ```python
   def FLASK_APP_MUTATOR(app):
       from superset.db_engine_specs.starrocks import StarRocksEngineSpec
   
       orig = StarRocksEngineSpec.get_time_grain_expressions
   
       @classmethod
       def patched(cls):
           d = dict(orig() or {})
           d["PT1H"] = "DATE_ADD(CAST(DATE({col}) AS DATETIME), INTERVAL 
HOUR({col}) HOUR)"
           return d
   
       try:
           orig.cache_clear()
       except Exception:
           pass
   
       StarRocksEngineSpec.get_time_grain_expressions = patched
       return app
   ```
   
   ---
   
   ### Suggested fix
   - Ensure that **ECharts time-series SQL generation respects the full 
expression returned by `get_time_grain_expressions()`**
   - Avoid reconstructing the HOUR expression manually
   - Alternatively, use a canonical implementation like:
   
   ```sql
   date_trunc('hour', col)
   ```
   
   where supported.
   
   ---
   
   ### Reproduction steps
   1. Create a dataset with a DATETIME/TIMESTAMP column
   2. Create an **ECharts Area Chart**
   3. Set:
      - Time Grain = **HOUR**
   4. Inspect **View query**
   5. Observe `DATE_ADD(col, INTERVAL HOUR(col) HOUR)` instead of a truncated 
timestamp
   
   ---
   
   ### Additional notes
   - Switching between `mysql://` and `starrocks://` does **not** change the 
behavior
   - Other time grains (DAY, MONTH, YEAR) behave correctly
   - This appears specific to **PT1H handling in the ECharts query pipeline**
   
   
   ### Screenshots/recordings
   
   _No response_
   
   ### Superset version
   
   6.0.0
   
   ### Python version
   
   3.11
   
   ### Node version
   
   16
   
   ### Browser
   
   Chrome
   
   ### Additional context
   
   _No response_
   
   ### Checklist
   
   - [x] I have searched Superset docs and Slack and didn't find a solution to 
my problem.
   - [x] I have searched the GitHub issue tracker and didn't find a similar bug 
report.
   - [x] I have checked Superset's logs for errors and if I found a relevant 
Python stacktrace, I included it here as text in the "additional context" 
section.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to