Re: [PR] feat(reports): allow custom na values [superset]

via GitHub Fri, 03 Oct 2025 10:50:22 -0700


betodealmeida commented on code in PR #35481:
URL: https://github.com/apache/superset/pull/35481#discussion_r2402802005



##########
tests/unit_tests/charts/test_client_processing.py:
##########
@@ -2653,3 +2653,172 @@ def test_pivot_multi_level_index():
 | ('Total (Sum)', '', '')           |            210 |            105 |              0 |
     """.strip()
     )
+
+
+def test_apply_client_processing_csv_format_preserves_na_strings():
+    """
+    Test that apply_client_processing preserves "NA" strings
+    when REPORTS_CSV_NA_NAMES is set to empty list.
+    This ensures that scheduled reports can be configured to
+    preserve strings like "NA" as literal values.

Review Comment:
   Nit: [multi-line docstrings should have a one-line summary, then a blank line, then any additional lines](https://peps.python.org/pep-0257/#multi-line-docstrings).
   
   ```suggestion
       Test that apply_client_processing preserves "NA" when REPORTS_CSV_NA_NAMES is [].
   
       This ensures that scheduled reports can be configured to
       preserve strings like "NA" as literal values.
   ```
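
A minimal sketch of the PEP 257 layout mentioned above, using a hypothetical function name (`parse_report_csv` is not part of the PR). It only shows that the first docstring line stands alone as the summary, followed by a blank line and then the details:

```python
import inspect


def parse_report_csv():
    """
    Parse CSV data while preserving literal "NA" strings.

    Additional detail goes here, separated from the summary
    by a single blank line.
    """


# inspect.getdoc() strips the leading blank line and common indentation.
doc = inspect.getdoc(parse_report_csv)
summary, blank, *details = doc.splitlines()
print(summary)
```

Tools that extract only the first docstring line (for example, when listing functions) rely on this layout.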



##########
superset/config.py:
##########
@@ -1354,6 +1354,15 @@ def allowed_schemas_for_csv_upload(  # pylint: disable=unused-argument
 # Values that should be treated as nulls for the csv uploads.
 CSV_DEFAULT_NA_NAMES = list(STR_NA_VALUES)
 
+# Values that should be treated as nulls for scheduled reports CSV processing.
+# If not set or None, defaults to standard pandas NA handling behavior.
+# Set to a custom list to control which values should be treated as null.
+# Examples:
+# REPORTS_CSV_NA_NAMES = None  # Use default pandas NA handling (backwards compatible)
+# REPORTS_CSV_NA_NAMES = []    # Disable all automatic NA conversion
+# REPORTS_CSV_NA_NAMES = ["", "NULL", "null"]  # Only treat these specific values as NA
+REPORTS_CSV_NA_NAMES = list(CSV_DEFAULT_NA_NAMES)

Review Comment:
   ```suggestion
   REPORTS_CSV_NA_NAMES: list[str] | None = None
   ```



##########
tests/unit_tests/charts/test_client_processing.py:
##########
@@ -2653,3 +2653,172 @@ def test_pivot_multi_level_index():
 | ('Total (Sum)', '', '')           |            210 |            105 |              0 |
     """.strip()
     )
+
+
+def test_apply_client_processing_csv_format_preserves_na_strings():
+    """
+    Test that apply_client_processing preserves "NA" strings
+    when REPORTS_CSV_NA_NAMES is set to empty list.
+    This ensures that scheduled reports can be configured to
+    preserve strings like "NA" as literal values.
+    """
+    from unittest.mock import patch
+
+    # CSV data with "NA" string that should be preserved
+    csv_data = "first_name,last_name\nJeff,Smith\nAlice,NA"
+
+    result = {
+        "queries": [
+            {
+                "result_format": ChartDataResultFormat.CSV,
+                "data": csv_data,
+            }
+        ]
+    }
+
+    form_data = {
+        "datasource": "1__table",
+        "viz_type": "table",
+        "slice_id": 1,
+        "url_params": {},
+        "metrics": [],
+        "groupby": [],
+        "columns": ["first_name", "last_name"],
+        "extra_form_data": {},
+        "force": False,
+        "result_format": "csv",
+        "result_type": "results",
+    }
+
+    # Test with REPORTS_CSV_NA_NAMES set to empty list (disable NA conversion)
+    with patch(
+        "superset.charts.client_processing.current_app.config.get"
+    ) as mock_config:
+        # Only mock the specific config key we're testing
+        def mock_get(key, default=None):
+            if key == "REPORTS_CSV_NA_NAMES":
+                return []  # Empty list disables NA conversion
+            return default

Review Comment:
   You can use the `with_config` decorator here, e.g.:
   
   ```python
   @with_config({"REPORTS_CSV_NA_NAMES": []})
   def test_something(...):
       ...
   ```



##########
superset/charts/client_processing.py:
##########
@@ -340,7 +341,16 @@ def apply_client_processing(  # noqa: C901
         if query["result_format"] == ChartDataResultFormat.JSON:
             df = pd.DataFrame.from_dict(data)
         elif query["result_format"] == ChartDataResultFormat.CSV:
-            df = pd.read_csv(StringIO(data))
+            # Use custom NA values configuration for
+            # reports to avoid unwanted conversions
+            # This allows users to control which values should be treated as null/NA
+            na_values = current_app.config.get("REPORTS_CSV_NA_NAMES", None)
+            if na_values is not None:
+                df = pd.read_csv(
+                    StringIO(data), keep_default_na=False, na_values=na_values
+                )
+            else:
+                df = pd.read_csv(StringIO(data))

Review Comment:
   Small nit: `dict.get()` already defaults to `None` when the key doesn't exist, so there's no need for the second argument. But since we know the key exists in the config, we can just access it using brackets.
   
   With this, we can simplify the call a little bit:
   
   ```suggestion
               na_values = current_app.config["REPORTS_CSV_NA_NAMES"]
               df = pd.read_csv(
                   StringIO(data),
                   keep_default_na=na_values is None,
                   na_values=na_values,
               )
   ```
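
To illustrate how the simplified call behaves for each kind of setting, here is a small standalone sketch. The helper name `read_report_csv` is hypothetical and the config lookup is replaced by a plain argument; only `pd.read_csv` with `keep_default_na` and `na_values` comes from the code under review.

```python
from __future__ import annotations

from io import StringIO

import pandas as pd


def read_report_csv(data: str, na_values: list[str] | None) -> pd.DataFrame:
    """Parse CSV text, honoring a REPORTS_CSV_NA_NAMES-style setting."""
    return pd.read_csv(
        StringIO(data),
        # Use pandas' built-in NA markers only when nothing is configured.
        keep_default_na=na_values is None,
        na_values=na_values,
    )


csv_data = "first_name,last_name\nJeff,Smith\nAlice,NA"

default_df = read_report_csv(csv_data, None)      # "NA" is parsed as missing
literal_df = read_report_csv(csv_data, [])        # "NA" is kept as a string
custom_df = read_report_csv(csv_data, ["Smith"])  # only "Smith" is missing

print(default_df["last_name"].isna().tolist())
print(literal_df["last_name"].tolist())
print(custom_df["last_name"].isna().tolist())
```

With `None`, pandas applies its default NA markers; with any list (including an empty one), only the listed strings are treated as missing.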



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

