[jira] [Updated] (SDAP-406) Time series comparison stats issues

Nga Thien Chung (Jira) Tue, 01 Nov 2022 14:21:10 -0700


     [ 
https://issues.apache.org/jira/browse/SDAP-406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Nga Thien Chung updated SDAP-406:
---------------------------------
    Resolution: Fixed
        Status: Done  (was: To Do)

> Time series comparison stats issues
> -----------------------------------
>
>                 Key: SDAP-406
>                 URL: https://issues.apache.org/jira/browse/SDAP-406
>             Project: Apache Science Data Analytics Platform
>          Issue Type: Bug
>          Components: analysis
>            Reporter: Kevin Marlis
>            Priority: Major
>
> {*}In short{*}: the time series comparison stats only compute the linear 
> regression for results whose times match exactly across datasets. For 
> example, DS1 and DS2 are both monthly products, but DS1 data falls on the 
> first of the month and DS2 data falls in the middle of the month. With no 
> matching times across the two datasets, none of the algorithm results are 
> passed to the regression algorithm.
>  
> {*}In detail{*}: The issue is at this line: 
> [https://github.com/apache/incubator-sdap-nexus/blob/22b10f661f02e4b8329e3973234b83b188133d8c/analysis/webservice/algorithms_spark/TimeSeriesSpark.py#L314]
> {{xy}} is appended to only if there are two dictionaries of results in 
> {{item}}, which happens only when both datasets have a result at the same 
> time value. The linear regression algorithm returns NaNs if the x and y 
> arrays contain only one value, which can be problematic downstream. 
> The xs and ys for the regression are never appended to because the dates 
> never line up ({{if len(item) == 2}} is never satisfied). Empty 
> comparison stats don't appear to affect the charts on the frontend.
>  
> *Possible fixes...*
>  * check whether the linear regression results are NaN; if so, set the stats to an empty dict
>  * Date normalization to make the time steps consistent across multiple 
> datasets
>  
> For now we're going with the first option, although the second option could 
> be looked into.
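
The failure mode described in the issue can be reproduced with a small standalone sketch. This is not the SDAP code. The (date, mean) result tuples, pair_by_exact_time, and least_squares are hypothetical stand-ins, and least_squares replaces whatever regression routine TimeSeriesSpark.py actually calls. Pairing by exact timestamp, in the spirit of the {{if len(item) == 2}} check, finds no common times when one dataset falls on the 1st and the other on the 15th. A least-squares fit on fewer than two points then yields NaN.

```python
import math
from collections import defaultdict
from datetime import date

# Hypothetical per-dataset results: one (timestamp, mean) pair per time step.
# DS1 falls on the 1st of each month, DS2 on the 15th (the scenario in the issue).
ds1 = [(date(2022, m, 1), 10.0 + m) for m in range(1, 7)]
ds2 = [(date(2022, m, 15), 20.0 + 2 * m) for m in range(1, 7)]


def pair_by_exact_time(a, b):
    """Group results by timestamp and keep only times present in BOTH datasets,
    mirroring the `if len(item) == 2` check described in the issue."""
    groups = defaultdict(list)
    for t, v in a:
        groups[t].append(v)
    for t, v in b:
        groups[t].append(v)
    xs, ys = [], []
    for t in sorted(groups):
        item = groups[t]
        if len(item) == 2:  # only true when both datasets share this exact time
            xs.append(item[0])  # DS1 value (appended first)
            ys.append(item[1])  # DS2 value
    return xs, ys


def least_squares(xs, ys):
    """Ordinary least-squares (slope, intercept); NaN when the fit is undefined."""
    n = len(xs)
    if n == 0:
        return math.nan, math.nan
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # a single point (or identical x values) has no defined slope
        return math.nan, math.nan
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


xs, ys = pair_by_exact_time(ds1, ds2)
print(len(xs), least_squares(xs, ys))  # 0 (nan, nan)
```

No timestamps are shared, so nothing reaches the regression and the stats come out as NaN.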
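
The two possible fixes listed in the issue might look roughly like the following. Again, this is a hypothetical sketch, not the committed change. The names regress, comparison_stats, and pair_by_month are illustrative, and the returned dict keys are placeholders rather than SDAP's response schema. The sketch also assumes each monthly product has at most one result per calendar month.

```python
import math
from datetime import date


def regress(xs, ys):
    """Ordinary least squares; returns (nan, nan) when the fit is undefined
    (fewer than two points or all x identical). Stand-in for the real routine."""
    n = len(xs)
    if n < 2:
        return math.nan, math.nan
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return math.nan, math.nan
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def comparison_stats(xs, ys):
    """Option 1: if the regression result is NaN, report empty stats ({})."""
    slope, intercept = regress(xs, ys)
    if math.isnan(slope) or math.isnan(intercept):
        return {}
    return {"slope": slope, "intercept": intercept}


def pair_by_month(a, b):
    """Option 2: normalize each timestamp to its (year, month) before matching.
    Assumes at most one value per dataset per month; later entries overwrite earlier ones."""
    a_by_month = {(t.year, t.month): v for t, v in a}
    xs, ys = [], []
    for t, v in sorted(b):
        key = (t.year, t.month)
        if key in a_by_month:
            xs.append(a_by_month[key])
            ys.append(v)
    return xs, ys


# Same hypothetical scenario: DS1 on the 1st of each month, DS2 on the 15th.
ds1 = [(date(2022, m, 1), 10.0 + m) for m in range(1, 7)]
ds2 = [(date(2022, m, 15), 20.0 + 2 * m) for m in range(1, 7)]

print(comparison_stats([], []))                    # {}
print(comparison_stats(*pair_by_month(ds1, ds2)))  # {'slope': 2.0, 'intercept': 0.0}
```

Option 1 only suppresses the NaN output. Option 2 is what would actually let a 1st-of-month product and a mid-month product be compared, at the cost of treating any two values in the same calendar month as coincident.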



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
