[ https://issues.apache.org/jira/browse/SDAP-406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nga Thien Chung updated SDAP-406:
---------------------------------
Resolution: Fixed
Status: Done (was: To Do)
> Time series comparison stats issues
> -----------------------------------
>
> Key: SDAP-406
> URL: https://issues.apache.org/jira/browse/SDAP-406
> Project: Apache Science Data Analytics Platform
> Issue Type: Bug
> Components: analysis
> Reporter: Kevin Marlis
> Priority: Major
>
> {*}In short{*}: the time series comparison stats only compute the linear
> regression over results whose timestamps match exactly. For example, DS1 and
> DS2 are both monthly products, but DS1 data falls on the first of the month
> and DS2 data falls on the middle of the month. With no matching times across
> the two datasets, none of the algorithm's results are passed to the
> regression algorithm.
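
A minimal sketch of the mismatch described above, assuming each dataset's results can be viewed as a mapping from date to value. The dates and values are made up for illustration, and the exact-match pairing only mirrors the behaviour described in the issue; it is not the SDAP code itself.

```python
from datetime import date

# Hypothetical monthly series: DS1 is stamped on the 1st of each month,
# DS2 on the 15th. Values are invented for illustration.
ds1 = {date(2020, m, 1): 10.0 + m for m in range(1, 7)}
ds2 = {date(2020, m, 15): 20.0 + m for m in range(1, 7)}

# Pair values only where both datasets share an identical timestamp,
# mirroring the exact-match behaviour described in the issue.
common_times = sorted(ds1.keys() & ds2.keys())
xs = [ds1[t] for t in common_times]
ys = [ds2[t] for t in common_times]

print(len(common_times))  # 0: no timestamps line up, so xs and ys are empty
```

Even though both products are monthly, an exact-timestamp join produces no pairs, so the regression receives no data.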
>
> {*}In detail{*}: The issue is at this line:
> [https://github.com/apache/incubator-sdap-nexus/blob/22b10f661f02e4b8329e3973234b83b188133d8c/analysis/webservice/algorithms_spark/TimeSeriesSpark.py#L314]
> {{xy}} is appended to only when {{item}} contains two dictionaries of
> results, which happens only when both datasets have a result at an identical
> time value. Because the dates never line up, {{if len(item) == 2}} is never
> satisfied and the xs and ys for the regression are never appended to. The
> linear regression algorithm returns NaNs when the x and y arrays contain only
> one value, which can be problematic downstream. Empty comparison stats do not
> appear to affect the charts on the frontend.
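
The pairing step can be sketched roughly as follows. This is a hypothetical reconstruction, not the code at the linked line: the result dictionaries, the `"time"`/`"mean"` keys, the grouping by timestamp and the helper `simple_linregress` are all assumptions. The helper is a plain least-squares fit written with the standard library (not the regression routine SDAP actually calls) that, like the behaviour described above, yields NaN when the slope is undefined.

```python
import math
from collections import defaultdict

def simple_linregress(xs, ys):
    """Ordinary least-squares slope and intercept (hypothetical helper).

    Returns (nan, nan) when the slope is undefined: no points, a single
    point, or all x values identical (zero variance in x).
    """
    n = len(xs)
    if n == 0:
        return math.nan, math.nan
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return math.nan, math.nan
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical per-dataset results; the keys are illustrative only.
ds1_results = [{"time": "2020-01-01", "mean": 11.0}, {"time": "2020-02-01", "mean": 12.0}]
ds2_results = [{"time": "2020-01-15", "mean": 21.0}, {"time": "2020-02-15", "mean": 22.0}]

# Group results from both datasets by timestamp; each group plays the role of `item`.
grouped = defaultdict(list)
for result in ds1_results + ds2_results:
    grouped[result["time"]].append(result)

xy = [[], []]
for time, item in sorted(grouped.items()):
    if len(item) == 2:  # only true when both datasets report this exact timestamp
        xy[0].append(item[0]["mean"])
        xy[1].append(item[1]["mean"])

slope, intercept = simple_linregress(xy[0], xy[1])
print(xy, slope, intercept)  # [[], []] nan nan
```

Every group holds a single result, so nothing reaches `xy` and the fit degenerates to NaN.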
>
> *Possible fixes*
> * Check whether the linear regression results are NaN; if so, set the stats
> to an empty dict.
> * Normalize dates so that time steps are consistent across multiple
> datasets.
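
The first fix could look something like the guard below. It assumes the comparison stats are a flat dict of regression outputs; the function name `sanitize_comparison_stats` and the keys `slope`, `intercept` and `r` are hypothetical, since the issue does not say which fields are checked.

```python
import math

def sanitize_comparison_stats(stats):
    """Return `stats` unchanged, or an empty dict if any float value is NaN.

    `stats` is assumed to be a flat dict of regression outputs; the key
    names used below are illustrative only.
    """
    for value in stats.values():
        if isinstance(value, float) and math.isnan(value):
            return {}
    return stats

# Stats from a regression on zero or one paired points (all NaN).
degenerate = {"slope": math.nan, "intercept": math.nan, "r": math.nan}
# Stats from a well-posed regression.
healthy = {"slope": 2.0, "intercept": 1.0, "r": 1.0}

print(sanitize_comparison_stats(degenerate))  # {}
print(sanitize_comparison_stats(healthy))
```

Dropping the whole dict when any value is NaN is a conservative choice. Because `numpy.float64` subclasses `float`, the `isinstance` check also covers NumPy scalar results.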
>
> For now we're going with the first option, although the second option may be
> worth investigating later.
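
The second option, date normalization, might be sketched as snapping every timestamp to a common monthly time step before pairing. The rule "first day of the month" and the helper `to_month_start` are assumptions chosen for monthly products; other cadences would need a different rule.

```python
from datetime import date

def to_month_start(d):
    """Snap a date to the first day of its month (hypothetical normalization rule)."""
    return d.replace(day=1)

# Same hypothetical monthly series as before: DS1 on the 1st, DS2 on the 15th.
ds1 = {date(2020, m, 1): 10.0 + m for m in range(1, 7)}
ds2 = {date(2020, m, 15): 20.0 + m for m in range(1, 7)}

# Re-key both datasets on the normalized time step. If two dates in one
# dataset fall in the same month, the last one encountered wins.
norm1 = {to_month_start(t): v for t, v in ds1.items()}
norm2 = {to_month_start(t): v for t, v in ds2.items()}

common_times = sorted(norm1.keys() & norm2.keys())
xs = [norm1[t] for t in common_times]
ys = [norm2[t] for t in common_times]

print(len(common_times))  # 6
```

After normalization all six months pair up, so the regression would receive real data instead of empty arrays.
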
--
This message was sent by Atlassian Jira
(v8.20.10#820010)