[ https://issues.apache.org/jira/browse/SDAP-406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nga Thien Chung updated SDAP-406:
---------------------------------
    Resolution: Fixed
        Status: Done  (was: To Do)

> Time series comparison stats issues
> -----------------------------------
>
>                 Key: SDAP-406
>                 URL: https://issues.apache.org/jira/browse/SDAP-406
>             Project: Apache Science Data Analytics Platform
>          Issue Type: Bug
>          Components: analysis
>            Reporter: Kevin Marlis
>            Priority: Major
>
> *In short*: the time series comparison stats only compute the linear
> regression over results whose times line up exactly. For example, DS1 and
> DS2 are both monthly products, but DS1 data falls on the first of the month
> and DS2 data falls in the middle of the month. With no matching times across
> the two datasets, none of the algorithm's result data is passed to the
> regression algorithm.
>
> *In detail*: The issue is at this line:
> [https://github.com/apache/incubator-sdap-nexus/blob/22b10f661f02e4b8329e3973234b83b188133d8c/analysis/webservice/algorithms_spark/TimeSeriesSpark.py#L314]
> {{xy}} is only appended to when {{item}} contains two result dictionaries,
> which happens only when both datasets have a result at the identical time
> value. Because the dates never line up, {{if len(item) == 2}} is never
> satisfied, so the xs and ys for the regression are never appended to. The
> linear regression algorithm returns NaNs if the x and y arrays contain only
> one value, which can be problematic downstream. Empty comparison stats, by
> contrast, don't appear to affect the charts on the frontend.
>
> *Possible fixes:*
> * Check whether the linear regression results are NaN; if so, set the stats
> to an empty dict.
> * Normalize dates so the time steps are consistent across multiple datasets.
>
> For now we're going with the first option, although the second option could
> be looked into.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
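The failure mode described in this issue, and both proposed fixes, can be illustrated with a minimal, hypothetical Python sketch. It is not the code in TimeSeriesSpark.py: the record shape (dicts with {{ds}}, {{time}} and {{mean}} keys), the helper names {{group_by_time}}, {{linear_regression}}, {{comparison_stats}} and {{normalize_monthly}}, and the choice to snap monthly timestamps to the first of the month are all assumptions made for illustration. A plain least-squares fit stands in for whatever regression routine SDAP actually calls.

```python
import math
from collections import defaultdict
from datetime import date


def group_by_time(results):
    """Group per-dataset records by timestamp, in chronological order.

    Hypothetical record shape: {"ds": 0 or 1, "time": date, "mean": float}.
    """
    groups = defaultdict(list)
    for record in results:
        groups[record["time"]].append(record)
    return [groups[t] for t in sorted(groups)]


def linear_regression(xs, ys):
    """Ordinary least-squares fit of ys on xs (stand-in for SDAP's routine).

    Mirrors the failure mode in the issue: with fewer than two points, or a
    constant series, the statistics are undefined and come back as NaN.
    """
    n = len(xs)
    nan_stats = {"slope": math.nan, "intercept": math.nan, "r": math.nan}
    if n < 2:
        return nan_stats
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0:
        return nan_stats  # slope is undefined when every x is identical
    slope = sxy / sxx
    # r is undefined when ys is constant
    r = sxy / math.sqrt(sxx * syy) if syy > 0 else math.nan
    return {"slope": slope, "intercept": my - slope * mx, "r": r}


def comparison_stats(grouped):
    """Regress the second dataset's means on the first's at shared timestamps."""
    xy = [[], []]
    for item in grouped:
        # Same condition as in the issue: only timestamps present in BOTH
        # datasets contribute a pair to the regression.
        if len(item) == 2:
            for record in item:
                xy[record["ds"]].append(record["mean"])
    stats = linear_regression(xy[0], xy[1])
    # Fix option 1: return an empty dict instead of passing NaNs downstream.
    if any(math.isnan(v) for v in stats.values()):
        return {}
    return stats


def normalize_monthly(results):
    """Fix option 2 (hypothetical): snap each timestamp to the 1st of its month."""
    return [dict(record, time=record["time"].replace(day=1)) for record in results]


# Two monthly products: DS1 on the 1st, DS2 mid-month (made-up values).
ds1 = [{"ds": 0, "time": date(2020, m, 1), "mean": float(m)} for m in range(1, 5)]
ds2 = [{"ds": 1, "time": date(2020, m, 15), "mean": 2.0 * m + 1} for m in range(1, 5)]

print(comparison_stats(group_by_time(ds1 + ds2)))
# → {}
print(comparison_stats(group_by_time(normalize_monthly(ds1 + ds2))))
# → {'slope': 2.0, 'intercept': 1.0, 'r': 1.0}
```

With the raw timestamps no pair is ever formed, so the sketch returns an empty dict (the first fix) instead of NaNs. After both series are snapped to the first of the month, four pairs line up and the fit recovers the slope of 2 and intercept of 1 used to generate the made-up data.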