skorper opened a new pull request #132:
URL: https://github.com/apache/incubator-sdap-nexus/pull/132


   https://issues.apache.org/jira/browse/SDAP-338
   
   Added support for multi-var tiles in the matchup algorithm. This support 
exists for multi-variable swath and gridded tiles. 
   
   I went back and forth a lot about the best way to do this. In the end, I 
decided the best way for now is to make it backwards compatible so the other 
algorithms can be updated as needed. This means all algorithms should work as 
normal on single-var tiles, but each one would need code changes before it can 
work with multi-var tiles. 
   
   An alternative idea would be to update our tile representation drastically, 
which would require all algorithms to be updated for both single and multi var 
tiles. This might be the 'right' way to do this at some point, but for now the 
approach I went with is the one with the smallest impact on other SDAP 
algorithms.
   
   In order to support this I made the following changes:
   
   - Modified `Tile`:
     - Added a new variable `is_multi` which is True when the tile is multi-var.
     - Changed the name of `var_name` to `var_names`. In the single-var case, 
this will be a list of size 1.
   - In `CassandraProxy`, added a new case in `get_lat_lon_time_data_meta` for 
`swath_multi_variable_tile` and `grid_multi_variable_tile`.
     - In both cases, the 'num vars' dimension is moved to the front of the nd 
array. This is counter to the shape provided by the ingester. The ingester has 
the 'num vars' dimension at the end of the nd array. For example, let's say the 
data is size 30 x 30 x 30, and there are two variables. That means the ingester 
will store the tile data as 30 x 30 x 30 x 2, whereas in 
`get_lat_lon_time_data_meta` I'm transforming that to 2 x 30 x 30 x 30. The 
reason is that the algorithm doesn't really have to change: you can run the 
same algorithm on `data[0], data[1], ...` and it should just work. I'm 
definitely looking for feedback on this decision, because I can see the 
argument both ways.
     - In the single var case, the data size would be 30 x 30 x 30 (using the 
example from above) without the extra dimension. Algorithms need to check the 
`is_multi` flag to determine which data shape to expect.
   - Updated `nexustiles._solr_docs_to_tiles` to parse `var_names` from the 
`var_name` field, which is a JSON-encoded array (according to William).
   - Updated `Tile` to utilize dataclasses. This just cleaned up the code and 
allowed removal of boilerplate code.
   - Updated code in various places to work with the case where 
`Tile.is_multi == True`.
   - Added docstrings along the way where they were missing or where the 
changes probably called for clarification.
   - Added more unit tests to test_matchup.py. Updated existing unit tests to 
work with changes.
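
   To make the axis reordering described above concrete, here is a minimal 
sketch using NumPy. It is not the code from this PR: the array contents and the 
helper `per_variable_arrays` are hypothetical, and only the shapes (30 x 30 x 
30 x 2 in, 2 x 30 x 30 x 30 out) come from the example in the description.

```python
import numpy as np

# Hypothetical multi-var tile as the ingester stores it:
# 30 x 30 x 30 data points with the 'num vars' dimension (2) last.
ingested = np.zeros((30, 30, 30, 2))
ingested[..., 1] = 1.0  # fill the second variable so the two are distinguishable

# Move the 'num vars' dimension to the front: 30 x 30 x 30 x 2 -> 2 x 30 x 30 x 30.
data = np.moveaxis(ingested, -1, 0)


def per_variable_arrays(tile_data, is_multi):
    """Return one array per variable (hypothetical helper, not part of SDAP).

    Multi-var data carries a leading 'num vars' axis; single-var data does
    not, so it is wrapped in a one-element list to give callers a uniform view.
    """
    return list(tile_data) if is_multi else [tile_data]


# An algorithm written for single-var data can then loop over the variables.
multi_vars = per_variable_arrays(data, is_multi=True)
single_var = per_variable_arrays(data[0], is_multi=False)
```

   With the variable axis in front, `data[0]` and `data[1]` each have the same 
30 x 30 x 30 shape that a single-var tile would have.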
   
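   The `Tile` and `var_name` changes could look roughly like the sketch below. 
Everything here is an assumption made for illustration: `TileSketch` is a 
reduced stand-in for the real `Tile`, `parse_var_names` is a hypothetical 
helper, the fallback for a plain (non-JSON) `var_name` string is a guess at how 
older single-var documents might look, and deriving `is_multi` from the number 
of names is a simplification rather than what the PR necessarily does.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class TileSketch:
    """Reduced, hypothetical stand-in for the real Tile dataclass."""
    tile_id: str = None
    var_names: List[str] = field(default_factory=list)
    is_multi: bool = False


def parse_var_names(raw):
    """Parse a Solr `var_name` string into a list of variable names.

    Assumes `raw` is a str holding a JSON-encoded array. A plain string that
    is not a JSON array is treated as a single variable name (assumption).
    """
    try:
        parsed = json.loads(raw)
    except ValueError:
        return [raw]
    return parsed if isinstance(parsed, list) else [raw]


# Hypothetical Solr field values.
multi = parse_var_names('["wind_speed", "wind_dir"]')
single = parse_var_names('sst')

# Assumption for this sketch only: treat more than one name as multi-var.
tile = TileSketch(tile_id='example', var_names=multi, is_multi=len(multi) > 1)
```

   In the single-var case `var_names` is still a list, just of size 1, so code 
that iterates over the names does not need a special case.
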
   The matchup algorithm was tested with:
   
   1. Single-var sat to in-situ
   2. Single-var sat to single-var sat
   3. Multi-var sat to single-var sat
   4. Multi-var sat to multi-var sat
   5. Multi-var sat to in-situ
   
   NOTE: I'm still working on adding additional unit tests for `nexustiles` but 
thought I'd open the PR ASAP because it's large. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

