skorper opened a new pull request #132: URL: https://github.com/apache/incubator-sdap-nexus/pull/132
https://issues.apache.org/jira/browse/SDAP-338

Added support for multi-var tiles in the matchup algorithm. This support exists for multi-variable swath and gridded tiles.

I went back and forth a lot about the best way to do this. In the end, I decided the best way for now is to make it backwards compatible so the other algorithms can be updated as needed. This means all algorithms should work as normal on single-var tiles, but would need code updates to work with multi-var tiles. An alternative would be to change our tile representation drastically, which would require all algorithms to be updated for both single- and multi-var tiles. This might be the 'right' way to do this at some point, but for now the approach I went with is the one with the smallest impact on other SDAP algorithms.

In order to support this I made the following changes:

- Modified `Tile`:
  - Added a new variable `is_multi` which is True when the tile is multi-var.
  - Changed the name of `var_name` to `var_names`. In the single-var case, this will be a list of size 1.
- In `CassandraProxy`, added a new case in `get_lat_lon_time_data_meta` for `swath_multi_variable_tile` and `grid_multi_variable_tile`.
  - In both cases, the 'num vars' dimension is moved to the front of the nd array. This is counter to the shape provided by the ingester, which has the 'num vars' dimension at the end of the nd array. For example, let's say the data is size 30 x 30 x 30, and there are two variables. That means the ingester will store the tile data as 30 x 30 x 30 x 2, whereas in `get_lat_lon_time_data_meta` I'm transforming that to 2 x 30 x 30 x 30. The reason is that the algorithm doesn't really have to change: you can run the same algorithm on `data[0], data[1], ...` and it should just work. I'm definitely looking for feedback on this decision, because I can see the argument both ways.
  - In the single-var case, the data size would be 30 x 30 x 30 (using the example from above) without the extra dimension. Algorithms need to use the `is_multi` flag to determine which data shape to expect.
- Updated `nexustiles._solr_docs_to_tiles` to parse `var_names` from the `var_name` field, where that field is a JSON-encoded array (according to William).
- Updated `Tile` to utilize dataclasses. This just cleaned up the code and allowed removal of boilerplate code.
- Updated code in various places to work with the case where `Tile.is_multi == True`.
- Added docstrings as I went, where they were missing or where clarification was probably needed due to the changes.
- Added more unit tests to `test_matchup.py`. Updated existing unit tests to work with the changes.

The matchup algorithm was tested with:

1. Single-var sat to in-situ
2. Single-var sat to single-var sat
3. Multi-var sat to single-var sat
4. Multi-var sat to multi-var sat
5. Multi-var sat to in-situ

NOTE: I'm still working on adding additional unit tests for `nexustiles` but thought I'd open the PR ASAP because it's large.
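
To make the shape convention described above easier to review, here is a minimal sketch of the idea in plain NumPy. It is not the code from this PR: the helper names `parse_var_names`, `vars_first`, and `per_variable` and the variable names `wind_speed` and `wind_dir` are hypothetical, and the only assumptions taken from the description are that `var_name` holds a JSON-encoded array and that the ingester stores the 'num vars' dimension last.

```python
import json

import numpy as np


def parse_var_names(raw):
    """Decode a JSON-encoded array of variable names (hypothetical helper)."""
    names = json.loads(raw)
    # Assumption: a bare JSON string is treated as a single-variable tile.
    return names if isinstance(names, list) else [names]


def vars_first(stored, is_multi):
    """Move the trailing 'num vars' axis to the front for multi-var data."""
    # Single-var data has no extra axis, so it is returned unchanged.
    return np.moveaxis(stored, -1, 0) if is_multi else stored


def per_variable(data, is_multi):
    """Return one array per variable, whatever the tile type."""
    # Iterating a multi-var array walks the leading 'num vars' axis.
    return list(data) if is_multi else [data]


# Two variables stored ingester-style as 30 x 30 x 30 x 2.
var_names = parse_var_names('["wind_speed", "wind_dir"]')
stored = np.zeros((30, 30, 30, 2))
stored[..., 1] = 1.0  # mark the second variable so it can be told apart

data = vars_first(stored, is_multi=True)
print(data.shape)  # (2, 30, 30, 30)

# The same single-variable logic can now run on data[0], data[1], ...
for name, values in zip(var_names, per_variable(data, is_multi=True)):
    print(name, values.shape, float(values.mean()))
```

Since `np.moveaxis` returns a view, the reordering itself does not copy the tile data. The cost of this layout is that callers have to check `is_multi` before indexing, which is the trade-off I'd like feedback on.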