[GitHub] [incubator-sdap-nexus] skorper opened a new pull request, #235: SDAP-415: Satellite to satellite queries fail if using an L2 dataset

via GitHub Mon, 13 Mar 2023 23:32:10 -0700


skorper opened a new pull request, #235:
URL: https://github.com/apache/incubator-sdap-nexus/pull/235


   Fixed bug where satellite to satellite queries fail if using an L2 dataset. 
   
   This issue was caused by secondary satellite tile masks being combined 
incorrectly, which masked the entire secondary tile. In NumPy masked arrays, 
`True` in the mask means the value is invalid, so a `logical_or` against an 
entirely masked array yields an entirely masked array. The issue cropped up 
when running sat to sat matchup with VIIRS as the secondary dataset, because 
VIIRS contains many null values for some variables; in many cases the entire 
variable in the tile is masked. 
   
   Simplifying the problem, our old logic was like this:
   
   ```python
   >>> import numpy as np
   >>> a = np.ma.masked_array([1.0, 2.0, 3.0, 4.0], mask=[0, 0, 1, 0])
   >>> b = np.ma.masked_array([5.0, 6.0, 7.0, 8.0], mask=[1, 1, 1, 1])
   >>> np.logical_or(a, b)
   masked_array(data=[--, --, --, --],
                mask=[ True,  True,  True,  True],
          fill_value=1e+20,
               dtype=bool)
   ```
   
   where `True` means drop the value and `False` means keep the value. This is 
not what we want! We want the inverse logic, where combining a masked array 
with an entirely masked array leaves the first array's valid values intact.
   
   Our new logic is like this:
   
   ```python
   >>> np.logical_not(np.logical_and(a.mask, b.mask))
   array([ True,  True, False,  True])
   ```
   
   where `True` means keep the value and `False` means drop the value. A value 
is now dropped only if it is masked in both arrays.
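
As a minimal, self-contained sketch of the keep-mask logic described above (the helper name `keep_mask` is hypothetical and is not the function changed in this PR). It assumes `np.ma.getmaskarray` is preferable to `.mask`, since `.mask` can be the scalar `np.ma.nomask` when nothing in the array is masked:

```python
import numpy as np

def keep_mask(primary, secondary):
    """Return a boolean array where True means "keep this value".

    A value is dropped only when it is masked in BOTH inputs.
    Hypothetical helper for illustration, not the code in this PR.
    """
    # getmaskarray always returns a full boolean array, even when the
    # input's mask is np.ma.nomask (nothing masked).
    both_masked = np.logical_and(np.ma.getmaskarray(primary),
                                 np.ma.getmaskarray(secondary))
    return np.logical_not(both_masked)

a = np.ma.masked_array([1.0, 2.0, 3.0, 4.0], mask=[0, 0, 1, 0])
b = np.ma.masked_array([5.0, 6.0, 7.0, 8.0], mask=[1, 1, 1, 1])
c = np.ma.masked_array([9.0, 9.0, 9.0, 9.0])  # no mask given

keep_ab = keep_mask(a, b)  # entirely masked secondary: only index 2 drops
keep_ac = keep_mask(a, c)  # unmasked secondary: nothing drops
```

With an entirely masked `b`, only the position that is also masked in `a` is dropped, which is the behavior the old `logical_or` approach failed to give.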
   
   In addition to the above, made a few small changes:
   
   1. Only query the insitu API for the schema if `parameter_s` is provided
   2. Retrieve tiles one-by-one rather than all at once when finding/retrieving 
data for secondary tiles.
   3. If no secondary tiles are found (in sat to sat matchup), handle 
gracefully and return `[]` rather than letting an error get raised
   4. Fixed bug where only the first two variables are considered in the tile 
mask computed in `get_indices`
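
For change (4), a rough sketch of how a mask can be combined across any number of variables instead of only the first two. This is not the actual `get_indices` code; the helper name, the variable names, and the "keep if at least one variable is valid" rule are assumptions extrapolated from the two-array example above:

```python
import numpy as np

def combined_keep_mask(variables):
    """True where at least one variable has a valid (unmasked) value.

    Hypothetical helper; `variables` is a list of equally shaped
    masked arrays, one per tile variable.
    """
    if not variables:
        # No variables at all: return an empty mask instead of raising.
        return np.array([], dtype=bool)
    masks = [np.ma.getmaskarray(v) for v in variables]
    # Reduce over ALL variables, not just the first two.
    all_masked = np.logical_and.reduce(masks)
    return np.logical_not(all_masked)

# Illustrative stand-in variables, not real tile data.
var_1 = np.ma.masked_array([1.0, 2.0, 3.0, 4.0], mask=[0, 0, 1, 0])
var_2 = np.ma.masked_array([5.0, 6.0, 7.0, 8.0], mask=[1, 1, 1, 1])
var_3 = np.ma.masked_array([0.1, 0.2, 0.3, 0.4], mask=[1, 0, 1, 1])

keep = combined_keep_mask([var_1, var_2, var_3])
```

Here only index 2 is dropped, because it is the only position masked in all three variables.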
   
   Tested like so:
   
   - Tested Shawn's `ASCATB-L2-Coastal` -> `VIIRS_NPP-2018_Heatwave` query 
locally. It works!
   - Manually ran Riley's regression tests -- all passed. NOTE: Only the sat to 
sat tests were run for now. See below.
   
   Please note the following needs to be done before this PR is approved/merged:
   
   1. Run full regression test suite
      - This is not currently possible because the insitu API is down.
   2. Run benchmarks for change (2) above, to ensure retrieving sat tiles 
one-by-one is faster than retrieving them all at once. 
   
   


