[jira] [Updated] (SDAP-344) Add ability to read time stamp from global attributes

2021-09-20 Thread Joseph C. Jacob (Jira)


 [ 
https://issues.apache.org/jira/browse/SDAP-344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph C. Jacob updated SDAP-344:
-
Issue Type: Improvement  (was: Task)

> Add ability to read time stamp from global attributes
> -
>
> Key: SDAP-344
> URL: https://issues.apache.org/jira/browse/SDAP-344
> Project: Apache Science Data Analytics Platform
> Issue Type: Improvement
> Components: granule-ingester
> Reporter: Joseph C. Jacob
> Priority: Major
>
> Some datasets lack a time variable and instead encode the time stamp in 
> global attributes called time_coverage_start and time_coverage_end.  Example:
>  :time_coverage_start = "2002-07-04T00:40:05.000Z";
>  :time_coverage_end = "2017-08-01T03:00:00.000Z";
>  The ingester needs to be able to extract a single time stamp from these 
> attributes, using either of the attributes, or the average of both.
> The ingester should read the time stamp from a granule using these 3 methods 
> (in priority order):
>  # From the time variable
>  # From the global attributes (this ticket)
>  # From the filename
> All three were options in the old legacy ningester.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (SDAP-345) Add ability to read time stamp from the filename

2021-09-20 Thread Joseph C. Jacob (Jira)


 [ 
https://issues.apache.org/jira/browse/SDAP-345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph C. Jacob updated SDAP-345:
-
Issue Type: Improvement  (was: Task)

> Add ability to read time stamp from the filename
> 
>
> Key: SDAP-345
> URL: https://issues.apache.org/jira/browse/SDAP-345
> Project: Apache Science Data Analytics Platform
> Issue Type: Improvement
> Components: granule-ingester
> Reporter: Joseph C. Jacob
> Priority: Major
>
> Some datasets lack both a time variable and time-related global attributes 
> and indicate the date and/or time only in the filename.  In these cases, the 
> ingester needs to extract the time stamp from the filename using a new 
> regular expression setting in the collections-config ConfigMap.
> The ingester should read the time stamp from a granule using these 3 methods 
> (in priority order):
>  # From the time variable
>  # From the global attributes (SDAP-344)
>  # From the filename (this ticket)
> All three were options in the old legacy ningester.





[jira] [Updated] (SDAP-326) Make ingest processors optional in incubator-sdap-ingestor

2021-09-20 Thread Joseph C. Jacob (Jira)


 [ 
https://issues.apache.org/jira/browse/SDAP-326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph C. Jacob updated SDAP-326:
-
Issue Type: Improvement  (was: Task)

> Make ingest processors optional in incubator-sdap-ingestor
> --
>
> Key: SDAP-326
> URL: https://issues.apache.org/jira/browse/SDAP-326
> Project: Apache Science Data Analytics Platform
> Issue Type: Improvement
> Components: collection-ingester, granule-ingester
> Reporter: Joseph C. Jacob
> Priority: Major
>
> h3. The Problem:
> The old *incubator-sdap-ningesterpy* / *incubator-sdap-ningester* required 
> that we list the processors to be applied to each dataset at ingest time in 
> the configuration file for the dataset.  The new *incubator-sdap-ingester* 
> applies these processors automatically and has no mechanism to change the 
> behavior via a data collection config setting.  This is a problem with the 
> processor that converts any variable with units "kelvin" to units "celsius" 
> because some variables are in units "kelvin", but represent a difference from 
> a norm and should not be transformed.
> Currently, "*kelvintocelsius*" is the only processor that has been identified 
> as one that we need to be able to turn off.  However, this may apply to any 
> units conversion or to other processors added in the future.
> h3. The Details:
> In particular, for the *{{MUR25-JPL-L4-GLOB-v4.2}}* dataset, we commonly 
> ingest both *{{analysed_sst}}* and *{{sst_anomaly}}*, both of which 
> natively have units of kelvin, but *{{sst_anomaly}}* represents a 
> difference from some norm and should not be subject to the “subtract 273.15” 
> operation.  An *{{sst_anomaly}}* of 0 kelvin is still a 0 degree “anomaly” 
> or “difference” in degrees Celsius.  So, we need to restrict which 
> variables get this operation applied to them.
> h3. Proposed Solution:
> I propose to solve this in a way that is not specific to the *kelvintocelsius* 
> processor.  Currently that processor is the only one that has been identified 
> as one that we need to be able to turn off, but there may be others in the 
> future.  The proposed solution is to add a keyword in the 
> *collections-config* where we can list any processors to be turned OFF for a 
> dataset.  Then we would just need to check that a processor is not in this 
> list before applying it.  This approach would work for the *kelvintocelsius* 
> processor and any other processor that is already supported or is added in 
> the future.
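A minimal sketch of the proposed check, assuming a hypothetical per-dataset list of disabled processor names (the actual keyword in *collections-config* is not decided in this ticket). The registry, the function names, and the scalar "processor" signature are illustrative only and are not the ingester's real interfaces.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical processor signature: takes a value, returns the transformed value.
Processor = Callable[[float], float]


def kelvin_to_celsius(value: float) -> float:
    """Convert an absolute temperature from kelvin to degrees Celsius."""
    return value - 273.15


# Hypothetical registry of processors that are applied automatically.
DEFAULT_PROCESSORS: Dict[str, Processor] = {
    "kelvintocelsius": kelvin_to_celsius,
}


def active_processors(disabled: Iterable[str]) -> List[Processor]:
    """Return the default processors whose names are not in the disabled list."""
    skip = {name.lower() for name in disabled}
    return [fn for name, fn in DEFAULT_PROCESSORS.items() if name not in skip]


def apply_processors(value: float, disabled: Iterable[str] = ()) -> float:
    """Apply every processor that has not been turned off for the dataset."""
    for fn in active_processors(disabled):
        value = fn(value)
    return value
```

With `disabled=["kelvintocelsius"]`, a 0 kelvin anomaly passes through unchanged; with an empty list the conversion is applied as it is today.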





[jira] [Updated] (SDAP-343) Fix reading of time stamp in GPM data

2021-09-20 Thread Joseph C. Jacob (Jira)


 [ 
https://issues.apache.org/jira/browse/SDAP-343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph C. Jacob updated SDAP-343:
-
Issue Type: Bug  (was: Task)

> Fix reading of time stamp in GPM data
> -
>
> Key: SDAP-343
> URL: https://issues.apache.org/jira/browse/SDAP-343
> Project: Apache Science Data Analytics Platform
> Issue Type: Bug
> Components: granule-ingester
> Reporter: Joseph C. Jacob
> Priority: Blocker
>
> The GPM IMERG Early Precipitation L3 1 day 0.1 degree x 0.1 degree V06 
> (GPM_3IMERGDE) dataset 
> ([https://disc.gsfc.nasa.gov/datasets/GPM_3IMERGDE_06/summary?keywords=GPM]) 
> gives an error upon ingest:  
> {quote}{{granule_ingester.exceptions.Exceptions.TileProcessingError: Could 
> not generate tiles from the granule because of the following error: 
> unsupported operand type(s) for /: 'cftime._cftime.DatetimeJulian' and 
> 'float'.}}{quote}
> This appears to be related to how the time variable is read.  The time 
> variable is given as:
>    double time(time) ;
>       time:units = "days since 1970-01-01 00:00:00Z" ;
>       time:standard_name = "time" ;
>       time:calendar = "julian" ;
>       time:bounds = "time_bnds" ;
>       time:origname = "time" ;
>       time:fullnamepath = "/time" ;
>    double time_bnds(time, nv) ;
>       time_bnds:units = "days since 1970-01-01 00:00:00Z" ;
>       time_bnds:coordinates = "time nv" ;
>       time_bnds:origname = "time_bnds" ;
>       time_bnds:fullnamepath = "/time_bnds" ;
> The new SDAP ingester uses xarray to read the NetCDF files.  Xarray tries 
> to convert time values to datetime64 objects if possible, but seems to have 
> been unable to do so in this case (maybe related to the Julian calendar being 
> used?).
> The legacy ningester was able to read GPM in the past (during the 
> OceanWorks project).  A notable difference is that the old ningester used the 
> NetCDF4 module instead of xarray.
> In this ticket we need to determine whether xarray can be used correctly for 
> this dataset and, if not, revert to using NetCDF4.





[jira] [Updated] (SDAP-217) Add ingest processor to flip tiles vertically

2021-09-20 Thread Joseph C. Jacob (Jira)


 [ 
https://issues.apache.org/jira/browse/SDAP-217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph C. Jacob updated SDAP-217:
-
Resolution: Implemented
Status: Done  (was: In Progress)

This was implemented as part of this PR:  
[https://github.com/apache/incubator-sdap-ingester/pull/31]

 

> Add ingest processor to flip tiles vertically
> -
>
> Key: SDAP-217
> URL: https://issues.apache.org/jira/browse/SDAP-217
> Project: Apache Science Data Analytics Platform
> Issue Type: Improvement
> Reporter: Joseph C. Jacob
> Assignee: Joseph C. Jacob
> Priority: Major
>
> SDAP currently assumes that data granules are packaged such that the 
> latitudes are monotonically +ascending+.  This ticket is to add a new ingest 
> processor to vertically flip tiles at ingest time in order to support 
> datasets with granules that have monotonically +decreasing+ latitudes.
>  
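For illustration only, and not the code from the PR: a minimal sketch of a vertical flip, assuming a tile is held as a list of rows with one row per latitude. The function name is hypothetical. For array data, `numpy.flip` along the latitude axis would be the equivalent operation.

```python
from typing import List, Sequence, Tuple


def flip_if_descending(
    latitudes: Sequence[float], rows: Sequence[Sequence[float]]
) -> Tuple[List[float], List[Sequence[float]]]:
    """Return latitudes and data rows ordered so that latitude ascends.

    If the latitudes are already ascending, copies are returned unchanged.
    """
    if len(latitudes) > 1 and latitudes[0] > latitudes[-1]:
        # Reverse both together so each row stays paired with its latitude.
        return list(reversed(latitudes)), list(reversed(rows))
    return list(latitudes), list(rows)
```

Reversing the latitude axis and the data rows in the same step keeps every row attached to its original latitude, which is the property the downstream ascending-latitude assumption depends on.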





[jira] [Commented] (SDAP-217) Add ingest processor to flip tiles vertically

2021-09-20 Thread Joseph C. Jacob (Jira)


[ 
https://issues.apache.org/jira/browse/SDAP-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17417901#comment-17417901
 ] 

Joseph C. Jacob commented on SDAP-217:
--

This was implemented as part of this PR:  
[https://github.com/apache/incubator-sdap-ingester/pull/31]

 

> Add ingest processor to flip tiles vertically
> -
>
> Key: SDAP-217
> URL: https://issues.apache.org/jira/browse/SDAP-217
> Project: Apache Science Data Analytics Platform
> Issue Type: Improvement
> Reporter: Joseph C. Jacob
> Assignee: Joseph C. Jacob
> Priority: Major
>
> SDAP currently assumes that data granules are packaged such that the 
> latitudes are monotonically +ascending+.  This ticket is to add a new ingest 
> processor to vertically flip tiles at ingest time in order to support 
> datasets with granules that have monotonically +decreasing+ latitudes.
>  





[jira] [Created] (SDAP-346) PySpark environment variables incorrectly set

2021-09-20 Thread Joseph C. Jacob (Jira)
Joseph C. Jacob created SDAP-346:


 Summary: PySpark environment variables incorrectly set
 Key: SDAP-346
 URL: https://issues.apache.org/jira/browse/SDAP-346
 Project: Apache Science Data Analytics Platform
  Issue Type: Bug
  Components: nexus
Reporter: Joseph C. Jacob
Assignee: Joseph C. Jacob


SDAP deployment fails due to the PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON 
environment variables being set incorrectly to directories instead of 
executables in the incubator-sdap-nexus/docker/nexus-webapp/Dockerfile:

PYSPARK_DRIVER_PYTHON=/opt/conda/lib/python3.8
PYSPARK_PYTHON=/opt/conda/lib/python3.8

The correct settings point to the executables:

PYSPARK_DRIVER_PYTHON=/opt/conda/bin/python3.8
PYSPARK_PYTHON=/opt/conda/bin/python3.8

These can be correctly set by overriding them in webapp.distributed.driver.env 
and webapp.distributed.executor.env in the helm chart values.yaml, but this 
ticket is to make the default settings work so that no setting is needed in the 
values.yaml.
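A small sanity check along these lines could catch the misconfiguration before Spark starts. This is a sketch using only the Python standard library; the function names are hypothetical and nothing like this is known to exist in the webapp today.

```python
import os
from typing import Dict, Mapping, Optional


def is_python_executable(path: str) -> bool:
    """True if the path is a regular file that the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


def check_pyspark_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the PySpark variables that are set but do not name an executable."""
    env = os.environ if env is None else env
    problems = {}
    for name in ("PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON"):
        path = env.get(name)
        # An unset variable is not reported; only a set-but-wrong value is.
        if path is not None and not is_python_executable(path):
            problems[name] = path
    return problems
```

A directory such as /opt/conda/lib/python3.8 fails the `os.path.isfile` test, so both variables from the Dockerfile above would be reported.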

 





[jira] [Created] (SDAP-345) Add ability to read time stamp from the filename

2021-09-20 Thread Joseph C. Jacob (Jira)
Joseph C. Jacob created SDAP-345:


 Summary: Add ability to read time stamp from the filename
 Key: SDAP-345
 URL: https://issues.apache.org/jira/browse/SDAP-345
 Project: Apache Science Data Analytics Platform
  Issue Type: Task
  Components: granule-ingester
Reporter: Joseph C. Jacob


Some datasets lack both a time variable and time-related global attributes and 
indicate the date and/or time only in the filename.  In these cases, the 
ingester needs to extract the time stamp from the filename using a new regular 
expression setting in the collections-config ConfigMap.

The ingester should read the time stamp from a granule using these 3 methods 
(in priority order):
 # From the time variable
 # From the global attributes (SDAP-344)
 # From the filename (this ticket)

All three were options in the old legacy ningester.
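A minimal sketch of the filename method, assuming the collection config supplies a regular expression with one capture group plus a `strptime` format for the captured text. Both settings, the function name, and the example filename are hypothetical; the ticket does not define the config keys.

```python
import re
from datetime import datetime, timezone
from typing import Optional


def time_from_filename(
    filename: str, pattern: str, time_format: str
) -> Optional[datetime]:
    """Extract a UTC time stamp from a filename.

    `pattern` must contain one capture group around the date/time text, and
    `time_format` is the strptime format of that text. Returns None when the
    pattern does not match, so the caller can report the failure.
    """
    match = re.search(pattern, filename)
    if match is None:
        return None
    parsed = datetime.strptime(match.group(1), time_format)
    return parsed.replace(tzinfo=timezone.utc)


# Example with a made-up filename and settings.
stamp = time_from_filename("sst_20020704_v1.nc", r"_(\d{8})_", "%Y%m%d")
```

Treating the captured text as UTC is an assumption of this sketch; a dataset that encodes local time in its filenames would need an extra setting.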





[jira] [Created] (SDAP-344) Add ability to read time stamp from global attributes

2021-09-20 Thread Joseph C. Jacob (Jira)
Joseph C. Jacob created SDAP-344:


 Summary: Add ability to read time stamp from global attributes
 Key: SDAP-344
 URL: https://issues.apache.org/jira/browse/SDAP-344
 Project: Apache Science Data Analytics Platform
  Issue Type: Task
  Components: granule-ingester
Reporter: Joseph C. Jacob


Some datasets lack a time variable and instead encode the time stamp in global 
attributes called time_coverage_start and time_coverage_end.  Example:
:time_coverage_start = "2002-07-04T00:40:05.000Z";
:time_coverage_end = "2017-08-01T03:00:00.000Z";
The ingester needs to be able to extract a single time stamp from these 
attributes, using either of the attributes, or the average of both.

The ingest should read the time stamp from a granule using these 3 methods (in 
priority order):
 # From the time variable
 # From the global attributes (this ticket)
 # From the filename

All three were options in the old legacy ningester.
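A minimal sketch of the attribute method, assuming the global attributes are available as a plain dict and use the exact format shown in the example above. The function names and the `method` argument are hypothetical.

```python
from datetime import datetime, timezone


def parse_iso_utc(value: str) -> datetime:
    """Parse a time stamp such as 2002-07-04T00:40:05.000Z as UTC."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def time_from_global_attrs(attrs: dict, method: str = "start") -> datetime:
    """Derive one time stamp from time_coverage_start / time_coverage_end.

    method is "start", "end", or "average" (the midpoint of the two).
    """
    start = parse_iso_utc(attrs["time_coverage_start"])
    end = parse_iso_utc(attrs["time_coverage_end"])
    if method == "start":
        return start
    if method == "end":
        return end
    if method == "average":
        # Midpoint of the coverage interval.
        return start + (end - start) / 2
    raise ValueError(f"unknown method: {method!r}")
```

A granule whose attributes use a different time format would raise `ValueError` in `parse_iso_utc`, which makes the failure visible instead of silently falling through to the filename method.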





[jira] [Updated] (SDAP-344) Add ability to read time stamp from global attributes

2021-09-20 Thread Joseph C. Jacob (Jira)


 [ 
https://issues.apache.org/jira/browse/SDAP-344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph C. Jacob updated SDAP-344:
-
Description: 
Some datasets lack a time variable and instead encode the time stamp in global 
attributes called time_coverage_start and time_coverage_end.  Example:
 :time_coverage_start = "2002-07-04T00:40:05.000Z";
 :time_coverage_end = "2017-08-01T03:00:00.000Z";
 The ingester needs to be able to extract a single time stamp from these 
attributes, using either of the attributes, or the average of both.

The ingester should read the time stamp from a granule using these 3 methods 
(in priority order):
 # From the time variable
 # From the global attributes (this ticket)
 # From the filename

All three were options in the old legacy ningester.

  was:
Some datasets lack a time variable and instead encode the time stamp in global 
attributes called time_coverage_start and time_coverage_end.  Example:
:time_coverage_start = "2002-07-04T00:40:05.000Z";
:time_coverage_end = "2017-08-01T03:00:00.000Z";
The ingester needs to be able to extract a single time stamp from these 
attributes, using either of the attributes, or the average of both.

The ingest should read the time stamp from a granule using these 3 methods (in 
priority order):
 # From the time variable
 # From the global attributes (this ticket)
 # From the filename

All three were options in the old legacy ningester.


> Add ability to read time stamp from global attributes
> -
>
> Key: SDAP-344
> URL: https://issues.apache.org/jira/browse/SDAP-344
> Project: Apache Science Data Analytics Platform
> Issue Type: Task
> Components: granule-ingester
> Reporter: Joseph C. Jacob
> Priority: Major
>
> Some datasets lack a time variable and instead encode the time stamp in 
> global attributes called time_coverage_start and time_coverage_end.  Example:
>  :time_coverage_start = "2002-07-04T00:40:05.000Z";
>  :time_coverage_end = "2017-08-01T03:00:00.000Z";
>  The ingester needs to be able to extract a single time stamp from these 
> attributes, using either of the attributes, or the average of both.
> The ingester should read the time stamp from a granule using these 3 methods 
> (in priority order):
>  # From the time variable
>  # From the global attributes (this ticket)
>  # From the filename
> All three were options in the old legacy ningester.





[jira] [Created] (SDAP-343) Fix reading of time stamp in GPM data

2021-09-20 Thread Joseph C. Jacob (Jira)
Joseph C. Jacob created SDAP-343:


 Summary: Fix reading of time stamp in GPM data
 Key: SDAP-343
 URL: https://issues.apache.org/jira/browse/SDAP-343
 Project: Apache Science Data Analytics Platform
  Issue Type: Task
  Components: granule-ingester
Reporter: Joseph C. Jacob


The GPM IMERG Early Precipitation L3 1 day 0.1 degree x 0.1 degree V06 
(GPM_3IMERGDE) dataset 
([https://disc.gsfc.nasa.gov/datasets/GPM_3IMERGDE_06/summary?keywords=GPM]) 
gives an error upon ingest:  
{quote}{{granule_ingester.exceptions.Exceptions.TileProcessingError: Could not 
generate tiles from the granule because of the following error: unsupported 
operand type(s) for /: 'cftime._cftime.DatetimeJulian' and 'float'.}}{quote}
This appears to be related to how the time variable is read.  The time 
variable is given as:
   double time(time) ;
      time:units = "days since 1970-01-01 00:00:00Z" ;
      time:standard_name = "time" ;
      time:calendar = "julian" ;
      time:bounds = "time_bnds" ;
      time:origname = "time" ;
      time:fullnamepath = "/time" ;
   double time_bnds(time, nv) ;
      time_bnds:units = "days since 1970-01-01 00:00:00Z" ;
      time_bnds:coordinates = "time nv" ;
      time_bnds:origname = "time_bnds" ;
      time_bnds:fullnamepath = "/time_bnds" ;
The new SDAP ingester uses xarray to read the NetCDF files.  Xarray tries to 
convert time values to datetime64 objects if possible, but seems to have been 
unable to do so in this case (maybe related to the Julian calendar being 
used?).

The legacy ningester was able to read GPM in the past (during the 
OceanWorks project).  A notable difference is that the old ningester used the 
NetCDF4 module instead of xarray.

In this ticket we need to determine whether xarray can be used correctly for 
this dataset and, if not, revert to using NetCDF4.
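One possible direction, offered as a sketch and not as a diagnosis: `xarray.open_dataset` accepts `decode_times=False`, which leaves the stored numeric offsets undecoded, so no cftime objects take part in the arithmetic. The helper below (a hypothetical name, standard library only) converts such an offset to seconds since the reference date named in the units attribute. It deliberately does not interpret the reference date itself, because how the "julian" calendar should be mapped onto the ingester's time axis is the open question in this ticket.

```python
# Seconds per time unit for the "<unit> since <reference>" convention.
SECONDS_PER_UNIT = {
    "seconds": 1.0,
    "minutes": 60.0,
    "hours": 3600.0,
    "days": 86400.0,
}


def offset_to_seconds(value: float, units: str) -> float:
    """Convert a raw time offset to seconds since the reference in `units`.

    Example units string: "days since 1970-01-01 00:00:00Z".
    The reference date is not parsed here; only the unit is used.
    """
    unit, sep, _reference = units.partition(" since ")
    unit = unit.strip().lower()
    if not sep or unit not in SECONDS_PER_UNIT:
        raise ValueError(f"unsupported time units: {units!r}")
    return float(value) * SECONDS_PER_UNIT[unit]
```

Because the result is a plain float, later steps such as dividing by a float no longer hit the unsupported operand error quoted above.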





[GitHub] [incubator-sdap-ingester] skorper opened a new pull request #41: SDAP-323: Update summarizing processor and Solr schema to support multiple variables

2021-09-20 Thread GitBox


skorper opened a new pull request #41:
URL: https://github.com/apache/incubator-sdap-ingester/pull/41


   Updated the Solr doc to support multiple variables. The following changes 
were made:
   
   1. Changed the var name field to a list type
   2. Renamed the var name field to `tile_var_name_ss`
   3. Added a new field `{var_name}.tile_standard_name_s`, which contains the 
standard name.
   
   For example:
   
   ```json
   ...
   "tile_var_name_ss": [
     "wind_speed",
     "wind_to_direction"
   ],
   "wind_speed.tile_standard_name_s": "wind_speed",
   "wind_to_direction.tile_standard_name_s": "wind_to_direction",
   ...
   ```
   
   The variable name and standard name are still stored as JSON-encoded 
strings (single-var) or lists (multi-var), but are then translated to lists in 
the Solr metadata. In the single-var case the var name field is a list of size 
1; in the multi-var case it is a list of size N. 
   
   I implemented it such that the standard name is `null` in the Solr doc when 
it is not available in the granule metadata. Any thoughts on whether this is 
the desired behavior?
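To make the mapping concrete, here is a minimal sketch of how the fields described above could be derived. It is not the code in this PR: the function names are hypothetical, and it assumes both inputs are JSON-encoded text (a string for single-var, a list for multi-var) with the standard names in the same order as the variable names.

```python
import json
from typing import Any, Dict, List, Optional


def _as_list(encoded: str) -> List[Any]:
    """Decode a JSON string or JSON list and always return a list."""
    value = json.loads(encoded)
    return value if isinstance(value, list) else [value]


def variable_fields(
    var_names_json: str, standard_names_json: Optional[str]
) -> Dict[str, Any]:
    """Build the variable-related Solr fields for one tile."""
    var_names = _as_list(var_names_json)
    if standard_names_json is None:
        # No standard name in the granule metadata: store null for each variable.
        standard_names = [None] * len(var_names)
    else:
        standard_names = _as_list(standard_names_json)
    fields: Dict[str, Any] = {"tile_var_name_ss": var_names}
    for name, standard_name in zip(var_names, standard_names):
        fields[f"{name}.tile_standard_name_s"] = standard_name
    return fields
```

In the single-var case this yields a one-element `tile_var_name_ss` list, matching the description above.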


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@sdap.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (SDAP-342) Update webapp to use spark 3.1.1

2021-09-20 Thread Joseph C. Jacob (Jira)
Joseph C. Jacob created SDAP-342:


 Summary: Update webapp to use spark 3.1.1
 Key: SDAP-342
 URL: https://issues.apache.org/jira/browse/SDAP-342
 Project: Apache Science Data Analytics Platform
  Issue Type: Task
  Components: helm
Reporter: Joseph C. Jacob
Assignee: Joseph C. Jacob


The current helm/templates/webapp.yml configuration specifies Spark version 
2.4.4 in several places.  Verify that no active project really requires 2.4.4 
and, if not, update it to version 3.1.1 (or whatever is the latest Spark 
version that the spark operator supports).  The spark-operator is at 
[https://github.com/GoogleCloudPlatform/spark-on-k8s-operator], and there is a 
Version Matrix about halfway down the README.





[GitHub] [incubator-sdap-ingester] skorper closed pull request #40: Added standard name field to solr doc

2021-09-20 Thread GitBox


skorper closed pull request #40:
URL: https://github.com/apache/incubator-sdap-ingester/pull/40


   






[GitHub] [incubator-sdap-ingester] skorper commented on pull request #40: Added standard name field to solr doc

2021-09-20 Thread GitBox


skorper commented on pull request #40:
URL: 
https://github.com/apache/incubator-sdap-ingester/pull/40#issuecomment-923185638


   Closing this PR because the agreed upon multi-var format has changed. A new 
PR will be opened with the multi-var solr metadata format. 






Podling Sdap Report Reminder - October 2020

2021-09-20 Thread jmclean
Dear podling,

This email was sent by an automated system on behalf of the Apache
Incubator PMC. It is an initial reminder to give you plenty of time to
prepare your quarterly board report.

The board meeting is scheduled for Wed, 21 October 2020.
The report for your podling will form a part of the Incubator PMC
report. The Incubator PMC requires your report to be submitted 2 weeks
before the board meeting, to allow sufficient time for review and
submission (Wed, October 07).

Please submit your report with sufficient time to allow the Incubator
PMC, and subsequently board members to review and digest. Again, the
very latest you should submit your report is 2 weeks prior to the board
meeting.

Candidate names should not be made public before people are actually
elected, so please do not include the names of potential committers or
PPMC members in your report.

Thanks,

The Apache Incubator PMC

Submitting your Report

--

Your report should contain the following:

*   Your project name
*   A brief description of your project, which assumes no knowledge of
the project or necessarily of its field
*   A list of the three most important issues to address in the move
towards graduation.
*   Any issues that the Incubator PMC or ASF Board might wish/need to be
aware of
*   How has the community developed since the last report?
*   How has the project developed since the last report?
*   How does the podling rate their own maturity?

This should be appended to the Incubator Wiki page at:

https://cwiki.apache.org/confluence/display/INCUBATOR/October2020

Note: This is manually populated. You may need to wait a little before
this page is created from a template.

Note: The format of the report has changed to use markdown.

Mentors
---

Mentors should review reports for their project(s) and sign them off on
the Incubator wiki page. Signing off reports shows that you are
following the project - projects that are not signed may raise alarms
for the Incubator PMC.

Incubator PMC