[jira] [Updated] (SDAP-344) Add ability to read time stamp from global attributes
[ https://issues.apache.org/jira/browse/SDAP-344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph C. Jacob updated SDAP-344:
---------------------------------
    Issue Type: Improvement  (was: Task)

> Add ability to read time stamp from global attributes
> -----------------------------------------------------
>
> Key: SDAP-344
> URL: https://issues.apache.org/jira/browse/SDAP-344
> Project: Apache Science Data Analytics Platform
> Issue Type: Improvement
> Components: granule-ingester
> Reporter: Joseph C. Jacob
> Priority: Major
>
> Some datasets lack a time variable and instead encode the time stamp in
> global attributes called time_coverage_start and time_coverage_end. Example:
>   :time_coverage_start = "2002-07-04T00:40:05.000Z";
>   :time_coverage_end = "2017-08-01T03:00:00.000Z";
> The ingester needs to be able to extract a single time stamp from these
> attributes, using either of the attributes, or the average of both.
> The ingester should read the time stamp from a granule using these 3 methods
> (in priority order):
> # From the time variable
> # From the global attributes (this ticket)
> # From the filename
> All three were options in the old legacy ningester.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
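The attribute-based extraction described in SDAP-344 can be sketched as follows. This is only an illustration, not the ingester's implementation: the function names and the `method` parameter are hypothetical, and the attribute values are assumed to always use the `YYYY-MM-DDTHH:MM:SS.fffZ` form shown in the ticket.

```python
from datetime import datetime, timezone


def parse_iso_utc(value: str) -> datetime:
    """Parse a stamp such as "2002-07-04T00:40:05.000Z" as a UTC datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def time_from_global_attrs(attrs: dict, method: str = "average") -> datetime:
    """Return one time stamp from time_coverage_start / time_coverage_end.

    `method` is a hypothetical setting: "start", "end", or "average".
    """
    start = parse_iso_utc(attrs["time_coverage_start"])
    end = parse_iso_utc(attrs["time_coverage_end"])
    if method == "start":
        return start
    if method == "end":
        return end
    if method == "average":
        # Midpoint of the coverage interval.
        return start + (end - start) / 2
    raise ValueError(f"unknown method: {method!r}")


if __name__ == "__main__":
    example = {
        "time_coverage_start": "2002-07-04T00:40:05.000Z",
        "time_coverage_end": "2017-08-01T03:00:00.000Z",
    }
    print(time_from_global_attrs(example, "start").isoformat())
```

Making the choice of start, end, or average a per-collection setting would cover all three variants named in the ticket.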
[jira] [Updated] (SDAP-345) Add ability to read time stamp from the filename
[ https://issues.apache.org/jira/browse/SDAP-345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph C. Jacob updated SDAP-345:
---------------------------------
    Issue Type: Improvement  (was: Task)

> Add ability to read time stamp from the filename
> ------------------------------------------------
>
> Key: SDAP-345
> URL: https://issues.apache.org/jira/browse/SDAP-345
> Project: Apache Science Data Analytics Platform
> Issue Type: Improvement
> Components: granule-ingester
> Reporter: Joseph C. Jacob
> Priority: Major
>
> Some datasets lack a time variable and time attributes and only indicate the
> date and/or time in the filename. In these cases, the ingester needs to be
> able to extract the time stamp from the filename according to a new regular
> expression setting in the collections-config ConfigMap.
> The ingester should read the time stamp from a granule using these 3 methods
> (in priority order):
> # From the time variable
> # From the global attributes
> # From the filename (this ticket)
> All three were options in the old legacy ningester.
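A minimal sketch of the filename-based extraction in SDAP-345, assuming the regular expression captures the date in its first group and that a matching `strptime` format is configured alongside it. The function name, the sample filename, and both settings are hypothetical; the ticket does not define the actual configuration keys.

```python
import re
from datetime import datetime, timezone
from typing import Optional


def time_from_filename(filename: str, pattern: str, time_format: str) -> Optional[datetime]:
    """Extract a UTC time stamp from a granule filename.

    `pattern` must contain one capture group holding the date/time text, and
    `time_format` is the strptime format for that text. Both are assumed to
    come from a per-collection setting.
    """
    match = re.search(pattern, filename)
    if match is None:
        # No date in the filename; the caller can report the failure.
        return None
    parsed = datetime.strptime(match.group(1), time_format)
    return parsed.replace(tzinfo=timezone.utc)


if __name__ == "__main__":
    # Hypothetical filename used only for illustration.
    print(time_from_filename("example_20020704_v1.nc", r"_(\d{8})_", "%Y%m%d"))
```

Returning `None` on a failed match keeps the priority order simple: the caller tries the time variable first, then the global attributes, then this function.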
[jira] [Updated] (SDAP-326) Make ingest processors optional in incubator-sdap-ingestor
[ https://issues.apache.org/jira/browse/SDAP-326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph C. Jacob updated SDAP-326:
---------------------------------
    Issue Type: Improvement  (was: Task)

> Make ingest processors optional in incubator-sdap-ingestor
> ----------------------------------------------------------
>
> Key: SDAP-326
> URL: https://issues.apache.org/jira/browse/SDAP-326
> Project: Apache Science Data Analytics Platform
> Issue Type: Improvement
> Components: collection-ingester, granule-ingester
> Reporter: Joseph C. Jacob
> Priority: Major
>
> h3. The Problem:
> The old *incubator-sdap-ningesterpy* / *incubator-sdap-ningester* required
> that we list the processors to be applied to each dataset at ingest time in
> the configuration file for the dataset. The new *incubator-sdap-ingester*
> applies these processors automatically and has no mechanism to change the
> behavior via a data collection config setting. This is a problem for the
> processor that converts any variable with units "kelvin" to units "celsius",
> because some variables have units "kelvin" but represent a difference from
> a norm and should not be transformed.
> Currently, "*kelvintocelsius*" is the only processor that has been identified
> as one that we need to be able to turn off. However, this may apply to any
> units conversion or to other processors added in the future.
> h3. The Details:
> In particular, for the *{{MUR25-JPL-L4-GLOB-v4.2}}* dataset, we commonly
> ingest both *{{analysed_sst}}* and *{{sst_anomaly}}*, both of which natively
> have units of kelvin. However, *{{sst_anomaly}}* represents a difference from
> some norm and should not be subject to the “subtract 273.15” operation. An
> *{{sst_anomaly}}* of 0 in kelvin is still a 0 degree “anomaly” or
> “difference” in degrees Celsius. So, we need to restrict which variables get
> this operation applied to them.
> h3. Proposed Solution:
> I propose to solve this in a way that is not specific to the
> *kelvintocelsius* processor. The proposed solution is to add a keyword in the
> *collections-config* where we can list any processors to be turned OFF for a
> dataset. Then we would just need to check that a processor is not in this
> list before applying it. This approach would work for the *kelvintocelsius*
> processor and for any other processor that is already supported or is added
> in the future.
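The check proposed in SDAP-326 can be sketched like this. It is an illustration under stated assumptions rather than the ingester's actual processor chain: tiles are modelled as plain dictionaries of variable values, the registry of processors is a dictionary keyed by name, and the list of disabled processors is assumed to have already been read from the collection's configuration (the keyword name is not decided in the ticket).

```python
from typing import Callable, Dict, Iterable

Tile = Dict[str, float]
Processor = Callable[[Tile], Tile]


def kelvin_to_celsius(tile: Tile) -> Tile:
    """Toy stand-in for the kelvintocelsius processor."""
    return {name: value - 273.15 for name, value in tile.items()}


def run_processors(tile: Tile, processors: Dict[str, Processor],
                   disabled: Iterable[str] = ()) -> Tile:
    """Apply each processor in order unless its name is listed as disabled."""
    disabled_names = set(disabled)
    for name, processor in processors.items():
        if name in disabled_names:
            continue  # Turned OFF for this dataset.
        tile = processor(tile)
    return tile


if __name__ == "__main__":
    registry = {"kelvintocelsius": kelvin_to_celsius}
    anomaly_tile = {"sst_anomaly": 0.0}
    # With the processor disabled, the anomaly stays at 0.
    print(run_processors(anomaly_tile, registry, disabled=["kelvintocelsius"]))
```

Because the check is by processor name, the same mechanism would apply to any processor added later, which is the generality the ticket asks for.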
[jira] [Updated] (SDAP-343) Fix reading of time stamp in GPM data
[ https://issues.apache.org/jira/browse/SDAP-343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph C. Jacob updated SDAP-343:
---------------------------------
    Issue Type: Bug  (was: Task)

> Fix reading of time stamp in GPM data
> -------------------------------------
>
> Key: SDAP-343
> URL: https://issues.apache.org/jira/browse/SDAP-343
> Project: Apache Science Data Analytics Platform
> Issue Type: Bug
> Components: granule-ingester
> Reporter: Joseph C. Jacob
> Priority: Blocker
>
> The GPM IMERG Early Precipitation L3 1 day 0.1 degree x 0.1 degree V06
> (GPM_3IMERGDE) dataset
> (https://disc.gsfc.nasa.gov/datasets/GPM_3IMERGDE_06/summary?keywords=GPM)
> gives an error upon ingest:
> {quote}{{granule_ingester.exceptions.Exceptions.TileProcessingError: Could
> not generate tiles from the granule because of the following error:
> unsupported operand type(s) for /: 'cftime._cftime.DatetimeJulian' and
> 'float'.}}{quote}
> This appears to be related to how the time variable is read. The time
> variable is given as:
>   double time(time) ;
>     time:units = "days since 1970-01-01 00:00:00Z" ;
>     time:standard_name = "time" ;
>     time:calendar = "julian" ;
>     time:bounds = "time_bnds" ;
>     time:origname = "time" ;
>     time:fullnamepath = "/time" ;
>   double time_bnds(time, nv) ;
>     time_bnds:units = "days since 1970-01-01 00:00:00Z" ;
>     time_bnds:coordinates = "time nv" ;
>     time_bnds:origname = "time_bnds" ;
>     time_bnds:fullnamepath = "/time_bnds" ;
> The new SDAP ingester uses xarray to read the NetCDF files. Xarray tries
> to force conversion to a datetime64 object if possible, but seems to have
> been unable to do so in this case (maybe related to the Julian calendar being
> used?).
> The old legacy ningester was able to read GPM in the past (during the
> OceanWorks project). A notable difference is that the old ningester used the
> netCDF4 Python module instead of xarray.
> In this ticket we need to determine whether xarray can be used correctly for
> this dataset, and if not, we need to revert to using netCDF4.
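The error in SDAP-343 arises when arithmetic is applied to a decoded cftime object. One possible direction, sketched below, is to work on the undecoded numeric offsets instead (xarray can leave them undecoded with `open_dataset(..., decode_times=False)`) and convert them using the `units` attribute. This is an assumption-laden sketch, not a verified fix: the function name is hypothetical, and it treats the reference date as a standard UTC date, which may not be correct for a true Julian calendar.

```python
import re
from datetime import datetime, timezone
from typing import Iterable, List

_SECONDS_PER_UNIT = {"seconds": 1.0, "minutes": 60.0, "hours": 3600.0, "days": 86400.0}


def offsets_to_epoch_seconds(values: Iterable[float], units: str) -> List[float]:
    """Convert raw offsets such as "days since 1970-01-01 00:00:00Z" to epoch seconds.

    ASSUMPTION: the reference date is interpreted as a standard (Gregorian)
    UTC date; the calendar attribute is ignored.
    """
    match = re.fullmatch(
        r"(\w+) since (\d{4}-\d{2}-\d{2})[ T](\d{2}:\d{2}:\d{2})Z?", units.strip()
    )
    if match is None or match.group(1) not in _SECONDS_PER_UNIT:
        raise ValueError(f"unsupported time units: {units!r}")
    unit, date_part, time_part = match.groups()
    reference = datetime.strptime(
        f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S"
    ).replace(tzinfo=timezone.utc)
    base = reference.timestamp()
    scale = _SECONDS_PER_UNIT[unit]
    # Plain floats support division, averaging, and comparison.
    return [base + value * scale for value in values]


if __name__ == "__main__":
    print(offsets_to_epoch_seconds([0.0, 1.5], "days since 1970-01-01 00:00:00Z"))
```

Whether ignoring the calendar is acceptable for GPM_3IMERGDE is exactly the question the ticket leaves open, so any real fix would need to be checked against known granule dates.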
[jira] [Updated] (SDAP-217) Add ingest processor to flip tiles vertically
[ https://issues.apache.org/jira/browse/SDAP-217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph C. Jacob updated SDAP-217: - Resolution: Implemented Status: Done (was: In Progress) This was implemented as part of this PR: [https://github.com/apache/incubator-sdap-ingester/pull/31] > Add ingest processor to flip tiles vertically > - > > Key: SDAP-217 > URL: https://issues.apache.org/jira/browse/SDAP-217 > Project: Apache Science Data Analytics Platform > Issue Type: Improvement >Reporter: Joseph C. Jacob >Assignee: Joseph C. Jacob >Priority: Major > > SDAP currently assumes that data granules are packaged such that the > latitudes are monotonically +ascending+. This ticket is to add a new ingest > processor to vertically flip tiles at ingest time in order to support > datasets with granules that have monotonically +decreasing+ latitudes. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
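The vertical flip described in SDAP-217 can be illustrated with plain Python lists. This sketch is not the processor merged in the PR above: the function name is hypothetical, and a tile is modelled as a list of rows with one row per latitude. An array-based implementation would more likely use something like `numpy.flipud`.

```python
from typing import List, Sequence, Tuple


def flip_if_descending(latitudes: Sequence[float],
                       rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[Sequence[float]]]:
    """Return latitudes in ascending order, with the data rows reordered to match."""
    if len(latitudes) != len(rows):
        raise ValueError("expected one data row per latitude")
    if len(latitudes) > 1 and latitudes[0] > latitudes[-1]:
        # Latitudes decrease from first to last row: reverse both together.
        return list(latitudes[::-1]), list(rows[::-1])
    return list(latitudes), list(rows)


if __name__ == "__main__":
    lats = [10.0, 0.0, -10.0]
    data = [[1, 2], [3, 4], [5, 6]]
    print(flip_if_descending(lats, data))
```

Reversing the latitude axis and the data rows in the same step keeps each row attached to its latitude.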
[jira] [Commented] (SDAP-217) Add ingest processor to flip tiles vertically
[ https://issues.apache.org/jira/browse/SDAP-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17417901#comment-17417901 ] Joseph C. Jacob commented on SDAP-217: -- This was implemented as part of this PR: [https://github.com/apache/incubator-sdap-ingester/pull/31] > Add ingest processor to flip tiles vertically > - > > Key: SDAP-217 > URL: https://issues.apache.org/jira/browse/SDAP-217 > Project: Apache Science Data Analytics Platform > Issue Type: Improvement >Reporter: Joseph C. Jacob >Assignee: Joseph C. Jacob >Priority: Major > > SDAP currently assumes that data granules are packaged such that the > latitudes are monotonically +ascending+. This ticket is to add a new ingest > processor to vertically flip tiles at ingest time in order to support > datasets with granules that have monotonically +decreasing+ latitudes. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (SDAP-346) PySpark environment variables incorrectly set
Joseph C. Jacob created SDAP-346:
------------------------------------

Summary: PySpark environment variables incorrectly set
Key: SDAP-346
URL: https://issues.apache.org/jira/browse/SDAP-346
Project: Apache Science Data Analytics Platform
Issue Type: Bug
Components: nexus
Reporter: Joseph C. Jacob
Assignee: Joseph C. Jacob

SDAP deployment fails because the PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON environment variables are set to directories instead of executables in incubator-sdap-nexus/docker/nexus-webapp/Dockerfile:

  PYSPARK_DRIVER_PYTHON=/opt/conda/lib/python3.8
  PYSPARK_PYTHON=/opt/conda/lib/python3.8

The correct settings point to the executables:

  PYSPARK_DRIVER_PYTHON=/opt/conda/bin/python3.8
  PYSPARK_PYTHON=/opt/conda/bin/python3.8

These can be set correctly by overriding them in webapp.distributed.driver.env and webapp.distributed.executor.env in the helm chart values.yaml, but this ticket is to make the default settings work so that no override is needed in values.yaml.
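The misconfiguration in SDAP-346 (a directory where an executable is expected) is easy to detect before Spark starts. The following is a small, hypothetical sanity check, not part of SDAP; it only uses the variable names given in the ticket.

```python
import os
import sys
from typing import Optional


def check_python_setting(name: str, value: str) -> Optional[str]:
    """Return a problem description, or None if `value` is an executable file."""
    if os.path.isdir(value):
        return f"{name} points to a directory, not an executable: {value}"
    if not (os.path.isfile(value) and os.access(value, os.X_OK)):
        return f"{name} does not point to an executable file: {value}"
    return None


if __name__ == "__main__":
    for variable in ("PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON"):
        # Fall back to the running interpreter when the variable is unset.
        setting = os.environ.get(variable, sys.executable)
        problem = check_python_setting(variable, setting)
        print(problem or f"{variable} looks fine: {setting}")
```

Run inside the webapp image, a check like this would flag `/opt/conda/lib/python3.8` as a directory and accept `/opt/conda/bin/python3.8`, provided the image has the layout the ticket describes.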
[jira] [Created] (SDAP-345) Add ability to read time stamp from the filename
Joseph C. Jacob created SDAP-345:
------------------------------------

Summary: Add ability to read time stamp from the filename
Key: SDAP-345
URL: https://issues.apache.org/jira/browse/SDAP-345
Project: Apache Science Data Analytics Platform
Issue Type: Task
Components: granule-ingester
Reporter: Joseph C. Jacob

Some datasets lack a time variable and time attributes and only indicate the date and/or time in the filename. In these cases, the ingester needs to be able to extract the time stamp from the filename according to a new regular expression setting in the collections-config ConfigMap.

The ingester should read the time stamp from a granule using these 3 methods (in priority order):
# From the time variable
# From the global attributes
# From the filename (this ticket)

All three were options in the old legacy ningester.
[jira] [Created] (SDAP-344) Add ability to read time stamp from global attributes
Joseph C. Jacob created SDAP-344: Summary: Add ability to read time stamp from global attributes Key: SDAP-344 URL: https://issues.apache.org/jira/browse/SDAP-344 Project: Apache Science Data Analytics Platform Issue Type: Task Components: granule-ingester Reporter: Joseph C. Jacob Some datasets lack a time variable and instead encode the time stamp in global attributes called time_coverage_start and time_coverage_end. Example: :time_coverage_start = "2002-07-04T00:40:05.000Z"; :time_coverage_end = "2017-08-01T03:00:00.000Z"; The ingester needs to be able to extract a single time stamp from these attributes, using either of the attributes, or the average of both. The ingest should read the time stamp from a granule using these 3 methods (in priority order): # From the time variable # From the global attributes (this ticket) # From the filename All three were options in the old legacy ningester. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (SDAP-344) Add ability to read time stamp from global attributes
[ https://issues.apache.org/jira/browse/SDAP-344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph C. Jacob updated SDAP-344:
---------------------------------
    Description:
Some datasets lack a time variable and instead encode the time stamp in global attributes called time_coverage_start and time_coverage_end. Example:
  :time_coverage_start = "2002-07-04T00:40:05.000Z";
  :time_coverage_end = "2017-08-01T03:00:00.000Z";
The ingester needs to be able to extract a single time stamp from these attributes, using either of the attributes, or the average of both.
The ingester should read the time stamp from a granule using these 3 methods (in priority order):
# From the time variable
# From the global attributes (this ticket)
# From the filename
All three were options in the old legacy ningester.

  was:
Some datasets lack a time variable and instead encode the time stamp in global attributes called time_coverage_start and time_coverage_end. Example:
  :time_coverage_start = "2002-07-04T00:40:05.000Z";
  :time_coverage_end = "2017-08-01T03:00:00.000Z";
The ingester needs to be able to extract a single time stamp from these attributes, using either of the attributes, or the average of both.
The ingest should read the time stamp from a granule using these 3 methods (in priority order):
# From the time variable
# From the global attributes (this ticket)
# From the filename
All three were options in the old legacy ningester.

> Add ability to read time stamp from global attributes
> -----------------------------------------------------
>
> Key: SDAP-344
> URL: https://issues.apache.org/jira/browse/SDAP-344
> Project: Apache Science Data Analytics Platform
> Issue Type: Task
> Components: granule-ingester
> Reporter: Joseph C. Jacob
> Priority: Major
>
> Some datasets lack a time variable and instead encode the time stamp in
> global attributes called time_coverage_start and time_coverage_end. Example:
>   :time_coverage_start = "2002-07-04T00:40:05.000Z";
>   :time_coverage_end = "2017-08-01T03:00:00.000Z";
> The ingester needs to be able to extract a single time stamp from these
> attributes, using either of the attributes, or the average of both.
> The ingester should read the time stamp from a granule using these 3 methods
> (in priority order):
> # From the time variable
> # From the global attributes (this ticket)
> # From the filename
> All three were options in the old legacy ningester.
[jira] [Created] (SDAP-343) Fix reading of time stamp in GPM data
Joseph C. Jacob created SDAP-343:
------------------------------------

Summary: Fix reading of time stamp in GPM data
Key: SDAP-343
URL: https://issues.apache.org/jira/browse/SDAP-343
Project: Apache Science Data Analytics Platform
Issue Type: Task
Components: granule-ingester
Reporter: Joseph C. Jacob

The GPM IMERG Early Precipitation L3 1 day 0.1 degree x 0.1 degree V06 (GPM_3IMERGDE) dataset (https://disc.gsfc.nasa.gov/datasets/GPM_3IMERGDE_06/summary?keywords=GPM) gives an error upon ingest:

{quote}{{granule_ingester.exceptions.Exceptions.TileProcessingError: Could not generate tiles from the granule because of the following error: unsupported operand type(s) for /: 'cftime._cftime.DatetimeJulian' and 'float'.}}{quote}

This appears to be related to how the time variable is read. The time variable is given as:

  double time(time) ;
    time:units = "days since 1970-01-01 00:00:00Z" ;
    time:standard_name = "time" ;
    time:calendar = "julian" ;
    time:bounds = "time_bnds" ;
    time:origname = "time" ;
    time:fullnamepath = "/time" ;
  double time_bnds(time, nv) ;
    time_bnds:units = "days since 1970-01-01 00:00:00Z" ;
    time_bnds:coordinates = "time nv" ;
    time_bnds:origname = "time_bnds" ;
    time_bnds:fullnamepath = "/time_bnds" ;

The new SDAP ingester uses xarray to read the NetCDF files. Xarray tries to force conversion to a datetime64 object if possible, but seems to have been unable to do so in this case (maybe related to the Julian calendar being used?).

The old legacy ningester was able to read GPM in the past (during the OceanWorks project). A notable difference is that the old ningester used the netCDF4 Python module instead of xarray.

In this ticket we need to determine whether xarray can be used correctly for this dataset, and if not, we need to revert to using netCDF4.
[GitHub] [incubator-sdap-ingester] skorper opened a new pull request #41: SDAP-323: Update summarizing processor and Solr schema to support multiple variables
skorper opened a new pull request #41:
URL: https://github.com/apache/incubator-sdap-ingester/pull/41

Updated the Solr doc to support multiple variables. The following changes were made:

1. Updated var name field to a list type
2. Updated var name field to `tile_var_name_ss`
3. Added a new field `{var_name}.tile_standard_name_s` which contains standard name.

For example:

```json
...
"tile_var_name_ss": [
  "wind_speed",
  "wind_to_direction"
],
"wind_speed.tile_standard_name_s": "wind_speed",
"wind_to_direction.tile_standard_name_s": "wind_to_direction",
...
```

The variable name and standard name are still stored as JSON-encoded lists OR strings (single vs multi-var), but are then translated to lists in the Solr metadata. In the single-var case, the var name field is a list of size 1; in the multi-var case, the var name is a list of size N.

I implemented it such that standard name is `null` in the Solr doc when not available in the granule metadata. Any thoughts about whether or not this is the desired behavior?

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: dev-unsubscr...@sdap.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
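The field layout described in pull request #41 can be sketched as follows. This is a reading of the PR text, not the code in the PR: the function names are hypothetical, and both the variable names and standard names are assumed to arrive JSON-encoded, either as a single string or as a list.

```python
import json
from typing import Any, Dict, List


def _as_list(encoded: str) -> List[Any]:
    """Decode a JSON string or JSON list and always return a list."""
    value = json.loads(encoded)
    return value if isinstance(value, list) else [value]


def build_variable_fields(var_names_json: str, standard_names_json: str) -> Dict[str, Any]:
    """Build the Solr fields for one tile: a name list plus one standard-name field per variable."""
    names = _as_list(var_names_json)
    standard_names = _as_list(standard_names_json)
    if len(names) != len(standard_names):
        raise ValueError("expected one standard name (or null) per variable")
    doc: Dict[str, Any] = {"tile_var_name_ss": names}
    for name, standard_name in zip(names, standard_names):
        # A missing standard name stays None, which serializes to JSON null.
        doc[f"{name}.tile_standard_name_s"] = standard_name
    return doc


if __name__ == "__main__":
    fields = build_variable_fields(
        '["wind_speed", "wind_to_direction"]',
        '["wind_speed", "wind_to_direction"]',
    )
    print(json.dumps(fields, indent=2))
```

Normalizing to a list up front means the single-variable and multi-variable cases share one code path, which matches the "list of size 1" behavior described in the PR.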
[jira] [Created] (SDAP-342) Update webapp to use spark 3.1.1
Joseph C. Jacob created SDAP-342:
------------------------------------

Summary: Update webapp to use spark 3.1.1
Key: SDAP-342
URL: https://issues.apache.org/jira/browse/SDAP-342
Project: Apache Science Data Analytics Platform
Issue Type: Task
Components: helm
Reporter: Joseph C. Jacob
Assignee: Joseph C. Jacob

The current helm/templates/webapp.yml configuration specifies Spark version 2.4.4 in several places. Verify that no active project really requires 2.4.4 and, if not, update it to version 3.1.1 (or whatever is the latest Spark version that the spark operator supports).

The spark-operator is at https://github.com/GoogleCloudPlatform/spark-on-k8s-operator, and there is a Version Matrix about halfway down the page in the README.
[GitHub] [incubator-sdap-ingester] skorper closed pull request #40: Added standard name field to solr doc
skorper closed pull request #40: URL: https://github.com/apache/incubator-sdap-ingester/pull/40
[GitHub] [incubator-sdap-ingester] skorper commented on pull request #40: Added standard name field to solr doc
skorper commented on pull request #40: URL: https://github.com/apache/incubator-sdap-ingester/pull/40#issuecomment-923185638 Closing this PR because the agreed upon multi-var format has changed. A new PR will be opened with the multi-var solr metadata format.
Podling Sdap Report Reminder - October 2020
Dear podling,

This email was sent by an automated system on behalf of the Apache Incubator PMC. It is an initial reminder to give you plenty of time to prepare your quarterly board report.

The board meeting is scheduled for Wed, 21 October 2020. The report for your podling will form a part of the Incubator PMC report. The Incubator PMC requires your report to be submitted 2 weeks before the board meeting, to allow sufficient time for review and submission (Wed, October 07).

Please submit your report with sufficient time to allow the Incubator PMC, and subsequently board members, to review and digest. Again, the very latest you should submit your report is 2 weeks prior to the board meeting.

Candidate names should not be made public before people are actually elected, so please do not include the names of potential committers or PPMC members in your report.

Thanks,

The Apache Incubator PMC

Submitting your Report
----------------------

Your report should contain the following:

* Your project name
* A brief description of your project, which assumes no knowledge of the project or necessarily of its field
* A list of the three most important issues to address in the move towards graduation.
* Any issues that the Incubator PMC or ASF Board might wish/need to be aware of
* How has the community developed since the last report
* How has the project developed since the last report.
* How does the podling rate their own maturity.

This should be appended to the Incubator Wiki page at:

https://cwiki.apache.org/confluence/display/INCUBATOR/October2020

Note: This is manually populated. You may need to wait a little before this page is created from a template.

Note: The format of the report has changed to use markdown.

Mentors
-------

Mentors should review reports for their project(s) and sign them off on the Incubator wiki page. Signing off reports shows that you are following the project - projects that are not signed may raise alarms for the Incubator PMC.

Incubator PMC