tloubrieu-jpl commented on a change in pull request #4:
URL: https://github.com/apache/incubator-sdap-ingester/pull/4#discussion_r445769115



##########
File path: collection_manager/containers/docker/entrypoint.sh
##########
@@ -0,0 +1,10 @@
+#!/bin/bash
+
+python /collection_manager/collection_manager/main.py \

Review comment:
       Not sure if this is relevant, you will see.

##########
File path: collection_manager/containers/docker/entrypoint.sh
##########
@@ -0,0 +1,10 @@
+#!/bin/bash
+
+python /collection_manager/collection_manager/main.py \

Review comment:
       The setup.py file can also generate a command-line entry point so that this first line can be shortened (for example, to `collection-manager`):
   
       entry_points={
           'console_scripts': ['collection-manager=collection_manager.collection_manager:main']
       }
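   
   For context, a minimal sketch of what the full declaration could look like in setup.py (the module path below is an assumption; point it at wherever main() actually lives):
   
       # setup.py -- sketch only; the module path is a guess
       from setuptools import setup, find_packages
   
       setup(
           name='collection_manager',
           packages=find_packages(),
           entry_points={
               'console_scripts': [
                   # installs a `collection-manager` executable that calls main()
                   'collection-manager=collection_manager.main:main'
               ]
           },
       )
   
   After `python setup.py install`, the entrypoint script could then simply run `collection-manager` instead of the full path to main.py.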

##########
File path: collection_manager/README.md
##########
@@ -1,103 +1,75 @@
-# SDAP manager for ingestion of datasets
+# SDAP Collection Manager
 
-## Prerequisites
-
-### python 3
-
-Install anaconda for python 3. From the graphic install for example for macos:
-
-https://www.anaconda.com/distribution/#macos
-
-### git lfs (for development)
-
-Git lfs for the deployment from git, see https://git-lfs.github.com/
-
-If not available you have to get netcdf files for test, if you do need the tests.
-
-### Deployed nexus on kubernetes cluster
+The SDAP Collection Manager is a service that watches a YAML file (the [Collections
+Configuration](#the-collections-configuration-file) file) stored on the filesystem, and all the directories listed in that
+file. Whenever new granules are added to any of the watched directories, the Collection
+Manager service will publish a message to RabbitMQ to be picked up by the Granule Ingester
+(`/granule_ingester` in this repository), which will then ingest the new granules.
 
-See project https://github.com/apache/incubator-sdap-nexus
 
-    $ helm install nexus .  --namespace=sdap --dependency-update -f ~/overridden-nexus-values.yml
-
-For development purpose, you might want to expose solr port outside kubernetes
-
-   kubectl port-forward solr-set-0 8983:8983 -n sdap 
+## Prerequisites
 
- 
-## For developers
+Python 3.7
 
-### deploy project
+## Building the service
+From `incubator-sdap-ingester/collection_manager`, run:
 
-    $ bash
-    $ git clone ...
-    $ cd sdap_ingest_manager
-    $ python -m venv venv
-    $ source ./venv/bin/activate
-    $ pip install .
-    $ pytest -s
+    $ python setup.py install
     
-Note the command pip install -e . does not work as it does not deploy the configuration files.
-
-### Update the project
-
-Update the code and the test with your favorite IDE (e.g. pyCharm).
-
-### Launch for development/tests
-
-### Prerequisite
 
-Deploy a local rabbitmq service, for example with docker.
+## Running the service
+From `incubator-sdap-ingester/collection_manager`, run:
 
-    docker run -d --hostname localhost -p 5672:5672 --name rabbitmq rabbitmq:3
-   
-   
-### Launch the service
+    $ python collection_manager/main.py -h
+    
+### The Collections Configuration File
 
+A path to a collections configuration file must be passed in to the Collection Manager
+at startup via the `--collections-path` parameter. Below is an example of what the
+collections configuration file should look like:
 
-The service reads the collection configuration and submit granule ingestion messages to the message broker (rabbitmq).
-For each collection, 2 ingestion priority levels are proposed: the nominal priority, the priority for forward processing (newer files), usually higher.
-An history of the ingested granules is managed so that the ingestion can stop and re-start anytime.
+```yaml
+# collections.yaml
 
-    cd collection_manager
-    python main.py -h
-    python main.py --collections ../tests/resources/data/collections.yml --history-path=/tmp
+collections:
 
-# Containerization
+    # The identifier for the dataset as it will appear in NEXUS.
+  - id: TELLUS_GRACE_MASCON_CRI_GRID_RL05_V2_LAND 
 
-TO BE UPDATED
+    # The local path to watch for NetCDF granule files to be associated with this dataset.
+    # Supports glob-style patterns.
+    path: /opt/data/grace/*land*.nc 
 
-## Docker
+    # The name of the NetCDF variable to read when ingesting granules into NEXUS for this dataset.
+    variable: lwe_thickness 
 
-    docker build . -f containers/docker/config-operator/Dockerfile --no-cache --tag tloubrieu/sdap-ingest-manager:latest
-        
-To publish the docker image on dockerhub do (step necessary for kubernetes deployment):
+    # An integer priority level to use when publishing messages to RabbitMQ for historical data.
+    # Higher number = higher priority.
+    priority: 1 
 
-    docker login
-    docker push tloubrieu/sdap-ingest-manager:latest
-    
-## Kubernetes
-    
-### Launch the service
+    # An integer priority level to use when publishing messages to RabbitMQ for forward-processing data.
+    # Higher number = higher priority.
+    forward-processing-priority: 5 
 
-    kubectl apply -f containers/kubernetes/job.yml -n sdap
-    
-Delete the service: 
+  - id: TELLUS_GRACE_MASCON_CRI_GRID_RL05_V2_OCEAN
+    path: /opt/data/grace/*ocean*.nc
+    variable: lwe_thickness
+    priority: 2
+    forward-processing-priority: 6
 
-    kubectl delete jobs --all -n sdap
-    
-    
+  - id: AVHRR_OI-NCEI-L4-GLOB-v2.0
+    path: /opt/data/avhrr/*.nc
+    variable: analysed_sst
+    priority: 1
 
-    
+```
+## Running the tests
+From `incubator-sdap-ingester/collection_manager`, run:
 
+    $ pip install pytest
+    $ pytest
     
-    
-    
- 
-    
-    
-
-
-
-
+## Building the Docker image
+From `incubator-sdap-ingester/collection_manager`, run:
 
+    $ docker build . -f docker/Dockerfile -t nexusjpl/collection-manager

Review comment:
       The 'containers' subdirectory is missing from the command line.
   
   I get a lot of warnings when I run it. Nothing you need to change, but I want to make sure you're seeing the same warnings:
   
       In file included from ext/_yaml.c:596:
       ext/_yaml.h:10: warning: "PyString_CheckExact" redefined
        #define PyString_CheckExact PyBytes_CheckExact
   
       ext/_yaml.c:486: note: this is the location of the previous definition
        #define PyString_CheckExact          PyUnicode_CheckExact
   
       ext/_yaml.c: In function ‘__pyx_pf_5_yaml_get_version_string’:
       ext/_yaml.c:1891:17: warning: assignment discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
          __pyx_v_value = yaml_get_version_string();
                        ^
   
   The Docker container does build on my side.

##########
File path: collection_manager/containers/docker/entrypoint.sh
##########
@@ -0,0 +1,10 @@
+#!/bin/bash
+
+python /collection_manager/collection_manager/main.py \
+  $([[ ! -z "$COLLECTIONS_PATH" ]] && echo --collections-path=$COLLECTIONS_PATH) \
+  $([[ ! -z "$RABBITMQ_HOST" ]] && echo --rabbitmq-host=$RABBITMQ_HOST) \

Review comment:
       Do we need these lines for the kubernetes/helm integration?
   
   

##########
File path: collection_manager/collection_manager/main.py
##########
@@ -18,34 +18,35 @@ def check_path(path) -> str:
 
 
 def get_args() -> argparse.Namespace:
-    parser = argparse.ArgumentParser(description="Run ingestion for a list of collection ingestion streams")
-    parser.add_argument("--refresh",
-                        help="refresh interval in seconds to check for new or updated granules",
-                        default=300)
-    parser.add_argument("--collections",
+    parser = argparse.ArgumentParser(description="Watch the filesystem for new granules, and publish messages to "
+                                                 "RabbitMQ whenever they become available.")
+    parser.add_argument("--collections-path",
                         help="Absolute path to collections configuration file",
+                        metavar="PATH",
                         required=True)
-    parser.add_argument('--rabbitmq_host',
+    history_group = parser.add_mutually_exclusive_group(required=True)
+    history_group.add_argument("--history-path",
+                               metavar="PATH",
+                               help="Absolute path to ingestion history local directory")
+    history_group.add_argument("--history-url",
+                               metavar="URL",
+                               help="URL to ingestion history solr database")
+    parser.add_argument('--rabbitmq-host',
                         default='localhost',
                         metavar='HOST',
                        help='RabbitMQ hostname to connect to. (Default: "localhost")')
-    parser.add_argument('--rabbitmq_username',
+    parser.add_argument('--rabbitmq-username',
                         default='guest',
                         metavar='USERNAME',
                         help='RabbitMQ username. (Default: "guest")')
-    parser.add_argument('--rabbitmq_password',
+    parser.add_argument('--rabbitmq-password',
                         default='guest',
                         metavar='PASSWORD',
                         help='RabbitMQ password. (Default: "guest")')
-    parser.add_argument('--rabbitmq_queue',
+    parser.add_argument('--rabbitmq-queue',

Review comment:
       Yes, that looks nicer. I think argparse converts it into a proper attribute name on its own.
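   
   For reference, a quick sketch of that conversion (argparse replaces the dashes with underscores when it builds the attribute name; the default value here is just for illustration):
   
       import argparse
   
       parser = argparse.ArgumentParser()
       parser.add_argument('--rabbitmq-queue', default='nexus')
   
       args = parser.parse_args([])
       # the dashes in the flag become underscores in the attribute name
       print(args.rabbitmq_queue)  # -> 'nexus'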

##########
File path: collection_manager/collection_manager/main.py
##########
@@ -8,7 +8,7 @@
 
 logging.basicConfig(level=logging.INFO)
 logging.getLogger("pika").setLevel(logging.WARNING)
-logger = logging.getLogger("collection_manager")
+logger = logging.getLogger()

Review comment:
       I usually do:
   
       logger = logging.getLogger(__name__)
   
   Does it behave the same without the __name__ argument?
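   
   For reference, a quick sketch of the difference: getLogger() with no argument returns the root logger, while getLogger(__name__) returns a logger named after the current module, so the module name shows up in the log records:
   
       import logging
   
       logging.basicConfig(level=logging.INFO, format='%(name)s: %(message)s')
   
       logging.getLogger().info('hello')          # -> root: hello
       logging.getLogger(__name__).info('hello')  # -> __main__: hello (the module name when run as a script)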




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

