This is an automated email from the ASF dual-hosted git repository.
pingsutw pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/submarine.git
The following commit(s) were added to refs/heads/master by this push:
new fc65e0c SUBMARINE-1000. Add submarine save model method
fc65e0c is described below
commit fc65e0c948411dfefb878755be1703e2638878d6
Author: jeff-901 <[email protected]>
AuthorDate: Tue Sep 7 20:21:47 2021 +0800
SUBMARINE-1000. Add submarine save model method
### What is this PR for?
Create submarine save model method.
### What type of PR is it?
Feature
### Todos
* [x] - Create repository
* [x] - Create save method for pytorch and tensorflow
* [ ] - Create model version when saving.
### What is the Jira issue?
https://issues.apache.org/jira/browse/SUBMARINE-1000
### How should this be tested?
Call the save_model_submarine function and check the artifact by minio ui.
### Screenshots (if appropriate)

### Questions:
* Do the license files need updating? No
* Are there breaking changes for older versions? No
* Does this need new documentation? Yes
Author: jeff-901 <[email protected]>
Signed-off-by: Kevin <[email protected]>
Closes #733 from jeff-901/SUBMARINE-1000 and squashes the following commits:
fe0e9fa2 [jeff-901] fix new lint
5b4b093f [jeff-901] fix lint
4f9e5a29 [jeff-901] repository use environ value and use kubectl to wait
minio
bd259b75 [jeff-901] add the comment of sleep
c50beec7 [jeff-901] change Repository.py to lowercase
3f2d1fb6 [jeff-901] exit if bucket creation fail
20ca0366 [jeff-901] delete local model
0c313391 [jeff-901] create submarine bucket when server starts
b2858ea5 [jeff-901] fix lint
c7d75620 [jeff-901] fix lint
6d9cb107 [jeff-901] fix lint and add TODO
0741d07a [jeff-901] add repository and save model method
---
bin/submarine.sh | 6 ++
dev-support/docker-images/submarine/Dockerfile | 8 ++-
.../docker-images/submarine/create_bucket.sh | 35 +++++++++++
.../pysubmarine/submarine/artifacts/__init__.py | 20 ++++++
.../pysubmarine/submarine/artifacts/repository.py | 73 ++++++++++++++++++++++
.../pysubmarine/submarine/models/client.py | 38 ++++++++++-
.../pysubmarine/submarine/models/pytorch.py | 22 +++++++
.../pysubmarine/submarine/models/tensorflow.py | 18 ++++++
8 files changed, 216 insertions(+), 4 deletions(-)
diff --git a/bin/submarine.sh b/bin/submarine.sh
index e8c109b..d2e74b0 100755
--- a/bin/submarine.sh
+++ b/bin/submarine.sh
@@ -55,4 +55,10 @@ if [[ ! -d "${SUBMARINE_LOG_DIR}" ]]; then
$(mkdir -p "${SUBMARINE_LOG_DIR}")
fi
+/usr/local/bin/create_bucket.sh
+if [ $? -ne 0 ];then
+ echo "Create submarine bucket fail"
+ exit 1
+fi
+
exec $JAVA_RUNNER $JAVA_OPTS -cp ${SUBMARINE_SERVER_CLASSPATH}
${SUBMARINE_SERVER_MAIN} "$@" | tee -a "${SUBMARINE_SERVER_LOGFILE}" 2>&1
diff --git a/dev-support/docker-images/submarine/Dockerfile
b/dev-support/docker-images/submarine/Dockerfile
index 792080a..f038407 100644
--- a/dev-support/docker-images/submarine/Dockerfile
+++ b/dev-support/docker-images/submarine/Dockerfile
@@ -23,7 +23,7 @@ MAINTAINER Apache Software Foundation
<[email protected]>
# INSTALL openjdk
RUN apk update && \
- apk add --no-cache openjdk8 tzdata bash tini && \
+ apk add --no-cache openjdk8 tzdata bash tini&& \
cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
echo Asia/Shanghai > /etc/timezone && \
apk del tzdata && \
@@ -38,6 +38,12 @@ ADD ./tmp/submarine-site.xml "/opt/submarine-current/conf/"
ADD ./tmp/submarine.sh "/opt/submarine-current/bin/"
ADD ./tmp/mysql-connector-java-5.1.39.jar "/opt/submarine-current/lib/"
+# Create submarine Bucket
+WORKDIR /usr/local/bin
+RUN wget https://dl.min.io/client/mc/release/linux-amd64/mc && chmod +x mc
+RUN wget https://dl.k8s.io/release/v1.15.11/bin/linux/amd64/kubectl && chmod
+x kubectl
+COPY create_bucket.sh /usr/local/bin
+
WORKDIR /opt/submarine-current
# Submarine port
diff --git a/dev-support/docker-images/submarine/create_bucket.sh
b/dev-support/docker-images/submarine/create_bucket.sh
new file mode 100755
index 0000000..826407e
--- /dev/null
+++ b/dev-support/docker-images/submarine/create_bucket.sh
@@ -0,0 +1,35 @@
+#!/usr/bin/env bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+set -euo pipefail
+
+S3_ENDPOINT_URL="http://submarine-minio-service:9000"
+AWS_ACCESS_KEY_ID="submarine_minio"
+AWS_SECRET_ACCESS_KEY="submarine_minio"
+
+
+# Wait for minio pod to setup
+/bin/bash -c "kubectl wait --for=condition=ready pod -l
app=submarine-minio-pod; mc config host add minio ${S3_ENDPOINT_URL}
${AWS_ACCESS_KEY_ID} ${AWS_SECRET_ACCESS_KEY}"
+
+# Create if the bucket "minio/submarine" not exists
+
+if /bin/bash -c "mc ls minio/submarine" >/dev/null 2>&1; then
+ echo "Bucket minio/submarine already exists, skipping creation."
+else
+ /bin/bash -c "mc mb minio/submarine"
+fi
diff --git a/submarine-sdk/pysubmarine/submarine/artifacts/__init__.py
b/submarine-sdk/pysubmarine/submarine/artifacts/__init__.py
new file mode 100644
index 0000000..c75c093
--- /dev/null
+++ b/submarine-sdk/pysubmarine/submarine/artifacts/__init__.py
@@ -0,0 +1,20 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from submarine.artifacts.repository import Repository
+
+__all__ = [
+ "Repository",
+]
diff --git a/submarine-sdk/pysubmarine/submarine/artifacts/repository.py
b/submarine-sdk/pysubmarine/submarine/artifacts/repository.py
new file mode 100644
index 0000000..3dff6fc
--- /dev/null
+++ b/submarine-sdk/pysubmarine/submarine/artifacts/repository.py
@@ -0,0 +1,73 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+
+import boto3
+
+
+class Repository:
+ def __init__(self, experiment_id):
+ self.client = boto3.client(
+ "s3",
+ aws_access_key_id=os.environ.get("AWS_ACCESS_KEY_ID"),
+ aws_secret_access_key=os.environ.get("AWS_SECRET_ACCESS_KEY"),
+ endpoint_url=os.environ.get("MLFLOW_S3_ENDPOINT_URL"),
+ )
+ self.dest_path = experiment_id
+
+ def _upload_file(self, local_file, bucket, key):
+ self.client.upload_file(Filename=local_file, Bucket=bucket, Key=key)
+
+ def _list_artifact_subfolder(self, artifact_path):
+ response = self.client.list_objects(
+ Bucket="submarine",
+ Prefix=os.path.join(self.dest_path, artifact_path) + "/",
+ Delimiter="/",
+ )
+ return response.get("CommonPrefixes")
+
+ def log_artifact(self, local_file, artifact_path):
+ bucket = "submarine"
+ dest_path = self.dest_path
+ dest_path = os.path.join(dest_path, artifact_path)
+ dest_path = os.path.join(dest_path, os.path.basename(local_file))
+ self._upload_file(
+ local_file=local_file,
+ bucket=bucket,
+ key=dest_path,
+ )
+
+ def log_artifacts(self, local_dir, artifact_path):
+ bucket = "submarine"
+ dest_path = self.dest_path
+ list_of_subfolder = self._list_artifact_subfolder(artifact_path)
+ if list_of_subfolder is None:
+ artifact_path = os.path.join(artifact_path, "1")
+ else:
+ artifact_path = os.path.join(artifact_path,
str(len(list_of_subfolder) + 1))
+ dest_path = os.path.join(dest_path, artifact_path)
+ local_dir = os.path.abspath(local_dir)
+ for (root, _, filenames) in os.walk(local_dir):
+ upload_path = dest_path
+ if root != local_dir:
+ rel_path = os.path.relpath(root, local_dir)
+ upload_path = os.path.join(dest_path, rel_path)
+ for f in filenames:
+ self._upload_file(
+ local_file=os.path.join(root, f),
+ bucket=bucket,
+ key=os.path.join(upload_path, f),
+ )
diff --git a/submarine-sdk/pysubmarine/submarine/models/client.py
b/submarine-sdk/pysubmarine/submarine/models/client.py
index a954bac..9cf655f 100644
--- a/submarine-sdk/pysubmarine/submarine/models/client.py
+++ b/submarine-sdk/pysubmarine/submarine/models/client.py
@@ -15,12 +15,16 @@
under the License.
"""
import os
+import re
+import tempfile
import time
import mlflow
from mlflow.exceptions import MlflowException
from mlflow.tracking import MlflowClient
+from submarine.artifacts.repository import Repository
+
from .constant import (
AWS_ACCESS_KEY_ID,
AWS_SECRET_ACCESS_KEY,
@@ -31,15 +35,21 @@ from .utils import exist_ps, get_job_id, get_worker_index
class ModelsClient:
- def __init__(self, tracking_uri=None, registry_uri=None):
+ def __init__(
+ self,
+ tracking_uri=None,
+ registry_uri=None,
+ aws_access_key_id=None,
+ aws_secret_access_key=None,
+ ):
"""
Set up mlflow server connection, including: s3 endpoint, aws, tracking
server
"""
# if setting url in environment variable,
# there is no need to set it by MlflowClient() or
mlflow.set_tracking_uri() again
os.environ["MLFLOW_S3_ENDPOINT_URL"] = registry_uri or
MLFLOW_S3_ENDPOINT_URL
- os.environ["AWS_ACCESS_KEY_ID"] = AWS_ACCESS_KEY_ID
- os.environ["AWS_SECRET_ACCESS_KEY"] = AWS_SECRET_ACCESS_KEY
+ os.environ["AWS_ACCESS_KEY_ID"] = aws_access_key_id or
AWS_ACCESS_KEY_ID
+ os.environ["AWS_SECRET_ACCESS_KEY"] = aws_secret_access_key or
AWS_SECRET_ACCESS_KEY
os.environ["MLFLOW_TRACKING_URI"] = tracking_uri or MLFLOW_TRACKING_URI
self.client = MlflowClient()
self.type_to_log_model = {
@@ -48,6 +58,7 @@ class ModelsClient:
"tensorflow": mlflow.tensorflow.log_model,
"keras": mlflow.keras.log_model,
}
+ self.artifact_repo = Repository(get_job_id())
def start(self):
"""
@@ -98,6 +109,27 @@ class ModelsClient:
else:
raise MlflowException("No valid type of model has been
matched")
+ def save_model_submarine(self, model_type, model, artifact_path,
registered_model_name=None):
+ pattern = r"[0-9A-Za-z][0-9A-Za-z-_]*[0-9A-Za-z]|[0-9A-Za-z]"
+ if not re.fullmatch(pattern, artifact_path):
+ raise Exception(
+ "Artifact_path must only contains numbers, characters, hyphen
and underscore. "
+ " Artifact_path must starts and ends with numbers or
characters."
+ )
+ with tempfile.TemporaryDirectory() as tempdir:
+ if model_type == "pytorch":
+ import submarine.models.pytorch
+
+ submarine.models.pytorch.save_model(model, tempdir)
+ elif model_type == "tensorflow":
+ import submarine.models.tensorflow
+
+ submarine.models.tensorflow.save_model(model, tempdir)
+ else:
+ raise Exception("No valid type of model has been matched to
{}".format(model_type))
+ self.artifact_repo.log_artifacts(tempdir, artifact_path)
+ # TODO for registering model ()
+
def _get_or_create_experiment(self, experiment_name):
"""
Return the id of experiment.
diff --git a/submarine-sdk/pysubmarine/submarine/models/pytorch.py
b/submarine-sdk/pysubmarine/submarine/models/pytorch.py
new file mode 100644
index 0000000..a143aa5
--- /dev/null
+++ b/submarine-sdk/pysubmarine/submarine/models/pytorch.py
@@ -0,0 +1,22 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+
+import torch
+
+
+def save_model(model, artifact_path):
+ torch.save(model, os.path.join(artifact_path, "model.pth"))
diff --git a/submarine-sdk/pysubmarine/submarine/models/tensorflow.py
b/submarine-sdk/pysubmarine/submarine/models/tensorflow.py
new file mode 100644
index 0000000..fbe5324
--- /dev/null
+++ b/submarine-sdk/pysubmarine/submarine/models/tensorflow.py
@@ -0,0 +1,18 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+def save_model(model, artifact_path):
+ model.save(artifact_path)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]