This is an automated email from the ASF dual-hosted git repository.
cdmikechen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/submarine.git
The following commit(s) were added to refs/heads/master by this push:
new 61716e2d SUBMARINE-1318. Add documentation for serve
61716e2d is described below
commit 61716e2db25641f51d22116260c162246ffc834f
Author: cdmikechen <[email protected]>
AuthorDate: Sun Sep 18 14:14:23 2022 +0800
SUBMARINE-1318. Add documentation for serve
### What is this PR for?
Add documentation for serve and revise some text.
### What type of PR is it?
Documentation
### Todos
* [x] - Serve document
### What is the Jira issue?
https://issues.apache.org/jira/projects/SUBMARINE/issues/SUBMARINE-1318
### How should this be tested?
No.
### Screenshots (if appropriate)

### Questions:
* Do the license files need updating? No
* Are there breaking changes for older versions? No
* Does this need new documentation? Yes
Author: cdmikechen <[email protected]>
Signed-off-by: cdmikechen <[email protected]>
Closes #996 from cdmikechen/SUBMARINE-1318 and squashes the following commits:
b012e1fe [cdmikechen] Fix example link
10e1828e [cdmikechen] Add documentation for serve
---
dev-support/examples/quickstart/train.py | 2 +-
submarine-serve/installation/seldon-secret.yaml | 2 +-
website/docs/gettingStarted/quickstart.md | 199 +++++++++++++++-------
website/static/img/quickstart-artifacts.png | Bin 0 -> 46012 bytes
website/static/img/submarine-model-registry.png | Bin 0 -> 75839 bytes
website/static/img/submarine-register-model.png | Bin 0 -> 71220 bytes
website/static/img/submarine-serve-model.png | Bin 0 -> 129247 bytes
website/static/img/submarine-serve-prediction.png | Bin 0 -> 159760 bytes
8 files changed, 140 insertions(+), 63 deletions(-)
diff --git a/dev-support/examples/quickstart/train.py b/dev-support/examples/quickstart/train.py
index 5653fa97..fd4eb42d 100644
--- a/dev-support/examples/quickstart/train.py
+++ b/dev-support/examples/quickstart/train.py
@@ -14,7 +14,7 @@
# ==============================================================================
"""
An example of multi-worker training with Keras model using Strategy API.
-https://github.com/kubeflow/tf-operator/blob/master/examples/v1/distribution_strategy/keras-API/multi_worker_strategy-with-keras.py
+https://github.com/kubeflow/training-operator/blob/master/examples/tensorflow/distribution_strategy/keras-API/multi_worker_strategy-with-keras.py
"""
import tensorflow as tf
import tensorflow_datasets as tfds
diff --git a/submarine-serve/installation/seldon-secret.yaml b/submarine-serve/installation/seldon-secret.yaml
index 4a7d0c37..64aed2f4 100644
--- a/submarine-serve/installation/seldon-secret.yaml
+++ b/submarine-serve/installation/seldon-secret.yaml
@@ -18,7 +18,7 @@ apiVersion: v1
kind: Secret
metadata:
  name: submarine-serve-secret
-  namespace: default
+  namespace: submarine-user-test
type: Opaque
stringData:
  RCLONE_CONFIG_S3_TYPE: s3
diff --git a/website/docs/gettingStarted/quickstart.md b/website/docs/gettingStarted/quickstart.md
index 44d1bd27..0ad835ef 100644
--- a/website/docs/gettingStarted/quickstart.md
+++ b/website/docs/gettingStarted/quickstart.md
@@ -37,7 +37,7 @@ This document gives you a quick view on the basic usage of Submarine platform. Y
2. Start minikube cluster and install Istio
-```
+```bash
minikube start --vm-driver=docker --cpus 8 --memory 8192 --kubernetes-version v1.21.2
istioctl install -y
# Or if you want to support Pod Security Policy (https://minikube.sigs.k8s.io/docs/tutorials/using_psp), you can use the following command to start cluster
@@ -48,14 +48,14 @@ minikube start --extra-config=apiserver.enable-admission-plugins=PodSecurityPoli
1. Clone the project
-```
+```bash
git clone https://github.com/apache/submarine.git
cd submarine
```
2. Create necessary namespaces
-```
+```bash
kubectl create namespace submarine
kubectl create namespace submarine-user-test
kubectl label namespace submarine istio-injection=enabled
@@ -64,28 +64,34 @@ kubectl label namespace submarine-user-test istio-injection=enabled
3. Install the submarine operator and dependencies by helm chart
-```
+```bash
helm install submarine ./helm-charts/submarine -n submarine
```
4. Create a Submarine custom resource and the operator will create the submarine server, database, etc. for us.
-```
+```bash
kubectl apply -f submarine-cloud-v2/artifacts/examples/example-submarine.yaml -n submarine-user-test
```
-### Ensure submarine is ready
+5. Install submarine serve package istio and seldon-core
+```bash
+./submarine-serve/installation/install.sh
```
+
+### Ensure submarine is ready
+
+```bash
$ kubectl get pods -n submarine
NAME                                              READY   STATUS    RESTARTS   AGE
notebook-controller-deployment-66d85984bf-x562z   1/1     Running   0          7h7m
-pytorch-operator-7d778f4859-g7xph                2/2     Running   0          7h7m
-tf-job-operator-7d895bf77c-75n72                 2/2     Running   0          7h7m
+training-operator-6dcd5b9c64-nxwr2               1/1     Running   0          7h7m
+submarine-operator-9cb7bc84d-brddz               1/1     Running   0          7h7m
$ kubectl get pods -n submarine-user-test
NAME                                   READY   STATUS    RESTARTS   AGE
-submarine-database-bdcb77549-rq2ds    2/2     Running   0          7h6m
+submarine-database-0                  1/1     Running   0          7h6m
submarine-minio-686b8777ff-zg4d2       2/2     Running   0          7h6m
submarine-mlflow-68c5559dcb-lkq4g      2/2     Running   0          7h6m
submarine-server-7c6d7bcfd8-5p42w      2/2     Running   0          9m33s
@@ -96,7 +102,7 @@ submarine-tensorboard-57c5b64778-t4lww 2/2 Running 0 7h6m
1. Exposing service
-```
+```bash
kubectl port-forward --address 0.0.0.0 -n istio-system service/istio-ingressgateway 32080:80
```
@@ -116,75 +122,86 @@ Take a simple mnist tensorflow script as an example. We choose `MultiWorkerMirro
```python
"""
./dev-support/examples/quickstart/train.py
-Reference: https://github.com/kubeflow/tf-operator/blob/master/examples/v1/distribution_strategy/keras-API/multi_worker_strategy-with-keras.py
+Reference: https://github.com/kubeflow/training-operator/blob/master/examples/tensorflow/distribution_strategy/keras-API/multi_worker_strategy-with-keras.py
"""
-import tensorflow_datasets as tfds
import tensorflow as tf
+import tensorflow_datasets as tfds
+from packaging.version import Version
from tensorflow.keras import layers, models
+
import submarine
+
def make_datasets_unbatched():
-  BUFFER_SIZE = 10000
+    BUFFER_SIZE = 10000
-  # Scaling MNIST data from (0, 255] to (0., 1.]
-  def scale(image, label):
-    image = tf.cast(image, tf.float32)
-    image /= 255
-    return image, label
+    # Scaling MNIST data from (0, 255] to (0., 1.]
+    def scale(image, label):
+        image = tf.cast(image, tf.float32)
+        image /= 255
+        return image, label
-  datasets, _ = tfds.load(name='mnist', with_info=True, as_supervised=True)
+    # If we use tensorflow_datasets > 3.1.0, we need to disable GCS
+    # https://github.com/tensorflow/datasets/issues/2761#issuecomment-1187413141
+    if Version(tfds.__version__) > Version("3.1.0"):
+        tfds.core.utils.gcs_utils._is_gcs_disabled = True
+    datasets, _ = tfds.load(name="mnist", with_info=True, as_supervised=True)
-  return datasets['train'].map(scale).cache().shuffle(BUFFER_SIZE)
+    return datasets["train"].map(scale).cache().shuffle(BUFFER_SIZE)
def build_and_compile_cnn_model():
-  model = models.Sequential()
-  model.add(
-      layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
-  model.add(layers.MaxPooling2D((2, 2)))
-  model.add(layers.Conv2D(64, (3, 3), activation='relu'))
-  model.add(layers.MaxPooling2D((2, 2)))
-  model.add(layers.Conv2D(64, (3, 3), activation='relu'))
-  model.add(layers.Flatten())
-  model.add(layers.Dense(64, activation='relu'))
-  model.add(layers.Dense(10, activation='softmax'))
-
-  model.summary()
-
-  model.compile(optimizer='adam',
-                loss='sparse_categorical_crossentropy',
-                metrics=['accuracy'])
-
-  return model
-
-def main():
-  strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy(
-      communication=tf.distribute.experimental.CollectiveCommunication.AUTO)
+    model = models.Sequential()
+    model.add(layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)))
+    model.add(layers.MaxPooling2D((2, 2)))
+    model.add(layers.Conv2D(64, (3, 3), activation="relu"))
+    model.add(layers.MaxPooling2D((2, 2)))
+    model.add(layers.Conv2D(64, (3, 3), activation="relu"))
+    model.add(layers.Flatten())
+    model.add(layers.Dense(64, activation="relu"))
+    model.add(layers.Dense(10, activation="softmax"))
-  BATCH_SIZE_PER_REPLICA = 4
-  BATCH_SIZE = BATCH_SIZE_PER_REPLICA * strategy.num_replicas_in_sync
+    model.summary()
-  with strategy.scope():
-    ds_train = make_datasets_unbatched().batch(BATCH_SIZE).repeat()
-    options = tf.data.Options()
-    options.experimental_distribute.auto_shard_policy = \
-        tf.data.experimental.AutoShardPolicy.DATA
-    ds_train = ds_train.with_options(options)
-    # Model building/compiling need to be within `strategy.scope()`.
-    multi_worker_model = build_and_compile_cnn_model()
+    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
-  class MyCallback(tf.keras.callbacks.Callback):
-    def on_epoch_end(self, epoch, logs=None):
-      # monitor the loss and accuracy
-      print(logs)
-      submarine.log_metrics({"loss": logs["loss"], "accuracy": logs["accuracy"]}, epoch)
+    return model
-  multi_worker_model.fit(ds_train, epochs=10, steps_per_epoch=70, callbacks=[MyCallback()])
+def main():
+    strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy(
+        communication=tf.distribute.experimental.CollectiveCommunication.AUTO
+    )
+
+    BATCH_SIZE_PER_REPLICA = 4
+    BATCH_SIZE = BATCH_SIZE_PER_REPLICA * strategy.num_replicas_in_sync
+
+    with strategy.scope():
+        ds_train = make_datasets_unbatched().batch(BATCH_SIZE).repeat()
+        options = tf.data.Options()
+        options.experimental_distribute.auto_shard_policy = (
+            tf.data.experimental.AutoShardPolicy.DATA
+        )
+        ds_train = ds_train.with_options(options)
+        # Model building/compiling need to be within `strategy.scope()`.
+        multi_worker_model = build_and_compile_cnn_model()
+
+    class MyCallback(tf.keras.callbacks.Callback):
+        def on_epoch_end(self, epoch, logs=None):
+            # monitor the loss and accuracy
+            print(logs)
+            submarine.log_metric("loss", logs["loss"], epoch)
+            submarine.log_metric("accuracy", logs["accuracy"], epoch)
+
+    multi_worker_model.fit(ds_train, epochs=10, steps_per_epoch=70, callbacks=[MyCallback()])
+    # save model
+    submarine.save_model(multi_worker_model, "tensorflow")
+
+
+if __name__ == "__main__":
+    main()
-if __name__ == '__main__':
-  main()
```
### 2. Prepare an environment compatible with the training
@@ -218,4 +235,64 @@ eval $(minikube docker-env)

-### 5. Serve the model (In development)
+### 5. Serve the model
+
+1. Before serving, we need to register a new model.
+
+
+
+2. And then, check the output model in experiment page.
+
+
+
+3. Click the button and register the model.
+
+
+
+4. Go to the model page and deploy our model for serving.
+
+
+
+5. We can run the following commands to get the `VirtualService` and `Endpoint` that use istio for external port forward or ingress.
+
+```bash
+## get VirtualService with your model name
+kubectl describe VirtualService -n submarine-user-test -l model-name=tf-mnist
+
+Name:         submarine-model-1-2508dd65692740b18ff5c6c6c162b863
+Namespace:    submarine-user-test
+Labels:       model-id=2508dd65692740b18ff5c6c6c162b863
+              model-name=tf-mnist
+              model-version=1
+Annotations:  <none>
+API Version:  networking.istio.io/v1beta1
+Kind:         VirtualService
+Metadata:
+  Creation Timestamp:  2022-09-18T05:26:38Z
+  Generation:          1
+  Managed Fields:
+    ...
+Spec:
+  Gateways:
+    istio-system/seldon-gateway
+  Hosts:
+    *
+  Http:
+    Match:
+      Uri:
+        Prefix:  /seldon/submarine-user-test/1/1/
+    Rewrite:
+      Uri:  /
+    Route:
+      Destination:
+        Host:  submarine-model-1-2508dd65692740b18ff5c6c6c162b863
+        Port:
+          Number:  8000
+Events:            <none>
+```
+
+6. After successfully serving the model, we can test the results of serving using the test python code [serve_predictions.py](https://github.com/apache/submarine/blob/master/dev-support/examples/quickstart/serve_predictions.py)
+
+
+
+
\ No newline at end of file
diff --git a/website/static/img/quickstart-artifacts.png b/website/static/img/quickstart-artifacts.png
new file mode 100644
index 00000000..61be8593
Binary files /dev/null and b/website/static/img/quickstart-artifacts.png differ
diff --git a/website/static/img/submarine-model-registry.png b/website/static/img/submarine-model-registry.png
new file mode 100644
index 00000000..cf6f2819
Binary files /dev/null and b/website/static/img/submarine-model-registry.png differ
diff --git a/website/static/img/submarine-register-model.png b/website/static/img/submarine-register-model.png
new file mode 100644
index 00000000..a520be6c
Binary files /dev/null and b/website/static/img/submarine-register-model.png differ
diff --git a/website/static/img/submarine-serve-model.png b/website/static/img/submarine-serve-model.png
new file mode 100644
index 00000000..19683325
Binary files /dev/null and b/website/static/img/submarine-serve-model.png differ
diff --git a/website/static/img/submarine-serve-prediction.png b/website/static/img/submarine-serve-prediction.png
new file mode 100644
index 00000000..7286e639
Binary files /dev/null and b/website/static/img/submarine-serve-prediction.png differ
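
Note on the serving URL: the quickstart change above exposes the Istio ingress gateway with `32080:80` and shows a `VirtualService` whose URI prefix is `/seldon/submarine-user-test/1/1/`. As a rough illustration only, the sketch below shows how a client might assemble the external URL from those two values. The helper name `build_serve_url`, the host `localhost`, and the TensorFlow-Serving-style suffix `v1/models/tf-mnist:predict` are assumptions made for this example and are not taken from the commit; the linked serve_predictions.py remains the reference client.

```python
def build_serve_url(gateway: str, prefix: str, model_path: str = "") -> str:
    """Join gateway address, VirtualService URI prefix, and an optional model path.

    Hypothetical helper: it only normalizes slashes so that each part is
    separated by exactly one "/".
    """
    return gateway.rstrip("/") + "/" + prefix.strip("/") + "/" + model_path.lstrip("/")


if __name__ == "__main__":
    # Gateway port comes from the port-forward in the quickstart (32080:80);
    # "localhost" is an assumption about where the port-forward is running.
    gateway = "http://localhost:32080"
    # Prefix copied from the VirtualService output shown in the diff.
    prefix = "/seldon/submarine-user-test/1/1/"
    # Assumed suffix; check serve_predictions.py for the path actually used.
    print(build_serve_url(gateway, prefix, "v1/models/tf-mnist:predict"))
```

Because the `VirtualService` rewrites the prefix to `/`, whatever follows the prefix is what the model server itself receives.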