sandeep-krishnamurthy closed pull request #9066: FCN example updates URL: https://github.com/apache/incubator-mxnet/pull/9066
This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/example/fcn-xs/README.md b/example/fcn-xs/README.md
index 66ae08fe71..145aa31cb7 100644
--- a/example/fcn-xs/README.md
+++ b/example/fcn-xs/README.md
@@ -1,6 +1,7 @@
-FCN-xs EXAMPLES
----------------
-This folder contains the examples of image segmentation in MXNet.
+FCN-xs EXAMPLE
+--------------
+This folder contains an example implementation for Fully Convolutional Networks (FCN) in MXNet.
+The example is based on the [FCN paper](https://arxiv.org/abs/1411.4038) by long et al. of UC Berkeley.
 
 ## Sample results
 ![fcn-xs pasval_voc result](https://github.com/dmlc/web-data/blob/master/mxnet/image/fcnxs-example-result.jpg)
@@ -17,32 +18,36 @@ We have trained a simple fcn-xs model, the hyper-parameters are below:
 
 The training dataset size is only 2027, and the validation dataset size is 462.
 
-## How to train fcn-xs in mxnet
-#### Getting Started
+## Training the model
+
+### Step 1: setup pre-requisites
 - Install python package `Pillow` (required by `image_segment.py`).
 ```shell
-[sudo] pip install Pillow
+pip install --upgrade Pillow
 ```
-- Assume that we are in a working directory, such as `~/train_fcn_xs`, and MXNet is built as `~/mxnet`. Now, copy example scripts into working directory.
+- Setup your working directory. Assume your working directory is `~/train_fcn_xs`, and MXNet is built as `~/mxnet`. Copy example scripts into the working directory.
 ```shell
 cp ~/mxnet/example/fcn-xs/* .
 ```
-#### Step1: Download the vgg16fc model and experiment data
-* vgg16fc model : you can download the ```VGG_FC_ILSVRC_16_layers-symbol.json``` and ```VGG_FC_ILSVRC_16_layers-0074.params``` [baidu yun](http://pan.baidu.com/s/1bgz4PC), [dropbox](https://www.dropbox.com/sh/578n5cxej7ofd6m/AACuSeSYGcKQDi1GoB72R5lya?dl=0).
+### Step 2: Download the vgg16fc model and training data
+* vgg16fc model: you can download the ```VGG_FC_ILSVRC_16_layers-symbol.json``` and ```VGG_FC_ILSVRC_16_layers-0074.params``` from [baidu yun](http://pan.baidu.com/s/1bgz4PC), [dropbox](https://www.dropbox.com/sh/578n5cxej7ofd6m/AACuSeSYGcKQDi1GoB72R5lya?dl=0).
 this is the fully convolution style of the origin
 [VGG_ILSVRC_16_layers.caffemodel](http://www.robots.ox.ac.uk/~vgg/software/very_deep/caffe/VGG_ILSVRC_16_layers.caffemodel), and the corresponding [VGG_ILSVRC_16_layers_deploy.prototxt](https://gist.github.com/ksimonyan/211839e770f7b538e2d8#file-vgg_ilsvrc_16_layers_deploy-prototxt), the vgg16 model has [license](http://creativecommons.org/licenses/by-nc/4.0/) for non-commercial use only.
-* experiment data : you can download the ```VOC2012.rar``` [robots.ox.ac.uk](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar), and extract it. the file/folder will be like:
-```JPEGImages folder```, ```SegmentationClass folder```, ```train.lst```, ```val.lst```, ```test.lst```
+* Training data: download the ```VOC2012.rar``` [robots.ox.ac.uk](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar), and extract it into ```.\VOC2012```
+* Mapping files: download ```train.lst```, ```val.lst``` from [baidu yun](http://pan.baidu.com/s/1bgz4PC) into the ```.\VOC2012``` directory
+
+Once you completed all these steps, your working directory should contain a ```.\VOC2012``` directory, which contains the following: ```JPEGImages folder```, ```SegmentationClass folder```, ```train.lst```, ```val.lst```
 
-#### Step2: Train fcn-xs model
-* Configure GPU/CPU for training in `fcn_xs.py`.
+#### Step 3: Train the fcn-xs model
+* Based on your hardware, configure GPU or CPU for training in `fcn_xs.py`. It is recommended to use GPU due to the computational complexity and data load.
 ```python
 # ctx = mx.cpu(0)
 ctx = mx.gpu(0)
 ```
 
-* If you want to train the fcn-8s model, it's better for you trained the fcn-32s and fcn-16s model firstly.
-when training the fcn-32s model, run in shell ```./run_fcnxs.sh```, the script in it is:
+* It is recommended to train fcn-32s and fcn-16s before training the fcn-8s model
+
+To train the fcn-32s model, run the following:
 ```shell
 python -u fcn_xs.py --model=fcn32s --prefix=VGG_FC_ILSVRC_16_layers --epoch=74 --init-type=vgg16
 ```
@@ -64,14 +69,15 @@ INFO:root:Epoch[0] Batch [350] Speed: 1.12 samples/sec Train-accuracy=0.912080
 ```
 
 ## Using the pre-trained model for image segmentation
-* Similarly, you should first download the pre-trained model from [yun.baidu](http://pan.baidu.com/s/1bgz4PC), the symbol and model file is ```FCN8s_VGG16-symbol.json```, ```FCN8s_VGG16-0019.params```
-* Then put the image in your directory for segmentation, and change the ```img = YOUR_IMAGE_NAME``` in ```image_segmentaion.py```
-* At last, use ```image_segmentaion.py``` to segmentation one image by running in shell ```python image_segmentaion.py```, then you will get the segmentation image like the sample results above.
+To try out the pre-trained model, follow these steps:
+* Download the pre-trained symbol and weights from [yun.baidu](http://pan.baidu.com/s/1bgz4PC). You should download these files: ```FCN8s_VGG16-symbol.json``` and ```FCN8s_VGG16-0019.params```
+* Run the segmentation script, providing it your input image path: ```python image_segmentaion.py --input <your JPG image path>```
+* The segmented output ```.png``` file will be generated in the working directory
 
 ## Tips
-* This is the whole image size training, that is to say, we do not need resize/crop the image to the same size, so the batch_size during training is set to 1.
-* The fcn-xs model is based on vgg16 model, with some crop, deconv, element-sum layer added, so the model is quite big, moreover, the example is using whole image size training, if the input image is large(such as 700*500), then it may consume lots of memories, so I suggest you using the GPU with 12G memory.
-* If you don't have GPU with 12G memory, maybe you should change the ```cut_off_size``` to a small value when you construct your FileIter, like this:
+* This example runs full image size training, so there is no need to resize or crop input images to the same size. Accordingly, batch_size during training is set to 1.
+* The fcn-xs model is based on vgg16 model, with some crop, deconv, element-sum layer added, so the model is quite big, moreover, the example is using whole image size training, if the input image is large(such as 700*500), then memory consumption may be high. Due to that, I suggest you use GPU with at least 12GB memory for training.
+* If you don't have access to GPU with 12GB memory for training, I suggest you change the ```cut_off_size``` to a small value when constructing the FileIter, example below:
 ```python
 train_dataiter = FileIter(
     root_dir = "./VOC2012",
@@ -80,4 +86,4 @@ train_dataiter = FileIter(
     rgb_mean = (123.68, 116.779, 103.939),
     )
 ```
-* We are looking forward you to making this example more powerful, thanks.
+
diff --git a/example/fcn-xs/image_segmentaion.py b/example/fcn-xs/image_segmentaion.py
index ddd850fe4e..75df2d128a 100644
--- a/example/fcn-xs/image_segmentaion.py
+++ b/example/fcn-xs/image_segmentaion.py
@@ -15,38 +15,68 @@
 # specific language governing permissions and limitations
 # under the License.
 
-# pylint: skip-file
+"""
+This module encapsulates running image segmentation model for inference.
+
+Example usage:
+    $ python image_segmentaion.py --input <your JPG image path>
+"""
+
+import argparse
+import os
 import numpy as np
 import mxnet as mx
 from PIL import Image
 
-def getpallete(num_cls):
-    # this function is to get the colormap for visualizing the segmentation mask
-    n = num_cls
-    pallete = [0]*(n*3)
-    for j in xrange(0,n):
-        lab = j
-        pallete[j*3+0] = 0
-        pallete[j*3+1] = 0
-        pallete[j*3+2] = 0
-        i = 0
-        while (lab > 0):
-            pallete[j*3+0] |= (((lab >> 0) & 1) << (7-i))
-            pallete[j*3+1] |= (((lab >> 1) & 1) << (7-i))
-            pallete[j*3+2] |= (((lab >> 2) & 1) << (7-i))
-            i = i + 1
-            lab >>= 3
-    return pallete
+def make_file_extension_assertion(extension):
+    """Function factory for file extension argparse assertion
+    Args:
+        extension (string): the file extension to assert
+
+    Returns:
+        string: the supplied extension, if assertion is successful.
+
+    """
+    def file_extension_assertion(file_path):
+        base, ext = os.path.splitext(file_path)
+        if ext.lower() != extension:
+            raise argparse.ArgumentTypeError('File must have ' + extension + ' extension')
+        return file_path
+    return file_extension_assertion
+
+def get_palette(num_colors=256):
+    """generates the colormap for visualizing the segmentation mask
+    Args:
+        num_colors (int): the number of colors to generate in the output palette
 
-pallete = getpallete(256)
-img = "./person_bicycle.jpg"
-seg = img.replace("jpg", "png")
-model_previx = "FCN8s_VGG16"
-epoch = 19
-ctx = mx.gpu(0)
+    Returns:
+        string: the supplied extension, if assertion is successful.
+
+    """
+    pallete = [0]*(num_colors*3)
+    for j in range(0, num_colors):
+        lab = j
+        pallete[j*3+0] = 0
+        pallete[j*3+1] = 0
+        pallete[j*3+2] = 0
+        i = 0
+        while (lab > 0):
+            pallete[j*3+0] |= (((lab >> 0) & 1) << (7-i))
+            pallete[j*3+1] |= (((lab >> 1) & 1) << (7-i))
+            pallete[j*3+2] |= (((lab >> 2) & 1) << (7-i))
+            i = i + 1
+            lab >>= 3
+    return pallete
 
 def get_data(img_path):
-    """get the (1, 3, h, w) np.array data for the img_path"""
+    """get the (1, 3, h, w) np.array data for the supplied image
+    Args:
+        img_path (string): the input image path
+
+    Returns:
+        np.array: image data in a (1, 3, h, w) shape
+
+    """
     mean = np.array([123.68, 116.779, 103.939])  # (R,G,B)
     img = Image.open(img_path)
     img = np.array(img, dtype=np.float32)
@@ -58,18 +88,37 @@ def get_data(img_path):
     return img
 
 def main():
-    fcnxs, fcnxs_args, fcnxs_auxs = mx.model.load_checkpoint(model_previx, epoch)
-    fcnxs_args["data"] = mx.nd.array(get_data(img), ctx)
+    """Module main execution"""
+    # Initialization variables - update to change your model and execution context
+    model_prefix = "FCN8s_VGG16"
+    epoch = 19
+
+    # By default, MXNet will run on the CPU. Uncomment the line below to execute on the GPU
+    # ctx = mx.gpu()
+
+    fcnxs, fcnxs_args, fcnxs_auxs = mx.model.load_checkpoint(model_prefix, epoch)
+    fcnxs_args["data"] = mx.nd.array(get_data(args.input), ctx)
     data_shape = fcnxs_args["data"].shape
     label_shape = (1, data_shape[2]*data_shape[3])
     fcnxs_args["softmax_label"] = mx.nd.empty(label_shape, ctx)
-    exector = fcnxs.bind(ctx, fcnxs_args ,args_grad=None, grad_req="null", aux_states=fcnxs_args)
+    exector = fcnxs.bind(ctx, fcnxs_args, args_grad=None, grad_req="null", aux_states=fcnxs_args)
     exector.forward(is_train=False)
     output = exector.outputs[0]
     out_img = np.uint8(np.squeeze(output.asnumpy().argmax(axis=1)))
     out_img = Image.fromarray(out_img)
-    out_img.putpalette(pallete)
-    out_img.save(seg)
+    out_img.putpalette(get_palette())
+    out_img.save(args.output)
 
 if __name__ == "__main__":
+    # Handle command line arguments
+    parser = argparse.ArgumentParser(description='Run VGG16-FCN-8s to segment an input image')
+    parser.add_argument('--input',
+                        required=True,
+                        type=make_file_extension_assertion('.jpg'),
+                        help='The segmentation input JPG image')
+    parser.add_argument('--output',
+                        default='segmented.png',
+                        type=make_file_extension_assertion('.png'),
+                        help='The segmentation putput PNG image')
+    args = parser.parse_args()
     main()

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services
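The palette logic in the diff above can be tried without MXNet or Pillow. The following is a minimal standalone sketch of that bit-interleaving scheme, not the merged code itself: each label's bits are dealt out three at a time to the R, G and B channels, starting from the most significant channel bit. The helper `color_of` is a hypothetical name introduced only for this illustration, and the variable names differ slightly from the diff.

```python
def get_palette(num_colors=256):
    """Return a flat [R0, G0, B0, R1, G1, B1, ...] list of channel values.

    Mirrors the loop shown in the diff: bits 0, 1, 2 of the label go to
    R, G, B at channel bit 7, the next three label bits go to channel
    bit 6, and so on.
    """
    palette = [0] * (num_colors * 3)
    for label in range(num_colors):
        lab = label
        shift = 7  # start at the most significant bit of each channel
        while lab > 0:
            palette[label * 3 + 0] |= ((lab >> 0) & 1) << shift
            palette[label * 3 + 1] |= ((lab >> 1) & 1) << shift
            palette[label * 3 + 2] |= ((lab >> 2) & 1) << shift
            shift -= 1
            lab >>= 3
    return palette


def color_of(palette, label):
    """Hypothetical helper: the (R, G, B) triple stored for one label."""
    return tuple(palette[label * 3:label * 3 + 3])


if __name__ == "__main__":
    palette = get_palette()
    for label in (0, 1, 2, 15):
        print(label, color_of(palette, label))
```

A flat list of this shape is what Pillow's `Image.putpalette` accepts for a palette-mode image, which is how the script in the diff colors the predicted class mask.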