Hi Lieven,

Thanks a lot for this proposal, and welcome to the community! Apologies for
the delay in replying.
I think it is a nice proposal, and OpenCV exceptions are a good place to
start.
Would you be able to add the proposal to a new cwiki page, or to the
existing cwiki page that you linked?
I suggest you add your new error struct to io.h. Also, I don't think the
proposal requires any frontend changes.

Would you also be willing to add a phase 2 to the proposal that addresses
the following:

1. How will these errors be propagated to the frontend? We need a mapping
of error codes from backend to frontend to communicate what kind of
exception occurred.
2. Handling of std::exception.
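
As a rough illustration of both points, here is a minimal, self-contained
C++ sketch. It assumes a C-API-style boundary where each call returns a
status and the frontend then queries the last error. Every name in it
(ErrorCode, LastError, LastErrorSlot, GuardedCall) is hypothetical, and
OpenCVError derives from std::runtime_error only as a stand-in for
dmlc::Error so that the snippet compiles without MXNet or OpenCV.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical error categories that backend and frontend would agree on.
enum class ErrorCode : int {
  kSuccess = 0,
  kOpenCVError = 1,
  kStdException = 2,
  kUnknown = 3,
};

// Stand-in for the proposed OpenCVError. In MXNet it would derive from
// dmlc::Error; std::runtime_error is used here to keep the sketch standalone.
struct OpenCVError : public std::runtime_error {
  int cv_code;  // the original OpenCV error code, e.g. -215
  OpenCVError(int code, const std::string& msg)
      : std::runtime_error(msg), cv_code(code) {}
};

// Last error recorded on the calling thread (assumed design, not MXNet's).
struct LastError {
  ErrorCode code = ErrorCode::kSuccess;
  std::string message;
};

inline LastError& LastErrorSlot() {
  static thread_local LastError err;
  return err;
}

// Runs fn and converts any exception into a status plus a recorded error.
// Returns 0 on success and -1 on failure.
template <typename Fn>
int GuardedCall(Fn&& fn) {
  LastError& err = LastErrorSlot();
  try {
    fn();
    err = LastError{};  // clear any stale error from an earlier call
    return 0;
  } catch (const OpenCVError& e) {  // most specific handler first
    err.code = ErrorCode::kOpenCVError;
    err.message = e.what();
  } catch (const std::exception& e) {  // point 2: any other std::exception
    err.code = ErrorCode::kStdException;
    err.message = e.what();
  } catch (...) {  // anything not derived from std::exception
    err.code = ErrorCode::kUnknown;
    err.message = "unknown exception";
  }
  return -1;
}
```

With something like this, a frontend could read the numeric code alongside
the message and raise a matching exception type instead of a generic one.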

Anirudh



On Sun, Nov 4, 2018 at 2:54 AM Lieven Govaerts <l...@apache.org> wrote:

> Hi MXNet devs,
>
>
> I'd like some feedback on the following proposal before I start
> implementing it.
>
> Context:
> I am working on migrating a classification product currently using Caffe to
> MXNet. Along the way I'm encountering some issues loading and augmenting
> the images dataset.
>
> Basically it seems my dataset contains some technically invalid images.
> When loading them using mx.io.ImageRecordIter (from a Python script), they
> get passed eventually to the OpenCV library which will throw a C++
> exception. MXNet currently doesn't capture those, resulting in my script
> aborting with a not very clear error message:
>
> "
> terminate called after throwing an instance of 'cv::Exception'
>
>   what():  OpenCV(3.4.3)
> /home/lgo/dev/opencv-3.4.3/modules/imgproc/src/resize.cpp:4044: error:
> (-215:Assertion failed) !ssize.empty() in function 'resize'
>
> Aborted (core dumped)
> "
>
> These types of issues have been reported before, and I see a high-level
> action plan has been documented in the wiki:
>
> https://cwiki.apache.org/confluence/display/MXNET/Improved+Exception+Handling+in+MXNet+-+Phase+2
>
> See also my previous pull request, which prevents OpenCV assertions by
> re-implementing the same checks in MXNet code:
> https://github.com/apache/incubator-mxnet/pull/12999
>
>
> As I'm focused now on data loading and OpenCV, I would like to propose the
> following implementation steps:
> 1. Catch cv::Exception in all calls to OpenCV functions that can raise one
> (cv::resize, cv::imdecode, cv::addWeighted, cv::mean, cv::copyMakeBorder,
> cv::warpAffine, ...)
> => a new macro CHECK_CV_NO_ASSERT
>
> 2. Create a new mxnet::Error class for OpenCV exceptions. Map the
> cv::Exception fields to this new Error class: code, err, file, func, line,
> msg, what.
> Make the CHECK_CV_NO_ASSERT macro throw this new mxnet::Error.
> => struct OpenCVError: public dmlc::Error
>
> 3. Add unit tests where possible.
>
> Scope: There are many calls to OpenCV functions in different parts of the
> MXNet code. I plan to focus on:
> - src/io/image_*
> - src/ndarray/ndarray.cc
> - plugin/opencv/cv_api.cc
>
> The other modules (R-package, cpp-package, example, julia, tools,
> plugin/sframe) are related to programming languages I don't use. The sframe
> plugin is not documented at all so it's not clear what it does (or why
> you'd keep it in the repo).
>
> Is include/mxnet/base.h a good place to define the new macro and Error
> struct? I'm not sure which include file is visible in all places where
> OpenCV calls are currently used.
>
> Some assumptions:
> - The public API may contain references to 3rd party library OpenCV
> - There is some value in knowing if an Error is the result of a call to the
> OpenCV library. If not, I might as well wrap std::exception in a more
> generic way.
>
> If I just make these changes, the main process will still abort, but now at
> least with a clear error message + stack trace(*). Updating all processing
> code to handle OpenCVErrors correctly is a next step, outside the scope
> of this proposal.
>
> regards,
>
> Lieven
>
>
> (*) Example stack trace:
>
> [23:31:30] src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2:
> ./train.txt.rec, use 1 threads for decoding..
>
> [23:31:34] src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2:
> ./val.txt.rec, use 1 threads for decoding..
>
> Traceback (most recent call last):
>
>   File "./test_train_carmodel_resnet.py", line 126, in <module>
>
>     for i, batch in enumerate(train_data):
>
>   File "/home/lgo/dev/incubator-mxnet/python/mxnet/io/io.py", line 228, in
> __next__
>
>     return self.next()
>
>   File "/home/lgo/dev/incubator-mxnet/python/mxnet/io/io.py", line 856, in
> next
>
>     check_call(_LIB.MXDataIterNext(self.handle, ctypes.byref(next_res)))
>
>   File "/home/lgo/dev/incubator-mxnet/python/mxnet/base.py", line 252, in
> check_call
>
>     raise MXNetError(py_str(_LIB.MXGetLastError()))
>
> mxnet.base.MXNetError: [23:31:34] src/io/image_aug_default.cc:413: OpenCV
> exception caught:
>
> OpenCV(3.4.3)
> /home/lgo/dev/opencv-3.4.3/modules/imgproc/src/resize.cpp:4044: error:
> (-215:Assertion failed) !ssize.empty() in function 'resize'
>
>
>
> Stack trace returned 10 entries:
>
> [bt] (0)
>
> /home/lgo/dev/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x53)
> [0x7f84af55b4f3]
>
> [bt] (1)
>
> /home/lgo/dev/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x39)
> [0x7f84af55bd69]
>
> [bt] (2)
>
> /home/lgo/dev/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::io::DefaultImageAugmenter::Process(cv::Mat
> const&, std::vector<float, std::allocator<float> >*,
> std::mersenne_twister_engine<unsigned long, 32ul, 624ul, 397ul, 31ul,
> 2567483615ul, 11ul, 4294967295ul, 7ul, 2636928640ul, 15ul, 4022730752ul,
> 18ul, 1812433253ul>*)+0x2941) [0x7f84b224ed11]
>
> [bt] (3)
>
> /home/lgo/dev/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::io::ImageRecordIOParser2<float>::ParseChunk(float*,
> float*, unsigned long, dmlc::InputSplit::Blob*)::{lambda()#1}::operator()()
> const+0x512) [0x7f84b22c3862]
>
> [bt] (4)
>
> /home/lgo/dev/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(+0x3808eee)
> [0x7f84b22c4eee]
>
> [bt] (5) /usr/lib/x86_64-linux-gnu/libgomp.so.1(GOMP_parallel+0x3f)
> [0x7f8487f63ecf]
>
> [bt] (6)
>
> /home/lgo/dev/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::io::ImageRecordIOParser2<float>::ParseChunk(float*,
> float*, unsigned long, dmlc::InputSplit::Blob*)+0x1a7) [0x7f84b22c5e97]
>
> [bt] (7)
>
> /home/lgo/dev/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::io::ImageRecordIOParser2<float>::ParseNext(mxnet::DataBatch*)+0x1f9)
> [0x7f84b22ca199]
>
> [bt] (8)
>
> /home/lgo/dev/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(std::thread::_State_impl<std::thread::_Invoker<std::tuple<dmlc::ThreadedIter<mxnet::DataBatch>::Init(std::function<bool
> (mxnet::DataBatch**)>, std::function<void ()>)::{lambda()#1}> >
> >::_M_run()+0x1f6) [0x7f84b2260676]
>
> [bt] (9) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xbd57f)
> [0x7f84e13f457f]
>
