Hi Lieven,

Thanks a lot for this proposal and welcome to the community! Apologies for the delay in replying. I think it is a nice proposal, and OpenCV exceptions are a good point to start from. Would you be able to add the proposal to a new cwiki page, or to the existing cwiki page that you linked? I suggest you add your new error struct to io.h. Also, I don't think you would need to make any frontend changes for this proposal.
Would you also be willing to add a phase 2 to the proposal that addresses the following:

1. How will these errors be propagated to the frontend? We need a mapping of error codes from backend to frontend to communicate what kind of exception it is.
2. Handling of std::exception.

Anirudh

On Sun, Nov 4, 2018 at 2:54 AM Lieven Govaerts <l...@apache.org> wrote:

> Hi MXNet devs,
>
> I'd like some feedback on the following proposal before I start
> implementing it.
>
> Context:
> I am working on migrating a classification product currently using Caffe to
> MXNet. Along the way I'm encountering some issues loading and augmenting
> the images dataset.
>
> Basically it seems my dataset contains some technically invalid images.
> When loading them using mx.io.ImageRecordIter (from a Python script), they
> get passed eventually to the OpenCV library which will throw a C++
> exception. MXNet currently doesn't capture those, resulting in my script
> aborting with a not very clear error message:
>
> "
> terminate called after throwing an instance of 'cv::Exception'
> what(): OpenCV(3.4.3) /home/lgo/dev/opencv-3.4.3/modules/imgproc/src/resize.cpp:4044: error: (-215:Assertion failed) !ssize.empty() in function 'resize'
> Aborted (core dumped)
> "
>
> These types of issues have been reported before and I see a high-level
> action plan has been documented in the wiki:
>
> https://cwiki.apache.org/confluence/display/MXNET/Improved+Exception+Handling+in+MXNet+-+Phase+2
>
> See also my previous pull request, which prevents OpenCV assertions by
> re-implementing the same checks in MXNet code:
> https://github.com/apache/incubator-mxnet/pull/12999
>
> As I'm focused now on data loading and OpenCV, I would like to propose the
> following implementation steps:
> 1. Catch cv::Exception in all calls to OpenCV functions that can raise one
> (cv::resize, cv::imdecode, cv::addWeighted, cv::mean, cv::copyMakeBorder,
> cv::warpAffine ..)
> => a new macro CHECK_CV_NO_ASSERT
>
> 2. Create a new mxnet::Error class for OpenCV exceptions. Map the
> cv::Exception fields to this new Error class: code, err, file, func, line,
> msg, what.
> Make the CHECK_CV_NO_ASSERT macro throw this new mxnet::Error.
> => struct OpenCVError: public dmlc::Error
>
> 3. Add unit tests where possible.
>
> Scope: There are many calls to OpenCV functions in different parts of the
> MXNet code. I plan to focus on:
> - src/io/image_*
> - src/ndarray/ndarray.cc
> - plugin/opencv/cv_api.cc
>
> The other modules (R-package, cpp-package, example, julia, tools,
> plugin/sframe) are related to programming languages I don't use. The sframe
> plugin is not documented at all so it's not clear what it does (or why
> you'd keep it in the repo).
>
> Is include/mxnet/base.h a good place to define the new macro and Error
> struct? I'm not sure which include file is visible in all places where
> OpenCV calls are currently used.
>
> Some assumptions:
> - The public API may contain references to the 3rd party library OpenCV
> - There is some value in knowing if an Error is the result of a call to the
> OpenCV library. If not, I might as well wrap std::exception in a more
> generic way.
>
> If I just make these changes the main process will still abort, but now at
> least with a clear error message + stack trace(*). Updating all processing
> code to handle OpenCVErrors correctly is a next step, outside the scope
> of this proposal.
>
> regards,
>
> Lieven
>
> (*) Example stack trace:
>
> [23:31:30] src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: ./train.txt.rec, use 1 threads for decoding..
> [23:31:34] src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: ./val.txt.rec, use 1 threads for decoding..
> Traceback (most recent call last):
>   File "./test_train_carmodel_resnet.py", line 126, in <module>
>     for i, batch in enumerate(train_data):
>   File "/home/lgo/dev/incubator-mxnet/python/mxnet/io/io.py", line 228, in __next__
>     return self.next()
>   File "/home/lgo/dev/incubator-mxnet/python/mxnet/io/io.py", line 856, in next
>     check_call(_LIB.MXDataIterNext(self.handle, ctypes.byref(next_res)))
>   File "/home/lgo/dev/incubator-mxnet/python/mxnet/base.py", line 252, in check_call
>     raise MXNetError(py_str(_LIB.MXGetLastError()))
> mxnet.base.MXNetError: [23:31:34] src/io/image_aug_default.cc:413: OpenCV exception caught:
> OpenCV(3.4.3) /home/lgo/dev/opencv-3.4.3/modules/imgproc/src/resize.cpp:4044: error: (-215:Assertion failed) !ssize.empty() in function 'resize'
>
> Stack trace returned 10 entries:
> [bt] (0) /home/lgo/dev/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x53) [0x7f84af55b4f3]
> [bt] (1) /home/lgo/dev/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x39) [0x7f84af55bd69]
> [bt] (2) /home/lgo/dev/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::io::DefaultImageAugmenter::Process(cv::Mat const&, std::vector<float, std::allocator<float> >*, std::mersenne_twister_engine<unsigned long, 32ul, 624ul, 397ul, 31ul, 2567483615ul, 11ul, 4294967295ul, 7ul, 2636928640ul, 15ul, 4022730752ul, 18ul, 1812433253ul>*)+0x2941) [0x7f84b224ed11]
> [bt] (3) /home/lgo/dev/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::io::ImageRecordIOParser2<float>::ParseChunk(float*, float*, unsigned long, dmlc::InputSplit::Blob*)::{lambda()#1}::operator()() const+0x512) [0x7f84b22c3862]
> [bt] (4) /home/lgo/dev/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(+0x3808eee) [0x7f84b22c4eee]
> [bt] (5) /usr/lib/x86_64-linux-gnu/libgomp.so.1(GOMP_parallel+0x3f) [0x7f8487f63ecf]
> [bt] (6) /home/lgo/dev/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::io::ImageRecordIOParser2<float>::ParseChunk(float*, float*, unsigned long, dmlc::InputSplit::Blob*)+0x1a7) [0x7f84b22c5e97]
> [bt] (7) /home/lgo/dev/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::io::ImageRecordIOParser2<float>::ParseNext(mxnet::DataBatch*)+0x1f9) [0x7f84b22ca199]
> [bt] (8) /home/lgo/dev/incubator-mxnet/python/mxnet/../../lib/libmxnet.so(std::thread::_State_impl<std::thread::_Invoker<std::tuple<dmlc::ThreadedIter<mxnet::DataBatch>::Init(std::function<bool (mxnet::DataBatch**)>, std::function<void ()>)::{lambda()#1}> > >::_M_run()+0x1f6) [0x7f84b2260676]
> [bt] (9) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xbd57f) [0x7f84e13f457f]
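
For reference, steps 1 and 2 of the quoted proposal (a CHECK_CV_NO_ASSERT macro that catches cv::Exception and rethrows it as an OpenCVError derived from dmlc::Error) could look roughly like the sketch below. This is only an illustration under stated assumptions, not the code from any pull request: cv_stub::Exception and dmlc_stub::Error are hypothetical stand-ins for cv::Exception and dmlc::Error so the snippet compiles without OpenCV or dmlc-core, the macro body is a guess at one reasonable shape, and FakeResize and DescribeFailure are made-up helpers used only to exercise it.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for cv::Exception, carrying the fields named in the
// proposal (code, err, file, func, line, msg). The message format is simplified.
namespace cv_stub {
struct Exception : public std::exception {
  Exception(int code_, const std::string& err_, const std::string& func_,
            const std::string& file_, int line_)
      : code(code_), err(err_), func(func_), file(file_), line(line_),
        msg(err_ + " in function '" + func_ + "'") {}
  const char* what() const noexcept override { return msg.c_str(); }
  int code;
  std::string err;
  std::string func;
  std::string file;
  int line;
  std::string msg;
};
}  // namespace cv_stub

// Hypothetical stand-in for dmlc::Error, which derives from std::runtime_error.
namespace dmlc_stub {
struct Error : public std::runtime_error {
  explicit Error(const std::string& s) : std::runtime_error(s) {}
};
}  // namespace dmlc_stub

// Step 2 of the proposal: an error type that keeps the OpenCV fields.
struct OpenCVError : public dmlc_stub::Error {
  explicit OpenCVError(const cv_stub::Exception& e)
      : dmlc_stub::Error(e.what()),
        code(e.code), err(e.err), func(e.func), file(e.file), line(e.line) {}
  int code;
  std::string err;
  std::string func;
  std::string file;
  int line;
};

// Step 1 of the proposal: wrap a call and convert the OpenCV exception.
// The macro body is an assumption; only the macro name comes from the proposal.
#define CHECK_CV_NO_ASSERT(expr)               \
  do {                                         \
    try {                                      \
      expr;                                    \
    } catch (const cv_stub::Exception& e) {    \
      throw OpenCVError(e);                    \
    }                                          \
  } while (0)

// Made-up helper imitating a resize call that rejects an empty image.
inline void FakeResize(bool empty_input) {
  if (empty_input) {
    throw cv_stub::Exception(-215, "!ssize.empty()", "resize", "resize.cpp", 4044);
  }
}

// Usage: the wrapped call surfaces as an OpenCVError with its fields intact.
inline std::string DescribeFailure() {
  try {
    CHECK_CV_NO_ASSERT(FakeResize(true));
  } catch (const OpenCVError& e) {
    assert(e.code == -215);
    return e.func + ": " + e.err;
  }
  return "no error";
}
```

Because the stand-in base class derives from std::runtime_error, as dmlc::Error does, existing handlers that catch the base type would still see the converted error, while callers that care about the OpenCV origin can catch OpenCVError specifically.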