Hi,

The DNN module currently supports two backends, TensorFlow (dnn_backend_tf.c)
and native (dnn_backend_native.c). The native mode has an external dependency;
imho that is not good and needs to change. I think now is still a good time for
the change, given the limited functionality and performance of the current
native mode.

With the current implementation, the native mode involves 3 formats and 3
phases, see below.

"TensorFlow model file format (format 1)"
    -----> convert model (phase 1) ----->
"native model file format (format 2)"
    -----> load model file into memory (phase 2) ----->
"in-memory representation (format 3)"
    -----> run inference on the model (phase 3) ----->

We created format 2 and format 3, and we wrote the C code for phase 2 and
phase 3 within FFmpeg. Phase 1 is written in Python and converts a TensorFlow
model file to a native model file; it is an external dependency at
https://github.com/HighVoltageRocknRoll/sr. Whenever we add anything new to the
model, for example a new layer, or even just a padding option for the current
Conv layer, we need to change the external dependency. Such changes will be
needed many times.


I see two options to improve it.
Option 1)
Use ONNX (https://github.com/onnx/onnx) to replace phase 1, format 2 and format
3.
Open Neural Network Exchange (ONNX) provides an open source format for AI
models, and the model files are protobuf (.pb) files.

The advantage is that we don't need to worry about phase 1, format 2 and format
3. The drawback is in loading the protobuf file into memory (phase 2): Google
only provides C++ support (no C support), while only C is allowed in FFmpeg, so
we would have to write our own C code in FFmpeg to parse/load the protobuf file.
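
To give a feeling for the parsing work option 1 implies, here is a minimal
sketch of the lowest layer of the protobuf wire format: base-128 varints and
field keys, where key = (field_number << 3) | wire_type. The names
pb_read_varint and pb_read_key are made up for illustration; they are not
existing FFmpeg or protobuf functions. A real ONNX loader would also need
length-delimited fields, nested messages and the ONNX schema on top of this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an existing FFmpeg function.
 * Decodes one protobuf varint from buf[0..size).
 * Returns the number of bytes consumed, or 0 if the input is
 * truncated or the varint is longer than 10 bytes. */
static size_t pb_read_varint(const uint8_t *buf, size_t size, uint64_t *value)
{
    uint64_t result = 0;
    size_t i;

    assert(buf && value);
    /* A 64-bit varint occupies at most 10 bytes. */
    for (i = 0; i < size && i < 10; i++) {
        /* Each byte carries 7 payload bits, least significant group first. */
        result |= (uint64_t)(buf[i] & 0x7F) << (7 * i);
        if (!(buf[i] & 0x80)) {   /* high bit clear: this is the last byte */
            *value = result;
            return i + 1;
        }
    }
    return 0;
}

/* Hypothetical helper: reads a field key and splits it into the field
 * number and the wire type. Returns bytes consumed, or 0 on error. */
static size_t pb_read_key(const uint8_t *buf, size_t size,
                          uint32_t *field_number, uint32_t *wire_type)
{
    uint64_t key;
    size_t n = pb_read_varint(buf, size, &key);

    if (!n)
        return 0;
    *wire_type    = (uint32_t)(key & 0x7);   /* low 3 bits */
    *field_number = (uint32_t)(key >> 3);    /* remaining bits */
    return n;
}
```

For the three bytes 08 96 01, pb_read_key consumes one byte and reports field
1 with wire type 0 (varint), and pb_read_varint then decodes the remaining two
bytes.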


Option 2)
Write C code in FFmpeg to convert the TensorFlow file format (format 1) directly
into the in-memory representation (format 3), so that we control everything
within the FFmpeg community. The conversion can also be extended to import more
file formats such as Torch, Darknet, etc. OpenCV, for example, uses this method.

The in-memory representation (format 3) can stay as it is today.
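
As a rough sketch of how option 2 could stay extensible, the entry point could
pick an importer per file format and have every importer fill the same format
3 structure. Everything below is hypothetical: the enum, the table and
guess_model_format do not exist in FFmpeg, and choosing the format by file
extension (with these particular extensions) is only an assumption for
illustration.

```c
#include <stddef.h>
#include <string.h>

/* All names below are hypothetical; none of them exist in FFmpeg. */
typedef enum ModelFormat {
    MODEL_FORMAT_UNKNOWN,
    MODEL_FORMAT_TENSORFLOW,
    MODEL_FORMAT_TORCH,
    MODEL_FORMAT_DARKNET
} ModelFormat;

typedef struct ModelImporter {
    const char *extension;   /* assumed file extension, including the dot */
    ModelFormat format;      /* importer that would build format 3 from it */
} ModelImporter;

/* Supporting one more file format means adding one more table entry
 * (plus its importer); the in-memory representation stays shared. */
static const ModelImporter importers[] = {
    { ".pb",      MODEL_FORMAT_TENSORFLOW },
    { ".t7",      MODEL_FORMAT_TORCH      },
    { ".weights", MODEL_FORMAT_DARKNET    },
};

/* Returns the format whose extension matches the end of filename,
 * or MODEL_FORMAT_UNKNOWN if there is no match. */
static ModelFormat guess_model_format(const char *filename)
{
    const char *ext = strrchr(filename, '.');
    size_t i;

    if (!ext)
        return MODEL_FORMAT_UNKNOWN;
    for (i = 0; i < sizeof(importers) / sizeof(importers[0]); i++)
        if (!strcmp(ext, importers[i].extension))
            return importers[i].format;
    return MODEL_FORMAT_UNKNOWN;
}
```

A real implementation would more likely inspect the file contents than trust
the extension, but the table shape would be the same.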


I personally prefer option 2, but I would be glad to see any better options. I
can continue to contribute in this area once the community decides on the
technical direction.


Thanks
Yejun
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
