Re: [FFmpeg-devel] Development of a CUDA accelerated variant of the libav vf_tonemap

2021-01-13 Felix LeClair
I've pulled the branch and built with --enable-vulkan --enable-libglslang.

What else is needed? Do I need to pull the libplacebo repo as well, and/or
pass any special --enable flags to ./configure?


On Wed, Jan 13, 2021 at 5:12 am, Lynne wrote:
> Jan 12, 2021, 22:13 by felix.leclair...@hotmail.com:
>
>> That's great! Any way for me to pull that branch or otherwise contribute?
>>
> The branch is here for now - https://github.com/haasn/FFmpeg
> The only blocker to having it merged is for me to rewrite the Vulkan
> synchronization mechanism we currently use, which I should hopefully get
> around to soon.

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] Development of a CUDA accelerated variant of the libav vf_tonemap

2021-01-12 Lynne
Jan 12, 2021, 22:13 by felix.leclair...@hotmail.com:

> That's great! Any way for me to pull that branch or otherwise contribute?
>
The branch is here for now - https://github.com/haasn/FFmpeg
The only blocker to having it merged is for me to rewrite the Vulkan
synchronization mechanism we currently use, which I should hopefully get
around to soon.

Re: [FFmpeg-devel] Development of a CUDA accelerated variant of the libav vf_tonemap

2021-01-12 Felix LeClair
That's great! Any way for me to pull that branch or otherwise contribute?

Have been using FFmpeg for a few years now, so hoping to be able to give
back.


On Tue, Jan 12, 2021 at 5:55 am, Lynne wrote:
> Jan 11, 2021, 23:27 by felix.leclair...@hotmail.com:
>
>> Hi guys and gals, first post on this mailing list, apologies for any
>> formatting/stylistic snafus.
>>
>> TLDR; we currently have tone mapping filters (typically used to map content
>> from a 10-bit HDR source to an 8-bit SDR output) that run either on the CPU
>> (the tonemap filter, usually chained with zscale from the zimg library) or
>> in hardware using VAAPI or OpenCL. Having a version implemented in CUDA
>> would round out the main hwaccel types.
>>
>> Context:
>> I'm a computer engineering student up in Canada with an interest in
>> high-efficiency distributed processing. As a personal project I'm trying to
>> build a cluster of Nvidia Jetson Nanos that can handle a few dozen streams
>> (a mix of SD, HD, FHD, UHD and 4K HDR) at once while drawing south of 100W
>> at peak. Depending on resolution/framerate, each of these little devices
>> can handle anywhere from 1 to 9 streams at a time in hardware, in any mix
>> of HEVC or H.264, so 3 of them should get me most of the way to where I
>> want to go (a 30W package capable of ~12 2160p30 10-bit -> 1080p30 8-bit
>> streams).
>>
>> The issue is that 4 little arm64 cores are just not going to be able to
>> run the zscale + tonemap chain in real time, even with the encoder and
>> decoders sharing memory with the CPU (so no PCIe memcpy penalty). On the
>> other hand, the built-in GPU and the relative simplicity of most tone
>> mapping algorithms (say, Hable) should make quick work of this.
>> Unfortunately (or fortunately, for me to learn with?) there isn't a CUDA
>> version of the filter.
>>
>> Question/guidance:
>> I've read through the doc on how to write filters and looked at the other
>> CUDA filters currently in the source, so I have a general idea of where I'm
>> going, but I haven't been able to fully nail down how to access the frames
>> that hwupload_cuda passes to vf_tonemap_cuda.c, which in turn passes each
>> frame to vf_tonemap_cuda.cu for processing. I have a repo with everything
>> I've been pulling together for my project; the piece of interest is under
>> */cuda_filter/ in the source tree.
>>
>> Would anyone mind helping me out with how to architect this?
>>
>
> The tonemap filter is just a (very old by now) copy of libplacebo's
> tonemapping. No one has bothered to keep it in sync.
> I'm working on a libplacebo wrapper currently, so once that's merged there
> will be up-to-date hardware tonemapping.

Re: [FFmpeg-devel] Development of a CUDA accelerated variant of the libav vf_tonemap

2021-01-11 Lynne
Jan 11, 2021, 23:27 by felix.leclair...@hotmail.com:

> Hi guys and gals, first post on this mailing list, apologies for any
> formatting/stylistic snafus.
>
> TLDR; we currently have tone mapping filters (typically used to map content
> from a 10-bit HDR source to an 8-bit SDR output) that run either on the CPU
> (the tonemap filter, usually chained with zscale from the zimg library) or
> in hardware using VAAPI or OpenCL. Having a version implemented in CUDA
> would round out the main hwaccel types.
>
> Context:
> I'm a computer engineering student up in Canada with an interest in
> high-efficiency distributed processing. As a personal project I'm trying to
> build a cluster of Nvidia Jetson Nanos that can handle a few dozen streams
> (a mix of SD, HD, FHD, UHD and 4K HDR) at once while drawing south of 100W
> at peak. Depending on resolution/framerate, each of these little devices
> can handle anywhere from 1 to 9 streams at a time in hardware, in any mix
> of HEVC or H.264, so 3 of them should get me most of the way to where I
> want to go (a 30W package capable of ~12 2160p30 10-bit -> 1080p30 8-bit
> streams).
>
> The issue is that 4 little arm64 cores are just not going to be able to
> run the zscale + tonemap chain in real time, even with the encoder and
> decoders sharing memory with the CPU (so no PCIe memcpy penalty). On the
> other hand, the built-in GPU and the relative simplicity of most tone
> mapping algorithms (say, Hable) should make quick work of this.
> Unfortunately (or fortunately, for me to learn with?) there isn't a CUDA
> version of the filter.
>
> Question/guidance:
> I've read through the doc on how to write filters and looked at the other
> CUDA filters currently in the source, so I have a general idea of where I'm
> going, but I haven't been able to fully nail down how to access the frames
> that hwupload_cuda passes to vf_tonemap_cuda.c, which in turn passes each
> frame to vf_tonemap_cuda.cu for processing. I have a repo with everything
> I've been pulling together for my project; the piece of interest is under
> */cuda_filter/ in the source tree.
>
> Would anyone mind helping me out with how to architect this?
>
>

The tonemap filter is just a (very old by now) copy of libplacebo's tonemapping.
No one has bothered to keep it in sync.
I'm working on a libplacebo wrapper currently, so once that's merged there
will be up-to-date hardware tonemapping.

[FFmpeg-devel] Development of a CUDA accelerated variant of the libav vf_tonemap

2021-01-11 Felix LeClair
Hi guys and gals, first post on this mailing list, apologies for any
formatting/stylistic snafus.


TLDR; we currently have tone mapping filters (typically used to map
content from a 10-bit HDR source to an 8-bit SDR output) that run either
on the CPU (the tonemap filter, usually chained with zscale from the zimg
library) or in hardware using VAAPI or OpenCL. Having a version
implemented in CUDA would round out the main hwaccel types.


Context:
	I'm a computer engineering student up in Canada with an interest in
high-efficiency distributed processing. As a personal project I'm trying
to build a cluster of Nvidia Jetson Nanos that can handle a few dozen
streams (a mix of SD, HD, FHD, UHD and 4K HDR) at once while drawing
south of 100W at peak. Depending on resolution/framerate, each of these
little devices can handle anywhere from 1 to 9 streams at a time in
hardware, in any mix of HEVC or H.264, so 3 of them should get me most of
the way to where I want to go (a 30W package capable of ~12 2160p30
10-bit -> 1080p30 8-bit streams).


The issue is that 4 little arm64 cores are just not going to be able to
run the zscale + tonemap chain in real time, even with the encoder and
decoders sharing memory with the CPU (so no PCIe memcpy penalty). On the
other hand, the built-in GPU and the relative simplicity of most tone
mapping algorithms (say, Hable) should make quick work of this.
Unfortunately (or fortunately, for me to learn with?) there isn't a CUDA
version of the filter.


Question/guidance:
I've read through the doc on how to write filters and looked at the other
CUDA filters currently in the source, so I have a general idea of where
I'm going, but I haven't been able to fully nail down how to access the
frames that hwupload_cuda passes to vf_tonemap_cuda.c, which in turn
passes each frame to vf_tonemap_cuda.cu for processing. I have a repo
with everything I've been pulling together for my project; the piece of
interest is under */cuda_filter/ in the source tree.
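On the frame-access question: frames produced by hwupload_cuda are AV_PIX_FMT_CUDA frames, whose `data[i]` fields hold CUdeviceptr device pointers laid out exactly as for system-memory frames, with `linesize[i]` giving the pitch in bytes. A CUDA kernel therefore indexes a plane the same way CPU code would, just with (x, y) derived from the block/thread indices instead of loops. The C sketch below is a CPU reference for that per-pixel work on the luma plane of a P010 frame (10 significant bits stored in the top bits of each 16-bit word). `process_p010_luma`, `pixel_op` and `truncate_to_8bit` are hypothetical names, and `truncate_to_8bit` is only a placeholder that drops two bits without any tone mapping.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-pixel hook: 10-bit code value in, 8-bit sample out.
 * A real filter would do PQ decode + tone curve + SDR encode here. */
typedef uint8_t (*pixel_op)(uint16_t code10);

/* Placeholder op: plain truncation to 8 bits, no tone mapping at all. */
static uint8_t truncate_to_8bit(uint16_t code10)
{
    return (uint8_t)(code10 >> 2);
}

/* CPU reference for the work a tone-mapping kernel would do on the luma
 * plane of a P010 frame. In CUDA, x and y would come from
 * blockIdx/blockDim/threadIdx and the two loops would disappear. */
static void process_p010_luma(const uint8_t *src, int src_pitch,
                              uint8_t *dst, int dst_pitch,
                              int width, int height, pixel_op op)
{
    for (int y = 0; y < height; y++) {
        /* Pitches are in bytes and may include padding: step rows in
         * bytes first, then reinterpret the row as 16-bit samples. */
        const uint16_t *in = (const uint16_t *)(src + (size_t)y * src_pitch);
        uint8_t *out = dst + (size_t)y * dst_pitch;
        for (int x = 0; x < width; x++) {
            uint16_t code10 = (uint16_t)(in[x] >> 6); /* P010: value in the top 10 bits */
            out[x] = op(code10);
        }
    }
}
```

Doing the row arithmetic in bytes before casting matters because the pitch of a hardware surface is often wider than width * bytes-per-sample; the padding columns are simply never touched.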



Would anyone mind helping me out with how to architect this?

Thanks!

FelixCLC
(Aliases: FCLC, camofelix)


