Re: [FFmpeg-devel] What is FFmpeg and what should it be

2023-08-07 Thread Rémi Denis-Courmont
On Sunday, 6 August 2023, 22:53:23 EEST, Michael Niedermayer wrote:
> > > > Did you ask people to do that?
> > > 
> > > yes, multiple times.
> > > Also normally patch objections come with a path forward, that was not
> > > the case here.
> > 
> > Not necessarily, sometimes preventing a bad idea from happening is a
> > positive thing in itself, and no path forward is needed.
> 
> That is missing that people suggest a path forward but
> with too few details to easily walk that path.

Uh, I hate to state the patently obvious, but if "no path forward is needed", 
then there should logically be _no_ "details to walk [a] path". Conversely, if 
avradio does not belong in FFmpeg, as Kieran, Tomas and others have been 
arguing, then there is no path forward to be given on FFmpeg-devel.


And besides I don't think it's even fair to state that "too few details" were 
given. People did suggest making this a new separate project properly isolated 
from FFmpeg internals, and/or joining efforts with existing OSS SDR projects 
rather than FFmpeg. Some specific projects have even been cited.

As far as FFmpeg(-devel) is concerned, I can't think how it could/should 
reasonably get any more specific than that.

-- 
Rémi Denis-Courmont
http://www.remlab.net/



___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


[FFmpeg-devel] Hijack of FATE instances

2023-08-07 Thread Rémi Denis-Courmont
To whom it may concern,

It has come to Remlab Tmi's attention that the FATE samples suite has recently 
been abused to contain non-multimedia files. This is a breach of your trust and 
we feel that this is totally inappropriate. The FATE instances were explicitly 
set up and sponsored by Tmi Remlab only for FFmpeg & multimedia.

Remlab's FATE instances will no longer be synchronising the sample suite. This 
takes effect immediately. All blame for consequent failures should be 
directed to whoever is mismanaging the FATE suite.

If this is not resolved, Remlab reserves the right to terminate all its FATE 
instances without notice.

-- 
レミ・デニ-クールモン
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH] lavu/hwcontext_vaapi: Add vaapi_drm_format_map support for x2rgb10

2023-08-09 Thread Rémi Denis-Courmont


On 9 August 2023 15:02:45 GMT+03:00, David Rosca wrote:
>Support for allocating frames with x2rgb10 format was added
>in c00264f5013, this adds support for importing DMA-BUFs.
>---
> libavutil/hwcontext_vaapi.c | 3 +++
> 1 file changed, 3 insertions(+)
>
>diff --git a/libavutil/hwcontext_vaapi.c b/libavutil/hwcontext_vaapi.c
>index 6c3a227ddd..63544ce476 100644
>--- a/libavutil/hwcontext_vaapi.c
>+++ b/libavutil/hwcontext_vaapi.c
>@@ -1048,6 +1048,9 @@ static const struct {
> #if defined(VA_FOURCC_Y412) && defined(DRM_FORMAT_XVYU12_16161616)
> DRM_MAP(Y412, 1, DRM_FORMAT_XVYU12_16161616),
> #endif
>+#ifdef defined(VA_FOURCC_X2R10G10B10) && defined(DRM_FORMAT_XRGB2101010)
>+DRM_MAP(X2R10G10B10, 1, DRM_FORMAT_XRGB2101010),
>+#endif

That syntax is ostensibly wrong. Did you test the patch?
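
For reference, a minimal sketch of the preprocessor form the hunk presumably 
intended. `#ifdef` takes a single identifier, so a compound test has to be 
written with `#if` and the `defined` operator, as the neighbouring Y412 entry 
already does. The `SKETCH_*` macros below are made-up stand-ins, not the real 
VA-API or DRM definitions.

```c
#include <assert.h>

/* Hypothetical stand-ins for VA_FOURCC_X2R10G10B10 and
 * DRM_FORMAT_XRGB2101010; the values are arbitrary. */
#define SKETCH_VA_FOURCC_X2R10G10B10   1
#define SKETCH_DRM_FORMAT_XRGB2101010  2

/* "#ifdef defined(A) && defined(B)" is not valid: #ifdef expects exactly
 * one identifier. A compound condition needs #if plus defined(). */
#if defined(SKETCH_VA_FOURCC_X2R10G10B10) && defined(SKETCH_DRM_FORMAT_XRGB2101010)
#define SKETCH_HAVE_X2RGB10 1
#else
#define SKETCH_HAVE_X2RGB10 0
#endif

/* Both macros are defined above, so the guarded branch must be taken. */
static_assert(SKETCH_HAVE_X2RGB10 == 1, "compound #if condition not taken");

static int sketch_have_x2rgb10(void)
{
    return SKETCH_HAVE_X2RGB10;
}
```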

> };
> #undef DRM_MAP
> 


Re: [FFmpeg-devel] What is FFmpeg and what should it be

2023-08-08 Thread Rémi Denis-Courmont
On Tuesday, 8 August 2023, 18:22:49 EEST, Michael Niedermayer wrote:
> > > That is missing that people suggest a path forward but
> > > with too few details to easily walk that path.
> > 
> > Uh, I hate to state the patently obvious, but if "no path forward is
> > needed", then there should logically be _no_ "details to walk [a] path".
> > Conversely, if avradio does not belong in FFmpeg, as Kieran, Tomas and
> > others have been arguing, then there is no path forward to be given on
> > FFmpeg-devel.
> > 
> > 
> > And besides I don't think it's even fair to state that "too few details"
> > were given. People did suggest making this a new separate project
> > properly isolated from FFmpeg internals, and/or joining efforts with
> > existing OSS SDR projects rather than FFmpeg. Some specific projects have
> > even been cited.
> > 
> > As far as FFmpeg(-devel) is concerned, I can't think how it could/should
> > reasonably get any more specific than that.
> 
> The saying goes, one cannot win an Argument on the Internet.
> So, iam not trying to, but
> 
> IIRC, a while ago you said iam obliged to work on FFmpeg. Thats
> simply not the case.

I have made some preposterous statements in my dark past, but I am pretty sure 
that I didn't make any statement to that effect, no.

I did assert that there "are dozens of people, ostensibly including [you], 
that depend on FFmpeg being ""Serious OpenSource TM"" in some way, for their
livelihood, and millions for their computer use" in response to NG's argument 
that FFmpeg should be turned into a fun experimental research project, and 
that people who wanted to keep FFmpeg what he calls a "serious open-source 
trademark" should just fork.

> Its not an obligation but rather my choice that i like to work on
> something the end user will enjoy.

I don't think anybody denied your right to code whatever you want as a hobby. 
We object to having SDR code in FFmpeg upstream, and I personally believe that 
it would go against your own financial/business interest (cf. quote above).

> And the end user, in fact
> more than 500 end users liked SDR in FFmpeg.

In all likelihood the vast majority of those 500 likes don't even have the 
hardware to use the SDR code, have no interest in acquiring it, and would 
like just about any new FFmpeg feature post. Without a statistically valid 
reference for comparison, that number means basically nothing.

(Also announcing a feature that is not release-ready is generally very 
inappropriate in my opinion.)

(...)
> You and vittorio here seem to suggest that instead there are no
> possible conditions and no path forward.

...within FFmpeg.

> Thus a fork would have to happen.

No. You are taking for granted that SDR belongs in FFmpeg in the first place, 
and that's exactly what people disagree with.

I don't even get what's so hard to comprehend here. Plenty of people here 
contribute to other projects than FFmpeg, and/or have started other projects 
than FFmpeg for their other hobby coding activities. Just because you have 
personally been strongly associated with FFmpeg does not mean that everything 
you do has to fit inside of it.

-- 
雷米‧德尼-库尔蒙
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH v2] avformat: add Software Defined Radio support

2023-06-24 Thread Rémi Denis-Courmont


On 23 June 2023 20:12:41 GMT+02:00, Michael Niedermayer wrote:
>Hi
>
>On Fri, Jun 23, 2023 at 06:37:18PM +0200, Rémi Denis-Courmont wrote:
>> Hi,
>> 
>> On 23 June 2023 13:17:28 GMT+02:00, Michael Niedermayer wrote:
>> >On Fri, Jun 23, 2023 at 10:34:10AM +0800, Kieran Kunhya wrote:
>> >> FFmpeg is not the place for SDR. SDR is as large and complex as the
>> >> entirety of multimedia.
>> >> 
>> >> What next, is FFmpeg going to implement TCP in userspace, Wifi, Ethernet,
>> >> an entire 4G and 5G stack?
>> >
>> >https://en.wikipedia.org/wiki/Straw_man
>> >
>> >What my patch is doing is adding support for AM demodulation, the AM
>> >specific code is like 2 pages. The future plan for FM demodulation will
>> >not add alot of code either. DAB/DVB should also not be anything big
>> >(if that is implemented at all by anyone)
>> 
>> Literally every one of those layer-2 protocols has a lower-level API already 
>> on Linux, and typically they are, or would be, backends to libavdevice.
>> 
>> (Specifically AM and FM are supported by V4L radio+ALSA; DAB and DVB by 
>> Linux-DVB. 4G and 5G are network devices.)
>
>4 problems
>* FFmpeg is not "linux only".

And then what? Whether you like it or not, radio signal processing sits on top 
of OS-specific APIs to access whatever bus or hardware. You can't make this 
OS-independent whether it's in FFmpeg or elsewhere.

At best you can write or reuse platform abstraction layers (such as libusb). 
Maybe.

In other words, whether this ends up in FFmpeg or not has absolutely no bearing 
on this "problem" as you call it.

But it doesn't end here. Audio input on Linux is normally exposed with ALSA 
modules (hw/plughw if the driver is in kernel, but it doesn't have to be), and 
other OSes have equivalent APIs. A sound (pun unintended) implementation of AM 
or FM would actually be an ALSA module, and *maybe* also PA and PW modules. 
(They don't have to be kernel mode drivers.)

...Not an FFmpeg device or demux.

>* No software i tried or was suggested to me used V4L or Linux-DVB.

That would be because audio input is done with ALSA (in combination with V4L 
for *hardware* radio tuning).

The point was that this is lower layer stuff that belongs in a lower level 
library or module, rather than FFmpeg. I never *literally* said that you should 
(or even could) use V4L or Linux-DVB here. Those APIs are rather for the 
*hardware* equivalent of what you're doing in *software*.

So again, no "problem" here.

>* iam not sure the RSP1A i have has linux drivers for these interfaces

Unless you're connecting to the radio receiver via IP (which would be a kludge 
IMO), you're going to have to have a kernel driver to expose the bus to the 
physical hardware. This is the same fallacious argument as the first one: you 
*can't* be OS-independent here, even if it's understandably and even agreeably 
desirable in an ideal (and purely imaginary) world.

>* What iam interrested in was working with the signals at a low level, why
>  because i find it interresting and fun.

Nothing wrong with that, but that doesn't make it fit in FFmpeg, which, for all 
the amazing work you've done on it, isn't your personal playground.

> Accessing AM/FM through some high
>  level API is not something iam interrested in. This is also because any
>  issues are likely unsolvable at that level.
>  If probing didnt find a station, or demodulation doesnt work, a high
>  level API likely wont allow doing anything about that.

I'm not sure what you even call high-level API here.

AM and FM are analogue audio sources, and ALSA modules and equivalent APIs on 
other OSes are as low as it practically gets for *exposing* digitised analogue 
audio. They're *lower* than libavformat/libavdevice even, I'd argue.

>
>> 
>> So I can only agree with Kieran that these are *lower* layers, that don't 
>> really look like they belong in FFmpeg.
>
>FFmpeg has been always very low level. We stoped at where the OS provides
>support that works, not at some academic "level". If every OS provides a great
>SDR API than i missed that, which is possible because that was never something
>i was interrested in.
>
>
>> 
>> >If the code grows beyond that it could be split out into a seperate
>> >library outside FFmpeg.
>> 
>> I think that the point is, that that code should be up-front in a separate 
>> FFmpeg-independent library. And it's not just a technical argument with 
>> layering. It's also that it's too far outside what FFmpeg typically works 
>> with, so it really should not be put under the purview of FFmpeg-devel. In 
>> other words, it's also a social proble

Re: [FFmpeg-devel] [PATCH v6 0/1] avformat: add Software Defined Radio support

2023-07-01 Thread Rémi Denis-Courmont
Hi,

On 30 June 2023 21:02:36 GMT+03:00, Michael Niedermayer wrote:
>On Fri, Jun 30, 2023 at 07:40:53PM +0200, Michael Niedermayer wrote:
>> On Fri, Jun 30, 2023 at 04:38:46PM +0200, Jean-Baptiste Kempf wrote:
>> > On Fri, 30 Jun 2023, at 16:08, Michael Niedermayer wrote:
>> > > Also as said previously, If there is at least a 2nd developer working
>> > > on this then we could & should move this to a seperate libraray 
>> > > (libavradio)
>> > 
>> > Why wait for a 2nd dev?
>> 
>> It is significant work
>
>And if we could put it in git master then people could work together to
>build the libavradio out of it as we all want.

You can also achieve that by creating a separate git repo for libavradio on 
git.ffmpeg.org. Admittedly some people could object to FFmpeg hosting this 
code; the point is that I don't see what good putting this inside the FFmpeg 
master git branch achieves.

>Such collaboration is kind of one of the reasons of having a "git master"

You're better off collaborating with people interested in this, than with the 
entirety of FFmpeg-devel, methinks.

That's if this turns into a "Boring Serious Open-Source" project to paraphrase 
a certain somebody else. If it remains just your hobby project, you're probably 
better off doing it in your own git repository where you can dictate 
everything. There you won't have to deal with the FFmpeg TC nor pesky minders 
of other people's source code such as I.

>If i cannot apply the patches but have to do very deep redesigns then
>it forces me to do all this work alone. That would affect other things
>i would otherwise work on.
>IMHO its better if everyone can collaborate on this libavradio project
>if everyone wants it.

That reasoning only makes sense if a sizable subset of FFmpeg people is 
interested, and even then it would probably fit better in a separate git 
repository of the same organisation than in the same git repository.


Re: [FFmpeg-devel] [PATCH 3/4] avfilter/vf_ccrepack: Constify filter

2023-07-01 Thread Rémi Denis-Courmont


On 29 June 2023 22:42:17 GMT+03:00, Paul B Mahol wrote:
>On Thu, Jun 29, 2023 at 9:35 PM Andreas Rheinhardt <
>andreas.rheinha...@outlook.com> wrote:
>
>> Paul B Mahol:
>> > On Thu, Jun 29, 2023 at 8:18 PM Andreas Rheinhardt <
>> > andreas.rheinha...@outlook.com> wrote:
>> >
>> >> The discrepancy between the definition and the declaration
>> >> in allfilters.c is actually UB.
>> >>
>> >
>> > I get no such message with ubsan.
>> >
>>
>> UBSan is a runtime UB-detector, not a compile-time UB detector.
>> The earlier code is UB because of 6.2.7 (2) of C11: "All declarations
>> that refer to the same object or function shall have compatible type;
>> otherwise, the behavior is undefined." A type and its const-qualified
>> type are not compatible.
>>
>
>This is so minor, that it is fully irrelevant.

UB is one of the most severe types of bug that can happen in C. How exactly is 
that "fully irrelevant"?

Nobody is ordering you to fix this bug if you don't want to. That's not a 
reason to block an objective, simple, well-understood and well-informed bug fix 
that somebody else made.
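
To make the C11 6.2.7 (2) point quoted above concrete, here is a minimal 
sketch of what a consistent declaration/definition pair looks like. It 
assumes the mismatch was a const-qualified declaration against a definition 
lacking `const`, which the "Constify filter" subject suggests but which is 
not spelled out in this message. `SketchFilter` and `sketch_vf_ccrepack` are 
hypothetical stand-ins, not the real AVFilter symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for AVFilter; only the name field matters here. */
typedef struct SketchFilter {
    const char *name;
} SketchFilter;

/* Declaration, as a registry file such as allfilters.c would carry it. */
extern const SketchFilter sketch_vf_ccrepack;

/* Definition, as the filter's own file would carry it. It is const-qualified
 * as well, so both declarations have compatible type. If this definition
 * lacked const while another translation unit declared the object const,
 * the two types would be incompatible and the behaviour undefined. Each
 * file compiles on its own, which is why neither the compiler nor a
 * run-time tool such as UBSan has to report anything. */
const SketchFilter sketch_vf_ccrepack = { .name = "ccrepack" };

static const char *sketch_filter_name(const SketchFilter *f)
{
    assert(f != NULL && f->name != NULL);
    return f->name;
}
```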

>
>
>>
>> - Andreas
>>
>>


Re: [FFmpeg-devel] [PATCH v6 0/1] avformat: add Software Defined Radio support

2023-07-02 Thread Rémi Denis-Courmont
Hi,

On 2 July 2023 13:08:54 GMT+03:00, Paul B Mahol wrote:
>On Sun, Jul 2, 2023 at 11:40 AM Nicolas George  wrote:
>
>> Michael Niedermayer (12023-06-30):
>> > And if we could put it in git master then people could work together to
>> > build the libavradio out of it as we all want.
>> >
>> > Such collaboration is kind of one of the reasons of having a "git master"
>>
>> I want you to continue writing your great code where it is the easiest
>> to install and enjoy for users and to contribute for others. Right now,
>> at the very beginning of the project, and since you are very familiar
>> with FFmpeg, a part of FFmpeg is the right place.
>>
>> The bean counters who tell you otherwise are actually not very good at
>> counting beans, when it is not their own.
>>
>
>I find this way of writing very offensive and inflammatory.

Judging by his email domain name, he is/was involved with possibly the most 
prestigious and selective grad/post-grad institution in the field of 
mathematics in France. He probably doesn't understand how low a level of 
arithmetic expertise is needed of actual accountants, as opposed to FFmpeg 
developers. From his standpoint, they're just both much lower than his.

Otherwise it's not offensive and inflammatory. Rather it's insulting and 
defamatory.

>Great code in FFmpeg? Where is that code hiding.
>
>
>>
>> Regards,
>>
>> --
>>   Nicolas George


Re: [FFmpeg-devel] [PATCH v2] avformat: add Software Defined Radio support

2023-06-27 Thread Rémi Denis-Courmont
On Sunday, 25 June 2023, 1:19:04 EEST, Nicolas George wrote:
> Michael Niedermayer (12023-06-23):
> > * What iam interrested in was working with the signals at a low level, why
> >   because i find it interresting and fun.
> 
> Then this is what you should be spending your time on, and to hell with
> anybody who says otherwise.

Straw man much? The argument is that SDR does not belong in FFmpeg master, not 
that Michael cannot do SDR in his free time.

> Unless you have commitments that we are not privy of, nobody can tell
> you how you are supposed to be spending your time and skills.

FYI https://fflabs.eu/about/

> Nobody can force you to manage releases and fix fuzzing bugs and do all the
> things you do that are necessary to the project. Necessary to a conception
> of the good of the project that is not even your own I think.

Same straw man argument.

> Nobody can prevent you from hacking the things that motivate you. At
> worst, they can prevent you from committing the resulting code into
> official FFmpeg. That would be the project's loss,

Not cluttering an open-source project is generally considered to be a win, not 
a loss. Less maintenance work and smaller code size.

> and you can still
> publish it on a private branch. But they do not have the power to block
> you from pushing.

> It is not even clear they have the authority do block
> you, more so if the code is really good and fits FFmpeg well.

...which it doesn't.

"A complete, cross-platform solution to record, convert and stream audio and 
video" is how the official website defines FFmpeg...

Radio waves (outside the visible spectrum) are neither audio nor video.

[Gratuitous slander removed]

> I am especially annoyed by the “it's too hard” naysayers

Nobody said that it was too hard. The complexity argument was that DAB & DVB 
are much more complicated than AM and FM, which they indeed are by any 
reasonable objective metric, not that Michael wouldn't be able to implement 
them.

> They do not realize they reveal more about their own skills than anybody
> else.

It only tells that they have (at least) a rudimentary understanding of radio 
waves to figure out that DAB or DVB are much more complex signals than FM or 
AM.

Maybe among the general public that would be telling some. Among ffmpeg-devel, 
I think it really tells nothing because just about everybody here would know 
or be able to guess that much.

> But the whole attitude who wants FFmpeg to be a Serious OpenSource TM
> Project, who needs to make releases and worry above all about ABI
> stability, is really the attitude who is killing all the fun in working
> on FFmpeg.

Michael, you or really anyone is more than welcome to fork and have their fun 
coding on FFmpeg *outside* the community project, if they cannot have it 
within the framework that is that community.

Meanwhile, there are dozens of people, ostensibly including Michael himself, 
that depend on FFmpeg being "Serious OpenSource TM" in some way, for their 
livelihood, and millions for their computer use. You don't get to ruin that, 
and if you try, you will first be blocked by the TC, and if you try harder, you 
will be kicked by the CC.

It looks to me that some people need to learn that not every piece of code 
they write or intend to write belongs in upstream FFmpeg.

> Hey, people, realize FFmpeg does not exist to be a Serious OpenSource TM
> Project, FFmpeg does not exist to serve other projects, to serve
> companies who benefit from it give the bare minimum back.

It also does not exist to be your playground, especially not at the expense of 
other people. And it has certainly not evolved that way for the past 20 or so 
years.

> FFmpeg exists because some day a dude thought it would be fun to write a
> MPEG decoder.

> And everybody else told him it would be too hard,
> everybody else told him to use an existing library and to leave it to
> the professionals. He did not believe them and proved them utterly
> wrong, and the rest, as the saying goes, is history.

And? That is completely irrelevant to the question whether (and if so how) SDR 
should be integrated in FFmpeg.

> So I will say it explicitly. We — me, and everybody who agrees with me —
> do not want to just maintain a bunch of wrappers for the convenience of
> others, we want to have fun writing interesting code, trying new ways of
> doing things, inventing original optimizations. We can find a balance
> and work on useful things too. But if you want us to work only on the
> boring useful things, if you want to bar us from working on fun things,
> then just fork you.

To paraphrase you, the fact that you can throw such an argument with a 
straight face tells a lot.

So what if, for the sake of the argument, people's subjective notion of 
discretionary fun is writing FFmpeg code in a project that excludes you?


This is tiring, I'll leave it at that.

-- 
レミ・デニ-クールモン
http://www.remlab.net/




Re: [FFmpeg-devel] [PATCH 2/6] lavc/ac3dsp: RISC-V V float_to_fixed24

2023-06-15 Thread Rémi Denis-Courmont
On Thursday, 15 June 2023, 13:36:41 EEST, Peiting Shen wrote:
> From: Shen Peiting 
> 
> Vector instructions replaces scalar options of float convert to fixed
> 
> Benchmarks on Spike(cycles):
> len=16
> float_to_fixed24_c: 315
> float_to_fixed24_rvv: 27
> len=160
> float_to_fixed24_c: 2871
> float_to_fixed24_rvv: 67
> 
> Co-Authored by: Yang Xiaojun 
> Co-Authored by: Huang Xing 
> Co-Authored by: Zeng Fanchen 
> Signed-off-by: Shen Peiting 
> ---
>  libavcodec/riscv/ac3dsp_init.c |  5 -
>  libavcodec/riscv/ac3dsp_rvv.S  | 19 +++
>  2 files changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/libavcodec/riscv/ac3dsp_init.c b/libavcodec/riscv/ac3dsp_init.c
> index bb67d86998..a4e75a7541 100644
> --- a/libavcodec/riscv/ac3dsp_init.c
> +++ b/libavcodec/riscv/ac3dsp_init.c
> @@ -25,13 +25,16 @@
>  #include "config.h"
> 
>  void ff_ac3_exponent_min_rvv(uint8_t *exp, int num_reuse_blocks, int nb_coefs);
> +void ff_float_to_fixed24_rvv(int32_t *dst, const float *src, unsigned int len);
> 
>  av_cold void ff_ac3dsp_init_riscv(AC3DSPContext *c)
>  {
>      int flags = av_get_cpu_flags();
>  #if HAVE_RVV
> -    if (flags & AV_CPU_FLAG_RVV_I32)
> +    if (flags & AV_CPU_FLAG_RVV_I32) {
>          c->ac3_exponent_min = ff_ac3_exponent_min_rvv;
> +        c->float_to_fixed24 = ff_float_to_fixed24_rvv;
> +    }
>  #endif
>  }
> 
> diff --git a/libavcodec/riscv/ac3dsp_rvv.S b/libavcodec/riscv/ac3dsp_rvv.S
> index 879123f4a7..d98e72c12c 100644
> --- a/libavcodec/riscv/ac3dsp_rvv.S
> +++ b/libavcodec/riscv/ac3dsp_rvv.S
> @@ -44,3 +44,22 @@ func ff_ac3_exponent_min_rvv, zve32x
>  3:
>  ret
>  endfunc
> +
> +
> +func ff_float_to_fixed24_rvv, zve32x
> +    addi    t1, x0, 1

That's `li t1, 1` please.

> +    slli    t1, t1, 24
> +    fcvt.s.w f1, t1

Please use ABI names for FPRs, e.g. `ft0`. Nobody wants to have to remember 
which ones are callee-saved and which ones aren't.

> +1:
> +    vsetvli t0, a2, e32, m8
> +    vle32.v v0, (a1)
> +    vfmul.vf v0, v0, f1
> +    vfcvt.x.f.v v16, v0
> +    vse32.v v16, (a0)
> +    sub     a2, a2, t0
> +    slli    t0, t0, 2
> +    add     a1, a1, t0
> +    add     a0, a0, t0

Use sh2add to save one in three instructions here.

And please interleave scalar and vector instructions so that an in-order CPU 
can potentially multi-issue.

> +    bgtz    a2, 1b
> +    ret
> +endfunc
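
For readers less used to RVV, the loop structure of the quoted routine can be 
modelled in plain C roughly as below. This is only a sketch inferred from the 
assembly above, not FFmpeg's C reference. `SKETCH_VLMAX` is a made-up 
stand-in for whatever `vsetvli` returns on real hardware, and the plain cast 
truncates toward zero whereas `vfcvt.x.f.v` follows the current rounding 
mode, so results can differ by one for inputs that are not exactly 
representable after scaling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical number of 32-bit elements handled per pass. */
#define SKETCH_VLMAX 8

static void float_to_fixed24_sketch(int32_t *dst, const float *src,
                                    unsigned int len)
{
    /* li + slli + fcvt.s.w: build the constant 2^24 as a float */
    const float scale = (float)(1 << 24);

    assert(dst != NULL && src != NULL);

    while (len > 0) {
        /* vsetvli: take as many elements as fit in one pass */
        unsigned int vl = len < SKETCH_VLMAX ? len : SKETCH_VLMAX;

        /* vle32.v / vfmul.vf / vfcvt.x.f.v / vse32.v on vl elements */
        for (unsigned int i = 0; i < vl; i++)
            dst[i] = (int32_t)(src[i] * scale);

        len -= vl;
        /* Each pointer moves by vl * 4 bytes. The patch computes that with
         * slli followed by two adds; sh2add folds the shift into each add. */
        src += vl;
        dst += vl;
    }
}
```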

-- 
Реми Дёни-Курмон
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH 1/6] lavc/ac3dsp: RISC-V V ac3_exponent_min

2023-06-15 Thread Rémi Denis-Courmont
ree Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#include "libavutil/riscv/asm.S"
> +
> +func ff_ac3_exponent_min_rvv, zve32x
> +    beq     a1, x0, 3f

Conventionally, we use ABI names for GP and FP registers like almost everybody 
else and their moms in RISC-V world. So that would be `zero`.

But in this case, you should use the `beqz` alias anyway.

> +    li      t0, 256
> +    addi    a1, a1, 1
> +1:
> +    mv      t2, a0

AFAICT, t2 is always the same as a0, and thus this is unnecessary.

> +    mv      t3, a1
> +    lb      t4, (t2)
> +2:
> +    vsetvli t1, t3, e8, m8
> +    vlse8.v v0, (t2), t0
> +    vmv.s.x v8, t4
> +    sub     t3, t3, t1
> +    vredminu.vs v8, v0, v8
> +    vmv.x.s t4, v8
> +    bnez    t3, 2b
> +    vsetivli t1, 1, e8

You're not using the output, so use `zero` as the destination register.

But you don't even need to reset the vector configuration here. Just use 
masking to store the one element (you could also transfer to scalar and store, 
but that's probably slower than masking).

> +    vse8.v  v8, (a0)
> +    addi    a0, a0, 1
> +    addi    a2, a2, -1

This will stall on an in-order CPU. Please avoid immediately consecutive 
interdependent instructions.

> +    bnez    a2, 1b
> +3:
> +    ret
> +endfunc
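
As a reading aid, here is a scalar C model of what the quoted assembly 
appears to compute: for each coefficient, the minimum exponent across the 
current block and the following `num_reuse_blocks` blocks, with blocks 256 
bytes apart. It is a sketch reconstructed from the instructions above (the 
stride in `t0`, the `a1 + 1` element count, the unsigned min reduction), not 
a copy of FFmpeg's C implementation, and the function name is made up.

```c
#include <assert.h>
#include <stdint.h>

static void ac3_exponent_min_sketch(uint8_t *exp, int num_reuse_blocks,
                                    int nb_coefs)
{
    assert(num_reuse_blocks >= 0 && nb_coefs >= 0);

    if (num_reuse_blocks == 0)          /* beq a1, x0, 3f */
        return;

    for (int i = 0; i < nb_coefs; i++) {    /* outer loop, label 1 */
        uint8_t min_exp = exp[i];           /* lb t4, (t2) */

        /* Inner loop, label 2: vlse8.v reads one byte every 256 bytes and
         * vredminu.vs folds them into a running unsigned minimum. */
        for (int blk = 1; blk <= num_reuse_blocks; blk++) {
            uint8_t e = exp[i + 256 * blk];
            if (e < min_exp)
                min_exp = e;
        }

        exp[i] = min_exp;                   /* vse8.v v8, (a0) */
    }
}
```

Seen this way, the inner reduction only depends on `exp + i`, which is why 
the extra `mv t2, a0` copy in the patch is not needed.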


-- 
Rémi Denis-Courmont
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH] lavc/h264chroma: RISC-V V add motion compensation for 4xH and 2xH chroma blocks

2023-06-15 Thread Rémi Denis-Courmont
On Thursday, 15 June 2023, 17:58:37 EEST, Arnie Chang wrote:
> Since these functions are frequently called, I prefer instantiating similar
> code many times
> rather than calling another internal function, as it may introduce
> additional function call overhead.

This works both ways. Smaller code reduces IC overhead and the risk of its own 
eviction or that of some other frequently used code.

Here, we would just add one `li` to the 8x cases, and a pair of `li` and `j` 
to the 2x and 4x cases (like we already do for the Opus postfilter). Since 
this is assembler, tail-call optimisation can be enforced rather than hoped 
for.

Not that I could measure the actual impact of either approach.
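
In C terms, the shape being suggested is roughly the following: one shared 
body parameterised by width, and thin entry points that only load the width 
and hand over. All names are hypothetical, the body is the usual H.264 
bilinear chroma interpolation rather than the RVV code under review, and a C 
compiler is free not to turn the wrappers into jumps, which is precisely what 
hand-written assembly can guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shared body: bilinear chroma interpolation over a w x h block.
 * Reads one extra column and one extra row of src. */
static void sketch_put_chroma(uint8_t *dst, const uint8_t *src,
                              ptrdiff_t stride, int w, int h, int x, int y)
{
    assert(x >= 0 && x < 8 && y >= 0 && y < 8);

    const int a = (8 - x) * (8 - y), b = x * (8 - y);
    const int c = (8 - x) * y,       d = x * y;

    for (int j = 0; j < h; j++) {
        for (int i = 0; i < w; i++)
            dst[i] = (a * src[i] + b * src[i + 1] +
                      c * src[i + stride] + d * src[i + stride + 1] + 32) >> 6;
        dst += stride;
        src += stride;
    }
}

/* Entry points: the C analogue of "li <width>; j <shared body>". */
static void sketch_put_chroma_mc8(uint8_t *dst, const uint8_t *src,
                                  ptrdiff_t stride, int h, int x, int y)
{
    sketch_put_chroma(dst, src, stride, 8, h, x, y);
}

static void sketch_put_chroma_mc4(uint8_t *dst, const uint8_t *src,
                                  ptrdiff_t stride, int h, int x, int y)
{
    sketch_put_chroma(dst, src, stride, 4, h, x, y);
}

static void sketch_put_chroma_mc2(uint8_t *dst, const uint8_t *src,
                                  ptrdiff_t stride, int h, int x, int y)
{
    sketch_put_chroma(dst, src, stride, 2, h, x, y);
}
```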

-- 
Rémi Denis-Courmont
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH 0/6] RISC-V initial ac3dsp

2023-06-15 Thread Rémi Denis-Courmont
On Thursday, 15 June 2023, 16:57:18 EEST, Lynne wrote:
> Jun 15, 2023, 12:37 by shenpeit...@eswincomputing.com:
> > From: Shen Peiting 
> > 
> > We optimized the six interfaces of AC3 init by RVV, the optimized
> > performance was tested on the RISC-V ISA simulator--Spike, and the
> > results were attached to each commit.
> > 
> > shenpeiting (6):
> >  lavc/ac3dsp: RISC-V V ac3_exponent_min
> >  lavc/ac3dsp: RISC-V V float_to_fixed24
> >  lavc/ac3dsp: RISC-V V ac3_sum_square_butterfly_int32
> >  lavc/ac3dsp: RISC-V V ac3_sum_square_butterfly_float
> >  lavc/ac3dsp: RISC-V V ac3_compute_mantissa_size
> >  lavc/ac3dsp: RISC-V B ac3_extract_exponents
> >  
> >  libavcodec/ac3dsp.c|   2 +
> >  libavcodec/ac3dsp.h|   1 +
> >  libavcodec/riscv/Makefile  |   3 +
> >  libavcodec/riscv/ac3dsp_init.c |  60 +
> >  libavcodec/riscv/ac3dsp_rvb.S  |  42 ++
> >  libavcodec/riscv/ac3dsp_rvv.S  | 225 +
> >  6 files changed, 333 insertions(+)
> >  create mode 100644 libavcodec/riscv/ac3dsp_init.c
> >  create mode 100644 libavcodec/riscv/ac3dsp_rvb.S
> >  create mode 100644 libavcodec/riscv/ac3dsp_rvv.S
> 
> Could you implement checkasm for this? It shouldn't
> be more than a hundred lines, and there are examples,
> tests/checkasm/aacpsdsp.c being the most similar.
> Since CPUs with the needed extensions aren't released,
> we're not doing any FATE runs,

Well... I accept hardware donations (with regular USB-C power supply and 
passive cooling) to back what would be the third generation of RISC-V FATE 
instances.

Until R-V-V 1.0 hardware production replaces unobtainium with silicon, I 
also accept Lichee Pi4A or equivalent hardware bundles, which would be able to 
run most (but definitely not all) of FFmpeg's RVV functions with a sizable 
amount of kludging.

> and so if the results don't
> match the C version, we'll end up with broken code once
> they do exist. And no one wants to debug someone else's
> assembly.
> 
> Those results look far too optimistic, and I'm guessing
> it's because they're using a theoretical huge vector size
> limit. Could you re-test with something more realistic,
> like 256-bit vectors, using checkasm --bench?

It could also be that Spike counts everything as one cycle, regardless of the 
group multiplier, not (just) the vector size.

-- 
Rémi Denis-Courmont
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH v2] avformat: add Software Defined Radio support

2023-06-23 Thread Rémi Denis-Courmont
Hi,

On 23 June 2023 13:17:28 GMT+02:00, Michael Niedermayer wrote:
>On Fri, Jun 23, 2023 at 10:34:10AM +0800, Kieran Kunhya wrote:
>> FFmpeg is not the place for SDR. SDR is as large and complex as the
>> entirety of multimedia.
>> 
>> What next, is FFmpeg going to implement TCP in userspace, Wifi, Ethernet,
>> an entire 4G and 5G stack?
>
>https://en.wikipedia.org/wiki/Straw_man
>
>What my patch is doing is adding support for AM demodulation, the AM
>specific code is like 2 pages. The future plan for FM demodulation will
>not add alot of code either. DAB/DVB should also not be anything big
>(if that is implemented at all by anyone)

Literally every one of those layer-2 protocols has a lower-level API already on 
Linux, and typically they are, or would be, backends to libavdevice.

(Specifically AM and FM are supported by V4L radio+ALSA; DAB and DVB by 
Linux-DVB. 4G and 5G are network devices.)

So I can only agree with Kieran that these are *lower* layers, that don't 
really look like they belong in FFmpeg.

>If the code grows beyond that it could be split out into a seperate
>library outside FFmpeg.

I think the point is that this code should live in a separate, 
FFmpeg-independent library from the start. And it's not just a technical 
argument about layering. It's also that it's too far outside what FFmpeg 
typically works with, so it really should not be put under the purview of 
FFmpeg-devel. In other words, it's also a social problem.

The flip side of that argument is that this may be of interest to other 
higher-level projects than FFmpeg, including projects that (rightfully) don't 
depend on FFmpeg, and that this may interest people who wouldn't contribute or 
participate in FFmpeg.

>The size of all of SDR really has as much bearing on FFmpeg as the size
>of all of mathematics has on the use of mathematics in FFmpeg.

On an empirical basis, I'd argue that FFmpeg mathematics are so fine-tuned to 
specific algorithmic use cases, that you will anyway end up writing custom 
algorithms and optimisations here. And thus you won't be sharing much code with 
(the rest of) FFmpeg down the line.

>
>
>> All without any separation of layers (the problem you currently have)?
>
>Lets see where the review process leads to.
>It is possible iam missing some things, its possible others are missing
>some factors.
>Ultimately sdr is more similar than it is different from existing input
>devices and demuxers.
>The review process may identify possible solutions that benefit other
>input devices too. It might identify shortcommings in FFmpeg that
>could lead to improvments.
>I dont really enjoy the review process ATM, no ;) but lets see where it
>leads to.
>
>Thx
>
>[...]


[FFmpeg-devel] [PATCH] DRAFT: riscv: add Linux riscv_hwprobe()

2023-06-20 Thread Rémi Denis-Courmont
---
 configure |  2 ++
 libavutil/riscv/cpu.c | 54 ---
 2 files changed, 43 insertions(+), 13 deletions(-)

diff --git a/configure b/configure
index ed9efad985..8cad88cdd2 100755
--- a/configure
+++ b/configure
@@ -5412,6 +5412,8 @@ elif enabled ppc; then
 
 elif enabled riscv; then
 
+check_headers sys/hwprobe.h
+
 if test_cpp_condition stddef.h "__riscv_zbb"; then
 enable fast_clz
 fi
diff --git a/libavutil/riscv/cpu.c b/libavutil/riscv/cpu.c
index a9263dbb78..36432f9777 100644
--- a/libavutil/riscv/cpu.c
+++ b/libavutil/riscv/cpu.c
@@ -20,6 +20,7 @@
 
 #include "libavutil/cpu.h"
 #include "libavutil/cpu_internal.h"
+#include "libavutil/macros.h"
 #include "libavutil/log.h"
 #include "config.h"
 
@@ -27,26 +28,53 @@
 #include 
 #define HWCAP_RV(letter) (1ul << ((letter) - 'A'))
 #endif
+#ifdef HAVE_SYS_HWPROBE_H
+#include <sys/hwprobe.h>
+#endif
 
 int ff_get_cpu_flags_riscv(void)
 {
 int ret = 0;
+#if defined (HAVE_SYS_HWPROBE_H)
+struct riscv_hwprobe pairs[] = {
+{ RISCV_HWPROBE_KEY_BASE_BEHAVIOR, 0 },
+{ RISCV_HWPROBE_KEY_IMA_EXT_0, 0 },
+};
+
+if (riscv_hwprobe(pairs, FF_ARRAY_ELEMS(pairs), 0, NULL, 0) == 0) {
+if (pairs[0].value & RISCV_HWPROBE_BASE_BEHAVIOR_IMA) {
+ret |= AV_CPU_FLAG_RVI;
+if (pairs[1].value & RISCV_HWPROBE_IMA_FD)
+ret |= AV_CPU_FLAG_RVF | AV_CPU_FLAG_RVD;
+# ifdef RISCV_HWPROBE_IMA_V
+if (pairs[1].value & RISCV_HWPROBE_IMA_V)
+ret |= AV_CPU_FLAG_RVV_I32 | AV_CPU_FLAG_RVV_I64
+ | AV_CPU_FLAG_RVV_F32 | AV_CPU_FLAG_RVV_F64;
+# endif
+# ifdef RISCV_HWPROBE_EXT_ZBB
+if (pairs[1].value & RISCV_HWPROBE_EXT_ZBB)
+ret |= AV_CPU_FLAG_RVB_BASIC;
+# endif
+} else
+#endif
 #if HAVE_GETAUXVAL
-const unsigned long hwcap = getauxval(AT_HWCAP);
+{
+const unsigned long hwcap = getauxval(AT_HWCAP);
 
-if (hwcap & HWCAP_RV('I'))
-ret |= AV_CPU_FLAG_RVI;
-if (hwcap & HWCAP_RV('F'))
-ret |= AV_CPU_FLAG_RVF;
-if (hwcap & HWCAP_RV('D'))
-ret |= AV_CPU_FLAG_RVD;
-if (hwcap & HWCAP_RV('B'))
-ret |= AV_CPU_FLAG_RVB_BASIC;
+if (hwcap & HWCAP_RV('I'))
+ret |= AV_CPU_FLAG_RVI;
+if (hwcap & HWCAP_RV('F'))
+ret |= AV_CPU_FLAG_RVF;
+if (hwcap & HWCAP_RV('D'))
+ret |= AV_CPU_FLAG_RVD;
+if (hwcap & HWCAP_RV('B'))
+ret |= AV_CPU_FLAG_RVB_BASIC;
 
-/* The V extension implies all Zve* functional subsets */
-if (hwcap & HWCAP_RV('V'))
-ret |= AV_CPU_FLAG_RVV_I32 | AV_CPU_FLAG_RVV_I64
- | AV_CPU_FLAG_RVV_F32 | AV_CPU_FLAG_RVV_F64;
+/* The V extension implies all Zve* functional subsets */
+if (hwcap & HWCAP_RV('V'))
+ ret |= AV_CPU_FLAG_RVV_I32 | AV_CPU_FLAG_RVV_I64
+  | AV_CPU_FLAG_RVV_F32 | AV_CPU_FLAG_RVV_F64;
+}
 #endif
 
 #ifdef __riscv_i
-- 
2.40.1
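
For readers unfamiliar with the getauxval() fallback kept by this patch: the 
HWCAP_RV() macro maps a single-letter extension name to one bit of AT_HWCAP. 
Below is a minimal standalone sketch of that mapping. The DEMO_* flag names 
and demo_flags_from_hwcap() are hypothetical and only stand in for FFmpeg's 
AV_CPU_FLAG_* values; the sketch takes the mask as a parameter instead of 
calling getauxval(), so that it builds on any platform.

```c
#include <stdint.h>

/* Same definition as in the patch: bit index is the letter's offset from 'A'. */
#define HWCAP_RV(letter) (1ul << ((letter) - 'A'))

/* Hypothetical flag bits for illustration only (not FFmpeg's real values). */
enum {
    DEMO_FLAG_RVI = 1 << 0,
    DEMO_FLAG_RVF = 1 << 1,
    DEMO_FLAG_RVD = 1 << 2,
    DEMO_FLAG_RVV = 1 << 3,
};

/* Translate a HWCAP bit mask into the demo flag set. */
static int demo_flags_from_hwcap(unsigned long hwcap)
{
    int ret = 0;

    if (hwcap & HWCAP_RV('I'))
        ret |= DEMO_FLAG_RVI;
    if (hwcap & HWCAP_RV('F'))
        ret |= DEMO_FLAG_RVF;
    if (hwcap & HWCAP_RV('D'))
        ret |= DEMO_FLAG_RVD;
    if (hwcap & HWCAP_RV('V'))
        ret |= DEMO_FLAG_RVV;
    return ret;
}
```

On a real Linux/RISC-V system the argument would come from 
getauxval(AT_HWCAP), as in the fallback branch of the patch.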



Re: [FFmpeg-devel] [PATCH 6/6] lavc/ac3dsp: RISC-V B ac3_extract_exponents

2023-06-15 Thread Rémi Denis-Courmont
On Thursday, 15 June 2023, 13:36:45 EEST, Peiting Shen wrote:
> From: Shen Peiting 
> 
> Use RVB instruction clz to calculate the number of leading zeros of MSB
> instead of av_log2.
> 
> Benchmarks on Spike(cycles):
> ac3_extract_exponents_c: 8226
> ac3_extract_exponents_rvb: 1167

FWIW, RV-Zbb can be benchmarked on real hardware.

I would have done it already if only there was a checkasm case for this.
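
To illustrate what such a test has to establish, here is a minimal standalone 
sketch, not an actual checkasm test: it compares a log2-based reference with a 
count-leading-zeros variant on pseudo-random 24-bit inputs. The function names 
are hypothetical, the reference is only modelled on the formula "23 - log2(v), 
or 24 for zero", and __builtin_clz() assumes GCC or Clang.

```c
#include <stdint.h>
#include <stdlib.h>

/* Portable floor(log2(v)) for v > 0, standing in for av_log2(). */
static int log2_floor(uint32_t v)
{
    int n = 0;

    while (v >>= 1)
        n++;
    return n;
}

/* Reference: exponent is 23 - log2(|coef|), or 24 for a zero coefficient. */
static void extract_exponents_ref(uint8_t *exp, const int32_t *coef, int nb_coefs)
{
    for (int i = 0; i < nb_coefs; i++) {
        uint32_t v = (uint32_t)abs(coef[i]);

        exp[i] = v ? (uint8_t)(23 - log2_floor(v)) : 24;
    }
}

/* Candidate: 23 - (31 - clz(v)) simplifies to clz(v) - 8 for 32-bit v. */
static void extract_exponents_clz(uint8_t *exp, const int32_t *coef, int nb_coefs)
{
    for (int i = 0; i < nb_coefs; i++) {
        uint32_t v = (uint32_t)abs(coef[i]);

        exp[i] = v ? (uint8_t)(__builtin_clz(v) - 8) : 24;
    }
}

/* Count outputs that differ between the two versions (at most 256 inputs). */
static int count_mismatches(uint32_t seed, int nb_coefs)
{
    int32_t coef[256] = { 0 };
    uint8_t ref[256], opt[256];
    int bad = 0;

    if (nb_coefs > 256)
        nb_coefs = 256;
    for (int i = 0; i < nb_coefs; i++) {
        /* Simple LCG; values land in [-2^23, 2^23 - 1]. */
        seed = seed * 1103515245u + 12345u;
        coef[i] = (int32_t)((seed >> 8) & 0xFFFFFF) - 0x800000;
    }
    coef[0] = 0; /* always exercise the zero special case */

    extract_exponents_ref(ref, coef, nb_coefs);
    extract_exponents_clz(opt, coef, nb_coefs);
    for (int i = 0; i < nb_coefs; i++)
        bad += ref[i] != opt[i];
    return bad;
}
```

A real checkasm case would do the same comparison against the function 
pointer installed by the init code, and add the benchmark on top.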

-- 
Rémi Denis-Courmont
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH 3/6] lavc/ac3dsp: RISC-V V ac3_sum_square_butterfly_int32

2023-06-15 Thread Rémi Denis-Courmont
On Thursday, 15 June 2023, 13:36:42 EEST, Peiting Shen wrote:
> From: Shen Peiting 
> 
> Scalar calculating int32 sum_square optimized by using RVV instructions
> 
> Benchmarks on Spike(cycles):
> len=128
> ac3_sum_square_butterfly_int32_c: 8497
> ac3_sum_square_butterfly_int32_rvv: 258
> len=1280
> ac3_sum_square_butterfly_int32_c: 84529
> ac3_sum_square_butterfly_int32_rvv: 2274
> 
> Co-Authored by: Yang Xiaojun 
> Co-Authored by: Huang Xing 
> Co-Authored by: Zeng Fanchen 
> Signed-off-by: Shen Peiting 
> ---
>  libavcodec/riscv/ac3dsp_init.c |  8 +
>  libavcodec/riscv/ac3dsp_rvv.S  | 53 ++
>  2 files changed, 61 insertions(+)
> 
> diff --git a/libavcodec/riscv/ac3dsp_init.c b/libavcodec/riscv/ac3dsp_init.c
> index a4e75a7541..4fd4abe83e 100644
> --- a/libavcodec/riscv/ac3dsp_init.c
> +++ b/libavcodec/riscv/ac3dsp_init.c
> @@ -26,6 +26,10 @@
> 
>  void ff_ac3_exponent_min_rvv(uint8_t *exp, int num_reuse_blocks, int nb_coefs);
>  void ff_float_to_fixed24_rvv(int32_t *dst, const float *src, unsigned int len);
> +void ff_ac3_sum_square_butterfly_int32_rvv(int64_t sum[4],
> +const int32_t *coef0,
> +const int32_t *coef1,
> +int len);
> 
>  av_cold void ff_ac3dsp_init_riscv(AC3DSPContext *c)
>  {
> @@ -35,6 +39,10 @@ av_cold void ff_ac3dsp_init_riscv(AC3DSPContext *c)
>  c->ac3_exponent_min = ff_ac3_exponent_min_rvv;
>  c->float_to_fixed24 = ff_float_to_fixed24_rvv;
>  }
> +#if (__riscv_xlen >= 64)
> +if (flags & AV_CPU_FLAG_RVV_I64)
> +c->sum_square_butterfly_int32 = ff_ac3_sum_square_butterfly_int32_rvv;
> +#endif
>  #endif
>  }
> 
> diff --git a/libavcodec/riscv/ac3dsp_rvv.S b/libavcodec/riscv/ac3dsp_rvv.S
> index d98e72c12c..4e0d238f85 100644
> --- a/libavcodec/riscv/ac3dsp_rvv.S
> +++ b/libavcodec/riscv/ac3dsp_rvv.S
> @@ -63,3 +63,56 @@ func ff_float_to_fixed24_rvv, zve32x
>  bgtza2, 1b
>  ret
>  endfunc
> +
> +
> +func ff_ac3_sum_square_butterfly_int32_rvv, zve64x
> +vsetvli t0, a3, e32, m2
> +vle32.v v0, (a1)
> +vle32.v v2, (a2)
> +vadd.vv v4, v0, v2
> +vsub.vv v6, v0, v2
> +vwmul.vv v8, v0, v0
> +vwmul.vv v12, v2, v2
> +vwmul.vv v16, v4, v4
> +vwmul.vv v20, v6, v6
> +sub a3, a3, t0
> +slli t0, t0, 2
> +add a1, a1, t0
> +add a2, a2, t0
> +beq a3, x0, 2f
> +1:
> +vsetvli t0, a3, e32, m2
> +vle32.v v0, (a1)
> +vle32.v v2, (a2)
> +vadd.vv v4, v0, v2
> +vsub.vv v6, v0, v2
> +vwmacc.vv   v8, v0, v0
> +vwmacc.vv   v12, v2, v2
> +vwmacc.vv   v16, v4, v4
> +vwmacc.vv   v20, v6, v6
> +sub a3, a3, t0
> +slli t0, t0, 2
> +add a1, a1, t0
> +add a2, a2, t0
> +bnez a3, 1b
> +2:
> +vsetvli t0, x0, e64, m4
> +vmv.s.x v24, x0
> +vmv.s.x v25, x0
> +vmv.s.x v26, x0
> +vmv.s.x v27, x0
> +vredsum.vs  v24, v8, v24
> +vredsum.vs  v25, v12, v25
> +vredsum.vs  v26, v16, v26
> +vredsum.vs  v27, v20, v27

As far as I can tell this is a reserved encoding (cf. RVV 1.0 §3.4.2), and I 
believe that QEMU throws an illegal instruction exception in this case. (I 
would check, but there is no checkasm test case for this function.) Does this 
actually work on your simulator? Because if so, then your simulator is probably 
broken/buggy.
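
For reference, the scalar computation that any vector version has to 
reproduce can be sketched as below. This is only a sketch modelled on the 
function's prototype from this patch; the name is hypothetical, and the 
sum and difference are computed in 64 bits here, which assumes the inputs are 
24-bit fixed-point values so that no 32-bit wrap-around is involved.

```c
#include <stdint.h>

/*
 * Accumulate the energies of left, right, mid (l + r) and side (l - r)
 * into sum[0..3]. Assumes coefficients fit in 24 bits.
 */
static void sum_square_butterfly_int32_ref(int64_t sum[4],
                                           const int32_t *coef0,
                                           const int32_t *coef1,
                                           int len)
{
    sum[0] = sum[1] = sum[2] = sum[3] = 0;

    for (int i = 0; i < len; i++) {
        int64_t lt = coef0[i];
        int64_t rt = coef1[i];
        int64_t md = lt + rt;
        int64_t sd = lt - rt;

        sum[0] += lt * lt;
        sum[1] += rt * rt;
        sum[2] += md * md;
        sum[3] += sd * sd;
    }
}
```

A checkasm case would feed the same random buffers to this kind of reference 
and to the RVV function, then compare the four sums.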

> +vsetivli t0, 1, e64, m1
> +vse64.v v24, (a0)
> +addi a0, a0, 8
> +vse64.v v25, (a0)
> +addi a0, a0, 8
> +vse64.v v26, (a0)
> +addi a0, a0, 8
> +vse64.v v27, (a0)
> +addi a0, a0, 8
> +ret
> +endfunc


-- 
雷米‧德尼-库尔蒙
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH] MAINTAINERS: add vanitous self to maintain RISC-V

2023-05-05 Thread Rémi Denis-Courmont
Hi,

Not really. My old key had to be revoked, so it should probably not be listed 
there.

(Sorry for top post)

On 4 May 2023 21:59:29 GMT+03:00, James Almer wrote:
>On 5/3/2023 1:10 PM, Rémi Denis-Courmont wrote:
>> ---
>>   MAINTAINERS | 1 +
>>   1 file changed, 1 insertion(+)
>> 
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 854ccc3fa4..f95be01dc6 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -543,6 +543,7 @@ LoongArch   Shiyou Yin
>>   Mac OS X / PowerPC  Romain Dolbeau, Guillaume Poirier
>>   Amiga / PowerPC Colin Ward
>>   Linux / PowerPC     Lauri Kasanen
>> +RISC-V  Rémi Denis-Courmont
>>   Windows MinGW   Alex Beregszaszi, Ramiro Polla
>>   Windows Cygwin  Victor Paesa
>>   Windows MSVCMatthew Oliver, Hendrik Leppkes
>
>Do you have a GPG fingerprint, to add at the end of the file? If not I'll push 
>this as is.


Re: [FFmpeg-devel] [PATCH 0/5] RISC-V: Improve H264 decoding performance using RVV intrinsic

2023-05-10 Thread Rémi Denis-Courmont
Hi,

On 10 May 2023 11:46:57 GMT+03:00, Arnie Chang wrote:
>Considering the benefits of the open ISA like RISC-V,
>the intrinsic code should still have a better chance of being optimized by
>the compiler for hardware variants.

You probably have access to proprietary performance information of SiFive which 
nobody else here can argue about, so maybe you are onto something here.

However, FFmpeg needs to support any RV64GC CPU with a single build, because 
that's how many Linux distributions and applications will build it. So we can't 
really rely on the compiler's per-CPU model tuning for scheduling. In any case, 
my guess is that there won't be that much room for the compiler to reorder 
vector code, even if it's using intrinsics.
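
To make the single-build constraint concrete, here is a minimal sketch of the 
run-time dispatch pattern this implies. All names (DEMO_CPU_FLAG_VECTOR, 
DemoDSPContext, demo_dsp_init) are hypothetical, and the "vector" routine is 
just a C stand-in for what would be a separately assembled function, so that 
the sketch compiles on any CPU.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bit; FFmpeg's real ones are AV_CPU_FLAG_* macros. */
#define DEMO_CPU_FLAG_VECTOR (1u << 0)

typedef void (*add_bytes_fn)(uint8_t *dst, const uint8_t *src, size_t len);

/* Baseline C version, always safe to run. */
static void add_bytes_c(uint8_t *dst, const uint8_t *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = (uint8_t)(dst[i] + src[i]);
}

/* Stand-in for an outline assembler routine using the vector extension. */
static void add_bytes_vector_standin(uint8_t *dst, const uint8_t *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = (uint8_t)(dst[i] + src[i]);
}

typedef struct DemoDSPContext {
    add_bytes_fn add_bytes;
} DemoDSPContext;

/* Pick the implementation from flags detected at run time, not build time. */
static void demo_dsp_init(DemoDSPContext *c, unsigned cpu_flags)
{
    c->add_bytes = add_bytes_c;
    if (cpu_flags & DEMO_CPU_FLAG_VECTOR)
        c->add_bytes = add_bytes_vector_standin;
}
```

The compiler never gets to schedule the vector routine for a specific CPU 
model here; the binary only chooses between prebuilt variants once the flags 
are known.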

On the contrary, I fear that we need to tune the group multiplier (LMUL) at 
runtime to get good performance on different processor designs. Essentially 
unrolling. And if that turns out to be true, then we *cannot* use intrinsics, 
since unlike outline assembler they don't support varying the group multiplier 
at runtime.

So I could be completely wrong but if so, we'd need more substantial 
explanation and justification why.

>At this moment, the intrinsic implementation is the only thing available.
>It would take a significant amount of time to rewrite it in assembly due to
>the large amount of functions.

Is it really that much work? Leaving aside maybe converting the inline 
functions into assembler macros, it seems mostly like a case of passing the C 
code through the compiler, then disassembling the result and then reformatting 
for legibility here and there.

As the proverb goes, "on the Internet, nobody knows you're a monkey". Nobody 
needs to know that somebody wrote their assembler with the help of intrinsics 
and a compiler.

>I was wondering if we could treat the intrinsic code as an initial version
>for the RISC-V port with the following modification.
>- Add an option --enable-rvv-intrinsic to EXPLICITLY enable the
>intrinsic optimization, which is disabled by default.

I will let more senior developers comment here, but I suspect that this 
would set a bad example that would eventually induce other people into choosing 
intrinsics over outline assembler for new code.

Adding a build option could be viable if we wanted to advise against using the 
code. But here we rather want to advise against using the code as a reference, 
not against running it.

If this were the kernel, I'd argue merging the code into `staging` but FFmpeg 
is not so large that it'd have a staging area.

>  Based on the given conditions, vector supports in GCC and intrinsics
>dislike and limits. Disabling it by default seems a reasonable way.
>
>For those who want to be involved in the optimization of H.264 decoder on
>RISC-V can work on the assembly and decide whether to refer to intrinsic
>code.
>I believe this would be a good starting point for future optimization.

Well, most likely. The thing is, though, that nobody in the FFmpeg community 
(except you) has hardware access in any shape or form at this time, at least 
as far as I know. That's one of the reasons why my own efforts have stalled.

>
>
>On Wed, May 10, 2023 at 12:51 AM Rémi Denis-Courmont 
>wrote:
>
>> Hi,
>>
>> Le tiistaina 9. toukokuuta 2023, 12.50.25 EEST Arnie Chang a écrit :
>> > We are submitting a set of patches that significantly improve H.264
>> decoding
>> > performance by utilizing RVV intrinsic code.
>>
>> I believe that there is a general dislike of compiler intrinsic for vector
>> optimisations in FFmpeg for a plurality of reasons. FWIW, that dislike is
>> not
>> limited to FFmpeg:
>> https://www.reddit.com/r/RISCV/comments/131hlgq/comment/ji1ie3l/
>> Indeed, in my personal opinion, RISC-V V intrinsics specifically are
>> painful to
>> read/write compared to assembler.
>>
>> On top of that, in this particular case, intrinsics have at least three,
>> possibly four, additional and more objective challenges as compared to the
>> existing RVV assembler:
>>
>> 1) They are less portable, requiring the most bleeding edge version of
>> compilers. Case in point: our FATE GCC instance does not support them as
>> of
>> today (because Debian Unstable does not).
>>
>> 2) They do not work with run-time CPU detection, at least not currently.
>> This
>> is going to be a major stumbling point for Linux distributions which need
>> to
>> build code that runs on processors without vector unit.
>>
>> 3) V intrinsics require specifying the group multiplier at every
>> instruction.
>> In most cases, this is just very inconvenient. But in those algorithms
>> that
>> require a fixed vector size (e.g. Opus DSP alr

Re: [FFmpeg-devel] [PATCH] avformat/hls: look for trailing GET headers with m3u8 extension check

2023-05-16 Thread Rémi Denis-Courmont


On 15 May 2023 05:38:22 GMT+08:00, Michael Niedermayer wrote:
>> > 
>> > But lets consider:
>> > file:///home/myname/myfile.m3u8?file.avi
>> > /home/myname/myfile.m3u8?file.avi
>> > http:/server/myfile.m3u8?file.avi
>> > 
>> > The first is odd, iam not sure what "?file.avi" is and i wonder if we
>> > could simply reject this at file protocol level.
>> 
>> > If its accepted, I think it would map to /home/myname/myfile.m3u8 on disk
>> > not "/home/myname/myfile.m3u8?file.avi"

Yes. You would escape the question mark if you wanted it in the local file path.

>> This is incorrect.

No.

>> try it by naming a file "foo.m3u8?bar.txt" and run
>> xdg-open 'file:///home/leo/foo.m3u8?bar.txt' and you will find that it opens
>> it.

It should stop at the question mark and drop everything from there. FWIW, URL 
syntax is specified by IETF, not XDG.

>What is incorrect ?
>we have some tools that will interpret "file:///home/leo/foo.m3u8?bar.txt" as
>/home/leo/foo.m3u8?bar.txt and some /home/leo/foo.m3u8 on disk
>
>I think that makes that sort of file URLs ambigous, dont you agree ?

Absolute URLs aren't ambiguous, and if the file scheme is specified, then the 
URL is absolute by definition. Situations where relative locations are 
accepted are ambiguous, because some tools use URL syntax, some use local file 
path syntax (which is subtly incompatible), and some use a messed-up mix of 
both, relying on the tolerance of web browsers.
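
As a minimal sketch of the rule described above (generic URI syntax per 
RFC 3986: the path ends at the first "?" or "#", and a literal question mark 
in the path is written %3F), the conversion could look like this. The helper 
names are hypothetical, this is not FFmpeg's file protocol code, and only the 
empty-authority form "file:///..." is handled.

```c
#include <stddef.h>
#include <string.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/*
 * Convert "file:///path?query#fragment" to a local path.
 * Returns 0 on success, -1 on malformed input or if out is too small.
 */
static int file_url_to_path(const char *url, char *out, size_t out_size)
{
    static const char prefix[] = "file://";
    size_t n = 0;

    if (out_size == 0 || strncmp(url, prefix, sizeof(prefix) - 1) != 0)
        return -1;
    url += sizeof(prefix) - 1;
    if (*url != '/')
        return -1; /* a non-empty authority (host) is not handled here */

    /* The path stops at the query ('?') or fragment ('#') delimiter. */
    for (; *url != '\0' && *url != '?' && *url != '#'; url++) {
        int c = (unsigned char)*url;

        if (c == '%') { /* percent-encoded octet */
            int hi = hex_value((unsigned char)url[1]);
            int lo = hi < 0 ? -1 : hex_value((unsigned char)url[2]);

            if (hi < 0 || lo < 0)
                return -1;
            c = hi * 16 + lo;
            url += 2;
        }
        if (c == 0 || n + 1 >= out_size)
            return -1; /* embedded NUL or output buffer too small */
        out[n++] = (char)c;
    }
    out[n] = '\0';
    return 0;
}
```

With this, "file:///home/leo/foo.m3u8?bar.txt" maps to /home/leo/foo.m3u8, 
whereas a file really named "foo.m3u8?bar.txt" has to be referenced as 
"file:///home/leo/foo.m3u8%3Fbar.txt".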


Re: [FFmpeg-devel] [PATCH] [RFC] avformat: Add basic same origin check

2023-05-03 Thread Rémi Denis-Courmont
On Wednesday, 3 May 2023, 16:33:59 EEST, Michael Niedermayer wrote:
> This patch was inspired by a report on ffmpeg-security about SSRF
> (for which custom io_open() callback or soem sort of sandboxing/VM can be
>  used to avoid it)
>  The patch here was intended to explore if we can provide something thats
>  better tahn currently by default

I am not sure how a dodgy HLS manifest would be any different from the user 
clicking a hyperlink from a dodgy website - or opening a dodgy playlist file 
in their FFmpeg-based media player application for that matter. Either way, it 
can open any URL.

It is obviously not an ideal situation, but any restriction here will most 
definitely break existing use cases (and likely be abused by server operators 
to lock FFmpeg out).

Even the "obvious" blocking of secure (HTTPS) to nonsecure (HTTP) references 
is likely to break stuff. If the end result is that everybody just turns origin 
checking off, it's pretty pointless.

> But the same issue with roles flipped occurs for the end user and the user
> cannot be expected to setup a custom io_open() callback for his player
> The current code can be also used to poke
> around the local network of the user. Which is unexpected by the user
> for example a avi file could be probed as a m3u8 playlist and then
> poke around on the local net while mixing that with remote urls
> from the timing of the remote accesses the remote party should be able
> to infer what happened with the local poking.

I agree, but it is unrealistic to change anything here. People make playlists 
mixed with local files and network file systems or cloud storage services. Yes, 
there is a slight information leakage. For instance, you can probe if a local 
file exists by interleaving local and remote URLs in a playlist.

In practice, this is a well-known issue and has been for at least two decades, 
and the "solution" is to limit what the open file can do. To state the obvious 
extreme, one wouldn't want to execute a shell script or an executable from a 
playlist.

-- 
Rémi Denis-Courmont
http://www.remlab.net/





[FFmpeg-devel] [PATCH] MAINTAINERS: add vanitous self to maintain RISC-V

2023-05-03 Thread Rémi Denis-Courmont
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 854ccc3fa4..f95be01dc6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -543,6 +543,7 @@ LoongArch   Shiyou Yin
 Mac OS X / PowerPC  Romain Dolbeau, Guillaume Poirier
 Amiga / PowerPC Colin Ward
 Linux / PowerPC Lauri Kasanen
+RISC-V  Rémi Denis-Courmont
 Windows MinGW   Alex Beregszaszi, Ramiro Polla
 Windows Cygwin  Victor Paesa
 Windows MSVCMatthew Oliver, Hendrik Leppkes
-- 
2.40.1



Re: [FFmpeg-devel] [PATCH] [RFC] avformat: Add basic same origin check

2023-05-03 Thread Rémi Denis-Courmont
Nit: different

But is there an actual threat model where it is necessary or even useful for a 
media framework to implement origin policies? Off the top of my head, this can 
be used by content providers to prevent third parties from referencing their 
media files... but that seems user-hostile; it does not provide any security 
for the user of FFmpeg.

I could be wrong, but IMU, origin policy is meant to prevent harmful embedding 
of images and frames, and to prevent cross-site scripting, but FFmpeg doesn't 
support either of these anyway, so it's not concerned.


Re: [FFmpeg-devel] [PATCH 0/5] RISC-V: Improve H264 decoding performance using RVV intrinsic

2023-05-09 Thread Rémi Denis-Courmont
Hi,

On Tuesday, 9 May 2023, 12:50:25 EEST, Arnie Chang wrote:
> We are submitting a set of patches that significantly improve H.264 decoding
> performance by utilizing RVV intrinsic code.

I believe that there is a general dislike of compiler intrinsics for vector 
optimisations in FFmpeg, for a number of reasons. FWIW, that dislike is not 
limited to FFmpeg:
https://www.reddit.com/r/RISCV/comments/131hlgq/comment/ji1ie3l/
Indeed, in my personal opinion, RISC-V V intrinsics specifically are painful to 
read/write compared to assembler.

On top of that, in this particular case, intrinsics have at least three, 
possibly four, additional and more objective challenges as compared to the 
existing RVV assembler:

1) They are less portable, requiring the most bleeding edge version of 
compilers. Case in point: our FATE GCC instance does not support them as of 
today (because Debian Unstable does not).

2) They do not work with run-time CPU detection, at least not currently. This 
is going to be a major stumbling point for Linux distributions which need to 
build code that runs on processors without vector unit.

3) V intrinsics require specifying the group multiplier at every instruction. 
In most cases, this is just very inconvenient. But in those algorithms that 
require a fixed vector size (e.g. Opus DSP already now), this simply does _not_ 
work.

Essentially, this is the downside of relying on the compiler to do the 
register allocation.

4) (Unsure) Intrinsics are notorious for missing some code points.


The first two points may be addressed eventually. But the third point is 
intrinsic to intrinsics (hohoho). So unless there is a case for why intrinsics 
would be all but _required_, please avoid them.

Now I do realise that that means some of the code won't be XLEN-independent. 
Well, we can cross that bridge with macros if/when somebody actually cares 
about FFmpeg vector optimisations on RV32I.

Br,

-- 
雷米‧德尼-库尔蒙
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH] lavc/h264chroma: RISC-V V add motion compensation for 8x8 chroma blocks

2023-05-19 Thread Rémi Denis-Courmont
On Wednesday, 17 May 2023, 10:13:01 EEST, Arnie Chang wrote:
> Optimize the put and avg filtering for 8x8 chroma blocks
> 
> Signed-off-by: Arnie Chang 
> ---
>  libavcodec/h264chroma.c   |   2 +
>  libavcodec/h264chroma.h   |   1 +
>  libavcodec/riscv/Makefile |   3 +
>  libavcodec/riscv/h264_chroma_init_riscv.c |  39 ++
>  libavcodec/riscv/h264_mc_chroma.S | 492 ++
>  libavcodec/riscv/h264_mc_chroma.h |  34 ++
>  6 files changed, 571 insertions(+)
>  create mode 100644 libavcodec/riscv/h264_chroma_init_riscv.c
>  create mode 100644 libavcodec/riscv/h264_mc_chroma.S
>  create mode 100644 libavcodec/riscv/h264_mc_chroma.h
> 
> diff --git a/libavcodec/h264chroma.c b/libavcodec/h264chroma.c
> index 60b86b6fba..1eeab7bc40 100644
> --- a/libavcodec/h264chroma.c
> +++ b/libavcodec/h264chroma.c
> @@ -58,5 +58,7 @@ av_cold void ff_h264chroma_init(H264ChromaContext *c, int bit_depth)
>  ff_h264chroma_init_mips(c, bit_depth);
>  #elif ARCH_LOONGARCH64
>  ff_h264chroma_init_loongarch(c, bit_depth);
> +#elif ARCH_RISCV
> +ff_h264chroma_init_riscv(c, bit_depth);
>  #endif
>  }
> diff --git a/libavcodec/h264chroma.h b/libavcodec/h264chroma.h
> index b8f9c8f4fc..9c81c18a76 100644
> --- a/libavcodec/h264chroma.h
> +++ b/libavcodec/h264chroma.h
> @@ -37,5 +37,6 @@ void ff_h264chroma_init_ppc(H264ChromaContext *c, int bit_depth);
>  void ff_h264chroma_init_x86(H264ChromaContext *c, int bit_depth);
>  void ff_h264chroma_init_mips(H264ChromaContext *c, int bit_depth);
>  void ff_h264chroma_init_loongarch(H264ChromaContext *c, int bit_depth);
> +void ff_h264chroma_init_riscv(H264ChromaContext *c, int bit_depth);
> 
>  #endif /* AVCODEC_H264CHROMA_H */
> diff --git a/libavcodec/riscv/Makefile b/libavcodec/riscv/Makefile
> index 965942f4df..08b76c93cb 100644
> --- a/libavcodec/riscv/Makefile
> +++ b/libavcodec/riscv/Makefile
> @@ -19,3 +19,6 @@ OBJS-$(CONFIG_PIXBLOCKDSP) += riscv/pixblockdsp_init.o \
>  RVV-OBJS-$(CONFIG_PIXBLOCKDSP) += riscv/pixblockdsp_rvv.o
>  OBJS-$(CONFIG_VORBIS_DECODER) += riscv/vorbisdsp_init.o
>  RVV-OBJS-$(CONFIG_VORBIS_DECODER) += riscv/vorbisdsp_rvv.o
> +
> +OBJS-$(CONFIG_H264CHROMA) += riscv/h264_chroma_init_riscv.o
> +RVV-OBJS-$(CONFIG_H264CHROMA) += riscv/h264_mc_chroma.o

Please maintain the existing ordering, which is to say, alphabetical.

> diff --git a/libavcodec/riscv/h264_chroma_init_riscv.c b/libavcodec/riscv/h264_chroma_init_riscv.c
> new file mode 100644
> index 00..b6f98ba693
> --- /dev/null
> +++ b/libavcodec/riscv/h264_chroma_init_riscv.c
> @@ -0,0 +1,39 @@
> +/*
> + * Copyright (c) 2023 SiFive, Inc. All rights reserved.
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301
> USA + */
> +
> +#include 
> +
> +#include "libavutil/attributes.h"
> +#include "libavutil/cpu.h"
> +#include "libavcodec/h264chroma.h"
> +#include "config.h"
> +#include "h264_mc_chroma.h"
> +
> +av_cold void ff_h264chroma_init_riscv(H264ChromaContext *c, int bit_depth)
> +{
> +#if HAVE_RVV
> +const int high_bit_depth = bit_depth > 8;
> +
> +if (!high_bit_depth) {
> +c->put_h264_chroma_pixels_tab[0] = h264_put_chroma_mc8_rvv;
> +c->avg_h264_chroma_pixels_tab[0] = h264_avg_chroma_mc8_rvv;
> +}
> +#endif
> +}
> \ No newline at end of file
> diff --git a/libavcodec/riscv/h264_mc_chroma.S b/libavcodec/riscv/h264_mc_chroma.S
> new file mode 100644
> index 00..a02866f633
> --- /dev/null
> +++ b/libavcodec/riscv/h264_mc_chroma.S
> @@ -0,0 +1,492 @@
> +/*
> + * Copyright (c) 2023 SiFive, Inc. All rights reserved.
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser 

Re: [FFmpeg-devel] [PATCH] lavc/h264chroma: RISC-V V add motion compensation for 8x8 chroma blocks

2023-05-19 Thread Rémi Denis-Courmont
On Wednesday, 17 May 2023, 17:54:22 EEST, Lynne wrote:
> Finally, run:
> make checkasm && ./tests/checkasm/checkasm --bench
> and report on the timings for both the C and assembly versions.
> If you've made a mistake somewhere, (forgot to restore stack, or a
> callee-saved register, or your function produces an incorrect result),
> checkasm will fail.

To be fair, in this particular case, the stack pointer and saved registers are 
never used, so the risk of messing those up is zero.

checkasm would of course verify that the function does what it is supposed to 
do, and personally, I have held back from merging untested functions. But I am 
not sure if it is fair to require adding test cases whilst other architectures 
weren't required to have them.

-- 
雷米‧德尼-库尔蒙
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH v3 7/7] avutil/la: Add function performance testing

2023-05-20 Thread Rémi Denis-Courmont
On Saturday, 20 May 2023, 10:27:19 EEST, Hao Chen wrote:
> From: yuanhecai 
> 
> This patch supports the use of the "checkasm --bench" testing feature
> on loongarch platform.
> 
> Change-Id: I42790388d057c9ade0dfa38a19d9c1fd44ca0bc3
> ---
>  libavutil/loongarch/timer.h | 48 +
>  libavutil/timer.h   |  2 ++
>  2 files changed, 50 insertions(+)
>  create mode 100644 libavutil/loongarch/timer.h
> 
> diff --git a/libavutil/loongarch/timer.h b/libavutil/loongarch/timer.h
> new file mode 100644
> index 00..44ed786409
> --- /dev/null
> +++ b/libavutil/loongarch/timer.h
> @@ -0,0 +1,48 @@
> +/*
> + * Copyright (c) 2023 Loongson Technology Corporation Limited
> + * Contributed by Hecai Yuan 
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#ifndef AVUTIL_LOONGARCH_TIMER_H
> +#define AVUTIL_LOONGARCH_TIMER_H
> +
> +#include 
> +#include "config.h"
> +
> +#if HAVE_INLINE_ASM
> +
> +#define AV_READ_TIME read_time
> +
> +static inline uint64_t read_time(void)
> +{
> +
> +#if ARCH_LOONGARCH64
> +uint64_t a, id = 0;

Initial value is never used.

> +__asm__ volatile ( "rdtime.d  %0, %1" : "=r"(a), "=r"(id) :: "memory" );
> +return a;
> +#else
> +uint32_t a, id = 0;
> +__asm__ volatile ( "rdtimel.w  %0, %1" : "=r"(a), "=r"(id) :: "memory" );
> +return (uint64_t)a;
> +#endif

Why do you clobber memory here?

> +}
> +
> +#endif /* HAVE_INLINE_ASM */
> +
> +#endif /* AVUTIL_LOONGARCH_TIMER_H */
> diff --git a/libavutil/timer.h b/libavutil/timer.h
> index d3db5a27ef..861ba7e9d7 100644
> --- a/libavutil/timer.h
> +++ b/libavutil/timer.h
> @@ -61,6 +61,8 @@
>  #   include "riscv/timer.h"
>  #elif ARCH_X86
>  #   include "x86/timer.h"
> +#elif ARCH_LOONGARCH
> +#   include "loongarch/timer.h"
>  #endif
> 
>  #if !defined(AV_READ_TIME)


-- 
レミ・デニ-クールモン
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH] lavc/h264chroma: RISC-V V add motion compensation for 8x8 chroma blocks

2023-05-20 Thread Rémi Denis-Courmont
Le perjantaina 19. toukokuuta 2023, 21.52.57 EEST Lynne a écrit :
> May 19, 2023, 19:16 by r...@remlab.net:
> > Le keskiviikkona 17. toukokuuta 2023, 17.54.22 EEST Lynne a écrit :
> >> Finally, run:
> >> make checkasm && ./tests/checkasm/checkasm --bench
> >> and report on the timings for both the C and assembly versions.
> >> If you've made a mistake somewhere, (forgot to restore stack, or a
> >> callee-saved register, or your function produces an incorrect result),
> >> checkasm will fail.
> > 
> > To be fair, in this particular case, the stack pointer and saved registers
> > are never used, so the risk of messing those up is zero.
> > 
> > checkasm would of course verify that the function does what it is supposed
> > to do, and personally, I have kept off untested functions. But I am not
> > sure if it is fair to require adding test cases whilst other
> > architectures weren't required to have them.
> 
> Other hardware exists, and even without checkasm, bugs are found
> through fate.

There are exactly two FATE RISC-V instances of which exactly zero support 
vectors to date, so the chance for FATE to find bugs there is nil. I do 
appreciate that checkasm would make revectoring of the patchset easier and 
safer though.

That said, the argument is moot seeing as you seem to have posted a fix.

-- 
Реми Дёни-Курмон
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH] lavc/h264chroma: RISC-V V add motion compensation for 8x8 chroma blocks

2023-05-20 Thread Rémi Denis-Courmont
Le keskiviikkona 17. toukokuuta 2023, 17.54.22 EEST Lynne a écrit :
> Finally, run:
> make checkasm && ./tests/checkasm/checkasm --bench
> and report on the timings for both the C and assembly versions.
> If you've made a mistake somewhere, (forgot to restore stack, or a
> callee-saved register, or your function produces an incorrect result),
> checkasm will fail.

I don't specifically know SiFive's policies. In my experience however, silicon 
design companies will ABSOLUTELY NOT publish benchmark results from unreleased 
products in any stage of development (including FPGA simulation). Your say-so 
is unlikely to change those policies, so I don't think we can require RVV 
benchmarks this year.

-- 
雷米‧德尼-库尔蒙
http://www.remlab.net/





Re: [FFmpeg-devel] Sovereign Tech Fund

2024-02-04 Thread Rémi Denis-Courmont
Hi,

Le 4 février 2024 21:28:44 GMT+02:00, Michael Niedermayer 
 a écrit :
>On Sun, Feb 04, 2024 at 03:38:43PM +0100, Rémi Denis-Courmont wrote:
>> Hi,
>> 
>> Le 4 février 2024 14:41:15 GMT+01:00, Michael Niedermayer 
>>  a écrit :
>> >Hi
>> >
>> >As said on IRC, i thought people knew it, but ‘the same person as before’ 
>> >is Thilo.
>> >
>> >I've updated the price design suggestion for the merge task; it's 16€ / 
>> >commit, limited to 50k€.
>> >This comes from looking at Paul's fork, which has around 500 commits in 2 
>> >months, thus 250 commits per month over 12 months; if we allocate 50k, 
>> >that ends up at roughly 16€ / commit if activity stays equal.
>> 
>> It's very different if we're talking about librempeg or some other 
>> unspecified fork. I could make a fork that removes MMX et al, and claim that 
>> I'm merging a fork.
>
>There are so many reasons why this wouldn't work
>(first you would have to lie, I don't think you would,
> then it would not be left that way on the wiki,
> not be sent to STF that way,
> and not be accepted by STF, and more)
>
>But assuming one could get away with that in the short term,
>why would anyone do something like this and destroy all our opportunities
>to obtain grants in the future?

I don't know. That was purely an example, and I prefer to be the fictional bad 
guy in my examples, so nobody else feels insulted. But you can't blame people 
for being distrustful when a proposal is brought forward on short deadlines 
even though it was privately known for months.


>> Indeed I don't think that a semiformal open-source community with a lot of 
>> strong and varied opinions will carry such dotting of all i's very 
>> effectively. That has been one of the arguments for delegating this to a 
>> contracting IT company rather than to FFmpeg-devel and SPI.
>
>If the FFmpeg team can make decisions about what to fund then we do not need
>any contracting IT company.

Let's face it: FFmpeg is not the healthiest of open-source communities as of 
now.  But that's not even relevant here: OSS communities are typically focused 
on development and maybe support and promotion, *not* HR and payroll, nor 
waterfall-style project management.

Ergo, there would be no shame in conceding that FFmpeg would suck at those 
tasks, and a company whose job those things essentially are would be more 
effective at it. And I'm not saying this out of self-interest, just pragmatism 
(call it cynicism if you will).

>OTOH if the FFmpeg team is not able to make decisions, that's a far bigger
>problem and it needs to be understood and corrected

I don't think the technical development lists of an OSS community should 
concern themselves with funding matters. Well-funded foundations surely need to 
concern themselves with this, but they don't mix it with development. And 
FFmpeg is not so well-funded in the first place.

> Because whoever controls the income of developers
>effectively controls the project.

As long as there are several parties involved, and a single trust doesn't 
dominate the GA and the TC, I don't see that as a major problem. Or rather, 
it's a lesser problem than losing competent developers because they need to 
work on something else to earn their living.

>An employee has to do what she is told by her employer. So if the main
>developers become employees paid to work on FFmpeg, that would hand FFmpeg
>to some CEO on a silver plate,

20€ are not remotely enough for that to happen. You'd need 2-3 orders of 
magnitude larger investment, without competition, to get there at minimum.

So I don't see a risk here. But it's up to Thilo really, if he insists on going 
through SPI or not applying for STF at all.

>This would change FFmpeg from a Free software project to a commercial company.
>I do NOT agree to this, and I believe many others also do not agree.

I think a lot of people would rather get paid to work on FFmpeg, and would in 
fact contribute more effectively if they were. And conversely, quite a few 
contributors seem to be acting for their commercial employer already.

Also, as a consultant or maybe an associate for FFlabs, it's a rather 
contradictory position for you to hold.


Re: [FFmpeg-devel] [PATCH 1/7] lavc/me_cmp: R-V V pix_abs

2024-02-08 Thread Rémi Denis-Courmont
Le keskiviikkona 7. helmikuuta 2024, 2.01.23 EET flow gg a écrit :
> I think in most cases it is like this, but specifically for this function,
> using Reduction only once would be slower.
> 
> The currently submitted version roughly takes:
> pix_abs_0_0_rvv_i32: 136.2
> 
> The version that uses Reduction only once takes:
> pix_abs_0_0_rvv_i32: 169.2

You're only using one vector and half a vector respectively, so the 
logarithmic time of the sum is relatively small.

But are you sure that it wouldn't be faster to process multiple rows and 
larger group multipliers?

> Here is the implementation of the version that uses it only once:
> 
> func ff_pix_abs16_temp_rvv, zve32x
>         vsetivli        zero, 16, e32, m4, ta, ma
>         vmv.v.i         v24, 0
>         vmv.s.x         v0, zero
> 1:
>         vsetvli         zero, zero, e8, m1, tu, ma
>         vle8.v          v4, (a1)
>         vle8.v          v12, (a2)
>         addi            a4, a4, -1
>         vwsubu.vv       v16, v4, v12
>         add             a1, a1, a3
>         vwsubu.vv       v20, v12, v4
>         vsetvli         zero, zero, e16, m2, tu, ma
>         vmax.vv         v16, v16, v20
>         add             a2, a2, a3
>         vwadd.wv        v24, v24, v16
>         bnez            a4, 1b
> 
>         vsetvli         zero, zero, e32, m4, ta, ma
>         vwredsumu.vs    v0, v24, v0
>         vmv.x.s         a0, v0
>         ret
> endfunc
> 
> Rémi Denis-Courmont  于2024年2月7日周三 00:58写道:
> 
> > Hi,
> > 
> > To sum a vector, you should only reduce once at the end of the function,
> > c.f.
> > how it's done in existing scalar products. Reduction instructions are
> > (intrinsically) slow.
> > 
> > --
> > Rémi Denis-Courmont
> > http://www.remlab.net/
> 


-- 
雷米‧德尼-库尔蒙
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH 1/7] lavc/me_cmp: R-V V pix_abs

2024-02-09 Thread Rémi Denis-Courmont


Le 9 février 2024 00:39:38 GMT+02:00, flow gg  a écrit :
>From my understanding, to use larger group multipliers, one needs to
>utilize vlse64 (8x8) vlse128 (16x16).
>
>However, due to the use in tests of
>
>ptr = img2 + y * WIDTH + x;
>d2 = call_ref(NULL, img1, ptr, WIDTH, h);
>d1 = call_new(NULL, img1, ptr, WIDTH, h);
>
>will get:  pix_abs_1_0_rvv_i32 (fatal signal 7: Bus error)
>
>Because it can only load according to e8, it seems there's no way to use
>larger group multipliers.

vlse128.v requires 128-bit elements, which no hardware supports. vlse64.v works 
just fine; we're already using it. There's also the possibility of segmented 
strided loads, or simply multiple unit loads.

In any case, unrolling one way or another should improve performance.




Re: [FFmpeg-devel] [PATCH 1/4] checkasm/rv34dsp: add rv34_inv_transform_dc test

2024-02-12 Thread Rémi Denis-Courmont
Le perjantaina 2. helmikuuta 2024, 2.47.16 EET flow gg a écrit :
> It seems to be caused by movd m0, r1d in libavcodec/x86/rv34dsp.asm? I'm
> not quite sure.

If it affects only MMX and neither SSE nor AVX, add a patch to remove the 
offending code altogether.

It's ridiculous to hold checkasm tests off because of broken legacy code.

-- 
Rémi Denis-Courmont
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH 2/2] lavc/blockdsp: R-V V clear_blocks

2024-02-12 Thread Rémi Denis-Courmont
Le perjantaina 2. helmikuuta 2024, 3.14.39 EET flow gg a écrit :
> Ok, updated it in the reply

Sorry I meant directive, not macro. .rept is just fine here.

-- 
レミ・デニ-クールモン
http://www.remlab.net/


Re: [FFmpeg-devel] [PATCH 1/3] lavc/vp8dsp: R-V V vp8_idct_dc_add

2024-02-12 Thread Rémi Denis-Courmont
Hi,

I think you can use vwadd here?

-- 
Rémi Denis-Courmont
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH 4/4] lavc/rv34dsp: R-V V rv34_idct_dc_add

2024-02-12 Thread Rémi Denis-Courmont
Le keskiviikkona 31. tammikuuta 2024, 19.58.55 EET flow gg a écrit :
> Fixed the rv32 break in this reply

It looks like widening add would avoid the sign extension.

Although you'd need as many instructions, since V lacks signed to unsigned 
clipping.
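
For illustration only, here is a scalar C model of one way to read the above; it is not taken from the patch. The function name is made up, and the split into "clamp negatives first, then saturate as unsigned" is an assumption about how the lack of a signed-to-unsigned clip would be worked around.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of adding a DC term to one 8-bit pixel.
 * The pixel is zero-extended as part of the add itself, so no separate
 * extension step is needed before the addition. */
static uint8_t add_dc_clip_u8(uint8_t pixel, int16_t dc)
{
    /* Keep the 16-bit sum representable (assumed range of the DC term). */
    assert(dc <= INT16_MAX - 255);

    int16_t sum = (int16_t)(dc + pixel);   /* widening add */

    if (sum < 0)                           /* clamp negatives first... */
        sum = 0;
    return sum > 255 ? 255 : (uint8_t)sum; /* ...then saturate as unsigned */
}
```

The two steps after the add are what "as many instructions" refers to in this reading: the extension goes away, but the clip still takes two operations.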

-- 
Rémi Denis-Courmont
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH 2/3] lavc/vp8dsp: R-V V vp8_idct_dc_add4y

2024-02-12 Thread Rémi Denis-Courmont
Hi,

To avoid repeating the code, you can either use .rept or .irp. You can even 
use assembler conditionals to elide the redundant code on the last iteration.

-- 
レミ・デニ-クールモン
http://www.remlab.net/


Re: [FFmpeg-devel] [PATCH 1/7] lavc/me_cmp: R-V V pix_abs

2024-02-10 Thread Rémi Denis-Courmont
Le perjantaina 9. helmikuuta 2024, 17.34.40 EET flow gg a écrit :
> The issue here is that any load greater than e8 will fail the test(Bus
> error), so it cannot use vlse64 or similar methods...

AFAICT, data is aligned on 16 bytes here, so using larger element sizes should 
not be a problem. That being the case, you can load pretty much any power-of-
two byte quantity per row up to 512 bits, as 8 segments of 64-bit elements. 
That is more than enough to deal with 16-byte rows.

Of course, that results in a tiled data layout, so it only works if individual 
elements are all treated equally with no cross-row calculations. This might 
require trickery or not work at all for those functions that subtract adjacent 
values. But your patchset seems to leave those out anyway.

-- 
Rémi Denis-Courmont
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH 1/7] lavc/me_cmp: R-V V pix_abs

2024-02-10 Thread Rémi Denis-Courmont
Le lauantaina 10. helmikuuta 2024, 11.14.11 EET Rémi Denis-Courmont a écrit :
> But your patchset seems to leave those out anyway.

Never mind that bit; I missed the other mails.


-- 
レミ・デニ-クールモン
http://www.remlab.net/


Re: [FFmpeg-devel] [PATCH 2/4] lavc/rv34dsp: R-V V rv34_inv_transform_dc

2024-02-10 Thread Rémi Denis-Courmont
Happy new year,

The gains are -unsurprisingly- modest here. Did you try to reorder 
instructions to improve scheduling?

-- 
Rémi Denis-Courmont
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH v2] avutil/mem: limit alignment to maximum simd align

2024-02-11 Thread Rémi Denis-Courmont
Le perjantaina 9. helmikuuta 2024, 21.22.17 EET Timo Rothenpieler a écrit :
> On 13.01.2024 16:46, Timo Rothenpieler wrote:
> > FFmpeg has instances of DECLARE_ALIGNED(32, ...) in a lot of structs,
> > which then end up heap-allocated.
> > By declaring any variable in a struct, or tree of structs, to be 32 byte
> > aligned, it allows the compiler to safely assume the entire struct
> > itself is also 32 byte aligned.
> > 
> > This might make the compiler emit code which straight up crashes or
> > misbehaves in other ways, and in at least one instance is now
> > documented to actually do so (see ticket 10549 on trac).
> > The issue there is that an unrelated variable in SingleChannelElement is
> > declared to have an alignment of 32 bytes. So if the compiler does a copy
> > in decode_cpe() with avx instructions, but ffmpeg is built with
> > --disable-avx, this results in a crash, since the memory is only 16 byte
> > aligned.
> > 
> > Mind you, even if the compiler does not emit avx instructions, the code
> > is still invalid and could misbehave. It just happens not to. Declaring
> > any variable in a struct with a 32 byte alignment promises 32 byte
> > alignment of the whole struct to the compiler.
> > 
> > This patch limits the maximum alignment to the maximum possible simd
> > alignment according to configure.
> > While not perfect, it at the very least gets rid of a lot of UB, by
> > matching up the maximum DECLARE_ALIGNED value with the alignment of heap
> > allocations done by lavu.
> > ---
> > 
> >   libavutil/mem.c  |  8 +++-
> >   libavutil/mem_internal.h | 14 --
> >   2 files changed, 15 insertions(+), 7 deletions(-)
> > 
> > diff --git a/libavutil/mem.c b/libavutil/mem.c
> > index 36b8940a0c..b5bcaab164 100644
> > --- a/libavutil/mem.c
> > +++ b/libavutil/mem.c
> > @@ -62,7 +62,13 @@ void  free(void *ptr);
> > 
> >   #endif /* MALLOC_PREFIX */
> > 
> > -#define ALIGN (HAVE_AVX512 ? 64 : (HAVE_AVX ? 32 : 16))
> > +#if defined(_MSC_VER)
> > +/* MSVC does not support conditionally limiting alignment.
> > +   Set minimum value here to maximum used throughout the codebase. */
> > +#define ALIGN (HAVE_SIMD_ALIGN_64 ? 64 : 32)

Not that I care whatsoever, but are we assuming that MSVC supports only x86? 
Otherwise, this conditional definition does not make much sense and seems very 
sketchy. In fact, I don't see the point in making this distinction at all 
(*unlike* below).
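
For readers following the quoted rationale, a minimal C11 sketch of the underlying issue (illustrative only; the struct and function names are hypothetical and not FFmpeg code). It assumes a C11 library that provides aligned_alloc():

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a struct with one over-aligned member:
 * that single member raises the alignment of the whole struct. */
struct channel_elem {
    int index;
    alignas(32) float coeffs[8];
};

/* Allocate with the alignment that the struct type promises to the
 * compiler. A plain malloc() only guarantees alignof(max_align_t). */
static struct channel_elem *channel_elem_alloc(void)
{
    /* sizeof is always a multiple of alignof, as aligned_alloc() expects. */
    struct channel_elem *e = aligned_alloc(alignof(struct channel_elem),
                                           sizeof(struct channel_elem));

    assert(!e || (uintptr_t)e % alignof(struct channel_elem) == 0);
    return e;
}
```

If the allocator hands out less alignment than alignof(struct channel_elem), any access through a pointer to the struct is already undefined, whether or not the compiler happens to emit wide vector moves for it.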

-- 
レミ・デニ-クールモン
http://www.remlab.net/





Re: [FFmpeg-devel] [TC] Decision on FF_INTERNAL_FIELDS in libavfilter

2024-02-13 Thread Rémi Denis-Courmont
Hi,



Le 13 février 2024 18:22:46 GMT+02:00, Nicolas George  a écrit 
:
>Martin Storsjo (12024-02-13):
>> The main arguments raised were about API consistency, prevention of
>> accidental inclusions, as well as explicitness in marking a field as
>> public or private.
>
>Too bad the committee neglected to ask for the arguments of the people
>who opposed this. Like having a trial and not listening to the defense.

Speaking as an elected member of another OSS project's TC, I believe that the 
experienced adult developers on the FFmpeg TC are perfectly capable of reading 
mailing list archives and Trac comments as necessary. In fact, I think it is 
preferable that they stick to arguments made in public, rather than go out of 
their way to dig up private information.

Or to follow your metaphor, in a trial, the arguments are made before the jury 
in public sessions. The deliberations of the jury are kept private to shield 
members from external pressure.

I don't expect that antagonising the TC members will do you, and any potential 
future arguments of yours, much good.


Re: [FFmpeg-devel] [RFC] clarifying the TC conflict of interest rule

2024-02-20 Thread Rémi Denis-Courmont


Le 20 février 2024 16:01:11 GMT+02:00, Michael Niedermayer 
 a écrit :
>On Tue, Feb 20, 2024 at 09:22:57AM +0100, Anton Khirnov wrote:
>[...]
>> their preferred wording, and then we can have the GA vote on it.
>
>Before this GA vote, we need another extra member discussion/vote.
>Because the last GA reset dropped several developers from the GA

And so what?

Changing the voting body specifically before a decision, and outside the normal 
flow, sounds like a thinly veiled attempt at altering the ballot results, TBH. 
It's also unfair to people who contributed code in the past two months and will 
have to wait until the next normal renewal.

Maybe you mean well, but we can't read your mind, and it *looks* like a bad 
idea.

-1


Re: [FFmpeg-devel] [RFC] clarifying the TC conflict of interest rule

2024-02-20 Thread Rémi Denis-Courmont
Le tiistaina 20. helmikuuta 2024, 10.22.57 EET Anton Khirnov a écrit :
> Hi,
> in the 'avcodec/s302m: enable non-PCM decoding' thread it became
> apparent that there is wide disagreement about the interpretation of
> 
> this line in the TC rules:
> > If the disagreement involves a member of the TC, that member should
> > recuse themselves from the decision.
> 
> The word 'involves' in it can be interpreted a variety of very different
> ways, to apply to TC members who e.g.:
> 1) authored the changes that are being objected to
> 2) are objecting to the changes
> 3) have any opinion on the changes, either positive or negative
> 4) have previously voiced an opinion that would apply to the changes
> 5) authored the code that is being modified
> 6) have a financial or other similar interest in a specific outcome of
>the disagreement
> 
> I believe the best way to address this is to make the rule more
> explicit,

The sentence in question can hardly be called a "rule". It is a 
recommendation. Maybe the author did not mean it that way, but what matters is 
the text that people agreed upon, not a post-facto originalist interpretation.

> so I propose that people with an opinion on the matter submit
> their preferred wording, and then we can have the GA vote on it.

It is completely normal, and even expected, of TC members to have opinions. 
The TC is a, well, Technical committee, not a courtroom. The TC makes 
technical assessments; it does not determine guilt or hand down sentences.

Of course, in principle we want to avoid biases of a non-technical nature, 
including but not limited to financial or material conflicts of interest. But I 
fail to see how such a constraint can be enforced in practice, and it is not 
even really a clear-cut and objective constraint either.

Furthermore, I don't think that a vote could *practically* be deemed invalid 
after the fact. I mean, One Does Not Simply revert the code that was merged as 
a consequence of a TC decision.

I however think that technical biases are totally acceptable, and even 
expected. After all, I certainly expect TC members to more or less agree with 
the subjective technical leanings of FFmpeg, as well as its "open-source 
political" leanings so to say. For example, FFmpeg favours C over C++, and 
outline SIMD assembler over intrinsics, and of course LGPLv2.1 over other 
licences.

All in all, I more or less agree with option 6, but that's assuming that the 
text retains the "should" modal. I don't think we can make a hard "must" rule 
here. In the end, if we are worried about conflicts of interest, the most 
effective way around them is to keep diverse membership in the TC to 
counterbalance conflicted members.

-- 
Rémi Denis-Courmont
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH 5/7] lavc/me_cmp: R-V V vsse vsad

2024-02-21 Thread Rémi Denis-Courmont
Le tiistaina 6. helmikuuta 2024, 17.56.32 EET flow gg a écrit :
> 

Did you try to compute integral absolute values with the ad-hoc (floating 
point) instruction instead of vneg/vmax? It should work since the sign is in 
the same place, though I don't know if it will be faster.

-- 
レミ・デニ-クールモン
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH] lavc/vp8dsp: R-V V put_vp8_pixels

2024-02-21 Thread Rémi Denis-Courmont
Hello,

Le maanantaina 19. helmikuuta 2024, 13.13.43 EET flow gg a écrit :
> The reason for using m1+le8 instead of stride load + larger group
> multipliers is the same as in "[FFmpeg-devel] [PATCH 1/7] lavc/me_cmp: R-V
> V pix_abs."
> 
> In the test, there is
> 
> #define src (buf + 2 * SRC_BUF_STRIDE + 2 + 1)
> 
> Therefore, not using e8 will result : (fatal signal 7: Bus error).

Yes, you could also just say that alignment is insufficient :)

It is still possible to load rectangles of up to 8 columns using vlseg8e8, but 
it might be slower than just repeating the 8 regular loads, and it won't work 
if you need calculations between rows.

I may be missing something but I don't understand what purpose the header file 
serves here?

-- 
Rémi Denis-Courmont
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH 1/2] avcodec/s302m: enable non-PCM decoding

2024-02-17 Thread Rémi Denis-Courmont
Le lauantaina 17. helmikuuta 2024, 13.46.27 EET Gyan Doshi a écrit :
> As a TC member who is part of the disagreement, I believe your
> participation is recused.

Obviously not. We don't want to get into a situation where TC members have an 
incentive not to participate in regular code reviews just so that they can 
participate in the hypothetical making of later related TC decisions. That 
would be both dumb and counter-productive.

Somebody disagreeing with you does not necessarily mean that they have a 
conflict of interest in the subject matter.

-- 
雷米‧德尼-库尔蒙
http://www.remlab.net/





Re: [FFmpeg-devel] Subject: [PATCH 3/3] lavc/dnxhdenc: R-V V get_pixels_8x4_sym

2024-02-19 Thread Rémi Denis-Courmont
Le sunnuntaina 18. helmikuuta 2024, 14.27.56 EET flow gg a écrit :
> ping

Patch does not apply here.

-- 
Rémi Denis-Courmont
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH 2/4] lavc/rv34dsp: R-V V rv34_inv_transform_dc

2024-02-06 Thread Rémi Denis-Courmont
Hi,

I'm not sure why you're mixing element sizes this way, but the code should not 
even compile due to mismatched extensions.

-- 
Rémi Denis-Courmont
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH 1/7] lavc/me_cmp: R-V V pix_abs

2024-02-06 Thread Rémi Denis-Courmont
Hi,

To sum a vector, you should only reduce once at the end of the function, c.f. 
how it's done in existing scalar products. Reduction instructions are 
(intrinsically) slow.
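
To make the suggestion concrete, here is a scalar C model of the two shapes (a sketch only, not the RVV code; function names are made up). The acc[] array stands in for a vector accumulator: rows only perform lane-wise additions, and the horizontal sum is done once after the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define LANES 16

/* Hypothetical model: accumulate per lane, reduce once at the end. */
static unsigned sad16_reduce_once(const uint8_t *a, const uint8_t *b,
                                  ptrdiff_t stride, int h)
{
    uint32_t acc[LANES] = { 0 };
    unsigned sum = 0;

    assert(h >= 0);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < LANES; x++)
            acc[x] += (uint32_t)abs(a[x] - b[x]); /* lane-wise only */
        a += stride;
        b += stride;
    }
    for (int x = 0; x < LANES; x++)               /* the single reduction */
        sum += acc[x];
    return sum;
}

/* Hypothetical model: one horizontal sum per row. */
static unsigned sad16_reduce_per_row(const uint8_t *a, const uint8_t *b,
                                     ptrdiff_t stride, int h)
{
    unsigned sum = 0;

    assert(h >= 0);
    for (int y = 0; y < h; y++) {
        unsigned row = 0;

        for (int x = 0; x < LANES; x++)           /* reduction inside the loop */
            row += (unsigned)abs(a[x] - b[x]);
        sum += row;
        a += stride;
        b += stride;
    }
    return sum;
}
```

Both return the same value; the only difference is where the horizontal sum sits. The model says nothing about speed on real hardware, which has to be measured with checkasm --bench.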

-- 
Rémi Denis-Courmont
http://www.remlab.net/





Re: [FFmpeg-devel] STF SoWs

2024-02-07 Thread Rémi Denis-Courmont


Le 7 février 2024 14:16:51 GMT+02:00, Nicolas George  a écrit :
>Paul B Mahol (12024-02-06):
>> If this is again about SDR, go ahead,  merge it. I no longer care.
>
>You should care. But you should care by being FOR it, not AGAINST.
>
>The people who oppose SDR are the same libav people who are disgusting
>you and me and others away from the project with their authoritarian
>attitude, their behaving like on conquered ground, their disregard for
>features that only serve a handful of users but are crucial for them
>because they cannot be found in any other software than FFmpeg.
>
>These people are killing the project, we should oppose them whenever we
>have the courage.

*Yawn*. Sure, go ahead and make your own Democratic People's fork of FFmpeg 
without the corporate capitalist oppressors.

You can even make a downstream of librempeg but with SDR. Win-win-win.


Re: [FFmpeg-devel] [PATCH 2/4] lavc/rv34dsp: R-V V rv34_inv_transform_dc

2024-02-09 Thread Rémi Denis-Courmont
On Wednesday, 7 February 2024, 2:12:22 EET, flow gg wrote:
> My carelessness.. fixed it in the reply.

I know I said to avoid scalar multiplications, but this may be taking it a 
little too far. Either this works:
   slli t1, t0, 9
   sh2add t0, t0, t0
   sub t0, t1, t0
or just:
   li t1, 13 * 13 * 3
   mul t0, t0, t1
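
For what it's worth, the first sequence relies on 13 * 13 * 3 = 507 = 512 - 5. A small C model of both variants, with made-up function names and 32-bit wrap-around arithmetic assumed for simplicity, shows they agree:

```c
#include <stdint.h>

/* Mirrors the three-instruction sequence: 13 * 13 * 3 = 507 = 512 - 5. */
uint32_t mul507_shift_add(uint32_t x)
{
    uint32_t t1 = x << 9;        /* slli   t1, t0, 9  -> 512 * x */
    uint32_t t0 = (x << 2) + x;  /* sh2add t0, t0, t0 -> 5 * x   */
    return t1 - t0;              /* sub    t0, t1, t0 -> 507 * x */
}

/* Mirrors the li + mul alternative. */
uint32_t mul507_plain(uint32_t x)
{
    return x * (13u * 13u * 3u);
}
```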

Also the second vsetvl seems pointless, unless you specifically meant that the 
pointer was aligned to 32 bits?

-- 
Rémi Denis-Courmont
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH 2/2] Require compilers to support C17.

2024-02-07 Thread Rémi Denis-Courmont


On 7 February 2024 23:19:41 GMT+02:00, James Almer wrote:
>On 2/7/2024 6:10 PM, Cosmin Stejerean via ffmpeg-devel wrote:
>> 
>> 
>>> On Feb 7, 2024, at 11:27 AM, Lynne  wrote:
>>> 
> 
> As a compromise, we could start requiring C11 now, and C17 in 7.1.
> Or does anyone still care about compilers without even c11 support?
> 
 
 How about C11 now and C17 in a year with ffmpeg 8?
 
>>> 
>>> Do you have setups and reasons why you can't update them
>>> that don't support C17 or are you speculating?
>> 
>> I don't have any personal reasons why I can't support C17 immediately, but 
>> C11 now / C17 in a year seems like an approach more likely to find consensus 
>> than C17 immediately (or bumping to C17 in a minor release). It was also 
>> roughly the approach proposed in person at FOSDEM.
>
>What are the fixes in c17 that we would benefit from, that compilers from 
>before 2017 would be affected by?

Besides editorial corrections with no practical impact, C17 allows initialising 
atomics directly, without ATOMIC_VAR_INIT. This shouldn't be a problem for any 
real C11 compiler, but I haven't checked.

Then it also allows atomic load from const-qualified pointers. I don't know if 
this is relevant to FFmpeg.
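
As a rough sketch of those two points (the variable and function names are invented for the example and have nothing to do with FFmpeg code):

```c
#include <stdatomic.h>

/* C17 permits plain initialisation; with C11 one would formally have
 * written ATOMIC_VAR_INIT(42) on the right-hand side. */
static atomic_int counter = 42;

/* C17 lets atomic_load() take a pointer to a const-qualified atomic. */
int read_counter(const atomic_int *p)
{
    return atomic_load(p);
}

int current_counter(void)
{
    return read_counter(&counter);
}
```

I would expect current compilers to accept both forms even in C11 mode, but as said, I have not checked.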

There may be other small differences that I don't remember or know of.


Re: [FFmpeg-devel] [PATCH] avutil/mem: use C11 aligned_malloc()

2024-02-18 Thread Rémi Denis-Courmont
On Sunday, 18 February 2024, 18:27:35 EET, Andreas Rheinhardt wrote:
> 1. The function is called aligned_alloc (how did you test this?).
> 2. C11: "The value of alignment shall be a valid alignment supported by
> the implementation and the value of size shall be an integral multiple
> of alignment."
> a) To use this, you would have to round size upwards; but this will make
> sanitiziers more lenient.
> b) If ALIGN is just not supported by the implementation, then everything
> is UB in C11.

The letter of the specification is that all alignments of types defined in the 
specification must be supported and others "may" be supported. The intent is 
clearly that all relevant alignments on the target platform are supported.

FFmpeg assumes that alignments of 16, 32 and 64 are supported already anyhow, so 
this would not be introducing any *new* UB. In this respect, FFmpeg is doing 
UB on practically all platforms other than x86, which seems to be the only 
platform to need alignment of 32 and 64 bytes for anything.

IMO, FFmpeg should not use custom alignments beyond `alignof(max_align_t)` 
unless they are specifically needed on the given platform, but that's a 
potentially tedious clean-up task with zero practical gains.

> 3. What's the advantage of this patch anyway?

In theory, `aligned_alloc()` (not `aligned_malloc()`) supports alignment of 1 
and any legal value below `sizeof(void*)`, *unlike* `posix_memalign()`. But 
since you can just as well use `malloc()` for that purpose, that is not a real 
advantage.
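
To illustrate point 2a above, a wrapper honouring the C11 "size is a multiple of alignment" rule could look roughly like the following. This is a hypothetical helper, not FFmpeg code, and it assumes `align` is a power of two that the implementation supports:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not FFmpeg code. 'align' is assumed to be a power
 * of two that the implementation supports. */
void *alloc_aligned_rounded(size_t align, size_t size)
{
    if (size > SIZE_MAX - (align - 1))
        return NULL;                      /* rounding would overflow */

    /* Round size up to the next multiple of align, as C11 requires. */
    size_t padded = (size + align - 1) & ~(align - 1);

    if (padded == 0)
        padded = align;                   /* avoid a zero-sized request */

    return aligned_alloc(align, padded);  /* release with free() */
}
```

The padding is exactly what makes sanitizers more lenient: an access between `size` and `padded` lands inside the allocation and would not be reported.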

-- 
レミ・デニ-クールモン
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH] avutil/mem: use C11 aligned_malloc()

2024-02-18 Thread Rémi Denis-Courmont
On Sunday, 18 February 2024, 18:16:36 EET, James Almer wrote:
> Save for the Microsoft C Runtime library, where free() can't handle aligned
> buffers, aligned_malloc() should be available and working on all supported
> targets.
> Also, malloc() alone may be sufficient if alignment requirement is low, so
> add a check for it.
> 
> Signed-off-by: James Almer 
> ---
>  configure   |  2 --
>  libavutil/mem.c | 42 ++
>  2 files changed, 6 insertions(+), 38 deletions(-)
> 
> diff --git a/configure b/configure
> index 7c45ac25c8..8fd2895ac2 100755
> --- a/configure
> +++ b/configure
> @@ -6450,8 +6450,6 @@ if test -n "$custom_allocator"; then
>  fi
> 
>  check_func_headers malloc.h _aligned_malloc && enable aligned_malloc
> -check_func  ${malloc_prefix}memalign&& enable memalign
> -check_func  ${malloc_prefix}posix_memalign  && enable posix_memalign
> 
>  check_func  access
>  check_func_headers stdlib.h arc4random_buf
> diff --git a/libavutil/mem.c b/libavutil/mem.c
> index 36b8940a0c..a72981d1ab 100644
> --- a/libavutil/mem.c
> +++ b/libavutil/mem.c
> @@ -100,44 +100,14 @@ void *av_malloc(size_t size)
>  if (size > atomic_load_explicit(&max_alloc_size, memory_order_relaxed))
> return NULL;
> 
> -#if HAVE_POSIX_MEMALIGN
> -if (size) //OS X on SDK 10.6 has a broken posix_memalign implementation
> -if (posix_memalign(&ptr, ALIGN, size))
> -ptr = NULL;
> -#elif HAVE_ALIGNED_MALLOC
> +#if HAVE_ALIGNED_MALLOC
>  ptr = _aligned_malloc(size, ALIGN);
> -#elif HAVE_MEMALIGN
> -#ifndef __DJGPP__
> -ptr = memalign(ALIGN, size);
> -#else
> -ptr = memalign(size, ALIGN);
> -#endif
> -/* Why 64?
> - * Indeed, we should align it:
> - *   on  4 for 386
> - *   on 16 for 486
> - *   on 32 for 586, PPro - K6-III
> - *   on 64 for K7 (maybe for P3 too).
> - * Because L1 and L2 caches are aligned on those values.
> - * But I don't want to code such logic here!
> - */
> -/* Why 32?
> - * For AVX ASM. SSE / NEON needs only 16.
> - * Why not larger? Because I did not see a difference in benchmarks ...
> - */
> -/* benchmarks with P3
> - * memalign(64) + 1  3071, 3051, 3032
> - * memalign(64) + 2  3051, 3032, 3041
> - * memalign(64) + 4  2911, 2896, 2915
> - * memalign(64) + 8  2545, 2554, 2550
> - * memalign(64) + 16 2543, 2572, 2563
> - * memalign(64) + 32 2546, 2545, 2571
> - * memalign(64) + 64 2570, 2533, 2558
> - *
> - * BTW, malloc seems to do 8-byte alignment by default here.
> - */
>  #else
> -ptr = malloc(size);
> +// malloc may already allocate sufficiently aligned buffers
> +if (ALIGN > _Alignof(max_align_t))

If you ever try to reintroduce something like this, you would need 
 here, and thus you should use alignof rather than _Alignof (which 
was already deprecated by C23).

> +ptr = aligned_malloc(size, ALIGN);
> +else
> +ptr = malloc(size);
>  #endif
>  if(!ptr && !size) {
>  size = 1;


-- 
Rémi Denis-Courmont
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH] avutil/mem: use C11 aligned_malloc()

2024-02-18 Thread Rémi Denis-Courmont
On Sunday, 18 February 2024, 18:29:32 EET, James Almer wrote:
> Removing all the different target specific allocation functions in favor
> of the standard one. But your second point makes it moot, so patch
> withdrawn.

If you want to get code closer to standards for dealing with alignment, I 
would argue that using alignas() instead of nonstandard constructs comes first.
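
A minimal sketch of what I mean, with an invented buffer name (the compiler-specific spellings mentioned in the comment are only examples of the kind of nonstandard construct this would replace):

```c
#include <stdalign.h>
#include <stdint.h>

/* Standard C11 spelling; the compiler-specific equivalent would be
 * something like __attribute__((aligned(32))) or __declspec(align(32)). */
static alignas(32) int32_t sample_block[64];

int32_t *get_sample_block(void)
{
    return sample_block;
}
```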

-- 
雷米‧德尼-库尔蒙
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH 1/2] avcodec/s302m: enable non-PCM decoding

2024-02-18 Thread Rémi Denis-Courmont
On Sunday, 18 February 2024, 2:43:14 EET, Michael Niedermayer wrote:
> > > You clearly are one of the parties to the disagreement, and "recuse
> > > themselves from the decision" is self-explanatory.
> > 
> > Such a maximalist interpretation makes no sense - why should my opinion
> > become invalid because I commented on a patch,
> 
> "If the disagreement involves a member of the TC"
> does IMHO not preclude commenting on a patch.
> 
> For a disagreement we need 2 parties.
> For example one party who wants a patch in and one who blocks the patch. or
> 2 parties where both block the other.

This is an utterly absurd interpretation. By that logic, a TC member would 
automatically become party to the disagreement by expressing an opinion within 
even the TC itself. In fact, if you were to read it maximally that way, anyone who 
has an opinion, even if they have not expressed it, would be a party.

So what then, the FFmpeg thought police?

You can argue that the rule is vague, and it is. But if anything, we can at 
least eliminate absurd interpretations. (And in any case, it says "should", 
not "must".)

-- 
レミ・デニ-クールモン
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH 1/2] avcodec/s302m: enable non-PCM decoding

2024-02-18 Thread Rémi Denis-Courmont
On Sunday, 18 February 2024, 20:40:14 EET, Nicolas George wrote:
> The word is “involves”, its meaning is inherently maximalist.

The wording is very clear (emphasis added): "If the disagreement involves a 
member of the TC, that member SHOULD recuse themselves from the decision."

I trust that you do know the meaning of the auxiliary "should". That very 
definitely and very obviously eliminates any "maximalist" interpretations.

> Exactly: you were part of this disagreement, you should recuse yourself.

The maximalist interpretation is clearly nonsensical as per reductio ad 
absurdum. This is a *guideline* or *recommendation*, and obviously it _cannot_ 
be applied to its logical extreme.

Instead Anton, and later each other TC member, will have to determine for 
themselves if they are so implicated in the disagreement as to justify 
recusing themselves.

Gyan does not get to dictate it, and neither do you or I. There is no point 
arguing this further, and I won't.

-- 
レミ・デニ-クールモン
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH v2] gdigrab: Allow capturing a window by its handle

2023-12-14 Thread Rémi Denis-Courmont


On 13 December 2023 12:03:55 GMT+02:00, Nicolas George wrote:
>Rémi Denis-Courmont (12023-12-12):
>> ...and test for overflow errors in errno (which shall have been
>> zeroed beforehand). AFAIK, you need to do both if you want strict
>> error detection.
>
>Or we can consider that 30064771114 is just another valid way of writing
>42 = 042 = 0x2a. It would be better to check, but it is less critical
>than checking for garbage at the end, which itself is less critical than
>checking that the number is entirely absent.

That's completely arbitrary, TBH. Both cases are syntax errors, and there are 
no particular reasons to tolerate one but not the other. And even if it 
constituted a sensible distinction, that's simply not how strtoul() handles 
overflow: it returns ULONG_MAX, not a wrapped-around value.

In this case, both error cases are strong signs of a typing error or corruption.
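
For reference, a strict parser covering all three checks (missing number, trailing garbage, overflow) could be sketched as follows. The helper name and the choice of base 0 are assumptions made for the example, not what the patch does:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: returns 0 on success, -1 on any syntax or range error. */
int parse_ulong_strict(const char *s, unsigned long *out)
{
    char *end;

    errno = 0;                       /* must be cleared before the call */
    unsigned long v = strtoul(s, &end, 0);

    if (end == s)                    /* no digits at all */
        return -1;
    if (*end != '\0')                /* trailing garbage */
        return -1;
    if (errno == ERANGE)             /* overflow: v is ULONG_MAX, not wrapped */
        return -1;

    *out = v;
    return 0;
}
```

Note that strtoul() itself still accepts leading white space and a sign, so a truly strict parser would need to reject those separately.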


Re: [FFmpeg-devel] [PATCH 4/6] lavc/takdsp: R-V V decorrelate_ls

2023-12-18 Thread Rémi Denis-Courmont
On Monday, 18 December 2023, 17:26:58 EET, flow gg wrote:
> A 'shnadd' should be moved to the front, updated in this reply.

Indeed, but please try to interleave scalar and vector instructions. The C908 
IP does not really care, but apparently, in-order vector processors are going 
to be happening next year.

-- 
Rémi Denis-Courmont
http://www.remlab.net/





[FFmpeg-devel] [PATCH] mlp: move pack_output pointer to decoder context

2023-12-18 Thread Rémi Denis-Courmont
The current pack_output function pointer is a property of the decoder,
rather than a constant method provided by the DSP code. Indeed, except
for an unused initialisation, the field is never used in DSP code.
---
 libavcodec/mlpdec.c | 48 ++---
 libavcodec/mlpdsp.c |  1 -
 libavcodec/mlpdsp.h |  8 
 3 files changed, 28 insertions(+), 29 deletions(-)

diff --git a/libavcodec/mlpdec.c b/libavcodec/mlpdec.c
index 18e0f47864..ead5ecee76 100644
--- a/libavcodec/mlpdec.c
+++ b/libavcodec/mlpdec.c
@@ -173,6 +173,14 @@ typedef struct MLPDecodeContext {
 DECLARE_ALIGNED(32, int32_t, sample_buffer)[MAX_BLOCKSIZE][MAX_CHANNELS];
 
 MLPDSPContext dsp;
+int32_t (*pack_output)(int32_t lossless_check_data,
+   uint16_t blockpos,
+   int32_t (*sample_buffer)[MAX_CHANNELS],
+   void *data,
+   uint8_t *ch_assign,
+   int8_t *output_shift,
+   uint8_t max_matrix_channel,
+   int is32);
 } MLPDecodeContext;
 
 static const enum AVChannel thd_channel_order[] = {
@@ -422,10 +430,10 @@ static int read_major_sync(MLPDecodeContext *m, 
GetBitContext *gb)
 m->avctx->sample_fmt = AV_SAMPLE_FMT_S32;
 else
 m->avctx->sample_fmt = AV_SAMPLE_FMT_S16;
-m->dsp.mlp_pack_output = 
m->dsp.mlp_select_pack_output(m->substream[m->max_decoded_substream].ch_assign,
-   
m->substream[m->max_decoded_substream].output_shift,
-   
m->substream[m->max_decoded_substream].max_matrix_channel,
-   
m->avctx->sample_fmt == AV_SAMPLE_FMT_S32);
+m->pack_output = 
m->dsp.mlp_select_pack_output(m->substream[m->max_decoded_substream].ch_assign,
+   
m->substream[m->max_decoded_substream].output_shift,
+   
m->substream[m->max_decoded_substream].max_matrix_channel,
+   m->avctx->sample_fmt == 
AV_SAMPLE_FMT_S32);
 
 m->params_valid = 1;
 for (substr = 0; substr < MAX_SUBSTREAMS; substr++)
@@ -663,10 +671,10 @@ static int read_restart_header(MLPDecodeContext *m, 
GetBitContext *gbp,
 if (substr == m->max_decoded_substream) {
 av_channel_layout_uninit(&m->avctx->ch_layout);
 av_channel_layout_from_mask(&m->avctx->ch_layout, s->mask);
-m->dsp.mlp_pack_output = m->dsp.mlp_select_pack_output(s->ch_assign,
-   s->output_shift,
-   
s->max_matrix_channel,
-   
m->avctx->sample_fmt == AV_SAMPLE_FMT_S32);
+m->pack_output = m->dsp.mlp_select_pack_output(s->ch_assign,
+   s->output_shift,
+   s->max_matrix_channel,
+   m->avctx->sample_fmt == 
AV_SAMPLE_FMT_S32);
 
 if (m->avctx->codec_id == AV_CODEC_ID_MLP && m->needs_reordering) {
 if (s->mask == (AV_CH_LAYOUT_QUAD|AV_CH_LOW_FREQUENCY) ||
@@ -925,10 +933,10 @@ static int read_decoding_params(MLPDecodeContext *m, 
GetBitContext *gbp,
 }
 }
 if (substr == m->max_decoded_substream)
-m->dsp.mlp_pack_output = 
m->dsp.mlp_select_pack_output(s->ch_assign,
-   
s->output_shift,
-   
s->max_matrix_channel,
-   
m->avctx->sample_fmt == AV_SAMPLE_FMT_S32);
+m->pack_output = m->dsp.mlp_select_pack_output(s->ch_assign,
+   s->output_shift,
+   
s->max_matrix_channel,
+   
m->avctx->sample_fmt == AV_SAMPLE_FMT_S32);
 }
 
 if (s->param_presence_flags & PARAM_QUANTSTEP)
@@ -1155,14 +1163,14 @@ static int output_data(MLPDecodeContext *m, unsigned 
int substr,
 frame->nb_samples = s->blockpos;
 if ((ret = ff_get_buffer(avctx, frame, 0)) < 0)
 return ret;
-s->lossless_check_data = m->dsp.mlp_pack_output(s->lossless_check_data,
-s->blockpos,
-m->sample_buffer,
-frame->data[0],
-s->ch_assign,
- 

Re: [FFmpeg-devel] [PATCHv2 1/1] checkasm/lpc: test compute_autocorr

2023-12-18 Thread Rémi Denis-Courmont
On Sunday, 17 December 2023, 23:57:50 EET, Martin Storsjö wrote:
> > Rounding errors would not cause a constant gap across the different test
> > cases. This is most likely an off-by-one in the x86 code. I don't know if
> > this is a bug in the x86 code, or the test case being a little loose with
> > input parameters, and I have neither time, nor motivation not to mention
> > skills to figure that out, so there will be no test cases for this
> > function from me after all.
> 
> FWIW, we've had these situations elsewhere before as well, in swscale,
> where the existing x86 assembly mismatches the C code in nontrivial ways,
> and we have new assembly (aarch64 in that case) that is missing a test
> (even if one was written) due to this.
> 
> First I considered if we should collect these extra checkasm tests in some
> branch somewhere, so they aren't lost, as they are useful when working on
> assembly on other architectures.
> 
> But rather than having the code rot, forgotten in a stray branch
> somewhere, I wonder if we should just go ahead and merge it with an #if
> !ARCH_X86 or something, together with a notable FIXME comment.

I'd certainly welcome more checkasm tests written by literally anyone other than me. 
If the divergence in the X86 code is simply due to optimising an inexact 
algorithm differently, that seems fine. 

But if it is a case that the X86 code is demonstrably buggy, I think that it 
should be commented out or removed. That would not only fix a bug, but also put 
stronger incentives for X86 fanboys to actually fix it. Worst case, the 
optimisation has become meaningless and we have actually fixed a bug.

Though I don't know which case applies here, or to your swscale tests.

-- 
レミ・デニ-クールモン
http://www.remlab.net/





Re: [FFmpeg-devel] [RFC] fftools/ffmpeg and libavdevice/sdl issue

2023-12-18 Thread Rémi Denis-Courmont


On 18 December 2023 21:58:39 GMT+02:00, Michael Niedermayer wrote:
>On Mon, Dec 18, 2023 at 06:33:45PM +0100, Anton Khirnov wrote:
>> Quoting Stefano Sabatini (2023-12-16 16:18:07)
>> > On date Thursday 2023-12-14 10:35:56 +0100, Nicolas George wrote:
>> > > Anton Khirnov (12023-12-14):
>> > [...]
>> > > > I have to strongly disagree. This is neither practically workable,
>> > > > nor a good goal to aim at.
>> > > 
>> > > And I strongly agree with Stefano. Having the tools just thin wrappers
>> > > around the libraries is the only way to ensure the libraries are
>> > > maximally useful for other applications. Otherwise, useful code will
>> > > only reside in the tools and be only available through a clumsy
>> > > command-line interface.
>> > > 
>> > > > This mindset IMO inevitably leads to (among
>> > > > other problems):
>> > 
>> > > > * endless scope creep
>> > 
>> > Scope creep is a general tendency in software, as it tends to grow
>> > with more functionality and use cases in mind (FFmpeg itself started
>> > as an MPEG decoder). OTOH there is good and bad scope creep, it is bad
>> > if the functionality goes beyond the original design and core use
>> > case, or if the extension is not carefully designed and suffers from
>> > assumptions which limit how the software can be used. For example,
>> > making constraints about where the main thread can be executed.
>> > 
>> > (Unrelated note: I greatly appreciate Anton's threaded architecture
>> > endeavor, and I'm fine with the idea that something can result broken
>> > as a consequence of a major redesign, but I also think we should fix
>> > what can be fixed rather than just dismiss that as "not useful".
>> 
>> The entire question here is whether SDL muxing is useful enough to
>> warrant massive hacks in ffmpeg CLI.
>
>I think the first question is, does this actually need
>"massive hacks in ffmpeg CLI" ?

>If you ignore the recommendation that SDL should be run from the main
>thread then I am not sure what change would be needed in ffmpeg CLI at all.

As others noted earlier, that won't work for Mac and Windows.

>If you do want to run it in the main thread, well there's the option
>for the muxer to launch a separate process by some way internally.

Starting a process from a library is not very practical. You need to locate the 
executable and the way to do that is different if you're working with a proper 
installation, or testing in the development tree.

I reckon that testing is a big motivation for the SDL code, so this can't be 
simply ignored.

And then you need an IPC, which is not portable, and not very different from 
the piping alternative proposal up-thread.

>then it has its own main thread (not great but it's a clean solution)
>
>the 2nd question is, is SDL the only thing requiring "main thread" or
>some "single thread" or other limitation ?

On Windows, it requires its own thread to use a single-threaded COM apartment. 
On Mac, it must run on the main thread to access GUI functionality. Main thread 
here means the initial thread in the address space.

>does any other decoder, encoder, muxer, demuxer, filter ... use an
>external lib that's not fully thread safe ? or has funny limitations ?

>The last option would maybe be to run some sort of AVExecutor on the
>main thread and allow things like muxers to queue operations on it.
>The muxers would be running, totally unchanged, on a random thread
>but queue some operation on the main thread if an external lib, driver or
>other needs it.

To me, that counts as a horrible hack for a library to have, TBH. And if 
nothing other than SDL on Mac would even need this, then it's very much an 
ad-hoc kludge.

>Naively that feels relatively clean to me
>
>thx
>
>[...]


Re: [FFmpeg-devel] [RFC] fftools/ffmpeg and libavdevice/sdl issue

2023-12-19 Thread Rémi Denis-Courmont


On 19 December 2023 11:29:04 GMT+02:00, Nicolas George wrote:
>Rémi Denis-Courmont (12023-12-19):
>> As others noted earlier, that won't work for Mac and Windows.
>
>If it works on Linux and other real Unixes, it is a lot better than if
>it does not work on any platform. And we should still blame whoever
>broke it rather than whoever is trying to fix it.

Anton's objections are against the horrible hacks necessary to support Mac and 
Windows, as far as I understand him.

Of course it's also objectionable for SDL to be modelled as a muxer, when it's 
ostensibly an audio output device and a video output device - not a 
multiplexer. (SPU blending, lip sync and whatever may require ESs to be 
processed together should not be tied to SDL.)

>> Starting a process from a library is not very practical. You need to
>> locate the executable and the way to do that is different if you're
>> working with a proper installation, or testing in the development
>> tree.
>
>You are confusing starting a process and executing a new executable.

Running on the main thread (the initial thread of an address space) requires an 
external executable, so if somebody is confusing the two, that's not me.

Besides, starting a new process without execution of an executable, in other 
words, forking without executing, is essentially impossible in a multithreaded 
Unix-like environment, since FFmpeg is not async-fork-safe. It is also 
completely impossible on Windows. So the distinction is completely unhelpful 
here.

>What? Having an API to run functions in the main thread is a basic
>feature for any kind of threading architecture.

Oh really? And the POSIX thread function to run on the main thread is what 
exactly?

You're conflating main-loops and threads here. Thread-safe libraries don't 
normally depend on a main loop, even less the ability to run idle callbacks on 
it.


Re: [FFmpeg-devel] [PATCH v3] riscv: Tweak names of cpu flags, print flags in libavutil/tests/cpu

2023-12-17 Thread Rémi Denis-Courmont
On Sunday, 17 December 2023, 14:36:08 EET, Martin Storsjö wrote:
> The names of the cpu flags, when parsed from a string with
> av_parse_cpu_caps, are parsed by the libavutil eval functions. These
> interpret dashes as subtractions. Therefore, these previous cpu flag
> names haven't been possible to set.
> 
> Use the official names for these extensions, as the previous ad-hoc
> names weren't parseable.
> 
> libavutil/tests/cpu tests that the cpu flags can be set, and prints
> the detected flags.

Acked-by: Rémi Denis-Courmont 

> ---
> v3: Fixed the name zve64d. Kept the cpuflags names all lowercase for
> consistency with the other cpuflags.
> ---
>  libavutil/cpu.c   | 12 ++--
>  libavutil/tests/cpu.c | 10 ++
>  2 files changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/libavutil/cpu.c b/libavutil/cpu.c
> index 1e0607d581..48d195168c 100644
> --- a/libavutil/cpu.c
> +++ b/libavutil/cpu.c
> @@ -186,12 +186,12 @@ int av_parse_cpu_caps(unsigned *flags, const char *s)
>  { "rvi",  NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVI  },.unit = "flags" },
>  { "rvf",  NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVF  },.unit = "flags" },
>  { "rvd",  NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVD  },.unit = "flags" },
> -{ "rvv-i32",  NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVV_I32 }, .unit = "flags" },
> -{ "rvv-f32",  NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVV_F32 }, .unit = "flags" },
> -{ "rvv-i64",  NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVV_I64 }, .unit = "flags" },
> -{ "rvv",  NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVV_F64 }, .unit = "flags" },
> -{ "rvb-addr",NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVB_ADDR },   .unit = "flags" },
> -{ "rvb-basic",NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVB_BASIC },   .unit = "flags" },
> +{ "zve32x",   NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVV_I32  },.unit = "flags" },
> +{ "zve32f",   NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVV_F32  },.unit = "flags" },
> +{ "zve64x",   NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVV_I64  },.unit = "flags" },
> +{ "zve64d",   NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVV_F64  },.unit = "flags" },
> +{ "zba",  NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVB_ADDR },.unit = "flags" },
> +{ "zbb",  NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVB_BASIC },   .unit = "flags" },
>  #endif
>  { NULL },
>  };
> diff --git a/libavutil/tests/cpu.c b/libavutil/tests/cpu.c
> index 200f20388a..d91bfeab5c 100644
> --- a/libavutil/tests/cpu.c
> +++ b/libavutil/tests/cpu.c
> @@ -84,6 +84,16 @@ static const struct {
>  #elif ARCH_LOONGARCH
>  { AV_CPU_FLAG_LSX,   "lsx"},
>  { AV_CPU_FLAG_LASX,  "lasx"   },
> +#elif ARCH_RISCV
> +{ AV_CPU_FLAG_RVI,   "rvi"},
> +{ AV_CPU_FLAG_RVF,   "rvf"},
> +{ AV_CPU_FLAG_RVD,   "rvd"},
> +{ AV_CPU_FLAG_RVB_ADDR,  "zba"},
> +{ AV_CPU_FLAG_RVB_BASIC, "zbb"},
> +{ AV_CPU_FLAG_RVV_I32,   "zve32x" },
> +{ AV_CPU_FLAG_RVV_F32,   "zve32f" },
> +{ AV_CPU_FLAG_RVV_I64,   "zve64x" },
> +{ AV_CPU_FLAG_RVV_F64,   "zve64d" },
>  #endif
>  { 0 }
>  };


-- 
Rémi Denis-Courmont
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCHv2 1/1] checkasm/lpc: test compute_autocorr

2023-12-17 Thread Rémi Denis-Courmont
On Sunday, 17 December 2023, 18:09:45 EET, James Almer wrote:
> On 12/17/2023 6:13 AM, Rémi Denis-Courmont wrote:
> > ---
> > 
> >   tests/checkasm/lpc.c | 47 ++--
> >   1 file changed, 45 insertions(+), 2 deletions(-)
> > 
> > diff --git a/tests/checkasm/lpc.c b/tests/checkasm/lpc.c
> > index 592e34c03d..9b33f8a3b0 100644
> > --- a/tests/checkasm/lpc.c
> > +++ b/tests/checkasm/lpc.c
> > @@ -57,10 +57,46 @@ static void test_window(int len)
> > 
> >   bench_new(src, len, dst1);
> >   
> >   }
> > 
> > +static void test_compute_autocorr(ptrdiff_t len, int lag)
> > +{
> > +LOCAL_ALIGNED(32, double, src, [5000 + 2 + MAX_LPC_ORDER]);
> > +LOCAL_ALIGNED(16, double, dst0, [MAX_LPC_ORDER + 1]);
> > +LOCAL_ALIGNED(16, double, dst1, [MAX_LPC_ORDER + 1]);
> > +
> > +declare_func(void, const double *in, ptrdiff_t len, int lag, double *out);
> > +
> > +av_assert0(lag >= 0 && lag <= MAX_LPC_ORDER);
> > +
> > +for (int i = 0; i < MAX_LPC_ORDER; i++)
> > +src[i] = 0.;
> > +
> > +src += MAX_LPC_ORDER;
> > +
> > +for (ptrdiff_t i = 0; i < len; i++) {
> > +src[i] = (double)rnd() / (double)UINT_MAX;
> > +}
> > +
> > +call_ref(src, len, lag, dst0);
> > +call_new(src, len, lag, dst1);
> > +
> > +for (size_t i = 0; i < lag; i++) {
> > +if (!double_near_abs_eps(dst0[i], dst1[i], EPS)) {
> 
> checkasm: using random seed 2504816888
> SSE2:
>   - lpc.apply_welch_window_even [OK]
>   - lpc.apply_welch_window_odd  [OK]
> 0:  770.224646270451 -  770.382378714191 = -0.15773244374
> autocorr_10_sse2 (lpc.c:86)
>   - lpc.compute_autocorr_10 [FAILED]
> 0:  807.574416481743 -  807.732148925482 = -0.157732443739
> autocorr_30_sse2 (lpc.c:86)
>   - lpc.compute_autocorr_30 [FAILED]
> 0:  787.32905328 -  787.486785732628 = -0.15773244374
> autocorr_32_sse2 (lpc.c:86)
>   - lpc.compute_autocorr_32 [FAILED]
> 
> checkasm: using random seed 827008587
> SSE2:
>   - lpc.apply_welch_window_even [OK]
>   - lpc.apply_welch_window_odd  [OK]
>   - lpc.compute_autocorr_10 [OK]
>   - lpc.compute_autocorr_30 [OK]
>   - lpc.compute_autocorr_32 [OK]
> 
> Some seeds work, others don't. So i guess EPS is too small

Rounding errors would not cause a constant gap across the different test cases. 
This is most likely an off-by-one in the x86 code. I don't know if this is a 
bug in the x86 code, or the test case being a little loose with input 
parameters, and I have neither time, nor motivation not to mention skills to 
figure that out, so there will be no test cases for this function from me 
after all.

The RV loop has no such issue - always matches the C reference AFAICT.

-- 
Rémi Denis-Courmont
http://www.remlab.net/





[FFmpeg-devel] [PATCH] lavc/aacpsdsp: fix R-V V stereo interpolate

2023-12-17 Thread Rémi Denis-Courmont
The penultimate loop iteration could pick any vl such that:
 vlenb/4 < vl <= vlenb/2
Thus if the total length is not a multiple of vlenb/2, the vfadd.vf
on the penultimate iteration would yield corrupt values for the last
iteration.

To avoid this, force vl = vlenb/2 until the last iteration. Unfortunately
this latent bug is not reproducible with either hardware or QEMU as of now.
---
 libavcodec/riscv/aacpsdsp_rvv.S | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libavcodec/riscv/aacpsdsp_rvv.S b/libavcodec/riscv/aacpsdsp_rvv.S
index f46b35fe91..a79d7d7818 100644
--- a/libavcodec/riscv/aacpsdsp_rvv.S
+++ b/libavcodec/riscv/aacpsdsp_rvv.S
@@ -234,7 +234,8 @@ func ff_ps_stereo_interpolate_rvv, zve32f
 vfmacc.vf   v22, ft3, v24
 fmul.s   ft3, ft3, ft4
 1:
-vsetvli   t0, a4, e32, m2, ta, ma
+min   t0, t0, a4
+vsetvli   zero, t0, e32, m2, ta, ma
 vlseg2e32.v v0, (a0) // v0:l_re, v2:l_im
 sub   a4, a4, t0
 vlseg2e32.v v4, (a1)// v4:r_re, v6:r_im
-- 
2.43.0



Re: [FFmpeg-devel] [PATCH] all: Don't set AVClass.item_name to its default value

2023-12-22 Thread Rémi Denis-Courmont
On Friday, 22 December 2023, 15.48.45 EET, Andreas Rheinhardt wrote:
> Avoids relocations.

This is a little bit misleading. It reduces the number of relocations indeed, 
but the data structures still end up in nonshareable .data.relro rather than 
.rodata due to other remaining pointers.

I'm fine with the change though.

-- 
レミ・デニ-クールモン
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH 6/6] lavc/takdsp: R-V V decorrelate_sm

2023-12-22 Thread Rémi Denis-Courmont
On Friday, 22 December 2023, 3.34.39 EET, flow gg wrote:
> func ff_decorrelate_sm_rvv, zve32x
> 1:
> vsetvli  t0, a2, e32, m8, ta, ma
> vle32.v  v8, (a1)
> sub a2,  a2, t0
> vle32.v  v0, (a0)
> vssra.vi  v8, v8, 1
> vsub.vv  v16, v0, v8
> vse32.v  v16, (a0)
> sh2add   a0, t0, a0
> vadd.vv  v16, v0, v8
> vse32.v  v16, (a1)
> sh2add   a1, t0, a1
> bnez a2, 1b
> ret
> endfunc
> 
> Is this way? In this situation, or when using vsra, there will be some
> tests that fail, and the result value differs by 1. I'm not sure where the
> problem..

No, I meant something like this, but it turns out slightly slower anyway. 
Saving the data dependency is not worth adding an instruction.

func ff_decorrelate_sm_rvv, zve32x
csrwi   vxrm, 0
1:
vsetvli t0, a2, e32, m8, ta, ma
vle32.v v8, (a1)
sub a2, a2, t0
vle32.v v0, (a0)
vsra.vi v16, v8, 1
vssra.vi v8, v8, 1
vsub.vv v16, v0, v16
vadd.vv v8, v0, v8
vse32.v v16, (a0)
sh2add  a0, t0, a0
vse32.v v8, (a1)
sh2add  a1, t0, a1
bnez    a2, 1b

ret
endfunc

-- 
雷米‧德尼-库尔蒙
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH] libavfilter/af_afir: R-V V dcmul_add

2023-12-22 Thread Rémi Denis-Courmont
On Friday, 22 December 2023, 3.41.29 EET, flow gg wrote:
> It's at c908
> 
> According to the benchmark results, if vlseg2e64 is used, the speed is
> almost as slow as C language (dcmul_add_rvv_f64: 86.2), if vsseg2e64 is
> used, it will be only a bit slower (dcmul_add_rvv_f64: 50.2).

Fair enough but yikes. I doubt that this is going to turn out well on other 
vendors' upcoming IPs.

-- 
雷米‧德尼-库尔蒙
http://www.remlab.net/





[FFmpeg-devel] FATE RISC-V planned maintenance

2023-12-22 Thread Rémi Denis-Courmont
Hello,

The RISC-V board will be personally visiting the tailor to get a fashionable 
custom-made outfit. As a consequence, it will be taking a much deserved break 
from FATE service over the end of year holidays.

-- 
Rémi Denis-Courmont
http://www.remlab.net/





[FFmpeg-devel] [PATCH] lavc/takdsp: R-V V decorrelate_sf

2023-12-22 Thread Rémi Denis-Courmont
decorrelate_sf_c:  259.2
decorrelate_sf_rvv_i32: 45.5
---
 libavcodec/riscv/takdsp_init.c |  2 ++
 libavcodec/riscv/takdsp_rvv.S  | 21 +
 2 files changed, 23 insertions(+)

diff --git a/libavcodec/riscv/takdsp_init.c b/libavcodec/riscv/takdsp_init.c
index 4312c8d99d..58be83860b 100644
--- a/libavcodec/riscv/takdsp_init.c
+++ b/libavcodec/riscv/takdsp_init.c
@@ -28,6 +28,7 @@
 void ff_decorrelate_ls_rvv(const int32_t *p1, int32_t *p2, int length);
 void ff_decorrelate_sr_rvv(int32_t *p1, const int32_t *p2, int length);
 void ff_decorrelate_sm_rvv(int32_t *p1, int32_t *p2, int length);
+void ff_decorrelate_sf_rvv(int32_t *p1, const int32_t *p2, int len, int, int);
 
 av_cold void ff_takdsp_init_riscv(TAKDSPContext *dsp)
 {
@@ -38,6 +39,7 @@ av_cold void ff_takdsp_init_riscv(TAKDSPContext *dsp)
 dsp->decorrelate_ls = ff_decorrelate_ls_rvv;
 dsp->decorrelate_sr = ff_decorrelate_sr_rvv;
 dsp->decorrelate_sm = ff_decorrelate_sm_rvv;
+dsp->decorrelate_sf = ff_decorrelate_sf_rvv;
 }
 #endif
 }
diff --git a/libavcodec/riscv/takdsp_rvv.S b/libavcodec/riscv/takdsp_rvv.S
index b593d9139a..fa942a3be6 100644
--- a/libavcodec/riscv/takdsp_rvv.S
+++ b/libavcodec/riscv/takdsp_rvv.S
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2023 Institue of Software Chinese Academy of Sciences (ISCAS).
+ * Copyright (c) 2023 Rémi Denis-Courmont
  *
  * This file is part of FFmpeg.
  *
@@ -65,3 +66,23 @@ func ff_decorrelate_sm_rvv, zve32x
 
 ret
 endfunc
+
+func ff_decorrelate_sf_rvv, zve32x
+csrwi    vxrm, 0
+1:
+vsetvli  t0, a2, e32, m8, ta, ma
+vle32.v  v8, (a1)
+sub  a2, a2, t0
+vsra.vx  v8, v8, a3
+sh2add   a1, t0, a1
+vle32.v  v0, (a0)
+vmul.vx  v8, v8, a4
+vssra.vi v8, v8, 8
+vsll.vx  v8, v8, a3
+vsub.vv  v0, v8, v0
+vse32.v  v0, (a0)
+sh2add   a0, t0, a0
+bnez a2, 1b
+
+ret
+endfunc
-- 
2.43.0



Re: [FFmpeg-devel] [PATCH] all: Don't set AVClass.item_name to its default value

2023-12-22 Thread Rémi Denis-Courmont
On Friday, 22 December 2023, 18.13.51 EET, Andreas Rheinhardt wrote:
> Rémi Denis-Courmont:
> > On Friday, 22 December 2023, 15.48.45 EET, Andreas Rheinhardt wrote:
> >> Avoids relocations.
> > 
> > This is a little bit misleading. It reduces the number of relocations
> > indeed, but the data structures still end up in nonshareable .data.relro
> > rather than .rodata due to other remaining pointers.
> 
> I never claimed that the AVClasses would be moved to .rodata. I only
> claimed that it avoids relocations. And it does.

It is misleading because it can be interpreted both ways. I could equally 
argue that it does not avoid relocations: the concerned objects are still 
subject to relocations.

-- 
Rémi Denis-Courmont
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH] riscv: Tweak names of cpu flags, print flags in libavutil/tests/cpu

2023-12-15 Thread Rémi Denis-Courmont


On 15 December 2023 15:05:10 GMT+02:00, "Martin Storsjö" wrote:
>The names of the cpu flags are pased by the libavutil eval

PaSsed

>functions - names with dashes are parsed as a subtraction.
>Replace dashes with underscores.

My personal preference would be to use official extension names, doubly so if 
we switch to underscore separators (which matches official syntax), rather than 
the made-up FFmpeg names. But that's just my opinion.


>
>Add these cpu flags to libavutil/tests/cpu, so that the detected
>cpu flags also get printed when run, also as part of the
>fate-cpu test.
>---
> libavutil/cpu.c   | 10 +-
> libavutil/tests/cpu.c | 10 ++
> 2 files changed, 15 insertions(+), 5 deletions(-)
>
>diff --git a/libavutil/cpu.c b/libavutil/cpu.c
>index 1e0607d581..8c1acc5e72 100644
>--- a/libavutil/cpu.c
>+++ b/libavutil/cpu.c
>@@ -186,12 +186,12 @@ int av_parse_cpu_caps(unsigned *flags, const char *s)
> { "rvi",       NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVI       }, .unit = "flags" },
> { "rvf",       NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVF       }, .unit = "flags" },
> { "rvd",       NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVD       }, .unit = "flags" },
>-{ "rvv-i32",   NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVV_I32   }, .unit = "flags" },
>-{ "rvv-f32",   NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVV_F32   }, .unit = "flags" },
>-{ "rvv-i64",   NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVV_I64   }, .unit = "flags" },
>+{ "rvv_i32",   NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVV_I32   }, .unit = "flags" },
>+{ "rvv_f32",   NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVV_F32   }, .unit = "flags" },
>+{ "rvv_i64",   NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVV_I64   }, .unit = "flags" },
> { "rvv",       NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVV_F64   }, .unit = "flags" },
>-{ "rvb-addr",  NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVB_ADDR  }, .unit = "flags" },
>-{ "rvb-basic", NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVB_BASIC }, .unit = "flags" },
>+{ "rvb_addr",  NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVB_ADDR  }, .unit = "flags" },
>+{ "rvb_basic", NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVB_BASIC }, .unit = "flags" },
> #endif
> { NULL },
> };
>diff --git a/libavutil/tests/cpu.c b/libavutil/tests/cpu.c
>index 200f20388a..1cabd15b72 100644
>--- a/libavutil/tests/cpu.c
>+++ b/libavutil/tests/cpu.c
>@@ -84,6 +84,16 @@ static const struct {
> #elif ARCH_LOONGARCH
> { AV_CPU_FLAG_LSX,   "lsx"},
> { AV_CPU_FLAG_LASX,  "lasx"   },
>+#elif ARCH_RISCV
>+{ AV_CPU_FLAG_RVI,   "rvi"},
>+{ AV_CPU_FLAG_RVF,   "rvf"},
>+{ AV_CPU_FLAG_RVD,   "rvd"},
>+{ AV_CPU_FLAG_RVB_ADDR,  "rvb_addr"   },
>+{ AV_CPU_FLAG_RVB_BASIC, "rvb_basic"  },
>+{ AV_CPU_FLAG_RVV_I32,   "rvv_i32"},
>+{ AV_CPU_FLAG_RVV_F32,   "rvv_f32"},
>+{ AV_CPU_FLAG_RVV_I64,   "rvv_i64"},
>+{ AV_CPU_FLAG_RVV_F64,   "rvv"},
> #endif
> { 0 }
> };


Re: [FFmpeg-devel] [PATCH] riscv: Tweak names of cpu flags, print flags in libavutil/tests/cpu

2023-12-15 Thread Rémi Denis-Courmont


On 15 December 2023 18:39:28 GMT+02:00, "Martin Storsjö" wrote:
>On Fri, 15 Dec 2023, Rémi Denis-Courmont wrote:
>
>> On 15 December 2023 15:05:10 GMT+02:00, "Martin Storsjö" wrote:
>>> The names of the cpu flags are pased by the libavutil eval
>> 
>> PaSsed
>
>Actually, I meant "parsed"
>
>>> functions - names with dashes are parsed as a subtraction.
>>> Replace dashes with underscores.
>> 
>> My personal preference would be to use official extension names, doubly so 
>> if we switch to underscore separators (which matches official syntax), 
>> rather than the made-up FFmpeg names. But that's just my opinion.
>
>Sure - which are those names? Do you want to suggest a patch yourself?

Off the top of my head, those would be Zve32x, Zve64x, Zve32f, Zve64d, Zba and Zbb. 
Whatever shows up in the existing `func` macros' second parameter.


>
>// Martin


Re: [FFmpeg-devel] [PATCH] riscv: vc1dsp: Don't check vlenb before checking the CPU flags

2023-12-15 Thread Rémi Denis-Courmont


On 15 December 2023 15:02:04 GMT+02:00, "Martin Storsjö" wrote:
>We can't call ff_get_rv_vlenb() if we don't have RVV available
>at all.
>
>Due to the SIGILL signal handler in checkasm catching it, in an
>unexpected place, this caused checkasm to hang instead of reporting
>the issue.
>---
> libavcodec/riscv/vc1dsp_init.c | 16 +++-
> 1 file changed, 7 insertions(+), 9 deletions(-)
>
>diff --git a/libavcodec/riscv/vc1dsp_init.c b/libavcodec/riscv/vc1dsp_init.c
>index 0d22d28f4d..2bb7e7fe8f 100644
>--- a/libavcodec/riscv/vc1dsp_init.c
>+++ b/libavcodec/riscv/vc1dsp_init.c
>@@ -35,15 +35,13 @@ av_cold void ff_vc1dsp_init_riscv(VC1DSPContext *dsp)
> #if HAVE_RVV
> int flags = av_get_cpu_flags();
> 
>-if (ff_get_rv_vlenb() >= 16) {
>-if (flags & AV_CPU_FLAG_RVV_I64) {
>-dsp->vc1_inv_trans_8x8_dc = ff_vc1_inv_trans_8x8_dc_rvv;
>-dsp->vc1_inv_trans_8x4_dc = ff_vc1_inv_trans_8x4_dc_rvv;
>-}
>-if (flags & AV_CPU_FLAG_RVV_I32) {
>-dsp->vc1_inv_trans_4x8_dc = ff_vc1_inv_trans_4x8_dc_rvv;
>-dsp->vc1_inv_trans_4x4_dc = ff_vc1_inv_trans_4x4_dc_rvv;
>-}
>+if (flags & AV_CPU_FLAG_RVV_I64 && ff_get_rv_vlenb() >= 16) {
>+dsp->vc1_inv_trans_8x8_dc = ff_vc1_inv_trans_8x8_dc_rvv;
>+dsp->vc1_inv_trans_8x4_dc = ff_vc1_inv_trans_8x4_dc_rvv;
>+}
>+if (flags & AV_CPU_FLAG_RVV_I32 && ff_get_rv_vlenb() >= 16) {
>+dsp->vc1_inv_trans_4x8_dc = ff_vc1_inv_trans_4x8_dc_rvv;
>+dsp->vc1_inv_trans_4x4_dc = ff_vc1_inv_trans_4x4_dc_rvv;

I64 implies I32, so it is not necessary to check vlenb twice. That is what I was 
going for in my original review comments, but then, whoopsie.

> }
> #endif
> }


Re: [FFmpeg-devel] [PATCH] riscv: vc1dsp: Don't check vlenb before checking the CPU flags

2023-12-15 Thread Rémi Denis-Courmont


On 15 December 2023 17:39:48 GMT+02:00, "Martin Storsjö" wrote:
>On Fri, 15 Dec 2023, Rémi Denis-Courmont wrote:
>
>> On 15 December 2023 15:02:04 GMT+02:00, "Martin Storsjö" wrote:
>>> We can't call ff_get_rv_vlenb() if we don't have RVV available
>>> at all.
>>> 
>>> Due to the SIGILL signal handler in checkasm catching it, in an
>>> unexpected place, this caused checkasm to hang instead of reporting
>>> the issue.
>>> ---
>>> libavcodec/riscv/vc1dsp_init.c | 16 +++-
>>> 1 file changed, 7 insertions(+), 9 deletions(-)
>>> 
>>> diff --git a/libavcodec/riscv/vc1dsp_init.c b/libavcodec/riscv/vc1dsp_init.c
>>> index 0d22d28f4d..2bb7e7fe8f 100644
>>> --- a/libavcodec/riscv/vc1dsp_init.c
>>> +++ b/libavcodec/riscv/vc1dsp_init.c
>>> @@ -35,15 +35,13 @@ av_cold void ff_vc1dsp_init_riscv(VC1DSPContext *dsp)
>>> #if HAVE_RVV
>>> int flags = av_get_cpu_flags();
>>> 
>>> -if (ff_get_rv_vlenb() >= 16) {
>>> -if (flags & AV_CPU_FLAG_RVV_I64) {
>>> -dsp->vc1_inv_trans_8x8_dc = ff_vc1_inv_trans_8x8_dc_rvv;
>>> -dsp->vc1_inv_trans_8x4_dc = ff_vc1_inv_trans_8x4_dc_rvv;
>>> -}
>>> -if (flags & AV_CPU_FLAG_RVV_I32) {
>>> -dsp->vc1_inv_trans_4x8_dc = ff_vc1_inv_trans_4x8_dc_rvv;
>>> -dsp->vc1_inv_trans_4x4_dc = ff_vc1_inv_trans_4x4_dc_rvv;
>>> -}
>>> +if (flags & AV_CPU_FLAG_RVV_I64 && ff_get_rv_vlenb() >= 16) {
>>> +dsp->vc1_inv_trans_8x8_dc = ff_vc1_inv_trans_8x8_dc_rvv;
>>> +dsp->vc1_inv_trans_8x4_dc = ff_vc1_inv_trans_8x4_dc_rvv;
>>> +}
>>> +if (flags & AV_CPU_FLAG_RVV_I32 && ff_get_rv_vlenb() >= 16) {
>>> +dsp->vc1_inv_trans_4x8_dc = ff_vc1_inv_trans_4x8_dc_rvv;
>>> +dsp->vc1_inv_trans_4x4_dc = ff_vc1_inv_trans_4x4_dc_rvv;
>> 
>> I64 implies I32 so it is not necessary to check vlenb twice. That's what I 
>> was going for originally in my then review comments but then woopsie.
>
>Sure, fixed.
>
>FWIW I see that vc1_inv_trans_8x4_dc_rvv_i64 seems to fail the checkasm test 
>most of the time as well.

Hmm, I didn't write those optimisations but I thought I tested them before 
pushing. Is this subtly dependent on the vector length, maybe?  Currently only 
128-bit hardware is commercially available but QEMU can also emulate 256, 512 
and 1024.

>
>// Martin


Re: [FFmpeg-devel] [PATCH v2] riscv: vc1dsp: Don't check vlenb before checking the CPU flags

2023-12-16 Thread Rémi Denis-Courmont
On Friday, 15 December 2023, 17.38.45 EET, Martin Storsjö wrote:
> We can't call ff_get_rv_vlenb() if we don't have RVV available
> at all.
> 
> Due to the SIGILL signal handler in checkasm catching it, in an
> unexpected place, this caused checkasm to hang instead of reporting
> the issue.
> ---
>  libavcodec/riscv/vc1dsp_init.c | 8 +++-
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/libavcodec/riscv/vc1dsp_init.c b/libavcodec/riscv/vc1dsp_init.c
> index 0d22d28f4d..e47b644f80 100644
> --- a/libavcodec/riscv/vc1dsp_init.c
> +++ b/libavcodec/riscv/vc1dsp_init.c
> @@ -35,15 +35,13 @@ av_cold void ff_vc1dsp_init_riscv(VC1DSPContext *dsp)
>  #if HAVE_RVV
>  int flags = av_get_cpu_flags();
> 
> -if (ff_get_rv_vlenb() >= 16) {
> +if (flags & AV_CPU_FLAG_RVV_I32 && ff_get_rv_vlenb() >= 16) {
> +dsp->vc1_inv_trans_4x8_dc = ff_vc1_inv_trans_4x8_dc_rvv;
> +dsp->vc1_inv_trans_4x4_dc = ff_vc1_inv_trans_4x4_dc_rvv;
>  if (flags & AV_CPU_FLAG_RVV_I64) {
>  dsp->vc1_inv_trans_8x8_dc = ff_vc1_inv_trans_8x8_dc_rvv;
>  dsp->vc1_inv_trans_8x4_dc = ff_vc1_inv_trans_8x4_dc_rvv;
>  }
> -if (flags & AV_CPU_FLAG_RVV_I32) {
> -dsp->vc1_inv_trans_4x8_dc = ff_vc1_inv_trans_4x8_dc_rvv;
> -dsp->vc1_inv_trans_4x4_dc = ff_vc1_inv_trans_4x4_dc_rvv;
> -}
>  }
>  #endif
>  }

Acked-by: Rémi Denis-Courmont 

-- 
レミ・デニ-クールモン
http://www.remlab.net/





[FFmpeg-devel] [PATCH] lavc/vc1dsp: fix R-V V vector lengths

2023-12-16 Thread Rémi Denis-Courmont
The 8x4 and 4x4 use a needlessly large multiplier (unless/until we care
about embedded 64-bit-vector hardware). This is merely suboptimal.

The 8x4 case also uses an incorrect vector length, which leads to incorrect
behaviour on future/hypothetical hardware with 256-bit or larger vectors.

Pointed-out-by: Martin Storsjö 
---
 libavcodec/riscv/vc1dsp_rvv.S | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/libavcodec/riscv/vc1dsp_rvv.S b/libavcodec/riscv/vc1dsp_rvv.S
index 1a503ecc87..4a00945ead 100644
--- a/libavcodec/riscv/vc1dsp_rvv.S
+++ b/libavcodec/riscv/vc1dsp_rvv.S
@@ -68,7 +68,7 @@ endfunc
 
 func ff_vc1_inv_trans_8x4_dc_rvv, zve64x
 lh        t2, (a2)
-vsetivli  zero, 8, e8, mf2, ta, ma
+vsetivli  zero, 4, e8, mf4, ta, ma
 vlse64.v  v0, (a0), a1
 sh1add    t2, t2, t2
 addi  t2, t2, 1
@@ -84,14 +84,14 @@ func ff_vc1_inv_trans_8x4_dc_rvv, zve64x
 vmax.vx   v4, v4, zero
 vsetvli   zero, zero, e8, m2, ta, ma
 vnclipu.wi v0, v4, 0
-vsetivli  zero, 8, e8, mf2, ta, ma
+vsetivli  zero, 4, e8, mf4, ta, ma
 vsse64.v  v0, (a0), a1
 ret
 endfunc
 
 func ff_vc1_inv_trans_4x4_dc_rvv, zve32x
 lh        t2, (a2)
-vsetivli  zero, 4, e8, mf2, ta, ma
+vsetivli  zero, 4, e8, mf4, ta, ma
 vlse32.v  v0, (a0), a1
 slli  t1, t2, 4
 add   t2, t2, t1
@@ -107,7 +107,7 @@ func ff_vc1_inv_trans_4x4_dc_rvv, zve32x
 vmax.vx   v2, v2, zero
 vsetvli   zero, zero, e8, m1, ta, ma
 vnclipu.wi v0, v2, 0
-vsetivli  zero, 4, e8, mf2, ta, ma
+vsetivli  zero, 4, e8, mf4, ta, ma
 vsse32.v  v0, (a0), a1
 ret
 endfunc
-- 
2.43.0



[FFmpeg-devel] [PATCH] lavc/opusdsp: simplify R-V V postfilter

2023-12-16 Thread Rémi Denis-Courmont
This skips the round-trip through scalar registers for the sliding 'x'
coefficients, improving performance by about 5%. The trick here is that
the vector slide-up instruction preserves the destination vector's
elements below the slide offset.

The switch from vfslide1up.vf to vslideup.vi also allows the elimination
of data dependencies on consecutive slides. Since the specifications
recommend sticking to power of two offsets, we could slide as follows:

vslideup.vi v8, v0, 2
vslideup.vi v4, v0, 1
vslideup.vi v12, v8, 1
vslideup.vi v16, v8, 2

However in the device under test, this seems to make performance slightly
worse, so this is left for (in)validation with future better hardware.
---
 libavcodec/riscv/opusdsp_rvv.S | 30 --
 1 file changed, 12 insertions(+), 18 deletions(-)

diff --git a/libavcodec/riscv/opusdsp_rvv.S b/libavcodec/riscv/opusdsp_rvv.S
index 79ae86c30e..9a8914c78d 100644
--- a/libavcodec/riscv/opusdsp_rvv.S
+++ b/libavcodec/riscv/opusdsp_rvv.S
@@ -26,40 +26,34 @@ func ff_opus_postfilter_rvv, zve32f
 flw fa1, 4(a2) // g1
 sub t0, a0, t1
 flw fa2, 8(a2) // g2
+addi    t1, t0, -2 * 4 // data - (period + 2) = initial 
+vsetivli zero, 4, e32, m4, ta, ma
 addi    t0, t0, 2 * 4 // data - (period - 2) = initial 
-
-flw ft4, -16(t0)
+vle32.v v16, (t1)
 addi    t3, a1, -2 // maximum parallelism w/o stepping our tail
-flw ft3, -12(t0)
-flw ft2,  -8(t0)
-flw ft1,  -4(t0)
 1:
+vslidedown.vi v8, v16, 2
 min t1, a3, t3
+vslide1down.vx v12, v16, zero
 vsetvli t1, t1, e32, m4, ta, ma
 vle32.v v0, (t0) // x0
 sub a3, a3, t1
-vle32.v v28, (a0)
+vslide1down.vx v4, v8, zero
 sh2add  t0, t1, t0
-vfslide1up.vf v4, v0, ft1
+vle32.v v28, (a0)
 addi    t2, t1, -4
-vfslide1up.vf v8, v4, ft2
-vfslide1up.vf v12, v8, ft3
-vfslide1up.vf v16, v12, ft4
+vslideup.vi v4, v0, 1
+vslideup.vi v8, v4, 1
+vslideup.vi v12, v8, 1
+vslideup.vi v16, v12, 1
 vfadd.vv v20, v4, v12
 vfadd.vv v24, v0, v16
-vslidedown.vx v12, v0, t2
+vslidedown.vx v16, v0, t2
 vfmacc.vf v28, fa0, v8
-vslidedown.vi v4, v12, 2
 vfmacc.vf v28, fa1, v20
-vslide1down.vx v8, v12, zero
 vfmacc.vf v28, fa2, v24
-vslide1down.vx v0, v4, zero
 vse32.v v28, (a0)
-vfmv.f.s ft4, v12
 sh2add  a0, t1, a0
-vfmv.f.s ft2, v4
-vfmv.f.s ft3, v8
-vfmv.f.s ft1, v0
 bnez    a3, 1b
 
 ret
-- 
2.43.0



Re: [FFmpeg-devel] [PATCH 1/2] checkasm/lpc: test compute_autocorr

2023-12-14 Thread Rémi Denis-Courmont
On Thursday, 14 December 2023, 18.41.24 EET, Michael Niedermayer wrote:
> SSE2:
>  - lpc.apply_welch_window_even [OK]
>  - lpc.apply_welch_window_odd  [OK]
> 0:  976.228035341704 -  976.998462662304 = -0.7704273206
>autocorr_10_sse2 (lpc.c:81)
>  - lpc.compute_autocorr_10 [FAILED]
> 0:  966.946397975397 -  967.716825295995 = -0.770427320599
>autocorr_30_sse2 (lpc.c:81)
>  - lpc.compute_autocorr_30 [FAILED]
> 0:  968.085384693526 -  968.855812014127 = -0.770427320601

Right, it seems that the SSE optimisation breaks on odd lengths. The RVV 
code seems to match the C code there, so I am not sure if this is exposing an 
existing bug in the SSE code, or if odd lengths are illegal.

On a related note, we should probably test for odd lag values, as the C code 
has special handling for them. But from a quick glance, it seems that the SSE 
code also fails to deal with that case.

>autocorr_32_sse2 (lpc.c:81)
>  - lpc.compute_autocorr_32 [FAILED]
> AVX2:
>  - lpc.apply_welch_window_even [OK]
>  - lpc.apply_welch_window_odd  [OK]
> checkasm: 3 of 7 tests have failed
> $ ffmpeg/tests/checkasm/checkasm --test=lpc
> checkasm: using random seed 470640728
> SSE2:
>  - lpc.apply_welch_window_even [OK]
>  - lpc.apply_welch_window_odd  [OK]
>  - lpc.compute_autocorr_10 [OK]
>  - lpc.compute_autocorr_30 [OK]
>  - lpc.compute_autocorr_32 [OK]
> AVX2:
>  - lpc.apply_welch_window_even [OK]
>  - lpc.apply_welch_window_odd  [OK]
> checkasm: all 7 tests passed
> 
> 
> [...]


-- 
雷米‧德尼-库尔蒙
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH v2] riscv: Tweak names of cpu flags, print flags in libavutil/tests/cpu

2023-12-17 Thread Rémi Denis-Courmont
On Friday, 15 December 2023, 22.52.51 EET, Martin Storsjö wrote:
> The names of the cpu flags, when parsed from a string with
> av_parse_cpu_caps, are parsed by the libavutil eval functions. These
> interpret dashes as subtractions. Therefore, these previous cpu flag
> names haven't been possible to set.
> 
> Use the official names for these extensions, as the previous ad-hoc
> names wasn't parseable.
> 
> libavutil/tests/cpu tests that the cpu flags can be set, and prints
> the detected flags.
> ---
>  libavutil/cpu.c   | 12 ++--
>  libavutil/tests/cpu.c | 10 ++
>  2 files changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/libavutil/cpu.c b/libavutil/cpu.c
> index 1e0607d581..f04068acda 100644
> --- a/libavutil/cpu.c
> +++ b/libavutil/cpu.c
> @@ -186,12 +186,12 @@ int av_parse_cpu_caps(unsigned *flags, const char *s)
>  { "rvi",       NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVI       }, .unit = "flags" },
>  { "rvf",       NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVF       }, .unit = "flags" },
>  { "rvd",       NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVD       }, .unit = "flags" },
> -{ "rvv-i32",   NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVV_I32   }, .unit = "flags" },
> -{ "rvv-f32",   NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVV_F32   }, .unit = "flags" },
> -{ "rvv-i64",   NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVV_I64   }, .unit = "flags" },
> -{ "rvv",       NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVV_F64   }, .unit = "flags" },
> -{ "rvb-addr",  NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVB_ADDR  }, .unit = "flags" },
> -{ "rvb-basic", NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVB_BASIC }, .unit = "flags" },
> +{ "zve32x",    NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVV_I32   }, .unit = "flags" },
> +{ "zve32f",    NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVV_F32   }, .unit = "flags" },
> +{ "zve64x",    NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVV_I64   }, .unit = "flags" },
> +{ "zve64f",    NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVV_F64   }, .unit = "flags" },

That's Zve64d. Or V though strictly speaking V also implies a vector length of 
at least 128 bits, while Zve64d only implies 64 bits.

> +{ "zba",       NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVB_ADDR  }, .unit = "flags" },
> +{ "zbb",       NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVB_BASIC }, .unit = "flags" },
>  #endif
>  { NULL },
>  };
> diff --git a/libavutil/tests/cpu.c b/libavutil/tests/cpu.c
> index 200f20388a..6b27bcdbad 100644
> --- a/libavutil/tests/cpu.c
> +++ b/libavutil/tests/cpu.c
> @@ -84,6 +84,16 @@ static const struct {
>  #elif ARCH_LOONGARCH
>  { AV_CPU_FLAG_LSX,   "lsx"},
>  { AV_CPU_FLAG_LASX,  "lasx"   },
> +#elif ARCH_RISCV
> +{ AV_CPU_FLAG_RVI,   "rvi"},
> +{ AV_CPU_FLAG_RVF,   "rvf"},
> +{ AV_CPU_FLAG_RVD,   "rvd"},
> +{ AV_CPU_FLAG_RVB_ADDR,  "zba"},
> +{ AV_CPU_FLAG_RVB_BASIC, "zbb"},
> +{ AV_CPU_FLAG_RVV_I32,   "zve32x" },
> +{ AV_CPU_FLAG_RVV_F32,   "zve32f" },
> +{ AV_CPU_FLAG_RVV_I64,   "zve64x" },
> +{ AV_CPU_FLAG_RVV_F64,   "zve64f" },
>  #endif
>  { 0 }
>  };


-- 
レミ・デニ-クールモン
http://www.remlab.net/





[FFmpeg-devel] [PATCHv2 1/1] checkasm/lpc: test compute_autocorr

2023-12-17 Thread Rémi Denis-Courmont
---
 tests/checkasm/lpc.c | 47 ++--
 1 file changed, 45 insertions(+), 2 deletions(-)

diff --git a/tests/checkasm/lpc.c b/tests/checkasm/lpc.c
index 592e34c03d..9b33f8a3b0 100644
--- a/tests/checkasm/lpc.c
+++ b/tests/checkasm/lpc.c
@@ -57,10 +57,46 @@ static void test_window(int len)
 bench_new(src, len, dst1);
 }
 
+static void test_compute_autocorr(ptrdiff_t len, int lag)
+{
+LOCAL_ALIGNED(32, double, src, [5000 + 2 + MAX_LPC_ORDER]);
+LOCAL_ALIGNED(16, double, dst0, [MAX_LPC_ORDER + 1]);
+LOCAL_ALIGNED(16, double, dst1, [MAX_LPC_ORDER + 1]);
+
+declare_func(void, const double *in, ptrdiff_t len, int lag, double *out);
+
+av_assert0(lag >= 0 && lag <= MAX_LPC_ORDER);
+
+for (int i = 0; i < MAX_LPC_ORDER; i++)
+src[i] = 0.;
+
+src += MAX_LPC_ORDER;
+
+for (ptrdiff_t i = 0; i < len; i++) {
+src[i] = (double)rnd() / (double)UINT_MAX;
+}
+
+call_ref(src, len, lag, dst0);
+call_new(src, len, lag, dst1);
+
+for (size_t i = 0; i < lag; i++) {
+if (!double_near_abs_eps(dst0[i], dst1[i], EPS)) {
+fprintf(stderr, "%zu: %- .12f - %- .12f = % .12g\n",
+i, dst0[i], dst1[i], dst0[i] - dst1[i]);
+fail();
+break;
+}
+}
+
+bench_new(src, len, lag, dst1);
+}
+
 void checkasm_check_lpc(void)
 {
 LPCContext ctx;
-int len = rnd() % 5000;
+int len = 2000 + (rnd() % 3000);
+static const int lags[] = { 10, 30, 32 };
+
 ff_lpc_init(&ctx, 32, 16, FF_LPC_TYPE_DEFAULT);
 
 if (check_func(ctx.lpc_apply_welch_window, "apply_welch_window_even")) {
@@ -72,6 +108,13 @@ void checkasm_check_lpc(void)
 test_window(len | 1);
 }
 report("apply_welch_window_odd");
-
 ff_lpc_end(&ctx);
+
+for (size_t i = 0; i < FF_ARRAY_ELEMS(lags); i++) {
+ff_lpc_init(&ctx, len, lags[i], FF_LPC_TYPE_DEFAULT);
+if (check_func(ctx.lpc_compute_autocorr, "autocorr_%d", lags[i]))
+test_compute_autocorr(len, lags[i]);
+report("compute_autocorr_%d", lags[i]);
+ff_lpc_end(&ctx);
+}
 }
-- 
2.43.0



Re: [FFmpeg-devel] [PATCH v2] riscv: Tweak names of cpu flags, print flags in libavutil/tests/cpu

2023-12-17 Thread Rémi Denis-Courmont
On Sunday, 17 December 2023, 11.11.30 EET, Martin Storsjö wrote:
> >> AV_CPU_FLAG_RVV_I32 }, .unit = "flags" }, -{ "rvv-f32", 
> >> NULL,
> >> 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVV_F32 }, .unit = "flags"
> >> }, -{ "rvv-i64",  NULL, 0, AV_OPT_TYPE_CONST, { .i64 =
> >> AV_CPU_FLAG_RVV_I64 }, .unit = "flags" }, -{ "rvv", 
> >> NULL,
> >> 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVV_F64 }, .unit = "flags"
> >> }, -{ "rvb-addr",NULL, 0, AV_OPT_TYPE_CONST, { .i64 =
> >> AV_CPU_FLAG_RVB_ADDR },   .unit = "flags" }, -{ "rvb-basic",NULL,
> >> 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVB_BASIC },   .unit = "flags"
> >> }, +{ "zve32x",   NULL, 0, AV_OPT_TYPE_CONST, { .i64 =
> >> AV_CPU_FLAG_RVV_I32  },.unit = "flags" }, +{ "zve32f",  
> >> NULL,
> >> 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVV_F32  },.unit = "flags"
> >> }, +{ "zve64x",   NULL, 0, AV_OPT_TYPE_CONST, { .i64 =
> >> AV_CPU_FLAG_RVV_I64  },.unit = "flags" },
> >> +{ "zve64f",   NULL,
> >> 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVV_F64  },.unit = "flags"
> > 
> > That's Zve64d. Or V though strictly speaking V also implies a vector
> > length of at least 128 bits, while Zve64d only implies 64 bits.
> 
> Oh, right. But we'd use it lowercased here, as "zve64d", as that's what we
> use with the function macros and with .option arch, +, right?

`.option arch` wants lower case names.

> Using the single-letter forms here for cpu flags would probably feel a bit
> obscure...
> Is there some similar names like these, that would be used for
> .option arch, for what we call rvi/rvf/rvd above?

I is the base Integer set, not an extension. F and D mean Float and Double, 
but I don't think that they have formal names. They only exist because 
checkasm wants flags for everything. You're not supposed to check those at 
run-time.

Also D is not used for anything. I don't know how feasible it would be to 
rework the C versions of pixblockdsp and audiodsp to purge I and F.

-- 
Rémi Denis-Courmont
http://www.remlab.net/





[FFmpeg-devel] [PATCH 2/2] lavc/lpc: R-V V compute_autocorr

2023-12-12 Thread Rémi Denis-Courmont
The loop iterates over the length of the vector, not the order. This is
to avoid reloading the same data for each lag value. However this means
the loop only works if the maximum order is no larger than VLENB.

The loop is roughly equivalent to:

for (size_t j = 0; j < lag; j++)
autoc[j] = 1.;

while (len > lag) {
for (ptrdiff_t j = 0; j < lag; j++)
autoc[j] += data[j] * *data;
data++;
len--;
}

while (len > 0) {
for (ptrdiff_t j = 0; j < len; j++)
autoc[j] += data[j] * *data;
data++;
len--;
}

Since register pressure is only at 50%, it should be possible to implement
the same loop for order up to 2xVLENB. But this is left for future work.

Performance numbers are all over the place from ~1.25x to ~4x speedups,
but at least they are always noticeably better than nothing.
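
For readers following the description above, here is a minimal C sketch of
that loop written out as a standalone function. It only illustrates the
pseudo-code in this commit message and is not the FFmpeg reference
implementation; the name autocorr_sliding is hypothetical, and the sketch
assumes the output array holds exactly lag values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: scalar rendering of the loop in the commit message.
 * autoc[j] = 1 + sum of data[i] * data[i + j] over all i with i + j < len. */
void autocorr_sliding(const double *data, ptrdiff_t len, int lag, double *autoc)
{
    assert(len >= 0 && lag >= 0);

    for (int j = 0; j < lag; j++)
        autoc[j] = 1.;

    /* Main part: a full window of lag samples is still available. */
    while (len > lag) {
        for (ptrdiff_t j = 0; j < lag; j++)
            autoc[j] += data[j] * *data;
        data++;
        len--;
    }

    /* Tail: the window shrinks by one sample per iteration. */
    while (len > 0) {
        for (ptrdiff_t j = 0; j < len; j++)
            autoc[j] += data[j] * *data;
        data++;
        len--;
    }
}
```

With data = { 1, 2, 3, 4 } and lag = 2, this gives autoc[0] = 31 and
autoc[1] = 21. The point of the structure is that each input sample is read
once and applied to every lag, rather than being reloaded per lag value.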
---
 libavcodec/riscv/lpc_init.c |  8 +++-
 libavcodec/riscv/lpc_rvv.S  | 29 +
 2 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/libavcodec/riscv/lpc_init.c b/libavcodec/riscv/lpc_init.c
index c16e5745f0..ab91956f2d 100644
--- a/libavcodec/riscv/lpc_init.c
+++ b/libavcodec/riscv/lpc_init.c
@@ -22,16 +22,22 @@
 
 #include "libavutil/attributes.h"
 #include "libavutil/cpu.h"
+#include "libavutil/riscv/cpu.h"
 #include "libavcodec/lpc.h"
 
 void ff_lpc_apply_welch_window_rvv(const int32_t *, ptrdiff_t, double *);
+void ff_lpc_compute_autocorr_rvv(const double *, ptrdiff_t, int, double *);
 
 av_cold void ff_lpc_init_riscv(LPCContext *c)
 {
 #if HAVE_RVV && (__riscv_xlen >= 64)
 int flags = av_get_cpu_flags();
 
-if ((flags & AV_CPU_FLAG_RVV_F64) && (flags & AV_CPU_FLAG_RVB_ADDR))
+if ((flags & AV_CPU_FLAG_RVV_F64) && (flags & AV_CPU_FLAG_RVB_ADDR)) {
 c->lpc_apply_welch_window = ff_lpc_apply_welch_window_rvv;
+
+if (ff_get_rv_vlenb() >= c->max_order)
+c->lpc_compute_autocorr = ff_lpc_compute_autocorr_rvv;
+}
 #endif
 }
diff --git a/libavcodec/riscv/lpc_rvv.S b/libavcodec/riscv/lpc_rvv.S
index f81a2392c1..654156bf12 100644
--- a/libavcodec/riscv/lpc_rvv.S
+++ b/libavcodec/riscv/lpc_rvv.S
@@ -85,4 +85,33 @@ func ff_lpc_apply_welch_window_rvv, zve64d
 
 ret
 endfunc
+
+func ff_lpc_compute_autocorr_rvv, zve64d
+li        t0, 1
+vsetvli   t1, a2, e64, m8, ta, ma
+fcvt.d.l  ft0, t0
+vle64.v   v0, (a0)
+sh3add    a0, a2, a0   # data += lag
+vfmv.v.f  v16, ft0
+bge   a2, a1, 2f
+1:
+vfmv.f.s  ft0, v0
+fld   ft1, (a0)# ft1 = data[lag + i]
+vfmacc.vf v16, ft0, v0 # v16[j] += data[i] * data[i + j]
+addi  a1, a1, -1
+vfslide1down.vf v0, v0, ft1
+addi  a0, a0, 8
+bgt   a1, a2, 1b   # while (len > lag);
+2:
+vfmv.f.s  ft0, v0
+vsetvli   zero, a1, e64, m8, tu, ma
+vfmacc.vf v16, ft0, v0
+addi  a1, a1, -1
+vslide1down.vx v0, v0, zero
+bnez  a1, 2b   # while (len > 0);
+
+vsetvli   zero, a2, e64, m8, ta, ma
+vse64.v   v16, (a3)
+ret
+endfunc
 #endif
-- 
2.43.0



[FFmpeg-devel] [PATCH 1/2] checkasm/lpc: test compute_autocorr

2023-12-12 Thread Rémi Denis-Courmont
---
 tests/checkasm/lpc.c | 42 --
 1 file changed, 40 insertions(+), 2 deletions(-)

diff --git a/tests/checkasm/lpc.c b/tests/checkasm/lpc.c
index 592e34c03d..4d84defec3 100644
--- a/tests/checkasm/lpc.c
+++ b/tests/checkasm/lpc.c
@@ -57,10 +57,41 @@ static void test_window(int len)
 bench_new(src, len, dst1);
 }
 
+static void test_compute_autocorr(ptrdiff_t len, int lag)
+{
+LOCAL_ALIGNED(16, double, src, [5000]);
+LOCAL_ALIGNED(16, double, dst0, [MAX_LPC_ORDER + 1]);
+LOCAL_ALIGNED(16, double, dst1, [MAX_LPC_ORDER + 1]);
+
+declare_func(void, const double *in, ptrdiff_t len, int lag, double *out);
+
+av_assert0(lag >= 0 && lag <= MAX_LPC_ORDER);
+
+for (size_t i = 0; i < len; i++) {
+src[i] = (double)rnd() / (double)UINT_MAX;
+}
+
+call_ref(src, len, lag, dst0);
+call_new(src, len, lag, dst1);
+
+for (size_t i = 0; i < lag; i++) {
+if (!double_near_abs_eps(dst0[i], dst1[i], EPS)) {
+fprintf(stderr, "%zu: %- .12f - %- .12f = % .12g\n",
+i, dst0[i], dst1[i], dst0[i] - dst1[i]);
+fail();
+break;
+}
+}
+
+bench_new(src, len, lag, dst1);
+}
+
 void checkasm_check_lpc(void)
 {
 LPCContext ctx;
-int len = rnd() % 5000;
+int len = 2000 + (rnd() % 3000);
+static const int lags[] = { 10, 30, 32 };
+
 ff_lpc_init(&ctx, 32, 16, FF_LPC_TYPE_DEFAULT);
 
 if (check_func(ctx.lpc_apply_welch_window, "apply_welch_window_even")) {
@@ -72,6 +103,13 @@ void checkasm_check_lpc(void)
 test_window(len | 1);
 }
 report("apply_welch_window_odd");
-
 ff_lpc_end();
+
+for (size_t i = 0; i < FF_ARRAY_ELEMS(lags); i++) {
+ff_lpc_init(&ctx, 32, lags[i], FF_LPC_TYPE_DEFAULT);
+if (check_func(ctx.lpc_compute_autocorr, "autocorr_%d", lags[i]))
+test_compute_autocorr(len, lags[i]);
+report("compute_autocorr_%d", lags[i]);
+ff_lpc_end();
+}
 }
-- 
2.43.0



Re: [FFmpeg-devel] [PATCH v2] gdigrab: Allow capturing a window by its handle

2023-12-12 Thread Rémi Denis-Courmont


On 12 December 2023 16:07:28 GMT+02:00, Nicolas George wrote:
>Lena via ffmpeg-devel (12023-12-12):
>> The documentation for `strtol` says that on error, 0 is returned. This
>> makes it impossible to specify a window handle of 0 (the whole
>> desktop), but that case is already covered by the "desktop" input
>> filename, so it should be fine.
>
>The correct way to test for error in strtol is to check the endptr.

...and test for overflow errors in errno (which shall have been zeroed 
beforehand). AFAIK, you need to do both if you want strict error detection.
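
To make the two checks concrete, here is a small sketch of strict strtol()
parsing. The helper name parse_long_strict is hypothetical and this is not
the gdigrab code; base 0 is an assumption so that both decimal and 0x-prefixed
handles are accepted.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 and stores the value in *out only if the
 * whole string is a valid long; returns 0 otherwise. */
int parse_long_strict(const char *s, long *out)
{
    char *end;
    long val;

    assert(s != NULL && out != NULL);

    errno = 0;                 /* must be cleared before the call */
    val = strtol(s, &end, 0);  /* base 0: decimal, octal or 0x hex */

    if (end == s)              /* no digits were consumed */
        return 0;
    if (*end != '\0')          /* trailing garbage after the number */
        return 0;
    if (errno == ERANGE)       /* value out of range for long */
        return 0;

    *out = val;
    return 1;
}
```

Note that a return value of 0 from strtol() is then no longer ambiguous:
"0" parses successfully, while "" and "abc" are rejected via endptr.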

>
>But just use a single sscanf() and %n to see if it reached the end of
>the string.

Don't some distros forbid the use of the n specifier for (debatable) "security 
reasons"? Or is that only for formatting, and not in scanning?
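
For reference, the sscanf() variant suggested above could look roughly like
the sketch below. The helper name parse_long_scanf is hypothetical. The
sketch does not settle whether any distribution restricts %n when scanning,
and unlike strtol() it has no defined way to report overflow.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if the whole string was consumed as a
 * decimal long. On failure *out may already have been written. */
int parse_long_scanf(const char *s, long *out)
{
    int consumed = 0;

    assert(s != NULL && out != NULL);

    /* %n stores the number of characters read so far and does not
     * count towards the return value of sscanf(). */
    if (sscanf(s, "%ld%n", out, &consumed) != 1)
        return 0;

    return s[consumed] == '\0';
}
```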

>> -There are two options for the input filename:
>> +There are three options for the input filename:
>
>“Amongst options for the imput filenames are such elements as:”
>
>;-)
>
>Regards,
>
>-- 
>  Nicolas George


Re: [FFmpeg-devel] [PATCH 2/2] lavc/lpc: R-V V compute_autocorr

2023-12-12 Thread Rémi Denis-Courmont
On Tuesday, 12 December 2023, 23.02.40 EET, Rémi Denis-Courmont wrote:
> The loop iterates over the length of the vector, not the order. This is
> to avoid reloading the same data for each lag value. However this means
> the loop only works if the maximum order is no larger than VLENB.
> 
> The loop is roughly equivalent to:
> 
> for (size_t j = 0; j < lag; j++)
> autoc[j] = 1.;
> 
> while (len > lag) {
> for (ptrdiff_t j = 0; j < lag; j++)
> autoc[j] += data[j] * *data;
> data++;
> len--;
> }
> 
> while (len > 0) {
> for (ptrdiff_t j = 0; j < len; j++)
> autoc[j] += data[j] * *data;
> data++;
> len--;
> }
> 
> Since register pressure is only at 50%, it should be possible to implement
> the same loop for order up to 2xVLENB. But this is left for future work.
> 
> Performance numbers are all over the place from ~1.25x to ~4x speedups,
> but at least they are always noticeably better than nothing.
> ---
>  libavcodec/riscv/lpc_init.c |  8 +++-
>  libavcodec/riscv/lpc_rvv.S  | 29 +
>  2 files changed, 36 insertions(+), 1 deletion(-)
> 
> diff --git a/libavcodec/riscv/lpc_init.c b/libavcodec/riscv/lpc_init.c
> index c16e5745f0..ab91956f2d 100644
> --- a/libavcodec/riscv/lpc_init.c
> +++ b/libavcodec/riscv/lpc_init.c
> @@ -22,16 +22,22 @@
> 
>  #include "libavutil/attributes.h"
>  #include "libavutil/cpu.h"
> +#include "libavutil/riscv/cpu.h"
>  #include "libavcodec/lpc.h"
> 
>  void ff_lpc_apply_welch_window_rvv(const int32_t *, ptrdiff_t, double *);
> +void ff_lpc_compute_autocorr_rvv(const double *, ptrdiff_t, int, double *);
> 
>  av_cold void ff_lpc_init_riscv(LPCContext *c)
>  {
>  #if HAVE_RVV && (__riscv_xlen >= 64)
>  int flags = av_get_cpu_flags();
> 
> -if ((flags & AV_CPU_FLAG_RVV_F64) && (flags & AV_CPU_FLAG_RVB_ADDR))
> +if ((flags & AV_CPU_FLAG_RVV_F64) && (flags & AV_CPU_FLAG_RVB_ADDR)) {
>  c->lpc_apply_welch_window = ff_lpc_apply_welch_window_rvv;
> +
> +if (ff_get_rv_vlenb() >= c->max_order)
> +c->lpc_compute_autocorr = ff_lpc_compute_autocorr_rvv;
> +}
>  #endif
>  }
> diff --git a/libavcodec/riscv/lpc_rvv.S b/libavcodec/riscv/lpc_rvv.S
> index f81a2392c1..654156bf12 100644
> --- a/libavcodec/riscv/lpc_rvv.S
> +++ b/libavcodec/riscv/lpc_rvv.S
> @@ -85,4 +85,33 @@ func ff_lpc_apply_welch_window_rvv, zve64d
> 
>  ret
>  endfunc
> +
> +func ff_lpc_compute_autocorr_rvv, zve64d
> +li        t0, 1
> +vsetvli   t1, a2, e64, m8, ta, ma

t1 is unused, so the destination should be zero. This is a leftover from an 
incomplete attempt to unroll.

> +fcvt.d.l  ft0, t0
> +vle64.v   v0, (a0)
> +sh3add    a0, a2, a0   # data += lag
> +vfmv.v.f  v16, ft0
> +bge   a2, a1, 2f
> +1:
> +vfmv.f.s  ft0, v0
> +fld   ft1, (a0)# ft1 = data[lag + i]
> +vfmacc.vf v16, ft0, v0 # v16[j] += data[i] * data[i + j]
> +addi  a1, a1, -1
> +vfslide1down.vf v0, v0, ft1
> +addi  a0, a0, 8
> +bgt   a1, a2, 1b   # while (len > lag);
> +2:
> +vfmv.f.s  ft0, v0
> +vsetvli   zero, a1, e64, m8, tu, ma
> +vfmacc.vf v16, ft0, v0
> +addi  a1, a1, -1
> +vslide1down.vx v0, v0, zero
> +bnez  a1, 2b   # while (len > 0);
> +
> +vsetvli   zero, a2, e64, m8, ta, ma
> +vse64.v   v16, (a3)
> +ret
> +endfunc
>  #endif


-- 
レミ・デニ-クールモン
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH 2/2] configure: disable locale use in spa plugin

2023-12-28 Thread Rémi Denis-Courmont


On 27 December 2023 17:25:03 GMT+01:00, Abhishek Ojha wrote:
>This commit requires to resolve the compilation error of pipewiregrab
>because Pipewire's spa plugin is requesting locale_t extension to
>compile.
>Which was added in POSIX 2008 but ffmpeg is using POSIX 2001 due to
>which spa plugin complains. __LOCALE_C_ONLY flag is set to disable
>the locale usage in spa plugin. Adding it in configure file fix both
>the library test and source compilation issue.
>Not sure if this is the right approach to fix the issue.
>Feedback/Suggestions will be highly appreciated.

AFAIK, glibc requires that macros with a single underscore be set by the 
external code before glibc headers, while those with two leading underscores 
are for internal glibc (header) use.

So then this seems undefined.
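
For comparison, the documented way to ask for locale_t is a public feature
test macro defined before any system header is included. The sketch below
assumes a POSIX.1-2008 libc such as glibc; the function name
have_posix2008_locale is hypothetical and this is not a proposed configure
change.

```c
#define _POSIX_C_SOURCE 200809L /* must precede every system header */

#include <locale.h>

/* Hypothetical probe: returns 1 if a "C" locale object can be created. */
int have_posix2008_locale(void)
{
    locale_t loc = newlocale(LC_ALL_MASK, "C", (locale_t)0);

    if (loc == (locale_t)0)
        return 0;

    freelocale(loc);
    return 1;
}
```

Whether raising the POSIX level is acceptable for FFmpeg as a whole is a
separate question; the sketch only shows which macro belongs to the public
interface.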

>
>Signed-off-by: Abhishek Ojha 
>---
> configure | 2 ++
> 1 file changed, 2 insertions(+)
>
>diff --git a/configure b/configure
>index 375327d5fa..442d004258 100755
>--- a/configure
>+++ b/configure
>@@ -7106,6 +7106,8 @@ if enabled libxcb; then
> enabled libxcb_xfixes && check_pkg_config libxcb_xfixes xcb-xfixes xcb/xfixes.h xcb_xfixes_get_cursor_image
> fi
> 
>+# _POSIX_C_SOURCE=200112 doesn't support locale
>+add_cppflags -D__LOCALE_C_ONLY
> enabled libpipewire && check_pkg_config libpipewire "libpipewire-0.3 >= 0.3.40" pipewire/pipewire.h pw_init
> if enabled libpipewire; then
> enabled libgio_unix && check_pkg_config libgio_unix gio-unix-2.0 gio/gio.h g_main_loop_new


Re: [FFmpeg-devel] [RFC] fftools/ffmpeg and libavdevice/sdl issue

2023-12-19 Thread Rémi Denis-Courmont
On Tuesday, 19 December 2023, 18.58.40 EET, Michael Niedermayer wrote:
> so the idea is that we cannot access any GUI in any code from anything in
> libavformat and probably all other libs, ever

No. The idea is that a command line program cannot use the GUI, and a library 
can only use the GUI if the main program is a GUI program.

> no debug with graphical output
> no vissualizuation of anything
> no devices
> no libs that expose anything that would need a GUI for configuration

You can do all those things as long as you assume that the main program is a 
GUI program running the Mac-specific UI main loop. AFAICT, you can write a 
macOS OpenGL or Metal video output device and a CoreAudio audio output device, 
and any GUI program that uses FFmpeg can then use those devices.

Trying to fit this into a generic portable command line tool is not going to 
work though. Then SDL adds the extra problem that it probably only works with 
its own API driving the main loop, and not just any API layered on top of Mac 
GUI frameworks. If so, the only proper way to support it (on Mac) is to make a 
dedicated player. Of course, it's also possible to skip support for SDL on 
Mac.


Lastly, it has been made clear by the proponents of the muxer that this is but 
a convenient trick so that a real muxer can be swapped for a renderer wherever 
a muxer is expected. In other words, it is a literal kludge (from the 
wiktionary: "Any construction or practice, typically crude yet effective, 
designed to solve a problem temporarily or expediently"). That's pretty much 
the antithesis of good and sound API design.

-- 
Rémi Denis-Courmont
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH 6/6] lavc/takdsp: R-V V decorrelate_sm

2023-12-21 Thread Rémi Denis-Courmont
On Monday, 18 December 2023, 17.16.27 EET, flow gg wrote:
> C908:
> decorrelate_sm_c: 130.0
> decorrelate_sm_rvv_i32: 43.7

+
+func ff_decorrelate_sm_rvv, zve32x
+1:
+vsetvli  t0, a2, e32, m8, ta, ma
+vle32.v  v0, (a0)
+sub a2,  a2, t0
+vle32.v  v8, (a1)
+vsra.vi  v16, v8, 1

You should load v8 first, since it is used as input before v0.

+vsub.vv  v0, v0, v16
+vse32.v  v0, (a0)
+sh2add   a0, t0, a0
+vadd.vv  v0, v0, v8

You can use VSSRA, and then VADD won't need to depend on the output of VSUB.

+vse32.v  v0, (a1)
+sh2add   a1, t0, a1
+bnez a2, 1b
+ret
+endfunc
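
As a reading aid, the vector loop quoted above corresponds roughly to the
scalar C below. This is a sketch derived from the quoted assembly only, with
a hypothetical function name, and it assumes that >> on a negative int32_t is
an arithmetic shift, as on the usual targets.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar equivalent of the quoted loop:
 * a0 -> p1, a1 -> p2, a2 -> length. */
void decorrelate_sm_scalar(int32_t *p1, int32_t *p2, int length)
{
    assert(length >= 0);

    for (int i = 0; i < length; i++) {
        int32_t a = p1[i];
        int32_t b = p2[i];

        a -= b >> 1;   /* vsra.vi then vsub.vv */
        p1[i] = a;
        p2[i] = a + b; /* vadd.vv, which depends on the vsub.vv result */
    }
}
```

The VSSRA remark presumably relies on b - (b >> 1) being equal to
(b + 1) >> 1, a rounding shift, so the second output could be formed from
the original p1 value without waiting for the subtraction. That identity
holds for arithmetic shifts; whether it maps onto the rounding mode in effect
has not been checked here.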

-- 
雷米‧德尼-库尔蒙
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH 6/6] lavc/takdsp: R-V V decorrelate_sm

2023-12-21 Thread Rémi Denis-Courmont
On Thursday, 21 December 2023, 18.07.55 EET, Rémi Denis-Courmont wrote:
> You can use VSSRA, and then VADD won't need to depend on the output of VSUB.

P.S.: I have NOT checked which approach is actually faster.

-- 
Rémi Denis-Courmont
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH] libavfilter/af_afir: R-V V dcmul_add

2023-12-21 Thread Rémi Denis-Courmont
On Tuesday, 19 December 2023, 4.53.12 EET, flow gg wrote:
> c908:
> dcmul_add_c: 88.0
> dcmul_add_rvv_f64: 46.2
> 
> Did not use vlseg2e64, because it is much slower than vlse64
> Did not use vsseg2e64, because it is slightly slower than vsse64

Is this about C910 or C908? I have not checked this specific function, but the 
general understanding for C908 has been the exact opposite so far, i.e. 
segmented accesses are fast, while strided accesses are (unsurprisingly) slow.

See also https://camel-cdr.github.io/rvv-bench-results/canmv_k230/index.html

-- 
レミ・デニ-クールモン
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH] checkasm: Generalize crash handling

2023-12-21 Thread Rémi Denis-Courmont
On Tuesday, 19 December 2023, 14.02.00 EET, Martin Storsjö wrote:
> This replaces the riscv specific handling from
> 7212466e735aa187d82f51dadbce957fe3da77f0 (which essentially is
> reverted, together with 286d6742218ba0235c32876b50bf593cb1986353)
> with a different implementation of the same (plus a bit more), based
> on the corresponding feature in dav1d's checkasm, supporting both Unix
> and Windows.
> 
> See in particular dav1d commits
> 0b6ee30eab2400e4f85b735ad29a68a842c34e21 and
> 0421f787ea592fd2cc74c887f20b8dc31393788b, authored by
> Henrik Gramner.
> 
> The overall approach is the same; set up a signal handler,
> store the state with setjmp/sigsetjmp, jump out of the crashing
> function with longjmp/siglongjmp.
> 
> The main difference is in what happens when the signal handler
> is invoked. In the previous implementation, it would resume from
> right before calling the crashing function, and then skip that call
> based on the setjmp return value.
> 
> In the imported implementation from dav1d, we return to right before
> the check_func() call, which will skip testing the current function
> (as the pointer is the same as it was before).
> 
> Other differences are:
> - Support for other signal handling mechanisms (Windows
>   AddVectoredExceptionHandler)
> - Using RtlCaptureContext/RtlRestoreContext instead of setjmp/longjmp
>   on Windows with SEH (which adds the design limitation that it doesn't
>   return a value like setjmp does)
> - Only catching signals once per function - if more than one
>   signal is delivered before signal handling is reenabled, any
>   signal is handled as it would without our handler
> - Not using an arch specific signal handler written in assembly
> ---
>  tests/checkasm/checkasm.c   | 100 ++--
>  tests/checkasm/checkasm.h   |  79 ++---
>  tests/checkasm/riscv/checkasm.S |  12 
>  3 files changed, 140 insertions(+), 51 deletions(-)
> 
> diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
> index 6318d9296b..668034c67f 100644
> --- a/tests/checkasm/checkasm.c
> +++ b/tests/checkasm/checkasm.c
> @@ -23,8 +23,10 @@
>  #include "config.h"
>  #include "config_components.h"
> 
> -#ifndef _GNU_SOURCE
> -# define _GNU_SOURCE // for syscall (performance monitoring API), strsignal()
> +#if CONFIG_LINUX_PERF
> +# ifndef _GNU_SOURCE
> +#  define _GNU_SOURCE // for syscall (performance monitoring API)
> +# endif
>  #endif
> 
>  #include 
> @@ -326,6 +328,7 @@ static struct {
>  const char *cpu_flag_name;
>  const char *test_name;
>  int verbose;
> +int catch_signals;

AFAICT, this needs to be volatile sig_atomic_t

>  } state;
> 
>  /* PRNG state */
> @@ -627,6 +630,64 @@ static CheckasmFunc *get_func(CheckasmFunc **root, const char *name)
>  return f;
>  }
> 
> +checkasm_context checkasm_context_buf;
> +
> +/* Crash handling: attempt to catch crashes and handle them
> + * gracefully instead of just aborting abruptly. */
> +#ifdef _WIN32
> +#if WINAPI_FAMILY_PARTITION(WINAPI_PARTITION_DESKTOP)
> +static LONG NTAPI signal_handler(EXCEPTION_POINTERS *const e) {
> +const char *err;
> +
> +if (!state.catch_signals)
> +return EXCEPTION_CONTINUE_SEARCH;
> +
> +switch (e->ExceptionRecord->ExceptionCode) {
> +case EXCEPTION_FLT_DIVIDE_BY_ZERO:
> +case EXCEPTION_INT_DIVIDE_BY_ZERO:
> +err = "fatal arithmetic error";
> +break;
> +case EXCEPTION_ILLEGAL_INSTRUCTION:
> +case EXCEPTION_PRIV_INSTRUCTION:
> +err = "illegal instruction";
> +break;
> +case EXCEPTION_ACCESS_VIOLATION:
> +case EXCEPTION_ARRAY_BOUNDS_EXCEEDED:
> +case EXCEPTION_DATATYPE_MISALIGNMENT:
> +case EXCEPTION_STACK_OVERFLOW:
> +err = "segmentation fault";
> +break;
> +case EXCEPTION_IN_PAGE_ERROR:
> +err = "bus error";
> +break;
> +default:
> +return EXCEPTION_CONTINUE_SEARCH;
> +}
> +state.catch_signals = 0;
> +checkasm_fail_func("%s", err);
> +checkasm_load_context();
> +return EXCEPTION_CONTINUE_EXECUTION; /* never reached, but shuts up gcc */
> +}
> +#endif
> +#else
> +static void signal_handler(const int s) {
> +if (state.catch_signals) {
> +state.catch_signals = 0;
> +checkasm_fail_func("%s",
> +   s == SIGFPE ? "fatal arithmetic error" :
> +   s == SIGILL ? "illegal instruction" :
> +   s == SIGBUS ? "bus error" :
> + "segmentation fault");

The current code for the error print-out is both simpler and more versatile, 
so I don't get this.

> +checkasm_load_context();
> +} else {
> +/* fall back to the default signal handler */
> +static const struct sigaction default_sa = { .sa_handler = SIG_DFL };
> +sigaction(s, &default_sa, NULL);
> +raise(s);

Why raise here? Returning from the handler will reevaluate the 

Re: [FFmpeg-devel] [PATCH] checkasm: Generalize crash handling

2023-12-21 Thread Rémi Denis-Courmont


On 22 December 2023 00:03:59 GMT+02:00, Henrik Gramner via ffmpeg-devel wrote:
>On Thu, Dec 21, 2023 at 9:16 PM Rémi Denis-Courmont  wrote:
>> > +checkasm_fail_func("%s",
>> > +   s == SIGFPE ? "fatal arithmetic error" :
>> > +   s == SIGILL ? "illegal instruction" :
>> > +   s == SIGBUS ? "bus error" :
>> > + "segmentation fault");
>>
>> The current code for the error print-out is both simpler and more versatile,
>> so I don't get this.
>
>IMO "illegal instruction" is a far better error message than "fatal
>signal 4" (with an implementation-defined number which nobody knows
>the meaning of without having to look it up).

The current code prints the number and the name.

>
>> > +/* fall back to the default signal handler */
>> > +static const struct sigaction default_sa = { .sa_handler = SIG_DFL };
>> > +sigaction(s, &default_sa, NULL);
>> > +raise(s);
>>
>> Why raise here? Returning from the handler will reevaluate the same code with
>> the same thread state, and trigger the default signal handler anyway (since
>> you don't modify the user context).
>
>No it wont, it'll get stuck in an infinite loop invoking the signal
>handler over and over. At least on my system.

No, it won't, since the default signal handler was restored. And it's much less 
confusing to debug if the signal comes from where it was actually triggered 
than from an explicit raise() call.

>
>> > +const struct sigaction sa = {
>> > +.sa_handler = signal_handler,
>> > +.sa_flags = SA_NODEFER,
>>
>> That does not look very sane to me. If a recursive signal occurs, processing
>> it recursively is NOT a good idea. This would cause an infinite loop,
>> eventually a literal stack overflow after maxing out the processor for a 
>> while.
>> I'd rather let the OS kernel deal with the problem, by killing the process or
>> whatever the last resort is.
>>
>> > +#define checkasm_save_context() setjmp(checkasm_context_buf)
>> > +#define checkasm_load_context() longjmp(checkasm_context_buf, 1)
>> > +#endif
>>
>> Been there done that and it did not end well.
>> sigsetjmp() & co are necessary here.
>
>For all intents and purposes sigjmp()/longjmp() with SA_NODEFER does
>the same thing as sigsetjmp()/siglongjmp() without SA_NODEFER for this
>particular use case (no infinite recursion is possible the way the
>code is written). The change isn't necessary per se but it seems
>reasonable and I have no objections to it.
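
To illustrate the sigsetjmp()/siglongjmp() variant discussed above, here is a
self-contained sketch. It is not the checkasm or dav1d code: all names
(install_crash_handler, run_guarded, demo_ok, demo_crash) are hypothetical,
the crash is simulated with raise(), and the handler is installed without
SA_NODEFER so that the signal mask is restored by siglongjmp() instead.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

static sigjmp_buf crash_ctx;
static volatile sig_atomic_t catch_signals; /* armed only around the call */
static volatile sig_atomic_t last_signal;

static void on_crash(int sig)
{
    if (catch_signals) {
        catch_signals = 0;
        last_signal = sig;
        siglongjmp(crash_ctx, 1); /* also restores the saved signal mask */
    }
    /* Not armed: restore the default disposition and simply return. */
    signal(sig, SIG_DFL);
}

void install_crash_handler(void)
{
    static const int sigs[] = { SIGBUS, SIGFPE, SIGILL, SIGSEGV };
    struct sigaction sa;

    sa.sa_handler = on_crash;
    sa.sa_flags = 0; /* no SA_NODEFER: signal stays blocked in the handler */
    sigemptyset(&sa.sa_mask);
    for (size_t i = 0; i < sizeof(sigs) / sizeof(sigs[0]); i++)
        sigaction(sigs[i], &sa, NULL);
}

/* Returns 0 if fn() completed, or the number of the caught signal. */
int run_guarded(void (*fn)(void))
{
    assert(fn != NULL);

    if (sigsetjmp(crash_ctx, 1) != 0) /* 1: save the signal mask */
        return (int)last_signal;

    catch_signals = 1;
    fn();
    catch_signals = 0;
    return 0;
}

void demo_ok(void) { }
void demo_crash(void) { raise(SIGILL); } /* stands in for a real fault */
```

Calling run_guarded(demo_crash) twice in a row returns SIGILL both times,
which works only because the mask saved by sigsetjmp() is restored on the
jump. With plain setjmp()/longjmp() and no SA_NODEFER, the second signal
could stay blocked.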


Re: [FFmpeg-devel] [PATCH] checkasm: Generalize crash handling

2023-12-21 Thread Rémi Denis-Courmont


On 21 December 2023 22:16:09 GMT+02:00, "Rémi Denis-Courmont" wrote:
>On Tuesday, 19 December 2023, 14.02.00 EET, Martin Storsjö wrote:
>> This replaces the riscv specific handling from
>> 7212466e735aa187d82f51dadbce957fe3da77f0 (which essentially is
>> reverted, together with 286d6742218ba0235c32876b50bf593cb1986353)
>> with a different implementation of the same (plus a bit more), based
>> on the corresponding feature in dav1d's checkasm, supporting both Unix
>> and Windows.
>> 
>> See in particular dav1d commits
>> 0b6ee30eab2400e4f85b735ad29a68a842c34e21 and
>> 0421f787ea592fd2cc74c887f20b8dc31393788b, authored by
>> Henrik Gramner.
>> 
>> The overall approach is the same; set up a signal handler,
>> store the state with setjmp/sigsetjmp, jump out of the crashing
>> function with longjmp/siglongjmp.
>> 
>> The main difference is in what happens when the signal handler
>> is invoked. In the previous implementation, it would resume from
>> right before calling the crashing function, and then skip that call
>> based on the setjmp return value.
>> 
>> In the imported implementation from dav1d, we return to right before
>> the check_func() call, which will skip testing the current function
>> (as the pointer is the same as it was before).
>> 
>> Other differences are:
>> - Support for other signal handling mechanisms (Windows
>>   AddVectoredExceptionHandler)
>> - Using RtlCaptureContext/RtlRestoreContext instead of setjmp/longjmp
>>   on Windows with SEH (which adds the design limitation that it doesn't
>>   return a value like setjmp does)
>> - Only catching signals once per function - if more than one
>>   signal is delivered before signal handling is reenabled, any
>>   signal is handled as it would without our handler
>> - Not using an arch specific signal handler written in assembly
>> ---
>>  tests/checkasm/checkasm.c   | 100 ++--
>>  tests/checkasm/checkasm.h   |  79 ++---
>>  tests/checkasm/riscv/checkasm.S |  12 
>>  3 files changed, 140 insertions(+), 51 deletions(-)
>> 
>> diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
>> index 6318d9296b..668034c67f 100644
>> --- a/tests/checkasm/checkasm.c
>> +++ b/tests/checkasm/checkasm.c
>> @@ -23,8 +23,10 @@
>>  #include "config.h"
>>  #include "config_components.h"
>> 
>> -#ifndef _GNU_SOURCE
>> -# define _GNU_SOURCE // for syscall (performance monitoring API), strsignal()
>> +#if CONFIG_LINUX_PERF
>> +# ifndef _GNU_SOURCE
>> +#  define _GNU_SOURCE // for syscall (performance monitoring API)
>> +# endif
>>  #endif
>> 
>>  #include 
>> @@ -326,6 +328,7 @@ static struct {
>>  const char *cpu_flag_name;
>>  const char *test_name;
>>  int verbose;
>> +int catch_signals;
>
>AFAICT, this needs to be volatile sig_atomic_t
>
>>  } state;
>> 
>>  /* PRNG state */
>> @@ -627,6 +630,64 @@ static CheckasmFunc *get_func(CheckasmFunc **root, const char *name)
>>  return f;
>>  }
>> 
>> +checkasm_context checkasm_context_buf;
>> +
>> +/* Crash handling: attempt to catch crashes and handle them
>> + * gracefully instead of just aborting abruptly. */
>> +#ifdef _WIN32
>> +#if WINAPI_FAMILY_PARTITION(WINAPI_PARTITION_DESKTOP)
>> +static LONG NTAPI signal_handler(EXCEPTION_POINTERS *const e) {
>> +const char *err;
>> +
>> +if (!state.catch_signals)
>> +return EXCEPTION_CONTINUE_SEARCH;
>> +
>> +switch (e->ExceptionRecord->ExceptionCode) {
>> +case EXCEPTION_FLT_DIVIDE_BY_ZERO:
>> +case EXCEPTION_INT_DIVIDE_BY_ZERO:
>> +err = "fatal arithmetic error";
>> +break;
>> +case EXCEPTION_ILLEGAL_INSTRUCTION:
>> +case EXCEPTION_PRIV_INSTRUCTION:
>> +err = "illegal instruction";
>> +break;
>> +case EXCEPTION_ACCESS_VIOLATION:
>> +case EXCEPTION_ARRAY_BOUNDS_EXCEEDED:
>> +case EXCEPTION_DATATYPE_MISALIGNMENT:
>> +case EXCEPTION_STACK_OVERFLOW:
>> +err = "segmentation fault";
>> +break;
>> +case EXCEPTION_IN_PAGE_ERROR:
>> +err = "bus error";
>> +break;
>> +default:
>> +return EXCEPTION_CONTINUE_SEARCH;
>> +}
>> +state.catch_signals = 0;
>> +checkasm_fail_func("%s", err);
>> +checkasm_load_context();
>> +  

Re: [FFmpeg-devel] [PATCH] lavc/lpc: R-V V apply_welch_window

2023-12-11 Thread Rémi Denis-Courmont


On 11 December 2023 11:11:28 GMT+02:00, Anton Khirnov wrote:
>Quoting Rémi Denis-Courmont (2023-12-08 18:46:51)
>> +#if __riscv_xlen >= 64
>> +func ff_lpc_apply_welch_window_rvv, zve64d
>> +vsetvli t0, zero, e64, m8, ta, ma
>> +vid.v   v0
>> +addi t2, a1, -1
>> +vfcvt.f.xu.v v0, v0
>> +li  t3, 2
>> +fcvt.d.l ft2, t2
>> +srai t1, a1, 1
>> +fcvt.d.l ft3, t3
>> +li  t4, 1
>> +fdiv.d  ft0, ft3, ft2# ft0 = c = 2. / (len - 1)
>> +fcvt.d.l fa1, t4 # fa1 = 1.
>> +fsub.d  ft1, ft0, fa1
>> +vfrsub.vf v0, v0, ft1# v0[i] = c - i - 1.
>> +1:
>> +vsetvli t0, t1, e64, m8, ta, ma
>> +vfmul.vv v16, v0, v0  # no fused multipy-add as v0 is reused
>> +sub t1, t1, t0
>> +vle32.v v8, (a0)
>> +fcvt.d.l ft2, t0
>> +vfrsub.vf v16, v16, fa1  # v16 = 1. - w * w
>> +sh2add  a0, t0, a0
>> +vsetvli zero, zero, e32, m4, ta, ma
>> +vfwcvt.f.x.v v24, v8
>> +vsetvli zero, zero, e64, m8, ta, ma
>> +vfsub.vf v0, v0, ft2 # v0 -= vl
>> +vfmul.vv v8, v24, v16
>> +vse64.v v8, (a2)
>> +sh3add  a2, t0, a2
>> +bnez t1, 1b
>> +
>> +andi t1, a1, 1
>> +beqz t1, 2f
>> +
>> +sd  zero, (a2)
>> +addi a0, a0, 4
>> +addi a2, a2, 8
>> +2:
>> +vsetvli t0, zero, e64, m8, ta, ma
>> +vid.v   v0
>> +srai t1, a1, 1
>> +vfcvt.f.xu.v v0, v0
>> +fcvt.d.l ft1, t1
>> +fsub.d  ft1, ft0, ft1# ft1 = c - (len / 2)
>> +vfadd.vf v0, v0, ft1 # v0[i] = c - (len / 2) + i
>> +3:
>> +vsetvli t0, t1, e64, m8, ta, ma
>> +vfmul.vv v16, v0, v0
>> +sub t1, t1, t0
>> +vle32.v v8, (a0)
>> +fcvt.d.l ft2, t0
>> +vfrsub.vf v16, v16, fa1  # v16 = 1. - w * w
>> +sh2add  a0, t0, a0
>> +vsetvli zero, zero, e32, m4, ta, ma
>> +vfwcvt.f.x.v v24, v8
>> +vsetvli zero, zero, e64, m8, ta, ma
>> +vfadd.vf v0, v0, ft2 # v0 += vl
>> +vfmul.vv v8, v24, v16
>> +vse64.v v8, (a2)
>> +sh3add  a2, t0, a2
>> +bnez t1, 3b
>
>I think it'd look a lot less like base64 < /dev/random if you vertically
>aligned the first operands.

They are aligned to the 17th column. The problem is that quite a few vector 
mnemonics are longer than 7 characters.

>
>-- 
>Anton Khirnov


Re: [FFmpeg-devel] [PATCH] lavc/lpc: R-V V apply_welch_window

2023-12-11 Thread Rémi Denis-Courmont


On 11 December 2023 11:57:50 GMT+02:00, Anton Khirnov wrote:
>Quoting Rémi Denis-Courmont (2023-12-11 10:50:53)
>> On 11 December 2023 11:11:28 GMT+02:00, Anton Khirnov wrote:
>> >I think it'd look a lot less like base64 < /dev/random if you vertically
>> >aligned the first operands.
>> 
>> They are aligned to the 17th column. The problem is that quite a few vector 
>> mnemonics are longer than 7 characters.
>
>Align to 25 or 33 then?

IMO that's even worse. The operands end up so far from most mnemonics that it 
hurts legibility more than it helps.

I initially aligned to the longest mnemonic, but that turned out badly whenever 
revectoring (for obvious reasons).

>
>-- 
>Anton Khirnov


Re: [FFmpeg-devel] [PATCH 1/2] checkasm: test for abs_pow34

2023-12-11 Thread Rémi Denis-Courmont
On Saturday, 9 December 2023, 12.45.03 EET, flow gg wrote:
> There's a strange issue:
> 
> Adding tests can compile successfully on x86 and lichee4a (risc v), but it
> results in an error on k230.
> 
> > collect2: fatal error: ld terminated with signal 9 [Killed]
> > compilation terminated.
> > 
> > [32833.539109] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),task=ld,pid=6804,uid=1000
> > [32833.547321] Out of memory: Killed process 6804 (ld) total-vm:363180kB, anon-rss:357536kB, file-rss:932kB, shmem-rss:0kB, UID:1000 pgtables:732kB oom_score_adj:0
> > [32833.653223] oom_reaper: reaped process 6804 (ld), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> 
> If I remove the line 1429 with FF_CODEC_ENCODE_CB(aac_encode_frame), there
> is no error on k230, but I am unsure of the reason.

Looks like a plain dumb out-of-memory situation?
Not all that surprising when the hardware has only 512 MiB of RAM.
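
For a rough sense of scale, a minimal sketch (not from the report) that pulls
the resident-set figures out of the kernel log line quoted in the report and
compares them with the 512 MiB mentioned here. The helper name `field_kib` and
the regular expression are assumptions made for illustration.

```python
import re

# Kernel log line copied from the quoted report.
LOG = ("Out of memory: Killed process 6804 (ld) total-vm:363180kB, "
       "anon-rss:357536kB, file-rss:932kB, shmem-rss:0kB")

TOTAL_RAM_MIB = 512  # figure stated in this reply


def field_kib(log: str, name: str) -> int:
    """Extract a 'name:<digits>kB' field from the log line, in KiB."""
    match = re.search(rf"{re.escape(name)}:(\d+)kB", log)
    if match is None:
        raise ValueError(f"{name} not found in log line")
    return int(match.group(1))


anon_rss_mib = field_kib(LOG, "anon-rss") / 1024
share = anon_rss_mib / TOTAL_RAM_MIB
print(f"ld anon-rss: {anon_rss_mib:.1f} MiB "
      f"({share:.0%} of {TOTAL_RAM_MIB} MiB)")
```

By this estimate the linker alone held about two thirds of the stated RAM when
it was killed, before counting the kernel and everything else running on the
board.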

-- 
レミ・デニ-クールモン
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH] lavc/ac3: add R-V Zbb extract_exponents

2023-12-11 Thread Rémi Denis-Courmont
On Monday, 4 December 2023, 4.34.26 EET, Zhao Zhili wrote:
> > So really you're better off with GCC. RISC-V support on LLVM is pretty
> > sad, TBH.
> OK, I just wanted to check whether this was an unknown issue. I’m totally
> fine to stay with GCC.

Build should be fixed (which means the "offending" files should no longer be 
compiled).

-- 
Rémi Denis-Courmont
http://www.remlab.net/




