Re: [FFmpeg-user] Towards better trims & concatenations

Mark Filipak Mon, 08 Jan 2024 15:02:13 -0800

On 1/8/24 08:08, Rob Hallam wrote:

On Mon, 8 Jan 2024 at 12:37, Mark Filipak <markfilipak.i...@gmail.com> wrote:


On 1/8/24 07:16, Rob Hallam wrote:

On Mon, 8 Jan 2024 at 12:07, Mark Filipak <markfilipak.i...@gmail.com> wrote:

For example, if 'v' (video) and 'a' (audio) packets go from
v-a-a-a-a-v-a-a-a-a-v... to
a-a-a-a-a-a-a-a-v-v-v..., then something's wrong, eh? That's the kind of
difference I'm seeing between the two versions of 01.mp4.


Forgive me for jumping in in the middle here, but is that strictly
true?

Is what true? Is it true that the audio packets are bunched up, out of time sequence, and pushed to the front? Yes, it's true. That's why the MPV player has difficulty and doesn't start at 00:00:00.000. Part of that problem is that, for some unknown reason, ffmpeg creates one time_base for frame packets and a different time_base for audio packets. It seems to me that that's just looking for trouble.
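
For reference, a minimal Python sketch of how timestamps in two different time_bases relate to one clock. The time_base values below are illustrative assumptions, not values read from 01.mp4. A timestamp is a tick count, and multiplying it by its stream's time_base gives seconds.

```python
from fractions import Fraction

# Hypothetical time_bases; real files may use other values.
VIDEO_TB = Fraction(1, 15360)
AUDIO_TB = Fraction(1, 48000)

def to_seconds(ticks: int, time_base: Fraction) -> Fraction:
    """Convert a tick count to exact seconds in the given time_base."""
    return ticks * time_base

video_pts = 15360  # one second's worth of video ticks
audio_pts = 48000  # one second's worth of audio ticks

# Both timestamps land on the same instant once converted to seconds.
print(to_seconds(video_pts, VIDEO_TB) == to_seconds(audio_pts, AUDIO_TB))
```

Using exact fractions rather than floats avoids rounding when comparing timestamps from different streams.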

Honest question, perhaps the spec says that they should be
identical.


There is no spec that defines how to trim and concatenate.

Sorry, I don't understand you. Are you asking if I'm lying? I doubt it, but I don't know the antecedent of "that". Also, when you wrote "the spec", what spec did you have in mind?


For clarity, I wasn't accusing you of lying...


For clarity, I didn't think you were, and said so.

... and it certainly wasn't my
intention to imply that; my apologies if it sounded that way!

The 'that' in the above-quoted case was your example of packets-
clearly they are ordered differently, something has changed and
perhaps it shouldn't have changed.


There is no 'perhaps' about it.

I wondered if there was a practical
difference; to go back to the multiplication example, if you get 120
either way, does it matter if you do 3*4*10 versus 10*3*4 ? Sometimes
it does matter -- like in cases of floating-point maths -- but I am
wondering if ffmpeg here is producing something that appears different
but looks and sounds the same.


I address this further down.

I didn't have a particular spec in mind, but candidates would be
ffmpeg specs...


FFmpeg has specs? I'd surely like to see them.

...and/or specs for the container and codec formats in use-
ie does this behaviour contradict those.

I parse VOBs. I don't know the structures of M2TSs or MP4s or MKVs or anything else. But they all work off packet headers (e.g., PESs (packetized elementary streams)) that contain the structure and the settings that made the packet's payload what it is. There's no usage spec. Packet headers contain DTS, PTS, DAR, width, height, etc. Packet headers don't 'specify' how applications should create and maintain a valid packet table, nor do they specify packet table access methods. The specs just show structure. The H.262 spec goes a little further when it attempts to describe a virtual decoder machine for MPEG TS streams. That machine is a simple outline of how DTS & PTS work to render time ordered presentations from time disordered packets that are received. Illustrating such a small aspect of such a large procedure is like illustrating how the sun works by lighting a match. It's an important part, and the decoder model is good as far as it goes, but the rest is left up to the application and the specification is silent about that.

In much the same way a*b*c is equivalent to b*a*c, does the order of
packets necessarily matter if the output is perceptually the same?

Yes, time order matters. If two videos are perceptually the same, then they're the same; they have the same internals. You can't move frames or audio samples around and it not be perceived. Things can get so bad that players drop packets. Is that perceivable? Yes, at some level of probing, it is.

The frames and samples and chapters and subtitles are Legos. If you take the peak of a Lego building off and stick it onto the side of the building, is that perceivable?


This is not brain surgery. It's Legos.

Oh, I think I see your difficulty, Rob. "a*b*c" happens at one instant. It doesn't matter in what order the multiplication happens because it's all in a single instant. With video frames, order matters. Frames are separated in time -- out of order is visible.

The packets are in PTS order. Does the order of the packets matter? No, it's the order of the PTSs that matters.
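
A minimal Python sketch of the DTS/PTS idea, assuming a made-up four-frame group (the frame types and timestamp values are illustrative, not taken from any real file). Packets arrive in decode (DTS) order and are shown in PTS order.

```python
from typing import NamedTuple

class Packet(NamedTuple):
    kind: str  # frame type: "I", "P" or "B"
    dts: int   # decode timestamp
    pts: int   # presentation timestamp

# Hypothetical packets as received: the P-frame arrives before the
# B-frames that are displayed ahead of it.
received = [
    Packet("I", 0, 1),
    Packet("P", 1, 4),
    Packet("B", 2, 2),
    Packet("B", 3, 3),
]

def presentation_order(packets):
    """Return the packets sorted by PTS, i.e. the order they are shown in."""
    return sorted(packets, key=lambda p: p.pts)

print("".join(p.kind for p in received))                      # decode order
print("".join(p.kind for p in presentation_order(received)))  # display order
```

The storage order differs from the display order here, yet the PTS values alone are enough to recover the display order.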

If the output is not perceptually the same, or there are timing issues
/ desync / other problems as a result then I can see that being a
potentially important bug.


The MPV player misbehaves for all 6 of the sons. The starting running time is not "00:00:00.000".


Does it matter that the starting running time is not "00:00:00.000" ?


Yes.

I presume it does, otherwise you might not be raising this issue; but
in my ignorance I can see the possibility that the reported starting
running time is a 'cosmetic' issue rather than a functional one.

Trimming errors are wrecking concatenations. If DTSs & PTSs aren't smooth and continuous at the join, bad things happen. By that I don't mean that packets have to be in PTS order. They are about half the time and PTS-DTS varies between I-frames and P-frames and B-frames in order to allow the decoder time to decode and do the interframe correlations -- motion vectors and all that stuff. But the trimming has to take PTS into account so that the cut happens in the right spot with no leftover packets that shouldn't be there, but that apparently isn't happening and I have the proof.
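
A minimal Python sketch of one way to look for a break at a join, assuming each packet of a single stream is reduced to a (dts, duration) pair in file order. The function name and the sample values are hypothetical, and the check is deliberately strict: it flags any packet whose DTS is not exactly the previous DTS plus the previous duration, which real streams with variable durations may not satisfy.

```python
def find_discontinuities(packets):
    """Return (index, expected_dts, actual_dts) for every packet whose DTS
    does not follow directly from the previous packet's DTS + duration.

    packets: list of (dts, duration) tuples for ONE stream, in file order.
    """
    problems = []
    for i in range(1, len(packets)):
        prev_dts, prev_duration = packets[i - 1]
        expected = prev_dts + prev_duration
        actual = packets[i][0]
        if actual != expected:
            problems.append((i, expected, actual))
    return problems

# Made-up stream with a jump between the third and fourth packets,
# as might appear where two clips were joined.
clip = [(0, 512), (512, 512), (1024, 512), (2048, 512), (2560, 512)]
print(find_discontinuities(clip))  # [(3, 1536, 2048)]
```

An empty result means the DTS values run smoothly across the join under this strict assumption.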

I am
happy to be corrected and educated, which is partly why I am still
subscribed to this ML.

I ask these questions because "ffmpeg produces output that plays
incorrectly" is a different bug to "ffmpeg produces output that plays
correctly but has a different file structure". Both are bugs, but it's
worth being clear to which one the issues you have identified belong so
you and devs are on the same page.

To state it clearly, Rob, two MP4s for example that play correctly have the same structure. If one of them has a different structure, then one of them does not play correctly and that can be seen and/or heard. v-a-a-a-a-v-a-a-a-a-v versus a-a-a-a-a-a-a-a-v-v-v is my poor portrayal of such a difference that I am actually seeing.

PS I've been following along as I am also interested in cutting and
re-joining- my first query to this ML was about whether there's a way
to chop off the starts and ends of some clips, add transitions and
re-encode those short overlapping bits, and then join them back on to
their parent clips to avoid having to re-encode the whole lot

To be frank, Rob, if you want to help yourself, you may want to help me. I published my procedure. Duplicate it and apply it to some of the videos you've had problems with. Learn how to use '-f framecrc' and '-vf showinfo'. It will take you a while, but it will be time well spent. It will demystify a lot for you. I'll be here to help if you like.
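
A minimal Python sketch for reading a framecrc dump and summarizing the packet order as a v/a pattern. It assumes the line layout described for ffmpeg's framecrc muxer (stream_index, dts, pts, duration, size, CRC, with '#' header lines such as "#media_type 0: video"). The sample text is made up for illustration, and the function names are hypothetical.

```python
def parse_framecrc(text):
    """Parse framecrc text into (media, packets).

    media:   dict of stream index -> media type from '#media_type' headers
    packets: list of dicts with stream, dts, pts, duration, size
    """
    media = {}
    packets = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            if line.startswith("#media_type"):
                key, value = line.split(":", 1)   # "#media_type 0", " video"
                media[int(key.split()[1])] = value.strip()
            continue
        # Only the first five fields are used; the CRC and anything
        # after it are ignored.
        fields = [f.strip() for f in line.split(",")]
        stream, dts, pts, duration, size = (int(f) for f in fields[:5])
        packets.append({"stream": stream, "dts": dts, "pts": pts,
                        "duration": duration, "size": size})
    return media, packets

def mux_pattern(media, packets):
    """Summarize packet order as e.g. 'v-a-a-v' ('?' for unknown streams)."""
    return "-".join(media.get(p["stream"], "?")[0] for p in packets)

# Made-up sample in the assumed framecrc layout.
SAMPLE = """\
#tb 0: 1/25
#media_type 0: video
#tb 1: 1/48000
#media_type 1: audio
0,          0,          0,        1,     1000, 0x11111111
1,          0,          0,     1024,      400, 0x22222222
1,       1024,       1024,     1024,      400, 0x33333333
0,          1,          1,        1,      900, 0x44444444
"""

media, packets = parse_framecrc(SAMPLE)
print(mux_pattern(media, packets))  # v-a-a-v
```

Comparing the pattern from two versions of a file makes a difference in packet ordering easy to spot.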

The developers are interested in streaming methods and using them to get consulting jobs. That's as it should be because everyone needs to make a living. To get them to pay attention to this 'troll', I need allies.


Rob, video is not brain surgery. It's Legos.

-- Mark.

_______________________________________________
ffmpeg-user mailing list
ffmpeg-user@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-user

To unsubscribe, visit link above, or email
ffmpeg-user-requ...@ffmpeg.org with subject "unsubscribe".
