Re: [FFmpeg-devel] Improved SAMI support

2016-09-30 Thread Clément Bœsch
On Fri, Sep 30, 2016 at 12:28:14AM +0200, Jehan Pagès wrote:
> Hello!
> 
> So I posted a patch about SAMI support
> (https://trac.ffmpeg.org/ticket/3118), but I was told it would be
> ignored on the bug tracker, therefore to use the dev mailing list. So
> patch attached, and a small description:
> 
> SAMI subtitles can have several languages (as explained in the spec:
> https://msdn.microsoft.com/en-us/library/ms971327.aspx). But currently
> this is not supported by ffmpeg. What ffmpeg does is simply using
> subtitles for all language, and when 2 subtitles uses the same
> timestamp (which obviously happen for most text in the file since they
> are made for the same video), the one later defined in the file will
> override the first definition.
> Example:
> 
> > Some text in Korean (for instance)
> > […]
> > The same text but in English.
> 
> Result: the Korean text will never be displayed.
> Also you can end up with mix of languages (versions of a subtitle in
> another lang may have additional timestamps which don't get
> overriden). As a consequence, I have to edit nearly every SMI file
> which comes into my hand and delete all text from the language I don't
> want myself. And that's extremely annoying.
> 

> My attached patch is a first step. It will only take into account
> subtitles set for the default language (which means, the first defined
> in the SAMI file, cf. the spec), or without a language (therefore a
> subtitle can be common to all langs). Of course, the perfect version
> should extract 1 subtitle track per lang in the file. This first patch
> does not (maybe another one later!). But that's still a huge
> improvement from the current code since usually all the SMI files I
> find, the default language is indeed the one I need (since they were
> done for this purpose). I have used this patch for the last 2 days,
> and that's already a huge relief for me. :-)
> 
> Could this be integrated into ffmpeg?

Creating multiple streams is essential IMO (the api is ready;
lavf/subtitles supports this). You can check how VobSub are handled.

Then using regex is not reliable enough. And if you absolutely want to use
it, you need to add configure checks because it's likely not portable
enough. But I'd very much encourage you to have a proper parser.

Regards,

-- 
Clément B.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


[FFmpeg-devel] Improved SAMI support

2016-09-29 Thread Jehan Pagès
Hello!

So I posted a patch about SAMI support
(https://trac.ffmpeg.org/ticket/3118), but I was told it would be
ignored on the bug tracker, therefore to use the dev mailing list. So
patch attached, and a small description:

SAMI subtitles can have several languages (as explained in the spec:
https://msdn.microsoft.com/en-us/library/ms971327.aspx). But currently
this is not supported by ffmpeg. What ffmpeg does is simply using
subtitles for all language, and when 2 subtitles uses the same
timestamp (which obviously happen for most text in the file since they
are made for the same video), the one later defined in the file will
override the first definition.
Example:

> Some text in Korean (for instance)
> […]
> The same text but in English.

Result: the Korean text will never be displayed.
Also you can end up with mix of languages (versions of a subtitle in
another lang may have additional timestamps which don't get
overriden). As a consequence, I have to edit nearly every SMI file
which comes into my hand and delete all text from the language I don't
want myself. And that's extremely annoying.

My attached patch is a first step. It will only take into account
subtitles set for the default language (which means, the first defined
in the SAMI file, cf. the spec), or without a language (therefore a
subtitle can be common to all langs). Of course, the perfect version
should extract 1 subtitle track per lang in the file. This first patch
does not (maybe another one later!). But that's still a huge
improvement from the current code since usually all the SMI files I
find, the default language is indeed the one I need (since they were
done for this purpose). I have used this patch for the last 2 days,
and that's already a huge relief for me. :-)

Could this be integrated into ffmpeg?
Thanks!

Jehan

-- 
ZeMarmot open animation film
http://film.zemarmot.net
Patreon: https://patreon.com/zemarmot
Tipeee: https://www.tipeee.com/zemarmot
From 0cd4d92b98ecbf364891038b851740291cf75219 Mon Sep 17 00:00:00 2001
From: Jehan 
Date: Wed, 28 Sep 2016 03:28:52 +0200
Subject: [PATCH] avformat: basic language support in SAMI subtitles.

Different languages are not yet extracted as separate subtitle tracks.
This first basic version simply uses the default language and ignore any
others. The spec says: "If the user (or author) has not explicitly
selected a language, the first Class (language) definition will be used
by default."
See: https://msdn.microsoft.com/en-us/library/ms971327.aspx
---
 libavformat/realtextdec.c |  14 ++--
 libavformat/samidec.c | 167 ++
 libavformat/subtitles.c   |  22 +-
 libavformat/subtitles.h   |   5 +-
 4 files changed, 187 insertions(+), 21 deletions(-)

diff --git a/libavformat/realtextdec.c b/libavformat/realtextdec.c
index 618d4f7..c0b0e44 100644
--- a/libavformat/realtextdec.c
+++ b/libavformat/realtextdec.c
@@ -85,10 +85,12 @@ static int realtext_read_header(AVFormatContext *s)
 
 if (!av_strncasecmp(buf.str, "codecpar->extradata = av_strdup(buf.str);
 if (!st->codecpar->extradata) {
 res = AVERROR(ENOMEM);
@@ -105,12 +107,16 @@ static int realtext_read_header(AVFormatContext *s)
 goto end;
 }
 if (!merge) {
-const char *begin = ff_smil_get_attr_ptr(buf.str, "begin");
-const char *end   = ff_smil_get_attr_ptr(buf.str, "end");
+char *begin = ff_smil_get_attr_ptr(buf.str, "begin");
+char *end   = ff_smil_get_attr_ptr(buf.str, "end");
 
 sub->pos  = pos;
 sub->pts  = begin ? read_ts(begin) : 0;
 sub->duration = end ? (read_ts(end) - sub->pts) : duration;
+if (begin)
+av_free(begin);
+if (end)
+av_free(end);
 }
 }
 av_bprint_clear();
diff --git a/libavformat/samidec.c b/libavformat/samidec.c
index 7ea1bdf..3bd16ea 100644
--- a/libavformat/samidec.c
+++ b/libavformat/samidec.c
@@ -26,6 +26,7 @@
 
 #include "avformat.h"
 #include "internal.h"
+#include "regex.h"
 #include "subtitles.h"
 #include "libavcodec/internal.h"
 #include "libavutil/avstring.h"
@@ -34,6 +35,10 @@
 
 typedef struct {
 FFDemuxSubtitlesQueue q;
+char **lang_names;
+char **lang_codes;
+char **lang_ids;
+intn_langs;
 } SAMIContext;
 
 static int sami_probe(AVProbeData *p)
@@ -51,11 +56,19 @@ static int sami_read_header(AVFormatContext *s)
 SAMIContext *sami = s->priv_data;
 AVStream *st =