

Re: [whatwg] WebSRT feedback

2010-10-13 Thread Philip Jägenstedt

On Fri, 08 Oct 2010 04:39:43 -0700, Silvia Pfeiffer
silviapfeiff...@gmail.com wrote:



On 08/10/2010, at 1:28 PM, Philip Jägenstedt phil...@opera.com wrote:

On Thu, 07 Oct 2010 13:18:37 -0700, Silvia Pfeiffer  
silviapfeiff...@gmail.com wrote:


On Thu, Oct 7, 2010 at 4:06 PM, Philip Jägenstedt phil...@opera.com  
wrote:



On Thu, 07 Oct 2010 01:57:17 -0700, James Graham jgra...@opera.com
wrote:

On 10/06/2010 04:04 AM, Philip Jägenstedt wrote:


As an aside, the idea of using an HTML parser for the cue text wasn't

very popular.



Why? Were any technical reasons given?



The question was directed at the media player/framework developers present.
One of them didn't care and one was strongly opposed on the basis of bloat.
This was an aside; if anyone is serious about using the HTML fragment parser
for WebSRT, we really should approach the developer mailing lists of media
players/frameworks. I doubt we will find much love, but would be happy to be
shown wrong.




The one I talked to said that HTML markup should totally be used in cues (he
even mentioned more generally why we didn't pick up USF). The reason being
that it clearly defines extensibility and would in fact already provide any
use case that anyone can come up with, thus stopping people from inventing
their own screwed-up extensions, such as the use of ASS commands in {}
inside SRT subtitles.

The thing is: while the full set of features of HTML fragments seems bloat,
not every subtitle will consist of all the possible markup. Just like Web
pages are often created with very simple markup which uses less than 1% of
what HTML is capable of, we will see the same happening with subtitle cues.
But the availability and clear definition of how such features should be
used prevents the introduction of crappy extensions.


Even if very few subtitles use inline SVG, SVG in <object>, <img>,
<iframe>, <video>, self-referencing <track>, etc. in the cue text, all
implementations would have to support it in the same way for it to be
interoperable. That's quite an undertaking and I don't think it's
really worth it.




They all need to be interoperable on all of these features already. It
should be easier to keep them interoperable on something known and
already implemented than on a set of new features, particularly when the
new feature set is restricted: if features beyond the limited given set
are not available, custom markup will be produced by plugins etc.



As for extensibility, I suggest that we generalize the WebSRT parser  
somewhat to produce a normal DOM with elements in a non-HTML namespace  
and then use CSS to style them as usual. Unknown element names  
shouldn't be valid, of course, but they'd still appear in the DOM. If  
XML5 (http://annevankesteren.nl/2007/10/xml5) was ready, I'd suggest  
we use that, with the constraint that it should only be able to output  
elements in that non-HTML namespace. (Just thinking out loud here.)


I think that's ok, even though I think it makes more sense to have HTML  
fragments than arbitrary markup that is related but somewhat different.  
I think we are then just re-inventing HTML.


On Fri, 08 Oct 2010 05:20:28 -0700, Robert O'Callahan
rob...@ocallahan.org wrote:


User agents only need to be interoperable over the common subset of HTML
features they support. HTML is mostly designed to degrade gracefully when a
user agent encounters elements it doesn't support. The simplest possible
video player would use an HTML parser (hopefully off-the-shelf) to build
some kind of DOM structure. Then it can group text into paragraphs for
rendering, and ignore the rest of the content.

In practice, we'll have to deal with user agents that support different sets
of WebSRT features --- when version 2 of WebSRT is developed, if not before.

Why not use existing, proven machinery --- HTML --- to cope with that
situation?


I do think that a syntax that looks similar to HTML and XML should have  
similar parsing, which WebSRT currently doesn't. However, using HTML seems  
to create plenty of complications, such as:


* What are relative URLs in <a> and <img> relative to? Is it the
containing document or the WebSRT document? When following links, which
window is navigated?

* When are external resources like <img>, <object> and <video> loaded?

* If a WebSRT cue includes <video autoplay>, when should that nested video
play?

* If a WebSRT cue starting at time 0 includes a self-referring <video>
with a <track> that will be enabled by default, what should happen?

* When should the track be considered ready? This delays the
loadedmetadata event on <video>, see
http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#the-timed-tracks-are-ready


I'd like to understand in more detail what exactly is being suggested be  
done with the HTML fragments returned by the parser, in order to answer  
these questions. Neither of the two obvious implementation approaches  
temporary 

Re: [whatwg] WebSRT feedback

2010-10-13 Thread Philip Jägenstedt

On Fri, 08 Oct 2010 22:54:53 +0200, phil...@opera.com wrote:


I'm making a few assumptions here:


Sorry all, my mail client (Opera, hrm) seems to have taken offense to my  
authoring and discarding of several replies when offline and has punished  
me by showing them to the world. Please ignore the mail starting as above  
quoted,  
http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-October/028815.html  
is what I wanted to send.


--
Philip Jägenstedt
Core Developer
Opera Software


Re: [whatwg] WebSRT feedback

2010-10-08 Thread Silvia Pfeiffer

On 08/10/2010, at 1:28 PM, Philip Jägenstedt phil...@opera.com wrote:

 On Thu, 07 Oct 2010 13:18:37 -0700, Silvia Pfeiffer 
 silviapfeiff...@gmail.com wrote:
 
 On Thu, Oct 7, 2010 at 4:06 PM, Philip Jägenstedt phil...@opera.com wrote:
 
 On Wed, 06 Oct 2010 21:37:06 -0700, Silvia Pfeiffer 
 silviapfeiff...@gmail.com wrote:
 
 On Tue, Oct 5, 2010 at 10:04 PM, Philip Jägenstedt phil...@opera.com
 wrote:
 
 
 Styling hooks were requested. If we only have the predefined tags (i, b,
 ...) and voices, these will most certainly be abused, e.g. resulting in <i>
 being used where italics isn't wanted or <v Foo> being used just for
 styling, breaking the accessibility value it has.
 
 As an aside, the idea of using an HTML parser for the cue text wasn't
 very
 popular.
 
 
 I believe that this feedback was provided by a person representing the
 deaf
 or hard-of-hearing community and not the subtitling community. In
 particular
 at FOMS I heard the opposite opinion.
 
 
 Is this feedback about styling hooks or HTML as the cue text format?
 Both?
 
 
 
 Oh, it was about the last sentence: about using HTML fragments in cue text.
 
 
 
 
 
 On Thu, 07 Oct 2010 01:57:17 -0700, James Graham jgra...@opera.com
 wrote:
 
 On 10/06/2010 04:04 AM, Philip Jägenstedt wrote:
 
 As an aside, the idea of using an HTML parser for the cue text wasn't
 very popular.
 
 
 Why? Were any technical reasons given?
 
 
 The question was directed at the media player/framework developers present.
 One of them didn't care and one was strongly opposed on the basis of bloat.
 This was an aside, if anyone is serious about using the HTML fragment parser
 for WebSRT, we really should approach the developer mailing lists of media
 players/frameworks. I doubt we will find much love, but would be happy to be
 shown wrong.
 
 
 
 The one I talked to said that HTML markup should totally be used in cues (he
 even mentioned more generally why we didn't pick up USF). The reason being
 that it clearly defines extensibility and would in fact already provide any
 use case that anyone can come up with, thus stopping people from inventing
 their own screwed-up extensions, such as the use of ASS commands in {}
 inside srt subtitles.
 
 The thing is: while the full set of features of HTML fragments seems bloat,
 not every subtitle will consist of all the possible markup. Just like Web
 pages are often created with very simple markup which uses less than 1% of
 what HTML is capable of, we will see the same happening with subtitle cues.
 But the availability and clear definition of how such features should be
 used prevents the introduction of crappy extensions.
 
 Even if very few subtitles use inline SVG, SVG in <object>, <img>, <iframe>, 
 <video>, self-referencing <track>, etc. in the cue text, all implementations 
 would have to support it in the same way for it to be interoperable. That's 
 quite an undertaking and I don't think it's really worth it.
 

They all need to be interoperable on all of these features already. It should 
be easier to keep them interoperable on something known and already implemented 
than on a set of new features, in particular when the new feature set is 
restricted and features beyond the limited given set are not available such 
that custom markup will be produced by plugins etc. 


 As for extensibility, I suggest that we generalize the WebSRT parser somewhat 
 to produce a normal DOM with elements in a non-HTML namespace and then use 
 CSS to style them as usual. Unknown element names shouldn't be valid, of 
 course, but they'd still appear in the DOM. If XML5 
 (http://annevankesteren.nl/2007/10/xml5) was ready, I'd suggest we use that, 
 with the constraint that it should only be able to output elements in that 
 non-HTML namespace. (Just thinking out loud here.)

I think that's ok, even though I think it makes more sense to have HTML 
fragments than arbitrary markup that is related but somewhat different. I think 
we are then just re-inventing HTML.

Cheers,
Silvia. 

Re: [whatwg] WebSRT feedback

2010-10-08 Thread Jeroen Wijering

On Oct 8, 2010, at 2:24 PM, whatwg-requ...@lists.whatwg.org wrote:

 Even if very few subtitles use inline SVG, SVG in object, img,
 iframe, video, self-referencing track, etc in the cue text, all
 implementations would have to support it in the same way for it to be
 interoperable. That's quite an undertaking and I don't think it's really
 worth it.
 
 
 User agents only need to be interoperable over the common subset of HTML
 features they support. HTML is mostly designed to degrade gracefully when a
 user agent encounters elements it doesn't support. The simplest possible
 video player would use an HTML parser (hopefully off-the-shelf) to build
 some kind of DOM structure. Then it can group text into paragraphs for
 rendering, and ignore the rest of the content.
 
 In practice, we'll have to deal with user agents that support different sets
 of WebSRT features --- when version 2 of WebSRT is developed, if not before.
 Why not use existing, proven machinery --- HTML --- to cope with that
 situation?
 
 Rob

The requests we receive on the captioning functionality of the JW Player always 
revolve around styling. Font size, color, style, weight, outline and family. 
Block x, y, width, height, text-align, vertical-align, padding, margin, 
background and alpha. Both for an entire SRT file, for distinct captioning 
entries and for specific parts of a captioning entry. Not to say that a full 
parsing engine wouldn't be nice or useful, but at present there are simply no 
requests for it (not even for a ;). Plus, more advanced timed track 
applications can easily be built with JavaScript (timed bouncing 3D balls using 
WebGL).
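Jeroen's point about script-built timed-track applications boils down to one primitive: given the cue list and the current playback time, find the active cues and render them however you like. A minimal sketch follows; the object shape and function name are invented for illustration, not any proposed API.

```javascript
// Return the cues active at currentTime; a script-driven caption or
// overlay renderer would call this from a timeupdate handler.
function activeCues(cues, currentTime) {
  return cues.filter(c => c.start <= currentTime && currentTime < c.end);
}

// Example cue list: start/end times in seconds.
const cues = [
  { start: 0,   end: 2, text: "Hello" },
  { start: 1.5, end: 4, text: "world" },
];
```

Everything fancier (bouncing 3D balls, WebGL) is just a different renderer sitting on top of the same lookup.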

W3C's Timed Text does a decent job in facilitating the styling needs of 
captioning authors. Overall regions, single paragraphs and inline chunks 
(through <span>) can be styled. There are a few small misses, such as text 
outline and vertical alignment (which can be done with separate regions, 
though). IMO the biggest con of TT is that it uses its own, in-document styling 
namespace, instead of relying upon page CSS.

Kind regards,

Jeroen

Re: [whatwg] WebSRT feedback

2010-10-07 Thread James Graham

On 10/06/2010 04:04 AM, Philip Jägenstedt wrote:


As an aside, the idea of using an HTML parser for the cue text wasn't
very popular.


Why? Were any technical reasons given?



Finally, some things I think are broken in the current WebSRT parser:


One more from me: the spec is unusually hard to follow here since it 
makes extensive use of goto for flow control. Could it not be 
restructured as a state machine or something so it is easier to follow 
what is going on?


Re: [whatwg] WebSRT feedback

2010-10-07 Thread Philip Jägenstedt
On Wed, 06 Oct 2010 21:37:06 -0700, Silvia Pfeiffer  
silviapfeiff...@gmail.com wrote:


On Tue, Oct 5, 2010 at 10:04 PM, Philip Jägenstedt  
phil...@opera.comwrote:




Styling hooks were requested. If we only have the predefined tags (i, b,
...) and voices, these will most certainly be abused, e.g. resulting in <i>
being used where italics isn't wanted or <v Foo> being used just for
styling, breaking the accessibility value it has.

As an aside, the idea of using an HTML parser for the cue text wasn't very
popular.



I believe that this feedback was provided by a person representing the deaf
or hard-of-hearing community and not the subtitling community. In particular
at FOMS I heard the opposite opinion.


Is this feedback about styling hooks or HTML as the cue text format?  
Both?


On Thu, 07 Oct 2010 01:57:17 -0700, James Graham jgra...@opera.com wrote:


On 10/06/2010 04:04 AM, Philip Jägenstedt wrote:


As an aside, the idea of using an HTML parser for the cue text wasn't
very popular.


Why? Were any technical reasons given?


The question was directed at the media player/framework developers
present. One of them didn't care and one was strongly opposed on the basis
of bloat. This was an aside; if anyone is serious about using the HTML
fragment parser for WebSRT, we really should approach the developer
mailing lists of media players/frameworks. I doubt we will find much love,
but would be happy to be shown wrong.


--
Philip Jägenstedt
Core Developer
Opera Software


Re: [whatwg] WebSRT feedback

2010-10-07 Thread Philip Jägenstedt
On Thu, 07 Oct 2010 13:18:37 -0700, Silvia Pfeiffer  
silviapfeiff...@gmail.com wrote:


On Thu, Oct 7, 2010 at 4:06 PM, Philip Jägenstedt phil...@opera.com  
wrote:



On Wed, 06 Oct 2010 21:37:06 -0700, Silvia Pfeiffer 
silviapfeiff...@gmail.com wrote:

 On Tue, Oct 5, 2010 at 10:04 PM, Philip Jägenstedt phil...@opera.com

wrote:


Styling hooks were requested. If we only have the predefined tags (i, b,
...) and voices, these will most certainly be abused, e.g. resulting in <i>
being used where italics isn't wanted or <v Foo> being used just for
styling, breaking the accessibility value it has.

As an aside, the idea of using an HTML parser for the cue text wasn't
very popular.



I believe that this feedback was provided by a person representing the
deaf or hard-of-hearing community and not the subtitling community. In
particular at FOMS I heard the opposite opinion.



Is this feedback about styling hooks or HTML as the cue text format?
Both?




Oh, it was about the last sentence: about using HTML fragments in cue  
text.







On Thu, 07 Oct 2010 01:57:17 -0700, James Graham jgra...@opera.com
wrote:

 On 10/06/2010 04:04 AM, Philip Jägenstedt wrote:


 As an aside, the idea of using an HTML parser for the cue text wasn't

very popular.



Why? Were any technical reasons given?



The question was directed at the media player/framework developers present.
One of them didn't care and one was strongly opposed on the basis of bloat.
This was an aside; if anyone is serious about using the HTML fragment parser
for WebSRT, we really should approach the developer mailing lists of media
players/frameworks. I doubt we will find much love, but would be happy to be
shown wrong.




The one I talked to said that HTML markup should totally be used in cues
(he even mentioned more generally why we didn't pick up USF). The reason
being that it clearly defines extensibility and would in fact already
provide any use case that anyone can come up with, thus stopping people
from inventing their own screwed-up extensions, such as the use of ASS
commands in {} inside SRT subtitles.

The thing is: while the full set of features of HTML fragments seems bloat,
not every subtitle will consist of all the possible markup. Just like Web
pages are often created with very simple markup which uses less than 1% of
what HTML is capable of, we will see the same happening with subtitle cues.
But the availability and clear definition of how such features should be
used prevents the introduction of crappy extensions.


Even if very few subtitles use inline SVG, SVG in <object>, <img>,
<iframe>, <video>, self-referencing <track>, etc. in the cue text, all
implementations would have to support it in the same way for it to be
interoperable. That's quite an undertaking and I don't think it's really
worth it.


As for extensibility, I suggest that we generalize the WebSRT parser  
somewhat to produce a normal DOM with elements in a non-HTML namespace and  
then use CSS to style them as usual. Unknown element names shouldn't be  
valid, of course, but they'd still appear in the DOM. If XML5  
(http://annevankesteren.nl/2007/10/xml5) was ready, I'd suggest we use  
that, with the constraint that it should only be able to output elements  
in that non-HTML namespace. (Just thinking out loud here.)
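To make the "actual DOM that reflects the input" idea concrete, here is a toy cue-text parser sketch; it keeps every tag, known or unknown, as a generic node. The node shape is invented, and a real implementation would create elements in the non-HTML namespace discussed above rather than plain objects.

```javascript
// Toy parser: builds a tree mirroring the markup in the cue text.
// Unknown tag names are kept as nodes instead of being dropped, so CSS
// could still target them as styling hooks.
function parseCueText(input) {
  const root = { name: "#root", children: [] };
  const stack = [root];
  const token = /<\/?([^>]+)>|[^<]+/g;
  let m;
  while ((m = token.exec(input)) !== null) {
    if (m[0][0] !== "<") {
      stack[stack.length - 1].children.push({ text: m[0] }); // text run
    } else if (m[0][1] === "/") {
      if (stack.length > 1) stack.pop(); // close tag; ignore strays
    } else {
      const el = { name: m[1], children: [] };
      stack[stack.length - 1].children.push(el);
      stack.push(el);
    }
  }
  return root;
}
```

Note how nothing here is HTML-specific: the parser is forward-compatible by construction, and validity ("unknown element names shouldn't be valid") becomes a conformance-checker concern rather than a parser concern.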


--
Philip Jägenstedt
Core Developer
Opera Software


Re: [whatwg] WebSRT feedback

2010-10-06 Thread Silvia Pfeiffer
On Tue, Oct 5, 2010 at 10:04 PM, Philip Jägenstedt phil...@opera.comwrote:


 Styling hooks were requested. If we only have the predefined tags (i, b,
 ...) and voices, these will most certainly be abused, e.g. resulting in i
 being used where italics isn't wanted or v Foo being used just for
 styling, breaking the accessibility value it has.

 As an aside, the idea of using an HTML parser for the cue text wasn't very
 popular.


I believe that this feedback was provided by a person representing the deaf
or hard-of-hearing community and not the subtitling community. In particular
at FOMS I heard the opposite opinion.


 ...


 * The current syntax looks like XML or HTML but has very different parsing.
 Voices like narrator don't create nodes at all and for tags like <i> the
 parser has a whitelist and also special rules for inserting <rt>. Unless
 there are strong reasons for this, then for simplicity and forward
 compatibility, I'd much rather have the parser create an actual DOM (not a
 tree of WebSRT Node Objects) that reflects the input. If we also support
 attributes then people who actually want to use their (silly) <font
 color=red> tags can do so with CSS. This could also work as styling hooks.
 Obviously, a WebSRT parser should create elements in another namespace, we
 don't want e.g. <img> to work inside cues.


I still believe that in particular <img> and <a> are very important tags to
support.


That was all great feedback, btw!

Cheers,
Silvia.


[whatwg] WebSRT feedback

2010-10-05 Thread Philip Jägenstedt
Over the past week I've attended 3 video-related events in New York and
have discussed <track> and WebSRT at all of them. Here's a lengthy report
of feedback, mine and others'.


At the Open Subtitles Design Summit [1], there was some discussion about  
captioning for the HoH. I've already put this input into a related bug  
[2], but to summarize: The default rendering for the voices syntax should  
probably be to prefix the text cue with the name of the speaker, not to do  
anything funny with colors or positioning. What's less clear is if it's  
annoying to always prefix with the speaker, or if it should be done only  
to disambiguate.
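The "prefix only to disambiguate" option can be stated precisely: show the speaker name only when the cue list contains more than one distinct voice. A sketch, with the cue fields invented for illustration:

```javascript
// Prefix the speaker name only when several voices occur in the file,
// i.e. only when the prefix actually disambiguates.
function renderCue(cue, allCues) {
  const voices = new Set(allCues.map(c => c.voice).filter(Boolean));
  return cue.voice && voices.size > 1 ? cue.voice + ": " + cue.text : cue.text;
}
```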


For my Open Video Conference [3] presentation [4] I did a JavaScript  
implementation of the most interesting parts of track and WebSRT to be  
able to demo what the future might hold [5][6][7]. I have some issues with  
the parser that are at the end of this mail.


At FOMS [8] we had a session on WebSRT [9] which was extremely helpful. It
turns out that SRT has more syntax variations than we had thought, kindly
pointed out by VLC developer j-b. Even though there is no SRT spec, there
is a test suite of sorts [10] that I had never seen before. I'll call SRT
which follows the syntax implied by these tests "ale5000-SRT". Apart from
the HTML-like markup we knew about, ale5000-SRT also has various markup of
the form {...} which was borrowed from SSA, as well as \h and \N for hard
space and line break respectively. Also in the crazy department is that
tags which aren't matched with an opening and closing tag should be
rendered as plain text. A stray < should also just be displayed as text. VLC
actually implements most of this, as does VSFilter, which we should have
tested but didn't [11]. It would probably be possible to write a spec for
ale5000-SRT, but extensibility would be limited to matched opening and
closing tags, which doesn't work for the suggested voices syntax. With
this mess, I'd rather not extend ale5000-SRT. I can only agree with Silvia
that we should make WebSRT identifiable, so that different parsers can be
used. So:


* Add magic bytes to identify WebSRT, maybe "WebSRT". (This will break
some existing SRT parsers.)
* Make WebSRT always be UTF-8, since you can't reuse existing SRT files
anyway.
* Note that certain ale5000-SRT syntax is not part of WebSRT, so that one
doesn't have to debug the parsing algorithm to learn that.
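The sniffing step implied by the first bullet could look like the following; the "WebSRT" signature is only the suggestion above, nothing specified.

```javascript
// Decode the resource as UTF-8 (second bullet) and check for the
// hypothetical magic string, tolerating a leading byte-order mark.
function looksLikeWebSRT(bytes) {
  const text = new TextDecoder("utf-8").decode(bytes);
  return text.replace(/^\uFEFF/, "").startsWith("WebSRT");
}
```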


Styling hooks were requested. If we only have the predefined tags (i, b,
...) and voices, these will most certainly be abused, e.g. resulting in
<i> being used where italics isn't wanted or <v Foo> being used just for
styling, breaking the accessibility value it has.


As an aside, the idea of using an HTML parser for the cue text wasn't very  
popular.


There was also some discussion about metadata. Language is sometimes
necessary for the font engine to pick the right glyph. With legacy SRT the
encoding could be used as a hint, but if we use UTF-8 that's not possible.
License is also an often requested piece of metadata. I have no strong
opinion about how to solve this, but key-value pairs like HTTP headers
come to mind.
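Key-value pairs in the HTTP-header style would be trivial to parse; a sketch, where the header names (Language, License) are only examples, not anything proposed:

```javascript
// Parse a block of "Name: value" lines into an object, ignoring
// anything that doesn't contain a colon.
function parseHeaders(block) {
  const headers = {};
  for (const line of block.split(/\r?\n/)) {
    const i = line.indexOf(":");
    if (i === -1) continue; // skip malformed lines
    headers[line.slice(0, i).trim()] = line.slice(i + 1).trim();
  }
  return headers;
}
```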


Finally, some things I think are broken in the current WebSRT parser:

* Parsing of timestamps is more liberal than it needs to be. In
particular, treating the part after the decimal separator as an integer
and dividing by 1000 leads to 00:00:00.1 being interpreted as 0.001
seconds, which is weird. This is what e.g. VLC does, but if we need to add
a header we could just as well change this to be more sane.
Alternatively, if we want to really align with C implementations using
scanf, we should also handle negative numbers (00:01:-5,000 means 55
seconds), octal and hexadecimal.
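The difference between the two readings is easy to pin down. "Liberal" below mimics the parse-as-integer-and-divide-by-1000 behaviour; "strict" scales by the number of fractional digits, which is presumably what authors mean. Function names are mine.

```javascript
// ts is "HH:MM:SS" optionally followed by "." and fraction digits.
function liberalSeconds(ts) {
  const [hms, frac = "0"] = ts.split(".");
  const [h, m, s] = hms.split(":").map(Number);
  return h * 3600 + m * 60 + s + parseInt(frac, 10) / 1000; // always /1000
}
function strictSeconds(ts) {
  const [hms, frac = "0"] = ts.split(".");
  const [h, m, s] = hms.split(":").map(Number);
  return h * 3600 + m * 60 + s + parseInt(frac, 10) / 10 ** frac.length;
}
```

Under the liberal reading, "00:00:00.1" yields 0.001 s; under the strict one, 0.1 s.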


* The current syntax looks like XML or HTML but has very different  
parsing. Voices like narrator don't create nodes at all and for tags  
like i the paser has a whitelist and also special rules for inserting  
rt. Unless there are strong reasons for this, then for simplicity and  
forward compatibility, I'd much rather have the parser create an actual  
DOM (not a tree of WebSRT Node Object) that reflects the input. If we  
also support attributes then people who actually want to use their (silly)  
font color=red tags can do so with CSS. This could also work as styling  
hooks. Obviously, a WebSRT parser should create elements in another  
namespace, we don't want e.g. img to work inside cues.


* The bad cue handling is stricter than it should be. After collecting  
an id, the next line must be a timestamp line. Otherwise, we skip  
everything until a blank line, so in the following the parser would jump  
to bad cue on line 2 and skip the whole cue.


1
2
00:00:00.000 --> 00:00:01.000
Bla

This doesn't match what most existing SRT parsers do, as they simply look  
for timing lines and ignore everything else. If we really need to collect  
the id instead of ignoring it like
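The lenient strategy most existing SRT parsers use (scan for timing lines and ignore everything else) can be sketched as follows; the cue object shape is invented for illustration:

```javascript
// Accept both "." and "," as decimal separators, as SRT files in the
// wild use either. Lines that aren't timing lines (ids, garbage) are
// skipped instead of invalidating the cue that follows them.
const TIMING = /^(\d{2}:\d{2}:\d{2}[.,]\d{3}) --> (\d{2}:\d{2}:\d{2}[.,]\d{3})/;
function scanCues(src) {
  const cues = [];
  const lines = src.split(/\r?\n/);
  for (let i = 0; i < lines.length; i++) {
    const m = TIMING.exec(lines[i]);
    if (!m) continue;
    // Collect cue text until the next blank line.
    const text = [];
    for (i++; i < lines.length && lines[i].trim() !== ""; i++) text.push(lines[i]);
    cues.push({ start: m[1], end: m[2], text: text.join("\n") });
  }
  return cues;
}
```

With this approach the example cue above yields one cue with text "Bla" instead of being discarded.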