Regular expressions aren't well suited to tasks like checking line lengths
and moving line contents based on differences in those lengths. A better
approach is a text filter written in a scripting language, which can
measure text lengths and change strings based on runtime evaluations.

Below is a Perl script text filter which takes as input a selection or a
whole file of SRT formatted text. It finds every SRT sequence entry with
two lines of dialog text and re-wraps those lines to more equal lengths,
leaving the second line longer if necessary for proper word wrapping.

I've named it reformat_subtitle_text.pl and saved it in BBEdit's Text
Filters folder so it will be listed in BBEdit's Text Filters palette. If
desired, you can also set a keyboard shortcut for it.
You'll probably want to enhance the reformatting logic in the fixup_dialog
subroutine to handle cases where simple two-line word wrapping produces
awkward results. For example, what appears to be two-person dialog text
like:

- Shall I get you something, Micke?
- No, I don't have time.

or

- Whose turn is it today?
- Malin's, isn't it?

gets reformatted by your simple word wrapping rule as:

- Shall I get you something,
Micke? - No, I don't have time.

and

- Whose turn is it
today? - Malin's, isn't it?
In the SRT formatting rules I found, "-" has no defined markup rule, so
perhaps it is just an informal convention people use to indicate multiple
people speaking.
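
One possible enhancement, sketched here under assumptions: treat an entry
as two-speaker dialog when both lines start with a plain hyphen, and leave
it unwrapped. The helper name is_two_speaker_dialog is hypothetical and not
part of the script, and other dash characters are not handled.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of reformat_subtitle_text.pl): returns 1 when
# both dialog lines start with a hyphen, the informal two-speaker convention.
sub is_two_speaker_dialog {
    my ($line1, $line2) = @_;
    return ($line1 =~ /^\s*-/ && $line2 =~ /^\s*-/) ? 1 : 0;
}

# Example check using one of the dialog pairs shown above.
my @pair = ("- Whose turn is it today?\n", "- Malin's, isn't it?\n");
print is_two_speaker_dialog(@pair) ? "leave as-is\n" : "rewrap\n";

# Inside fixup_dialog, a guard like this could go before any trimming:
#   return $line1 . $line2 if is_two_speaker_dialog($line1, $line2);
```

Returning the two lines unchanged keeps each speaker on their own line, at
the cost of not balancing those entries at all.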
SRT formatting rules also allow simple markup annotations (e.g., bold:
<b>...</b>), which make the length of the displayed text differ from the
length of a subtitle entry's raw dialog text. This script doesn't try to
deal with that complication.
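
If you did want to account for markup, a first step might be to measure
lines with the tags stripped. This is only a sketch: display_length is a
hypothetical helper, and the tag list (b, i, u, font) is my assumption about
what players commonly accept, not something taken from a formal SRT
specification.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: length of a dialog line as displayed, ignoring the
# assumed tags <b>, <i>, <u>, <font ...> and their closing forms.
sub display_length {
    my ($text) = @_;
    (my $plain = $text) =~ s{</?(?:b|i|u|font)\b[^>]*>}{}gi;
    return length($plain);
}

print display_length("<b>the society.</b>"), "\n";   # prints 12
```

This only covers measuring. Text::Wrap still counts the tag characters when
it wraps, so the wrapping step itself would also need to change.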
reformat_subtitle_text.pl:
#!/usr/bin/env perl
use strict;
use Text::Wrap;
use POSIX qw/ceil/;
my $subtitles = '';
# regex to dissect one subtitle entry: 1) sequence number and time range,
# 2) first dialog text line, and 3) second dialog text line
my $seq_item_re = qr/(\d+\n\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}\n)(.+\n)(.+\n)/;
# read in all the input subtitle text
$subtitles = do { local $/; <STDIN> };
# extract each and all subtitle entries with two lines of dialog text
# and replace them with reformatted version
$subtitles =~ s/$seq_item_re/$1 . fixup_dialog($2, $3)/mge;
# output the reformatted subtitles
print $subtitles;
# reformat two lines of dialog text to have more equal line lengths, with
# line two the longer if necessary for proper word wrapping
sub fixup_dialog {
    my ($line1, $line2) = @_;

    # trim trailing white space
    $line1 =~ s/\s+$//;
    $line2 =~ s/\s+$//;

    # ideal column width for two lines of characters without word wrapping;
    # with word wrapping, the second line is left the longer of the two.
    # The + 1 is needed because Text::Wrap keeps every line to at most
    # $columns - 1 characters.
    my $ideal_col_width = ceil((length($line1) + length($line2))/2) + 1;
    my $total_text = $line1 . " " . $line2 . "\n";

    # locally set wrapping parameters: do not convert spaces back to tabs,
    # and apply the column width constraint
    local($Text::Wrap::unexpand) = 0;
    local($Text::Wrap::columns) = $ideal_col_width;
    my $wrapped_text = wrap('', '', $total_text);

    # if word wrapping creates a third line, move it to the end of the
    # second line, restoring the space that wrap() removed at the break
    if ( $wrapped_text =~ m/(.+\n.+)\n(.+\n)/ ) {
        $wrapped_text = $1 . " " . $2;
    }
    return $wrapped_text;
}
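
To see why the third-line check matters, here is a small stand-alone sketch
that runs the same width calculation and wrap() call on the first sample
entry from the question quoted below. The helper raw_wrap is hypothetical
and exists only for this illustration. It returns the wrap() output before
any third line is folded back.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Text::Wrap;
use POSIX qw/ceil/;

# Illustrative helper (not in the script above): returns the column setting
# and the raw wrap() output, before any third line is folded back.
sub raw_wrap {
    my ($line1, $line2) = @_;
    my $columns = ceil((length($line1) + length($line2)) / 2) + 1;
    local $Text::Wrap::unexpand = 0;
    local $Text::Wrap::columns  = $columns;
    return ($columns, wrap('', '', "$line1 $line2\n"));
}

my ($columns, $wrapped) = raw_wrap("not that likes and dislikes are", "your enemies");
print "columns = $columns\n";   # 31 + 12 = 43 characters, so ceil(21.5) + 1 = 23
print $wrapped;
```

With these inputs wrap() should return three lines ("not that likes and",
"dislikes are your", "enemies"), because breaking only at spaces leaves the
first line shorter than the ideal width. That is the case the final if block
in fixup_dialog handles by appending the third line to the second.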
On Sunday, February 16, 2025 at 3:37:16 AM UTC-8 Otto Munters wrote:
> Is there a regex to divide the last two lines of each subtitle more evenly
> in the following example, so that both sentences are about the same length,
> with preference given to the longest sentence on the 4th line.
> Example:
> 351
> 00:18:23,120 --> 00:18:29,600
> not that likes and dislikes are
> your enemies
>
> 352
> 00:18:29,600 --> 00:18:31,960
> because they end up serving
> the society.
>
> Thanks for your kind help!
> Otto
>