Regular expressions aren't well suited to handle things like checking line 
lengths and moving line contents based upon differences in those lengths.

A better approach is a text filter written in a scripting language, which 
can measure text lengths and change text strings based upon runtime 
evaluations.

Below is a Perl script text filter that takes as input a selection or 
whole file of SRT-formatted text. It finds every SRT sequence entry with 
two lines of dialog text and re-wraps those lines to more equal lengths, 
leaving the second line longer if necessary for proper word wrapping.

I've named it reformat_subtitle_text.pl and saved it in BBEdit's Text 
Filters folder so it will be listed in BBEdit's Text Filters palette. If 
desired, you can also set a keyboard shortcut for it.

You'll probably want to enhance the reformatting logic in the fixup_dialog 
subroutine to handle cases where simple two line word wrap reformatting 
produces awkward results. For example, what appears to be two person dialog 
text like:

- Shall I get you something, Micke?
- No, I don't have time.

or

- Whose turn is it today?
- Malin's, isn't it?

with your simple word wrapping rule gets reformatted as:

- Shall I get you something,
Micke? - No, I don't have time.

- Whose turn is it
today? - Malin's, isn't it?

In the SRT formatting rules I found, "-" has no defined markup rule, so 
perhaps it is just an informal convention that people use to indicate 
multiple people speaking.
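
One possible way to handle that case, sketched here on the assumption that a 
leading "-" on both lines always marks a change of speaker, is to test for 
it before rewrapping and leave such entries untouched. The name 
is_two_speaker_dialog is hypothetical and is not part of the script below:

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when both dialog lines start with "-",
# which is assumed here (not defined by SRT) to mark two speakers.
sub is_two_speaker_dialog {
    my ($line1, $line2) = @_;
    return ($line1 =~ /^\s*-/ && $line2 =~ /^\s*-/) ? 1 : 0;
}
```

In fixup_dialog, an early 
"return $line1 . $line2 if is_two_speaker_dialog($line1, $line2);" placed 
before the trimming should pass such entries through unchanged, since both 
captured lines still carry their newlines at that point.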

SRT formatting rules also allow simple markup annotations (e.g., bold: <b> 
</b>), which make the length of the displayed text differ from the length 
of a subtitle entry's raw dialog text. This script doesn't try to deal with 
that complicating issue.
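
If you do want to account for markup, a rough sketch is to measure each line 
with the tags stripped out. This assumes only simple angle-bracket tags such 
as <b> and </b>, with no ">" inside a tag; the name visible_length is 
hypothetical and is not part of the script below:

```perl
use strict;
use warnings;

# Hypothetical helper: length of a line as displayed, ignoring simple
# <tag> and </tag> markup. Assumes a tag never contains ">" itself.
sub visible_length {
    my ($text) = @_;
    # copy the text, then delete anything that looks like a tag
    (my $plain = $text) =~ s{</?[A-Za-z][^>]*>}{}g;
    return length($plain);
}
```

Using visible_length in place of length when computing the column width 
would only fix the measurement. Text::Wrap would still count the raw tag 
characters when it wraps, so this is a starting point rather than a 
complete solution.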

reformat_subtitle_text.pl:

#!/usr/bin/env perl

use strict;
use Text::Wrap;
use POSIX qw/ceil/;

my $subtitles = '';

# regex to dissect one subtitle entry: 1) sequence number and time range,
# 2) first dialog text line, and 3) second dialog text line
my $seq_item_re = qr/(\d+\n\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}\n)(.+\n)(.+\n)/;

# read in all the input subtitle text
$subtitles = do { local $/; <STDIN> };

# extract each and all subtitle entries with two lines of dialog text
# and replace them with reformatted version
$subtitles =~ s/$seq_item_re/$1 . fixup_dialog($2, $3)/mge;

#output the reformatted subtitles
print $subtitles;

# reformat two lines of dialog text to have more equal line lengths,
# with line two the longer if necessary for proper word wrapping

sub fixup_dialog {
    my ($line1, $line2) = @_;
    
#   trim trailing white space
    $line1 =~ s/\s+$//;
    $line2 =~ s/\s+$//;
    
#   ideal column width for two lines of characters without word wrapping;
#   with word wrapping this leaves the second line the longer of the two
    my $ideal_col_width = ceil((length($line1) + length($line2))/2) + 1;
    my $total_text = $line1 . " " . $line2 . "\n";
    
#   locally set wrapping parameters: do not convert spaces to tabs, and
#   constrain the column width
    local($Text::Wrap::unexpand) = 0;
    local($Text::Wrap::columns) = $ideal_col_width;
    my $wrapped_text = wrap('', '', $total_text);
    
#   if word wrapping creates a third line, append it to the second line,
#   restoring the space that Text::Wrap removed at the line break
    if ( $wrapped_text =~ m/(.+\n.+)\n(.+\n)/){
        $wrapped_text = $1 . " " . $2;
    }
    return $wrapped_text;
}


On Sunday, February 16, 2025 at 3:37:16 AM UTC-8 Otto Munters wrote:

> Is there a regex to divide the last two lines of each subtitle more evenly 
> in the following example, so that both sentences are about the same length, 
> with preference given to the longest sentence on the 4th line.
> Example:
> 351
> 00:18:23,120 --> 00:18:29,600
> not that likes and dislikes are 
> your enemies
>
> 352
> 00:18:29,600 --> 00:18:31,960
> because they end up serving
> the society.
>
> Thanks for your kind help! 
> Otto
>
