Thank you for your help, Mark and GP.
I succeeded by making a Text Factory that repeats a series of greps.
search: (?=^.{82}$)(.{29,42}\b)(.*)
replace: \1\n\2
The first grep is then repeated with: (?=^.{81}$)(.{29,42}\b)(.*)
and so on, down to: (?=^.{43}$)(.{10,24}\b)(.*)
so that every line of length 43 to 82 is split into two lines of
approximately equal length.
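
For anyone who wants to try the idea outside BBEdit, here is a minimal Python sketch of the same sequence of search/replace steps. It is only an illustration: the names STEPS and split_long_lines are made up for this example, the sample line is invented, and only the three patterns quoted above are included because the quantifier ranges for the steps in between are not listed here.

```python
import re

# (line length, min first-part length, max first-part length) for the three
# patterns quoted in this message; the intermediate steps are not listed.
STEPS = [(82, 29, 42), (81, 29, 42), (43, 10, 24)]


def split_long_lines(text, steps=STEPS):
    """Apply each search/replace once, in order, like a chain of grep steps."""
    for length, lo, hi in steps:
        # Same shape as the BBEdit pattern: the lookahead pins the total line
        # length, group 1 is the first part ending at a word boundary,
        # group 2 is the rest of the line.
        pattern = re.compile(
            r"(?=^.{%d}$)(.{%d,%d}\b)(.*)" % (length, lo, hi), re.MULTILINE
        )
        text = pattern.sub(r"\1\n\2", text)
    return text


sample = "not that likes and dislikes are our enemies"  # 43 characters
print(split_long_lines(sample))
```

Because \b matches on either side of a space, the first part can keep a trailing space (it does for this sample), so a final clean-up pass that strips trailing whitespace may be worth adding.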
Best regards, Otto
On Monday, February 17, 2025 at 23:22:36 UTC+1, GP wrote:
> Oops!
> Forgot to concatenate a space character in fixing up third line word
> wrapping. In the fixup_dialog subroutine, change the line:
> $wrapped_text = $1 . $2;
> to:
> $wrapped_text = $1 . " " . $2;
>
> On Monday, February 17, 2025 at 2:06:15 PM UTC-8 GP wrote:
>
>> Regular expressions aren't well suited to handle things like checking
>> line lengths and moving line contents based upon differences in those
>> lengths.
>>
>> A better method is to use something like a text filter using a scripting
>> language that can check for things like text lengths and make text string
>> changes based upon runtime evaluations.
>>
>> Below is a Perl script text filter that takes as input a selection or a
>> whole file of SRT-formatted text. It finds every SRT sequence entry with
>> two lines of dialog text and re-wraps those lines to a more equal line
>> length, leaving the second line longer if necessary for proper word
>> wrapping.
>>
>> I've named it reformat_subtitle_text.pl and saved it in BBEdit's Text
>> Filters folder so it will be listed in BBEdit's Text Filters palette. If
>> desired you can also set a keyboard shortcut for it.
>>
>> You'll probably want to enhance the reformatting logic in the
>> fixup_dialog subroutine to handle cases where simple two line word wrap
>> reformatting produces awkward results. For example, what appears to be two
>> person dialog text like:
>>
>> - Shall I get you something, Micke?
>> - No, I don't have time.
>>
>> or
>>
>> - Whose turn is it today?
>> - Malin's, isn't it?
>>
>> with your simple word wrapping rule gets reformatted as:
>>
>> - Shall I get you something,
>> Micke? - No, I don't have time.
>>
>> - Whose turn is it
>> today? - Malin's, isn't it?
>>
>> In the SRT formatting rules I found, "-" has no defined markup rule, so
>> perhaps it is just an informal convention that people use to indicate
>> multiple people speaking.
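
One way to handle that case is to leave an entry alone when both lines start with a dash. The Python sketch below is only an illustration under that assumption. The function names are invented, and it uses a plain search over every word break rather than the Text::Wrap approach of the Perl script. Among the splits that keep the second line at least as long as the first, it picks the most even one.

```python
def looks_like_two_speakers(line1, line2):
    """True when both lines start with a dash (the informal speaker marker)."""
    return line1.lstrip().startswith("-") and line2.lstrip().startswith("-")


def rebalance(line1, line2):
    """Re-split two dialog lines at the most even word break."""
    if looks_like_two_speakers(line1, line2):
        return line1, line2  # keep one speaker per line
    words = (line1 + " " + line2).split()
    if len(words) < 2:
        return line1, line2  # nothing to re-split
    best = None
    for i in range(1, len(words)):
        first, second = " ".join(words[:i]), " ".join(words[i:])
        # Sort key: splits with a longer (or equal) second line come first,
        # then the smallest length difference wins.
        key = (len(first) > len(second), abs(len(first) - len(second)))
        if best is None or key < best[0]:
            best = (key, first, second)
    return best[1], best[2]


print(rebalance("not that likes and dislikes are", "your enemies"))
print(rebalance("- Whose turn is it today?", "- Malin's, isn't it?"))
```

The dash test is deliberately crude: a single-speaker line that happens to begin with a hyphen would also be skipped.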
>>
>> SRT formatting rules also allow simple markup annotations (e.g., bold:
>> <b> </b>), so the length of the displayed text can differ from the length
>> of a subtitle entry's raw dialog text. This script doesn't try to deal with
>> that complicating issue.
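
If the markup does need to be taken into account, one option is to measure each line with the tags removed before deciding where to break it. The Python sketch below assumes that only simple angle-bracket tags (b, i, u, font) occur. That tag list and the names TAG_RE and visible_length are assumptions for this example, not part of any SRT specification.

```python
import re

# Assumed set of simple SRT-style tags; adjust to match the files at hand.
TAG_RE = re.compile(r"</?(?:b|i|u|font)\b[^>]*>", re.IGNORECASE)


def visible_length(line):
    """Length of the line as displayed, i.e. with markup tags removed."""
    return len(TAG_RE.sub("", line))


print(visible_length("<b>Hello</b> there"))
```

The raw line would still be written out unchanged. Only the length comparison uses the stripped text.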
>>
>> reformat_subtitle_text.pl:
>>
>> #!/usr/bin/env perl
>>
>> use strict;
>> use Text::Wrap;
>> use POSIX qw/ceil/;
>>
>> my $subtitles = '';
>>
>> # regex to dissect one subtitle entry: 1) sequence number and time range,
>> # 2) first dialog text line, and 3) second dialog text line
>> my $seq_item_re = qr/(\d+\n\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}\n)(.+\n)(.+\n)/;
>>
>> # read in all the input subtitle text
>> $subtitles = do { local $/; <STDIN> };
>>
>> # extract each and all subtitle entries with two lines of dialog text
>> # and replace them with reformatted version
>> $subtitles =~ s/$seq_item_re/$1 . fixup_dialog($2, $3)/mge;
>>
>> # output the reformatted subtitles
>> print $subtitles;
>>
>> # reformat two lines of dialog text to have more equal line lengths, with
>> # line two the longer if necessary for proper word wrapping
>>
>> sub fixup_dialog {
>> my ($line1, $line2) = @_;
>>
>> # trim trailing white space
>> $line1 =~ s/\s+$//;
>> $line2 =~ s/\s+$//;
>>
>> # ideal column width for two lines of characters without word wrapping;
>> # with word wrapping this will leave the second line the longer of the two
>> my $ideal_col_width = ceil((length($line1) + length($line2))/2) + 1;
>> my $total_text = $line1 . " " . $line2 . "\n";
>>
>> # locally set wrapping parameters: do not turn spaces back into tabs in
>> # the output, and constrain the column width
>> local($Text::Wrap::unexpand) = 0;
>> local($Text::Wrap::columns) = $ideal_col_width;
>> my $wrapped_text = wrap('', '', $total_text);
>>
>> # if word wrapping creates third line move it to end of second line
>> if ( $wrapped_text =~ m/(.+\n.+)\n(.+\n)/){
>> $wrapped_text = $1 . $2;
>> }
>> return $wrapped_text;
>> }
>>
>>
>> On Sunday, February 16, 2025 at 3:37:16 AM UTC-8 Otto Munters wrote:
>>
>>> Is there a regex to divide the last two lines of each subtitle more
>>> evenly in the following example, so that both sentences are about the same
>>> length, with preference given to the longest sentence on the 4th line.
>>> Example:
>>> 351
>>> 00:18:23,120 --> 00:18:29,600
>>> not that likes and dislikes are
>>> your enemies
>>>
>>> 352
>>> 00:18:29,600 --> 00:18:31,960
>>> because they end up serving
>>> the society.
>>>
>>> Thanks for your kind help!
>>> Otto
>>>
>>
--
This is the BBEdit Talk public discussion group. If you have a feature request
or believe that the application isn't working correctly, please email
"[email protected]" rather than posting here. Follow @bbedit on Mastodon:
<https://mastodon.social/@bbedit>
To view this discussion visit
https://groups.google.com/d/msgid/bbedit/c2d2b594-3133-4a3e-b518-5b59403c3ae1n%40googlegroups.com.