If you’re going to ingest the text, you could turn all the spaces into line breaks and then remove duplicate lines. If you don’t care about punctuation, you could remove that too.
Or create a script to do it.

On Wed, Oct 30, 2024 at 7:21 AM ce gm <[email protected]> wrote:

> I haven't found a thread on this, but apologies if one exists!
>
> I am new to BBEdit, and am using it to clean .txt files prior to text
> mining. I am converting files to .txt from PDF to ensure R reads the files
> in correctly (I've had issues with the R PDF reader). When I do this
> conversion, there are often duplicates of words, appearing like "to to" or
> "finally finally" throughout the text. These get flagged for grammar in
> TextEdit and Word, but to fix it, it requires you go through the entire
> document manually. I have thousands of pages to go through - if I ever want
> to finish my dissertation, I can't do that.
>
> I tried the Process Duplicate Lines command in BBEdit, but it did not
> remove duplicates of words within lines. Does anyone know if there is a way
> to get BBEdit to identify duplicate words, then automatically delete one of
> them?
>
> (or if not BBEdit, then Word or TextEdit?)
>
> Thanks!
>
> --
> This is the BBEdit Talk public discussion group. If you have a feature
> request or believe that the application isn't working correctly, please
> email "[email protected]" rather than posting here. Follow @bbedit on
> Mastodon: <https://mastodon.social/@bbedit>
> ---
> You received this message because you are subscribed to the Google Groups
> "BBEdit Talk" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion visit
> https://groups.google.com/d/msgid/bbedit/2a1b0304-1e5e-4e25-90f4-829fbd7b650cn%40googlegroups.com
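
For the script route, here is a minimal sketch in Python (an assumption on my part; the thread does not name a language). It targets the immediately repeated words described in the question ("to to", "finally finally") using a regular expression with a backreference. The names REPEATED_WORD and collapse_repeated_words are made up for illustration. Note that it will also collapse legitimate repeats such as "had had", and that a repeat split across a line break loses that line break.

```python
import re

# A word followed by one or more copies of itself, separated only by whitespace.
# The \b anchors stop partial matches such as "the then"; IGNORECASE means
# "The the" is also treated as a repeat.
REPEATED_WORD = re.compile(r"\b(\w+)(?:\s+\1\b)+", re.IGNORECASE)


def collapse_repeated_words(text: str) -> str:
    """Keep only the first word of each run of immediately repeated words."""
    return REPEATED_WORD.sub(r"\1", text)


sample = "I went to to the store and finally finally left."
print(collapse_repeated_words(sample))
# prints: I went to the store and finally left.
```

The same idea should carry over to BBEdit's Find dialog with Grep enabled: search for \b(\w+)\s+\1\b and replace with \1. Try it on a copy of one file first.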
