That was my first thought too, but I quickly realised it won’t work, because in the end the document would contain only one instance of each word in the original.
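A minimal Python sketch of the difference, offered as an illustration rather than a tested BBEdit recipe. The sample sentence and both function names are made up for this example. The first function mimics the "one word per line, then remove duplicate lines" idea; the second collapses a word only when it is immediately repeated.

```python
import re

# Hypothetical sample text, not taken from the poster's files.
SAMPLE = "I went to to the shop and finally finally went to the park."


def dedupe_all_words(text):
    """Mimic 'one word per line, then remove duplicate lines'.

    Every later occurrence of a word is dropped, wherever it appears.
    """
    seen = set()
    kept = []
    for word in text.split():
        if word not in seen:
            seen.add(word)
            kept.append(word)
    return " ".join(kept)


# A word, followed one or more times by whitespace and the same word again.
# The trailing \b stops "to together" from being treated as a repeat.
ADJACENT_DUPLICATE = re.compile(r"\b(\w+)(?:\s+\1\b)+", re.IGNORECASE)


def dedupe_adjacent_words(text):
    """Collapse runs of an immediately repeated word into a single copy."""
    return ADJACENT_DUPLICATE.sub(r"\1", text)


if __name__ == "__main__":
    print(dedupe_all_words(SAMPLE))
    print(dedupe_adjacent_words(SAMPLE))
```

On the sample, the first function also throws away the second, legitimate "went to the", while the second function only touches "to to" and "finally finally". Note that a pattern like this will also collapse intentional repeats such as "had had" or "that that", so the results are worth spot-checking. The same expression should carry over to a grep-style find and replace, but that is an assumption and has not been tried here.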
The duplicates to be removed need to be adjacent, not just appearing anywhere else in the document.

Roger

> On Oct 30, 2024, at 8:35 AM, Jim Straus <[email protected]> wrote:
>
> If you’re going to ingest the text, you could turn all the spaces into line
> breaks and then remove duplicate lines. If you don’t care about punctuation
> you could remove that too.
>
> Or create a script to do it.
>
> On Wed, Oct 30, 2024 at 7:21 AM ce gm <[email protected]
> <mailto:[email protected]>> wrote:
>> I haven't found a thread on this, but apologies if one exists!
>>
>> I am new to BBEdit, and am using it to clean .txt files prior to text
>> mining. I am converting files to .txt from PDF to ensure R reads the files
>> in correctly (I've had issues with the R PDF reader). When I do this
>> conversion, there are often duplicates of words, appearing like "to to" or
>> "finally finally" throughout the text. These get flagged for grammar in
>> TextEdit and Word, but to fix it, it requires you go through the entire
>> document manually. I have thousands of pages to go through - if I ever want
>> to finish my dissertation, I can't do that.
>>
>> I tried the Process Duplicate Lines command in BBEdit, but it did not remove
>> duplicates of words within lines. Does anyone know if there is a way to get
>> BBEdit to identify duplicate words, then automatically delete one of them?
>>
>> (or if not BBEdit, then Word or TextEdit?)
>>
>> Thanks!
>>
>> --
>> This is the BBEdit Talk public discussion group. If you have a feature
>> request or believe that the application isn't working correctly, please
>> email "[email protected] <mailto:[email protected]>" rather than
>> posting here. Follow @bbedit on Mastodon: <https://mastodon.social/@bbedit>
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "BBEdit Talk" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected]
>> <mailto:[email protected]>.
>> To view this discussion visit
>> https://groups.google.com/d/msgid/bbedit/2a1b0304-1e5e-4e25-90f4-829fbd7b650cn%40googlegroups.com
