That was my first thought too, but I quickly realised it won’t work, because 
in the end the document would contain only one instance of each word in the 
original.

The duplicates to be removed need to be adjacent, not just appearing anywhere 
else in the document.
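
To make the distinction concrete, here is a minimal Python sketch, not something 
proposed in this thread. The function names are made up for illustration, and 
the regular expression is just one assumed way of matching a word that is 
immediately followed by itself. The first function mimics the "one word per 
line, then remove duplicate lines" idea; the second removes only adjacent 
repeats.

```python
import re

# Hypothetical helper: mimics "turn spaces into line breaks, then remove
# duplicate lines". Every word survives only once, wherever it appeared.
def unique_words(text: str) -> str:
    return " ".join(dict.fromkeys(text.split()))

# Hypothetical helper: collapses a word that is immediately repeated
# ("to to", "finally finally finally") down to a single copy.
# \b(\w+)        a whole word, captured as group 1
# (?:\s+\1\b)+   one or more repeats of that same word after whitespace
# Note: \s+ also matches line breaks, so a repeat split across two lines is
# collapsed too, and the line break between the copies is dropped.
ADJACENT_DUPLICATE = re.compile(r"\b(\w+)(?:\s+\1\b)+", re.IGNORECASE)

def remove_adjacent_duplicates(text: str) -> str:
    return ADJACENT_DUPLICATE.sub(r"\1", text)

if __name__ == "__main__":
    sample = "I want to to finish, finally finally, to be or not to be."
    print(unique_words(sample))
    print(remove_adjacent_duplicates(sample))
```

Legitimate repeats such as "that that" or "had had" would be collapsed as well, 
so the result is worth spot-checking. A pattern along the same lines may also 
work in BBEdit's Find dialog with Grep enabled and \1 as the replacement, but 
that is an untested assumption here.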

Roger



> On Oct 30, 2024, at 8:35 AM, Jim Straus <[email protected]> wrote:
> 
> If you’re going to ingest the text, you could turn all the spaces into line 
> breaks and then remove duplicate lines.  If you don’t care about punctuation 
> you could remove that too.
> 
> Or create a script to do it.
> 
> On Wed, Oct 30, 2024 at 7:21 AM ce gm <[email protected]> wrote:
>> I haven't found a thread on this, but apologies if one exists! 
>> 
>> I am new to BBEdit, and am using it to clean .txt files prior to text 
>> mining. I am converting files to .txt from PDF to ensure R reads the files 
>> in correctly (I've had issues with the R PDF reader). When I do this 
>> conversion, there are often duplicates of words, appearing like "to to" or 
>> "finally finally" throughout the text. These get flagged for grammar in 
>> TextEdit and Word, but to fix it, it requires you go through the entire 
>> document manually. I have thousands of pages to go through - if I ever want 
>> to finish my dissertation, I can't do that.
>> 
>> I tried the Process Duplicate Lines command in BBEdit, but it did not remove 
>> duplicates of words within lines. Does anyone know if there is a way to get 
>> BBEdit to identify duplicate words, then automatically delete one of them?
>> 
>> (or if not BBEdit, then Word or TextEdit?)
>> 
>> Thanks!
>> 
>> -- 
>> This is the BBEdit Talk public discussion group. If you have a feature 
>> request or believe that the application isn't working correctly, please 
>> email "[email protected] <mailto:[email protected]>" rather than 
>> posting here. Follow @bbedit on Mastodon: <https://mastodon.social/@bbedit>
>> --- 
>> You received this message because you are subscribed to the Google Groups 
>> "BBEdit Talk" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected] 
>> <mailto:[email protected]>.
>> To view this discussion visit 
>> https://groups.google.com/d/msgid/bbedit/2a1b0304-1e5e-4e25-90f4-829fbd7b650cn%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/bbedit/2a1b0304-1e5e-4e25-90f4-829fbd7b650cn%40googlegroups.com?utm_medium=email&utm_source=footer>.
