I haven't found a thread on this, but apologies if one exists! 

I am new to BBEdit, and am using it to clean .txt files prior to text 
mining. I am converting files to .txt from PDF to ensure R reads the files 
in correctly (I've had issues with the R PDF reader). When I do this 
conversion, there are often duplicates of words, appearing like "to to" or 
"finally finally" throughout the text. These get flagged for grammar in 
TextEdit and Word, but fixing them means going through the entire 
document manually. I have thousands of pages to go through, so if I ever 
want to finish my dissertation, I can't do that.

I tried the Process Duplicate Lines command in BBEdit, but it did not 
remove duplicates of words within lines. Does anyone know if there is a way 
to get BBEdit to identify duplicate words, then automatically delete one of 
them?

(or if not BBEdit, then Word or TextEdit?)
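
To make the goal concrete, here is a minimal Python sketch of the operation 
described above: find a word that is immediately followed by itself and keep 
only one copy. It is an illustration under assumptions, not a BBEdit feature: 
the function name remove_duplicate_words is hypothetical, "word" is assumed 
to mean a run of letters, and matching is assumed to be case-insensitive.

```python
import re

# A run of letters, followed one or more times by whitespace and the same
# run of letters again. [^\W\d_] means "a word character that is not a digit
# or underscore", so repeated numbers such as "2 2" are left alone.
# The \b anchors stop partial matches such as "the then".
DUPLICATE_WORD = re.compile(r"\b([^\W\d_]+)(\s+\1\b)+", re.IGNORECASE)


def remove_duplicate_words(text: str) -> str:
    """Collapse each run of an immediately repeated word into one copy."""
    return DUPLICATE_WORD.sub(r"\1", text)


if __name__ == "__main__":
    sample = "I want to to finish, finally finally."
    print(remove_duplicate_words(sample))
```

Two caveats with this sketch: it also collapses legitimate repeats such as 
"had had" or "that that", and because \s+ matches line breaks, a duplicate 
split across two lines is joined onto one line. The same pattern idea should 
carry over to any regex-capable find-and-replace that supports 
backreferences.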

Thanks!

-- 
This is the BBEdit Talk public discussion group. If you have a feature request 
or believe that the application isn't working correctly, please email 
"[email protected]" rather than posting here. Follow @bbedit on Mastodon: 
<https://mastodon.social/@bbedit>
--- 
You received this message because you are subscribed to the Google Groups 
"BBEdit Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion visit 
https://groups.google.com/d/msgid/bbedit/2a1b0304-1e5e-4e25-90f4-829fbd7b650cn%40googlegroups.com.
