May I ask whether these duplicate words are arbitrary, or whether they consist 
mainly of articles, for example? Also, do these words contain any accented 
characters or numerals?

(I expect a suitable grep search & replace could clean up quite a few of these, 
though an example file would be helpful.)
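
As a rough illustration of the kind of grep pattern meant here, the sketch below 
collapses a word that is immediately repeated (separated only by whitespace) 
into a single copy. It is written in Python only so that it can be run on its 
own; the names DUPLICATE_WORD and collapse_duplicate_words are hypothetical, and 
the pattern is an assumption, not something tested against the actual files. The 
same pattern and a replacement of \1 should also be usable in BBEdit's Find 
window with Grep enabled, but that would need checking on a sample file.

```python
import re

# Assumed pattern: a whole word, followed by one or more repeats of that
# same word, with only whitespace (spaces, tabs, line breaks) in between.
DUPLICATE_WORD = re.compile(r"\b(\w+)(?:\s+\1\b)+")


def collapse_duplicate_words(text: str) -> str:
    """Replace runs such as "to to" or "finally finally" with one copy."""
    return DUPLICATE_WORD.sub(r"\1", text)


if __name__ == "__main__":
    sample = "I went to to the store and finally finally left."
    print(collapse_duplicate_words(sample))
```

Some caveats with this sketch: the match is case-sensitive, so "The the" is left 
alone; \w also matches digits and accented letters, so "10 10" would be 
collapsed as well; and legitimate repeats such as "had had" would be removed 
too, which is why knowing what the duplicates look like matters.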


Regards,

 Patrick Woolsey
==
Bare Bones Software, Inc.             <https://www.barebones.com/>


> On Oct 30, 2024, at 02:21, ce gm <[email protected]> wrote:
> 
> I haven't found a thread on this, but apologies if one exists! 
> 
> I am new to BBEdit, and am using it to clean .txt files prior to text mining. 
> I am converting files to .txt from PDF to ensure R reads the files in 
> correctly (I've had issues with the R PDF reader). When I do this conversion, 
> there are often duplicates of words, appearing like "to to" or "finally 
> finally" throughout the text. These get flagged for grammar in TextEdit and 
> Word, but fixing them requires going through the entire document manually. 
> I have thousands of pages to go through - if I ever want to finish my 
> dissertation, I can't do that.
> 
> I tried the Process Duplicate Lines command in BBEdit, but it did not remove 
> duplicates of words within lines. Does anyone know if there is a way to get 
> BBEdit to identify duplicate words, then automatically delete one of them?
> 
> (or if not BBEdit, then Word or TextEdit?)
> 

-- 
This is the BBEdit Talk public discussion group. If you have a feature request 
or believe that the application isn't working correctly, please email 
"[email protected]" rather than posting here. Follow @bbedit on Mastodon: 
<https://mastodon.social/@bbedit>
