On Tue, 5 Feb 2019, logical american wrote:

Is there a linux tool which cleans up the URLs in a text file (I believe
Western unicode encoding) so that all the tracking tags, fbclid, etc are
removed and the pure URL is left in the text?

In one recent email I received, there were 28 govdelivery.com tags and
others embedded inside the URLs, and I don't want the material I post to
give the website an easy way to track readers.

Randall,

I have no idea what your files look like so I can offer only a generic
overview. You have grep, sed, awk and the scripting languages Perl and
Python. Each will do the job but the choice depends on the structure of the
text file. You might need to pre-process the file(s) using an editor (emacs
I know will work; vim probably does too) so that the lines in the files are
uniform and the URLs can easily be identified.
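
As a rough starting point, here is a minimal Python sketch of the generic
approach: find anything that looks like a URL in the text, split off its
query string, and drop the parameters you consider tracking tags. The
parameter list (fbclid, gclid, anything starting with utm_) and the URL
pattern are my assumptions, not something taken from your files; any
govdelivery-specific names would have to be added by hand once you see what
they are. Note that the remaining query parameters get re-encoded, so their
escaping may change slightly.

```python
import re
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed tracking parameters; extend these after inspecting your own mail.
TRACKING_NAMES = {"fbclid", "gclid"}
TRACKING_PREFIXES = ("utm_",)

# Simple pattern: a URL runs until whitespace, a quote, or an angle bracket.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def clean_url(url):
    """Return url with the assumed tracking parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_NAMES
        and not key.lower().startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def _replace(match):
    raw = match.group(0)
    # Keep sentence punctuation that trails the URL out of the URL itself.
    url = raw.rstrip(".,;:!?)")
    return clean_url(url) + raw[len(url):]


def clean_text(text):
    """Clean every URL found in a block of text."""
    return URL_RE.sub(_replace, text)


if __name__ == "__main__":
    import sys
    # Filter usage: read text on stdin, write cleaned text to stdout.
    sys.stdout.write(clean_text(sys.stdin.read()))
```

Saved under a name of your choosing (say, cleanurls.py, which is just a
placeholder), it could be run as a filter:
python3 cleanurls.py < message.txt > cleaned.txt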

HTH,

Rich
_______________________________________________
PLUG mailing list
PLUG@pdxlinux.org
http://lists.pdxlinux.org/mailman/listinfo/plug
