I'm trying to use sed to munge some text in HTML files, converting Unicode characters to their HTML entity equivalents, but I can't get it to work.

For instance, this command has no apparent effect:

  sed -i -e 's/\xe2\x80\x94/&mdash;/g' foo.html

Other sed operations using ASCII arguments work fine.
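
For comparison, here is a minimal sketch of a variant that avoids \x escapes in the sed expression altogether. It rests on an assumption I have not confirmed for OpenBSD's sed: that the \xHH escapes, which POSIX does not define for sed, are the problem rather than Unicode handling as such. The sample text, the temporary file and the variable names are made up for illustration. The & in the replacement is escaped because a bare & in a sed replacement stands for the matched text.

```shell
# Hypothetical sample file standing in for foo.html; it contains the
# three UTF-8 bytes of U+2014 (EM DASH), written as octal escapes.
tmp=$(mktemp)
printf 'before \342\200\224 after\n' > "$tmp"

# Build the same three bytes with printf, whose octal escapes are
# defined by POSIX, instead of relying on \x escapes inside sed.
emdash=$(printf '\342\200\224')

# LC_ALL=C asks sed to treat the input as plain bytes.
# Inside double quotes, \\& reaches sed as \& (a literal ampersand).
result=$(LC_ALL=C sed "s/$emdash/\\&mdash;/g" "$tmp")
echo "$result"

rm -f "$tmp"
```

This sketch writes to standard output rather than editing in place, so it does not depend on the -i option.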

Does sed support Unicode in this fashion? The sed(1) man page is silent. The FAQ section on Character Sets <http://www.openbsd.org/faq/faq10.html#locales> indicates that:

   OpenBSD uses the ASCII character set by default. It also supports
   the Unicode (UTF-8) character set.

but I'm not sure what bearing that has on this issue.

Running OpenBSD 6.0 (GENERIC.MP) #2302: Sat Jul 23 09:33:37 MDT 2016 (amd64)

Many thanks in advance for any assistance.
