Re: [Haskell-cafe] Unescaping with HaXmL (or anything else!)

2008-04-01 Thread Yitzchak Gale
On Fri, Mar 28, 2008 at 4:26 AM, Anton van Straaten wrote:
> I want to unescape an encoded XML or HTML string, e.g. converting &quot;
> to the quote character, etc.
> Since I'm using HaXml anyway, I tried using xmlUnEscapeContent with no
> luck

Hi Anton,

I only noticed your post today, sorry for the delay.

I also need this. In fact, it seems to me that it would be
generally useful. I hope that simple functions to escape/unescape
a string will be added to the API.
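
For illustration only, here is a minimal sketch of what such standalone
functions could look like using nothing but base. It is not part of HaXml;
the names unescapeXml, escapeXml, decodeRef and fromCode are hypothetical.
It assumes that only the five predefined XML entities and numeric character
references need handling, and it copies anything it does not recognise
through unchanged.

```haskell
import Data.Char (chr, isDigit, isHexDigit)
import Numeric (readHex)

-- Hypothetical helper, not a HaXml function: replace the five predefined
-- XML entities and numeric character references by the characters they
-- denote. Unrecognised references and stray '&' are left as they are.
unescapeXml :: String -> String
unescapeXml [] = []
unescapeXml ('&' : rest) =
  case break (== ';') rest of
    (name, ';' : after)
      | Just c <- decodeRef name -> c : unescapeXml after
    _ -> '&' : unescapeXml rest
unescapeXml (c : rest) = c : unescapeXml rest

-- Decode the text between '&' and ';', if it is a known reference.
decodeRef :: String -> Maybe Char
decodeRef "quot" = Just '"'
decodeRef "amp"  = Just '&'
decodeRef "apos" = Just '\''
decodeRef "lt"   = Just '<'
decodeRef "gt"   = Just '>'
decodeRef ('#' : 'x' : hex)
  | not (null hex), all isHexDigit hex
  , [(n, "")] <- readHex hex = fromCode n
decodeRef ('#' : dec)
  | not (null dec), all isDigit dec = fromCode (read dec)
decodeRef _ = Nothing

-- Accept only code points inside the Unicode range.
fromCode :: Integer -> Maybe Char
fromCode n
  | n >= 0 && n <= 0x10FFFF = Just (chr (fromInteger n))
  | otherwise = Nothing

-- The inverse direction: escape the five special characters.
escapeXml :: String -> String
escapeXml = concatMap esc
  where
    esc '"'  = "&quot;"
    esc '&'  = "&amp;"
    esc '\'' = "&apos;"
    esc '<'  = "&lt;"
    esc '>'  = "&gt;"
    esc c    = [c]
```

The input is decoded in a single pass, so "&amp;lt;" becomes "&lt;" rather
than "<". HTML named entities such as &nbsp; are outside the scope of this
sketch and would need a lookup table of their own.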

In the meantime, you are right that it is a bit tricky
to do this in HaXml. Besides the wrappers that you found
to be needed, there are two other issues:

One issue is that you need to lex and then parse the text first.
If you tell HaXml that your string is a CString, it
will believe you and just use the text the way it is without
any further processing.

The other issue is that HaXml's lexer currently can only
deal with XML content that begins with an XML tag. (I've
pointed this out to Malcolm Wallace, the author of HaXml.)
So in order to use it, you need to wrap your content in a
tag and then unwrap it after parsing.

The code below works for me (obviously it would be better to
remove the error calls):

Regards,
Yitz

import Text.XML.HaXml
import Text.XML.HaXml.Parse (xmlParseWith, document)
import Text.XML.HaXml.Lex (xmlLex)

unEscapeXML :: String -> String
unEscapeXML = concatMap ctext . xmlUnEscapeContent stdXmlEscaper .
  unwrapTag .
  either error id . fst . xmlParseWith document .
  xmlLex "oops, lexer failed" . wrapWithTag "t"
  where
    ctext (CString _ txt _) = txt
    ctext (CRef (RefEntity name) _) = '&' : name ++ ";" -- skipped by escaper
    ctext (CRef (RefChar num) _) = '&' : '#' : show num ++ ";" -- ditto
    ctext _ = error "oops, can't unescape non-cdata"
    wrapWithTag t s = concat ["<", t, ">", s, "</", t, ">"]
    unwrapTag (Document _ _ (Elem _ _ c) _) = c
    unwrapTag _ = error "oops, not wrapped"
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Unescaping with HaXmL (or anything else!)

2008-03-28 Thread Henning Thielemann


On Thu, 27 Mar 2008, Anton van Straaten wrote:

> I want to unescape an encoded XML or HTML string, e.g. converting &quot; to
> the quote character, etc.
>
> Since I'm using HaXml anyway, I tried using xmlUnEscapeContent with no luck,
> e.g. with HaXml 1.19.1:
>
> let (CString _ s _) =
>       head $ xmlUnEscapeContent stdXmlEscaper $
>         [CString False "This is a &quot;quoted string&quot;" ()] in s
>
> The result is unchanged, i.e. "This is a &quot;quoted string&quot;".
>
> Am I doing something wrong, or are my expectations wrong, or is this a bug?
>
> Or, is there any other library that includes a simple unescape function for
> XML or HTML?


Tagsoup must contain such a function but it doesn't seem to export it.


[Haskell-cafe] Unescaping with HaXmL (or anything else!)

2008-03-27 Thread Anton van Straaten
I want to unescape an encoded XML or HTML string, e.g. converting &quot;
to the quote character, etc.

Since I'm using HaXml anyway, I tried using xmlUnEscapeContent with no
luck, e.g. with HaXml 1.19.1:

let (CString _ s _) =
      head $ xmlUnEscapeContent stdXmlEscaper $
        [CString False "This is a &quot;quoted string&quot;" ()] in s

The result is unchanged, i.e. "This is a &quot;quoted string&quot;".

Am I doing something wrong, or are my expectations wrong, or is this a bug?

Or, is there any other library that includes a simple unescape function 
for XML or HTML?


(The Network.URI module includes an unescape function, but that's 
specific to URIs, naturally.)


Anton
