[Haskell-cafe] Munging wiki articles with tagsoup

2008-09-07 Thread Gwern Branwen
Hiya Neil. So recently I've been trying to come up with some automated system 
to turn The Monad Reader articles like those in 
 into wiki-formatted articles 
for putting on Haskell.org. Thus far, I've had the most success with SVN Pandoc.

Pandoc does a good job - you can see an example conversion at 
. Modulo the 
errors which are largely due to haskell.org problems and a few limitations in 
Pandoc (no comments, no real support for references), it's fine.

But Pandoc's author will not support <haskell> tags inasmuch as they 
are an extension to MediaWiki and not universal; he prefers <pre> or similar tags. 
He suggested I use TagSoup to convert them into 
<haskell> tags. Well, alright. They're tags, TagSoup does tags - seems natural.

After an hour, I came up with a nice clean little script:



import Text.HTML.TagSoup.Render
import Text.HTML.TagSoup

main :: IO ()
main = interact convertPre

convertPre :: String -> String
convertPre = renderTags . map convertToHaskell . canonicalizeTags . parseTags

convertToHaskell :: Tag -> Tag
convertToHaskell x
   | isTagOpenName  "pre" x = TagOpen  "haskell" (extractAttribs x)
   | isTagCloseName "pre" x = TagClose "haskell"
   | otherwise  = x
 where
   extractAttribs :: Tag -> [Attribute]
   extractAttribs (TagOpen _ y) = y
   extractAttribs _ = error "The impossible happened."



On an aside, may I note that TagSoup doesn't seem to support transformations 
particularly well? Or if it does, I didn't notice any examples. I spent most of 
my time just figuring out how to convert the 'x' from a <pre>stuff to 
<haskell>stuff. Also, it might be nice to define an 'interact'-alike, which 
yields a (String -> String) and is defined, I suppose, as 
'interact f = renderTags . f . canonicalizeTags . parseTags'. Extraction 
functions would be good as well - you'd only need 3 groups, I think: 1 for the 
2 items in TagOpen, 1 for TagPosition's 2 positions, and 1 which extracts the 
String from the rest.
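
As a rough sketch of the three extraction groups suggested above: the Tag type below is a simplified local stand-in (an assumption for illustration, not TagSoup's actual definition, which may differ between versions), and the function names openParts, positionParts, tagString and renameTag are hypothetical.

```haskell
-- Simplified stand-in for TagSoup's Tag type (an assumption for this sketch;
-- the real type lives in Text.HTML.TagSoup and may differ between versions).
type Attribute = (String, String)

data Tag
    = TagOpen String [Attribute]
    | TagClose String
    | TagText String
    | TagComment String
    | TagWarning String
    | TagPosition Int Int
    deriving (Eq, Show)

-- Group 1: the two fields of TagOpen.
openParts :: Tag -> Maybe (String, [Attribute])
openParts (TagOpen name attrs) = Just (name, attrs)
openParts _                    = Nothing

-- Group 2: the two positions of TagPosition.
positionParts :: Tag -> Maybe (Int, Int)
positionParts (TagPosition row col) = Just (row, col)
positionParts _                     = Nothing

-- Group 3: the single String carried by every other constructor.
tagString :: Tag -> Maybe String
tagString (TagClose s)   = Just s
tagString (TagText s)    = Just s
tagString (TagComment s) = Just s
tagString (TagWarning s) = Just s
tagString _              = Nothing

-- Rename an element while keeping its attributes. Pattern matching on the
-- constructor removes the need for an "impossible" error branch.
renameTag :: String -> String -> Tag -> Tag
renameTag from to (TagOpen name attrs) | name == from = TagOpen to attrs
renameTag from to (TagClose name)      | name == from = TagClose to
renameTag _    _  tag                                 = tag

main :: IO ()
main = mapM_ (print . renameTag "pre" "haskell")
    [TagOpen "pre" [("class", "x")], TagText "it's", TagClose "pre"]
```

Returning Maybe instead of calling error keeps the extractors total, which is one possible design; a library might equally well choose partial functions.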

Anyway, so my script seems to work. I ran the wiki output through it and this 
is the diff: 
.

Ok, good, it replaces all the tags... But wait, what's all this other stuff? It 
is replacing all my apostrophes with an escaped character entity! No doubt this 
has something to do with XML/HTML/SGML or whatever, but it's not ideal. Even if 
it doesn't break the formatting (as I think it does), it's still cluttering up 
the source.

So, how can I fix this? Am I just barking up the wrong tree and should be 
writing a simple-minded search-and-replace sed script which replaces <pre> with 
<haskell>, </pre> with </haskell>...?
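
For comparison, a minimal sketch of the simple-minded search-and-replace route, written in Haskell with only base rather than sed. The helper name replaceAll is hypothetical, and the sketch assumes the tags appear exactly as lowercase <pre> and </pre> with no attributes.

```haskell
import Data.List (isPrefixOf)

-- Replace every occurrence of a non-empty needle. All other characters
-- (apostrophes included) pass through untouched, so nothing gets escaped.
-- Assumption: the needle is non-empty; an empty needle would loop forever.
replaceAll :: String -> String -> String -> String
replaceAll needle replacement = go
  where
    go [] = []
    go s@(c:cs)
        | needle `isPrefixOf` s = replacement ++ go (drop (length needle) s)
        | otherwise             = c : go cs

-- Textual conversion of the two literal tags only.
convertPre :: String -> String
convertPre = replaceAll "</pre>" "</haskell>" . replaceAll "<pre>" "<haskell>"

main :: IO ()
main = interact convertPre
```

The trade-off is the obvious one: this never parses the markup, so a <pre> with attributes or different capitalisation would be left alone.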

--
gwern
USS Enforcers SORO Morwenstow MOD Albright MI5 AOL 701 GCHQ


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Top Level <-

2008-09-07 Thread Ganesh Sittampalam

On Sun, 7 Sep 2008, Ashley Yakeley wrote:


Ganesh Sittampalam wrote:
Suppose I am writing something that I intend to be used as part of a 
plug-in that is reloaded in different forms again and again. And I see 
module K which does something I want, so I use it. It so happens that K 
uses M, which has a <-. If I knew that using K in my plug-in would cause a 
memory leak, I would avoid doing so; but since the whole point of <- is to 
avoid making the need for some state visible in the API.


The results from the <- in M will only be stored once for the life of the 
RTS, no matter how many times your plug-ins are reloaded.


Sorry, I keep forgetting that. OK, so you can't get an endless stream of 
leaks unless you use <- yourself, or modules on your system keep getting 
upgraded to new versions.


Ganesh


Re: [Haskell-cafe] Re: [Haskell] Top Level <-

2008-09-07 Thread Curt Sampson
On 2008-08-28 14:45 -0700 (Thu), Jonathan Cast wrote:

> Now, I happen to know that the only top-level handles that can be
> established without issuing an open system call are
> 
> stdin
> stdout
> stderr
> 
> (unless you're happy to have your global nonStdErr start its life
> attached to an unopened FD).

I've not thought through exactly how this might relate to your argument,
but certainly, though there might or might not be Haskell Handles for
other file descriptors, they can start out open without calling open.
Compile this simple program:

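#include <stdio.h>   /* printf */
#include <unistd.h>  /* write */

int main(void) {
    int n;
    /* fd 5 is never opened here; it has to be inherited from the shell */
    n = write(5, "foobar\n", 7);
    printf("write returned %d\n", n);
    return 0;
}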

and run it with "./a.out 5>&1" and have a look at the result you get.
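
A rough Haskell counterpart, for what it is worth: this sketch assumes a POSIX system and the unix package (System.Posix.IO.fdToHandle), and the helper name writeLineTo is hypothetical. It wraps an inherited descriptor in a Handle without any open call in the program itself.

```haskell
import System.IO (hClose, hPutStrLn)
import System.Posix.IO (fdToHandle)
import System.Posix.Types (Fd (..))

-- Wrap an already-open file descriptor in a Handle and write one line to it.
-- The descriptor is assumed to have been opened by the parent process.
writeLineTo :: Fd -> String -> IO ()
writeLineTo fd s = do
    h <- fdToHandle fd
    hPutStrLn h s
    hClose h

main :: IO ()
main = writeLineTo (Fd 5) "foobar"
```

As with the C version, this is meant to be run with something like "5>&1" on the command line; without the redirection, the call is expected to fail with an I/O error rather than return -1.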

cjs
-- 
Curt Sampson  <[EMAIL PROTECTED]>  +81 90 7737 2974
Mobile sites and software consulting: http://www.starling-software.com


[Haskell-cafe] Re: Top Level <-

2008-09-07 Thread Ashley Yakeley

Ganesh Sittampalam wrote:
Suppose I am writing something that I intend to be used as part of a 
plug-in that is reloaded in different forms again and again. And I see 
module K which does something I want, so I use it. It so happens that K 
uses M, which has a <-. If I knew that using K in my plug-in would cause 
a memory leak, I would avoid doing so; but since the whole point of <- 
is to avoid making the need for some state visible in the API.


The results from the <- in M will only be stored once for the life of 
the RTS, no matter how many times your plug-ins are reloaded.


--
Ashley Yakeley


Re: [Haskell-cafe] Re: language proposal: ad-hoc overloading

2008-09-07 Thread Brandon S. Allbery KF8NH

On 2008 Sep 6, at 19:09, John Smith wrote:

Ryan Ingram wrote:

module Prob where
import qualified Data.Map as M
import Data.List (foldl')

newtype Prob p a = Prob { runProb :: [(a,p)] }

combine :: (Num p, Ord a) => Prob p a -> Prob p a
combine m = Prob $
   M.assocs $
   foldl' (flip $ uncurry $ M.insertWith (+)) M.empty $
   runProb m

Do you see it?  All those "M." just seem dirty to me, especially
because the compiler should be able to deduce them from the types of
the arguments.


May I humbly suggest a much simpler solution to your problem: if an  
identifier is ambiguous, the compiler will use the last import. So,  
in your example, the compiler will assume that any instance of empty  
is Data.Map.empty


I don't like that idea very much; if I reorder my imports the program  
semantics suddenly change?


Some means of using an imported module as the default namespace, and  
requiring the Prelude to be qualified, may also help.


You can already do this by importing Prelude explicitly, possibly with  
the NoImplicitPrelude language option.
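
A minimal sketch of that arrangement, assuming the containers package: the Prelude is imported qualified (plus a short explicit list), so bare names such as filter refer to Data.Map. Any explicit import of Prelude suppresses the implicit one, so the NoImplicitPrelude option is not needed here. The function names positive and positives are made up for illustration, and combine is simplified to work on plain lists.

```haskell
import qualified Prelude as P
import Prelude (IO, Int, Num, Ord, String, flip, print, uncurry, ($), (+), (.), (>))
import Data.List (foldl')
import Data.Map (Map, assocs, empty, filter, fromList, insertWith)

-- Merge duplicate keys by adding their weights; no "M." prefixes needed.
combine :: (Num p, Ord a) => [(a, p)] -> [(a, p)]
combine = assocs . foldl' (flip $ uncurry $ insertWith (+)) empty

-- Unqualified `filter` is Data.Map.filter here...
positive :: Map String Int -> Map String Int
positive = filter (> 0)

-- ...while the list version has to be written P.filter.
positives :: [Int] -> [Int]
positives = P.filter (> 0)

main :: IO ()
main = do
    print (combine [("a", 1 :: Int), ("b", 2), ("a", 3)])
    print (positive (fromList [("x", -1), ("y", 5)]))
    print (positives [-1, 5])
```

The cost shows up in the import list: every Prelude name used unqualified has to be listed by hand.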


--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] [EMAIL PROTECTED]
system administrator [openafs,heimdal,too many hats] [EMAIL PROTECTED]
electrical and computer engineering, carnegie mellon university    KF8NH




[Haskell-cafe] Re: language proposal: ad-hoc overloading

2008-09-07 Thread John Smith

Ryan Ingram wrote:

module Prob where
import qualified Data.Map as M
import Data.List (foldl')

newtype Prob p a = Prob { runProb :: [(a,p)] }

combine :: (Num p, Ord a) => Prob p a -> Prob p a
combine m = Prob $
   M.assocs $
   foldl' (flip $ uncurry $ M.insertWith (+)) M.empty $
   runProb m

Do you see it?  All those "M." just seem dirty to me, especially
because the compiler should be able to deduce them from the types of
the arguments.


May I humbly suggest a much simpler solution to your problem: if an identifier is ambiguous, the compiler will use the 
last import. So, in your example, the compiler will assume that any instance of empty is Data.Map.empty


Some means of using an imported module as the default namespace, and requiring 
the Prelude to be qualified, may also help.



Re: [Haskell-cafe] Re: Top Level <-

2008-09-07 Thread Ganesh Sittampalam

On Sun, 7 Sep 2008, Brandon S. Allbery KF8NH wrote:

You seem to think we must never ensure that something will only be run 
once, that any program that does require this is broken.  As such, the 
standard Haskell libraries (including some whose interfaces are H98) are 
unfixably broken and you'd better start looking elsewhere for your 
"correct behavior".


Data.Unique might be unfixably broken, though perhaps some requirement 
that it not be unloaded while any values of type Unique are still around 
could solve the problem - though it's hard to see how this could be 
implemented sanely. But Data.Unique could (a) probably be replaced with 
something in terms of IORefs and (b) is pretty ugly anyway, since it 
forces you into IO.
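
To make option (a) concrete, here is a minimal sketch of a unique-value supply built on an IORef that is created and passed around explicitly, so no top-level state is involved. The names Supply, newSupply and fresh are hypothetical and are not part of Data.Unique.

```haskell
import Data.IORef (IORef, atomicModifyIORef', newIORef)

-- A supply is just a counter held in an IORef that the caller owns.
newtype Supply = Supply (IORef Integer)

-- Create an independent supply starting at zero.
newSupply :: IO Supply
newSupply = Supply <$> newIORef 0

-- Hand out the current value and bump the counter atomically.
fresh :: Supply -> IO Integer
fresh (Supply ref) = atomicModifyIORef' ref (\n -> (n + 1, n))

main :: IO ()
main = do
    s <- newSupply
    a <- fresh s
    b <- fresh s
    print (a, b)
```

The caveat is that values are only unique per supply: two separately created supplies will hand out the same numbers, which is exactly the guarantee a single global would otherwise provide. It also still lives in IO, so it does nothing about point (b).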


I'm sure that for many other examples, re-initialisation would be fine. 
For example Data.HashTable just uses a global for instrumentation for 
performance tuning, which could happily be reset if it got unloaded and 
then reloaded. System.Random could get a new StdGen. I haven't yet had 
time to go through the entire list that Adrian Hey posted to understand 
why they are being used, though.


I'd also point out that if you unload and load libraries in C, global 
state will be lost and re-initialised.


Ganesh


Re: [Haskell-cafe] Re: Top Level <-

2008-09-07 Thread Brandon S. Allbery KF8NH

On 2008 Sep 7, at 12:10, Ganesh Sittampalam wrote:

On Sun, 7 Sep 2008, Brandon S. Allbery KF8NH wrote:

Since you consider memory leaks to be worse than correct behavior,


Not leaking memory is *part* of correct behaviour. If <- is to be  
created at all, it should be created with restrictions that make it  
capable of guaranteeing correct behaviour.


(But you might want to go look at that list of modules which do  
global variable initialization and therefore aren't entirely  
trustworthy unless something like ACIO exists.)


We should fix them (and their interface) so this doesn't happen,  
rather than standardising something broken.


And we're right back to "so how do we do this when we aren't allowed  
to record that it has already been run?"  You seem to think we must  
never ensure that something will only be run once, that any program  
that does require this is broken.  As such, the standard Haskell  
libraries (including some whose interfaces are H98) are unfixably  
broken and you'd better start looking elsewhere for your "correct  
behavior".


--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] [EMAIL PROTECTED]
system administrator [openafs,heimdal,too many hats] [EMAIL PROTECTED]
electrical and computer engineering, carnegie mellon university    KF8NH




Re: [Haskell-cafe] HXT from schema to data model

2008-09-07 Thread Andrea Rossato
On Sun, Sep 07, 2008 at 03:35:40PM +0200, Pierre-Edouard Portier wrote:
> Hi!
> Is there a way to generate a data model and a set of picklers from an XML (or
> RelaxNG) Schema using the HXT tool box?


Not that I'm aware of. There's something for generating a data type
and an access interface from a DTD. See, in the hxt source code:

examples/arrows/dtd2hxt/DTDtoHXT.hs

Nothing for picklers, and nothing from RelaxNG (even the validator is
not complete).

In HaXml there could be something worth having a look at (that depends
on what you are actually searching for).

I had a similar problem some time ago and was looking for something
like that to implement CSL (an XML macro language for citation
formatting), but I ended up writing all the needed boilerplate code
for the data type and the pickler deserializer.

Anyway, drop a line if you find (or write) something.

Cheers,
Andrea


Re: [Haskell-cafe] Re: Top Level <-

2008-09-07 Thread Ganesh Sittampalam

On Sun, 7 Sep 2008, Brandon S. Allbery KF8NH wrote:

plug-in that is reloaded in different forms again and again. And I see 
module K which does something I want, so I use it. It so happens that K 
uses M, which has a <-. If I knew that using K in my plug-in would cause a 
memory leak, I would avoid doing so; but since the whole point of <- is to 
avoid making the need for some state visible in the API.


False, as it's in ACIO and therefore advertises that it will "leak 
memory" in the name of correct behavior.


I thought ACIO was a restriction on the thing on the right hand side of 
the <-? How does the module itself advertise its use of this 
(transitively) to users?



Since you consider memory leaks to be worse than correct behavior,


Not leaking memory is *part* of correct behaviour. If <- is to be created 
at all, it should be created with restrictions that make it capable of 
guaranteeing correct behaviour.


(But you might want to go look at that list of modules which do global 
variable initialization and therefore aren't entirely trustworthy unless 
something like ACIO exists.)


We should fix them (and their interface) so this doesn't happen, rather 
than standardising something broken.


Ganesh


Re: [Haskell-cafe] Re: Top Level <-

2008-09-07 Thread Brandon S. Allbery KF8NH

On 2008 Sep 7, at 6:23, Ganesh Sittampalam wrote:

On Sat, 6 Sep 2008, Ashley Yakeley wrote:

Ganesh Sittampalam wrote:
The set of ACIO expressions exp is the "static initialisers" of  
M. The RTS must note when each static initialiser is run, and  
cache its result val. Let's call this cache of vals the "static  
results cache" of M.
When M is loaded, and a static results cache for M already  
exists, then it will be used for the vals of M.
This sounds "reachable" to me, and therefore static overhead and  
not a leak.
You can call it what you like, but it's still unacceptable  
behaviour, particularly since clients of M will have no way of  
telling from its API that it will happen.


That what will happen?


That memory will be used and not ever be reclaimable.

Suppose I am writing something that I intend to be used as part of a  
plug-in that is reloaded in different forms again and again. And I  
see module K which does something I want, so I use it. It so happens  
that K uses M, which has a <-. If I knew that using K in my plug-in  
would cause a memory leak, I would avoid doing so; but since the  
whole point of <- is to avoid making the need for some state visible  
in the API.


False, as it's in ACIO and therefore advertises that it will "leak  
memory" in the name of correct behavior.  Since you consider memory  
leaks to be worse than correct behavior, you can avoid anything that  
uses ACIO.  (But you might want to go look at that list of modules  
which do global variable initialization and therefore aren't entirely  
trustworthy unless something like ACIO exists.)
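
For readers who have not met it, the global variable initialization in question usually looks something like the sketch below: the well-known unsafePerformIO idiom that a top-level <- is meant to replace. The names counter and next are hypothetical and are not taken from any of the listed modules.

```haskell
import Data.IORef (IORef, atomicModifyIORef', newIORef)
import System.IO.Unsafe (unsafePerformIO)

-- The NOINLINE pragma matters: if the definition were inlined, each use
-- site could end up creating its own IORef.
{-# NOINLINE counter #-}
counter :: IORef Int
counter = unsafePerformIO (newIORef 0)

-- Under the proposal this would presumably be written at top level as
--   counter <- newIORef 0
-- which is not valid Haskell today.

-- Return the current value and increment the shared counter.
next :: IO Int
next = atomicModifyIORef' counter (\n -> (n + 1, n))

main :: IO ()
main = do
    a <- next
    b <- next
    print (a, b)
```

Nothing in the type of next reveals that it depends on hidden global state, which is the API-visibility concern raised earlier in the thread.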


--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] [EMAIL PROTECTED]
system administrator [openafs,heimdal,too many hats] [EMAIL PROTECTED]
electrical and computer engineering, carnegie mellon university    KF8NH




[Haskell-cafe] HXT from schema to data model

2008-09-07 Thread Pierre-Edouard Portier
Hi!
Is there a way to generate a data model and a set of picklers from an XML (or
RelaxNG) Schema using the HXT tool box?
Thank You,
Pierre-Edouard


Re: [Haskell-cafe] Online Real World Haskell, problem with Sqlite3 chapters

2008-09-07 Thread david48
On Fri, Sep 5, 2008 at 1:05 PM, Janis Voigtlaender
<[EMAIL PROTECTED]> wrote:
> See John's comment, right there in the online version:
>
> "The system that generated this webpage didn't have HDBC installed at
> the time. We'll get that fixed and re-post this chapter. In the
> meantime, unfortunately, all of the examples on this page will look that
> way."

Oops. Though, at the time I wrote the message, I didn't have access to
the comments.
I see now it's corrected. Kudos to the authors for an excellent book!

David.


[Haskell-cafe] Re: Top Level <-

2008-09-07 Thread Ganesh Sittampalam

On Sat, 6 Sep 2008, Ashley Yakeley wrote:


Ganesh Sittampalam wrote:
The set of ACIO expressions exp is the "static initialisers" of M. The 
RTS must note when each static initialiser is run, and cache its result 
val. Let's call this cache of vals the "static results cache" of M.


When M is loaded, and a static results cache for M already exists, then 
it will be used for the vals of M.


This sounds "reachable" to me, and therefore static overhead and not a 
leak.


You can call it what you like, but it's still unacceptable behaviour, 
particularly since clients of M will have no way of telling from its 
API that it will happen.


That what will happen?


That memory will be used and not ever be reclaimable.

Suppose I am writing something that I intend to be used as part of a 
plug-in that is reloaded in different forms again and again. And I see 
module K which does something I want, so I use it. It so happens that K 
uses M, which has a <-. If I knew that using K in my plug-in would cause a 
memory leak, I would avoid doing so; but since the whole point of <- is to 
avoid making the need for some state visible in the API.


Ganesh


Re: [Haskell-cafe] Re: Top Level <-

2008-09-07 Thread Ganesh Sittampalam

On Sat, 6 Sep 2008, Brandon S. Allbery KF8NH wrote:


On 2008 Sep 6, at 18:25, Ashley Yakeley wrote:


2. If the dynamic loader loads an endless stream of different modules
containing initialisers, memory will thus leak.


I think if the issue is this vs. not being able to guarantee any 
once-only semantics, I consider the former necessary overhead for proper 
program behavior.


Not leaking memory is an important part of proper program behaviour.

And that, given that there exists extra-program global state that 
one might want to access, once-only initialization is a necessity.


In what cases? In the case of buffered I/O there's no reason (in theory) 
you couldn't unload libc, do unbuffered I/O for a while, then reload libc 
and start again.


Ganesh