Re: Native Threads in the RTS

2002-11-19 Thread Nicolas Oury
Hello,
Le mardi 19 novembre 2002, à 01:28 , Wolfgang Thaller a écrit :


Nicolas Oury wrote:


I don't know if what I say is pertinent, but there was another problem 
that was discussed in the thread about threaded RTS.
One may want to use a finalizer in a particular thread.
For example, a finalizer that put make a rotating cube on screen must 
be ran in the same thread as the Opengl/GLUT things...

Good point. That feature won't be covered by my first proposal (As I 
said, I'll write up a proper document about that ASAP, that is, as soon 
as I find an entire hour of free time). It sounds useful at first, but 
I'm not that sure about it: after all, we can't rely on when the 
finalizer will be executed: the thread might no longer be around, and 
the GLUT window might be long closed. We should definitely think about 
it a little more, though.


These problems always appears with finalizer. An example (maybe a bit 
strange but) : one can want to close a window when the program can't 
reach it anymore. It may sound more realistic if it is closing an 
unreachable database with a databasse library made for one thread only...

I don't know if it is planned but I think it could be great to be able 
to have, in the new OS thread for OpenGL, an "expressivity only"
concurrence system. I mean that to be able to fork user threads that 
are executed in the new OS thread. These new threads  would be  
blocked on other threads in that kernel thread blocked, but can all 
access to this library, and will make programming easier.

This sounds a lot like the "thread group" idea that somebody had when 
we last discussed this. I think it gives us added flexibility at the 
cost of more difficult implementation and the danger of accidentally 
blocking OS threads [it might be just yet another source of bugs].


It is not bad that threads working in the OpenGL world are blocked when 
OpenGL can't receive orders.
The user would see new OS thread as a thread for a kind of group : group 
of people using OpenGL, are blocked wwhen OpenGl blocks. Think again of 
other example : imagine the glorious haskell web server  using a 
database library allowing only one thread access. One can't write all 
the request in only one thread, the programmer will have to add a 
monothreaded layer that serve requests from other threads that are 
preparing pages with database datas.

I don't know if that library (database monothread) exits but a good 
exercise would be to all try to find example library relevant to this 
problem.

I can start adding a small example :
SDL, which allows to manage every low level parts of 2D games and all 
that isn't OpenGl in an OpenGl game.

I have a small and dirty binder to SDL, and in SDL one have to use the 
same thread for pumping events (asking what have be done), than the one 
which was use to open video window !!!

A program would here be far better organize if multithreaded is allowed 
in one thread...

With that kind of problem, having a way to program multithreaded and run 
monothread is a power. As  there already is user level threads, it would 
be a pity to lose that.

I'll first write up something in order to explain/accurately define the 
simple solution I proposed. After that, we can still design a more 
complex solution that addresses these two issues.

Great,

Best regards,
Nicolas Oury



Cheers,

Wolfgang Thaller

___
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users



___
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users



RE: unsafePerformIO and IORefs

2002-11-19 Thread Simon Marlow
> At 2002-11-18 11:05, Sven Panne wrote:
> 
> >global :: a -> IORef a
> >global a = unsafePerformIO (newIORef a)
> 
> This is useful, you can do this with it:
> 
>   ref = global Nothing
> 
>   convert :: a -> IO b
>   convert a = do
> writeIORef ref (Just a)
> Just b <- readIORef ref
> return b

This particular "flexibility" provided by unsafePerformIO is actually
documented...

http://www.haskell.org/ghc/docs/latest/html/base/System.IO.Unsafe.html#u
nsafePerformIO

Cheers,
Simon
___
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users



RE: unsafePerformIO and IORefs

2002-11-19 Thread Simon Marlow
> Hal Daume III wrote:
>  > You can't. [...]
> 
> Well, you can, but only for CAFs. This idiom/hack is used
> quite happily throughout GHC, HOpenGL, H/Direct, ...

I think "quite happily" is a bit strong ;-)  We'd much rather have a
safe way to do what is really quite a reasonable thing.

Cheers,
Simon
___
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users



RE: first stab at -ffunction-sections

2002-11-19 Thread Simon Marlow
> I noticed a lot of not-obviously-used stuff brought in from various
> libraries and wanted to nuke some of the unneeded things. Step 1 was
> trying to compile the libraries with the option, which didn't quite
> fly... it looks like ghc-asm is the primary sufferer, and I'm not sure
> the compiler option is needed...

Yes, indeed you could do this in ghc-asm.  But don't forget the native
code generator too...

> -split-objs I didn't really realize was there. I see (tracing 
> through ghc5, whatever debian's latest shipping version is):
> 
> ghc/compiler/main/DriverFlags.hs:250
>   ,  ( "split-objs" , NoArg (if can_split
> then do writeIORef 
> v_Split_object_files True
> add v_Opt_C 
> "-fglobalise-toplev-name
> s"
> else hPutStrLn stderr
> "warning: don't 
> know how to  split \
> \object files on 
> this architecture"
> ) )

-split-objs has to go to some trouble to make more symbols global so
that they can still be resolved after the assembler file has been split
into chunks.  This is one rather ugly hack that it would be nice to get
rid of.

> Then in ghc/driver/split/ghc-split.lprl:287 (there's actually 
> one per arch):
> 
> # strip the marker
> 
> $str =~ s/(\.text\n\t\.align 
> .(,0x90)?\n)\.globl\s+.*_?__stg_split_marker.*\
> n/$1/;
> $str =~ s/(\t\.align 
> .(,0x90)?\n)\.globl\s+.*_?__stg_split_marker.*\n/$1/;  
> 
>...

Yes, this is all particularly horrible too.  The splitter has to go to
some trouble to make sure that "local constants" (strings and the like)
get duplicated in each assembler chunk that refers to them, subverting
the commoning up of constants that gcc does.

> So to me it looks feasible to figure out who's fooling with these
> things, though it's probably not necessary to do any of this 
> within the
> compiler except for whatever might circumvent ghc-asm, if anything.
> 
> At any rate, I am finding the amount of unused code/data linked into
> the generated executables significant... for instance, in a non-
> concurrent program:
> 
> 080a1c64 D MVar_modifyMVarzu_closure
> 0805aeb8 T MVar_modifyMVarzu_entry
> 0805aece T MVar_modifyMVarzu_fast3
> 0805aeb8 T MVar_modifyMVarzu_info

The IO library requires MVars (it's thread-safe) so most programs will
have some MVar bits linked in.  I'm not sure why modifyMVar in
particular is being included though.

> ... and as it's a 9-line script to mangle patches, it's certainly not
> using this:
> 
> 0805b140 T __stginit_PosixDB
> 
> The idea with -ffunction-sections or brewing up an equivalent is to
> build the libraries with it so when the final executable is linked, it
> imports only the code and statically-allocated data it uses from them.

Ok, I'm convinced :)  Let us know if you need any more help.

Cheers,
Simon
___
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users



Strings are slow

2002-11-19 Thread Lauri Alanko
Hello. Here's a grep program:


module Main where

import Text.Regex
import System.IO
import System.Environment
import Control.Monad
import Data.Maybe

main = do [re] <- getArgs
  let rx = mkRegex re
  let loop = do line <- getLine
when (isJust (matchRegex rx line)) (putStrLn line)
eof <- isEOF
unless eof loop
  loop


It turned out that this is remarkably slow. The first problem was with
inlining. If this is compiled with ghc-5.04 -O -ddump-simpl, I get:

  case GHC.IOBase.unsafePerformIO
 @ (Data.Maybe.Maybe
(GHC.Base.String,
 GHC.Base.String,
 GHC.Base.String,
 [GHC.Base.String]))
 (Text.Regex.Posix.regexec (Text.Regex.mkRegex re) a731)
  of wild4 {

Ie. the regex is compiled anew every time a string is matched. A bug?

Anyway, without optimization the code produced is reasonable, but still
horrendously slow. Testing with a simple word as a pattern from a 7.3MB,
800kline file, the running time was 37.5 seconds. For comparison, a similar
program in mzscheme (interpreted!) took 7.3 seconds while the system
grep, of course, took 0.4 seconds.

I did some profiling by creating new top-level bindings for matchRegex
and getLine (is there a better way?):


total time  =   53.34 secs   (2667 ticks @ 20 ms)
total alloc = 1,172,482,496 bytes  (excludes profiling overheads)

COST CENTREMODULE   %time %alloc

match  Main  69.7   56.8
getl   Main  23.9   40.2
main   Main   6.33.0


So it seems like all the time is spent just converting ByteArrays to
char lists to C arrays. This makes me wonder how sensible it really is
to represent strings by char lists. Yes, it's nice and uniform and lazy,
but...

How can I get this faster, then? PackedStrings are not very useful
because they just don't support enough operations (getline, matchregex)
alone, and having to convert them to Strings sort of defeats their
purpose. _Any_ operation that provides only a String interface condemns
us to a gazillion allocations. And by default all char*-based foreign
interfaces are represented with Strings on the Haskell side.

Maybe a generic Textual class with at least String and PackedString (and
ByteArray?) as instances would help? Then the common string-based
operations could all have separate implementations for separate
representations. With heavy specialization, of course. This would be
especially useful if the FFI (especially withCString) supported it.

Or alternatively, maybe the foldr/build rewriting trick could be used to
eliminate some redundant conversions between representations?

Just throwing ideas in the air here.


Lauri Alanko
[EMAIL PROTECTED]

___
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users



RE: Strings are slow

2002-11-19 Thread Simon Marlow
> module Main where
> 
> import Text.Regex
> import System.IO
> import System.Environment
> import Control.Monad
> import Data.Maybe
> 
> main = do [re] <- getArgs
> let rx = mkRegex re
> let loop = do line <- getLine
>   when (isJust (matchRegex rx line)) 
> (putStrLn line)
>   eof <- isEOF
>   unless eof loop
> loop
> 
> 
> It turned out that this is remarkably slow. The first problem was with
> inlining. If this is compiled with ghc-5.04 -O -ddump-simpl, I get:
> 
> case GHC.IOBase.unsafePerformIO
>@ (Data.Maybe.Maybe
>   (GHC.Base.String,
>GHC.Base.String,
>GHC.Base.String,
>[GHC.Base.String]))
>(Text.Regex.Posix.regexec 
> (Text.Regex.mkRegex re) a731)
> of wild4 {

This is indeed an optimiser bug, but it's the result of a design
decision: GHC is a bit laid back about inlining things inside the state
lambda in the IO monad, because it often enables important
optimisations.  However, we're experimenting with modifying this
"optimisation" so that it will be less likely to kill performance in the
way it did in your example.

In the meantime, you can add rx as an argument to loop, that will be
enough to fool GHC into not inlining rx.

> Ie. the regex is compiled anew every time a string is matched. A bug?
> 
> Anyway, without optimization the code produced is reasonable, 
> but still
> horrendously slow. Testing with a simple word as a pattern 
> from a 7.3MB,
> 800kline file, the running time was 37.5 seconds. For 
> comparison, a similar
> program in mzscheme (interpreted!) took 7.3 seconds while the system
> grep, of course, took 0.4 seconds.
> 
> I did some profiling by creating new top-level bindings for matchRegex
> and getLine (is there a better way?):
> 
> 
> total time  =   53.34 secs   (2667 ticks @ 20 ms)
> total alloc = 1,172,482,496 bytes  (excludes 
> profiling overheads)
> 
> COST CENTREMODULE   %time %alloc
> 
> match  Main  69.7   56.8
> getl   Main  23.9   40.2
> main   Main   6.33.0
> 
> 
> So it seems like all the time is spent just converting ByteArrays to
> char lists to C arrays. This makes me wonder how sensible it really is
> to represent strings by char lists. Yes, it's nice and 
> uniform and lazy, but...

String processing in Haskell is very slow, due to the list-of-characters
representation.  A more complete PackedString library with better
integration with other libraries (like Text.Regex) would help a lot for
these kind of examples.

Cheers,
Simon
___
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users



Re: Native Threads in the RTS

2002-11-19 Thread Wolfgang Thaller
I've now written up a slightly more formal proposal for native threads. 
(OK, it's only a tiny bit more formal...)
I doubt I have explained everything clearly, please tell me which 
points are unclear. And of course please tell me what you like/don't 
like about it.
I have some rough ideas on how to implement the proposal. I would be 
ready to invest some time, but I don't have enough free time to make 
any promises here. The discussion has to be finished first, anyway.

Cheers,
Wolfgang

***
Native Threads Proposal, version 1

Some "foreign" libraries (for example OpenGL) rely on a mechanism 
called thread-local storage. The meaning of an OpenGL call therefore 
usually depends on which OS thread it is called from. Therefore, some 
kind of direct mapping from Haskell threads to OS threads is necessary 
in order to use the affected foreign libraries.
Executing every haskell thread in its own OS thread is not feasible for 
performance reasons. However, perfomance of native OS threads is not 
too bad as long as there aren't too many, so I propose that some 
threads get their own OS threads, and some don't:

Every Haskell Thread can be either a "green" thread or a "native" 
thread.
For each "native" thread, there is exactly one OS thread created by the 
RTS. For a green thread, it is unspecified which OS thread it is 
executed in.
The main program and all haskell threads forked using forkIO are green 
threads. Threads forked using forkNativeThread :: IO () -> IO () are 
native threads.

Execution of a green thread might move from one OS thread to another at 
any time. A "green" thread is never executed in an OS thread that is 
reserved for a "native" thread.
A "native" haskell thread and all foreign imported functions that it 
calls are executed in its associated OS thread. A foreign exported 
callback that is called from C code executing in that OS thread is 
executed in the native haskell thread.
A foreign exported callback that is called from C code executing in an 
OS thread that is not associated with a "native" haskell thread is 
executed in a new green haskell thread.

Only one OS thread can execute Haskell code at any given time.

If a "native" haskell thread enters a foreign imported function that is 
marked as "safe" or "threadsafe", all other Haskell threads keep 
running. If the imported function is marked as "unsafe", no other 
threads are executed until the call finishes.

If a "green" haskell thread enters a foreign imported function marked 
as "threadsafe", a new OS thread is spawned that keeps executing other 
green haskell threads while the foreign function executes. Native 
haskell threads continue to run in their own OS threads.
If a "green" haskell thread enters a foreign imported function marked 
as "safe", all other green threads are blocked. Native haskell threads 
continue to run in their own OS threads. If the imported function is 
marked as "unsafe", no other threads are executed until the call 
finishes.

Finalizers are always run in green threads.

Issues deliberately not addressed in this proposal:
Some people may want to run several Haskell threads in a dedicated OS 
thread (this is what has been called "thread groups" before).
Some people may want to run finalizers in specific OS threads (are 
finalizers predictable enough for this to be useful?).
Everyone would want SMP if it came for free (but SMP seems to be too 
hard to do at the moment...)

Other things I'm not sure about:
What should we do get if a foreign function spawns a new OS thread and 
executes a haskell callback in that OS thread? Should a new native 
haskell thread that executes in the OS thread be created? Should the 
new OS thread be blocked and the callback executed in a green thread? 
What does the current threaded RTS do? (I assume the non-threaded RTS 
will just crash?)

___
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


re-opening a closed stdin?

2002-11-19 Thread Bernard James POPE
Hi all,

There's probably a really obvious answer to this, but I can't find it.

Is there any way in GHC to reopen stdin if it has been closed?

You may wonder why I'd want this. Well I'm writing a debugger
for Haskell 98 (*) and my debugger wants to do some interaction on the terminal
_after_ the user's program has run. If the user's program puts stdin into
a closed or semi-closed state then that causes trouble for my debugger.

What I'd like to do is close stdin after the end of the user's program,
flush any input waiting in the buffer, then reopen it fresh for reading.

If this can't be easily done perhaps there is another solution you can think
of.

(*) www.cs.mu.oz.au/~bjpop/buddha/ 

Cheers,
Bernie.
___
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users