Re: [9fans] plan9port build failure on Linux (debian)

2008-03-03 Thread David Morris
On Mon, Mar 03, 2008 at 01:23:06PM +0800, Hongzheng Wang wrote:
 Hi,
 
 I have just cvsed and rebuilt the whole system.  No error
 occurs.  And my system is Debian sid.  So, I think the
 problem you encountered might due to some missing or
 mismatched packages on your debian box. Perhaps, the
 install.log in /usr/local/plan9/ would be helpful to
 discover what's wrong during installation.

Well, one step forward, one step back

install.log was no help, the message I quoted was everything
relevant.

I took a stab at running gdb through yacc, but the compiler
optimized the code to the point that finding the problem was
nearly impossible.  The best I can say is that it's somewhere in
the dofmt() function (lib9/fmt/dofmt.c) or something it calls.

So I pulled out my VERY slow laptop and spent a few hours
letting it compile plan9port.

This time the build worked, so looks like some lenny/sid
packages don't work well together.  Hmm, or another
possibility occurs to me.  I use the AMD64 kernel
(2.6.22-3-amd64) on my desktop, but i686 on the laptop
(2.6.22-3-i686).  Any chance that could cause a problem?

I tried copying the pure lenny install to the main system
file structure, but that clearly does not work because wmii
(the application I need plan9port for) does not run.

So, any ideas on how to fix the build process?  The problem
stems from yacc.c at line #2173 in the sprint() function.
Could I replace that with the standard library sprintf()
function as a stop-gap measure?

--David


Re: [9fans] GCC/G++: some stress testing

2008-03-03 Thread Philippe Anel


Ron, I thought Paul was talking about cache coherent systems, on which a
high-contention lock can become a huge problem.
Although the work done by Jim Taft on the NASA project looks very
interesting (and if you have pointers to papers about locking primitives
on such systems, I would appreciate them), it seems this system is memory
coherent, not cache coherent (coherency maintained by the SGI NUMALink
interconnect fabric).
And I agree with you.  I also think (global) shared memory for IPC is
more efficient than passing copied data across the nodes, and I suppose
several papers tend to confirm this is the case: today's interconnect
fabrics are a lot faster than memory-to-memory access.
My conjecture (I only have access to a simple dual-core machine) is
about the locking primitives used in CSP (and IPC), I mean libthread, which is
based on the rendezvous system call (which does use locking primitives,
see 9/proc.c:sysrendezvous()).  I think this is the only reason why CSP
would not scale well.
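
(For context, the CSP machinery I mean is libthread's channel API; here is a
minimal sketch, purely illustrative, of the kind of program whose send/recv
path ends up in that locking/rendezvous code:)

#include <u.h>
#include <libc.h>
#include <thread.h>

Channel *c;

void
producer(void *v)
{
	ulong i;

	USED(v);
	for(i = 0; i < 10; i++)
		sendul(c, i);	/* each send synchronises with the receiver */
}

void
threadmain(int argc, char *argv[])
{
	int i;

	USED(argc);
	USED(argv);
	c = chancreate(sizeof(ulong), 0);	/* unbuffered channel */
	proccreate(producer, nil, 32*1024);	/* separate proc, so real parallelism */
	for(i = 0; i < 10; i++)
		print("%lud\n", recvul(c));
	threadexitsall(nil);
}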

Regarding my (other) conjecture about IPI, please read my answer to Paul.

Phil;


If the CSP system itself takes care of the memory hierarchy and uses no
 synchronisation (using IPI to send a message to another core, for example),
 CSP scales very well.



Is this something you have measured or is this conjecture?
  



 Of course the IPI mechanism requires a switch to kernel mode, which costs a
 lot.  But this is necessary only if the destination thread is running on
 another core, and I don't think latency is very important in algorithms
 requiring a lot of cpus.



same question.

For a look at an interesting library that scaled well on a 1024-node
SMP at NASA Ames, see the work by Jim Taft.
Short form: use shared memory for IPC, not data sharing.

he's done very well this way.

ron


  




Re: [9fans] GCC/G++: some stress testing

2008-03-03 Thread Philippe Anel


Please note I'm not an expert in this domain. I am only interested in 
this area, and have only read a few papers.

It is interesting to talk with you about these 'real world' problems.
Latency is quite important in the application domain I have to target: 
the target is to produce a new image every 60th of a second, including 
all the simulation effort to get there.  In addition, we have user 
input which needs to be processed, and usually network delays to worry 
about as well.  Every bit of latency between user input and display 
breaks the illusion of control.  And though TVs are getting better, 
it's not atypical to see 4-6 frames of latency introduced by the 
display subsystem, once you've finished generating a frame buffer.
So, does this mean the latency is only required by the I/O system of
your program?  If so, maybe I'm wrong, but what you need is to be able to
interrupt working cores, and I'm afraid libthread doesn't help here.
If not, and your algorithm requires (a lot of) fast IPC, maybe this is
the reason why it doesn't scale well?


I don't know what you mean by CSP system itself takes care about 
memory hierarchy.  Do you mean that the CSP implementation does 
something about it, or do you mean that the code using the CSP 
approach takes care of it?

Both :)
I agree with you about the fact that programming for the memory hierarchy is
way more important than optimizing CPU clocks.
But I also think the synchronization primitives used in CSP systems are the
main reason why CSP programs do not scale well (badly designed algorithms
excepted, of course).
I meant that a different CSP implementation, based on a different
synchronisation primitive (IPI), can help here.


IPI isn't free either - apart from the OS switch, it generates bus 
traffic that competes with the cache coherence protocols and memory 
traffic; in a well designed compute kernel that saturates both compute 
and bandwidth the latency hiccups so introduced can propagate really 
badly.


This is very interesting.  For sure IPI is not free.  But I thought the
bus traffic generated by IPI was less important than cache coherence
protocols such as MESI, mainly because it is a one-way message.
I think IPIs are now sent through the system bus (the local APIC used to talk
through a separate bus), so I agree with you that it can saturate the
bandwidth.  But I wonder if locking primitives are not worse.
It would be interesting to test this.


Phil;




Re: [9fans] [OT] interesting hardware project for gsoc

2008-03-03 Thread Alexander Sychev

In the Russian army:

Sgt.: Who is a painter here?
Soldier: I'm the painter.
Sgt.: Well, take this axe and draw me a stack of firewood in the morning.

On Mon, 03 Mar 2008 09:32:32 +0300, Skip Tavakkolian [EMAIL PROTECTED]  
wrote:



here's an idea from brucee for a gsoc project.  he did a
proof-of-concept earlier today.  basically take something like this:

http://www.rangboom.com/images/brucee/heap.jpg

and transform it into this:

http://www.rangboom.com/images/brucee/emptyarena.jpg

with this as the byproduct:

http://www.rangboom.com/images/brucee/contentaddressablestacks.jpg

he is searching for the right graduate student to mentor for this fine
art.




--
Best regards,
  santucco


Re: [9fans] plan9port build failure on Linux (debian)

2008-03-03 Thread erik quanstrom
 install.log was no help, the message I quoted was everything
 relevant.
 
 I took a stab at running gdb through yacc, but the compiler
 optimized the code to the point finding the problem was
 nearly impossible.best I can say is its somewhere in the
 dofmt() function (lib9/fmt/dofmt.c) or something it calls.

i trust you ran yacc under gdb not gdb through yacc.  :-)
the problem is unlikely to be with the print.  it likely
occurred in argument parsing.

one thing that should be fixed in p9p is that the ARGF() calls
should be replaced with EARGF(usage()) in setup().  the
definition of usage should be

void
usage(void)
{
	fprint(2, "usage: yacc [-Dn] [-vdS] [-o outputfile] [-s stem] grammar\n");
	exits("usage");
}
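
and in setup()'s ARGBEGIN block, something along these lines (a sketch only;
flag letters taken from the usage line above, variable names illustrative):

ARGBEGIN{
case 'o':
	outfile = EARGF(usage());	/* was ARGF(), which can silently return nil */
	break;
case 's':
	stem = EARGF(usage());
	break;
/* ... other flags as before ... */
default:
	usage();
}ARGEND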

once that is fixed, it would be interesting to see if yacc
prints a usage statement instead of printing the garbage.

assuming that things are still broken, i would suggest
adding fprint(2, ...) statements in setup to understand
where things are going wrong.

- erik


Re: [9fans] plan9port build failure on Linux (debian)

2008-03-03 Thread sqweek
On Mon, Mar 3, 2008 at 5:46 PM, David Morris [EMAIL PROTECTED] wrote:
  This time the build worked, so looks like some lenny/sid
  packages don't work well together.  Hmm, or another
  possibility occurs to me.  I use the AMD64 kernel
  (2.6.22-3-amd64) on my desktop, but i686 on the laptop
  (2.6.22-3-i686).  Any chance that could cause a problem?

 I run p9p on x86_64 at work (CentOS), so no. There are some problems
with 9pfuse under x86_64 (which look like fuse's fault to me), but the
only problem I had at build was missing dependencies (some X11
development packages).
 I'll try and remember to cvs update and see if it still builds.
-sqweek


Re: [9fans] plan9port build failure on Linux (debian)

2008-03-03 Thread Russ Cox
 I am trying to install plan9port on a Linux system (Debian),
 and am getting the following error:
 
 9 yacc -d -s bc bc.y
 
  fatal error:can't create , nil:1
 mk: 9 yacc -d ...  : exit status=exit(1)
 mk: for i in ...  : exit status=exit(1)

The nil is just because the error has happened 
very early and yacc hasn't opened the input file yet.
If you poke around in the code you'll find that 
it was trying to create bc.tab.h (or should have been)
but somehow this code (stem="bc", FILED = "tab.h"):

	sprint(buf, "%s.%s", stem, FILED);
	fdefine = Bopen(buf, OWRITE);
	if(fdefine == 0)
		error("can't create %s", buf);

ended up with an empty string in buf instead of bc.tab.h.

 So, any ideas on how to fix the build process?  The problem
 stems from yacc.c at line #2173 in the sprint() function.
 Could I replace that with the standard library sprintf()
 function as a stop-gap measure?

It would be interesting to know if that makes it work,
but more interesting would be why the Plan 9 sprint
is broken.  This is a pretty simple sprint call and should work.

Can you reproduce the problem if you just run:

cd /usr/local/plan9/src/cmd
9 yacc -s bc bc.y

?

I'm also interested to see the output of:

nm /usr/local/plan9/bin/yacc | grep sprint

@erik:
 once that is fixed, it would be interesting to see if yacc
 prints a usage statement instead of printing the garbage.

The command line passed in the mkfile has worked in
thousands of other builds.  Even if stem was nil, buf
should at least end up being "nil.tab.h" or ".tab.h",
or at the very worst, if %s was broken, ".".

I doubt the command line is being misparsed, but 
I don't have any justifiable alternate theories.

Russ



Re: [9fans] plan9port build failure on Linux (debian)

2008-03-03 Thread David Morris
On Tue, Mar 04, 2008 at 12:09:27AM +0900, sqweek wrote:
 On Mon, Mar 3, 2008 at 5:46 PM, David Morris [EMAIL PROTECTED] wrote:
   This time the build worked, so looks like some lenny/sid
   packages don't work well together.  Hmm, or another
   possibility occurs to me.  I use the AMD64 kernel
   (2.6.22-3-amd64) on my desktop, but i686 on the laptop
   (2.6.22-3-i686).  Any chance that could cause a problem?
 
  I run p9p on x86_64 at work (CentOS), so no. There are some problems
 with 9pfuse under x86_64 (which look like fuse's fault to me), but the
 only problem I had at build was missing dependencies (some X11
 development packages).
  I'll try and remember to cvs update and see if it still builds.

Good to know.  I was thinking more along the lines of a
problem because I'm using a 64-bit kernel with a 32-bit
userspace, a setup I've seen other applications have problems
with, though that was a binary distribution I simply had to
recompile.

--David


Re: [9fans] plan9port build failure on Linux (debian)

2008-03-03 Thread David Morris
On Mon, Mar 03, 2008 at 11:17:47AM -0500, Russ Cox wrote:
  I am trying to install plan9port on a Linux system (Debian),
  and am getting the following error:
  
  9 yacc -d -s bc bc.y
  
   fatal error:can't create , nil:1
  mk: 9 yacc -d ...  : exit status=exit(1)
  mk: for i in ...  : exit status=exit(1)
 
 The nil is just because the error has happened 
 very early and yacc hasn't opened the input file yet.
 If you poke around in the code you'll find that 
 it was trying to create bc.tab.h (or should have been)
 but somehow this code (stem="bc", FILED = "tab.h"):
 
	sprint(buf, "%s.%s", stem, FILED);
	fdefine = Bopen(buf, OWRITE);
	if(fdefine == 0)
		error("can't create %s", buf);
 
 ended up with an empty string in buf instead of bc.tab.h.

At that point in the code, stem is set to "bc" as expected.

  So, any ideas on how to fix the build process?  The problem
  stems from yacc.c at line #2173 in the sprint() function.
  Could I replace that with the standard library sprintf()
  function as a stop-gap measure?
 
 It would be interesting to know if that makes it work,
 but more interesting would be why the Plan 9 sprint
 is broken.  This is a pretty simple sprint call and should work.

For what it's worth, I just added the following lines to
yacc.c at the top of the file:

#include <stdio.h>
#define sprint sprintf

The build of plan9port just completed with no errors, so the
problem is somewhere in sprint().

I'll try and find time tonight to test out the plan9port
build to verify it works.  Let me know if I can provide any
other useful information.  I might try tracking down the bug
later this week, but not certain I'll have much time to do
so.

 Can you reproduce the problem if you just run:
 
   cd /usr/local/plan9/src/cmd
   9 yacc -s bc bc.y
 
 ?

Yes, I get the exact same output.

 I'm also interested to see the output of:
 
   nm /usr/local/plan9/bin/yacc | grep sprint

Here is the result:

0804fbe0 T sprint

 @erik:
  once that is fixed, it would be interesting to see if yacc
  prints a usage statement instead of printing the garbage.
 
 The command line passed in the mkfile has worked in
 thousands of other builds.  Even if stem was nil, buf
 should at least end up being nil.tab.h or .tab.h
 or at the very worst, if %s was broken, ..

Makes sense.

 I doubt the command line is being misparsed, but 
 I don't have any justifiable alternate theories.

My first thought was also that the command line was not being
parsed correctly, but gdb shows stem is set to "bc" as expected.
That's why I suspect the problem is somewhere under sprint().

--David



[9fans] o/mero and o/live

2008-03-03 Thread Fco. J. Ballesteros
I just updated the distrib of the octopus as
exported from lsub, to include o/mero and o/live
(the file system for omero and the viewer).

I made a silly package today to use it
standalone on Plan 9 (without the rest of
the octopus), only to discover that somehow
mounting via #s made omero go s l o o o w.

Thus, I won't bother to publish a stand-alone
package for use on Plan 9 and will fix this
problem instead.

In any case, if anyone wanted to try it,
the version as distributed today has been
working for me and the man pages are updated.

Once the octopus tar has been extracted
on inferno (mostly /dis/o files), running
o/mero is a matter of executing:

o/ports # event delivery
o/mero  # the file system proper
mkdir /mnt/ui/s0# create a screen
o/x # ~ to acme and sam language
o/live s0   # the viewer (at screen s0)

olive(1) is an introduction and is needed to
learn how to use our weird menus.

The next millennium, once this thing
runs fast enough,
I'll drop a line here to let others know.



Re: [9fans] plan9port build failure on Linux (debian)

2008-03-03 Thread erik quanstrom
 
 For what its worth, I just added the following lines to
 yacc.c at the top of the file:
 
 #include <stdio.h>
 #define sprint sprintf
 
 The build of plan9port just completed with no errors, the
 problem is somewhere in sprint().
 
 I'll try and find time tonight to test out the plan9port
 build to verify it works.  Let me know if I can provide any
 other useful information.  I might try tracking down the bug
 later this week, but not certain I'll have much time to do
 so.

it is very likely that you have broken yacc in a different
way by doing this.  stdio formats are not compatible with
plan 9 print formats.  for example, 'u' is a flag when used
with sprint but a verb when used with printf.

(not to mention the fact that other programs than yacc
use sprint.)
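
to make the difference concrete, here is a tiny stdio-only illustration
(just an example, not code from yacc):

#include <stdio.h>

int
main(void)
{
	/* under ansi stdio, "%ud" is %u followed by a literal 'd', so this
	 * prints "42d"; under plan 9 print(2) the same format prints just
	 * "42", because there 'u' is a flag modifying the 'd' verb */
	printf("%ud\n", 42u);
	return 0;
}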

have you verified that a standalone program with a
similar print statement has the same problems?

- erik


Re: [9fans] [OT] interesting hardware project for gsoc

2008-03-03 Thread Martin Harriss

Alexander Sychev wrote:

In the Russian army:

Sgt.: Who is a painter here?
Soldier: I'm the painter.
Sgt.: Well, take this axe and draw me a stack of firewood in the morning.


That one sounds like it was written by Milligan.

Martin


[9fans] migrating from oventi to nventi

2008-03-03 Thread cinap_lenrek
I have problems migrating oventi data to nventi on another machine.

first i tried to extract the arenas with oventi's venti/rdarena and then
pump them into nventi with nventi's venti/wrarena.
when i tried to format fossil with the last root score, it failed to
find the block.

i tried oventi's venti/copy with -f mode... this time formatting fossil
with the root score worked, but when doing du -a a venti read
error appeared.  the missing scores always had the last digit set to
zero iirc...

the same happened with nventi's venti/copy -m
(exact same read errors).

then i tried the -m -r (rewrite) option and this got me back to where i started...
fossil can't find the root score.

the old venti/fossil system i'm copying from runs just fine, no missing
blocks or disk errors.

any further ideas?

cinap



Re: [9fans] plan9port build failure on Linux (debian)

2008-03-03 Thread David Morris
On Mon, Mar 03, 2008 at 02:02:04PM -0500, erik quanstrom wrote:
  
  For what its worth, I just added the following lines to
  yacc.c at the top of the file:
  
  #include <stdio.h>
  #define sprint sprintf
  
  The build of plan9port just completed with no errors, the
  problem is somewhere in sprint().
  
  I'll try and find time tonight to test out the plan9port
  build to verify it works.  Let me know if I can provide any
  other useful information.  I might try tracking down the bug
  later this week, but not certain I'll have much time to do
  so.
 
 it is very likely that you have broken yacc in a different
 way by doing this.  stdio formats are not compatible with
 plan 9 print formats.  for example, 'u' is a flag when used
 with sprint but a verb when used with printf.
 
 (not to mention the fact that other programs than yacc
 use sprint.)

Just ran a quick test.  While the applications compiled,
they were non-functional, as you suspected.  I then tried
replacing the content of sprint() with a call to vsprintf().
All applications compiled, and the functionality I've tried so
far (that used by the wmii window manager) seems to work.

It's entirely possible, though, that I've just been lucky in not
hitting a string parsed differently by sprint().
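
For reference, the stop-gap looks roughly like this (a sketch of what I mean,
not the actual lib9 code, and subject to exactly the format-compatibility
caveat you raised):

#include <stdarg.h>
#include <stdio.h>

/* forward sprint() to stdio's formatter; Plan 9-only verbs and flags
 * (%r, 'u' as a flag, etc.) will not behave the same way */
int
sprint(char *buf, char *fmt, ...)
{
	va_list arg;
	int n;

	va_start(arg, fmt);
	n = vsprintf(buf, fmt, arg);
	va_end(arg);
	return n;
}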

 have you verified that a standalone program with a
 similar print statement has the same problems?

I just gave it a try using the following:


#include <stdio.h>

/* sprint comes from plan9port's lib9; declared here by hand to
   keep the test standalone */
extern int sprint(char *buf, char *fmt, ...);

#define FILED	"tab.h"
#define stem	"bc"

int main(int argc, char** argv)
{
	/* Lines copied from yacc.c */
	char buf[256];
	int result = sprint(buf, "%s.%s", stem, FILED);
	printf("%i: %s\n", result, buf);
	/* End code from yacc.c */
	return 0;
}


The result was the same as in yacc: return value of 0 and
'buf' is empty.

--David



Re: [9fans] migrating from oventi to nventi

2008-03-03 Thread cinap_lenrek
don't know, but these are 2 physically different machines... i'm not
putting nventi on oventi's partitions... this is all from scratch...
before every try i reformatted the arenas, isects and bloom filters
of nventi.

but i'll try the rd/wrarena thing again and this time rebuild the
whole index first.

thanks :-)

cinap

---BeginMessage---
On Mon, Mar 3, 2008 at 3:26 PM, [EMAIL PROTECTED] wrote:

 first tried to extract arenas with oventis venti/rdarena and then
 pumping them into nventi with nventis venti/wrarena.
 when tried to format fossil with the last root-score, it failed to
 find the block.


I have found the need to rebuild the index a couple of times with nventi
since I switched from old to new.  The first was just after switching
in place from old to new.  The second was a month later when I started to
see block rot again, although that was most likely caused by something I did.
---End Message---


[9fans] Xen and new venti

2008-03-03 Thread stella
Hi,
I'm trying to install plan9 under Xen 3.2.0 with venti,
but the kernel available on the web is too old to support nventi.
I decided to try to compile it myself; there were some problems I was able
to fix (with the help of #plan9).  Now there's a problem (or two) I do not know
how to solve; I must warn you that I'm not an expert in either plan9 or C.
This is what I did:

9fs sources
cpr /n/sources/xen/xen3/9 /sys/src/9
cd /sys/src/9/xen3

then I got /usr/local/xen from my xen installation and put it under xen-public.
I edited xenpcf to remove IL support

mk

I had a type problem in xendat.h, fixed by replacing
uint8 with uint at line 1540

mk

I had some other problems, which I fixed by
1) adding void mfence(void); to fns.h
2) editing line 286 in sdxen.c to look like
	xenbio(SDunit* unit, int lun, int write, void* data, long nb, uvlong bno)
3) adding /$objtype/lib/libip.a\ to LIB in the mkfile
4) adding to l.s
TEXT mfence(SB), $0
BYTE $0x0f
BYTE $0xae
BYTE $0xf0
RET

mk now gives two errors which I do not know how to fix:
...[omitted]...
size 9xenpcf
v4parsecidr: undefined: memcpy in v4parsecidr
_strayintrx: _ctype: not defined
mk: 8c -FVw '-DKERNDATE='`{date ...  : exit status=rc 7851: 8l 7855: error

any idea?
Thank you

S.



Re: [9fans] awk, not utf aware...

2008-03-03 Thread Jack Johnson
On Thu, Feb 28, 2008 at 6:10 AM, erik quanstrom [EMAIL PROTECTED] wrote:
  perhaps it would be more effective to break down the concept
  a bit.  instead of a general locale hammer, why not expose some
  operations that could go into a locale?  for example, have a base-
  character folding switch that allows regexps to fold codepoints into
  base codepoints so that íïìîi -> i.  this information is in the unicode
  tables.  perhaps the language-dependent character mapping should
  be specified explicitly.

Loosely-related tangent:

http://www.mail-archive.com/[EMAIL PROTECTED]/msg20395.html

 On the LINUX machines running utf-8 the ä is coded as $C3A4 which is
 in utf-8 equal to the character E4. The ä occupies in that way 2 bytes.

 I was very astonished, when I copied a mac-filename, pasted into a
 texteditor and looked at the file:

 In the mac-filename the letter ä is coded as: $61CC88, which in utf-8
 means the letter a followed by a $0308. (Combining diacritical marks)
 So the Mac combines the letter a with the two points above it instead
 using the E4 letter
 Now the things are clear: The filenames are different, in spite of
 looking equally.

So, if folding codepoints is a reasonable tactic, how many
representations do you need to fold?  How many binary representations
are needed to fold íïìîi -> i?

-Jack


Re: [9fans] awk, not utf aware...

2008-03-03 Thread erik quanstrom
  On the LINUX machines running utf-8 the ä is coded as $C3A4 which is
  in utf-8 equal to the character E4. The ä occupies in that way 2 bytes.
 
  I was very astonished, when I copied a mac-filename, pasted into a
  texteditor and looked at the file:
 
  In the mac-filename the letter ä is coded as: $61CC88, which in utf-8
  means the letter a followed by a $0308. (Combining diacritical marks)
  So the Mac combines the letter a with the two points above it instead
  using the E4 letter
  Now the things are clear: The filenames are different, in spite of
  looking equally.
 
 So, if folding codepoints is a reasonable tactic, how many
 representations do you need to fold?  How many binary representations
 are needed to fold íïìîi -> i?

i didn't make my point very well.  in this case i was suggesting a -f flag
for grep that would map codepoints to their base codepoint.  the match
result would be the original text --- in the manner of the -i flag.
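
something like this toy sketch (a hand-made table, purely for illustration;
a real version would be generated from the unicode data files):

#include <stdio.h>

typedef unsigned int Rune;	/* stand-in for the plan 9 Rune type */

/* fold a few precomposed latin letters onto their base codepoint */
Rune
basefold(Rune r)
{
	switch(r){
	case 0xEC: case 0xED: case 0xEE: case 0xEF:	/* ì í î ï */
		return 'i';
	}
	return r;
}

int
main(void)
{
	Rune probe[] = { 0xED, 0xEF, 'i', 0 };
	int i;

	for(i = 0; probe[i]; i++)
		printf("U+%04X -> %c\n", probe[i], (int)basefold(probe[i]));
	return 0;
}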

separately, however ...

utf combining characters are a really unfortunate choice, imho.  there
is no limit to the number of combining codepoints one can add to
a base codepoint.  you can, for example, build a single letter like this:
	U+0061 U+0302 ... U+0302
i don't think it's possible to build legible glyphs from bitmaps using
combining diacriticals.

therefore, i would argue for reducing letters made up of base+combiners
to a precombined codepoint whenever possible.  it would be helpful
if tcs did this.  unfortunately some transliterations of russian into the roman
alphabet use characters with no precombined form in unicode.

rob probably has a more informed opinion on this than i.

- erik


Re: [9fans] GCC/G++: some stress testing

2008-03-03 Thread erik quanstrom
 In fact the more I think about it, the more it seems like having
 a direct way of manipulating L1/L2 caches would be more of a benefit
 than a curse at this point. Prefetches are nothing but a patchwork
 over the fundamental need for programming memory hierarchy in an
 efficient way. But, I guess, there's more to it on the hardware side
 than just crafting additional opcodes.
 

really?  to out-predict the cache hardware, you have to have pretty
complete knowledge of everything running on all cores and be pretty
good at guessing what will want scheduling next.  not to mention,
you'd need to keep close tabs on which memory is cachable/wc/etc.
maybe system management mode makes too much of an impression
on me, but on an intel system there's no way to prevent the cpu/bios/ipmi
from issuing an smm interrupt anytime it pleases and taking over
your hardware.  my conclusion is we don't have as much control
over the hardware as we think we do.

  decompositions that keep the current working set in cache (at L3, L2,  
  or L1 granularity, depending), while simultaneously avoiding having  
  multiple processors chewing on the same data (which leads to vast  
  amounts of cache synchronization bus traffic).  Successful algorithms  
  in this space work on small bundles of data that either get flushed  
  back to memory uncached (to keep more cache for streaming in), or in  
  small bundles that can be passed from compute kernel to compute  
  kernel cheaply.  Having language structures to help with these  
  decompositions and caching decisions is a great help - that's one of  
  the reasons why functional programming keeps rearing its head in this  
  space.  Without aliasing and global (serializing) state it's much  
  easier to analyze the program and chose how to break up the  
  computation into kernels that can be streamed, pipelined, or  
  otherwise separated to allow better cache utilization and parallelism.

aren't these arguments for networked rather than shared memory
multiprocessors?

- erik


Re: [9fans] Xen and new venti

2008-03-03 Thread erik quanstrom
 I'm trying to install plan9 under Xen 3.2.0 with venti
 but the kernel avaiable on the web is too old to support nventi.

perhaps you mean that this kernel has an old venti linked in?

 I had a problem of type in xendat.h fixed by replacing 
 uint8 with uint at line 1540

i suspect you mean uchar.  (or uvlong if they're counting 
bytes.)

 mk now gives two errors which I do not know how to fix:
 ...omissis...
 size 9xenpcf
 v4parsecidr: undefined: memcpy in v4parsecidr

replace memcpy with memmove.

 _strayintrx: _ctype: not defined

_ctype is used by the is* functions like isascii.

- erik


Re: [9fans] GCC/G++: some stress testing

2008-03-03 Thread Paul Lalonde



On Mar 3, 2008, at 4:49 PM, erik quanstrom wrote:

really?  to out-predict the cache hardware, you have to have pretty
complete knowledge of everything running on all cores and be pretty
good at guessing what will want scheduling next.  not to mention,
you'd need to keep close tabs on which memory is cachable/wc/etc.


On the flip side, ignoring the cache leads to algorithms whose  
working sets can't fit in cache, wasting a considerable amount of  
processing to cache misses.  Being able to parameterize your  
algorithms to work comfortably in one or two ways of your cache can  
bring *huge* performance improvements without dropping to the  
traditional assembly.  I'm arguing that being aware of the caches  
lets the system better schedule your work because you aren't  
preventing it from doing something smart.
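
As a concrete (if simplistic) illustration of that kind of parameterization,
here is a sketch of a cache-blocked transpose where the tile size B is the
knob you tune to the cache; the numbers are assumptions for illustration,
not measurements:

/* transpose in BxB tiles so the working set (one source tile plus one
 * destination tile, 2*B*B*sizeof(double) bytes) stays resident in the
 * target cache level; with B=32 that is 16KB, comfortably inside a
 * typical 32KB L1d */
enum { N = 4096, B = 32 };

void
transpose_blocked(double *dst, const double *src)
{
	int i, j, ii, jj;

	for(ii = 0; ii < N; ii += B)
		for(jj = 0; jj < N; jj += B)
			for(i = ii; i < ii+B; i++)
				for(j = jj; j < jj+B; j++)
					dst[j*N + i] = src[i*N + j];
}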




aren't these arguments for networked rather than shared memory
multiprocessors?


Yes.  Although I work for a company that prides itself on its cache  
coherence know-how, I'm very much a believer in networked  
multiprocessors, even on a chip.   I like Cell better than Opteron,  
for example.  They are harder to program up front, however, which  
causes difficulties in adoption.  Flip-side, once you've overcome  
your startup hurdles the networked model seems to provide more  
predictable performance management.


Paul



- erik




Re: [9fans] GCC/G++: some stress testing

2008-03-03 Thread Paul Lalonde



On Mar 3, 2008, at 1:12 AM, Philippe Anel wrote:


So, does this mean the latency is only required by the I/O system  
of your program ? If so, maybe I'm wrong, what you need is to be  
able to interrupt working cores and I'm afraid libthread doesn't  
help here.
If not and your algorithm requires (a lot of) fast IPC, maybe this  
is the reason why it doesn't scale well ?


No, the whole simulation has to run in the low-latency space - it's a
video game and its rendering engine, which are generally a highly
heterogeneous workload.  And that heterogeneity means that there are
many points of contact between various subsystems.  And the
(semi-)real-time constraint means that you can't just scale the problem
up to cover overhead costs.




I don't know what you mean by CSP system itself takes care about  
memory hierarchy.  Do you mean that the CSP implementation does  
something about it, or do you mean that the code using the CSP  
approach takes care of it?

Both :)
I agree with you about the fact programming for the memory  
hierarchy is way more important than optimizing CPU clocks.
But I also think synchronization primitives used in CSP systems are  
the main reason why CSP programs do not scale well (excepted bad  
designed algorithm of course).
I meant that a different CSP implementation, based on different  
synchronisation primitive (IPI), can help here.


I'm more interested just now in working with lock-free algorithms;  
I've not made any good measurements of how badly our kernels would  
hit channels as the number of threads increases.  Perhaps some could  
be mitigated through a better channel implementation.
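
For concreteness, here is a sketch of the kind of lock-free structure I have
in mind: a single-producer/single-consumer ring that passes pointers between
two threads without taking a lock.  It is written with C11 atomics purely for
illustration; this is not code from our kernels or from libthread.

#include <stdatomic.h>
#include <stddef.h>

enum { Nslot = 256 };	/* power of two, so the counters may wrap safely */

typedef struct Ring Ring;
struct Ring {
	_Atomic unsigned long	head;	/* advanced only by the consumer */
	_Atomic unsigned long	tail;	/* advanced only by the producer */
	void	*slot[Nslot];
};

/* producer side: returns 0 if the ring is full */
int
ringput(Ring *r, void *m)
{
	unsigned long t = atomic_load_explicit(&r->tail, memory_order_relaxed);
	unsigned long h = atomic_load_explicit(&r->head, memory_order_acquire);

	if(t - h == Nslot)
		return 0;
	r->slot[t % Nslot] = m;
	atomic_store_explicit(&r->tail, t+1, memory_order_release);
	return 1;
}

/* consumer side: returns NULL if the ring is empty */
void*
ringget(Ring *r)
{
	unsigned long h = atomic_load_explicit(&r->head, memory_order_relaxed);
	unsigned long t = atomic_load_explicit(&r->tail, memory_order_acquire);
	void *m;

	if(t == h)
		return NULL;
	m = r->slot[h % Nslot];
	atomic_store_explicit(&r->head, h+1, memory_order_release);
	return m;
}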




IPI isn't free either - apart from the OS switch, it generates bus  
traffic that competes with the cache coherence protocols and  
memory traffic; in a well designed compute kernel that saturates  
both compute and bandwidth the latency hiccups so introduced can  
propagate really badly.


This is very interesting. For sure IPI is not free. But I thought  
the bus traffic generated by IPI was less important than cache  
coherence protocols such as MESI, mainly because it is a one way  
message.


It depends immensely on the hardware implementation of your IPI.  If  
you wind up having to pay for MESI as well, then the advantage  
becomes less.


I think now IPI are sent through the system bus (local APIC used to  
talk through a separate bus), so I agree with you about the fact it  
can saturate the bandwidth. But I wonder if locking primitive are  
not worse. It would be interesting to test this.


Agreed!

Paul



Re: [9fans] GCC/G++: some stress testing

2008-03-03 Thread erik quanstrom
 Yes.  Although I work for a company that prides itself on its cache  
 coherence know-how, I'm very much a believer in networked  
 multiprocessors, even on a chip.   I like Cell better than Opteron,  
 for example.  They are harder to program up front, however, which  
 causes difficulties in adoption.  Flip-side, once you've overcome  
 your startup hurdles the networked model seems to provide more  
 predictable performance management.

tell me about it.  a certain (nameless) vendor makes a pcie ethernet
chipset with its descriptor rings in system memory, not pci space.
it's bizarre watching the performance vs. the number of buffers loaded
into the ring between head pointer updates.  slight tweaks to the algorithm
can result in 35% performance differences.

surprisingly, another (also nameless) vendor makes a similar chipset with
its rings in pci space.  this chipset has very stable performance in the face
of tuning of the reloading loop.  this chip performs just as well as the
former even though each 32-bit write to the ring buffer results in a round
trip over the pcie bus to the card.

- erik


Re: [9fans] GCC/G++: some stress testing

2008-03-03 Thread Lyndon Nerenberg


On 2008-Mar-1, at 08:41 , ron minnich wrote:


very little f77 left in my world, maybe somebody else has some.


And also in response to Pietro's comments ...

I have lots of dusty but still valid F77 code I use for antenna and RF
circuit design (i.e. NEC and SPICE).  Yes, there are newer versions of
this stuff in C, but none of the C code fixes any bugs I'm not aware
of or cannot otherwise deal with, or, more importantly, that my models
aren't already aware of and don't compensate for.


And then there is Dungeon.  Netlib's f2c + libraries are ready to go  
with Plan 9, given the correct compiler defines. Building a colossal  
cave out of old card decks does not require outsourcing to the  
dwarves ;-)


--lyndon