Re: [9fans] suicide message on vmware

2014-06-06 Thread erik quanstrom
 I thought that replica/pull on a 9atom would pull 9atom binaries and
 not the labs version. Looks like that assumption is wrong?

only on .iso versions of 9atom several years old.  to correct this issue,
you'd have to sync /usr/glenda/bin/rc/pull first.

- erik



Re: [9fans] suicide message on vmware

2014-06-06 Thread erik quanstrom
On Fri Jun  6 12:08:28 EDT 2014, vu3...@gmail.com wrote:
 On Fri, Jun 6, 2014 at 9:25 PM, erik quanstrom quans...@quanstro.net wrote:
  I thought that replica/pull on a 9atom would pull 9atom binaries and
  not the labs version. Looks like that assumption is wrong?
 
  only on .iso versions of 9atom several years old.  to correct this issue,
  you'd have to sync /usr/glenda/bin/rc/pull first.
 
 Thanks. I just discovered the existence of /dist/replica/atom which is
 used by /usr/glenda/bin/rc/pull script.

yup!  did you do it some other way?  that is, did i leave something
dangerous lying about?

- erik



[9fans] power _xinc vs ainc

2014-06-06 Thread erik quanstrom
the /sys/src/9/ppc kernel adds this line
to both _xinc and _xdec

DCBF(R4)/* fix for 603x bug */

would this also be needed for the c library version?

- erik



Re: [9fans] suicide message on vmware

2014-06-06 Thread erik quanstrom
 What two databases?

the divergent versions of /sys/lib/dist/replica/plan9.db and its log
on the sources and 9atom.

 Replica respects local changes at the file level.  You still
 have to do a manual merge if the server version changed as
 well.

that's what i said, but this is remote vs remote, and replica is
unable to deal with this sort of issue.

 The bigger issue is that the unit of update needs to be a
 *set* of files that take a system from one consistent state to
 another. If you update only a subset of files, you may be left
 with an inconsistent system.

 it would be a great feature, but it's unrelated to this failure.

i'm part of the way there with the patch system in atom.  in theory
a few scripts could allow one to add changes as required.  unfortunately,
it's pretty easy for this to go wrong, even with tools like hg.

 For a foolproof update in case of incompatible kernel changes
 (and if you're running the same distribution as you pulled
 from), you should

if one recalls back to the beginning of the 4th edition, the
install cd would upgrade things as well as do initial installs.

- erik



[9fans] glenda at 3200x1800x32

2014-06-06 Thread erik quanstrom
i was kindly sent this by someone who's had success with a modern
laptop:

 Hello Erik
 
 Just tried the latest usb image and wanted to let you know it works quite 
 well. I get this message, though:
 
 acpinitr: pm1sts 0x400 pm1en 0x100
 
 repeated all the time, getting a lot of context switching if i read stats 
 -lmisce correctly.
 
 The screen cat [can] get to 3200x1800x32 with vesa, which is the native panel 
 resolution,  and it works quite well. This laptop has an intel and a nvidia 
 card. I guess it is using the intel by default.
 
...
 
 The notebook also has two disks (liteon mstata 32GB and samsung evo 250GB) 
 and both get listed in  /dev/sd*. They are partitioned using GPT partition 
 tables, which i guess 9atom does not handle yet.
 
 So here is another proof that plan9 can run on recent hardware, including 
 beautiful notebooks :)

sadly, no onboard ethernet.  i think the usb stick ethernet should
work with a little poking.

- erik



Re: [9fans] glenda at 3200x1800x32

2014-06-06 Thread erik quanstrom
On Sat Jun  7 00:34:03 EDT 2014, s...@9front.org wrote:
  So here is another proof that plan9 can run on recent hardware, including
  beautiful notebooks :)
 
 Is any more specific information available about the manufacturer and model
 of the laptop?
 

i quote,
dell xps 15 (model 9530) from late 2013

- erik



Re: [9fans] advanced core Linux kernel features not in plan9

2014-06-06 Thread erik quanstrom
 I was curious to know which core features of the Linux kernel are not implemented
 in the plan9 kernel. By core I mean that I know plan9 does not have all the drivers,
 filesystems, buses, etc Linux has, but it has many of its core
 features (virtual memory, paging, swapping, demand loading, copy on write, etc),
 and even more.

on a good day with the right kernel you may get paging with plan 9, but never
swapping.  i've turned it off in my kernels, and it's not coming back.  it's hard to
use with multiple page sizes, and as charles notes, devices are either big enough to
not need it, or too small to have anything to page out to.

 For instance I was not able to find any code related to the buffer cache Linux has.
 If you open a big file in a plan9 process, then close it, and later you open it again,
 will you pay the IO again? Or is it cached somewhere?

there is no buffer cache in plan 9.  file servers do cache, but they
are typically not on the same box, so it should be clear why there is no
buffer cache.

- erik



Re: [9fans] What is Plan9 exactly?

2014-06-05 Thread erik quanstrom
On Thu Jun  5 06:36:36 EDT 2014, charles.fors...@gmail.com wrote:

 On 5 June 2014 06:11, OMAR RADWAN owemeac...@live.com wrote:
 
  I just did, though I cannot find anything about the kernel architecture
 
 
 Fortunately, there is a book about it. http://lsub.org/who/nemo/9.pdf which
 might have been updated to 4th Edition,
 but the overall structure is similar, although with many improvements.

also, brian's book:

http://www.amazon.com/Brian-L.-Stuart/e/B001JS3E00

- erik



Re: [9fans] minor kernel bug

2014-06-05 Thread erik quanstrom
 we do that in ilock() and canlock() so it's a bug I think to not do it also in lock().
 The field is only used in iprintcanlock which uses canlock(), not lock(), so this
 is fine, but for consistency it would be better to also do it in lock() no?

ilock and unlock could assert(l->m->machno == m->machno).  but currently
you're right on both counts.

- erik



Re: [9fans] minor kernel bug

2014-06-05 Thread erik quanstrom
oh, but you missed a spot in lock.

- erik



Re: [9fans] suicide message on vmware

2014-06-05 Thread erik quanstrom
On Thu Jun  5 23:17:37 EDT 2014, vu3...@gmail.com wrote:
 Hi,
 
 I just saw a suicide message on 9atom running on plan9 while updating
 the system:
 
 % replica/pull -v /dist/replica/network
 
 After a while, I saw this printed, but the replica/pull is proceeding
 without any problem.
 
 (not completely readable because stats window overwrote the screen)
 ... bad sys call number 53 pc 101c6
 timesync 57: suicide: sys: bad syscall pc=0x101c6

nsec!  argh!

- erik



[9fans] 6c copy propagation

2014-06-04 Thread erik quanstrom
i thought this discussion was on 9fans, but i don't see it any more.

this is a recent bug report in 9front

http://code.google.com/p/plan9front/source/detail?r=f80b7ef22cd2352d3823513024d21d3ea14f4854
6a, 6c, 6l: fix copy propagation

Without an explicit signal for a truncation, copy propagation will
sometimes propagate a 32-bit truncation and end up overwriting uses of
the original 64-bit value.

This was independently discovered and fixed in Go. See:
http://golang.org/issue/1315
https://codereview.appspot.com/6002043/

Thanks Charles Forsyth for tips and advice.

i was in a panic for a bit that this was still hanging around, but
after a bit of investigation, i see this was fixed by charles and
pushed out to 9atom quite a while ago.  i have a dim recollection
of finding the bug.  the symptoms were spectacular.  i believe they
bit the log2 calculations, and blew up malloc.

there are more changes from charles in 9atom, including a pretty good
complement of avx instructions.

it might be worth a look.

- erik



Re: [9fans] kernel bug

2014-06-03 Thread erik quanstrom
 I think it should be
 if(mapsize > (SEGMAPSIZE))
 	mapsize = SEGMAPSIZE;

hmm.  i think this code is correct.  ssegmap is a static map
to handle small segments.  small segments fit in ssegmap.
the point must have been to avoid malloc.

this test is a little more questionable

   if(mapsize > (SEGMAPSIZE*PTEPERTAB))
   	mapsize = (SEGMAPSIZE*PTEPERTAB);

cf. the check in ibrk

if(newsize > (SEGMAPSIZE*PTEPERTAB)) {
	qunlock(&s->lk);
	error(Enovmem);
}

i think this check is either not wrong, or more extensive rework
is necessary.

@anthony, do you know if this code or similar occurred in even older kernels?
if there was a cap also in ibrk() then i would suspect this code was originally
correct.

i don't know where a history of stuff older than sources (2002) is.

 Also why in the kernel they use 'struct Pte' instead of the better name Pagetable.
 In many places this is very confusing because when I see Pte I think of a Pagetable Entry
 where really they are speaking about a Pagetable.

i would naturally think a Pte* would be an array of Pte's, i.e. a Pte table,
just like an array of uchar* could be used as a table.

- erik



Re: [9fans] kernel bug

2014-06-03 Thread erik quanstrom
too bad, i don't see anthony's diff, and i get this error
(perhaps unrelated)

Too many diffs (26 > 25). Stopping.

- erik



Re: [9fans] kernel bug

2014-06-03 Thread erik quanstrom
 I made the change you suggest in the PAE kernel but perhaps Erik missed it
 during his merge:
 if(mapsize > nelem(s->ssegmap)){
 	mapsize *= 2;
 	if(mapsize > SEGMAPSIZE)
 		mapsize = SEGMAPSIZE;
 	s->map = smalloc(mapsize*sizeof(Pte*));
 	s->mapsize = mapsize;
 }

ok.  good.  that's what happened.  perhaps some change needs
to be made to segattach to mirror this change.  
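
To make the units in this exchange concrete, here is a small standalone sketch of the two limits: the map size is counted in Pte* slots, while the ibrk-style check is counted in pages, which is why only the latter carries the PTEPERTAB factor. The constant values and the helper names (slotsfor, mapslots, toobig) are assumptions for illustration, not the kernel's own code.

```c
#include <stddef.h>

/* Illustrative values only; the real ones are per-architecture kernel constants. */
enum {
	PTEPERTAB   = 256,	/* pages described by one Pte (assumed) */
	SEGMAPSIZE  = 1984,	/* maximum Pte* slots in a segment map (assumed) */
	SSEGMAPSIZE = 16,	/* slots in the static small-segment map (assumed) */
};

/* Number of map slots needed to describe npages pages. */
static size_t
slotsfor(size_t npages)
{
	return (npages + PTEPERTAB - 1) / PTEPERTAB;
}

/*
 * Hypothetical helper mirroring the sizing logic quoted above:
 * small segments use the static map as is; larger ones get a
 * doubled allocation, capped at SEGMAPSIZE slots.
 */
static size_t
mapslots(size_t npages)
{
	size_t mapsize = slotsfor(npages);

	if(mapsize <= SSEGMAPSIZE)
		return SSEGMAPSIZE;
	mapsize *= 2;
	if(mapsize > SEGMAPSIZE)
		mapsize = SEGMAPSIZE;
	return mapsize;
}

/* The ibrk-style limit is expressed in pages, hence the extra factor. */
static int
toobig(size_t npages)
{
	return npages > (size_t)SEGMAPSIZE * PTEPERTAB;
}
```

With these assumed values, a segment never gets more than SEGMAPSIZE slots, and the largest segment that passes the page-count check needs exactly SEGMAPSIZE slots, so the two limits agree.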

- erik



Re: [9fans] Mistake in Plan9 Mercual port?

2014-06-01 Thread erik quanstrom
   i'm pretty sure that jeff's version use ape/psh to execute
  commands, but not positive.  it must use psh to be posix-y.
 
 
 Ah, where is it available?

https://bitbucket.org/jas/cpython

you can pull hg directly from the mainline.

- erik



Re: [9fans] Mistake in Plan9 Mercual port?

2014-06-01 Thread erik quanstrom
On Sun Jun  1 10:48:20 EDT 2014, pavel.klinkov...@gmail.com wrote:

 
  https://bitbucket.org/jas/cpython
 
  you can pull hg directly from the mainline.
 
 
 Well, I am a little bit confused...
 
 1. Is it a new 'python' interpreter implementation? What is the difference
 from 'bichued/python'?

this is a fresh port. 

 2. Is my original problem with mercurial caused by 'hg' or 'python' on
 Plan9?

i don't know which function was called to execute
the command.  it could be either, but one would guess py.

 3. I see a little chicken-egg problem, to clone such 'python' version, I
 need 'bichued/python' and 'bichued/hg' already installed... ;)

yes.  we were not able to upload a mkfs archive to sources due to size.
jeff, would you mind prepping a mkfs archive to put on 9atom/extra?

- erik



Re: [9fans] Mistake in Plan9 Mercual port?

2014-05-31 Thread erik quanstrom
On Sat May 31 09:44:35 EDT 2014, pavel.klinkov...@gmail.com wrote:

 Hi Steven,
 
 For the most part, using HTTP/S repositories will give you the best bang
  for the proverbial buck.
 
 
 I see. In fact I tried to create and use Mercurial repository via 'ssh' and
 'ftp' (via ftpfs) and none of them works.

jeff also did a port that supports 386 and amd64.  this is a key bit for me, since
i don't usually run a 386 or pae kernel.  and it may support arm (it did at one
point).  you can track the tip from jeff's repo.  (sadly they won't include
proper plan 9 support.)
it does not use openssh, but sadly my copy at least also doesn't work with ssh2.  it gets
confused in authentication.

jeff: is this supposed to work?

jeff's version of python has fixed the bug where reads of a plan 9 device reporting
0 size always failed.

atta; hg version
Mercurial Distributed SCM (version 2.9.1)
(see http://mercurial.selenic.com for more information)

Copyright (C) 2005-2014 Matt Mackall and others
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
atta; python --version
Python 2.7.6
atta; echo $cputype
amd64

- erik



Re: [9fans] Mistake in Plan9 Mercual port?

2014-05-31 Thread erik quanstrom
 I didn't do anything more special except adding keys to factotum.

one problem with ssh2 is the fact that it doesn't do keyboard interactive
at all, and doesn't give an error message that makes that apparent.

i've used the client successfully, especially to dell and arista switches.

- erik



Re: [9fans] Mistake in Plan9 Mercual port?

2014-05-31 Thread erik quanstrom
 When I manually make this command (what 'hg' does)...
 
 ssh xxx.yyy.zzz hg init test
 
 
 ...the result is:
 
 bash: hg: Command not found...
 
 
 When I manually modify it...
 
 
 ssh xxx.yyy.zzz 'hg init test'
 
 ...the result is...
 
 !Adding key: proto=pass server=xxx.yyy.zzz service=ssh user=pavel
 password:
 !
 
 ...and the remote repository is successfully created.
 
 Therefore I thought the problem is wrong usage of " instead of ' in the
 mercurial port...

i think you're really close.  hg and python really assume a posix
environment, or windows.  so the plan 9 port uses ape, not the
native environment.  therefore, based on what you're reporting,
i would think that the problem is that hg is not using a bourne-
like shell to execute commands.  which version of hg are you
using?  i'm pretty sure that jeff's version uses ape/psh to execute
commands, but not positive.  it must use psh to be posix-y.

- erik



Re: [9fans] Mistake in Plan9 Mercual port?

2014-05-31 Thread erik quanstrom
On Sat May 31 12:55:25 EDT 2014, ara...@mgk.ro wrote:
  I’ve never been able to get the ssh v2 version to work on Plan 9 for 
  testing.
 
 Never had any major problem[1] with ssh2 (using the one from labs, with
 factotum, not nfactotum).

the version of factotum is a red herring, as the factotum used
in 9atom, which is a descendant of charles'/rsc's work in the p9p
factotum, was done specifically to support ssh2, and was the only version
of factotum used with ssh2 for a few years (up until the public
announcement).

- erik



[9fans] scheduler

2014-05-30 Thread erik quanstrom
the 9 scheduler's guts break down to the following loop.  this
is the improved version, abstracted a bit (by hand)

spllo();
for(;;){
for(i = Npri-1; i >= 0; i--)
-a for(p = runqueue[i]; p != nil; p = p->rproc)
if(softaffinity(p, m) ||
hardaffinity(p, m) && scheddelay(p) >= Delay)
-b goto found;
}
-b while(monmwait(runvec, 0) == 0)
;
}
}

it occurred to me that having multiple maches fighting over the
runqueue causes contention on a number of cache lines.
what we'd like is for the maches to queue up, and try one-at-a-time.
but this is exactly what mcs locks do!  so adding a private lock at
(a) and a private iunlock at (b) is all that is required.

this nearly eliminates the scheduling penalty one saw with the
original version of the revised scheduler, and increases performance
marginally.  maybe 5%.

again, the caveat here is that i haven't tested with a very large multiprocessor.
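
For readers who have not met mcs locks, the queueing behaviour relied on above can be sketched in portable C11. This is a generic userspace illustration, not the 9atom kernel's lock: the names McsLock, McsNode, mcslock and mcsunlock are hypothetical, and a kernel version would also have to deal with interrupt levels. Each waiter spins only on its own node, so contenders line up behind the tail pointer instead of hammering one shared cache line.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct McsNode McsNode;
struct McsNode {
	_Atomic(McsNode*) next;	/* successor in the queue, if any */
	atomic_int locked;	/* nonzero while this waiter must spin */
};

typedef struct McsLock McsLock;
struct McsLock {
	_Atomic(McsNode*) tail;	/* last waiter in the queue; NULL when free */
};

/* Join the queue and wait, spinning only on our own node. */
static void
mcslock(McsLock *l, McsNode *me)
{
	McsNode *prev;

	atomic_store(&me->next, NULL);
	atomic_store(&me->locked, 1);
	prev = atomic_exchange(&l->tail, me);
	if(prev == NULL)
		return;			/* queue was empty: lock acquired */
	atomic_store(&prev->next, me);	/* link in behind the predecessor */
	while(atomic_load(&me->locked))
		;			/* predecessor clears this on release */
}

/* Hand the lock to the successor, or mark the lock free. */
static void
mcsunlock(McsLock *l, McsNode *me)
{
	McsNode *next, *expect;

	next = atomic_load(&me->next);
	if(next == NULL){
		expect = me;
		if(atomic_compare_exchange_strong(&l->tail, &expect, NULL))
			return;		/* nobody was waiting */
		/* a successor swapped the tail but has not linked in yet */
		while((next = atomic_load(&me->next)) == NULL)
			;
	}
	atomic_store(&next->locked, 0);
}
```

Each caller supplies its own node (typically one per processor), which is what makes the wait private to that processor.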

- erik



Re: [9fans] [GSOC] Dial between two computers

2014-05-27 Thread erik quanstrom
 I used a program to dial from one system to another system, but
 it gives a connection time out error. I have searched on Internet for a
 long time and cannot get a solution. Could you please provide some
 suggestions or hints? Basically, one system is Linux based system with rc
 shell installed (we call it A). The other one is a auth+cpu+file server
 (we call it B). On B, I have used fossil/conf command to listen tcp!*!564.
 On A, I executed dial tcp!B's ip address!564, but it reports a time out
 error after waiting some time. Results are the same when A is a plan9
 terminal. By the way, I can ping A to B successfully.  What could be the
 possible problems?

fossil will not respond if the message is not 9p.
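
As an illustration of what a first valid 9P message looks like on the wire, the sketch below packs a Tversion request (size[4] type[1] tag[2] msize[4] version[s], all little-endian, with the string carried as a two-byte length plus bytes). The function and helper names are hypothetical, and the msize and version string are whatever the caller chooses; sending the result over a connection is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
	Tversion = 100,		/* 9P message type for version negotiation */
	NOTAG    = 0xFFFF,	/* tag used for Tversion */
};

/* Store a 16-bit value in little-endian order. */
static void
put16(uint8_t *p, uint16_t v)
{
	p[0] = v & 0xFF;
	p[1] = (v >> 8) & 0xFF;
}

/* Store a 32-bit value in little-endian order. */
static void
put32(uint8_t *p, uint32_t v)
{
	p[0] = v & 0xFF;
	p[1] = (v >> 8) & 0xFF;
	p[2] = (v >> 16) & 0xFF;
	p[3] = (v >> 24) & 0xFF;
}

/*
 * Hypothetical helper: pack a Tversion message into buf.
 * Returns the number of bytes written, or 0 if it does not fit.
 */
static size_t
packtversion(uint8_t *buf, size_t nbuf, uint32_t msize, const char *version)
{
	size_t vlen = strlen(version);
	size_t n = 4 + 1 + 2 + 4 + 2 + vlen;
	uint8_t *p = buf;

	if(vlen > 0xFFFF || n > nbuf)
		return 0;
	put32(p, (uint32_t)n);		/* size counts the whole message */
	p += 4;
	*p++ = Tversion;
	put16(p, NOTAG);
	p += 2;
	put32(p, msize);
	p += 4;
	put16(p, (uint16_t)vlen);
	p += 2;
	memcpy(p, version, vlen);
	return n;
}
```

For the version string "9P2000" the packed message is 19 bytes long.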

- erik



Re: [9fans] Too many checkpages() diagnostics ...

2014-05-27 Thread erik quanstrom
On Mon May 26 19:16:22 EDT 2014, lyn...@orthanc.ca wrote:

 For the last couple of days I have been plagued by many many diagnostics from
 checkpages(), in conjunction with things like:
 
   rc: note: sys: trap: fault read addr=0x0 pc=0x000101c4
   rc 50675: suicide: sys: trap: fault read addr=0x0 pc=0x000101c4

acid says that this is an abort.  

; acid /n/sources/plan9/386/bin/rc
/n/sources/plan9/386/bin/rc:386 plan 9 executable
/sys/lib/acid/port
/sys/lib/acid/386
acid; src(0x000101c4)
/sys/src/libc/9sys/abort.c:6
 1	#include <u.h>
 2	#include <libc.h>
 3	void
 4	abort(void)
 5	{
>6		while(*(int*)0)
 7			;
 8	}

the problem is without a backtrace, there are a few too many possibilities.
if the abort is legit, these would be good candidates
- notifyf (plan9.c)
- _vsaop (not very likely)
- assert:
io.c:101:   assert(b->fd == -1 || b->bufp > b->buf);
pcmd.c:24:  assert(f != nil);


but ...

 The kernel print buffer holds corresponding entries like:
 
   coral# 10618 dns: checked 136 page table entries
   dns 10618: suicide: sys: trap: fault write addr=0x0 pc=0x00015cea

/sys/src/libc/port/pool.c:974
 969		return a;
 970	}
 971
 972	/* poolallocl: attempt to allocate block to hold dsize user bytes; assumes lock held */
 973	static void*
>974	poolallocl(Pool *p, ulong dsize)
 975	{
 976		ulong bsize;
 977		Free *fb;
 978		Alloc *ab;
 979
acid; asm(0x00015cea)
poolallocl 0x00015cea	SUBL	$0x1c,SP
poolallocl+0x3 0x00015ced	MOVL	dsize+0x4(FP),DX
poolallocl+0x7 0x00015cf1	CMPL	DX,$0x8000
poolallocl+0xd 0x00015cf7	JCS	poolallocl+0x22(SB)

this one doesn't make any sense, unless the stack ptr is smashed.

   26591 rfcmirror: checked 270 page table entries
   37326 rc: checked 51 page table entries
   47773 rc: checked 57 page table entries
   47773 rc: checked 57 page table entries
   47773 rc: checked 57 page table entries
   47773 rc: checked 57 page table entries
   47773 rc: checked 57 page table entries
   47773 rc: checked 57 page table entries
   47773 rc: checked 57 page table entries
   47773 rc: checked 57 page table entries
   47773 rc: checked 57 page table entries
   47773 rc: checked 57 page table entries
   47773 rc: checked 57 page table entries
   50675 rc: checked 53 page table entries

ah.  this is starting to make some sense.  remember above, there was
an abort in notifyf?  that was if the trap depth got too deep.  the problem
is we would need to see 33 events for pid 47773, but we don't.

i had a very similar problem under vbox on osx, and the solution
was to use gorka's ancient fix, which basically avoids clearing PTEs
which do not have the PteP bit set.  there are substantial differences
between the pc and nix kernels here.

so for example mmuptefree() looks fishy to me since it clears
pages not present.  but i'm not sure.

- erik

the applied patch is /n/atom/patch/applied/vboxmmu

; diff -c mmu.c.orig mmu.c
mmu.c.orig:87,93 - mmu.c:87,93
  }
  
  void
- mmuflushtlb(uintmem)
+ xmmuflushtlb(uintmem)
  {
  
  	m->tlbpurge++;
mmu.c.orig:98,104 - mmu.c:98,122
  	putcr3(m->pml4->pa);
  }
  
+ /* hack for vbox */
  void
+ mmuflushtlb(uintmem)
+ {
+ 	int i;
+ 	PTE *pte;
+ 
+ 	m->tlbpurge++;
+ 	if(m->pml4->daddr){
+ 		pte = UINT2PTR(m->pml4->va);
+ 		for(i = 0; i < m->pml4->daddr; i++)
+ 			if(pte[i] & PteP)
+ 				pte[i] = 0;
+ 		m->pml4->daddr = 0;
+ 	}
+ 	putcr3(m->pml4->pa);
+ }
+ 
+ void
  mmuflush(void)
  {
  	Mpl pl;
mmu.c.orig:259,264 - mmu.c:277,283
  void
  mmuswitch(Proc* proc)
  {
+ 	int i;
  	PTE *pte;
  	Page *page;
  	Mpl pl;
mmu.c.orig:270,276 - mmu.c:289,300
  	}
  
  	if(m->pml4->daddr){
- 		memset(UINT2PTR(m->pml4->va), 0, m->pml4->daddr*sizeof(PTE));
+ 		/* hack for vbox */
+ //		memset(UINT2PTR(m->pml4->va), 0, m->pml4->daddr*sizeof(PTE));
+ 		pte = UINT2PTR(m->pml4->va);
+ 		for(i = 0; i < m->pml4->daddr; i++)
+ 			if(pte[i] & PteP)
+ 				pte[i] = 0;
  		m->pml4->daddr = 0;
  	}
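
The core of the patch, replacing a blanket memset with a loop that zeroes only entries whose present bit is set, can be tried outside the kernel with the sketch below. The PTE typedef, the PteP value (bit 0, the x86 present bit) and the function name clearpresent are stand-ins chosen for illustration; the kernel's own definitions live in its headers.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t PTE;		/* stand-in for the kernel's PTE type */

enum {
	PteP = 1 << 0,		/* present bit (bit 0 on x86) */
};

/*
 * Zero only the entries that are marked present, leaving
 * not-present entries untouched.  Returns how many were cleared.
 */
static size_t
clearpresent(PTE *pte, size_t n)
{
	size_t i, cleared;

	cleared = 0;
	for(i = 0; i < n; i++)
		if(pte[i] & PteP){
			pte[i] = 0;
			cleared++;
		}
	return cleared;
}
```

A memset over the same range would also overwrite the not-present entries, which is the difference the patch cares about.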



[9fans] Brdstr / Brdline

2014-05-27 Thread erik quanstrom
regardless of the return value of Brdline, Blinelen() will return > 0
even if there is no trailing newline.

Brdstr will return the line even if not terminated.  Blinelen() will be > 0.

- erik



Re: [9fans] nix scheduler changes

2014-05-27 Thread erik quanstrom
 Nice. Excited to see how a cleaned up + simplified runproc() and the
 per-Mach queues could also change things. Any reason why the ping test
 w/ monmwait wasn't consistent with the performance improvement in
 other areas?

yes there is.  in a later post i describe it, but basically what i saw is that letting
one thread (mach0) get ahead of the others got performance back.  but it does
cost power.  i have a few ideas i am going to try when i get done with the work
i have scheduled for today.

- erik



Re: [9fans] 8l -e

2014-05-27 Thread erik quanstrom
 you can see there is a JMP over _tracein and a RET before _traceout.
 what gives?

ah, that's the magic!  the idea is to be able to enable and disable these tracepoints
at runtime in a multiprocessor environment without any locking.

- erik



Re: [9fans] 8l -e

2014-05-27 Thread erik quanstrom
 ok. i'm beginning to understand better. is there a specific use case,
 such as the kernel or userland?
 
 i didn't see anything like a tool that could poke nops into the right
 places. i started to write an acid function to put the nops in one
  named function, and then i realized that the ret can appear several
  times in one function and i would need to search for and patch them
  out. but only the *first* ret, not second, e.g.:

this tool was meant for use with the kernel.  there is a devtrace in 9atom's
pc and pcpae kernels that does this.

ron wrote a paper for the first athens, ga iwp9.  i don't remember the year.
2009?

- erik



Re: [9fans] /dev/cputemp

2014-05-27 Thread erik quanstrom
On Tue May 27 17:59:41 EDT 2014, j...@cowsay.org wrote:
 Just curious, is this not a thing in the nix kernel? grep'd the nix
 sources and it didn't seem to be in devarch.c, it's in 9/pc/ though;
 is there another way to grab cpu temp?
 
 I ask because there seems to be a significant temperature change on my
 test machine between the old nix kernel and some of the new scheduler
 changes, although I wanted some numbers to back that observation (or
 maybe I'm imagining things!). Not a super big deal but just something
 I was just curious about. :-)

yes, i did not bring cputemp into the nix kernel because i thought the code
was ugly, and the temperatures were not necessarily accurate.  but if you want
to pull it in, that would be fine.

- erik



Re: [9fans] dual boot

2014-05-26 Thread erik quanstrom
 Where does the installer script live?

the usb installer is in /sys/lib/dist/amd64.  the part that runs at boot is in
the install directory.  i think the bug is in the vga script in that directory.
(not to be confused with aux/vga.)  it may not do the right thing if there
are no edids.

  a proper screensize looks like XxYxD where X,Y is the screen size in
  pixels, and D is the bit depth.  i use 1600x1200x16.
 
 I set it to 1280x768x16. My monitor/VGA Controller resolutions seem to
 be missing from the /lib/vgadb perhaps?

vesa graphics do not use /lib/vgadb.

 Erik, thanks for putting together and maintaining 9atom. Without it
 and your help, I wouldn't be running a Plan9 system today. I will do
 my best in terms of bug reports and code to give back to the project.

no problems.  that's why it's there.

once you have things installed, you can submit patches
with apatch (see patch(1), as the man page is currently missing.)

- erik



[9fans] nix scheduler changes

2014-05-26 Thread erik quanstrom
so, i've done a little bit more work characterizing the performance
of the scheduler correctness changes, and i now have some understanding
on why e.g. ping times are a bit slower.

the old code essentially let processor 0 spin in runproc, other processors called
halt.  the new code uses monmwait to wait for a change on all processors.
this has some significant impacts on performance and power use.  for example,
on my test box with 4c/8t:

          spin/halt   monmwait   spin/monmwait
ping      8µs         14µs       8µs             # ip/ping -n10 $sysname
mk        6.26s       3.98s      3.80            # make nix kernel
fans      audible     silent     audible
δpower    -           -24w       0               # (resolution = .1A = 12w @ 120v)

this seems to indicate the latency is all in runproc(), and not waiting for things
to be ready and assuming they will be has a big performance boost.

(the third column, testing spin on mach 0, plus monmwait on the others was done
to tell if monmwait has high latency or not.)

i'd really be interested to see what this does on 24c/48t machines.  something
tells me the performance impacts would be huge, and different.

- erik

---
ps. hzsched in the distribution is 10% off for HZ=100, since
schedticks = m->ticks + HZ/10, and delaysched tests
for > not the expected >=.
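
The 10% figure can be checked with a few lines of arithmetic. The sketch below assumes the deadline test runs once per clock tick; the function name tickstofire and its strict flag are hypothetical. With HZ=100 the deadline sits HZ/10 = 10 ticks ahead, so a strict > comparison first succeeds on the 11th tick, while >= succeeds on the 10th.

```c
enum {
	HZ = 100,	/* clock ticks per second, as in the postscript */
};

/*
 * Count clock ticks from start until the deadline test first succeeds.
 * strict != 0 uses ticks > schedticks; strict == 0 uses ticks >= schedticks.
 */
static unsigned
tickstofire(unsigned long start, int strict)
{
	unsigned long ticks, schedticks;
	unsigned n;

	ticks = start;
	schedticks = start + HZ/10;
	for(n = 1; ; n++){
		ticks++;	/* one clock tick elapses */
		if(strict ? ticks > schedticks : ticks >= schedticks)
			return n;
	}
}
```

Eleven ticks where ten were intended is the 10% error described above.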



Re: [9fans] devproc procctl close bug

2014-05-26 Thread erik quanstrom
On Mon May 26 16:10:59 EDT 2014, cinap_len...@felloff.net wrote:
 theres a bug in devproc again.
 
 the fd is not bounds checked for the close fd
 procctl command and the closefiles command misses
 the last fd as it iterates from:

good catch.  applied patch to 9atom.

- erik



Re: [9fans] devproc procctl close bug

2014-05-26 Thread erik quanstrom
On Mon May 26 16:32:54 EDT 2014, cinap_len...@felloff.net wrote:
 excellent :)

why, do you plan a plan 9 botnet that exploits this hole :-).

- erik



Re: [9fans] dual boot

2014-05-25 Thread erik quanstrom
 It proceeded to show me a list of resolutions (8 different options).
 Since my monitor has the highest resolution of 1920x1080, I selected
 that. Next it asked for image depth[no default] where I typed in 16.
 But it kept looking there, whatever I keyed in. If I proceed with no
 default, I see that the PLAN9.INI file in the 9fat partition (which I
 mounted with 9fat: command) has the resolution set as 1920x1080xno
 default. So, after installation, I got dropped into a text console,
 probably because of this error.

yes, sorry about this.  there is a bug in the script.  fortunately you were
smart enough to fix things.

a proper screensize looks like XxYxD where X,Y is the screen size in
pixels, and D is the bit depth.  i use 1600x1200x16.
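
A quick way to sanity-check a value before writing it into plan9.ini is to parse it against the XxYxD shape. The sketch below is a hypothetical standalone validator (parsescreensize is not part of any Plan 9 tool) built on standard sscanf; it accepts 1600x1200x16 and rejects a string such as the 1920x1080xno default that the buggy script produced.

```c
#include <stdio.h>

/*
 * Hypothetical validator: returns 1 and fills x, y and depth if s has
 * the form XxYxD with positive decimal values and nothing trailing;
 * returns 0 otherwise.
 */
static int
parsescreensize(const char *s, int *x, int *y, int *depth)
{
	char extra;

	/* the trailing %c only matches if junk follows the depth */
	if(sscanf(s, "%dx%dx%d%c", x, y, depth, &extra) != 3)
		return 0;
	return *x > 0 && *y > 0 && *depth > 0;
}
```

Whether a given mode is actually available still depends on the vesa bios; this only checks the syntax.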

- erik



Re: [9fans] dual boot

2014-05-24 Thread erik quanstrom
On Sat May 24 02:46:08 EDT 2014, vu3...@gmail.com wrote:
 I downloaded the usbinstamd64 image from the 9atom webpage and booted
 it up. First, I tried amd64 (selection 0), in a second or so, some
 text went past the screen quickly and the machine rebooted. I then
 tried selection 1 (386pae), that booted up but quickly halted with
 this:
 
 Plan 9
 E820: 
 ...
 apic: 6 machs started; flat mode vectors
 winbont .ff hw fff8
   no capabilities
 panic: kernel fault: no user process pc=f0162415 addr=0x00a8
 panic: kernel fault: no user process pc=f0162415 addr=0x00a8
 dumpstack disabled
 cpu0: exiting
 cpu0: spurious interrupt 39, last 0
 
 and it hangs there.
 
 Is there any options I can try to move past this step?
 -- 
   Ramakrishnan
 

well, first sorry.  this is not a usb issue.  since you're using
the 9paed kernel, this is the crash site:

acid; src(0xf0162415)
/sys/src/9/pcpae/ether8169.c:385
 380	static int
 381	rtl8169miimiw(Mii *mii, int pa, int ra, int data)
 382	{
 383		if(pa != 1)
 384			return -1;
>385		return ·rtl8169miimiw(mii->ctlr, pa, ra, data);
 386	}
 387
 388	static Mii*
 389	rtl8169mii(Ctlr* ctlr)
 390	{

and after a little inspection, i see the issue.  and i've applied a patch.
does the amd64 kernel behave differently?

one thing i am confused about, you have

 panic: kernel fault: no user process pc=f0162415 addr=0x00a8
 panic: kernel fault: no user process pc=f0162415 addr=0x00a8

i assume this was typed by hand?  i would expect it to read:
 panic: kernel fault: no user process pc=0xf0162415 addr=0x00a8
 panic: kernel fault: no user process pc=0xf0162415 addr=0x00a8

there is a new TEST image @ http://ftp.9atom.org/other/+usbinstamd64.bz2
i have not had a chance to try it myself, but the crash seems obvious enough.

one warning: yesterday i applied some changes to the amd64 scheduler,
to generate the correct load average, and to calm down the absolute
mach affinity.  this would cause dramatic unfairness whenever nrdy > nmach.
this is because on some cpu, two processes would get assigned.  they would
share the cpu fairly, but on nmach-1 maches, the busy process would get a whole
cpu.  the solution was to watch how long a process has been ready and use that
as a hint that it should be run, even if there is a proc with greater affinity.

(thanks to jyu and gsoc!)  but removing the obvious bugs has
put the scheduler a bit out of tune, and things like ping may see a 20µs
delay in some cases.  i'm working on it.

the good news is that we now see correct load averages, fairness, and kernel
compile times have dropped 40% from before.

- erik



Re: [9fans] dual boot

2014-05-24 Thread erik quanstrom
On Sat May 24 02:46:08 EDT 2014, vu3...@gmail.com wrote:
 I downloaded the usbinstamd64 image from the 9atom webpage and booted
 it up. First, I tried amd64 (selection 0), in a second or so, some
 text went past the screen quickly and the machine rebooted. I then
 tried selection 1 (386pae), that booted up but quickly halted with
 this:
 
 Plan 9
 E820: 
 ...
 apic: 6 machs started; flat mode vectors
 winbont .ff hw fff8
   no capabilities
 panic: kernel fault: no user process pc=f0162415 addr=0x00a8
 panic: kernel fault: no user process pc=f0162415 addr=0x00a8
 dumpstack disabled
 cpu0: exiting
 cpu0: spurious interrupt 39, last 0

i forgot to say ... excellent bug report.

- erik



Re: [9fans] plan9.iso

2014-05-24 Thread erik quanstrom
On Sat May 24 20:07:37 EDT 2014, j...@corpus-callosum.com wrote:
 Has anyone else had trouble getting recent plan9.iso’s to boot?
 I can get it to boot from sdE1!9fat!9pcflop.gz, but once the
 install starts it fails to recognize any sdE? devices that
 should show up shortly on the console.

this was a prime motivator to put together 9atom in the first place.

- erik



Re: [9fans] dual boot

2014-05-24 Thread erik quanstrom
 Yes, I typed them by hand. Sorry about the error.
 
  panic: kernel fault: no user process pc=0xf0162415 addr=0x00a8
  panic: kernel fault: no user process pc=0xf0162415 addr=0x00a8
 
  there is a new TEST image @ http://ftp.9atom.org/other/+usbinstamd64.bz2
  i have not had a chance to try it myself, but the crash seems obvious 
  enough.
 
 Thanks. I just tried it and indeed, with the pae kernel, it gets me
 into the installer. At some point, I mistakenly selected (arches to
 install as amd64, well I really wanted to install amd64 but perhaps
 the kernel has not been rebuilt for amd64 I guess.
 
 Now, it proceeds to build full set of amd64 executables? which I
 selected yes. I then get a build error:
 
 8c -FTVw s_tolower.c
 s_rdinstack.c:13 not a function
 s_rdinstack.c:13 syntax error, last name: Sinstack
 mk: 8c -FTVw s_rdinstack.c  : exit status=rc 594: 8c 604: error
 mk: date for (i  ...  : exit status=rc 509: rc 563: mk 565: error
 halt system? (yes, no, skip)[no default]

oops.  sorry.  my fault.  fixed.

 I am going to try installing a 386 system instead (after sending out
 this email).
 
 Also the amd64 kernel still didn't boot. When I selected amd64
 (selection 0), it just printed something that I couldn't read and
 rebooted the system.

if this is an amd (not intel) system, i may have applied a band-aid for this.

 Very eager to try these out. If you have any new builds and want to
 test them out (and can bear the time zone difference -- I am in
 GMT+5.30), I will be glad to test them out.

everything is ready to try in

http://ftp.9atom.org/other/+usbinstamd64.bz2

this is the test image.

- erik



Re: [9fans] version control debate

2014-05-23 Thread erik quanstrom
 L.

have you converted to modula?  :-)

- erik



Re: [9fans] dual boot

2014-05-23 Thread erik quanstrom
 I plan to install 9atom natively (until now, I had been using VMs and
 Rpi, but I want to try it on my home AMD64 desktop machine). I
 currently have Debian GNU/Linux installed on a hard disk. I am adding
 a new hard disk on which I plan to install the 9atom. I am wondering
 if I need to take any care to do the dual-boot from grub.

that should work.  but i haven't tried it.  i change the boot order in
bios instead.

- erik



Re: [9fans] [GSOC] auth server: connection refused

2014-05-23 Thread erik quanstrom
On Fri May 23 14:26:50 EDT 2014, ccuiy...@gmail.com wrote:

 Finally got the reason.

personally, i preferred the big switch statement in cpurc.  it scales
even to large installations, and has the advantage of being a little
easier to get an overview.  and there's no need for a bunch of files
that are similar to
	#!/bin/rc
	. config-a

- erik



Re: [9fans] CMS/MMS (VCS/SCM/DSCM) [was: syscall 53]

2014-05-22 Thread erik quanstrom
 Features such as atomic commits, changesets, branches, push,
 pull, merge etc. can be useful in multiple contexts so it
 would be nice if they can integrated smoothly in an FS.
 
 - Installing a package is like a pull (or if you built it
   locally, a commit)
 - Uninstall is reverting the change.
 - Each machine's config can be in its own branch.

what is the advantage over separate files?  imo, this complicates the issue.

 - You can use clone to create sandboxes.
 - A commit makes your private temp view permanent and
   potentially visible to others.
 - Conversely old commits can be spilled to a backup
   media (current SCMs want the entire history online).
 - Without needing a permanent connection you can `pull' data.
   [never have to do `9fs sources; ls /n/sources/contrib'.]

this is a nice list, but i think a key point is not being teased out.
the dump file systems are linear.  there is a full order of system
archives.  in hg, there is a full order of the tip, but not so of
branches in general.  in git, multiple orders (not sure if they're
full or not) can be imposed on a set of hashes.

another key point is that all distributed scms that i've used clone
entire systems.  but what would be more interesting is to clone, say,
/sys/src or some proto-based subset of the system, while using the
main file system for everything else.  imagine you want to work on
the kernel, and imagine that you keep console log files.  clearly
you want to see both the new log entries, and the modified kernel.

i would be concerned that this really challenges the plan 9 model
of namespaces.  one would have to figure out how to keep oneself
out of namespace hell if one were to build this into a file system and
use it heavily.

- erik



Re: [9fans] CMS/MMS (VCS/SCM/DSCM) [was: syscall 53]

2014-05-22 Thread erik quanstrom
 More seriously, though, on the issue of revision control on Plan 9
 (and code review, that being the really important aspect) I'd like us
 to keep in mind that being able to interface with existing
 repositories, difficult as it may be, would be greatly beneficial.  To

like i said, a hg gateway hosted on google code or whatever is for me
welcome, but that's not my top priority.  if it is your top priority, then
go to it.  if you need some system changes, submit patches.  if they're
invasive, it might be good to discuss the plan first in public.

- erik



Re: [9fans] VirtualBox, Mavericks, and Plan 9

2014-05-22 Thread erik quanstrom
On Thu May 22 06:55:44 EDT 2014, ara...@mgk.ro wrote:
 Why do people insist on VirtualBox? How many times it has to be said.
 VirtualBox is utter shite. QEMU and VMware work. QEMU is especially
 interesting because it can work without a broken kernel driver
 (although it can use kvm, a good kernel driver on Linux).

because they're using osx, and don't want to shell out for vmware.

- erik



Re: [9fans] CMS/MMS (VCS/SCM/DSCM) [was: syscall 53]

2014-05-22 Thread erik quanstrom
 That said, let me add my encouragement to sample apatch as suggested
 by Erik, although any valid objections ought to be raised here.  One,
 from me, comes from Erik himself a modified version of Nemo's
 (a)patch (I don't have the exact quote handy.  Nemo, could we please
 start this exercise with one, and only one version of this?

the original version is, as far as i know, no longer in use.
i only mentioned the lineage to credit nemo with the work.

- erik



Re: [9fans] syscall 53

2014-05-22 Thread erik quanstrom
 With all respect due to you and Mr Coraid (don't make me look his

Coile.

- erik



Re: [9fans] syscall 53

2014-05-22 Thread erik quanstrom
   Is this the right place to discuss the actual procedure to include
   apatch in one's private Bell Labs' distribution?
 
   Is it preferable to use apatch within 9atom, or is it reasonably
   portable to the legacy (I presume that is what David intends
   with that moniker) distribution?

apatch is as specialized as patch in that it creates patches against the current
9atom tree.  patch is still there, and i often generate parallel patches for both
9atom and sources.  as long as your source hasn't drifted wrt the atom tree,
you can send an apatch from a 9atom, hybrid or standard tree.

   Is there a willing participant who is prepared to offer backing
   storage for the target distribution(s) even when he or she
   disagrees with the outcome?  (And be vociferous both about
   disagreeing and accepting the outcome?) David, you've been good
   that way, are there others?  I'd like to think that we can
   leverage Plan 9's distributable properties to have more than one?
   I'd like to offer this myself, but I live at the bottom of an
   Internet gravity well, anyone with sufficient patience is welcome
   to approach me.

you are aware that you can mount the 9atom sources directly these days?

  nflag=-n srv $nflag -q tcp!atom.9atom.org atom /n/atom atom   # add to 9fs

- erik



Re: [9fans] CMS/MMS (VCS/SCM/DSCM) [was: syscall 53]

2014-05-22 Thread erik quanstrom
 Go is in a different league: Heretical as it may seem, we can generate
 Go binaries without compelling all Plan 9 installations to include the
 Go toolchain, no matter how valuable some of us may perceive it.  HG
 without Python is a dead rat.

that's a partially binary distribution.  a proper, full, distribution
would have the same issue.

and go introduces new issues, it's much more in flux than python.
the risk is that a go update could then break the system.
and it only runs on 386.  it does not run on plan 9 mips, arm, or amd64.

- erik



Re: [9fans] CMS/MMS (VCS/SCM/DSCM) [was: syscall 53]

2014-05-22 Thread erik quanstrom
On Thu May 22 09:45:08 EDT 2014, lu...@proxima.alt.za wrote:
  the original version is, as far as i know, no longer in use.
  i only mentioned the lineage to credit nemo with the work.
 
 Out of curiosity, what prompted not using CVS?  I can think of a
 number of reasons, but none that echo with your comments up to now.

cvs wasn't considered because it's not appropriate.  the 9atom model
is single committer.  very much like linus' model.

- erik



Re: [9fans] hgfs

2014-05-22 Thread erik quanstrom
thinking about the idea of a revision control file system brings me back to
some work i followed by brian stuart.  his θfs has an object store.  the object
store allows arbitrary metadata and object size.  the ℙ snapshot device could
be modified to take snapshots based on an arbitrary reference point, rather than
the last snapshot.  so in theory all the bits are there, they would just need to
be put together in a different way.  fun little thing to think about.

in any event, back to the subject at hand.  this in-depth discussion of various
revision control systems seems to assume that revision control is the key issue.

i have seen plan 9-derived projects fail using codereview and google code
because there wasn't a shared goal.  so i would identify it as the key
issue.  the goal is strategy, revision control is tactics.

instead of a goal or vision, perhaps values are more down to earth.
for example, a common value for kernel folks is never break user land.

let me propose this draft.
1.  keep with the original value system of plan 9: e.g. simple implementation of
advanced techniques, self-contained, avoid configuration, use mechanism such
as namespaces instead.

2.  run on as many systems as practical.  do not break compatibility without
specific articulated reasons.

3.  all changes are up for debate, but there is a clear path to decision making.

- erik



[9fans] θfs vs webfs; θfs vs lookman

2014-05-22 Thread erik quanstrom
i think i've fixed the issues preventing
readweb http://www.9atom.org/magic/man2html/4/θfs
and
lookman θfs
from working.  it's surprising how many unicode bugs there still are.

- erik



Re: [9fans] hgfs

2014-05-22 Thread erik quanstrom
c'mon.  there's no point to namecalling.

- erik



Re: [9fans] CMS/MMS (VCS/SCM/DSCM) [was: syscall 53]

2014-05-22 Thread erik quanstrom
  another key point is that all distributed scms that i've used clone
  entire systems.  but what would be more interesting is to clone, say,
  /sys/src or some proto-based subset of the system, while using the
  main file system for everything else.  imagine you want to work on
  the kernel, and imagine that you keep console log files.  clearly
  you want to see both the new log entries, and the modified kernel.
 
 Actually with something like venti as the store, `clone' is
 trivial! Just find the hash for a particular changeset you want
 to clone and you can build the rest on demand. `rebase' or
 `pull' will be more painful.

venti is going to be a difficult model.  the objects in scm are typically
arbitrarily sized, and have arbitrary metadata.  θfs has this model for
its object store.  venti does not.

  i would be concerned that this really challenges the plan 9 model
  of namespaces.  one would have to figure out how to keep oneself
  out of namespace hell if one were to build this into a file system and
  use it heavily.
 
 Your concern is a bit premature.  We are just handwaving right
 now!  I am interested in finding out just how far we can push
 the plan9 model -- and if the current model doesn't naturally
 fall out of any extended model, we'd know.

i don't know.  this concern was addressed in the very first paper,
and i have dealt with some plan 9-based systems that did this, and
didn't get it right.  

but it can be addressed without much trouble.  just have the fs export
em all.

- erik



Re: [9fans] θfs vs webfs; θfs vs lookman

2014-05-22 Thread erik quanstrom
On Thu May 22 16:10:21 EDT 2014, skip.tavakkol...@gmail.com wrote:

 what types of metadata are/were stored in a typical case?

for the object store, any metadata at all would be acceptable, and
i don't think there is a typical case.  there is no object store fs interface.

for the nfs and 9p server, just the typical metadata were stored.

- erik



Re: [9fans] VirtualBox, Mavericks, and Plan 9

2014-05-22 Thread erik quanstrom
On Thu May 22 17:25:07 EDT 2014, edgecombe...@gmail.com wrote:
 Aram, if you have a bunch of settings that work under VMWare Fusion
 for Plan 9, then I am all ears. I was under the understanding Plan 9
 didn't work under VMWare...

the second thing the nix terminal ran on was vmware.  i just have not
used it much.

- erik



Re: [9fans] syscall 53

2014-05-21 Thread erik quanstrom
 PS: I have resurrected an old Nokia (5110, but I'm not sure) phone,
 but it's been borrowed and I have my doubts that I will be seeing it
 again any time soon.  Maybe this forum can help me decide what GSM
 equipment is safe from interference by the networks and their
 information masters?  My current hate-object is my Galaxy S4.

purchasing a phone directly from google allows one to cut
out the network operator. (to some extent.)  in theory the
cyanogenmod phone is less tied to google.  sadly, it doesn't
do bluetooth-le, and i need that.

all options appear to me to boil down to walled gardens.
unless you build it yourself.

- erik



Re: [9fans] syscall 53

2014-05-21 Thread erik quanstrom
 I think such a beast would provide the foundations for a serious
 effort to bring the distributions back together.  I know many resist
 such efforts because of Python (a pet hate of mine, even though I
 don't know it from Adam), HG and codereview and I resist accusing them
 of reactionary behaviour; I do wish we could get past that problem,
 though.

fwiw...

i use a derivative of nemo's patch system.  all changes are applied through
patches.  anyone can comment on a patch, and comments, and patch
dispositions are mailed to everyone on the list.  the list is open to
general discussion.  the patch system allows folks to pull (not executables)
or apply patches themselves, so in spirit it's closer to git than hg.

i'm open to any sort of gateway to hg/codereview/git that folks find useful.
i just don't want hg to be a requirement.  one of the things i value about
plan 9 is the fact that it's a self-contained system.  requiring hg and websites
runs counter to this.  i haven't carved out the time to do anything about it,
but patches are welcome.

i think a key bit to collaboration is going to be setting some ground rules.
and the most important one imho is having a clear goal.

off the top of my head, how about having the best plan 9 we can afford,
which runs on as much hardware, and as many vms as possible.  right out
of the box.

what do yall think?

- erik



Re: [9fans] syscall 53

2014-05-21 Thread erik quanstrom
 To keep the ball rolling, let me suggest that we drop the requirement
 that Plan 9 be self-contained as a measure to make some progress with
 existing expertise.  I wish we could keep Plan 9 as the sole
 foundation, but I think that's just not viable, I feel treasonous
 suggesting otherwise, but I'm merely stating my sentiments, not
 imposing a rule here.

can you explain why is this not viable?  what essential bits would be
missing if hg/git/whatever is not tightly integrated into the process?

- erik



Re: [9fans] syscall 53

2014-05-21 Thread erik quanstrom
On Wed May 21 14:28:51 EDT 2014, s...@9front.org wrote:
  i use a derivative of nemo's patch system.
 
 Where is this in the 9atom tree? Did you replace the old
 patch(1) entirely?

good question.  the commands are all apatch/create, apatch/note, etc.
patch(1) is not replaced, and the patch commands are intact.  i need
them as i try to send as many patches in to sources as possible.

9diff(1) is a nifty little hack.

- erik



Re: [9fans] syscall 53

2014-05-21 Thread erik quanstrom
 Ergo: Plan 9 does not (yet?) contain sufficient tools to be
 self-sustaining.  

we've managed for years

 at it; it needs firm buy-in by the community.  I, for one, would need
 some hard sell to consider patch and its offspring as sufficient and
 much more to convince me that it would be technically superior to
 codereview, others may well be even more hard-assed than I am, and
 their skills and contributions are too important to sacrifice.

i'd encourage you to try participating with apatch, and the mailing
list.

i would find specific issues easier to reason about than
technically superior.

- erik



Re: [9fans] syscall 53

2014-05-20 Thread erik quanstrom
  That said, the problems were due (IMHO) to a limitation in the
  update mechanism, not to the inclusion of a new system call.
 
 This is true depending on how you define update mechanism.
 A simple note from whoever made the decision to push the
 change out to the effect of hey, we're going to add a new
 syscall, update your kernels before pulling new binaries a
 while before the push would have been sufficient.

technology doesn't solve human communication problems.

here's the Official Meme.  simply s/spam/update issues/g
and you're good:

https://craphound.com/spamsolutions.txt

- erik



Re: [9fans] waitfree

2014-05-20 Thread erik quanstrom
 Dunno what to say. I'm not trying this on Plan 9, and I can't
 reproduce your results on an i7 or an e5-2690. I'm certainly not
 claiming that all pipelines, processors, and caches are equal, but
 I've simply never seen this behavior. I also can't think of an
 application in which one would want to execute a million consecutive
 LOCK-prefixed instructions. Perhaps I just lack imagination.

the original question was, are locked instructions wait free.  the
purpose was to see if there could be more than a few percent
variation, and it appears there can be.

i think in modern intel microarches this question boils down to,
can a given cpu thread in an arbitrary topology transition an
arbitrary cacheline from an arbitrary state to state E in a bounded
number of QPI cycles.

of course this reformulation isn't particularly helpful, at least to me,
given the vagueness in the descriptions i've seen.

this is practically important because if there is a dogpile lock or
bit of shared memory in the system, then a particular cpu may
end up waiting quite a long time to acquire the lock.  it may
even get starved out for a bit, perhaps with a subset of other cpus
batting around more than once.  

this would be hidden waiting behavior that might lead to surprising
(lack of) performance.

i would say this vagueness could impede worst-case analysis for safety
critical systems, but you'd be pretty adventuresome, to say the least,
to use a processor with SMM in such a system.

- erik



Re: [9fans] syscall 53

2014-05-20 Thread erik quanstrom
On Tue May 20 12:42:35 EDT 2014, rminn...@gmail.com wrote:
 I have a different perspective. There are millions of chromebooks out
 there updating all the time, from the firmware to the kernel to the
 root file system to everything. It all works.
 
 If you are telling me that the upgrade technology of Plan 9 can not
 handle an automatic upgrade, fine; we have the proof.
 
 If you are telling me Plan 9 should not or never will be able to
 handle an automatic upgrade, and is going to require a heads up email
 for each kernel change, I have a hard time taking that seriously.
 
 This is not a human communication problem. It's a technology problem,
 trivially solved for many years now by many systems.

leaving aside the fact that plan 9 is software that installs everywhere and
users can be expected to modify any and all system components, and that
android is a hardware appliance running essentially a binary blob,

if we had a perfect update mechanism that did ota updates seamlessly,
it would not address the issue i'm trying to raise.  

since people modify the system, and there's a community built around
this, it would be extremely helpful if big system changes were communicated.

- erik



Re: [9fans] syscall 53

2014-05-20 Thread erik quanstrom
 I never understood why binaries are pulled. Even on a lowly
 RPi it takes 4 minutes to build everything (half if you cut
 out gs). And the 386 binaries are useless on non-386
 platforms!
 
 Why not just separate binary and source distributions?  Then
 include a file in the source distribution to warn people about
 changes such as this one (or the one about 21bit unicode) and
 how to avoid painting yourself in a corner. The binary distr.
 should have a provision for *only* updating the kernel and
 insisting the user boots off of it before further updates can
 proceed.

i think this is a good idea.  the 9atom usb installer builds the
full set of executables for the arches you want as part of the
install.  it reduces the size of the install image by at least 20MB
per installed architecture.  given the sad state of broadband
in many places, i'm hoping this is a good tradeoff.

- erik



Re: [9fans] waitfree

2014-05-20 Thread erik quanstrom
 I can't think of any reason it should be implemented in that way as
 long as the cache protocol has a total order (which it must given that
 the μops that generate the cache coherency protocol traffic have a
 total order), a state transition from X to E can be done in a bounded
 number of cycles.

my understanding is that in this context this only means that different
processors see the same order.  it doesn't say anything about fairness.

 The read function will try to find a value for addr in cache, then
 from memory. If the LOCK-prefixed instruction's decomposed read μop
 results in this behavior, a RFO miss can and will happen multiple
 times. This will stall the pipeline for multiple memory lookups. You
 can detect this with pipeline stall performance counters that will be
 measurably (with significance) higher on the starved threads.
 Otherwise, the pipeline stall counter should closely match the RFO
 miss and cache miss counters.

yes.

 For ainc() specifically, unless it was inlined (which ISTR the Plan 9
 C compilers don't do, but you'd know that way better than me), I can't
 imagine that screwing things up. The MOV's can't be LOCK-prepended
 anyway (nor do they deal with memory), and this gives other processors
 time to do cache coherency traffic.

it doesn't matter if this is hard to do.  if it is possible under any circumstances,
with any protocol-adhering implementation, then the assertion that amd64
lock is wait-free is false.

- erik



Re: [9fans] syscall 53

2014-05-20 Thread erik quanstrom
On Tue May 20 15:50:56 EDT 2014, rminn...@gmail.com wrote:
 Ah well, back to 'm' for this thread, and I now accept that this
 community is unwilling to solve this simple problem, as so many others
 have.  Bummer.

nobody said that.  there's a difference between noting a strawman
argument, and pointing out that one feels that there is a different
issue that's more important to solve, and being unwilling to address
an issue.

- erik



Re: [9fans] syscall 53

2014-05-19 Thread erik quanstrom
 Which raises another question: are 9atom and 9front in synch with the
 BL distribution (itself in question) regarding syscall 53?

9atom is not.  i didn't know that it was added, nor do i 
know why nsec was added as a syscall.

i indirectly heard go needs it, but that is not really a reason
i can understand technically.  why must it be a system call?

getting ahead of myself, if the problem is shared memory vs
shared fds, then the solution is easy: fix nsec in the c library.
don't save a copy of the fd.  that leads to trouble.  (the new
call takes ~6µs on my e3 v2)

if the problem is getting very low-latency timing, or relative
timing, then the solution is still easy: use the timestamp counter.
no version of nsec works for relative timing due to timesync
adjustments!

i'm sure there are other possibilities, but i don't think i can see them
without an explanation.  so if anyone has anything else, that would be
interesting.

- erik

---
; cat /sys/src/libc/9sys/nsec.c
#include <u.h>
#include <libc.h>

vlong
nsec(void)
{
	uchar b[8];
	int fd;

	fd = open("/dev/bintime", OREAD);
	if(fd != -1)
	if(pread(fd, b, sizeof b, 0) == sizeof b){
		close(fd);
		return getbe(b, sizeof b);
	}
	close(fd);
	return 0;
}
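
the listing leans on getbe() to turn the 8 bytes read from /dev/bintime
into a vlong.  a minimal sketch of that decoding step in portable c,
assuming the bytes are big-endian as the name suggests.  getbe_sketch is
a hypothetical stand-in, not the library source:

```c
#include <stdint.h>
#include <stddef.h>

/*
 * decode n big-endian bytes (n <= 8) into a 64-bit value.
 * hypothetical stand-in for the getbe() call in nsec.c.
 */
int64_t
getbe_sketch(const unsigned char *b, size_t n)
{
	uint64_t v;
	size_t i;

	v = 0;
	for(i = 0; i < n; i++)
		v = v<<8 | b[i];	/* most significant byte first */
	return (int64_t)v;
}
```

the point is only that the conversion is a handful of shifts; the cost
of nsec() is in the open/pread/close, not here.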



Re: [9fans] syscall 53

2014-05-19 Thread erik quanstrom
On Mon May 19 10:04:28 EDT 2014, lu...@proxima.alt.za wrote:
  i indirectly heard go needs it, but that is not really a reason
  i can understand technically.  why must it be a system call?
 
 Actually, Go raised an important alert, quite indirectly: when using
 high resolution timers, the issue of opening a device, reading it and
 converting the input value to a binary value can and in this case is
 very expensive.

 Curiously, the actual symptom - I cannot remember how it came about -
 was that using the timer leaked file descriptors, or, more likely,
 gave the impression of leaking file descriptors.  But the reality is
 that nanosecond accuracy cannot be achieved from reading a device by
 conventional means.

i think my original question still stands.  what is the purpose of timing,
what is the desired accuracy and precision, and is a relative or absolute
time wanted?  

an absolute time (say a time adjusted with timesync, including leap seconds, etc)
is not what you want if you want relative timing.  something like the
timestamp counter makes a lot more sense.
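
a minimal sketch of the relative-timing idea, in portable c rather than
plan 9 c: take two readings of a free-running cycle counter and convert
the difference to nanoseconds.  cycles2ns and its parameters are
hypothetical names, and the counter frequency hz is assumed to be known
and constant:

```c
#include <stdint.h>

/*
 * convert the difference of two cycle-counter readings to nanoseconds.
 * start, end and hz are assumed to be supplied by the caller; nothing
 * here reads a real counter.  the multiply is split so it does not
 * overflow 64 bits for frequencies of a few GHz.
 */
uint64_t
cycles2ns(uint64_t start, uint64_t end, uint64_t hz)
{
	uint64_t d;

	d = end - start;	/* unsigned subtraction survives wraparound */
	return d/hz*1000000000ULL + d%hz*1000000000ULL/hz;
}
```

since only the difference is used, timesync adjustments to the time of
day never enter into the result.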

i took a quick look at the runtime·nanotime, and it looks like it's being
used for gettimeofday, which shouldn't be super performance sensitive.

- erik



Re: [9fans] syscall 53

2014-05-19 Thread erik quanstrom
On Mon May 19 12:26:00 EDT 2014, quans...@quanstro.net wrote:
 On Mon May 19 10:04:28 EDT 2014, lu...@proxima.alt.za wrote:
   i indirectly heard go needs it, but that is not really a reason
   i can understand technically.  why must it be a system call?
  
  Actually, Go raised an important alert, quite indirectly: when using
  high resolution timers, the issue of opening a device, reading it and
  converting the input value to a binary value can and in this case is
  very expensive.
 
  Curiously, the actual symptom - I cannot remember how it came about -
  was that using the timer leaked file descriptors, or, more likely,
  gave the impression of leaking file descriptors.  But the reality is
  that nanosecond accuracy cannot be achieved from reading a device by
  conventional means.
 
 i think my original question still stands.  what is the purpose of timing,
 what is the desired accuracy and precision, and is a relative or absolute
 time wanted?  

also, one cannot get close to 1ns precision with a system call.  a system call
takes a bare minimum of 400-500ns on 386/amd64.

- erik



Re: [9fans] syscall 53

2014-05-19 Thread erik quanstrom
On Mon May 19 13:17:59 EDT 2014, lu...@proxima.alt.za wrote:
  also, one cannot get close to 1ns precision with a system call.  a system call
  takes a bare minimum of 400-500ns on 386/amd64.
 
 Sure, but accessing /dev/time is, if I guess right, orders of
 magnitude slower, specially if you have to open the device first.

the full operation open/read/close/convert takes 6µs on my machine,
and a system call takes about 750ns.

getting the kitchen time should not need better than 6µs.  if this
were relative timing, i would understand, but it's not.

- erik



Re: [9fans] syscall 53

2014-05-19 Thread erik quanstrom
 On 19 May 2014 18:13, lu...@proxima.alt.za wrote:
 
  Curiously, I'm pretty certain that it was the issue of an fd that
  remained open (something to do with caching the /dev/time fd, if I
  remember right) that caused some tests to fall apart, probably because
  a test for leaking fds actually needed to cache the time of day for
  time out purposes.
 
 
 That was entirely the result of a botched attempt to optimise something.
 Remove that botched optimisation, and that problem goes away.

+1, or rather, exactly.

- erik



[9fans] waitfree

2014-05-19 Thread erik quanstrom
i've been thinking about ainc() and for the amd64 implementation,

TEXT ainc(SB), 1, $-4			/* int ainc(int*) */
	MOVL	$1, AX
	LOCK; XADDL AX, (RARG)
	ADDL	$1, AX
	RET

does anyone know if the architecture says this is wait-free?
this boils down to exactly how LOCK works, and i can't find
a good-enough definition.  i tend to think that it might not be.
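
for readers who don't speak the plan 9 assembler, a rough c11 analogue
of the routine above.  ainc_sketch is a hypothetical name; compilers for
amd64 typically emit a LOCK XADD for this, but that is an assumption
about the compiler, not something the standard promises:

```c
#include <stdatomic.h>

/*
 * atomically add 1 to *p and return the new value.
 * atomic_fetch_add returns the old value, just as XADDL leaves
 * the old value in AX, hence the +1.
 */
int
ainc_sketch(atomic_int *p)
{
	return atomic_fetch_add(p, 1) + 1;
}
```

there is no retry loop in the source, so whether the call is wait-free
comes down entirely to what the hardware does with the one locked
instruction.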

- erik



Re: [9fans] syscall 53

2014-05-19 Thread erik quanstrom
 I am adding some logic to synchronize with the PPS signal from
 the GPS device that I hooked up to a RaspberryPi.  With this
 change the TOD clock should be accurate to within 10 to 20 µs.
 So I for one welcome the new syscall! [Though its introduction
 could've been better managed]

even a syscall on a rpi is going to cost you at least 5-10µs
and clock drift will make this, and your second point very hard.

 But using a TOD clock for measuring performance seems wrong
 since it will also have to account for leapseconds (at the
 moment timesync happily ignores leapseconds).

- erik



Re: [9fans] waitfree

2014-05-19 Thread erik quanstrom
On Mon May 19 15:51:27 EDT 2014, devon.od...@gmail.com wrote:
 The LOCK prefix is effectively a tiny, tiny mutex, but for all intents
 and purposes, this is wait-free. The LOCK prefix forces N processors
 to synchronize on bus and cache operations and this is how there is a
 guarantee of an atomic read or an atomic write. For instructions like
 cmpxchg and xadd where reads and writes are implied, the bus is locked
 for the duration of the instruction.
 
 Wait-freedom provides a guarantee on an upper-bound for completion.
 The pipeline will not be reordered around a LOCK instruction, so your
 instruction will cause a pipeline stall. But you are guaranteed to be
 bounded on the time to complete the instruction, and the instruction
 decomposed into μops will not be preempted as the μops are executed.

there is no bus.

what LOCK really does is invoke part of the MESI protocol.  the state
diagrams i've seen do not specify how this is arbitrated if there are > 1
processors trying to gain exclusive access to the same cacheline.

 Wait-freedom is defined by every operation having a bound on the
 number of steps required before the operation completes. In this case,
 you are bound by the number of μops of XADDL + latency to memory. This
 is a finite number, so this is wait-freedom.

i'm worried about the bound on the number of MESI rounds.  i don't see
where the memory coherency protocol states that if there are n processors
a cacheline will be acquired in at most f(n) rounds.

- erik



Re: [9fans] syscall 53

2014-05-19 Thread erik quanstrom
 There was another complaint about /dev/bintime. Some people claimed
 that using it leaked file descriptors in multithreaded programs. I
 don't understand why this problem can't be solved by opening it
 close-on-exec. In fact, this problem doesn't exist in the port of
 Go to Plan 9 anymore (although the fix was different)...

close-on-exec doesn't solve any interesting issues.  close on fork might,
but we don't have that.

if two threads share memory, but do not share file descriptor tables, or
if you overflow the file descriptor array, you're really cooked.  things go
seriously wrong in a hurry.

there are other cases that are really terrible, but this is the most glaring one.

- erik



Re: [9fans] waitfree

2014-05-19 Thread erik quanstrom
On Mon May 19 17:02:57 EDT 2014, devon.od...@gmail.com wrote:
 So you seem to be worried that N processors in a tight loop of LOCK
 XADD could have a single processor. This isn't a problem because
 locked instructions have total order. Section 8.2.3.8:
 
 The memory-ordering model ensures that all processors agree on a
 single execution order of all locked instructions, including those
 that are larger than 8 bytes or are not naturally aligned.

i don't think this solves any problems.  given thread 0-n all executing
LOCK instructions, here's a valid ordering:

0       1       2       ...     n
lock    stall   stall   ...     stall
lock    stall   stall   ...     stall
...     ...
lock    stall   stall   ...     stall

i'm not sure if the LOCK really changes the situation.  any old exclusive
cacheline access should do?

the documentation appears not to cover this completely.
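
a toy model of the ordering above, as a sketch only.  every core
requests the cacheline every round and an arbiter picks one winner, so
there is always a single agreed order.  the two arbiters (lowest,
roundrobin) are made up for illustration and say nothing about how real
hardware arbitrates:

```c
enum { Ncore = 4 };

/* unfair arbiter: always grants the line to core 0 */
int
lowest(int round)
{
	(void)round;
	return 0;
}

/* fair arbiter: rotates the grant among the cores */
int
roundrobin(int round)
{
	return round % Ncore;
}

/*
 * every core requests the line every round.  returns the longest
 * run of consecutive rounds that the given core went without a grant.
 */
int
maxwait(int (*grant)(int), int core, int rounds)
{
	int r, wait, max;

	wait = max = 0;
	for(r = 0; r < rounds; r++){
		if(grant(r) == core)
			wait = 0;
		else if(++wait > max)
			max = wait;
	}
	return max;
}
```

with roundrobin the wait for any core is bounded by Ncore-1 rounds.
with lowest it grows with the number of rounds, even though both
produce a perfectly good total order.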

- erik



Re: [9fans] waitfree

2014-05-19 Thread erik quanstrom
On Mon May 19 18:15:21 EDT 2014, rminn...@gmail.com wrote:
 On Mon, May 19, 2014 at 3:05 PM, erik quanstrom quans...@quanstro.net wrote:
 
  the documentation appears not to cover this completely.
 
 
 Hmm. You put documentation and completely in the same sentence. Agents
 are converging on your house. Run!
 
 There's always an undocumented progress engine somewhere in the works.

true.  the internet does appear to run on the infinite undocumentation drive.
(sorry mr adams.)

- erik



Re: [9fans] waitfree

2014-05-19 Thread erik quanstrom
 It is an ordering, but I don't think it's a valid one: your ellipses
 suggest an unbounded execution time (given the context of the
 discussion). I don't think that's valid because the protocol can't
 possibly negotiate execution for more instructions than it has space
 for in its pipeline. Furthermore, the pipeline cannot possibly be

yes it is an unbounded set of instructions.  i am wondering if it isn't
possible for the same core to keep winning the MESI(F) arbitration.

i don't see tying µ-ops to cachelines.  load/store buffers i believe
is where cachelines come in to play.

 filled with LOCK-prefixed instructions because it also needs to
 schedule instruction loading, and it pipelines μops, not whole

i didn't read that in the arch guide.  where did you see this?

 instructions anyway. Furthermore, part of the execution cycle is
 decomposing an instruction into its μop parts. At some point, that
 processor is not going to be executing a LOCK instruction, it is going
 to be executing some other μop (like decoding the next LOCK-prefixed
 instruction it wants to execute). This won't be done with any
 synchronization. When this happens, other processors will execute
 their LOCK-prefixed instructions.

and this is an additional assumption that i was trying to avoid.  i'm
interested in whether LOCK XADD is wait free in theory.

 further instructions. Instruction load and decode stages are shared,

this is not always true.  and i think hints at the issue that it might be
inaccurate to generalize from your cpu to all MESI cpus.

i get a 126% difference executing lock xadd 1024*1024 times
with no branches using cores 4-7 of a xeon e3-1230.  i'm sure it would
be quite a bit more impressive if it were a bit easier to turn the timer
interrupt off.

i really wish i had a four package system to play with right now.  that
could yield some really fun numbers.  :-)

- erik

example run.  output core/cycles.
; 6.lxac
4 152880511
7 288660939
6 320991900
5 338755451




Re: [9fans] syscall 53

2014-05-18 Thread erik quanstrom
On Sun May 18 18:56:49 EDT 2014, skip.tavakkol...@gmail.com wrote:

 fyi, pulling/merging (e.g. adding IL back), building the kernels, booting
 and building the binaries works as expected for all cpu types in my
 environment (pc, bcm, rb and kw).

i'd put a vote into restoring il to the standard kernels.  there's no
downside.

- erik



Re: [9fans] syscall 53

2014-05-17 Thread erik quanstrom
On Sat May 17 06:28:02 EDT 2014, puta2001-...@yahoo.com wrote:

 Hello, help please, after recent (15 May) pull:
 
 mntgen 31: bad sys call number 53 pc 813f
 ipconfig, keyfs, webfs webcookies, faces = the same.
 ls -l for example shows
 ls 222: bad sys call number 53 pc bb8f
 ls 222: suicide: sys: bad sys call pc=0xbb8f
 acid leads to /sys/src/libc/386/main9.s:16

looks like nsec() in the c library was replaced with a syscall, and
this was put into libc, and many things were recompiled.  unfortunately
pull does not update your kernel, and your kernel doesn't support nsec.

if you have a dump file system, i would recommend copying /386/bin
executables back from before the may 15 update.

if you don't, the next best option is to copy mk from my contrib, then

r=/n/sources/contrib/quanstro/rescue
9fs sources 
cp $r/mk /386/bin/mk
cd /sys/src/libc/9sys
cp $r/mkfile $r/nsec.c .
mk

now you can rebuild /sys/src/cmd as needed to build a kernel with nsec.

by the way, i'm currently using this version of nsec and i prefer it, though
it requires 3 syscalls and not two, and takes about 2x as long.  it's still
just around 1µs on most of my machines.  if nsec() is causing issues, i would
think that cycles(2) would be a good option, and much faster than a syscall.

; cat nsec.c
#include <u.h>
#include <libc.h>

vlong
nsec(void)
{
	uchar b[8];
	int fd;

	fd = open("/dev/bintime", OREAD);
	if(fd != -1)
	if(pread(fd, b, sizeof b, 0) == sizeof b){
		close(fd);
		return getbe(b, sizeof b);
	}
	close(fd);
	return 0;
}
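
getbe isn't shown in this message.  assuming it folds big-endian bytes
into one integer (which matches how cons(3) describes /dev/bintime), a
portable sketch of the decoding step could look like this.  the name
getbe_sketch and the use of stdint types are made up for illustration;
this is not the library's source.

```c
#include <stddef.h>
#include <stdint.h>

/* hypothetical stand-in for getbe: fold n big-endian bytes (n <= 8) into one value */
uint64_t
getbe_sketch(const unsigned char *b, size_t n)
{
	uint64_t v;
	size_t i;

	v = 0;
	for(i = 0; i < n; i++)
		v = v<<8 | b[i];	/* most significant byte comes first */
	return v;
}
```

with the 8 bytes read from the start of the file, the result would be the
nanosecond count that nsec() returns.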



Re: [9fans] syscall 53

2014-05-17 Thread erik quanstrom
 it requires 3 syscalls and not two, and takes about 2x as long.  it's still

good grief.  s/two/one/.  

- erik



Re: [9fans] syscall 53

2014-05-17 Thread erik quanstrom
On Sat May 17 09:55:16 EDT 2014, puta2001-...@yahoo.com wrote:

 
 
 Thanks, tried to compile kernel with no luck cause of the same syscall 53. 
 Was postponing some kind of dump file system until it finally got me :) webfs 
 needs 53,  so no internet. Will load some linux and copy kernels into 9fat. 
 thanks

if you follow the directions i sent, i think you should be able to rejuvenate
all your binaries.  then it will not be necessary to update the kernel
immediately.  essentially you will be bootstrapping a downgrade.

you will need to update /sys/src/libc/9syscall/sys.h to build a new kernel:

/n/sources/plan9/sys/src/libc/9syscall/sys.h:49,52 - /sys/src/libc/9syscall/sys.h:49,51
  #define   PREAD   50
  #define   PWRITE  51
  #define   TSEMACQUIRE 52
- #define NSEC  53

- erik



Re: [9fans] syscall 53

2014-05-17 Thread erik quanstrom
On Sat May 17 15:16:03 EDT 2014, puta2001-...@yahoo.com wrote:
 
 p.s.  Caps lock is not working.  Also copying in 9fat directory shifts
 file time current time+3 hours, even +6 hours if renaming (mv).  My
 timezone is +3 GMT.  Its native plan9 on ibm t42.

the standard plan 9 key map maps caps lock to ctl.  you can change
the mapping with this script

#!/bin/rc
kval=0xf862
if(~ $1 on)
	kval=0xf864
>/dev/kbmap echo 0 0x3a $kval

if you have a caps lock led, it won't work with sources.  i added support
to 9atom though for both usb and ps/2 keyboards.

the other bit is all fat's fault for keeping times relative to local time,
not gmt.

- erik



Re: [9fans] RaspberryPi, monitor energy saving

2014-05-16 Thread erik quanstrom
On Fri May 16 04:47:03 EDT 2014, st...@quintile.net wrote:

 i believe that this works for vga attached monitors, vesa says that when the 
 clocks
 disappear on the sync the monitor should shutdown.
 
 the raspberry pi uses hdmi and also it doesn't use a vesa bios, it has a gpu 
 bios
 which does a similar job but is not standardised, and, though it is 
 documented,
 it can be tough to use (for me at least).
 
 i am happy to be contradicted on any of this of couse.

i don't have any hdmi monitors, but the vga monitors i do have
will power down when connected to the pi.  i can't check my
source against sources right now since i can't seem to connect.

- erik



Re: [9fans] RaspberryPi, monitor energy saving

2014-05-16 Thread erik quanstrom
On Fri May 16 08:33:41 EDT 2014, st...@quintile.net wrote:
 Mmm, that feels like good and bad news.
 
 I know richard did what he could to shut down the screen when
 its idle for a while so that seems to do the right thing with vga
 monitors, but I guess I do need CEC.
 
 Oh well, time for more digging.

is a hdmi-vga connector too gruesome, or unworkable?

- erik



Re: [9fans] Bread + note - loss

2014-05-16 Thread erik quanstrom
On Fri May 16 15:26:28 EDT 2014, cinap_len...@felloff.net wrote:
 btw, whats the program that gets hit by alarm notes but wants to
 continue with Bread()?

there's one you wrote, but more on that offline.

- erik



Re: [9fans] linking a program to run at a high address

2014-05-15 Thread erik quanstrom
On Thu May 15 15:03:10 EDT 2014, rminn...@gmail.com wrote:
 I've done this, and I've forgotten how. I need to tell 6l to link a
 program to run at
 
 0x7f00
 
 I've tried various combos of -T, -R, and -D and am failing to get the
 right result ... any hints to revive my poor memory would be welcome.

if you're talking about amd64, i don't think you did.  unless the high
address was a sign-extended 32-bit value. it's a limitation of the 
architecture.  

i suppose you could if the program were RIP-relative, but 6l doesn't do that.

- erik



Re: [9fans] linking a program to run at a high address

2014-05-15 Thread erik quanstrom
On Thu May 15 15:19:39 EDT 2014, cinap_len...@felloff.net wrote:
 that wont work for a.out userspace binary. the kernel loads
 the text segment on fixed base address UTZERO. in the a.out
 header are just longs with the sizes of the segments. theres
 an entry field but it doesnt change where the kernel puts the
 text segment.
 
 but you probably do not try to produce an a.out?

there is a provision for a 64-bit address in the extended a.out header.  

the problem is the amd64 architecture.  ron actually pointed this out to me
way back, when i thought it would be neater to load the kernel lower
than 0xf011 to allow the kernel to map more than 256mb
of memory, but that's not possible.  if using absolute addressing, the kernel
needs to load at a sign extended virtual address, or below 4g.  if the kernel
were rip-relative, i believe it could be just about anywhere in the virtual
address space, but i haven't tried this and i may have missed a wherefore
in the intel manual.
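
the constraint as stated above (below 4g, or a sign-extended 32-bit
value) can be written down as a small check.  this is only a sketch of
that rule, assuming 32-bit absolute operands are the only form the
linker emits; absaddrok is a made-up name and nothing here comes from 6l.

```c
#include <stdint.h>

/*
 * nonzero if addr can be named by a 32-bit absolute operand:
 * either it is below 4g (the upper 32 bits are zero), or it is in
 * the top 2g of the address space (the upper 33 bits are all ones,
 * i.e. a negative 32-bit value sign-extended to 64 bits).
 */
int
absaddrok(uint64_t addr)
{
	return addr < (1ULL<<32) || addr >= 0xffffffff80000000ULL;
}
```

anything in the large middle of the 64-bit space fails the check, which
is why a link address there would need rip-relative code.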

it turns out that it's just as easy to load the data at KSEG2, so that's what
is done.  in fact, perhaps KZERO should be moved up to 2⁶⁴-64MB.  then
all the data could be in KSEG2.  clean, if a little unconventional.

- erik



Re: [9fans] Bread + note - loss

2014-05-14 Thread erik quanstrom
On Mon May 12 12:42:21 EDT 2014, cinap_len...@felloff.net wrote:
 why? if the program doesnt handle the note, then it shouldnt matter if it
 clunks the biobuf or not as it will be exited by the default handler.

you're right.

- erik



[9fans] ainc() 386/amd64 differences

2014-05-14 Thread erik quanstrom
for 386, libc has this definition for ainc

TEXT ainc(SB), $0	/* int ainc(int *); */
	MOVL	addr+0(FP), BX
ainclp:
	MOVL	(BX), AX
	MOVL	AX, CX
	INCL	CX
	LOCK
	BYTE	$0x0F; BYTE $0xB1; BYTE $0x0B	/* CMPXCHGL CX, (BX) */
	JNZ	ainclp
	MOVL	CX, AX
	RET

the amd64 kernel has had this definition (the pc kernel doesn't define ainc)

TEXT ainc(SB), 1, $-4
	MOVL	$1, AX
	LOCK; XADDL AX, (RARG)
	ADDL	$1, AX		/* overflow if -ve or 0 */
	JGT	_return
_trap:
	XORQ	BX, BX
	MOVQ	(BX), BX	/* over under sideways down */
_return:
	RET

these are substantially different in two ways.
- the first is not wait free.  the second may be wait free.
- the second is geared toward reference counting, and will
trap instead of wrapping.  it can't be used for generating
a unique sequence.
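
for readers without the assembler handy, the two strategies look roughly
like this in c11 atomics.  this is a portable sketch, not the plan 9
code: ainc_cas and ainc_xadd are made-up names, the counter is unsigned
so that wrapping is well defined in c, and whether the fetch-and-add
really is wait free depends on the hardware.

```c
#include <stdatomic.h>

/*
 * compare-and-swap loop, in the spirit of the 386 libc version.
 * if another thread changes *p between the load and the exchange,
 * the exchange fails and the loop tries again, so there is no
 * bound on the number of iterations.
 */
unsigned
ainc_cas(atomic_uint *p)
{
	unsigned old, new;

	old = atomic_load(p);
	do
		new = old + 1;
	while(!atomic_compare_exchange_weak(p, &old, new));	/* failure reloads old */
	return new;
}

/*
 * fetch-and-add, in the spirit of the LOCK XADDL version.
 * a single atomic operation with no retry loop in software.
 */
unsigned
ainc_xadd(atomic_uint *p)
{
	return atomic_fetch_add(p, 1) + 1;
}
```

both return the incremented value; neither traps, so either could be
used to hand out a unique sequence.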

i'd like to see the amd64 kernel version replace incref, and
this version of ainc

TEXT ainc(SB), 1, $-4
TEXT ainc32(SB), 1, $-4
	MOVL	$1, AX
	LOCK; XADDL AX, (RARG)
	INCL	AX
	RET

what does the list think?

- erik



Re: [9fans] ainc() 386/amd64 differences

2014-05-14 Thread erik quanstrom
 int
 incref(Ref *r)
 {
 	int x;
 
 	x = ainc(&r->ref);
 	if(x <= 0)
 		panic("incref pc=%#p", getcallerpc(&r));
 	return x;
 }

ah, yes.  i'd not remembered this nice implementation.

then your ainc is guard-free?  and your Ref is struct Ref {int ref;}?
also, did you decide that any reuse of the ref lock is already buggy and needs
no further review?  that's the bit i got bogged down on.
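
to make the question concrete, here is how a guard-free increment plus a
check in incref might look outside the kernel.  it is a sketch under
assumptions: c11 atomics stand in for ainc, abort() stands in for panic,
and Ref is assumed to hold nothing but the count.  incref_sketch is a
made-up name.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct Ref Ref;
struct Ref {
	atomic_int ref;		/* assumed: the count and nothing else, no lock */
};

/*
 * increment without any guard in the atomic step, then check the
 * result in c.  the test on the old value is the same as testing
 * new <= 0 on a machine where the add wraps, but avoids signed
 * overflow in the c expression.
 */
int
incref_sketch(Ref *r)
{
	int old;

	old = atomic_fetch_add(&r->ref, 1);
	if(old < 0 || old == INT_MAX){
		fprintf(stderr, "incref: bad count after %d\n", old);
		abort();	/* stand-in for panic */
	}
	return old + 1;
}
```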

- erik



Re: [9fans] ainc() 386/amd64 differences

2014-05-14 Thread erik quanstrom
On Wed May 14 17:27:58 EDT 2014, quans...@quanstro.net wrote:
  int
  incref(Ref *r)
  {
  	int x;
  
  	x = ainc(&r->ref);
  	if(x <= 0)
  		panic("incref pc=%#p", getcallerpc(&r));
  	return x;
  }
 
 ah, yes.  i'd not remembered this nice implementation.
 
 then your ainc is guard-free?  and your Ref is struct Ref {int ref;}?
 also, did you decide that any reuse of the ref lock is already buggy and needs
 no further review?  that's the bit i got bogged down on.

i've rebooted my kernel with this change, and it appears to be solid.
still a bit concerned about additional consistency forced by the prior
incref.

- erik



Re: [9fans] arm fun

2014-05-13 Thread erik quanstrom
 When I’d try and kill it, there’d be a likely chance that rc
 would also get the same Semacquire deadlock.  This can also be seen
 using broke to try and prune dead dns processes:
 
   dream% acid 158
   /proc/158/text:arm plan 9 executable
   /sys/lib/acid/port
   /sys/lib/acid/arm
   acid: stk()
   semacquire()+0xc /sys/src/libc/9syscall/semacquire.s:6
   lock(l=0x17104)+0x20 /sys/src/libc/port/lock.c:10
   plock()+0x8 /sys/src/libc/port/malloc.c:80
   poolalloc(p=0x18a8c,n=0x10)+0xc /sys/src/libc/port/pool.c:1223
   mallocz(size=0x8,clr=0x1)+0x18 /sys/src/libc/port/malloc.c:221
   Malloc()+0x8 /sys/src/cmd/rc/plan9.c:624
   emalloc(n=0x8)+0x4 /sys/src/cmd/rc/subr.c:9
   newword(wd=0x18e4e,next=0x202d0)+0x8 /sys/src/cmd/rc/exec.c:33
   pushword(wd=0x18e4e)+0x40 /sys/src/cmd/rc/exec.c:44
   execforkexec()+0x34 /sys/src/cmd/rc/havefork.c:223
   Xsimple()+0x170 /sys/src/cmd/rc/simple.c:62
   main(argv=0x5f94,argc=0x2)+0x320 /sys/src/cmd/rc/exec.c:184
   _main+0x28 /sys/src/libc/arm/main9.s:19
   acid: 

image cache strikes again?

- erik



Re: [9fans] arm fun

2014-05-13 Thread Erik Quanstrom
the lock is in the bss.  maybe the wrong page gets accessed after fork.

Charles Forsyth charles.fors...@gmail.com wrote:

I've got one of those that was fine last time I tried it. I'll try it in the 
morning.

I wonder whether the change of lock to use semacquire instead of tas doesn't 
work well on the (that) ARM.

It seems a strange coincidence that it always fails there.





[9fans] Bread + note - loss

2014-05-12 Thread erik quanstrom
in looking at a particular situation with Bread, i noticed that it,
Bgetc, and Bgetrune differ from Brdstr in what they do when read
returns a count <= 0.  Brdstr just returns nil.  Bread sets Binactive.

this wouldn't matter if there were not two fundamentally different
types of notes: alarms and everything else.  (from this perspective
it seems a shame that plan 9 doesn't allow syscalls to be restarted.)

for the interrupted case, this leaves Bread with a different, undocumented
(and bizarre) recovery strategy.  a Bseek(b, 0, 1) will rejuvenate the
Biobuf for Bread, but one must carefully paste the interrupted slop
together with a second response if one wishes to recover from an
interrupted read.

so, boo.  this seems like a real painful corner case.  and i don't see
any easy way out, unless bio were modified to allow read and write
to be replaced with indirect function calls.  this might be very interesting
for threaded applications.

crazy?

- erik



Re: [9fans] Bread + note - loss

2014-05-12 Thread erik quanstrom
On Mon May 12 12:19:47 EDT 2014, cinap_len...@felloff.net wrote:
 why not check the error string in Bread() and see if its interrupted
 and in that case, dont inactivate the stream? or are i'm missing
 something?

that's an excellent suggestion.  i thought about it before posting, but
sort of discarded that line of thinking.  i think the difficulty might
be in disturbing programs that depend on this behavior.

- erik



Re: [9fans] Bread + note - loss

2014-05-12 Thread erik quanstrom
i should elaborate.  the case where any error or interrupt looks like Beof
seems like the right thing for any program that looks like a filter.  this
is the majority of programs.  and this is the current behavior.  i wouldn't
want to make the simple case tricky.  but the hard case should also not
be impossible or impossibly tricky, either.

the suggestion of having Biobuf.^(read|write) skirts this issue by allowing
a sophisticated program to interrupt itself and recover gracefully, without
letting Bio in on the joke.
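
a rough sketch of the idea, with every name made up: Rbuf is a toy
stand-in for Biobuf, not the bio interface.  the buffer layer keeps its
simple rule (any error ends the stream), while a program that cares
supplies a read function that absorbs the interrupt before the buffer
layer ever sees it.

```c
typedef struct Rbuf Rbuf;
struct Rbuf {
	long (*rd)(void *arg, void *buf, long n);	/* replaceable read */
	void *arg;
	int inactive;	/* set once rd reports an error */
};

/* buffer layer: any error from rd ends the stream, as filters expect */
long
rbread(Rbuf *b, void *buf, long n)
{
	long r;

	if(b->inactive)
		return -1;
	r = b->rd(b->arg, buf, n);
	if(r < 0)
		b->inactive = 1;
	return r;
}

/* toy source: the first read is "interrupted", later reads deliver one byte */
typedef struct Flaky Flaky;
struct Flaky {
	int calls;	/* how many times the source has been read */
	int intr;	/* set when the last failure was an interrupt */
};

long
flakyread(void *arg, void *buf, long n)
{
	Flaky *f = arg;

	f->intr = 0;
	if(f->calls++ == 0){
		f->intr = 1;	/* pretend a note arrived mid-read */
		return -1;
	}
	if(n < 1)
		return 0;
	*(char*)buf = 'x';
	return 1;
}

/* the sophisticated program's read: retry interrupts, pass other errors up */
long
retryread(void *arg, void *buf, long n)
{
	Flaky *f = arg;
	long r;

	do
		r = flakyread(arg, buf, n);
	while(r < 0 && f->intr);
	return r;
}
```

with rd set to flakyread the first interrupt makes the buffer inactive;
with rd set to retryread the same source reads cleanly and the buffer
layer never learns there was an interrupt.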

- erik



Re: [9fans] DNS failures

2014-05-11 Thread erik quanstrom
 It is likely that I have a consistency error in the database:
 something along these lines, but not very clearly, was reported when I
 tried to do a zone transfer from Plan 9 to NetBSD; I have not been
 able to track the cause down.

yes, there clearly is junk in the in-memory database.  that line is

assert(rp->magic == RRmagic && rp->cached);

i would recommend using acid to see which one of these is false.
dumping the whole rp would make sense.

- erik



Re: [9fans] vx32 compilation for osx

2014-05-10 Thread erik quanstrom
 9vx for osx is for i386

9vx depends on 386 features.  it does
not extend to amd64.

- erik



Re: [9fans] vx32 compilation for osx

2014-05-10 Thread erik quanstrom
On Sat May 10 10:04:09 EDT 2014, ara...@mgk.ro wrote:
 It's easy to make it use clang directly (instead of gcc wrapper) and
 compile it in 32-bit mode, the larger issue is that it uses an
 obsolete devdraw implementation that doesn't compile in Mavericks any
 more...

are you sure that there are (full) 32-bit apis for cocoa?  the even larger
issue is that 9vx may be completely unsupportable with mavericks.
help me old by 10 8 krufted executable, you're my only hope.

- erik


