Re: fix seekdir(3)

2013-11-04 Thread Philip Guenther
On Sat, Nov 2, 2013 at 5:48 PM, Ingo Schwarze schwa...@usta.de wrote:
 Here is an updated patch which now works correctly with Otto's
 regression test, with the new test i just committed, and with
 the test from the Perl test suite Andrew pointed out, even with
 threads enabled.  It also survived quite some manual testing.
...
 Comments?  Tests?  OKs?
...
  void
  seekdir(DIR *dirp, long loc)
  {
 +       struct dirent *dp;
 +
 +       /*
 +        * First check whether the directory entry to seek for
 +        * is still buffered in the directory structure in memory.
 +        */
 +
         _MUTEX_LOCK(&dirp->dd_lock);
 -       __seekdir(dirp, loc);
 +       dp = (struct dirent *)dirp->dd_buf;
 +       if (dirp->dd_size > 0 && dp->d_off <= loc) {
 +               for (dirp->dd_loc = 0;
 +                    dirp->dd_loc < dirp->dd_size;
 +                    dirp->dd_loc += dp->d_reclen) {
 +                       dp = (struct dirent *)(dirp->dd_buf + dirp->dd_loc);
 +                       if (dp->d_off < loc)
 +                               continue;

This diff assumes that the d_off values for directory entries are
monotonically increasing as you advance through the directory.  That
is not guaranteed and is false for our implementation for some
filesystems.  The d_off values for NFS come straight from the NFS
server, and the NFS RFCs place no requirement on the cookies returned
by the server other than that the zero cookie indicates the first
entry.  (An OpenBSD NFS server returns cookies from the underlying
filesystem, so this will be difficult to see on a normal system.)

The current tmpfs code will also cause problems for this diff: the
d_off values for tmpfs are practically random, being derived from
kernel pointers returned by malloc(9).  That will need to change, as
kernel pointer values shouldn't be exposed to userland like that, but
it should serve as a way to confirm that this behavior causes problems
for the optimized seekdir().

One possibility is for the kernel to tell userland whether d_off
values are sure to be monotonically increasing for the opened
directory and then have userland only use the optimization when that's
the case.  We could add a pathconf/fpathconf name to get that
information: fpathconf(fd, _PC_DIRECTORY_MONOTONIC_OFFSETS) anyone?


Philip Guenther



Re: fix seekdir(3)

2013-11-04 Thread Ingo Schwarze
Hi Philip,

thanks for looking at this and for your insight.  The number of aspects
to keep in mind about this patch is growing, so we have to disentangle them.

I will send a minimal one-line patch to just fix the bug and do nothing
else.  We should get that one in quickly.
That would also be a candidate for -stable, i think.
I hope to come round to that tonight.

Then i will send two cleanup patches to remove useless stuff
and put the code into the right place, not changing any functionality.
That will make the cleanup easier to review.

Finally, we can work out how to do the optimization.
Probably, that will naturally factorize into two steps:

 (1) Use the information available in the userland buffer
 to avoid the getdents(2) syscall, *without* assuming
 monotonicity.

 (2) Further optimize searching when monotonicity is available.
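
Step (1) could be sketched roughly as follows. This is only an illustration, not the patch under discussion: struct fake_dirent, find_buffered() and demo_lookup() are hypothetical names, and the sketch assumes that d_off holds the position of the entry following the record, which is how I understand the OpenBSD dirent layout. The point is that every buffered record is compared for equality, so the order of the cookies does not matter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct dirent. */
struct fake_dirent {
	long		d_off;		/* position of the entry after this one */
	unsigned short	d_reclen;	/* length of this record in bytes */
	char		d_name[14];
};

/*
 * Scan every record in buf[0..size) for one whose d_off equals loc.
 * Returns the buffer offset of the record following the match, or -1
 * if loc is not buffered and a real seek would be needed.
 */
long
find_buffered(const char *buf, size_t size, long loc)
{
	struct fake_dirent de;
	size_t pos;

	assert(buf != NULL);
	for (pos = 0; pos + sizeof(de) <= size; pos += de.d_reclen) {
		memcpy(&de, buf + pos, sizeof(de));	/* sidestep alignment */
		if (de.d_reclen == 0)
			break;				/* corrupt record */
		if (de.d_off == loc)
			return ((long)(pos + de.d_reclen));
	}
	return (-1);
}

/* Build three records whose cookies are deliberately not sorted. */
long
demo_lookup(long loc)
{
	static const long cookies[3] = { 900, 17, 512 };
	struct fake_dirent e[3];
	size_t i;

	memset(e, 0, sizeof(e));
	for (i = 0; i < 3; i++) {
		e[i].d_off = cookies[i];
		e[i].d_reclen = sizeof(e[i]);
	}
	return (find_buffered((const char *)e, sizeof(e), loc));
}
```

A lookup that misses (return value -1) would fall back to lseek(2) plus getdents(2), so the result stays correct even when the cookies are effectively random.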

Yours,
  Ingo


Philip Guenther wrote on Mon, Nov 04, 2013 at 12:26:34AM -0800:
 On Sat, Nov 2, 2013 at 5:48 PM, Ingo Schwarze schwa...@usta.de wrote:

 Here is an updated patch which now works correctly with Otto's
 regression test, with the new test i just committed, and with
 the test from the Perl test suite Andrew pointed out, even with
 threads enabled.  It also survived quite some manual testing.
 ...
 Comments?  Tests?  OKs?
 ...
  void
  seekdir(DIR *dirp, long loc)
  {
 +       struct dirent *dp;
 +
 +       /*
 +        * First check whether the directory entry to seek for
 +        * is still buffered in the directory structure in memory.
 +        */
 +
         _MUTEX_LOCK(&dirp->dd_lock);
 -       __seekdir(dirp, loc);
 +       dp = (struct dirent *)dirp->dd_buf;
 +       if (dirp->dd_size > 0 && dp->d_off <= loc) {
 +               for (dirp->dd_loc = 0;
 +                    dirp->dd_loc < dirp->dd_size;
 +                    dirp->dd_loc += dp->d_reclen) {
 +                       dp = (struct dirent *)(dirp->dd_buf + dirp->dd_loc);
 +                       if (dp->d_off < loc)
 +                               continue;

 This diff assumes that the d_off values for directory entries are
 monotonically increasing as you advance through the directory.  That
 is not guaranteed and is false for our implementation for some
 filesystems.  The d_off values for NFS come straight from the NFS
 server, and the NFS RFCs place no requirement on the cookies returned
 by the server other than that the zero cookie indicates the first
 entry.  (An OpenBSD NFS server returns cookies from the underlying
 filesystem, so this will be difficult to see on a normal system.)
 
 The current tmpfs code will also cause problems for this diff: the
 d_off values for tmpfs are practically random, being derived from
 kernel pointers returned by malloc(9).  That will need to change, as
 kernel pointer values shouldn't be exposed to userland like that, but
 it should serve as a way to confirm that this behavior causes problems
 for the optimized seekdir().
 
 One possibility is for the kernel to tell userland whether d_off
 values are sure to be monotonically increasing for the opened
 directory and then have userland only use the optimization when that's
 the case.  We could add a pathconf/fpathconf name to get that
 information: fpathconf(fd, _PC_DIRECTORY_MONOTONIC_OFFSETS) anyone?



Re: Fixing an LLVM warning in the i2o code

2013-11-04 Thread Kenneth R Westerback
On Sun, Nov 03, 2013 at 10:51:43PM -0500, Brad Smith wrote:
 LLVM errors out on the i2o code with the following warning..
 
 ../../../../dev/i2o/iop.c:2399:42: error: comparison of unsigned expression
 < 0 is always false [-Werror,-Wtautological-compare]
             pt->pt_nbufs < 0 || pt->pt_replylen < 0 ||
                                 ~~~~~~~~~~~~~~~ ^ ~
 
 Looking at the i2o code it looks as if the pt_replylen field isn't set 
 anywhere and
 doesn't do anything. It looks like it can be garbage collected.
 
 Comments? OK?
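
For readers unfamiliar with the warning, here is a minimal sketch of why the check is dead code. It is not taken from iop.c: REPLY_MAX, len_from_int() and replylen_ok() are hypothetical names used only for illustration. The diff shows pt_replylen declared as size_t, and an unsigned value can never compare below zero, so a negative length supplied by a caller would show up as a very large value instead.

```c
#include <stddef.h>

/* Hypothetical limit, chosen only for illustration. */
#define REPLY_MAX	4096

/* Convert an int to size_t the way an implicit conversion would. */
size_t
len_from_int(int n)
{
	return ((size_t)n);
}

/*
 * "len < 0" is always false for a size_t, which is what clang reports
 * under -Wtautological-compare.  A negative int wraps to a huge
 * unsigned value, so only an upper bound can reject it.
 */
int
replylen_ok(size_t len)
{
	return (len <= REPLY_MAX);
}
```

With this sketch, len_from_int(-1) yields SIZE_MAX, which the upper-bound test rejects where a "less than zero" test never could.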

When WikiPedia describes i2o as a defunct computer input/output
specification whose SIG was open-source hostile (and was disbanded
in 2000), and which was implemented on only a few server class
machines, it is perhaps past time we lightened the kernel a bit.
:-)

 Ken

 
 
 Index: iop.c
 ===================================================================
 RCS file: /home/cvs/src/sys/dev/i2o/iop.c,v
 retrieving revision 1.38
 diff -u -p -r1.38 iop.c
 --- iop.c       30 May 2013 16:15:02 -0000      1.38
 +++ iop.c       4 Nov 2013 03:13:45 -0000
 @@ -2396,8 +2396,9 @@ iop_passthrough(struct iop_softc *sc, st
             pt->pt_msglen > (letoh16(sc->sc_status.inboundmframesize) << 2) ||
             pt->pt_msglen < sizeof(struct i2o_msg) ||
             pt->pt_nbufs > IOP_MAX_MSG_XFERS ||
 -           pt->pt_nbufs < 0 || pt->pt_replylen < 0 ||
 -           pt->pt_timo < 1000 || pt->pt_timo > 5*60*1000)
 +           pt->pt_nbufs < 0 ||
 +           pt->pt_timo < 1000 ||
 +           pt->pt_timo > 5*60*1000)
                 return (EINVAL);
  
         for (i = 0; i < pt->pt_nbufs; i++)
 @@ -2446,8 +2447,6 @@ iop_passthrough(struct iop_softc *sc, st
         i = (letoh32(im->im_rb->msgflags) >> 14) & ~3;
         if (i > IOP_MAX_MSG_SIZE)
                 i = IOP_MAX_MSG_SIZE;
 -       if (i > pt->pt_replylen)
 -               i = pt->pt_replylen;
         if ((rv = copyout(im->im_rb, pt->pt_reply, i)) != 0)
                 goto bad;
  
 Index: iopio.h
 ===================================================================
 RCS file: /home/cvs/src/sys/dev/i2o/iopio.h,v
 retrieving revision 1.2
 diff -u -p -r1.2 iopio.h
 --- iopio.h     26 Jun 2008 05:42:15 -0000      1.2
 +++ iopio.h     4 Nov 2013 03:14:02 -0000
 @@ -57,7 +57,6 @@ struct ioppt {
         void    *pt_msg;        /* pointer to message buffer */
         size_t  pt_msglen;      /* message buffer size in bytes */
         void    *pt_reply;      /* pointer to reply buffer */
 -       size_t  pt_replylen;    /* reply buffer size in bytes */
         int     pt_timo;        /* completion timeout in ms */
         int     pt_nbufs;       /* number of transfers */
         struct  ioppt_buf pt_bufs[IOP_MAX_MSG_XFERS]; /* transfers */
 
 -- 
 This message has been scanned for viruses and
 dangerous content by MailScanner, and is
 believed to be clean.
 



Re: Fixing an LLVM warning in the i2o code

2013-11-04 Thread David Gwynne

On 5 Nov 2013, at 12:40 am, Kenneth R Westerback kwesterb...@rogers.com wrote:

 On Sun, Nov 03, 2013 at 10:51:43PM -0500, Brad Smith wrote:
 LLVM errors out on the i2o code with the following warning..
 
 ../../../../dev/i2o/iop.c:2399:42: error: comparison of unsigned expression
 < 0 is always false [-Werror,-Wtautological-compare]
             pt->pt_nbufs < 0 || pt->pt_replylen < 0 ||
                                 ~~~~~~~~~~~~~~~ ^ ~
 
 Looking at the i2o code it looks as if the pt_replylen field isn't set 
 anywhere and
 doesn't do anything. It looks like it can be garbage collected.
 
 Comments? OK?
 
 When WikiPedia describes i2o as a defunct computer input/output
 specification whose SIG was open-source hostile (and was disbanded
 in 2000), and which was implemented on only a few server class
 machines, it is perhaps past time we lightened the kernel a bit.
 :-)

were there any recent edits to that wikipedia page by a certain mr westerback?

 




Re: Fixing an LLVM warning in the i2o code

2013-11-04 Thread Janne Johansson
Could be the FBI code we were looking for!



2013/11/4 David Gwynne l...@animata.net


 On 5 Nov 2013, at 12:40 am, Kenneth R Westerback kwesterb...@rogers.com
 wrote:

  On Sun, Nov 03, 2013 at 10:51:43PM -0500, Brad Smith wrote:
  LLVM errors out on the i2o code with the following warning..
 
  ../../../../dev/i2o/iop.c:2399:42: error: comparison of unsigned
 expression < 0 is always false [-Werror,-Wtautological-compare]
             pt->pt_nbufs < 0 || pt->pt_replylen < 0 ||
                                 ~~~~~~~~~~~~~~~ ^ ~
 
  Looking at the i2o code it looks as if the pt_replylen field isn't set
 anywhere and
  doesn't do anything. It looks like it can be garbage collected.
 
  Comments? OK?
 
  When WikiPedia describes i2o as a defunct computer input/output
  specification whose SIG was open-source hostile (and was disbanded
  in 2000), and which was implemented on only a few server class
  machines, it is perhaps past time we lightened the kernel a bit.
  :-)

 were there any recent edits to that wikipedia page by a certain mr
 westerback?

 





-- 
May the most significant bit of your life be positive.


Re: Re : Re: Improve routing functions

2013-11-04 Thread Adam Thompson

On 13-11-03 02:27 PM, Loïc BLOT wrote:

then to explain my draft here is my own configuration, and why it could
be useful to set custom priorities:
[...]
Without the possibility to change the priorities (and dynamically is
better than recompile the kernel and change constant values, it would be
a great function to everybody want), it's impossible to solve this
routing loop (i have patched ospfd to refuse adding some specific routes
from specific hosts but it's not a proper solution, whereas it
worked...).


FWIW, I agree with Loïc on this; a router administrator should be able 
to have fine-grained control over route preference.


I've run into this in the past, where in a strange topology I wound up 
with a router (not OpenBSD) learning the same route via EIGRP, OSPF and 
BGP - each with a different next-hop.  Only one of those used the 
preferred (low-latency, high-bandwidth) path, I don't recall which, but 
we did have to manually adjust the local preference of one of the 
protocols in order to make it win.


Policy routing would not (could not) have solved the problem, since it 
was all dynamic routes.


I'm about to re-build something similar, actually, using BGP and OSPF 
(or possibly BGP and static routes), where I will need the ability to 
control which routing protocol takes precedence.  I'll have a route 
learned via BGP that I want used first, and a route learned by OSPF that 
I want used only if the BGP route vanishes. (It's a partial overlay 
network, with backup link over VPN to maintain control functions in the 
case of an outage on the main link which talks BGP to its peer.)


However... in reading route(8), I see the -priority flag exists. 
Delving a bit deeper into .../net/route.h, I see that the routing table 
*already has* exactly the support I need (and that I think Loïc needs) 
in rt_priority and the associated macros.


The change I think we're both asking for is that in
.../usr/sbin/bgpd/kroute.c, on line 505 (5.4-RELEASE), where we see
kr->r.priority = RTP_BGP;, we need a way to override that value in
(presumably) bgpd.conf.  (Ditto for the IPv6 function.)


Similarly, in .../usr/sbin/ospfd/kroute.c:257, where we have
kn->r.priority = RTP_OSPF;, and presumably the same sort of thing in
routed(8), ripd(8), ospf6d(8), possibly even ldpd(8)...


I believe all the support we need is already in the kernel; what we lack
is a user-exposed knob to fiddle with those (for now) constant values.
I believe they should be default values, not constant values.
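
The "default, not constant" idea can be sketched like this. It is a rough illustration and not code from any of the daemons: struct daemon_conf, its fib_priority field, route_priority() and preferred() are hypothetical, and the RTP_* numbers are the values as I understand them from net/route.h, so treat them as assumptions. The sketch also assumes that a lower priority value means a more preferred route.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values; check net/route.h on the release in question. */
#define RTP_OSPF	32
#define RTP_BGP		48
#define RTP_MAX		63

/* Hypothetical per-daemon setting; 0 means "not configured". */
struct daemon_conf {
	int	fib_priority;
};

/* Return the configured priority, or the protocol default if unset. */
int
route_priority(const struct daemon_conf *conf, int proto_default)
{
	assert(conf != NULL);
	if (conf->fib_priority > 0 && conf->fib_priority <= RTP_MAX)
		return (conf->fib_priority);
	return (proto_default);
}

/* Pick the preferred of two priorities: the lower value wins. */
int
preferred(int a, int b)
{
	return (a <= b ? a : b);
}
```

Under these assumptions an unconfigured bgpd keeps RTP_BGP and loses to OSPF, while a configured value below RTP_OSPF would make the BGP route win, which is the behaviour asked for above.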


I believe it is possible to work around this problem currently with 
route change, but doing so is a very, very ugly idea (worse in Loïc's 
case than mine, I think).  You'd have to have a script/daemon of your 
own watching the output from route monitor and executing route 
change every time a route gets inserted with the wrong priority.  This 
leads to much duplication of routing logic, since the correct route is 
not always known a priori.


--
-Adam Thompson
 athom...@athompso.net



Re: Fixing an LLVM warning in the i2o code

2013-11-04 Thread Kenneth R Westerback
On Tue, Nov 05, 2013 at 02:24:22AM +1000, David Gwynne wrote:
 
 On 5 Nov 2013, at 12:40 am, Kenneth R Westerback kwesterb...@rogers.com 
 wrote:
 
  On Sun, Nov 03, 2013 at 10:51:43PM -0500, Brad Smith wrote:
  LLVM errors out on the i2o code with the following warning..
  
  ../../../../dev/i2o/iop.c:2399:42: error: comparison of unsigned
  expression < 0 is always false [-Werror,-Wtautological-compare]
             pt->pt_nbufs < 0 || pt->pt_replylen < 0 ||
                                 ~~~~~~~~~~~~~~~ ^ ~
  
  Looking at the i2o code it looks as if the pt_replylen field isn't set 
  anywhere and
  doesn't do anything. It looks like it can be garbage collected.
  
  Comments? OK?
  
  When WikiPedia describes i2o as a defunct computer input/output
  specification whose SIG was open-source hostile (and was disbanded
  in 2000), and which was implemented on only a few server class
  machines, it is perhaps past time we lightened the kernel a bit.
  :-)
 
 were there any recent edits to that wikipedia page by a certain mr
 westerback?
 

Were there? I thought I pushed the 'cancel' button. Odd.

 Ken

  



Re: fix seekdir(3)

2013-11-04 Thread Ingo Schwarze
Ingo Schwarze wrote on Mon, Nov 04, 2013 at 09:51:41AM +0100:

 I will send a minimal one-line patch to just fix the bug
 and do nothing else.  We should get that one in quickly.
 That would also be a candidate for -stable, i think.
 I hope to come round to that tonight.

Here it is.

This fixes both our own regression tests (Otto's and mine)
and the perl-5.18 op/threads-dirh.t test.

This is not particularly efficient, forcing getdents(2) for each
readdir(3) following seekdir(3), but at least it produces correct
results, for a start.

OK?
  Ingo


Index: telldir.c
===================================================================
RCS file: /cvs/src/lib/libc/gen/telldir.c,v
retrieving revision 1.15
diff -u -p -r1.15 telldir.c
--- telldir.c   16 Aug 2013 05:27:39 -0000      1.15
+++ telldir.c   4 Nov 2013 21:11:55 -0000
@@ -67,5 +67,6 @@ telldir(DIR *dirp)
 void
 __seekdir(DIR *dirp, long loc)
 {
+       dirp->dd_loc = 0;
        dirp->dd_curpos = lseek(dirp->dd_fd, loc, SEEK_SET);
 }
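
The behaviour this patch is meant to guarantee can be checked with a small round-trip test. The following is only a sketch loosely in the spirit of such a test, not one of the regression tests mentioned above; seekdir_roundtrip() is a hypothetical helper. It uses only the standard opendir(3), readdir(3), telldir(3), seekdir(3) and closedir(3) calls.

```c
#define _XOPEN_SOURCE 700	/* for telldir(3) and seekdir(3) */
#include <assert.h>
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/*
 * Read one entry, remember the position, read the next entry, seek
 * back, and read again.  Returns 1 if both reads from the saved
 * position yield the same name, 0 if they differ, and -1 if the
 * directory cannot be opened or has fewer than two entries.
 */
int
seekdir_roundtrip(const char *path)
{
	DIR *dirp;
	struct dirent *dp;
	char expected[NAME_MAX + 1];
	long pos;
	int same;

	assert(path != NULL);
	if ((dirp = opendir(path)) == NULL)
		return (-1);
	if (readdir(dirp) == NULL) {		/* skip the first entry */
		closedir(dirp);
		return (-1);
	}
	pos = telldir(dirp);			/* position of entry two */
	if ((dp = readdir(dirp)) == NULL) {
		closedir(dirp);
		return (-1);
	}
	snprintf(expected, sizeof(expected), "%s", dp->d_name);

	seekdir(dirp, pos);			/* go back to entry two */
	dp = readdir(dirp);
	same = (dp != NULL && strcmp(dp->d_name, expected) == 0);
	closedir(dirp);
	return (same);
}
```

A stale buffer offset after seekdir(3) would make the second read return a different entry, and the helper would then return 0.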



Re: Re : Re: Improve routing functions

2013-11-04 Thread Claudio Jeker
On Mon, Nov 04, 2013 at 10:36:39AM -0600, Adam Thompson wrote:
 On 13-11-03 02:27 PM, Loïc BLOT wrote:
 then to explain my draft here is my own configuration, and why it could
 be useful to set custom priorities:
 [...]
 Without the possibility to change the priorities (and dynamically is
 better than recompile the kernel and change constant values, it would be
 a great function to everybody want), it's impossible to solve this
 routing loop (i have patched ospfd to refuse adding some specific routes
 from specific hosts but it's not a proper solution, whereas it
 worked...).
 
 FWIW, I agree with Loïc on this; a router administrator should be
 able to have fine-grained control over route preference.
 
 I've run into this in the past, where in a strange topology I wound
 up with a router (not OpenBSD) learning the same route via EIGRP,
 OSPF and BGP - each with a different next-hop.  Only one of those
 used the preferred (low-latency, high-bandwidth) path, I don't
 recall which, but we did have to manually adjust the local
 preference of one of the protocols in order to make it win.
 
 Policy routing would not (could not) have solved the problem, since
 it was all dynamic routes.
 
 I'm about to re-build something similar, actually, using BGP and
 OSPF (or possibly BGP and static routes), where I will need the
 ability to control which routing protocol takes precedence.  I'll
 have a route learned via BGP that I want used first, and a route
 learned by OSPF that I want used only if the BGP route vanishes.
 (It's a partial overlay network, with backup link over VPN to
 maintain control functions in the case of an outage on the main link
 which talks BGP to its peer.)
 
 However... in reading route(8), I see the -priority flag exists.
 Delving a bit deeper into .../net/route.h, I see that the routing
 table *already has* exactly the support I need (and that I think
 Loïc needs) in rt_priority and the associated macros.
 
 The change I think we're both asking for is that in
 .../usr/sbin/bgpd/kroute.c, on line 505 (5.4-RELEASE), where we see
 kr->r.priority = RTP_BGP;, we need a way to override that value
 in (presumably) bgpd.conf.  (Ditto for the IPv6 function.)

Yes, this is the way to go. One thing to consider in some way is what you
want to do if somebody changes the prio and then reloads the config.
I think bgpd would need to do something similar to:
bgpctl fib decouple
change prio
bgpctl fib couple
 
 Similarly, in .../usr/sbin/ospfd/kroute.c:257, where we have
 kn->r.priority = RTP_OSPF;, and presumably the same sort of thing
 in routed(8), ripd(8), ospf6d(8), possibly even ldpd(8)...

ldpd is a special beast and does not need to be changed, but the others
should be. (routed is dead; ripd, ospfd and ospf6d all use a similar
kroute.c file that is also used by bgpd, so once one is done the others
should be easier.) ldpd does not insert new routes; it only extends them
with the MPLS info.

 I believe all the support we need is already in the kernel, what we
 lack is a user-exposed knob to fiddle with those (for now) constant
 values.  I believe they should be default values, not constant
 values.

Yes, there is one annoying bug left in the kernel that I will fix in the
near future. You can assume the kernel has all the knowledge you need.
 
 I believe it is possible to work around this problem currently with
 route change, but doing so is a very, very ugly idea (worse in
 Loïc's case than mine, I think).  You'd have to have a script/daemon
 of your own watching the output from route monitor and executing
 route change every time a route gets inserted with the wrong
 priority.  This leads to much duplication of routing logic, since
 the correct route is not always known a priori.

Please, someone not as overworked as me should just add the knobs to the
routing daemons to allow setting a different prio. For example just a
simple:
fib-priority 45
would work.

-- 
:wq Claudio



Re: fix seekdir(3)

2013-11-04 Thread Philip Guenther
On Monday, November 4, 2013, Ingo Schwarze wrote:

 This fixes both our own regression tests (Otto's and mine)
 and the perl-5.18 op/threads-dirh.t test.

 This is not particularly efficient, forcing getdents(2) for each
 readdir(3) following seekdir(3), but at least it produces correct
 results, for a start.

 OK?


Yep, ok guenther@


Re: Amd64 relocation R_X86_64_32S in a static lib

2013-11-04 Thread Philip Guenther
On Tue, 5 Nov 2013, Torbjorn Granlund wrote:
 I am working on getting the GMP bignum library to work better on 
 OpenBSD.
 
 With current GMP sources (GMP 5.0.x, 5.1.x, and development head) a 
 'fat' build will not work on amd64 under OpenBSD 5.3 and 5.4.  With 
 older versions of OpenBSD (I've tested 4.9, 5.0, 5.2) things work as
 expected.
 
 The problem is related to relocs, in particular R_X86_64_32S which we use
 in a GMP assembly file for 'fat' binaries.
 
 On the problem OpenBSD releases, a fat GMP build ends with this error:
 
 libtool: link: gcc -std=gnu99 -O2 -pedantic -fomit-frame-pointer -m64 -o 
 t-bswap t-bswap.o  ./.libs/libtests.a 
 /var/tmp/gmp-obj/hannahobsd64v53-stat-fat/.libs/libgmp.a ../.libs/libgmp.a
 /usr/bin/ld: 
 /var/tmp/gmp-obj/hannahobsd64v53-stat-fat/.libs/libgmp.a(fat_entry.o): 
 relocation R_X86_64_32S can not be used when making a shared object; 
 recompile with -fPIC
 /var/tmp/gmp-obj/hannahobsd64v53-stat-fat/.libs/libgmp.a: could not read 
 symbols: Bad value
 
 The error message is strange, considering that we certainly are not 
 making a shared object.  We are linking a plain object with a static 
 library.

Ah, but you are, sorta.  In OpenBSD 5.3, platforms where the compiler and
toolchain support was robust enough for it were switched to build PIE
objects and executables by default.  So yes, that object _is_ expected to
be position independent.  c.f. the gcc-local(1) manpage.

At least on OpenBSD, the preprocessor will define __PIC__ and __pic__ when
building both PIC _and_ PIE objects, so you can test those to determine
whether the ASM should use position-independent code sequences.

(If you _really_ need to know whether it's building PIE (and not PIC) you 
can check for __PIE__ or __pie__, but that's rarely useful.  Indeed, there 
are _no_ tests for those in the OpenBSD base tree right now.)
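
A minimal sketch of such a test, written in C rather than in GMP's assembly sources: needs_pic_sequences() and code_model() are hypothetical helpers, and only the four macro names come from the text above. The same #if conditions could guard alternative instruction sequences in a preprocessed assembly file.

```c
/* 1 if position-independent code sequences are required, else 0. */
int
needs_pic_sequences(void)
{
#if defined(__PIC__) || defined(__pic__)
	return (1);
#else
	return (0);
#endif
}

/* Describe how this translation unit was compiled. */
const char *
code_model(void)
{
#if defined(__PIE__) || defined(__pie__)
	return ("pie");		/* rarely worth distinguishing */
#elif defined(__PIC__) || defined(__pic__)
	return ("pic");
#else
	return ("non-pic");
#endif
}
```

Compiling the same file with and without -fPIC or -fPIE and calling these helpers is a quick way to see what a given toolchain defines by default.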

On at least some Linux distributions, PIE is used for system executables,
and you can build your own programs with -fpie -pie (check their docs) to
enable it for them as well.


 I consider this a major deviation from the established AMD64 ELF ABI. 
 It's your decision, and I will not try to make you change your minds. 
 But I need to understand the decision and how you intend to proceed.  
 Do you have a document describing the AMD64 OpenBSD ABI with which 
 developers could comply?

As I understand it, it's the standard amd64 PIE ABI.


 * It seems R_X86_64_64 works in non-PIC code such as in the GMP example,
   and while slower than R_X86_64_32S, GMP could use it as a workaround.
   But do you intend to keep that working, or do you intend to completely
   de-support any non-GOT data references?

R_X86_64_64 is safe for use in PIC/PIE code, though I would expect %rip 
relative addressing to be more efficient when it's applicable.


Philip Guenther