Re: Buffer cache made to use >32bit mem addresses (i.e. >~3GB support for the buffer cache) nowadays or planned soon?

2016-02-14 Thread Tinker

On 2016-02-15 10:15, Constantine A. Murenin wrote:
..

I think it got reverted by:

..

but I'm not an expert so would wait on confirmation by Bob Beck.



Yes, I think you are correct, and it was indeed reverted.

..

But it looks like the functions that were introduced in the above
commit are still WIP and don't actually flip anything yet:

http://bxr.su/o/sys/kern/vfs_bio.c#buf_flip_high

307  buf_flip_high(struct buf *bp)

..

313          /* XXX does nothing to buffer for now */

..

317  buf_flip_dma(struct buf *bp)

..

324          /* XXX does not flip buffer for now */



Thank you for clarifying. This is quite a big deal for anyone with lots
of disk IO and RAM. This will be #2 on my OpenBSD wishlist for the year.
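
(For anyone wanting to check or raise the relevant knob, something like the
following -- just an illustrative sketch, values and comments are mine:)

  $ sysctl kern.bufcachepercent       # percentage of RAM the buffer cache may use
  # sysctl kern.bufcachepercent=90    # raise it (as root); note that without the
                                      # flipper the cache is still limited to
                                      # DMA-reachable memory, per the thread above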



How complex is this to implement, and who would be able to do it?


May the means to donate or otherwise contribute come to me or someone else
this year. OpenBSD is the finest OS out there.




Re: Will Softraid RAID1 read from the fastest mirror/-s / supports user-specified device read priority order, nowadays? Takes broken disk out of use?

2016-02-14 Thread Nick Holland
On 02/13/16 11:49, Tinker wrote:
> Hi,
> 
> 1)
> http://www.openbsd.org/papers/asiabsdcon2010_softraid/softraid.pdf page 
> 3 "2.2 RAID 1" says that it reads "on a round-robin basis from all 
> active chunks", i.e. read operations are spread evenly across disks.
> 
> Since then did anyone implement selective reading based on experienced 
> read operation time, or a user-specified device read priority order?
> 
> 
> That would allow Softraid RAID1 based on 1 SSD mirror + 1 SSD mirror + 1 
> HDD mirror, which would give the best combination of IO performance and 
> data security OpenBSD would offer today.

I keep flip-flopping on the merits of this.
At one point, I was with you, thinking, "great idea!  Back an expensive,
fast disk with a cheap disk".

Currently, I'm thinking, "REALLY BAD IDEA".  Here's my logic:

There's no such thing as an "expensive disk" anymore.  A quick look
shows me that I can WALK INTO my local computer store and pick up a 2TB
SSD for under $1000US.  Now, that looks like a lot of money, and as a
life-long cheapskate, when I get to four digits, I'm expecting at least
two wheels and an engine.  But in the Big Picture?  No.  That's one heck
of a lot of stunningly fast storage for a reasonable chunk of change.

Thirty-four years ago when I started in this business, I was installing
10MB disks for $2000/ea as fast as we could get the parts (and at that
time, you could get a darned nice car for five of those drives, and a
new Corvette cost less than ten of them).  Now sure, the price has
dropped a whole lot since then, and my first reaction would be "What
does that have to do with anything?  I can buy 2TB disks for under $100,
that's a huge savings over the SSD!"  In raw dollars, sure.  Percentage?
 Sure.  In "value to business"?  I don't think so.  In 1982, people felt
the computers of the day were worth adding $2000 to in order to get a tiny amount
of "fast" storage to make their very few business apps run better.  No
question in their mind, it was worth it.  Now we do much more with our
computers and it costs much less.  The business value of our investment
should be much greater than it was in 1982.

And ignoring hardware, it is.  Companies drop thousands of dollars on
consulting and assistance and think nothing of it.  And in a major
computer project, a couple of $1000 disks barely show as a blip on the
budget.  Hey, I'm all about being a cheap bastard whenever possible, but
this just isn't a reasonable place to be cheap, and not somewhere I'd
suggest spending developer resources.


Also ... it's probably a bad idea for functional reasons.  You can't
just assume that "slower" is better than "nothing" -- very often, it's
indistinguishable from "nothing".  In many cases, computer systems that
perform below a certain speed are basically non-functional, as tasks can
pile up on them faster than they can produce results.  Anyone who has
dealt with an overloaded database server, mail server or firewall will
know what I'm saying here -- at a certain load, they go from "running
ok" to "death spiral", and they do it very quickly.

If you /need/ the speed of an SSD, you can justify the cost of a pair of
'em.  If you can't justify the cost, you are working with a really
unimportant environment, and you can either wait for two cheap
slow disks or skip the RAID entirely.

How fast do you need to get to your porn, anyway?

(now ... that being said, part of me would love a tmpfs / disk RAID1,
one that would come up degraded, and the disk would populate the RAM
disk, writes would go to both subsystems, reads would come from the RAM
disk once populated.  I could see this for some applications like CVS
repositories or source directories where things are "read mostly" and
typically smaller than a practical RAM size these days.  And as there are
still a few orders of magnitude more performance in a RAM disk than an
SSD, and that will likely remain true for a while, there are SOME
applications where this could be nice)
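
(A rough way to approximate the read side of that today is a memory
filesystem populated from the on-disk copy -- purely a sketch, with made-up
sizes and paths, and of course with no redundancy or automatic write-back:)

  # create a ~1GB memory filesystem (size in 512-byte sectors), mount point must exist
  mount_mfs -s 2097152 swap /fast
  # populate it from the on-disk copy; subsequent reads come from RAM
  cp -Rp /home/cvs/. /fast/
  # writes still have to be copied back to disk by hand -- this is NOT a RAID1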


> 2)
> Also if there's a read/write failure (or excessive time consumption for 
> a single operation, say 15 seconds), will Softraid RAID1 learn to take 
> the broken disk out of use?

As far as I am aware, Softraid (like most RAID systems, hw or sw) will
deactivate a drive which reports a failure.  Drives which go super slow
(i.e., always manage to get the data BEFORE the X'th retry at which they
would toss an error) never report an error back, so the drive never gets
deactivated.

Sound implausible?  Nope.  It happens.  It's frustrating as heck when it
happens to you, until you figure it out.  In fact, one key feature of
"enterprise" and "RAID" grade disks is that they hop off-line and throw an
error fast and early, to prevent this problem (some "NAS" grade disks may
do this too.  Or they may just see your credit limit hasn't been reached).

However, having done this for a looong time, and seen the problems from
both rapid-failure and "try and try" disks, I'll take the "try and try"
problem any 

Re: Buffer cache made to use >32bit mem addresses (i.e. >~3GB support for the buffer cache) nowadays or planned soon?

2016-02-14 Thread Constantine A. Murenin
On 14 February 2016 at 10:29, Karel Gardas  wrote:
> On Sat, Feb 13, 2016 at 9:39 PM, Stuart Henderson  
> wrote:
>> There was this commit, I don't *think* it got reverted.
>>
>>
>>
>> CVSROOT:        /cvs
>> Module name:    src
>> Changes by:     b...@cvs.openbsd.org    2013/06/11 13:01:20
>>
>> Modified files:
>> sys/kern   : kern_sysctl.c spec_vnops.c vfs_bio.c
>>  vfs_biomem.c vfs_vops.c
>> sys/sys: buf.h mount.h
>> sys/uvm: uvm_extern.h uvm_page.c
>> usr.bin/systat : iostat.c
>>
>> Log message:
>> High memory page flipping for the buffer cache.
>>
>> This change splits the buffer cache free lists into lists of dma reachable
>> buffers and high memory buffers based on the ranges returned by pmemrange.
>> Buffers move from dma to high memory as they age, but are flipped to dma
>> reachable memory if IO is needed to/from and high mem buffer. The total
>> amount of buffers  allocated is now bufcachepercent of both the dma and
>> the high memory region.
>>
>> This change allows the use of large buffer caches on amd64 using more than
>> 4 GB of memory
>>
>> ok tedu@ krw@ - testing by many.
>
> I think it got reverted by:
>
> commit ac77fb26761065b7f6031098e6a182cacfaf7437
> Author: beck 
> Date:   Tue Jul 9 15:37:43 2013 +
>
> back out the cache flipper temporarily to work out of tree.
> will come back soon.
> ok deraadt@
>
>
> but I'm not an expert so would wait on confirmation by Bob Beck.


Yes, I think you are correct, and it was indeed reverted.


Some parts have since been reimplemented and brought back by
http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/kern/vfs_bio.c#rev1.170
on 2015/07/19:

http://marc.info/?l=openbsd-cvs&m=143732292523715&w=2

> CVSROOT:        /cvs
> Module name:    src
> Changes by:     b...@cvs.openbsd.org    2015/07/19 10:21:11
>
> Modified files:
> sys/kern   : vfs_bio.c vfs_vops.c
> sys/sys: buf.h
>
> Log message:
> Use two 2q caches for the buffer cache, moving previously warm buffers from the
> first queue to the second.
> Mark the first queue as DMA in preparation for being able to use more memory
> by flipping. Flipper code currently only sets and clears the flag.
> ok tedu@ guenther@


But it looks like the functions that were introduced in the above
commit are still WIP and don't actually flip anything yet:

http://bxr.su/o/sys/kern/vfs_bio.c#buf_flip_high

307  buf_flip_high(struct buf *bp)
308  {
309          KASSERT(ISSET(bp->b_flags, B_BC));
310          KASSERT(ISSET(bp->b_flags, B_DMA));
311          KASSERT(bp->cache == DMA_CACHE);
312          CLR(bp->b_flags, B_DMA);
313          /* XXX does nothing to buffer for now */
314  }

http://bxr.su/o/sys/kern/vfs_bio.c#buf_flip_dma

317  buf_flip_dma(struct buf *bp)
318  {
319          KASSERT(ISSET(bp->b_flags, B_BC));
320          KASSERT(ISSET(bp->b_flags, B_BUSY));
321          if (!ISSET(bp->b_flags, B_DMA)) {
322                  KASSERT(bp->cache > DMA_CACHE);
323                  KASSERT(bp->cache < NUM_CACHES);
324                  /* XXX does not flip buffer for now */

Cheers,
Constantine.



Softraids not sandwichable? Re: Can I accelerate [...]

2016-02-14 Thread Tinker

Dear Karel,

Are you saying that softraids cannot be sandwiched as of today?

If they can't, roughly what kind of complexity would need to be solved?
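
For concreteness, by "sandwiched" I mean something like this (device names
are made up; it is the second step that apparently is not supported):

  bioctl -c 1 -l /dev/sd0a,/dev/sd1a softraid0    # RAID1 volume shows up as e.g. sd4
  # (disklabel sd4, then try to stack another discipline on top of it:)
  bioctl -c C -l /dev/sd4a softraid0              # softraid crypto on top of softraid RAID1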

Thanks,
Tinker

On 2016-01-31 19:06, Karel Gardas wrote:
[..]

If you are interested in softraid, then perhaps some funding of fixes
which would allow running several softraid drives on top of another
would be a great way to reuse your specific caching softraid discipline
with whatever softraid OpenBSD supports now.




Re: Buffer cache made to use >32bit mem addresses (i.e. >~3GB support for the buffer cache) nowadays or planned soon?

2016-02-14 Thread Karel Gardas
On Sat, Feb 13, 2016 at 9:39 PM, Stuart Henderson  wrote:
> There was this commit, I don't *think* it got reverted.
>
>
>
> CVSROOT:        /cvs
> Module name:    src
> Changes by:     b...@cvs.openbsd.org    2013/06/11 13:01:20
>
> Modified files:
> sys/kern   : kern_sysctl.c spec_vnops.c vfs_bio.c
>  vfs_biomem.c vfs_vops.c
> sys/sys: buf.h mount.h
> sys/uvm: uvm_extern.h uvm_page.c
> usr.bin/systat : iostat.c
>
> Log message:
> High memory page flipping for the buffer cache.
>
> This change splits the buffer cache free lists into lists of dma reachable
> buffers and high memory buffers based on the ranges returned by pmemrange.
> Buffers move from dma to high memory as they age, but are flipped to dma
> reachable memory if IO is needed to/from and high mem buffer. The total
> amount of buffers  allocated is now bufcachepercent of both the dma and
> the high memory region.
>
> This change allows the use of large buffer caches on amd64 using more than
> 4 GB of memory
>
> ok tedu@ krw@ - testing by many.

I think it got reverted by:

commit ac77fb26761065b7f6031098e6a182cacfaf7437
Author: beck 
Date:   Tue Jul 9 15:37:43 2013 +

back out the cache flipper temporarily to work out of tree.
will come back soon.
ok deraadt@


but I'm not an expert so would wait on confirmation by Bob Beck.



Re: dhcrelay: send_packet: No buffer space available

2016-02-14 Thread Stuart Henderson
On 2016-02-13, Kapetanakis Giannis  wrote:
> On 12/02/16 18:56, Stuart Henderson wrote:
>> On 2016-02-12, Kapetanakis Giannis  wrote:
>>> Hi,
>>>
>>> I have a carped firewall which is using dhcrelay to forward dhcp
>>> requests to another carped dhcp server.
>>> After upgrade to Feb  4 snapshot I'm seeing these in my logs:
>> What version were you running before?
>>
>> To establish whether it's a dhcrelay problem or something to do with carp
>> can you try dhcrelay from slightly older source e.g. 'cvs up -D 2016/02/01'?
>>
>
> The previous version was from July 2015 so it was far away from now.

So there are a lot of changes to networking in that timeframe, but only
a few changes to dhcrelay.

To narrow it down and establish whether it's a dhcrelay problem or
something to do with carp can you try dhcrelay from slightly older source
e.g. 'cvs up -D 2016/02/01'?
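
Roughly, something along these lines (assuming a checked-out /usr/src; run
the install step as root):

  cd /usr/src/usr.sbin/dhcrelay
  cvs -q up -D 2016/02/01
  make obj && make && make install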

> I guess it will not work with current kernel and pledge(2), tame(2) 
> changes correct?

This isn't anything to do with pledge. I wouldn't expect old network-related
binaries to work on -current due to the network stack changes though.



Re: sshfs man page, -o idmap=user

2016-02-14 Thread Stuart Henderson
On 2016-02-12, Daniel Boyd  wrote:
> I am having this same issue.  I also tried adding the -d switch
> to see if that would shed any light.
>
> $ sshfs -d -o idmap=user ...
> command-line line 0: Bad number.
> remote host has disconnected
>
> $ sshfs -d -o idmap=file,uidfile=myuidfile,gidfile=mygidfile ...
> command-line line 0: Bad number.
> remote host has disconnected
>
> Any ideas?  I'm also running 5.8.
>
> Thanks!
> Daniel
>
>

iirc the option-parsing needs something from the OS that OpenBSD probably
doesn't have (FUSE on OpenBSD is still missing some bits).



Re: Softraids not sandwichable? Re: Can I accelerate [...]

2016-02-14 Thread Karel Gardas
http://permalink.gmane.org/gmane.os.openbsd.misc/202642 -- and a lot
of other references which google may throw at you if you search
for stack/stacking of softraid on OpenBSD.

On Sun, Feb 14, 2016 at 7:50 PM, Tinker  wrote:
> Dear Karel,
>
> Are you saying that softraids cannot be sandwiched as of today?
>
> If they can't, roughly what kind of complexity would need to be solved?
>
> Thanks,
> Tinker
>
> On 2016-01-31 19:06, Karel Gardas wrote:
> [..]
>>
>> If you are interested in softraid, then perhaps some funding of fixes
>> which would allow running several softraid drives on top of another
>> would be a great way to reuse your specific caching softraid discipline
>> with whatever softraid OpenBSD supports now.



Re: How extensive OpenBSD's write caching (for softdep or async-mounted UFS, as long as I never fsync() )?

2016-02-14 Thread Otto Moerbeek
On Sun, Feb 14, 2016 at 06:56:16AM -0500, Donald Allen wrote:

> On Sun, Feb 14, 2016 at 1:43 AM, Tinker  wrote:
> 
> > Did two tests, one with async and one with softdep, on amd64, 5.9-CURRENT,
> > UFS.
> >
> > (Checked "dd"'s sources and there is no fsync() anywhere in there.
> >
> > The bufcache setting was 90, 3GB free RAM, pushed 2GB of data using "dd"
> > to disk.
> >
> 
> Based on knowledge of Unices from long ago (not on direct knowledge of
> OpenBSD internals), I believe 'dd' uses raw I/O, which bypasses the buffer
> cache.
> 
> Those who know the details better than I should comment and correct me if
> I'm wrong, but if I'm right, your test doesn't prove anything about
> file-system write caching because your writes didn't go through the
> file-system.

Depends where you are writing to: a file or a device?
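
For illustration, the two cases look roughly like this (paths and device
names are placeholders, and the second command would of course destroy
whatever is on that disk):

  # to a file on a mounted filesystem -- goes through the filesystem and buffer cache:
  dd if=/dev/zero of=/mnt/test/bigfile bs=1m count=2048
  # to the raw device -- bypasses the filesystem entirely:
  dd if=/dev/zero of=/dev/rsd1c bs=1m count=2048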

-Otto

> 
> /Don Allen
> 
> >
> > It took 12 and 15 seconds respectively, which is the harddrive's write
> > speed - the buffer cache of course would have absorbed this in 0 seconds.)
> >
> >
> > So, both runs showed that OpenBSD does *not* do any write caching to speak
> > of, at all.
> >
> >
> > This means if a program wants write caching, it needs to implement it
> > itself.
> >
> > Good to know.
> >
> > Tinker
> >
> >
> > On 2016-02-13 23:47, Tinker wrote:
> >
> >> Hi,
> >>
> >> How much of my file writing, and filesystem operations such as
> >> creating a new file/directory, will land in OpenBSD's disk/write cache
> >> without touching the disk before return of the respective operation to
> >> my program, for softdep or async UFS media and I never fsync() ?
> >>
> >>
> >> This is relevant to know for any use case where there may be a big
> >> write load to a magnetic disk *and* there's lots of RAM and "sysctl
> >> kern.bufcachepercent" is high.
> >>
> >> If those ops will be done in a way that is synchronous with the magnetic
> >> disk, the actual fopen(), fwrite(), fread() (for re-read of the data
> >> that's been written but still only is in the OS RAM CACHE) etc. might
> >> be so slow that a program would need to implement its own write cache
> >> for supporting even small spikes in write activity.
> >>
> >> Sorry for the fuss but neither "man" nor googling taught me anything.
> >>
> >> Thanks!!
> >> Tinker



support new

2016-02-14 Thread Onofre L. Alvarado, Jr.
0
C Philippines
P National Capital Region
T Makati City
Z 1203
O OpenBSD Philippines
I Onofre L. Alvarado, Jr.
A 8400 Mayapis st., Bgy. San Antonio
M i...@openbsd.org.ph
U http://www.openbsd.org.ph/
B 63-2-7281903
X 63-2-7281903
N Over a decade and a half's experience in the use and deployment of OpenBSD.
Network planning and design, firewalls, routers, email, web and database servers,
VPNs. OpenBSD consultancy, installation, maintenance and support.



Re: How extensive OpenBSD's write caching (for softdep or async-mounted UFS, as long as I never fsync() )?

2016-02-14 Thread Tinker

On 2016-02-14 19:20, Otto Moerbeek wrote:

On Sun, Feb 14, 2016 at 06:56:16AM -0500, Donald Allen wrote:


On Sun, Feb 14, 2016 at 1:43 AM, Tinker  wrote:

> Did two tests, one with async and one with softdep, on amd64, 5.9-CURRENT,
> UFS.
>
> (Checked "dd"'s sources and there is no fsync() anywhere in there.
>
> The bufcache setting was 90, 3GB free RAM, pushed 2GB of data using "dd"
> to disk.
>

Based on knowledge of Unices from long ago (not on direct knowledge of
OpenBSD internals), I believe 'dd' uses raw I/O, which bypasses the buffer
cache.

Those who know the details better than I should comment and correct me if
I'm wrong, but if I'm right, your test doesn't prove anything about
file-system write caching because your writes didn't go through the
file-system.


Depends where you are writing to: a file or a device?

-Otto



I wrote to a file; as I pointed out, it was on UFS mounted either softdep
or async.


And this is where essentially no write caching was observed, which makes me
conclude that OpenBSD essentially does not do write caching (and that the
"buffer cache" is technically a read cache).




spamd with ipv6 support

2016-02-14 Thread Harald Dunkel

Hi folks,

last information I have about spamd with IPv6 support is WIP.
Is there any code I could try? Maybe I can help, at least in
running tests?


Please mail
Harri



Re: How extensive OpenBSD's write caching (for softdep or async-mounted UFS, as long as I never fsync() )?

2016-02-14 Thread Donald Allen
On Sun, Feb 14, 2016 at 1:43 AM, Tinker  wrote:

> Did two tests, one with async and one with softdep, on amd64, 5.9-CURRENT,
> UFS.
>
> (Checked "dd"'s sources and there is no fsync() anywhere in there.
>
> The bufcache setting was 90, 3GB free RAM, pushed 2GB of data using "dd"
> to disk.
>

Based on knowledge of Unices from long ago (not on direct knowledge of
OpenBSD internals), I believe 'dd' uses raw I/O, which bypasses the buffer
cache.

Those who know the details better than I should comment and correct me if
I'm wrong, but if I'm right, your test doesn't prove anything about
file-system write caching because your writes didn't go through the
file-system.

/Don Allen

>
> It took 12 and 15 seconds respectively, which is the harddrive's write
> speed - the buffer cache of course would have absorbed this in 0 seconds.)
>
>
> So, both runs showed that OpenBSD does *not* do any write caching to speak
> of, at all.
>
>
> This means if a program wants write caching, it needs to implement it
> itself.
>
> Good to know.
>
> Tinker
>
>
> On 2016-02-13 23:47, Tinker wrote:
>
>> Hi,
>>
>> How much of my file writing, and filesystem operations such as
>> creating a new file/directory, will land in OpenBSD's disk/write cache
>> without touching the disk before return of the respective operation to
>> my program, for softdep or async UFS media and I never fsync() ?
>>
>>
> >> This is relevant to know for any use case where there may be a big
> >> write load to a magnetic disk *and* there's lots of RAM and "sysctl
> >> kern.bufcachepercent" is high.
> >>
> >> If those ops will be done in a way that is synchronous with the magnetic
>> disk, the actual fopen(), fwrite(), fread() (for re-read of the data
>> that's been written but still only is in the OS RAM CACHE) etc. might
>> be so slow that a program would need to implement its own write cache
>> for supporting even small spikes in write activity.
>>
>> Sorry for the fuss but neither "man" nor googling taught me anything.
>>
>> Thanks!!
>> Tinker