Re: What determines source IP of traffic from OpenBSD box ?

2021-02-28 Thread David Gwynne
On Sun, Feb 28, 2021 at 01:17:01PM +0100, Rachel Roch wrote:
> 
> 
> 
> 28 Feb 2021, 11:28 by s...@spacehopper.org:
> 
> > On 2021/02/28 11:46, Rachel Roch wrote:
> >
> >> Thank you all for the suggestions, I am currently testing a few of them.
> >>
> >> In case it makes any difference, the underlying problem I have is I have 
> >> two firewalls with BGP upstreams, one acting as primary, one as standby.
> >> So the problem I am seeing is the age-old problem of asymmetric traffic to 
> >> the secondary firewall meaning pkg_add on the secondary doesn't work.
> >>
> >
> > You can't just get two sessions from your upstreams so they can both be
> > active rather than one in standby?
> >
> 
> Maybe my wording is a little off.
> 
> I do have independent sessions from FW1 and FW2 to upstream routers.
> 
> The problem, I suspect, is more to do with overlapping of IP ranges being 
> advertised to upstreams, and hence traffic never making it back to FW2 
> because FW1 picks it up, hence the desire to have an effective way to tell 
> OpenBSD "send all localhost originating traffic from lo2 because the IPs on 
> lo2 are exclusive to that host".

I have a situation like that at work which I solved using the following
rules:

# let us talk to things
  match out on vlan363 to !vlan363:network !received-on any nat-to lo1
  match out on vlan364 to !vlan364:network !received-on any nat-to lo1
  pass out !received-on any

vlan363 and vlan364 are the links I use to talk to the rest of the
world.

There may be a less worse way to do that with the routing table now
though.
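
The rules above follow one pattern per uplink interface, so they can be
generated for any list of uplinks. A minimal sketch, assuming a helper
named nat_rules (hypothetical, not part of pf) that only prints rule text
for review and never loads anything into pf:

```shell
#!/bin/sh
# Print one "match ... nat-to" rule per uplink, followed by the pass rule.
# Usage: nat_rules <loopback-if> <uplink-if>...
# The function name and argument order are assumptions for illustration.
nat_rules() {
    lo="$1"
    shift
    for ifc in "$@"; do
        # Same shape as the rules quoted in the message above.
        printf 'match out on %s to !%s:network !received-on any nat-to %s\n' \
            "$ifc" "$ifc" "$lo"
    done
    printf 'pass out !received-on any\n'
}

nat_rules lo1 vlan363 vlan364
```

The output is plain text meant to be pasted into pf.conf by hand; running
"pfctl -nf /etc/pf.conf" afterwards parses the file without loading it.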



Re: Relayd cannot load keypair

2021-02-28 Thread Anthony J. Bentley
James Chase writes:
> /etc/relayd.conf:25: cannot load keypair nextcloud.mydomain.com
> for relay secure_proxy
>
> The keys are in /etc/ssl/ and /etc/ssl/private, and I got them from
> acme-client via lets encrypt. Named:
> nextcloud.mydomain.com:443.fullchain.crt
> and
> nextcloud.mydomain.com:443.key

From relayd.conf(5):

 keypair name
 The relay will attempt to look up a private key in
 /etc/ssl/private/name:port.key and a public certificate
 in /etc/ssl/name:port.crt, where port is the specified
 port that the relay listens on.  If these files are not
 present, the relay will continue to look in
 /etc/ssl/private/name.key and /etc/ssl/name.crt.

So you need to tell acme-client to generate a fullchain certificate
simply called name:port.crt, not name:port.fullchain.crt.
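
The lookup order from the manual excerpt can be written out as a list of
candidate paths. A minimal sketch, assuming a helper named
keypair_candidates (hypothetical) that just prints the four paths in the
order relayd.conf(5) describes; it does not read relayd's configuration:

```shell
#!/bin/sh
# Print the files relayd looks for, per the relayd.conf(5) text quoted
# above: first the name:port pair, then the plain name pair.
# The helper name is an assumption for illustration.
keypair_candidates() {
    name="$1"
    port="$2"
    printf '%s\n' \
        "/etc/ssl/private/${name}:${port}.key" \
        "/etc/ssl/${name}:${port}.crt" \
        "/etc/ssl/private/${name}.key" \
        "/etc/ssl/${name}.crt"
}

# Report which candidates exist on this machine.
keypair_candidates nextcloud.mydomain.com 443 | while IFS= read -r f; do
    if [ -e "$f" ]; then
        echo "found:   $f"
    else
        echo "missing: $f"
    fi
done
```

Note that nextcloud.mydomain.com:443.fullchain.crt is not in the list,
which matches the error reported in the original message.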

-- 
Anthony J. Bentley



Re: OpenBSD 6.8 - softraid issue: "uvm_fault(0xffffffff821f5490, 0x40, 0, 1) -> e"

2021-02-28 Thread Mark Schneider

Hi Karel,

Thank you very much for your feedback and hints.
I have already opened a bug report for this issue. However, I am not 
able to provide the output of the "trace" and "ps" commands from the 
ddb{4}> or ddb{2}> prompts: the crashed system is frozen, so I cannot 
type anything, and typing blind produces no visible output.


In another email to misc (or just further below) I described some more 
tests.
I have to check how to compile a kernel with debug support and install 
it on the OpenBSD 6.8 box for further investigations.


Kind regards
Mark


# --- copy of the previous email to misc

Thank you very much for your feedback, suggestions and hints.

Indeed, yesterday I saw one read error and one write error related to the 
Samsung PRO SSDs before another OS crash (I ran several different tests, 
writing big files to the RAID5 volume using the "dd" or "cat" commands).
Today I installed three new 1TB Samsung PRO 860 SSD drives in a third 
box (also an ASUS mainboard with an AMD FX CPU and 16GB of ECC RAM) and 
set up RAID5 as described in the attached file.


And again a similar error after dd (slightly different values):
# ---
dd if=/dev/urandom of=/arc-3xssd/1GB-urandom.bin bs=1M count=1024

# Error messages

uvm_fault(0x821ede50, 0x40, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at  sr_validate_io+0x44:    cmpl $0,0x40(%r9)
ddb{4}>

The error happens on the RAID5 level (there is no encryption).

In the test case above I used 30cm long SATA 3G cables (Samsung PRO 860 
and the SATA controller are 6G) as I did not have the 6G SATA cables 
available.

I ran the original tests with 6G SATA cables.

For some reason the "ddb{4}>" prompt is frozen, so I am not able to type 
anything at the ddb prompt on the console (and I see no output when 
typing "trace" or "ps" blind).


I have some older Samsung PRO 850 SSDs somewhere, so I will try to test 
the RAID5 configuration with them.




On 28.02.21 19:55, Karel Gardas wrote:


Hi,

Compile a kernel with debugging enabled so that you get a line number 
from the crash, and see what's there. Go through the git/cvs logs and see 
if anybody did anything with the global mutex over sata/sr raid. Read the 
code. It is possible that you are hitting a bug that has been there since 
raid5 was added to OpenBSD and that no one has tested with that many 
SSDs, so you are in a unique position to 
hunt this bug down. Congratulations and good luck!


Karel

On 2/28/21 3:05 AM, Mark Schneider wrote:

Hi again,

I have repeated the softraid tests using six 1TB Samsung 3G SATA HDD 
drives as RAID5, and with them I do not see the OS crash that occurs 
when using SSDs in the RAID5.

Details of the RAID5 setup are in the attached file.

It looks like using SSD drives as RAID5 leads, for some reason, to the 
OpenBSD 6.8 crash. The Samsung PRO 860 512GB SSDs have a 6G SATA 
interface (which differs from the tested HDDs).


NB: Using those SSDs as RAID6 on Debian Linux (buster - mdadm / 
cryptoLUKS) causes no issues.
  There are also no issues using those SSDs as RAID on FreeBSD 
(TrueNAS).


Kind regards
Mark


On 27.02.21 04:30, Mark Schneider wrote:

Hi,


I face a system crash on OpenBSD 6.8 when writing big files (around 
10GBytes) to a softraid RAID5 volume.


I can reproduce the error (tested on two different systems with 
OpenBSD 6.8 installed on an SSD drive or a USB stick). The RAID5 
volume itself consists of six Samsung PRO 860 512GB SSDs.


In short:

bioctl -c 5 -l sd0a,sd1a,sd2a,sd3a,sd4a,sd5a softraid0

obsdssdarc# disklabel sd7
# /dev/rsd7c:
type: SCSI
disk: SCSI disk
label: SR RAID 5
duid: a50fb9a25bf07243
flags:
bytes/sector: 512
sectors/track: 255
tracks/cylinder: 511
sectors/cylinder: 130305
cylinders: 38379
total sectors: 5001073280
boundstart: 0
boundend: 5001073280
drivedata: 0

16 partitions:
#    size   offset  fstype [fsize bsize cpg]
  a:   5001073280    0  4.2BSD   8192 65536 52270
  c:   5001073280    0  unused

# 



obsdssdarc# time dd if=/dev/urandom of=/arc-ssd/1GB-urandom.bin 
bs=1M count=1024

1024+0 records in
1024+0 records out
1073741824 bytes transferred in 8.120 secs (132218264 bytes/sec)
    0m08.13s real 0m00.00s user 0m08.14s system

# Working as expected 
^^



obsdssdarc# time dd if=/dev/urandom of=/arc-ssd/10GB-urandom.bin 
bs=10M count=1024


# Error messages

uvm_fault(0x821f5490, 0x40, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at  sr_validate_io+0x44:    cmpl $0,0x40(%r9)
ddb{2}>

# Crashing OpenBSD 6.8 
^^^



# After reboot:

obsdssdarc# mount /dev/sd7a /arc-ssd/
mount_ffs: /dev/sd7a on /arc-ssd: Device not configured

obsdssdarc# grep sd7 /var/run/dmesg.boot
softraid0: trying to bring up sd7 degraded
softraid0: sd7 was not shutdown properly
softraid0: sd7 is offline, will not be brought online


Mor

Re: OpenBSD 6.8 - softraid issue: "uvm_fault(0xffffffff821f5490, 0x40, 0, 1) -> e"

2021-02-28 Thread Mark Schneider

Hi Stefan

Thank you very much for your feedback, suggestions and hints.

Indeed, yesterday I saw one read error and one write error related to the 
Samsung PRO SSDs before another OS crash (I ran several different tests, 
writing big files to the RAID5 volume using the "dd" or "cat" commands).
Today I installed three new 1TB Samsung PRO 860 SSD drives in a third 
box (also an ASUS mainboard with an AMD FX CPU and 16GB of ECC RAM) and 
set up RAID5 as described in the attached file.


And again a similar error after dd (slightly different values):
# ---
dd if=/dev/urandom of=/arc-3xssd/1GB-urandom.bin bs=1M count=1024

# Error messages

uvm_fault(0x821ede50, 0x40, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at  sr_validate_io+0x44:    cmpl $0,0x40(%r9)
ddb{4}>

The error happens on the RAID5 level (there is no encryption).

In the test case above I used 30cm long SATA 3G cables (Samsung PRO 860 
and the SATA controller are 6G) as I did not have the 6G SATA cables 
available.

I ran the original tests with 6G SATA cables.

For some reason the "ddb{4}>" prompt is frozen, so I am not able to type 
anything at the ddb prompt on the console (and I see no output when 
typing "trace" or "ps" blind).


I have some older Samsung PRO 850 SSDs somewhere, so I will try to test 
the RAID5 configuration with them.


Kind regards
Mark


On 28.02.21 20:17, Stefan Sperling wrote:

On Sun, Feb 28, 2021 at 03:05:49AM +0100, Mark Schneider wrote:

Hi again,

I have repeated the softraid tests using six 1TB Samsung 3G SATA HDD
drives as RAID5, and with them I do not see the OS crash that occurs when
using SSDs in the RAID5.
Details of the RAID5 setup are in the attached file.

It looks like using SSD drives as RAID5 leads, for some reason, to the OpenBSD
6.8 crash. The Samsung PRO 860 512GB SSDs have a 6G SATA interface (which
differs from the tested HDDs).

NB: Using those SSDs as RAID6 on Debian Linux (buster - mdadm / cryptoLUKS)
causes no issues.
   There are also no issues using those SSDs as RAID on FreeBSD
(TrueNAS).

I've seen some Samsung Pro SSDs cause I/O errors on ahci(4) due to unhandled
NCQ error conditions. Not sure if this relates to your problem; I assume that
these errors were specific to my machine, which is over 10 years old. Its AHCI
controller has likely not been designed with modern SSDs in mind. I switched
to different SSDs and the problem disappeared. This was on RAID1 where the
kernel didn't crash. Instead, the volume ended up in degraded state.

Maybe some I/O error is happening in your case as well?
Perhaps the raid5 code doesn't handle i/o errors gracefully?

In any case, your bug report is missing important information:


# Error messages

uvm_fault(0x821f5490, 0x40, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at  sr_validate_io+0x44:    cmpl $0,0x40(%r9)
ddb{2}>

This tells us where it crashed but not how the code flow ended up here.
Please show the stack trace printed by the 'trace' command, and the output
of the 'ps' command (both commands at the ddb> prompt).


# OpenBSD 6.8 RAID5 configuration with three 1TB "Samsung SSD PRO 860" drives 


sysctl hw.disknames

disklabel sd1
disklabel -E sd1
disklabel -E sd2
disklabel -E sd3

bioctl -c 5 -l sd1a,sd2a,sd3a softraid0
disklabel -E sd4

newfs sd4a

obsdarc# mkdir /arc-3xssd
obsdarc# mount /dev/sd4a /arc-3xssd/


  
obsdarc# df -h | grep 3xssd 
/dev/sd4a      1.8T    8.0K    1.8T     0%    /arc-3xssd





# --
dd if=/dev/urandom of=/arc-3xssd/1GB-urandom.bin bs=1M count=1024

# Error messages

uvm_fault(0x821ede50, 0x40, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at  sr_validate_io+0x44:    cmpl $0,0x40(%r9)
ddb{4}>


# --
obsdarc# disklabel sd1  


  

# /dev/rsd1c:
type: SCSI
disk: SCSI disk
label: Samsung SSD 860 
duid: cb0d589d6d25894e
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 124519
total sectors: 2000409264
boundstart: 0
boundend: 2000409264
drivedata: 0 

16 partitions:
#                size           offset  fstype [fsize bsize   cpg]
  a:       2000409264                0    RAID
  c:       2000409264                0  unused

...


# 

Re: Intel Turbo Memory in Thinkpad W500

2021-02-28 Thread Bryan Steele
On Sun, Feb 28, 2021 at 09:00:25PM +0100, Jan Stary wrote:
> This is 6.9-beta/amd64 on a Thinkpad W500 (dmesg below).
> 
> Taking out the unneeded stuff (I usually take out bluetooth,
> replace the wifi with Intel 7260 HMW etc), I also noticed this
> (see attachments). Taking it out, the difference in dmesg shows:
> 
> -"Intel Turbo Memory" rev 0x11 at pci4 dev 0 function 0 not configured
> 
> Given that it's "not configured", I don't think I'm missing much
> (and the Thinkpad's memory doesn't seem any less "Turbo"),
> but does anyone know what it does in the Thinkpad? AFAIG,
> it was supposed to be a thing before 4G of RAM and SSDs
> were common ...
> 
>   Jan

1-4GB of NAND flash on an option card.

There is an incomplete Linux reverse engineering effort, but it doesn't
look particularly all that interesting, and likely slower than an SSD by
today's standards.

https://github.com/yarrick/turbomem

-Bryan.



Re: Intel Turbo Memory in Thinkpad W500

2021-02-28 Thread Karel Gardas




On 2/28/21 9:00 PM, Jan Stary wrote:

This is 6.9-beta/amd64 on a Thinkpad W500 (dmesg below).

Taking out the unneeded stuff (I usually take out bluetooth,
replace the wifi with Intel 7260 HMW etc), I also noticed this
(see attachments). Taking it out, the difference in dmesg shows:

-"Intel Turbo Memory" rev 0x11 at pci4 dev 0 function 0 not configured

Given that it's "not configured", I don't think I'm missing much
(and the Thinkpad's memory doesn't seem any less "Turbo"),
but does anyone know what it does in the Thinkpad? AFAIG,
it was supposed to be a thing before 4G of RAM and SSDs
were common ..


Wikipedia does have an article about it: 
https://en.wikipedia.org/wiki/Intel_Turbo_Memory




Relayd cannot load keypair

2021-02-28 Thread James Chase
I'm on openbsd 6.8, ran syspatch today.
relayd.conf:

table  { 192.168.1.158 }
http protocol "httpproxy" {
pass request quick header "Host" value "nextcloud.mydomain.com" \
forward to 
block
}
relay "proxy" {
   listen on 192.168.1.156 port 80
   protocol "httpproxy"
   forward to  port 80
}
http protocol "https" {
  tls keypair nextcloud.mydomain.com
  return error
  pass
}
relay "secure_proxy" {
listen on 192.168.1.156 port 443 tls
protocol https
forward to  port 80
}

Works for regular http, but when I try adding the https blocks I get:

/etc/relayd.conf:25: cannot load keypair nextcloud.mydomain.com
for relay secure_proxy

The keys are in /etc/ssl/ and /etc/ssl/private, and I got them from
acme-client via lets encrypt. Named:
nextcloud.mydomain.com:443.fullchain.crt
and
nextcloud.mydomain.com:443.key

Also tried generating them without the ports and with .pem,
etc.

Also, I've tried replacing 192.168.1.156 in the listen on
line in secure_proxy with "nextcloud.mydomain.com"
I've tried various examples online as well. Any help would
be appreciated! At this point it feels like a bug, but apparently
others have it working.



Intel Turbo Memory in Thinkpad W500

2021-02-28 Thread Jan Stary
This is 6.9-beta/amd64 on a Thinkpad W500 (dmesg below).

Taking out the unneeded stuff (I usually take out bluetooth,
replace the wifi with Intel 7260 HMW etc), I also noticed this
(see attachments). Taking it out, the difference in dmesg shows:

-"Intel Turbo Memory" rev 0x11 at pci4 dev 0 function 0 not configured

Given that it's "not configured", I don't think I'm missing much
(and the Thinkpad's memory doesn't seem any less "Turbo"),
but does anyone know what it does in the Thinkpad? AFAIG,
it was supposed to be a thing before 4G of RAM and SSDs
were common ...

Jan

OpenBSD 6.9-beta (GENERIC) #345: Tue Feb 23 01:02:38 MST 2021
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
real mem = 8463781888 (8071MB)
avail mem = 8192016384 (7812MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xe0010 (80 entries)
bios0: vendor LENOVO version "6FET79WW (3.09 )" date 10/02/2009
bios0: LENOVO 40612JG
acpi0 at bios0: ACPI 3.0
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP SSDT ECDT APIC MCFG HPET SLIC BOOT ASF! SSDT TCPA SSDT 
SSDT SSDT
acpi0: wakeup devices LID_(S3) SLPB(S3) IGBE(S4) EXP0(S4) EXP1(S4) EXP2(S4) 
EXP3(S4) EXP4(S4) PCI1(S4) USB0(S3) USB3(S3) USB5(S3) EHC0(S3) EHC1(S3) HDEF(S4)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpiec0 at acpi0
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM)2 Duo CPU P9500 @ 2.53GHz, 2527.45 MHz, 06-17-06
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,NXE,LONG,LAHF,PERF,SENSOR,MELTDOWN
cpu0: 6MB 64b/line 16-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 7 var ranges, 88 fixed ranges
cpu0: apic clock running at 265MHz
cpu0: mwait min=64, max=64, C-substates=0.2.2.2.2.1.3, IBE
cpu at mainbus0: not configured
ioapic0 at mainbus0: apid 1 pa 0xfec0, version 20, 24 pins, remapped
acpimcfg0 at acpi0
acpimcfg0: addr 0xe000, bus 0-63
acpihpet0 at acpi0: 14318179 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 1 (AGP_)
acpiprt2 at acpi0: bus 2 (EXP0)
acpiprt3 at acpi0: bus 3 (EXP1)
acpiprt4 at acpi0: bus 4 (EXP2)
acpiprt5 at acpi0: bus 5 (EXP3)
acpiprt6 at acpi0: bus 13 (EXP4)
acpiprt7 at acpi0: bus 21 (PCI1)
acpibtn0 at acpi0: LID_
acpibtn1 at acpi0: SLPB
acpipci0 at acpi0 PCI0: 0x 0x0011 0x0001
acpicmos0 at acpi0
acpibat0 at acpi0: BAT0 model "COMPATIBLE" serial  1388 type LION oem "SANYO"
acpiac0 at acpi0: AC unit online
acpithinkpad0 at acpi0: version 1.0
"PNP0C14" at acpi0 not configured
acpicpu0 at acpi0: !C3(250@17 mwait.3@0x20), !C2(500@1 mwait.1@0x10), C1(1000@1 
mwait.1), PSS
acpipwrres0 at acpi0: PUBS, resource for USB0, USB3, USB5, EHC0, EHC1
acpitz0 at acpi0: critical temperature is 127 degC
acpitz1 at acpi0: critical temperature is 100 degC
acpidock0 at acpi0: GDCK not docked (0)
acpivideo0 at acpi0: VID_
acpivout0 at acpivideo0: LCD0
acpivideo1 at acpi0: VID_
acpivout1 at acpivideo1: LCD0
cpu0: Enhanced SpeedStep 2527 MHz: speeds: 2534, 2533, 1600, 800 MHz
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel GM45 Host" rev 0x07
ppb0 at pci0 dev 1 function 0 "Intel GM45 PCIE" rev 0x07: msi
pci1 at ppb0 bus 1
1:0:0: io address conflict 0x2000/0x100
radeondrm0 at pci1 dev 0 function 0 "ATI Mobility Radeon HD 3650" rev 0x00
drm1 at radeondrm0
radeondrm0: msi
inteldrm0 at pci0 dev 2 function 0 "Intel GM45 Video" rev 0x07
drm0 at inteldrm0
intagp0 at inteldrm0
agp0 at intagp0: aperture at 0xd000, size 0x1000
inteldrm0: apic 1 int 16, GM45, gen 4
"Intel GM45 HECI" rev 0x07 at pci0 dev 3 function 0 not configured
em0 at pci0 dev 25 function 0 "Intel ICH9 IGP M AMT" rev 0x03: msi, address 
00:1c:25:97:c2:f5
uhci0 at pci0 dev 26 function 0 "Intel 82801I USB" rev 0x03: apic 1 int 20
uhci1 at pci0 dev 26 function 1 "Intel 82801I USB" rev 0x03: apic 1 int 21
uhci2 at pci0 dev 26 function 2 "Intel 82801I USB" rev 0x03: apic 1 int 22
ehci0 at pci0 dev 26 function 7 "Intel 82801I USB" rev 0x03: apic 1 int 23
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 configuration 1 interface 0 "Intel EHCI root hub" rev 2.00/1.00 
addr 1
azalia0 at pci0 dev 27 function 0 "Intel 82801I HD Audio" rev 0x03: msi
azalia0: codecs: Conexant CX20561, Conexant/0x2c06, using Conexant CX20561
audio0 at azalia0
ppb1 at pci0 dev 28 function 0 "Intel 82801I PCIE" rev 0x03: msi
pci2 at ppb1 bus 2
ppb2 at pci0 dev 28 function 1 "Intel 82801I PCIE" rev 0x03: msi
pci3 at ppb2 bus 3
iwn0 at pci3 dev 0 function 0 "Intel WiFi Link 5300" rev 0x00: msi, MIMO 3T3R, 
MoW, address 00:16:ea:b2:58:ec
ppb3 at pci0 dev 28 function 2 "Intel 82801I PCIE" rev 0x03: msi
pci4 at ppb3 bus 4
"Intel Turbo Memory" rev 0x11 at pci4 dev 0 function 0 not configured
ppb4 at pci0 dev 28 function 3 "Intel 82801I PCIE" rev 0x03: msi

Re: Default partitions allocate only 1GB to /

2021-02-28 Thread Janne Johansson
On Sun, 28 Feb 2021 at 14:51, tetrahedra wrote:
> I deleted the file and `pkg_add libreoffice` worked as expected.
> Post-install I still have 746MB free in /, according to `df -h`.
>
> This makes little sense to me. Why should deleting a 20MB file on a
> filesystem with >700MB free space be sufficient for the install to go
> through? Especially when the install obviously doesn't need that much
> space on the filesystem in question?
>
> (space available in /usr/local went from 11.4G, pre-install, to 10.8G,
> post-install... was `pkg_add` trying to stage files in /, even though
> /tmp is a separate filesystem?)

Is /var a filesystem of its own? Otherwise it could be /var/tmp or
some other place under /var which is used for unpacking packages.
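
One quick way to answer that question is to ask df which filesystem each
directory lives on. A minimal sketch; the helper name fs_of and the list
of directories are assumptions for illustration, and nothing here confirms
where pkg_add actually stages its files:

```shell
#!/bin/sh
# Print the mount point of the filesystem that holds a given path.
# df -P guarantees one line per filesystem; the mount point is the
# last field of the second line.
fs_of() {
    df -P "$1" | awk 'NR == 2 { print $NF }'
}

# Directories worth checking when a package install reports a full disk.
for p in / /var /var/tmp /tmp /usr/local; do
    if [ -e "$p" ]; then
        printf '%-12s is on %s\n' "$p" "$(fs_of "$p")"
    fi
done
```

If /var reports "/" as its mount point, anything unpacked under /var
consumes space on the root filesystem.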

-- 
May the most significant bit of your life be positive.



Re: OpenBSD 6.8 - softraid issue: "uvm_fault(0xffffffff821f5490, 0x40, 0, 1) -> e"

2021-02-28 Thread Stefan Sperling
On Sun, Feb 28, 2021 at 03:05:49AM +0100, Mark Schneider wrote:
> Hi again,
> 
> I have repeated the softraid tests using six 1TB Samsung 3G SATA HDD
> drives as RAID5, and with them I do not see the OS crash that occurs when
> using SSDs in the RAID5.
> Details of the RAID5 setup are in the attached file.
> 
> It looks like using SSD drives as RAID5 leads, for some reason, to the OpenBSD
> 6.8 crash. The Samsung PRO 860 512GB SSDs have a 6G SATA interface (which
> differs from the tested HDDs).
> 
> NB: Using those SSDs as RAID6 on Debian Linux (buster - mdadm / cryptoLUKS)
> causes no issues.
>   There are also no issues using those SSDs as RAID on FreeBSD
> (TrueNAS).

I've seen some Samsung Pro SSDs cause I/O errors on ahci(4) due to unhandled
NCQ error conditions. Not sure if this relates to your problem; I assume that
these errors were specific to my machine, which is over 10 years old. Its AHCI
controller has likely not been designed with modern SSDs in mind. I switched
to different SSDs and the problem disappeared. This was on RAID1 where the
kernel didn't crash. Instead, the volume ended up in degraded state.

Maybe some I/O error is happening in your case as well?
Perhaps the raid5 code doesn't handle i/o errors gracefully?

In any case, your bug report is missing important information:

> > # Error messages
> > 
> > uvm_fault(0x821f5490, 0x40, 0, 1) -> e
> > kernel: page fault trap, code=0
> > Stopped at  sr_validate_io+0x44:    cmpl $0,0x40(%r9)
> > ddb{2}>

This tells us where it crashed but not how the code flow ended up here.
Please show the stack trace printed by the 'trace' command, and the output
of the 'ps' command (both commands at the ddb> prompt).



Re: OpenBSD 6.8 - softraid issue: "uvm_fault(0xffffffff821f5490, 0x40, 0, 1) -> e"

2021-02-28 Thread Karel Gardas



Hi,

Compile a kernel with debugging enabled so that you get a line number 
from the crash, and see what's there. Go through the git/cvs logs and see 
if anybody did anything with the global mutex over sata/sr raid. Read the 
code. It is possible that you are hitting a bug that has been there since 
raid5 was added to OpenBSD and that no one has tested with that many 
SSDs, so you are in a unique position to 
hunt this bug down. Congratulations and good luck!


Karel

On 2/28/21 3:05 AM, Mark Schneider wrote:

Hi again,

I have repeated the softraid tests using six 1TB Samsung 3G SATA HDD 
drives as RAID5, and with them I do not see the OS crash that occurs 
when using SSDs in the RAID5.

Details of the RAID5 setup are in the attached file.

It looks like using SSD drives as RAID5 leads, for some reason, to the 
OpenBSD 6.8 crash. The Samsung PRO 860 512GB SSDs have a 6G SATA 
interface (which differs from the tested HDDs).


NB: Using those SSDs as RAID6 on Debian Linux (buster - mdadm / 
cryptoLUKS) causes no issues.
  There are also no issues using those SSDs as RAID on FreeBSD 
(TrueNAS).


Kind regards
Mark


On 27.02.21 04:30, Mark Schneider wrote:

Hi,


I face a system crash on OpenBSD 6.8 when writing big files (around 
10GBytes) to a softraid RAID5 volume.


I can reproduce the error (tested on two different systems with 
OpenBSD 6.8 installed on an SSD drive or a USB stick). The RAID5 
volume itself consists of six Samsung PRO 860 512GB SSDs.


In short:

bioctl -c 5 -l sd0a,sd1a,sd2a,sd3a,sd4a,sd5a softraid0

obsdssdarc# disklabel sd7
# /dev/rsd7c:
type: SCSI
disk: SCSI disk
label: SR RAID 5
duid: a50fb9a25bf07243
flags:
bytes/sector: 512
sectors/track: 255
tracks/cylinder: 511
sectors/cylinder: 130305
cylinders: 38379
total sectors: 5001073280
boundstart: 0
boundend: 5001073280
drivedata: 0

16 partitions:
#    size   offset  fstype [fsize bsize cpg]
  a:   5001073280    0  4.2BSD   8192 65536 52270
  c:   5001073280    0  unused

# 



obsdssdarc# time dd if=/dev/urandom of=/arc-ssd/1GB-urandom.bin bs=1M 
count=1024

1024+0 records in
1024+0 records out
1073741824 bytes transferred in 8.120 secs (132218264 bytes/sec)
    0m08.13s real 0m00.00s user 0m08.14s system

# Working as expected 
^^



obsdssdarc# time dd if=/dev/urandom of=/arc-ssd/10GB-urandom.bin 
bs=10M count=1024


# Error messages

uvm_fault(0x821f5490, 0x40, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at  sr_validate_io+0x44:    cmpl $0,0x40(%r9)
ddb{2}>

# Crashing OpenBSD 6.8 
^^^



# After reboot:

obsdssdarc# mount /dev/sd7a /arc-ssd/
mount_ffs: /dev/sd7a on /arc-ssd: Device not configured

obsdssdarc# grep sd7 /var/run/dmesg.boot
softraid0: trying to bring up sd7 degraded
softraid0: sd7 was not shutdown properly
softraid0: sd7 is offline, will not be brought online


More details in attached files. Thanks a lot in advance for short 
feedback.



Kind regards

Mark
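
For reference, the sizes involved in the report above can be checked with
shell arithmetic. A minimal sketch; the helper name dd_bytes is an
assumption for illustration, and the sector figures are copied from the
disklabel output in the report:

```shell
#!/bin/sh
# Bytes written by dd for a given block size (in bytes) and block count.
dd_bytes() {
    echo $(( $1 * $2 ))
}

MiB=$((1024 * 1024))

dd_bytes "$MiB" 1024              # bs=1M  count=1024 -> 1073741824
dd_bytes $((10 * MiB)) 1024       # bs=10M count=1024 -> 10737418240

# Size of the RAID5 volume: total sectors * bytes/sector.
echo $((5001073280 * 512))        # -> 2560549519360
```

The first figure matches the "1073741824 bytes transferred" line printed
by the successful run; the second is the 10 GiB write that triggers the
crash.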







Re: Default partitions allocate only 1GB to /

2021-02-28 Thread tetrahedra

On Sat, Feb 27, 2021 at 11:52:39PM +, James Cook wrote:


Sorry, you're right, pkg_add can add files to /. But generally those
will be quite small (/etc/make2fs.conf sounds like a configuration
file).

How big is your root partition, and how much space is used? For example
mine is like this after several months of use and many packages
installed, indicating the installer's default behaviour has worked well
for me:


falsifian angel ~ $ df -h /
Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/sd2a      989M    199M    741M    21%    /


My root partition is about the same -- circa 1GB in size, about 700MB 
free.


According to `df -h` my /usr/local has 11.4GB available and /usr has 
3.5GB, so there *should* be plenty of space for LibreOffice.




Re: Default partitions allocate only 1GB to /

2021-02-28 Thread tetrahedra

On Sat, Feb 27, 2021 at 11:52:39PM +, James Cook wrote:

If you have a lot more space used, you could try to figure out what's
using it. My go-to command is "du -xah /|sort -h|less"


That's a neat command, and amazingly enough it did the trick: there was 
a 20MB file, INS@yjf(...) located in the root directory. It looked like 
a copy of the kernel binary which had been saved while I was messing 
about with kernel configuration options.
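
The pipeline quoted above can be wrapped in a small function so that only
the largest entries are shown. A minimal sketch; the name largest_entries
and its defaults are assumptions for illustration, and it relies on a
sort(1) that supports -h:

```shell
#!/bin/sh
# List the N largest files and directories under DIR, smallest first.
#   du -x  stay on one filesystem (do not cross mount points)
#   du -a  report files as well as directories
#   du -h  human-readable sizes, which sort -h orders correctly
# Example: largest_entries / 20
largest_entries() {
    dir="${1:-/}"
    n="${2:-20}"
    du -xah "$dir" 2>/dev/null | sort -h | tail -n "$n"
}
```

Because of -x, running it on / reports only what sits on the root
filesystem itself, which is how a stray file in the root directory shows
up.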


I deleted the file and `pkg_add libreoffice` worked as expected.

Post-install I still have 746MB free in /, according to `df -h`.

This makes little sense to me. Why should deleting a 20MB file on a 
filesystem with >700MB free space be sufficient for the install to go 
through? Especially when the install obviously doesn't need that much 
space on the filesystem in question?


(space available in /usr/local went from 11.4G, pre-install, to 10.8G, 
post-install... was `pkg_add` trying to stage files in /, even though 
/tmp is a separate filesystem?)




Re: What determines source IP of traffic from OpenBSD box ?

2021-02-28 Thread Rachel Roch




28 Feb 2021, 11:28 by s...@spacehopper.org:

> On 2021/02/28 11:46, Rachel Roch wrote:
>
>> Thank you all for the suggestions, I am currently testing a few of them.
>>
>> In case it makes any difference, the underlying problem I have is I have two 
>> firewalls with BGP upstreams, one acting as primary, one as standby.  So the 
>> problem I am seeing is the age-old problem of asymmetric traffic to the 
>> secondary firewall meaning pkg_add on the secondary doesn't work.
>>
>
> You can't just get two sessions from your upstreams so they can both be
> active rather than one in standby?
>

Maybe my wording is a little off.

I do have independent sessions from FW1 and FW2 to upstream routers.

The problem, I suspect, is more to do with overlapping of IP ranges being 
advertised to upstreams, and hence traffic never making it back to FW2 because 
FW1 picks it up, hence the desire to have an effective way to tell OpenBSD 
"send all localhost originating traffic from lo2 because the IPs on lo2 are 
exclusive to that host".




Encrypted home + hibernate: drives states? [ OpenBSD -current ]

2021-02-28 Thread martin mag
Hello!

My current partition setup is as follows (one SSD Disk, using -current
default kernel )
sd0a   100G RAID  == bioctl -c C -k sd1a ==>             a=/
                                                         b=swap
                                                         ...
                                                         p=/home (for sysupgrade to
                                                                  work without troubles)
sd0d   350G RAID  == bioctl -c C -C noauto -k sd1d ==>   a=/home/mmartin

(BTW, I use duids but for the sake of readers, using dev label here)

* Decryption of sd0a is done automatically at boot time => Perfect

* Decryption of sd0d (not automatically decrypted, see -C noauto),
is done with a modified rc script (just after wsconsctl), but it could be
done in /etc/rc.local (I just don't want to leave my keydisk too long
on my computer, personal preference ... debatable for sure).

I can run suspend (zzz) without any issue (but as I'm using FDE, I prefer not
to use it, as encryption would be useless), and hibernate (ZZZ) seems to work
perfectly fine. The only problem I have is understanding what state my
sd0d partition is in.

sd0a is the encrypted root partition, automatically handled by the OS, so when
waking from a hibernate state, the USB key needs to be inserted =>
when in hibernate mode, I assume sd0a is encrypted then ... right?

Now, as sd0d is handled manually (in /etc/rc or /etc/rc.local), I don't
really get which state it is in when in hibernate mode. It doesn't seem to be
encrypted, because the USB key is not needed at wakeup time (or is it, but
some key is stored within the image that is dumped to swap?). My first
thought was that unmounting / detaching with bioctl should happen AFTER the
system image is dumped to swap (so this cannot be handled in /etc/apm/*
files ... right?).
At the same time, I don't understand HOW it could not be encrypted, as
powering off the laptop (hibernate behaviour) will force bioctl to detach =>
hence keep the drive encrypted while powered off ... right?
Because of that, is there a high risk of getting corrupted data when
waking the laptop up from the hibernate state?

Last thing: if my /home/mmartin partition is not on the same drive or
partition as root, should I avoid using hibernate if my laptop needs to be
securely powered off? (Swap is on the encrypted drive sd0a, so it is
encrypted twice, but I read on this mailing list that the overhead is so low
that everyone should do that if using FDE, so swap is not a factor for a
security breach.)

Thank you very much!

PS: I use -C noauto for my home partition because, IRL, I have a small
password-encrypted partition on the keydisk that, when decrypted, contains
the key to decrypt my home partition (so automatic decryption is not going
to work for me).



Re: What determines source IP of traffic from OpenBSD box ?

2021-02-28 Thread Rachel Roch
Thank you all for the suggestions, I am currently testing a few of them.

In case it makes any difference, the underlying problem I have is I have two 
firewalls with BGP upstreams, one acting as primary, one as standby.  So the 
problem I am seeing is the age-old problem of asymmetric traffic to the 
secondary firewall meaning pkg_add on the secondary doesn't work.

I guess I could med/localpref tweak the secondary to push traffic via the 
primary.  But then I still have the problem of determining return path for the 
traffic (given inherent overlapping of IP ranges on the boxes).

26 Feb 2021, 15:34 by s...@spacehopper.org:

> On 2021-02-26, Daniel Jakots  wrote:
>
>> On Fri, 26 Feb 2021 11:53:40 +0100 (CET), Rachel Roch
>>
> > wrote:
>
>>> Let's say I'm running "pkg_add -u" on a OpenBSD-based router with
>>> multiple interfaces.
>>>
>>> What determines the source IP ?
>>>
>>
>> On -current there is
>>  route [-T rtable] sourceaddr [-inet|-inet6] [address]
>>  route [-T rtable] sourceaddr [-inet|-inet6] -ifp interface
>>
>
> Use with care though, this can be a footgun (especially if you are
> connecting from there to other local machines with "strict host model").
>
> If you want something more targeted then nat-to is one option.
>



Re: What determines source IP of traffic from OpenBSD box ?

2021-02-28 Thread Stuart Henderson
On 2021/02/28 11:46, Rachel Roch wrote:
> Thank you all for the suggestions, I am currently testing a few of them.
> 
> In case it makes any difference, the underlying problem I have is I have two 
> firewalls with BGP upstreams, one acting as primary, one as standby.  So the 
> problem I am seeing is the age-old problem of asymmetric traffic to the 
> secondary firewall meaning pkg_add on the secondary doesn't work.

You can't just get two sessions from your upstreams so they can both be
active rather than one in standby?

> I guess I could med/localpref tweak the secondary to push traffic via the 
> primary.  But then I still have the problem of determining return path for 
> the traffic (given inherent overlapping of IP ranges on the boxes).
> 
> 26 Feb 2021, 15:34 by s...@spacehopper.org:
> 
> > On 2021-02-26, Daniel Jakots  wrote:
> >
> >> On Fri, 26 Feb 2021 11:53:40 +0100 (CET), Rachel Roch
> >>
> > > wrote:
> >
> >>> Let's say I'm running "pkg_add -u" on a OpenBSD-based router with
> >>> multiple interfaces.
> >>>
> >>> What determines the source IP ?
> >>>
> >>
> >> On -current there is
> >>  route [-T rtable] sourceaddr [-inet|-inet6] [address]
> >>  route [-T rtable] sourceaddr [-inet|-inet6] -ifp interface
> >>
> >
> > Use with care though, this can be a footgun (especially if you are
> > connecting from there to other local machines with "strict host model").
> >
> > If you want something more targeted then nat-to is one option.
> >
>