Re: EINVAL from copyin/out - how?

2014-08-15 Thread Thomas Schmitt
Hi,

> the tar file read loop is basically
>while ((n = read(fd, buf, bufzize)) > 0)
>write(ofd, buf, n);

That would be regular file content and thus should touch
cd9660_bmap(), which would translate the file-internal block
number to the partition/device block number.

man 9 VOP_BMAP says:
  "In case the logical block is not allocated, -1 is used."
But cd9660_bmap() does never set *ap->a_bnp to this value.
It happily computes the device block number from file start
block and desired file-internal block number.
On the other hand, a sane ISO 9660 filesystem should not
cause any block to be read which lies outside its hosting
device.

I assume read(2) or its helpers call VOP_BMAP(9) with the
vnode of the data file and then bread(9) with the device
vnode and the converted block number.

There is also cd9660_read()/VOP_READ(9) which i do not really
understand in the (vp->v_type == VREG) case. Especially i do
not see where the necessary file-to-device mapping is done.


In any case, i'd deem some printfs in cd9660_bmap() worthwhile
as long as there are no better ideas.
They should tell about the eventual emergency exit:
if (ap->a_bnp == NULL)
return (0);
resp. about the sucessfully mapped parameters:
  af->a_bn , *af->a_bnp, *af->a_runp (if af->a_runp != NULL)


One should also find out, whether cd9660_read() is called
in the course of read(2) on a regular file.
if (vp->v_type == VREG) {
const int advice = IO_ADV_DECODE(ap->a_ioflag);

printf("cd9660_read() called\n");

...
}
(In that case one would have to learn how this function actually
 works by help of ubc_uiomove(9) resp. how ubc_uiomove(9) learns
 about the data block numbers of the regular file.)


> The read() returns -1, with errno == EINVAL

Well, cd9660 has a few EINVALs. But except one in cd9660_read()
if (uio->uio_offset < 0)
return (EINVAL);
i cannot spot any other which would be suspicious to happen
underneath read(2).
(The others have to do with mounting, readdir, readlink, ...)

And overall we still lack an architecture specific trigger
for any error condition in cd9660.


Have a nice day :)

Thomas



Re: EINVAL from copyin/out - how?

2014-08-15 Thread Thomas Schmitt
Hi,

i'm not a NetBSD expert, but had reason to look into its cd9660
code for developing two PRs (48808, 48959).


Robert Elz:
> At this point it looks to be some kind of setup problem on amd64, when it
> reads nested directories, building some data struct that resuts in EINVAL
> from the copyout

I did not see any architecture specific code in cd9660.

Given the fact that you located the origin of EINVAL outside
of cd9660 (resp. underneath), i expect that ISO 9660 structure
aspects are only indirectly involved.
Especially the difference between i386 and amd64 can hardly
be explained by ISO 9660 aspects.

The connection between ISO 9660 and data file content blocks is
made by cd9660_bmap() as implementation of VOP_BMAP(9).


> diff -r of the mount points of the real DVD (/cdrom) and the
> mounted vnd0a (/mnt) - that completed without error.
> What's more, after that, tar had no problem reading the DVD either!

It is unlikely that the DVD delivers varying data blocks
from the ISO 9660 directory files, which would lure cd9660_bmap()
into requesting invalid data block addresses for file content.

My favority suspect would now be the code underneath bread(9)
and especially the mechanism which looks up the buffer and
decides whether a physical read operation on the DVD is needed.


> If anyone has any suggestiions for
> possible sources, I'll happily mangle my kernel

How about having a wrapper around bread(9), which checks for
error replies of bread(9) and eventually prints some message
which tells the failed block number.
Then use that wrapper instead of all the bread(9) calls in cd9660.
(Should be possible with a few vi commands on the few cd9660*.c
 files.)
This would make clear whether the problem is indeed underneath
bread(9) and whether it gets an implausible block number from
cd9660.

If the block number of a failure is in the size range of the ISO
image, the its content could be looked up by help of dd and be made
human readable by od or alike. Directory records show some
characteristic redundancy. So there is hope we can tell whether
it is metadata or data file content.
If the content is inconclusive, then i'd need the whole ISO
image in order to tell what the affected block shall mean.


Alternatively to an inspection of the ISO image, or if the block
number is implausible, you could equip the wrapper and its calls
with a string parameter which identifies the wrapper's caller.
So we could make a connection to particular cd9660 operations
and see whether it is always the same cd9660 gesture failing.


> ps: I can make the 2.1GiB .iso image available,

If you have an image which never shows problems and one that
reliably shows problems (e.g. on the first tar) then we could
look at the block addresses of the data files and directory
files which are involved.

As said, it would be interesting to look up the blocks of
failing bread(9) calls.


Have a nice day :)

Thomas



Re: ffsv2 extattr support

2014-06-20 Thread Thomas Schmitt
Hi,

> > -   val_len = strlen(argv[0]) + 1;
> > -   mkbuf(&buf, &buflen, val_len);
> > +   val_len = strlen(argv[0]);
> > +   mkbuf(&buf, &buflen, val_len + 1);

Emmanuel Dreyfus:
> I committed this.

On the risk to be obtrusive, how about in line 409:

-   fwrite(buf, buflen, 1, stdout);
+   fwrite(buf, error, 1, stdout);

"buflen" is really the wrong variable. "error" is only a misnomer.

The negative effect is that long attribute content shows up in the
content of the next file with shorter content

  $ getextattr user bla /mnt/ffs1/[ax]
  /mnt/ffs1/a 
  /mnt/ffs1/x xblaAAA

The code path which does

  strvisx(visbuf, buf, error, flag_vis);

shows correct length:

  $ getextattr -s user bla /mnt/ffs1/[ax]
  /mnt/ffs1/a "\000"
  /mnt/ffs1/x "xbla\000"


Have a nice day :)

Thomas



Re: LG CRD-8322B does not work

2014-06-20 Thread Thomas Schmitt
Hi,

> I have LG CRD-8322B CD-ROM drive that is shipped with Sun Ultra 5.
> cd0 at atapibus0 drive 0:  cdrom removable
> I have received some comments.
> my problem may be caused from bad CD-R, or my CRD-8322B is broken.
> I believe "CRD-8322B, 1998/09/24, 1.05" has problem.

Old enough to have a proprietary command set. But above line
and also the internet say it has IDE/ATAPI connection. So it
might already be governed by SCSI/MMC.

In this case it might be worth to watch a program like cdrecord or
xorriso trying to inspect the loaded medium while their SCSI command
log is enabled:

  cdrecord -v -V dev=...?... -msinfo \
2>&1 | tee -i /tmp/cdrecord_log

or

  xorriso -scsi_log on -report_about debug -indev /dev/rcd0d -toc \
2>&1 | tee -i /tmp/xorriso_log

The logs will tell commands out of SCSI volumes SPC, SBC, MMC,
and the reaction of the drive. (I did not test the cdrecord command.)
Expect several hundred lines of text.


> atabus1 at cmdide0 channel 1
> ...
> atapibus0 at atabus1: 2 targets
> cd0 at atapibus0 drive 0:  cdrom removable
> ...
> wdc_atapi_intr: bad data phase DATAOUT
> cmdide0:1:0: reset failed
> cmdide0:1:0: unbusy timed out

This looks rather like a bus problem than like firmware
or mechano-optical drive component.

> I have another Sun Ultra 5 that has same configuration

But two identical bus controller failures ?

Maybe the IDE/ATAPI driver has a problem with that hardware ?


Have a nice day :)

Thomas



Re: ffsv2 extattr support

2014-06-20 Thread Thomas Schmitt
Hi,

Emmanuel Dreyfus:

> IIRC, we must store a trailing 0 because of an oddity in the FreeBSD API.

It causes a slight but possibly painful incompatibility to other
systems. An appended 0 might have a meaning in some special case.
So one should not flood the "user" namespace with them.

What API/ABI might be affected ?


I believe to see that the current way in getextattr.c is wrong
if option -n sets variable "flag_null" to 1.

Pity there is no valgrind to prove that this is bad memory usage:

mkbuf(&buf, &buflen, val_len);
...
error = extattr_set_file(argv[arg_counter],
attrnamespace, attrname, buf,
val_len + flag_null);

In any case, the result proves that flag_null was indeed 1:

  $ echo c >/mnt/ffs1/c
  $ setextattr -n user bla c_with_n /mnt/ffs1/c
  $ getextattr -s user bla /mnt/ffs1/c
  /mnt/ffs1/c "c_with_n\000\000"


> How is it done in FreeBSD sources?

Without unconditionally adding 1 to strlen().

  
http://svnweb.freebsd.org/base/head/usr.sbin/extattr/rmextattr.c?revision=248995&view=markup#l186

if (what == EASET) {
mkbuf(&buf, &buflen, strlen(argv[0]) + 1);
strcpy(buf, argv[0]);
argc--; argv++;
}

Line 205:
len = strlen(buf) + flag_null;

Line 210:
ret = extattr_set_file(argv[arg_counter],
attrnamespace, attrname, buf, len);


Have a nice day :)

Thomas



Re: ffsv2 extattr support

2014-06-20 Thread Thomas Schmitt
Hi,

me:
> > This 0-byte seems to be a bug-or-feature of setextattr(1).

Justin Cormack:
> This looks like a bug to me, attribute values need to be binary.

Probable cause is in getextattr.c line 314:

val_len = strlen(argv[0]) + 1;

with effect in line 338:
error = extattr_set_file(argv[arg_counter],
attrnamespace, attrname, buf,
val_len + flag_null);

I would do

-   val_len = strlen(argv[0]) + 1;
-   mkbuf(&buf, &buflen, val_len);
+   val_len = strlen(argv[0]);
+   mkbuf(&buf, &buflen, val_len + 1);

At least this helps here:

  $ cd /usr/src/usr.bin/extattr
  $ ln -s getextattr setextattr
  $ echo b > /mnt/ffs1/b
  $ ./setextattr user bla B /mnt/ffs1/b
  $ xorriso -for_backup -outdev stdio:/dev/null -map /mnt/ffs1/b /b -getfattr 
/b -- -rollback_end
  ...
  user.bla="B"

No zeros. No ambiguity. No visible problems with ./getextattr:

  $ ./getextattr -f user bla  /mnt/ffs1/*
  /mnt/ffs1/a 
  /mnt/ffs1/b B
  /mnt/ffs1/x xbla
  getextattr: /mnt/ffs1/xorriso_extracted: failed: No message available
  /mnt/ffs1/y YYYbla


Have a nice day :)

Thomas



Re: ffsv2 extattr support

2014-06-20 Thread Thomas Schmitt
Hi,

one point for man 1 getextattr and against me:

The abort on the first file without attributes could be a feature,
if one is willing to count a non-existing requested attribute as
an error.
Option -f overrides this early abort. So my change proposal about
line 376 is not needed. (It was not complete anyway.)

-

One could discuss whether a missing attribute is really an error.
Linux, which has more experience with using xattr, only issues
warnings but does not indicate error:

  $ getfattr -n user.uerg x x
  x: user.uerg: No such attribute
  y: user.uerg: No such attribute
  $ echo $?
  0

In contrast to a missing file:

  $ getfattr -n user.uerg xyz x
  getfattr: xyz: No such file or directory
  x: user.uerg: No such attribute
  $ echo $?
  1


Have a nice day :)

Thomas



Re: ffsv2 extattr support

2014-06-20 Thread Thomas Schmitt
Hi,

my development snapshot of xorriso is now able to record and
restore extattr on NetBSD.

Main work was to make it tolerate the error ENOENT that is
thrown by extattr_list_link(2) if the filesystem does not
support extattr.
Else it is much like FreeBSD. I will have to compare the
behavior of both in more detail.

I think getextattr(1) needs mending.
See at end of this mail: "getextattr(1) has problems"

---

One peculiarity shows up:

extattr_get_link(2) et.al. return the attribute content with
a trailing 0-byte.
This 0-byte seems to be a bug-or-feature of setextattr(1).

I overwrote an extattr in the ISO filesystem by xorriso means
without a trailing 0-byte. Then i extracted it into the FFSv1
filesystem and recorded it by xorriso again. No 0-byte appeared.
So the extattr_*(2) functions and xorriso are not the ones
who created those 0s.

Neither FreeBSD nor Linux show trailing 0-bytes with their
cleartext extattr/xattr.

I care for portability of attributes in namespace "user".
So it would be interesting to know whether the convention on
NetBSD is to have a 0-byte at the end of attribute content.
extattr(9) specifies "nul-terminated character string" for
names. But content is usually regarded as binary data, governed
by the length value and not by a terminating character.

If the 0-byte is convention, then i would consider to strip
it when recording and to add it when restoring.
But that would be feasible only if namespace "user" is reserved
for C character strings rather than byte arrays.


---
Demo with current development snapshot
  http://scdbackup.webframe.org/xorriso-1.3.7.tar.gz
  MD5 117954392e6359d5be8a2fcc0a965447
  Version timestamp :  2014.06.20.065402
  (in xorriso/xorriso_timestamp.h or in output of xorriso -version)


# Create files with extattr in suitable filesystem /mnt/ffs1

  echo x > /mnt/ffs1/x
  echo y > /mnt/ffs1/y
  setextattr user bla xbla /mnt/ffs1/x
  setextattr user bla YYYbla /mnt/ffs1/y

  getextattr user bla /mnt/ffs1/x
  getextattr user bla /mnt/ffs1/y
# yields:
#   /mnt/ffs1/x xbla
#   /mnt/ffs1/y YYYbla

# Record in an ISO 9660 filesystem image (/tmp/extattr_test.iso),
# or in disk device (/dev/wd2f), or on optical medium (/dev/rcd0d).
# I avoid the directory /mnt/ffs1/.attributes which is not readable
# for the common user.

  drive=/tmp/extattr_test.iso

  xorriso \
-for_backup \
-outdev "$drive" \
-blank as_needed \
-add /mnt/ffs1/* --

# Inspect ISO

  xorriso \
-for_backup \
-indev "$drive" \
-getfattr_r / --

# yields:
#   # file: mnt/ffs1/x
#   user.bla="xbla\000"
#   # file: mnt/ffs1/y
#   user.bla="YYYbla\000"

# Restore /mnt/ffs1 from ISO into a new directory /mnt/ffs1/xorriso_extracted

  xorriso  \
-for_backup \
-indev "$drive" \
-osirrox on \
-extract /mnt/ffs1 /mnt/ffs1/xorriso_extracted

# Inspect in FFSv1

  getextattr user bla /mnt/ffs1/xorriso_extracted/*

# yields:
#   /mnt/ffs1/xorriso_extracted/x   xbla
#   /mnt/ffs1/xorriso_extracted/y   YYYbla



getextattr(1) has problems:


  echo a > /mnt/ffs1/a
  setextattr user bla  /mnt/ffs1/a

  getextattr user bla /mnt/ffs1/*
# yields:
#   /mnt/ffs1/a 
#   /mnt/ffs1/x xblaAAA
#   getextattr: /mnt/ffs1/xorriso_extracted: failed: No message available
# Note that the content of ./x is shown mangled and ./y is not listed.

  getextattr user bla /mnt/ffs1/x
# yields:
#   /mnt/ffs1/x xbla

  getextattr user bla /mnt/ffs1/y
# yields:
#   /mnt/ffs1/y YYYbla

# Note that all three files were treated with setextattr(1)
# and not created by xorriso. I am not to blame (TM).



The problem with the mangled values is in
  /usr/src/usr.bin/extattr/extattr.c

In line 370 it asks for the size of the attribute

if (flag_nofollow)
error = extattr_get_link(argv[arg_counter],
attrnamespace, attrname, NULL, 0);
else
error = extattr_get_file(argv[arg_counter],
attrnamespace, attrname, NULL, 0);
if (error < 0)
break;

submits it to the local function

mkbuf(&buf, &buflen, error);

which leaves buflen unchanged if the new demand "error" is smaller
than the previous one. 

In line 409 it uses buflen as the length of the value

fwrite(buf, buflen, 1, stdout);

The small patch would be

-   fwrite(buf, buflen, 1, stdout);
+   fwrite(bu

Re: ffsv2 extattr support

2014-06-19 Thread Thomas Schmitt
Hi,

> you need to rebuild a kernel with the following options:
> options UFS_EXTATTR
> ...

Now it seems to work.
  netbsdcur# getextattr user bla /mnt/ffs1/x
  /mnt/ffs1/x blablabla

I will report as soon as i can offer backup for NetBSD extattr.


Have a nice day :)

Thomas



Re: ffsv2 extattr support

2014-06-19 Thread Thomas Schmitt
Hi,

my NetBSD 6.99.44 seems to be complete now. From daily build ISO+FTP
and then from cvs checkout -A of yesterday. Guided by
  http://www.netbsd.org/docs/guide/en/chap-fetch.html#chap-fetch-cvs
  http://www.netbsd.org/docs/guide/en/chap-updating.html

But i had no success with getting an FFSv1 filesystem with extattr
support. In part it seems to believe to be able, but the real
action does not work.

---

netbsdcur# disklabel -i -I wd1
partition>N
Label name [fictitious]: ffs1test
partition>W
Label disk [n]?y
Label written
partition>E
# /dev/rwd1d:
type: ESDI
disk: QEMU HARDDISK   
label: ffs1test
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 16
sectors/cylinder: 1008
cylinders: 260
total sectors: 262144
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0   # microseconds
track-to-track seek: 0  # microseconds
drivedata: 0 

4 partitions:
#sizeoffset fstype [fsize bsize cpg/sgs]
 a:262144 0 4.2BSD  0 0 0  # (Cyl.  0 -260*)
 d:262144 0 unused  0 0# (Cyl.  0 -260*)
partition>Q

--

netbsdcur# newfs -O 1 /dev/rwd1a
/dev/rwd1a: 128.0MB (262144 sectors) block size 8192, fragment size 1024
using 4 cylinder groups of 32.00MB, 4096 blks, 7936 inodes.
super-block backups (for fsck_ffs -b #) at:
32, 65568, 131104, 196640,

--

netbsdcur# mkdir /mnt/ffs1
netbsdcur# mount /dev/wd1a /mnt/ffs1
netbsdcur# cd /mnt/ffs1
netbsdcur# mkdir -p .attribute/system .attribute/user
netbsdcur# cd ..
netbsdcur# umount /mnt/ffs1

netbsdcur# mount -o extattr /dev/wd1a /mnt/ffs1
mount_ffs: /dev/wd1a on /mnt/ffs1: Operation not supported

netbsdcur# mount
/dev/wd0a on / type ffs (local)
...
/dev/wd1a on /mnt/ffs1 type ffs (extattr, local)

--

netbsdcur# echo xyz > /mnt/ffs1/x

netbsdcur# getextattr system bla /mnt/ffs1/x
getextattr: /mnt/ffs1/x: failed: Operation not supported

netbsdcur# setextattr user bla blablabla /mnt/ffs1/x
setextattr: /mnt/ffs1/x: failed: Operation not supported

netbsdcur# getextattr system bla /mnt/ffs1/x
getextattr: /mnt/ffs1/x: failed: Operation not supported

--

But statvfs(2) as well as mount(8) indicate that extattr is enabled:

int main(int argc, char **argv)
{
 int ret;
 struct statvfs buf;

 ret = statvfs("/mnt/ffs1/x", &buf);
 if (ret == 0) 
   printf("MNT_EXTATTR in statvfs.f_flag : %d\n", buf.f_flag & MNT_EXTATTR);
 exit(!!ret);
}

yields:

  MNT_EXTATTR in statvfs.f_flag : 16777216

---

Have a nice day :)

Thomas



Re: ffsv2 extattr support

2014-06-18 Thread Thomas Schmitt
Hi,

Emmanuel Dreyfus :
> format as FFSv1

That would be newfs -O 1 ? After disklabel ? (I will start with an
empty qemu disk.)
Up to now, it was always the installation software which formatted
my NetBSD disks. Let's hope the manual will guide me through this.


> ...
> mount -o extattr

I will try to achieve this when i got my new 6.99.44 installation
complete.

Meanwhile i have tested the FreeBSD-specific code with NetBSD
on a default filesystem (FFSv2 ?).
It works so far, but the reaction on non-extattr filesystems is
too harsh. I'll have to better distinguish filesystems without
extattr support from read problems with individual files.

Is there a global indication (in userspace) whether a mounted
filesystem supports extattr ? statvfs(2) seems not to tell. 


Have a nice day :)

Thomas



Re: ffsv2 extattr support

2014-06-18 Thread Thomas Schmitt
Hi,

> Mark Davies:
> >   back in Jan 2012 there was some discussion about ffsv2 extattr 
> > support initiated by Manuel Bouyer.  What is the current state of 
> > that?  Is there anything in Current?

Emmanuel Dreyfus:
> AFAIK extattr are only availabe in FFSv1, and userland tools sur as
> dump/restore/tar lack support for it, which means we cannot backup them.

In principle, my xorriso would be able to record and to restore
extattr and ACLs in ISO 9660 filesystems as SUSP entries. Currently
it is lacking the NetBSD functions to read and write them from
and to other filesystems.

I see there is a man page for extattr_get_file(2) et.al.
A FreeBSD implementation of the extattr interface is already existing
in libisofs. For a test i hacked the selection of conditional code and
it does at least compile on NetBSD.

So i would next need some instructions how to create a NetBSD
filesystem with extattr in order to verify that xorriso
properly records the attributes in its ISO 9660 filesystems
(as SUSP entries).
For restoring with extattr, one would have to employ xorriso
or implement AAIP in sys/fs/cd9660:
  http://libburnia-project.org/wiki/AAIP


Do i get it right that there is no ACL support in NetBSD ?
(At least there is no man page for acl_get_file().)


Have a nice day :)

Thomas



Re: How to keep the kernel from crashing on cd9660 error ?

2014-06-04 Thread Thomas Schmitt
Hi,

> EOPNOTSUPP doesn't look right here ...

The file is complying to ISO 9660 specs but indigestible
for any reader that relies on block-to-block mapping.
One would need byte-to-byte mapping to support that file.
NetBSD cannot, and it seems Linux isofs can't either.

Do you have an idea for a better errno in such a case ?


> The upcoming conversion to vcache should make it easier to deal with
> these problems.  I will add some priority to the conversion of cd9660.

Priority for cd9660 is highly appreciated, of course.

I have pending kern/48808 for mount option -s, and i am fine-tuning
the proposal for large data files.
So we would need to coordinate.

It would be helpful, if a test could be made with the error
generator code in my initial mail of this thread.
I have looked into other fs drivers of NetBSD meanwhile.
They all do similar error handling.

So i have to consider the possibility that only my NetBSD
copy is faulty.
The bug fixes, which i proposed and got comitted, can hardly
be to blame for any memory problem.

All is well, as long as i do not want to bail out after having
allocated the vnode. But the kernel becomes a mine field if
i do not follow the code path up to the end of function
cd9660_vget_internal().
Does this ring a bell ?


Have a nice day :)

Thomas



How to keep the kernel from crashing on cd9660 error ?

2014-06-03 Thread Thomas Schmitt
Hi,

i could need advise about getnewvnode(9) and how to revoke
the creation of the vnode.

While testing my next change proposal for stability with
undigestible ISO 9660 files, i experienced kernel crashes which
look like memory corruption.

To prove that my changes are not to blame, i installed a little
error generator in the current cd9660_vfsops.c, at the place
where my new code will throw EOPNOTSUPP because of an undigestible
file.

It triggers the same crash as the real error complaint in my
changed code. So the problem already sits in cd9660.

I could possibly fake an ISO image which would trigger an error
condition that is already in function cd9660_vget_internal() and
very near to the spot where my test causes havoc.

So this could be a DoS attack path.

---

What happens in cd9660_vget_internal() is about this:

- Input is the inode number.

- Shortcut is tried for cached vnode. No problem if it triggers.

- getnewvnode() obtains a new vnode.
  (It is needed at latest, when the directory record of the desired
   ino number shall be read. So this creation cannot be delayed after
   the error situation which is triggered by that record.)

- pool_get() obtains a iso_node for the new vnode.

- Obviously a check for race condition is made. (No problem.)

- Several operations are done which have the potential to cause
  an error. Most of them do in this case

vput(vp);
if (bp != 0)
brelse(bp, 0);
return (E...);

  So wanted i. But that seems to be a bad idea.

My mock-up in current cd9660_vfsops.c throws an error with every
third VOP_LOOKUP(9) or VOP_VGET(9) call.
It survives the first such error occasion and crashes on the
second occasion.
---
--- cd9660_vfsops.c.patch_006   2014-06-01 13:16:27.0 +
+++ cd9660_vfsops.c 2014-06-03 15:47:32.0 +
@@ -858,6 +858,19 @@ cd9660_vget_internal(struct mount *mp, i
break;
}
 
+/* <<< Error mock-up */
+{ static uint64_t error_cycler = 0;
+   error_cycler++;
+   if ((error_cycler % 3) == 0) {
+   printf("cd9660_vfsops.c: Deliberate error EOPNOTSUPP\n");
+   vput(vp);
+   if (bp != 0)
+   brelse(bp, 0);
+   return (EOPNOTSUPP);
+   }
+}
+
+
if (bp != 0)
brelse(bp, 0);
---
(I am aware there is a resource leak about iso_node.)

With this kernel booted, i do

  netbsd# mount_cd9660  '/dev/wd1f' '/mnt/iso'
  netbsd# ls -l /mnt/iso
  ls: my: Operation not supported
  total 8
  dr-x--  1 thomas  wheel  2048 May  3 14:58 dev
  dr-x--  1 thomas  wheel  2048 Jan 19 14:41 reg
  -r  1 thomas  dbus  6 May  6 15:34 small_file
  netbsd# ls -l /mnt/iso

This yields crash and reboot.

  netbsd# crash
  crash> dmesg
  ...
  cd9660_vfsops.c: Deliberate error EOPNOTSUPP
  panic: kernel diagnostic assertion "(*vpp)->v_size != VSIZENOTSET && 
(*vpp)->v_writesize != VSIZENOTSET" failed: file 
"/usr/src/sys/kern/vnode_if.c", line 124
cpu0: Begin traceback...
  vpanic(c2f5d840,c26a8800,2,0,daabdf68,c093e072,c2fb4d40,ff9c,bb90a3f4,0) 
at netbsd:vpanic+0x120
  cpu0: End traceback...

I only see the message of the first occasion. The second one did
not come through. But i am quite sure a second one happened.
At least i had to "ls -l" my bad file two times, before i began
to worsen the situation by adding diagnostic code.

---

What makes me think of memory corruption:

- Varying last screams in crash command "dmesg", when i tried to hunt
  down the problem in my changed code.

- Unplausible code paths. E.g. above KASSERT in
/usr/src/sys/kern/vnode_if.c
  is supposed to get in effect with error == 0, but triggers
  only if cd9660 is supposed to have returned error != 0.

- Symptoms getting worse if i insert printf() to trace the
  upward propagation of the error return value.
  It crashes already on the first error occasion and with more
  dramatic messages in crash's dmesg:

uvm_fault(0xc2a2a920, 0, 2) -> 0xe
fatal page fault in supervisor mode
trap type 6 code 2 eip c02579d1 cs 8 eflags 10282 cr2 1c ilevel 0 esp 
c0943d3d
curlwp 0xc2fb5d40 pid 815 lid 1 lowest kstack 0xda96b2c0
panic: trap
cpu0: Begin traceback...
uvm_fault(0xc2a2a920, 0, 1) -> 0xe
fatal page fault in supervisor mode
trap type 6 code 0 eip c029e324 cs 8 eflags 10246 cr2 6 ilevel 0 esp 0
curlwp 0xc2fb5d40 pid 815 lid 1 lowest kstack 0xda96b2c0
Skipping crash dump on recursive panic
panic: trap
Faulted in mid-traceback; aborting...

---

My question is: How i shall repair this function, so that it
can revoke the creation of the vno

Re: Lockless IP input queue, the pktqueue interface

2014-05-30 Thread Thomas Schmitt
Hi,

Thor Lancelot Simon:
> Indeed, I note that over in tech-kern there is a long running thread
> in which a user, trying to debug a problem with NetBSD, complains that
> internals of the cd9660 implementation are *not* properly hidden

Urm, i did not complain but asked about the API/ABI rank of
those .h files in /usr/include.

And i am not really a user but rather an upstream programmer of an
ISO 9660 production program, who wants a decent mount environment
for the results of that program. Results so far:
1 user aquired (actually for Xfburn).
3 own packages promoted from wip to pkgsrc.
2 kernel kernel bug fixes and 2 userland bug fixes are committed.
1 bug pending: kern/48815, 1 enhancement pending: kern/48808.
1 obtrusive change proposal to come for supporting large data files.
  (It is being tested meanwhile.)


> He seems shocked
> that we do not go (did not go) much farther to hide implementation
> details

Well, i was surprised to see obvious kernel entrails published.
So i asked for instructions about how much care to take with
changes.

Meanwhile i learned that
  /usr/src/usr.sbin/makefs
is compiling
  /usr/src/sys/fs/udf/udf_osta.c
and
  /usr/src/usr.sbin/mtree/spec.c

So strict encapsulation seems to have never been enforced.

On the other hand, as system-neutral userland programmer, i would
not dig in the  files in order to find some
kernel interface on which i could daringly rely.
I rather have to take care to keep any system dependency in
special adapter modules, so that my code stays portable.

Still not decided is whether i shall keep <.../cd9660_node.h>
API-compatible in my change to come.
It will cost sizeof(void *) in each ISO 9660 vnode, but on the
other hand it saves the work to find any includer or copier of 
the files which i propose to change.

There are such "friends" of cd9660 and i can hardly propose
to break them without providing fixes.
But i already see problems with getting reviews for 48808 and
for my change-to-come. And most of the known affected programs
cannot easily be linked here because of obvious libc
incompatibilities between CVS and NetBSD-6.1.3.
So i could test only somewhat hacked versions.


Note well: I don't complain. It's a do-ocracy.


Have a nice day :)

Thomas



Re: API/ABI rank of headers in /usr/include/isofs/cd9660

2014-05-29 Thread Thomas Schmitt
Hi,

David Holland:
> at least in the case of ufs (dunno about cd9660) the
> definitions are all mixed together.

In cd9660 it is the full spectrum:

 What i deem legit userland API

cd9660_mount.h : struct iso_args and its flags for mount(2).

 Somewhat exotic userland API supplements

iso.h  : Constants and structs describing the byte layout of
 ISO 9660 storage entities. Educational and re-usable
 because supposed to stay absolutely stable.

cd9660_rrip.h  : About the same for ISO 9660 add-on SUSP/RRIP (Rock Ridge).

 API for binaries intimately related to the kernel

cd9660_node.h  : In the non _KERNEL part it exposes the filesystem specific
 struct iso_node which serves as vnode.v_data:
 - struct iso_node points to the ISO 9660 directory record
   on disk.
   I did not yet decide whether i will propose to break
   this API aspect. Currently i tend to not breaking it.
 - type ISO_RRIP_INODE representing RRIP POSIX attributes.
 - Macro to retrieve iso_node from vnode.
 - Macro to retrieve vnode from iso_node.
 - a riddling flag bit IN_ACCESS which gets set but
   not inquired in cd9660
 - two riddling macros i_forw, i_back which are unused
   in cd9660.
 - To come (in order to keep that API fully usable after
   my changes to cd9660 and especially iso_node):
   - Macros to tell presence of multi-extent code in
 cd9660 and to tell its limitations.
   - A description of address and section count encoding
 in ino_t.
   - Macros to retrieve address and section count.
   - Macro which defines a C function which can tell
 the overall byte count of the data file represented
 by an iso_node.
 To be implanted and called in programs which cannot
 keep themselves from groping through VFS.

cd9660_extern.h: In non _KERNEL area there are a few entrails of the
 filesystem driver, and the struct iso_mnt which serves
 underneath struct mount as .mnt_data.
 Programs which peek into iso_node might be interested
 in hopping further to iso_mnt.

 Internal headers

iso_rrip.h : Internal definitions for the RRIP interpreter.
 No _KERNEL area.
 This interpreter is unreachable from userland, afaics.

--

The cleanup effect of removing only iso_rrip.h would be tiny.

A more radical approach would be to demand that the intimately kernel
related binaries include headers from /usr/src/sys/fs/cd9660 rather
than from /usr/include/isofs/cd9660.
This would make clear that they are depending on inofficial kernel
entrails.


Have a nice day :)

Thomas



Re: API/ABI rank of headers in /usr/include/isofs/cd9660

2014-05-28 Thread Thomas Schmitt
Hi,

[skipping CVS discussion aspects]

matthew green:
> all we have to do is, like this case, simply not install them.

But i had to learn that this would break fstat(1) and pmap(1).
So i do not dare to uphold my proposal to remove the semi-private
header files.


In the other thread about the particular changes for large
file support in cd9660, the opinion seems to be that both programs
and any alikes (lsof ?) get no warranty of API stability with
these headers.

Nevertheless, nobody stated that a tiny increase in the memory
consumption of a vnode would be a problem. (I already had to go
half the way to ensure ABI compatibility.)

In this case i can keep up API/ABI compatibility for quite a low
price. Old includers will of course only see the first file section,
unless they enhance their API usage to the new capabilities. But
nothing will break or be worse than before the change.

So currently, i am testing this variant.
(I will be ready to follow any decision, if stronger opinions
 appear.)


Have a nice day :)

Thomas



Re: How to go on with ISO 9660 large file support ?

2014-05-27 Thread Thomas Schmitt
Hi,

Christos Zoulas:
> I'd say it is not critical that those work if we have to run through
> hoops to achieve compatibility. I.e. clean code would be my priority.

The fully API compatible code is slightly cleaner than the
only ABI compatible code, which saves sizeof(void *) with
every vnode/iso_node.

The main advantage of the API-compatible-wasteful variant
is that kmem_free() is a bit less frightening. (Actually
fully non-scary would be malloc(9)/free(9).)

if (ip->iso_sections != NULL)
kmem_free(ip->iso_sections,
  CD9660_FSECT_FROM_INO(ip->i_number)
  * sizeof(struct iso_file_section));
ip->iso_sections = NULL;

versus

if (CD9660_FSECT_FROM_INO(ip->i_number) > 1)
kmem_free(ip->iso_fsect.many,
  CD9660_FSECT_FROM_INO(ip->i_number)
  * sizeof(struct iso_file_section));
iso_set_ino_nfsect(&ip->i_number, 1);


Another advantage of API-compatible-wasteful is that there will
be no new union defined in cd9660_node.h

--
API compatible struct iso_node (freestyle diff):

+ struct iso_file_section {
+   unsigned long isofsc_size;  /* byte count */
+   unsigned long isofsc_start; /* block address */
+ };
+
  struct iso_node {
...
-   unsigned long iso_extent;   /* extent of file */
-   unsigned long i_size;
-   unsigned long iso_start;/* actual start of data of file 
(may be different */
+   /* These three numbers represent the
+* first extent and file section.
+* If CD9660_FSECT_FROM_INO(.i_number)
+* is larger than 1, then .iso_sections
+* is valid.
+*/
+   unsigned long iso_extent;   /* will not be used by cd9660 */
+   unsigned long i_size;   /* byte count */
+   unsigned long iso_start;/* block address */
...
+   struct iso_file_section *iso_sections;  /* Holds info about all file
+* sections if
+* CD9660_FSECT_FROM_INO(
+*   .i_number)
+* is larger than 1.
+*/
  };

--
ABI-only compatible:

+ struct iso_file_section {
+   unsigned long isofsc_extent;/* will not be used by cd9660 */
+   unsigned long isofsc_size;  /* byte count */
+   unsigned long isofsc_start; /* block address */
+ };
+ union iso_file_data {
+   struct iso_file_section single; /* used if only one extent exists:
+* CD9660_FSECT_FROM_INO(.i_number)==1
+*/
+   struct iso_file_section *many;  /* allocated with enough array elements
+* for CD9660_FSECT_FROM_INO(.i_number)
+* if this is larger than 1.
+*/
+ };
+
  struct iso_node {
...
-   unsigned long iso_extent;   /* extent of file */
-   unsigned long i_size;
-   unsigned long iso_start;/* actual start of data of file 
(may be different */
+   union iso_file_data iso_fsect;  /* File sections. Their number is given
+* by CD9660_FSECT_FROM_INO(.i_number)
+*/
...
  };

--

Have a nice day :)

Thomas



Re: How to go on with ISO 9660 large file support ?

2014-05-27 Thread Thomas Schmitt
Hi,

> New code should use kmem(9) for variable-sized or one-time
> allocations,

Currently it is running on kmem(9).
There is old malloc(9) usage in cd9660 which i tried to mimic
until i came to the deprecation statement in the man page.

Shall i change the old malloc(9) usage, too ?
Or in a later PR, maybe ?


> For the most part, struct iso_node is private to the kernel
> implementation of cd9660.  

Regrettably there is a #ifdef _KERNEL in cd9660_node.h
and struct iso_node is defined outside of that area.


> Since fstat(1) and pmap(1) use it, you
> should avoid changing the offsets within the structure of the members
> that fstat(1) and pmap(1) use,

I hope to have achieved that on all non-128-bit machine types by:

  struct iso_file_section {
unsigned long isofsc_extent;/* will not be used by cd9660 */
unsigned long isofsc_size;  /* byte count */
unsigned long isofsc_start; /* block address */
  };
  union iso_file_data {
struct iso_file_section single; /* used if only one extent exists:
 * CD9660_FSECT_FROM_INO(.i_number)==1
 */
struct iso_file_section *many;  /* allocated with enough array elements
 * for CD9660_FSECT_FROM_INO(.i_number)
 * if this is larger than 1.
 */
  };
  ...
  struct iso_node {
...
  #ifdef CD9660_MULTI_EXTENT
union iso_file_data iso_fsect;  /* File sections. Their number is given
 * by CD9660_FSECT_FROM_INO(.i_number)
 */
  #else
unsigned long iso_extent;   /* extent of file */
unsigned long i_size;
unsigned long iso_start;/* actual start of data of file 
(may be different */
/* from iso_extent, if file has extended 
attributes) */
  #endif /* ! CD9660_MULTI_EXTENT */
...
  };


> Some day, we ought to find a better way to do what fstat(1) and
> pmap(1) are doing with kernel guts.

At least they should make sure that the binary and the kernel
are compatible. As is done with dynamic linking and .so numbers.


> It would be nice if you could integrate your tests into the atf file
> system tests under src/tests/fs so that they get run automatically.

I promise to learn what's needed and to finally provide such
a test. As said, currently it is neither complete nor easily
portable to a different computer.


> This sounds bad (but not at all surprising).  If the `interesting
> effects' are anything more than harmless messages printed to the
> console and unreadable images -- e.g., programs crash, kernel panics
> or discloses private memory to userland, &c. -- it would be nice if
> you could identify little fixes or at least early checks that can be
> easily back-ported.

It's inbetween: Confusing to userland because file names get shown
twice or more, bear all the same attributes and content, although
the causing files do differ. That's why i propose to drop duplicate
names in favor of a probably best and youngest survivor.

No crashes yet.
The NetBSD-6.1.3 bug of kern/48787 is reported and fixed.

PR 48815 is still open. It shows behavior in the above confusion
class. I.e. ISO files hardly usable but rest of system still healthy.


> > The hardest reason why this information has to be encoded in ino_t
> > is the desire to implement method VFS_VGET(9).

> NFS uses VFS_VGET, so it would be good to continue to support that.

Isn't it rather using VFS_FHTOVP(9) ?
Currently its implementation cd9660_fhtovp() indeed uses VFS_VGET(),
but it could as well use other means to identify an iso_node and
its boss vnode.

The address encoding in ino_t would be a short-sighted hack, if not
NetBSD was far-sighted enough to have 64 bit ino_t.
It even provides enough bits to work around a shortcomming in the
list format of ISO 9660 directory records.

Nevertheless, after bin/48799 and applying my large-file proposal,
you will get to see things like this:

  netbsd# ls -li /mnt/iso/my
  total 8455812
  281479306324160 -r  1 thomas  dbus  4329375744 May  6 15:30 large_file

(Inode number in hex = 0x100010210a8c0 :
 A file with two sections. Directory records located above 4 GiB.
 Only 49 of 61 possible bits used.)


> But embedding an ino_t in struct iso_node is probably the easiest way
> to do it, and will avoid breaking fstat(1) and pmap(1).

A unique inode number is a must-have. But for that, ISO 9660
would not need 64 bit.

pmap(1) will not be affected by the API change, because it is
only interested in members of iso_node before the ones which
have to change.

fstat(1) is reading the size (of the first file section),
which still works well on i386 with the changed kernel,
but would fail to compile with the new cd9660_node.h.
(I plan to fix this, of course.)
It is 

Re: How to go on with ISO 9660 large file support ?

2014-05-27 Thread Thomas Schmitt
Hi,

Wolfgang Solfrank:
> I'm a bit scarce on round tuits right now,

The global tuits shortage crisis is of course a big obstacle.
Especially since ISO 9660 is essential when booting installation
CDs or DVDs. I plan to repack NetBSD-6.1.3-i386.iso with my
development kernel and to perform an installation.
(It is not nice of makefs to hide the El Torito boot image in
 blind space. No i have to guess its size.)

Currently i am looking for critical comments and questions about
the overall implementation concept.

A decision would be nice, whether API compatibility for cd9660_node.h
is worth an extra pointer in each iso_node (= vnode.v_data).
As stated, fstat(1) would be affected by an API break. On the other
hand, half of my CVS /usr/src seems to suffer from API problems with
the NetBSD-6.1.3 system installation.

Heavy testing makes sense only after this decision.


> RRIP doesn't have the concept of multiple versions of a file.

Yes. It looks funny if there is more than one ISO file version
which all have the same RRIP name. I tested this. My proposal
showed the best results among the tested systems.


> What 6.1.3 do you refer to here? 

NetBSD-6.1.3 i386. The host operating system is Debian 6 amd64.
My development subject from CVS refers to itself as NetBSD-6.99.40
(#190 meanwhile).


> Regarding associated files, I saw those used by Apple for their
> resource fork.  Albeit I haven't seen any ISOs with them recently.

Probably i will have to fake some by hand.
This implies the risk, though, that i got the specs wrong in both
cases. A real test case would be much more significant.

In general i would say that large files are much more common than
Associated Files, multiple Versions, or ISO 9660 Extended Attributes.
So even if there is a risk to break those, it still seems worth.


Have a nice day :)

Thomas



How to go on with ISO 9660 large file support ?

2014-05-27 Thread Thomas Schmitt
Hi,

i now have running support for large data files in cd9660.
Shall we first discuss it here, or shall i submit a large PR which
motivates the change, lengthily assesses the situation, presents the
model change, and proposes a patch ? 

Overview:

The goal is to let cd9660 recognize files with multiple file sections
and represent their multiple directory records as a single vnode with
a uniform byte space.

There shall be no duplicate filenames presented to VFS. If files with
equal names are found, then only the last one of those with the highest
ISO 9660 version number will be visible.
This warranty depends on properly sorted ISO 9660 directories.
The decision, which intervals of directory records form single files
depends on properly set ISO 9660 Multi-Extent flag bits.


The filesystem specific vnode.v_data struct iso_node needed a change
to represent the 1:n relation between file and file section.
This change caused adjustments all over the code of cd9660.
It makes nearly full use of the 64 bits of NetBSD's ino_t and
employs malloc(9) or kmem(9) memory for files with more than one
section. (Currently 12 bytes per file section.)

ABI compatibility of the changed struct iso_node is guaranteed for
systems with up to 96 bit pointer types.
The API of cd9660_node.h is not compatible, in my current
implementation. An alternative implementation is possible with
100 % API/ABI compatibility for single-section files.
It would cause 4 to 8 bytes size increase of struct iso_node.

fstat(1) and pmap(1) include .

Several implementations of interface methods are affected:

- cd9660_readdir() serving as VOP_READDIR(9)

  The case of mount -o norrip,nogens already used a delivery function
  with delayed file candidates: cd9660_vnops.c : iso_shipdir().
  Originally it only had the task to find the youngest version of a
  ISO 9660 data file. Now it also counts the follow-up directory
  records of the same file and skips over them.

- cd9660_lookup() as VOP_LOOKUP(9)

  mount -o norrip returned the last record of matching name,
  whereas -o rrip returned the first matching record.
  Now norrip,nogens with a healthy ISO 9660 filesystem drops only older
  versions of the same name.
  All three filesystem interpretation types now return the ino based
  on the byte address of first record of the winning file and on its
  number of file sections.

- cd9660_vget_internal() serving effectively as VFS_VGET(9)

  If the ino_t input parameter indicates a number of file sections larger
  than one, then the created iso_node gets malloced iso_node.iso_fsect.many
  as type M_ISOFSFSECT.
  The iso_node.i_number will indicate a file section count larger than 1
  only if such memory is attached to the iso_node and valid.

- cd9660_bmap() as VOP_BMAP(9)

  Nothing changes for files with a single file section.
  Those with more file sections need a loop to find the section which
  holds the desired block. Similar to the case of a single section, the
  last section will be base of the resulting block address, regardless
  whether its size includes the desired block.


The changes are tested by an ISO 9660 filesystem with large data file,
and examples for the Rock Ridge POSIX file types regular, directory,
block device, fifo, symbolic link. The test for block device
functionality makes it necessary to create it especially for a local
readable device file.
xorriso perception of Rock Ridge aspect:
  dr-x--1 1000 0   0 May  6 15:31 '/'
  dr-x--1 1000 0   0 May  3 14:58 '/dev'
  prw---1 1000 0   0 May 24 14:29 '/dev/test.fifo'
  br1 1000 50,12 May 14 14:33 '/dev/wd1e'
  dr-x--1 1000 10000 May  6 15:30 '/my'
  -r1 1000 1000 4329375744 May  6 15:30 '/my/large_file'
  dr-x--1 1000 0   0 Jan 19 14:41 '/reg'
  -r-x--1 1000 0  133411 Jan 19 14:41 '/reg/tar'
  lr-x--1 1000 0   0 May 24 14:29 '/reg/to_regfile' -> 
'tar'
  -r1 1000 10006 May  6 15:34 '/small_file'
A script is available for creating, mounting, and testing this
filesystem. It is not trivially portable to other computers because
there are several individual adaptions to be made.
Its result varies between
  ###
  # BAD TEST RESULTS: 8 #
  ###
and
  +++
  + All is well +
  +++
I will publish the script and be ready to support its conversion
into an automatic test of what can be tested in general.

There is also an ISO image emerging which (by hex editor) exposes
exotic or even illegal situations. 6.1.3 and the host operating system
show interesting effects when mounting it.


Remaining restrictions:

- ISO 9660 allows a file to be composed of multiple file sections
  with sizes which are not aligned to the filesystem block size.
  cd9660 demands that all but the last file section of a file mus

Re: Performance riddle: mounted fs much faster than dev

2014-05-26 Thread Thomas Schmitt
Hi,

David Holland:
> The short answer is: use rwd1f, not wd1f. 

I tried this already. Re-try on 6.99.40:

  netbsd# dd bs=2048 count=131072 if=/dev/rwd1f of=/dev/null
  131072+0 records in
  131072+0 records out
  268435456 bytes transferred in 36.982 secs (7258543 bytes/sec)

In any case the speed factor of about 15 to 20 cannot be explained
by any CPU effort. top(1) shows 33 % interrupt, and 23 % idle.

Reading the file out of the mounted fs shows 15 % interrupt and
0 % idle. I need to read the full 4+ GiB to get significant
values in top. It is awsome fast. And the MD5 is correct.
The file consists of a few non-zero bytes at start, a zillion zeros,
and a few non-zeros near the end.

On 6.1.3 i experienced faster throughput via /dev/rwd1f up to
the first hundred MB (watched by a pacifier filter named "raedchen"),
but the speed sank heavily afterwards.
It was never near the speed of the mounted file (on 6.99.40).


Don't get me wrong. I'm not complaining.
But i wonder what in cd9660 does the trick.
Is it the readahead information provided by cd9660_bmap() ?
(Good idea to base readahead on the file and not on the device.
 Could be a vaccination against the classic TAO CD readahead bug
 of my host OS.)

Where in /usr/src would i find code for reading ahead in order
to inject a printf() and watch what it is doing ?


Have a nice day :)

Thomas



Performance riddle: mounted fs much faster than dev

2014-05-26 Thread Thomas Schmitt
Hi,

while watching the thread about the performance in the newest
kernel, i experience my own performance riddle.

On a VM with 1 GiB reserved host RAM (which it does not exhaust)
and elsewise idle host system, i see poor performance of a virtual
DOS-MBR-partition as device file, while it delivers data with
satisfying speed from a file out of the mounted device.

  netbsd# dd bs=2048 count=131072 if=/dev/wd1f of=/dev/null
  131072+0 records out
  268435456 bytes transferred in 39.934 secs (6721977 bytes/sec)

Larger dd chunks do not really help (i see non-deterministic
speed variations of up 30 %):

  netbsd# dd bs=65536 count=4096 if=/dev/wd1f of=/dev/null
  4096+0 records out
  268435456 bytes transferred in 43.345 secs (6192997 bytes/sec)

But the mounted filesystem is fast (and correct):

  netbsd# mount_cd9660  /dev/wd1f  /mnt/iso
  netbsd# dd bs=2048 count=131072 if=/mnt/iso/my/large_file of=/dev/null
  268435456 bytes transferred in 1.957 secs (137166814 bytes/sec)

(Since yesterday my cd9660 can do this:
  4329375744 bytes transferred in 19.548 secs (221474101 bytes/sec)
:))

Device reading speed is low on 6.1.3 and 6.99.40 alike (can't
mount the filesystem on 6.1.3, though).

The host disk usually delivers 60 to 80 MB per second, if it does
not have to hop through a thick directory tree.
The host operating system (probably) sees the device as a heavily
sparse file. Both read intervals share a large number of blocks.
So the host system probably can deliver them out of RAM after
the first tests. Nevertheless, device reading shows no signs
of speed improvement if i repeat the same test.


What does cd9660 do so much better than dd or any other user space
program which accesses the device file ?


Have a nice day :)

Thomas



Re: API/ABI rank of headers in /usr/include/isofs/cd9660

2014-05-25 Thread Thomas Schmitt
Hi,

after learning the dire truth about includers of cd9660_node.h i am
pondering about ABI compatibility measures for the necessary
change which shall handle large files.

Alternative 1:
  Keep the old members and add some new members to the end of iso_node.
  Let the old members represent the first file section of the file.
  This is supposed to be absolutely API/ABI-safe for files with a single
  file section. (All other files are currently heavily misrepresented.
  Breaking their compatibility is not a real damage.)
  This alternative will at least waste 8 bytes per iso_node.

Alternative 2:
  Introduce a new struct iso_file_section with the same layout as the
  struct iso_node interval with the old members. Replace the old members
  by a union of struct iso_file_section and pointer to that struct.
  This breaks the API, but is supposed to be ABI-safe for machines with
  pointer size of up to 96 bits.
  It will waste at least 4 bytes per additional file section, which in
  normal ISO filesystem is supposed to happen very rarely.

I am currently working with alternative 2.
But if there is consensus to allow the storage waste of alternative 1,
then i could easily switch.

Please decide.

---
Necessities:

In cd9660_node.h : struct iso_node, i plan to replace

unsigned long iso_extent;
unsigned long i_size;
unsigned long iso_start;

effectively by

union iso_file_data iso_fsect;

which is minimally definded as

struct iso_file_section {
uint32_t isofsc_start;  /* block address */
uint32_t isofsc_size;   /* byte count */
};
union iso_file_data {
struct iso_file_section single;
struct iso_file_section *many;
};

This will allow to avoid malloc(9) with the vast majority of
iso_node objects, unless the ISO is very exotic or even malicious.

--
Alternative 1:

+ struct iso_file_section {
+   uint32_t isofsc_start;  /* block address */
+   uint32_t isofsc_size;   /* byte count */
+ };
+ union iso_file_data {
+   struct iso_file_section single;
+   struct iso_file_section *many;
+ };
  struct iso_node {
...
unsigned long iso_extent;
unsigned long i_size;
unsigned long iso_start;
...
+   union iso_file_data iso_fsect;  /* File sections. Their number is given
+* by CD9660_FSECT_FROM_INO(.i_number)
+*/
 };

--
Alternative 2:

+ struct iso_file_section {
+   unsigned long isofsc_extent;/* will not be used by cd9660 */
+   unsigned long isofsc_size;  /* byte count */
+   unsigned long isofsc_start; /* block address */
+ };
+ union iso_file_data {
+   struct iso_file_section single;
+   struct iso_file_section *many;
+ };
  struct iso_node {
...
-   unsigned long iso_extent;
-   unsigned long i_size;
-   unsigned long iso_start;
+   union iso_file_data iso_fsect;  /* File sections. Their number is given
+* by CD9660_FSECT_FROM_INO(.i_number)
+*/
...
 };

--

Have a nice day :)

Thomas



Re: API/ABI rank of headers in /usr/include/isofs/cd9660

2014-05-24 Thread Thomas Schmitt
Hi,

assessment of includers of cd9660 entrails.

The following files where found including isofs/cd9660 files
other than cd9660_mount.h, iso.h, iso_rrip.h :


/usr/src/usr.bin/fstat/isofs.c:#include 
/usr/src/usr.bin/fstat/isofs.c:#include 

The inclusion of cd9660_extern.h is not needed.
cd9660_node.h is used to access several members of struct iso_node.

I disabled
  #include 
and had to work around this unrelated error at compile time
  In file included from /usr/src/sys/sys/mbuf.h:84:0,
   from /usr/src/sys/sys/domain.h:40,
   from fstat.c:54:
  /usr/include/stddef.h:41:25: error: expected '=', ',', ';', 'asm' or 
'__attribute__' before 'ptrdiff_t'

After hacking /usr/include/stddef.h to
  /*
  typedef _BSD_PTRDIFF_T_ ptrdiff_t;
  */
  typedef int ptrdiff_t;
i could get it compiled up to a linker failure
  fstat.c:(.text+0x13a6): undefined reference to `inet6_getscopeid'
  fstat.c:(.text+0x20b0): undefined reference to `_ctype_tab_'

Thus kern/48808 will not affect ./fstat if only the proposal is
retracted to make cd9660_extern.h private.

The planned support for large files will affect
cd9660_node.h : struct iso_node with its members .i_size and .inode.
This will show up as API change in ./fstat.

Side observation:
man 8 lsof points by its SEE ALSO to fstat(8).
But in NetBSD-6.1.3 and in /usr/src the man page is fstat.1.


/usr/src/usr.bin/pmap/pmap.h:#include 

pmap.c accesses struct iso_node.i_dev and struct iso_node.i_number
via the macro
  V_DATA_IS(vp, iso_node, i_dev, i_number);

Both are not affected by kern/48808 except by the privatisation.
Both will most probably not be affected by changes in
struct iso_node for large file support.

A compile test with the emerging new cd9660_node.h succeeded.


/usr/src/usr.sbin/makefs/cd9660/iso9660_rrip.h:#include 

Should be not affected by changes.
It compiles up to a linker error:
  cd9660.o: In function `cd9660_level2_convert_filename':
  cd9660.c:(.text+0x271): undefined reference to `_ctype_tab_'

Probably because /usr/include/sys/ctype_bits.h defines
  extern const unsigned char  *_ctype_;
where /usr/src/sys/sys/ctype_bits.h has
  extern const unsigned short *_ctype_tab_;

I do not know how to work around.



The compile time problems happen when trying to compile /usr/src from
CVS on a 6.1.3 userland system.
Am i doing something wrong that i do not get included the /usr/src
but rather the /usr/include files ?



Do i make too much thoughts about compatibility of the files
cd9660_node.h and cd9660_extern.h which are obviously entrails of
cd9660, although published ?

Shall i just get ./fstat to work after changing cd9660_node.h
and trust that the installation of ./fstat stays in sync with
the kernel ?

How to get the compilation of /usr/src/usr.bin/fstat right ?


Summary:

It turns out that the file cd9660_rrip.h is more "on-disk format"
than the file iso_rrip.h.
So the legitimatley published files would be
  cd9660_mount.h
  iso.h
  iso_rrip.h
  cd9660_rrip.h
As a funny misnomer, the only effectively private header is
  cd9660_extern.h

The file
  cd9660_node.h
is obviously used to bypass stat(2) when having a vnode.

(Wouldn't it be nice to offer the commoners an API/ABI-stable
 VFS method to obtain a struct stat from its .st_ino ?)


Have a nice day :)

Thomas



Re: API/ABI rank of headers in /usr/include/isofs/cd9660

2014-05-24 Thread Thomas Schmitt
Hi,

David Holland wrote:
> In an ideal world, each filesystem would install two headers:
>  - one with the structures, constants, etc. needed to work with the
> on-disk format; this would be used by fsck, newfs, makefs, etc.

Sounds dangerous for my cause.

Obviously my grepping was not good enough on the first try.
  grep -R 'isofs/cd9660/.*\.h' /usr/src
now finds
  /usr/src/distrib/utils/sysinst/util.c:#include 
  /usr/src/usr.bin/fstat/isofs.c:#include 
  /usr/src/usr.bin/fstat/isofs.c:#include 
  /usr/src/usr.bin/fstat/isofs.c:#include 
  /usr/src/usr.bin/pmap/pmap.h:#include 
  /usr/src/usr.bin/pmap/pmap.h:#include 
  /usr/src/usr.sbin/mscdlabel/iso9660.c:#include 

So making these headers private seems to be impossible,
although  cd9660_extern.h and  cd9660_node.h  do not contain
on-disk format info, but rather VFS implementation entrails.

iso.h and iso_rrip.h are indeed representations of the specs.


I would love to justify my changes in
  cd9660_extern.h 
  cd9660_node.h
by Martin Husemanns comment to kern/48808:
  "Forward binary compat like this is not needed - a new mount_cd9660 binary
   can expect a new kernel always."
May i expand this expectation to above includers ?
Will they be recompiled if changed headers become public ?
I.e. may i break ABI compatibility ?

The changes necessary to enhance cd9660 for large data files in
  cd9660_node.h : struct iso_node
will probably affect its includers even on API level.

Less problematic are the planned changes in
  cd9660_extern.h : struct iso_mnt


A main suspect was makefs(8), which i now checked again.
Two includes are suspicious, although i do not understand how
the include path would point to /usr/include/isofs/cd9660/:
  ./cd9660.h:#include "iso.h"
  ./cd9660.h:#include "iso_rrip.h"
These names are not to find underneath
  /usr/src/usr.sbin/makefs
but exist in
  /usr/src/sys/fs/cd9660

Regrettably i cannot yet experiment with it:

  netbsd# cd /usr/src/usr.sbin/makefs
  netbsd# make USETOOLS=never
  ...
  In file included from /usr/src/usr.sbin/makefs/ffs/buf.h:48:0,
   from ffs/ffs_extern.h:35,
   from ffs.c:107:
  /usr/include/stddef.h:41:25: error: expected '=', ',', ';', 'asm' or 
'__attribute__' before 'ptrdiff_t'


> All the other stuff (in-memory structures, function prototypes, etc.)
> should be in other headers that aren't installed for userland.

This content description applies to all 
except the header for mount(2) type MOUNT_CD9660: cd9660_mount.h
and except the headers with ISO 9660 format structs and constants:
iso.h and iso_rrip.h

But obviously some tools found reasons to include others
nevertheless.

kern/48808 proposes to only keep .
This will need to be changed now.

In the next days i will try to find out how much the found
tools depend on cd9660 entrails and what consequences my
change proposals would have.


Have a nice day :)

Thomas



Re: API/ABI rank of headers in /usr/include/isofs/cd9660

2014-05-19 Thread Thomas Schmitt
Hi,

Greg Troxel:
> >>   get rid of OBJDIR and DESTDIR
> They would be under /usr/obj after you run build.sh.

None to see:
  netbsd# ls -1 /usr/obj
  sys
  tooldir.NetBSD-6.1.3-i386
  tooldir.NetBSD-6.99.40-i386
  tools

> That's really all
> there is to the script, even though there is more text.  I was really
> just trying to sugges that you read the script as an illustrated example
> of build.sh usage.

Sure. I read. But as said, i don't understand / did not yet encounter
the problems which these settings will avoid.
But i will refer to it whenever i learn about a new problem.

As kernel and NetBSD noob i am thankful for any hints.
Some make me riddle, and some riddles i cannot solve by myself.


> >   cd /usr/src/sys/arch/i386/compile/obj/GENERIC
> >   /usr/tools/bin/nbmake

> If you just want to build a kernel and know nothing else has changed,
> that's an ok shortcut.

It got me running for experiments, for fixing a few small bugs,
and for making an API/ABI extension proposal.


I could need a tip how to juggle with two change proposals on the
same code file at the same time.
My next assault on your patience will touch a file which is part
of the pending change request kern/48808: cd9660_vfsops.c 

Both changes are not much related. So normally they could be
processed independendly.

I can of course handle two local versions of the file and its
neighbors.
But how do i present the change in a patch ?
Shall i base it on the daring assumption that kern/48808 gets
accepted as is ?
Shall i wait with the new PR until kern/48808 is decided ?
Shall i only post assessment and plan as PR and wait with the
patches ?


Have a nice day :)

Thomas



Re: API/ABI rank of headers in /usr/include/isofs/cd9660

2014-05-19 Thread Thomas Schmitt
Hi,

> use build.sh -u to avoid the cleandir before the
> build, which is the normal method.

This reduced the run time to 6.7 seconds.
I now see that Martin Husemann already proposed this to me
a few days ago. Somehow i'm a bit overwhelmed by all the
things to learn.

> you may also like the -U flag.  then the entire build runs as
> non-root successfully.

For now i only run a sandbox VM where the superuser cannot
cause much damage. But if some day i manage to get NetBSD on
a real machine, then i will hopefully remember to use it.
So much to explore ...


Well, currently i enjoy the wealth of man 9.
Especially the VFS related pages.
My plan for enhancing the relation of struct iso_node and ISO 9660
file sections from 1:1 to 1:n is getting near to being implementable.
That will become some fat PR. Motivation and plan already have 500
lines. I will have to break them down into digestible pieces.


Have a nice day :)

Thomas



Re: API/ABI rank of headers in /usr/include/isofs/cd9660

2014-05-19 Thread Thomas Schmitt
Hi,

>   get rid of OBJDIR and DESTDIR

Where would these be, if present ?
(I checked env and cd9660/Makefile.)


>   change a Makefile to no longer install a .h file
>
>   run a full build (from build.sh; see
>   pkgsrc/sysutils/etcmanage:BUILD-NetBSD for my approach)
>
> If there is a program (in the NetBSD base system) that relies on having
> the header in /usr/include, then the build will fail.

I ran make in /usr/pkgsrc to get etcmanage-0.8.2/BUILD-NetBSD.
But i have to confess that i do not understand the problem,
which it will solve.

Whatever, i did run
  cd /usr/src
  ./build.sh -j8 kernel=GENERIC
  ./build.sh tools
  ./build.sh -j8 kernel=GENERIC
with success.
(./build.sh in /usr/src/ always wants me to build tools and then
 spends 14 minutes with a full build. So normally i compile by
   cd /usr/src/sys/arch/i386/compile/obj/GENERIC
   /usr/tools/bin/nbmake
)

Have a nice day :)

Thomas



Re: CVS commit: src/sys/ufs/ufs

2014-05-15 Thread Thomas Schmitt
Hi

David Holland:
> I'm not entirely clear on how this structure is actually used, which
> is why I hadn't yet made this change myself after the issue came up. :-/

My observation in sys/fs/cd9660 is probably the initial cause for
this change. PR kern/48799.

I could produce a problem by exporting to NFS an ISO-9960 filesystem
which gets 64-bit inodes due to high storage loacations of its
directory records. NFS uses these structs to identify files in the
original filesystem. cd9660 then uses the struct ifid member ifid_ino
to fulfill this wish ... if no 32-bit bottleneck prevents it.

So if the fix in ufs causes problems, could you please also have
a look at the fix of kern/48799 ?


Have a nice day :)

Thomas



Re: API/ABI rank of headers in /usr/include/isofs/cd9660

2014-05-14 Thread Thomas Schmitt
Hi,

matthew green:
> in src/distrib/sets/lists/comp you'll
> find all these files you're looking at not installing now.
> they'll need to be marked "obsolete" in here.

Guided solely by the examples in there, i make five changes in
  /usr/src/distrib/sets/lists/comp/mi
like
  -./usr/include/isofs/cd9660/cd9660_extern.h comp-c-include
  +./usr/include/isofs/cd9660/cd9660_extern.h comp-obsolete   
obsolete


Have a nice day :)

Thomas



Re: API/ABI rank of headers in /usr/include/isofs/cd9660

2014-05-13 Thread Thomas Schmitt
Hi,

Taylor R Campbell:
> Just include  and use uint32_t.

Ok.
In (righteously public) mount_cd9660.h

+ #include 

so that any includer gets defined uint32_t for

  struct iso_args {
  ... old members ...
+ uint32_t ssector;
  };


> There are some that may be useful for userland applications
> to grovel through the physical format,

 could be seen as such a thing. udf exposes
the equivalent, ecma167-udf.h.
But any interested application can just copy it into its own
source tree, if it wants record layouts and number conversion
macros.
So i will propose to remove iso.h from the public list.
It will not hamper my own plans, though, if it is kept public.


Have a nice day :)

Thomas



Re: API/ABI rank of headers in /usr/include/isofs/cd9660

2014-05-13 Thread Thomas Schmitt
Hi,

Greg Troxel:
> See src/sys/fs/cd9660/Makefile.  It's specified quite straightforwardly.

... it does ?
Yes. If i correct my viewpoint from "making" to "installing",
indicated by the path "/usr/include/isofs/cd9660", then i understand
your hint. (Compliments to you and this list.)

So i would accompany my change proposal by

- INCS=   cd9660_extern.h cd9660_mount.h cd9660_node.h cd9660_rrip.h iso.h \
-   iso_rrip.h
+ INCS=   cd9660_mount.h

and mention it prominently in the PR description ?


> grep on the source tree :-)

Did not bring any more findings, except 9 directory loops.

The open risk is that some application like makefs might rely
on these headers ... /usr/src/usr.sbin/makefs says it does not.
 
Where could i leave an announcement with a pointer to the
PR which removed the headers ?


> > attach a new member
> > (unsigned long int) to its end.
> New types should probably be uint64_t, rather than long.

The size of this variable is determined by ISO 9660 specs.
Block addresses and counters are specified as unsigned 32 bit.

I refrain from exposing uint32_t to the public, because this
usually needs inclusion of some .h file, which might not yet
be required by the cd9660 .h files. (unsigned long int) is
sufficient and harmless. Even sbin/mount_cd9660.c is happy,
because it can printf %lu iso_args.ssector without a cast.


> You can certainly drop them in the Makeefile in your source tree and do
> a build and see what happens.

My builds can hardly have installed or overwritten them.
cd9660_extern.h differs between /usr/include and /usr/src.
The files in /usr/include are of january 19, except those which i
changed manually.
(I just nbmake, copy the new ./netbsd to /netbsd-$number, and change
 the /netbsd link. Reboot. Heuristic procedure crafted from found
 twines and sticks.)

So this part will have to be tested by friendly experts.


I feel encouraged enough to prepare the PR and the first patch
proposal for discussion. It will have several XXX which should be
answered before a committable patch can emerge.


Have a nice day :)

Thomas



Re: API/ABI rank of headers in /usr/include/isofs/cd9660

2014-05-13 Thread Thomas Schmitt
Hi,

> I think Greg is right - historic accident.

They are quite up to date.

Is the mechanism known by which /usr/src/sys/fs/cd9660/*.h
gets forwarded to /usr/include/isofs/cd9660 ?
Has some script or makefile to be changed ?


> I would check with nxr.netbsd.org

93 milliseconds later:
No "iso_mnt" in project(s) "src" outside of /src/sys/fs/cd9660/.
  http://nxr.netbsd.org/search?q=iso_mnt&project=src&defs=&refs=&path=&hist=

Cross-check: "iso_args" is public.
nxr.netbsd.org finds /src/sbin/mount_cd9660 and others.

Any other proposals for look-up ?


> Otherwise let common sense decide wether any third-party software would have
> good reasons to use anything in those headers.

I'd really say no.

mount(2) and /usr/include/isofs/cd9660/mount_cd9660.h is as dirty
as you can get in userland. Even VFS is supposed to be encapsulated
by libc, isn't it ?
Those entrails are opaque to VFS. I am certain (and can be wrong).

I will possibly refrain from changing inappropriate (int) to
(uint32_t) in struct iso_mnt, and only attach a new member
(unsigned long int) to its end.
This will not fix potential 4 TiB rollovers, but will be more
compatible and still allow this:

  netbsd# ./mount_cd9660 -s 8768 /dev/cd0d /mnt/iso
  netbsd# ./mount_cd9660 -o getargs /dev/cd0d /mnt/iso
  0x40
  -s 8768



Nevertheless, for clarity and to remove lures for shooting
the own foot, i'd still like to get a decision about the removal
of at least four, better five of the files from /usr/include.
Only then will undue includers break and have a reason to enter
this discussion.

I am sure that one could provoke clear compiler error messages
by installing placeholders for some time of testing.

Encapsulation is a fine design pattern. One should try to enable it.

So if anybody can shed light on the mechanism that publishes
the cd9660/*.h files, then please consider to narrow it to only
 mount_cd9660.h . (iso.h makes few sense either, although
it is not really deliberate kernel entrails.)


Have a nice day :)

Thomas



Re: API/ABI rank of headers in /usr/include/isofs/cd9660

2014-05-13 Thread Thomas Schmitt
Hi,

> It's probably a historical accident that so much is installed.

That's a positive opinion, regarded from the viewpoint of my goals.


> In a lot of headers, there is "#ifdef _KERNEL" to keep kernel-private
> bits from being accessed by userland.

That's semi-bad news.
My main candidate for incompatible changes is cd9660_extern.h.
It does have such an #ifdef, and the struct iso_mnt which i want
to change is outside of it.

So somehow it was intended to have it exposed. But why ?

It is kindof a cookie, which VFS initially gets from cd9660 and
maybe derived from VFS objects whenever this ISO filesystem as
a whole needs to be referenced:
  int
  cd9660_blkatoff(struct vnode *vp, ...
  {
struct iso_node *ip;
struct iso_mnt *imp;
...
ip = VTOI(vp);
imp = ip->i_mnt;


> Likely all of this could be improved.

I am willing to do the necessary work if i get some instructions
and decisions about interfacing with userland and community.

My plans for now are:

A mount_cd9660 option -s like with the other BSDs.
This will need an API/ABI change which i seem to achieve in a very
compatible and transparent way. I am meanwhile testing userland and
kernel. Old and new in all combinations seem to cooperate neatly.
But the public cd9660_extern.h causes extra pain.

Support for data files of 4 GiB or larger.
This will have to change some assumptions of cd9660 about how
ISO 9660 allows file content to be arranged.
Assessment of relations still underway. One bug caught already
(kern/48799).
It has to be feared that i have to touch several .h files.
So a decision about their API/ABI-ness will be needed there too.


Have a nice day :)

Thomas



API/ABI rank of headers in /usr/include/isofs/cd9660

2014-05-13 Thread Thomas Schmitt
Hi,

while fixing rollovers and implementing a new mount_cd9660 option -s,
i stumble over the content of public include directory
  /usr/include/isofs/cd9660
which exposes all header files from the kernel source of cd9660:
  cd9660_extern.h cd9660_rrip.h   iso_rrip.h
  cd9660_mount.h  cd9660_node.h   iso.h

cd9660_mount.h is needed to operate mount(2) from userland.
I did not find the spot in the man pages yet, which clearly states
this. Is there a documentation gap between man 2 mount,
/usr/include/sys/mount.h, and man 8 mount_cd9660 ?

Whatever, the others are exposing smelly entrails of the kernel
implementation of cd9660.

Are they to be considered API/ABI nevertheless, and thus need
compatibility precautions ?


Reasons against API/ABI rank, except the smell:

I see a small difference between cd9660_extern.h of /usr/include
(6.1.3) and current /usr/src :
---
1c1
< /*$NetBSD: cd9660_extern.h,v 1.24 2008/06/28 01:34:05 rumble Exp $
*/
---
> /*$NetBSD: cd9660_extern.h,v 1.26 2013/06/23 07:28:36 dholland Exp $  
> */
88,91c88,91
< #define blkoff(imp, loc)  ((loc) & (imp)->im_bmask)
< #define lblktosize(imp, blk)  ((blk) << (imp)->im_bshift)
< #define lblkno(imp, loc)  ((loc) >> (imp)->im_bshift)
< #define blksize(imp, ip, lbn) ((imp)->logical_block_size)
---
> #define cd9660_blkoff(imp, loc)   ((loc) & (imp)->im_bmask)
> #define cd9660_lblktosize(imp, blk)   ((blk) << (imp)->im_bshift)
> #define cd9660_lblkno(imp, loc)   ((loc) >> (imp)->im_bshift)
> #define cd9660_blksize(imp, ip, lbn)  ((imp)->logical_block_size)
---
So API compatibility seems not to be a primary goal here ?

Filesystem UDF, for example, exposes only two of its 6 kernel headers
in /usr/include/fs/udf:
  udf_mount.hdefines public mount(2) struct and flag bits
  ecma167-udf.h  defines C representations of objects out of UDF specs
  
So why would cd9660 need more than this ?
  cd9660_mount.h defines public mount(2) struct and flag bits
  iso.h  defines C representations of objects out of ISO 9660 specs

Already iso_rrip.h defines funtions which are only reachable inside
/usr/src/sys/fs/cd9660. The others expose data structures which i
deem private to the kernel.


I would take outmost precautions for compatibility if the files in
question are really of public API/ABI rank. (Could need some kind
of supervisor for this.)

A change in struct iso_mnt of cd9660_extern.h is inevitable if
NetBSD's mount_cd9660 shall get the option -s, which FreeBSD and
OpenBSD already have. Further there are signed int, where unsigned
is approriate. 

Best would be, of course, if i could consider private all the headers
except  cd9660_mount.h .
My change there is hopefully well behaved in respect to API/ABI
compatibility, thanks to the data_len parameter of mount(2).


Have a nice day :)

Thomas



Re: About a probable 32 bit ino_t bottleneck in cd9660_fhtovp()

2014-05-10 Thread Thomas Schmitt
Hi,

me:
> >  Is there a tutorial "Localhost NFS for Dummies" ?

Taylor R Campbell:
> Just set up /etc/exports (see `man exports') and get rpcbind, mountd,
> and nfsd running:

I edited /etc/exports and did without final success:

  netbsd# cat /etc/exports
  /mnt/iso -ro localhost
  netbsd# mkdir /mnt/iso
  netbsd# mkdir /mnt/nfs
  netbsd# /etc/rc.d/rpcbind onestart
  Starting rpcbind.
  netbsd# /etc/rc.d/mountd onestart
  Starting mountd.
  netbsd# /etc/rc.d/nfsd onestart
  /etc/rc.d/nfsd: WARNING: $mountd is not enabled.
  netbsd# # mount_nfs localhost
  netbsd# mount_cd9660 /dev/cd0a /mnt/iso
  netbsd# mount_nfs localhost:/mnt/iso /mnt/nfs
  mount_nfs: rpcbind to nfs on server: RPC: Program not registered

  ^C
  netbsd# mount
  /dev/wd0a on / type ffs (local)
  kernfs on /kern type kernfs (local)
  ptyfs on /dev/pts type ptyfs (local)
  procfs on /proc type procfs (local)
  /dev/cd0a on /mnt/iso type cd9660 (read-only, NFS exported, local)

Seems to be known as PR bin/45679.

So it is about "mountd being enabled" ?
  http://www.netbsd.org/docs/guide/en/chap-net-services.html
In /etc/rc.conf
  mountd=YES
Progress
  /etc/rc.d/nfsd: WARNING: $rpcbind is not enabled.
In /etc/rc.conf
  rpcbind=YES
And ...
  netbsd# /etc/rc.d/nfsd onestart
  Starting nfsd.
  netbsd# ps -ax | grep nfs
  12681 ? Ssl  0:00.00 /usr/sbin/nfsd -6tun 4 
  netbsd# mount_nfs localhost:/mnt/iso /mnt/nfs
  netbsd# mount
  ...
  localhost:/mnt/iso on /mnt/nfs type nfs
  netbsd# tail -5 /var/log/messages
  May 10 20:32:38 netbsd thomas: /etc/rc.d/nfsd: WARNING: $rpcbind is not 
enabled.
  May 10 20:35:05 netbsd /netbsd: cd9660_fhtovp: ifh.ifid_ino = 34445312
  May 10 20:35:05 netbsd /netbsd: RRIP without PX field?
  May 10 20:35:05 netbsd /netbsd: cd9660_fhtovp: ifh.ifid_ino = 34445312
  May 10 20:35:05 netbsd /netbsd: cd9660_fhtovp: ifh.ifid_ino = 34445312

This matches the inode rollover of /bin/ls of 6.1.3:
  netbsd# ls -lid /mnt/iso
  34445312 drwxr-xr-x  1 root  wheel  2048 Feb  4 18:09 /mnt/iso

Whereas with fix for bin/48798:
  netbsd# /usr/src/bin/ls/ls  -lid /mnt/iso
  4329412608 drwxr-xr-x  1 root  wheel  2048 Feb  4 18:09 /mnt/iso

The NFS mountpoint is unusable
  netbsd# ls -l /mnt/nfs
  -r-xr-xr-x  1 root  wheel  0 Jan  1  1970 /mnt/nfs
But it can be unmouted, at least.

Note: This is a more populated ISO image than the published test ISO.
  Thus the inode numbers might slightly differ.

Now it is worth a PR.
kern/48799:
  32 bit ino_t bottleneck in interface of cd9660 to vfsops sabotages NFS



Strange forth and back after reboot with enabled rpcbind
and mountd:

  netbsd# /etc/rc.d/nfsd onestart
  Starting nfsd.
  netbsd# mount
  /dev/wd0a on / type ffs (NFS exported, local)
  ...
  /dev/cd0a on /mnt/iso type cd9660 (read-only, local)

The wrong filesystem is exported.

  netbsd# cat /etc/exports
  /mnt/iso -ro localhost
  netbsd# /etc/rc.d/nfsd onestop
  Stopping nfsd.
  netbsd# /etc/rc.d/mountd stop
  Stopping mountd.
  ...
  netbsd# /etc/rc.d/rpcbind stop
  Stopping rpcbind.
  ...
  netbsd# /etc/rc.d/rpcbind onestart
  Starting rpcbind.
  netbsd# mount
  /dev/wd0a on / type ffs (NFS exported, local)

Still wrong.

  netbsd# /etc/rc.d/mountd onestart
  Starting mountd.
  netbsd# mount
  /dev/wd0a on / type ffs (local)
  ...
  /dev/cd0a on /mnt/iso type cd9660 (read-only, NFS exported, local)

Now it is right. Why ?  

Theory:
If the exported directory is empty, then it prefers to export root.
The trouble ends when i mount /mnt/iso, stop the demons, and restart
them.

Is my /etc/exports so weird ?


Have a nice day :)

Thomas



About a probable 32 bit ino_t bottleneck in cd9660_fhtovp()

2014-05-10 Thread Thomas Schmitt
Hi,

i riddle under which circumstances function cd9660_fhtovp() is
called (e.g. via VFS_FHTOVP).

Because, if it gets called for a file from my test ISO, then
it should fail due to a 32-bit bottleneck in the cd9660-specific
data of struct fid (i.e. in struct ifid).

I wanted to prepare a PR, but fail to get this function called
so that the problem would show up.

No problem - no report, one could say.
But my findings are suspicious. So i ask here for a use case
of VFS_FHTOVP.

I just read "man 9 vfsops" which says
 int (*vfs_fhtovp)()VFS_FHTOVPNFS file handle to vnode
  lookup
 int (*vfs_vptofh)()VFS_VPTOFHVnode to NFS file handle
  lookup

Is there a tutorial "Localhost NFS for Dummies" ?

My PR sketch:
-
Probable 32 bit ino_t bottleneck in interface of cd9660 to vfsops
-

During my assessment of the relation between struct iso_node and
ISO 9660 File Sections, i came to the definition of the cd9660
version of vsops struct fid.
struct ifid is declared in cd9660_vfsops.c with comment
"File handle to vnode":

  struct ifid {
ushort  ifid_len;
ushort  ifid_pad;
int ifid_ino;
longifid_start;
  };

Function cd9660_vptofh() copies this struct over a struct fid
which is provided by the caller.
Before this copying ifid_ino gets filled with the lower 32 bit of
an ino_t value (ip is a struct iso_node, ip->i_number is ino_t):

ifh.ifid_ino = ip->i_number;

The higher bits are thrown away, obviously.

Function cd9660_fhtovp() copies such bytes back from struct fid
into a local struct ifid and uses .ifid_ino with

if ((error = VFS_VGET(mp, ifh.ifid_ino, &nvp)) != 0) {

VFS_VGET to my assessment is actually function cd9660_vget_internal().
It uses cd9660_ihashget() to look up a vnode.

I inserted a printf() in cd9660_ihashget(), mounted my test ISO,
and got lots of more-than-32-bit numbers into /var/log/messages.
  May 10 18:55:39 netbsd /netbsd: cd9660_ihashget: inum = 4329412608

I moved the printf() to cd9660_fhtovp() and cannot provoke any
such messages in /var/log/messages.

-

Test ISO is

http://scdbackup.webframe.org/large.iso.bz2
 
4470 bytes, MD5 7d78dc3efaec8ea3f1801335329f410d.
It inflates to 4,329,897,984 bytes.

Mountable only if the fix of kern/48787 is applied.

-


Have a nice day :)

Thomas



Re: Bug in fs/cd9660 raises questions about inode number computing

2014-05-09 Thread Thomas Schmitt
Hi,

> A PR (Problem Report) in the kern category with an attached unified
> diff would seem adequate if you cannot commit the changes yourself.

I am glad that others will filter my proposals.

kern/48787 can be counted as a successful one.
kern/48797 demonstrates that i need to free myself more from
expectations which occupied my mind when studying isofs of
a different kernel.
Thanks to Martin Husemann for posing the right questions.

My initial questions of this thread are now answered by:

- No, ISO 9660 Extended Attribute is specified as odd as the code
  indicates. (ECMA-119 6.4.4.1. Thanks to Wolfgang Solfrank.)

- No, 64 bit ino_t is a valuable feature of NetBSD that may well
  be exploited for convenience.

A new problem was added to my vague todo list: 

- ls -i shows rolled over 32 bit inode numbers


Have a nice day :)

Thomas



Re: Bug in fs/cd9660 raises questions about inode number computing

2014-05-06 Thread Thomas Schmitt
Hi,

i think i found answers to my questions.

- The inode computation in isodirino() is not necessarily faulty.
  Possibly i misunderstood the meaning of ECMA-119,
  "9.4.2 Extended Attribute Record length".

- The inode numbers (which currently are byte addresses of
  directory records) can be scaled down by a factor of 32.

Further i found the mechanism by which the 32 bit rollover in
isodirino() causes the failed mount:
cd9660_vfsops.c reverse computes the block number of a directory
entry from its inode number.

This will lead to two patch proposals:

- Fix only the rollover.

- Scale down the inode numbers so that they stay below 32 bit
  for directory locations below 128 GiB. (I wrote "TiB" in my
  previous mail. That is wrong, of course.)

How to properly submit them ?

Details:

The RRIP related part that reacts sour on the 32-bit rollover
of isodirino() is in sys/fs/cd9660/cd9660_vfsops.c:812

if (relocated) {
/*
 * On relocated directories we must
 * read the `.' entry out of a dir.
 */
ip->iso_start = ino >> imp->im_bshift;

where
struct iso_node *ip;
and
  struct iso_node {
...
unsigned long iso_start;/* [...] */

So the inode number computation is supposed to provide a message
for a program part that wants to know the block number of a
directory entry.

Well, by providing an according reverse computation macro
i could save my plan of 32 bit inode numbers up to 128 GiB.
(Actually i need two block-byte-to-inode macros. One for ISO
 blocks and one for DEV_BSHIFT blocks.)

Now the mounted ISO root directory looks like this:

  netbsd# ls -li /mnt
  total 16777244
  135294151 -rw-r--r--  1 thomas  dbus 396 Aug 26  2011 id_rsa.pub
  135294272 drwxr-xr-x  1 thomas  dbus6144 Aug  7  2013 libburn-1.3.2
  135296128 drwxr-xr-x  1 thomas  dbus6144 Dec 12 12:14 libburn-1.3.4
  135297984 drwxr-xr-x  1 thomas  dbus6144 Mar 18 12:40 libburn-1.3.6
  135294170 -rw-r--r--  1 thomas  dbus  4294965248 Jun 23  2012 powerpc.img
  135294170 -rw-r--r--  1 thomas  dbus  4294965248 Jun 23  2012 powerpc.img

(There is still some work to do with large files.)



Further i got a partial answer to my question why isodirino()
encodes ext_attr_length like a block offset.
After using the reverse computed ip->iso_start underneath
the boss vnode vp of ip, there is in cd9660_vfsops.c:

ip->iso_start = isonum_711(isodir->ext_attr_length) + ip->iso_extent;

Still strange that ext_attr_length is added to the block address
iso_extent.

Would anybody read from this ECMA-119 description that the
value in question is a block count ?
  "9.4.2 Extended Attribute Record length (BP 2)
   This field shall contain an 8-bit number. This number shall specify
   the assigned Extended Attribute Record length if an Extended
   Attribute Record is recorded. Otherwise this number shall be
   zero."

Shrug.
I never saw an ISO with non-zero values in those fields.



Have a nice day :)

Thomas



Bug in fs/cd9660 raises questions about inode number computing

2014-05-05 Thread Thomas Schmitt
Hi,

i am exploring a bug in fs/cd9660.
A fix seems straightforward after i found its 32 bit rollover
trigger in the code.

But my exploration raises questions:

- Is the inode computation in sys/fs/cd9660/cd9660_node.c:isodirino()
  faulty (additionally to its rollover) ?

- Are the inode numbers of cd9660 too 64-bit-happy ?

-

The bug shows up when mounting a multi-session ISO 9660 medium
which has its directory tree above 4 GiB. (Not possible with CD,
but with DVD or BD.)

  netbsd# mount -t cd9660 /dev/cd0a /mnt
  netbsd# ls -l /mnt
  -r-xr-xr-x  1 root  wheel  0 Jan  1  1970 /mnt
  netbsd# umount /mnt
  umount: /mnt: Invalid argument
  netbsd# umount /dev/cd0a
  umount: /mnt: Invalid argument
  netbsd# mount
  /dev/wd0a on / type ffs (local)
  kernfs on /kern type kernfs (local)
  ptyfs on /dev/pts type ptyfs (local)
  procfs on /proc type procfs (local)
  /dev/cd0a on /mnt type cd9660 (read-only, local)

I can get rid of the mount point only by reboot.

This does not happen with

  mount -t cd9660 -o norrip /dev/cd0a /mnt

making it clear that the bug sits in the Rock Ridge
interpretation. I did not find this spot yet.

But looking for possible 32 bit byte address rollovers i found
the initiating culprit in sys/fs/cd9660/cd9660_node.c :

  ino_t
  isodirino(struct iso_directory_record *isodir, struct iso_mnt *imp)
  {
ino_t ino;

ino = (isonum_733(isodir->extent) + isonum_711(isodir->ext_attr_length))
  << imp->im_bshift;
return (ino);
  }

Although sizeof(ino_t) == 8, this rolls over at least on NetBSD i386
because none of the operands of the computation is 64 bit wide.

A test in userspace confirms:

  sizeof(ino_t) = 8
  ( 2113952 + 0 ) << 11 = 34406400
  ( ((uint64_t) 2113952) + 0 ) << 11 = 4329373696

Fixing the rollover makes the ISO mountable and umountable:

 ino = (((uint64_t) isonum_733(isodir->extent)) +
isonum_711(isodir->ext_attr_length)) << imp->im_bshift;

---

The rollover seems not to be the only cause for the bad mount.
It would probably be harmless if not other inode computations
in cd9660 would have a 64 bit operand in their own expressions.

I suspect this causes inode numbers to be computed differently
on the same inode. Rock Ridge code obviously takes offense.

At least i can trigger the same problem without rollover just
by dividing the inode numbers of some computations by 32.
(This would still keep them unique because a directory record
 has at least 34 bytes. Idea learned from Linux iso9660.)

I tried to push the 32 bit limit to 128 TiB by this division,
because ls(1) option -i shows rolled over 32 bit numbers.
Probably other consumers of stat(2) don't expect true 64 bit
either.
I will try to consolidate all inode computations by a single macro
and then look whether it runs with division by 32.

And in general, what strange formula is used in isodirino() ?

My comment in my copy of NetBSD 6.99.40 says:

   This looks strange. Why add a byte offset to the block address
   and then multiply the sum by block size ?
ino = (isonum_733(isodir->extent) +
   isonum_711(isodir->ext_attr_length)) << imp->im_bshift;

   One could fear that (isodir->extent << imp->im_bshift) coincides
   with the first directory entry which is not ".". So adding any
   number between 1 and 33 after multiplication/shifting would
   prevent the result from coinciding with the first or second entry.
   Well, ext_attr_length is normally 0 and "." is implicitely specified
   to be the first entry in a directory (by ECMA-119 6.8.2.2 and 9.3).

   The other inode computations look more plausible
 dbtob(bp->b_blkno) + entryoffsetinblock

So i changed the formula in isodirino() to what the others do.
It seems to work properly. Actually trivially because ext_attr_length
is 0 in my test ISO.

---

Because this mail is not yet long enough:

I am the upstream developer of libburn and libisofs.
NetBSD user since a few weeks by a virtual 6.1.3-i386 (with a
6.99.40 kernel meanwhile).
No kernel background, but ISO 9660 and SCSI/MMC for CD burners.

I recently ported libburn to NetBSD. My daring test user and
my friendly pkgsrc packager kept me interested in NetBSD cd9660
although i myself cannot get installed NetBSD on my real hardware
and thus have no real DVD drive to play with. Only qemu CD-ROM.

More proposals are to come, if you do not chase me away:

- Files larger than 4 GiB are misrepresented after mount_cd9660.

- mount_cd9660(8) lacks an option equivalent to FreeBSD's
  mount_cd9660 -s.

- Selfishly i could need ioctl(SCIOCCOMMAND) with ld(4) populated
  by virtio(4) so that i get a real DVD burner forwarded into my
  NetBSD. (virtio /dev/ld0d on DVD+RW writes with 16 KB/s. Ugh.)
  
---

Have a