Re: [CFC/CFT] large changes in the loader(8) code

2012-07-16 Thread Andriy Gapon
on 26/06/2012 15:50 Andrey V. Elsukov said the following:
 3. ZFS code now uses new API and probing on the systems with many disks
 should be greatly increased:
 zfs/zfs.c
 i386/loader/main.c

First of all, it's hard to parse the above sentence: "probing ... should be
greatly increased".  Probing what? :-)  If probing time, then we don't want
that ;-)

I looked through the ZFS-related part and here are a few comments:

1. I think that the predominant indentation style of i386/loader/main.c should
be preserved for consistency.

2. I am not sure if I like the approach of moving partition tasting code into
common ZFS code (zfs.c).  On one hand, it now makes sense because the new
partition iteration code is machine-independent.  On the other hand, the reason
that I added arch_zfs_probe method was to give platforms full control over which
partitions and in what order are probed.  It seems to be important for some of
them.
So, I like how your new partition interface makes it much easier to ZFS-probe
partitions, but I would prefer to have that code in arch_zfs_probe
implementations rather than in zfs_probe_dev.

3.  Related to the above.  In what shape is sparc64 ZFS support in your branch?
Have you tried to adapt it to the new model too?
It's the platform that has special requirements for disk/partition probing
order.
Marius can help with additional information and testing here.

Overall, thank you very much for this work!  I believe that it moves us in the
correct direction.

-- 
Andriy Gapon


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: [CFC/CFT] large changes in the loader(8) code

2012-07-16 Thread Andrey V. Elsukov
On 16.07.2012 14:23, Andriy Gapon wrote:
 on 26/06/2012 15:50 Andrey V. Elsukov said the following:
 3. ZFS code now uses new API and probing on the systems with many disks
 should be greatly increased:
 zfs/zfs.c
 i386/loader/main.c
 
 First of all, it's hard to parse the above sentence. probing ... should be
 greatly increased.  Probing what? :-)  If probing time, then we don't want 
 that ;-)
 
 I looked through the ZFS-related part and here are a few comments:

Thanks for that.

 1. I think that the predominant indentation style of i386/loader/main.c 
 should be
 preserved for consistency.
 
 2. I am not sure if I like the approach of moving partition tasting code into
 common ZFS code (zfs.c).  On one hand, it now makes sense because the new
 partition iteration code is machine-independent.  On the other hand, the 
 reason
 that I added arch_zfs_probe method was to give platforms full control over 
 which
 partitions and in what order are probed.  It seems to be important for some 
 of them.
 So, I like how your new partition interface makes it much easier to ZFS-probe
 partitions, but I would prefer to have that code in arch_zfs_probe 
 implementations
 rather than in zfs_probe_dev.

From the other point of view, ZFS is not just a file system and it works
directly with disks and partitions. And it seems to me this code will be common
for other architectures.

 3.  Related to the above.  In what shape is sparc64 ZFS support in your 
 branch?
 Have you tried to adapt it to the new model too?
 It's the platform that has special requirements for disk/partition probing 
 order.
 Marius can help with additional information and testing here.

Currently I have not received any feedback from users who can test
patches on the other architectures. I added VTOC8 support to part.c, but it
seems it is not needed and ofw can work without it.


-- 
WBR, Andrey V. Elsukov






Re: [CFC/CFT] large changes in the loader(8) code

2012-07-16 Thread Andriy Gapon
on 16/07/2012 13:57 Andrey V. Elsukov said the following:
 On 16.07.2012 14:23, Andriy Gapon wrote:
 on 26/06/2012 15:50 Andrey V. Elsukov said the following:
 3. ZFS code now uses new API and probing on the systems with many disks
 should be greatly increased:
 zfs/zfs.c
 i386/loader/main.c

 First of all, it's hard to parse the above sentence. probing ... should be
 greatly increased.  Probing what? :-)  If probing time, then we don't want 
 that ;-)

 I looked through the ZFS-related part and here are a few comments:
 
 Thanks for that.
 
 1. I think that the predominant indentation style of i386/loader/main.c 
 should be
 preserved for consistency.

 2. I am not sure if I like the approach of moving partition tasting code into
 common ZFS code (zfs.c).  On one hand, it now makes sense because the new
 partition iteration code is machine-independent.  On the other hand, the 
 reason
 that I added arch_zfs_probe method was to give platforms full control over 
 which
 partitions and in what order are probed.  It seems to be important for some 
 of them.
 So, I like how your new partition interface makes it much easier to ZFS-probe
 partitions, but I would prefer to have that code in arch_zfs_probe 
 implementations
 rather than in zfs_probe_dev.
 
 From the other point of view, ZFS is not a just file system and it works
 directly with disks and partitions. And it seems to me this code will be 
 common
 for other architectures.

Well, it seems that you haven't yet touched sparc64_zfs_probe.
If you find that you don't have to use any ugly hacks there, then good.
But my impression is that it would be easier to stick to the previous approach.

 3.  Related to the above.  In what shape is sparc64 ZFS support in your 
 branch?
 Have you tried to adapt it to the new model too?
 It's the platform that has special requirements for disk/partition probing 
 order.
 Marius can help with additional information and testing here.
 
 Currently i have not received any feedback reports from the users who can test
 patches on the other architectures. I added VTOC8 support to the part.c, but 
 it
 seems it is not needed and ofw can work without this.
 

-- 
Andriy Gapon




Re: [CFC/CFT] large changes in the loader(8) code

2012-07-16 Thread Andrey V. Elsukov
On 16.07.2012 15:05, Andriy Gapon wrote:
 2. I am not sure if I like the approach of moving partition tasting code 
 into
 common ZFS code (zfs.c).  On one hand, it now makes sense because the new
 partition iteration code is machine-independent.  On the other hand, the 
 reason
 that I added arch_zfs_probe method was to give platforms full control over 
 which
 partitions and in what order are probed.  It seems to be important for some 
 of them.
 So, I like how your new partition interface makes it much easier to 
 ZFS-probe
 partitions, but I would prefer to have that code in arch_zfs_probe 
 implementations
 rather than in zfs_probe_dev.

 From the other point of view, ZFS is not a just file system and it works
 directly with disks and partitions. And it seems to me this code will be 
 common
 for other architectures.
 
 Well, it seems that you haven't yet touched sparc64_zfs_probe.

Yes. It should work as before.
But if Marius can suggest how to change ofw_disk.c to get the disk size and
sector size, then I will be able to break something here :)

 If you'll find that you don't have to use any ugly hacks there, then good.
 But my impression is that it would be easier to stick to the previous 
 approach.

-- 
WBR, Andrey V. Elsukov







Re: [CFC/CFT] large changes in the loader(8) code

2012-07-16 Thread Andriy Gapon
on 16/07/2012 14:14 Andrey V. Elsukov said the following:
 On 16.07.2012 15:05, Andriy Gapon wrote:
 2. I am not sure if I like the approach of moving partition tasting code 
 into
 common ZFS code (zfs.c).  On one hand, it now makes sense because the new
 partition iteration code is machine-independent.  On the other hand, the 
 reason
 that I added arch_zfs_probe method was to give platforms full control over 
 which
 partitions and in what order are probed.  It seems to be important for 
 some of them.
 So, I like how your new partition interface makes it much easier to 
 ZFS-probe
 partitions, but I would prefer to have that code in arch_zfs_probe 
 implementations
 rather than in zfs_probe_dev.

 From the other point of view, ZFS is not a just file system and it works
 directly with disks and partitions. And it seems to me this code will be 
 common
 for other architectures.

 Well, it seems that you haven't yet touched sparc64_zfs_probe.
 
 Yes. It should work as before.

Well, but it's obvious that zfs_probe_dev would be attempting to do some
unneeded stuff (trying to treat partitions as disks) for that case.  To me this
is a clear indication that zfs_probe_dev is not optimal for an arch-independent
implementation.  So I still think that arch_zfs_probe should decide what disks
and partitions to probe, and zfs_probe_dev should only probe what it's given
and not try to be any smarter.
But I've repeated myself three times already :-)
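
A rough sketch of the division of labour argued for above. This is a Python
model for illustration only, not loader code: the `taste` callback, the
`parts_of` helper and the name `i386_zfs_probe` are assumptions made for the
example, and the real functions are C with different signatures.

```python
from typing import Callable, Iterable, List

# Hypothetical stand-in for "read the label on this device and report
# whether it looks like ZFS".
Taster = Callable[[str], bool]


def zfs_probe_dev(devname: str, taste: Taster) -> bool:
    """Probe exactly the device it is given and nothing else."""
    return taste(devname)


def i386_zfs_probe(disks: Iterable[str],
                   parts_of: Callable[[str], List[str]],
                   taste: Taster) -> List[str]:
    """Platform policy (hypothetical): each disk, then each of its partitions."""
    found = []
    for disk in disks:
        for dev in [disk, *parts_of(disk)]:
            if zfs_probe_dev(dev, taste):
                found.append(dev)
    return found


def sparc64_zfs_probe(candidates: Iterable[str], taste: Taster) -> List[str]:
    """Platform policy (hypothetical): only a platform-chosen list, in order."""
    return [dev for dev in candidates if zfs_probe_dev(dev, taste)]
```

With this split, the decision about which devices to visit and in what order
stays in the per-platform function, while the tasting itself is shared.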

 But if Marius can suggest how to change ofw_disk.c to get disk size and 
 sector size,
 then i will be able to break something here :)
 
 If you'll find that you don't have to use any ugly hacks there, then good.
 But my impression is that it would be easier to stick to the previous 
 approach.
 


-- 
Andriy Gapon




Re: [CFC/CFT] large changes in the loader(8) code

2012-07-16 Thread Andrey V. Elsukov
On 16.07.2012 15:31, Andriy Gapon wrote:
 Yes. It should work as before.
 
 Well, but it's obvious that zfs_probe_dev would be attempting to do some 
 unneeded
 stuff (trying to treat partitions as disks) for that case.  To me this is a 
 clear
 indication zfs_probe_dev is not optimal for arch-independent implementation.  
 So I
 still think that arch_zfs_probe should decide what disks and partitions to 
 probe,
 and zfs_probe_dev should only probe what it's given and not try to be any 
 smarter.
 But I've repeated myself three times already :-)

And we will have the same - several copies of the same code in each
architecture, which I have deleted...

Sparc doesn't support the DIOCGMEDIASIZE and DIOCGSECTORSIZE ioctls,
so it will not check each partition, only the fd that is passed to
zfs_probe_dev.

Currently there is only one problem with ZFS tasting that can affect users:
now we taste each disk and partition, but in my branch ZFS tastes only disks
and partitions with type freebsd and freebsd-zfs. So if you have created ZFS on
top of an MBR partition with type ntfs, then the loader will be unable to
detect it.
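
The effect of that type filter can be modelled in a few lines. This is a
Python sketch under stated assumptions: the (name, type) pair layout and the
device names are invented for the example, and only the two type names come
from the message above.

```python
# Partition types the branch is described as tasting.
TASTED_TYPES = {"freebsd", "freebsd-zfs"}


def partitions_to_taste(partitions):
    """Return the names of partitions whose type is in TASTED_TYPES.

    `partitions` is a list of (name, type) pairs; this shape is an
    assumption made for the example.
    """
    return [name for name, ptype in partitions if ptype in TASTED_TYPES]


# Hypothetical MBR layout: the first slice is typed ntfs.
table = [("ada0s1", "ntfs"), ("ada0s2", "freebsd"), ("ada0s3", "freebsd-zfs")]
print(partitions_to_taste(table))  # ada0s1 is skipped even if it holds a pool
```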

-- 
WBR, Andrey V. Elsukov







Re: [CFC/CFT] large changes in the loader(8) code

2012-07-16 Thread Marius Strobl
On Mon, Jul 16, 2012 at 04:00:49PM +0400, Andrey V. Elsukov wrote:
 On 16.07.2012 15:31, Andriy Gapon wrote:
  Yes. It should work as before.
  
  Well, but it's obvious that zfs_probe_dev would be attempting to do some 
  unneeded
  stuff (trying to treat partitions as disks) for that case.  To me this is a 
  clear
  indication zfs_probe_dev is not optimal for arch-independent 
  implementation.  So I
  still think that arch_zfs_probe should decide what disks and partitions to 
  probe,
  and zfs_probe_dev should only probe what it's given and not try to be any 
  smarter.
  But I've repeated myself three times already :-)
 
 And we will have the same - several copies of the same code in each 
 architecture,
 which i have deleted...
 
 Sparc doesn't support DIOCGMEDIASIZE and DIOCGSECTORSIZE ioctls,
 so it will not check each partition, only fd that is passed to the 
 zfs_probe_dev.
 
 Currently there is only one problem with ZFS tasting, that can affect users -
 now we taste each disk and partition, but in the my branch ZFS tastes only 
 disks and
 partitions with type freebsd and freebsd-zfs. So if you have created ZFS 
 on top
 of MBR partition with type ntfs, then loader will be unable to detect it.
 

Sorry, I'm missing the big picture of ZFS support in the loader and
currently unfortunately don't have the time to look into it or your
patches. I don't think there's a way to determine the media and
sector sizes without actually looking at the Sun and/or VTOC8 labels
though. As for zfs_probe_dev, some user recently indicated that
on sparc64 we should rather look at the disk devices listed in
the boot-device environment variable in order to mimic what Solaris
does rather than trying to probe anything that might be a disk device,
mimicking what the FreeBSD/i386 ZFS loader does. Maybe that's a hint
whether an arch_zfs_probe should exist.
I can test patches once you guys have figured out how things should
work though.

Marius



Re: [CFC/CFT] large changes in the loader(8) code

2012-06-28 Thread Stefan Esser
Am 27.06.2012 21:14, schrieb Marcel Moolenaar:
 
 On Jun 27, 2012, at 12:08 PM, Christian Laursen wrote:
 
 On 06/27/12 16:28, John Baldwin wrote:
 On Wednesday, June 27, 2012 8:45:45 am Andrey V. Elsukov wrote:

 When we are in the FreeBSD, our loader can detect that device size
 is lower than it see and it will work. When primary header is OK, then
 other OSes should work with this GPT. When it isn't OK, you just can't
 load other OS :)

 Ah, yes.  The solution to violating standards is to make sure you never
 use standards-compliant software.  That's a great argument. :)

 (Although not entirely uncommon.  Standards aren't always perfect, but if
 we had a way to not gratuitously violate them it would be nice to avoid
 doing so.)

 To be standards compliant and allow whole-disk based mirroring to work at 
 the same time wouldn't nested GPT work like this?
 
 GPTs don't nest.

It is not strictly necessary to use nested GPT to have GMIRROR et al.
and GPT co-exist. And I think this is possible without violation of
any standard.

Just modify GEOM classes that keep state at the end of a partition to
leave some spare area *behind* the GEOM data. I.e.:

 MBR or Primary GPT header
 User Data 1: [0 .. (End -32KB)]
 GMIRROR Configuration and State
 User Data 2: [(End -32KB) .. End] (Spare area for Sec. GPT header)

If creating a GMIRROR (or other GEOM that keeps state at the end of
the provider) left at least the last 32KByte untouched (33 GPT sectors
rounded up to a power of 2), GPT could use this spare space to store
its Secondary Header.
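
The 32 KByte figure can be checked with a short calculation. The sketch below
assumes 512-byte sectors and the usual secondary GPT footprint of one header
sector plus 32 table sectors; the helper name is made up for the example.

```python
SECTOR_SIZE = 512   # bytes per sector, as assumed in the message
GPT_SECTORS = 33    # 1 header sector + 32 partition-table sectors


def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (n must be positive)."""
    return 1 << (n - 1).bit_length()


needed = GPT_SECTORS * SECTOR_SIZE   # bytes the secondary GPT occupies
spare = next_power_of_two(needed)    # bytes to leave untouched at the end
print(needed, spare, spare // SECTOR_SIZE)
```

So 33 sectors are 16896 bytes, and the next power of two is 32768 bytes, i.e.
64 sectors of 512 bytes.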

These sectors could be treated as part of the User Data area, i.e.
logical addresses would be translated by GMIRROR to skip the GMIRROR
configuration sector (which I'd enlarge to at least 4KB for alignment
of User Data 2).

This implies that the GMIRROR specification covers the whole provider
(including the spare space but without the sectors holding the GMIRROR
config, which are mapped out), since updates to the Secondary GPT
must be performed on all mirrored devices.

This is a complication of the current GMIRROR code, but could be added
without impact on existing disk layouts. (I have not checked, whether
backwards compatibility mandates introduction of a new GMIRROR class
that supports such spare space after the GMIRROR config data, but I
assume that there is enough spare space pre-initialized to 0 that can
be used to add a flag that declares the 32 KByte beyond the end of the
config data to be part of the mirror.)

The only modifications required are:

- If a GMIRROR is created, place the configuration sector 32 KByte
  before the end of the provider and mark it as GPT compatible.
  (It is unknown at this point, whether GPT is to be used on the
  mirror at a later time.)

- Tasting a provider should support looking for a valid GMIRROR (or
  GRAID) config sector not only at the end of the provider, but if
  that fails then also 32 KByte before the end of the provider.
  The GMIRROR is considered to be the provider for the GPT (i.e.
  the GMIRROR extends to 32 KByte beyond its config sector).

- Creating partitions with MBR or GPT within a GMIRROR is possible
  without modification. The only difference is that the protected
  GMIRROR configuration sector is physically within the range of
  sectors used for the partition, but logically mapped out. The
  space available for partitioning is the provider size minus the
  size of the GMIRROR configuration, just as it used to be.

- Reading and writing the mirror is allowed for all sectors in the
  User Data area, as in a normal GMIRROR. The only difference is
  the test for logical sectors in the last 32 KByte, for which the
  request is modified to be offset by a few sectors to skip the
  GMIRROR configuration sector. Requests that cover physical sectors
  before and behind these GMIRROR config sectors must be split.
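
The request translation described in the last item could look roughly like
this. It is a Python sketch of the idea only, not GMIRROR code; the function
name and the (start, count) extent representation are assumptions.

```python
def remap(lba: int, count: int, cfg_start: int, cfg_len: int):
    """Translate a logical request into physical (start, count) extents.

    The config sectors occupy physical [cfg_start, cfg_start + cfg_len)
    and are mapped out of the logical address space.
    """
    if count <= 0:
        return []
    end = lba + count
    if end <= cfg_start:      # entirely below the config sectors
        return [(lba, count)]
    if lba >= cfg_start:      # entirely in the spare area above them
        return [(lba + cfg_len, count)]
    # Straddles the config sectors: split into two physical extents.
    return [(lba, cfg_start - lba), (cfg_start + cfg_len, end - cfg_start)]
```

For example, with the config at physical sectors 100..107, a 5-sector request
starting at logical sector 98 becomes two extents, one of 2 sectors at 98 and
one of 3 sectors at 108.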

If instead of splitting off the final 32 KByte as User Data 2,
just the 33 sectors (of 512 Byte) required for GPT were assigned
to that area, then there would never be requests that extend beyond
the GMIRROR config sectors on GPT partitioned disks. But since such
requests would still be possible if MBR partitions were used, code to
handle such requests would still be required in GMIRROR.

There is one caveat, though: Creating a GMIRROR and then using an
OS that does not know about FreeBSD to partition the disk would
result in the GMIRROR configuration space being ignored.

Another problem could be, that the available space in the GPT is
the size of the disk minus the GMIRROR configuration sectors, i.e.
there is a difference between the number of physical sectors on
the disk and the number of sectors to be assigned to partitions
by GPT.

 Nothing but FreeBSD would understand the freebsd-geom partition
 type, so the inner GPT device should be valid and standards
 compliant.
 
 If it were standards compliant, it would be discoverable by non-FreeBSD.
 That clearly isn't the case -- hence it's not standards compliant. What
 for 

Re: [CFC/CFT] large changes in the loader(8) code

2012-06-28 Thread Stefan Esser
Sorry for following up to self, but ...

I just noticed somebody else suggesting the same method
(put GMIRROR configuration below Secondary GPT header),
but I think there is a problem:

If GMIRROR is used to mirror whole GPT partitioned drives,
then you want the GPT sectors to be considered part of the
mirror (to keep them identical when GPT partitions are
created/modified on the mirrored disks).

But the GMIRROR configuration must not be assigned to any
GPT partition. Therefore it must be protected, either by
hiding it (e.g. create a special partition to hold GEOM
config data, just to reserve the space within GPT, since
the configuration data will still be located by looking
at specific sectors of the provider), or by skipping the
sectors assigned to GEOM config data in the GEOM provider
that interprets them (e.g. GMIRROR).

The former only works if a GMIRROR (or GELI or whatever)
is created on a disk that already has GPT headers (since
these lead to the GEOM config data put before the Secondary
GPT header and allow the GEOM config to be marked as a
special partition in that header).

The latter only works on disks without GPT headers, since
the size of the provider will be smaller than the physical
disk. Even with the last physical sectors available for GPT,
the GPT headers will probably not conform to the standard,
since remapping of the sectors to hide the GMIRROR config
will lead to different logical sector numbers for the
secondary GPT header when looked at with or without GMIRROR
loaded.

I still think it is possible to find a layout, that does
not violate the GPT standard (use last LBAs on disk, have
self-referential information like own LBA address consistent
with physical block numbers and block numbers presented to
users of GMIRROR et al.).

Perhaps, GMIRROR could treat its configuration sector
(that is placed at the sector just below the secondary
GPT header) as read only. Requests may go to all sectors
below and also to the area above the GMIRROR config sector
used for the GPT header (to write it to all mirrored
devices).

But this is also ugly, since GPT must know to not assign
the GMIRROR config sector to any partition (it is read-
only for user requests, but writable on each individual
drive in case of GMIRROR configuration or state changes,
just as it is now). The reservation would best be achieved by
use of a specific GPT partition for the configuration
data, for which GPT headers must exist before the
GMIRROR is created (or both must be created at the same
time, but that would mix GPT knowledge into GMIRROR).

All of the above is ugly, I'm afraid :(

Regards, Stefan


Re: [CFC/CFT] large changes in the loader(8) code

2012-06-28 Thread Kevin Oberman
On Wed, Jun 27, 2012 at 2:39 PM, Poul-Henning Kamp p...@phk.freebsd.dk wrote:

 I would like to point out that all other operating system which has
 had this precise problem, have solved it by adding a bootfs partition
 to hold the kernel+modules required to truly understand the disk-layout ?

I have seen some form of this solution suggested three times (once by
me) and now by someone who I think I can safely state is pretty
familiar with geom. So far I have seen no direct response and only a
passing comment by jhb that it might be difficult.

Sometimes standards need to be broken. Sometimes they suck so badly
that the entire industry ignores them. But, unless there is a good
reason to ignore them, one should fully justify doing so, all the more
so when there are obvious ways that non-compliance can lead to
disaster. (Think of a geli disk where some other software steps on the
last block.)

Moreover, I think I can see a legitimate case, though I have not tried it.

Say I have a FreeBSD system with a large, unused space on the disk and
it uses gmirror. I decide that I need to have the ability to
occasionally boot Linux on this system (or, even Windows 8). For some
reason, and I can think of several, I can't use a virtual system. I
create a new partition for the second OS and install it. It knows
nothing about the gmirror, so it just uses the disk it is installed on
and never touches the metadata.

Is this possible? Looks reasonable to me.

I really, really feel uncomfortable about all of this, and all the more
when people start claiming that, by a very strained interpretation of what
appears on the surface to be a clear specification, they are not
violating the standard.
-- 
R. Kevin Oberman, Network Engineer
E-mail: kob6...@gmail.com


Re: [CFC/CFT] large changes in the loader(8) code

2012-06-28 Thread Wojciech Puchar

Just modify GEOM classes that keep state at the end of a partition to
leave some spare area *behind* the GEOM data. I.e.:



what is really the problem at all?

just leave it as is. If someone wants to use gpart and mirror then mirroring
every partition is simpler. usually not every partition needs to be
mirrored.


or mirror a whole and make gpart in it, it should still boot fine.

even better - update bsdlabel to work with 2TB devices.

MUCH better.


Re: [CFC/CFT] large changes in the loader(8) code

2012-06-28 Thread Andrey V. Elsukov
On 28.06.2012 14:35, Wojciech Puchar wrote:
 Just modify GEOM classes that keep state at the end of a partition to
 leave some spare area *behind* the GEOM data. I.e.:

 
 what is really a problem aat all?
 
 just leave as is. If someone want's use gpart and mirror then mirroring every 
 partition is simpler.
 usually not every partition needs to be mirrored.
 
 or mirror a whole and make gpart in it, it should still boot fine.

I already reverted changes related to the GPT and GEOM metadata detection.

 even better - update bsdlabel to work with 2TB devices.
 MUCH better.

DragonFlyBSD has the disklabel64 partitioning scheme. Making a port is a simple
task.

-- 
WBR, Andrey V. Elsukov




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-28 Thread Marcel Moolenaar

On Jun 28, 2012, at 3:10 AM, Stefan Esser wrote:
 
 All of the above is ugly, U'm afraid :(

Indeed. The only sane way is to put the metadata in a partition of its own.
Every compliant OS will respect that and consequently will not scribble over
the data unintentionally. Any other scheme that puts valuable data in some
undocumented or unregistered location is violating the GPT spec right away
and is susceptible to being clobbered unintentionally.

If the metadata is in its own partition, one can document the metadata layout
and provide a reference implementation. That way one increases the chance
that someone, somewhere may port support for it to some other OS. Lacking
widespread support for the mirroring scheme, I think that the notion that one
can safely and reliably mirror entire disks (read: mirror data not owned or
controlled by FreeBSD) is a very questionable one -- all one has to do is
boot some other OS and start modifying one of its partitions and you've
failed to achieve the objective.

My advice is to leave disk mirroring to H/W or firmware solutions and use
FreeBSD mirroring for FreeBSD partitions only. If you want to mirror the
whole disk, don't partition the disk with non-FreeBSD partitioning schemes
and partition only with FreeBSD-specific schemes or put a FreeBSD file
system on the whole disk. In other words: make the whole disk private to
FreeBSD.

Whether or not people agree with this is beside the point. All I'm saying
is that unique disk identifiers such as the UniqueMBRSignature (a 4 byte
ID written at offset 440 in the MBR) or the DiskGUID (a UUID written to
offset 56 in the GPT header) cannot, in general, be mirrored across disks
if OSes can see the mirrored disks as independent entities. One violates
the spec on grounds of making the *unique* disk identifier non-unique by
presenting OSes with multiple disks that have identical IDs.
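
To make the two identifiers concrete, here is a minimal Python sketch that
pulls them out of raw sector images using the offsets given above. The
function names are made up for the example, the MBR signature is read as
little-endian here, and nothing below is loader or GEOM code.

```python
import struct
import uuid


def mbr_signature(mbr: bytes) -> int:
    """4-byte disk signature at byte offset 440 of the MBR sector."""
    return struct.unpack_from("<I", mbr, 440)[0]


def gpt_disk_guid(header: bytes) -> uuid.UUID:
    """16-byte DiskGUID at byte offset 56 of the GPT header.

    GPT stores GUIDs in mixed-endian form, which is what bytes_le expects.
    """
    return uuid.UUID(bytes_le=header[56:72])


def same_identity(a_mbr, a_hdr, b_mbr, b_hdr) -> bool:
    """True if two disks present identical 'unique' identifiers."""
    return (mbr_signature(a_mbr) == mbr_signature(b_mbr)
            and gpt_disk_guid(a_hdr) == gpt_disk_guid(b_hdr))
```

Two halves of a whole-disk mirror carry byte-identical copies of these
sectors, so `same_identity` would report True for them when an OS sees the
disks individually.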

-- 
Marcel Moolenaar
mar...@xcllnt.net




geom metadata (Re: [CFC/CFT] large changes in the loader(8) code)

2012-06-28 Thread Dieter BSD
These schemes to just put the metadata in some special location
and have all the tools know about it create a lot of problems.
There is always some tool that doesn't know. There is always
some human that doesn't know. It is hard to tell the difference between
real metadata and some other data that happens to look similar.
The convoluted logic is prone to bugs. I have seen complaints
from people that have lost data when some tool wrote metadata
on top of it. Losing data is absolutely unacceptable.

There is a time to be clever and a time to just keep it simple.

Define a FreeBSD geom metadata GPT partition type.
Create a 6 sector (3 KiB) FreeBSD geom metadata GPT partition just after
the GPT header.

PMBR
pri GPT header
pri GPT table
FreeBSD geom metadata
data partition(s)
sec GPT table
sec GPT header

Advantages:
1) All OSes will know that this space is taken.
2) Humans looking at the GPT partition table will know that this space is
taken, and what it is being used for.
3) The 1st data partition becomes 4 KiB aligned, which is important for
many recent disks (yes the metadata partition is not 4K aligned, but is
presumably accessed only rarely, so it is not a performance problem)

Disadvantages
1) uses up a partition type
2) uses up a partition
With GPT neither of these disadvantages is significant.

Alternately one could make the geom metadata partition smaller and
add a spaceholder partition to get 4K alignment. Yes you can just
leave a hole, but putting a partition there labeled 4K_alignment
makes it obvious why it is there.
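
The alignment claim can be verified with a little arithmetic. The sketch
below assumes 512-byte sectors and the common layout in which the primary GPT
table takes 32 sectors (128 entries of 128 bytes); those numbers are not
spelled out in the proposal itself.

```python
SECTOR = 512      # bytes per sector (assumed)
PMBR = 1          # protective MBR at LBA 0
GPT_HEADER = 1    # primary GPT header at LBA 1
GPT_TABLE = 32    # 128 entries * 128 bytes / 512 bytes per sector
GEOM_META = 6     # proposed metadata partition, 3 KiB

meta_start = PMBR + GPT_HEADER + GPT_TABLE   # first LBA of the metadata
data_start = meta_start + GEOM_META          # first LBA of the data partition


def aligned_4k(lba: int) -> bool:
    """True if the byte offset of this LBA is a multiple of 4 KiB."""
    return (lba * SECTOR) % 4096 == 0


print(meta_start, aligned_4k(meta_start))
print(data_start, aligned_4k(data_start))
```

Under these assumptions the metadata partition starts at LBA 34, which is not
4 KiB aligned, and the first data partition starts at LBA 40, byte offset
20480, which is.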

So, what have I missed?


Re: [CFC/CFT] large changes in the loader(8) code

2012-06-28 Thread Pawel Jakub Dawidek
On Thu, Jun 28, 2012 at 08:33:17AM -0700, Marcel Moolenaar wrote:
 
 On Jun 28, 2012, at 3:10 AM, Stefan Esser wrote:
  
  All of the above is ugly, U'm afraid :(
 
 Indeed. The only sane way is to put the metadata in a partition of its own.
 Every compliant OS will respect that and consequently will not scribble over
 the data unintentionally. Any other scheme that puts valuable data in some
 undocumented or unregistered location is violating the GPT spec right away
 and is susceptible to being clobbered unintentionally.

If the user runs:

# gpart create -s GPT /dev/mirror/foo

for me it is obvious that he wants to partition the mirror device and
not individual disks. Because the mirror was configured earlier, do you
expect gmirror to somehow detect that someone is writing GPT metadata
later and magically place GPT metadata on the raw disk and move mirror's
metadata to some magic partition? Not to mention that the mirror itself
doesn't have to be configured on top of raw disks. And not to mention
that the mirror may never be partitioned.

If GPT in your opinion is limited only to raw disks then I guess the
best way to fix that is to refuse to configure GPT on anything except
raw disks (which was already proposed by Andrey?). In my opinion this is
unacceptable, but I think this is what you are suggesting.

One of the GEOM design goals was to be flexible. Let the user decide in
what order he wants to configure various layers. How do you know that in
every possible scenario software mirroring should come after
partitioning and encryption after mirroring? Why can't we provide
flexible tools to the user and let him decide? Maybe GPT nesting
violates standards, but why can't we support it as an extension, really?

I recognize the need to warn users if they use FreeBSD-specific
features. We do that with non-standard APIs. So how about this.

Let's modify gpart(8) to print a warning if GPT is configured on
something other than a raw disk. Let the warning say that such a
configuration is non-standard and problems are expected if the disk is
shared with other OSes.
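
As a purely illustrative model of that proposal (gpart(8) is not known to
behave this way, and the function name, the `is_raw_disk` flag and the
message text are all made up), the check could be as small as:

```python
from typing import Optional


def gpt_create_warning(provider: str, is_raw_disk: bool) -> Optional[str]:
    """Return a warning when GPT is created on something other than a raw disk.

    How the caller would determine `is_raw_disk` is left open here.
    """
    if is_raw_disk:
        return None
    return (f"warning: {provider} is not a raw disk; a GPT on it is "
            "non-standard and other OSes sharing the disk may misread it")


print(gpt_create_warning("mirror/foo", is_raw_disk=False))
```

The operation would still go ahead; only the message is added, which leaves
the decision with the user.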

In my opinion that's fair.

With such a warning in place, I think we can allow users to decide on
their own if they really want that or not. Then, we can also improve
FreeBSD boot loader to play nice with FreeBSD-specific extensions.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-28 Thread Alexander Leidinger
On Thu, 28 Jun 2012 08:33:17 -0700 Marcel Moolenaar mar...@xcllnt.net
wrote:

 My advise is to leave disk mirroring to H/W or firmware solutions and
 use FreeBSD mirroring for FreeBSD partitions only. If you want to
 mirror the whole disk, don't partition the disk with non-FreeBSD
 partitioning schemes and partition only with FreeBSD-specific schemes
 or put a FreeBSD file system on the whole disk. In other words: make
 the whole disk private to FreeBSD.

If I gmirror the entire disk, I already expressed my interest to make
the whole disk private to FreeBSD, haven't I? Or are you suggesting to
convince all BIOS vendors to include the ability to boot from some kind
of FreeBSD private partitioning scheme (not MBR as it is not
suitable, not GPT as you are not OK to use it on a gmirror)?

 Whether or not people agree with this is beside the point. All I'm
 saying is that unique disk identifiers such as the
 UniqueMBRSignature (a 4 byte ID written at offset 440 in the MBR)
 or the DiskGUID (an UUID written to offset 56 in the GPT header)
 cannot, in general, be mirrored across disks if OSes can see the
 mirrored disks as independent entities. One violates the spec on
 grounds of making the *unique* disk identifier non-unique by
 presenting OSes with multiple disks that have identical IDs.

What about multipathing? In case the disk is attached via two paths but
multipath is not enabled, the OS sees the same disk (and the same
identical unique disk identifier) multiple times. Is this a violation
of the spec too?
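
For reference, the two identifiers under discussion can be read from raw
sectors with a few lines of Python. This is only a minimal sketch, assuming
the offsets quoted above (4 bytes at offset 440 of the MBR, 16 bytes at
offset 56 of the GPT header); the function names are made up for
illustration and do not come from any FreeBSD tool.

```python
import struct
import uuid


def mbr_signature(sector0: bytes) -> int:
    """Return the 4-byte UniqueMBRSignature stored at offset 440 of the MBR."""
    return struct.unpack_from("<I", sector0, 440)[0]


def gpt_disk_guid(gpt_header: bytes) -> uuid.UUID:
    """Return the DiskGUID stored at offset 56 of the GPT header.

    GPT stores GUIDs in mixed-endian form, which is what bytes_le decodes.
    """
    return uuid.UUID(bytes_le=bytes(gpt_header[56:72]))


def same_disk_guid(header_a: bytes, header_b: bytes) -> bool:
    """True if two GPT headers carry the same DiskGUID.

    Mirror components and two paths to one physical disk both satisfy
    this; the identifier alone cannot tell the two situations apart.
    """
    return gpt_disk_guid(header_a) == gpt_disk_guid(header_b)
```

Whether identical IDs mean "same disk seen twice" or "two mirrored disks"
is exactly what such a comparison cannot decide on its own.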

Bye,
Alexander.

-- 
http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org   netchild @ FreeBSD.org  : PGP ID = 72077137


Re: [CFC/CFT] large changes in the loader(8) code

2012-06-28 Thread Marcel Moolenaar

On Jun 28, 2012, at 10:25 AM, Pawel Jakub Dawidek wrote:

 On Thu, Jun 28, 2012 at 08:33:17AM -0700, Marcel Moolenaar wrote:
 
 On Jun 28, 2012, at 3:10 AM, Stefan Esser wrote:
 
 All of the above is ugly, I'm afraid :(
 
 Indeed. The only sane way is to put the metadata in a partition of its own.
 Every compliant OS will respect that and consequently will not scribble over
 the data unintentionally. Any other scheme that puts valuable data in some
 undocumented or unregistered location is violating the GPT spec right away
 and is susceptible to being clobbered unintentionally.
 
 If the user runs:
 
   # gpart create -s GPT /dev/mirror/foo
 
 for me it is obvious that he wants to partition the mirror device and
 not individual disks.

It could definitely be interpreted as the user knowing what he/she
wants and as such design an infrastructure around this assumption.
If users were at least as knowledgeable as developers, my concerns
wouldn't be as big. But we all know how knowledgeable users can be
and, like it or not, even developers aren't gurus in everything. We
may think we know stuff, but in practice we're just as clueless in
some cases as users -- more clueless even sometimes.

So you may think the intent is obvious, but you should know better.

 Let's modify gpart(8) to print a warning if GPT is configured on
 something other than a raw disk. Let the warning say that such a
 configuration is non-standard and problems are expected if the disk is
 shared with other OSes.

Yes. I think we finally reached the point we should have reached
years ago. With the proper tooling, our flexible infrastructure
can be used in a safe and compliant way while still giving the
freedom to those who unwisely think they know better.

Build it and I'll concur.

-- 
Marcel Moolenaar
mar...@xcllnt.net




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-28 Thread Marcel Moolenaar

On Jun 28, 2012, at 12:49 PM, Alexander Leidinger wrote:

 On Thu, 28 Jun 2012 08:33:17 -0700 Marcel Moolenaar mar...@xcllnt.net
 wrote:
 
 My advice is to leave disk mirroring to H/W or firmware solutions and
 use FreeBSD mirroring for FreeBSD partitions only. If you want to
 mirror the whole disk, don't partition the disk with non-FreeBSD
 partitioning schemes and partition only with FreeBSD-specific schemes
 or put a FreeBSD file system on the whole disk. In other words: make
 the whole disk private to FreeBSD.
 
 If I gmirror the entire disk, I already expressed my interest to make
 the whole disk private to FreeBSD, haven't I?

No. All you've done is type some commands. There's no inherent value
in it that relays that you know what you're doing. I have no problem
accepting that you do in fact know what you're doing, but that doesn't
mean that anyone who types the same sequence of commands is as skilled
as you are -- that would be a silly inference. What you need to do is
not have it be about you, but about some random user.

 Or are you suggesting to
 convince all BIOS vendors to include the ability to boot from some kind
 of FreeBSD private partitioning scheme (not MBR as it is not
 suitable, not GPT as you are not OK to use it on a gmirror)?

I would be having fewer problems if the mirroring didn't force the backup
GPT header in anything but the last sector. If the metadata was somewhere
else, then we wouldn't need to kluge various places to deal with the
ambiguity and visible interoperability problems of the various tools and
OSes. Thus, it's not that I object to the mirroring per se, just to the
mirroring as it is currently implemented with gmirror.

 What about multipathing? In case the disk is attached via two paths but
 multipath is not enabled, the OS sees the same disk (and the same
 identical unique disk identifier) multiple times. Is this a violation
 of the spec too?

It's the same disk, isn't it? The OS can actually use the property
of the ID to infer that it has already seen this disk and not create
multiple device nodes.

-- 
Marcel Moolenaar
mar...@xcllnt.net




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-28 Thread Miroslav Lachman

Pawel Jakub Dawidek wrote:

On Thu, Jun 28, 2012 at 08:33:17AM -0700, Marcel Moolenaar wrote:


On Jun 28, 2012, at 3:10 AM, Stefan Esser wrote:


All of the above is ugly, I'm afraid :(


Indeed. The only sane way is to put the metadata in a partition of its own.
Every compliant OS will respect that and consequently will not scribble over
the data unintentionally. Any other scheme that puts valuable data in some
undocumented or unregistered location is violating the GPT spec right away
and is susceptible to being clobbered unintentionally.


If the user runs:

# gpart create -s GPT /dev/mirror/foo

for me it is obvious that he wants to partition the mirror device and
not individual disks. Because the mirror was configured earlier, do you
expect gmirror to somehow detect that someone is writing GPT metadata
later and magically place GPT metadata on the raw disk and move mirror's
metadata to some magic partition? Not to mention that the mirror itself
doesn't have to be configured on top of raw disks. And not to mention
that the mirror may never be partitioned.

If GPT in your opinion is limited only to raw disks then I guess the
best way to fix that is to refuse to configure GPT on anything except
raw disks (which was already proposed by Andrey?). In my opinion this is
unacceptable, but I think this is what you are suggesting.

One of the GEOM design goals was to be flexible. Let the user decide in
what order he wants to configure various layers. How do you know that in
every possible scenario software mirroring should come after
partitioning and encryption after mirroring? Why can't we provide
flexible tools to the user and let him decide? Maybe GPT nesting
violates standards, but why can't we support it as an extension, really?

I recognize the need to warn users if they use FreeBSD-specific
features. We do that with non-standard APIs. So how about this.

Let's modify gpart(8) to print a warning if GPT is configured on
something other than a raw disk. Let the warning say that such a
configuration is non-standard and problems are expected if the disk is
shared with other OSes.

In my opinion that's fair.

With such a warning in place, I think we can allow users to decide on
their own if they really want that or not. Then, we can also improve
FreeBSD boot loader to play nice with FreeBSD-specific extensions.


I think this is a valid point of view. FreeBSD already does things not
supported by other OSes and I am completely fine with it - I am running
FreeBSD on servers, not sharing anything with other OSes, so I prefer
extended FreeBSD-specific features over 100% standards-compliant
behaviour that cripples SW mirroring etc.


I think that our tools should support / provide all standards-compliant
(compatible) features, but let the user choose any other extended
non-compatible features if the user wants them. Even if it can be seen
as foot shooting by somebody else.


And maybe one day our solution will be widespread and taken as a standard.

Miroslav Lachman


Re: [CFC/CFT] large changes in the loader(8) code

2012-06-28 Thread Pawel Jakub Dawidek
On Thu, Jun 28, 2012 at 02:54:43PM -0700, Marcel Moolenaar wrote:
 On Jun 28, 2012, at 12:49 PM, Alexander Leidinger wrote:
  Or are you suggesting to
  convince all BIOS vendors to include the ability to boot from some kind
  of FreeBSD private partitioning scheme (not MBR as it is not
  suitable, not GPT as you are not OK to use it on a gmirror)?
 
 I would be having fewer problems if the mirroring didn't force the backup
 GPT header in anything but the last sector. [...]

GPT backup header is placed in the last sector of the mirror device,
just like the user asked. Gmirror doesn't force anything. User decides
to put GPT partitioning on the mirror device instead of raw disk.
Gmirror doesn't even know and doesn't have to know how the user uses
data area on the mirror device.

 [...] If the metadata was somewhere
 else, then we wouldn't need to kluge various places to deal with the
 ambiguity and visible interoperability problems of the various tools and
 OSes. [...]

Where is somewhere else, exactly?

If somewhere else on this disk, then where? At the beginning of the disk?
Then you would complain that it keeps metadata where the primary header
should be located and also MBR metadata, BSDlabel metadata, etc.
Somewhere in the middle of the disk? Some future GPTng may want to use
the same spot, but also gmirror-unaware boot loader will see corrupted
data (shifted by one sector). Come on...

If somewhere else is not on this disk, then I'm sorry, but this is
totally impractical. Disks are the place you store stuff. In 99% of the
cases there is no other place to store it, but the disk itself. Should
we ask users to use additional disk to keep mirror's metadata?

 [...] Thus, it's not that I object to the mirroring per se, just to the
 mirroring as it is currently implemented with gmirror.

Do you know software RAID (>=1) or volume manager that doesn't keep
metadata on component disks?

PS. We are discussing two totally different things here:
1. Does placing GPT on anything but a raw disk violate the spec? I can
   agree that it does and I'm happy with gpart(8) growing a warning.
2. How to do software mirroring. Despite trying really hard I'm not sure
   what alternative you are proposing. Could you be more specific and
   describe how gmirror should be implemented in your opinion?

  What about multipathing? In case the disk is attached via two paths but
  multipath is not enabled, the OS sees the same disk (and the same
  identical unique disk identifier) multiple times. Is this a violation
  of the spec too?
 
 It's the same disk, isn't it? The OS can actually use the property
 of the ID to infer that it has already seen this disk and not create
 multiple device nodes.

You cannot trust some id that is found on disk to be unique, as all
your assumptions break when the user decides to dd(1)-copy content of
this disk to another disk, for example.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-28 Thread Marcel Moolenaar

On Jun 28, 2012, at 4:07 PM, Pawel Jakub Dawidek wrote:
 
 I would be having fewer problems if the mirroring didn't force the backup
 GPT header in anything but the last sector. [...]
 
 GPT backup header is placed in the last sector of the mirror device,
 just like the user asked. Gmirror doesn't force anything. User decides
 to put GPT partitioning on the mirror device instead of raw disk.
 Gmirror doesn't even know and doesn't have to know how the user uses
 data area on the mirror device.

This really is a cop-out paragraph.

 [...] If the metadata was somewhere
 else, then we wouldn't need to kluge various places to deal with the
 ambiguity and visible interoperability problems of the various tools and
 OSes. [...]
 
 Where is somewhere else, exactly?

I already suggested a few things in this thread. Go read it.

I'm bored now, so I'll just wait for UEFI booting to be forced upon
those who mirror the whole disk with gmirror. I think that's when
we will have a more substantial and meaningful continuation of this
thread.

-- 
Marcel Moolenaar
mar...@xcllnt.net




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-28 Thread Hiroki Sato
Pawel Jakub Dawidek p...@freebsd.org wrote
  in 20120628230725.gb1...@garage.freebsd.pl:

pj PS. We are discussing two totally different things here:
pj 1. Does placing GPT on anything but a raw disk violate the spec? I can
pj    agree that it does and I'm happy with gpart(8) growing a warning.

 I agree that there is a sort of violation, but in practice most
 implementations which use GPT can recognize the backup header by
 using the alternative LBA field, as long as the primary one is not
 corrupted.

 One thing we have to consider is what happens when the primary header
 becomes broken.  In that case, if GEOM metadata is placed at the
 end of the raw disk, the GPT will be lost and cannot be recovered by
 non-GEOM-aware software including the BIOS and other OSes.  Also, even
 for FreeBSD it causes a boot failure.  The modification which ae@
 proposes mitigates this case.  Of course, the BIOS or EFI may still not
 recognize the GPT behind the corrupted header because the backup header
 is not located at the end.  In that case none of the partitions are
 recognized and FreeBSD does not boot.  This is the trade-off when
 we use GPT in a logical volume provided by GEOM.  In short, the risk
 is that the backup header does not work as a backup when the primary is
 broken.  I agree that putting in a warning about that is good and
 sufficient.  Whether this risk is acceptable or not depends on the
 sysadmin.  Also, we can describe the pros and cons in detail in a
 section of the handbook because wblock@ and I are working on updating
 it.

pj 2. How to do software mirroring. Despite trying really hard I'm not sure
pj    what alternative you are proposing. Could you be more specific and
pj    describe how gmirror should be implemented in your opinion?

 I do not think this topic is related to ae@'s change and this should
 be discussed in a separate thread.  His change aims to support a
 non-standard GPT header location in a quite limited situation, not
 actively promote such a configuration.  The issue of GPT+GEOM is not
 limited to gmirror.  Just putting GEOM::LABEL metadata causes the
 same issue.

-- Hiroki




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread John Baldwin
On Wednesday, June 27, 2012 12:50:20 am Andrey V. Elsukov wrote:
 On 26.06.2012 21:37, John Baldwin wrote:
  4. The gptboot now searches the backup GPT header in the previous sectors,
  when it finds the GEOM:: signature in the last sector. PMBR code also
  tries to do the same:
  common/gpt.c
  i386/pmbr/pmbr.s
  
  GPT really wants the backup header at the last LBA.  I know you can set it,
  but I've interpreted that as a way to see if the primary header is correct
  or not.  It seems to me that GPT tables created in this fashion (inside a
  GEOM provider) will not work properly with partition editors for other
  OS's.  I'm hesitant to encourage the use of this as I do think putting GPT
  inside of a gmirror violates the GPT spec.
 
 The standard says:
 The following test must be performed to determine if a GPT is valid:
 • Check the Signature
 • Check the Header CRC
 • Check that the MyLBA entry points to the LBA that contains the GUID
   Partition Table
 • Check the CRC of the GUID Partition Entry Array
 If the GPT is the primary table, stored at LBA 1:
 • Check the AlternateLBA to see if it is a valid GPT
 If the primary GPT is corrupt, software must check the last LBA of the
 device to see if it has a valid GPT Header and point to a valid GPT
 Partition Entry Array.

Right, we break the last rule.  If you want to use a partition editor
that doesn't grok gmirror (because you are using another OS's editor),
to repair a GPT, it will do the wrong thing.
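
To make the failure mode concrete, the validation sequence quoted above can
be sketched roughly as follows. This is a simplified illustration operating
on an in-memory disk image, assuming the 92-byte header layout from the UEFI
specification; the names gpt_header_valid and locate_gpt are hypothetical,
and the AlternateLBA cross-check of a valid primary header is left out.

```python
import struct
import zlib

# GPT header layout (little-endian, 92 bytes) as described by the UEFI spec.
GPT_HEADER = struct.Struct("<8sIIIIQQQQ16sQIII")


def gpt_header_valid(disk: bytes, lba: int, sector: int = 512) -> bool:
    """Apply the quoted checks to the header stored at `lba` of a disk image."""
    start = lba * sector
    if lba < 0 or start + sector > len(disk):
        return False
    raw = disk[start:start + sector]
    (sig, _rev, hdr_size, hdr_crc, _rsvd, my_lba, _alt_lba, _first, _last,
     _guid, ent_lba, n_ent, ent_size, ent_crc) = GPT_HEADER.unpack_from(raw)
    if sig != b"EFI PART":                      # check the Signature
        return False
    if not GPT_HEADER.size <= hdr_size <= sector:
        return False
    # The header CRC is computed with the CRC field itself set to zero.
    zeroed = raw[:16] + b"\x00" * 4 + raw[20:hdr_size]
    if zlib.crc32(zeroed) != hdr_crc:           # check the Header CRC
        return False
    if my_lba != lba:                           # MyLBA must point at this LBA
        return False
    ent_start = ent_lba * sector
    ent_end = ent_start + n_ent * ent_size
    if ent_end > len(disk):
        return False
    # check the CRC of the GUID Partition Entry Array
    return zlib.crc32(disk[ent_start:ent_end]) == ent_crc


def locate_gpt(disk: bytes, sector: int = 512):
    """Return the LBA of the header a spec-following tool would use, or None."""
    if gpt_header_valid(disk, 1, sector):
        return 1
    # Primary is corrupt: the spec only tells software to look at the last LBA.
    last_lba = len(disk) // sector - 1
    if gpt_header_valid(disk, last_lba, sector):
        return last_lba
    return None
```

With a GPT created inside a gmirror provider, the raw disk's last sector
holds the mirror metadata rather than a GPT header, so locate_gpt() returns
None as soon as the primary header is damaged.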

 If a user wants to modify the GPT in a disk editor from another OS,
 he can do it, and it should work. The result depends only on the
 partition editor: it might overwrite the last sector or it might not.

I would not assume it would work at all.  If it can't trust the
primary GPT, it has to assume the alternate is at the last LBA.

  5. Also the pmbr image now contains one fake partition record.
  When the first several sectors are damaged the kernel can't detect GPT
  (see the RECOVERING section in gpart(8)). We can restore the PMBR with
  the dd(1) command, but the old pmbr image has an empty partition table
  and the loader isn't able to boot from GPT when there is no partition
  record in the PMBR. Now it will be able to. When pmbr is installed via
  the 'gpart bootcode' command, the kernel correctly modifies this
  partition record. So, this is only for the first rescue step.
  
  As I said earlier, I do not think this is appropriate and that instead
  gpart should have an appropriate 'recover' command to install just the
  pmbr on a disk and also create a correct entry in the MBR if needed while
  doing so.
 
 gpart(8) is only one of several geom(8) tools to manage objects of a GEOM
 class.
 It only sends control requests to the kernel. If GPT is not detected,
 there are no geom objects to manage. And we can't write bootcode with gpart(8).
 I think that adding such functions to gpart(8) is not good. Maybe
 boot0cfg is the better tool for that. Also we still don't have any tool to
 install zfsboot.

We can't write bootcode with gpart?  What do you think the 'bootcode' command
does?

Also, there is no reason we can't have a 'recover' command that attempts to
recover a corrupted table including repairing the PMBR.  gpart(8) already
generates a full PMBR when you use 'gpart create' to create a GPT even though
there isn't a GPT object yet.
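
As a rough illustration of what "a correct entry in the MBR" means for a
GPT disk, a protective MBR with a single 0xEE record could be assembled
like this. It is a sketch under assumptions: the field layout follows the
UEFI specification's description of the protective MBR, the boot code area
is left empty, and protective_mbr is a hypothetical helper, not what
gpart(8) or the pmbr image actually contain.

```python
import struct


def protective_mbr(disk_sectors: int) -> bytes:
    """Build a 512-byte protective MBR with a single 0xEE entry."""
    # The entry covers everything after LBA 0, capped at the 32-bit maximum.
    size = min(disk_sectors - 1, 0xFFFFFFFF)
    entry = struct.pack(
        "<B3sB3sII",
        0x00,               # boot indicator: not active
        b"\x00\x02\x00",    # starting CHS, corresponds to LBA 1
        0xEE,               # partition type: GPT protective
        b"\xff\xff\xff",    # ending CHS: maximum value (simplification)
        1,                  # starting LBA
        size,               # size in sectors
    )
    mbr = bytearray(512)    # boot code area left zeroed in this sketch
    mbr[446:462] = entry    # first of the four partition table slots
    mbr[510:512] = b"\x55\xaa"  # boot signature
    return bytes(mbr)
```

The size field is the part that depends on the disk, which is why a static
image written with dd(1) cannot carry a fully correct record.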

-- 
John Baldwin


Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread John Baldwin
On Tuesday, June 26, 2012 5:23:08 pm Pawel Jakub Dawidek wrote:
 On Tue, Jun 26, 2012 at 01:37:11PM -0400, John Baldwin wrote:
   4. The gptboot now searches the backup GPT header in the previous sectors,
   when it finds the GEOM:: signature in the last sector. PMBR code also
   tries to do the same:
   common/gpt.c
   i386/pmbr/pmbr.s
  
  GPT really wants the backup header at the last LBA.  I know you can set it,
  but I've interpreted that as a way to see if the primary header is correct
  or not. [...]
 
 My interpretation is different: The way to verify if the header is valid
 is to check its checksum, not to check if the backup header location in
 the primary header points at the last LBA.
 
 Of course if primary header's checksum is incorrect it is hard to trust
 that the backup header location is correct. And we need the backup
 header when the primary header is invalid...

Right, which is why this fails.

  [...] It seems to me that GPT tables created in this fashion (inside a
  GEOM provider) will not work properly with partition editors for other
  OS's.  I'm hesitant to encourage the use of this as I do think putting GPT
  inside of a gmirror violates the GPT spec.
 
 I don't think so. Most common case is to configure partitions on top of
 a mirror. Mirroring partitions is less common. Mostly because of
 hardware RAIDs being popular. You don't expect hardware RAID vendor to
 mirror partitions. Partition editors for other OS's won't work, but only
 because they don't support gmirror. If they wouldn't recognize and
 support some hardware (or pseudo-hardware) RAIDs there will be the same
 problem.

Hardware RAIDs hide the metadata from the disk that the BIOS (and disk
editors) see.  Thus, putting a GPT on a hardware RAID volume works fine
as the logical volume is always seen by all OS's consistently.  The same
is even true of the software RAID that graid supports since the metadata
is defined by the vendor and thus the logical volume is always seen by
other OS's consistently.

My approach has been to only use gmirror with MBR so far, though I realize
that doesn't work above 2TB (until recently one had to have a hardware RAID
to get above 2TB anyway which made this last a moot point).
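
The 2TB figure follows from the MBR's 32-bit sector fields; a quick
back-of-the-envelope check, assuming the usual 512-byte sectors (the helper
name is made up for illustration):

```python
def mbr_max_bytes(sector_size: int = 512) -> int:
    """Largest size expressible with the MBR's 32-bit sector count field."""
    return (2 ** 32) * sector_size


limit = mbr_max_bytes()
print(limit)             # 2199023255552 bytes
print(limit // 2 ** 40)  # 2 (TiB)
```

With 4096-byte sectors the same arithmetic gives 16 TiB, but that depends
on the firmware and every tool agreeing on the sector size.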

I won't object to patching our tools to handle this, but I think it is a
really bad idea that users will have a hard time recovering from when they
are bitten by it.

-- 
John Baldwin


Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Andrey V. Elsukov
On 27.06.2012 16:07, John Baldwin wrote:
 • Check the Signature
 • Check the Header CRC
 • Check that the MyLBA entry points to the LBA that contains the GUID
   Partition Table
 • Check the CRC of the GUID Partition Entry Array
 If the GPT is the primary table, stored at LBA 1:
 • Check the AlternateLBA to see if it is a valid GPT
 If the primary GPT is corrupt, software must check the last LBA of the
 device to see if it has a valid GPT Header and point to a valid GPT
 Partition Entry Array.
 
 Right, we break the last rule.  If you want to use a partition editor
 that doesn't grok gmirror (because you are using another OS's editor),
 to repair a GPT, it will do the wrong thing.

When we are in FreeBSD, our loader can detect that the device size
is smaller than what it sees and it will work. When the primary header is
OK, other OSes should work with this GPT. When it isn't OK, you just can't
load the other OS :)
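
The detection described in item 4 of the proposal (look at the previous
sector when the last one carries a GEOM:: signature) could look roughly
like this. It is a sketch only: it assumes the signature sits at the very
start of the last sector and that exactly one sector of metadata is
present, and backup_header_lba is a hypothetical name, not the function in
common/gpt.c.

```python
GEOM_MAGIC_PREFIX = b"GEOM::"  # assumed to sit at the start of the last sector


def backup_header_lba(disk: bytes, sector: int = 512) -> int:
    """Guess where the backup GPT header lives on a raw disk image.

    Without GEOM metadata the backup header is in the last sector. If the
    last sector carries a GEOM:: signature, the GPT was created on the
    provider, which is one sector shorter, so the backup header is assumed
    to sit one sector earlier.
    """
    last_lba = len(disk) // sector - 1
    last = disk[last_lba * sector:(last_lba + 1) * sector]
    if last.startswith(GEOM_MAGIC_PREFIX):
        return last_lba - 1
    return last_lba
```

A header found this way would still have to pass the usual signature and
CRC checks before being trusted, and stacked GEOM classes would need more
than one step back.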

 As I said earlier, I do not think this is appropriate and that instead
 gpart should have an appropriate 'recover' command to install just the
 pmbr on a disk and also create a correct entry in the MBR if needed while
 doing so.

 gpart(8) is only one of several geom(8) tools to manage objects of a GEOM
 class.
 It only sends control requests to the kernel. If GPT is not detected,
 there are no geom objects to manage. And we can't write bootcode with
 gpart(8).
 I think that adding such functions to gpart(8) is not good. Maybe
 boot0cfg is the better tool for that. Also we still don't have any tool to
 install zfsboot.
 
 We can't write bootcode with gpart?  What do you think the 'bootcode' command
 does?

`gpart bootcode -b` reads the file, creates an ioctl request and sends this
data to the GEOM_PART class. GEOM_PART receives the control request, checks
the data and writes it to the provider.
`gpart bootcode -p` works like dd(1) and writes bootcode to the given partition.
gpart(8) doesn't have any knowledge about the specific partitioning scheme.

 Also, there is no reason we can't have a 'recover' command that attempts to
 recover a corrupted table including repairing the PMBR.  gpart(8) already
 generates a full PMBR when you use 'gpart create' to create a GPT even though
 there isn't a GPT object yet.

`gpart create` only creates an ioctl control request to the GEOM_PART class.
The GEOM_PART class creates a new GPT geom object and this object writes the
PMBR and its metadata to the provider.

-- 
WBR, Andrey V. Elsukov




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Pawel Jakub Dawidek
On Wed, Jun 27, 2012 at 08:22:25AM -0400, John Baldwin wrote:
  I don't think so. Most common case is to configure partitions on top of
  a mirror. Mirroring partitions is less common. Mostly because of
  hardware RAIDs being popular. You don't expect hardware RAID vendor to
  mirror partitions. Partition editors for other OS's won't work, but only
  because they don't support gmirror. If they wouldn't recognize and
  support some hardware (or pseudo-hardware) RAIDs there will be the same
  problem.
 
 Hardware RAIDs hide the metadata from the disk that the BIOS (and disk
 editors) see.  Thus, putting a GPT on a hardware RAID volume works fine
 as the logical volume is always seen by all OS's consistently. [...]

Only if you won't connect this disk to a different controller.

 [...] The same
 is even true of the software RAID that graid supports since the metadata
 is defined by the vendor and thus the logical volume is always seen by
 other OS's consistently.

But is it seen without metadata by the boot loader?

What I'm trying to say is that it is fair to expect from the user to not
use gmirror-configured disk on different OS. If the user wants to use
this disk in different OS then he has to use format that is recognized
by both.

Because gmirror is supported by FreeBSD we should improve the support by
teaching boot loader about it. Pretending gmirror is special and
recommending to mirror partitions with it instead of raw disks is not
the solution.

I really can't see how gmirror is different in this regard from any
other software RAID or volume manager. If you try to use disk that
contains unrecognized metadata the behaviour is undefined (but hopefully
not a panic).

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread John Baldwin
On Wednesday, June 27, 2012 10:08:17 am Pawel Jakub Dawidek wrote:
 On Wed, Jun 27, 2012 at 08:22:25AM -0400, John Baldwin wrote:
   I don't think so. Most common case is to configure partitions on top of
   a mirror. Mirroring partitions is less common. Mostly because of
   hardware RAIDs being popular. You don't expect hardware RAID vendor to
   mirror partitions. Partition editors for other OS's won't work, but only
   because they don't support gmirror. If they wouldn't recognize and
   support some hardware (or pseudo-hardware) RAIDs there will be the same
   problem.
  
  Hardware RAIDs hide the metadata from the disk that the BIOS (and disk
  editors) see.  Thus, putting a GPT on a hardware RAID volume works fine
  as the logical volume is always seen by all OS's consistently. [...]
 
 Only if you won't connect this disk to a different controller.

Yes, but people do not expect to be able to yank a hardware RAID drive out and 
hook it up to a raw disk controller and have it work.

  [...] The same
  is even true of the software RAID that graid supports since the metadata
  is defined by the vendor and thus the logical volume is always seen by
  other OS's consistently.
 
 But is it seen without metadata by the boot loader?

Yes.  The logical volume shows up as a BIOS disk device.

 What I'm trying to say is that it is fair to expect from the user to not
 use gmirror-configured disk on different OS. If the user wants to use
 this disk in different OS then he has to use format that is recognized
 by both.
 
 Because gmirror is supported by FreeBSD we should improve the support by
 teaching boot loader about it. Pretending gmirror is special and
 recommending to mirror partitions with it instead of raw disks is not
 the solution.
 
 I really can't see how gmirror is different in this regard from any
 other software RAID or volume manager. If you try to use disk that
 contains unrecognized metadata the behaviour is undefined (but hopefully
 not a panic).

It is not gmirror I am complaining about, it is the non-standard use of GPT.
Note that gmirror + MBR works fine without violating what little standard 
there is for the MBR.  Using a dedicated GPT partition to hold the gmirror
metadata would work with GPT (but be a good bit harder to work with in terms 
of GEOM I realize).

But as I said, I won't object to these patches.

-- 
John Baldwin


Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread John Baldwin
On Wednesday, June 27, 2012 8:45:45 am Andrey V. Elsukov wrote:
 On 27.06.2012 16:07, John Baldwin wrote:
  • Check the Signature
  • Check the Header CRC
  • Check that the MyLBA entry points to the LBA that contains the GUID
    Partition Table
  • Check the CRC of the GUID Partition Entry Array
  If the GPT is the primary table, stored at LBA 1:
  • Check the AlternateLBA to see if it is a valid GPT
  If the primary GPT is corrupt, software must check the last LBA of the
  device to see if it has a valid GPT Header and point to a valid GPT
  Partition Entry Array.
  
  Right, we break the last rule.  If you want to use a partition editor
  that doesn't grok gmirror (because you are using another OS's editor),
  to repair a GPT, it will do the wrong thing.
 
 When we are in FreeBSD, our loader can detect that the device size
 is smaller than what it sees and it will work. When the primary header is
 OK, other OSes should work with this GPT. When it isn't OK, you just can't
 load the other OS :)

Ah, yes.  The solution to violating standards is to make sure you never
use standards-compliant software.  That's a great argument. :)

(Although not entirely uncommon.  Standards aren't always perfect, but if
we had a way to not gratuitously violate them it would be nice to avoid
doing so.)

  We can't write bootcode with gpart?  What do you think the 'bootcode'
  command does?
 
 `gpart bootcode -b` reads the file, creates an ioctl request and sends this
 data to the GEOM_PART class. GEOM_PART receives the control request, checks
 the data and writes it to the provider.
 `gpart bootcode -p` works like dd(1) and writes bootcode to the given
 partition.
 gpart(8) doesn't have any knowledge about the specific partitioning scheme.

Correct, but in both cases it writes bootcode.

  Also, there is no reason we can't have a 'recover' command that attempts to
  recover a corrupted table including repairing the PMBR.  gpart(8) already
  generates a full PMBR when you use 'gpart create' to create a GPT even
  though there isn't a GPT object yet.
 
 `gpart create` only creates an ioctl control request to the GEOM_PART class.
 The GEOM_PART class creates a new GPT geom object and this object writes the
 PMBR and its metadata to the provider.

You can't add a new ioctl?

-- 
John Baldwin


Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Marcel Moolenaar

On Jun 26, 2012, at 10:37 AM, John Baldwin wrote:
 
 GPT really wants the backup header at the last LBA.  I know you can set it, 
 but I've interpreted that as a way to see if the primary header is correct or 
 not.  It seems to me that GPT tables created in this fashion (inside a GEOM 
 provider) will not work properly with partition editors for other OS's.  I'm 
 hesitant to encourage the use of this as I do think putting GPT inside of a 
 gmirror violates the GPT spec.

Agreed.

While it is a nice trick to use the last sector for metadata, it does
create two problems. The first is mentioned above. The second is that when
there's different metadata in the first *and* the last sector, you can't
decide which is to take precedence without also looking at the other and
knowing how to interpret it. We have not solved this second problem at all.
We do get reports about the problems though. At best we're handwaving or
kluging.

I think it's unwise to depend on FreeBSD-specific extensions or features
in industry-standard partitioning schemes and as such make the use of
foreign tools hard if not impossible.

A much more flexible approach is to support out-of-band configuration
data. This allows us to mirror GPT disks without having to become non-
standard as it removes the need to use the last sector for meta-data.
The ability to construct GEOM hierarchies unambiguously is very
important and our current approach has proven to not deliver on that.
This is actually impacting existing FreeBSD consumers already, like
Juniper. So, we should not go deeper into this rabbit hole. We should
finally solve this problem for real...

-- 
Marcel Moolenaar
mar...@xcllnt.net




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Marcel Moolenaar

On Jun 26, 2012, at 2:43 PM, Pawel Jakub Dawidek wrote:
 
 As for sharing disk with other OS. If you share the disk with OS that
 doesn't support gmirror, you shouldn't use gmirror in the first place.
 You probably want to use only formats that are recognized by all your
 OSes.

This statement is ridiculous by virtue of not being in touch with
reality and by making gmirror useless for such a wide range of cases
that one can question why we have it at all.

Put differently: a mirroring class is a fairly basic and useful thing
to have. Limiting its use is nothing but artificial and follows from
having to use the underlying provider to store metadata. This then
changes the view of the underlying provider to consumers above gmirror
in a way that makes the presence or absence of gmirror visible.
Solving the visibility problem makes gmirror useful all the time.
I see that as a better way of looking at it than simply blurting out
that you shouldn't use gmirror when certain awkward and artificial
conditions apply.

-- 
Marcel Moolenaar
mar...@xcllnt.net




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Marcel Moolenaar

On Jun 26, 2012, at 9:50 PM, Andrey V. Elsukov wrote:

 If the primary GPT is corrupt, software must check the last LBA of the
 device to see if it has a valid GPT Header and point to a valid GPT
 Partition Entry Array.
 
 For FreeBSD, each GEOM provider can be treated as a disk device.
 So, I don't see anything criminal in adding some quirks to our loader
 to better support our technologies.

You can't just re-interpret standards to match a context you know very well
isn't applicable and consequently redefine what the word device means.
You're on a slippery slope and while you may not see it as a problem, you
do make it a problem for FreeBSD users. It's our users we should be keeping
in mind when we solve problems.

 If a user wants to modify the GPT in a disk editor from another OS,
 he can do it, and it should work. The result depends only on the partition
 editor: it might overwrite the last sector, or it might not.

Right. Another happy user that sees his/her FreeBSD installation destroyed
or degraded (no mirroring, warning messages about corrupted GPT, etc) for
no apparent reason and without any kind of warning that what he/she is doing
is potentially harmful... That's the spirit!

-- 
Marcel Moolenaar
mar...@xcllnt.net




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Pawel Jakub Dawidek
On Wed, Jun 27, 2012 at 10:37:11AM -0700, Marcel Moolenaar wrote:
 
 On Jun 26, 2012, at 10:37 AM, John Baldwin wrote:
  
  GPT really wants the backup header at the last LBA.  I know you can set it,
  but I've interpreted that as a way to see if the primary header is correct
  or not.  It seems to me that GPT tables created in this fashion (inside a
  GEOM provider) will not work properly with partition editors for other
  OS's.  I'm hesitant to encourage the use of this as I do think putting GPT
  inside of a gmirror violates the GPT spec.
 
 Agreed.

Guys. This doesn't violate the GPT spec in any way. The spec is
narrow-minded if it talks only about raw disks, but you should think
about gmirror as pseudo-hardware RAID. That's all. If putting GPT on top
of RAID array is spec violation, then I guess we just have to live with it.

 While it is a nice trick to use the last sector for meta data, it does
 create 2 problems. 1 is mentioned above. [...]

It doesn't really matter where gmirror puts its metadata. If gmirror
kept its metadata in the first sector, gpart/gpt would find its own
metadata in the last sector and would complain about a missing primary
header.

 [...] The second is that when there's
 different metadata in the first *and* the last sector, you can't decide
 which is to take precedence without also looking at the other and know
 how to interpret it. We have not solved this second problem at all.  We
 do get reports about the problems though. At best we're handwaving or
 kluging.

This is a different kind of problem. It took me a while to realize that,
but now I know :)

The real problem is that not all metadata formats are suitable for
autodetection. That's all.

The metadata I use in my GEOM classes plays nicely with autodetection.
The solution is very easy: keep the size of the disk device within the
metadata. This allows gmirror to figure out if it is configured on the raw
disk, the last slice, or the last partition within the last slice, etc.
If GPT kept the disk size in its metadata, the second problem you
mentioned would not exist. And to be honest, GPT kind of does that by having
the backup header's LBA stored in the primary header. And this is fine as
long as the primary header is valid.
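
To make the size check concrete, here is a minimal sketch in Python (it is
not gmirror's actual code). The `Metadata` type, the `owns_metadata`
function and the sector counts are hypothetical and exist only for this
illustration. A whole disk and its last partition end at the same sector,
so both expose the same metadata sector; the recorded provider size is what
tells them apart.

```python
from dataclasses import dataclass

SECTOR = 512  # assumed sector size for the illustration


@dataclass
class Metadata:
    """Hypothetical metadata found in a provider's last sector."""
    magic: str
    provider_size: int  # size in bytes of the provider it was written for


def owns_metadata(md: Metadata, tasted_size: int) -> bool:
    """Accept the metadata only if it was written for a provider of this size."""
    return md.magic == "GEOM::MIRROR" and md.provider_size == tasted_size


# A 1000-sector disk whose last partition ends at the last sector of the disk.
disk_size = 1000 * SECTOR
last_partition_size = 400 * SECTOR

# Metadata written when the mirror was configured on the whole disk.
md = Metadata(magic="GEOM::MIRROR", provider_size=disk_size)

# Both providers end at the same sector and therefore see the same metadata,
# but only the whole disk matches the recorded size.
print(owns_metadata(md, disk_size))
print(owns_metadata(md, last_partition_size))
```

The same comparison is why a format that records only a file system size,
rather than the provider size, cannot be disambiguated this way.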

The same problem exists with things like UFS labels. There is no way to
properly support them using GEOM autodetection, because there is no
provider size in the UFS superblock. The UFS superblock contains the file
system size, but that is not the same, as one can create a file system
smaller than the underlying disk device.

 I think it's unwise to depend on FreeBSD-specific extensions or features
 in industry-standard partitioning schemes and as such make the use of
 foreign tools hard if not impossible.

If you plan to use the given disk with FreeBSD only, what's the problem?
Partitioning is not the end of the world. Even if you use
industry-standard partitioning schemes what file system are you going
to use to actually access your data? FAT? Of course if you do share your
disk between various OSes then probably your best bet is to use MBR or
GPT on raw disk and FAT file system. But if you use your disk with
FreeBSD only, then I see no reason not to leverage FreeBSD-specific
features (be it gmirror, geli or zfs).

 A much more flexible approach is to support out-of-band configuration
 data. This allows us to mirror GPT disks without having to become non-
 standard as it removes the need to use the last sector for meta-data.
 The ability to construct GEOM hierarchies unambiguously is very
 important and our current approach has proven to not deliver on that.
 This is actually impacting existing FreeBSD consumers already, like
 Juniper. So, we should not go deeper into this rabbit hole. We should
 finally solve this problem for real...

Marcel, nothing stops anyone from implementing a GEOM mirror class that
uses no on-disk metadata. GEOM is not a limiting factor here. GEOM does
provide a mechanism for autoconfiguration, but it is totally optional and
a GEOM class might choose not to use it.

As an example you can take a look at two other GEOM classes of mine:
gconcat(8) and gstripe(8). You can use the 'label' subcommand to store
metadata on the component disks, which takes advantage of GEOM
autodetection and autoconfiguration. You can also use the 'create'
subcommand to create an ad hoc provider that stores no metadata and makes
use of the entire disks, which also means it won't be automatically created
on the next boot.

For Juniper it might be handier to use out-of-band configuration, as
you know the hardware you are running on, so you know where the disks
are exactly, etc. My company builds appliances too, so I have been there.
For most of our users automatic configuration is simply better, as they
can shuffle disks around and not wonder if the system will boot or not.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! 

Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Pawel Jakub Dawidek
On Wed, Jun 27, 2012 at 10:45:35AM -0700, Marcel Moolenaar wrote:
 
 On Jun 26, 2012, at 2:43 PM, Pawel Jakub Dawidek wrote:
  
  As for sharing disk with other OS. If you share the disk with OS that
  doesn't support gmirror, you shouldn't use gmirror in the first place.
  You probably want to use only formats that are recognized by all your
  OSes.
 
 This statement is ridiculous by virtue of not being in touch with
 reality and by making gmirror useless for such a wide range of cases
 that one can question why we have it at all.
 
 Put differently: a mirroring class is a fairly basic and useful thing
 to have. Limiting its use is nothing but artificial and follows from
 having to use the underlying provider to store metadata. This then
 changes the view of the underlying provider to consumers above gmirror
 in a way that makes the presence or absence of gmirror visible.
 Solving the visibility problem makes gmirror useful all the time.
 I see that as a better way of looking at it than simply blurting out
 that you shouldn't use gmirror when certain awkward and artificial
 conditions apply.

I'm sorry, Marcel, but what you describe here has nothing to do with
reality. To be able to implement reliable mirroring you have to use
on-disk metadata. There is no way around that. You can implement
non-redundant GEOM classes without using on-disk metadata, but
out-of-band configuration in case of mirroring is simply naive. How do
you detect that components are out of sync, for example?

And when it comes to visibility: are you suggesting that gmirror should
present the entire underlying provider to upper layers? Including its
metadata? I hope not, because we went through that hell already
(remember UFS skipping the first 16 sectors, as BSDlabel metadata might
be there? The same for swap?).
I think I did a pretty good job of making the metadata as simple as
possible: I use exactly one sector at the end of the target device.
I'm really having a hard time thinking of a simpler format.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Marcel Moolenaar

On Jun 27, 2012, at 11:34 AM, Pawel Jakub Dawidek wrote:
 
 I'm sorry, Marcel, but what you describe here has nothing to do with
 reality. To be able to implement reliable mirroring you have to use
 on-disk metadata. There is no way around that. You can implement
 non-redundant GEOM classes without using on-disk metadata, but
 out-of-band configuration in case of mirroring is simply naive. How do
 you detect that components are out of sync, for example?

GEOM configuration and per-class runtime state are not to be
treated the same. Out-of-band configuration is trivial.
Per-class runtime state, like whether elements in a mirrored
configuration are in sync or not is more difficult, but does
not a priori require on-disk metadata as it's implemented now.
You can have the configuration tell the GEOM where that state
is being kept, so that you can put it in a partition on the
disks involved, or even keep it independent from the disks,
which then requires disks to be uniquely identifiable, for
sure. But that's what GPT gives you anyway.

But even without identification, you can invert the question
from "how do I detect that components are out of sync" to
"how do I prove they are in fact in sync". That question has
a very simple O(n) answer. So, if time isn't a concern or
your storage is small, you can always scan all sectors and
as such prove that the disks are in sync.
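
A minimal sketch of that O(n) scan, in Python. The `in_sync` function is a
hypothetical name, and in-memory buffers stand in for the mirror components;
a real implementation would read the providers themselves.

```python
import io

SECTOR = 512  # assumed sector size


def in_sync(a, b, sector=SECTOR):
    """Return True if two file-like components hold identical data.

    Both components are read sector by sector, so the cost is linear in
    the size of the components.
    """
    while True:
        sa = a.read(sector)
        sb = b.read(sector)
        if sa != sb:
            return False
        if not sa:  # both components ended at the same point
            return True


good = io.BytesIO(b"\x00" * SECTOR * 4)
copy = io.BytesIO(b"\x00" * SECTOR * 4)
stale = io.BytesIO(b"\x00" * SECTOR * 3 + b"\x01" * SECTOR)

print(in_sync(good, copy))
good.seek(0)
print(in_sync(good, stale))
```

The scan proves equality but says nothing about which component is newer
when the two differ; that still needs state kept somewhere.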

The point being: the current implementation isn't the only
one. Granted, it can easily be the simplest one or even the
best one in some cases, but that's beside the point you were
making.

-- 
Marcel Moolenaar
mar...@xcllnt.net




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Christian Laursen

On 06/27/12 16:28, John Baldwin wrote:

On Wednesday, June 27, 2012 8:45:45 am Andrey V. Elsukov wrote:


When we are in FreeBSD, our loader can detect that the device size
is lower than what it sees, and it will work. When the primary header is
OK, other OSes should work with this GPT. When it isn't OK, you just
can't load the other OS :)


Ah, yes.  The solution to violating standards is to make sure you never
use standards-compliant software.  That's a great argument. :)

(Although not entirely uncommon.  Standards aren't always perfect, but if
we had a way to not gratuitously violate them it would be nice to avoid
doing so.)


To be standards compliant and allow whole-disk based mirroring to work
at the same time, wouldn't a nested GPT work like this?


Whole disk (start)
| GPT header
| GPT partition of type freebsd-geom (start)
| | gmirror device (start)
| | | GPT header
| | | | freebsd-boot
| | | | freebsd-ufs
| | | | freebsd-swap
| | | GPT backup header
| | gmirror metadata
| | gmirror device (end)
| GPT partition of type freebsd-geom (end)
| GPT backup header
Whole disk (end)

Nothing but FreeBSD would understand the freebsd-geom partition type, so 
the inner GPT device should be valid and standards compliant.


The boot loader would of course need to understand this setup but that 
shouldn't be impossible.


Just a thought.

It might be too complicated, though, compared to the non-standards-compliant
way it works now, which works quite well in practice.


--
Christian Laursen




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Dimitry Andric
On 2012-06-26 14:50, Andrey V. Elsukov wrote:
 Some time ago I started reading the code in sys/boot.
 I'm especially interested in the partition table handling.
 I found several problems:
 1. There are several copies of the same code in the libi386/biosdisk.c
 and common/disk.c, and partially libpc98/biosdisk.c.
 2. ZFS probing is very slow, because the ZFS code doesn't know how many
 disks and partitions the system has:
   http://www.freebsd.org/cgi/query-pr.cgi?pr=148296
   http://www.freebsd.org/cgi/query-pr.cgi?pr=161897
 3. The GPT support doesn't check the CRC and doesn't even know anything
 about the secondary GPT header/table.
 
 So, i have created the branch and committed the changes:
   http://svnweb.freebsd.org/base/user/ae/bootcode/
 The patch is here:
   http://people.freebsd.org/~ae/boot.diff

FWIW, I verified it compiles OK with clang, and especially boot2's size
isn't increased at all.

It would be nice if you could check it with clang now and again, before
you finally merge this project into head.


Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Marcel Moolenaar

On Jun 27, 2012, at 11:20 AM, Pawel Jakub Dawidek wrote:

 On Wed, Jun 27, 2012 at 10:37:11AM -0700, Marcel Moolenaar wrote:
 
 On Jun 26, 2012, at 10:37 AM, John Baldwin wrote:
 
 GPT really wants the backup header at the last LBA.  I know you can set it,
 but I've interpreted that as a way to see if the primary header is correct
 or not.  It seems to me that GPT tables created in this fashion (inside a
 GEOM provider) will not work properly with partition editors for other
 OS's.  I'm hesitant to encourage the use of this as I do think putting GPT
 inside of a gmirror violates the GPT spec.
 
 Agreed.
 
 Guys. This doesn't violate the GPT spec in any way. The spec is
 narrow-minded if it talks only about raw disks, but you should think
 about gmirror as pseudo-hardware RAID.

I'm sorry, but this is a contradiction. If it doesn't violate the
spec, then the spec is not narrow-minded on the grounds of what
we're discussing. If the spec *is* narrow-minded then obviously
it doesn't capture our scenario, which means that we're violating
the spec.

Clearly we're not discussing anything that falls well within the
spec, or is undebatable. This makes the whole topic dangerous
anyway. When you're in the grey area (this is only for argument's
sake -- we're in violation for sure) you're opening yourself up to
compatibility problems. Should we deliberately go there?

-- 
Marcel Moolenaar
mar...@xcllnt.net




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread John Baldwin
On Wednesday, June 27, 2012 1:45:35 pm Marcel Moolenaar wrote:
 
 On Jun 26, 2012, at 2:43 PM, Pawel Jakub Dawidek wrote:
  
  As for sharing disk with other OS. If you share the disk with OS that
  doesn't support gmirror, you shouldn't use gmirror in the first place.
  You probably want to use only formats that are recognized by all your
  OSes.
 
 This statement is ridiculous by virtue of not being in touch with
 reality and by making gmirror useless for such a wide range of cases
 that one can question why we have it at all.
 
 Put differently: a mirroring class is a fairly basic and useful thing
 to have. Limiting its use is nothing but artificial and follows from
 having to use the underlying provider to store metadata. This then
 changes the view of the underlying provider to consumers above gmirror
 in a way that makes the presence or absence of gmirror visible.
 Solving the visibility problem makes gmirror useful all the time.
 I see that as a better way of looking at it than simply blurting out
 that you shouldn't use gmirror when certain awkward and artificial
 conditions apply.

I'm not sure we can force gmirror to be anything except FreeBSD-specific,
but it would be nice to not make non-standard GPT tables while we are at it.

The reason the metadata for things like Intel's onboard SATA RAID does work
ok is because the metadata format is enforced by the vendor, so it is
reasonable to assume that metadata format will work across other OS's.

Anyway, I've said my piece and will let the matter drop from my end at this
point.

-- 
John Baldwin


Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Marcel Moolenaar

On Jun 27, 2012, at 12:08 PM, Christian Laursen wrote:

 On 06/27/12 16:28, John Baldwin wrote:
 On Wednesday, June 27, 2012 8:45:45 am Andrey V. Elsukov wrote:
 
 When we are in FreeBSD, our loader can detect that the device size
 is lower than what it sees, and it will work. When the primary header is
 OK, other OSes should work with this GPT. When it isn't OK, you just
 can't load the other OS :)
 
 Ah, yes.  The solution to violating standards is to make sure you never
 use standards-compliant software.  That's a great argument. :)
 
 (Although not entirely uncommon.  Standards aren't always perfect, but if
 we had a way to not gratuitously violate them it would be nice to avoid
 doing so.)
 
 To be standards compliant and allow whole-disk based mirroring to work
 at the same time, wouldn't a nested GPT work like this?

GPTs don't nest.

 Nothing but FreeBSD would understand the freebsd-geom partition type, so the 
 inner GPT device should be valid and standards compliant.

If it were standards compliant, it would be discoverable by non-FreeBSD.
That clearly isn't the case -- hence it's not standards compliant. What
if, for example, someone wanted to share the swap partition between Linux
and FreeBSD?

-- 
Marcel Moolenaar
mar...@xcllnt.net




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Andrey V. Elsukov
On 27.06.2012 21:55, Marcel Moolenaar wrote:
 You can't just re-interpret standards to match a context you know very well
 isn't applicable and consequently redefine what the word device means.
 You're on a slippery slope and while you may not see it as a problem, you
 do make it a problem for FreeBSD users. It's our users we should be keeping
 in mind when we solve problems.
 
 If a user wants to modify the GPT in a disk editor from another OS,
 he can do it, and it should work. The result depends only on the partition
 editor: it might overwrite the last sector, or it might not.
 
 Right. Another happy user that sees his/her FreeBSD installation destroyed
 or degraded (no mirroring, warning messages about corrupted GPT, etc) for
 no apparent reason and without any kind of warning that what he/she is doing
 is potentially harmful... That's the spirit!

OK. Let's return to my patches. They don't add any new ways to
shoot yourself in the foot. We are talking about the *FreeBSD loader*.
This is the program that starts the FreeBSD kernel. It doesn't start other
OSes. We already have many users who use FreeBSD as the only system on
the machine. Many of them use GPT inside of some GEOM provider.
You can just read the lists, articles about installing FreeBSD, forums,
etc. We already have these users and I hope they will keep using FreeBSD
as before. So, why can't we add a simple quirk to make their systems a bit
more reliable?

As I understand it, there are two points where we don't have consensus:

1. You are against the following:
Our loader detects that the primary GPT header is damaged. It tries to read
the backup GPT header from the last LBA and detects a GEOM:: signature
there. It then reads the previous sector and finds a *valid* GPT header.
It is valid because its CRC is valid and its self_LBA is valid. For
*FreeBSD* users it is better not to use this GPT and just complain
"I'm sorry, can't boot". The other OSes can't, and we shouldn't.

2. You are against having one fake PMBR entry by default in the
/boot/pmbr image. OK, I can propose several ways to resolve this:
 * remove from the loader's GPT probing code the requirement that the
MBR contain a PMBR partition record;
 * teach the boot0cfg command to write the PMBR properly;
 * add a new condition that marks the GPT as corrupt when it has an invalid
PMBR. Thus, when you write a PMBR with an empty partition table with dd(1),
the kernel will complain and you will be forced to run `gpart recover`.
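
The fallback described in point 1 can be sketched as follows. This is an
illustration in Python, not the loader's C code: `build_header`,
`header_is_valid`, `find_backup_header` and the dict-backed disk are
hypothetical, and the usable-LBA and entry-array values are placeholders.
Only the header checks named above (signature, header CRC32 over the header
with its CRC field zeroed, self LBA) are performed; the partition entry
array CRC is not checked here.

```python
import struct
import zlib

SECTOR = 512  # assumed sector size
# GPT header fields up to PartitionEntryArrayCRC32: 92 bytes, little-endian.
GPT_HEADER = struct.Struct("<8sIIIIQQQQ16sQIII")


def build_header(my_lba, alternate_lba):
    """Build a GPT header sector with a correct header CRC (demo helper)."""
    raw = GPT_HEADER.pack(b"EFI PART", 0x00010000, GPT_HEADER.size, 0, 0,
                          my_lba, alternate_lba, 34, 900,
                          bytes(16), 2, 128, 128, 0)
    crc = zlib.crc32(raw) & 0xFFFFFFFF
    header = raw[:16] + struct.pack("<I", crc) + raw[20:]
    return header.ljust(SECTOR, b"\0")


def header_is_valid(sector, lba):
    """Check signature, header CRC32 and self LBA of a candidate header."""
    if len(sector) < GPT_HEADER.size:
        return False
    sig, _rev, size, crc, _res, my_lba, *_rest = GPT_HEADER.unpack(
        sector[:GPT_HEADER.size])
    if sig != b"EFI PART" or size < GPT_HEADER.size or size > len(sector):
        return False
    # The CRC is computed with the CRC field itself set to zero.
    zeroed = sector[:16] + bytes(4) + sector[20:size]
    if zlib.crc32(zeroed) & 0xFFFFFFFF != crc:
        return False
    return my_lba == lba


def find_backup_header(read_sector, last_lba):
    """Try the last LBA; if it holds a GEOM:: signature, try the one before."""
    sector = read_sector(last_lba)
    if header_is_valid(sector, last_lba):
        return last_lba
    if sector.startswith(b"GEOM::"):
        if header_is_valid(read_sector(last_lba - 1), last_lba - 1):
            return last_lba - 1
    return None


LAST_LBA = 999
disk = {
    LAST_LBA: b"GEOM::MIRROR".ljust(SECTOR, b"\0"),  # class metadata
    LAST_LBA - 1: build_header(LAST_LBA - 1, 1),      # backup GPT header
}


def read_sector(lba):
    return disk.get(lba, bytes(SECTOR))


print(find_backup_header(read_sector, LAST_LBA))
```

Whether a header found this way should count as valid is exactly the point
under dispute in this thread; the sketch only shows what the check does.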

-- 
WBR, Andrey V. Elsukov





Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Mark Felder

On Tue, 26 Jun 2012 12:37:11 -0500, John Baldwin j...@freebsd.org wrote:


I'm hesitant to encourage the use of this as I do think putting GPT inside
of a gmirror violates the GPT spec.


I personally think this use case is a bit ... odd, anyway.

I have only one request for those who manage GPT/GEOM/etc. As I'm used to
doing multiple mdadm RAID components on Linux for maximum flexibility,
using gmirror on multiple GPT partitions on the same physical device
is OK with me. My only complaint is that recovery is very, very stupid. We
should by default detect this and rebuild only ONE gmirror device at a time
on the same physical provider. You get nothing but a smokin' angry head if
you allow several to rebuild at the same time, because they fight over
sequential writes all the way across the platters. It would also be nice
if a gmirror rebuild could be detected by fsck, so that fsck could either
hold off or gmirror could be paused until a consistent filesystem state
exists. It's probably best for the background fsck to go first so you can
get the system up and running, but then when it's finished gmirror should
continue.
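
The "one rebuild at a time per physical provider" idea can be sketched like
this in Python. It is only an illustration of the scheduling policy, not
something gmirror does today; `plan_rebuilds` and the device names are
hypothetical.

```python
from collections import defaultdict


def plan_rebuilds(mirrors):
    """Group mirrors that need a rebuild into rounds.

    `mirrors` maps a mirror name to the physical disk holding the component
    being rebuilt. Within one round no two mirrors share a physical disk,
    so rebuilds on the same disk run one after another.
    """
    per_disk = defaultdict(list)
    for name, disk in mirrors.items():
        per_disk[disk].append(name)

    depth = max((len(names) for names in per_disk.values()), default=0)
    rounds = []
    for i in range(depth):
        rounds.append([names[i] for names in per_disk.values()
                       if i < len(names)])
    return rounds


# gm0 and gm1 both live on ada1; gm2 lives on ada2.
plan = plan_rebuilds({"gm0": "ada1", "gm1": "ada1", "gm2": "ada2"})
print(plan)
```

Mirrors on different disks still rebuild in parallel; only those competing
for the same platters are serialized.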


Otherwise I have no issues with gmirror -- it does exactly the job I need  
it to.



Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Marcel Moolenaar

On Jun 27, 2012, at 12:27 PM, Andrey V. Elsukov wrote:

 On 27.06.2012 21:55, Marcel Moolenaar wrote:
 You can't just re-interpret standards to match a context you know very well
 isn't applicable and consequently redefine what the word device means.
 You're on a slippery slope and while you may not see it as a problem, you
 do make it a problem for FreeBSD users. It's our users we should be keeping
 in mind when we solve problems.
 
 If a user wants to modify the GPT in a disk editor from another OS,
 he can do it, and it should work. The result depends only on the partition
 editor: it might overwrite the last sector, or it might not.
 
 Right. Another happy user that sees his/her FreeBSD installation destroyed
 or degraded (no mirroring, warning messages about corrupted GPT, etc) for
 no apparent reason and without any kind of warning that what he/she is doing
 is potentially harmful... That's the spirit!
 
 OK. Let's return to my patches. They don't add any new ways to
 shoot yourself in the foot. We are talking about the *FreeBSD loader*.
 This is the program that starts the FreeBSD kernel. It doesn't start other
 OSes. We already have many users who use FreeBSD as the only system on
 the machine. Many of them use GPT inside of some GEOM provider.

Your patches are a continuation on a path that we're discussing isn't
necessarily the path we should be on. While you don't make things
worse from a compliance perspective, you make it worse by adding the
non-compliant behaviour to more components.

 As I understand it, there are two points where we don't have consensus:
 
 1. You are against the following:
 Our loader detects that the primary GPT header is damaged. It tries to read
 the backup GPT header from the last LBA and detects a GEOM:: signature
 there. It then reads the previous sector and finds a *valid* GPT header.

How do you know it's valid? It's in a location that is not valid
to begin with. Validity is based on rules and you're violating the
rules without defining exactly what we call valid given the
new rules. This may seem like nitpicking, but having gone through the
hassle of dealing with the broken way we created the dangerously
dedicated disk, I appreciate the importance of being anal when it
comes to something that lives on non-volatile storage and gets to
be exposed to a world much larger than FreeBSD.

 2. You are against having one fake PMBR entry by default in the
 /boot/pmbr image.

I don't understand what you're saying or what I'm being accused of
being against.

-- 
Marcel Moolenaar
mar...@xcllnt.net




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Andrey V. Elsukov
On 28.06.2012 00:14, Marcel Moolenaar wrote:
 Our loader detects that the primary GPT header is damaged. It tries to read
 the backup GPT header from the last LBA and detects a GEOM:: signature
 there. It then reads the previous sector and finds a *valid* GPT header.
 
 How do you know it's valid? It's in a location that is not valid
 to begin with. Validity is based on rules and you're violating the
 rules without defining exactly what we call valid given the
 new rules. This may seem like nitpicking, but having gone through the
 hassle of dealing with the broken way we created the dangerously
 dedicated disk, I appreciate the importance of being anal when it
 comes to something that lives on non-volatile storage and gets to
 be exposed to a world much larger than FreeBSD.

So why don't you prevent GEOM_PART_GPT from attaching to any provider that
is not a disk drive? That would be the right solution to all our
problems. Just don't create an invalid GPT.

-- 
WBR, Andrey V. Elsukov


Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Marcel Moolenaar

On Jun 27, 2012, at 1:48 PM, Andrey V. Elsukov wrote:

 On 28.06.2012 00:14, Marcel Moolenaar wrote:
 Our loader detects that the primary GPT header is damaged. It tries to read
 the backup GPT header from the last LBA and detects a GEOM:: signature
 there. It then reads the previous sector and finds a *valid* GPT header.
 
 How do you know it's valid? It's in a location that is not valid
 to begin with. Validity is based on rules and you're violating the
 rules without defining exactly what we call valid given the
 new rules. This may seem like nitpicking, but having gone through the
 hassle of dealing with the broken way we created the dangerously
 dedicated disk, I appreciate the importance of being anal when it
 comes to something that lives on non-volatile storage and gets to
 be exposed to a world much larger than FreeBSD.
 
 So why don't you prevent GEOM_PART_GPT from attaching to any provider that
 is not a disk drive? That would be the right solution to all our
 problems. Just don't create an invalid GPT.

It's not even the right solution, as it prevents legit nesting
of gpart GEOMs *and* is fundamentally based on a flawed assumption
that any non-disk GEOM underneath gpart yields an invalid GPT.
Think gnop.

-- 
Marcel Moolenaar
mar...@xcllnt.net




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Poul-Henning Kamp

I would like to point out that all other operating systems that have
had this precise problem have solved it by adding a bootfs partition
to hold the kernel and modules required to truly understand the disk layout.

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Re: [CFC/CFT] large changes in the loader(8) code

2012-06-26 Thread Pawel Jakub Dawidek
On Tue, Jun 26, 2012 at 04:50:36PM +0400, Andrey V. Elsukov wrote:
 Hi All,
 
 Some time ago i have started reading the code in the sys/boot.
 Especially i'm interested in the partition tables handling.
 I found several problems:
 1. There are several copies of the same code in the libi386/biosdisk.c
 and common/disk.c, and partially libpc98/biosdisk.c.
 2. ZFS probing is very slow, because the ZFS code doesn't know how many
 disks and partitions the system has:
   http://www.freebsd.org/cgi/query-pr.cgi?pr=148296
   http://www.freebsd.org/cgi/query-pr.cgi?pr=161897
 3. The GPT support doesn't check CRC and even doesn't know anything
 about the secondary GPT header/table.

Just a quick note here. At some point, when I was adding GPT attributes
to allow for test starts, I greatly improved at least parts of the GPT
implementation. I implemented both CRC checksum verification and
fallback to the backup GPT header when the primary is broken. That code
is still in sys/boot/common/gpt.c, so my question is: what do you mean
by this sentence?

 So, i have created the branch and committed the changes:
   http://svnweb.freebsd.org/base/user/ae/bootcode/
 The patch is here:
   http://people.freebsd.org/~ae/boot.diff
 
 What i already did:
 1. The partition tables handling now is machine independent,
 and it is compatible with the kernel's GEOM_PART implementation.
 There is new API for disk drivers in the loader to get information
 about partitions and tables:
 common/Makefile.inc
   common/part.c
   common/part.h
 
 2. The similar and general code from the disk drivers merged in the
 disk.c:
 common/disk.c
 common/disk.h
 i386/libi386/libi386.h
 i386/libi386/biosdisk.c
 userboot/test/test.c
 userboot/userboot/userboot_disk.c
 userboot/userboot.h
 3. ZFS code now uses new API and probing on the systems with many disks
 should be greatly increased:
 zfs/zfs.c
 i386/loader/main.c
 4. The gptboot now searches the backup GPT header in the previous sectors,
 when it finds the GEOM:: signature in the last sector. PMBR code also
 tries to do the same:
 common/gpt.c
 i386/pmbr/pmbr.s
 
 5. Also the pmbr image now contains one fake partition record.
 When several first sectors are damaged the kernel can't detect GPT
 (see RECOVERING section in the gpart(8)). We can restore PMBR with dd(1)
 command, but the old pmbr image has an empty partition table and
 loader doesn't able to boot from GPT, when there is no partition record
 in the PMBR. Now it will be able. When pmbr is installed via 'gpart bootcode'
 command, the kernel correctly modifies this partition record. So, this is only
 for the first rescue step.
 
 6. I have changed userboot interface. I guess there is none consumers except
 the one test program. But if it isn't that, i can make it compatible.
 
 Any comments are welcome.
 
 -- 
 WBR, Andrey V. Elsukov
 
 



-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-26 Thread Andrey V. Elsukov
On 26.06.2012 16:57, Pawel Jakub Dawidek wrote:
 On Tue, Jun 26, 2012 at 04:50:36PM +0400, Andrey V. Elsukov wrote:
 Hi All,

 Some time ago i have started reading the code in the sys/boot.
 Especially i'm interested in the partition tables handling.
 I found several problems:
 1. There are several copies of the same code in the libi386/biosdisk.c
 and common/disk.c, and partially libpc98/biosdisk.c.
 2. ZFS probing is very slow, because the ZFS code doesn't know how many
 disks and partitions the system has:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=148296
  http://www.freebsd.org/cgi/query-pr.cgi?pr=161897
 3. The GPT support doesn't check CRC and even doesn't know anything
 about the secondary GPT header/table.
 
 Just a quick note here. At some point when I was adding GPT attributes
 to allow for test starts I greatly improved, at least parts of, the GPT
 implementation. I did implement support for both CRC checksum
 verification and fallback to backup GPT header when primary is broken.
 And the code is still in sys/boot/common/gpt.c. So my question would be
 what do you mean by this sentence?

Yes, gptboot does that, but loader/zfsloader doesn't. So there can be
situations where gptboot boots but loader(8) cannot.

-- 
WBR, Andrey V. Elsukov





Re: [CFC/CFT] large changes in the loader(8) code

2012-06-26 Thread Pawel Jakub Dawidek
On Tue, Jun 26, 2012 at 06:01:26PM +0400, Andrey V. Elsukov wrote:
 On 26.06.2012 16:57, Pawel Jakub Dawidek wrote:
  On Tue, Jun 26, 2012 at 04:50:36PM +0400, Andrey V. Elsukov wrote:
  Hi All,
 
  Some time ago i have started reading the code in the sys/boot.
  Especially i'm interested in the partition tables handling.
  I found several problems:
  1. There are several copies of the same code in the libi386/biosdisk.c
  and common/disk.c, and partially libpc98/biosdisk.c.
  2. ZFS probing is very slow, because the ZFS code doesn't know how many
  disks and partitions the system has:
 http://www.freebsd.org/cgi/query-pr.cgi?pr=148296
 http://www.freebsd.org/cgi/query-pr.cgi?pr=161897
  3. The GPT support doesn't check CRC and even doesn't know anything
  about the secondary GPT header/table.
  
  Just a quick note here. At some point when I was adding GPT attributes
  to allow for test starts I greatly improved, at least parts of, the GPT
  implementation. I did implement support for both CRC checksum
  verification and fallback to backup GPT header when primary is broken.
  And the code is still in sys/boot/common/gpt.c. So my question would be
  what do you mean by this sentence?
 
 Yes, gptboot does that, but the loader/zfsloader doesn't. So there might
 be a situation when gptboot does boot, but loader(8) can't.

I see. I don't know if I'll find time for a proper review, but it is
really great that you are working on cleaning up this huge mess.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-26 Thread John Baldwin
On Tuesday, June 26, 2012 8:50:36 am Andrey V. Elsukov wrote:
 Hi All,
 
 Some time ago i have started reading the code in the sys/boot.
 Especially i'm interested in the partition tables handling.
 I found several problems:
 1. There are several copies of the same code in the libi386/biosdisk.c
 and common/disk.c, and partially libpc98/biosdisk.c.
 2. ZFS probing is very slow, because the ZFS code doesn't know how many
 disks and partitions the system has:
   http://www.freebsd.org/cgi/query-pr.cgi?pr=148296
   http://www.freebsd.org/cgi/query-pr.cgi?pr=161897
 3. The GPT support doesn't check CRC and even doesn't know anything
 about the secondary GPT header/table.
 
 So, i have created the branch and committed the changes:
   http://svnweb.freebsd.org/base/user/ae/bootcode/
 The patch is here:
   http://people.freebsd.org/~ae/boot.diff
 
 What i already did:
 1. The partition tables handling now is machine independent,
 and it is compatible with the kernel's GEOM_PART implementation.
 There is new API for disk drivers in the loader to get information
 about partitions and tables:
 common/Makefile.inc
   common/part.c
   common/part.h
 
 2. The similar and general code from the disk drivers merged in the
 disk.c:
 common/disk.c
 common/disk.h
 i386/libi386/libi386.h
 i386/libi386/biosdisk.c
 userboot/test/test.c
 userboot/userboot/userboot_disk.c
 userboot/userboot.h
 3. ZFS code now uses new API and probing on the systems with many disks
 should be greatly increased:
 zfs/zfs.c
 i386/loader/main.c
 4. The gptboot now searches the backup GPT header in the previous sectors,
 when it finds the GEOM:: signature in the last sector. PMBR code also
 tries to do the same:
 common/gpt.c
 i386/pmbr/pmbr.s

GPT really wants the backup header at the last LBA.  I know you can set it, 
but I've interpreted that as a way to see if the primary header is correct or 
not.  It seems to me that GPT tables created in this fashion (inside a GEOM 
provider) will not work properly with partition editors for other OS's.  I'm 
hesitant to encourage the use of this as I do think putting GPT inside of a 
gmirror violates the GPT spec.

 5. Also the pmbr image now contains one fake partition record.
 When several first sectors are damaged the kernel can't detect GPT
 (see RECOVERING section in the gpart(8)). We can restore PMBR with dd(1)
 command, but the old pmbr image has an empty partition table and
 loader doesn't able to boot from GPT, when there is no partition record
 in the PMBR. Now it will be able. When pmbr is installed via 'gpart bootcode'
 command, the kernel correctly modifies this partition record. So, this is only
 for the first rescue step.

As I said earlier, I do not think this is appropriate. Instead, gpart
should have an appropriate 'recover' command that installs just the pmbr on
a disk and, if needed, also creates a correct entry in the MBR while doing so.
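
To make "a correct entry in the MBR" concrete, here is a minimal sketch of
writing a protective MBR with one 0xEE record. It is not how gpart or the
pmbr image does it; the function names are hypothetical, the ending CHS is
always set to 0xFFFFFF as a simplification, and the boot code is assumed
to fit in the first 446 bytes.

```python
import struct

SECTOR = 512
PART_TABLE_OFF = 446   # offset of the four 16-byte MBR partition entries
BOOT_SIG_OFF = 510     # offset of the 0x55 0xAA boot signature


def protective_entry(disk_sectors):
    """Build one 16-byte MBR record of type 0xEE covering the disk (hypothetical helper)."""
    # The size field is 32 bits wide, so larger disks are clamped to 0xFFFFFFFF.
    size = min(disk_sectors - 1, 0xFFFFFFFF)
    return struct.pack(
        "<B3sB3sII",
        0x00,              # boot indicator: not active
        b"\x00\x02\x00",   # starting CHS 0x000200 (head 0, sector 2, cylinder 0)
        0xEE,              # partition type: GPT protective
        b"\xff\xff\xff",   # ending CHS (simplified: always "too large")
        1,                 # starting LBA: the sector holding the primary GPT header
        size,
    )


def install_pmbr(bootcode, disk_sectors):
    """Return a 512-byte sector: boot code, one 0xEE record, boot signature."""
    if len(bootcode) > PART_TABLE_OFF:
        raise ValueError("boot code would overlap the partition table")
    mbr = bytearray(SECTOR)
    mbr[:len(bootcode)] = bootcode
    mbr[PART_TABLE_OFF:PART_TABLE_OFF + 16] = protective_entry(disk_sectors)
    mbr[BOOT_SIG_OFF:] = b"\x55\xaa"
    return bytes(mbr)
```

The point of the sketch is that the record depends on the size of the
target disk, which is why a static image restored with dd(1) can only
carry a placeholder.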

 6. I have changed userboot interface. I guess there is none consumers except
 the one test program. But if it isn't that, i can make it compatible.

One other consumer is in the bhyve branch.  I think the 'kload' patches also 
use it.  However, they can probably be adapted easily.

[ Note, I haven't done a detailed review of the patch at all yet. ]

-- 
John Baldwin


Re: [CFC/CFT] large changes in the loader(8) code

2012-06-26 Thread Pawel Jakub Dawidek
On Tue, Jun 26, 2012 at 01:37:11PM -0400, John Baldwin wrote:
  4. The gptboot now searches the backup GPT header in the previous sectors,
  when it finds the GEOM:: signature in the last sector. PMBR code also
  tries to do the same:
  common/gpt.c
  i386/pmbr/pmbr.s
 
 GPT really wants the backup header at the last LBA.  I know you can set it, 
 but I've interpreted that as a way to see if the primary header is correct or 
 not. [...]

My interpretation is different: The way to verify if the header is valid
is to check its checksum, not to check if the backup header location in
the primary header points at the last LBA.

Of course if primary header's checksum is incorrect it is hard to trust
that the backup header location is correct. And we need the backup
header when the primary header is invalid...

 [...] It seems to me that GPT tables created in this fashion (inside a GEOM 
 provider) will not work properly with partition editors for other OS's.  I'm 
 hesitant to encourage the use of this as I do think putting GPT inside of a 
 gmirror violates the GPT spec.

I don't think so. The most common case is to configure partitions on top
of a mirror. Mirroring partitions is less common, mostly because
hardware RAIDs are popular; you don't expect a hardware RAID vendor to
mirror partitions. Partition editors for other OSes won't work, but only
because they don't support gmirror. If they didn't recognize and
support some hardware (or pseudo-hardware) RAID, there would be the same
problem.

In other words, IMHO, our problem is that FreeBSD's boot code doesn't
recognize/support gmirror's metadata. What Andrey is proposing is to
recognize the metadata and act accordingly - in case of a gmirror we
simply need to skip it.
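
A minimal sketch of that "skip the metadata" lookup, assuming an in-memory
disk image, 512-byte sectors, and metadata that occupies exactly the last
sector and starts with the "GEOM::" prefix mentioned in the patch
description. The function name is hypothetical, and only the signature is
checked here; a real loader would still validate the CRCs of whatever
header it finds.

```python
SECTOR = 512
GPT_SIG = b"EFI PART"    # signature at the start of every GPT header
GEOM_MAGIC = b"GEOM::"   # assumed common prefix of GEOM class metadata


def find_backup_header_lba(disk):
    """Return the LBA that appears to hold the backup GPT header, or None."""
    last = len(disk) // SECTOR - 1
    sector = disk[last * SECTOR:(last + 1) * SECTOR]
    if sector.startswith(GPT_SIG):
        # Plain disk: the backup header is where the spec puts it.
        return last
    if sector.startswith(GEOM_MAGIC) and last >= 1:
        # The last sector belongs to a GEOM class; the GPT was created
        # inside that provider, so look one sector earlier.
        prev = disk[(last - 1) * SECTOR:last * SECTOR]
        if prev.startswith(GPT_SIG):
            return last - 1
    return None
```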

In the future we will have the same problem with graid - until we add
support for it to the boot code, we won't be able to boot from it.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-26 Thread Pawel Jakub Dawidek
On Tue, Jun 26, 2012 at 02:41:31PM -0700, Kevin Oberman wrote:
 Long ago I saw a proposal to create a dedicated partition on GPT to
 hold the metadata. With the large number of partitions available on
 GPT, tying up one just for GEOM seems like a low price and it moves
 the device GEOM out of the realm of FreeBSD unique and subject to
 serious issues when/if a disk is shared with some other OS. I have
 seen little comment on this and have never seen any argument that that
 it could not work.
 
 I think this is an issue that will continue to bite users unless it is fixed.

I don't really see how dedicating a partition for metadata can work or
is a good idea, sorry.

As for sharing a disk with another OS: if you share the disk with an OS
that doesn't support gmirror, you shouldn't use gmirror in the first
place. You probably want to use only formats that are recognized by all
your OSes.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-26 Thread Kevin Oberman
On Tue, Jun 26, 2012 at 2:23 PM, Pawel Jakub Dawidek p...@freebsd.org wrote:
 On Tue, Jun 26, 2012 at 01:37:11PM -0400, John Baldwin wrote:
  4. The gptboot now searches the backup GPT header in the previous sectors,
  when it finds the GEOM:: signature in the last sector. PMBR code also
  tries to do the same:
          common/gpt.c
          i386/pmbr/pmbr.s

 GPT really wants the backup header at the last LBA.  I know you can set it,
 but I've interpreted that as a way to see if the primary header is correct or
 not. [...]

 My interpretation is different: The way to verify if the header is valid
 is to check its checksum, not to check if the backup header location in
 the primary header points at the last LBA.

 Of course if primary header's checksum is incorrect it is hard to trust
 that the backup header location is correct. And we need the backup
 header when the primary header is invalid...

 [...] It seems to me that GPT tables created in this fashion (inside a GEOM
 provider) will not work properly with partition editors for other OS's.  I'm
 hesitant to encourage the use of this as I do think putting GPT inside of a
 gmirror violates the GPT spec.

 I don't think so. Most common case is to configure partitions on top of
 a mirror. Mirroring partitions is less common. Mostly because of
 hardware RAIDs being popular. You don't expect hardware RAID vendor to
 mirror partitions. Partition editors for other OS's won't work, but only
 because they don't support gmirror. If they wouldn't recognize and
 support some hardware (or pseudo-hardware) RAIDs there will be the same
 problem.

 In other words, IMHO, our problem is that FreeBSD's boot code doesn't
 recognize/support gmirror's metadata. What Andrey is proposing is to
 recognize the metadata and act accordingly - in case of a gmirror we
 simply need to skip it.

 In the future we will have the same problem with graid - until we add
 support for it to the boot code, we won't be able to boot from it.

Long ago I saw a proposal to create a dedicated partition on GPT to
hold the metadata. With the large number of partitions available on
GPT, tying up one just for GEOM seems like a low price, and it moves
the device GEOM out of the realm of being unique to FreeBSD and subject
to serious issues when/if a disk is shared with some other OS. I have
seen little comment on this and have never seen any argument that
it could not work.

I think this is an issue that will continue to bite users unless it is fixed.
-- 
R. Kevin Oberman, Network Engineer
E-mail: kob6...@gmail.com


Re: [CFC/CFT] large changes in the loader(8) code

2012-06-26 Thread Andrey V. Elsukov
On 26.06.2012 21:37, John Baldwin wrote:
 4. The gptboot now searches the backup GPT header in the previous sectors,
 when it finds the GEOM:: signature in the last sector. PMBR code also
 tries to do the same:
 common/gpt.c
 i386/pmbr/pmbr.s
 
 GPT really wants the backup header at the last LBA.  I know you can set it, 
 but I've interpreted that as a way to see if the primary header is correct or 
 not.  It seems to me that GPT tables created in this fashion (inside a GEOM 
 provider) will not work properly with partition editors for other OS's.  I'm 
 hesitant to encourage the use of this as I do think putting GPT inside of a 
 gmirror violates the GPT spec.

The standard says:
The following test must be performed to determine if a GPT is valid:
• Check the Signature
• Check the Header CRC
• Check that the MyLBA entry points to the LBA that contains the GUID Partition Table
• Check the CRC of the GUID Partition Entry Array
If the GPT is the primary table, stored at LBA 1:
• Check the AlternateLBA to see if it is a valid GPT
If the primary GPT is corrupt, software must check the last LBA of the device
to see if it has a valid GPT Header and point to a valid GPT Partition Entry Array.
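
The checks above can be sketched roughly as follows. This is only an
illustration on an in-memory disk image, not the code from the branch:
gpt_header_valid and build_header are hypothetical names, sectors are
assumed to be 512 bytes, and the AlternateLBA check for the primary header
would simply call the same function again on that LBA.

```python
import struct
import zlib

SECTOR = 512
# 92-byte little-endian GPT header: signature, revision, header size,
# header CRC, reserved, MyLBA, AlternateLBA, first/last usable LBA,
# disk GUID, entry array LBA, entry count, entry size, entry array CRC.
HDR_FMT = "<8sIIIIQQQQ16sQIII"
HDR_SIZE = struct.calcsize(HDR_FMT)


def gpt_header_valid(disk, lba):
    """Apply the quoted checks to the header stored at `lba`."""
    raw = disk[lba * SECTOR:(lba + 1) * SECTOR]
    if len(raw) < HDR_SIZE:
        return False
    (sig, _rev, hdr_size, hdr_crc, _rsvd, my_lba, _alt, _first, _last,
     _guid, ent_lba, ent_count, ent_size, ent_crc) = struct.unpack(
        HDR_FMT, raw[:HDR_SIZE])
    if sig != b"EFI PART":                       # check the Signature
        return False
    if not HDR_SIZE <= hdr_size <= SECTOR:
        return False
    # Check the Header CRC: computed with the CRC field itself zeroed.
    zeroed = raw[:16] + b"\0\0\0\0" + raw[20:hdr_size]
    if zlib.crc32(zeroed) != hdr_crc:
        return False
    if my_lba != lba:                            # MyLBA must match location
        return False
    # Check the CRC of the GUID Partition Entry Array.
    start = ent_lba * SECTOR
    table = disk[start:start + ent_count * ent_size]
    if len(table) != ent_count * ent_size:
        return False
    return zlib.crc32(table) == ent_crc


def build_header(my_lba, alt_lba, ent_lba, table, ent_size=128):
    """Build a one-sector header for `table` (demo helper, not a real tool)."""
    fields = [b"EFI PART", 0x00010000, HDR_SIZE, 0, 0, my_lba, alt_lba,
              34, 0, bytes(16), ent_lba, len(table) // ent_size, ent_size,
              zlib.crc32(table)]
    fields[3] = zlib.crc32(struct.pack(HDR_FMT, *fields))
    return struct.pack(HDR_FMT, *fields).ljust(SECTOR, b"\0")
```

Note that none of these checks depends on where the device ends; only the
fallback rule in the last sentence of the quote refers to the last LBA.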

In FreeBSD, each GEOM provider can be treated as a disk device.
So I don't see anything criminal in adding some quirks to our loader
to better support our own technologies.

If a user wants to modify the GPT with a partition editor from another OS,
he can do it, and it should work. The result depends only on the partition
editor: it might overwrite the last sector, or it might not.

 5. Also the pmbr image now contains one fake partition record.
 When several first sectors are damaged the kernel can't detect GPT
 (see RECOVERING section in the gpart(8)). We can restore PMBR with dd(1)
 command, but the old pmbr image has an empty partition table and
 loader doesn't able to boot from GPT, when there is no partition record
 in the PMBR. Now it will be able. When pmbr is installed via 'gpart bootcode'
 command, the kernel correctly modifies this partition record. So, this is only
 for the first rescue step.
 
 As I said earlier, I do not think this is appropriate and that instead
 gpart should have an appropriate 'recover' command to install just the pmbr on
 a disk and also create a correct entry in the MBR if needed while doing so.

gpart(8) is just one of several geom(8) tools for managing objects of a
GEOM class. It only sends control requests to the kernel. If the GPT is
not detected, there are no GEOM objects to manage, so we can't write
bootcode with gpart(8). I don't think adding such functions to gpart(8)
is a good idea. Maybe boot0cfg is a better tool for that. Also, we still
don't have any tool to install zfsboot.

-- 
WBR, Andrey V. Elsukov







Re: [CFC/CFT] large changes in the loader(8) code

2012-06-26 Thread Andrey V. Elsukov
On 27.06.2012 1:41, Kevin Oberman wrote:
 Long ago I saw a proposal to create a dedicated partition on GPT to
 hold the metadata. With the large number of partitions available on
 GPT, tying up one just for GEOM seems like a low price and it moves
 the device GEOM out of the realm of FreeBSD unique and subject to
 serious issues when/if a disk is shared with some other OS. I have
 seen little comment on this and have never seen any argument that that
 it could not work.

When you share a disk with another OS, a much more serious issue seems
to be the other OS making changes to your mirror without your
knowledge. I know of a disk being shared successfully between Windows
and FreeBSD via graid on Intel pseudo-RAID. Just use compatible
technologies.

-- 
WBR, Andrey V. Elsukov







Re: [CFC/CFT] large changes in the loader(8) code

2012-06-26 Thread Andriy Gapon
on 27/06/2012 07:50 Andrey V. Elsukov said the following:
 Also we still haven't any tool to install zfsboot.

Yeah, I think it would be nice if ZFS provided some interface (ioctl?) to
properly write stuff to its special areas.

-- 
Andriy Gapon