Bug#595790: [Pkg-zfsonlinux-devel] Bug#595790: hostid: useless unless fixed

2016-09-29 Thread Petter Reinholdtsen
[Richard Laager]
> For example, if you want to use the low 32-bits of /etc/machine-id,
> that would work too. It'd mean carrying a patch on Debian, but if the
> pain of a patch and different behavior is less than the benefits of
> the change, go for it.

I guess we would have to verify that /etc/machine-id is available in the
initrd for this to work with / in zfs.  But I guess that is a problem
with /etc/hostid too for gethostid(). :)

While researching this topic I came across
<http://stackoverflow.com/questions/9258228/how-to-prevent-gethostid-from-doing-dns-lookups-on-linux>,
which reports that gethostid() might lock up a program if the DNS server
becomes unavailable.  A scary scenario just to get the machine ID.

I also came across <http://0pointer.de/blog/projects/ids.html>,
which provides a very useful list of possible IDs to use in addition to
the gethostid() value.  It agrees that gethostid() has unclear
semantics. :)

-- 
Happy hacking
Petter Reinholdtsen



Bug#595790: [Pkg-zfsonlinux-devel] Bug#595790: hostid: useless unless fixed

2016-09-29 Thread Richard Laager
On 09/29/2016 05:19 AM, Michael Stone wrote:
> On Wed, Sep 28, 2016 at 09:03:38PM -0700, Richard Laager wrote:
>> Getting back to ZFS and /etc/hostid... I would think that a
>> randomly-generated /etc/hostid is probably sufficient. Whether that's
>> done in the libc, spl, or zfs package makes no difference to me.
> 
> You still haven't explained why zfs doesn't just generate a uuid itself.
>
> There's a large body of work ensuring reasonable uniqueness for uuids,
> and there isn't a clear benefit to clinging to gethostid.

It can't be a full UUID. The on-disk format of ZFS uses a 32-bit
integer. It doesn't really matter what we use to derive it, but a 32-bit
integer is the constraint.

For example, if you want to use the low 32-bits of /etc/machine-id, that
would work too. It'd mean carrying a patch on Debian, but if the pain of
a patch and different behavior is less than the benefits of the change,
go for it.

> Even on solaris
> there's a big honkin' warning on the man page that it isn't guaranteed
> to be unique (IIRC, gethostid on containers reflects the hardware the
> container is running on).

On Solaris the zone (container) wouldn't import the pool. Pools are
imported in the "global zone". So this isn't a problem.

-- 
Richard



Bug#595790: [Pkg-zfsonlinux-devel] Bug#595790: hostid: useless unless fixed

2016-09-29 Thread Michael Stone

On Wed, Sep 28, 2016 at 09:03:38PM -0700, Richard Laager wrote:

> Getting back to ZFS and /etc/hostid... I would think that a
> randomly-generated /etc/hostid is probably sufficient. Whether that's
> done in the libc, spl, or zfs package makes no difference to me.


You still haven't explained why zfs doesn't just generate a uuid itself.
There's a large body of work ensuring reasonable uniqueness for uuids,
and there isn't a clear benefit to clinging to gethostid. Even on solaris
there's a big honkin' warning on the man page that it isn't guaranteed
to be unique (IIRC, gethostid on containers reflects the hardware the
container is running on).


Mike Stone



Bug#595790: [Pkg-zfsonlinux-devel] Bug#595790: hostid: useless unless fixed

2016-09-29 Thread Florian Weimer
* Richard Laager:

> Getting back to ZFS and /etc/hostid... I would think that a
> randomly-generated /etc/hostid is probably sufficient. Whether that's
> done in the libc, spl, or zfs package makes no difference to me.

As I tried to explain, the risk of collisions without central
coordination looks rather high.  glibc's current approach, using the
IP address associated with the host name, provides a certain level of
coordination, avoiding duplicates.
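
For reference, a rough sketch of how that IP-derived value comes about.
It assumes the glibc fallback used when /etc/hostid does not exist, on a
little-endian machine, which as far as I can tell swaps the two 16-bit
halves of the IPv4 address as stored in network byte order.  The function
name ip_to_hostid is made up for illustration.

```shell
#!/bin/sh
# Sketch: reproduce the hostid glibc would derive from an IPv4 address
# when /etc/hostid does not exist.  Assumes a little-endian host and the
# "swap the 16-bit halves" fallback; ip_to_hostid is a hypothetical name.
ip_to_hostid() {
    old_ifs=$IFS
    IFS=.
    # Split a.b.c.d into $1..$4 (unquoted on purpose, to split on dots).
    set -- $1
    IFS=$old_ifs
    # In memory the address is the bytes a b c d; read as a little-endian
    # integer that is 0xddccbbaa, and swapping the halves gives 0xbbaaddcc.
    printf '%02x%02x%02x%02x\n' "$2" "$1" "$4" "$3"
}

ip_to_hostid 127.0.1.1
```

For 127.0.1.1 this prints 007f0101, the value mentioned elsewhere in this
thread, which is why so many unrelated machines end up sharing it.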



Bug#595790: [Pkg-zfsonlinux-devel] Bug#595790: hostid: useless unless fixed

2016-09-28 Thread Petter Reinholdtsen
[Aurelien Jarno]
> In any case it looks to me we should not reinvent the wheel. We
> already ended-up with two implementations of a unique machine ID, one
> in dbus and one for systemd (which fortunately now try to just copy
> the other one if it already exists), I am not sure we want a third
> one. Could we just copy (part) of this ID if it exists, otherwise
> generate a random number? Or even point the current gethostid() to
> /etc/machine-id if it exists?

Peeking at the dbus and systemd UUID (and perhaps preferring them over
the DMI UUID) seems like a good idea, as long as /etc/hostid is updated
once during installation.  Perhaps glibc is the wrong place to do this.
Perhaps a debian-installer udeb is a better place?  It will of course
miss chroots, which is unfortunate.

We have /etc/machine-id from systemd, /var/lib/dbus/machine-id from dbus
and /sys/class/dmi/id/product_uuid from DMI which all contain 128 bits
coded as hexadecimal numbers.  I guess using the lower 32 bits for
gethostid() is as good as any of the other options.
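
A minimal sketch of the "lower 32 bits" idea, assuming the ID is read as
one big hexadecimal number, so that the low 32 bits are its last eight hex
digits.  The function names are invented here, and the order in which the
files are tried is only one possible choice.

```shell
#!/bin/sh
# low32_hex ID: print the last eight hex digits of a 128-bit ID given as
# hex, with or without dashes (hypothetical helper).
low32_hex() {
    raw=$(printf '%s' "$1" | tr -d '-' | tr 'A-F' 'a-f')
    # ${raw%????????} is the ID minus its last 8 characters; stripping
    # that prefix from the ID leaves exactly the last 8.
    printf '%s\n' "${raw#"${raw%????????}"}"
}

# first_id FILE...: print the first line of the first readable file.
first_id() {
    for f in "$@"; do
        if [ -r "$f" ]; then
            head -n 1 "$f"
            return 0
        fi
    done
    return 1
}

id=$(first_id /etc/machine-id /var/lib/dbus/machine-id \
              /sys/class/dmi/id/product_uuid) && low32_hex "$id"
```

Note that product_uuid is normally readable by root only, so an
unprivileged run will fall through to whichever of the other files exists.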

> I am not even sure it's a good idea to fix this, it might be better to
> just mark this function as deprecated, and encourage existing users of
> this function (including hostid) to use something much longer than
> 32-bit to avoid collisions.

Mentioning alternatives with more bits in the gethostid() manual page
definitely sounds like a good idea.

> One thing is sure however, if we change the current behaviour, it will
> change the hostid on many systems, including ones which do not return
> 007f0101.

I agree that it should not be done automatically on existing
installations.  This is why I propose to set a value in /etc/hostid only
on first-time installation of libc6, and to document in the manual page
how to set it for those who want to modify an existing installation.

-- 
Happy hacking
Petter Reinholdtsen



Bug#595790: [Pkg-zfsonlinux-devel] Bug#595790: hostid: useless unless fixed

2016-09-28 Thread Richard Laager
On 09/28/2016 04:41 AM, Petter Reinholdtsen wrote:
> I did not quite understand what you mean here.  Do you mean the DMI
> value in your experience isn't unique?

Absolutely, yes. I found this out because, for some reason that I don't
know, libvirtd wants a unique identifier. It defaults to looking at the
UUID from the BIOS. Unfortunately, SuperMicro boards have non-unique UUIDs.

Getting back to ZFS and /etc/hostid... I would think that a
randomly-generated /etc/hostid is probably sufficient. Whether that's
done in the libc, spl, or zfs package makes no difference to me.

-- 
Richard



Bug#595790: [Pkg-zfsonlinux-devel] Bug#595790: hostid: useless unless fixed

2016-09-28 Thread Florian Weimer
* Michael Stone:

> Other platforms have deprecated gethostid, that's the best way forward
> for linux, IMO.

I agree.  It's the most likely outcome if this issue was reported to
glibc upstream.



Bug#595790: [Pkg-zfsonlinux-devel] Bug#595790: hostid: useless unless fixed

2016-09-28 Thread Aurelien Jarno
On 2016-09-28 09:33, Petter Reinholdtsen wrote:
> Control: reassign -1 libc6
> Control: found -1 2.19-18
> Control: The value from gethostid() should be more unique and not change when 
> the host IP changes
> 
> Reassigning to glibc as that is the source of gethostid() where the
> problem with the missing unique identifier originates.  Using the
> version number in stable, but the issue has been around before that.
> 
> In my work as a system administrator for tens of thousands of machines, I
> have often had the need to get some semi-unique identifier out of the
> operating system.  On all other Unix-like operating systems, hostid and
> gethostid() will provide this, but not on Linux.  I find this rather sad,
> and have had to spend time creating our own solution to the problem
> because gethostid() is useless on Linux.
> 
> Because of this, and to spare future system administrators that pain,
> I fully support the request from Martin Kraft to extend Debian to
> make sure gethostid() returns something sensible.
> 
> The described approach from FreeBSD, using /etc/hostid,
> /sys/class/dmi/id/product_uuid or a random value (in that order) seems
> like a sensible one.  It might make sense to use other sources too, but
> the goal should be to pick a value that will stay the same until the hardware
> is replaced, or, if no such hardware-dependent value exists, a value that
> will stay the same as long as the operating system isn't reinstalled.
> 
> To avoid changing the ID on running systems I believe it should only be done
> when libc6 is installed for the first time.  Those willing to change their
> hostid at runtime should be provided a simple script to do so instead of it
> being done automatically.  It will fix the issue for future installations.
> I am not sure how to sensibly fix it for existing installations without
> ending up with a lot of machines with the same hostid, as 7f0100 is a very
> common hostid on Linux already, and everyone with a private IP address like
> those on 192.168.* will have collisions.  But then again a 32 bit number can
> only provide 4,294,967,296 unique IDs, and with the number of Linux machines
> in the world there are going to be collisions anyway.  We should just reduce
> the chance to a more sensible level.
> 
> Something like this should work, I guess:
> 
> if [ ! -f /etc/hostid ]; then
>    if [ -e /sys/class/dmi/id/product_uuid ]; then
>        sethostidfromuuid $(cat /sys/class/dmi/id/product_uuid)
>    else
>        dd if=/dev/urandom bs=1 count=4 of=/etc/hostid 2>/dev/null
>    fi
> fi
> 
> We need to figure out how to transform the UUID to a 32 bit integer, of 
> course.

Hmm DMI is something quite x86/aarch64 specific, so it means we will
always use the /dev/urandom fallback on other architectures.

Another question is about chroots. The above method means we might
end up with the same machine ID in chroots if the DMI UUID is available.
Is that really wanted?

In any case it looks to me we should not reinvent the wheel. We already
ended up with two implementations of a unique machine ID, one in dbus
and one for systemd (which fortunately now tries to just copy the other
one if it already exists), I am not sure we want a third one. Could we
just copy (part) of this ID if it exists, otherwise generate a random
number? Or even point the current gethostid() to /etc/machine-id if it
exists?

I am not even sure it's a good idea to fix this, it might be better to
just mark this function as deprecated, and encourage existing users of
this function (including hostid) to use something much longer than 
32-bit to avoid collisions.

One thing is sure however, if we change the current behaviour, it will
change the hostid on many systems, including ones which do not return
007f0101.

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net



Bug#595790: [Pkg-zfsonlinux-devel] Bug#595790: hostid: useless unless fixed

2016-09-28 Thread Michael Stone

On Wed, Sep 28, 2016 at 11:11:21PM +0200, Petter Reinholdtsen wrote:

> [Michael Stone]
>> Other platforms have deprecated gethostid, that's the best way forward for
>> linux, IMO.
>
> Which platforms are these?  I find that FreeBSD recommends using sysctl and
> KERN_HOSTID to get the hostid integer directly from the kernel instead
> of using gethostid(), which isn't really deprecating the feature, only
> the way to get access to it.  A quick search did not show any other
> platforms deprecating the function and feature, so I am curious to learn
> what those are.


openbsd deprecates it, netbsd doesn't have it at all. neither of those
platforms is likely to have a useful value unless you set it yourself.
I'd wonder where this *is* expected to be a useful value more than I'd
wonder where it isn't.



> My proposal is to use the DMI info which should stay the same
> independent of OS installation.


Which doesn't exist on many, many platforms. If you need an ID tied to 
the hardware that's the one to use, but you have to know that the 
hardware you're deploying to actually supports that feature.



> The users I am aware of are zfs-linux and the tools we wrote at work to
> detect when a Linux machine was reinstalled or had its hardware changed.


For the latter case, just use the smbios values directly (assuming 
you're buying enterprise style hardware, it should support machine 
uuids.) That way you know that you're getting something tied to the 
hardware, instead of hoping.



> The use case of zfs-linux requires the ID to be unique among the machines
> sharing a storage solution, and not globally unique.


I can't understand why, for that use case, zfs-linux wouldn't simply 
create a uuid itself. I see no obvious advantage in the program trying 
to fix the semantics of a fundamentally broken function that was 
introduced in BSD in the 80s and was removed from BSD itself back in the 
90s.



> A search in the source of all Debian packages[1] show this list of 148
> packages mentioning the string 'gethostid': actiona alpine amanda
> apcupsd aplus-fsf arpwatch ats-lang-anairiats audit bacula bareos
> bluefish bsdgames burp busybox casacore cde cdrdao cdrkit
> chromium-browser cl-irc clisp cmucl condor coreutils ctwm cython dc3dd
> dcmtk deheader deja-dup dicom3tools dietlibc dist dmtcp dx e17
> eclipse-titan edk2 emscripten erlang facter fpc frama-c freebsd-utils
> freetds fs-uae ga gcc-h8300-hms gdb ghc glibc gnucash gnulib
> gnustep-netclasses golang golang-1.6 golang-1.7 golang-golang-x-sys
> hercules highlight hugs98 hurd iputils isdnutils ivtools kfreebsd-10
> krb5 ksh latrace ldc libcanberra libconvert-binary-c-perl
> libdata-uuid-perl libexplain libpam-tacplus libpcap libposix-2008-perl
> linux linux-grsec ltrace lua-posix manpages manpages-de manpages-es
> manpages-fr manpages-ja manpages-pl metview minc-tools mingw-w64 mono
> mono-reference-assemblies musl nam ncbi-tools6 netatalk newlib nim nmap
> nordugrid-arc ns2 ntirpc nwchem open-iscsi open-vm-tools openafs openmpi
> otp pidgin pidgin-nateon pimd polygraph praat prayer pulseaudio
> python-ptrace qemu radare2 rat roaraudio samhain sbcl silo-llnl sipxtapi
> slirp smlnj sniffit spl-linux splint strace talksoup.app tau tcpdump
> tcpslice tkrat topal trinity tripwire uclibc uclmmbase uhd uw-imap vde2
> xfsdump yap zephyr zfs-fuse zfsutils.
>
> I do not know what they use gethostid() for. :)


Pulling a couple at random:
libpcap -- the only occurrence is in lbl/os-sunos4.h
which is basically a list of function prototypes from a long obsolete OS
for historic curiosity.


xfsdump -- honestly seems like a bug or at least a misunderstanding: 
ghdrp->gh_ipaddr = ( uint64_t )( unsigned long )gethostid( )


cdrdao -- questionable assumption in scsi-sun.c:
cpu_type = gethostid() >> 24

burp -- contains a couple of prototypes for the function, checks for it 
in configure, doesn't seem to actually use it


This really is a function with no current value that should just be 
forgotten. And certainly don't make random assumptions about the value 
it returns.


Mike Stone



Bug#595790: [Pkg-zfsonlinux-devel] Bug#595790: hostid: useless unless fixed

2016-09-28 Thread Petter Reinholdtsen
[Michael Stone]
> Other platforms have deprecated gethostid, that's the best way forward for 
> linux, IMO.

Which platforms are these?  I find that FreeBSD recommends using sysctl and
KERN_HOSTID to get the hostid integer directly from the kernel instead
of using gethostid(), which isn't really deprecating the feature, only
the way to get access to it.  A quick search did not show any other
platforms deprecating the function and feature, so I am curious to learn
what those are.

> This proposal doesn't fix the problem generally and actually changes
> the semantics of the call. (It was originally expected that the value
> would remain constant independent of a particular OS installation,
> which is not a property of a value stored on disk.)

My proposal is to use the DMI info which should stay the same
independent of OS installation.

> The main users of hostid (that I'm aware of) tended to be commercial
> software vendors locking licenses to systems--and they typically
> didn't use gethostid on linux because it was useless for the
> purpose. So I'm not aware of a userbase for this call on linux, and
> nobody should be using it for new development.

The users I am aware of are zfs-linux and the tools we wrote at work to
detect when a Linux machine was reinstalled or had its hardware changed.
The use case of zfs-linux requires the ID to be unique among the machines
sharing a storage solution, and not globally unique.

A search in the source of all Debian packages[1] shows this list of 148
packages mentioning the string 'gethostid': actiona alpine amanda
apcupsd aplus-fsf arpwatch ats-lang-anairiats audit bacula bareos
bluefish bsdgames burp busybox casacore cde cdrdao cdrkit
chromium-browser cl-irc clisp cmucl condor coreutils ctwm cython dc3dd
dcmtk deheader deja-dup dicom3tools dietlibc dist dmtcp dx e17
eclipse-titan edk2 emscripten erlang facter fpc frama-c freebsd-utils
freetds fs-uae ga gcc-h8300-hms gdb ghc glibc gnucash gnulib
gnustep-netclasses golang golang-1.6 golang-1.7 golang-golang-x-sys
hercules highlight hugs98 hurd iputils isdnutils ivtools kfreebsd-10
krb5 ksh latrace ldc libcanberra libconvert-binary-c-perl
libdata-uuid-perl libexplain libpam-tacplus libpcap libposix-2008-perl
linux linux-grsec ltrace lua-posix manpages manpages-de manpages-es
manpages-fr manpages-ja manpages-pl metview minc-tools mingw-w64 mono
mono-reference-assemblies musl nam ncbi-tools6 netatalk newlib nim nmap
nordugrid-arc ns2 ntirpc nwchem open-iscsi open-vm-tools openafs openmpi
otp pidgin pidgin-nateon pimd polygraph praat prayer pulseaudio
python-ptrace qemu radare2 rat roaraudio samhain sbcl silo-llnl sipxtapi
slirp smlnj sniffit spl-linux splint strace talksoup.app tau tcpdump
tcpslice tkrat topal trinity tripwire uclibc uclmmbase uhd uw-imap vde2
xfsdump yap zephyr zfs-fuse zfsutils.

I do not know what they use gethostid() for. :)

 [1] curl -s https://codesearch.debian.net/results/2308ff3051ed55cc/packages.json | jq -r '.Packages[]'

-- 
Happy hacking
Petter Reinholdtsen



Bug#595790: [Pkg-zfsonlinux-devel] Bug#595790: hostid: useless unless fixed

2016-09-28 Thread Florian Weimer
* Petter Reinholdtsen:

> [Florian Weimer]
>> That's not very different from /etc/machine-id, isn't it?
>
> Ah, thank you very much for bringing this systemd setting to my
> attention.  I was not aware of it.
>
> I agree that it seem very similar in purpose and implementation.  Will
> it be available on non-linux Debian architectures too?

It might be possible to port over this part, yes.

>>> We need to figure out how to transform the UUID to a 32 bit integer,
>>> of course.
>>
>> And I think this is the crux of the problem.  Whatever we do, with
>> today's cluster sizes it's just not reliably unique.
>
> Well, for the set of machines we have available at work (ca. 3000) it
> would be sufficiently unique.

I simulated 100,000 random assignments of 32-bit host IDs to 3,000 hosts,
and got collisions in 104 cases.

For 5,000 hosts, I got 286, and for 10,000, 1,112 (again in 100,000
runs).  I was lazy, it shouldn't be too hard to calculate expected
values accurately.

So a 32-bit value without central coordination is pretty much a time
bomb.
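
The expected values can also be computed directly instead of simulated.
With n hosts drawing uniformly from N = 2^32 possible IDs, the probability
of at least one collision is 1 - (1 - 1/N)(1 - 2/N)...(1 - (n-1)/N).  A
small sketch using awk (the function name is made up here):

```shell
#!/bin/sh
# collision_probability HOSTS: probability that at least two of HOSTS
# hosts share the same uniformly random 32-bit ID (birthday problem).
collision_probability() {
    LC_ALL=C awk -v n="$1" 'BEGIN {
        ids = 4294967296            # 2^32 possible values
        s = 0
        # Sum logarithms instead of multiplying many factors close to 1.
        for (i = 1; i < n; i++)
            s += log(1 - i / ids)
        printf "%.6f\n", 1 - exp(s)
    }'
}

for hosts in 3000 5000 10000; do
    printf '%5d hosts: %s\n' "$hosts" "$(collision_probability "$hosts")"
done
```

This gives roughly 0.10%, 0.29% and 1.16%, i.e. about 105, 291 and 1,157
expected collisions per 100,000 runs, close to the simulated 104, 286 and
1,112.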

> For most sites it would make the return value from gethostid()
> unique.

The IP address of a host could be better than that.  I doubt it is
possible to improve upon the glibc implementation.

>> DMI data seems risky because it depends on firmware, and there are so
>> many firmware bugs out there.
>
> I did not quite understand what you mean here.  Do you mean the DMI
> value in your experience isn't unique?

I wouldn't count on them being unique.  Most such ID fields are
definitely not, and there are groups out there who strongly oppose
device IDs.



Bug#595790: [Pkg-zfsonlinux-devel] Bug#595790: hostid: useless unless fixed

2016-09-28 Thread Michael Stone

On Wed, Sep 28, 2016 at 12:32:04PM +0200, Florian Weimer wrote:

> * Petter Reinholdtsen:
>
>> Something like this should work, I guess:
>>
>> if [ ! -f /etc/hostid ]; then
>>    if [ -e /sys/class/dmi/id/product_uuid ]; then
>>        sethostidfromuuid $(cat /sys/class/dmi/id/product_uuid)
>>    else
>>        dd if=/dev/urandom bs=1 count=4 of=/etc/hostid 2>/dev/null
>>    fi
>> fi
>
> That's not very different from /etc/machine-id, isn't it?
>
>> We need to figure out how to transform the UUID to a 32 bit integer,
>> of course.
>
> And I think this is the crux of the problem.  Whatever we do, with
> today's cluster sizes it's just not reliably unique.
>
> You could use /etc/machine-id instead.  Some effort goes into that to
> make it actually unique.
>
> DMI data seems risky because it depends on firmware, and there are so
> many firmware bugs out there.  It would also not address the matter of
> changing host IDs as the result of host migrations.


Yes, this seems a quixotic quest. In historic terms, this was mostly 
used on systems that actually had some kind of serial number burned onto 
the mainboard; it's fairly useless in the absence of that kind of 
controlled environment. Many systems these days actually do have that 
sort of ID, e.g., via dmi/smbios, but 1) it's not guaranteed to be there 
2) it's unlikely to fit in a 32 bit int.


Other platforms have deprecated gethostid, that's the best way forward for 
linux, IMO. This proposal doesn't fix the problem generally and actually 
changes the semantics of the call. (It was originally expected that the 
value would remain constant independent of a particular OS installation, 
which is not a property of a value stored on disk.) The main users of 
hostid (that I'm aware of) tended to be commercial software vendors 
locking licenses to systems--and they typically didn't use gethostid on 
linux because it was useless for the purpose. So I'm not aware of a 
userbase for this call on linux, and nobody should be using it for new 
development. If you need a stable unique id then you should be using 
something like the dmi uuid *and you need to have hardware from a vendor 
that sets such a property*. 

If you want something tied to the OS instance rather than the machine, 
then use /etc/machine-id (and gnash your teeth at the misnomer) rather 
than reinventing it.


Mike Stone



Bug#595790: [Pkg-zfsonlinux-devel] Bug#595790: hostid: useless unless fixed

2016-09-28 Thread Petter Reinholdtsen
[Florian Weimer]
> That's not very different from /etc/machine-id, isn't it?

Ah, thank you very much for bringing this systemd setting to my
attention.  I was not aware of it.

I agree that it seems very similar in purpose and implementation.  Will
it be available on non-Linux Debian architectures too?

>> We need to figure out how to transform the UUID to a 32 bit integer,
>> of course.
>
> And I think this is the crux of the problem.  Whatever we do, with
> today's cluster sizes it's just not reliably unique.

Well, for the set of machines we have available at work (ca. 3000) it
would be sufficiently unique.  For most sites it would make the return
value from gethostid() unique.  In most use cases it does not need to be
globally unique.  As in the ZFS use case, it just needs to be unique among
the hosts sharing the storage system.

In another use case at work, it should be unique across the entire stock
of Linux machines.

> You could use /etc/machine-id instead.  Some effort goes into that to
> make it actually unique.

I will definitely put this systemd value in my tool box.  Again, thank
you very much for mentioning it. :)

> DMI data seems risky because it depends on firmware, and there are so
> many firmware bugs out there.

I did not quite understand what you mean here.  Do you mean the DMI
value in your experience isn't unique?

> It would also not address the matter of changing host IDs as the
> result of host migrations.

As far as I can tell, host migration could be solved by storing the
wanted hostid in /etc/hostid when migrating.

On a related note, I had a look at the POSIX definition of
gethostid()[1], and its "Upon successful completion, gethostid() shall
return an identifier for the current host" is definitely very vague.  So
glibc is surely not violating POSIX by changing the value when the host
changes IP address or by commonly returning identical IDs on different
machines, but real-world applications on the other hand expect the
hostid value to be reasonably unique and fixed across IP changes and
reboots.

 [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/gethostid.html

-- 
Happy hacking
Petter Reinholdtsen



Bug#595790: [Pkg-zfsonlinux-devel] Bug#595790: hostid: useless unless fixed

2016-09-28 Thread Florian Weimer
* Petter Reinholdtsen:

> Something like this should work, I guess:
> 
> if [ ! -f /etc/hostid ]; then
>    if [ -e /sys/class/dmi/id/product_uuid ]; then
>        sethostidfromuuid $(cat /sys/class/dmi/id/product_uuid)
>    else
>        dd if=/dev/urandom bs=1 count=4 of=/etc/hostid 2>/dev/null
>    fi
> fi

That's not very different from /etc/machine-id, isn't it?

> We need to figure out how to transform the UUID to a 32 bit integer,
> of course.

And I think this is the crux of the problem.  Whatever we do, with
today's cluster sizes it's just not reliably unique.

You could use /etc/machine-id instead.  Some effort goes into that to
make it actually unique.

DMI data seems risky because it depends on firmware, and there are so
many firmware bugs out there.  It would also not address the matter of
changing host IDs as the result of host migrations.



Bug#595790: [Pkg-zfsonlinux-devel] Bug#595790: hostid: useless unless fixed

2016-09-28 Thread Petter Reinholdtsen
Control: reassign -1 libc6
Control: found -1 2.19-18
Control: The value from gethostid() should be more unique and not change when 
the host IP changes

Reassigning to glibc as that is the source of gethostid() where the
problem with the missing unique identifier originates.  Using the
version number in stable, but the issue has been around before that.

In my work as a system administrator for tens of thousands of machines, I
have often had the need to get some semi-unique identifier out of the
operating system.  On all other Unix-like operating systems, hostid and
gethostid() will provide this, but not on Linux.  I find this rather sad,
and have had to spend time creating our own solution to the problem
because gethostid() is useless on Linux.

Because of this, and to spare future system administrators that pain,
I fully support the request from Martin Kraft to extend Debian to
make sure gethostid() returns something sensible.

The described approach from FreeBSD, using /etc/hostid,
/sys/class/dmi/id/product_uuid or a random value (in that order) seems
like a sensible one.  It might make sense to use other sources too, but
the goal should be to pick a value that will stay the same until the hardware
is replaced, or, if no such hardware-dependent value exists, a value that
will stay the same as long as the operating system isn't reinstalled.

To avoid changing the ID on running systems I believe it should only be done
when libc6 is installed for the first time.  Those willing to change their
hostid at runtime should be provided a simple script to do so instead of it
being done automatically.  It will fix the issue for future installations.
I am not sure how to sensibly fix it for existing installations without
ending up with a lot of machines with the same hostid, as 7f0100 is a very
common hostid on Linux already, and everyone with a private IP address like
those on 192.168.* will have collisions.  But then again a 32 bit number can
only provide 4,294,967,296 unique IDs, and with the number of Linux machines
in the world there are going to be collisions anyway.  We should just reduce
the chance to a more sensible level.

Something like this should work, I guess:

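if [ ! -f /etc/hostid ]; then
    if [ -e /sys/class/dmi/id/product_uuid ]; then
        # sethostidfromuuid is a helper that still has to be written
        sethostidfromuuid "$(cat /sys/class/dmi/id/product_uuid)"
    else
        # no DMI UUID available: fall back to four random bytes
        dd if=/dev/urandom bs=1 count=4 of=/etc/hostid 2>/dev/null
    fi
fi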

We need to figure out how to transform the UUID to a 32 bit integer, of course.
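
One possible shape for that transformation, as a sketch only: take the
last eight hex digits of the UUID and store them as four raw bytes.  This
assumes that glibc reads /etc/hostid as a 32-bit integer in host byte
order and that the host is little-endian.  write_hostid is a placeholder
name, and the usage example writes to a scratch file, not to /etc/hostid.

```shell
#!/bin/sh
# write_hostid HEX FILE: store the 32-bit value HEX (eight hex digits) in
# FILE as four raw bytes, least significant byte first (little-endian).
write_hostid() {
    for pos in 7-8 5-6 3-4 1-2; do
        byte=$(printf '%s' "$1" | cut -c"$pos")
        # POSIX printf has no \x escape, so emit the byte via octal.
        printf "\\$(printf '%03o' "0x$byte")"
    done > "$2"
}

# Example with a made-up UUID: keep the last 8 hex digits, write, dump.
uuid=4C4C4544-0042-3510-8051-B4C04F4A3232
low32=$(printf '%s' "$uuid" | tr -d '-' | tail -c 8)
scratch=$(mktemp)
write_hostid "$low32" "$scratch"
od -An -tx1 "$scratch"
rm -f "$scratch"
```

Truncating to 32 bits of course throws away most of the UUID, so this
does nothing to reduce the collision risk between machines.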

-- 
Happy hacking
Petter Reinholdtsen