On 04/10/2014 08:21 PM, Steven Schlansker wrote:
On Apr 9, 2014, at 2:21 AM, Paul Sandoz <paul.san...@oracle.com> wrote:
On Apr 8, 2014, at 9:15 PM, Mike Duigou <mike.dui...@oracle.com> wrote:
That seems a terribly broken usage of UUID for 128 bit numbers or a pair of
signed 64 bit numbers :-)
Part of me thinks we should not be supporting such broken usage. Might be worth
getting some usage data from grepcode or maven central.
I’m guilty of doing this at a point in the past. We used it to intermix
multiple sources of data - a few that used ‘long’ IDs and a few that used
‘real’ UUIDs. We took prefixes that had flags never generated by UUID
libraries (the “reserved for compatibility with Microsoft” ones IIRC) and
slapped those with a long to make pseudo-UUIDs. That way everything was a UUID
and we were guaranteed to never see collisions within our dataset.
We never expected them to give sensible meanings to the various getters e.g.
timestamp() but we did expect the UUID class to work generally.
We never relied on any particular ordering.
We could provide static methods to return appropriate comparators for version 1
and version 2 UUIDs--I've actually written them before for other projects.
It would be nice to just have one compareTo that does the right thing based of
the UUID types being compared.
If it were up to me only the time and DCE UUIDs would be comparable, there's no
ordering relationship for other versions.
I think it is fine for random UUIDs to be comparable with each other.
The comparators I've considered adding would only allow comparisons within the
same version.
Yes, although for a general comparator the primary sort key could be the
version value.
+1 to a “sort first by version then version-specific ordering” — it gives you
the best of both worlds IMO.
I think the natural ordering for UUIDs must be able to create an ordering over
all possible UUIDs, no matter the version or even if it is valid or not. If
you read UUIDs from an external source you have no way to understand what
version they are or aren’t. Imagine a process that loads data from a database
or text file into a TreeMap - it would be awful if a change in the UUID
generation scheme on the far side caused the Comparator you used to no longer
function.
Hi,
Code that relies on UUIDs to have a "natural" order, say
"chronological", is relying on being given the particular type of UUIDs
that have the time built-in. When given mixed-type or non-time-based
UUIDs, such code will break. The purpose of UUID schemes is generating
globally unique identifiers, not interpreting them. Various types exist
just because it's practical to generate UUIDs differently in different
contexts. Programs should not try to extract business information from
UUIDs (except probably for hunting down virus authors:
http://en.wikipedia.org/wiki/Melissa_%28computer_virus%29). So I think a
good general-purpose compareTo() method should try to discourage such
usage for example by reversing the bits of the UUID value before
comparing. This would also make algorithms that order UUIDs for
facilitating quick access (B-Tree, Red-Black-Tree, ...) happier.
Try the following benchmark (with options: -wi 5 -i 5 -t 1 -f 1):
public class TreeMapInsertionBench {
private static final int N = 100000;
@GenerateMicroBenchmark()
@BenchmarkMode(Mode.AverageTime)
public static void testTreeMapSequentialInsert(BlackHole bh) {
Map<Integer, Boolean> map = new TreeMap<>();
for (int i = 0; i < N; i++) {
map.put(new Integer(i), Boolean.TRUE);
}
bh.consume(map);
}
@GenerateMicroBenchmark()
@BenchmarkMode(Mode.AverageTime)
public static void testTreeMapReversedBitsInsert(BlackHole bh) {
Map<Integer, Boolean> map = new TreeMap<>();
for (int i = 0; i < N; i++) {
map.put(new Integer(Integer.reverse(i)), Boolean.TRUE);
}
bh.consume(map);
}
}
...and the results will be:
Benchmark Mode Samples Mean Mean error Units
o.s.TreeMapInsertionBench.testTreeMapReversedBitsInsert avgt
5 8.208 0.100 ms/op
o.s.TreeMapInsertionBench.testTreeMapSequentialInsert avgt
5 10.498 1.164 ms/op
That's just an opinion.
Regards, Peter