Github user spmallette commented on a diff in the pull request:
https://github.com/apache/tinkerpop/pull/892#discussion_r202651403
--- Diff:
gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/graphml/GraphMLWriter.java
---
@@ -242,7 +257,31 @@ private void writeTypes(final Map<String, String>
identifiedVertexKeyTypes,
private void writeEdges(final XMLStreamWriter writer, final Graph
graph) throws XMLStreamException {
if (normalize) {
final List<Edge> edges = IteratorUtils.list(graph.edges());
- Collections.sort(edges, Comparators.ELEMENT_COMPARATOR);
+ Collections.sort(edges, new Comparator<Edge>() {
--- End diff --
Your algorithm for sorting leaves me with some concerns. It assumes that a
"string of numbers" will fit into the integer space. I suppose that you could
use `Long` instead, but what if the "string of numbers" is
`39098403984093284023984032840329840389430984` - you'd still blow the space.
Maybe I'm not following the code properly but if you had keys like this:
```text
gremlin> x = ["a100","100a","1a1"]
==>a100
==>100a
==>1a1
```
and you sorted them with existing sorting you get:
```text
gremlin> Collections.sort(x, Comparator.comparing({ it.toString() },
String.CASE_INSENSITIVE_ORDER))
gremlin> x
==>100a
==>1a1
==>a100
```
with your approach I get:
```text
gremlin> extractInt = { s ->
......1> num = s.replaceAll("\\D", "");
......2> return num.isEmpty() ? 0 : Integer.parseInt(num);
......3> }
gremlin> x = ["a100","100a","1a1"]
==>a100
==>100a
==>1a1
gremlin> x.sort{a,b ->
......1> o1fullString = a.toString();
......2> o2fullString = b.toString();
......3>
......3> o1StringPart = o1fullString.replaceAll("\\d", "");
......4> o2StringPart = o2fullString.replaceAll("\\d", "");
......5>
......5> if(o1StringPart.equalsIgnoreCase(o2StringPart)) {
......6> return extractInt(o1fullString) - extractInt(o2fullString);
......7> }
......8> return o1fullString.compareTo(o2fullString);
......9> }
==>1a1
==>a100
==>100a
```
Is that what you expected to have happen?
With IDs you really can't assume too much because TinkerPop is not ID
aware. The underlying graph implementation is the only thing that really knows
anything about what the type of format of the ID is.
---