msokolov commented on code in PR #12651:
URL: https://github.com/apache/lucene/pull/12651#discussion_r1359884135
##########
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java:
##########
@@ -284,6 +285,13 @@ int graphNextNeighbor(HnswGraph graph) throws IOException {
return graph.nextNeighbor();
}
+ private static int getGraphSize(HnswGraph graph) {
+ if (graph instanceof OnHeapHnswGraph) {
Review Comment:
I see - can be a follow-up, but perhaps we introduce `capacity()` to
`HnswGraph` so we can use it uniformly without casting
##########
lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java:
##########
@@ -99,27 +122,31 @@ public void addNode(int level, int node) {
entryNode = node;
}
- if (level > 0) {
- // if the new node introduces a new level, add more levels to the graph,
- // and make this node the graph's new entry point
- if (level >= numLevels) {
- for (int i = numLevels; i <= level; i++) {
- graphUpperLevels.add(new HashMap<>());
- }
- numLevels = level + 1;
- entryNode = node;
+ if (node >= graph.length) {
+ if (noGrowth) {
+ throw new AssertionError(
Review Comment:
This should either be `assert node < graph.length || noGrowth == false:
"...message..."` or else we should throw an `IllegalArgumentException` - I
don't think we ought to be throwing `AssertionError` in production code.
##########
lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java:
##########
@@ -99,27 +122,31 @@ public void addNode(int level, int node) {
entryNode = node;
}
- if (level > 0) {
- // if the new node introduces a new level, add more levels to the graph,
- // and make this node the graph's new entry point
- if (level >= numLevels) {
- for (int i = numLevels; i <= level; i++) {
- graphUpperLevels.add(new HashMap<>());
- }
- numLevels = level + 1;
- entryNode = node;
+ if (node >= graph.length) {
+ if (noGrowth) {
+ throw new AssertionError(
+ "The graph does not expect to growth when an initial size is
given");
Review Comment:
syntax: growth -> grow
##########
lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java:
##########
@@ -32,39 +31,54 @@
*/
public final class OnHeapHnswGraph extends HnswGraph implements Accountable {
+ private static final int INIT_SIZE = 128;
+
private int numLevels; // the current number of levels in the graph
private int entryNode; // the current graph entry node on the top level. -1
if not set
- // Level 0 is represented as List<NeighborArray> – nodes' connections on
level 0.
- // Each entry in the list has the top maxConn/maxConn0 neighbors of a node.
The nodes correspond
- // to vectors
- // added to HnswBuilder, and the node values are the ordinals of those
vectors.
- // Thus, on all levels, neighbors expressed as the level 0's nodes' ordinals.
- private final List<NeighborArray> graphLevel0;
- // Represents levels 1-N. Each level is represented with a Map that maps a
levels level 0
- // ordinal to its neighbors on that level. All nodes are in level 0, so we
do not need to maintain
- // it in this list. However, to avoid changing list indexing, we always will
make the first
- // element
- // null.
- private final List<Map<Integer, NeighborArray>> graphUpperLevels;
- private final int nsize;
- private final int nsize0;
+ // the internal graph representation where the first dimension is node id
and second dimension is
+ // level
+ // e.g. graph[1][2] is all the neighbours of node 1 at level 2
+ private NeighborArray[][] graph;
+ // essentially another 2d map which the first dimension is level and second
dimension is node id,
+ // this is only
+ // generated on demand when there's someone calling getNodeOnLevel on a
non-zero level
+ private List<Integer>[] levelToNodes;
+ private int
+ lastFreezeSize; // remember the size we are at last time to freeze the
graph and generate
+ // levelToNodes
+ private int size; // graph size, which is number of nodes in level 0
+ private int
+ nonZeroLevelSize; // total number of NeighborArrays created that is not
on level 0, for now it
+ // is only used to account memory usage
+ private final int nsize; // neighbour array size at non-zero level
+ private final int nsize0; // neighbour array size at zero level
+ private final boolean
+ noGrowth; // if an initial size is passed in, we don't expect the graph
to grow itself
// KnnGraphValues iterator members
private int upto;
private NeighborArray cur;
- OnHeapHnswGraph(int M) {
+ /**
+ * ctor
+ *
+ * @param numNodes number of nodes that will be added to this graph, passing
in -1 means unbounded
+ * while passing in a non-negative value will lock the whole graph and
disable the graph from
+ * growth itself (you cannot add a node with has id >= numNodes)
Review Comment:
syntax nits, should read: "growing itself (you cannot add a node having id
>= numNodes)"
##########
lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java:
##########
@@ -158,50 +185,82 @@ public int entryNode() {
return entryNode;
}
+ /**
+ * WARN: calling this method will essentially iterate through all nodes at
level 0 (even if you're
+ * not getting node at level 0), we have built some caching mechanism such
that if graph is not
+ * changed only the first non-zero level call will pay the cost. So it is
highly NOT recommended
+ * to call this method while the graph is still building.
+ *
+ * <p>NOTE: if the node is not inserted in order (e.g. we're init'd from
another graph) such that
+ * there are some node in middle are not inserted. (e.g. the largest node
inserted is 10 but node
+ * 5 is not yet inserted) Calling getNodesOnLevel(0) will lead to a wrong
behavior because it is a
+ * simple iterating over range(size) TODO: maybe we should fix the above
behavior?
Review Comment:
Could we do
```
if (size != capacity) throw new IllegalStateException("graph build not
complete, size=" + size + " capacity=" + capacity);
```
##########
lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java:
##########
@@ -74,23 +88,32 @@ public final class OnHeapHnswGraph extends HnswGraph
implements Accountable {
* @param node the node whose neighbors are returned, represented as an
ordinal on the level 0.
*/
public NeighborArray getNeighbors(int level, int node) {
- if (level == 0) {
- return graphLevel0.get(node);
- }
- Map<Integer, NeighborArray> levelMap = graphUpperLevels.get(level);
- assert levelMap.containsKey(node);
- return levelMap.get(node);
+ assert graph.length > node && graph[node] != null && graph[node][level] !=
null;
Review Comment:
I think we might actually prefer not to assert the first two conditions so
that we can distinguish NPE from AIOOBE from AssertionError?
##########
lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java:
##########
@@ -74,23 +88,32 @@ public final class OnHeapHnswGraph extends HnswGraph
implements Accountable {
* @param node the node whose neighbors are returned, represented as an
ordinal on the level 0.
*/
public NeighborArray getNeighbors(int level, int node) {
- if (level == 0) {
- return graphLevel0.get(node);
- }
- Map<Integer, NeighborArray> levelMap = graphUpperLevels.get(level);
- assert levelMap.containsKey(node);
- return levelMap.get(node);
+ assert graph.length > node && graph[node] != null && graph[node][level] !=
null;
+ return graph[node][level];
}
@Override
public int size() {
- return graphLevel0.size(); // all nodes are located on the 0th level
+ return size;
+ }
+
+ /**
+ * When we initialize from another graph, the capacity, which is length of
internal buffer of
+ * graph, is different from {@link #size()}, which is number of nodes
already inserted, such that
+ * we need two method to retrieve each
+ *
+ * @return length of internal representation of graph
+ */
+ public int capacity() {
+ return graph.length;
}
/**
* Add node on the given level. Nodes can be inserted out of order, but it
requires that the nodes
* preceded by the node inserted out of order are eventually added.
*
+ * <p>NOTE: You must add a node from the node's top level
Review Comment:
maybe expand to "You must add a node **starting** from the node's top level"?
##########
lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java:
##########
@@ -40,31 +41,29 @@ public final class OnHeapHnswGraph extends HnswGraph
implements Accountable {
// to vectors
// added to HnswBuilder, and the node values are the ordinals of those
vectors.
// Thus, on all levels, neighbors expressed as the level 0's nodes' ordinals.
- private final List<NeighborArray> graphLevel0;
Review Comment:
ok, thanks I didn't see that
##########
lucene/core/src/test/org/apache/lucene/util/hnsw/HnswGraphTestCase.java:
##########
@@ -454,77 +454,6 @@ public void testSearchWithSelectiveAcceptOrds() throws
IOException {
}
}
- public void testBuildOnHeapHnswGraphOutOfOrder() throws IOException {
Review Comment:
ok this test is no longer interesting, but could we add a few new tests
checking the error states:
1. attempt to add nodes with levels out of order
2. attempt to add nodes out of bounds to pre-allocated graph
3. attempt to retrieve nodes iterator on incomplete graph
Also maybe we still a need test where we add nodes out of order? I'm not
sure how well that is covered
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]