zhaih commented on code in PR #12050:
URL: https://github.com/apache/lucene/pull/12050#discussion_r1061976538

##########
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java:
##########
@@ -143,10 +148,64 @@ public OnHeapHnswGraph build(RandomAccessVectorValues vectorsToAdd) throws IOExc
     return hnsw;
   }
+  /**
+   * Initializes the graph of this builder. Transfers the nodes and their neighbors from the
+   * initializer graph into the graph being produced by this builder, mapping ordinals from the
+   * initializer graph to their new ordinals in this builder's graph. The builder's graph must be
+   * empty before calling this method.
+   *
+   * @param initializerGraph graph used for initialization
+   * @param oldToNewOrdinalMap map for converting from ordinals in the initializerGraph to this
+   *     builder's graph
+   */
+  public void initializeFromGraph(
+      HnswGraph initializerGraph, Map<Integer, Integer> oldToNewOrdinalMap) throws IOException {
+    assert hnsw.size() == 0;
+    float[] vectorValue = null;
+    BytesRef binaryValue = null;
+    for (int level = 0; level < initializerGraph.numLevels(); level++) {
+      HnswGraph.NodesIterator it = initializerGraph.getNodesOnLevel(level);
+
+      while (it.hasNext()) {
+        int oldOrd = it.nextInt();
+        int newOrd = oldToNewOrdinalMap.get(oldOrd);
+
+        hnsw.addNode(level, newOrd);
+
+        if (level == 0) {
+          initializedNodes.add(newOrd);
+        }
+
+        switch (this.vectorEncoding) {
+          case FLOAT32 -> vectorValue = vectors.vectorValue(newOrd);
+          case BYTE -> binaryValue = vectors.binaryValue(newOrd);
+        }
+
+        NeighborArray newNeighbors = this.hnsw.getNeighbors(level, newOrd);
+        initializerGraph.seek(level, oldOrd);
+        for (int oldNeighbor = initializerGraph.nextNeighbor();
+            oldNeighbor != NO_MORE_DOCS;
+            oldNeighbor = initializerGraph.nextNeighbor()) {
+          int newNeighbor = oldToNewOrdinalMap.get(oldNeighbor);
+          float score =
+              switch (this.vectorEncoding) {
+                case FLOAT32 -> this.similarityFunction.compare(

Review Comment:
   Oh ok, I see. Initially I thought the neighbor ordering was for searching, but it seems that is not the case.
   Is this sorted order only used to make the diversity calculation easier? Anyway, that can be a later topic, and I think we can live with the existing logic for now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org