nfsantos commented on PR #1159: URL: https://github.com/apache/jackrabbit-oak/pull/1159#issuecomment-1766166956
I reran the tests with 20 properties per node and with larger property values, which I think is a more realistic scenario. The sorted version is slower, by around 15 to 20% in the steady-state runs below (about 2000 ms versus about 1665 ms per pass). I think the performance impact will be measurable but probably not very significant.

In the case of the Pipelined strategy, the sorting overhead is incurred in the transform threads, which can easily be scaled up. The Mongo download thread is the main bottleneck, as this stage is single-threaded and there is no good way to parallelize it, so I would resist adding overhead to the work done by this thread, but I have no objection to slightly increasing the work of the transform threads.

I would still suggest having a configuration setting to enable/disable sorting of the properties when writing to the FFS.

```java
package org.apache.jackrabbit.oak.index.indexer.document.flatfile;

import java.util.ArrayList;

import org.apache.commons.lang3.RandomStringUtils;
import org.apache.jackrabbit.oak.plugins.memory.EmptyNodeState;
import org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder;
import org.apache.jackrabbit.oak.spi.blob.BlobStore;
import org.apache.jackrabbit.oak.spi.blob.MemoryBlobStore;
import org.apache.jackrabbit.oak.spi.state.NodeBuilder;
import org.apache.jackrabbit.oak.spi.state.NodeState;

public class MicroBenchmark {

    public void test() {
        BlobStore blobStore = new MemoryBlobStore();
        NodeStateEntryWriter entryWriter = new NodeStateEntryWriter(blobStore);

        // Build 1,000,000 node states, each with 20 random 40-character string properties.
        ArrayList<NodeState> list = new ArrayList<>();
        for (int j = 0; j < 1000000; j++) {
            NodeBuilder b = new MemoryNodeBuilder(EmptyNodeState.EMPTY_NODE);
            for (int i = 0; i < 20; i++) {
                b.setProperty("p" + i, RandomStringUtils.random(40, true, true));
            }
            NodeState ns = b.getNodeState();
            list.add(ns);
        }

        // Profiler prof = new Profiler().startCollecting();
        // Repeat 10 times; the first passes also serve as warm-up.
        for (int test = 0; test < 10; test++) {
            // Unsorted serialization.
            long start = System.currentTimeMillis();
            int len = 0;
            for (NodeState ns : list) {
                len += entryWriter.asJson(ns).length();
            }
            long time = System.currentTimeMillis() - start;
            System.out.println(time + " ms; string length " + len + " unsorted");

            // Sorted serialization.
            start = System.currentTimeMillis();
            len = 0;
            for (NodeState ns : list) {
                len += entryWriter.asSortedJson(ns).length();
            }
            time = System.currentTimeMillis() - start;
            System.out.println(time + " ms; string length " + len + " sorted");
            System.out.println();
        }
        // System.out.println(prof.getTop(10));
    }

    public static void main(String[] args) {
        new MicroBenchmark().test();
    }
}
```

```
4418 ms; string length 971000000 unsorted
2399 ms; string length 971000000 sorted

1886 ms; string length 971000000 unsorted
2156 ms; string length 971000000 sorted

1760 ms; string length 971000000 unsorted
2039 ms; string length 971000000 sorted

1667 ms; string length 971000000 unsorted
2000 ms; string length 971000000 sorted

1665 ms; string length 971000000 unsorted
2000 ms; string length 971000000 sorted

1665 ms; string length 971000000 unsorted
1999 ms; string length 971000000 sorted

1667 ms; string length 971000000 unsorted
2000 ms; string length 971000000 sorted

1664 ms; string length 971000000 unsorted
2001 ms; string length 971000000 sorted

1663 ms; string length 971000000 unsorted
1999 ms; string length 971000000 sorted

1667 ms; string length 971000000 unsorted
1999 ms; string length 971000000 sorted
```
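
The configuration setting suggested above could look roughly like the following. This is only a minimal sketch under stated assumptions: the system property name `oak.indexer.ffs.sortProperties` and the class `PropertySortToggle` are made up for illustration and are not existing Oak settings or classes, and the serializer is a simplified stand-in for `NodeStateEntryWriter` that handles only string properties.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Oak: shows a flag that switches
// between unsorted and sorted property serialization.
class PropertySortToggle {

    // Hypothetical system property name (an assumption, not an existing Oak setting).
    static final String SORT_PROPERTY = "oak.indexer.ffs.sortProperties";

    // Sorting stays disabled unless the property is explicitly set to "true".
    static boolean sortingEnabled() {
        return Boolean.getBoolean(SORT_PROPERTY);
    }

    // Serializes string properties as a JSON-like object.
    // Values are not escaped; this is for illustration only.
    static String asJson(Map<String, String> properties, boolean sorted) {
        // Copying into a TreeMap orders the keys lexicographically;
        // this copy is the extra work paid only when sorting is enabled.
        Map<String, String> source = sorted ? new TreeMap<>(properties) : properties;
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, String> e : source.entrySet()) {
            if (sb.length() > 1) {
                sb.append(',');
            }
            sb.append('"').append(e.getKey()).append("\":\"")
              .append(e.getValue()).append('"');
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("p2", "b");
        props.put("p10", "c");
        props.put("p1", "a");
        // Run with -Doak.indexer.ffs.sortProperties=true to get sorted output.
        System.out.println(asJson(props, sortingEnabled()));
    }
}
```

With a flag like this, the unsorted path does no extra work, so the download thread would not pay for sorting unless it is explicitly enabled.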