nfsantos commented on PR #1159:
URL: https://github.com/apache/jackrabbit-oak/pull/1159#issuecomment-1766166956

   I reran the tests with 20 properties per node and with larger values in each 
property, which I think is a more realistic scenario. The sorted version is 
around 15% slower. 
   
   I think the impact on performance will be measurable but probably not very 
significant. In the case of the Pipelined strategy, the overhead from sorting 
falls on the transform threads, which can easily be scaled up. The Mongo 
download thread is the main bottleneck, as this stage is single threaded and 
there is no good way to parallelize it, so I would resist adding overhead to 
the work done by this thread, but I have no objection to slightly 
increasing the work of the transform threads.
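
   To illustrate why extra work in the transform stage scales out, here is a
   minimal sketch of a single-producer, multi-worker layout. It is not the
   actual Pipelined strategy code: `PipelineSketch`, `transform` and `run` are
   hypothetical names, and a plain loop stands in for the Mongo download
   thread.

   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.List;
   import java.util.concurrent.CompletableFuture;
   import java.util.concurrent.ExecutorService;
   import java.util.concurrent.Executors;

   public class PipelineSketch {
       // Stand-in for the transform step: sorts the property names of one entry.
       static String transform(String[] propertyNames) {
           String[] copy = propertyNames.clone();
           Arrays.sort(copy);
           return String.join(",", copy);
       }

       // One loop (the "download" stage) hands entries to a pool of transform
       // threads; raising transformThreads adds capacity only to the second stage.
       static List<String> run(List<String[]> downloaded, int transformThreads) {
           ExecutorService pool = Executors.newFixedThreadPool(transformThreads);
           try {
               List<CompletableFuture<String>> pending = new ArrayList<>();
               for (String[] entry : downloaded) {
                   pending.add(CompletableFuture.supplyAsync(() -> transform(entry), pool));
               }
               // Collect results in submission order.
               List<String> out = new ArrayList<>();
               for (CompletableFuture<String> f : pending) {
                   out.add(f.join());
               }
               return out;
           } finally {
               pool.shutdown();
           }
       }

       public static void main(String[] args) {
           List<String[]> entries = Arrays.asList(
                   new String[] {"p2", "p1"},
                   new String[] {"b", "c", "a"});
           System.out.println(run(entries, 4));
       }
   }
   ```

   The sorting cost sits entirely inside `transform`, so it is paid by the
   pool and never by the single loop that feeds it.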
   
   In any case, I would suggest adding a configuration setting to enable or 
disable sorting of the properties when writing to the FFS.
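
   Such a setting could look roughly like the sketch below. This is only an
   assumption about the shape of the toggle: the system property name
   `oak.indexer.ffs.sortProperties` and the `SortToggleSketch` class are
   hypothetical, and the simplified `asJson` here (no escaping) stands in for
   the real writer.

   ```java
   import java.util.LinkedHashMap;
   import java.util.Map;
   import java.util.TreeMap;

   public class SortToggleSketch {
       // Hypothetical property name, not an existing Oak setting.
       static final String SORT_PROPERTY = "oak.indexer.ffs.sortProperties";

       // Sorting is off unless the system property is set to "true".
       static boolean sortEnabled() {
           return Boolean.getBoolean(SORT_PROPERTY);
       }

       // Simplified serializer: emits properties in insertion order, or in
       // name order when sorted is true. Values are not escaped.
       static String asJson(Map<String, String> props, boolean sorted) {
           Map<String, String> view = sorted ? new TreeMap<>(props) : props;
           StringBuilder sb = new StringBuilder("{");
           for (Map.Entry<String, String> e : view.entrySet()) {
               if (sb.length() > 1) {
                   sb.append(',');
               }
               sb.append('"').append(e.getKey()).append("\":\"")
                 .append(e.getValue()).append('"');
           }
           return sb.append('}').toString();
       }

       public static void main(String[] args) {
           Map<String, String> props = new LinkedHashMap<>();
           props.put("p2", "b");
           props.put("p1", "a");
           System.out.println(asJson(props, sortEnabled()));
       }
   }
   ```

   Running with `-Doak.indexer.ffs.sortProperties=true` would switch the
   sketch to sorted output; leaving it unset keeps the current behaviour.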
   
   ```java
   package org.apache.jackrabbit.oak.index.indexer.document.flatfile;
   
   import java.util.ArrayList;
   
   import org.apache.commons.lang3.RandomStringUtils;
   import org.apache.jackrabbit.oak.plugins.memory.EmptyNodeState;
   import org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder;
   import org.apache.jackrabbit.oak.spi.blob.BlobStore;
   import org.apache.jackrabbit.oak.spi.blob.MemoryBlobStore;
   import org.apache.jackrabbit.oak.spi.state.NodeBuilder;
   import org.apache.jackrabbit.oak.spi.state.NodeState;
   
   public class MicroBenchmark {
       public void test() {
           BlobStore blobStore = new MemoryBlobStore();
           NodeStateEntryWriter entryWriter = new NodeStateEntryWriter(blobStore);
           ArrayList<NodeState> list = new ArrayList<>();
           for (int j = 0; j < 1000000; j++) {
               NodeBuilder b = new MemoryNodeBuilder(EmptyNodeState.EMPTY_NODE);
               for (int i = 0; i < 20; i++) {
                   b.setProperty("p" + i, RandomStringUtils.random(40, true, true));
               }
               NodeState ns = b.getNodeState();
               list.add(ns);
           }
           // Profiler prof = new Profiler().startCollecting();
           for(int test = 0; test < 10; test++) {
               long start = System.currentTimeMillis();
               int len = 0;
               for (NodeState ns : list) {
                   len += entryWriter.asJson(ns).length();
               }
               long time = System.currentTimeMillis() - start;
               System.out.println(time + " ms; string length " + len + " unsorted");
   
               start = System.currentTimeMillis();
               len = 0;
               for (NodeState ns : list) {
                   len += entryWriter.asSortedJson(ns).length();
               }
               time = System.currentTimeMillis() - start;
               System.out.println(time + " ms; string length " + len + " sorted");
               System.out.println();
           }
           // System.out.println(prof.getTop(10));
       }
   
       public static void main(String[] args) {
           new MicroBenchmark().test();
       }
   }
   ```
   
   ```
   4418 ms; string length 971000000 unsorted
   2399 ms; string length 971000000 sorted
   
   1886 ms; string length 971000000 unsorted
   2156 ms; string length 971000000 sorted
   
   1760 ms; string length 971000000 unsorted
   2039 ms; string length 971000000 sorted
   
   1667 ms; string length 971000000 unsorted
   2000 ms; string length 971000000 sorted
   
   1665 ms; string length 971000000 unsorted
   2000 ms; string length 971000000 sorted
   
   1665 ms; string length 971000000 unsorted
   1999 ms; string length 971000000 sorted
   
   1667 ms; string length 971000000 unsorted
   2000 ms; string length 971000000 sorted
   
   1664 ms; string length 971000000 unsorted
   2001 ms; string length 971000000 sorted
   
   1663 ms; string length 971000000 unsorted
   1999 ms; string length 971000000 sorted
   
   1667 ms; string length 971000000 unsorted
   1999 ms; string length 971000000 sorted
   ```
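
   The relative cost can be read off the last pair of timings above (1667 ms
   unsorted, 1999 ms sorted) in two ways, depending on which run is taken as
   the baseline. A small sketch of that arithmetic, with hypothetical helper
   names:

   ```java
   public class OverheadSketch {
       // Extra time of the candidate, as a percentage of the baseline time.
       static double overheadPercent(long baselineMs, long candidateMs) {
           return 100.0 * (candidateMs - baselineMs) / baselineMs;
       }

       // The same difference, as a percentage of the candidate time.
       static double savingPercent(long baselineMs, long candidateMs) {
           return 100.0 * (candidateMs - baselineMs) / candidateMs;
       }

       public static void main(String[] args) {
           long unsortedMs = 1667; // last unsorted run in the output above
           long sortedMs = 1999;   // last sorted run in the output above
           System.out.printf("sorted takes %.1f%% longer than unsorted%n",
                   overheadPercent(unsortedMs, sortedMs));
           System.out.printf("unsorted takes %.1f%% less time than sorted%n",
                   savingPercent(unsortedMs, sortedMs));
       }
   }
   ```

   The first iteration is left out here because it includes JIT warm-up.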

