gf2121 commented on PR #12800:
URL: https://github.com/apache/lucene/pull/12800#issuecomment-1820687175
Thanks for the feedback @mikemccand!
> Hmm it looks like random got a bit slower in candidate? Flush time ~550 ish ms in baseline and maybe ~650 ish ms in candidate?
Ohhh! I double-checked, and it turns out I had pasted the benchmark results in the wrong place, sorry!
> And it's not quite random right -- it's sort of a sawtooth between 0 and 14? Or am I reading the results backwards?
Thanks for pointing this out; it is indeed not random enough. I changed RANDOM to use `java.util.Random` and renamed the original RANDOM order to ROUND. With the new RANDOM order the improvement is a bit larger, 20+% (steady-state flush time drops from ~850 ms in baseline to ~615 ms in candidate).
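Both orders draw from the same 15 distinct sort values and differ only in how those values are arranged. A minimal sketch of the two generators, mirroring the `switch` in the benchmark code; the class and method names are hypothetical and only for illustration:

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper class, only to contrast the two value orders.
public class SortValueOrders {
  static final int NUM_VALUES = 15; // same bound as in the benchmark

  /** ROUND (the original "RANDOM"): the deterministic sawtooth 0, 1, ..., 14, 0, 1, ... */
  static long[] roundOrder(int n) {
    long[] values = new long[n];
    for (int i = 0; i < n; i++) {
      values[i] = i % NUM_VALUES;
    }
    return values;
  }

  /** New RANDOM: independent uniform draws from [0, 15) using java.util.Random. */
  static long[] randomOrder(int n, long seed) {
    Random random = new Random(seed);
    long[] values = new long[n];
    for (int i = 0; i < n; i++) {
      values[i] = random.nextLong(NUM_VALUES); // Random.nextLong(bound) needs Java 17+
    }
    return values;
  }

  public static void main(String[] args) {
    System.out.println("ROUND:  " + Arrays.toString(roundOrder(20)));
    System.out.println("RANDOM: " + Arrays.toString(randomOrder(20, 4317849138248L)));
  }
}
```

The ROUND line repeats 0..14 in order, which is the sawtooth that showed up in the earlier results; the RANDOM line has no such pattern.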
<details><summary>Benchmark details</summary>
**Baseline**
```
Using order: RANDOM
DWPT 0 [2023-11-21T10:49:03.532762Z; main]: flush time 1138.599958 ms
DWPT 0 [2023-11-21T10:49:05.455861Z; main]: flush time 1059.658875 ms
DWPT 0 [2023-11-21T10:49:07.224901Z; main]: flush time 991.167625 ms
DWPT 0 [2023-11-21T10:49:08.847679Z; main]: flush time 850.281875 ms
DWPT 0 [2023-11-21T10:49:10.468672Z; main]: flush time 848.403625 ms
DWPT 0 [2023-11-21T10:49:12.104744Z; main]: flush time 861.536542 ms
DWPT 0 [2023-11-21T10:49:13.731466Z; main]: flush time 851.316958 ms
DWPT 0 [2023-11-21T10:49:15.350887Z; main]: flush time 847.584917 ms
DWPT 0 [2023-11-21T10:49:16.963837Z; main]: flush time 843.849042 ms
DWPT 0 [2023-11-21T10:49:18.579249Z; main]: flush time 843.092666 ms
Using order: ROUND
DWPT 1 [2023-11-21T10:49:20.051903Z; main]: flush time 648.484542 ms
DWPT 1 [2023-11-21T10:49:21.472366Z; main]: flush time 643.642417 ms
DWPT 1 [2023-11-21T10:49:22.889719Z; main]: flush time 644.994541 ms
DWPT 1 [2023-11-21T10:49:24.307484Z; main]: flush time 642.117958 ms
DWPT 1 [2023-11-21T10:49:25.727815Z; main]: flush time 642.38 ms
DWPT 1 [2023-11-21T10:49:27.143574Z; main]: flush time 639.769875 ms
DWPT 1 [2023-11-21T10:49:28.562387Z; main]: flush time 644.234375 ms
DWPT 1 [2023-11-21T10:49:29.975009Z; main]: flush time 639.01125 ms
DWPT 1 [2023-11-21T10:49:31.396969Z; main]: flush time 643.216 ms
DWPT 1 [2023-11-21T10:49:32.810467Z; main]: flush time 639.049041 ms
Using order: ASC
DWPT 2 [2023-11-21T10:49:34.100537Z; main]: flush time 473.33425 ms
DWPT 2 [2023-11-21T10:49:35.236826Z; main]: flush time 352.816167 ms
DWPT 2 [2023-11-21T10:49:36.312917Z; main]: flush time 293.915917 ms
DWPT 2 [2023-11-21T10:49:37.386792Z; main]: flush time 290.221458 ms
DWPT 2 [2023-11-21T10:49:38.463960Z; main]: flush time 287.046708 ms
DWPT 2 [2023-11-21T10:49:39.537561Z; main]: flush time 287.051709 ms
DWPT 2 [2023-11-21T10:49:40.610809Z; main]: flush time 287.296375 ms
DWPT 2 [2023-11-21T10:49:41.686863Z; main]: flush time 290.536083 ms
DWPT 2 [2023-11-21T10:49:42.751377Z; main]: flush time 289.183375 ms
DWPT 2 [2023-11-21T10:49:43.824249Z; main]: flush time 289.238584 ms
Using order: DESC
DWPT 3 [2023-11-21T10:49:45.039267Z; main]: flush time 394.276959 ms
DWPT 3 [2023-11-21T10:49:46.203835Z; main]: flush time 365.40575 ms
DWPT 3 [2023-11-21T10:49:47.359253Z; main]: flush time 364.55 ms
DWPT 3 [2023-11-21T10:49:48.548749Z; main]: flush time 385.198 ms
DWPT 3 [2023-11-21T10:49:49.715963Z; main]: flush time 366.247083 ms
DWPT 3 [2023-11-21T10:49:50.881628Z; main]: flush time 372.473333 ms
DWPT 3 [2023-11-21T10:49:52.037239Z; main]: flush time 367.666041 ms
DWPT 3 [2023-11-21T10:49:53.192338Z; main]: flush time 364.390834 ms
DWPT 3 [2023-11-21T10:49:54.346795Z; main]: flush time 367.417208 ms
DWPT 3 [2023-11-21T10:49:55.506692Z; main]: flush time 374.948625 ms
```
**Candidate**
```
Using order: RANDOM
DWPT 0 [2023-11-21T10:31:14.638348Z; main]: flush time 926.650958 ms
DWPT 0 [2023-11-21T10:31:16.527778Z; main]: flush time 983.61375 ms
DWPT 0 [2023-11-21T10:31:18.105650Z; main]: flush time 745.283416 ms
DWPT 0 [2023-11-21T10:31:19.545346Z; main]: flush time 614.212208 ms
DWPT 0 [2023-11-21T10:31:20.986866Z; main]: flush time 621.046833 ms
DWPT 0 [2023-11-21T10:31:22.418842Z; main]: flush time 613.169292 ms
DWPT 0 [2023-11-21T10:31:23.843488Z; main]: flush time 608.060375 ms
DWPT 0 [2023-11-21T10:31:25.289972Z; main]: flush time 633.770083 ms
DWPT 0 [2023-11-21T10:31:26.729025Z; main]: flush time 617.815 ms
DWPT 0 [2023-11-21T10:31:28.152042Z; main]: flush time 606.253292 ms
Using order: ROUND
DWPT 1 [2023-11-21T10:31:29.546556Z; main]: flush time 540.889709 ms
DWPT 1 [2023-11-21T10:31:30.891868Z; main]: flush time 534.34825 ms
DWPT 1 [2023-11-21T10:31:32.235487Z; main]: flush time 529.94025 ms
DWPT 1 [2023-11-21T10:31:33.585848Z; main]: flush time 538.600959 ms
DWPT 1 [2023-11-21T10:31:34.926304Z; main]: flush time 535.212458 ms
DWPT 1 [2023-11-21T10:31:36.261841Z; main]: flush time 529.868792 ms
DWPT 1 [2023-11-21T10:31:37.612535Z; main]: flush time 532.926375 ms
DWPT 1 [2023-11-21T10:31:38.950114Z; main]: flush time 531.968 ms
DWPT 1 [2023-11-21T10:31:40.283548Z; main]: flush time 529.449208 ms
DWPT 1 [2023-11-21T10:31:41.621569Z; main]: flush time 531.614458 ms
Using order: ASC
DWPT 2 [2023-11-21T10:31:42.931710Z; main]: flush time 466.0205 ms
DWPT 2 [2023-11-21T10:31:44.110242Z; main]: flush time 361.563833 ms
DWPT 2 [2023-11-21T10:31:45.270395Z; main]: flush time 344.598167 ms
DWPT 2 [2023-11-21T10:31:46.391066Z; main]: flush time 297.298416 ms
DWPT 2 [2023-11-21T10:31:47.508596Z; main]: flush time 292.465833 ms
DWPT 2 [2023-11-21T10:31:48.619912Z; main]: flush time 294.3465 ms
DWPT 2 [2023-11-21T10:31:49.733508Z; main]: flush time 294.211834 ms
DWPT 2 [2023-11-21T10:31:50.844318Z; main]: flush time 292.396292 ms
DWPT 2 [2023-11-21T10:31:51.957632Z; main]: flush time 294.951792 ms
DWPT 2 [2023-11-21T10:31:53.060245Z; main]: flush time 293.81875 ms
Using order: DESC
DWPT 3 [2023-11-21T10:31:54.309892Z; main]: flush time 397.02825 ms
DWPT 3 [2023-11-21T10:31:55.507951Z; main]: flush time 375.452125 ms
DWPT 3 [2023-11-21T10:31:56.705769Z; main]: flush time 379.94275 ms
DWPT 3 [2023-11-21T10:31:57.916353Z; main]: flush time 374.742583 ms
DWPT 3 [2023-11-21T10:31:59.098488Z; main]: flush time 370.185083 ms
DWPT 3 [2023-11-21T10:32:00.286668Z; main]: flush time 373.631208 ms
DWPT 3 [2023-11-21T10:32:01.479051Z; main]: flush time 369.689833 ms
DWPT 3 [2023-11-21T10:32:02.665413Z; main]: flush time 370.781 ms
DWPT 3 [2023-11-21T10:32:03.841312Z; main]: flush time 372.006916 ms
DWPT 3 [2023-11-21T10:32:05.019313Z; main]: flush time 374.449833 ms
```
</details>
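The 20+% figure can be checked against the logs above. A minimal sketch that averages the steady-state flush times for RANDOM and ROUND and computes the relative reduction; the flush times are copied from the logs, while excluding the first three flushes of each run as warm-up is an assumption made here, not something the benchmark itself does (class and method names are hypothetical):

```java
import java.util.Arrays;

// Hypothetical helper: summarizes the flush times (ms) copied from the logs above.
public class FlushTimeSummary {
  static final double[] BASELINE_RANDOM = {
    1138.599958, 1059.658875, 991.167625, 850.281875, 848.403625,
    861.536542, 851.316958, 847.584917, 843.849042, 843.092666
  };
  static final double[] CANDIDATE_RANDOM = {
    926.650958, 983.61375, 745.283416, 614.212208, 621.046833,
    613.169292, 608.060375, 633.770083, 617.815, 606.253292
  };
  static final double[] BASELINE_ROUND = {
    648.484542, 643.642417, 644.994541, 642.117958, 642.38,
    639.769875, 644.234375, 639.01125, 643.216, 639.049041
  };
  static final double[] CANDIDATE_ROUND = {
    540.889709, 534.34825, 529.94025, 538.600959, 535.212458,
    529.868792, 532.926375, 531.968, 529.449208, 531.614458
  };

  // Assumption: the first 3 flushes of each run are warm-up and are excluded.
  static final int WARMUP = 3;

  /** Mean of the flush times after skipping the first {@code warmup} entries. */
  static double steadyStateMean(double[] times, int warmup) {
    return Arrays.stream(times, warmup, times.length).average().orElseThrow();
  }

  /** Fractional reduction of the steady-state mean flush time, baseline to candidate. */
  static double reduction(double[] baseline, double[] candidate) {
    double base = steadyStateMean(baseline, WARMUP);
    double cand = steadyStateMean(candidate, WARMUP);
    return (base - cand) / base;
  }

  public static void main(String[] args) {
    System.out.printf("RANDOM: %.1f%% lower flush time%n",
        100 * reduction(BASELINE_RANDOM, CANDIDATE_RANDOM));
    System.out.printf("ROUND:  %.1f%% lower flush time%n",
        100 * reduction(BASELINE_ROUND, CANDIDATE_ROUND));
  }
}
```

With that cut-off, RANDOM comes out at roughly 27% lower flush time and ROUND at roughly 17%.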
<details><summary>Code</summary>
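```java
import java.io.IOException;
import java.nio.file.Paths;
import java.util.Random;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortedNumericSelector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.PrintStreamInfoStream;

public class IndexSortFlushBenchmark { // wrapper class name is arbitrary

  /** How the sort_field values are generated. */
  enum Order {
    RANDOM, // uniform random values in [0, 15)
    ROUND, // sawtooth: i % 15
    ASC, // already in index-sort order
    DESC; // reverse of index-sort order
  }

  public static void main(String[] args) throws IOException {
    // Fixed seed so RANDOM produces the same sequence on every run.
    Random random = new Random(4317849138248L);
    for (Order order : Order.values()) {
      System.out.println("Using order: " + order.name());
      IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
      cfg.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
      // The info stream prints the per-DWPT "flush time" lines.
      cfg.setInfoStream(new PrintStreamInfoStream(System.out));
      // Flush by doc count only (every 1M docs), never by RAM usage.
      cfg.setMaxBufferedDocs(1_000_000);
      cfg.setRAMBufferSizeMB(IndexWriterConfig.DISABLE_AUTO_FLUSH);
      // Index sort on sort_field, so every flush has to sort the buffered docs.
      cfg.setIndexSort(
          new Sort(LongField.newSortField("sort_field", false, SortedNumericSelector.Type.MIN)));

      try (Directory dir = FSDirectory.open(Paths.get("/tmp/a"));
          IndexWriter w = new IndexWriter(dir, cfg)) {
        // Reuse a single Document and only update the field values per doc.
        Document doc = new Document();
        LongField sortField = new LongField("sort_field", 0);
        doc.add(sortField);
        // Three values for the same multi-valued text field.
        TextField stringField1 = new TextField("string_field", "", Field.Store.NO);
        doc.add(stringField1);
        TextField stringField2 = new TextField("string_field", "", Field.Store.NO);
        doc.add(stringField2);
        TextField stringField3 = new TextField("string_field", "", Field.Store.NO);
        doc.add(stringField3);

        for (int i = 0; i < 10_000_000; ++i) {
          long sortValue =
              switch (order) {
                case RANDOM -> random.nextLong(15); // Random.nextLong(bound) needs Java 17+
                case ROUND -> i % 15;
                case ASC -> i;
                case DESC -> -i;
              };
          sortField.setLongValue(sortValue);
          stringField1.setStringValue(Integer.toBinaryString(i % 10));
          stringField2.setStringValue(Integer.toBinaryString(i % 100));
          stringField3.setStringValue(Integer.toBinaryString(i % 1000));
          w.addDocument(doc);
        }
        w.flush();
        w.commit();
      } // closes the writer first, then the directory
    }
  }
}
```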
</details>