I finally dug into this, and it turns out the nightly benchmark I run had bottlenecks that kept it from feeding documents to Lucene fast enough to take advantage of the concurrent hardware in beast2.
I fixed that and just re-ran the nightly run, and it shows good gains:

https://plus.google.com/+MichaelMcCandless/posts/6mzSoY4ucFE

I suspect more gains are possible ... I need to play some more.

Mike McCandless

http://blog.mikemccandless.com

On Fri, Apr 15, 2016 at 12:43 PM, Robert Muir <rcm...@gmail.com> wrote:

> you won't see indexing improvements there because the dataset in
> question is wikipedia and mostly indexing full text. I think it may
> have one measly numeric field.
>
> On Thu, Apr 14, 2016 at 6:25 PM, Otis Gospodnetić
> <otis.gospodne...@gmail.com> wrote:
> > (replying to my original email because I didn't get people's replies,
> > even though I see in the archives people replied)
> >
> > Re BJ and beast2 upgrade. Yeah, I saw that, but....
> > * if there is no indexing throughput improvement after that, does that
> > mean that those particular indexing tests happen to be disk bound and
> > not CPU bound? (I'm assuming beast2 has more cores than the previous
> > hardware.... oh, I see, 72 cores vs. only 20 indexing threads)
> > * the metrics for GC times are sums across all CPUs, not averages per
> > CPU? Would the latter be more useful?
> >
> > What I was fishing for was something in that indexing chart that would
> > show me this little nugget:
> >
> > *Lucene 6 brings a major new feature called Dimensional Points: a new
> > tree-based data structure which will be used for numeric, date, and
> > geospatial fields. Compared to the existing field format, this new
> > structure uses half the disk space, is twice as fast to index, and
> > increases search performance by 25%.*
> >
> > How come the charts on
> > http://home.apache.org/~mikemccand/lucenebench/indexing.html don't show
> > the 2x faster indexing and various query performance charts don't show
> > 25% improvement in search performance?
> >
> > Thanks,
> > Otis
> > --
> > Monitoring - Log Management - Alerting - Anomaly Detection
> > Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >
> > On Thu, Apr 14, 2016 at 1:13 PM, Otis Gospodnetić
> > <otis.gospodne...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> I was looking at Mike's
> >> http://home.apache.org/~mikemccand/lucenebench/indexing.html secretly
> >> hoping to spot some recent improvements in indexing throughput.... but
> >> instead it looks like:
> >>
> >> * indexing throughput hasn't really gone up in the last ~5 years
> >> * indexing was faster in 2014, but then dropped to pre-2014 speed in
> >> early 2015
> >> * indexing rate dropped some more in early 2016, and that seems to
> >> roughly correlate to a *big* jump in Young GC in late 2015
> >>
> >> Does anyone know what happened in late 2015 that causes that big Young
> >> GC jump?
> >> Or does that big jump just look scary in that chart, but is not
> >> actually a big concern in practice?
> >>
> >> Thanks,
> >> Otis