gerlowskija commented on PR #123:
URL: https://github.com/apache/solr-sandbox/pull/123#issuecomment-3258145316
Devs who want to test this out can run:
1. Download Raw Wiki Data
- `(mkdir ~/Downloads/wiki && cd ~/Downloads/wiki && wget
https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
&& bunzip2 enwiki-latest-pages-articles.xml.bz2)`
2. Compile gatling-data-prep
- `./gradlew :gatling-data-prep:jar`
3. Convert Raw Wiki Data to Solr Docs
- `mkdir -p .gatling/batches && java -cp
gatling-data-prep/build/libs/gatling-data-prep.jar WikipediaXmlToSolr
~/Downloads/wiki/enwiki-latest-pages-articles.xml .gatling/batches json 5000
1000`
4. Start Solr - any instance can be used: local or remote, Docker or
bare metal, release or SNAPSHOT, etc. The benchmark assumes
"http://localhost:8983/solr" unless told otherwise.
5. Install wiki configset to Solr
- `./scripts/gatling/setup_wikipedia_tests.sh`
6. Run benchmark
- `./gradlew gatlingRun --simulation
index.IndexWikipediaBatchesSimulation`
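For anyone who wants to script the data-prep part, steps (1) to (3) above could be wrapped roughly as follows so that a rerun skips work that is already done. This is only a sketch: the `WIKI_DIR` and `BATCH_DIR` variables and both function names are hypothetical (nothing in the PR reads them), the paths and the `json 5000 1000` arguments are copied verbatim from the steps above, and the "already done" check is deliberately naive (any file in the batch directory counts, so a half-finished conversion would be skipped too).

```shell
#!/usr/bin/env bash
# Sketch only, not part of the PR. Run from the repository root.
# WIKI_DIR / BATCH_DIR are hypothetical overrides for the paths used above.
WIKI_DIR="${WIKI_DIR:-$HOME/Downloads/wiki}"
BATCH_DIR="${BATCH_DIR:-.gatling/batches}"
DUMP_NAME="enwiki-latest-pages-articles.xml"
DUMP_URL="https://dumps.wikimedia.org/enwiki/latest/$DUMP_NAME.bz2"

# True if the directory exists and contains at least one entry.
has_files() {
  [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

prepare_wiki_data() {
  # Naive guard: any existing batch file means steps 1-3 are skipped.
  if has_files "$BATCH_DIR"; then
    echo "Batches already present in $BATCH_DIR, skipping steps 1-3"
    return 0
  fi
  mkdir -p "$WIKI_DIR" "$BATCH_DIR"

  # Step 1: download and decompress the dump unless the XML is already there.
  if [ ! -f "$WIKI_DIR/$DUMP_NAME" ]; then
    (cd "$WIKI_DIR" && wget "$DUMP_URL" && bunzip2 "$DUMP_NAME.bz2") || return 1
  fi

  # Step 2: build the converter jar.
  ./gradlew :gatling-data-prep:jar || return 1

  # Step 3: convert the dump into Solr docs (arguments as given above).
  java -cp gatling-data-prep/build/libs/gatling-data-prep.jar WikipediaXmlToSolr \
    "$WIKI_DIR/$DUMP_NAME" "$BATCH_DIR" json 5000 1000
}
```

Sourcing the file and calling `prepare_wiki_data` runs the three steps once; later calls return immediately as long as the batch directory is non-empty.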
Steps (1) - (3) convert the raw Wikipedia data into a Solr-ready format, so
they only need to be run once, on initial setup. That's good, since these
steps are pretty time-consuming. In an ideal world we would zip up the
converted data produced by (3) and have developers just download that. The
Lucene/McCandless benchmarks do something similar - they rely on
Lucene-ready pre-converted files stored in (I think) S3.
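Packaging the output of step (3) could look roughly like the following. Again just a sketch: the function names and the `wiki-batches.tar.gz` archive name used in the usage note are hypothetical, and where such an archive would be hosted is left open.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, not part of the PR.

# pack_batches <batch_dir> <archive>
# Archive the converted batch files, with paths relative to <batch_dir>.
pack_batches() {
  tar -czf "$2" -C "$1" .
}

# unpack_batches <archive> <batch_dir>
# Restore the batch files on another machine, creating <batch_dir> if needed.
unpack_batches() {
  mkdir -p "$2" && tar -xzf "$1" -C "$2"
}
```

On a machine that has already run steps (1) - (3), `pack_batches .gatling/batches wiki-batches.tar.gz` would produce the archive; elsewhere, `unpack_batches wiki-batches.tar.gz .gatling/batches` would stand in for those steps.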
And step (5) can probably be folded into the simulation Java code itself -
it's largely just installing a configset to use in indexing.
So there's still room for simplification here. But even in the current
state, the setup is pretty manageable IMO.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]