[
https://issues.apache.org/jira/browse/SOLR-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408872#comment-13408872
]
Mikhail Khludnev edited comment on SOLR-3585 at 7/8/12 7:18 PM:
----------------------------------------------------------------
Dmitry,
I took a 3M-row TSV from http://www.freebase.com/view/book/book_edition and
slightly updated the Solr 4.0 example config to allow concurrency (see the patch
in report.tar.gz).
report.tar.gz also contains the iostat outputs, which show the utilization rate.
Summary:
on a MacBook Pro (Core i5): 233/183/138 sec for 1/2/4 threads.
3M records; the index size is slightly less than 1 GB.
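To put the summary timings in perspective, here is a minimal sketch that derives speedup and parallel efficiency from them. Only the three timings come from the summary; the function name and the efficiency definition (speedup divided by thread count) are illustrative assumptions.

```python
# Wall-clock seconds from the summary above, keyed by thread count.
timings_sec = {1: 233, 2: 183, 4: 138}


def speedup_table(timings):
    """Return {threads: (speedup, efficiency)} relative to the 1-thread run."""
    baseline = timings[1]
    table = {}
    for threads, seconds in sorted(timings.items()):
        speedup = baseline / seconds
        # Efficiency here is simply speedup per thread (an assumed definition).
        table[threads] = (speedup, speedup / threads)
    return table


for threads, (speedup, efficiency) in speedup_table(timings_sec).items():
    print(f"{threads} thread(s): speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

On these numbers, four threads give roughly a 1.7x speedup over the single-threaded run.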
Single thread (Solr as-is):
       disk0                 disk2             cpu       load average
    KB/t  tps   MB/s      KB/t  tps   MB/s   us sy id     1m   5m  15m
 1024.00    6   5.99      0.00    0   0.00   36  2 62   2.62 2.16 2.10
233756 millis
Jul 8, 2012 11:41:32 AM org.apache.solr.update.processor.LogUpdateProcessor
finish
INFO: [collection1] webapp=/solr path=/update
params={commit=true&Zupdate.chain=threads&Zbacking.chain=logrun&stream.contentType=text/csv;charset%3Dutf-8&separator=%09&escape=\&stream.file=/Users/mkhl/Downloads/book_edition.tsv}
{add=[/m/08s9170, /m/08s7myj, /m/08s7nfb, /m/08s912p, /m/08s7nqy, /m/08s7rkg,
/m/08s7vmn, /m/08s7yzd, /m/08s7zlw, /m/08s7zw3, ... (3401073 adds)],commit=} 0
233756
Two threads:
       disk0                 disk2             cpu       load average
    KB/t  tps   MB/s      KB/t  tps   MB/s   us sy id     1m   5m  15m
  104.09  128  13.01      0.00    0   0.00   46  6 48   4.53 2.94 2.30
183157 millis
Jul 8, 2012 11:25:58 AM org.apache.solr.update.processor.LogUpdateProcessor
finish
INFO: [collection1] webapp=/solr path=/update
params={backing.chain=logrun&commit=true&stream.contentType=text/csv;charset%3Dutf-8&separator=%09&escape=\&stream.file=/Users/mkhl/Downloads/book_edition.tsv&update.chain=threads}
{add=[/m/08s7myj, /m/08s7nfb, /m/08s912p, /m/08s7rkg, /m/08s7zlw, /m/08s8127,
/m/08s8wx0, /m/08s8_cg, /m/08s8cd2, /m/08s8wjv, ... (1658583 adds)]} 0 183157
Jul 8, 2012 11:25:58 AM org.apache.solr.update.processor.LogUpdateProcessor
finish
INFO: [collection1] {add=[/m/08s9170, /m/08s7nqy, /m/08s7vmn, /m/08s7yzd,
/m/08s7zw3, /m/08s82t3, /m/08s8dcy, /m/08s8hnz, /m/08s8j3x, /m/08s8mfs, ...
(1742490 adds)]} 0 183157
Four threads:
       disk0                 disk2             cpu       load average
    KB/t  tps   MB/s      KB/t  tps   MB/s   us sy id     1m   5m  15m
   91.19  134  11.91      0.00    0   0.00   93  5  2   5.29 3.13 2.51
138413 millis
Jul 8, 2012 11:53:27 AM org.apache.solr.update.processor.LogUpdateProcessor
finish
INFO: [collection1] webapp=/solr path=/update
params={backing.chain=logrun&commit=true&stream.contentType=text/csv;charset%3Dutf-8&separator=%09&escape=\&stream.file=/Users/mkhl/Downloads/book_edition.tsv&update.chain=threads}
{add=[/m/08s912p, /m/08s7yzd, /m/08s82t3, /m/08s8wjv, /m/08s8nx4, /m/08s8txn,
/m/08z05sg, /m/08z05jm, /m/08yzqg0, /m/08yzkh2, ... (949997 adds)]} 0 138413
Jul 8, 2012 11:53:27 AM org.apache.solr.update.processor.LogUpdateProcessor
finish
INFO: [collection1] {add=[/m/08s9170, /m/08s7nqy, /m/08s7vmn, /m/08s8127,
/m/08s8wx0, /m/08s8dcy, /m/08s8mfs, /m/08s8v_7, /m/08z06nt, /m/08z05c5, ...
(848935 adds)]} 0 138413
Jul 8, 2012 11:53:32 AM org.apache.solr.update.processor.LogUpdateProcessor
finish
INFO: [collection1] {add=[/m/08s7nfb, /m/08s7zw3, /m/08s8_cg, /m/08s8j3x,
/m/08s8szc, /m/08z09lt, /m/08yzf7_, /m/08yz2b1, /m/08yz24n, /m/08yyz1r, ...
(777467 adds)]} 0 138413
Jul 8, 2012 11:53:32 AM org.apache.solr.update.processor.LogUpdateProcessor
finish
INFO: [collection1] {add=[/m/08s7myj, /m/08s7rkg, /m/08s7zlw, /m/08s8cd2,
/m/08s8hnz, /m/08s8mrk, /m/08s8v92, /m/08s8tz0, /m/08z097y, /m/08z047g, ...
(824674 adds)]} 0 138413
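As a cross-check on the log lines above, this sketch sums the per-thread add counts and derives throughput for each run. The elapsed times and add counts are copied from the logs; the `runs` layout, the `summarize` helper, and the imbalance measure (largest share divided by an even share) are illustrative assumptions.

```python
TOTAL_ADDS = 3401073  # from the single-thread log line

# Elapsed millis and per-thread add counts, copied from the logs above.
runs = {
    1: {"millis": 233756, "adds": [3401073]},
    2: {"millis": 183157, "adds": [1658583, 1742490]},
    4: {"millis": 138413, "adds": [949997, 848935, 777467, 824674]},
}


def summarize(run):
    """Return (total adds, docs per second, imbalance) for one run."""
    total = sum(run["adds"])
    docs_per_sec = total / (run["millis"] / 1000.0)
    # Imbalance: busiest thread's share relative to a perfectly even split.
    imbalance = max(run["adds"]) / (total / len(run["adds"]))
    return total, docs_per_sec, imbalance


for threads, run in sorted(runs.items()):
    total, docs_per_sec, imbalance = summarize(run)
    print(f"{threads} thread(s): {total} adds, "
          f"{docs_per_sec:.0f} docs/s, imbalance {imbalance:.2f}")
```

The per-thread counts in both multi-threaded runs add up to 3401073, matching the single-thread log.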
URL:
http://localhost:8983/solr/update?commit=true&separator=%09&escape=\&update.chain=threads&backing.chain=logrun&stream.file=/Users/mkhl/Downloads/book_edition.tsv&stream.contentType=text/csv;charset=utf-8
FYI, the heap was 0.5 GB:
$ java -Xmx512M -Xms512M -jar start.jar
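For anyone reproducing the run, the update URL above can be assembled programmatically so that the tab separator and backslash escape are percent-encoded correctly. This is a sketch using Python's standard library only; the parameter values are taken from the URL above, and the code only builds and prints the URL rather than sending a request.

```python
from urllib.parse import urlencode

# Parameters as they appear in the URL above.
params = {
    "commit": "true",
    "separator": "\t",   # tab-separated input, encoded as %09
    "escape": "\\",      # a single backslash, encoded as %5C
    "update.chain": "threads",
    "backing.chain": "logrun",
    "stream.file": "/Users/mkhl/Downloads/book_edition.tsv",
    "stream.contentType": "text/csv;charset=utf-8",
}

url = "http://localhost:8983/solr/update?" + urlencode(params)
print(url)
```

Note that `urlencode` also percent-encodes `/`, `;` and `=` inside values, so the printed URL looks different from the hand-written one but decodes to the same parameters.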
was (Author: mkhludnev):
Dmitry,
It's a very good question. Unfortunately, we can choose only two of free, fast,
and reliable. Contributing real-life code written for a customer requires
enormous legal effort. I wrote that one from scratch.
> processing updates in multiple threads
> --------------------------------------
>
> Key: SOLR-3585
> URL: https://issues.apache.org/jira/browse/SOLR-3585
> Project: Solr
> Issue Type: Improvement
> Components: update
> Affects Versions: 4.0
> Reporter: Mikhail Khludnev
> Priority: Minor
> Attachments: SOLR-3585.patch, multithreadupd.patch, report.tar.gz
>
>
> Hello,
> I'd like to contribute an update processor that forks multiple threads to
> concurrently process the stream of commands. It may be beneficial for users
> who stream many docs through a single request.
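
The idea in the issue description, one request stream fanned out to several worker threads, can be sketched generically as below. This is not the code from the attached patches and does not use any Solr API; the function name, the bounded queue, and the sentinel-based shutdown are all illustrative assumptions. In CPython the GIL limits CPU-bound parallelism, so the sketch shows the structure only, not the performance.

```python
import queue
import threading


def process_concurrently(commands, handle, num_threads=4, queue_size=1000):
    """Feed commands from one stream to worker threads.

    Returns (per-worker counts, list of exceptions raised by handle).
    """
    pending = queue.Queue(maxsize=queue_size)  # bounded: reader cannot outrun workers
    counts = [0] * num_threads
    errors = []
    sentinel = object()  # tells a worker that the stream has ended

    def worker(index):
        while True:
            command = pending.get()
            if command is sentinel:
                break
            try:
                handle(command)
                counts[index] += 1
            except Exception as exc:  # keep draining so the reader never blocks forever
                errors.append(exc)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for command in commands:  # the single reader of the incoming stream
        pending.put(command)
    for _ in threads:
        pending.put(sentinel)
    for t in threads:
        t.join()
    return counts, errors


# Usage: 1000 hypothetical "documents" handled by four workers.
seen = []
seen_lock = threading.Lock()


def record(doc):
    with seen_lock:
        seen.append(doc)


counts, errors = process_concurrently(range(1000), record, num_threads=4)
print(counts, len(errors))
```

Each worker reports its own count, much like the per-thread "adds" totals in the logs of the comment above.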