[ https://issues.apache.org/jira/browse/SOLR-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408872#comment-13408872 ]
Mikhail Khludnev edited comment on SOLR-3585 at 7/8/12 7:18 PM:
----------------------------------------------------------------

Dmitry,

I took a 3M-row TSV from http://www.freebase.com/view/book/book_edition and slightly updated the Solr 4.0 example config to allow concurrency (see the patch in report.tar.gz). The iostat outputs in report.tar.gz show the rate of utilization.

Summary: on a MacBook Pro (Core i5), 233/183/138 sec for 1/2/4 threads. 3M records; the index size is slightly less than 1 GB.

single thread (solr as-is)

    KB/t tps  MB/s     KB/t tps  MB/s  us sy id   1m   5m   15m
 1024.00   6  5.99     0.00   0  0.00  36  2 62  2.62 2.16 2.10

233756 millis

Jul 8, 2012 11:41:32 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [collection1] webapp=/solr path=/update params={commit=true&Zupdate.chain=threads&Zbacking.chain=logrun&stream.contentType=text/csv;charset%3Dutf-8&separator=%09&escape=\&stream.file=/Users/mkhl/Downloads/book_edition.tsv} {add=[/m/08s9170, /m/08s7myj, /m/08s7nfb, /m/08s912p, /m/08s7nqy, /m/08s7rkg, /m/08s7vmn, /m/08s7yzd, /m/08s7zlw, /m/08s7zw3, ... (3401073 adds)],commit=} 0 233756

two threads:

          disk0           disk2       cpu     load average
    KB/t tps  MB/s     KB/t tps  MB/s  us sy id   1m   5m   15m
  104.09 128 13.01     0.00   0  0.00  46  6 48  4.53 2.94 2.30

183157 millis

Jul 8, 2012 11:25:58 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [collection1] webapp=/solr path=/update params={backing.chain=logrun&commit=true&stream.contentType=text/csv;charset%3Dutf-8&separator=%09&escape=\&stream.file=/Users/mkhl/Downloads/book_edition.tsv&update.chain=threads} {add=[/m/08s7myj, /m/08s7nfb, /m/08s912p, /m/08s7rkg, /m/08s7zlw, /m/08s8127, /m/08s8wx0, /m/08s8_cg, /m/08s8cd2, /m/08s8wjv, ... (1658583 adds)]} 0 183157
Jul 8, 2012 11:25:58 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [collection1] {add=[/m/08s9170, /m/08s7nqy, /m/08s7vmn, /m/08s7yzd, /m/08s7zw3, /m/08s82t3, /m/08s8dcy, /m/08s8hnz, /m/08s8j3x, /m/08s8mfs, ... (1742490 adds)]} 0 183157

four threads

          disk0           disk2       cpu     load average
    KB/t tps  MB/s     KB/t tps  MB/s  us sy id   1m   5m   15m
   91.19 134 11.91     0.00   0  0.00  93  5  2  5.29 3.13 2.51

138413 millis

Jul 8, 2012 11:53:27 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [collection1] webapp=/solr path=/update params={backing.chain=logrun&commit=true&stream.contentType=text/csv;charset%3Dutf-8&separator=%09&escape=\&stream.file=/Users/mkhl/Downloads/book_edition.tsv&update.chain=threads} {add=[/m/08s912p, /m/08s7yzd, /m/08s82t3, /m/08s8wjv, /m/08s8nx4, /m/08s8txn, /m/08z05sg, /m/08z05jm, /m/08yzqg0, /m/08yzkh2, ... (949997 adds)]} 0 138413
Jul 8, 2012 11:53:27 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [collection1] {add=[/m/08s9170, /m/08s7nqy, /m/08s7vmn, /m/08s8127, /m/08s8wx0, /m/08s8dcy, /m/08s8mfs, /m/08s8v_7, /m/08z06nt, /m/08z05c5, ... (848935 adds)]} 0 138413
Jul 8, 2012 11:53:32 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [collection1] {add=[/m/08s7nfb, /m/08s7zw3, /m/08s8_cg, /m/08s8j3x, /m/08s8szc, /m/08z09lt, /m/08yzf7_, /m/08yz2b1, /m/08yz24n, /m/08yyz1r, ... (777467 adds)]} 0 138413
Jul 8, 2012 11:53:32 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [collection1] {add=[/m/08s7myj, /m/08s7rkg, /m/08s7zlw, /m/08s8cd2, /m/08s8hnz, /m/08s8mrk, /m/08s8v92, /m/08s8tz0, /m/08z097y, /m/08z047g, ... (824674 adds)]} 0 138413

url

http://localhost:8983/solr/update?commit=true&separator=%09&escape=\&update.chain=threads&backing.chain=logrun&stream.file=/Users/mkhl/Downloads/book_edition.tsv&stream.contentType=text/csv;charset=utf-8

FYI 0.5G heap

$ java -Xmx512M -Xms512M -jar start.jar

was (Author: mkhludnev):
Dmitry, it's a very good question. Unfortunately, we can choose only two of free, fast, and reliable. Contributing real-life code written for a customer requires enormous legal effort, so I wrote this one from scratch. Ok.

> processing updates in multiple threads
> --------------------------------------
>
>                 Key: SOLR-3585
>                 URL: https://issues.apache.org/jira/browse/SOLR-3585
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>    Affects Versions: 4.0
>            Reporter: Mikhail Khludnev
>            Priority: Minor
>         Attachments: SOLR-3585.patch, multithreadupd.patch, report.tar.gz
>
>
> Hello,
> I'd like to contribute an update processor that forks many threads which
> concurrently process the stream of commands. It may be beneficial for users
> who stream many docs through a single request.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
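
For anyone who wants to put the 233/183/138 sec summary into perspective, the arithmetic can be sketched as below. This is only an illustration: the totals and timings are copied from the LogUpdateProcessor lines in the comment, while the names `TOTAL_ADDS`, `ELAPSED_MS` and `summarize` are made up here and are not part of Solr or of the attached patch.

```python
# Figures quoted from the LogUpdateProcessor output in the comment.
TOTAL_ADDS = 3401073                              # documents added per run
ELAPSED_MS = {1: 233756, 2: 183157, 4: 138413}    # threads -> elapsed millis


def summarize(total_adds, elapsed_ms):
    """Return (threads, docs_per_sec, speedup, efficiency) rows.

    speedup is relative to the single-thread run; efficiency is
    speedup divided by the number of threads.
    """
    baseline = elapsed_ms[1]
    rows = []
    for threads in sorted(elapsed_ms):
        millis = elapsed_ms[threads]
        docs_per_sec = total_adds / (millis / 1000.0)
        speedup = baseline / millis
        efficiency = speedup / threads
        rows.append((threads, docs_per_sec, speedup, efficiency))
    return rows


if __name__ == "__main__":
    for threads, rate, speedup, eff in summarize(TOTAL_ADDS, ELAPSED_MS):
        print(f"{threads} thread(s): {rate:8.0f} docs/s, "
              f"speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

With these inputs the speedup comes out at roughly 1.28x for two threads and 1.69x for four, so the gain is real but well short of linear on this machine.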