Re: Solr Near Real-Time Search, Soft Commit Problem
I guess my first question is what evidence you have that Solr is unable to index fast enough. It's quite possible that your database connection is the thing that can't keep up. That's certainly a guess, but unless your documents are quite complex, 15 records/second isn't likely to cause Solr problems. You might try running a small Java program that executes your database queries and see.

The other question I'd ask is whether you're absolutely sure that your delta-import query is correct. Is it possible that you're re-indexing *everything* every time? There's an interactive debugging console that may help; try:
http://localhost:8983/solr/admin/dataimport.jsp

Best
Erick

On Thu, Nov 17, 2011 at 3:19 AM, Jak Akdemir jakde...@gmail.com wrote:
> Hi,
>
> I was trying to configure a Solr instance with near real-time search and auto-complete capabilities. I got stuck on the NRT feature. There are 15 new records per second inserted into the database (MySQL), and I index them with DIH. First, I tried to manage the autoCommits from solrconfig.xml with the configuration below.
>
>   <autoCommit>
>     <maxDocs>1</maxDocs>
>     <maxTime>10</maxTime>
>   </autoCommit>
>   <autoSoftCommit>
>     <maxDocs>15</maxDocs>
>     <maxTime>1000</maxTime>
>   </autoSoftCommit>
>
> The bash script below is responsible for fetching the deltas without committing.
>
>   while [ 1 ]; do
>     wget -O /dev/null 'http://localhost:8080/solr-jak/dataimport?command=delta-import&commit=false' 2>/dev/null
>     sleep 1
>   done
>
> Then I run my query from the browser:
> http://localhost:8080/solr-jak/select?q=movie_name_prefix_full:dogville&defType=lucene&q.op=OR
> <http://localhost:8080/solr-sprongo/select?q=movie_name_prefix_full:%221398%22&defType=lucene&q.op=OR>
>
> But I realized that with this configuration the index files change every second, and after a minute there are only 600 new records in the Solr index while there are 900 new records in the database.
>
> After seeing that, I removed the autoCommit and autoSoftCommit elements from solrconfig.xml and updated my bash script as follows. But the index files are still changing, and Solr cannot stay synchronized with the database.
>
>   while [ 1 ]; do
>     echo 'Soft commit applied!'
>     wget -O /dev/null 'http://localhost:8080/solr-jak/dataimport?command=delta-import&commit=false' 2>/dev/null
>     curl http://localhost:8080/solr-jak/update -H "Content-Type: text/xml" --data-binary '<commit softCommit="true" waitFlush="false" waitSearcher="false"/>' 2>/dev/null
>     sleep 3
>   done
>
> Even when I reduced the load on Solr to 1 new record per second, with soft commits within 6 seconds, there is still a gap between the index and the database. Is there anything that I missed? I took a look at /get too, but it works only for the primary key. If there is an example configuration (like 1 sec for soft commit and 10 min for hard commit) as a best practice, it would be great.
>
> Finally, here is my configuration.
> Ubuntu 11.04
> JDK 1.6.0_27
> Tomcat 7.0.21
> Solr 4.0 2011-10-24_08-53-02
>
> All advice is appreciated,
> Best Regards,
> Jak
Re: Solr Near Real-Time Search, Soft Commit Problem
Erick,

Thank you for your response.

1) I also tried 2 new records per second (the records have only 5 fields, in one table) at a 6-second interval. That should be quite easy for MySQL. But I will check query responses per second as you suggested.

2) I am sure the delta queries are configured correctly. Full-import completes in 40 secs for 40 docs, and deltas take 1 sec for 15 new records. I also checked this; there is no problem there.

A couple of observations that led me to think this is a configuration problem:
1- Index files are changing every second.
2- After a server restart, the last query results are preserved. (With NRT they would disappear, right?)

Please correct me if you see any problem in the steps I applied for NRT.

Additional specs:
32-bit OS
4-core i7-2630QM CPU @ 2.00GHz
6 GB memory

Bests,
Jak

On Thu, Nov 17, 2011 at 10:44 AM, Erick Erickson erickerick...@gmail.com wrote:
> I guess my first question is what evidence you have that Solr is unable to index fast enough? [...]
Re: Solr Near Real-Time Search, Soft Commit Problem
On Thu, Nov 17, 2011 at 11:48 AM, Jak Akdemir jakde...@gmail.com wrote:
> 2) I am sure about delta-queries configured well. Full-Import is completed in 40 secs for 40 docs. And delta's are in 1 sec for 15 new records. Also I checked it. There is no problem in it.

That's 10,000 docs/sec. If you configure a soft commit for every 15 documents, that means Solr is trying to do 666 commits/sec.

Autocommit by number of docs rarely makes sense anymore - I'd suggest configuring both soft and hard commits based on time only.

-Yonik
http://www.lucidimagination.com
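The arithmetic behind those two figures can be reproduced with a few lines of shell. This is only a sketch: it assumes the full import covered roughly 400,000 documents in 40 seconds, which is the size implied by the 10,000 docs/sec figure (the quoted text itself reads "40 docs"), and the helper name `rate_per_sec` is made up for illustration.

```shell
#!/bin/sh
# rate_per_sec COUNT DIVISOR: integer quotient, rounded down.
# (Hypothetical helper, not part of Solr or DIH.)
rate_per_sec() {
    echo $(( $1 / $2 ))
}

# Assumption: about 400000 docs indexed in 40 seconds.
docs_per_sec=$(rate_per_sec 400000 40)

# One soft commit per 15 documents (autoSoftCommit maxDocs=15)
# at that indexing rate.
commits_per_sec=$(rate_per_sec "$docs_per_sec" 15)

echo "docs/sec: $docs_per_sec, soft commits/sec: $commits_per_sec"
```

With a time-based trigger the commit rate no longer depends on indexing throughput, which is the point of the suggestion to drop maxDocs.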
Re: Solr Near Real-Time Search, Soft Commit Problem
Yonik,

I updated my solrconfig to be time-based only, as follows.

  <autoCommit>
    <maxTime>30</maxTime>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>

And I changed my soft commit script back to the first case.

  while [ 1 ]; do
    echo 'Soft commit applied!'
    wget -O /dev/null 'http://localhost:8080/solr-jak/dataimport?command=delta-import&commit=false' 2>/dev/null
    sleep 1
  done

After the full-import, I inserted 420 new records in a minute (7 new records per second), with a soft commit every second as configured in solrconfig.xml. In the end, Solr returns only 326 of these 420 new records. Index files should not change every second, is that right?

(After inserting the 420 records, if I call delta-import with commit=true, all of these records can be seen in the Solr results.)

Thanks,
Jak

On Thu, Nov 17, 2011 at 12:14 PM, Yonik Seeley yo...@lucidimagination.com wrote:
> Autocommit by number of docs rarely makes sense anymore - I'd suggest configuring both soft and hard commits based on time only.
Re: Solr Near Real-Time Search, Soft Commit Problem
Hmmm. It is suspicious that your index files change every second. If you change your cron task to update every 10 seconds, do the index files change every 10 seconds?

Regarding your question about "After a server restart last query results reserved. (In NRT they would disappear, right?)": not necessarily. If your autoCommit interval is exceeded, the soft commits will be committed to disk, so Solr would pick them up after a restart.

But if somehow you're getting a hard commit every second, you should also be seeing a lot of segment merging going on. Are you?

I think I'd stop the cron job and execute this manually for a while in order to see exactly where the problem is. I'd go ahead and comment out the autoCommit section as well. That should give you a much more reproducible test scenario.

Say you do that, issue your delta-import and immediately kill your server. If you then see the delta data when it starts up, we need to understand why, because it sure would seem like commit=false isn't doing what you expect.

Erick

On Thu, Nov 17, 2011 at 12:41 PM, Jak Akdemir jakde...@gmail.com wrote:
> After full-import, I inserted 420 new records in a minute. (7 new records per second) And softCommitted every second as we can see in solrconfig.xml. It seems that after all solr can return only 326 of these new 420 records. Index files should not change every second, is it true? [...]
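A manual version of that test might look like the sketch below. The base URL and the delta-import parameters are the ones used earlier in the thread; the function names are hypothetical, and nothing here contacts Solr until `run_delta_import` is called by hand.

```shell
#!/bin/sh
# Base URL as used in the thread; override SOLR_BASE for another deployment.
SOLR_BASE="${SOLR_BASE:-http://localhost:8080/solr-jak}"

# Build the delta-import URL with commit=false.
delta_import_url() {
    echo "$1/dataimport?command=delta-import&commit=false"
}

# Trigger a single delta import by hand (no cron job, no loop).
run_delta_import() {
    curl -s -o /dev/null "$(delta_import_url "$SOLR_BASE")"
}

# Suggested manual sequence (not executed automatically):
#   1. comment out <autoCommit> in solrconfig.xml and restart the server
#   2. run_delta_import
#   3. kill the server immediately, then start it again
#   4. query for the newly added documents and note whether they survived
```

Running each step by hand makes it easier to tell which action causes the index files to change.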
Re: Solr Near Real-Time Search, Soft Commit Problem
1- There is an improvement on the issue. I added a 10-second overlap to the delta query in data-config.xml, so that it also covers records that were already indexed.

  revision_time &gt; DATE_SUB('${dataimporter.last_index_time}', INTERVAL 10 SECOND);

In this case 1369 new records were inserted at a rate of 7 records per second, and the Solr response shows all 1369 new records.

2- If I set the bash script to sleep 10 seconds and autoSoftCommit to 1 sec, the index files are updated every 10 seconds. If I set autoSoftCommit to 10 seconds and the bash script to sleep 10 sec, the index files are updated every 10 seconds. In the index folder, I see that the segments/index files change after each update.

3- I restarted the server before the autoCommit interval elapsed. The deltas are still in the result list. Here is my solrconfig.

  <autoCommit>
    <maxTime>30</maxTime>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>

4- I commented out the autoCommit part. The index files are still changing.

  <!--
  <autoCommit>
    <maxTime>30</maxTime>
  </autoCommit>
  -->
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>

I did not modify the request part in any of these cases.

  wget -O /dev/null 'http://localhost:8080/solr-jak/dataimport?command=delta-import&commit=false' 2>/dev/null
  #curl http://localhost:8080/solr-jak/update -H "Content-Type: text/xml" --data-binary '<commit softCommit="true" waitFlush="false" waitSearcher="false"/>' 2>/dev/null

Erick, as you mentioned, I believe that commit=false is not working properly. If you need any more information, I can provide it.

Thank you all for your quick responses and advice.

Bests,
Jak

On Thu, Nov 17, 2011 at 1:34 PM, Erick Erickson erickerick...@gmail.com wrote:
> Hmmm. It is suspicious that your index files change every second. If you change our cron task to update every 10 seconds, do the index files change every 10 seconds? [...]
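The effect of the 10-second overlap in the delta query can be illustrated with plain integer timestamps. This is a sketch under stated assumptions: the epoch values are invented, `in_delta_window` is a hypothetical helper that mirrors the SQL comparison, and the idea that a row can carry a revision_time slightly earlier than the recorded last_index_time is one plausible explanation for the missing records, not something the thread confirms.

```shell
#!/bin/sh
# in_delta_window REVISION_TIME LAST_INDEX_TIME OVERLAP
# All arguments are epoch seconds (stand-ins for the SQL timestamps).
# Succeeds when the row would be selected by
#   revision_time > last_index_time - overlap
in_delta_window() {
    [ "$1" -gt $(( $2 - $3 )) ]
}

last_index_time=1000   # made-up example value
row_time=995           # row stamped 5 s before the recorded import time

if in_delta_window "$row_time" "$last_index_time" 0; then
    echo "no overlap: row selected"
else
    echo "no overlap: row skipped"
fi

if in_delta_window "$row_time" "$last_index_time" 10; then
    echo "10 s overlap: row selected"
else
    echo "10 s overlap: row skipped"
fi
```

The overlap means some rows are selected more than once. That should be harmless as long as the schema defines a uniqueKey, since re-adding a document with the same key replaces the earlier copy.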
Re: Solr Near Real-Time Search, Soft Commit Problem
On Thu, Nov 17, 2011 at 1:34 PM, Erick Erickson erickerick...@gmail.com wrote:
> Hmmm. It is suspicious that your index files change every second.

Why is this suspicious? A soft commit still writes out some files currently... it just doesn't fsync them.

-Yonik
http://www.lucidimagination.com
Re: Solr Near Real-Time Search, Soft Commit Problem
Yonik,

Is it OK to see soft-committed records after a server restart, too? If it is, there is no problem left at all. I have added the changing files and one second of log output at the end of this e-mail.

One significant line says softCommit=true, so Solr recognizes our soft commit request.

INFO: start commit(optimize=false,waitSearcher=true,expungeDeletes=false,softCommit=true)

I also want to fix a small typo from my last e-mail.
"... autosoftcommit to 10 seconds and bashscript to sleep 10 sec, index files are ..."
should be
"... autosoftcommit to 10 seconds and bashscript to sleep *1 sec*, index files are ..."

Jak

___

jak@jak:/usr/java/solr4/data$ ls index/
_11_0.frq _11.nrm _14.fdx _16_0.tip _1b_0.prx _1b.per _1c.fnm _1d.fdt _1e_0.tim _1f.fdt _l.fdt _r_0.tim segments_2 _t.fdx
_11_0.prx _11.per _14.fnm _16.fdt _1b_0.tim _1c_0.frq _1c.nrm _1d.fdx _1e_0.tip _1f.fdx _l.fdx _r_0.tip segments.gen _t.fnm
_11_0.tim _14_0.frq _14.nrm _16.fdx _1b_0.tip _1c_0.prx _1c.per _1d.fnm _1e.fdt _1.fnx _l.fnm _r.fdt _t_0.frq _t.nrm
_11_0.tip _14_0.prx _14.per _16.fnm _1b.fdt _1c_0.tim _1d_0.frq _1d.nrm _1e.fdx _l_0.frq _l.nrm _r.fdx _t_0.prx _t.per
_11.fdt _14_0.tim _16_0.frq _16.nrm _1b.fdx _1c_0.tip _1d_0.prx _1d.per _1e.fnm _l_0.prx _l.per _r.fnm _t_0.tim write.lock
_11.fdx _14_0.tip _16_0.prx _16.per _1b.fnm _1c.fdt _1d_0.tim _1e_0.frq _1e.nrm _l_0.tim _r_0.frq _r.nrm _t_0.tip
_11.fnm _14.fdt _16_0.tim _1b_0.frq _1b.nrm _1c.fdx _1d_0.tip _1e_0.prx _1e.per _l_0.tip _r_0.prx _r.per _t.fdt

jak@jak:/usr/java/solr4/data$ ls index/
_11_0.frq _11.nrm _14.fdx _16_0.tip _1b_0.prx _1b.per _1c.fnm _1d.fdt _1e_0.tim _1f_0.frq _1f.nrm _l.fdt _r_0.tim segments_2 _t.fdx
_11_0.prx _11.per _14.fnm _16.fdt _1b_0.tim _1c_0.frq _1c.nrm _1d.fdx _1e_0.tip _1f_0.prx _1.fnx _l.fdx _r_0.tip segments.gen _t.fnm
_11_0.tim _14_0.frq _14.nrm _16.fdx _1b_0.tip _1c_0.prx _1c.per _1d.fnm _1e.fdt _1f_0.tim _1f.per _l.fnm _r.fdt _t_0.frq _t.nrm
_11_0.tip _14_0.prx _14.per _16.fnm _1b.fdt _1c_0.tim _1d_0.frq _1d.nrm _1e.fdx _1f_0.tip _l_0.frq _l.nrm _r.fdx _t_0.prx _t.per
_11.fdt _14_0.tim _16_0.frq _16.nrm _1b.fdx _1c_0.tip _1d_0.prx _1d.per _1e.fnm _1f.fdt _l_0.prx _l.per _r.fnm _t_0.tim write.lock
_11.fdx _14_0.tip _16_0.prx _16.per _1b.fnm _1c.fdt _1d_0.tim _1e_0.frq _1e.nrm _1f.fdx _l_0.tim _r_0.frq _r.nrm _t_0.tip
_11.fnm _14.fdt _16_0.tim _1b_0.frq _1b.nrm _1c.fdx _1d_0.tip _1e_0.prx _1e.per _1f.fnm _l_0.tip _r_0.prx _r.per _t.fdt

___

Nov 17, 2011 2:55:17 PM org.apache.solr.handler.dataimport.DataImporter doDeltaImport
INFO: Starting Delta Import
Nov 17, 2011 2:55:17 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr-jak path=/dataimport params={commit=false&command=delta-import} status=0 QTime=0
Nov 17, 2011 2:55:17 PM org.apache.solr.handler.dataimport.SimplePropertiesWriter readIndexerProperties
INFO: Read dataimport.properties
Nov 17, 2011 2:55:17 PM org.apache.solr.handler.dataimport.DocBuilder doDelta
INFO: Starting delta collection.
Nov 17, 2011 2:55:17 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Running ModifiedRowKey() for Entity: movie
Nov 17, 2011 2:55:17 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity movie with URL: jdbc:mysql://localhost/imdb
Nov 17, 2011 2:55:17 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 8
Nov 17, 2011 2:55:17 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed ModifiedRowKey for Entity: movie rows obtained : 147
Nov 17, 2011 2:55:17 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitSearcher=true,expungeDeletes=false,softCommit=true)
Nov 17, 2011 2:55:17 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed DeletedRowKey for Entity: movie rows obtained : 0
Nov 17, 2011 2:55:17 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed parentDeltaQuery for Entity: movie
Nov 17, 2011 2:55:17 PM org.apache.solr.search.SolrIndexSearcher <init>
INFO: Opening Searcher@1520a8e main
Nov 17, 2011 2:55:17 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Nov 17, 2011 2:55:17 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming Searcher@1520a8e main{DirectoryReader(segments_2:1321559475026:nrt _k(4.0):C388607 _50(4.0):C526/132 _3q(4.0):C444/141 _43(4.0):C450/126 _4r(4.0):C470/125 _4e(4.0):C456/135 _3f(4.0):C428/133 _51(4.0):C132/126
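To check a longer log for the same evidence, the commit lines can be classified by their softCommit flag. The sketch below assumes the "start commit(...)" format shown in the log above; `commit_kind` is a made-up helper name, and treating every commit line without softCommit=true as a hard commit is a simplification.

```shell
#!/bin/sh
# commit_kind LOG_LINE: print "soft", "hard", or "none".
# Assumes the DirectUpdateHandler2 "start commit(...)" format from the log
# in this thread; other Solr versions may log commits differently.
commit_kind() {
    case "$1" in
        *"start commit("*softCommit=true*) echo soft ;;
        *"start commit("*) echo hard ;;
        *) echo none ;;
    esac
}

# Sample line copied from the log above.
sample='INFO: start commit(optimize=false,waitSearcher=true,expungeDeletes=false,softCommit=true)'
echo "sample line is a $(commit_kind "$sample") commit"
```

Feeding each log line through the helper and counting the "hard" results per second would show whether hard commits are actually happening at the rate the changing index files suggest.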
Re: Solr Near Real-Time Search, Soft Commit Problem
On Thu, Nov 17, 2011 at 3:56 PM, Jak Akdemir jakde...@gmail.com wrote:
> Is it ok to see soft committed records after server restart, too?

Yes... we currently have Jetty configured to call some cleanups on exit (such as closing the index writer).

-Yonik
http://www.lucidimagination.com
Re: Solr Near Real-Time Search, Soft Commit Problem
This is great! I guess there is nothing left to worry about for a while.

Erick and Yonik, thank you again for your great responses.

Bests,
Jak

On Thu, Nov 17, 2011 at 4:01 PM, Yonik Seeley yo...@lucidimagination.com wrote:
> Yes... we currently have Jetty configured to call some cleanups on exit (such as closing the index writer).