Re: Solr Near Real-Time Search, Soft Commit Problem

2011-11-17 Thread Erick Erickson
I guess my first question is what evidence you have that Solr is
unable to index fast enough? It's quite possible that your
database connection is the thing that's unable to process fast
enough.

That's certainly a guess, but unless your documents are
quite complex, 15 records/second isn't likely to cause Solr
problems. You might try to run a small Java program that
executes your database queries and see.
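
A rough shell alternative to the small Java program is to time the query from the command line. This is only a sketch: `elapsed_seconds` is a hypothetical helper, and the user name in the usage comment is a placeholder (the database, table, and column names are taken from later in this thread).

```shell
#!/bin/sh
# Print how many whole seconds a command takes to run.
# The command's own output is discarded; only the elapsed time is printed.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Hypothetical usage (the user name is a placeholder):
#   elapsed_seconds mysql -u solr imdb -e \
#     "SELECT id FROM movie WHERE revision_time > '2011-11-17 00:00:00'"
```

If the query alone takes close to the polling interval, the database side is the more likely bottleneck.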

The other question I'd ask is if you're absolutely sure that
your delta-import query is correct? Is it possible that you're
re-indexing *everything* every time? There's an interactive
debugging console you can use that may help, try:
http://localhost:8983/solr/admin/dataimport.jsp

Best
Erick

On Thu, Nov 17, 2011 at 3:19 AM, Jak Akdemir jakde...@gmail.com wrote:
 Hi,

 I was trying to configure a Solr instance with near real-time search
 and auto-complete capabilities. I got stuck on the NRT feature. There are
 15 new records per second being inserted into the database (MySQL), and I
 index them with DIH. First, I tried to manage autoCommits from
 solrconfig.xml with the configuration below.

 <autoCommit>
   <maxDocs>1</maxDocs>
   <maxTime>10</maxTime>
 </autoCommit>

 <autoSoftCommit>
   <maxDocs>15</maxDocs>
   <maxTime>1000</maxTime>
 </autoSoftCommit>

 And the bash script below is responsible for fetching the deltas without
 committing.

 while [ 1 ]; do
   wget -O /dev/null \
     'http://localhost:8080/solr-jak/dataimport?command=delta-import&commit=false' \
     2>/dev/null
   sleep 1
 done

 Then I run my query from the browser:
 http://localhost:8080/solr-jak/select?q=movie_name_prefix_full:dogville&defType=lucene&q.op=OR
 <http://localhost:8080/solr-sprongo/select?q=movie_name_prefix_full:%221398%22&defType=lucene&q.op=OR>

 But I realized that with this configuration the index files change every
 second, and after a minute there are only 600 new records in the Solr index
 versus 900 new records in the database.
 After seeing that, I removed the autoCommit and autoSoftCommit elements from
 solrconfig.xml and updated my bash script as follows. But the index files
 still change, and Solr cannot stay synchronized with the database.

 while [ 1 ]; do
   echo Soft commit applied!
   wget -O /dev/null \
     'http://localhost:8080/solr-jak/dataimport?command=delta-import&commit=false' \
     2>/dev/null
   curl http://localhost:8080/solr-jak/update -H 'Content-Type: text/xml' \
     --data-binary '<commit softCommit="true" waitFlush="false" waitSearcher="false"/>' \
     2>/dev/null
   sleep 3
 done

 Even when I decreased the load on Solr to 1 new record per second, with soft
 commits every 6 seconds, there is still a gap between the index and the db.
 Is there anything that I missed? I took a look at /get too, but it works only
 for the pk. If there is an example configuration (like 1 sec for soft
 commit and 10 min for hard commit) as a best practice, it would be great.

 Finally, here is my configuration.
 Ubuntu 11.04
 JDK 1.6.0_27
 Tomcat 7.0.21
 Solr 4.0 2011-10-24_08-53-02

 All advice is appreciated,

 Best Regards,

 Jak



Re: Solr Near Real-Time Search, Soft Commit Problem

2011-11-17 Thread Jak Akdemir
Erick,

Thank you for your response,

1) I also tried 2 new records per second (records have only 5 fields in one
table) at a 6-second interval. It should be quite easy for MySQL. But I
will check query responses per second as you suggested.

2) I am sure the delta queries are configured well. Full-Import is completed
in 40 secs for 40 docs. And deltas take 1 sec for 15 new records.
I also checked it; there is no problem in it.

A couple of pieces of evidence that led me to think this is a configuration
problem:
1- Index files are changing every second.
2- After a server restart, the last query results are preserved. (In NRT they
would disappear, right?)

Please correct me if you see any problem in the steps I applied for NRT.

Additional specs,
32 bit OS
4 core i7-2630QM CPU @ 2.00GHz
6 GB memory

Bests,

Jak




Re: Solr Near Real-Time Search, Soft Commit Problem

2011-11-17 Thread Yonik Seeley
On Thu, Nov 17, 2011 at 11:48 AM, Jak Akdemir jakde...@gmail.com wrote:
 2) I am sure about delta-queries configured well. Full-Import is completed
 in 40 secs for 40 docs. And delta's are in 1 sec for 15 new records.
 Also I checked it. There is no problem in it.

That's 10,000 docs/sec.  If you configure a soft commit for every 15
documents, that means solr is trying to do 666 commits/sec.
Autocommit by number of docs rarely makes sense anymore - I'd suggest
configuring both soft and hard commits based on time only.
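
The commit rate quoted above follows from simple division. A minimal sketch of that arithmetic, where `commits_per_sec` is a hypothetical helper name and the inputs are the two figures from this message:

```shell
# Integer estimate of commits per second when a commit fires every N docs.
commits_per_sec() {
    docs_per_sec=$1
    docs_per_commit=$2
    echo $((docs_per_sec / docs_per_commit))
}

commits_per_sec 10000 15   # prints 666
```

With a time-based trigger instead, the commit rate is bounded by the interval regardless of how fast documents arrive.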

-Yonik
http://www.lucidimagination.com


Re: Solr Near Real-Time Search, Soft Commit Problem

2011-11-17 Thread Jak Akdemir
Yonik,

I updated my solrconfig to be time-based only, as follows.
<autoCommit>
  <maxTime>30</maxTime>
</autoCommit>

<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>

And I changed my soft commit script back to the first case.
while [ 1 ]; do
  echo Soft commit applied!
  wget -O /dev/null \
    'http://localhost:8080/solr-jak/dataimport?command=delta-import&commit=false' \
    2>/dev/null
  sleep 1
done

After the full-import, I inserted 420 new records in a minute (7 new records
per second) and soft-committed every second, as configured in solrconfig.xml.
After all that, Solr returns only 326 of these 420 new records.
Index files should not change every second, is that right? (After inserting
the 420 records, if I call delta-import with commit=true, all of these records
can be seen in the Solr results.)
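
One way to track the gap between database and index is to read `numFound` from the Solr XML response and subtract it from the row count. This is a sketch only: `num_found` and `index_gap` are hypothetical helper names, and the sample response line is illustrative rather than captured output.

```shell
# Extract the first numFound value from a Solr XML response read on stdin.
num_found() {
    sed -n 's/.*numFound="\([0-9][0-9]*\)".*/\1/p' | head -n 1
}

# Print how many records the index is missing relative to the database.
index_gap() {
    db_count=$1
    solr_count=$2
    echo $((db_count - solr_count))
}

# Usage with the figures from this message:
solr_count=$(echo '<result name="response" numFound="326" start="0"/>' | num_found)
index_gap 420 "$solr_count"   # prints 94
```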

Thanks,

Jak




Re: Solr Near Real-Time Search, Soft Commit Problem

2011-11-17 Thread Erick Erickson
Hmmm. It is suspicious that your index files change every
second. If you change your cron task to update every 10
seconds, do the index files change every 10 seconds?

Regarding your question about
"After a server restart, the last query results are preserved. (In NRT they
would disappear, right?)":
not necessarily. If your autoCommit interval is exceeded, the soft commits
will be committed to disk, so Solr would pick them up after a restart.

But if somehow you're getting a hard commit to happen every second, you should
also be seeing a lot of segment merging going on, are you?

I think I'd stop the cron job and execute this manually for a while in
order to see exactly where the problem is. I'd go ahead and comment out
the autoCommit section as well. That should give you a much more
reproducible test scenario.

Say you do that, issue your delta-import and immediately kill your
server. If you then see the delta data when it starts up, we should
understand why, because it sure would seem like commit=false isn't
doing what you expect.

Erick





Re: Solr Near Real-Time Search, Soft Commit Problem

2011-11-17 Thread Jak Akdemir
1- There is an improvement on the issue. I added a 10-second overlap to the
delta query in data-config.xml, so that it also covers records that were
already indexed:
revision_time &gt; DATE_SUB('${dataimporter.last_index_time}', INTERVAL 10
SECOND);
In this case 1369 new records were inserted at a rate of 7 records per second,
and the Solr response shows all 1369 new records.
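
The effect of the overlap window can be illustrated with a small simulation. This is not DIH itself, only a sketch: `delta_ids` is a hypothetical helper, the timestamps are made-up epoch seconds, and the explanation in the comments (a row stamped just before `last_index_time` that the previous import did not see) is an assumption about why rows were skipped, not something confirmed in this thread.

```shell
# Simulate a delta query: print the ids of rows whose revision time (epoch
# seconds) is later than last_index_time minus an overlap window.
# Input on stdin: one "id revision_time" pair per line.
delta_ids() {
    last_index_time=$1
    overlap=$2
    awk -v cutoff=$((last_index_time - overlap)) '$2 > cutoff { print $1 }'
}

rows='a 95
b 99
c 103'

# Assumption: row b was stamped 99 but was not yet visible when the import
# that recorded last_index_time=100 ran its query.
printf '%s\n' "$rows" | delta_ids 100 0    # prints: c
printf '%s\n' "$rows" | delta_ids 100 10   # prints a, b and c, one per line
```

Rows picked up a second time are simply re-indexed; a document with the same uniqueKey overwrites the earlier copy.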

2-
If I update the bashscript to sleep 10 seconds and autosoftcommit to 1 sec,
index files are updated every 10 seconds.
If I update autosoftcommit to 10 seconds and bashscript to sleep 10 sec,
index files are updated every 10 seconds.
In the index folder, after each update, I see that the segments/index files
are changing.
I restarted the server before the autocommit interval elapsed. The deltas are
still in the result list.
Here is my solrconfig.
<autoCommit>
  <maxTime>30</maxTime>
</autoCommit>
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>

4- I commented out the autoCommit part. Index files are still changing.
<!--
<autoCommit>
  <maxTime>30</maxTime>
</autoCommit>
-->
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>

I did not modify the request part in any of these cases.
wget -O /dev/null 'http://localhost:8080/solr-jak/dataimport?command=delta-import&commit=false' 2>/dev/null
#curl http://localhost:8080/solr-jak/update -H 'Content-Type: text/xml' --data-binary '<commit softCommit="true" waitFlush="false" waitSearcher="false"/>' 2>/dev/null

Erick, as you mentioned, I believe that commit=false is not working
properly. If you need any information, I can provide it.
Thank you all for your quick responses and advice.

Bests,

Jak





Re: Solr Near Real-Time Search, Soft Commit Problem

2011-11-17 Thread Yonik Seeley
On Thu, Nov 17, 2011 at 1:34 PM, Erick Erickson erickerick...@gmail.com wrote:
 Hmmm. It is suspicious that your index files change every
 second.

Why is this suspicious?
A soft commit still writes out some files currently... it just doesn't
fsync them.
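
To see which files appear between two points in time, one can snapshot the directory listing and compare. A minimal sketch, where `new_files_since` is a hypothetical helper and the snapshot path is a placeholder; it only shows that new files were written, not whether they were fsynced.

```shell
# Print the names of files present in directory $1 that are not listed in
# the snapshot file $2 (a sorted listing, one name per line).
new_files_since() {
    dir=$1
    snapshot=$2
    ls -1 "$dir" | sort | comm -13 "$snapshot" -
}

# Hypothetical usage against the index directory:
#   ls -1 /usr/java/solr4/data/index | sort > /tmp/index.before
#   sleep 1
#   new_files_since /usr/java/solr4/data/index /tmp/index.before
```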

-Yonik
http://www.lucidimagination.com


Re: Solr Near Real-Time Search, Soft Commit Problem

2011-11-17 Thread Jak Akdemir
Yonik,

Is it OK to see soft-committed records after a server restart, too? If it is,
there is no problem left at all.
I added the changing files and 1 sec of log at the end of this e-mail. One
significant line says softCommit=true, so Solr recognizes our softCommit
request.
INFO: start commit(optimize=false,waitSearcher=true,expungeDeletes=false,softCommit=true)

I want to fix just a little typo from my last e-mail.
... autosoftcommit to 10 seconds and bashscript to sleep 10 sec, index
files are ...
should be ... autosoftcommit to 10 seconds and bashscript to sleep *1 sec*,
index files are ...

Jak

___
jak@jak:/usr/java/solr4/data$ ls index/
_11_0.frq  _11.nrm  _14.fdx  _16_0.tip  _1b_0.prx  _1b.per  _1c.fnm
 _1d.fdt  _1e_0.tim  _1f.fdt  _l.fdt  _r_0.tim  segments_2  _t.fdx
_11_0.prx  _11.per  _14.fnm  _16.fdt  _1b_0.tim  _1c_0.frq  _1c.nrm
 _1d.fdx  _1e_0.tip  _1f.fdx  _l.fdx  _r_0.tip  segments.gen  _t.fnm
_11_0.tim  _14_0.frq  _14.nrm  _16.fdx  _1b_0.tip  _1c_0.prx  _1c.per
 _1d.fnm  _1e.fdt  _1.fnx  _l.fnm  _r.fdt  _t_0.frq  _t.nrm
_11_0.tip  _14_0.prx  _14.per  _16.fnm  _1b.fdt  _1c_0.tim  _1d_0.frq
 _1d.nrm  _1e.fdx  _l_0.frq  _l.nrm  _r.fdx  _t_0.prx  _t.per
_11.fdt  _14_0.tim  _16_0.frq  _16.nrm  _1b.fdx  _1c_0.tip  _1d_0.prx
 _1d.per  _1e.fnm  _l_0.prx  _l.per  _r.fnm  _t_0.tim  write.lock
_11.fdx  _14_0.tip  _16_0.prx  _16.per  _1b.fnm  _1c.fdt  _1d_0.tim
 _1e_0.frq  _1e.nrm  _l_0.tim  _r_0.frq  _r.nrm  _t_0.tip
_11.fnm  _14.fdt  _16_0.tim  _1b_0.frq  _1b.nrm  _1c.fdx  _1d_0.tip
 _1e_0.prx  _1e.per  _l_0.tip  _r_0.prx  _r.per  _t.fdt
jak@jak:/usr/java/solr4/data$ ls index/
_11_0.frq  _11.nrm  _14.fdx  _16_0.tip  _1b_0.prx  _1b.per  _1c.fnm
 _1d.fdt  _1e_0.tim  _1f_0.frq  _1f.nrm  _l.fdt  _r_0.tim  segments_2  _t.fdx
_11_0.prx  _11.per  _14.fnm  _16.fdt  _1b_0.tim  _1c_0.frq  _1c.nrm
 _1d.fdx  _1e_0.tip  _1f_0.prx  _1.fnx  _l.fdx  _r_0.tip  segments.gen  _t.fnm
_11_0.tim  _14_0.frq  _14.nrm  _16.fdx  _1b_0.tip  _1c_0.prx  _1c.per
 _1d.fnm  _1e.fdt  _1f_0.tim  _1f.per  _l.fnm  _r.fdt  _t_0.frq  _t.nrm
_11_0.tip  _14_0.prx  _14.per  _16.fnm  _1b.fdt  _1c_0.tim  _1d_0.frq
 _1d.nrm  _1e.fdx  _1f_0.tip  _l_0.frq  _l.nrm  _r.fdx  _t_0.prx  _t.per
_11.fdt  _14_0.tim  _16_0.frq  _16.nrm  _1b.fdx  _1c_0.tip  _1d_0.prx
 _1d.per  _1e.fnm  _1f.fdt  _l_0.prx  _l.per  _r.fnm  _t_0.tim  write.lock
_11.fdx  _14_0.tip  _16_0.prx  _16.per  _1b.fnm  _1c.fdt  _1d_0.tim
 _1e_0.frq  _1e.nrm  _1f.fdx  _l_0.tim  _r_0.frq  _r.nrm  _t_0.tip
_11.fnm  _14.fdt  _16_0.tim  _1b_0.frq  _1b.nrm  _1c.fdx  _1d_0.tip
 _1e_0.prx  _1e.per  _1f.fnm  _l_0.tip  _r_0.prx  _r.per  _t.fdt

___
Nov 17, 2011 2:55:17 PM org.apache.solr.handler.dataimport.DataImporter
doDeltaImport
INFO: Starting Delta Import
Nov 17, 2011 2:55:17 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr-jak path=/dataimport
params={commit=false&command=delta-import} status=0 QTime=0
Nov 17, 2011 2:55:17 PM
org.apache.solr.handler.dataimport.SimplePropertiesWriter
readIndexerProperties
INFO: Read dataimport.properties
Nov 17, 2011 2:55:17 PM org.apache.solr.handler.dataimport.DocBuilder
doDelta
INFO: Starting delta collection.
Nov 17, 2011 2:55:17 PM org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Running ModifiedRowKey() for Entity: movie
Nov 17, 2011 2:55:17 PM org.apache.solr.handler.dataimport.JdbcDataSource$1
call
INFO: Creating a connection for entity movie with URL:
jdbc:mysql://localhost/imdb
Nov 17, 2011 2:55:17 PM org.apache.solr.handler.dataimport.JdbcDataSource$1
call
INFO: Time taken for getConnection(): 8
Nov 17, 2011 2:55:17 PM org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Completed ModifiedRowKey for Entity: movie rows obtained : 147
Nov 17, 2011 2:55:17 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start
commit(optimize=false,waitSearcher=true,expungeDeletes=false,softCommit=true)
Nov 17, 2011 2:55:17 PM org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Completed DeletedRowKey for Entity: movie rows obtained : 0
Nov 17, 2011 2:55:17 PM org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Completed parentDeltaQuery for Entity: movie
Nov 17, 2011 2:55:17 PM org.apache.solr.search.SolrIndexSearcher init
INFO: Opening Searcher@1520a8e main
Nov 17, 2011 2:55:17 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Nov 17, 2011 2:55:17 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming
Searcher@1520a8emain{DirectoryReader(segments_2:1321559475026:nrt
_k(4.0):C388607
_50(4.0):C526/132 _3q(4.0):C444/141 _43(4.0):C450/126 _4r(4.0):C470/125
_4e(4.0):C456/135 _3f(4.0):C428/133 _51(4.0):C132/126 

Re: Solr Near Real-Time Search, Soft Commit Problem

2011-11-17 Thread Yonik Seeley
On Thu, Nov 17, 2011 at 3:56 PM, Jak Akdemir jakde...@gmail.com wrote:
 Is it ok to see soft committed records after server restart, too?

Yes... we currently have Jetty configured to call some cleanups on
exit (such as closing the index writer).

-Yonik
http://www.lucidimagination.com


Re: Solr Near Real-Time Search, Soft Commit Problem

2011-11-17 Thread Jak Akdemir
This is great! I guess there is nothing left to worry about for a while.
Erick & Yonik, thank you again for your great responses.

Bests,

Jak
