Re: Recovery problem in solrcloud

2012-08-08 Thread Jam Luo
Aug 06, 2012 10:05:55 AM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:456)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:284)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:499)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
at org.eclipse.jetty.server.Server.handle(Server.java:351)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:954)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.util.FixedBitSet.<init>(FixedBitSet.java:54)
at org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet(MultiTermQueryWrapperFilter.java:104)
at org.apache.lucene.search.ConstantScoreQuery$ConstantWeight.scorer(ConstantScoreQuery.java:129)
at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:318)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:507)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1394)
at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1269)
at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:384)
at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:420)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:204)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1544)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:499)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)

Re: Recovery problem in solrcloud

2012-08-08 Thread Yonik Seeley
Stack trace looks normal - it's just a multi-term query instantiating
a bitset.  The memory is being taken up somewhere else.
How many documents are in your index?
Can you get a heap dump or use some other memory profiler to see
what's taking up the space?

 if I stop queries for more than ten minutes, the solr instance will start normally.

Maybe queries are piling up in threads before the server is ready to
handle them and then trying to handle them all at once gives an OOM?
Is this live traffic or a test?  How many concurrent requests get sent?
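
For scale: each multi-term query in the trace above allocates a FixedBitSet
of one bit per document. A minimal sketch of the arithmetic, assuming a
maxDoc of about 457 million (the figure reported later in this thread) and a
purely hypothetical pile-up of 100 concurrent queries:

public class BitsetCost {
    public static void main(String[] args) {
        long maxDoc = 457041702L;          // maxDoc reported later in this thread
        long bytesPerBitset = maxDoc / 8;  // FixedBitSet: ~1 bit per document
        System.out.println("one bitset: " + (bytesPerBitset >> 20) + " MB");

        // Hypothetical burst of queued queries all handled at once on startup
        int concurrent = 100;
        System.out.println(concurrent + " in flight: "
                + ((concurrent * bytesPerBitset) >> 20) + " MB");
    }
}

At roughly 54 MB per bitset, a burst of a hundred simultaneous multi-term
queries would account for over 5 GB of transient allocations on its own.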

-Yonik
http://lucidimagination.com


On Wed, Aug 8, 2012 at 2:43 AM, Jam Luo cooljam2...@gmail.com wrote:
 Aug 06, 2012 10:05:55 AM org.apache.solr.common.SolrException log
 SEVERE: null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
 [stack trace quoted in full above]

Re: Recovery problem in solrcloud

2012-08-08 Thread Jam Luo
There are 400 million documents in a shard, and a document is less than 1 kb;
the data file _**.fdt is 149g.
Does the recovery need a lot of memory while downloading, or after the download?

I found some log entries before the OOM, shown below:
Aug 06, 2012 9:43:04 AM org.apache.solr.core.SolrCore execute
INFO: [blog] webapp=/solr path=/select
params={sort=createdAt+desc&distrib=false&collection=today,blog&hl.fl=content&wt=javabin&hl=false&rows=10&version=2&f.content.hl.fragsize=0&fl=id&shard.url=index35:8983/solr/blog/&NOW=1344217556702&start=0&q=(((somewordsA+%26%26+somewordsB+%26%26+somewordsC)+%26%26+platform:abc)+||+id:/)+%26%26+(createdAt:[2012-07-30T01:43:28.462Z+TO+2012-08-06T01:43:28.462Z])&_system=business&isShard=true&fsv=true&f.title.hl.fragsize=0}
hits=0 status=0 QTime=95
Aug 06, 2012 9:43:05 AM org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=1

commit{dir=/home/ant/jetty/solr/data/index.20120801114027,segFN=segments_aui,generation=14058,filenames=[_cdnu_nrm.cfs,
_cdnu_0.frq, segments_aui, _cdnu.fdt, _cdnu_nrm.cfe, _cdnu_0.tim,
_cdnu.fdx, _cdnu.fnm, _cdnu_0.prx, _cdnu_0.tip, _cdnu.per]
Aug 06, 2012 9:43:05 AM org.apache.solr.core.SolrDeletionPolicy
updateCommits
INFO: newest commit = 14058
Aug 06, 2012 9:43:05 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start
commit{flags=0,version=0,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}
Aug 06, 2012 9:43:05 AM org.apache.solr.search.SolrIndexSearcher <init>
INFO: Opening Searcher@13578a09 main
Aug 06, 2012 9:43:05 AM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to
Searcher@13578a09 main{StandardDirectoryReader(segments_aui:1269420
_cdnu(4.0):C457041702)}
Aug 06, 2012 9:43:05 AM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
Aug 06, 2012 9:43:05 AM org.apache.solr.core.SolrCore registerSearcher
INFO: [blog] Registered new searcher
Searcher@13578a09 main{StandardDirectoryReader(segments_aui:1269420
_cdnu(4.0):C457041702)}
Aug 06, 2012 9:43:05 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Aug 06, 2012 9:43:05 AM org.apache.solr.update.processor.LogUpdateProcessor
finish
INFO: [blog] webapp=/solr path=/update
params={waitSearcher=true&commit_end_point=true&wt=javabin&commit=true&version=2}
{commit=} 0 1439
Aug 06, 2012 9:43:05 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start
commit{flags=0,version=0,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}
Aug 06, 2012 9:43:05 AM org.apache.solr.search.SolrIndexSearcher <init>
INFO: Opening Searcher@1a630c4d main
Aug 06, 2012 9:43:05 AM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to
Searcher@1a630c4d main{StandardDirectoryReader(segments_aui:1269420
_cdnu(4.0):C457041702)}
Aug 06, 2012 9:43:05 AM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
Aug 06, 2012 9:43:05 AM org.apache.solr.core.SolrCore registerSearcher
INFO: [blog] Registered new searcher
Searcher@1a630c4d main{StandardDirectoryReader(segments_aui:1269420
_cdnu(4.0):C457041702)}
Aug 06, 2012 9:43:05 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Aug 06, 2012 9:43:07 AM org.apache.solr.core.SolrCore execute
INFO: [blog] webapp=/solr path=/select
params={sort=createdAt+desc&distrib=false&collection=today,blog&hl.fl=content&wt=javabin&hl=false&rows=10&version=2&f.content.hl.fragsize=0&fl=id&shard.url=index35:8983/solr/blog/&NOW=1344217558778&start=0&_system=business&q=(((somewordsD)+%26%26+platform:(abc))+||+id:/)+%26%26+(createdAt:[2012-07-30T01:43:30.537Z+TO+2012-08-06T01:43:30.537Z])&isShard=true&fsv=true&f.title.hl.fragsize=0}
hits=0 status=0 QTime=490

Apart from this entry, all the other log lines in those few minutes are
path=/select requests; there were no add-document requests in this cluster
during that time. Is that related to the OOM?

This is live traffic, so I can't test it frequently. Tonight I will add the
-XX:+HeapDumpOnOutOfMemoryError
option; if this problem appears once again, I will get the heap dump, but I
am not sure I can analyse it and get a result. I will ask for your help,
please.
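
For reference, -XX:HeapDumpPath=/some/dir can be added to control where the
.hprof file lands. A dump can also be triggered on demand instead of waiting
for the crash; a minimal sketch, assuming a HotSpot JVM and code running
inside the Solr process, with the output path a placeholder:

import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class HeapDump {
    public static void main(String[] args) throws Exception {
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        // live=true dumps only reachable objects, keeping the file smaller
        bean.dumpHeap("/tmp/solr-heap.hprof", true);
    }
}

The same effect from outside the process:
jmap -dump:live,format=b,file=solr-heap.hprof <pid>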

thanks

2012/8/8 Yonik Seeley yo...@lucidimagination.com

 Stack trace looks normal - it's just a multi-term query instantiating
 a bitset.  The memory is being taken up somewhere else.
 [rest of message quoted in full above]

RE: Recovery problem in solrcloud

2012-08-07 Thread Markus Jelsma
Perhaps this describes your problem:
https://issues.apache.org/jira/browse/SOLR-3685

 
 
-Original message-
 From: Jam Luo cooljam2...@gmail.com
 Sent: Tue 07-Aug-2012 11:52
 To: solr-user@lucene.apache.org
 Subject: Recovery problem in solrcloud
 
 Hi
 I have big index data files, more than 200g, and there are two solr
 instances in a shard. The leader starts up and is ok, but the peer always
 OOMs when it starts up. The peer always downloads index files from the
 leader because of the recoveringAfterStartup property in RecoveryStrategy;
 total time taken for download: 2350 secs. If the data of the peer is empty,
 it is ok, but the leader and the peer have the same generation number, so
 why does the peer do recovery?
 
 thanks
 cooljam
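
For the generation question, one way to see which commit point Lucene itself
considers current on each node is to read the index directory directly. A
minimal sketch against the Lucene 4.x API (illustrative only; the path
argument is a placeholder for each node's data/index directory):

import java.io.File;
import org.apache.lucene.index.SegmentInfos;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class ShowGeneration {
    public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.open(new File(args[0]));
        try {
            // Highest generation among the segments_N files present
            System.out.println("last commit generation: "
                    + SegmentInfos.getLastCommitGeneration(dir));
        } finally {
            dir.close();
        }
    }
}

Running this against both the leader's and the peer's index directory shows
whether the two nodes really agree on the commit generation before the peer
decides to replicate.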
 


Re: Recovery problem in solrcloud

2012-08-07 Thread Mark Miller

On Aug 7, 2012, at 5:49 AM, Jam Luo cooljam2...@gmail.com wrote:

 Hi
 I have big index data files, more than 200g, and there are two solr
 instances in a shard. The leader starts up and is ok, but the peer always
 OOMs when it starts up.

Can you share the OOM msg and stacktrace please?

 The peer always downloads index files from the leader because
 of the recoveringAfterStartup property in RecoveryStrategy; total time taken
 for download: 2350 secs. If the data of the peer is empty, it is ok, but the
 leader and the peer have the same generation number, so why does the peer
 do recovery?

We are looking into this.

 
 thanks
 cooljam

- Mark Miller
lucidimagination.com


Re: Recovery problem in solrcloud

2012-08-07 Thread Mark Miller
Still no idea on the OOM - please send the stacktrace if you can.

As for doing a replication recovery when it should not be necessary, Yonik
just committed a fix for that a bit ago.

On Aug 7, 2012, at 9:41 AM, Mark Miller markrmil...@gmail.com wrote:

 
 [earlier reply quoted in full above]

- Mark Miller
lucidimagination.com