Re: SolrCloud vs Solr master-slave replication

2013-04-18 Thread Victor Ruiz
Also, I forgot to say... the same error has started to happen again... the index
is corrupted again :(





Re: SolrCloud vs Solr master-slave replication

2013-04-18 Thread Victor Ruiz
Thank you again for your answer, Shawn.

The network card seems to work fine, but we've found segmentation faults, so our
hosting provider is now going to run a full hardware check. Hopefully they'll
replace the server and the problem will be solved.

Regards,
Victor







Re: SolrCloud vs Solr master-slave replication

2013-04-15 Thread Victor Ruiz
Hi Shawn,

Thank you for your reply.

I'll check that the network card drivers are OK. About the RAM: the JVM max heap
size is currently 6GB, but it never reaches the maximum; typically no more than
5GB is used. Should I assign more RAM? I've read that assigning too much heap
can also hurt performance. Apart from the RAM used by the JVM, the server has
more than 10GB of unused RAM, which should be enough to cache the index.
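
(In case it's useful to others: rather than guessing at heap pressure, it can be
watched directly with jstat; a sketch, where the Solr process id is hypothetical:

  jstat -gcutil 12345 5000

This prints heap-occupancy and GC percentages every 5 seconds, which makes it
easy to see whether the 6GB ceiling is ever approached.)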

About SolrCloud, I know it doesn't use master-slave replication but incremental
updates, item by item. That's why I thought it could work for us, since our
bottleneck appears to be the replication cycles. But another question: if
indexing happens on all the servers, could 1200 updates/min also overload them,
and so give worse performance than master-slave replication?

Regards,
Victor







SolrCloud vs Solr master-slave replication

2013-04-12 Thread Victor Ruiz
Hi,

I posted an issue with our Solr index earlier this week:
http://lucene.472066.n3.nabble.com/corrupted-index-in-slave-td4054769.html

Today that error started to happen constantly, for almost every request, and
I created a JIRA issue because I thought it was a bug:
https://issues.apache.org/jira/browse/SOLR-4707

As you can read there, in the end it was due to a failure in the Solr
master-slave replication, and now I don't know if we should think about
migrating to SolrCloud, since Solr master-slave replication does not seem to
fit our requirements:

* index size: ~20 million documents, ~9GB
* ~1200 updates/min
* ~1 queries/min (distributed over 2 slaves): MoreLikeThis, RealTimeGet,
TermVectorComponent, SearchHandler

I would be grateful if anyone could help me answer these questions:

* Would it be advisable to migrate to SolrCloud? Would it have an impact on
replication performance?
* In that case, which would have better performance: maintaining a full copy of
the index on every server, or using shard servers?
* How many shards and replicas would you advise for ensuring high
availability? (See the sketch below.)
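
(For concreteness: a collection with a given shard/replica layout is created
through the Collections API; a minimal sketch against a running SolrCloud 4.x
cluster, where the host, collection and config names are hypothetical:

  curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=big_collection&numShards=2&replicationFactor=2&collection.configName=big_conf'

With numShards=2 and replicationFactor=2, every document is stored on two
nodes, so either replica of a shard can keep serving queries if the other
fails.)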

Kind Regards,

Victor





Re: corrupted index in slave?

2013-04-09 Thread Victor Ruiz
Sorry, I forgot to say: the exceptions don't occur for every document, only
for a few...

regards,
Victor

Victor Ruiz wrote
> Hi guys,
> 
> I'm getting exceptions on a Solr slave when accessing the TermVector
> component and the RealTimeGetHandler. The weird thing is that on the master
> and on one of the 2 slaves the documents are OK, and the same query
> doesn't throw any exception. For now, the only way I have to solve the
> problem is deleting these documents and indexing them again.
> 
> I upgraded Solr from 4.0 directly to 4.2, then to 4.2.1 last week. These
> exceptions seem to have appeared since the upgrade to 4.2.
> I didn't run the script for migrating the index files (as I did in the
> migration from 3.6 to 4.0). Should I? Has the index format changed?
> If not, is this a known bug? If it is, sorry, I couldn't find it in JIRA.
> 
> These are the exceptions I get:
> 
> {"responseHeader":{"status":500,"QTime":1},"response":{"numFound":1,"start":0,"docs":[{"itemid":"105266867","text":"exklusiver
> kann man kaum würzen  safran ist das teuerste gewürz der welt handverlesen
> und in mühevoller kleinstarbeit hergestellt ist safran sehr selten und
> wird in winzigen mengen gehandelt und
> verwendet","title":"safran","domainid":4287,"date_i":"2012-11-21T17:01:23Z","date":"2012-11-21T17:01:09Z","category":["kultur","literatur","gesellschaft","umwelt","trinken","essen"]}]},"termVectors":["uniqueKeyFieldName","itemid","105266867",["uniqueKey","105266867"]],"error":{"trace":"java.lang.ArrayIndexOutOfBoundsException\n\tat
> org.apache.lucene.codecs.compressing.LZ4.decompress(LZ4.java:132)\n\tat
> org.apache.lucene.codecs.compressing.CompressionMode$4.decompress(CompressionMode.java:135)\n\tat
> org.apache.lucene.codecs.compressing.CompressingTermVectorsReader.get(CompressingTermVectorsReader.java:493)\n\tat
> org.apache.lucene.index.SegmentReader.getTermVectors(SegmentReader.java:175)\n\tat
> org.apache.lucene.index.BaseCompositeReader.getTermVectors(BaseCompositeReader.java:97)\n\tat
> org.apache.lucene.index.IndexReader.getTermVector(IndexReader.java:385)\n\tat
> org.apache.solr.handler.component.TermVectorComponent.process(TermVectorComponent.java:313)\n\tat
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)\n\tat
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)\n\tat
> org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)\n\tat
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)\n\tat
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)\n\tat
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)\n\tat
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)\n\tat
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)\n\tat
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)\n\tat
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)\n\tat
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)\n\tat
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)\n\tat
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)\n\tat
> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)\n\tat
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)\n\tat
> org.mortbay.jetty.Server.handle(Server.java:326)\n\tat
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)\n\tat
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:926)\n\tat
> org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)\n\tat
> org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)\n\tat
> org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)\n\tat
> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)\n\tat
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)\n","code":500}}
> 
> 
> {"error":{"trace":"java.lang.ArrayIndexOutOfBoundsException\n\tat
> org.apache.lucene.codecs.compressing.LZ4.decompress(LZ4.java:132)\n\tat
> org.apache.lucene.codecs.compressing.CompressionMode$4.decompress(CompressionMode.java:135)\n\tat
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.vi

corrupted index in slave?

2013-04-09 Thread Victor Ruiz
Hi guys,

I'm getting exceptions on a Solr slave when accessing the TermVector component
and the RealTimeGetHandler. The weird thing is that on the master and on one of
the 2 slaves the documents are OK, and the same query doesn't throw any
exception. For now, the only way I have to solve the problem is deleting
these documents and indexing them again.
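
(For what it's worth, one way to confirm that the on-disk index on the affected
slave is really corrupt is Lucene's CheckIndex tool; a sketch, with the jar and
index paths as examples only:

  # stop the slave first, then:
  java -cp lucene-core-4.2.1.jar org.apache.lucene.index.CheckIndex /path/to/solr/data/index

It walks every segment and reports which ones are broken.)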

I upgraded Solr from 4.0 directly to 4.2, then to 4.2.1 last week. These
exceptions seem to have appeared since the upgrade to 4.2.
I didn't run the script for migrating the index files (as I did in the
migration from 3.6 to 4.0). Should I? Has the index format changed?
If not, is this a known bug? If it is, sorry, I couldn't find it in JIRA.

These are the exceptions I get:

{"responseHeader":{"status":500,"QTime":1},"response":{"numFound":1,"start":0,"docs":[{"itemid":"105266867","text":"exklusiver
kann man kaum würzen  safran ist das teuerste gewürz der welt handverlesen
und in mühevoller kleinstarbeit hergestellt ist safran sehr selten und wird
in winzigen mengen gehandelt und
verwendet","title":"safran","domainid":4287,"date_i":"2012-11-21T17:01:23Z","date":"2012-11-21T17:01:09Z","category":["kultur","literatur","gesellschaft","umwelt","trinken","essen"]}]},"termVectors":["uniqueKeyFieldName","itemid","105266867",["uniqueKey","105266867"]],"error":{"trace":"java.lang.ArrayIndexOutOfBoundsException\n\tat
org.apache.lucene.codecs.compressing.LZ4.decompress(LZ4.java:132)\n\tat
org.apache.lucene.codecs.compressing.CompressionMode$4.decompress(CompressionMode.java:135)\n\tat
org.apache.lucene.codecs.compressing.CompressingTermVectorsReader.get(CompressingTermVectorsReader.java:493)\n\tat
org.apache.lucene.index.SegmentReader.getTermVectors(SegmentReader.java:175)\n\tat
org.apache.lucene.index.BaseCompositeReader.getTermVectors(BaseCompositeReader.java:97)\n\tat
org.apache.lucene.index.IndexReader.getTermVector(IndexReader.java:385)\n\tat
org.apache.solr.handler.component.TermVectorComponent.process(TermVectorComponent.java:313)\n\tat
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)\n\tat
org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)\n\tat
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)\n\tat
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)\n\tat
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)\n\tat
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)\n\tat
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)\n\tat
org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)\n\tat
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)\n\tat
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)\n\tat
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)\n\tat
org.mortbay.jetty.Server.handle(Server.java:326)\n\tat
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)\n\tat
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:926)\n\tat
org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)\n\tat
org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)\n\tat
org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)\n\tat
org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)\n\tat
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)\n","code":500}}


{"error":{"trace":"java.lang.ArrayIndexOutOfBoundsException\n\tat
org.apache.lucene.codecs.compressing.LZ4.decompress(LZ4.java:132)\n\tat
org.apache.lucene.codecs.compressing.CompressionMode$4.decompress(CompressionMode.java:135)\n\tat
org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.visitDocument(CompressingStoredFieldsReader.java:258)\n\tat
org.apache.lucene.index.SegmentReader.document(SegmentReader.java:139)\n\tat
org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:116)\n\tat
org.apache.lucene.index.IndexReader.document(IndexReader.java:436)\n\tat
org.apache.solr.search.SolrIndexSearcher.doc(SolrIndexSearcher.java:640)\n\tat
org.apache.solr.search.SolrIndexSearcher.doc(SolrIndexSearcher.java:568)\n\tat
org.apache.solr.handler.component.RealTimeGetComponent.process(RealTimeGetComponent.java:176)\n\tat
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)\n\tat
org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)\n\ta

Re: Search data who does not have "x" field

2013-04-09 Thread Victor Ruiz
Sorry, I didn't explain myself well. I meant: you have to create an
additional field 'hasCategory' in your schema, and then, before indexing,
set 'hasCategory' to true in the document if it has categories, or to false
if it has none. With this you save computation time, since a query on a
boolean field is much cheaper for Solr than checking for an empty string field.

The query should be => q=*:*&fq=hasCategory:true
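
(A minimal sketch of the schema change this assumes, using the stock 'boolean'
field type; the default attribute covers documents indexed without the field:

  <field name="hasCategory" type="boolean" indexed="true" stored="false" default="false"/>
)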


anurag.jain wrote
> "another solution would be to add a boolean field, hasCategory, and use it
> for filtering 
> q=
> 
> &fq=hasCategory:true "
> 
> 
> I am not getting results.
> 
> 
> I am trying:
> 
> localhost:8983/search?q=*:*&fq=category:true
> 
> It is giving zero results.
> 
> By the way, the first technique is working fine.







Re: Solr replication takes long time

2013-03-13 Thread Victor Ruiz
While looking at the Solr logs, I found a java.lang.OutOfMemoryError: Java heap
space that was occurring about twice per hour.
So I increased the maximum heap assigned to the JVM (-Xmx), and since then the
servers have not crashed, even though replication still takes a long time to
complete. For now, the 2 slaves can handle all the queries without problems.
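
(For reference, the heap ceiling is the -Xmx flag on the JVM that starts Solr;
a sketch for the example Jetty distribution, with illustrative sizes rather
than a recommendation:

  java -Xms4g -Xmx8g -jar start.jar
)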


Regards,
Victor





Re: Search data who does not have "x" field

2013-03-13 Thread Victor Ruiz
Add this to your query, or as a filter query:

q=&fq=-category:[* TO *]
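
(Spelled out as a full request, with host and core hypothetical: the range
query matches any document where the field exists, and the leading '-'
negates it:

  http://localhost:8983/solr/select?q=*:*&fq=-category:[* TO *]
)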

Another solution would be to add a boolean field, hasCategory, and use it
for filtering:
q=&fq=hasCategory:true


Victor


anurag.jain wrote
> Hi all,
> 
> I am facing a problem. 
> 
> Problem is:
> 
> I have updated 250 records to Solr.
> 
> 
> Some of the records have a "category" field and some don't.
> 
> For example:
> 
> 
> {
> "id":"321",
> "name":"anurag",
> "category":"x"
> },
> {
> "id":"3",
> "name":"john"
> }
> 
> 
> Now I want to search for the records that do not have that field.
> 
> What should the query look like?
> 
> 
> Please reply.
> 
> It is very urgent - I have to complete this task by today.

> 
> thanks in advance.







Re: Solr replication takes long time

2013-03-13 Thread Victor Ruiz
After upgrading to 4.2 the problem is still not solved; the image I attached
(link lost from the archive) shows how slow the transfer speed is. At least,
after the update, the master is no longer blocked during replication.

Any ideas?





Re: Solr replication takes long time

2013-03-11 Thread Victor Ruiz
Thanks for your answer, Mark. I think I'll try updating to 4.2. I'll keep you
posted.

Anyway, I wouldn't say that the full index is being replicated. I've been
monitoring the replication process in the Solr admin console, and there I see
that usually no more than 50-100 files are transferred, with a total size
rarely greater than 50MB. Is this info reliable?

Victor

Mark Miller-3 wrote
> Okay - yes, 4.0 is a better choice for replication than 4.1.
> 
> It almost sounds like you may be replicating the full index rather than
> just changes or something. 4.0 had a couple issues as well - a couple
> things that were discovered while writing stronger tests for 4.2.
> 
> 4.2 is spreading onto mirrors now.
> 
> - Mark
> 
> On Mar 11, 2013, at 2:00 PM, Victor Ruiz <bik1979@...> wrote:
> 
>> No, Solr 4.0.0. I wanted to upgrade to Solr 4.1, but I read that there was
>> an issue with replication, so I decided not to try it for now.
>> 
>> 
>> Mark Miller-3 wrote
>>> Are you using Solr 4.1?
>>> 
>>> - Mark
>>> 
>>> On Mar 11, 2013, at 1:53 PM, Victor Ruiz <bik1979@...> wrote:
>>> 
>>>> [original message and solrconfig.xml snipped; see "Solr replication takes long time" below]

Re: Solr replication takes long time

2013-03-11 Thread Victor Ruiz
No, Solr 4.0.0. I wanted to upgrade to Solr 4.1, but I read that there was an
issue with replication, so I decided not to try it for now.


Mark Miller-3 wrote
> Are you using Solr 4.1?
> 
> - Mark
> 
> On Mar 11, 2013, at 1:53 PM, Victor Ruiz <bik1979@...> wrote:
> 
>> [original message and solrconfig.xml snipped; see "Solr replication takes long time" below]

Solr replication takes long time

2013-03-11 Thread Victor Ruiz
Hi guys,

I have a problem with Solr replication. I have 2 Solr servers (Solr 4.0.0),
1 master and 1 slave (8 processors, 16GB RAM, Ubuntu 11, ext3, each). On every
server there are 2 independent instances of Solr running (I also tried a
multicore config, but independent instances give me better performance), each
instance hosting a different collection. So we have 2 masters on server 1 and
2 slaves on server 2.

The index size is currently (for the biggest collection) around 17 million
documents, with a total size near 12 GB. The files transferred every
replication cycle are typically no more than 100, with a total size no bigger
than 50MB. The other collection is not that big, just around 1 million docs
and no bigger than 2 GB, with a low update ratio. The big collection has a
load of around 200 queries per second (mainly MoreLikeThis, RealTimeGetHandler,
TermVectorComponent), and for the small one it is below 50 queries per second.

Replication had been working for a long time without any problem, but in the
last weeks the replication cycles started to take longer and longer for the
big collection, often more than 2 minutes, sometimes much more. During that
time the slaves are so overloaded that many queries are timing out, even
though the timeout in my clients is 30 seconds.

The servers are on the same LAN, gigabit ethernet, so bandwidth should not
be the bottleneck.

Since the index receives frequent updates and deletes (the update handler
receives more than 200 requests per second for the big collection, but no
more than 5 per second for the small one), I tried to use the
maxCommitsToKeep attribute, to ensure that no file is deleted during
replication, but it had no effect.

My solrconfig.xml for the big collection is like this:

<config>
  <!-- NOTE: the archive stripped the XML tags from this post; the element
       names below are reconstructed from the surviving values and the stock
       solrconfig.xml layout. Values whose element names are not recoverable
       are kept in comments. -->

  <luceneMatchVersion>LUCENE_40</luceneMatchVersion>

  <directoryFactory name="DirectoryFactory"
      class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>

  <indexConfig>
    <!-- surviving value "3": element name not recoverable -->
    <deletionPolicy class="solr.SolrDeletionPolicy">
      <str name="maxCommitsToKeep">10</str>
      <str name="maxOptimizedCommitsToKeep">1</str>
      <str name="maxCommitAge">6HOUR</str>
    </deletionPolicy>
  </indexConfig>

  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- surviving values "2000", "3" and "500": element names not recoverable -->
  </updateHandler>

  <dataDir>${solr.data.dir:}</dataDir>

  <query>
    <maxBooleanClauses>2048</maxBooleanClauses> <!-- element name is a guess -->

    <filterCache class="solr.FastLRUCache" size="2048"
        initialSize="1024" autowarmCount="1024"/>
    <queryResultCache class="solr.LRUCache" size="2048"
        initialSize="1024" autowarmCount="1024"/>
    <documentCache class="solr.LRUCache" size="2048"
        initialSize="1024" autowarmCount="1024"/>

    <enableLazyFieldLoading>true</enableLazyFieldLoading>
    <queryResultWindowSize>50</queryResultWindowSize>
    <queryResultMaxDocsCached>50</queryResultMaxDocsCached>

    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst>
          <str name="q">*:*</str>
          <str name="fq">date:[NOW/DAY-7DAY TO NOW/DAY+1DAY]</str>
          <str name="rows">1000</str>
        </lst>
      </arr>
    </listener>
    <listener event="firstSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst>
          <str name="q">*:*</str>
          <str name="fq">date:[NOW/DAY-7DAY TO NOW/DAY+1DAY]</str>
          <str name="rows">1000</str>
        </lst>
      </arr>
    </listener>

    <useColdSearcher>true</useColdSearcher>
    <maxWarmingSearchers>4</maxWarmingSearchers>
  </query>

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="enable">${enable.master:false}</str>
      <str name="replicateAfter">commit</str>
      <str name="replicateAfter">startup</str>
      <!-- "startup" appears a second time in the archived post -->
      <str name="confFiles">schema.xml,solrconfig.xml,stopwords_de.txt,stopwords_en.txt,mapping-FoldToASCII.txt,mapping-FoldToASCII_de.txt</str>
    </lst>
    <lst name="slave">
      <str name="enable">${enable.slave:false}</str>
      <str name="masterUrl">http://${MASTER_HOST}:${MASTER_PORT}/solr/${MASTER_CORE}</str>
      <str name="pollInterval">05:00</str>
      <str name="httpBasicAuthUser">${MASTER_HTTP_USER}</str>
      <str name="httpBasicAuthPassword">${MASTER_HTTP_PWD}</str>
    </lst>
  </requestHandler>

  <admin>
    <defaultQuery>*:*</defaultQuery>
  </admin>
</config>




The poll interval is now set to 5 min. I tried reducing it to 2 min and
increasing it up to 10, with no effect; replication always takes this long,
even with a poll time of 2 minutes and only a few megabytes to replicate.
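
(For anyone checking the same thing: besides the admin console, the replication
handler itself reports file counts and transfer status; a sketch, with host
and core names hypothetical:

  http://slave_host:8983/solr/big_collection/replication?command=details
)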

Any ideas or suggestions about what could be the problem?

Thanks in advance,
Victor





MoreLikeThis boosted by Date

2013-03-04 Thread Victor Ruiz
Hi,

In my Solr config I have a request handler that boosts newer items using the
date field:

   


<!-- the archive stripped the XML tags; element names reconstructed, guesses marked -->
<requestHandler name="dateBoost" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- surviving value "true": element name not recoverable -->
    <int name="rows">10</int>
    <str name="fl">itemid,score</str>
    <str name="q">{!boost b=$bf v=$qq}</str>
    <str name="bf">recip(ms(NOW,date),0.6,1,1)</str>
  </lst>
</requestHandler>



And I also use the MoreLikeThis handler:
   

<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
  <lst name="defaults">
    <!-- surviving values "true" and "false": element names not recoverable -->
    <int name="rows">50</int>
    <str name="fl">itemid,score</str>
    <str name="mlt.fl">title,text</str>
    <int name="mlt.mintf">1</int>
    <int name="mlt.mindf">1</int>
    <int name="mlt.minwl">3</int>
    <int name="mlt.maxqt">15</int>
    <bool name="mlt.boost">true</bool>
    <str name="mlt.qf">text title^5</str>
  </lst>
</requestHandler>



My question is: would it be possible to add a date boost to the items returned
by MoreLikeThis? Or to chain the requests in some way, that is, to call my
dateBoost handler with the itemids returned by MoreLikeThis?
I can get the result I want by sending a second query once I have the results
of MoreLikeThis:

http://localhost:8983/solr/mlt?q=itemid:item0 => item1 item2 item3 item4

http://localhost:8983/solr/select?qq=itemid:(item1 item2 item3
item4)&qt=dateBoost

I've been working with Solr for almost 2 years and haven't yet found a way to
do this, if one exists... but I still wonder if there's any way to get the
same output while saving the 2nd query.

Thanks in advance,
Victor



