commit in solr4 takes a longer time

2013-05-01 Thread vicky desai
Hi all,

I have recently migrated from Solr 3.6 to Solr 4.0. The documents in my core
are constantly being updated, so I fire a commit from my code after every 10
thousand docs. However, since moving from 3.6 to 4.0 I have noticed that for
the same core size a commit takes about twice as long in Solr 4.0 as it did
in Solr 3.6.

Is there any workaround by which I can reduce this time? Any help would be
highly appreciated.
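For reference, the indexing side is roughly the following SolrJ sketch (URL, core
and field names are placeholders); the commitWithin variant shown lets Solr
schedule the commit itself instead of an explicit commit per 10-thousand-doc batch:

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CommitWithinIndexer {
    public static void main(String[] args) throws Exception {
        // URL, core name and fields are placeholders
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/mycore");

        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 10000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-" + i);
            doc.addField("title", "document " + i);
            batch.add(doc);
        }

        // Instead of server.add(batch); server.commit(); after every batch,
        // ask Solr to make the batch searchable within 60 seconds and let it
        // coalesce the (expensive) commits itself.
        server.add(batch, 60000);

        server.shutdown();
    }
}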



--
View this message in context: 
http://lucene.472066.n3.nabble.com/commit-in-solr4-takes-a-longer-time-tp4060396.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr 4.2 rollback not working

2013-05-01 Thread Dipti Srivastava
Hi All,
We have set up a 4.2 SolrCloud with 4 nodes, and while the add/update/delete
operations are working, we are not able to perform a rollback. Is there
something different about this operation vs. the 3.x Solr master/slave config?

Thanks,
Dipti
phone: 408.678.1595  |  cell: 408.806.1970 | email: 
dipti.srivast...@apollogrp.edu
Solutions Engineering and Integration: 
https://wiki.apollogrp.edu/display/NGP/Solutions+Engineering+and+Integration
Support process: 
https://wiki.apollogrp.edu/display/NGP/Classroom+Services+Support



Re: How to get/set customized Solr data source properties?

2013-05-01 Thread Xi Shen
Hi Hoss,

I reviewed the code from other DataSource classes as well; that's how I
learned it should work. And this is my actual code; I created this
DataSource to test my ideas. I am blocked at the very beginning... :(
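
For reference, a minimal sketch of the kind of DataSource I am testing (class
name, type parameter and property names are illustrative, not my real code):

import java.util.Iterator;
import java.util.Map;
import java.util.Properties;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.DataSource;

public class MyDataSource extends DataSource<Iterator<Map<String, Object>>> {

    private String myProp;

    @Override
    public void init(Context context, Properties initProps) {
        // Expecting the custom attribute from <dataSource ... my="value" />
        myProp = initProps.getProperty("my");
    }

    @Override
    public Iterator<Map<String, Object>> getData(String query) {
        // A real implementation would fetch rows based on the query
        return null;
    }

    @Override
    public void close() {
        // nothing to release in this sketch
    }
}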


On Saturday, April 27, 2013, Chris Hostetter wrote:

> :
> : I am working on a DataSource implementation. I want to get some
> customized
> : properties when the *DataSource.init* method is called. I tried to add
> the
> ...
> : 
> : <dataSource ... my="value" />
>
> My understanding from looking at other DataSources is that should work.
>
> : But initProps.getProperty("my") == null.
>
> can you show us some actual code that fails with that dataConfig you mentioned?
>
>
> -Hoss
>


-- 
Regards,
David Shen

http://about.me/davidshen
https://twitter.com/#!/davidshen84


RE: How to deal with cache for facet search when index is always increment?

2013-05-01 Thread Kuai, Ben
Hi

You can give soft-commit a try.
More details are available here: http://wiki.apache.org/solr/NearRealtimeSearch
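
A rough SolrJ sketch of issuing a soft commit (assuming Solr 4.x; the URL is a
placeholder):

import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class SoftCommitExample {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        // ... add or update documents here ...

        // waitFlush=true, waitSearcher=true, softCommit=true:
        // opens a new searcher without flushing segments to disk,
        // which is much cheaper than a hard commit for NRT updates.
        server.commit(true, true, true);

        server.shutdown();
    }
}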


-Original Message-
From: 李威 [mailto:li...@antvision.cn] 
Sent: Thursday, 2 May 2013 12:02 PM
To: solr-user
Cc: 李景泽; 罗佳
Subject: How to deal with cache for facet search when index is always increment?

Hi folks,


For facet search, Solr builds a cache based on the whole set of docs. If I 
import a new doc into the index, the cache becomes stale and has to be built 
again. 
For real-time search, docs may be imported into the index at any time. In this case, 
the cache nearly always needs to be rebuilt, which makes facet search very 
slow.
Do you have any idea how to deal with such a problem?


Thanks,
Wei Li


How to deal with cache for facet search when index is always increment?

2013-05-01 Thread 李威
Hi folks,


For facet search, Solr builds a cache based on the whole set of docs. If I 
import a new doc into the index, the cache becomes stale and has to be built 
again. 
For real-time search, docs may be imported into the index at any time. In this case, 
the cache nearly always needs to be rebuilt, which makes facet search very 
slow.
Do you have any idea how to deal with such a problem?


Thanks,
Wei Li

Re: Server inconsistent state & Core Reload issue

2013-05-01 Thread Ravi Solr
Shawn,
  I don't believe it's the container, because we use the same container
in another setup that has 6 cores and is serving almost 1.8 million
requests a day without a hitch.

If you look at my email, the container that is running Solr got the request
params (http access logs provided in the first email), but when the request
goes through the Solr app/code on the container (probably through request
filters or dispatchers.. I don't know exactly) the params are getting lost,
which is what I am trying to understand. I want to understand under what
situations this may happen.

Having said that, the application that uses this problematic Solr instance
retrieves a large number of facet results for each of 26 facets on every
query, and every query is a group query. Would that cause any issues with
the Solr caches that could lead to the problems I am facing?

With regards to the port number, our paranoid security folks wanted me to
not reveal our ports so I put it as 80 without thinking :-). I assure you
that it's not 80.

Thanks,

Ravi


On Wed, May 1, 2013 at 6:03 PM, Shawn Heisey  wrote:

> On 5/1/2013 3:14 PM, Ravi Solr wrote:
>
>> We are using Solr 3.6.2 with a single core setup on a glassfish server,
>> every 4-5 hours the server gradually gets into a some kind of a
>> inconsistent state and stops accepting any queries giving back cached
>> results. Even the core reload fails giving the following. Has anybody
>> experienced such behavior ? Can anybody help me understand why this might
>> happen ?
>>
>> http://searchserver:80/solr/admin/cores?action=RELOAD&core=core1
>>
>> <response>
>>   <lst name="responseHeader">
>>     <int name="status">0</int>
>>     <int name="QTime">9</int>
>>   </lst>
>> </response>
>>
>
> It is dropping the parameters from the /admin/cores request too, so it
> returns status instead of acting on the RELOAD.
>
> This is acting like a servlet container issue more than a Solr issue. It's
> always possible that it actually is Solr.
>
> It's a little unusual to see Solr running on port 80.  It's not
> impossible, just not the normal setup, because exposing Solr directly to
> the outside world is a very bad idea, so it's a lot safer to have it listen
> on another port.
>
> Is glassfish actually listening on port 80?  If it's not, then you
> probably have something acting as a proxy in front of Solr.  If your
> platform is a UNIX variant or Linux and has a fully functional 'lsof'
> command, the following will tell you which process is bound to port 80:
>
> lsof -nPi | grep ":80"
>
> Can you try running Solr under the jetty that's included with the Solr
> download?  For Solr 3.6.2, this is a slightly modified Jetty 6.  You can't
> use the Jetty 8 that's included with a newer version of Solr.  If port 80
> is a requirement, that should be possible as long as it's running as root.
>
> Thanks,
> Shawn
>
>


Re: Server inconsistent state & Core Reload issue

2013-05-01 Thread Shawn Heisey

On 5/1/2013 3:14 PM, Ravi Solr wrote:

We are using Solr 3.6.2 with a single core setup on a glassfish server,
every 4-5 hours the server gradually gets into a some kind of a
inconsistent state and stops accepting any queries giving back cached
results. Even the core reload fails giving the following. Has anybody
experienced such behavior ? Can anybody help me understand why this might
happen ?

http://searchserver:80/solr/admin/cores?action=RELOAD&core=core1

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">9</int>
  </lst>
</response>


It is dropping the parameters from the /admin/cores request too, so it 
returns status instead of acting on the RELOAD.


This is acting like a servlet container issue more than a Solr issue. 
It's always possible that it actually is Solr.


It's a little unusual to see Solr running on port 80.  It's not 
impossible, just not the normal setup, because exposing Solr directly to 
the outside world is a very bad idea, so it's a lot safer to have it 
listen on another port.


Is glassfish actually listening on port 80?  If it's not, then you 
probably have something acting as a proxy in front of Solr.  If your 
platform is a UNIX variant or Linux and has a fully functional 'lsof' 
command, the following will tell you which process is bound to port 80:


lsof -nPi | grep ":80"

Can you try running Solr under the jetty that's included with the Solr 
download?  For Solr 3.6.2, this is a slightly modified Jetty 6.  You 
can't use the Jetty 8 that's included with a newer version of Solr.  If 
port 80 is a requirement, that should be possible as long as it's 
running as root.


Thanks,
Shawn



Server inconsistent state & Core Reload issue

2013-05-01 Thread Ravi Solr
We are using Solr 3.6.2 with a single-core setup on a Glassfish server.
Every 4-5 hours the server gradually gets into some kind of an
inconsistent state, stops serving queries properly, and gives back cached
results. Even the core reload fails, giving the following. Has anybody
experienced such behavior? Can anybody help me understand why this might
happen?

http://searchserver:80/solr/admin/cores?action=RELOAD&core=core1

<response>
 <lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">9</int>
 </lst>
 <lst name="status">
  <lst name="core1">
   <str name="name">core1</str>
   <str name="instanceDir">/data/solr/core1-home/</str>
   <str name="dataDir">/data/solr/core/core1-data/</str>
   <date name="startTime">2013-05-01T19:16:31.32Z</date>
   <long name="uptime">137850</long>
   <lst name="index">
    <int name="numDocs">21479</int>
    <int name="maxDoc">25170</int>
    <long name="version">1367184551418</long>
    <int name="segmentCount">4</int>
    <bool name="current">true</bool>
    <bool name="hasDeletions">true</bool>
    <str name="directory">org.apache.lucene.store.MMapDirectory:org.apache.lucene.store.MMapDirectory@/data/solr/core/core1-data/index
lockFactory=org.apache.lucene.store.NativeFSLockFactory@71d9673</str>
    <date name="lastModified">2013-05-01T19:15:04Z</date>
   </lst>
  </lst>
 </lst>
</response>


During the inconsistent state, any queries issued to the server lose
the query parameters. We can see the proper queries in the container's http
access logs, but somehow Solr doesn't get the query params at all. Also
note that the "content length" in the container's access logs is always 68935,
which implies it is always returning the same docs irrespective of the query.

If we restart the server everything is back to normal and the same queries
run properly.

SOLR Log
--
[#|2013-05-01T15:20:02.031-0400|INFO|sun-appserver2.1.1|org.apache.solr.core.SolrCore|_ThreadID=20;_ThreadName=httpSSLWorkerThread-9001-1;|[core1]
webapp=/solr path=/select params={} hits=21479 status=0 QTime=17 |#]

[#|2013-05-01T15:20:02.034-0400|INFO|sun-appserver2.1.1|org.apache.solr.core.SolrCore|_ThreadID=24;_ThreadName=httpSSLWorkerThread-9001-4;|[core1]
webapp=/solr path=/select params={} hits=21479 status=0 QTime=13 |#]

[#|2013-05-01T15:20:02.055-0400|INFO|sun-appserver2.1.1|org.apache.solr.core.SolrCore|_ThreadID=23;_ThreadName=httpSSLWorkerThread-9001-3;|[core1]
webapp=/solr path=/select params={} hits=21479 status=0 QTime=13 |#]

[#|2013-05-01T15:20:02.081-0400|INFO|sun-appserver2.1.1|org.apache.solr.core.SolrCore|_ThreadID=25;_ThreadName=httpSSLWorkerThread-9001-5;|[core1]
webapp=/solr path=/select params={} hits=21479 status=0 QTime=14 |#]

[#|2013-05-01T15:20:02.106-0400|INFO|sun-appserver2.1.1|org.apache.solr.core.SolrCore|_ThreadID=19;_ThreadName=httpSSLWorkerThread-9001-0;|[core1]
webapp=/solr path=/select params={} hits=21479 status=0 QTime=14 |#]

[#|2013-05-01T15:20:02.136-0400|INFO|sun-appserver2.1.1|org.apache.solr.core.SolrCore|_ThreadID=22;_ThreadName=httpSSLWorkerThread-9001-2;|[core1]
webapp=/solr path=/select params={} hits=21479 status=0 QTime=16 |#]

[#|2013-05-01T15:20:02.161-0400|INFO|sun-appserver2.1.1|org.apache.solr.core.SolrCore|_ThreadID=20;_ThreadName=httpSSLWorkerThread-9001-1;|[core1]
webapp=/solr path=/select params={} hits=21479 status=0 QTime=15 |#]

[#|2013-05-01T15:20:02.185-0400|INFO|sun-appserver2.1.1|org.apache.solr.core.SolrCore|_ThreadID=24;_ThreadName=httpSSLWorkerThread-9001-4;|[core1]
webapp=/solr path=/select params={} hits=21479 status=0 QTime=14 |#]

[#|2013-05-01T15:20:02.209-0400|INFO|sun-appserver2.1.1|org.apache.solr.core.SolrCore|_ThreadID=23;_ThreadName=httpSSLWorkerThread-9001-3;|[core1]
webapp=/solr path=/select params={} hits=21479 status=0 QTime=14 |#]

[#|2013-05-01T15:20:02.241-0400|INFO|sun-appserver2.1.1|org.apache.solr.core.SolrCore|_ThreadID=25;_ThreadName=httpSSLWorkerThread-9001-5;|[core1]
webapp=/solr path=/select params={} hits=21479 status=0 QTime=16 |#]

[#|2013-05-01T15:20:02.266-0400|INFO|sun-appserver2.1.1|org.apache.solr.core.SolrCore|_ThreadID=19;_ThreadName=httpSSLWorkerThread-9001-0;|[core1]
webapp=/solr path=/select params={} hits=21479 status=0 QTime=15 |#]

[#|2013-05-01T15:20:02.288-0400|INFO|sun-appserver2.1.1|org.apache.solr.core.SolrCore|_ThreadID=22;_ThreadName=httpSSLWorkerThread-9001-2;|[core1]
webapp=/solr path=/select params={} hits=21479 status=0 QTime=14 |#]

[#|2013-05-01T15:20:02.291-0400|INFO|sun-appserver2.1.1|org.apache.solr.core.SolrCore|_ThreadID=20;_ThreadName=httpSSLWorkerThread-9001-1;|[core1]
webapp=/solr path=/select params={} hits=21479 status=0 QTime=15 |#]



Container Access Logs
-
"xx.xxx.xx.xx" "" "01/May/2013:15:20:02 -0500" "GET
/solr/core1/select?q=*%3A*&rows=250&start=0&facet=true&fq=source%3A%
22site.com%22&fq=categories%3A%28%22Music+Venues%22%29&fl=name%2Cnamestring%2Cscore&sort=namestring+desc&wt=javabin&version=2&wt=javabin&version=2
HTTP/1.1" 200 68935

"xx.xxx.xx.xx" "" "01/May/2013:15:20:02 -0500" "GET
/solr/core1/select?q=*%3A*&rows=5&start=0&facet=true&fq=source%3A%22site.com%22&fq=categories%3A%28%22Exhibits%22%29&fq=types%3A%28%22Painting%2FDrawing%22%29&sort=closingdate

any plans to remove int32 limitation on the number of the documents in the index?

2013-05-01 Thread Valery Giner

Dear Solr Developers,

I've been unable to find an answer to the question in the subject line 
of this e-mail, except for a vague one.


We need to be able to index over 2 billion documents.  We were doing well 
without sharding until the number of docs hit the limit (2bln+).  The 
performance was satisfactory for queries, updates and indexing of 
new documents.


That is, except for the need to get around the int32 limit, we don't 
really have a need for setting up distributed Solr.


I wonder whether someone on the Solr team could tell us when, or in what 
version of Solr, we could expect the limit to be removed.


I hope this question may be of interest to someone else :)

--
Thanks,
Val



Re: Unsubscribing from JIRA

2013-05-01 Thread Raymond Wiker
On May 1, 2013, at 19:07 , johnmu...@aol.com wrote:
> Are you saying that because I'm subscribed to dev (which I am), I'm getting 
> JIRA mails too, and the only way I can stop JIRA mails is to unsubscribe from 
> dev?  I don't think so.  I'm subscribed to other projects, both dev and user, 
> and yet I do not receive JIRA mails.
> 

I'm pretty sure that's the case... I subscribed to dev, and got the JIRA mails. 
I unsubscribed from dev, and the JIRA mails stopped.

Re: Handling large no. of ids in solr

2013-05-01 Thread lavesh
I am sending the list of online users and the filter conditions as well.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Handling-large-no-of-ids-in-solr-tp4060218p4060309.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Only return snippets, not content

2013-05-01 Thread Bai Shen
I'll take a look.  Thanks.


On Wed, May 1, 2013 at 8:27 AM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:

> Since you're doing this, you might want to make sure lazy field loading is
> on as well. Unfortunately I can't link you to the wiki because it's still
> down (uh oh), but it's a setting in solrconfig.xml.
>
>
> Michael Della Bitta
>
> 
> Appinions
> 18 East 41st Street, 2nd Floor
> New York, NY 10017-6271
>
> www.appinions.com
>
> Where Influence Isn’t a Game
>
>
> On Wed, May 1, 2013 at 7:48 AM, Bai Shen  wrote:
>
> > Fixed it.  I just had to add "&fl=" with all of the fields except for
> > content that I wanted returned.
> >
> >
> > On Wed, May 1, 2013 at 7:38 AM, Bai Shen 
> wrote:
> >
> > > I have a lot of large files that I've indexed into solr.  Is there a
> way
> > > to have solr return the snippets instead of the content?  I'm only
> > > displaying the snippets to my users, so transferring the content as
> well
> > > just wastes bandwidth.
> > >
> > > Thanks.
> > >
> >
>


Re: Unsubscribing from JIRA

2013-05-01 Thread johnmunir
Are you saying that because I'm subscribed to dev (which I am), I'm getting 
JIRA mails too, and the only way I can stop JIRA mails is to unsubscribe from 
dev?  I don't think so.  I'm subscribed to other projects, both dev and user, 
and yet I do not receive JIRA mails.


--MJ



-Original Message-
From: Alan Woodward 
To: solr-user 
Sent: Wed, May 1, 2013 12:52 pm
Subject: Re: Unsubscribing from JIRA


Hi MJ,

It looks like you're subscribed to the lucene dev list.  Send an email to 
dev-unsubscr...@lucene.apache.org to get yourself taken off the list.

Alan Woodward
www.flax.co.uk


On 1 May 2013, at 17:25, johnmu...@aol.com wrote:

> Hi,
> 
> 
> Can someone show me how to unsubscribe from JIRA?
> 
> 
> Years ago, I subscribed to JIRA and since then I have been receiving emails 
from JIRA for all kind of issues: when an issue is created, closed or commented 
on.  Yes, I looked around and could not figure out how to unsubscribe, but 
maybe 
I didn't look hard enough?
> 
> 
> Here is an example email subject line header from JIRA: "[jira] [Commented] 
(LUCENE-3842) Analyzing Suggester"  I have the same issue from "Jenkins" (and 
example: "[JENKINS] Lucene-Solr-Tests-4.x-Java6 - Build # 1537 - Still 
Failing").
> 
> 
> Thanks in advance!!!
> 
> 
> -MJ


 


Re: Unsubscribing from JIRA

2013-05-01 Thread Alan Woodward
Hi MJ,

It looks like you're subscribed to the lucene dev list.  Send an email to 
dev-unsubscr...@lucene.apache.org to get yourself taken off the list.

Alan Woodward
www.flax.co.uk


On 1 May 2013, at 17:25, johnmu...@aol.com wrote:

> Hi,
> 
> 
> Can someone show me how to unsubscribe from JIRA?
> 
> 
> Years ago, I subscribed to JIRA and since then I have been receiving emails 
> from JIRA for all kind of issues: when an issue is created, closed or 
> commented on.  Yes, I looked around and could not figure out how to 
> unsubscribe, but maybe I didn't look hard enough?
> 
> 
> Here is an example email subject line header from JIRA: "[jira] [Commented] 
> (LUCENE-3842) Analyzing Suggester"  I have the same issue from "Jenkins" (and 
> example: "[JENKINS] Lucene-Solr-Tests-4.x-Java6 - Build # 1537 - Still 
> Failing").
> 
> 
> Thanks in advance!!!
> 
> 
> -MJ



Re: Delete from Solr Cloud 4.0 index..

2013-05-01 Thread Shawn Heisey

On 5/1/2013 8:42 AM, Annette Newton wrote:

It was a single delete with a date range query.  We have 8 machines each
with 35GB memory, 10GB is allocated to the JVM.  Garbage collection has
always been a problem for us with the heap not clearing on Full garbage
collection.  I don't know what is being held in memory and refuses to be
collected.

I have seen your java heap configuration on previous posts and it's very
like ours except that we are not currently using LargePages (I don't know
how much difference that has made to your memory usage).

We have tried various configurations around Java including the G1 collector
(which was awful) but all settings seem to leave the old generation at
least 50% full, so it quickly fills up again.

-Xms10240M -Xmx10240M -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
-XX:+CMSParallelRemarkEnabled -XX:NewRatio=2 -XX:+CMSScavengeBeforeRemark
-XX:CMSWaitDuration=5000  -XX:+CMSClassUnloadingEnabled
-XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCMSInitiatingOccupancyOnly

If I could only figure out what keeps the heap to the current level I feel
we would be in a better place with solr.


With a single delete request, it was probably the commit that was very 
slow and caused the problem, not the delete itself.  This has been my 
experience with my large indexes.


My attempts with the G1 collector were similarly awful.  The idea seems 
sound on paper, but Oracle needs to do some work in making it better for 
large heaps.  Because my GC tuning was not very disciplined, I do not 
know how much impact UseLargePages is having.


Your overall RAM allocation should be good.  If these machines aren't 
being used for other software, then you have 24-25GB of memory available 
for caching your index, which should be very good with 26GB of index for 
that machine.


Looking over your message history, I see that you're using Amazon EC2. 
Solr performs much better on bare metal, although the EC2 instance 
you're using is probably very good.


SolrCloud is optimized for machines that are on the same Ethernet LAN. 
Communication between EC2 VMs (especially if they are not located in 
nearby data centers) will have some latency and a potential for dropped 
packets.  I'm going to proceed with the idea that EC2 and virtualization 
are not the problems here.


I'm not really surprised to hear that with an index of your size that so 
much of a 10GB heap is retained.  There may be things that could reduce 
your memory usage, so could you share your solrconfig.xml and schema.xml 
with a paste site that does XML highlighting (pastie.org being a good 
example), and give us an idea of how often you update and commit?  Feel 
free to search/replace sensitive information, as long that work is 
consistent and you don't entirely remove it.  Armed with that 
information, we can have a discussion about your needs and how to 
achieve them.


Do you know how long cache autowarming is taking?  The cache statistics 
should tell you how long it took on the last commit.


Some examples of typical real-world queries would be helpful too. 
Examples should be relatively complex for your setup, but not 
worst-case.  An example query for my setup that meets this requirement 
would probably be 4-10KB in size ... some of them are 20KB!


Not really related - a question about one of your old messages that 
never seemed to get resolved:  Are you still seeing a lot of CLOSE_WAIT 
connections in your TCP table?  A later message from you mentioned 
4.2.1, so I'm wondering specifically about that version.


Thanks,
Shawn



Unsubscribing from JIRA

2013-05-01 Thread johnmunir
Hi,


Can someone show me how to unsubscribe from JIRA?


Years ago, I subscribed to JIRA and since then I have been receiving emails 
from JIRA for all kinds of issues: when an issue is created, closed or commented 
on.  Yes, I looked around and could not figure out how to unsubscribe, but 
maybe I didn't look hard enough?


Here is an example email subject line header from JIRA: "[jira] [Commented] 
(LUCENE-3842) Analyzing Suggester".  I have the same issue from "Jenkins" (an 
example: "[JENKINS] Lucene-Solr-Tests-4.x-Java6 - Build # 1537 - Still 
Failing").


Thanks in advance!!!


-MJ


SolrCloud facet query repeatably fails with "No live SolrServers" for some terms, not all

2013-05-01 Thread Brett Hoerner
An example:
https://gist.github.com/bretthoerner/2ffc362450bcd4c2487a

I'll note that all shards and replicas show as "Up" (green) in the Admin UI.

Does anyone know how this could happen? I can repeat this over and over
with the same terms. It was my understanding that something like a facet
query would need to go to *all* shards for any query (I'm using the default
SolrCloud sharding mechanism, nothing special).

How could a text field search for 'happy' always work and 'austin' always
return an error, shouldn't that "down server" be hit for a 'happy' query
also?

Thanks,
Brett


RE: java.lang.NullPointerException. I am trying to use CachedSqlEntityProcessor

2013-05-01 Thread Dyer, James
If I remember correctly, 3.6 DIH had bugs related to CachedSqlEntityProcessor 
and some were fixed in 3.6.1, 3.6.2, but some were not fixed until 4.0.  You 
might want to use a 3.5 DIH jar with your 3.6 Solr.  Or, post your 
data-config.xml and maybe someone can figure something out.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: srinalluri [mailto:nallurisr...@yahoo.com] 
Sent: Tuesday, April 30, 2013 10:53 AM
To: solr-user@lucene.apache.org
Subject: RE: java.lang.NullPointerException. I am trying to use 
CachedSqlEntityProcessor

Thanks James for your reply.

I have updated to 3.6.2. Now the NullPointerException is gone. But the
entities with CachedSqlEntityProcessor don't add anything to solr.

And entities without CachedSqlEntityProcessor are working fine.

Why don't the entities with CachedSqlEntityProcessor do anything? What is wrong
with my entity?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/java-lang-NullPointerException-I-am-trying-to-use-CachedSqlEntityProcessor-tp4059815p4060043.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: Handling large no. of ids in solr

2013-05-01 Thread adityab
Based on the fq (with the "-" in it) you posted, are you trying to filter out all
the offline users?

Another option: do you need the complete list in one request? Did you try
splitting them into batches of, say, 100 ids per Solr query?
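
Something along these lines, as a rough SolrJ sketch (the field name "userId"
and the batch size are illustrative):

import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class BatchedIdQueries {
    // Issue one query per batch of ids instead of one huge request.
    static void queryInBatches(SolrServer server, List<String> onlineUserIds) throws Exception {
        int batchSize = 100;
        for (int from = 0; from < onlineUserIds.size(); from += batchSize) {
            int to = Math.min(from + batchSize, onlineUserIds.size());
            List<String> batch = onlineUserIds.subList(from, to);

            // Build a filter query like userId:("a" OR "b" OR ...)
            StringBuilder fq = new StringBuilder("userId:(");
            for (int i = 0; i < batch.size(); i++) {
                if (i > 0) fq.append(" OR ");
                fq.append('"').append(batch.get(i)).append('"');
            }
            fq.append(')');

            SolrQuery query = new SolrQuery("*:*");
            query.addFilterQuery(fq.toString());
            query.setRows(batchSize);

            QueryResponse rsp = server.query(query);
            // merge rsp.getResults() into the caller's result set here
        }
    }
}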




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Handling-large-no-of-ids-in-solr-tp4060218p4060282.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Maximum number of facet query ina single query

2013-05-01 Thread Jack Krupansky
You mean 6000 filter queries? Or do they really have 6000 faceted fields in 
a single query?!


Even so, I wouldn't recommend that an average new Solr developer should have 
either 6000 fields in a single document or 6000 query terms or even 6000 
parameters. I mean, sure, you can try it and if it does happen to work, 
great, go for it, but if it doesn't work for 6000 or 100 or even more than 
about 20 facets/filters, I wouldn't complain too loudly about Solr (or 
Lucene) if it doesn't achieve sub-100 ms query time.


-- Jack Krupansky

-Original Message- 
From: Otis Gospodnetic

Sent: Tuesday, April 30, 2013 10:15 PM
To: solr-user@lucene.apache.org
Subject: Re: Maximum number of facet query ina single query

FWIW, one of our current clients runs queries with 6000 facet queries...

Otis
Solr & ElasticSearch Support
http://sematext.com/


On Apr 30, 2013 5:22 AM, "vicky desai"  wrote:


Hi,

Is there any upper limit on the number of facet queries I can include in a
single query? Also, is there any performance hit if I include too many facet
queries in a single query?

Any help would be appreciated



--
View this message in context:
http://lucene.472066.n3.nabble.com/Maximum-number-of-facet-query-ina-single-query-tp4059926.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: How to Recovery Backup Snapshot at SolrCloud?

2013-05-01 Thread Furkan KAMACI
So does backup command is used for just to get more consistent index folder
compared to just using index folder for backup?


2013/5/1 Michael Della Bitta 

> Yeah, it's a consistency problem. Copying all those files takes time, and
> without something with some knowledge of how Lucene works managing the
> atomicity of the work, you might end up with a segments file that doesn't
> match the segments you actually copied.
>
>
> Michael Della Bitta
>
> 
> Appinions
> 18 East 41st Street, 2nd Floor
> New York, NY 10017-6271
>
> www.appinions.com
>
> Where Influence Isn’t a Game
>
>
> On Wed, May 1, 2013 at 8:59 AM, Furkan KAMACI  >wrote:
>
> > Sorry but what will I do? Will I copy everything under snapshot folder
> into
> > under index folder? If I don't run backup command and just copy index
> > folder anywhere else what is the difference between them (is it something
> > like consistency for if any writing operation for segment files did not
> > finished, or does backup command retrieves indexes data at RAM)
> >
> >
> >
> > 2013/5/1 Timothy Potter 
> >
> > > I agree with Michael that you'll only ever need your backup if you
> > > lose all nodes hosting a shard (leader + all other replicas), so the
> > > tlog doesn't really factor in when recovering from backup.
> > >
> > > The snapshot created by the replication handler is the index only and
> > > it makes most sense in my mind to remove the tlog before firing up a
> > > new node using the snapshot.
> > >
> > > The way I see it, the backup would be needed if you lost all nodes
> > > hosting a shard, then you need to 1) recover one node (the leader)
> > > from backup, 2) re-index any documents that were indexed between the
> > > time of your last snapshot and the time of the failure, then 3) bring
> > > additional replicas online as needed. The additional replicas will
> > > sync with the leader, snap pulling the entire index. You could do it
> > > 1,2,3 or 1,3,2.
> > >
> > > Cheers,
> > > Tim
> > >
> > > On Tue, Apr 30, 2013 at 12:45 PM, Furkan KAMACI <
> furkankam...@gmail.com>
> > > wrote:
> > > > I had index and tlog folder under my data folder. I have a snapshot
> > > folder
> > > > too when I make backup. However what will I do next if I want to use
> > > > backup, will I remove index and tlog folders and put just my snapshot
> > > > folder? What folks do?
> > > >
> > > > 2013/4/30 Michael Della Bitta 
> > > >
> > > >> Presumably you'd only be restoring a backup in the face of a
> > > catastrophe.
> > > >>
> > > >> Yes, you'd need to stop the node. And the transaction logs may not
> be
> > > >> useful in this case. You'd have trouble reconciling them with the
> > > version
> > > >> of the index in your backup I would think.
> > > >>
> > > >> Anybody who knows more about this want to chime in?
> > > >>
> > > >>
> > > >> Michael Della Bitta
> > > >>
> > > >> 
> > > >> Appinions
> > > >> 18 East 41st Street, 2nd Floor
> > > >> New York, NY 10017-6271
> > > >>
> > > >> www.appinions.com
> > > >>
> > > >> Where Influence Isn’t a Game
> > > >>
> > > >>
> > > >> On Tue, Apr 30, 2013 at 11:03 AM, Furkan KAMACI <
> > furkankam...@gmail.com
> > > >> >wrote:
> > > >>
> > > >> > Should I stop the node first? And what will happen to transaction
> > > logs?
> > > >> > Should I backup it too?
> > > >> >
> > > >> > 2013/4/30 Michael Della Bitta 
> > > >> >
> > > >> > > That directory is the data directory for the core... you'd just
> > > swap it
> > > >> > in.
> > > >> > >
> > > >> > >
> > > >> > > Michael Della Bitta
> > > >> > >
> > > >> > > 
> > > >> > > Appinions
> > > >> > > 18 East 41st Street, 2nd Floor
> > > >> > > New York, NY 10017-6271
> > > >> > >
> > > >> > > www.appinions.com
> > > >> > >
> > > >> > > Where Influence Isn’t a Game
> > > >> > >
> > > >> > >
> > > >> > > On Tue, Apr 30, 2013 at 8:06 AM, Furkan KAMACI <
> > > furkankam...@gmail.com
> > > >> > > >wrote:
> > > >> > >
> > > >> > > > Hi Folks;
> > > >> > > >
> > > >> > > > I can backup my indexes at SolrCloud via
> > > >> > > > http://_master_host_:_port_/solr/replication?command=backup
> > > >> > > > and it creates a file called snapshot. I know that I should
> pull
> > > that
> > > >> > > > directory any other safe place (a backup store) However what
> > > should I
> > > >> > do
> > > >> > > to
> > > >> > > > make a recovery from that backup file?
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
>


Re: How to recover from "Error opening new searcher" when machine crashed while indexing

2013-05-01 Thread Furkan KAMACI
Sorry, but how do you use the CheckIndex tool? Do you use Luke, or does Solr have
built-in functionality?
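
(For what it's worth, CheckIndex is a standalone class that ships inside the
Lucene core jar rather than a Luke feature. Invocation is roughly the following,
with the jar name and index path adjusted to your install; -fix only drops
references to broken segments and cannot recreate a missing segments_N file:)

java -cp lucene-core-4.0.0.jar org.apache.lucene.index.CheckIndex /path/to/solr/data/index
java -cp lucene-core-4.0.0.jar org.apache.lucene.index.CheckIndex /path/to/solr/data/index -fix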

2013/5/1 Otis Gospodnetic 

> Was afraid of that and wondering if CheckIndex could regenerate the
> segments file based on segments it finds in the index dir?
>
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
>
>
>
> On May 1, 2013 7:15 AM, "Michael McCandless" 
> wrote:
>
> > Alas I think CheckIndex can't do much here: there is no segments file,
> > so you'll have to reindex from scratch.
> >
> > Just to check: did you ever called commit while building the index
> > before the machine crashed?
> >
> > Mike McCandless
> >
> > http://blog.mikemccandless.com
> >
> >
> > On Tue, Apr 30, 2013 at 8:17 PM, Otis Gospodnetic
> >  wrote:
> > > Hi,
> > >
> > > Try running the CheckIndex tool.
> > >
> > > Otis
> > > Solr & ElasticSearch Support
> > > http://sematext.com/
> > > On Apr 30, 2013 3:10 PM, "Utkarsh Sengar" 
> wrote:
> > >
> > >> Solr 4.0 was indexing data and the machine crashed.
> > >>
> > >> Any suggestions on how to recover my index since I don't want to
> delete
> > my
> > >> data directory?
> > >>
> > >> When I try to start it again, I get this error:
> > >> ERROR 12:01:46,493 Failed to load Solr core: xyz.index1
> > >> ERROR 12:01:46,493 Cause:
> > >> ERROR 12:01:46,494 Error opening new searcher
> > >> org.apache.solr.common.SolrException: Error opening new searcher
> > >> at org.apache.solr.core.SolrCore.(SolrCore.java:701)
> > >> at org.apache.solr.core.SolrCore.(SolrCore.java:564)
> > >> at
> > >>
> > >>
> >
> org.apache.solr.core.CassandraCoreContainer.load(CassandraCoreContainer.java:213)
> > >> at
> > >>
> >
> com.datastax.bdp.plugin.SolrCorePlugin.activateImpl(SolrCorePlugin.java:66)
> > >> at
> > >>
> > >>
> >
> com.datastax.bdp.plugin.PluginManager$PluginInitializer.call(PluginManager.java:161)
> > >> at
> > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> > >> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> > >> at
> > >>
> > >>
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> > >> at
> > >>
> > >>
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> > >> at java.lang.Thread.run(Thread.java:662)
> > >> Caused by: org.apache.solr.common.SolrException: Error opening new
> > searcher
> > >> at
> org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1290)
> > >> at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1402)
> > >> at org.apache.solr.core.SolrCore.(SolrCore.java:675)
> > >> ... 9 more
> > >> Caused by: org.apache.lucene.index.IndexNotFoundException: no
> segments*
> > >> file found in
> > NRTCachingDirectory(org.apache.lucene.store.NIOFSDirectory@
> > >> /media/SSD/data/solr.data/rlcatalogks.prodinfo/index
> > >> lockFactory=org.apache.lucene.store.NativeFSLockFactory@d7581b;
> > >> maxCacheMB=48.0 maxMergeSizeMB=4.0): files: [_73ne_nrm.cfs,
> > >> _73ng_Lucene40_0.tip, _73nh_nrm.cfs, _73ng_Lucene40_0.tim, _73nf.fnm,
> > >> _73n5_Lucene40_0.frq, _73ne.fdt, _73nh.fdx, _73ne_nrm.cfe, _73ne.fdx,
> > >> _73ne_Lucene40_0.tim, _73ne.si, _73ni.fnm, _73nh_Lucene40_0.prx,
> > >> _73ni.fdt,
> > >> _73n5.si, _73ne_Lucene40_0.tip, _73nf_Lucene40_0.frq,
> > >> _73nf_Lucene40_0.prx,
> > >> _73nf_nrm.cfe, _73ne_Lucene40_0.frq, _73ng_Lucene40_0.prx,
> > >> _73nf_Lucene40_0.tip, _73n5.fdx, _73ng_Lucene40_0.frq, _73ng.fnm,
> > >> _73ni.fdx, _73n5.fnm, _73nf_Lucene40_0.tim, _73ni.si, _73n5.fdt,
> > >> _73nf_nrm.cfs, _73nh_nrm.cfe, _73ni_Lucene40_0.frq, _73ng.fdx,
> > >> _73ne_Lucene40_0.prx, _73nh.fnm, _73nh_Lucene40_0.tip,
> > >> _73nh_Lucene40_0.tim, _73nh.si, _73n5_Lucene40_0.tip,
> > >> _73ni_Lucene40_0.prx,
> > >> _73n5_Lucene40_0.tim, _73nf.si, _73ng_nrm.cfe, _73n5_Lucene40_0.prx,
> > >> _392j_42f.del, _73ng.fdt, _73ng.si, _73ni_nrm.cfe, _73n5_nrm.cfe,
> > >> _73ni_nrm.cfs, _73nf.fdx, _73ni_Lucene40_0.tip, _73n5_nrm.cfs,
> > >> _73ni_Lucene40_0.tim, _73nf.fdt, _73ne.fnm, _73nh.fdt,
> > >> _73nh_Lucene40_0.frq, _73ng_nrm.cfs]
> > >>
> > >>
> > >> --
> > >> Thanks,
> > >> -Utkarsh
> > >>
> >
>


EmbeddedSolrServer

2013-05-01 Thread Peri Subrahmanya
I'm trying to use the EmbeddedSolrServer and here is my sample code:

CoreContainer.Initializer initializer = new CoreContainer.Initializer();
CoreContainer coreContainer = initializer.initialize();
EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "");

Upon running I get the following exception - java.lang.NoClassDefFoundError:
org/apache/solr/common/cloud/ZooKeeperException.

I'm not sure why it's complaining about ZooKeeper. Any ideas please?

Thank you,
Peri Subrahmanya






Re: Delete from Solr Cloud 4.0 index..

2013-05-01 Thread Annette Newton
Hi Shawn

Thanks for the reply.

It was a single delete with a date range query.  We have 8 machines each
with 35GB memory, 10GB is allocated to the JVM.  Garbage collection has
always been a problem for us with the heap not clearing on Full garbage
collection.  I don't know what is being held in memory and refuses to be
collected.

I have seen your java heap configuration on previous posts and it's very
like ours except that we are not currently using LargePages (I don't know
how much difference that has made to your memory usage).

We have tried various configurations around Java including the G1 collector
(which was awful) but all settings seem to leave the old generation at
least 50% full, so it quickly fills up again.

-Xms10240M -Xmx10240M -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
-XX:+CMSParallelRemarkEnabled -XX:NewRatio=2 -XX:+CMSScavengeBeforeRemark
-XX:CMSWaitDuration=5000  -XX:+CMSClassUnloadingEnabled
-XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCMSInitiatingOccupancyOnly

If I could only figure out what keeps the heap to the current level I feel
we would be in a better place with solr.

Thanks.



On 1 May 2013 14:40, Shawn Heisey  wrote:

> On 5/1/2013 3:39 AM, Annette Newton wrote:
> > We have a 4 shard - 2 replica solr cloud setup, each with about 26GB of
> > index.  A total of 24,000,000.  We issued a rather large delete yesterday
> > morning to reduce that size by about half, this resulted in the loss of
> all
> > shards while the delete was taking place, but when it had apparently
> > finished as soon as we started writing again we continued to lose shards.
> >
> > We have also issued much smaller deletes and lost shards but before they
> > have always come back ok.  This time we couldn't keep them online.  We
> > ended up rebuilding our cloud setup and switching over to it.
> >
> > Is there a better process for deleting documents?  Is this expected
> > behaviour?
>
> How was the delete composed?  Was it a single request with a simple
> query, or was it a huge list of IDs or a huge query?  Was it millions
> of individual delete queries?  All of those should be fine, but the last
> option is the hardest on Solr, especially if you are doing a lot of
> commits at the same time.  You might need to increase the zkTimeout
> value on your startup commandline or in solr.xml.
>
> How many machines do your eight SolrCloud replicas live on? How much RAM
> do they have? How much of that memory is allocated to the Java heap?
>
> Assuming that your SolrCloud is living on eight separate machines that
> each have a 26GB index, I hope that you have 16 to 32 GB of RAM on each
> of those machines, and that a large chunk of that RAM is not allocated
> to Java or any other program.  If you don't, then it will be very
> difficult to get good performance out of Solr, especially for index
> commits.  If you have multiple 26GB shards per machine, you'll need even
> more free memory.  The free memory is used to cache your index files.
>
> Another possible problem here is Java garbage collection pauses.  If you
> have a large max heap and don't have a tuned GC configuration, then the
> only way to fix this is to reduce your heap and/or to tune Java's
> garbage collection.
>
> Thanks,
> Shawn
>
>


-- 

Annette Newton

Database Administrator

ServiceTick Ltd



T:+44(0)1603 618326



Seebohm House, 2-4 Queen Street, Norwich, England NR2 4SQ

www.servicetick.com

*www.sessioncam.com*



Re: How to Recovery Backup Snapshot at SolrCloud?

2013-05-01 Thread Michael Della Bitta
Yeah, it's a consistency problem. Copying all those files takes time, and
without something with some knowledge of how Lucene works managing the
atomicity of the work, you might end up with a segments file that doesn't
match the segments you actually copied.


Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Wed, May 1, 2013 at 8:59 AM, Furkan KAMACI wrote:

> Sorry but what will I do? Will I copy everything under snapshot folder into
> under index folder? If I don't run backup command and just copy index
> folder anywhere else what is the difference between them (is it something
> like consistency for if any writing operation for segment files did not
> finished, or does backup command retrieves indexes data at RAM)
>
>
>
> 2013/5/1 Timothy Potter 
>
> > I agree with Michael that you'll only ever need your backup if you
> > lose all nodes hosting a shard (leader + all other replicas), so the
> > tlog doesn't really factor in when recovering from backup.
> >
> > The snapshot created by the replication handler is the index only and
> > it makes most sense in my mind to remove the tlog before firing up a
> > new node using the snapshot.
> >
> > The way I see it, the backup would be needed if you lost all nodes
> > hosting a shard, then you need to 1) recover one node (the leader)
> > from backup, 2) re-index any documents that were indexed between the
> > time of your last snapshot and the time of the failure, then 3) bring
> > additional replicas online as needed. The additional replicas will
> > sync with the leader, snap pulling the entire index. You could do it
> > 1,2,3 or 1,3,2.
> >
> > Cheers,
> > Tim
> >
> > On Tue, Apr 30, 2013 at 12:45 PM, Furkan KAMACI 
> > wrote:
> > > I had index and tlog folder under my data folder. I have a snapshot
> > folder
> > > too when I make backup. However what will I do next if I want to use
> > > backup, will I remove index and tlog folders and put just my snapshot
> > > folder? What folks do?
> > >
> > > 2013/4/30 Michael Della Bitta 
> > >
> > >> Presumably you'd only be restoring a backup in the face of a
> > catastrophe.
> > >>
> > >> Yes, you'd need to stop the node. And the transaction logs may not be
> > >> useful in this case. You'd have trouble reconciling them with the
> > version
> > >> of the index in your backup I would think.
> > >>
> > >> Anybody who knows more about this want to chime in?
> > >>
> > >>
> > >> Michael Della Bitta
> > >>
> > >> 
> > >> Appinions
> > >> 18 East 41st Street, 2nd Floor
> > >> New York, NY 10017-6271
> > >>
> > >> www.appinions.com
> > >>
> > >> Where Influence Isn’t a Game
> > >>
> > >>
> > >> On Tue, Apr 30, 2013 at 11:03 AM, Furkan KAMACI <
> furkankam...@gmail.com
> > >> >wrote:
> > >>
> > >> > Should I stop the node first? And what will happen to transaction
> > logs?
> > >> > Should I backup it too?
> > >> >
> > >> > 2013/4/30 Michael Della Bitta 
> > >> >
> > >> > > That directory is the data directory for the core... you'd just
> > swap it
> > >> > in.
> > >> > >
> > >> > >
> > >> > > Michael Della Bitta
> > >> > >
> > >> > > 
> > >> > > Appinions
> > >> > > 18 East 41st Street, 2nd Floor
> > >> > > New York, NY 10017-6271
> > >> > >
> > >> > > www.appinions.com
> > >> > >
> > >> > > Where Influence Isn’t a Game
> > >> > >
> > >> > >
> > >> > > On Tue, Apr 30, 2013 at 8:06 AM, Furkan KAMACI <
> > furkankam...@gmail.com
> > >> > > >wrote:
> > >> > >
> > >> > > > Hi Folks;
> > >> > > >
> > >> > > > I can backup my indexes at SolrCloud via
> > >> > > > http://_master_host_:_port_/solr/replication?command=backup
> > >> > > > and it creates a file called snapshot. I know that I should pull
> > that
> > >> > > > directory any other safe place (a backup store) However what
> > should I
> > >> > do
> > >> > > to
> > >> > > > make a recovery from that backup file?
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
>


Re: How to recover from "Error opening new searcher" when machine crashed while indexing

2013-05-01 Thread Otis Gospodnetic
Was afraid of that and wondering if CheckIndex could regenerate the
segments file based on segments it finds in the index dir?

Otis
Solr & ElasticSearch Support
http://sematext.com/



On May 1, 2013 7:15 AM, "Michael McCandless" 
wrote:

> Alas I think CheckIndex can't do much here: there is no segments file,
> so you'll have to reindex from scratch.
>
> Just to check: did you ever called commit while building the index
> before the machine crashed?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Tue, Apr 30, 2013 at 8:17 PM, Otis Gospodnetic
>  wrote:
> > Hi,
> >
> > Try running the CheckIndex tool.
> >
> > Otis
> > Solr & ElasticSearch Support
> > http://sematext.com/
> > On Apr 30, 2013 3:10 PM, "Utkarsh Sengar"  wrote:
> >
> >> Solr 4.0 was indexing data and the machine crashed.
> >>
> >> Any suggestions on how to recover my index since I don't want to delete
> my
> >> data directory?
> >>
> >> When I try to start it again, I get this error:
> >> ERROR 12:01:46,493 Failed to load Solr core: xyz.index1
> >> ERROR 12:01:46,493 Cause:
> >> ERROR 12:01:46,494 Error opening new searcher
> >> org.apache.solr.common.SolrException: Error opening new searcher
> >> at org.apache.solr.core.SolrCore.(SolrCore.java:701)
> >> at org.apache.solr.core.SolrCore.(SolrCore.java:564)
> >> at
> >>
> >>
> org.apache.solr.core.CassandraCoreContainer.load(CassandraCoreContainer.java:213)
> >> at
> >>
> com.datastax.bdp.plugin.SolrCorePlugin.activateImpl(SolrCorePlugin.java:66)
> >> at
> >>
> >>
> com.datastax.bdp.plugin.PluginManager$PluginInitializer.call(PluginManager.java:161)
> >> at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >> at
> >>
> >>
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> >> at
> >>
> >>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> >> at java.lang.Thread.run(Thread.java:662)
> >> Caused by: org.apache.solr.common.SolrException: Error opening new
> searcher
> >> at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1290)
> >> at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1402)
> >> at org.apache.solr.core.SolrCore.(SolrCore.java:675)
> >> ... 9 more
> >> Caused by: org.apache.lucene.index.IndexNotFoundException: no segments*
> >> file found in
> NRTCachingDirectory(org.apache.lucene.store.NIOFSDirectory@
> >> /media/SSD/data/solr.data/rlcatalogks.prodinfo/index
> >> lockFactory=org.apache.lucene.store.NativeFSLockFactory@d7581b;
> >> maxCacheMB=48.0 maxMergeSizeMB=4.0): files: [_73ne_nrm.cfs,
> >> _73ng_Lucene40_0.tip, _73nh_nrm.cfs, _73ng_Lucene40_0.tim, _73nf.fnm,
> >> _73n5_Lucene40_0.frq, _73ne.fdt, _73nh.fdx, _73ne_nrm.cfe, _73ne.fdx,
> >> _73ne_Lucene40_0.tim, _73ne.si, _73ni.fnm, _73nh_Lucene40_0.prx,
> >> _73ni.fdt,
> >> _73n5.si, _73ne_Lucene40_0.tip, _73nf_Lucene40_0.frq,
> >> _73nf_Lucene40_0.prx,
> >> _73nf_nrm.cfe, _73ne_Lucene40_0.frq, _73ng_Lucene40_0.prx,
> >> _73nf_Lucene40_0.tip, _73n5.fdx, _73ng_Lucene40_0.frq, _73ng.fnm,
> >> _73ni.fdx, _73n5.fnm, _73nf_Lucene40_0.tim, _73ni.si, _73n5.fdt,
> >> _73nf_nrm.cfs, _73nh_nrm.cfe, _73ni_Lucene40_0.frq, _73ng.fdx,
> >> _73ne_Lucene40_0.prx, _73nh.fnm, _73nh_Lucene40_0.tip,
> >> _73nh_Lucene40_0.tim, _73nh.si, _73n5_Lucene40_0.tip,
> >> _73ni_Lucene40_0.prx,
> >> _73n5_Lucene40_0.tim, _73nf.si, _73ng_nrm.cfe, _73n5_Lucene40_0.prx,
> >> _392j_42f.del, _73ng.fdt, _73ng.si, _73ni_nrm.cfe, _73n5_nrm.cfe,
> >> _73ni_nrm.cfs, _73nf.fdx, _73ni_Lucene40_0.tip, _73n5_nrm.cfs,
> >> _73ni_Lucene40_0.tim, _73nf.fdt, _73ne.fnm, _73nh.fdt,
> >> _73nh_Lucene40_0.frq, _73ng_nrm.cfs]
> >>
> >>
> >> --
> >> Thanks,
> >> -Utkarsh
> >>
>


Re: Delete from Solr Cloud 4.0 index..

2013-05-01 Thread Shawn Heisey
On 5/1/2013 3:39 AM, Annette Newton wrote:
> We have a 4 shard - 2 replica solr cloud setup, each with about 26GB of
> index.  A total of 24,000,000.  We issued a rather large delete yesterday
> morning to reduce that size by about half, this resulted in the loss of all
> shards while the delete was taking place, but when it had apparently
> finished as soon as we started writing again we continued to lose shards.
> 
> We have also issued much smaller deletes and lost shards but before they
> have always come back ok.  This time we couldn't keep them online.  We
> ended up rebuilding our cloud setup and switching over to it.
> 
> Is there a better process for deleting documents?  Is this expected
> behaviour?

How was the delete composed?  Was it a single request with a simple
query, or was it a huge list of IDs or a huge query?  Was it millions
of individual delete queries?  All of those should be fine, but the last
option is the hardest on Solr, especially if you are doing a lot of
commits at the same time.  You might need to increase the zkTimeout
value on your startup commandline or in solr.xml.
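
For reference, the "single request with a simple query" style looks roughly like
this in SolrJ (the zkHost string, collection name and field are placeholders):

import org.apache.solr.client.solrj.impl.CloudSolrServer;

public class DateRangeDelete {
    public static void main(String[] args) throws Exception {
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("collection1");

        // One delete request with a simple date-range query...
        server.deleteByQuery("timestamp:[* TO 2012-11-01T00:00:00Z]");

        // ...followed by a single commit; the commit (segment merging and
        // searcher reopening) is usually the expensive part, not the delete.
        server.commit();

        server.shutdown();
    }
}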

How many machines do your eight SolrCloud replicas live on? How much RAM
do they have? How much of that memory is allocated to the Java heap?

Assuming that your SolrCloud is living on eight separate machines that
each have a 26GB index, I hope that you have 16 to 32 GB of RAM on each
of those machines, and that a large chunk of that RAM is not allocated
to Java or any other program.  If you don't, then it will be very
difficult to get good performance out of Solr, especially for index
commits.  If you have multiple 26GB shards per machine, you'll need even
more free memory.  The free memory is used to cache your index files.

Another possible problem here is Java garbage collection pauses.  If you
have a large max heap and don't have a tuned GC configuration, then the
only way to fix this is to reduce your heap and/or to tune Java's
garbage collection.

Thanks,
Shawn



Re: index operations

2013-05-01 Thread Shawn Heisey
On 5/1/2013 2:28 AM, Mav Peri wrote:
> We are seeing a large number of commit index operations on solr4
> master/slave setup (150 to 200+ operations). 
> 
> We don't initiate the commits manually as we are using  auto commit . I
> believe this results in search queries becoming slow/unresponsive over
> the course of a few hours  given sufficient load.
> 
> Any ideas or suggestions?

Including basic information is the key to getting help on a mailing
list.  In this case, basic information includes:

* Your Solr configuration, especially autoCommit.
** The whole configuration is better than a snippet.
* Your evidence, such as log files.

For a very small logs or config snippets with short lines, it's OK to
include it right in your email.  If it's very large or has long lines,
you need to use another method that will preserve readability.

The mailing list doesn't accept attachments.  Some of the portals where
people read the list do accept them, but the underlying list doesn't.
For configs or logs that fit in a few dozen kilobytes, you can use a
paste website like pastie.org.  Most paste websites have the ability to
do syntax highlighting on formats like XML or Java, and that is a major
key to readability.  Just include the resulting URL in your email.

If the things you want to include are larger than a few dozen KB, a file
sharing site like dropbox is better, as long as getting the file doesn't
require signing up, clicking through a large number of ads, or searching
a page full of irrelevant download links for the right one.  Dropbox
does not require any of these things.

If you need to sanitize the included data to remove sensitive
information, just be sure that you do a consistent search/replace so we
can still make sense of it even though part of the information is concealed.

Thanks,
Shawn



Re: index operations

2013-05-01 Thread Furkan KAMACI
If you use Solr 4.x and SolrCloud there is no master/slave architecture as
there was before. You can change the autoSoftCommit and autoCommit times
in solrconfig.xml. You can also consider using commitWithin; it is
explained here: http://wiki.apache.org/solr/UpdateXmlMessages
Besides those options, if you add new Solr nodes to your SolrCloud they will
be added as replicas for shards in a round-robin process. When you add new
replicas your search performance will increase as expected.

2013/5/1 Mav Peri 

>  Hi there,
>
>  We are seeing a large number of commit index operations on solr4
> master/slave setup (150 to 200+ operations).
>
>  We don't initiate the commits manually as we are using  auto commit . I
> believe this results in search queries becoming slow/unresponsive over the
> course of a few hours  given sufficient load.
>
>  Any ideas or suggestions?
>
>  Many thanks
>
>  Mav
>
>
>
>
>
>
>


Re: How to Recovery Backup Snapshot at SolrCloud?

2013-05-01 Thread Furkan KAMACI
Sorry, but what should I do? Should I copy everything under the snapshot folder
into the index folder? If I don't run the backup command and just copy the
index folder somewhere else, what is the difference between the two (is it
something like consistency, in case a write of segment files has not
finished, or does the backup command also retrieve index data still in RAM)?



2013/5/1 Timothy Potter 

> I agree with Michael that you'll only ever need your backup if you
> lose all nodes hosting a shard (leader + all other replicas), so the
> tlog doesn't really factor in when recovering from backup.
>
> The snapshot created by the replication handler is the index only and
> it makes most sense in my mind to remove the tlog before firing up a
> new node using the snapshot.
>
> The way I see it, the backup would be needed if you lost all nodes
> hosting a shard, then you need to 1) recover one node (the leader)
> from backup, 2) re-index any documents that were indexed between the
> time of your last snapshot and the time of the failure, then 3) bring
> additional replicas online as needed. The additional replicas will
> sync with the leader, snap pulling the entire index. You could do it
> 1,2,3 or 1,3,2.
>
> Cheers,
> Tim
>
> On Tue, Apr 30, 2013 at 12:45 PM, Furkan KAMACI 
> wrote:
> > I had index and tlog folder under my data folder. I have a snapshot
> folder
> > too when I make backup. However what will I do next if I want to use
> > backup, will I remove index and tlog folders and put just my snapshot
> > folder? What folks do?
> >
> > 2013/4/30 Michael Della Bitta 
> >
> >> Presumably you'd only be restoring a backup in the face of a
> catastrophe.
> >>
> >> Yes, you'd need to stop the node. And the transaction logs may not be
> >> useful in this case. You'd have trouble reconciling them with the
> version
> >> of the index in your backup I would think.
> >>
> >> Anybody who knows more about this want to chime in?
> >>
> >>
> >> Michael Della Bitta
> >>
> >> 
> >> Appinions
> >> 18 East 41st Street, 2nd Floor
> >> New York, NY 10017-6271
> >>
> >> www.appinions.com
> >>
> >> Where Influence Isn’t a Game
> >>
> >>
> >> On Tue, Apr 30, 2013 at 11:03 AM, Furkan KAMACI  >> >wrote:
> >>
> >> > Should I stop the node first? And what will happen to transaction
> logs?
> >> > Should I backup it too?
> >> >
> >> > 2013/4/30 Michael Della Bitta 
> >> >
> >> > > That directory is the data directory for the core... you'd just
> swap it
> >> > in.
> >> > >
> >> > >
> >> > > Michael Della Bitta
> >> > >
> >> > > 
> >> > > Appinions
> >> > > 18 East 41st Street, 2nd Floor
> >> > > New York, NY 10017-6271
> >> > >
> >> > > www.appinions.com
> >> > >
> >> > > Where Influence Isn’t a Game
> >> > >
> >> > >
> >> > > On Tue, Apr 30, 2013 at 8:06 AM, Furkan KAMACI <
> furkankam...@gmail.com
> >> > > >wrote:
> >> > >
> >> > > > Hi Folks;
> >> > > >
> >> > > > I can backup my indexes at SolrCloud via
> >> > > > http://_master_host_:_port_/solr/replication?command=backup
> >> > > > and it creates a file called snapshot. I know that I should pull
> that
> >> > > > directory any other safe place (a backup store) However what
> should I
> >> > do
> >> > > to
> >> > > > make a recovery from that backup file?
> >> > > >
> >> > >
> >> >
> >>
>


Re: Master - Slave File Sizes are not Same even after "command=abortfetch"

2013-05-01 Thread Furkan KAMACI
Shawn, thanks for the detailed answer. I have 5 shards with 1 leader and 1
replica each; I mean I have 10 Solr nodes. When I look at the admin GUI of one
of the shard leaders I see that its replica has less index (in MB) than the
leader. I don't update the data and I don't index new documents. I thought that
after some time the leader would sync its replica to itself, but nothing has changed.

2013/5/1 Shawn Heisey 

> On 4/30/2013 8:33 AM, Furkan KAMACI wrote:
>
>> I think that replication occurs after commit by default. It has been long
>> time however there is still mismatch between leader and replica
>> (approximately 5 MB). I tried to pull indexes from leader but it is still
>> same.
>>
>
> My mail server has been down most of the day, and the Apache mail
> infrastructure hasn't noticed yet that I'm back up.  I don't have copies of
> the newest messages on this thread.  I checked the web archive to see what
> else has been said.  I'll be repeating some of what has been said before.
>
> On SolrCloud terminology: SolrCloud divides your index into one or more
> shards, each of which has a different piece of the index.  Each shard is
> made up of replicas.  One replica in each shard is designated leader. Note:
> a leader is still a replica, it is just the winner of the latest leader
> election.  Summary: shards, replicas, leader.
>
> One term that you are using is "follower" ... this is not a valid
> SolrCloud term.  It might make sense to use this term for a replica that is
> not a leader, but I have never seen it used in anything official. Any
> replica can become leader, if the conditions are just right.
>
> There are only two times that the leader replica has special significance
> - when you are indexing and when a replica starts operation, either as an
> existing replica that went down or as a new replica.
>
> In SolrCloud, replication is *NOT* used when you index new data.  The
> *ONLY* time that replication happens in SolrCloud is when a replica
> starts up, and even then it will only happen if the leader cannot figure
> out how to use its transaction log to sync the replica.
>
> SolrCloud does distributed indexing.  This means that when an update comes
> in, SolrCloud determines which shard needs that update.  If the core that
> received the request is not the leader of that shard, the request is
> forwarded to the correct leader.  That leader will index the update and
> send it to all of the replicas for that shard, each of which will index the
> update independently.
>
> Because each replica indexes independently, you can end up with different
> sizes.  The actual search results should be the same, although scoring can
> sometimes be a little bit different between replicas because deleted
> documents that exist in one replica but not another will contribute to the
> score.  SolrCloud does not attempt to keep the replicas absolutely
> identical, as long as they contain the same non-deleted documents.
>
> Thanks,
> Shawn
>
>


Re: Only return snippets, not content

2013-05-01 Thread Michael Della Bitta
Since you're doing this, you might want to make sure lazy field loading is
turned on as well. Unfortunately I can't link you to the wiki because it's still
down (uh oh), but it's a setting in solrconfig.xml.
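For reference, the flag is enableLazyFieldLoading in the <query> section of the
stock solrconfig.xml, something like this (going from memory since the wiki is
down, so double-check the exact placement in your config):

  <enableLazyFieldLoading>true</enableLazyFieldLoading>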


Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Wed, May 1, 2013 at 7:48 AM, Bai Shen  wrote:

> Fixed it.  I just had to add "&fl=" listing all of the fields I wanted
> returned, except for content.
>
>
> On Wed, May 1, 2013 at 7:38 AM, Bai Shen  wrote:
>
> > I have a lot of large files that I've indexed into solr.  Is there a way
> > to have solr return the snippets instead of the content?  I'm only
> > displaying the snippets to my users, so transferring the content as well
> > just wastes bandwidth.
> >
> > Thanks.
> >
>


Re: Only return snippets, not content

2013-05-01 Thread Bai Shen
Fixed it.  I just had to add "&fl=" listing all of the fields I wanted
returned, except for content.
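Something along these lines, in case it helps anyone else later (the core name
and field names are just placeholders):

  /solr/collection1/select?q=foo
      &fl=id,title,score
      &hl=true&hl.fl=content&hl.snippets=3

The stored content field stays out of the document list, and the snippets come
back in the separate highlighting section of the response.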


On Wed, May 1, 2013 at 7:38 AM, Bai Shen  wrote:

> I have a lot of large files that I've indexed into solr.  Is there a way
> to have solr return the snippets instead of the content?  I'm only
> displaying the snippets to my users, so transferring the content as well
> just wastes bandwidth.
>
> Thanks.
>


Only return snippets, not content

2013-05-01 Thread Bai Shen
I have a lot of large files that I've indexed into solr.  Is there a way to
have solr return the snippets instead of the content?  I'm only displaying
the snippets to my users, so transferring the content as well just wastes
bandwidth.

Thanks.


Re: How to recover from "Error opening new searcher" when machine crashed while indexing

2013-05-01 Thread Michael McCandless
Alas I think CheckIndex can't do much here: there is no segments file,
so you'll have to reindex from scratch.

Just to check: did you ever call commit while building the index
before the machine crashed?

Mike McCandless

http://blog.mikemccandless.com
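
For anyone who lands on this thread later: when the segments_N file is intact,
CheckIndex can be run roughly like this (the jar and index paths below are
placeholders):

  java -cp lucene-core-4.0.0.jar org.apache.lucene.index.CheckIndex /path/to/index

Adding -fix tells it to drop any unreadable segments (losing the documents in
them), but it cannot help when there is no segments file at all, as in this
case.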


On Tue, Apr 30, 2013 at 8:17 PM, Otis Gospodnetic
 wrote:
> Hi,
>
> Try running the CheckIndex tool.
>
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
> On Apr 30, 2013 3:10 PM, "Utkarsh Sengar"  wrote:
>
>> Solr 4.0 was indexing data and the machine crashed.
>>
>> Any suggestions on how to recover my index since I don't want to delete my
>> data directory?
>>
>> When I try to start it again, I get this error:
>> ERROR 12:01:46,493 Failed to load Solr core: xyz.index1
>> ERROR 12:01:46,493 Cause:
>> ERROR 12:01:46,494 Error opening new searcher
>> org.apache.solr.common.SolrException: Error opening new searcher
>> at org.apache.solr.core.SolrCore.(SolrCore.java:701)
>> at org.apache.solr.core.SolrCore.(SolrCore.java:564)
>> at
>>
>> org.apache.solr.core.CassandraCoreContainer.load(CassandraCoreContainer.java:213)
>> at
>> com.datastax.bdp.plugin.SolrCorePlugin.activateImpl(SolrCorePlugin.java:66)
>> at
>>
>> com.datastax.bdp.plugin.PluginManager$PluginInitializer.call(PluginManager.java:161)
>> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> at
>>
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>> at
>>
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>> at java.lang.Thread.run(Thread.java:662)
>> Caused by: org.apache.solr.common.SolrException: Error opening new searcher
>> at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1290)
>> at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1402)
>> at org.apache.solr.core.SolrCore.(SolrCore.java:675)
>> ... 9 more
>> Caused by: org.apache.lucene.index.IndexNotFoundException: no segments*
>> file found in NRTCachingDirectory(org.apache.lucene.store.NIOFSDirectory@
>> /media/SSD/data/solr.data/rlcatalogks.prodinfo/index
>> lockFactory=org.apache.lucene.store.NativeFSLockFactory@d7581b;
>> maxCacheMB=48.0 maxMergeSizeMB=4.0): files: [_73ne_nrm.cfs,
>> _73ng_Lucene40_0.tip, _73nh_nrm.cfs, _73ng_Lucene40_0.tim, _73nf.fnm,
>> _73n5_Lucene40_0.frq, _73ne.fdt, _73nh.fdx, _73ne_nrm.cfe, _73ne.fdx,
>> _73ne_Lucene40_0.tim, _73ne.si, _73ni.fnm, _73nh_Lucene40_0.prx,
>> _73ni.fdt,
>> _73n5.si, _73ne_Lucene40_0.tip, _73nf_Lucene40_0.frq,
>> _73nf_Lucene40_0.prx,
>> _73nf_nrm.cfe, _73ne_Lucene40_0.frq, _73ng_Lucene40_0.prx,
>> _73nf_Lucene40_0.tip, _73n5.fdx, _73ng_Lucene40_0.frq, _73ng.fnm,
>> _73ni.fdx, _73n5.fnm, _73nf_Lucene40_0.tim, _73ni.si, _73n5.fdt,
>> _73nf_nrm.cfs, _73nh_nrm.cfe, _73ni_Lucene40_0.frq, _73ng.fdx,
>> _73ne_Lucene40_0.prx, _73nh.fnm, _73nh_Lucene40_0.tip,
>> _73nh_Lucene40_0.tim, _73nh.si, _73n5_Lucene40_0.tip,
>> _73ni_Lucene40_0.prx,
>> _73n5_Lucene40_0.tim, _73nf.si, _73ng_nrm.cfe, _73n5_Lucene40_0.prx,
>> _392j_42f.del, _73ng.fdt, _73ng.si, _73ni_nrm.cfe, _73n5_nrm.cfe,
>> _73ni_nrm.cfs, _73nf.fdx, _73ni_Lucene40_0.tip, _73n5_nrm.cfs,
>> _73ni_Lucene40_0.tim, _73nf.fdt, _73ne.fnm, _73nh.fdt,
>> _73nh_Lucene40_0.frq, _73ng_nrm.cfs]
>>
>>
>> --
>> Thanks,
>> -Utkarsh
>>


Delete from Solr Cloud 4.0 index..

2013-05-01 Thread Annette Newton
We have a 4-shard, 2-replica Solr Cloud setup, each shard with about 26GB of
index and 24,000,000 documents in total.  We issued a rather large delete
yesterday morning to reduce that size by about half. This resulted in the loss
of all shards while the delete was taking place, and even after it had
apparently finished, we continued to lose shards as soon as we started writing
again.

We have also issued much smaller deletes and lost shards, but before, they have
always come back OK.  This time we couldn't keep them online, and we ended up
rebuilding our cloud setup and switching over to it.

Is there a better process for deleting documents?  Is this expected
behaviour?

Thanks very much.

-- 

Annette Newton

Database Administrator

ServiceTick Ltd



T:+44(0)1603 618326



Seebohm House, 2-4 Queen Street, Norwich, England NR2 4SQ

www.servicetick.com

*www.sessioncam.com*

-- 
*This message is confidential and is intended to be read solely by the 
addressee. The contents should not be disclosed to any other person or 
copies taken unless authorised to do so. If you are not the intended 
recipient, please notify the sender and permanently delete this message. As 
Internet communications are not secure ServiceTick accepts neither legal 
responsibility for the contents of this message nor responsibility for any 
change made to this message after it was forwarded by the original author.*


Handling large no. of ids in solr

2013-05-01 Thread lavesh

I need to perform an online-users search in Solr, i.e. a user needs to find the
list of users who are currently online and who match particular criteria.

How I am handling this:
we store the ids of the online users in a table and send all of the online user
ids in the Solr request, like

&fq=-id:(id1 id2 id3 id5000)

The problem with this approach is that when the list of ids becomes large, Solr
takes too long to resolve the query and we need to transfer a large request
over the network.
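
For illustration, the full request looks roughly like the sketch below when
sent as a POST (the host, core name and id list are placeholders; the real list
has thousands of ids). Posting the parameters at least avoids URL-length
limits, but the request is still large and Solr still has to resolve every id:

  curl 'http://localhost:8983/solr/collection1/select' \
       --data-urlencode 'q=*:*' \
       --data-urlencode 'fq=-id:(id1 id2 id3 id5000)' \
       --data-urlencode 'rows=20'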

One solution could be to use a join in Solr, but the online data changes
regularly and I can't re-index that often (say every 5-10 minutes); re-indexing
should be at least an hour apart.

Another solution I can think of is firing this query internally from Solr,
based on a certain parameter in the URL. I don't have much idea about Solr
internals, so I don't know how to proceed with this.

Can anyone help me with this or provide an alternative solution?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Handling-large-no-of-ids-in-solr-tp4060218.html
Sent from the Solr - User mailing list archive at Nabble.com.


index operations

2013-05-01 Thread Mav Peri
Hi there,

We are seeing a large number of commit index operations on our Solr 4
master/slave setup (150 to 200+ operations).

We don't initiate the commits manually, as we are using auto commit. I believe
this results in search queries becoming slow/unresponsive over the course of a
few hours given sufficient load.
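
For reference, the auto commit is driven by the autoCommit block in
solrconfig.xml; a rough sketch (the values below are placeholders, not our
exact settings):

  <autoCommit>
    <maxDocs>10000</maxDocs>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

With openSearcher set to false, the hard commits only flush segments to disk
and do not open a new searcher on every commit.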

Any ideas or suggestions?

Many thanks

Mav



