Re: Problem regarding queries enclosed in double quotes in Solr 3.4
Upayavira, thanks for replying. When we run the quoted query in edismax, we get correct results. The only problem is that the quoted queries are very slow. Can you please point me to a link which talks about quoted queries in the edismax parser?
How to warm up filter queries for a category field with 1000 possible values?
What's the way to warm up filter queries for a category field with 1000 possible values? Would I need to write 1000 lines manually in solrconfig.xml, or what is the format?
Re: How to warm up filter queries for a category field with 1000 possible values?
If you are asking about reading from a file for warm-up: if there is no such capability for what you want, I can open a Jira issue and send a patch. 2013/10/7 user 01 user...@gmail.com: What's the way to warm up filter queries for a category field with 1000 possible values? Would I need to write 1000 lines manually in solrconfig.xml, or what is the format?
Re: How to warm up filter queries for a category field with 1000 possible values?
Sorry, didn't get you exactly. I need to warm up my queries after the newSearcher/firstSearcher are initialized. I am trying to warm up the filter caches for a category field, but I have almost 1000 categories (changing with time), which makes it impossible to list them in solrconfig.xml. Is there any way to iterate over all categories and warm up the query for each? On Mon, Oct 7, 2013 at 12:10 PM, Furkan KAMACI furkankam...@gmail.com wrote: If you are asking about reading from a file for warm-up: if there is no such capability for what you want, I can open a Jira issue and send a patch.
Re: Soft commit and flush
I understand the bottom line: soft commits are about visibility, hard commits are about durability. I am just trying to gain a better understanding of what happens under the hood... Two more related questions you made me think of: 1. Is the NRTCachingDirectoryFactory relevant for both types of commit, or just for hard commits? 2. If a soft commit does not flush, does all data exist in RAM until we call a hard commit? If so, could using soft commits without calling hard commit cause an OOE...?
Re: Shard split issue
I think what is happening here is that the sub-shard replicas are taking time to recover. We use a core admin command to wait for the replicas to become active before the shard states are switched. The timeout value for that command is just 120 seconds. We should wait for more than that. I'll open an issue.

On Mon, Oct 7, 2013 at 2:47 AM, Yago Riveiro yago.rive...@gmail.com wrote: Seems the issue occurs when the shard has more than one replica. I unloaded all replicas of the shard (all but 1, to do the split) and the SPLITSHARD finished as expected: the parent went to inactive and the children to active. If the parent has more than 1 replica, the process apparently finishes and the total number of documents in the children is the same as in the parent, but the parent never goes to the inactive state and the children are stuck in the construction state. -- Yago Riveiro

On Sunday, October 6, 2013 at 12:23 AM, Yago Riveiro wrote: I can attach the full log of the process if you want.

On Sunday, October 6, 2013 at 12:12 AM, Yago Riveiro wrote: The errors in the log are:

ERROR - 2013-10-05 21:06:22.997; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: splitshard the collection time out:300s
ERROR - 2013-10-05 21:06:22.997; org.apache.solr.common.SolrException; null:org.apache.solr.common.SolrException: splitshard the collection time out:300s
INFO - 2013-10-05 22:48:54.083; org.apache.solr.cloud.OverseerCollectionProcessor; Overseer Collection Processor: Message id:/overseer/collection-queue-work/qn-000138 complete, response:{success={null={responseHeader={status=0,QTime=1901},core=statistics-13_shard17_0_replica1},null={responseHeader={status=0,QTime=1903},core=statistics-13_shard17_1_replica1},null={responseHeader={status=0,QTime=2000}},null={responseHeader={status=0,QTime=2000}},null={responseHeader={status=0,QTime=6324147}},null={responseHeader={status=0,QTime=0},core=statistics-13_shard17_1_replica1,status=EMPTY_BUFFER},null={responseHeader={status=0,QTime=0},core=statistics-13_shard17_0_replica1,status=EMPTY_BUFFER},null={responseHeader={status=0,QTime=1127},core=statistics-13_shard17_0_replica2},null={responseHeader={status=0,QTime=2109},core=statistics-13_shard17_1_replica2}},failure={null=org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:I was asked to wait on state active for 192.168.20.105:8983_solr but I still do not see the requested state. I see state: recovering live:true},Operation splitshard caused exception:=org.apache.solr.common.SolrException: SPLTSHARD failed to create subshard replicas or timed out waiting for them to come up,exception={msg=SPLTSHARD failed to create subshard replicas or timed out waiting for them to come up,rspCode=500}}

On Saturday, October 5, 2013 at 5:03 PM, Yago Riveiro wrote: I don't have the log; log rotation is configured to keep only 5 small files. I will reconfigure it to a higher value and retry the split.

On Saturday, October 5, 2013 at 4:54 PM, Shalin Shekhar Mangar wrote: On Sat, Oct 5, 2013 at 8:37 PM, Yago Riveiro yago.rive...@gmail.com wrote: How can I see the logs of the parent? Are they stored in solr.log? Yes. -- Regards, Shalin Shekhar Mangar.
Difference Between Query Time and Elapsed Time at Solrj Query Response
The QueryResponse object in SolrJ has two different methods for the time required by a given query: one for QTime (query time) and one for elapsedTime. What is the difference between them, and what exactly is elapsedTime for?
Re: Shard split issue
If the replica has 20GB, most probably the recovery will take more than 120 seconds. In my case I have SSDs, and 120 seconds is not enough. -- Yago Riveiro

On Monday, October 7, 2013 at 9:19 AM, Shalin Shekhar Mangar wrote: I think what is happening here is that the sub-shard replicas are taking time to recover. We use a core admin command to wait for the replicas to become active before the shard states are switched. The timeout value for that command is just 120 seconds. We should wait for more than that. I'll open an issue.
[SolrJ] HttpSolrServer - maxRetries
Hi folks, Long story short: I'm occasionally getting exceptions under heavy load (SocketException: Connection reset). I would expect HttpSolrServer to try again up to maxRetries times, but it doesn't. For reasons I don't entirely understand, the call to httpClient.execute(method) is not inside the retry block (and thus will never be retried). Is this a bug in HttpSolrServer? Or is this intended behaviour? I'd rather not wrap my code in a retry mechanism if HttpSolrServer provides one. Thx, - Bram
Re: [SolrJ] HttpSolrServer - maxRetries
Hi Bram, could you send your error logs? 2013/10/7 Bram Van Dam bram.van...@intix.eu: Long story short: I'm occasionally getting exceptions under heavy load (SocketException: Connection reset). I would expect HttpSolrServer to try again up to maxRetries times, but it doesn't.
Re: [SolrJ] HttpSolrServer - maxRetries
On 10/07/2013 11:51 AM, Furkan KAMACI wrote: Could you send your error logs? Whoops, forgot to paste:

Caused by: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://localhost:8080/solr/fooIndex
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:416) ~[solr-solrj-4.2.1.jar:4.2.1 1461071 - mark - 2013-03-26 08:26:57]
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181) ~[solr-solrj-4.2.1.jar:4.2.1 1461071 - mark - 2013-03-26 08:26:57]
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117) ~[solr-solrj-4.2.1.jar:4.2.1 1461071 - mark - 2013-03-26 08:26:57]
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116) ~[solr-solrj-4.2.1.jar:4.2.1 1461071 - mark - 2013-03-26 08:26:57]
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102) ~[solr-solrj-4.2.1.jar:4.2.1 1461071 - mark - 2013-03-26 08:26:57]
at org.violet.search.service.IndexingService.addDocument(IndexingService.java:79) ~[Violet-Search-1.06.003.jar:na]
... 8 common frames omitted
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:185) ~[na:1.6.0_24]
at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:166) ~[httpcore-4.2.2.jar:4.2.2]
at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:90) ~[httpcore-4.2.2.jar:4.2.2]
at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:281) ~[httpcore-4.2.2.jar:4.2.2]
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:92) ~[httpclient-4.2.3.jar:4.2.3]
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:62) ~[httpclient-4.2.3.jar:4.2.3]
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:254) ~[httpcore-4.2.2.jar:4.2.2]
at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289) ~[httpcore-4.2.2.jar:4.2.2]
at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252) ~[httpclient-4.2.3.jar:4.2.3]
at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191) ~[httpclient-4.2.3.jar:4.2.3]
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300) ~[httpcore-4.2.2.jar:4.2.2]
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127) ~[httpcore-4.2.2.jar:4.2.2]
at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:717) ~[httpclient-4.2.3.jar:4.2.3]
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:522) ~[httpclient-4.2.3.jar:4.2.3]
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) ~[httpclient-4.2.3.jar:4.2.3]
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805) ~[httpclient-4.2.3.jar:4.2.3]
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784) ~[httpclient-4.2.3.jar:4.2.3]
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:353) ~[solr-solrj-4.2.1.jar:4.2.1 1461071 - mark - 2013-03-26 08:26:57]
... 13 common frames omitted
Re: [SolrJ] HttpSolrServer - maxRetries
One more thing: could you say which version of Solr you are using? 2013/10/7 Bram Van Dam bram.van...@intix.eu: Whoops, forgot to paste: Caused by: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://localhost:8080/solr/fooIndex [...] Caused by: java.net.SocketException: Connection reset [full stack trace quoted above]
Re: [SolrJ] HttpSolrServer - maxRetries
On 10/07/2013 12:55 PM, Furkan KAMACI wrote: One more thing: could you say which version of Solr you are using? The stack trace comes from 4.2.1, but I suspect this could occur on 4.4 as well. I've not been able to reproduce it consistently: it has happened twice (!) after indexing around 100 million documents.
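In the meantime, a client-side workaround is straightforward. Below is a minimal sketch (assuming SolrJ 4.x; the retry count, backoff, and helper name are illustrative, not part of SolrJ) that retries an add() when the root cause is an I/O error such as the connection reset above:

import java.io.IOException;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.common.SolrInputDocument;

/** Hypothetical helper: retries an add() on transient I/O failures. */
public class RetryingAdd {
    public static void addWithRetries(SolrServer server, SolrInputDocument doc,
                                      int maxAttempts) throws SolrServerException, IOException {
        for (int attempt = 1; ; attempt++) {
            try {
                server.add(doc);
                return; // success
            } catch (SolrServerException e) {
                // Only retry when the root cause is I/O-level, e.g. "Connection reset"
                if (attempt >= maxAttempts || !(e.getRootCause() instanceof IOException)) {
                    throw e;
                }
                try {
                    Thread.sleep(1000L * attempt); // simple linear backoff
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
            }
        }
    }
}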
feedback on Solr 4.x LotsOfCores feature
Hello, In my company, we use Solr in production to offer full-text search on mailboxes. We host tens of millions of mailboxes, but only webmail users have this feature (a few million). We have the following use case:
- non-static indexes, with more updates (indexing and deleting) than select requests (ratio 7:1)
- homogeneous configuration for all indexes
- not many users at the same time

We started to index mailboxes with Solr 1.4 in 2010, on a subset of 400,000 users.
- we had a cluster of 50 servers, 4 Solr instances per server, 2000 users per Solr instance
- we grew to 6000 users per Solr instance, 8 Solr instances per server, 60GB per index (~2 million users)
- we upgraded to Solr 3.5 in 2012

As indexes grew, IOPS and response times increased more and more. The index size was mainly due to stored fields (large .fdt files). Retrieving these fields from the index was costly, because of many seeks in large files, and no limit usage was possible. There is also an overhead on queries: too many results are filtered to find only the results concerning one user. For these reasons and others (not pooled users, hardware savings, better scoring, some requests that do not support filtering), we decided to use the LotsOfCores feature. Our goal was to change the current I/O usage: from lots of random I/O access on huge segments to mostly sequential I/O access on small segments. For our use case, it's not a big deal that the first query to a not-yet-loaded core will be slow. And we don't need to fit all the cores into memory at once.

We started from the SOLR-1293 issue and the LotsOfCores wiki page to finally use a patched Solr 4.2.1 LotsOfCores in production (1 user = 1 core). We no longer need to run so many Solr instances per node. We are now able to have around 50,000 cores per Solr instance and we plan to grow to 100,000 cores per instance. At first, we used the solr.xml persistence. All cores have the loadOnStartup=false and transient=true attributes, so a cold start is very quick. The response times were better than ever, in comparison with the poor response times we had before using LotsOfCores.

We added 2 core options:
- numBuckets: creates a subdirectory based on a hash of the core name % numBuckets in the core dataDir, because all cores cannot live in the same directory
- Auto, with 3 different values: 1) false: default behaviour; 2) createLoad: create, if it does not exist, and load the core on the fly on the first incoming request (update, select); 3) onlyLoad: load the core on the fly on the first incoming request (update, select), if it exists on disk

Then, to improve performance and avoid synchronization in the solr.xml persistence, we disabled it. The drawback is that we can no longer see the full list of available cores with the admin core status command, only those warmed up.

Finally, we can achieve very good performance with Solr LotsOfCores:
- Index 5 emails (avg) + commit + search: x4.9 faster response time (mean), x5.4 faster (95th pct)
- Delete 5 documents (avg): x8.4 faster response time (mean), x7.4 faster (95th pct)
- Search: x3.7 faster response time (mean), x4 faster (95th pct)

In fact, the better performance is mainly due to the small size of each index, but also thanks to the isolation between cores (updates and queries on many mailboxes don't have side effects on each other).
One important thing with the LotsOfCores feature is to take care of:
- the number of file descriptors: it uses a lot (need to increase the global max and the per-process fd limit)
- the value of transientCacheSize, depending on the RAM size and the allocated PermGen size
- the ClassLoader leak that increases minor GC times when the CMS GC is enabled (use -XX:+CMSClassUnloadingEnabled)
- the overhead of parsing solrconfig.xml and loading dependencies to open each core
- LotsOfCores doesn't work with SolrCloud, so we store index locations outside of Solr; we have Solr proxies to route requests to the right instance

Outside of production, we tried the core discovery feature in Solr 4.4 with lots of cores. When you start, it spends a lot of time discovering cores due to the big number of cores; meanwhile, all requests fail (SolrDispatchFilter.init() not done yet). It would be great to have, for example, an option to run core discovery in the background, or just to be able to disable it, like we do in our use case. If someone is interested in these new options for the LotsOfCores feature, just tell me.
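For illustration, the numBuckets idea described above amounts to something like the following sketch (purely illustrative; the actual patch is not shown here, and the hashing scheme is an assumption):

import java.io.File;

/** Illustrative sketch of the numBuckets option described above:
 *  spread core data directories across N subdirectories so that
 *  tens of thousands of cores do not share a single directory. */
public class CoreBuckets {
    public static File dataDir(File baseDir, String coreName, int numBuckets) {
        // Mask the sign bit rather than Math.abs(), which overflows for Integer.MIN_VALUE.
        int bucket = (coreName.hashCode() & 0x7fffffff) % numBuckets;
        return new File(baseDir, bucket + File.separator + coreName);
    }
}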
Re: Does the queryResultCache contain all the results returned by main query or after filtering out
No, the queryResultCache contains the top N for the query, _including_ the filters. The idea is that you should be able to get the next page of results without going to any searching code. You couldn't do that in the scenario you describe.

If your filters are truly unique, you'll gain a little bit of performance by specifying the local param {!cache=false} for your fq clauses; that will just bypass adding them to the filterCache.

Try thinking about it backwards, especially if you don't care about scoring, say you are sorting by distance. Don't put the terms in the main query; put everything in fq clauses, so your query becomes something like:

q=*:*&fq=field:term AND field:term&fq={!cache=false}unique clause1&fq={!cache=false}unique clause2

The beauty of this is that you can assign a cost to the cache=false clauses so they will only be calculated for docs that make it through your lower-cost fq clauses and the main query. See: http://solr.pl/en/2012/03/05/use-of-cachefalse-and-cost-parameters/ Best, Erick

On Sun, Oct 6, 2013 at 9:56 AM, Ertio Lew ertio...@gmail.com wrote: Background: I need to find items matching keywords provided by the user, filtered by availability within a certain radius from his location and by other user-specific params. So I think this may be very relevant for me, because my filter queries may be unique all the time (since I am filtering by geospatial search, people find items nearest to them), plus some additional user-specific filters. So filter queries will always be unique, but people may use common keywords to look up, so the main query (q param) may be common most of the time. So only if the queryResultCache contains all the results returned by the main query (q param), before filtering, do I think this queryResultCache may be helpful for me. Isn't it?

On Sun, Oct 6, 2013 at 7:13 PM, Erick Erickson erickerick...@gmail.com wrote: First, why is it important to you? General background or a specific problem you're trying to address? But to answer, no. The queryResultCache contains the top N ids for the query. You control N by setting queryResultWindowSize in solrconfig.xml. It's often set to 2x the usual rows parameter, on the theory that people rarely page past the second page. Best, Erick

On Sun, Oct 6, 2013 at 5:22 AM, Ertio Lew ertio...@gmail.com wrote: Does the queryResultCache contain all the results returned by the main query (q param), or does it contain results prepared after all filter queries?
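For SolrJ users, the pattern Erick describes might look like the following sketch (field names and values are made up; cache and cost are standard filter-query local params):

import org.apache.solr.client.solrj.SolrQuery;

public class FilterCostExample {
    public static SolrQuery build() {
        SolrQuery q = new SolrQuery("*:*");
        // Common, reusable filter: let it be cached in the filterCache.
        q.addFilterQuery("category:electronics");
        // Per-user geospatial filter: unique per request, so skip the cache
        // and give it a high cost so it runs after the cheaper filters.
        q.addFilterQuery("{!geofilt cache=false cost=100 sfield=location pt=45.15,-93.85 d=5}");
        return q;
    }
}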
How to share Schema between multicore on Solr 4.4
I am using Solr 4.4 with SolrCloud on a Windows machine. Somehow I am not able to share the schema between multiple cores. My solr.xml file looks like:

<solr>
  <str name="shareSchema">${shareSchema:true}</str>
  <solrcloud>
    <str name="hostContext">${hostContext:SolrEngine}</str>
    <int name="hostPort">${tomcat.port:8080}</int>
    <int name="zkClientTimeout">${zkClientTimeout:15000}</int>
  </solrcloud>
</solr>

I have used a core.properties file for each core. One of the cores (say collection1) contains the schema.xml file, and the rest have all the config files excluding schema.xml. Each core.properties file contains name=corename. After deployment I am getting the following error:

collection2: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Error loading schema resource schema.xml

Please note that I have provided shareSchema=true in the solr.xml file. Please let me know if anything is missing. Any pointer will be helpful. Thanks, Dharmendra Jaiswal
Re: Improving indexing performance
Just skimmed, but the usual reason you can't max out the server is that the client can't go fast enough. Very quick experiment: comment out the server.add line in your client and run it again; does the client speed up substantially? If not, then the time is being spent on the client. Or split your CSV file into, say, 5 parts and run it from 5 different PCs in parallel.

bq: I can't rely on auto commit, otherwise I get an OutOfMemory error
This shouldn't be happening; I'd get to the bottom of this. Perhaps simply allocate more memory to the JVM running Solr.

bq: committing every 100k docs gives worse performance
It'll be best to specify openSearcher=false for max indexing throughput, BTW. You should be able to do this quite frequently; 15 seconds seems quite reasonable. Best, Erick

On Sun, Oct 6, 2013 at 12:19 PM, Matteo Grolla matteo.gro...@gmail.com wrote: I'd like some suggestions on how to improve the indexing performance in the following scenario. I'm uploading 1M docs to Solr; every doc has id: sequential number, title: small string, date: date, body: 1kb of text. Here are my benchmarks (they are all single executions, not averages of multiple executions):

1) using the update request handler and streaming docs from a CSV file on the same disk as Solr; auto commit every 15s with openSearcher=false and commit after the last document. Total time: 143035ms

1.1) same as 1), plus <ramBufferSizeMB>500</ramBufferSizeMB> and <maxBufferedDocs>10</maxBufferedDocs>. Total time: 134493ms

1.2) same as 1), plus <mergeFactor>30</mergeFactor>. Total time: 143134ms

2) using a SolrJ client from another PC in the LAN (100Mbps) with HttpSolrServer and the javabin format; adding documents to the server in batches of 1k docs (server.add(collection)); auto commit every 15s with openSearcher=false and commit after the last document. Total time: 139022ms

3) using a SolrJ client from another PC in the LAN (100Mbps) with ConcurrentUpdateSolrServer and the javabin format; adding documents to the server in batches of 1k docs (server.add(collection)); server queue size=20k, server threads=4; no auto-commit and commit every 100k docs. Total time: 167301ms

-- On the Solr server --
CPU averages 25%, at best 100% for 1 core. IO is still far from being saturated; iostat gives a pattern like this (every 5 s):

time(s)  %util
100      45.20
105      1.68
110      17.44
115      76.32
120      2.64
125      68
130      1.28

I thought that using ConcurrentUpdateSolrServer I would be able to max out CPU or IO, but I wasn't. With ConcurrentUpdateSolrServer I can't rely on auto commit, otherwise I get an OutOfMemory error, and I found that committing every 100k docs gives worse performance than auto commit every 15s (benchmark 3 with HttpSolrServer took 193515ms). I'd really like to understand why I can't max out the resources on the server hosting Solr (disk above all), and I'd really like to understand what I'm doing wrong with ConcurrentUpdateSolrServer. Thanks
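For reference, a minimal SolrJ sketch of the benchmark-3-style client described above (queue size, thread count and batch size mirror the numbers in the thread; the URL and field names are assumptions):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexer {
    public static void main(String[] args) throws Exception {
        // 20k queue, 4 background threads, as in benchmark 3
        ConcurrentUpdateSolrServer server = new ConcurrentUpdateSolrServer(
                "http://localhost:8983/solr/collection1", 20000, 4);
        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 1000000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", i);
            doc.addField("title", "doc " + i);
            batch.add(doc);
            if (batch.size() == 1000) { // add in batches of 1k docs
                server.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) server.add(batch);
        server.blockUntilFinished(); // drain the internal queue
        server.commit();             // prefer autoCommit with openSearcher=false in production
        server.shutdown();
    }
}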
Re: Doing time sensitive search in solr
Wait, are you saying you have fields like 2013-12-01T00:00:00Z_entryDate? So you have some wildcard definition in your schema like <dynamicField name="*_entryDate" type="tdate" .../>? If so, I think your model is just wrong and you should have some field(s) that you store dates in. That aside, and assuming you have wildcards like I'm guessing, you could have a copyField like <copyField source="*_entryDate" dest="bag_of_dates"/> and do your ranges on bag_of_dates. Which would be the same as putting your dates in a single field with a fixed name in the first place. Best, Erick

On Sun, Oct 6, 2013 at 4:34 PM, Darniz rnizamud...@edmunds.com wrote: Thanks Erick. I hope I understood correctly, but my main concern is that I have to tie specific indexed content to a specific time range, and make that document come up in search results only for that time. As I mentioned in my previous example, we have multiple date-string structures, which makes it a bit more complicated; on top of that, I don't know what the exact date will be. Hence if someone searches for toyota and today is 6-OCT-2013, this doc should not come up in search results, since the keyword toyota should be searched only after 1-DEC-2013.

<date name="2013-09-01T00:00:00Z_entryDate">2013-09-01T00:00:00Z</date>
<str name="2013-09-01T00:00:00Z_entryText">Sept content: Honda is releasing the car this month</str>
<date name="2013-12-01T00:00:00Z_entryDate">2013-12-01T00:00:00Z</date>
<str name="2013-12-01T00:00:00Z_entryText">Dec content: Toyota is releasing the car this month</str>

I don't know whether using a copy field might solve this; correct me if I am wrong. Maybe we are pursuing something which is not meant for Solr. Thanks, Rashid
Re: How to warm up filter queries for a category field with 1000 possible values?
That's what the autowarm number for the filterCache is about. It re-executes the last N fq clauses and caches them. Similarly for some of the other autowarm settings. But don't go wild here. Measure, _then_ fix. Usually autowarming just a few (< 32) is sufficient. And remember that autowarming is done whenever you open a new searcher, so if you have your soft commits configured to be, say, 5 seconds, you'll get minimal benefit here.

Are you saying you have almost 1,000 differently-named fields in your documents? Or 1,000 _values_ in your category field? In either case, please measure your query performance before assuming you need to autowarm excessively. If you don't autowarm at all and your first few queries are acceptable after you open a new searcher, then don't worry about it. Often the biggest win is just filling the lower-level Lucene caches, which you can do with a very few queries. Best, Erick

On Mon, Oct 7, 2013 at 2:49 AM, user 01 user...@gmail.com wrote: Sorry, didn't get you exactly. I need to warm up my queries after the newSearcher/firstSearcher are initialized. I am trying to warm up the filter caches for a category field, but I have almost 1000 categories (changing with time), which makes it impossible to list them in solrConfig.xml. Is there any way to iterate over all categories and warm up the query for each?
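If external warming is acceptable, one way to avoid listing 1000 entries in solrconfig.xml is to discover the category values by faceting and fire one fq per value from a client. A sketch follows (the field name, core URL, and the approach itself are assumptions, not a built-in Solr feature):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CategoryWarmer {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
        // Discover all current category values via faceting.
        SolrQuery facetQuery = new SolrQuery("*:*");
        facetQuery.setRows(0);
        facetQuery.setFacet(true);
        facetQuery.addFacetField("category");
        facetQuery.setFacetLimit(-1); // all values, however many exist today
        QueryResponse rsp = solr.query(facetQuery);
        FacetField categories = rsp.getFacetField("category");
        // Fire one fq per category to populate the filterCache.
        for (FacetField.Count c : categories.getValues()) {
            SolrQuery warm = new SolrQuery("*:*");
            warm.setRows(0);
            warm.addFilterQuery("category:\"" + c.getName() + "\"");
            solr.query(warm);
        }
    }
}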
Re: Soft commit and flush
bq: Is the NRTCachingDirectoryFactory relevant for both types of commit, or just for hard commits
Don't know the code deeply, but NRT == Near Real Time == soft commit, I'd guess.

bq: If soft commit does not flush...
Soft commit flushes the transaction log. On restart, if the content of the tlog isn't in the index, it's replayed to catch the index up. OOE? Out Of Energy? You can optionally set up soft commits to fsync the tlog if you want to eliminate the remote possibility of an op-system (not JVM) crash between the time the JVM passes the write off to the op system and the op system writes the bits to disk. Best, Erick

On Mon, Oct 7, 2013 at 2:57 AM, adfel70 adfe...@gmail.com wrote: I understand the bottom line: soft commits are about visibility, hard commits are about durability. I am just trying to gain a better understanding of what happens under the hood... Two more related questions you made me think of: 1. Is the NRTCachingDirectoryFactory relevant for both types of commit, or just for hard commits? 2. If a soft commit does not flush, does all data exist in RAM until we call a hard commit? If so, could using soft commits without calling hard commit cause an OOE...?
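For completeness, SolrJ exposes the two commit types directly; a minimal sketch (SolrJ 4.x signatures; the core URL is an assumption):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class CommitTypes {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
        // soft commit: make recent updates visible without fsyncing segment files
        solr.commit(true /* waitFlush */, true /* waitSearcher */, true /* softCommit */);
        // hard commit: flush and fsync index files for durability
        solr.commit(true, true, false);
    }
}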
Re: Difference Between Query Time and Elapsed Time at Solrj Query Response
Query time (QTime) is the time spent in Solr getting the search results. It does NOT include reading the bits off disk to assemble the response, etc. Elapsed time is the time from when the query was sent to the time it gets back. It includes QTime, reading the bits off disk to assemble the response, transmission time, etc. Best, Erick

On Mon, Oct 7, 2013 at 4:49 AM, Furkan KAMACI furkankam...@gmail.com wrote: The QueryResponse object in SolrJ has two different methods for the time required by a given query: one for QTime (query time) and one for elapsedTime. What is the difference between them, and what exactly is elapsedTime for?
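A small sketch showing both numbers side by side (SolrJ 4.x; the core URL is an assumption):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class Timings {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
        QueryResponse rsp = solr.query(new SolrQuery("*:*"));
        System.out.println("QTime (server-side search): " + rsp.getQTime() + " ms");
        System.out.println("Elapsed (client round trip): " + rsp.getElapsedTime() + " ms");
    }
}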
Regarding edismax parsing
Hi, I have a question regarding the parsing of tokens in the edismax parser, and subsequently a follow-up question related to it.
- Each field has a list of analyzers and tokenizers as configured in schema.xml (index and query time). Now, say I search for the query red shoes. Is it the case that, to form the disjunction query on each field, edismax first applies the analyzers configured for that field and then forms the query? E.g. if field1 changes red to rd and field2 changes red to re, will the query be like (field1:rd) | (field2:re)?
- If the above holds true, then when I changed the ordering of analyzers and put SynonymFilterFactory at the top of all analyzers (in schema.xml), edismax still tokenized the query first with respect to whitespace and only then applied the synonym filter factory, which leads me to think this is not happening.
My use case is: before applying any tokenizer, I want to support phrase-level synonym replacement, and then do the rest of the analysis. Thanks, Prashant Golash
Re: feedback on Solr 4.x LotsOfCores feature
Thanks for the great writeup! It's always interesting to see how a feature plays out in the real world. A couple of questions though:

bq: We added 2 core options
Do you mean you patched Solr? If so, are you willing to share the code back? If both are yes, please open a JIRA, attach the patch and assign it to me.

bq: the number of file descriptors: it uses a lot (need to increase the global max and the per-process fd limit)
Right, this makes sense since you have a bunch of cores all with their own descriptors open. I'm assuming that you hit a rather high max number and it stays pretty steady.

bq: the overhead of parsing solrconfig.xml and loading dependencies to open each core
Right, I tried to look at sharing the underlying solrconfig object but it seemed pretty hairy. There are some extensive comments in the JIRA about the problems I foresaw. There may be some action on this in the future.

bq: LotsOfCores doesn't work with SolrCloud
Right, we haven't concentrated on that; it's an interesting problem. In particular it's not clear what happens when nodes go up/down, replicate, resync, all that.

bq: When you start, it spends a lot of time discovering cores due to the big number of cores
How long? I tried 15K cores on my laptop and I think I was getting 15-second delays, or roughly 1K cores discovered/second. Is your delay on the order of 50 seconds with 50K cores? I'm not sure how you could do that in the background, but I haven't thought about it much. I tried multi-threading core discovery and that didn't help (SSD disk); I assumed that the problem was mostly I/O contention (but didn't prove it). What if a request came in for a core before you'd found it? I'm not sure what the right behavior would be, except perhaps to block on that request until core discovery was complete. Hm. How would that work for your case? That seems do-able.

BTW, so far you get the prize for the most cores on a node, I think. Thanks again for the great feedback! Erick

On Mon, Oct 7, 2013 at 3:53 AM, Soyez Olivier olivier.so...@worldline.com wrote: Hello, In my company, we use Solr in production to offer full text search on mailboxes. [...]
no such field error:smaller big block size details while indexing doc files
I'm trying to index .doc, .docx and .pdf files, using this URL:

curl "http://localhost:8080/solr/document/update/extract?literal.id=12&commit=true" -F "myfile=@complex.doc"

This is the error I get:

Oct 07, 2013 5:02:18 PM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.RuntimeException: java.lang.NoSuchFieldError: SMALLER_BIG_BLOCK_SIZE_DETAILS
at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:651)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:364)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:928)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:539)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:298)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.NoSuchFieldError: SMALLER_BIG_BLOCK_SIZE_DETAILS
at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:93)
at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:190)
at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:184)
at org.apache.tika.parser.microsoft.POIFSContainerDetector.getTopLevelNames(POIFSContainerDetector.java:376)
at org.apache.tika.parser.microsoft.POIFSContainerDetector.detect(POIFSContainerDetector.java:165)
at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:113)
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
... 16 more

Also, using the same type of URL, txt, mp3 and pdf files are indexed successfully
(curl "http://localhost:8080/solr/document/update/extract?literal.id=12&commit=true" -F "myfile=@abc.txt").

My schema.xml is:

<schema name="documents">
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
    <field name="author" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="comments" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="keywords" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="contents" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="title" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="revision_number" type="string" indexed="true" stored="true" multiValued="false"/>
    <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
    <dynamicField name="ignored_*" type="string" indexed="false" stored="true" multiValued="true"/>
    <copyfield source="id" dest="text"/>
    <copyfield source="author" dest="text"/>
  </fields>
  <types>
    <fieldType name="integer" class="solr.IntField"/>
    <fieldType name="long" class="solr.LongField"/>
    <fieldType name="string" class="solr.StrField"/>
    <fieldType name="text" class="solr.TextField"/>
    <fieldtype name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField"/>
  </types>
  <uniqueKey>id</uniqueKey>
</schema>

I'm not able to understand what kind of error this is; please help me.
Re: Soft commit and flush
Sorry, by OOE I meant Out of memory exception...
Re: Search for non empty fields in a index with denormalized tables
Okay, let me specify my question a little more. I have a denormalized index of two SQL tables, patient and image. If I add a patient with two images to the Solr index, my index contains 3 documents:

Pat_ID | Patient_Lastname | Image_ID | Image_Name
1      | Miller           | EMPTY    | EMPTY
1      | Miller           | 1        | dog.jpg
1      | Miller           | 2        | cat.jpg

When I now add another patient without any images, the Solr index contains 4 documents:

Pat_ID | Patient_Lastname | Image_ID | Image_Name
1      | Miller           | EMPTY    | EMPTY
1      | Miller           | 1        | dog.jpg
1      | Miller           | 2        | cat.jpg
2      | Smith            | EMPTY    | EMPTY

Now I want to select all patients that have no image (Image_ID is empty). If I query this with the following Solr query, the result would be Miller and Smith, but I need a query that will return Smith only:

select?q=-Image_ID:[0 TO *] and then group by pat_id

What I would need is something like the HAVING clause in SQL. Then I could group by Pat_ID and filter for the ones where the count is less than 2. Bests, Sandro
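For what it's worth, one possible workaround is sketched below. It assumes Solr 4.x's join query parser is available and that EMPTY means the Image_ID field is simply absent from the document: select the placeholder rows, then exclude, via a self-join on Pat_ID, every patient that has at least one image row.

import org.apache.solr.client.solrj.SolrQuery;

public class PatientsWithoutImages {
    public static SolrQuery build() {
        // Select the placeholder rows (no Image_ID field at all)...
        SolrQuery q = new SolrQuery("-Image_ID:[* TO *]");
        // ...and exclude patients that have at least one image row,
        // via a self-join on Pat_ID (Solr 4.x {!join} query parser).
        q.addFilterQuery("-_query_:\"{!join from=Pat_ID to=Pat_ID}Image_ID:[* TO *]\"");
        return q; // should match only Smith's row in the example above
    }
}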
Re: Soft commit and flush
Out of Memory Exception is well known as OOM. Guido. On 07/10/13 14:11, adfel70 wrote: Sorry, by OOE I meant Out of memory exception...
Re: feedback on Solr 4.x LotsOfCores feature
I assume that the LotsOfCores feature doesn't use ZooKeeper. I tried to simulate the cores as collections, but the size of clusterstate.json grew bigger than 1MB, and -Djute.maxbuffer is needed to increase the 1MB limitation. A naive question: why isn't clusterstate.json kept per collection? -- Yago Riveiro

On Monday, October 7, 2013 at 1:33 PM, Erick Erickson wrote: Thanks for the great writeup! It's always interesting to see how a feature plays out in the real world. [...]
Re: feedback on Solr 4.x LotsOfCores feature
I think we'd all love to see those improvements land in Solr. I was involved in the work at AOL WebMail where the LotsOfCores idea originated. We had many of the problems that you've had to solve yourself. I remember that we switched to compound file format to reduce file descriptors. Also we had to switch back to the Log Merge Policy from TieredMergePolicy because TieredMergePolicy increased the overall random disk i/o and we had latency issues because of it. On Mon, Oct 7, 2013 at 1:23 PM, Soyez Olivier olivier.so...@worldline.comwrote: Hello, In my company, we use Solr in production to offer full text search on mailboxes. We host dozens million of mailboxes, but only webmail users have such feature (few millions). We have the following use case : - non static indexes with more update (indexing and deleting), than select requests (ratio 7:1) - homogeneous configuration for all indexes - not so much user at the same time We started to index mailboxes with Solr 1.4 in 2010, on a subset of 400,000 users. - we had a cluster of 50 servers, 4 Solr per server, 2000 users per Solr instance - we grow to 6000 users per Solr instance, 8 Solr per server, 60Go per index (~2 million users) - we upgraded to Solr 3.5 in 2012 As indexes grew, IOPS and the response times have increased more and more. The index size was mainly due to stored fields (large .fdt files) Retrieving these fields from the index was costly, because of many seek in large files, and no limit usage possible. There is also an overhead on queries : too many results are filtered to find only results concerning user. For these reason and others, like not pooled users, hardware savings, better scoring, some requests that do not support filtering, we have decided to use the LotsOfCores feature. Our goal was to change the current I/O usage : from lots of random I/O access on huge segments to mostly sequential I/O access on small segments. For our use case, it's not a big deal, that the first query to one not yet loaded core will be slow. And, we don’t need to fit all the cores into memory at once. We started from the SOLR-1293 issue and the LotsOfCores wiki page to finally use a patched Solr 4.2.1 LotsOfCores in production (1 user = 1 core). We don't need anymore to run so many Solr per node. We are now able to have around 5 cores per Solr and we plan to grow to 100,000 cores per instance. In a first time, we used the solr.xml persistence. All cores have loadOnStartup=false and transient=true attributes, so a cold start is very quick. The response times were better than ever, in comparaison with poor response times, we had before using LotsOfCores. We added 2 Cores options : - numBuckets to create a subdirectory based on a hash on the corename % numBuckets in the core Datadir, because all cores cannot live in the same directory - Auto with 3 differents values : 1) false : default behaviour 2) createLoad : create, if not exist, and load the core on the fly on the first incoming request (update, select). 3) onlyLoad : load the core on the fly on the first incoming request (update, select), if exist on disk Then, to improve performance and avoid synchronization in the solr.xml persistence : we disabled it. The drawback is we cannot see anymore all the availables cores list with the admin core status command, only those warmed up. 
Finally, we achieve very good performance with Solr LotsOfCores:
- Index 5 emails (avg) + commit + search: x4.9 faster response time (mean), x5.4 faster (95th pct)
- Delete 5 documents (avg): x8.4 faster response time (mean), x7.4 faster (95th pct)
- Search: x3.7 faster response time (mean), x4 faster (95th pct)
In fact, the better performance is mainly due to the small size of each index, but also thanks to the isolation between cores (updates and queries on many mailboxes don't have side effects on each other). Important things to take care of with the LotsOfCores feature:
- the number of file descriptors: it uses a lot (you need to increase the global max and per-process fd limits)
- the value of transientCacheSize, depending on the RAM size and the allocated PermGen size
- ClassLoader leaks that increase minor GC times when the CMS GC is enabled (use -XX:+CMSClassUnloadingEnabled)
- the overhead of parsing solrconfig.xml and loading dependencies to open each core
- LotsOfCores doesn't work with SolrCloud, so we store index locations outside of Solr; we have Solr proxies to route requests to the right instance.
Outside of production, we tried the core discovery feature in Solr 4.4 with lots of cores. At startup, it spends a lot of time discovering cores because there are so many of them, and meanwhile all requests fail (SolrDispatchFilter.init() not done yet). It would be great to have, for example, an option to run core discovery in the background, or just to be able to disable it, as we do in our use case. If someone is interested in these
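(Purely to illustrate the numBuckets layout described above -- a hypothetical sketch, not the actual patch code: the core name is hashed modulo numBuckets to pick a bucket subdirectory, so tens of thousands of core data dirs never share a single parent directory.)

  import java.io.File;

  // Hypothetical sketch: bucket a core's data dir by hash(coreName) % numBuckets.
  static File bucketedDataDir(File baseDataDir, String coreName, int numBuckets) {
      int bucket = (coreName.hashCode() & 0x7fffffff) % numBuckets;  // non-negative hash
      return new File(baseDataDir, bucket + File.separator + coreName);
  }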
Re: Among LatLonType & SpatialRecursivePrefixTreeFieldType, which one for filtering outside of bounding box?
Use the location_rpt field type in the example schema.xml -- it has good performance and uses less memory (what you asked for) compared to LatLonType. To learn how to tweak some of the settings to get better performance at the expense of some accuracy, see http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4 ~ David On 10/5/13 8:53 AM, user 01 user...@gmail.com wrote: For geospatial search, I need to filter out all points outside of a certain radius from a certain point. No need for precise results; approximation will work for me! No sorting is required either. I see there are two spatial implementations: LatLonType and SpatialRecursivePrefixTreeFieldType. But I am not sure which one I should choose for good performance and less memory? (As I said, approximations are OK for me. If it gives me approximate bounding-box results instead of a circle, I will be fine too.)
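(A hedged illustration of a radius filter on such a field -- the field name geoloc is hypothetical, and d is the radius in kilometers. The {!bbox} parser takes the same parameters but filters on the enclosing bounding box instead of the circle, which matches the "approximate results are fine" requirement:)

  fq={!geofilt sfield=geoloc pt=41.7882,-71.9498 d=10}
  fq={!bbox sfield=geoloc pt=41.7882,-71.9498 d=10}    (cheaper bounding-box approximation)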
Web App Engineer at Harvard-Smithsonian Astrophysical Observatory, full time, indefinite contract
Dear all, We are looking for a new member to join our team. This position requires solid knowledge of Python, plus experience with web development, HTML5, XSLT, JSON, CSS3, relational databases and NoSQL -- but search (and SOLR) is the central point of everything we do here. So, if you love SOLR/Lucene as we do, then I'm sure there will be plenty of opportunities for search-related development for you too. About the project: http://labs.adsabs.harvard.edu/adsabs/ The ADS is the central discovery engine for astronomical information, used nearly every day by nearly every astronomer. Conceived 20 years ago and moving into its third decade, the ADS continues to serve the research community worldwide. The ADS is currently developing the next-generation web-based platform supporting current and future services. The project is committed to developing and using open-source software. The main components of the system architecture are: Apache SOLR/Lucene (search), CERN Invenio and MongoDB (storage), Python+Flask+Bootstrap (frontend). We are looking for a highly motivated full-stack developer interested in joining a dynamic team of talented individuals architecting and implementing the new platform. Your primary responsibility will be the design, development, and support of the ADS front-end applications (including the new search interface), as well as the implementation of the user database, login system and personalization features. For more information, please see the full posting online at: http://www.cfa.harvard.edu/hr/postings/13-32.html Thank you, Roman -- Dr. Roman Chyla ADS, Harvard-Smithsonian Center for Astrophysics roman.ch...@gmail.com
Adding OR operator in querystring and grouping fields?
This query returns the correct results: http://localhost:8983/solr/tt/select/?indent=on&fq={!geofilt}&pt=41.7882,-71.9498&sfield=geolocation&d=2000&q=*:*&start=0&rows=12&fl=id,title&facet.mincount=1&fl=_dist_:geodist() However, I want to add an OR select on a field city as well: fq=city:(brooklyn) But when I add that to my querystring: http://localhost:8983/solr/tt/select/?indent=on&fq=city:(brooklyn)&fq={!geofilt}&pt=41.7882,-71.9498&sfield=geolocation&d=2000&q=*:*&start=0&rows=12&fl=id,title&facet.mincount=1&fl=_dist_:geodist() Then I get 0 results. I have this in my schema.xml: <solrQueryParser defaultOperator="OR"/> How can I add an OR operator in my querystring and group the fields city and my geodist parameters? -- View this message in context: http://lucene.472066.n3.nabble.com/Adding-OR-operator-in-querystring-and-grouping-fields-tp4093942.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Adding OR operator in querystring and grouping fields?
The default query operator applies only within a single query parameter. If you want to OR two filter queries, you must combine them into one filter query parameter. -- Jack Krupansky -Original Message- From: PeterKerk Sent: Monday, October 07, 2013 1:08 PM To: solr-user@lucene.apache.org Subject: Adding OR operator in querystring and grouping fields? This query returns the correct results: http://localhost:8983/solr/tt/select/?indent=on&fq={!geofilt}&pt=41.7882,-71.9498&sfield=geolocation&d=2000&q=*:*&start=0&rows=12&fl=id,title&facet.mincount=1&fl=_dist_:geodist() However, I want to add an OR select on a field city as well: fq=city:(brooklyn) But when I add that to my querystring: http://localhost:8983/solr/tt/select/?indent=on&fq=city:(brooklyn)&fq={!geofilt}&pt=41.7882,-71.9498&sfield=geolocation&d=2000&q=*:*&start=0&rows=12&fl=id,title&facet.mincount=1&fl=_dist_:geodist() Then I get 0 results. I have this in my schema.xml: <solrQueryParser defaultOperator="OR"/> How can I add an OR operator in my querystring and group the fields city and my geodist parameters? -- View this message in context: http://lucene.472066.n3.nabble.com/Adding-OR-operator-in-querystring-and-grouping-fields-tp4093942.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Adding OR operator in querystring and grouping fields?
Ok, thanks. Regarding "you must combine them into one filter query parameter": how would I do that? Can I simply change the URL structure, or must I change my schema.xml and/or data-config.xml? -- View this message in context: http://lucene.472066.n3.nabble.com/Adding-OR-operator-in-querystring-and-grouping-fields-tp4093942p4093947.html Sent from the Solr - User mailing list archive at Nabble.com.
Delete a field - Atomic updates (SOLR 4.1.0) without using null=true
I am using SOLR 4.1.0 and performing atomic updates on SOLR documents. Unfortunately there is a bug in 4.1.0 (https://issues.apache.org/jira/browse/SOLR-4297) that blocks me from using null=true to delete a field through the atomic update functionality. Is there any other way to delete a field besides this syntax? FYI, I won't be able to migrate to the latest version now due to a company code freeze, hence I'm trying to figure out a temporary workaround. -- View this message in context: http://lucene.472066.n3.nabble.com/Delete-a-field-Atomic-updates-SOLR-4-1-0-without-using-null-true-tp4093951.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: {soft}Commit and cache flushing
Is there a way to make autoCommit only commit if there are pending changes, ie: if there are 0 adds pending commit, don't autoCommit (open-a-searcher and wipe the caches)? Cheers, Tim On 2 October 2013 00:52, Dmitry Kan solrexp...@gmail.com wrote: right. We've got the autoHard commit configured only atm. The soft-commits are controlled on the client. It was just easier to implement the first version of our internal commit policy that will commit to all solr instances at once. This is where we have noticed the reported behavior. On Wed, Oct 2, 2013 at 9:32 AM, Bram Van Dam bram.van...@intix.eu wrote: if there are no modifications to an index and a softCommit or hardCommit issued, then solr flushes the cache. Indeed. The easiest way to work around this is by disabling auto commits and only commit when you have to.
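(For what it's worth: the hard-commit timer in solrconfig.xml is only scheduled when uncommitted updates arrive, so an idle index should not keep committing on its own; and with openSearcher=false a hard commit is durability-only -- it does not open a new searcher or wipe the caches. A typical configuration, values illustrative:)

  <autoCommit>
    <maxTime>60000</maxTime>             <!-- commit at most 60s after the first pending update -->
    <openSearcher>false</openSearcher>   <!-- durability only: no new searcher, caches untouched -->
  </autoCommit>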
Re: solr cpu usage
Fantastic article! Tim On 5 October 2013 18:14, Erick Erickson erickerick...@gmail.com wrote: From my perspective, your question is almost impossible to answer; there are too many variables. See: http://searchhub.org/dev/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/ Best, Erick On Thu, Oct 3, 2013 at 9:38 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, More CPU cores means more concurrency. This is good if you need to handle high query rates. Faster cores mean lower query latency, assuming you are not bottlenecked by memory, disk IO or network IO. So what is ideal for you depends on your concurrency and latency needs. Otis Solr & ElasticSearch Support http://sematext.com/ On Oct 1, 2013 9:33 AM, adfel70 adfe...@gmail.com wrote: hi We're building a spec for a machine to purchase. We're going to buy 10 machines; we aren't sure yet how many processes we will run per machine. The question is: should we buy a faster CPU with fewer cores, or a slower CPU with more cores? In any case we will have 2 CPUs in each machine. Should we buy a 2.6GHz CPU with 8 cores or a 3.5GHz CPU with 4 cores? What would we gain by having many cores? What kinds of usages would make the CPU the bottleneck? -- View this message in context: http://lucene.472066.n3.nabble.com/solr-cpu-usage-tp4092938.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Delete a field - Atomic updates (SOLR 4.1.0) without using null=true
I don't know if there's a way to accomplish your goal directly, but as a pure workaround, you can write a routine to fetch all the stored values and resubmit the document without the field in question. This is what atomic updates do, minus the overhead of the transmission. On Oct 7, 2013, at 11:15 AM, SolrLover bbar...@gmail.com wrote: I am using SOLR 4.1.0 and perform atomic updates on SOLR documents. Unfortunately there is a bug in 4.1.0 (https://issues.apache.org/jira/browse/SOLR-4297) that blocks me from using null=true for deleting a field through atomic update functionality. Is there any other way to delete a field other than using this syntax? FYI..I wont be able to migrate to latest version now due to company code freeze hence trying to figure out a temporary work around. -- View this message in context: http://lucene.472066.n3.nabble.com/Delete-a-field-Atomic-updates-SOLR-4-1-0-without-using-null-true-tp4093951.html Sent from the Solr - User mailing list archive at Nabble.com.
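(A minimal SolrJ sketch of that read-and-resubmit workaround, assuming Solr 4.x SolrJ, that every field is stored, and that no stored field is a copyField target (or it would be duplicated on re-add). The core URL and "fieldToDelete" are hypothetical; exception handling is omitted:)

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.common.SolrDocument;
  import org.apache.solr.common.SolrInputDocument;

  HttpSolrServer server = new HttpSolrServer("http://localhost:8080/solr/collection1");
  SolrDocument old = server.query(new SolrQuery("id:12")).getResults().get(0);
  SolrInputDocument doc = new SolrInputDocument();
  for (String name : old.getFieldNames()) {
      // Copy every stored field except the one being "deleted";
      // _version_ is skipped so the re-add is not treated as an optimistic-locking update.
      if (!name.equals("fieldToDelete") && !name.equals("_version_")) {
          doc.addField(name, old.getFieldValue(name));
      }
  }
  server.add(doc);    // same uniqueKey -> full overwrite of the old document
  server.commit();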
Re: Doing time sensitive search in solr
Thanks Erick. OK, if we go by that proposal of copying all date fields into one bag_of_dates field, we now have fields that will look something like this:

<arr name="bag_of_dates">
  <str>2013-09-01T00:00:00Z</str>
  <str>2013-12-01T00:00:00Z</str>
</arr>
<arr name="text">
  <str>Sept content : Honda is releasing the car this month</str>
  <str>Dec content : Toyota is releasing the car this month</str>
</arr>

And I also agree that we can now make a range query like bag_of_dates:[* TO NOW] AND text:Toyota. But how are we going to make sure the document does not get returned, given that Toyota should only be searchable from 1-DEC-2013? I hope I am explaining it properly. On our website, when we render the data we don't show the line "Dec content : Toyota is releasing the car this month" on the page, since today's date is not 1-DEC-2013 yet; hence we don't want this doc to be shown in the search results either when we query Solr. -- View this message in context: http://lucene.472066.n3.nabble.com/Doing-time-sensitive-search-in-solr-tp4092273p4093961.html Sent from the Solr - User mailing list archive at Nabble.com.
Gracefully stopping jetty server - LockObtainFailedException
Hi, I have a SolrCloud (4.1) setup with an embedded Jetty server. I use the commands below to start and stop the server. Start server: nohup java -DSTOP.PORT=8085 -DSTOP.KEY=key -DnumShards=2 -Dbootstrap_confdir=./solr/nlp/conf -Dcollection.configName=myconf -DzkHost=10.88.139.206:2181,10.88.139.206:2182,10.88.139.206:2183 -jar start.jar > output.log 2>&1 Stop server: java -DSTOP.PORT=8085 -DSTOP.KEY=key -jar start.jar --stop What I have observed is that once I stop the server and start it again, indexing gives me 'org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out' on 'NativeFSLock@solr/nlp/data/index.20130924205253479/write.lock'. After I delete the lock file manually and start the server, indexing works fine. Please let me know how we can resolve this. If this issue was answered earlier, I would appreciate a pointer to the URL; I tried finding it but could not. Thanks in advance, Ashwin
SolrJ best practices
Are there any links describing best practices for interacting with SolrJ? I've checked the wiki and it seems woefully incomplete: (http://wiki.apache.org/solr/Solrj) Some specific questions: - When working with HttpSolrServer, should we keep instances around forever, or should we create a singleton that can/should be used over and over? - Is there a way to change the collection after creating the server, or do we need to create a new server for each collection? -..
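(Not an authoritative answer, but a common pattern with 4.x SolrJ -- a minimal sketch; the core names and URLs are hypothetical. HttpSolrServer is thread-safe, so the usual practice is to create one instance per core/collection base URL and reuse it for the life of the application. Since the collection is part of the base URL, you keep one instance per collection rather than switching a single instance between them:)

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServerException;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;

  public class SolrClients {
      // One long-lived, thread-safe instance per collection, reused everywhere.
      private static final HttpSolrServer USERS =
          new HttpSolrServer("http://localhost:8983/solr/users");
      private static final HttpSolrServer ORDERS =
          new HttpSolrServer("http://localhost:8983/solr/orders");

      public static long countUsers() throws SolrServerException {
          // Reuses the instance (and its pooled HTTP connections) on every call.
          return USERS.query(new SolrQuery("*:*")).getResults().getNumFound();
      }
  }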
Re: Adding OR operator in querystring and grouping fields?
Combine the two filter queries with an explicit OR operator. -- Jack Krupansky -Original Message- From: PeterKerk Sent: Monday, October 07, 2013 1:50 PM To: solr-user@lucene.apache.org Subject: Re: Adding OR operator in querystring and grouping fields? Ok, thanks. Regarding "you must combine them into one filter query parameter": how would I do that? Can I simply change the URL structure, or must I change my schema.xml and/or data-config.xml? -- View this message in context: http://lucene.472066.n3.nabble.com/Adding-OR-operator-in-querystring-and-grouping-fields-tp4093942p4093947.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Adding OR operator in querystring and grouping fields?
fq=here:there OR this:that For the lurker: an AND should be: fq=here:there&fq=this:that While you can, technically, pass: fq=here:there AND this:that Solr will cache the separate fq= parameters and reuse them in any context, whereas the AND(ed) filter will be cached as a single entry and only reused when the same AND construct is sent. Perhaps useful, but not as generally desirable. On Oct 7, 2013, at 2:10 PM, Jack Krupansky j...@basetechnology.com wrote: Combine the two filter queries with an explicit OR operator. -- Jack Krupansky -Original Message- From: PeterKerk Sent: Monday, October 07, 2013 1:50 PM To: solr-user@lucene.apache.org Subject: Re: Adding OR operator in querystring and grouping fields? Ok, thanks. Regarding "you must combine them into one filter query parameter": how would I do that? Can I simply change the URL structure, or must I change my schema.xml and/or data-config.xml? -- View this message in context: http://lucene.472066.n3.nabble.com/Adding-OR-operator-in-querystring-and-grouping-fields-tp4093942p4093947.html Sent from the Solr - User mailing list archive at Nabble.com.
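(One hedged way to apply this to the earlier city-plus-geofilt question is the _query_ nested-query syntax, which lets a local-params query sit inside an OR -- an untested sketch reusing that thread's parameters:)

  fq=city:brooklyn OR _query_:"{!geofilt pt=41.7882,-71.9498 sfield=geolocation d=2000}"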
Split shard doesn't persist data correctly on solr.xml
I notice that when a SPLITSHARD operation finishes, solr.xml is not updated properly.

# Parent solr.xml:
<core numShards="2" name="test_shard1_replica1" instanceDir="test_shard1_replica1" shard="shard1" collection="test"/>

# Children solr.xml:
<core name="test_shard1_0_replica1" shardState="construction" instanceDir="test_shard1_0_replica1" shard="shard1_0" collection="test">
  <property name="shardRange" value="8000-bfff"/>
</core>
<core name="test_shard1_1_replica1" shardState="construction" instanceDir="test_shard1_1_replica1" shard="shard1_1" collection="test">
  <property name="shardRange" value="c000-"/>
</core>

# Parent clusterstate:
"shard1": {
  "range": "8000-",
  "state": "inactive",
  "replicas": {
    "192.168.2.18:8983_solr_test_shard1_replica1": {
      "state": "active",
      "base_url": "http://192.168.2.18:8983/solr",
      "core": "test_shard1_replica1",
      "node_name": "192.168.2.18:8983_solr",
      "leader": "true"}}},

# Children clusterstate:
"shard1_0": {
  "range": "8000-bfff",
  "state": "active",
  "replicas": {
    "192.168.2.18:8983_solr_test_shard1_0_replica1": {
      "state": "active",
      "base_url": "http://192.168.2.18:8983/solr",
      "core": "statistics-11_shard1_0_replica1",
      "node_name": "192.168.2.18:8983_solr",
      "leader": "true"}}},
"shard1_1": {
  "range": "c000-",
  "state": "active",
  "replicas": {
    "192.168.2.18:8983_solr_test_shard1_1_replica1": {
      "state": "active",
      "base_url": "http://192.168.2.18:8983/solr",
      "core": "statistics-11_shard1_1_replica1",
      "node_name": "192.168.2.18:8983_solr",
      "leader": "true"}}},

I only noticed this because I did a restart and the nodes were shown as down on the cloud graph. The shards where I did a manual replication were written to the solr.xml file as expected, but not at the time that I executed the CREATE command. Command: curl 'http://192.168.2.18:8983/solr/admin/cores?action=CREATE&name=test_shard2_0_replicaX&collection=test&shard=shard2_0' Create replicaA -> solr.xml writes nothing about replicaA. Create replicaB -> solr.xml writes nothing about replicaB, but registers the data about replicaA. It's as if I have a lag of one operation; is this normal? /Yago - Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/Split-shard-doesn-t-persist-data-correctly-on-solr-xml-tp4093996.html Sent from the Solr - User mailing list archive at Nabble.com.
How to achieve distributed spelling check in SolrCloud?
Hi, We are in the process of transitioning to SolrCloud (4.4) from a Master-Slave architecture (4.2). One of the issues I'm facing now is making spell check work. It only seems to work if I explicitly set distrib=false. I'm using a custom request handler and included the spell check options:

<str name="spellcheck">on</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.onlyMorePopular">false</str>
<str name="spellcheck.extendedResults">false</str>
<str name="spellcheck.count">1</str>
<str name="spellcheck.dictionary">default</str>
</lst>
<!-- append spellchecking to our list of components -->
<arr name="last-components">
  <str>spellcheck</str>
</arr>

The spellcheck component has the usual configuration. The spell check is part of the request handler which is being used to execute a distributed query; I can't possibly add distrib=false. Just wondering if there's a way to address this. Any pointers will be appreciated. -Thanks, Shamik
Re: Adding OR operator in querystring and grouping fields?
@Jason: your example worked perfectly! -- View this message in context: http://lucene.472066.n3.nabble.com/Adding-OR-operator-in-querystring-and-grouping-fields-tp4093942p4093999.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr 4.5 - CoreAPI issue with CREATE
Hi, I'm creating replicas for my shards manually, and solr.xml doesn't save the changes (solr.xml attribute persist=true). The command used is: curl 'http://192.168.2.18:8983/solr/admin/cores?action=CREATE&name=test_shard1_replica2&collection=test&shard=shard1' Is someone else seeing the same behaviour? /Yago - Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-5-CoreAPI-issue-with-CREATE-tp4094001.html Sent from the Solr - User mailing list archive at Nabble.com.
Fix sort order within an index?
Is there any way to store documents in a fixed sort order within the indexes of certain fields (either the arrival order, or sorted by the int ids that also serve as my unique key), so that I could store them optimized for browsing lists of items? The order for browsing is always fixed and there are no further filter queries. I just need to fetch the top 20 (most recently added) documents with field value topic=x1. I came across this article and a JIRA issue which encouraged me that something like this may be possible: http://shaierera.blogspot.com/2013/04/index-sorting-with-lucene.html https://issues.apache.org/jira/browse/LUCENE-4752
Issue with distributed spelling check in Solr 4.4
Hi, We are in the process of transitioning to SolrCloud (4.4) from a Master-Slave architecture (4.2). One of the issues I'm facing now is making spell check work. It only seems to work if I explicitly set distrib=false. I'm using a custom request handler and included the spell check options:

<str name="spellcheck">on</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.onlyMorePopular">false</str>
<str name="spellcheck.extendedResults">false</str>
<str name="spellcheck.count">1</str>
<str name="spellcheck.dictionary">default</str>
</lst>
<arr name="last-components">
  <str>spellcheck</str>
</arr>

The spellcheck component has the usual configuration:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">spell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">text</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">1</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">4</int>
    <float name="maxQueryFrequency">0.01</float>
  </lst>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">text</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
  </lst>
</searchComponent>

The spell check is part of the request handler which is being used to execute a distributed query; I can't possibly add distrib=false. Just wondering if there's a way to address this. Any pointers will be appreciated. -Thanks, Shamik -- View this message in context: http://lucene.472066.n3.nabble.com/Issue-with-distributed-spelling-check-in-Solr-4-4-tp4094009.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Regarding edismax parsing
You're probably running into the distinction between query parsing and analysis, which has been discussed many times. The issue is that the query parser breaks things up into individual tokens and _then_ sends them to the analysis chain as individual tokens (usually). Try escaping your spaces. Best, Erick On Mon, Oct 7, 2013 at 8:28 AM, Prashant Golash prashant.gol...@gmail.com wrote: Hi, I have a question regarding the parsing of tokens in the edismax parser, and a follow-up question related to it. - Each field has a list of analyzers and tokenizers as configured in schema.xml (index and query time). Now, say I search for the query red shoes. Is it the case that, to form the disjunction query on each field, edismax first applies the analyzers configured for that field and then forms the query? For example, if field1 changes red to rd and field2 changes red to re, will the query be like (field1:rd) | (field2:re)? - If the above holds true, then when I changed the ordering of analyzers and put SynonymFilterFactory at the top of all analyzers (in schema.xml), edismax still tokenized the query first on whitespace and only then applied the synonym filter factory, which leads me to think that this is not happening. My use case is: before applying any tokenizer, I want to support phrase-level synonym replacement and then do the rest of the analysis. Thanks, Prashant Golash
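(To illustrate the escaping suggestion: a backslash-escaped space survives the query parser's whitespace splitting, so the whole phrase reaches the field analyzer -- and thus a synonym filter -- as one string. A hedged sketch:)

  q=red\ shoes      (instead of q=red shoes, which is split into "red" and "shoes" before analysis)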
Re: no such field error:smaller big block size details while indexing doc files
Well, one of the attributes parsed out of (probably the meta-information associated with) one of your structured docs is SMALLER_BIG_BLOCK_SIZE_DETAILS, and Solr Cell is faithfully sending it to your index. If you want to throw all of these in the bit bucket, try defining a true catch-all field that ignores things, like this: <dynamicField name="*" type="ignored" multiValued="true"/> Best, Erick On Mon, Oct 7, 2013 at 8:03 AM, sweety sweetyshind...@yahoo.com wrote: I'm trying to index .doc, .docx and .pdf files using this URL: curl "http://localhost:8080/solr/document/update/extract?literal.id=12&commit=true" -F "myfile=@complex.doc" This is the error I get:

Oct 07, 2013 5:02:18 PM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.RuntimeException: java.lang.NoSuchFieldError: SMALLER_BIG_BLOCK_SIZE_DETAILS
  at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:651)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:364)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
  at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:928)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
  at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
  at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:539)
  at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:298)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
  at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.NoSuchFieldError: SMALLER_BIG_BLOCK_SIZE_DETAILS
  at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:93)
  at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:190)
  at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:184)
  at org.apache.tika.parser.microsoft.POIFSContainerDetector.getTopLevelNames(POIFSContainerDetector.java:376)
  at org.apache.tika.parser.microsoft.POIFSContainerDetector.detect(POIFSContainerDetector.java:165)
  at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
  at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:113)
  at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
  at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
  ... 16 more

Also, using the same type of URL, .txt, .mp3 and .pdf files are indexed successfully (curl "http://localhost:8080/solr/document/update/extract?literal.id=12&commit=true" -F "myfile=@abc.txt"). Schema.xml is:

<schema name="documents">
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
    <field name="author" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="comments" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="keywords" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="contents" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="title" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="revision_number" type="string" indexed="true" stored="true" multiValued="false"/>
    <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
    <dynamicField name="ignored_*" type="string" indexed="false" stored="true" multiValued="true"/>
    <copyField source="id" dest="text"/>
    <copyField source="author" dest="text"/>
  </fields>
  <types>
    <fieldType name="integer" class="solr.IntField"/>
    <fieldType name="long" class="solr.LongField"/>
    <fieldType name="string"
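(For Erick's catch-all dynamicField suggestion above to work, the schema also needs an "ignored" field type. The stock Solr 4.x example schema defines one roughly like this -- quoted from memory, so treat it as a sketch:)

  <fieldType name="ignored" class="solr.StrField" indexed="false" stored="false" multiValued="true"/>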
Re: Search for non empty fields in a index with denormalized tables
I don't think your model fits well into Solr. What I'd do is make my uniqueKey the patient ID, and put the image names (or links or whatever) in a multiValued field. Then you can do what you want with a simple q=*:* -image_name:[* TO *] Best, Erick On Mon, Oct 7, 2013 at 9:20 AM, SandroZbinden zbin...@imagic.ch wrote: Okay, I'll try to specify my question a little more. I have a denormalized index of two SQL tables, patient and image. If I add a patient with two images to the Solr index, my index contains 3 documents:

 Pat_ID | Patient_Lastname | Image_ID | Image_Name
 -------+------------------+----------+-----------
 1      | Miller           | EMPTY    | EMPTY
 1      | Miller           | 1        | dog.jpg
 1      | Miller           | 2        | cat.jpg

When I then add another patient without any images, the Solr index contains 4 documents:

 Pat_ID | Patient_Lastname | Image_ID | Image_Name
 -------+------------------+----------+-----------
 1      | Miller           | EMPTY    | EMPTY
 1      | Miller           | 1        | dog.jpg
 1      | Miller           | 2        | cat.jpg
 2      | Smith            | EMPTY    | EMPTY

Now I want to select all patients that have no image (Image_ID is empty). If I query this with the following Solr query, the result would be Miller and Smith, but I need a query that returns Smith only: select?q=-Image_ID:[0 TO *] and then group by Pat_ID What I would need is something like the HAVING clause in SQL, so I could group by Pat_ID and filter for the ones where the count is less than 2. Bests, Sandro -- View this message in context: http://lucene.472066.n3.nabble.com/Search-for-non-empty-fields-in-a-index-with-denormalized-tables-tp4093287p4093903.html Sent from the Solr - User mailing list archive at Nabble.com.
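(A hedged sketch of Erick's suggested model, with hypothetical field names -- one document per patient, images collapsed into a multiValued field:)

  <field name="pat_id" type="string" indexed="true" stored="true" required="true"/>  <!-- the uniqueKey -->
  <field name="patient_lastname" type="string" indexed="true" stored="true"/>
  <field name="image_name" type="string" indexed="true" stored="true" multiValued="true"/>

Patients without images are then simply the documents missing the field entirely:

  q=*:* -image_name:[* TO *]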
Re: Soft commit and flush
bq: If so, using soft commit without calling hard commit could cause OOM No. Aside from anything you have configured for auto (hard) commit, the ramBufferSizeMB setting in solrconfig.xml will flush the in-memory structures out to segments when their size reaches this limit. It won't _close_ the current segment, so it won't be permanent, but it will limit memory consumption. Best, Erick On Mon, Oct 7, 2013 at 9:40 AM, Guido Medina guido.med...@temetra.com wrote: Out of Memory Exception is well known as OOM. Guido. On 07/10/13 14:11, adfel70 wrote: Sorry, by OOE I meant Out of Memory Exception... -- View this message in context: http://lucene.472066.n3.nabble.com/Soft-commit-and-flush-tp4091726p4093902.html Sent from the Solr - User mailing list archive at Nabble.com.
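(For reference, the setting Erick mentions lives in solrconfig.xml; 100 is the Solr 4.x default, shown here for illustration:)

  <ramBufferSizeMB>100</ramBufferSizeMB>  <!-- flush in-memory index structures to disk past this size -->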
Re: solr cpu usage
Tim: Thanks! Mostly I wrote it to have something official-looking to hide behind when I didn't have a good answer to the hardware-sizing question :). On Mon, Oct 7, 2013 at 2:48 PM, Tim Vaillancourt t...@elementspace.com wrote: Fantastic article! Tim On 5 October 2013 18:14, Erick Erickson erickerick...@gmail.com wrote: From my perspective, your question is almost impossible to answer; there are too many variables. See: http://searchhub.org/dev/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/ Best, Erick On Thu, Oct 3, 2013 at 9:38 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, More CPU cores means more concurrency. This is good if you need to handle high query rates. Faster cores mean lower query latency, assuming you are not bottlenecked by memory, disk IO or network IO. So what is ideal for you depends on your concurrency and latency needs. Otis Solr & ElasticSearch Support http://sematext.com/ On Oct 1, 2013 9:33 AM, adfel70 adfe...@gmail.com wrote: hi We're building a spec for a machine to purchase. We're going to buy 10 machines; we aren't sure yet how many processes we will run per machine. The question is: should we buy a faster CPU with fewer cores, or a slower CPU with more cores? In any case we will have 2 CPUs in each machine. Should we buy a 2.6GHz CPU with 8 cores or a 3.5GHz CPU with 4 cores? What would we gain by having many cores? What kinds of usages would make the CPU the bottleneck? -- View this message in context: http://lucene.472066.n3.nabble.com/solr-cpu-usage-tp4092938.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Doing time sensitive search in solr
I'd index them as separate documents. Best, Erick On Mon, Oct 7, 2013 at 2:59 PM, Darniz rnizamud...@edmunds.com wrote: Thanks Erick. OK, if we go by that proposal of copying all date fields into one bag_of_dates field, we now have fields that will look something like this:

<arr name="bag_of_dates">
  <str>2013-09-01T00:00:00Z</str>
  <str>2013-12-01T00:00:00Z</str>
</arr>
<arr name="text">
  <str>Sept content : Honda is releasing the car this month</str>
  <str>Dec content : Toyota is releasing the car this month</str>
</arr>

And I also agree that we can now make a range query like bag_of_dates:[* TO NOW] AND text:Toyota. But how are we going to make sure the document does not get returned, given that Toyota should only be searchable from 1-DEC-2013? I hope I am explaining it properly. On our website, when we render the data we don't show the line "Dec content : Toyota is releasing the car this month" on the page, since today's date is not 1-DEC-2013 yet; hence we don't want this doc to be shown in the search results either when we query Solr. -- View this message in context: http://lucene.472066.n3.nabble.com/Doing-time-sensitive-search-in-solr-tp4092273p4093961.html Sent from the Solr - User mailing list archive at Nabble.com.
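(A hedged sketch of the separate-documents approach Erick suggests -- the id and field names are hypothetical. Each dated blurb becomes its own document carrying its own publish date, and a standard range filter does the rest:)

  <doc>
    <field name="id">car-123-sept</field>
    <field name="publish_date">2013-09-01T00:00:00Z</field>
    <field name="text">Sept content : Honda is releasing the car this month</field>
  </doc>
  <doc>
    <field name="id">car-123-dec</field>
    <field name="publish_date">2013-12-01T00:00:00Z</field>
    <field name="text">Dec content : Toyota is releasing the car this month</field>
  </doc>

  q=text:Toyota&fq=publish_date:[* TO NOW]

Until 1-DEC-2013 the December document simply fails the publish_date filter, so it neither renders on the page nor matches the search.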