Re: block join and atomic updates

2014-02-18 Thread mm
But isn't query time join much slower when it comes to a large number  
of documents?


Quoting Mikhail Khludnev mkhlud...@griddynamics.com:


Hello,

It sounds like you need to switch to query time join.
On 15.02.2014 at 21:57, m...@preselect-media.com wrote:


Any suggestions?


Quoting m...@preselect-media.com:

 Yonik Seeley yo...@heliosearch.com:



On Thu, Feb 13, 2014 at 8:25 AM,  m...@preselect-media.com wrote:


Is there any workaround to perform atomic updates on blocks or do I
have to
re-index the parent document and all its children again every time I
want to
update a field?



The latter, unfortunately.



Is there any plan to change this behavior in the near future?

So, I'm thinking of alternatives without losing the benefit of block
join.
I try to explain an idea I just thought about:

Let's say I have a parent document A with a number of fields I want to
update regularly and a number of child documents AC_1 ... AC_n which are
only indexed once and aren't going to change anymore.
So, if I index A and AC_* in a block and I update A, the block is gone.
But if I create an additional document AF which only contains something
like a foreign key to A and index AF + AC_* as a block (not A + AC_*
anymore), could I perform a {!parent ... } query on AF + AC_* and make a
join from the results to get A?
Does this make any sense and is it even possible? ;-)
And if it's possible, how can I do it?
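
A rough sketch of how that might look with the query-time join parser, assuming
AF carries a field fk_a holding A's id and a doc_type field marking the fake
parents (field names are illustrative, untested; {!parent} needs Solr 4.5+):

q={!join from=fk_a to=id v=$blockq}&blockq={!parent which=doc_type:AF}child_text:foo

The inner {!parent} query selects the matching AF documents from the AF + AC_*
blocks, and the outer {!join} maps their foreign keys back onto the real A
documents.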

Thanks,
- Moritz












RE: query parameters

2014-02-18 Thread Andreas Owen
It seems that fq doesn't accept OR because: (organisations:(150 OR 41) AND 
roles:(174)) OR  (-organisations:[ TO *] AND -roles:[ TO *]) only returns 
docs that match the first condition. It doesn't return any docs with the empty 
fields organisations and roles.

-Original Message-
From: Andreas Owen [mailto:a...@conx.ch] 
Sent: Monday, 17 February 2014 05:08
To: solr-user@lucene.apache.org
Subject: query parameters


In the solrconfig of my Solr 4.3 I have a user-defined requestHandler. I would like 
to use fq to force the following conditions:
   1: organisations is empty and roles is empty
   2: organisations contains one of the comma-delimited list in variable $org
   3: roles contains one of the comma-delimited list in variable $r
   4: rule 2 and 3

snippet of what I've got (haven't checked whether there is an 'in' operator like in SQL 
for the list value)

<lst name="defaults">
   <str name="echoParams">explicit</str>
   <int name="rows">10</int>
   <str name="defType">edismax</str>
   <str name="synonyms">true</str>
   <str name="qf">plain_text^10 editorschoice^200
title^20 h_*^14 
tags^10 thema^15 inhaltstyp^6 breadcrumb^6 doctype^10
contentmanager^5 links^5
last_modified^5 url^5
   </str>
   <str name="fq">(organisations='' roles='') or (organisations=$org 
roles=$r) or (organisations='' roles=$r) or (organisations=$org roles='')</str>
   <str name="bq">(expiration:[NOW TO *] OR (*:* 
-expiration:*))^6</str>  <!-- tested: now or newer or empty gets small boost -->
   <str name="bf">div(clicks,max(displays,1))^8</str> <!-- tested -->
   






Re: Facet cache issue when deleting documents from the index

2014-02-18 Thread Marius Dumitru Florea
In the end the problem was actually in my code... sorry for the noise.
The documents were deleted from my database but not from the Solr
index, and I have a display filter that filters out search results that
correspond to documents that no longer exist in the database,
but this filter doesn't update the facets.

Thanks for the help,
Marius

On Mon, Feb 17, 2014 at 10:52 PM, Marius Dumitru Florea
mariusdumitru.flo...@xwiki.com wrote:
 I tried to set the expungeDeletes flag but it didn't fix the problem.
 The SolrServer doesn't expose a way to set this flag so I had to use:

 new UpdateRequest().setAction(UpdateRequest.ACTION.COMMIT, true, true,
 1, true).process(solrServer);

 Any other hints?

 Note that I managed to run my test in my real environment at runtime
 and it passed, so it seems the behaviour depends on the size of the
 documents that are committed (added to or deleted from the index).

 Thanks,
 Marius

 On Mon, Feb 17, 2014 at 2:32 PM, Marius Dumitru Florea
 mariusdumitru.flo...@xwiki.com wrote:
 On Mon, Feb 17, 2014 at 2:00 PM, Ahmet Arslan iori...@yahoo.com wrote:
 Hi,


 Also I noticed that in your code snippet you have server.delete("foo"); 
 which does not exist. deleteById and deleteByQuery methods are defined in 
 the SolrServer implementation.

 Yes, sorry, I have a wrapper over the SolrInstance that doesn't do
 much. In the case of delete it just forwards the call to deleteById.
 I'll check the expungeDeletes=true flag and post back the results.

 Thanks,
 Marius




 On Monday, February 17, 2014 1:42 PM, Ahmet Arslan iori...@yahoo.com 
 wrote:
 Hi Marius,

 Facets are computed from indexed terms. Can you commit with 
 expungeDeletes=true flag?

 Ahmet




 On Monday, February 17, 2014 12:17 PM, Marius Dumitru Florea 
 mariusdumitru.flo...@xwiki.com wrote:
 Hi guys,

 I'm using Solr 4.6.1 (embedded) and for some reason the facet cache is
 not invalidated when documents are deleted from the index. Sadly, for
 me, I cannot reproduce this issue with an integration test like this:

 --8--
 SolrInstance server = getSolrInstance();

 SolrInputDocument document = new SolrInputDocument();
 document.setField("id", "foo");
 document.setField("locale", "en");
 server.add(document);

 server.commit();

 document = new SolrInputDocument();
 document.setField("id", "bar");
 document.setField("locale", "en");
 server.add(document);

 server.commit();

 SolrQuery query = new SolrQuery("*:*");
 query.set("facet", "on");
 query.set("facet.field", "locale");
 QueryResponse response = server.query(query);

 Assert.assertEquals(2, response.getResults().size());
 FacetField localeFacet = response.getFacetField("locale");
 Assert.assertEquals(1, localeFacet.getValues().size());
 Count en = localeFacet.getValues().get(0);
 Assert.assertEquals("en", en.getName());
 Assert.assertEquals(2, en.getCount());

 server.delete("foo");
 server.commit();

 response = server.query(query);

 Assert.assertEquals(1, response.getResults().size());
 localeFacet = response.getFacetField("locale");
 Assert.assertEquals(1, localeFacet.getValues().size());
 en = localeFacet.getValues().get(0);
 Assert.assertEquals("en", en.getName());
 Assert.assertEquals(1, en.getCount());
 --8--

 Nevertheless, when I do the 'same' on my real environment, the count
 for the locale facet remains 2 after one of the documents is deleted.
 The search result count is fine, so that's why I think it's a facet
 cache issue. Note that the facet count remains 2 even after I restart
 the server, so the cache is persisted on the file system.

 Strangely, the facet count is updated correctly if I modify the
 document instead of deleting it (i.e. removing a keyword from the
 content so that it isn't matched by the search query any more). So it
 looks like only delete triggers the issue.

 Now, an interesting fact is that if, on my real environment, I delete
 one of the documents and then add a new one, the facet count becomes
 3. So the last commit to the index, which inserts a new document,
 doesn't trigger a re-computation of the facet cache. The previous
 facet cache is simply incremented, so the error is perpetuated. At
 this point I don't even know how to fix the facet cache without
 deleting the Solr data folder so that the full index is rebuilt.

 I'm still trying to figure out what is the difference between the
 integration test and my real environment (as I used the same schema
 and configuration). Do you know what might be wrong?

 Thanks,
 Marius



Re: query parameters

2014-02-18 Thread Raymond Wiker
That could be because the second condition does not do what you think it
does... have you tried running the second condition separately?

You may have to add a base term to the second condition, like what you
have for the bq parameter in your config file; i.e, something like

(*:* -organisations:[ TO *] -roles:[ TO *])
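
Putting it together, the whole filter might look something like this (a sketch,
using [* TO *] as the "field has any value" check; untested):

fq=(organisations:(150 OR 41) AND roles:(174)) OR (*:* -organisations:[* TO *] -roles:[* TO *])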




On Tue, Feb 18, 2014 at 12:16 PM, Andreas Owen a...@conx.ch wrote:

 It seems that fq doesn't accept OR because: (organisations:(150 OR 41) AND
 roles:(174)) OR  (-organisations:[ TO *] AND -roles:[ TO *]) only
 returns docs that match the first condition. It doesn't return any docs
 with the empty fields organisations and roles.

 -Original Message-
 From: Andreas Owen [mailto:a...@conx.ch]
 Sent: Monday, 17 February 2014 05:08
 To: solr-user@lucene.apache.org
 Subject: query parameters


 In the solrconfig of my Solr 4.3 I have a user-defined requestHandler. I would
 like to use fq to force the following conditions:
1: organisations is empty and roles is empty
2: organisations contains one of the comma-delimited list in variable
 $org
3: roles contains one of the comma-delimited list in variable $r
4: rule 2 and 3

 snippet of what I've got (haven't checked whether there is an 'in' operator like in
 SQL for the list value)

 <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="defType">edismax</str>
    <str name="synonyms">true</str>
    <str name="qf">plain_text^10 editorschoice^200
 title^20 h_*^14
 tags^10 thema^15 inhaltstyp^6 breadcrumb^6 doctype^10
 contentmanager^5 links^5
 last_modified^5 url^5
    </str>
    <str name="fq">(organisations='' roles='') or
 (organisations=$org roles=$r) or (organisations='' roles=$r) or
 (organisations=$org roles='')</str>
    <str name="bq">(expiration:[NOW TO *] OR (*:*
 -expiration:*))^6</str>  <!-- tested: now or newer or empty gets small
 boost -->
    <str name="bf">div(clicks,max(displays,1))^8</str> <!-- tested
 -->








Re: block join and atomic updates

2014-02-18 Thread Mikhail Khludnev
absolutely.


On Tue, Feb 18, 2014 at 1:20 PM, m...@preselect-media.com wrote:

 But isn't query time join much slower when it comes to a large number of
 documents?

 Quoting Mikhail Khludnev mkhlud...@griddynamics.com:


  Hello,

 It sounds like you need to switch to query time join.
 On 15.02.2014 at 21:57, m...@preselect-media.com wrote:

  Any suggestions?


 Quoting m...@preselect-media.com:

  Yonik Seeley yo...@heliosearch.com:


  On Thu, Feb 13, 2014 at 8:25 AM,  m...@preselect-media.com wrote:

  Is there any workaround to perform atomic updates on blocks or do I
 have to
 re-index the parent document and all its children again every time I
 want to
 update a field?


 The latter, unfortunately.


 Is there any plan to change this behavior in the near future?

 So, I'm thinking of alternatives without losing the benefit of block
 join.
 I try to explain an idea I just thought about:

 Let's say I have a parent document A with a number of fields I want to
 update regularly and a number of child documents AC_1 ... AC_n which are
 only indexed once and aren't going to change anymore.
 So, if I index A and AC_* in a block and I update A, the block is gone.
 But if I create an additional document AF which only contains something
 like a foreign key to A and index AF + AC_* as a block (not A + AC_*
 anymore), could I perform a {!parent ... } query on AF + AC_* and make
 a
 join from the results to get A?
 Does this make any sense and is it even possible? ;-)
 And if it's possible, how can I do it?

 Thanks,
 - Moritz











-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Indexed a new big database while the old is running?

2014-02-18 Thread Bruno Mannina

Dear Solr Users,

We currently have a Solr db with around 88 000 000 docs.
All works fine :)

Each year we receive a new backfile with the same content (but improved).

Indexing these docs takes several days in Solr,
so is it possible to create a new collection (restart Solr) and
index these new 88 000 000 docs without stopping the current collection?

We have around 1 million connections per month.

Do you think that this new indexing may cause problems for the running Solr?
Note: the new database will not be used until the current collection is 
stopped.


Thx for your comment,
Bruno



Fault Tolerant Technique of Solr Cloud

2014-02-18 Thread Vineet Mishra
Hi All,

I want to have clear idea about the Fault Tolerant Capability of SolrCloud

Considering I have setup the SolrCloud with a external Zookeeper, 2 shards,
each having a replica with single collection as given in the official Solr
Documentation.

https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud

            Collection1
             /       \
            /         \
           /           \
       Shard 1       Shard 2
  localhost:8983   localhost:7574
  localhost:8900   localhost:7500


I indexed some documents, and if I then shut down any of the replicas or the leader,
say for example localhost:8900, I can't query the collection on that
particular port:

http://localhost:8900/solr/collection1/select?q=*:*

Then how is it fault tolerant, and how should the query be made?

Regards


Re: Best way to copy data from SolrCloud to standalone Solr?

2014-02-18 Thread Daniel Bryant

Hi Shawn, Michael,

Many thanks for your responses - we're going to try the 
replication/backup command, as we're thinking this is a 'two birds with 
one stone' approach which will not only allow us to copy the indexes, 
but also help with backups in SolrCloud as well.


Thanks again to you both!

Best wishes,

Daniel



On 17/02/2014 20:25, Michael Della Bitta wrote:

I do know for certain that the backup command on a cloud core still works.
We have a script like this running on a cron to snapshot indexes:

curl -s '
http://localhost:8080/solr/#{core}/replication?command=backup&numberToKeep=4&location=/tmp
'

(not really using /tmp for this, parameters changed to protect the guilty)

The admin handler for replication doesn't seem to be there, but the actual
API seems to work normally.

Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

The Science of Influence Marketing

18 East 41st Street

New York, NY 10017

t: @appinions <https://twitter.com/Appinions> | g+:
plus.google.com/appinions <https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts>
w: appinions.com <http://www.appinions.com/>


On Mon, Feb 17, 2014 at 2:02 PM, Shawn Heisey s...@elyograg.org wrote:


On 2/17/2014 8:32 AM, Daniel Bryant wrote:

I have a production SolrCloud server which has multiple sharded indexes,
and I need to copy all of the indexes to a (non-cloud) Solr server
within our QA environment.

Can I ask for advice on the best way to do this please?

I've searched the web and found solr2solr
(https://github.com/dbashford/solr2solr), but the author states that
this is best for small indexes, and ours are rather large at ~20Gb each.
I've also looked at replication, but can't find a definite reference on
how this should be done between SolrCloud and Solr?

Any guidance is very much appreciated.

If the master index isn't changing at the time of the copy, and you're
on a non-Windows platform, you should be able to copy the index
directory directly.  On a Windows platform, whether you can copy the
index while Solr is using it would depend on how Solr/Lucene opens the
files.  A typical Windows file open will prevent anything else from
opening them, and I do not know whether Lucene is smarter than that.

SolrCloud requires the replication handler to be enabled on all configs,
but during normal operation, it does not actually use replication.  This
is a confusing thing for some users.

I *think* you can configure the replication handler on slave cores with
a non-cloud config that point at the master cores, and it should
replicate the main Lucene index, but not the config files.  I have no
idea whether things will work right if you configure other master
options like replicateAfter and config files, and I also don't know if
those options might cause problems for SolrCloud itself.  Those options
shouldn't be necessary for just getting the data into a dev environment,
though.

Thanks,
Shawn




--
Daniel Bryant  |  Software Development Consultant  |  www.tai-dev.co.uk 
<http://www.tai-dev.co.uk/>
daniel.bry...@tai-dev.co.uk <mailto:daniel.bry...@tai-dev.co.uk>  |  +44 
(0) 7799406399  |  Twitter: @taidevcouk <https://twitter.com/taidevcouk>


Re: Increasing number of SolrIndexSearcher (Leakage)?

2014-02-18 Thread Yonik Seeley
On Mon, Feb 17, 2014 at 1:34 AM, Nguyen Manh Tien
tien.nguyenm...@gmail.com wrote:
 - *But after i index some docs and run softCommit or hardCommit with
 openSearcher=false, number of SolrIndexSearcher increase by 1*

This is fine... it's more of an internal implementation detail (we
open what is called a real-time searcher so we can drop some other
data structures like the list of non-visible document updates, etc).
If you did the commit again, the count should not continue to
increase.

If the number of searchers continues to increase, you have a searcher
leak due to something else.
Are you using any custom components or anything else that isn't stock Solr?

-Yonik
http://heliosearch.org - native off-heap filters and fieldcache for solr


RE: Boost Query Example

2014-02-18 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
Hi Michael, Thanks for the information.

Now I am trying the query, but I am not getting the sequence in order.

SKU with 223-CL10V3 should list first (exact match).
ManufacturerNumber with 223-CL10V3 should list second (exact match) if the first 
is available; if not, the ManufacturerNumber doc will be first in the list.

SKU with 223-CL10V3* should list third (starts-with match); if neither SKU nor 
ManufacturerNumber is found exactly, then this will be first in the query results.

Can you check the query below, or rewrite it, or point me to some helpful references? 
The query below is not returning results the way it should...

http://localhost:8983/solr/SRSFR_ProductCollection/select?q=SKU:223-CL10V3^10%20OR%20ManufactureNumber:223-CL10V3^5%20OR%20SKU:223-CL10V3*^1&wt=json&indent=true



-Original Message-
From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com] 
Sent: Monday, February 17, 2014 4:12 PM
To: solr-user@lucene.apache.org
Subject: Re: Boost Query Example

Hi,

Filter queries don't affect score, so boosting won't have an effect there.
If you want those query terms to get boosted, move them into the q parameter.

http://wiki.apache.org/solr/CommonQueryParameters#fq

Hope that helps!

Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

The Science of Influence Marketing

18 East 41st Street

New York, NY 10017

t: @appinions <https://twitter.com/Appinions> | g+:
plus.google.com/appinions <https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts>
w: appinions.com <http://www.appinions.com/>


On Mon, Feb 17, 2014 at 3:49 PM, EXTERNAL Taminidi Ravi (ETI,
Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote:


 Hi, can someone help me with the Boost & Sort query example?

 http://localhost:8983/solr/ProductCollection/select?q=*%3A*&wt=json&indent=true&fq=SKU:223-CL10V3^100
 OR SKU:223-CL1^90

 There is no difference in the query order; let me know if I am missing 
 something. Also, I would like to order with the exact match first for 
 SKU:223-CL10V3^100

 Thanks

 Ravi



Re: Boost Query Example

2014-02-18 Thread Jack Krupansky
Add debugQuery=true to your queries and look at the scoring in the explain 
section. From the intermediate scoring by field, you should be able to do 
the math to figure out what boost would be required to rank your exact match 
high enough.
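
For example, against the query from the mail below (same URL, with the debug 
parameter added):

http://localhost:8983/solr/SRSFR_ProductCollection/select?q=SKU:223-CL10V3^10%20OR%20ManufactureNumber:223-CL10V3^5%20OR%20SKU:223-CL10V3*^1&wt=json&indent=true&debugQuery=true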


-- Jack Krupansky

-Original Message- 
From: EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)

Sent: Tuesday, February 18, 2014 9:50 AM
To: solr-user@lucene.apache.org ; michael.della.bi...@appinions.com
Subject: RE: Boost Query Example

Hi Michael, Thanks for the information.

Now I am trying the query, but I am not getting the sequence in order.

SKU with 223-CL10V3 should list first (exact match).
ManufacturerNumber with 223-CL10V3 should list second (exact match) if the first 
is available; if not, the ManufacturerNumber doc will be first in the list.


SKU with 223-CL10V3* should list third (starts-with match); if neither SKU nor 
ManufacturerNumber is found exactly, then this will be first in the query results.


Can you check the query below, or rewrite it, or point me to some helpful 
references? The query below is not returning results the way it should...


http://localhost:8983/solr/SRSFR_ProductCollection/select?q=SKU:223-CL10V3^10%20OR%20ManufactureNumber:223-CL10V3^5%20OR%20SKU:223-CL10V3*^1&wt=json&indent=true




-Original Message-
From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com]
Sent: Monday, February 17, 2014 4:12 PM
To: solr-user@lucene.apache.org
Subject: Re: Boost Query Example

Hi,

Filter queries don't affect score, so boosting won't have an effect there.
If you want those query terms to get boosted, move them into the q 
parameter.


http://wiki.apache.org/solr/CommonQueryParameters#fq

Hope that helps!

Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

The Science of Influence Marketing

18 East 41st Street

New York, NY 10017

t: @appinions <https://twitter.com/Appinions> | g+:
plus.google.com/appinions <https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts>
w: appinions.com <http://www.appinions.com/>


On Mon, Feb 17, 2014 at 3:49 PM, EXTERNAL Taminidi Ravi (ETI,
Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote:



Hi, can someone help me with the Boost & Sort query example?

http://localhost:8983/solr/ProductCollection/select?q=*%3A*&wt=json&indent=true&fq=SKU:223-CL10V3^100
OR SKU:223-CL1^90

There is no difference in the query order; let me know if I am missing
something. Also, I would like to order with the exact match first for
SKU:223-CL10V3^100

Thanks

Ravi





Re: Limit amount of search result

2014-02-18 Thread Sameer Maggon
You are welcome!

On Mon, Feb 17, 2014 at 11:07 PM, rachun rachun.c...@gmail.com wrote:

 Hi Sameer,

 Thank you very much for your suggestion.
 I've got it working now ;)

 Chun.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Limit-amount-of-search-result-tp4117062p4117952.html
 Sent from the Solr - User mailing list archive at Nabble.com.



RE: Boost Query Example

2014-02-18 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)

I don't have much experience with this boosting; can you explain with an example?  
Your help is really appreciated.

--Ravi

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: Tuesday, February 18, 2014 9:58 AM
To: solr-user@lucene.apache.org
Subject: Re: Boost Query Example

Add debugQuery=true to your queries and look at the scoring in the explain 
section. From the intermediate scoring by field, you should be able to do the 
math to figure out what boost would be required to rank your exact match high 
enough.

-- Jack Krupansky

-Original Message-
From: EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
Sent: Tuesday, February 18, 2014 9:50 AM
To: solr-user@lucene.apache.org ; michael.della.bi...@appinions.com
Subject: RE: Boost Query Example

Hi Michael, Thanks for the information.

Now I am trying the query, but I am not getting the sequence in order.

SKU with 223-CL10V3 should list first (exact match). ManufacturerNumber with 223-CL10V3 
should list second (exact match) if the first is available; if not, the ManufacturerNumber 
doc will be first in the list.

SKU with 223-CL10V3* should list third (starts-with match); if neither SKU nor 
ManufacturerNumber is found exactly, then this will be first in the query results.

Can you check the query below, or rewrite it, or point me to some helpful references? 
The query below is not returning results the way it should...

http://localhost:8983/solr/SRSFR_ProductCollection/select?q=SKU:223-CL10V3^10%20OR%20ManufactureNumber:223-CL10V3^5%20OR%20SKU:223-CL10V3*^1&wt=json&indent=true



-Original Message-
From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com]
Sent: Monday, February 17, 2014 4:12 PM
To: solr-user@lucene.apache.org
Subject: Re: Boost Query Example

Hi,

Filter queries don't affect score, so boosting won't have an effect there.
If you want those query terms to get boosted, move them into the q 
parameter.

http://wiki.apache.org/solr/CommonQueryParameters#fq

Hope that helps!

Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

The Science of Influence Marketing

18 East 41st Street

New York, NY 10017

t: @appinions <https://twitter.com/Appinions> | g+:
plus.google.com/appinions <https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts>
w: appinions.com <http://www.appinions.com/>


On Mon, Feb 17, 2014 at 3:49 PM, EXTERNAL Taminidi Ravi (ETI,
Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote:


 Hi, can someone help me with the Boost & Sort query example?

 http://localhost:8983/solr/ProductCollection/select?q=*%3A*&wt=json&indent=true&fq=SKU:223-CL10V3^100
 OR SKU:223-CL1^90

 There is no difference in the query order; let me know if I am missing
 something. Also, I would like to order with the exact match first for
 SKU:223-CL10V3^100

 Thanks

 Ravi
 



Re: Indexed a new big database while the old is running?

2014-02-18 Thread Shawn Heisey
On 2/18/2014 5:28 AM, Bruno Mannina wrote:
 We currently have a Solr db with around 88 000 000 docs.
 All works fine :)
 
 Each year we receive a new backfile with the same content (but improved).
 
 Indexing these docs takes several days in Solr,
 so is it possible to create a new collection (restart Solr) and
 index these new 88 000 000 docs without stopping the current collection?
 
 We have around 1 million connections per month.
 
 Do you think that this new indexing may cause problems for the running Solr?
 Note: the new database will not be used until the current collection is
 stopped.

You can instantly switch between collections by using the alias feature.
 To do this, you would have collections named something like test201302
and test201402, then you would create an alias named 'test' that points
to one of these collections.  Your code can use 'test' as the collection
name.
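
For example, something like this via the Collections API (collection and alias
names follow the example above):

http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=test&collections=test201402

Once the new collection is built and verified, running CREATEALIAS again with
the new collection name repoints the alias, and your code keeps using 'test'.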

Without a lot more information, it's impossible to say whether building
a new collection will cause performance problems for the existing
collection.

It does seem like a problem that rebuilding the index takes several
days.  You might already be having performance problems.  It's also
possible that there's an aspect to this that I am not seeing, and that
several days is perfectly normal for YOUR index.

Not enough RAM is the most common reason for performance issues on a
large index:

http://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn



Re: Fault Tolerant Technique of Solr Cloud

2014-02-18 Thread Shawn Heisey
On 2/18/2014 6:05 AM, Vineet Mishra wrote:
 Shard 1          Shard 2
 localhost:8983   localhost:7574
 localhost:8900   localhost:7500
 
 
 I indexed some documents, and if I then shut down any of the replicas or the leader,
 say for example localhost:8900, I can't query the collection on that
 particular port:
 
 http://localhost:8900/solr/collection1/select?q=*:*
 
 Then how is it fault tolerant, and how should the query be made?

What is the complete error you are getting?  If you don't see the error
in the response, you'll need to find your Solr Logfile and look for the
error (including a large java stacktrace) there.

Thanks,
Shawn



Re: Fault Tolerant Technique of Solr Cloud

2014-02-18 Thread Per Steffensen
If localhost:8900 is down but localhost:8983 contains a replica of the same 
shard(s) that 8900 was running, all data/documents are still available. 
You cannot query the shutdown server (port 8900), but you can query any 
of the other servers (8983, 7574 or 7500). If you make a distributed 
query to collection1 you should still be able to find all of your 
documents, even though 8900 is down.


It is cumbersome to keep a list of crashed/shutdown servers in order to 
make sure you are always querying a server that is not down. The 
information about which servers are running (and which are not) and which 
replicas they run is all in ZooKeeper. So basically, just go look in 
ZooKeeper :-) Ahh, Solr has tools to help you do that - at least if you 
are running your client in Java code. Solr implements different kinds of 
clients (called XXXSolrServer - yes, obvious name for a client). There 
is HttpSolrServer, which queries a particular server (won't 
help you), there is LBHttpSolrServer, which load-balances over 
several HttpSolrServers (ahh, still not there), and there is 
CloudSolrServer, which watches ZooKeeper in order to know what is running 
and where to send requests. CloudSolrServer uses LBHttpSolrServer behind 
the scenes. If you use CloudSolrServer as a client, everything should be 
smooth and transparent with respect to querying when servers are down. 
CloudSolrServer will figure out where to (and not to) route your requests.
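
A minimal sketch with SolrJ 4.x (the ZooKeeper addresses and collection name
are illustrative):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class QueryCloud {
    public static void main(String[] args) throws Exception {
        // CloudSolrServer reads the cluster state from ZooKeeper and only
        // routes requests to nodes that are actually up.
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("collection1");
        QueryResponse rsp = server.query(new SolrQuery("*:*"));
        System.out.println("hits: " + rsp.getResults().getNumFound());
        server.shutdown();
    }
}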


Regards, Per Steffensen

On 18/02/14 14:05, Vineet Mishra wrote:

Hi All,

I want to have clear idea about the Fault Tolerant Capability of SolrCloud

Considering I have setup the SolrCloud with a external Zookeeper, 2 shards,
each having a replica with single collection as given in the official Solr
Documentation.

https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud

            Collection1
             /       \
            /         \
           /           \
       Shard 1       Shard 2
  localhost:8983   localhost:7574
  localhost:8900   localhost:7500


I indexed some documents, and if I then shut down any of the replicas or the leader,
say for example localhost:8900, I can't query the collection on that
particular port:

http://localhost:8900/solr/collection1/select?q=*:*

Then how is it fault tolerant, and how should the query be made?

Regards





Re: Fault Tolerant Technique of Solr Cloud

2014-02-18 Thread Amit Jha
Solr will complain only if you bring down both the replica & leader of the same 
shard. It would be difficult to have a highly available env if you have a 
small number of physical servers.

Rgds
AJ

 On 18-Feb-2014, at 18:35, Vineet Mishra clearmido...@gmail.com wrote:
 
 Hi All,
 
 I want to have clear idea about the Fault Tolerant Capability of SolrCloud
 
 Considering I have setup the SolrCloud with a external Zookeeper, 2 shards,
 each having a replica with single collection as given in the official Solr
 Documentation.
 
 https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud
 
            Collection1
             /       \
            /         \
           /           \
       Shard 1       Shard 2
  localhost:8983   localhost:7574
  localhost:8900   localhost:7500
 
 
 I indexed some documents, and if I then shut down any of the replicas or the leader,
 say for example localhost:8900, I can't query the collection on that
 particular port:
 
 http://localhost:8900/solr/collection1/select?q=*:*
 
 Then how is it fault tolerant, and how should the query be made?
 
 Regards


Re: Fault Tolerant Technique of Solr Cloud

2014-02-18 Thread Shawn Heisey

On 2/18/2014 8:32 AM, Shawn Heisey wrote:

On 2/18/2014 6:05 AM, Vineet Mishra wrote:

Shard 1          Shard 2
localhost:8983   localhost:7574
localhost:8900   localhost:7500


I indexed some documents, and if I then shut down any of the replicas or the leader,
say for example localhost:8900, I can't query the collection on that
particular port:

http://localhost:8900/solr/collection1/select?q=*:*

Then how is it fault tolerant, and how should the query be made?

What is the complete error you are getting?  If you don't see the error
in the response, you'll need to find your Solr Logfile and look for the
error (including a large java stacktrace) there.


Good catch by Per.  I did not notice that you were trying to send the 
query to the server that you took down.  This isn't going to work -- if 
the software you're trying to reach is not running, it won't respond.  
Think about what happens if you are sending requests to a server and it 
crashes completely.


If you want to always send to the same host/port, you will need a load 
balancer listening on that port.  You'll also want something that 
maintains a shared IP address, so that if the machine dies, the IP 
address and the load balancer move to another machine.  Haproxy and 
Pacemaker work very well as a combination for this.  There are many 
other choices, both hardware and software.


Per also mentioned the other option - you can write code that knows 
about multiple URLs and can switch between them.  This is something you 
get for free with CloudSolrServer when writing Java code with SolrJ.


Thanks,
Shawn



Additive boost function

2014-02-18 Thread Zwer
Hi Guys,

I am facing a problem with additive boosting.

2 fields: last_name and first_name.

The user is searching for mike t

Query: (last_name:mike^15 last_name:mike*^7 first_name:mike^10
first_name:mike*^5) AND (last_name:t^15 last_name:t*^7 first_name:t^10
first_name:t*^5)

The search result does not meet expectations because the scoring model
includes other statistics of the search terms from the Solr index. 
According to the scoring formula of DefaultSimilarity, the result score is a
multiplication.

The question is how to implement an additive scoring model based on my boost
values?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Additive-boost-function-tp4118066.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Preventing multiple on-deck searchers without causing failed commits

2014-02-18 Thread Colin Bartolome

On 02/17/2014 09:46 PM, Shawn Heisey wrote:

I think I put too much information in my reply.  Apologies.  Here's the
most important information to deal with first:

Don't send hard commits at all.  Configure autoCommit in your server
config, with the all-important openSearcher parameter set to false.
That will take care of all your hard commit needs, but those commits
will never open a new searcher, so they cannot cause an overlap with the
soft commits that DO open a new searcher.

Thanks,
Shawn



I'll describe a bit more about our setup, so I can say why I don't think 
that'll work for us:


* Our web servers send update requests to Solr via a background thread, so 
HTTP requests don't have to wait for the request to complete.
* That background thread has a small chance of failing. If it does, the 
update request won't happen until our hard commit job runs.
* Other scheduled jobs can send update requests to Solr. Some jobs 
suppress this, because they do a lot of updating, instead relying on the 
hard commit job.
* The hard commit job does a batch of updates, waits for the commit to 
complete, then sets some flags in our database to indicate that the 
content has been successfully indexed.


It's that last point that leads us to want to do explicit hard commits. By 
setting those flags in our database, we're assuring ourselves that, no 
matter if any other steps failed along the way, we're absolutely sure the 
content was indexed properly.


If there's no other way to do this, I'm okay with filing an RFE in JIRA 
and continuing to ignore the multiple on-deck searchers warning for now.


Re: Best way to copy data from SolrCloud to standalone Solr?

2014-02-18 Thread Shalin Shekhar Mangar
There's a related issue: SOLR-5340 - Add support for named snapshots.
I think we'd want this in SolrCloud soon.

https://issues.apache.org/jira/browse/SOLR-5340

On Tue, Feb 18, 2014 at 7:23 PM, Daniel Bryant
daniel.bry...@tai-dev.co.uk wrote:
 Hi Shawn, Michael,

 Many thanks for your responses - we're going to try the replication/backup
 command, as we're thinking this is a 'two birds with one stone' approach
 which will not only allow us to copy the indexes, but also help with backups
 in SolrCloud as well.

 Thanks again to you both!

 Best wishes,

 Daniel




 On 17/02/2014 20:25, Michael Della Bitta wrote:

 I do know for certain that the backup command on a cloud core still works.
 We have a script like this running on a cron to snapshot indexes:

 curl -s '

 http://localhost:8080/solr/#{core}/replication?command=backup&numberToKeep=4&location=/tmp
 '

 (not really using /tmp for this, parameters changed to protect the guilty)

 The admin handler for replication doesn't seem to be there, but the actual
 API seems to work normally.

 Michael Della Bitta

 Applications Developer

 o: +1 646 532 3062

 appinions inc.

 The Science of Influence Marketing

 18 East 41st Street

 New York, NY 10017

 t: @appinions <https://twitter.com/Appinions> | g+:
 plus.google.com/appinions <https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts>
 w: appinions.com <http://www.appinions.com/>


 On Mon, Feb 17, 2014 at 2:02 PM, Shawn Heisey s...@elyograg.org wrote:

 On 2/17/2014 8:32 AM, Daniel Bryant wrote:

 I have a production SolrCloud server which has multiple sharded indexes,
 and I need to copy all of the indexes to a (non-cloud) Solr server
 within our QA environment.

 Can I ask for advice on the best way to do this please?

 I've searched the web and found solr2solr
 (https://github.com/dbashford/solr2solr), but the author states that
 this is best for small indexes, and ours are rather large at ~20Gb each.
 I've also looked at replication, but can't find a definite reference on
 how this should be done between SolrCloud and Solr?

 Any guidance is very much appreciated.

 If the master index isn't changing at the time of the copy, and you're
 on a non-Windows platform, you should be able to copy the index
 directory directly.  On a Windows platform, whether you can copy the
 index while Solr is using it would depend on how Solr/Lucene opens the
 files.  A typical Windows file open will prevent anything else from
 opening them, and I do not know whether Lucene is smarter than that.

 SolrCloud requires the replication handler to be enabled on all configs,
 but during normal operation, it does not actually use replication.  This
 is a confusing thing for some users.

 I *think* you can configure the replication handler on slave cores with
 a non-cloud config that point at the master cores, and it should
 replicate the main Lucene index, but not the config files.  I have no
 idea whether things will work right if you configure other master
 options like replicateAfter and config files, and I also don't know if
 those options might cause problems for SolrCloud itself.  Those options
 shouldn't be necessary for just getting the data into a dev environment,
 though.

 Thanks,
 Shawn



 --
 Daniel Bryant  |  Software Development Consultant  |  www.tai-dev.co.uk
 <http://www.tai-dev.co.uk/>
 daniel.bry...@tai-dev.co.uk <mailto:daniel.bry...@tai-dev.co.uk>  |  +44 (0)
 7799406399  |  Twitter: @taidevcouk <https://twitter.com/taidevcouk>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Preventing multiple on-deck searchers without causing failed commits

2014-02-18 Thread Shawn Heisey

On 2/18/2014 10:59 AM, Colin Bartolome wrote:
I'll describe a bit more about our setup, so I can say why I don't 
think that'll work for us:


* Our web servers send update requests to Solr via a background 
thread, so HTTP requests don't have to wait for the request to complete.
* That background thread has a small chance of failing. If it does, 
the update request won't happen until our hard commit job runs.
* Other scheduled jobs can send update requests to Solr. Some jobs 
suppress this, because they do a lot of updating, instead relying on 
the hard commit job.
* The hard commit job does a batch of updates, waits for the commit 
to complete, then sets some flags in our database to indicate that the 
content has been successfully indexed.


It's that last point that leads us to want to do explicit hard 
commits. By setting those flags in our database, we're assuring 
ourselves that, no matter if any other steps failed along the way, 
we're absolutely sure the content was indexed properly.


If you want to be completely in control like that, get rid of the 
automatic soft commits and just do the hard commits.


I would personally choose another option for your setup -- get rid of 
*all* explicit commits entirely, and just configure autoCommit and 
autoSoftCommit in the server config.  Since you're running 4.x, you 
really should have the transaction log (updateLog in the config) 
enabled.  You can rely on the transaction log to replay updates since 
the last hard commit if there's ever a crash.
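
A sketch of what that part of solrconfig.xml could look like (the intervals are 
illustrative; tune them for your load):

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  <!-- hard commit: flushes to disk, never opens a searcher -->
  <autoCommit>
    <maxTime>120000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- soft commit: opens a new searcher so updates become visible -->
  <autoSoftCommit>
    <maxTime>10000</maxTime>
  </autoSoftCommit>
</updateHandler>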


I would also recommend upgrading to 4.6.1, but that's a completely 
separate item.


Thanks,
Shawn



Re: Solr Autosuggest - Strange issue with leading numbers in query

2014-02-18 Thread bbi123
Thanks a lot for your response Erik.

I was trying to find whether I have any suggestions starting with numbers using
the terms component, but I couldn't find any... It's very strange!!!

Anyways, thanks again for your response.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Autosuggest-Strange-issue-with-leading-numbers-in-query-tp4116751p4118072.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Suggester not working in sharding (distributed search)

2014-02-18 Thread bbi123
Try this

http://solr:8983/solr/select?q=*:*&spellcheck=true&spellcheck.build=true&spellcheck.q=toyata&qt=spell&shards.qt=/spell&shards=solr-shard1:8983/solr,solr-shard2:8983/solr



--
View this message in context: 
http://lucene.472066.n3.nabble.com/using-distributed-search-with-the-suggest-component-tp3197651p4118075.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Additive boost function

2014-02-18 Thread Jack Krupansky

The edismax query parser bf parameter gives you an additive boost.

See:
http://wiki.apache.org/solr/ExtendedDisMax#bf_.28Boost_Function.2C_additive.29
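
For example, fixed additive weights per clause could be sketched with function
queries along these lines (a hypothetical request, untested; field names from
the mail below):

q=mike t
defType=edismax
qf=first_name last_name
bf=if(exists(query({!v='last_name:mike'})),15,0) if(exists(query({!v='first_name:mike'})),10,0)

Each bf function's value is added to the score rather than multiplied into it,
so the weights behave additively.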

-- Jack Krupansky

-Original Message- 
From: Zwer

Sent: Tuesday, February 18, 2014 12:52 PM
To: solr-user@lucene.apache.org
Subject: Additive boost function

Hi Guys,

I am facing a problem with additive boosting.

2 fields: last_name and first_name.

The user is searching for mike t

Query: (last_name:mike^15 last_name:mike*^7 first_name:mike^10
first_name:mike*^5) AND (last_name:t^15 last_name:t*^7 first_name:t^10
first_name:t*^5)

The search result does not meet expectations because the scoring model
includes other statistics of the search terms from the Solr index.
According to the scoring formula of DefaultSimilarity, the result score is a
multiplication.

The question is how to implement an additive scoring model based on my boost
values?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Additive-boost-function-tp4118066.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Preventing multiple on-deck searchers without causing failed commits

2014-02-18 Thread Colin Bartolome

On 02/18/2014 10:15 AM, Shawn Heisey wrote:

If you want to be completely in control like that, get rid of the
automatic soft commits and just do the hard commits.

I would personally choose another option for your setup -- get rid of
*all* explicit commits entirely, and just configure autoCommit and
autoSoftCommit in the server config.  Since you're running 4.x, you really
should have the transaction log (updateLog in the config) enabled.  You
can rely on the transaction log to replay updates since the last hard
commit if there's ever a crash.

I would also recommend upgrading to 4.6.1, but that's a completely
separate item.

Thanks,
Shawn



We use the automatic soft commits to get search index updates to our users 
faster, via Near Realtime Searching. We have the updateLog enabled. I'm 
not worried that the Solr side of the equation will lose data; I'm worried 
that the communication from our web servers and scheduled jobs to the Solr 
servers will break down and nothing will come along to make sure 
everything is up to date. It sounds like what we're picturing is not 
currently supported, so I'll file the RFE.


Will upgrading to 4.6.1 help at all with this issue?


JOB @ Sematext: Professional Services Lead = Head

2014-02-18 Thread Otis Gospodnetic
Hello,


We have what I think is a great opening at Sematext. Ideal candidate would
be in New York, but that's not an absolute must. More info below + on
http://sematext.com/about/jobs.html in job-ad-speak, but I'd be happy to
describe what we are looking for, what we do, and what types of companies
we work with in regular-human-speak off-line.

DESCRIPTION

Sematext is hiring a technical, hands-on Professional Services Lead to join,
lead, and grow the Professional Services side of Sematext and potentially
grow into the Head role.

REQUIREMENTS

* Experience working with Solr or Elasticsearch

* Plan and coordinate customer engagements from business and technical
perspective

* Identify customer pain points, needs, and success criteria at the onset
of each engagement

* Provide expert-level consulting and support services and strive to be a
trustworthy advisor to a wide range of customers

* Resolve complex search issues involving Solr or Elasticsearch

* Identify opportunities to provide customers with additional value through
our products or services

* Communicate high-value use cases and customer feedback to our Product
teams

* Participate in open source community by contributing bug fixes,
improvements, answering questions, etc.

EXPERIENCE

* BS or higher in Engineering or Computer Science preferred

* 2 or more years of IT Consulting and/or Professional Services experience
required

* Exposure to other related open source projects (Hadoop, Nutch, Kafka,
Storm, Mahout, etc.) a plus

* Experience with other commercial and open source search technologies a
plus

* Enterprise Search, eCommerce, and/or Business Intelligence experience a
plus

* Experience working in a startup a plus

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


Re: Slow 95th-percentile

2014-02-18 Thread Allan Carroll
Thanks for the suggestions.

 I was thinking GC too, but it doesn’t feel like it is. Running jstat -gcutil 
only shows a 10-50ms ParNew collection every 10 or 15 seconds and almost no 
full CMS collections. Any other places to look for GC activity that I might be 
missing?

I did a little investigation this morning and found that if I run a query once 
a second, every 10th query is slow. It looks suspiciously like the soft commits 
are causing the slowdowns. I could space them further apart. Anything else I 
can look at to make those commits less costly?



Here are the java options:

-server -XX:+AggressiveOpts -XX:+UseCompressedOops -Xmx3G -Xms3G -Xss256k 
-XX:MaxPermSize=128m -XX:PermSize=96m -XX:NewSize=1024m -XX:MaxNewSize=1024m 
-XX:MaxTenuringThreshold=1 -XX:SurvivorRatio=6 -XX:+UseParNewGC 
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled 
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly 
-Xloggc:/var/log/tomcat7/gc-tomcat.log -verbose:gc -XX:GCLogFileSize=10M 
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:+PrintGCDetails 
-XX:+PrintHeapAtGC -XX:+PrintGCTimeStamps -XX:+PrintClassHistogram 
-XX:+PrintTenuringDistribution -XX:-PrintGCApplicationStoppedTime 
-DzkHost=xx.xx.xx.xx:2181,xx.xx.xx.xx:2181,xx.xx.xx.xx:2181/solr 
-Dcom.sun.management.jmxremote 
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager 
-Djava.endorsed.dirs=/usr/share/tomcat7/endorsed

I’m using tomcat, though I’ve heard that jetty can be a better choice.

I’ve also attached my solrconfig.

-Allan

On February 17, 2014 at 6:06:03 PM, Shawn Heisey (s...@elyograg.org) wrote:

On 2/17/2014 6:12 PM, Allan Carroll wrote:  
 I'm having trouble getting my Solr setup to get consistent performance. 
 Average select latency is great, but 95% is dismal (10x average). It's 
 probably something slightly misconfigured. I’ve seen it have nice, low 
 variance latencies for a few hours here and there, but can’t figure out 
 what’s different during those times.  
  
  
 * I’m running 4.1.0 using SolrCloud. 3 replicas of 1 shard on 3 EC2 boxes 
 (8proc, 30GB RAM, SSDs). Load peaks around 30 selects per second and about 
 150 updates per second.  
  
 * The index has about 11GB of data in 14M docs, the other 10MB of data in 3K 
 docs. Stays around 30 segments.  
  
 * Soft commits after 10 seconds, hard commits after 120 seconds. Though, 
 turning off the update traffic doesn’t seem to have any affect on the select 
 latencies.  
  
 * I think GC latency is low. Running 3GB heaps with 1G new size. GC time is 
 around 3ms per second.  
  
  
 Here’s a typical select query:  
  
 fl=*,sortScore:textScore&sort=textScore desc&start=0&q=text:((soccer OR 
 MLS OR "premier league" OR FIFA OR "world cup") OR (sorority OR 
 fraternity OR "greek life" OR dorm OR 
 campus))&wt=json&fq=startTime:[139265640 TO 139271754]&fq={!frange 
 l=2 u=3}timeflag(startTime)&fq={!frange l=139265640 u=139269594 
 cache=false}timefix(startTime,-2160)&fq=privacy:OPEN&defType=edismax&rows=131
   

The first thing to say is that it's fairly normal for the 95th and 99th  
percentile values to be quite a lot higher than the median and average  
values. I don't have actual values so I don't know if it's bad or not.  

You're good on the most important performance-related resource, which is  
memory for the OS disk cache. The only thing that stands out as a  
possible problem from what I know so far is garbage collection. It  
might be a case of full garbage collections happening too frequently, or  
it might be a case of garbage collection pauses taking too long. It  
might even be a combination of both.  

To fix frequent full collections, increase the heap size. To fix the  
other problem, use the CMS collector and tune it.  

Two bits of information will help with recommendations: Your java  
startup options, and your solrconfig.xml.  

You're using an option in your query that I've never seen before. I  
don't know if frange is slow or not.  

One last thing that might cause problems is super-frequent commits.  

I could also be completely wrong!  

Thanks,  
Shawn  



Caching Solr boost functions?

2014-02-18 Thread Gregg Donovan
We're testing out a new handler that uses edismax with three different
boost functions. One has a random() function in it, so is not very
cacheable, but the other two boost functions do not change from query to
query.

I'd like to tell Solr to cache those boost queries for the life of the
Searcher so they don't get recomputed every time. Is there any way to do
that out of the box?

In a different custom QParser of ours, we wrote a CachingValueSource that
wrapped a ValueSource with a custom ValueSource cache. Would it make sense
to implement that as a standard Solr function so that one could do:

boost=cache(expensiveFunctionQuery())
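
Purely as illustration, a rough sketch of what such a cache() wrapper might
look like against the Lucene/Solr 4.x ValueSource API (not the actual
implementation mentioned above; untested, and values are memoized per segment,
evicted implicitly when the segment's readers go away):

import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.queries.function.FunctionValues;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.docvalues.FloatDocValues;

public class CachingValueSource extends ValueSource {
    private final ValueSource delegate;
    // one float per document, keyed by the segment's core cache key
    private final Map<Object, float[]> perSegment = new ConcurrentHashMap<Object, float[]>();

    public CachingValueSource(ValueSource delegate) {
        this.delegate = delegate;
    }

    @Override
    public FunctionValues getValues(Map context, AtomicReaderContext readerContext) throws IOException {
        Object key = readerContext.reader().getCoreCacheKey();
        float[] cached = perSegment.get(key);
        if (cached == null) {
            // eagerly evaluate the expensive function once per segment
            FunctionValues vals = delegate.getValues(context, readerContext);
            cached = new float[readerContext.reader().maxDoc()];
            for (int doc = 0; doc < cached.length; doc++) {
                cached[doc] = vals.floatVal(doc);
            }
            perSegment.put(key, cached);
        }
        final float[] values = cached;
        return new FloatDocValues(this) {
            @Override
            public float floatVal(int doc) {
                return values[doc];
            }
        };
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CachingValueSource
                && delegate.equals(((CachingValueSource) o).delegate);
    }

    @Override
    public int hashCode() {
        return delegate.hashCode();
    }

    @Override
    public String description() {
        return "cache(" + delegate.description() + ")";
    }
}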

Thanks.

--Gregg


RE: query parameters

2014-02-18 Thread Andreas Owen
I tried it in the Solr admin query page and it showed me all the docs without a value
in organisations and roles. It didn't matter if I used a base term; isn't
that given through the q parameter?

-Original Message-
From: Raymond Wiker [mailto:rwi...@gmail.com] 
Sent: Tuesday, 18 February 2014 13:19
To: solr-user@lucene.apache.org
Subject: Re: query parameters

That could be because the second condition does not do what you think it
does... have you tried running the second condition separately?

You may have to add a base term to the second condition, like what you
have for the bq parameter in your config file; i.e, something like

(*:* -organisations:[ TO *] -roles:[ TO *])




On Tue, Feb 18, 2014 at 12:16 PM, Andreas Owen a...@conx.ch wrote:

 It seems that fq doesn't accept OR because: (organisations:(150 OR 41) 
 AND
 roles:(174)) OR  (-organisations:[ TO *] AND -roles:[ TO *]) only 
 returns docs that match the first condition. It doesn't return any 
 docs with the empty fields organisations and roles.

 -Original Message-
 From: Andreas Owen [mailto:a...@conx.ch]
 Sent: Monday, 17 February 2014 05:08
 To: solr-user@lucene.apache.org
 Subject: query parameters


 In the solrconfig of my Solr 4.3 I have a user-defined requestHandler. I 
 would like to use fq to force the following conditions:
1: organisations is empty and roles is empty
2: organisations contains one of the comma-delimited list in 
 variable $org
3: roles contains one of the comma-delimited list in variable $r
4: rule 2 and 3

 snippet of what I've got (haven't checked whether there is an 'in' operator 
 like in SQL for the list value)

 <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="defType">edismax</str>
    <str name="synonyms">true</str>
    <str name="qf">plain_text^10 editorschoice^200
 title^20 h_*^14
 tags^10 thema^15 inhaltstyp^6 breadcrumb^6 doctype^10
 contentmanager^5 links^5
 last_modified^5 url^5
    </str>
    <str name="fq">(organisations='' roles='') or
 (organisations=$org roles=$r) or (organisations='' roles=$r) or
 (organisations=$org roles='')</str>
    <str name="bq">(expiration:[NOW TO *] OR (*:*
 -expiration:*))^6</str>  <!-- tested: now or newer or empty gets small
 boost -->
    <str name="bf">div(clicks,max(displays,1))^8</str> <!-- tested
 -->









Using payloads for expanded query terms

2014-02-18 Thread Manuel Le Normand
Hello,
I'm trying to handle a situation with taxonomy search - that is for each
taxonomy I have a list of words with their boosts. These taxonomies are
updated frequently so I retrieve these scored lists at query time from an
external service.

My expectation would be:
 q={!some_query_parser}Cities_France OR Cities_England = q=max(Paris^0.5
Lyon^0.4 La Defense^0.3) OR max(London^0.5, Oxford^4)

Implementations possibilities I thought about:

   1. An adapted synonym filter, where query term boosts are encoded as
   payloads.
   2. Query parser that handles the term expansion and weighting. The main
   drawback is the fact it forces me to stick to my own query parser.
   3. Building the query outside Solr.

What would you recommend?
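
For what it's worth, the max(...) semantics above map naturally onto Lucene's
DisjunctionMaxQuery, so option 3 could be sketched like this with the Lucene
4.x API (assuming a client or custom query parser plugin that can build Lucene
queries directly; field name and weights are illustrative):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.DisjunctionMaxQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class TaxonomyQueryBuilder {
    // DisjunctionMaxQuery scores a document with the max of its matching
    // clauses, which is exactly the max(Paris^0.5, Lyon^0.4, ...) above.
    public static Query citiesFrance() {
        Query paris = new TermQuery(new Term("text", "paris"));
        paris.setBoost(0.5f);
        Query lyon = new TermQuery(new Term("text", "lyon"));
        lyon.setBoost(0.4f);

        DisjunctionMaxQuery cities = new DisjunctionMaxQuery(0.0f); // no tie-break
        cities.add(paris);
        cities.add(lyon);
        return cities;
    }

    public static Query taxonomies() {
        BooleanQuery top = new BooleanQuery();
        top.add(citiesFrance(), BooleanClause.Occur.SHOULD);
        // ... add the other taxonomies, e.g. Cities_England, the same way
        return top;
    }
}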

Thanks,
Manuel


Re: Slow 95th-percentile

2014-02-18 Thread Shawn Heisey

On 2/18/2014 11:51 AM, Allan Carroll wrote:

  I was thinking GC too, but it doesn’t feel like it is. Running jstat -gcutil 
only shows a 10-50ms ParNew collection every 10 or 15 seconds and almost no 
full CMS collections. Any other places to look for GC activity that I might be 
missing?

I did a little investigation this morning and found that if I run a query once 
a second, every 10th query is slow. It looks suspiciously like the soft commits 
are causing the slowdowns. I could space them further apart. Anything else I 
can look at to make those commits less costly?


It does indeed sound like the 10 second soft commit is the problem.  The 
opening a new searcher part of a commit tends to be fairly expensive.  
The impact is even greater when combined with flushing data to disk, 
which is why soft commits can be faster than hard commits ... but 
building a new searcher is not cheap even then.


Do you have autoCommit configured, with openSearcher=false?  If not, you 
should.


If you are using Solr caches, reducing (or eliminating) the 
autowarmCount values on each cache (particularly the filterCache) can 
make commits happen quite a lot faster.  With a commit potentially 
happening every ten seconds, you might want to configure those caches so 
they are pretty small.  Frequent commits mean that the caches are 
frequently invalidated.  If commit frequency is high and autowarmCount 
values are low, a large cache is just a waste of memory.  The cache 
config was the main thing I was interested in seeing when I asked for 
solrconfig.xml.
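
For example, with a ten-second commit cycle the filterCache in solrconfig.xml 
might be kept small, something like (sizes illustrative):

<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="0"/>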


You have a lot of GC tuning going on, which is good - untuned GC and 
Solr do NOT get along.  I'll just show you what I use and let you make 
your own decision.


http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

Thanks,
Shawn



RE: Solr4 performance

2014-02-18 Thread Joshi, Shital
Hi,

Thanks much for all suggestions. We're looking into reducing allocated heap 
size of Solr4 JVM. 

We're using NRTCachingDirectoryFactory. Does it use MMapDirectory internally? 
Can someone please confirm?

Would optimization help with performance? We did that in QA (took about 13 
hours for 700 mil documents) 

Thanks!

-Original Message-
From: Roman Chyla [mailto:roman.ch...@gmail.com] 
Sent: Wednesday, February 12, 2014 3:17 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr4 performance

And perhaps one other, but very pertinent, recommendation is: allocate only
as much heap as is necessary. By allocating more, you are working against
the OS caching. Knowing how much is enough is a bit tricky, though.

Best,

  roman


On Wed, Feb 12, 2014 at 2:56 PM, Shawn Heisey s...@elyograg.org wrote:

 On 2/12/2014 12:07 PM, Greg Walters wrote:

 Take a look at http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-
 on-64bit.html as it's a pretty decent explanation of memory mapped
 files. I don't believe that the default configuration for solr is to use
 MMapDirectory but even if it does my understanding is that the entire file
 won't be forcibly cached by solr. The OS's filesystem cache should control
 what's actually in ram and the eviction process will depend on the OS.


 I only have a little bit to add.  Here's the first thing that Uwe's blog
 post (linked above) says:

 Since version 3.1, Apache Lucene and Solr use MMapDirectory by default
 on 64-bit Windows and Solaris systems; since version 3.3 also for 64-bit
 Linux systems.

 The default in Solr 4.x is NRTCachingDirectory, which uses MMapDirectory
 by default under the hood.

 A summary about all this that should be relevant to the original question:

 It's the *operating system* that handles memory mapping, including any
 caching that happens.  Assuming that you don't have a badly configured
 virtual machine setup, I'm fairly sure that only real memory gets used,
 never swap space on the disk.  If something else on the system makes a
 memory allocation, the operating system will instantly give up memory used
 for caching and mapping.  One of the strengths of mmap is that it can't
 exceed available resources unless it's used incorrectly.

 Thanks,
 Shawn




Re: Solr4 performance

2014-02-18 Thread Shawn Heisey

On 2/18/2014 2:14 PM, Joshi, Shital wrote:

Thanks much for all suggestions. We're looking into reducing allocated heap 
size of Solr4 JVM.

We're using NRTCachingDirectoryFactory. Does it use MMapDirectory internally? 
Can someone please confirm?


In Solr, NRTCachingDirectory does indeed use MMapDirectory as its 
default delegate.  That's probably also the case with Lucene -- these 
are Lucene classes, after all.


MMapDirectory is almost always the most efficient way to handle on-disk 
indexes.
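
If you want to make that explicit rather than rely on the default, the
factory is declared in solrconfig.xml; a minimal sketch:

<directoryFactory name="DirectoryFactory"
                  class="solr.NRTCachingDirectoryFactory"/>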


Thanks,
Shawn



Cluster state ranges are all null after reboot

2014-02-18 Thread Greg Pendlebury
We've got a 15 shard cluster spread across 3 hosts. This morning our puppet
software rebooted them all and afterwards the 'range' for each shard has
become null in zookeeper. Is there any way to restore this value short of
rebuilding a fresh index?

I've read various questions from people with a similar problem, although in
those cases it is usually a single shard that has become null allowing them
to infer what the value should be and manually fix it in ZK. In this case I
have no idea what the ranges should be. This is our test cluster, and
checking production I can see that the ranges don't appear to be
predictable based on the shard number.

I'm also not certain why it even occurred. Our test cluster only has a
single replica per shard, so when a JVM is rebooted the cluster is
unavailable... would that cause this? Production has 3 replicas so we can
do rolling reboots.


SOLR Suggester - return matched suggestion along with other suggestions

2014-02-18 Thread bbi123
Hi,

Is there a way to make suggester return the matched suggestion too?

http://localhost:8983/solr/core1/suggest?q=name:iphone

The above query should return:
iphone
iphone5c
iphone4g

Currently it returns only:
iphone5c
iphone4g


I can use edge N gram filter to implement the above feature but not sure how
to achieve it when using suggester.
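
For reference, the edge n-gram approach mentioned above would be roughly
this field type (a sketch; gram sizes and the type name are assumptions):

<fieldType name="text_suggest" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

With this, indexing iphone produces the prefixes i, ip, ..., iphone, so the
matched term itself comes back along with the longer completions.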



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-Suggester-return-matched-suggestion-along-with-other-suggestions-tp4118132.html
Sent from the Solr - User mailing list archive at Nabble.com.


SolrJ 3.4 Client compatible with Solr 4.6 Server?

2014-02-18 Thread Lan
I'm in the process of updating from Solr 3.4 to Solr 4.6.  Is the SolrJ 3.4
Client  forward compatible with Solr 4.6? 

This isn't mentioned in the documentation
http://wiki.apache.org/solr/javabin page.

In a test environment, I did some indexing and querying with a SolrJ 3.4
client and a Solr 4.6 server and there were no errors. I'm using the javabin
format for updates and sharded queries.






--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrJ-3-4-Client-compatible-with-Solr-4-6-Server-tp4118134.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Escape \\n from getting highlighted - highlighter component

2014-02-18 Thread T. Kuro Kurosaka

Your search expression means 'talk' OR 'n' OR 'text'.
I think you want to do a phrase search. To do that, quote the whole 
thing with double-quotes: "talk n text", if you are using one of the Solr 
standard query parsers.
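
For example, against the /select handler (URL-encoded; core and field
names are assumptions):

http://localhost:8983/solr/collection1/select?q=text:%22talk+n+text%22&hl=true&hl.fl=text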



On 02/17/2014 03:53 PM, Developer wrote:

Hi,

When searching for a text like 'talk n text' the highlighter component also
adds the <em> tags to the special characters like \n. Is there a way to
avoid highlighting the special characters?

\\r\\n Family Messaging

is getting replaced as

\\r\\<em>n</em> Family Messaging


Kuro



Re: Slow 95th-percentile

2014-02-18 Thread Allan Carroll
Slowing the soft commits to every 100 seconds helped. The main culprit was a 
bad query that was coming through every few seconds. Something about the empty 
fq param and the q=* slowed everything else down.

INFO: [event] webapp=/solr path=/select 
params={start=0&q=*&wt=javabin&fq=&fq=startTime:139283643&version=2} 
hits=1894 status=0 QTime=6943

Thanks for all your help. 

-Allan

On February 18, 2014 at 12:24:37 PM, Shawn Heisey (s...@elyograg.org) wrote:

On 2/18/2014 11:51 AM, Allan Carroll wrote:  
 I was thinking GC too, but it doesn’t feel like it is. Running jstat -gcutil 
 only shows a 10-50ms ParNew collection every 10 or 15 seconds and almost no 
 full CMS collections. Any other places to look for GC activity that I might 
 be missing?  
  
 I did a little investigation this morning and found that if I run a query 
 once a second, every 10th query is slow. Looks suspiciously like the soft 
 commits are causing the slowdowns. I could make the interval between them 
 longer. Anything else I can look at to make those commits less costly?  

It does indeed sound like the 10 second soft commit is the problem. The  
opening a new searcher part of a commit tends to be fairly expensive.  
The impact is even greater when combined with flushing data to disk,  
which is why soft commits can be faster than hard commits ... but  
building a new searcher is not cheap even then.  

Do you have autoCommit configured, with openSearcher=false? If not, you  
should.  

If you are using Solr caches, reducing (or eliminating) the  
autowarmCount values on each cache (particularly the filterCache) can  
make commits happen quite a lot faster. With a commit potentially  
happening every ten seconds, you might want to configure those caches so  
they are pretty small. Frequent commits mean that the caches are  
frequently invalidated. If commit frequency is high and autowarmCount  
values are low, a large cache is just a waste of memory. The cache  
config was the main thing I was interested in seeing when I asked for  
solrconfig.xml.  

You have a lot of GC tuning going on, which is good - untuned GC and  
Solr do NOT get along. I'll just show you what I use and let you make  
your own decision.  

http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning  

Thanks,  
Shawn  



Re: SOLR Suggester - return matched suggestion along with other suggestions

2014-02-18 Thread bbi123
Never mind, I added a space to the end of all the field values (keywords)
supplied to the suggester and it works!!!

iphone is indexed as "iphone " (with an additional space at the end)

I trim the value passed to the search after selecting the keyword from the
dropdown suggestion, so it will again be passed as "iphone" (without the
space) when querying SOLR.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-Suggester-return-matched-suggestion-along-with-other-suggestions-tp4118132p4118137.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Slow 95th-percentile

2014-02-18 Thread Chris Hostetter

: Slowing the soft commits to every 100 seconds helped. The main culprit 
: was a bad query that was coming through every few seconds. Something 
: about the empty fq param and the q=* slowed everything else down.
: 
: INFO: [event] webapp=/solr path=/select 
: params={start=0&q=*&wt=javabin&fq=&fq=startTime:139283643&version=2} 
: hits=1894 status=0 QTime=6943

1) if you are using Solr 4.1 or earlier, then q=* is an expensive & 
useless query that doesn't mean what you think it does...

  https://issues.apache.org/jira/browse/SOLR-2996

2) an empty fq doesn't cost anything -- if you use debugQuery=true you 
should see that it's not even included in parsed_filter_queries because 
it's totally ignored.

3) if that startTime value changes at some fixed and regular 
interval, that could explain some anomalies if it's normally the 
same and cached, but changes once a day/hour/minute or whatever and is a 
bit slow to cache.


bottom line: a softCommit is going to re-open a searcher, which is going 
to wipe your caches.  if you don't have any (auto)warming configured, that 
means any fqs, or qs that you run regularly are going to pay the 
price of being slow the first time they are run after a new searcher 
is opened.

If your priority is low response time, you really want to open new 
searchers as infrequently as your SLA for visibility allows, and use 
(auto)warming for those common queries.
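
A sketch of explicit warming for a known-common query (the fq value here
is hypothetical):

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="fq">startTime:[NOW/DAY TO *]</str>
    </lst>
  </arr>
</listener>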



-Hoss
http://www.lucidworks.com/


Weird behavior of stopwords in search query

2014-02-18 Thread Shamik Bandopadhyay
Hi,

  I'm observing a weird behavior while using stopwords as part of the
search query. I'm able to replicate it in a standalone Solr instance as well.
The issue pops up when I'm trying to use the "other" and "and" stopwords
together in a query string. The query doesn't return any result. But it
works with any other combination. For e.g.

1. query yields no result --
http://localhost:8983/solr/collection1/browse?q=AWS+other+and+Search&debugQuery=true&wt=xml


Debug Query :


<str name="rawquerystring">AWS other and Search</str>

<str name="querystring">AWS other and Search</str>
<str name="parsedquery">(+(DisjunctionMaxQuery((id:AWS^10.0 | author:aws^2.0 |
title:aws^10.0 | text:aws^0.5 | cat:AWS^1.4 | keywords:aws^5.0 |
manu:aws^1.1 | description:aws^5.0 | resourcename:aws | name:aws^1.2 |
features:aws | sku:aw^1.5)) +DisjunctionMaxQuery((id:other^10.0 |
cat:other^1.4 | sku:other^1.5)) +DisjunctionMaxQuery((id:Search^10.0 |
author:search^2.0 | title:search^10.0 | text:search^0.5 | cat:Search^1.4 |
keywords:search^5.0 | manu:search^1.1 | description:search^5.0 |
resourcename:search | name:search^1.2 | features:search |
sku:search^1.5))))/no_coord</str>

<str name="parsedquery_toString">+((id:AWS^10.0 | author:aws^2.0 |
title:aws^10.0 | text:aws^0.5 | cat:AWS^1.4 | keywords:aws^5.0 |
manu:aws^1.1 | description:aws^5.0 | resourcename:aws | name:aws^1.2 |
features:aws | sku:aw^1.5) +(id:other^10.0 | cat:other^1.4 | sku:other^1.5)
+(id:Search^10.0 | author:search^2.0 | title:search^10.0 | text:search^0.5
| cat:Search^1.4 | keywords:search^5.0 | manu:search^1.1 |
description:search^5.0 | resourcename:search | name:search^1.2 |
features:search | sku:search^1.5))</str>





2. query yields result --
http://localhost:8983/solr/collection1/browse?q=AWS+other+an+Search&debugQuery=true&wt=xml

Debug Query
-

<str name="rawquerystring">AWS other an Search</str>

<str name="querystring">AWS other an Search</str>
<str name="parsedquery">(+(DisjunctionMaxQuery((id:AWS^10.0 | author:aws^2.0 |
title:aws^10.0 | text:aws^0.5 | cat:AWS^1.4 | keywords:aws^5.0 |
manu:aws^1.1 | description:aws^5.0 | resourcename:aws | name:aws^1.2 |
features:aws | sku:aw^1.5)) DisjunctionMaxQuery((id:other^10.0 |
cat:other^1.4 | sku:other^1.5)) DisjunctionMaxQuery((id:an^10.0 |
cat:an^1.4)) DisjunctionMaxQuery((id:Search^10.0 | author:search^2.0 |
title:search^10.0 | text:search^0.5 | cat:Search^1.4 | keywords:search^5.0
| manu:search^1.1 | description:search^5.0 | resourcename:search |
name:search^1.2 | features:search | sku:search^1.5))))/no_coord</str>

<str name="parsedquery_toString">+((id:AWS^10.0 | author:aws^2.0 |
title:aws^10.0 | text:aws^0.5 | cat:AWS^1.4 | keywords:aws^5.0 |
manu:aws^1.1 | description:aws^5.0 | resourcename:aws | name:aws^1.2 |
features:aws | sku:aw^1.5) (id:other^10.0 | cat:other^1.4 | sku:other^1.5)
(id:an^10.0 | cat:an^1.4) (id:Search^10.0 | author:search^2.0 |
title:search^10.0 | text:search^0.5 | cat:Search^1.4 | keywords:search^5.0
| manu:search^1.1 | description:search^5.0 | resourcename:search |
name:search^1.2 | features:search | sku:search^1.5))</str>

Both "other" and "and" are part of the stopwords list.

I ran an analysis on the text_general field; both stopwords were shown as
ignored during indexing and query time, but that is not happening during
the actual search.

Not sure what I'm missing here, any pointers will be appreciated.

- Thanks,
Shamik


Re: SolrJ 3.4 Client compatible with Solr 4.6 Server?

2014-02-18 Thread Shawn Heisey

On 2/18/2014 5:13 PM, Lan wrote:

I'm in the process of updating from Solr 3.4 to Solr 4.6.  Is the SolrJ 3.4
Client  forward compatible with Solr 4.6?

This isn't mentioned in the documentation
http://wiki.apache.org/solr/javabin page.

In a test environment, I did some indexing and querying with a SolrJ 3.4
client and a Solr 4.6 server and there were no errors. I'm using the javabin
format for updates and sharded queries.


Almost everything you can do with the 3.x client will work without 
problems.  If you're trying to do something unusual, you might have some 
trouble.  Technically we don't recommend mixing versions, but I was 
running mixed versions for a number of months without problems.


You mentioned javabin -- both versions of SolrJ utilize javabin for 
responses, but requests are still XML in SolrJ 3.x. You should avoid 
switching to BinaryRequestWriter until after you upgrade SolrJ, because 
the 3.x client will try to use a different URL path for binary update 
requests, one that is not compatible with a typical 4.x configuration.
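
A sketch of the switch once both sides are on 4.x (the URL is
hypothetical):

import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class BinaryUpdateClient {
    public static void main(String[] args) {
        // Plain HTTP client pointed at a single core.
        HttpSolrServer server =
            new HttpSolrServer("http://localhost:8983/solr/collection1");
        // Send updates as javabin instead of XML; only safe on SolrJ 4.x.
        server.setRequestWriter(new BinaryRequestWriter());
    }
}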


Are you in a position where you can make quick changes to the code and 
recompile?  If you are, I can definitely help you work through any 
problems.  I can't make promises about others, but I'm sure I'm not the 
only one willing to help.


There are a fair number of jarfile changes required to upgrade SolrJ, 
but the number of required code changes is usually small.  Upgrading 
SolrJ should be fairly high on your priority list, especially if you 
plan to utilize SolrCloud.


Thanks,
Shawn



Re: Preventing multiple on-deck searchers without causing failed commits

2014-02-18 Thread Erick Erickson
Colin:

Stop. Back up. The automatic soft commits will make updates available to
your users every second. Those documents _include_ anything from your hard
commit jobs. What could be faster? Parenthetically I'll add that 1 second
soft commits are rarely an actual requirement, but that's your decision.

For the hard commits. Fine. Do them if you insist. Just set
openSearcher=false. The documents will be searchable the next time the soft
commit happens, within one second. The key is openSearcher=false. That
prevents starting a brand new searcher.
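
(If your jobs issue those commits over HTTP, the same flag can ride along
on the request itself; host and core here are hypothetical:
http://localhost:8983/solr/collection1/update?commit=true&openSearcher=false )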

BTW, your commits are not failing. It's just that _after_ the commit
happens, the warming searcher limit is exceeded.

You can even wait until the segments are flushed to disk. All without
opening a searcher.

Shawn is spot on in his recommendations to not fixate on the commits. Solr
handles that. Here's a long blog about all the details of durability vs.
visibility.
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

You're over-thinking the problem here, trying to control commits with a
sledgehammer when you don't need to; just use the built-in capabilities.

Best,
Erick



On Tue, Feb 18, 2014 at 10:33 AM, Colin Bartolome co...@e-e.com wrote:

 On 02/18/2014 10:15 AM, Shawn Heisey wrote:

 If you want to be completely in control like that, get rid of the
 automatic soft commits and just do the hard commits.

 I would personally choose another option for your setup -- get rid of
 *all* explicit commits entirely, and just configure autoCommit and
 autoSoftCommit in the server config.  Since you're running 4.x, you really
 should have the transaction log (updateLog in the config) enabled.  You
 can rely on the transaction log to replay updates since the last hard
 commit if there's ever a crash.

 I would also recommend upgrading to 4.6.1, but that's a completely
 separate item.

 Thanks,
 Shawn


 We use the automatic soft commits to get search index updates to our users
 faster, via Near Realtime Searching. We have the updateLog enabled. I'm not
 worried that the Solr side of the equation will lose data; I'm worried that
 the communication from our web servers and scheduled jobs to the Solr
 servers will break down and nothing will come along to make sure everything
 is up to date. It sounds like what we're picturing is not currently
 supported, so I'll file the RFE.

 Will upgrading to 4.6.1 help at all with this issue?



Re: query parameters

2014-02-18 Thread Erick Erickson
Solr/Lucene query language is NOT strictly boolean; see
Chris's excellent blog here:
http://searchhub.org/dev/2011/12/28/why-not-and-or-and-not/

Best,
Erick


On Tue, Feb 18, 2014 at 11:54 AM, Andreas Owen a...@conx.ch wrote:

 I tried it in the Solr admin query page and it showed me all the docs
 without a value
 in organisations and roles. It didn't matter if I used a base term; isn't
 that given through the q parameter?

 -Original Message-
 From: Raymond Wiker [mailto:rwi...@gmail.com]
 Sent: Dienstag, 18. Februar 2014 13:19
 To: solr-user@lucene.apache.org
 Subject: Re: query parameters

 That could be because the second condition does not do what you think it
 does... have you tried running the second condition separately?

 You may have to add a base term to the second condition, like what you
 have for the bq parameter in your config file; i.e, something like

 (*:* -organisations:[* TO *] -roles:[* TO *])




 On Tue, Feb 18, 2014 at 12:16 PM, Andreas Owen a...@conx.ch wrote:

  It seems that fq doesn't accept OR because: (organisations:(150 OR 41)
  AND
  roles:(174)) OR (-organisations:[* TO *] AND -roles:[* TO *]) only
  returns docs that match the first conditions. It doesn't return any
  docs with the empty fields organisations and roles.
 
  -Original Message-
  From: Andreas Owen [mailto:a...@conx.ch]
  Sent: Montag, 17. Februar 2014 05:08
  To: solr-user@lucene.apache.org
  Subject: query parameters
 
 
  in solrconfig of my solr 4.3 i have a userdefined requestHandler. i
  would like to use fq to force the following conditions:
 1: organisations is empty and roles is empty
 2: organisations contains one of the commadelimited list in
  variable $org
 3: roles contains one of the commadelimited list in variable $r
 4: rule 2 and 3
 
  snippet of what I've got (haven't checked whether there is an 'in' operator
  like in SQL for the list value)
 
  <lst name="defaults">
     <str name="echoParams">explicit</str>
     <int name="rows">10</int>
     <str name="defType">edismax</str>
     <str name="synonyms">true</str>
     <str name="qf">plain_text^10 editorschoice^200
  title^20 h_*^14
  tags^10 thema^15 inhaltstyp^6 breadcrumb^6 doctype^10
  contentmanager^5 links^5
  last_modified^5 url^5
     </str>
     <str name="fq">(organisations='' roles='') or
  (organisations=$org roles=$r) or (organisations='' roles=$r) or
  (organisations=$org roles='')</str>
     <str name="bq">(expiration:[NOW TO *] OR (*:*
  -expiration:*))^6</str>  <!-- tested: now or newer or empty gets small
  boost -->
     <str name="bf">div(clicks,max(displays,1))^8</str> <!-- tested -->
 
 
 
 
 
 




Re: block join and atomic updates

2014-02-18 Thread Jason Hellman
Thinking in terms of normalized data in the context of a Lucene index is 
dangerous.  It is not a relational data model technology, and the join 
behaviors available to you have limited use.  Each approach requires 
compromises that are likely impermissible for certain use cases.  

If it is at all reasonable to consider, you will likely be best served by 
de-normalizing the data.  Of course, your specific details may prove an 
exception to this rule…but generally this approach works very well.

On Feb 18, 2014, at 4:19 AM, Mikhail Khludnev mkhlud...@griddynamics.com 
wrote:

 absolutely.
 
 
 On Tue, Feb 18, 2014 at 1:20 PM, m...@preselect-media.com wrote:
 
 But isn't query time join much slower when it comes to a large amount of
 documents?
 
 Zitat von Mikhail Khludnev mkhlud...@griddynamics.com:
 
 
 Hello,
 
 It sounds like you need to switch to query time join.
  On 15.02.2014 at 21:57, m...@preselect-media.com wrote:
 
 Any suggestions?
 
 
 Zitat von m...@preselect-media.com:
 
 Yonik Seeley yo...@heliosearch.com:
 
 
 On Thu, Feb 13, 2014 at 8:25 AM,  m...@preselect-media.com wrote:
 
 Is there any workaround to perform atomic updates on blocks or do I
 have to
 re-index the parent document and all its children always again if I
 want to
 update a field?
 
 
 The latter, unfortunately.
 
 
 Is there any plan to change this behavior in near future?
 
  So, I'm thinking of alternatives without losing the benefit of block
 join.
 I try to explain an idea I just thought about:
 
 Let's say I have a parent document A with a number of fields I want to
 update regularly and a number of child documents AC_1 ... AC_n which are
 only indexed once and aren't going to change anymore.
 So, if I index A and AC_* in a block and I update A, the block is gone.
 But if I create an additional document AF which only contains something
  like a foreign key to A and indexing AF + AC_* as a block (not A + AC_*
 anymore), could I perform a {!parent ... } query on AF + AC_* and make
  a
 join from the results to get A?
  Does this make any sense and is it even possible? ;-)
 And if it's possible, how can I do it?
 
 Thanks,
 - Moritz
 
 
 
 
 
 
 
 
 
 
 
 -- 
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics
 
 http://www.griddynamics.com
 mkhlud...@griddynamics.com



Re: block join and atomic updates

2014-02-18 Thread Walter Underwood
Listen to that advice. Denormalize, denormalize, denormalize. Think about the 
results page and work backwards from that. Flat data model.

wunder
Search guy at Infoseek, Inktomi, Verity, Autonomy, Netflix, and Chegg

On Feb 18, 2014, at 7:37 PM, Jason Hellman jhell...@innoventsolutions.com 
wrote:

 Thinking in terms of normalized data in the context of a Lucene index is 
 dangerous.  It is not a relational data model technology, and the join 
 behaviors available to you have limited use.  Each approach requires 
 compromises that are likely impermissible for certain use cases.  
 
 If it is at all reasonable to consider, you will likely be best served by 
 de-normalizing the data.  Of course, your specific details may prove an 
 exception to this rule…but generally this approach works very well.
 
 On Feb 18, 2014, at 4:19 AM, Mikhail Khludnev mkhlud...@griddynamics.com 
 wrote:
 
 absolutely.
 
 
 On Tue, Feb 18, 2014 at 1:20 PM, m...@preselect-media.com wrote:
 
 But isn't query time join much slower when it comes to a large amount of
 documents?
 
 Zitat von Mikhail Khludnev mkhlud...@griddynamics.com:
 
 
 Hello,
 
 It sounds like you need to switch to query time join.
  On 15.02.2014 at 21:57, m...@preselect-media.com wrote:
 
 Any suggestions?
 
 
 Zitat von m...@preselect-media.com:
 
 Yonik Seeley yo...@heliosearch.com:
 
 
 On Thu, Feb 13, 2014 at 8:25 AM,  m...@preselect-media.com wrote:
 
 Is there any workaround to perform atomic updates on blocks or do I
 have to
 re-index the parent document and all its children always again if I
 want to
 update a field?
 
 
 The latter, unfortunately.
 
 
 Is there any plan to change this behavior in near future?
 
  So, I'm thinking of alternatives without losing the benefit of block
 join.
 I try to explain an idea I just thought about:
 
 Let's say I have a parent document A with a number of fields I want to
 update regularly and a number of child documents AC_1 ... AC_n which are
 only indexed once and aren't going to change anymore.
 So, if I index A and AC_* in a block and I update A, the block is gone.
 But if I create an additional document AF which only contains something
  like a foreign key to A and indexing AF + AC_* as a block (not A + AC_*
 anymore), could I perform a {!parent ... } query on AF + AC_* and make
  a
 join from the results to get A?
  Does this make any sense and is it even possible? ;-)
 And if it's possible, how can I do it?
 
 Thanks,
 - Moritz
 
 
 
 
 
 
 
 
 
 
 
 -- 
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics
 
 http://www.griddynamics.com
 mkhlud...@griddynamics.com
 

--
Walter Underwood
wun...@wunderwood.org





Re: Preventing multiple on-deck searchers without causing failed commits

2014-02-18 Thread Colin Bartolome

Inline quoting ahead, sorry:


Colin:

Stop. Back up. The automatic soft commits will make updates available to
your users every second. Those documents _include_ anything from your hard
commit jobs. What could be faster? Parenthetically I'll add that 1 second
soft commits are rarely an actual requirement, but that's your decision.


The one-second commits are not my decision, per se; it's the default value 
in solrconfig.xml and is also suggested as a common configuration in 
the Near Real Time Searching section of the reference guide.


(Our users at Experts Exchange used to have to wait up to five minutes 
before the search index updated with the latest content. While switching 
to Solr, we saw that the recommended configuration would refresh the 
index in seconds, rather than minutes, and rejoiced. We'd rather not 
increase the latency too far to solve this problem.)



For the hard commits. Fine. Do them if you insist. Just set
openSearcher=false. The documents will be searchable the next time the soft
commit happens, within one second. The key is openSearcher=false. That
prevents starting a brand new searcher.


Are you saying that the automatic soft commit will trigger, no matter 
what, even after our code has explicitly requested a hard commit? If so, 
that is, if the automatic soft commit triggers, even if no additional 
update requests have come in since the hard commit, then great! We'll do 
that!



BTW, your commits are not failing. It's just that _after_ the commit
happens, the warming searcher limit is exceeded.


My commits may indeed be succeeding, but the server is returning an HTTP 
503 response, which leads to SolrJ throwing a SolrServerException with 
the message "No live SolrServers available to handle this request". Our 
code, understandably, interprets that as a failed request. This causes 
our job to abort and try again the next time it runs.



You can even wait until the segments are flushed to disk. All without
opening a searcher.


We will go with this if the automatic soft commit does indeed trigger 
after the explicit hard commit, thanks.



Shawn is spot on in his recommendations to not fixate on the commits. Solr
handles that. Here's a long blog about all the details of durability .vs.
visibility.
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

You're over-thinking the problem here, trying to control commits with a
sledgehammer when you don't need to, just use the built-in capabilities.


I get what you both are saying. If the problem is that I'm doing 
explicit hard commits, the solution is that I should stop doing explicit 
hard commits.


That's not really a solution, though.

What if, for whatever reason, I absolutely *had to* perform explicit 
hard commits? (I know you're saying I *don't* have to, but please 
indulge me for a moment.) Fortunately, the SolrJ client provides a way I 
can do this. But now my Solr server logs are full of "Overlapping 
onDeckSearchers" performance warnings. Fine, I'll turn 
maxWarmingSearchers down to 1. Now the server returns HTTP 503 responses 
every now and then and SolrJ throws an exception.


I think that's a problem that the servers can solve: just queue up the 
request until the number of warming searchers is under the limit. So I 
filed that RFE. Even when all the above suggestions work perfectly and 
fix our issues, it's still a valid RFE.


Re: Weird behavior of stopwords in search query

2014-02-18 Thread Jack Krupansky
Does "other" appear in the id, cat, or sku fields? This clause requires it 
to appear in at least one of those fields:


+DisjunctionMaxQuery((id:other^10.0 | cat:other^1.4 | sku:other^1.5))

The "and" is treated as the AND operator. What query parser are you using?

Without "and", the terms are OR'ed, which is the default query operator.

-- Jack Krupansky

-Original Message- 
From: Shamik Bandopadhyay

Sent: Tuesday, February 18, 2014 8:53 PM
To: solr-user@lucene.apache.org
Subject: Weird behavior of stopwords in search query

Hi,

 I'm observing a weird behavior while using stopwords as part of the
search query. I'm able to replicate it in a standalone Solr instance as well.
The issue pops up when I'm trying to use the "other" and "and" stopwords
together in a query string. The query doesn't return any result. But it
works with any other combination. For e.g.

1. query yields no result --
http://localhost:8983/solr/collection1/browse?q=AWS+other+and+Search&debugQuery=true&wt=xml


Debug Query :


<str name="rawquerystring">AWS other and Search</str>

<str name="querystring">AWS other and Search</str>
<str name="parsedquery">(+(DisjunctionMaxQuery((id:AWS^10.0 | author:aws^2.0 |
title:aws^10.0 | text:aws^0.5 | cat:AWS^1.4 | keywords:aws^5.0 |
manu:aws^1.1 | description:aws^5.0 | resourcename:aws | name:aws^1.2 |
features:aws | sku:aw^1.5)) +DisjunctionMaxQuery((id:other^10.0 |
cat:other^1.4 | sku:other^1.5)) +DisjunctionMaxQuery((id:Search^10.0 |
author:search^2.0 | title:search^10.0 | text:search^0.5 | cat:Search^1.4 |
keywords:search^5.0 | manu:search^1.1 | description:search^5.0 |
resourcename:search | name:search^1.2 | features:search |
sku:search^1.5))))/no_coord</str>

<str name="parsedquery_toString">+((id:AWS^10.0 | author:aws^2.0 |
title:aws^10.0 | text:aws^0.5 | cat:AWS^1.4 | keywords:aws^5.0 |
manu:aws^1.1 | description:aws^5.0 | resourcename:aws | name:aws^1.2 |
features:aws | sku:aw^1.5) +(id:other^10.0 | cat:other^1.4 | sku:other^1.5)
+(id:Search^10.0 | author:search^2.0 | title:search^10.0 | text:search^0.5
| cat:Search^1.4 | keywords:search^5.0 | manu:search^1.1 |
description:search^5.0 | resourcename:search | name:search^1.2 |
features:search | sku:search^1.5))</str>





2. query yields result --
http://localhost:8983/solr/collection1/browse?q=AWS+other+an+Search&debugQuery=true&wt=xml

Debug Query
-

<str name="rawquerystring">AWS other an Search</str>

<str name="querystring">AWS other an Search</str>
<str name="parsedquery">(+(DisjunctionMaxQuery((id:AWS^10.0 | author:aws^2.0 |
title:aws^10.0 | text:aws^0.5 | cat:AWS^1.4 | keywords:aws^5.0 |
manu:aws^1.1 | description:aws^5.0 | resourcename:aws | name:aws^1.2 |
features:aws | sku:aw^1.5)) DisjunctionMaxQuery((id:other^10.0 |
cat:other^1.4 | sku:other^1.5)) DisjunctionMaxQuery((id:an^10.0 |
cat:an^1.4)) DisjunctionMaxQuery((id:Search^10.0 | author:search^2.0 |
title:search^10.0 | text:search^0.5 | cat:Search^1.4 | keywords:search^5.0
| manu:search^1.1 | description:search^5.0 | resourcename:search |
name:search^1.2 | features:search | sku:search^1.5))))/no_coord</str>

<str name="parsedquery_toString">+((id:AWS^10.0 | author:aws^2.0 |
title:aws^10.0 | text:aws^0.5 | cat:AWS^1.4 | keywords:aws^5.0 |
manu:aws^1.1 | description:aws^5.0 | resourcename:aws | name:aws^1.2 |
features:aws | sku:aw^1.5) (id:other^10.0 | cat:other^1.4 | sku:other^1.5)
(id:an^10.0 | cat:an^1.4) (id:Search^10.0 | author:search^2.0 |
title:search^10.0 | text:search^0.5 | cat:Search^1.4 | keywords:search^5.0
| manu:search^1.1 | description:search^5.0 | resourcename:search |
name:search^1.2 | features:search | sku:search^1.5))</str>

Both "other" and "and" are part of the stopwords list.

I ran an analysis on the text_general field; both stopwords were shown as
ignored during indexing and query time, but that is not happening during
the actual search.

Not sure what I'm missing here, any pointers will be appreciated.

- Thanks,
Shamik 



Re: Weird behavior of stopwords in search query

2014-02-18 Thread shamik
Jack, thanks for the pointer. I should have checked this closely. I'm using
edismax and here's my qf entry:

<str name="qf">
  id^10.0 cat^1.4 text^0.5 features^1.0 name^1.2 sku^1.5 manu^1.1
title^10.0 description^5.0 keywords^5.0 author^2.0 resourcename^1.0
   </str>

As you can see, I was boosting id and cat, which are of type string and of
course don't go through the stopwords filter. Removing them returned one
result, which is based on the AND operator. 

The part I'm not clear on is how "and" is being treated even though it's a
stopword and the default operator is OR. Shouldn't it be ignored?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Weird-behavior-of-stopwords-in-search-query-tp4118156p4118188.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: block join and atomic updates

2014-02-18 Thread Mikhail Khludnev
Colleagues,
You are definitely right regarding denorm & collapse. It works fine in most
cases, but look at this case more precisely. Moritz needs to update the
parent's fields; if they are copied during denormalization, the price of an
update is the same as block join's. With q-time join, updates are way
cheaper, but search time suffers, you know.
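
As a sketch, the query-time join for Moritz's layout would look roughly
like this (field names are assumptions: the children carry a parent_id
pointing at the parent's id):

q={!join from=parent_id to=id}child_field:somevalue

i.e. match the children, then join out to the parent documents, with no
block indexing required.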
On 19.02.2014 at 8:15, Walter Underwood wun...@wunderwood.org
wrote:

 Listen to that advice. Denormalize, denormalize, denormalize. Think about
 the results page and work backwards from that. Flat data model.

 wunder
 Search guy at Infoseek, Inktomi, Verity, Autonomy, Netflix, and Chegg

 On Feb 18, 2014, at 7:37 PM, Jason Hellman jhell...@innoventsolutions.com
 wrote:

  Thinking in terms of normalized data in the context of a Lucene index is
 dangerous.  It is not a relational data model technology, and the join
 behaviors available to you have limited use.  Each approach requires
 compromises that are likely impermissible for certain use cases.
 
  If it is at all reasonable to consider, you will likely be best served by
 de-normalizing the data.  Of course, your specific details may prove an
 exception to this rule...but generally this approach works very well.
 
  On Feb 18, 2014, at 4:19 AM, Mikhail Khludnev 
 mkhlud...@griddynamics.com wrote:
 
  absolutely.
 
 
  On Tue, Feb 18, 2014 at 1:20 PM, m...@preselect-media.com wrote:
 
  But isn't query time join much slower when it comes to a large amount
 of
  documents?
 
  Zitat von Mikhail Khludnev mkhlud...@griddynamics.com:
 
 
  Hello,
 
  It sounds like you need to switch to query time join.
   On 15.02.2014 at 21:57, m...@preselect-media.com wrote:
 
  Any suggestions?
 
 
  Zitat von m...@preselect-media.com:
 
  Yonik Seeley yo...@heliosearch.com:
 
 
  On Thu, Feb 13, 2014 at 8:25 AM,  m...@preselect-media.com wrote:
 
  Is there any workaround to perform atomic updates on blocks or do I
  have to
  re-index the parent document and all its children always again if
 I
  want to
  update a field?
 
 
  The latter, unfortunately.
 
 
  Is there any plan to change this behavior in near future?
 
   So, I'm thinking of alternatives without losing the benefit of
 block
  join.
  I try to explain an idea I just thought about:
 
  Let's say I have a parent document A with a number of fields I want
 to
  update regularly and a number of child documents AC_1 ... AC_n
 which are
  only indexed once and aren't going to change anymore.
  So, if I index A and AC_* in a block and I update A, the block is
 gone.
  But if I create an additional document AF which only contains
 something
  like a foreign key to A and indexing AF + AC_* as a block (not A +
 AC_*
  anymore), could I perform a {!parent ... } query on AF + AC_* and
 make
  a
  join from the results to get A?
  Does this make any sense and is it even possible? ;-)
  And if it's possible, how can I do it?
 
  Thanks,
  - Moritz
 
 
 
 
 
 
 
 
 
 
 
  --
  Sincerely yours
  Mikhail Khludnev
  Principal Engineer,
  Grid Dynamics
 
  http://www.griddynamics.com
  mkhlud...@griddynamics.com
 

 --
 Walter Underwood
 wun...@wunderwood.org






Re: Fault Tolerant Technique of Solr Cloud

2014-02-18 Thread Vineet Mishra
Thanks for all your responses, but my doubt is which *Server:Port* the
query should be made to, since we don't know which server crashed or which
server might crash in the future (as any server can go down).

The only intention of writing this doubt is to get an idea of how the
query format for distributed search might work if any of the shards or
replicas goes down.

Thanks


On Tue, Feb 18, 2014 at 11:22 PM, Shawn Heisey s...@elyograg.org wrote:

 On 2/18/2014 8:32 AM, Shawn Heisey wrote:

 On 2/18/2014 6:05 AM, Vineet Mishra wrote:

  Shard 1            Shard 2
  localhost:8983     localhost:7574
  localhost:8900     localhost:7500


 I indexed some documents and then if I shut down any of the replicas or
 leaders,
 say for example *localhost:8900*, I can't query the collection on that
 particular port:

 http://localhost:8900/solr/collection1/select?q=*:*

 Then how is it fault tolerant, or how does the query have to be made?

 What is the complete error you are getting?  If you don't see the error
 in the response, you'll need to find your Solr Logfile and look for the
 error (including a large java stacktrace) there.


 Good catch by Per.  I did not notice that you were trying to send the
 query to the server that you took down.  This isn't going to work -- if the
 software you're trying to reach is not running, it won't respond.  Think
 about what happens if you are sending requests to a server and it crashes
 completely.

 If you want to always send to the same host/port, you will need a load
 balancer listening on that port.  You'll also want something that maintains
 a shared IP address, so that if the machine dies, the IP address and the
 load balancer move to another machine.  Haproxy and Pacemaker work very
 well as a combination for this.  There are many other choices, both
 hardware and software.

 Per also mentioned the other option - you can write code that knows about
 multiple URLs and can switch between them.  This is something you get for
 free with CloudSolrServer when writing Java code with SolrJ.
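
A minimal sketch of that with SolrJ 4.x (the ZooKeeper hosts and
collection name here are hypothetical):

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CloudClientSketch {
    public static void main(String[] args) throws Exception {
        // The client watches cluster state in ZooKeeper, so it keeps
        // working as long as at least one replica of each shard is up.
        CloudSolrServer server =
            new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        server.add(doc);
        server.commit();
        server.shutdown();
    }
}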

 Thanks,
 Shawn




Re: Increasing number of SolrIndexSearcher (Leakage)?

2014-02-18 Thread Nguyen Manh Tien
I found the custom component causing that issue:
it creates a SolrQueryRequest but doesn't close it at the end, which keeps the
reference count on the SolrIndexSearcher from reaching 0, so the searcher is
never released.
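
For anyone hitting the same thing, this is the shape of the fix in the
component (a sketch; how you obtain core and params is up to your code):

import org.apache.solr.common.params.SolrParams;
import org.apache.solr.core.SolrCore;
import org.apache.solr.request.LocalSolrQueryRequest;
import org.apache.solr.request.SolrQueryRequest;

class InternalLookup {
    void run(SolrCore core, SolrParams params) {
        SolrQueryRequest req = new LocalSolrQueryRequest(core, params);
        try {
            // ... use req.getSearcher() for the internal query ...
        } finally {
            req.close(); // drops the ref so the searcher count can reach 0
        }
    }
}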




On Tue, Feb 18, 2014 at 9:31 PM, Yonik Seeley yo...@heliosearch.com wrote:

 On Mon, Feb 17, 2014 at 1:34 AM, Nguyen Manh Tien
 tien.nguyenm...@gmail.com wrote:
  - *But after I index some docs and run softCommit or hardCommit with
  openSearcher=false, the number of SolrIndexSearcher instances increases by 1*

 This is fine... it's more of an internal implementation detail (we
 open what is called a real-time searcher so we can drop some other
 data structures like the list of non-visible document updates, etc).
 If you did the commit again, the count should not continue to
 increase.

 If the number of searchers continues to increase, you have a searcher
 leak due to something else.
 Are you using any custom components or anything else that isn't stock Solr?

 -Yonik
 http://heliosearch.org - native off-heap filters and fieldcache for solr



Re: Fault Tolerant Technique of Solr Cloud

2014-02-18 Thread shamik
As Shawn had pointed out, if you are using the CloudSolrServer client, then you
are immune to the scenario where a shard and its replica(s) go down. The
communication should ideally be with the zookeepers and not with the solr
servers directly. One thing you need to make sure of is to add the
shards.tolerant parameter so that the query returns results from the shards
which are alive, though it'll fetch a partial resultset.
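
For example (host and collection here are hypothetical):

http://localhost:8983/solr/collection1/select?q=*:*&shards.tolerant=true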



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Fault-Tolerant-Technique-of-Solr-Cloud-tp4118003p4118196.html
Sent from the Solr - User mailing list archive at Nabble.com.