date:20141002

On 10/1/2014 11:23 PM, Danesh Kuruppu wrote:
 Currently we are using solr for service meta data indexing and Searching .
 we have embedded solr server running in our application and we are using
 solr 1.4 version. Have some doubts to be clear.
 
 1. What are the performance improvements we can gain from updating to the
 latest solr version(4.10.1).

One of the key areas that's better/faster is indexing, but there are
performance improvements for querying too.  Solr and Lucene have evolved
considerably in the four years since Solr 1.4 (using Lucene 2.9)
was released.

 2. Currently we are using embedded solr, I have an idea of moving to
 standalone server. What is best way of using standalone server in our java
 webapp.

The embedded server is still available, although even in 1.4 it was not
recommended for anything but a proof of concept.  You should simply
install one or more standalone servers and access them from your app via
http.  On a LAN, the overhead introduced by http is minimal.

Since you're already using the embedded server, that's simply a
matter of changing EmbeddedSolrServer to HttpSolrServer or
CloudSolrServer (depending on whether or not you use SolrCloud).  You
can also remove the solr-* and lucene-* jars from your classpath ... you
just need the solrj jar and its standard dependencies, which can be
found in the binary download or the compiled source code under
dist/solrj-lib.

Thanks,
Shawn

Re: Wildcard search makes no sense!!

On 10/2/2014 4:33 AM, waynemailinglist wrote:
 Something that is still not clear in my mind is how this tokenising works.
 For example with the filters I have when I run the analyser I get:
 Field: Hello You
 
 Hello|You
 Hello|You
 Hello|You
 hello|you
 hello|you
 
 
 Does this mean that the index is stored as 'hello|you' (the final one) and
 that when I run a query and it goes through the filters whatever the end
 result of that is must match the 'hello|you' in order to return a result?

The index has two terms for this field if this is the whole input --
hello and you -- which can be searched for individually.  The tokenizer
does the initial job of separating the input into tokens (terms) ...
some filters can create additional terms, depending on exactly what's
left when the tokenizer is done.

Thanks,
Shawn

Re: Master-Slave setup using SolrCloud

On 10/2/2014 6:58 AM, Sachin Kale wrote:
We are trying to move our traditional master-slave Solr configuration to
SolrCloud. As our index size is very small (around 1 GB), we are having
only one shard.
So basically, we are having same master-slave configuration with one leader
and 6 replicas.
We are experimenting with maxTime of both AutoCommit and AutoSoftCommit.
Currently, autoCommit maxTime is 15 minutes and autoSoftCommit is 1 minute
(Let me know if these values does not make sense).

Caches are set such that warmup time is at most 20 seconds.

We are having continuous indexing requests mostly for updating the existing
documents. Few requests are for deleting/adding the documents.

The problem we are facing is that we are getting very frequent
NullPointerExceptions.
We get continuous 200-300 such exceptions within a period of 30 seconds and
for next few minutes, it works fine.

Stacktrace of NullPointerException:

*ERROR - 2014-10-02 18:09:38.464; org.apache.solr.common.SolrException;
null:java.lang.NullPointerException*
*at
org.apache.solr.handler.component.QueryComponent.returnFields(QueryComponent.java:1257)*
*at
org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:720)*
*at
org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:695)*

I am not sure what would be causing it. My guess, whenever, it is trying to
replay tlog, we are getting these exceptions. Is anything wrong in my
configuration?

Your automatic commit settings are fine. If you had tried to use a very
small maxTime like 1000 (1 second), I would tell you that it's probably
too short.

The tlogs only get replayed when a core is first started or reloaded.
These appear to be errors during queries, having nothing at all to do
with indexing.

I can't be sure with the available information (no Solr version,
incomplete stacktrace, no info about what request caused and received
the error), but if I had to guess, I'd say you probably changed your
schema so that certain fields are now required that weren't required
before, and didn't reindex, so those fields are not present on every
document. Or it might be that you added a uniqueKey and didn't reindex,
and that field is not present on every document.

http://wiki.apache.org/solr/HowToReindex

Thanks,
Shawn

Re: Solr Replication during Tomcat shutdown causes shutdown to hang/fail

On 10/2/2014 7:25 AM, Phil Black-Knight wrote:
 I was helping to look into this with Nick  I think we may have figured out
 the core of the problem...
 
 The problem is easily reproducible by starting replication on the slave and
 then sending a shutdown command to tomcat (e.g. catalina.sh stop).
 
 With a debugger attached, it looks like the fsyncService thread is blocking
 VM shutdown because it is created as a non-daemon thread.

snip

 I can submit patches if needed... and cross post to the dev mailing list...

File a detailed issue in Jira and attach your patch there.  This is our
bugtracker.  You need an account on the Apache jira instance to do this:

https://issues.apache.org/jira/browse/SOLR

Thanks,
Shawn

Re: Wildcard search makes no sense!!

right, prior to 3.6, the standard way to handle wildcards was to,
essentially, pre-analyze the terms that had  wildcards. This works
fine for simple filters, things like lowercasing for instance, but
doesn't work so well for things like stemming.

So you're doing what can be done at this point, but moving to 4.x (or
even 3.6) would solve it better.

Best,
Erick

On Thu, Oct 2, 2014 at 6:29 AM, Shawn Heisey apa...@elyograg.org wrote:
 On 10/2/2014 4:33 AM, waynemailinglist wrote:
 Something that is still not clear in my mind is how this tokenising works.
 For example with the filters I have when I run the analyser I get:
 Field: Hello You

 Hello|You
 Hello|You
 Hello|You
 hello|you
 hello|you


 Does this mean that the index is stored as 'hello|you' (the final one) and
 that when I run a query and it goes through the filters whatever the end
 result of that is must match the 'hello|you' in order to return a result?

 The index has two terms for this field if this is the whole input --
 hello and you -- which can be searched for individually.  The tokenizer
 does the initial job of separating the input into tokens (terms) ...
 some filters can create additional terms, depending on exactly what's
 left when the tokenizer is done.

 Thanks,
 Shawn

RE: SolrCould read-only replicas

2014-10-02 Thread Sandeep Tikoo

Erick,

Thank you for your response. Yup, when I said it is not possible to have a 
cross continent data center replica, I meant that we never ever want to do that 
because of the latency.

What I was hoping is that  I could have Solr cloud in my DataCentre A (DC-A) 
and get all the benefits of sharding ( scaling/parallel computing) and failover 
redundancy within the same data center. If I could then have a read-only 
replica (with no guaranteed consistency of course ) of this entire cloud in my 
DataCenter B (DC-B), that would make my reads over DC-B faster without making 
my writes slow. To clarify, all the writes were going to go against DC-A only. 
The read-only cluster in DC-B could  also be made the master in case the entire 
DC-A went down.  The DC-B wouldn't be guaranteed to be in sync with the DB-A 
master but in my use case I could live with that. Seems like that is no 
possible out-of-the-box if I am using Solr 4.0+ in the cloud mode. It is either 
Solr Coud or cross data center read only replica. Can't do both at the same 
time.
I think that is what you confirmed as well. If I have it wrong, please let me 
know. Also, any thoughts on the most easy way to accomplish the read-only 
replica of the entire solr cloud cluster?

Thanks!
Tikoo

From: Sandeep Tikoo
Sent: Saturday, September 27, 2014 9:43 PM
To: 'solr-user@lucene.apache.org'
Subject: SolrCould read-only replicas

Hi-

I have been reading up on SolrCloud and it seems that it is not possible to 
have a cross-datacenter read-only slave anymore but wanted to ask here to be 
sure.
We currently have a pre Solr 4.0 installation with the master instance in our 
US mid-west datacenter. The datacenter in Europe has read-replicas which pull 
data using solr.ReplicationHandler. We wanted to upgrade to SolrCloud. As far 
as I have been able to figure out, with SolrCloud you cannot have a read-only 
replica anymore. A replica has to be able to become a leader and writes against 
all replicas for a shard have to succeed. Because of the a strong consistency 
model across replicas, it seems that replicas cannot be across datacenters 
anymore.

So my question is, how can we have a read-ony replica in a remote datacenter in 
Solr 4.0+ similar to pre Solr 4.0? Is it not possible anymore without doing it 
all yourself?

cheers,
Tikoo

Re: Solr Replication during Tomcat shutdown causes shutdown to hang/fail

2014-10-02 Thread Phil Black-Knight

see the ticket here:
https://issues.apache.org/jira/browse/SOLR-6579

including a patch to fix it.

On Thu, Oct 2, 2014 at 9:44 AM, Shawn Heisey apa...@elyograg.org wrote:

 On 10/2/2014 7:25 AM, Phil Black-Knight wrote:
  I was helping to look into this with Nick  I think we may have figured
 out
  the core of the problem...
 
  The problem is easily reproducible by starting replication on the slave
 and
  then sending a shutdown command to tomcat (e.g. catalina.sh stop).
 
  With a debugger attached, it looks like the fsyncService thread is
 blocking
  VM shutdown because it is created as a non-daemon thread.

 snip

  I can submit patches if needed... and cross post to the dev mailing
 list...

 File a detailed issue in Jira and attach your patch there.  This is our
 bugtracker.  You need an account on the Apache jira instance to do this:

 https://issues.apache.org/jira/browse/SOLR

 Thanks,
 Shawn

DIH - cacheImpl=SortedMapBackedCache - empty rows from sub entity

2014-10-02 Thread stockii

Hello

i am fighting with cacheImpl=SortedMapBackedCache.

I want to refactor my ugly entities and so i try out sub-entities with
caching.
My Problem is that my cached subquery do not return any values from the
select. but why?

thats my entity 
entity name=en1 pk=id transformer=DateFormatTransformer 
  query=SELECT id, product FROM table WHERE product = 'abc'

entity name=en2 pk=id transformer=DateFormatTransformer
cacheImpl=SortedMapBackedCache
query= SELECT id, code FROM table2 
where=id = '${en1.id}'/
/entity


this is very fast an clear and nice... but it does not work. all from table2
is not coming to my index =(
BUT if i remove the line with cacheImpl=SortedMapBackedCache all data is
present, but every row is selecte each by each.
i thought that this construct, hopefully replace my ugly big join-query in a
single entity!?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-cacheImpl-SortedMapBackedCache-empty-rows-from-sub-entity-tp4162316.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Master-Slave setup using SolrCloud

2014-10-02 Thread Sachin Kale

If I look into the logs, many times I get only following line without any
stacktrace:

*ERROR - 2014-10-02 19:35:25.516; org.apache.solr.common.SolrException;
java.lang.NullPointerException*

These exceptions are not coming continuously. Once in every 10-15 minutes.
But once it starts, there are continuous 800-1000 such exceptions one after
another. Is it related to cache warmup?

I can provide following information regarding the setup:
We are now on using Solr 4.10.0
Memory allocated to each SOLR instance is 7GB. I guess it is more than
sufficient for 1 GB index, right?
Indexes are stored as normal, local filesystem.
I am using three caches:
Query Cache: Size 4096, autoWarmCount 2048
Filter cache: size 8192, autoWarmCount 4096
Document cache: size 4096

I am experimenting with commitMaxTime for both soft and hard commits

After referring following:
http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Hence, I set following:

autoCommit
maxTime${solr.autoCommit.maxTime:6}/maxTime
openSearcherfalse/openSearcher
/autoCommit

autoSoftCommit
maxTime${solr.autoSoftCommit.maxTime:90}/maxTime
/autoSoftCommit

Also, we are getting following warnings many times:

*java.lang.NumberFormatException: For input string: 5193.0*

Earlier we were on SOLR 4.4.0 and when we are upgraded to 4.10.0, we
pointed it to the same index we were using for 4.4.0

On Thu, Oct 2, 2014 at 7:11 PM, Shawn Heisey apa...@elyograg.org wrote:

On 10/2/2014 6:58 AM, Sachin Kale wrote:
We are trying to move our traditional master-slave Solr configuration to
SolrCloud. As our index size is very small (around 1 GB), we are having
only one shard.
So basically, we are having same master-slave configuration with one
leader
and 6 replicas.
We are experimenting with maxTime of both AutoCommit and AutoSoftCommit.
Currently, autoCommit maxTime is 15 minutes and autoSoftCommit is 1
minute
(Let me know if these values does not make sense).

Caches are set such that warmup time is at most 20 seconds.

We are having continuous indexing requests mostly for updating the
existing
documents. Few requests are for deleting/adding the documents.

The problem we are facing is that we are getting very frequent
NullPointerExceptions.
We get continuous 200-300 such exceptions within a period of 30 seconds
and
for next few minutes, it works fine.

Stacktrace of NullPointerException:

*ERROR - 2014-10-02 18:09:38.464; org.apache.solr.common.SolrException;
null:java.lang.NullPointerException*
*at

org.apache.solr.handler.component.QueryComponent.returnFields(QueryComponent.java:1257)*
*at

org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:720)*
*at

org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:695)*

I am not sure what would be causing it. My guess, whenever, it is trying
to
replay tlog, we are getting these exceptions. Is anything wrong in my
configuration?

Your automatic commit settings are fine. If you had tried to use a very
small maxTime like 1000 (1 second), I would tell you that it's probably
too short.

The tlogs only get replayed when a core is first started or reloaded.
These appear to be errors during queries, having nothing at all to do
with indexing.

http://wiki.apache.org/solr/HowToReindex

Thanks,
Shawn

RE: DIH - cacheImpl=SortedMapBackedCache - empty rows from sub entity

2014-10-02 Thread Dyer, James

Try using the cacheKey/cacheLookup parameters instead:

entity 
 name=en1 
 pk=id 
 transformer=DateFormatTransformer 
 query=SELECT id, product FROM table WHERE product = 'abc'
   
  entity 
   name=en2 
   cacheKey=id
   cacheLookup=en1.id
   transformer=DateFormatTransformer 
   cacheImpl=SortedMapBackedCache
   query=SELECT id, code FROM table2 
  /
/entity

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: stockii [mailto:stock.jo...@googlemail.com] 
Sent: Thursday, October 02, 2014 9:19 AM
To: solr-user@lucene.apache.org
Subject: DIH - cacheImpl=SortedMapBackedCache - empty rows from sub entity

Hello

i am fighting with cacheImpl=SortedMapBackedCache.

I want to refactor my ugly entities and so i try out sub-entities with
caching.
My Problem is that my cached subquery do not return any values from the
select. but why?

thats my entity 
entity name=en1 pk=id transformer=DateFormatTransformer 
  query=SELECT id, product FROM table WHERE product = 'abc'

entity name=en2 pk=id transformer=DateFormatTransformer
cacheImpl=SortedMapBackedCache
query= SELECT id, code FROM table2 
where=id = '${en1.id}'/
/entity


this is very fast an clear and nice... but it does not work. all from table2
is not coming to my index =(
BUT if i remove the line with cacheImpl=SortedMapBackedCache all data is
present, but every row is selecte each by each.
i thought that this construct, hopefully replace my ugly big join-query in a
single entity!?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-cacheImpl-SortedMapBackedCache-empty-rows-from-sub-entity-tp4162316.html
Sent from the Solr - User mailing list archive at Nabble.com.

Upgrade from solr 4.4 to 4.10.1

2014-10-02 Thread Grainne

I need to upgrade from Solr 4.4 to version 4.10.1 and am not sure if I need
to reindex.

The following from http://wiki.apache.org/solr/Solr4.0 leads me to believe I
don't:
The guarantee for this alpha release is that the index format will be the
4.0 index format, supported through the 5.x series of Lucene/Solr, unless
there is a critical bug (e.g. that would cause index corruption) that would
prevent this.

I've been looking through the change logs and news and the following from
http://lucene.apache.org/solr/solrnews.html makes me think that maybe I do
need to reindex:
Solr 4.6 Release Highlights:
...
New default index format: Lucene46Codec
...

It will not be an easy task to reindex the files so I am hoping the answer
is that it is not necessary.

Thanks for any advice,
Grainne



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Upgrade-from-solr-4-4-to-4-10-1-tp4162340.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: SolrCould read-only replicas

2014-10-02 Thread Walter Underwood

Here is a different approach.

Set up independent Solr Cloud clusters in each data center. Send all updates 
into a persistent message queue (Amazon SQS, whatever) and have each cluster 
get updates from the queue.

The two clusters are both live and configured identically, so there is nothing 
to change in a failover.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/


On Oct 2, 2014, at 7:07 AM, Sandeep Tikoo sti...@digitalriver.com wrote:

 Erick,
 
 Thank you for your response. Yup, when I said it is not possible to have a 
 cross continent data center replica, I meant that we never ever want to do 
 that because of the latency.
 
 What I was hoping is that  I could have Solr cloud in my DataCentre A (DC-A) 
 and get all the benefits of sharding ( scaling/parallel computing) and 
 failover redundancy within the same data center. If I could then have a 
 read-only replica (with no guaranteed consistency of course ) of this entire 
 cloud in my DataCenter B (DC-B), that would make my reads over DC-B faster 
 without making my writes slow. To clarify, all the writes were going to go 
 against DC-A only. The read-only cluster in DC-B could  also be made the 
 master in case the entire DC-A went down.  The DC-B wouldn't be guaranteed to 
 be in sync with the DB-A master but in my use case I could live with that. 
 Seems like that is no possible out-of-the-box if I am using Solr 4.0+ in the 
 cloud mode. It is either Solr Coud or cross data center read only replica. 
 Can't do both at the same time.
 I think that is what you confirmed as well. If I have it wrong, please let me 
 know. Also, any thoughts on the most easy way to accomplish the read-only 
 replica of the entire solr cloud cluster?
 
 Thanks!
 Tikoo
 
 From: Sandeep Tikoo
 Sent: Saturday, September 27, 2014 9:43 PM
 To: 'solr-user@lucene.apache.org'
 Subject: SolrCould read-only replicas
 
 Hi-
 
 I have been reading up on SolrCloud and it seems that it is not possible to 
 have a cross-datacenter read-only slave anymore but wanted to ask here to be 
 sure.
 We currently have a pre Solr 4.0 installation with the master instance in our 
 US mid-west datacenter. The datacenter in Europe has read-replicas which pull 
 data using solr.ReplicationHandler. We wanted to upgrade to SolrCloud. As far 
 as I have been able to figure out, with SolrCloud you cannot have a read-only 
 replica anymore. A replica has to be able to become a leader and writes 
 against all replicas for a shard have to succeed. Because of the a strong 
 consistency model across replicas, it seems that replicas cannot be across 
 datacenters anymore.
 
 So my question is, how can we have a read-ony replica in a remote datacenter 
 in Solr 4.0+ similar to pre Solr 4.0? Is it not possible anymore without 
 doing it all yourself?
 
 cheers,
 Tikoo

Re: Upgrade from solr 4.4 to 4.10.1

2014-10-02 Thread Michael Della Bitta

You should of course perform a test first to be sure, but you shouldn't
need to reindex. Running an optimize on your cores or collections will
upgrade them to the new format, or you could use Lucene's IndexUpgrader
tool. In the meantime, bringing up your data in 4.10.1 will work, it just
won't take advantage of some of the file format improvements.

However, it is somewhat of a design smell that you can't reindex. In my
experience, it is extremely valuable to be able to reindex your data at
will.

Michael Della Bitta

Senior Software Engineer

o: +1 646 532 3062

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinions
https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
w: appinions.com http://www.appinions.com/

On Thu, Oct 2, 2014 at 12:06 PM, Grainne grainne_rei...@harvard.edu wrote:

 I need to upgrade from Solr 4.4 to version 4.10.1 and am not sure if I need
 to reindex.

 The following from http://wiki.apache.org/solr/Solr4.0 leads me to
 believe I
 don't:
 The guarantee for this alpha release is that the index format will be the
 4.0 index format, supported through the 5.x series of Lucene/Solr, unless
 there is a critical bug (e.g. that would cause index corruption) that would
 prevent this.

 I've been looking through the change logs and news and the following from
 http://lucene.apache.org/solr/solrnews.html makes me think that maybe I do
 need to reindex:
 Solr 4.6 Release Highlights:
 ...
 New default index format: Lucene46Codec
 ...

 It will not be an easy task to reindex the files so I am hoping the answer
 is that it is not necessary.

 Thanks for any advice,
 Grainne



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Upgrade-from-solr-4-4-to-4-10-1-tp4162340.html
 Sent from the Solr - User mailing list archive at Nabble.com.

Re: Wildcard search makes no sense!!

2014-10-02 Thread waynemailinglist

Ok I think I understand your points there. Just clarify say if the term was
Large increased and my filters went something like:

Large|increased
Large|increase|increased
large|increase|increased

the final tokens indexed would be large|increase|increased  ?

Once again thanks for all the help.


On Thu, Oct 2, 2014 at 2:30 PM, Shawn Heisey-2 [via Lucene] 
ml-node+s472066n4162306...@n3.nabble.com wrote:

 On 10/2/2014 4:33 AM, waynemailinglist wrote:

  Something that is still not clear in my mind is how this tokenising
 works.
  For example with the filters I have when I run the analyser I get:
  Field: Hello You
 
  Hello|You
  Hello|You
  Hello|You
  hello|you
  hello|you
 
 
  Does this mean that the index is stored as 'hello|you' (the final one)
 and
  that when I run a query and it goes through the filters whatever the end
  result of that is must match the 'hello|you' in order to return a
 result?

 The index has two terms for this field if this is the whole input --
 hello and you -- which can be searched for individually.  The tokenizer
 does the initial job of separating the input into tokens (terms) ...
 some filters can create additional terms, depending on exactly what's
 left when the tokenizer is done.

 Thanks,
 Shawn



 --
  If you reply to this email, your message will be added to the discussion
 below:

 http://lucene.472066.n3.nabble.com/Wildcard-search-makes-no-sense-tp4162069p4162306.html
  To unsubscribe from Wildcard search makes no sense!!, click here
 http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_codenode=4162069code=d2F5bmVtYWlsaW5nbGlzdHNAZ21haWwuY29tfDQxNjIwNjl8LTIxOTMxNzkyNQ==
 .
 NAML
 http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=macro_viewerid=instant_html%21nabble%3Aemail.namlbase=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespacebreadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Wildcard-search-makes-no-sense-tp4162069p4162349.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr + Federated Search Question

2014-10-02 Thread Alejandro Calbazana

Ahmet,Jeff,

Thanks.  Some terms are a bit overloaded.  By federated, I do mean the
ability to query multiple, disparate, repositories.  So, no.  All of my
data would not necessarily be in Solr.  Solr would be one of several -
databases, filesystems, document stores, etc...  that I would like to
plug-in.  The content in each repository would be of different types (the
shape/schema of the content would differ significantly).

Thanks,

Alejandro

On Wed, Oct 1, 2014 at 9:47 AM, Jack Krupansky j...@basetechnology.com
wrote:

 Alejandro, you'll have to clarify how you are using the term federated
 search. I mean, technically Ahmet is correct in that Solr queries can be
 fanned out to shards and the results from each shard aggregated
 (federated) into a single result list, but... more traditionally,
 federated refers to disparate databases or search engines.

 See:
 http://en.wikipedia.org/wiki/Federated_search

 So, please tell us a little more about what you are really trying to do.

 I mean, is all of your data in Solr, in multiple collections, or on
 multiple Solr servers, or... is only some of your data in Solr and some is
 in other search engines?

 Another approach taken with Solr is that indeed all of your source data
 may be in disparate databases, but you perform an ETL (Extract,
 Transform, and Load) process to ingest all of that data into Solr and then
 simply directly search the data within Solr.

 -- Jack Krupansky

 -Original Message- From: Ahmet Arslan
 Sent: Wednesday, October 1, 2014 9:35 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr + Federated Search Question

 Hi,

 Federation is possible. Solr has distributed search support with shards
 parameter.

 Ahmet



 On Wednesday, October 1, 2014 4:29 PM, Alejandro Calbazana 
 acalbaz...@gmail.com wrote:
 Hello,

 I have a general question about Solr in a federated search context.  I
 understand that Solr does not do federated search and that  different tools
 are often used to incorporate Solr indexes into a federated/enterprise
 search solution.  Does anyone have recommendations on any products (open
 source or otherwise) that addresses this space?

 Thanks,

 Alejandro

Re: Solr + Federated Search Question

2014-10-02 Thread Alejandro Calbazana

Alexandre,

Thanks.  I will have a look.

Alejandro

On Wed, Oct 1, 2014 at 3:03 PM, Alexandre Rafalovitch arafa...@gmail.com
wrote:

 http://project.carrot2.org/ is worth having a look at. It supports
 Solr well. In fact, a subset of it is shipped with Solr

 Regards,
Alex.
 Personal: http://www.outerthoughts.com/ and @arafalov
 Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
 Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


 On 1 October 2014 09:29, Alejandro Calbazana acalbaz...@gmail.com wrote:
  Hello,
 
  I have a general question about Solr in a federated search context.  I
  understand that Solr does not do federated search and that  different
 tools
  are often used to incorporate Solr indexes into a federated/enterprise
  search solution.  Does anyone have recommendations on any products (open
  source or otherwise) that addresses this space?
 
  Thanks,
 
  Alejandro

Re: Upgrade from solr 4.4 to 4.10.1

2014-10-02 Thread Grainne

Hi Michael,

Thanks for the quick response. Running optimize on the index sounds like a
good idea.  Do you know if  that is possible from the command line? 

I agree it is an omission to not be easily able to reindex files and that is
a story I need to prioritize.

Thanks again,
Grainne



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Upgrade-from-solr-4-4-to-4-10-1-tp4162340p4162359.html
Sent from the Solr - User mailing list archive at Nabble.com.

Export feature issue in Solr 4.10

2014-10-02 Thread Ahmed Adel

Hi All,

I'm trying to use Solr 4.10 export feature, but I'm getting an error. Maybe
I missed something.

Here's the scenario:


   1. Download Solr 4.10.0
   2. Use collection1 schema out of the box
   3. Add docValues=true to price and pages fields in schema.xml
   4. Index books.json using command line:
   curl http://localhost:8984/solr/collection1/update -H
Content-Type: text/json --data-binary
@example/exampledocs/books.json
   5. Try running this query:
   http://localhost:8984/solr/collection1/export?q=*:*sort=price%20ascfl=price
   6. Here's the error I get:

   java.lang.IllegalArgumentException: docID must be = 0 and 
maxDoc=4 (got docID=4)
at 
org.apache.lucene.index.BaseCompositeReader.readerIndex(BaseCompositeReader.java:182)
at 
org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:109)
at 
org.apache.solr.search.SolrIndexSearcher.doc(SolrIndexSearcher.java:700)
at 
org.apache.solr.util.SolrPluginUtils.optimizePreFetchDocs(SolrPluginUtils.java:213)
at 
org.apache.solr.handler.component.QueryComponent.doPrefetch(QueryComponent.java:623)
at 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:507)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
...


Any ideas what could be missing?

Thanks,
A. Adel

Load existing index to solrCloud?

2014-10-02 Thread Tara

Hi ,



I have an index created by Lucen way back. I just set up a solrCloud, I
don't want to reindex my documents again since I already have the index, how
do I load the existing index into  solrCloud empty new collection?  I am
able to load it into a solr instance but not sure how to load it correctly
into a solrCloud so that the index will get redistributed. 

Thanks in advanced!




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Load-existing-index-to-solrCloud-tp4162362.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Upgrade from solr 4.4 to 4.10.1

2014-10-02 Thread Michael Della Bitta

Yes, you can just do something like curl
http://mysolrserver:mysolrport/solr/mycollectionname/update?optimize=true;.
You should expect heavy disk activity while this completes. I wouldn't do
more than one collection at a time.

Michael Della Bitta

Senior Software Engineer

o: +1 646 532 3062

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinions
https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
w: appinions.com http://www.appinions.com/

On Thu, Oct 2, 2014 at 12:55 PM, Grainne grainne_rei...@harvard.edu wrote:

 Hi Michael,

 Thanks for the quick response. Running optimize on the index sounds like a
 good idea.  Do you know if  that is possible from the command line?

 I agree it is an omission to not be easily able to reindex files and that
 is
 a story I need to prioritize.

 Thanks again,
 Grainne



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Upgrade-from-solr-4-4-to-4-10-1-tp4162340p4162359.html
 Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr + Federated Search Question

2014-10-02 Thread Ahmet Arslan

Hi Alejandro,

So your example is better called as metasearch. Here a quotation from a book.

Instead of retrieving information from a single information source using one 
search engine, one can utilize multiple search engines or a single search 
engine retrieving documents from a plethora of document collections. A scenario 
where multiple engines are used is known as metasearch, while the scenario 
where a single engine retrieves from multiple collections is known as 
federation. In both these scenarios, the final result of the retrieval effort 
needs to be a single, unified ranking of documents, based on several ranked 
lists.

Ahmet


On Thursday, October 2, 2014 7:29 PM, Alejandro Calbazana 
acalbaz...@gmail.com wrote:
Ahmet,Jeff,

Thanks.  Some terms are a bit overloaded.  By federated, I do mean the
ability to query multiple, disparate, repositories.  So, no.  All of my
data would not necessarily be in Solr.  Solr would be one of several -
databases, filesystems, document stores, etc...  that I would like to
plug-in.  The content in each repository would be of different types (the
shape/schema of the content would differ significantly).

Thanks,

Alejandro




On Wed, Oct 1, 2014 at 9:47 AM, Jack Krupansky j...@basetechnology.com
wrote:

 Alejandro, you'll have to clarify how you are using the term federated
 search. I mean, technically Ahmet is correct in that Solr queries can be
 fanned out to shards and the results from each shard aggregated
 (federated) into a single result list, but... more traditionally,
 federated refers to disparate databases or search engines.

 See:
 http://en.wikipedia.org/wiki/Federated_search

 So, please tell us a little more about what you are really trying to do.

 I mean, is all of your data in Solr, in multiple collections, or on
 multiple Solr servers, or... is only some of your data in Solr and some is
 in other search engines?

 Another approach taken with Solr is that indeed all of your source data
 may be in disparate databases, but you perform an ETL (Extract,
 Transform, and Load) process to ingest all of that data into Solr and then
 simply directly search the data within Solr.

 -- Jack Krupansky

 -Original Message- From: Ahmet Arslan
 Sent: Wednesday, October 1, 2014 9:35 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr + Federated Search Question

 Hi,

 Federation is possible. Solr has distributed search support with shards
 parameter.

 Ahmet



 On Wednesday, October 1, 2014 4:29 PM, Alejandro Calbazana 
 acalbaz...@gmail.com wrote:
 Hello,

 I have a general question about Solr in a federated search context.  I
 understand that Solr does not do federated search and that  different tools
 are often used to incorporate Solr indexes into a federated/enterprise
 search solution.  Does anyone have recommendations on any products (open
 source or otherwise) that addresses this space?

 Thanks,

 Alejandro

Does Solr handle an sshfs mounted index

2014-10-02 Thread Grainne

I am currently running Solr 4.4.0 on RHEL 6.  The index used to be mounted
via nfs and it all worked perfectly fine.  For security reasons we switched
the index to be sshfs mounted - and this seems to cause solr to fail after a
while.  If we switch back to nfs it works again.

The behavior is strange - Solr starts up and issues an error:
...
Oct 02, 2014 11:43:00 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher
...
Caused by: java.io.FileNotFoundException:
/path/to/collection/data/index/_10_Lucene41_0.tim (Operation not permitted)
...

While Solr is running, if, as the same user, I look at the mounted path I
get the same behavior:
-bash-4.1$ ls /mounted/filesystem/path
ls: reading directory /mounted/filesystem/path: Operation not permitted

When I shut down Solr it behaves as expected and I get the file listing. 
The file is there and 

Several of us, including unix systems people, are looking at why this might
be happening and have yet to figure it out.

Does anyone know if it possible to run Solr where the index is mounted via
sshfs?  

Thanks for any advice,
Grainne





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Does-Solr-handle-an-sshfs-mounted-index-tp4162375.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr + Federated Search Question

2014-10-02 Thread Alejandro Calbazana

Thanks Ahmet.  Yay!  New term :)  Although it does look like federated
and metasearch can be  used interchangeably.

Alejandro

On Thu, Oct 2, 2014 at 2:37 PM, Ahmet Arslan iori...@yahoo.com.invalid
wrote:

 Hi Alejandro,

 So your example is better called as metasearch. Here a quotation from a
 book.

 Instead of retrieving information from a single information source using
 one search engine, one can utilize multiple search engines or a single
 search engine retrieving documents from a plethora of document collections.
 A scenario where multiple engines are used is known as metasearch, while
 the scenario where a single engine retrieves from multiple collections is
 known as federation. In both these scenarios, the final result of the
 retrieval effort needs to be a single, unified ranking of documents, based
 on several ranked lists.

 Ahmet


 On Thursday, October 2, 2014 7:29 PM, Alejandro Calbazana 
 acalbaz...@gmail.com wrote:
 Ahmet,Jeff,

 Thanks.  Some terms are a bit overloaded.  By federated, I do mean the
 ability to query multiple, disparate, repositories.  So, no.  All of my
 data would not necessarily be in Solr.  Solr would be one of several -
 databases, filesystems, document stores, etc...  that I would like to
 plug-in.  The content in each repository would be of different types (the
 shape/schema of the content would differ significantly).

 Thanks,

 Alejandro




 On Wed, Oct 1, 2014 at 9:47 AM, Jack Krupansky j...@basetechnology.com
 wrote:

  Alejandro, you'll have to clarify how you are using the term federated
  search. I mean, technically Ahmet is correct in that Solr queries can be
  fanned out to shards and the results from each shard aggregated
  (federated) into a single result list, but... more traditionally,
  federated refers to disparate databases or search engines.
 
  See:
  http://en.wikipedia.org/wiki/Federated_search
 
  So, please tell us a little more about what you are really trying to do.
 
  I mean, is all of your data in Solr, in multiple collections, or on
  multiple Solr servers, or... is only some of your data in Solr and some
 is
  in other search engines?
 
  Another approach taken with Solr is that indeed all of your source data
  may be in disparate databases, but you perform an ETL (Extract,
  Transform, and Load) process to ingest all of that data into Solr and
 then
  simply directly search the data within Solr.
 
  -- Jack Krupansky
 
  -Original Message- From: Ahmet Arslan
  Sent: Wednesday, October 1, 2014 9:35 AM
  To: solr-user@lucene.apache.org
  Subject: Re: Solr + Federated Search Question
 
  Hi,
 
  Federation is possible. Solr has distributed search support with shards
  parameter.
 
  Ahmet
 
 
 
  On Wednesday, October 1, 2014 4:29 PM, Alejandro Calbazana 
  acalbaz...@gmail.com wrote:
  Hello,
 
  I have a general question about Solr in a federated search context.  I
  understand that Solr does not do federated search and that  different
 tools
  are often used to incorporate Solr indexes into a federated/enterprise
  search solution.  Does anyone have recommendations on any products (open
  source or otherwise) that addresses this space?
 
  Thanks,
 
  Alejandro

RE: Upgrade from solr 4.4 to 4.10.1

2014-10-02 Thread Toke Eskildsen

Michael Della Bitta [michael.della.bi...@appinions.com] wrote:
 You should of course perform a test first to be sure, but you shouldn't
 need to reindex.

One gotcha is that support for DocValuesFormat=Disk was removed in Solr 4.9, 
so it simply can't open an index using that format. Fortunately it can be 
handled by changing format in the schema and performing an optimize using the 
old Solr version.

How the performance/memory-trade-off of the Disk-format falls under critical 
bug and thus is reason enough to break backwards compatibility, I don't know.

- Toke Eskildsen

Re: Does Solr handle an sshfs mounted index

2014-10-02 Thread Michael Della Bitta

Grainne,

I would recommend that you do not do this. In fact, I would recommend you not
use NFS as well, although that’s more likely to work, just not ideally. Solr’s
going to do best when it’s working with fast, local storage that the OS can
cache natively.

Michael Della Bitta
Senior Software Engineer
o: +1 646 532 3062

appinions inc.
“The Science of Influence Marketing”

18 East 41st Street
New York, NY 10017
t: @appinions | g+: plus.google.com/appinions
w: appinions.com

On Oct 2, 2014, at 14:44, Grainne grainne_rei...@harvard.edu wrote:

I am currently running Solr 4.4.0 on RHEL 6. The index used to be mounted
via nfs and it all worked perfectly fine. For security reasons we switched
the index to be sshfs mounted - and this seems to cause solr to fail after a
while. If we switch back to nfs it works again.

The behavior is strange - Solr starts up and issues an error:
...
Oct 02, 2014 11:43:00 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher
...
Caused by: java.io.FileNotFoundException:
/path/to/collection/data/index/_10_Lucene41_0.tim (Operation not permitted)
...

While Solr is running, if, as the same user, I look at the mounted path I
get the same behavior:
-bash-4.1$ ls /mounted/filesystem/path
ls: reading directory /mounted/filesystem/path: Operation not permitted

When I shut down Solr it behaves as expected and I get the file listing.
The file is there and

Several of us, including unix systems people, are looking at why this might
be happening and have yet to figure it out.

Does anyone know if it possible to run Solr where the index is mounted via
sshfs?

Thanks for any advice,
Grainne

--
View this message in context:
http://lucene.472066.n3.nabble.com/Does-Solr-handle-an-sshfs-mounted-index-tp4162375.html
Sent from the Solr - User mailing list archive at Nabble.com.

Silent ping request logging

2014-10-02 Thread Junyang Xin

Hi,

The ping request log as below generates too much noise in my solr log:
INFO: [main0] webapp=/solr path=/admin/ping params={} status=0 QTime=0

I don't want to change the global logging level to eliminate this. Instead,
I wonder if there is a way to change the logging level just for such ping
requests in the code. If so, which class I should look into? Your help will
be really appreciated.

-- 
Best,
Junyang

Re: Export feature issue in Solr 4.10

2014-10-02 Thread Joel Bernstein

Yep getting the same error. Investigating...

Joel Bernstein
Search Engineer at Heliosearch

On Thu, Oct 2, 2014 at 12:59 PM, Ahmed Adel ahmed.a...@badrit.com wrote:

 Hi All,

 I'm trying to use Solr 4.10 export feature, but I'm getting an error. Maybe
 I missed something.

 Here's the scenario:


1. Download Solr 4.10.0
2. Use collection1 schema out of the box
3. Add docValues=true to price and pages fields in schema.xml
4. Index books.json using command line:
curl http://localhost:8984/solr/collection1/update -H
 Content-Type: text/json --data-binary
 @example/exampledocs/books.json
5. Try running this query:

 http://localhost:8984/solr/collection1/export?q=*:*sort=price%20ascfl=price
6. Here's the error I get:

java.lang.IllegalArgumentException: docID must be = 0 and 
 maxDoc=4 (got docID=4)
 at
 org.apache.lucene.index.BaseCompositeReader.readerIndex(BaseCompositeReader.java:182)
 at
 org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:109)
 at
 org.apache.solr.search.SolrIndexSearcher.doc(SolrIndexSearcher.java:700)
 at
 org.apache.solr.util.SolrPluginUtils.optimizePreFetchDocs(SolrPluginUtils.java:213)
 at
 org.apache.solr.handler.component.QueryComponent.doPrefetch(QueryComponent.java:623)
 at
 org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:507)
 at
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
 at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
 ...


 Any ideas what could be missing?

 Thanks,
 A. Adel

Re: Silent ping request logging

2014-10-02 Thread Junyang Xin

Please allow me to re-phrase my question a bit: I want to eliminate the
dummy ping request, not the ones that are generated by concrete queries.

On Thu, Oct 2, 2014 at 4:10 PM, Junyang Xin xinj...@gmail.com wrote:

 Hi,

 The ping request log as below generates too much noise in my solr log:
 INFO: [main0] webapp=/solr path=/admin/ping params={} status=0 QTime=0

 I don't want to change the global logging level to eliminate this.
 Instead, I wonder if there is a way to change the logging level just for
 such ping requests in the code. If so, which class I should look into? Your
 help will be really appreciated.

 --
 Best,
 Junyang




-- 
Best,
Junyang

Re: Export feature issue in Solr 4.10

2014-10-02 Thread Joel Bernstein

There is bug in how the export handler is working when you have very few
documents in the index and the solrconfig.xml is configured to enable lazy
document loading:

enableLazyFieldLoadingtrue/enableLazyFieldLoading

The tests didn't catch this because lazy loading was set to the default
which is false in the tests. The manual testing I did, didn't catch this
because I tested with a large number of documents in the index.

Your example will work if you change:

enableLazyFieldLoadingfalse/enableLazyFieldLoading

And if you load a typical index with lots of documents you should have no
problems running with lazy loading enabled.

I'll create jira to fix this issue.

Joel Bernstein
Search Engineer at Heliosearch

On Thu, Oct 2, 2014 at 4:10 PM, Joel Bernstein joels...@gmail.com wrote:

Yep getting the same error. Investigating...

Joel Bernstein
Search Engineer at Heliosearch

On Thu, Oct 2, 2014 at 12:59 PM, Ahmed Adel ahmed.a...@badrit.com wrote:

Hi All,

I'm trying to use Solr 4.10 export feature, but I'm getting an error.
Maybe
I missed something.

Here's the scenario:

1. Download Solr 4.10.0
2. Use collection1 schema out of the box
3. Add docValues=true to price and pages fields in schema.xml
4. Index books.json using command line:
curl http://localhost:8984/solr/collection1/update -H
Content-Type: text/json --data-binary
@example/exampledocs/books.json
5. Try running this query:

http://localhost:8984/solr/collection1/export?q=*:*sort=price%20ascfl=price
6. Here's the error I get:

java.lang.IllegalArgumentException: docID must be = 0 and
maxDoc=4 (got docID=4)
at
org.apache.lucene.index.BaseCompositeReader.readerIndex(BaseCompositeReader.java:182)
at
org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:109)
at
org.apache.solr.search.SolrIndexSearcher.doc(SolrIndexSearcher.java:700)
at
org.apache.solr.util.SolrPluginUtils.optimizePreFetchDocs(SolrPluginUtils.java:213)
at
org.apache.solr.handler.component.QueryComponent.doPrefetch(QueryComponent.java:623)
at
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:507)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
...

Any ideas what could be missing?

Thanks,
A. Adel

SolrCloud 4.7 not doing distributed search when querying from a load balancer.

Hi All,

I am trying to query a 6 node Solr4.7  cluster with 3 shards and  a
replication factor of 2 .

I have fronted these 6 Solr nodes using a load balancer , what I notice is
that every time I do a search of the form
q=*:*fq=(id:9e78c064-919f-4ef3-b236-dc66351b4acf)  it gives me a result
only once in every 3 tries , telling me that the load balancer is
distributing the requests between the 3 shards and SolrCloud only returns a
result if the request goes to the core that as that id .

However if I do a simple search like q=*:* , I consistently get the right
aggregated results back of all the documents across all the shards for
every request from the load balancer. Can someone please let me know what
this is symptomatic of ?

Somehow Solr Cloud seems to be doing search query distribution and
aggregation for queries of type *:* only.

Thanks.

Regarding Default Scoring For Solr

2014-10-02 Thread mdemarco123

If i add this to the end of my query string I get a score back. fl=*,score
Is this the default score? I did read some info on scoring and it is
detailed
and granular and conceptual but because of limited time I can't go into 
the how's at the moment of the score calculation.  Are the links below a
good start
as to the default calculation or can it be put any more into a tutorial
fashion

http://www.lucenetutorial.com/advanced-topics/scoring.html
http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Regarding-Default-Scoring-For-Solr-tp4162411.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.

Hmmm, nothing quite makes sense here

Here are some experiments:
1 avoid the load balancer and issue queries like
http://solr_server:8983/solr/collection/q=whateverdistrib=false

the distrib=false bit will cause keep SolrCloud from trying to send
the queries anywhere, they'll be served only from the node you address them to.
that'll help check whether the nodes are consistent. You should be
getting back the same results from each replica in a shard (i.e. 2 of
your 6 machines).

Next, try your failing query the same way.

Next, try your failing query from a browser, pointing it at successive
nodes.

Where is the first place problems show up?

My _guess_ is that your load balancer isn't quite doing what you think, or
your cluster isn't set up the way you think it is, but those are guesses.

Best,
Erick

On Thu, Oct 2, 2014 at 2:51 PM, S.L simpleliving...@gmail.com wrote:
 Hi All,

 I am trying to query a 6 node Solr4.7  cluster with 3 shards and  a
 replication factor of 2 .

 I have fronted these 6 Solr nodes using a load balancer , what I notice is
 that every time I do a search of the form
 q=*:*fq=(id:9e78c064-919f-4ef3-b236-dc66351b4acf)  it gives me a result
 only once in every 3 tries , telling me that the load balancer is
 distributing the requests between the 3 shards and SolrCloud only returns a
 result if the request goes to the core that as that id .

 However if I do a simple search like q=*:* , I consistently get the right
 aggregated results back of all the documents across all the shards for
 every request from the load balancer. Can someone please let me know what
 this is symptomatic of ?

 Somehow Solr Cloud seems to be doing search query distribution and
 aggregation for queries of type *:* only.

 Thanks.

Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.

Erick,

Thanks for your reply, I tried your suggestions.

1 . When not using loadbalancer if  *I have distrib=false* I get consistent
results across the replicas.

2. However here's the insteresting part , while not using load balancer if
I *dont have distrib=false* , then when I query a particular node ,I get
the same behaviour as if I were using a loadbalancer , meaning the
distributed search from a node works intermittently .Does this give any
clue ?



On Thu, Oct 2, 2014 at 7:47 PM, Erick Erickson erickerick...@gmail.com
wrote:

 Hmmm, nothing quite makes sense here

 Here are some experiments:
 1 avoid the load balancer and issue queries like
 http://solr_server:8983/solr/collection/q=whateverdistrib=false

 the distrib=false bit will cause keep SolrCloud from trying to send
 the queries anywhere, they'll be served only from the node you address
 them to.
 that'll help check whether the nodes are consistent. You should be
 getting back the same results from each replica in a shard (i.e. 2 of
 your 6 machines).

 Next, try your failing query the same way.

 Next, try your failing query from a browser, pointing it at successive
 nodes.

 Where is the first place problems show up?

 My _guess_ is that your load balancer isn't quite doing what you think, or
 your cluster isn't set up the way you think it is, but those are guesses.

 Best,
 Erick

 On Thu, Oct 2, 2014 at 2:51 PM, S.L simpleliving...@gmail.com wrote:
  Hi All,
 
  I am trying to query a 6 node Solr4.7  cluster with 3 shards and  a
  replication factor of 2 .
 
  I have fronted these 6 Solr nodes using a load balancer , what I notice
 is
  that every time I do a search of the form
  q=*:*fq=(id:9e78c064-919f-4ef3-b236-dc66351b4acf)  it gives me a result
  only once in every 3 tries , telling me that the load balancer is
  distributing the requests between the 3 shards and SolrCloud only
 returns a
  result if the request goes to the core that as that id .
 
  However if I do a simple search like q=*:* , I consistently get the right
  aggregated results back of all the documents across all the shards for
  every request from the load balancer. Can someone please let me know what
  this is symptomatic of ?
 
  Somehow Solr Cloud seems to be doing search query distribution and
  aggregation for queries of type *:* only.
 
  Thanks.

Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.

Erick,

I would like to add that the interesting behavior i.e point #2 that I
mentioned in my earlier reply  happens in all the shards , if this were to
be a distributed search issue this should have not manifested itself in the
shard that contains the key that I am searching for , looks like the search
is just failing as whole intermittently .

Also ,the collection is being actively indexed as I query this, could that
be an issue too ?

Thanks.

On Thu, Oct 2, 2014 at 10:24 PM, S.L simpleliving...@gmail.com wrote:

 Erick,

 Thanks for your reply, I tried your suggestions.

 1 . When not using loadbalancer if  *I have distrib=false* I get
 consistent results across the replicas.

 2. However here's the insteresting part , while not using load balancer if
 I *dont have distrib=false* , then when I query a particular node ,I get
 the same behaviour as if I were using a loadbalancer , meaning the
 distributed search from a node works intermittently .Does this give any
 clue ?



 On Thu, Oct 2, 2014 at 7:47 PM, Erick Erickson erickerick...@gmail.com
 wrote:

 Hmmm, nothing quite makes sense here

 Here are some experiments:
 1 avoid the load balancer and issue queries like
 http://solr_server:8983/solr/collection/q=whateverdistrib=false

 the distrib=false bit will cause keep SolrCloud from trying to send
 the queries anywhere, they'll be served only from the node you address
 them to.
 that'll help check whether the nodes are consistent. You should be
 getting back the same results from each replica in a shard (i.e. 2 of
 your 6 machines).

 Next, try your failing query the same way.

 Next, try your failing query from a browser, pointing it at successive
 nodes.

 Where is the first place problems show up?

 My _guess_ is that your load balancer isn't quite doing what you think, or
 your cluster isn't set up the way you think it is, but those are guesses.

 Best,
 Erick

 On Thu, Oct 2, 2014 at 2:51 PM, S.L simpleliving...@gmail.com wrote:
  Hi All,
 
  I am trying to query a 6 node Solr4.7  cluster with 3 shards and  a
  replication factor of 2 .
 
  I have fronted these 6 Solr nodes using a load balancer , what I notice
 is
  that every time I do a search of the form
  q=*:*fq=(id:9e78c064-919f-4ef3-b236-dc66351b4acf)  it gives me a result
  only once in every 3 tries , telling me that the load balancer is
  distributing the requests between the 3 shards and SolrCloud only
 returns a
  result if the request goes to the core that as that id .
 
  However if I do a simple search like q=*:* , I consistently get the
 right
  aggregated results back of all the documents across all the shards for
  every request from the load balancer. Can someone please let me know
 what
  this is symptomatic of ?
 
  Somehow Solr Cloud seems to be doing search query distribution and
  aggregation for queries of type *:* only.
 
  Thanks.

Boosting Top selling items

2014-10-02 Thread Bob Laferriere

I have been working to try and identify top selling items in an eCommerce app 
and boost those in the results. The struggle I am having is that our catalog 
stores products and parts in the same taxonomy. Since parts are ordered more 
frequently when you search for something like TV you see cables and antennas 
first. My theory is that someone needs to tag products as Top Selling as a 
facet then use faceted search to avoid an artificial boost which screws up 
document relevance. Anyone fight with anything similar? Interested in 
discussing with other eCommerce search developers.

Regards,

Bob

Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.

bq: Also ,the collection is being actively indexed as I query this, could that
be an issue too ?

Not if the documents you're searching aren't being added as you search
(and all your autocommit intervals have expired).

I would turn off indexing for testing, it's just one more variable
that can get in the way of understanding this.

Do note that if the problem were endemic to Solr, there would probably
be a _lot_ more noise out there.

So to recap:
0 we can take the load balancer out of the picture all together.

1 when you query each shard individually with distrib=true, every
replica in a particular shard returns the same count.

2 when you query without distrib=true you get varying counts.

This is very strange and not at all expected. Let's try it again
without indexing going on

And what do you mean by indexing anyway? How are documents being fed
to your system?

Best,
Erick@PuzzledAsWell

On Thu, Oct 2, 2014 at 7:32 PM, S.L simpleliving...@gmail.com wrote:
 Erick,

 I would like to add that the interesting behavior i.e point #2 that I
 mentioned in my earlier reply  happens in all the shards , if this were to
 be a distributed search issue this should have not manifested itself in the
 shard that contains the key that I am searching for , looks like the search
 is just failing as whole intermittently .

 Also ,the collection is being actively indexed as I query this, could that
 be an issue too ?

 Thanks.

 On Thu, Oct 2, 2014 at 10:24 PM, S.L simpleliving...@gmail.com wrote:

 Erick,

 Thanks for your reply, I tried your suggestions.

 1 . When not using loadbalancer if  *I have distrib=false* I get
 consistent results across the replicas.

 2. However here's the insteresting part , while not using load balancer if
 I *dont have distrib=false* , then when I query a particular node ,I get
 the same behaviour as if I were using a loadbalancer , meaning the
 distributed search from a node works intermittently .Does this give any
 clue ?



 On Thu, Oct 2, 2014 at 7:47 PM, Erick Erickson erickerick...@gmail.com
 wrote:

 Hmmm, nothing quite makes sense here

 Here are some experiments:
 1 avoid the load balancer and issue queries like
 http://solr_server:8983/solr/collection/q=whateverdistrib=false

 the distrib=false bit will cause keep SolrCloud from trying to send
 the queries anywhere, they'll be served only from the node you address
 them to.
 that'll help check whether the nodes are consistent. You should be
 getting back the same results from each replica in a shard (i.e. 2 of
 your 6 machines).

 Next, try your failing query the same way.

 Next, try your failing query from a browser, pointing it at successive
 nodes.

 Where is the first place problems show up?

 My _guess_ is that your load balancer isn't quite doing what you think, or
 your cluster isn't set up the way you think it is, but those are guesses.

 Best,
 Erick

 On Thu, Oct 2, 2014 at 2:51 PM, S.L simpleliving...@gmail.com wrote:
  Hi All,
 
  I am trying to query a 6 node Solr4.7  cluster with 3 shards and  a
  replication factor of 2 .
 
  I have fronted these 6 Solr nodes using a load balancer , what I notice
 is
  that every time I do a search of the form
  q=*:*fq=(id:9e78c064-919f-4ef3-b236-dc66351b4acf)  it gives me a result
  only once in every 3 tries , telling me that the load balancer is
  distributing the requests between the 3 shards and SolrCloud only
 returns a
  result if the request goes to the core that as that id .
 
  However if I do a simple search like q=*:* , I consistently get the
 right
  aggregated results back of all the documents across all the shards for
  every request from the load balancer. Can someone please let me know
 what
  this is symptomatic of ?
 
  Somehow Solr Cloud seems to be doing search query distribution and
  aggregation for queries of type *:* only.
 
  Thanks.

Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.

Eirck,

0 Load balancer is out of the picture
.
1When I query with *distrib=false* , I get consistent results as expected
for those shards that dont have the key i.e I dont get the results back for
those shards, however I just realized that while *distrib=false* is present
in the query for the shard that is supposed to contain the key,only the
replica of the shard that has this key returns the result , and the leader
does not , looks like replica and the leader do not have the same data and
replica seems to contain the key in the query for that shard.

2 By indexing I mean this collection is being populated by a web crawler.

So looks like 1 above  is pointing to leader and replica being out of
synch for atleast one shard.



On Thu, Oct 2, 2014 at 11:57 PM, Erick Erickson erickerick...@gmail.com
wrote:

 bq: Also ,the collection is being actively indexed as I query this, could
 that
 be an issue too ?

 Not if the documents you're searching aren't being added as you search
 (and all your autocommit intervals have expired).

 I would turn off indexing for testing, it's just one more variable
 that can get in the way of understanding this.

 Do note that if the problem were endemic to Solr, there would probably
 be a _lot_ more noise out there.

 So to recap:
 0 we can take the load balancer out of the picture all together.

 1 when you query each shard individually with distrib=true, every
 replica in a particular shard returns the same count.

 2 when you query without distrib=true you get varying counts.

 This is very strange and not at all expected. Let's try it again
 without indexing going on

 And what do you mean by indexing anyway? How are documents being fed
 to your system?

 Best,
 Erick@PuzzledAsWell

 On Thu, Oct 2, 2014 at 7:32 PM, S.L simpleliving...@gmail.com wrote:
  Erick,
 
  I would like to add that the interesting behavior i.e point #2 that I
  mentioned in my earlier reply  happens in all the shards , if this were
 to
  be a distributed search issue this should have not manifested itself in
 the
  shard that contains the key that I am searching for , looks like the
 search
  is just failing as whole intermittently .
 
  Also ,the collection is being actively indexed as I query this, could
 that
  be an issue too ?
 
  Thanks.
 
  On Thu, Oct 2, 2014 at 10:24 PM, S.L simpleliving...@gmail.com wrote:
 
  Erick,
 
  Thanks for your reply, I tried your suggestions.
 
  1 . When not using loadbalancer if  *I have distrib=false* I get
  consistent results across the replicas.
 
  2. However here's the insteresting part , while not using load balancer
 if
  I *dont have distrib=false* , then when I query a particular node ,I get
  the same behaviour as if I were using a loadbalancer , meaning the
  distributed search from a node works intermittently .Does this give any
  clue ?
 
 
 
  On Thu, Oct 2, 2014 at 7:47 PM, Erick Erickson erickerick...@gmail.com
 
  wrote:
 
  Hmmm, nothing quite makes sense here
 
  Here are some experiments:
  1 avoid the load balancer and issue queries like
  http://solr_server:8983/solr/collection/q=whateverdistrib=false
 
  the distrib=false bit will cause keep SolrCloud from trying to send
  the queries anywhere, they'll be served only from the node you address
  them to.
  that'll help check whether the nodes are consistent. You should be
  getting back the same results from each replica in a shard (i.e. 2 of
  your 6 machines).
 
  Next, try your failing query the same way.
 
  Next, try your failing query from a browser, pointing it at successive
  nodes.
 
  Where is the first place problems show up?
 
  My _guess_ is that your load balancer isn't quite doing what you
 think, or
  your cluster isn't set up the way you think it is, but those are
 guesses.
 
  Best,
  Erick
 
  On Thu, Oct 2, 2014 at 2:51 PM, S.L simpleliving...@gmail.com wrote:
   Hi All,
  
   I am trying to query a 6 node Solr4.7  cluster with 3 shards and  a
   replication factor of 2 .
  
   I have fronted these 6 Solr nodes using a load balancer , what I
 notice
  is
   that every time I do a search of the form
   q=*:*fq=(id:9e78c064-919f-4ef3-b236-dc66351b4acf)  it gives me a
 result
   only once in every 3 tries , telling me that the load balancer is
   distributing the requests between the 3 shards and SolrCloud only
  returns a
   result if the request goes to the core that as that id .
  
   However if I do a simple search like q=*:* , I consistently get the
  right
   aggregated results back of all the documents across all the shards
 for
   every request from the load balancer. Can someone please let me know
  what
   this is symptomatic of ?
  
   Somehow Solr Cloud seems to be doing search query distribution and
   aggregation for queries of type *:* only.
  
   Thanks.

Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.