Re: How to check if a solr core is down and is ready for a solr re-start

2017-08-31 Thread Erick Erickson
I have to ask a completely different question: why are the replicas
going down in the first place? Restarting the Solr node is a
sledgehammer of a solution; I'd put more effort into finding out why
that's happening.

Are you getting OOM errors? Any other exception? Is the OOM killer
script executing?

Best,
Erick

On Thu, Aug 31, 2017 at 10:59 AM, Minu Theresa Thomas
 wrote:
> Hello Team,
>
> I have had a few cases where restarting a Solr node was the only option when
> a core went down. I am trying to automate the restart of a Solr server when
> a core goes down or a replica is unresponsive over a period of time.
>
> I have a script to check if the cores/replicas associated with a node are
> up. I have two approaches. One is to get the cores from the Solr CLUSTERSTATUS
> API and do a PING on each core. If at least one core on the node doesn't
> respond to the ping, mark that node down and restart after a few retries.
> The second is to get the cores from the CLUSTERSTATUS API along with their
> status. If the status is down, mark that node down and restart after a few
> retries.
>
> Which is the best/recommended approach to check whether a core associated
> with a node is down and a Solr service restart is warranted?
>
> Thanks!
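A minimal sketch of the second approach (checking replica state), assuming the usual JSON shape of a /admin/collections?action=CLUSTERSTATUS response; the node, collection, and replica names here are made up:

```python
def down_replicas(cluster_status, node_name):
    """Return (collection, shard, replica, state) for every replica hosted
    on node_name whose state is not 'active' in a parsed CLUSTERSTATUS body."""
    down = []
    for coll_name, coll in cluster_status["cluster"]["collections"].items():
        for shard_name, shard in coll["shards"].items():
            for replica_name, replica in shard["replicas"].items():
                if replica["node_name"] == node_name and replica["state"] != "active":
                    down.append((coll_name, shard_name, replica_name, replica["state"]))
    return down

# Hypothetical CLUSTERSTATUS payload with one healthy and one down replica.
status = {"cluster": {"collections": {"products": {"shards": {"shard1": {"replicas": {
    "core_node1": {"node_name": "host1:8983_solr", "state": "active"},
    "core_node2": {"node_name": "host2:8983_solr", "state": "down"},
}}}}}}}

# host2 hosts a down replica, so a retry/restart counter would start for it.
print(down_replicas(status, "host2:8983_solr"))
```

The first, ping-based approach would be the same loop with an HTTP GET to each core's /admin/ping handler instead of the state check.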


Re: RE: Antwort: 6.6 Cannot talk to ZooKeeper - Updates are disabled.

2017-08-31 Thread Zheng Lin Edwin Yeo
Hi Markus,

I have encountered that error before. It was caused by very high memory
usage on the server, at almost 98%.
The usage was not all from Solr; there were other programs and
virtual machines running on the server.

What is the memory usage of the system, and do you have other applications
running on the server?

Regards,
Edwin

On 31 August 2017 at 16:07, Stephan Schubert 
wrote:

> Hi Markus,
>
> I don't know which client you use, but if you are using SolrJ, enabling
> logging could be an option to "dig deeper" into the problem. This can be
> the output, for example, via log4j at log level INFO:
>
> ...
> 2017-08-31 10:01:56 INFO  ZooKeeper:438 - Initiating client connection,
> connectString=ZKHOST1:9983,ZKHOST2:9983,ZKHOST3:9983,
> ZKHOST4:9983,ZKHOST5:9983
> sessionTimeout=60
> watcher=org.apache.solr.common.cloud.SolrZkClient$3@14379273
> 2017-08-31 10:01:56 INFO  ClientCnxn:876 - Socket connection established
> to SOLRHOST/ZKHOST3:9983, initiating session
> 2017-08-31 10:01:56 INFO  ClientCnxn:1299 - Session establishment complete
> on server SOLRHOST/ZKHOST3:9983, sessionid = 0x45e35eaa9fd3584, negotiated
> timeout = 4
> 2017-08-31 10:01:56 INFO  ZkStateReader:688 - Updated live nodes from
> ZooKeeper... (0) -> (4)
> 2017-08-31 10:01:56 INFO  ZkClientClusterStateProvider:134 - Cluster at
> ZKHOST1:9983,ZKHOST2:9983,ZKHOST3:9983,ZKHOST4:9983,ZKHOST5:9983 ready
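For reference, a log4j.properties fragment along these lines would produce that kind of output in a SolrJ client. This is a sketch assuming a typical log4j 1.x setup; the logger names are taken from the log lines above:

```properties
# Hypothetical log4j.properties fragment for a SolrJ client application.
log4j.rootLogger=WARN, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p  %c{1}:%L - %m%n
# Loggers seen in the output above:
log4j.logger.org.apache.zookeeper=INFO
log4j.logger.org.apache.solr.common.cloud=INFO
```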
>
>
>
>
>
> Von:Markus Jelsma 
> An: solr-user@lucene.apache.org 
> Datum:  31.08.2017 10:00
> Betreff:RE: Antwort: 6.6 Cannot talk to ZooKeeper - Updates are
> disabled.
>
>
>
> Hello Stephan,
>
> I know that restarting stuff can sometimes cure what's wrong, but we are
> not going to; we want to get rid of the problem, not restart Microsoft
> Windows whenever things run slow. Also, there is no indexing going on
> right now.
>
> We also see these sometimes; this explains at least why it cannot talk to
> ZooKeeper, but not why...
>  o.a.s.c.RecoveryStrategy Socket timeout on send prep recovery cmd,
> retrying..
>
> This has been going on with just one of our nodes for over two hours; the
> other nodes are fine. And why is this bad node green in the cluster overview?
>
> Thanks,
> Markus
>
> -Original message-
> > From:Stephan Schubert 
> > Sent: Thursday 31st August 2017 9:52
> > To: solr-user@lucene.apache.org
> > Subject: Antwort: 6.6 Cannot talk to ZooKeeper - Updates are disabled.
> >
> > Hi Markus,
> >
> > try to stop your indexing/update processes and restart your ZooKeeper
> > instances (not all at the same time, of course). This is what I do in
> > these cases, and it has helped me so far.
> >
> >
> >
> >
> > Von:Markus Jelsma 
> > An: Solr-user 
> > Datum:  31.08.2017 09:49
> > Betreff:6.6 Cannot talk to ZooKeeper - Updates are disabled.
> >
> >
> >
> > Hello,
> >
> > One node is behaving badly, at least according to the logs, but the node
> > is green in the cluster overview, although the logs claim recovery fails
> > all the time. It is not the first time this message has popped up in the
> > logs of one of the nodes. Why can it not talk to ZooKeeper? The logs give
> > no reason.
> >
> > The cluster is not extremely busy at the moment, we allow plenty of file
> > descriptors, and there are no firewall restrictions; I cannot think of any
> > problem in our infrastructure.
> >
> > What's going on? What can I do? Can the error be explained a bit
> further?
> >
> > Thanks,
> > Markus
> >
> > 8/31/2017, 9:34:34 AM
> > ERROR false
> > RequestHandlerBase
> > org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates
> > are disabled.
> > 8/31/2017, 9:34:34 AM
> > ERROR false
> > RequestHandlerBase
> > org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates
> > are disabled.
> > 8/31/2017, 9:34:36 AM
> > ERROR false
> > RequestHandlerBase
> > org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates
> > are disabled.
> > 8/31/2017, 9:34:38 AM
> > ERROR false
> > RecoveryStrategy
> > Could not publish as ACTIVE after succesful recovery
> > 8/31/2017, 9:34:38 AM
> > ERROR true
> > RecoveryStrategy
> > Recovery failed - trying again... (0)
> > 8/31/2017, 9:34:49 AM
> > ERROR false
> > RequestHandlerBase
> > org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates
> > are disabled.
> > 8/31/2017, 9:34:49 AM
> > ERROR false
> > RequestHandlerBase
> > org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates
> > are disabled.
> > 8/31/2017, 9:34:50 AM
> > ERROR false
> > RecoveryStrategy
> > Could not publish as ACTIVE after succesful recovery
> > 8/31/2017, 9:34:50 AM
> > ERROR false
> > RecoveryStrategy
> > Recovery failed - trying again... (1)
> > 8/31/2017, 9:35:36 AM
> > ERROR false
> > RequestHandlerBase
> > org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates
> > are disabled.
> >
> >
> >
> >
>
>
>
>


Re: Facet on a Payload field type?

2017-08-31 Thread Chris Hostetter

if the "middle tier" of your application doesn't already have an easy 
key-value lookup that you keep this translation data in (which would 
surprise me, because i've never seen anyone care about this type of 
"late-binding" translation of search results w/o also caring about 
late-binding translation of other aspects of the UI) then you could always 
create a sidecar collection in solr: one document per "word" using the 
english term as the id, with a lowercased copy field for searching + one
field per language with the translations if available.

after doing your main query, toss all the facet.field terms in the 
response into a second query to your sidecar "translation" collection 
using the "terms" parser, setting "rows" == the total number of 
terms you're asking to translate and fl=id,fr (or fl=id,es ... whatever 
language the user wants)

https://lucene.apache.org/solr/guide/6_6/other-parsers.html

...then use those results to translate the final output.
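That second query can be sketched as follows, under the assumed sidecar schema above (id = english term, one stored field per language); this is a sketch, not a definitive implementation:

```python
from urllib.parse import urlencode

def translation_query(facet_terms, lang):
    """Build the params for the follow-up query against the sidecar
    'translation' collection: the {!terms} parser on the id field,
    rows == the number of terms, fl limited to id plus the target language."""
    return urlencode({
        "q": "{!terms f=id}" + ",".join(facet_terms),
        "fl": "id," + lang,
        "rows": len(facet_terms),
    })

qs = translation_query(["red", "blue", "green"], "fr")
print(qs)
```

The returned docs are then keyed by id to swap each facet term for its translation before rendering.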


: Date: Thu, 31 Aug 2017 14:12:38 -0500
: From: Webster Homer 
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Re: Facet on a Payload field type?
: 
: You are describing the idea pretty accurately. Apparently Endeca has
: something that sort of supports this, which we used for the problem.
: 
: On Thu, Aug 31, 2017 at 1:59 PM, Chris Hostetter 
: wrote:
: 
: >
: > ok, so lemme attempt to restate your objective to ensure no
: > miscommunication:
: >
: > 1) you have fields like "color"
: > 2) you want to index english words into the color field for docs
: > 3) you want to search/filter against these fields using english words as
: > input
: > 4) you want to facet on the fields like "color"
: > 5) you want the list of terms:counts displayed to the end user when
: > faceting on these fields to be in a variety of different languages, based
: > on a "user_lang" option specified at query time and a set of known
: > translations
: > 6) if no english->user_lang translation is available for a particular
: > term, you want to display the english word when displaying the facet
: > counts
: >
: > does that sound right?
: >
: > based on your objective, attempting to embed/encode the various
: > translations into the terms when indexing (as payloads, or an
: > alternative field or prefixed terms, etc...) seems like a vastly
: > overcomplicated way to deal with this problem.
: >
: > If i were in your shoes, i would keep the translation aspect of the
: > display completely distinct from Solr, and after solr has returned the
: > response then loop over the facet.field terms and do a lookup in some
: > other (cached) key/value translation mapping in your middle layer --
: > replacing the english word with the translation if it exists.
: >
: > This has the added benefit of allowing you to tweak the translations w/o
: > reindexing any docs.
: >
: > Practically speaking: the idea of encoding these translations as payloads
: > wouldn't make sense -- because payloads exist per *occurrence* of the term
: > -- ie: it wouldn't make sense to put "es=rojo;fr=rouge" in the payload of
: > a term "red" when indexing a document, because you want those translations
: > for all instances of red -- not just that instance of red in that
: > singular document.
: >
: >
: >
: > : Date: Mon, 28 Aug 2017 13:29:00 -0500
: > : From: Webster Homer 
: > : Reply-To: solr-user@lucene.apache.org
: > : To: solr-user@lucene.apache.org
: > : Subject: Re: Facet on a Payload field type?
: > :
: > : The issue is, that we lack translations for much of our attribute data.
: > We
: > : do have English versions. The idea is to use the English values for the
: > : faceted values and for the filters, but be able to retrieve different
: > : language versions of the term to the caller.
: > : If we have a facet on color if the value is red, be able to retrieve rojo
: > : for Spanish etc...
: > :
: > : Also users can switch regions between searches. If a user starts out in
: > : French, executes a search, selects a facet then switches to German they
: > : should get the German for the facet (if it exists) even when they
: > : originally used French. If all of the searching was in English where we
: > : have the data, we could then show French (or German etc) for the facet
: > : value.
: > :
: > : The real field value that we use for filtering would be in English but
: > the
: > : values returned to the user would be in the language of their locale or
: > : English if we don't have a translation for it. The idea being that the
: > : translations would be stored in the payloads
: > :
: > : On Wed, Aug 23, 2017 at 7:47 PM, Chris Hostetter <
: > hossman_luc...@fucit.org>
: > : wrote:
: > :
: > : >
: > : > : The payload idea was from my boss, it's similar to how they did this
: > in
: > : > : Endeca.
: > : > ...
: > : > : My alternate idea is to have sets of facet fields for different
: > : > languages,
: > : > : then let our service layer determine the correct on

Re: Facet on a Payload field type?

2017-08-31 Thread Webster Homer
You are describing the idea pretty accurately. Apparently Endeca has
something that sort of supports this, which we used for the problem.

On Thu, Aug 31, 2017 at 1:59 PM, Chris Hostetter 
wrote:

>
> ok, so lemme attempt to restate your objective to ensure no
> miscommunication:
>
> 1) you have fields like "color"
> 2) you want to index english words into the color field for docs
> 3) you want to search/filter against these fields using english words as
> input
> 4) you want to facet on the fields like "color"
> 5) you want the list of terms:counts displayed to the end user when
> faceting on these fields to be in a variety of different languages, based
> on a "user_lang" option specified at query time and a set of known
> translations
> 6) if no english->user_lang translation is available for a particular
> term, you want to display the english word when displaying the facet
> counts
>
> does that sound right?
>
> based on your objective, attempting to embed/encode the various
> translations into the terms when indexing (as payloads, or an
> alternative field or prefixed terms, etc...) seems like a vastly
> overcomplicated way to deal with this problem.
>
> If i were in your shoes, i would keep the translation aspect of the
> display completely distinct from Solr, and after solr has returned the
> response then loop over the facet.field terms and do a lookup in some
> other (cached) key/value translation mapping in your middle layer --
> replacing the english word with the translation if it exists.
>
> This has the added benefit of allowing you to tweak the translations w/o
> reindexing any docs.
>
> Practically speaking: the idea of encoding these translations as payloads
> wouldn't make sense -- because payloads exist per *occurrence* of the term
> -- ie: it wouldn't make sense to put "es=rojo;fr=rouge" in the payload of
> a term "red" when indexing a document, because you want those translations
> for all instances of red -- not just that instance of red in that
> singular document.
>
>
>
> : Date: Mon, 28 Aug 2017 13:29:00 -0500
> : From: Webster Homer 
> : Reply-To: solr-user@lucene.apache.org
> : To: solr-user@lucene.apache.org
> : Subject: Re: Facet on a Payload field type?
> :
> : The issue is, that we lack translations for much of our attribute data.
> We
> : do have English versions. The idea is to use the English values for the
> : faceted values and for the filters, but be able to retrieve different
> : language versions of the term to the caller.
> : If we have a facet on color if the value is red, be able to retrieve rojo
> : for Spanish etc...
> :
> : Also users can switch regions between searches. If a user starts out in
> : French, executes a search, selects a facet then switches to German they
> : should get the German for the facet (if it exists) even when they
> : originally used French. If all of the searching was in English where we
> : have the data, we could then show French (or German etc) for the facet
> : value.
> :
> : The real field value that we use for filtering would be in English but
> the
> : values returned to the user would be in the language of their locale or
> : English if we don't have a translation for it. The idea being that the
> : translations would be stored in the payloads
> :
> : On Wed, Aug 23, 2017 at 7:47 PM, Chris Hostetter <
> hossman_luc...@fucit.org>
> : wrote:
> :
> : >
> : > : The payload idea was from my boss, it's similar to how they did this
> in
> : > : Endeca.
> : > ...
> : > : My alternate idea is to have sets of facet fields for different
> : > languages,
> : > : then let our service layer determine the correct one for the user's
> : > : language, but I'm curious as to how others have solved this.
> : >
> : > Let's back up for a minute -- can you please explain your ultimate
> goal,
> : > from a "solr client application" perspective? (assuming we have no
> : > knowledge of how/how you might have used Endeca in the past)
> : >
> : > What is it you want your application to be able to do when indexing
> docs
> : > to solr and making queries to solr?  give us some real world examples
> : >
> : >
> : >
> : > (If i had to guess: i gather maybe you're just dealing with a "keywords"
> : > type field that you want to facet on -- and maybe you could use a diff
> : > field for each language, or encode the languages as a prefix on each term
> : > and use facet.prefix to restrict the facet constraints returned)
> : >
> : >
> : >
> : > https://people.apache.org/~hossman/#xyproblem
> : > XY Problem
> : >
> : > Your question appears to be an "XY Problem" ... that is: you are
> dealing
> : > with "X", you are assuming "Y" will help you, and you are asking about
> "Y"
> : > without giving more details about the "X" so that we can understand the
> : > full issue.  Perhaps the best solution doesn't involve "Y" at all?
> : > See Also: http://www.perlmonks.org/index.pl?node_id=542341
> : >
> : >
> : >
> : > :
> : > : On Wed, Aug 23, 2017 at 2:10 PM, Mar

RE: data import class not found

2017-08-31 Thread Steve Pruitt
I just tried putting solr-dataimporthandler-6.6.0.jar in server/solr/lib,
and I got past the problem.  I still don't understand why it is not found in /dist.
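For what it's worth, the sample solrconfig.xml files shipped with the 6.x distributions load the DIH jar from /dist with a directive like the one below; the solr.install.dir default and the relative fallback path are assumptions about a default install layout, so they may need adjusting:

```xml
<!-- sketch of a solrconfig.xml lib directive; the relative fallback
     path assumes the core's instance dir sits under the install dir -->
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />
```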

-Original Message-
From: Steve Pruitt [mailto:bpru...@opentext.com] 
Sent: Thursday, August 31, 2017 3:05 PM
To: solr-user@lucene.apache.org
Subject: [EXTERNAL] - data import class not found

I still can't understand how Solr establishes the classpath.

I have a custom entity processor that subclasses EntityProcessorBase.  When I 
execute the /dataimport call I get

java.lang.NoClassDefFoundError: 
org/apache/solr/handler/dataimport/EntityProcessorBase

no matter how I specify in solrconfig.xml where to locate the solr-dataimporthandler
jar.

I have tried:

from the existing libs in solrconfig.xml 

from the Ref Guide


try anything


But, I always get the class not found error.  The DataImportHandler class is
found when Solr starts; since EntityProcessorBase is in the same jar, why is it
not found?

I have not tried putting it in the core's lib, thinking the above should work.  Of
course, the 3rd choice is only an experiment.


Thanks.

-S


data import class not found

2017-08-31 Thread Steve Pruitt
I still can't understand how Solr establishes the classpath.

I have a custom entity processor that subclasses EntityProcessorBase.  When I 
execute the /dataimport call I get

java.lang.NoClassDefFoundError: 
org/apache/solr/handler/dataimport/EntityProcessorBase

no matter how I specify in solrconfig.xml where to locate the solr-dataimporthandler
jar.

I have tried:

from the existing libs in solrconfig.xml


from the Ref Guide


try anything


But, I always get the class not found error.  The DataImportHandler class is
found when Solr starts; since EntityProcessorBase is in the same jar, why is it
not found?

I have not tried putting it in the core's lib, thinking the above should work.  Of
course, the 3rd choice is only an experiment.


Thanks.

-S


Re: Facet on a Payload field type?

2017-08-31 Thread Chris Hostetter

ok, so lemme attempt to restate your objective to ensure no 
miscommunication:

1) you have fields like "color"
2) you want to index english words into the color field for docs
3) you want to search/filter against these fields using english words as 
input
4) you want to facet on the fields like "color"
5) you want the list of terms:counts displayed to the end user when 
faceting on these fields to be in a variety of different languages, based 
on a "user_lang" option specified at query time and a set of known 
translations
6) if no english->user_lang translation is available for a particular 
term, you want to display the english word when displaying the facet 
counts

does that sound right?

based on your objective, attempting to embed/encode the various 
translations into the terms when indexing (as payloads, or an 
alternative field or prefixed terms, etc...) seems like a vastly 
overcomplicated way to deal with this problem.

If i were in your shoes, i would keep the translation aspect of the 
display completely distinct from Solr, and after solr has returned the 
response then loop over the facet.field terms and do a lookup in some 
other (cached) key/value translation mapping in your middle layer -- 
replacing the english word with the translation if it exists.

This has the added benefit of allowing you to tweak the translations w/o 
reindexing any docs.

Practically speaking: the idea of encoding these translations as payloads 
wouldn't make sense -- because payloads exist per *occurrence* of the term 
-- ie: it wouldn't make sense to put "es=rojo;fr=rouge" in the payload of 
a term "red" when indexing a document, because you want those translations 
for all instances of red -- not just that instance of red in that 
singular document.
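As a sketch of that middle-layer lookup (the cached translation map and language codes here are made up), the fallback-to-english behaviour is just a nested dict lookup:

```python
def translate_terms(facet_terms, translations, lang):
    """Map english facet terms to the user's language, keeping the english
    word when no translation exists for that term."""
    out = []
    for term, count in facet_terms:
        # Fall back to the english term when the lookup misses.
        out.append((translations.get(term, {}).get(lang, term), count))
    return out

translations = {"red": {"es": "rojo", "fr": "rouge"}}  # hypothetical cached map
print(translate_terms([("red", 12), ("blue", 7)], translations, "es"))
# -> [('rojo', 12), ('blue', 7)] : 'blue' has no translation, so english survives
```

Because the map lives outside the index, tweaking a translation is a cache refresh, not a reindex.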



: Date: Mon, 28 Aug 2017 13:29:00 -0500
: From: Webster Homer 
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Re: Facet on a Payload field type?
: 
: The issue is, that we lack translations for much of our attribute data. We
: do have English versions. The idea is to use the English values for the
: faceted values and for the filters, but be able to retrieve different
: language versions of the term to the caller.
: If we have a facet on color if the value is red, be able to retrieve rojo
: for Spanish etc...
: 
: Also users can switch regions between searches. If a user starts out in
: French, executes a search, selects a facet then switches to German they
: should get the German for the facet (if it exists) even when they
: originally used French. If all of the searching was in English where we
: have the data, we could then show French (or German etc) for the facet
: value.
: 
: The real field value that we use for filtering would be in English but the
: values returned to the user would be in the language of their locale or
: English if we don't have a translation for it. The idea being that the
: translations would be stored in the payloads
: 
: On Wed, Aug 23, 2017 at 7:47 PM, Chris Hostetter 
: wrote:
: 
: >
: > : The payload idea was from my boss, it's similar to how they did this in
: > : Endeca.
: > ...
: > : My alternate idea is to have sets of facet fields for different
: > languages,
: > : then let our service layer determine the correct one for the user's
: > : language, but I'm curious as to how others have solved this.
: >
: > Let's back up for a minute -- can you please explain your ultimate goal,
: > from a "solr client application" perspective? (assuming we have no
: > knowledge of how/how you might have used Endeca in the past)
: >
: > What is it you want your application to be able to do when indexing docs
: > to solr and making queries to solr?  give us some real world examples
: >
: >
: >
: > (If i had to guess: i gather maybe you're just dealing with a "keywords"
: > type field that you want to facet on -- and maybe you could use a diff
: > field for each language, or encode the languages as a prefix on each term
: > and use facet.prefix to restrict the facet constraints returned)
: >
: >
: >
: > https://people.apache.org/~hossman/#xyproblem
: > XY Problem
: >
: > Your question appears to be an "XY Problem" ... that is: you are dealing
: > with "X", you are assuming "Y" will help you, and you are asking about "Y"
: > without giving more details about the "X" so that we can understand the
: > full issue.  Perhaps the best solution doesn't involve "Y" at all?
: > See Also: http://www.perlmonks.org/index.pl?node_id=542341
: >
: >
: >
: > :
: > : On Wed, Aug 23, 2017 at 2:10 PM, Markus Jelsma <
: > markus.jel...@openindex.io>
: > : wrote:
: > :
: > : > Technically they could, facetting is possible on TextField, but it
: > would
: > : > be useless for facetting. Payloads are only used for scoring via a
: > custom
: > : > Similarity. Payloads also can only contain one byte of information (or
: > was
: > : > it 64 bits?)
: > : >
: > : > Payloads are not something you want to use when dealing with
: >

Re: query with wild card with AND taking lot of time

2017-08-31 Thread David Hastings
a field:* query always takes a long time and should be avoided if at all
possible.  solr/lucene is still going to try to rank the documents based on
that, even though there's nothing to really rank.  every single document
where that field is not empty will have the same score for that part of the
ranking.  I don't know what the purpose of adding that is in your case.
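If the clause has to stay at all (e.g. to require the field to be populated), a common restructuring, echoed elsewhere in this thread, is to keep only the scored clause in q and move the filter-like clauses into individual fq parameters. A sketch using field names from the query in this thread; the gtin value is a placeholder:

```python
from urllib.parse import urlencode

# Restructured request: scored clauses stay in q; filter-like clauses become
# individual fq parameters, which skip scoring and are cached separately.
params = [
    ("q", "product_identifier_type:DOTCOM_OFFER AND gtin:12345"),
    ("fq", "abstract_or_primary_product_id:[* TO *]"),  # constant-score existence check
    ("fq", "-product_class_type:BUNDLE"),
    ("fq", "-hasProduct:N"),
]
query_string = urlencode(params)
print(query_string)
```

Each fq entry produces its own cached filter, so repeated queries sharing any of these filters can reuse them.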

On Thu, Aug 31, 2017 at 2:38 PM, Josh Lincoln 
wrote:

> The closest thing to an execution plan that I know of is debug=true. That'll
> show timings of some of the components.
> I also find it useful to add echoParams=all when troubleshooting. That'll
> show every param solr is using for the request, including params set in
> solrconfig.xml and not passed in the request. This can help explain the
> debug output (e.g. what queryparser is being used, if fields are being
> expanded through field aliases, etc.).
>
> On Thu, Aug 31, 2017 at 1:35 PM suresh pendap 
> wrote:
>
> > Hello everybody,
> >
> > We are seeing that the below query is running very slow and taking
> almost 4
> > seconds to finish
> >
> >
> > [] webapp=/solr path=/select
> >
> > params={df=_text_&distrib=false&fl=id&shards.purpose=4&
> start=0&fsv=true&sort=modified_dtm+desc&shard.url=http://
> > :8983/solr/flat_product_index_shard7_replica1/
> %7Chttp://:8983/solr/flat_product_index_shard7_
> replica2/%7Chttp://:8983/solr/flat_product_index_
> shard7_replica0/&rows=11&version=2&q=product_identifier_type:DOTCOM_OFFER+
> AND+abstract_or_primary_product_id:*+AND+(gtin:<
> numericValue>)+AND+-product_class_type:BUNDLE+AND+-hasProduct:N&NOW=
> 1504196301534&isShard=true&timeAllowed=25000&wt=javabin}
> > hits=0 status=0 QTime=3663
> >
> >
> > It seems like the abstract_or_primary_product_id:* clause is
> > contributing to the overall response time. It seems that the
> > abstract_or_primary_product_id:* clause is not adding any value to the
> > query criteria and can be safely removed.  Is my understanding correct?
> >
> > I would like to know if the order of the clauses in the AND query would
> > affect the response time of the query?
> >
> > For e.g., f1:3 AND f2:10 AND f3:* vs. f3:* AND f1:3 AND f2:10
> >
> > Doesn't Lucene/Solr pick up the optimal query execution plan?
> >
> > Is there anyway to look at the query execution plan generated by Lucene?
> >
> > Regards
> > Suresh
> >
>


Re: query with wild card with AND taking lot of time

2017-08-31 Thread Josh Lincoln
The closest thing to an execution plan that I know of is debug=true. That'll
show timings of some of the components.
I also find it useful to add echoParams=all when troubleshooting. That'll
show every param solr is using for the request, including params set in
solrconfig.xml and not passed in the request. This can help explain the
debug output (e.g. what queryparser is being used, if fields are being
expanded through field aliases, etc.).
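Putting the two suggestions together, a troubleshooting request could be built like this; the host, collection, and query are placeholders:

```python
from urllib.parse import urlencode

base = "http://localhost:8983/solr/mycollection/select"  # placeholder host/collection
params = {
    "q": "product_identifier_type:DOTCOM_OFFER",  # placeholder query
    "debug": "true",      # per-component timings in the debug section
    "echoParams": "all",  # echo every param, including solrconfig defaults
}
url = base + "?" + urlencode(params)
print(url)
```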

On Thu, Aug 31, 2017 at 1:35 PM suresh pendap 
wrote:

> Hello everybody,
>
> We are seeing that the below query is running very slow and taking almost 4
> seconds to finish
>
>
> [] webapp=/solr path=/select
>
> params={df=_text_&distrib=false&fl=id&shards.purpose=4&start=0&fsv=true&sort=modified_dtm+desc&shard.url=http://
> :8983/solr/flat_product_index_shard7_replica1/%7Chttp://:8983/solr/flat_product_index_shard7_replica2/%7Chttp://:8983/solr/flat_product_index_shard7_replica0/&rows=11&version=2&q=product_identifier_type:DOTCOM_OFFER+AND+abstract_or_primary_product_id:*+AND+(gtin:)+AND+-product_class_type:BUNDLE+AND+-hasProduct:N&NOW=1504196301534&isShard=true&timeAllowed=25000&wt=javabin}
> hits=0 status=0 QTime=3663
>
>
> It seems like the abstract_or_primary_product_id:* clause is contributing
> to the overall response time. It seems that the
> abstract_or_primary_product_id:* clause is not adding any value to the
> query criteria and can be safely removed.  Is my understanding correct?
>
> I would like to know if the order of the clauses in the AND query would
> affect the response time of the query?
>
> For e.g., f1:3 AND f2:10 AND f3:* vs. f3:* AND f1:3 AND f2:10
>
> Doesn't Lucene/Solr pick up the optimal query execution plan?
>
> Is there anyway to look at the query execution plan generated by Lucene?
>
> Regards
> Suresh
>


Re: query with wild card with AND taking lot of time

2017-08-31 Thread suresh pendap
Thanks, Lincoln, for your suggestions. They were very helpful.

I am still curious why the original query takes so long.  It is
something that Lucene should ideally have optimized.
Is there any way to see the execution plan used by Lucene?

Thanks
Suresh


On Thu, Aug 31, 2017 at 11:11 AM, Josh Lincoln 
wrote:

> As I understand it, using a different fq for each clause makes the
> resultant caches more likely to be used in future requests.
>
> For the query
> fq=first:bob AND last:smith
> a subsequent query for
> fq=first:tim AND last:smith
> won't be able to use the fq cache from the first query.
>
> However, if the first query was
> fq=first:bob
> fq=last:smith
> and subsequently
> fq=first:tim
> fq=last:smith
> then the second query will at least benefit from the last:smith cache
>
> Because fq clauses are always ANDed, this does not work for ORed clauses.
>
> I suppose if some conditions are frequently used together it may be better
> to put them in the same fq so there's only one cache. E.g. if an ecommerce
> site regularly queried for featured:Y AND instock:Y
>
> On Thu, Aug 31, 2017 at 1:48 PM David Hastings <
> hastings.recurs...@gmail.com>
> wrote:
>
> > >
> > > 2) Because all your clauses are more like filters and are ANDed
> together,
> > > you'll likely get better performance by putting them _each_ in an fq
> > > E.g.
> > > fq=product_identifier_type:DOTCOM_OFFER
> > > fq=abstract_or_primary_product_id:[* TO *]
> >
> >
> > why is this the case?  is it just better to have no logic operators in
> the
> > filter queries?
> >
> >
> >
> > On Thu, Aug 31, 2017 at 1:47 PM, Josh Lincoln 
> > wrote:
> >
> > > Suresh,
> > > Two things I noticed.
> > > 1) If your intent is to only match records where there's something,
> > > anything, in abstract_or_primary_product_id, you should use
> fieldname:[*
> > > TO
> > > *]  but that will exclude records where that field is empty/missing. If
> > you
> > > want to match records even if that field is empty/missing, then you
> > should
> > > remove that clause entirely
> > > 2) Because all your clauses are more like filters and are ANDed
> together,
> > > you'll likely get better performance by putting them _each_ in an fq
> > > E.g.
> > > fq=product_identifier_type:DOTCOM_OFFER
> > > fq=abstract_or_primary_product_id:[* TO *]
> > > fq=gtin:
> > > fq=product_class_type:BUNDLE
> > > fq=hasProduct:N
> > >
> > >
> > > On Thu, Aug 31, 2017 at 1:35 PM suresh pendap  >
> > > wrote:
> > >
> > > > Hello everybody,
> > > >
> > > > We are seeing that the below query is running very slow and taking
> > > almost 4
> > > > seconds to finish
> > > >
> > > >
> > > > [] webapp=/solr path=/select
> > > >
> > > > params={df=_text_&distrib=false&fl=id&shards.purpose=4&
> > > start=0&fsv=true&sort=modified_dtm+desc&shard.url=http://
> > > > :8983/solr/flat_product_index_shard7_replica1/
> > > %7Chttp://:8983/solr/flat_product_index_shard7_
> > > replica2/%7Chttp://:8983/solr/flat_product_index_
> > >
> > shard7_replica0/&rows=11&version=2&q=product_
> identifier_type:DOTCOM_OFFER+
> > > AND+abstract_or_primary_product_id:*+AND+(gtin:<
> > > numericValue>)+AND+-product_class_type:BUNDLE+AND+-hasProduct:N&NOW=
> > > 1504196301534&isShard=true&timeAllowed=25000&wt=javabin}
> > > > hits=0 status=0 QTime=3663
> > > >
> > > >
> > > > It seems like the abstract_or_primary_product_id:* clause is
> > > contributing
> > > > to the overall response time. It seems that the
> > > > abstract_or_primary_product_id:* . clause is not adding any value in
> > the
> > > > query criteria and can be safely removed.  Is my understanding
> correct?
> > > >
> > > > I would like to know if the order of the clauses in the AND query
> would
> > > > affect the response time of the query?
> > > >
> > > > For e.g . f1: 3 AND f2:10 AND f3:* vs . f3:* AND f1:3 AND f2:10
> > > >
> > > > Doesn't Lucene/Solr pick up the optimal query execution plan?
> > > >
> > > > Is there anyway to look at the query execution plan generated by
> > Lucene?
> > > >
> > > > Regards
> > > > Suresh
> > > >
> > >
> >
>


Re: query with wild card with AND taking lot of time

2017-08-31 Thread Josh Lincoln
As I understand it, using a different fq for each clause makes the
resultant caches more likely to be used in future requests.

For the query
fq=first:bob AND last:smith
a subsequent query for
fq=first:tim AND last:smith
won't be able to use the fq cache from the first query.

However, if the first query was
fq=first:bob
fq=last:smith
and subsequently
fq=first:tim
fq=last:smith
then the second query will at least benefit from the last:smith cache

Because fq clauses are always ANDed, this does not work for ORed clauses.

I suppose if some conditions are frequently used together it may be better
to put them in the same fq so there's only one cache. E.g. if an ecommerce
site regularly queried for featured:Y AND instock:Y

On Thu, Aug 31, 2017 at 1:48 PM David Hastings 
wrote:

> >
> > 2) Because all your clauses are more like filters and are ANDed together,
> > you'll likely get better performance by putting them _each_ in an fq
> > E.g.
> > fq=product_identifier_type:DOTCOM_OFFER
> > fq=abstract_or_primary_product_id:[* TO *]
>
>
> why is this the case?  is it just better to have no logic operators in the
> filter queries?
>
>
>
> On Thu, Aug 31, 2017 at 1:47 PM, Josh Lincoln 
> wrote:
>
> > Suresh,
> > Two things I noticed.
> > 1) If your intent is to only match records where there's something,
> > anything, in abstract_or_primary_product_id, you should use fieldname:[*
> > TO
> > *]  but that will exclude records where that field is empty/missing. If
> you
> > want to match records even if that field is empty/missing, then you
> should
> > remove that clause entirely
> > 2) Because all your clauses are more like filters and are ANDed together,
> > you'll likely get better performance by putting them _each_ in an fq
> > E.g.
> > fq=product_identifier_type:DOTCOM_OFFER
> > fq=abstract_or_primary_product_id:[* TO *]
> > fq=gtin:
> > fq=product_class_type:BUNDLE
> > fq=hasProduct:N
> >
> >
> > On Thu, Aug 31, 2017 at 1:35 PM suresh pendap 
> > wrote:
> >
> > > Hello everybody,
> > >
> > > We are seeing that the below query is running very slow and taking
> > almost 4
> > > seconds to finish
> > >
> > >
> > > [] webapp=/solr path=/select
> > >
> > > params={df=_text_&distrib=false&fl=id&shards.purpose=4&
> > start=0&fsv=true&sort=modified_dtm+desc&shard.url=http://
> > > :8983/solr/flat_product_index_shard7_replica1/
> > %7Chttp://:8983/solr/flat_product_index_shard7_
> > replica2/%7Chttp://:8983/solr/flat_product_index_
> >
> shard7_replica0/&rows=11&version=2&q=product_identifier_type:DOTCOM_OFFER+
> > AND+abstract_or_primary_product_id:*+AND+(gtin:<
> > numericValue>)+AND+-product_class_type:BUNDLE+AND+-hasProduct:N&NOW=
> > 1504196301534&isShard=true&timeAllowed=25000&wt=javabin}
> > > hits=0 status=0 QTime=3663
> > >
> > >
> > > It seems like the abstract_or_primary_product_id:* clause is
> > contributing
> > > to the overall response time. It seems that the
> > > abstract_or_primary_product_id:* . clause is not adding any value in
> the
> > > query criteria and can be safely removed.  Is my understanding correct?
> > >
> > > I would like to know if the order of the clauses in the AND query would
> > > affect the response time of the query?
> > >
> > > For e.g . f1: 3 AND f2:10 AND f3:* vs . f3:* AND f1:3 AND f2:10
> > >
> > > Doesn't Lucene/Solr pick up the optimal query execution plan?
> > >
> > > Is there anyway to look at the query execution plan generated by
> Lucene?
> > >
> > > Regards
> > > Suresh
> > >
> >
>


How to check if a solr core is down and is ready for a solr re-start

2017-08-31 Thread Minu Theresa Thomas
Hello Team,

I have few experiences where restart of a solr node is the only option when
a core goes down. I am trying to automate the restart of a solr server when
a core goes down or the replica is unresponsive over a period of time.

I have a script to check if the cores/ replicas associated with a node is
up. I have two approaches - One is to get the cores from solr CLUSTERSTATUS
API and do a PING on each core. If atleast one core on the node doesn't
repond to ping, then mark that node down and do restart after few retries.
Second is to get the cores from the solr CLUSTERSTATUS API along with its
status. If the status is down, then mark that node down and do a restart
after few retries.

Which is the best way/ recommended approach to check if a core associated
with a node is down and is ready for a solr service restart?

Thanks!
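
A rough sketch of the second approach, assuming the usual CLUSTERSTATUS JSON layout (cluster → collections → shards → replicas); the layout can vary slightly between Solr versions, and the node and core names below are fabricated:

```python
def down_cores_for_node(clusterstatus, node_name):
    """Return cores on `node_name` whose replica state is not 'active'.

    `clusterstatus` is the parsed JSON body of
    /solr/admin/collections?action=CLUSTERSTATUS&wt=json.
    """
    down = []
    collections = clusterstatus.get("cluster", {}).get("collections", {})
    for coll in collections.values():
        for shard in coll.get("shards", {}).values():
            for replica in shard.get("replicas", {}).values():
                if (replica.get("node_name") == node_name
                        and replica.get("state") != "active"):
                    down.append(replica.get("core"))
    return down

# Tiny fabricated CLUSTERSTATUS fragment for illustration:
status = {"cluster": {"collections": {"c1": {"shards": {"shard1": {
    "replicas": {
        "core_node1": {"core": "c1_shard1_replica1",
                       "node_name": "host1:8983_solr", "state": "down"},
        "core_node2": {"core": "c1_shard1_replica2",
                       "node_name": "host2:8983_solr", "state": "active"},
    }}}}}}}
print(down_cores_for_node(status, "host1:8983_solr"))
# -> ['c1_shard1_replica1']
```

A restart script would call this across a few retries, with a sleep between them, before deciding the node really needs the restart.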


Re: query with wild card with AND taking lot of time

2017-08-31 Thread David Hastings
>
> 2) Because all your clauses are more like filters and are ANDed together,
> you'll likely get better performance by putting them _each_ in an fq
> E.g.
> fq=product_identifier_type:DOTCOM_OFFER
> fq=abstract_or_primary_product_id:[* TO *]


why is this the case?  is it just better to have no logic operators in the
filter queries?



On Thu, Aug 31, 2017 at 1:47 PM, Josh Lincoln 
wrote:

> Suresh,
> Two things I noticed.
> 1) If your intent is to only match records where there's something,
> anything, in abstract_or_primary_product_id, you should use fieldname:[*
> TO
> *]  but that will exclude records where that field is empty/missing. If you
> want to match records even if that field is empty/missing, then you should
> remove that clause entirely
> 2) Because all your clauses are more like filters and are ANDed together,
> you'll likely get better performance by putting them _each_ in an fq
> E.g.
> fq=product_identifier_type:DOTCOM_OFFER
> fq=abstract_or_primary_product_id:[* TO *]
> fq=gtin:
> fq=product_class_type:BUNDLE
> fq=hasProduct:N
>
>
> On Thu, Aug 31, 2017 at 1:35 PM suresh pendap 
> wrote:
>
> > Hello everybody,
> >
> > We are seeing that the below query is running very slow and taking
> almost 4
> > seconds to finish
> >
> >
> > [] webapp=/solr path=/select
> >
> > params={df=_text_&distrib=false&fl=id&shards.purpose=4&
> start=0&fsv=true&sort=modified_dtm+desc&shard.url=http://
> > :8983/solr/flat_product_index_shard7_replica1/
> %7Chttp://:8983/solr/flat_product_index_shard7_
> replica2/%7Chttp://:8983/solr/flat_product_index_
> shard7_replica0/&rows=11&version=2&q=product_identifier_type:DOTCOM_OFFER+
> AND+abstract_or_primary_product_id:*+AND+(gtin:<
> numericValue>)+AND+-product_class_type:BUNDLE+AND+-hasProduct:N&NOW=
> 1504196301534&isShard=true&timeAllowed=25000&wt=javabin}
> > hits=0 status=0 QTime=3663
> >
> >
> > It seems like the abstract_or_primary_product_id:* clause is
> contributing
> > to the overall response time. It seems that the
> > abstract_or_primary_product_id:* . clause is not adding any value in the
> > query criteria and can be safely removed.  Is my understanding correct?
> >
> > I would like to know if the order of the clauses in the AND query would
> > affect the response time of the query?
> >
> > For e.g . f1: 3 AND f2:10 AND f3:* vs . f3:* AND f1:3 AND f2:10
> >
> > Doesn't Lucene/Solr pick up the optimal query execution plan?
> >
> > Is there anyway to look at the query execution plan generated by Lucene?
> >
> > Regards
> > Suresh
> >
>


Re: query with wild card with AND taking lot of time

2017-08-31 Thread Josh Lincoln
Suresh,
Two things I noticed.
1) If your intent is to only match records where there's something,
anything, in abstract_or_primary_product_id, you should use fieldname:[* TO
*], which will exclude records where that field is empty/missing. If you
want to match records even if that field is empty/missing, then you should
remove that clause entirely
2) Because all your clauses are more like filters and are ANDed together,
you'll likely get better performance by putting them _each_ in an fq
E.g.
fq=product_identifier_type:DOTCOM_OFFER
fq=abstract_or_primary_product_id:[* TO *]
fq=gtin:
fq=product_class_type:BUNDLE
fq=hasProduct:N
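
Rewritten in that shape, the slow query from the log would look roughly like this (the gtin clause is omitted because its value was elided in the log, and the parameter choices are illustrative only):

```python
from urllib.parse import urlencode

# The original form, with every clause ANDed into the main query:
q_all_in_one = ("product_identifier_type:DOTCOM_OFFER"
                " AND abstract_or_primary_product_id:[* TO *]"
                " AND -product_class_type:BUNDLE AND -hasProduct:N")

# The fq-per-clause form suggested above: q matches everything and each
# filter clause is cached separately in the filterCache.
params = [
    ("q", "*:*"),
    ("fq", "product_identifier_type:DOTCOM_OFFER"),
    ("fq", "abstract_or_primary_product_id:[* TO *]"),
    ("fq", "-product_class_type:BUNDLE"),
    ("fq", "-hasProduct:N"),
    ("rows", "11"),
    ("sort", "modified_dtm desc"),
]
query_string = urlencode(params)
```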


On Thu, Aug 31, 2017 at 1:35 PM suresh pendap 
wrote:

> Hello everybody,
>
> We are seeing that the below query is running very slow and taking almost 4
> seconds to finish
>
>
> [] webapp=/solr path=/select
>
> params={df=_text_&distrib=false&fl=id&shards.purpose=4&start=0&fsv=true&sort=modified_dtm+desc&shard.url=http://
> :8983/solr/flat_product_index_shard7_replica1/%7Chttp://:8983/solr/flat_product_index_shard7_replica2/%7Chttp://:8983/solr/flat_product_index_shard7_replica0/&rows=11&version=2&q=product_identifier_type:DOTCOM_OFFER+AND+abstract_or_primary_product_id:*+AND+(gtin:)+AND+-product_class_type:BUNDLE+AND+-hasProduct:N&NOW=1504196301534&isShard=true&timeAllowed=25000&wt=javabin}
> hits=0 status=0 QTime=3663
>
>
> It seems like the abstract_or_primary_product_id:* clause is contributing
> to the overall response time. It seems that the
> abstract_or_primary_product_id:* . clause is not adding any value in the
> query criteria and can be safely removed.  Is my understanding correct?
>
> I would like to know if the order of the clauses in the AND query would
> affect the response time of the query?
>
> For e.g . f1: 3 AND f2:10 AND f3:* vs . f3:* AND f1:3 AND f2:10
>
> Doesn't Lucene/Solr pick up the optimal query execution plan?
>
> Is there anyway to look at the query execution plan generated by Lucene?
>
> Regards
> Suresh
>


query with wild card with AND taking lot of time

2017-08-31 Thread suresh pendap
Hello everybody,

We are seeing that the below query is running very slow and taking almost 4
seconds to finish


[] webapp=/solr path=/select
params={df=_text_&distrib=false&fl=id&shards.purpose=4&start=0&fsv=true&sort=modified_dtm+desc&shard.url=http://:8983/solr/flat_product_index_shard7_replica1/%7Chttp://:8983/solr/flat_product_index_shard7_replica2/%7Chttp://:8983/solr/flat_product_index_shard7_replica0/&rows=11&version=2&q=product_identifier_type:DOTCOM_OFFER+AND+abstract_or_primary_product_id:*+AND+(gtin:)+AND+-product_class_type:BUNDLE+AND+-hasProduct:N&NOW=1504196301534&isShard=true&timeAllowed=25000&wt=javabin}
hits=0 status=0 QTime=3663


It seems like the abstract_or_primary_product_id:* clause is contributing
to the overall response time. It also seems that the
abstract_or_primary_product_id:* clause is not adding any value to the
query criteria and can be safely removed.  Is my understanding correct?

I would like to know if the order of the clauses in the AND query would
affect the response time of the query?

E.g., f1:3 AND f2:10 AND f3:* vs. f3:* AND f1:3 AND f2:10

Doesn't Lucene/Solr pick up the optimal query execution plan?

Is there any way to look at the query execution plan generated by Lucene?

Regards
Suresh


Re: slow solr facet processing

2017-08-31 Thread Yonik Seeley
A possible improvement for some multiValued fields might be to use the
"uif" facet method (UnInvertedField was the default method for
multiValued fields in 4.x)
I'm not sure if you would need to reindex without docValues on that
field to try it though.

Example: to enable on the "union" field, add f.union.facet.method=uif

Support for this was added in https://issues.apache.org/jira/browse/SOLR-8466

-Yonik
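
For illustration, a facet request with that per-field override might be built like this; apart from f.union.facet.method, the parameters are just a generic facet request:

```python
from urllib.parse import urlencode

# Facet request with the per-field method override; "union" is the
# field name from the thread above.
params = [
    ("q", "*:*"),
    ("rows", "0"),
    ("facet", "true"),
    ("facet.field", "union"),
    ("f.union.facet.method", "uif"),  # UnInvertedField, as in Solr 4.x
    ("facet.limit", "100"),
    ("facet.mincount", "1"),
]
qs = urlencode(params)
print(qs)
```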


On Thu, Aug 31, 2017 at 10:41 AM, Günter Hipler
 wrote:
> Hi,
>
> in the meantime I came across the reason for the slow facet processing
> capacities of SOLR since version 5.x
>
>  https://issues.apache.org/jira/browse/SOLR-8096
> https://issues.apache.org/jira/browse/LUCENE-5666
>
> compared to version 4.x
>
> Various library networks across the world are suffering from the same
> symptoms:
>
> Facet processing is one of the most important features of a search server
> (for us) and it seems (at least IMHO) there is no solution for the issue
> since March 2015 (release date for the last SOLR 4 version)
>
> What are the plans / ideas of the solr developers for a possible future
> solution? Or maybe there is already a solution I haven't seen so far.
>
> Thanks for a feedback
>
> Günter
>
>
>
> On 21.08.2017 15:35, guenterh.li...@bluewin.ch wrote:
>>
>> Hi,
>>
>> I can't figure out the reason why the facet processing in version 6 needs
>> significantly more time compared to version 4.
>>
>> The debugging response (for 30 million documents)
>>
>> solr 4
>> total time: 280.0 ms (query: 0.0 ms, facet: 280.0 ms)
>> (once the query is cached)
>> before caching: between 1.5 and 2 sec
>>
>>
>> solr 6.x (my last try was with 6.6)
>> without docvalues for facetting fields (same schema as version 4)
>> total time: 5874.0 ms (query: 0.0 ms, facet: 5873.0 ms, facet_module: 0.0 ms)
>> the time is not getting better even after repeating the query several
>> times
>>
>>
>> solr 6.6 with docvalues for facetting fields
>> total time: 9837.0 ms (query: 0.0 ms, facet: 9837.0 ms, facet_module: 0.0 ms)
>>
>> used query (our productive system with version 4)
>>
>> http://search.swissbib.ch/solr/sb-biblio/select?debugQuery=true&q=*:*&facet=true&facet.field=union&facet.field=navAuthor_full&facet.field=format&facet.field=language&facet.field=navSub_green&facet.field=navSubform&facet.field=publishDate&qt=edismax&ps=2&json.nl=arrarr&bf=recip(abs(ms(NOW/DAY,freshness)),3.16e-10,100,100)&fl=*,score&hl.fragsize=250&start=0&q.op=AND&sort=score+desc&rows=0&hl.simple.pre=START_HILITE&facet.limit=100&hl.simple.post=END_HILITE&spellcheck=false&qf=title_short^1000+title_alt^200+title_sub^200+title_old^200+title_new^200+author^750+author_additional^100+author_additional_dsv11_txt_mv^100+title_additional_dsv11_txt_mv^100+series^200+topic^500+addfields_txt_mv^50+publplace_txt_mv^25+publplace_dsv11_txt_mv^25+fulltext+callnumber^1000+ctrlnum^1000+publishDate+isbn+variant_isbn_isn_mv+issn+localcode+id&pf=title_short^1000&facet.mincount=1&hl.fl=fulltext&&wt=xml&facet.sort=count
>>
>>
>> Running the queries on smaller indices (8 million docs) the difference is
>> similar although the absolut figures for processing time are smaller
>>
>>
>> Any hints why this huge differences?
>>
>> Günter
>>
>>
>>
>>
>>
>>
>>
>>
>>
>
> --
> Universität Basel
> Universitätsbibliothek
> Günter Hipler
> Projekt SwissBib
> Schoenbeinstrasse 18-20
> 4056 Basel, Schweiz
> Tel.: + 41 (0)61 267 31 12 Fax: ++41 61 267 3103
> E-Mail guenter.hip...@unibas.ch
> URL: www.swissbib.org  / http://www.ub.unibas.ch/
>


Re: "What is Solr" in Google search results

2017-08-31 Thread Stefan Matheis
Well, isn't it always the same with Wikipedia?

It's already there .. so it has to be correct. If you're trying to remove
it, you have to prove it - but there is not even proof it should be there
in the first place oO

You really need to have time to go through that kind of argument ...

-Stefan

On Aug 31, 2017 4:37 PM, "Vincenzo D'Amore"  wrote:

Hi Rick,

right, I've already tried to correct the wikipedia page, to be honest, I've
just removed the sentence "Solr is the second-most... etc."
But my change has been discarded because I missed to add a valid motivation.

Anyway, not sure I'm the most representative person to discuss this in the
wikipedia talk page :) but I'll try to do whatever I can

And just to share with you my thought, my principal motivation is that even
if DB Engines has a proven accuracy, the sentence in question has not be
considered so relevant to explain what is Solr. For sure, it should be used
as first one.


On Thu, Aug 31, 2017 at 5:53 AM, Rick Leir  wrote:

> Vincenzo,
> This is a discussion for the wikipedia 'talk' page. My sense is that
> information must be verifiable, and that the popularity rating at
> db-engines is not transparent. Would you like to start the discussion?
> Cheers -- Rick
>
> On August 30, 2017 5:17:25 PM MDT, Vincenzo D'Amore 
> wrote:
> >Hi All,
> >
> >googling for "what is Solr" I found this as *first* sentence:
> >
> >"Solr is the second-most popular enterprise search engine after
> >Elasticsearch. ... "
> >
> >The description comes from wikipedia https://en.
> >wikipedia.org/wiki/Apache_Solr
> >
> >Now, well, I'm a little upset, because I think this is a misleading
> >description, this answer does not really... well, answer the question.
> >
> >And even... because Solr is not the first most popular :)))
> >
> >Ok, seriously, the first sentence (or the answer at all) should not
> >define
> >the position of the search engine in a list, in a kind of competition
> >where
> >Solr has the second place.
> >If it is the first, the second or whatever most popular is not the
> >right
> >answer.
> >
> >So I want inform the community and search for an advice, if any, how to
> >have a better description in the Google results page.
> >
> >If you have any comments or questions, please let me know.
> >
> >Best regards,
> >Vincenzo
> >
> >
> >--
> >Vincenzo D'Amore
> >email: v.dam...@gmail.com
> >skype: free.dev
> >mobile: +39 349 8513251 <349%20851%203251>
>
> --
> Sorry for being brief. Alternate email is rickleir at yahoo dot com




--
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251 <349%20851%203251>


RE: "What is Solr" in Google search results

2017-08-31 Thread Davis, Daniel (NIH/NLM) [C]
Wikipedia seems to be better now.  Thank you, Peaceray.   

Honestly, though, by the numbers, I think the comment was correct.   
Elasticsearch has a much smoother on-ramp for IT developers, but it is much 
harder to customize relevancy and integrate with BigData pipelines.   IT 
developers are the big voters here.

Now, Google will simply index this thread, and then show different rich 
snippets to all of us here...

-Original Message-
From: Vincenzo D'Amore [mailto:v.dam...@gmail.com] 
Sent: Thursday, August 31, 2017 10:37 AM
To: solr-user@lucene.apache.org
Subject: Re: "What is Solr" in Google search results

Hi Rick,

right, I've already tried to correct the wikipedia page, to be honest, I've 
just removed the sentence "Solr is the second-most... etc."
But my change has been discarded because I missed to add a valid motivation.

Anyway, not sure I'm the most representative person to discuss this in the 
wikipedia talk page :) but I'll try to do whatever I can

And just to share with you my thought, my principal motivation is that even if 
DB Engines has a proven accuracy, the sentence in question has not be 
considered so relevant to explain what is Solr. For sure, it should be used as 
first one.


On Thu, Aug 31, 2017 at 5:53 AM, Rick Leir  wrote:

> Vincenzo,
> This is a discussion for the wikipedia 'talk' page. My sense is that 
> information must be verifiable, and that the popularity rating at 
> db-engines is not transparent. Would you like to start the discussion?
> Cheers -- Rick
>
> On August 30, 2017 5:17:25 PM MDT, Vincenzo D'Amore 
> 
> wrote:
> >Hi All,
> >
> >googling for "what is Solr" I found this as *first* sentence:
> >
> >"Solr is the second-most popular enterprise search engine after 
> >Elasticsearch. ... "
> >
> >The description comes from wikipedia https://en.
> >wikipedia.org/wiki/Apache_Solr
> >
> >Now, well, I'm a little upset, because I think this is a misleading 
> >description, this answer does not really... well, answer the question.
> >
> >And even... because Solr is not the first most popular :)))
> >
> >Ok, seriously, the first sentence (or the answer at all) should not 
> >define the position of the search engine in a list, in a kind of 
> >competition where Solr has the second place.
> >If it is the first, the second or whatever most popular is not the 
> >right answer.
> >
> >So I want inform the community and search for an advice, if any, how 
> >to have a better description in the Google results page.
> >
> >If you have any comments or questions, please let me know.
> >
> >Best regards,
> >Vincenzo
> >
> >
> >--
> >Vincenzo D'Amore
> >email: v.dam...@gmail.com
> >skype: free.dev
> >mobile: +39 349 8513251 <349%20851%203251>
>
> --
> Sorry for being brief. Alternate email is rickleir at yahoo dot com




--
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251 <349%20851%203251>


Re: Solr Reindex Issue - Can't able to Reindex Old Data

2017-08-31 Thread Erick Erickson
I have no idea where to even start. Have you looked at your Solr logs
to see if there are helpful error messages? What is "reindex"?
Something from some program you're running? 'cause it's not a field
option for Solr schema field definitions, if you're putting that in a
Solr schema I wouldn't even expect the core to initialize.

You might review:
https://wiki.apache.org/solr/UsingMailingLists

Best,
Erick


On Wed, Aug 30, 2017 at 6:34 PM, @Nandan@
 wrote:
> Hi ,
>
> I am using Apache Solr with Cassandra Database. In my table, I have 20
> rows. Due to some changes, I changed my Solr schema and Reindex schema with
> below option as
>
> *reindex=true and deleteAll=false*
>
> After Reindexing my Solr Schema, I am not able to do reindex my old data
> which are present in my table before. I am only able to retrieve newly
> added data which is done after reindexing.
>
> Please help in this issue.
>
> Thanks


Re: Index relational database

2017-08-31 Thread Erick Erickson
To pile on here: When you denormalize you also get some functionality
that you do not get with Solr joins, they've been called "pseudo
joins" in Solr for a reason.

If you just use the simple approach of indexing the two tables and then
joining across them, you can't return fields from both tables in a
single document. To do that you need to use parent/child docs, which
have their own restrictions.

So rather than worry excessively about which is faster, I'd recommend
you decide on the functionality you need as a starting point.

Best,
Erick
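
A sketch of the join query parser syntax for the "pseudo join" case; the field names here are made up, and note that fl can only return fields from the "to" side documents:

```python
from urllib.parse import urlencode

# Match documents on the "to" side (customers) whose joined "from" side
# documents (orders) match the inner query. Field names are hypothetical.
join_q = "{!join from=order_customer_id to=customer_id}order_total:[100 TO *]"
params = urlencode([("q", join_q), ("fl", "customer_id,name")])
```

The fl list can only name customer fields; nothing from the order documents can be returned, which is exactly the limitation described above.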

On Thu, Aug 31, 2017 at 7:34 AM, Walter Underwood  wrote:
> There is no way tell which is faster without trying it.
>
> Query speed depends on the size of the data (rows), the complexity of the 
> join, which database, what kind of disk, etc.
>
> Solr speed depends on the size of the documents, the complexity of your 
> analysis chains, what kind of disk, how much CPU is available, etc.
>
> We have one query that extracts 9 million documents from MySQL in about 20 
> minutes. We have another query on a different MySQL database that takes 90 
> minutes to get 7 million documents.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
>> On Aug 31, 2017, at 12:54 AM, Renuka Srishti  
>> wrote:
>>
>> Thanks Erick, Walter
>> But I think join query will reduce the performance. Denormalization will be
>> the better way than join query, am I right?
>>
>>
>>
>> On Wed, Aug 30, 2017 at 10:18 PM, Walter Underwood 
>> wrote:
>>
>>> Think about making a denormalized view, with all the fields needed in one
>>> table. That view gets sent to Solr. Each row is a Solr document.
>>>
>>> It could be implemented as a view or as SQL, but that is a useful mental
>>> model for people starting from a relational background.
>>>
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>>
>>>
 On Aug 30, 2017, at 9:14 AM, Erick Erickson 
>>> wrote:

 First, it's often best, by far, to denormalize the data in your solr
>>> index,
 that's what I'd explore first.

 If you can't do that, the join query parser might work for you.

 On Aug 30, 2017 4:49 AM, "Renuka Srishti" 
 wrote:

> Thanks Susheel for your response.
> Here is the scenario about which I am talking:
>
>  - Let suppose there are two documents doc1 and doc2.
>  - I want to fetch the data from doc2 on the basis of doc1 fields which
>  are related to doc2.
>
> How to achieve this efficiently.
>
>
> Thanks,
>
> Renuka Srishti
>
>
> On Mon, Aug 28, 2017 at 7:02 PM, Susheel Kumar 
> wrote:
>
>> Hello Renuka,
>>
>> I would suggest to start with your use case(s). May be start with your
>> first use case with the below questions
>>
>> a) What is that you want to search (which fields like name, desc, city
>> etc.)
>> b) What is that you want to show part of search result (name, city
>>> etc.)
>>
>> Based on above two questions, you would know what data to pull in from
>> relational database and create solr schema and index the data.
>>
>> You may first try to denormalize / flatten the structure so that you
>>> deal
>> with one collection/schema and query upon it.
>>
>> HTH.
>>
>> Thanks,
>> Susheel
>>
>> On Mon, Aug 28, 2017 at 8:04 AM, Renuka Srishti <
>> renuka.srisht...@gmail.com>
>> wrote:
>>
>>> Hii,
>>>
>>> What is the best way to index relational database, and how it impacts
> on
>>> the performance?
>>>
>>> Thanks
>>> Renuka Srishti
>>>
>>
>
>>>
>>>
>


slow solr facet processing

2017-08-31 Thread Günter Hipler

Hi,

in the meantime I came across the reason for the slow facet processing 
capacities of SOLR since version 5.x


 https://issues.apache.org/jira/browse/SOLR-8096
https://issues.apache.org/jira/browse/LUCENE-5666

compared to version 4.x

Various library networks across the world are suffering from the same 
symptoms:


Facet processing is one of the most important features of a search 
server (for us) and it seems (at least IMHO) there is no solution for 
the issue since March 2015 (release date for the last SOLR 4 version)


What are the plans / ideas of the solr developers for a possible future 
solution? Or maybe there is already a solution I haven't seen so far.


Thanks for a feedback

Günter



On 21.08.2017 15:35, guenterh.li...@bluewin.ch wrote:

Hi,

I can't figure out the reason why the facet processing in version 6 
needs significantly more time compared to version 4.


The debugging response (for 30 million documents)

solr 4
total time: 280.0 ms (query: 0.0 ms, facet: 280.0 ms)

(once the query is cached)
before caching: between 1.5 and 2 sec


solr 6.x (my last try was with 6.6)
without docvalues for the faceting fields (same schema as version 4)
total time: 5874.0 ms (query: 0.0 ms, facet: 5873.0 ms, facet_module: 0.0 ms)
the time is not getting better even after repeating the query several
times



solr 6.6 with docvalues for the faceting fields
total time: 9837.0 ms (query: 0.0 ms, facet: 9837.0 ms, facet_module: 0.0 ms)


used query (our productive system with version 4)
http://search.swissbib.ch/solr/sb-biblio/select?debugQuery=true&q=*:*&facet=true&facet.field=union&facet.field=navAuthor_full&facet.field=format&facet.field=language&facet.field=navSub_green&facet.field=navSubform&facet.field=publishDate&qt=edismax&ps=2&json.nl=arrarr&bf=recip(abs(ms(NOW/DAY,freshness)),3.16e-10,100,100)&fl=*,score&hl.fragsize=250&start=0&q.op=AND&sort=score+desc&rows=0&hl.simple.pre=START_HILITE&facet.limit=100&hl.simple.post=END_HILITE&spellcheck=false&qf=title_short^1000+title_alt^200+title_sub^200+title_old^200+title_new^200+author^750+author_additional^100+author_additional_dsv11_txt_mv^100+title_additional_dsv11_txt_mv^100+series^200+topic^500+addfields_txt_mv^50+publplace_txt_mv^25+publplace_dsv11_txt_mv^25+fulltext+callnumber^1000+ctrlnum^1000+publishDate+isbn+variant_isbn_isn_mv+issn+localcode+id&pf=title_short^1000&facet.mincount=1&hl.fl=fulltext&&wt=xml&facet.sort=count


Running the queries on smaller indices (8 million docs) the difference
is similar although the absolute figures for processing time are smaller



Any hints why this huge differences?

Günter











--
Universität Basel
Universitätsbibliothek
Günter Hipler
Projekt SwissBib
Schoenbeinstrasse 18-20
4056 Basel, Schweiz
Tel.: + 41 (0)61 267 31 12 Fax: ++41 61 267 3103
E-Mail guenter.hip...@unibas.ch
URL: www.swissbib.org  / http://www.ub.unibas.ch/



Re: "What is Solr" in Google search results

2017-08-31 Thread Vincenzo D'Amore
Hi Rick,

right, I've already tried to correct the wikipedia page; to be honest, I've
just removed the sentence "Solr is the second-most... etc."
But my change was discarded because I did not add a valid motivation.

Anyway, I'm not sure I'm the most representative person to discuss this on the
wikipedia talk page :) but I'll try to do whatever I can

And just to share my thought with you, my principal motivation is that even
if DB Engines has proven accuracy, the sentence in question is not
relevant enough to explain what Solr is. For sure, it should not be used
as the first one.


On Thu, Aug 31, 2017 at 5:53 AM, Rick Leir  wrote:

> Vincenzo,
> This is a discussion for the wikipedia 'talk' page. My sense is that
> information must be verifiable, and that the popularity rating at
> db-engines is not transparent. Would you like to start the discussion?
> Cheers -- Rick
>
> On August 30, 2017 5:17:25 PM MDT, Vincenzo D'Amore 
> wrote:
> >Hi All,
> >
> >googling for "what is Solr" I found this as *first* sentence:
> >
> >"Solr is the second-most popular enterprise search engine after
> >Elasticsearch. ... "
> >
> >The description comes from wikipedia https://en.
> >wikipedia.org/wiki/Apache_Solr
> >
> >Now, well, I'm a little upset, because I think this is a misleading
> >description, this answer does not really... well, answer the question.
> >
> >And even... because Solr is not the first most popular :)))
> >
> >Ok, seriously, the first sentence (or the answer at all) should not
> >define
> >the position of the search engine in a list, in a kind of competition
> >where
> >Solr has the second place.
> >If it is the first, the second or whatever most popular is not the
> >right
> >answer.
> >
> >So I want inform the community and search for an advice, if any, how to
> >have a better description in the Google results page.
> >
> >If you have any comments or questions, please let me know.
> >
> >Best regards,
> >Vincenzo
> >
> >
> >--
> >Vincenzo D'Amore
> >email: v.dam...@gmail.com
> >skype: free.dev
> >mobile: +39 349 8513251 <349%20851%203251>
>
> --
> Sorry for being brief. Alternate email is rickleir at yahoo dot com




-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251 <349%20851%203251>


Re: Index relational database

2017-08-31 Thread Walter Underwood
There is no way tell which is faster without trying it.

Query speed depends on the size of the data (rows), the complexity of the join, 
which database, what kind of disk, etc.

Solr speed depends on the size of the documents, the complexity of your 
analysis chains, what kind of disk, how much CPU is available, etc.

We have one query that extracts 9 million documents from MySQL in about 20 
minutes. We have another query on a different MySQL database that takes 90 
minutes to get 7 million documents.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Aug 31, 2017, at 12:54 AM, Renuka Srishti  
> wrote:
> 
> Thanks Erick, Walter
> But I think join query will reduce the performance. Denormalization will be
> the better way than join query, am I right?
> 
> 
> 
> On Wed, Aug 30, 2017 at 10:18 PM, Walter Underwood 
> wrote:
> 
>> Think about making a denormalized view, with all the fields needed in one
>> table. That view gets sent to Solr. Each row is a Solr document.
>> 
>> It could be implemented as a view or as SQL, but that is a useful mental
>> model for people starting from a relational background.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>> 
>>> On Aug 30, 2017, at 9:14 AM, Erick Erickson 
>> wrote:
>>> 
>>> First, it's often best, by far, to denormalize the data in your solr
>> index,
>>> that's what I'd explore first.
>>> 
>>> If you can't do that, the join query parser might work for you.
>>> 
>>> On Aug 30, 2017 4:49 AM, "Renuka Srishti" 
>>> wrote:
>>> 
 Thanks Susheel for your response.
 Here is the scenario about which I am talking:
 
  - Let suppose there are two documents doc1 and doc2.
  - I want to fetch the data from doc2 on the basis of doc1 fields which
  are related to doc2.
 
 How to achieve this efficiently.
 
 
 Thanks,
 
 Renuka Srishti
 
 
 On Mon, Aug 28, 2017 at 7:02 PM, Susheel Kumar 
 wrote:
 
> Hello Renuka,
> 
> I would suggest to start with your use case(s). May be start with your
> first use case with the below questions
> 
> a) What is that you want to search (which fields like name, desc, city
> etc.)
> b) What is that you want to show part of search result (name, city
>> etc.)
> 
> Based on above two questions, you would know what data to pull in from
> relational database and create solr schema and index the data.
> 
> You may first try to denormalize / flatten the structure so that you
>> deal
> with one collection/schema and query upon it.
> 
> HTH.
> 
> Thanks,
> Susheel
> 
> On Mon, Aug 28, 2017 at 8:04 AM, Renuka Srishti <
> renuka.srisht...@gmail.com>
> wrote:
> 
>> Hii,
>> 
>> What is the best way to index relational database, and how it impacts on
>> the performance?
>> 
>> Thanks
>> Renuka Srishti
>> 
> 
 
>> 
>> 



Re: Solr index getting replaced instead of merged

2017-08-31 Thread David Hastings
>Can anyone tell is it possible to paginate the data using Solr UI?

use the start/rows input fields, with a standard zero-based start, i.e.
start=0, rows=10
start=10, rows=10
start=20, rows=10
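The zero-based start arithmetic above generalizes to any page size; a minimal sketch (the helper names and core URL are hypothetical, not part of Solr itself):

```python
# Map a 1-based page number to Solr's zero-based start/rows paging
# parameters and build the corresponding /select query string.
from urllib.parse import urlencode

def page_params(page, rows=10):
    """Return Solr paging parameters for a 1-based page number."""
    if page < 1:
        raise ValueError("pages are numbered from 1")
    return {"start": (page - 1) * rows, "rows": rows}

def select_url(base, query, page, rows=10):
    # base is e.g. "http://localhost:8983/solr/mycore" (hypothetical core name)
    params = {"q": query, **page_params(page, rows)}
    return base + "/select?" + urlencode(params)

print(page_params(3))  # third page of 10 starts at offset 20
print(select_url("http://localhost:8983/solr/mycore", "*:*", 2))
```

Note that deep paging with large start values gets expensive; for walking an entire result set, Solr's cursorMark mechanism is the usual alternative.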


On Thu, Aug 31, 2017 at 8:21 AM, Agrawal, Harshal (GE Digital) <
harshal.agra...@ge.com> wrote:

> Hello All,
>
> If I check the clear option while indexing the 2nd table it works. Thanks
> Gurdeep :)
> Can anyone tell me if it is possible to paginate the data using the Solr UI?
> If yes, please tell me which features I can use.
>
> Regards
> Harshal
>
> From: Agrawal, Harshal (GE Digital)
> Sent: Wednesday, August 30, 2017 4:36 PM
> To: 'solr-user@lucene.apache.org' 
> Cc: Singh, Susnata (GE Digital) 
> Subject: Solr index getting replaced instead of merged
>
> Hello Guys,
>
> I have installed Solr on my local system and was able to connect to
> Teradata successfully.
> For a single table I am able to index the data and query it, but when I
> index multiple tables in the same schema one by one,
> I can see the datasets getting replaced instead of merged.
>
> Can anyone help me, please?
>
> Regards
> Harshal
>
>
>


RE: Solr index getting replaced instead of merged

2017-08-31 Thread Agrawal, Harshal (GE Digital)
Hello All,

If I check the clear option while indexing the 2nd table it works. Thanks Gurdeep :)
Can anyone tell me if it is possible to paginate the data using the Solr UI?
If yes, please tell me which features I can use.

Regards
Harshal

From: Agrawal, Harshal (GE Digital)
Sent: Wednesday, August 30, 2017 4:36 PM
To: 'solr-user@lucene.apache.org' 
Cc: Singh, Susnata (GE Digital) 
Subject: Solr index getting replaced instead of merged

Hello Guys,

I have installed Solr on my local system and was able to connect to Teradata 
successfully.
For a single table I am able to index the data and query it, but when I index 
multiple tables in the same schema one by one, I can see the datasets getting 
replaced instead of merged.

Can anyone help me, please?

Regards
Harshal




Re: Index relational database

2017-08-31 Thread David Hastings
When indexing a relational database, it's generally best to denormalize it
in a view or in your indexing code.
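A minimal sketch of the "denormalize in your indexing code" option (the data and field names are hypothetical, assuming a one-to-many join, e.g. books to authors):

```python
# Flatten rows from a SQL join (one row per book/author pair, hypothetical
# data) into one denormalized Solr document per book, collecting the
# child values into a multivalued field.
def denormalize(rows, key="id", child_field="author"):
    docs = {}
    for row in rows:
        # Create the parent document once, without the child column ...
        doc = docs.setdefault(row[key], {k: v for k, v in row.items()
                                         if k != child_field})
        # ... then accumulate child values into a multivalued field.
        doc.setdefault(child_field, []).append(row[child_field])
    return list(docs.values())

joined = [
    {"id": 1, "title": "Solr in Action", "author": "Grainger"},
    {"id": 1, "title": "Solr in Action", "author": "Potter"},
    {"id": 2, "title": "Lucene in Action", "author": "McCandless"},
]
docs = denormalize(joined)
```

Each resulting dict maps directly onto one Solr document, with `author` declared as a multivalued field in the schema.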

On Thu, Aug 31, 2017 at 3:54 AM, Renuka Srishti 
wrote:

> Thanks Erick, Walter
> But I think join query will reduce the performance. Denormalization will be
> the better way than join query, am I right?
>
>
>
> On Wed, Aug 30, 2017 at 10:18 PM, Walter Underwood 
> wrote:
>
> > Think about making a denormalized view, with all the fields needed in one
> > table. That view gets sent to Solr. Each row is a Solr document.
> >
> > It could be implemented as a view or as SQL, but that is a useful mental
> > model for people starting from a relational background.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> >
> > > On Aug 30, 2017, at 9:14 AM, Erick Erickson 
> > wrote:
> > >
> > > First, it's often best, by far, to denormalize the data in your solr
> > index,
> > > that's what I'd explore first.
> > >
> > > If you can't do that, the join query parser might work for you.
> > >
> > > On Aug 30, 2017 4:49 AM, "Renuka Srishti" 
> > > wrote:
> > >
> > >> Thanks Susheel for your response.
> > >> Here is the scenario about which I am talking:
> > >>
> > >>   - Let suppose there are two documents doc1 and doc2.
> > >>   - I want to fetch the data from doc2 on the basis of doc1 fields
> which
> > >>   are related to doc2.
> > >>
> > >> How to achieve this efficiently.
> > >>
> > >>
> > >> Thanks,
> > >>
> > >> Renuka Srishti
> > >>
> > >>
> > >> On Mon, Aug 28, 2017 at 7:02 PM, Susheel Kumar  >
> > >> wrote:
> > >>
> > >>> Hello Renuka,
> > >>>
> > >>> I would suggest to start with your use case(s). May be start with
> your
> > >>> first use case with the below questions
> > >>>
> > >>> a) What is that you want to search (which fields like name, desc,
> city
> > >>> etc.)
> > >>> b) What is that you want to show part of search result (name, city
> > etc.)
> > >>>
> > >>> Based on above two questions, you would know what data to pull in
> from
> > >>> relational database and create solr schema and index the data.
> > >>>
> > >>> You may first try to denormalize / flatten the structure so that you
> > deal
> > >>> with one collection/schema and query upon it.
> > >>>
> > >>> HTH.
> > >>>
> > >>> Thanks,
> > >>> Susheel
> > >>>
> > >>> On Mon, Aug 28, 2017 at 8:04 AM, Renuka Srishti <
> > >>> renuka.srisht...@gmail.com>
> > >>> wrote:
> > >>>
> >  Hii,
> > 
> >  What is the best way to index relational database, and how it
> impacts
> > >> on
> >  the performance?
> > 
> >  Thanks
> >  Renuka Srishti
> > 
> > >>>
> > >>
> >
> >
>


Re: Index relational database

2017-08-31 Thread Renuka Srishti
Thanks, all, for sharing your thoughts :)

On Thu, Aug 31, 2017 at 5:28 PM, Susheel Kumar 
wrote:

> Yes, if you can avoid join and work with flat/denormalized structure then
> that's the best.
>
> On Thu, Aug 31, 2017 at 3:54 AM, Renuka Srishti <
> renuka.srisht...@gmail.com>
> wrote:
>
> > Thanks Erick, Walter
> > But I think join query will reduce the performance. Denormalization will
> be
> > the better way than join query, am I right?
> >
> >
> >
> > On Wed, Aug 30, 2017 at 10:18 PM, Walter Underwood <
> wun...@wunderwood.org>
> > wrote:
> >
> > > Think about making a denormalized view, with all the fields needed in
> one
> > > table. That view gets sent to Solr. Each row is a Solr document.
> > >
> > > It could be implemented as a view or as SQL, but that is a useful
> mental
> > > model for people starting from a relational background.
> > >
> > > wunder
> > > Walter Underwood
> > > wun...@wunderwood.org
> > > http://observer.wunderwood.org/  (my blog)
> > >
> > >
> > > > On Aug 30, 2017, at 9:14 AM, Erick Erickson  >
> > > wrote:
> > > >
> > > > First, it's often best, by far, to denormalize the data in your solr
> > > index,
> > > > that's what I'd explore first.
> > > >
> > > > If you can't do that, the join query parser might work for you.
> > > >
> > > > On Aug 30, 2017 4:49 AM, "Renuka Srishti" <
> renuka.srisht...@gmail.com>
> > > > wrote:
> > > >
> > > >> Thanks Susheel for your response.
> > > >> Here is the scenario about which I am talking:
> > > >>
> > > >>   - Let suppose there are two documents doc1 and doc2.
> > > >>   - I want to fetch the data from doc2 on the basis of doc1 fields
> > which
> > > >>   are related to doc2.
> > > >>
> > > >> How to achieve this efficiently.
> > > >>
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Renuka Srishti
> > > >>
> > > >>
> > > >> On Mon, Aug 28, 2017 at 7:02 PM, Susheel Kumar <
> susheel2...@gmail.com
> > >
> > > >> wrote:
> > > >>
> > > >>> Hello Renuka,
> > > >>>
> > > >>> I would suggest to start with your use case(s). May be start with
> > your
> > > >>> first use case with the below questions
> > > >>>
> > > >>> a) What is that you want to search (which fields like name, desc,
> > city
> > > >>> etc.)
> > > >>> b) What is that you want to show part of search result (name, city
> > > etc.)
> > > >>>
> > > >>> Based on above two questions, you would know what data to pull in
> > from
> > > >>> relational database and create solr schema and index the data.
> > > >>>
> > > >>> You may first try to denormalize / flatten the structure so that
> you
> > > deal
> > > >>> with one collection/schema and query upon it.
> > > >>>
> > > >>> HTH.
> > > >>>
> > > >>> Thanks,
> > > >>> Susheel
> > > >>>
> > > >>> On Mon, Aug 28, 2017 at 8:04 AM, Renuka Srishti <
> > > >>> renuka.srisht...@gmail.com>
> > > >>> wrote:
> > > >>>
> > >  Hii,
> > > 
> > >  What is the best way to index relational database, and how it
> > impacts
> > > >> on
> > >  the performance?
> > > 
> > >  Thanks
> > >  Renuka Srishti
> > > 
> > > >>>
> > > >>
> > >
> > >
> >
>


Re: Index relational database

2017-08-31 Thread Susheel Kumar
Yes, if you can avoid join and work with flat/denormalized structure then
that's the best.

On Thu, Aug 31, 2017 at 3:54 AM, Renuka Srishti 
wrote:

> Thanks Erick, Walter
> But I think join query will reduce the performance. Denormalization will be
> the better way than join query, am I right?
>
>
>
> On Wed, Aug 30, 2017 at 10:18 PM, Walter Underwood 
> wrote:
>
> > Think about making a denormalized view, with all the fields needed in one
> > table. That view gets sent to Solr. Each row is a Solr document.
> >
> > It could be implemented as a view or as SQL, but that is a useful mental
> > model for people starting from a relational background.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> >
> > > On Aug 30, 2017, at 9:14 AM, Erick Erickson 
> > wrote:
> > >
> > > First, it's often best, by far, to denormalize the data in your solr
> > index,
> > > that's what I'd explore first.
> > >
> > > If you can't do that, the join query parser might work for you.
> > >
> > > On Aug 30, 2017 4:49 AM, "Renuka Srishti" 
> > > wrote:
> > >
> > >> Thanks Susheel for your response.
> > >> Here is the scenario about which I am talking:
> > >>
> > >>   - Let suppose there are two documents doc1 and doc2.
> > >>   - I want to fetch the data from doc2 on the basis of doc1 fields
> which
> > >>   are related to doc2.
> > >>
> > >> How to achieve this efficiently.
> > >>
> > >>
> > >> Thanks,
> > >>
> > >> Renuka Srishti
> > >>
> > >>
> > >> On Mon, Aug 28, 2017 at 7:02 PM, Susheel Kumar  >
> > >> wrote:
> > >>
> > >>> Hello Renuka,
> > >>>
> > >>> I would suggest to start with your use case(s). May be start with
> your
> > >>> first use case with the below questions
> > >>>
> > >>> a) What is that you want to search (which fields like name, desc,
> city
> > >>> etc.)
> > >>> b) What is that you want to show part of search result (name, city
> > etc.)
> > >>>
> > >>> Based on above two questions, you would know what data to pull in
> from
> > >>> relational database and create solr schema and index the data.
> > >>>
> > >>> You may first try to denormalize / flatten the structure so that you
> > deal
> > >>> with one collection/schema and query upon it.
> > >>>
> > >>> HTH.
> > >>>
> > >>> Thanks,
> > >>> Susheel
> > >>>
> > >>> On Mon, Aug 28, 2017 at 8:04 AM, Renuka Srishti <
> > >>> renuka.srisht...@gmail.com>
> > >>> wrote:
> > >>>
> >  Hii,
> > 
> >  What is the best way to index relational database, and how it
> impacts
> > >> on
> >  the performance?
> > 
> >  Thanks
> >  Renuka Srishti
> > 
> > >>>
> > >>
> >
> >
>


Antwort: RE: Antwort: 6.6 Cannot talk to ZooKeeper - Updates are disabled.

2017-08-31 Thread Stephan Schubert
Hi Markus,

I don't know what client you use, but if you are using SolrJ, enabling 
logging could be an option to "dig deeper" into the problem. This can be 
the output, for example via log4j at log level INFO:

...
2017-08-31 10:01:56 INFO  ZooKeeper:438 - Initiating client connection, 
connectString=ZKHOST1:9983,ZKHOST2:9983,ZKHOST3:9983,ZKHOST4:9983,ZKHOST5:9983 
sessionTimeout=60 
watcher=org.apache.solr.common.cloud.SolrZkClient$3@14379273
2017-08-31 10:01:56 INFO  ClientCnxn:876 - Socket connection established 
to SOLRHOST/ZKHOST3:9983, initiating session
2017-08-31 10:01:56 INFO  ClientCnxn:1299 - Session establishment complete 
on server SOLRHOST/ZKHOST3:9983, sessionid = 0x45e35eaa9fd3584, negotiated 
timeout = 4
2017-08-31 10:01:56 INFO  ZkStateReader:688 - Updated live nodes from 
ZooKeeper... (0) -> (4)
2017-08-31 10:01:56 INFO  ZkClientClusterStateProvider:134 - Cluster at 
ZKHOST1:9983,ZKHOST2:9983,ZKHOST3:9983,ZKHOST4:9983,ZKHOST5:9983 ready
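For reference, client-side ZooKeeper logging like the sample above can usually be switched on with a log4j 1.x properties fragment along these lines (a sketch, under the assumption that log4j 1.x is the logging backend; logger names and appenders may need adjusting to your setup):

```properties
# Hypothetical log4j.properties fragment: raise the ZooKeeper client
# and SolrJ cloud classes to INFO so connection events are visible.
log4j.logger.org.apache.zookeeper=INFO
log4j.logger.org.apache.solr.common.cloud=INFO
```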





From: Markus Jelsma 
To: solr-user@lucene.apache.org 
Date: 31.08.2017 10:00
Subject: RE: Antwort: 6.6 Cannot talk to ZooKeeper - Updates are disabled.



Hello Stephan,

I know that restarting stuff can sometimes cure what's wrong, but we are 
not going to; we want to get rid of the problem, not restart Microsoft 
Windows whenever things run slow. Also, there is no indexing going on 
right now.

We also see these sometimes, this explains at least why it cannot talk to 
Zookeeper, but why..
 o.a.s.c.RecoveryStrategy Socket timeout on send prep recovery cmd, 
retrying.. 

This has been going on with just one of our nodes for over two hours, 
other nodes are fine. And why is this bad node green in cluster overview?

Thanks,
Markus

-Original message-
> From:Stephan Schubert 
> Sent: Thursday 31st August 2017 9:52
> To: solr-user@lucene.apache.org
> Subject: Antwort: 6.6 Cannot talk to ZooKeeper - Updates are disabled.
> 
> Hi markus,
> 
> try to stop your indexing/update processes and restart your ZooKeeper 
> instances (not all at the same time of course). This is what I do in these 
> cases and helped me so far.
> 
> 
> 
> 
> From: Markus Jelsma 
> To: Solr-user 
> Date: 31.08.2017 09:49
> Subject: 6.6 Cannot talk to ZooKeeper - Updates are disabled.
> 
> 
> 
> Hello,
> 
> One node is behaving badly, at least according to the logs, but the node 
> is green in the cluster overview although the logs claim recovery fails 
> all the time. It is not the first time this message pops up in the logs of 
> one of the nodes, why can it not talk to Zookeeper? I miss a reason.
> 
> The cluster is not extremely busy at the moment, we allow plenty of file 
> descriptors, there are no firewall restrictions, i cannot think of any 
> problem in our infrastructure.
> 
> What's going on? What can i do? Can the error be explained a bit further?
> 
> Thanks,
> Markus
> 
> 8/31/2017, 9:34:34 AM
> ERROR false
> RequestHandlerBase
> org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates 
> are disabled.
> 8/31/2017, 9:34:34 AM
> ERROR false
> RequestHandlerBase
> org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates 
> are disabled.
> 8/31/2017, 9:34:36 AM
> ERROR false
> RequestHandlerBase
> org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates 
> are disabled.
> 8/31/2017, 9:34:38 AM
> ERROR false
> RecoveryStrategy
> Could not publish as ACTIVE after succesful recovery
> 8/31/2017, 9:34:38 AM
> ERROR true
> RecoveryStrategy
> Recovery failed - trying again... (0)
> 8/31/2017, 9:34:49 AM
> ERROR false
> RequestHandlerBase
> org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates 
> are disabled.
> 8/31/2017, 9:34:49 AM
> ERROR false
> RequestHandlerBase
> org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates 
> are disabled.
> 8/31/2017, 9:34:50 AM
> ERROR false
> RecoveryStrategy
> Could not publish as ACTIVE after succesful recovery
> 8/31/2017, 9:34:50 AM
> ERROR false
> RecoveryStrategy
> Recovery failed - trying again... (1)
> 8/31/2017, 9:35:36 AM
> ERROR false
> RequestHandlerBase
> org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates 
> are disabled.
> 
> 
> 
> 





RE: Antwort: 6.6 Cannot talk to ZooKeeper - Updates are disabled.

2017-08-31 Thread Markus Jelsma
Hello Stephan,

I know that restarting stuff can sometimes cure what's wrong, but we are not 
going to; we want to get rid of the problem, not restart Microsoft Windows 
whenever things run slow. Also, there is no indexing going on right now.

We also see these sometimes, this explains at least why it cannot talk to 
Zookeeper, but why..
 o.a.s.c.RecoveryStrategy Socket timeout on send prep recovery cmd, retrying.. 

This has been going on with just one of our nodes for over two hours, other 
nodes are fine. And why is this bad node green in cluster overview?

Thanks,
Markus

-Original message-
> From:Stephan Schubert 
> Sent: Thursday 31st August 2017 9:52
> To: solr-user@lucene.apache.org
> Subject: Antwort: 6.6 Cannot talk to ZooKeeper - Updates are disabled.
> 
> Hi markus,
> 
> try to stop your indexing/update processes and restart your ZooKeeper 
> instances (not all at the same time of course). This is what I do in these 
> cases and helped me so far.
> 
> 
> 
> 
> From: Markus Jelsma 
> To: Solr-user 
> Date: 31.08.2017 09:49
> Subject: 6.6 Cannot talk to ZooKeeper - Updates are disabled.
> 
> 
> 
> Hello,
> 
> One node is behaving badly, at least according to the logs, but the node 
> is green in the cluster overview although the logs claim recovery fails 
> all the time. It is not the first time this message pops up in the logs of 
> one of the nodes, why can it not talk to Zookeeper? I miss a reason.
> 
> The cluster is not extremely busy at the moment, we allow plenty of file 
> descriptors, there are no firewall restrictions, i cannot think of any 
> problem in our infrastructure.
> 
> What's going on? What can i do? Can the error be explained a bit further?
> 
> Thanks,
> Markus
> 
> 8/31/2017, 9:34:34 AM
> ERROR false
> RequestHandlerBase
> org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates 
> are disabled.
> 8/31/2017, 9:34:34 AM
> ERROR false
> RequestHandlerBase
> org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates 
> are disabled.
> 8/31/2017, 9:34:36 AM
> ERROR false
> RequestHandlerBase
> org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates 
> are disabled.
> 8/31/2017, 9:34:38 AM
> ERROR false
> RecoveryStrategy
> Could not publish as ACTIVE after succesful recovery
> 8/31/2017, 9:34:38 AM
> ERROR true
> RecoveryStrategy
> Recovery failed - trying again... (0)
> 8/31/2017, 9:34:49 AM
> ERROR false
> RequestHandlerBase
> org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates 
> are disabled.
> 8/31/2017, 9:34:49 AM
> ERROR false
> RequestHandlerBase
> org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates 
> are disabled.
> 8/31/2017, 9:34:50 AM
> ERROR false
> RecoveryStrategy
> Could not publish as ACTIVE after succesful recovery
> 8/31/2017, 9:34:50 AM
> ERROR false
> RecoveryStrategy
> Recovery failed - trying again... (1)
> 8/31/2017, 9:35:36 AM
> ERROR false
> RequestHandlerBase
> org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates 
> are disabled.
> 
> 
> 
> 


Re: Index relational database

2017-08-31 Thread Renuka Srishti
Thanks Erick, Walter
But I think a join query will reduce performance. Denormalization would be
a better way than a join query, am I right?



On Wed, Aug 30, 2017 at 10:18 PM, Walter Underwood 
wrote:

> Think about making a denormalized view, with all the fields needed in one
> table. That view gets sent to Solr. Each row is a Solr document.
>
> It could be implemented as a view or as SQL, but that is a useful mental
> model for people starting from a relational background.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Aug 30, 2017, at 9:14 AM, Erick Erickson 
> wrote:
> >
> > First, it's often best, by far, to denormalize the data in your solr
> index,
> > that's what I'd explore first.
> >
> > If you can't do that, the join query parser might work for you.
> >
> > On Aug 30, 2017 4:49 AM, "Renuka Srishti" 
> > wrote:
> >
> >> Thanks Susheel for your response.
> >> Here is the scenario about which I am talking:
> >>
> >>   - Let suppose there are two documents doc1 and doc2.
> >>   - I want to fetch the data from doc2 on the basis of doc1 fields which
> >>   are related to doc2.
> >>
> >> How to achieve this efficiently.
> >>
> >>
> >> Thanks,
> >>
> >> Renuka Srishti
> >>
> >>
> >> On Mon, Aug 28, 2017 at 7:02 PM, Susheel Kumar 
> >> wrote:
> >>
> >>> Hello Renuka,
> >>>
> >>> I would suggest to start with your use case(s). May be start with your
> >>> first use case with the below questions
> >>>
> >>> a) What is that you want to search (which fields like name, desc, city
> >>> etc.)
> >>> b) What is that you want to show part of search result (name, city
> etc.)
> >>>
> >>> Based on above two questions, you would know what data to pull in from
> >>> relational database and create solr schema and index the data.
> >>>
> >>> You may first try to denormalize / flatten the structure so that you
> deal
> >>> with one collection/schema and query upon it.
> >>>
> >>> HTH.
> >>>
> >>> Thanks,
> >>> Susheel
> >>>
> >>> On Mon, Aug 28, 2017 at 8:04 AM, Renuka Srishti <
> >>> renuka.srisht...@gmail.com>
> >>> wrote:
> >>>
>  Hii,
> 
>  What is the best way to index relational database, and how it impacts
> >> on
>  the performance?
> 
>  Thanks
>  Renuka Srishti
> 
> >>>
> >>
>
>


Antwort: 6.6 Cannot talk to ZooKeeper - Updates are disabled.

2017-08-31 Thread Stephan Schubert
Hi Markus,

try to stop your indexing/update processes and restart your ZooKeeper 
instances (not all at the same time of course). This is what I do in these 
cases and it has helped me so far.




From: Markus Jelsma 
To: Solr-user 
Date: 31.08.2017 09:49
Subject: 6.6 Cannot talk to ZooKeeper - Updates are disabled.



Hello,

One node is behaving badly, at least according to the logs, but the node 
is green in the cluster overview although the logs claim recovery fails 
all the time. It is not the first time this message pops up in the logs of 
one of the nodes, why can it not talk to Zookeeper? I miss a reason.

The cluster is not extremely busy at the moment, we allow plenty of file 
descriptors, there are no firewall restrictions, i cannot think of any 
problem in our infrastructure.

What's going on? What can i do? Can the error be explained a bit further?

Thanks,
Markus

8/31/2017, 9:34:34 AM
ERROR false
RequestHandlerBase
org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates 
are disabled.
8/31/2017, 9:34:34 AM
ERROR false
RequestHandlerBase
org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates 
are disabled.
8/31/2017, 9:34:36 AM
ERROR false
RequestHandlerBase
org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates 
are disabled.
8/31/2017, 9:34:38 AM
ERROR false
RecoveryStrategy
Could not publish as ACTIVE after succesful recovery
8/31/2017, 9:34:38 AM
ERROR true
RecoveryStrategy
Recovery failed - trying again... (0)
8/31/2017, 9:34:49 AM
ERROR false
RequestHandlerBase
org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates 
are disabled.
8/31/2017, 9:34:49 AM
ERROR false
RequestHandlerBase
org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates 
are disabled.
8/31/2017, 9:34:50 AM
ERROR false
RecoveryStrategy
Could not publish as ACTIVE after succesful recovery
8/31/2017, 9:34:50 AM
ERROR false
RecoveryStrategy
Recovery failed - trying again... (1)
8/31/2017, 9:35:36 AM
ERROR false
RequestHandlerBase
org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates 
are disabled.





6.6 Cannot talk to ZooKeeper - Updates are disabled.

2017-08-31 Thread Markus Jelsma
Hello,

One node is behaving badly, at least according to the logs, but the node is 
green in the cluster overview although the logs claim recovery fails all the 
time. It is not the first time this message pops up in the logs of one of the 
nodes. Why can it not talk to ZooKeeper? I don't see a reason.

The cluster is not extremely busy at the moment, we allow plenty of file 
descriptors, and there are no firewall restrictions; I cannot think of any 
problem in our infrastructure.

What's going on? What can i do? Can the error be explained a bit further?

Thanks,
Markus

8/31/2017, 9:34:34 AM
ERROR false
RequestHandlerBase
org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are 
disabled.
8/31/2017, 9:34:34 AM
ERROR false
RequestHandlerBase
org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are 
disabled.
8/31/2017, 9:34:36 AM
ERROR false
RequestHandlerBase
org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are 
disabled.
8/31/2017, 9:34:38 AM
ERROR false
RecoveryStrategy
Could not publish as ACTIVE after succesful recovery
8/31/2017, 9:34:38 AM
ERROR true
RecoveryStrategy
Recovery failed - trying again... (0)
8/31/2017, 9:34:49 AM
ERROR false
RequestHandlerBase
org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are 
disabled.
8/31/2017, 9:34:49 AM
ERROR false
RequestHandlerBase
org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are 
disabled.
8/31/2017, 9:34:50 AM
ERROR false
RecoveryStrategy
Could not publish as ACTIVE after succesful recovery
8/31/2017, 9:34:50 AM
ERROR false
RecoveryStrategy
Recovery failed - trying again... (1)
8/31/2017, 9:35:36 AM
ERROR false
RequestHandlerBase
org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are 
disabled.


Antwort: Re: Bug in Solr 6.6.0? "Cannot change DocValues type from SORTED_SET to SORTED"

2017-08-31 Thread Stephan Schubert
No, not really. I'm wondering why neither server (each collection has a 
replica on a different server) has a problem with it in 6.5, but if I 
update to Solr 6.6 the error occurs on both instances. As I already said, 
if I revert the instances back to 6.5 everything seems to be fine. 
This is a little bit strange in my opinion ;)



From: Erick Erickson 
To: solr-user 
Date: 30.08.2017 21:34
Subject: Re: Bug in Solr 6.6.0? "Cannot change DocValues type from 
SORTED_SET to SORTED"



P.S. Perhaps the defaults changed when you upgraded for some reason?

Erick

On Wed, Aug 30, 2017 at 11:15 AM, Erick Erickson
 wrote:
> This usually means you changed multiValued from true to false or vice
> versa then added more docs.
>
> So since each segment is its own "mini index", different segments have
> different expectations and when you query this error is thrown.
>
> Most of the time when you change a field's type in the schema you have
> to re-index from scratch. And I'd delete *:* first (or just use a new
> collection and alias).
>
> Best,
> Erick
>
> On Wed, Aug 30, 2017 at 10:04 AM, Stephan Schubert
>  wrote:
>> After I tried an update from Solr 6.5.0 to Solr 6.6.0 (SolrCloud mode), I
>> receive in one collection the following error:
>>
>> "Cannot change DocValues type from SORTED_SET to SORTED for field
>> "index_todelete".
>>
>> I had a look at the index values (if set, all are true or not filled,
>> checked via faceting in the working instance) and I can't see any
>> special issues with this field. If I move back to Solr 6.5.0, the
>> Solr collection comes up normally with the same set of index data. So I
>> assume there was some change in 6.6.0, but I couldn't find anything in the
>> release notes nor in any known issues in JIRA.
>>
>> Does anyone have an idea what's going on here? The field even doesn't have
>> docValues set or multivalued, so I don't understand the error message
>> here.
>>
>> Configuration in schema.xml:
>> > stored="true" type="boolean"/>
>>
>>
>> Error Log:
>> java.util.concurrent.ExecutionException:
>> org.apache.solr.common.SolrException: Unable to create core
>> [GLOBAL-Fileshares-Index_shard1_replica2]
>>  at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>>  at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>>  at org.apache.solr.core.CoreContainer.lambda$load$6(CoreContainer.java:586)
>>  at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
>>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>  at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>  at java.lang.Thread.run(Thread.java:745)
>> Caused by: org.apache.solr.common.SolrException: Unable to create core
>> [GLOBAL-Fileshares-Index_shard1_replica2]
>>  at org.apache.solr.core.CoreContainer.create(CoreContainer.java:935)
>>  at org.apache.solr.core.CoreContainer.lambda$load$5(CoreContainer.java:558)
>>  at com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:197)
>>  ... 5 more
>> Caused by: org.apache.solr.common.SolrException: Error opening new searcher
>>  at org.apache.solr.core.SolrCore.<init>(SolrCore.java:977)
>>  at org.apache.solr.core.SolrCore.<init>(SolrCore.java:830)
>>  at org.apache.solr.core.CoreContainer.create(CoreContainer.java:920)
>>  ... 7 more
>> Caused by: org.apache.solr.common.SolrException: Error opening new searcher
>>  at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2069)
>>  at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:2189)
>>  at org.apache.solr.core.SolrCore.initSearcher(SolrCore.java:1071)
>>  at org.apache.solr.core.SolrCore.<init>(SolrCore.java:949)
>>  ... 9 more
>> Caused by: java.lang.IllegalArgumentException: cannot change DocValues
>> type from SORTED_SET to SORTED for field "index_todelete"
>>  at org.apache.lucene.index.FieldInfo.setDocValuesType(FieldInfo.java:212)
>>  at org.apache.lucene.index.FieldInfos$Builder.addOrUpdateInternal(FieldInfos.java:430)
>>  at org.apache.lucene.index.FieldInfos$Builder.add(FieldInfos.java:438)
>>  at org.apache.lucene.index.FieldInfos$B