Re: query formulation

2016-09-09 Thread Shawn Heisey
On 9/9/2016 9:17 PM, Prasanna S. Dhakephalkar wrote:
> Further search on the net got me the answer.
>
> The query should be:
>
> a_id:20 OR (*:* NOT a_id:*)
>
> I don't understand this syntax.

The basic problem here is that negative queries don't work.  If you're
going to subtract X, you have to start with something (like all docs),
or the result is nothing.

For simple queries (just a single "-field:X" clause), Solr is able to
detect the unworkable situation and implicitly add a "*:*" starting
point, so the query works.

When the query has ANY complexity, Solr's negative query detection isn't
possible, and the query can't be fixed automatically, so it doesn't work.
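For example, with the field from this thread:

-a_id:*                        rewritten internally to "*:* -a_id:*" -- works
-a_id:* OR a_id:20             has another clause, no rewrite -- returns nothing
a_id:20 OR (*:* NOT a_id:*)    explicit "*:*" starting point -- works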

Thanks,
Shawn



Re: Detecting down node with SolrJ

2016-09-09 Thread Shawn Heisey
On 9/9/2016 4:38 PM, Brent wrote:
> Is there a way to tell whether or not a node at a specific address is
> up using a SolrJ API? 

Based on your other questions, I think you're running cloud.  If that
assumption is correct, use the Collections API with HttpSolrClient
(instead of CloudSolrClient) to get a list of collections.  Using
HttpSolrClient will direct the request to that specific server.

If it doesn't throw an exception, it's up.  Here's some SolrJ code. 
You're going to need some exception handling that's not included here:

import org.apache.http.client.HttpClient;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.impl.client.HttpClients;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;
import org.apache.solr.client.solrj.response.CollectionAdminResponse;

// Five-second connect/socket timeouts so a dead server fails fast
RequestConfig rc = RequestConfig.custom()
    .setConnectTimeout(5000).setSocketTimeout(5000).build();
HttpClient hc = HttpClients.custom()
    .setDefaultRequestConfig(rc).disableAutomaticRetries().build();
// Point HttpSolrClient at the specific server to check
SolrClient client = new HttpSolrClient("http://server:8983/solr", hc);
CollectionAdminRequest.List req = new CollectionAdminRequest.List();
CollectionAdminResponse response = req.process(client);

I am setting the HttpClient timeouts to five seconds so that the
request times out relatively quickly when the server isn't up, or when
it's up but not functioning correctly.

Thanks,
Shawn



RE: query formulation

2016-09-09 Thread Prasanna S. Dhakephalkar
Hi,

Further search on the net got me the answer. The query should be:

a_id:20 OR (*:* NOT a_id:*)

I don't understand this syntax; I am a bit raw at Solr query formulation :)

Regards,

Prasanna.

From: Prasanna S. Dhakephalkar [mailto:prasann...@merajob.in] 
Sent: Saturday, September 10, 2016 8:24 AM
To: 'solr-user@lucene.apache.org'
Subject: query formulation

 

Greetings Group,

I am attempting to formulate a query that gives me all the records such that:

1. The record does not have the field a_id, OR
2. If the a_id field exists, then it has the value 20.

So, for 1. I used -a_id:* (got 25 results).
For 2. I used a_id:20 (got 3 results).

For the combination I used:

-a_id:* OR a_id:20 (expecting 28 results)

Got nothing.

What am I missing?

Regards,

Prasanna.



query formulation

2016-09-09 Thread Prasanna S. Dhakephalkar
Greetings Group,

I am attempting to formulate a query that gives me all the records such that:

1. The record does not have the field a_id, OR
2. If the a_id field exists, then it has the value 20.

So, for 1. I used -a_id:* (got 25 results).
For 2. I used a_id:20 (got 3 results).

For the combination I used:

-a_id:* OR a_id:20 (expecting 28 results)

Got nothing.

What am I missing?

Regards,

Prasanna.



Re: [Rerank Query] Distributed search + pagination

2016-09-09 Thread Alessandro Benedetti
Let me explain further;
let's assume a simple case where we have 2 shards,
reRankDocs=10, rows=10.

Correct me if I am wrong, Joel.
What we would like:
Page 1: the top 10, re-scored.
Page 2: the remaining 10, re-scored.
From page 3 on, the originally scored docs.
This is what happens in a single Solr instance if we set reRankDocs to 20.

Let's see with sharding:
To get the first page we get the top 10 (re-scored) from shard1 and the top
10 (re-scored) from shard2.
Then the merged top 10 (re-scored) is calculated, and that is page 1.

But when we request page 2, we additionally ask for:
20 docs from shard1, 10 re-scored and 10 not.
20 docs from shard2, 10 re-scored and 10 not.
At this point we have 40 docs to merge and rank.
The docs with the original score can land at any position (not necessarily
the last 20), so on page 2 we can potentially find docs with the original
score.
This is even more likely if the scores are on different scales (e.g. the
re-scored docs on a 0-100 scale).

Am I right?
Did I make any wrong assumption so far?
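
For reference, this is the request shape being discussed, as a sketch with
placeholder main and rerank queries:

q=<main query>&rqq=<rerank query>
&rq={!rerank reRankQuery=$rqq reRankDocs=10 reRankWeight=3}
&rows=10&start=10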

Cheers


On Fri, Sep 9, 2016 at 7:47 PM, Joel Bernstein  wrote:

> I'm not understanding where the inconsistency comes into play.
>
> The re-ranking occurs on the shards. The aggregator node will be sent some
> docs that have been re-scored and others that are not. But the sorting
> should be the same as someone pages through the result set.
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Fri, Sep 9, 2016 at 9:28 AM, Alessandro Benedetti <
> abenede...@apache.org>
> wrote:
>
> > Hi guys,
> > I was just experimenting with a reranker with a really low number of
> > rerank docs (10 = pageSize).
> > Let's focus on the distributed environment and the manual sharding
> > approach.
> >
> > Currently what happens is that the reranking task is carried out by the
> > shards; they rescore the docs and then send them back to the aggregator
> > node.
> >
> > If you want to rerank only a few docs (leaving the others following with
> > the original score), this can be done in a single Solr instance (the
> > howmany logic in the reranker manages that).
> >
> > What happens when you move to a distributed environment?
> > The aggregator will aggregate both rescored and originally scored
> > documents, making the final ranking inconsistent.
> > On the other hand, if we make the reRankDocs threshold dynamic (to adapt
> > to start+rows) we can incur the very annoying issue of having a document
> > sliding through the pages (visible on the first page, then appearing
> > again on the third, etc.).
> >
> > Any thoughts?
> >
> > Cheers
> >
> > --
> > --
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
> >
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Detecting down node with SolrJ

2016-09-09 Thread Brent
Is there a way to tell whether or not a node at a specific address is up
using a SolrJ API?





Re: Migrating: config found, but not mapped to collection

2016-09-09 Thread igiguere
"If you want the new apps to use the latest Solr, then I would assume you
want them on the new ZKworry about the config."
- Don't worry about that.  That was the easy part.  We install and configure
these apps often enough during a development phase!

I double-checked the requirements, and Solr configurations of the other app
are not needed on the same ZK.

So, for my part, my problem is solved (for now): one config, to avoid
confusion, then migration goes smoothly.

I hope you have as much fun with your migration(s)... ;)

Thanks again.





Re: Migrating: config found, but not mapped to collection

2016-09-09 Thread John Bickerstaff
Sure - and the apps may not have been pointed at the new ZK?

If you want the new apps to use the latest Solr, then I would assume you
want them on the new ZK, but I'll bet a dollar that there are some
configurations, etc, that need to change in the applications before that
will work right...

On Fri, Sep 9, 2016 at 3:57 PM, igiguere  wrote:

> Sorry for the confusion... I forget that not everyone sees what I see ;)
>
> The other configs that I mention are from another application that uses
> Solr
> and Zookeeper.  In theory, both apps should be able to share resources like
> Solr and ZK, but I need to double-check on the necessity of having both apps'
> Solr configurations on the same ZK...
>
> Thank you for your help!
>
>
>
>


Re: Migrating: config found, but not mapped to collection

2016-09-09 Thread igiguere
Sorry for the confusion... I forget that not everyone sees what I see ;)

The other configs that I mention are from another application that uses Solr
and Zookeeper.  In theory, both apps should be able to share resources like
Solr and ZK, but I need to double-check on the necessity of having both apps'
Solr configurations on the same ZK...

Thank you for your help!





Re: Migrating: config found, but not mapped to collection

2016-09-09 Thread John Bickerstaff
I'm not sure, but based on what I saw in the script command line, I would
expect ONLY otac_en to show up - since that is the only one mentioned on
the -confname flag...

Glad you've made some progress - happy to try to assist in figuring out the
larger problem if you want to.

I have sweated a lot of blood trying to figure out how to upgrade, so it
may be that I'll have some useful insights...  Or not...  You never
know...  It depends on how similar our use cases are, I suppose...

On Fri, Sep 9, 2016 at 3:41 PM, igiguere  wrote:

> Hi again;
>
> To answer your questions:
>
> 1. I use a ZK browser, so I can see what happens in ZK.
> https://github.com/DeemOpen/zkui
>
> 2. Solr 4.3 is on its own ZK, on one server.  Solr 5.4 is on another ZK,
> on
> a different server.  No mixing whatsoever.
>
> 3. Take  to mean "ZK root", so, yes /ot is at the root
>
> Meanwhile, I have made some "progress", if that's what I can call it.  By
> deleting from ZK all the irrelevant configurations, and leaving only
> "otac_en", I was able to migrate one collection.
>
> Fortunately, for this specific project, only "otac_en" is relevant, so we
> could live with the deletion of the other configs.
> However, we can't expect ZK to only ever hold just one config for Solr.
>
>
>
>
>
>


Re: Migrating: config found, but not mapped to collection

2016-09-09 Thread igiguere
Hi again;

To answer your questions:

1. I use a ZK browser, so I can see what happens in ZK.
https://github.com/DeemOpen/zkui

2. Solr 4.3 is on its own ZK, on one server.  Solr 5.4 is on another ZK, on
a different server.  No mixing whatsoever.

3. Take  to mean "ZK root", so, yes /ot is at the root

Meanwhile, I have made some "progress", if that's what I can call it.  By
deleting from ZK all the irrelevant configurations, and leaving only
"otac_en", I was able to migrate one collection.

Fortunately, for this specific project, only "otac_en" is relevant, so we
could live with the deletion of the other configs.
However, we can't expect ZK to only ever hold just one config for Solr.







Re: Migrating: config found, but not mapped to collection

2016-09-09 Thread John Bickerstaff
I'm afraid I'm pretty confused about what's going on...  Naturally, because
it's new to me and you've been staring at it for a long time...

I'm afraid I'll have to ask some basic questions to get myself on the right
page...

When you say this:
The script does the job: the config is visible at
/ot/solr/configs/otac_en

1. Do you mean that you've gone to the Zookeeper server and used the
zkCli.sh command to look inside the "directories" that Zookeeper maintains
and there is a /ot/solr/configs/otac_en "directory" on the Zookeeper server?

2. Also, are you "mixing" the data in Zookeeper?  By this I mean that you
have no distinction in Zookeeper for which data you're dealing with (Solr
4.x or Solr 5.x)?

I ask this second question because I found that things got hopelessly
muddled unless I made a totally clear demarcation between different
versions of Solr by using the "chroot" functionality to make a totally
separate "node" in Zookeeper...

I used this command line to make a totally new "node" in Zookeeper for my
upgrades...

/opt/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost zoo1,zoo2,zoo3 -cmd
makepath /solr6_1

This gave me a totally clean "solr6_1" node in Zookeeper and made it much
easier to tell if things were "done" or not...

3. What is the "structure" of your nodes in Zookeeper?  If you do an "ls /"
in the zkCli.sh tool on Zookeeper, what does it show?  Does it show /ot at
the "root"?





On Fri, Sep 9, 2016 at 3:10 PM, igiguere  wrote:

> Hi JohnB;
>
> We have a script that calls the ZK CLI with this command:
> zk.bat -cmd upconfig -zkhost %SOLR_ZK_ENSEMBLE% -confdir
> %INSTALL_DIR%\etc\solr\otac\default-en\conf -confname otac_en
>
> The script does the job: the config is visible at
> /ot/solr/configs/otac_en
>
> You can see in Solr's error message that the config does exist: otac_en
> found:[otif_und, otif_de, otif_fr, otif_it, otif_es, otif_nl, otif_noText,
> otif_ja, otac_en, otif_en, otif_pt, otif_ru]
>
> In ZK, clusterstate.js contains information about the collection,  but
> /ot/solr/collections is empty.
>
> Thanks;
>
>
>
>


Re: Migrating: config found, but not mapped to collection

2016-09-09 Thread igiguere
Hi JohnB;

We have a script that calls the ZK CLI with this command:
zk.bat -cmd upconfig -zkhost %SOLR_ZK_ENSEMBLE% -confdir
%INSTALL_DIR%\etc\solr\otac\default-en\conf -confname otac_en

The script does the job: the config is visible at
/ot/solr/configs/otac_en

You can see in Solr's error message that the config does exist: otac_en
found:[otif_und, otif_de, otif_fr, otif_it, otif_es, otif_nl, otif_noText,
otif_ja, otac_en, otif_en, otif_pt, otif_ru] 

In ZK, clusterstate.js contains information about the collection,  but
/ot/solr/collections is empty.

Thanks;





Re: Migrating: config found, but not mapped to collection

2016-09-09 Thread John Bickerstaff
Specifically, how did you push the configuration to Zookeeper?

Does the config exist in a separate "chroot" on Zookeeper?  If so, do all
the collection names exist inside there (On Zookeeper)?
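
For what it's worth, the same zkcli tool that uploads configs also has a
linkconfig command that creates the collection-to-confname mapping; a sketch,
with placeholder ZK host and collection name:

zkcli.sh -zkhost <zkhost> -cmd linkconfig -collection mycollection -confname otac_en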

On Fri, Sep 9, 2016 at 2:01 PM, igiguere  wrote:

> Hi;
>
> I am migrating collections from Solr 4.3 to Solr 5.4.1
>
> The configuration was pushed to Zookeeper, on the Zookeeper connected with
> Solr 5.4.1:
> schema.xml: 
> solrconfig.xml: 5.4.1
>
> I can manually create a new core, using the Solr Admin UI, as long as I use
> the name "otac_en" for parameter "collection".  Solr default for
> collection.configName is the same name given for the collection.
>
> However, I have to migrate a number of collections that have a variety of
> names, some created automatically by an application (example:
> a-0c512bac-abc9-48e9-8f66-96432b724263_shard1_replica1)
>
> "Normally", if the collections have the same name as the configuration, I
> can either point the new Solr 5.4.1 home to the old Solr 4.3 home, or copy
> the collection folder from the old /solr folder to the new /solr.
>
> But here, for every collection that has a different name than the config, I
> get:
> SolrCore Initialization Failures
> a-0c512bac-abc9-48e9-8f66-96432b724263_shard1_replica1:
> org.apache.solr.common.cloud.ZooKeeperException:org.apache.
> solr.common.cloud.ZooKeeperException:
> Could not find configName for collection
> a-0c512bac-abc9-48e9-8f66-96432b724263 found:[otif_und, otif_de, otif_fr,
> otif_it, otif_es, otif_nl, otif_noText, otif_ja, otac_en, otif_en, otif_pt,
> otif_ru]
>
> Where can I specify collection.configName when SolrCore Initialization runs
> ?
>
> Thanks;
>
>
>
>


Re: Solr Collection Create API queries

2016-09-09 Thread Swathi Singamsetty
Thank you Anshum.
I would try the approach of managing it from outside first and see how it
works.

On Fri, Sep 9, 2016 at 1:51 PM, Anshum Gupta  wrote:

> If you want to build a monitoring tool that maintains a replication factor,
> I would suggest you use the Collections APIs (ClusterStatus, AddReplica,
> DeleteReplica, etc.) and manage this from outside of Solr. I don't want to
> pull you back from trying to build something, but I think you'd be biting off
> a lot for the first bite if you take this up as the first thing to implement
> within Solr.
>
>
> On Fri, Sep 9, 2016 at 1:41 PM Swathi Singamsetty <
> swathisingamsett...@gmail.com> wrote:
>
> > I am experimenting with this functionality to see how the overseer
> > monitors and keeps the minimum no. of replicas up and running.
> >
> >
> > In heavy indexing/search flow , if any replica goes down we need to keep
> > the minimum no. of replicas up and running to serve the traffic and
> > maintain the availability of the cluster.
> >
> >
> > Please let me know if you need more information.
> >
> > Can you point me to the git repo branch where I can dig deeper and see
> this
> > functionality ?
> >
> >
> >
> > Thanks,
> > Swathi.
> >
> >
> >
> >
> >
> > On Fri, Sep 9, 2016 at 1:10 PM, Anshum Gupta 
> > wrote:
> >
> > > Just to clarify here, I said that I really think it's an XY problem
> > here. I
> > > still don't know what is being attempted/built.
> > >
> > > From the last email, sounds like you want to build/support
> auto-addition
> > of
> > > replica but I would wait until you clarify the use case to suggest
> > > anything.
> > >
> > > -Anshum
> > >
> > > On Fri, Sep 9, 2016 at 8:20 AM Erick Erickson  >
> > > wrote:
> > >
> > > > I think you're missing my point. The _feature_ may be there,
> > > > you'll have to investigate. But it is not named "smartCloud" or
> > > >  "autoManageCluster". Those terms
> > > > 1> do not appear in the final patch.
> > > > 2> do not appear in any file in Solr 6x.
> > > >
> > > > They were suggested names, what the final implementation
> > > > used should be in the ref guide, although I admit this latter
> > > > sometimes lags.
> > > >
> > > > Best,
> > > > Erick
> > > >
> > > > On Fri, Sep 9, 2016 at 7:51 AM, Swathi Singamsetty
> > > >  wrote:
> > > > > I am working on solr 6.0.0 to implement this feature.
> > > > > I had a chat with Anshum and confirmed that this feature is
> available
> > > in
> > > > > 6.0.0 version.
> > > > >
> > > > >
> > > > > The functionality is to allow the overseer to bring up the minimum
> > > > > no. of replicas for each shard as per the replicationFactor that is
> > > > > set.
> > > > >
> > > > > I will look into the ref guide as well.
> > > > >
> > > > > Thanks,
> > > > > Swathi.
> > > > >
> > > > > On Friday, September 9, 2016, Erick Erickson <
> > erickerick...@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> You cannot just pick arbitrary parts of a JIRA discussion
> > > > >> and expect them to work. JIRAs are places where
> > > > >> discussion of alternatives takes place and the discussion
> > > > >> often suggests ideas that are not incorporated
> > > > >> in the final patch. The patch for the JIRA you mentioned,
> > > > >> for instance, does not mention either of those parameters,
> > > > >> which implies that they were simply part of the discussion
> > > > >> and were never implemented.
> > > > >>
> > > > >> So this sounds like an "XY" problem. You're asking why
> > > > >> properties aren't persisted when you really want to take
> > > > >> advantage of some functionality. What is that functionality?
> > > > >>
> > > > >> BTW, I'd go by the ref guide rather than JIRAs unless you
> > > > >> examine the patch and see that the discussion was
> > > > >> implemented in the patch.
> > > > >>
> > > > >> Best,
> > > > >> Erick
> > > > >>
> > > > >> On Thu, Sep 8, 2016 at 9:33 PM, Swathi Singamsetty
> > > > >> > wrote:
> > > > >> > Hi Team,
> > > > >> >
> > > > >> > To implement the feature "Persist and use the
> > > > >> > replicationFactor,maxShardsPerNode at Collection&Shard level"
> am
> > > > >> following
> > > > >> > the steps mentioned in the jira ticket
> > > > >> > https://issues.apache.org/jira/browse/SOLR-4808.
> > > > >> >
> > > > >> > I used the "smartCloud" and "autoManageCluster" properties to
> > > create a
> > > > >> > collection in the create collection API to allow the overseer to
> > > > bring up
> > > > >> > the minimum no. of replicas for each shard as per the
> > > > replicationFactor
> > > > >> set
> > > > >> > . But these 2 properties did not persist in the cluster state.
> > Could
> > > > >> > someone let me know how to use these properties in this feature?
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > Thanks & Regards,
> > > > >> > Swathi.
> > > > >>
> > > >
> > >
> >
>


Re: Solr Collection Create API queries

2016-09-09 Thread Anshum Gupta
If you want to build a monitoring tool that maintains a replication factor,
I would suggest you use the Collections APIs (ClusterStatus, AddReplica,
DeleteReplica, etc.) and manage this from outside of Solr. I don't want to
pull you back from trying to build something, but I think you'd be biting off
a lot for the first bite if you take this up as the first thing to implement
within Solr.
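
As a rough illustration of that outside-of-Solr approach, a SolrJ sketch (ZK
address, collection name, and the desired factor are placeholders; real code
needs exception handling):

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.cloud.ClusterState;
import org.apache.solr.common.cloud.Replica;
import org.apache.solr.common.cloud.Slice;

CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181/solr");
client.connect();
ClusterState cs = client.getZkStateReader().getClusterState();
for (Slice slice : cs.getCollection("mycollection").getActiveSlices()) {
  int active = 0;
  for (Replica r : slice.getReplicas()) {
    // count replicas that are ACTIVE and whose node is live
    if (r.getState() == Replica.State.ACTIVE
        && cs.getLiveNodes().contains(r.getNodeName())) {
      active++;
    }
  }
  // if active < the desired replicationFactor, issue an ADDREPLICA
  // call for this shard via the Collections API
}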


On Fri, Sep 9, 2016 at 1:41 PM Swathi Singamsetty <
swathisingamsett...@gmail.com> wrote:

> I am experimenting with this functionality to see how the overseer monitors
> and keeps the minimum no. of replicas up and running.
>
>
> In heavy indexing/search flow , if any replica goes down we need to keep
> the minimum no. of replicas up and running to serve the traffic and
> maintain the availability of the cluster.
>
>
> Please let me know if you need more information.
>
> Can you point me to the git repo branch where I can dig deeper and see this
> functionality ?
>
>
>
> Thanks,
> Swathi.
>
>
>
>
>
> On Fri, Sep 9, 2016 at 1:10 PM, Anshum Gupta 
> wrote:
>
> > Just to clarify here, I said that I really think it's an XY problem
> here. I
> > still don't know what is being attempted/built.
> >
> > From the last email, sounds like you want to build/support auto-addition
> of
> > replica but I would wait until you clarify the use case to suggest
> > anything.
> >
> > -Anshum
> >
> > On Fri, Sep 9, 2016 at 8:20 AM Erick Erickson 
> > wrote:
> >
> > > I think you're missing my point. The _feature_ may be there,
> > > you'll have to investigate. But it is not named "smartCloud" or
> > >  "autoManageCluster". Those terms
> > > 1> do not appear in the final patch.
> > > 2> do not appear in any file in Solr 6x.
> > >
> > > They were suggested names, what the final implementation
> > > used should be in the ref guide, although I admit this latter
> > > sometimes lags.
> > >
> > > Best,
> > > Erick
> > >
> > > On Fri, Sep 9, 2016 at 7:51 AM, Swathi Singamsetty
> > >  wrote:
> > > > I am working on solr 6.0.0 to implement this feature.
> > > > I had a chat with Anshum and confirmed that this feature is available
> > in
> > > > 6.0.0 version.
> > > >
> > > >
> > > > The functionality is to allow the overseer to bring up the minimum
> > > > no. of replicas for each shard as per the replicationFactor that is
> > > > set.
> > > >
> > > > I will look into the ref guide as well.
> > > >
> > > > Thanks,
> > > > Swathi.
> > > >
> > > > On Friday, September 9, 2016, Erick Erickson <
> erickerick...@gmail.com>
> > > > wrote:
> > > >
> > > >> You cannot just pick arbitrary parts of a JIRA discussion
> > > >> and expect them to work. JIRAs are places where
> > > >> discussion of alternatives takes place and the discussion
> > > >> often suggests ideas that are not incorporated
> > > >> in the final patch. The patch for the JIRA you mentioned,
> > > >> for instance, does not mention either of those parameters,
> > > >> which implies that they were simply part of the discussion
> > > >> and were never implemented.
> > > >>
> > > >> So this sounds like an "XY" problem. You're asking why
> > > >> properties aren't persisted when you really want to take
> > > >> advantage of some functionality. What is that functionality?
> > > >>
> > > >> BTW, I'd go by the ref guide rather than JIRAs unless you
> > > >> examine the patch and see that the discussion was
> > > >> implemented in the patch.
> > > >>
> > > >> Best,
> > > >> Erick
> > > >>
> > > >> On Thu, Sep 8, 2016 at 9:33 PM, Swathi Singamsetty
> > > >> > wrote:
> > > >> > Hi Team,
> > > >> >
> > > >> > To implement the feature "Persist and use the
> > > >> > replicationFactor,maxShardsPerNode at Collection&Shard level" am
> > > >> following
> > > >> > the steps mentioned in the jira ticket
> > > >> > https://issues.apache.org/jira/browse/SOLR-4808.
> > > >> >
> > > >> > I used the "smartCloud" and "autoManageCluster" properties to
> > create a
> > > >> > collection in the create collection API to allow the overseer to
> > > bring up
> > > >> > the minimum no. of replicas for each shard as per the
> > > replicationFactor
> > > >> set
> > > >> > . But these 2 properties did not persist in the cluster state.
> Could
> > > >> > someone let me know how to use these properties in this feature?
> > > >> >
> > > >> >
> > > >> >
> > > >> > Thanks & Regards,
> > > >> > Swathi.
> > > >>
> > >
> >
>


Re: Solr Collection Create API queries

2016-09-09 Thread Swathi Singamsetty
I am experimenting with this functionality to see how the overseer monitors
and keeps the minimum no. of replicas up and running.


In heavy indexing/search flow , if any replica goes down we need to keep
the minimum no. of replicas up and running to serve the traffic and
maintain the availability of the cluster.


Please let me know if you need more information.

Can you point me to the git repo branch where I can dig deeper and see this
functionality ?



Thanks,
Swathi.





On Fri, Sep 9, 2016 at 1:10 PM, Anshum Gupta  wrote:

> Just to clarify here, I said that I really think it's an XY problem here. I
> still don't know what is being attempted/built.
>
> From the last email, sounds like you want to build/support auto-addition of
> replica but I would wait until you clarify the use case to suggest
> anything.
>
> -Anshum
>
> On Fri, Sep 9, 2016 at 8:20 AM Erick Erickson 
> wrote:
>
> > I think you're missing my point. The _feature_ may be there,
> > you'll have to investigate. But it is not named "smartCloud" or
> >  "autoManageCluster". Those terms
> > 1> do not appear in the final patch.
> > 2> do not appear in any file in Solr 6x.
> >
> > They were suggested names, what the final implementation
> > used should be in the ref guide, although I admit this latter
> > sometimes lags.
> >
> > Best,
> > Erick
> >
> > On Fri, Sep 9, 2016 at 7:51 AM, Swathi Singamsetty
> >  wrote:
> > > I am working on solr 6.0.0 to implement this feature.
> > > I had a chat with Anshum and confirmed that this feature is available
> in
> > > 6.0.0 version.
> > >
> > >
> > > The functionality is to allow the overseer to bring up the minimum
> > > no. of replicas for each shard as per the replicationFactor that is
> > > set.
> > >
> > > I will look into the ref guide as well.
> > >
> > > Thanks,
> > > Swathi.
> > >
> > > On Friday, September 9, 2016, Erick Erickson 
> > > wrote:
> > >
> > >> You cannot just pick arbitrary parts of a JIRA discussion
> > >> and expect them to work. JIRAs are places where
> > >> discussion of alternatives takes place and the discussion
> > >> often suggests ideas that are not incorporated
> > >> in the final patch. The patch for the JIRA you mentioned,
> > >> for instance, does not mention either of those parameters,
> > >> which implies that they were simply part of the discussion
> > >> and were never implemented.
> > >>
> > >> So this sounds like an "XY" problem. You're asking why
> > >> properties aren't persisted when you really want to take
> > >> advantage of some functionality. What is that functionality?
> > >>
> > >> BTW, I'd go by the ref guide rather than JIRAs unless you
> > >> examine the patch and see that the discussion was
> > >> implemented in the patch.
> > >>
> > >> Best,
> > >> Erick
> > >>
> > >> On Thu, Sep 8, 2016 at 9:33 PM, Swathi Singamsetty
> > >> > wrote:
> > >> > Hi Team,
> > >> >
> > >> > To implement the feature "Persist and use the
> > >> > replicationFactor,maxShardsPerNode at Collection&Shard level" am
> > >> following
> > >> > the steps mentioned in the jira ticket
> > >> > https://issues.apache.org/jira/browse/SOLR-4808.
> > >> >
> > >> > I used the "smartCloud" and "autoManageCluster" properties to
> create a
> > >> > collection in the create collection API to allow the overseer to
> > bring up
> > >> > the minimum no. of replicas for each shard as per the
> > replicationFactor
> > >> set
> > >> > . But these 2 properties did not persist in the cluster state. Could
> > >> > someone let me know how to use these properties in this feature?
> > >> >
> > >> >
> > >> >
> > >> > Thanks & Regards,
> > >> > Swathi.
> > >>
> >
>


Re: Solr Collection Create API queries

2016-09-09 Thread Anshum Gupta
Just to clarify here, I said that I really think it's an XY problem here. I
still don't know what is being attempted/built.

From the last email, it sounds like you want to build/support auto-addition of
replica, but I would wait until you clarify the use case to suggest anything.

-Anshum

On Fri, Sep 9, 2016 at 8:20 AM Erick Erickson 
wrote:

> I think you're missing my point. The _feature_ may be there,
> you'll have to investigate. But it is not named "smartCloud" or
>  "autoManageCluster". Those terms
> 1> do not appear in the final patch.
> 2> do not appear in any file in Solr 6x.
>
> They were suggested names, what the final implementation
> used should be in the ref guide, although I admit this latter
> sometimes lags.
>
> Best,
> Erick
>
> On Fri, Sep 9, 2016 at 7:51 AM, Swathi Singamsetty
>  wrote:
> > I am working on solr 6.0.0 to implement this feature.
> > I had a chat with Anshum and confirmed that this feature is available in
> > 6.0.0 version.
> >
> >
> > The functionality is to allow the overseer to bring up the minimum no. of
> > replicas for each shard as per the replicationFactor that is set.
> >
> > I will look into the ref guide as well.
> >
> > Thanks,
> > Swathi.
> >
> > On Friday, September 9, 2016, Erick Erickson 
> > wrote:
> >
> >> You cannot just pick arbitrary parts of a JIRA discussion
> >> and expect them to work. JIRAs are places where
> >> discussion of alternatives takes place and the discussion
> >> often suggests ideas that are not incorporated
> >> in the final patch. The patch for the JIRA you mentioned,
> >> for instance, does not mention either of those parameters,
> >> which implies that they were simply part of the discussion
> >> and were never implemented.
> >>
> >> So this sounds like an "XY" problem. You're asking why
> >> properties aren't persisted when you really want to take
> >> advantage of some functionality. What is that functionality?
> >>
> >> BTW, I'd go by the ref guide rather than JIRAs unless you
> >> examine the patch and see that the discussion was
> >> implemented in the patch.
> >>
> >> Best,
> >> Erick
> >>
> >> On Thu, Sep 8, 2016 at 9:33 PM, Swathi Singamsetty
> >> > wrote:
> >> > Hi Team,
> >> >
> >> > To implement the feature "Persist and use the
> >> > replicationFactor,maxShardsPerNode at Collection&Shard level" am
> >> following
> >> > the steps mentioned in the jira ticket
> >> > https://issues.apache.org/jira/browse/SOLR-4808.
> >> >
> >> > I used the "smartCloud" and "autoManageCluster" properties to create a
> >> > collection in the create collection API to allow the overseer to
> bring up
> >> > the minimum no. of replicas for each shard as per the
> replicationFactor
> >> set
> >> > . But these 2 properties did not persist in the cluster state. Could
> >> > someone let me know how to use these properties in this feature?
> >> >
> >> >
> >> >
> >> > Thanks & Regards,
> >> > Swathi.
> >>
>


Migrating: config found, but not mapped to collection

2016-09-09 Thread igiguere
Hi;

I am migrating collections from Solr 4.3 to Solr 5.4.1

The configuration was pushed to Zookeeper, on the Zookeeper connected with
Solr 5.4.1:
schema.xml: 
solrconfig.xml: 5.4.1

I can manually create a new core, using the Solr Admin UI, as long as I use
the name "otac_en" for parameter "collection".  Solr default for
collection.configName is the same name given for the collection.

However, I have to migrate a number of collections that have a variety of
names, some created automatically by an application (example:
a-0c512bac-abc9-48e9-8f66-96432b724263_shard1_replica1)

"Normally", if the collections have the same name as the configuration, I
can either point the new Solr 5.4.1 home to the old Solr 4.3 home, or copy
the collection folder from the old /solr folder to the new /solr.

But here, for every collection that has a different name than the config, I
get:
SolrCore Initialization Failures
a-0c512bac-abc9-48e9-8f66-96432b724263_shard1_replica1:
org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
Could not find configName for collection
a-0c512bac-abc9-48e9-8f66-96432b724263 found:[otif_und, otif_de, otif_fr,
otif_it, otif_es, otif_nl, otif_noText, otif_ja, otac_en, otif_en, otif_pt,
otif_ru] 

Where can I specify collection.configName when SolrCore Initialization runs
?

Thanks;





[ANNOUNCE] Apache Solr 5.5.3 released

2016-09-09 Thread Anshum Gupta
09 September 2016, Apache Solr™ 5.5.3 available

The Lucene PMC is pleased to announce the release of Apache Solr 5.5.3

Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Lucene project. Its major features include powerful
full-text search, hit highlighting, faceted search, dynamic
clustering, database integration, rich document (e.g., Word, PDF)
handling, and geospatial search. Solr is highly scalable, providing
fault tolerant distributed search and indexing, and powers the search
and navigation features of many of the world's largest internet sites.

This release includes 5 bug fixes since the 5.5.2 release.

The release is available for immediate download at:

  http://www.apache.org/dyn/closer.lua/lucene/solr/5.5.3

This release in particular contains 2 critical fixes:
* The number of TCP connections in the CLOSE_WAIT state no longer spikes
during indexing.
* PeerSync no longer fails on a node restart due to an IndexFingerPrint
mismatch.

Please read CHANGES.txt for a detailed list of changes:

  https://lucene.apache.org/solr/5_5_3/changes/Changes.html

Please report any feedback to the mailing lists
(http://lucene.apache.org/solr/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring
network for distributing releases. It is possible that the mirror you
are using may not have replicated the release yet. If that is the
case, please try another mirror. This also goes for Maven access.

-Anshum Gupta


Re: [Rerank Query] Distributed search + pagination

2016-09-09 Thread Joel Bernstein
I'm not understanding where the inconsistency comes into play.

The re-ranking occurs on the shards. The aggregator node will be sent some
docs that have been re-scored and others that are not. But the sorting
should be the same as someone pages through the result set.



Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Sep 9, 2016 at 9:28 AM, Alessandro Benedetti 
wrote:

> Hi guys,
> I was just experimenting with a reranker with a really low number of
> rerank docs (10 = pageSize).
> Let's focus on the distributed environment and the manual sharding
> approach.
>
> Currently what happens is that the reranking task is carried out by the
> shards; they rescore the docs and then send them back to the aggregator
> node.
>
> If you want to rerank only a few docs (leaving the others following with
> the original score), this can be done in a single Solr instance (the
> howmany logic in the reranker manages that).
>
> What happens when you move to a distributed environment?
> The aggregator will aggregate both rescored and originally scored
> documents, making the final ranking inconsistent.
> On the other hand, if we make the reRankDocs threshold dynamic (to adapt
> to start+rows) we can incur the very annoying issue of having a document
> sliding through the pages (visible on the first page, then appearing
> again on the third, etc.).
>
> Any thoughts?
>
> Cheers
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>


Bug with bootstrap_confdir?

2016-09-09 Thread M Skazy
Hi,

I was having an issue setting up a Solr instance w/ an external Zookeeper.
My SOLR_HOME is not set to the default location.  I believe the problem is
related to the following line, and I wanted to confirm whether this is a bug:

https://github.com/apache/lucene-solr/blob/master/solr/bin/solr#L1383

It would seem that if we're checking whether a file exists at a position
relative to SOLR_HOME, then the path supplied for bootstrap_confdir should
also be rooted at SOLR_HOME instead of at the working directory of the Solr
process.

For me this translated into errors when solr started as it was trying to
load a configuration into ZK from a directory that did not exist.

The fix is easy and I can create a patch for this if it is decided that
this is a bug.

Thanks,

Mike


Re: Solr slow response collection-High Load

2016-09-09 Thread Ankush Khanna
Thanks Erick and Kshitij.
Would try both the options and see what works best.

Regards
Ankush Khanna

On Fri, 9 Sep 2016 at 16:33 Erick Erickson  wrote:

> The soft commit interval governs opening new
> searchers, which should be "warmed" in order to
> load up caches. My guess is that you're not doing much
> warming and thus seeing long search times.
>
> Most attachments are stripped by the mail server,
> if you want people to see the images put them up somewhere
> and provide a link is the usual practice.
>
> Have you examined the options here?
> https://wiki.apache.org/solr/SolrPerformanceFactors
>
> Best,
> Erick
>
> On Fri, Sep 9, 2016 at 2:10 AM, kshitij tyagi
>  wrote:
> > Hi Ankush,
> >
> > As you are updating heavily on one of the cores, hard commits will play
> > a major role.
> >
> > Reason: during hard commits Solr merges your segments, and this is a
> > time-consuming process.
> >
> > During segment merging, the indexing of documents is affected, i.e. it
> > gets slower.
> >
> > Try figuring out the right number of segments you need to have, and focus
> > on analysing the merge process of Solr when you are updating a high
> > amount of data.
> >
> > You will need to find the correct time for hard commits and the required
> > number of segments for the collection.
> >
> > Hope this helps.
> >
> >
> >
> > On Fri, Sep 9, 2016 at 2:13 PM, Ankush Khanna 
> wrote:
> >
> >> Hello,
> >>
> >> We are running some tests to improve our Solr performance.
> >>
> >> We have around 15 collections on our solr cluster.
> >> But we are particularly interested in one collection holding high
> amount of
> >> documents. (
> >> https://gist.github.com/AnkushKhanna/9a472bccc02d9859fce07cb0204862da)
> >>
> >> Issue:
> >> We see that there are high response times from the collection, for the
> >> same queries, when user load or update load is increased.
> >>
> >> What are we aiming for:
> >> Low response time (lower than 3 sec) in high update/traffic.
> >>
> >> Current collection, production:
> >> * Solr Cloud, 2 Shards 2 Replicas
> >> * Indexed: 5.4 million documents
> >> * 45 indexed fields per document
> >> * Soft commit: 5 seconds
> >> * Hard commit: 10 minutes
> >>
> >> Test Setup:
> >> * Indexed: 3 million documents
> >> * Rest is same as in production
> >> * Using gatling to mimic behaviour of updates and user traffic
> >>
> >> Finding:
> >> We see the problem occurring more often when:
> >> * query size is greater than 2000 characters (we can limit the search to
> >> 2000 characters, but is there a solution to do this without limiting the
> >> size)
> >> * there is high updates going on
> >> * high user traffic
> >>
> >> Some settings I explored:
> >> * 1 Shard and 3 Replicas
> >> * Hard commit: 5 minutes (Referencing
> >> https://lucidworks.com/blog/2013/08/23/understanding-
> >> transaction-logs-softcommit-and-commit-in-sorlcloud/
> >> )
> >>
> >> With both the above solutions we see some improvements, but not drastic.
> >> (Attach images)
> >>
> >> I would like to have more insights into the following questions:
> >> * Why is there an improvement when lowering the hard commit time? Would
> >> it be interesting to explore an even lower hard commit time?
> >>
> >> Can someone provide other pointers I could explore?
> >>
> >> Regards
> >> Ankush Khanna
> >>
>


Question: Invalid results on full-text search with a combination of numbers and dots in search terms

2016-09-09 Thread Sambeau PRAK
Hello,


I get no results on full-text searches with a combination of numbers and dots
in the search terms (example: 304.411).

Does Lucene core (version 4.1.3) have limits here, or am I missing parameters?
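
For what it's worth, whether "304.411" survives as a single token depends on
the field's analysis chain (the Analysis screen in the Solr Admin UI shows
this). A sketch of a chain that keeps it as one token, since the standard
tokenizer does not split a number on an inner dot (the field type name is
illustrative):

<fieldType name="text_general" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

A chain using something like WordDelimiterFilterFactory may instead split
"304.411" into "304" and "411" on one side (index or query) but not the other,
which produces exactly this kind of zero-result search.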


Thanks in advance,

Sambeau PRAK
Efalia (DMS editor)
sp...@efalia.com


ConcurrentUpdateSolrClient threads

2016-09-09 Thread Rallavagu

All,

Running Solr 5.4.1 with embedded Jetty, with frequent updates coming in 
and softCommit set to 10 min. What I am noticing is occasional "slow" 
updates (taking 8 to 15 seconds sometimes) and, at about the same time, slow 
QTimes. Upon investigating, it appears that 
"ConcurrentUpdateSolrClient:blockUntilFinished:429" is waiting on a thread 
to be free. Looking at https://issues.apache.org/jira/browse/SOLR-8500, 
it appears that it provides an option to increase the number of 
threads, which might help with managing more updates without having to 
wait (though that needs an update of Solr to 5.5). I could not figure out the 
default number of threads for the ConcurrentUpdateSolrClient class. Before I 
try increasing the number of threads, I am wondering if there are any 
"gotchas" in increasing the thread count, and what a reasonable number of 
threads would be if so?



org.apache.solr.update.SolrCmdDistributor:finish:90 (method time = 0 ms, total time = 7489 ms)
  org.apache.solr.update.SolrCmdDistributor:blockAndDoRetries:232 (method time = 0 ms, total time = 7489 ms)
    org.apache.solr.update.StreamingSolrClients:blockUntilFinished:107 (method time = 0 ms, total time = 7489 ms)
      org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient:blockUntilFinished:429 (method time = 0 ms, total time = 7489 ms)
        java.lang.Object:wait (method time = 7489 ms, total time = 7489 ms)
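
For reference, on the plain SolrJ client side the thread count is a
constructor argument; a sketch (the URL, queue size, and thread count are
placeholders; SOLR-8500 is about making the equivalent server-side count
configurable):

import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;

// queueSize = 10000 buffered updates, threadCount = 4 sender threads
ConcurrentUpdateSolrClient client = new ConcurrentUpdateSolrClient(
    "http://server:8983/solr/collection1", 10000, 4);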


Thanks in advance


Re: Strange error when I try to copy....

2016-09-09 Thread Bruno Mannina

On 09/09/2016 at 17:57, Shawn Heisey wrote:

On 9/8/2016 9:41 AM, Bruno Mannina wrote:

- I stop SOLR 5.4 on Ubuntu 14.04LTS - 16Go - i3-2120 CPU @ 3.30Ghz

- I do a simple directory copy /data to my HDD backup (from 2To SATA
to 2To SATA directly connected to the Mothercard).

All files are copied fine but one not ! the biggest (~65Go) failed.

I have the message : "Error splicing file: Input/output error"

This isn't a Solr issue, which is easy to determine by the fact that
you've stopped Solr and it's not even running.  It's a problem with the
filesystem, probably the destination filesystem.

The most common reason that I have found for this error is a destination
filesystem that is incapable of holding a large file -- which can happen
when the disk is formatted fat32 instead of ntfs or a Linux filesystem.
You can have a 2TB filesystem with fat32, but no files larger than 4GB
-- so your 65GB file won't fit.

I think you're going to need to reformat that external drive with
another filesystem.  If you choose NTFS, you'll be able to use the disk
on either Linux or Windows.

Thanks,
Shawn



Hi Shawn,

First, thanks for your answer; that indeed makes it clearer.
Tonight I will check the file system of my HDD.

And sorry for this question being off the Solr subject.

Cdlt,
Bruno






Re: Strange error when I try to copy....

2016-09-09 Thread Shawn Heisey
On 9/8/2016 9:41 AM, Bruno Mannina wrote:
> - I stop SOLR 5.4 on Ubuntu 14.04LTS - 16Go - i3-2120 CPU @ 3.30Ghz
>
> - I do a simple directory copy /data to my HDD backup (from 2To SATA
> to 2To SATA directly connected to the Mothercard).
>
> All files are copied fine but one not ! the biggest (~65Go) failed.
>
> I have the message : "Error splicing file: Input/output error"

This isn't a Solr issue, which is easy to determine by the fact that
you've stopped Solr and it's not even running.  It's a problem with the
filesystem, probably the destination filesystem.

The most common reason that I have found for this error is a destination
filesystem that is incapable of holding a large file -- which can happen
when the disk is formatted fat32 instead of ntfs or a Linux filesystem. 
You can have a 2TB filesystem with fat32, but no files larger than 4GB
-- so your 65GB file won't fit.
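
For example, to check the filesystem type of the backup mount on Linux (the
mount point below is a placeholder):

df -T /media/backup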

I think you're going to need to reformat that external drive with
another filesystem.  If you choose NTFS, you'll be able to use the disk
on either Linux or Windows.

Thanks,
Shawn



Strange error when I try to copy....

2016-09-09 Thread Bruno Mannina

Dear Solr Users,

I have been using Solr for several years, and for the past two weeks I have
had a problem when I try to copy my Solr index.

My solr index is around 180Go (~100 000 000 docs, 1 doc ~ 3ko)

My method to save my index every Sunday:

- I stop SOLR 5.4 on Ubuntu 14.04LTS - 16Go - i3-2120 CPU @ 3.30Ghz

- I do a simple directory copy /data to my HDD backup (from 2To SATA to
2To SATA directly connected to the Mothercard).

All files are copied fine but one not ! the biggest (~65Go) failed.

I have the message : "Error splicing file: Input/output error"

I also tried on Windows (I have a dual boot); there I get a "redundancy error".

I checked my HDD: no error. I checked the file "_k46.fdt": no error. I can
delete docs, add docs, and my database can be reached and works fine.

Does someone have an idea how to back up my database, or why I get this error?

Many thanks for your help,

Sincerely,

Bruno







Re: Solr Collection Create API queries

2016-09-09 Thread Erick Erickson
I think you're missing my point. The _feature_ may be there,
you'll have to investigate. But it is not named "smartCloud" or
 "autoManageCluster". Those terms
1> do not appear in the final patch.
2> do not appear in any file in Solr 6x.

They were suggested names; what the final implementation
used should be in the ref guide, although I admit the latter
sometimes lags.

Best,
Erick

On Fri, Sep 9, 2016 at 7:51 AM, Swathi Singamsetty
 wrote:
> I am working on solr 6.0.0 to implement this feature.
> I had a chat with Anshum and confirmed that this feature is available in
> 6.0.0 version.
>
>
> The functionality is to allow the overseer to bring up the minimum no. of
> replicas for each shard as per the replicationFactor that is set.
>
> I will look into the ref guide as well.
>
> Thanks,
> Swathi.
>
> On Friday, September 9, 2016, Erick Erickson 
> wrote:
>
>> You cannot just pick arbitrary parts of a JIRA discussion
>> and expect them to work. JIRAs are places where
>> discussion of alternatives takes place and the discussion
>> often suggests ideas that are not incorporated
>> in the final patch. The patch for the JIRA you mentioned,
>> for instance, does not mention either of those parameters,
>> which implies that they were simply part of the discussion
>> and were never implemented.
>>
>> So this sounds like an "XY" problem. You're asking why
>> properties aren't persisted when you really want to take
>> advantage of some functionality. What is that functionality?
>>
>> BTW, I'd go by the ref guide rather than JIRAs unless you
>> examine the patch and see that the discussion was
>> implemented in the patch.
>>
>> Best,
>> Erick
>>
>> On Thu, Sep 8, 2016 at 9:33 PM, Swathi Singamsetty
>> > wrote:
>> > Hi Team,
>> >
>> > To implement the feature "Persist and use the
>> > replicationFactor,maxShardsPerNode at Collection&Shard level" am
>> following
>> > the steps mentioned in the jira ticket
>> > https://issues.apache.org/jira/browse/SOLR-4808.
>> >
>> > I used the "smartCloud" and "autoManageCluster" properties to create a
>> > collection in the create collection API to allow the overseer to bring up
>> > the minimum no. of replicas for each shard as per the replicationFactor
>> set
>> > . But these 2 properties did not persist in the cluster state. Could
>> > someone let me know how to use these properties in this feature?
>> >
>> >
>> >
>> > Thanks & Regards,
>> > Swathi.
>>


Re: Solr Collection Create API queries

2016-09-09 Thread Swathi Singamsetty
I am working on solr 6.0.0 to implement this feature.
I had a chat with Anshum and confirmed that this feature is available in
6.0.0 version.


The functionality is to allow the overseer to bring up the minimum no. of
replicas for each shard as per the replicationFactor that is set.

I will look into the ref guide as well.

Thanks,
Swathi.

On Friday, September 9, 2016, Erick Erickson 
wrote:

> You cannot just pick arbitrary parts of a JIRA discussion
> and expect them to work. JIRAs are places where
> discussion of alternatives takes place and the discussion
> often suggests ideas that are not incorporated
> in the final patch. The patch for the JIRA you mentioned,
> for instance, does not mention either of those parameters,
> which implies that they were simply part of the discussion
> and were never implemented.
>
> So this sounds like an "XY" problem. You're asking why
> properties aren't persisted when you really want to take
> advantage of some functionality. What is that functionality?
>
> BTW, I'd go by the ref guide rather than JIRAs unless you
> examine the patch and see that the discussion was
> implemented in the patch.
>
> Best,
> Erick
>
> On Thu, Sep 8, 2016 at 9:33 PM, Swathi Singamsetty
> > wrote:
> > Hi Team,
> >
> > To implement the feature "Persist and use the
> > replicationFactor,maxShardsPerNode at Collection&Shard level" am
> following
> > the steps mentioned in the jira ticket
> > https://issues.apache.org/jira/browse/SOLR-4808.
> >
> > I used the "smartCloud" and "autoManageCluster" properties to create a
> > collection in the create collection API to allow the overseer to bring up
> > the minimum no. of replicas for each shard as per the replicationFactor
> set
> > . But these 2 properties did not persist in the cluster state. Could
> > someone let me know how to use these properties in this feature?
> >
> >
> >
> > Thanks & Regards,
> > Swathi.
>


Re: Solr Collection Create API queries

2016-09-09 Thread Erick Erickson
You cannot just pick arbitrary parts of a JIRA discussion
and expect them to work. JIRAs are places where
discussion of alternatives takes place and the discussion
often suggests ideas that are not incorporated
in the final patch. The patch for the JIRA you mentioned,
for instance, does not mention either of those parameters,
which implies that they were simply part of the discussion
and were never implemented.

So this sounds like an "XY" problem. You're asking why
properties aren't persisted when you really want to take
advantage of some functionality. What is that functionality?

BTW, I'd go by the ref guide rather than JIRAs unless you
examine the patch and see that the discussion was
implemented in the patch.

Best,
Erick

On Thu, Sep 8, 2016 at 9:33 PM, Swathi Singamsetty
 wrote:
> Hi Team,
>
> To implement the feature "Persist and use the
> replicationFactor,maxShardsPerNode at Collection&Shard level" am following
> the steps mentioned in the jira ticket
> https://issues.apache.org/jira/browse/SOLR-4808.
>
> I used the "smartCloud" and "autoManageCluster" properties to create a
> collection in the create collection API to allow the overseer to bring up
> the minimum no. of replicas for each shard as per the replicationFactor set
> . But these 2 properties did not persist in the cluster state. Could
> someone let me know how to use these properties in this feature?
>
>
>
> Thanks & Regards,
> Swathi.


Re: Solr slow response collection-High Load

2016-09-09 Thread Erick Erickson
The soft commit interval governs opening new
searchers, which should be "warmed" in order to
load up caches. My guess is that you're not doing much
warming and thus seeing long search times.

Most attachments are stripped by the mail server,
if you want people to see the images put them up somewhere
and provide a link is the usual practice.

Have you examined the options here?
https://wiki.apache.org/solr/SolrPerformanceFactors
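
As a sketch, warming is configured in solrconfig.xml with newSearcher
listener queries (the query below is a placeholder), alongside the caches'
autowarmCount settings:

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">some popular query</str><str name="rows">10</str></lst>
  </arr>
</listener>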

Best,
Erick

On Fri, Sep 9, 2016 at 2:10 AM, kshitij tyagi
 wrote:
> Hi Ankush,
>
> As you are updating heavily on one of the cores, hard commits will play a
> major role.
>
> Reason: during hard commits Solr merges your segments, and this is a
> time-consuming process.
>
> During segment merging, the indexing of documents is affected, i.e. it gets
> slower.
>
> Try figuring out the right number of segments you need to have, and focus on
> analysing the merge process of Solr when you are updating a high amount of
> data.
>
> You will need to find the correct time for hard commits and the required
> number of segments for the collection.
>
> Hope this helps.
>
>
>
> On Fri, Sep 9, 2016 at 2:13 PM, Ankush Khanna  wrote:
>
>> Hello,
>>
>> We are running some tests to improve our Solr performance.
>>
>> We have around 15 collections on our solr cluster.
>> But we are particularly interested in one collection holding high amount of
>> documents. (
>> https://gist.github.com/AnkushKhanna/9a472bccc02d9859fce07cb0204862da)
>>
>> Issue:
>> We see that there are high response times from the collection, for the same
>> queries, when user load or update load is increased.
>>
>> What are we aiming for:
>> Low response time (lower than 3 sec) in high update/traffic.
>>
>> Current collection, production:
>> * Solr Cloud, 2 Shards 2 Replicas
>> * Indexed: 5.4 million documents
>> * 45 indexed fields per document
>> * Soft commit: 5 seconds
>> * Hard commit: 10 minutes
>>
>> Test Setup:
>> * Indexed: 3 million documents
>> * Rest is same as in production
>> * Using gatling to mimic behaviour of updates and user traffic
>>
>> Finding:
>> We see the problem occurring more often when:
>> * query size is greater than 2000 characters (we can limit the search to
>> 2000 characters, but is there a solution to do this without limiting the
>> size)
>> * there is high updates going on
>> * high user traffic
>>
>> Some settings I explored:
>> * 1 Shard and 3 Replicas
>> * Hard commit: 5 minutes (Referencing
>> https://lucidworks.com/blog/2013/08/23/understanding-
>> transaction-logs-softcommit-and-commit-in-sorlcloud/
>> )
>>
>> With both the above solutions we see some improvements, but not drastic.
>> (Attach images)
>>
>> I would like to have more insights into the following questions:
>> * Why is there an improvement when lowering the hard commit time? Would it
>> be interesting to explore an even lower hard commit time?
>>
>> Can someone provide some other pointers I could explore?
>>
>> Regards
>> Ankush Khanna
>>


Solr Configuration for Hortonworks HA

2016-09-09 Thread Heybati Farhad
Hi All,

We are implementing the Hortonworks Standby NameNode and I'm wondering how to
configure Solr to point to the cluster name instead of the NameNode
hostname?

 

I tried to configure Solr in several ways without success:
1) Using the cluster name
2) Using a ","-separated list of the hostnames of both the active and standby NameNodes
3) Using a ";"-separated list of the hostnames of both the active and standby NameNodes

Do you have any suggestions?

Thanks
Regards
Farhad


Re: Default stop word list

2016-09-09 Thread Emir Arnautovic
I would partially agree with Walter - having more resources allows us to
include stopwords in the index and let the scoring model do its job. However,
there are other Solr features that can suffer from that approach: e.g.
if you use edismax and mm=80%, then in the case of a query with stopwords you
can end up with only irrelevant results, because they survived mm while the
relevant ones did not, simply because they were missing the stopwords.


I would say that the decision should depend on the field type - if it is some
description, I would include StopFilterFactory, but if it is some title,
then keeping stopwords in the index is one way of making sure extreme titles
can be found. An alternative is to index it in different ways - analyzed,
string, shingles... - and combine those fields to find the best match without
losing "to be or not to be".
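
A minimal schema sketch of that multi-field idea (the field and type names
are made up for illustration, and "text_shingle" assumes a shingle-producing
analyzer):

<field name="title" type="text_general" indexed="true" stored="true"/>
<field name="title_exact" type="string" indexed="true" stored="false"/>
<field name="title_shingle" type="text_shingle" indexed="true" stored="false"/>
<copyField source="title" dest="title_exact"/>
<copyField source="title" dest="title_shingle"/>

Then query with edismax and something like
qf=title title_exact^5 title_shingle^2 so the more exact matches win.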


Regards,
Emir


On 08.09.2016 18:21, Walter Underwood wrote:

I recommend that you remove StopFilterFactor from every analysis chain.

In the tf.idf scoring model, rare words are automatically weighted more than 
common words.

I have an index with 11.6 million documents. “the” occurs in 9.9 million of 
those documents. “cat” occurs in 16,000 of those documents. (I just did 
searches to get the counts).

This is the idf (inverse document frequency) formula for Solr:

public float idf(int docFreq, int numDocs) {
  return (float)(Math.log(numDocs/(double)(docFreq+1)) + 1.0);
}
“the” has an idf of 1.07. “cat” has an idf of 3.86.

The term “the” still counts for relevance, but it is dominated by the weight 
for “cat”.
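
For anyone who wants to reproduce those numbers, here is a small standalone
sketch (not Lucene code; as a side note, the 1.07/3.86 figures match a
base-10 log, while java.lang.Math.log in the formula above is the natural
log):

public class IdfCheck {
    // idf with a base-10 log, which reproduces the 1.07 / 3.86 figures
    static double idf(long docFreq, long numDocs) {
        return Math.log10((double) numDocs / (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        System.out.println(idf(9_900_000L, 11_600_000L)); // "the" -> ~1.07
        System.out.println(idf(16_000L, 11_600_000L));    // "cat" -> ~3.86
    }
}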

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



On Sep 8, 2016, at 7:09 AM, Steven White  wrote:

Hi Walter and all.  Sorry for the late reply, I was out of town.

Are you saying the list of stop words should be removed from the stop word
file?  I understand the issues I will run into because of the stop word list,
but all along, my understanding of why the stop words are in the stop word
file -- to eliminate them from being indexed -- is that it improves relevancy
ranking.  For example, if I index the word "the" instead of removing it,
then when I send the search term "the cat" (without quotes), records
with "the" will rank far higher than records with "cat" in my result set.
In fact records with "cat" may not even be on the first page.  Wasn't this
why the stop word list was created?

If my understanding is correct, is there a way for me to rank lower the
records that have a hit only due to a list of common words, such as stop
words?  This way: (1) I can then get rid of the whole stop word list in the
stop word file, (2) solve the issue of searching on "be with me", et al., and
(3) prevent the ranking issue.

Steve

On Mon, Aug 29, 2016 at 9:18 PM, Walter Underwood 
wrote:


Do not remove stop words. Want to search for “vitamin a”? That won’t work.

Stop word removal is a hack left over from when we were running search
engines in 64 kbytes of memory.

Yes, common words are less important for search, but removing them is a
brute force approach with severe side effects. Instead, we use a
proportional approach with the tf.idf model. That puts a higher weight on
rare words and a lower weight on common words.

For some real-life examples of problems with stop words, you can read the
list of movie titles that disappear with stemming and stop words. I
discovered these when I was running search at Netflix.

• Being There (this is the first one I noticed)
• To Be and To Have (Être et Avoir)
• To Have and To Have Not
• Once and Again
• To Be or Not To Be (1942) (OK, it isn’t just a quote from Hamlet)
• To Be or Not To Be (1983)
• Now and Then, Here and There
• Be with Me
• I’ll Be There
• It Had to Be You
• You Should Not Be Here
• You Are Here

https://observer.wunderwood.org/2007/05/31/do-all-stopword-queries-matter/

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



On Aug 29, 2016, at 5:39 PM, Steven White  wrote:

Thanks Shawn.  This is the best answer I have seen, much appreciated.

A follow up question: I want to remove stop words from the list, but if I
do, then search quality will degrade (and index size will grow (less of
an issue)).  For example, if I remove "a", then if someone searches for "For
a Few Dollars More" (without quotes) chances are good that records with "a"
that are not relevant to the user's search will land higher up.  How can I
address this?  Can I set up my schema so that records that get hits against
a list of words, let's say off the stop word list, are ranked lower?

Steve

On Sat, Aug 27, 2016 at 2:53 PM, Shawn Heisey wrote:

On 8/27/2016 12:39 PM, Shawn Heisey wrote:

I personally think that stopword removal is more of a problem than a
solution.

There actually is one thing that a stopword filter can do that has little
to do with the purpose it was designed for.  You can make it impossible
to search for certain words.
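
For example, a sketch of an extra StopFilterFactory (in both the index and
query analyzers) whose words file -- the file name here is made up -- lists
only the terms you want to make unsearchable:

<filter class="solr.StopFilterFactory" words="blocked-words.txt" ignoreCase="true"/>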

[Rerank Query] Distributed search + pagination

2016-09-09 Thread Alessandro Benedetti
Hi guys,
I was just experimenting with a reranker with a really low number of rerank
docs (10 = pageSize).
Let's focus on the distributed environment and the manual sharding approach.

Currently what happens is that the reranking task is carried out by the
shards: they rescore the docs and then send them back to the aggregator
node.

If you want to rerank only a few docs (leaving the others to follow with
their original score), this can be done in a single Solr instance (the
howMany logic in the reranker manages that).
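
For reference, this is the kind of request I mean (assuming the stock
ReRankQParserPlugin; the rescoring query is just a placeholder):

q=*:*&rows=10&rq={!rerank reRankQuery=$rqq reRankDocs=10 reRankWeight=3}&rqq=(the expensive rescoring query)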

What happens when you move to a distributed environment?
The aggregator will merge both rescored and originally scored documents,
making the final ranking inconsistent.
On the other hand, if we make the reRankDocs threshold dynamic (to adapt
to start+rows) we can run into the very annoying issue of having a document
sliding through the pages (visible on the first page, then appearing
again on the third, etc.).

Any thought ?

Cheers

-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Sorting on DateRangeField?

2016-09-09 Thread David Smiley
Hi Alex,

DateRangeField extends some spatial stuff, which is where that error message
comes from, not from DateRangeField proper.  You cannot sort on a
DateRangeField.  If you want to... try adding either one plain docValues date
field if you just have date instances, or a pair of them to hold a min & max,
and pick the right one to sort on.
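
A minimal schema sketch of the min/max pair (field names are made up,
assuming a TrieDateField-backed "date" type):

<field name="release_date_min" type="date" indexed="true" stored="true" docValues="true"/>
<field name="release_date_max" type="date" indexed="true" stored="true" docValues="true"/>

Then sort=release_date_min asc (or release_date_max desc) while still
querying/filtering on the DateRangeField itself.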

The "sorting by the query" in the context of spatial refers to doing a
score-sorted sort, noting that the score of a spatial query can be the
distance or some formula involving the distance or possibly overlap of the
shape with something else.  e.g.  q={!geofilt score=distance ...}  This
is documented in the ref guide on the spatial page, including an example
for BBoxField.

&q={!field f=bbox score=overlapRatio}Intersects(ENVELOPE(-10, 20, 15, 10))


I think that example could be simpler using {!bbox}, but it probably wants
to show different ways to skin this cat, so to speak.

~ David

On Wed, Sep 7, 2016 at 1:49 PM Alexandre Rafalovitch 
wrote:

> So, I tried sorting on a DateRangeField. And I got back:  "Sorting not
> supported on SpatialField: release_date, instead try sorting by
> query."
>
> Two questions:
> 1) Spatial is kind of super-internal info here, the message is rather
> confusing.
> 2) What's "sorting by query" in this case? Can I still sort on the
> field, but with a different syntax?
>
> Regards,
>Alex.
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


solr json facets return format

2016-09-09 Thread Michael Aleythe, Sternwald
Hi everybody,

I'm currently working with the JSON Request API for Solr and hit a problem
using facets. I'm using Solr 5.5.2 and SolrJ 5.5.2.

When querying Solr by URL parameters like so:
http://.../select?wt=json&facet.range=MEDIA_TS&f.MEDIA_TS.facet.range.end=2028-02-01T0:00:00.000Z&f.MEDIA_TS.facet.range.gap=%2B1YEAR&f.MEDIA_TS.facet.range.start=1993-01-01T0:00:00.000Z&facet=true

the returned JSON contains an element called "facet_counts", which is the top
element for all faceting information:


:   "facet_counts":
:   {
:   :   "facet_queries":
:   :   {
:   :   },
:   :   "facet_fields":
:   :   {
:   :   },
:   :   "facet_dates":
:   :   {
:   :   },
:   :   "facet_ranges":
:   :   {
:   :   :   "MEDIA_TS":
:   :   :   {
:   :   :   :   "counts":
:   :   :   :   [
:   :   :   :   :   "1993-01-01T00:00:00Z",
:   :   :   :   :   0,
:   :   :   :   :   "1994-01-01T00:00:00Z",
:   :   :   :   :   1634,
:   :   :   :   :   "1995-01-01T00:00:00Z",
:   :   :   :   :   6656,
:   :   :   :   :   "1996-01-01T00:00:00Z",
:   :   :   :   :   30016,
:   :   :   :   :   "1997-01-01T00:00:00Z",
:   :   :   :   :   76819,
:   :   :   :   :   "1998-01-01T00:00:00Z",
:   :   :   :   :   152099,

The same query using the JSON Request API like so:

{"facet":{"MEDIA_TS":{"field":"MEDIA_TS","gap":"+1YEAR","start":"1993-01-01T00:00:00Z","end":"2028-01-01T00:00:00Z","type":"range"}}}

returns an element called "facets", which is the top element for all faceting
information:

:   "facets":
:   {
:   :   "count":5815481,
:   :   "MEDIA_TS":
:   :   {
:   :   :   "buckets":
:   :   :   [
:   :   :   :   {
:   :   :   :   :   "val":"1993-01-01T00:00:00Z",
:   :   :   :   :   "count":0
:   :   :   :   },
:   :   :   :   {
:   :   :   :   :   "val":"1994-01-01T00:00:00Z",
:   :   :   :   :   "count":1634
:   :   :   :   },
:   :   :   :   {
:   :   :   :   :   "val":"1995-01-01T00:00:00Z",
:   :   :   :   :   "count":6656
:   :   :   :   },

This inconsistency breaks the response parser of SolrJ. Am I doing something
wrong?


Best Regards


Michael Aleythe
Java Entwickler | STERNWALD SYSTEMS GMBH




Re: changed query parsing between 4.10.4 and 5.5.3?

2016-09-09 Thread Bernd Fehling
After some more testing it feels like the parsing in 5.5.3 is _really_ messed 
up.

Query version 4.10.4:

rawquerystring: (text:(star AND trek AND wars)^200 OR text:("star trek wars")^350)

querystring: (text:(star AND trek AND wars)^200 OR text:("star trek wars")^350)

parsedquery: (+(((+text:star +text:trek +text:war)^200.0) PhraseQuery(text:"star trek war"^350.0)))/no_coord

parsedquery_toString: +(((+text:star +text:trek +text:war)^200.0) text:"star trek war"^350.0)



Same query version 5.5.3:

rawquerystring: (text:(star AND trek AND wars)^200 OR text:("star trek wars")^350)

querystring: (text:(star AND trek AND wars)^200 OR text:("star trek wars")^350)

parsedquery: (+((+text:star +text:trek +text:war^200.0 PhraseQuery(text:"star trek war"))~2))/no_coord

parsedquery_toString: +(((+text:star +text:trek +text:war)^200.0 text:"star trek war"^350.0)~2)


As you can see, version 5.5.3's "parsedquery" is different from version 4.10.4's.

And why is parsedquery different from parsedquery_toString in version 5.5.3?

Where is my second boost in "parsedquery" of 5.5.3?


Bernd



Am 09.09.2016 um 08:44 schrieb Bernd Fehling:
> Hi Greg,
> 
> thanks a lot, thats it.
> After setting q.op to OR it works _nearly_ as before with 4.10.4.
> 
> But how stupid is this?
> I have <solrQueryParser defaultOperator="AND"/> in my schema
> and also had q.op set to AND to make sure my default _is_ AND,
> meant as the conjunction between terms.
> But now I have q.op set to OR and defaultOperator in the schema set to AND
> just to get _nearly_ my old behavior back.
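>
> In solrconfig.xml terms that means roughly this in the request handler
> defaults (a sketch -- the handler name and the rest of my config are
> assumed):
>
> <requestHandler name="/select" class="solr.SearchHandler">
>   <lst name="defaults">
>     <str name="defType">edismax</str>
>     <str name="q.op">OR</str>
>   </lst>
> </requestHandler>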
> 
> The schema has the following comment:
> "... The default is OR, which is generally assumed so it is
> not a good idea to change it globally here.  The "q.op" request
> parameter takes precedence over this. ..."
> 
> What I don't understand is why they changed some major internals
> and didn't give any notice about how to keep the old parsing behavior.
> 
> From my point of view the old parsing behavior was correct.
> If searching for a term without an operator it is always OR;
> you can add "+" or "-" to modify that. Now with q.op=AND it is
> modified to "+", i.e. a MUST.
> 
> I still get some differences in search results between 4.10.4 and 5.5.3.
> What other side effects does this change of q.op from AND to OR have in
> other parts of query handling, parsing and searching?
> 
> Regards
> Bernd
> 
> Am 09.09.2016 um 05:43 schrieb Greg Pendlebury:
>> I forgot to mention the tickets:
>> SOLR-2649 and SOLR-8812
>>
>> On 9 September 2016 at 13:38, Greg Pendlebury 
>> wrote:
>>
>>> Under 4.10 q.op was ignored by the edismax parser and always forced to OR.
>>> 5.5 is looking at the q.op=AND you requested.
>>>
>>> There are also some changes to the default values selected for mm, but I
>>> doubt those apply here since you are setting it explicitly.
>>>
>>> On 8 September 2016 at 00:35, Mikhail Khludnev  wrote:
>>>
 I suppose
+((text:star text:trek)~2)
 and
   +(+text:star +text:trek)
 are equal. mm=2 is equal to +foo +bar

 On Wed, Sep 7, 2016 at 10:52 AM, Bernd Fehling <
 bernd.fehl...@uni-bielefeld.de> wrote:

> Hi list,
>
> while going from SOLR 4.10.4 to 5.5.3 I noticed a change in query
> parsing.
> 4.10.4
> rawquerystring: text:star text:trek
> querystring: text:star text:trek
> parsedquery: (+((text:star text:trek)~2))/no_coord
> parsedquery_toString: +((text:star text:trek)~2)
>
> 5.5.3
> rawquerystring: text:star text:trek
> querystring: text:star text:trek
> parsedquery: (+(+text:star +text:trek))/no_coord
> parsedquery_toString: +(+text:star +text:trek)
>
> There are very many new features and changes between these two versions.
> It looks like a change in query parsing.
> Can someone point me to the Solr or Lucene JIRA about the changes?
> Or even give a hint on how to get my "old" query parsing back?
>
> Regards
> Bernd
>



 --
 Sincerely yours
 Mikhail Khludnev



Re: Solr slow response collection-High Load

2016-09-09 Thread kshitij tyagi
Hi Ankush,

As you are updating heavily on one of the cores, hard commit will play a
major role.

Reason: During hard commits Solr merges your segments and this is a
time-consuming process.

During merging of segments indexing of documents gets affected i.e. gets
slower.

Try figuring out the right number of segments you need to have and focus on
analysing the merge process of Solr when you are updating a high amount of
data.

You will need to find the correct time for hard commits and the required
number of segments for the collection.
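
For example, a hard commit setting along these lines in solrconfig.xml (the
interval is just a starting point, not a recommendation):

<autoCommit>
  <maxTime>300000</maxTime>            <!-- hard commit every 5 minutes -->
  <openSearcher>false</openSearcher>   <!-- flush without opening a new searcher -->
</autoCommit>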

Hope this helps.



On Fri, Sep 9, 2016 at 2:13 PM, Ankush Khanna  wrote:

> Hello,
>
> We are running some tests for improving our Solr performance.
>
> We have around 15 collections on our Solr cluster.
> But we are particularly interested in one collection holding a high amount of
> documents. (
> https://gist.github.com/AnkushKhanna/9a472bccc02d9859fce07cb0204862da)
>
> Issue:
> We see high response times from the collection, for the same
> queries, when user load or update load is increased.
>
> What are we aiming for:
> Low response time (lower than 3 sec) under high update/user traffic.
>
> Current collection, production:
> * Solr Cloud, 2 Shards 2 Replicas
> * Indexed: 5.4 million documents
> * 45 indexed fields per document
> * Soft commit: 5 seconds
> * Hard commit: 10 minutes
>
> Test Setup:
> * Indexed: 3 million documents
> * Rest is same as in production
> * Using Gatling to mimic the behaviour of updates and user traffic
>
> Finding:
> We see the problem occurring more often when:
> * query size is greater than 2000 characters (we can limit the search to
> 2000 characters, but is there a solution to do this without limiting the
> size)
> * there are heavy updates going on
> * high user traffic
>
> Some settings I explored:
> * 1 Shard and 3 Replicas
> * Hard commit: 5 minutes (Referencing
> https://lucidworks.com/blog/2013/08/23/understanding-
> transaction-logs-softcommit-and-commit-in-sorlcloud/
> )
>
> With both the above solutions we see some improvements, but not drastic.
> (Images attached)
>
> I would like to have more insights into the following questions:
> * Why is there an improvement when lowering the hard commit time? Would it
> be interesting to explore an even lower hard commit time?
>
> Can someone provide some other pointers I could explore?
>
> Regards
> Ankush Khanna
>


Solr slow response collection-High Load

2016-09-09 Thread Ankush Khanna
Hello,

We are running some tests for improving our Solr performance.

We have around 15 collections on our Solr cluster.
But we are particularly interested in one collection holding a high amount of
documents. (
https://gist.github.com/AnkushKhanna/9a472bccc02d9859fce07cb0204862da)

Issue:
We see high response times from the collection, for the same
queries, when user load or update load is increased.

What are we aiming for:
Low response time (lower than 3 sec) under high update/user traffic.

Current collection, production:
* Solr Cloud, 2 Shards 2 Replicas
* Indexed: 5.4 million documents
* 45 indexed fields per document
* Soft commit: 5 seconds
* Hard commit: 10 minutes

Test Setup:
* Indexed: 3 million documents
* Rest is same as in production
* Using Gatling to mimic the behaviour of updates and user traffic

Finding:
We see the problem occurring more often when:
* query size is greater than 2000 characters (we can limit the search to
2000 characters, but is there a solution to do this without limiting the
size)
* there are heavy updates going on
* high user traffic

Some settings I explored:
* 1 Shard and 3 Replicas
* Hard commit: 5 minutes (Referencing
https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
)

With both the above solutions we see some improvements, but not drastic.
(Images attached)

I would like to have more insights into the following questions:
* Why is there an improvement when lowering the hard commit time? Would it
be interesting to explore an even lower hard commit time?

Can someone provide some other pointers I could explore?

Regards
Ankush Khanna