Re: Performance on faceting using docValues

2015-03-05 Thread Toke Eskildsen
On Thu, 2015-03-05 at 21:14 +0100, lei wrote:

You present a very interesting observation. I have not noticed what you
describe, but on the other hand we have not done comparative speed
tests.

> q=*:*&fq=country:"US"&fq=category:112

First observation: Your query is '*:*', which is a "magic" query. Non-DV
faceting has optimizations both for this query (although that ought to
be disabled due to the fq) and for the "inverse" case where there are
more hits than non-hits. Perhaps you could test with a handful of
queries that have different result sizes?
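
For instance (field names taken from your own query; the relative
result sizes are just guesses):

q=*:*                             (everything)
q=country:"US"                    (large subset)
q=country:"US" AND category:112   (small subset)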

> &facet=on&facet.sort=index&facet.mincount=1&facet.limit=2000

The combination of index order and a high limit might be an explanation:
When resolving the Strings of the facet result, non-DV will perform
ordinal lookups, which are fast when done in monotonically rising order
(sort=index) and when the values are close together (limit=2000). I do
not know if DV benefits the same way.

On the other hand, your limit seems to apply only to material, so it
could be that the real number of unique values is low and you just set
the limit to 2000 to be sure you get everything?

> &facet.field=manufacturer&facet.field=seller&facet.field=material
> &f.manufacturer.facet.mincount=1&f.manufacturer.facet.sort=count&f.manufacturer.facet.limit=100
> &f.seller.facet.mincount=1&f.seller.facet.sort=count&f.seller.facet.limit=100
> &f.material.facet.mincount=1&sort=score+desc

How large is your index in bytes, how many documents does it contain, and
is it single-shard or cloud? Could you paste the log lines containing
"UnInverted field", which describe the number of unique values and the
size of your facet fields?
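
(Such a line looks roughly like the sketch below; the field name and all
numbers are made up for illustration:

UnInverted multi-valued field
{field=manufacturer,memSize=4215424,tindexSize=52,time=143,phase1=121,nTerms=14,bigTerms=0,termInstances=114,uses=0}
)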

- Toke Eskildsen, State and University Library, Denmark




Re: problem with tutorial

2015-03-05 Thread gaohang wang
Do you deploy your Solr in Tomcat? Which Tomcat port do you use?

2014-12-16 15:45 GMT+08:00 Xin Cai :

> hi Everyone
> I am a complete noob when it comes to Solr and when I try to follow the
> tutorial and run Solr I get the error message
>
> "Waiting to see Solr listening on port 8983 [-]  Still not seeing Solr
> listening on 8983 after 30 seconds!"
>
> I did some googling and all I found were instructions for removing grep
> commands, which doesn't sound right to me... I have checked my ports and
> currently I don't have any service listening on port 8983, and my firewall
> is not on, so I am not sure what is happening. Any help would be
> appreciated. Thanks
>
> Xin Cai
>


Re: Admin UI doesn't show logs?

2015-03-05 Thread Alexandre Rafalovitch
And given that you configured it under Tomcat, I'd check that the logs
are generated at all first. Just as a sanity check.

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 5 March 2015 at 20:15, Shawn Heisey  wrote:
> On 3/5/2015 6:01 PM, Jakov Sosic wrote:
>> I'm running 4.10.3 under tomcat 7, and I have an issue with Admin UI.
>>
>> When I click on a "Logging" - I don't see actual entries but only:
>>
>>
>>"No Events available"
>
> The logging tab in the admin UI only shows log entries where the
> severity of the log is at least WARN.  The default file-level logging
> setup in the example logs a lot more -- it is normally set to INFO, and
> a normal startup will generate hundreds or thousands of log entries at
> the INFO level, which would be overwhelming to view in a web browser.
> That's why they are only logged to a file named ./logs/solr.log, if you
> have the log4j.properties file included in the example.
>
> I believe there is a way to configure Solr so that the admin UI will
> show you everything, but trust me when I say that you most likely do
> not want those log entries to be in the admin UI, because there are a
> LOT of them.
>
> Thanks,
> Shawn
>


Re: How to start solr in solr cloud mode using external zookeeper ?

2015-03-05 Thread Aman Tandon
Thanks shamik :)

With Regards
Aman Tandon

On Fri, Mar 6, 2015 at 3:30 AM, shamik  wrote:

> The other way you can do that is to specify the startup parameters in
> solr.in.sh.
>
> Example :
>
> SOLR_MODE=solrcloud
>
> ZK_HOST="zoohost1:2181,zoohost2:2181,zoohost3:2181"
>
> SOLR_PORT=4567
>
> You can simply start solr by running "./solr start"
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-start-solr-in-solr-cloud-mode-using-external-zookeeper-tp4190630p4191286.html
>


Re: Admin UI doesn't show logs?

2015-03-05 Thread Shawn Heisey
On 3/5/2015 6:01 PM, Jakov Sosic wrote:
> I'm running 4.10.3 under tomcat 7, and I have an issue with Admin UI.
>
> When I click on a "Logging" - I don't see actual entries but only:
>
>
>"No Events available"

The logging tab in the admin UI only shows log entries where the
severity of the log is at least WARN.  The default file-level logging
setup in the example logs a lot more -- it is normally set to INFO, and
a normal startup will generate hundreds or thousands of log entries at
the INFO level, which would be overwhelming to view in a web browser. 
That's why they are only logged to a file named ./logs/solr.log, if you
have the log4j.properties file included in the example.

I believe there is a way to configure Solr so that the admin UI will
show you everything, but trust me when I say that you most likely do
not want those log entries to be in the admin UI, because there are a
LOT of them.
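
(If you really want to try it: recent 4.x solr.xml has a logging watcher
section. A sketch from memory, so treat the exact syntax as an assumption:

  <logging>
    <watcher>
      <int name="size">500</int>
      <str name="threshold">INFO</str>
    </watcher>
  </logging>

Log levels themselves can also be changed at runtime from the Level
sub-tab.)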

Thanks,
Shawn



Admin UI doesn't show logs?

2015-03-05 Thread Jakov Sosic

Hi,

I'm running 4.10.3 under tomcat 7, and I have an issue with Admin UI.

When I click on a "Logging" - I don't see actual entries but only:


   "No Events available"


and a round icon circling non-stop.

When I click on Level, I see the same icon and the message "Loading ...".



Is there a hint or something you could point me to, so I could fix it?


Re: Solrcloud Index corruption

2015-03-05 Thread Shawn Heisey
On 3/5/2015 3:13 PM, Martin de Vries wrote:
> I understand there is not a "master" in SolrCloud. In our case we use
> haproxy as a load balancer for every request. So when indexing every
> document will be sent to a different solr server, immediately after
> each other. Maybe SolrCloud is not able to handle that correctly?

SolrCloud can handle that correctly, but currently sending index updates
to a core that is not the leader of the shard will incur a significant
performance hit, compared to always sending updates to the correct
core.  A small performance penalty would be understandable, because the
request must be redirected, but what actually happens is a much larger
penalty than anyone expected.  We have an issue in Jira to investigate
that performance issue and make it work as efficiently as possible.

Indexing batches of documents is recommended, not sending one document
per update request.
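
A minimal SolrJ sketch of both points - CloudSolrServer sends each
document to the correct shard leader, and documents go in one batch.
Collection name, ZK addresses and field names here are made up:

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {
  public static void main(String[] args) throws Exception {
    // Connecting through ZooKeeper lets the client route each document
    // directly to its shard leader instead of relying on a load balancer.
    CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
    server.setDefaultCollection("mycollection");

    List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
    for (int i = 0; i < 1000; i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-" + i);
      doc.addField("title_t", "document " + i);
      batch.add(doc);
    }
    server.add(batch);  // one request for the whole batch, not 1000 requests
    server.commit();
    server.shutdown();
  }
}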

General performance problems with Solr itself can lead to extremely odd
and unpredictable behavior from SolrCloud.  Most often these kinds of
performance problems are related in some way to memory, either the java
heap or available memory in the system.

http://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn



Re: Solrcloud Index corruption

2015-03-05 Thread Mark Miller
If you google "replication can cause index corruption", there are two Jira issues 
that are the most likely cause of corruption in a SolrCloud env. 

- Mark

> On Mar 5, 2015, at 2:20 PM, Garth Grimm  
> wrote:
> 
> For updates, the document will always get routed to the leader of the 
> appropriate shard, no matter what server first receives the request.
> 
> -Original Message-
> From: Martin de Vries [mailto:mar...@downnotifier.com] 
> Sent: Thursday, March 05, 2015 4:14 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solrcloud Index corruption
> 
> Hi Erick,
> 
> Thank you for your detailed reply.
> 
> You say in our case some docs didn't make it to the node, but that's not 
> really true: the docs can be found on the corrupted nodes when I search on 
> ID. The docs are also complete. The problem is that the docs do not appear 
> when I filter on certain fields (however the fields are in the doc and have 
> the right value when I search on ID). So something seems to be corrupt in the 
> filter index. We will try the checkindex, hopefully it is able to identify 
> the problematic cores.
> 
> I understand there is not a "master" in SolrCloud. In our case we use haproxy 
> as a load balancer for every request. So when indexing every document will be 
> sent to a different solr server, immediately after each other. Maybe 
> SolrCloud is not able to handle that correctly?
> 
> 
> Thanks,
> 
> Martin
> 
> 
> 
> 
> Erick Erickson schreef op 05.03.2015 19:00:
> 
>> Wait up. There's no "master" index in SolrCloud. Raw documents are 
>> forwarded to each replica, indexed and put in the local tlog. If a 
>> replica falls too far out of synch (say you take it offline), then the 
>> entire index _can_ be replicated from the leader and, if the leader's 
>> index was incomplete then that might propagate the error.
>> 
>> The practical consequence of this is that if _any_ replica has a 
>> complete index, you can recover. Before going there though, the 
>> brute-force approach is to just re-index everything from scratch.
>> That's likely easier, especially on indexes this size.
>> 
>> Here's what I'd do.
>> 
>> Assuming you have the Collections API calls for ADDREPLICA and 
>> DELETEREPLICA, then:
>> 0> Identify the complete replicas. If you're lucky you have at least
>> one for each shard.
>> 1> Copy 1 good index from each shard somewhere just to have a backup.
>> 2> DELETEREPLICA on all the incomplete replicas
>> 2.5> I might shut down all the nodes at this point and check that all 
>> the cores I'd deleted were gone. If any remnants exist, 'rm -rf 
>> deleted_core_dir'.
>> 3> ADDREPLICA to get the ones removed in <2> back.
>> 
>> <3> should copy the entire index from the leader for each replica. As you 
>> do <2> the leadership will change and after you've deleted all the 
>> incomplete replicas, one of the complete ones will be the leader and 
>> you should be OK.
>> 
>> If you don't want to/can't use the Collections API, then
>> 0> Identify the complete replicas. If you're lucky you have at least
>> one for each shard.
>> 1> Shut 'em all down.
>> 2> Copy the good index somewhere just to have a backup.
>> 3> 'rm -rf data' for all the incomplete cores.
>> 4> Bring up the good cores.
>> 5> Bring up the cores that you deleted the data dirs from.
>> 
>> What <5> should do is replicate the entire index from the leader. When you 
>> restart the good cores (step 4 above), they'll _become_ the leader.
>> 
>> bq: Is it possible to make Solrcloud invulnerable for network problems 
>> I'm a little surprised that this is happening. It sounds like the 
>> network problems were such that some nodes weren't out of touch long 
>> enough for Zookeeper to sense that they were down and put them into 
>> recovery. Not sure there's any way to secure against that.
>> 
>> bq: Is it possible to see if a core is corrupt?
>> There's "CheckIndex", here's at least one link:
>> http://java.dzone.com/news/lucene-and-solrs-checkindex
>> What you're describing, though, is that docs just didn't make it to 
>> the node, _not_ that the index has unexpected bits, bad disk sectors 
>> and the like so CheckIndex can't detect that. How would it know what 
>> _should_ have been in the index?
>> 
>> bq: I noticed a difference in the "Gen" column on Overview - 
>> Replication. Does this mean there is something wrong?
>> You cannot infer anything from this. In particular, the merging will 
>> be significantly different between a single full-reindex and what the 
>> state of segment merges is in an incrementally built index.
>> 
>> The admin UI screen is rooted in the pre-cloud days, the Master/Slave 
>> thing is entirely misleading. In SolrCloud, since all the raw data is 
>> forwarded to all replicas, and any auto commits that happen may very 
>> well be slightly out of sync, the index size, number of segments, 
>> generations, and all that are pretty safely ignored.
>> 
>> Best,
>> Erick
>> 
>> On Thu, Mar 5, 2015 at 6:50 AM, Martin de Vries 
>> 
>> wrote:
>> 
>

Re: solr cloud does not start with many collections

2015-03-05 Thread Damien Kamerman
I've tried a few variations, with 3 x ZK, 6 x nodes, Solr 4.10.3 and Solr
5.0, without any success and no real difference. There is a tipping point
at around 3,000-4,000 cores (varies depending on hardware) from where I
can restart the cloud OK within ~4 min, to the cloud not working, with
continuous 'conflicting information about the leader of shard' warnings.

On 5 March 2015 at 14:15, Shawn Heisey  wrote:

> On 3/4/2015 5:37 PM, Damien Kamerman wrote:
> > I'm running on Solaris x86, I have plenty of memory and no real limits
> > # plimit 15560
> > 15560:  /opt1/jdk/bin/java -d64 -server -Xss512k -Xms32G -Xmx32G
> > -XX:MaxMetasp
> >resource  current maximum
> >   time(seconds) unlimited   unlimited
> >   file(blocks)  unlimited   unlimited
> >   data(kbytes)  unlimited   unlimited
> >   stack(kbytes) unlimited   unlimited
> >   coredump(blocks)  unlimited   unlimited
> >   nofiles(descriptors)  65536   65536
> >   vmemory(kbytes)   unlimited   unlimited
> >
> > I've been testing with 3 nodes, and that seems OK up to around 3,000
> cores
> > total. I'm thinking of testing with more nodes.
>
> I have opened an issue for the problems I encountered while recreating a
> config similar to yours, which I have been doing on Linux.
>
> https://issues.apache.org/jira/browse/SOLR-7191
>
> It's possible that the only thing the issue will lead to is improvements
> in the documentation, but I'm hopeful that there will be code
> improvements too.
>
> Thanks,
> Shawn
>
>


-- 
Damien Kamerman


RE: Solrcloud Index corruption

2015-03-05 Thread Garth Grimm
For updates, the document will always get routed to the leader of the 
appropriate shard, no matter what server first receives the request.

-Original Message-
From: Martin de Vries [mailto:mar...@downnotifier.com] 
Sent: Thursday, March 05, 2015 4:14 PM
To: solr-user@lucene.apache.org
Subject: Re: Solrcloud Index corruption

Hi Erick,

Thank you for your detailed reply.

> You say in our case some docs didn't make it to the node, but that's not really 
true: the docs can be found on the corrupted nodes when I search on ID. The 
docs are also complete. The problem is that the docs do not appear when I 
filter on certain fields (however the fields are in the doc and have the right 
value when I search on ID). So something seems to be corrupt in the filter 
index. We will try the checkindex, hopefully it is able to identify the 
problematic cores.

I understand there is not a "master" in SolrCloud. In our case we use haproxy 
as a load balancer for every request. So when indexing every document will be 
sent to a different solr server, immediately after each other. Maybe SolrCloud 
is not able to handle that correctly?


Thanks,

Martin




Erick Erickson schreef op 05.03.2015 19:00:

> Wait up. There's no "master" index in SolrCloud. Raw documents are 
> forwarded to each replica, indexed and put in the local tlog. If a 
> replica falls too far out of synch (say you take it offline), then the 
> entire index _can_ be replicated from the leader and, if the leader's 
> index was incomplete then that might propagate the error.
>
> The practical consequence of this is that if _any_ replica has a 
> complete index, you can recover. Before going there though, the 
> brute-force approach is to just re-index everything from scratch.
> That's likely easier, especially on indexes this size.
>
> Here's what I'd do.
>
> Assuming you have the Collections API calls for ADDREPLICA and 
> DELETEREPLICA, then:
> 0> Identify the complete replicas. If you're lucky you have at least
> one for each shard.
> 1> Copy 1 good index from each shard somewhere just to have a backup.
> 2> DELETEREPLICA on all the incomplete replicas
> 2.5> I might shut down all the nodes at this point and check that all 
> the cores I'd deleted were gone. If any remnants exist, 'rm -rf 
> deleted_core_dir'.
> 3> ADDREPLICA to get the ones removed in <2> back.
>
> <3> should copy the entire index from the leader for each replica. As you 
> do <2> the leadership will change and after you've deleted all the 
> incomplete replicas, one of the complete ones will be the leader and 
> you should be OK.
>
> If you don't want to/can't use the Collections API, then
> 0> Identify the complete replicas. If you're lucky you have at least
> one for each shard.
> 1> Shut 'em all down.
> 2> Copy the good index somewhere just to have a backup.
> 3> 'rm -rf data' for all the incomplete cores.
> 4> Bring up the good cores.
> 5> Bring up the cores that you deleted the data dirs from.
>
> What <5> should do is replicate the entire index from the leader. When you 
> restart the good cores (step 4 above), they'll _become_ the leader.
>
> bq: Is it possible to make Solrcloud invulnerable for network problems 
> I'm a little surprised that this is happening. It sounds like the 
> network problems were such that some nodes weren't out of touch long 
> enough for Zookeeper to sense that they were down and put them into 
> recovery. Not sure there's any way to secure against that.
>
> bq: Is it possible to see if a core is corrupt?
> There's "CheckIndex", here's at least one link:
> http://java.dzone.com/news/lucene-and-solrs-checkindex
> What you're describing, though, is that docs just didn't make it to 
> the node, _not_ that the index has unexpected bits, bad disk sectors 
> and the like so CheckIndex can't detect that. How would it know what 
> _should_ have been in the index?
>
> bq: I noticed a difference in the "Gen" column on Overview - 
> Replication. Does this mean there is something wrong?
> You cannot infer anything from this. In particular, the merging will 
> be significantly different between a single full-reindex and what the 
> state of segment merges is in an incrementally built index.
>
> The admin UI screen is rooted in the pre-cloud days, the Master/Slave 
> thing is entirely misleading. In SolrCloud, since all the raw data is 
> forwarded to all replicas, and any auto commits that happen may very 
> well be slightly out of sync, the index size, number of segments, 
> generations, and all that are pretty safely ignored.
>
> Best,
> Erick
>
> On Thu, Mar 5, 2015 at 6:50 AM, Martin de Vries 
> 
> wrote:
>
>> Hi Andrew, Even our master index is corrupt, so I'm afraid this won't 
>> help in our case. Martin
>>
>> Andrew Butkus schreef op 05.03.2015 16:45:
>>
>>> Force a fetchindex on slave from master command:
>>> http://slave_host:port/solr/replication?command=fetchindex - from 
>>> http://wiki.apache.org/solr/SolrReplication [1] The above command 
>>> will download the whole 

Re: Solrcloud Index corruption

2015-03-05 Thread Martin de Vries

Hi Erick,

Thank you for your detailed reply.

You say in our case some docs didn't make it to the node, but that's 
not really true: the docs can be found on the corrupted nodes when I 
search on ID. The docs are also complete. The problem is that the docs 
do not appear when I filter on certain fields (however the fields are in 
the doc and have the right value when I search on ID). So something 
seems to be corrupt in the filter index. We will try the checkindex, 
hopefully it is able to identify the problematic cores.


I understand there is not a "master" in SolrCloud. In our case we use 
haproxy as a load balancer for every request. So when indexing every 
document will be sent to a different solr server, immediately after each 
other. Maybe SolrCloud is not able to handle that correctly?



Thanks,

Martin




Erick Erickson schreef op 05.03.2015 19:00:


Wait up. There's no "master" index in SolrCloud. Raw documents are
forwarded to each replica, indexed and put in the local tlog. If a
replica falls too far out of synch (say you take it offline), then the
entire index _can_ be replicated from the leader and, if the leader's
index was incomplete then that might propagate the error.

The practical consequence of this is that if _any_ replica has a
complete index, you can recover. Before going there though, the
brute-force approach is to just re-index everything from scratch.
That's likely easier, especially on indexes this size.

Here's what I'd do.

Assuming you have the Collections API calls for ADDREPLICA and
DELETEREPLICA, then:
0> Identify the complete replicas. If you're lucky you have at least
one for each shard.
1> Copy 1 good index from each shard somewhere just to have a backup.
2> DELETEREPLICA on all the incomplete replicas
2.5> I might shut down all the nodes at this point and check that all
the cores I'd deleted were gone. If any remnants exist, 'rm -rf
deleted_core_dir'.
3> ADDREPLICA to get the ones removed in <2> back.

<3> should copy the entire index from the leader for each replica. As
you do <2> the leadership will change and after you've deleted all the
incomplete replicas, one of the complete ones will be the leader and
you should be OK.

If you don't want to/can't use the Collections API, then
0> Identify the complete replicas. If you're lucky you have at least
one for each shard.
1> Shut 'em all down.
2> Copy the good index somewhere just to have a backup.
3> 'rm -rf data' for all the incomplete cores.
4> Bring up the good cores.
5> Bring up the cores that you deleted the data dirs from.

What <5> should do is replicate the entire index from the leader. When
you restart the good cores (step 4 above), they'll _become_ the
leader.

bq: Is it possible to make Solrcloud invulnerable for network problems

I'm a little surprised that this is happening. It sounds like the
network problems were such that some nodes weren't out of touch long
enough for Zookeeper to sense that they were down and put them into
recovery. Not sure there's any way to secure against that.

bq: Is it possible to see if a core is corrupt?
There's "CheckIndex", here's at least one link:
http://java.dzone.com/news/lucene-and-solrs-checkindex
What you're describing, though, is that docs just didn't make it to
the node, _not_ that the index has unexpected bits, bad disk sectors
and the like so CheckIndex can't detect that. How would it know what
_should_ have been in the index?

bq: I noticed a difference in the "Gen" column on Overview -
Replication. Does this mean there is something wrong?
You cannot infer anything from this. In particular, the merging will
be significantly different between a single full-reindex and what the
state of segment merges is in an incrementally built index.

The admin UI screen is rooted in the pre-cloud days, the Master/Slave
thing is entirely misleading. In SolrCloud, since all the raw data is
forwarded to all replicas, and any auto commits that happen may very
well be slightly out of sync, the index size, number of segments,
generations, and all that are pretty safely ignored.

Best,
Erick

On Thu, Mar 5, 2015 at 6:50 AM, Martin de Vries 


wrote:

Hi Andrew, Even our master index is corrupt, so I'm afraid this won't
help in our case. Martin

Andrew Butkus schreef op 05.03.2015 16:45:


Force a fetchindex on slave from master command:
http://slave_host:port/solr/replication?command=fetchindex - from
http://wiki.apache.org/solr/SolrReplication [1] The above command
will download the whole index from master to slave, there are
configuration options in solr to make this problem happen less often
(allowing it to recover from new documents added and only send the
changes with a wider gap) - but I can't remember what those were.




Links:
--
[1] http://wiki.apache.org/solr/SolrReplication


Re: How to start solr in solr cloud mode using external zookeeper ?

2015-03-05 Thread shamik
The other way you can do that is to specify the startup parameters in
solr.in.sh. 

Example :

SOLR_MODE=solrcloud

ZK_HOST="zoohost1:2181,zoohost2:2181,zoohost3:2181"

SOLR_PORT=4567

You can simply start solr by running "./solr start"



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-start-solr-in-solr-cloud-mode-using-external-zookeeper-tp4190630p4191286.html


RE: Performance on faceting using docValues

2015-03-05 Thread Ryan, Michael F. (LNG-DAY)
This is consistent with my experience. DocValues is faster for the first call 
(compared to UnInvertedField, which is what is used when there are no 
DocValues), but is slower on subsequent calls.

I'm curious about this as well, since I haven't heard anyone else mention it 
before you. I thought maybe I was the only one...

-Michael

-Original Message-
From: lei [mailto:simpl...@gmail.com] 
Sent: Thursday, March 05, 2015 2:40 PM
To: solr-user@lucene.apache.org
Subject: Re: Performance on faceting using docValues

Here are the specs of some example query faceting on three fields (all string 
type):
first call: 1+ sec (with docValues) vs. 4+ sec (w/o docValues)
subsequent calls: 30+ ms (with docValues) vs. 100+ ms (w/o docValues) consistently
the total # of docs returned is around 600,000



On Thu, Mar 5, 2015 at 11:18 AM, lei  wrote:

> Hi there,
>
> I'm testing facet performance with vs without docValues in Solr 4.7, 
> and found that on first request, performance with docValues is much 
> faster than non-docValues. However, for subsequent requests (where the 
> queries are cached), the performance is slower for docValues than 
> non-docValues. Is this an expected behavior? Any idea or solution is 
> appreciated. Thanks.
>


Re: Performance on faceting using docValues

2015-03-05 Thread lei
There was a mistake in the previous email.

Here are the specs of some example query faceting on three fields (all
string type):
first call: 1+ sec (with docValues) vs. 4+ sec (w/o docValues)
subsequent calls: 100+ ms (with docValues) vs. 30+ ms (w/o docValues)
consistently
the total # of docs returned is around 600,000

The query looks like this:

q=*:*&fq=country:"US"&fq=category:112&facet=on&facet.sort=index&facet.mincount=1&facet.limit=2000&facet.field=manufacturer&facet.field=seller&facet.field=material&f.manufacturer.facet.mincount=1&f.manufacturer.facet.sort=count&f.manufacturer.facet.limit=100&f.seller.facet.mincount=1&f.seller.facet.sort=count&f.seller.facet.limit=100&f.material.facet.mincount=1&sort=score+desc

Thanks,

On Thu, Mar 5, 2015 at 11:42 AM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Hello,
>
> I have one consideration off the top of my head; would you mind showing a brief
> snapshot from a sampler?
>
> On Thu, Mar 5, 2015 at 10:18 PM, lei  wrote:
>
> > Hi there,
> >
> > I'm testing facet performance with vs without docValues in Solr 4.7, and
> > found that on first request, performance with docValues is much faster
> than
> > non-docValues. However, for subsequent requests (where the queries are
> > cached), the performance is slower for docValues than non-docValues. Is
> > this an expected behavior? Any idea or solution is appreciated. Thanks.
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>


Re: Performance on faceting using docValues

2015-03-05 Thread Mikhail Khludnev
Hello,

I have one consideration off the top of my head; would you mind showing a brief
snapshot from a sampler?

On Thu, Mar 5, 2015 at 10:18 PM, lei  wrote:

> Hi there,
>
> I'm testing facet performance with vs without docValues in Solr 4.7, and
> found that on first request, performance with docValues is much faster than
> non-docValues. However, for subsequent requests (where the queries are
> cached), the performance is slower for docValues than non-docValues. Is
> this an expected behavior? Any idea or solution is appreciated. Thanks.
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Performance on faceting using docValues

2015-03-05 Thread lei
Here are the specs of some example query faceting on three fields (all
string type):
first call: 1+ sec (with docValues) vs. 4+ sec (w/o docValues)
subsequent calls: 30+ ms (with docValues) vs. 100+ ms (w/o docValues)
consistently
the total # of docs returned is around 600,000



On Thu, Mar 5, 2015 at 11:18 AM, lei  wrote:

> Hi there,
>
> I'm testing facet performance with vs without docValues in Solr 4.7, and
> found that on first request, performance with docValues is much faster
> than non-docValues. However, for subsequent requests (where the queries are
> cached), the performance is slower for docValues than non-docValues. Is
> this an expected behavior? Any idea or solution is appreciated. Thanks.
>


Performance on faceting using docValues

2015-03-05 Thread lei
Hi there,

I'm testing facet performance with vs without docValues in Solr 4.7, and
found that on first request, performance with docValues is much faster than
non-docValues. However, for subsequent requests (where the queries are
cached), the performance is slower for docValues than non-docValues. Is
this an expected behavior? Any idea or solution is appreciated. Thanks.


Parsing cluster result's docs

2015-03-05 Thread Jorge Lazo

Hi,

I have a Solr instance using the clustering component (with the Lingo 
algorithm) working perfectly. However, when I get the cluster results 
back, they contain only document IDs. What is the easiest way to 
retrieve the full documents instead? Should I feed these IDs into a new 
query to Solr, or is there some configuration I am missing that would 
return full docs instead of IDs?


If it matters, I am using Solr 4.10.

Thanks.


Labels for facets on Velocity

2015-03-05 Thread Henrique O. Santos
Hello,

I’ve been trying to get pretty names for my facets in the Velocity Response 
Writer. Do you know how I can do that?

For example, suppose that I am faceting on field1. My query returns 3 facets: 
uglyfacet1, uglyfacet2 and uglyfacet3. I want to show the user pretty names 
instead, like "Pretty Facet 1", "Pretty Facet 2" and "Pretty Facet 3".

The thing is that linking in Velocity should still work, so the user can 
navigate the results.

Thank you.
Henrique.

Re: Solrcloud Index corruption

2015-03-05 Thread Erick Erickson
Wait up. There's no "master" index in SolrCloud. Raw documents are
forwarded to each replica, indexed and put in the local tlog. If a
replica falls too far out of synch (say you take it offline), then the
entire index _can_ be replicated from the leader and, if the leader's
index was incomplete then that might propagate the error.

The practical consequence of this is that if _any_ replica has a
complete index, you can recover. Before going there though, the
brute-force approach is to just re-index everything from scratch.
That's likely easier, especially on indexes this size.


Here's what I'd do.

Assuming you have the Collections API calls for ADDREPLICA and
DELETEREPLICA, then:
0> Identify the complete replicas. If you're lucky you have at least
one for each shard.
1> Copy 1 good index from each shard somewhere just to have a backup.
2> DELETEREPLICA on all the incomplete replicas
2.5> I might shut down all the nodes at this point and check that all
the cores I'd deleted were gone. If any remnants exist, 'rm -rf
deleted_core_dir'.
3> ADDREPLICA to get the ones removed in <2> back.

<3> should copy the entire index from the leader for each replica. As
you do <2> the leadership will change and after you've deleted all the
incomplete replicas, one of the complete ones will be the leader and
you should be OK.


If you don't want to/can't use the Collections API, then
0> Identify the complete replicas. If you're lucky you have at least
one for each shard.
1> Shut 'em all down.
2> Copy the good index somewhere just to have a backup.
3> 'rm -rf data' for all the incomplete cores.
4> Bring up the good cores.
5> Bring up the cores that you deleted the data dirs from.

What <5> should do is replicate the entire index from the leader. When
you restart the good cores (step 4 above), they'll _become_ the
leader.
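
(For reference, those are plain HTTP calls; the collection, shard and
replica names below are made up:

http://host:8983/solr/admin/collections?action=DELETEREPLICA&collection=coll1&shard=shard1&replica=core_node3
http://host:8983/solr/admin/collections?action=ADDREPLICA&collection=coll1&shard=shard1&node=192.168.1.21:8983_solr
)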


bq: Is it possible to make Solrcloud invulnerable for network problems
I'm a little surprised that this is happening. It sounds like the
network problems were such that some nodes weren't out of touch long
enough for Zookeeper to sense that they were down and put them into
recovery. Not sure there's any way to secure against that.

bq: Is it possible to see if a core is corrupt?
There's "CheckIndex", here's at least one link:
http://java.dzone.com/news/lucene-and-solrs-checkindex
What you're describing, though, is that docs just didn't make it to
the node, _not_ that the index has unexpected bits, bad disk sectors
and the like so CheckIndex can't detect that. How would it know what
_should_ have been in the index?
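
(Invocation sketch - the jar version and index path are made up:

java -ea:org.apache.lucene... -cp lucene-core-4.8.1.jar \
  org.apache.lucene.index.CheckIndex /var/solr/collection1/data/index
)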

bq:  I noticed a difference in the "Gen" column on Overview -
Replication. Does this mean there is something wrong?
You cannot infer anything from this. In particular, the merging will
be significantly different between a single full-reindex and what the
state of segment merges is in an incrementally built index.

The admin UI screen is rooted in the pre-cloud days, the Master/Slave
thing is entirely misleading. In SolrCloud, since all the raw data is
forwarded to all replicas, and any auto commits that happen may very
well be slightly out of sync, the index size, number of segments,
generations, and all that are pretty safely ignored.

Best,
Erick

On Thu, Mar 5, 2015 at 6:50 AM, Martin de Vries  wrote:
> Hi Andrew,
>
> Even our master index is corrupt, so I'm afraid this won't help in our case.
>
> Martin
>
>
> Andrew Butkus schreef op 05.03.2015 16:45:
>
>
>> Force a fetchindex on slave from master command:
>> http://slave_host:port/solr/replication?command=fetchindex - from
>> http://wiki.apache.org/solr/SolrReplication
>>
>> The above command will download the whole index from master to slave,
>> there are configuration options in solr to make this problem happen less
>> often (allowing it to recover from new documents added and only send the
>> changes with a wider gap) - but I can't remember what those were.
>
>


Re: SOLR query parameters

2015-03-05 Thread Erick Erickson
Whew! I was afraid that my memory was failing since I'd no memory of
ever seeing anything remotely like that!

Erick

On Thu, Mar 5, 2015 at 6:04 AM,   wrote:
> Please ignore my question.
>
> These are form field names which I created a couple of months ago, not SOLR 
> query parameters.
>
> Philippe
>
>
> - Mail original -
> De: phi...@free.fr
> À: solr-user@lucene.apache.org
> Envoyé: Jeudi 5 Mars 2015 14:54:26
> Objet: SOLR query parameters
>
> Hello,
>
> could someone please explain what these SOLR query parameter keywords stand 
> for:
>
> - ppcdb
>
> - srbycb
>
> - as
>
> For instance,
>
> http://searcharchives.iht.com:8983/solr/inytapdf0/browse?ppdcb=&srbycb=&as=&q=kaiser&sort=
>
> I could not find them in the SOLR documentation.
>
> Many thanks.
>
> Philippe
>
>
>
>


Re: Issue while enabling clustering/integrating carrot2 with solr 4.4.0 and tomact under ubuntu

2015-03-05 Thread Erick Erickson
Class cast exceptions are usually the result of having a mix of old
and new jars in your classpath, or even of having the same jar in two
different places. Is this possible here?

Best,
Erick

On Wed, Mar 4, 2015 at 6:44 PM, sthita  wrote:
> 1.My solr.xml
>
> <?xml version="1.0" encoding="UTF-8" ?>
> <solr ...>
>   <cores ... hostPort="8980">
>     <core ... config="solrconfig.xml" collection="rn"/>
> ..
> ..
>   </cores>
> </solr>
>
>
> 2.My solrconfig.xml changes for carrot2 integrate
>
> <searchComponent class="org.apache.solr.handler.clustering.ClusteringComponent"
> enable="${solr.clustering.enabled:false}" name="clustering">
>   <lst name="engine">
>     <str name="name">default</str>
>     <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
>     <...>20</...>
>   </lst>
> </searchComponent>
>
> <requestHandler name="..." enable="${solr.clustering.enabled:false}" class="solr.SearchHandler">
> .
> .
> .
> .
> </requestHandler>
>
> 
>
>
> 3.Copied all the required jars to /solr/lib folder those are
> solr-clustering-4.4.0.jar
> carrot2-mini-3.6.2.jar
> hppc-0.4.1.jar
> jackson-core-asl-1.7.4.jar
> jackson-mapper-asl-1.7.4.jar
> mahout-collections-1.0.jar
> mahout-math-0.6.jar
> simple-xml-2.6.4.jar
>
> 4.created a file named setenv.sh under "/usr/share/tomcat/bin/"  with
> clustering enabled
>
CATALINA_OPTS="-Dsolr.clustering.enabled=true"
>
>  5.Restarted tomcat and
>
> I am getting the following  error while starting solr server after
> -Dsolr.clustering.enabled=true on CATALINA_OPTS
>
> ERROR org.apache.solr.servlet.SolrDispatchFilter –
> null:org.apache.solr.common.SolrException: SolrCore 'rn0' is not available
> due to init failure: Error Instantiating SearchComponent,
> org.apache.solr.handler.clustering.ClusteringComponent failed to instantiate
> org.apache.solr.handler.component.SearchComponent
> at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:860)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:251)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
> at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
> at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
> at
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
> at
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
> at
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
> at
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
> at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:936)
> at
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
> at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
> at
> org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1004)
> at
> org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
> at
> org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.solr.common.SolrException: Error Instantiating
> SearchComponent, org.apache.solr.handler.clustering.ClusteringComponent
> failed to instantiate org.apache.solr.handler.component.SearchComponent
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:835)
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:629)
> at
> org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:622)
> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:657)
> at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:364)
> at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:1)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> ... 3 more
> Caused by: org.apache.solr.common.SolrException: Error Instantiating
> SearchComponent, org.apache.solr.handler.clustering.ClusteringComponent
> failed to instantiate org.apache.solr.handler.component.SearchComponent
> at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:551)
> at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:586)
> at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2173)
> at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2167)
> at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2200)
> at org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:1231)
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:767)
> ... 11 more
> Caused by: java.lang.ClassCastException: class
> org.apache.solr.handler.clustering.ClusteringComponent
> at java.lang.Class.asSubclass(Class.java:3208)
> at
>

Re: solr 4.7.2 mergeFactor/ Merge policy issue

2015-03-05 Thread Erick Erickson
I would, BTW, either just get rid of the <mergeFactor> altogether or
make it much higher, i.e. 10. I don't think this is really your
problem, but you're creating a lot of segments here.
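
(For reference, the knobs in question live in the indexConfig section of
solrconfig.xml; the values here are illustrative only:

<indexConfig>
  <ramBufferSizeMB>100</ramBufferSizeMB>
  <mergeFactor>10</mergeFactor>
</indexConfig>
)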

But I'm kind of at a loss as to what would be different about your setup.
Is there _any_ chance that you have some secondary process looking at
your index that's maintaining open searchers? Any custom code that's
perhaps failing to close searchers? Is this a Unix or Windows system?

And just to be really clear, you _only_ seeing more segments being
added, right? If you're only counting files in the index directory, it's
_possible_ that merging is happening, you're just seeing new files take
the place of old ones.

Best,
Erick

On Wed, Mar 4, 2015 at 7:12 PM, Shawn Heisey  wrote:
> On 3/4/2015 4:12 PM, Erick Erickson wrote:
>> I _think_, but don't know for sure, that the merging stuff doesn't get
>> triggered until you commit, it doesn't "just happen".
>>
>> Shot in the dark...
>
> I believe that new segments are created when the indexing buffer
> (ramBufferSizeMB) fills up, even without commits.  I'm pretty sure that
> anytime a new segment is created, the merge policy is checked to see
> whether a merge is needed.
>
> Thanks,
> Shawn
>


Re: [ANNOUNCE] Apache Solr 4.10.4 released

2015-03-05 Thread Oded Sofer
Hello Mike, 

How are you? This is Oded Sofer from IBM Guardium. 
We have moved to SolrCloud, and I thought you might be able to help me find something. 
The facet search is very slow, and I do not know how to check the size of 
our facets (GB / count). 

Do you know how I can check it? 
 

On Thursday, March 5, 2015 5:28 PM, Michael McCandless  wrote:

March 2015, Apache Solr™ 4.10.4 available

The Lucene PMC is pleased to announce the release of Apache Solr 4.10.4

Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Lucene project. Its major features include powerful
full-text search, hit highlighting, faceted search, dynamic
clustering, database integration, rich document (e.g., Word, PDF)
handling, and geospatial search. Solr is highly scalable, providing
fault tolerant distributed search and indexing, and powers the search
and navigation features of many of the world's largest internet sites.

Solr 4.10.4 is available for immediate download at:

    http://www.apache.org/dyn/closer.cgi/lucene/solr/4.10.4

Solr 4.10.4 includes 24 bug fixes, as well as Lucene 4.10.4 and its 13
bug fixes.

See the CHANGES.txt file included with the release for a full list of
changes and further details.

Please report any feedback to the mailing lists
(http://lucene.apache.org/solr/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring
network for distributing releases. It is possible that the mirror you
are using may not have replicated the release yet. If that is the
case, please try another mirror. This also goes for Maven access.

Mike McCandless

http://blog.mikemccandless.com

   

[ANNOUNCE] Apache Solr 4.10.4 released

2015-03-05 Thread Michael McCandless
March 2015, Apache Solr™ 4.10.4 available

The Lucene PMC is pleased to announce the release of Apache Solr 4.10.4

Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Lucene project. Its major features include powerful
full-text search, hit highlighting, faceted search, dynamic
clustering, database integration, rich document (e.g., Word, PDF)
handling, and geospatial search. Solr is highly scalable, providing
fault tolerant distributed search and indexing, and powers the search
and navigation features of many of the world's largest internet sites.

Solr 4.10.4 is available for immediate download at:

http://www.apache.org/dyn/closer.cgi/lucene/solr/4.10.4

Solr 4.10.4 includes 24 bug fixes, as well as Lucene 4.10.4 and its 13
bug fixes.

See the CHANGES.txt file included with the release for a full list of
changes and further details.

Please report any feedback to the mailing lists
(http://lucene.apache.org/solr/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring
network for distributing releases. It is possible that the mirror you
are using may not have replicated the release yet. If that is the
case, please try another mirror. This also goes for Maven access.

Mike McCandless

http://blog.mikemccandless.com


RE: Solrcloud Index corruption

2015-03-05 Thread Martin de Vries

Hi Andrew,

Even our master index is corrupt, so I'm afraid this won't help in our 
case.


Martin


Andrew Butkus schreef op 05.03.2015 16:45:


Force a fetchindex on slave from master command:
http://slave_host:port/solr/replication?command=fetchindex - from
http://wiki.apache.org/solr/SolrReplication

The above command will download the whole index from master to slave;
there are configuration options in solr to make this problem happen less
often (allowing it to recover from new documents added and only send the
changes with a wider gap) - but I can't remember what those were.




RE: Solrcloud Index corruption

2015-03-05 Thread Andrew Butkus
Force a fetchindex on slave from master command: 
http://slave_host:port/solr/replication?command=fetchindex - from 
http://wiki.apache.org/solr/SolrReplication

The above command will download the whole index from master to slave; there are 
configuration options in solr to make this problem happen less often (allowing 
it to recover from new documents added and only send the changes with a wider 
gap) - but I can't remember what those were.

-Original Message-
From: Andrew Butkus [mailto:andrew.but...@c6-intelligence.com] 
Sent: 05 March 2015 14:42
To: 
Subject: Re: Solrcloud Index corruption

We had a similar issue; when this happened we did a fetchindex on each core 
that was out of sync to put them back right again.

Sent from my iPhone

> On 5 Mar 2015, at 14:40, Martin de Vries  wrote:
> 
> Hi,
> 
> We have index corruption on some cores on our Solrcloud running version 
> 4.8.1. The index is corrupt on several servers. (for example: when we do an 
> fq search we get results on some servers, on other servers we don't, while 
> the stored document contains the field on all servers).
> 
> A full re-index of the content didn't help, so we created a new core and did 
> the reindex on that one.
> 
> We think the index corruption is caused by network issues we had a few weeks 
> ago. I hope someone can help us with some questions:
> - Is it possible to make Solrcloud invulnerable for network problems like 
> packet loss or connection errors? Will it for example help to use an SSL 
> connection between the Solr servers?
> - Is it possible to see if a core is corrupt? We now noticed because we 
> didn't find some documents while searching on the website, but don't know if 
> other cores are corrupt. I noticed a difference in the "Gen" column on 
> Overview - Replication. Does this mean there is something wrong? Or is there 
> any other way to see the corruption?
> 
> Corrupt core:
>                      Version        Gen      Size
> Master (Searching)   1425565575249  2023309  472.41 MB
> Master (Replicable)  1425566098510  2023310  -
> Slave (Searching)    1425565575253  2023308  472.38 MB
> 
> Re-created core:
>                      Version        Gen  Size
> Master (Searching)   1425566108174  35   283.98 MB
> Master (Replicable)  1425566108174  35   -
> Slave (Searching)    1425566106674  35   288.24 MB
> 
> 
> 
> Kind regards,
> 
> Martin
> 
> 


Re: Solrcloud Index corruption

2015-03-05 Thread Andrew Butkus
We had a similar issue; when this happened we did a fetchindex on each core 
that was out of sync to put them back right again.

Sent from my iPhone

> On 5 Mar 2015, at 14:40, Martin de Vries  wrote:
> 
> Hi,
> 
> We have index corruption on some cores on our Solrcloud running version 
> 4.8.1. The index is corrupt on several servers. (for example: when we do an 
> fq search we get results on some servers, on other servers we don't, while 
> the stored document contains the field on all servers).
> 
> A full re-index of the content didn't help, so we created a new core and did 
> the reindex on that one.
> 
> We think the index corruption is caused by network issues we had a few weeks 
> ago. I hope someone can help us with some questions:
> - Is it possible to make Solrcloud invulnerable for network problems like 
> packet loss or connection errors? Will it for example help to use an SSL 
> connection between the Solr servers?
> - Is it possible to see if a core is corrupt? We now noticed because we 
> didn't find some documents while searching on the website, but don't know if 
> other cores are corrupt. I noticed a difference in the "Gen" column on 
> Overview - Replication. Does this mean there is something wrong? Or is there 
> any other way to see the corruption?
> 
> Corrupt core:
>                      Version        Gen      Size
> Master (Searching)   1425565575249  2023309  472.41 MB
> Master (Replicable)  1425566098510  2023310  -
> Slave (Searching)    1425565575253  2023308  472.38 MB
> 
> Re-created core:
>                      Version        Gen  Size
> Master (Searching)   1425566108174  35   283.98 MB
> Master (Replicable)  1425566108174  35   -
> Slave (Searching)    1425566106674  35   288.24 MB
> 
> 
> 
> Kind regards,
> 
> Martin
> 
> 


Solrcloud Index corruption

2015-03-05 Thread Martin de Vries

Hi,

We have index corruption on some cores on our Solrcloud running version 
4.8.1. The index is corrupt on several servers. (for example: when we do 
an fq search we get results on some servers, on other servers we don't, 
while the stored document contains the field on all servers).


A full re-index of the content didn't help, so we created a new core 
and did the reindex on that one.


We think the index corruption is caused by network issues we had a few 
weeks ago. I hope someone can help us with some questions:
- Is it possible to make Solrcloud invulnerable for network problems 
like packet loss or connection errors? Will it for example help to use 
an SSL connection between the Solr servers?
- Is it possible to see if a core is corrupt? We now noticed because we 
didn't find some documents while searching on the website, but don't 
know if other cores are corrupt. I noticed a difference in the "Gen" 
column on Overview - Replication. Does this mean there is something 
wrong? Or is there any other way to see the corruption?


Corrupt core:
Version Gen Size
Master (Searching)  1425565575249   2023309 472.41 MB
Master (Replicable) 1425566098510   2023310 -
Slave (Searching)   1425565575253   2023308 472.38 MB

Re-created core:
Version Gen Size
Master (Searching)  1425566108174   35  283.98 MB
Master (Replicable) 1425566108174   35  -
Slave (Searching)   1425566106674   35  288.24 MB



Kind regards,

Martin




Re: Cores and and ranking (search quality)

2015-03-05 Thread Toke Eskildsen
On Thu, 2015-03-05 at 14:34 +0100, johnmu...@aol.com wrote:
> My question is this: if I put my data in multiple cores and use
> distributed search will the ranking be different if I had all my data
> in a single core?

Yes, it will be different. The practical impact depends on how
homogeneous your data are across the shards and how large your shards
are. If you have small and dissimilar shards, your ranking will suffer a
lot.

Work is being done to remedy this:
https://issues.apache.org/jira/browse/SOLR-1632

> Also, will facet and more-like-this quality / result be the same?

It is not formally guaranteed, but for most practical purposes, faceting
on multi-shards will give you the same results as single-shards.

I don't know about more-like-this. My guess is that it will be affected
in the same way that standard searches are.

> Also, reading the distributed search wiki
> (http://wiki.apache.org/solr/DistributedSearch) it looks like Solr does
> the search and result merging (all I have to do is issue a search), is
> this correct?

Yes. From a user-perspective, searches are no different.

- Toke Eskildsen, State and University Library, Denmark




Re: SOLR query parameters

2015-03-05 Thread phiroc
Please ignore my question.

These are form field names which I created a couple of months ago, not SOLR 
query parameters.

Philippe


- Mail original -
De: phi...@free.fr
À: solr-user@lucene.apache.org
Envoyé: Jeudi 5 Mars 2015 14:54:26
Objet: SOLR query parameters

Hello,

could someone please explain what these SOLR query parameter keywords stand for:

- ppcdb

- srbycb

- as

For instance,

http://searcharchives.iht.com:8983/solr/inytapdf0/browse?ppdcb=&srbycb=&as=&q=kaiser&sort=

I could not find them in the SOLR documentation.

Many thanks.

Philippe






SOLR query parameters

2015-03-05 Thread phiroc
Hello,

could someone please explain what these SOLR query parameter keywords stand for:

- ppcdb

- srbycb

- as

For instance,

http://searcharchives.iht.com:8983/solr/inytapdf0/browse?ppdcb=&srbycb=&as=&q=kaiser&sort=

I could not find them in the SOLR documentation.

Many thanks.

Philippe






RE: Cores and and ranking (search quality)

2015-03-05 Thread Markus Jelsma
Hello - faceting will be the same, and distributed more-like-this is also 
possible since 5.0 (there is a working patch for 4.10.3). Regular search 
will work as well since 5.0 because of distributed IDF, which you need to 
enable manually. Behaviour will not be the same if you rely on average document 
length statistics, which is the case when you use BM25 instead of the default 
TFIDF similarity. Solr will do the result merging, so everything is transparent. 
Awesome!
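
For reference, distributed IDF in 5.0 is switched on with a statsCache entry
in solrconfig.xml; a minimal sketch (ExactStatsCache is one of several
implementations, check the ref guide for the alternatives):

<statsCache class="org.apache.solr.search.stats.ExactStatsCache"/>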

Markus 
 
-Original message-
> From:johnmu...@aol.com 
> Sent: Thursday 5th March 2015 14:38
> To: solr-user@lucene.apache.org
> Subject: Cores and and ranking (search quality)
> 
> Hi,
> 
> I have data which I will index and search on.  This data is well defined 
> such that I can index it into a single core or multiple cores like so: 
> core_1:Jan2015, core_2:Feb2015, core_3:Mar2015, etc.
> 
> My question is this: if I put my data in multiple cores and use distributed 
> search will the ranking be different if I had all my data in a single core?  
> If yes, how will it be different?  Also, will facet and more-like-this 
> quality / result be the same?
> 
> Also, reading the distributed search wiki 
> (http://wiki.apache.org/solr/DistributedSearch) it looks like Solr does the 
> search and result merging (all I have to do is issue a search), is this 
> correct?
> 
> Thanks!
> 
> - MJ
> 


Cores and and ranking (search quality)

2015-03-05 Thread johnmunir
Hi,

I have data which I will index and search on.  This data is well defined such 
that I can index it into a single core or multiple cores like so: core_1:Jan2015, 
core_2:Feb2015, core_3:Mar2015, etc.

My question is this: if I put my data in multiple cores and use distributed 
search will the ranking be different if I had all my data in a single core?  If 
yes, how will it be different?  Also, will facet and more-like-this quality / 
result be the same?

Also, reading the distributed search wiki 
(http://wiki.apache.org/solr/DistributedSearch) it looks like Solr does the 
search and result merging (all I have to do is issue a search), is this correct?

Thanks!

- MJ


Re: How to start solr in solr cloud mode using external zookeeper ?

2015-03-05 Thread Aman Tandon
Thanks Erick.

So for anyone else who gets stuck in the same situation, here is the
solution.

If you are able to run the remote/local zookeeper ensemble, then you can
create the Solr cluster as follows.

Suppose you have a zookeeper ensemble of 3 zookeeper servers running on
three different machines with the IP addresses 192.168.11.12,
192.168.101.12 and 192.168.101.92, and each machine uses zookeeper
client port 2181 (as mentioned in zoo.cfg). In my case I am using the
solr-5.0.0 version.

Now go to the bin directory of your extracted solr tar/zip file and run
this command for each solr server of your SolrCloud cluster:

./solr start -c -z 192.168.11.12:2181,192.168.101.12:2181,192.168.101.92:2181 -p 4567

-p -> to specify a port other than the default 8983 (4567 in my case)
-c -> to start the server in cloud mode
-z -> to specify the zookeeper host addresses
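
To verify, "./solr status" should report cloud mode and the zookeeper
ensemble (assuming the 5.0 start script).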

With Regards
Aman Tandon

On Wed, Mar 4, 2015 at 5:18 AM, Erick Erickson 
wrote:

> Have you seen this page?:
>
> https://cwiki.apache.org/confluence/display/solr/Solr+Start+Script+Reference
>
> This is really "the new way"
>
> Best,
> Erick
>
> On Tue, Mar 3, 2015 at 7:18 AM, Aman Tandon 
> wrote:
> > Thanks Shawn, also thanks for sharing info about chroot.
> >
> > I am trying to implement the solr cloud with solr-5.0.0.
> >
> > I also checked the documentations https://wiki.apache.org/solr/SolrCloud
> ,
> > the method shown there is using start.jar. But after a few updates start.jar
> > (jetty) will not work. So I want to go with the way that will work as
> > it is even after an upgrade.
> >
> > So how could i start it from bin directory with all these parameters of
> > external zookeeper or any other best way which you can suggest.
> >
> > With Regards
> > Aman Tandon
> >
> > On Tue, Mar 3, 2015 at 8:09 PM, Shawn Heisey 
> wrote:
> >
> >> On 3/3/2015 4:21 AM, Aman Tandon wrote:
> >> > I am new to solr-cloud, I have connected the zookeepers located on 3
> >> remote
> >> > servers. All the configs are uploaded and linked successfully.
> >> >
> >> > Now i am stuck to how to start solr in cloud mode using these external
> >> > zookeeper which are remotely located.
> >> >
> >> > Zookeeper is installed at 3 servers and using the 2181 as client
> port. ON
> >> > all three server, solr server along with external zookeeper is
> present.
> >> >
> >> > solrcloud1.com (solr + zookeeper is present)
> >> > solrcloud2.com
> >> > solrcloud3.com
> >> >
> >> > Now i have to start the solr by telling the solr to use the external
> >> > zookeeper. So how should I do that.
> >>
> >> You simply tell Solr about all your zookeeper servers on startup, using
> >> the zkHost property.  Here's the format of that property:
> >>
> >> server1:port,server2:port,server3:port/solr1
> >>
> >> The /solr1 part (the ZK chroot) is optional, but I recommend it ... it
> >> can be just about any text you like, starting with a forward slash.
> >> What this does is put all of SolrCloud's information inside a path in
> >> zookeeper, sort of like a filesystem.  With no chroot, that information
> >> is placed at the "root" of zookeeper.  If you want to use a zookeeper
> >> ensemble for multiple applications, you're going to need a chroot.  Even
> >> when multiple applications are not required, I recommend it to keep the
> >> zookeeper root clean.
> >>
> >> You can see some examples of zkHost values in the javadoc for SolrJ:
> >>
> >>
> >>
> http://lucene.apache.org/solr/5_0_0/solr-solrj/org/apache/solr/client/solrj/impl/CloudSolrClient.html#CloudSolrClient%28java.lang.String%29
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
>


Re: Help needed to understand zookeeper in solrcloud

2015-03-05 Thread svante karlsson
The network will "only" split if you get errors on your network hardware
(or if you fiddle with iptables). Let's say you placed your zookeepers in
separate racks and someone pulls the network cable between them - that
would leave you with 5 working servers that can't reach each other. This
is the "split brain" scenario.

>Are they guaranteed to split 4/0
Yes. A node failure will not partition the network.

> any odd number - it could be 21 even
Since all writes are synchronous, you don't want to use too large a number of
zookeepers, since that would slow down the cluster. Use a reasonable number
to reach your SLA. (3 or 5 are common choices.)

>and from a single failure you drop to an even number - then there is the
danger of NOT getting quorum.
No, see above.

BUT, if you first lose most of  your nodes due to a network partition and
then lose another due to node failure - then you are out of quorum.
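
To make the arithmetic concrete (quorum = floor(n/2) + 1):

n=3 -> quorum 2, tolerates 1 failed node
n=4 -> quorum 3, tolerates 1 failed node
n=5 -> quorum 3, tolerates 2 failed nodes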


/svante



2015-03-05 9:29 GMT+01:00 Julian Perry :

>
> I start out with 5 zk's.  All good.
>
> One zk fails - I'm left with four.  Are they guaranteed
> to split 4/0 or 3/1 - because if they split 2/2 I'm screwed,
> right?
>
> Surely to start with 5 zk's (or in fact any odd number - it
> could be 21 even), and from a single failure you drop to an
> even number - then there is the danger of NOT getting quorum.
>
> So ... I can only assume that there is a mechanism in place
> inside zk to guarantee this cannot happen, right?
>
> --
> Cheers
> Jules.
>
>
>
> On 05/03/2015 06:47, svante karlsson wrote:
>
>> Yes, as long as it is three (the majority of 5) or more.
>>
>> This is why there is no point of having a 4 node cluster. This would also
>> require 3 nodes for majority, thus giving it the fault tolerance of a 3 node
>> cluster but slower and more expensive.
>>
>>
>>
>> 2015-03-05 7:41 GMT+01:00 Aman Tandon :
>>
>>  Thanks svante.
>>>
>>> What if in the cluster of 5 zookeeper only 1 zookeeper goes down, will
>>> zookeeper election can occur with 4 / even number of zookeepers alive?
>>>
>>> With Regards
>>> Aman Tandon
>>>
>>> On Tue, Mar 3, 2015 at 6:35 PM, svante karlsson  wrote:
>>>
>>>> synchronous update of state and a requirement of more than half the
>>>> zookeepers alive (and in sync) this makes it impossible to have a "split
>>>> brain" situation ie when you partition a network and get let's say 3
>>>> alive on one side and 2 on the other.
>>>>
>>>> In this case the 2 node networks stops serving request since it's not in
>>>> majority.
>>>>
>>>> 2015-03-03 13:15 GMT+01:00 Aman Tandon :
>>>>
>>>>> But how they handle the failure?
>>>>>
>>>>> With Regards
>>>>> Aman Tandon
>>>>>
>>>>> On Tue, Mar 3, 2015 at 5:17 PM, O. Klein  wrote:
>>>>>
>>>>>> Zookeeper requires a majority of servers to be available. For example:
>>>>>> Five machines ZooKeeper can handle the failure of two machines. That's
>>>>>> why odd numbers are recommended.
>>
>


Re: Help needed to understand zookeeper in solrcloud

2015-03-05 Thread Julian Perry


I start out with 5 zk's.  All good.

One zk fails - I'm left with four.  Are they guaranteed
to split 4/0 or 3/1 - because if they split 2/2 I'm screwed,
right?

Surely to start with 5 zk's (or in fact any odd number - it
could be 21 even), and from a single failure you drop to an
even number - then there is the danger of NOT getting quorum.

So ... I can only assume that there is a mechanism in place
inside zk to guarantee this cannot happen, right?

--
Cheers
Jules.


On 05/03/2015 06:47, svante karlsson wrote:

Yes, as long as it is three (the majority of 5) or more.

This is why there is no point of having a 4 node cluster. This would also
require 3 nodes for majority thus giving it the fault tolerance of a 3 node
cluster but slower and more expensive.



2015-03-05 7:41 GMT+01:00 Aman Tandon :


Thanks svante.

What if in the cluster of 5 zookeeper only 1 zookeeper goes down, will
zookeeper election can occur with 4 / even number of zookeepers alive?

With Regards
Aman Tandon

On Tue, Mar 3, 2015 at 6:35 PM, svante karlsson  wrote:


synchronous update of state and a requirement of more than half the
zookeepers alive (and in sync) this makes it impossible to have a "split
brain" situation ie when you partition a network and get let's say 3
alive on one side and 2 on the other.

In this case the 2 node networks stops serving request since it's not in
majority.








2015-03-03 13:15 GMT+01:00 Aman Tandon :


But how they handle the failure?

With Regards
Aman Tandon

On Tue, Mar 3, 2015 at 5:17 PM, O. Klein  wrote:


Zookeeper requires a majority of servers to be available. For example: Five
machines ZooKeeper can handle the failure of two machines. That's why odd
numbers are recommended.