Re: Performance on faceting using docValues

2015-03-05 Thread Mikhail Khludnev
Hello,

I have one consideration off the top of my head; would you mind sharing a brief
snapshot from a sampler?

On Thu, Mar 5, 2015 at 10:18 PM, lei simpl...@gmail.com wrote:

 Hi there,

 I'm testing facet performance with vs without docValues in Solr 4.7, and
 found that on first request, performance with docValues is much faster than
 non-docValues. However, for subsequent requests (where the queries are
 cached), the performance is slower for docValues than non-docValues. Is
 this an expected behavior? Any idea or solution is appreciated. Thanks.




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com


Re: Performance on faceting using docValues

2015-03-05 Thread lei
Here are the specs of an example query faceting on three fields (all
string type):
first call: 1+ sec (with docValues) vs. 4+ sec (w/o docValues)
subsequent calls: 30+ ms (with docValues) vs. 100+ ms (w/o docValues), consistently
the total # of docs returned is around 600,000



On Thu, Mar 5, 2015 at 11:18 AM, lei simpl...@gmail.com wrote:

 Hi there,

 I'm testing facet performance with vs without docValues in Solr 4.7, and
 found that on first request, performance with docValues is much faster
 than non-docValues. However, for subsequent requests (where the queries are
 cached), the performance is slower for docValues than non-docValues. Is
 this an expected behavior? Any idea or solution is appreciated. Thanks.



Performance on faceting using docValues

2015-03-05 Thread lei
Hi there,

I'm testing facet performance with vs without docValues in Solr 4.7, and
found that on first request, performance with docValues is much faster than
non-docValues. However, for subsequent requests (where the queries are
cached), the performance is slower for docValues than non-docValues. Is
this an expected behavior? Any idea or solution is appreciated. Thanks.


Re: Performance on faceting using docValues

2015-03-05 Thread lei
Some mistake in the previous email.

Here are the specs of an example query faceting on three fields (all
string type):
first call: 1+ sec (with docValues) vs. 4+ sec (w/o docValues)
subsequent calls: 100+ ms (with docValues) vs. 30+ ms (w/o docValues), consistently
the total # of docs returned is around 600,000

The query looks like this:

q=*:*&fq=country:US&fq=category:112&facet=on&facet.sort=index&facet.mincount=1&facet.limit=2000&facet.field=manufacturer&facet.field=seller&facet.field=material&f.manufacturer.facet.mincount=1&f.manufacturer.facet.sort=count&f.manufacturer.facet.limit=100&f.seller.facet.mincount=1&f.seller.facet.sort=count&f.seller.facet.limit=100&f.material.facet.mincount=1&sort=score+desc
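Spelled out with the parameter separators restored, the same request could be issued like this (the host, port, and collection name are assumed placeholders, not from the original mail):

```shell
# Build the facet query URL piece by piece for readability, then print it.
SOLR_URL="http://localhost:8983/solr/collection1/select"
PARAMS="q=*:*&fq=country:US&fq=category:112"
PARAMS="$PARAMS&facet=on&facet.sort=index&facet.mincount=1&facet.limit=2000"
PARAMS="$PARAMS&facet.field=manufacturer&facet.field=seller&facet.field=material"
PARAMS="$PARAMS&f.manufacturer.facet.mincount=1&f.manufacturer.facet.sort=count&f.manufacturer.facet.limit=100"
PARAMS="$PARAMS&f.seller.facet.mincount=1&f.seller.facet.sort=count&f.seller.facet.limit=100"
PARAMS="$PARAMS&f.material.facet.mincount=1&sort=score+desc"
echo "$SOLR_URL?$PARAMS"
# To actually run it against a live Solr:
# curl -s "$SOLR_URL?$PARAMS"
```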

Thanks,

On Thu, Mar 5, 2015 at 11:42 AM, Mikhail Khludnev 
mkhlud...@griddynamics.com wrote:

 Hello,

 I have one consideration off the top of my head; would you mind sharing a brief
 snapshot from a sampler?

 On Thu, Mar 5, 2015 at 10:18 PM, lei simpl...@gmail.com wrote:

  Hi there,
 
  I'm testing facet performance with vs without docValues in Solr 4.7, and
  found that on first request, performance with docValues is much faster
 than
  non-docValues. However, for subsequent requests (where the queries are
  cached), the performance is slower for docValues than non-docValues. Is
  this an expected behavior? Any idea or solution is appreciated. Thanks.
 



 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
 mkhlud...@griddynamics.com



RE: Performance on faceting using docValues

2015-03-05 Thread Ryan, Michael F. (LNG-DAY)
This is consistent with my experience. DocValues is faster for the first call 
(compared to UnInvertedField, which is what is used when there are no 
DocValues), but is slower on subsequent calls.

I'm curious about this as well, since I haven't heard anyone else mention
this before you. I thought maybe I was the only one...

-Michael

-Original Message-
From: lei [mailto:simpl...@gmail.com] 
Sent: Thursday, March 05, 2015 2:40 PM
To: solr-user@lucene.apache.org
Subject: Re: Performance on faceting using docValues

Here are the specs of an example query faceting on three fields (all string
type):
first call: 1+ sec (with docValues) vs. 4+ sec (w/o docValues)
subsequent calls: 30+ ms (with docValues) vs. 100+ ms (w/o docValues), consistently
the total # of docs returned is around 600,000



On Thu, Mar 5, 2015 at 11:18 AM, lei simpl...@gmail.com wrote:

 Hi there,

 I'm testing facet performance with vs without docValues in Solr 4.7, 
 and found that on first request, performance with docValues is much 
 faster than non-docValues. However, for subsequent requests (where the 
 queries are cached), the performance is slower for docValues than 
 non-docValues. Is this an expected behavior? Any idea or solution is 
 appreciated. Thanks.



Re: Solrcloud Index corruption

2015-03-05 Thread Martin de Vries

Hi Erick,

Thank you for your detailed reply.

You say that in our case some docs didn't make it to the node, but that's 
not really true: the docs can be found on the corrupted nodes when I 
search on ID. The docs are also complete. The problem is that the docs 
do not appear when I filter on certain fields (even though the fields are in 
the doc and have the right value when I search on ID). So something 
seems to be corrupt in the filter index. We will try CheckIndex; 
hopefully it is able to identify the problematic cores.


I understand there is no master in SolrCloud. In our case we use 
haproxy as a load balancer for every request, so when indexing, each 
document may be sent to a different solr server, one immediately after 
another. Maybe SolrCloud is not able to handle that correctly?



Thanks,

Martin




Erick Erickson schreef op 05.03.2015 19:00:


Wait up. There's no master index in SolrCloud. Raw documents are
forwarded to each replica, indexed and put in the local tlog. If a
replica falls too far out of synch (say you take it offline), then the
entire index _can_ be replicated from the leader and, if the leader's
index was incomplete then that might propagate the error.

The practical consequence of this is that if _any_ replica has a
complete index, you can recover. Before going there though, the
brute-force approach is to just re-index everything from scratch.
That's likely easier, especially on indexes this size.

Here's what I'd do.

Assuming you have the Collections API calls for ADDREPLICA and
DELETEREPLICA, then:
0 Identify the complete replicas. If you're lucky you have at least
one for each shard.
1 Copy 1 good index from each shard somewhere just to have a backup.
2 DELETEREPLICA on all the incomplete replicas
2.5 I might shut down all the nodes at this point and check that all
the cores I'd deleted were gone. If any remnants exist, 'rm -rf
deleted_core_dir'.
3 ADDREPLICA to get back the ones removed.

That should copy the entire index from the leader for each replica. As
you do, the leadership will change, and after you've deleted all the
incomplete replicas, one of the complete ones will be the leader and
you should be OK.
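The DELETEREPLICA/ADDREPLICA steps above could look roughly like this; the collection, shard, and replica names are hypothetical placeholders you would read from your own cluster state:

```shell
# Sketch of the Collections API calls for steps 2 and 3 (Solr 4.8+).
SOLR="http://localhost:8983/solr"
COLLECTION="mycollection"
# Step 2: drop an incomplete replica (core_node name as shown in clusterstate)
DELETE_URL="$SOLR/admin/collections?action=DELETEREPLICA&collection=$COLLECTION&shard=shard1&replica=core_node3"
# Step 3: add a fresh replica; it pulls its full index from the shard leader
ADD_URL="$SOLR/admin/collections?action=ADDREPLICA&collection=$COLLECTION&shard=shard1"
echo "$DELETE_URL"
echo "$ADD_URL"
# Against a live cluster:
# curl -s "$DELETE_URL"
# curl -s "$ADD_URL"
```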

If you don't want to/can't use the Collections API, then
0 Identify the complete replicas. If you're lucky you have at least
one for each shard.
1 Shut 'em all down.
2 Copy the good index somewhere just to have a backup.
3 'rm -rf data' for all the incomplete cores.
4 Bring up the good cores.
5 Bring up the cores that you deleted the data dirs from.

What that should do is replicate the entire index from the leader. When
you restart the good cores (step 4 above), they'll _become_ the
leader.

bq: Is it possible to make Solrcloud invulnerable for network problems

I'm a little surprised that this is happening. It sounds like the
network problems were such that some nodes weren't out of touch long
enough for Zookeeper to sense that they were down and put them into
recovery. Not sure there's any way to secure against that.

bq: Is it possible to see if a core is corrupt?
There's CheckIndex, here's at least one link:
http://java.dzone.com/news/lucene-and-solrs-checkindex
What you're describing, though, is that docs just didn't make it to
the node, _not_ that the index has unexpected bits, bad disk sectors
and the like so CheckIndex can't detect that. How would it know what
_should_ have been in the index?
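For reference, a sketch of invoking CheckIndex directly against one core's index directory; the jar location and index path below are assumptions and will differ per install:

```shell
# Run Lucene's CheckIndex (read-only; add -fix only if you accept losing
# unreadable segments). Adjust both paths for your installation.
LUCENE_JAR="/opt/solr/server/lib/lucene-core.jar"
INDEX_DIR="/var/solr/data/collection1_shard1_replica1/data/index"
if [ ! -f "$LUCENE_JAR" ]; then
  echo "usage: set LUCENE_JAR to your lucene-core jar and INDEX_DIR to the index dir"
else
  java -cp "$LUCENE_JAR" org.apache.lucene.index.CheckIndex "$INDEX_DIR"
fi
```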

bq: I noticed a difference in the Gen column on Overview -
Replication. Does this mean there is something wrong?
You cannot infer anything from this. In particular, the merging will
be significantly different between a single full-reindex and what the
state of segment merges is in an incrementally built index.

The admin UI screen is rooted in the pre-cloud days, the Master/Slave
thing is entirely misleading. In SolrCloud, since all the raw data is
forwarded to all replicas, and any auto commits that happen may very
well be slightly out of sync, the index size, number of segments,
generations, and all that are pretty safely ignored.

Best,
Erick

On Thu, Mar 5, 2015 at 6:50 AM, Martin de Vries mar...@downnotifier.com wrote:

Hi Andrew,

Even our master index is corrupt, so I'm afraid this won't help in our case.

Martin

Andrew Butkus schreef op 05.03.2015 16:45:

Force a fetchindex on slave from master command:
http://slave_host:port/solr/replication?command=fetchindex - from
http://wiki.apache.org/solr/SolrReplication [1] The above command
will download the whole index from master to slave. There are
configuration options in solr to make this problem happen less often
(allowing it to recover from new documents added and only send the
changes with a wider gap) - but I can't remember what those were.




Links:
--
[1] http://wiki.apache.org/solr/SolrReplication


Re: How to start solr in solr cloud mode using external zookeeper ?

2015-03-05 Thread shamik
The other way you can do that is to specify the startup parameters in
solr.in.sh. 

Example :

SOLR_MODE=solrcloud

ZK_HOST=zoohost1:2181,zoohost2:2181,zoohost3:2181

SOLR_PORT=4567

You can simply start solr by running ./solr start
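Putting those settings together, a minimal solr.in.sh fragment might look like this (the hostnames and port are just the example values from above):

```shell
# Minimal solr.in.sh sketch: bin/solr sources this file on startup,
# so after saving it you can start SolrCloud with just: ./solr start
SOLR_MODE=solrcloud
ZK_HOST="zoohost1:2181,zoohost2:2181,zoohost3:2181"
SOLR_PORT=4567
echo "mode=$SOLR_MODE zk=$ZK_HOST port=$SOLR_PORT"
```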



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-start-solr-in-solr-cloud-mode-using-external-zookeeper-tp4190630p4191286.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Solrcloud Index corruption

2015-03-05 Thread Garth Grimm
For updates, the document will always get routed to the leader of the 
appropriate shard, no matter what server first receives the request.


Re: solr cloud does not start with many collections

2015-03-05 Thread Damien Kamerman
I've tried a few variations, with 3 x ZK, 6 x nodes, solr 4.10.3 and solr 5.0,
without any success and no real difference. There is a tipping point at
around 3,000-4,000 cores (varies depending on hardware): below it I can
restart the cloud OK within ~4min; beyond it the cloud stops working, with
continuous 'conflicting information about the leader of shard' warnings.

On 5 March 2015 at 14:15, Shawn Heisey apa...@elyograg.org wrote:

 On 3/4/2015 5:37 PM, Damien Kamerman wrote:
  I'm running on Solaris x86, I have plenty of memory and no real limits
  # plimit 15560
  15560:  /opt1/jdk/bin/java -d64 -server -Xss512k -Xms32G -Xmx32G
  -XX:MaxMetasp
 resource  current maximum
time(seconds) unlimited   unlimited
file(blocks)  unlimited   unlimited
data(kbytes)  unlimited   unlimited
stack(kbytes) unlimited   unlimited
coredump(blocks)  unlimited   unlimited
nofiles(descriptors)  65536   65536
vmemory(kbytes)   unlimited   unlimited
 
  I've been testing with 3 nodes, and that seems OK up to around 3,000
 cores
  total. I'm thinking of testing with more nodes.

 I have opened an issue for the problems I encountered while recreating a
 config similar to yours, which I have been doing on Linux.

 https://issues.apache.org/jira/browse/SOLR-7191

 It's possible that the only thing the issue will lead to is improvements
 in the documentation, but I'm hopeful that there will be code
 improvements too.

 Thanks,
 Shawn




-- 
Damien Kamerman


Re: Solrcloud Index corruption

2015-03-05 Thread Mark Miller
If you google "replication can cause index corruption" there are two Jira issues 
that are the most likely cause of corruption in a SolrCloud env. 

- Mark

 On Mar 5, 2015, at 2:20 PM, Garth Grimm garthgr...@averyranchconsulting.com 
 wrote:
 
 For updates, the document will always get routed to the leader of the 
 appropriate shard, no matter what server first receives the request.
 
 

Re: Solrcloud Index corruption

2015-03-05 Thread Shawn Heisey
On 3/5/2015 3:13 PM, Martin de Vries wrote:
 I understand there is not a master in SolrCloud. In our case we use
 haproxy as a load balancer for every request. So when indexing every
 document will be sent to a different solr server, immediately after
 each other. Maybe SolrCloud is not able to handle that correctly?

SolrCloud can handle that correctly, but currently sending index updates
to a core that is not the leader of the shard will incur a significant
performance hit, compared to always sending updates to the correct
core.  A small performance penalty would be understandable, because the
request must be redirected, but what actually happens is a much larger
penalty than anyone expected.  We have an issue in Jira to investigate
that performance issue and make it work as efficiently as possible.

Indexing batches of documents is recommended, not sending one document
per update request.

General performance problems with Solr itself can lead to extremely odd
and unpredictable behavior from SolrCloud.  Most often these kinds of
performance problems are related in some way to memory, either the java
heap or available memory in the system.

http://wiki.apache.org/solr/SolrPerformanceProblems
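As a sketch of the batching advice above, a single update request can carry many documents; the URL and field names here are assumed placeholders, not taken from the thread:

```shell
# One JSON update request carrying a batch of documents, instead of one
# HTTP request per document. Rely on autoCommit (or commit separately)
# rather than committing per batch.
UPDATE_URL="http://localhost:8983/solr/collection1/update"
BATCH='[
  {"id": "doc1", "country": "US", "category": "112"},
  {"id": "doc2", "country": "US", "category": "112"}
]'
echo "$BATCH"
# Against a live Solr:
# curl -s "$UPDATE_URL" -H 'Content-Type: application/json' -d "$BATCH"
```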

Thanks,
Shawn



Re: Admin UI doesn't show logs?

2015-03-05 Thread Shawn Heisey
On 3/5/2015 6:01 PM, Jakov Sosic wrote:
 I'm running 4.10.3 under tomcat 7, and I have an issue with Admin UI.

 When I click on Logging, I don't see actual entries, but only:


No Events available

The logging tab in the admin UI only shows log entries where the
severity of the log is at least WARN.  The default file-level logging
setup in the example logs a lot more -- it is normally set to INFO, and
a normal startup will generate hundreds or thousands of log entries at
the INFO level, which would be overwhelming to view in a web browser. 
That's why they are only logged to a file named ./logs/solr.log, if you
have the log4j.properties file included in the example.

I believe there is a way to configure Solr so that the admin UI will
show you everything, but trust me when I say that you most likely do
not want those log entries in the admin UI, because there are a
LOT of them.

Thanks,
Shawn



Admin UI doesn't show logs?

2015-03-05 Thread Jakov Sosic

Hi,

I'm running 4.10.3 under tomcat 7, and I have an issue with Admin UI.

When I click on Logging, I don't see actual entries, but only:


   No Events available


and a round icon spinning non-stop.

When I click on Level, I see the same icon and the message Loading 



Is there a hint or something you could point me to, so I could fix it?


Re: Admin UI doesn't show logs?

2015-03-05 Thread Alexandre Rafalovitch
And given that you configured it under Tomcat, I'd check that the logs
are generated at all first. Just as a sanity check.

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/





Re: How to start solr in solr cloud mode using external zookeeper ?

2015-03-05 Thread Aman Tandon
Thanks shamik :)

With Regards
Aman Tandon




Re: SOLR query parameters

2015-03-05 Thread Erick Erickson
Whew! I was afraid that my memory was failing since I'd no memory of
ever seeing anything remotely like that!

Erick

On Thu, Mar 5, 2015 at 6:04 AM,  phi...@free.fr wrote:
 Please ignore my question.

 These are form field names which I created a couple of months ago, not SOLR 
 query parameters.

 Philippe


 - Mail original -
 De: phi...@free.fr
 À: solr-user@lucene.apache.org
 Envoyé: Jeudi 5 Mars 2015 14:54:26
 Objet: SOLR query parameters

 Hello,

 could someone please explain what these SOLR query parameter keywords stand 
 for:

 - ppcdb

 - srbycb

 - as

 For instance,

 http://searcharchives.iht.com:8983/solr/inytapdf0/browse?ppdcb=&srbycb=&as=&q=kaiser&sort=

 I could not find them in the SOLR documentation.

 Many thanks.

 Philippe






Re: Help needed to understand zookeeper in solrcloud

2015-03-05 Thread Julian Perry


I start out with 5 zk's.  All good.

One zk fails - I'm left with four.  Are they guaranteed
to split 4/0 or 3/1 - because if they split 2/2 I'm screwed,
right?

Surely if you start with 5 zk's (or in fact any odd number - it
could even be 21), then from a single failure you drop to an
even number - and then there is the danger of NOT getting quorum.

So ... I can only assume that there is a mechanism in place
inside zk to guarantee this cannot happen, right?

--
Cheers
Jules.


On 05/03/2015 06:47, svante karlsson wrote:

Yes, as long as it is three (the majority of 5) or more.

This is why there is no point in having a 4-node cluster. It would also
require 3 nodes for a majority, thus giving it the fault tolerance of a 3-node
cluster while being slower and more expensive.



2015-03-05 7:41 GMT+01:00 Aman Tandon amantandon...@gmail.com:


Thanks svante.

What if in a cluster of 5 zookeepers only 1 zookeeper goes down - can a
zookeeper election still occur with 4 (an even number of) zookeepers alive?

With Regards
Aman Tandon

On Tue, Mar 3, 2015 at 6:35 PM, svante karlsson s...@csi.se wrote:


Synchronous updates of state, plus the requirement that more than half the
zookeepers be alive (and in sync), make it impossible to have a split-brain
situation, i.e. when you partition a network and get, let's say, 3 alive
on one side and 2 on the other.

In this case the 2-node side stops serving requests since it's not in the
majority.








2015-03-03 13:15 GMT+01:00 Aman Tandon amantandon...@gmail.com:


But how do they handle the failure?

With Regards
Aman Tandon

On Tue, Mar 3, 2015 at 5:17 PM, O. Klein kl...@octoweb.nl wrote:


Zookeeper requires a majority of servers to be available. For example: with
five machines ZooKeeper can handle the failure of two machines. That's why
odd numbers are recommended.


Re: Help needed to understand zookeeper in solrcloud

2015-03-05 Thread svante karlsson
The network will only split if you get errors on your network hardware
(or fiddle with iptables). Let's say you placed your zookeepers in separate
racks and someone pulls the network cable between them - that will leave you
with 5 working servers, but they can't all reach each other. This is the
split-brain scenario.

"Are they guaranteed to split 4/0?"
Yes. A node failure will not partition the network.

"any odd number - it could be 21 even"
Since all writes are synchronous, you don't want to use too large a number of
zookeepers, since that would slow down the cluster. Use a reasonable number
to meet your SLA (3 or 5 are common choices).

"and from a single failure you drop to an even number - then there is the
danger of NOT getting quorum."
No, see above.

BUT, if you first lose most of  your nodes due to a network partition and
then lose another due to node failure - then you are out of quorum.
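The quorum arithmetic behind this can be sketched as follows (the ensemble sizes are just examples):

```shell
# An ensemble of N zookeepers needs floor(N/2)+1 alive for quorum, so it
# tolerates N - (floor(N/2)+1) failures. Note that 3 and 4 tolerate the
# same number of failures, which is why 4-node ensembles add no safety.
for n in 3 4 5 21; do
  quorum=$(( n / 2 + 1 ))
  tolerated=$(( n - quorum ))
  echo "ensemble=$n quorum=$quorum tolerated_failures=$tolerated"
done
```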


/svante



2015-03-05 9:29 GMT+01:00 Julian Perry ju...@limitless.co.uk:


 I start out with 5 zk's.  All good.

 One zk fails - I'm left with four.  Are they guaranteed
 to split 4/0 or 3/1 - because if they split 2/2 I'm screwed,
 right?

 Surely to start with 5 zk's (or in fact any odd number - it
 could be 21 even), and from a single failure you drop to an
 even number - then there is the danger of NOT getting quorum.

 So ... I can only assume that there is a mechanism in place
 inside zk to guarantee this cannot happen, right?

 --
 Cheers
 Jules.



 On 05/03/2015 06:47, svante karlsson wrote:

 Yes, as long as it is three (the majority of 5) or more.

 This is why there is no point of having a 4 node cluster. This would also
 require 3 nodes for majority thus giving it the fault tolerance of a 3
 node
 cluster but slower and more expensive.



 2015-03-05 7:41 GMT+01:00 Aman Tandon amantandon...@gmail.com:

  Thanks svante.

 What if in the cluster of 5 zookeeper only 1 zookeeper goes down, will
 zookeeper election can occur with 4 / even number of zookeepers alive?

 With Regards
 Aman Tandon

 On Tue, Mar 3, 2015 at 6:35 PM, svante karlsson s...@csi.se wrote:

  synchronous update of state and a requirement of more than half the
 zookeepers alive (and in sync) this makes it impossible to have a split
 brain situation ie when you partition a network and get let's say 3

 alive

 on one side and 2 on the other.

 In this case the 2 node networks stops serving request since it's not in
 majority.








 2015-03-03 13:15 GMT+01:00 Aman Tandon amantandon...@gmail.com:

  But how they handle the failure?

 With Regards
 Aman Tandon

 On Tue, Mar 3, 2015 at 5:17 PM, O. Klein kl...@octoweb.nl wrote:

  Zookeeper requires a majority of servers to be available. For example:
  with five machines, ZooKeeper can handle the failure of two. That's why
  odd numbers are recommended.
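The arithmetic behind the majority rule can be sketched quickly (a generic illustration, not ZooKeeper code):

```python
# Quorum arithmetic for an ensemble of n servers: a majority of
# floor(n/2) + 1 must be alive, so the ensemble tolerates the
# failure of floor((n - 1) / 2) servers.
def majority(n: int) -> int:
    return n // 2 + 1

def tolerable_failures(n: int) -> int:
    return (n - 1) // 2

for n in (3, 4, 5):
    print(n, majority(n), tolerable_failures(n))
# Note that 4 servers tolerate only 1 failure - the same as 3 servers -
# which is why even-sized ensembles buy nothing.
```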




Re: How to start solr in solr cloud mode using external zookeeper ?

2015-03-05 Thread Aman Tandon
Thanks Erick.

So for other readers who get stuck in the same situation, here is the
solution.

If you are able to run the remote/local zookeeper ensemble, then you can
create the Solr Cluster by the following method.

Suppose you have a zookeeper ensemble of 3 zookeeper servers running on
three different machines with the IP addresses 192.168.11.12,
192.168.101.12 and 192.168.101.92, each using 2181 as the zookeeper
client port (as mentioned in zoo.cfg). In my case I am using the
solr-5.0.0 version.

Now go to the bin directory of your extracted solr tar/zip file and run
this command for each solr server of your SolrCloud cluster.

./solr start -c -z 192.168.11.12:2181,192.168.101.12:2181,
192.168.101.92:2181 -p 4567

-p - to specify a port number other than 8983 (4567 in my case)
-c - to start the server in cloud mode
-z - to specify the zookeeper host addresses
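As a sketch, the same startup can be scripted per node (IPs and port as assumed above; the curl check is hypothetical):

```shell
# Build the -z argument once and reuse it for every node in the cluster.
ZK_HOSTS="192.168.11.12:2181,192.168.101.12:2181,192.168.101.92:2181"
SOLR_PORT=4567
START_CMD="./solr start -c -p ${SOLR_PORT} -z ${ZK_HOSTS}"
echo "${START_CMD}"
# After startup, a sanity check against the Collections API could look like:
#   curl "http://localhost:${SOLR_PORT}/solr/admin/collections?action=CLUSTERSTATUS"
```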

With Regards
Aman Tandon

On Wed, Mar 4, 2015 at 5:18 AM, Erick Erickson erickerick...@gmail.com
wrote:

 Have you seen this page?:

 https://cwiki.apache.org/confluence/display/solr/Solr+Start+Script+Reference

 This is really the new way

 Best,
 Erick

 On Tue, Mar 3, 2015 at 7:18 AM, Aman Tandon amantandon...@gmail.com
 wrote:
  Thanks Shawn, also thanks for sharing info about chroot.
 
  I am trying to implement the solr cloud with solr-5.0.0.
 
  I also checked the documentation https://wiki.apache.org/solr/SolrCloud,
  the method shown there uses start.jar. But after a few updates start.jar
  (jetty) will not work. So I want to go with the way that will keep
  working even after an upgrade.
 
  So how could I start it from the bin directory with all these parameters
  for the external zookeeper, or is there any other better way you can
  suggest?
 
  With Regards
  Aman Tandon
 
  On Tue, Mar 3, 2015 at 8:09 PM, Shawn Heisey apa...@elyograg.org
 wrote:
 
  On 3/3/2015 4:21 AM, Aman Tandon wrote:
    I am new to solr-cloud; I have connected the zookeepers located on 3
    remote servers. All the configs are uploaded and linked successfully.
  
    Now I am stuck on how to start solr in cloud mode using these external
    zookeepers, which are remotely located.
  
   Zookeeper is installed at 3 servers and using the 2181 as client
 port. ON
   all three server, solr server along with external zookeeper is
 present.
  
   solrcloud1.com (solr + zookeper is present)
   solrcloud2.com
   solrcloud3.com
  
    Now I have to start solr by telling it to use the external
    zookeeper. So how should I do that?
 
  You simply tell Solr about all your zookeeper servers on startup, using
  the zkHost property.  Here's the format of that property:
 
  server1:port,server2:port,server3:port/solr1
 
  The /solr1 part (the ZK chroot) is optional, but I recommend it ... it
  can be just about any text you like, starting with a forward slash.
  What this does is put all of SolrCloud's information inside a path in
  zookeeper, sort of like a filesystem.  With no chroot, that information
  is placed at the root of zookeeper.  If you want to use a zookeeper
  ensemble for multiple applications, you're going to need a chroot.  Even
  when multiple applications are not required, I recommend it to keep the
  zookeeper root clean.
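A minimal sketch of setting up such a chroot (the zkcli.sh path is where Solr 5.x ships it; host names are placeholders):

```shell
# Create the chroot node once, before the first SolrCloud startup:
#   server/scripts/cloud-scripts/zkcli.sh -zkhost server1:2181 -cmd makepath /solr1
# Then every Solr node gets the ensemble string *with* the chroot suffix:
ZK_HOST="server1:2181,server2:2181,server3:2181/solr1"
echo "${ZK_HOST}"
```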
 
  You can see some examples of zkHost values in the javadoc for SolrJ:
 
 
 
 http://lucene.apache.org/solr/5_0_0/solr-solrj/org/apache/solr/client/solrj/impl/CloudSolrClient.html#CloudSolrClient%28java.lang.String%29
 
  Thanks,
  Shawn
 
 



Re: [ANNOUNCE] Apache Solr 4.10.4 released

2015-03-05 Thread Oded Sofer
Hello Mike, 

How are you? This is Oded Sofer from IBM Guardium. 
We have moved to SolrCloud, and I thought you might be able to help me find 
something. Facet search is very slow, and I do not know how to check the 
size of our facets (GB / count). 

Do you know how I can check it? 
 

 On Thursday, March 5, 2015 5:28 PM, Michael McCandless 
luc...@mikemccandless.com wrote:
   

 March 2015, Apache Solr™ 4.10.4 available

The Lucene PMC is pleased to announce the release of Apache Solr 4.10.4

Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Lucene project. Its major features include powerful
full-text search, hit highlighting, faceted search, dynamic
clustering, database integration, rich document (e.g., Word, PDF)
handling, and geospatial search. Solr is highly scalable, providing
fault tolerant distributed search and indexing, and powers the search
and navigation features of many of the world's largest internet sites.

Solr 4.10.4 is available for immediate download at:

    http://www.apache.org/dyn/closer.cgi/lucene/solr/4.10.4

Solr 4.10.4 includes 24 bug fixes, as well as Lucene 4.10.4 and its 13
bug fixes.

See the CHANGES.txt file included with the release for a full list of
changes and further details.

Please report any feedback to the mailing lists
(http://lucene.apache.org/solr/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring
network for distributing releases. It is possible that the mirror you
are using may not have replicated the release yet. If that is the
case, please try another mirror. This also goes for Maven access.

Mike McCandless

http://blog.mikemccandless.com

   

Re: Issue while enabling clustering/integrating carrot2 with solr 4.4.0 and tomact under ubuntu

2015-03-05 Thread Erick Erickson
Class cast exceptions are usually the result of having a mix of old
and new jars in your classpath, or even of having the same jar in two
different places. Is this possible here?

Best,
Erick

On Wed, Mar 4, 2015 at 6:44 PM, sthita sthit...@gmail.com wrote:
 1.My solr.xml

 <?xml version="1.0" encoding="UTF-8" ?>
 <solr persistent="true" sharedLib="/solr/lib">
   <cores defaultCoreName="rn0" hostContext="/solr" adminPath="/admin/cores"
          hostPort="8980">
     <core schema="schema.xml" shard="shard1" instanceDir="rn0/" name="rn0"
           config="solrconfig.xml" collection="rn"/>
     ..
     ..
   </cores>
 </solr>


 2.My solrconfig.xml changes for carrot2 integrate

 <searchComponent
     class="org.apache.solr.handler.clustering.ClusteringComponent"
     enable="${solr.clustering.enabled:false}" name="clustering">
   <lst name="engine">
     <str name="name">default</str>
     <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
     <str name="LingoClusteringAlgorithm.desiredClusterCountBase">20</str>
   </lst>
 </searchComponent>

 <requestHandler name="/clustering" startup="lazy"
     enable="${solr.clustering.enabled:false}" class="solr.SearchHandler">
 .
 .
 .
 .
 </requestHandler>

 <lib dir="/solr/lib" regex=".*\.jar" />


 3.Copied all the required jars to /solr/lib folder those are
 solr-clustering-4.4.0.jar
 carrot2-mini-3.6.2.jar
 hppc-0.4.1.jar
 jackson-core-asl-1.7.4.jar
 jackson-mapper-asl-1.7.4.jar
 mahout-collections-1.0.jar
 mahout-math-0.6.jar
 simple-xml-2.6.4.jar

 4.created a file named setenv.sh under /usr/share/tomcat/bin/  with
 clustering enabled

 CATALINA_OPTS = -Dsolr.clustering.enabled=true

  5.Restarted tomcat and

 I am getting the following  error while starting solr server after
 -Dsolr.clustering.enabled=true on CATALINA_OPTS

 ERROR org.apache.solr.servlet.SolrDispatchFilter –
 null:org.apache.solr.common.SolrException: SolrCore 'rn0' is not available
 due to init failure: Error Instantiating SearchComponent,
 org.apache.solr.handler.clustering.ClusteringComponent failed to instantiate
 org.apache.solr.handler.component.SearchComponent
 at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:860)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:251)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
 at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
 at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
 at
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
 at
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
 at
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
 at
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
 at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:936)
 at
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
 at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
 at
 org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1004)
 at
 org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
 at
 org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: org.apache.solr.common.SolrException: Error Instantiating
 SearchComponent, org.apache.solr.handler.clustering.ClusteringComponent
 failed to instantiate org.apache.solr.handler.component.SearchComponent
 at org.apache.solr.core.SolrCore.<init>(SolrCore.java:835)
 at org.apache.solr.core.SolrCore.<init>(SolrCore.java:629)
 at
 org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:622)
 at org.apache.solr.core.CoreContainer.create(CoreContainer.java:657)
 at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:364)
 at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:1)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 ... 3 more
 Caused by: org.apache.solr.common.SolrException: Error Instantiating
 SearchComponent, org.apache.solr.handler.clustering.ClusteringComponent
 failed to instantiate org.apache.solr.handler.component.SearchComponent
 at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:551)
 at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:586)
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2173)
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2167)
 at 

Re: Solrcloud Index corruption

2015-03-05 Thread Erick Erickson
Wait up. There's no master index in SolrCloud. Raw documents are
forwarded to each replica, indexed and put in the local tlog. If a
replica falls too far out of synch (say you take it offline), then the
entire index _can_ be replicated from the leader and, if the leader's
index was incomplete then that might propagate the error.

The practical consequence of this is that if _any_ replica has a
complete index, you can recover. Before going there though, the
brute-force approach is to just re-index everything from scratch.
That's likely easier, especially on indexes this size.


Here's what I'd do.

Assuming you have the Collections API calls for ADDREPLICA and
DELETEREPLICA, then:
0) Identify the complete replicas. If you're lucky you have at least
one for each shard.
1) Copy 1 good index from each shard somewhere, just to have a backup.
2) DELETEREPLICA on all the incomplete replicas.
2.5) I might shut down all the nodes at this point and check that all
the cores I'd deleted were gone. If any remnants exist, 'rm -rf
deleted_core_dir'.
3) ADDREPLICA to get the ones removed in 2) back.

Step 3) should copy the entire index from the leader for each replica. As
you do 2) the leadership will change, and after you've deleted all the
incomplete replicas one of the complete ones will be the leader and
you should be OK.
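The Collections API calls in steps 2 and 3 look roughly like this (collection, shard, and replica names are placeholders; the sketch only builds the URLs, it does not send them):

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/admin/collections"

def collections_url(action, **params):
    """Build a Collections API URL for the given action."""
    return BASE + "?" + urlencode({"action": action, **params})

# Step 2: drop an incomplete replica (names are hypothetical)
delete_url = collections_url("DELETEREPLICA", collection="coll1",
                             shard="shard1", replica="core_node3")
# Step 3: add it back; the new replica syncs its full index from the leader
add_url = collections_url("ADDREPLICA", collection="coll1", shard="shard1")

print(delete_url)
print(add_url)
```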


If you don't want to/can't use the Collections API, then:
0) Identify the complete replicas. If you're lucky you have at least
one for each shard.
1) Shut 'em all down.
2) Copy the good index somewhere just to have a backup.
3) 'rm -rf data' for all the incomplete cores.
4) Bring up the good cores.
5) Bring up the cores that you deleted the data dirs from.

What step 5) does is replicate the entire index from the leader. When
you restart the good cores (step 4 above), they'll _become_ the
leader.


bq: Is it possible to make Solrcloud invulnerable for network problems
I'm a little surprised that this is happening. It sounds like the
network problems were such that some nodes weren't out of touch long
enough for Zookeeper to sense that they were down and put them into
recovery. Not sure there's any way to secure against that.

bq: Is it possible to see if a core is corrupt?
There's CheckIndex, here's at least one link:
http://java.dzone.com/news/lucene-and-solrs-checkindex
What you're describing, though, is that docs just didn't make it to
the node, _not_ that the index has unexpected bits, bad disk sectors
and the like so CheckIndex can't detect that. How would it know what
_should_ have been in the index?
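For reference, CheckIndex is pointed at a core's index directory roughly like this (the jar and index paths are assumptions for a 4.x install):

```shell
# Run with the core offline. Adding -fix rewrites the index and LOSES any
# corrupt segments, so take a backup first; omit -fix for a read-only check.
LUCENE_JAR="example/solr-webapp/webapp/WEB-INF/lib/lucene-core-4.8.1.jar"
INDEX_DIR="/var/solr/data/mycore/data/index"
CHECK_CMD="java -cp ${LUCENE_JAR} org.apache.lucene.index.CheckIndex ${INDEX_DIR}"
echo "${CHECK_CMD}"
```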

bq:  I noticed a difference in the Gen column on Overview -
Replication. Does this mean there is something wrong?
You cannot infer anything from this. In particular, the merging will
be significantly different between a single full-reindex and what the
state of segment merges is in an incrementally built index.

The admin UI screen is rooted in the pre-cloud days, the Master/Slave
thing is entirely misleading. In SolrCloud, since all the raw data is
forwarded to all replicas, and any auto commits that happen may very
well be slightly out of sync, the index size, number of segments,
generations, and all that are pretty safely ignored.

Best,
Erick

On Thu, Mar 5, 2015 at 6:50 AM, Martin de Vries mar...@downnotifier.com wrote:
 Hi Andrew,

 Even our master index is corrupt, so I'm afraid this won't help in our case.

 Martin


 Andrew Butkus schreef op 05.03.2015 16:45:


 Force a fetchindex on slave from master command:
 http://slave_host:port/solr/replication?command=fetchindex - from
 http://wiki.apache.org/solr/SolrReplication

 The above command will download the whole index from master to slave,
 there are configuration options in solr to make this problem happen less
 often (allowing it to recover from new documents added and only send the
 changes with a wider gap) - but I can't remember what those were.




Labels for facets on Velocity

2015-03-05 Thread Henrique O. Santos
Hello,

I’ve been trying to get a pretty name for my facets with the Velocity Response 
Writer. Do you know how I can do that?

For example, suppose that I am faceting field1. My query returns 3 facets: 
uglyfacet1, uglyfacet2 and uglyfacet3. I want to show the user pretty 
names, like Pretty Facet 1, Pretty Facet 2 and Pretty Facet 3.

The thing is that linking on velocity should still work, so the user can 
navigate the results.
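One idea I'm considering (a sketch only - the map contents, the field name, and the #url_for_facet_filter macro from Solr's stock templates are assumptions): keep the raw facet value in the filter link and map it to a display name only when rendering.

```
## Sketch: map raw facet values to display names; the raw value still
## drives the filter link so navigation keeps working.
#set($pretty = {"uglyfacet1": "Pretty Facet 1",
                "uglyfacet2": "Pretty Facet 2",
                "uglyfacet3": "Pretty Facet 3"})
#foreach($field in $response.facetFields)
  #if($field.name == "field1")
    #foreach($facet in $field.values)
      <a href="#url_for_facet_filter($field.name, $facet.name)">
        #if($pretty.containsKey($facet.name))$pretty.get($facet.name)#else$facet.name#end
      </a> ($facet.count)
    #end
  #end
#end
```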

Thank you.
Henrique.

RE: Cores and and ranking (search quality)

2015-03-05 Thread Markus Jelsma
Hello - faceting will be the same, and distributed more-like-this is also 
possible since 5.0 (there is a working patch for 4.10.3). Regular search 
will work as well since 5.0 because of distributed IDF, which you need to 
enable manually. Behaviour will not be the same if you rely on average 
document length statistics, which is the case when you use BM25 instead of 
the default TFIDF similarity. Solr will do the result merging, so everything 
is transparent. Awesome!
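For reference, distributed IDF in 5.0 is switched on with a statsCache entry in solrconfig.xml (the class shown is one of the available implementations; treat it as a sketch, not a recommendation):

```xml
<!-- solrconfig.xml: enable distributed IDF (Solr 5.0+, SOLR-1632) -->
<statsCache class="org.apache.solr.search.stats.LRUStatsCache"/>
```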

Markus 
 
-Original message-
 From:johnmu...@aol.com johnmu...@aol.com
 Sent: Thursday 5th March 2015 14:38
 To: solr-user@lucene.apache.org
 Subject: Cores and and ranking (search quality)
 
 Hi,
 
  I have data which I will index and search on.  This data is well defined, 
  such that I can index it into a single core or multiple cores like so: 
  core_1:Jan2015, core_2:Feb2015, core_3:Mar2015, etc.
 
 My question is this: if I put my data in multiple cores and use distributed 
 search will the ranking be different if I had all my data in a single core?  
 If yes, how will it be different?  Also, will facet and more-like-this 
 quality / result be the same?
 
 Also, reading the distributed search wiki 
 (http://wiki.apache.org/solr/DistributedSearch) it looks like Solr does the 
 search and result merging (all I have to do is issue a search), is this 
 correct?
 
 Thanks!
 
 - MJ
 


SOLR query parameters

2015-03-05 Thread phiroc
Hello,

could someone please explain what these SOLR query parameter keywords stand for:

- ppcdb

- srbycb

- as

For instance,

http://searcharchives.iht.com:8983/solr/inytapdf0/browse?ppdcb=&srbycb=&as=&q=kaiser&sort=

I could not find them in the SOLR documentation.

Many thanks.

Philippe






Re: SOLR query parameters

2015-03-05 Thread phiroc
Please ignore my question.

These are form field names which I created a couple of months ago, not SOLR 
query parameters.

Philippe


- Mail original -
De: phi...@free.fr
À: solr-user@lucene.apache.org
Envoyé: Jeudi 5 Mars 2015 14:54:26
Objet: SOLR query parameters

Hello,

could someone please explain what these SOLR query parameter keywords stand for:

- ppcdb

- srbycb

- as

For instance,

http://searcharchives.iht.com:8983/solr/inytapdf0/browse?ppdcb=&srbycb=&as=&q=kaiser&sort=

I could not find them in the SOLR documentation.

Many thanks.

Philippe






Re: Cores and and ranking (search quality)

2015-03-05 Thread Toke Eskildsen
On Thu, 2015-03-05 at 14:34 +0100, johnmu...@aol.com wrote:
 My question is this: if I put my data in multiple cores and use
 distributed search will the ranking be different if I had all my data
 in a single core?

Yes, it will be different. The practical impact depends on how
homogeneous your data are across the shards and how large your shards
are. If you have small and dissimilar shards, your ranking will suffer a
lot.

Work is being done to remedy this:
https://issues.apache.org/jira/browse/SOLR-1632

 Also, will facet and more-like-this quality / result be the same?

It is not formally guaranteed, but for most practical purposes, faceting
on multi-shards will give you the same results as single-shards.

I don't know about more-like-this. My guess is that it will be affected
in the same way that standard searches are.

 Also, reading the distributed search wiki
 (http://wiki.apache.org/solr/DistributedSearch) it looks like Solr does
 the search and result merging (all I have to do is issue a search), is
 this correct?

Yes. From a user-perspective, searches are no different.

- Toke Eskildsen, State and University Library, Denmark




[ANNOUNCE] Apache Solr 4.10.4 released

2015-03-05 Thread Michael McCandless
March 2015, Apache Solr™ 4.10.4 available

The Lucene PMC is pleased to announce the release of Apache Solr 4.10.4

Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Lucene project. Its major features include powerful
full-text search, hit highlighting, faceted search, dynamic
clustering, database integration, rich document (e.g., Word, PDF)
handling, and geospatial search. Solr is highly scalable, providing
fault tolerant distributed search and indexing, and powers the search
and navigation features of many of the world's largest internet sites.

Solr 4.10.4 is available for immediate download at:

http://www.apache.org/dyn/closer.cgi/lucene/solr/4.10.4

Solr 4.10.4 includes 24 bug fixes, as well as Lucene 4.10.4 and its 13
bug fixes.

See the CHANGES.txt file included with the release for a full list of
changes and further details.

Please report any feedback to the mailing lists
(http://lucene.apache.org/solr/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring
network for distributing releases. It is possible that the mirror you
are using may not have replicated the release yet. If that is the
case, please try another mirror. This also goes for Maven access.

Mike McCandless

http://blog.mikemccandless.com


Re: solr 4.7.2 mergeFactor/ Merge policy issue

2015-03-05 Thread Erick Erickson
I would, BTW, either just get rid of maxBufferedDocs altogether or
make it much higher, i.e. 10. I don't think this is really your
problem, but you're creating a lot of segments here.
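The knobs under discussion live in the indexConfig block of solrconfig.xml; a sketch (the values are illustrative assumptions, not recommendations):

```xml
<indexConfig>
  <!-- flush a new segment when the in-memory buffer reaches this size -->
  <ramBufferSizeMB>100</ramBufferSizeMB>
  <!-- if set, also flush after this many buffered docs;
       omit it to rely on the RAM size alone -->
  <maxBufferedDocs>100000</maxBufferedDocs>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
  </mergePolicy>
</indexConfig>
```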

But I'm kind of at a loss as to what would be different about your setup.
Is there _any_ chance that you have some secondary process looking at
your index that's maintaining open searchers? Any custom code that's
perhaps failing to close searchers? Is this a Unix or Windows system?

And just to be really clear, you're _only_ seeing more segments being
added, right? If you're only counting files in the index directory, it's
_possible_ that merging is happening and you're just seeing new files take
the place of old ones.

Best,
Erick

On Wed, Mar 4, 2015 at 7:12 PM, Shawn Heisey apa...@elyograg.org wrote:
 On 3/4/2015 4:12 PM, Erick Erickson wrote:
 I _think_, but don't know for sure, that the merging stuff doesn't get
 triggered until you commit, it doesn't just happen.

 Shot in the dark...

 I believe that new segments are created when the indexing buffer
 (ramBufferSizeMB) fills up, even without commits.  I'm pretty sure that
 anytime a new segment is created, the merge policy is checked to see
 whether a merge is needed.

 Thanks,
 Shawn



Parsing cluster result's docs

2015-03-05 Thread Jorge Lazo

Hi,

I have a Solr instance using the clustering component (with the Lingo 
algorithm) working perfectly. However, when I get back the cluster 
results, only the IDs of the documents come back. What is the easiest 
way to retrieve full documents instead? Should I parse these IDs into a 
new query to Solr, or is there some configuration I am missing that 
returns full docs instead of IDs?


If it matters, I am using Solr 4.10.

Thanks.


Solrcloud Index corruption

2015-03-05 Thread Martin de Vries

Hi,

We have index corruption on some cores on our Solrcloud running version 
4.8.1. The index is corrupt on several servers. (for example: when we do 
an fq search we get results on some servers, on other servers we don't, 
while the stored document contains the field on all servers).


A full re-index of the content didn't help, so we created a new core 
and did the reindex on that one.


We think the index corruption is caused by network issues we had a few 
weeks ago. I hope someone can help us with some questions:
- Is it possible to make Solrcloud invulnerable for network problems 
like packet loss or connection errors? Will it for example help to use 
an SSL connection between the Solr servers?
- Is it possible to see if a core is corrupt? We now noticed because we 
didn't find some documents while searching on the website, but don't 
know if other cores are corrupt. I noticed a difference in the Gen 
column on Overview - Replication. Does this mean there is something 
wrong? Or is there any other way to see the corruption?


Corrupt core:
Version Gen Size
Master (Searching)  1425565575249   2023309 472.41 MB
Master (Replicable) 1425566098510   2023310 -
Slave (Searching)   1425565575253   2023308 472.38 MB

Re-created core:
Version Gen Size
Master (Searching)  1425566108174   35  283.98 MB
Master (Replicable) 1425566108174   35  -
Slave (Searching)   1425566106674   35  288.24 MB



Kind regards,

Martin




Re: Solrcloud Index corruption

2015-03-05 Thread Andrew Butkus
We had a similar issue; when it happened we did a fetchindex on each core 
that was out of sync to put them back right again.

Sent from my iPhone

 On 5 Mar 2015, at 14:40, Martin de Vries mar...@downnotifier.com wrote:
 
 Hi,
 
 We have index corruption on some cores on our Solrcloud running version 
 4.8.1. The index is corrupt on several servers. (for example: when we do an 
 fq search we get results on some servers, on other servers we don't, while 
 the stored document contains the field on all servers).
 
 A full re-index of the content didn't help, so we created a new core and did 
 the reindex on that one.
 
 We think the index corruption is caused by network issues we had a few weeks 
 ago. I hope someone can help us with some questions:
 - Is it possible to make Solrcloud invulnerable for network problems like 
 packet loss or connection errors? Will it for example help to use an SSL 
 connection between the Solr servers?
 - Is it possible to see if a core is corrupt? We now noticed because we 
 didn't find some documents while searching on the website, but don't know if 
 other cores are corrupt. I noticed a difference in the Gen column on 
 Overview - Replication. Does this mean there is something wrong? Or is there 
 any other way to see the corruption?
 
 Corrupt core:
                      Version        Gen      Size
  Master (Searching)   1425565575249  2023309  472.41 MB
  Master (Replicable)  1425566098510  2023310  -
  Slave (Searching)    1425565575253  2023308  472.38 MB
 
 Re-created core:
                      Version        Gen  Size
  Master (Searching)   1425566108174  35   283.98 MB
  Master (Replicable)  1425566108174  35   -
  Slave (Searching)    1425566106674  35   288.24 MB
 
 
 
 Kind regards,
 
 Martin
 
 


RE: Solrcloud Index corruption

2015-03-05 Thread Andrew Butkus
Force a fetchindex on slave from master command: 
http://slave_host:port/solr/replication?command=fetchindex - from 
http://wiki.apache.org/solr/SolrReplication

The above command will download the whole index from master to slave. There are 
configuration options in solr to make this problem happen less often (allowing 
it to recover from newly added documents and only send the changes, with a wider 
gap) - but I can't remember what those were.

-Original Message-
From: Andrew Butkus [mailto:andrew.but...@c6-intelligence.com] 
Sent: 05 March 2015 14:42
To: solr-user@lucene.apache.org
Subject: Re: Solrcloud Index corruption

We had a similar issue, when this happened we did a fetch index on each core 
out of sync to put them back right again 

Sent from my iPhone

 On 5 Mar 2015, at 14:40, Martin de Vries mar...@downnotifier.com wrote:
 
 Hi,
 
 We have index corruption on some cores on our Solrcloud running version 
 4.8.1. The index is corrupt on several servers. (for example: when we do an 
 fq search we get results on some servers, on other servers we don't, while 
 the stored document contains the field on all servers).
 
 A full re-index of the content didn't help, so we created a new core and did 
 the reindex on that one.
 
 We think the index corruption is caused by network issues we had a few weeks 
 ago. I hope someone can help us with some questions:
 - Is it possible to make Solrcloud invulnerable for network problems like 
 packet loss or connection errors? Will it for example help to use an SSL 
 connection between the Solr servers?
 - Is it possible to see if a core is corrupt? We now noticed because we 
 didn't find some documents while searching on the website, but don't know if 
 other cores are corrupt. I noticed a difference in the Gen column on 
 Overview - Replication. Does this mean there is something wrong? Or is there 
 any other way to see the corruption?
 
 Corrupt core:
                      Version        Gen      Size
  Master (Searching)   1425565575249  2023309  472.41 MB
  Master (Replicable)  1425566098510  2023310  -
  Slave (Searching)    1425565575253  2023308  472.38 MB
 
 Re-created core:
                      Version        Gen  Size
  Master (Searching)   1425566108174  35   283.98 MB
  Master (Replicable)  1425566108174  35   -
  Slave (Searching)    1425566106674  35   288.24 MB
 
 
 
 Kind regards,
 
 Martin
 
 


Cores and and ranking (search quality)

2015-03-05 Thread johnmunir
Hi,

I have data which I will index and search on.  This data is well defined, such 
that I can index it into a single core or multiple cores like so: core_1:Jan2015, 
core_2:Feb2015, core_3:Mar2015, etc.

My question is this: if I put my data in multiple cores and use distributed 
search will the ranking be different if I had all my data in a single core?  If 
yes, how will it be different?  Also, will facet and more-like-this quality / 
result be the same?

Also, reading the distributed search wiki 
(http://wiki.apache.org/solr/DistributedSearch) it looks like Solr does the 
search and result merging (all I have to do is issue a search), is this correct?

Thanks!

- MJ


RE: Solrcloud Index corruption

2015-03-05 Thread Martin de Vries

Hi Andrew,

Even our master index is corrupt, so I'm afraid this won't help in our 
case.


Martin


Andrew Butkus schreef op 05.03.2015 16:45:


Force a fetchindex on slave from master command:
http://slave_host:port/solr/replication?command=fetchindex - from
http://wiki.apache.org/solr/SolrReplication

The above command will download the whole index from master to slave,
there are configuration options in solr to make this problem happen less
often (allowing it to recover from new documents added and only send the
changes with a wider gap) - but I can't remember what those were.




Re: Performance on faceting using docValues

2015-03-05 Thread Toke Eskildsen
On Thu, 2015-03-05 at 21:14 +0100, lei wrote:

You present a very interesting observation. I have not noticed what you
describe, but on the other hand we have not done comparative speed
tests.

 q=*:*&fq=country:US&fq=category:112

First observation: Your query is '*:*', which is a magic query. Non-DV
faceting has optimizations both for this query (although that ought to
be disabled due to the fq) and for the inverse case where there are
more hits than non-hits. Perhaps you could test with a handful of
queries which have different result sizes?
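A sketch of such a comparison (the core name and the two smaller queries are placeholders):

```shell
# Same facet parameters, three queries with very different hit counts.
FACETS="facet=on&facet.field=material&facet.sort=index&facet.mincount=1&facet.limit=2000"
Q_ALL="q=*:*"                    # match-all: special-cased by non-DV faceting
Q_BIG="q=category:112"           # large subset
Q_SMALL="q=manufacturer:acme"    # small subset
for Q in "$Q_ALL" "$Q_BIG" "$Q_SMALL"; do
  echo "http://localhost:8983/solr/core1/select?${Q}&${FACETS}"
done
```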

 facet=on&facet.sort=index&facet.mincount=1&facet.limit=2000

The combination of index order and a high limit might be an explanation:
When resolving the Strings of the facet result, non-DV will perform
ordinal-lookup, which is fast when done in monotonic rising order
(sort=index) and if the values are close (limit=2000). I do not know if
DV benefits the same way.

On the other hand, your limit seems to apply only to material, so it
could be that the real number of unique values is low and you just set
the limit to 2000 to be sure you get everything?

 facet.field=manufacturer&facet.field=seller&facet.field=material
 f.manufacturer.facet.mincount=1&f.manufacturer.facet.sort=count&f.manufacturer.facet.limit=100
 f.seller.facet.mincount=1&f.seller.facet.sort=count&f.seller.facet.limit=100
 f.material.facet.mincount=1&sort=score+desc

How large is your index in bytes, how many documents does it contain and
is it single-shard or cloud? Could you paste the loglines containing
UnInverted field, which describes the number of unique values and size
of your facet fields?

- Toke Eskildsen, State and University Library, Denmark




Re: problem with tutorial

2015-03-05 Thread gaohang wang
Do you deploy your solr in tomcat? Which tomcat port do you use?

2014-12-16 15:45 GMT+08:00 Xin Cai xincai2...@gmail.com:

 hi Everyone
 I am a complete noob when it comes to Solr and when I try to follow the
 tutorial and run Solr I get the error message

 Waiting to see Solr listening on port 8983 [-]  Still not seeing Solr
 listening on 8983 after 30 seconds!

  I did some googling and all I found were instructions for removing grep
  commands, which doesn't sound right to me... I have checked my ports and
  currently I don't have any service listening on port 8983, and my firewall
  is not on, so I am not sure what is happening. Any help would be
  appreciated. Thanks

 Xin Cai