Sorting and grouping performance.

2015-04-09 Thread Modassar Ather
Hi, I have a 4-node SolrCloud without replicas on solr-4-10-3. If I add a 5th node (i.e. a 5-node SolrCloud without replicas) and re-index, kindly let me know: what effect can we expect on sorting and grouping performance with a 5-node SolrCloud compared to a 4-node SolrCloud? The document is

clusterstate.json is sometimes out-of-sync

2015-04-09 Thread Ahmed Adel
What I really meant is trying to get the cluster status directly through the ZK API. Your approach is a bit different from what I meant, but it's a nice one as it seems it will work across versions 4 and 5. On Thursday, April 9, 2015, Shalin Shekhar Mangar shalinman...@gmail.com

Re: Indexing Process: Lucene checksum error

2015-04-09 Thread Shalin Shekhar Mangar
This is not a problem. The message is logged during replication where it tries to find files that are mis-matched between leader and replica to determine whether a full or a partial replication is to be performed. If you actually get an exception in the logs saying CorruptIndexException then it is

RE: SOLR searching

2015-04-09 Thread Reitzel, Charles
The other question is how often do prices change? Is it much more often than other product info (or per-user-and-product info)? These are use cases for things like CurrencyField and ExternalFileField. The thing to know about these is that CurrencyField values are searchable, while

Re: Indexing Process: Lucene checksum error

2015-04-09 Thread Aman Tandon
Okay. Thanks Shalin. With Regards Aman Tandon On Thu, Apr 9, 2015 at 3:55 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: This is not a problem. The message is logged during replication where it tries to find files that are mis-matched between leader and replica to determine whether

Re: Shard is down, what to do?

2015-04-09 Thread Allan Kamau
What do the error logs say? Allan. On Thu, Apr 9, 2015 at 2:17 PM, Aman Tandon amantandon...@gmail.com wrote: Hi, One of my shards is down and is not recovering at all. It has been down for the last 4 hours. What should I do? What could be the reasons behind this? With Regards Aman Tandon

RE: Clusterstate - state active

2015-04-09 Thread Matt Kuiper
Erick, I do not give it an explicit name. I use call like: curl 172.29.24.47:8983/solr/admin/collections?action=ADDREPLICA&collection=kla_collection&shard=shard25&node=172.29.24.75:8983_solr It does not appear to be reusing the name, if by name you mean core_node*, or core. Both are different
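The Collections API call quoted above passes `action`, `collection`, `shard` and `node` as separate query parameters. A minimal sketch of building the same URL programmatically is shown below; the helper name `build_addreplica_url` is hypothetical, and the `http://` scheme is an assumption (the original curl command does not state one).

```python
from urllib.parse import urlencode


def build_addreplica_url(host, collection, shard, node):
    """Build a Collections API ADDREPLICA URL (hypothetical helper).

    host is "ip:port" of any Solr node; node is the target node name
    in the "ip:port_solr" form used in the message above.
    """
    params = {
        "action": "ADDREPLICA",
        "collection": collection,
        "shard": shard,
        "node": node,
    }
    # safe=":" keeps the colon in the node name readable instead of %3A.
    return "http://" + host + "/solr/admin/collections?" + urlencode(params, safe=":")


url = build_addreplica_url(
    "172.29.24.47:8983", "kla_collection", "shard25", "172.29.24.75:8983_solr"
)
print(url)
```

When running the equivalent curl command from a shell, the URL would normally need quoting so that `&` is not interpreted by the shell.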

Re: clusterstate.json is sometimes out-of-sync

2015-04-09 Thread Shalin Shekhar Mangar
I don't quite follow. Are you saying that you intend to use the ZK REST API to fetch live_nodes and then send the 'clusterstatus' API call to one of the live nodes? On Thu, Apr 9, 2015 at 7:13 PM, Ahmed Adel ahmed.a...@badrit.com wrote: In fact, the advantage I see of using ZK is that we don't

Re: clusterstate.json is sometimes out-of-sync

2015-04-09 Thread Shalin Shekhar Mangar
Hi Ahmed, Can you give more details? What did you expect and what was the actual? Also, are you looking directly at the clusterstate.json inside ZooKeeper or are you using the 'clusterstatus' Collection API? You shouldn't look at the clusterstate.json directly because 1) things like live-ness is

Re: Memory Leak in solr 4.8.1

2015-04-09 Thread Toke Eskildsen
On Wed, 2015-04-08 at 14:00 -0700, pras.venkatesh wrote: 1. 8 nodes, 4 shards(2 nodes per shard) 2. each node having about 55 GB of Data, in total there is 450 million documents in the collection. so the document size is not huge, So ~120M docs/shard. 3. The schema has 42 fields, it gets

Re: Memory Leak in solr 4.8.1

2015-04-09 Thread pras.venkatesh
I don't have a filter cache; I have completely disabled it, since I am not using filter queries.

Replication for SolrCloud

2015-04-09 Thread Vijaya Narayana Reddy Bhoomi Reddy
Hi, Can anyone please tell me how shard replication works when the indexes are stored in HDFS? i.e. with HDFS, the default replication factor is 3. Now, for the Solr shards, if I set the replication factor to 3 again, does that mean, internally index data is replicated thrice and then HDFS

Re: clusterstate.json is sometimes out-of-sync

2015-04-09 Thread Ahmed Adel
In fact, the advantage I see of using ZK is that we don't have to iterate over nodes in case the first node receiving that request is down, whereas, by using ZK REST API, we can do that in a single request as I assume we can check live_nodes (in case this approach is guaranteed when using Solr

Re: Group by score

2015-04-09 Thread Ryan Josal
You can use Result Grouping by a function using query(), but you'll need a version of Lucene with this bug fixed: https://issues.apache.org/jira/browse/SOLR-7046 Ryan On Thursday, April 9, 2015, Jens Mayer mjen...@yahoo.com.invalid wrote: Hey everybody, I have the following situation in my

integrating Accumulo with solr

2015-04-09 Thread madhvi
Hi, I have created lucene indexes of data stored in accumulo in HDFS. Lucene queries are working fine over that but I want to use those indexes to be searched via accumulo means the lucene queries should run via accumulo. Do you have any idea about that if it is related to what you are trying

Indexing Process: Lucene checksum error

2015-04-09 Thread Aman Tandon
Hi, I am getting this type of error while indexing on solr cloud. Could somebody help, I have no knowledge what it is. WARN - 2015-04-09 07:11:27.705; org.apache.solr.handler.SnapPuller; File _p_Lucene50_0.tip did not match. expected checksum is 1515849197 and actual is checksum 1522458868.

Re: Shard is down, what to do?

2015-04-09 Thread Shalin Shekhar Mangar
Please always report the Solr version whenever you are reporting a problem. It helps us track down issues faster. Please also post the complete ZK tree using a pastebin or gist.github.com link. You can copy the entire ZK tree from the Admin UI under Cloud > Dump tab. On Thu, Apr 9, 2015 at 4:47

variable length ngramfilter highlights

2015-04-09 Thread Dan Sullivan
Hi, I apologize if this question is redundant. I've spent a few days on it and scoured the Internet; I know that this question has been asked and answered in various capacities for different versions of Solr; the reason I am inquiring to this mailing list is because what I am attempting to do

Re: clusterstate.json is sometimes out-of-sync

2015-04-09 Thread Shalin Shekhar Mangar
Yes, you can use the 'clusterstatus' API which will return an aggregation of all states. See https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api18 On Thu, Apr 9, 2015 at 5:52 PM, Ahmed Adel ahmed.a...@badrit.com wrote: Hi Shalin, Thanks for your response. I'm
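As a rough illustration of the 'clusterstatus' approach recommended above, the sketch below requests `action=CLUSTERSTATUS` and summarises replica states. The base URL, the function names, and the sample data are hypothetical; the response layout (`cluster` > `collections` > `shards` > `replicas`, plus `live_nodes`) is assumed from the Collections API documentation, and treating a replica on a non-live node as "down" is this sketch's own simplification.

```python
import json
from urllib.request import urlopen


def fetch_cluster_status(base_url):
    """Call CLUSTERSTATUS on one Solr node, e.g. base_url="http://localhost:8983/solr"."""
    with urlopen(base_url + "/admin/collections?action=CLUSTERSTATUS&wt=json") as resp:
        return json.load(resp)


def replica_states(status):
    """Map (collection, shard, replica) to a state string.

    A replica whose node is missing from live_nodes is reported as "down",
    whatever state is recorded for it (an assumption of this sketch).
    """
    cluster = status["cluster"]
    live = set(cluster.get("live_nodes", []))
    result = {}
    for coll_name, coll in cluster.get("collections", {}).items():
        for shard_name, shard in coll.get("shards", {}).items():
            for replica_name, replica in shard.get("replicas", {}).items():
                state = replica.get("state", "unknown")
                if replica.get("node_name") not in live:
                    state = "down"
                result[(coll_name, shard_name, replica_name)] = state
    return result


# Hypothetical response fragment, used here instead of a live cluster.
sample = {
    "cluster": {
        "live_nodes": ["10.0.0.1:8983_solr"],
        "collections": {
            "c1": {
                "shards": {
                    "shard1": {
                        "replicas": {
                            "core_node1": {"state": "active", "node_name": "10.0.0.1:8983_solr"},
                            "core_node2": {"state": "active", "node_name": "10.0.0.2:8983_solr"},
                        }
                    }
                }
            }
        },
    }
}
states = replica_states(sample)
```

Against a running cluster, `replica_states(fetch_cluster_status(base_url))` would be the corresponding call.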

Shard is down, what to do?

2015-04-09 Thread Aman Tandon
Hi, One of my shards is down and is not recovering at all. It has been down for the last 4 hours. What should I do? What could be the reasons behind this? With Regards Aman Tandon

Re: Shard is down, what to do?

2015-04-09 Thread Aman Tandon
Sorry, but I am unable to find anything wrong in the logs. With Regards Aman Tandon On Thu, Apr 9, 2015 at 5:09 PM, Allan Kamau kamaual...@gmail.com wrote: What do the error logs say? Allan. On Thu, Apr 9, 2015 at 2:17 PM, Aman Tandon amantandon...@gmail.com wrote: Hi, One of my shards

Controlling ReplicationHandler consistently across multiple cores

2015-04-09 Thread Davis, Daniel (NIH/NLM) [C]
For backup purposes to an offsite data center, I need to make sure that each core's configuration has replication to a consistently defined backup directory on a Netapp filer. The Netapp filer's snapshot can be invoked manually, and its snap mirror will copy the data to the offsite data

Re: omitTermFreqAndPositions issue

2015-04-09 Thread Ryan Josal
Thanks a lot Erick, your suggestion on using similarity will work great; I wasn't aware you could define similarity on a field by field basis until now, and that solution works perfectly. Sorry what I said was a little misleading. I should have said I don't want it to issue phrase queries to that

clusterstate.json is sometimes out-of-sync

2015-04-09 Thread Ahmed Adel
Hi All, On Solr 5.0 and ZK 3.4.6, clusterstate.json sometimes does not reflect the aggregation of the states of the collections; the latter is always correct. I could verify this from the admin panel (under Tree view) and from ZKCli. Is there something I'm missing that could generate this issue? -- A.

Re: Replication for SolrCloud

2015-04-09 Thread Erick Erickson
Yes. 3 replicas and an HDFS replication factor of 3 means 9 copies of the index are lying around. You can change your HDFS replication factor, but that affects other applications using HDFS, so that may not be an option. Best, Erick On Thu, Apr 9, 2015 at 2:31 AM, Vijaya Narayana Reddy Bhoomi
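The arithmetic in the reply above (Solr replicas multiplied by the HDFS replication factor) can be written out as a small sketch. The function names and the 50 GB shard size are hypothetical and only illustrate the multiplication.

```python
def total_index_copies(solr_replicas, hdfs_replication_factor):
    """Physical copies of one shard's index: each Solr replica is itself stored
    hdfs_replication_factor times by HDFS."""
    return solr_replicas * hdfs_replication_factor


def total_storage_gb(shard_size_gb, solr_replicas, hdfs_replication_factor):
    """Disk space used by one shard across all of its copies."""
    return shard_size_gb * total_index_copies(solr_replicas, hdfs_replication_factor)


copies = total_index_copies(3, 3)
storage = total_storage_gb(50, 3, 3)  # hypothetical 50 GB shard
print(copies, storage)
```

Lowering either factor reduces the total proportionally; for example, 3 Solr replicas with an HDFS replication factor of 1 would leave 3 copies.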

Group by score

2015-04-09 Thread Jens Mayer
Hey everybody, I have the following situation in my search application: I've been searching street sources. By executing a search I receive several matches. The first 10 matches are displayed. But in this situation a part of the results are nearly the same. As an example, if I search for Berlin I'll

Re: clusterstate.json is sometimes out-of-sync

2015-04-09 Thread Ahmed Adel
Hi Shalin, Thanks for your response. I'm actually looking inside ZooKeeper in order to obtain highest availability. What I expected is that clusterstate.json contains the aggregation of all state.json children nodes of each collection. But your second paragraph explains the behavior I see in Solr