SolrCloud replica always fully resync index from leader node

2014-11-28 Thread stephon
I have a SolrCloud core with 4 shards and a replication factor of 1, as
listed below:
* coreA_shard1_replica1
* coreA_shard2_replica1
* coreA_shard3_replica1
* coreA_shard4_replica1

After I added a new replica of coreA_shard1 (coreA_shard1_replica2), it
does a full resync from the leader node (coreA_shard1_replica1) every 2
days.

In the solrconfig.xml of coreA, autoCommit is set to 30 secs:

 <autoCommit>
   <maxTime>3</maxTime>
   <openSearcher>true</openSearcher>
 </autoCommit>

and replicateAfter is set to commit.

How do I prevent coreA_shard1_replica2 from always fully resyncing from
coreA_shard1_replica1 ?

Thanks a lot.

stephon



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-replica-always-fully-resync-index-from-leader-node-tp4171403.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud replica always fully resync index from leader node

2014-11-28 Thread lboutros
Hi Stephon,

Do you see ZooKeeper timeout errors in your log files?

Could you please give us some additional information, such as:

How often is your index updated? Which version of Solr do you use? What is
the size of your index?

Make sure you have this handler in your Solr configuration file:

<requestHandler name="/get" class="solr.RealTimeGetHandler">
  <lst name="defaults">
    <str name="omitHeader">true</str>
  </lst>
</requestHandler>

Ludovic.



-
Jouve
France.


Re: SolrCloud replica always fully resync index from leader node

2014-11-28 Thread stephon
Hello Ludovic,

I found no ZooKeeper timeout errors in the log files.

Here is my SolrCloud environment information.
* Solr 4.5.1 used
* Index size : ~270G
* Index update: every 30 secs, each update will contain 3~4 index
version changes
 * example:
  * old index version number:   1417165450218
  * new index version number: 1417165480450
* omitHeader has been set to true

If additional information is needed, please let me know.

Thanks a lot
---
stephon






phrase extraction from user paragraph input

2014-11-28 Thread Nikos Chaliasos
Hello,

I am investigating a university project where in a part of it, the user
would give a paragraph of text as input and the parsing process (after
removing stopwords) would extract a series of descriptive topics about the
paragraph, with which I could then search in documents for results.
Is there any available bibliography/source that could help me get started? I
am very new to Solr/Lucene and I couldn't find anything similar to what I
have in mind.

Thank you,

Nikos Chaliasos


Re: phrase extraction from user paragraph input

2014-11-28 Thread Vineet Mishra
Hi Nikos,

Can you give an example of your use case? I think that would help in
understanding the problem more clearly.

Cheers!



Upgrading Solr from 1.4.1 to 4.10

2014-11-28 Thread RajaDilipChowdary.Kolli
Hi Team,

We are using Apache Solr 1.4.1 for our project. Nowadays we are facing many
problems with Solr indexing. On the website we saw that the latest
version is 4.10. Could you please help us upgrade Solr?

Are there any specific things we need to change in our current setup?

Regards,
Raja
+91-8121704967


Re: Upgrading Solr from 1.4.1 to 4.10

2014-11-28 Thread David Philip
Hi Raja,

  Could you please list the Solr features that you were/are
using in Solr 1.4? There have been tremendous changes between 1.4 and 4.10.
Also, you may have to explore SolrCloud to address the indexing
problems. But what kind of indexing problems are you facing?

You should look into the link mentioned below. The best way to upgrade from
such an old version to the latest is to configure the features that you were
using in Solr 1.4 in Solr 4.10, run your test cases, and start using it.

Thanks - David.

https://cwiki.apache.org/confluence/display/solr/Upgrading+Solr














Re: phrase extraction from user paragraph input

2014-11-28 Thread Ahmet Arslan
Hi,

For the first part of the task, you can use key phrase extraction.

https://code.google.com/p/maui-indexer/
http://www.nzdl.org/Kea/

Ahmet







Solr 4.10.2 - DataImportHandler - Question

2014-11-28 Thread Umang Agrawal
Hi All

I have a question on loading data into Solr using DataImportHandler.

I have an XML file which I need to load into Solr using the data import handler
via XPath transformer:

XML file structure is:

<record>
  <item>
    <tags>
      <tag1>
        <tag>
          <id>01</id>
          <value>value01</value>
        </tag>
        <tag>
          <id>02</id>
          <value>value02</value>
        </tag>
      </tag1>
      <tag2>
        <tag>
          <id>03</id>
          <value>value03</value>
        </tag>
      </tag2>
      <tag3>
        <tag>
          <id>04</id>
          <value>value04</value>
        </tag>
        <tag>
          <id>05</id>
          <value>value05</value>
        </tag>
        <tag>
          <id>06</id>
          <value>value06</value>
        </tag>
      </tag3>
    </tags>
  </item>
</record>


I can use /record/item/tags//tag/id to load the data into a multivalued
field, but I need to maintain the relationship between each id and its value,
so that at query time I can get the value related to an id.
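To make the requirement concrete, here is a minimal sketch in plain Java using the JDK's XPath API. It is not DataImportHandler configuration, and the class and method names (TagPairSketch, idToValue) are hypothetical. It only illustrates the pairing that is lost when all ids are flattened into one multivalued field: each id is read relative to its own enclosing tag element, together with the value next to it.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only (not part of Solr or DIH).
final class TagPairSketch {

    /** Returns id -> value for every <tag> element found below /record/item/tags. */
    static Map<String, String> idToValue(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList tags = (NodeList) xpath.evaluate(
                    "/record/item/tags//tag",
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);

            Map<String, String> pairs = new LinkedHashMap<>();
            for (int i = 0; i < tags.getLength(); i++) {
                Node tag = tags.item(i);
                // Evaluate relative to the current <tag> so id and value stay paired.
                String id = xpath.evaluate("id", tag);
                String value = xpath.evaluate("value", tag);
                pairs.put(id, value);
            }
            return pairs;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate XPath on input", e);
        }
    }
}
```

Calling TagPairSketch.idToValue on the record above would map "01" to "value01", "02" to "value02", and so on. How to express the same pairing inside a DataImportHandler configuration is the open question of this message.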


Please advise. Thanks in advance.

-- 
Thanks & Regards
Umang Agrawal


Re: Trying to get ALL scores from a previous search in a custom search component (last-components)

2014-11-28 Thread Erick Erickson
Does grouping work for you here? Because
even if you solve this problem, if I'm reading
this right you're going to fetch stored values
for every doc that matches the query, which
is an anti-pattern big-time, consider *:*

Of course I did a very quick skim, so maybe
I'm all wet

Best,
Erick

On Thu, Nov 27, 2014 at 5:28 PM, Darin Amos dari...@gmail.com wrote:
 Hello,

 I am trying to implement a rollup search component on a version of Solr that
 predates the parent/child additions, so I am implementing
 my own. The searches will be executed exclusively against the child
 documents, and I want to "roll up" those child documents into the parent
 documents.

 The interface will allow the user to add the following parameters to
 the Solr query:

 rollup=true&rollup.parentField=id&rollup.childField=parentId

 My code so far is below. It works, except that my second (parent)
 query loses the order. I would like to sort my parent query by the
 score of the previous child search, perhaps taking the highest score
 from all children (I haven't decided yet). My problem, however, is that I
 don't know how to get the scores of all the hits in the original search, only
 of those that are returned. If my child query gets 10,000 hits but only returns
 100 records, I can't get all the scores I need.
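The "highest score from all children" idea can be sketched in plain Java, independent of Solr. This is only an illustration under a stated assumption: that each child hit is already available as a (parent id, score) pair. How to obtain the scores of all hits inside the component is exactly the open question here and is not shown. The names RollupScoreSketch, maxScorePerParent, and parentsByScore are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only (no Solr or Lucene classes involved).
final class RollupScoreSketch {

    /** Keeps the highest child score seen for each parent id. */
    static Map<String, Float> maxScorePerParent(String[] parentIds, float[] scores) {
        Map<String, Float> best = new LinkedHashMap<>();
        for (int i = 0; i < parentIds.length; i++) {
            // merge() stores the score on first sight, otherwise keeps the larger one.
            best.merge(parentIds[i], scores[i], Float::max);
        }
        return best;
    }

    /** Returns the parent ids ordered by their best child score, highest first. */
    static List<String> parentsByScore(Map<String, Float> best) {
        List<String> ids = new ArrayList<>(best.keySet());
        ids.sort(Comparator.comparing((String id) -> best.get(id)).reversed());
        return ids;
    }
}
```

With the parent ids in this order, the parent documents could be re-sorted after the second query instead of relying on the order that query returns.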

 Does anyone have any recommendations?

 Thanks!!
 Darin


 // Fragment from inside the component's process(ResponseBuilder rb) method.
 // Imports needed: java.util.ArrayList, java.util.HashSet, java.util.Set,
 //   org.apache.lucene.index.Term, org.apache.lucene.search.BooleanClause,
 //   org.apache.lucene.search.BooleanQuery, org.apache.lucene.search.Query,
 //   org.apache.lucene.search.TermQuery, org.apache.solr.response.ResultContext,
 //   org.apache.solr.search.DocIterator, org.apache.solr.search.DocList,
 //   org.apache.solr.search.SolrIndexSearcher
 SolrIndexSearcher searcher = rb.req.getSearcher();

 // Loop through all matching child documents and collect the distinct
 // values of the parent reference field.
 Set<String> parentRefs = new HashSet<String>();
 DocIterator docSetIterator = rb.getResults().docSet.iterator();
 while (docSetIterator.hasNext()) {
     int docInt = docSetIterator.nextDoc();
     // Note: this loads the stored fields of every matching document.
     String[] fieldValues = searcher.doc(docInt).getValues(childFieldName);

     for (String fieldValue : fieldValues) {
         if (fieldValue != null && fieldValue.length() > 0) {
             parentRefs.add(fieldValue); // a Set ignores duplicates
         }
     }
 }

 // Build a boolean query of term queries, one per parent id.
 BooleanQuery parentQuery = new BooleanQuery();
 for (String parentId : parentRefs) {
     TermQuery termQuery = new TermQuery(new Term(parentFieldName, parentId));
     parentQuery.add(termQuery, BooleanClause.Occur.SHOULD);
 }

 // TODO: use correct start/end/flags later...
 DocList parentList = searcher.getDocList(parentQuery,
         new ArrayList<Query>(), null, 0, 100, 1);

 // Add parent results to the response.
 ResultContext resultContext = new ResultContext();
 resultContext.docs = parentList;
 resultContext.query = parentQuery;
 rb.rsp.add("parents", resultContext);
 rb.rsp.getToLog().add("hits", parentList.matches());


Re: SolrCloud replica always fully resync index from leader node

2014-11-28 Thread Erick Erickson
Stephon:

Not quite sure what's going on, but you're hinting
that you're mixing old-style replication with SolrCloud.
The two are orthogonal.

This, for instance, is irrelevant for SolrCloud:
and setting replicateAfter:commit

So let's see the relevant configuration from solrconfig.xml.

Also, what is your evidence that a full replication is happening?
Showing us what you see will offer some more clues.

Best,
Erick



Re: Upgrading Solr from 1.4.1 to 4.10

2014-11-28 Thread Erick Erickson
P.S. Do _NOT_ just copy your 1.4 configs to 4.x. Start
with the 4.x sample configs and selectively move any
customizations from 1.4, or you'll get burned
by things like the schema requiring _version_ in 4.x and
possibly _root_, etc.


Best,
Erick




Re: Trying to get ALL scores from a previous search in a custom search component (last-components)

2014-11-28 Thread Darin Amos
Hi Erick,

I am curious why it would be considered an anti-pattern to check a stored
value for every matching document. Is this not what the facet query component
is doing anyway so it can get the total counts?

Grouping doesn't solve the issue because, again, I will only see groups for the
items returned, not for all matching documents.

I am using a product that has Solr 4.3 embedded in it, so I cannot upgrade to
4.9 or 4.10 and take advantage of the parent/child feature added recently.

Thanks!

Darin




Re: Upgrading Solr from 1.4.1 to 4.10

2014-11-28 Thread Shawn Heisey
On 11/28/2014 2:44 AM, rajadilipchowdary.ko...@cognizant.com wrote:
 We are using Apache Solr 1.4.1 for our project. Now a days we are facing many 
 problems regarding solr indexing, so when we saw website we found latest 
 version is 4.10, could you please help us in Upgrading the Solr.
 
 Is there any specific things which we need to change from our current setup

Solr 1.4.1 is extremely solid software, despite the fact that it was
released over four years ago.  There are very few bugs in it, though it
is missing a lot of functionality and performance that can be found in
newer versions.  We would recommend upgrading, but unless you give us
details about the problems you are having, we won't know for sure
whether upgrading will solve those problems.

The other two replies you received talked about starting with the 4.10.2
example config/schema and customizing that until you achieve a config
that meets your needs.  That's excellent advice ... you should listen to it.

I would strongly recommend reindexing from scratch.  Version 4.x will not
be able to read indexes from Solr 1.4.1 at all, and if you take the
advice given earlier, your existing index may very well be incompatible
with the new config/schema that you create.

To use the existing index (assuming it's even compatible with the new
config/schema), you would first have to upgrade to version 3.6.2,
optimize the index, and then upgrade again.

You'll get better results by starting with a blank index on the new
version and reindexing.  Many bugs in indexing have been fixed over the
years.  Without a reindex, you will not see the benefit of those bugfixes.

http://wiki.apache.org/solr/HowToReindex

Your last question, about specific things to change ... this is
impossible to answer without seeing the existing setup ... and there
have been so many changes that starting over with the 4.10.2 example (as
already mentioned) is highly recommended.  It's a fair amount of work to
build a new config with such a major version jump, so you'll meet
resistance if you ask us to do it for you.

Thanks,
Shawn



Re: Trying to get ALL scores from a previous search in a custom search component (last-components)

2014-11-28 Thread Erick Erickson
Because you're fetching and decompressing the doc from disk. Grouping etc.
do their work from _indexed_ terms, which are already in memory. Two
different things.

If I'm reading this right on a quick scan...

Best
Erick




Re: Trying to get ALL scores from a previous search in a custom search component (last-components)

2014-11-28 Thread Darin Amos
Thanks for the advice,

I will take a look to see if there is some tuning I can do here. I am not
terribly concerned about that yet anyway.

My concern remains how I can get the scores of the entire matched
set. Maybe it is not possible, or perhaps I need to write my own
query/match/scorer to do this.

I am curious whether anyone else has tried to implement a similar rollup-type
search component before.

Thanks!

Darin

 
 



Re: SolrCloud replica always fully resync index from leader node

2014-11-28 Thread stephon
Hello Erick,

My solrconfig.xml is attached:
solrconfig.xml
http://lucene.472066.n3.nabble.com/file/n4171487/solrconfig.xml

It is running on a Debian server with 64GB RAM.

The evidence of full replication is that coreA_shard1_replica2 is in the
recovering state.
While it is in this state, solr/coreA_shard1_replica2/ contains an
index.TIMESTAMP directory that is being fully resynced from the leader node,
and it uses up the rest of my disk space :/.

What is the correct way to create a replica in SolrCloud 4.5?

Thanks a lot.





Constantly high disk read access (40-60M/s)

2014-11-28 Thread Po-Yu Chuang
Hi all,

I am using Solr 4.9 with Tomcat. Thanks to Yonik and Dmitry for their
suggestions about the slow startup. Everything works fine now, but I noticed
that the load average of the server is high because there is constantly
heavy disk read access. Please point me in the right direction.

Some numbers about my system:
RAM: 18G
swap space: 2G
number of documents: 27 million
Solr home: 185G
disk read access constantly 40-60M/s
document cache size: 16K entries
document cache hit ratio: 0.65
query cache size: 16K
query cache hit ratio: 0.03

At first, I wondered if the disk reads came from swap, so I decreased the
swappiness from 60 to 10, but the disk reads are still there, which means
that they do not result from swapping in.

Then I tried different document cache and query cache sizes. The
effect of changing the query cache size is not obvious. I tried 512, 16K, and
256K entries, and the hit ratio stays between 0.01 and 0.03.

For the document cache, a larger cache size did improve the hit ratio
(I tried 512, 16K, 256K, 512K, and 1024K entries, and the hit ratio
is between 0.58 and 0.87), but the disk reads are still high.

Is adjusting the document cache size a reasonable direction? Or should I just
increase the physical memory? Is there any method to estimate the right
size of the document cache (or other caches) and to estimate the amount of
physical memory needed?
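
As a rough back-of-the-envelope check on the numbers above, and assuming the 185G Solr home is mostly index data (that is an assumption, not something measured), 18G of RAM could hold at most just under a tenth of it in the operating system's file cache, and less once the JVM heap is subtracted. The sketch below only computes that upper bound; PageCacheSketch and maxCachedFraction are hypothetical names.

```java
// Hypothetical helper, for illustration only.
final class PageCacheSketch {

    /**
     * Upper bound on the fraction of the on-disk index that could be held in RAM.
     * Ignores the JVM heap and everything else competing for memory, so the
     * real figure is lower.
     */
    static double maxCachedFraction(double ramGb, double indexGb) {
        return Math.min(1.0, ramGb / indexGb);
    }
}
```

For example, PageCacheSketch.maxCachedFraction(18, 185) evaluates 18 / 185, which is a little under 0.1.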

Thanks,
Po-Yu