Re: Tips for faster indexing

2015-07-21 Thread solr . user . 1507
I can confirm this behavior. I see it when sending JSON docs in batches: it never
happens when sending them one by one, but it happens sporadically with batches.

It is as if Solr/Jetty drops a couple of documents out of the batch.

Regards

 On 21 Jul 2015, at 21:38, Vineeth Dasaraju vineeth.ii...@gmail.com wrote:
 
 Hi,
 
 Thank you, Erick, for your input. I tried creating batches of 1000 objects
 and indexing them to Solr. The performance is much better than before, but I
 find that the number of indexed documents shown in the dashboard is
 lower than the number of documents that I actually indexed through
 SolrJ. My code is as follows:
 
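 import java.io.File;
 import java.io.IOException;
 import java.util.ArrayList;
 import java.util.Collection;
 import java.util.Scanner;

 import org.apache.solr.client.solrj.SolrClient;
 import org.apache.solr.client.solrj.SolrServerException;
 import org.apache.solr.client.solrj.impl.HttpSolrClient;
 import org.apache.solr.common.SolrInputDocument;
 import org.json.simple.JSONArray;
 import org.json.simple.JSONObject;
 import org.json.simple.JSONValue;
 import org.json.simple.parser.JSONParser;
 import org.json.simple.parser.ParseException;

 private static final String SOLR_SERVER_URL = "http://localhost:8983/solr/newcore";
 private static final String JSON_FILE_PATH = "/home/vineeth/week1_fixed.json";
 private static final int BATCH_SIZE = 1000;
 private static JSONParser parser = new JSONParser();
 private static SolrClient solr = new HttpSolrClient(SOLR_SERVER_URL);

 public static void main(String[] args) throws IOException,
         SolrServerException, ParseException {
     File file = new File(JSON_FILE_PATH);
     int count = 0; // number of JSON objects read (one per line)
     Collection<SolrInputDocument> batch = new ArrayList<>();
     // try-with-resources closes the Scanner when the file is done
     try (Scanner scn = new Scanner(file, "UTF-8")) {
         while (scn.hasNextLine()) {
             JSONObject object = (JSONObject) parser.parse(scn.nextLine());
             batch.add(indexJSON(object));
             count++;
             // Flush only when the batch is full, so every batch
             // except the last holds exactly BATCH_SIZE documents.
             if (batch.size() == BATCH_SIZE) {
                 solr.add(batch);
                 System.out.println("Indexed " + count + " objects.");
                 batch = new ArrayList<>();
             }
         }
     }
     if (!batch.isEmpty()) {
         solr.add(batch); // final, partially filled batch
     }
     solr.commit();
     System.out.println("Indexed " + count + " objects.");
 }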
 
 public static SolrInputDocument indexJSON(JSONObject jsonOBJ) throws
         ParseException, IOException, SolrServerException {
     // Parent document: the event itself.
     SolrInputDocument mainEvent = new SolrInputDocument();
     mainEvent.addField("id", generateID());
     mainEvent.addField("RawEventMessage", jsonOBJ.get("RawEventMessage"));
     mainEvent.addField("EventUid", jsonOBJ.get("EventUid"));
     mainEvent.addField("EventCollector", jsonOBJ.get("EventCollector"));
     mainEvent.addField("EventMessageType", jsonOBJ.get("EventMessageType"));
     mainEvent.addField("TimeOfEvent", jsonOBJ.get("TimeOfEvent"));
     mainEvent.addField("TimeOfEventUTC", jsonOBJ.get("TimeOfEventUTC"));

     // Child document built from the nested "User" object.
     Object obj = parser.parse(jsonOBJ.get("User").toString());
     JSONObject userObj = (JSONObject) obj;

     SolrInputDocument childUserEvent = new SolrInputDocument();
     childUserEvent.addField("id", generateID());
     childUserEvent.addField("User", userObj.get("User"));

     // Child document built from the nested "EventDescription" object.
     obj = parser.parse(jsonOBJ.get("EventDescription").toString());
     JSONObject eventdescriptionObj = (JSONObject) obj;

     SolrInputDocument childEventDescEvent = new SolrInputDocument();
     childEventDescEvent.addField("id", generateID());
     childEventDescEvent.addField("EventApplicationName",
             eventdescriptionObj.get("EventApplicationName"));
     childEventDescEvent.addField("Query", eventdescriptionObj.get("Query"));

     // Each entry of "Information" becomes a nested domain document.
     obj = JSONValue.parse(eventdescriptionObj.get("Information").toString());
     JSONArray informationArray = (JSONArray) obj;

     for (int i = 0; i < informationArray.size(); i++) {
         JSONObject domain = (JSONObject) informationArray.get(i);

         SolrInputDocument domainDoc = new SolrInputDocument();
         domainDoc.addField("id", generateID());
         domainDoc.addField("domainName", domain.get("domainName"));

         String s = domain.get("columns").toString();
         obj = JSONValue.parse(s);
         JSONArray ColumnsArray = (JSONArray) obj;

         // Container document with one child per column entry.
         SolrInputDocument columnsDoc = new SolrInputDocument();
         columnsDoc.addField("id", generateID());

         for (int j = 0; j < ColumnsArray.size(); j++) {
             JSONObject ColumnsObj = (JSONObject) ColumnsArray.get(j);
             SolrInputDocument columnDoc = new SolrInputDocument();
             columnDoc.addField("id", generateID());
             columnDoc.addField("movieName", ColumnsObj.get("movieName"));
             columnsDoc.addChildDocument(columnDoc);
         }
         domainDoc.addChildDocument(columnsDoc);
         childEventDescEvent.addChildDocument(domainDoc);
     }

     mainEvent.addChildDocument(childEventDescEvent);
     mainEvent.addChildDocument(childUserEvent);
     return mainEvent;
 }
 
 I would be grateful if you could let me know what I am missing.
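 One possible cause worth ruling out (an assumption, since generateID() is not
 shown): Solr treats a document whose id (the uniqueKey) matches an existing
 document as a replacement for it. If generateID() ever returns the same value
 twice, within one batch or across batches, fewer documents end up in the
 index than were sent. As far as I know, nested child documents are indexed as
 separate Lucene documents, so with children the dashboard count would
 normally be higher than the number of top-level objects, not lower. The
 following is a minimal JDK-only sketch; DuplicateIdCheck is a hypothetical
 helper that runs over a list of generated ids.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: finds ids that occur more than once in a list of generated ids.
class DuplicateIdCheck {
    // Returns each repeated id once, in the order it was first repeated.
    static Set<String> findDuplicates(List<String> ids) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String id : ids) {
            if (!seen.add(id)) { // add() returns false if the id was already seen
                duplicates.add(id);
            }
        }
        return duplicates;
    }

    // Number of documents that would replace an earlier one with the same id.
    static int countOverwritten(List<String> ids) {
        return ids.size() - new HashSet<>(ids).size();
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("a1", "b2", "a1", "c3", "b2", "a1");
        System.out.println("duplicate ids: " + findDuplicates(ids)
                + ", overwritten: " + countOverwritten(ids));
        // prints "duplicate ids: [a1, b2], overwritten: 3"
    }
}
```

 In practice you would append every value returned by generateID() (parents
 and children alike) to such a list while building the documents, then run
 the check.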
 
 On Sun, Jul 19, 2015 at 2:16 PM, Erick Erickson erickerick...@gmail.com
 wrote:
 
 First, it looks like you're only sending one document at a
 time, perhaps with child objects. This is not optimal at all. I
 usually batch my docs up in groups of 1,000, and there is anecdotal
 evidence that there may (depending on the docs) be some gains above
 that number. Of course, you have to balance the batch size against how
 big the docs are.
 
 Assuming that you really are calling this method for one doc (and
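 Erick's batching advice can be sketched without any Solr dependency as a
 buffer that flushes exactly when it reaches the batch size, flushes once more
 at the end, and counts what it sent, which gives a reliable number to compare
 against the dashboard. Batcher is a hypothetical class; its sink callback
 stands in for a call such as solr.add(batch), and 1000 is only the starting
 size Erick mentions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: buffers items and hands them to `sink` in fixed-size batches.
// In indexing code the sink would wrap something like solr.add(batch).
class Batcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private List<T> buffer = new ArrayList<>();
    private long sent = 0;   // total items passed to the sink
    private int batches = 0; // number of sink calls

    Batcher(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    void add(T item) {
        buffer.add(item);
        // Flush on the buffer size, not a loop counter, so batches are never short or oversized.
        if (buffer.size() == batchSize) {
            flush();
        }
    }

    // Sends whatever is buffered; call once more after the last add().
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(buffer);
        sent += buffer.size();
        batches++;
        buffer = new ArrayList<>(); // fresh list, so the sink may keep the old one
    }

    long sent() { return sent; }

    int batches() { return batches; }

    public static void main(String[] args) {
        List<Integer> sizes = new ArrayList<>();
        Batcher<String> b = new Batcher<>(1000, batch -> sizes.add(batch.size()));
        for (int i = 0; i < 2500; i++) {
            b.add("doc-" + i);
        }
        b.flush(); // the final partial batch
        System.out.println(sizes + " sent=" + b.sent());
        // prints "[1000, 1000, 500] sent=2500"
    }
}
```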
 

Re: Basic auth

2015-07-19 Thread solr . user . 1507
I followed this guide:
http://learnsubjects.drupalgardens.com/content/how-place-http-authentication-solr

But something is wrong. Can anyone help, or point me to a guide on how to
set up HTTP basic auth?

Regards

 On 19 Jul 2015, at 01:10, solr.user.1...@gmail.com wrote:
 
 SOLR-4470 is about:
 Support for basic auth in internal Solr requests.
 
 What is wrong with the internal requests?
 Can someone explain this in simple terms? Will it ever be possible to run with
 basic auth? What workarounds are there?
 
 Regards
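For context on what SOLR-4470 is about: HTTP basic auth (RFC 7617) adds nothing
more than an "Authorization: Basic <base64 of user:password>" header to each
request. Requests you send yourself can carry it, but the internal requests
Solr nodes send to each other presumably do not, which is what that issue's
title refers to. A minimal JDK-only sketch of building the header value; the
credentials are the example pair from RFC 7617, not anything Solr-specific.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Builds the value of an HTTP Basic "Authorization" header (RFC 7617):
// "Basic " followed by base64("user:password").
// Base64 is an encoding, not encryption, so this should only travel over HTTPS.
class BasicAuthHeader {
    static String value(String user, String password) {
        if (user.contains(":")) {
            // RFC 7617 does not allow a colon in the user-id.
            throw new IllegalArgumentException("user-id must not contain ':'");
        }
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Example credentials from RFC 7617; replace with your own.
        System.out.println("Authorization: " + value("Aladdin", "open sesame"));
        // prints "Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="
    }
}
```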


Basic auth

2015-07-18 Thread solr . user . 1507
SOLR-4470 is about:
Support for basic auth in internal Solr requests.

What is wrong with the internal requests?
Can someone explain this in simple terms? Will it ever be possible to run with
basic auth? What workarounds are there?

Regards

Re: Programmatically find out if node is overseer

2015-07-17 Thread solr . user . 1507
Hi Anshum, what do you mean by:
ideally, there shouldn't be a point where you have multiple active
Overseers in a single cluster

How can multiple Overseers happen, and what are the consequences?

Regards

 On 17 Jul 2015, at 19:37, Anshum Gupta ans...@anshumgupta.net wrote:
 
 ideally, there shouldn't be a point where you have multiple active
 Overseers in a single cluster


Re: Setup cloud collection

2015-07-16 Thread solr . user . 1507
Thanks Shawn, but I don't want to build something in front of SolrCloud to help
Solr assign the leader role so that the indexing load is distributed.

Instead of this manual step (rebalancing leaders), maybe one host should not
take the leader role for multiple shards of the same collection when the number
of live nodes equals the number of shards.

But given that you say it will happen over time, maybe I'll continue indexing
and see whether the leaders get rebalanced soon.

Regards

 On 16 Jul 2015, at 14:57, Shawn Heisey apa...@elyograg.org wrote:
 
 On 7/16/2015 5:51 AM, SolrUser2015 wrote:
 Hi, I'm new to Solr!
 
 So I downloaded version 5.2 and modified the solr file so that it allows me
 to create a 5-node cluster:
 
 5 shards and replication factor 3 
 
 Now I see that one node is marked as leader for 3 shards.
 
 So my question is: how can 1 node serve requests for 3 shards? Wouldn't that
 be an uneven distribution of load?
 
 SolrCloud will distribute individual queries to different replicas, so
 over time the entire cloud will be used.  The leader role shouldn't
 affect queries, that role is mostly there for indexing and fault handling.
 
 If you are really concerned about this, you can assign preferred leaders
 and then ask Solr to reshuffle them.  I have never used this
 functionality.  Here's the documentation on it:
 
 https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-RebalanceLeaders
 
 Thanks,
 Shawn
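Shawn's two steps (mark a preferred leader for each shard, then ask Solr to
reshuffle) are plain Collections API calls, so a browser is enough. The sketch
below only builds the URLs, using ADDREPLICAPROP with property=preferredLeader
and REBALANCELEADERS as described on the Collections API page linked above.
The base URL, collection name, shard name and replica name (core_node1) are
hypothetical placeholders; real replica names can be read from the cluster
state in ZooKeeper (Cloud > Tree in the admin UI).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Builds Collections API URLs for the preferred-leader workflow.
// Base URL, collection, shard and replica names are hypothetical placeholders.
class PreferredLeaderUrls {
    static String collectionsUrl(String baseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return baseUrl + "/admin/collections?" + query;
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Step 1: mark one replica of a shard as that shard's preferred leader.
    static String addPreferredLeader(String baseUrl, String collection,
                                     String shard, String replica) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("action", "ADDREPLICAPROP");
        p.put("collection", collection);
        p.put("shard", shard);
        p.put("replica", replica);
        p.put("property", "preferredLeader");
        p.put("property.value", "true");
        return collectionsUrl(baseUrl, p);
    }

    // Step 2: ask Solr to move leadership to the preferred replicas.
    static String rebalanceLeaders(String baseUrl, String collection) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("action", "REBALANCELEADERS");
        p.put("collection", collection);
        return collectionsUrl(baseUrl, p);
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr";
        System.out.println(addPreferredLeader(base, "mycollection", "shard1", "core_node1"));
        System.out.println(rebalanceLeaders(base, "mycollection"));
    }
}
```

Paste each printed URL into a browser: one ADDREPLICAPROP call per shard, with
the chosen replicas spread over different nodes, then REBALANCELEADERS once.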
 


Re: Setup cloud collection

2015-07-16 Thread solr . user . 1507
Thank you, very good explanation.

Regards

 On 16 Jul 2015, at 17:12, Shawn Heisey apa...@elyograg.org wrote:
 
 On 7/16/2015 7:47 AM, solr.user.1...@gmail.com wrote:
 Thanks Shawn, but I don't want to build something in front of SolrCloud to
 help Solr assign the leader role so that the indexing load is distributed.
 
 Instead of this manual step (rebalancing leaders), maybe one host should
 not take the leader role for multiple shards of the same collection when the
 number of live nodes equals the number of shards.
 
 But given that you say it will happen over time, maybe I'll continue
 indexing and see whether the leaders get rebalanced soon.
 
 Unless you have a fairly major event (like Solr restarting or an
 operation taking longer than zkClientTimeout) your leaders will never
 change.  It's a semi-permanent role.  When a qualifying event happens,
 SolrCloud does an election process to determine the leader, but
 elections do not happen unless you force them with a REBALANCELEADERS
 action or one of several errors occurs.
 
 You don't have to build anything in front of Solr.  You simply have to
 assign a preferred leader for each shard, an action that can be done
 with an HTTP call in a browser.  I don't think we have anything in the
 admin UI to assign preferred leaders ... I will look into it and open an
 issue if necessary.
 
 The thing that I'm saying will happen over time is that all replicas
 will be used for queries.  If you send a thousand queries, you'll find
 that they will be divided fairly evenly among all replicas.  The fact
 that you have one node as leader for three of your shards is not a big
 deal, but if you really want to change it, you can do so
 with the preferred leader feature.
 
 Thanks,
 Shawn