Re: Tips for faster indexing

2015-07-21 Thread solr . user . 1507
I can confirm this behavior. I see it when sending JSON docs in batches: it
never happens when sending them one by one, but it happens sporadically with
batches.

It is as if Solr/Jetty drops a couple of documents out of the batch.

Regards

> On 21 Jul 2015, at 21:38, Vineeth Dasaraju  wrote:
> 
> Hi,
> 
> Thank you, Erick, for your inputs. I tried creating batches of 1000 objects
> and indexing them into Solr. The performance is way better than before, but I
> find that the number of indexed documents shown in the dashboard is
> lower than the number of documents that I actually indexed through
> SolrJ. My code is as follows:
> 
> private static String SOLR_SERVER_URL = "http://localhost:8983/solr/newcore";
> private static String JSON_FILE_PATH = "/home/vineeth/week1_fixed.json";
> private static JSONParser parser = new JSONParser();
> private static SolrClient solr = new HttpSolrClient(SOLR_SERVER_URL);
> 
> public static void main(String[] args) throws IOException,
>         SolrServerException, ParseException {
>     File file = new File(JSON_FILE_PATH);
>     Scanner scn = new Scanner(file, "UTF-8");
>     JSONObject object;
>     int i = 0;
>     Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
>     while (scn.hasNext()) {
>         object = (JSONObject) parser.parse(scn.nextLine());
>         SolrInputDocument doc = indexJSON(object);
>         batch.add(doc);
>         if (i % 1000 == 0) {
>             System.out.println("Indexed " + (i + 1) + " objects.");
>             solr.add(batch);
>             batch = new ArrayList<SolrInputDocument>();
>         }
>         i++;
>     }
>     // Send whatever is left over, skipping an empty final batch.
>     if (!batch.isEmpty()) {
>         solr.add(batch);
>     }
>     solr.commit();
>     // After the loop, i already equals the number of objects read.
>     System.out.println("Indexed " + i + " objects.");
> }
> 
> public static SolrInputDocument indexJSON(JSONObject jsonOBJ) throws
>         ParseException, IOException, SolrServerException {
>     Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
> 
>     SolrInputDocument mainEvent = new SolrInputDocument();
>     mainEvent.addField("id", generateID());
>     mainEvent.addField("RawEventMessage", jsonOBJ.get("RawEventMessage"));
>     mainEvent.addField("EventUid", jsonOBJ.get("EventUid"));
>     mainEvent.addField("EventCollector", jsonOBJ.get("EventCollector"));
>     mainEvent.addField("EventMessageType", jsonOBJ.get("EventMessageType"));
>     mainEvent.addField("TimeOfEvent", jsonOBJ.get("TimeOfEvent"));
>     mainEvent.addField("TimeOfEventUTC", jsonOBJ.get("TimeOfEventUTC"));
> 
>     Object obj = parser.parse(jsonOBJ.get("User").toString());
>     JSONObject userObj = (JSONObject) obj;
> 
>     SolrInputDocument childUserEvent = new SolrInputDocument();
>     childUserEvent.addField("id", generateID());
>     childUserEvent.addField("User", userObj.get("User"));
> 
>     obj = parser.parse(jsonOBJ.get("EventDescription").toString());
>     JSONObject eventdescriptionObj = (JSONObject) obj;
> 
>     SolrInputDocument childEventDescEvent = new SolrInputDocument();
>     childEventDescEvent.addField("id", generateID());
>     childEventDescEvent.addField("EventApplicationName",
>             eventdescriptionObj.get("EventApplicationName"));
>     childEventDescEvent.addField("Query", eventdescriptionObj.get("Query"));
> 
>     obj = JSONValue.parse(eventdescriptionObj.get("Information").toString());
>     JSONArray informationArray = (JSONArray) obj;
> 
>     for (int i = 0; i < informationArray.size(); i++) {
>         JSONObject domain = (JSONObject) informationArray.get(i);
> 
>         SolrInputDocument domainDoc = new SolrInputDocument();
>         domainDoc.addField("id", generateID());
>         domainDoc.addField("domainName", domain.get("domainName"));
> 
>         String s = domain.get("columns").toString();
>         obj = JSONValue.parse(s);
>         JSONArray ColumnsArray = (JSONArray) obj;
> 
>         SolrInputDocument columnsDoc = new SolrInputDocument();
>         columnsDoc.addField("id", generateID());
> 
>         for (int j = 0; j < ColumnsArray.size(); j++) {
>             JSONObject ColumnsObj = (JSONObject) ColumnsArray.get(j);
>             SolrInputDocument columnDoc = new SolrInputDocument();
>             columnDoc.addField("id", generateID());
>             columnDoc.addField("movieName", ColumnsObj.get("movieName"));
>             columnsDoc.addChildDocument(columnDoc);
>         }
>         domainDoc.addChildDocument(columnsDoc);
>         childEventDescEvent.addChildDocument(domainDoc);
>     }
> 
>     mainEvent.addChildDocument(childEventDescEvent);
>     mainEvent.addChildDocument(childUserEvent);
>     return mainEvent;
> }
> 
> I would be grateful if you could let me know what I am missing.
> 
> On Sun, Jul 19, 2015 at 2:16 PM, Erick Erickson 
> wrote:
> 
>> First thing is it looks like you're only sending one document at a
>> time, perhaps with child objects. This is not optimal at all. I
>> usually batch my docs up in groups of 1,000, and there is anecdotal
>> evidence that there may (depending on the docs) be some gains above
>> that number. Gotta balance the batch size off against how big the docs
>> are of course.
>> 
>> Assuming that you really are calling this method for one doc
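
The batching pattern discussed in this thread can be sketched independently of SolrJ. What follows is a minimal sketch, not the poster's code: BatchSender and its sink callback are hypothetical names, and the sink merely stands in for a call such as solr.add(batch). It flushes whenever the batch reaches the configured size and keeps a running count of the items handed to the sink, so the number sent can be compared with the number Solr reports.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Collects items and hands them to a sink in fixed-size batches. */
final class BatchSender<T> {
    private final int batchSize;
    // Hypothetical stand-in for a call such as solr.add(batch).
    private final Consumer<List<T>> sink;
    private List<T> batch = new ArrayList<>();
    private long sent = 0;

    BatchSender(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Buffers one item and flushes once the batch is full. */
    void add(T item) {
        batch.add(item);
        if (batch.size() >= batchSize) {
            flush();
        }
    }

    /** Sends the buffered items, if any; call once more after the last add. */
    void flush() {
        if (batch.isEmpty()) {
            return;
        }
        sink.accept(batch);
        sent += batch.size();
        batch = new ArrayList<>();
    }

    /** Number of items handed to the sink so far. */
    long sentCount() {
        return sent;
    }
}
```

With a batch size of 1000 and 2500 added items, the sink receives batches of 1000 and 1000 items during the loop and the remaining 500 on the final flush(), and sentCount() returns 2500.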

Re: Basic auth

2015-07-19 Thread solr . user . 1507
I followed this guide:
http://learnsubjects.drupalgardens.com/content/how-place-http-authentication-solr

But something is still wrong. Can anyone help, or point me to a guide on how 
to set up HTTP basic auth?

Regards

> On 19 Jul 2015, at 01:10, solr.user.1...@gmail.com wrote:
> 
> SOLR-4470 is about:
> Support for basic auth in internal Solr requests.
> 
> What is wrong with the internal requests?
> Can someone explain this in simple terms? Will it ever be possible to run 
> with basic auth? What workarounds are there?
> 
> Regards
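
For context on what "basic auth" means at the HTTP level: each request carries an Authorization header whose value is the word Basic followed by the Base64 encoding of "user:password". The sketch below only shows how a client would build that header value. BasicAuthHeader is a hypothetical helper name and the credentials are placeholders; it does not configure Solr or Jetty, and it says nothing about how Solr's internal requests behave.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Builds the value of an HTTP "Authorization" header for basic auth. */
final class BasicAuthHeader {
    private BasicAuthHeader() {
    }

    static String value(String user, String password) {
        // Credentials are joined with a colon and Base64-encoded.
        String credentials = user + ":" + password;
        byte[] bytes = credentials.getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(bytes);
    }
}
```

A client would send the result as the value of the Authorization request header. Because Base64 is an encoding and not encryption, basic auth is normally combined with HTTPS.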


Basic auth

2015-07-18 Thread solr . user . 1507
SOLR-4470 is about:
Support for basic auth in internal Solr requests.

What is wrong with the internal requests?
Can someone explain this in simple terms? Will it ever be possible to run 
with basic auth? What workarounds are there?

Regards

Re: Programmatically find out if node is overseer

2015-07-17 Thread solr . user . 1507
Hi Anshum, what do you mean by:
> ideally, there shouldn't be a point where you have multiple active
> Overseers in a single cluster

How can multiple Overseers happen? And what are the consequences?

Regards

> On 17 Jul 2015, at 19:37, Anshum Gupta  wrote:
> 
> ideally, there shouldn't be a point where you have multiple active
> Overseers in a single cluster


Re: Setup cloud collection

2015-07-16 Thread solr . user . 1507
Thank you, very good explanation.

Regards

> On 16 Jul 2015, at 17:12, Shawn Heisey  wrote:
> 
>> On 7/16/2015 7:47 AM, solr.user.1...@gmail.com wrote:
>> Thanks Shawn, but I don't want to build something in front of SolrCloud to 
>> help Solr assign the leader role to distribute the indexing load.
>> 
>> Instead of doing this manual step (rebalance leaders), maybe one host should 
>> not take the leader role of multiple shards for the same collection if the 
>> number of live nodes is equal to the number of shards.
>> 
>> But since you say it will happen "over time", maybe I'll continue 
>> indexing and see whether the leaders get rebalanced soon.
> 
> Unless you have a fairly major event (like Solr restarting or an
> operation taking longer than zkClientTimeout) your leaders will never
> change.  It's a semi-permanent role.  When a qualifying event happens,
> SolrCloud does an election process to determine the leader, but
> elections do not happen unless you force them with a REBALANCELEADERS
> action or one of several errors occurs.
> 
> You don't have to build anything in front of Solr.  You simply have to
> assign a preferred leader for each shard, an action that can be done
> with an HTTP call in a browser.  I don't think we have anything in the
> admin UI to assign preferred leaders ... I will look into it and open an
> issue if necessary.
> 
> The thing that I'm saying will happen over time is that all replicas
> will be used for queries.  If you send a thousand queries, you'll find
> that they will be divided fairly evenly among all replicas.  The fact
> that you have one node as leader for three of your shards is not very
> much of a big deal, but if you really want to change it, you can do so
> with the preferred leader feature.
> 
> Thanks,
> Shawn
> 
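
The "HTTP call" for assigning a preferred leader and then rebalancing can be sketched as two Collections API URLs. This is a minimal sketch under stated assumptions: LeaderUrls is a hypothetical helper, and the base URL, collection, shard, and replica names passed in are placeholders you would replace with values from your own cluster state. The ADDREPLICAPROP and REBALANCELEADERS actions and their parameters are taken from the Collections API documentation; check the version you run before relying on them.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Builds Collections API URLs for preferred-leader handling. */
final class LeaderUrls {
    private LeaderUrls() {
    }

    /** URL that marks one replica as the preferred leader of its shard. */
    static String preferredLeader(String baseUrl, String collection,
                                  String shard, String replica) {
        return baseUrl + "/admin/collections?action=ADDREPLICAPROP"
                + "&collection=" + enc(collection)
                + "&shard=" + enc(shard)
                + "&replica=" + enc(replica)
                + "&property=preferredLeader&property.value=true";
    }

    /** URL that asks Solr to move leadership to the preferred leaders. */
    static String rebalance(String baseUrl, String collection) {
        return baseUrl + "/admin/collections?action=REBALANCELEADERS"
                + "&collection=" + enc(collection);
    }

    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }
}
```

Either URL can be pasted into a browser or sent with any HTTP client. One preferredLeader call is needed per shard, followed by a single rebalance call for the collection.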


Re: Setup cloud collection

2015-07-16 Thread solr . user . 1507
Thanks Shawn, but I don't want to build something in front of SolrCloud to help 
Solr assign the leader role to distribute the indexing load.

Instead of doing this manual step (rebalance leaders), maybe one host should not 
take the leader role of multiple shards for the same collection if the number of 
live nodes is equal to the number of shards.

But since you say it will happen "over time", maybe I'll continue 
indexing and see whether the leaders get rebalanced soon.

Regards

> On 16 Jul 2015, at 14:57, Shawn Heisey  wrote:
> 
>> On 7/16/2015 5:51 AM, SolrUser2015 wrote:
>> Hi, I'm new to Solr!
>> 
>> So I downloaded version 5.2 and modified the solr file so it allows me to 
>> create a 5 node cluster:
>> 
>>> 5 shards and replication factor 3 <
>> 
>> Now I see that one node is marked as leader for 3 shards.
>> 
>> So my question is, how can 1 node serve requests for 3 shards? Wouldn't that 
>> be an uneven distribution of load?
> 
> SolrCloud will distribute individual queries to different replicas, so
> over time the entire cloud will be used.  The leader role shouldn't
> affect queries, that role is mostly there for indexing and fault handling.
> 
> If you are really concerned about this, you can assign preferred leaders
> and then ask Solr to reshuffle them.  I have never used this
> functionality.  Here's the documentation on it:
> 
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-RebalanceLeaders
> 
> Thanks,
> Shawn
>