I can confirm this behavior. I see it when sending JSON docs in batches:
it never happens when sending them one by one, but it happens sporadically
when sending batches, as if Solr/Jetty drops a couple of documents out of
the batch.
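
To narrow it down, it may help to count exactly how many documents are
handed to the client in each batch and compare that total with what the
dashboard reports. A minimal sketch, assuming a hypothetical `sink`
callback that stands in for the real `solr.add(batch)` call (the class and
method names are made up for illustration and are not part of SolrJ):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class BatchingSketch {

    /**
     * Hands items to the sink in batches of at most batchSize and returns
     * the total number of items that were passed to the sink.
     * The sink is a placeholder (assumption) for a call such as solr.add(batch).
     */
    public static <T> int sendInBatches(Iterator<T> items, int batchSize,
                                        Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> batch = new ArrayList<>(batchSize);
        int sent = 0;
        while (items.hasNext()) {
            batch.add(items.next());
            // Flush only when the batch is actually full.
            if (batch.size() == batchSize) {
                sink.accept(batch);
                sent += batch.size();
                batch = new ArrayList<>(batchSize);
            }
        }
        // Flush the remainder, skipping the call if nothing is left.
        if (!batch.isEmpty()) {
            sink.accept(batch);
            sent += batch.size();
        }
        return sent;
    }

    public static void main(String[] args) {
        // 2500 dummy "documents" (plain integers, for illustration only).
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            docs.add(i);
        }
        // Record the size of each batch instead of sending it anywhere.
        List<Integer> batchSizes = new ArrayList<>();
        int sent = sendInBatches(docs.iterator(), 1000,
                b -> batchSizes.add(b.size()));
        System.out.println("Sent " + sent + " documents in batches of " + batchSizes);
    }
}
```

If the total returned this way matches the input while the dashboard still
shows fewer documents, that would suggest the loss happens after the client
call rather than in the batching loop.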
Regards
> On 21 Jul 2015, at 21:38, Vineeth Dasaraju wrote:
>
> Hi,
>
> Thank you, Erick, for your input. I tried creating batches of 1000 objects
> and indexing them to Solr. The performance is way better than before, but
> the number of indexed documents shown in the dashboard is lower than the
> number of documents I actually indexed through SolrJ. My code is as
> follows:
>
> private static String SOLR_SERVER_URL = "http://localhost:8983/solr/newcore";
> private static String JSON_FILE_PATH = "/home/vineeth/week1_fixed.json";
> private static JSONParser parser = new JSONParser();
> private static SolrClient solr = new HttpSolrClient(SOLR_SERVER_URL);
>
> public static void main(String[] args) throws IOException,
>         SolrServerException, ParseException {
>     File file = new File(JSON_FILE_PATH);
>     Scanner scn = new Scanner(file, "UTF-8");
>     JSONObject object;
>     int i = 0;
>     Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
>     while (scn.hasNext()) {
>         object = (JSONObject) parser.parse(scn.nextLine());
>         SolrInputDocument doc = indexJSON(object);
>         batch.add(doc);
>         if (i % 1000 == 0) {
>             System.out.println("Indexed " + (i + 1) + " objects.");
>             solr.add(batch);
>             batch = new ArrayList<SolrInputDocument>();
>         }
>         i++;
>     }
>     solr.add(batch);
>     solr.commit();
>     System.out.println("Indexed " + (i + 1) + " objects.");
> }
>
> public static SolrInputDocument indexJSON(JSONObject jsonOBJ) throws
>         ParseException, IOException, SolrServerException {
>     Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
>
>     SolrInputDocument mainEvent = new SolrInputDocument();
>     mainEvent.addField("id", generateID());
>     mainEvent.addField("RawEventMessage", jsonOBJ.get("RawEventMessage"));
>     mainEvent.addField("EventUid", jsonOBJ.get("EventUid"));
>     mainEvent.addField("EventCollector", jsonOBJ.get("EventCollector"));
>     mainEvent.addField("EventMessageType", jsonOBJ.get("EventMessageType"));
>     mainEvent.addField("TimeOfEvent", jsonOBJ.get("TimeOfEvent"));
>     mainEvent.addField("TimeOfEventUTC", jsonOBJ.get("TimeOfEventUTC"));
>
>     Object obj = parser.parse(jsonOBJ.get("User").toString());
>     JSONObject userObj = (JSONObject) obj;
>
>     SolrInputDocument childUserEvent = new SolrInputDocument();
>     childUserEvent.addField("id", generateID());
>     childUserEvent.addField("User", userObj.get("User"));
>
>     obj = parser.parse(jsonOBJ.get("EventDescription").toString());
>     JSONObject eventdescriptionObj = (JSONObject) obj;
>
>     SolrInputDocument childEventDescEvent = new SolrInputDocument();
>     childEventDescEvent.addField("id", generateID());
>     childEventDescEvent.addField("EventApplicationName",
>             eventdescriptionObj.get("EventApplicationName"));
>     childEventDescEvent.addField("Query", eventdescriptionObj.get("Query"));
>
>     obj = JSONValue.parse(eventdescriptionObj.get("Information").toString());
>     JSONArray informationArray = (JSONArray) obj;
>
>     for (int i = 0; i < informationArray.size(); i++) {
>         JSONObject domain = (JSONObject) informationArray.get(i);
>
>         SolrInputDocument domainDoc = new SolrInputDocument();
>         domainDoc.addField("id", generateID());
>         domainDoc.addField("domainName", domain.get("domainName"));
>
>         String s = domain.get("columns").toString();
>         obj = JSONValue.parse(s);
>         JSONArray ColumnsArray = (JSONArray) obj;
>
>         SolrInputDocument columnsDoc = new SolrInputDocument();
>         columnsDoc.addField("id", generateID());
>
>         for (int j = 0; j < ColumnsArray.size(); j++) {
>             JSONObject ColumnsObj = (JSONObject) ColumnsArray.get(j);
>             SolrInputDocument columnDoc = new SolrInputDocument();
>             columnDoc.addField("id", generateID());
>             columnDoc.addField("movieName", ColumnsObj.get("movieName"));
>             columnsDoc.addChildDocument(columnDoc);
>         }
>         domainDoc.addChildDocument(columnsDoc);
>         childEventDescEvent.addChildDocument(domainDoc);
>     }
>
>     mainEvent.addChildDocument(childEventDescEvent);
>     mainEvent.addChildDocument(childUserEvent);
>     return mainEvent;
> }
>
> I would be grateful if you could let me know what I am missing.
>
> On Sun, Jul 19, 2015 at 2:16 PM, Erick Erickson wrote:
>
>> First thing is it looks like you're only sending one document at a
>> time, perhaps with child objects. This is not optimal at all. I
>> usually batch my docs up in groups of 1,000, and there is anecdotal
>> evidence that there may (depending on the docs) be some gains above
>> that number. Gotta balance the batch size off against how big the docs
>> are of course.
>>
>> Assuming that you really are calling this method for one doc