I can confirm this behavior. I see it when sending JSON docs in batches: it
never happens when sending them one by one, but it happens sporadically with
batches, as if Solr/Jetty drops a couple of documents out of the batch.
Regards
On 21 Jul 2015, at 21:38, Vineeth Dasaraju vineeth.ii...@gmail.com wrote:
Hi,
Thank you, Erick, for your inputs. I tried creating batches of 1000 objects
and indexing them to Solr. The performance is way better than before, but I
find that the number of indexed documents shown in the dashboard is
lower than the number of documents that I actually indexed through
SolrJ. My code is as follows:
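import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Scanner;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;
import org.json.simple.*;         // assuming the JSON classes come from json-simple
import org.json.simple.parser.*;  // JSONParser, ParseException

private static String SOLR_SERVER_URL = "http://localhost:8983/solr/newcore";
private static String JSON_FILE_PATH = "/home/vineeth/week1_fixed.json";
private static JSONParser parser = new JSONParser();
private static SolrClient solr = new HttpSolrClient(SOLR_SERVER_URL);

public static void main(String[] args) throws IOException,
        SolrServerException, ParseException {
    File file = new File(JSON_FILE_PATH);
    Scanner scn = new Scanner(file, "UTF-8");
    JSONObject object;
    int i = 0;  // number of top-level documents read so far
    Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
    while (scn.hasNext()) {
        // One JSON object per input line.
        object = (JSONObject) parser.parse(scn.nextLine());
        SolrInputDocument doc = indexJSON(object);
        batch.add(doc);
        i++;
        // Count before checking, so the first batch holds 1000 docs, not 1.
        if (i % 1000 == 0) {
            solr.add(batch);
            System.out.println("Indexed " + i + " objects.");
            batch = new ArrayList<SolrInputDocument>();
        }
    }
    scn.close();
    // Send the remainder, if any; skip the call when the batch is empty.
    if (!batch.isEmpty()) {
        solr.add(batch);
    }
    solr.commit();
    System.out.println("Indexed " + i + " objects.");
}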
public static SolrInputDocument indexJSON(JSONObject jsonOBJ) throws
        ParseException, IOException, SolrServerException {
    // Parent document for the event itself.
    SolrInputDocument mainEvent = new SolrInputDocument();
    mainEvent.addField("id", generateID());
    mainEvent.addField("RawEventMessage", jsonOBJ.get("RawEventMessage"));
    mainEvent.addField("EventUid", jsonOBJ.get("EventUid"));
    mainEvent.addField("EventCollector", jsonOBJ.get("EventCollector"));
    mainEvent.addField("EventMessageType", jsonOBJ.get("EventMessageType"));
    mainEvent.addField("TimeOfEvent", jsonOBJ.get("TimeOfEvent"));
    mainEvent.addField("TimeOfEventUTC", jsonOBJ.get("TimeOfEventUTC"));

    // Child document for the user.
    Object obj = parser.parse(jsonOBJ.get("User").toString());
    JSONObject userObj = (JSONObject) obj;
    SolrInputDocument childUserEvent = new SolrInputDocument();
    childUserEvent.addField("id", generateID());
    childUserEvent.addField("User", userObj.get("User"));

    // Child document for the event description.
    obj = parser.parse(jsonOBJ.get("EventDescription").toString());
    JSONObject eventdescriptionObj = (JSONObject) obj;
    SolrInputDocument childEventDescEvent = new SolrInputDocument();
    childEventDescEvent.addField("id", generateID());
    childEventDescEvent.addField("EventApplicationName",
            eventdescriptionObj.get("EventApplicationName"));
    childEventDescEvent.addField("Query", eventdescriptionObj.get("Query"));

    // Nested children: one document per domain, each with a "columns"
    // document that holds one child per column.
    obj = JSONValue.parse(eventdescriptionObj.get("Information").toString());
    JSONArray informationArray = (JSONArray) obj;
    for (int i = 0; i < informationArray.size(); i++) {
        JSONObject domain = (JSONObject) informationArray.get(i);
        SolrInputDocument domainDoc = new SolrInputDocument();
        domainDoc.addField("id", generateID());
        domainDoc.addField("domainName", domain.get("domainName"));
        String s = domain.get("columns").toString();
        obj = JSONValue.parse(s);
        JSONArray ColumnsArray = (JSONArray) obj;
        SolrInputDocument columnsDoc = new SolrInputDocument();
        columnsDoc.addField("id", generateID());
        for (int j = 0; j < ColumnsArray.size(); j++) {
            JSONObject ColumnsObj = (JSONObject) ColumnsArray.get(j);
            SolrInputDocument columnDoc = new SolrInputDocument();
            columnDoc.addField("id", generateID());
            columnDoc.addField("movieName", ColumnsObj.get("movieName"));
            columnsDoc.addChildDocument(columnDoc);
        }
        domainDoc.addChildDocument(columnsDoc);
        childEventDescEvent.addChildDocument(domainDoc);
    }
    mainEvent.addChildDocument(childEventDescEvent);
    mainEvent.addChildDocument(childUserEvent);
    return mainEvent;
}
I would be grateful if you could let me know what I am missing.
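One thing that may be worth ruling out when the dashboard count is lower than the number of documents sent: Solr replaces an existing document when a new one arrives with the same uniqueKey value, so repeated ids reduce the visible count. Whether that applies here depends on generateID(), which is not shown. The sketch below is only an illustration using plain Java collections; the class name DuplicateIdCheck and the sample ids are hypothetical and not part of the code above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class DuplicateIdCheck {

    /** Returns every id that occurs more than once, in the order the repeats are first seen. */
    static Set<String> findDuplicates(Iterable<String> ids) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String id : ids) {
            // Set.add returns false when the id was already present.
            if (!seen.add(id)) {
                duplicates.add(id);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Hypothetical ids, standing in for the values generateID() returned.
        List<String> ids = Arrays.asList("a1", "b2", "a1", "c3", "b2");
        System.out.println(findDuplicates(ids)); // prints [a1, b2]
    }
}
```

Collecting the generated ids during an indexing run and passing them through a check like this would show whether any of them collide.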
On Sun, Jul 19, 2015 at 2:16 PM, Erick Erickson erickerick...@gmail.com
wrote:
First thing is it looks like you're only sending one document at a
time, perhaps with child objects. This is not optimal at all. I
usually batch my docs up in groups of 1,000, and there is anecdotal
evidence that there may (depending on the docs) be some gains above
that number. Gotta balance the batch size off against how big the docs
are of course.
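The batching idea can be sketched without any Solr dependency. This is a minimal illustration, not SolrJ code: the class BatchingSketch and the method sendInBatches are made-up names, and the sink parameter is a stand-in for whatever actually ships a batch (for example a call that adds the list to a Solr client). The point it shows is flushing every full batch and then the remainder, while counting what was handed over.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class BatchingSketch {

    /**
     * Hands items to sink in lists of at most batchSize elements and
     * returns the total number of items handed over.
     */
    static <T> int sendInBatches(Iterable<T> items, int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> batch = new ArrayList<>(batchSize);
        int sent = 0;
        for (T item : items) {
            batch.add(item);
            if (batch.size() == batchSize) {
                sink.accept(batch);
                sent += batch.size();
                // Start a fresh list; the sink may still hold the old one.
                batch = new ArrayList<>(batchSize);
            }
        }
        // Flush the final, partially filled batch (skipped when empty).
        if (!batch.isEmpty()) {
            sink.accept(batch);
            sent += batch.size();
        }
        return sent;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            docs.add(i);
        }
        List<Integer> batchSizes = new ArrayList<>();
        int sent = sendInBatches(docs, 1000, b -> batchSizes.add(b.size()));
        // prints: 2500 items in batches of [1000, 1000, 500]
        System.out.println(sent + " items in batches of " + batchSizes);
    }
}
```

Keeping the batch size as a parameter makes it easy to try values above or below 1,000 and compare throughput for a given document size.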
Assuming that you really are calling this method for one doc (and