RE: Getting DIH status with SolrJ

2011-08-24 Thread Dyer, James
Shawn,

I do not know of an easy or a good way to do this.  It would be nice if there 
were a non-frail, programmatic way to get back DIH status but I don't think 
there is one.  I have a (monstrous) program that polls a running DIH handler
every so often to get its status.  The crux is something like this:

// Needs: java.util.Map, org.apache.solr.client.solrj.SolrRequest.METHOD,
// org.apache.solr.client.solrj.request.DirectXmlRequest,
// org.apache.solr.common.util.NamedList
DirectXmlRequest req = new DirectXmlRequest(requestUrl, null);
req.setMethod(METHOD.GET);
req.setParams(params);
NamedList<Object> nl = server.request(req);

String status = (String) nl.get("status");
String response = (String) nl.get("importResponse");

@SuppressWarnings("unchecked")
Map<String, String> msgs = (Map<String, String>) nl.get("statusMessages");
if (msgs != null)
{
    String numReq = msgs.get("Total Requests made to DataSource");
    String numRows = msgs.get("Total Documents Processed");
    String docsSkipped = msgs.get("Total Documents Skipped");
    String timeStarted = msgs.get("Full Dump Started");
    String elapsed = msgs.get("Time taken ");
    String aborted = msgs.get("Aborted");
    String plaintextMsg = msgs.get(""); // summary text is stored under an empty key
}
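The casts in that snippet assume every value in statusMessages is a String, which is part of what makes this approach frail. Below is a minimal JDK-only sketch of a more forgiving lookup, assuming only that the section arrives as a java.util.Map (the same assumption the cast above makes). The class and method names are hypothetical, not part of SolrJ.

```java
import java.util.HashMap;
import java.util.Map;

public class DihStatusFields {
    /**
     * Returns the value under key as a String, or null when the map is null
     * or the key is absent. Non-String values are converted with toString()
     * instead of failing with a ClassCastException.
     */
    public static String getString(Map<String, ?> msgs, String key) {
        if (msgs == null) {
            return null;
        }
        Object value = msgs.get(key);
        return (value == null) ? null : value.toString();
    }

    /** Parses a numeric status field, returning fallback if it is absent or not a number. */
    public static long getLong(Map<String, ?> msgs, String key, long fallback) {
        String s = getString(msgs, key);
        if (s == null) {
            return fallback;
        }
        try {
            return Long.parseLong(s.trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> msgs = new HashMap<>();
        msgs.put("Total Documents Processed", "370055");
        msgs.put("Total Rows Fetched", 370060L); // non-String value, for illustration only
        System.out.println(getLong(msgs, "Total Documents Processed", -1)); // 370055
        System.out.println(getString(msgs, "Total Rows Fetched"));          // 370060
        System.out.println(getLong(msgs, "Aborted", -1));                   // -1
    }
}
```

Returning a caller-supplied fallback keeps "key missing" distinct from a real count of zero, which matters because several of these fields come and go while an import is running.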

Not sure this is what you're after, but maybe it'd be helpful.  Like I say, I 
wish [I knew of|there was] a better way to do this...

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Wednesday, August 24, 2011 3:52 PM
To: solr-user@lucene.apache.org
Subject: Getting DIH status with SolrJ

I can't figure out how to get the particular information I need out of a 
Solr response with SolrJ.  I see that QueryResponse has a number of 
methods for getting specific information out, but as far as I can see, 
none of them have anything at all to do with the DIH.  I've started out 
with the following code.  The solrCore object is a CommonsHttpSolrServer 
that is defined at the class level and initialized by the constructor:

 ModifiableSolrParams p = new ModifiableSolrParams();
 p.set("qt", "/dataimport");
 QueryResponse qr = solrCore.query(p);

What do I do with qr?  I've been looking at the docs and cannot figure 
it out.  Can someone fill my poor clueless brain in on what I'm 
missing?  Is there a better approach than what I've got above?

From the /dataimport response, I need to see the value of status and
then I need to access several pieces of information under the 
statusMessages section.  I haven't been able to find an example.

Thanks,
Shawn



Re: Getting DIH status with SolrJ

2011-08-24 Thread Shawn Heisey

On 8/24/2011 3:24 PM, Dyer, James wrote:

It might not be the prettiest code, but I'll take it.  Thank you.  I 
paraphrased quite a bit and have ended up with the following:


// Needs: java.util.Map, org.apache.solr.client.solrj.SolrRequest,
// org.apache.solr.client.solrj.request.DirectXmlRequest,
// org.apache.solr.common.util.NamedList
String numRows = null;
String elapsed = null;
String aborted = null;
String plaintextMsg = null;

SolrRequest req = new DirectXmlRequest("/dataimport", null);
NamedList<Object> nl = solrCore.request(req);

String status = (String) nl.get("status");
@SuppressWarnings("unchecked")
Map<String, String> msgs = (Map<String, String>) nl.get("statusMessages");
if (msgs != null)
{
    numRows = msgs.get("Total Documents Processed");
    elapsed = msgs.get("Time taken ");
    aborted = msgs.get("Aborted");
    plaintextMsg = msgs.get("");
}

I've tried it and it seems to work reliably.  If anyone out there knows 
a better method to pull this off, I'd certainly like to hear about it.


Thanks,
Shawn



Re: Getting DIH status with SolrJ

2011-08-24 Thread Shawn Heisey

On 8/24/2011 4:15 PM, Shawn Heisey wrote:
It might not be the prettiest code, but I'll take it.  Thank you.  I 
paraphrased quite a bit and have ended up with the following:


I put all this into a somewhat generic method.  Hopefully it will prove 
useful to someone else on the list.  There are some minimal comments to 
explain what it does:


// Needs: java.io.IOException, java.util.Map, java.util.regex.Matcher,
// java.util.regex.Pattern, org.apache.solr.client.solrj.SolrRequest,
// org.apache.solr.client.solrj.SolrServerException,
// org.apache.solr.client.solrj.request.DirectXmlRequest,
// org.apache.solr.common.util.NamedList

/**
 * Gets the DataImportHandler status.
 *
 * @return Long.MIN_VALUE: an error occurred, or the import never started.
 *         Negative value: import in progress; invert the sign to see how
 *         many documents have been added so far. Zero or positive value:
 *         import complete; total number of documents added.
 * @throws SolrServerException
 * @throws IOException
 */
public long getDIHStatus() throws SolrServerException, IOException
{
    Long processed = null;
    String tmpProcessed = null;
    String tmpFetched = null;
    String elapsed = null;
    String aborted = null;
    String msg = null;

    SolrRequest req = new DirectXmlRequest("/dataimport", null);
    NamedList<Object> nl = solrCore.request(req);

    String status = (String) nl.get("status");
    @SuppressWarnings("unchecked")
    Map<String, String> msgs = (Map<String, String>) nl.get("statusMessages");
    if (msgs != null)
    {
        tmpProcessed = msgs.get("Total Documents Processed");
        tmpFetched = msgs.get("Total Rows Fetched");
        elapsed = msgs.get("Time taken ");
        aborted = msgs.get("Aborted");
        msg = msgs.get("");
    }

    /*
     * The "Total Documents Processed" field disappears between the time the
     * actual import is done and the DIH finishes indexing, committing, and
     * optimizing. If it's not there, try to pull it from the "" (empty key)
     * field. As a last-ditch effort, get the (possibly inaccurate) value
     * from the "Total Rows Fetched" field.
     */
    if (tmpProcessed != null)
    {
        processed = Long.parseLong(tmpProcessed);
    }
    else if (msg != null)
    {
        /*
         * Pull up to two numbers out of the message and add them. Example:
         * "Indexing completed. Added/Updated: 370055 documents. Deleted 0 documents."
         */
        Matcher m = Pattern.compile("\\d+").matcher(msg);
        if (m.find())
        {
            processed = Long.parseLong(m.group());
            if (m.find())
            {
                processed += Long.parseLong(m.group());
            }
        }
    }
    // Last resort: also used when the message exists but contains no numbers.
    if (processed == null && tmpFetched != null)
    {
        processed = Long.parseLong(tmpFetched);
    }

    /*
     * All available info has been gathered from the response. Now we parse
     * what we have and determine the return value.
     */
    if (aborted != null || processed == null)
    {
        return Long.MIN_VALUE;
    }

    // Compare against literals so a missing "status" entry cannot cause an NPE.
    if ("busy".equals(status))
    {
        // In progress: report a negative count, using -1 when nothing has been
        // added yet so it is not confused with a completed import of 0 docs.
        return (processed == 0) ? -1L : -processed;
    }

    if ("idle".equals(status))
    {
        // No "Time taken " entry is treated as "import never started".
        return (elapsed == null) ? Long.MIN_VALUE : processed;
    }
    return Long.MIN_VALUE;
}
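Callers of getDIHStatus() still have to unpack the sign convention from its Javadoc. Below is a JDK-only sketch of both halves: the fallback that adds up the numbers in the plain-text message, and a decoder for the returned long. The class, enum, and method names are hypothetical, and the message format is assumed to match the "Indexing completed. Added/Updated: ... Deleted ..." example quoted in the method's comments.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DihStatusDecoder {
    /** Possible interpretations of a getDIHStatus() return value. */
    public enum State { UNKNOWN, RUNNING, COMPLETE }

    private static final Pattern NUMBER = Pattern.compile("\\d+");

    /**
     * Mirrors the fallback in getDIHStatus(): adds the first two numbers found
     * in the summary message (added/updated plus deleted). Returns null if the
     * message is null or contains no digits.
     */
    public static Long countFromMessage(String msg) {
        if (msg == null) {
            return null;
        }
        Matcher m = NUMBER.matcher(msg);
        if (!m.find()) {
            return null;
        }
        long total = Long.parseLong(m.group());
        if (m.find()) {
            total += Long.parseLong(m.group());
        }
        return total;
    }

    /** Classifies a value returned by getDIHStatus(). */
    public static State stateOf(long status) {
        if (status == Long.MIN_VALUE) {
            return State.UNKNOWN; // error, or the import never started
        }
        return (status < 0) ? State.RUNNING : State.COMPLETE;
    }

    /**
     * Documents added so far (RUNNING) or in total (COMPLETE); -1 for UNKNOWN.
     * Because getDIHStatus() reports "busy, 0 documents" as -1, a running
     * import shows at least 1 here.
     */
    public static long documentCount(long status) {
        switch (stateOf(status)) {
            case RUNNING:  return -status;
            case COMPLETE: return status;
            default:       return -1;
        }
    }

    public static void main(String[] args) {
        String msg = "Indexing completed. Added/Updated: 370055 documents. Deleted 0 documents.";
        System.out.println(countFromMessage(msg));   // 370055
        System.out.println(stateOf(-1500));          // RUNNING
        System.out.println(documentCount(-1500));    // 1500
        System.out.println(stateOf(Long.MIN_VALUE)); // UNKNOWN
    }
}
```

Keeping the decoding in one place means a poller only has to branch on the enum, rather than repeating the Long.MIN_VALUE and sign checks wherever the status is read.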