Re: Doc add limit problem, old issue

2006-09-09 Thread Michael Imbeault
Fixed my problem: the solPHP implementation was faulty. It was 
sending one doc at a time (one curl request per doc) and the system quickly ran 
out of resources. I modified it to send documents in batches (1,000 at a time) 
and now everything is #1!


Michael Imbeault wrote:
Old issue (see 
http://www.mail-archive.com/solr-user@lucene.apache.org/msg00651.html), 
but I'm experiencing the exact same thing on Windows XP, latest 
Tomcat. I noticed that the Tomcat process gobbles memory (10 megs a 
second, maybe) and then jams at 125 megs. Can't find a fix yet. I'm 
using a PHP interface and curl to post my XML, one document at a time, 
with a commit every 100 documents. Indexing 3 docs, it hangs at maybe 
5000. Anyone got an idea on this one? It would be helpful. I may try 
to switch to Jetty tomorrow if nothing works :(



--

Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212



Re: Doc add limit, I'm experiencing it too

2006-09-06 Thread sangraal aiken

I sent out an email about this a while back, but basically this limit
appears only on Tomcat and only when Solr attempts to write to the response.


You can work around it by splitting up your posts so that you're posting
fewer than 5,000 documents (or whatever your limit seems to be) at a time. You DO NOT
have to commit after each post. I recently indexed a 38 million document
database with this problem and, although it took about 8-9 hours, it did
work... I only committed every 100,000 documents or so.
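
For illustration, a minimal sketch of that batching approach (assuming an update
handler at http://localhost:8080/update, as elsewhere in this thread, and that each
document has already been rendered as a <doc>...</doc> XML fragment; the class name,
method names, and batch size are only examples):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;

public class BatchPoster {

  private static final String UPDATE_URL = "http://localhost:8080/update";

  // Posts one XML payload and drains the response so the connection finishes cleanly.
  static void post(String xml) throws Exception {
    HttpURLConnection conn = (HttpURLConnection) new URL(UPDATE_URL).openConnection();
    conn.setRequestMethod("POST");
    conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
    conn.setDoOutput(true);
    Writer out = new OutputStreamWriter(conn.getOutputStream(), "UTF-8");
    out.write(xml);
    out.close();
    BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"));
    while (in.readLine() != null) { /* discard the response body */ }
    in.close();
  }

  // Sends docs in batches of batchSize per POST and commits once at the end.
  static void indexAll(List<String> docFragments, int batchSize) throws Exception {
    StringBuilder batch = new StringBuilder();
    int inBatch = 0;
    for (String doc : docFragments) {
      if (inBatch == 0) batch.append("<add>");
      batch.append(doc);
      if (++inBatch == batchSize) {
        batch.append("</add>");
        post(batch.toString());
        batch.setLength(0);
        inBatch = 0;
      }
    }
    if (inBatch > 0) {              // flush the last partial batch
      batch.append("</add>");
      post(batch.toString());
    }
    post("<commit/>");              // commit rarely, not after every post
  }
}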

-Sangraal

On 9/6/06, Michael Imbeault <[EMAIL PROTECTED]> wrote:


Old issue (see
http://www.mail-archive.com/solr-user@lucene.apache.org/msg00651.html),
but I'm experiencing the exact same thing on Windows XP, latest Tomcat.
I noticed that the Tomcat process gobbles memory (10 megs a second,
maybe) and then jams at 125 megs. Can't find a fix yet. I'm using a PHP
interface and curl to post my XML, one document at a time, with a commit
every 100 documents. Indexing 3 docs, it hangs at maybe 5000. Anyone
got an idea on this one? It would be helpful. I may try to switch to
Jetty tomorrow if nothing works :(

--
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212



Re: Doc add limit, I'm experiencing it too

2006-09-06 Thread Chris Hostetter

: Old issue (see
: http://www.mail-archive.com/solr-user@lucene.apache.org/msg00651.html),
: but I'm experiencing the same exact thing on windows xp, latest tomcat.

did you notice the followup thread from sangraal where he mentioned that
he'd narrowed the problem down to both using Tomcat and submitting docs
containing CDATA? ...

http://www.nabble.com/Add-doc-limit---Follow-Up-tf2186440.html#a6048436

: I noticed that the tomcat process gobbles memory (10 megs a second
: maybe) and then jams at 125 megs. Can't find a fix yet. I'm using a php
: interface and curl to post my xml, one document at a time, and commit
: every 100 document. Indexing 3 docs, it hangs at maybe 5000. Anyone

interesting ... you may not be hitting the same problem, Sangraal
specifically said he only encountered this bug when submitting a large
number of docs in a single request -- when you say it jams at 125, what do
you mean? ... are you sure you aren't just getting an OutOfMemory error?




-Hoss



Re: Doc add limit

2006-08-01 Thread Chris Hostetter
:
: The only output I get from 'lsof -p' that pertains to TCP connections are
: the following...I'm not too sure how to interpret it though:

i'm not much of an lsof expert either, but maybe someone else has some
suggestions -- one thing though, it's not just the TCP connections i'm
interested in, i'm wondering about all the filehandles both your client
and the tomcat server have open at the moment it hangs ... are any files
listed more than once? are there a large number of TCP connections open
for either process?

: > You are using InputStreamReader to deal with the InputStreams of your
: > remote XML files -- but you aren't specifying a charset, so it's using
: > your system default which may be differnet from the charset of the
: > orriginal XML files you are pulling from the URL -- which (i *think*)
: > means that your InputStreamReader may in some cases fail to read all of
: > the bytes of the stream, which might some dangling filehandles (i'm just
: > guessing on that part ... i'm not acctually sure whta happens in that
: > case).
: >
: > What if you simplify your code (for the purposes of testing) and just put
: > the post-transform version ganja-full.xml in a big ass String variable in
: > your java app and just call GanjaUpdate.doUpdate(bigAssString) over and
: > over again ... does that cause the same problem?

: In the code, I read the XML with a StringReader and then pass it to
: GanjaUpdate as a string anyway.  I've output the String object and verified
: that it is in fact all there.

right, but that's only part of what i was wondering about ... did you try
that last suggestion i had for simplifying the testing: just paste
that string into a big String literal and call doUpdate on it in a loop
(completely eliminating all the other aspects of your code) to verify whether
the Solr communication is causing the problem (and not some other resource
limitation caused elsewhere in the code)?
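
Something along these lines, say (a hypothetical harness; the string literal is a
stand-in for your real post-transform XML, and it assumes doUpdate is made accessible
to the test class):

public class UpdateLoopTest {
  public static void main(String[] args) throws Exception {
    // paste the full post-transform ganja-full.xml output here
    String bigAssString = "<add><doc>...</doc></add>";
    GanjaUpdate gu = new GanjaUpdate("testsite");
    for (int i = 0; i < 10000; i++) {
      gu.doUpdate(bigAssString);   // does it still hang around the same count?
      if (i % 500 == 0) {
        System.out.println("posted " + i + " updates");
      }
    }
  }
}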


-Hoss



Re: Doc add limit

2006-07-31 Thread sangraal aiken

Just an update, I changed my doUpdate method to use the HTTPClient API and
have the exact same problem... the update hangs at exactly the same point...
6,144.

 private String doUpdate(String sw) {
   StringBuffer updateResult = new StringBuffer();
   try {
     // open connection and POST the update XML via the HTTPClient API
     log.info("Connecting to and preparing to post to SolrUpdate servlet.");
     URL url = new URL("http://localhost:8080/update");
     HTTPConnection con = new HTTPConnection(url);
     HTTPResponse resp = con.Post(url.getFile(), sw);

     if (resp.getStatusCode() >= 300) {
       System.err.println("Received Error: " + resp.getReasonLine());
       System.err.println(resp.getText());
     } else {
       // success: capture Solr's response body
       updateResult.append(new String(resp.getData()));
     }

     log.info("Done updating Solr for site " + updateSite);
   } catch (IOException ioe) {
     System.err.println(ioe.toString());
   } catch (ModuleException me) {
     System.err.println("Error handling request: " + me.getMessage());
   } catch (Exception e) {
     System.err.println("Unknown Error: " + e.getMessage());
   }

   return updateResult.toString();
 }

-S

On 7/31/06, sangraal aiken <[EMAIL PROTECTED]> wrote:


Very interesting... thanks Thom. I haven't given HttpClient a shot yet,
but will be soon.

-S


On 7/31/06, Thom Nelson < [EMAIL PROTECTED]> wrote:
>
> I had a similar problem and was able to fix it in Solr by manually
> buffering the responses to a StringWriter before sending it to Tomcat.
> Essentially, Tomcat's buffer will only hold so much and at that point
> it blocks (thus it always hangs at a constant number of documents).
> However, a better solution (to be implemented) is to use more
> intelligent code on the client to read the response at the same time
> that it is sending input -- not too difficult to do, though best to do
> with two threads ( i.e. fire off a thread to read the response before
> you send any data).  Seeing as the HttpClient code probably does this
> already, I'll most likely end up using that.
>
> On 7/31/06, sangraal aiken < [EMAIL PROTECTED]> wrote:
> > Those are some great ideas Chris... I'm going to try some of them
> out.  I'll
> > post the results when I get a chance to do more testing. Thanks.
> >
> > At this point I can work around the problem by ignoring Solr's
> response but
> > this is obviously not ideal. I would feel better knowing what is
> causing the
> > issue as well.
> >
> > -Sangraal
> >
> >
> >
> > On 7/29/06, Chris Hostetter < [EMAIL PROTECTED]> wrote:
> > >
> > >
> > > : Sure, the method that does all the work updating Solr is the
> > > doUpdate(String
> > > : s) method in the GanjaUpdate class I'm pasting below. It's hanging
> when
> > > I
> > > : try to read the response... the last output I receive in my log is
> Got
> > > : Reader...
> > >
> > > I don't have the means to try out this code right now ... but i
> can't see
> > > any obvious problems with it (there may be somewhere that you are
> opening
> > > a stream or reader and not closing it, but i didn't see one) ... i
> notice
> > > you are running this client on the same machine as Solr (hence the
> > > localhost URLs) did you by any chance try running the client on a
> seperate
> > > machine to see if hte number of updates before it hangs changes?
> > >
> > > my money is still on a filehandle resource limit somwhere ... if you
> are
> > > running on a system that has "lsof" (on some Unix/Linux
> installations you
> > > need sudo/su root permissions to run it) you can use "lsof -p "
> to
> > > look up what files/network connections are open for a given
> process.  You
> > > can try running that on both the client pid and the Solr server pid
> once
> > > it's hung -- You'll probably see a lot of Jar files in use for both,
> but
> > > if you see more then a few XML files open by the client, or more
> then a
> > > 1 TCP connection open by either the client or the server, there's
> your
> > > culprit.
> > >
> > > I'm not sure what Windows equivilent of lsof may exist.
> > >
> > > Wait ... i just had another thought
> > >
> > > You are using InputStreamReader to deal with the InputStreams of
> your
> > > remote XML files -- but you aren't specifying a charset, so it's
> using
> > > your system default which may be differnet from the charset of the
> > > orriginal XML files you are pulling from the URL -- which (i
> *think*)
> > > means that your InputStreamReader may in some cases fail to read all
> of
> > > the bytes of the stream, which might some dangling filehandles (i'm
> just
> > > guessing on that part ... i'm not acctually sure whta happens in
> that
> > > case).
> > >
> > > What if you simplify your code (for the purposes of testing) and
> just put
> > > the post-transform version ganja-full.xml in a big ass String
> variable in
> > > your java app and just call GanjaUpdate.doUpdate(bigAssString) over
> and
> > > over again ... does that cause the same problem?
> > >
> > >
> > > :
> > > : --
> > > :
> > > : package co

Re: Doc add limit

2006-07-31 Thread sangraal aiken

Very interesting... thanks Thom. I haven't given HttpClient a shot yet, but
will be soon.

-S

On 7/31/06, Thom Nelson <[EMAIL PROTECTED]> wrote:


I had a similar problem and was able to fix it in Solr by manually
buffering the responses to a StringWriter before sending it to Tomcat.
Essentially, Tomcat's buffer will only hold so much and at that point
it blocks (thus it always hangs at a constant number of documents).
However, a better solution (to be implemented) is to use more
intelligent code on the client to read the response at the same time
that it is sending input -- not too difficult to do, though best to do
with two threads (i.e. fire off a thread to read the response before
you send any data).  Seeing as the HttpClient code probably does this
already, I'll most likely end up using that.

On 7/31/06, sangraal aiken <[EMAIL PROTECTED]> wrote:
> Those are some great ideas Chris... I'm going to try some of them
out.  I'll
> post the results when I get a chance to do more testing. Thanks.
>
> At this point I can work around the problem by ignoring Solr's response
but
> this is obviously not ideal. I would feel better knowing what is causing
the
> issue as well.
>
> -Sangraal
>
>
>
> On 7/29/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> >
> >
> > : Sure, the method that does all the work updating Solr is the
> > doUpdate(String
> > : s) method in the GanjaUpdate class I'm pasting below. It's hanging
when
> > I
> > : try to read the response... the last output I receive in my log is
Got
> > : Reader...
> >
> > I don't have the means to try out this code right now ... but i can't
see
> > any obvious problems with it (there may be somewhere that you are
opening
> > a stream or reader and not closing it, but i didn't see one) ... i
notice
> > you are running this client on the same machine as Solr (hence the
> > localhost URLs) did you by any chance try running the client on a
seperate
> > machine to see if hte number of updates before it hangs changes?
> >
> > my money is still on a filehandle resource limit somwhere ... if you
are
> > running on a system that has "lsof" (on some Unix/Linux installations
you
> > need sudo/su root permissions to run it) you can use "lsof -p " to
> > look up what files/network connections are open for a given
process.  You
> > can try running that on both the client pid and the Solr server pid
once
> > it's hung -- You'll probably see a lot of Jar files in use for both,
but
> > if you see more then a few XML files open by the client, or more then
a
> > 1 TCP connection open by either the client or the server, there's your
> > culprit.
> >
> > I'm not sure what Windows equivilent of lsof may exist.
> >
> > Wait ... i just had another thought
> >
> > You are using InputStreamReader to deal with the InputStreams of your
> > remote XML files -- but you aren't specifying a charset, so it's using
> > your system default which may be differnet from the charset of the
> > orriginal XML files you are pulling from the URL -- which (i *think*)
> > means that your InputStreamReader may in some cases fail to read all
of
> > the bytes of the stream, which might some dangling filehandles (i'm
just
> > guessing on that part ... i'm not acctually sure whta happens in that
> > case).
> >
> > What if you simplify your code (for the purposes of testing) and just
put
> > the post-transform version ganja-full.xml in a big ass String variable
in
> > your java app and just call GanjaUpdate.doUpdate(bigAssString) over
and
> > over again ... does that cause the same problem?
> >
> >
> > :
> > : --
> > :
> > : package com.iceninetech.solr.update;
> > :
> > : import com.iceninetech.xml.XMLTransformer;
> > :
> > : import java.io.*;
> > : import java.net.HttpURLConnection;
> > : import java.net.URL;
> > : import java.util.logging.Logger;
> > :
> > : public class GanjaUpdate {
> > :
> > :   private String updateSite = "";
> > :   private String XSL_URL = "http://localhost:8080/xsl/ganja.xsl";;
> > :
> > :   private static final File xmlStorageDir = new
> > : File("/source/solr/xml-dls/");
> > :
> > :   final Logger log = Logger.getLogger(GanjaUpdate.class.getName());
> > :
> > :   public GanjaUpdate(String siteName) {
> > : this.updateSite = siteName;
> > : log.info("GanjaUpdate is primed and ready to update " +
siteName);
> > :   }
> > :
> > :   public void update() {
> > : StringWriter sw = new StringWriter();
> > :
> > : try {
> > :   // transform gawkerInput XML to SOLR update XML
> > :   XMLTransformer transform = new XMLTransformer();
> > :   log.info("About to transform ganjaInput XML to Solr Update
XML");
> > :   transform.transform(getXML(), sw, getXSL());
> > :   log.info("Completed ganjaInput/SolrUpdate XML transform");
> > :
> > :   // Write transformed XML to Disk.
> > :   File transformedXML = new File(xmlStorageDir,
updateSite+".sml");
> > :   FileWriter fw = new FileWriter(transformedXML);
> > :   fw.write(sw.toStri

Re: Doc add limit

2006-07-31 Thread sangraal aiken

Chris, my response is below each of your paragraphs...


I don't have the means to try out this code right now ... but i can't see
any obvious problems with it (there may be somewhere that you are opening
a stream or reader and not closing it, but i didn't see one) ... i notice
you are running this client on the same machine as Solr (hence the
localhost URLs); did you by any chance try running the client on a separate
machine to see if the number of updates before it hangs changes?



When I run the client locally and the Solr server on a slower and separate
development box, the maximum number of updates drops to 3,219. So it's
almost as if it's related to some sort of timeout problem because the
maximum number of updates drops considerably on a slower machine, but it's
weird how consistent the number is. 6,144 locally, 5,000 something when I
run it on the external server, and 3,219 when the client is separate from
the server.

my money is still on a filehandle resource limit somewhere ... if you are
running on a system that has "lsof" (on some Unix/Linux installations you
need sudo/su root permissions to run it) you can use "lsof -p <pid>" to
look up what files/network connections are open for a given process.  You
can try running that on both the client pid and the Solr server pid once
it's hung -- you'll probably see a lot of Jar files in use for both, but
if you see more than a few XML files open by the client, or more than
one TCP connection open by either the client or the server, there's your
culprit.



The only output I get from 'lsof -p' that pertains to TCP connections is
the following... I'm not too sure how to interpret it, though:
java    4104 sangraal  261u  IPv6 0x5b060f0   0t0  TCP *:8009 (LISTEN)
java    4104 sangraal  262u  IPv6 0x55d59e8   0t0  TCP [::127.0.0.1]:8005 (LISTEN)
java    4104 sangraal  263u  IPv6 0x53cc0e0   0t0  TCP [::127.0.0.1]:http-alt->[::127.0.0.1]:51039 (ESTABLISHED)
java    4104 sangraal  264u  IPv6 0x5b059d0   0t0  TCP [::127.0.0.1]:51045->[::127.0.0.1]:http-alt (ESTABLISHED)
java    4104 sangraal  265u  IPv6 0x53cc9c8   0t0  TCP [::127.0.0.1]:http-alt->[::127.0.0.1]:51045 (ESTABLISHED)
java    4104 sangraal   11u  IPv6 0x5b04f20   0t0  TCP *:http-alt (LISTEN)
java    4104 sangraal   12u  IPv6 0x5b06d68   0t0  TCP localhost:51037->localhost:51036 (TIME_WAIT)

I'm not sure what Windows equivalent of lsof may exist.


Wait ... i just had another thought

You are using InputStreamReader to deal with the InputStreams of your
remote XML files -- but you aren't specifying a charset, so it's using
your system default, which may be different from the charset of the
original XML files you are pulling from the URL -- which (i *think*)
means that your InputStreamReader may in some cases fail to read all of
the bytes of the stream, which might leave some dangling filehandles (i'm just
guessing on that part ... i'm not actually sure what happens in that
case).

What if you simplify your code (for the purposes of testing) and just put
the post-transform version of ganja-full.xml in a big ass String variable in
your java app and just call GanjaUpdate.doUpdate(bigAssString) over and
over again ... does that cause the same problem?



In the code, I read the XML with a StringReader and then pass it to
GanjaUpdate as a string anyway.  I've output the String object and verified
that it is in fact all there.


-Sangraal


Re: Doc add limit

2006-07-31 Thread Thom Nelson

I had a similar problem and was able to fix it in Solr by manually
buffering the responses to a StringWriter before sending it to Tomcat.
Essentially, Tomcat's buffer will only hold so much and at that point
it blocks (thus it always hangs at a constant number of documents).
However, a better solution (to be implemented) is to use more
intelligent code on the client to read the response at the same time
that it is sending input -- not too difficult to do, though best to do
with two threads (i.e. fire off a thread to read the response before
you send any data).  Seeing as the HttpClient code probably does this
already, I'll most likely end up using that.
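
As a rough sketch of the two-thread idea (a raw socket is used here because
HttpURLConnection only exposes the response after the request has been sent; the
endpoint and payload are only placeholders):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;

public class ConcurrentPostSketch {
  public static void main(String[] args) throws Exception {
    final byte[] body = "<add><doc>...</doc></add>".getBytes("UTF-8"); // placeholder payload

    Socket socket = new Socket("localhost", 8080);
    final InputStream in = socket.getInputStream();

    // Start reading the response *before* sending any data, so the server can
    // never block on a full response buffer while we are still writing.
    Thread reader = new Thread(new Runnable() {
      public void run() {
        try {
          BufferedReader r = new BufferedReader(new InputStreamReader(in, "UTF-8"));
          String line;
          while ((line = r.readLine()) != null) {
            System.out.println("solr> " + line);
          }
        } catch (IOException e) {
          e.printStackTrace();
        }
      }
    });
    reader.start();

    // Stream the request; the response is consumed concurrently above.
    OutputStream out = socket.getOutputStream();
    Writer w = new OutputStreamWriter(out, "UTF-8");
    w.write("POST /update HTTP/1.1\r\n");
    w.write("Host: localhost:8080\r\n");
    w.write("Content-Type: text/xml; charset=UTF-8\r\n");
    w.write("Content-Length: " + body.length + "\r\n");
    w.write("Connection: close\r\n\r\n");
    w.flush();
    out.write(body);
    out.flush();

    reader.join();
    socket.close();
  }
}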

On 7/31/06, sangraal aiken <[EMAIL PROTECTED]> wrote:

Those are some great ideas Chris... I'm going to try some of them out.  I'll
post the results when I get a chance to do more testing. Thanks.

At this point I can work around the problem by ignoring Solr's response but
this is obviously not ideal. I would feel better knowing what is causing the
issue as well.

-Sangraal



On 7/29/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>
>
> : Sure, the method that does all the work updating Solr is the
> doUpdate(String
> : s) method in the GanjaUpdate class I'm pasting below. It's hanging when
> I
> : try to read the response... the last output I receive in my log is Got
> : Reader...
>
> I don't have the means to try out this code right now ... but i can't see
> any obvious problems with it (there may be somewhere that you are opening
> a stream or reader and not closing it, but i didn't see one) ... i notice
> you are running this client on the same machine as Solr (hence the
> localhost URLs) did you by any chance try running the client on a seperate
> machine to see if hte number of updates before it hangs changes?
>
> my money is still on a filehandle resource limit somwhere ... if you are
> running on a system that has "lsof" (on some Unix/Linux installations you
> need sudo/su root permissions to run it) you can use "lsof -p " to
> look up what files/network connections are open for a given process.  You
> can try running that on both the client pid and the Solr server pid once
> it's hung -- You'll probably see a lot of Jar files in use for both, but
> if you see more then a few XML files open by the client, or more then a
> 1 TCP connection open by either the client or the server, there's your
> culprit.
>
> I'm not sure what Windows equivilent of lsof may exist.
>
> Wait ... i just had another thought
>
> You are using InputStreamReader to deal with the InputStreams of your
> remote XML files -- but you aren't specifying a charset, so it's using
> your system default which may be differnet from the charset of the
> orriginal XML files you are pulling from the URL -- which (i *think*)
> means that your InputStreamReader may in some cases fail to read all of
> the bytes of the stream, which might some dangling filehandles (i'm just
> guessing on that part ... i'm not acctually sure whta happens in that
> case).
>
> What if you simplify your code (for the purposes of testing) and just put
> the post-transform version ganja-full.xml in a big ass String variable in
> your java app and just call GanjaUpdate.doUpdate(bigAssString) over and
> over again ... does that cause the same problem?
>
>
> :
> : --
> :
> : package com.iceninetech.solr.update;
> :
> : import com.iceninetech.xml.XMLTransformer;
> :
> : import java.io.*;
> : import java.net.HttpURLConnection;
> : import java.net.URL;
> : import java.util.logging.Logger;
> :
> : public class GanjaUpdate {
> :
> :   private String updateSite = "";
> :   private String XSL_URL = "http://localhost:8080/xsl/ganja.xsl";;
> :
> :   private static final File xmlStorageDir = new
> : File("/source/solr/xml-dls/");
> :
> :   final Logger log = Logger.getLogger(GanjaUpdate.class.getName());
> :
> :   public GanjaUpdate(String siteName) {
> : this.updateSite = siteName;
> : log.info("GanjaUpdate is primed and ready to update " + siteName);
> :   }
> :
> :   public void update() {
> : StringWriter sw = new StringWriter();
> :
> : try {
> :   // transform gawkerInput XML to SOLR update XML
> :   XMLTransformer transform = new XMLTransformer();
> :   log.info("About to transform ganjaInput XML to Solr Update XML");
> :   transform.transform(getXML(), sw, getXSL());
> :   log.info("Completed ganjaInput/SolrUpdate XML transform");
> :
> :   // Write transformed XML to Disk.
> :   File transformedXML = new File(xmlStorageDir, updateSite+".sml");
> :   FileWriter fw = new FileWriter(transformedXML);
> :   fw.write(sw.toString());
> :   fw.close();
> :
> :   // post to Solr
> :   log.info("About to update Solr for site " + updateSite);
> :   String result = this.doUpdate(sw.toString());
> :   log.info("Solr says: " + result);
> :   sw.close();
> : } catch (Exception e) {
> :   e.printStackTrace();
> : }
> :   }
> :
> :  

Re: Doc add limit

2006-07-31 Thread sangraal aiken

Those are some great ideas Chris... I'm going to try some of them out.  I'll
post the results when I get a chance to do more testing. Thanks.

At this point I can work around the problem by ignoring Solr's response but
this is obviously not ideal. I would feel better knowing what is causing the
issue as well.

-Sangraal



On 7/29/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:



: Sure, the method that does all the work updating Solr is the
doUpdate(String
: s) method in the GanjaUpdate class I'm pasting below. It's hanging when
I
: try to read the response... the last output I receive in my log is Got
: Reader...

I don't have the means to try out this code right now ... but i can't see
any obvious problems with it (there may be somewhere that you are opening
a stream or reader and not closing it, but i didn't see one) ... i notice
you are running this client on the same machine as Solr (hence the
localhost URLs); did you by any chance try running the client on a separate
machine to see if the number of updates before it hangs changes?

my money is still on a filehandle resource limit somewhere ... if you are
running on a system that has "lsof" (on some Unix/Linux installations you
need sudo/su root permissions to run it) you can use "lsof -p <pid>" to
look up what files/network connections are open for a given process.  You
can try running that on both the client pid and the Solr server pid once
it's hung -- you'll probably see a lot of Jar files in use for both, but
if you see more than a few XML files open by the client, or more than
one TCP connection open by either the client or the server, there's your
culprit.

I'm not sure what Windows equivalent of lsof may exist.

Wait ... i just had another thought

You are using InputStreamReader to deal with the InputStreams of your
remote XML files -- but you aren't specifying a charset, so it's using
your system default, which may be different from the charset of the
original XML files you are pulling from the URL -- which (i *think*)
means that your InputStreamReader may in some cases fail to read all of
the bytes of the stream, which might leave some dangling filehandles (i'm just
guessing on that part ... i'm not actually sure what happens in that
case).

What if you simplify your code (for the purposes of testing) and just put
the post-transform version of ganja-full.xml in a big ass String variable in
your java app and just call GanjaUpdate.doUpdate(bigAssString) over and
over again ... does that cause the same problem?


:
: --
:
: package com.iceninetech.solr.update;
:
: import com.iceninetech.xml.XMLTransformer;
:
: import java.io.*;
: import java.net.HttpURLConnection;
: import java.net.URL;
: import java.util.logging.Logger;
:
: public class GanjaUpdate {
:
:   private String updateSite = "";
:   private String XSL_URL = "http://localhost:8080/xsl/ganja.xsl";;
:
:   private static final File xmlStorageDir = new
: File("/source/solr/xml-dls/");
:
:   final Logger log = Logger.getLogger(GanjaUpdate.class.getName());
:
:   public GanjaUpdate(String siteName) {
: this.updateSite = siteName;
: log.info("GanjaUpdate is primed and ready to update " + siteName);
:   }
:
:   public void update() {
: StringWriter sw = new StringWriter();
:
: try {
:   // transform gawkerInput XML to SOLR update XML
:   XMLTransformer transform = new XMLTransformer();
:   log.info("About to transform ganjaInput XML to Solr Update XML");
:   transform.transform(getXML(), sw, getXSL());
:   log.info("Completed ganjaInput/SolrUpdate XML transform");
:
:   // Write transformed XML to Disk.
:   File transformedXML = new File(xmlStorageDir, updateSite+".sml");
:   FileWriter fw = new FileWriter(transformedXML);
:   fw.write(sw.toString());
:   fw.close();
:
:   // post to Solr
:   log.info("About to update Solr for site " + updateSite);
:   String result = this.doUpdate(sw.toString());
:   log.info("Solr says: " + result);
:   sw.close();
: } catch (Exception e) {
:   e.printStackTrace();
: }
:   }
:
:   public File getXML() {
: String XML_URL = "http://localhost:8080/"; + updateSite + "/ganja-
: full.xml";
:
: // check for file
: File localXML = new File(xmlStorageDir, updateSite + ".xml");
:
: try {
:   if (localXML.createNewFile() && localXML.canWrite()) {
: // open connection
: log.info("Downloading: " + XML_URL);
: URL url = new URL(XML_URL);
: HttpURLConnection conn = (HttpURLConnection) url.openConnection
();
: conn.setRequestMethod("GET");
:
: // Read response to File
: log.info("Storing XML to File" + localXML.getCanonicalPath());
: FileOutputStream fos = new FileOutputStream(new
File(xmlStorageDir,
: updateSite + ".xml"));
:
: BufferedReader rd = new BufferedReader(new InputStreamReader(
: conn.getInputStream()));
: String line;
: while ((line = rd.readLine()) != null) {
:   

Re: Doc add limit

2006-07-29 Thread Chris Hostetter

: Sure, the method that does all the work updating Solr is the doUpdate(String
: s) method in the GanjaUpdate class I'm pasting below. It's hanging when I
: try to read the response... the last output I receive in my log is Got
: Reader...

I don't have the means to try out this code right now ... but i can't see
any obvious problems with it (there may be somewhere that you are opening
a stream or reader and not closing it, but i didn't see one) ... i notice
you are running this client on the same machine as Solr (hence the
localhost URLs); did you by any chance try running the client on a separate
machine to see if the number of updates before it hangs changes?

my money is still on a filehandle resource limit somewhere ... if you are
running on a system that has "lsof" (on some Unix/Linux installations you
need sudo/su root permissions to run it) you can use "lsof -p <pid>" to
look up what files/network connections are open for a given process.  You
can try running that on both the client pid and the Solr server pid once
it's hung -- you'll probably see a lot of Jar files in use for both, but
if you see more than a few XML files open by the client, or more than
one TCP connection open by either the client or the server, there's your
culprit.

I'm not sure what Windows equivalent of lsof may exist.

Wait ... i just had another thought

You are using InputStreamReader to deal with the InputStreams of your
remote XML files -- but you aren't specifying a charset, so it's using
your system default, which may be different from the charset of the
original XML files you are pulling from the URL -- which (i *think*)
means that your InputStreamReader may in some cases fail to read all of
the bytes of the stream, which might leave some dangling filehandles (i'm just
guessing on that part ... i'm not actually sure what happens in that
case).

What if you simplify your code (for the purposes of testing) and just put
the post-transform version of ganja-full.xml in a big ass String variable in
your java app and just call GanjaUpdate.doUpdate(bigAssString) over and
over again ... does that cause the same problem?


:
: --
:
: package com.iceninetech.solr.update;
:
: import com.iceninetech.xml.XMLTransformer;
:
: import java.io.*;
: import java.net.HttpURLConnection;
: import java.net.URL;
: import java.util.logging.Logger;
:
: public class GanjaUpdate {
:
:   private String updateSite = "";
:   private String XSL_URL = "http://localhost:8080/xsl/ganja.xsl";;
:
:   private static final File xmlStorageDir = new
: File("/source/solr/xml-dls/");
:
:   final Logger log = Logger.getLogger(GanjaUpdate.class.getName());
:
:   public GanjaUpdate(String siteName) {
: this.updateSite = siteName;
: log.info("GanjaUpdate is primed and ready to update " + siteName);
:   }
:
:   public void update() {
: StringWriter sw = new StringWriter();
:
: try {
:   // transform gawkerInput XML to SOLR update XML
:   XMLTransformer transform = new XMLTransformer();
:   log.info("About to transform ganjaInput XML to Solr Update XML");
:   transform.transform(getXML(), sw, getXSL());
:   log.info("Completed ganjaInput/SolrUpdate XML transform");
:
:   // Write transformed XML to Disk.
:   File transformedXML = new File(xmlStorageDir, updateSite+".sml");
:   FileWriter fw = new FileWriter(transformedXML);
:   fw.write(sw.toString());
:   fw.close();
:
:   // post to Solr
:   log.info("About to update Solr for site " + updateSite);
:   String result = this.doUpdate(sw.toString());
:   log.info("Solr says: " + result);
:   sw.close();
: } catch (Exception e) {
:   e.printStackTrace();
: }
:   }
:
:   public File getXML() {
: String XML_URL = "http://localhost:8080/"; + updateSite + "/ganja-
: full.xml";
:
: // check for file
: File localXML = new File(xmlStorageDir, updateSite + ".xml");
:
: try {
:   if (localXML.createNewFile() && localXML.canWrite()) {
: // open connection
: log.info("Downloading: " + XML_URL);
: URL url = new URL(XML_URL);
: HttpURLConnection conn = (HttpURLConnection) url.openConnection();
: conn.setRequestMethod("GET");
:
: // Read response to File
: log.info("Storing XML to File" + localXML.getCanonicalPath());
: FileOutputStream fos = new FileOutputStream(new File(xmlStorageDir,
: updateSite + ".xml"));
:
: BufferedReader rd = new BufferedReader(new InputStreamReader(
: conn.getInputStream()));
: String line;
: while ((line = rd.readLine()) != null) {
:   line = line + '\n'; // add break after each line. It preserves
: formatting.
:   fos.write(line.getBytes("UTF8"));
: }
:
: // close connections
: rd.close();
: fos.close();
: conn.disconnect();
: log.info("Got the XML... File saved.");
:   }
: } catch (Exception e) {
:   e.printStackTrace();
: }
:
:

Re: Doc add limit

2006-07-28 Thread Yonik Seeley

Does anyone know if the following table is still valid for HttpUrlConnection:
http://www.innovation.ch/java/HTTPClient/urlcon_vs_httpclient.html

If so, there are a couple of advantages to using HTTPClient with Solr:
- direct streaming to/from socket (could be important for very large
requests/responses)
- can read code/headers/body regardless what the response code is
- more flexible authorization (not used in Solr now, but could be in the future)
- can set timeouts

-Yonik

On 7/28/06, Andrew May <[EMAIL PROTECTED]> wrote:

I'm using HttpClient for indexing and searching and it seems to work well. You 
can either
POST files directly (only works in 3.1 alpha, use InputStreamRequestEntity in 
3.0):

   PostMethod post = new PostMethod(solrUrl);
   post.setRequestEntity(new FileRequestEntity(file, "application/xml"));
   int response = new HttpClient().executeMethod(post);

or send a String (e.g. the result of an XSLT transformation)

   PostMethod post = new PostMethod(solrUrl);
   post.setRequestEntity(new StringRequestEntity(text, "application/xml", 
"UTF-8"));
   int response = new HttpClient().executeMethod(post);

You can also pool connections if you're writing something multi-threaded.

-Andrew

Bertrand Delacretaz wrote:
> On 7/28/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>
>> ...Getting all the little details of connection handling correct can be
>> tough... it's probably a good idea if we work toward common client
>> libraries so everyone doesn't have to reinvent them
>
> Jakarta's HttpClient [1] is IMHO a good base for Java clients, and
> it's easy to use, see the PostXML example in [2].
>
> -Bertrand
>
> [1] http://jakarta.apache.org/commons/httpclient/
>
> [2]
> 
http://svn.apache.org/viewvc/jakarta/commons/proper/httpclient/trunk/src/examples/PostXML.java?revision=410848&view=markup
>


Re: Doc add limit

2006-07-28 Thread Andrew May
I'm using HttpClient for indexing and searching and it seems to work well. You can either 
POST files directly (only works in 3.1 alpha, use InputStreamRequestEntity in 3.0):


  PostMethod post = new PostMethod(solrUrl);
  post.setRequestEntity(new FileRequestEntity(file, "application/xml"));
  int response = new HttpClient().executeMethod(post);

or send a String (e.g. the result of an XSLT transformation)

  PostMethod post = new PostMethod(solrUrl);
  post.setRequestEntity(new StringRequestEntity(text, "application/xml", 
"UTF-8"));
  int response = new HttpClient().executeMethod(post);

You can also pool connections if you're writing something multi-threaded.
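
For the multi-threaded case, a rough sketch with HttpClient 3.x (the URL, payload, and
timeout values here are only examples):

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.commons.httpclient.methods.PostMethod;
import org.apache.commons.httpclient.methods.StringRequestEntity;

public class PooledSolrClient {
  public static void main(String[] args) throws Exception {
    // One shared client backed by a thread-safe connection pool.
    MultiThreadedHttpConnectionManager manager = new MultiThreadedHttpConnectionManager();
    HttpClient client = new HttpClient(manager);
    client.getHttpConnectionManager().getParams().setConnectionTimeout(5000);
    client.getHttpConnectionManager().getParams().setSoTimeout(60000);

    PostMethod post = new PostMethod("http://localhost:8080/update");
    post.setRequestEntity(new StringRequestEntity("<commit/>", "application/xml", "UTF-8"));
    try {
      int status = client.executeMethod(post);
      System.out.println("status: " + status);
      System.out.println(post.getResponseBodyAsString());
    } finally {
      post.releaseConnection(); // hand the connection back to the pool
    }
  }
}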

-Andrew

Bertrand Delacretaz wrote:

On 7/28/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:


...Getting all the little details of connection handling correct can be
tough... it's probably a good idea if we work toward common client
libraries so everyone doesn't have to reinvent them


Jakarta's HttpClient [1] is IMHO a good base for Java clients, and
it's easy to use, see the PostXML example in [2].

-Bertrand

[1] http://jakarta.apache.org/commons/httpclient/

[2] 
http://svn.apache.org/viewvc/jakarta/commons/proper/httpclient/trunk/src/examples/PostXML.java?revision=410848&view=markup 



Re: Re: Doc add limit

2006-07-28 Thread Bertrand Delacretaz

On 7/28/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:


...Getting all the little details of connection handling correct can be
tough... it's probably a good idea if we work toward common client
libraries so everyone doesn't have to reinvent them


Jakarta's HttpClient [1] is IMHO a good base for Java clients, and
it's easy to use, see the PostXML example in [2].

-Bertrand

[1] http://jakarta.apache.org/commons/httpclient/

[2] 
http://svn.apache.org/viewvc/jakarta/commons/proper/httpclient/trunk/src/examples/PostXML.java?revision=410848&view=markup


Re: Doc add limit

2006-07-28 Thread sangraal aiken

Yeah, that code is pretty bare bones... I'm still in the initial testing
stage. You're right, it definitely needs some more thorough work.

I did try removing all the conn.disconnect(); statements and there was no
change.

I'm going to give the Java Client code you sent me yesterday a shot and see
what happens with that. I'm kind of out of ideas for what could be causing
the hang... it really seems to just get locked in some sort of loop, but
there are absolutely no exceptions being thrown either on the Solr side or
the Client side... it just stops processing.

-Sangraal

On 7/28/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:


It may be some sort of weird interaction with persistent connections
and timeouts (both client and server have connection timeouts I
assume).

Does anything change if you remove your .disconnect() call (it
shouldn't be needed).
Do you ever see any exceptions in the client side?

The code you show probably needs more error handling (finally blocks
with closes), but  if you don't see any stack traces from your
e.printStackTrace() then it doesn't have anything to do with this
problem.

Getting all the little details of connection handling correct can be
tough... it's probably a good idea if we work toward common client
libraries so everyone doesn't have to reinvent them.

-Yonik

On 7/28/06, sangraal aiken <[EMAIL PROTECTED]> wrote:
> Sure, the method that does all the work updating Solr is the
doUpdate(String
> s) method in the GanjaUpdate class I'm pasting below. It's hanging when
I
> try to read the response... the last output I receive in my log is Got
> Reader...
>
> --
>
> package com.iceninetech.solr.update;
>
> import com.iceninetech.xml.XMLTransformer;
>
> import java.io.*;
> import java.net.HttpURLConnection;
> import java.net.URL;
> import java.util.logging.Logger;
>
> public class GanjaUpdate {
>
>   private String updateSite = "";
>   private String XSL_URL = "http://localhost:8080/xsl/ganja.xsl";;
>
>   private static final File xmlStorageDir = new
> File("/source/solr/xml-dls/");
>
>   final Logger log = Logger.getLogger(GanjaUpdate.class.getName());
>
>   public GanjaUpdate(String siteName) {
> this.updateSite = siteName;
> log.info("GanjaUpdate is primed and ready to update " + siteName);
>   }
>
>   public void update() {
> StringWriter sw = new StringWriter();
>
> try {
>   // transform gawkerInput XML to SOLR update XML
>   XMLTransformer transform = new XMLTransformer();
>   log.info("About to transform ganjaInput XML to Solr Update XML");
>   transform.transform(getXML(), sw, getXSL());
>   log.info("Completed ganjaInput/SolrUpdate XML transform");
>
>   // Write transformed XML to Disk.
>   File transformedXML = new File(xmlStorageDir, updateSite+".sml");
>   FileWriter fw = new FileWriter(transformedXML);
>   fw.write(sw.toString());
>   fw.close();
>
>   // post to Solr
>   log.info("About to update Solr for site " + updateSite);
>   String result = this.doUpdate(sw.toString());
>   log.info("Solr says: " + result);
>   sw.close();
> } catch (Exception e) {
>   e.printStackTrace();
> }
>   }
>
>   public File getXML() {
> String XML_URL = "http://localhost:8080/"; + updateSite + "/ganja-
> full.xml";
>
> // check for file
> File localXML = new File(xmlStorageDir, updateSite + ".xml");
>
> try {
>   if (localXML.createNewFile() && localXML.canWrite()) {
> // open connection
> log.info("Downloading: " + XML_URL);
> URL url = new URL(XML_URL);
> HttpURLConnection conn = (HttpURLConnection) url.openConnection
();
> conn.setRequestMethod("GET");
>
> // Read response to File
> log.info("Storing XML to File" + localXML.getCanonicalPath());
> FileOutputStream fos = new FileOutputStream(new
File(xmlStorageDir,
> updateSite + ".xml"));
>
> BufferedReader rd = new BufferedReader(new InputStreamReader(
> conn.getInputStream()));
> String line;
> while ((line = rd.readLine()) != null) {
>   line = line + '\n'; // add break after each line. It preserves
> formatting.
>   fos.write(line.getBytes("UTF8"));
> }
>
> // close connections
> rd.close();
> fos.close();
> conn.disconnect();
> log.info("Got the XML... File saved.");
>   }
> } catch (Exception e) {
>   e.printStackTrace();
> }
>
> return localXML;
>   }
>
>   public File getXSL() {
> StringBuffer retVal = new StringBuffer();
>
> // check for file
> File localXSL = new File(xmlStorageDir, "ganja.xsl");
>
> try {
>   if (localXSL.createNewFile() && localXSL.canWrite()) {
> // open connection
> log.info("Downloading: " + XSL_URL);
> URL url = new URL(XSL_URL);
> HttpURLConnection conn = (HttpURLConnection) url.openConnection
();
> conn.setRequestMethod("GET");
> // Read respo

Re: Doc add limit

2006-07-28 Thread Yonik Seeley

It may be some sort of weird interaction with persistent connections
and timeouts (both client and server have connection timeouts I
assume).

Does anything change if you remove your .disconnect() call? (It
shouldn't be needed.)
Do you ever see any exceptions on the client side?

The code you show probably needs more error handling (finally blocks
with closes), but if you don't see any stack traces from your
e.printStackTrace() then it doesn't have anything to do with this
problem.
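
By way of illustration, one possible shape for that (a hedged sketch of doUpdate with
finally-based cleanup; it reuses the names and imports from the code quoted below, so
treat it as a pattern rather than a drop-in):

  private String doUpdate(String sw) throws IOException {
    StringBuffer updateResult = new StringBuffer();
    HttpURLConnection conn = null;
    DataOutputStream output = null;
    BufferedReader rd = null;
    try {
      conn = (HttpURLConnection) new URL("http://localhost:8080/update").openConnection();
      conn.setRequestMethod("POST");
      conn.setRequestProperty("Content-Type", "application/octet-stream");
      conn.setDoOutput(true);

      output = new DataOutputStream(conn.getOutputStream());
      output.writeBytes(sw);
      output.flush();

      rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
      String line;
      while ((line = rd.readLine()) != null) {
        updateResult.append(line).append('\n');
      }
    } finally {
      // streams are closed even when an exception is thrown mid-request
      if (output != null) { try { output.close(); } catch (IOException ignored) {} }
      if (rd != null) { try { rd.close(); } catch (IOException ignored) {} }
      // note: no conn.disconnect(), so keep-alive connections can be reused
    }
    return updateResult.toString();
  }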

Getting all the little details of connection handling correct can be
tough... it's probably a good idea if we work toward common client
libraries so everyone doesn't have to reinvent them.

-Yonik

On 7/28/06, sangraal aiken <[EMAIL PROTECTED]> wrote:

Sure, the method that does all the work updating Solr is the doUpdate(String
s) method in the GanjaUpdate class I'm pasting below. It's hanging when I
try to read the response... the last output I receive in my log is Got
Reader...

--

package com.iceninetech.solr.update;

import com.iceninetech.xml.XMLTransformer;

import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.logging.Logger;

public class GanjaUpdate {

  private String updateSite = "";
  private String XSL_URL = "http://localhost:8080/xsl/ganja.xsl";

  private static final File xmlStorageDir = new
File("/source/solr/xml-dls/");

  final Logger log = Logger.getLogger(GanjaUpdate.class.getName());

  public GanjaUpdate(String siteName) {
this.updateSite = siteName;
log.info("GanjaUpdate is primed and ready to update " + siteName);
  }

  public void update() {
StringWriter sw = new StringWriter();

try {
  // transform gawkerInput XML to SOLR update XML
  XMLTransformer transform = new XMLTransformer();
  log.info("About to transform ganjaInput XML to Solr Update XML");
  transform.transform(getXML(), sw, getXSL());
  log.info("Completed ganjaInput/SolrUpdate XML transform");

  // Write transformed XML to Disk.
  File transformedXML = new File(xmlStorageDir, updateSite+".sml");
  FileWriter fw = new FileWriter(transformedXML);
  fw.write(sw.toString());
  fw.close();

  // post to Solr
  log.info("About to update Solr for site " + updateSite);
  String result = this.doUpdate(sw.toString());
  log.info("Solr says: " + result);
  sw.close();
} catch (Exception e) {
  e.printStackTrace();
}
  }

  public File getXML() {
    String XML_URL = "http://localhost:8080/" + updateSite + "/ganja-full.xml";

// check for file
File localXML = new File(xmlStorageDir, updateSite + ".xml");

try {
  if (localXML.createNewFile() && localXML.canWrite()) {
// open connection
log.info("Downloading: " + XML_URL);
URL url = new URL(XML_URL);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestMethod("GET");

// Read response to File
log.info("Storing XML to File" + localXML.getCanonicalPath());
FileOutputStream fos = new FileOutputStream(new File(xmlStorageDir,
updateSite + ".xml"));

BufferedReader rd = new BufferedReader(new InputStreamReader(
conn.getInputStream()));
String line;
while ((line = rd.readLine()) != null) {
  line = line + '\n'; // add break after each line. It preserves
formatting.
  fos.write(line.getBytes("UTF8"));
}

// close connections
rd.close();
fos.close();
conn.disconnect();
log.info("Got the XML... File saved.");
  }
} catch (Exception e) {
  e.printStackTrace();
}

return localXML;
  }

  public File getXSL() {
StringBuffer retVal = new StringBuffer();

// check for file
File localXSL = new File(xmlStorageDir, "ganja.xsl");

try {
  if (localXSL.createNewFile() && localXSL.canWrite()) {
// open connection
log.info("Downloading: " + XSL_URL);
URL url = new URL(XSL_URL);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestMethod("GET");
// Read response
BufferedReader rd = new BufferedReader(new InputStreamReader(
conn.getInputStream()));
String line;
while ((line = rd.readLine()) != null) {
  line = line + '\n';
  retVal.append(line);
}
// close connections
rd.close();
conn.disconnect();

log.info("Got the XSLT.");

// output file
log.info("Storing XSL to File" + localXSL.getCanonicalPath());
FileOutputStream fos = new FileOutputStream(new File(xmlStorageDir,
"ganja.xsl"));
fos.write(retVal.toString().getBytes());
fos.close();
log.info("File saved.");
  }
} catch (Exception e) {
  e.printStackTrace();
}
return localXSL;
  }

  private String doUpdate(String sw) {
StringBuffer updateResult = new StringBuffer();
try {

Re: Doc add limit

2006-07-28 Thread sangraal aiken

Sure, the method that does all the work updating Solr is the doUpdate(String
s) method in the GanjaUpdate class I'm pasting below. It's hanging when I
try to read the response... the last output I receive in my log is Got
Reader...

--

package com.iceninetech.solr.update;

import com.iceninetech.xml.XMLTransformer;

import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.logging.Logger;

public class GanjaUpdate {

 private String updateSite = "";
 private String XSL_URL = "http://localhost:8080/xsl/ganja.xsl";

 private static final File xmlStorageDir = new
File("/source/solr/xml-dls/");

 final Logger log = Logger.getLogger(GanjaUpdate.class.getName());

 public GanjaUpdate(String siteName) {
   this.updateSite = siteName;
   log.info("GanjaUpdate is primed and ready to update " + siteName);
 }

 public void update() {
   StringWriter sw = new StringWriter();

   try {
 // transform gawkerInput XML to SOLR update XML
 XMLTransformer transform = new XMLTransformer();
 log.info("About to transform ganjaInput XML to Solr Update XML");
 transform.transform(getXML(), sw, getXSL());
 log.info("Completed ganjaInput/SolrUpdate XML transform");

 // Write transformed XML to Disk.
 File transformedXML = new File(xmlStorageDir, updateSite+".sml");
 FileWriter fw = new FileWriter(transformedXML);
 fw.write(sw.toString());
 fw.close();

 // post to Solr
 log.info("About to update Solr for site " + updateSite);
 String result = this.doUpdate(sw.toString());
 log.info("Solr says: " + result);
 sw.close();
   } catch (Exception e) {
 e.printStackTrace();
   }
 }

 public File getXML() {
    String XML_URL = "http://localhost:8080/" + updateSite + "/ganja-full.xml";

   // check for file
   File localXML = new File(xmlStorageDir, updateSite + ".xml");

   try {
 if (localXML.createNewFile() && localXML.canWrite()) {
   // open connection
   log.info("Downloading: " + XML_URL);
   URL url = new URL(XML_URL);
   HttpURLConnection conn = (HttpURLConnection) url.openConnection();
   conn.setRequestMethod("GET");

   // Read response to File
   log.info("Storing XML to File" + localXML.getCanonicalPath());
   FileOutputStream fos = new FileOutputStream(new File(xmlStorageDir,
updateSite + ".xml"));

   BufferedReader rd = new BufferedReader(new InputStreamReader(
conn.getInputStream()));
   String line;
   while ((line = rd.readLine()) != null) {
 line = line + '\n'; // add break after each line. It preserves
formatting.
 fos.write(line.getBytes("UTF8"));
   }

   // close connections
   rd.close();
   fos.close();
   conn.disconnect();
   log.info("Got the XML... File saved.");
 }
   } catch (Exception e) {
 e.printStackTrace();
   }

   return localXML;
 }

 public File getXSL() {
   StringBuffer retVal = new StringBuffer();

   // check for file
   File localXSL = new File(xmlStorageDir, "ganja.xsl");

   try {
 if (localXSL.createNewFile() && localXSL.canWrite()) {
   // open connection
   log.info("Downloading: " + XSL_URL);
   URL url = new URL(XSL_URL);
   HttpURLConnection conn = (HttpURLConnection) url.openConnection();
   conn.setRequestMethod("GET");
   // Read response
   BufferedReader rd = new BufferedReader(new InputStreamReader(
conn.getInputStream()));
   String line;
   while ((line = rd.readLine()) != null) {
 line = line + '\n';
 retVal.append(line);
   }
   // close connections
   rd.close();
   conn.disconnect();

   log.info("Got the XSLT.");

   // output file
   log.info("Storing XSL to File" + localXSL.getCanonicalPath());
   FileOutputStream fos = new FileOutputStream(new File(xmlStorageDir,
"ganja.xsl"));
   fos.write(retVal.toString().getBytes());
   fos.close();
   log.info("File saved.");
 }
   } catch (Exception e) {
 e.printStackTrace();
   }
   return localXSL;
 }

 private String doUpdate(String sw) {
   StringBuffer updateResult = new StringBuffer();
   try {
 // open connection
 log.info("Connecting to and preparing to post to SolrUpdate
servlet.");
 URL url = new URL("http://localhost:8080/update";);
 HttpURLConnection conn = (HttpURLConnection) url.openConnection();
 conn.setRequestMethod("POST");
 conn.setRequestProperty("Content-Type", "application/octet-stream");
 conn.setDoOutput(true);
 conn.setDoInput(true);
 conn.setUseCaches(false);

 // Write to server
 log.info("About to post to SolrUpdate servlet.");
  DataOutputStream output = new DataOutputStream(conn.getOutputStream());
 output.writeBytes(sw);
 output.flush();
 output.close();
 log.info("Finished posting to SolrUpdate servlet.");

 // Read response
 log.info("Ready to read response.");
 BufferedReader rd = new BufferedReader(new InputStream

Re: Doc add limit

2006-07-27 Thread Chris Hostetter

: I'm sure... it seems like solr is having trouble writing to a tomcat
: response that's been inactive for a bit. It's only 30 seconds though, so I'm
: not entirely sure why that would happen.

but didn't you say you don't have this problem when you use curl -- just
your java client code?

Did you try Yonik's python test client? or the java client in Jira?

looking over the java client code you sent, it's not clear if you are
reading the response back, or closing the connections ... can you post a
more complete sample app that exhibits the problem for you?



-Hoss



Re: Doc add limit

2006-07-27 Thread sangraal aiken

I'm sure... it seems like solr is having trouble writing to a tomcat
response that's been inactive for a bit. It's only 30 seconds though, so I'm
not entirely sure why that would happen.

I use the same client code for DL'ing XSL sheets from external servers and
it works fine, but in those instances the server responds much faster to the
request.

This is an elusive bug for sure.

-S

On 7/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:


On 7/27/06, sangraal aiken <[EMAIL PROTECTED]> wrote:
> Commenting out the following line in SolrCore fixes my problem... but of
> course I don't get the result status info... but this isn't a problem
for me
> really.
>
> -Sangraal
>
> writer.write("");

While it's possible you hit a Tomcat bug, I think it's more likely a
client problem.

-Yonik



Re: Doc add limit

2006-07-27 Thread sangraal aiken

I'll give that a shot...

Thanks again for all your help.

-S

On 7/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:


You might also try the Java update client here:
http://issues.apache.org/jira/browse/SOLR-20

-Yonik



Re: Doc add limit

2006-07-27 Thread Yonik Seeley

On 7/27/06, sangraal aiken <[EMAIL PROTECTED]> wrote:

Commenting out the following line in SolrCore fixes my problem... but of
course I don't get the result status info... but this isn't a problem for me
really.

-Sangraal

writer.write("");


While it's possible you hit a Tomcat bug, I think it's more likely a
client problem.

-Yonik


Re: Doc add limit

2006-07-27 Thread sangraal aiken

Commenting out the following line in SolrCore fixes my problem... but of
course I don't get the result status info... but this isn't a problem for me
really.

-Sangraal

writer.write("");
On 7/27/06, sangraal aiken <[EMAIL PROTECTED]> wrote:


I'm running on Tomcat... and I've verified that the complete post is
making it through the SolrUpdate servlet and into the SolrCore object...
thanks for the info though.
--
So the code is hanging on this call in SolrCore.java

writer.write("");

The thread dump:

"http-8080-Processor24" Id=32 in RUNNABLE (running in native) total cpu
time= 40698.0440ms user time=38646.1680ms

 at java.net.SocketOutputStream.socketWrite0(Native Method)
 at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java
:92)
 at java.net.SocketOutputStream.write (SocketOutputStream.java:136)
 at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(
InternalOutputBuffer.java:746)
 at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java
:433)
 at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:348)
 at
org.apache.coyote.http11.InternalOutputBuffer$OutputStreamOutputBuffer.doWrite
(InternalOutputBuffer.java:769)
 at org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite (
ChunkedOutputFilter.java:125)
 at org.apache.coyote.http11.InternalOutputBuffer.doWrite(
InternalOutputBuffer.java:579)
 at org.apache.coyote.Response.doWrite(Response.java:559)
 at org.apache.catalina.connector.OutputBuffer.realWriteBytes (
OutputBuffer.java:361)
 at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:324)
 at org.apache.tomcat.util.buf.IntermediateOutputStream.write(
C2BConverter.java:235)
 at sun.nio.cs.StreamEncoder$CharsetSE.writeBytes (StreamEncoder.java
:336)
 at sun.nio.cs.StreamEncoder$CharsetSE.implFlushBuffer(
StreamEncoder.java:404)
 at sun.nio.cs.StreamEncoder$CharsetSE.implFlush(StreamEncoder.java
:408)
 at sun.nio.cs.StreamEncoder.flush (StreamEncoder.java:152)
 at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:213)
 at org.apache.tomcat.util.buf.WriteConvertor.flush(C2BConverter.java
:184)
 at org.apache.tomcat.util.buf.C2BConverter.flushBuffer (
C2BConverter.java:127)
 at org.apache.catalina.connector.OutputBuffer.realWriteChars(
OutputBuffer.java:536)
 at org.apache.tomcat.util.buf.CharChunk.flushBuffer(CharChunk.java
:439)
 at org.apache.tomcat.util.buf.CharChunk.append (CharChunk.java:370)
 at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java
:491)
 at org.apache.catalina.connector.CoyoteWriter.write(CoyoteWriter.java
:161)
 at org.apache.catalina.connector.CoyoteWriter.write (
CoyoteWriter.java:170)
 at org.apache.solr.core.SolrCore.update(SolrCore.java:695)
 at org.apache.solr.servlet.SolrUpdateServlet.doPost(
SolrUpdateServlet.java:52)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:709)

 at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(
ApplicationFilterChain.java :252)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(
ApplicationFilterChain.java:173)
 at org.apache.catalina.core.StandardWrapperValve.invoke(
StandardWrapperValve.java:213)
 at org.apache.catalina.core.StandardContextValve.invoke (
StandardContextValve.java:178)
 at org.apache.catalina.core.StandardHostValve.invoke(
StandardHostValve.java:126)
 at org.apache.catalina.valves.ErrorReportValve.invoke(
ErrorReportValve.java:105)
 at org.apache.catalina.core.StandardEngineValve.invoke(
StandardEngineValve.java:107)
 at org.apache.catalina.connector.CoyoteAdapter.service(
CoyoteAdapter.java:148)
 at org.apache.coyote.http11.Http11Processor.process (
Http11Processor.java:869)
 at
org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection
(Http11BaseProtocol.java:664)
 at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(
PoolTcpEndpoint.java:527)
 at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(
LeaderFollowerWorkerThread.java:80)
 at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(
ThreadPool.java :684)

 at java.lang.Thread.run(Thread.java:613)

On 7/27/06, Otis Gospodnetic <[EMAIL PROTECTED] > wrote:
>
> I haven't been following the thread, but
> Not sure if you are using Tomcat or Jetty, but Jetty has a POST size
> limit (set somewhere in its configs) that may be the source of the problem.
>
> Otis
> P.S.
> Just occurred to me.
> Tomcat.  Jetty.  Tom & Jerry.  Jetty guys should have called their thing
> Jerry or Jerrymouse.
>
> - Original Message 
> From: Mike Klaas < [EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Thursday, July 27, 2006 6:33:16 PM

Re: Doc add limit

2006-07-27 Thread Yonik Seeley

You might also try the Java update client here:
http://issues.apache.org/jira/browse/SOLR-20

-Yonik


Re: Doc add limit

2006-07-27 Thread sangraal aiken

I'm running on Tomcat... and I've verified that the complete post is making
it through the SolrUpdate servlet and into the SolrCore object... thanks for
the info though.
--
So the code is hanging on this call in SolrCore.java

   writer.write("");

The thread dump:

"http-8080-Processor24" Id=32 in RUNNABLE (running in native) total cpu
time=40698.0440ms user time=38646.1680ms
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(
InternalOutputBuffer.java:746)
at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:433)
at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:348)
at
org.apache.coyote.http11.InternalOutputBuffer$OutputStreamOutputBuffer.doWrite
(InternalOutputBuffer.java:769)
at org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite(
ChunkedOutputFilter.java:125)
at org.apache.coyote.http11.InternalOutputBuffer.doWrite(
InternalOutputBuffer.java:579)
at org.apache.coyote.Response.doWrite(Response.java:559)
at org.apache.catalina.connector.OutputBuffer.realWriteBytes(
OutputBuffer.java:361)
at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:324)
at org.apache.tomcat.util.buf.IntermediateOutputStream.write(
C2BConverter.java:235)
at sun.nio.cs.StreamEncoder$CharsetSE.writeBytes(StreamEncoder.java
:336)
at sun.nio.cs.StreamEncoder$CharsetSE.implFlushBuffer(
StreamEncoder.java:404)
at sun.nio.cs.StreamEncoder$CharsetSE.implFlush(StreamEncoder.java:408)
at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:152)
at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:213)
at org.apache.tomcat.util.buf.WriteConvertor.flush(C2BConverter.java
:184)
at org.apache.tomcat.util.buf.C2BConverter.flushBuffer(
C2BConverter.java:127)
at org.apache.catalina.connector.OutputBuffer.realWriteChars(
OutputBuffer.java:536)
at org.apache.tomcat.util.buf.CharChunk.flushBuffer(CharChunk.java:439)
at org.apache.tomcat.util.buf.CharChunk.append(CharChunk.java:370)
at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java
:491)
at org.apache.catalina.connector.CoyoteWriter.write(CoyoteWriter.java
:161)
at org.apache.catalina.connector.CoyoteWriter.write(CoyoteWriter.java
:170)
at org.apache.solr.core.SolrCore.update(SolrCore.java:695)
at org.apache.solr.servlet.SolrUpdateServlet.doPost(
SolrUpdateServlet.java:52)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:709)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(
ApplicationFilterChain.java:252)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(
ApplicationFilterChain.java:173)
at org.apache.catalina.core.StandardWrapperValve.invoke(
StandardWrapperValve.java:213)
at org.apache.catalina.core.StandardContextValve.invoke(
StandardContextValve.java:178)
at org.apache.catalina.core.StandardHostValve.invoke(
StandardHostValve.java:126)
at org.apache.catalina.valves.ErrorReportValve.invoke(
ErrorReportValve.java:105)
at org.apache.catalina.core.StandardEngineValve.invoke(
StandardEngineValve.java:107)
at org.apache.catalina.connector.CoyoteAdapter.service(
CoyoteAdapter.java:148)
at org.apache.coyote.http11.Http11Processor.process(
Http11Processor.java:869)
at
org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection
(Http11BaseProtocol.java:664)
at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(
PoolTcpEndpoint.java:527)
at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(
LeaderFollowerWorkerThread.java:80)
at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(
ThreadPool.java:684)
at java.lang.Thread.run(Thread.java:613)

On 7/27/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:


I haven't been following the thread, but
Not sure if you are using Tomcat or Jetty, but Jetty has a POST size limit
(set somewhere in its configs) that may be the source of the problem.

Otis
P.S.
Just occurred to me.
Tomcat.  Jetty.  Tom & Jerry.  Jetty guys should have called their thing
Jerry or Jerrymouse.

- Original Message 
From: Mike Klaas <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Thursday, July 27, 2006 6:33:16 PM
Subject: Re: Doc add limit

Hi Sangraal:

Sorry--I tried not to imply that this might affect your issue.  You
may have to crank up the solr logging to determine where it is
freezing (and what might be happening).

It is certainly worth investigating why this occurs, but I wonder
about the advantages of using such huge batches.  Assuming a few
hundred bytes per document, 6100 docs produces a POST over 1MB in size.

Re: Doc add limit

2006-07-27 Thread Otis Gospodnetic
I haven't been following the thread, but
Not sure if you are using Tomcat or Jetty, but Jetty has a POST size limit (set 
somewhere in its configs) that may be the source of the problem.

Otis
P.S.
Just occurred to me.
Tomcat.  Jetty.  Tom & Jerry.  Jetty guys should have called their thing Jerry 
or Jerrymouse.

- Original Message 
From: Mike Klaas <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Thursday, July 27, 2006 6:33:16 PM
Subject: Re: Doc add limit

Hi Sangraal:

Sorry--I tried not to imply that this might affect your issue.  You
may have to crank up the solr logging to determine where it is
freezing (and what might be happening).

It is certainly worth investigating why this occurs, but I wonder
about the advantages of using such huge batches.  Assuming a few
hundred bytes per document, 6100 docs produces a POST over 1MB in
size.

-Mike

On 7/27/06, sangraal aiken <[EMAIL PROTECTED]> wrote:
> Mike,
>  I've been posting with the content type set like this:
>   conn.setRequestProperty("Content-Type", "application/octet-stream");
>
> I tried your suggestion though, and unfortunately there was no change.
>   conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
>
> -Sangraal
>
>
> On 7/27/06, Mike Klaas <[EMAIL PROTECTED]> wrote:
> >
> > On 7/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> >
> > > class SolrConnection:
> > >   def __init__(self, host='localhost:8983', solrBase='/solr'):
> > > self.host = host
> > > self.solrBase = solrBase
> > > #a connection to the server is not opened at this point.
> > > self.conn = httplib.HTTPConnection(self.host)
> > > #self.conn.set_debuglevel(100)
> > > self.postheaders = {"Connection":"close"}
> > >
> > >   def doUpdateXML(self, request):
> > > try:
> > >   self.conn.request('POST', self.solrBase+'/update', request,
> > > self.postheaders)
> >
> > Digressive note: I'm not sure if it is necessary with tomcat, but in
> > my experience driving solr with python using Jetty, it was necessary
> > to specify the content-type when posting utf-8 data:
> >
> > self.postheaders.update({'Content-Type': 'text/xml; charset=utf-8'})
> >
> > -Mike
> >
>
>





Re: Doc add limit

2006-07-27 Thread sangraal aiken

Yeah, I'm closing them.  Here's the method:

-
 private String doUpdate(String sw) {
   StringBuffer updateResult = new StringBuffer();
   try {
 // open connection
 log.info("Connecting to and preparing to post to SolrUpdate
servlet.");
  URL url = new URL("http://localhost:8080/update");
 HttpURLConnection conn = (HttpURLConnection) url.openConnection();
 conn.setRequestMethod("POST");
 conn.setRequestProperty("Content-Type", "application/octet-stream");
 conn.setDoOutput(true);
 conn.setDoInput(true);
 conn.setUseCaches(false);

 // Write to server
 log.info("About to post to SolrUpdate servlet.");
 DataOutputStream output = new DataOutputStream(conn.getOutputStream
());
 output.writeBytes(sw);
 output.flush();
 output.close();
 log.info("Finished posting to SolrUpdate servlet.");

 // Read response
 log.info("Ready to read response.");
 BufferedReader rd = new BufferedReader(new InputStreamReader(
conn.getInputStream()));
 log.info("Got reader");
 String line;
 while ((line = rd.readLine()) != null) {
   log.info("Writing to result...");
   updateResult.append(line);
 }
 rd.close();

 // close connections
 conn.disconnect();

 log.info("Done updating Solr for site" + updateSite);
   } catch (Exception e) {
 e.printStackTrace();
   }

   return updateResult.toString();
 }
}

-Sangraal

On 7/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:


Are you reading the response and closing the connection?  If not, you
are probably running out of socket connections.

-Yonik

On 7/27/06, sangraal aiken <[EMAIL PROTECTED]> wrote:
> Yonik,
> It looks like the problem is with the way I'm posting to the SolrUpdate
> servlet. I am able to use curl to post the data to my tomcat instance
> without a problem. It only fails when I try to handle the http post from
> java... my code is below:
>
>   URL url = new URL("http://localhost:8983/solr/update");
>   HttpURLConnection conn = (HttpURLConnection) url.openConnection();
>   conn.setRequestMethod("POST");
>   conn.setRequestProperty("Content-Type",
"application/octet-stream");
>   conn.setDoOutput(true);
>   conn.setDoInput(true);
>   conn.setUseCaches(false);
>
>   // Write to server
>   log.info("About to post to SolrUpdate servlet.");
>   DataOutputStream output = new DataOutputStream(
conn.getOutputStream
> ());
>   output.writeBytes(sw);
>   output.flush();
>   log.info("Finished posting to SolrUpdate servlet.");
>
> -Sangraal
>
> On 7/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> >
> > On 7/26/06, sangraal aiken <[EMAIL PROTECTED]> wrote:
> > > I removed everything from the Add xml so the docs looked like this:
> > >
> > > <doc>
> > > <field name="id">187880</field>
> > > </doc>
> > > <doc>
> > > <field name="id">187852</field>
> > > </doc>
> > >
> > > and it still hung at 6,144...
> >
> > Maybe you can try the following simple Python client to try and rule
> > out some kind of different client interactions... the attached script
> > adds 10,000 documents and works fine for me in WinXP w/ Tomcat 5.5.17
> > and Jetty
> >
> > -Yonik
> >
> >
> >  solr.py --
> > import httplib
> > import socket
> >
> > class SolrConnection:
> >   def __init__(self, host='localhost:8983', solrBase='/solr'):
> > self.host = host
> > self.solrBase = solrBase
> > #a connection to the server is not opened at this point.
> > self.conn = httplib.HTTPConnection(self.host)
> > #self.conn.set_debuglevel(100)
> > self.postheaders = {"Connection":"close"}
> >
> >   def doUpdateXML(self, request):
> > try:
> >   self.conn.request('POST', self.solrBase+'/update', request,
> > self.postheaders)
> > except (socket.error,httplib.CannotSendRequest) :
> >   #reconnect in case the connection was broken from the server
going
> > down,
> >   #the server timing out our persistent connection, or another
> >   #network failure.
> >   #Also catch httplib.CannotSendRequest because the HTTPConnection
> > object
> >   #can get in a bad state.
> >   self.conn.close()
> >   self.conn.connect()
> >   self.conn.request('POST', self.solrBase+'/update', request,
> > self.postheaders)
> >
> > rsp = self.conn.getresponse()
> > #print rsp.status, rsp.reason
> > data = rsp.read()
> > #print "data=",data
> > self.conn.close()
> >
> >   def delete(self, id):
> > xstr = '<delete><id>'+id+'</id></delete>'
> > self.doUpdateXML(xstr)
> >
> >   def add(self, **fields):
> > #todo: XML escaping
> > flist=['<field name="%s">%s</field>' % f for f in fields.items() ]
> > flist.insert(0,'<add><doc>')
> > flist.append('</doc></add>')
> > xstr = ''.join(flist)
> > self.doUpdateXML(xstr)
> >
> > c = SolrConnection()
> > #for i in range(10000):
> > #  c.delete(str(i))
> > for i in range(10000):
> >   c.add(id=i)



Re: Doc add limit

2006-07-27 Thread Yonik Seeley

Are you reading the response and closing the connection?  If not, you
are probably running out of socket connections.

-Yonik

On 7/27/06, sangraal aiken <[EMAIL PROTECTED]> wrote:

Yonik,
It looks like the problem is with the way I'm posting to the SolrUpdate
servlet. I am able to use curl to post the data to my tomcat instance
without a problem. It only fails when I try to handle the http post from
java... my code is below:

  URL url = new URL("http://localhost:8983/solr/update");
  HttpURLConnection conn = (HttpURLConnection) url.openConnection();
  conn.setRequestMethod("POST");
  conn.setRequestProperty("Content-Type", "application/octet-stream");
  conn.setDoOutput(true);
  conn.setDoInput(true);
  conn.setUseCaches(false);

  // Write to server
  log.info("About to post to SolrUpdate servlet.");
  DataOutputStream output = new DataOutputStream(conn.getOutputStream
());
  output.writeBytes(sw);
  output.flush();
  log.info("Finished posting to SolrUpdate servlet.");

-Sangraal

On 7/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>
> On 7/26/06, sangraal aiken <[EMAIL PROTECTED]> wrote:
> > I removed everything from the Add xml so the docs looked like this:
> >
> > <doc>
> > <field name="id">187880</field>
> > </doc>
> > <doc>
> > <field name="id">187852</field>
> > </doc>
> >
> > and it still hung at 6,144...
>
> Maybe you can try the following simple Python client to try and rule
> out some kind of different client interactions... the attached script
> adds 10,000 documents and works fine for me in WinXP w/ Tomcat 5.5.17
> and Jetty
>
> -Yonik
>
>
>  solr.py --
> import httplib
> import socket
>
> class SolrConnection:
>   def __init__(self, host='localhost:8983', solrBase='/solr'):
> self.host = host
> self.solrBase = solrBase
> #a connection to the server is not opened at this point.
> self.conn = httplib.HTTPConnection(self.host)
> #self.conn.set_debuglevel(100)
> self.postheaders = {"Connection":"close"}
>
>   def doUpdateXML(self, request):
> try:
>   self.conn.request('POST', self.solrBase+'/update', request,
> self.postheaders)
> except (socket.error,httplib.CannotSendRequest) :
>   #reconnect in case the connection was broken from the server going
> down,
>   #the server timing out our persistent connection, or another
>   #network failure.
>   #Also catch httplib.CannotSendRequest because the HTTPConnection
> object
>   #can get in a bad state.
>   self.conn.close()
>   self.conn.connect()
>   self.conn.request('POST', self.solrBase+'/update', request,
> self.postheaders)
>
> rsp = self.conn.getresponse()
> #print rsp.status, rsp.reason
> data = rsp.read()
> #print "data=",data
> self.conn.close()
>
>   def delete(self, id):
> xstr = '<delete><id>'+id+'</id></delete>'
> self.doUpdateXML(xstr)
>
>   def add(self, **fields):
> #todo: XML escaping
> flist=['<field name="%s">%s</field>' % f for f in fields.items() ]
> flist.insert(0,'<add><doc>')
> flist.append('</doc></add>')
> xstr = ''.join(flist)
> self.doUpdateXML(xstr)
>
> c = SolrConnection()
> #for i in range(10000):
> #  c.delete(str(i))
> for i in range(10000):
>   c.add(id=i)


Re: Doc add limit

2006-07-27 Thread sangraal aiken

I think you're right... I will probably work on splitting the batches up
into smaller pieces at some point in the future. I think I will need the
capability to do large batches at some point though, so I want to make sure
the system can handle it. I also want to make sure this problem doesn't pop
up and bite me later.
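
(A rough sketch of what that kind of batch splitting could look like on the client side, for anyone following the thread. Everything here is illustrative rather than code from this thread: the BatchedUpdater class name, the 1000-docs-per-batch figure, and the localhost:8080/update URL are assumptions; the POST itself is the same HttpURLConnection pattern as the doUpdate() method posted earlier.)

import java.io.InputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;

public class BatchedUpdater {

  private static final int BATCH_SIZE = 1000; // stay well under the ~5,000-6,000 doc hang point

  // docs: each entry is one already-escaped <doc>...</doc> fragment
  public void addInBatches(List<String> docs) throws Exception {
    StringBuilder batch = new StringBuilder("<add>");
    int count = 0;
    for (String doc : docs) {
      batch.append(doc);
      if (++count == BATCH_SIZE) {
        batch.append("</add>");
        post(batch.toString());          // one POST per batch (a commit per batch isn't required)
        batch = new StringBuilder("<add>");
        count = 0;
      }
    }
    if (count > 0) {
      batch.append("</add>");
      post(batch.toString());
    }
    post("<commit/>");                   // a single commit once everything is in
  }

  private void post(String xml) throws Exception {
    URL url = new URL("http://localhost:8080/update");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("POST");
    conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
    conn.setDoOutput(true);
    Writer out = new OutputStreamWriter(conn.getOutputStream(), "UTF-8");
    out.write(xml);
    out.close();
    InputStream in = conn.getInputStream(); // read the response fully...
    while (in.read() != -1) { /* drain */ }
    in.close();
    conn.disconnect();                      // ...then release the connection
  }
}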

-Sangraal

On 7/27/06, Mike Klaas <[EMAIL PROTECTED]> wrote:


Hi Sangraal:

Sorry--I tried not to imply that this might affect your issue.  You
may have to crank up the solr logging to determine where it is
freezing (and what might be happening).

It is certainly worth investigating why this occurs, but I wonder
about the advantages of using such huge batches.  Assuming a few
hundred bytes per document, 6100 docs produces a POST over 1MB in
size.

-Mike

On 7/27/06, sangraal aiken <[EMAIL PROTECTED]> wrote:
> Mike,
>  I've been posting with the content type set like this:
>   conn.setRequestProperty("Content-Type",
"application/octet-stream");
>
> I tried your suggestion though, and unfortunately there was no change.
>   conn.setRequestProperty("Content-Type", "text/xml;
charset=utf-8");
>
> -Sangraal
>
>
> On 7/27/06, Mike Klaas <[EMAIL PROTECTED]> wrote:
> >
> > On 7/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> >
> > > class SolrConnection:
> > >   def __init__(self, host='localhost:8983', solrBase='/solr'):
> > > self.host = host
> > > self.solrBase = solrBase
> > > #a connection to the server is not opened at this point.
> > > self.conn = httplib.HTTPConnection(self.host)
> > > #self.conn.set_debuglevel(100)
> > > self.postheaders = {"Connection":"close"}
> > >
> > >   def doUpdateXML(self, request):
> > > try:
> > >   self.conn.request('POST', self.solrBase+'/update', request,
> > > self.postheaders)
> >
> > Digressive note: I'm not sure if it is necessary with tomcat, but in
> > my experience driving solr with python using Jetty, it was necessary
> > to specify the content-type when posting utf-8 data:
> >
> > self.postheaders.update({'Content-Type': 'text/xml; charset=utf-8'})
> >
> > -Mike
> >
>
>



Re: Doc add limit

2006-07-27 Thread Mike Klaas

Hi Sangraal:

Sorry--I tried not to imply that this might affect your issue.  You
may have to crank up the solr logging to determine where it is
freezing (and what might be happening).

It is certainly worth investigating why this occurs, but I wonder
about the advantages of using such huge batches.  Assuming a few
hundred bytes per document, 6100 docs produces a POST over 1MB in
size.
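
(As a rough check on that estimate: at, say, 200 bytes of XML per document, 6,100 documents come to about 6,100 x 200 = 1,220,000 bytes, roughly 1.2 MB of POST body before any HTTP overhead; at 500 bytes per document it is already around 3 MB.)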

-Mike

On 7/27/06, sangraal aiken <[EMAIL PROTECTED]> wrote:

Mike,
 I've been posting with the content type set like this:
  conn.setRequestProperty("Content-Type", "application/octet-stream");

I tried your suggestion though, and unfortunately there was no change.
  conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");

-Sangraal


On 7/27/06, Mike Klaas <[EMAIL PROTECTED]> wrote:
>
> On 7/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>
> > class SolrConnection:
> >   def __init__(self, host='localhost:8983', solrBase='/solr'):
> > self.host = host
> > self.solrBase = solrBase
> > #a connection to the server is not opened at this point.
> > self.conn = httplib.HTTPConnection(self.host)
> > #self.conn.set_debuglevel(100)
> > self.postheaders = {"Connection":"close"}
> >
> >   def doUpdateXML(self, request):
> > try:
> >   self.conn.request('POST', self.solrBase+'/update', request,
> > self.postheaders)
>
> Digressive note: I'm not sure if it is necessary with tomcat, but in
> my experience driving solr with python using Jetty, it was necessary
> to specify the content-type when posting utf-8 data:
>
> self.postheaders.update({'Content-Type': 'text/xml; charset=utf-8'})
>
> -Mike
>




Re: Doc add limit

2006-07-27 Thread sangraal aiken

Mike,
I've been posting with the content type set like this:
 conn.setRequestProperty("Content-Type", "application/octet-stream");

I tried your suggestion though, and unfortunately there was no change.
 conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
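
(One thing worth noting here, though nothing in this thread identifies it as the cause of the hang: declaring charset=utf-8 in the Content-Type header only lines up with what is sent if the body bytes really are UTF-8. The doUpdate() code posted earlier writes the body with DataOutputStream.writeBytes(), which discards the high byte of every char, so non-ASCII characters are mangled regardless of the header. A minimal sketch of writing the body through a UTF-8 writer instead; the URL and class name are purely illustrative:)

import java.io.InputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.HttpURLConnection;
import java.net.URL;

public class Utf8Post {
  public static void post(String xml) throws Exception {
    URL url = new URL("http://localhost:8080/update");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("POST");
    conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
    conn.setDoOutput(true);
    // encode the XML as real UTF-8 bytes so the header matches the body
    Writer out = new OutputStreamWriter(conn.getOutputStream(), "UTF-8");
    out.write(xml);
    out.close();
    InputStream in = conn.getInputStream();
    while (in.read() != -1) { /* drain the response, as discussed elsewhere in the thread */ }
    in.close();
  }
}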

-Sangraal


On 7/27/06, Mike Klaas <[EMAIL PROTECTED]> wrote:


On 7/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:

> class SolrConnection:
>   def __init__(self, host='localhost:8983', solrBase='/solr'):
> self.host = host
> self.solrBase = solrBase
> #a connection to the server is not opened at this point.
> self.conn = httplib.HTTPConnection(self.host)
> #self.conn.set_debuglevel(100)
> self.postheaders = {"Connection":"close"}
>
>   def doUpdateXML(self, request):
> try:
>   self.conn.request('POST', self.solrBase+'/update', request,
> self.postheaders)

Digressive note: I'm not sure if it is necessary with tomcat, but in
my experience driving solr with python using Jetty, it was necessary
to specify the content-type when posting utf-8 data:

self.postheaders.update({'Content-Type': 'text/xml; charset=utf-8'})

-Mike



Re: Doc add limit

2006-07-27 Thread Mike Klaas

On 7/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:


class SolrConnection:
  def __init__(self, host='localhost:8983', solrBase='/solr'):
self.host = host
self.solrBase = solrBase
#a connection to the server is not opened at this point.
self.conn = httplib.HTTPConnection(self.host)
#self.conn.set_debuglevel(100)
self.postheaders = {"Connection":"close"}

  def doUpdateXML(self, request):
try:
  self.conn.request('POST', self.solrBase+'/update', request,
self.postheaders)


Digressive note: I'm not sure if it is necessary with tomcat, but in
my experience driving solr with python using Jetty, it was necessary
to specify the content-type when posting utf-8 data:

self.postheaders.update({'Content-Type': 'text/xml; charset=utf-8'})

-Mike


Re: Doc add limit

2006-07-27 Thread sangraal aiken

Yonik,
It looks like the problem is with the way I'm posting to the SolrUpdate
servlet. I am able to use curl to post the data to my tomcat instance
without a problem. It only fails when I try to handle the http post from
java... my code is below:

 URL url = new URL("http://localhost:8983/solr/update");
 HttpURLConnection conn = (HttpURLConnection) url.openConnection();
 conn.setRequestMethod("POST");
 conn.setRequestProperty("Content-Type", "application/octet-stream");
 conn.setDoOutput(true);
 conn.setDoInput(true);
 conn.setUseCaches(false);

 // Write to server
 log.info("About to post to SolrUpdate servlet.");
 DataOutputStream output = new DataOutputStream(conn.getOutputStream
());
 output.writeBytes(sw);
 output.flush();
 log.info("Finished posting to SolrUpdate servlet.");

-Sangraal

On 7/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:


On 7/26/06, sangraal aiken <[EMAIL PROTECTED]> wrote:
> I removed everything from the Add xml so the docs looked like this:
>
> <doc>
> <field name="id">187880</field>
> </doc>
> <doc>
> <field name="id">187852</field>
> </doc>
>
> and it still hung at 6,144...

Maybe you can try the following simple Python client to try and rule
out some kind of different client interactions... the attached script
adds 10,000 documents and works fine for me in WinXP w/ Tomcat 5.5.17
and Jetty

-Yonik


 solr.py --
import httplib
import socket

class SolrConnection:
  def __init__(self, host='localhost:8983', solrBase='/solr'):
self.host = host
self.solrBase = solrBase
#a connection to the server is not opened at this point.
self.conn = httplib.HTTPConnection(self.host)
#self.conn.set_debuglevel(100)
self.postheaders = {"Connection":"close"}

  def doUpdateXML(self, request):
try:
  self.conn.request('POST', self.solrBase+'/update', request,
self.postheaders)
except (socket.error,httplib.CannotSendRequest) :
  #reconnect in case the connection was broken from the server going
down,
  #the server timing out our persistent connection, or another
  #network failure.
  #Also catch httplib.CannotSendRequest because the HTTPConnection
object
  #can get in a bad state.
  self.conn.close()
  self.conn.connect()
  self.conn.request('POST', self.solrBase+'/update', request,
self.postheaders)

rsp = self.conn.getresponse()
#print rsp.status, rsp.reason
data = rsp.read()
#print "data=",data
self.conn.close()

  def delete(self, id):
xstr = '<delete><id>'+id+'</id></delete>'
self.doUpdateXML(xstr)

  def add(self, **fields):
#todo: XML escaping
flist=['<field name="%s">%s</field>' % f for f in fields.items() ]
flist.insert(0,'<add><doc>')
flist.append('</doc></add>')
xstr = ''.join(flist)
self.doUpdateXML(xstr)

c = SolrConnection()
#for i in range(10000):
#  c.delete(str(i))
for i in range(10000):
  c.add(id=i)



Re: Doc add limit

2006-07-27 Thread Yonik Seeley

On 7/26/06, sangraal aiken <[EMAIL PROTECTED]> wrote:

I removed everything from the Add xml so the docs looked like this:

<doc>
<field name="id">187880</field>
</doc>
<doc>
<field name="id">187852</field>
</doc>

and it still hung at 6,144...


Maybe you can try the following simple Python client to try and rule
out some kind of different client interactions... the attached script
adds 10,000 documents and works fine for me in WinXP w/ Tomcat 5.5.17
and Jetty

-Yonik


 solr.py --
import httplib
import socket

class SolrConnection:
 def __init__(self, host='localhost:8983', solrBase='/solr'):
   self.host = host
   self.solrBase = solrBase
   #a connection to the server is not opened at this point.
   self.conn = httplib.HTTPConnection(self.host)
   #self.conn.set_debuglevel(100)
   self.postheaders = {"Connection":"close"}

 def doUpdateXML(self, request):
   try:
 self.conn.request('POST', self.solrBase+'/update', request,
self.postheaders)
   except (socket.error,httplib.CannotSendRequest) :
 #reconnect in case the connection was broken from the server going down,
 #the server timing out our persistent connection, or another
 #network failure.
 #Also catch httplib.CannotSendRequest because the HTTPConnection object
 #can get in a bad state.
 self.conn.close()
 self.conn.connect()
 self.conn.request('POST', self.solrBase+'/update', request,
self.postheaders)

   rsp = self.conn.getresponse()
   #print rsp.status, rsp.reason
   data = rsp.read()
   #print "data=",data
   self.conn.close()

 def delete(self, id):
   xstr = '<delete><id>'+id+'</id></delete>'
   self.doUpdateXML(xstr)

 def add(self, **fields):
   #todo: XML escaping
   flist=['<field name="%s">%s</field>' % f for f in fields.items() ]
   flist.insert(0,'<add><doc>')
   flist.append('</doc></add>')
   xstr = ''.join(flist)
   self.doUpdateXML(xstr)

c = SolrConnection()
#for i in range(10000):
#  c.delete(str(i))
for i in range(10000):
 c.add(id=i)


Re: Doc add limit

2006-07-26 Thread sangraal aiken

I removed everything from the Add xml so the docs looked like this:

<doc>
<field name="id">187880</field>
</doc>
<doc>
<field name="id">187852</field>
</doc>

and it still hung at 6,144...

-S


On 7/26/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:


If you narrow the docs down to just the "id" field, does it still
happen at the same place?

-Yonik

On 7/26/06, sangraal aiken <[EMAIL PROTECTED]> wrote:
> I see the problem on Mac OS X/JDK: 1.5.0_06 and Debian/JDK: 1.5.0_07.
>
> I don't think it's a socket problem, because I can initiate additional
> updates while the server is hung... weird I know.
>
> Thanks for all your help, I'll send a post if/when I find a solution.
>
> -S
>
> On 7/26/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> >
> > Tomcat problem, or a Solr problem that is only manifesting on your
> > platform, or a JVM or libc problem, or even a client update problem...
> > (possibly you might be exhausting the number of sockets in the server
> > by using persistent connections with a long timeout and never reusing
> > them?)
> >
> > What is your OS/JVM?
> >
> > -Yonik
> >
> > On 7/26/06, sangraal aiken <[EMAIL PROTECTED]> wrote:
> > > Right now the heap is set to 512M but I've increased it up to 2GB
and
> > yet it
> > > still hangs at the same number 6,144...
> > >
> > > Here's something interesting... I pushed this code over to a
different
> > > server and tried an update. On that server it's hanging on #5,267.
Then
> > > tomcat seems to try to reload the webapp... indefinitely.
> > >
> > > So I guess this is looking more like a tomcat problem more than a
> > > lucene/solr problem huh?
> > >
> > > -Sangraal
> > >
> > > On 7/26/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> > > >
> > > > So it looks like your client is hanging trying to send something
over
> > > > the socket to the server and blocking... probably because Tomcat
isn't
> > > > reading anything from the socket because it's busy trying to
restart
> > > > the webapp.
> > > >
> > > > What is the heap size of the server? try increasing it... maybe
tomcat
> > > > could have detected low memory and tried to reload the webapp.
> > > >
> > > > -Yonik
> > > >
> > > > On 7/26/06, sangraal aiken <[EMAIL PROTECTED]> wrote:
> > > > > Thanks for your help Yonik, I've responded to your questions
below:
> > > > >
> > > > > On 7/26/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > It's possible it's not hanging, but just takes a long time on
a
> > > > > > specific add.  This is because Lucene will occasionally merge
> > > > > > segments.  When very large segments are merged, it can take a
long
> > > > > > time.
> > > > >
> > > > >
> > > > > I've left it running (hung) for up to a half hour at a time and
I've
> > > > > verified that my cpu idles during the hang. I have witnessed
much
> > > > shorter
> > > > > hangs on the ramp up to my 6,144 limit but they have been more
like
> > 2 -
> > > > 10
> > > > > seconds in length. Perhaps this is the Lucene merging you
mentioned.
> > > > >
> > > > > In the log file, add commands are followed by the number of
> > > > > > milliseconds the operation took.  Next time Solr hangs, wait
for a
> > > > > > number of minutes until you see the operation logged and note
how
> > long
> > > > > > it took.
> > > > >
> > > > >
> > > > > Here are the last 5 log entries before the hang the last one is
doc
> > > > #6,144.
> > > > > Also it looks like Tomcat is trying to redeploy the webapp those
> > last
> > > > tomcat
> > > > > entries repeat indefinitely every 10 seconds or so. Perhaps this
is
> > a
> > > > Tomcat
> > > > > problem?
> > > > >
> > > > > Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update
> > > > > INFO: add (id=110705) 0 36596
> > > > > Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update
> > > > > INFO: add (id=110700) 0 36600
> > > > > Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update
> > > > > INFO: add (id=110688) 0 36603
> > > > > Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update
> > > > > INFO: add (id=110690) 0 36608
> > > > > Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update
> > > > > INFO: add (id=110686) 0 36611
> > > > > Jul 26, 2006 1:25:36 PM
> > > > org.apache.catalina.startup.HostConfigcheckResources
> > > > > FINE: Checking context[] redeploy resource /source/solr/apache-
> > > > tomcat-5.5.17
> > > > > /webapps/ROOT
> > > > > Jul 26, 2006 1:25:36 PM
> > > > org.apache.catalina.startup.HostConfigcheckResources
> > > > > FINE: Checking context[] redeploy resource /source/solr/apache-
> > > > tomcat-5.5.17
> > > > > /webapps/ROOT/META-INF/context.xml
> > > > > Jul 26, 2006 1:25:36 PM
> > > > org.apache.catalina.startup.HostConfigcheckResources
> > > > > FINE: Checking context[] reload resource /source/solr/apache-
> > > > tomcat-5.5.17
> > > > > /webapps/ROOT/WEB-INF/web.xml
> > > > > Jul 26, 2006 1:25:36 PM
> > > > org.apache.catalina.startup.HostConfigcheckResources
> > > > > FINE: Checking context[] reload resource /source/solr/apache-
> > > > tomcat-5.5.17
> > > > > /webapps/ROOT/META-INF/context.x

Re: Doc add limit

2006-07-26 Thread Yonik Seeley

If you narrow the docs down to just the "id" field, does it still
happen at the same place?

-Yonik

On 7/26/06, sangraal aiken <[EMAIL PROTECTED]> wrote:

I see the problem on Mac OS X/JDK: 1.5.0_06 and Debian/JDK: 1.5.0_07.

I don't think it's a socket problem, because I can initiate additional
updates while the server is hung... weird I know.

Thanks for all your help, I'll send a post if/when I find a solution.

-S

On 7/26/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>
> Tomcat problem, or a Solr problem that is only manifesting on your
> platform, or a JVM or libc problem, or even a client update problem...
> (possibly you might be exhausting the number of sockets in the server
> by using persistent connections with a long timeout and never reusing
> them?)
>
> What is your OS/JVM?
>
> -Yonik
>
> On 7/26/06, sangraal aiken <[EMAIL PROTECTED]> wrote:
> > Right now the heap is set to 512M but I've increased it up to 2GB and
> yet it
> > still hangs at the same number 6,144...
> >
> > Here's something interesting... I pushed this code over to a different
> > server and tried an update. On that server it's hanging on #5,267. Then
> > tomcat seems to try to reload the webapp... indefinitely.
> >
> > So I guess this is looking more like a tomcat problem more than a
> > lucene/solr problem huh?
> >
> > -Sangraal
> >
> > On 7/26/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> > >
> > > So it looks like your client is hanging trying to send something over
> > > the socket to the server and blocking... probably because Tomcat isn't
> > > reading anything from the socket because it's busy trying to restart
> > > the webapp.
> > >
> > > What is the heap size of the server? try increasing it... maybe tomcat
> > > could have detected low memory and tried to reload the webapp.
> > >
> > > -Yonik
> > >
> > > On 7/26/06, sangraal aiken <[EMAIL PROTECTED]> wrote:
> > > > Thanks for your help Yonik, I've responded to your questions below:
> > > >
> > > > On 7/26/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > It's possible it's not hanging, but just takes a long time on a
> > > > > specific add.  This is because Lucene will occasionally merge
> > > > > segments.  When very large segments are merged, it can take a long
> > > > > time.
> > > >
> > > >
> > > > I've left it running (hung) for up to a half hour at a time and I've
> > > > verified that my cpu idles during the hang. I have witnessed much
> > > shorter
> > > > hangs on the ramp up to my 6,144 limit but they have been more like
> 2 -
> > > 10
> > > > seconds in length. Perhaps this is the Lucene merging you mentioned.
> > > >
> > > > In the log file, add commands are followed by the number of
> > > > > milliseconds the operation took.  Next time Solr hangs, wait for a
> > > > > number of minutes until you see the operation logged and note how
> long
> > > > > it took.
> > > >
> > > >
> > > > Here are the last 5 log entries before the hang the last one is doc
> > > #6,144.
> > > > Also it looks like Tomcat is trying to redeploy the webapp those
> last
> > > tomcat
> > > > entries repeat indefinitely every 10 seconds or so. Perhaps this is
> a
> > > Tomcat
> > > > problem?
> > > >
> > > > Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update
> > > > INFO: add (id=110705) 0 36596
> > > > Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update
> > > > INFO: add (id=110700) 0 36600
> > > > Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update
> > > > INFO: add (id=110688) 0 36603
> > > > Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update
> > > > INFO: add (id=110690) 0 36608
> > > > Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update
> > > > INFO: add (id=110686) 0 36611
> > > > Jul 26, 2006 1:25:36 PM
> > > org.apache.catalina.startup.HostConfigcheckResources
> > > > FINE: Checking context[] redeploy resource /source/solr/apache-
> > > tomcat-5.5.17
> > > > /webapps/ROOT
> > > > Jul 26, 2006 1:25:36 PM
> > > org.apache.catalina.startup.HostConfigcheckResources
> > > > FINE: Checking context[] redeploy resource /source/solr/apache-
> > > tomcat-5.5.17
> > > > /webapps/ROOT/META-INF/context.xml
> > > > Jul 26, 2006 1:25:36 PM
> > > org.apache.catalina.startup.HostConfigcheckResources
> > > > FINE: Checking context[] reload resource /source/solr/apache-
> > > tomcat-5.5.17
> > > > /webapps/ROOT/WEB-INF/web.xml
> > > > Jul 26, 2006 1:25:36 PM
> > > org.apache.catalina.startup.HostConfigcheckResources
> > > > FINE: Checking context[] reload resource /source/solr/apache-
> > > tomcat-5.5.17
> > > > /webapps/ROOT/META-INF/context.xml
> > > > Jul 26, 2006 1:25:36 PM
> > > org.apache.catalina.startup.HostConfigcheckResources
> > > > FINE: Checking context[] reload resource /source/solr/apache-
> > > tomcat-5.5.17
> > > > /conf/context.xml
> > > >
> > > > How many documents are in the index before you do a batch that
> causes
> > > > > a hang?  Does it happen on the first batch?  If so, you might be
> > > > > seeing some othe

Re: Doc add limit

2006-07-26 Thread sangraal aiken

I see the problem on Mac OS X/JDK: 1.5.0_06 and Debian/JDK: 1.5.0_07.

I don't think it's a socket problem, because I can initiate additional
updates while the server is hung... weird I know.

Thanks for all your help, I'll send a post if/when I find a solution.

-S

On 7/26/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:


Tomcat problem, or a Solr problem that is only manifesting on your
platform, or a JVM or libc problem, or even a client update problem...
(possibly you might be exhausting the number of sockets in the server
by using persistent connections with a long timeout and never reusing
them?)

What is your OS/JVM?

-Yonik

On 7/26/06, sangraal aiken <[EMAIL PROTECTED]> wrote:
> Right now the heap is set to 512M but I've increased it up to 2GB and
yet it
> still hangs at the same number 6,144...
>
> Here's something interesting... I pushed this code over to a different
> server and tried an update. On that server it's hanging on #5,267. Then
> tomcat seems to try to reload the webapp... indefinitely.
>
> So I guess this is looking more like a tomcat problem more than a
> lucene/solr problem huh?
>
> -Sangraal
>
> On 7/26/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> >
> > So it looks like your client is hanging trying to send something over
> > the socket to the server and blocking... probably because Tomcat isn't
> > reading anything from the socket because it's busy trying to restart
> > the webapp.
> >
> > What is the heap size of the server? try increasing it... maybe tomcat
> > could have detected low memory and tried to reload the webapp.
> >
> > -Yonik
> >
> > On 7/26/06, sangraal aiken <[EMAIL PROTECTED]> wrote:
> > > Thanks for your help Yonik, I've responded to your questions below:
> > >
> > > On 7/26/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> > > >
> > > > It's possible it's not hanging, but just takes a long time on a
> > > > specific add.  This is because Lucene will occasionally merge
> > > > segments.  When very large segments are merged, it can take a long
> > > > time.
> > >
> > >
> > > I've left it running (hung) for up to a half hour at a time and I've
> > > verified that my cpu idles during the hang. I have witnessed much
> > shorter
> > > hangs on the ramp up to my 6,144 limit but they have been more like
2 -
> > 10
> > > seconds in length. Perhaps this is the Lucene merging you mentioned.
> > >
> > > In the log file, add commands are followed by the number of
> > > > milliseconds the operation took.  Next time Solr hangs, wait for a
> > > > number of minutes until you see the operation logged and note how
long
> > > > it took.
> > >
> > >
> > > Here are the last 5 log entries before the hang the last one is doc
> > #6,144.
> > > Also it looks like Tomcat is trying to redeploy the webapp those
last
> > tomcat
> > > entries repeat indefinitely every 10 seconds or so. Perhaps this is
a
> > Tomcat
> > > problem?
> > >
> > > Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update
> > > INFO: add (id=110705) 0 36596
> > > Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update
> > > INFO: add (id=110700) 0 36600
> > > Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update
> > > INFO: add (id=110688) 0 36603
> > > Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update
> > > INFO: add (id=110690) 0 36608
> > > Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update
> > > INFO: add (id=110686) 0 36611
> > > Jul 26, 2006 1:25:36 PM
> > org.apache.catalina.startup.HostConfigcheckResources
> > > FINE: Checking context[] redeploy resource /source/solr/apache-
> > tomcat-5.5.17
> > > /webapps/ROOT
> > > Jul 26, 2006 1:25:36 PM
> > org.apache.catalina.startup.HostConfigcheckResources
> > > FINE: Checking context[] redeploy resource /source/solr/apache-
> > tomcat-5.5.17
> > > /webapps/ROOT/META-INF/context.xml
> > > Jul 26, 2006 1:25:36 PM
> > org.apache.catalina.startup.HostConfigcheckResources
> > > FINE: Checking context[] reload resource /source/solr/apache-
> > tomcat-5.5.17
> > > /webapps/ROOT/WEB-INF/web.xml
> > > Jul 26, 2006 1:25:36 PM
> > org.apache.catalina.startup.HostConfigcheckResources
> > > FINE: Checking context[] reload resource /source/solr/apache-
> > tomcat-5.5.17
> > > /webapps/ROOT/META-INF/context.xml
> > > Jul 26, 2006 1:25:36 PM
> > org.apache.catalina.startup.HostConfigcheckResources
> > > FINE: Checking context[] reload resource /source/solr/apache-
> > tomcat-5.5.17
> > > /conf/context.xml
> > >
> > > How many documents are in the index before you do a batch that
causes
> > > > a hang?  Does it happen on the first batch?  If so, you might be
> > > > seeing some other bug.  What appserver are you using?  Do the
admin
> > > > pages respond when you see this hang?  If so, what does a stack
trace
> > > > look like?
> > >
> > >
> > > I actually don't think I had the problem on the first batch, in fact
my
> > > first batch contained very close to 6,144 documents so perhaps there
is
> > a
> > > relation there. Right now, I'm adding to an inde

Re: Doc add limit

2006-07-26 Thread Yonik Seeley

Tomcat problem, or a Solr problem that is only manifesting on your
platform, or a JVM or libc problem, or even a client update problem...
(possibly you might be exhausting the number of sockets in the server
by using persistent connections with a long timeout and never reusing
them?)
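
(If that turns out to be the issue, one way to rule it out from the Java side is to do what the Python client elsewhere in this thread does with its {"Connection":"close"} header: ask for the socket to be closed after each request and explicitly disconnect once the response has been read. Sketch only; the URL and class name are made up for illustration, not taken from this thread.)

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class OneShotPost {
  public static void post(byte[] body) throws Exception {
    HttpURLConnection conn =
        (HttpURLConnection) new URL("http://localhost:8080/update").openConnection();
    conn.setRequestMethod("POST");
    conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
    conn.setRequestProperty("Connection", "close"); // don't hold a persistent connection open
    conn.setDoOutput(true);
    conn.getOutputStream().write(body);
    conn.getOutputStream().close();
    InputStream in = conn.getInputStream();
    while (in.read() != -1) { /* drain the response */ }
    in.close();
    conn.disconnect(); // release the underlying socket
  }
}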

What is your OS/JVM?

-Yonik

On 7/26/06, sangraal aiken <[EMAIL PROTECTED]> wrote:

Right now the heap is set to 512M but I've increased it up to 2GB and yet it
still hangs at the same number 6,144...

Here's something interesting... I pushed this code over to a different
server and tried an update. On that server it's hanging on #5,267. Then
tomcat seems to try to reload the webapp... indefinitely.

So I guess this is looking more like a tomcat problem more than a
lucene/solr problem huh?

-Sangraal

On 7/26/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>
> So it looks like your client is hanging trying to send something over
> the socket to the server and blocking... probably because Tomcat isn't
> reading anything from the socket because it's busy trying to restart
> the webapp.
>
> What is the heap size of the server? try increasing it... maybe tomcat
> could have detected low memory and tried to reload the webapp.
>
> -Yonik
>
> On 7/26/06, sangraal aiken <[EMAIL PROTECTED]> wrote:
> > Thanks for your help Yonik, I've responded to your questions below:
> >
> > On 7/26/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> > >
> > > It's possible it's not hanging, but just takes a long time on a
> > > specific add.  This is because Lucene will occasionally merge
> > > segments.  When very large segments are merged, it can take a long
> > > time.
> >
> >
> > I've left it running (hung) for up to a half hour at a time and I've
> > verified that my cpu idles during the hang. I have witnessed much
> shorter
> > hangs on the ramp up to my 6,144 limit but they have been more like 2 -
> 10
> > seconds in length. Perhaps this is the Lucene merging you mentioned.
> >
> > In the log file, add commands are followed by the number of
> > > milliseconds the operation took.  Next time Solr hangs, wait for a
> > > number of minutes until you see the operation logged and note how long
> > > it took.
> >
> >
> > Here are the last 5 log entries before the hang the last one is doc
> #6,144.
> > Also it looks like Tomcat is trying to redeploy the webapp those last
> tomcat
> > entries repeat indefinitely every 10 seconds or so. Perhaps this is a
> Tomcat
> > problem?
> >
> > Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update
> > INFO: add (id=110705) 0 36596
> > Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update
> > INFO: add (id=110700) 0 36600
> > Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update
> > INFO: add (id=110688) 0 36603
> > Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update
> > INFO: add (id=110690) 0 36608
> > Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update
> > INFO: add (id=110686) 0 36611
> > Jul 26, 2006 1:25:36 PM
> org.apache.catalina.startup.HostConfigcheckResources
> > FINE: Checking context[] redeploy resource /source/solr/apache-
> tomcat-5.5.17
> > /webapps/ROOT
> > Jul 26, 2006 1:25:36 PM
> org.apache.catalina.startup.HostConfigcheckResources
> > FINE: Checking context[] redeploy resource /source/solr/apache-
> tomcat-5.5.17
> > /webapps/ROOT/META-INF/context.xml
> > Jul 26, 2006 1:25:36 PM
> org.apache.catalina.startup.HostConfigcheckResources
> > FINE: Checking context[] reload resource /source/solr/apache-
> tomcat-5.5.17
> > /webapps/ROOT/WEB-INF/web.xml
> > Jul 26, 2006 1:25:36 PM
> org.apache.catalina.startup.HostConfigcheckResources
> > FINE: Checking context[] reload resource /source/solr/apache-
> tomcat-5.5.17
> > /webapps/ROOT/META-INF/context.xml
> > Jul 26, 2006 1:25:36 PM
> org.apache.catalina.startup.HostConfigcheckResources
> > FINE: Checking context[] reload resource /source/solr/apache-
> tomcat-5.5.17
> > /conf/context.xml
> >
> > How many documents are in the index before you do a batch that causes
> > > a hang?  Does it happen on the first batch?  If so, you might be
> > > seeing some other bug.  What appserver are you using?  Do the admin
> > > pages respond when you see this hang?  If so, what does a stack trace
> > > look like?
> >
> >
> > I actually don't think I had the problem on the first batch, in fact my
> > first batch contained very close to 6,144 documents so perhaps there is
> a
> > relation there. Right now, I'm adding to an index with close to 90,000
> > documents in it.
> > I'm running Tomcat 5.5.17 and the admin pages respond just fine when
> it's
> > hung... I did a thread dump and this is the trace of my update:
> >
> > "http-8080-Processor25" Id=33 in RUNNABLE (running in native) total cpu
> > time=6330.7360ms user time=5769.5920ms
> >  at java.net.SocketOutputStream.socketWrite0(Native Method)
> >  at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java
> :92)
> >  at java.net.SocketOutputStream.write(SocketOutpu

Re: Doc add limit

2006-07-26 Thread sangraal aiken

Right now the heap is set to 512M but I've increased it up to 2GB and yet it
still hangs at the same number 6,144...

Here's something interesting... I pushed this code over to a different
server and tried an update. On that server it's hanging on #5,267. Then
tomcat seems to try to reload the webapp... indefinitely.

So I guess this is looking more like a tomcat problem more than a
lucene/solr problem huh?

-Sangraal

On 7/26/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:


So it looks like your client is hanging trying to send something over
the socket to the server and blocking... probably because Tomcat isn't
reading anything from the socket because it's busy trying to restart
the webapp.

What is the heap size of the server? try increasing it... maybe tomcat
could have detected low memory and tried to reload the webapp.

-Yonik

On 7/26/06, sangraal aiken <[EMAIL PROTECTED]> wrote:
> Thanks for your help Yonik, I've responded to your questions below:
>
> On 7/26/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> >
> > It's possible it's not hanging, but just takes a long time on a
> > specific add.  This is because Lucene will occasionally merge
> > segments.  When very large segments are merged, it can take a long
> > time.
>
>
> I've left it running (hung) for up to a half hour at a time and I've
> verified that my cpu idles during the hang. I have witnessed much
shorter
> hangs on the ramp up to my 6,144 limit but they have been more like 2 -
10
> seconds in length. Perhaps this is the Lucene merging you mentioned.
>
> In the log file, add commands are followed by the number of
> > milliseconds the operation took.  Next time Solr hangs, wait for a
> > number of minutes until you see the operation logged and note how long
> > it took.
>
>
> Here are the last 5 log entries before the hang the last one is doc
#6,144.
> Also it looks like Tomcat is trying to redeploy the webapp those last
tomcat
> entries repeat indefinitely every 10 seconds or so. Perhaps this is a
Tomcat
> problem?
>
> Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update
> INFO: add (id=110705) 0 36596
> Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update
> INFO: add (id=110700) 0 36600
> Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update
> INFO: add (id=110688) 0 36603
> Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update
> INFO: add (id=110690) 0 36608
> Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update
> INFO: add (id=110686) 0 36611
> Jul 26, 2006 1:25:36 PM
org.apache.catalina.startup.HostConfigcheckResources
> FINE: Checking context[] redeploy resource /source/solr/apache-
tomcat-5.5.17
> /webapps/ROOT
> Jul 26, 2006 1:25:36 PM
org.apache.catalina.startup.HostConfigcheckResources
> FINE: Checking context[] redeploy resource /source/solr/apache-
tomcat-5.5.17
> /webapps/ROOT/META-INF/context.xml
> Jul 26, 2006 1:25:36 PM
org.apache.catalina.startup.HostConfigcheckResources
> FINE: Checking context[] reload resource /source/solr/apache-
tomcat-5.5.17
> /webapps/ROOT/WEB-INF/web.xml
> Jul 26, 2006 1:25:36 PM
org.apache.catalina.startup.HostConfigcheckResources
> FINE: Checking context[] reload resource /source/solr/apache-
tomcat-5.5.17
> /webapps/ROOT/META-INF/context.xml
> Jul 26, 2006 1:25:36 PM
org.apache.catalina.startup.HostConfigcheckResources
> FINE: Checking context[] reload resource /source/solr/apache-
tomcat-5.5.17
> /conf/context.xml
>
> How many documents are in the index before you do a batch that causes
> > a hang?  Does it happen on the first batch?  If so, you might be
> > seeing some other bug.  What appserver are you using?  Do the admin
> > pages respond when you see this hang?  If so, what does a stack trace
> > look like?
>
>
> I actually don't think I had the problem on the first batch, in fact my
> first batch contained very close to 6,144 documents so perhaps there is
a
> relation there. Right now, I'm adding to an index with close to 90,000
> documents in it.
> I'm running Tomcat 5.5.17 and the admin pages respond just fine when
it's
> hung... I did a thread dump and this is the trace of my update:
>
> "http-8080-Processor25" Id=33 in RUNNABLE (running in native) total cpu
> time=6330.7360ms user time=5769.5920ms
>  at java.net.SocketOutputStream.socketWrite0(Native Method)
>  at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java
:92)
>  at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>  at java.io.BufferedOutputStream.write(BufferedOutputStream.java
:105)
>  at java.io.PrintStream.write(PrintStream.java:412)
>  at java.io.ByteArrayOutputStream.writeTo(ByteArrayOutputStream.java
> :112)
>  at sun.net.www.http.HttpClient.writeRequests(HttpClient.java:533)
>  at sun.net.www.protocol.http.HttpURLConnection.writeRequests(
> HttpURLConnection.java:410)
>  at sun.net.www.protocol.http.HttpURLConnection.getInputStream(
> HttpURLConnection.java:934)
>  at com.gawker.solr.update.GanjaUpdate.doUpdate(GanjaU

Re: Doc add limit

2006-07-26 Thread Yonik Seeley

So it looks like your client is hanging trying to send something over
the socket to the server and blocking... probably because Tomcat isn't
reading anything from the socket because it's busy trying to restart
the webapp.

What is the heap size of the server? try increasing it... maybe tomcat
could have detected low memory and tried to reload the webapp.

-Yonik

On 7/26/06, sangraal aiken <[EMAIL PROTECTED]> wrote:

Thanks for your help Yonik, I've responded to your questions below:

On 7/26/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>
> It's possible it's not hanging, but just takes a long time on a
> specific add.  This is because Lucene will occasionally merge
> segments.  When very large segments are merged, it can take a long
> time.


I've left it running (hung) for up to a half hour at a time and I've
verified that my cpu idles during the hang. I have witnessed much shorter
hangs on the ramp up to my 6,144 limit but they have been more like 2 - 10
seconds in length. Perhaps this is the Lucene merging you mentioned.

In the log file, add commands are followed by the number of
> milliseconds the operation took.  Next time Solr hangs, wait for a
> number of minutes until you see the operation logged and note how long
> it took.


Here are the last 5 log entries before the hang the last one is doc #6,144.
Also it looks like Tomcat is trying to redeploy the webapp those last tomcat
entries repeat indefinitely every 10 seconds or so. Perhaps this is a Tomcat
problem?

Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update
INFO: add (id=110705) 0 36596
Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update
INFO: add (id=110700) 0 36600
Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update
INFO: add (id=110688) 0 36603
Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update
INFO: add (id=110690) 0 36608
Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update
INFO: add (id=110686) 0 36611
Jul 26, 2006 1:25:36 PM org.apache.catalina.startup.HostConfigcheckResources
FINE: Checking context[] redeploy resource /source/solr/apache-tomcat-5.5.17
/webapps/ROOT
Jul 26, 2006 1:25:36 PM org.apache.catalina.startup.HostConfigcheckResources
FINE: Checking context[] redeploy resource /source/solr/apache-tomcat-5.5.17
/webapps/ROOT/META-INF/context.xml
Jul 26, 2006 1:25:36 PM org.apache.catalina.startup.HostConfigcheckResources
FINE: Checking context[] reload resource /source/solr/apache-tomcat-5.5.17
/webapps/ROOT/WEB-INF/web.xml
Jul 26, 2006 1:25:36 PM org.apache.catalina.startup.HostConfigcheckResources
FINE: Checking context[] reload resource /source/solr/apache-tomcat-5.5.17
/webapps/ROOT/META-INF/context.xml
Jul 26, 2006 1:25:36 PM org.apache.catalina.startup.HostConfigcheckResources
FINE: Checking context[] reload resource /source/solr/apache-tomcat-5.5.17
/conf/context.xml

How many documents are in the index before you do a batch that causes
> a hang?  Does it happen on the first batch?  If so, you might be
> seeing some other bug.  What appserver are you using?  Do the admin
> pages respond when you see this hang?  If so, what does a stack trace
> look like?


I actually don't think I had the problem on the first batch, in fact my
first batch contained very close to 6,144 documents so perhaps there is a
relation there. Right now, I'm adding to an index with close to 90,000
documents in it.
I'm running Tomcat 5.5.17 and the admin pages respond just fine when it's
hung... I did a thread dump and this is the trace of my update:

"http-8080-Processor25" Id=33 in RUNNABLE (running in native) total cpu
time=6330.7360ms user time=5769.5920ms
 at java.net.SocketOutputStream.socketWrite0(Native Method)
 at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
 at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
 at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
 at java.io.PrintStream.write(PrintStream.java:412)
 at java.io.ByteArrayOutputStream.writeTo(ByteArrayOutputStream.java
:112)
 at sun.net.www.http.HttpClient.writeRequests(HttpClient.java:533)
 at sun.net.www.protocol.http.HttpURLConnection.writeRequests(
HttpURLConnection.java:410)
 at sun.net.www.protocol.http.HttpURLConnection.getInputStream(
HttpURLConnection.java:934)
 at com.gawker.solr.update.GanjaUpdate.doUpdate(GanjaUpdate.java:169)
 at com.gawker.solr.update.GanjaUpdate.update(GanjaUpdate.java:62)
 at org.apache.jsp.update_jsp._jspService(update_jsp.java:57)
 at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
 at org.apache.jasper.servlet.JspServletWrapper.service(
JspServletWrapper.java:332)
 at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java
:314)
 at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:264)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
 at org.apache.cata

Re: Doc add limit

2006-07-26 Thread sangraal aiken

Thanks for your help Yonik, I've responded to your questions below:

On 7/26/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:


It's possible it's not hanging, but just takes a long time on a
specific add.  This is because Lucene will occasionally merge
segments.  When very large segments are merged, it can take a long
time.



I've left it running (hung) for up to a half hour at a time and I've
verified that my cpu idles during the hang. I have witnessed much shorter
hangs on the ramp up to my 6,144 limit but they have been more like 2 - 10
seconds in length. Perhaps this is the Lucene merging you mentioned.

In the log file, add commands are followed by the number of
milliseconds the operation took.  Next time Solr hangs, wait for a
number of minutes until you see the operation logged and note how long
it took.



Here are the last 5 log entries before the hang; the last one is doc #6,144.
It also looks like Tomcat is trying to redeploy the webapp: those last Tomcat
entries repeat indefinitely, every 10 seconds or so. Perhaps this is a Tomcat
problem?

Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update
INFO: add (id=110705) 0 36596
Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update
INFO: add (id=110700) 0 36600
Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update
INFO: add (id=110688) 0 36603
Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update
INFO: add (id=110690) 0 36608
Jul 26, 2006 1:25:28 PM org.apache.solr.core.SolrCore update
INFO: add (id=110686) 0 36611
Jul 26, 2006 1:25:36 PM org.apache.catalina.startup.HostConfig checkResources
FINE: Checking context[] redeploy resource /source/solr/apache-tomcat-5.5.17/webapps/ROOT
Jul 26, 2006 1:25:36 PM org.apache.catalina.startup.HostConfig checkResources
FINE: Checking context[] redeploy resource /source/solr/apache-tomcat-5.5.17/webapps/ROOT/META-INF/context.xml
Jul 26, 2006 1:25:36 PM org.apache.catalina.startup.HostConfig checkResources
FINE: Checking context[] reload resource /source/solr/apache-tomcat-5.5.17/webapps/ROOT/WEB-INF/web.xml
Jul 26, 2006 1:25:36 PM org.apache.catalina.startup.HostConfig checkResources
FINE: Checking context[] reload resource /source/solr/apache-tomcat-5.5.17/webapps/ROOT/META-INF/context.xml
Jul 26, 2006 1:25:36 PM org.apache.catalina.startup.HostConfig checkResources
FINE: Checking context[] reload resource /source/solr/apache-tomcat-5.5.17/conf/context.xml

How many documents are in the index before you do a batch that causes
a hang?  Does it happen on the first batch?  If so, you might be
seeing some other bug.  What appserver are you using?  Do the admin
pages respond when you see this hang?  If so, what does a stack trace
look like?



I actually don't think I had the problem on the first batch; in fact, my
first batch contained very close to 6,144 documents, so perhaps there is a
relation there. Right now, I'm adding to an index with close to 90,000
documents in it.
I'm running Tomcat 5.5.17 and the admin pages respond just fine when it's
hung... I did a thread dump and this is the trace of my update:

"http-8080-Processor25" Id=33 in RUNNABLE (running in native) total cpu
time=6330.7360ms user time=5769.5920ms
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
at java.io.PrintStream.write(PrintStream.java:412)
at java.io.ByteArrayOutputStream.writeTo(ByteArrayOutputStream.java:112)
at sun.net.www.http.HttpClient.writeRequests(HttpClient.java:533)
at sun.net.www.protocol.http.HttpURLConnection.writeRequests(HttpURLConnection.java:410)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:934)
at com.gawker.solr.update.GanjaUpdate.doUpdate(GanjaUpdate.java:169)
at com.gawker.solr.update.GanjaUpdate.update(GanjaUpdate.java:62)
at org.apache.jsp.update_jsp._jspService(update_jsp.java:57)
at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:332)
at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:314)
at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:264)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)

Re: Doc add limit

2006-07-26 Thread Yonik Seeley

It's possible it's not hanging, but just takes a long time on a
specific add.  This is because Lucene will occasionally merge
segments.  When very large segments are merged, it can take a long
time.
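
If you want to see that merge behaviour in isolation, here is a minimal
sketch against roughly the Lucene 1.9/2.0 API that Solr wraps (the index
path, sizes and field name are made-up placeholders; inside Solr the
equivalent settings are mergeFactor and maxBufferedDocs in solrconfig.xml):

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;
  import org.apache.lucene.index.IndexWriter;

  public class MergeDemo {
    public static void main(String[] args) throws Exception {
      IndexWriter writer =
          new IndexWriter("/tmp/demo-index", new StandardAnalyzer(), true);
      // Flush a new on-disk segment every maxBufferedDocs adds.
      writer.setMaxBufferedDocs(1000);
      // Once mergeFactor segments of a given size exist, they get merged
      // into one larger segment; the add that triggers a large merge is
      // the one that looks like a hang.
      writer.setMergeFactor(10);
      for (int i = 0; i < 20000; i++) {
        Document doc = new Document();
        doc.add(new Field("id", Integer.toString(i),
            Field.Store.YES, Field.Index.UN_TOKENIZED));
        writer.addDocument(doc); // most adds return quickly; merging adds do not
      }
      writer.close();
    }
  }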

In the log file, add commands are followed by the number of
milliseconds the operation took.  Next time Solr hangs, wait for a
number of minutes until you see the operation logged and note how long
it took.

How many documents are in the index before you do a batch that causes
a hang?  Does it happen on the first batch?  If so, you might be
seeing some other bug.  What appserver are you using?  Do the admin
pages respond when you see this hang?  If so, what does a stack trace
look like?

-Yonik


On 7/26/06, sangraal aiken <[EMAIL PROTECTED]> wrote:

Hey there... I'm having an issue with large doc updates on my Solr
installation. I'm adding in batches between 2-20,000 docs at a time, and I've
noticed Solr seems to hang at 6,144 docs every time. Breaking the adds into
smaller batches works just fine, but I was wondering if anyone knew why this
would happen. I've tried doubling memory as well as tweaking various config
options, but nothing seems to let me break the 6,144 barrier.
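
A minimal sketch of what that batch splitting looks like in Java (the update
URL, batch size, content type and field names here are placeholders, not my
actual indexing code; one commit at the end rather than one per batch):

  import java.io.OutputStreamWriter;
  import java.io.Writer;
  import java.net.HttpURLConnection;
  import java.net.URL;

  public class BatchPoster {
    static final String UPDATE_URL = "http://localhost:8080/solr/update"; // assumed
    static final int BATCH_SIZE = 1000; // well under the observed 6,144 hang point

    // POST one XML message to the update handler and read the response.
    static void post(String xml) throws Exception {
      HttpURLConnection conn =
          (HttpURLConnection) new URL(UPDATE_URL).openConnection();
      conn.setRequestMethod("POST");
      conn.setDoOutput(true);
      conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
      Writer w = new OutputStreamWriter(conn.getOutputStream(), "UTF-8");
      w.write(xml);
      w.close();
      conn.getInputStream().close(); // consume the response so the request completes
    }

    public static void main(String[] args) throws Exception {
      StringBuilder batch = new StringBuilder("<add>");
      int inBatch = 0;
      for (int i = 0; i < 20000; i++) {
        batch.append("<doc><field name=\"id\">").append(i)
             .append("</field><field name=\"title\">doc ").append(i)
             .append("</field></doc>");
        if (++inBatch == BATCH_SIZE) {
          post(batch.append("</add>").toString());
          batch = new StringBuilder("<add>");
          inBatch = 0;
        }
      }
      if (inBatch > 0) post(batch.append("</add>").toString());
      post("<commit/>"); // a single commit at the end, not one per batch
    }
  }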

This is the output from Solr admin. Any help would be greatly appreciated.


name: updateHandler
class: org.apache.solr.update.DirectUpdateHandler2
version: 1.0
description: Update handler that efficiently directly updates the on-disk main lucene index
stats:
commits : 0
optimizes : 0
docsPending : 6144
deletesPending : 6144
adds : 6144
deletesById : 0
deletesByQuery : 0
errors : 0
cumulative_adds : 6144
cumulative_deletesById : 0
cumulative_deletesByQuery : 0
cumulative_errors : 0
docsDeleted : 0