Re: Using Solr Analyzers in Lucene
I guess I missed the init() method. I was looking at the factory and thought I saw config-loading code (like getInt), which I assumed meant it needed to have schema.xml available. Thanks!

-Max

On Tue, Oct 5, 2010 at 2:36 PM, Mathias Walter wrote:
> Hi Max,
>
> why don't you use WordDelimiterFilterFactory directly? I'm doing the same
> stuff inside my own analyzer:
>
> final Map args = new HashMap();
>
> args.put("generateWordParts", "1");
> args.put("generateNumberParts", "1");
> args.put("catenateWords", "0");
> args.put("catenateNumbers", "0");
> args.put("catenateAll", "0");
> args.put("splitOnCaseChange", "1");
> args.put("splitOnNumerics", "1");
> args.put("preserveOriginal", "1");
> args.put("stemEnglishPossessive", "0");
> args.put("language", "English");
>
> wordDelimiter = new WordDelimiterFilterFactory();
> wordDelimiter.init(args);
> stream = wordDelimiter.create(stream);
>
> --
> Kind regards,
> Mathias
>
> [snip -- the earlier messages quoted here appear in full below]
Re: Using Solr Analyzers in Lucene
I have made progress on this by writing my own Analyzer. I basically added the TokenFilters that are under each of the Solr factory classes. I had to copy and paste the WordDelimiterFilter because, of course, it was package protected.

On Mon, Oct 4, 2010 at 3:05 PM, Max Lynch wrote:
> [snip -- the original question appears in full below]
Using Solr Analyzers in Lucene
Hi,

I asked this question a month ago on lucene-user and was referred here.

I have content being analyzed in Solr using these tokenizers and filters:

[schema.xml field type snippet not preserved in this archive]

Basically I want to be able to search against this index in Lucene with one of my background searching applications.

My main reason for using Lucene over Solr for this is that I use the highlighter to keep track of exactly which terms were found, which I use for my own scoring system, and I always collect the whole set of found documents. I've messed around with using boosts, but it wasn't fine-grained enough and I wasn't able to effectively create a score threshold (would creating my own scorer be a better idea?)

Is it possible to use this analyzer from Lucene, or at least re-create it in code?

Thanks.
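On the "re-create it in code" route: the real chain would wire up the Lucene/Solr TokenFilter classes behind each factory, but the shape of the analysis can be illustrated with plain Java. This is a simplified stand-in, not the actual Solr classes, and the splitting rules below only approximate what WordDelimiterFilter does:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified stand-in for a Solr analysis chain: whitespace tokenization,
// word-delimiter splitting on case changes and non-alphanumerics, and
// lowercasing. A real port would use WhitespaceTokenizer,
// WordDelimiterFilter, LowerCaseFilter, and a stemmer from the Solr jars.
public class MiniAnalyzer {
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("\\s+")) {
            // Split on non-alphanumerics, and between a lowercase and an
            // uppercase letter (mimicking splitOnCaseChange="1").
            for (String part : raw.split("[^A-Za-z0-9]+|(?<=[a-z])(?=[A-Z])")) {
                if (!part.isEmpty()) {
                    tokens.add(part.toLowerCase(Locale.ROOT));
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("PowerShot SD-500"));
    }
}
```

The point is only that an "analyzer" is a deterministic token pipeline; as long as the same pipeline runs at index and query time, Lucene can search a Solr-built index.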
Search a URL
Is there a tokenizer that will allow me to search for parts of a URL? For example, the search "google" would match on the data "http://mail.google.com/dlkjadf".

This tokenizer factory doesn't seem to be sufficient:

[tokenizer factory snippet not preserved in this archive]

Thanks.
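What the question needs is a tokenizer (or a WordDelimiterFilter-style chain) that splits the URL on its punctuation so "google" becomes its own term. As a plain-Java illustration of that splitting, independent of any particular Solr factory:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Splits a URL on every non-alphanumeric run, so "google" in
// "http://mail.google.com/dlkjadf" becomes a searchable token.
// Inside Solr, a PatternTokenizerFactory (or WordDelimiterFilterFactory
// after a coarser tokenizer) configured to split on non-alphanumerics
// produces the same effect.
public class UrlTokens {
    public static List<String> tokenize(String url) {
        return Arrays.stream(url.split("[^A-Za-z0-9]+"))
                .filter(s -> !s.isEmpty())
                .map(s -> s.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }
}
```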
Re: Updating document without removing fields
Thanks Lance. I have decided to just put all of my processing on a bigger server along with Solr. It's too bad, but I can manage.

-Max

On Sun, Aug 29, 2010 at 9:59 PM, Lance Norskog wrote:
> No. Document creation is all-or-nothing; fields are not updateable.
>
> I think you have to filter all of your field changes through a "join"
> server. That is, all field updates could go to a database and the master
> would read document updates from that database. Or, you could have one
> updater feed updates to the other, which then sends all updates to the
> master.
>
> Lance
>
> [snip -- the original question appears in full below]
Updating document without removing fields
Hi,

I have a master Solr server and two slaves. On each of the slaves I have programs running that read the slave index, do some processing on each document, add a few new fields, and commit the changes back to the master.

The problem I'm running into right now is that one slave will update one document and the other slave will eventually update the same document, but the changes will overwrite each other. For example, one slave will add a field and commit the document, but the other slave won't have that field yet, so the field is lost when it re-adds the doc with its own new field. This causes the document to miss one set of fields from one of the slaves.

Can I update a document without having to recreate it? Is there a way to update the slave and then have the slave commit the changes to the master (adding new fields in the process)?

Thanks.
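The clobbering described above follows from how updates worked in Solr 1.4: re-adding a document replaces it entirely, so an updater that sends only its own fields wipes out everyone else's. The usual workaround is read-merge-write: fetch the stored fields, merge in the new ones, and re-send the complete document. A toy sketch of that merge, with plain Java maps standing in for the stored document (names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Read-merge-write: start from the document's stored fields, overlay the
// updater's new fields, and send the result back as the full document.
// Existing fields survive; the updater's fields win on conflict.
public class MergeUpdate {
    public static Map<String, Object> merge(Map<String, Object> stored,
                                            Map<String, Object> newFields) {
        Map<String, Object> doc = new HashMap<>(stored);
        doc.putAll(newFields);
        return doc;
    }
}
```

Note this is still racy if two updaters merge concurrently, which is why the thread's conclusion (funnel all updates through one writer) is the safer design.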
Re: Duplicating a Solr Doc
It seems like this is a way to accomplish what I was looking for:

CoreContainer coreContainer = new CoreContainer();
File home = new File("/home/max/packages/test/apache-solr-1.4.1/example/solr");
File f = new File(home, "solr.xml");
coreContainer.load("/home/max/packages/test/apache-solr-1.4.1/example/solr", f);
SolrCore core = coreContainer.getCore("newsblog");
IndexSchema schema = core.getSchema();
DocumentBuilder builder = new DocumentBuilder(schema);

// get a Lucene Doc
// Document d = ...
SolrDocument solrDocument = new SolrDocument();
builder.loadStoredFields(solrDocument, d);

logger.debug("Loaded stored date: " + solrDocument.getFieldValue("date_added_solr"));

However, one thing that scares me is the warning message I get from the CoreContainer:

    [java] Aug 25, 2010 10:25:23 PM org.apache.solr.update.SolrIndexWriter finalize
    [java] SEVERE: SolrIndexWriter was not closed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!

I'm not sure what exactly triggers that, but it's a result of the code I posted above.

On Wed, Aug 25, 2010 at 10:49 PM, Max Lynch wrote:
> [snip -- the original question appears in full below]
Re: Delete by query issue
Thanks Lance. I'll give that a try going forward.

On Wed, Aug 25, 2010 at 9:59 PM, Lance Norskog wrote:
> Here's the problem: the standard Solr parser is a little weird about
> negative queries. The way to make this work is to say
>     *:* AND -field:[* TO *]
>
> This means "select everything AND only these documents without a value
> in the field".
>
> [snip -- the rest of the thread appears in full below]
>
> --
> Lance Norskog
> goks...@gmail.com
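Putting Lance's suggestion back into the earlier curl call (URL, core name, and field name as in the original posts), the working delete-by-query would look something like:

```
curl http://localhost:8985/solr/newsblog/update?commit=true \
  -H "Content-Type: text/xml" \
  --data-binary '<delete><query>*:* AND -date_added_solr:[* TO *]</query></delete>'
```

The `*:*` clause matches all documents, and `-date_added_solr:[* TO *]` then subtracts every document that has any value in the field, leaving exactly the documents with no value.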
Duplicating a Solr Doc
Right now I am doing some processing on my Solr index using Lucene Java. Basically, I loop through the index in Java and do some extra processing of each document (processing that is too intensive to do during indexing).

However, when I try to update the document in Solr with new fields (using SolrJ), the document either loses fields I don't explicitly set, or, if I have Solr-specific fields such as a Solr "date" field type, I am not able to copy the value as I can't read the value from Java.

Is there a way to add a field to a Solr document without having to re-create the document? If not, how can I read the value of a Solr date in Java? Document.get("date_field") returns null even though the value shows up when I access it through Solr. If I could read this value I could just copy the fields from the Lucene Document to a SolrInputDocument.

Thanks.
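On the "how can I read a Solr date in Java" part: Solr renders date fields as ISO-8601 UTC strings, e.g. "2010-08-25T22:14:09Z". If the field is stored in that string form it can be parsed back in plain Java; a null from Document.get more often means the field is not stored, or is held in a binary/numeric (trie) form rather than as the display string. A sketch using java.time (which postdates this 2010 thread; it stands in for the SimpleDateFormat code one would have written then):

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;

// Round-trips Solr's ISO-8601 UTC date representation through java.time.
// Only valid if the field's stored value really is the ISO string.
public class SolrDates {
    public static Instant parse(String solrDate) {
        return Instant.parse(solrDate); // accepts "2010-08-25T22:14:09Z"
    }

    public static String format(Instant i) {
        return DateTimeFormatter.ISO_INSTANT.format(i);
    }
}
```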
Re: Delete by query issue
I was trying to filter out all documents that HAVE that field. I was trying to delete any documents where that field had empty values.

I just found a way to do it: I did a range query on a string date in the Lucene DateTools format and it worked, so I'm satisfied. However, I believe it worked because all of my documents have values for that field.

Oh well.

-max

On Wed, Aug 25, 2010 at 9:45 PM, scott chu (朱炎詹) wrote:
> Excuse me, what's the hyphen before the field name 'date_added_solr'? Is
> this some kind of new query format that I didn't know?
>
> -date_added_solr:[* TO *]'
>
> - Original Message -
> From: "Max Lynch"
> Sent: Thursday, August 26, 2010 6:12 AM
> Subject: Delete by query issue
>
> [snip -- the original question appears in full below]
Delete by query issue
Hi,

I am trying to delete all documents that have null values for a certain field. To that effect I can see all of the documents I want to delete by doing this query:

-date_added_solr:[* TO *]

This returns about 32,000 documents.

However, when I try to put that into a curl call, no documents get deleted:

curl http://localhost:8985/solr/newsblog/update?commit=true -H "Content-Type: text/xml" --data-binary '<delete><query>-date_added_solr:[* TO *]</query></delete>'

Solr responds with:

<int name="status">0</int><int name="QTime">364</int>

But nothing happens, even if I explicitly issue a commit afterward.

Any ideas?

Thanks.
Re: Duplicate a core
What I'm doing now is just adding the documents to the other core each night and deleting old documents from the other core when I'm finished. Is there a better way?

On Tue, Aug 3, 2010 at 4:38 PM, Max Lynch wrote:
> [snip -- the original question appears in full below]
Duplicate a core
Is it possible to duplicate a core? I want to have one core contain only documents within a certain date range (e.g. 3 days old), and one core with all documents that have ever been in the first core. The small core is then replicated to other servers which do "real-time" processing on it, but the "archive" core exists for longer-term searching.

I understand I could just connect to both cores from my indexer, but I would like to not have to send duplicate documents across the network, to save bandwidth.

Is this possible?

Thanks.
Re: Know which terms are in a document
Yeah, I've had mild success with the highlighting approach with Lucene, but wasn't sure if there was another method available from Solr. Thanks Mike.

On Thu, Jul 29, 2010 at 5:17 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
> This is a fairly frequently requested and missing feature in Lucene/Solr...
>
> Lucene actually "knows" this information while it's scoring each
> document; it's just that it in no way tries to record that.
>
> If you will only do this on a few documents (eg the one page of
> results) then piggybacking on the highlighter is an OK approach.
>
> If you need it on more docs than that, then probably you should
> customize how your queries are scored to also tally up which docs had
> which terms.
>
> Mike
>
> [snip -- the original question appears in full below]
Know which terms are in a document
I would like to be able to search against my index, and then *know* which of a set of given terms were found in each document.

For example, let's say I want to show articles with the word "pizza" or "cake" in them, but would like to be able to say which of those two was found. I might use this to handle the article differently if it is about pizza, or if it is about cake. I understand I can do multiple queries, but I would like to avoid that.

One thought I had was to use a highlighter and only return a fragment with the highlighted word, but I'm not sure how to do this with the various highlighting options.

Is there a way?

Thanks.
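The highlighter idea in the question reduces to a simple check: tokenize the document text and intersect with the query terms. A toy plain-Java sketch (real code would run the index's own analyzer so tokens match what was indexed, and for large result sets a custom scorer that tallies per-term hits scales better, as Mike's reply notes):

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Post-hoc check of which query terms appear in a document's text.
// Fine for one page of results; too slow to run over every hit.
public class TermsFound {
    public static Set<String> found(String text, List<String> queryTerms) {
        Set<String> tokens = new LinkedHashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            tokens.add(t);
        }
        Set<String> hits = new LinkedHashSet<>();
        for (String term : queryTerms) {
            if (tokens.contains(term.toLowerCase(Locale.ROOT))) {
                hits.add(term);
            }
        }
        return hits;
    }
}
```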
Re: CommonsHttpSolrServer add document hangs
I'm still having trouble with this. My program will run for a while, then hang up at the same place. Here is my add/commit process:

I am using StreamingUpdateSolrServer with queue size = 100 and num threads = 3. My indexing process spawns 8 threads to process a subset of RSS feeds, which each thread then loops through. Once a thread has processed a new article, it constructs a new SolrInputDocument, creates a temporary Collection containing just the one new document, then calls server.add(docs). I never call commit() or optimize() from my Java code (I did before, though, but I took that out).

On the server side, I have these related settings:

[solrconfig.xml snippet not preserved in this archive; only the values 300 and 1 survive]

I also have replication set up, as this is the master; here are the settings:

[replication handler snippet not preserved in this archive; the surviving values are commit, startup, and schema.xml,stopwords.txt]

Those are the only extra settings I've set. I also have a cron job running every minute executing this command:

curl http://localhost:8985/solr/mycore/update -F stream.body='<commit/>'

Otherwise I don't see the numDocs number increase on the admin statistics page.

This process will soon be ONLY for indexing. Is there a better way to optimize it? I replicate from the slaves every 60 seconds, and I want documents to be available to the slaves as soon as possible. Currently I have a search process that has some IndexSearchers open on the Solr index (it's a pure Lucene program); could that be causing issues? This process never opens an IndexWriter.

Thanks!

On Tue, Jul 13, 2010 at 10:52 AM, Max Lynch wrote:
> [snip -- the rest of the thread appears in full below]
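For reference, the commit settings mentioned in this message (the values 300 and 1 survive, but the XML around them was stripped) would normally live in an autoCommit block in solrconfig.xml. A sketch of the general shape; the mapping of those two values onto maxDocs/maxTime is an assumption:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- commit automatically after this many added documents... -->
    <maxDocs>300</maxDocs>
    <!-- ...or after this much time in milliseconds -->
    <maxTime>1</maxTime>
  </autoCommit>
</updateHandler>
```

If the 1 really is maxTime in milliseconds, the server would be committing nearly continuously, which by itself could explain slow or hanging adds; with a sane autoCommit in place the cron job posting a commit every minute should also be unnecessary.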
Re: CommonsHttpSolrServer add document hangs
Great, thanks!

On Tue, Jul 13, 2010 at 2:55 AM, Fornoville, Tom wrote:
> If you're only adding documents you can also have a go with
> StreamingUpdateSolrServer instead of the CommonsHttpSolrServer.
> Couple that with the suggestion of master/slave so the searches don't
> interfere with the indexing and you should have a pretty responsive
> system.
>
> -Original Message-
> From: Robert Petersen [mailto:rober...@buy.com]
> Sent: maandag 12 juli 2010 22:30
> To: solr-user@lucene.apache.org
> Subject: RE: CommonsHttpSolrServer add document hangs
>
> You could try a master/slave setup using replication, perhaps; then the
> slave serves searches, and indexing commits on the master won't hang up
> searches at least...
>
> Here is the description: http://wiki.apache.org/solr/SolrReplication
>
> [snip -- the rest of the thread appears in full below]
Re: CommonsHttpSolrServer add document hangs
Thanks Robert,

My script did start going again, but it was waiting for about half an hour, which seems a bit excessive to me. Is there some tuning I can do on the Solr end to optimize for my use case, which is very heavy on commits and very light on searches (I do most of my searches on the raw Lucene index in the background)?

Thanks.

On Mon, Jul 12, 2010 at 12:06 PM, Robert Petersen wrote:
> Maybe solr is busy doing a commit or optimize?
>
> [snip -- the original question appears in full below]
CommonsHttpSolrServer add document hangs
Hey guys,

I'm using Solr 1.4.1 and I've been having some problems lately with code that adds documents through a CommonsHttpSolrServer. It seems that randomly the call to server.add() will hang. I am currently running my code in a single thread, but I noticed this would happen in multi-threaded code as well. The jar version of commons-httpclient is 3.1.

I got a thread dump of the process, and one thread seems to be waiting on the org.apache.commons.httpclient.MultiThreadedHttpConnectionManager as shown below. All other threads are in a RUNNABLE state (besides the Finalizer daemon).

    [java] Full thread dump Java HotSpot(TM) 64-Bit Server VM (16.3-b01 mixed mode):
    [java]
    [java] "MultiThreadedHttpConnectionManager cleanup" daemon prio=10 tid=0x7f441051c800 nid=0x527c in Object.wait() [0x7f4417e2f000]
    [java]    java.lang.Thread.State: WAITING (on object monitor)
    [java]         at java.lang.Object.wait(Native Method)
    [java]         - waiting on <0x7f443ae5b290> (a java.lang.ref.ReferenceQueue$Lock)
    [java]         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
    [java]         - locked <0x7f443ae5b290> (a java.lang.ref.ReferenceQueue$Lock)
    [java]         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
    [java]         at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ReferenceQueueThread.run(MultiThreadedHttpConnectionManager.java:1122)

Any ideas?

Thanks.
MailEntityProcessor class cast exception
With last night's build of Solr, I am trying to use the MailEntityProcessor to index an email account. However, when I call my dataimport URL, I receive a class cast exception:

INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=44
Jun 16, 2010 8:16:03 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
WARNING: Unable to read: dataimport.properties
Jun 16, 2010 8:16:03 PM org.apache.solr.update.DirectUpdateHandler2 deleteAll
INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
Jun 16, 2010 8:16:03 PM org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=1
        commit{dir=/home/m/g/spider/misc/solrindex_nl/index,segFN=segments_1,version=1276738117525,generation=1,filenames=[segments_1]
Jun 16, 2010 8:16:03 PM org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: newest commit = 1276738117525
Jun 16, 2010 8:16:03 PM org.apache.solr.handler.dataimport.DataImporter doFullImport
SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to load EntityProcessor implementation for entity:99544078513223 Processing Document # 1
        at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
        at org.apache.solr.handler.dataimport.DocBuilder.getEntityProcessor(DocBuilder.java:804)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:535)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:260)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:184)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:392)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:373)
Caused by: java.lang.ClassCastException: org.apache.solr.handler.dataimport.MailEntityProcessor cannot be cast to org.apache.solr.handler.dataimport.EntityProcessor
        at org.apache.solr.handler.dataimport.DocBuilder.getEntityProcessor(DocBuilder.java:801)
        ... 6 more
Jun 16, 2010 8:16:03 PM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: start rollback
Jun 16, 2010 8:16:03 PM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: end_rollback

Here is the dataimport part of my solrconfig.xml:

[requestHandler snippet not preserved in this archive; the config pointed at /home/max/packages/apache-solr-4.0-2010-06-16_08-05-33/e/solr/conf/data-config.xml]

and my data-config.xml:

[data-config.xml snippet not preserved in this archive]

I did try to rebuild the Solr nightly, but I still receive the same error. I have all of the required jars (AFAIK) in my application's lib folder.

Any ideas?

Thanks.
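Two notes on this thread. First, a ClassCastException between two classes that should be compatible by name (MailEntityProcessor genuinely extends EntityProcessor) usually means the two classes were loaded by different classloaders, for example when the dataimporthandler jar is present both inside the webapp and in a separate lib directory. Second, since the poster's data-config.xml was stripped from the archive, here is a generic sketch of a MailEntityProcessor configuration; attribute names follow the DataImportHandler documentation, and every value is a placeholder, not the original poster's:

```xml
<dataConfig>
  <document>
    <entity processor="MailEntityProcessor"
            user="someone@example.com"
            password="secret"
            host="imap.example.com"
            protocol="imaps"
            folders="inbox"/>
  </document>
</dataConfig>
```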