Re: Data deleted from Solr (SolrCloud)
Yeah, I ran into this issue myself with solr-4.0.0. To fix it, I had to compile my own version from the solr-4x branch. That is, I assume it's fixed, as I have been unable to replicate it after the switch. I'm afraid you will have to reindex your data. -- Med venlig hilsen / Best regards *John Nielsen* Programmer *MCB A/S* Enghaven 15 DK-7500 Holstebro Kundeservice: +45 9610 2824 p...@mcb.dk www.mcb.dk

On Wed, Dec 19, 2012 at 5:08 PM, shreejay shreej...@gmail.com wrote: Hi All, I have a SolrCloud instance with 3 shards. Each shard has 2 instances (2 servers, each running an instance of Solr). Let's say I had instance1 and instance2 in shard1. At some point, instance2 went down due to OOM (out of memory). instance1 for some reason was not replicating the data properly, and when it became the leader it had only around 1% of the data that instance2 had. I restarted instance2 and hoped that instance1 would replicate from instance2, but instead instance2 replicated from instance1, and ended up deleting the original index folder it had. There were around 2 million documents in that instance. Can any SolrCloud users give any hints on whether I can recover this data? --Shreejay
Pause and resume indexing on SolR 4 for backups
Hi all. Can anyone advise me of a way to pause and resume SolR 4 so I can perform a backup? I need to be able to revert to a usable (though not necessarily complete) index after a crash or other disaster more quickly than a re-index operation would yield. I can't yet afford the extravagance of a separate SolR replica just for backups, and I'm not sure if I'll ever have the luxury. I'm currently running with just one node, but we are not yet live. I can think of the following ways to do this, each with various downsides:

1) Just back up the existing index files whilst indexing continues
   + Easy
   + Fast
   - Incomplete
   - Potential for corruption? (e.g. partial files)

2) Stop/start Tomcat
   + Easy
   - Very slow and I/O- and CPU-intensive
   - Client gets errors when trying to connect

3) Block/unblock the SolR port with iptables
   + Fast
   - Client gets errors when trying to connect
   - Have to wait for existing transactions to complete (not sure how, maybe watch socket FDs in /proc)

4) Pause/restart the SolR service
   + Fast? (hopefully)
   - Client gets errors when trying to connect

In any event, the web app will have to gracefully handle unavailability of SolR, probably by displaying a "down for maintenance" message, but this should preferably be for only a very short amount of time. Can anyone comment on my proposed solutions above, or provide any additional ones? Thanks for any input you can provide! -Andy -- Andy D'Arcy Jewell SysMicro Limited Linux Support E: andy.jew...@sysmicro.co.uk W: www.sysmicro.co.uk
Re: Pause and resume indexing on SolR 4 for backups
On 20 December 2012 15:46, Andy D'Arcy Jewell andy.jew...@sysmicro.co.uk wrote: Hi all. Can anyone advise me of a way to pause and resume SolR 4 so I can perform a backup? I need to be able to revert to a usable (though not necessarily complete) index after a crash or other disaster more quickly than a re-index operation would yield. [...] Unless I am missing something, the index is only being written to when you are adding/updating the index. So, the question is how is this being done in your case, and could you pause indexing for the duration of the backup? Regards, Gora
RE: Pause and resume indexing on SolR 4 for backups
You can use the replication handler to fetch a complete snapshot of the index over HTTP. http://wiki.apache.org/solr/SolrReplication#HTTP_API

-Original message- From: Andy D'Arcy Jewell andy.jew...@sysmicro.co.uk Sent: Thu 20-Dec-2012 11:23 To: solr-user@lucene.apache.org Subject: Pause and resume indexing on SolR 4 for backups [...]
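A minimal sketch of the backup call Markus describes, assuming a single-core Solr on localhost:8983 with the replication handler enabled in solrconfig.xml (the location parameter, where supported, controls where the snapshot directory is written; otherwise it lands next to the data directory):

curl 'http://localhost:8983/solr/replication?command=backup&location=/var/backups/solr'
# the backup runs asynchronously; poll the details command to see when the snapshot completes
curl 'http://localhost:8983/solr/replication?command=details'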
Re: Pause and resume indexing on SolR 4 for backups
On 20/12/12 10:24, Gora Mohanty wrote: [...] could you pause indexing for the duration of the backup? Regards, Gora

It's attached to a web app, which accepts uploads and will be available 24/7 with a global audience, so pausing it may be rather difficult (though I may put this to the developer - it may, for instance, be possible if he has a small number of choke points for input into SolR). Thanks. -- Andy D'Arcy Jewell SysMicro Limited Linux Support T: 0844 9918804 M: 07961605631 E: andy.jew...@sysmicro.co.uk W: www.sysmicro.co.uk
Re: Pause and resume indexing on SolR 4 for backups
On 20 December 2012 16:14, Andy D'Arcy Jewell andy.jew...@sysmicro.co.uk wrote: [...] It's attached to a web-app, which accepts uploads and will be available 24/7, with a global audience, so pausing it may be rather difficult (tho I may put this to the developer - it may for instance be possible if he has a small number of choke points for input into SolR). [...] It adds work for the web developer, but one could pause indexing, put indexing requests into some kind of a queuing system, do the backup, and flush the queue when the backup is done. Regards, Gora
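A minimal sketch of the flush step in shell, assuming the web app has been spooling update XML files into a hypothetical /var/spool/solr-updates directory while the backup ran (the paths and URL are illustrative, not from the thread):

for f in /var/spool/solr-updates/*.xml; do
  # replay each spooled update against Solr's update handler, removing it on success
  curl -sf 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' --data-binary @"$f" && rm "$f"
done
# one commit at the end rather than one per document
curl -s 'http://localhost:8983/solr/update?commit=true'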
Re: Pause and resume indexing on SolR 4 for backups
I've never used it, but the replication handler has an option: http://master_host:port/solr/replication?command=backup which will take a backup for you. Also something to note: if you don't want to use the above, and you are running on Unix, you can create fast 'hard link' clones of Lucene indexes. Doing: cp -lr data data.bak will copy your index instantly. If you can avoid doing this when a commit is happening, then you'll have a good index copy that will take no extra space on your disk and be made instantly. This is because it just copies the directory structure, not the files themselves, and given that files in a Lucene index never change (they are only ever deleted or replaced), this works as a good copy technique for backing up. Upayavira

On Thu, Dec 20, 2012, at 10:34 AM, Markus Jelsma wrote: You can use the replication handler to fetch a complete snapshot of the index over HTTP. http://wiki.apache.org/solr/SolrReplication#HTTP_API [...]
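One rough guard against the copy racing a commit: note which segments_N file is current before and after the hard-link copy, and throw the snapshot away if it changed. A sketch, assuming the default data/index layout:

before=$(ls data/index/segments_* 2>/dev/null)
cp -lr data data.bak
after=$(ls data/index/segments_* 2>/dev/null)
if [ "$before" != "$after" ]; then
  # a commit landed mid-copy; discard this snapshot and try again
  rm -rf data.bak
fi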
Re: Finding the last committed record in SOLR 4
I cannot see how SolrJ and the admin UI would return different results. Could you run exactly the same query on both and show what you get here? Upayavira

On Thu, Dec 20, 2012, at 06:17 AM, Joe wrote: I'm using SOLR 4 for an application where I need to search the index soon after inserting records. I'm using the SolrJ code below to get the last ID in the index. However, I noticed that the last ID I see when I execute a query through the Solr web admin is often lagging behind this, and that my searches are not including all documents up until the last ID I get from the code snippet below. I'm guessing this is because of delays in hard commits. I don't need to switch to soft commits yet. I just want to make sure that I get the ID of the last searchable document. Is this possible to do?

SolrQuery query = new SolrQuery();
query.set("qt", "/select");
query.setQuery("*:*");
query.setFields("id");
query.set("rows", "1");
query.set("sort", "id desc");
QueryResponse rsp = m_Server.query(query);
SolrDocumentList docs = rsp.getResults();
SolrDocument doc = docs.get(0);
long id = (Long) doc.getFieldValue("id");
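If the client controls commits, one way to guarantee the queried ID is actually searchable is to issue an explicit commit that waits for a new searcher before running the query. A sketch with SolrJ, reusing m_Server from the snippet above:

// commit(waitFlush, waitSearcher): block until the new searcher is registered,
// so the max-id query that follows sees everything indexed so far
m_Server.commit(true, true);
// ...then run the query from the snippet above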
Re: Dynamic modification of field value
Which strikes me as the right way to go. Upayavira

On Thu, Dec 20, 2012, at 12:30 PM, AlexeyK wrote: Implemented it with http://wiki.apache.org/solr/DocTransformers.
Re: Pause and resume indexing on SolR 4 for backups
On 20/12/12 11:58, Upayavira wrote: I've never used it, but the replication handler has an option: http://master_host:port/solr/replication?command=backup [...]

I've looked at that this morning, as suggested by Markus Jelsma. Looks good, but I'll have to work out how to use the resultant backup directory. I've been dealing with another unrelated issue in the meantime and haven't had a chance to look for any documentation so far.

[...] you can create fast 'hard link' clones of lucene indexes. [...]

That's the approach that Shawn Heisey proposed, and what I've been working towards, but it still leaves open the question of how to *pause* SolR or prevent commits during the backup (otherwise we have a potential race condition). -Andy -- Andy D'Arcy Jewell SysMicro Limited Linux Support E: andy.jew...@sysmicro.co.uk W: www.sysmicro.co.uk
Re: Pause and resume indexing on SolR 4 for backups
The backup directory should just be a clone of the index files. I'm curious to know whether it is a cp -r or a cp -lr that the replication handler produces. You would prevent commits by telling your app not to commit. That is, Solr only commits when it is *told* to. Unless you use autocommit, in which case I guess you could monitor your logs for the last commit, and do your backup 10 seconds after that. Upayavira

On Thu, Dec 20, 2012, at 12:44 PM, Andy D'Arcy Jewell wrote: [...]
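A rough sketch of that log-watching idea in shell: take the snapshot only if no commit has been logged for 10 seconds. The log path and the exact "start commit" message are assumptions that vary by setup:

before=$(grep -c 'start commit' /var/log/solr/solr.log)
sleep 10
after=$(grep -c 'start commit' /var/log/solr/solr.log)
if [ "$before" -eq "$after" ]; then
  # no commit logged during the window; snapshot via hard links
  cp -lr /var/lib/solr/data /var/lib/solr/data.bak
fi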
Re: Where does schema.xml's schema/@name display?
Personally I have never given it any attention, so I suspect it doesn't matter much. Upayavira

On Thu, Dec 20, 2012, at 05:08 AM, Alexandre Rafalovitch wrote: Hello, In the schema.xml, we have a name attribute on the root node. The documentation says it is for display purposes only. But for display where? It seems that the admin console uses the name in the solr.xml file instead. And deleting the name attribute does not seem to cause any problems either. The reason I ask is because I am writing an explanation example which involves the schema.xml config file being copied and modified over and over again. If @name is significant, I need to mention changing it. If not, I will just delete it altogether. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
RE: Pause and resume indexing on SolR 4 for backups
You can use the postCommit event in updateHandler to execute a task.

-Original message- From: Upayavira u...@odoko.co.uk Sent: Thu 20-Dec-2012 14:45 To: solr-user@lucene.apache.org Subject: Re: Pause and resume indexing on SolR 4 for backups [...]
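For reference, the stock solrconfig.xml documents this hook with a RunExecutableListener; a sketch along those lines (the snapshooter path comes from the old replication scripts and is illustrative here):

<updateHandler class="solr.DirectUpdateHandler2">
  <listener event="postCommit" class="solr.RunExecutableListener">
    <str name="exe">solr/bin/snapshooter</str>
    <str name="dir">.</str>
    <bool name="wait">true</bool>
  </listener>
</updateHandler>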
Re: Pause and resume indexing on SolR 4 for backups
That's neat, but wouldn't that run on every commit? How would you use it to, say, back up once a day? Upayavira

On Thu, Dec 20, 2012, at 01:57 PM, Markus Jelsma wrote: You can use the postCommit event in updateHandler to execute a task. [...]
Store and retrieve an xml sequence without losing the markup
Hi everybody, I'm a newbie with Solr technologies, but in the past I worked with Lucene and another solution similar to Solr. I'm working with Solr 4.0. I use SolrJ to embed a Solr server in a Cocoon 2.1 application. I want to know if it's possible to store (without indexing) a field containing an XML sequence, i.e. a field which can store XML data in the index without losing XPath information. For example, this is a document to index:

<add>
  <doc>
    <field name="id">id_1</field>
    <field name="info">testing</field>
    <field name="subdoc">
      <subdoc id="id_1">
        <data>testing</data>
      </subdoc>
    </field>
  </doc>
  ...
</add>

As you can see, the field named subdoc contains an XML sequence. So, when I query the index, I want to retrieve the data in subdoc and preserve the XML markup. Thank you for your help. -- | Modou DIA | modo...@gmail.com
Re: Where does schema.xml's schema/@name display?
I checked the 4.x source code and, except for the fact that you will get a warning if you leave it out, nothing uses that name. But... that's not to say that a future release might not require it - the doc/comments don't explicitly say that it is optional. Note that the version attribute is optional (as per the source code, but no mention in doc/comments) and defaults to 1.0, with no warning. -- Jack Krupansky

-Original Message- From: Alexandre Rafalovitch Sent: Thursday, December 20, 2012 12:08 AM To: solr-user@lucene.apache.org Subject: Where does schema.xml's schema/@name display? [...]
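For concreteness, the two attributes under discussion sit on the schema root element, as in the stock example schema (values illustrative):

<schema name="example" version="1.5">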
Re: Ensuring SpellChecker returns corrections which satisfy fq params for default OR query
Hi James, I don't get how the spellcheck.maxResultsForSuggest param helps with making sure that the suggestions returned satisfy the fq params? That's the main problem we're trying to solve; how often suggestions are being returned is not really an issue for us at the moment. Thanks, Nalini

On Wed, Dec 19, 2012 at 4:35 PM, Dyer, James james.d...@ingramcontent.com wrote: Instead of using spellcheck.collateParam.mm, try just setting spellcheck.maxResultsForSuggest to a very high value (you can use up to Integer.MAX_VALUE here). So long as the user gets fewer results than whatever this is set for, you will get suggestions (and collations if desired). I was just playing with this, and if I am understanding you correctly, I think this combination of parameters will give you what you want:

spellcheck=true
spellcheck.dictionary=whatever
spellcheck.maxResultsForSuggest=1000 (or whatever the cutoff is before you don't want suggestions)
spellcheck.count=20 (+/- depending on performance vs # suggestions required)
spellcheck.maxCollationTries=10 (+/- depending on performance vs # suggestions required)
spellcheck.maxCollations=10 (+/- depending on performance vs # suggestions required)
spellcheck.collate=true
spellcheck.collateExtendedResults=true

James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311

-Original Message- From: Nalini Kartha [mailto:nalinikar...@gmail.com] Sent: Wednesday, December 19, 2012 2:06 PM To: solr-user@lucene.apache.org Subject: Re: Ensuring SpellChecker returns corrections which satisfy fq params for default OR query

Hi James, Yup, the example you gave about sums it up. The reason we use an OR query is that we want the flexibility of every term not having to match, but when it comes to corrections we want to be sure that the ones we pick will actually return results (we message the user with the corrected query, so it would be bad/confusing if there were no matches for the corrections).

"- by default the spellchecker doesn't see this as a problem because it has hits (mm=0 and wrapping matches something). So you get neither individual words back nor collations from the spellchecker."

I think we would still get back 'papr' -> 'paper' as a correction and 'christmas wrapping paper' as a collation in this case - I've seen that corrections are returned even for OR queries. The problem is these would be returned even if 'paper' doesn't exist in any docs that have item:in_stock.

"- with spellcheck.collateParam.mm=100 it tries to fix both papr and christmas but can't fix christmas because spelling isn't the problem here (it is an irrelevant term not in the index). So while you get words suggested there are no collations. The individual words would be helpful, but you're not sure because they might all apply to items that do not match fq=item:in_stock."

Yup, exactly. Do you think the workaround I suggested would work (and not have terrible perf)? Or any other ideas? Thanks, Nalini

On Wed, Dec 19, 2012 at 1:09 PM, Dyer, James james.d...@ingramcontent.com wrote: Let me try to get a better idea of what you're after. Is it that your users might query a combination of irrelevant terms and misspelled terms, so you want the ability to ignore the irrelevant terms but still get suggestions for the misspelled terms? For instance, if someone wanted q=christmas wrapping papr&mm=0&fq=item:in_stock, but christmas was not in the index and you wanted to return results for just wrapping paper, the problem is:

- by default the spellchecker doesn't see this as a problem because it has hits (mm=0 and wrapping matches something). So you get neither individual words back nor collations from the spellchecker.
- with spellcheck.collateParam.mm=100 it tries to fix both papr and christmas but can't fix christmas because spelling isn't the problem here (it is an irrelevant term not in the index). So while you get words suggested there are no collations. The individual words would be helpful, but you're not sure because they might all apply to items that do not match fq=item:in_stock.

Is this the problem? James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311

-Original Message- From: Nalini Kartha [mailto:nalinikar...@gmail.com] Sent: Wednesday, December 19, 2012 11:20 AM To: solr-user@lucene.apache.org Subject: Ensuring SpellChecker returns corrections which satisfy fq params for default OR query

Hi, With the DirectSolrSpellChecker, we want to be able to make sure that the corrections that are being returned satisfy the fq params of the original query. The collate functionality helps with this but seems to only work with default AND queries - our use case is for default OR queries. I also saw that there is now a spellcheck.collateParam.XX param which allows you to override params
Re: SolrCloud: only partial results returned
Does all the data have unique ids? - Mark

On Dec 19, 2012, at 8:30 PM, Lili lyuan1...@gmail.com wrote: We set up SolrCloud with 2 shards and separate multiple ZooKeepers. The data added using HTTP POST with JSON from the tutorial samples is not completely returned by queries. However, if you send the same HTTP POST request again, or shut down the Solr instance and restart, the complete results will be returned. We have tried adding distrib=true to the query, or even adding shards=... Still, only partial results are returned. This happened with embedded ZooKeepers too. However, this doesn't seem to happen if you add data with XML from the tutorial samples. Any thoughts on what might be wrong, or is it a known issue? Thanks, Lili
RE: Ensuring SpellChecker returns corrections which satisfy fq params for default OR query
The spellchecker doesn't support checking the individual words against the index with fq applied. This is only done for collations (and only if maxCollationTries is greater than 0). Checking every suggested word individually against the index after applying filter queries is probably going to be very expensive no matter how you implement it. However, someone with more lucene-core knowledge than I have might be able to give you better advice. If your root problem, though, is getting good did-you-mean-style suggestions with dismax queries and mm=0, and if you want to consider the case where some words in the query are misspelled and others are entirely irrelevant (and can't be corrected), then setting maxResultsForSuggest to a high value might give you the end result you want. Unlike if you use spellcheck.collateParam.mm=100%, it won't insist that the irrelevant terms (or a corrected irrelevant term) match anything. On the other hand, it won't assume the query is correctly spelled just because you got some hits from it (because mm=0 will just cause the misspelled terms to be thrown out). James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311

-Original Message- From: Nalini Kartha [mailto:nalinikar...@gmail.com] Sent: Thursday, December 20, 2012 8:53 AM To: solr-user@lucene.apache.org Subject: Re: Ensuring SpellChecker returns corrections which satisfy fq params for default OR query [...]
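Pulling the thread's suggestions together, a request might combine the OR-query parameters with the spellcheck settings James lists above. A sketch, assuming the spellcheck component is wired into the handler and with host, core, and values illustrative:

curl 'http://localhost:8983/solr/select?q=christmas+wrapping+papr&defType=edismax&mm=0&fq=item:in_stock&spellcheck=true&spellcheck.maxResultsForSuggest=1000&spellcheck.count=20&spellcheck.collate=true&spellcheck.maxCollationTries=10&spellcheck.collateExtendedResults=true'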
Re: Pause and resume indexing on SolR 4 for backups
On 20/12/12 13:38, Upayavira wrote: The backup directory should just be a clone of the index files. [...]

Hmm. Strange - the files created by the backup API don't seem to correlate exactly with the files stored under the solr data directory:

andydj@me-solr01:~$ find /tmp/snapshot.20121220155853703/
/tmp/snapshot.20121220155853703/
/tmp/snapshot.20121220155853703/_2vq.fdx
/tmp/snapshot.20121220155853703/_2vq_Lucene40_0.tim
/tmp/snapshot.20121220155853703/segments_2vs
/tmp/snapshot.20121220155853703/_2vq_nrm.cfs
/tmp/snapshot.20121220155853703/_2vq.fnm
/tmp/snapshot.20121220155853703/_2vq_nrm.cfe
/tmp/snapshot.20121220155853703/_2vq_Lucene40_0.frq
/tmp/snapshot.20121220155853703/_2vq.fdt
/tmp/snapshot.20121220155853703/_2vq.si
/tmp/snapshot.20121220155853703/_2vq_Lucene40_0.tip

andydj@me-solr01:~$ find /var/lib/solr/data/index/
/var/lib/solr/data/index/
/var/lib/solr/data/index/_2w6_Lucene40_0.frq
/var/lib/solr/data/index/_2w6.si
/var/lib/solr/data/index/segments_2w8
/var/lib/solr/data/index/write.lock
/var/lib/solr/data/index/_2w6_nrm.cfs
/var/lib/solr/data/index/_2w6.fdx
/var/lib/solr/data/index/_2w6_Lucene40_0.tip
/var/lib/solr/data/index/_2w6_nrm.cfe
/var/lib/solr/data/index/segments.gen
/var/lib/solr/data/index/_2w6.fnm
/var/lib/solr/data/index/_2w6.fdt
/var/lib/solr/data/index/_2w6_Lucene40_0.tim

Am I correct in thinking that to restore from this backup, I would need to do the following?

1. Stop Tomcat (or maybe just Solr)
2. Remove all files under /var/lib/solr/data/index/
3. Move/copy files from /tmp/snapshot.20121220155853703/ to /var/lib/solr/data/index/
4. Restart Tomcat (or just Solr)

Thanks everyone who's pitched in on this! Once I've got this working, I'll document it. -Andy -- Andy D'Arcy Jewell SysMicro Limited Linux Support E: andy.jew...@sysmicro.co.uk W: www.sysmicro.co.uk
Re: Pause and resume indexing on SolR 4 for backups
Are you sure a commit didn't happen in between? Also, a background merge might have happened. As to using a backup, you are right: just stop Solr, put the snapshot into data/index, and restart. Upayavira

On Thu, Dec 20, 2012, at 05:16 PM, Andy D'Arcy Jewell wrote: [...]
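Andy's four steps as a shell sketch, using the paths from his listing above (the service name depends on how Tomcat is installed):

service tomcat6 stop
rm -rf /var/lib/solr/data/index/*
cp -p /tmp/snapshot.20121220155853703/* /var/lib/solr/data/index/
service tomcat6 start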
Re: SolrCloud: only partial results returned
Mark, yes, they have unique ids. Most of the time, after the 2nd JSON HTTP POST, the query will return complete results. I believe the data was indexed already with the 1st post, since if I shut down Solr after the 1st post and restart again, the query will return the complete result set. Thanks, Lili
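As a general diagnostic (not a known fix for this case): querying each replica directly with distrib=false and comparing document counts can show whether the replicas of a shard disagree, which would point at replication/recovery rather than indexing. Hostnames and core name are illustrative:

curl 'http://server1:8983/solr/collection1/select?q=*:*&rows=0&distrib=false'
curl 'http://server2:8983/solr/collection1/select?q=*:*&rows=0&distrib=false'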
Solr/Lucene Engineer - Contract Opportunity - Raleigh, NC
Hi Lance, I am an IT Recruiter in Raleigh, NC. Would you, or anyone you know, be interested in a long-term contract opportunity for a Solr/Lucene Engineer with Cisco here in RTP, NC? Thanks for your time Lance, and have a safe and happy Holiday! Tom Polak IT Recruiter Experis IT Staffing 1122 Oberlin Road Raleigh, NC 27605 T: 919 755 5838 F: 919 755 5828 C: 919 457 8530 tom.po...@experis.com www.experis.com https://twitter.com/tphires4itinnc http://www.linkedin.com/in/tompolak Referral Program: Easy ...Refer an IT Professional in your network to me today!
Re: Pause and resume indexing on SolR 4 for backups
To be clear: 1) is fine. Lucene index updates are carefully sequenced so that the index is never in a bogus state. All data files are written and flushed to disk, then the segments.* files are written that match the data files. You can capture the files with a set of hard links to create a backup. The CheckIndex program will verify the index backup:

java -cp yourcopy/lucene-core-SOMETHING.jar org.apache.lucene.index.CheckIndex collection/data/index

lucene-core-SOMETHING.jar is usually in the solr-webapp directory where Solr is unpacked.

On 12/20/2012 02:16 AM, Andy D'Arcy Jewell wrote: Hi all. Can anyone advise me of a way to pause and resume SolR 4 so I can perform a backup? [...]
Re: Pause and resume indexing on SolR 4 for backups
You're saying that there's no chance to catch it in the middle of writing the segments file? Having said that, the segments file is pretty small, so the chance would be pretty slim. Upayavira

On Thu, Dec 20, 2012, at 06:45 PM, Lance Norskog wrote: To be clear: 1) is fine. [...]
Re: Where does schema.xml's schema/@name display?
Jack, FWIW I've found an occurrence in SystemInfoHandler.java.

On Thu, Dec 20, 2012 at 6:32 PM, Jack Krupansky j...@basetechnology.com wrote: I checked the 4.x source code and, except for the fact that you will get a warning if you leave it out, nothing uses that name. [...]

-- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: Pause and resume indexing on SolR 4 for backups
Depending on your architecture, why not index the same data into two machines? One will be your prod, another your backup. Thanks. Alex.

-Original Message- From: Upayavira u...@odoko.co.uk To: solr-user solr-user@lucene.apache.org Sent: Thu, Dec 20, 2012 11:51 am Subject: Re: Pause and resume indexing on SolR 4 for backups [...]
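A sketch of that dual-write idea from the client side, posting each update to both instances. The hostnames are illustrative, and a real setup would also need to handle one target being down (e.g. by spooling failed posts and replaying them later):

for url in 'http://solr-prod:8983/solr/update' 'http://solr-backup:8983/solr/update'; do
  # send the same update message to each instance and commit
  curl -sf "$url?commit=true" -H 'Content-Type: text/xml' --data-binary @docs.xml
done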
Re: Where does schema.xml's schema/@name display?
Yeah... not sure how I missed it, but my search sees it now. Also, the name will default to "schema.xml" if you leave it out of the schema. -- Jack Krupansky

-Original Message- From: Mikhail Khludnev Sent: Thursday, December 20, 2012 3:06 PM To: solr-user Subject: Re: Where does schema.xml's schema/@name display? [...]
Re: Store and retrieve an xml sequence without losing the markup
What happens if you just supply it as CDATA into a string field? Store, no index, probably compressed and lazy. Regards, Alex

On 20 Dec 2012 09:30, Modou DIA modo...@gmail.com wrote: [...]
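A sketch of that approach using the document from the original post: declare subdoc as a stored-only string field, and wrap the markup in CDATA so the update XML stays well-formed:

<!-- schema.xml -->
<field name="subdoc" type="string" indexed="false" stored="true"/>

<!-- update message -->
<add>
  <doc>
    <field name="id">id_1</field>
    <field name="info">testing</field>
    <field name="subdoc"><![CDATA[<subdoc id="id_1"><data>testing</data></subdoc>]]></field>
  </doc>
</add>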
Re: Store and retrieve an xml sequence without losing the markup
Right, you can store it, but you can't search on it that way, and you certainly can't do complex searches that take the XML structure into account (e.g. XPath queries). Upayavira

On Thu, Dec 20, 2012, at 10:22 PM, Alexandre Rafalovitch wrote: What happens if you just supply it as CDATA into a string field? [...]
RE: occasional GC crashes
Hi Otis, I thought Java 7 had a bug which wasn't being addressed by Oracle, which made it not suitable for Solr. Did that get fixed now? http://searchhub.org/2011/07/28/dont-use-java-7-for-anything/ I did see this, but it doesn't really mention the bug: http://opensearchnews.com/2012/04/announcing-java7-support-with-apache-solr-and-lucene/ Thanks Robi

-Original Message- From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] Sent: Tuesday, December 18, 2012 5:25 PM To: solr-user@lucene.apache.org Subject: Re: occasional GC crashes

Robert, Step 1 is to get the latest Java 7, or if you have to remain on 6 then use the latest 6. Otis -- SOLR Performance Monitoring - http://sematext.com/spm

On Dec 18, 2012 7:54 PM, Petersen, Robert rober...@buy.com wrote: Hi solr user group, Sorry if this isn't directly a Solr question. It seems like once in a blue moon the GC crashes on a server in our Solr 3.6.1 slave farm. This seems to only happen on a couple of the twelve slaves we have deployed, and only very rarely on those. It seems like this doesn't directly affect Solr, because in the logs it looks like Solr keeps working after the time of the exception, but our external monitoring tool reports that the Solr service is down, so our operations department restarts Solr on that box and alerts me. The Solr logs show nothing unusual. The exception does show up in the catalina.out log file though. Does this happen to anyone else? Here is the basic error (I have attached the crash dump file also). Our total uptime on these boxes is over a year now, BTW.

# A fatal error has been detected by the Java Runtime Environment:
# SIGSEGV (0xb) at pc=0x2b5379346612, pid=13724, tid=1082353984
# JRE version: 6.0_25-b06
# Java VM: Java HotSpot(TM) 64-Bit Server VM (20.0-b11 mixed mode linux-amd64)
# Problematic frame:
# V [libjvm.so+0x3c4612] Par_ConcMarkingClosure::trim_queue(unsigned long)+0x82
# An error report file with more information is saved as:
# /var/LucidWorks/lucidworks/hs_err_pid13724.log
# If you would like to submit a bug report, please visit:
# http://java.sun.com/webapps/bugreport/crash.jsp

VM Arguments:
jvm_args: -Djava.util.logging.config.file=/var/LucidWorks/lucidworks/tomcat/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Xmx32768m -Xms32768m -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=6060 -Djava.endorsed.dirs=/var/LucidWorks/lucidworks/tomcat/endorsed -Dcatalina.base=/var/LucidWorks/lucidworks/tomcat -Dcatalina.home=/var/LucidWorks/lucidworks/tomcat -Djava.io.tmpdir=/var/LucidWorks/lucidworks/tomcat/temp
java_command: org.apache.catalina.startup.Bootstrap -server -Dsolr.solr.home=lucidworks/solr start
Launcher Type: SUN_STANDARD

Stack: [0x,0x], sp=0x40835eb0, free space=1056983k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x3c4612] Par_ConcMarkingClosure::trim_queue(unsigned long)+0x82
V [libjvm.so+0x3c481a] CMSConcMarkingTask::do_work_steal(int)+0xfa
V [libjvm.so+0x3c3dcf] CMSConcMarkingTask::work(int)+0xef
V [libjvm.so+0x8783dc] YieldingFlexibleGangWorker::loop()+0xbc
V [libjvm.so+0x8755b4] GangWorker::run()+0x24
V [libjvm.so+0x71096f] java_start(Thread*)+0x13f

Heap:
par new generation total 345024K, used 180672K [0x2e12, 0x2aaac578, 0x2aaac578)
eden space 306688K, 53% used [0x2e12, 0x2aaab8243c28, 0x2aaac0ca)
from space 38336K, 40% used [0x2aaac321, 0x2aaac415c3f8, 0x2aaac578)
to space 38336K, 0% used [0x2aaac0ca, 0x2aaac0ca, 0x2aaac321)
concurrent mark-sweep generation total 33171072K, used 12144213K [0x2aaac578, 0x2ab2ae12, 0x2ab2ae12)
concurrent-mark-sweep perm gen total 83968K, used 50650K [0x2ab2ae12, 0x2ab2b332, 0x2ab2b332)

Code Cache [0x2b054000, 0x2b9a4000, 0x2e054000)
total_blobs=2800 nmethods=2273 adapters=480 free_code_cache=40752512 largest_free_block=15808

Thanks, Robert (Robi) Petersen Senior Software Engineer Search Department
Japanese exact match results do not show on top of results
Hi folks, I am having a couple of problems with Japanese data: 1. it is not properly indexing all the data; 2. displaying the exact-match result on top, followed by 90% matches, 80% matches etc., does not work. I am using Solr 3.6.1 and text_ja as the fieldType. Here is the schema:

<field name="q" type="text_ja" indexed="true" stored="true"/>
<field name="qs" type="text_general" indexed="false" stored="true" multiValued="true"/>
<field name="q_e" type="string" indexed="true" stored="true"/>
<copyField source="q" dest="q_e" maxChars="250"/>

What I want to achieve is that if there is an exact query match, it should provide the results from q_e, followed by results from a partial match on the q field; and if there is nothing in the q_e field then partial matches should come from the q field. This is how I specify the query:

http://localhost:7983/zoom/jp/select/?q=鹿児島 鹿児島銀行&rows=10&version=2.2&qf=query+query_exact^1&mm=90%25&pf=q^1+q_e^10

OR

version=2.2&rows=10&qf=q+q_e^1&pf=query^10+query_exact^1

Somehow the exact query matches do not come on top, though the data contains them. It is puzzling that not all the documents get indexed properly; if I change the q field to string and q_e to text_ja then all the records are indexed properly, but that still does not solve the problem of exact match on top followed by partial matches.

text_ja field uses:

<filter class="solr.JapaneseBaseFormFilterFactory"/>
<filter class="solr.JapanesePartOfSpeechStopFilterFactory" tags="../../../solr/conf/lang/stoptags_ja.txt" enablePositionIncrements="true"/>
<filter class="solr.CJKWidthFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="../../../solr/conf/lang/stopwords_ja.txt" enablePositionIncrements="true"/>
<filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4"/>
<filter class="solr.LowerCaseFilterFactory"/>

How do I solve this problem? Thanks
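Whatever the indexing issue turns out to be, exact matches are typically boosted by weighting the untokenized field in qf/pf. A sketch with the edismax parser and the field names from the schema above (boosts illustrative, and note the qf/pf field names must match the schema):

http://localhost:7983/zoom/jp/select/?q=鹿児島銀行&defType=edismax&qf=q+q_e^10&pf=q+q_e^20&mm=90%25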
Re: Japanese exact match results do not show on top of results
I think you are hitting SOLR-3589. There is a vote underway for a 3.6.2 release that contains this fix. On Dec 20, 2012 6:29 PM, kirpakaro khem...@yahoo.com wrote: [...]
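For reference, the ranking goal above is usually expressed through edismax field boosts. A minimal request sketch, assuming the q/q_e field names from the schema above and a Solr version with the SOLR-3589 fix so that mm is honored for tokenized Japanese text; the boost values are illustrative and this does not address the separate indexing problem:

http://localhost:7983/zoom/jp/select/?defType=edismax&q=鹿児島銀行&qf=q^1+q_e^10&mm=90%25&rows=10

Because q_e is a string field, it only produces a match when the whole query equals the stored value, so the large boost on q_e pushes exact matches above the partial matches coming from the analyzed q field.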
Re: jconsole over jmx - should threads be visible?
: If I connect jconsole to a remote Solr installation (or any app) using jmx, : all the graphs are populated except 'threads' ... is this expected, or have I : done something wrong? I can't seem to locate the answer with google. I just tried running the 4x solr example with the jetty options to allow remote JMX... java -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=1099 -jar start.jar ...and was then able to monitor using jconsole and see all of the thread info as well from a remote machine. -Hoss
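For reference, a sketch of attaching from the monitoring side once Solr is started with those flags; the hostname is illustrative:

jconsole solr-host.example.com:1099

jconsole also accepts a full JMX service URL (service:jmx:rmi:///jndi/rmi://solr-host.example.com:1099/jmxrmi) if the short host:port form does not connect.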
Store document while using Solr
hi there, I am quite new to Solr and have a very basic question about storing and indexing documents. I am trying the Solr example, and when I run a command like 'java -jar post.jar foo/test.xml', it gives me the feeling that Solr will index the given file no matter where it is stored, and Solr won't re-store this file to some other location in the file system. Am I correct? If I want to use the file system to manage the documents, it seems like it is better to define some location which will be used to store all the potential files (it may need some processing to move/copy/upload the files to this location), then use Solr to index them under this location. Am I correct? Cheers, Nick
Re: Store document while using Solr
Hi, You can use Solr's DataImportHandler to index files in the file system. You could set things up in such a way that Solr keeps indexing whatever you put in some specific location in the FS. This is not the most common setup, but it's certainly possible. Solr keeps the searchable index in its own directory, defined in one of its configs. Otis -- Performance Monitoring - http://sematext.com/spm/index.html Search Analytics - http://sematext.com/search-analytics/index.html On Thu, Dec 20, 2012 at 8:15 PM, Nicholas Li nicholas...@yarris.com wrote: [...]
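A minimal sketch of that kind of setup with the DataImportHandler contrib, assuming Solr 4.x; the drop directory, file pattern, and handler name are illustrative, and the inner entity assumes the files are already in Solr's add-XML format. In solrconfig.xml:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>

And in data-config.xml:

<dataConfig>
  <dataSource type="FileDataSource"/>
  <document>
    <!-- rootEntity="false": each file found is a source of documents, not a document itself -->
    <entity name="files" processor="FileListEntityProcessor" baseDir="/data/solr-drop" fileName=".*\.xml" recursive="true" rootEntity="false">
      <entity name="docs" processor="XPathEntityProcessor" url="${files.fileAbsolutePath}" useSolrAddSchema="true"/>
    </entity>
  </document>
</dataConfig>

An import is then triggered over HTTP, e.g. http://localhost:8983/solr/dataimport?command=full-import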
RE: occasional GC crashes
Hi Robi, Oh that's the thing of the past, go for the latest Java 7 if they let you! Otis -- Performance Monitoring - http://sematext.com/spm On Dec 20, 2012 6:29 PM, Petersen, Robert rober...@buy.com wrote: Hi Otis, I thought Java 7 had a bug which wasn't being addressed by Oracle which was making it not suitable for Solr. Did that get fixed now? [...]
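For anyone making that switch under Tomcat (the catalina.* paths above suggest that layout), a sketch of pointing the container at a newer JVM; the JDK path is illustrative:

# $CATALINA_HOME/bin/setenv.sh -- sourced by catalina.sh on startup if present
export JAVA_HOME=/usr/java/jdk1.7.0_10
export JRE_HOME=$JAVA_HOME

It is worth confirming afterwards with $JAVA_HOME/bin/java -version that the box really runs the JVM you expect.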
Re: Invalid version (expected 2, but 60) or the data in not in 'javabin'
Hi, Have a look at http://search-lucene.com/?q=invalid+version+javabin Otis -- Solr Monitoring - http://sematext.com/spm/index.html Search Analytics - http://sematext.com/search-analytics/index.html On Wed, Dec 19, 2012 at 11:23 AM, Shahar Davidson shah...@checkpoint.com wrote: Hi, I'm encountering this error randomly when running a distributed facet. (i.e. I'm sending the exact same request, yet this does not reproduce consistently.) I have about 180 shards that are being queried. It seems that when Solr distributes the request to the shards, one, or perhaps more, shards return an XML reply instead of Javabin. I added some debug output to JavaBinCodec.unmarshal (as done in the debugging.patch of SOLR-3258) to check whether the XML reply holds an error or not, and I noticed that the XML actually holds the response from one of the shards. I'm using the patch provided in SOLR-2894 on top of trunk 1404975. Has anyone encountered such an issue? Any ideas? Thanks, Shahar.
Re: Where does schema.xml's schema/@name displays?
Thank you. So, the conclusion for me is that @name can be skipped. It is not used in anything (or anything critical, anyway) and there is a default. That's good enough for me. On the other hand, having @version default to 1.0 is probably an oversight, given the number of changes present. Should it not default to the latest, or at least to 1.5 (and change periodically)? Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Fri, Dec 21, 2012 at 7:50 AM, Jack Krupansky j...@basetechnology.com wrote: Yeah... not sure how I missed it, but my search sees it now. Also, the name will default to schema.xml if you leave it out of the schema. -- Jack Krupansky -Original Message- From: Mikhail Khludnev Sent: Thursday, December 20, 2012 3:06 PM To: solr-user Subject: Re: Where does schema.xml's schema/@name displays? Jack, FWIW I've found an occurrence in SystemInfoHandler.java On Thu, Dec 20, 2012 at 6:32 PM, Jack Krupansky j...@basetechnology.com wrote: I checked the 4.x source code and, except for the fact that you will get a warning if you leave it out, nothing uses that name. But... that's not to say that a future release might not require it - the doc/comments don't explicitly say that it is optional. Note that the version attribute is optional (as per the source code, but no mention in doc/comments) and defaults to 1.0, with no warning. -- Jack Krupansky -Original Message- From: Alexandre Rafalovitch Sent: Thursday, December 20, 2012 12:08 AM To: solr-user@lucene.apache.org Subject: Where does schema.xml's schema/@name displays? Hello, In schema.xml, we have a name attribute on the root node. The documentation says it is for display purposes only. But for display where? It seems that the admin console uses the name in the solr.xml file instead. And deleting the name attribute does not seem to cause any problems either. The reason I ask is because I am writing an explanation example which involves the schema.xml config file being copied and modified over and over again. If @name is significant, I need to mention changing it. If not, I will just delete it altogether. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: Where does schema.xml's schema/@name displays?
: On another hand, having @version default to 1.0 is probably an oversight, : given the number of changes present. Should it not default to latest or : at least to 1.5 (and change periodically)? If the default value changed, then users w/o a version attribute in their schema would suddenly get very different behavior if they upgraded from one version of Solr to the next. -Hoss
Re: Where does schema.xml's schema/@name displays?
I agree actually (about not surprising the users). But the consequences of forgetting this value may also lead to some serious debugging issues. An interesting (not sure if reasonable) compromise would be to look at the error message for @version=1 combined with use of the @multiValued attribute: make sure Solr actually complains if it sees such a combination, with a message that explicitly says "What's your @version value? Maybe it needs to be explicit/more recent." Same with autoGeneratePhraseQueries and @version=1.4. Then, somebody patching together a config file from multiple sources will be guided in the right direction. Just a newbie-oriented thought. I am sure there are other more pressing things in the pipeline. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Fri, Dec 21, 2012 at 4:49 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : On another hand, having @version default to 1.0 is probably an oversight, : given the number of changes present. Should it not default to latest or : at least to 1.5 (and change periodically)? If the default value changed, then users w/o a version attribute in their schema would suddenly get very different behavior if they upgraded from one version of Solr to the next. -Hoss
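For readers following along, a sketch of making both attributes explicit on the root element; the name is illustrative, and the per-version notes are paraphrased from the comments in the stock example schema.xml, so treat them as a summary rather than the authoritative list:

<!-- name is informational/display-only; version controls low-level defaults -->
<schema name="products" version="1.5">
  <!-- version 1.1: multiValued defaults to false rather than true
       version 1.4: autoGeneratePhraseQueries defaults to off
       version 1.5: omitNorms defaults to true for primitive field types -->
  ...
</schema>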
Reply:How to add the extra analyzer jar?
The issue has been solved, and sorry for my negligence. At 2012-12-21 11:10:53, SuoNayi suonayi2...@163.com wrote: Hi all, for SolrCloud (Solr 4.0), how do I add a third-party analyzer? There is a third-party analyzer jar and I want to integrate it with SolrCloud. Here are my steps, but a ClassNotFoundException is thrown at startup. 1. Add the fieldType in schema.xml; here is a snippet:

<!-- ikanalyzer -->
<fieldType name="text_ik" class="solr.TextField">
  <analyzer class="org.wltea.analyzer.lucene.IKAnalyzer"/>
  <analyzer type="index">
    <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory" isMaxWordLength="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory" isMaxWordLength="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

2. Add the IKAnalyzer.cfg.xml and stopword.dic files into the classes directory of the solr.war (open the war and add those two files). 3. Use start.jar to start up, and the ClassNotFoundException is thrown. Could someone help me figure out what's wrong, or tell me where I can add the extra/third-party jar to the classpath of SolrCloud? Thanks, SuoNayi
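For anyone hitting the same ClassNotFoundException: Solr can load extra analyzer jars without repacking the war. A sketch, assuming a standard Solr 4.0 core layout; the directory and jar pattern are illustrative:

<!-- in solrconfig.xml; dir is resolved relative to the core's instanceDir -->
<lib dir="../../lib" regex="IKAnalyzer.*\.jar"/>

Alternatively, jars dropped into a lib directory under solr.solr.home (or a sharedLib configured in solr.xml) are picked up for all cores at startup.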
Re: Solr4.0 causes NoClassDefFoundError while indexing class files and mp4 files.
You can place the missing JAR files in the contrib/extraction/lib. For class files: asm-x.x.jar For mp4 files: aspectjrt-x.x.jar FWIW, please see https://issues.apache.org/jira/browse/SOLR-4209 Regards, Shinichiro Abe On 2012/12/21, at 15:08, Shigeki Kobayashi wrote: Hi, I use ManifoldCF1.1dev to crawl files and index them into Solr4.0 While indexing class files and mp4 files, Solr caused NoClassDefFoundError as following: Indexing a mp4 file 2012-12-19 06:16:48,485%P[solr.servlet.SolrDispatchFilter]-[TP-Processor44]-:null:java.lang.RuntimeException: java.lang.NoClassDefFoundError: org/aspectj/lang/Signature at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:469) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:297) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at filters.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:122) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) at org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:190) at org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:291) at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:774) at org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:703) at org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:896) at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:690) at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.NoClassDefFoundError: org/aspectj/lang/Signature at org.apache.tika.parser.mp4.MP4Parser.parse(MP4Parser.java:117) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276) ... 
18 more Caused by: java.lang.ClassNotFoundException: org.aspectj.lang.Signature at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:627) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) ... 29 more -- Indexing a class file 2012-12-19 08:10:58,327%P[solr.servlet.SolrDispatchFilter]-[TP-Processor3]-:null:java.lang.RuntimeException: java.lang.NoClassDefFoundError: org/objectweb/asm/ClassVisitor at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:469) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:297) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at filters.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:122) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at
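A sketch of the fix Abe-san describes, assuming the stock Solr 4.0 example layout where solrconfig.xml already pulls in contrib/extraction/lib through a <lib> directive; the jar versions shown are illustrative, so match whatever your Tika version expects:

cp asm-3.1.jar aspectjrt-1.6.11.jar /path/to/solr/contrib/extraction/lib/
# restart the servlet container so the new jars are loaded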
Which token filter can combine 2 terms into 1?
Hi, I am looking for a token filter that can combine 2 terms into 1. E.g. the input has been tokenized by whitespace: t1 t2 t2a t3 I want a filter that outputs: t1 t2t2a t3 I know it is a very special case, and I am thinking about developing a filter of my own. But I cannot figure out which API I should use to look at the terms in a token stream. -- Regards, David Shen http://about.me/davidshen https://twitter.com/#!/davidshen84
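The relevant API is TokenFilter.incrementToken() together with CharTermAttribute. A minimal sketch, assuming the Lucene 4.x analysis API; it concatenates every adjacent pair unconditionally, so the rule for deciding which neighbors to merge (e.g. only t2 followed by t2a) is left open, and offsets/position increments are not adjusted:

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public final class ConcatPairFilter extends TokenFilter {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

  public ConcatPairFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false; // underlying stream exhausted
    }
    String first = termAtt.toString(); // remember the current term
    if (input.incrementToken()) {
      // the attribute now holds the next term; emit first+second as one token
      String second = termAtt.toString();
      termAtt.setEmpty().append(first).append(second);
    } else {
      termAtt.setEmpty().append(first); // odd trailing term passes through alone
    }
    return true;
  }
}

If unconditional pairing is actually acceptable, solr.ShingleFilterFactory with maxShingleSize="2" and tokenSeparator="" produces concatenated bigrams (overlapping ones, plus the unigrams by default) without any custom code.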
Solr 3.5: java.lang.NegativeArraySizeException caused by negative start value
This is on Solr 3.5.0. We are getting a java.lang.NegativeArraySizeException when our webapp sends a query where the start parameter is set to a negative value. This seems to set off a denial of service problem within Solr. I don't yet know whether it's a mistake in coding, or whether some malicious user has found an attack vector on our site. After the first exception, another exception (org.mortbay.jetty.EofException) appears in the logs with increasing frequency. Within minutes of the first exception, the load balancer complains about having no servers available because ping requests are failing. This is distributed search, but the shards parameter is in solrconfig.xml, not provided by the client. Full exception: Dec 20, 2012 7:41:34 PM org.apache.solr.common.SolrException log SEVERE: java.lang.NegativeArraySizeException at org.apache.lucene.util.PriorityQueue.initialize(PriorityQueue.java:108) at org.apache.solr.handler.component.ShardFieldSortedHitQueue.init(ShardDoc.java:139) at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:712) at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:571) at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:550) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:289) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:208) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) Later exceptions: Dec 21, 2012 12:24:37 AM org.apache.solr.common.SolrException log SEVERE: org.mortbay.jetty.EofException at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:791) at org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:569) at org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:1012) at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:278) at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:122) at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:212) at 
org.apache.solr.common.util.FastWriter.flush(FastWriter.java:115) at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:344) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at
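Until the root cause is found (coding mistake or malicious client), a defensive guard in the webapp keeps bad paging values from ever reaching Solr. A sketch, assuming the webapp builds its requests with SolrJ; the class name and cap values are illustrative:

import org.apache.solr.client.solrj.SolrQuery;

public final class QueryGuard {
  /** Clamp user-supplied paging parameters before they reach Solr. */
  public static SolrQuery build(String userQueryText, int userStart, int userRows) {
    SolrQuery query = new SolrQuery(userQueryText);
    query.setStart(Math.max(0, userStart));              // a negative start triggered the exception
    query.setRows(Math.min(100, Math.max(0, userRows))); // cap the page size while at it
    return query;
  }
}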
Re: Solr4.0 causes NoClassDefFoundError while indexing class files and mp4 files.
Thanks Abe-san! Your advice is very informative. Thanks again. Regards, Shigeki 2012/12/21 Shinichiro Abe shinichiro.ab...@gmail.com: You can place the missing JAR files in the contrib/extraction/lib. For class files: asm-x.x.jar For mp4 files: aspectjrt-x.x.jar FWIW, please see https://issues.apache.org/jira/browse/SOLR-4209 [...]