Using recency rord on /distrib
Hi, I need to add recency boosting using the recip and rord functions to an app that uses the /distrib request handler. Can I put the bf param in /distrib and call the URL directly, like http://localhost:8983/solr/distrib/?q=cable, where in the /distrib request handler bf is defined as <str name="bf">recip(rord(last_sold_date),1,1000,1000)^0.7</str>? I am not able to see any difference in the results with or without the bf param defined. Please share your views. regards, Pooja
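For reference, a boost-function setup along these lines would normally live in the handler's defaults in solrconfig.xml. A sketch (the handler name and field are taken from the question; the defType line is an assumption worth checking, since bf is only honored by the dismax family of query parsers, not the standard parser — a common reason the results look unchanged):

```xml
<requestHandler name="/distrib" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- bf is a dismax parameter; with the standard parser it is ignored -->
    <str name="defType">dismax</str>
    <str name="qf">name</str>
    <str name="bf">recip(rord(last_sold_date),1,1000,1000)^0.7</str>
  </lst>
</requestHandler>
```

Adding &debugQuery=true to the request shows the score breakdown, so you can see whether the function query is contributing at all.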
Re: Can solr build on top of HBase
hi, thanks. I can now index data from HBase into the Solr server using the Nutch core. But the index data is stored locally, and that is what I worry about: it will grow too large on local disk. I have never used MountableHDFS, and I am not sure whether Solr can write the index into HDFS; I doubt it can work without implementing Writable for HDFS. I think the point is reading and writing the index file in HDFS just like on a local filesystem. Could a new index file format be created that works on HDFS? If so, I think that would be a great help for distributed indexing. Since Solr is built on top of Lucene, would it be easy to implement an HDFS file format? 2009/9/24 Amit Nithian anith...@gmail.com: Would FUSE (http://wiki.apache.org/hadoop/MountableHDFS) be of use? I wonder if you could take the data from HBase and index it into a Lucene index stored on HDFS. 2009/9/23 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: Can HBase be mounted on the filesystem? Solr can only read data from a filesystem. On Thu, Sep 24, 2009 at 7:27 AM, 梁景明 futur...@gmail.com wrote: hi, I use HBase and Solr, and now I have a large amount of data to index, which means the Solr index will be large; as the data increases it will grow even larger. So for solrconfig.xml's <dataDir>/solrhome/data</dataDir>, can I set it from the API and point it to my distributed HBase data storage? And if the index is too large, will it be slow? thanks. -- - Noble Paul | Principal Engineer | AOL | http://aol.com
Re: Can solr build on top of HBase
I don't think using HDFS or HBase will perform well for this kind of thing at all. If you are that large, you should look into distributing your index into shards and using Solr's distributed search capabilities. -Grant On Sep 24, 2009, at 3:25 AM, 梁景明 wrote: [...] -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: define index at search time
No, I am talking about having multiple indexes. I want to send the index name to the searcher so it will search that index, rather than use the one defined in the schema/solrconfig. Nothing to do with multiple cores; I mean different indexes entirely, with completely different content. Avlesh Singh wrote: Are you talking about multiple cores? Cheers, Avlesh On Mon, Sep 21, 2009 at 9:15 PM, DHast hastings.recurs...@gmail.com wrote: Is there a way I can actually tell Solr which index I want it to search against with the query? I know it will cost a bit on performance, but it would be helpful: I have many indexes and it would be nice to determine which one should be used by the user. thanks -- View this message in context: http://www.nabble.com/define-index-at-search-time-tp25530378p25564438.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: define index at search time
Well, after looking at http://wiki.apache.org/solr/CoreAdmin, perhaps multiple cores is what I want. DHast wrote: [...] -- View this message in context: http://www.nabble.com/define-index-at-search-time-tp25530378p25564937.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Multivalue Field Cache
Have a look at UninvertedField.java. I think that might help. On Sep 23, 2009, at 2:35 PM, Amit Nithian wrote: Are there any good implementations of a field cache that will return all values of a multivalued field? I am in the process of writing one for my immediate needs, but I was wondering if there is a complete implementation that handles the different field types. If not, I can continue on with mine and donate it back. Thanks! Amit -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: Finding near duplicates while searching Documents
On Sep 23, 2009, at 2:55 PM, Jason Rutherglen wrote: I don't think this handles near duplicates, which would require some of the methods mentioned recently on the Mahout list. It's pluggable, and I believe the TextProfileSignature is a fuzzy implementation in Solr that was brought over from Nutch. Agree on the Mahout discussion, too, though: http://www.lucidimagination.com/search/document/9d7ad3a882e2a944/finding_the_similarity_of_documents_using_mahout_for_deduplication#b0321c0f25f835a0 On Wed, Sep 23, 2009 at 2:59 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Wed, Sep 23, 2009 at 3:14 PM, Ninad Raut hbase.user.ni...@gmail.com wrote: Hi, when we crawl news content we face a problem of the same content being repeated in many documents. We want to add a near-duplicate document filter to detect such documents. Is there a way to do that in Solr? Look at http://wiki.apache.org/solr/Deduplication -- Regards, Shalin Shekhar Mangar. -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Showcase: Faceted Search for Wine using Solr
Hello everybody! The purpose of this mail is to say thank you to the creators of Solr and to the community that supports it. We released our first project using Solr several weeks ago, after having tested Solr for several months. The project I'm talking about is a product search for an online wine shop (sorry, German user interface only): http://www.koelner-weinkeller.de/index.php?id=sortiment Our client offers about 3000 different wines and other related products. Before we introduced Solr, the products were searched via complicated and slow SQL statements, with all kinds of problems related to that: no full-text indexing, no stemming, etc. We are happy to make use of several built-in features which solve problems that bugged us, faceted search, German accents and stemming, and synonyms being the most important ones. The surrounding website is TYPO3-driven. We integrated Solr by creating our own frontend plugin which talks to the Solr web service (and we're very happy about the PHP output type!). I'd be glad to hear your comments. Cheers, Marian
Sorting/paging problem
I've run into a strange issue with my Solr installation. I'm running queries that sort by a DateField field, but from time to time I'm seeing individual records very much out of order. What's more, they appear on multiple pages of my result set. Let me give an example. Starting with a basic query, I sort on the date the document was added to the index and see these rows on the first page (I'm just showing the date field here): <doc><date name="indexed_date">2009-09-23T19:24:47.419Z</date></doc> <doc><date name="indexed_date">2009-09-23T19:25:03.229Z</date></doc> <doc><date name="indexed_date">2009-09-23T19:25:03.400Z</date></doc> <doc><date name="indexed_date">2009-09-23T19:25:19.951Z</date></doc> <doc><date name="indexed_date">2009-09-23T20:10:07.919Z</date></doc> Note how the last document's date jumps a bit. Not necessarily a problem, but the next page looks like this: <doc><date name="indexed_date">2009-09-23T19:26:16.022Z</date></doc> <doc><date name="indexed_date">2009-09-23T19:26:32.547Z</date></doc> <doc><date name="indexed_date">2009-09-23T19:27:45.470Z</date></doc> <doc><date name="indexed_date">2009-09-23T19:27:45.592Z</date></doc> <doc><date name="indexed_date">2009-09-23T20:10:07.919Z</date></doc> So not only is the date sorting wrong, but the exact same document shows up on the next page, also still out of date order. I've seen the same document show up on 4-5 pages in some cases. It's always the last record on the page, too. If I change the page size, the problem seems to disappear for a while, but then starts up again later. Also, running the same query/queries later on doesn't show the same behavior. Could it be some sort of page boundary issue with the cache? Has anyone else run into a problem like this? I'm using the Sept 22 nightly build. - Charlie
Re: Can we point a Solr server to index directory dynamically at runtime..
Using a multicore approach, you could send a "create a core named 'core3weeksold' pointing to '/datadirs/3weeksold'" command to a live Solr, which would spin it up on the fly. Then you query it, and maybe keep it spun up until it hasn't been queried for 60 seconds or so, then send a command to remove the core 'core3weeksold'. See http://wiki.apache.org/solr/CoreAdmin#CoreAdminHandler . Michael On Thu, Sep 24, 2009 at 12:31 AM, Silent Surfer silentsurfe...@yahoo.com wrote: Hi, Is there any way to dynamically point the Solr servers to index/data directories at run time? We are generating 200 GB worth of index per day and we want to retain the index for approximately 1 month. Our idea is to keep the first week of index available at any time for the users, i.e. have a set of Solr servers up and running to handle requests for the past week of data. But when a user queries data that is older than 7 days, we want to dynamically point the existing Solr instances to the inactive/dormant indexes and get the results. The main intention is to limit the number of Solr slave instances and thereby limit the number of servers required. If the index directories and Solr instances are tightly coupled, then most of the Solr instances are just up and running but hardly used, as most users are mainly interested in the past week of data and not beyond that. Any thoughts or other approaches to tackle this would be greatly appreciated. Thanks, sS
Alphanumeric Wild Card Search Question
Hello Solr Users, I've tried to find the answer to this question, and have tried changing my configuration several times, but to no avail. I think someone on this list will know the answer. Here's my question: I have some products that I want to allow people to search for with wild cards. For example, if my product is YBM354, I'd like for users to be able to search on YBM*, YBM3*, YBM35* and for any of these searches to return that product. I've found that I can search for YBM* and get the product, just not the other combinations. I found this: http://www.nabble.com/Can%C2%B4t-use-wildcard-%22*%22-on-alphanumeric-values--td24369209.html, but adding preserveOriginal=1 doesn't seem to make a difference. I found an example here: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory that is close, but I want to do the opposite. The example is: Super-Duper-XL500-42-AutoCoder! - 0:Super, 1:Duper, 2:XL, 2:SuperDuperXL, 3:500 4:42, . In this example, I want to be able to find this record by searching for XL5*. I appreciate the help. Please let me know if there are any questions. Thanks, Adrian Carr
RE: Alphanumeric Wild Card Search Question
Here's my question: I have some products that I want to allow people to search for with wild cards. For example, if my product is YBM354, I'd like for users to be able to search on YBM*, YBM3*, YBM35* and for any of these searches to return that product. I've found that I can search for YBM* and get the product, just not the other combinations. Are you using WordDelimiterFilterFactory? That would explain this behavior. If so, do you need it - for the queries you describe you don't need that kind of tokenization. Also, have you played with the analysis tool on the admin page, it is a great help in debugging things like this. -Ken
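To make the suggestion concrete: for part numbers like YBM354, a field type that does no word-splitting at all often behaves better with wildcards. A sketch of such a schema.xml fragment (the type and field names here are made up for illustration):

```xml
<!-- Treat the whole product code as a single lowercased token -->
<fieldType name="partno" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="product_code" type="partno" indexed="true" stored="true"/>
```

One caveat worth knowing: wildcard query terms are not run through the analyzer, so a query like YBM35* will not be lowercased for you; lowercase it in the client (ybm35*) so it matches the lowercased indexed token.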
download pre-release nightly solr 1.4
Hi, I know Solr 1.4 is going to be released any day now, pending the Lucene 2.9 release. Is there anywhere one can download a pre-release nightly build of Solr 1.4, just for getting familiar with new features (e.g. field collapsing)? Thanks, Michael
Re: download pre-release nightly solr 1.4
michael8 wrote: [...] You can download nightlies here: http://people.apache.org/builds/lucene/solr/nightly/ Field collapsing won't be in 1.4, though. You have to build from svn after applying the patch for that. -- - Mark http://www.lucidimagination.com
Looking for suggestion of WordDelimiter filter config and 'ALMA awards'
Hi, I have a situation that I believe is very common, but I was curious whether anyone knows the right way to go about solving it. I have a document with 'ALMA awards' in it. However, when a user searches for 'aLMA awards', no results are found, while searches for 'alma awards' or 'ALMA awards' return the right results as expected. I immediately went to solr/admin/analysis to see what is going on with the indexing of 'ALMA awards' and the query parsing of 'aLMA awards', and it looks like WordDelimiter is causing the mismatch. WordDelimiter, with splitOnCaseChange=1, turns my search query 'aLMA awards' into 'a', 'LMA', and 'awards', which is exactly what splitOnCaseChange does. Is there a proper way to handle a user simply getting the case wrong for the first letter, or maybe n letters? I like the benefits that the WordDelimiter filter with splitOnCaseChange provides, but I am not sure how to solve this situation without compromising the other benefits this filter provides. I also tried preserveOriginal=1, hoping that 'aLMA' would be preserved and later lowercased to 'alma' by another filter, but with no luck. P.S.: I am basically using the standard config for the 'text' fieldtype for my default search field (Solr 1.3). Thanks, Michael
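One combination worth double-checking here (a sketch, not a tested answer): preserveOriginal only helps if a LowerCaseFilterFactory runs after the WordDelimiterFilterFactory on the query side, so the preserved token 'aLMA' gets folded to 'alma' before it hits the index. Something along these lines in the fieldType:

```xml
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- preserveOriginal keeps 'aLMA' alongside the split tokens 'a' and 'LMA' -->
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" splitOnCaseChange="1" preserveOriginal="1"/>
  <!-- must come after WordDelimiter so the preserved token is lowercased -->
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```

The admin analysis page shows the token stream after each filter stage, which makes it easy to verify whether 'alma' actually survives to the end of the chain.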
Solr highlighting doesn't respect quotes
If I do a query for a couple of words in quotes, Solr correctly only returns pages where those words appear exactly as the quoted phrase. But the highlighting acts as if the words were given separately, and stems them and everything. For example, if I search for "knee pain", it returns a document that has the phrase "knee pain", and doesn't return documents that have knee and pain with other words between them. However, with highlighting turned on, the highlighted field will have knee, knees, pain, and pains highlighted even when they aren't next to each other. For instance: <response><lst name='responseHeader'><int name='status'>0</int> <int name='QTime'>45</int> <lst name='params'><str name='explainOther'/> <str name='fl'>*,score</str> <str name='indent'>on</str> <str name='start'>0</str> <str name='q'>knee pain</str> <str name='hl.fl'>text</str> <str name='qt'>standard</str> <str name='wt'>standard</str> <str name='hl'>on</str> <str name='rows'>10</str> <str name='version'>2.2</str> </lst> </lst> <lst name='2: http://news.prnewswire.com/DisplayReleaseContent.aspx?ACCT=ind_focus.story&amp;STORY=/www/story/09-24-2009/0005100306&amp;EDATE='><arr name='text'><str>I had one injection in each <em>knee</em> and my doctor said it could relieve my <em>knee</em> <em>pain</em> for up to six</str> </arr> </lst> -- http://www.linkedin.com/in/paultomblin
OutOfMemoryError due to auto-warming
Hi there, We are running Solr with 1GB allocated to it and we keep hitting OutOfMemoryErrors. We get messages like this: Error during auto-warming of key:org.apache.solr.search.queryresult...@c785194d:java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOfRange(Arrays.java:3209) at java.lang.String.<init>(String.java:216) at org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java:122) at org.apache.lucene.index.SegmentTermEnum.term(SegmentTermEnum.java:169) at org.apache.lucene.search.FieldCacheImpl$StringIndexCache.createValue(FieldCacheImpl.java:701) at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:208) at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:676) at org.apache.solr.search.MissingLastOrdComparator.setNextReader(MissingStringLastComparatorSource.java:181) at org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:94) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:252) at org.apache.lucene.search.Searcher.search(Searcher.java:173) at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988) at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884) at org.apache.solr.search.SolrIndexSearcher.access$000(SolrIndexSearcher.java:51) at org.apache.solr.search.SolrIndexSearcher$3.regenerateItem(SolrIndexSearcher.java:332) at org.apache.solr.search.LRUCache.warm(LRUCache.java:194) at org.apache.solr.search.SolrIndexSearcher.warm(SolrIndexSearcher.java:1481) at org.apache.solr.core.SolrCore$3.call(SolrCore.java:1154) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) And like this: Error during auto-warming of key:org.apache.solr.search.queryresult...@33cf792:java.lang.OutOfMemoryError: Java heap space We've searched, and one suggestion was to reduce the size of the various caches that do sorting in solrconfig.xml (http://osdir.com/ml/solr-user.lucene.apache.org/2009-05/msg01043.html). Does this solution generally work? Can anyone think of any other cause for this problem? didier
RE: OutOfMemoryError due to auto-warming
You can also increase the JVM heap size if you have enough physical memory; for example, if you have 4GB physical, give the JVM a heap size of 2GB or 2.5GB. Francis -Original Message- From: didier deshommes [mailto:dfdes...@gmail.com] Sent: Thursday, September 24, 2009 3:32 PM To: solr-user@lucene.apache.org Cc: Andrew Montalenti Subject: OutOfMemoryError due to auto-warming [...]
Re: OutOfMemoryError due to auto-warming
On Thu, Sep 24, 2009 at 5:40 PM, Francis Yakin fya...@liquid.com wrote: You can also increase the JVM heap size if you have enough physical memory; for example, if you have 4GB physical, give the JVM a heap size of 2GB or 2.5GB. Thanks, we can definitely do that (we have 4GB available). I also forgot to add that we're running a development version of Solr (git clone from ~3 weeks ago). Thanks, didier [...]
RE: OutOfMemoryError due to auto-warming
Reducing the size of the queryResultCache in solrconfig seems to fix the issue as well: <!-- Maximum number of documents to cache for any entry in the queryResultCache. --> <queryResultMaxDocsCached>200</queryResultMaxDocsCached> down from 500: <queryResultMaxDocsCached>500</queryResultMaxDocsCached> Francis -Original Message- From: didier deshommes [mailto:dfdes...@gmail.com] Sent: Thursday, September 24, 2009 3:32 PM To: solr-user@lucene.apache.org Cc: Andrew Montalenti Subject: OutOfMemoryError due to auto-warming [...]
Re: Solr highlighting doesn't respect quotes
Set the hl.usePhraseHighlighter parameter to true: http://wiki.apache.org/solr/HighlightingParameters#hl.usePhraseHighlighter Koji Paul Tomblin wrote: [...]
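If phrase queries are the norm, the parameter can also be set once in the handler defaults in solrconfig.xml instead of on every request. A sketch (the handler name here is the stock one from the example config; adjust to whichever handler is actually in use):

```xml
<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <!-- highlight only terms that matched as part of the phrase -->
    <bool name="hl.usePhraseHighlighter">true</bool>
  </lst>
</requestHandler>
```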
Re: Seattle / PNW Hadoop/Lucene/HBase Meetup, Wed Sep 30th
Friendly Reminder! One week to go. On Mon, Sep 14, 2009 at 11:35 AM, Bradford Stephens bradfordsteph...@gmail.com wrote: Greetings, It's time for another Hadoop/Lucene/Apache Cloud Stack meetup! This month it'll be on Wednesday, the 30th, at 6:45 pm. We should have a few interesting guests this time around -- someone from Facebook may be stopping by to talk about Hive :) We've had great attendance in the past few months; let's keep it up! I'm always amazed by the things I learn from everyone. We're back at the University of Washington, Allen Computer Science Center (not Computer Engineering). Map: http://www.washington.edu/home/maps/?CSE Room: 303 -or- the entry level. If there are changes, signs will be posted. More Info: The meetup is about 2 hours (and there's usually food): we'll have two in-depth talks of 15-20 minutes each, and then several lightning talks of 5 minutes. We'll then have discussion and 'social time'; if no one offers to present, we'll just have general discussion. Let me know if you're interested in speaking or attending. We'd like to focus on education, so every presentation *needs* to ask some questions at the end. We can talk about these after the presentations, and I'll record what we've learned in a wiki and share that with the rest of us. Contact: Bradford Stephens, 904-415-3009, bradfordsteph...@gmail.com Cheers, Bradford -- http://www.roadtofailure.com -- The Fringes of Scalability, Social Media, and Computer Science
Use cases for ReplicationHandler's backup facility?
The ReplicationHandler (http://wiki.apache.org/solr/SolrReplication) has support for backups, which can be triggered in one of two ways: 1. in response to startup/commit/optimize events (specified through the backupAfter tag specified in the handler's requestHandler tag in solrconfig.xml) 2. by manually hitting http://master_host:port/solr/replication?command=backup These backups get placed in directories named, e.g. snapshot.20090924033521, inside the solr data directory. According to the docs, these backups are not necessary for replication to work. My question is: What use case *are* they meant to address? The first potential use case that came to mind was that maybe I would be able to restore my index from these snapshot directories should it ever become corrupted. (I could just do something like rm -r data; mv snapshot.20090924033521 data.) That appears not to be one of the intended use cases, though; if it were, then I imagine the snapshot directories would contain the entire index, whereas they seem to contain only deltas of one form or another. Thanks, Chris
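For reference, wiring up trigger style 1 looks roughly like this in solrconfig.xml (a sketch based on the SolrReplication wiki page linked above, not a copy of any particular config):

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- take a backup whenever an optimize finishes; commit and startup also work -->
    <str name="backupAfter">optimize</str>
  </lst>
</requestHandler>
```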
Re: Solrj possible deadlock
Well, in the same process I am using a JDBC connection to get all the relative paths to the documents I want to index, then I parse the documents to plain text using tons of open source libraries like POI, PDFBox etc. (which might account for java2d), then I add them to the index and commit every 2000 documents. I write a db row for each document I index so I can resume where I left off after a crash or exception. My app will happily index for hours before it hangs; the resumed indexing run will only last a few additional minutes! The thread dumps look the same. Cheers. ryantxu wrote: do you have anything custom going on? The fact that the lock is in java2d seems suspicious... On Sep 23, 2009, at 7:01 PM, pof wrote: I had the same problem again yesterday except the process halted after about 20mins this time. pof wrote: Hello, I was running a batch index the other day using the Solrj EmbeddedSolrServer when the process abruptly froze in its tracks after running for about 4-5 hours and indexing ~400K documents. There were no document locks so it would seem likely that there was some kind of thread deadlock.
I was hoping someone might be able to tell me some information about the following thread dump taken at the time:

Full thread dump OpenJDK Client VM (1.6.0-b09 mixed mode):

"DestroyJavaVM" prio=10 tid=0x9322a800 nid=0xcef waiting on condition [0x..0x0018a044]
   java.lang.Thread.State: RUNNABLE

"Java2D Disposer" daemon prio=10 tid=0x0a28cc00 nid=0xf1c in Object.wait() [0x0311d000..0x0311def4]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x97a96840> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:133)
        - locked <0x97a96840> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:149)
        at sun.java2d.Disposer.run(Disposer.java:143)
        at java.lang.Thread.run(Thread.java:636)

"pool-1-thread-1" prio=10 tid=0x93a26c00 nid=0xcf7 waiting on condition [0x08a6a000..0x08a6b074]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x967acfd0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1978)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:386)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:636)

"Low Memory Detector" daemon prio=10 tid=0x93a00c00 nid=0xcf5 runnable [0x..0x]
   java.lang.Thread.State: RUNNABLE

"CompilerThread0" daemon prio=10 tid=0x09fe9800 nid=0xcf4 waiting on condition [0x..0x096a7af4]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x09fe8800 nid=0xcf3 waiting on condition [0x..0x]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x09fd7000 nid=0xcf2 in Object.wait() [0x005ca000..0x005caef4]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x966e6d40> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:133)
        - locked <0x966e6d40> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:149)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:177)

"Reference Handler" daemon prio=10 tid=0x09fd2c00 nid=0xcf1 in Object.wait() [0x00579000..0x00579d74]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x966e6dc8> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:502)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
        - locked <0x966e6dc8> (a java.lang.ref.Reference$Lock)

"VM Thread" prio=10 tid=0x09fcf800 nid=0xcf0 runnable

"VM Periodic Task Thread" prio=10 tid=0x93a02400 nid=0xcf6 waiting on condition

JNI global references: 1072

Heap
 def new generation   total 36288K, used 23695K [0x93f1, 0x9667, 0x9667)
  eden space 32256K,  73% used [0x93f1, 0x95633f60, 0x95e9)
  from space 4032K,   0% used [0x95e9, 0x95e9, 0x9628)
  to   space 4032K,   0% used [0x9628, 0x9628, 0x9667)
 tenured generation   total 483968K, used 72129K
Re: Parallel requests to Tomcat
Are you on Java 5, 6 or 7? Each release sees some tweaking of the Java multithreading model as well as performance improvements (and bug fixes) in the Sun HotSpot runtime. You may be tripping over the TCP/IP multithreaded connection manager. You might wish to create each client thread with a separate socket. Also, here is a standard bit of benchmarking advice: include think time. This means that instead of sending requests constantly, each thread should pause for a few seconds before sending the next request. This simulates a user stopping and thinking before clicking the mouse again. It helps simulate the quantity of threads, etc. which are stopped and waiting at each stage of the request pipeline. As it is, you are trying to simulate the throughput behaviour without simulating the horizontal volume. (Benchmarking is much harder than it looks.) On Wed, Sep 23, 2009 at 9:43 AM, Grant Ingersoll gsing...@apache.org wrote: On Sep 23, 2009, at 12:09 PM, Michael wrote: On Wed, Sep 23, 2009 at 12:05 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Wed, Sep 23, 2009 at 11:47 AM, Michael solrco...@gmail.com wrote: If this were IO bound, wouldn't I see the same results when sending my 8 requests to 8 Tomcats? There's only one disk (well, RAM) whether I'm querying 8 processes or 8 threads in 1 process, right? Right - I was thinking IO bound at the Lucene Directory level - which synchronized in the past and led to poor concurrency. But your Solr version is recent enough to use the newer unsynchronized method by default (on non-Windows). Ah, OK. So it looks like comparing to Jetty is my only next step. Although I'm not sure what I'm going to do based on the result of that test -- if Jetty behaves differently, then I still don't know why the heck Tomcat is behaving badly! :) Have you done any profiling to see where hotspots are? Have you looked at garbage collection? Do you have any full collections occurring? What garbage collector are you using?
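The think-time advice can be sketched as a tiny client loop. This is a standalone illustration, not SolrJ code: the query itself is stubbed out with a print, and the pauses are milliseconds rather than the seconds a real benchmark would use:

```java
import java.util.Random;

public class ThinkTimeClient {
    public static void main(String[] args) throws InterruptedException {
        Random rng = new Random(42); // fixed seed so runs are repeatable
        for (int i = 0; i < 3; i++) {
            // in a real benchmark this would be an HTTP query against Solr
            System.out.println("request " + i);
            // think time: pause before the next request, like a user reading results
            Thread.sleep(10 + rng.nextInt(20));
        }
        System.out.println("done");
    }
}
```

In a real test each such loop would run in its own thread with its own socket, per the advice above.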
How often are you updating/committing, etc? -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search -- Lance Norskog goks...@gmail.com
Re: Very big numbers
There is no bignum support in Solr at this time. You can pick a fixed-length string with leading zeros -- that is, pad every value to the same length as the largest:

99,999,999,999,999.99
00,000,999,999,999.99

You can do sorted queries, range queries, and facets with this format. Solr is generally not a math engine, so you won't miss much. On Wed, Sep 23, 2009 at 1:56 PM, Jonathan Ariel ionat...@gmail.com wrote: Hi! I need to index in solr very big numbers. Something like 99,999,999,999,999.99 Right now i'm using an sdouble field type because I need to make range queries on this field. The problem is that the field value is being returned in scientific notation. Is there any way to avoid that? Thanks! Jonathan -- Lance Norskog goks...@gmail.com
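The padding trick, sketched in Java: store the amount as an integer number of cents (an assumption on my part; Solr provides no such helper) and left-pad it to a fixed width, so the string sort order used by a Solr string field matches numeric order:

```java
public class PadNum {
    // left-pad to 18 digits; wide enough for 99,999,999,999,999.99 in cents
    static String pad(long cents) {
        return String.format("%018d", cents);
    }

    public static void main(String[] args) {
        System.out.println(pad(9999999999999999L)); // 99,999,999,999,999.99
        System.out.println(pad(99999999999L));      // 999,999,999.99
        // lexicographic comparison now agrees with numeric comparison
        System.out.println(pad(9999999999999999L).compareTo(pad(99999999999L)) > 0);
    }
}
```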
Re: Mixed field types and boolean searching
No - there are various analyzers. StandardAnalyzer is geared toward searching bodies of text for interesting words - punctuation is ripped out. Other analyzers are more useful for literal text. You may have to work at finding one that leaves punctuation in. On Wed, Sep 23, 2009 at 2:14 PM, Ensdorf Ken ensd...@zoominfo.com wrote: Hi - let's say you have two indexed fields, F1 and F2. F1 uses the StandardAnalyzer, while F2 doesn't. Now imagine you index a document where you have F1="A B" and F2="C + D". Now imagine you run a query: (F1:A OR F2:A) AND (F1:B OR F2:B) - in other words, both A and B must exist in at least one of F1 or F2. This returns the document in question. Now imagine you run another query: (F1:A OR F2:A) AND (F1:+ OR F2:+). Since + is removed by the StandardAnalyzer, the parsed query looks like (F1:A OR F2:A) AND (F2:+) Now you don't match the document. Is this a bug? Thanks! -Ken -- Lance Norskog goks...@gmail.com
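As a sketch of that suggestion, a schema.xml field type built on the whitespace tokenizer keeps punctuation such as the + in "C + D" attached to tokens (the type name text_verbatim is made up; the factory classes are stock Solr analyzers):

```xml
<fieldType name="text_verbatim" class="solr.TextField">
  <analyzer>
    <!-- split on whitespace only, so "+" survives as its own token -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```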
Re: Solr http post performance seems slow - help?
In top, press the '1' key. This will give a list of the CPUs and how much load is on each; the display is otherwise a little weird for multi-CPU machines. But don't be surprised when Solr is I/O bound. The biggest, fanciest RAID is often a better investment than CPUs. On one project we bought low-end rack servers that come with 6-8 disk bays and filled them with 10k/15k RPM disks. On Wed, Sep 23, 2009 at 2:47 PM, Dan A. Dickey dan.dic...@savvis.net wrote: On Friday 11 September 2009 11:06:20 am Dan A. Dickey wrote: ... Our JBoss expert and I will be looking into why this might be occurring. Does anyone know of any JBoss related slowness with Solr? And does anyone have any other sort of suggestions to speed indexing performance? Thanks for your help all! I'll keep you up to date with further progress. Ok, further progress... just to keep any interested parties up to date and for the record... I'm finding that using the example jetty setup (will be switching very very soon to a real jetty installation) is about the fastest. Using several processes to send posts to Solr helps a lot, and we're seeing about 80 posts a second this way. We also stripped down JBoss to the bare bones and the Solr in it is running nearly as fast - about 50 posts a second. It was our previous JBoss configuration that was making it appear slow for some reason. We will be running more tests and spreading out the pre-index workload across more machines and more processes. In our case we were seeing the bottleneck being one machine running 18 processes. The 2 quad-core Xeon system is experiencing about a 25% cpu load. And I'm not certain, but I think this may actually be 25% of one of the 8 cores. So, there's *lots* of room for Solr to be doing more work there. -Dan -- Dan A. Dickey | Senior Software Engineer Savvis 10900 Hampshire Ave. S., Bloomington, MN 55438 Office: 952.852.4803 | Fax: 952.852.4951 E-mail: dan.dic...@savvis.net -- Lance Norskog goks...@gmail.com
Re: solr caching problem
There are now two excellent books: Lucene In Action 2 and Solr 1.4 Enterprise Search Server, that describe the inner workings of these technologies and how they fit together. Otherwise, Solr and Lucene knowledge is only available in fragmented form across many wiki pages, bug reports and email discussions. But the direct answer is: before you commit your changes you will not see them in queries. When you commit them, all caches are thrown away and rebuilt as you repeat the queries you did before. This rebuilding process has various tools to control it in solrconfig.xml. On Wed, Sep 23, 2009 at 8:27 PM, satya tosatyaj...@gmail.com wrote: Is there any way to analyze or see which documents are getting cached by documentCache - <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/> On Wed, Sep 23, 2009 at 8:10 AM, satya tosatyaj...@gmail.com wrote: First of all, thanks a lot for the clarification. Is there any way to see how this cache is working internally, which objects are being stored, and how much memory it's consuming, so that we can get a clear picture in mind? And how to test the performance through the cache? On Tue, Sep 22, 2009 at 11:19 PM, Fuad Efendi f...@efendi.ca wrote: 1) Then do you mean, if we delete a particular doc, that it is going to be deleted from the cache also? When you delete a document, and then COMMIT your changes, new caches will be warmed up (and prepopulated by some key-value pairs from old instances), etc: <!-- documentCache caches Lucene Document objects (the stored fields for each document). Since Lucene internal document ids are transient, this cache will not be autowarmed. --> <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/> - this one won't be 'prepopulated'. 2) In solr, is the cache storing the entire document in memory or only references to documents in memory?
There are many different cache instances; DocumentCache should store <ID, Document> pairs, etc. -- Lance Norskog goks...@gmail.com
Re: Solrj possible deadlock
: Well, in the same processes I am using a jdbc connection to get all the : relative paths to the documents I want to index, then I parse the documents : to plain text using tons of open source libraries like POI, PDFBox : etc. (which might account for java2d) then I add them to the index and commit : every 2000 documents. Since nothing in your thread dumps refers to solrj or solr (either as the current method, or in the call stack suggesting that it's code called by Solr(J)), there's really no indication that the problem is even remotely solr related. I suspect that if you commented out all of the code where you use SolrJ, so that you still did all of the parsing and then just wrote the resulting data to /dev/null, you would probably still see this behavior -- perhaps one of the other libraries you are using has some semantics you aren't obeying (ie: a parser that must be used single threaded, or an object that must be closed so that it can reset some static state) that is causing this problem only after some time has elapsed (or on particular permutations of data) -Hoss
Re: Sorting/paging problem
Which version of Java are you using? Please try the standard tricks: Do a fresh checkout of the Solr trunk. Do 'ant clean dist' and use the newly built war with the latest Lucene libraries. Try changing the JVM startup parameters which control how incremental compilation works: -server and others. Also try changing the garbage collection algorithms. On Thu, Sep 24, 2009 at 9:49 AM, Charlie Jackson charlie.jack...@cision.com wrote: I've run into a strange issue with my Solr installation. I'm running queries that are sorting by a DateField field but from time to time, I'm seeing individual records very much out of order. What's more, they appear on multiple pages of my result set. Let me give an example. Starting with a basic query, I sort on the date that the document was added to the index and see these rows on the first page (I'm just showing the date field here):

<doc><date name="indexed_date">2009-09-23T19:24:47.419Z</date></doc>
<doc><date name="indexed_date">2009-09-23T19:25:03.229Z</date></doc>
<doc><date name="indexed_date">2009-09-23T19:25:03.400Z</date></doc>
<doc><date name="indexed_date">2009-09-23T19:25:19.951Z</date></doc>
<doc><date name="indexed_date">2009-09-23T20:10:07.919Z</date></doc>

Note how the last document's date jumps a bit. Not necessarily a problem, but the next page looks like this:

<doc><date name="indexed_date">2009-09-23T19:26:16.022Z</date></doc>
<doc><date name="indexed_date">2009-09-23T19:26:32.547Z</date></doc>
<doc><date name="indexed_date">2009-09-23T19:27:45.470Z</date></doc>
<doc><date name="indexed_date">2009-09-23T19:27:45.592Z</date></doc>
<doc><date name="indexed_date">2009-09-23T20:10:07.919Z</date></doc>

So, not only is the date sorting wrong, but the exact same document shows up on the next page, also still out of date order. I've seen the same document show up in 4-5 pages in some cases. It's always the last record on the page, too. If I change the page size, the problem seems to disappear for a while, but then starts up again later. Also, running the same query/queries later on doesn't show the same behavior.
Could it be some sort of page boundary issue with the cache? Has anyone else run into a problem like this? I'm using the Sept 22 nightly build. - Charlie -- Lance Norskog goks...@gmail.com
Re: Showcase: Facetted Search for Wine using Solr
Hi Marian, Looks great! Wish I could order some wine. When you get a chance, please add the site to http://wiki.apache.org/solr/PublicServers! Cheers, Grant On Sep 24, 2009, at 11:51 AM, marian.steinbach wrote: Hello everybody! The purpose of this mail is to say thank you to the creators of Solr and to the community that supports it. We released our first project using Solr several weeks ago, after having tested Solr for several months. The project I'm talking about is a product search for an online wine shop (sorry, german user interface only): http://www.koelner-weinkeller.de/index.php?id=sortiment Our client offers about 3000 different wines and other related products. Before we introduced Solr, the products were searched via complicated and slow SQL statements, with all kinds of problems related to that. No full text indexing, no stemming etc. We are happy to make use of several built-in features which solve problems that bugged us: faceted search, german accents and stemming, and synonyms being the most important ones. The surrounding website is TYPO3 driven. We integrated Solr by creating our own frontend plugin which talks to the Solr webservice (and we're very happy about the PHP output type!). I'd be glad about your comments. Cheers, Marian -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: Can we point a Solr server to index directory dynamically at runtime..
: Using a multicore approach, you could send a create a core named : 'core3weeksold' pointing to '/datadirs/3weeksold' command to a live Solr, : which would spin it up on the fly. Then you query it, and maybe keep it : spun up until it's not queried for 60 seconds or something, then send a : remove core 'core3weeksold' command. : See http://wiki.apache.org/solr/CoreAdmin#CoreAdminHandler . Something that seems implicit in the question is what to do when the request spans all of the data ... this is where (in theory) distributed searching could help you out. Index each day's worth of data into its own core; that makes it really easy to expire the old data (just UNLOAD and delete an entire core once it's more than 30 days old). If your user is only searching current data then your app can directly query the core containing the most current data -- but if they want to query the last week, or last two weeks worth of data, you do a distributed request for all of the shards needed to search the appropriate amount of data. Between the ALIAS and SWAP commands on the CoreAdmin screen it should be pretty easy to have cores with names like today, 1dayold, 2dayold so that your app can configure simple shard params for all the permutations you'll need to query. -Hoss
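The distributed request described above would look something like this (host and core names are hypothetical; the shards parameter takes a comma-separated list of host:port/path entries, here wrapped across lines for readability):

```
http://localhost:8983/solr/today/select?q=cable
  &shards=localhost:8983/solr/today,
          localhost:8983/solr/1dayold,
          localhost:8983/solr/2dayold
```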
Re: Use cases for ReplicationHandler's backup facility?
On Fri, Sep 25, 2009 at 4:57 AM, Chris Harris rygu...@gmail.com wrote: The ReplicationHandler (http://wiki.apache.org/solr/SolrReplication) has support for backups, which can be triggered in one of two ways: 1. in response to startup/commit/optimize events (specified through the backupAfter tag specified in the handler's requestHandler tag in solrconfig.xml) 2. by manually hitting http://master_host:port/solr/replication?command=backup These backups get placed in directories named, e.g. snapshot.20090924033521, inside the solr data directory. According to the docs, these backups are not necessary for replication to work. My question is: What use case *are* they meant to address? The first potential use case that came to mind was that maybe I would be able to restore my index from these snapshot directories should it ever become corrupted. (I could just do something like rm -r data; mv snapshot.20090924033521 data.) That appears not to be one of the intended use cases, though; if it were, then I imagine the snapshot directories would contain the entire index, whereas they seem to contain only deltas of one form or another. Yes, the only reason to take a backup is restoration/archival. They should contain all the files required for the latest commit point. Thanks, Chris -- - Noble Paul | Principal Engineer| AOL | http://aol.com
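Since the answer is that a snapshot holds all the files for the latest commit point, a restore would amount to swapping the snapshot in as the live index while Solr is stopped. A rough sketch (the data-directory layout here is an assumption; verify it against your own solrconfig.xml before deleting anything):

```
# with Solr stopped:
cd /path/to/solrhome/data
rm -rf index
cp -r snapshot.20090924033521 index
# then restart Solr
```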
Re: Can we point a Solr server to index directory dynamically at runtime..
Hi, Thank you Michael and Chris for the response. Today after the mail from Michael, we tested the dynamic loading of cores and it worked well. So we need to go with the hybrid approach of multicore and distributed searching. As per our testing, we found that a Solr instance with 20 GB of index (single index or spread across multiple cores) provides better performance than a Solr instance with, say, 40 or 50 GB of index (single index or index spread across cores). So the 200 GB of index on day 1 will be spread across 200/20 = 10 Solr slave instances. On day 2 data, 10 more Solr slave servers are required; cumulative Solr slave instances = 200*2/20 = 20 ... On day 30 data, 10 more Solr slave servers are required; cumulative Solr slave instances = 200*30/20 = 300. So with the above approach, we may need ~300 Solr slave instances, which becomes very unmanageable. But we know that most of the queries are for the past 1 week, i.e. we definitely need 70 Solr slaves containing the last 7 days worth of data up and running. Now for the rest of the 230 Solr instances, do we need to keep them running for the odd query that can span across the 30 days of data (30*200 GB = 6 TB data), which may come up only a couple of times a day? This linear increase of Solr servers with the retention period doesn't seem to be a very scalable solution. So we are looking for a simpler approach to handle this scenario. Appreciate any further inputs/suggestions. Regards, sS --- On Fri, 9/25/09, Chris Hostetter hossman_luc...@fucit.org wrote: From: Chris Hostetter hossman_luc...@fucit.org Subject: Re: Can we point a Solr server to index directory dynamically at runtime.. To: solr-user@lucene.apache.org Date: Friday, September 25, 2009, 4:04 AM : Using a multicore approach, you could send a create a core named : 'core3weeksold' pointing to '/datadirs/3weeksold' command to a live Solr, : which would spin it up on the fly.
Then you query it, and maybe keep it : spun up until it's not queried for 60 seconds or something, then send a : remove core 'core3weeksold' command. : See http://wiki.apache.org/solr/CoreAdmin#CoreAdminHandler . Something that seems implicit in the question is what to do when the request spans all of the data ... this is where (in theory) distributed searching could help you out. Index each day's worth of data into its own core; that makes it really easy to expire the old data (just UNLOAD and delete an entire core once it's more than 30 days old). If your user is only searching current data then your app can directly query the core containing the most current data -- but if they want to query the last week, or last two weeks worth of data, you do a distributed request for all of the shards needed to search the appropriate amount of data. Between the ALIAS and SWAP commands on the CoreAdmin screen it should be pretty easy to have cores with names like today, 1dayold, 2dayold so that your app can configure simple shard params for all the permutations you'll need to query. -Hoss