Re: SEVERE: Could not start SOLR. Check solr/home property
Did you by any chance set up multicore? Try passing in the path to the Solr home directory as -Dsolr.solr.home=/path/to/solr/home when you start Solr.

On Mon, Apr 26, 2010 at 1:04 PM, Jon Drukman wrote:
> What does this error mean?
>
> SEVERE: Could not start SOLR. Check solr/home property
>
> I've had this solr installation working before, but I haven't looked at it in a few months. I checked it today and the web side is returning a 500 error; the log file shows this when starting up:
>
> SEVERE: Could not start SOLR. Check solr/home property
> java.lang.RuntimeException: java.io.IOException: read past EOF
>         at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068)
>         at org.apache.solr.core.SolrCore.<init>(SolrCore.java:579)
>         at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
>         at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
>         at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
>         at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
>         at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
>         at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
>         at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
>
> For the record, I've never explicitly set "solr/home". It always "just worked".
>
> -jsd-

--
- Siddhant
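For reference, setting that property with the stock Jetty example distribution looks something like this (the paths are placeholders, not taken from the thread):

    cd apache-solr-1.4.0/example
    # point Solr at an explicit home directory (the one containing conf/ and data/)
    java -Dsolr.solr.home=/path/to/solr/home -jar start.jar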
Re: What hardware do I need ?
If it's worth mentioning here, in my case the disk read speeds seemed to have a really noticeable effect on the query times. What disks are you planning on using? Also, as Otis has already pointed out, I doubt a single box of that capacity can handle 100-700 queries per second.

On Fri, Apr 23, 2010 at 1:32 PM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
> Xavier,
>
> 100-700 QPS is still high. I'm guessing your 1 box won't handle that without sweating a lot (read: slow queries).
>
> Otis
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
> ----- Original Message -----
>> From: Xavier Schepler <xavier.schep...@sciences-po.fr>
>> To: solr-user@lucene.apache.org
>> Sent: Fri, April 23, 2010 11:53:23 AM
>> Subject: Re: What hardware do I need ?
>>
>> On 23/04/2010 17:08, Otis Gospodnetic wrote:
>>> Xavier,
>>>
>>> 0-1000 QPS is a pretty wide range. Plus, it depends on how good your auto-complete is, which depends on the types of queries it issues, among other things. 100K short docs is small, so that will all fit in RAM nicely, assuming those other processes leave enough RAM for the OS to cache the index.
>>>
>>> That said, you do need more than 1 box if you want your auto-complete more fault tolerant.
>>>
>>> Otis
>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>>> Lucene ecosystem search :: http://search-lucene.com/
>>>
>>> ----- Original Message -----
>>>> From: Xavier Schepler <xavier.schep...@sciences-po.fr>
>>>> To: solr-user@lucene.apache.org
>>>> Sent: Fri, April 23, 2010 11:01:24 AM
>>>> Subject: What hardware do I need ?
>>>>
>>>> Hi,
>>>>
>>>> I'm working with Solr 1.4. My schema has about 50 fields. I'm using full text search in short strings (~30-100 terms) and faceted search.
>>>>
>>>> My index will have 100 000 documents. The number of requests per second will be low, let's say between 0 and 1000 because of auto-complete.
>>>>
>>>> Is a standard server (3 GHz proc, 4 GB RAM) with the client application (apache + php5 + ZF + apc) and Tomcat + Solr enough? Do I need more hardware?
>>>>
>>>> Thanks in advance,
>>>> Xavier S.
>>
>> Well, my auto-complete is built on the facet prefix search component. I think that 100-700 requests per second is maybe a better approximation.

--
- Siddhant
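An auto-complete built on the facet prefix method, as Xavier describes, is typically a request along these lines (field name, prefix, host and core are placeholder assumptions):

    http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=label&facet.limit=10&facet.prefix=emp

    (returns the ten most frequent terms in "label" starting with "emp",
    which the front end renders as suggestions)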
Re: exclude words?
I think you can use something like "q=hello world -books". That should do it.

On Wed, Mar 31, 2010 at 7:34 PM, Sebastian Funk wrote:
> Hey there,
>
> I'm sure this is a pretty easy thing, but I can't find the solution:
> can I search for text with one word (e.g. "books") especially not in it?
> so solr returns all documents that don't have "books" somewhere in them?
>
> thanks for the help,
> sebastian

--
- Siddhant
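Spelled out as a full request (URL-encoded; host and core are placeholders):

    http://localhost:8983/solr/select?q=hello+world+-books

The leading "-" excludes any document that contains "books" in the default search field.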
Re: jmap output help
Gentle bounce.

On Sun, Mar 28, 2010 at 11:31 AM, Siddhant Goel wrote:
> Hi everyone,
>
> The output of "jmap -histo:live 27959 | head -30" is something like the following:
>
>  num     #instances     #bytes  class name
> ----------------------------------------------
>    1:        448441  180299464  [C
>    2:          5311  135734480  [I
>    3:          3623   68389720  [B
>    4:        445669   17826760  java.lang.String
>    5:        391739   15669560  org.apache.lucene.index.TermInfo
>    6:        417442   13358144  org.apache.lucene.index.Term
>    7:         58767    5171496  org.apache.lucene.index.FieldsReader$LazyField
>    8:         32902    5049760
>    9:         32902    3955920
>   10:          2843    3512688
>   11:          2397    3128048  [Lorg.apache.lucene.index.Term;
>   12:            35    3053592  [J
>   13:             3    3044288  [Lorg.apache.lucene.index.TermInfo;
>   14:         55671    2707536
>   15:         27282    2701352  [Ljava.lang.Object;
>   16:          2843    2212384
>   17:          2343    2132224
>   18:         26424    1056960  java.util.ArrayList
>   19:         16423    1051072  java.util.LinkedHashMap$Entry
>   20:          2039    1028944
>   21:         14336     917504  org.apache.lucene.document.Field
>   22:         29587     710088  java.lang.Integer
>   23:          3171     583464  java.lang.Class
>   24:           813     492880  [Ljava.util.HashMap$Entry;
>   25:          8471     474376  org.apache.lucene.search.PhraseQuery
>   26:          4184     402848  [[I
>   27:          4277     380704  [S
>
> Is it ok to assume that the top 3 entries (character/integer/byte arrays) are referring to the entries inside the solr cache?
>
> Thanks,
>
> --
> - Siddhant

--
- Siddhant
jmap output help
Hi everyone,

The output of "jmap -histo:live 27959 | head -30" is something like the following:

 num     #instances     #bytes  class name
----------------------------------------------
   1:        448441  180299464  [C
   2:          5311  135734480  [I
   3:          3623   68389720  [B
   4:        445669   17826760  java.lang.String
   5:        391739   15669560  org.apache.lucene.index.TermInfo
   6:        417442   13358144  org.apache.lucene.index.Term
   7:         58767    5171496  org.apache.lucene.index.FieldsReader$LazyField
   8:         32902    5049760
   9:         32902    3955920
  10:          2843    3512688
  11:          2397    3128048  [Lorg.apache.lucene.index.Term;
  12:            35    3053592  [J
  13:             3    3044288  [Lorg.apache.lucene.index.TermInfo;
  14:         55671    2707536
  15:         27282    2701352  [Ljava.lang.Object;
  16:          2843    2212384
  17:          2343    2132224
  18:         26424    1056960  java.util.ArrayList
  19:         16423    1051072  java.util.LinkedHashMap$Entry
  20:          2039    1028944
  21:         14336     917504  org.apache.lucene.document.Field
  22:         29587     710088  java.lang.Integer
  23:          3171     583464  java.lang.Class
  24:           813     492880  [Ljava.util.HashMap$Entry;
  25:          8471     474376  org.apache.lucene.search.PhraseQuery
  26:          4184     402848  [[I
  27:          4277     380704  [S

Is it ok to assume that the top 3 entries (character/integer/byte arrays) are referring to the entries inside the solr cache?

Thanks,

--
- Siddhant
Re: Solr Performance Issues
Hi,

Apparently the bottleneck seems to be the time periods when the CPU is waiting to do some I/O. Out of all the numbers I can see, the CPU wait times for I/O seem to be the highest. I've allotted 4GB to Solr out of the total 8GB available. There's only 47MB free on the machine, so I assume the rest of the memory is being used for OS disk caches. In addition, the hit ratio for the queryResultCache isn't going beyond 20%. So the problem, I think, is not at Solr's end. Are there any pointers available on how I can resolve such issues related to disk I/O? Does this mean I need more overall memory? Or would reducing the amount of memory allocated to Solr, so that the disk cache has more memory, help?

Thanks,

On Fri, Mar 12, 2010 at 11:21 PM, Erick Erickson wrote:
> Sounds like you're pretty well on your way then. This is pretty typical of multi-threaded situations... Threads 1-n wait around on I/O, and increasing the number of threads increases throughput without changing (much) the individual response time.
>
> Threads n+1 - p don't change throughput much, but increase the response time for each request. On aggregate, though, the throughput doesn't change (much).
>
> Adding threads after p+1 *decreases* throughput while *increasing* individual response time as your processors start spending way too much time context-switching and/or memory swapping.
>
> The trick is finding out what n and p are.
>
> Best
> Erick
>
> On Fri, Mar 12, 2010 at 12:06 PM, Siddhant Goel wrote:
>> Hi,
>>
>> Thanks for your responses. It actually feels good to be able to locate where the bottlenecks are.
>>
>> I've created two sets of data - in the first one I'm measuring the time taken purely on Solr's end, and in the other one I'm including network latency (just for reference). The data that I'm posting below contains the time taken purely by Solr.
>>
>> I'm running 10 threads simultaneously and the average response time (for each query in each thread) remains close to 40 to 50 ms. But as soon as I increase the number of threads to something like 100, the response time goes up to ~600ms, and further up when the number of threads is close to 500. Yes, the average time definitely depends on the number of concurrent requests.
>>
>>> Going from memory, debugQuery=on will let you know how much time was spent in various operations in SOLR. It's important to know whether it was the searching, assembling the response, or transmitting the data back to the client.
>>
>> I just tried this. The information that it gives me for a query that took 7165ms is - http://pastebin.ca/1835644
>>
>> So out of the total time of 7165ms, QueryComponent took most of the time. Plus I can see the load average going up when the number of threads is really high. So it actually makes sense. (I didn't add any other component while searching; it was a plain /select?q=query call.) Like I mentioned earlier in this mail, I'm maintaining separate sets for data with/without network latency, and I don't think it's the bottleneck.
>>
>>> How many threads does it take to peg the CPU? And what response times are you getting when your number of threads is around 10?
>>
>> If the number of threads is greater than 100, that really takes its toll on the CPU. So probably that's the number.
>>
>> When the number of threads is around 10, the response times average to something like 60ms (and 95% of the queries fall within 100ms of that value).
>>
>> Thanks,
>>
>>> Erick
>>>
>>> On Fri, Mar 12, 2010 at 3:39 AM, Siddhant Goel wrote:
>>>> I've allocated 4GB to Solr, so the rest of the 4GB is free for the OS disk caching.
>>>>
>>>> I think that at any point of time, there can be a maximum of <number of threads> concurrent requests, which happens to make sense btw (does it?).
>>>>
>>>> As I increase the number of threads, the load average shown by top goes up to as high as 80%. But if I keep the number of threads low (~10), the load average never goes beyond ~8. So probably that's the number of requests I

--
- Siddhant
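If it helps, the I/O-wait suspicion can be confirmed with a generic iostat run (this invocation is an illustration, not something prescribed in the thread):

    # report extended device statistics every 5 seconds
    iostat -x 5

    Sustained high values in the %iowait (CPU) and %util (per-device) columns
    while queries are slow point at disk contention rather than Solr itself.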
Re: Solr Performance Issues
Hi,

Thanks for your responses. It actually feels good to be able to locate where the bottlenecks are.

I've created two sets of data - in the first one I'm measuring the time taken purely on Solr's end, and in the other one I'm including network latency (just for reference). The data that I'm posting below contains the time taken purely by Solr.

I'm running 10 threads simultaneously and the average response time (for each query in each thread) remains close to 40 to 50 ms. But as soon as I increase the number of threads to something like 100, the response time goes up to ~600ms, and further up when the number of threads is close to 500. Yes, the average time definitely depends on the number of concurrent requests.

> Going from memory, debugQuery=on will let you know how much time was spent in various operations in SOLR. It's important to know whether it was the searching, assembling the response, or transmitting the data back to the client.

I just tried this. The information that it gives me for a query that took 7165ms is - http://pastebin.ca/1835644

So out of the total time of 7165ms, QueryComponent took most of the time. Plus I can see the load average going up when the number of threads is really high. So it actually makes sense. (I didn't add any other component while searching; it was a plain /select?q=query call.) Like I mentioned earlier in this mail, I'm maintaining separate sets for data with/without network latency, and I don't think it's the bottleneck.

> How many threads does it take to peg the CPU? And what response times are you getting when your number of threads is around 10?

If the number of threads is greater than 100, that really takes its toll on the CPU. So probably that's the number.

When the number of threads is around 10, the response times average to something like 60ms (and 95% of the queries fall within 100ms of that value).

Thanks,

> Erick
>
> On Fri, Mar 12, 2010 at 3:39 AM, Siddhant Goel wrote:
>> I've allocated 4GB to Solr, so the rest of the 4GB is free for the OS disk caching.
>>
>> I think that at any point of time, there can be a maximum of <number of threads> concurrent requests, which happens to make sense btw (does it?).
>>
>> As I increase the number of threads, the load average shown by top goes up to as high as 80%. But if I keep the number of threads low (~10), the load average never goes beyond ~8. So probably that's the number of requests I can expect Solr to serve concurrently on this index size with this hardware.
>>
>> Can anyone give a general opinion as to how much hardware should be sufficient for a Solr deployment with an index size of ~43GB, containing around 2.5 million documents? I'm expecting it to serve at least 20 requests per second. Any experiences?
>>
>> Thanks
>>
>> On Fri, Mar 12, 2010 at 12:47 AM, Tom Burton-West wrote:
>>> How much of your memory are you allocating to the JVM and how much are you leaving free?
>>>
>>> If you don't leave enough free memory for the OS, the OS won't have a large enough disk cache, and you will be hitting the disk for lots of queries.
>>>
>>> You might want to monitor your disk I/O using iostat and look at the iowait.
>>>
>>> If you are doing phrase queries and your *prx file is significantly larger than the available memory, then when a slow phrase query hits Solr, the contention for disk I/O with other queries could be slowing everything down.
>>>
>>> You might also want to look at the 90th and 99th percentile query times in addition to the average. For our large indexes, we found at least an order of magnitude difference between the average and 99th percentile queries. Again, if Solr gets hit with a few of those 99th percentile slow queries and you're not hitting your caches, chances are you will see serious contention for disk I/O.
>>>
>>> Of course if you don't see any waiting on I/O, then your bottleneck is probably somewhere else :)
>>>
>>> See http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-1 for more background on our experience.
>>>
>>> Tom Burton-West
>>> University of Michigan Library
>>> www.hathitrust.org

--
- Siddhant
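For reference, the per-component timing above comes from a request of roughly this shape (host, core and query are placeholders):

    http://localhost:8983/solr/select?q=some+query&debugQuery=on

    The debug section of the response includes a "timing" block with
    prepare/process times for each search component (QueryComponent,
    FacetComponent, etc.), plus scoring explanations per document.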
Re: Solr Performance Issues
I've allocated 4GB to Solr, so the rest of the 4GB is free for the OS disk caching.

I think that at any point of time, there can be a maximum of <number of threads> concurrent requests, which happens to make sense btw (does it?).

As I increase the number of threads, the load average shown by top goes up to as high as 80%. But if I keep the number of threads low (~10), the load average never goes beyond ~8. So probably that's the number of requests I can expect Solr to serve concurrently on this index size with this hardware.

Can anyone give a general opinion as to how much hardware should be sufficient for a Solr deployment with an index size of ~43GB, containing around 2.5 million documents? I'm expecting it to serve at least 20 requests per second. Any experiences?

Thanks

On Fri, Mar 12, 2010 at 12:47 AM, Tom Burton-West wrote:
> How much of your memory are you allocating to the JVM and how much are you leaving free?
>
> If you don't leave enough free memory for the OS, the OS won't have a large enough disk cache, and you will be hitting the disk for lots of queries.
>
> You might want to monitor your disk I/O using iostat and look at the iowait.
>
> If you are doing phrase queries and your *prx file is significantly larger than the available memory, then when a slow phrase query hits Solr, the contention for disk I/O with other queries could be slowing everything down.
>
> You might also want to look at the 90th and 99th percentile query times in addition to the average. For our large indexes, we found at least an order of magnitude difference between the average and 99th percentile queries. Again, if Solr gets hit with a few of those 99th percentile slow queries and you're not hitting your caches, chances are you will see serious contention for disk I/O.
>
> Of course if you don't see any waiting on I/O, then your bottleneck is probably somewhere else :)
>
> See http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-1 for more background on our experience.
>
> Tom Burton-West
> University of Michigan Library
> www.hathitrust.org
>
> On Thu, Mar 11, 2010 at 9:39 AM, Siddhant Goel wrote:
>> Hi everyone,
>>
>> I have an index corresponding to ~2.5 million documents. The index size is 43GB. The configuration of the machine which is running Solr is - Dual Processor Quad Core Xeon 5430 - 2.66GHz (Harpertown) - 2 x 12MB cache, 8GB RAM, and a 250 GB HDD.
>>
>> I'm observing a strange trend in the queries that I send to Solr. The query times for queries that I send earlier are much lower than for the queries I send afterwards. For instance, if I write a script to query solr 5000 times (with 5000 distinct queries, most of them containing not more than 3-5 words) with 10 threads running in parallel, the average time for queries goes from ~50ms in the beginning to ~6000ms. Is this expected, or is there something wrong with my configuration? Currently I've configured the queryResultCache and the documentCache to contain 2048 entries (hit ratios for both are close to 50%).
>>
>> Apart from this, a general question that I want to ask is: is such hardware enough for this scenario? I'm aiming at achieving around 20 queries per second with the hardware mentioned above.
>>
>> Thanks,
>>
>> Regards,
>>
>> --
>> - Siddhant
>
> --
> View this message in context: http://old.nabble.com/Solr-Performance-Issues-tp27864278p27868456.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
- Siddhant
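As a concrete illustration of the memory split being discussed (the flags are a generic sketch, not the poster's actual command line), capping the JVM at 4GB on an 8GB box leaves the remainder to the OS page cache:

    # fixed 4GB heap for Solr; the OS uses leftover RAM to cache index files
    java -Xms4096m -Xmx4096m -jar start.jar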
Re: field length normalization
Did you reindex after setting omitNorms to false? I'm not sure whether or not it is needed, but it makes sense.

On Thu, Mar 11, 2010 at 5:34 PM, muneeb wrote:
> Hi,
>
> In my schema, the document title field has "omitNorms=false", which, if I'm not wrong, causes the length of titles to be counted in the scoring.
>
> But when I query with "word1 word2 word3", I don't know why the top two documents' titles have these words plus other words, whereas the document whose title has exactly and only these query words comes in third place.
>
> Setting omitNorms to false should bring the titles with the exact words to the top, shouldn't it?
>
> Also, I realized when I debugged the query that all three top documents have the same score. Shouldn't this differ, as they have different title lengths?
>
> Thanks very much.
> -A
> --
> View this message in context: http://old.nabble.com/field-length-normalization-tp27862618p27862618.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
- Siddhant
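For reference, norms are a per-field, index-time setting in schema.xml, so a full reindex is needed after flipping it. A declaration along these lines (field name and type are placeholder assumptions) keeps length normalization on:

    <field name="title" type="text" indexed="true" stored="true" omitNorms="false"/>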
Re: Solr Performance Issues
Hi Erick,

The way the load test works is that it picks up 5000 queries and splits them according to the number of threads (so if we have 10 threads, it schedules 10 threads, each one sending 500 queries). So it might be possible that the number of queries at a later point in time is greater than the number of queries earlier. I'm not very sure about that though. It's a simple Ruby script that starts up threads, calls the search function in each thread, and then waits for each of them to exit.

How many queries per second can we expect Solr to serve, given this kind of hardware?

If what you suggest is true, then is it possible that while Solr is serving a query, another query hits it, which increases the response time even further? I'm not sure about it. But yes, I can observe the query times going up as I increase the number of threads.

Thanks,

Regards,

On Thu, Mar 11, 2010 at 8:30 PM, Erick Erickson wrote:
> How many outstanding queries do you have at a time? Is it possible that when you start, you have only a few queries executing concurrently, but as your test runs you have hundreds?
>
> This really is a question of how your load test is structured. You might get a better sense of how it works if your tester had a limited number of threads running, so the max concurrent requests SOLR was serving at once were capped (30, 50, whatever).
>
> But no, I wouldn't expect SOLR to bog down the way you're describing just because it was running for a while.
>
> HTH
> Erick
>
> On Thu, Mar 11, 2010 at 9:39 AM, Siddhant Goel wrote:
>> Hi everyone,
>>
>> I have an index corresponding to ~2.5 million documents. The index size is 43GB. The configuration of the machine which is running Solr is - Dual Processor Quad Core Xeon 5430 - 2.66GHz (Harpertown) - 2 x 12MB cache, 8GB RAM, and a 250 GB HDD.
>>
>> I'm observing a strange trend in the queries that I send to Solr. The query times for queries that I send earlier are much lower than for the queries I send afterwards. For instance, if I write a script to query solr 5000 times (with 5000 distinct queries, most of them containing not more than 3-5 words) with 10 threads running in parallel, the average time for queries goes from ~50ms in the beginning to ~6000ms. Is this expected, or is there something wrong with my configuration? Currently I've configured the queryResultCache and the documentCache to contain 2048 entries (hit ratios for both are close to 50%).
>>
>> Apart from this, a general question that I want to ask is: is such hardware enough for this scenario? I'm aiming at achieving around 20 queries per second with the hardware mentioned above.
>>
>> Thanks,
>>
>> Regards,
>>
>> --
>> - Siddhant

--
- Siddhant
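A minimal sketch of the kind of harness described (the actual script isn't in the thread, so file names and output format are hypothetical), assuming the solr-ruby gem:

    require 'rubygems'
    require 'solr'

    # 5000 distinct queries, one per line (placeholder file name)
    queries      = File.readlines('queries.txt').map { |q| q.strip }
    thread_count = 10

    slices  = queries.each_slice(queries.size / thread_count).to_a
    threads = slices.map do |slice|
      Thread.new do
        conn = Solr::Connection.new('http://localhost:8983/solr')  # one connection per thread
        slice.each do |q|
          started = Time.now
          conn.query(q)                                            # plain /select?q=... call
          puts "#{q}\t#{((Time.now - started) * 1000).round}ms"
        end
      end
    end
    threads.each { |t| t.join }   # wait for every thread to exit

One consequence of this design, relevant to Erick's point: all 10 (or 100, or 500) threads fire immediately and stay busy for the whole run, so concurrency is constant rather than ramping up.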
Solr Performance Issues
Hi everyone,

I have an index corresponding to ~2.5 million documents. The index size is 43GB. The configuration of the machine which is running Solr is - Dual Processor Quad Core Xeon 5430 - 2.66GHz (Harpertown) - 2 x 12MB cache, 8GB RAM, and a 250 GB HDD.

I'm observing a strange trend in the queries that I send to Solr. The query times for queries that I send earlier are much lower than for the queries I send afterwards. For instance, if I write a script to query solr 5000 times (with 5000 distinct queries, most of them containing not more than 3-5 words) with 10 threads running in parallel, the average time for queries goes from ~50ms in the beginning to ~6000ms. Is this expected, or is there something wrong with my configuration? Currently I've configured the queryResultCache and the documentCache to contain 2048 entries (hit ratios for both are close to 50%).

Apart from this, a general question that I want to ask is: is such hardware enough for this scenario? I'm aiming at achieving around 20 queries per second with the hardware mentioned above.

Thanks,

Regards,

--
- Siddhant
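For reference, those cache sizes live in solrconfig.xml; a sketch of the settings described (the initialSize and autowarmCount values are placeholder assumptions, not from the thread):

    <queryResultCache class="solr.LRUCache" size="2048" initialSize="512" autowarmCount="256"/>
    <documentCache class="solr.LRUCache" size="2048" initialSize="512"/>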
Re: Question about fieldNorms
Wonderful! That explains it. Thanks a lot!

Regards,

On Mon, Mar 8, 2010 at 6:39 AM, Jay Hill wrote:
> Yes, if omitNorms=true, then no lengthNorm calculation will be done, and the fieldNorm value will be 1.0, and lengths of the field in question will not be a factor in the score.
>
> To see an example of this you can do a quick test. Add two "text" fields, and on one omit norms:
>
> <field name="foo" type="text" indexed="true" stored="true"/>
> <field name="bar" type="text" indexed="true" stored="true" omitNorms="true"/>
>
> Index a doc with the same value for both fields:
> <field name="foo">1 2 3 4 5</field>
> <field name="bar">1 2 3 4 5</field>
>
> Set &debugQuery=true and do two queries: &q=foo:5 and &q=bar:5
>
> In the "explain" section of the debug output, note that the fieldNorm value for the "foo" query is this:
>
>    0.4375 = fieldNorm(field=foo, doc=1)
>
> and the value for the "bar" query is this:
>
>    1.0 = fieldNorm(field=bar, doc=1)
>
> A simplified description of how the fieldNorm value is calculated: fieldNorm = lengthNorm * documentBoost * documentFieldBoosts
>
> and the lengthNorm is calculated like this: lengthNorm = 1/(numTermsInField)**.5
> [note that the value is encoded as a single byte, so there is some precision loss]
>
> When omitNorms=true no norm calculation is done, so fieldNorm will always be one on those fields.
>
> You can also use the Luke utility to view the document in the index, and it will show that there is a norm value for the foo field, but not the bar field.
>
> -Jay
> http://www.lucidimagination.com
>
> On Sun, Mar 7, 2010 at 5:55 AM, Siddhant Goel wrote:
>> Hi everyone,
>>
>> Is the fieldNorm calculation altered by the omitNorms factor? I saw on this page (http://old.nabble.com/Question-about-fieldNorm-td17782701.html) the formula for calculation of fieldNorms (fieldNorm = fieldBoost/sqrt(numTermsForField)).
>>
>> Does this mean that for a document containing a string like "A B C D E" in its field, its fieldNorm would be boost/sqrt(5), and for another document containing the string "A B C" in the same field, its fieldNorm would be boost/sqrt(3). Is that correct?
>>
>> If yes, then is *this* what omitNorms affects?
>>
>> Thanks,
>>
>> --
>> - Siddhant

--
- Siddhant
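Working the numbers in that example through (a sanity check, not part of the original mail): with five terms in the field and no boosts,

    lengthNorm = 1 / sqrt(5) ≈ 0.4472

and the lossy single-byte norm encoding rounds this down to 0.4375, which is exactly the fieldNorm reported for the "foo" query above.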
Question about fieldNorms
Hi everyone, Is the fieldNorm calculation altered by the omitNorms factor? I saw on this page (http://old.nabble.com/Question-about-fieldNorm-td17782701.html) the formula for calculation of fieldNorms (fieldNorm = fieldBoost/sqrt(numTermsForField)). Does this mean that for a document containing a string like "A B C D E" in its field, its fieldNorm would be boost/sqrt(5), and for another document containing the string "A B C" in the same field, its fieldNorm would be boost/sqrt(3). Is that correct? If yes, then is *this* what omitNorms affects? Thanks, -- - Siddhant
Re: Free Webinar: Mastering Solr 1.4 with Yonik Seeley
Now that I missed attending it, where can I view it? :-)

Thanks

On Fri, Feb 26, 2010 at 10:11 PM, Jay Hill wrote:
> Yes, it will be recorded and available to view after the presentation.
>
> -Jay
>
> On Thu, Feb 25, 2010 at 2:19 PM, Bernadette Houghton <bernadette.hough...@deakin.edu.au> wrote:
>> Yonik, can you please advise whether this event will be recorded and available for later download? (It starts 5am our time ;-) )
>>
>> Regards
>> Bern
>>
>> -----Original Message-----
>> From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
>> Sent: Thursday, 25 February 2010 10:23 AM
>> To: solr-user@lucene.apache.org
>> Subject: Free Webinar: Mastering Solr 1.4 with Yonik Seeley
>>
>> I'd like to invite you to join me for an in-depth review of Solr's powerful, versatile new features and functions. The free webinar, sponsored by my company, Lucid Imagination, covers an intensive "how-to" for the features you need to make the most of Solr for your search application:
>>
>>    * Faceting deep dive, from document fields to performance management
>>    * Best practices for sharding, index partitioning and scaling
>>    * How to construct efficient Range Queries and function queries
>>    * Sneak preview: Solr 1.5 roadmap
>>
>> Join us for a free webinar
>> Thursday, March 4, 2010
>> 10:00 AM PST / 1:00 PM EST / 18:00 GMT
>> Follow this link to sign up
>>
>> http://www.eventsvc.com/lucidimagination/030410?trk=WR-MAR2010-AP
>>
>> Thanks,
>>
>> -Yonik
>> http://www.lucidimagination.com

--
- Siddhant
Re: multiCore
Can you provide the error message that you got?

On Sat, Mar 6, 2010 at 11:13 AM, Suram wrote:
> Hi,
>
> How can I send the xml file to Solr after creating the multicore? I tried it; Solr refuses to accept it.
> --
> View this message in context: http://old.nabble.com/multiCore-tp27802043p27802043.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
- Siddhant
Re: field not found for search
Did you send a commit after indexing those files?

On Thu, Mar 4, 2010 at 6:30 PM, Suram wrote:
> Hi,
>
> I newly indexed some xml files, but they are not found by search or autosuggestion.
>
> My xml index file: http://old.nabble.com/file/p27780413/Nike.xml Nike.xml
>
> and my schema is http://old.nabble.com/file/p27780413/schema.xml schema.xml
>
> How can I achieve this?
> --
> View this message in context: http://old.nabble.com/field-not-found-for-search-tp27780413p27780413.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
- Siddhant
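For reference, newly added documents only become searchable after a commit, which can be sent over HTTP (host and core are placeholders):

    curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' --data-binary '<commit/>'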
Re: Indexing HTML document
There is an HTML strip filter documented here, which might be of some help - http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory

Control characters can be eliminated using code like this - http://bitbucket.org/cogtree/python-solr/src/tip/pythonsolr/pysolr.py#cl-449

On Tue, Mar 2, 2010 at 9:37 PM, György Frivolt wrote:
> Hi,
>
> How do I index HTML documents properly? All the documents are HTML, some containing encoded characters like "ží"... Is there a character filter for filtering these codes? Is there a way to strip the HTML tags out?
>
> Does solr weight the terms in the document based on where they appear? Words in headers (H1, H2, ...) would be supposed to describe the document more than words in paragraphs.
>
> Thanks for help,
>
> Georg

--
- Siddhant
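Wired into a schema, the strip filter might look like this (the fieldType name and the rest of the analyzer chain are placeholder assumptions); it removes tags and decodes character entities before tokenization:

    <fieldType name="html_text" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>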
Re: fieldType "text"
I think that's because of the tokenization that Solr does internally. If a document contains HP1, and you're using the default "text" field type, Solr will tokenize that into HP and 1, so the document figures in the list of documents containing HP, and hence appears in the search results for HP. Creating a separate field type which does not tokenize like that might be what you want. The various filter/tokenizer types are listed here - http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

On Tue, Mar 2, 2010 at 6:07 PM, Frederico Azeiteiro <frederico.azeite...@cision.com> wrote:
> Hi,
>
> I'm using the default "text" field type that comes with the example.
>
> When searching for simple words such as 'HP' or 'TCS', solr is returning results that contain 'HP1' or 'T&CS'.
>
> Is there a solution to avoid this?
>
> Thanks,
>
> Frederico

--
- Siddhant
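One sketch of such a stricter type (the name is a placeholder; it mirrors the whitespace-only "text_ws" style in the example schema): tokens split only on whitespace, so "HP1" stays one term and no longer matches "HP":

    <fieldType name="text_exactish" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>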
Re: updating particular field
Yep. I think an update in Lucene means first a deletion and then an addition, so the entire document needs to be sent with the update.

On Mon, Mar 1, 2010 at 7:24 PM, Israel Ekpo wrote:
> Unfortunately, because of how Lucene works internally, you will not be able to update just one or two fields. You have to resubmit the entire document.
>
> If you only send just one or two fields, then the updated document will only have the fields sent in the last update.
>
> On Mon, Mar 1, 2010 at 7:09 AM, Suram wrote:
>>
>> Siddhant wrote:
>>> Yes. You can just re-add the document with your changes, and the rest of the fields in the document will remain unchanged.
>>>
>>> On Mon, Mar 1, 2010 at 5:09 PM, Suram wrote:
>>>> Hi,
>>>>
>>>> EN7800GTX/2DHTV/256M
>>>> ASUS Computer Inc.
>>>> electronics
>>>> graphics card
>>>> NVIDIA GeForce 7800 GTX GPU/VPU clocked at 486MHz
>>>> 256MB GDDR3 Memory clocked at 1.35GHz
>>>> 479.95
>>>> 7
>>>> false
>>>> 2006-02-13T15:26:37Z/DAY
>>>>
>>>> Can I update just one value (e.g. the false above to true) without affecting any other field of my previous document?
>>>>
>>>> Thanks in advance
>>>> --
>>>> View this message in context: http://old.nabble.com/updating-particular-field-tp27742399p27742399.html
>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>> Hi,
>>
>> Here I don't want to reload the entire data; I just want to update a field. I need to change one or more fields by id, not the whole document.
>>
>> --
>> View this message in context: http://old.nabble.com/updating-particular-field-tp27742399p27742671.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>
> --
> "Good Enough" is not good enough.
> To give anything less than your best is to sacrifice the gift.
> Quality First. Measure Twice. Cut Once.
> http://www.israelekpo.com/

--
- Siddhant
Re: updating particular field
Yes. You can just re-add the document with your changes, and the rest of the fields in the document will remain unchanged.

On Mon, Mar 1, 2010 at 5:09 PM, Suram wrote:
> Hi,
>
> EN7800GTX/2DHTV/256M
> ASUS Computer Inc.
> electronics
> graphics card
> NVIDIA GeForce 7800 GTX GPU/VPU clocked at 486MHz
> 256MB GDDR3 Memory clocked at 1.35GHz
> 479.95
> 7
> false
> 2006-02-13T15:26:37Z/DAY
>
> Can I update just one value (e.g. the false above to true) without affecting any other field of my previous document?
>
> Thanks in advance
> --
> View this message in context: http://old.nabble.com/updating-particular-field-tp27742399p27742399.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
- Siddhant
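Since a Lucene "update" is a delete followed by an add, the whole document gets resubmitted. A hedged sketch of such a re-add (the XML tags in the quoted mail were stripped in transit; the field names here follow the stock Solr example schema and are assumptions):

    <add>
      <doc>
        <field name="id">EN7800GTX/2DHTV/256M</field>
        <field name="manu">ASUS Computer Inc.</field>
        <field name="cat">electronics</field>
        <field name="cat">graphics card</field>
        <field name="features">NVIDIA GeForce 7800 GTX GPU/VPU clocked at 486MHz</field>
        <field name="features">256MB GDDR3 Memory clocked at 1.35GHz</field>
        <field name="price">479.95</field>
        <field name="popularity">7</field>
        <field name="inStock">true</field>   <!-- the one changed value -->
      </doc>
    </add>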
Re: CoreAdmin
Hi,

Did you *really* go through this page - http://wiki.apache.org/solr/CoreAdmin ?

On Thu, Feb 25, 2010 at 7:40 PM, Sudhakar_Thangavel wrote:
> Hi,
>
> I am new to Solr and am not getting it clearly from the wiki. Can anyone tell me how to configure CoreAdmin? I need step-by-step instructions.
>
> --
> View this message in context: http://old.nabble.com/CoreAdmin-tp27714440p27714440.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
- Siddhant
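For what it's worth, the multicore setup that page describes boils down to a solr.xml next to the core directories; a minimal sketch (core names and paths are placeholders):

    <solr persistent="true">
      <cores adminPath="/admin/cores">
        <core name="core0" instanceDir="core0"/>
        <core name="core1" instanceDir="core1"/>
      </cores>
    </solr>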
Re: Ruby client fails to build
On Wed, Jan 20, 2010 at 4:19 PM, Erik Hatcher wrote:
> Where are you getting your solr-ruby code from? You can simply "gem install" it to pull in an already pre-built gem.

I'm just picking it up from the 1.4 release. I also tried checking out the latest copy from svn, but the results were the same.

So I just figured out I was using the pre-built gem the wrong way. It's working fine here. Is there any documentation that you could point me to? Right now I'm just figuring out how to use it by trial and error, and with random googling. The wiki page doesn't tell me much about all the search options supported.

Thanks,

--
- Siddhant
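For anyone landing here, basic solr-ruby usage looks something like this (host, query and options are placeholder assumptions; the client wraps the standard /select parameters):

    require 'rubygems'
    require 'solr'

    conn = Solr::Connection.new('http://localhost:8983/solr')
    response = conn.query('ipod', :rows => 10, :start => 0)  # maps to /select?q=ipod&rows=10&start=0
    response.hits.each { |hit| puts hit['id'] }              # each hit is a hash of stored fields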
Ruby client fails to build
Hi,

I'm using Solr 1.4 (and trying to use the Ruby client (solr-ruby) to access it). The problem is that I just can't get it to work. :-)

If I run the tests (rake test), it fails with the following output:

/path/to/solr-ruby/test/unit/delete_test.rb:52: invalid multibyte char (US-ASCII)
/path/to/solr-ruby/test/unit/delete_test.rb:52: syntax error, unexpected $end, expecting ')'
    request = Solr::Request::Delete.new(:query => 'ëäïöü')
                                                        ^
    from /home/mango/.gem/ruby/1.9.1/gems/rake-0.8.7/lib/rake/rake_test_loader.rb:5:in `block in <main>'
    from /home/mango/.gem/ruby/1.9.1/gems/rake-0.8.7/lib/rake/rake_test_loader.rb:5:in `each'
    from /home/mango/.gem/ruby/1.9.1/gems/rake-0.8.7/lib/rake/rake_test_loader.rb:5:in `<main>'
rake aborted!
Command failed with status (1): [/usr/bin/ruby -I"lib" -r solr -r test/unit...]

And if I try to build the gem anyway, it fails with the following error (after quite a few lines of output):

rake aborted!
private method `rm_f' called for File:Class
/path/to/solr-ruby/Rakefile:79:in `block (2 levels)'

Could anyone please tell me what I am missing here?

Thanks,

--
- Siddhant
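The first failure looks like Ruby 1.9's default US-ASCII source encoding choking on the UTF-8 literal in that test. A hedged workaround (not from the thread) is a magic comment at the top of the offending file, or running the suite under Ruby 1.8:

    # encoding: utf-8
    # (first line of test/unit/delete_test.rb; tells Ruby 1.9 the source is UTF-8)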
Re: Queries of type field:value not functioning
Hi, Thanks for the responses. q.alt did the job. Turns out that the dismax query parser was at fault, and wasn't able to handle queries of the type *:*. Putting the query in q.alt, or adding a defType=lucene (as pointed out to me on the irc channel) worked. Thanks, -- - Siddhant
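Spelled out, the two workarounds look something like this (host and core are placeholders). Dismax treats "*:*" and "field:value" as literal text rather than query syntax, so either hand the match-all to q.alt (which is parsed by the standard parser) or switch parsers for that request:

    http://localhost:8983/solr/select?defType=dismax&q.alt=*:*
    http://localhost:8983/solr/select?defType=lucene&q=field:value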
Queries of type field:value not functioning
Hi all,

Any query I make of the type field:value does not return any documents, and the *:* query doesn't return any results either. The index size is close to 1GB now, so it should be returning some documents. The rest of the queries are functioning properly. Any help?

Thanks,

--
- Siddhant
Re: Reload synonyms
On Tue, Jan 5, 2010 at 2:24 PM, Peter A. Kirk wrote:
> Thanks for the answer. How does one "reload" a core? Is there an API, or a url one can use?

I think this should be it - http://wiki.apache.org/solr/CoreAdmin#RELOAD

--
- Siddhant
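Concretely, it's a plain HTTP call to the cores admin handler (host and core name are placeholders):

    http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0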
Re: Adaptive search?
On Tue, Dec 22, 2009 at 12:01 PM, Ryan Kennedy wrote: > This approach will be limited to applying a "global" rank to all the > documents, which may have some unintended consequences. The most > popular document in your index will be the most popular, even for > queries for which it was never clicked on. Right. Makes so much sense. Thanks for sharing. -- - Siddhant
Re: Adaptive search?
Let's say we have a search engine based on Solr (a simple front end - a web app of sorts - responsible for querying Solr and then displaying the results in human-readable form). If a user searches for something, gets quite a few search results, and then clicks on one such result - is there any mechanism by which we can notify Solr to boost the score/relevance of that particular result in future searches? If not, then any pointers on how to go about doing that would be very helpful.

Thanks,

On Thu, Dec 17, 2009 at 7:50 PM, Paul Libbrecht wrote:
> What can it mean to "adapt to user clicks"? Quite many things in my head. Do you have maybe a citation that inspires you here?
>
> paul
>
> On 17 Dec 2009, at 13:52, Siddhant Goel wrote:
>> Does Solr provide adaptive searching? Can it adapt to user clicks within the search results it provides? Or does that have to be done externally?

--
- Siddhant
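Solr 1.4 has no built-in click feedback, but one hedged way to approximate it (an external loop, not a Solr feature): log clicks outside Solr, periodically reindex a per-document popularity field, and fold it into dismax scoring with a boost function. The field name and function here are assumptions:

    schema.xml:  <field name="popularity" type="int" indexed="true" stored="true"/>
    query:       /select?defType=dismax&q=some+query&bf=log(sum(popularity,1))

    (sum(popularity,1) avoids log(0) for never-clicked documents)

As Ryan notes in the reply above, this yields a global rank, so per-query click feedback would need the query itself factored into whatever is logged and boosted.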
Adaptive search?
Hi, Does Solr provide adaptive searching? Can it adapt to user clicks within the search results it provides? Or that has to be done externally? I couldn't find anything on googling for it. Thanks, -- - Siddhant