Re: How does one sort facet queries?
On 2/19/2010 2:15 AM, Kelly Taylor wrote: All sorting of facets works great at the field level (count/index)... all good there... but how is sorting accomplished with range queries? The solrj response doesn't seem to maintain the order the queries are sent in, and the order is not in index or count order. What's the trick?

http://localhost:8983/solr/select?q=someterm&rows=0&facet=true&facet.limit=-1
  &facet.query=price:[* TO 100]
  &facet.query=price:[100 TO 200]
  &facet.query=price:[200 TO 300]
  &facet.query=price:[300 TO 400]
  &facet.query=price:[400 TO 500]
  &facet.query=price:[500 TO 600]
  &facet.query=price:[600 TO 700]
  &facet.query=price:[700 TO *]
  &facet.mincount=1&collapse.field=dedupe_hash&collapse.threshold=1
  &collapse.type=normal&collapse.facet=before

The trick I use is to use LocalParams to give each facet query a well-defined name. Afterwards you can loop through the names in whatever order you want. So basically facet.query={!key=price_0}price:[* TO 100] etc.

N.B. the facet queries in your example will lead to some documents being counted twice (e.g. when the price is exactly 100, 200, 300).

Regards, gwk
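On the SolrJ side, a minimal sketch of sending and reading back the keyed facet queries in a fixed order (the price_N key names are just the illustrative ones from above):

    import java.util.Arrays;
    import java.util.Map;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class KeyedFacetExample {
        public static void printPriceFacets(SolrServer server) throws Exception {
            SolrQuery query = new SolrQuery("someterm");
            query.setRows(0);
            query.setFacet(true);
            query.addFacetQuery("{!key=price_0}price:[* TO 100]");
            query.addFacetQuery("{!key=price_1}price:[100 TO 200]");
            // ... remaining ranges ...
            QueryResponse rsp = server.query(query);
            // getFacetQuery() returns a map of key -> count; iterate your own
            // ordered list of keys instead of relying on the response order.
            Map<String, Integer> counts = rsp.getFacetQuery();
            for (String key : Arrays.asList("price_0", "price_1")) {
                System.out.println(key + ": " + counts.get(key));
            }
        }
    }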
Re: replications issue
Ciao, Uhm, after some time a new index in data/index on the slave has been written with the ~size of the master index. The configuration on both master and slave is the same as the one on the SolrReplication wiki page (enable/disable master/slave in a node):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str name="masterUrl">http://localhost:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>

When the master is started, pass in -Denable.master=true, and in the slave pass in -Denable.slave=true. Alternately, these values can be stored in a solrcore.properties file as follows:

# solrcore.properties in master
enable.master=true
enable.slave=false

On 19/feb/2010, at 03:43, Otis Gospodnetic wrote: giskard, Is this on the master or on the slave(s)? Maybe you can paste your replication handler config for the master and your replication handler config for the slave. Otis -- Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ From: giskard gisk...@autistici.org To: solr-user@lucene.apache.org Sent: Thu, February 18, 2010 12:16:37 PM Subject: replications issue Hi all, I've set up Solr replication as described in the wiki. When I start the replication, a directory called index.$numbers is created; after a while it disappears and a new index.$othernumbers is created. index/ remains untouched with an empty index. Any clue? Thank you in advance, Riccardo -- ciao, giskard -- ciao, giskard
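The slave-side properties file isn't shown in the mail above; presumably it is just the mirror image (an assumption, not from the original message):

# solrcore.properties in slave
enable.master=false
enable.slave=true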
Question regarding wildcards and dismax
Hi all, We have a web application built on top of Solr, and we are using a lot of facets - everything works just fine. When the user first hits the search page, we would like to do a "get all" query to get a result, and thereby get all facets so we can build up the user interface from this result/facets. So I would like to do a q=*:* on the search. But since I have switched to the dismax request handler this does not work anymore. My request/url looks like this: a) /solr/da/mysearcher/?q=*:* Does not work b) /solr/da/select?q=*:* Does work But I really need to use a) since I control boosting/ranking in that definition. Furthermore, when the user drills down into the search result by selecting from the facets, I still need to get the full search result, like: /solr/da/mysearcher/?q=*:*&fq=color:red Does not work. Can anyone help here? I think the situation for my web application is quite normal (get a full result set to build facets, then let the user do a drill-down etc). Thanks a lot in advance. med venlig hilsen/best regards Roland Villemoes Tel: (+45) 22 69 59 62 E-Mail: mailto:r...@alpha-solutions.dk Alpha Solutions A/S Borgergade 2, 3.sal, 1300 København K Tel: (+45) 70 20 65 38 Web: http://www.alpha-solutions.dk
Re: Question regarding wildcards and dismax
Have a look at the q.alt parameter (http://wiki.apache.org/solr/DisMaxRequestHandler#q.alt) which is used for exactly this issue. Basically, putting q.alt=*:* in your query means you can leave out the q parameter if you want all documents to be selected. Regards, gwk On 2/19/2010 11:28 AM, Roland Villemoes wrote: Hi all, We have a web application built on top of Solr, and we are using a lot of facets - everything works just fine. When the user first hits the search page, we would like to do a "get all" query to get a result, and thereby get all facets so we can build up the user interface from this result/facets. So I would like to do a q=*:* on the search. But since I have switched to the dismax request handler this does not work anymore. My request/url looks like this: a) /solr/da/mysearcher/?q=*:* Does not work b) /solr/da/select?q=*:* Does work But I really need to use a) since I control boosting/ranking in that definition. Furthermore, when the user drills down into the search result by selecting from the facets, I still need to get the full search result, like: /solr/da/mysearcher/?q=*:*&fq=color:red Does not work.
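To spare clients from sending q.alt on every request, it can also live in the handler defaults in solrconfig.xml. A sketch (assuming "da" is the core name and "mysearcher" the handler, as the URLs in the thread suggest; the rest is illustrative):

<requestHandler name="/mysearcher" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- match all documents when no q parameter is given -->
    <str name="q.alt">*:*</str>
    <!-- existing qf/pf/bf boosting settings go here -->
  </lst>
</requestHandler>

With that in place, /solr/da/mysearcher/?fq=color:red (with no q at all) returns the full, filtered result set.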
range of scores : queryNorm()
Hello, I have observed that even if we change boosting drastically, scores end up being normalized because of the queryNorm value. Is there anything (regarding the queryNorm) that we can rely on? E.g. that the score will always be under 10 or some other fixed value? The main objective is to provide scores in a fixed range to the partner. So, have you experienced anything like this? Is it possible to do so? Have you experienced any strange situation where, for a particular query, result scores were really high compared to routine? If yes, I would like to know the factor that affected the scores so drastically, because it may help me to proceed or understand these cases. Thanks.
Re: Range Searches in Collections
Unfortunately the number of fees is unknown, so we couldn't add the fields into the Solr schema until runtime. The work-around we did was to create an additional column in the view I'm pulling from for the index, determine each record's minimum fee, and throw that into the column. A total hack, but now I can simply sort on the minFee and the problem is (hackishly) solved! :) Otis Gospodnetic wrote: Hm, yes, it sounds like your fees field has multiple values/tokens, one for each fee. That's full-text search for you. :) How about having multiple fee fields, each with just one fee value? Otis -- Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ From: cjkadakia cjkada...@sonicbids.com To: solr-user@lucene.apache.org Sent: Thu, February 18, 2010 7:58:23 PM Subject: Range Searches in Collections Hi, I'm trying to do a search on a range of floats that are part of my Solr schema. Basically we have a collection of fees that are associated with each document in our index. The query I tried was: q=fees:[3 TO 10] This should return documents with fee values between 3 and 10 inclusive, which it does. However, I need it to check ALL items in this collection, not just one that satisfies it. Currently, this is returning documents with fee values above 10 and below 3 as long as they contain at least one value within the range. Any suggestions on how to accomplish this?
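For the "ALL fees must be in range" requirement, the same denormalization trick generalizes if both a minimum and a maximum can be computed per record. A sketch (field names and the sfloat type are illustrative, assuming the stock example schema types):

<field name="minFee" type="sfloat" indexed="true" stored="true"/>
<field name="maxFee" type="sfloat" indexed="true" stored="true"/>

Every fee lies within [3, 10] exactly when the minimum is at least 3 and the maximum is at most 10:

q=*:*&fq=minFee:[3 TO *]&fq=maxFee:[* TO 10]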
highlighting fragments EMPTY
hi, i am trying to get highlighting working and it's turning out to be a pain. here is my schema:

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="title" type="string" indexed="true" stored="true"/>
<field name="pi" type="string" indexed="true" stored="true"/>
<field name="status" type="string" indexed="true" stored="true"/>

here is the catchall field (default field for search as well):

<field name="content" type="text" indexed="true" stored="false" multiValued="true"/>

here is how I have set up the solrconfig file:

<!-- example highlighter config, enable per-query with hl=true -->
<str name="hl.fl">title pi status</str>
<!-- for this field, we want no fragmenting, just highlighting -->
<str name="f.name.hl.fragsize">0</str>
<!-- instructs Solr to return the field itself if no query terms are found -->
<str name="f.title.hl.alternateField">content</str>
<str name="f.pi.hl.alternateField">content</str>
<str name="f.status.hl.alternateField">content</str>
<str name="f.title.hl.fragmenter">regex</str> <!-- defined below -->
<str name="f.pi.hl.fragmenter">regex</str> <!-- defined below -->
<str name="f.status.hl.fragmenter">regex</str> <!-- defined below -->

after this, when I search for, let's say, http://localhost:8983/solr/select?q=submit&hl=true I get these results in the highlight section:

<lst name="highlighting">
  <lst name="FP1934"/>
  <lst name="FP1934-PR02"/>
  <lst name="FP1934-PR03"/>
  <lst name="FP0526"/>
  <lst name="FP0385"/>
</lst>

with no reference to the actual string .. this number that's being returned is the id of the records .. and is also the unique identifier .. why am I not getting the string fragments with the search terms highlighted? thanks for your help
Re: highlighting fragments EMPTY
All of your fields seem to be of a string type; that's why the highlighting doesn't work. The highlighting fields must be tokenized before you can do the highlighting on them. Jan. --- On Fri, 2/19/10, adeelmahmood adeelmahm...@gmail.com wrote: From: adeelmahmood adeelmahm...@gmail.com Subject: highlighting fragments EMPTY To: solr-user@lucene.apache.org Date: Friday, February 19, 2010, 4:46 PM [original message quoted in full above; snipped]
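Concretely, the fix Jan describes is to switch the highlighted fields from the string type to a tokenized type such as the stock "text" type from the example schema (and then reindex):

<field name="title" type="text" indexed="true" stored="true"/>
<field name="pi" type="text" indexed="true" stored="true"/>
<field name="status" type="text" indexed="true" stored="true"/>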
Re: What is largest reasonable setting for ramBufferSizeMB?
On Fri, Feb 19, 2010 at 5:03 AM, Glen Newton glen.new...@gmail.com wrote: You may consider using LuSql[1] to create the indexes, if your source content is in a JDBC accessible db. It is quite a bit faster than Solr, as it is a tool specifically created and tuned for Lucene indexing. Any idea why it's faster? AFAIK, the main purpose of DIH is indexing databases too. If DIH is much slower, we should speed it up! -Yonik http://www.lucidimagination.com
Re: long warmup duration
Hey, I am quite confused by your configuration. It seems to me that your caches are extremely small for 30 million documents (128), and during warmup you only put up to 20 docs in them. Please correct me if I misunderstand anything. In my opinion your warm-up duration is not that impressive; since we currently disabled warmup, the new searcher registers in only a few seconds. Actually, I would not drop these cache numbers. With a cache of 30k documents we had a hit ratio of 60%; decreasing this size, the hit ratio decreased as well. With a hit ratio of currently 30%, it seems to be better to disable caching anyway. Of course we would love to use caching ;-). with best regards, Stefan

Antonio Lobato wrote: Drop those cache numbers. Way down. I warm up 30 million documents in about 2 minutes with the following configuration:

<documentCache class="solr.FastLRUCache" size="128" initialSize="10" cleanupThread="true"/>
<queryResultCache class="solr.FastLRUCache" size="128" initialSize="10" autowarmCount="20" cleanupThread="true"/>
<fieldValueCache class="solr.FastLRUCache" size="128" initialSize="10" autowarmCount="20" cleanupThread="true"/>
<filterCache class="solr.FastLRUCache" size="128" initialSize="10" autowarmCount="20" cleanupThread="true"/>

Mind you, I also use Solr 1.4. Also, set up a decent warming query or two, like so:

<lst>
  <str name="q">date:[NOW-2DAYS TO NOW]</str>
  <str name="start">0</str>
  <str name="rows">100</str>
  <str name="sort">date desc</str>
</lst>

Don't warm facets that have a large number of terms or you will kill your warm-up time. Hope this helps!

On Feb 17, 2010, at 8:55 AM, Stefan Neumann wrote: Hi all, we have been facing extremely increasing warmup times over the last 15 days, which we are not able to explain, since the number of documents and their size is stable. Before the increase we could commit our changes in nearly 20 minutes; now it takes about 2 hours. We were able to identify the warmup of the caches (queryResultCache and filterCache) as the reason. We tried to decrease the number of warmup elements from 3 to 1 without any impact. What influences the runtime during the warmup? Is there any possibility to boost the warmup? I attach some more information and statistics. Thanks a lot for your help. Stefan

Solr: 1.3
Documents: 4,000,000
-Xmx: 12G
index size/disc: 4.7G

config:
<queryResultWindowSize>100</queryResultWindowSize>
<queryResultMaxDocsCached>200</queryResultMaxDocsCached>
No queries configured for warming.

CACHES:
=======
name: queryResultCache
class: org.apache.solr.search.LRUCache
version: 1.0
description: LRU Cache(maxSize=20, initialSize=3, autowarmCount=1, regenerator=org.apache.solr.search.solrindexsearche...@36eb7331)
stats:
lookups: 15958
hits: 9589
hitratio: 0.60
inserts: 16211
evictions: 0
size: 16169
warmupTime: 1960239
cumulative_lookups: 436250
cumulative_hits: 260678
cumulative_hitratio: 0.59
cumulative_inserts: 174066
cumulative_evictions: 0

name: filterCache
class: org.apache.solr.search.LRUCache
version: 1.0
description: LRU Cache(maxSize=20, initialSize=3, autowarmCount=3, regenerator=org.apache.solr.search.solrindexsearche...@9818f80)
stats:
lookups: 6313622
hits: 6304004
hitratio: 0.99
inserts: 42266
evictions: 0
size: 40827
warmupTime: 1268074
cumulative_lookups: 118887830
cumulative_hits: 118605224
cumulative_hitratio: 0.99
cumulative_inserts: 296134
cumulative_evictions: 0

-- Stefan Neumann Dipl.-Ing.
freiheit.com technologies gmbh Straßenbahnring 22 / 20251 Hamburg, Germany fon +49 (0)40 / 890584-0 fax +49 (0)40 / 890584-20 HRB Hamburg 70814 1CB2 BA3C 168F 0C2B 6005 FC5E 3EBA BCE2 1BF0 21D3 Geschäftsführer: Claudia Dietze, Stefan Richter, Jörg Kirchhof
Re: Run Solr within my war
Using EmbeddedSolrServer is a client-side way of communicating with Solr via the file system. Solr still has to be up and running before that. My question is more along the lines of how to take the server jars that perform the core functionality and bundle them so that they start up within a war which is also the application war for the program that will communicate as the client with the Solr server. On Thu, Feb 18, 2010 at 5:49 PM, Richard Frovarp rfrov...@apache.org wrote: On 2/18/2010 4:22 PM, Pulkit Singhal wrote: Hello Everyone, I do NOT want to host Solr separately. I want to run it within my war with the Java application which is using it. How easy/difficult is that to set up? Can anyone with past experience on this topic please comment? thanks, - Pulkit So basically you're talking about running an embedded version of Solr like the EmbeddedSolrServer? I have no experience with this, but this should provide you the correct search term to find documentation on its use. From what little code I've seen to run test cases against Solr, it looks relatively straightforward to get running. To use it, you would use the SolrJ library to communicate with the embedded Solr server. Richard
Re: @Field annotation support
Ok then, is this the correct class to support the @Field annotation? Because I have it on the path but it's not working. org\apache\solr\solr-solrj\1.4.0\solr-solrj-1.4.0.jar/org\apache\solr\client\solrj\beans\Field.class 2010/2/18 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: the solrj jar On Thu, Feb 18, 2010 at 10:52 PM, Pulkit Singhal pulkitsing...@gmail.com wrote: Hello All, When I use Maven or Eclipse to try and compile my bean which has the @Field annotation as specified on the http://wiki.apache.org/solr/Solrj page ... the compiler doesn't find any class to support the annotation. What jar should we use to bring in this custom Solr annotation? -- - Noble Paul | Systems Architect | AOL | http://aol.com
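That is the right class for SolrJ 1.4. For reference, a minimal sketch of an annotated bean (the class and field names here are made up):

    import org.apache.solr.client.solrj.beans.Field;

    public class Item {
        @Field
        String id;

        @Field("title")  // maps this bean property to the "title" index field
        String name;
    }

If the import org.apache.solr.client.solrj.beans.Field does not resolve, the jar is most likely not actually on the compile classpath (e.g. declared with the wrong Maven scope), since the annotation itself lives in solr-solrj-1.4.0.jar.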
Re: Run Solr within my war
Pulkit Singhal wrote: Using EmbeddedSolrServer is a client side way of communicating with Solr via the file system. Solr has to still be up and running before that. My question is more along the lines of how to put the server jars that perform the core functionality and bundle them to start up within a war which is also the application war for the program that will communicate as the client with the Solr server. I could be way wrong, but my interpretation is that EmbeddedSolrServer provides a way to embed Solr into an application without requiring that anything else is running. http://wiki.apache.org/solr/Solrj#EmbeddedSolrServer If you are looking for a method of your application doing SolrJ calls to Solr, without having to install a separate Solr instance, EmbeddedSolrServer would meet your needs. You'd have to use a few other functions to load the core and register it, but it's doable without having anything else running.
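Following Richard's suggestion, a rough sketch of bootstrapping an embedded server in-process, based on the SolrJ wiki example for Solr 1.4 (the solr home path is illustrative):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
    import org.apache.solr.core.CoreContainer;

    public class EmbeddedExample {
        public static void main(String[] args) throws Exception {
            // Point at a solr home containing solr.xml/conf before initializing
            System.setProperty("solr.solr.home", "/path/to/solr/home");
            CoreContainer.Initializer initializer = new CoreContainer.Initializer();
            CoreContainer cores = initializer.initialize();
            SolrServer server = new EmbeddedSolrServer(cores, "");
            // 'server' now accepts the usual SolrJ calls (add, commit, query)
            // without any separate Solr webapp running.
            cores.shutdown();
        }
    }

Since this all runs in-process, packaging the Solr core jars inside the application war should be enough; no separate Solr instance needs to be deployed.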
Re: What is largest reasonable setting for ramBufferSizeMB?
Hi Glen, I'd love to use LuSql, but our data is not in a db. It's 6-8TB of files containing OCR (one file per page, for about 1.5 billion pages) gzipped on disk, which are ungzipped, concatenated, and converted to Solr documents on-the-fly. We have multiple instances of our Solr document producer script running. At this point we can run enough producers that the rate at which Solr can ingest and index documents is our current bottleneck, and so far the bottleneck we see for indexing appears to be disk I/O for Solr/Lucene during merges. Is there any obvious relationship between the size of the ramBuffer and how much heap you need to give the JVM, or is there some reasonable method of finding this out by experimentation? We would rather not find out by decreasing the amount of memory allocated to the JVM until we get an OOM. Tom Glen Newton wrote: I've run Lucene with heap sizes as large as 28GB of RAM (on a 32GB machine, 64bit, Linux) and a ramBufferSize of 3GB. While I haven't noticed the GC issues Mark mentioned in this configuration, I have seen them in the ranges he discusses (on 1.6 update 18). You may consider using LuSql[1] to create the indexes, if your source content is in a JDBC accessible db. It is quite a bit faster than Solr, as it is a tool specifically created and tuned for Lucene indexing. But it is command-line, not RESTful like Solr. The released version of LuSql only runs single machine (though designed for many threads); the new release will allow distributing indexing across any number of machines (with each machine building a shard). The new release also has plugable sources, so it is not restricted to JDBC. -Glen [1] http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql
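For readers wondering where this knob lives: ramBufferSizeMB is set in solrconfig.xml, e.g. (the value here is purely illustrative, not a recommendation):

<indexDefaults>
  <ramBufferSizeMB>1024</ramBufferSizeMB>
</indexDefaults>

As a rough, unverified rule of thumb, the JVM heap needs headroom well beyond the buffer itself, since Lucene allocates additional memory while flushing and merging segments, on top of whatever the rest of Solr (caches, request handling) is using.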
Re: Multicore Example
Are you sure that you don't have any java processes that are still running? Did you change the port or are you still using 8983? Lee Smith-6 wrote: Hey All Trying to dip my feet into multicore and hoping someone can advise why the example is not working. Basically I have been working with the example single core fine so I have stopped the server and restarted with the new command line for multicore ie, java -Dsolr.solr.home=multicore -jar start.jar When it launches I get this error: 2010-02-19 11:13:39.740::WARN: EXCEPTION java.net.BindException: Address already in use at java.net.PlainSocketImpl.socketBind(Native Method) at etc Any ideas what this can be because I have stopped the first one. Thank you if you can advise.
Re: Multicore Example
Do you have something else using port 8983 or 8080? Sent from my iPhone On 19 Feb 2010, at 19:22, Lee Smith l...@weblee.co.uk wrote: Hey All Trying to dip my feet into multicore and hoping someone can advise why the example is not working. Basically I have been working with the example single core fine so I have stopped the server and restarted with the new command line for multicore ie, java -Dsolr.solr.home=multicore -jar start.jar When it launches I get this error: 2010-02-19 11:13:39.740::WARN: EXCEPTION java.net.BindException: Address already in use at java.net.PlainSocketImpl.socketBind(Native Method) at etc Any ideas what this can be because I have stopped the first one. Thank you if you can advise.
Re: Seattle Hadoop/Lucene/NoSQL Meetup; Wed Feb 24th, Feat. MongoDB
Reminder: this month's Seattle Hadoop Meetup is this Wednesday. Don't forget to RSVP! On Tue, Feb 16, 2010 at 6:09 PM, Bradford Stephens bradfordsteph...@gmail.com wrote: Greetings, It's time for another awesome Seattle Hadoop/Lucene/Scalability/NoSQL Meetup! As always, it's at the University of Washington, Allen Computer Science building, Room 303 at 6:45pm. You can find a map here: http://www.washington.edu/home/maps/southcentral.html?cse Last month, we had a great talk from Steve McPherson of Razorfish on their usage of Hadoop. This month, we'll have Richard Kreuter from MongoDB talking about, well, MongoDB. As well as assorted discussion on the Hadoop ecosystem. If you can, please RSVP here (not required, but very nice): http://www.meetup.com/Seattle-Hadoop-HBase-NoSQL-Meetup/ My cell # is 904-415-3009 if you have questions/get lost. Cheers, Bradford -- http://www.drawntoscalehq.com -- Big Data for all. The Big Data Platform. http://www.roadtofailure.com -- The Fringes of Scalability, Social Media, and Computer Science
Re: Multicore Example
How can I find out ?? On 19 Feb 2010, at 19:26, Dave Searle wrote: Do you have something else using port 8983 or 8080? Sent from my iPhone On 19 Feb 2010, at 19:22, Lee Smith l...@weblee.co.uk wrote: Hey All Trying to dip my feet into multicore and hoping someone can advise why the example is not working. Basically I have been working with the example single core fine so I have stopped the server and restarted with the new command line for multicore ie, java -Dsolr.solr.home=multicore -jar start.jar When it launches I get this error: 2010-02-19 11:13:39.740::WARN: EXCEPTION java.net.BindException: Address already in use at java.net.PlainSocketImpl.socketBind(Native Method) at etc Any ideas what this can be because I have stopped the first one. Thank you if you can advise.
Strange performance behaviour when concurrent requests are done
Hey there, I have been doing some stress testing with a 2 physical CPU (4 cores each) server. After some reading about GC performance tuning I have configured it this way:

/usr/lib/jvm/java-6-sun/bin/java -server -Xms7000m -Xmx7000m -XX:ReservedCodeCacheSize=10m -XX:NewSize=1000m -XX:MaxNewSize=1000m -XX:SurvivorRatio=16 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:+CMSClassUnloadingEnabled -XX:PermSize=35m -XX:MaxPermSize=35m -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.endorsed.dirs=/opt/tomcat-shard-00/common/endorsed

My java version is: java version 1.6.0_12 Java(TM) SE Runtime Environment (build 1.6.0_12-b04) Java HotSpot(TM) 64-Bit Server VM (build 11.2-b01, mixed mode)

My index is optimized, with compound file and in readOnly mode. I have just one Solr core on my Solr box. I have launched different tests against the index and, to my surprise, the results are:

test1: concurrent threads: 2, throughput: 15, avg response: 130 ms
test2: concurrent threads: 3, throughput: 22.3, avg response: 130 ms
test3: concurrent threads: 4, throughput: 28, avg response: 140 ms
test4: concurrent threads: 5, throughput: 26.8, avg response: 190 ms
test5: concurrent threads: 6, throughput: 22, avg response: 270 ms

All requests are launched against the same IndexSearcher (no reloads or warming are done during the test). I have activated GC debug logging in the JVM to see when a GC happens. It happens every 3 seconds and takes about 20ms in test1, test2 and test3. In test4 and test5 it happens every 3 seconds as well but takes 40ms. So it looks like GC is not delaying the average response time of the requests. The machine has 4 cores and is really not stressed in terms of CPU, nor IO (I am using an ssd disk). Given this scenario, how is it possible that going from 5 concurrent threads to 6 almost doubles the average response time? (And from 4 to 5 it's not double, but still significantly more.) I think GC can't be the cause, given the numbers I have mentioned. As far as I have always understood, the Lucene IndexSearcher deals perfectly with concurrency, but it seems that there's something there that blocks when there are more than 2 requests at the same time. The compound file optimization gives better response times, but could it in any way be bad for performance? I am so confused about this... can someone explain whether this is normal, or why it happens? I mean, does Lucene or Solr have some blocking thing? Thanks in advance
RE: Documents disappearing
Try inspecting your index with Luke. Ankit -----Original Message----- From: Pascal Dimassimo [mailto:thesuper...@hotmail.com] Sent: Friday, February 19, 2010 2:22 PM To: solr-user@lucene.apache.org Subject: Documents disappearing Hi, I have encountered a situation that I can't explain. We are indexing documents that are often duplicates, so we activated deduplication like this:

<processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
  <bool name="enabled">true</bool>
  <bool name="overwriteDupes">true</bool>
  <str name="signatureField">signature</str>
  <str name="fields">title,text</str>
  <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
</processor>

What I can't explain is that when I look at the document counts in the log, I see documents disappearing:

11:24:23 INFO - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=0 status=0 QTime=0
14:04:24 INFO - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=4065 status=0 QTime=10
14:17:07 INFO - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=6499 status=0 QTime=42
14:25:42 INFO - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=7629 status=0 QTime=1
14:47:12 INFO - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=10140 status=0 QTime=12
15:17:22 INFO - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=10861 status=0 QTime=13
15:47:31 INFO - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=9852 status=0 QTime=19
16:17:42 INFO - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=8112 status=0 QTime=13
16:38:17 INFO - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=10
16:39:10 INFO - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=1
16:47:40 INFO - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=46
16:51:24 INFO - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=74
17:02:13 INFO - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=102
17:17:41 INFO - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=8

11:24 was the time at which Solr was started that day. Around 13:30, we started the indexation. At some point during the indexation, I noticed that a batch of documents was resent (i.e., documents with the same id field were sent again to the index). And according to the log, NO delete was sent to Solr. I understand that if I send duplicates (either documents with the same id or with the same signature), the count of documents should stay the same. But how can we explain that it is lowering? What are the possible causes of this behavior? Thanks!
Re: Multicore Example
Are you on windows? Try netstat -a Sent from my iPhone On 19 Feb 2010, at 20:02, Lee Smith l...@weblee.co.uk wrote: How can I find out ?? On 19 Feb 2010, at 19:26, Dave Searle wrote: Do you have something else using port 8983 or 8080? Sent from my iPhone On 19 Feb 2010, at 19:22, Lee Smith l...@weblee.co.uk wrote: Hey All Trying to dip my feet into multicore and hoping someone can advise why the example is not working. Basically I have been working with the example single core fine so I have stopped the server and restarted with the new command line for multicore ie, java -Dsolr.solr.home=multicore -jar start.jar When it launches I get this error: 2010-02-19 11:13:39.740::WARN: EXCEPTION java.net.BindException: Address already in use at java.net.PlainSocketImpl.socketBind(Native Method) at etc Any ideas what this can be because I have stopped the first one. Thank you if you can advise.
Re: Multicore Example
Assuming you are on a unix variant with a working lsof, use this. This probably won't work correctly on Solaris 10:

lsof -nPi | grep 8983
lsof -nPi | grep 8080

On Windows, you can do this in a command prompt. It requires elevation on Vista or later. The -b option was added in WinXP SP2 and Win2003 SP1; without it you can't see the program name that has the port open:

netstat -b > ports.txt

then open ports.txt. Shawn On 2/19/2010 1:01 PM, Lee Smith wrote: How can I find out ?? On 19 Feb 2010, at 19:26, Dave Searle wrote: Do you have something else using port 8983 or 8080?
RE: Documents disappearing
Using LukeRequestHandler, I see:

<int name="numDocs">7725</int>
<int name="maxDoc">28099</int>
<int name="numTerms">758826</int>
<long name="version">1266355690710</long>
<bool name="optimized">false</bool>
<bool name="current">true</bool>
<bool name="hasDeletions">true</bool>
<str name="directory">org.apache.lucene.store.NIOFSDirectory:org.apache.lucene.store.NIOFSDirectory@/opt/solr/myindex/data/index</str>

I will copy the index to my local machine so I can open it with Luke. Should I look for something specific? Thanks! ANKITBHATNAGAR wrote: Try inspecting your index with Luke. Ankit -----Original Message----- From: Pascal Dimassimo [mailto:thesuper...@hotmail.com] Sent: Friday, February 19, 2010 2:22 PM To: solr-user@lucene.apache.org Subject: Documents disappearing [original message with the dedup config and log excerpts quoted in full above; snipped]
Re: Multicore Example
Thanks Shawn. I am actually running it on a Mac and it does not like those unix commands. Any further advice? Lee On 19 Feb 2010, at 20:32, Shawn Heisey wrote: Assuming you are on a unix variant with a working lsof, use this. This probably won't work correctly on Solaris 10: lsof -nPi | grep 8983 lsof -nPi | grep 8080 On Windows, you can do this in a command prompt. It requires elevation on Vista or later. The -b option was added in WinXP SP2 and Win2003 SP1; without it you can't see the program name that has the port open: netstat -b > ports.txt Shawn On 2/19/2010 1:01 PM, Lee Smith wrote: How can I find out ?? On 19 Feb 2010, at 19:26, Dave Searle wrote: Do you have something else using port 8983 or 8080?
Re: Multicore Example
The point that these guys are trying to make is that if another program is using the port that Solr is trying to bind to, then they will both fight over exclusive use of the port. Both the netstat and lsof commands work fine on my Mac (Leopard 10.5.8):

Trinity:~ kelvin$ which netstat
/usr/sbin/netstat
Trinity:~ kelvin$ which lsof
/usr/sbin/lsof
Trinity:~ kelvin$

If you use MacPorts, you can also find out port information using 'nmap'. If something is already using the port Solr is trying to use, then you need to configure Solr to use a different port. K On Fri, Feb 19, 2010 at 12:51 PM, Lee Smith l...@weblee.co.uk wrote: Thanks Shawn. I am actually running it on a Mac and it does not like those unix commands. Any further advice? Lee On 19 Feb 2010, at 20:32, Shawn Heisey wrote: Assuming you are on a unix variant with a working lsof, use this. This probably won't work correctly on Solaris 10: lsof -nPi | grep 8983 lsof -nPi | grep 8080 On Windows, you can do this in a command prompt. It requires elevation on Vista or later. The -b option was added in WinXP SP2 and Win2003 SP1; without it you can't see the program name that has the port open: netstat -b > ports.txt Shawn On 2/19/2010 1:01 PM, Lee Smith wrote: How can I find out ?? On 19 Feb 2010, at 19:26, Dave Searle wrote: Do you have something else using port 8983 or 8080?
Solr 1.5 in production
What is the prevailing opinion on using solr 1.5 in a production environment? I know that many people were using 1.4 in production for a while before it became an official release. Specifically I'm interested in using some of the new spatial features. Thanks, Asif -- Asif Rahman Lead Engineer - NewsCred a...@newscred.com http://platform.newscred.com
Re: Solr 1.5 in production
On Feb 19, 2010, at 4:54 PM, Asif Rahman wrote: What is the prevailing opinion on using Solr 1.5 in a production environment? I know that many people were using 1.4 in production for a while before it became an official release. Specifically I'm interested in using some of the new spatial features. These aren't fully baked yet (they still need some spatial filtering capabilities, which I'm getting close to done with, or close enough to submit a patch anyway), but feedback would be welcome. The main risk, I suppose, is that any new APIs could change. Other than that, the usual advice applies: test it out in your environment and see if it meets your needs. On the spatial stuff, we'd definitely appreciate feedback on performance, functionality, APIs, etc. -Grant
Re: long warmup duration
You can disable warming, and a new searcher will register (almost) instantly, no matter the size. However, once you run your first search, you will be warming your searcher, and it will block for a long, long time, giving the end user a frozen page. Warming is just another word for running a set of queries before the searcher is pushed to the front end. Naturally, if you disable warming, your searcher will register right away. I wouldn't recommend it, though. If I disabled warming on my documents, my new searchers would register instantly, but my first search on my web page would be stuck for 50 seconds or so. As for the cache size, caching caches entry data, not documents. That's what warming is for. On 2/19/2010 12:17 PM, Stefan Neumann wrote: Hey, I am quite confused by your configuration. [rest of the message, including the cache configs and statistics, quoted in full above; snipped]
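To make "running a set of queries before the searcher is pushed to the front end" concrete: warming queries are configured as a newSearcher listener in solrconfig.xml, along these lines (the query itself is only an example):

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">date:[NOW-2DAYS TO NOW]</str>
      <str name="start">0</str>
      <str name="rows">100</str>
      <str name="sort">date desc</str>
    </lst>
  </arr>
</listener>

There is a matching firstSearcher event for the very first searcher after startup.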
filter result by catalog
So, I am looking at better ways to filter a resultset by catalog. So, I have an index of products. And based on the user, I want to filter the search results to what they are allowed to see. I will probably have up to 200 or so different catalogs.
Re: highlighting fragments EMPTY
well ok, I guess that makes sense, and I tried changing my title field to the text type and then highlighting worked on it .. but 1) as far as not merging all fields into a catchall field and instead configuring the dismax handler to search through them .. do you mean I'll then have to specify the field I want to search in, e.g. q=something&hl.fl=title or q=somethingelse&hl.fl=status .. and another thing is that I have about 20-some fields which I am merging into my catchall field .. with that many fields, do you still think it's better to use dismax rather than a catchall field? 2) secondly, for highlighting, q=title:searchterm also didn't work .. it only works if I change the type of the title field to text instead of string .. even if I give the full string in the q param, it still doesn't highlight it unless, like I said, I change the field type to text ... so why is that? .. and if that's just how it is and I have to change some of my fields to text .. then my question is whether Solr will analyze them first in their own field and then copy them to the catchall field, doing the analysis one more time, since the catchall field is also text .. I guess this is just more of an understanding question. thanks for all you guys' help Ahmet Arslan wrote: [original question quoted in full above; snipped] You need to change the type of the fields (title, pi, status) from string to text (same as the content field). There should be a match/hit on that field in order to create highlighted snippets. For example, q=title:submit should return documents so that a snippet of title can be generated. FYI: You can search title, pi and status at the same time using http://wiki.apache.org/solr/DisMaxRequestHandler without copying all of them into a catch-all field.
Re: spellcheck.build=true has no effect
Hello, Can someone please confirm whether this is the correct behaviour? Thanks, darniz darniz wrote: Hello All. After doing a lot of research I came to this conclusion; please correct me if I am wrong. I noticed that if you have buildOnCommit and buildOnOptimize set to true in your spellcheck component, then the spellcheck index is rebuilt whenever a commit or optimize happens, which is the desired and correct behaviour. Please read on. I am using an index-based spellchecker and I am copying make and model to my spellcheck field. I index some documents, and make and model are copied to the spellcheck field when I commit. Now I stopped my Solr server and added one more field, bodytype, to be copied to my spellcheck field. I don't want to reindex the data, so I issued an http request to rebuild my spellchecker: spellcheck=true&spellcheck.build=true&spellcheck.dictionary=default. It looks like the above command has no effect; the bodyType value is not being copied to the spellcheck field. The only time the spellcheck field has the bodyType value copied into it is when I reindex the documents and do a commit. Is this the desired behaviour? That adding buildOnCommit and buildOnOptimize will force the spellchecker to rebuild only if a commit or optimize happens? Please let me know if there are some configurable parameters, so that I can issue the http command rather than indexing the data again and again. thanks darniz
Re: filter result by catalog
Hello Kevin, So what have you tried so far? I see from http://www.search-lucene.com/m?id=839141.906...@web81107.mail.mud.yahoo.com||acl that you've tried the acl field approach. How about the bitset approach described there? Otis -- Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ - Original Message From: Kevin Osborn osbo...@yahoo.com To: Solr solr-user@lucene.apache.org Sent: Fri, February 19, 2010 6:06:51 PM Subject: filter result by catalog So, I am looking at better ways to filter a resultset by catalog. So, I have an index of products. And based on the user, I want to filter the search results to what they are allowed to see. I will probably have up to 200 or so different catalogs.
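For anyone new to the thread, the acl-field approach being referenced amounts to something like this (field name and values are illustrative): give each product a multivalued field listing the catalogs it belongs to,

<field name="catalog" type="string" indexed="true" stored="false" multiValued="true"/>

and filter at query time by the catalogs the current user is entitled to:

q=some search terms&fq=catalog:(cat12 OR cat47)

With ~200 catalogs, these filter queries should cache well in the filterCache, as long as many users share the same catalog combinations.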
Re: Documents disappearing
Pascal, Look at the difference between numDocs and maxDoc. That delta represents deleted docs. Maybe there is something deleting your docs after all! Otis -- Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ - Original Message From: Pascal Dimassimo thesuper...@hotmail.com To: solr-user@lucene.apache.org Sent: Fri, February 19, 2010 3:50:26 PM Subject: RE: Documents disappearing Using LukeRequestHandler, I see:

<int name="numDocs">7725</int>
<int name="maxDoc">28099</int>
<int name="numTerms">758826</int>
<long name="version">1266355690710</long>
<bool name="optimized">false</bool>
<bool name="current">true</bool>
<bool name="hasDeletions">true</bool>
<str name="directory">org.apache.lucene.store.NIOFSDirectory:org.apache.lucene.store.NIOFSDirectory@/opt/solr/myindex/data/index</str>

I will copy the index to my local machine so I can open it with Luke. Should I look for something specific? Thanks! [rest of the message quoted in full above; snipped]
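As an aside (not from the thread): the hasDeletions=true plus the numDocs/maxDoc gap above is exactly what signature-based overwriting leaves behind, and the deleted docs are only reclaimed when segments merge. An explicit optimize forces that immediately, e.g.:

curl 'http://localhost:8983/solr/myindex/update' -H 'Content-Type: text/xml' --data-binary '<optimize/>'

(the core name myindex is taken from the log lines earlier in the thread; adjust the URL to your setup).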
Re: What is largest reasonable setting for ramBufferSizeMB?
Glen may be referring to LuSql indexing with multiple threads? Does/can DIH do that, too? Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ - Original Message From: Yonik Seeley yo...@lucidimagination.com To: solr-user@lucene.apache.org Sent: Fri, February 19, 2010 11:41:07 AM Subject: Re: What is largest reasonable setting for ramBufferSizeMB? On Fri, Feb 19, 2010 at 5:03 AM, Glen Newton wrote: You may consider using LuSql[1] to create the indexes, if your source content is in a JDBC accessible db. It is quite a bit faster than Solr, as it is a tool specifically created and tuned for Lucene indexing. Any idea why it's faster? AFAIK, the main purpose of DIH is indexing databases too. If DIH is much slower, we should speed it up! -Yonik http://www.lucidimagination.com
Re: replications issue
Hello, You are replicating every 60 seconds? I hope you don't have a large index with lots of continuous index updates on the master, as replicating every 60 seconds, while doable, may be a bit too frequent (depending on index size, amount of changes, cache settings, etc.). Otis -- Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ - Original Message From: giskard gisk...@autistici.org To: solr-user@lucene.apache.org Sent: Fri, February 19, 2010 4:11:56 AM Subject: Re: replications issue Ciao, Uhm, after some time a new index in data/index on the slave has been written with the ~size of the master index. The configuration on both master and slave is the same as the one on the SolrReplication wiki page. [replication handler config quoted in full above; snipped] When the master is started, pass in -Denable.master=true, and in the slave pass in -Denable.slave=true. Alternately, these values can be stored in a solrcore.properties file. On 19/feb/2010, at 03:43, Otis Gospodnetic wrote: giskard, Is this on the master or on the slave(s)? Maybe you can paste your replication handler config for the master and your replication handler config for the slave. Otis From: giskard To: solr-user@lucene.apache.org Sent: Thu, February 18, 2010 12:16:37 PM Subject: replications issue Hi all, I've set up Solr replication as described in the wiki. When I start the replication, a directory called index.$numbers is created; after a while it disappears and a new index.$othernumbers is created. index/ remains untouched with an empty index. Any clue? Thank you in advance, Riccardo -- ciao, giskard -- ciao, giskard
Re: optimize is taking too much time
Hello, Solr will never optimize the whole index without somebody explicitly asking for it. Lucene will merge index segments on the master as documents are indexed. How often it does that depends on mergeFactor. See: http://search-lucene.com/?q=mergeFactor+segment+merge&fc_project=Lucene&fc_project=Solr&fc_type=mail+_hash_+user Otis -- Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ - Original Message From: mklprasad mklpra...@gmail.com To: solr-user@lucene.apache.org Sent: Fri, February 19, 2010 1:02:11 AM Subject: Re: optimize is taking too much time Jagdish Vasani-2 wrote: Hi, you should not optimize the index after each insert of a document; instead you should optimize it after inserting a good number of documents, because optimize will merge all segments into one according to the settings of the Lucene index. thanks, Jagdish On Fri, Feb 12, 2010 at 4:01 PM, mklprasad wrote: hi, in my Solr I have 1,42,45,223 records taking some 50GB. Now when I am loading a new record and it tries to optimize the docs, it takes too much memory and time. Can anybody please tell me whether we have any property in Solr to get rid of this? Thanks in advance Yes, thanks for the reply. I have removed the optimize() from the code, but I have a doubt: 1. Will mergeFactor internally do any optimization, or do we have to specify it? 2. Even if Solr initiates an optimize, with large data like 52GB, will that take a huge amount of time? Thanks, Prasad
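For reference, mergeFactor is set in the index section of solrconfig.xml; a sketch with the stock example value (not a tuned recommendation):

<indexDefaults>
  <mergeFactor>10</mergeFactor>
</indexDefaults>

With mergeFactor=10, Lucene merges ten similarly-sized segments into one larger segment as indexing proceeds; lower values mean more aggressive merging (fewer segments, slower indexing), higher values the opposite. And yes, an explicit optimize on a ~50GB index can take a long time, since it rewrites the whole index down to a single segment.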
Re: filter result by catalog
Yes I thought about both methods. The ACL method is easier, but has some scalability issues. We use the bitset method in another product, but there are some complexity and resource problems. This is a new project so I am revisiting the issue to see if anyone had any better ideas. On Fri Feb 19th, 2010 6:18 PM PST Otis Gospodnetic wrote: So, hello Kevin, So what have you tried so far? I see from http://www.search-lucene.com/m?id=839141.906...@web81107.mail.mud.yahoo.com||acl you've tried the acl field approach. How about the bitset approach described there? Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ - Original Message From: Kevin Osborn osbo...@yahoo.com To: Solr solr-user@lucene.apache.org Sent: Fri, February 19, 2010 6:06:51 PM Subject: filter result by catalog So, I am looking at better ways to filter a resultset by catalog. So, I have an index of products. And based on the user, I want to filter the search results to what they are allowed to see. I will probably have up to 200 or so different catalogs.