caching
If I don't explicitly set any cache defaults in solrconfig.xml and just make use of the default config file, does Solr do the caching automatically based on the query? Thanks
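For reference, the example solrconfig.xml that ships with Solr already enables the standard caches, so per-query caching happens automatically when the default file is in use. An illustrative sketch of the relevant section (the exact sizes in your copy may differ):

```xml
<!-- Cache settings in the style of the example solrconfig.xml;
     sizes here are illustrative, not recommendations. -->
<query>
  <filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="256"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="256"/>
  <documentCache class="solr.LRUCache" size="512" initialSize="512"/>
</query>
```

Filter queries populate the filterCache, result pages the queryResultCache, and fetched stored documents the documentCache, with no per-query opt-in required.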
Dynamic range Facets
My documents (products) have a price field, and I want a dynamically calculated range facet for it in the response. E.g. I want this in the response:

price:[* TO 20] - 23
price:[20 TO 40] - 42
price:[40 TO *] - 33

if prices are between 0 and 60, but

price:[* TO 100] - 23
price:[100 TO 200] - 42
price:[200 TO *] - 33

if prices are between 0 and 300. So the question is how to get dynamic facet ranges in the response from Solr. This is the same question as one posted back in 2007, but it is still waiting for an answer. Is there any solution for this? -- View this message in context: http://www.nabble.com/Dynamic-range-Facets-tp22675413p22675413.html Sent from the Solr - User mailing list archive at Nabble.com.
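Since dynamically computed range facets are not built into Solr of this vintage, one workaround is to compute the ranges client-side: first fetch the min and max price (e.g. two queries sorted by price with rows=1), then derive facet.query parameters from them. A rough sketch of the bucket computation; the function name and the rounding into three buckets are illustrative, not a Solr API:

```python
# Client-side sketch: derive facet.query range strings from the observed
# min/max price. Each returned string would then be sent to Solr as its own
# facet.query parameter.
def dynamic_ranges(min_price, max_price, buckets=3):
    """Split [min_price, max_price] into Solr facet.query range strings."""
    span = max_price - min_price
    step = max(1, round(span / buckets))
    low = min_price + step
    high = min_price + (buckets - 1) * step
    ranges = ["price:[* TO %d]" % low]
    cur = low
    while cur < high:
        ranges.append("price:[%d TO %d]" % (cur, cur + step))
        cur += step
    ranges.append("price:[%d TO *]" % high)
    return ranges
```

For prices between 0 and 60 this yields the [* TO 20] / [20 TO 40] / [40 TO *] split; for 0 to 300 it yields the [* TO 100] / [100 TO 200] / [200 TO *] variant.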
Re: Problem for replication : segment optimized automaticly
How can I stop this? Noble Paul നോബിള് नोब्ळ् wrote: if the DIH status does not say that it optimized, it is Lucene merging the segments On Mon, Mar 23, 2009 at 8:15 PM, sunnyfr johanna...@gmail.com wrote: I checked this out but it doesn't say anything about optimizing. I'm sure it's the Lucene part about merging, or I don't know...? Noble Paul നോബിള് नोब्ळ् wrote: the easiest way to find out what DIH did is to hit its status command. It will give you a brief description of everything it did during the last import On Mon, Mar 23, 2009 at 12:59 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: Lucene will automatically merge segments when they exceed the mergeFactor. This may be one reason but I'm not sure. I checked DataImportHandler's code again. It won't optimize if optimize=false is specified. On Mon, Mar 23, 2009 at 12:43 AM, sunnyfr johanna...@gmail.com wrote: Do you have any idea ??? :( cheer, sunnyfr wrote: Hi everybody ... still me :) hoo happy day :) Just, I don't get where I missed something; I will try to be clear.
this is my index folder (and we can notice the evolution according to the delta-import every 30 min):

r...@search-01:/data/solr# ls video/data/index/
_2bel.fdt _2bel.fnm _2bel.nrm _2bel.tii _2bel.tvd _2bel.tvx _2bem.fdt _2bem.fnm _2bem.nrm _2bem.tii _2bem.tvd _2bem.tvx _2ben.frq _2ben.prx _2ben.tis _2beo.fdx segments.gen _2bel.fdx _2bel.frq _2bel.prx _2bel.tis _2bel.tvf _2bel_1.del _2bem.fdx _2bem.frq _2bem.prx _2bem.tis _2bem.tvf _2ben.fnm _2ben.nrm _2ben.tii _2beo.fdt _2beo.fnm segments_230x

r...@search-01:/data/solr# ls video/data/index/
_2bel.fdt _2bel.frq _2bel.tii _2bel.tvf _2bem.fdt _2bem.frq _2bem.tii _2bem.tvf _2ben.frq _2ben.tii _2beo.fdx _2beo.nrm _2beo.tis _2beo.tvx _2bel.fdx _2bel.nrm _2bel.tis _2bel.tvx _2bem.fdx _2bem.nrm _2bem.tis _2bem.tvx _2ben.nrm _2ben.tis _2beo.fnm _2beo.prx _2beo.tvd segments.gen _2bel.fnm _2bel.prx _2bel.tvd _2bel_1.del _2bem.fnm _2bem.prx _2bem.tvd _2ben.fnm _2ben.prx _2beo.fdt _2beo.frq _2beo.tii _2beo.tvf segments_230x

r...@search-01:/data/solr# ls video/data/index/
_2beo.fdt _2beo.fdx _2beo.fnm _2beo.frq _2beo.nrm _2beo.prx _2beo.tii _2beo.tis _2beo.tvd _2beo.tvf _2beo.tvx segments.gen segments_230y

So as you can notice my segments increased, which is perfect for my replication (faster to get JUST the last segments). But as you can see in my last ls, my segments have been optimized.
As I can see in my log:

Mar 19 15:42:37 search-01 jsvc.exec[23255]: Mar 19, 2009 3:42:37 PM org.apache.solr.handler.dataimport.DocBuilder commit
INFO: Full Import completed successfully without optimization
Mar 19 15:42:37 search-01 jsvc.exec[23255]: Mar 19, 2009 3:42:37 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=true,waitFlush=false,waitSearcher=true)

But I didn't fire any optimize; my delta-import is fired like: /solr/video/dataimport?command=delta-import&optimize=false

Solrconfig.xml: autocommit turned off:

<!-- <luceneAutoCommit>false</luceneAutoCommit> -->
<!-- <commitLockTimeout>18500</commitLockTimeout> -->
<!-- autocommit pending docs if certain criteria are met
<autoCommit>
  <maxDocs>1</maxDocs>
</autoCommit>
-->

Maybe it comes from the Lucene parameters?

<!-- options specific to the main on-disk lucene index -->
<useCompoundFile>false</useCompoundFile>
<ramBufferSizeMB>50</ramBufferSizeMB>
<mergeFactor>50</mergeFactor>
<!-- Deprecated -->
<!-- <maxBufferedDocs>1000</maxBufferedDocs> -->
<maxMergeDocs>2147483647</maxMergeDocs>
<maxFieldLength>1</maxFieldLength>

Thanks a lot for your help, Sunny -- View this message in context: http://www.nabble.com/Problem-for-replication-%3A-segment-optimized-automaticly-tp22601442p22649412.html Sent from the Solr - User mailing list archive at Nabble.com. -- Regards, Shalin Shekhar Mangar. -- --Noble Paul -- --Noble Paul
search individual words but facet on delimiter
I want the following output from Solr: I index a field with value "A B;C D;E F". I have applied a pattern tokenizer on this field because I know the value will contain ';':

<fieldtype name="conditionText" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern=";" />
  </analyzer>
</fieldtype>

So it indexes "A B", "C D", "E F" properly, and I get facets

A B (1)
C D (1)
E F (1)

This is exactly the facet output I want. But I also want to find this document when I search for just an individual word like 'A' or 'D'. So I want facets exactly as above but at the same time want to be able to search on individual words. Is there a way to achieve this? Thanks in advance, Ashish -- View this message in context: http://www.nabble.com/search-individual-words-but-facet-on-delimiter-tp22676007p22676007.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: search individual words but facet on delimiter
On Tue, Mar 24, 2009 at 2:03 PM, Ashish P ashish.ping...@gmail.com wrote: So it indexes A B, C D, E F properly... So I get facets A B (1) C D (1) E F (1) This is the exact output of facets I want. But I also want to search this document when I just search individual word 'A' or 'D' etc. So I want facets exactly same as above but at the same time to be able to search on individual words also. Yes, you can create another field whose type is text (or anything which can tokenize on whitespace and punctuation). Use the copyField directive to copy the contents into both your original and the above field. Search on the above field and facet on the original field. -- Regards, Shalin Shekhar Mangar.
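The copyField approach described above could look like this in schema.xml (field and type names here are illustrative):

```xml
<!-- Keep the delimiter-tokenized field for faceting, and copy its content
     into a word-tokenized field for searching. Names are illustrative. -->
<field name="conditionText" type="conditionText" indexed="true" stored="true"/>
<field name="conditionWords" type="text" indexed="true" stored="false"/>
<copyField source="conditionText" dest="conditionWords"/>
```

Queries would then search q=conditionWords:A while faceting with facet.field=conditionText.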
Re: no subject
We should obviously get to the bottom of this. But I was thinking, should we have some sort of timeouts on the SnapPuller in the slave to avoid such scenarios? Locking out snap pulls forever is not a good idea. On Mon, Mar 23, 2009 at 8:57 PM, Yonik Seeley yo...@lucidimagination.com wrote: So this is only one slave that hangs up and not the master? Can you get thread dumps on both the master and the slave during a hang? -Yonik http://www.lucidimagination.com On Mon, Mar 23, 2009 at 10:44 AM, Jeff Newburn jnewb...@zappos.com wrote: We are having an intermittent problem with replication. We reindex nightly which usually means there are 2 commits during replication then a final commit/optimize at the end. For some reason the replication will hang occasionally with the following screenshot. This is frustrating as it will completely stall out any further replications. Additionally, it seems to only happen on reindex and it will strike 1 server randomly but not always the same server. In case the screenshot doesn't come through:

Master http://10.66.209.38:8080/solr/zeta-main/replication
Latest Index Version: 1233423827699, Generation: 6237
Replicatable Index Version: 0, Generation: 0
Poll Interval 00:05:00

Local Index
Index Version: 1233423827684, Generation: 6222
Location: /opt/solr-data/zeta-main/index
Size: 1.29 GB
Times Replicated Since Startup: 3591
Previous Replication Done At: Mon Mar 23 00:18:03 PDT 2009
Config Files Replicated At: Wed Mar 18 06:07:53 PDT 2009
Config Files Replicated: [synonyms.txt]
Times Config Files Replicated Since Startup: 4
Next Replication Cycle At: Mon Mar 23 00:27:55 PDT 2009

Current Replication Status
Start Time: Mon Mar 23 00:22:55 PDT 2009
Files Downloaded: 12 / 163
Downloaded: 4.12 MB / 1.41 GB [0.0%]
Downloading File: _5no.tis, Downloaded: 0 bytes / 629.57 KB [0.0%]
Time Elapsed: 26371s, Estimated Time Remaining: 9216278s, Speed: 163 bytes/s

-- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562 -- Regards,
Shalin Shekhar Mangar.
Re: no subject
We do not set a conn_timeout/read_timeout for the HttpClient in SnapPuller. I guess it should be set to some very high value, say 1 hr for read_timeout and say 1 minute for conn_timeout, and we can make it configurable. --Noble On Tue, Mar 24, 2009 at 2:13 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: We should obviously get to the bottom of this. But I was thinking, should we have some sort of timeouts on the SnapPuller in the slave to avoid such scenarios? Locking out snap pulls forever is not a good idea. On Mon, Mar 23, 2009 at 8:57 PM, Yonik Seeley yo...@lucidimagination.com wrote: So this is only one slave that hangs up and not the master? Can you get thread dumps on both the master and the slave during a hang? -Yonik http://www.lucidimagination.com On Mon, Mar 23, 2009 at 10:44 AM, Jeff Newburn jnewb...@zappos.com wrote: We are having an intermittent problem with replication. We reindex nightly which usually means there are 2 commits during replication then a final commit/optimize at the end. For some reason the replication will hang occasionally with the following screenshot. This is frustrating as it will completely stall out any further replications. Additionally, it seems to only happen on reindex and it will strike 1 server randomly but not always the same server.
In case the screenshot doesn't come through:

Master http://10.66.209.38:8080/solr/zeta-main/replication
Latest Index Version: 1233423827699, Generation: 6237
Replicatable Index Version: 0, Generation: 0
Poll Interval 00:05:00

Local Index
Index Version: 1233423827684, Generation: 6222
Location: /opt/solr-data/zeta-main/index
Size: 1.29 GB
Times Replicated Since Startup: 3591
Previous Replication Done At: Mon Mar 23 00:18:03 PDT 2009
Config Files Replicated At: Wed Mar 18 06:07:53 PDT 2009
Config Files Replicated: [synonyms.txt]
Times Config Files Replicated Since Startup: 4
Next Replication Cycle At: Mon Mar 23 00:27:55 PDT 2009

Current Replication Status
Start Time: Mon Mar 23 00:22:55 PDT 2009
Files Downloaded: 12 / 163
Downloaded: 4.12 MB / 1.41 GB [0.0%]
Downloading File: _5no.tis, Downloaded: 0 bytes / 629.57 KB [0.0%]
Time Elapsed: 26371s, Estimated Time Remaining: 9216278s, Speed: 163 bytes/s

-- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562 -- Regards, Shalin Shekhar Mangar. -- --Noble Paul
autocommit and crashing tomcat
Hi, If I'm using autocommit and I have a crash of Tomcat (or the whole machine) while there are still docs pending, will I lose those documents in limbo, or will I just be able to restart and then the commit will run? If the answer is that they go away: is there any way to ensure the integrity of an update? I'd like to make a patch to help out with this; where would one do it? Thanks a bunch! Jacob -- +1 510 277-0891 (o) +91 33 7458 (m) web: http://pajamadesign.com Skype: pajamadesign Yahoo: jacobsingh AIM: jacobsingh gTalk: jacobsi...@gmail.com
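As far as I know, Solr of this vintage has no transaction log, so documents sent but not yet committed are indeed lost if the container dies; autoCommit only bounds how large that window can get. An illustrative configuration (the thresholds here are made up):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>  <!-- commit after this many pending docs -->
    <maxTime>60000</maxTime>  <!-- or after this many milliseconds -->
  </autoCommit>
</updateHandler>
```

Lowering these values shrinks the amount of work lost in a crash at the cost of more frequent commits.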
lucene-java version mismatches
Hello list, I'm having a hard time in a project that's not yet fully converted to Solr, with multiple versions of the Lucene core classes. I can switch over to the ones shipped with Solr (solr-lucene-core-1.3.0), but they are incompatible with lucene-core-2.3.1; they don't share the same version numbering and also don't come with sources. Is there a plain Lucene version that solr-lucene-core-1.3.0 corresponds to? Is there a danger for me in migrating some tools that use lucene-core-2.3.1 over to solr-lucene-core-1.3.0? thanks paul
Re: lucene-java version mismatches
On Tue, Mar 24, 2009 at 3:30 PM, Paul Libbrecht p...@activemath.org wrote: Is there a lucene version that solr-lucene-core-1.3.0 is? The lucene jars shipped with Solr 1.3.0 were 2.4-dev built from svn revision r691741. You can check out the source from lucene's svn using that revision number. -- Regards, Shalin Shekhar Mangar.
Re: Combination of solr.xml and solrconfig.xml
Hi, question ;-)

<!DOCTYPE config SYSTEM "http://java.sun.com/dtd/web-app_2_3.dtd" [
  <!ENTITY default_solrconfig SYSTEM "/var/lib/tomcat5.5/webapps/solr/default_solrconfig.xml">
]>

Is there a chance to set the home directory using a variable? For example a Unix environment variable? Greets -Ralf- No chance? Greets -Ralf-
Re: Combination of solr.xml and solrconfig.xml
On Tue, Mar 24, 2009 at 4:16 PM, Kraus, Ralf | pixelhouse GmbH r...@pixelhouse.de wrote: Hi, question ;-)

<!DOCTYPE config SYSTEM "http://java.sun.com/dtd/web-app_2_3.dtd" [
  <!ENTITY default_solrconfig SYSTEM "/var/lib/tomcat5.5/webapps/solr/default_solrconfig.xml">
]>

Is there a chance to set the home directory using a variable? For example a Unix environment variable? Greets -Ralf- No chance? One can use system variables in solrconfig.xml through the ${var-name} syntax, but that is expanded only for DOM elements. It may not work for the entity includes, though I haven't tried. -- Regards, Shalin Shekhar Mangar.
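For the element-content case, the ${var-name} substitution looks like this (the property name and default used here are illustrative); the value after the colon applies when the system property is unset:

```xml
<!-- Launched with e.g.: java -Dsolr.data.dir=/var/solr/data -jar start.jar -->
<dataDir>${solr.data.dir:./solr/data}</dataDir>
```

A shell environment variable can be passed through by handing it to the JVM as a system property, e.g. -Dsolr.data.dir="$SOLR_DATA_DIR".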
external fields storage
Hi Solr users, Our index could be much smaller if we could store some of the fields not in the index directly but in some kind of external storage. All I've found until now is the ExternalFileField class, which shows that it's possible to implement such a storage, but I'm quite sure that the requirement is common and there should be some existing implementations. Also it would be good to be able to search using these fields, to include them in the search result sets, and to update them with standard Solr update handlers. -- Andrew Klochkov
Re: external fields storage
Andrey Klochkov wrote: Hi Solr users, Our index could be much smaller if we could store some of the fields not in the index directly but in some kind of external storage. All I've found until now is the ExternalFileField class, which shows that it's possible to implement such a storage, but I'm quite sure that the requirement is common and there should be some existing implementations. Also it would be good to be able to search using these fields, to include them in the search result sets, and to update them with standard Solr update handlers. That's a tall order. It almost sounds as if you want to be able to not use the index to store fields, but have them still fully functional as if indexed. That would be quite the magic trick. You might check out http://skwish.sourceforge.net/. It's a cool little library that lets you store arbitrary data keyed by an auto-generated id. -- - Mark http://www.lucidimagination.com
Re: external fields storage
Our index could be much smaller if we could store some of the fields not in the index directly but in some kind of external storage. All I've found until now is the ExternalFileField class, which shows that it's possible to implement such a storage, but I'm quite sure that the requirement is common and there should be some existing implementations. Also it would be good to be able to search using these fields, to include them in the search result sets, and to update them with standard Solr update handlers. That's a tall order. It almost sounds as if you want to be able to not use the index to store fields, but have them still fully functional as if indexed. That would be quite the magic trick. Well, there are a number of posts in different mailing lists which talk about the same requirements, so I wonder whether Lucene/Solr/something else implements something like this. For example, see this post: http://markmail.org/message/t4lv2hqtret4p62g?q=lucene+storing+fields+in+external+storage&page=1&refer=bmode2h2dwjpymba#query:lucene%20storing%20fields%20in%20external%20storage+page:1+mid:t4lv2hqtret4p62g+state:results You might check out http://skwish.sourceforge.net/. It's a cool little library that lets you store arbitrary data keyed by an auto-generated id. We already have the storage (Coherence); we just want to make it accessible through the standard Solr API and not create additional logic above Solr, I mean logic which processes result sets and adds additional fields to them by taking values from the external storage. And in the case of that custom post-search logic we would also have to implement some additional filtering/ordering/etc. of result sets based on values of those external fields. So the question is: is it possible to use Solr/Lucene features to use external field storage for some of the fields? -- Andrew Klochkov
Re: external fields storage
On Tue, Mar 24, 2009 at 4:43 PM, Mark Miller markrmil...@gmail.com wrote: Thats a tall order. It almost sounds as if you want to be able to not use the index to store fields, but have them still fully functional as if indexed. That would be quite the magic trick. Look here, people wanted exactly the same feature in 2004. Is it still not implemented? http://www.gossamer-threads.com/lists/lucene/java-user/8672 -- Andrew Klochkov
Not able to configure multicore
Hello Friends, I am a newbie to Solr, so sorry for the silly question. I am facing a problem related to multiple core configuration. I have placed a solr.xml file in the solr.home directory. Even so, when I try to access http://localhost:8983/solr/admin/cores it gives me a Tomcat error. Can anyone tell me what the possible issue with this could be? Thanks, Mitul Patel -- View this message in context: http://www.nabble.com/Not-able-to-configure-multicore-tp22682691p22682691.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Not able to configure multicore
mitulpatel wrote: Hello Friends, I am a newbie to Solr, so sorry for the silly question. I am facing a problem related to multiple core configuration. I have placed a solr.xml file in the solr.home directory. Even so, when I try to access http://localhost:8983/solr/admin/cores it gives me a Tomcat error. Can anyone tell me what the possible issue with this could be? Thanks, Mitul Patel Have you set adminPath="/admin/cores" on the <cores> element? -- - Mark http://www.lucidimagination.com
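A minimal multicore solr.xml sketch for comparison (core names and paths here are illustrative):

```xml
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0" />
    <core name="core1" instanceDir="core1" />
  </cores>
</solr>
```

Without the adminPath attribute, the /solr/admin/cores URL is not served at all.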
Update field values without re-extracting text?
I'd like to be able to index various documents and have the text extracted from them using the DataImportHandler. I think I have this working just fine. However, I'd later like to be able to update one or several field values without re-extracting the text all over again with the DIH. Yes, and if possible, only update one or a few of the field values and leave the rest as is. I haven't seen a way to do this. Can it be done? What do I still need to read to accomplish this? Can someone point me in the right direction? Thanks! -Dan -- Dan A. Dickey | Senior Software Engineer
Fwd: multicore solrconfig issues
No problem Kimani. I am forwarding this message to the mailing list, in case it can help others. Audrey -- Forwarded message -- From: Kimani Nielsen kniel...@gmail.com Date: Tue, Mar 24, 2009 at 8:57 AM Subject: Re: multicore solrconfig issues To: Audrey Foo chry...@gmail.com Audrey, Yep, that was my problem as well! Thank you so much for your helpful reply. The funny thing was the application never complained about a missing elevate.xml config file. Thanks again. - Kimani On Tue, Mar 24, 2009 at 11:48, Audrey Foo chry...@gmail.com wrote: Hi Kimani Yes, I thought I had copied all xml files, but was missing elevate.xml Thanks Audrey On Tue, Mar 24, 2009 at 7:55 AM, kniel...@gmail.com wrote: Hi, I am running into the exact same error when setting up a multi-core configuration using WebSphere 6.1. Were you able to find the solution to this? - Kimani Audrey Foo-2 wrote: Hi I am using the most recent Drupal apachesolr module with a Solr 1.4 nightly build * solrconfig.xml == http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/apachesolr/solrconfig.xml?revision=1.1.2.15&view=markup&pathrev=DRUPAL-6--1-0-BETA5 * schema.xml == http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.30&view=markup&pathrev=DRUPAL-6--1-0-BETA5 and am attempting to use the multicore functionality * copied the txt files from example/solr/conf to example/multicore/core0/conf * copied the xml files above to example/multicore/core0/conf * started jetty: java -Dsolr.solr.home=multicore -jar start.jar It throws these severe errors on bootstrap:

SEVERE: java.lang.NullPointerException
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
    at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:51)
    at org.apache.solr.core.SolrCore$4.call(SolrCore.java:1163)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
    at java.util.concurrent.FutureTask.run(FutureTask.java:123)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
    at java.lang.Thread.run(Thread.java:613)

Any suggestions about what to try further? Thanks AF Quoted from: http://www.nabble.com/multicore-solrconfig-issues-tp22591761p22591761.html
Re: Optimize
Thanks for your answer. Then what fires the merging? Because in my log I've got optimize=true; if it's not optimization, since I don't fire it, it must be merging. How can I stop this? Thanks a lot, Shalin Shekhar Mangar wrote: No, optimize is not automatic. You have to invoke it yourself just like commits. Take a look at the following for examples: http://wiki.apache.org/solr/UpdateXmlMessages On Thu, Oct 2, 2008 at 2:03 PM, sunnyfr johanna...@gmail.com wrote: Hi, Can somebody explain to me a bit how optimize works? I read the doc but didn't really get what fires optimize. Thanks a lot, -- Regards, Shalin Shekhar Mangar. -- View this message in context: http://www.nabble.com/Optimize-tp19775320p22684113.html Sent from the Solr - User mailing list archive at Nabble.com.
How to Index IP address
Hi All, I have a txt file, that captured all of my network traffic. How can I use Solr to filter out a particular IP address? Thank you, Nga.
Re: external fields storage
Andrey Klochkov wrote: On Tue, Mar 24, 2009 at 4:43 PM, Mark Miller markrmil...@gmail.com wrote: That's a tall order. It almost sounds as if you want to be able to not use the index to store fields, but have them still fully functional as if indexed. That would be quite the magic trick. Look here, people wanted exactly the same feature in 2004. Is it still not implemented? http://www.gossamer-threads.com/lists/lucene/java-user/8672 -- Andrew Klochkov Right - I was exaggerating your description a bit. It reads as if you want it to have all the same power as an indexed field, so I made a bad joke. If you want to be able to search the field, its index entry needs to be updated anyway; I don't see how you get search on externally stored fields without having to update and keep them in the index. External field storage is simple to add on your own, either using that skwish library or even a basic database. You can then do id-to-offset mapping like that guy is looking for: simply add the id to Lucene and do your content updates with the external db. -- - Mark http://www.lucidimagination.com
Streaming results of analysis to shards ... possible?
Hello all, Our application involves a high index write rate - anywhere from a few dozen to many thousands of docs per sec. The write rate is frequently higher than the read rate (though not always), and our index must be as fresh as possible (we'd like search results to be no more than a couple of seconds out of date). We're considering many approaches to achieving our desired TCO. We've noted that the indexing process can be quite costly. Our latest POC shards the total index over N machines, which effectively distributes the indexing load and keeps refresh and search response times decent, but to maintain performance during peak write rates, we've had to make N a much larger number than we'd like. One idea we're floating would be to do all the analysis centrally, perhaps on N/4 machines, and then stream the raw tokens and data directly to the read slaves, who would (hopefully) need to do nothing more than manage segments and readers. We have some very rough math that makes the approach compelling, but before diving in wholesale, we thought we'd ask if anyone else has taken a similar approach. Thoughts? Sincerely, Cass Costello www.stubhub.com
Re: How to Index IP address
I don't think that Solr is the best thing to use for searching a text file. I'd use grep myself, if you're on a unix-like system. To use Solr, you'd need to throw each network 'event' (GET, POST, etc etc) into an XML document, and post those into Solr so it could generate the index. You could then do things like ip:10.206.158.154 to find a specific IP address, or even ip:10.206.158* to get a subnet. Perhaps the thing that's building your text file could post to Solr instead? Thanks for your time! Matthew Runo Software Engineer, Zappos.com mr...@zappos.com - 702-943-7833 On Mar 24, 2009, at 9:32 AM, nga pham wrote: Hi All, I have a txt file, that captured all of my network traffic. How can I use Solr to filter out a particular IP address? Thank you, Nga.
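To illustrate the grep-style route in code, here is a small sketch (the function name and log format are hypothetical) that keeps only the lines mentioning an exact IP, guarding against prefix matches such as 10.0.0.12 when searching for 10.0.0.1:

```python
# Filter a capture file for one exact IP without involving Solr.
import re

def lines_with_ip(lines, ip):
    """Keep only lines containing the exact IP; the lookarounds stop
    10.0.0.1 from matching inside 10.0.0.12 or 210.0.0.1."""
    pattern = re.compile(r'(?<![\d.])' + re.escape(ip) + r'(?![\d.])')
    return [line for line in lines if pattern.search(line)]
```

For one-off use a plain grep over the file achieves much the same thing.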
Re: How to Index IP address
Do you think Lucene is better to filter out a particular IP address from a txt file? Thank you Runo, Nga On Tue, Mar 24, 2009 at 10:21 AM, Matthew Runo mr...@zappos.com wrote: I don't think that Solr is the best thing to use for searching a text file. I'd use grep myself, if you're on a unix-like system. To use Solr, you'd need to throw each network 'event' (GET, POST, etc etc) into an XML document, and post those into Solr so it could generate the index. You could then do things like ip:10.206.158.154 to find a specific IP address, or even ip:10.206.158* to get a subnet. Perhaps the thing that's building your text file could post to Solr instead? Thanks for your time! Matthew Runo Software Engineer, Zappos.com mr...@zappos.com - 702-943-7833 On Mar 24, 2009, at 9:32 AM, nga pham wrote: Hi All, I have a txt file, that captured all of my network traffic. How can I use Solr to filter out a particular IP address? Thank you, Nga.
Re: Update field values without re-extracting text?
Hi Dan, We should turn this into a FAQ. In the mean time, have a look at SOLR-139 and the issue linked to that one. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Dan A. Dickey dan.dic...@savvis.net To: solr-user@lucene.apache.org Sent: Tuesday, March 24, 2009 11:43:35 AM Subject: Update field values without re-extracting text? I'd like to be able to index various documents and have the text extracted from them using the DataImportHandler. I think I have this working just fine. However, I'd later like to be able to update a field value or several, without re-extracting the text all over again with the DIH. Yes - and if possible, only update one or a few of the field values and leave the rest as is. I haven't seen a way to do this - can it be done? What do I need to read yet to accomplish this? Can someone point me in the right direction to do this? Thanks! -Dan -- Dan A. Dickey | Senior Software Engineer
Trivial question: request for id when indexing using CURL ExtractingRequestHandler
I'm performing this operation: curl http://localhost:8983/solr/update/extract?ext.def.fl=text --data-binary @ZOLA.doc -H 'Content-type:text/html' in order to index word document ZOLA.doc into Solr using the example schema.xml. It says I have not provided an 'id', which is a required field. I'm not sure how (syntactically) to provide the id- should it be part of the query string? And if so, how? Any help much appreciated! Thanks! Chris.
Multi-select on more than one facet field
Looking at the example here: http://wiki.apache.org/solr/SimpleFacetParameters#head-4ba81c89b265c3b5992e3292718a0d100f7251ef This being the query for selecting PDF: q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&facet=on&facet.field={!ex=dt}doctype How would you do the query for selecting PDF OR Excel AND, assuming there is another facet field named author, where author is Mike? Thank you, Nasseam
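Following the wiki's tagging/excluding pattern, one way (a sketch; it assumes author is an indexed facet field and excel an indexed doctype value) is to OR the doctype values inside one tagged filter and add a second tagged filter for the author, excluding each tag on its own facet.field:

```
q=mainquery&fq=status:public
  &fq={!tag=dt}doctype:(pdf OR excel)
  &fq={!tag=au}author:Mike
  &facet=on
  &facet.field={!ex=dt}doctype
  &facet.field={!ex=au}author
```

(Wrapped for readability; send it as a single query string.) Each facet field then ignores only its own filter, which is what multi-select UIs usually want.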
Solr index deletion
On a few occasions, our development server crashed and in the process solr deleted the index folder. We are suspecting another app on the server caused an OutOfMemoryException on Tomcat causing all apps including solr to crash. So my question is why is solr deleting the index? We are not doing any updates to the index only reading from it so any insight would be appreciated. Thank you, Nasseam
Re: How to Index IP address
Well, I think you'll have the same problem. Lucene, and Solr (since it's built on Lucene) are both going to expect a structured document as input. Once you send in a bunch of documents, you can then query them for whatever you want to find. A quick search of the internets found me this Apache Labs project - called Pinpoint. It's designed to take log data in, and build an index out of it. I'm not sure how developed it is, but it might be a good starting point for you. There are probably other projects out there along the same lines.. Here's Pinpoint: http://svn.apache.org/repos/asf/labs/pinpoint/trunk/ Why do you want to use Solr / Lucene to look through your files? If you have a huge dataset, some people are using Hadoop (a version of Google's MapReduce) to look through very large sets of logfiles: http://www.lexemetech.com/2008/01/hadoop-and-log-file-analysis.html Thanks for your time! Matthew Runo Software Engineer, Zappos.com mr...@zappos.com - 702-943-7833 On Mar 24, 2009, at 10:28 AM, nga pham wrote: Do you think luence is better to filter out a particular IP address from a txt file? Thank you Runo, Nga On Tue, Mar 24, 2009 at 10:21 AM, Matthew Runo mr...@zappos.com wrote: I don't think that Solr is the best thing to use for searching a text file. I'd use grep myself, if you're on a unix-like system. To use solr, you'd need to throw each network 'event' (GET, POST, etc etc) into an XML document, and post those into Solr so it could generate the index. You could then do things like ip:10.206.158.154 to find a specific IP address, or even ip: 10.206.158* to get a subnet. Perhaps the thing that's building your text file could post to Solr instead? Thanks for your time! Matthew Runo Software Engineer, Zappos.com mr...@zappos.com - 702-943-7833 On Mar 24, 2009, at 9:32 AM, nga pham wrote: Hi All, I have a txt file, that captured all of my network traffic. How can I use Solr to filter out a particular IP address? Thank you, Nga.
Re: Solr index deletion
Somehow that sounds very unlikely. Have you looked at logs? What have you found from Solr there? I am not checking the sources, but I don't think there is any place in Solr where the index directory gets deleted. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Nasseam Elkarra nass...@bodukai.com To: solr-user@lucene.apache.org Sent: Tuesday, March 24, 2009 2:35:22 PM Subject: Solr index deletion On a few occasions, our development server crashed and in the process solr deleted the index folder. We are suspecting another app on the server caused an OutOfMemoryException on Tomcat causing all apps including solr to crash. So my question is why is solr deleting the index? We are not doing any updates to the index only reading from it so any insight would be appreciated. Thank you, Nasseam
Re: Trivial question: request for id when indexing using CURL ExtractingRequestHandler
I've tried this too, still no luck: curl http://localhost:8983/solr/update/extract?ext.def.fl=text -F id=123 -F te...@zola.doc 2009/3/24 Chris Muktar ch...@wikijob.co.uk I'm performing this operation: curl http://localhost:8983/solr/update/extract?ext.def.fl=text --data-binary @ZOLA.doc -H 'Content-type:text/html' in order to index the Word document ZOLA.doc into Solr using the example schema.xml. It says I have not provided an 'id', which is a required field. I'm not sure how (syntactically) to provide the id: should it be part of the query string? And if so, how? Any help much appreciated! Thanks! Chris.
Re: Problem with Facet Date Query
: : This is my query: : q=productPublicationDate_product_dt:[*%20TO%20NOW]&facet=true&facet.field=productPublicationDate_product_dt:[*%20TO%20NOW]&qt=dismax That specific error is happening because you are passing this string... productPublicationDate_product_dt:[*%20TO%20NOW] ...to the facet.field param. That parameter expects the name of a field, and it will then facet on all the indexed values. What you are passing it isn't the name of a field; you are passing it a query string. If you want the faceting count for a query string, use the facet.query param, which you already seem to be doing with a different range of dates by hardcoding it into your solrconfig... : I have entered this field in solrConfig.xml also, in the below manner: : : <lst name="invariants"> : <str name="facet.field">cat</str> : <str name="facet.field">manu_exact</str> : <str name="facet.query">price:[* TO 500]</str> : <str name="facet.query">price:[500 TO *]</str> : <str name="facet.query">productPublicationDate_product_dt:[* TO : NOW/DAY-1MONTH]^2.2</str> : </lst> I'm not entirely sure what it is you are trying to do, but you're also going to have problems because you are using the standard query syntax in your q param, but you have specified qt=dismax. Please explain what your *goal* is, and then people can help you work out how to achieve it ... what you've got here in your example makes no sense, and it's not clear what advice to give you without knowing what it is you want to do. This is similar to an XY Problem... http://people.apache.org/~hossman/#xyproblem XY Problem: your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about the X so that we can understand the full issue. Perhaps the best solution doesn't involve Y at all? See also: http://www.perlmonks.org/index.pl?node_id=542341 -Hoss
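As a hedged sketch of the facet.query approach Hoss describes, the request below asks for the count of documents matching the date-range query (the host, port, and field name are taken from this thread's example setup; only the assembled URL is printed here, and the actual curl call is left commented):

```shell
# Build a select URL that requests a facet count for a range *query*
# via facet.query, rather than faceting on a field.
base='http://localhost:8983/solr/select'
# [* TO NOW], URL-encoded
fq='productPublicationDate_product_dt:%5B*%20TO%20NOW%5D'
url="${base}?q=*:*&rows=0&facet=true&facet.query=${fq}"
echo "$url"
# curl "$url"   # run against a live Solr to get the count in facet_counts/facet_queries
```

Each additional facet.query parameter added to the URL produces one more labeled count in the response.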
Re: How to Index IP address
Well, A log file is theoretically structured. Every log record is a - very - flat set of fields. So, every log file line would be a Lucene document. Then, one could use Solr to search, filter and facet records. Of course, this requires parsing log file back into record components. Most log files were created for output, not for re-input. But if you can parse it back, you might be able to do custom data import. Or, if you can intercept log file before it hits serialization, you might be able to index the fields directly. Or you could just buy Splunk ( http://www.splunk.com/ ) and be done with it. Parsing and visualizing log files is exactly what they set out to deal with. No (great) open source solution yet. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ Research group: http://www.clt.mq.edu.au/Research/ - I think age is a very high price to pay for maturity (Tom Stoppard) On Tue, Mar 24, 2009 at 2:40 PM, Matthew Runo mr...@zappos.com wrote: Well, I think you'll have the same problem. Lucene, and Solr (since it's built on Lucene) are both going to expect a structured document as input. Once you send in a bunch of documents, you can then query them for whatever you want to find.
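To make the "every log line becomes a document" idea concrete, here is a minimal sketch that turns one combined-log-format line into a Solr add message; the field names (ip, method, path, status) are hypothetical and would need matching entries in schema.xml, and the actual post to Solr is left commented:

```shell
# Parse one access-log line into a Solr <add> document (field names are
# illustrative; adjust them to your schema).
line='10.206.158.154 - - [24/Mar/2009:10:28:00 -0700] "GET /index.html HTTP/1.1" 200 1234'
doc=$(echo "$line" | awk '{
  gsub(/[\[\]"]/, "")               # strip brackets and quotes
  printf "<add><doc>"
  printf "<field name=\"id\">%s-%s</field>", $1, $4
  printf "<field name=\"ip\">%s</field>", $1
  printf "<field name=\"method\">%s</field>", $6
  printf "<field name=\"path\">%s</field>", $7
  printf "<field name=\"status\">%s</field>", $9
  printf "</doc></add>\n"
}')
echo "$doc"
# Post it to a running Solr instance, then commit:
# curl http://localhost:8983/solr/update -H 'Content-type:text/xml' --data-binary "$doc"
# curl http://localhost:8983/solr/update -H 'Content-type:text/xml' --data-binary '<commit/>'
```

With the documents indexed this way, the ip:10.206.158.154 style queries mentioned earlier in the thread become possible.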
Re: Multi-select on more than one facet field
On Tue, Mar 24, 2009 at 2:29 PM, Nasseam Elkarra nass...@bodukai.com wrote: Looking at the example here: http://wiki.apache.org/solr/SimpleFacetParameters#head-4ba81c89b265c3b5992e3292718a0d100f7251ef This being the query for selecting PDF: q=mainqueryfq=status:publicfq={!tag=dt}doctype:pdffacet=onfacet.field={!ex=dt}doctype How would you do the query for selecting PDF OR Excel AND, assuming there is another facet field named author, where author is Mike? If author is not a multi-select facet (i.e. you already selected author:Mike and hence wish to no longer get other counts for the author field) then: q=mainquery fq=status:public fq={!tag=dt}doctype:(PDF OR Excel) fq=author:Mike facet=onfacet.field={!ex=dt}doctype If author *is* multi-select, then you wish to get facet counts for the author field, ignoring the author:Mike restriction for the author facet only: q=mainquery fq=status:public fq={!tag=dt}doctype:(PDF OR Excel) fq={!tag=auth}author:Mike facet=onfacet.field={!ex=dt}doctype facet.field={!ex=auth}author -Yonik http://www.lucidimagination.com
Re: Solr index deletion
Correction: index was not deleted. The folder is still there with the index files in it but a *:* query returns 0 results. Is there a tool to check the health of an index? Thanks, Nasseam On Mar 24, 2009, at 11:49 AM, Otis Gospodnetic wrote: Somehow that sounds very unlikely. Have you looked at logs? What have you found from Solr there? I am not checking the sources, but I don't think there is any place in Solr where the index directory gets deleted. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Nasseam Elkarra nass...@bodukai.com To: solr-user@lucene.apache.org Sent: Tuesday, March 24, 2009 2:35:22 PM Subject: Solr index deletion On a few occasions, our development server crashed and in the process solr deleted the index folder. We are suspecting another app on the server caused an OutOfMemoryException on Tomcat causing all apps including solr to crash. So my question is why is solr deleting the index? We are not doing any updates to the index only reading from it so any insight would be appreciated. Thank you, Nasseam
Re: Solr index deletion
There is, it's called CheckIndex and it is a part of Lucene (and Lucene jars that come with Solr, I believe): http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/index/CheckIndex.html Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Nasseam Elkarra nass...@bodukai.com To: solr-user@lucene.apache.org Sent: Tuesday, March 24, 2009 4:21:50 PM Subject: Re: Solr index deletion Correction: index was not deleted. The folder is still there with the index files in it but a *:* query returns 0 results. Is there a tool to check the health of an index? Thanks, Nasseam On Mar 24, 2009, at 11:49 AM, Otis Gospodnetic wrote: Somehow that sounds very unlikely. Have you looked at logs? What have you found from Solr there? I am not checking the sources, but I don't think there is any place in Solr where the index directory gets deleted. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Nasseam Elkarra To: solr-user@lucene.apache.org Sent: Tuesday, March 24, 2009 2:35:22 PM Subject: Solr index deletion On a few occasions, our development server crashed and in the process solr deleted the index folder. We are suspecting another app on the server caused an OutOfMemoryException on Tomcat causing all apps including solr to crash. So my question is why is solr deleting the index? We are not doing any updates to the index only reading from it so any insight would be appreciated. Thank you, Nasseam
Hardware Questions...
We have three Solr servers (several two-processor Dell PowerEdge servers). I'd like to get three newer servers and I wanted to see what we should be getting. I'm thinking the following... Dell PowerEdge 2950 III 2x2.33GHz/12M 1333MHz Quad Core 16GB RAM 6 x 146GB 15K RPM RAID-5 drives How do people spec out servers, especially CPU, memory and disk? Is this all based on the number of docs, indexes, etc.? Also, what are people using for benchmarking and monitoring Solr? Thanks - Mike
Re: Field tokenizer question
: as far as I know solr.StrField is not analyzed but it is indexed as-is : (verbatim). Correct ... but there is definitely a bug here if analysis.jsp is implying that an analyzer is being used... https://issues.apache.org/jira/browse/SOLR-1086 -Hoss
Re: Delta import
OK, I accept the fact that Solr is going to make X requests to the database for X updates, but when I try to run the delta-import command with 2 rows to update, is it normal that it's really slow, ~1 document fetched/sec? Noble Paul നോബിള് नोब्ळ् wrote: not possible really; that may not be useful to a lot of users because there may be too many changed ids and the 'IN' part can be really long. You can raise an issue anyway On Mon, Mar 23, 2009 at 9:30 PM, AlexxelA alexandre.boudrea...@canoe.ca wrote: I'm using the delta-import command. Here are the deltaQuery and deltaImportQuery I use: select uid from profil_view where last_modified > '${dataimporter.last_index_time}' select * from profil_view where uid='${dataimporter.delta.uid}' When I look at the delta import status I see that the total requests to the datasource equal the number of modifications I had. Is it possible to make only one request to the database and fetch all modifications? select * from profil_view where uid in ('${dataimporter.delta.ALLuid}') (something like that). -- View this message in context: http://www.nabble.com/Delta-import-tp22663196p22663196.html Sent from the Solr - User mailing list archive at Nabble.com. -- --Noble Paul -- View this message in context: http://www.nabble.com/Delta-import-tp22663196p22689588.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Commit is taking very long time
: My application is in prod and quite frequently getting NullPointerException. ... : java.lang.NullPointerException : at com.fm.search.incrementalindex.service.AuctionCollectionServiceImpl.indexData(AuctionCollectionServiceImpl.java:251) : at com.fm.search.incrementalindex.service.AuctionCollectionServiceImpl.process(AuctionCollectionServiceImpl.java:135) : at com.fm.search.job.SearchIndexingJob.executeInternal(SearchIndexingJob.java:68) : at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86) : at org.quartz.core.JobRunShell.run(JobRunShell.java:202) : at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:529) that stack trace doesn't suggest anything remotely related to Solr. None of those classes are in the Solr code base -- without having any idea what the code on line 251 of your AuctionCollectionServiceImpl class looks like, no one could even begin to speculate what is causing the NPE. Even if we knew what line 251 looks like, understanding why some reference on that line is null would probably require knowing a whole lot more about your application. -Hoss
RE: Exact Match
: Depending on your needs, you might want to do some sort of minimal : analysis on the field (ignore punctuation, lowercase,...) Here's the : text_exact field that I use: Dean's reply is a great example of how 'exact' is a vague term. With a TextField you can get an exact match using a simple phrase query (i.e. putting quotes around the input), assuming your meaning of exact is that all the tokens appear together in sequence, and assuming your analyzer doesn't change things in a way that makes a phrase search match in a way you don't consider exact enough. If you want to ensure that the document contains exactly what the user queried for, no more and no less, then using a copyField into a StrField is really the best way to do that. -Hoss
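A sketch of the copyField arrangement Hoss describes, modeled on the example schema (the field names and the "text"/"string" type names are assumptions; adjust to your own schema):

```xml
<!-- schema.xml sketch: keep the verbatim value in an untokenized string
     field so a "no more and no less" exact match is possible, alongside
     the analyzed field used for ordinary searching -->
<field name="title" type="text" indexed="true" stored="true"/>
<field name="title_exact" type="string" indexed="true" stored="false"/>
<copyField source="title" dest="title_exact"/>
```

A query against title_exact then matches only documents whose entire stored value equals the query string, while phrase queries against title give the looser token-sequence behavior.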
Re: Response schema for an update.
: Subject: Response schema for an update. : In-Reply-To: shivayigjyfbf88vtu21...@shiva.ceiindia.com : References: 69de18140903230141t38dbcd28n40bbcc944ddb0...@mail.gmail.com : shivayigjyfbf88vtu21...@shiva.ceiindia.com http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is hidden in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult. See Also: http://en.wikipedia.org/wiki/Thread_hijacking -Hoss
Re: Not able to configure multicore
: I am facing a problem related to multiple cores configuration. I have placed : a solr.xml file in solr.home directory. eventhough when I am trying to : access http://localhost:8983/solr/admin/cores it gives me tomcat error. : : Can anyone tell me what can be possible issue with this?? not without knowing exactly what the tomcat error message is, what your solr.xml file looks like, what log messages you see on startup, etc... -Hoss
Re: Trivial question: request for id when indexing using CURL ExtractingRequestHandler
Deja-Vu... http://www.nabble.com/Missing-required-field%3A-id-Using-ExtractingRequestHandler-to22611039.html : I'm performing this operation: : : curl http://localhost:8983/solr/update/extract?ext.def.fl=text --data-binary : @ZOLA.doc -H 'Content-type:text/html' : : in order to index word document ZOLA.doc into Solr using the example : schema.xml. It says I have not provided an 'id', which is a required field. : I'm not sure how (syntactically) to provide the id- should it be part of the : query string? And if so, how? -Hoss
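The thread linked above resolves this with the ext.literal.* parameters; as a sketch, the command could look like the following (the id value and Content-type are placeholders, and the curl call itself is left commented since it needs a running Solr):

```shell
# Supply the required unique-key field on the query string via
# ext.literal.<fieldname>; quote the URL so the shell doesn't eat the '&'.
url='http://localhost:8983/solr/update/extract?ext.def.fl=text&ext.literal.id=123'
echo "$url"
# curl "$url" --data-binary @ZOLA.doc -H 'Content-type:application/msword'
```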
Re: lucene-java version mismatches
On 24 Mar 2009, at 11:14, Shalin Shekhar Mangar wrote: On Tue, Mar 24, 2009 at 3:30 PM, Paul Libbrecht p...@activemath.org wrote: Is there a lucene version that solr-lucene-core-1.3.0 is? The lucene jars shipped with Solr 1.3.0 were 2.4-dev built from svn revision r691741. You can check out the source from lucene's svn using that revision number. Thanks, that's useful. Could I suggest that the maven repositories are populated next time a release of Solr-specific Lucene jars is made? paul
Re: Trivial question: request for id when indexing using CURL ExtractingRequestHandler
Fantastic, thank you! I'm executing this: curl -F te...@zheng.doc -F 'commit=true' 'http://localhost:8983/solr/update/extract?ext.def.fl=text&ext.literal.id=2' however performing the query http://localhost:8983/solr/select?q=id:2 produces the output but without a text field. I'm not sure if it's being extracted and indexed correctly. The commit is going through, though. This is using the example schema. Any thoughts? XML response follows... <response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">2</int> <lst name="params"> <str name="q">id:2</str> </lst> </lst> <result name="response" numFound="1" start="0"> <doc> <str name="id">2</str> <int name="popularity">0</int> <str name="sku">2</str> <date name="timestamp">2009-03-24T22:27:00.714Z</date> </doc> </result> </response>
Re: delta-import commit=false doesn't seems to work
Hi, sorry, I still don't know what I should do. I can see in my log that something clearly optimizes somewhere, even though my command is command=delta-import&optimize=false. Is it a parameter to add to the commit, or to the snappuller, or ...? Mar 24 23:02:44 search-01 jsvc.exec[22812]: Mar 24, 2009 11:02:44 PM org.apache.solr.handler.dataimport.SolrWriter persistStartTime INFO: Wrote last indexed time to dataimport.properties Mar 24 23:02:44 search-01 jsvc.exec[22812]: Mar 24, 2009 11:02:44 PM org.apache.solr.handler.dataimport.DocBuilder commit INFO: Full Import completed successfully Mar 24 23:02:44 search-01 jsvc.exec[22812]: Mar 24, 2009 11:02:44 PM org.apache.solr.update.DirectUpdateHandler2 commit INFO: start commit(optimize=true,waitFlush=false,waitSearcher=true) thanks a lot for your help sunnyfr wrote: As you can see, I did that, and I have no information in my DIH status, but you can notice in my logs and even my segments that an optimize is fired automatically? Noble Paul നോബിള് नोब्ळ् wrote: just hit the DIH without any command and you may be able to see the status of the last import. It can tell you whether a commit/optimize was performed On Fri, Mar 20, 2009 at 7:07 PM, sunnyfr johanna...@gmail.com wrote: Thanks, I gave more information there: http://www.nabble.com/Problem-for-replication-%3A-segment-optimized-automaticly-td22601442.html thanks a lot Paul Noble Paul നോബിള് नोब्ळ् wrote: sorry, the whole thing was commented out. I did not notice that. I'll look into that 2009/3/20 Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com: you have set autoCommit every x minutes. It must have invoked commit automatically On Thu, Mar 19, 2009 at 4:17 PM, sunnyfr johanna...@gmail.com wrote: Hi, Even if I hit command=delta-import&commit=false&optimize=false I still have commit set in my logs and sometimes even optimize=true. As for optimize, I wonder if it comes from commits that are too close together so one is not done, but I don't really know. Any idea?
Thanks a lot, -- View this message in context: http://www.nabble.com/delta-import-commit%3Dfalse-doesn%27t-seems-to-work-tp22597630p22597630.html Sent from the Solr - User mailing list archive at Nabble.com. -- --Noble Paul -- --Noble Paul -- View this message in context: http://www.nabble.com/Re%3A-delta-import-commit%3Dfalse-doesn%27t-seems-to-work-tp22614216p22620439.html Sent from the Solr - User mailing list archive at Nabble.com. -- --Noble Paul -- View this message in context: http://www.nabble.com/Re%3A-delta-import-commit%3Dfalse-doesn%27t-seems-to-work-tp22614216p22691417.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Trivial question: request for id when indexing using CURL ExtractingRequestHandler
If your text field is not stored, then it won't be available in results. That's the likely explanation. Seems like all is well. Erik On Mar 24, 2009, at 11:34 PM, Chris Muktar wrote: Fantastic, thank you! I'm executing this: curl -F te...@zheng.doc -F 'commit=true' 'http://localhost:8983/solr/update/extract?ext.def.fl=text&ext.literal.id=2' however performing the query http://localhost:8983/solr/select?q=id:2 produces the output but without a text field. I'm not sure if it's being extracted and indexed correctly. The commit is going through, though. This is using the example schema. Any thoughts? XML response follows... <response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">2</int> <lst name="params"> <str name="q">id:2</str> </lst> </lst> <result name="response" numFound="1" start="0"> <doc> <str name="id">2</str> <int name="popularity">0</int> <str name="sku">2</str> <date name="timestamp">2009-03-24T22:27:00.714Z</date> </doc> </result> </response>
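A sketch of the schema change Erik is describing (field and type names follow the example schema; whether you want the extracted text stored is a trade-off against index size):

```xml
<!-- schema.xml sketch: stored="true" is what makes a field's value come
     back in query responses; indexed="true" is what makes it searchable -->
<field name="text" type="text" indexed="true" stored="true" multiValued="true"/>
```

With stored="false" the field is still searchable, but select responses will omit it, which matches the symptom in this thread.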
Re: Hardware Questions...
Have you looked at http://wiki.apache.org/solr/SolrPerformanceData ? On Tue, Mar 24, 2009 at 4:51 PM, solr s...@highbeam.com wrote: We have three Solr servers (several two processor Dell PowerEdge servers). I'd like to get three newer servers and I wanted to see what we should be getting. I'm thinking the following... Dell PowerEdge 2950 III 2x2.33GHz/12M 1333MHz Quad Core 16GB RAM 6 x 146GB 15K RPM RAID-5 drives How do people spec out servers, especially CPU, memory and disk? Is this all based on the number of doc's, indexes, etc... Also, what are people using for benchmarking and monitoring Solr? Thanks - Mike
Re: Solr index deletion
The tool says there are no problems. Solr is pointing to the right directory so not sure what is preventing it from returning any results. Any ideas? Here is the output: Segments file=segments_2 numSegments=1 version=FORMAT_USER_DATA [Lucene 2.9] 1 of 1: name=_0 docCount=18021 compound=false hasProx=true numFiles=9 size (MB)=8.389 has deletions [delFileName=_0_1.del] test: open reader.OK [18 deleted docs] test: fields, norms...OK [35 fields] test: terms, freq, prox...OK [60492 terms; 1157700 terms/docs pairs; 1224063 tokens] test: stored fields...OK [386828 total field count; avg 21.487 fields per doc] test: term vectorsOK [0 total vector count; avg 0 term/ freq vector fields per doc] No problems were detected with this index. -- Thanks, Nasseam On Mar 24, 2009, at 1:34 PM, Otis Gospodnetic wrote: There is, it's called CheckIndex and it is a part of Lucene (and Lucene jars that come with Solr, I believe): http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/index/CheckIndex.html Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Nasseam Elkarra nass...@bodukai.com To: solr-user@lucene.apache.org Sent: Tuesday, March 24, 2009 4:21:50 PM Subject: Re: Solr index deletion Correction: index was not deleted. The folder is still there with the index files in it but a *:* query returns 0 results. Is there a tool to check the health of an index? Thanks, Nasseam On Mar 24, 2009, at 11:49 AM, Otis Gospodnetic wrote: Somehow that sounds very unlikely. Have you looked at logs? What have you found from Solr there? I am not checking the sources, but I don't think there is any place in Solr where the index directory gets deleted. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Nasseam Elkarra To: solr-user@lucene.apache.org Sent: Tuesday, March 24, 2009 2:35:22 PM Subject: Solr index deletion On a few occasions, our development server crashed and in the process solr deleted the index folder. 
We are suspecting another app on the server caused an OutOfMemoryException on Tomcat causing all apps including solr to crash. So my question is why is solr deleting the index? We are not doing any updates to the index only reading from it so any insight would be appreciated. Thank you, Nasseam
get all facets
Can I get all the facets in QueryResponse?? Thanks, Ashish -- View this message in context: http://www.nabble.com/get-all-facets-tp22693809p22693809.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: autocommit and crashing tomcat
Hi Yonik, Thanks for the response. If I shut down tomcat cleanly, does it commit all uncommitted documents? Best, Jacob -- Forwarded message -- From: Yonik Seeley yo...@lucidimagination.com Date: Tue, Mar 24, 2009 at 8:48 PM Subject: Re: autocommit and crashing tomcat To: solr-user@lucene.apache.org On Tue, Mar 24, 2009 at 5:52 AM, Jacob Singh jacobsi...@gmail.com wrote: If I'm using autocommit, and I have a crash of tomcat (or the whole machine) while there are still docs pending, will I lose those documents in limbo Yep. If the answer is they go away: Is there anyway to ensure integrity of an update? You can only be sure that the docs are on the disk after you have done a commit. An optional transaction log would be part of high-availability for writes, is something we should eventually get to though. -Yonik http://www.lucidimagination.com -- +1 510 277-0891 (o) +91 33 7458 (m) web: http://pajamadesign.com Skype: pajamadesign Yahoo: jacobsingh AIM: jacobsingh gTalk: jacobsi...@gmail.com
Re: Solr index deletion
Hm, you are not saying much about what you've tried. Could it be your Solr home is wrong and not even pointing to the index you just checked? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Nasseam Elkarra nass...@bodukai.com To: solr-user@lucene.apache.org Sent: Tuesday, March 24, 2009 7:47:08 PM Subject: Re: Solr index deletion The tool says there are no problems. Solr is pointing to the right directory so not sure what is preventing it from returning any results. Any ideas? Here is the output: Segments file=segments_2 numSegments=1 version=FORMAT_USER_DATA [Lucene 2.9] 1 of 1: name=_0 docCount=18021 compound=false hasProx=true numFiles=9 size (MB)=8.389 has deletions [delFileName=_0_1.del] test: open reader.OK [18 deleted docs] test: fields, norms...OK [35 fields] test: terms, freq, prox...OK [60492 terms; 1157700 terms/docs pairs; 1224063 tokens] test: stored fields...OK [386828 total field count; avg 21.487 fields per doc] test: term vectorsOK [0 total vector count; avg 0 term/freq vector fields per doc] No problems were detected with this index. -- Thanks, Nasseam On Mar 24, 2009, at 1:34 PM, Otis Gospodnetic wrote: There is, it's called CheckIndex and it is a part of Lucene (and Lucene jars that come with Solr, I believe): http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/index/CheckIndex.html Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Nasseam Elkarra To: solr-user@lucene.apache.org Sent: Tuesday, March 24, 2009 4:21:50 PM Subject: Re: Solr index deletion Correction: index was not deleted. The folder is still there with the index files in it but a *:* query returns 0 results. Is there a tool to check the health of an index? Thanks, Nasseam On Mar 24, 2009, at 11:49 AM, Otis Gospodnetic wrote: Somehow that sounds very unlikely. Have you looked at logs? What have you found from Solr there? 
I am not checking the sources, but I don't think there is any place in Solr where the index directory gets deleted. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Nasseam Elkarra To: solr-user@lucene.apache.org Sent: Tuesday, March 24, 2009 2:35:22 PM Subject: Solr index deletion On a few occasions, our development server crashed and in the process solr deleted the index folder. We are suspecting another app on the server caused an OutOfMemoryException on Tomcat causing all apps including solr to crash. So my question is why is solr deleting the index? We are not doing any updates to the index only reading from it so any insight would be appreciated. Thank you, Nasseam
Re: lucene-java version mismatches
On Wed, Mar 25, 2009 at 3:23 AM, Paul Libbrecht p...@activemath.org wrote: could I suggest that the maven repositories are populated next-time a release of solr-specific-lucenes are made? But they are? It is inside the org.apache.solr group since those lucene jars are released by Solr -- http://repo2.maven.org/maven2/org/apache/solr/ -- Regards, Shalin Shekhar Mangar.
Re: Delta import
On Wed, Mar 25, 2009 at 2:25 AM, AlexxelA alexandre.boudrea...@canoe.ca wrote: OK, I accept the fact that Solr is going to make X requests to the database for X updates, but when I try to run the delta-import command with 2 rows to update, is it normal that it's really slow, ~1 document fetched/sec? Not really, I've seen 1000x faster. Try firing a few of those queries on the database directly. Are they slow? Is the database remote? -- Regards, Shalin Shekhar Mangar.
Re: Not able to configure multicore
hossman wrote: : I am facing a problem related to multiple cores configuration. I have placed : a solr.xml file in solr.home directory. Even though, when I am trying to : access http://localhost:8983/solr/admin/cores it gives me a tomcat error. : : Can anyone tell me what can be the possible issue with this?? not without knowing exactly what the tomcat error message is, what your solr.xml file looks like, what log messages you see on startup, etc... -Hoss Hello Hoss, Thanks for the reply. Here is the error message shown in the browser: HTTP Status 404 - /solr2/admin/cores type: Status report message: /solr2/admin/cores description: The requested resource (/solr2/admin/cores) is not available. And here is the solr.xml file: <solr persistent="true" sharedLib="lib"> <cores adminPath="/admin/cores"> <core name="core0" instanceDir="core0" /> <core name="core1" instanceDir="core1" /> </cores> </solr> -- View this message in context: http://www.nabble.com/Not-able-to-configure-multicore-tp22682691p22695098.html Sent from the Solr - User mailing list archive at Nabble.com.
Index time boost
Hi all, Can we specify the index-time boost value for a particular field in schema.xml? Thanks, Siddharth
Snapinstaller + Overlapping onDeckSearchers Problems
We have been running our Solr slaves without autowarming our new searchers for a long time, but that was leaving us with 50-75 requests taking 20+ seconds after every update on the slaves. I have turned on autowarming and that has fixed our slow response times, but I'm running into occasional Overlapping onDeckSearchers. We have replication set up and are using the snapinstaller script every 10 minutes: /home/solr/bin/snappuller -M util01 -P 18984 -D /home/solr/write/data -S /home/solr/logs -d /home/solr/read/data -u instruct; /home/solr/bin/snapinstaller -M util01 -S /home/solr/write/logs -d /home/solr/read/data -u instruct Here's what a successful update/commit log looks like: [14:13:02.510] start commit(optimize=false,waitFlush=false,waitSearcher=true) [14:13:02.522] Opening searc...@e9b4bb main [14:13:02.524] end_commit_flush [14:13:02.525] autowarming searc...@e9b4bb main from searc...@159e6e8 main [14:13:02.525] filterCache{lookups=1809739,hits=1766607,hitratio=0.97,inserts=43211,evictions=0, size=43154,cumulative_lookups=1809739,cumulative_hits=1766607,cumulative_hitratio=0.97,cumulative_inserts=43211,cumulative_evictions=0} -- [14:15:42.372] {commit=} 0 159964 [14:15:42.373] /update 0 159964 Here's what an unsuccessful update/commit log looks like, where the /update took too long and we started another commit: [21:03:03.829] start commit(optimize=false,waitFlush=false,waitSearcher=true) [21:03:03.836] Opening searc...@b2f2d6 main [21:03:03.836] end_commit_flush [21:03:03.836] autowarming searc...@b2f2d6 main from searc...@103c520 main [21:03:03.836] filterCache{lookups=1062196,hits=1062160,hitratio=0.99,inserts=49144,evictions=0,size=48353,cumulative_lookups=259485564,cumulative_hits=259426904,cumulative_hitratio=0.99,cumulative_inserts=68467,cumulative_evictions=0} -- [21:23:04.794] start commit(optimize=false,waitFlush=false,waitSearcher=true) [21:23:04.794] PERFORMANCE WARNING: Overlapping onDeckSearchers=2 [21:23:04.802] Opening searc...@f11bc main
[21:23:04.802] end_commit_flush -- [21:24:55.987] {commit=} 0 1312158 [21:24:55.987] /update 0 1312158 I don't understand why this sometimes takes two minutes between the start commit /update and sometimes takes 20 minutes? One of our caches has about ~40,000 items, but I can't imagine it taking 20 minutes to autowarm a searcher. It would be super handy if the Snapinstaller script would wait until the previous one was done before starting a new one, but I'm not sure how to make that happen. Thanks for any help with this. best, cloude -- VP of Product Development Instructables.com http://www.instructables.com/member/lebowski
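One way to get the "wait until the previous one was done" behavior described above is to wrap the snappuller/snapinstaller pair in a simple lock. The sketch below uses the atomic-mkdir idiom; the lock path is arbitrary, and the commented line stands in for the real commands from the cron entry above:

```shell
# Skip a replication cycle if the previous one is still running, so two
# snapinstaller runs never overlap (which is what triggers the extra commit).
LOCKDIR=/tmp/snapinstaller.lock
if mkdir "$LOCKDIR" 2>/dev/null; then
  echo "lock acquired, running snapinstaller"
  # /home/solr/bin/snappuller ... && /home/solr/bin/snapinstaller ...
  rmdir "$LOCKDIR"
else
  echo "previous cycle still running, skipping"
fi
```

mkdir either creates the directory or fails atomically, so concurrent cron invocations cannot both think they hold the lock; a variant could loop and wait on the lock instead of skipping.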
Re: Index time boost
On Wed, Mar 25, 2009 at 10:14 AM, Gargate, Siddharth sgarg...@ptc.com wrote: Hi all, Can we specify the index-time boost value for a particular field in schema.xml? No. You can specify it along with the document when you add it to Solr. -- Regards, Shalin Shekhar Mangar.
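A sketch of where the boost does go: it is an attribute on the doc or field elements of the update message you post, not a schema.xml setting (the field names and boost values here are placeholders):

```xml
<!-- update message sketch: per-document and per-field index-time boosts -->
<add>
  <doc boost="2.0">
    <field name="id">123</field>
    <field name="name" boost="3.0">some boosted value</field>
  </doc>
</add>
```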