Multi cores vs. filter queries for a multi-tenant deployment
Hi everyone, I'm looking into a deployment which will support multi-tenancy. This means that there will be thousands of tenant domains, each having thousands of users. I need to figure out which approach is better for this deployment when using the Solr server. Approach #1 - Use a separate core, and therefore a separate index, for each tenant. If necessary, use filter queries with user IDs for users. Approach #2 - Use a single index and filter queries with tenant IDs to filter out results from other tenant domains. Similarly, as above, use user IDs as needed. My concerns are performance and security. Will approach #1 be a killer for performance? This setup has to scale smoothly for that many users. When the deployment potentially has thousands of cores, how can I prevent a security vulnerability appearing between cores? What are the implications of approach #2? Will I have to constantly audit the code for security checks, since only a single index is used? Any feedback on the above concerns would be really appreciated. Thanks in advance. -- Regards, Tharindu
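As a rough sketch of what approach #2 looks like on the wire (the tenant_id and user_id field names are hypothetical), the application appends filter queries server-side, never trusting client input:

    http://localhost:8983/solr/select?q=report&fq=tenant_id:acme&fq=user_id:42

Each fq clause is cached independently in Solr's filterCache, so the per-tenant filter is cheap once warmed; the security question raised above then reduces to making sure every query path in the application attaches it.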
Re: facet.method: enum vs. fc
Thank you Erick, your explanation was helpful. I'll stick with fc and come back to this later if I need further tuning. Paolo

Erick Erickson wrote: Yep, that was probably the best choice. It's a classic time/space tradeoff. The enum method creates a bitset for #each# unique facet value. The bitset is (maxDocs / 8) bytes in size (I'm ignoring some overhead here). So if your facet field has 10 unique values, and 8M documents, you'll use up 10M bytes or so. 20 unique values will use up 20M bytes, and so on. But this is very, very fast. fc, on the other hand, eats up cache for storing the string value for each unique value, plus various counter arrays (several bytes/doc). For most cases, it will use less memory than enum, but will be slower. I'd stick with fc for the time being and think about enum if 1> you have a good idea of what the number of unique terms is, or 2> you start to need to finely tune your speed. HTH Erick

On Mon, Oct 11, 2010 at 11:30 AM, Paolo Castagna <castagna.li...@googlemail.com> wrote: Hi, I am using Solr v1.4 and I am not sure which facet.method I should use. What should I use if I do not know in advance whether the number of values for a given field will be high or low? What are the pros/cons of using facet.method=enum vs. facet.method=fc? When should I use enum vs. fc? I have found some comments and suggestions here:

"enum enumerates all terms in a field, calculating the set intersection of documents that match the term with documents that match the query. This was the default (and only) method for faceting multi-valued fields prior to Solr 1.4. fc (stands for field cache): the facet counts are calculated by iterating over documents that match the query and summing the terms that appear in each document. This was the default method for single-valued fields prior to Solr 1.4. The default value is fc (except for BoolField) since it tends to use less memory and is faster when a field has many unique terms in the index." -- http://wiki.apache.org/solr/SimpleFacetParameters#facet.method

"facet.method=enum [...] this is excellent for fields where there is a small set of distinct values. The average number of values per document does not matter. facet.method=fc [...] this is excellent for situations where the number of indexed values for the field is high, but the number of values per document is low. For multi-valued fields, a hybrid approach is used that uses term filters from the filterCache for terms that match many documents." -- http://wiki.apache.org/solr/SolrFacetingOverview

"If you are faceting on a field that you know only has a small number of values (say less than 50), then it is advisable to explicitly set this to enum. When faceting on multiple fields, remember to set this for the specific fields desired and not universally for all facets. The request handler configuration is a good place to put this." -- Book: "Solr 1.4 Enterprise Search Server", p. 148

This is the part of the Solr code which deals with the facet.method parameter:

if (enumMethod) {
  counts = getFacetTermEnumCounts([...]);
} else {
  if (multiToken) {
    UnInvertedField uif = [...]
    counts = uif.getCounts([...]);
  } else {
    [...]
    if (per_segment) {
      [...]
      counts = ps.getFacetCounts([...]);
    } else {
      counts = getFieldCacheCounts([...]);
    }
  }
}
-- https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/src/java/org/apache/solr/request/SimpleFacets.java

See also: - http://stackoverflow.com/questions/2902680/how-well-does-solr-scale-over-large-number-of-facet-values

In the end, since I do not know in advance the number of different values for my fields, I went for facet.method=fc; does this seem reasonable to you? Thank you, Paolo
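facet.method (like most facet params) can be set per field at request time, so Erick's advice can be applied only to fields known to have few unique values, e.g. (field names hypothetical):

    http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=status&f.status.facet.method=enum&facet.field=title&f.title.facet.method=fc

or, following the book's suggestion, pinned in the request handler defaults in solrconfig.xml:

    <str name="f.status.facet.method">enum</str>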
Re: configuring custom CharStream in solr
On 10/11/2010 10:18 PM, Chris Hostetter wrote: : OK - I found the answer pecking through the source - apparently the name of : the element to configure a CharFilter is <charFilter> - fancy that :) there's even an example, right there on the wiki... http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#CharFilterFactories -Hoss I am just bathing myself in wizardly astuteness today. Thanks -Mike
Re: LuceneRevolution - NoSQL: A comparison
It sounds, of course, a lot like transaction isolation using MVCC. It's the obvious solution, and has been since the late 1970s. I hope it won't be too hard to convince people to use it :-) It's been the reason for the early success of Oracle. Dennis Gearon --- On Mon, 10/11/10, Yonik Seeley wrote: > From: Yonik Seeley > Subject: Re: LuceneRevolution - NoSQL: A comparison > To: solr-user@lucene.apache.org > Date: Monday, October 11, 2010, 7:20 PM > On Mon, Oct 11, 2010 at 8:32 PM, > Peter Keegan > wrote: > > I listened with great interest to Grant's presentation > of the NoSQL > > comparisons/alternatives to Solr/Lucene. It sounds > like the jury is still > > out on much of this. Here's a use case that might > favor using a NoSQL > > alternative for storing 'stored fields' outside of > Lucene. > > > > When Solr does a distributed search across shards, it > does this in 2 phases > > (correct me if I'm wrong): > > > > 1. 1st query to get the docIds and facet counts > > 2. 2nd query to retrieve the stored fields of the top > hits > > > > The problem here is that the index could change > between (1) and (2), so it's > > not an atomic transaction. > > Yep. > > As I discussed with Peter at Lucene Revolution, if this > feature is > important to people, I think the easiest way to solve it > would be via > leases. > > During a query, a client could request a lease for a > certain amount of > time on whatever index version is used to generate the > response. Solr > would then return the index version to the client along > with the > response, and keep the index open for that amount of > time. The client > could make consistent additional requests (such as the 2nd > phase of a > distributed request) by requesting the same version > of the index. > > -Yonik >
Re: LuceneRevolution - NoSQL: A comparison
Well, I think that if someone is searching the 'whole of the dataset' to find the 'individual data', then an SQL database outside of Solr makes as much sense. There's plenty of data in the world, or in most applications, that needs to stay normalized, or at least benefits from being that way. Dennis Gearon --- On Mon, 10/11/10, Peter Keegan wrote: > From: Peter Keegan > Subject: LuceneRevolution - NoSQL: A comparison > To: solr-user@lucene.apache.org > Date: Monday, October 11, 2010, 5:32 PM > I listened with great interest to > Grant's presentation of the NoSQL > comparisons/alternatives to Solr/Lucene. It sounds like the > jury is still > out on much of this. Here's a use case that might favor > using a NoSQL > alternative for storing 'stored fields' outside of Lucene. > > When Solr does a distributed search across shards, it does > this in 2 phases > (correct me if I'm wrong): > > 1. 1st query to get the docIds and facet counts > 2. 2nd query to retrieve the stored fields of the top hits > > The problem here is that the index could change between (1) > and (2), so it's > not an atomic transaction. If the stored fields were kept > outside of Lucene, > only the first query would be necessary. However, this > would mean that the > external NoSQL data store would have to be synchronized > with the Lucene > index, which might present its own problems. (I'm just > throwing this out for > discussion) > > Peter >
Re: LuceneRevolution - NoSQL: A comparison
On Mon, Oct 11, 2010 at 8:32 PM, Peter Keegan wrote: > I listened with great interest to Grant's presentation of the NoSQL > comparisons/alternatives to Solr/Lucene. It sounds like the jury is still > out on much of this. Here's a use case that might favor using a NoSQL > alternative for storing 'stored fields' outside of Lucene. > > When Solr does a distributed search across shards, it does this in 2 phases > (correct me if I'm wrong): > > 1. 1st query to get the docIds and facet counts > 2. 2nd query to retrieve the stored fields of the top hits > > The problem here is that the index could change between (1) and (2), so it's > not an atomic transaction. Yep. As I discussed with Peter at Lucene Revolution, if this feature is important to people, I think the easiest way to solve it would be via leases. During a query, a client could request a lease for a certain amount of time on whatever index version is used to generate the response. Solr would then return the index version to the client along with the response, and keep the index open for that amount of time. The client could make consistent additional requests (such as the 2nd phase of a distributed request) by requesting the same version of the index. -Yonik
Re: configuring custom CharStream in solr
: OK - I found the answer pecking through the source - apparently the name of : the element to configure a CharFilter is <charFilter> - fancy that :) there's even an example, right there on the wiki... http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#CharFilterFactories -Hoss
Re: configuring custom CharStream in solr
On 10/11/2010 8:38 PM, Michael Sokolov wrote: On 10/11/2010 6:41 PM, Koji Sekiguchi wrote: (10/10/12 5:57), Michael Sokolov wrote: I would like to inject my CharStream (or possibly it could be a CharFilter; this is all in flux at the moment) into the analysis chain for a field. Can I do this in solr using the Analyzer configuration syntax in schema.xml, or would I need to define my own Analyzer? The solr wiki describes adding Tokenizers, but doesn't say anything about CharReaders/Filters. Thanks for any pointers -Mike Hi Mike, You can write your own CharFilterFactory that creates your own CharStream. Please refer to the existing CharFilterFactories in Solr to see how you can implement it. Koji Koji - thanks for your response. I think I can see my way clear to making a factory class for my stream. My question was really about how to configure the factory. I see a number of examples of tokenizers and analyzers configured in the example schema.xml, but no readers. For example, <tokenizer class="..."/> configures a specific tokenizer. If I want to configure my CharStream, is there an element for that? E.g. <charStream class="..."/>? I am guessing that I need to create my own analyzer and hard-code the reader/tokenizer filter chain in there, but it would be nice if there were a syntax like the one I inferred above. -Mike OK - I found the answer pecking through the source - apparently the name of the element to configure a CharFilter is <charFilter> - fancy that :) -Mike
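For anyone landing on this thread later, a minimal sketch of the resulting schema.xml syntax (this uses the stock HTMLStripCharFilterFactory; a custom CharFilterFactory is referenced the same way, by class name):

    <fieldType name="text_stripped" class="solr.TextField">
      <analyzer>
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

The charFilter element(s) must appear before the tokenizer, since CharFilters operate on the raw character stream.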
Re: configuring custom CharStream in solr
On 10/11/2010 6:41 PM, Koji Sekiguchi wrote: (10/10/12 5:57), Michael Sokolov wrote: I would like to inject my CharStream (or possibly it could be a CharFilter; this is all in flux at the moment) into the analysis chain for a field. Can I do this in solr using the Analyzer configuration syntax in schema.xml, or would I need to define my own Analyzer? The solr wiki describes adding Tokenizers, but doesn't say anything about CharReaders/Filters. Thanks for any pointers -Mike Hi Mike, You can write your own CharFilterFactory that creates your own CharStream. Please refer to the existing CharFilterFactories in Solr to see how you can implement it. Koji Koji - thanks for your response. I think I can see my way clear to making a factory class for my stream. My question was really about how to configure the factory. I see a number of examples of tokenizers and analyzers configured in the example schema.xml, but no readers. For example, <tokenizer class="..."/> configures a specific tokenizer. If I want to configure my CharStream, is there an element for that? E.g. <charStream class="..."/>? I am guessing that I need to create my own analyzer and hard-code the reader/tokenizer filter chain in there, but it would be nice if there were a syntax like the one I inferred above. -Mike
LuceneRevolution - NoSQL: A comparison
I listened with great interest to Grant's presentation of the NoSQL comparisons/alternatives to Solr/Lucene. It sounds like the jury is still out on much of this. Here's a use case that might favor using a NoSQL alternative for storing 'stored fields' outside of Lucene. When Solr does a distributed search across shards, it does this in 2 phases (correct me if I'm wrong): 1. 1st query to get the docIds and facet counts 2. 2nd query to retrieve the stored fields of the top hits The problem here is that the index could change between (1) and (2), so it's not an atomic transaction. If the stored fields were kept outside of Lucene, only the first query would be necessary. However, this would mean that the external NoSQL data store would have to be synchronized with the Lucene index, which might present its own problems. (I'm just throwing this out for discussion) Peter
multicore replication slave
Hello, I can't get my multicore slave to replicate from the master. The master is set up properly, and the following URLs return the expected status-0 "OK" / "No command" response: http://solr.mydomain.com:8983/solr/core1/replication http://solr.mydomain.com:8983/solr/core2/replication http://solr.mydomain.com:8983/solr/core3/replication The following pastie shows how my slave is set up: http://pastie.org/1214209 But it's not working (i.e. I see no replication attempts in the slave's log). Any ideas? Thanks for the help.
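For comparison, a slave core's solrconfig.xml normally declares the replication handler roughly like this (the master URL and poll interval are illustrative; each slave core must point at its own matching master core, not all at the same one):

    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <str name="masterUrl">http://solr.mydomain.com:8983/solr/core1/replication</str>
        <str name="pollInterval">00:00:60</str>
      </lst>
    </requestHandler>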
Re: Trouble with exception Document [Null] missing required field DocID
: Right. You're requiring that every document have an ID (via uniqueKey), but : there's nothing : magic about DIH that'll automagically parse a PDF file and map something : into your ID : field. : : So you have to create a unique ID before you send your doc to Curl. I'm a) This example isn't using DIH, it's using the extracting request handler directly b) in the example URL provided, Ahson was already using the exact syntax you mentioned... : > curl : > " : > http://localhost:8983/solr1/update/extract?literal.DocID=123&fmap.content=Contents&commit=true : > " : > -F "myfi...@d:/solr/apache-solr-1.4.0/docs/filename1.pdf" ...note the "literal.DocID" param (where "DocID" is the field listed as uniqueKey in his example). The actual root of the problem is that the "lowernames" param (which is declared "true" in the Solr 1.4 example declaration of /update/extract) is getting applied to all field names, even the literal ones... http://wiki.apache.org/solr/ExtractingRequestHandler#Order_of_field_operations Ahson: You could change your uniqueKey field to something that is all lowercase, or you could set lowernames=false in your config (which will impact all field names extracted by Tika). (Personally, I think the order of operations in the ExtractingRequestHandler makes no sense at all.) -Hoss
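To make the per-request variant concrete (field and path from the thread; the myfile parameter name is a guess, since the archive truncated it):

    curl "http://localhost:8983/solr1/update/extract?literal.DocID=123&lowernames=false&fmap.content=Contents&commit=true" -F "myfile=@d:/solr/apache-solr-1.4.0/docs/filename1.pdf"

With lowernames=false, the literal DocID is no longer folded to "docid", so it matches the uniqueKey declared in the schema.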
Re: having problem about Solr Date Field.
So, regarding DST, do you put everything in GMT, and make adjustments in the 'search for/between' date/time values before the query, for both DST and TZ? Dennis Gearon --- On Mon, 10/11/10, Chris Hostetter wrote: > From: Chris Hostetter > Subject: Re: having problem about Solr Date Field. > To: solr-user@lucene.apache.org > Date: Monday, October 11, 2010, 3:23 PM > > : Of course if your index is for users in one time zone > only, you may > : insert the local time to Solr, and everything will work > well. However, > > This is a bad assumption to make -- it will screw you up if > your "one time > zone" has anything like "Daylight Saving Time" (because UTC > does not) > > > -Hoss >
Re: Records from DIH not easily queried for
Well, found the problem: us, of course. We were using string instead of text for the field type in the schema config file. So it wasn't tokenizing words or doing the other 'search by word' enabling preprocessing before storing the document in the index. We could only have found whole sentences. Now it works! But now the long road of tuning it to find what we WANT it to find . . . begins. That, and getting what we want out of geospatial. We're just starting on that. Dennis Gearon --- On Sun, 10/10/10, Erick Erickson wrote: > From: Erick Erickson > Subject: Re: Records from DIH not easily queried for > To: solr-user@lucene.apache.org > Date: Sunday, October 10, 2010, 8:11 AM > The phrase that jumps out is "with > fields slightly modified". I'm > guessing that your modifications are off by a little. > Here's what > I'd check first: > 1> check the case. Sometimes the DB <-> field link > is case > sensitive. > 2> Look in your index via the admin page and look at > your actual > fields as reported there. Are they really what you expect? > 3> Try your query with &debugQuery=on. Is what you > get back > what you expect? > 4> Sometimes your browser cache will fool you, try the > force-refresh > combination on your browser. > > There's no magic here, nothing special or different about > DIH > imported data than any other sort. So it's almost > certainly > some innocent-seeming change that's not, typo, incorrect > assumption, etc. > > If none of that works, you need to post your schema changes > and > your query results (with &debugQuery=on). Particularly, > post > the fieldType definitions as well as your field > definitions... > > Best > Erick > > On Sun, Oct 10, 2010 at 10:55 AM, Dennis Gearon wrote: > > > With a brand new setup, per the demo/tutorial, with > fields slightly changed > > in the config and data, posting XML records results in > a simple query being > > able to find records. > > > > > > But records imported via a plain jane DIH request can > only be found using > > 'q=*:*' queries. > > > > There's no filtering, tokenizing, blah blah. It's the > factory settings. The > > installation is as new at this as we are :-) > > > > Anyone have any ideas why we can't query for DIH > handled records? Do they > > have some magic juju done to them that XML Posts > don't, or visa versa? > > > > Dennis Gearon > >
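For anyone hitting the same symptom, the fix is a one-word change in schema.xml (the field name is illustrative; the type names are from the Solr 1.4 example schema):

    <!-- string: the whole value is one token, so only exact matches work -->
    <field name="content" type="string" indexed="true" stored="true"/>
    <!-- text: the value is tokenized and filtered, so individual words match -->
    <field name="content" type="text" indexed="true" stored="true"/>

Remember to re-index after changing a field type.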
Re: Where is the lock file?
: I've looked through the configuration file. I can see where it defines the : lock type and I can see the unlock configuration. But I don't see where it : specifies the lock file. Where is it? What is its name? As mentioned in the stack trace you pasted, the name of the lock file in question is "write.lock". However, what's really odd is that based on your stack trace you seem to be using the SingleInstanceLockFactory (ie: "single"), which means the lock file is never written to disk -- it's an entirely in-memory Lock object. If you are getting that stack trace, that suggests that something is seriously whacked with your Solr setup -- is it possible you have multiple instances of Solr in the same JVM trying to use the same directory? (ie: an instance that wasn't shut down cleanly, and then you started up a new instance using war hot deploy or something like it?) : Also, to speed up nutch, we changed the configuration to start several map : tasks at once. Is nutch trying to kick off several solr sessions at once and : is that causing messages like the above? Should we just change the lock to : simple? I don't know enough about Nutch to know what this means ... if Nutch is starting up multiple Solr servers (in the same JVM) then this might explain the exception above ... using a "simple" lock isn't going to make the problem go away though: only one Solr instance can be writing to an index at a time. -Hoss
Re: How to get line numbers from Solr plugin to show up in stack trace
: Hello, I am writing a clustering component for Solr. It registers, loads and : works properly. However, whenever there is an exception inside my plugin, I : cannot get tomcat to show me the line numbers. It always says "Unknown source" : for my classes. The stack trace in tomcat shows line numbers for everything up : to the org.apache.solr.handler.component.SearchHandler class, but after that it : shows my class names without line numbers. My compiler in the ant build file is set : to include debug info: : <javac ... debug="true" debuglevel="lines, vars, and source"> I've never seen "debuglevel" in a build.xml ... Solr's build.xml just uses debug="true" and things seem to work fine. Googling for "ant debuglevel" suggests that: 1) you don't want "and" in that attribute 2) you don't want any spaces in there either -Hoss
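A javac task along the lines Hoss describes, with the debuglevel value in its valid comma-separated, space-free form, would be (a sketch; the paths and classpath id are placeholders):

    <javac srcdir="src" destdir="build" debug="true" debuglevel="lines,vars,source">
      <classpath refid="plugin.classpath"/>
    </javac>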
Re: configuring custom CharStream in solr
(10/10/12 5:57), Michael Sokolov wrote: I would like to inject my CharStream (or possibly it could be a CharFilter; this is all in flux at the moment) into the analysis chain for a field. Can I do this in solr using the Analyzer configuration syntax in schema.xml, or would I need to define my own Analyzer? The solr wiki describes adding Tokenizers, but doesn't say anything about CharReaders/Filters. Thanks for any pointers -Mike Hi Mike, You can write your own CharFilterFactory that creates your own CharStream. Please refer to the existing CharFilterFactories in Solr to see how you can implement it. Koji -- http://www.rondhuit.com/en/
Re: having problem about Solr Date Field.
: Of course if your index is for users in one time zone only, you may : insert the local time to Solr, and everything will work well. However, This is a bad assumption to make -- it will screw you up if your "one time zone" has anything like "Daylight Saving Time" (because UTC does not). -Hoss
Re: StatsComponent and multi-valued fields
: I'm able to execute stats queries against multi-valued fields, but when : given a facet, the statscomponent only considers documents that have a facet : value as the last value in the field. : : As an example, imagine you are running stats on "fooCount", and you want to : facet on "bar", which is multi-valued. Two documents... It's a known bug ... StatsComponent's "Faceted Stats" make some really gross assumptions about the Field... https://issues.apache.org/jira/browse/SOLR-1782 -Hoss
weighted facets
Hi, I need a feature which is well explained by Mr Goll at this site.** So it would then be nice to do something like: facet.stats=sum(fieldX)&facet.stats.sort=fieldX And the output (sorted against the sum output) could look something like this (one sum per facet value): 767 892 Is there something similar, or was this answered by Hoss at Lucene Revolution? If not, I'll open a JIRA issue ... BTW: is the work from http://www.cs.cmu.edu/~ddash/papers/facets-cikm.pdf contributed back to solr? Regards, Peter. PS: Related issue: https://issues.apache.org/jira/browse/SOLR-680 https://issues.apache.org/jira/secure/attachment/12400054/SOLR-680.patch ** http://lucene.crowdvine.com/posts/14137409 Quoting his question in case the site goes offline: Hi Chris, Usually a facet search returns the document count for the unique values in the facet field. Is there a way to return a weighted facet count based on a user-defined function (sum, product, etc.) of another field? Here is a sum example. Assume we have the following 4 documents with 3 fields:
ID facet_field weight_field
1  solr        0.4
2  lucene      0.3
3  lucene      0.1
4  lucene      0.2
Is there a way to return solr 0.4, lucene 0.6 instead of solr 1, lucene 3? Given the facet_field contains multiple values:
ID facet_field  weight_field
1  solr lucene  0.2
2  lucene       0.3
3  solr lucene  0.1
4  lucene       0.2
Is there a way to return solr 0.3, lucene 0.8 instead of solr 2, lucene 4? Thanks, Johannes
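Note that Solr 1.4's StatsComponent can already produce per-facet-value sums for the single-valued case in the quoted question (see the StatsComponent thread earlier in this digest for a known bug when the facet field is multi-valued):

    http://localhost:8983/solr/select?q=*:*&rows=0&stats=true&stats.field=weight_field&stats.facet=facet_field

The response then carries the sum (plus count, min, max, etc.) of weight_field for each distinct value of facet_field.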
Re: data import / delta question
Thanks, Erick. I was starting to think I may have to go the SolrJ route. Here's a simplified version of my DIH config showing what I'm trying to do. On Mon, Oct 11, 2010 at 4:25 PM, Erick Erickson wrote: > Without seeing your DIH config, it's really hard to say much of anything. > > You can gain finer control over edge cases by writing a Java > app that uses SolrJ if necessary. > > HTH > Erick > > On Mon, Oct 11, 2010 at 3:27 PM, Tim Heckman wrote: > >> My data-import-config.xml has a parent entity and a child entity. The >> data is coming from rdbms's. >> >> I'm trying to make use of the delta-import feature where a change in >> the child entity can be used to regenerate the entire document. >> >> The child entity is on a different database (and a different server) >> from the parent entity, so the child's parentDeltaQuery cannot >> reference the table of the parent entity the way that the example on >> the wiki does, because it's bound to the database connection for the >> child entity's data (as far as I can tell). >> >> http://wiki.apache.org/solr/DataImportHandler#Using_delta-import_command >> >> >> I have tried extracting the parent's ID's from the child table in the >> parentDeltaQuery, thinking that these id's would be fed into the >> parent's deltaImportQuery, but this doesn't seem to work, either. >> >> Should this work? If not, any suggestions how to work around it? >> >> thanks, >> Tim >> >
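For reference, a sketch of the wiki's delta pattern that is being adapted here (entity, table, and column names are the wiki's placeholders). The sticking point in this thread is that parentDeltaQuery runs against the child entity's own dataSource, so it cannot join to a parent table that lives on a different server:

    <entity name="item" pk="id"
            query="select * from item"
            deltaQuery="select id from item where last_modified > '${dataimporter.last_index_time}'"
            deltaImportQuery="select * from item where id='${dataimporter.delta.id}'">
      <entity name="feature" pk="item_id"
              query="select description from feature where item_id='${item.id}'"
              deltaQuery="select item_id from feature where last_modified > '${dataimporter.last_index_time}'"
              parentDeltaQuery="select id from item where id=${feature.item_id}"/>
    </entity>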
configuring custom CharStream in solr
I would like to inject my CharStream (or possibly it could be a CharFilter; this is all in flux at the moment) into the analysis chain for a field. Can I do this in solr using the Analyzer configuration syntax in schema.xml, or would I need to define my own Analyzer? The solr wiki describes adding Tokenizers, but doesn't say anything about CharReaders/Filters. Thanks for any pointers -Mike
Re: Deleting Documents with null fields by query
"erase all the content". Oops. first, I should look more carefully. You don't want the AND in there, use *:* -content:[* TO *] In general, don't mix and match booleans and native Lucene query syntax... Before sending this to Solr, what do you get back when you try just the query in, say, the admin page? I'd be testing the query there before actually submitting the delete Best Erick On Mon, Oct 11, 2010 at 4:33 PM, Claudio Devecchi wrote: > yes.. > > dont work, doing it I erase all the content. :( > > or, another thing that will help me is to make a query that doesnt bring > the > null one. > > tks > > On Mon, Oct 11, 2010 at 5:27 PM, Erick Erickson >wrote: > > > Have you tried something like: > > > > '*:* AND > > -content:[* TO *] > > > > > > On Mon, Oct 11, 2010 at 4:01 PM, Claudio Devecchi > >wrote: > > > > > Hi everybody, > > > > > > I'm trying to delete by query some documents with null content (this > > > happened because I crawled my intranet and somethings came null) > > > > > > When I try this works fine (I'm deleting from my solr index every > > document > > > that dont have wiki on the field content) > > > curl http://localhost:8983/solr/update?commit=true -H 'Content-Type: > > > text/xml' --data-binary '*:* AND > > > -content:wiki' > > > > > > Now I need to make a query that delete every document that have the > field > > > content null. > > > > > > Somebody could help me pls? > > > > > > Tks > > > CLaudio > > > > > > > > > -- > Claudio Devecchi > flickr.com/cdevecchi >
Re: Deleting Documents with null fields by query
yes.. it doesn't work; doing it I erase all the content. :( Or, another thing that would help me is to make a query that doesn't bring back the null ones. tks On Mon, Oct 11, 2010 at 5:27 PM, Erick Erickson wrote: > Have you tried something like: > > '<delete><query>*:* AND > -content:[* TO *]</query></delete>' > > On Mon, Oct 11, 2010 at 4:01 PM, Claudio Devecchi >wrote: > > Hi everybody, > > > > I'm trying to delete by query some documents with null content (this > > happened because I crawled my intranet and some things came back null) > > > > When I try this it works fine (I'm deleting from my solr index every > document > > that doesn't have "wiki" in the field content) > > curl http://localhost:8983/solr/update?commit=true -H 'Content-Type: > > text/xml' --data-binary '<delete><query>*:* AND > > -content:wiki</query></delete>' > > > > Now I need to make a query that deletes every document that has the field > > content null. > > > > Could somebody help me please? > > > > Tks > > Claudio > > > -- Claudio Devecchi flickr.com/cdevecchi
Re: Disable (or prohibit) per-field overrides
I'm clueless in that case, because you're right, that's a lot of picky maintenance. Sorry 'bout that. Erick On Mon, Oct 11, 2010 at 4:18 PM, Markus Jelsma wrote: > Yes, we're using it but the problem is that there can be many fields and > that means quite a large list of parameters to set for each request handler, > and there can be many request handlers. > > It's not very practical for us to maintain such a big set of invariants. > > Thanks > > > > On Mon, 11 Oct 2010 16:12:35 -0400, Erick Erickson < erickerick...@gmail.com> wrote: > >> Have you looked at "invariants" in solrconfig.xml? >> >> Best >> Erick >> >> On Mon, Oct 11, 2010 at 12:23 PM, Markus Jelsma >> wrote: >> >> Hi, >>> >>> Anyone know a useful method to disable or prohibit the per-field override >>> features for the search components? If not, where to start to make it >>> configurable via solrconfig and attempt to come up with a working patch? >>> >>> Cheers, >>> -- >>> Markus Jelsma - CTO - Openindex >>> http://www.linkedin.com/in/markus17 >>> 050-8536600 / 06-50258350 >>> >>> > -- > Markus Jelsma - CTO - Openindex > http://www.linkedin.com/in/markus17 > 050-8536600 / 06-50258350 >
Re: Deleting Documents with null fields by query
Have you tried something like: '<delete><query>*:* AND -content:[* TO *]</query></delete>' On Mon, Oct 11, 2010 at 4:01 PM, Claudio Devecchi wrote: > Hi everybody, > > I'm trying to delete by query some documents with null content (this > happened because I crawled my intranet and some things came back null) > > When I try this it works fine (I'm deleting from my solr index every document > that doesn't have "wiki" in the field content) > curl http://localhost:8983/solr/update?commit=true -H 'Content-Type: > text/xml' --data-binary '<delete><query>*:* AND > -content:wiki</query></delete>' > > Now I need to make a query that deletes every document that has the field > content null. > > Could somebody help me please? > > Tks > Claudio >
Re: data import / delta question
Without seeing your DIH config, it's really hard to say much of anything. You can gain finer control over edge cases by writing a Java app that uses SolrJ if necessary. HTH Erick On Mon, Oct 11, 2010 at 3:27 PM, Tim Heckman wrote: > My data-import-config.xml has a parent entity and a child entity. The > data is coming from rdbms's. > > I'm trying to make use of the delta-import feature where a change in > the child entity can be used to regenerate the entire document. > > The child entity is on a different database (and a different server) > from the parent entity, so the child's parentDeltaQuery cannot > reference the table of the parent entity the way that the example on > the wiki does, because it's bound to the database connection for the > child entity's data (as far as I can tell). > > http://wiki.apache.org/solr/DataImportHandler#Using_delta-import_command > > > I have tried extracting the parent's ID's from the child table in the > parentDeltaQuery, thinking that these id's would be fed into the > parent's deltaImportQuery, but this doesn't seem to work, either. > > Should this work? If not, any suggestions how to work around it? > > thanks, > Tim >
Re: Solr unresponsive but still taking queries
The first question is "what's been changing?" I suspect something's been growing right along and finally tripped you up. Places I would look first: 1> how much free space is on your disk? Have your logs (or other files) grown without bound? 2> If this is a Unix box, what does "top" report? In other words, profile your machine and see what the limiting resource is. You should be seeing something pathological. Your CPUs should be pegged (find out which program is using them up). Or your I/O is swapping like a crazy thing. Or... Until you have some clue where you're being starved, you're just guessing... Even negative data is better than none (i.e. being CPU-bound rules out most I/O problems and vice-versa). It's even possible that what's happening is that some other program on that box is misbehaving and starving your searcher process. The possibilities are endless. A *very* quick way to test a lot would be to move the searcher onto another box and see what happens then. Best Erick On Mon, Oct 11, 2010 at 2:36 PM, Hitendra Molleti wrote: > Hi, > > We are running a CMS based on Java and use Solr 1.4 as the indexer. > > Until this afternoon things were fine; then we hit this Solr issue where it sort of becomes unresponsive. We tried to stop and restart Solr but it didn't help. > > When we look into the logs, Solr is receiving queries and running them, but we > do not seem to get the responses, and after an endless wait the page > generates a 503 error (Varnish on the front end). > > Can someone help us with any possible suggestions or solutions. > > Thanks > > Hitendra >
Re: Disable (or prohibit) per-field overrides
Yes, we're using it, but the problem is that there can be many fields, and that means quite a large list of parameters to set for each request handler, and there can be many request handlers. It's not very practical for us to maintain such a big set of invariants. Thanks On Mon, 11 Oct 2010 16:12:35 -0400, Erick Erickson wrote: Have you looked at "invariants" in solrconfig.xml? Best Erick On Mon, Oct 11, 2010 at 12:23 PM, Markus Jelsma wrote: Hi, Anyone know a useful method to disable or prohibit the per-field override features for the search components? If not, where to start to make it configurable via solrconfig and attempt to come up with a working patch? Cheers, -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536600 / 06-50258350 -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536600 / 06-50258350
Re: Disable (or prohibit) per-field overrides
Have you looked at "invariants" in solrconfig.xml? Best Erick On Mon, Oct 11, 2010 at 12:23 PM, Markus Jelsma wrote: > Hi, > > Anyone know a useful method to disable or prohibit the per-field override > features for the search components? If not, where to start to make it > configurable via solrconfig and attempt to come up with a working patch? > > Cheers, > -- > Markus Jelsma - CTO - Openindex > http://www.linkedin.com/in/markus17 > 050-8536600 / 06-50258350 >
Re: deleteByQuery issue
I'd guess that after you delete your documents and commit, you're still using an IndexReader that you haven't reopened when you search. WARNING: I'm not all that familiar with EmbeddedSolrServer, so this may be way off base. HTH Erick On Mon, Oct 11, 2010 at 12:04 PM, Claudio Atzori wrote: > On 10/11/2010 04:06 PM, Ahmet Arslan wrote: >> >> --- On Mon, 10/11/10, Claudio Atzori wrote: >> >> From: Claudio Atzori >>> Subject: deleteByQuery issue >>> To: solr-user@lucene.apache.org >>> Date: Monday, October 11, 2010, 10:38 AM >>> Hi everybody, >>> in my application I use an instance of EmbeddedSolrServer >>> (solr 1.4.1); the following snippet shows how I am >>> instantiating it: >>> >>> File home = new >>> File(indexDataPath(solrDataDir, indexName)); >>> container = new >>> CoreContainer(indexDataPath(solrDataDir, indexName)); >>> container.load(indexDataPath(solrDataDir, >>> indexName), new File(home, "solr.xml")); >>> return new >>> EmbeddedSolrServer(container, indexName); >>> >>> and I'm going through some issues with the deleteByQuery >>> method; in fact, when I try to delete a subset of documents, >>> or even all the documents from the index, I see them >>> correctly marked for deletion in the Luke inspector ( >>> http://code.google.com/p/luke/), but after a commit I >>> can still retrieve them, just as if they hadn't been >>> removed... >>> >>> I can see the difference and see the documents disappear >>> only when I restart my jetty application, but obviously this >>> cannot be a feature... any idea? >> I think you are accessing the same solr index using both the embedded server and http. >> The changes that you made using the embedded server won't be reflected over http >> until a commit is issued from http. I mean if you hit this url: >> >> http://localhost:8983/solr/update?commit=true >> >> the deleted documents won't be retrieved anymore. >> >> P.S. if you want to expunge deleted docs completely you can either >> optimize or commit with expungeDeletes="true". >> > Thanks for your reply. > All right, let me explain my scenario better. I'm not exposing any http > interface of the index. I handle the whole index 'life cycle' via java code > with the EmbeddedSolrServer instance, so I'm handling commits, > optimizations, feedings, index creation, all through that instance; moreover > my client application calls embeddedSolrServerInstance.commit() after > deleteByQuery, but the documents are still there >
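A minimal sketch of the sequence under discussion, using the thread's own EmbeddedSolrServer setup (error handling omitted; container and indexName come from the snippet above, the query is a placeholder):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    EmbeddedSolrServer server = new EmbeddedSolrServer(container, indexName);
    server.deleteByQuery("*:*");  // marks matching documents as deleted
    server.commit();              // opens a new searcher, making the deletes visible
    QueryResponse rsp = server.query(new SolrQuery("*:*"));
    System.out.println(rsp.getResults().getNumFound()); // expected: 0

If the count is still non-zero at this point, something else, e.g. a second CoreContainer open on the same data directory, is serving the stale searcher.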
Deleting Documents with null fields by query
Hi everybody, I'm trying to delete by query some documents with null content (this happened because I crawled my intranet and some things came back null). When I try this, it works fine (I'm deleting from my solr index every document that doesn't have "wiki" in the field content): curl http://localhost:8983/solr/update?commit=true -H 'Content-Type: text/xml' --data-binary '<delete><query>*:* AND -content:wiki</query></delete>' Now I need to make a query that deletes every document that has the field content null. Could somebody help me please? Tks Claudio
Re: Prioritizing adjectives in solr search
You can do some interesting things with payloads. You could index a particular value as the payload that identified the "kind" of word it was, where "kind" is something you define. Then at query time, you could boost depending on what kind of word you identified it as, in both the query and at indexing time. But I can't even imagine how one would go about supporting this in a general search engine. This kind of thing seems far too domain specific. Best Erick On Sun, Oct 10, 2010 at 8:50 PM, Ron Mayer wrote: > Walter Underwood wrote: > > I think this is a bad idea. The tf.idf algorithm will already put a > higher weight on "hammers" than on "blue", because "hammers" will be more > rare than "blue". Plus, you are making huge assumptions about the queries. > In a search for "Canon camera", "Canon" is an adjective, but it is the > important part of the query. > > > > Have you looked at your query logs and which queries are successful and > which are not? > > > > Don't make radical changes like this unless you can justify them from the > logs. > > The one radical change I'd like in the area of adjectives in noun clauses > is if > more weight were put when the adjectives apply to the appropriate noun. > > For example, a search for: > 'red baseball cap black leather jacket' > should find a doc with "the guy wore a red cap, blue jeans, and a leather > jacket" > before one that says "the guy wore a black cap, leather pants, and a red > jacket". > > > The closest I've come at doing this was to use a variety of "phrase slop" > boosts simultaneously - so that "red [any_few_words] cap" "baseball cap" > "leather jacket", "black [any_few_words] jacket" all add boosts to the > score. > > > > > > > > > > > wunder > > > > On Oct 4, 2010, at 8:38 PM, Otis Gospodnetic wrote: > > > >> Hi, > >> > >> If you want "blue" to be used in search, then you should not treat it as > a > >> stopword. > >> > >> Re payloads: http://search-lucene.com/?q=payload+score > >> and http://search-lucene.com/?q=payload+score&fc_type=wiki (even > better, look at > >> hit #1) > >> > >> Otis > >> > >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch > >> Lucene ecosystem search :: http://search-lucene.com/ > >> > >> > >> > >> - Original Message > >>> From: Hasnain > >>> To: solr-user@lucene.apache.org > >>> Sent: Mon, October 4, 2010 9:50:46 AM > >>> Subject: Re: Prioritizing adjectives in solr search > >>> > >>> > >>> Hi Otis, > >>> > >>> Thank you for replying, unfortunately Im unable to fully grasp what > >>> you are trying to say, can you please elaborate what is payload with > >>> adjective terms? > >>> > >>> also Im using stopwords.txt to stop adjectives, adverbs and verbs, now > when > >>> I search for "Blue hammers", solr searches for "blue hammers" and > "hammers" > >>> but not "blue", but the problem here is user can also search for just > >>> "Blue", then it wont search for anything... > >>> > >>> any suggestions on this?? > >>> > >>> -- > >>> View this message in context: > >>> > http://lucene.472066.n3.nabble.com/Prioritizing-adjectives-in-solr-search-tp1613029p1629725.html > >>> > >>> Sent from the Solr - User mailing list archive at Nabble.com. > >>> > > > > > > > > > >
Re: CoreContainer Usage
Hi, sorry, perhaps my question wasn't very clear. Basically I am trying to build a federated search where I blend the results of queries to multiple cores together. This is like distributed search, but I believe distributed search will issue network calls, which I would like to avoid. I have read that some people use a single core as the federated search handler and then run the searches across multiple cores and blend the results. This is great, but I can't figure out how to easily get access to an instance of the CoreContainer that I hope has already been initialized (so I am not having it re-parse the configuration files). Any help would be appreciated. Thanks! Amit On Thu, Oct 7, 2010 at 10:07 AM, Amit Nithian wrote: > I am trying to understand the multicore setup of Solr more and saw > that SolrCore.getCore is deprecated in favor of > CoreContainer.getCore(name). How can I get a reference to the > CoreContainer for I assume it's been created somewhere in Solr and is > it possible for one core to get access to another SolrCore via the > CoreContainer? > > Thanks > Amit >
data import / delta question
My data-import-config.xml has a parent entity and a child entity. The data is coming from rdbms's. I'm trying to make use of the delta-import feature where a change in the child entity can be used to regenerate the entire document. The child entity is on a different database (and a different server) from the parent entity, so the child's parentDeltaQuery cannot reference the table of the parent entity the way that the example on the wiki does, because it's bound to the database connection for the child entity's data (as far as I can tell). http://wiki.apache.org/solr/DataImportHandler#Using_delta-import_command I have tried extracting the parent's ID's from the child table in the parentDeltaQuery, thinking that these id's would be fed into the parent's deltaImportQuery, but this doesn't seem to work, either. Should this work? If not, any suggestions how to work around it? thanks, Tim
Re: Prioritizing adjectives in solr search
: here is my scenario, im using the dismax handler and my understanding is when I : query "Blue hammer", solr brings me results for "blue hammer", "blue" and : "hammer", and in the same hierarchy, which is understandable. is there any : way I can manage the "blue" keyword, so that solr searches for "blue hammer" : and "hammer" and not any results for "blue"? At a very simple level, you can achieve something like this by using a "qf" that points at fields where adjectives have been removed (ie: using StopFilter) and using "pf" fields where the adjectives have been left alone -- thus a query for "blue hammer" will match any doc containing "hammer" but the "pf" clause will boost documents matching the phrase "blue hammer" (documents matching only "blue" will not match, and documents matching "blue" and "hammer" farther apart than the "ps" param will not get the phrase boost). But please note Walter's comments and consider them carefully before treating this as a silver bullet. : : my handler is as follows... : : <requestHandler name="..." class="solr.SearchHandler"> :   <lst name="defaults"> :     <str name="defType">dismax</str> :     <str name="echoParams">explicit</str> :     <float name="tie">0.6</float> :     <str name="qf">name^2.3 mat_nr^0.4</str> :     <str name="mm">0%</str> :   </lst> : </requestHandler> : : any suggestion on this?? : -- : View this message in context: http://lucene.472066.n3.nabble.com/Prioritizing-advectives-in-solr-search-tp1613029p1613029.html : Sent from the Solr - User mailing list archive at Nabble.com. : -Hoss
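Concretely, Hoss's suggestion maps onto dismax parameters like these, where the *_stopped fields are hypothetical copies of the original fields whose analyzer includes a StopFilter that strips the adjectives:

    <str name="qf">name_stopped^2.3 mat_nr_stopped^0.4</str>
    <str name="pf">name^2.3 mat_nr^0.4</str>
    <str name="ps">2</str>

A query for "blue hammer" then matches any document containing "hammer" via qf, while pf boosts documents where "blue" and "hammer" occur within ps positions of each other.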
Solr unresponsive but still taking queries
Hi, We are running a CMS based on Java and use Solr 1.4 as the indexer. Until this afternoon things were fine; then we hit this Solr issue where it sort of becomes unresponsive. We tried to stop and restart Solr but it didn't help. When we look into the logs, Solr is receiving queries and running them, but we do not seem to get the responses, and after an endless wait the page generates a 503 error (Varnish on the front end). Can someone help us with any possible suggestions or solutions? Thanks Hitendra
Re: Search within a subset of documents
That's what I think, too. Actually I hope that I can do something like this: 1) tell Solr to prepare for searching 2) start my very fast filtering routine 3) asynchronously send the IDs of the filtered documents to Solr and expect that Solr is ranking them in parallel 4) get the result quickly On 11 October 2010 21:25, Gora Mohanty wrote: > On Mon, Oct 11, 2010 at 8:20 PM, Sergey Bartunov wrote: >> Whether it will be enough effective if the subset is really large? > [...] > > If the subset of IDs is large, and disjoint (so that you cannot use ranges), > the query might look ugly, but generating it should not be much of a > problem if you are using some automated method to create the query. > > If you mean whether it will be efficient enough, the only way is to try > it out, and measure performance. Offhand, I do not think that it should > increase the query response time by a lot. > > Regards, > Gora >
Re: How to manage different indexes for different users
Great! Just what I need. Thanks for all the help. I'll let you know how it goes. On Mon, Oct 11, 2010 at 11:37 PM, Markus Jelsma wrote: > Well, set the user ID for each document and use a filter query to filter > only on field:. > > On Mon, 11 Oct 2010 23:25:29 +0530, Tharindu Mathew > wrote: > >> On Mon, Oct 11, 2010 at 10:48 PM, Markus Jelsma wrote: >> >> Then you probably read on how to create [1] the new core. Keep in >> mind, you might need to do some additional local scripting to create a >> new instance dir. >> >> Do the user share the same schema? If so, you'd be better of keeping >> a single index and preventing the users from querying others. >> >> Yes, they will be sharing the same schema. If I understand correctly. >> going with a single core is recommended in that case? But how do I >> prevent users from querying other users data? >> >> [1]: http://wiki.apache.org/solr/CoreAdmin#CREATE [2] >> >> On Mon, 11 Oct 2010 22:40:03 +0530, Tharindu Mathew wrote: >> >> Thanks Li. I checked out multi cores documentation. >> >> How do I dynamically create new cores as new users are added. Is >> that >> possible? >> >> On Mon, Oct 11, 2010 at 2:31 PM, Li Li wrote: >> >> >> will one user search other user's index? >> if not, you can use multi cores. >> >> 2010/10/11 Tharindu Mathew : >> >> > Hi everyone, >> > >> > I'm using solr to integrate search into my web app. >> > >> > I have a bunch of users who would have to be given their own >> individual >> > indexes. >> > >> > I'm wondering whether I'd have to append their user ID as I index >> a file. >> > I'm not sure which approach to follow. Is there a sample or a doc >> I can >> read >> > to understand how to approach this problem? >> > >> > Thanks in advance. >> > >> > -- >> > Regards, >> > >> > Tharindu >> > >> >> -- >> Markus Jelsma - CTO - Openindex >> http://www.linkedin.com/in/markus17 [6] >> 050-8536600 / 06-50258350 >> > > -- > Markus Jelsma - CTO - Openindex > http://www.linkedin.com/in/markus17 > 050-8536600 / 06-50258350 > -- Regards, Tharindu
Re: Sorting on arbitary 'custom' fields
On Sat, Oct 09, 2010 at 06:31:19PM -0400, Erick Erickson said: > I'm confused. What do you mean that a user can "set any > number of arbitrarily named fields on a document". It sounds > like you are talking about a user adding arbitrarily many entries > to a multi-valued field? Or is it some kind of key:value pairs > in a field in your schema? Users can add arbitrary key/values to documents. Kind of like Machine Tags. So whilst a document has some standard fields (e.g. title="My Random Document", user="Simon", date="2010-10-11"), I might have added current_temp_in_c="32" to one of my documents but you might have put time_taken_to_write_in_mins="30". We currently don't index these fields but we'd like to, and we'd like users to be able to sort on them. Ideas I had: - Every time a user adds a new field (e.g. time_taken_to_write_in_mins), update the global schema. But that would be horrible and would create an index with many thousands of fields. - Give each user their own core and update each individual schema. Better, but still inelegant. The multi-valued field idea occurred to me because I could have, for example, user_field: [time_taken_to_write_in_mins=30, current_temp_in_c=32] (i.e. flatten the key/value). I could then maybe write something that allowed sorting only on matched values of a multi-valued field: sort=user_field:time_taken_to_write_in_mins=* or fq=user_field:time_taken_to_write_in_mins=*&sort=user_field It was just an idea though, and I was hoping that there would be a simpler, more orthodox way of doing it. thanks, Simon
Re: How to manage different indexes for different users
Well, set the user ID for each document and use a filter query to filter only on that field, e.g. user_id:<the user's ID>. On Mon, 11 Oct 2010 23:25:29 +0530, Tharindu Mathew wrote: >> On Mon, Oct 11, 2010 at 10:48 PM, Markus Jelsma wrote: >> >> Then you probably want to read how to create [1] a new core. Keep in >> mind, you might need to do some additional local scripting to create a >> new instance dir. >> >> Do the users share the same schema? If so, you'd be better off keeping >> a single index and preventing the users from querying others. >> >> Yes, they will be sharing the same schema. If I understand correctly, >> going with a single core is recommended in that case? But how do I >> prevent users from querying other users' data? >> >> [1]: http://wiki.apache.org/solr/CoreAdmin#CREATE >> >> On Mon, 11 Oct 2010 22:40:03 +0530, Tharindu Mathew wrote: >> >> Thanks Li. I checked out the multi-core documentation. >> >> How do I dynamically create new cores as new users are added? Is >> that >> possible? >> >> On Mon, Oct 11, 2010 at 2:31 PM, Li Li wrote: >> >> >> will one user search other user's index? >> if not, you can use multi cores. >> >> 2010/10/11 Tharindu Mathew : >> >> > Hi everyone, >> > >> > I'm using solr to integrate search into my web app. >> > >> > I have a bunch of users who would have to be given their own >> individual >> > indexes. >> > >> > I'm wondering whether I'd have to append their user ID as I index >> a file. >> > I'm not sure which approach to follow. Is there a sample or a doc >> I can >> read >> > to understand how to approach this problem? >> > >> > Thanks in advance. >> > >> > -- >> > Regards, >> > >> > Tharindu >> > >> >> -- >> Markus Jelsma - CTO - Openindex >> http://www.linkedin.com/in/markus17 >> 050-8536600 / 06-50258350 >> > > -- > Markus Jelsma - CTO - Openindex > http://www.linkedin.com/in/markus17 > 050-8536600 / 06-50258350 > -- Regards, Tharindu
Re: How to manage different indexes for different users
On Mon, Oct 11, 2010 at 10:48 PM, Markus Jelsma wrote: > Then you probably want to read how to create [1] a new core. Keep in mind, you > might need to do some additional local scripting to create a new instance > dir. > > Do the users share the same schema? If so, you'd be better off keeping a > single index and preventing the users from querying others. > > Yes, they will be sharing the same schema. If I understand correctly, going with a single core is recommended in that case? But how do I prevent users from querying other users' data? [1]: http://wiki.apache.org/solr/CoreAdmin#CREATE > > On Mon, 11 Oct 2010 22:40:03 +0530, Tharindu Mathew > wrote: > >> Thanks Li. I checked out the multi-core documentation. >> >> How do I dynamically create new cores as new users are added? Is that >> possible? >> >> On Mon, Oct 11, 2010 at 2:31 PM, Li Li wrote: >> >> will one user search other user's index? >>> if not, you can use multi cores. >>> >>> 2010/10/11 Tharindu Mathew : >>> > Hi everyone, >>> > >>> > I'm using solr to integrate search into my web app. >>> > >>> > I have a bunch of users who would have to be given their own individual >>> > indexes. >>> > >>> > I'm wondering whether I'd have to append their user ID as I index a >>> file. >>> > I'm not sure which approach to follow. Is there a sample or a doc I can >>> read >>> > to understand how to approach this problem? >>> > >>> > Thanks in advance. >>> > >>> > -- >>> > Regards, >>> > >>> > Tharindu >>> > >>> >>> > -- > Markus Jelsma - CTO - Openindex > http://www.linkedin.com/in/markus17 > 050-8536600 / 06-50258350 > -- Regards, Tharindu
Re: facet.method: enum vs. fc
Yep, that was probably the best choice It's a classic time/space tradeoff. The enum method creates a bitset for #each# unique facet value. The bit set is (maxdocs / 8) bytes in size (I'm ignoring some overhead here). So if your facet field has 10 unique values, and 8M documents, you'll use up 10M bytes or so. 20 unique values will use up 20M bytes and so on. But this is very, very fast. fc on the other hand, eats up cache for storing the string value for each unique value, plus various counter arrays (several bytes/doc). For most cases, it will use less memory than enum, but will be slower. I'd stick with fc for the time being and think about enum if 1> you have a good idea of what the number of unique terms is or 2> you start to need to finely tune your speed. HTH Erick On Mon, Oct 11, 2010 at 11:30 AM, Paolo Castagna < castagna.li...@googlemail.com> wrote: > Hi, > I am using Solr v1.4 and I am not sure which facet.method I should use. > > What should I use if I do not know in advance if the number of values > for a given field will be high or low? > > What are the pros/cons of using facet.method=enum vs. facet.method=fc? > > When should I use enum vs. fc? > > I have found some comments and suggestions here: > > "enum enumerates all terms in a field, calculating the set intersection > of documents that match the term with documents that match the query. > This was the default (and only) method for faceting multi-valued fields > prior to Solr 1.4. > "fc (stands for field cache), the facet counts are calculated by > iterating over documents that match the query and summing the terms > that appear in each document. This was the default method for single > valued fields prior to Solr 1.4. > The default value is fc (except for BoolField) since it tends to use > less memory and is faster when a field has many unique terms in the > index." > -- http://wiki.apache.org/solr/SimpleFacetParameters#facet.method > > "facet.method=enum [...] this is excellent for fields where there is > a small set of distinct values. The average number of values per > document does not matter. > facet.method=fc [...] this is excellent for situations where the > number of indexed values for the field is high, but the number of > values per document is low. For multi-valued fields, a hybrid approach > is used that uses term filters from the filterCache for terms that > match many documents." > -- http://wiki.apache.org/solr/SolrFacetingOverview > > "If you are faceting on a field that you know only has a small number > of values (say less than 50), then it is advisable to explicitly set > this to enum. When faceting on multiple fields, remember to set this > for the specific fields desired and not universally for all facets. > The request handler configuration is a good place to put this." > -- Book: "Solr 1.4 Enterprise Search Server", pag. 148 > > This is the part of the Solr code which deals with the facet.method > parameter: > > if (enumMethod) { >counts = getFacetTermEnumCounts([...]); > } else { >if (multiToken) { > UnInvertedField uif = [...] > counts = uif.getCounts([...]); >} else { > [...] > if (per_segment) { >[...] 
>counts = ps.getFacetCounts([...]); > } else { >counts = getFieldCacheCounts([...]); > } >} > } > -- > https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/src/java/org/apache/solr/request/SimpleFacets.java > > See also: > > - > http://stackoverflow.com/questions/2902680/how-well-does-solr-scale-over-large-number-of-facet-values > > In the end, since I do not know in advance the number of different > values for my fields I went for facet.method=fc; does this seem > reasonable to you? > > Thank you, > Paolo >
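Following the book's advice quoted above, the setting can be applied per field rather than globally. A small SolrJ sketch, assuming hypothetical fields category (few distinct values) and author (many distinct values):

import org.apache.solr.client.solrj.SolrQuery;

public class FacetMethodExample {
    public static SolrQuery build(String q) {
        SolrQuery query = new SolrQuery(q);
        query.setFacet(true);
        query.addFacetField("category");
        query.addFacetField("author");
        // Per-field override: enum only for the low-cardinality field;
        // author keeps the default fc.
        query.set("f.category.facet.method", "enum");
        return query;
    }
}

The same per-field override can also be fixed in the request handler defaults in solrconfig.xml, as the book suggests.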
Re: Problem with Indexing
On Mon, Oct 11, 2010 at 1:27 PM, Jörg Agatz wrote: > Ok, I have tried it, and now I get this error: > > POSTing file e067f59c-d046-11df-b552-000c29e17baa_SEARCH.xml > SimplePostTool: FATAL: Solr returned an error: > this writer hit an OutOfMemoryError; cannot flush - java.lang.IllegalStateException [...] Not sure in this particular case, but this looks like Solr is running out of memory. How much RAM do you have allocated in the Java container that Solr is running in? Regards, Gora
Re: Search within a subset of documents
On Mon, Oct 11, 2010 at 8:20 PM, Sergey Bartunov wrote: > Will it be effective enough if the subset is really large? [...] If the subset of IDs is large, and disjoint (so that you cannot use ranges), the query might look ugly, but generating it should not be much of a problem if you are using some automated method to create the query. If you mean whether it will be efficient enough, the only way is to try it out and measure performance. Offhand, I do not think that it should increase the query response time by a lot. Regards, Gora
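A sketch of the "automated method" mentioned above, assuming a field named id and IDs that need no escaping; note that a very large subset can hit the query parser's maxBooleanClauses limit (1024 by default in solrconfig.xml), which would then need raising:

import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;

public class SubsetQueryBuilder {
    public static SolrQuery build(String userQuery, List<String> ids) {
        StringBuilder fq = new StringBuilder("id:(");
        for (int i = 0; i < ids.size(); i++) {
            if (i > 0) fq.append(" OR ");
            fq.append(ids.get(i));
        }
        fq.append(')');
        SolrQuery query = new SolrQuery(userQuery);
        // The restriction goes into a filter query so it is cached in the
        // filterCache independently of the main query.
        query.addFilterQuery(fq.toString());
        return query;
    }
}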
Re: How to manage different indexes for different users
Then you probably read up on how to create [1] the new core. Keep in mind, you might need to do some additional local scripting to create a new instance dir. Do the users share the same schema? If so, you'd be better off keeping a single index and preventing the users from querying others. [1]: http://wiki.apache.org/solr/CoreAdmin#CREATE On Mon, 11 Oct 2010 22:40:03 +0530, Tharindu Mathew wrote: Thanks Li. I checked out the multi-core documentation. How do I dynamically create new cores as new users are added? Is that possible? On Mon, Oct 11, 2010 at 2:31 PM, Li Li wrote: Will one user search another user's index? If not, you can use multi cores. 2010/10/11 Tharindu Mathew : > Hi everyone, > > I'm using Solr to integrate search into my web app. > > I have a bunch of users who would have to be given their own individual > indexes. > > I'm wondering whether I'd have to append their user ID as I index a file. > I'm not sure which approach to follow. Is there a sample or a doc I can read > to understand how to approach this problem? > > Thanks in advance. > > -- > Regards, > > Tharindu > -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536600 / 06-50258350
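On the dynamic-creation question: the CoreAdmin CREATE action in [1] can also be driven from SolrJ. A hedged sketch, assuming the instance directory (with its conf/) has already been laid out by the "additional local scripting" mentioned above:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class CoreCreator {
    public static void createUserCore(String coreName, String instanceDir) throws Exception {
        // Points at the Solr root (where the CoreAdmin handler lives),
        // not at an individual core.
        SolrServer admin = new CommonsHttpSolrServer("http://localhost:8983/solr");
        // Equivalent to /solr/admin/cores?action=CREATE&name=...&instanceDir=...
        CoreAdminRequest.createCore(coreName, instanceDir, admin);
    }
}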
Re: How to manage different indexes for different users
Thanks Li. I checked out the multi-core documentation. How do I dynamically create new cores as new users are added? Is that possible? On Mon, Oct 11, 2010 at 2:31 PM, Li Li wrote: > Will one user search another user's index? > If not, you can use multi cores. > > 2010/10/11 Tharindu Mathew : > > Hi everyone, > > > > I'm using Solr to integrate search into my web app. > > > > I have a bunch of users who would have to be given their own individual > > indexes. > > > > I'm wondering whether I'd have to append their user ID as I index a file. > > I'm not sure which approach to follow. Is there a sample or a doc I can > read > > to understand how to approach this problem? > > > > Thanks in advance. > > > > -- > > Regards, > > > > Tharindu > > > -- Regards, Tharindu
Disable (or prohibit) per-field overrides
Hi, Does anyone know a useful method to disable or prohibit the per-field override feature for the search components? If not, where should I start to make it configurable via solrconfig and attempt to come up with a working patch? Cheers, -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536600 / 06-50258350
Re: deleteByQuery issue
On 10/11/2010 04:06 PM, Ahmet Arslan wrote: --- On Mon, 10/11/10, Claudio Atzori wrote: From: Claudio Atzori Subject: deleteByQuery issue To: solr-user@lucene.apache.org Date: Monday, October 11, 2010, 10:38 AM Hi everybody, in my application I use an instance of EmbeddedSolrServer (Solr 1.4.1); the following snippet shows how I am instantiating it: File home = new File(indexDataPath(solrDataDir, indexName)); container = new CoreContainer(indexDataPath(solrDataDir, indexName)); container.load(indexDataPath(solrDataDir, indexName), new File(home, "solr.xml")); return new EmbeddedSolrServer(container, indexName); and I'm going through some issues using the deleteByQuery method. In fact, when I try to delete a subset of documents, or even all the documents from the index, I see that they are correctly marked for deletion in the Luke index inspector (http://code.google.com/p/luke/), but after a commit I can still retrieve them, just as if they hadn't been removed... I can see the difference and see the documents disappear only when I restart my jetty application, but obviously this cannot be a feature... any idea? I think you are accessing the same Solr index using both the embedded server and http. The changes that you made using the embedded server won't be reflected over http until a commit is issued from http. I mean, if you hit this url: http://localhost:8983/solr/update?commit=true the deleted documents won't be retrieved anymore. P.S. If you want to expunge deleted docs completely you can either optimize or commit with expungeDeletes="true". Thanks for your reply. Alright, let me explain my scenario better. I'm not exposing any http interface of the index. I handle the whole index 'life cycle' via Java code with the EmbeddedSolrServer instance, so I'm handling commits, optimizations, feeding, and index creation all through that instance; moreover, my client application calls embeddedSolrServerInstance.commit() after deleteByQuery, but the documents are still there.
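For reference, a minimal sketch of the sequence that should work against a single EmbeddedSolrServer; if this still returns the deleted documents, it would point to something environmental (e.g. two CoreContainers open on the same data directory) rather than to deleteByQuery itself:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DeleteByQueryCheck {
    public static long deleteAndCount(EmbeddedSolrServer server, String deleteQuery)
            throws Exception {
        server.deleteByQuery(deleteQuery);
        server.commit(); // must be issued on the same instance that deleted
        QueryResponse rsp = server.query(new SolrQuery(deleteQuery));
        return rsp.getResults().getNumFound(); // expected: 0
    }
}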
facet.method: enum vs. fc
Hi, I am using Solr v1.4 and I am not sure which facet.method I should use. What should I use if I do not know in advance if the number of values for a given field will be high or low? What are the pros/cons of using facet.method=enum vs. facet.method=fc? When should I use enum vs. fc? I have found some comments and suggestions here: "enum enumerates all terms in a field, calculating the set intersection of documents that match the term with documents that match the query. This was the default (and only) method for faceting multi-valued fields prior to Solr 1.4. "fc (stands for field cache), the facet counts are calculated by iterating over documents that match the query and summing the terms that appear in each document. This was the default method for single valued fields prior to Solr 1.4. The default value is fc (except for BoolField) since it tends to use less memory and is faster when a field has many unique terms in the index." -- http://wiki.apache.org/solr/SimpleFacetParameters#facet.method "facet.method=enum [...] this is excellent for fields where there is a small set of distinct values. The average number of values per document does not matter. facet.method=fc [...] this is excellent for situations where the number of indexed values for the field is high, but the number of values per document is low. For multi-valued fields, a hybrid approach is used that uses term filters from the filterCache for terms that match many documents." -- http://wiki.apache.org/solr/SolrFacetingOverview "If you are faceting on a field that you know only has a small number of values (say less than 50), then it is advisable to explicitly set this to enum. When faceting on multiple fields, remember to set this for the specific fields desired and not universally for all facets. The request handler configuration is a good place to put this." -- Book: "Solr 1.4 Enterprise Search Server", p. 148 This is the part of the Solr code which deals with the facet.method parameter: if (enumMethod) { counts = getFacetTermEnumCounts([...]); } else { if (multiToken) { UnInvertedField uif = [...] counts = uif.getCounts([...]); } else { [...] if (per_segment) { [...] counts = ps.getFacetCounts([...]); } else { counts = getFieldCacheCounts([...]); } } } -- https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/src/java/org/apache/solr/request/SimpleFacets.java See also: - http://stackoverflow.com/questions/2902680/how-well-does-solr-scale-over-large-number-of-facet-values In the end, since I do not know in advance the number of different values for my fields I went for facet.method=fc; does this seem reasonable to you? Thank you, Paolo
Re: Search within a subset of documents
Will it be effective enough if the subset is really large? On 11 October 2010 18:39, Gora Mohanty wrote: > On Mon, Oct 11, 2010 at 7:00 PM, Sergey Bartunov wrote: >> Is it possible to use Solr for searching within a subset of documents >> represented by an enumeration of document IDs? > > Couldn't you add the document ID to the query, e.g., if the field is > called id, you can use ?q=id:<value>, e.g., ?q=id:1234? You could > use a range, etc., to include all the desired IDs. > > Regards, > Gora >
Re: Search within a subset of documents
On Mon, Oct 11, 2010 at 7:00 PM, Sergey Bartunov wrote: > Is it possible to use Solr for searching within a subset of documents > represented by an enumeration of document IDs? Couldn't you add the document ID to the query, e.g., if the field is called id, you can use ?q=id:<value>, e.g., ?q=id:1234? You could use a range, etc., to include all the desired IDs. Regards, Gora
Re: KStemmer for Solr
> Because I'm using Solr from trunk and not from Lucid > Imagination, I was missing KStemmer. So I decided to add this stemmer to > my installation. > > After some modifications KStemmer is now working fine as > stand-alone. > Now I have a KStemmerFilter. > Next will be to write the KStemmerFilterFactory. > > I would place the Factory in > "lucene-solr/solr/src/java/org/apache/solr/analysis/" > with the other Factories, but where should I place the Filter? > > Does it make sense to place the Filter somewhere under > "lucene-solr/modules/analysis/common/src/java/org/apache/lucene/analysis/"? > But this is for Lucene and not Solr... > > Or should I place the Filter in a subdirectory of the > Factories? For this kind of modification you don't need to modify the standard distro. You can jar these new classes and put this jar into the solrhome/lib directory. For more info: http://wiki.apache.org/solr/SolrPlugins
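For the Factory part, a sketch of the usual shape of such a factory, assuming KStemFilter is the ported filter described above; compiled into a jar and dropped into solrhome/lib, it can then be referenced from schema.xml without touching the distro:

package com.example.analysis; // any package works; it need not live in the Solr source tree

import org.apache.lucene.analysis.TokenStream;
import org.apache.solr.analysis.BaseTokenFilterFactory;

public class KStemFilterFactory extends BaseTokenFilterFactory {
    public TokenStream create(TokenStream input) {
        // KStemFilter is the stand-alone filter ported above (assumed to be
        // on the classpath, e.g. in the same jar).
        return new KStemFilter(input);
    }
}

In schema.xml the field type would then reference the factory by its fully qualified class name.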
Re: Index time boosting is not working with boosting value in document level
> Eric, > The score is not coming out properly even after > giving a boost value at document > and field level. > Please find the solrconfig.xml, > schema.xml, data-config.xml, the feed and > the score & query. > Doc with id 'ABCDEF/L' is boosted and doc > with id 'MA147LL/A' is not > boosted, but both are returning the same score - 0.1942141. > Could you please help me to find where I > made a mistake? It seems that you are using DIH to index feed.xml. You can directly post feed.xml to Solr; then your boosts will be taken into account. There is a script named post.sh for this purpose. As Erik said, you can always verify boosts with &debugQuery=on.
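As a cross-check independent of DIH and post.sh, the boosts can also be set through SolrJ; a sketch assuming the ids and fields from the feed. One thing worth verifying in schema.xml: index-time boosts are folded into the field norms, so they are silently discarded for any field with omitNorms="true".

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BoostedIndexer {
    public static void addBoosted(SolrServer server) throws Exception {
        SolrInputDocument doc = new SolrInputDocument();
        doc.setDocumentBoost(2.0f); // document-level boost
        doc.addField("id", "ABCDEF/L");
        doc.addField("name", "Apple 60 GB iPod with Video Playback Black", 3.0f); // field-level boost
        server.add(doc);
        server.commit();
    }
}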
KStemmer for Solr
Because I'm using Solr from trunk and not from Lucid Imagination, I was missing KStemmer. So I decided to add this stemmer to my installation. After some modifications KStemmer is now working fine as stand-alone. Now I have a KStemmerFilter. Next will be to write the KStemmerFilterFactory. I would place the Factory in "lucene-solr/solr/src/java/org/apache/solr/analysis/" with the other Factories, but where should I place the Filter? Does it make sense to place the Filter somewhere under "lucene-solr/modules/analysis/common/src/java/org/apache/lucene/analysis/"? But this is for Lucene and not Solr... Or should I place the Filter in a subdirectory of the Factories? Any suggestions for me? Regards, Bernd
Re: deleteByQuery issue
--- On Mon, 10/11/10, Claudio Atzori wrote: > From: Claudio Atzori > Subject: deleteByQuery issue > To: solr-user@lucene.apache.org > Date: Monday, October 11, 2010, 10:38 AM > Hi everybody, > in my application I use an instance of EmbeddedSolrServer > (Solr 1.4.1); the following snippet shows how I am > instantiating it: > > File home = new File(indexDataPath(solrDataDir, indexName)); > > container = new CoreContainer(indexDataPath(solrDataDir, indexName)); > container.load(indexDataPath(solrDataDir, indexName), new File(home, "solr.xml")); > > return new EmbeddedSolrServer(container, indexName); > > and I'm going through some issues using the deleteByQuery > method. In fact, when I try to delete a subset of documents, > or even all the documents from the index, I see that they are > correctly marked for deletion in the Luke index inspector > (http://code.google.com/p/luke/), but after a commit I > can still retrieve them, just as if they hadn't been > removed... > > I can see the difference and see the documents disappear > only when I restart my jetty application, but obviously this > cannot be a feature... any idea? I think you are accessing the same Solr index using both the embedded server and http. The changes that you made using the embedded server won't be reflected over http until a commit is issued from http. I mean, if you hit this url: http://localhost:8983/solr/update?commit=true the deleted documents won't be retrieved anymore. P.S. If you want to expunge deleted docs completely you can either optimize or commit with expungeDeletes="true".
Re: How to get Term Frequency
> I have a question: how could somebody get term > frequency, as we do in Lucene with the > method DocFreq(new Term("Field", > "value")), using Solr/SolrNet? You can get term frequency with http://wiki.apache.org/solr/TermVectorComponent. If you are interested in document frequency, you can use http://wiki.apache.org/solr/TermsComponent
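A sketch of querying the TermsComponent from SolrJ for document frequencies, assuming the example solrconfig.xml, which registers a /terms request handler (the same request over plain HTTP would be http://localhost:8983/solr/terms?terms.fl=body&terms.prefix=value):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.util.NamedList;

public class DocFreqLookup {
    @SuppressWarnings("unchecked")
    public static NamedList<Object> termCounts(SolrServer server, String field, String prefix)
            throws Exception {
        SolrQuery q = new SolrQuery();
        q.setQueryType("/terms"); // dispatch to the /terms handler
        q.set("terms", true);
        q.set("terms.fl", field);
        q.set("terms.prefix", prefix);
        QueryResponse rsp = server.query(q);
        // The raw response holds terms -> field -> (term, docFreq) pairs.
        NamedList<Object> terms = (NamedList<Object>) rsp.getResponse().get("terms");
        return (NamedList<Object>) terms.get(field);
    }
}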
Search within a subset of documents
Is it possible to use Solr for searching within a subset of documents represented by an enumeration of document IDs?
Re: Index time boosting is not working with boosting value in document level
Eric, The score is not coming out properly even after giving a boost value at document and field level. Please find the solrconfig.xml, schema.xml, data-config.xml, the feed, and the score & query. Doc with id 'ABCDEF/L' is boosted and doc with id 'MA147LL/A' is not boosted, but both are returning the same score - 0.1942141. Could you please help me to find where I made a mistake?

[solrconfig.xml, schema.xml and data-config.xml attachments not preserved in this archive. The feed contained four docs: F8V7067-APL-KIT "Belkin Mobile Power Cord for iPod w/ Dock", IW-02 "iPod & iPod Mini USB 2.0 Cable", MA147LL/A "Apple 60 GB iPod with Video Playback Black", and the boosted ABCDEF/L "Apple 60 GB iPod with Video Playback Black".]

Query & Response: http://localhost:8080/solr/core0/select/?q=ipod&version=2.2&start=0&rows=10&indent=on&fl=score (status 0, QTime 15) returned: 0.27466023 IW-02 "iPod & iPod Mini USB 2.0 Cable"; 0.24276763 F8V7067-APL-KIT "Belkin Mobile Power Cord for iPod w/ Dock"; 0.1942141 MA147LL/A "Apple 60 GB iPod with Video Playback Black"; 0.1942141 ABCDEF/L "Apple 60 GB iPod with Video Playback Black". -- View this message in context: http://lucene.472066.n3.nabble.com/Index-time-boosting-is-not-working-with-boosting-value-in-document-level-tp1649072p1680215.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr start in server
I solved it with: nohup java -jar start.jar & Thanks. -- Yavuz Selim YILMAZ 2010/10/11 Gora Mohanty > On Mon, Oct 11, 2010 at 1:23 PM, Yavuz Selim YILMAZ > wrote: > > I use AIX 5.3. > > > > How can I handle this? > [...] > > Have not used AIX in ages, but this should work, assuming a sh-type of > shell: > java -jar start.jar > jetty_log.txt 2>&1 & > This will save the output from Jetty to jetty_log.txt. If you do not want > to > save the output (the file might get quite large depending on your usage), > you can use > java -jar start.jar > /dev/null 2>&1 & > > Regards, > Gora >
Re: Tuning Solr caches with high commit rates (NRT)
Hi, why do you need to change the lockType? Does a readonly instance need locks at all? thanks, Anders. On Tue, 14 Sep 2010 15:00:54 +0200, Peter Karich wrote: > Peter Sturge, > > this was a nice hint, thanks again! If you are here in Germany anytime I > can invite you to a beer or an Apfelschorle! :-) > I only needed to change the lockType to none in the solrconfig.xml, > disable the replication and set the data dir to the master data dir! > > Regards, > Peter Karich. > >> Hi Peter, >> >> this scenario would be really great for us - I didn't know that this is >> possible and works, so: thanks! >> At the moment we are doing something similar by replicating to the readonly >> instance, but >> the replication is somewhat lengthy and resource-intensive at this >> data volume ;-) >> >> Regards, >> Peter. >> >> >>> 1. You can run multiple Solr instances in separate JVMs, with both >>> having their solr.xml configured to use the same index folder. >>> You need to be careful that one and only one of these instances will >>> ever update the index at a time. The best way to ensure this is to use >>> one for writing only, >>> and the other is read-only and never writes to the index. This >>> read-only instance is the one to use for tuning for high search >>> performance. Even though the RO instance doesn't write to the index, >>> it still needs periodic (albeit empty) commits to kick off >>> autowarming/cache refresh. >>> >>> Depending on your needs, you might not need to have 2 separate >>> instances. We need it because the 'write' instance is also doing a lot >>> of metadata pre-write operations in the same jvm as Solr, and so has >>> its own memory requirements. >>> >>> 2. We use sharding all the time, and it works just fine with this >>> scenario, as the RO instance is simply another shard in the pack. >>> >>> >>> On Sun, Sep 12, 2010 at 8:46 PM, Peter Karich wrote: >>> >>> Peter, thanks a lot for your in-depth explanations! Your findings will be definitely helpful for my next performance improvement tests :-) Two questions: 1. How would I do that: > or a local read-only instance that reads the same core as the indexing > instance (for the latter, you'll need something that periodically > refreshes - i.e. runs commit()). > > 2. Did you try sharding with your current setup (e.g. one big, nearly-static index and a tiny write+read index)? Regards, Peter. > Hi, > > Below are some notes regarding Solr cache tuning that should prove > useful for anyone who uses Solr with frequent commits (e.g. <5min). > > Environment: > Solr 1.4.1 or branch_3x trunk. > Note the 4.x trunk has lots of neat new features, so the notes here > are likely less relevant to the 4.x environment. > > Overview: > Our Solr environment makes extensive use of faceting, we perform > commits every 30secs, and the indexes tend to be on the large-ish side > (>20 million docs). > Note: For our data, when we commit, we are always adding new data, > never changing existing data. > This type of environment can be tricky to tune, as Solr is more geared > toward fast reads than frequent writes. > > Symptoms: > If anyone has used faceting in searches where you are also performing > frequent commits, you've likely encountered the dreaded OutOfMemory or > GC Overhead Exceeded errors. > In high commit rate environments, this is almost always due to > multiple 'onDeck' searchers and autowarming - i.e. new searchers don't > finish autowarming their caches before the next commit() > comes along and invalidates them. 
> Once this starts happening on a regular basis, it is likely your > Solr's JVM will run out of memory eventually, as the number of > searchers (and their cache arrays) will keep growing until the JVM > dies of thirst. > To check if your Solr environment is suffering from this, turn on INFO > level logging, and look for: 'PERFORMANCE WARNING: Overlapping > onDeckSearchers=x'. > > In tests, we've only ever seen this problem when using faceting, and > facet.method=fc. > > Some solutions to this are: > Reduce the commit rate to allow searchers to fully warm before the > next commit > Reduce or eliminate the autowarming in caches > Both of the above > > The trouble is, if you're doing NRT commits, you likely have a good > reason for it, and reducing/eliminating autowarming will very > significantly impact search performance in high commit rate > environments. > > Solution: > Here are some setup steps we've used that allow lots of faceting (we > typically search with at least 20-35 different facet fields, and date > faceting/sorting) on large indexes, and still keep decent search > performance: > > 1. Firstly, you [remainder of the message truncated in the archive]
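A minimal sketch of the periodic empty commit described in point 1 of the quoted setup, assuming the read-only instance listens on port 8984; nothing is flushed, but the commit reopens the searcher so segments written by the other instance become visible and autowarming runs:

import java.util.Timer;
import java.util.TimerTask;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class ReadOnlyRefresher {
    public static Timer start(final SolrServer readOnly, long periodMs) {
        Timer timer = new Timer("ro-refresher", true); // daemon thread
        timer.schedule(new TimerTask() {
            public void run() {
                try {
                    readOnly.commit(); // empty commit: triggers searcher refresh
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }, 0L, periodMs);
        return timer;
    }

    public static void main(String[] args) throws Exception {
        start(new CommonsHttpSolrServer("http://localhost:8984/solr"), 30 * 1000L);
        Thread.sleep(Long.MAX_VALUE); // keep the daemon timer alive in this demo
    }
}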
How to get Term Frequency
Hi all, I have a question: how could somebody get term frequency, as we do in Lucene with the method DocFreq(new Term("Field", "value")), using Solr/SolrNet?
Re: Multiple masters and replication between masters?
Thanks Otis. That was helpful. On Mon, Oct 11, 2010 at 9:19 AM, Otis Gospodnetic wrote: > Arun, > > Yes, changing the solrconfig.xml to point to the new master could require a > restart. > However, if you use logical addresses (VIPs in the Load Balancer or even local > hostname aliases if you don't have a LB) then you just need to point those > VIPs/aliases to new IPs and the Solr slave won't have to be restarted. > > > Otis > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch > Lucene ecosystem search :: http://search-lucene.com/ > > > > - Original Message >> From: Arunkumar Ayyavu >> To: solr-user@lucene.apache.org >> Sent: Sun, October 10, 2010 1:57:34 PM >> Subject: Re: Multiple masters and replication between masters? >> >> On Mon, Oct 4, 2010 at 4:58 PM, Upayavira wrote: >> > On Mon, 2010-10-04 at 00:25 +0530, Arunkumar Ayyavu wrote: >> >> I'm looking at setting up multiple masters for redundancy (for index >> >> updates). I found the thread in this link >> >> >>(http://www.lucidimagination.com/search/document/68ac303ce8425506/multiple_masters_solr_replication_1_4) >> >> >> discussed this subject more than a year back. Does Solr support such >> >> configuration today? >> > >> > Solr does not support master/master replication. When you commit >> > documents to SOLR, it adds a segment to the underlying Lucene index. >> > Replication then syncs that segment to your slaves. To do master/master >> > replication, you would have to pull changes from each master, then merge >> > those changed segments into a single updated index. This is more complex >> > than what is happening in the current Solr replication (which is not >> > much more than an rsync of the index files). >> > >> > Note, if you commit your changes to two masters, you cannot switch a >> > slave between them, as it is unlikely that the two masters will have >> > matching index files. If you did so, you would probably trigger a pull >> > of the entire index across the network, which (while it would likely >> > work) would not be the most efficient action. >> > >> > What you can do is think cleverly about how you organise your >> > master/slave setup. E.g. have a slave that doesn't get queried, but >> > exists to take over the role of the master in case it fails. The index >> > on a slave is the same as that in a master, and can immediately take on >> > the role of the master (receiving commits), and upon failure of your >> > master, you could point your other slaves at this new master, and things >> > should just carry on as before. >> Wouldn't this require restart of Solr instances? >> >> Sorry, I couldn't respond to you earlier as I wasn't checking my mails >> for sometime. >> >> > >> > Also, if you have a lot of slaves (such that they are placing too big a >> > load on your master), insert intermediate hosts that are both slaves off >> > the master, and masters to your query slaves. That way, you could have, >> > say, two boxes slaving off the master, then 20 or 30 slaving off them. >> > >> >> And does Solr support replication between masters? Otherwise, I'll >> >> have to post the updates to all masters to keep the indexes of masters >> >> in sync. Does SolrCloud address this case? (Please note it is too >> >> early for me to read about SolrCloud as I'm still learning Solr) >> > >> > I don't believe SolrCloud is aiming to support master/master >> > replication. >> > >> > HTH >> > >> > Upayavira >> > >> > >> > >> >> >> >> -- >> Arun >> > -- Arun
Re: How to manage different indexes for different users
Will one user search another user's index? If not, you can use multi cores. 2010/10/11 Tharindu Mathew : > Hi everyone, > > I'm using Solr to integrate search into my web app. > > I have a bunch of users who would have to be given their own individual > indexes. > > I'm wondering whether I'd have to append their user ID as I index a file. > I'm not sure which approach to follow. Is there a sample or a doc I can read > to understand how to approach this problem? > > Thanks in advance. > > -- > Regards, > > Tharindu >
question about SolrCore
Hi all, I want to know the details of how IndexReader is used in SolrCore. I have read a little of the SolrCore code. Here is my understanding; is it correct? Each SolrCore has many SolrIndexSearchers and keeps them in _searchers, and _searcher keeps track of the latest version of the index. Each SolrIndexSearcher has a SolrIndexReader. If there isn't any update, all these searchers share one single SolrIndexReader. If there is an update, then a newSearcher will be created, with a new SolrIndexReader associated with it. I did a simple test: a thread does a query and is blocked by a breakpoint. Then I feed some data to update the index. On commit, a newSearcher is created. Here is the debug info: SolrCore _searcher [SolrIndexSearcher@...ab] _searchers [SolrIndexSearcher@...77, SolrIndexSearcher@...ab, SolrIndexSearcher@...f8]. SolrIndexSearcher@...77's SolrIndexReader is the old one, and ...ab and ...f8 share the same newest SolrIndexReader. When the query finished, SolrIndexSearcher@...77 was discarded. When newSearcher succeeded in warming up, there was only one SolrIndexSearcher. The SolrIndexReader of the old version of the index is discarded, and only segments in the newest SolrIndexReader are referenced. Segments not in the new version can then be deleted because no file pointer references them. Then I started 3 queries. There is only one SolrIndexSearcher, but refCount=4. It seems many searches can share one single SolrIndexSearcher. So in which situation will there exist more than one SolrIndexSearcher sharing just one SolrIndexReader? Another question: for each version of the index, is there just one SolrIndexReader instance associated with it? Can it happen that more than one SolrIndexReader is open on the same version of the index?
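The reference counting described above is visible in the API: callers borrow the current searcher through a RefCounted handle, which is why several concurrent queries can share one SolrIndexSearcher, and why an old searcher is only closed once its count drops to zero. A usage sketch against the Solr 1.4 internals:

import org.apache.solr.core.SolrCore;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.util.RefCounted;

public class SearcherBorrower {
    public static int maxDocOf(SolrCore core) {
        RefCounted<SolrIndexSearcher> holder = core.getSearcher(); // increments the refcount
        try {
            SolrIndexSearcher searcher = holder.get();
            return searcher.maxDoc();
        } finally {
            holder.decref(); // lets the core close the searcher once unused
        }
    }
}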
How to manage different indexes for different users
Hi everyone, I'm using Solr to integrate search into my web app. I have a bunch of users who would have to be given their own individual indexes. I'm wondering whether I'd have to append their user ID as I index a file. I'm not sure which approach to follow. Is there a sample or a doc I can read to understand how to approach this problem? Thanks in advance. -- Regards, Tharindu
Re: Solr start in server
On Mon, Oct 11, 2010 at 1:23 PM, Yavuz Selim YILMAZ wrote: > I use AIX 5.3. > > How can I handle this? [...] Have not used AIX in ages, but this should work, assuming a sh-type of shell: java -jar start.jar > jetty_log.txt 2>&1 & This will save the output from Jetty to jetty_log.txt. If you do not want to save the output (the file might get quite large depending on your usage), you can use java -jar start.jar > /dev/null 2>&1 & Regards, Gora
Re: Problem with Indexing
Ok, I have tried it, and now I get this error:

POSTing file e067f59c-d046-11df-b552-000c29e17baa_SEARCH.xml
SimplePostTool: FATAL: Solr returned an error: this writer hit an OutOfMemoryError; cannot flush - java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot flush
at org.apache.lucene.index.IndexWriter.doFlushInternal(IndexWriter.java:4204)
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:4192)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:4183)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2647)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2601)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:241)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at ... [trace truncated]

I don't know how I can index a lot of XML (fast).
Re: Solr start in server
I use AIX 5.3. How can I handle this? -- Yavuz Selim YILMAZ 2010/10/11 Gora Mohanty > On Mon, Oct 11, 2010 at 1:09 PM, Yavuz Selim YILMAZ > wrote: > > I have a Solr installation on a server. I start it with the help of PuTTY > > (with start.jar). But when I close the PuTTY instance, the Solr instance > > also closes automatically. How can I solve this problem? I mean, how can I > > close the connection with the server but have the Solr instance still run? > [...] > > What operating system is the server running? You will have to put the job > in the background. For some operating systems/shells, you also have to > configure things so that background jobs are not killed on logging out. > > Regards, > Gora >
Re: Solr start in server
On Mon, Oct 11, 2010 at 1:09 PM, Yavuz Selim YILMAZ wrote: > I have a Solr installation on a server. I start it with the help of PuTTY > (with start.jar). But when I close the PuTTY instance, the Solr > instance also closes automatically. How can I solve this problem? I mean, how can I > close the connection with the server but have the Solr instance still run? [...] What operating system is the server running? You will have to put the job in the background. For some operating systems/shells, you also have to configure things so that background jobs are not killed on logging out. Regards, Gora
Solr start in server
I have a Solr installation on a server. I start it with the help of PuTTY (with start.jar). But when I close the PuTTY instance, the Solr instance also closes automatically. How can I solve this problem? I mean, how can I close the connection with the server but have the Solr instance still run? -- Yavuz Selim YILMAZ
deleteByQuery issue
Hi everybody, in my application I use an instance of EmbeddedSolrServer (Solr 1.4.1); the following snippet shows how I am instantiating it: File home = new File(indexDataPath(solrDataDir, indexName)); container = new CoreContainer(indexDataPath(solrDataDir, indexName)); container.load(indexDataPath(solrDataDir, indexName), new File(home, "solr.xml")); return new EmbeddedSolrServer(container, indexName); and I'm going through some issues using the deleteByQuery method. In fact, when I try to delete a subset of documents, or even all the documents from the index, I see that they are correctly marked for deletion in the Luke index inspector (http://code.google.com/p/luke/), but after a commit I can still retrieve them, just as if they hadn't been removed... I can see the difference and see the documents disappear only when I restart my jetty application, but obviously this cannot be a feature... any idea?
Re: Solr PHP PECL Extension going to Stable Release - Wishing for Any New Features?
On 11.10.2010, at 07:03, Israel Ekpo wrote: > I am currently working on a couple of bug fixes for the Solr PECL extension > that will be available in the next release, 0.9.12, sometime this month. > > http://pecl.php.net/package/solr > > Documentation of the current API and features for the PECL extension is > available here: > > http://www.php.net/solr > > A couple of users in the community were asking when the PHP extension will > be moving from beta to stable. > > The API looks stable so far with no serious issues, and I am looking to > move it from *beta* to *stable* on November 20, 2010. > > If you are using Solr via PHP and would like to see any new features in the > extension, please feel free to send me a note. > > I would like to incorporate those changes in 0.9.12 so that users can try > them out and send me some feedback before the release of version 1.0. > > Thanks in advance for your response. We already exchanged some emails about this. IMHO there are so many methods for specialized tasks that it's easy to get lost in the API, especially since not all of them have written documentation yet beyond the method signatures. I also think that there should be methods for escaping and also tokenizing Lucene queries to enable "validation" of the syntax used, etc. See here for a use case and a userland implementation: http://pooteeweet.org/blog/1796 Regards, Lukas Kahwe Smith m...@pooteeweet.org
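For readers wondering what such an escaping method involves, a hypothetical sketch in Java of backslash-escaping Lucene's query-syntax metacharacters (the character list follows the Lucene query syntax documentation; SolrJ ships a comparable helper in ClientUtils.escapeQueryChars):

public class QueryEscaper {
    // Hypothetical sketch: backslash-escape Lucene query metacharacters
    // (+ - && || ! ( ) { } [ ] ^ " ~ * ? : \) and whitespace.
    public static String escapeQueryChars(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 2);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' || c == '+' || c == '-' || c == '!' || c == '(' || c == ')'
                    || c == ':' || c == '^' || c == '[' || c == ']' || c == '"'
                    || c == '{' || c == '}' || c == '~' || c == '*' || c == '?'
                    || c == '|' || c == '&' || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }
}

Tokenizing and validating a full query is a harder problem, which is presumably why the request is for the extension to expose it rather than leave it to userland.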