Creating document schema at runtime
Hi, I'm looking for some tips on how to create a new document schema and add it to a Solr core at runtime. The use case I'm trying to solve is: 1. Using a custom configuration tool, a user creates a Solr schema. 2. The schema is added (uploaded) to a Solr instance (on a remote machine). 3. Documents corresponding to the newly added schema are added to Solr. I understand that with SOLR-215, I can create a new core by specifying the config and schema, but there is still no way for me to do this from a remote machine using HTTP calls. If this capability does not exist, I would be happy to open an issue in JIRA and contribute patches. Looking for your inputs. -- Regards, Shalin Shekhar Mangar.
Creating user-defined field types
Hi, Can anyone guide me as to how one can implement user-defined field types in Solr? I could not find anything on the Solr wiki. Help of any kind would be appreciated. Regards, Rishabh
Facets - What's a better term for non technical people?
Whilst many of the people on this list (myself included) have a pretty good grasp of what is meant by the term facet, it is not clear to people who approach the system from a fresher point of view. So, has anyone got a good example of the language they might use over, say, a set of radio buttons and fields on a web form, to indicate that selecting one or more of these would return facets? 'Show grouping by' or 'List the sets that the results fall into' or something similar. Ben
Re: Facets - What's a better term for non technical people?
On 11/12/2007, at 8:32 PM, Benjamin O'Steen wrote: So, has anyone got a good example of the language they might use over, say, a set of radio buttons and fields on a web form, to indicate that selecting one or more of these would return facets. 'Show grouping by' or 'List the sets that the results fall into' or something similar. "Filter by" is what I'd use which is unfortunately already used in Solr, though very much related since the facet is generally added as a filter query. Not close enough to use the same term though. Other things that are close but not really right would be groups or categories. Maybe "Limit to" so facets would be limiters. I think facet is the right term and what you need is to add "see also" type entries under a bunch of these other terms. Regards, Adrian Sutton http://www.symphonious.net
RE: Facets - What's a better term for non technical people?
Hi, > So, has anyone got a good example of the language they might use over, > say, a set of radio buttons and fields on a web form, to indicate that > selecting one or more of these would return facets. 'Show grouping by' > or 'List the sets that the results fall into' or something similar. Here's what I found some time ago: http://www.searchtools.com/info/faceted-metadata.html It has been quite useful to me. André Davignon
Re: Replication hooks
That's what I was after. As always, thanks for the quick response. Tracy On Dec 11, 2007, at 12:18 AM, Yonik Seeley wrote: On Dec 10, 2007 11:22 PM, climbingrose <[EMAIL PROTECTED]> wrote: I think there is a event listener interface for hooking into Solr events such as post commit, post optimise and open new searcher. I can't remember on top of my head but if you do a search for *EventListener in Eclipse, you'll find it. The Wiki shows how to trigger snapshooter after each commit and optimise. You should be able to follow this example to create your own listener. Right... you shouldn't need to implement your own listeners though. Search for postCommit in the example solrconfig.xml -Yonik
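For reference, the postCommit hook Yonik points to looks roughly like this in the example solrconfig.xml that ships with Solr (the snapshooter executable and the dir value are the example's own; adjust both to your installation):

```xml
<!-- Fires after every commit; runs an external program (here, the
     snapshooter script used for replication snapshots). -->
<listener event="postCommit" class="solr.RunExecutableListener">
  <str name="exe">snapshooter</str>
  <str name="dir">solr/bin</str>
  <bool name="wait">true</bool>
</listener>
```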
How to effectively search inside fields that should be indexed without changing them.
Hello all, The titles of our docs have the form "ABC0001231-This is an important doc.pdf". I would like to be able to search for 'important', or '1231', or 'ABC000*', or 'This is an important doc' in the title field. I looked at the NGramTokenizer and tried to use it. Against the index it doesn't seem to work; I cannot get any hits. The analysis tool on the admin pages shows me that the ngram tokenizing works, by highlighting the matches between the indexed value and a query. I have set the min and max ngram size to 2 and 6, with side equal to left. Can anyone recommend a procedure that will allow me to search as stated above? I would also like to find out more about how to use the NGramTokenizer, but have found little in the form of documentation. Anyone know of any good sources? Thanks, Brian
Two Solr Webapps, one folder for the index data?
I have successfully configured two parallel Solr webapps, however I see that all data gets stored in one folder of my Tomcat installation, namely C:\Tomcat\solr\data\index. How can I configure each Solr webapp to store its data in the folder I assigned to it, where the Solr schema etc. already resides (so that it gets stored at individualSolrFolder/data/index)? Thanks
Re: Two Solr Webapps, one folder for the index data?
I actually have a patch for solr config parser which allows you to use context environment variables in the solrconfig.xml I generally use it for development when I'm working with multiple instances and different data dirs. I'll add it to jira today if you want it. P Jörg Kiegeland wrote: I have successfully configured two parallel Solr webapps , however I see that all data gets stored in one folder of my Tomcat installation, namely C:\Tomcat\solr\data\index. How can I configure that each Solr webapp shall store the data in the folders I assigned at , where already the Solr scheme etc. resides (so that it get stored at individualSolrFolder/data/index)? Thanks -- Patrick O'Leary You see, wire telegraph is a kind of a very, very long cat. You pull his tail in New York and his head is meowing in Los Angeles. Do you understand this? And radio operates exactly the same way: you send signals here, they receive them there. The only difference is that there is no cat. - Albert Einstein View Patrick O Leary's profile
Re: Two Solr Webapps, one folder for the index data?
I actually have a patch for solr config parser which allows you to use context environment variables in the solrconfig.xml I generally use it for development when I'm working with multiple instances and different data dirs. I'll add it to jira today if you want it. That would be nice. However, I cannot believe that one cannot configure this via some configuration file by now. What if only one index needs to be backed up and the other does not, because it carries only redundant information from some other data source, as in my case? If all data is put in one folder, you can only back up both indexes together.
Re: Facets - What's a better term for non technical people?
FAST calls them "navigators" (which I think is a terrible term - YMMV of course :-)) I tend to think that "filters" -- or perhaps "dynamic filters" -- captures the essential function. On Dec 11, 2007 2:38 AM, "DAVIGNON Andre - CETE NP/DIODé/PANDOC" <[EMAIL PROTECTED]> wrote: > Hi, > > > So, has anyone got a good example of the language they might use over, > > say, a set of radio buttons and fields on a web form, to indicate that > > selecting one or more of these would return facets. 'Show grouping by' > > or 'List the sets that the results fall into' or something similar. > > Here's what i found some time : > http://www.searchtools.com/info/faceted-metadata.html > > It has been quite useful to me. > > André Davignon > >
Re: Creating user-defined field types
On Dec 11, 2007 5:17 AM, Rishabh Joshi <[EMAIL PROTECTED]> wrote: > Can anyone guide me as to how one can go on to implement a user defined > field types in solr? At a higher level, what are you trying to accomplish? If you just want to customize analysis, just copy and modify an existing fieldType definition in the schema.xml file. -Yonik
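To illustrate Yonik's suggestion, customizing analysis usually just means adding a new fieldType entry in schema.xml modeled on an existing one; the name and filter chain below are only illustrative:

```xml
<!-- A copy-and-modify of the stock text fieldType; rename and swap
     tokenizer/filters as needed, then reference it from a <field>. -->
<fieldType name="text_custom" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```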
Re: Two Solr Webapps, one folder for the index data?
I use jvm system properties for this; they seem to work well. -Mike On 11-Dec-07, at 7:39 AM, patrick o'leary wrote: I actually have a patch for solr config parser which allows you to use context environment variables in the solrconfig.xml I generally use it for development when I'm working with multiple instances and different data dirs. I'll add it to jira today if you want it. P Jörg Kiegeland wrote: I have successfully configured two parallel Solr webapps , however I see that all data gets stored in one folder of my Tomcat installation, namely C:\Tomcat\solr\data\index. How can I configure that each Solr webapp shall store the data in the folders I assigned at value="individualSolrFolder">, where already the Solr scheme etc. resides (so that it get stored at individualSolrFolder/data/index)? Thanks -- Patrick O'Leary You see, wire telegraph is a kind of a very, very long cat. You pull his tail in New York and his head is meowing in Los Angeles. Do you understand this? And radio operates exactly the same way: you send signals here, they receive them there. The only difference is that there is no cat. - Albert Einstein View Patrick O Leary's profile
Re: Two Solr Webapps, one folder for the index data?
JVM properties restrict you to a single implementation within a jvm. For instance if you want multiple instances of solr running with the same schema, with different data dir's in the one app server. You'll have to have several copies of solrconfig and schema.xml. By using context environment, I can have multiple contexts like pjaol:~/tmp/locallucene/solr/tomcat-conf pjaol$ more solr.xml pjaol:~/tmp/locallucene/solr/tomcat-conf pjaol$ more solr1.xml Changing just the solr/data/dir for each instance. And in my solrconfig.xml ${env/solr/data/dir:./solr/data} It certainly makes development & operations easier. P Mike Klaas wrote: I use jvm system properties for this; they seem to work well. -Mike On 11-Dec-07, at 7:39 AM, patrick o'leary wrote: I actually have a patch for solr config parser which allows you to use context environment variables in the solrconfig.xml I generally use it for development when I'm working with multiple instances and different data dirs. I'll add it to jira today if you want it. P Jörg Kiegeland wrote: I have successfully configured two parallel Solr webapps , however I see that all data gets stored in one folder of my Tomcat installation, namely C:\Tomcat\solr\data\index. How can I configure that each Solr webapp shall store the data in the folders I assigned at , where already the Solr scheme etc. resides (so that it get stored at individualSolrFolder/data/index)? Thanks -- Patrick O'Leary You see, wire telegraph is a kind of a very, very long cat. You pull his tail in New York and his head is meowing in Los Angeles. Do you understand this? And radio operates exactly the same way: you send signals here, they receive them there. The only difference is that there is no cat. - Albert Einstein View Patrick O Leary's profile -- Patrick O'Leary You see, wire telegraph is a kind of a very, very long cat. You pull his tail in New York and his head is meowing in Los Angeles. Do you understand this? 
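(The contents of the two Tomcat context files shown above were stripped by the mail archiver. A context fragment of the kind Patrick describes would look something like the following; every path and value here is a placeholder, not his actual config:)

```xml
<!-- e.g. tomcat-conf/solr1.xml: one context per Solr instance,
     each pointing at its own data dir via a JNDI Environment entry -->
<Context docBase="/opt/solr/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/opt/solr/solr1" override="true"/>
  <Environment name="solr/data/dir" type="java.lang.String"
               value="/opt/solr/solr1/data" override="true"/>
</Context>
```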
And radio operates exactly the same way: you send signals here, they receive them there. The only difference is that there is no cat. - Albert Einstein View Patrick O Leary's profile
Re: Two Solr Webapps, one folder for the index data?
: However I cannot believe that one cannot configure this by someconfiguration : file by now - what if only one index needs to be backuped and the other index is the option you are looking for in solrconfig.xml? -Hoss
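(The name of the option Hoss mentions was stripped by the archiver; presumably it is the per-index data directory setting in solrconfig.xml, which looks like this -- the path is illustrative:)

```xml
<!-- solrconfig.xml: where this instance keeps its index data -->
<dataDir>/var/solr/index2/data</dataDir>
```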
Pattern that generates two tokens per match
Hi all, I've got a pattern in a document (call it "xy") that I want to turn into two tokens - "xy" and "y". One approach I could use is PatternTokenizer to extract "xy", and then a custom filter that returns "xy" and then "y" on the next call (caches the next result). Or I could extend PatternTokenizer to return multiple tokens per match, though figuring out how to specify that in the schema seems harder. Is there another approach that wouldn't require any custom code? Thanks, -- Ken -- Ken Krugler Krugle, Inc. +1 530-210-6378 "If you can't find it, you can't fix it"
Re: Two Solr Webapps, one folder for the index data?
: I actually have a patch for solr config parser which allows you to use : context environment variables in the solrconfig.xml : I generally use it for development when I'm working with multiple : instances and different data dirs. I'll add it to jira today if you : want it. yes please! ... Solr already has system property variable replacement in solrconfig.xml, and we discussed a while back (on solr-dev i think) adding code to automatically create system properties on startup for any solr/* JNDI variables set so the same variable subst code could be reused ... but i don't think anyone ever opened an issue or created a patch for it. -Hoss
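The system property substitution Hoss mentions looks like this in solrconfig.xml, with the property supplied at JVM startup (the property name and paths below are illustrative):

```xml
<!-- Falls back to ./solr/data when the property is not set -->
<dataDir>${solr.data.dir:./solr/data}</dataDir>
<!-- then start each instance with e.g.:
     java -Dsolr.data.dir=/var/solr1/data -jar start.jar -->
```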
Re: Pattern that generates two tokens per match
On 11-Dec-07, at 11:51 AM, Ken Krugler wrote: Hi all, I've got a pattern in a document (call it "xy") that I want to turn into two tokens - "xy" and "y". One approach I could use is PatternTokenizer to extract "xy", and then a custom filter that returns "xy" and then "y" on the next call (caches the next result). Or I could extend PatternTokenizer to return multiple tokens per match, though figuring out how to specify that in the schema seems harder. Is there another approach that wouldn't require any custom code? Not that I can think of. Perhaps the natural way of extending PatternTokenizer to return subtokens is to use the grouping of the regular expression. That is, specify "x(y)" to return both. I assume that Java has a non-capturing regex group operator (it's (?:) in Python) so the basic grouping functionality would not be lost. Python does this for re.split, which I find nice: >>> re.split('a(b)c', 'oneabctwoabcthree') ['one', 'b', 'two', 'b', 'three']
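Mike's grouping idea can be sketched outside Solr in a few lines of Python; the pattern and input here are only illustrative, and the function is a hypothetical stand-in for an extended PatternTokenizer, not Solr code:

```python
import re

def group_tokenize(text, pattern=r"x(y)"):
    """Emit the full match plus each captured group as separate tokens.

    Mimics a hypothetical PatternTokenizer extension where capturing
    groups select extra sub-tokens; non-capturing (?:...) groups would
    be skipped automatically since they produce no group value.
    """
    tokens = []
    for m in re.finditer(pattern, text):
        tokens.append(m.group(0))  # the whole match, e.g. "xy"
        tokens.extend(g for g in m.groups() if g is not None)  # sub-tokens
    return tokens

print(group_tokenize("axyb"))  # -> ['xy', 'y']
```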
Re: Solr, Multiple processes running
Martin, Look into MultiCore (new stuff, some info on the Wiki) or into running multiple Solrs inside a single JVM. We just did this with Jetty 6.1.6 for a client and it works beautifully. This is also documented on the Wiki. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: "Owens, Martin" <[EMAIL PROTECTED]> To: "Owens, Martin" <[EMAIL PROTECTED]>; solr-user@lucene.apache.org Sent: Tuesday, December 11, 2007 4:01:53 PM Subject: Solr, Multiple processes running Hello everyone, The system we're moving from (dtSearch) allows each of our clients to have a search index. So far I have yet to find the options required to set this, it seems I can only set the directory path before run time. Each of the indexes uses the same schema, same configuration just different data in each; what kind of performance penalty would I have from running a new solr instance per required database? what is the best way to track what port or what index is being used? would I be able to run 1,000 or more solr instances without performance degradation? Thanks for your help. Best regards, Martin Owens
Re: How to effectively search inside fields that should be indexed without changing them.
Brian, This is not really a job for n-grams. It sounds like you'll want to write a custom Tokenizer that has knowledge of this particular pattern, knows how to split input like the one in your example, and produces multiple tokens out of it. For the natural-language part you can probably get away with one of the existing tokenizers/analyzers/factories. For the first part you'll likely want to extract (W+)0+ -- 1 or more letters followed by 1 or more zeros -- as one token, and then 0+(D+) -- 1 or more zeros followed by 1 or more digits. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Brian Carmalt <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Tuesday, December 11, 2007 9:17:32 AM Subject: How to effectively search inside fields that should be indexed without changing them. Hello all, The titles of our docs have the form "ABC0001231-This is an important doc.pdf". I would like to be able to search for 'important', or '1231', or 'ABC000*', or 'This is an important doc' in the title field. I looked at the NGramTokenizer and tried to use it. Against the index it doesn't seem to work; I cannot get any hits. The analysis tool on the admin pages shows me that the ngram tokenizing works, by highlighting the matches between the indexed value and a query. I have set the min and max ngram size to 2 and 6, with side equal to left. Can anyone recommend a procedure that will allow me to search as stated above? I would also like to find out more about how to use the NGramTokenizer, but have found little in the form of documentation. Anyone know of any good sources? Thanks, Brian
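Otis's suggestion amounts to splitting the code prefix from the natural-language part of the title. A rough Python sketch of that tokenization follows; the regex and tokens are an illustration of the idea, not the custom Solr Tokenizer itself:

```python
import re

def split_title(title):
    """Split titles like 'ABC0001231-This is an important doc.pdf' into
    searchable tokens: letters+zeros as one token, the trailing digits
    as another, plus the lowercased natural-language words."""
    tokens = []
    m = re.match(r"([A-Za-z]+)(0*)(\d+)-(.*?)(?:\.pdf)?$", title)
    if m:
        letters, zeros, digits, rest = m.groups()
        tokens.append(letters + zeros)       # e.g. "ABC000"
        tokens.append(digits)                # e.g. "1231"
        tokens.extend(rest.lower().split())  # "this", "is", ...
    return tokens

print(split_title("ABC0001231-This is an important doc.pdf"))
```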
Re: Solr, Multiple processes running
Keeping track of 1000+ indices is actually not that hard. I've implemented Simpy - http://simpy.com - in a way that keeps each member's index (or indices - some users have multiple indices) separate. I can't give out the total number of Simpy users, but I can tell you it is weeell beyond 1000 :) Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Erick Erickson <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Tuesday, December 11, 2007 4:33:45 PM Subject: Re: Solr, Multiple processes running How much data are we talking about here? Because it seems *much* simpler to just index a field with each document indicating the user and then just AND that user's ID in with your query. Or think about facets (although I admit I don't know enough about facets to weigh in on its merits, it's just been mentioned a lot). Keeping track of 1,000+ indexes seems like a maintenance headache, but much depends upon how much data you're talking about. When replying, the number of documents is almost, but not quite totally, useless unless combined with the number of fields you're storing per doc, the average length of each field, etc . Erick On Dec 11, 2007 4:01 PM, Owens, Martin <[EMAIL PROTECTED]> wrote: > Hello everyone, > > The system we're moving from (dtSearch) allows each of our clients to have > a search index. So far I have yet to find the options required to set this, > it seems I can only set the directory path before run time. > > Each of the indexes uses the same schema, same configuration just > different data in each; what kind of performance penalty would I have from > running a new solr instance per required database? what is the best way to > track what port or what index is being used? would I be able to run 1,000 or > more solr instances without performance degradation? > > Thanks for your help. > > Best regards, Martin Owens >
RE: Two Solr Webapps, one folder for the index data?
I asked a question similar to this back in http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200709.mbox/[EMAIL PROTECTED] and didn't really find anyone who was doing this. What I wound up doing was adding a variable to the context.xml file called contextRelativeHome: <Environment name="solr/contextRelativeHome" type="java.lang.Boolean" value="true"/> Which causes the SolrResourceLoader to prepend the context directory to the solr/home variable (the context directory is identified in the SolrDispatchFilter and stored in the global Config). This way, I can have multiple instances of Solr up and running with the exact same configuration, and their indices contained wholly within their deployment directories. So since this is a fresh thread, does this seem like a bad way to do it? It would be much easier if I could put context variables directly into the existing solr/home variable, for sure. Anthony -Original Message- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: Tuesday, December 11, 2007 1:09 PM To: solr-user@lucene.apache.org Subject: Re: Two Solr Webapps, one folder for the index data? : I actually have a patch for solr config parser which allows you to use : context environment variables in the solrconfig.xml : I generally use it for development when I'm working with multiple : instances and different data dirs. I'll add it to jira today if you : want it. yes please! ... Solr already has system property variable replacement in solrconfig.xml, and we discussed a while back (on solr-dev i think) adding code to automatically create system properties on startup for any solr/* JNDI variables set so the same variable subst code could be reused ... but i don't think anyone ever opened an issue or created a patch for it. -Hoss
Re: Solr, Multiple processes running
You're right, I'm wrong. I certainly am willing to defer to someone who's been there before . On Dec 11, 2007 4:44 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > Keeping track of 1000+ indices is actually not that hard. I've > implemented Simpy - http://simpy.com - in a way that keeps each member's > index (or indices - some users have multiple indices) separate. I can't > give out the total number of Simpy users, but I can tell you it is > weeell beyond 1000 :) > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > - Original Message > From: Erick Erickson <[EMAIL PROTECTED]> > To: solr-user@lucene.apache.org > Sent: Tuesday, December 11, 2007 4:33:45 PM > Subject: Re: Solr, Multiple processes running > > How much data are we talking about here? Because it seems *much* > simpler > to just index a field with each document indicating the user and then > just > AND that user's ID in with your query. > > Or think about facets (although I admit I don't know enough about > facets > to weigh in on its merits, it's just been mentioned a lot). > > Keeping track of 1,000+ indexes seems like a maintenance headache, but > much depends upon how much data you're talking about. > > When replying, the number of documents is almost, but not quite > totally, useless unless combined with the number of fields you're > storing per doc, the average length of each field, etc . > > Erick > > On Dec 11, 2007 4:01 PM, Owens, Martin <[EMAIL PROTECTED]> > wrote: > > > Hello everyone, > > > > The system we're moving from (dtSearch) allows each of our clients to > have > > a search index. So far I have yet to find the options required to set > this, > > it seems I can only set the directory path before run time. > > > > Each of the indexes uses the same schema, same configuration just > > different data in each; what kind of performance penalty would I have > from > > running a new solr instance per required database? 
what is the best > way to > > track what port or what index is being used? would I be able to run > 1,000 or > > more solr instances without performance degradation? > > > > Thanks for your help. > > > > Best regards, Martin Owens > > > > > >
Re: Two Solr Webapps, one folder for the index data?
Maybe I'm confused. Can't you use the brand-spanking new MultiCore stuff for this, or JNDI, as I just mentioned in the "Re: Solr, Multiple processes running" thread? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: patrick o'leary <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Tuesday, December 11, 2007 2:18:58 PM Subject: Re: Two Solr Webapps, one folder for the index data? JVM properties restrict you to a single implementation within a jvm. For instance if you want multiple instances of solr running with the same schema, with different data dir's in the one app server. You'll have to have several copies of solrconfig and schema.xml. By using context environment, I can have multiple contexts like pjaol:~/tmp/locallucene/solr/tomcat-conf pjaol$ more solr.xml pjaol:~/tmp/locallucene/solr/tomcat-conf pjaol$ more solr1.xml Changing just the solr/data/dir for each instance. And in my solrconfig.xml ${env/solr/data/dir:./solr/data} It certainly makes development & operations easier. P Mike Klaas wrote: I use jvm system properties for this; they seem to work well. -Mike On 11-Dec-07, at 7:39 AM, patrick o'leary wrote: I actually have a patch for solr config parser which allows you to use context environment variables in the solrconfig.xml I generally use it for development when I'm working with multiple instances and different data dirs. I'll add it to jira today if you want it. P Jörg Kiegeland wrote: I have successfully configured two parallel Solr webapps , however I see that all data gets stored in one folder of my Tomcat installation, namely C:\Tomcat\solr\data\index. How can I configure that each Solr webapp shall store the data in the folders I assigned at , where already the Solr scheme etc. resides (so that it get stored at individualSolrFolder/data/index)? Thanks -- Patrick O'Leary You see, wire telegraph is a kind of a very, very long cat. You pull his tail in New York and his head is meowing in Los Angeles. 
Do you understand this? And radio operates exactly the same way: you send signals here, they receive them there. The only difference is that there is no cat. - Albert Einstein View Patrick O Leary's profile -- Patrick O'Leary You see, wire telegraph is a kind of a very, very long cat. You pull his tail in New York and his head is meowing in Los Angeles. Do you understand this? And radio operates exactly the same way: you send signals here, they receive them there. The only difference is that there is no cat. - Albert Einstein View Patrick O Leary's profile
Re: Solr, Multiple processes running
How much data are we talking about here? Because it seems *much* simpler to just index a field with each document indicating the user and then just AND that user's ID in with your query. Or think about facets (although I admit I don't know enough about facets to weigh in on its merits, it's just been mentioned a lot). Keeping track of 1,000+ indexes seems like a maintenance headache, but much depends upon how much data you're talking about. When replying, the number of documents is almost, but not quite totally, useless unless combined with the number of fields you're storing per doc, the average length of each field, etc . Erick On Dec 11, 2007 4:01 PM, Owens, Martin <[EMAIL PROTECTED]> wrote: > Hello everyone, > > The system we're moving from (dtSearch) allows each of our clients to have > a search index. So far I have yet to find the options required to set this, > it seems I can only set the directory path before run time. > > Each of the indexes uses the same schema, same configuration just > different data in each; what kind of performance penalty would I have from > running a new solr instance per required database? what is the best way to > track what port or what index is being used? would I be able to run 1,000 or > more solr instances without performance degradation? > > Thanks for your help. > > Best regards, Martin Owens >
Re: SOLR X FAST
I think it all depends, what do you want out of Solr or FAST? Thanks! Matthew Runo Software Developer 702.943.7833 On Dec 11, 2007, at 2:09 PM, William Silva wrote: Hi, How is the best way to compare SOLR and FAST Search ? Thanks, William.
Solr and Flex
Has anyone used Solr in a Flex application? Any code snipplets to share? Thank you. Jennifer -- View this message in context: http://www.nabble.com/Solr-and-Flex-tp14284703p14284703.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SOLR X FAST
Hi, Why use FAST and not use SOLR ? For example. What will FAST offer that will justify the investment ? I would like a matrix comparing both. Thanks, William. On Dec 11, 2007 8:15 PM, Matthew Runo <[EMAIL PROTECTED]> wrote: > I think it all depends, what do you want out of Solr or FAST? > >Thanks! > > Matthew Runo > Software Developer > 702.943.7833 > > On Dec 11, 2007, at 2:09 PM, William Silva wrote: > > > Hi, > > How is the best way to compare SOLR and FAST Search ? > > Thanks, > > William. > >
Re: SOLR X FAST
Stability and better Support (at great cost obviously) On Dec 11, 2007 10:20 PM, William Silva <[EMAIL PROTECTED]> wrote: > Hi, > Why use FAST and not use SOLR ? For example. > What will FAST offer that will justify the investment ? > I would like a matrix comparing both. > Thanks, > William. > > > On Dec 11, 2007 8:15 PM, Matthew Runo <[EMAIL PROTECTED]> wrote: > > > I think it all depends, what do you want out of Solr or FAST? > > > >Thanks! > > > > Matthew Runo > > Software Developer > > 702.943.7833 > > > > On Dec 11, 2007, at 2:09 PM, William Silva wrote: > > > > > Hi, > > > How is the best way to compare SOLR and FAST Search ? > > > Thanks, > > > William. > > > > >
Re: Solr, Multiple processes running
Since they all use the same schema, can you add a client ID to each document when it is indexed? Filter by "clientid:4" and you get a subset of the index. wunder On 12/11/07 1:01 PM, "Owens, Martin" <[EMAIL PROTECTED]> wrote: > Hello everyone, > > The system we're moving from (dtSearch) allows each of our clients to have a > search index. So far I have yet to find the options required to set this, it > seems I can only set the directory path before run time. > > Each of the indexes uses the same schema, same configuration just different > data in each; what kind of performance penalty would I have from running a new > solr instance per required database? what is the best way to track what port > or what index is being used? would I be able to run 1,000 or more solr > instances without performance degradation? > > Thanks for your help. > > Best regards, Martin Owens
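Walter's per-client filter boils down to adding a client ID field at index time and restricting queries with Solr's standard fq (filter query) parameter. A small sketch of building such a query URL; the base URL and the clientid field name are illustrative:

```python
from urllib.parse import urlencode

def solr_client_query(base_url, q, client_id):
    """Build a Solr select URL that restricts results to one client
    via a filter query on a hypothetical 'clientid' field."""
    params = {"q": q, "fq": "clientid:%d" % client_id}
    return "%s/select?%s" % (base_url, urlencode(params))

print(solr_client_query("http://localhost:8983/solr", "important doc", 4))
```

The fq parameter is cached separately from the main query, so the same per-client filter is cheap to reuse across many searches.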
Re: distributing indexes via solr
On 10-Dec-07, at 12:50 PM, Doug T wrote: I have been using parallelmultisearches on multi-CPU machines, and seen sizable benefit over a single large index (even if all of the fragments are on 1 disk). Is there a way to quickly enable this on a solr server? Or do I need to go into the source to make the change? Unfortunately, there is no easy way to enable this. Patches welcome! -Mike
Re: SOLR X FAST
Depends. If you are looking for a small-sized index (gigabytes rather than dozens or hundreds of gigabytes, or terabytes) with relatively simple requirements (a few facets, simple tokenization, English-only linguistics, etc.), Solr is likely to be appropriate for most cases. FAST however gives you great horizontal scalability, out-of-the-box linguistics for many languages (including CJK), contextual and scope searching, a web, file and database crawler, a programmable ingestion pipeline, etc. Regards. --Nuno On 11 Dec 2007, at 22:09, William Silva wrote: Hi, How is the best way to compare SOLR and FAST Search ? Thanks, William.
Re: SOLR X FAST
Could you please elaborate on what you mean by ingestion pipeline and horizontal scalability? I apologize if this is a stupid question everyone else on the forum is familiar with. Thanks, Ravi On Dec 12, 2007 1:09 AM, Nuno Leitao <[EMAIL PROTECTED]> wrote: > Depends, if you are looking for a small sized index (gigabytes rather > than dozens or hundreds of gigabytes or terabytes) with relatively > simple requirements (a few facets, simple tokenization, English only > linguistics, etc.) Solr is likely to be appropriate for most cases. > > FAST however gives you great horizontal scalability, out of the box > linguistics for many languages (including CJK), contextual and scope > searching, a web, file and database crawler, programmable ingestion > pipeline, etc. > > Regards. > > --Nuno > > > On 11 Dec 2007, at 22:09, William Silva wrote: > > > Hi, > > How is the best way to compare SOLR and FAST Search ? > > Thanks, > > William. > >
Re: SOLR X FAST
FAST uses two pipelines - an ingestion pipeline (for document feeding) and a query pipeline - which are fully programmable (i.e., you can customize them fully). At ingestion time you typically prepare documents for indexing (tokenize, character normalize, lemmatize, clean up text, perform entity extraction for facets, perform static boosting for certain documents, etc.), while at query time you can expand synonyms and do other general query-side tasks (not unlike Solr). Horizontal scalability means the ability to cluster your search engine across a large number of servers, so you can scale up on the number of documents, queries, crawls, etc. There are FAST deployments out there which run on dozens, in some cases hundreds, of nodes serving multiple terabyte-size indexes and achieving hundreds of queries per second. Yet again, if your requirements are relatively simple then Lucene might do the job just fine. Hope this helps. --Nuno. On 12 Dec 2007, at 01:33, Ravish Bhagdev wrote: Could you please elaborate on what you mean by ingestion pipeline and horizontal scalability? I apologize if this is a stupid question everyone else on the forum is familiar with. Thanks, Ravi On Dec 12, 2007 1:09 AM, Nuno Leitao <[EMAIL PROTECTED]> wrote: Depends, if you are looking for a small sized index (gigabytes rather than dozens or hundreds of gigabytes or terabytes) with relatively simple requirements (a few facets, simple tokenization, English only linguistics, etc.) Solr is likely to be appropriate for most cases. FAST however gives you great horizontal scalability, out of the box linguistics for many languages (including CJK), contextual and scope searching, a web, file and database crawler, programmable ingestion pipeline, etc. Regards. --Nuno On 11 Dec 2007, at 22:09, William Silva wrote: Hi, How is the best way to compare SOLR and FAST Search ? Thanks, William.
RE: SOLR X FAST
FAST is a little less flexible (no dynamic fields) and not programmable at the Lucene level. We recently switched from FAST to Solr for cost reasons. They did not know how to license us; they are used to, say, IBM running FAST on hundreds of servers, while we are a startup with very specific needs. It has turned out to be worthwhile because we only want to do one thing really well, and we can customize Solr for it. Lance -Original Message- From: Nuno Leitao [mailto:[EMAIL PROTECTED] Sent: Tuesday, December 11, 2007 5:51 PM To: solr-user@lucene.apache.org Subject: Re: SOLR X FAST
Re: Facets - What's a better term for non technical people?
"category counts" On 11-Dec-07, at 6:38 PM, Norskog, Lance wrote: In SQL terms they are: 'select unique'. Except on only one field. -Original Message- From: Charles Hornberger [mailto:[EMAIL PROTECTED] Sent: Tuesday, December 11, 2007 9:49 AM To: solr-user@lucene.apache.org Subject: Re: Facets - What's a better term for non technical people? FAST calls them "navigators" (which I think is a terrible term - YMMV of course :-)) I tend to think that "filters" -- or perhaps "dynamic filters" -- captures the essential function. On Dec 11, 2007 2:38 AM, "DAVIGNON Andre - CETE NP/DIODé/PANDOC" <[EMAIL PROTECTED]> wrote: Hi, So, has anyone got a good example of the language they might use over, say, a set of radio buttons and fields on a web form, to indicate that selecting one or more of these would return facets. 'Show grouping by' or 'List the sets that the results fall into' or something similar. Here's what i found some time : http://www.searchtools.com/info/faceted-metadata.html It has been quite useful to me. André Davignon
RE: Facets - What's a better term for non technical people?
In SQL terms they are: 'select unique'. Except on only one field. -Original Message- From: Charles Hornberger [mailto:[EMAIL PROTECTED] Sent: Tuesday, December 11, 2007 9:49 AM To: solr-user@lucene.apache.org Subject: Re: Facets - What's a better term for non technical people? FAST calls them "navigators" (which I think is a terrible term - YMMV of course :-)) I tend to think that "filters" -- or perhaps "dynamic filters" -- captures the essential function. On Dec 11, 2007 2:38 AM, "DAVIGNON Andre - CETE NP/DIODé/PANDOC" <[EMAIL PROTECTED]> wrote: > Hi, > > > So, has anyone got a good example of the language they might use > > over, say, a set of radio buttons and fields on a web form, to > > indicate that selecting one or more of these would return facets. > > 'Show grouping by' or 'List the sets that the results fall into' or something similar. > > Here's what I found some time ago: > http://www.searchtools.com/info/faceted-metadata.html > > It has been quite useful to me. > > André Davignon
Re: SOLR X FAST
Just to comment on that last part: "There are FAST deployments out there which run on dozens, in some cases hundreds of nodes serving multiple terabyte size indexes and achieving hundreds of queries per second." There are also a lot of Lucene and Solr deployments with similar setups - I've worked on and with some decent-sized search clusters with 50-100 search servers, fault tolerance, high input rates, and high query rates. It's all doable with Lucene and Solr; it's just that not everything comes out of the box, so you have to either find somebody to help out or do it in house. You pay developers to build exactly what you need, as opposed to paying FAST a pile of $$$ based on, say, the query rate. Or you don't pay $250,000 for a Google Appliance that can index only 10MM docs (this is a real number). Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Nuno Leitao <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Tuesday, December 11, 2007 8:50:37 PM Subject: Re: SOLR X FAST
Re: Facets - What's a better term for non technical people?
Isn't that a GROUP BY ColumnX, COUNT(1) type of thing? I'd think "group by" would be a good label. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: "Norskog, Lance" <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Tuesday, December 11, 2007 9:38:37 PM Subject: RE: Facets - What's a better term for non technical people? In SQL terms they are: 'select unique'. Except on only one field.
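The GROUP BY analogy can be made concrete. A Solr facet request over one field returns roughly what a SQL aggregate over that column would: one count per distinct value, restricted to the current result set. The snippet below builds such a request using Solr's standard faceting parameters (`facet=true`, `facet.field`); the field name `category`, the query, and the host are made up for illustration.

```python
from urllib.parse import urlencode

# SQL version of what Otis describes:
#   SELECT category, COUNT(*) FROM docs WHERE <query matches> GROUP BY category;

# Equivalent Solr facet request: rows=0 suppresses the document list,
# so the response carries only the per-value counts for the field.
params = {
    "q": "ipod",              # hypothetical query
    "rows": 0,
    "facet": "true",
    "facet.field": "category",  # hypothetical field name
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```

The key difference from a literal GROUP BY is that the facet counts are computed against the documents matching the query, so they update as the user narrows the search - which is what makes them useful as "filters" or "limiters" in a UI.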