Re: performance of million documents search
Hi Erick,

That's very useful. Thank you very much.

2010/4/26 Erick Erickson:

> NGrams might help here; search the Solr list for "NGram"
> and I think you'll find that this subject has been discussed
> several times...
>
> HTH
> Erick
>
> On Sat, Apr 24, 2010 at 9:26 PM, weiqi wang wrote:
>
> > Hi,
> >
> > I have about 2 million documents in my index. I want to search them by a
> > string field. Every document has this field, with values such as 'LB681'.
> >
> > The field is a dynamic field whose type is string. So, in solr/admin, I
> > search using "PartNo_s:L*", which matches values starting with L.
> >
> > I can get the result from 2 million documents in less than 300 ms. But
> > when I search with "PartNo_s:*B68*", which matches values containing B68,
> > it takes more than 2000 ms. That is too slow for me.
> >
> > Does anyone know how I can get results faster?
> >
> > Thank you very much
Re: hybrid approach to using cloud servers for Solr/Lucene
Hello Dennis,

> If the load goes up, then queries are sent to the cloud at a certain point.

My advice is to do load balancing between local and cloud. Your local system seems to be capable, as it is a dedicated host. Another option is to do indexing locally and sync the index with the cloud, so that the cloud is used only for search.

Hope it helps.

Regards,
Aditya
www.findbestopensource.com

On Mon, Apr 26, 2010 at 7:47 AM, Dennis Gearon wrote:

> I'm working on an app that could grow much faster and bigger than I could
> scale local resources, at least on certain dates and for other reasons.
>
> So I'd like to run a local machine in a dedicated host or even a virtual
> machine at a host.
>
> If the load goes up, then queries are sent to the cloud at a certain point.
>
> Is this practical? Does anyone have experience with this?
>
> This is obviously a search engine app based on Solr/Lucene, if anyone is
> wondering.
>
> Dennis Gearon
>
> Signature Warning
>
> EARTH has a Right To Life,
> otherwise we all die.
>
> Read 'Hot, Flat, and Crowded'
> Laugh at http://www.yert.com/film.php
Re: DIH: inner select fails when outer entity is null/empty
Hi,

Thanks for this tip, Paul. But what if this is not an error? Is this what transformers should be used for somehow?

Thanks,
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message -----
> From: Noble Paul നോബിള് नोब्ळ्
> To: solr-user@lucene.apache.org
> Sent: Sun, April 25, 2010 9:16:22 AM
> Subject: Re: DIH: inner select fails when outer entity is null/empty
>
> do an onError="skip" on the inner entity
>
> On Fri, Apr 23, 2010 at 3:56 AM, Otis Gospodnetic wrote:
> > Hello,
> >
> > Here is a newbie DataImportHandler question:
> >
> > Currently, I have entities within entities. There are some situations
> > where a column value from the outer entity is null, and when I try to
> > use it in the inner entity, the null just gets replaced with an empty
> > string. That in turn causes the SQL query in the inner entity to fail.
> >
> > This seems like a common problem, but I couldn't find any solutions or
> > mention in the FAQ ( http://wiki.apache.org/solr/DataImportHandlerFaq ).
> >
> > What is the best practice to avoid or convert null values to something
> > safer? Would this be done via a Transformer, or is there a better
> > mechanism for this?
> >
> > I think the problem I'm describing is similar to what was described here:
> > http://search-lucene.com/m/cjlhtFkG6m
> > ... except I don't have the luxury of rewriting the SQL selects.
> >
> > Thanks,
> > Otis
> >
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Lucene ecosystem search :: http://search-lucene.com/
>
> --
> Noble Paul | Systems Architect | AOL | http://aol.com
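[Editor's note: a transformer can indeed paper over the null before the inner entity runs. Below is a sketch of a DataImportHandler data-config.xml using a ScriptTransformer together with onError="skip"; the table names, column name `parent_id`, and the `-1` sentinel value are invented for illustration and are not from the thread.]

```xml
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
  <script><![CDATA[
    function defaultNulls(row) {
      // Replace a null parent_id with a sentinel that can never match,
      // so the inner select still parses but simply returns no rows.
      if (row.get('parent_id') == null) {
        row.put('parent_id', '-1');
      }
      return row;
    }
  ]]></script>
  <document>
    <entity name="outer" query="SELECT id, parent_id FROM items"
            transformer="script:defaultNulls">
      <!-- onError="skip" is the belt-and-suspenders fallback Paul suggested -->
      <entity name="inner" onError="skip"
              query="SELECT name FROM parents WHERE id = '${outer.parent_id}'"/>
    </entity>
  </document>
</dataConfig>
```

Note that the script transformer requires Java 6+ scripting support in the JVM running Solr.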
Re: hybrid approach to using cloud servers for Solr/Lucene
Hi,

Hm. Everything is doable, but this sounds a bit undefined and possibly messy. If flexibility is of such importance, why have the "local" part at all? Why not have everything in an elastic cloud environment?

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message -----
> From: Dennis Gearon
> To: solr-user@lucene.apache.org
> Sent: Sun, April 25, 2010 10:17:11 PM
> Subject: hybrid approach to using cloud servers for Solr/Lucene
>
> I'm working on an app that could grow much faster and bigger than I could
> scale local resources, at least on certain dates and for other reasons.
>
> So I'd like to run a local machine in a dedicated host or even a virtual
> machine at a host.
>
> If the load goes up, then queries are sent to the cloud at a certain point.
>
> Is this practical? Does anyone have experience with this?
>
> This is obviously a search engine app based on Solr/Lucene, if anyone is
> wondering.
>
> Dennis Gearon
>
> Signature Warning
>
> EARTH has a Right To Life, otherwise we all die.
>
> Read 'Hot, Flat, and Crowded'
> Laugh at http://www.yert.com/film.php
RE: How to build a function query using the 'query' function
If the 'query' function returned a count, yes. But my problem is exactly that: as far as I can see from the description of the 'query' function, it does NOT return the count but the score of the search.

So my question is: how can I write a 'query' function that returns a count, not a score?

Cheers,
Gert.

From: Koji Sekiguchi [mailto:k...@r.email.ne.jp]
Sent: Sun 4/25/2010 2:15 AM
To: solr-user@lucene.apache.org
Subject: Re: How to build a function query using the 'query' function

Villemos, Gert wrote:
> I want to build a function expression for a dismax request handler 'bf'
> field, to boost documents that are referenced by other documents.
> I.e. the more often a document is referenced, the higher the boost.
>
> Something like:
>
> linear(query(myQueryReturningACountOfHowOftenThisDocumentIsReferenced, 1), 0.01, 1)
>
> Intended to mean:
>
> if count is 0, then the boost is 0*0.01 + 1 = 1
> if count is 1, then the boost is 1*0.01 + 1 = 1.01
> if count is 100, then the boost is 100*0.01 + 1 = 2
>
> However, the query function
> (http://wiki.apache.org/solr/FunctionQuery#query) seems to only be able
> to return the score of the query results, not the count of results.

Probably I'm missing something, but doesn't just using the linear function meet your needs? i.e.:

linear(myQueryReturningACountOfHowOftenThisDocumentIsReferenced, 0.01, 1)

Koji

--
http://www.rondhuit.com/en/
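[Editor's note: since no function query returns a result count, the usual workaround is to compute the reference count at index time and store it in a field, then boost on that field directly. A sketch follows; the field name `ref_count_i` is hypothetical and must be populated by the indexing pipeline.]

```xml
<!-- solrconfig.xml sketch: boost on a precomputed reference count.
     ref_count_i is a hypothetical integer field filled in at index time. -->
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- boost = count * 0.01 + 1, matching Gert's intended scheme -->
    <str name="bf">linear(ref_count_i,0.01,1)</str>
  </lst>
</requestHandler>
```

The trade-off is that the count is only as fresh as the last reindex of the referenced document.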
hybrid approach to using cloud servers for Solr/Lucene
I'm working on an app that could grow much faster and bigger than I could scale local resources, at least on certain dates and for other reasons.

So I'd like to run a local machine in a dedicated host or even a virtual machine at a host.

If the load goes up, then queries are sent to the cloud at a certain point.

Is this practical? Does anyone have experience with this?

This is obviously a search engine app based on Solr/Lucene, if anyone is wondering.

Dennis Gearon

Signature Warning

EARTH has a Right To Life, otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php
Re: performance of million documents search
NGrams might help here; search the Solr list for "NGram" and I think you'll find that this subject has been discussed several times...

HTH
Erick

On Sat, Apr 24, 2010 at 9:26 PM, weiqi wang wrote:

> Hi,
>
> I have about 2 million documents in my index. I want to search them by a
> string field. Every document has this field, with values such as 'LB681'.
>
> The field is a dynamic field whose type is string. So, in solr/admin, I
> search using "PartNo_s:L*", which matches values starting with L.
>
> I can get the result from 2 million documents in less than 300 ms. But
> when I search with "PartNo_s:*B68*", which matches values containing B68,
> it takes more than 2000 ms. That is too slow for me.
>
> Does anyone know how I can get results faster?
>
> Thank you very much
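[Editor's note: to make the NGram suggestion concrete, a schema.xml fieldType along these lines indexes every short substring of the part number, so an infix search becomes a cheap term lookup instead of a `*B68*` wildcard scan. Field names and gram sizes here are illustrative assumptions, not from the thread.]

```xml
<!-- Sketch: NGram-analyzed copy of the part-number field.
     "partno_ngram" and the 2-5 gram sizes are assumptions. -->
<fieldType name="partno_ngram" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- index all 2- to 5-character substrings of the value -->
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="5"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="PartNo_ngram" type="partno_ngram" indexed="true" stored="false"/>
<copyField source="PartNo_s" dest="PartNo_ngram"/>
```

With this in place, a query like `PartNo_ngram:b68` matches every document whose part number contains "B68", at the cost of a larger index.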
Re: Solr does not honor facet.mincount and field.facet.mincount
: REQUEST:
: http://localhost:8983/solr/select/?q=*%3A*&version=2.2&rows=0&start=0&indent=on&facet=true&facet.field=Instrument&facet.field=Location&facet.mincount=9
:
: RESPONSE:
: ...
: 9
: ...

...the REQUEST URL you listed says facet.mincount, but the response from Solr disagrees. According to it, you actually had a capital "C" in facet.minCount. Solr params are case sensitive, so Solr is completely ignoring facet.minCount.

As for why you don't get any values for the Instrument facet -- understanding that requires you to tell us more about the field/fieldType for Instrument.

-Hoss
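[Editor's note: for reference, the faceting parameters are all lowercase, including the per-field override form that the subject line alludes to. A sketch of a corrected parameter set:]

```
facet=true
facet.field=Instrument
facet.field=Location
facet.mincount=9                  # global minimum count, all lowercase
f.Instrument.facet.mincount=1     # per-field override for Instrument only
```

`facet.minCount`, with a capital C, is silently ignored rather than rejected, which is what made this one hard to spot.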
Re: DIH: inner select fails when outer entity is null/empty
do an onError="skip" on the inner entity

On Fri, Apr 23, 2010 at 3:56 AM, Otis Gospodnetic wrote:

> Hello,
>
> Here is a newbie DataImportHandler question:
>
> Currently, I have entities within entities. There are some situations
> where a column value from the outer entity is null, and when I try to use
> it in the inner entity, the null just gets replaced with an empty string.
> That in turn causes the SQL query in the inner entity to fail.
>
> This seems like a common problem, but I couldn't find any solutions or
> mention in the FAQ ( http://wiki.apache.org/solr/DataImportHandlerFaq ).
>
> What is the best practice to avoid or convert null values to something
> safer? Would this be done via a Transformer, or is there a better
> mechanism for this?
>
> I think the problem I'm describing is similar to what was described here:
> http://search-lucene.com/m/cjlhtFkG6m
> ... except I don't have the luxury of rewriting the SQL selects.
>
> Thanks,
> Otis
>
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/

--
Noble Paul | Systems Architect | AOL | http://aol.com
Re: How to setup search engine for B2B web app
Hi Bill,

On Sun, Apr 25, 2010 at 12:23 PM, Bill Paetzke wrote:

> *Given:*
>
> - 1 database per client (business customer)
> - 5000 clients
> - Clients have between 2 and 2000 users (avg is ~100 users/client)
> - 100k to 10 million records per database
> - Users need to search those records often (it's the best way to navigate
>   their data)
>
> *The Question:*
>
> How would you set up Solr (or Lucene) search so that each client can only
> search within its database?
>
> How would you set up the index(es)?

I'd look at setting up a core for each client. You may need to set up slaves as well, depending on search traffic.

> Where do you store the index(es)?

Setting up 5K cores on one box will not work, so you will need to partition the clients across multiple boxes, each hosting a subset of the cores.

> Would you need to add a filter to all search queries?

Nope, but you will need to send the query to the correct host (perhaps a mapping DB will help).

> If a client cancelled, how would you delete their (part of the) index?
> (this may be trivial--not sure yet)

With different cores for each client, this'd be pretty easy.

--
Regards,
Shalin Shekhar Mangar.
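[Editor's note: the per-client core layout described above can be sketched in solr.xml; the client names here are invented for illustration.]

```xml
<!-- solr.xml sketch: one core per client, names illustrative -->
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="client_acme"   instanceDir="clients/acme"/>
    <core name="client_globex" instanceDir="clients/globex"/>
  </cores>
</solr>
```

Queries for a given client then go only to that client's core, e.g. `http://host:8983/solr/client_acme/select?q=...`, which gives the per-client isolation for free. A cancelled client can be removed with the CoreAdmin API, `http://host:8983/solr/admin/cores?action=UNLOAD&core=client_acme`, after which the core's index directory can be deleted from disk.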