Re: Solr feasibility with terabyte-scale data
Hi,

Some quick notes, since it's late here.

- You'll need to wait for SOLR-303. There is no way even a big machine will be able to search such a large index in a reasonable amount of time, and you may simply not have enough RAM for an index that size.
- I'd suggest you wait for Solr 1.3 (or some -dev version that uses the about-to-be-released Lucene 2.3) for performance reasons.
- As for avoiding index duplication: how about having a SAN with a single copy of the index that all searchers (and the master) point to?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Phillip Farber <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Friday, January 18, 2008 5:26:21 PM
Subject: Solr feasibility with terabyte-scale data

Hello everyone,

We are considering Solr 1.2 to index and search a terabyte-scale dataset of OCR. Initially our requirements are simple: basic tokenizing, score sorting only, no faceting. The schema is simple too. A document consists of a numeric id (stored and indexed) and a large text field (indexed, not stored) containing the OCR, typically ~1.4 MB. Some limited faceting or additional metadata fields may be added later.

The data in question currently amounts to about 1.1 TB of OCR (about 1M docs), which we expect to increase to 10 TB over time. Pilot tests on a desktop (2.6 GHz P4, 2.5 GB memory, 1 GB Java heap) on ~180 MB of data via HTTP suggest we can index at a rate sufficient to keep up with the inputs (after getting over the 1.1 TB hump). We envision nightly commits/optimizes.

We expect a low QPS rate (<10) and probably will not need millisecond query response.

Our environment makes available Apache on blade servers (Dell 1955, dual dual-core 3.x GHz Xeons w/ 8 GB RAM) connected to a *large*, high-performance NAS system over a dedicated (out-of-band) GbE switch (Dell PowerConnect 5324) using a 9K MTU (jumbo packets). We are starting with 2 blades and will add more as demand requires.

While we have a lot of storage, the idea of master/slave Solr Collection Distribution to add more Solr instances clearly means duplicating an immense index. Is it possible to use one instance to update the index on NAS while other instances only read the index and commit to keep their caches warm instead?

Should we expect Solr indexing time to slow significantly as we scale up? What kind of query performance could we expect? Is it totally naive even to consider Solr at this kind of scale?

Given these parameters, is it realistic to think that Solr could handle the task?

Any advice/wisdom greatly appreciated,

Phil
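Otis's first point refers to SOLR-303, the issue that tracks distributed search. The general idea is to split the index into shards that each fit on one machine, send the query to every shard, and merge the per-shard top hits by score. A minimal sketch of just the merge step follows; the shard result lists, document ids, and the merge_top_k helper are all hypothetical, and this is not how SOLR-303 itself is implemented:

```python
import heapq
from itertools import islice


def merge_top_k(shard_results, k):
    """Merge per-shard hit lists into a single top-k list.

    Each shard list holds (score, doc_id) tuples already sorted by
    descending score, as a searcher would return them.
    """
    merged = heapq.merge(*shard_results, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, k))


# Hypothetical results from two shards of the same logical index.
shard_a = [(0.92, "doc-17"), (0.40, "doc-3")]
shard_b = [(0.88, "doc-201"), (0.75, "doc-250"), (0.10, "doc-260")]

top = merge_top_k([shard_a, shard_b], k=3)
print(top)
```

Note that scores from different shards are only directly comparable if each shard computes them on a comparable basis, which is one of the things a real distributed search has to deal with.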
Re: Storing Related Data - At Different Times
You could have 2 separate indices tied together by a common field (a la FK->PK). Then you only need to change the item you are updating.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Gavin <[EMAIL PROTECTED]>
To: solr-user
Sent: Monday, January 21, 2008 12:09:23 AM
Subject: Storing Related Data - At Different Times

Hi,

In the web application we are developing we have two sets of details: the personal details and the resume details. We allow 5 different resumes to be available for each user, but we want the personal details to remain the same across all 5 resumes. Personal details are added at registration time. After that, for each resume we want to link the personal details. This is a simple join in the db, but how do we achieve this in Solr? The problem is that when personal details are changed we will have to update all 5 resumes.

I read the thread "Some sort of join in SOLR?", but I am not sure it answers my problem. Would very much appreciate some sort of help on this one.

Thanks,
--
Gavin Selvaratnam,
Project Leader

hSenid Mobile Solutions
Phone: +94-11-2446623/4
Fax: +94-11-2307579

Web: http://www.hSenidMobile.com

Make it happen

Disclaimer: This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to which they are addressed. The content and opinions contained in this email are not necessarily those of hSenid Software International. If you have received this email in error please contact the sender.
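The two-index suggestion above can be sketched in a few lines. This is only an illustration of the idea, with plain Python dicts standing in for the two indices; the field names (user_id, resume_id, title, city) and both helper functions are hypothetical and not part of any Solr API:

```python
# In-memory stand-ins for two separate indices that share a user_id field.
personal_index = {
    "u42": {"user_id": "u42", "name": "A. Person", "city": "Colombo"},
}
resume_index = {
    "r1": {"resume_id": "r1", "user_id": "u42", "title": "Java developer"},
    "r2": {"resume_id": "r2", "user_id": "u42", "title": "Project lead"},
}


def search_resumes(keyword):
    """Search the resume index, then look up personal details by user_id."""
    hits = [r for r in resume_index.values()
            if keyword.lower() in r["title"].lower()]
    return [{**r, "personal": personal_index[r["user_id"]]} for r in hits]


def update_personal(user_id, **changes):
    """Change personal details in one place; no resume document is touched."""
    personal_index[user_id].update(changes)


update_personal("u42", city="Kandy")
results = search_resumes("java")
print(results[0]["personal"]["city"])
```

The trade-off is that the application does the "join" itself with a second lookup per result, in exchange for updating one personal-details document instead of re-indexing every resume.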
Re: spellcheckhandler
You don't need to wait for 1.3 to be released; you can simply use a recent nightly build.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: anuvenk <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Monday, January 21, 2008 12:35:52 AM
Subject: Re: spellcheckhandler

I followed the steps outlined in http://wiki.apache.org/solr/SpellCheckerRequestHandler for setting up the schema with a new field 'spell' and copying other fields to this 'spell' field at index time. It works fine with single-word queries but doesn't return anything for multi-word queries. I read previous posts where this has been discussed, and I read that some of the active members are in the process of releasing patches that fix this problem. I'm actually trying to implement this spell check in a production setup. Is it absolutely not possible to get spell check results back for multi-word queries, or should I wait for the 1.3 release? If there is any other option, please educate me. In case a patch was already released, how do I add it to the current 1.2 version that I'm using?

--
View this message in context: http://www.nabble.com/spellcheckhandler-tp14627712p14991534.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Update the index
http://wiki.apache.org/solr/UpdateXmlMessages

Is this what you are looking for? Index the document again and it should overwrite the older one with the same id.

Gavin-39 wrote:
> Hi,
> Can someone point me to a location that describes how to update an
> already indexed document? I was thinking there is and tag
> explained somewhere but can't find it.
>
> Thanks,
> --
> Gavin Selvaratnam

--
View this message in context: http://www.nabble.com/Update-the-index-tp14991443p14991551.html
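As a rough illustration of the reply above, here is a sketch that builds the kind of <add> message described on the UpdateXmlMessages page. The field names and values are hypothetical; the overwrite behaviour relies on "id" being the schema's uniqueKey field. The helper only builds the XML string and does not talk to Solr:

```python
import xml.etree.ElementTree as ET


def build_add_message(fields):
    """Build an <add><doc>...</doc></add> update message from a dict of fields."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Re-sending a document with an id that already exists replaces the old one.
message = build_add_message({"id": "42", "title": "Updated title"})
print(message)
```

The resulting string would be POSTed to the update handler and followed by a <commit/> message, after which searches should see the new version of the document.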
Re: spellcheckhandler
I followed the steps outlined in http://wiki.apache.org/solr/SpellCheckerRequestHandler for setting up the schema with a new field 'spell' and copying other fields to this 'spell' field at index time. It works fine with single-word queries but doesn't return anything for multi-word queries. I read previous posts where this has been discussed, and I read that some of the active members are in the process of releasing patches that fix this problem. I'm actually trying to implement this spell check in a production setup. Is it absolutely not possible to get spell check results back for multi-word queries, or should I wait for the 1.3 release? If there is any other option, please educate me. In case a patch was already released, how do I add it to the current 1.2 version that I'm using?

--
View this message in context: http://www.nabble.com/spellcheckhandler-tp14627712p14991534.html
Update the index
Hi,

Can someone point me to a location that describes how to update an already indexed document? I was thinking there is and tag explained somewhere but can't find it.

Thanks,
--
Gavin Selvaratnam
Storing Related Data - At Different Times
Hi,

In the web application we are developing we have two sets of details: the personal details and the resume details. We allow 5 different resumes to be available for each user, but we want the personal details to remain the same across all 5 resumes. Personal details are added at registration time. After that, for each resume we want to link the personal details. This is a simple join in the db, but how do we achieve this in Solr? The problem is that when personal details are changed we will have to update all 5 resumes.

I read the thread "Some sort of join in SOLR?", but I am not sure it answers my problem. Would very much appreciate some sort of help on this one.

Thanks,
--
Gavin Selvaratnam
Term vector
What are term vectors? How do they help with MLT (MoreLikeThis)?

--
View this message in context: http://www.nabble.com/Term-vector-tp14990408p14990408.html
Re: solr 1.3
http://svn.apache.org/repos/asf/lucene/solr/trunk

or

http://lucene.apache.org/solr/version_control.html

On 1/21/08, anuvenk <[EMAIL PROTECTED]> wrote:
> Could you please let me know the location from where i can get it.
Re: solr 1.3
Could you please let me know the location from where I can get it?

climbingrose wrote:
> I'm using code pulled directly from Subversion.

--
View this message in context: http://www.nabble.com/solr-1.3-tp14989395p14989802.html
Re: solr 1.3
I'm using code pulled directly from Subversion.

On Jan 21, 2008 12:34 PM, anuvenk <[EMAIL PROTECTED]> wrote:
> Thanks. Would this be the latest code from the trunk that you mentioned?
> http://people.apache.org/builds/lucene/solr/nightly/solr-2008-01-19.zip

--
Regards,

Cuong Hoang
Re: Getting multiple category results
On Jan 18, 2008 11:54 PM, muddassir hasan <[EMAIL PROTECTED]> wrote:
> Thanks Ryan,
>
> Field collapsing that u have suggested is close to what i want.
>
> But I am still in need of certain solution to my first problem i.e. sorting
> on combined score of date and relevancy of document.

See the DisMax handler, especially the bf (boost function) parameter, which lets you add the score of a function query to the full-text relevancy score. An incomplete answer is here: http://wiki.apache.org/solr/SolrRelevancyFAQ

-Yonik
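To make the bf suggestion concrete, here is a small sketch that assembles a DisMax request URL with a recency boost. The host, the "created" date field, and the build_dismax_query helper are assumptions for illustration, as is a request handler registered under the name "dismax"; the recip(rord(...),1,1000,1000) form follows the style of example used in the Solr relevancy documentation and its constants would need tuning for real data:

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_dismax_query(base_url, user_query, date_field):
    """Build a select URL whose score mixes text relevancy with a date boost."""
    params = {
        "qt": "dismax",                                   # assumed handler name
        "q": user_query,
        "bf": f"recip(rord({date_field}),1,1000,1000)",   # newer docs score higher
        "fl": "*,score",
    }
    return f"{base_url}?{urlencode(params)}"


url = build_dismax_query("http://localhost:8983/solr/select",
                         "solr scaling", "created")
decoded = parse_qs(urlsplit(url).query)
print(decoded["bf"][0])
```

Because the function score is added to the relevancy score rather than replacing it, results stay ranked by a combination of both instead of being sorted strictly by date.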
Re: solr 1.3
Thanks. Would this be the latest code from the trunk that you mentioned?
http://people.apache.org/builds/lucene/solr/nightly/solr-2008-01-19.zip

climbingrose wrote:
> I don't think they (Solr developers) have a time frame for the 1.3 release.
> However, I've been using the latest code from the trunk and I can tell you
> it's quite stable. The only problem is that the documentation sometimes
> doesn't cover the latest changes in the code. You'll probably have to dig
> into the code itself or post a question here and many people will be happy
> to help you.

--
View this message in context: http://www.nabble.com/solr-1.3-tp14989395p14989689.html
Re: solr 1.3
I don't think they (Solr developers) have a time frame for the 1.3 release. However, I've been using the latest code from the trunk and I can tell you it's quite stable. The only problem is that the documentation sometimes doesn't cover the latest changes in the code. You'll probably have to dig into the code itself or post a question here and many people will be happy to help you.

On Jan 21, 2008 12:07 PM, anuvenk <[EMAIL PROTECTED]> wrote:
> when will this be released? where can i find the list of
> improvements/enhancements in 1.3 if its been documented already?

--
Regards,

Cuong Hoang
solr 1.3
When will Solr 1.3 be released? Where can I find the list of improvements/enhancements in 1.3, if it has been documented already?

--
View this message in context: http://www.nabble.com/solr-1.3-tp14989395p14989395.html
Re: spell check component
I just asked this too:
http://www.mail-archive.com/solr-user@lucene.apache.org/msg08515.html

On Jan 19, 2008, at 1:45 PM, anuvenk wrote:
> Is it possible to add a spell check component so i don't have to issue a
> separate request to solr to do the spell checking? Sorry if this question
> is naive..am just learning to use solr.
>
> and add it to the search handler like this
>
> spellcheck
>
> what would the name of the spell check component be?
> --
> View this message in context: http://www.nabble.com/spell-check-component-tp14973651p14973651.html