Re: Solr feasibility with terabyte-scale data

2008-01-20 Thread Otis Gospodnetic
Hi,
Some quick notes, since it's late here.

- You'll need to wait for SOLR-303 (distributed search) - there is no way even
a big machine will be able to search such a large index in a reasonable amount
of time, plus you may simply not have enough RAM for such a large index.

- I'd suggest you wait for Solr 1.3 (or some -dev version that uses the 
about-to-be-released Lucene 2.3), for performance reasons.

- As for avoiding index duplication - how about having a SAN with a single copy 
of the index that all searchers (and the master) point to?


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Phillip Farber <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Friday, January 18, 2008 5:26:21 PM
Subject: Solr feasibility with terabyte-scale data

Hello everyone,

We are considering Solr 1.2 to index and search a terabyte-scale dataset 
of OCR.  Initially our requirements are simple: basic tokenizing, score 
sorting only, no faceting.   The schema is simple too.  A document 
consists of a numeric id, stored and indexed and a large text field, 
indexed not stored, containing the OCR typically ~1.4Mb.  Some limited 
faceting or additional metadata fields may be added later.

The data in question currently amounts to about 1.1Tb of OCR (about 1M 
docs) which we expect to increase to 10Tb over time.  Pilot tests on the 
desktop w/ 2.6 GHz P4 with 2.5 Gb memory, java 1Gb heap on ~180 Mb of 
data via HTTP suggest we can index at a rate sufficient to keep up with 
the inputs (after getting over the 1.1 Tb hump).  We envision nightly 
commits/optimizes.

We expect to have low QPS (<10) rate and probably will not need 
millisecond query response.

Our environment makes available Apache on blade servers (Dell 1955 dual
dual-core 3.x GHz Xeons w/ 8GB RAM) connected to a *large*,
high-performance NAS system over a dedicated (out-of-band) GbE switch
(Dell PowerConnect 5324) using a 9K MTU (jumbo packets). We are starting
with 2 blades and will add as demands require.

While we have a lot of storage, the idea of master/slave Solr Collection 
Distribution to add more Solr instances clearly means duplicating an 
immense index.  Is it possible to use one instance to update the index 
on NAS while other instances only read the index and commit to keep 
their caches warm instead?

Should we expect Solr indexing time to slow significantly as we scale 
up?  What kind of query performance could we expect?  Is it totally 
naive even to consider Solr at this kind of scale?

Given these parameters is it realistic to think that Solr could handle 
the task?

Any advice/wisdom greatly appreciated,

Phil






Re: Storing Related Data - At Different Times

2008-01-20 Thread Otis Gospodnetic
You could have 2 separate indices tied with a common field (a la FK->PK).  Then 
you only need to change the item you are updating.
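
A minimal sketch of the two-index idea, assuming the personal details and the
resumes live in separate indices that share a user_id field. The field names,
the sample documents, and the helper join_resumes_with_people are hypothetical;
in practice the two lists would come from two separate Solr queries, and the
join itself happens in the application rather than in Solr.

```python
def join_resumes_with_people(resume_docs, person_docs, key="user_id"):
    """Attach each resume to its owner's personal details via a shared key.

    Both arguments are lists of dicts, standing in for the documents
    returned by two separate indices (hypothetical field names).
    """
    # Index the person documents by the shared key (the "PK" side).
    people_by_key = {person[key]: person for person in person_docs}

    joined = []
    for resume in resume_docs:
        # Look up the owner through the resume's "FK" field.
        person = people_by_key.get(resume[key])
        if person is None:
            continue  # resume without a matching person document
        joined.append({"resume": resume, "person": person})
    return joined


# Hypothetical sample data: one person, two of that person's resumes.
person_docs = [{"user_id": "u1", "name": "Alice", "city": "Colombo"}]
resume_docs = [
    {"resume_id": "r1", "user_id": "u1", "title": "Java developer"},
    {"resume_id": "r2", "user_id": "u1", "title": "Team lead"},
]

rows = join_resumes_with_people(resume_docs, person_docs)
for row in rows:
    print(row["resume"]["resume_id"], row["person"]["name"])
```

With this layout, a change to the personal details means re-indexing one person
document; the resume documents stay untouched.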

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Gavin <[EMAIL PROTECTED]>
To: solr-user 
Sent: Monday, January 21, 2008 12:09:23 AM
Subject: Storing Related Data - At Different Times

Hi,
In the web application we are developing we have two sets of details:
the personal details and the resume details. We allow 5 different
resumes to be available for each user, but we want the personal details
to remain the same for all 5 resumes. Personal details are added at
registration time. After that, for each resume we want to link the personal
details. This is a simple join in the db, but how do we achieve this in
Solr? The problem is that when personal details are changed we will have to
update all 5 resumes. 

I read the thread "Some sort of join in SOLR?" but am not sure it
answers my problem. I would very much appreciate some help on this
one.

Thanks,
-- 
Gavin Selvaratnam,
Project Leader

hSenid Mobile Solutions
Phone: +94-11-2446623/4 
Fax: +94-11-2307579 

Web: http://www.hSenidMobile.com 
 
Make it happen

Disclaimer: This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to which they are
addressed. The content and opinions contained in this email are not necessarily
those of hSenid Software International.
If you have received this email in error please contact the sender.






Re: spellcheckhandler

2008-01-20 Thread Otis Gospodnetic
You don't need to wait for 1.3 to be released - you can simply use a recent 
nightly build.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: anuvenk <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Monday, January 21, 2008 12:35:52 AM
Subject: Re: spellcheckhandler


I followed the steps outlined in 
http://wiki.apache.org/solr/SpellCheckerRequestHandler
with regard to setting up the schema with a new field 'spell' and
copying other fields to this 'spell' field at index time.
It works fine with single-word queries but doesn't return anything for
multi-word queries. I read previous posts where this has been discussed. I
read that some of the active members are in the process of releasing patches
that fix this problem. I'm actually trying to implement this spell check
in the production setup. Is it absolutely not possible to get spell check
results back for multi-word queries? Should I wait for the 1.3 release? If there
is any other option, please educate me. In case a patch was already released,
how do I add it to the current 1.2 version that I'm using?
-- 
View this message in context:
 http://www.nabble.com/spellcheckhandler-tp14627712p14991534.html
Sent from the Solr - User mailing list archive at Nabble.com.






Re: Update the index

2008-01-20 Thread anuvenk

http://wiki.apache.org/solr/UpdateXmlMessages
Is this what you are looking for? Index the document again and it should
overwrite the older one with the same id.
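
As a rough illustration of the XML update message described on that wiki page,
the sketch below builds an <add> message for a single document. It assumes the
schema's unique key field is named "id"; the "title" field and the helper
build_add_message are hypothetical. The message would then be POSTed to the
update handler and followed by a commit, which is not shown here.

```python
import xml.etree.ElementTree as ET


def build_add_message(fields):
    """Build a Solr XML <add> message for one document.

    `fields` maps field names to values. Re-sending a document whose
    unique key matches an existing one replaces the older document.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", {"name": name})
        field.text = str(value)
    # encoding="unicode" returns a str rather than bytes
    return ET.tostring(add, encoding="unicode")


# Hypothetical document: same id as before, new title.
message = build_add_message({"id": "42", "title": "Updated title"})
print(message)
```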

Gavin-39 wrote:
> 
> Hi,
>   Can someone point me to a location that describes how to update an
> already indexed document? I was thinking there is and  tag
> explained somewhere but can't find it.
> 
> Thanks,
> -- 
> Gavin Selvaratnam,
> Project Leader
> 
> hSenid Mobile Solutions
> Phone: +94-11-2446623/4 
> Fax: +94-11-2307579 
> 
> Web: http://www.hSenidMobile.com 
>  
> Make it happen
> 
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Update-the-index-tp14991443p14991551.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: spellcheckhandler

2008-01-20 Thread anuvenk

I followed the steps outlined in 
http://wiki.apache.org/solr/SpellCheckerRequestHandler
with regard to setting up the schema with a new field 'spell' and
copying other fields to this 'spell' field at index time.
It works fine with single-word queries but doesn't return anything for
multi-word queries. I read previous posts where this has been discussed. I
read that some of the active members are in the process of releasing patches
that fix this problem. I'm actually trying to implement this spell check
in the production setup. Is it absolutely not possible to get spell check
results back for multi-word queries? Should I wait for the 1.3 release? If there
is any other option, please educate me. In case a patch was already released,
how do I add it to the current 1.2 version that I'm using?
-- 
View this message in context: 
http://www.nabble.com/spellcheckhandler-tp14627712p14991534.html
Sent from the Solr - User mailing list archive at Nabble.com.



Update the index

2008-01-20 Thread Gavin
Hi,
Can someone point me to a location that describes how to update an
already indexed document? I was thinking there is and  tag
explained somewhere but can't find it.

Thanks,
-- 
Gavin Selvaratnam,
Project Leader

hSenid Mobile Solutions
Phone: +94-11-2446623/4 
Fax: +94-11-2307579 

Web: http://www.hSenidMobile.com 
 
Make it happen




Storing Related Data - At Different Times

2008-01-20 Thread Gavin
Hi,
In the web application we are developing we have two sets of details:
the personal details and the resume details. We allow 5 different
resumes to be available for each user, but we want the personal details
to remain the same for all 5 resumes. Personal details are added at
registration time. After that, for each resume we want to link the personal
details. This is a simple join in the db, but how do we achieve this in
Solr? The problem is that when personal details are changed we will have to
update all 5 resumes. 

I read the thread "Some sort of join in SOLR?" but am not sure it
answers my problem. I would very much appreciate some help on this
one.

Thanks,
-- 
Gavin Selvaratnam,
Project Leader

hSenid Mobile Solutions
Phone: +94-11-2446623/4 
Fax: +94-11-2307579 

Web: http://www.hSenidMobile.com 
 
Make it happen




Term vector

2008-01-20 Thread anuvenk

What are term vectors? How do they help with MLT (MoreLikeThis)?
-- 
View this message in context: 
http://www.nabble.com/Term-vector-tp14990408p14990408.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: solr 1.3

2008-01-20 Thread Edward Zhang
http://svn.apache.org/repos/asf/lucene/solr/trunk
or
http://lucene.apache.org/solr/version_control.html


On 1/21/08, anuvenk <[EMAIL PROTECTED]> wrote:
>
>
> Could you please let me know where I can get it?
>
> climbingrose wrote:
> >
> > I'm using code pulled directly from Subversion.
> >
> > On Jan 21, 2008 12:34 PM, anuvenk <[EMAIL PROTECTED]> wrote:
> >
> >>
> >> Thanks. Would this be the latest code from the trunk that you mentioned?
> >> http://people.apache.org/builds/lucene/solr/nightly/solr-2008-01-19.zip
> >>
> >>
> >> climbingrose wrote:
> >> >
> >> > I don't think they (Solr developers) have a time frame for the 1.3 release.
> >> > However, I've been using the latest code from the trunk and I can tell you
> >> > it's quite stable. The only problem is the documentation sometimes doesn't
> >> > cover the latest changes in the code. You'll probably have to dig into the code
> >> > itself or post a question here and many people will be happy to help you.
> >> >
> >> > On Jan 21, 2008 12:07 PM, anuvenk <[EMAIL PROTECTED]> wrote:
> >> >
> >> >>
> >> >> when will this be released? where can i find the list of
> >> >> improvements/enhancements in 1.3 if its been documented already?
> >> >> --
> >> >> View this message in context:
> >> >> http://www.nabble.com/solr-1.3-tp14989395p14989395.html
> >> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >> >>
> >> >>
> >> >
> >> >
> >> > --
> >> > Regards,
> >> >
> >> > Cuong Hoang
> >> >
> >> >
> >>
> >> --
> >> View this message in context:
> >> http://www.nabble.com/solr-1.3-tp14989395p14989689.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >>
> >
> >
> > --
> > Regards,
> >
> > Cuong Hoang
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/solr-1.3-tp14989395p14989802.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: solr 1.3

2008-01-20 Thread anuvenk

Could you please let me know where I can get it?

climbingrose wrote:
> 
> I'm using code pulled directly from Subversion.
> 
> On Jan 21, 2008 12:34 PM, anuvenk <[EMAIL PROTECTED]> wrote:
> 
>>
>> Thanks. Would this be the latest code from the trunk that you mentioned?
>> http://people.apache.org/builds/lucene/solr/nightly/solr-2008-01-19.zip
>>
>>
>> climbingrose wrote:
>> >
>> > I don't think they (Solr developers) have a time frame for the 1.3 release.
>> > However, I've been using the latest code from the trunk and I can tell you
>> > it's quite stable. The only problem is the documentation sometimes doesn't
>> > cover the latest changes in the code. You'll probably have to dig into the code
>> > itself or post a question here and many people will be happy to help you.
>> >
>> > On Jan 21, 2008 12:07 PM, anuvenk <[EMAIL PROTECTED]> wrote:
>> >
>> >>
>> >> when will this be released? where can i find the list of
>> >> improvements/enhancements in 1.3 if its been documented already?
>> >> --
>> >> View this message in context:
>> >> http://www.nabble.com/solr-1.3-tp14989395p14989395.html
>> >> Sent from the Solr - User mailing list archive at Nabble.com.
>> >>
>> >>
>> >
>> >
>> > --
>> > Regards,
>> >
>> > Cuong Hoang
>> >
>> >
>>
>> --
>> View this message in context:
>> http://www.nabble.com/solr-1.3-tp14989395p14989689.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> -- 
> Regards,
> 
> Cuong Hoang
> 
> 

-- 
View this message in context: 
http://www.nabble.com/solr-1.3-tp14989395p14989802.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: solr 1.3

2008-01-20 Thread climbingrose
I'm using code pulled directly from Subversion.

On Jan 21, 2008 12:34 PM, anuvenk <[EMAIL PROTECTED]> wrote:

>
> Thanks. Would this be the latest code from the trunk that you mentioned?
> http://people.apache.org/builds/lucene/solr/nightly/solr-2008-01-19.zip
>
>
> climbingrose wrote:
> >
> > I don't think they (Solr developers) have a time frame for the 1.3 release.
> > However, I've been using the latest code from the trunk and I can tell you
> > it's quite stable. The only problem is the documentation sometimes doesn't
> > cover the latest changes in the code. You'll probably have to dig into the code
> > itself or post a question here and many people will be happy to help you.
> >
> > On Jan 21, 2008 12:07 PM, anuvenk <[EMAIL PROTECTED]> wrote:
> >
> >>
> >> when will this be released? where can i find the list of
> >> improvements/enhancements in 1.3 if its been documented already?
> >> --
> >> View this message in context:
> >> http://www.nabble.com/solr-1.3-tp14989395p14989395.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >>
> >
> >
> > --
> > Regards,
> >
> > Cuong Hoang
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/solr-1.3-tp14989395p14989689.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
Regards,

Cuong Hoang


Re: Getting multiple category results

2008-01-20 Thread Yonik Seeley
On Jan 18, 2008 11:54 PM, muddassir hasan <[EMAIL PROTECTED]> wrote:
> Thanks Ryan,
>
> Field collapsing that you have suggested is close to what I want.
>
> But I still need a solution to my first problem, i.e. sorting
> on a combined score of date and relevancy of the document.

See the DisMax handler, esp the bf parameter (boost function) where
you can add the score of a function query into the full-text relevancy
score.

 An incomplete answer is here:
http://wiki.apache.org/solr/SolrRelevancyFAQ
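
As a hedged sketch of what such a request might look like, the snippet below
builds a query URL that selects the DisMax handler and passes a bf boost
function favoring recent documents. The host, the handler name "dismax"
(selected via qt), the date field name "created", and the constants inside
recip(...) are assumptions for illustration and would need tuning against the
real schema and solrconfig.xml.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_dismax_url(base_url, user_query, date_field):
    """Build a select URL whose bf adds a recency function to the score."""
    params = {
        "qt": "dismax",   # assumes a handler registered under this name
        "q": user_query,
        # recip(rord(f),1,1000,1000) yields larger values for newer documents;
        # its result is added into the full-text relevancy score.
        "bf": f"recip(rord({date_field}),1,1000,1000)",
    }
    return base_url + "?" + urlencode(params)


# Hypothetical host and field name.
url = build_dismax_url("http://localhost:8983/solr/select", "solr feasibility", "created")
decoded = parse_qs(urlsplit(url).query)
print(url)
print(decoded["bf"][0])
```

Because the function score is added to the relevancy score rather than
replacing it, the relative weight of recency depends on the constants chosen
and on the typical magnitude of the text score.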

-Yonik


Re: solr 1.3

2008-01-20 Thread anuvenk

Thanks. Would this be the latest code from the trunk that you mentioned?
http://people.apache.org/builds/lucene/solr/nightly/solr-2008-01-19.zip


climbingrose wrote:
> 
> I don't think they (Solr developers) have a time frame for the 1.3 release.
> However, I've been using the latest code from the trunk and I can tell you
> it's quite stable. The only problem is the documentation sometimes doesn't
> cover the latest changes in the code. You'll probably have to dig into the
> code itself or post a question here and many people will be happy to help you.
> 
> On Jan 21, 2008 12:07 PM, anuvenk <[EMAIL PROTECTED]> wrote:
> 
>>
>> when will this be released? where can i find the list of
>> improvements/enhancements in 1.3 if its been documented already?
>> --
>> View this message in context:
>> http://www.nabble.com/solr-1.3-tp14989395p14989395.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> -- 
> Regards,
> 
> Cuong Hoang
> 
> 

-- 
View this message in context: 
http://www.nabble.com/solr-1.3-tp14989395p14989689.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: solr 1.3

2008-01-20 Thread climbingrose
I don't think they (Solr developers) have a time frame for the 1.3 release.
However, I've been using the latest code from the trunk and I can tell you
it's quite stable. The only problem is the documentation sometimes doesn't
cover the latest changes in the code. You'll probably have to dig into the code
itself or post a question here and many people will be happy to help you.

On Jan 21, 2008 12:07 PM, anuvenk <[EMAIL PROTECTED]> wrote:

>
> when will this be released? where can i find the list of
> improvements/enhancements in 1.3 if its been documented already?
> --
> View this message in context:
> http://www.nabble.com/solr-1.3-tp14989395p14989395.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
Regards,

Cuong Hoang


solr 1.3

2008-01-20 Thread anuvenk

When will this be released? Where can I find the list of
improvements/enhancements in 1.3, if it's been documented already?
-- 
View this message in context: 
http://www.nabble.com/solr-1.3-tp14989395p14989395.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: spell check component

2008-01-20 Thread Grant Ingersoll

I just asked this too: 
http://www.mail-archive.com/solr-user@lucene.apache.org/msg08515.html

On Jan 19, 2008, at 1:45 PM, anuvenk wrote:



Is it possible to add a spell check component so I don't have to issue a
separate request to Solr to do the spell checking? Sorry if this question is
naive; I am just learning to use Solr.



and add it to the search handler like this


 spellcheck


what would the name of the spell check component be?

--
View this message in context: 
http://www.nabble.com/spell-check-component-tp14973651p14973651.html
Sent from the Solr - User mailing list archive at Nabble.com.