Re: performance of million documents search

2010-04-25 Thread weiqi wang
Hi Erick,

It's very useful. Thank you very much.

2010/4/26 Erick Erickson 

> NGrams might help here, search the SOLR list for "NGram"
> and I think you'll find that this subject has been discussed
> several times...
>
> HTH
> Erick
>
> On Sat, Apr 24, 2010 at 9:26 PM, weiqi wang  wrote:
>
> > Hi,
> >
> > I have about 2 million documents in my index.  I want to search them by a
> > string field.  Every document has this field, with values such as 'LB681'.
> >
> > The field is a dynamic field whose type is string.  So, in solr/admin, I do
> > a search using "  PartNo_s:L*  ", which means starts with L,
> >
> > and I can get results from the 2 million documents in less than 300 ms.  But
> > when I use "  PartNo_s:*B68*  ", which means contains B68,
> >
> > it takes more than 2000 ms.  That is too slow for me.
> >
> > Does anyone know how I can get results faster?
> >
> >
> > thank you very much
> >
>


Re: hybrid approach to using cloud servers for Solr/Lucene

2010-04-25 Thread findbestopensource
Hello Dennis

>> If the load goes up, then queries are sent to the cloud at a certain point.
My advice is to do load balancing between local and cloud.  Your local
system seems to be capable, as it is a dedicated host.  Another option is to
do the indexing locally and sync it with the cloud; the cloud would then be
used only for search.

Hope it helps.

Regards
Aditya
www.findbestopensource.com
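
[Editor's note: as a rough sketch of the load-balancing idea above (an illustration, not something from this thread), SolrJ's LBHttpSolrServer can round-robin queries across the local box and a cloud node. The URLs are placeholders and assume the cloud index is a synced copy of the local one.]

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.LBHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class LocalPlusCloudSearch {
    public static void main(String[] args) throws Exception {
        // Round-robin queries across the local dedicated host and a cloud node.
        // Both URLs are placeholders; the cloud index is assumed to hold the same data.
        LBHttpSolrServer lb = new LBHttpSolrServer(
                "http://localhost:8983/solr",
                "http://cloud-host.example.com:8983/solr");

        QueryResponse rsp = lb.query(new SolrQuery("PartNo_s:LB681"));
        System.out.println("hits: " + rsp.getResults().getNumFound());
    }
}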


On Mon, Apr 26, 2010 at 7:47 AM, Dennis Gearon wrote:

> I'm working on an app that could grow much faster and bigger than I could
> scale local resources, at least on certain dates and for other reasons.
>
> So I'd like to run a local machine in a dedicated host or even virtual
> machine at a host.
>
> If the load goes up, then queries are sent to the cloud at a certain point.
>
> Is this practical? Does anyone have experience with this?
>
> This is obviously a search engine app based on solr/lucene if someone is
> wondering.
>
> Dennis Gearon
>
> Signature Warning
> 
> EARTH has a Right To Life,
>  otherwise we all die.
>
> Read 'Hot, Flat, and Crowded'
> Laugh at http://www.yert.com/film.php
>


Re: DIH: inner select fails when outer entity is null/empty

2010-04-25 Thread Otis Gospodnetic
Hi,

Thanks for this tip, Paul.  But what if this is not an error?  Is this what
transformers should be used for somehow?

Thanks,
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
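
[Editor's note: for the transformer route Otis asks about, a minimal sketch of a custom DataImportHandler transformer that replaces null column values with a safe default might look like the following. The class name, the placeholder value, and the registration attribute are assumptions for illustration, not something from this thread; the config-level alternative is the onError="skip" attribute Paul suggests below.]

import java.util.Map;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

// Hypothetical transformer: registered on the outer entity via
// transformer="com.example.NullToDefaultTransformer" in data-config.xml,
// it rewrites null column values so the inner entity's SQL stays valid.
public class NullToDefaultTransformer extends Transformer {

    @Override
    public Object transformRow(Map<String, Object> row, Context context) {
        for (Map.Entry<String, Object> e : row.entrySet()) {
            if (e.getValue() == null) {
                e.setValue("-1");   // assumed "safe" placeholder for the inner query
            }
        }
        return row;
    }
}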



- Original Message 
> From: Noble Paul നോബിള്‍  नोब्ळ् 
> To: solr-user@lucene.apache.org
> Sent: Sun, April 25, 2010 9:16:22 AM
> Subject: Re: DIH: inner select fails when outer entity is null/empty
> 
> do an onError="skip" on the inner entity
> 
> On Fri, Apr 23, 2010 at 3:56 AM, Otis Gospodnetic
>  wrote:
> > Hello,
> >
> > Here is a newbie DataImportHandler question:
> >
> > Currently, I have entities with entities.  There are some
> > situations where a column value from the outer entity is null, and when I try
> > to use it in the inner entity, the null just gets replaced with an
> > empty string.  That in turn causes the SQL query in the inner entity to
> > fail.
> >
> > This seems like a common problem, but I couldn't find any solutions or
> > mention in the FAQ ( http://wiki.apache.org/solr/DataImportHandlerFaq )
> >
> > What is the best practice to avoid or convert null values to something safer?
> > Would this be done via a Transformer or is there a better mechanism for this?
> >
> > I think the problem I'm describing is similar to what was described here:
> > http://search-lucene.com/m/cjlhtFkG6m
> > ... except I don't have the luxury of rewriting the SQL selects.
> >
> > Thanks,
> > Otis
> > 
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Lucene ecosystem search :: http://search-lucene.com/
> >
> 
> -- 
> -
> Noble Paul | Systems Architect| AOL | http://aol.com


Re: hybrid approach to using cloud servers for Solr/Lucene

2010-04-25 Thread Otis Gospodnetic
Hi,

Hm.  Everything is doable, but this sounds a bit undefined and possibly messy.  
If flexibility is of such importance, why have the "local" part at all?  Why 
not have everything in an elastic cloud environment?
 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Dennis Gearon 
> To: solr-user@lucene.apache.org
> Sent: Sun, April 25, 2010 10:17:11 PM
> Subject: hybrid approach to using cloud servers for Solr/Lucene
> 
> I'm working on an app that could grow much faster and bigger than I could
> scale local resources, at least on certain dates and for other reasons.
> 
> So I'd like to run a local machine in a dedicated host or even virtual
> machine at a host.
> 
> If the load goes up, then queries are sent to the cloud at a certain point.
> 
> Is this practical? Does anyone have experience with this?
> 
> This is obviously a search engine app based on solr/lucene if someone is
> wondering.
> 
> Dennis Gearon
> 
> Signature Warning
> 
> EARTH has a Right To Life,
>   otherwise we all die.
> 
> Read 'Hot, Flat, and Crowded'
> Laugh at http://www.yert.com/film.php


RE: Howto build a function query using the 'query' function

2010-04-25 Thread Villemos, Gert
If the 'query' function returned a count, yes. But my problem is exactly that: as far
as I can see from the description of the 'query' function, it does NOT return
the count but the score of the search.
 
So my question is:
 
How can I write a 'query' function that returns a count, not a score?
 
Cheers,
Gert.
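
[Editor's note: a common workaround (an assumption on my part, not something settled in this thread) is to compute the reference count at index time into its own field and boost on that field directly, which is what Koji's linear() suggestion quoted below amounts to. A SolrJ sketch, with ref_count as a hypothetical integer field and the qf list likewise assumed:]

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ReferenceCountBoost {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery q = new SolrQuery("some user query");
        q.set("defType", "dismax");
        q.set("qf", "title body");                 // assumed field list
        // 0 refs -> 1.0, 1 ref -> 1.01, 100 refs -> 2.0 (matching the figures quoted below)
        q.set("bf", "linear(ref_count,0.01,1)");

        QueryResponse rsp = server.query(q);
        System.out.println(rsp.getResults().getNumFound() + " hits");
    }
}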
 
 



From: Koji Sekiguchi [mailto:k...@r.email.ne.jp]
Sent: Sun 4/25/2010 2:15 AM
To: solr-user@lucene.apache.org
Subject: Re: Howto build a function query using the 'query' function



Villemos, Gert wrote:
> I want to build a function expression for a dismax request handler 'bf'
> field, to boost the documents if it is referenced by other documents.
> I.e. the more often a document is referenced, the higher the boost.
>
> 
>
> Something like
>
> 
> linear(query(myQueryReturningACountOfHowOftenThisDocumentIsReferenced, 1), 0.01, 1)
>
> 
>
> Intended to mean;
>
> if count is 0, then the boost is 0*0.01+1 = 1
>
> if count is 1, then the boost is 1*0.01+1 = 1.01
>
> If count is 100, then the boost is 100*0.01 + 1 = 2
>
> 
>
> However the query function
> (http://wiki.apache.org/solr/FunctionQuery#query) seems to only be able
> to return the score of the query results, not the count of results.
>
>  
Probably I'm missing something, but doesn't just using
linear function meet your needs? i.e.

linear(myQueryReturningACountOfHowOftenThisDocumentIsReferenced, 0.01,1)

Koji

--
http://www.rondhuit.com/en/









hybrid approach to using cloud servers for Solr/Lucene

2010-04-25 Thread Dennis Gearon
I'm working on an app that could grow much faster and bigger than I could scale 
local resources, at least on certain dates and for other reasons.

So I'd like to run a local machine in a dedicated host or even virtual machine 
at a host.

If the load goes up, then queries are sent to the cloud at a certain point.

Is this practical? Does anyone have experience with this?

This is obviously a search engine app based on solr/lucene if someone is 
wondering.

Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


Re: performance of million documents search

2010-04-25 Thread Erick Erickson
NGrams might help here, search the SOLR list for "NGram"
and I think you'll find that this subject has been discussed
several times...

HTH
Erick
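
[Editor's note: to illustrate why n-grams help here (a sketch only; in Solr the real work is done by n-gram analysis configured in the schema): if the PartNo value is broken into substrings at index time, a middle-of-the-string lookup like B68 becomes an exact term match instead of a *B68* wildcard scan. Roughly:]

import java.util.ArrayList;
import java.util.List;

public class NGramSketch {
    // Roughly what an n-gram filter does at index time: emit all substrings
    // of length minGram..maxGram, so "B68" is indexed as an ordinary term.
    static List<String> ngrams(String value, int minGram, int maxGram) {
        List<String> grams = new ArrayList<String>();
        for (int len = minGram; len <= maxGram; len++) {
            for (int start = 0; start + len <= value.length(); start++) {
                grams.add(value.substring(start, start + len));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // [LB, B6, 68, 81, LB6, B68, 681] -- the query B68 is now a plain term lookup
        System.out.println(ngrams("LB681", 2, 3));
    }
}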

On Sat, Apr 24, 2010 at 9:26 PM, weiqi wang  wrote:

> Hi,
>
> I have about 2 million documents in my index.  I want to search them by a
> string field.  Every document has this field, with values such as 'LB681'.
>
> The field is a dynamic field whose type is string.  So, in solr/admin, I do
> a search using "  PartNo_s:L*  ", which means starts with L,
>
> and I can get results from the 2 million documents in less than 300 ms.  But
> when I use "  PartNo_s:*B68*  ", which means contains B68,
>
> it takes more than 2000 ms.  That is too slow for me.
>
> Does anyone know how I can get results faster?
>
>
> thank you very much
>


Re: [spAm] Solr does not honor facet.mincount and field.facet.mincount

2010-04-25 Thread Chris Hostetter
: REQUEST: 
: 
http://localhost:8983/solr/select/?q=*%3A*&version=2.2&rows=0&start=0&indent=on&facet=true&facet.field=Instrument&facet.field=Location&facet.mincount=9
: 
: RESPONSE: 
...
:   <lst name="params">
...
:     <str name="facet.minCount">9</str>

...the REQUEST url you listed says facet.mincount, but the response from 
Solr disagrees.  according to it you actually had a capital "C" in 
facet.minCount ... solr params are case sensitive, so Solr is completely 
ignoring facet.minCount.
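
[Editor's note: one way to sidestep the case-sensitivity trap entirely (a sketch, not part of the original reply) is to let SolrJ spell the facet parameters for you:]

import org.apache.solr.client.solrj.SolrQuery;

public class FacetMinCountExample {
    public static void main(String[] args) {
        SolrQuery q = new SolrQuery("*:*");
        q.setRows(0);
        q.setFacet(true);
        q.addFacetField("Instrument", "Location");
        // Emits the correctly spelled, all-lowercase facet.mincount=9
        q.setFacetMinCount(9);
        System.out.println(q);   // prints the generated query string
    }
}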


As for why you don't get any values for the Instrument facet -- 
understanding that requires you to tell us more about the field/fieldType 
for Instrument.


-Hoss



local vs cloud

2010-04-25 Thread Dennis Gearon
I'm working on an app that could grow much faster and bigger than I could scale 
local resources, at least on certain dates and for other reasons.

So I'd like to run a local machine in a dedicated host or even virtual machine 
at a host.

If the load goes up, then queries are sent to the cloud at a certain point.

Is this practical? Does anyone have experience with this?

This is obviously a search engine app based on solr/lucene if someone is 
wondering.

Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


Re: DIH: inner select fails when outer entity is null/empty

2010-04-25 Thread Noble Paul നോബിള്‍ नोब्ळ्
do an onError="skip" on the inner entity

On Fri, Apr 23, 2010 at 3:56 AM, Otis Gospodnetic
 wrote:
> Hello,
>
> Here is a newbie DataImportHandler question:
>
> Currently, I have entities with entities.  There are some
> situations where a column value from the outer entity is null, and when I try 
> to use it in the inner entity, the null just gets replaced with an
> empty string.  That in turn causes the SQL query in the inner entity to
> fail.
>
> This seems like a common problem, but I couldn't find any solutions or 
> mention in the FAQ ( http://wiki.apache.org/solr/DataImportHandlerFaq )
>
> What is the best practice to avoid or convert null values to something safer? 
>  Would
> this be done via a Transformer or is there a better mechanism for this?
>
> I think the problem I'm describing is similar to what was described here:  
> http://search-lucene.com/m/cjlhtFkG6m
> ... except I don't have the luxury of rewriting the SQL selects.
>
> Thanks,
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>



-- 
-
Noble Paul | Systems Architect| AOL | http://aol.com


Re: How to setup search engine for B2B web app

2010-04-25 Thread Shalin Shekhar Mangar
Hi Bill,

On Sun, Apr 25, 2010 at 12:23 PM, Bill Paetzke wrote:

> *Given:*
>
>   - 1 database per client (business customer)
>   - 5000 clients
>   - Clients have between 2 to 2000 users (avg is ~100 users/client)
>   - 100k to 10 million records per database
>   - Users need to search those records often (it's the best way to navigate
>   their data)
>
> *The Question:*
>
> How would you setup Solr (or Lucene) search so that each client can only
> search within its database?
>
> How would you setup the index(es)?
>

I'd look at setting up a separate core for each client. You may need to set up
slaves as well depending on search traffic.


> Where do you store the index(es)?
>

Setting up 5K cores on one box will not work, so you will need to partition
the clients across multiple boxes, each hosting a subset of the cores.


> Would you need to add a filter to all search queries?
>

Nope, but you will need to send the query to the correct host (perhaps a
mapping DB will help)


> If a client cancelled, how would you delete their (part of the) index?
> (this
> may be trivial--not sure yet)
>
>
With different cores for each client, this'd be pretty easy.
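
[Editor's note: a rough SolrJ sketch of the per-client-core routing idea above. The core naming scheme, the host lookup, and the unload call are all assumptions for illustration, not part of Shalin's reply.]

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class PerClientCoreRouting {
    // Hypothetical mapping: clientId -> host that holds that client's core
    // (in practice this would come from a mapping DB, as suggested above).
    static String hostFor(long clientId) {
        return "http://solr-shard-" + (clientId % 10) + ".example.com:8983/solr";
    }

    // Each client gets its own core, e.g. "client_42", so queries are isolated
    // to that client's data without any extra filter query.
    static SolrServer serverFor(long clientId) throws Exception {
        return new CommonsHttpSolrServer(hostFor(clientId) + "/client_" + clientId);
    }

    public static void main(String[] args) throws Exception {
        SolrServer server = serverFor(42);
        System.out.println(server.query(new SolrQuery("invoice")).getResults().getNumFound());

        // If client 42 cancels, unload just their core via the core admin API
        // (removing the index directory on disk is a separate cleanup step).
        SolrServer admin = new CommonsHttpSolrServer(hostFor(42));
        CoreAdminRequest.unloadCore("client_42", admin);
    }
}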

-- 
Regards,
Shalin Shekhar Mangar.