Re: ExternalFileField best practices

2010-08-28 Thread Andy
But isn't it the case that bf adds the boost value while {!boost} multiplies by
the boost value? In my case I think multiplication is more appropriate.

So there's no way to use ExternalFileField in {!boost}?
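
To make the difference concrete, here is a minimal sketch of the arithmetic only, not of Solr itself. It assumes the additive form behaves like score + log(popularity) and the multiplicative form like score * log(popularity), ignores query normalization, and uses a base-10 log. The documents, scores, and popularity values are made up for illustration.

```python
import math

def additive_score(base, popularity):
    """bf-style: the function value is added to the relevance score."""
    return base + math.log10(popularity)

def multiplicative_score(base, popularity):
    """{!boost}-style: the relevance score is multiplied by the function value."""
    return base * math.log10(popularity)

# Hypothetical documents: (name, relevance score, popularity)
docs = [("A", 0.5, 1000), ("B", 2.0, 10)]

for name, base, pop in docs:
    print(name, additive_score(base, pop), multiplicative_score(base, pop))
```

With these made-up numbers the additive form ranks the weakly matching but popular document A first, while the multiplicative form keeps the better-matching document B first. That is the usual argument for preferring multiplication when relevance should stay the dominant factor.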


Re: ExternalFileField best practices

2010-08-28 Thread Lance Norskog
You want the boost function bf= parameter.

-- 
Lance Norskog
goks...@gmail.com


Re: Multiple passes with WordDelimiterFilterFactory

2010-08-28 Thread Shawn Heisey
It's metadata for a collection of 45 million documents that is mostly 
photos, with some videos and text.  The data is imported from a MySQL 
database and split among six large shards (each nearly 13GB) and a small 
shard with data added in the last week, which usually works out to 
between 300,000 and 500,000 documents.


My goal is to reduce the index size without reducing the functionality.
Using copyField would just make it larger.


The biggest issue to solve is making sure that I don't have two terms
when there's a punctuation character at the beginning or end of a word.
For instance, one chunk of text that I just analyzed ends up with
terms like the following, which are unneeded duplicates:


championship.
championship
'04
04
wisconsin.
wisconsin

Since I was already toying around, I just tested the whole notion with 
the analysis tool.  I configured two filter steps - the first with just 
generateWordParts and catenateWords enabled, the second with all the 
options including preserveOriginal enabled.  A test analysis of input 
with 59 whitespace-separated words showed 93 terms with the single 
filter and 77 with two.  The only drop in term quality that I noticed 
was that possessive words (apostrophe-s) no longer have the original 
preserved.  I haven't yet decided whether that's a problem.
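
As a rough illustration of why the second pass produces fewer terms, here is a small simulation. It is not WordDelimiterFilterFactory and does not reproduce its options; the two helper functions are hypothetical stand-ins that only model "keep the original plus a punctuation-stripped form" versus "strip edge punctuation first, then keep the original".

```python
import string

def strip_edge_punct(token):
    """Remove punctuation from both ends of a token."""
    return token.strip(string.punctuation)

def terms_single_pass(tokens):
    """Emit each original token plus its stripped form when they differ."""
    out = []
    for token in tokens:
        out.append(token)
        stripped = strip_edge_punct(token)
        if stripped and stripped != token:
            out.append(stripped)
    return out

def terms_two_pass(tokens):
    """Strip edge punctuation first, then run the preserve-original pass."""
    cleaned = [strip_edge_punct(t) for t in tokens]
    return terms_single_pass([t for t in cleaned if t])

tokens = ["championship.", "'04", "wisconsin."]
print(terms_single_pass(tokens))
print(terms_two_pass(tokens))
```

In this toy model the single pass emits both "championship." and "championship", while the two-pass version emits only the cleaned form, which mirrors the duplicate terms listed above.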


Shawn


On 8/27/2010 11:00 AM, Erick Erickson wrote:

I agree with Marcus: the usefulness of passing through WDF twice
is suspect. You can always do a copyField to a completely different
field and do whatever you want there; copyField forks the raw input
to the second field, not the analyzed stream...

What is it you're really trying to accomplish? Your use-case would
help us help you.

About defining things differently in index and analysis. Sure, it can
make sense. But, especially with WDF it's tricky. Spend some
significant time in the admin analysis page looking at the effects
of various configurations before you decide.

Best
Erick




Re: ExternalFileField best practices

2010-08-28 Thread Andy
Lance,

Thanks for the response.

Can I use an ExternalFileField as an input to a boost query?

For example, if I put the field "popularity" in an ExternalFileField, can I 
still use "popularity" in a boosted query such as:

{!boost b=log(popularity)}foo

The doc says ExternalFileField can only be used in FunctionQuery. Does that 
include a boost query like {!boost b=log(popularity)}?


Re: ExternalFileField best practices

2010-08-28 Thread Lance Norskog
The file is completely reloaded when you commit or optimize. There is
no incremental update available. And, yes, this could be a scaling
problem.

How you update it is completely external to Solr.

-- 
Lance Norskog
goks...@gmail.com
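
Since the whole file is reloaded and the update mechanism is left to the application, one possible approach is to keep the values in memory, rewrite the full file to a temporary path, and swap it into place atomically. The sketch below assumes a single writer process (concurrent threads would funnel their changes into the in-memory dict under a lock) and a hypothetical file name external_popularity. It still rewrites the file in O(n), and a commit is still needed for Solr to pick up the new values.

```python
import os
import tempfile

def write_external_file(path, values):
    """Rewrite the whole key=value file, then atomically swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp_popularity_")
    try:
        with os.fdopen(fd, "w") as tmp:
            for key in sorted(values):
                tmp.write(f"{key}={values[key]}\n")
            tmp.flush()
            os.fsync(tmp.fileno())
        # Readers see either the old file or the new one, never a partial write.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise

if __name__ == "__main__":
    workdir = tempfile.mkdtemp()  # stand-in for the real data directory
    target = os.path.join(workdir, "external_popularity")  # hypothetical name
    popularity = {"1": 12, "2": 4, "3": 45, "5": 78}
    popularity["1"] = 13  # document 1 becomes more popular
    write_external_file(target, popularity)
    print(open(target).read())
```

The atomic rename addresses the concurrency concern for readers of the file. It does not remove the full rewrite or the full reload on commit, so the scaling concern raised above still applies.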


Re: JVM GC is very frequent.

2010-08-28 Thread Bill Au
Besides frequency, you should also look at the duration of GC events.  You may
want to try the concurrent garbage collector if you see many long full GCs.

Bill

2010/8/25 Chengyang 

> We have about 500 million documents indexed. The index size is about 10G.
> Running on a 32-bit box. During pressure testing, we observed that
> JVM GC is very frequent, about once every 5 minutes. Are there any tips for
> tuning this?
>


ExternalFileField best practices

2010-08-28 Thread Andy
I'm interested in using ExternalFileField to store a field "popularity" that is 
being updated frequently.

However ExternalFileField seems to be a pretty obscure feature. Have a few 
questions:

1) Can anyone share your experience using it? 

2) What is the most efficient way to update the external file?
For example, the file could look like:

1=12  // the document with uniqueKey 1 has a popularity of 12//
2=4
3=45
5=78

Now the popularity of document 1 is updated to 13:
 
- What is the best way to update the file to reflect the change? Isn't this an 
O(n) operation?
- How to deal with concurrent updates to the file by multiple threads?

Would this method of using an external file scale?

Thanks.


  


Implementing synonym NewBie

2010-08-28 Thread Jonty Rhods
Hi All,

I want to use synonyms in my search.
I am still in the learning phase with Solr, so please help me implement synonyms
in my search.
According to the wiki, synonyms can be implemented in two ways:
1> at index time
2> at search time

I have a combination of 10 phrases as synonyms, so which will be better in my
case?
Something like: live show in new york => live show in California => live show
=> live show in DC => live show in USA
Will synonyms affect my original search?

thanks
with regards
Jonty