Re: UTF-8 encoding

2012-03-29 Thread Paul Libbrecht
Henri,

look at velocity.properties. I have this in there:
> input.encoding  = UTF-8

Do you also?

This sets the .vm files' encoding.
Of course, also make sure you edit these files as UTF-8 (using jEdit made
this reliable for me).
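
For reference, a minimal velocity.properties sketch (input.encoding is the
line shown above; output.encoding is my assumption, worth verifying for your
Velocity version):

    # conf/velocity.properties
    input.encoding  = UTF-8
    output.encoding = UTF-8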

paul


On 30 March 2012 at 08:49, henri.gour...@laposte.net wrote:

> OK, I'll try to provide more details:
> I am using solr-3.5.0
> I am running the example provided in the package.
> Some of the modifications I have done in the various velocity/*.vm files
> have accents!
> It is those accents that show up garbled when I look at the results.
> The .vm files are utf-8 encoded.
> Solr behaves correctly, and treats the utf-8 characters ok.
> My browser is utf-8 ready, and correctly displays results returned by Solr
> 
> Cheers,
> Henri
> 
> 
> 



Quantiles in SOLR ???

2012-03-29 Thread Kashif Khan
Hi all,

I am doing R&D on whether Solr offers any quantiles function for a set. I
need a quick-start road map for adding such a quantiles function to my Solr
plugin. I am thinking it might require a third-party tool or library.
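
For what it's worth, the core computation is small. Here is a minimal
nearest-rank quantile sketch in Java; how to wire it into a Solr plugin
(e.g. a custom SearchComponent) is the open question:

    import java.util.Arrays;

    public class Quantiles {
        // Nearest-rank quantile over collected field values, 0 <= q <= 1.
        public static double quantile(double[] values, double q) {
            if (values.length == 0) {
                throw new IllegalArgumentException("need at least one value");
            }
            double[] sorted = values.clone();
            Arrays.sort(sorted);
            int idx = (int) Math.ceil(q * sorted.length) - 1;
            return sorted[Math.max(0, idx)];
        }

        public static void main(String[] args) {
            double[] v = {3, 1, 4, 1, 5, 9, 2, 6};
            System.out.println(quantile(v, 0.5)); // prints 3.0 (lower median)
        }
    }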

--
Kashif Khan



Re: UTF-8 encoding

2012-03-29 Thread henri.gour...@laposte.net
OK, I'll try to provide more details:
I am using solr-3.5.0
I am running the example provided in the package.
Some of the modifications I have done in the various velocity/*.vm files
have accents!
It is those accents that show up garbled when I look at the results.
The .vm files are utf-8 encoded.
Solr behaves correctly, and treats the utf-8 characters ok.
My browser is utf-8 ready, and correctly displays results returned by Solr

Cheers,
Henri





Re: [Announce] Solr 4.0 with RankingAlgorithm 1.4.1, NRT now supports both RankingAlgorithm and Lucene

2012-03-29 Thread William Bell
Why don't you contribute RA to the source so that it is a
feature/module inside Solr?

On Thu, Mar 29, 2012 at 8:32 AM, Nagendra Nagarajayya
 wrote:
> It is from build 2012-03-19 from the trunk (part of the email). No fork.
>
>
> Regards,
>
> Nagendra Nagarajayya
> http://solr-ra.tgels.org
> http://rankingalgorithm.tgels.org
>
> On 3/29/2012 7:20 AM, Bernd Fehling wrote:
>>
>> Nothing against RankingAlgorithm and your work, which sounds great, but
>> I think that YOUR "Solr 4.0" might confuse some Solr users and/or newbies.
>> As far as I know the next official release will be 3.6.
>>
>> So your "Solr 4.0" is a trunk snapshot or what?
>>
>> If so, which revision number?
>>
>> Or have you done a fork and produced a stable Solr 4.0 of your own?
>>
>> Regards
>> Bernd
>>
>>
>> Am 29.03.2012 15:49, schrieb Nagendra Nagarajayya:
>>>
>>> I am very excited to announce the availability of Solr 4.0 with
>>> RankingAlgorithm 1.4.1 (NRT support) (build 2012-03-19). The NRT
>>> implementation now supports both RankingAlgorithm and Lucene.
>>>
>>> RankingAlgorithm 1.4.1 has improved performance over the earlier release
>>> (1.4) and supports the entire Lucene Query Syntax, ± and/or boolean
>>> queries and is compatible with the new Lucene 4.0 API.
>>>
>>> You can get more information about NRT performance from here:
>>> http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search_ver_4.x
>>>
>>> You can download Solr 4.0 with RankingAlgorithm 1.4.1 from here:
>>> http://solr-ra.tgels.org
>>>
>>> Please download and give the new version a try.
>>>
>>> Regards,
>>>
>>> Nagendra Nagarajayya
>>> http://solr-ra.tgels.org
>>> http://rankingalgorithm.tgels.org
>>>
>>>
>>
>>
>



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: Empty facet counts

2012-03-29 Thread William Bell
Can you also include the output of /select?q=*:*&wt=xml ?

On Thu, Mar 29, 2012 at 11:47 AM, Erick Erickson
 wrote:
> Hmmm, looking at your schema, faceting on a <uniqueKey> field really doesn't
> make all that much sense: there will always be exactly one of it per
> document. At the least it's highly questionable.
>
> But that's not your problem and what's wrong isn't at all obvious. Can you try
> pasting the results of adding &debugQuery=on?
>
> Best
> Erick
>
> On Thu, Mar 29, 2012 at 11:12 AM, Youri Westerman  
> wrote:
>> The version is 3.5.0.2011.11.22.14.54.38. I did not apply any patches, but
>> then again it is not my server.
>> Do you have a clue on what is going wrong here?
>>
>> Regards,
>>
>> Youri
>>
>>
>> 2012/3/29 Bill Bell 
>>>
>>> Send schema.xml and did you apply any patches? What version of Solr?
>>>
>>> Bill Bell
>>> Sent from mobile
>>>
>>>
>>> On Mar 29, 2012, at 5:26 AM, Youri Westerman  wrote:
>>>
>>> > Hi,
>>> >
>>> > I'm currently learning how to use Solr and everything seems pretty
>>> > straightforward. For some reason when I use faceted queries it returns
>>> > only empty sets in the facet_counts section.
>>> >
>>> > The get params I'm using are:
>>> >  ?q=*:*&rows=0&facet=true&facet.field=urn
>>> >
>>> > The result:
>>> >  "facet_counts": {
>>> >
>>> >      "facet_queries": { },
>>> >      "facet_fields": { },
>>> >      "facet_dates": { },
>>> >      "facet_ranges": { }
>>> >
>>> >  }
>>> >
>>> > The urn field is indexed and there are enough entries to be counted.
>>> > When
>>> > adding facet.method=Enum, nothing changes.
>>> > Does anyone know why this is happening? Am I missing something?
>>> >
>>> > Thanks in advance!
>>> >
>>> > Youri
>>
>>



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: SolrCloud

2012-03-29 Thread asia
OK. Then what exactly does ZooKeeper do in SolrCloud? Why do we use it? I am
getting query responses from both shards even without using ZooKeeper.



Re: NullPointException when Faceting

2012-03-29 Thread Jamie Johnson
I don't believe this is related to that bug as I don't set the facet
prefix.  I have honestly never seen the issue before either and an
optimize fixed it.  I can try to see if I can duplicate the issue
tomorrow, but since I've done the optimize I haven't seen it again.

On Thu, Mar 29, 2012 at 7:45 PM, Yonik Seeley
 wrote:
> On Thu, Mar 29, 2012 at 6:33 PM, Jamie Johnson  wrote:
>> I recently got this stack trace when trying to execute a facet based
>> query on my index.  The error went away when I did an optimize but I
>> was surprised to see it at all.  Can anyone shed some light on why
>> this may have happened?
>
> I don't see how that could happen (and I've never seen it happen).
>
> I recently fixed one NPE: https://issues.apache.org/jira/browse/SOLR-3150
> Hopefully this isn't another!
>
> -Yonik
> lucenerevolution.com - Lucene/Solr Open Source Search Conference.
> Boston May 7-10


Constant Score queries in functions

2012-03-29 Thread Kevin Osborn
I am using the function query as part of a frange.


So, something like this: q=productId:[* TO *] fq={!frange
l=1}ceil(query({!v='documentType:(blah1 blah2 blah3)'}))

This is actually quite slow. I suspect the problem is that the query
function is calculating a score for every document in the index. Of course,
I don't actually care about this score; I just want to know whether it
matches or not. (I can't just make it a separate filter because it is
actually part of a bigger function string.)

Is there any way to force it to use constant scoring like a range query
would? It is not a range, but that is the type of scoring I want. I suspect
that would make it much faster, but am not sure if that is possible. Thanks.
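
For reference, the constant-score idea expressed in Lucene 3.x terms, as a
sketch only; where to hook it into Solr's function-query machinery is
exactly the open question here:

    import org.apache.lucene.search.ConstantScoreQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.QueryWrapperFilter;

    public class ConstantScoring {
        // Every document matching "inner" gets the same score (the boost),
        // the way a range query scores, instead of a computed relevance.
        public static Query constant(Query inner) {
            return new ConstantScoreQuery(new QueryWrapperFilter(inner));
        }
    }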

-- 
KEVIN OSBORN
LEAD SOFTWARE ENGINEER
T 949.399.8714  C 949.310.4677
5 Park Plaza, Suite 600, Irvine, CA 92614


Re: NullPointException when Faceting

2012-03-29 Thread Yonik Seeley
On Thu, Mar 29, 2012 at 6:33 PM, Jamie Johnson  wrote:
> I recently got this stack trace when trying to execute a facet based
> query on my index.  The error went away when I did an optimize but I
> was surprised to see it at all.  Can anyone shed some light on why
> this may have happened?

I don't see how that could happen (and I've never seen it happen).

I recently fixed one NPE: https://issues.apache.org/jira/browse/SOLR-3150
Hopefully this isn't another!

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10


Re: Optimizing in SolrCloud

2012-03-29 Thread Walter Underwood
The documents are removed from the search when the delete is committed.

The space for those documents is reclaimed at the next merge for the segment 
where they were. 

wunder

On Mar 29, 2012, at 4:15 PM, Jamie Johnson wrote:

> Thanks, does it matter that we are also making updates to documents at
> various times?  Do the deleted documents get removed when doing a
> merge, or does that only happen on an optimize?
> 
> On Thu, Mar 29, 2012 at 7:08 PM, Walter Underwood  
> wrote:
>> Don't. "Optimize" is a poorly-chosen name for a full merge. It doesn't make 
>> that much difference and there is almost never a need to do it on a periodic 
>> basis.
>> 
>> The full merge will mean a longer time between the commit and the time that 
>> the data is first searchable. Do the commit, then search.
>> 
>> wunder
>> 
>> On Mar 29, 2012, at 4:04 PM, Jamie Johnson wrote:
>> 
>>> What is the best way to periodically optimize a Solr index?  I've seen
>>> a few places where this is done from a CRON job, but I wanted to know
>>> if there are any other techniques that are used in practice for doing
>>> this.  My use case is that we generally load a large corpus of data up
>>> front and then information trickles in after that, but we want this
>>> information to be available for search within a reasonable amount of
>>> time (say 10 minutes).  I believe that the CRON job would probably
>>> suffice but if there are any other thoughts/suggestions I'd be
>>> interested to hear them.
>> 






Re: Optimizing in SolrCloud

2012-03-29 Thread Yonik Seeley
On Thu, Mar 29, 2012 at 7:15 PM, Jamie Johnson  wrote:
> Thanks, does it matter that we are also making updates to documents at
> various times?  Do the deleted documents get removed when doing a
> merge, or does that only happen on an optimize?

Yes, any merge removes documents that have been marked as deleted
(from the segments involved in the merge).

Optimize can still make sense, but more often in scenarios where
documents are updated infrequently.

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10


Re: Optimizing in SolrCloud

2012-03-29 Thread Jamie Johnson
Thanks, does it matter that we are also making updates to documents at
various times?  Do the deleted documents get removed when doing a
merge, or does that only happen on an optimize?

On Thu, Mar 29, 2012 at 7:08 PM, Walter Underwood  wrote:
> Don't. "Optimize" is a poorly-chosen name for a full merge. It doesn't make 
> that much difference and there is almost never a need to do it on a periodic 
> basis.
>
> The full merge will mean a longer time between the commit and the time that 
> the data is first searchable. Do the commit, then search.
>
> wunder
>
> On Mar 29, 2012, at 4:04 PM, Jamie Johnson wrote:
>
>> What is the best way to periodically optimize a Solr index?  I've seen
>> a few places where this is done from a CRON job, but I wanted to know
>> if there are any other techniques that are used in practice for doing
>> this.  My use case is that we generally load a large corpus of data up
>> front and then information trickle's in after that, but we want this
>> information to be available for search within a reasonable amount of
>> time (say 10 minutes).  I believe that the CRON job would probably
>> suffice but if there are any other thoughts/suggestions I'd be
>> interested to hear them.
>
>
>
>
>


Re: Optimizing in SolrCloud

2012-03-29 Thread Walter Underwood
Don't. "Optimize" is a poorly-chosen name for a full merge. It doesn't make 
that much difference and there is almost never a need to do it on a periodic 
basis.

The full merge will mean a longer time between the commit and the time that the 
data is first searchable. Do the commit, then search.
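
For the periodic piece, a plain commit through the update handler is enough.
A sketch (host, port, and path are assumptions for your install):

    curl 'http://localhost:8983/solr/update' -H 'Content-type:text/xml' \
         --data-binary '<commit/>'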

wunder

On Mar 29, 2012, at 4:04 PM, Jamie Johnson wrote:

> What is the best way to periodically optimize a Solr index?  I've seen
> a few places where this is done from a CRON job, but I wanted to know
> if there are any other techniques that are used in practice for doing
> this.  My use case is that we generally load a large corpus of data up
> front and then information trickles in after that, but we want this
> information to be available for search within a reasonable amount of
> time (say 10 minutes).  I believe that the CRON job would probably
> suffice but if there are any other thoughts/suggestions I'd be
> interested to hear them.







Re: bbox query and range queries

2012-03-29 Thread Yonik Seeley
On Thu, Mar 29, 2012 at 6:44 PM, Alexandre Rocco  wrote:
> Yonik,
>
> Thanks for the heads-up. That one worked.
>
> Just trying to wrap my head around how it would work in a real case. To
> test this one I just got the coordinates from Google Maps and searched
> within the pair of coordinates as I got them. Should I always check which
> is the lower and which the upper bound to assemble the query?

Yep... range query on LatLonField is currently pretty low level, and
you need to ensure yourself that lat1<=lat2 and lon1<=lon2 in
[lat1,lon1 TO lat2,lon2]
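
A client-side sketch of that normalization (the field name "local" is taken
from this thread):

    import java.util.Locale;

    public class BoxFilter {
        // Order the corners so the lower bound really is lower on both axes.
        public static String boxFilter(double latA, double lonA,
                                       double latB, double lonB) {
            return String.format(Locale.ROOT, "local:[%f,%f TO %f,%f]",
                    Math.min(latA, latB), Math.min(lonA, lonB),
                    Math.max(latA, latB), Math.max(lonA, lonB));
        }
    }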

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10


Re: bbox query and range queries

2012-03-29 Thread Alexandre Rocco
Yonik,

Thanks for the heads-up. That one worked.

Just trying to wrap my head around how it would work in a real case. To
test this one I just got the coordinates from Google Maps and searched
within the pair of coordinates as I got them. Should I always check which
is the lower and which the upper bound to assemble the query?
I know that this one is off-topic, just curious.

Thanks
Alexandre

On Thu, Mar 29, 2012 at 7:26 PM, Yonik Seeley wrote:

> On Thu, Mar 29, 2012 at 6:20 PM, Alexandre Rocco 
> wrote:
> > http://localhost:8984/solr/select?q=*:*&fq=local:[-23.6677,-46.7315 TO
> > -23.6709,-46.7261]
>
> Range queries always need to be [lower_bound TO upper_bound]
> Try
> http://localhost:8984/solr/select?q=*:*&fq=local:[-23.6709,-46.7315 TO
> -23.6677,-46.7261]
>
> -Yonik
> lucenerevolution.com - Lucene/Solr Open Source Search Conference.
> Boston May 7-10
>


Re: bbox query and range queries

2012-03-29 Thread Yonik Seeley
On Thu, Mar 29, 2012 at 6:20 PM, Alexandre Rocco  wrote:
> http://localhost:8984/solr/select?q=*:*&fq=local:[-23.6677,-46.7315 TO
> -23.6709,-46.7261]

Range queries always need to be [lower_bound TO upper_bound]
Try
http://localhost:8984/solr/select?q=*:*&fq=local:[-23.6709,-46.7315 TO
-23.6677,-46.7261]

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10


Re: bbox query and range queries

2012-03-29 Thread Alexandre Rocco
Erick,

Just checked on the separate fields and everything looks fine.
One thing I'm not completely sure about is whether the query I tried to
perform is correct.

One sample document looks like this:

<doc>
  <str name="id">200</str>
  <str name="local">-23.6696784,-46.7290193</str>
  <double name="local_0_coordinate">-23.6696784</double>
  <double name="local_1_coordinate">-46.7290193</double>
</doc>


So, to find this document, I tried to create a virtual rectangle to query
with the range query I described:
http://localhost:8984/solr/select?q=*:*&fq=local:[-23.6677,-46.7315 TO
-23.6709,-46.7261]

You can see that for the first coordinate I used a smaller value (taken
from the map) at the top left corner of the area around the doc. The other
coordinate is at the bottom right corner, and it's bigger than the doc's
local field.

When I split the query into 2 parts, the first part
(local_1_coordinate:[-46.7315 TO -46.7261]) returns results but the other
part (local_0_coordinate:[-23.6709 TO -23.6677]) doesn't match any docs.

I am guessing that my query is wrong. The typical use case is to take the
bounds of part of a map, represented by these top-left and bottom-right
coordinates, and find the docs inside this area. Does this range query
accomplish that kind of scenario?

Any pointers are appreciated.

Best,
Alexandre

On Thu, Mar 29, 2012 at 3:54 PM, Erick Erickson wrote:

> This all looks fine, so the next question is whether or not your
> documents have the value you think.
>
> +local_0_coordinate:[-23.6674 TO -23.6705] +local_1_coordinate:[-46.7314 TO
> -46.7274]
> is the actual translated filter.
>
> So I'd check the actual documents in the index to see if you have a single
> document with local_0 and local_1 that fits the above. You should be able
> to
> use the TermsComponent: http://wiki.apache.org/solr/TermsComponent
> to look. Or switch to stored="true" and look at search results for
> documents you think should match, just to see the raw value... Who knows?
> It could be something as silly as you have your lat/lon backwards somehow,
> I've
> spent _days_ having problems like that ...
>
> Best
> Erick
>
> On Thu, Mar 29, 2012 at 2:34 PM, Alexandre Rocco 
> wrote:
> > Erick,
> >
> > My location field is defined like in the example project:
> > <field name="local" type="location" indexed="true" stored="true"/>
> >
> > Also, there is the dynamic field that stores the split coordinates:
> > <dynamicField name="*_coordinate" type="tdouble" indexed="true"
> > stored="false" multiValued="false"/>
> >
> > The response XML with debugQuery=on looks like this (trimmed to the
> > meaningful parts):
> > <response>
> >   <lst name="responseHeader">
> >     <int name="status">0</int>
> >     <int name="QTime">1</int>
> >   </lst>
> >   <lst name="debug">
> >     <str name="rawquerystring">*:*</str>
> >     <str name="querystring">*:*</str>
> >     <str name="parsedquery">MatchAllDocsQuery(*:*)</str>
> >     <str name="parsedquery_toString">*:*</str>
> >     <str name="QParser">LuceneQParser</str>
> >     <arr name="filter_queries">
> >       <str>local:[-23.6674,-46.7314 TO -23.6705,-46.7274]</str>
> >     </arr>
> >     <arr name="parsed_filter_queries">
> >       <str>+local_0_coordinate:[-23.6674 TO -23.6705]
> >            +local_1_coordinate:[-46.7314 TO -46.7274]</str>
> >     </arr>
> >     <lst name="timing">(1.0 ms total; every prepare/process component 0.0)</lst>
> >   </lst>
> > </response>
> >
> > I tried to get some docs that contain the coordinates and then created a
> > rectangle around that doc to see if it is returned within these ranges.
> > Don't know if this is the best way to test it, but it's quite easy.
> >
> > Best,
> > Alexandre
> >
> > On Thu, Mar 29, 2012 at 2:57 PM, Erick Erickson  >wrote:
> >
> >> What are your results? Can you show us the field definition for "local"
> >> and the results of adding &debugQuery=on?
> >>
> >> Because this should work as far as I can tell.
> >>
> >> Best
> >> Erick
> >>
> >> On Thu, Mar 29, 2012 at 11:04 AM, Alexandre Rocco 
> >> wrote:
> >> > Hello,
> >> >
> >> > I'm trying to perform some queries on a location field on the index.
> >> > The requirement is to search listings inside a pair of coordinates,
> like
> >> a
> >> > bounding box.
> >> >
> >> > Taking a look at the wiki, I noticed that there is the option to use
> >> > the bbox query, but it does not create a rectangular shaped box to find
> >> > the docs. Also, since the LatLon field is searchable by range, it's
> >> > possible to use a range query to find them.
> >> >
> >> > I'm trying to search inside a pair of coordinates (the top left corner
> >> > and bottom right corner) and no result is found.
> >> >
> >> > The query I'm trying is something like:
> >> > http://localhost:8984/solr/select?wt=json&indent=true&fl=local,*&q=*:*&fq=local:[-23.6674,-46.7314 TO
> >> > -23.6705,-46.7274]
> >> >
> >> > Is there any other way to find docs inside a rectangular bounding box?
> >> >
> >> > Thanks
> >> > Alexandre
> >>
>


Re: Custom scoring question

2012-03-29 Thread Darren Govoni
Yeah, I guess that would work. I wasn't sure if it would change relative
to other documents. But if it were to be combined with other fields, that
approach may not work, because the calculation wouldn't include the scoring
for the other parts of the query. So then you have the dynamic score and
the question of what to do with it.

On Thu, 2012-03-29 at 16:29 -0300, Tomás Fernández Löbbe wrote:
> Can't you simply calculate that at index time and assign the result to a
> field, then sort by that field?
> 
> On Thu, Mar 29, 2012 at 12:07 PM, Darren Govoni  wrote:
> 
> > I'm going to try index time per-field boosting and do the boost
> > computation at index time and see if that helps.
> >
> > On Thu, 2012-03-29 at 10:08 -0400, Darren Govoni wrote:
> > > Hi,
> > >  I have a situation I want to re-score document relevance.
> > >
> > > Let's say I have two fields:
> > >
> > > text: The quick brown fox jumped over the white fence.
> > > terms: fox fence
> > >
> > > Now my queries come in as:
> > >
> > > terms:[* TO *]
> > >
> > > and Solr scores them on that field.
> > >
> > > What I want is to rank them according to the distribution of field
> > > "terms" within field "text". Which is a per document calculation.
> > >
> > > Can this be done with any kind of dismax? I'm not searching for known
> > > terms at query time.
> > >
> > > If not, what is the best way to implement a custom scoring handler to
> > > perform this calculation and re-score/sort the results?
> > >
> > > thanks for any tips!!!
> > >
> >
> >
> >




Re: Localize the largest fields (content) in index

2012-03-29 Thread Erick Erickson
I don't think there's really any reason SolrCloud won't work with
Tomcat; the setup is probably just tricky. See:
http://lucene.472066.n3.nabble.com/SolrCloud-new-td1528872.html
It's about a year old, but might prove helpful.

Best
Erick

On Thu, Mar 29, 2012 at 3:41 PM, Vadim Kisselmann
 wrote:
> Yes, I think so, too :)
> MLT doesn't really need termVectors, but it's faster with them. I found
> out that MLT works better on the title field in my case, instead of the
> big text fields.
>
> Sharding is in the planning, but my setup with SolrCloud, ZK and Tomcat
> doesn't work, see here:
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201203.mbox/%3CCA+GXEZE3LCTtgXFzn9uEdRxMymGF=z0ujb9s8b0qkipafn6...@mail.gmail.com%3E
> I split my huge index (the 150GB index in this case is my test index) and
> want to use SolrCloud, but it's not runnable with Tomcat at this time.
>
> Best regards
> Vadim
>
>
> 2012/3/29 Erick Erickson :
>> Yeah, it's worth a try. The term vectors aren't entirely necessary for
>> highlighting,
>> although they do make things more efficient.
>>
>> As far as MLT, does MLT really need such a big field?
>>
>> But you may be on your way to sharding your index if you remove this info
>> and testing shows problems
>>
>> Best
>> Erick
>>
>> On Thu, Mar 29, 2012 at 9:32 AM, Vadim Kisselmann
>>  wrote:
>>> Hi Erick,
>>> thanks:)
>>> The admin UI gives me the counts, so I can identify fields with big
>>> bulks of unique terms.
>>> I knew this wiki page, but I read it one more time.
>>> List of my file extensions with size in GB(Index size ~150GB):
>>> tvf 90GB
>>> fdt 30GB
>>> tim 18GB
>>> prx 15GB
>>> frq 12GB
>>> tip 200MB
>>> tvx 150MB
>>>
>>> tvf is my biggest file extension.
>>> Wiki: This file contains, for each field that has a term vector
>>> stored, a list of the terms, their frequencies and, optionally,
>>> position and offset information.
>>>
>>> Hmm, I use termVectors on my biggest fields because of MLT and highlighting.
>>> But I think I should test my performance without termVectors. Good idea? :)
>>>
>>> What do you think about my file extension sizes?
>>>
>>> Best regards
>>> Vadim
>>>
>>>
>>>
>>>
>>> 2012/3/29 Erick Erickson :
 The admin UI (schema browser) will give you the counts of unique terms
 in your fields, which is where I'd start.

 I suspect you've already seen this page, but if not:
 http://lucene.apache.org/java/3_5_0/fileformats.html#file-names
 the .fdt and .fdx file extensions are where data goes when
 you set 'stored="true" '. These files don't affect search speed,
 they just contain the verbatim copy of the data.

 The relative sizes of the various files above should give
 you a hint as to what's using the most space, but it'll be a bit
 of a hunt for you to pinpoint what's actually up. TermVectors
 and norms are often sources of using up space.

 Best
 Erick

 On Wed, Mar 28, 2012 at 10:55 AM, Vadim Kisselmann
  wrote:
> Hello folks,
>
> I work with Solr 4.0 r1292064 from trunk.
> My index grows fast; with 10 Mio. docs I get an index size of 150GB
> (25% stored, 75% indexed).
> I want to find out which fields (content) are too large, so I can consider
> countermeasures.
>
> How can I localize/discover the largest fields in my index?
> Luke (latest from trunk) doesn't work with my Solr version. I built
> Lucene/Solr .jars and tried to feed them to Luke, but I get many errors
> and can't build it.
>
> What other options do I have?
>
> Thanks and best regards
> Vadim


Re: SOLR hangs - update timeout - please help

2012-03-29 Thread Yonik Seeley
Oops... my previous replies accidentally went off-list.  I'll cut-n-paste below.

OK, so it looks like there is probably no bug here - it's simply that
commits can sometimes take a long time and updates were blocked during
that time (and would have succeeded eventually except the jetty
timeout was not set long enough).

Things are better in trunk (4.0) with soft commits and updates that
can proceed concurrently with commits.
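
If the timeout side needs adjusting, the connector's idle timeout in
jetty.xml is the usual knob. A sketch (element names vary across Jetty
versions, and the value is a placeholder to tune):

    <!-- inside the connector definition in etc/jetty.xml -->
    <Set name="maxIdleTime">120000</Set>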

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10



On Thu, Mar 29, 2012 at 3:11 PM, Rafal Gwizdala
 wrote:
> You're right, this is not default Jetty from Solr - I configured it from
> scratch and then added Solr.
> Previously I had autocommit enabled and also did commit on every update so
> this might also contribute to the problem. Now I disabled it and made the
> updates less frequent.
> If the autocommit is allowed to happen together with 'manual' commit on
> update then there could be simultaneous commits, which now shouldn't happen
> - there will be at most one update/commit active at a time.
> The request timeout is the Jetty default, but I don't know what that value is.
>
> Best regards
> RG
>

I wrote:
On Thu, Mar 29, 2012 at 2:25 PM, Rafal Gwizdala
 wrote:
> Yonik, I didn't say there was an update request active at the moment the
> thread dump was made, only that previous update requests failed with a
> timeout. So maybe this is the missing piece.
> I didn't enable NIO with Jetty; it's probably there by default.

Not with the jetty that comes with Solr.

bq. If solr hangs next time I'll try to make a thread dump when the
update request is waiting for completion.

Great!  We need to see where it's hanging!
Also, how long did the request take to time out?  Do you have
auto-commit enabled?
In the 3x series, updates will block while commits are in progress, so
timeouts can happen if they are set too short (and it seems like maybe
you aren't using the Jetty from Solr, so the configuration may not be
ideal).


Re: Localize the largest fields (content) in index

2012-03-29 Thread Vadim Kisselmann
Yes, I think so, too :)
MLT doesn't really need termVectors, but it's faster with them. I found out
that MLT works better on the title field in my case, instead of the big
text fields.
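
A schema sketch of that test (the field name "text" is an assumption; MLT
and highlighting will then re-analyze the stored text at query time):

    <field name="text" type="text" indexed="true" stored="true"
           termVectors="false" termPositions="false" termOffsets="false"/>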

Sharding is in the planning, but my setup with SolrCloud, ZK and Tomcat
doesn't work, see here:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201203.mbox/%3CCA+GXEZE3LCTtgXFzn9uEdRxMymGF=z0ujb9s8b0qkipafn6...@mail.gmail.com%3E
I split my huge index (the 150GB index in this case is my test index) and
want to use SolrCloud, but it's not runnable with Tomcat at this time.

Best regards
Vadim


2012/3/29 Erick Erickson :
> Yeah, it's worth a try. The term vectors aren't entirely necessary for
> highlighting,
> although they do make things more efficient.
>
> As far as MLT, does MLT really need such a big field?
>
> But you may be on your way to sharding your index if you remove this info
> and testing shows problems
>
> Best
> Erick
>
> On Thu, Mar 29, 2012 at 9:32 AM, Vadim Kisselmann
>  wrote:
>> Hi Erick,
>> thanks:)
>> The admin UI gives me the counts, so I can identify fields with big
>> bulks of unique terms.
>> I knew this wiki page, but I read it one more time.
>> List of my file extensions with size in GB(Index size ~150GB):
>> tvf 90GB
>> fdt 30GB
>> tim 18GB
>> prx 15GB
>> frq 12GB
>> tip 200MB
>> tvx 150MB
>>
>> tvf is my biggest file extension.
>> Wiki: This file contains, for each field that has a term vector
>> stored, a list of the terms, their frequencies and, optionally,
>> position and offset information.
>>
>> Hmm, I use termVectors on my biggest fields because of MLT and highlighting.
>> But I think I should test my performance without termVectors. Good idea? :)
>>
>> What do you think about my file extension sizes?
>>
>> Best regards
>> Vadim
>>
>>
>>
>>
>> 2012/3/29 Erick Erickson :
>>> The admin UI (schema browser) will give you the counts of unique terms
>>> in your fields, which is where I'd start.
>>>
>>> I suspect you've already seen this page, but if not:
>>> http://lucene.apache.org/java/3_5_0/fileformats.html#file-names
>>> the .fdt and .fdx file extensions are where data goes when
>>> you set 'stored="true" '. These files don't affect search speed,
>>> they just contain the verbatim copy of the data.
>>>
>>> The relative sizes of the various files above should give
>>> you a hint as to what's using the most space, but it'll be a bit
>>> of a hunt for you to pinpoint what's actually up. TermVectors
>>> and norms are often sources of using up space.
>>>
>>> Best
>>> Erick
>>>
>>> On Wed, Mar 28, 2012 at 10:55 AM, Vadim Kisselmann
>>>  wrote:
 Hello folks,

 I work with Solr 4.0 r1292064 from trunk.
 My index grows fast; with 10 Mio. docs I get an index size of 150GB
 (25% stored, 75% indexed).
 I want to find out which fields (content) are too large, so I can consider
 countermeasures.

 How can I localize/discover the largest fields in my index?
 Luke (latest from trunk) doesn't work with my Solr version. I built
 Lucene/Solr .jars and tried to feed them to Luke, but I get many errors
 and can't build it.

 What other options do I have?

 Thanks and best regards
 Vadim


Re: Custom scoring question

2012-03-29 Thread Tomás Fernández Löbbe
Can't you simply calculate that at index time and assign the result to a
field, then sort by that field?
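
A sketch of that index-time calculation (field names are from this thread;
this is naive substring matching, a real version would tokenize):

    public class TermCoverage {
        // Fraction of "terms" entries that occur in "text"; index the
        // result in a sortable float field and sort on it at query time.
        public static float coverage(String text, String[] terms) {
            if (terms.length == 0) return 0f;
            String lower = text.toLowerCase();
            int hits = 0;
            for (String t : terms) {
                if (lower.contains(t.toLowerCase())) hits++;
            }
            return (float) hits / terms.length;
        }
    }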

On Thu, Mar 29, 2012 at 12:07 PM, Darren Govoni  wrote:

> I'm going to try index time per-field boosting and do the boost
> computation at index time and see if that helps.
>
> On Thu, 2012-03-29 at 10:08 -0400, Darren Govoni wrote:
> > Hi,
> >  I have a situation I want to re-score document relevance.
> >
> > Let's say I have two fields:
> >
> > text: The quick brown fox jumped over the white fence.
> > terms: fox fence
> >
> > Now my queries come in as:
> >
> > terms:[* TO *]
> >
> > and Solr scores them on that field.
> >
> > What I want is to rank them according to the distribution of field
> > "terms" within field "text". Which is a per document calculation.
> >
> > Can this be done with any kind of dismax? I'm not searching for known
> > terms at query time.
> >
> > If not, what is the best way to implement a custom scoring handler to
> > perform this calculation and re-score/sort the results?
> >
> > thanks for any tips!!!
> >
>
>
>


Post Sorting hook before the doc slicing.

2012-03-29 Thread Stephane Bailliez
I'm currently looking to see what would be a decent way to implement a
scrolling window in the result set when looking for an item.

Basically, I need to find item X in the result set and return say N items
before and N items after.

< - N items -- Item X --- N items >

So I was thinking a post filter could do the work, where I'm basically
looking for the id of the document, then selecting it plus the N documents
after it. This is easy to do; however, in the end it cannot work, since this
doc selection needs to happen after sorting, while the post filtering runs
before the sorting.

So I might be wrong, but it looks like the only way would be to create a
custom SolrIndexSearcher which will find the offset and create the related
DocSlice. That slicing part doesn't seem to be well factored as far as I can
see, so it seems to imply copy/pasting a significant chunk of the code. Am
I looking in the wrong place?
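
For comparison, the client-side fallback is simple if pulling the ordered
ids is acceptable (a sketch, not the in-searcher slicing asked about):

    import java.util.Collections;
    import java.util.List;

    public class Window {
        // Slice n ids on each side of the target from the ordered result ids.
        public static List<String> around(List<String> ids, String target, int n) {
            int i = ids.indexOf(target);
            if (i < 0) return Collections.emptyList();
            return ids.subList(Math.max(0, i - n),
                               Math.min(ids.size(), i + n + 1));
        }
    }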

-- stephane


Re: bbox query and range queries

2012-03-29 Thread Erick Erickson
This all looks fine, so the next question is whether or not your
documents have the value you think.

+local_0_coordinate:[-23.6674 TO -23.6705] +local_1_coordinate:[-46.7314 TO
-46.7274]
is the actual translated filter.

So I'd check the actual documents in the index to see if you have a single
document with local_0 and local_1 that fits the above. You should be able to
use the TermsComponent: http://wiki.apache.org/solr/TermsComponent
to look. Or switch to stored="true" and look at search results for
documents you think should match, just to see the raw value Who knows?
It could be something as silly as you have your lat/lon backwards somehow, I've
spent _days_ having problems like that ...

Best
Erick

On Thu, Mar 29, 2012 at 2:34 PM, Alexandre Rocco  wrote:
> Erick,
>
> My location field is defined like in the example project:
> <field name="local" type="location" indexed="true" stored="true"/>
>
> Also, there is the dynamic field that stores the split coordinates:
> <dynamicField name="*_coordinate" type="tdouble" indexed="true"
> stored="false" multiValued="false"/>
>
> The response XML with debugQuery=on looks like this (trimmed to the
> meaningful parts):
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">1</int>
>   </lst>
>   <lst name="debug">
>     <str name="rawquerystring">*:*</str>
>     <str name="querystring">*:*</str>
>     <str name="parsedquery">MatchAllDocsQuery(*:*)</str>
>     <str name="parsedquery_toString">*:*</str>
>     <str name="QParser">LuceneQParser</str>
>     <arr name="filter_queries">
>       <str>local:[-23.6674,-46.7314 TO -23.6705,-46.7274]</str>
>     </arr>
>     <arr name="parsed_filter_queries">
>       <str>+local_0_coordinate:[-23.6674 TO -23.6705]
>            +local_1_coordinate:[-46.7314 TO -46.7274]</str>
>     </arr>
>     <lst name="timing">(1.0 ms total; every prepare/process component 0.0)</lst>
>   </lst>
> </response>
>
> I tried to get some docs that contain the coordinates and then created a
> rectangle around that doc to see if it is returned within these ranges.
> Don't know if this is the best way to test it, but it's quite easy.
>
> Best,
> Alexandre
>
> On Thu, Mar 29, 2012 at 2:57 PM, Erick Erickson 
> wrote:
>
>> What are your results? Can you show us the field definition for "local"
>> and the results of adding &debugQuery=on?
>>
>> Because this should work as far as I can tell.
>>
>> Best
>> Erick
>>
>> On Thu, Mar 29, 2012 at 11:04 AM, Alexandre Rocco 
>> wrote:
>> > Hello,
>> >
>> > I'm trying to perform some queries on a location field on the index.
>> > The requirement is to search listings inside a pair of coordinates, like
>> a
>> > bounding box.
>> >
>> > Taking a look at the wiki, I noticed that there is the option to use the
>> > bbox query, but it does not create a rectangular shaped box to find the
>> > docs. Also, since the LatLon field is searchable by range, it's possible
>> > to use a range query to find them.
>> >
>> > I'm trying to search inside a pair of coordinates (the top left corner
>> > and bottom right corner) and no result is found.
>> >
>> > The query I'm trying is something like:
>> > http://localhost:8984/solr/select?wt=json&indent=true&fl=local,*&q=*:*&fq=local:[-23.6674,-46.7314 TO
>> > -23.6705,-46.7274]
>> >
>> > Is there any other way to find docs inside a rectangular bounding box?
>> >
>> > Thanks
>> > Alexandre
>>


Re: query help

2012-03-29 Thread Erick Erickson
Boosting won't help either, I don't think. Boosts apply to the
document and pretty much ignore position information.

Best
Erick

On Thu, Mar 29, 2012 at 2:07 PM, Abhishek tiwari
 wrote:
> Can I achieve this with the help of a boosting technique?
>
> On Thu, Mar 29, 2012 at 10:42 PM, Erick Erickson 
> wrote:
>
>> Solr doesn't support sorting on multiValued fields, so I don't think this
>> is possible OOB.
>>
>> I can't come up with a clever indexing solution that does this either,
>> sorry.
>>
>> Best
>> Erick
>>
>> On Thu, Mar 29, 2012 at 8:27 AM, Abhishek tiwari
>>  wrote:
>> > a) No, I do not want to sort the content within a document.
>> > I want to sort the documents.
>> > b) As I have explained, I have a result set (documents) and each document
>> > contains a field "ad_text" (among other fields) which is multivalued,
>> > storing some tags, say "B1, B2, B3", in each. But the order of the tags
>> > differs per doc: (B1, B2, B3) for doc1, (B3, B1, B2) for doc2,
>> > (B1, B3, B2) for doc3, (B2, B3, B1) for doc4.
>> >
>> > If I search for B1, the results should come in the following order:
>> > doc1, doc3, doc2, doc4
>> > (as B1 is the first value in the multivalued field for doc1 and doc3,
>> > the 2nd value in doc2, and the 3rd in doc4).
>> > If I search for B2, the results should come in the following order:
>> > doc4, doc1, doc3, doc2.
>> >
>> > I do not know whether it is possible or not,
>> >
>> > but please suggest how it can be done.
>> >
>> >
>> >
>> > On Thu, Mar 29, 2012 at 5:18 PM, Erick Erickson > >wrote:
>> >
>> >> Hmmm, I don't quite get this. Are you saying that you want
>> >> to sort the documents or sort the content within the document?
>> >>
>> >> Sorting documents (i.e. the results list) requires a single-valued
>> >> field. So you'd have to, at index time, sort the entries.
>> >>
>> >> Sorting the content within the document is something you'd
>> >> have to do when you index, Solr doesn't rearrange the
>> >> contents of a document.
>> >>
>> >> If all you want to do is display the results within the document
>> >> in order, your app can do that as it builds the display page.
>> >>
>> >> Best
>> >> Erick
>> >>
>> >> On Wed, Mar 28, 2012 at 9:02 AM, Abhishek tiwari
>> >>  wrote:
>> >> > Hi ,
>> >> > I have a multivalued field and want to sort the docs by the order in
>> >> > which a particular text, e.g. 'B1', was added.
>> >> > How should I query? ad_text is a multivalued field.
>> >> >
>> >> > <add>
>> >> >   <doc>
>> >> >     <field name="ad_text">B1</field>
>> >> >     <field name="ad_text">B2</field>
>> >> >     <field name="ad_text">B3</field>
>> >> >   </doc>
>> >> >   <doc>
>> >> >     <field name="ad_text">B2</field>
>> >> >     <field name="ad_text">B1</field>
>> >> >     <field name="ad_text">B3</field>
>> >> >   </doc>
>> >> >   <doc>
>> >> >     <field name="ad_text">B1</field>
>> >> >     <field name="ad_text">B2</field>
>> >> >     <field name="ad_text">B3</field>
>> >> >   </doc>
>> >> >   <doc>
>> >> >     <field name="ad_text">B3</field>
>> >> >     <field name="ad_text">B2</field>
>> >> >     <field name="ad_text">B1</field>
>> >> >   </doc>
>> >> > </add>
>> >>
>>


Re: bbox query and range queries

2012-03-29 Thread Alexandre Rocco
Erick,

My location field is defined like in the example project:
<field name="local" type="location" indexed="true" stored="true"/>

Also, there is the dynamic field that stores the split coordinates:
<dynamicField name="*_coordinate" type="tdouble" indexed="true"
stored="false" multiValued="false"/>

The response XML with debugQuery=on looks like this (trimmed to the
meaningful parts):
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <lst name="debug">
    <str name="rawquerystring">*:*</str>
    <str name="querystring">*:*</str>
    <str name="parsedquery">MatchAllDocsQuery(*:*)</str>
    <str name="parsedquery_toString">*:*</str>
    <str name="QParser">LuceneQParser</str>
    <arr name="filter_queries">
      <str>local:[-23.6674,-46.7314 TO -23.6705,-46.7274]</str>
    </arr>
    <arr name="parsed_filter_queries">
      <str>+local_0_coordinate:[-23.6674 TO -23.6705]
           +local_1_coordinate:[-46.7314 TO -46.7274]</str>
    </arr>
    <lst name="timing">(1.0 ms total; every prepare/process component 0.0)</lst>
  </lst>
</response>

I tried to get some docs that contain the coordinates and then created a
rectangle around that doc to see if it is returned within these ranges.
Don't know if this is the best way to test it, but it's quite easy.

Best,
Alexandre

On Thu, Mar 29, 2012 at 2:57 PM, Erick Erickson wrote:

> What are your results? Can you show us the field definition for "local"
> and the results of adding &debugQuery=on?
>
> Because this should work as far as I can tell.
>
> Best
> Erick
>
> On Thu, Mar 29, 2012 at 11:04 AM, Alexandre Rocco 
> wrote:
> > Hello,
> >
> > I'm trying to perform some queries on a location field on the index.
> > The requirement is to search listings inside a pair of coordinates, like
> a
> > bounding box.
> >
> > Taking a look at the wiki, I noticed that there is the option to use the
> > bbox query, but it does not create a rectangular shaped box to find the
> > docs. Also, since the LatLon field is searchable by range, it's possible
> > to use a range query to find them.
> >
> > I'm trying to search inside a pair of coordinates (the top left corner
> > and bottom right corner) and no result is found.
> >
> > The query I'm trying is something like:
> > http://localhost:8984/solr/select?wt=json&indent=true&fl=local,*&q=*:*&fq=local:[-23.6674,-46.7314 TO
> > -23.6705,-46.7274]
> >
> > Is there any other way to find docs inside a rectangular bounding box?
> >
> > Thanks
> > Alexandre
>


Re: SOLR hangs - update timeout - please help

2012-03-29 Thread Rafal Gwizdala
Yonik, I didn't say there was an update request active at the moment the
thread dump was made, only that previous update requests failed with a
timeout. So maybe this is the missing piece.
I didn't enable NIO with Jetty; it's probably there by default. Disabling
it is the next thing to check.
If solr hangs next time I'll try to make a thread dump when the update
request is waiting for completion.

Best regards
RG

On Thu, Mar 29, 2012 at 8:19 PM, Yonik Seeley wrote:

> On Thu, Mar 29, 2012 at 1:50 PM, Rafal Gwizdala
>  wrote:
> > Below i'm pasting the thread dump taken when the update was hung (it's
> also
> > attached to the first message of this topic)
>
> Interesting...
> It looks like there's only one thread in solr code (the one generating
> the thread dump).
>
> The stack trace looks like you switched Jetty to use the NIO connector
> perhaps?
> Could you try with the Jetty shipped with Solr (exactly as configured)?
>
> -Yonik
> lucenerevolution.com - Lucene/Solr Open Source Search Conference.
> Boston May 7-10
>


Re: SOLR hangs - update timeout - please help

2012-03-29 Thread Yonik Seeley
On Thu, Mar 29, 2012 at 1:50 PM, Rafal Gwizdala
 wrote:
> Below i'm pasting the thread dump taken when the update was hung (it's also
> attached to the first message of this topic)

Interesting...
It looks like there's only one thread in solr code (the one generating
the thread dump).

The stack trace looks like you switched Jetty to use the NIO connector perhaps?
Could you try with the Jetty shipped with Solr (exactly as configured)?

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10


Re: query help

2012-03-29 Thread Abhishek tiwari
Can I achieve this with the help of a boosting technique?

On Thu, Mar 29, 2012 at 10:42 PM, Erick Erickson wrote:

> Solr doesn't support sorting on multiValued fields, so I don't think this
> is possible OOB.
>
> I can't come up with a clever indexing solution that does this either,
> sorry.
>
> Best
> Erick
>
> On Thu, Mar 29, 2012 at 8:27 AM, Abhishek tiwari
>  wrote:
> > a) No, I do not want to sort the content within a document.
> > I want to sort the documents.
> > b) As I have explained, I have a result set (documents) and each document
> > contains a field "ad_text" (among other fields) which is multivalued,
> > storing some tags, say "B1, B2, B3", in each. But the order of the tags
> > differs per doc: (B1, B2, B3) for doc1, (B3, B1, B2) for doc2,
> > (B1, B3, B2) for doc3, (B2, B3, B1) for doc4.
> >
> > If I search for B1, the results should come in the following order:
> > doc1, doc3, doc2, doc4
> > (as B1 is the first value in the multivalued field for doc1 and doc3,
> > the 2nd value in doc2, and the 3rd in doc4).
> > If I search for B2, the results should come in the following order:
> > doc4, doc1, doc3, doc2.
> >
> > I do not know whether it is possible or not,
> >
> > but please suggest how it can be done.
> >
> >
> >
> > On Thu, Mar 29, 2012 at 5:18 PM, Erick Erickson  >wrote:
> >
> >> Hmmm, I don't quite get this. Are you saying that you want
> >> to sort the documents or sort the content within the document?
> >>
> >> Sorting documents (i.e. the results list) requires a single-valued
> >> field. So you'd have to, at index time, sort the entries.
> >>
> >> Sorting the content within the document is something you'd
> >> have to do when you index, Solr doesn't rearrange the
> >> contents of a document.
> >>
> >> If all you want to do is display the results within the document
> >> in order, your app can do that as it builds the display page.
> >>
> >> Best
> >> Erick
> >>
> >> On Wed, Mar 28, 2012 at 9:02 AM, Abhishek tiwari
> >>  wrote:
> >> > Hi ,
> >> > I have a multivalued field and want to sort the docs by the order in
> >> > which a particular text, e.g. 'B1', was added.
> >> > How should I query? ad_text is a multivalued field.
> >> >
> >> > <add>
> >> >   <doc>
> >> >     <field name="ad_text">B1</field>
> >> >     <field name="ad_text">B2</field>
> >> >     <field name="ad_text">B3</field>
> >> >   </doc>
> >> >   <doc>
> >> >     <field name="ad_text">B2</field>
> >> >     <field name="ad_text">B1</field>
> >> >     <field name="ad_text">B3</field>
> >> >   </doc>
> >> >   <doc>
> >> >     <field name="ad_text">B1</field>
> >> >     <field name="ad_text">B2</field>
> >> >     <field name="ad_text">B3</field>
> >> >   </doc>
> >> >   <doc>
> >> >     <field name="ad_text">B3</field>
> >> >     <field name="ad_text">B2</field>
> >> >     <field name="ad_text">B1</field>
> >> >   </doc>
> >> > </add>
> >>
>


Re: SOLR hangs - update timeout - please help

2012-03-29 Thread Erick Erickson
More memory is not necessarily better; it can lead to longer, more
intense garbage collections that cause things to stop. You might
also consider lowering your memory allocation, but 2G is really not
all that much, so I somewhat doubt it's the problem, but thought I'd
mention it.
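
If GC is the suspect, these are the sort of flags commonly experimented
with (values are placeholders to tune, not recommendations):

    java -Xms512m -Xmx2g -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -jar start.jar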

Best
Erick

On Thu, Mar 29, 2012 at 1:50 PM, Rafal Gwizdala
 wrote:
> Guys, thanks for all the suggestions
> I will be trying them, one at a time. Imho it's too early to give up and
> look for another tool, I'll try to work on configuration and see what
> happens.
> The NRT looks quite promising, there are also tons of config options to
> change.
> As for now, I have made the updates less frequent - about once every 30
> seconds (but now the batches are bigger, about 150-200 documents per
> update). I'll see if this makes SOLR more stable or users more aggressive.
> Unfortunately I have no resources for experimenting so I'll keep making
> small changes to production system and observing the effects.
> Shawn, I have given the JVM about 2 GB of memory but it's only using 300 MB
> so I don't think there's a memory shortage now. The whole index is about 2 GB
> in size but I think there aren't enough queries to fill up the cache and
> make SOLR load everything in memory.
>
> Below I'm pasting the thread dump taken when the update was hung (it's also
> attached to the first message of this topic)
>
> Best regards, RG
>
> 
> [Thread dump for core "example", summarized; the XML tags were stripped
> in the archive:
>  JVM: Java HotSpot(TM) 64-Bit Server VM 20.5-b03; 31 live threads (peak 32,
>  8 daemon).
>  Thread 39 "pool-4-thread-1": WAITING on a ConditionObject, idle in
>  DelayQueue.take via ScheduledThreadPoolExecutor$DelayedWorkQueue.take.
>  Thread 38 "pool-2-thread-1": WAITING, same idle executor stack.
>  Thread 37 "DestroyJavaVM": RUNNABLE.
>  Thread 36 "qtp1033068770-36": TIMED_WAITING in BlockingArrayQueue.poll,
>  an idle Jetty QueuedThreadPool thread.
>  Thread 35 "qtp1033068770-35": RUNNABLE, generating this dump via
>  threaddump_jsp -> JspServlet -> Jetty ServletHandler.
>  Dump truncated in the archive.]

Re: bbox query and range queries

2012-03-29 Thread Erick Erickson
What are your results? Can you show us the field definition for "local"
and the results of adding &debugQuery=on?

Because this should work as far as I can tell.

Best
Erick

On Thu, Mar 29, 2012 at 11:04 AM, Alexandre Rocco  wrote:
> Hello,
>
> I'm trying to perform some queries on a location field on the index.
> The requirement is to search listings inside a pair of coordinates, like a
> bounding box.
>
> Taking a look at the wiki, I noticed that there is the option to use the
> bbox query, but it does not create a rectangular shaped box to find the docs.
> Also, since the LatLon field is searchable by range, it's possible to use a
> range query to find them.
>
> I'm trying to search inside a pair of coordinates (the top left corner and
> bottom right corner) and no result is found.
>
> The query I'm trying is something like:
> http://localhost:8984/solr/select?wt=json&indent=true&fl=local,*&q=*:*&fq=local:[-23.6674,-46.7314 TO
> -23.6705,-46.7274]
>
> Is there any other way to find docs inside a rectangular bounding box?
>
> Thanks
> Alexandre


Re: SOLR hangs - update timeout - please help

2012-03-29 Thread Rafal Gwizdala
Guys, thanks for all the suggestions
I will be trying them, one at a time. Imho it's too early to give up and
look for another tool, I'll try to work on configuration and see what
happens.
The NRT looks quite promising, there are also tons of config options to
change.
As for now, I have made the updates less frequent - about once every 30
seconds (but now the batches are bigger, about 150-200 documents per
update). I'll see if this makes SOLR more stable or users more aggressive.
Unfortunately I have no resources for experimenting so I'll keep making
small changes to production system and observing the effects.
Shawn, I have given the JVM about 2 GB of memory but it's only using 300 MB
so I don't think there's a memory shortage now. The whole index is about 2 GB
in size but I think there aren't enough queries to fill up the cache and
make SOLR load everything in memory.

Below I'm pasting the thread dump taken when the update was hung (it's also
attached to the first message of this topic)

Best regards, RG


[Thread dump for core "example", summarized; the XML tags were stripped
in the archive:
 JVM: Java HotSpot(TM) 64-Bit Server VM 20.5-b03; 31 live threads (peak 32,
 8 daemon).
 Thread 39 "pool-4-thread-1": WAITING on a ConditionObject, idle in
 DelayQueue.take via ScheduledThreadPoolExecutor$DelayedWorkQueue.take.
 Thread 38 "pool-2-thread-1": WAITING, same idle executor stack.
 Thread 37 "DestroyJavaVM": RUNNABLE.
 Thread 36 "qtp1033068770-36": TIMED_WAITING in BlockingArrayQueue.poll,
 an idle Jetty QueuedThreadPool thread.
 Thread 35 "qtp1033068770-35": RUNNABLE, generating this dump via
 threaddump_jsp -> JspServlet -> Jetty ServletHandler.
 Dump truncated in the archive.]

Re: Empty facet counts

2012-03-29 Thread Erick Erickson
Hmmm, looking at your schema, faceting on a <uniqueKey> field really doesn't
make all that much sense: there will always be exactly one of it per
document. At the least it's highly questionable.

But that's not your problem and what's wrong isn't at all obvious. Can you try
pasting the results of adding &debugQuery=on?

Best
Erick

On Thu, Mar 29, 2012 at 11:12 AM, Youri Westerman  wrote:
> The version is 3.5.0.2011.11.22.14.54.38. I did not apply any patches, but
> then again it is not my server.
> Do you have a clue on what is going wrong here?
>
> Regards,
>
> Youri
>
>
> 2012/3/29 Bill Bell 
>>
>> Send schema.xml and did you apply any patches? What version of Solr?
>>
>> Bill Bell
>> Sent from mobile
>>
>>
>> On Mar 29, 2012, at 5:26 AM, Youri Westerman  wrote:
>>
>> > Hi,
>> >
>> > I'm currently learning how to use Solr and everything seems pretty
>> > straightforward. For some reason when I use faceted queries it returns
>> > only empty sets in the facet_counts section.
>> >
>> > The get params I'm using are:
>> >  ?q=*:*&rows=0&facet=true&facet.field=urn
>> >
>> > The result:
>> >  "facet_counts": {
>> >
>> >      "facet_queries": { },
>> >      "facet_fields": { },
>> >      "facet_dates": { },
>> >      "facet_ranges": { }
>> >
>> >  }
>> >
>> > The urn field is indexed and there are enough entries to be counted.
>> > When
>> > adding facet.method=Enum, nothing changes.
>> > Does anyone know why this is happening? Am I missing something?
>> >
>> > Thanks in advance!
>> >
>> > Youri
>
>


Re: UTF-8 encoding

2012-03-29 Thread Erick Erickson
I doubt that the pre-installed Jetty server has problems with UTF-8, although
you haven't told us what version of Solr you're running on so it could be really
old.

And you also haven't told us why you think UTF-8 is a problem. How is this
manifesting itself? Failed searches? Failed indexing? ???

Because there's some possibility, if your problem is with searching from
the browser, that your _browser_ isn't configured to handle UTF-8, for
instance.
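
One quick probe that takes the browser out of the loop (host and port are
assumptions; %C3%A9 is the UTF-8 encoding of 'é'):

    curl 'http://localhost:8983/solr/select?q=caf%C3%A9&wt=json'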

Best
Erick

On Thu, Mar 29, 2012 at 12:17 PM, henri.gour...@laposte.net
 wrote:
> Thanks for the tips, but unfortunately, no progress so far.
> Reading through the Web, I guess that jetty has utf-8 problems!
> I guess that I will have to switch from the embedded (and pre-installed ->
> easy) jetty server present in Solr in favor of Tomcat (for which I have to
> rediscover the installation issues!).
>
> Cheers,
>
> Henri
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/UTF-8-encoding-tp3867885p3868198.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: why does building war from source produce a different size file?

2012-03-29 Thread Chris Hostetter

: 6403279 Nov 22 14:54 apache-solr-3.5.0.war
: 
: when i build the war file from source - i get a different sized file:
: 
:  ./dist/apache-solr-3.5-SNAPSHOT.war
: 
: 6404098 Mar 29 11:41 ./dist/apache-solr-3.5-SNAPSHOT.war
: 
: am i building from the wrong source?

I haven't tested this to confirm, but I suspect you are just seeing the 
effect of the "-SNAPSHOT" extension being left on all the jar names, and 
the jar version and specversion metadata making each jar slightly 
bigger.  Those jars are all then bundled up into that war.

This "-SNAPSHOT" extension happens by default when building -- official 
builds set the "version" properties explicitly.

https://wiki.apache.org/solr/HowToRelease#Release_Guidlines






-Hoss


Re: Localize the largest fields (content) in index

2012-03-29 Thread Erick Erickson
Yeah, it's worth a try. The term vectors aren't entirely necessary for
highlighting,
although they do make things more efficient.

As for MLT, does it really need such a big field?

But you may be on your way to sharding your index if you remove this info
and testing still shows problems...

Best
Erick

On Thu, Mar 29, 2012 at 9:32 AM, Vadim Kisselmann
 wrote:
> Hi Erick,
> thanks:)
> The admin UI gives me the counts, so I can identify fields with big
> bulks of unique terms.
> I know this wiki page, but I read it one more time.
> List of my file extensions with size in GB(Index size ~150GB):
> tvf 90GB
> fdt 30GB
> tim 18GB
> prx 15GB
> frq 12GB
> tip 200MB
> tvx 150MB
>
> tvf is my biggest file extension.
> Wiki: This file contains, for each field that has a term vector
> stored, a list of the terms, their frequencies and, optionally,
> position and offset information.
>
> Hmm, i use termVectors on my biggest fields because of MLT and Highlighting.
> But i think i should test my performance without termVectors. Good Idea? :)
>
> What do you think about my file extension sizes?
>
> Best regards
> Vadim
>
>
>
>
> 2012/3/29 Erick Erickson :
>> The admin UI (schema browser) will give you the counts of unique terms
>> in your fields, which is where I'd start.
>>
>> I suspect you've already seen this page, but if not:
>> http://lucene.apache.org/java/3_5_0/fileformats.html#file-names
>> the .fdt and .fdx file extensions are where data goes when
>> you set 'stored="true" '. These files don't affect search speed,
>> they just contain the verbatim copy of the data.
>>
>> The relative sizes of the various files above should give
>> you a hint as to what's using the most space, but it'll be a bit
>> of a hunt for you to pinpoint what's actually up. TermVectors
>> and norms are often sources of using up space.
>>
>> Best
>> Erick
>>
>> On Wed, Mar 28, 2012 at 10:55 AM, Vadim Kisselmann
>>  wrote:
>>> Hello folks,
>>>
>>> I work with Solr 4.0 r1292064 from trunk.
>>> My index grows fast; with 10 Mio. docs I get an index size of 150GB
>>> (25% stored, 75% indexed).
>>> I want to find out which fields (content) are too large, to consider
>>> countermeasures.
>>>
>>> How can I localize/discover the largest fields in my index?
>>> Luke (latest from trunk) doesn't work
>>> with my Solr version. I built Lucene/Solr .jars and tried to feed Luke
>>> these, but I get many errors
>>> and can't build it.
>>>
>>> What other options do I have?
>>>
>>> Thanks and best regards
>>> Vadim
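
A schema sketch of the change Vadim is weighing; the field name "content" is
hypothetical, not taken from his schema. termVectors, termPositions and
termOffsets default to false, so removing the attributes (plus a full reindex)
lets the .tvx/.tvd/.tvf data disappear. Highlighting then falls back to
re-analyzing the stored text, which is slower per request but costs no index
space, and MoreLikeThis can likewise fall back to stored fields:

    <!-- before: term vectors kept for MLT and highlighting -->
    <field name="content" type="text" indexed="true" stored="true"
           termVectors="true" termPositions="true" termOffsets="true"/>

    <!-- after: no term vectors; tvf shrinks away on the next full reindex -->
    <field name="content" type="text" indexed="true" stored="true"/>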


Re: query help

2012-03-29 Thread Erick Erickson
Solr doesn't support sorting on multiValued fields, so I don't think this
is possible OOB.

I can't come up with a clever indexing solution that does this either, sorry.

Best
Erick

On Thu, Mar 29, 2012 at 8:27 AM, Abhishek tiwari
 wrote:
> a) No, I do not want to sort the content within a document.
> I want to sort the documents.
> b) As I have explained, I have a result set (documents) and each document
> contains a field "ad_text" (with other fields also) which is
> multivalued, storing some tags, say "B1, B2, B3", in each. But the order of
> the tags differs for each doc: (B1, B2, B3) for doc1, (B3, B1, B2) for
> doc2, (B1, B3, B2) for doc3, (B2, B3, B1) for doc4.
>
> If I search for B1, results should come in the following order:
> doc1, doc3, doc2, doc4
> (as B1 is the first value in the multivalued field for doc1 and doc3, the
> 2nd value in doc2, and the 3rd in doc4).
> If I search for B2, results should come in the following order:
> doc4, doc1, doc3, doc2.
>
>
> I do not know whether it is possible or not,
>
> but please suggest how it can be done.
>
>
>
> On Thu, Mar 29, 2012 at 5:18 PM, Erick Erickson 
> wrote:
>
>> Hmmm, I don't quite get this. Are you saying that you want
>> to sort the documents or sort the content within the document?
>>
>> Sorting documents (i.e the results list) requires a single-valued
>> field. So you'd have to, at index time, sort the entries.
>>
>> Sorting the content within the document is something you'd
>> have to do when you index, Solr doesn't rearrange the
>> contents of a document.
>>
>> If all you want to do is display the results within the document
>> in order, your app can do that as it builds the display page.
>>
>> Best
>> Erick
>>
>> On Wed, Mar 28, 2012 at 9:02 AM, Abhishek tiwari
>>  wrote:
>> > Hi,
>> > I have a multivalued field and want to sort the docs by the order in
>> > which a particular text, e.g. 'B1', was added.
>> > How should I query? ad_text is the multivalued field.
>> >
>> > doc1: B1, B2, B3
>> >
>> > doc2: B2, B1, B3
>> >
>> > doc3: B1, B2, B3
>> >
>> > doc4: B3, B2, B1
>>


why does building war from source produce a different size file?

2012-03-29 Thread geeky2

hello all,

I have been pulling down the 3.5 solr war file from the mirror site.

The size of this file is:

6403279 Nov 22 14:54 apache-solr-3.5.0.war

When I build the war file from source, I get a different-sized file:

 ./dist/apache-solr-3.5-SNAPSHOT.war

6404098 Mar 29 11:41 ./dist/apache-solr-3.5-SNAPSHOT.war

Am I building from the wrong source?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/why-does-building-war-from-source-produce-a-different-size-file-tp3868307p3868307.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: UTF-8 encoding

2012-03-29 Thread henri.gour...@laposte.net
Thanks for the tips, but unfortunately, no progress so far.
Reading through the Web, I guess that jetty has utf-8 problems!
I guess that I will have to switch from the embedded (and pre-installed ->
easy) jetty server present in Solr in favor of Tomcat (for which I have to
rediscover the installation issues!).

Cheers,

Henri

--
View this message in context: 
http://lucene.472066.n3.nabble.com/UTF-8-encoding-tp3867885p3868198.html
Sent from the Solr - User mailing list archive at Nabble.com.


pattern error in PatternReplaceCharFilterFactory

2012-03-29 Thread OliverS
Hello

I am trying to filter out characters per Unicode block before
tokenization, so I use "PatternReplaceCharFilterFactory". In the end, I want
to filter out all non-CJK characters, basically the Latin, Greek, Arabic and
Hebrew scripts.

The problem is, PatternReplaceCharFilterFactory does not fully support the
block or script pattern notation. Example:

This works. Other patterns tried were: \p{InLatin-1_Supplement} or \p{Latin}
These throw an exception, from the log:
***
Mar 29, 2012 5:56:45 PM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.solr.common.SolrException: Plugin init failure for
[schema.xml] fieldType:Plugin init failure for [schema.xml]
analyzer/charFilter:Configuration Error: 'pattern' can not be parsed in
org.apache.solr.analysis.PatternReplaceCharFilterFactory
***

I am running the latest 4.0 nightly (version 4.0.0.2012.03.09.11.46.05)

Can anybody help? Or, might this be a java issue?

Thanks a lot
Oliver

--
View this message in context: 
http://lucene.472066.n3.nabble.com/pattern-error-in-PatternReplaceCharFilterFactory-tp3868174p3868174.html
Sent from the Solr - User mailing list archive at Nabble.com.
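
One way to test Oliver's "might this be a java issue?" guess directly, as a
standalone diagnostic sketch (nothing here is from his setup): compile the same
pattern strings in a bare JVM. Block names are written \p{InBlockName}, but
script names only gained JDK support in Java 7, and there as \p{IsLatin} or
\p{script=Latin} rather than \p{Latin}, so a PatternSyntaxException from this
program would mean the JDK regex engine, not Solr, is rejecting the notation:

    import java.util.regex.Pattern;
    import java.util.regex.PatternSyntaxException;

    public class PatternCheck {
        public static void main(String[] args) {
            // The notations mentioned in the schema experiments above.
            String[] candidates = {
                "\\p{InLatin-1_Supplement}", "\\p{Latin}", "\\p{IsLatin}"
            };
            for (String candidate : candidates) {
                try {
                    Pattern.compile(candidate);
                    System.out.println("OK:     " + candidate);
                } catch (PatternSyntaxException e) {
                    // The JDK itself rejects this notation.
                    System.out.println("BROKEN: " + candidate);
                }
            }
        }
    }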


Re: Empty facet counts

2012-03-29 Thread Youri Westerman
The version is 3.5.0.2011.11.22.14.54.38. I did not apply any patches, but
then again it is not my server.
Do you have a clue on what is going wrong here?

Regards,

Youri

2012/3/29 Bill Bell 

> Send schema.xml and did you apply any patches? What version of Solr?
>
> Bill Bell
> Sent from mobile
>
>
> On Mar 29, 2012, at 5:26 AM, Youri Westerman  wrote:
>
> > Hi,
> >
> > I'm currently learning how to use solr and everything seems pretty
> straight
> > forward. For some reason when I use faceted queries it returns only empty
> > sets in the facet_count section.
> >
> > The get params I'm using are:
> >  ?q=*:*&rows=0&facet=true&facet.field=urn
> >
> > The result:
> >  "facet_counts": {
> >
> >  "facet_queries": { },
> >  "facet_fields": { },
> >  "facet_dates": { },
> >  "facet_ranges": { }
> >
> >  }
> >
> > The urn field is indexed and there are enough entries to be counted. When
> > adding facet.method=Enum, nothing changes.
> > Does anyone know why this is happening? Am I missing something?
> >
> > Thanks in advance!
> >
> > Youri
>

[attachment: schema.xml, markup stripped by the list archive; the only surviving field values are "urn" and "titleMain"]

Re: Custom scoring question

2012-03-29 Thread Darren Govoni
I'm going to try index time per-field boosting and do the boost
computation at index time and see if that helps.

On Thu, 2012-03-29 at 10:08 -0400, Darren Govoni wrote:
> Hi,
>  I have a situation where I want to re-score document relevance.
> 
> Let's say I have two fields:
> 
> text: The quick brown fox jumped over the white fence.
> terms: fox fence
> 
> Now my queries come in as:
> 
> terms:[* TO *]
> 
> and Solr scores them on that field. 
> 
> What I want is to rank them according to the distribution of field
> "terms" within field "text". Which is a per document calculation.
> 
> Can this be done with any kind of dismax? I'm not searching for known
> terms at query time.
> 
> If not, what is the best way to implement a custom scoring handler to
> perform this calculation and re-score/sort the results?
> 
> thanks for any tips!!!
> 
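
A bare-bones SolrJ sketch of that index-time route; computeTermDensity() and
every name below are hypothetical stand-ins, not Darren's code. The third
argument to addField() is a per-field index-time boost, folded into the field's
norm (so the field must not be omitNorms="true"), which in turn shifts the
ranking of terms:[* TO *] queries:

    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BoostedIndexer {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer server =
                    new CommonsHttpSolrServer("http://localhost:8983/solr");

            String text  = "The quick brown fox jumped over the white fence.";
            String terms = "fox fence";

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-1");
            doc.addField("text", text);
            // Per-field index-time boost, folded into the "terms" norm.
            doc.addField("terms", terms, computeTermDensity(text, terms));

            server.add(doc);
            server.commit();
        }

        // Hypothetical: reward documents whose "terms" all occur in "text".
        static float computeTermDensity(String text, String terms) {
            String lower = text.toLowerCase();
            String[] words = terms.toLowerCase().split("\\s+");
            int hits = 0;
            for (String w : words) {
                if (lower.contains(w)) hits++;
            }
            return 1.0f + (float) hits / Math.max(1, words.length);
        }
    }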




bbox query and range queries

2012-03-29 Thread Alexandre Rocco
Hello,

I'm trying to perform some queries on a location field on the index.
The requirement is to search listings inside a pair of coordinates, like a
bounding box.

Taking a look at the wiki, I noticed that there is the option to use the
bbox query, but it does not create a rectangular-shaped box to find the docs.
Also, since the LatLon field is searchable by range, it's possible to use a
range query to find them.

I'm trying to search inside a pair of coordinates (the top left corner and
bottom right corner) and no result is found.

The query i'm trying is something like:
http://localhost:8984/solr/select?wt=json&indent=true&fl=local,*&q=*:*&fq=local:[-23.6674,-46.7314 TO
-23.6705,-46.7274]

Is there any other way to find docs inside a rectangular bounding box?

Thanks
Alexandre
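
One thing to check, assuming "local" is a solr.LatLonType field: a range query
over LatLon expects the lower-left corner first and the upper-right corner
second, i.e. each dimension's lower bound must be the smaller value. The query
above has the corners in top-left/bottom-right order (and the space before TO
went missing), which matches nothing. Reordered, the same rectangle would be:

    fq=local:[-23.6705,-46.7314 TO -23.6674,-46.7274]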


Re: UTF-8 encoding

2012-03-29 Thread Paul Libbrecht
Also, in case you use Apache's mod_proxy, be sure to use the nocanon attribute.
(I don't know of an equivalent for mod_rewrite).

In general, I also tend to advise changing the default encoding of the 
Java running the servlets... but I am sure you've done this.

Tell us your success or lack thereof, I'm interested and I am sure others are.

paul


Le 29 mars 2012 à 16:49, Bob Sandiford a écrit :

> Hi, Henri.
> 
> Make sure that the container in which you are running Solr is also set for 
> UTF-8.
> 
> For example, in Tomcat, in the server.xml file, your Connector definitions 
> should include:
>   URIEncoding="UTF-8"
> 
> 
>> -Original Message-
>> From: henri.gour...@laposte.net [mailto:henri.gour...@laposte.net]
>> Sent: Thursday, March 29, 2012 10:42 AM
>> To: solr-user@lucene.apache.org
>> Subject: UTF-8 encoding
>> 
>> I can't get utf-8 encoding to work!!
>> 
>> I have text/html;charset=UTF-8
>> 
>> in my request handler, and
>> input.encoding=UTF-8
>> output.encoding=UTF-8
>> in velocity.properties, in various locations (I may have the wrong ones! at
>> least in the folder where the .vm files reside)
>> 
>> What else should I be doing/configuring.
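
In case it saves a step, a sketch for the stock Solr example, which ships
Jetty 6: both the JVM default encoding and Jetty's URI decoding charset can be
forced on the java command line. The URI property name is version-specific
(org.mortbay.util.URI.charset on Jetty 6, org.eclipse.jetty.util.URI.charset on
Jetty 7+), so verify it against the Jetty actually in use:

    java -Dfile.encoding=UTF-8 \
         -Dorg.mortbay.util.URI.charset=UTF-8 \
         -jar start.jar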



RE: UTF-8 encoding

2012-03-29 Thread Bob Sandiford
Hi, Henri.

Make sure that the container in which you are running Solr is also set for 
UTF-8.

For example, in Tomcat, in the server.xml file, your Connector definitions 
should include:
URIEncoding="UTF-8"

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com

Register for the 2012 COSUGI User Group Conference today for early bird pricing!
May 2-5 at Disney's Coronado Springs Resort - Lake Buena Vista, Florida
 
Join the conversation: Like us on Facebook! Follow us on Twitter!

> -Original Message-
> From: henri.gour...@laposte.net [mailto:henri.gour...@laposte.net]
> Sent: Thursday, March 29, 2012 10:42 AM
> To: solr-user@lucene.apache.org
> Subject: UTF-8 encoding
> 
> I can't get utf-8 encoding to work!!
> 
> I have text/html;charset=UTF-8
> 
> in my request handler, and
> input.encoding=UTF-8
> output.encoding=UTF-8
> in velocity.properties, in various locations (I may have the wrong ones! at
> least in the folder where the .vm files reside)
> 
> What else should I be doing/configuring.
> 
> Thanks
> Henri
> 
> --
> View this message in context: http://lucene.472066.n3.nabble.com/UTF-8-
> encoding-tp3867885p3867885.html
> Sent from the Solr - User mailing list archive at Nabble.com.




Re: Slow first searcher with facet on bibliographic data in Master - Slave

2012-03-29 Thread Dennis Schafroth
I was wrong! It does seem to work! 

Thanks a bunch! 

cheers,
:-Dennis

On Mar 29, 2012, at 15:52 , fbrisbart wrote:

> I had the same issue months ago.
> 'newSearcher' fixed the problem for me.
> I also remember that I had to upgrade solr (3.1) because it didn't work
> with release 1.4 
> But, I suppose you already have a solr 3.x or more.
> So I'm afraid I can't help you more :o(
> 
> Franck
> 
> 
> Le jeudi 29 mars 2012 à 15:41 +0200, Dennis Schafroth a écrit :
>> On Mar 29, 2012, at 14:49 , fbrisbart wrote:
>> 
>>> Arf, I didn't see your attached tgz.
>>> 
>>> In your slave solrconfig.xml, only the 'firstSearcher' contains the
>>> query. Add it also in the 'newSearcher', so that the new search
>>> instances will wait also after a new index is replicated.
>> 
>> Did that now, but I believe my case is mostly a first searcher issue. Anyway 
>> it didn't seem to change anything. 
>> 
>>> 
>>> The first request is long because the default faceting method uses the
>>> FieldCache for your facet fields.
>> 
>> Jup, I know. 
>> 
>>> You may also choose to use facet.method=enum. The performance is
>>> globally worse
>> 
>> You say. This means that every search with facets is now 20 seconds instead 
>> of 2. Then I prefer the field cache with one bad first search. 
>> 
>>> than the 'fc' method, but you will avoid the very slow
>>> first request. Btw, it's far better to use the default 'enum' facet
>>> method.
> I meant "the default 'fc' method" of course :o)
> 
>> 
>> Thanks for the input so far. 
>> 
>>> 
>>> Hope this helps,
>>> Franck
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> Le jeudi 29 mars 2012 à 13:57 +0200, fbrisbart a écrit :
 If you add your query to the firstSearcher and/or newSearcher event
 listeners in the slave
 'solrconfig.xml' ( 
 http://wiki.apache.org/solr/SolrCaching#newSearcher_and_firstSearcher_Event_Listeners
  ),
 
 each new search instance will wait before accepting queries.
 
 Example to load the FieldCache for 'your_facet_field' field :
 ...
 <listener event="newSearcher" class="solr.QuerySenderListener">
   <arr name="queries">
     <lst>
       <str name="q">*:*</str>
       <str name="facet">true</str>
       <str name="facet.field">your_facet_field</str>
     </lst>
   </arr>
 </listener>
 ...
 
 
 Franck
 
 Le jeudi 29 mars 2012 à 13:30 +0200, Dennis Schafroth a écrit :
> Hi 
>   
> I am running indexing and facetted searching on bibliographic data, which
> is known not to perform too well due to the high facet count. Actually
> it's just the first search that is horribly slow, 200+ seconds. After
> that, I am getting okay times (1 second) (at least in the few-users
> scenario we have now).
>
> The current index is 54 million records with approx. 10 million unique
> authors. The facets (… _exact) are using the string type.
>
> I had hoped that a master (indexing) and slave (searching) would have
> solved the issue, but I am still seeing the issue on the slave, so I
> guess I must have misunderstood (or perhaps misconfigured) something.
>
> I had thought that the slave would not switch to the new index until the
> auto warming was completed. Is such behavior possible?
>
> I guess an alternative solution could be to have multiple slaves and
> taking a slave off-line when doing replication, but if it is possible to
> do it simpler (and using 1/3 less space) that would be great. Then again
> we might need multiple slaves with more requests.
> 
> Attached is the configuration files.
> 
> Let me know if there is missing information. 
> 
> cheers, 
> :-Dennis Schafroth
> 
 
 
>>> 
>>> 
>>> 
>> 
> 
> 
> 



UTF-8 encoding

2012-03-29 Thread henri.gour...@laposte.net
I can't get utf-8 encoding to work!!

I have text/html;charset=UTF-8

in my request handler, and 
input.encoding=UTF-8
output.encoding=UTF-8
in velocity.properties, in various locations (I may have the wrong ones! at
least in the folder where the .vm files reside)

What else should I be doing/configuring.

Thanks
Henri

--
View this message in context: 
http://lucene.472066.n3.nabble.com/UTF-8-encoding-tp3867885p3867885.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR hangs - update timeout - please help

2012-03-29 Thread Walter Underwood
If you must have real-time search, you might look at systems that are designed 
to do that. MarkLogic isn't free, but it is fast and real-time. You can use 
their no-charge Express license for development and prototyping: 
http://developer.marklogic.com/express

OK, back to Solr.

wunder
Search Guy, Chegg
former MarkLogic engineer

On Mar 29, 2012, at 1:49 AM, Rafal Gwizdala wrote:

> That's bad news.
> If 5-7 seconds is not safe then what is the safe interval for updates?
> Near real-time is not for me as it works only when querying by document Id
> - this doesn't solve anything in my case. I just want the index to be
> updated in real-time, 30-40 seconds delay is acceptable but not much more
> than that. Is there anything that can be done, or should I start looking
> for some other indexing tool?
> I'm wondering why there's such terrible performance degradation over time -
> SOLR runs fine for first 10-20 hours, updates are extremely fast and then
> they become slower and slower until eventually they stop executing at all.
> Is there any issue with garbage collection or index fragmentation or some
> internal data structures that can't manage their data effectively when
> updates are frequent?
> 
> Best regards
> RG
> 
> 
> On Thu, Mar 29, 2012 at 10:24 AM, Lance Norskog wrote:
> 
>> 5-7 seconds- there's the problem. If you want to have documents
>> visible for search within that time, you want to use the trunk and
>> "near-real-time" search. A hard commit does several hard writes to the
>> disk (with the fsync() system call). It does not run smoothly at that
>> rate. It is no surprise that eventually you hit a thread-locking bug.
>> 
>> 
>> http://www.lucidimagination.com/search/link?url=http://wiki.apache.org/solr/RealTimeGet
>> 
>> http://www.lucidimagination.com/search/link?url=http://wiki.apache.org/solr/CommitWithin
>> 
>> On Wed, Mar 28, 2012 at 11:08 PM, Rafal Gwizdala
>>  wrote:
>>> Lance, I know there are many variables that's why I'm asking where to
>> start
>>> and what to check.
>>> Updates are sent every 5-7 seconds, each update contains between 1 and 50
>>> docs. Commit is done every time (on each update).
>>> Currently queries aren't very frequent - about 1 query every 3-5 seconds,
>>> but the system is going to handle much more (of course if the problem is
>>> fixed).
>>> The system has 2 core CPU (virtualized) and 4 GB memory (SOLR uses about
>>> 300 MB)
>>> 
>>> R
>>> 
>>> On Thu, Mar 29, 2012 at 1:53 AM, Lance Norskog 
>> wrote:
>>> 
 How often are updates? And when are commits? How many CPUs? How much
 query load? There are so many variables.
 
 Check the mailing list archives and Solr issues, there might be a
 similar problem already discussed. Also, attachments do not work with
 Apache mailing lists. (Well, ok, they work for direct subscribers, but
 not for indirect subscribers and archive site users.)
 
 --
 Lance Norskog
 goks...@gmail.com
 
>> 
>> 
>> 
>> --
>> Lance Norskog
>> goks...@gmail.com
>> 






Re: [Announce] Solr 4.0 with RankingAlgorithm 1.4.1, NRT now supports both RankingAlgorithm and Lucene

2012-03-29 Thread Nagendra Nagarajayya

It is from build 2012-03-19 from the trunk (part of the email). No fork.

Regards,

Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org

On 3/29/2012 7:20 AM, Bernd Fehling wrote:

Nothing against RankingAlgorithm and your work, which sounds great, but
I think that YOUR "Solr 4.0" might confuse some Solr users and/or 
newbies.

As far as I know the next official release will be 3.6.

So your "Solr 4.0" is a trunk snapshot or what?

If so, which revision number?

Or have you done a fork and produced a stable Solr 4.0 of your own?

Regards
Bernd


Am 29.03.2012 15:49, schrieb Nagendra Nagarajayya:
I am very excited to announce the availability of Solr 4.0 with 
RankingAlgorithm 1.4.1 (NRT support) (build 2012-03-19). The NRT 
implementation

now supports both RankingAlgorithm and Lucene.

RankingAlgorithm 1.4.1 has improved performance over the earlier 
release (1.4) and supports the entire Lucene Query Syntax, ± and/or 
boolean

queries and is compatible with the new Lucene 4.0 api.

You can get more information about NRT performance from here:
http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search_ver_4.x

You can download Solr 4.0 with RankingAlgorithm 1.4.1 from here:
http://solr-ra.tgels.org

Please download and give the new version a try.

Regards,

Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org









RE: dataImportHandler: delta query fetching data, not just ids?

2012-03-29 Thread Dyer, James
You can also use $deleteDocById. If you also use $skipDoc, you can sometimes 
get the deletes on the same entity with a "command=full-import&clean=false" 
delta.  This may or may not be more convenient than what you're doing already.  
See http://wiki.apache.org/solr/DataImportHandler#Special_Commands .

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: janne mattila [mailto:jannepostilis...@gmail.com] 
Sent: Thursday, March 29, 2012 12:45 AM
To: solr-user@lucene.apache.org
Subject: Re: dataImportHandler: delta query fetching data, not just ids?

> I'm not sure why deltas were implemented this way.  Possibly it was designed 
> to behave like some of our object-to-relational libraries?  In any case, 
> there are 2 ways to do deltas and you just have to take your pick based on 
> what will work best for your situation.  I wouldn't consider the 
> "command=full-import&clean=false" method a workaround but just a different 
> way to tackle the same problem.

Yeah, I find the delta-update strategy a little strange as well.

Problem with command=full-import&clean=false is that you can't handle
removes nicely using that. If you use the actual delta-import and
deletedPkQuery for that, you run into problems with last_index_time
and miss either modifications or deletes.

I'm handling that by creating a different entity config for updates
(using command=full-import&clean=false) and deletes (using
command=delta-import) but it ends up being much dirtier than it should
be.
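
A sketch of the $deleteDocById/$skipDoc combination James describes, written
against an invented items(id, title, deleted, last_modified) table; only the
$-prefixed special commands are real DIH features here, and alias-quoting rules
vary by database. Rows flagged as deleted hand their id to $deleteDocById and
stay out of the index via $skipDoc, so a single "command=full-import&clean=false"
pass applies updates and deletes together:

    <entity name="item"
            query="SELECT id, title,
                          CASE WHEN deleted = 1 THEN id END AS '$deleteDocById',
                          CASE WHEN deleted = 1 THEN 'true' ELSE 'false' END AS '$skipDoc'
                   FROM items
                   WHERE last_modified &gt; '${dataimporter.last_index_time}'">
      <field column="id"    name="id"/>
      <field column="title" name="title"/>
    </entity>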


Re: [Announce] Solr 4.0 with RankingAlgorithm 1.4.1, NRT now supports both RankingAlgorithm and Lucene

2012-03-29 Thread Bernd Fehling

Nothing against RankingAlgorithm and your work, which sounds great, but
I think that YOUR "Solr 4.0" might confuse some Solr users and/or newbies.
As far as I know the next official release will be 3.6.

So your "Solr 4.0" is a trunk snapshot or what?

If so, which revision number?

Or have you done a fork and produced a stable Solr 4.0 of your own?

Regards
Bernd


Am 29.03.2012 15:49, schrieb Nagendra Nagarajayya:

I am very excited to announce the availability of Solr 4.0 with 
RankingAlgorithm 1.4.1 (NRT support) (build 2012-03-19). The NRT implementation
now supports both RankingAlgorithm and Lucene.

RankingAlgorithm 1.4.1 has improved performance over the earlier release (1.4) 
and supports the entire Lucene Query Syntax, ± and/or boolean
queries and is compatible with the new Lucene 4.0 api.

You can get more information about NRT performance from here:
http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search_ver_4.x

You can download Solr 4.0 with RankingAlgorithm 1.4.1 from here:
http://solr-ra.tgels.org

Please download and give the new version a try.

Regards,

Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org




Re: SOLR hangs - update timeout - please help

2012-03-29 Thread Yonik Seeley
On Thu, Mar 29, 2012 at 4:24 AM, Lance Norskog  wrote:
> 5-7 seconds- there's the problem. If you want to have documents
> visible for search within that time, you want to use the trunk and
> "near-real-time" search. A hard commit does several hard writes to the
> disk (with the fsync() system call). It does not run smoothly at that
> rate. It is no surprise that eventually you hit a thread-locking bug.

Are you speaking of a JVM bug, or something else?  A Lucene bug?  A Solr bug?

Rafal, do you have a thread dump of when the update hangs (as opposed
to at shutdown?)

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10


Re: Empty facet counts

2012-03-29 Thread Bill Bell
Send schema.xml and did you apply any patches? What version of Solr?

Bill Bell
Sent from mobile


On Mar 29, 2012, at 5:26 AM, Youri Westerman  wrote:

> Hi,
> 
> I'm currently learning how to use solr and everything seems pretty straight
> forward. For some reason when I use faceted queries it returns only empty
> sets in the facet_count section.
> 
> The get params I'm using are:
>  ?q=*:*&rows=0&facet=true&facet.field=urn
> 
> The result:
>  "facet_counts": {
> 
>  "facet_queries": { },
>  "facet_fields": { },
>  "facet_dates": { },
>  "facet_ranges": { }
> 
>  }
> 
> The urn field is indexed and there are enough entries to be counted. When
> adding facet.method=Enum, nothing changes.
> Does anyone know why this is happening? Am I missing something?
> 
> Thanks in advance!
> 
> Youri


Custom scoring question

2012-03-29 Thread Darren Govoni
Hi,
 I have a situation where I want to re-score document relevance.

Let's say I have two fields:

text: The quick brown fox jumped over the white fence.
terms: fox fence

Now my queries come in as:

terms:[* TO *]

and Solr scores them on that field. 

What I want is to rank them according to the distribution of field
"terms" within field "text". Which is a per document calculation.

Can this be done with any kind of dismax? I'm not searching for known
terms at query time.

If not, what is the best way to implement a custom scoring handler to
perform this calculation and re-score/sort the results?

thanks for any tips!!!



Re: SOLR hangs - update timeout - please help

2012-03-29 Thread Nagendra Nagarajayya
Have you tried using Solr 3.5 with RankingAlgorithm 1.4.1 ? Has NRT 
support and is very fast, updates about 5000 documents in about 490 ms 
(while updating 1m docs in batches of 5k).


You can get more info from here:
http://solr-ra.tgels.com/wiki/en/Near_Real_Time_Search_ver_3.x


Regards,

Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org

On 3/29/2012 1:49 AM, Rafal Gwizdala wrote:

That's bad news.
If 5-7 seconds is not safe then what is the safe interval for updates?
Near real-time is not for me as it works only when querying by document Id
- this doesn't solve anything in my case. I just want the index to be
updated in real-time, 30-40 seconds delay is acceptable but not much more
than that. Is there anything that can be done, or should I start looking
for some other indexing tool?
I'm wondering why there's such terrible performance degradation over time -
SOLR runs fine for first 10-20 hours, updates are extremely fast and then
they become slower and slower until eventually they stop executing at all.
Is there any issue with garbage collection or index fragmentation or some
internal data structures that can't manage their data effectively when
updates are frequent?

Best regards
RG


  On Thu, Mar 29, 2012 at 10:24 AM, Lance Norskog wrote:


5-7 seconds- there's the problem. If you want to have documents
visible for search within that time, you want to use the trunk and
"near-real-time" search. A hard commit does several hard writes to the
disk (with the fsync() system call). It does not run smoothly at that
rate. It is no surprise that eventually you hit a thread-locking bug.


http://www.lucidimagination.com/search/link?url=http://wiki.apache.org/solr/RealTimeGet

http://www.lucidimagination.com/search/link?url=http://wiki.apache.org/solr/CommitWithin

On Wed, Mar 28, 2012 at 11:08 PM, Rafal Gwizdala wrote:

Lance, I know there are many variables, that's why I'm asking where to start
and what to check.
Updates are sent every 5-7 seconds, each update contains between 1 and 50
docs. Commit is done every time (on each update).
Currently queries aren't very frequent - about 1 query every 3-5 seconds,
but the system is going to handle much more (of course if the problem is
fixed).
The system has 2 core CPU (virtualized) and 4 GB memory (SOLR uses about
300 MB)

R

On Thu, Mar 29, 2012 at 1:53 AM, Lance Norskog wrote:

How often are updates? And when are commits? How many CPUs? How much
query load? There are so many variables.

Check the mailing list archives and Solr issues, there might be a
similar problem already discussed. Also, attachments do not work with
Apache mailing lists. (Well, ok, they work for direct subscribers, but
not for indirect subscribers and archive site users.)

--
Lance Norskog
goks...@gmail.com




--
Lance Norskog
goks...@gmail.com





Re: Slow first searcher with facet on bibliographic data in Master - Slave

2012-03-29 Thread fbrisbart
I had the same issue months ago.
'newSearcher' fixed the problem for me.
I also remember that I had to upgrade solr (3.1) because it didn't work
with release 1.4.
But, I suppose you already have a solr 3.x or more.
So I'm afraid I can't help you more :o(

Franck


Le jeudi 29 mars 2012 à 15:41 +0200, Dennis Schafroth a écrit :
> On Mar 29, 2012, at 14:49 , fbrisbart wrote:
> 
> > Arf, I didn't see your attached tgz.
> > 
> > In your slave solrconfig.xml, only the 'firstSearcher' contains the
> > query. Add it also in the 'newSearcher', so that the new search
> > instances will wait also after a new index is replicated.
> 
> Did that now, but I believe my case is mostly a first searcher issue. Anyway 
> it didn't seem to change anything. 
> 
> > 
> > The first request is long because the default faceting method uses the
> > FieldCache for your facet fields.
> 
> Jup, I know. 
> 
> > You may also choose to use facet.method=enum. The performance is
> > globally worse
> 
> You say. This means that every search with facets is now 20 seconds instead 
> of 2. Then I prefer the field cache with one bad first search. 
> 
> > than the 'fc' method, but you will avoid the very slow
> > first request. Btw, it's far better to use the default 'enum' facet
> > method.
I meant "the default 'fc' method" of course :o)

> 
> Thanks for the input so far. 
> 
> > 
> > Hope this helps,
> > Franck
> > 
> > 
> > 
> > 
> > 
> > 
> > Le jeudi 29 mars 2012 à 13:57 +0200, fbrisbart a écrit :
> >> If you add your query to the firstSearcher and/or newSearcher event
> >> listeners in the slave
> >> 'solrconfig.xml' ( 
> >> http://wiki.apache.org/solr/SolrCaching#newSearcher_and_firstSearcher_Event_Listeners
> >>  ),
> >> 
> >> each new search instance will wait before accepting queries.
> >> 
> >> Example to load the FieldCache for 'your_facet_field' field :
> >> ...
> >> <listener event="newSearcher" class="solr.QuerySenderListener">
> >>   <arr name="queries">
> >>     <lst>
> >>       <str name="q">*:*</str>
> >>       <str name="facet">true</str>
> >>       <str name="facet.field">your_facet_field</str>
> >>     </lst>
> >>   </arr>
> >> </listener>
> >> ...
> >> 
> >> 
> >> Franck
> >> 
> >> Le jeudi 29 mars 2012 à 13:30 +0200, Dennis Schafroth a écrit :
> >>> Hi 
> >>>   
> >>> I am running indexing and facetted searching on bibliographic data, which
> >>> is known not to perform too well due to the high facet count. Actually
> >>> it's just the first search that is horribly slow, 200+ seconds. After
> >>> that, I am getting okay times (1 second) (at least in the few-users
> >>> scenario we have now).
> >>>
> >>> The current index is 54 million records with approx. 10 million unique
> >>> authors. The facets (… _exact) are using the string type.
> >>>
> >>> I had hoped that a master (indexing) and slave (searching) would have
> >>> solved the issue, but I am still seeing the issue on the slave, so I
> >>> guess I must have misunderstood (or perhaps misconfigured) something.
> >>>
> >>> I had thought that the slave would not switch to the new index until the
> >>> auto warming was completed. Is such behavior possible?
> >>>
> >>> I guess an alternative solution could be to have multiple slaves and
> >>> taking a slave off-line when doing replication, but if it is possible to
> >>> do it simpler (and using 1/3 less space) that would be great. Then again
> >>> we might need multiple slaves with more requests.
> >>> 
> >>> Attached is the configuration files.
> >>> 
> >>> Let me know if there is missing information. 
> >>> 
> >>> cheers, 
> >>> :-Dennis Schafroth
> >>> 
> >> 
> >> 
> > 
> > 
> > 
> 




[Announce] Solr 4.0 with RankingAlgorithm 1.4.1, NRT now supports both RankingAlgorithm and Lucene

2012-03-29 Thread Nagendra Nagarajayya
I am very excited to announce the availability of Solr 4.0 with 
RankingAlgorithm 1.4.1 (NRT support) (build 2012-03-19). The NRT 
implementation now supports both RankingAlgorithm and Lucene.


RankingAlgorithm 1.4.1 has improved performance over the earlier release 
(1.4) and supports the entire Lucene Query Syntax, ± and/or boolean 
queries and is compatible with the new Lucene 4.0 api.


You can get more information about NRT performance from here:
http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search_ver_4.x

You can download Solr 4.0 with RankingAlgorithm 1.4.1 from here:
http://solr-ra.tgels.org

Please download and give the new version a try.

Regards,

Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org




Re: Slow first searcher with facet on bibliographic data in Master - Slave

2012-03-29 Thread Dennis Schafroth

On Mar 29, 2012, at 14:49 , fbrisbart wrote:

> Arf, I didn't see your attached tgz.
> 
> In your slave solrconfig.xml, only the 'firstSearcher' contains the
> query. Add it also in the 'newSearcher', so that the new search
> instances will wait also after a new index is replicated.

Did that now, but I believe my case is mostly a first searcher issue. Anyway it 
didn't seem to change anything. 

> 
> The first request is long because the default faceting method uses the
> FieldCache for your facet fields.

Jup, I know. 

> You may also choose to use facet.method=enum. The performance is
> globally worse

You say. This means that every search with facets is now 20 seconds instead of 
2. Then I prefer the field cache with one bad first search. 

> than the 'fc' method, but you will avoid the very slow
> first request. Btw, it's far better to use the default 'enum' facet
> method.

Thanks for the input so far. 

> 
> Hope this helps,
> Franck
> 
> 
> 
> 
> 
> 
> Le jeudi 29 mars 2012 à 13:57 +0200, fbrisbart a écrit :
>> If you add your query to the firstSearcher and/or newSearcher event
>> listeners in the slave
>> 'solrconfig.xml' ( 
>> http://wiki.apache.org/solr/SolrCaching#newSearcher_and_firstSearcher_Event_Listeners
>>  ),
>> 
>> each new search instance will wait before accepting queries.
>> 
>> Example to load the FieldCache for 'your_facet_field' field :
>> ...
>> <listener event="newSearcher" class="solr.QuerySenderListener">
>>   <arr name="queries">
>>     <lst>
>>       <str name="q">*:*</str>
>>       <str name="facet">true</str>
>>       <str name="facet.field">your_facet_field</str>
>>     </lst>
>>   </arr>
>> </listener>
>> ...
>> 
>> 
>> Franck
>> 
>> Le jeudi 29 mars 2012 à 13:30 +0200, Dennis Schafroth a écrit :
>>> Hi 
>>> 
>>> I am running indexing and facetted searching on bibliographic data, which
>>> is known not to perform too well due to the high facet count. Actually
>>> it's just the first search that is horribly slow, 200+ seconds. After
>>> that, I am getting okay times (1 second) (at least in the few-users
>>> scenario we have now).
>>>
>>> The current index is 54 million records with approx. 10 million unique
>>> authors. The facets (… _exact) are using the string type.
>>>
>>> I had hoped that a master (indexing) and slave (searching) would have
>>> solved the issue, but I am still seeing the issue on the slave, so I
>>> guess I must have misunderstood (or perhaps misconfigured) something.
>>>
>>> I had thought that the slave would not switch to the new index until the
>>> auto warming was completed. Is such behavior possible?
>>>
>>> I guess an alternative solution could be to have multiple slaves and
>>> taking a slave off-line when doing replication, but if it is possible to
>>> do it simpler (and using 1/3 less space) that would be great. Then again
>>> we might need multiple slaves with more requests.
>>> 
>>> Attached is the configuration files.
>>> 
>>> Let me know if there is missing information. 
>>> 
>>> cheers, 
>>> :-Dennis Schafroth
>>> 
>> 
>> 
> 
> 
> 



Re: Localize the largest fields (content) in index

2012-03-29 Thread Vadim Kisselmann
Hi Erick,
thanks:)
The admin UI gives me the counts, so I can identify fields with big
bulks of unique terms.
I know this wiki page, but I read it one more time.
List of my file extensions with size in GB(Index size ~150GB):
tvf 90GB
fdt 30GB
tim 18GB
prx 15GB
frq 12GB
tip 200MB
tvx 150MB

tvf is my biggest file extension.
Wiki: This file contains, for each field that has a term vector
stored, a list of the terms, their frequencies and, optionally,
position and offset information.

Hmm, i use termVectors on my biggest fields because of MLT and Highlighting.
But i think i should test my performance without termVectors. Good Idea? :)

What do you think about my file extension sizes?

Best regards
Vadim




2012/3/29 Erick Erickson :
> The admin UI (schema browser) will give you the counts of unique terms
> in your fields, which is where I'd start.
>
> I suspect you've already seen this page, but if not:
> http://lucene.apache.org/java/3_5_0/fileformats.html#file-names
> the .fdt and .fdx file extensions are where data goes when
> you set 'stored="true" '. These files don't affect search speed,
> they just contain the verbatim copy of the data.
>
> The relative sizes of the various files above should give
> you a hint as to what's using the most space, but it'll be a bit
> of a hunt for you to pinpoint what's actually up. TermVectors
> and norms are often sources of using up space.
>
> Best
> Erick
>
> On Wed, Mar 28, 2012 at 10:55 AM, Vadim Kisselmann
>  wrote:
>> Hello folks,
>>
>> I work with Solr 4.0 r1292064 from trunk.
>> My index grows fast; with 10 Mio. docs I get an index size of 150GB
>> (25% stored, 75% indexed).
>> I want to find out which fields (content) are too large, to consider
>> countermeasures.
>>
>> How can I localize/discover the largest fields in my index?
>> Luke (latest from trunk) doesn't work
>> with my Solr version. I built Lucene/Solr .jars and tried to feed Luke
>> these, but I get many errors
>> and can't build it.
>>
>> What other options do I have?
>>
>> Thanks and best regards
>> Vadim


Re: SOLR hangs - update timeout - please help

2012-03-29 Thread Shawn Heisey

On 3/29/2012 2:49 AM, Rafal Gwizdala wrote:

That's bad news.
If 5-7 seconds is not safe then what is the safe interval for updates?
Near real-time is not for me as it works only when querying by document Id
- this doesn't solve anything in my case. I just want the index to be
updated in real-time, 30-40 seconds delay is acceptable but not much more
than that. Is there anything that can be done, or should I start looking
for some other indexing tool?
I'm wondering why there's such terrible performance degradation over time -
SOLR runs fine for first 10-20 hours, updates are extremely fast and then
they become slower and slower until eventually they stop executing at all.
Is there any issue with garbage collection or index fragmentation or some
internal data structures that can't manage their data effectively when
updates are frequent?


You've gotten some replies from experts already.  I'm nowhere near their 
caliber, but I do have some things to say about my experiences.


When I do a commit, it can take 30 seconds or longer.  The bulk of that 
time is spent warming the caches.  Most of the time it's between 5 and 
15 seconds. I have a program that starts updates at the top of every 
minute, but won't begin checking time again until the previous update is 
done.  I've checked things carefully, and it's warming the filter cache 
that takes so much time.  The crazy thing is that my autoWarmCount for 
filterCache is only 4.  We have some very very nasty filter queries.


Are you kicking off these every 5-7 second updates even if the previous 
update has not finished running?  You might be able to make things 
better by only doing the current update if the previous update has 
finished, which means using the default waitSearcher=true on your 
commits.  You can try other things - reducing the size of Solr's caches 
and reducing the autoWarmCount, possibly to zero.


Garbage collection can definitely be a problem, and that can be 
compounded if the machine does not have enough RAM for the OS to keep a 
large chunk of your index cached, and/or you have not given enough RAM 
to the JVM.  As far as garbage collection, I have had good luck with the 
following options added to the java commandline.  As you can see, I have 
an 8GB heap size, which is quite a bit more than my Solr actually 
needs.  Garbage collection is less of a problem if the JVM has plenty of 
memory to work with - though I understand that if it has too much 
memory, you start having different problems with GC.


-Xms8192M
-Xmx8192M
-XX:NewRatio=1
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled

The servers are maxed out at 64GB, and each server is handling three 
index cores totaling about 60GB, so I can't quite fit all of my index 
into RAM.  I wish I had 256GB per server - Solr would perform much better.


You say your server has 4GB of memory, and that Solr is only using 
300MB?  I would guess that you need to upgrade to 8GB, 16GB, or more if 
you can.  Then you should give Solr at least 2-3GB of that, leaving the 
rest to cache your index.  With 5 million records, your index is 
probably several gigabytes in size.


Thanks,
Shawn
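
Following on from the point about overlapping commits, a solrconfig.xml sketch
(not Rafal's actual configuration): moving the commit decision into Solr's
autoCommit lets the client send updates continuously while commits happen at a
bounded rate. A 30-second ceiling matches the delay Rafal said was acceptable:

    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <maxDocs>10000</maxDocs>  <!-- commit once this many docs are pending -->
        <maxTime>30000</maxTime>  <!-- ...or after 30 s, whichever comes first -->
      </autoCommit>
    </updateHandler>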



Re: Solr advanced boosting

2012-03-29 Thread Martin Koch
We're doing something similar: We want to combine search relevancy with a
fitness value computed from several other data sources.

For this, we pre-compute the fitness value for each document and store it in a
flat file (lines of the format document_id=fitness_score) that we use an
externalFileField to access from Solr.

This file can be updated at regular intervals, e.g. to reflect recent views
or up/downvotes. It is re-read by solr on every commit.

The fitness field can then be included as a boost field in a (e)dismax
query.

/Martin

On Thu, Mar 29, 2012 at 9:56 AM, mads  wrote:

> Hello everyone!
>
> I am new to Solr and I have been doing a bit of reading about boosting
> search results. My search index consists of products with different
> attributes like a title, a description, a brand, a price, a discount percent
> and so on. I would like to do a fairly complex boosting, so that for example
> a hit on the brand name, a low price, a high discount percent is boosted
> compared to a hit in the title, higher prices etc. Basically I would like to
> make a more "intelligent" search with my self-defined boosting algorithm
> or definition. I hope it makes sense. My question is if more experienced
> Solr people consider this possible, and how I can get started on this
> project? Is it possible to do a kind of a plugin, or?
>
> Regards Mads
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-advanced-boosting-tp3867025p3867025.html
> Sent from the Solr - User mailing list archive at Nabble.com.
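
For anyone wanting to try this, a schema sketch with made-up names. The data
file lives in the index data directory as external_<fieldname> (here
external_fitness), one document_id=fitness_score line per document, and is
re-read when a new searcher opens after a commit; the field can then be
referenced as an (e)dismax boost, e.g. defType=edismax&boost=fitness:

    <fieldType name="externalScore" class="solr.ExternalFileField"
               keyField="id" defVal="0" valType="pfloat"/>

    <field name="fitness" type="externalScore" indexed="false" stored="false"/>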
>


Re: Slow first searcher with facet on bibliographic data in Master - Slave

2012-03-29 Thread fbrisbart
Arf, I didn't see your attached tgz.

In your slave solrconfig.xml, only the 'firstSearcher' contains the
query. Add it also in the 'newSearcher', so that the new search
instances will wait also after a new index is replicated.

The first request is long because the default faceting method uses the
FieldCache for your facet fields.
You may also choose to use facet.method=enum. The performance is
globally worse than the 'fc' method, but you will avoid the very slow
first request. Btw, it's far better to use the default 'enum' facet
method.

Hope this helps,
Franck






Le jeudi 29 mars 2012 à 13:57 +0200, fbrisbart a écrit :
> If you add your query to the firstSearcher and/or newSearcher event
> listeners in the slave
> 'solrconfig.xml' ( 
> http://wiki.apache.org/solr/SolrCaching#newSearcher_and_firstSearcher_Event_Listeners
>  ),
> 
> each new search instance will wait before accepting queries.
> 
> Example to load the FieldCache for 'your_facet_field' field :
> ...
> <listener event="newSearcher" class="solr.QuerySenderListener">
>   <arr name="queries">
>     <lst>
>       <str name="q">*:*</str>
>       <str name="facet">true</str>
>       <str name="facet.field">your_facet_field</str>
>     </lst>
>   </arr>
> </listener>
> ...
> 
> 
> Franck
> 
> Le jeudi 29 mars 2012 à 13:30 +0200, Dennis Schafroth a écrit :
> > Hi 
> > 
> > I am running indexing and facetted searching on bibliographic data, which
> > is known not to perform too well due to the high facet count. Actually
> > it's just the first search that is horribly slow, 200+ seconds. After
> > that, I am getting okay times (1 second) (at least in the few-users
> > scenario we have now).
> > 
> > The current index is 54 million records with approx. 10 million unique
> > authors. The facets (… _exact) are using the string type.
> > 
> > I had hoped that a master (indexing) and slave (searching) would have
> > solved the issue, but I am still seeing the issue on the slave, so I
> > guess I must have misunderstood (or perhaps misconfigured) something.
> > 
> > I had thought that the slave would not switch to the new index until the
> > auto warming was completed. Is such behavior possible?
> > 
> > I guess an alternative solution could be to have multiple slaves and
> > taking a slave off-line when doing replication, but if it is possible to
> > do it simpler (and using 1/3 less space) that would be great. Then again
> > we might need multiple slaves with more requests.
> > 
> > Attached is the configuration files.
> > 
> > Let me know if there is missing information. 
> > 
> > cheers, 
> > :-Dennis Schafroth
> > 
> 
> 




RE: Build solr with Maven

2012-03-29 Thread Aleksander Akerø
Yes I figured my problem would be something like that. I'll try with the
catalina/tomcat home variables I think.
Thank you.

-Original Message-
From: Ingar Hov [mailto:ingar@gmail.com] 
Sent: 29. mars 2012 14:13
To: solr-user@lucene.apache.org
Subject: Re: Build solr with Maven

I think you need an absolute path. But perhaps if $CATALINA_HOME or
$TOMCAT_HOME is set, you can use it with your path. Haven't tried it, though.

In any case, you should quite easily be able to verify whether relative paths
can be used. All you need to do is get the work directory for the webapp and
check that against your variable. In any case I don't think you can use a
path relative to web.xml. I believe the work directory is somewhere else
than /WEB-INF/, and that is the dir you would need to make your variable
relative to.


On Thu, Mar 29, 2012 at 1:28 PM, Aleksander Akerø 
wrote:
> Tried that, but I guess I am doing it wrong somehow with the paths.
>
> The home folder should be WEB-INF/solr inside the tomcat. But how 
> would I set that path correctly? Do I need to use absolute paths?
>
> -Original Message-
> From: Ingar Hov [mailto:ingar@gmail.com]
> Sent: 29. mars 2012 12:57
> To: solr-user@lucene.apache.org
> Subject: Re: Build solr with Maven
>
> I see..
>
> Try to use an <env-entry> for solr/home in web.xml.
>
> Regards,
> Ingar
>
> On Thu, Mar 29, 2012 at 8:34 AM, Aleksander Akerø 
> 
> wrote:
>> Well, it's all got to do with how we have decided the rest of our 
>> deployment environment. So the point is basically that there should be 
>> no configuration of the tomcat, because the webapp should know all 
>> its settings and could basically be deployed to whatever tomcat 
>> without configuration. Also there should be no configuration done 
>> outside the webapp.
>>
>> That is basically the rule that I have to live by, and I'm very 
>> thankful for your solution here, but a tomcat configuration is out of the
>> question. I was hoping there was some way to set this via maven's pom.xml 
>> or something like that.
>>
>> -Original Message-
>> From: Ingar Hov [mailto:ingar@gmail.com]
>> Sent: 28. mars 2012 18:48
>> To: solr-user@lucene.apache.org
>> Subject: Re: Build solr with Maven
>>
>> Is there any good reason for keeping solr_home within the webapp?
>>
>> It should work, but I would not recommend it. Have you configured 
>> solr_home somewhere?
>> One way in Tomcat is to do something like this:
>>
>> --
>> <Environment name="solr/home" type="java.lang.String"
>>   value="[your_solr_home]" override="true"/>
>> --
>>
>> in either: $tomcat_home/conf/Catalina/localhost/solr.xml or in 
>> $tomcat_home/conf/server xml.
>>
>> A better solution would probably be a maven project, but with two
modules.
>> This way you could build the modules together or individually. Then 
>> you can make adjustments to the config and reload core's at will, a 
>> feature you would lose with keeping solr_home within the webapp.
>>
>> Regards,
>> Ingar
>>
>>
>> On Wed, Mar 28, 2012 at 1:39 PM, Aleksander Akerø 
>> 
>> wrote:
>>> Hi
>>>
>>>
>>>
>>> My company has just decided to use maven to build new projects, 
>>> which then includes building solr with maven too.
>>>
>>> But then it has been decided that solr_home also should be installed 
>>> within the webapp someplace. But now I have got the problem that 
>>> solr can’t find the config files and so on. I have come across some 
>>> posts here and there which say that solr_home should not be placed
>>> within the webapp.
>>>
>>> Is that correct? If so, what are the reasons for it, and should it 
>>> not work at all?
>>>
>>>
>>>
>>> Aleksander Akerø
>>>
>>
>
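
For completeness, the web.xml variant Ingar points at earlier; the path is a
placeholder. A JNDI env-entry travels inside the war itself, which fits the
"no configuration outside the webapp" rule, though the value still has to be a
path the container can resolve (in practice an absolute one):

    <env-entry>
      <env-entry-name>solr/home</env-entry-name>
      <env-entry-type>java.lang.String</env-entry-type>
      <env-entry-value>/path/to/tomcat/webapps/solr/WEB-INF/solr</env-entry-value>
    </env-entry>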



Re: Slow first searcher with facet on bibliographic data in Master - Slave

2012-03-29 Thread Dennis Schafroth

I do have a firstSearcher, but currently coldSearcher is set to true. But 
doesn't this just mean that any searches will block while the first 
searcher is running? This is how the comment describes firstSearcher. It would 
almost give the same effect: that some searches take a long time.

What I am looking for is after receiving replicated data, do first searcher and 
then switch to new index. 

I will try with coldSearcher false, but I actually think I have already tried 
this. 

cheers, 
:-Dennis

On Mar 29, 2012, at 13:57 , fbrisbart wrote:

> If you add your query to the firstSearcher and/or newSearcher event
> listeners in the slave
> 'solrconfig.xml' ( 
> http://wiki.apache.org/solr/SolrCaching#newSearcher_and_firstSearcher_Event_Listeners
>  ),
> 
> each new search instance will wait before accepting queries.
> 
> Example to load the FieldCache for 'your_facet_field' field :
> ...
> <listener event="newSearcher" class="solr.QuerySenderListener">
>   <arr name="queries">
>     <lst>
>       <str name="q">*:*</str>
>       <str name="facet">true</str>
>       <str name="facet.field">your_facet_field</str>
>     </lst>
>   </arr>
> </listener>
> ...
> 
> 
> Franck
> 
> Le jeudi 29 mars 2012 à 13:30 +0200, Dennis Schafroth a écrit :
>> Hi 
>>  
>> I am running indexing and facetted searching on bibliographic data, which
>> is known not to perform too well due to the high facet count. Actually
>> it's just the first search that is horribly slow, 200+ seconds. After
>> that, I am getting okay times (1 second) (at least in the few-users
>> scenario we have now).
>> 
>> The current index is 54 million records with approx. 10 million unique
>> authors. The facets (… _exact) are using the string type.
>> 
>> I had hoped that a master (indexing) and slave (searching) would have
>> solved the issue, but I am still seeing the issue on the slave, so I
>> guess I must have misunderstood (or perhaps misconfigured) something.
>> 
>> I had thought that the slave would not switch to the new index until the
>> auto warming was completed. Is such behavior possible?
>> 
>> I guess an alternative solution could be to have multiple slaves and
>> taking a slave off-line when doing replication, but if it is possible to
>> do it simpler (and using 1/3 less space) that would be great. Then again
>> we might need multiple slaves with more requests.
>> 
>> Attached is the configuration files.
>> 
>> Let me know if there is missing information. 
>> 
>> cheers, 
>> :-Dennis Schafroth
>> 
> 
> 
> 



Solr index files - GZIP compression

2012-03-29 Thread mechravi25
Hi, 

I am trying to do the gzip compression in solr server. 

I had referred to the link below to add the same in web.xml:

http://blog.max.berger.name/2010/01/jetty-7-gzip-filter.html

I am using jetty server version 6. When I restart the server after adding
the above changes, I get the following exception:

java.lang.ClassNotFoundException: org.mortbay.servlet.GzipFilter

So, I tried placing the jar file (org.mortbay.jetty-4.2.15.jar) in the lib
folder. But it's resulting in the same error.

Am I missing anything out? Can someone please guide me on this?

Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-index-files-GZIP-compression-tp3867562p3867562.html
Sent from the Solr - User mailing list archive at Nabble.com.
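
Two guesses worth checking, offered as hypotheses rather than a verified fix:
org.mortbay.jetty-4.2.15.jar is a Jetty 4-era artifact, so it cannot supply
org.mortbay.servlet.GzipFilter to a Jetty 6 server; the filter class would have
to come from a jar matching the running 6.x release. And the linked blog targets
Jetty 7, where the class moved to org.eclipse.jetty.servlets.GzipFilter (in
jetty-servlets-7.x.jar); its web.xml wiring looks roughly like this:

    <filter>
      <filter-name>GzipFilter</filter-name>
      <filter-class>org.eclipse.jetty.servlets.GzipFilter</filter-class>
      <init-param>
        <param-name>mimeTypes</param-name>
        <param-value>text/html,text/xml,application/xml,application/json</param-value>
      </init-param>
    </filter>
    <filter-mapping>
      <filter-name>GzipFilter</filter-name>
      <url-pattern>/*</url-pattern>
    </filter-mapping>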


Re: SOLR hangs - update timeout - please help

2012-03-29 Thread Erick Erickson
Could be garbage collection. Could be larger and larger merges. At some point
your commit will cause all segments to be merged. It's likely that what's
happening is you need to hit the "magic combination" of events, particularly
the problem of too many warming searchers.

So, look at your log files or the admin page and see what your searcher
warmup times are. This provides a lower bound for your commit interval.

I'm guessing you have a single machine that's indexing and searching.
Consider a master/slave setup which will avoid the problem of
indexing and search contention. As you say you're going to handle
many more queries in the future this may be required anyway...

NRT does not just search doc IDs, it's intended for this kind of problem
so I believe that is a possibility. But we're talking trunk here I think.

I _strongly_ encourage you to think about whether such rapid search
availability is really required. Often 3-5 minutes is acceptable if you
ask, which gives you ample time to avoid this problem. That said,
you have a relatively small index here, so you may be able to get
away with, say, 30 second commits.

Best
Erick

On Thu, Mar 29, 2012 at 4:49 AM, Rafal Gwizdala
 wrote:
> That's bad news.
> If 5-7 seconds is not safe then what is the safe interval for updates?
> Near real-time is not for me as it works only when querying by document Id
> - this doesn't solve anything in my case. I just want the index to be
> updated in real-time, 30-40 seconds delay is acceptable but not much more
> than that. Is there anything that can be done, or should I start looking
> for some other indexing tool?
> I'm wondering why there's such terrible performance degradation over time -
> SOLR runs fine for first 10-20 hours, updates are extremely fast and then
> they become slower and slower until eventually they stop executing at all.
> Is there any issue with garbage collection or index fragmentation or some
> internal data structures that can't manage their data effectively when
> updates are frequent?
>
> Best regards
> RG
>
>
On Thu, Mar 29, 2012 at 10:24 AM, Lance Norskog wrote:
>
>> 5-7 seconds- there's the problem. If you want to have documents
>> visible for search within that time, you want to use the trunk and
>> "near-real-time" search. A hard commit does several hard writes to the
>> disk (with the fsync() system call). It does not run smoothly at that
>> rate. It is no surprise that eventually you hit a thread-locking bug.
>>
>>
>> http://www.lucidimagination.com/search/link?url=http://wiki.apache.org/solr/RealTimeGet
>>
>> http://www.lucidimagination.com/search/link?url=http://wiki.apache.org/solr/CommitWithin
>>
>> On Wed, Mar 28, 2012 at 11:08 PM, Rafal Gwizdala
>>  wrote:
> > Lance, I know there are many variables; that's why I'm asking where to
> > start and what to check.
>> > Updates are sent every 5-7 seconds, each update contains between 1 and 50
>> > docs. Commit is done every time (on each update).
>> > Currently queries aren't very frequent - about 1 query every 3-5 seconds,
>> > but the system is going to handle much more (of course if the problem is
>> > fixed).
>> > The system has 2 core CPU (virtualized) and 4 GB memory (SOLR uses about
>> > 300 MB)
>> >
>> > R
>> >
>> > On Thu, Mar 29, 2012 at 1:53 AM, Lance Norskog 
>> wrote:
>> >
>> >> How often are updates? And when are commits? How many CPUs? How much
>> >> query load? There are so many variables.
>> >>
>> >> Check the mailing list archives and Solr issues, there might be a
>> >> similar problem already discussed. Also, attachments do not work with
>> >> Apache mailing lists. (Well, ok, they work for direct subscribers, but
>> >> not for indirect subscribers and archive site users.)
>> >>
>> >> --
>> >> Lance Norskog
>> >> goks...@gmail.com
>> >>
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>>


Re: query help

2012-03-29 Thread Abhishek tiwari
a) No, I do not want to sort the content within a document.
I want to sort the documents.
b) As I have explained, I have a result set (documents) and each document
contains a field "*ad_text*" (among other fields) which is
multivalued, storing some tags, say "B1, B2, B3", in each. But the order of
the tags differs per doc: say (B1, B2, B3) *for doc1*, (B3, B1, B2) *for doc2*,
(B1, B3, B2) *for doc3*, (B2, B3, B1) *for doc4*.

If I search for B1, the results should come in the following order:
doc1, doc3, doc2, doc4
(as B1 is the first value in the multivalued field for doc1 and doc3, the 2nd
value in doc2, and the 3rd in doc4).
If I search for B2, the results should come in the following order:
doc4, doc1, doc3, doc2


I do not know whether it is possible or not,
but please suggest how it can be done.
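One index-time sketch of what Erick suggests below (field names are
hypothetical; it assumes a dynamic int field like *_i in the schema): for each
tag a document contains, also index an integer field recording that tag's
position, then sort on it:

<doc>
  <field name="ad_text">B1</field>
  <field name="ad_text">B2</field>
  <field name="ad_text">B3</field>
  <!-- position of each tag, indexed alongside the tags themselves -->
  <field name="pos_B1_i">1</field>
  <field name="pos_B2_i">2</field>
  <field name="pos_B3_i">3</field>
</doc>

A search for B1 could then use q=ad_text:B1&sort=pos_B1_i asc, which yields the
doc1, doc3, doc2, doc4 ordering described above (ties broken by a secondary sort).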



On Thu, Mar 29, 2012 at 5:18 PM, Erick Erickson wrote:

> Hmmm, I don't quite get this. Are you saying that you want
> to sort the documents or sort the content within the document?
>
> Sorting documents (i.e. the results list) requires a single-valued
> field. So you'd have to, at index time, sort the entries.
>
> Sorting the content within the document is something you'd
> have to do when you index, Solr doesn't rearrange the
> contents of a document.
>
> If all you want to do is display the results within the document
> in order, your app can do that as it builds the display page.
>
> Best
> Erick
>
> On Wed, Mar 28, 2012 at 9:02 AM, Abhishek tiwari
>  wrote:
> > Hi ,
> > I have a multivalued field and want to sort the docs by the order in which
> > a particular text, e.g. 'B1', was added.
> > How should I query? ad_text is the multivalued field.
> >
> > <doc>
> >   <arr name="ad_text">
> >     <str>B1</str>
> >     <str>B2</str>
> >     <str>B3</str>
> >   </arr>
> > </doc>
> >
> > <doc>
> >   <arr name="ad_text">
> >     <str>B2</str>
> >     <str>B1</str>
> >     <str>B3</str>
> >   </arr>
> > </doc>
> >
> > <doc>
> >   <arr name="ad_text">
> >     <str>B1</str>
> >     <str>B2</str>
> >     <str>B3</str>
> >   </arr>
> > </doc>
> >
> > <doc>
> >   <arr name="ad_text">
> >     <str>B3</str>
> >     <str>B2</str>
> >     <str>B1</str>
> >   </arr>
> > </doc>
>


Re: SolrCloud

2012-03-29 Thread Erick Erickson
This is the way SolrCloud works at present. There must be at least
one instance of each shard up in order to get results. I believe there
are plans to return partial results in the future, but that hasn't been
implemented yet.

Best
Erick

On Thu, Mar 29, 2012 at 4:37 AM, asia  wrote:
> Hello,
> I am working on Solr. I have set up 2 Solr instances on different systems, i.e.
> I did sharding. I am using a Tomcat and Eclipse environment. When I fire a query
> in SolrJ for data from the index, I get a response when both systems' Tomcats are
> running. But when I stop one of the systems' servers I don't get a response from
> either system. Is there a solution so that when one of the systems is down I
> still get a response from the other server?
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/SolrCloud-tp3867086p3867086.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: disadvantage one field in a catchall field

2012-03-29 Thread Erick Erickson
I guess my question is "why are you using a catchall field at all"? This
is the kind of thing edismax was designed for, so your qf could just
contain all the fields with appropriate boosts, there aren't that many...
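For instance (boost values made up):

qf=NAME^2.0 CATEGORY^1.5 TOWN^1.5 WAY^1.5 DESCRIPTION^0.1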

But what you're actually doing will probably work. I think if you're not
seeing DESCRIPTION in your explain, then the terms you're searching
on aren't in that field. But I could well be wrong there...

BTW, do you use this param? If not, it makes the debug output much easier to read:
debug.explain.structured=true

Best
Erick


On Thu, Mar 29, 2012 at 4:05 AM, elisabeth benoit
 wrote:
> Hi all,
>
> I'm using solr 3.4 with a catchall field and an edismax request handler.
> I'd like to score higher answers matching with words not contained in one
> of the fields copied into my catchall field.
>
> So my catchallfield is called catchall. It contains, let's say, fields
> NAME, CATEGORY, TOWN, WAY and DESCRIPTION.
>
> For one query, I would like to have answers matching NAME, CATEGORY, TOWN
> and WAY scored higher, but I still want to search in DESCRIPTION.
>
> I tried
>
> qf=catchall DESCRIPTION^0.001,
>
> but this doesn't seem to change the scoring. When I set debugQuery=on,
> parsedquery_toString looks like
>
> (text:paus | DESCRIPTION:pause^0.001) (this seems like an OR to me)
>
> but I see no trace of DESCRIPTION in explain
>
> One solution I guess would be to keep DESCRIPTION in a separate field, and
> do not include it in my catchall field. But I wonder if there is a solution
> with the catchall field???
>
> Thanks for your help,
> Elisabeth


Re: Build solr with Maven

2012-03-29 Thread Ingar Hov
I think you need an absolute path. But perhaps if $CATALINA_HOME or
$TOMCAT_HOME is set, you can use it with your path. I haven't tried it,
though.

In any case, you should quite easily be able to verify whether relative
paths can be used. All you need to do is get the work directory for
the webapp and make your variable reflect that. Either way, I don't
think you can use a path relative to web.xml. I believe the work
directory is somewhere other than /WEB-INF/, and that is the dir you
need to make your variable relative to.


On Thu, Mar 29, 2012 at 1:28 PM, Aleksander Akerø
 wrote:
> Tried that, but I guess I am doing it wrong somehow with the paths.
>
> The home folder should be WEB-INF/solr inside the tomcat. But how would I
> set that path correctly? Do I need to use absolute paths?
>
> -Original Message-
> From: Ingar Hov [mailto:ingar@gmail.com]
> Sent: 29. mars 2012 12:57
> To: solr-user@lucene.apache.org
> Subject: Re: Build solr with Maven
>
> I see..
>
> Try to use an <env-entry> for solr/home in web.xml.
>
> Regards,
> Ingar
>
> On Thu, Mar 29, 2012 at 8:34 AM, Aleksander Akerø 
> wrote:
>> Well, it all has to do with how we have decided the rest of our
>> deployment environment. So the point is basically that there should be
>> no configuration of the Tomcat itself, because the webapp should know all
>> its settings and could basically be deployed to whatever Tomcat without
>> configuration. Also there should be no configuration done outside the
>> webapp.
>>
>> That is basically the rule that I have to live by, and I'm very thankful
>> for your solution here, but a Tomcat configuration is out of the question.
>> I was hoping there was some way to set this via Maven's pom.xml or
>> something like that.
>>
>> -Original Message-
>> From: Ingar Hov [mailto:ingar@gmail.com]
>> Sent: 28. mars 2012 18:48
>> To: solr-user@lucene.apache.org
>> Subject: Re: Build solr with Maven
>>
>> Is there any good reason for keeping solr_home within the webapp?
>>
>> It should work, but I would not recommend it. Have you configured
>> solr_home somewhere?
>> One way in Tomcat is to do something like this:
>>
>> --
>> <Context docBase="[path_to_solr.war]" debug="0" crossContext="true">
>>   <Environment name="solr/home" type="java.lang.String"
>>                value="[your_solr_home]" override="true"/>
>> </Context>
>> --
>>
>> in either: $tomcat_home/conf/Catalina/localhost/solr.xml or in
>> $tomcat_home/conf/server.xml.
>>
>> A better solution would probably be a maven project, but with two modules.
>> This way you could build the modules together or individually. Then
>> you can make adjustments to the config and reload cores at will, a
>> feature you would lose by keeping solr_home within the webapp.
>>
>> Regards,
>> Ingar
>>
>>
>> On Wed, Mar 28, 2012 at 1:39 PM, Aleksander Akerø
>> 
>> wrote:
>>> Hi
>>>
>>>
>>>
>>> My company has just decided to use maven to build new projects, which
>>> then includes building solr with maven too.
>>>
>>> But then it has been decided that solr_home also should be installed
>>> within the webapp someplace. But now I have got the problem that solr
>>> can't find the config files and so on. I have come across some posts
>>> here and there which say that solr_home should not be placed within
>>> the webapp.
>>>
>>> Is that correct? If so, what are the reasons for it, and should it
>>> not work at all?
>>>
>>>
>>>
>>> Aleksander Akerø
>>>
>>
>


Re: Luke using shards

2012-03-29 Thread Dmitry Kan
One option to try here (not verified) is to set up a Solr front end that
points to these two shards. Then try accessing its Luke interface via the
admin UI as you did on one of the shards.

But as Erick already pointed out, Luke operates on a lower level than Solr,
so this does not necessarily work.

Dmitry

On Wed, Mar 28, 2012 at 11:02 PM, Dennis Brundage
wrote:

> Is there a way to get Solr/Luke to return the aggregated results across
> shards? I tried setting the shards parameter
> (
> http://localhost:8983/solr/admin/luke?shards=localhost:8983/solr,localhost:7574/solr
> )
> but only got the results for localhost:8983. I am able to search across the
> shards so my url's are correct.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Luke-using-shards-tp3865816p3865816.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Regards,

Dmitry Kan


Re: Why this document does not match?

2012-03-29 Thread Erick Erickson
Alexander:

Your images were stripped by one of our mail servers, so there's not
much we can see ...

But guessing, you aren't searching the fields you think you are:
itemNameSearch:fifa12
becomes
itemNameSearch:fifa defaultSearchField:12

where defaultSearchField is defined in your schema.xml file.
Try itemNameSearch:(fifa 12) or similar.

Using debugQuery=on should show this in the "parsed_query" section if I'm
right.
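For example (host and core are placeholders):

http://localhost:8983/solr/select?q=itemNameSearch:(fifa+12)&debugQuery=on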

If that doesn't help, maybe you can post your info again?



Best
Erick

On Wed, Mar 28, 2012 at 5:31 PM, Alexander Ramos Jardim
 wrote:
>
> Hi,
>
> I have an issue with an old Solr 1.3 installation. I have a field configured
> in such a way that "fifa 12" and "fifa12" should match the same documents, as
> can be seen in the screenshot below.
>
> [screenshot of the field configuration - stripped in transit]
>
> When I run the query itemNameSearch:fifa12, I get the following result:
>
> [screenshot of the query result - stripped in transit]
>
> That seems okay. But I have the following document in the index:
>
> [screenshot of the document - stripped in transit]
>
> As my field is defined, I expected the query to match this document. This is
> not what is happening. Does anyone have any idea what is wrong?
>
>
> --
> Alexander Ramos Jardim


Re: Slow first searcher with facet on bibliographic data in Master - Slave

2012-03-29 Thread fbrisbart
If you add your query to the firstSearcher and/or newSearcher event
listeners in the slave's
'solrconfig.xml' (
http://wiki.apache.org/solr/SolrCaching#newSearcher_and_firstSearcher_Event_Listeners
),

each new searcher will be warmed before it starts accepting queries.

Example to load the FieldCache for 'your_facet_field' field :
...
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="facet">true</str>
      <str name="facet.field">your_facet_field</str>
    </lst>
  </arr>
</listener>
...


Franck

Le jeudi 29 mars 2012 à 13:30 +0200, Dennis Schafroth a écrit :
> Hi 
>   
> I am running indexing and faceted searching on bibliographic data, which is 
> known not to perform too well due to the high facet count. Actually it's just 
> the first search that is horribly slow, 200+ seconds. After that, I am 
> getting okay times (1 second) (at least in the few-users scenario we have now). 
> 
> The current index is 54 million records with approx. 10 million unique 
> authors. The facet fields (… _exact) use the string type. 
>  
> I had hoped that a master (indexing) and slave (searching) would have solved 
> the issue, but I am still seeing the issue on the slave, so I guess I must 
> have misunderstood (or perhaps misconfigured) something.
> 
> I had thought that the slave would not switch to the new index until the auto 
> warming was completed.  Is such behavior possible? 
> 
> I guess an alternative solution could be to have multiple slaves and to take a 
> slave off-line when doing replication, but if it is possible to do it more simply 
> (and use 1/3 less space) that would be great. Then again we might need 
> multiple slaves with more requests.
> 
> Attached are the configuration files.
> 
> Let me know if there is missing information. 
> 
> cheers, 
> :-Dennis Schafroth
> 




Re: Luke using shards

2012-03-29 Thread Erick Erickson
I very much doubt that you can persuade Luke to reach across shards. Shards
are really a higher-level notion; the automatic distribution of requests
across shards is really a Solr-level construct (making use of the lower-
level Lucene capabilities, to be sure). With Luke, you point
it at index files on disk, which have no awareness of the existence of
other shards.

At least I think so.

Best
Erick

On Wed, Mar 28, 2012 at 4:02 PM, Dennis Brundage
 wrote:
> Is there a way to get Solr/Luke to return the aggregated results across
> shards? I tried setting the shards parameter
> (http://localhost:8983/solr/admin/luke?shards=localhost:8983/solr,localhost:7574/solr)
> but only got the results for localhost:8983. I am able to search across the
> shards so my url's are correct.
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Luke-using-shards-tp3865816p3865816.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Localize the largest fields (content) in index

2012-03-29 Thread Erick Erickson
The admin UI (schema browser) will give you the counts of unique terms
in your fields, which is where I'd start.

I suspect you've already seen this page, but if not:
http://lucene.apache.org/java/3_5_0/fileformats.html#file-names
the .fdt and .fdx file extensions are where data goes when
you set 'stored="true" '. These files don't affect search speed,
they just contain the verbatim copy of the data.

The relative sizes of the various files above should give
you a hint as to what's using the most space, but it'll be a bit
of a hunt for you to pinpoint what's actually up. TermVectors
and norms are often significant consumers of space.
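For example, a size breakdown of the biggest index files (hypothetical numbers,
using the 3.5-era extension names from the page above) might look like:

_2f.fdt   90 GB   stored field data  (stored="true" content)
_2f.tis   30 GB   term dictionary    (indexed terms)
_2f.tvf   15 GB   term vector fields
_2f.prx   10 GB   term positions

which would point at stored content and term vectors as the main space consumers.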

Best
Erick

On Wed, Mar 28, 2012 at 10:55 AM, Vadim Kisselmann
 wrote:
> Hello folks,
>
> I work with Solr 4.0 r1292064 from trunk.
> My index grows fast; with 10 million docs I get an index size of 150GB
> (25% stored, 75% indexed).
> I want to find out which fields (content) are too large, so I can consider
> countermeasures.
>
> How can I locate/discover the largest fields in my index?
> Luke (latest from trunk) doesn't work
> with my Solr version. I built the Lucene/Solr .jars and tried to feed Luke
> these, but I get many errors
> and can't build it.
>
> What other options do i have?
>
> Thanks and best regards
> Vadim


Re: query help

2012-03-29 Thread Erick Erickson
Hmmm, I don't quite get this. Are you saying that you want
to sort the documents or sort the content within the document?

Sorting documents (i.e. the results list) requires a single-valued
field. So you'd have to, at index time, sort the entries.

Sorting the content within the document is something you'd
have to do when you index, Solr doesn't rearrange the
contents of a document.

If all you want to do is display the results within the document
in order, your app can do that as it builds the display page.

Best
Erick

On Wed, Mar 28, 2012 at 9:02 AM, Abhishek tiwari
 wrote:
> Hi ,
> I have a multivalued field and want to sort the docs by the order in which
> a particular text, e.g. 'B1', was added.
> How should I query? ad_text is the multivalued field.
>
> <doc>
>   <arr name="ad_text">
>     <str>B1</str>
>     <str>B2</str>
>     <str>B3</str>
>   </arr>
> </doc>
>
> <doc>
>   <arr name="ad_text">
>     <str>B2</str>
>     <str>B1</str>
>     <str>B3</str>
>   </arr>
> </doc>
>
> <doc>
>   <arr name="ad_text">
>     <str>B1</str>
>     <str>B2</str>
>     <str>B3</str>
>   </arr>
> </doc>
>
> <doc>
>   <arr name="ad_text">
>     <str>B3</str>
>     <str>B2</str>
>     <str>B1</str>
>   </arr>
> </doc>


RE: Build solr with Maven

2012-03-29 Thread Aleksander Akerø
Tried that, but I guess I am doing it wrong somehow with the paths.

The home folder should be WEB-INF/solr inside the tomcat. But how would I
set that path correctly? Do I need to use absolute paths?

-Original Message-
From: Ingar Hov [mailto:ingar@gmail.com] 
Sent: 29. mars 2012 12:57
To: solr-user@lucene.apache.org
Subject: Re: Build solr with Maven

I see..

Try to use an <env-entry> for solr/home in web.xml.

Regards,
Ingar

On Thu, Mar 29, 2012 at 8:34 AM, Aleksander Akerø 
wrote:
> Well, it's got all to do with how we have decided the rest of our 
> deployment environment. So the point is basicly that there should be 
> no configurations to the tomcat because the webapp should know all 
> it's settings and could basicly be deployed to whatever tomcat without 
> configuration. Also there should be no configuration done outside the
webapp.
>
> That is basicly the rule that I have to live by, and I'm very thankful 
> for your solution here but a tomcat configuration is out of the question.
> I was hoping there was some way to set this via mavens pom.xml or 
> something like that.
>
> -Original Message-
> From: Ingar Hov [mailto:ingar@gmail.com]
> Sent: 28. mars 2012 18:48
> To: solr-user@lucene.apache.org
> Subject: Re: Build solr with Maven
>
> Is there any good reason for keeping solr_home within the webapp?
>
> It should work, but I would not recommend it. Have you configured 
> solr_home somewhere?
> One way in Tomcat is to do something like this:
>
> --
> <Context docBase="[path_to_solr.war]" debug="0" crossContext="true">
>   <Environment name="solr/home" type="java.lang.String"
>                value="[your_solr_home]" override="true"/>
> </Context>
> --
>
> in either: $tomcat_home/conf/Catalina/localhost/solr.xml or in 
> $tomcat_home/conf/server.xml.
>
> A better solution would probably be a maven project, but with two modules.
> This way you could build the modules together or individually. Then 
> you can make adjustments to the config and reload cores at will, a 
> feature you would lose by keeping solr_home within the webapp.
>
> Regards,
> Ingar
>
>
> On Wed, Mar 28, 2012 at 1:39 PM, Aleksander Akerø 
> 
> wrote:
>> Hi
>>
>>
>>
>> My company has just decided to use maven to build new projects, which 
>> then includes building solr with maven too.
>>
>> But then it has been decided that solr_home also should be installed 
>> within the webapp someplace. But now I have got the problem that solr 
>> can't find the config files and so on. I have come across some posts 
>> here and there which say that solr_home should not be placed within 
>> the webapp.
>>
>> Is that correct? If so, what are the reasons for it, and should it 
>> not work at all?
>>
>>
>>
>> Aleksander Akerø
>>
>



Empty facet counts

2012-03-29 Thread Youri Westerman
Hi,

I'm currently learning how to use Solr and everything seems pretty
straightforward. For some reason, when I use faceted queries, only empty
sets are returned in the facet_counts section.

The get params I'm using are:
  ?q=*:*&rows=0&facet=true&facet.field=urn

The result:
  "facet_counts": {

  "facet_queries": { },
  "facet_fields": { },
  "facet_dates": { },
  "facet_ranges": { }

  }

The urn field is indexed and there are enough entries to be counted. When
adding facet.method=enum, nothing changes.
Does anyone know why this is happening? Am I missing something?

Thanks in advance!

Youri
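For comparison, a populated response would normally list the field itself, e.g.
(values made up):

  "facet_fields": {
    "urn": ["urn:a", 42, "urn:b", 17]
  }

That "urn" does not appear under facet_fields at all - not even with zero
counts - suggests the facet parameters may never reach the handler; re-running
the query with echoParams=all would show what Solr actually received.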


Solr advanced boosting

2012-03-29 Thread mads
Hello everyone!

I am new to Solr and I have been doing a bit of reading about boosting
search results. My search index consists of products with different
attributes like a title, a description, a brand, a price, a discount percent
and so on. I would like to do fairly complex boosting, so that for example
a hit on the brand name, a low price, or a high discount percent is boosted
compared to a hit in the title, a higher price, etc. Basically I would like to
make a more "intelligent" search with my own self-defined boosting algorithm.
I hope that makes sense. My question is whether more experienced
Solr people consider this possible, and how I can get started on this
project? Is it possible to do it as a kind of plugin?

Regards Mads
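For what it's worth, much of this can often be expressed with edismax before
writing any plugin code; a sketch with made-up field names and weights:

defType=edismax
&qf=brand^5 title^2 description
&boost=product(discount_percent, recip(price,1,1000,1000))

Here qf weights field matches (brand over title), recip() rewards low prices,
and the multiplicative boost rewards high discounts. A custom function query or
similarity plugin is only needed once this sort of recipe runs out of steam.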

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-advanced-boosting-tp3867025p3867025.html
Sent from the Solr - User mailing list archive at Nabble.com.


Fw: confirm subscribe to solr-user@lucene.apache.org

2012-03-29 Thread Rahul Mandaliya



- Forwarded Message -
From: Rahul Mandaliya 
To: "solr-user@lucene.apache.org"  
Sent: Thursday, March 29, 2012 9:38 AM
Subject: Fw: confirm subscribe to solr-user@lucene.apache.org
 



Hi,
I am confirming my subscription to solr-user@lucene.apache.org.
Regards,
Rahul




Re: Build solr with Maven

2012-03-29 Thread Ingar Hov
I see..

Try to use an <env-entry> for solr/home in web.xml.
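A sketch of the relevant entry (the value is a placeholder path):

<env-entry>
  <env-entry-name>solr/home</env-entry-name>
  <env-entry-value>[your_solr_home]</env-entry-value>
  <env-entry-type>java.lang.String</env-entry-type>
</env-entry>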

Regards,
Ingar

On Thu, Mar 29, 2012 at 8:34 AM, Aleksander Akerø
 wrote:
> Well, it all has to do with how we have decided the rest of our deployment
> environment. So the point is basically that there should be no configuration
> of the Tomcat itself, because the webapp should know all its settings and could
> basically be deployed to whatever Tomcat without configuration. Also there
> should be no configuration done outside the webapp.
>
> That is basically the rule that I have to live by, and I'm very thankful for
> your solution here, but a Tomcat configuration is out of the question.
> I was hoping there was some way to set this via Maven's pom.xml or something
> like that.
>
> -Original Message-
> From: Ingar Hov [mailto:ingar@gmail.com]
> Sent: 28. mars 2012 18:48
> To: solr-user@lucene.apache.org
> Subject: Re: Build solr with Maven
>
> Is there any good reason for keeping solr_home within the webapp?
>
> It should work, but I would not recommend it. Have you configured solr_home
> somewhere?
> One way in Tomcat is to do something like this:
>
> --
> <Context docBase="[path_to_solr.war]" debug="0" crossContext="true">
>   <Environment name="solr/home" type="java.lang.String"
>                value="[your_solr_home]" override="true"/>
> </Context>
> --
>
> in either: $tomcat_home/conf/Catalina/localhost/solr.xml or in
> $tomcat_home/conf/server.xml.
>
> A better solution would probably be a maven project, but with two modules.
> This way you could build the modules together or individually. Then you can
> make adjustments to the config and reload cores at will, a feature you
> would lose by keeping solr_home within the webapp.
>
> Regards,
> Ingar
>
>
> On Wed, Mar 28, 2012 at 1:39 PM, Aleksander Akerø 
> wrote:
>> Hi
>>
>>
>>
>> My company has just decided to use maven to build new projects, which
>> then includes building solr with maven too.
>>
>> But then it has been decided that solr_home also should be installed
>> within the webapp someplace. But now I have got the problem that solr
>> can’t find the config files and so on. I have come across some posts
>> here and there which says that solr_home should not be placed within the
> webapp.
>>
>> Is that correct? If so, what are the reasons for it, and should it not
>> work at all?
>>
>>
>>
>> Aleksander Akerø
>>
>


Re: Responding to Requests with Chunks/Streaming

2012-03-29 Thread Mikhail Khludnev
@All
Why does nobody want such a pretty cool feature?

Nicholas,
I have made a little progress: I'm able to stream in javabin codec format while
searching. It implies sorting by _docid_

here is the diff
https://github.com/m-khl/solr-patches/commit/2f9ff068c379b3008bb983d0df69dff714ddde95

The current issue is that SolrJ reads the response as a whole. Reading by
callback is supported by the EmbeddedServer only. Anyway, it should not be a
big deal. ResponseStreamingTest.java somehow works.
I'm stuck on introducing response streaming in distributed search; it's
actually more challenging - RespStreamDistributedTest fails

Regards

On Fri, Mar 16, 2012 at 3:51 PM, Nicholas Ball wrote:

>
> Mikhail & Ludovic,
>
> Thanks for both your replies, very helpful indeed!
>
> Ludovic, I was actually looking into just that and did some tests with
> SolrJ; it does work well but needs some changes on the Solr server if we
> want to send out individual documents at various times. This could be done
> with a write() and flush() to the FastOutputStream (daos) in JavaBinCodec. I
> therefore think that a combination of this and Mikhail's solution would
> work best!
>
> Mikhail, you mention that your solution doesn't currently work and not
> sure why this is the case, but could it be that you haven't flushed the
> data (os.flush()) you've written in the collect method of DocSetStreamer? I
> think placing the output stream into the SolrQueryRequest is the way to go,
> so that we can access it and write to it how we intend. However, I think
> using the JavaBinCodec would be ideal so that we can work with SolrJ
> directly, and not mess around with the encoding of the docs/data etc...
>
> At the moment the entry point to JavaBinCodec is through the
> BinaryResponseWriter which calls the highest level marshal() method which
> decodes and sends out the entire SolrQueryResponse (line 49 @
> BinaryResponseWriter). What would be ideal is to be able to break up the
> response and call the JavaBinCodec for pieces of it with a flush after each
> call. Did a few tests with a simple Thread.sleep and a flush to see if this
> would actually work and looks like it's working out perfectly. Just trying
> to figure out the best way to actually do it now :) any ideas?
>
> On another note, for a solution to work with the chunked transfer encoding
> (and therefore web browsers), a lot more development is going to be needed.
> Not sure if it's worth trying yet but might look into it later down the
> line.
>
> Nick
>
> On Fri, 16 Mar 2012 07:29:20 +0300, Mikhail Khludnev
>  wrote:
> > Ludovic,
> >
> > I looked through. First of all, it seems to me you don't amend the regular
> > "servlet" Solr server, but only the embedded one.
> > Anyway, the difference is that you stream DocList via callback, but it
> > means that you've instantiated it in memory and keep it there until it
> will
> > be completely consumed. Think about a billion numFound. The core idea of my
> > approach is to keep almost zero memory for the response.
> >
> > Regards
> >
> > On Fri, Mar 16, 2012 at 12:12 AM, lboutros  wrote:
> >
> >> Hi,
> >>
> >> I was looking for something similar.
> >>
> >> I tried this patch :
> >>
> >> https://issues.apache.org/jira/browse/SOLR-2112
> >>
> >> it's working quite well (I've back-ported the code in Solr 3.5.0...).
> >>
> >> Is it really different from what you are trying to achieve ?
> >>
> >> Ludovic.
> >>
> >> -
> >> Jouve
> >> France.
> >> --
> >> View this message in context:
> >>
>
> http://lucene.472066.n3.nabble.com/Responding-to-Requests-with-Chunks-Streaming-tp3827316p3829909.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
>



-- 
Sincerely yours
Mikhail Khludnev
ge...@yandex.ru
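For readers following along, the flush-per-document idea can be sketched
independently of Solr's internals (a hypothetical servlet, not the actual
JavaBinCodec patch; fetchDocsLazily() is an assumed lazy source of
already-encoded docs):

import java.io.IOException;
import java.io.OutputStream;
import java.util.Collections;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class StreamingSketchServlet extends HttpServlet {
  @Override
  protected void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws IOException {
    // With no Content-Length set, the container falls back to
    // chunked transfer encoding; flush() pushes each chunk out.
    resp.setContentType("application/octet-stream");
    OutputStream os = resp.getOutputStream();
    for (byte[] encodedDoc : fetchDocsLazily()) {
      os.write(encodedDoc);
      os.flush(); // emit this document now instead of buffering the whole response
    }
  }

  // Stub standing in for a lazy cursor over already-encoded documents.
  private Iterable<byte[]> fetchDocsLazily() {
    return Collections.emptyList();
  }
}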


 


Re: SOLR hangs - update timeout - please help

2012-03-29 Thread Rafal Gwizdala
That's bad news.
If 5-7 seconds is not safe, then what is the safe interval for updates?
Near real-time is not for me as it works only when querying by document Id
- this doesn't solve anything in my case. I just want the index to be
updated in close to real time; a 30-40 second delay is acceptable but not much more
than that. Is there anything that can be done, or should I start looking
for some other indexing tool?
I'm wondering why there's such terrible performance degradation over time -
SOLR runs fine for first 10-20 hours, updates are extremely fast and then
they become slower and slower until eventually they stop executing at all.
Is there any issue with garbage collection or index fragmentation or some
internal data structures that can't manage their data effectively when
updates are frequent?

Best regards
RG


On Thu, Mar 29, 2012 at 10:24 AM, Lance Norskog wrote:

> 5-7 seconds- there's the problem. If you want to have documents
> visible for search within that time, you want to use the trunk and
> "near-real-time" search. A hard commit does several hard writes to the
> disk (with the fsync() system call). It does not run smoothly at that
> rate. It is no surprise that eventually you hit a thread-locking bug.
>
>
> http://www.lucidimagination.com/search/link?url=http://wiki.apache.org/solr/RealTimeGet
>
> http://www.lucidimagination.com/search/link?url=http://wiki.apache.org/solr/CommitWithin
>
> On Wed, Mar 28, 2012 at 11:08 PM, Rafal Gwizdala
>  wrote:
> > Lance, I know there are many variables; that's why I'm asking where to
> > start and what to check.
> > Updates are sent every 5-7 seconds, each update contains between 1 and 50
> > docs. Commit is done every time (on each update).
> > Currently queries aren't very frequent - about 1 query every 3-5 seconds,
> > but the system is going to handle much more (of course if the problem is
> > fixed).
> > The system has 2 core CPU (virtualized) and 4 GB memory (SOLR uses about
> > 300 MB)
> >
> > R
> >
> > On Thu, Mar 29, 2012 at 1:53 AM, Lance Norskog 
> wrote:
> >
> >> How often are updates? And when are commits? How many CPUs? How much
> >> query load? There are so many variables.
> >>
> >> Check the mailing list archives and Solr issues, there might be a
> >> similar problem already discussed. Also, attachments do not work with
> >> Apache mailing lists. (Well, ok, they work for direct subscribers, but
> >> not for indirect subscribers and archive site users.)
> >>
> >> --
> >> Lance Norskog
> >> goks...@gmail.com
> >>
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>


SolrCloud

2012-03-29 Thread asia
Hello,
I am working on Solr. I have set up 2 Solr instances on different systems, i.e.
I did sharding. I am using a Tomcat and Eclipse environment. When I fire a query
in SolrJ for data from the index, I get a response when both systems' Tomcats are
running. But when I stop one of the systems' servers I don't get a response from
either system. Is there a solution so that when one of the systems is down I
still get a response from the other server?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-tp3867086p3867086.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR hangs - update timeout - please help

2012-03-29 Thread Lance Norskog
5-7 seconds - there's the problem. If you want to have documents
visible for search within that time, you want to use the trunk and
"near-real-time" search. A hard commit does several hard writes to the
disk (with the fsync() system call). It does not run smoothly at that
rate. It is no surprise that eventually you hit a thread-locking bug.

http://www.lucidimagination.com/search/link?url=http://wiki.apache.org/solr/RealTimeGet
http://www.lucidimagination.com/search/link?url=http://wiki.apache.org/solr/CommitWithin
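A SolrJ sketch of the commitWithin approach (the URL is a placeholder, and it
assumes a SolrJ version where UpdateRequest.setCommitWithin is available):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

public class CommitWithinSketch {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-1");
    UpdateRequest req = new UpdateRequest();
    req.add(doc);
    req.setCommitWithin(30000); // ask Solr to commit within 30s, no commit() per update
    req.process(server);
  }
}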

On Wed, Mar 28, 2012 at 11:08 PM, Rafal Gwizdala
 wrote:
> Lance, I know there are many variables; that's why I'm asking where to start
> and what to check.
> Updates are sent every 5-7 seconds, each update contains between 1 and 50
> docs. Commit is done every time (on each update).
> Currently queries aren't very frequent - about 1 query every 3-5 seconds,
> but the system is going to handle much more (of course if the problem is
> fixed).
> The system has 2 core CPU (virtualized) and 4 GB memory (SOLR uses about
> 300 MB)
>
> R
>
> On Thu, Mar 29, 2012 at 1:53 AM, Lance Norskog  wrote:
>
>> How often are updates? And when are commits? How many CPUs? How much
>> query load? There are so many variables.
>>
>> Check the mailing list archives and Solr issues, there might be a
>> similar problem already discussed. Also, attachments do not work with
>> Apache mailing lists. (Well, ok, they work for direct subscribers, but
>> not for indirect subscribers and archive site users.)
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>>



-- 
Lance Norskog
goks...@gmail.com


Re: disadvantage one field in a catchall field

2012-03-29 Thread Gora Mohanty
On 29 March 2012 13:35, elisabeth benoit  wrote:
> Hi all,
>
> I'm using solr 3.4 with a catchall field and an edismax request handler.
> I'd like to score higher answers matching with words not contained in one
> of the fields copied into my catchall field.
>
> So my catchallfield is called catchall. It contains, let's say, fields
> NAME, CATEGORY, TOWN, WAY and DESCRIPTION.
>
> For one query, I would like to have answers matching NAME, CATEGORY, TOWN
> and WAY scored higher, but I still want to search in DESCRIPTION.
>
> I tried
>
> qf=catchall DESCRIPTION^0.001,
[...]

As far as I know, this is not possible. Once the fields are copied
to a catch-all field, they are indistinguishable. Your only option is
to have a separate DESCRIPTION field if you want to down-boost
it.

Regards,
Gora


disadvantage one field in a catchall field

2012-03-29 Thread elisabeth benoit
Hi all,

I'm using solr 3.4 with a catchall field and an edismax request handler.
I'd like to score higher answers matching with words not contained in one
of the fields copied into my catchall field.

So my catchallfield is called catchall. It contains, let's say, fields
NAME, CATEGORY, TOWN, WAY and DESCRIPTION.

For one query, I would like to have answers matching NAME, CATEGORY, TOWN
and WAY scored higher, but I still want to search in DESCRIPTION.

I tried

qf=catchall DESCRIPTION^0.001,

but this doesn't seem to change the scoring. When I set debugQuery=on,
parsedquery_toString looks like

(text:paus | DESCRIPTION:pause^0.001) (this seems like an OR to me)

but I see no trace of DESCRIPTION in explain

One solution I guess would be to keep DESCRIPTION in a separate field, and
do not include it in my catchall field. But I wonder if there is a solution
with the catchall field???

Thanks for your help,
Elisabeth


Re: Solr Tomcat Install

2012-03-29 Thread Jamel ESSOUSSI
Hi,

You should disable velocity by adding -Dsolr.velocity.enabled=false to
JAVA_OPTS

--Jamel
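The flag works because the example solrconfig.xml typically guards the writer
with a substitutable property, along these lines (a sketch):

<queryResponseWriter name="velocity" class="solr.VelocityResponseWriter"
                     enable="${solr.velocity.enabled:true}"/>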


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Tomcat-Install-tp3865290p3866947.html
Sent from the Solr - User mailing list archive at Nabble.com.