RE: Results driving me nuts!

2011-03-14 Thread cbennett
> -Original Message-
> From: Ahmet Arslan [mailto:iori...@yahoo.com]
> Sent: Sunday, March 13, 2011 6:25 PM
> To: solr-user@lucene.apache.org; andy.ne...@gmail.com
> Subject: Re: Results driving me nuts!
> 
> 
> --- On Sun, 3/13/11, Andy Newby  wrote:
> 
> > From: Andy Newby 
> > Subject: Results driving me nuts!
> > To: solr-user@lucene.apache.org
> > Date: Sunday, March 13, 2011, 10:38 PM
> > Hi,
> >
> > Ok, I'm really really trying to get my head around this,
> > but I just can't :/
> >
> > Here are 2 example records, both using the query "st
> > patricks" to
> > search on (matches for the keywords are in **stars** like
> > so, to make
> > a point of what SHOULD be matching);
> >
> > keywords: animations mini alphabets **st** **patricks** animated 1 clover
> > description: animations mini alphabets **st** **patricks** animated 1 clover
> >
> > "124966":"
> > 209.23984 = (MATCH) product of:
> >   418.47968 = (MATCH) sum of:
> >     418.47968 = (MATCH) sum of:
> >       212.91336 = (MATCH) weight(keywords:st
> > in 5697), product of:
> >         0.41379675 =
> > queryWeight(keywords:st), product of:
> >           7.5798326 =
> > idf(docFreq=233, maxDocs=168578)
> >           0.05459181 = queryNorm
> >         514.5361 = (MATCH)
> > fieldWeight(keywords:st in 5697), product of:
> >           1.4142135 =
> > tf(termFreq(keywords:st)=2)
> >           7.5798326 =
> > idf(docFreq=233, maxDocs=168578)
> >           48.0 =
> > fieldNorm(field=keywords, doc=5697)
> >       205.56633 = (MATCH)
> > weight(keywords:patricks in 5697), product of:
> >         0.4065946 =
> > queryWeight(keywords:patricks), product of:
> >           7.447905 =
> > idf(docFreq=266, maxDocs=168578)
> >           0.05459181 = queryNorm
> >         505.58057 = (MATCH)
> > fieldWeight(keywords:patricks in 5697), product of:
> >           1.4142135 =
> > tf(termFreq(keywords:patricks)=2)
> >           7.447905 =
> > idf(docFreq=266, maxDocs=168578)
> >           48.0 =
> > fieldNorm(field=keywords, doc=5697)
> >   0.5 = coord(1/2)
> >
> > The other one:
> >
> > desc: a black and white mug of beer with a three leaf clover in it
> > keywords: saint **patricks** day green irish beer   spel132_bw clip art
> > holidays **st** **patricks** day handle drink celebrate clip art holidays
> > **st** **patricks** day
> >
> > 5 matches
> >
> > "145351":"
> > 193.61652 = (MATCH) product of:
> >   387.23303 = (MATCH) sum of:
> >     387.23303 = (MATCH) sum of:
> >       177.4278 = (MATCH) weight(keywords:st
> > in 25380), product of:
> >         0.41379675 =
> > queryWeight(keywords:st), product of:
> >           7.5798326 =
> > idf(docFreq=233, maxDocs=168578)
> >           0.05459181 = queryNorm
> >         428.78006 = (MATCH)
> > fieldWeight(keywords:st in 25380), product of:
> >           1.4142135 =
> > tf(termFreq(keywords:st)=2)
> >           7.5798326 =
> > idf(docFreq=233, maxDocs=168578)
> >           40.0 =
> > fieldNorm(field=keywords, doc=25380)
> >       209.80525 = (MATCH)
> > weight(keywords:patricks in 25380), product of:
> >         0.4065946 =
> > queryWeight(keywords:patricks), product of:
> >           7.447905 =
> > idf(docFreq=266, maxDocs=168578)
> >           0.05459181 = queryNorm
> >         516.006 = (MATCH)
> > fieldWeight(keywords:patricks in 25380), product of:
> >           1.7320508 =
> > tf(termFreq(keywords:patricks)=3)
> >           7.447905 =
> > idf(docFreq=266, maxDocs=168578)
> >           40.0 =
> > fieldNorm(field=keywords, doc=25380)
> >   0.5 = coord(1/2)
> >
> >
> > Now the thing that's getting me is that the record which has 5
> > occurrences of "st patricks" gets such a different score!
> >
> > 209.23984
> > 193.61652
> >
> > (these should be the other way around)
> >
> > Can anyone try to explain what's going on with this?
> >
> > BTW, the queries are matched based on a normal "white space" index,
> > nothing special.
> >
> > The actual query being used, is as follows:
> >
> > (keywords:"st" AND keywords:"patricks") OR
> > (description:"st" AND
> > description:"patricks")
> >
> > TIA - I'm hoping someone can save my sanity ;)
> 
> Their fieldNorm values are different. The norm consists of the index-time
> boost and length normalization.
> 
> http://lucene.apache.org/java/2_9_1/api/core/org/apache/lucene/search/Similarity.html#formula_norm
> 
> I can see that the one with 5 matches is longer than the other. Shorter
> documents are favored in Solr/Lucene by the length normalization factor.
> 
> 
> 

Also, the term frequency for patricks is different in each document:

for the 1st doc termFreq(keywords:patricks)=2 and for the 2nd doc
termFreq(keywords:patricks)=3
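
To make the arithmetic concrete, using only numbers already in the explain
output above and the classic Lucene formula fieldWeight = tf x idf x
fieldNorm:

  keywords:patricks, doc 5697:  sqrt(2) x 7.447905  x 48.0 ~= 505.58
  keywords:patricks, doc 25380: sqrt(3) x 7.447905  x 40.0 ~= 516.01
  keywords:st,       doc 5697:  sqrt(2) x 7.5798326 x 48.0 ~= 514.54
  keywords:st,       doc 25380: sqrt(2) x 7.5798326 x 40.0 ~= 428.78

The third occurrence of patricks only raises tf from sqrt(2) to sqrt(3),
while the shorter document's larger fieldNorm (48.0 vs 40.0) more than makes
up for it once both terms are summed: 514.54 + 505.58 = 1020.12 beats
428.78 + 516.01 = 944.79, so the doc with fewer matches wins.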






RE: Filter Query, Filter Cache and Hit Ratio

2011-01-28 Thread cbennett
Ooops,

I meant NOW/DAY 
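
For example (field name as in your logs), a day-granular filter whose query
string stays identical for the whole day, and can therefore be answered from
the filter cache, would be something like:

fq=timestamp:[NOW/DAY TO NOW/DAY+1DAY]

With plain NOW the upper bound changes every millisecond, so each request
creates a new cache entry and the hit count never goes up.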

> -Original Message-
> From: cbenn...@job.com [mailto:cbenn...@job.com]
> Sent: Friday, January 28, 2011 3:37 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Filter Query, Filter Cache and Hit Ratio
> 
> Hi,
> 
> You've used NOW in the range query, which will give a date/time accurate
> to the millisecond; try using NOW\DAY
> 
> Colin.
> 
> > -Original Message-
> > From: Renaud Delbru [mailto:renaud.del...@deri.org]
> > Sent: Friday, January 28, 2011 2:22 PM
> > To: solr-user@lucene.apache.org
> > Subject: Filter Query, Filter Cache and Hit Ratio
> >
> > Hi,
> >
> > I am looking for some more information on how the filter cache works,
> > and how the hits are incremented.
> >
> > We are using filter queries for certain predefined values, such as
> > timestamp:[2011-01-21T00:00:00Z+TO+NOW] (which is the current day). From
> > what I understand from the documentation:
> > "the filter cache stores the results of any filter queries ("fq"
> > parameters) that Solr is explicitly asked to execute. (Each filter is
> > executed and cached separately. When it's time to use them to limit the
> > number of results returned by a query, this is done using set
> > intersections.)"
> > So, we were imagining that if two consecutive queries (like the one
> > above) used the same timestamp filter query, the second query would take
> > advantage of the filter cache, and we would see the number of hits
> > increasing (a hit on the cached timestamp filter query). However, this
> > is not the case: the number of hits on the filter cache does not
> > increase and stays very low. Is this normal?
> >
> > INFO: [] webapp=/siren path=/select
> > params={wt=javabin&rows=0&version=2&fl=id,score&start=0&q=*:*&isShard=true&fq=timestamp:[2011-01-21T00:00:00Z+TO+NOW]&fq=domain:my.wordpress.com&fsv=true}
> > hits=0 status=0 QTime=139
> > INFO: [] webapp=/siren path=/select
> > params={wt=javabin&rows=0&version=2&fl=id,score&start=0&q=*:*&isShard=true&fq=timestamp:[2011-01-21T00:00:00Z+TO+NOW]&fq=domain:syours.wordpress.com&fsv=true}
> > hits=0 status=0 QTime=138
> >
> > --
> > Renaud Delbru
> 
> 
> 






RE: Filter Query, Filter Cache and Hit Ratio

2011-01-28 Thread cbennett
Hi,

You've used NOW in the range query, which will give a date/time accurate to
the millisecond; try using NOW\DAY

Colin.

> -Original Message-
> From: Renaud Delbru [mailto:renaud.del...@deri.org]
> Sent: Friday, January 28, 2011 2:22 PM
> To: solr-user@lucene.apache.org
> Subject: Filter Query, Filter Cache and Hit Ratio
> 
> Hi,
> 
> I am looking for some more information on how the filter cache works,
> and how the hits are incremented.
> 
> We are using filter queries for certain predefined values, such as
> timestamp:[2011-01-21T00:00:00Z+TO+NOW] (which is the current day). From
> what I understand from the documentation:
> "the filter cache stores the results of any filter queries ("fq"
> parameters) that Solr is explicitly asked to execute. (Each filter is
> executed and cached separately. When it's time to use them to limit the
> number of results returned by a query, this is done using set
> intersections.)"
> So, we were imagining that if two consecutive queries (like the one
> above) used the same timestamp filter query, the second query would take
> advantage of the filter cache, and we would see the number of hits
> increasing (a hit on the cached timestamp filter query). However, this
> is not the case: the number of hits on the filter cache does not increase
> and stays very low. Is this normal?
> 
> INFO: [] webapp=/siren path=/select
> params={wt=javabin&rows=0&version=2&fl=id,score&start=0&q=*:*&isShard=true&fq=timestamp:[2011-01-21T00:00:00Z+TO+NOW]&fq=domain:my.wordpress.com&fsv=true}
> hits=0 status=0 QTime=139
> INFO: [] webapp=/siren path=/select
> params={wt=javabin&rows=0&version=2&fl=id,score&start=0&q=*:*&isShard=true&fq=timestamp:[2011-01-21T00:00:00Z+TO+NOW]&fq=domain:syours.wordpress.com&fsv=true}
> hits=0 status=0 QTime=138
> 
> --
> Renaud Delbru






RE: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread cbennett

> Where do you get your Lucene/Solr downloads from?
> 
> [x] ASF Mirrors (linked in our release announcements or via the Lucene
> website)
> 
> [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
> 
> [x] I/we build them from source via an SVN/Git checkout.
> 
> [] Other (someone in your company mirrors them internally or via a
> downstream project)
> 





RE: Query question

2010-11-03 Thread cbennett
Another option is to override the default operator in the query:

{!lucene q.op=OR}city:Chicago^10 +Romantic +View

With q.op=OR the city:Chicago^10 clause becomes optional and acts purely as
a boost, while the + prefixes keep Romantic and View mandatory.

Colin.

> -Original Message-
> From: Mike Sokolov [mailto:soko...@ifactory.com]
> Sent: Wednesday, November 03, 2010 9:42 AM
> To: solr-user@lucene.apache.org
> Cc: kenf_nc
> Subject: Re: Query question
> 
> Another alternative (prettier to my eye) would be:
> 
> (city:Chicago AND Romantic AND View)^10 OR (Romantic AND View)
> 
> 
> -Mike
> 
> 
> 
> On 11/03/2010 09:28 AM, kenf_nc wrote:
> > Unfortunately the default operator is set to AND and I can't change
> > that at this time.
> >
> > If I do (city:Chicago^10 OR Romantic OR View) it returns way too many
> > unwanted results.
> > If I do (city:Chicago^10 OR (Romantic AND View)) it returns fewer
> > unwanted results, but still a lot.
> > iorixxx's solution of (Romantic AND View AND (city:Chicago^10 OR (*:*
> > -city:Chicago))) does seem to work. Chicago results are at the top, and
> > the remaining results seem to fit the other search parameters. It's an
> > ugly query, but it does seem to do the trick for now until I master
> > Dismax.
> >
> > Thanks all!
> >
> >





RE: Tomcat startup script

2010-06-08 Thread cbennett
The following should work on CentOS/Red Hat; don't forget to edit the paths,
user, and Java options for your environment. You can use chkconfig to add it
to your startup (see the example after the script).

Note: this script assumes that the Solr webapp is configured using JNDI in a
Tomcat context fragment. If not, you will need to add something like
-Dsolr.solr.home=/solr/home to the JAVA_OPTS line.

Colin.

#!/bin/sh
# chkconfig: 345 99 1
# description: Tomcat6 service
# processname: java

. /etc/init.d/functions

# Wrap the init-script status helpers (success/failure/warning)
my_log_message()
{
    ACTION=$1
    shift

    case "$ACTION" in
        success)
            echo -n $*
            success "$*"
            echo
            ;;
        failure)
            echo -n $*
            failure "$*"
            echo
            ;;
        warning)
            echo -n $*
            warning "$*"
            echo
            ;;
        *)
            ;;
    esac
}

log_success_msg()
{
    my_log_message success "$*"
}

log_failure_msg()
{
    my_log_message failure "$*"
}

log_warning_msg()
{
    my_log_message warning "$*"
}

export JAVA_HOME=/usr/java/default
export TOMCAT_USER=solr
export CATALINA_HOME=/opt/solr/production/tomcat6
export CATALINA_PID=$CATALINA_HOME/bin/tomcat6.pid
JAVA_OPTS="-server -Xms6G -Xmx6G -XX:+UseConcMarkSweepGC"
export JAVA_OPTS

[ -d "$CATALINA_HOME" ] || { echo "Tomcat requires $CATALINA_HOME."; exit 1; }

case $1 in

    start|stop|run)
        if su $TOMCAT_USER bash -c "$CATALINA_HOME/bin/catalina.sh $1"; then
            log_success_msg "Tomcat $1 successful"
            # clean up the PID file after a stop
            [ "$1" = "stop" ] && rm -f "$CATALINA_PID"
        else
            log_failure_msg "Error in Tomcat $1: $?"
        fi
        ;;

    restart)
        # stop first, then start
        $0 stop
        $0 start
        ;;

    status)
        if [ -f "$CATALINA_PID" ]; then
            read kpid < "$CATALINA_PID"
            if ps --pid "$kpid" > /dev/null 2>&1; then
                echo "$0 is already running at ${kpid}"
            else
                echo "$CATALINA_PID found, but $kpid is not running"
            fi
            unset kpid
        else
            echo "$0 is stopped"
        fi
        ;;

esac
exit 0
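
Assuming the script is saved as /etc/init.d/tomcat6, registering it with
chkconfig looks roughly like this:

chmod +x /etc/init.d/tomcat6
chkconfig --add tomcat6
chkconfig tomcat6 on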


> -Original Message-
> From: Sixten Otto [mailto:six...@sfko.com]
> Sent: Tuesday, June 08, 2010 3:49 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Tomcat startup script
> 
> On Tue, Jun 8, 2010 at 11:00 AM, K Wong  wrote:
> > Okay. I've been running multicore Solr 1.4 on Tomcat 5.5/OpenJDK 6
> > straight out of the centos repo and I've not had any issues. We're
> not
> > doing anything wild and crazy with it though.
> 
> It's nice to know that the wiki's advice might be out of date. That
> doesn't really help me with my immediate problem (lacking the script
> the wiki is trying to provide), though, unless I want to rip out what
> I've got and start over. :-/
> 
> Sixten





RE: DIH, Full-Import, DB and Performance.

2010-06-01 Thread cbennett
Performance depends on your server/data and on the batchSize. To reduce the
server load, experiment with different batchSize settings: the higher the
batch size, the faster the import and the higher your SQL Server load will
be. Try starting with a small batch size and then gradually increasing it.
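
A sketch of where batchSize goes in data-config.xml (driver class, URL, and
credentials here are placeholders for your environment):

<dataSource type="JdbcDataSource"
            driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
            url="jdbc:sqlserver://dbhost;databaseName=mydb;selectMethod=cursor"
            batchSize="500"
            user="solr" password="..."/>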

Colin.

> -Original Message-
> From: stockii [mailto:st...@shopgate.com]
> Sent: Tuesday, June 01, 2010 12:31 PM
> To: solr-user@lucene.apache.org
> Subject: RE: DIH, Full-Import, DB and Performance.
> 
> 
> thx for the reply =)
> 
> 
> i try out selectMethod="cursor"  but the load of the server is going
> bigger
> and bigger during a import =(
> 
> selectMethod="cursor" only solve the problem with the locking ? right ?
> --
> View this message in context: http://lucene.472066.n3.nabble.com/DIH-
> Full-Import-DB-and-Performance-tp861068p862043.html
> Sent from the Solr - User mailing list archive at Nabble.com.





RE: DIH, Full-Import, DB and Performance.

2010-06-01 Thread cbennett
The settings and defaults will depend on which version of SQL Server you are
using and which version of the JDBC driver.

The default for responseBuffering was changed to adaptive after version 1.2,
so unless you are using 1.2 or earlier you don't need to set it to adaptive.

Also, if I remember correctly, batchSize will only take effect if you are
using cursors; the default is for all data to be sent to the client
(selectMethod is direct).

Using the default settings for the MS sqljdbc driver caused locking issues
in our database. As soon as the full import started, shared locks would be
set on all rows and wouldn't be removed until all the data had been sent,
which for us would be around 30 minutes. During that time no updates could
get an exclusive lock, which of course led to huge problems.

Setting selectMethod="cursor" solved the problem for us, although it does
slow down the full import.

Another option that worked for us was to not set the selectMethod and to set
readOnly="true", but be sure you understand the implications. This causes
all data to be sent to the client (which is the default), giving maximum
performance, and causes no locks to be set, which resolves the other issues.
However, it sets transaction isolation to TRANSACTION_READ_UNCOMMITTED, which
makes the select statement ignore any locks when reading data, so the
consistency of the data cannot be guaranteed; this may or may not be an
issue in your particular situation.
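
For reference, a sketch of that readOnly alternative in data-config.xml
(driver class, URL, and credentials are placeholders):

<dataSource type="JdbcDataSource"
            driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
            url="jdbc:sqlserver://dbhost;databaseName=mydb"
            readOnly="true"
            user="solr" password="..."/>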


Colin.

> -Original Message-
> From: stockii [mailto:st...@shopgate.com]
> Sent: Tuesday, June 01, 2010 7:44 AM
> To: solr-user@lucene.apache.org
> Subject: Re: DIH, Full-Import, DB and Performance.
> 
> 
> do you think that the option
> 
> responseBuffer="adaptive"
> 
> should solve my problem ?
> 
> 
> From DIH FAQ ...:
> 
> I'm using DataImportHandler with MS SQL Server database with sqljdbc
> driver.
> DataImportHandler is going out of memory. I tried adjustng the
> batchSize
> values but they don't seem to make any difference. How do I fix this?
> 
> There's a connection property called responseBuffering in the sqljdbc
> driver
> whose default value is "full" which causes the entire result set to be
> fetched. See http://msdn.microsoft.com/en-us/library/ms378988.aspx for
> more
> details. You can set this property to "adaptive" to keep the driver
> from
> getting everything into memory. Connection properties like this can be
> set
> as an attribute (responseBuffering="adaptive") in the dataSource
> configuration OR directly in the jdbc url specified in
> DataImportHandler's
> dataSource configuration.
> --
> View this message in context: http://lucene.472066.n3.nabble.com/DIH-
> Full-Import-DB-and-Performance-tp861068p861134.html
> Sent from the Solr - User mailing list archive at Nabble.com.





RE: Commit takes 1 to 2 minutes, CPU usage affects other apps

2010-05-04 Thread cbennett
Hi,

This could also be caused by performing an optimize after the commit, or by
auto-warming the caches, or a combination of both.

If you are using the Data Import Handler, the default for a delta import is
commit and optimize. This caused us a similar problem, except we were
optimizing a 7-million-document, 23GB index with every delta import, which
was taking over 10 minutes. As soon as we added optimize=false to the
command, updates took a few seconds. You can always add separate calls to
perform the optimize when it's convenient for you.

To see if the problem is auto-warming, take a look at the warm-up time for
the searcher. If this is the cause you will need to consider lowering the
autowarmCount for your caches.
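
For example (host, path, and values are illustrative), the delta import can
be triggered without the implicit optimize via:

http://localhost:8983/solr/dataimport?command=delta-import&optimize=false

and autowarmCount is lowered per cache in solrconfig.xml:

<filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>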


Colin.

> -Original Message-
> From: Markus Fischer [mailto:mar...@fischer.name]
> Sent: Tuesday, May 04, 2010 6:22 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Commit takes 1 to 2 minutes, CPU usage affects other apps
> 
> On 04.05.2010 11:01, Peter Sturge wrote:
> > It might be worth checking the VMWare environment - if you're using
> the
> > VMWare scsi vmdk and it's shared across multiple VMs and there's a
> lot of
> > disk contention (i.e. multiple VMs are all busy reading/writing
> to/from the
> > same disk channel), this can really slow down I/O operations.
> 
> Ok, thanks, I'll try to get the information from my hoster.
> 
> I noticed that the commiting seems to be constant in time: it doesn't
> matter whether I'm updating only one document or 50 (usually it won't
> be
> more). Maybe these numbers are so low anyway to cause any real impact
> ...
> 
> - Markus






RE: Problem with DIH delta-import on JDBC

2010-04-29 Thread cbennett
Hi,

It looks like the deltaImportQuery needs to be changed. You are using
dataimporter.delta.id, which is not correct: you are selecting objectid in
the deltaQuery, so the deltaImportQuery should be using
dataimporter.delta.objectid.

So try this (the entity below mirrors your config, with only the variable
name changed):

<entity query="select * from table"
        deltaImportQuery="select * from table where objectid='${dataimporter.delta.objectid}'"
        deltaQuery="select objectid from table where lastupdate > '${dataimporter.last_index_time}'">
Colin.

> -Original Message-
> From: safl [mailto:s...@salamin.net]
> Sent: Wednesday, April 28, 2010 3:05 PM
> To: solr-user@lucene.apache.org
> Subject: Problem with DIH delta-import on JDBC
> 
> 
> Hello,
> 
> I'm just new on the list.
> I searched a lot on the list, but I didn't find an answer to my
> question.
> 
> I'm using Solr 1.4 on Windows with an Oracle 10g database.
> I am able to do full-import without any problem, but I'm not able to
> get
> delta-import working.
> 
> I have the following in the data-config.xml:
> 
> ...
> <entity query="select * from table"
>     deltaImportQuery="select * from table where
>     objectid='${dataimporter.delta.id}'"
>     deltaQuery="select objectid from table where lastupdate >
>     '${dataimporter.last_index_time}'">
> 
> ...
> 
> I update some records in the table and then try to run a delta-import.
> I track the SQL queries on the DB with P6Spy, and I always see a query
> like
> 
> select * from table where objectid=''
> 
> Of course, with such an SQL query, nothing is updated in my index.
> 
> It behaves the same if I replace ${dataimporter.delta.id} with
> ${dataimporter.delta.objectid}.
> Can someone tell me what is wrong with it?
> 
> Thanks a lot,
>  Florian
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Problem-with-DIH-delta-import-on-
> JDBC-tp763469p763469.html
> Sent from the Solr - User mailing list archive at Nabble.com.





RE: question about schemas

2009-12-02 Thread cbennett
Solr supports multi-valued fields, so you could store one document per
customer and use multi-valued fields for the product information.
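
A minimal sketch of what that could look like in schema.xml (field names and
types here are illustrative, using Solr 1.x type names):

<field name="storeId" type="string" indexed="true" stored="true"/>
<field name="storeName" type="text" indexed="true" stored="true"/>
<field name="productId" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="productWidth" type="sfloat" indexed="true" stored="true" multiValued="true"/>

Since each customer is one document, facet counts then come back per
customer rather than per product.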

Colin.

> -Original Message-
> From: solr-user [mailto:solr-u...@hotmail.com]
> Sent: Tuesday, December 01, 2009 6:27 PM
> To: solr-user@lucene.apache.org
> Subject: question about schemas
> 
> 
> I just started using Solr, and I am trying to figure out how to set up my
> schema. I know that Solr doesn't have JOINs, and so I am having some
> difficulty figuring out how I would set up a schema for the following
> fictional situation.  For example, let us say that:
> 
> - I have 1+ customers, each having some specific info (StoreId, Name,
> Phone, Address, City, State, Zip, etc)
> - Each customer has a subset of the 100+ products I am looking to track,
> each product having some specific info (ProductId, Name, Width, Height,
> Depth, Weight, Density, etc)
> - I want to be able to search by the product info but have facets return
> the number of customers, rather than the number of products, that meet my
> criteria
> - I want to display (and sort) customers based on my product search
> 
> In relational databases, I would simply create two tables (customer and
> product) and JOIN them.  I could then craft a SQL query to count the
> number of distinct StoreId values in the result (something like facets).
> 
> In Solr, however, there are no joins.  As far as I can tell, my options
> are to:
> 
> - create two Solr instances, one with customer info and one with product
> info; I would search the product Solr instance, identify the StoreId
> values returned, and then use that info to search the customer Solr
> instance to get the customer info.  The problem with this is the second
> query could have ten thousand ANDs (one for each StoreId returned by the
> first query)
> - create a single Solr instance that contains a denormalized version of
> the data where each doc would contain both the customer info and the
> product info for a given product.  The problem with this is that my
> facets would return the number of products, not the number of customers
> - create a single Solr instance that contains a denormalized version of
> the data where each doc contains the customer info and info for ALL
> products that the customer might have (likely done via dynamic fields).
> The problem with this is that my schema would be a bit messy and that my
> queries could have hundreds of ANDs and ORs (one AND for each product
> field, and one OR for each product); for example, q=((Width1:50 AND
> Density1:7) OR (Width2:50 AND Density2:7) OR …)
> 
> Does anyone have any advice on this?  Are there other schemas that might
> work?  Hopefully the example makes sense.
> 
> --
> View this message in context: http://old.nabble.com/question-about-
> schemas-tp26600956p26600956.html
> Sent from the Solr - User mailing list archive at Nabble.com.






RE: Solr and Garbage Collection

2009-09-25 Thread cbennett
I would look at the JVM. Have you tried switching to the concurrent
low-pause collector?
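
A minimal sketch of the relevant flags, assuming a Sun HotSpot JVM (heap
sizes are whatever you already run with):

JAVA_OPTS="-Xms12g -Xmx12g -XX:+UseConcMarkSweepGC -XX:+UseParNewGC"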

Colin.


-Original Message-
From: Jonathan Ariel [mailto:ionat...@gmail.com] 
Sent: Friday, September 25, 2009 12:07 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr and Garbage Collection

You are saying that I should give it more than 12GB of memory?
When I was at 10GB I had the exceptions that I sent. Switching to 12GB made
them disappear.
So I think I don't have problems with FieldCache any more. What does seem
like a problem is the 11% of application time dedicated to GC, especially
when those servers are under really heavy load.
I think that's why I sometimes get queries that one moment execute in a few
ms and a moment later take 20 seconds!

It seems like I should tune my JVM, don't you think so?

On Fri, Sep 25, 2009 at 1:01 PM, Fuad Efendi  wrote:

> Give it even more memory.
>
> Lucene FieldCache is used to store non-tokenized single-value non-boolean
> (DocumentId -> FieldValue) pairs, and it is used (in full!) for instance
> for sorting query results.
>
> So if you have 100,000,000 documents with heavily distributed field
> values (cardinality is high! size is 100 bytes!) you need 10,000,000,000
> bytes for just this one instance of FieldCache.
>
> GC does not play any role. FieldCache won't be GC-collected.
>
>
> -Fuad
> http://www.linkedin.com/in/liferay
>
>
>
> > -Original Message-
> > From: Jonathan Ariel [mailto:ionat...@gmail.com]
> > Sent: September-25-09 11:37 AM
> > To: solr-user@lucene.apache.org; yo...@lucidimagination.com
> > Subject: Re: Solr and Garbage Collection
> >
> > Right, now I'm giving it 12GB of heap memory.
> > If I give it less (10GB) it throws the following exception:
> >
> > Sep 5, 2009 7:18:32 PM org.apache.solr.common.SolrException log
> > SEVERE: java.lang.OutOfMemoryError: Java heap space
> >         at org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:361)
> >         at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
> >         at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:352)
> >         at org.apache.solr.request.SimpleFacets.getFieldCacheCounts(SimpleFacets.java:267)
> >         at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:185)
> >         at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:207)
> >         at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:104)
> >         at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:70)
> >         at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:169)
> >         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> >         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
> >         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
> >         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
> >         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
> >         at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
> >         at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> >         at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
> >         at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
> >         at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
> >         at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
> >         at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
> >         at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
> >         at org.mortbay.jetty.Server.handle(Server.java:285)
> >         at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
> >         at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
> >         at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
> >         at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
> >         at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
> >         at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
> >         at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
> >
> > On Fri, Sep 25, 2009 at 10:55 AM, Yonik Seeley
> > wrote:
> >
> > > On Fri, Sep 25, 2009 at 9:30 AM, Jonathan

RE: Solr and Garbage Collection

2009-09-25 Thread cbennett
Hi,

Have you looked at tuning the garbage collection?

Take a look at the following articles

http://www.lucidimagination.com/blog/2009/09/19/java-garbage-collection-boot-camp-draft/
http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html

Changing to the concurrent or throughput collector should help with the long
pauses.
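
For reference, a minimal sketch assuming a Sun HotSpot JVM: the throughput
collector is enabled with -XX:+UseParallelGC, the concurrent (low-pause)
one with -XX:+UseConcMarkSweepGC, and the actual pause lengths can be
measured by adding:

-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps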


Colin.

-Original Message-
From: Jonathan Ariel [mailto:ionat...@gmail.com] 
Sent: Friday, September 25, 2009 11:37 AM
To: solr-user@lucene.apache.org; yo...@lucidimagination.com
Subject: Re: Solr and Garbage Collection

Right, now I'm giving it 12GB of heap memory.
If I give it less (10GB) it throws the following exception:

Sep 5, 2009 7:18:32 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space
        at org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:361)
        at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
        at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:352)
        at org.apache.solr.request.SimpleFacets.getFieldCacheCounts(SimpleFacets.java:267)
        at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:185)
        at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:207)
        at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:104)
        at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:70)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:169)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
        at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
        at org.mortbay.jetty.Server.handle(Server.java:285)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
        at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
        at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

On Fri, Sep 25, 2009 at 10:55 AM, Yonik Seeley
wrote:

> On Fri, Sep 25, 2009 at 9:30 AM, Jonathan Ariel 
> wrote:
> > Hi to all!
> > Lately my solr servers seem to stop responding once in a while. I'm
using
> > solr 1.3.
> > Of course I'm having more traffic on the servers.
> > So I logged the Garbage Collection activity to check if it's because of
> > that. It seems like 11% of the time the application runs, it is stopped
> > because of GC. And some times the GC takes up to 10 seconds!
> > Is is normal? My instances run on a 16GB RAM, Dual Quad Core Intel Xeon
> > servers. My index is around 10GB and I'm giving to the instances 10GB of
> > RAM.
>
> Bigger heaps lead to bigger GC pauses in general.
> Do you mean that you are giving the JVM a 10GB heap?  Were you getting
> OOM exceptions with a smaller heap?
>
> -Yonik
> http://www.lucidimagination.com
>