Re: Who is running 1.4 nightly in production?

2009-05-13 Thread Andrew McCombe
We are using a nightly from 13/04.  I've found one issue with the PHP
ResponseWriter but apart from that it has been pretty solid.

I'm using the bundled Jetty server to run it for the moment but hope
to move to Tomcat once released and stable (and I have learned
Tomcat!).

Andrew


2009/5/12 Walter Underwood :
> We're planning our move to 1.4, and want to run one of our production
> servers with the new code. Just to feel better about it, is anyone else
> running 1.4 in production?
>
> I'm building 2009-05-11 right now.
>
> wunder
>
>


RE: Solr Loggin issue

2009-05-13 Thread Sagar Khetkade

In addition to my earlier mail, I have a particular scenario. For that I have to 
explain my application-level logging in detail.
 
I am using Solr as an embedded server, with the SOLR-560 slf4j patch applied. 
I need logging information from Solr. Right now my application uses log4j 
for logging, and the log4j.properties file is in my WEB-INF. That works 
fine.  But the error, info and severe logs generated by Solr go to Tomcat's 
stdout.log file, because Solr picks up the logging.properties file from the 
jre/lib folder and uses the ConsoleHandler.
I have tried any number of combinations but cannot configure Solr's 
log-related output to go to the application logger or to a new logger 
specified in a FileHandler.
In addition, there may be some other issue that I am unable to figure out.
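
For reference, the routing I am trying to achieve is roughly this kind of
log4j.properties entry (appender name and file path are only an example):

log4j.logger.org.apache.solr=INFO, solrlog
log4j.appender.solrlog=org.apache.log4j.RollingFileAppender
log4j.appender.solrlog.File=logs/solr.log
log4j.appender.solrlog.MaxFileSize=10MB
log4j.appender.solrlog.MaxBackupIndex=5
log4j.appender.solrlog.layout=org.apache.log4j.PatternLayout
log4j.appender.solrlog.layout.ConversionPattern=%d %-5p [%c] %m%n

But the Solr output never reaches this appender; it still goes through the
JRE's logging.properties instead.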
 
Please help me out of this scenario. I am stuck on this issue. 
 
~Sagar 
 
> From: sagar.khetk...@hotmail.com
> To: solr-user@lucene.apache.org
> Subject: RE: Solr Loggin issue
> Date: Wed, 13 May 2009 09:21:57 +0530
> 
> 
> 
> 
> I have only one log4j.properties file in the classpath, and even if I 
> configure it for the particular package the Solr exceptions come from, I 
> still see the same issue. I removed the logger for my application and am 
> using only the Solr logging.
> 
> 
> 
> ~Sagar
> 
> 
> 
> 
> 
> > Date: Tue, 12 May 2009 09:59:01 -0700
> > Subject: Re: Solr Loggin issue
> > From: jayallenh...@gmail.com
> > To: solr-user@lucene.apache.org
> > 
> > Usually that means there is another log4j.properties or log4j.xml file in
> > your classpath that is being found before the one you are intending to use.
> > Check your classpath for other versions of these files.
> > 
> > -Jay
> > 
> > 
> > On Tue, May 12, 2009 at 3:38 AM, Sagar Khetkade
> > wrote:
> > 
> > >
> > > Hi,
> > > I have solr implemented in multi-core scenario and also implemented
> > > solr-560-slf4j.patch for implementing the logging. But the problem I am
> > > facing is that the logs are going to the stdout.log file not the log file
> > > that I have mentioned in the log4j.properties file. Can anybody give me 
> > > work
> > > round to make logs go into the logger mentioned in log4j.properties file.
> > > Thanks in advance.
> > >
> > > Regards,
> > > Sagar Khetkade

Re: Who is running 1.4 nightly in production?

2009-05-13 Thread Markus Jelsma - Buyways B.V.
That's probably JIRA issue SOLR-1063. We have only seen it in the spellcheck
results, and only in the PHPS ResponseWriter, not in PHP.
https://issues.apache.org/jira/browse/SOLR-1063

-  
Markus Jelsma          Buyways B.V.            Tel. 050-3118123
Technisch Architect    Friesestraatweg 215c    Fax. 050-3118124
http://www.buyways.nl  9743 AD Groningen       KvK  01074105


On Wed, 2009-05-13 at 09:03 +0100, Andrew McCombe wrote:

> We are using a nightly from 13/04.  I've found one issue with the PHP
> ResponseWriter but apart from that it has been pretty solid.
> 
> I'm using the bundled Jetty server to run it for the moment but hope
> to move to Tomcat once released and stable (and I have learned
> Tomcat!).
> 
> Andrew
> 
> 
> 2009/5/12 Walter Underwood :
> > We're planning our move to 1.4, and want to run one of our production
> > servers with the new code. Just to feel better about it, is anyone else
> > running 1.4 in production?
> >
> > I'm building 2009-05-11 right now.
> >
> > wunder
> >
> >


Re: Newbie question

2009-05-13 Thread Wayne Pope

Hello Shalin,

Thank you for your help. Yes, it answers my question.

Much appreciated



Shalin Shekhar Mangar wrote:
> 
> On Tue, May 12, 2009 at 9:48 PM, Wayne Pope
> wrote:
> 
>>
>> I have this request:
>>
>>
>> http://localhost:8983/solr/select?start=0&rows=20&qt=dismax&q=copy&hl=true&hl.snippets=4&hl.fragsize=50&facet=true&facet.mincount=1&facet.limit=8&facet.field=type&fq=company-id%3A1&wt=javabin&version=2.2
>>
>> (I've been using this to see it rendered in the browser:
>>
>> http://localhost:8983/solr/select?indent=on&version=2.2&q=copy&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl=on&hl.fl=features&hl=true&hl.fragsize=50
>> )
>>
>>
>> that I've been trying out. I get a good response - however the
>> hl.fragsize parameter is ignored, and the hl.fragsize in the
>> solrconfig.xml is ignored too. Instead I get back the whole document
>> (10,000 chars!) in the doc txt field. And bizarrely the response
>> header is this:
>>
> 
> hl.fragsize is relevant only for the snippets created by the highlighter.
> The returned fields will always have the complete data for a document.
> Does
> that answer your question?
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.
> 
> 




Re: master/slave failure scenario

2009-05-13 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Wed, May 13, 2009 at 12:10 PM, nk 11  wrote:
> Hello
>
> I'm kind of new to Solr and I've read about replication, and the fact that a
> node can act as both master and slave.
> If a replica fails and then comes back online, I suppose that it will resync
> with the master.
right
>
> But what happens if the master fails? Will a slave that is configured as master
> kick in? What if that slave is not yet fully synced with the failed
> master and has old data?
If the master fails you can't index data, but the slaves will
continue serving requests with the last index. You can bring the
master back up and resume indexing.

>
> What happens when the original master comes back online? Will it remain a
> slave because there is another node with the master role?
>
> Thank you!
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: master/slave failure scenario

2009-05-13 Thread nk 11
Nice.
What if the master fails permanently (like a disk crash...) and the new
master is a clean machine?
2009/5/13 Noble Paul നോബിള്‍ नोब्ळ् 

> On Wed, May 13, 2009 at 12:10 PM, nk 11  wrote:
> > Hello
> >
> > I'm kind of new to Solr and I've read about replication, and the fact
> that a
> > node can act as both master and slave.
> > If a replica fails and then comes back online, I suppose that it will
> > resync with the master.
> right
> >
> > But what happens if the master fails? Will a slave that is configured as
> > master kick in? What if that slave is not yet fully synced with the failed
> > master and has old data?
> If the master fails you can't index data, but the slaves will
> continue serving requests with the last index. You can bring the
> master back up and resume indexing.
>
> >
> > What happens when the original master comes back online? Will it remain
> > a slave because there is another node with the master role?
> >
> > Thank you!
> >
>
>
>
> --
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
>


Re: Selective Searches Based on User Identity

2009-05-13 Thread Michael Ludwig

Terence Gannon schrieb:

Paul -- thanks for the reply, I appreciate it.  That's a very
practical approach, and is worth taking a closer look at.  Actually,
taking your idea one step further, perhaps three fields: 1) ownerUid
(uid of the document's owner), 2) grantedUid (uids of users who have
been granted access), and 3) deniedUid (uids of users specifically
denied access to the document).


Grants might change quite a bit, the owner will likely remain the same.

Wouldn't it be better to include only the owner in the document and
store grants someplace else, like in an RDBMS or - if you don't want
one - a lightweight embedded database like BDB?

That way you could have your application tag an ineluctable filter query
onto each and every user query, which would ensure that the results
include only those documents whose owner has granted the user access.
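
For example (field name invented; URL left unescaped for readability), if
user egon may see documents owned by terence and frank in addition to his
own, the application would always append a filter like:

  http://localhost:8983/solr/select?q=report&fq=ownerUid:(egon OR terence OR frank)

The user controls only the q parameter; the fq comes from the grants stored
outside Solr.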

Considering that I'm a Solr/Lucene newbie, this approach might have a
disadvantage that escapes me, which is why other people haven't made
this particular suggestion. If so, I'd be happy to learn why this isn't
preferable.

Michael Ludwig


Re: Custom Servlet Filter, Where to put filter-mappings

2009-05-13 Thread Grant Ingersoll
Hmmm, maybe we need to think about someway to hook this into the build  
process or make it easier to just drop it into the conf or lib dirs.   
I'm no web.xml expert, but I'm sure you're not the first one to want  
to do this kind of thing.


The easiest way _might_ be to patch build.xml to take a property for  
the location of the web.xml, defaulting to the current Solr one.   
Then, people who want to use their own version could just pass in - 
Dweb.xml=.  The downside to this is that it may  
cause problems for us devs when users ask questions about strange  
behavior and it turns out they have mucked up the web.xml


FYI: dist-war is in build.xml, not common-build.xml.

-Grant

On May 12, 2009, at 5:52 AM, Jacob Singh wrote:


Hi folks,

I just wrote a Servlet Filter to handle authentication for our
service.  Here's what I did:

1. Created a dir in contrib
2. Put my project in there, I took the dataimporthandler build.xml as
an example and modified it to suit my needs.  Worked great!
3. ant dist now builds my jar and includes it

I now need to modify web.xml to add my filter-mapping, init params,
etc.  How can I do this cleanly?  Or do I need to manually open up the
archive and edit it and then re-war it?

In common-build I don't see a target for dist-war, so don't see how it
is possible...

Thanks!
Jacob

--

+1 510 277-0891 (o)
+91  33 7458 (m)

web: http://pajamadesign.com

Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: jacobsi...@gmail.com


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: camel-casing and dismax troubles

2009-05-13 Thread Yonik Seeley
On Tue, May 12, 2009 at 7:19 PM, Geoffrey Young
 wrote:
> hi all :)
>
> I'm having trouble with camel-cased query strings and the dismax handler.
>
> a user query
>
>  LeAnn Rimes
>
> isn't matching the indexed term
>
>  Leann Rimes

This is the camel-case case that can't currently be handled by a
single WordDelimiterFilter.

If the indexed doc had LeAnn, then it would be indexed as
"le","ann"/"leann" and hence queries of both forms "le ann" and
"leann" would match.

However, since the indexed term is simply "leann", a
WordDelimiterFilter configured to split won't match (a search for
"LeAnn" will be translated into a search for "le" "ann").

One way to work around this now is to do a copyField into another
field that catenates split terms in the query analyzer instead of
generating/splitting, and then search across both fields.
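
As a rough sketch (untested; field and type names here are made up, not
from your schema), that could look like:

  <field name="search-en"     type="text_split"    indexed="true" stored="false"/>
  <field name="search-en-cat" type="text_catenate" indexed="true" stored="false"/>
  <copyField source="search-en" dest="search-en-cat"/>

where the query analyzer of text_catenate uses a WordDelimiterFilter with
generateWordParts="0" and catenateWords="1", so a query for "LeAnn" produces
the single term "leann" against that field. Searching both fields (e.g. via
the dismax qf parameter) then covers both the split and the catenated form.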

BTW, your parsed query below shows you turned on both catenation and
generation (or perhaps preserveOriginal) for split subwords in your
query analyzer.  Unfortunately this configuration doesn't work due to
the ambiguity of what it means to have multiple terms at the same
position (this is the same problem for multi-word synonyms at query
time).  The query shown below looks for "leann" or "le" followed by
"ann" and hence an indexed term of "leann" won't match.

-Yonik
http://www.lucidimagination.com

> even though both are lower-cased in the end.  furthermore, the
> analysis tool shows a match.
>
> the debug query looks like
>
>  "parsedquery":"+((DisjunctionMaxQuery((search-en:\"(leann le)
> ann\")) DisjunctionMaxQuery((search-en:rimes)))~2) ()",
>
> I have a feeling it's due to how the broken up tokens are added back
> into the token stream with PreserveOriginal, and some strange
> interaction between that order and dismax, but I'm not entirely sure.
>
> configs follow.  thoughts appreciated.
>
> --Geoff
>
>  <fieldtype name="text-en" class="solr.TextField" positionIncrementGap="100">
>    <analyzer type="index">
>      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>      <filter class="solr.WordDelimiterFilterFactory"
>                                                      generateWordParts="1"
>                                                      generateNumberParts="1"
>                                                      catenateWords="1"
>                                                      catenateNumbers="1"
>                                                      catenateAll="1"/>
>      <filter class="solr.LowerCaseFilterFactory"/>
>      <filter class="solr.SynonymFilterFactory"
>              synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>      <filter class="solr.StopFilterFactory"
>              words="stopwords-en.txt"/>
>    </analyzer>
>
>    <analyzer type="query">
>      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>      <filter class="solr.WordDelimiterFilterFactory"
>                                                      generateWordParts="1"
>                                                      generateNumberParts="1"
>                                                      catenateWords="0"
>                                                      catenateNumbers="0"
>                                                      catenateAll="0"/>
>      <filter class="solr.LowerCaseFilterFactory"/>
>      <filter class="solr.StopFilterFactory"
>              words="stopwords-en.txt"/>
>    </analyzer>
>  </fieldtype>
>


Delete documents from index with dataimport

2009-05-13 Thread Andrew McCombe
Hi

Is it possible, through dataimport handler to remove an existing
document from the Solr index?

I import/update from my database where the active field is true.
However, if the client then set's active to false, the document stays
in the Solr index and doesn't get removed.

Regards
Andrew


RE: Selective Searches Based on User Identity

2009-05-13 Thread Terence Gannon
Yes, the ownerUid will likely be assigned once and never changed.  But
you still need it, in order to keep track of who has contributed which
document.

I've been going over some of the simpler query scenarios, and Solr is
capable of handling them without having to resort to an external
RDBMS.  In order to limit documents to those which a given user owns,
or those to which he has been granted access, the syntax fragment
would be something like:

ownerUid:ab2734 OR grantedUid:ab2734

where ab2734 is the uid for the user doing the query.  However, I'm
less comfortable with more complex query scenarios, particularly if
the concept of groups is eventually introduced, which is likely in my
scenario.
In the latter case, it may be necessary to use an external RDBMS.
I'll plead ignorance of the 'ineluctable filter query' and will have
to read up on that one.

With respect to updates to rights, they are not likely to be that
frequent, but when they happen, the entire document will have to be
reindexed rather than simply updating the grantedUid and/or deniedUid
fields.  I don't believe Solr supports the updating of individual
fields, at least not yet.  This may be another reason to eventually go
to an external RDBMS.

Thanks very much for your help!

Terence

-Original Message-
From: Michael Ludwig
Sent: May 13, 2009 05:27
To: solr-user@lucene.apache.org
Subject: Re: Selective Searches Based on User Identity

Terence Gannon schrieb:
> Paul -- thanks for the reply, I appreciate it.  That's a very
> practical approach, and is worth taking a closer look at.  Actually,
> taking your idea one step further, perhaps three fields: 1) ownerUid
> (uid of the document's owner), 2) grantedUid (uids of users who have
> been granted access), and 3) deniedUid (uids of users specifically
> denied access to the document).

Grants might change quite a bit, the owner will likely remain the same.

Wouldn't it be better to include only the owner in the document and
store grants someplace else, like in an RDBMS or - if you don't want
one - a lightweight embedded database like BDB?

That way you could have your application tag an ineluctable filter query
onto each and every user query, which would ensure that the results
include only those documents whose owner has granted the user access.

Considering that I'm a Solr/Lucene newbie, this approach might have a
disadvantage that escapes me, which is why other people haven't made
this particular suggestion. If so, I'd be happy to learn why this isn't
preferable.

Michael Ludwig


Solr vs Sphinx

2009-05-13 Thread wojtekpia

I came across this article praising Sphinx:
http://www.theregister.co.uk/2009/05/08/dziuba_sphinx/. The article
specifically mentions Solr as an 'aging' technology, and states that
performance on Sphinx is 2x-4x faster than Solr. Has anyone compared Sphinx
to Solr? Or used Sphinx in the past? I realize that you can't just say one
is faster than the other because it depends so much on configuration,
requirements, # docs, size of each doc, etc. I'm just looking for general
observations. I've found other articles comparing Solr with Sphinx and most
state that performance is similar between the two. 

Thanks,

Wojtek



Re: Replication master+slave

2009-05-13 Thread Bryan Talbot
I see that Noble's final comment in SOLR-1154 is that config files
need to be able to include snippets from external files.  In my
limited testing, a simple patch to enable XInclude support seems to
work.




--- src/java/org/apache/solr/core/Config.java   (revision 774137)
+++ src/java/org/apache/solr/core/Config.java   (working copy)
@@ -100,8 +100,10 @@
       if (lis == null) {
         lis = loader.openConfig(name);
       }
-      javax.xml.parsers.DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
-      doc = builder.parse(lis);
+      javax.xml.parsers.DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
+      dbf.setNamespaceAware(true);
+      dbf.setXIncludeAware(true);
+      doc = dbf.newDocumentBuilder().parse(lis);

       DOMUtil.substituteProperties(doc, loader.getCoreProperties());
     } catch (ParserConfigurationException e)  {



This allows a clause like this to include the contents of  
replication.xml if it exists.  If it's not found an exception will be  
thrown.



<xi:include href="http://localhost:8983/solr/corename/admin/file/?file=replication.xml"
    xmlns:xi="http://www.w3.org/2001/XInclude">
</xi:include>


If the file is optional and no exception should be thrown if the file  
is missing, simply include a fallback action: in this case the  
fallback is empty and does nothing.



<xi:include href="http://localhost:8983/solr/forum_en/admin/file/?file=replication.xml"
    xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:fallback/>
</xi:include>


-Bryan




On May 12, 2009, at May 12, 8:05 PM, Jian Han Guo wrote:

I was looking at the same problem, and had a discussion with Noble. You can
use a hack to achieve what you want, see

https://issues.apache.org/jira/browse/SOLR-1154

Thanks,

Jianhan


On Tue, May 12, 2009 at 5:13 PM, Bryan Talbot wrote:

So how are people managing solrconfig.xml files which are largely the same
other than differences for replication?

I don't think it's a "good thing" to maintain two copies of the same file
and I'd like to avoid that.  Maybe enabling the XInclude feature in
DocumentBuilders would make it possible to modularize configuration files
to make this possible?

http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/parsers/DocumentBuilderFactory.html#setXIncludeAware(boolean)

-Bryan


On May 12, 2009, at May 12, 11:43 AM, Shalin Shekhar Mangar wrote:

On Tue, May 12, 2009 at 10:42 PM, Bryan Talbot wrote:

For replication in 1.4, the wiki at
http://wiki.apache.org/solr/SolrReplication says that a node can be both
the master and a slave:

A node can act as both master and slave. In that case both the master and
slave configuration lists need to be present inside the ReplicationHandler
requestHandler in the solrconfig.xml.

What does this mean?  Does the core then poll itself for updates?


No. This type of configuration is meant for "repeaters". Suppose there are
slaves in multiple data-centers (say data center A and B). There is always
a single master (say in A). One of the slaves in B is used as a master for
the other slaves in B. Therefore, this one slave in B is both a master as
well as the slave.


I'd like to have a single set of configuration files that are shared by
masters and slaves and avoid duplicating configuration details in multiple
files (one for master and one for slave) to ease management and failover.
Is this possible?


You wouldn't want the master to be a slave. So I guess you'd need to have
a separate file. Also, it needs to be a separate file so that the slave
does not become a master when the solrconfig.xml is replicated.


When I attempt to set up a multi-server master-slave configuration and
include both master and slave replication configuration options, I run
into some problems.  I'm running a nightly build from May 7.


Not sure what happened. Is that the url for this solr (meaning same solr
url is master and slave of itself)? If yes, that is not a valid
configuration.

--
Regards,
Shalin Shekhar Mangar.








Re: Solr vs Sphinx

2009-05-13 Thread Yonik Seeley
It's probably the case that every search engine out there is faster
than Solr at one thing or another, and that Solr is faster or better
at some other things.

I prefer to spend my time improving Solr rather than engage in
benchmarking wars... and Solr 1.4 will have a ton of speed
improvements over Solr 1.3.

-Yonik
http://www.lucidimagination.com


Re: camel-casing and dismax troubles

2009-05-13 Thread Geoffrey Young
On Wed, May 13, 2009 at 6:23 AM, Yonik Seeley
 wrote:
> On Tue, May 12, 2009 at 7:19 PM, Geoffrey Young
>  wrote:
>> hi all :)
>>
>> I'm having trouble with camel-cased query strings and the dismax handler.
>>
>> a user query
>>
>>  LeAnn Rimes
>>
>> isn't matching the indexed term
>>
>>  Leann Rimes
>
> This is the camel-case case that can't currently be handled by a
> single WordDelimiterFilter.
>
> If the indexed doc had LeAnn, then it would be indexed as
> "le","ann"/"leann" and hence queries of both forms "le ann" and
> "leann" would match.
>
> However, since the indexed term is simply "leann", a
> WordDelimiterFilter configured to split won't match (a search for
> "LeAnn" will be translated into a search for "le" "ann").

but the catenateWords and/or catenateAll options should handle splicing the
tokens back together, right?

>
> One way to work around this now is to do a copyField into another
> field that catenates split terms in the query analyzer instead of
> generating/splitting, and then search across both fields.

yeah, unfortunately that's not an option for me :)

>
> BTW, your parsed query below shows you turned on both catenation and
> generation (or perhaps preserveOriginal) for split subwords in your
> query analyzer.  Unfortunately this configuration doesn't work due to
> the ambiguity of what it means to have multiple terms at the same
> position (this is the same problem for multi-word synonyms at query
> time).  The query shown below looks for "leann" or "le" followed by
> "ann" and hence an indexed term of "leann" won't match.

ugh.  ok, thanks for letting me know.

I'm not using the same catenate parameters at index time as at query time,
based on the solr wiki docs.  but I've always wondered if that was a
good idea.  I'll see if matching them up helps at all.

thanks.  I'll let you know what I find.

--Geoff


Re: Solr vs Sphinx

2009-05-13 Thread Grant Ingersoll


On May 13, 2009, at 11:55 AM, wojtekpia wrote:



I came across this article praising Sphinx:
http://www.theregister.co.uk/2009/05/08/dziuba_sphinx/. The article
specifically mentions Solr as an 'aging' technology,


Solr is the same age as Sphinx (2006), so if Solr is aging, then so is  
Sphinx.  But, hey aren't we all aging?  It sure beats not aging.  ;-)   
That being said, we are always open to suggestions and improvements.   
Lucene has seen a massive speedup on indexing that comes through in  
Solr in the past year (and it was fast before), and Solr 1.4 looks to  
be faster than 1.3 (and it was fast before, too.)  The Solr community  
is clearly interested in moving things forward and staying fresh, as  
is the Lucene community.



and states that
performance on Sphinx is 2x-4x faster than Solr. Has anyone compared  
Sphinx
to Solr? Or used Sphinx in the past? I realize that you can't just  
say one

is faster than the other because it depends so much on configuration,
requirements, # docs, size of each doc, etc. I'm just looking for  
general
observations. I've found other articles comparing Solr with Sphinx  
and most

state that performance is similar between the two.


I can't speak to Sphinx, as I haven't used it.

As for performance tests, those are always apples and oranges.  If one  
camp does them, then the other camp says "You don't know how to use  
our product" and vice versa.  I think that applies here.  So, when you  
see things like "Internal tests show" that is always a red flag in my  
mind.  I've contacted others in the past who have done "comparisons"  
and after one round of emailing it was almost always clear that they  
didn't know what best practices are for any given product and thus  
were doing things sub-optimally.


One thing in the article that is worthwhile to consider is the fact  
that some (most?) people would likely benefit from not removing  
stopwords, as they can enhance phrase based searching and thus improve  
relevance.  Obviously, with Solr, it is easy to keep stopwords by  
simply removing the StopFilterFactory from the analysis process and  
then dealing with them appropriately at query time.  However, it is  
likely the case that too many Solr users simply rely on the example  
schema when it comes to setup instead of actively investigating what  
the proper choices are for their situation.
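
For instance, a sketch of a field type that keeps stopwords (not the
shipped example schema, just an illustration) is simply an analyzer chain
with no stop filter in it:

  <fieldType name="text_keepstops" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- no solr.StopFilterFactory here: "the", "of", etc. remain searchable -->
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>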


Finally, an old baseball saying comes to mind: "Pitchers only bother  
to throw at .300 hitters".  Solr is a pretty darn full featured search  
platform with a large and active community, a commercial friendly  
license, and it also performs quite well.


-Grant


Re: Solr vs Sphinx

2009-05-13 Thread Todd Benge
Our company has a large search deployment serving more than 50M search hits
per day.

We've been leveraging Lucene for several years and have recently deployed
Solr for the distributed search feature.  We were hitting scaling limits
with lucene due to our index size.

I did an evaluation of Sphinx and found Solr / Lucene to be more suitable
for our needs and much more flexible.  Performance in the Solr deployment
(especially with 1.4) has been better than expected.

Thanks to all the Solr developers for a great product.

Hopefully we'll have the opportunity to contribute to the project as it
moves forward.

Todd

On Wed, May 13, 2009 at 10:33 AM, Grant Ingersoll wrote:

>
> On May 13, 2009, at 11:55 AM, wojtekpia wrote:
>
>
>> I came across this article praising Sphinx:
>> http://www.theregister.co.uk/2009/05/08/dziuba_sphinx/. The article
>> specifically mentions Solr as an 'aging' technology,
>>
>
> Solr is the same age as Sphinx (2006), so if Solr is aging, then so is
> Sphinx.  But, hey aren't we all aging?  It sure beats not aging.  ;-)  That
> being said, we are always open to suggestions and improvements.  Lucene has
> seen a massive speedup on indexing that comes through in Solr in the past
> year (and it was fast before), and Solr 1.4 looks to be faster than 1.3 (and
> it was fast before, too.)  The Solr community is clearly interested in
> moving things forward and staying fresh, as is the Lucene community.
>
>  and states that
>> performance on Sphinx is 2x-4x faster than Solr. Has anyone compared
>> Sphinx
>> to Solr? Or used Sphinx in the past? I realize that you can't just say one
>> is faster than the other because it depends so much on configuration,
>> requirements, # docs, size of each doc, etc. I'm just looking for general
>> observations. I've found other articles comparing Solr with Sphinx and
>> most
>> state that performance is similar between the two.
>>
>
> I can't speak to Sphinx, as I haven't used it.
>
> As for performance tests, those are always apples and oranges.  If one camp
> does them, then the other camp says "You don't know how to use our product"
> and vice versa.  I think that applies here.  So, when you see things like
> "Internal tests show" that is always a red flag in my mind.  I've contacted
> others in the past who have done "comparisons" and after one round of
> emailing it was almost always clear that they didn't know what best
> practices are for any given product and thus were doing things
> sub-optimally.
>
> One thing in the article that is worthwhile to consider is the fact that
> some (most?) people would likely benefit from not removing stopwords, as
> they can enhance phrase based searching and thus improve relevance.
>  Obviously, with Solr, it is easy to keep stopwords by simply removing the
> StopFilterFactory from the analysis process and then dealing with them
> appropriately at query time.  However, it is likely the case that too many
> Solr users simply rely on the example schema when it comes to setup instead
> of actively investigating what the proper choices are for their situation.
>
> Finally, an old baseball saying comes to mind: "Pitchers only bother to
> throw at .300 hitters".  Solr is a pretty darn full featured search platform
> with a large and active community, a commercial friendly license, and it
> also performs quite well.
>
> -Grant
>


Re: master/slave failure scenario

2009-05-13 Thread Jay Hill
- Migrate configuration files from old master (or backup) to new master.
- Replicate from a slave to the new master.
- Resume indexing to new master.
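
For the replication step, the new master can be told to pull the index from
a surviving slave with a fetchindex command (hostnames here are placeholders):

  http://new-master:8983/solr/replication?command=fetchindex&masterUrl=http://slave1:8983/solr/replication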

-Jay

On Wed, May 13, 2009 at 4:26 AM, nk 11  wrote:

> Nice.
> What if the master fails permanently (like a disk crash...) and the new
> master is a clean machine?
> 2009/5/13 Noble Paul നോബിള്‍ नोब्ळ् 
>
> > On Wed, May 13, 2009 at 12:10 PM, nk 11  wrote:
> > > Hello
> > >
> > > I'm kind of new to Solr and I've read about replication, and the fact
> > that a
> > > node can act as both master and slave.
> > > If a replica fails and then comes back online, I suppose that it will
> > > resync with the master.
> > right
> > >
> > > But what happens if the master fails? Will a slave that is configured as
> > > master kick in? What if that slave is not yet fully synced with the
> > > failed master and has old data?
> > If the master fails you can't index data, but the slaves will
> > continue serving requests with the last index. You can bring the
> > master back up and resume indexing.
> >
> > >
> > > What happens when the original master comes back online? Will it
> > > remain a slave because there is another node with the master role?
> > >
> > > Thank you!
> > >
> >
> >
> >
> > --
> > -
> > Noble Paul | Principal Engineer| AOL | http://aol.com
> >
>


Re: master/slave failure scenario

2009-05-13 Thread Bryan Talbot

Or ...

1. Promote existing slave to new master
2. Add new slave to cluster




-Bryan




On May 13, 2009, at May 13, 9:48 AM, Jay Hill wrote:

> - Migrate configuration files from old master (or backup) to new master.
> - Replicate from a slave to the new master.
> - Resume indexing to new master.
>
> -Jay
>
> On Wed, May 13, 2009 at 4:26 AM, nk 11 wrote:
>
>> Nice.
>> What if the master fails permanently (like a disk crash...) and the new
>> master is a clean machine?
>> 2009/5/13 Noble Paul നോബിള്‍ नोब्ळ्
>>
>>> On Wed, May 13, 2009 at 12:10 PM, nk 11 wrote:
>>>> Hello
>>>>
>>>> I'm kind of new to Solr and I've read about replication, and the fact
>>>> that a node can act as both master and slave.
>>>> If a replica fails and then comes back online, I suppose that it will
>>>> resync with the master.
>>> right
>>>>
>>>> But what happens if the master fails? Will a slave that is configured
>>>> as master kick in? What if that slave is not yet fully synced with the
>>>> failed master and has old data?
>>> If the master fails you can't index data, but the slaves will
>>> continue serving requests with the last index. You can bring the
>>> master back up and resume indexing.
>>>>
>>>> What happens when the original master comes back online? Will it
>>>> remain a slave because there is another node with the master role?
>>>>
>>>> Thank you!
>>>
>>> --
>>> -
>>> Noble Paul | Principal Engineer| AOL | http://aol.com




Re: camel-casing and dismax troubles

2009-05-13 Thread Yonik Seeley
On Wed, May 13, 2009 at 12:29 PM, Geoffrey Young
 wrote:
>> However, since the indexed term is simply "leann", a
>> WordDelimiterFilter configured to split won't match (a search for
>> "LeAnn" will be translated into a search for "le" "ann").
>
> but the concatparts and/or concatall should handle splicing the tokens
> back together, right?

Yes, but you can't do both at once on the query side (split and
concat)... you have to pick one or the other (hence the workaround of
using more than one field).

-Yonik
http://www.lucidimagination.com


Re: how to manually add data to indexes generated by nutch-1.0 using solr

2009-05-13 Thread alxsss

 I forgot to say that when I do 

curl http://localhost:8983/solr/update -H "Content-Type: text/xml" --data-binary '<commit/>'

I get

<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">453</int></lst>
</response>

and a search for the added keywords gives 0 results. Does status 0 mean that the addition 
was successful?

Thanks.
Alex.


 


 

-----Original Message-----
From: Erik Hatcher 
To: solr-user@lucene.apache.org
Sent: Tue, 12 May 2009 6:48 pm
Subject: Re: how to manually add data to indexes generated by nutch-1.0 using 
solr

send a <commit/> request afterwards, or you can add ?commit=true to the /update 
request with the adds.

   Erik

On May 12, 2009, at 8:57 PM, alx...@aim.com wrote:

> Tried to add a new record using
>
> curl http://localhost:8983/solr/update -H "Content-Type: text/xml" 
> --data-binary '<add><doc>
> <field name="...">20090512170318</field>
> <field name="...">86937aaee8e748ac3007ed8b66477624</field>
> <field name="...">0.21189615</field>
> <field name="...">test.com</field>
> <field name="...">test test</field>
> <field name="...">20090513003210909</field>
> </doc></add>'
>
> I get
>
> <response>
> <lst name="responseHeader">
> <int name="status">0</int><int name="QTime">71</int>
> </lst>
> </response>
>
> and added records are not found in the search.
>
> Any ideas what went wrong?
>
> Thanks.
> Alex.
>
> -----Original Message-----
> From: alx...@aim.com
> To: solr-u...@lucene.apache.org
> Sent: Mon, 11 May 2009 12:14 pm
> Subject: how to manually add data to indexes generated by nutch-1.0 using 
> solr
>
> Hello,
>
> I had Nutch-1.0 crawl, fetch and index a lot of files. Then I needed to
> index a few files also. But I know keywords for those files and their
> locations. I need to add them manually. I took a look at two tutorials on 
> the wiki, but did not find any info about this issue.
> Is there a tutorial on a step-by-step procedure for adding data to a nutch 
> index using solr manually?
>
> Thanks in advance.
> Alex.



 



Re: master/slave failure scenario

2009-05-13 Thread nk 11
This is more interesting. Would such a procedure involve taking down and
reconfiguring the slave?

On Wed, May 13, 2009 at 7:55 PM, Bryan Talbot wrote:

> Or ...
>
> 1. Promote existing slave to new master
> 2. Add new slave to cluster
>
>
>
>
> -Bryan
>
>
>
>
>
> On May 13, 2009, at May 13, 9:48 AM, Jay Hill wrote:
>
>> - Migrate configuration files from old master (or backup) to new master.
>> - Replicate from a slave to the new master.
>> - Resume indexing to new master.
>>
>> -Jay
>>
>> On Wed, May 13, 2009 at 4:26 AM, nk 11 wrote:
>>
>>> Nice.
>>> What if the master fails permanently (like a disk crash...) and the new
>>> master is a clean machine?
>>> 2009/5/13 Noble Paul നോബിള്‍ नोब्ळ्
>>>
>>>> On Wed, May 13, 2009 at 12:10 PM, nk 11 wrote:
>>>>> Hello
>>>>>
>>>>> I'm kind of new to Solr and I've read about replication, and the fact
>>>>> that a node can act as both master and slave.
>>>>> If a replica fails and then comes back online, I suppose that it will
>>>>> resync with the master.
>>>> right
>>>>>
>>>>> But what happens if the master fails? Will a slave that is configured
>>>>> as master kick in? What if that slave is not yet fully synced with the
>>>>> failed master and has old data?
>>>> If the master fails you can't index data, but the slaves will
>>>> continue serving requests with the last index. You can bring the
>>>> master back up and resume indexing.
>>>>>
>>>>> What happens when the original master comes back online? Will it
>>>>> remain a slave because there is another node with the master role?
>>>>>
>>>>> Thank you!
>>>>
>>>> --
>>>> -
>>>> Noble Paul | Principal Engineer| AOL | http://aol.com
>


Re: Commits taking too long

2009-05-13 Thread vivek sar
Hi,

   This problem is still haunting us. I've reduced the merge factor to
50, but as my index gets bigger (anything over 20G), commits start
taking much longer. Some info:

1) Less than 20 G index size, 5000 records commit takes around 15sec
2) Over 20G the commit starts taking 50-70sec for 5K records
3) mergefactor = 50
4) Using multicore - each core is around 70G (currently there are 5
cores maintained by single Solr instance)
5) RAM = 6G
6) OS = OS X 10.5
7) JVM Options:

export JAVA_OPTS="-Xdebug
-Xrunjdwp:transport=dt_socket,server=y,address=3090,suspend=n \
  -server -Xms${MIN_JVM_HEAP}m -Xmx${MAX_JVM_HEAP}m \
  -XX:NewRatio=2 -XX:MaxPermSize=512m \
  -XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=${AC_ROOT}/data/pmiJavaHeapDump.hprof \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-Xloggc:gc.log -Dsun.rmi.dgc.client.gcInterval=360
-Dsun.rmi.dgc.server.gcInterval=360 \
  -Droot.dir=$AC_ROOT"

export CATALINA_OPTS="-server -Xms${MIN_JVM_HEAP}m -Xmx${MAX_JVM_HEAP}m 
\
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=50
-XX:-UseGCOverheadLimit"

I also see following from GC log to coincide with commit slowness,

40387.691: [GC 40387.691: [ParNew (promotion failed):
132131K->149120K(149120K), 186.3768727 secs]40574.068: [CMSbailing out
to foreground collection
40736.670: [CMS-concurrent-mark: 168.574/356.749 secs] [Times:
user=276.41 sys=1192.51, real=356.77 secs]
 (concurrent mode failure): 6116976K->5908559K(6121088K), 174.0819842
secs] 6229178K->5908559K(6270208K), 360.4589949 secs] [Times:
user=267.90 sys=1185.49, real=360.48 secs]
40748.155: [GC [1 CMS-initial-mark: 5908559K(6121088K)]
5910029K(6270208K), 0.0014832 secs] [Times: user=0.00 sys=0.00,
real=0.00 secs]
40748.156: [CMS-concurrent-mark-start]
40748.513: [GC 40748.513: [ParNew: 127872K->21248K(149120K), 0.7482810
secs] 6036431K->6050277K(6270208K), 0.7483775 secs] [Times: user=1.66
sys=0.71, real=0.75 secs]
40749.613: [GC 40749.613: [ParNew: 149120K->149120K(149120K),
0.227 secs]40749.613: [CMS40784.961: [CMS-concurrent-mark:
36.055/36.805 secs] [Times: user=20.74 sys=31.41, real=36.81 secs]
 (concurrent mode failure): 6029029K->4899386K(6121088K), 44.2068275
secs] 6178149K->4899386K(6270208K), 44.2069457 secs] [Times:
user=26.05 sys=30.21, real=44.21 secs]

Few questions,

1) Should I lower the merge factor even more? Low merge factor seems
to cause more frequent commit pauses.
2)  Do I need more RAM to maintain large indexes?
3) Should I not have any core bigger than 20G?
4) Any other configuration (Solr or JVM) that can help with this?
5) Does search have to wait until the commit completes? Right now the
search doesn't return while the commit is happening.

We are using Solr 1.4 (nightly build from 3/29/09).

Thanks,
-vivek

On Wed, Apr 15, 2009 at 11:41 AM, Mark Miller  wrote:
> vivek sar wrote:
>>
>> Hi,
>>
>>  I've index where I commit every 50K records (using Solrj). Usually
>> this commit takes 20sec to complete, but every now and then the commit
>> takes way too long - from 10 min to 30 min. I see more delays as the
>> index size continues to grow - once it gets over 5G I start seeing
>> long commit cycles more frequently. See this for ex.,
>>
>> Apr 15, 2009 12:04:13 AM org.apache.solr.update.DirectUpdateHandler2
>> commit
>> INFO: start commit(optimize=false,waitFlush=false,waitSearcher=false)
>> Apr 15, 2009 12:39:58 AM org.apache.solr.core.SolrDeletionPolicy onCommit
>> INFO: SolrDeletionPolicy.onCommit: commits:num=2
>>
>>  commit{dir=/Users/vivek/demo/afterchat/solr/multicore/20090414_1/data/index,segFN=segments_fq,version=1239747075391,generation=566,filenames=[_19m.cfs,
>> _jm.cfs, _1bk.cfs, _193.cfx, _19z.cfs, _1b8.cfs, _1bf.cfs, _10g.cfs, _
>> 2s.cfs, _1bf.cfx, _18x.cfx, _19c.cfx, _193.cfs, _18x.cfs, _1b7.cfs,
>> _1aw.cfs, _1aq.cfs, _1bi.cfx, _1a6.cfs, _19l.cfs, _1ad.cfs, _1a6.cfx,
>> _1as.cfs, _19l.cfx, _1aa.cfs, _1an.cfs, _19d.cfs, _1a3.cfx, _1a3.cfs,
>> _19g.cfs, _b7.cfs, _19
>> e.cfs, _19b.cfs, _1ab.cfs, _1b3.cfx, _19j.cfs, _190.cfs, _uu.cfs,
>> _1b3.cfs, _1ak.cfs, _19p.cfs, _195.cfs, _194.cfs, _19i.cfx, _199.cfs,
>> _19i.cfs, _19o.cfx, _196.cfs, _199.cfx, _196.cfx, _19o.cfs, _190.cfx,
>> _xn.cfs, _1b0.cfx, _1at.
>> cfs, _1av.cfs, _1ao.cfs, _1a9.cfx, _1b0.cfs, _5l.cfs, _1ao.cfx,
>> _1ap.cfs, _1b6.cfx, _19a.cfs, _139.cfs, _1a1.cfs, _s1.cfs, _1b6.cfs,
>> _1a9.cfs, _197.cfs, _1bd.cfs, _19n.cfs, _1au.cfx, _1au.cfs, _1a5.cfs,
>> _1be.cfs, segments_fq, _1b4.cfs, _gt.cfs, _1ag.cfs, _18z.cfs,
>> _162.cfs, _1a4.cfs, _198.cfs, _19x.cfs, _1ah.cfs, _1ai.cfs, _19q.cfs,
>> _1a7.cfs, _1ae.cfs, _19h.cfs, _19x.cfx, _1a2.cfs, _1bj.cfs, _1bb.cfs,
>> _1b1.cfs, _1ai.cfx, _19r.cfs, _18y.cfs, _19u.cfx, _1a8.
>> cfs, _19u.cfs, _1aj.cfs, _19r.cfx, _1ac.cfs, _1az.cfs, _1ac.cfx,
>> _19y.cfs, _1bc.cfx, _19s.cfs, _1ar.cfs, _1al.cfx, _1bg.cfs, _18v.cfs,
>> _1ar.cfx, _1bc.cfs, _1a0.cfx, _1b2.cfs, _1af.cfs, _1bi.cfs, _1af.cfx,
>> _19f.cfs, _1a0.cfs, _1bh.cfs, _19f.cfx, _19c.cfs, _e0.

Re: how to manually add data to indexes generated by nutch-1.0 using solr

2009-05-13 Thread Erik Hatcher
Try a search for *:* and see if you get results for that.  If so, you  
have your documents indexed, but you need to dig into things like  
query parser configuration and analysis to see why things aren't  
matching.  Perhaps you're not querying the field you think you are?
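
For example (adjust host, port and core path to your setup):

  http://localhost:8983/solr/select?q=*:*&rows=10

should return the first ten documents in the index if anything was committed.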


Erik

On May 13, 2009, at 1:15 PM, alx...@aim.com wrote:



I forgot to say that when I do

curl http://localhost:8983/solr/update -H "Content-Type: text/xml" --data-binary '<commit/>'

I get

<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">453</int></lst>
</response>

and a search for added keywords gives 0 results. Does status 0 mean
that addition was successful?

Thanks.
Alex.
















Re: Delete documents from index with dataimport

2009-05-13 Thread Fergus McMenemie
>Hi
>
>Is it possible, through dataimport handler to remove an existing
>document from the Solr index?
>
>I import/update from my database where the active field is true.
>However, if the client then set's active to false, the document stays
>in the Solr index and doesn't get removed.
>
>Regards
>Andrew

Yes, but only in the latest trunk. If your "active" field is false
do you want to see the document deleted? Do you have another field
which is a unique ID for the document?
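
If so, a delta-import with a deletedPkQuery on the entity can pick up the
rows to remove. A rough, untested sketch; table and column names are just
an example:

<entity name="item" pk="id"
        query="SELECT * FROM item WHERE active = 1"
        deltaQuery="SELECT id FROM item
                    WHERE last_modified > '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT * FROM item WHERE id = '${dataimporter.delta.id}'"
        deletedPkQuery="SELECT id FROM item WHERE active = 0">
  ...
</entity>

The ids returned by deletedPkQuery are deleted from the index when you run
command=delta-import.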

Fergus
-- 

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===


Re: Selective Searches Based on User Identity

2009-05-13 Thread Michael Ludwig

Hi Terence,

Terence Gannon schrieb:

Yes, the ownerUid will likely be assigned once and never changed.  But
you still need it, in order to keep track of who has contributed which
document.


Yes, of course!


I've been going over some of the simpler query scenarios, and Solr is
capable of handling them without having to resort to an external
RDBMS.


The database is only to store grants - it's not to help with searching.
It would look like this:

  grantee| grant
  ---+--
  fritz  | fred,frank,egon
  frank  | egon,fritz
  egon   | terence,frank
  ...

Each user is granted access to his own documents and to those he
has received grants for.


In order to limit documents to those which a given user owns,
or those to which he has been granted access, the syntax fragment
would be something like;

ownerUid:ab2734 or grantedUid:ab2734


I think it could be:

  ownerUid:egon OR ownerUid:terence OR ownerUid:frank

No need to embed grants in the document.

Ah, I see my mistake now. You want grants based on the document, not on
the user - I had overlooked that fact. That makes my suggestion invalid.


I'll plead ignorance of the 'ineluctable filter query' and will have
to read up on that one.


I meant a filter query that the application tags onto the query on
behalf of the user and without the user being able to do anything about
it so he cannot circumvent the filter.

Best regards,

Michael Ludwig


Solr memory requirements?

2009-05-13 Thread vivek sar
Hi,

  I'm pretty sure this has been asked before, but I couldn't find a
complete answer in the forum archive. Here are my questions,

1) When solr starts up what does it loads up in the memory? Let's say
I've 4 cores with each core 50G in size. When Solr comes up how much
of it would be loaded in memory?

2) How much memory is required during index time? If I'm committing
50K records at a time (1 record = 1KB) using solrj, how much memory do
I need to give to Solr.

3) Is there a minimum memory requirement by Solr to maintain a certain
size index? Is there any benchmark on this?

Here are some of my configuration from solrconfig.xml,

1) <ramBufferSizeMB>64</ramBufferSizeMB>
2) All the caches (under query tag) are commented out
3) Few others,
  a) <enableLazyFieldLoading>true</enableLazyFieldLoading>  ==>
would this require memory?
  b) <queryResultWindowSize>50</queryResultWindowSize>
  c) <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
  d) 
  e) <useColdSearcher>false</useColdSearcher>
  f) <maxWarmingSearchers>2</maxWarmingSearchers>

The problem we are having is following,

I've given Solr RAM of 6G. As the total index size (all cores
combined) start growing the Solr memory consumption  goes up. With 800
million documents, I see Solr already taking up all the memory at
startup. After that the commits, searches everything become slow. We
will be having distributed setup with multiple Solr instances (around
8) on four boxes, but our requirement is to have each Solr instance at
least maintain around 1.5 billion documents.

We are trying to see if we can somehow reduce the Solr memory
footprint. If someone can provide a pointer on what parameters affect
memory and what effects it has we can then decide whether we want that
parameter or not. I'm not sure if there is any minimum Solr
requirement for it to be able maintain large indexes. I've used Lucene
before and that didn't require anything by default - it used up memory
only during index and search times - not otherwise.

Any help is very much appreciated.

Thanks,
-vivek


Re: Solr memory requirements?

2009-05-13 Thread Otis Gospodnetic

Hi,
Some answers:
1) .tii files in the Lucene index.  When you sort, all distinct values for the 
field(s) used for sorting.  Similarly for facet fields.  Solr caches.
2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will consume 
during indexing.  There is no need to commit every 50K docs unless you want to 
trigger snapshot creation.
3) see 1) above

1.5 billion docs per instance where each doc is cca 1KB?  I doubt that's going 
to fly. :)

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: vivek sar 
> To: solr-user@lucene.apache.org
> Sent: Wednesday, May 13, 2009 3:04:46 PM
> Subject: Solr memory requirements?
> 
> Hi,
> 
>   I'm pretty sure this has been asked before, but I couldn't find a
> complete answer in the forum archive. Here are my questions,
> 
> 1) When solr starts up what does it loads up in the memory? Let's say
> I've 4 cores with each core 50G in size. When Solr comes up how much
> of it would be loaded in memory?
> 
> 2) How much memory is required during index time? If I'm committing
> 50K records at a time (1 record = 1KB) using solrj, how much memory do
> I need to give to Solr.
> 
> 3) Is there a minimum memory requirement by Solr to maintain a certain
> size index? Is there any benchmark on this?
> 
> Here are some of my configuration from solrconfig.xml,
> 
> 1) <ramBufferSizeMB>64</ramBufferSizeMB>
> 2) All the caches (under query tag) are commented out
> 3) Few others,
>   a) <enableLazyFieldLoading>true</enableLazyFieldLoading>  ==>
> would this require memory?
>   b) <queryResultWindowSize>50</queryResultWindowSize>
>   c) <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
>   d) 
>   e) <useColdSearcher>false</useColdSearcher>
>   f) <maxWarmingSearchers>2</maxWarmingSearchers>
> 
> The problem we are having is following,
> 
> I've given Solr RAM of 6G. As the total index size (all cores
> combined) start growing the Solr memory consumption  goes up. With 800
> million documents, I see Solr already taking up all the memory at
> startup. After that the commits, searches everything become slow. We
> will be having distributed setup with multiple Solr instances (around
> 8) on four boxes, but our requirement is to have each Solr instance at
> least maintain around 1.5 billion documents.
> 
> We are trying to see if we can somehow reduce the Solr memory
> footprint. If someone can provide a pointer on what parameters affect
> memory and what effects it has we can then decide whether we want that
> parameter or not. I'm not sure if there is any minimum Solr
> requirement for it to be able maintain large indexes. I've used Lucene
> before and that didn't require anything by default - it used up memory
> only during index and search times - not otherwise.
> 
> Any help is very much appreciated.
> 
> Thanks,
> -vivek



Re: Replication master+slave

2009-05-13 Thread Otis Gospodnetic

This looks nice and simple.  I don't know enough about this stuff to see any 
issues.  If there are no issues...?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Bryan Talbot 
> To: solr-user@lucene.apache.org
> Sent: Wednesday, May 13, 2009 11:26:41 AM
> Subject: Re: Replication master+slave
> 
> I see that Noble's final comment in SOLR-1154 is that config files need to be 
> able to include snippets from external files.  In my limited testing, a 
> simple 
> patch to enable XInclude support seems to work.
> 
> 
> 
> --- src/java/org/apache/solr/core/Config.java   (revision 774137)
> +++ src/java/org/apache/solr/core/Config.java   (working copy)
> @@ -100,8 +100,10 @@
>        if (lis == null) {
>          lis = loader.openConfig(name);
>        }
> -      javax.xml.parsers.DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
> -      doc = builder.parse(lis);
> +      javax.xml.parsers.DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
> +      dbf.setNamespaceAware(true);
> +      dbf.setXIncludeAware(true);
> +      doc = dbf.newDocumentBuilder().parse(lis);
> 
>        DOMUtil.substituteProperties(doc, loader.getCoreProperties());
>      } catch (ParserConfigurationException e)  {
> 
> 
> 
> This allows a clause like this to include the contents of replication.xml if 
> it 
> exists.  If it's not found an exception will be thrown.
> 
> 
> 
> <xi:include href="http://localhost:8983/solr/corename/admin/file/?file=replication.xml"
>  xmlns:xi="http://www.w3.org/2001/XInclude">
> </xi:include>
> 
> 
> If the file is optional and no exception should be thrown if the file is 
> missing, simply include a fallback action: in this case the fallback is empty 
> and does nothing.
> 
> 
> 
> <xi:include href="http://localhost:8983/solr/forum_en/admin/file/?file=replication.xml"
>  xmlns:xi="http://www.w3.org/2001/XInclude">
>   <xi:fallback/>
> </xi:include>
> 
> 
> -Bryan
> 
> 
> 
> 
> On May 12, 2009, at May 12, 8:05 PM, Jian Han Guo wrote:
> 
> > I was looking at the same problem, and had a discussion with Noble. You can
> > use a hack to achieve what you want, see
> > 
> > https://issues.apache.org/jira/browse/SOLR-1154
> > 
> > Thanks,
> > 
> > Jianhan
> > 
> > 
> > On Tue, May 12, 2009 at 5:13 PM, Bryan Talbot wrote:
> > 
> >> So how are people managing solrconfig.xml files which are largely the same
> >> other than differences for replication?
> >> 
> >> I don't think it's a "good thing" to maintain two copies of the same file
> >> and I'd like to avoid that.  Maybe enabling the XInclude feature in
> >> DocumentBuilders would make it possible to modularize configuration files 
> >> to
> >> make this possible?
> >> 
> >> 
> >> 
> http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/parsers/DocumentBuilderFactory.html#setXIncludeAware(boolean)
> >> 
> >> 
> >> -Bryan
> >> 
> >> 
> >> 
> >> 
> >> 
> >> On May 12, 2009, at May 12, 11:43 AM, Shalin Shekhar Mangar wrote:
> >> 
> >> On Tue, May 12, 2009 at 10:42 PM, Bryan Talbot 
>  wrote:
> >>> 
> >>> For replication in 1.4, the wiki at
>  http://wiki.apache.org/solr/SolrReplication says that a node can be both
>  the master and a slave:
>  
>  A node can act as both master and slave. In that case both the master and
>  slave configuration lists need to be present inside the
>  ReplicationHandler
>  requestHandler in the solrconfig.xml.
>  
>  What does this mean?  Does the core then poll itself for updates?
>  
> >>> 
> >>> 
> >>> No. This type of configuration is meant for "repeaters". Suppose there are
> >>> slaves in multiple data-centers (say data center A and B). There is always
> >>> a
> >>> single master (say in A). One of the slaves in B is used as a master for
> >>> the
> >>> other slaves in B. Therefore, this one slave in B is both a master as well
> >>> as the slave.
> >>> 
> >>> 
> >>> 
>  I'd like to have a single set of configuration files that are shared by
>  masters and slaves and avoid duplicating configuration details in
>  multiple
>  files (one for master and one for slave) to ease management and failover.
>  Is this possible?
>  
>  
> >>> You wouldn't want the master to be a slave. So I guess you'd need to have
> >>> a
> >>> separate file. Also, it needs to be a separate file so that the slave does
> >>> not become a master when the solrconfig.xml is replicated.
> >>> 
> >>> 
> >>> 
>  When I attempt to setup a multi server master-slave configuration and
>  include both master and slave replication configuration options, I into
>  some
>  problems.  I'm  running a nightly build from May 7.
>  
>  
> >>> Not sure what happened. Is that the url for this solr (meaning same solr
> >>> url
> >>> is master and slave of itself)? If yes, that is not a valid configuration.
> >>> 
> >>> --
> >>> Regards,
> >>> Shalin Shekhar Mangar.
> >>> 
> >> 
> >> 



Re: Replication master+slave

2009-05-13 Thread Peter Wolanin
Indeed - that looks nice - having some kind of conditional includes
would make many things easier.

-Peter

On Wed, May 13, 2009 at 4:22 PM, Otis Gospodnetic
 wrote:
>
> This looks nice and simple.  I don't know enough about this stuff to see any 
> issues.  If there are no issues...?
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> - Original Message 
>> From: Bryan Talbot 
>> To: solr-user@lucene.apache.org
>> Sent: Wednesday, May 13, 2009 11:26:41 AM
>> Subject: Re: Replication master+slave
>>
>> I see that Noble's final comment in SOLR-1154 is that config files need to be
>> able to include snippets from external files.  In my limited testing, a 
>> simple
>> patch to enable XInclude support seems to work.
>>
>>
>>
>> --- src/java/org/apache/solr/core/Config.java   (revision 774137)
>> +++ src/java/org/apache/solr/core/Config.java   (working copy)
>> @@ -100,8 +100,10 @@
>>   if (lis == null) {
>>     lis = loader.openConfig(name);
>>   }
>> -      javax.xml.parsers.DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
>> -      doc = builder.parse(lis);
>> +      javax.xml.parsers.DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
>> +      dbf.setNamespaceAware(true);
>> +      dbf.setXIncludeAware(true);
>> +      doc = dbf.newDocumentBuilder().parse(lis);
>>
>>     DOMUtil.substituteProperties(doc, loader.getCoreProperties());
>> } catch (ParserConfigurationException e)  {
>>
>>
>>
>> This allows a clause like this to include the contents of replication.xml if 
>> it
>> exists.  If it's not found an exception will be thrown.
>>
>>
>>
>> href="http://localhost:8983/solr/corename/admin/file/?file=replication.xml";
>>          xmlns:xi="http://www.w3.org/2001/XInclude";>
>>
>>
>>
>> If the file is optional and no exception should be thrown if the file is
>> missing, simply include a fallback action: in this case the fallback is empty
>> and does nothing.
>>
>>
>>
>> href="http://localhost:8983/solr/forum_en/admin/file/?file=replication.xml";
>>          xmlns:xi="http://www.w3.org/2001/XInclude";>
>>
>>
>>
>>
>> -Bryan
>>
>>
>>
>>
>> On May 12, 2009, at May 12, 8:05 PM, Jian Han Guo wrote:
>>
>> > I was looking at the same problem, and had a discussion with Noble. You can
>> > use a hack to achieve what you want, see
>> >
>> > https://issues.apache.org/jira/browse/SOLR-1154
>> >
>> > Thanks,
>> >
>> > Jianhan
>> >
>> >
>> > On Tue, May 12, 2009 at 5:13 PM, Bryan Talbot wrote:
>> >
>> >> So how are people managing solrconfig.xml files which are largely the same
>> >> other than differences for replication?
>> >>
>> >> I don't think it's a "good thing" to maintain two copies of the same file
>> >> and I'd like to avoid that.  Maybe enabling the XInclude feature in
>> >> DocumentBuilders would make it possible to modularize configuration files 
>> >> to
>> >> make this possible?
>> >>
>> >>
>> >>
>> http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/parsers/DocumentBuilderFactory.html#setXIncludeAware(boolean)
>> >>
>> >>
>> >> -Bryan
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On May 12, 2009, at May 12, 11:43 AM, Shalin Shekhar Mangar wrote:
>> >>
>> >> On Tue, May 12, 2009 at 10:42 PM, Bryan Talbot
>>  wrote:
>> >>>
>> >>> For replication in 1.4, the wiki at
>>  http://wiki.apache.org/solr/SolrReplication says that a node can be both
>>  the master and a slave:
>> 
>>  A node can act as both master and slave. In that case both the master 
>>  and
>>  slave configuration lists need to be present inside the
>>  ReplicationHandler
>>  requestHandler in the solrconfig.xml.
>> 
>>  What does this mean?  Does the core then poll itself for updates?
>> 
>> >>>
>> >>>
>> >>> No. This type of configuration is meant for "repeaters". Suppose there 
>> >>> are
>> >>> slaves in multiple data-centers (say data center A and B). There is 
>> >>> always
>> >>> a
>> >>> single master (say in A). One of the slaves in B is used as a master for
>> >>> the
>> >>> other slaves in B. Therefore, this one slave in B is both a master as 
>> >>> well
>> >>> as the slave.
>> >>>
>> >>>
>> >>>
>>  I'd like to have a single set of configuration files that are shared by
>>  masters and slaves and avoid duplicating configuration details in
>>  multiple
>>  files (one for master and one for slave) to ease management and 
>>  failover.
>>  Is this possible?
>> 
>> 
>> >>> You wouldn't want the master to be a slave. So I guess you'd need to have
>> >>> a
>> >>> separate file. Also, it needs to be a separate file so that the slave 
>> >>> does
>> >>> not become a master when the solrconfig.xml is replicated.
>> >>>
>> >>>
>> >>>
>>  When I attempt to set up a multi-server master-slave configuration and
>>  include both master and slave replication configuration options, I run
>>  into some problems.  I'm running a nightly build from May 7.
>> 
>> 
>> 
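Putting the pieces together, a sketch of how the shared configuration could
be split, assuming the XInclude patch above is applied (the file name
replication.xml and its contents are illustrative):

<!-- solrconfig.xml, identical on every node -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <xi:include href="http://localhost:8983/solr/corename/admin/file/?file=replication.xml"
              xmlns:xi="http://www.w3.org/2001/XInclude">
    <xi:fallback/>
  </xi:include>
</requestHandler>

<!-- replication.xml, present in the conf directory of the master only -->
<lst name="master">
  <str name="replicateAfter">commit</str>
  <str name="confFiles">schema.xml,stopwords.txt</str>
</lst>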

Re: Solr memory requirements?

2009-05-13 Thread vivek sar
Thanks Otis.

Our use case doesn't require any sorting or faceting. I'm wondering
whether I've configured something wrong.

I've got a total of 25 fields (15 are indexed and stored; the other 10
are just stored). All my fields are basic data types - which I thought
are not sorted. My id field is the unique key.

Is there any field here that might be getting sorted?

   [field definitions stripped by the list archive]

Thanks,
-vivek
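For reference, the field list stripped from the message above consisted of
<field .../> definitions; a representative one (name and type illustrative,
attribute set as in the surviving fragments) looks like:

<field name="id" type="string" indexed="true" stored="true"
       required="true" omitNorms="true" compressed="false"/>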

On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
 wrote:
>
> Hi,
> Some answers:
> 1) .tii files in the Lucene index.  When you sort, all distinct values for 
> the field(s) used for sorting.  Similarly for facet fields.  Solr caches.
> 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will consume 
> during indexing.  There is no need to commit every 50K docs unless you want 
> to trigger snapshot creation.
> 3) see 1) above
>
> 1.5 billion docs per instance where each doc is cca 1KB?  I doubt that's 
> going to fly. :)
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> - Original Message 
>> From: vivek sar 
>> To: solr-user@lucene.apache.org
>> Sent: Wednesday, May 13, 2009 3:04:46 PM
>> Subject: Solr memory requirements?
>>
>> Hi,
>>
>>   I'm pretty sure this has been asked before, but I couldn't find a
>> complete answer in the forum archive. Here are my questions,
>>
>> 1) When solr starts up what does it loads up in the memory? Let's say
>> I've 4 cores with each core 50G in size. When Solr comes up how much
>> of it would be loaded in memory?
>>
>> 2) How much memory is required during index time? If I'm committing
>> 50K records at a time (1 record = 1KB) using solrj, how much memory do
>> I need to give to Solr.
>>
>> 3) Is there a minimum memory requirement by Solr to maintain a certain
>> size index? Is there any benchmark on this?
>>
>> Here are some of my configuration from solrconfig.xml,
>>
>> 1) 64
>> 2) All the caches (under query tag) are commented out
>> 3) Few others,
>>       a)  true    ==>
>> would this require memory?
>>       b)  50
>>       c) 200
>>       d)
>>       e) false
>>       f)  2
>>
>> The problem we are having is following,
>>
>> I've given Solr RAM of 6G. As the total index size (all cores
>> combined) start growing the Solr memory consumption  goes up. With 800
>> million documents, I see Solr already taking up all the memory at
>> startup. After that the commits, searches everything become slow. We
>> will be having distributed setup with multiple Solr instances (around
>> 8) on four boxes, but our requirement is to have each Solr instance at
>> least maintain around 1.5 billion documents.
>>
>> We are trying to see if we can somehow reduce the Solr memory
>> footprint. If someone can provide a pointer on what parameters affect
>> memory and what effects it has we can then decide whether we want that
>> parameter or not. I'm not sure if there is any minimum Solr
>> requirement for it to be able maintain large indexes. I've used Lucene
>> before and that didn't require anything by default - it used up memory
>> only during index and search times - not otherwise.
>>
>> Any help is very much appreciated.
>>
>> Thanks,
>> -vivek
>
>


Re: Solr memory requirements?

2009-05-13 Thread Otis Gospodnetic

Hi,

Sorting is triggered by the sort parameter in the URL, not a characteristic of 
a field. :)
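For example (field name illustrative), a request like this is what triggers
sorting - and with it the FieldCache:

  http://localhost:8983/solr/select?q=*:*&sort=price+desc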

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: vivek sar 
> To: solr-user@lucene.apache.org
> Sent: Wednesday, May 13, 2009 4:42:16 PM
> Subject: Re: Solr memory requirements?
> 
> Thanks Otis.
> 
> Our use case doesn't require any sorting or faceting. I'm wondering if
> I've configured anything wrong.
> 
> I got total of 25 fields (15 are indexed and stored, other 10 are just
> stored). All my fields are basic data type - which I thought are not
> sorted. My id field is unique key.
> 
> Is there any field here that might be getting sorted?
> 
> [field definitions stripped by the list archive]
> 
> Thanks,
> -vivek
> 
> On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
> wrote:
> >
> > Hi,
> > Some answers:
> > 1) .tii files in the Lucene index.  When you sort, all distinct values for 
> > the 
> field(s) used for sorting.  Similarly for facet fields.  Solr caches.
> > 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will 
> > consume 
> during indexing.  There is no need to commit every 50K docs unless you want 
> to 
> trigger snapshot creation.
> > 3) see 1) above
> >
> > 1.5 billion docs per instance where each doc is cca 1KB?  I doubt that's 
> > going 
> to fly. :)
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> >
> >
> > - Original Message 
> >> From: vivek sar 
> >> To: solr-user@lucene.apache.org
> >> Sent: Wednesday, May 13, 2009 3:04:46 PM
> >> Subject: Solr memory requirements?
> >>
> >> Hi,
> >>
> >>   I'm pretty sure this has been asked before, but I couldn't find a
> >> complete answer in the forum archive. Here are my questions,
> >>
> >> 1) When solr starts up what does it loads up in the memory? Let's say
> >> I've 4 cores with each core 50G in size. When Solr comes up how much
> >> of it would be loaded in memory?
> >>
> >> 2) How much memory is required during index time? If I'm committing
> >> 50K records at a time (1 record = 1KB) using solrj, how much memory do
> >> I need to give to Solr.
> >>
> >> 3) Is there a minimum memory requirement by Solr to maintain a certain
> >> size index? Is there any benchmark on this?
> >>
> >> Here are some of my configuration from solrconfig.xml,
> >>
> >> 1) 64
> >> 2) All the caches (under query tag) are commented out
> >> 3) Few others,
> >>   a)  true==>
> >> would this require memory?
> >>   b)  50
> >>   c) 200
> >>   d)
> >>   e) false
> >>   f)  2
> >>
> >> The problem we are having is following,
> >>
> >> I've given Solr RAM of 6G. As the total index size (all cores
> >> combined) start growing the Solr memory consumption  goes up. With 800
> >> million documents, I see Solr already taking up all the memory at
> >> startup. After that the commits, searches everything become slow. We
> >> will be having distributed setup with multiple Solr instances (around
> >> 8) on four boxes, but our requirement is to have each Solr instance at
> >> least maintain around 1.5 billion documents.
> >>
> >> We are trying to see if we can somehow reduce the Solr memory
> >> footprint. If someone can provide a pointer on what parameters affect
> >> memory and what effects it has we can then decide whether we want that
> >> parameter or not. I'm not sure if there is any minimum Solr
> >> requirement for it to be able maintain large indexes. I've used Lucene
> >> before and that didn't require anything by default - it used up memory
> >> only during index and search times - not otherwise.
> >>
> >> Any help is very much appreciated.
> >>
> >> Thanks,
> >> -vivek
> >
> >



Re: Solr memory requirements?

2009-05-13 Thread vivek sar
Otis,

In that case, I'm not sure why Solr is taking up so much memory as
soon as we start it up. I checked for .tii files and there is only one,

-rw-r--r--  1 search  staff  20306 May 11 21:47 ./20090510_1/data/index/_3au.tii

I have all the caches disabled - so that shouldn't be a problem either. My
ramBuffer size is only 64MB.

I read the note on sorting,
http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see
something related to FieldCache. I don't see this as a parameter defined
in either solrconfig.xml or schema.xml. Could this be something that
loads things into memory at startup? How can we disable it?

I'm trying to find out if there is a way to tell how much memory Solr
would consume and a way to cap it.

Thanks,
-vivek




On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic
 wrote:
>
> Hi,
>
> Sorting is triggered by the sort parameter in the URL, not a characteristic 
> of a field. :)
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> - Original Message 
>> From: vivek sar 
>> To: solr-user@lucene.apache.org
>> Sent: Wednesday, May 13, 2009 4:42:16 PM
>> Subject: Re: Solr memory requirements?
>>
>> Thanks Otis.
>>
>> Our use case doesn't require any sorting or faceting. I'm wondering if
>> I've configured anything wrong.
>>
>> I got total of 25 fields (15 are indexed and stored, other 10 are just
>> stored). All my fields are basic data type - which I thought are not
>> sorted. My id field is unique key.
>>
>> Is there any field here that might be getting sorted?
>>
>> [field definitions stripped by the list archive]
>>
>> Thanks,
>> -vivek
>>
>> On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
>> wrote:
>> >
>> > Hi,
>> > Some answers:
>> > 1) .tii files in the Lucene index.  When you sort, all distinct values for 
>> > the
>> field(s) used for sorting.  Similarly for facet fields.  Solr caches.
>> > 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will 
>> > consume
>> during indexing.  There is no need to commit every 50K docs unless you want 
>> to
>> trigger snapshot creation.
>> > 3) see 1) above
>> >
>> > 1.5 billion docs per instance where each doc is cca 1KB?  I doubt that's 
>> > going
>> to fly. :)
>> >
>> > Otis
>> > --
>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> >
>> >
>> >
>> > - Original Message 
>> >> From: vivek sar
>> >> To: solr-user@lucene.apache.org
>> >> Sent: Wednesday, May 13, 2009 3:04:46 PM
>> >> Subject: Solr memory requirements?
>> >>
>> >> Hi,
>> >>
>> >>   I'm pretty sure this has been asked before, but I couldn't find a
>> >> complete answer in the forum archive. Here are my questions,
>> >>
>> >> 1) When solr starts up what does it loads up in the memory? Let's say
>> >> I've 4 cores with each core 50G in size. When Solr comes up how much
>> >> of it would be loaded in memory?
>> >>
>> >> 2) How much memory is required during index time? If I'm committing
>> >> 50K records at a time (1 record = 1KB) using solrj, how much memory do
>> >> I need to give to Solr.
>> >>
>> >> 3) Is there a minimum memory requirement by Solr to maintain a certain
>> >> size index? Is there any benchmark on this?
>> >>
>> >> Here are some of my configuration from solrconfig.xml,
>> >>
>> >> 1) 64
>> >> 2) All the caches (under query tag) are commented out
>> >> 3) Few others,
>> >>       a)  true    ==>
>> >> would this require memory?
>> >>       b)  50
>> >>       c) 200
>> >>       d)
>> >>       e) false
>> >>       f)  2
>> >>
>> >> The problem we are having is following,
>> >>
>> >> I've given Solr RAM of 6G. As the total index size (all cores
>> >> combined) start growing the Solr memory consumption  goes up. With 800
>> >> million documents, I see Solr already taking up all the memory at
>> >> startup. After that the commits, searches everything become slow. We
>> >> will be having distributed setup with multiple Solr instances (around
>> >> 8) on four boxes, but our requirement is to have each Solr instance at
>> >> least maintain around 1.5 billion documents.
>> >>
>> >> We are trying to see if we can somehow reduce the Solr memory
>> >> footprint. If someone can provide a pointer 

SOLR date boost

2009-05-13 Thread Jack Godwin
With Solr 1.3 I'm having a problem boosting new documents to the top.  I
used the recommended boost function "recip(rord(created_at),1,1000,1000)",
but older documents, sometimes 5 years old, make it into the top 3 documents.
I've started using "ord(created_at)^0.0005" and get better results, but I
don't think I should be... From what I understand rord is descending order
and ord is ascending order, so why does this work?  Does Solr 1.3 still have
issues with date fields?
Thanks,
Jack
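For reference, such a function is normally supplied through the dismax bf
parameter - a sketch, assuming a dismax handler named "dismax" is registered
in solrconfig.xml:

  http://localhost:8983/solr/select?qt=dismax&q=foo&bf=recip(rord(created_at),1,1000,1000)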


Re: Solr memory requirements?

2009-05-13 Thread Grant Ingersoll
Have you done any profiling to see where the hotspots are?  I realize  
that may be difficult on an index of that size, but maybe you can  
approximate on a smaller version.  Also, do you have warming queries?


You might also look into setting the termIndexInterval at the Lucene
level.  This is not currently exposed in Solr (AFAIK), but likely
could be added fairly easily as part of the index parameters.
http://lucene.apache.org/java/2_4_1/api/core/org/apache/lucene/index/IndexWriter.html#setTermIndexInterval(int)


-Grant
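A purely hypothetical sketch of what exposing it could look like in the
solrconfig.xml index section - the termIndexInterval element below does NOT
exist in Solr 1.3/1.4 and is shown only to illustrate the suggestion:

<mainIndex>
  <!-- hypothetical element: would map to IndexWriter.setTermIndexInterval();
       a value above the Lucene default of 128 shrinks the in-memory term index -->
  <termIndexInterval>256</termIndexInterval>
</mainIndex>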

On May 13, 2009, at 5:12 PM, vivek sar wrote:


Otis,

In that case, I'm not sure why Solr is taking up so much memory as
soon as we start it up. I checked for .tii file and there is only one,

-rw-r--r--  1 search  staff  20306 May 11 21:47 ./20090510_1/data/ 
index/_3au.tii


I have all the cache disabled - so that shouldn't be a problem too. My
ramBuffer size is only 64MB.

I read note on sorting,
http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see
something related to FieldCache. I don't see this as parameter defined
in either solrconfig.xml or schema.xml. Could this be something that
can load things in memory at startup? How can we disable it?

I'm trying to find out if there is a way to tell how much memory Solr
would consume and way to cap it.

Thanks,
-vivek




On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic
 wrote:


Hi,

Sorting is triggered by the sort parameter in the URL, not a  
characteristic of a field. :)


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: vivek sar 
To: solr-user@lucene.apache.org
Sent: Wednesday, May 13, 2009 4:42:16 PM
Subject: Re: Solr memory requirements?

Thanks Otis.

Our use case doesn't require any sorting or faceting. I'm  
wondering if

I've configured anything wrong.

I got total of 25 fields (15 are indexed and stored, other 10 are  
just

stored). All my fields are basic data type - which I thought are not
sorted. My id field is unique key.

Is there any field here that might be getting sorted?

[field definitions stripped by the list archive]

Thanks,
-vivek

On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
wrote:


Hi,
Some answers:
1) .tii files in the Lucene index.  When you sort, all distinct  
values for the
field(s) used for sorting.  Similarly for facet fields.  Solr  
caches.
2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr  
will consume
during indexing.  There is no need to commit every 50K docs unless  
you want to

trigger snapshot creation.

3) see 1) above

1.5 billion docs per instance where each doc is cca 1KB?  I doubt  
that's going

to fly. :)


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: vivek sar
To: solr-user@lucene.apache.org
Sent: Wednesday, May 13, 2009 3:04:46 PM
Subject: Solr memory requirements?

Hi,

  I'm pretty sure this has been asked before, but I couldn't  
find a

complete answer in the forum archive. Here are my questions,

1) When solr starts up what does it loads up in the memory?  
Let's say
I've 4 cores with each core 50G in size. When Solr comes up how  
much

of it would be loaded in memory?

2) How much memory is required during index time? If I'm  
committing
50K records at a time (1 record = 1KB) using solrj, how much  
memory do

I need to give to Solr.

3) Is there a minimum memory requirement by Solr to maintain a  
certain

size index? Is there any benchmark on this?

Here are some of my configuration from solrconfig.xml,

1) 64
2) All the caches (under query tag) are commented out
3) Few others,
  a)  true==>
would this require memory?
  b)  50
  c) 200
  d)
  e) false
  f)  2

The problem we are having is following,

I've given Solr RAM of 6G. As the total index size (all cores
combined) start growing the Solr memory consumption  goes up.  
With 800

million documents, I see Solr already taking up all the memory at
startup. After that the commits, searches everything become  
slow. We
will be having distributed setup with multiple Solr instances  
(around
8) on four boxes, but our requirement is to have each Solr  
instance at

least maintain around 1.5 billion documents.

We are trying to see if we can somehow

Re: Solr memory requirements?

2009-05-13 Thread vivek sar
Just an update on the memory issue - might be useful for others. I
read the following,

 http://wiki.apache.org/solr/SolrCaching?highlight=(SolrCaching)

and it looks like the firstSearcher and newSearcher listeners would populate
the FieldCache. Commenting out these two listener entries seems to do the
trick - at least the heap size is not growing as soon as Solr starts
up.

I ran some searches and they all came out fine. The indexing rate is also
pretty good. Would there be any impact of disabling these listeners?

Thanks,
-vivek
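For anyone following along, the two entries in question are the
QuerySenderListener registrations; the stock example solrconfig.xml has
roughly this (warming queries illustrative):

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">solr</str><str name="start">0</str><str name="rows">10</str></lst>
  </arr>
</listener>
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">fast_warm</str><str name="start">0</str><str name="rows">10</str></lst>
  </arr>
</listener>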

On Wed, May 13, 2009 at 2:12 PM, vivek sar  wrote:
> Otis,
>
> In that case, I'm not sure why Solr is taking up so much memory as
> soon as we start it up. I checked for .tii file and there is only one,
>
> -rw-r--r--  1 search  staff  20306 May 11 21:47 
> ./20090510_1/data/index/_3au.tii
>
> I have all the cache disabled - so that shouldn't be a problem too. My
> ramBuffer size is only 64MB.
>
> I read note on sorting,
> http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see
> something related to FieldCache. I don't see this as parameter defined
> in either solrconfig.xml or schema.xml. Could this be something that
> can load things in memory at startup? How can we disable it?
>
> I'm trying to find out if there is a way to tell how much memory Solr
> would consume and way to cap it.
>
> Thanks,
> -vivek
>
>
>
>
> On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic
>  wrote:
>>
>> Hi,
>>
>> Sorting is triggered by the sort parameter in the URL, not a characteristic 
>> of a field. :)
>>
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>>
>>
>> - Original Message 
>>> From: vivek sar 
>>> To: solr-user@lucene.apache.org
>>> Sent: Wednesday, May 13, 2009 4:42:16 PM
>>> Subject: Re: Solr memory requirements?
>>>
>>> Thanks Otis.
>>>
>>> Our use case doesn't require any sorting or faceting. I'm wondering if
>>> I've configured anything wrong.
>>>
>>> I got total of 25 fields (15 are indexed and stored, other 10 are just
>>> stored). All my fields are basic data type - which I thought are not
>>> sorted. My id field is unique key.
>>>
>>> Is there any field here that might be getting sorted?
>>>
>>> [field definitions stripped by the list archive]
>>>
>>> Thanks,
>>> -vivek
>>>
>>> On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
>>> wrote:
>>> >
>>> > Hi,
>>> > Some answers:
>>> > 1) .tii files in the Lucene index.  When you sort, all distinct values 
>>> > for the
>>> field(s) used for sorting.  Similarly for facet fields.  Solr caches.
>>> > 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will 
>>> > consume
>>> during indexing.  There is no need to commit every 50K docs unless you want 
>>> to
>>> trigger snapshot creation.
>>> > 3) see 1) above
>>> >
>>> > 1.5 billion docs per instance where each doc is cca 1KB?  I doubt that's 
>>> > going
>>> to fly. :)
>>> >
>>> > Otis
>>> > --
>>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>> >
>>> >
>>> >
>>> > - Original Message 
>>> >> From: vivek sar
>>> >> To: solr-user@lucene.apache.org
>>> >> Sent: Wednesday, May 13, 2009 3:04:46 PM
>>> >> Subject: Solr memory requirements?
>>> >>
>>> >> Hi,
>>> >>
>>> >>   I'm pretty sure this has been asked before, but I couldn't find a
>>> >> complete answer in the forum archive. Here are my questions,
>>> >>
>>> >> 1) When solr starts up what does it loads up in the memory? Let's say
>>> >> I've 4 cores with each core 50G in size. When Solr comes up how much
>>> >> of it would be loaded in memory?
>>> >>
>>> >> 2) How much memory is required during index time? If I'm committing
>>> >> 50K records at a time (1 record = 1KB) using solrj, how much memory do
>>> >> I need to give to Solr.
>>> >>
>>> >> 3) Is there a minimum memory requirement by Solr to maintain a certain
>>> >> size index? Is there any benchmark on this?
>>> >>
>>> >> Here are some of my configuration from solrconfig.xml,
>>> >>
>>> >> 1) 64
>>> >> 2) All the caches (under query tag) are commented out
>>> >> 3) Few others,
>>> >>       a)  true    ==>
>>> >> would this require memory?
>>> >>       b)

Re: Solr memory requirements?

2009-05-13 Thread vivek sar
Disabling the first/new searchers did help with the initial load time, but
after 10-15 min the heap memory starts climbing again and reaches the
max within 20 min. Now the GC is running all the time, which is
slowing down the commit and search cycles.

It's still puzzling: what does Solr hold in memory and not release?

I haven't been able to profile as the dump is too big. Would setting
termIndexInterval help? Not sure how that can be set using Solr.

Some other query properties under solrconfig,


   1024
   true
   50
   200

   false
   2
 

Currently, I got 800 million documents and have specified 8G heap size.

Any other suggestion on what can I do to control the Solr memory consumption?

Thanks,
-vivek

On Wed, May 13, 2009 at 2:53 PM, vivek sar  wrote:
> Just an update on the memory issue - might be useful for others. I
> read the following,
>
>  http://wiki.apache.org/solr/SolrCaching?highlight=(SolrCaching)
>
> and looks like the first and new searcher listeners would populate the
> FieldCache. Commenting out these two listener entries seems to do the
> trick - at least the heap size is not growing as soon as Solr starts
> up.
>
> I ran some searches and they all came out fine. Index rate is also
> pretty good. Would there be any impact of disabling these listeners?
>
> Thanks,
> -vivek
>
> On Wed, May 13, 2009 at 2:12 PM, vivek sar  wrote:
>> Otis,
>>
>> In that case, I'm not sure why Solr is taking up so much memory as
>> soon as we start it up. I checked for .tii file and there is only one,
>>
>> -rw-r--r--  1 search  staff  20306 May 11 21:47 
>> ./20090510_1/data/index/_3au.tii
>>
>> I have all the cache disabled - so that shouldn't be a problem too. My
>> ramBuffer size is only 64MB.
>>
>> I read note on sorting,
>> http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see
>> something related to FieldCache. I don't see this as parameter defined
>> in either solrconfig.xml or schema.xml. Could this be something that
>> can load things in memory at startup? How can we disable it?
>>
>> I'm trying to find out if there is a way to tell how much memory Solr
>> would consume and way to cap it.
>>
>> Thanks,
>> -vivek
>>
>>
>>
>>
>> On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic
>>  wrote:
>>>
>>> Hi,
>>>
>>> Sorting is triggered by the sort parameter in the URL, not a characteristic 
>>> of a field. :)
>>>
>>> Otis
>>> --
>>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>>
>>>
>>>
>>> - Original Message 
 From: vivek sar 
 To: solr-user@lucene.apache.org
 Sent: Wednesday, May 13, 2009 4:42:16 PM
 Subject: Re: Solr memory requirements?

 Thanks Otis.

 Our use case doesn't require any sorting or faceting. I'm wondering if
 I've configured anything wrong.

 I got total of 25 fields (15 are indexed and stored, other 10 are just
 stored). All my fields are basic data type - which I thought are not
 sorted. My id field is unique key.

 Is there any field here that might be getting sorted?

 [field definitions stripped by the list archive]

 Thanks,
 -vivek

 On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
 wrote:
 >
 > Hi,
 > Some answers:
 > 1) .tii files in the Lucene index.  When you sort, all distinct values 
 > for the
 field(s) used for sorting.  Similarly for facet fields.  Solr caches.
 > 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will 
 > consume
 during indexing.  There is no need to commit every 50K docs unless you 
 want to
 trigger snapshot creation.
 > 3) see 1) above
 >
 > 1.5 billion docs per instance where each doc is cca 1KB?  I doubt that's 
 > going
 to fly. :)
 >
 > Otis
 > --
 > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 >
 >
 >
 > - Original Message 
 >> From: vivek sar
 >> To: solr-user@lucene.apache.org
 >> Sent: Wednesday, May 13, 2009 3:04:46 PM
 >> Subject: Solr memory requireme

Re: Solr memory requirements?

2009-05-13 Thread Jack Godwin
Have you checked the maxBufferedDocs?  I had to drop mine down to 1000 with
3 million docs.
Jack

On Wed, May 13, 2009 at 6:53 PM, vivek sar  wrote:

> Disabling first/new searchers did help for the initial load time, but
> after 10-15 min the heap memory start climbing up again and reached
> max within 20 min. Now the GC is coming up all the time, which is
> slowing down the commit and search cycles.
>
> This is still puzzling what does Solr holds in the memory and doesn't
> release?
>
> I haven't been able to profile as the dump is too big. Would setting
> termIndexInterval help - not sure how can that be set using Solr.
>
> Some other query properties under solrconfig,
>
> 
>   1024
>   true
>   50
>   200
>
>   false
>   2
>  
>
> Currently, I got 800 million documents and have specified 8G heap size.
>
> Any other suggestion on what can I do to control the Solr memory
> consumption?
>
> Thanks,
> -vivek
>
> On Wed, May 13, 2009 at 2:53 PM, vivek sar  wrote:
> > Just an update on the memory issue - might be useful for others. I
> > read the following,
> >
> >  http://wiki.apache.org/solr/SolrCaching?highlight=(SolrCaching)
> >
> > and looks like the first and new searcher listeners would populate the
> > FieldCache. Commenting out these two listener entries seems to do the
> > trick - at least the heap size is not growing as soon as Solr starts
> > up.
> >
> > I ran some searches and they all came out fine. Index rate is also
> > pretty good. Would there be any impact of disabling these listeners?
> >
> > Thanks,
> > -vivek
> >
> > On Wed, May 13, 2009 at 2:12 PM, vivek sar  wrote:
> >> Otis,
> >>
> >> In that case, I'm not sure why Solr is taking up so much memory as
> >> soon as we start it up. I checked for .tii file and there is only one,
> >>
> >> -rw-r--r--  1 search  staff  20306 May 11 21:47
> ./20090510_1/data/index/_3au.tii
> >>
> >> I have all the cache disabled - so that shouldn't be a problem too. My
> >> ramBuffer size is only 64MB.
> >>
> >> I read note on sorting,
> >> http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see
> >> something related to FieldCache. I don't see this as parameter defined
> >> in either solrconfig.xml or schema.xml. Could this be something that
> >> can load things in memory at startup? How can we disable it?
> >>
> >> I'm trying to find out if there is a way to tell how much memory Solr
> >> would consume and way to cap it.
> >>
> >> Thanks,
> >> -vivek
> >>
> >>
> >>
> >>
> >> On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic
> >>  wrote:
> >>>
> >>> Hi,
> >>>
> >>> Sorting is triggered by the sort parameter in the URL, not a
> characteristic of a field. :)
> >>>
> >>> Otis
> >>> --
> >>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >>>
> >>>
> >>>
> >>> - Original Message 
>  From: vivek sar 
>  To: solr-user@lucene.apache.org
>  Sent: Wednesday, May 13, 2009 4:42:16 PM
>  Subject: Re: Solr memory requirements?
> 
>  Thanks Otis.
> 
>  Our use case doesn't require any sorting or faceting. I'm wondering if
>  I've configured anything wrong.
> 
>  I got total of 25 fields (15 are indexed and stored, other 10 are just
>  stored). All my fields are basic data type - which I thought are not
>  sorted. My id field is unique key.
> 
>  Is there any field here that might be getting sorted?
> 
>  [field definitions stripped by the list archive]
> 
>  Thanks,
>  -vivek
> 
>  On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
>  wrote:
>  >
>  > Hi,
>  > Some answers:
>  > 1) .tii files in the Lucene index.  When you sort, all distinct
> values for the
>  field(s) used for sorting.  Similarly for facet fields.  Solr caches.
>  > 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will
> consume
>  during indexing.  There is no need to commit every 50K docs unless you
> want to
>  trigger snapshot creat

acts_as_solr patch support for Solr Cell style requests

2009-05-13 Thread Thanh Doan
Hi Erik et al.,

I am following this tutorial link
http://www.lucidimagination.com/blog/tag/acts_as_solr/

to play with acts_as_solr and see if we can invoke Solr Cell right
from our Rails app.

Following the tutorial I created the class SolrCellRequest but don't
know where to save the solr_cell_request.rb file.

Should I save solr_cell_request.rb to the
/path/to/resume/vendor/plugins/acts_as_solr/lib directory,
or do I have to save it to the
/path/to/resume/vendor/plugins/acts_as_solr/lib/solr/request directory
where the Solr::Request::Select class is located?

Thanks!

Thanh Doan


Re: Solr memory requirements?

2009-05-13 Thread vivek sar
I think maxBufferedDocs has been deprecated in Solr 1.4 - it's
recommended to use ramBufferSizeMB instead. My ramBufferSizeMB=64.
This shouldn't be a problem, I think.

There has to be something else that Solr is holding on to in memory. Anyone else?

Thanks,
-vivek
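A sketch of the relevant indexing section, assuming the stock indexDefaults
layout from the example solrconfig.xml:

<indexDefaults>
  <!-- ramBufferSizeMB supersedes the deprecated maxBufferedDocs -->
  <ramBufferSizeMB>64</ramBufferSizeMB>
  <mergeFactor>10</mergeFactor>
</indexDefaults>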

On Wed, May 13, 2009 at 4:01 PM, Jack Godwin  wrote:
> Have you checked the maxBufferedDocs?  I had to drop mine down to 1000 with
> 3 million docs.
> Jack
>
> On Wed, May 13, 2009 at 6:53 PM, vivek sar  wrote:
>
>> Disabling first/new searchers did help for the initial load time, but
>> after 10-15 min the heap memory start climbing up again and reached
>> max within 20 min. Now the GC is coming up all the time, which is
>> slowing down the commit and search cycles.
>>
>> This is still puzzling what does Solr holds in the memory and doesn't
>> release?
>>
>> I haven't been able to profile as the dump is too big. Would setting
>> termIndexInterval help - not sure how can that be set using Solr.
>>
>> Some other query properties under solrconfig,
>>
>> 
>>   1024
>>   true
>>   50
>>   200
>>    
>>   false
>>   2
>>  
>>
>> Currently, I got 800 million documents and have specified 8G heap size.
>>
>> Any other suggestion on what can I do to control the Solr memory
>> consumption?
>>
>> Thanks,
>> -vivek
>>
>> On Wed, May 13, 2009 at 2:53 PM, vivek sar  wrote:
>> > Just an update on the memory issue - might be useful for others. I
>> > read the following,
>> >
>> >  http://wiki.apache.org/solr/SolrCaching?highlight=(SolrCaching)
>> >
>> > and looks like the first and new searcher listeners would populate the
>> > FieldCache. Commenting out these two listener entries seems to do the
>> > trick - at least the heap size is not growing as soon as Solr starts
>> > up.
>> >
>> > I ran some searches and they all came out fine. Index rate is also
>> > pretty good. Would there be any impact of disabling these listeners?
>> >
>> > Thanks,
>> > -vivek
>> >
>> > On Wed, May 13, 2009 at 2:12 PM, vivek sar  wrote:
>> >> Otis,
>> >>
>> >> In that case, I'm not sure why Solr is taking up so much memory as
>> >> soon as we start it up. I checked for .tii file and there is only one,
>> >>
>> >> -rw-r--r--  1 search  staff  20306 May 11 21:47
>> ./20090510_1/data/index/_3au.tii
>> >>
>> >> I have all the cache disabled - so that shouldn't be a problem too. My
>> >> ramBuffer size is only 64MB.
>> >>
>> >> I read note on sorting,
>> >> http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see
>> >> something related to FieldCache. I don't see this as parameter defined
>> >> in either solrconfig.xml or schema.xml. Could this be something that
>> >> can load things in memory at startup? How can we disable it?
>> >>
>> >> I'm trying to find out if there is a way to tell how much memory Solr
>> >> would consume and way to cap it.
>> >>
>> >> Thanks,
>> >> -vivek
>> >>
>> >>
>> >>
>> >>
>> >> On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic
>> >>  wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> Sorting is triggered by the sort parameter in the URL, not a
>> characteristic of a field. :)
>> >>>
>> >>> Otis
>> >>> --
>> >>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> >>>
>> >>>
>> >>>
>> >>> - Original Message 
>>  From: vivek sar 
>>  To: solr-user@lucene.apache.org
>>  Sent: Wednesday, May 13, 2009 4:42:16 PM
>>  Subject: Re: Solr memory requirements?
>> 
>>  Thanks Otis.
>> 
>>  Our use case doesn't require any sorting or faceting. I'm wondering if
>>  I've configured anything wrong.
>> 
>>  I got total of 25 fields (15 are indexed and stored, other 10 are just
>>  stored). All my fields are basic data type - which I thought are not
>>  sorted. My id field is unique key.
>> 
>>  Is there any field here that might be getting sorted?
>> 
>>  [field definitions stripped by the list archive]
>> 
>>  Thanks,
>>  -vivek
>> 
>> 

Re: Solr memory requirements?

2009-05-13 Thread Erick Erickson
Warning: I'm way out of my competency range when I comment
on SOLR, but I've seen the statement that string fields are NOT
tokenized while text fields are, and I notice that almost all of your fields
are string type.

Would someone more knowledgeable than me care to comment on whether
this is at all relevant? Offered in the spirit that sometimes there are
things so basic that only an amateur can see them 

Best
Erick

On Wed, May 13, 2009 at 4:42 PM, vivek sar  wrote:

> Thanks Otis.
>
> Our use case doesn't require any sorting or faceting. I'm wondering if
> I've configured anything wrong.
>
> I got total of 25 fields (15 are indexed and stored, other 10 are just
> stored). All my fields are basic data type - which I thought are not
> sorted. My id field is unique key.
>
> Is there any field here that might be getting sorted?
>
> [field definitions stripped by the list archive]
>
> Thanks,
> -vivek
>
> On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
>  wrote:
> >
> > Hi,
> > Some answers:
> > 1) .tii files in the Lucene index.  When you sort, all distinct values
> for the field(s) used for sorting.  Similarly for facet fields.  Solr
> caches.
> > 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will
> consume during indexing.  There is no need to commit every 50K docs unless
> you want to trigger snapshot creation.
> > 3) see 1) above
> >
> > 1.5 billion docs per instance where each doc is cca 1KB?  I doubt that's
> going to fly. :)
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> >
> >
> > - Original Message 
> >> From: vivek sar 
> >> To: solr-user@lucene.apache.org
> >> Sent: Wednesday, May 13, 2009 3:04:46 PM
> >> Subject: Solr memory requirements?
> >>
> >> Hi,
> >>
> >>   I'm pretty sure this has been asked before, but I couldn't find a
> >> complete answer in the forum archive. Here are my questions,
> >>
> >> 1) When solr starts up what does it loads up in the memory? Let's say
> >> I've 4 cores with each core 50G in size. When Solr comes up how much
> >> of it would be loaded in memory?
> >>
> >> 2) How much memory is required during index time? If I'm committing
> >> 50K records at a time (1 record = 1KB) using solrj, how much memory do
> >> I need to give to Solr.
> >>
> >> 3) Is there a minimum memory requirement by Solr to maintain a certain
> >> size index? Is there any benchmark on this?
> >>
> >> Here are some of my configuration from solrconfig.xml,
> >>
> >> 1) 64
> >> 2) All the caches (under query tag) are commented out
> >> 3) Few others,
> >>   a)  true==>
> >> would this require memory?
> >>   b)  50
> >>   c) 200
> >>   d)
> >>   e) false
> >>   f)  2
> >>
> >> The problem we are having is following,
> >>
> >> I've given Solr RAM of 6G. As the total index size (all cores
> >> combined) start growing the Solr memory consumption  goes up. With 800
> >> million documents, I see Solr already taking up all the memory at
> >> startup. After that the commits, searches everything become slow. We
> >> will be having distributed setup with multiple Solr instances (around
> >> 8) on four boxes, but our requirement is to have each Solr instance at
> >> least maintain around 1.5 billion documents.
> >>
> >> We are trying to see if we can somehow reduce the Solr memory
> >> footprint. If someone can provide a pointer on what parameters affect
> >> memory and what effects it has we can then decide whether we want that
> >> parameter or not. I'm not sure if there is any minimum Solr
> >> requirement for it to be able maintain large indexes. I've used Lucene
> >> before and that didn't require anything by default - it used up memory
> >> only during index and search times - not otherwise.
> >>
> >> Any help is very much appreciated.
> >>
> >> Thanks,
> >> -vivek
> >
> >
>


Re: acts_as_solr patch support for Solr Cell style requests

2009-05-13 Thread Thanh Doan
I created the Ruby class SolrCellRequest and saved it to the
/path/to/resume/vendor/plugins/acts_as_solr/lib directory.

Here is the original code from the tutorial.

module ActsAsSolr
  class SolrCellRequest < Solr::Request::Select
    def initialize(doc, file_name)
      # ... (body elided in the original message)
    end

    def handler
      'update/extract'
    end
  end

  class SolrCellResponse < Solr::Response::Ruby
  end
end

However, when I start using it:

$ script/console
Loading development environment (Rails 2.2.2)
>> solr = Solr::Connection.new("http://localhost:8982/solr")
>> req = SolrCellRequest.new(Solr::Document.new(:id=>1), '/path/to/resume.pdf')

I get this error:

> req = SolrCellRequest.new(Solr::Document.new(:id=>1), 
> '/Users/tcdoan/eric.pdf')
LoadError: Expected
/Users/tcdoan/resume/vendor/plugins/acts_as_solr/lib/solr_cell_request.rb
to define SolrCellRequest
from 
/Library/Ruby/Gems/1.8/gems/activesupport-2.3.2/lib/active_support/dependencies.rb:426:in
`load_missing_constant'
from 
/Library/Ruby/Gems/1.8/gems/activesupport-2.3.2/lib/active_support/dependencies.rb:80:in
`const_missing'
from 
/Library/Ruby/Gems/1.8/gems/activesupport-2.3.2/lib/active_support/dependencies.rb:92:in
`const_missing'
from (irb):2

Can you tell what went wrong here? Thanks.

Thanh


On Wed, May 13, 2009 at 6:11 PM, Thanh Doan  wrote:
> Hi Erik et all,
>
> I am following this  tutorial link
> http://www.lucidimagination.com/blog/tag/acts_as_solr/
>
> to play with acts_as_solr and see if we can invoke solr cell right
> from our Rails app.
>
> Following the tutorial I created the class SolrCellRequest but don't
> know where to save the solr_cell_request.rb file.
>
> Should I save file solr_cell_request.rb to
> /path/to/resume/vendor/plugins/acts_as_solr/lib  directory
> or
> I have to save it to
> /path/to/resume/vendor/plugins/acts_as_solr/lib/solr/request directory
> where the Solr::Request::Select class locate?
>
> Thanks!
>
> Thanh Doan
>



-- 
Regards,
Thanh Doan
713-884-0576
http://datamatter.blogspot.com/


Java Environment Problem on Vista

2009-05-13 Thread John Bennett
I'm having difficulty getting Solr running on Vista. I've got the 1.6
JDK installed, and I've successfully compiled and run other Java
programs.


When I run java -jar start.jar in the Apache Solr example directory, I 
get a large number of INFO messages, including:


INFO: JNDI not configured for solr (NoInitialContextEx)

When I visit localhost:8983/solr/, I get a 404 error message:


   HTTP ERROR: 404

NOT_FOUND

RequestURI=/solr/

Powered by jetty://

I've talked to a couple of engineers who suspect that the problem is 
with my Java environment. My environment is configured as follows:


CLASSPATH=.;C:\Program 
Files\Java\jdk1.6.0_13\lib\ext\QTJava.zip;C:\Users\John\Documents\Java;C:\Program 
Files\Java\jdk1.6.0_13;

JAVA_HOME=C:\Program_Files\Java\jdk1.6.0_13
Path=C:\Program Files\Snap\scripts;C:\Program 
Files\Snap;C:\Python25\Scripts;%SystemRoot%\system32;%SystemRoot%;%SystemRoot%\System32\Wbem;c:\Program 
Files\Microsoft SQL Server\90\Tools\binn\;C:\Program Files\Common 
Files\Roxio Shared\DLLShared\;C:\Program Files\Common Files\Roxio 
Shared\9.0\DLLShared\;C:\Program Files\QuickTime\QTSystem\;C:\Program 
Files\Java\jdk1.6.0_13\bin


Any ideas?

Regards,

John



Re: Solr memory requirements?

2009-05-13 Thread Otis Gospodnetic

Even a simple command like this will help:

  jmap -histo:live <pid> | head -30

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: vivek sar 
> To: solr-user@lucene.apache.org
> Sent: Wednesday, May 13, 2009 6:53:29 PM
> Subject: Re: Solr memory requirements?
> 
> Disabling first/new searchers did help for the initial load time, but
> after 10-15 min the heap memory start climbing up again and reached
> max within 20 min. Now the GC is coming up all the time, which is
> slowing down the commit and search cycles.
> 
> This is still puzzling what does Solr holds in the memory and doesn't release?
> 
> I haven't been able to profile as the dump is too big. Would setting
> termIndexInterval help - not sure how can that be set using Solr.
> 
> Some other query properties under solrconfig,
> 
> 
>   1024
>   true
>   50
>   200
> 
>   false
>   2
> 
> 
> Currently, I got 800 million documents and have specified 8G heap size.
> 
> Any other suggestion on what can I do to control the Solr memory consumption?
> 
> Thanks,
> -vivek
> 
> On Wed, May 13, 2009 at 2:53 PM, vivek sar wrote:
> > Just an update on the memory issue - might be useful for others. I
> > read the following,
> >
> >  http://wiki.apache.org/solr/SolrCaching?highlight=(SolrCaching)
> >
> > and looks like the first and new searcher listeners would populate the
> > FieldCache. Commenting out these two listener entries seems to do the
> > trick - at least the heap size is not growing as soon as Solr starts
> > up.
> >
> > I ran some searches and they all came out fine. Index rate is also
> > pretty good. Would there be any impact of disabling these listeners?
> >
> > Thanks,
> > -vivek
> >
> > On Wed, May 13, 2009 at 2:12 PM, vivek sar wrote:
> >> Otis,
> >>
> >> In that case, I'm not sure why Solr is taking up so much memory as
> >> soon as we start it up. I checked for .tii file and there is only one,
> >>
> >> -rw-r--r--  1 search  staff  20306 May 11 21:47 
> ./20090510_1/data/index/_3au.tii
> >>
> >> I have all the cache disabled - so that shouldn't be a problem too. My
> >> ramBuffer size is only 64MB.
> >>
> >> I read note on sorting,
> >> http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see
> >> something related to FieldCache. I don't see this as parameter defined
> >> in either solrconfig.xml or schema.xml. Could this be something that
> >> can load things in memory at startup? How can we disable it?
> >>
> >> I'm trying to find out if there is a way to tell how much memory Solr
> >> would consume and way to cap it.
> >>
> >> Thanks,
> >> -vivek
> >>
> >>
> >>
> >>
> >> On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic
> >> wrote:
> >>>
> >>> Hi,
> >>>
> >>> Sorting is triggered by the sort parameter in the URL, not a 
> >>> characteristic 
> of a field. :)
> >>>
> >>> Otis
> >>> --
> >>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >>>
> >>>
> >>>
> >>> - Original Message 
>  From: vivek sar 
>  To: solr-user@lucene.apache.org
>  Sent: Wednesday, May 13, 2009 4:42:16 PM
>  Subject: Re: Solr memory requirements?
> 
>  Thanks Otis.
> 
>  Our use case doesn't require any sorting or faceting. I'm wondering if
>  I've configured anything wrong.
> 
>  I got total of 25 fields (15 are indexed and stored, other 10 are just
>  stored). All my fields are basic data type - which I thought are not
>  sorted. My id field is unique key.
> 
>  Is there any field here that might be getting sorted?
> 
>  [field definitions stripped by the list archive]
> 
>  Thanks,
>  -vivek
> 
>  On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
>  wrote:
>  >
>  > Hi,
>  > Some answers:
>  > 1) .tii files in the Lucene index.  When you sort, all distinct values 
> for the
>  field(s) used for sorting.  Similarly for facet fields.  Solr caches.
>  > 2) ramBufferSizeMB di

Re: Solr memory requirements?

2009-05-13 Thread Otis Gospodnetic

Yeah, I'm not sure why this would help.  There should be nothing in FieldCaches 
unless you sort or use facets.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: vivek sar 
> To: solr-user@lucene.apache.org
> Sent: Wednesday, May 13, 2009 5:53:45 PM
> Subject: Re: Solr memory requirements?
> 
> Just an update on the memory issue - might be useful for others. I
> read the following,
> 
> http://wiki.apache.org/solr/SolrCaching?highlight=(SolrCaching)
> 
> and looks like the first and new searcher listeners would populate the
> FieldCache. Commenting out these two listener entries seems to do the
> trick - at least the heap size is not growing as soon as Solr starts
> up.
> 
> I ran some searches and they all came out fine. Index rate is also
> pretty good. Would there be any impact of disabling these listeners?
> 
> Thanks,
> -vivek
> 
> On Wed, May 13, 2009 at 2:12 PM, vivek sar wrote:
> > Otis,
> >
> > In that case, I'm not sure why Solr is taking up so much memory as
> > soon as we start it up. I checked for .tii file and there is only one,
> >
> > -rw-r--r--  1 search  staff  20306 May 11 21:47 
> ./20090510_1/data/index/_3au.tii
> >
> > I have all the cache disabled - so that shouldn't be a problem too. My
> > ramBuffer size is only 64MB.
> >
> > I read note on sorting,
> > http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see
> > something related to FieldCache. I don't see this as parameter defined
> > in either solrconfig.xml or schema.xml. Could this be something that
> > can load things in memory at startup? How can we disable it?
> >
> > I'm trying to find out if there is a way to tell how much memory Solr
> > would consume and way to cap it.
> >
> > Thanks,
> > -vivek
> >
> >
> >
> >
> > On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic
> > wrote:
> >>
> >> Hi,
> >>
> >> Sorting is triggered by the sort parameter in the URL, not a 
> >> characteristic 
> of a field. :)
> >>
> >> Otis
> >> --
> >> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >>
> >>
> >>
> >> - Original Message 
> >>> From: vivek sar 
> >>> To: solr-user@lucene.apache.org
> >>> Sent: Wednesday, May 13, 2009 4:42:16 PM
> >>> Subject: Re: Solr memory requirements?
> >>>
> >>> Thanks Otis.
> >>>
> >>> Our use case doesn't require any sorting or faceting. I'm wondering if
> >>> I've configured anything wrong.
> >>>
> >>> I got total of 25 fields (15 are indexed and stored, other 10 are just
> >>> stored). All my fields are basic data type - which I thought are not
> >>> sorted. My id field is unique key.
> >>>
> >>> Is there any field here that might be getting sorted?
> >>>
> >>>
> >>> required="true" omitNorms="true" compressed="false"/>
> >>>
> >>>
> >>> compressed="false"/>
> >>>
> >>> omitNorms="true" compressed="false"/>
> >>>
> >>> omitNorms="true" compressed="false"/>
> >>>
> >>> omitNorms="true" compressed="false"/>
> >>>
> >>> default="NOW/HOUR"  compressed="false"/>
> >>>
> >>> omitNorms="true" compressed="false"/>
> >>>
> >>> omitNorms="true" compressed="false"/>
> >>>
> >>> compressed="false"/>
> >>>
> >>> compressed="false"/>
> >>>
> >>> omitNorms="true" compressed="false"/>
> >>>
> >>> omitNorms="true" compressed="false"/>
> >>>
> >>> omitNorms="true" compressed="false"/>
> >>>
> >>> omitNorms="true" compressed="false"/>
> >>>
> >>> omitNorms="true" compressed="false"/>
> >>>
> >>> compressed="false"/>
> >>>
> >>> compressed="false"/>
> >>>
> >>> compressed="false"/>
> >>>
> >>> omitNorms="true" compressed="false"/>
> >>>
> >>> compressed="false"/>
> >>>
> >>> default="NOW/HOUR" omitNorms="true"/>
> >>>
> >>>
> >>>
> >>>
> >>> omitNorms="true" multiValued="true"/>
> >>>
> >>> Thanks,
> >>> -vivek
> >>>
> >>> On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
> >>> wrote:
> >>> >
> >>> > Hi,
> >>> > Some answers:
> >>> > 1) .tii files in the Lucene index.  When you sort, all distinct values 
> >>> > for 
> the
> >>> field(s) used for sorting.  Similarly for facet fields.  Solr caches.
> >>> > 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will 
> consume
> >>> during indexing.  There is no need to commit every 50K docs unless you 
> >>> want 
> to
> >>> trigger snapshot creation.
> >>> > 3) see 1) above
> >>> >
> >>> > 1.5 billion docs per instance where each doc is cca 1KB?  I doubt 
> >>> > that's 
> going
> >>> to fly. :)
> >>> >
> >>> > Otis
> >>> > --
> >>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >>> >
> >>> >
> >>> >
> >>> > - Original Message 
> >>> >> From: vivek sar
> >>> >> To: solr-user@lucene.apache.org
> >>> >> Sent: Wednesday, May 13, 2009 3:04:46 PM
> >>> >> Subject: Solr memory requirements?
> >>> >>
> >>> >> Hi,
> >>> >>
> >>> >>   I'm pretty sure this has been asked before, but I couldn't find a
> >>> >> complete answer in the forum archive. Here are my questions,
> >>> >>
> >>> >> 1) When solr starts up what does it loads up in the memory? Let's say
> >>> >> I've 4
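
For reference, the listener entries vivek describes commenting out look
roughly like this in the stock example solrconfig.xml (the warming queries
shown are the example defaults, not his actual ones). QuerySenderListener
replays these queries against each new searcher; they only populate
Lucene's FieldCache if they sort or facet:

  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst> <str name="q">solr</str> <str name="start">0</str> <str name="rows">10</str> </lst>
      <lst> <str name="q">rocks</str> <str name="start">0</str> <str name="rows">10</str> </lst>
    </arr>
  </listener>
  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst> <str name="q">fast_warm</str> <str name="start">0</str> <str name="rows">10</str> </lst>
    </arr>
  </listener>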

Re: Solr memory requirements?

2009-05-13 Thread Otis Gospodnetic

There is constant mixing of indexing concepts and searching concepts in this 
thread.  Are you having problems on the master (indexing) or on the slave 
(searching)?


That .tii is only 20K and you said this is a large index?  That doesn't smell 
right...

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: vivek sar 
> To: solr-user@lucene.apache.org
> Sent: Wednesday, May 13, 2009 5:12:00 PM
> Subject: Re: Solr memory requirements?
> 
> Otis,
> 
> In that case, I'm not sure why Solr is taking up so much memory as
> soon as we start it up. I checked for .tii file and there is only one,
> 
> -rw-r--r--  1 search  staff  20306 May 11 21:47 
> ./20090510_1/data/index/_3au.tii
> 
> I have all the cache disabled - so that shouldn't be a problem too. My
> ramBuffer size is only 64MB.
> 
> I read note on sorting,
> http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see
> something related to FieldCache. I don't see this as parameter defined
> in either solrconfig.xml or schema.xml. Could this be something that
> can load things in memory at startup? How can we disable it?
> 
> I'm trying to find out if there is a way to tell how much memory Solr
> would consume and way to cap it.
> 
> Thanks,
> -vivek
> 
> 
> 
> 
> On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic
> wrote:
> >
> > Hi,
> >
> > Sorting is triggered by the sort parameter in the URL, not a characteristic 
> > of 
> a field. :)
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> >
> >
> > - Original Message 
> >> From: vivek sar 
> >> To: solr-user@lucene.apache.org
> >> Sent: Wednesday, May 13, 2009 4:42:16 PM
> >> Subject: Re: Solr memory requirements?
> >>
> >> Thanks Otis.
> >>
> >> Our use case doesn't require any sorting or faceting. I'm wondering if
> >> I've configured anything wrong.
> >>
> >> I got total of 25 fields (15 are indexed and stored, other 10 are just
> >> stored). All my fields are basic data type - which I thought are not
> >> sorted. My id field is unique key.
> >>
> >> Is there any field here that might be getting sorted?
> >>
> >>
> >> required="true" omitNorms="true" compressed="false"/>
> >>
> >>
> >> compressed="false"/>
> >>
> >> omitNorms="true" compressed="false"/>
> >>
> >> omitNorms="true" compressed="false"/>
> >>
> >> omitNorms="true" compressed="false"/>
> >>
> >> default="NOW/HOUR"  compressed="false"/>
> >>
> >> omitNorms="true" compressed="false"/>
> >>
> >> omitNorms="true" compressed="false"/>
> >>
> >> compressed="false"/>
> >>
> >> compressed="false"/>
> >>
> >> omitNorms="true" compressed="false"/>
> >>
> >> omitNorms="true" compressed="false"/>
> >>
> >> omitNorms="true" compressed="false"/>
> >>
> >> omitNorms="true" compressed="false"/>
> >>
> >> omitNorms="true" compressed="false"/>
> >>
> >> compressed="false"/>
> >>
> >> compressed="false"/>
> >>
> >> compressed="false"/>
> >>
> >> omitNorms="true" compressed="false"/>
> >>
> >> compressed="false"/>
> >>
> >> default="NOW/HOUR" omitNorms="true"/>
> >>
> >>
> >>
> >>
> >> omitNorms="true" multiValued="true"/>
> >>
> >> Thanks,
> >> -vivek
> >>
> >> On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
> >> wrote:
> >> >
> >> > Hi,
> >> > Some answers:
> >> > 1) .tii files in the Lucene index.  When you sort, all distinct values 
> >> > for 
> the
> >> field(s) used for sorting.  Similarly for facet fields.  Solr caches.
> >> > 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will 
> consume
> >> during indexing.  There is no need to commit every 50K docs unless you 
> >> want 
> to
> >> trigger snapshot creation.
> >> > 3) see 1) above
> >> >
> >> > 1.5 billion docs per instance where each doc is cca 1KB?  I doubt that's 
> going
> >> to fly. :)
> >> >
> >> > Otis
> >> > --
> >> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >> >
> >> >
> >> >
> >> > - Original Message 
> >> >> From: vivek sar
> >> >> To: solr-user@lucene.apache.org
> >> >> Sent: Wednesday, May 13, 2009 3:04:46 PM
> >> >> Subject: Solr memory requirements?
> >> >>
> >> >> Hi,
> >> >>
> >> >>   I'm pretty sure this has been asked before, but I couldn't find a
> >> >> complete answer in the forum archive. Here are my questions,
> >> >>
> >> >> 1) When solr starts up what does it loads up in the memory? Let's say
> >> >> I've 4 cores with each core 50G in size. When Solr comes up how much
> >> >> of it would be loaded in memory?
> >> >>
> >> >> 2) How much memory is required during index time? If I'm committing
> >> >> 50K records at a time (1 record = 1KB) using solrj, how much memory do
> >> >> I need to give to Solr.
> >> >>
> >> >> 3) Is there a minimum memory requirement by Solr to maintain a certain
> >> >> size index? Is there any benchmark on this?
> >> >>
> >> >> Here are some of my configuration from solrconfig.xml,
> >> >>
> >> >> 1) 64
> >> >> 2) All the caches (under query tag) are commented out
> >> >> 3) Few others,
> >> >>   a

Re: Replication master+slave

2009-05-13 Thread Otis Gospodnetic

Coincidentally, from 
http://www.cloudera.com/blog/2009/05/07/what%E2%80%99s-new-in-hadoop-core-020/ :

"Hadoop configuration files now support XInclude elements for including 
portions of another configuration file (HADOOP-4944). This mechanism allows you 
to make configuration files more modular and reusable."

So "others are doing it, too".

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Bryan Talbot 
> To: solr-user@lucene.apache.org
> Sent: Wednesday, May 13, 2009 11:26:41 AM
> Subject: Re: Replication master+slave
> 
> I see that Noble's final comment in SOLR-1154 is that config files need to be 
> able to include snippets from external files.  In my limited testing, a 
> simple 
> patch to enable XInclude support seems to work.
> 
> 
> 
> --- src/java/org/apache/solr/core/Config.java   (revision 774137)
> +++ src/java/org/apache/solr/core/Config.java   (working copy)
> @@ -100,8 +100,10 @@
>   if (lis == null) {
> lis = loader.openConfig(name);
>   }
> -  javax.xml.parsers.DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
> -  doc = builder.parse(lis);
> +  javax.xml.parsers.DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
> +  dbf.setNamespaceAware(true);
> +  dbf.setXIncludeAware(true);
> +  doc = dbf.newDocumentBuilder().parse(lis);
> 
> DOMUtil.substituteProperties(doc, loader.getCoreProperties());
> } catch (ParserConfigurationException e)  {
> 
> 
> 
> This allows a clause like this to include the contents of replication.xml if 
> it 
> exists.  If it's not found an exception will be thrown.
> 
> 
> 
> href="http://localhost:8983/solr/corename/admin/file/?file=replication.xml";
>  xmlns:xi="http://www.w3.org/2001/XInclude";>
> 
> 
> 
> If the file is optional and no exception should be thrown if the file is 
> missing, simply include a fallback action: in this case the fallback is empty 
> and does nothing.
> 
> 
> 
> href="http://localhost:8983/solr/forum_en/admin/file/?file=replication.xml";
>  xmlns:xi="http://www.w3.org/2001/XInclude";>
> 
> 
> 
> 
> -Bryan
> 
> 
> 
> 
> On May 12, 2009, at May 12, 8:05 PM, Jian Han Guo wrote:
> 
> > I was looking at the same problem, and had a discussion with Noble. You can
> > use a hack to achieve what you want, see
> > 
> > https://issues.apache.org/jira/browse/SOLR-1154
> > 
> > Thanks,
> > 
> > Jianhan
> > 
> > 
> > On Tue, May 12, 2009 at 5:13 PM, Bryan Talbot wrote:
> > 
> >> So how are people managing solrconfig.xml files which are largely the same
> >> other than differences for replication?
> >> 
> >> I don't think it's a "good thing" to maintain two copies of the same file
> >> and I'd like to avoid that.  Maybe enabling the XInclude feature in
> >> DocumentBuilders would make it possible to modularize configuration files 
> >> to
> >> make this possible?
> >> 
> >> 
> >> 
> http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/parsers/DocumentBuilderFactory.html#setXIncludeAware(boolean)
> >> 
> >> 
> >> -Bryan
> >> 
> >> 
> >> 
> >> 
> >> 
> >> On May 12, 2009, at May 12, 11:43 AM, Shalin Shekhar Mangar wrote:
> >> 
> >> On Tue, May 12, 2009 at 10:42 PM, Bryan Talbot 
>  wrote:
> >>> 
> >>> For replication in 1.4, the wiki at
>  http://wiki.apache.org/solr/SolrReplication says that a node can be both
>  the master and a slave:
>  
>  A node can act as both master and slave. In that case both the master and
>  slave configuration lists need to be present inside the
>  ReplicationHandler
>  requestHandler in the solrconfig.xml.
>  
>  What does this mean?  Does the core then poll itself for updates?
>  
> >>> 
> >>> 
> >>> No. This type of configuration is meant for "repeaters". Suppose there are
> >>> slaves in multiple data-centers (say data center A and B). There is always
> >>> a
> >>> single master (say in A). One of the slaves in B is used as a master for
> >>> the
> >>> other slaves in B. Therefore, this one slave in B is both a master as well
> >>> as the slave.
> >>> 
> >>> 
> >>> 
>  I'd like to have a single set of configuration files that are shared by
>  masters and slaves and avoid duplicating configuration details in
>  multiple
>  files (one for master and one for slave) to ease management and failover.
>  Is this possible?
>  
>  
> >>> You wouldn't want the master to be a slave. So I guess you'd need to have
> >>> a
> >>> separate file. Also, it needs to be a separate file so that the slave does
> >>> not become a master when the solrconfig.xml is replicated.
> >>> 
> >>> 
> >>> 
>  When I attempt to setup a multi server master-slave configuration and
>  include both master and slave replication configuration options, I into
>  some
>  problems.  I'm  running a nightly build from May 7.
>  
>  
> >>> Not sure what happened. Is that the url for this solr (meaning same so

Re: Replication master+slave

2009-05-13 Thread Bryan Talbot
I think the patch I included earlier covers solr core, but it looks  
like at least some other extensions (DIH) create and use their own XML  
parser.  So, if this functionality is to extend to all XML files,  
those will need similar patches.


Here's one for DIH:

--- src/main/java/org/apache/solr/handler/dataimport/DataImporter.java  (revision 774137)
+++ src/main/java/org/apache/solr/handler/dataimport/DataImporter.java  (working copy)

@@ -148,8 +148,10 @@
   void loadDataConfig(String configFile) {

     try {
-      DocumentBuilder builder = DocumentBuilderFactory.newInstance()
-          .newDocumentBuilder();
+      DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
+      dbf.setNamespaceAware(true);
+      dbf.setXIncludeAware(true);
+      DocumentBuilder builder = dbf.newDocumentBuilder();
       Document document = builder.parse(new InputSource(new StringReader(
           configFile)));



The only downside I can see to this is that it doesn't offer very
expressive conditional inclusion: the file is included if it's present;
otherwise fallback inclusions are used.  It's also specific to XML
files and obviously won't work for other types of configuration
files.  However, it is simple and effective.



-Bryan




On May 13, 2009, at May 13, 6:36 PM, Otis Gospodnetic wrote:



Coincidentally, from http://www.cloudera.com/blog/2009/05/07/what%E2%80%99s-new-in-hadoop-core-020/ :


"Hadoop configuration files now support XInclude elements for  
including portions of another configuration file (HADOOP-4944). This  
mechanism allows you to make configuration files more modular and  
reusable."


So "others are doing it, too".

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: Bryan Talbot 
To: solr-user@lucene.apache.org
Sent: Wednesday, May 13, 2009 11:26:41 AM
Subject: Re: Replication master+slave

I see that Noble's final comment in SOLR-1154 is that config files  
need to be
able to include snippets from external files.  In my limited  
testing, a simple

patch to enable XInclude support seems to work.



--- src/java/org/apache/solr/core/Config.java   (revision 774137)
+++ src/java/org/apache/solr/core/Config.java   (working copy)
@@ -100,8 +100,10 @@
 if (lis == null) {
   lis = loader.openConfig(name);
 }
-  javax.xml.parsers.DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
-  doc = builder.parse(lis);
+  javax.xml.parsers.DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
+  dbf.setNamespaceAware(true);
+  dbf.setXIncludeAware(true);
+  doc = dbf.newDocumentBuilder().parse(lis);

   DOMUtil.substituteProperties(doc, loader.getCoreProperties());
} catch (ParserConfigurationException e)  {



This allows a clause like this to include the contents of  
replication.xml if it

exists.  If it's not found an exception will be thrown.



href="http://localhost:8983/solr/corename/admin/file/?file=replication.xml 
"

xmlns:xi="http://www.w3.org/2001/XInclude";>



If the file is optional and no exception should be thrown if the  
file is
missing, simply include a fallback action: in this case the  
fallback is empty

and does nothing.



href="http://localhost:8983/solr/forum_en/admin/file/?file=replication.xml 
"

xmlns:xi="http://www.w3.org/2001/XInclude";>




-Bryan




On May 12, 2009, at May 12, 8:05 PM, Jian Han Guo wrote:

I was looking at the same problem, and had a discussion with  
Noble. You can

use a hack to achieve what you want, see

https://issues.apache.org/jira/browse/SOLR-1154

Thanks,

Jianhan


On Tue, May 12, 2009 at 5:13 PM, Bryan Talbot wrote:

So how are people managing solrconfig.xml files which are largely  
the same

other than differences for replication?

I don't think it's a "good thing" to maintain two copies of the  
same file

and I'd like to avoid that.  Maybe enabling the XInclude feature in
DocumentBuilders would make it possible to modularize  
configuration files to

make this possible?




http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/parsers/DocumentBuilderFactory.html#setXIncludeAware(boolean)



-Bryan





On May 12, 2009, at May 12, 11:43 AM, Shalin Shekhar Mangar wrote:

On Tue, May 12, 2009 at 10:42 PM, Bryan Talbot

wrote:


For replication in 1.4, the wiki at
http://wiki.apache.org/solr/SolrReplication says that a node  
can be both

the master and a slave:

A node can act as both master and slave. In that case both the  
master and

slave configuration lists need to be present inside the
ReplicationHandler
requestHandler in the solrconfig.xml.

What does this mean?  Does the core then poll itself for updates?




No. This type of configuration is meant for "repeaters". Suppose  
there are
slaves in multiple data-centers (say data center A and B). There  
is always

a
single master (say in A). One of the slaves in B is used as a  
master for

the
other slaves in B

Re: Replication master+slave

2009-05-13 Thread Otis Gospodnetic

Bryan, maybe it's time to stick this in JIRA?
http://wiki.apache.org/solr/HowToContribute

Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Bryan Talbot 
> To: solr-user@lucene.apache.org
> Sent: Wednesday, May 13, 2009 10:11:21 PM
> Subject: Re: Replication master+slave
> 
> I think the patch I included earlier covers solr core, but it looks like at 
> least some other extensions (DIH) create and use their own XML parser.  So, 
> if 
> this functionality is to extend to all XML files, those will need similar 
> patches.
> 
> Here's one for DIH:
> 
> --- src/main/java/org/apache/solr/handler/dataimport/DataImporter.java  
> (revision 774137)
> +++ src/main/java/org/apache/solr/handler/dataimport/DataImporter.java  
> (working 
> copy)
> @@ -148,8 +148,10 @@
>void loadDataConfig(String configFile) {
> 
>  try {
> -  DocumentBuilder builder = DocumentBuilderFactory.newInstance()
> -  .newDocumentBuilder();
> +  DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
> +  dbf.setNamespaceAware(true);
> +  dbf.setXIncludeAware(true);
> +  DocumentBuilder builder = dbf.newDocumentBuilder();
>Document document = builder.parse(new InputSource(new StringReader(
>configFile)));
> 
> 
> 
> The only down side I can see to this is it doesn't offer very expressive 
> conditional inclusion: the file is included if it's present otherwise 
> fallback 
> inclusions can be used.  It's also specific to XML files and obviously won't 
> work for other types of configuration files.  However, it is simple and 
> effective.
> 
> 
> -Bryan
> 
> 
> 
> 
> On May 13, 2009, at May 13, 6:36 PM, Otis Gospodnetic wrote:
> 
> > 
> > Coincidentally, from 
> http://www.cloudera.com/blog/2009/05/07/what%E2%80%99s-new-in-hadoop-core-020/
>  :
> > 
> > "Hadoop configuration files now support XInclude elements for including 
> portions of another configuration file (HADOOP-4944). This mechanism allows 
> you 
> to make configuration files more modular and reusable."
> > 
> > So "others are doing it, too".
> > 
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > 
> > 
> > 
> > - Original Message 
> >> From: Bryan Talbot 
> >> To: solr-user@lucene.apache.org
> >> Sent: Wednesday, May 13, 2009 11:26:41 AM
> >> Subject: Re: Replication master+slave
> >> 
> >> I see that Noble's final comment in SOLR-1154 is that config files need to 
> >> be
> >> able to include snippets from external files.  In my limited testing, a 
> simple
> >> patch to enable XInclude support seems to work.
> >> 
> >> 
> >> 
> >> --- src/java/org/apache/solr/core/Config.java   (revision 774137)
> >> +++ src/java/org/apache/solr/core/Config.java   (working copy)
> >> @@ -100,8 +100,10 @@
> >>  if (lis == null) {
> >>lis = loader.openConfig(name);
> >>  }
> >> -  javax.xml.parsers.DocumentBuilder builder =
> >> DocumentBuilderFactory.newInstance().newDocumentBuilder();
> >> -  doc = builder.parse(lis);
> >> +  javax.xml.parsers.DocumentBuilderFactory dbf =
> >> DocumentBuilderFactory.newInstance();
> >> +  dbf.setNamespaceAware(true);
> >> +  dbf.setXIncludeAware(true);
> >> +  doc = dbf.newDocumentBuilder().parse(lis);
> >> 
> >>DOMUtil.substituteProperties(doc, loader.getCoreProperties());
> >> } catch (ParserConfigurationException e)  {
> >> 
> >> 
> >> 
> >> This allows a clause like this to include the contents of replication.xml 
> >> if 
> it
> >> exists.  If it's not found an exception will be thrown.
> >> 
> >> 
> >> 
> >> href="http://localhost:8983/solr/corename/admin/file/?file=replication.xml";
> >> xmlns:xi="http://www.w3.org/2001/XInclude";>
> >> 
> >> 
> >> 
> >> If the file is optional and no exception should be thrown if the file is
> >> missing, simply include a fallback action: in this case the fallback is 
> >> empty
> >> and does nothing.
> >> 
> >> 
> >> 
> >> href="http://localhost:8983/solr/forum_en/admin/file/?file=replication.xml";
> >> xmlns:xi="http://www.w3.org/2001/XInclude";>
> >> 
> >> 
> >> 
> >> 
> >> -Bryan
> >> 
> >> 
> >> 
> >> 
> >> On May 12, 2009, at May 12, 8:05 PM, Jian Han Guo wrote:
> >> 
> >>> I was looking at the same problem, and had a discussion with Noble. You 
> >>> can
> >>> use a hack to achieve what you want, see
> >>> 
> >>> https://issues.apache.org/jira/browse/SOLR-1154
> >>> 
> >>> Thanks,
> >>> 
> >>> Jianhan
> >>> 
> >>> 
> >>> On Tue, May 12, 2009 at 5:13 PM, Bryan Talbot wrote:
> >>> 
>  So how are people managing solrconfig.xml files which are largely the 
>  same
>  other than differences for replication?
>  
>  I don't think it's a "good thing" to maintain two copies of the same file
>  and I'd like to avoid that.  Maybe enabling the XInclude feature in
>  DocumentBuilders would make it possible to modularize configuration 
>  files 
> to
>  make this possib

Re: Sorting by 'starts with'

2009-05-13 Thread Otis Gospodnetic

Wojtek,

I believe 
http://lucene.apache.org/java/2_4_1/api/core/org/apache/lucene/search/spans/SpanFirstQuery.html
 would help, though there is no support for Span queries in Solr.  But there is 
support for custom query parsers, and there is 
http://lucene.apache.org/java/2_4_1/api/contrib-snowball/index.html

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: wojtekpia 
> To: solr-user@lucene.apache.org
> Sent: Thursday, May 7, 2009 2:41:29 PM
> Subject: Sorting by 'starts with'
> 
> 
> I have an index of product names. I'd like to sort results so that entries
> starting with the user query come first. 
> E.g. 
> 
> q=kitchen
> 
> Results would sort something like:
> 1. kitchen appliance
> 2. kitchenaid dishwasher
> 3. fridge for kitchen
> 
> It looks like using a query Function Query comes close, but I don't know how
> to write a subquery that only matches if the value starts with the query
> string. 
> 
> Has anyone solved a similar need?
> 
> Thanks,
> 
> Wojtek
> -- 
> View this message in context: 
> http://www.nabble.com/Sorting-by-%27starts-with%27-tp23432815p23432815.html
> Sent from the Solr - User mailing list archive at Nabble.com.
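
A minimal sketch of the SpanFirstQuery approach Otis mentions, against the
Lucene 2.4 API (the field name "name" is illustrative); it would still need
to be wrapped in a custom QParserPlugin to be usable from Solr:

  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.spans.SpanFirstQuery;
  import org.apache.lucene.search.spans.SpanTermQuery;

  // Matches only documents where "kitchen" occurs at the very start of the
  // field (the end argument is an exclusive position bound).
  SpanTermQuery term = new SpanTermQuery(new Term("name", "kitchen"));
  SpanFirstQuery startsWith = new SpanFirstQuery(term, 1);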



Re: Creating new QParserPlugin

2009-05-13 Thread Otis Gospodnetic

Andrey,

I urge you to use JIRA for this.  That's exactly what it's for and how it gets 
used.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Andrey Klochkov 
> To: solr-user@lucene.apache.org
> Sent: Thursday, May 7, 2009 5:14:26 AM
> Subject: Re: Creating new QParserPlugin
> 
> Hi!
> 
> I agree that Solr is difficult to extend in many cases. We just patch Solr,
> and I guess many other users patch it too. What I propose is to create some
> Solr-community site (Solr incubator?) to publish patches there, and the Solr
> core team could then look there and choose patches to apply to the Solr
> codebase. I know that one can use JIRA for that, but it's not convenient to
> use it in this way.
> 
> On Thu, May 7, 2009 at 2:41 AM, KaktuChakarabati wrote:
> 
> >
> > Hello everyone,
> > I am trying to write a new QParserPlugin+QParser, one that will work
> > similar
> > to how DisMax does, but will give me more control over the
> > FunctionQuery-related part of the query processing (e.g in regards to a
> > specified bf parameter).
> >
> > Specifically, I want to be able to affect the way the queryNorm (and
> > possibly
> > other factors) interact with a
> > pre-computed value I store in a static field (i.e. I compute an index-time
> > score for a document that I wish to use in a bf as a ValueSource, without
> > being affected by queryNorm or other such extraneous considerations.)
> >
> > While trying this, I notice I run a lot into cases where some parts I try to
> > override/inherit from are private to a Java package namespace, and this
> > makes the whole thing very cumbersome.
> >
> > Examples for this are the DismaxQParser class which is defined as a local
> > class inside the DisMaxQParserPlugin.java file (i think this is bad
> > practice
> > - otherwise, FunctionQParserPlugin/FunctionQParser do have their own
> > seperate files, so i think this is a good convention to follow generally).
> > Another case is where i try to inherit from FunctionQParser and end up not
> > being able to replicate some of the parse() logic, because it uses the
> > QueryParsing.StrParser class which is a static inner class and so is only
> > accessible from the solr.search namespace.
> >
> > In short, many such cases seem to arise and i think this poses a
> > considerable limitation on
> > the possibilities of extending solr.
> > If this resonates with more people here, I'd take this issue up with
> > solr-dev.
> >
> > Otherwise, if some of you have some notions about going about what i'm
> > trying to do differently,
> > I would be happy to hear.
> >
> > Thanks,
> > -Chak
> > --
> > View this message in context:
> > http://www.nabble.com/Creating-new-QParserPlugin-tp23416974p23416974.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
> >
> 
> 
> -- 
> Andrew Klochkov
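
For anyone following along, a bare-bones QParserPlugin skeleton against the
Solr 1.4 API (the class name, package, and the delegation to the "lucene"
parser are illustrative; the real custom logic would go in parse()):

  package com.example;

  import org.apache.lucene.queryParser.ParseException;
  import org.apache.lucene.search.Query;
  import org.apache.solr.common.params.SolrParams;
  import org.apache.solr.common.util.NamedList;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.search.QParser;
  import org.apache.solr.search.QParserPlugin;

  public class CustomBoostQParserPlugin extends QParserPlugin {
    public void init(NamedList args) {}

    public QParser createParser(String qstr, SolrParams localParams,
                                SolrParams params, SolrQueryRequest req) {
      return new QParser(qstr, localParams, params, req) {
        public Query parse() throws ParseException {
          // Build any Query here, e.g. combine a parsed user query with a
          // ValueSource-backed function query whose scoring you control.
          return QParser.getParser(qstr, "lucene", req).parse();
        }
      };
    }
  }

It would be registered in solrconfig.xml with
<queryParser name="myparser" class="com.example.CustomBoostQParserPlugin"/>
and selected per-request with defType=myparser.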



Re: Solr memory requirements?

2009-05-13 Thread Grant Ingersoll


On May 13, 2009, at 6:53 PM, vivek sar wrote:


Disabling the first/new searcher listeners did help with the initial load
time, but after 10-15 min the heap memory starts climbing again and reaches
the max within 20 min. Now the GC is running all the time, which is
slowing down the commit and search cycles.

It's still puzzling what Solr holds in memory and doesn't release.


I haven't been able to profile as the dump is too big. Would setting
termIndexInterval help? I'm not sure how that can be set using Solr.


It would have to be set in the same place that the ramBufferSizeMB  
gets set, in the config, but this would require some coding (albeit  
pretty straightforward) to set it on the IndexWriter.  I don't think  
it would help in profiling.


Do you have warming queries? (Sorry if I missed your answer)

Also, I know you have set the heap to 8 GB.  Is there a size at which it
levels out?  I presume you are getting Out Of Memory errors, right?  Or
are you just concerned about the current memory size?
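
A sketch of what that coding might look like, assuming Solr exposed the
setting where it creates its IndexWriter (this is not existing Solr code,
and the index path is illustrative):

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.store.FSDirectory;

  IndexWriter writer = new IndexWriter(FSDirectory.getDirectory("/index/path"),
      new StandardAnalyzer(), IndexWriter.MaxFieldLength.UNLIMITED);
  writer.setRAMBufferSizeMB(64);
  // Default interval is 128; a larger value samples fewer terms into the
  // in-memory .tii term index, trading term-lookup speed for heap.
  writer.setTermIndexInterval(256);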


Re: Custom Servlet Filter, Where to put filter-mappings

2009-05-13 Thread Jacob Singh
HI Grant,

That's not a bad idea... I could try that.  I was also looking at cactus:
http://jakarta.apache.org/cactus/integration/ant/index.html

It has an ant task to merge XML.  Could this be a contrib-crawl add-on?

Alternatively, do you know of any XSLT templates built for this?  I could
write one, but that's a fair bit of work to support everything.
Perhaps an XSLT task combined with a contrib-crawl would do the trick?

Best,
-J

On Wed, May 13, 2009 at 6:07 PM, Grant Ingersoll  wrote:
> Hmmm, maybe we need to think about some way to hook this into the build
> process or make it easier to just drop it into the conf or lib dirs.  I'm no
> web.xml expert, but I'm sure you're not the first one to want to do this
> kind of thing.
>
> The easiest way _might_ be to patch build.xml to take a property for the
> location of the web.xml, defaulting to the current Solr one.  Then, people
> who want to use their own version could just pass in -Dweb.xml=<path to
> web.xml>.  The downside to this is that it may cause problems for us devs
> when users ask questions about strange behavior and it turns out they have
> mucked up the web.xml.
>
> FYI: dist-war is in build.xml, not common-build.xml.
>
> -Grant
>
> On May 12, 2009, at 5:52 AM, Jacob Singh wrote:
>
>> Hi folks,
>>
>> I just wrote a Servlet Filter to handle authentication for our
>> service.  Here's what I did:
>>
>> 1. Created a dir in contrib
>> 2. Put my project in there, I took the dataimporthandler build.xml as
>> an example and modified it to suit my needs.  Worked great!
>> 3. ant dist now builds my jar and includes it
>>
>> I now need to modify web.xml to add my filter-mapping, init params,
>> etc.  How can I do this cleanly?  Or do I need to manually open up the
>> archive and edit it and then re-war it?
>>
>> In common-build I don't see a target for dist-war, so don't see how it
>> is possible...
>>
>> Thanks!
>> Jacob
>>
>> --
>>
>> +1 510 277-0891 (o)
>> +91  33 7458 (m)
>>
>> web: http://pajamadesign.com
>>
>> Skype: pajamadesign
>> Yahoo: jacobsingh
>> AIM: jacobsingh
>> gTalk: jacobsi...@gmail.com
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
>
>



-- 

+1 510 277-0891 (o)
+91  33 7458 (m)

web: http://pajamadesign.com

Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: jacobsi...@gmail.com
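
Along the lines Grant suggests, a sketch of the build.xml change (the
property name, paths, and ${fullnamever} are illustrative, not the actual
Solr build file):

  <!-- default to Solr's own web.xml; override with: ant -Dweb.xml=/path/to/custom/web.xml dist-war -->
  <property name="web.xml" location="src/webapp/web/WEB-INF/web.xml"/>

  <war destfile="${dist}/${fullnamever}.war" webxml="${web.xml}">
    <lib dir="${dist}" includes="*.jar"/>
  </war>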


Re: master/slave failure scenario

2009-05-13 Thread Noble Paul നോബിള്‍ नोब्ळ्
Ideally, we don't do that.
You can just keep the master host behind a VIP; if you wish to
change the master, make the VIP point to the new host.

On Wed, May 13, 2009 at 10:52 PM, nk 11  wrote:
> This is more interesting. Such a procedure would involve taking down and
> reconfiguring the slave?
>
> On Wed, May 13, 2009 at 7:55 PM, Bryan Talbot wrote:
>
>> Or ...
>>
>> 1. Promote existing slave to new master
>> 2. Add new slave to cluster
>>
>>
>>
>>
>> -Bryan
>>
>>
>>
>>
>>
>> On May 13, 2009, at May 13, 9:48 AM, Jay Hill wrote:
>>
>>  - Migrate configuration files from old master (or backup) to new master.
>>> - Replicate from a slave to the new master.
>>> - Resume indexing to new master.
>>>
>>> -Jay
>>>
>>> On Wed, May 13, 2009 at 4:26 AM, nk 11  wrote:
>>>
>>>  Nice.
 What if the master fails permanently (like a disk crash...) and the new
 master is a clean machine?
 2009/5/13 Noble Paul നോബിള്‍ नोब्ळ् 

  On Wed, May 13, 2009 at 12:10 PM, nk 11  wrote:
>
>> Hello
>>
>> I'm kind of new to Solr and I've read about replication, and the fact
>>
> that a
>
>> node can act as both master and slave.
>> If a replica fails and then comes back online I suppose that it will
>>
> resync
>
>> with the master.
>>
> right
>
>>
>> But what happens if the master fails? A slave that is configured as
>>
> master
>
>> will kick in? What if that slave is not yet fully synced with the
>>
> failed

> master and has old data?
>>
> if the master fails you can't index the data. but the slaves will
> continue serving the requests with the last index. You can bring back
> the master up and resume indexing.
>
>
>> What happens when the original master comes back on line? He will
>>
> remain

> a
>
>> slave because there is another node with the master role?
>>
>> Thank you!
>>
>>
>
>
> --
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
>
>

>>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com
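
For context, the 1.4 replication config this thread is talking about looks
roughly like this (per the SolrReplication wiki; host names and intervals
are placeholders). Promoting a slave amounts to enabling a master section
like the one below, which is why a VIP in front of a fixed master is simpler:

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <str name="confFiles">schema.xml,stopwords.txt</str>
    </lst>
    <lst name="slave">
      <str name="masterUrl">http://master-vip:8983/solr/replication</str>
      <str name="pollInterval">00:00:60</str>
    </lst>
  </requestHandler>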


Re: Replication master+slave

2009-05-13 Thread Shalin Shekhar Mangar
There's a related issue open.

https://issues.apache.org/jira/browse/SOLR-712

On Thu, May 14, 2009 at 7:50 AM, Otis Gospodnetic <
otis_gospodne...@yahoo.com> wrote:

>
> Bryan, maybe it's time to stick this in JIRA?
> http://wiki.apache.org/solr/HowToContribute
>
> Thanks,
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> - Original Message 
> > From: Bryan Talbot 
> > To: solr-user@lucene.apache.org
> > Sent: Wednesday, May 13, 2009 10:11:21 PM
> > Subject: Re: Replication master+slave
> >
> > I think the patch I included earlier covers solr core, but it looks like
> at
> > least some other extensions (DIH) create and use their own XML parser.
>  So, if
> > this functionality is to extend to all XML files, those will need similar
> > patches.
> >
> > Here's one for DIH:
> >
> > --- src/main/java/org/apache/solr/handler/dataimport/DataImporter.java
> > (revision 774137)
> > +++ src/main/java/org/apache/solr/handler/dataimport/DataImporter.java
>  (working
> > copy)
> > @@ -148,8 +148,10 @@
> >void loadDataConfig(String configFile) {
> >
> >  try {
> > -  DocumentBuilder builder = DocumentBuilderFactory.newInstance()
> > -  .newDocumentBuilder();
> > +  DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
> > +  dbf.setNamespaceAware(true);
> > +  dbf.setXIncludeAware(true);
> > +  DocumentBuilder builder = dbf.newDocumentBuilder();
> >Document document = builder.parse(new InputSource(new
> StringReader(
> >configFile)));
> >
> >
> >
> > The only down side I can see to this is it doesn't offer very expressive
> > conditional inclusion: the file is included if it's present otherwise
> fallback
> > inclusions can be used.  It's also specific to XML files and obviously
> won't
> > work for other types of configuration files.  However, it is simple and
> > effective.
> >
> >
> > -Bryan
> >
> >
> >
> >
> > On May 13, 2009, at May 13, 6:36 PM, Otis Gospodnetic wrote:
> >
> > >
> > > Coincidentally, from
> >
> http://www.cloudera.com/blog/2009/05/07/what%E2%80%99s-new-in-hadoop-core-020/:
> > >
> > > "Hadoop configuration files now support XInclude elements for including
> > portions of another configuration file (HADOOP-4944). This mechanism
> allows you
> > to make configuration files more modular and reusable."
> > >
> > > So "others are doing it, too".
> > >
> > > Otis
> > > --
> > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > >
> > >
> > >
> > > - Original Message 
> > >> From: Bryan Talbot
> > >> To: solr-user@lucene.apache.org
> > >> Sent: Wednesday, May 13, 2009 11:26:41 AM
> > >> Subject: Re: Replication master+slave
> > >>
> > >> I see that Noble's final comment in SOLR-1154 is that config files
> need to be
> > >> able to include snippets from external files.  In my limited testing,
> a
> > simple
> > >> patch to enable XInclude support seems to work.
> > >>
> > >>
> > >>
> > >> --- src/java/org/apache/solr/core/Config.java   (revision 774137)
> > >> +++ src/java/org/apache/solr/core/Config.java   (working copy)
> > >> @@ -100,8 +100,10 @@
> > >>  if (lis == null) {
> > >>lis = loader.openConfig(name);
> > >>  }
> > >> -  javax.xml.parsers.DocumentBuilder builder =
> > >> DocumentBuilderFactory.newInstance().newDocumentBuilder();
> > >> -  doc = builder.parse(lis);
> > >> +  javax.xml.parsers.DocumentBuilderFactory dbf =
> > >> DocumentBuilderFactory.newInstance();
> > >> +  dbf.setNamespaceAware(true);
> > >> +  dbf.setXIncludeAware(true);
> > >> +  doc = dbf.newDocumentBuilder().parse(lis);
> > >>
> > >>DOMUtil.substituteProperties(doc, loader.getCoreProperties());
> > >> } catch (ParserConfigurationException e)  {
> > >>
> > >>
> > >>
> > >> This allows a clause like this to include the contents of
> replication.xml if
> > it
> > >> exists.  If it's not found an exception will be thrown.
> > >>
> > >>
> > >>
> > >> href="
> http://localhost:8983/solr/corename/admin/file/?file=replication.xml";
> > >> xmlns:xi="http://www.w3.org/2001/XInclude";>
> > >>
> > >>
> > >>
> > >> If the file is optional and no exception should be thrown if the file
> is
> > >> missing, simply include a fallback action: in this case the fallback
> is empty
> > >> and does nothing.
> > >>
> > >>
> > >>
> > >> href="
> http://localhost:8983/solr/forum_en/admin/file/?file=replication.xml";
> > >> xmlns:xi="http://www.w3.org/2001/XInclude";>
> > >>
> > >>
> > >>
> > >>
> > >> -Bryan
> > >>
> > >>
> > >>
> > >>
> > >> On May 12, 2009, at May 12, 8:05 PM, Jian Han Guo wrote:
> > >>
> > >>> I was looking at the same problem, and had a discussion with Noble.
> You can
> > >>> use a hack to achieve what you want, see
> > >>>
> > >>> https://issues.apache.org/jira/browse/SOLR-1154
> > >>>
> > >>> Thanks,
> > >>>
> > >>> Jianhan
> > >>>
> > >>>
> > >>> On Tue, May 12, 2009 at 5:13 PM, Bryan Talbot wrote:
> > >>>
> >  So how are people managing s

Re: Java Environment Problem on Vista

2009-05-13 Thread Amit Nithian
To me it sounds like it's not finding solr home. I have Windows Vista and
JDK 1.6.0_11, and when I run java -jar start.jar, I too get a ton of INFO
messages, and one of them should read something like:

INFO: solr home defaulted to 'solr/' (could not find system property or JNDI)
May 13, 2009 10:45:16 PM org.apache.solr.servlet.SolrServlet init

I assume that you are running java -jar start.jar in the example/ directory
and that in that example directory there is a folder called solr/? If you
don't specify a JNDI or Java system property for solr's home, then it
defaults to ./solr (in this case example/solr).

The fact that it's printing out "INFO: JNDI not configured for solr
(NoInitialContextEx)" indicates that the war file is being loaded properly
by Jetty.

Hope that helps some
Amit



On Wed, May 13, 2009 at 5:36 PM, John Bennett  wrote:

> I'm having difficulty getting Solr running on Vista. I've got the 1.6 JDK
> installed, and I've successfully compiled file and run other Java programs.
>
> When I run java -jar start.jar in the Apache Solr example directory, I get
> a large number of INFO messages, including:
>
> INFO: JNDI not configured for solr (NoInitialContextEx)
>
> When I visit localhost:8983/solr/, I get a 404 error message:
>
>
>   HTTP ERROR: 404
>
> NOT_FOUND
>
> RequestURI=/solr/
>
> /Powered by jetty:// /
>
> I've talked to a couple of engineers who suspect that the problem is with
> my Java environment. My environment is configured as follows:
>
> CLASSPATH=.;C:\Program
> Files\Java\jdk1.6.0_13\lib\ext\QTJava.zip;C:\Users\John\Documents\Java;C:\Program
> Files\Java\jdk1.6.0_13;
> JAVA_HOME=C:\Program_Files\Java\jdk1.6.0_13
> Path=C:\Program Files\Snap\scripts;C:\Program
> Files\Snap;C:\Python25\Scripts;%SystemRoot%\system32;%SystemRoot%;%SystemRoot%\System32\Wbem;c:\Program
> Files\Microsoft SQL Server\90\Tools\binn\;C:\Program Files\Common
> Files\Roxio Shared\DLLShared\;C:\Program Files\Common Files\Roxio
> Shared\9.0\DLLShared\;C:\Program Files\QuickTime\QTSystem\;C:\Program
> Files\Java\jdk1.6.0_13\bin
>
> Any ideas?
>
> Regards,
>
> John
>
>
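
One quick way to rule out a solr home problem, per Amit's diagnosis, is to
point Solr at it explicitly when starting Jetty (the path is illustrative):

  cd example
  java -Dsolr.solr.home=C:\solr\example\solr -jar start.jar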


Re: Solr memory requirements?

2009-05-13 Thread vivek sar
Otis,

 We are not running a master-slave configuration. We get very few
searches (admin only) in a day, so we didn't see the need for
replication/snapshots. This problem is with one Solr instance managing
4 cores (each core 200 million records). Both indexing and searching
are performed by the same Solr instance.

What are .tii files used for? I see this file under only one core.

I'm still looking for what gets loaded into the heap by Solr (during load
time, indexing, and searching) and stays there. I see most of these are
tenured objects that aren't getting released by GC - I will post profile
records tomorrow.

Thanks,
-vivek





On Wed, May 13, 2009 at 6:34 PM, Otis Gospodnetic
 wrote:
>
> There is constant mixing of indexing concepts and searching concepts in this 
> thread.  Are you having problems on the master (indexing) or on the slave 
> (searching)?
>
>
> That .tii is only 20K and you said this is a large index?  That doesn't smell 
> right...
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> - Original Message 
>> From: vivek sar 
>> To: solr-user@lucene.apache.org
>> Sent: Wednesday, May 13, 2009 5:12:00 PM
>> Subject: Re: Solr memory requirements?
>>
>> Otis,
>>
>> In that case, I'm not sure why Solr is taking up so much memory as
>> soon as we start it up. I checked for .tii file and there is only one,
>>
>> -rw-r--r--  1 search  staff  20306 May 11 21:47 
>> ./20090510_1/data/index/_3au.tii
>>
>> I have all the cache disabled - so that shouldn't be a problem too. My
>> ramBuffer size is only 64MB.
>>
>> I read note on sorting,
>> http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see
>> something related to FieldCache. I don't see this as parameter defined
>> in either solrconfig.xml or schema.xml. Could this be something that
>> can load things in memory at startup? How can we disable it?
>>
>> I'm trying to find out if there is a way to tell how much memory Solr
>> would consume and way to cap it.
>>
>> Thanks,
>> -vivek
>>
>>
>>
>>
>> On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic
>> wrote:
>> >
>> > Hi,
>> >
>> > Sorting is triggered by the sort parameter in the URL, not a 
>> > characteristic of
>> a field. :)
>> >
>> > Otis
>> > --
>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> >
>> >
>> >
>> > - Original Message 
>> >> From: vivek sar
>> >> To: solr-user@lucene.apache.org
>> >> Sent: Wednesday, May 13, 2009 4:42:16 PM
>> >> Subject: Re: Solr memory requirements?
>> >>
>> >> Thanks Otis.
>> >>
>> >> Our use case doesn't require any sorting or faceting. I'm wondering if
>> >> I've configured anything wrong.
>> >>
>> >> I got total of 25 fields (15 are indexed and stored, other 10 are just
>> >> stored). All my fields are basic data type - which I thought are not
>> >> sorted. My id field is unique key.
>> >>
>> >> Is there any field here that might be getting sorted?
>> >>
>> >>
>> >> required="true" omitNorms="true" compressed="false"/>
>> >>
>> >>
>> >> compressed="false"/>
>> >>
>> >> omitNorms="true" compressed="false"/>
>> >>
>> >> omitNorms="true" compressed="false"/>
>> >>
>> >> omitNorms="true" compressed="false"/>
>> >>
>> >> default="NOW/HOUR"  compressed="false"/>
>> >>
>> >> omitNorms="true" compressed="false"/>
>> >>
>> >> omitNorms="true" compressed="false"/>
>> >>
>> >> compressed="false"/>
>> >>
>> >> compressed="false"/>
>> >>
>> >> omitNorms="true" compressed="false"/>
>> >>
>> >> omitNorms="true" compressed="false"/>
>> >>
>> >> omitNorms="true" compressed="false"/>
>> >>
>> >> omitNorms="true" compressed="false"/>
>> >>
>> >> omitNorms="true" compressed="false"/>
>> >>
>> >> compressed="false"/>
>> >>
>> >> compressed="false"/>
>> >>
>> >> compressed="false"/>
>> >>
>> >> omitNorms="true" compressed="false"/>
>> >>
>> >> compressed="false"/>
>> >>
>> >> default="NOW/HOUR" omitNorms="true"/>
>> >>
>> >>
>> >>
>> >>
>> >> omitNorms="true" multiValued="true"/>
>> >>
>> >> Thanks,
>> >> -vivek
>> >>
>> >> On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
>> >> wrote:
>> >> >
>> >> > Hi,
>> >> > Some answers:
>> >> > 1) .tii files in the Lucene index.  When you sort, all distinct values 
>> >> > for
>> the
>> >> field(s) used for sorting.  Similarly for facet fields.  Solr caches.
>> >> > 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will
>> consume
>> >> during indexing.  There is no need to commit every 50K docs unless you 
>> >> want
>> to
>> >> trigger snapshot creation.
>> >> > 3) see 1) above
>> >> >
>> >> > 1.5 billion docs per instance where each doc is cca 1KB?  I doubt that's
>> going
>> >> to fly. :)
>> >> >
>> >> > Otis
>> >> > --
>> >> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> >> >
>> >> >
>> >> >
>> >> > - Original Message 
>> >> >> From: vivek sar
>> >> >> To: solr-user@lucene.apache.org
>> >> >> Sent: Wednesday, May 13, 2009 3:04:46 PM
>> >> >> Subject: Solr memory requirements?
>> >> >>
>> >> >> Hi,
>> >> >>
>> >> >>   I'm pretty sure this has been