RE: Best practices for installing and maintaining Solr configuration

2011-12-30 Thread Brandon Ramirez
I actually have read that and I have Solr up and running on Tomcat.  I didn't 
realize that the recommendation was against copying example/ as a whole (Jetty 
and all), rather than against $SOLR_HOME, which I created by copying example/solr/.

Thanks for the tips on upgrading.  I'll keep that in our documentation.


Brandon Ramirez | Office: 585.214.5413 | Fax: 585.295.4848 
Software Engineer II | Element K | www.elementk.com

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Thursday, December 29, 2011 8:18 AM
To: solr-user@lucene.apache.org
Subject: Re: Best practices for installing and maintaining Solr configuration

This should help: http://wiki.apache.org/solr/SolrTomcat

The difference here is that you're not copying the example directory, you're 
copying the example/solr directory, and that is basically just to get the 
configuration files and directory structure right.  You're not copying 
executables, jars, wars, or any of that stuff from example.  You get the war 
file from the dist directory, and that contains all the executable code you need.
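
For example, a rough sketch of that layout (assuming Solr 3.5, Tomcat under 
/opt/tomcat, and /opt/solr as $SOLR_HOME; adjust versions and paths to your 
environment):

# get the configuration files and directory structure from the example
cp -r apache-solr-3.5.0/example/solr /opt/solr

# deploy the war from dist/, not anything from example/
cp apache-solr-3.5.0/dist/apache-solr-3.5.0.war /opt/tomcat/webapps/solr.war

# point Solr at its home directory, e.g. via JAVA_OPTS
export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/opt/solr"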


As to your other questions:
1> If at all possible, upping luceneMatchVersion and reindexing
 are good things to do (see the sketch after this list).
2> It's also a good idea to update the config files. Alternatively,
 you can diff the config files between releases to see what the
 changes are and selectively add them to your config file.
 But you should test, test, test before rolling out to prod.
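
For example, a sketch against a 3.x-era solrconfig.xml (the exact constant 
depends on your release):

<!-- controls index-format and analysis compatibility -->
<luceneMatchVersion>LUCENE_35</luceneMatchVersion>

And to see what changed in the shipped configs between releases:

diff -u old-release/example/solr/conf/solrconfig.xml \
        new-release/example/solr/conf/solrconfig.xml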

My rule of thumb for upgrading is to skip minor releases unless there is a 
compelling reason to take one.  The CHANGES.txt file will identify the major 
additions.

There are good reasons not to get too far behind on major (i.e. 3.x -> 4.x) 
releases, the primary one being that Solr only makes an effort to be 
backwards-compatible across one major release: a 1.4 index can be read by 3.x 
(there was no 2.x Solr release), but no attempt will be made by 4.x code to 
read 1.x indexes.

Hope this helps
Erick

On Wed, Dec 28, 2011 at 8:49 AM, Brandon Ramirez  
wrote:
> Hi List,
> I've seen several Solr developers mention that people often copy 
> example/ to use as their Solr installation, and that this is not recommended.  
> We are rebuilding our search functionality to use Solr and will be deploying 
> it in a few weeks.
>
> I have read the README, several wiki articles, and the mailing list, and have 
> browsed the Solr distribution.  The example/ directory seems to be the only 
> configuration I can find.  So I have to ask: what is the recommended way to 
> install Solr?
>
> What about maintaining it?  For example, is it wise to up the 
> luceneMatchVersion and re-index with every upgrade?  When new configuration 
> options are added in new versions of Solr, should we worry about updating our 
> configuration to include them?  I realize these may be vague questions and 
> the answers could be case-by-case, but some general or high-level 
> documentation may help.
>
> Thanks!
>
>
> Brandon Ramirez | Office: 585.214.5413 | Fax: 585.295.4848 Software 
> Engineer II | Element K | www.elementk.com<http://www.elementk.com/>
>




RE: getting solr to expand Acronym

2011-11-11 Thread Brandon Ramirez
Could this be simulated through synonyms?  Could you define "CD" as a synonym 
of "Compact Disc", or vice versa?  I'm not sure if that would work, just 
brainstorming here...
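
A minimal sketch of what I mean, assuming an equivalence-style synonyms file 
(the file name and entries are made up):

# synonyms.txt
CD, Compact Disc
DVD, Digital Versatile Disc
CPU, Central Processing Unit

applied in the field type's analyzer in schema.xml:

<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>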


Brandon Ramirez | Office: 585.214.5413 | Fax: 585.295.4848 
Software Engineer II | Element K | www.elementk.com


-Original Message-
From: Tiernan OToole [mailto:lsmart...@gmail.com] 
Sent: Friday, November 11, 2011 5:10 AM
To: solr-user@lucene.apache.org
Subject: getting solr to expand Acronym

Don't know if this is possible, but I need to ask anyway... Say we have a list 
of acronyms in a database (CD, DVD, CPU) and also a list of their not-so-short 
names (Compact Disc, Digital Versatile Disc, Central Processing Unit), but they 
are not linked in any particular way (lots of items, some with full names, some 
using acronyms).  Is it possible for Solr to figure out that CD is an acronym 
of Compact Disc?  I know CD could also mean Central Data, or anything that 
begins with C and D, but is there a way to tell Solr to look for items that not 
only match CD, but also have adjacent words beginning with C and D... Another 
example I can think of is IBM: it could be International Business Machines, or 
Irish Business Machines, or Irish Banking Machines...

So, would that be possible?

--
Tiernan O'Toole
blog.lotas-smartman.net
www.geekphotographer.com
www.tiernanotoole.ie


RE: Can't find resource 'solrconfig.xml'

2011-10-31 Thread Brandon Ramirez
I have found setenv.sh to be very helpful.  It's a hook where you can set up 
environment variables and Java options without modifying your catalina.sh 
script, which makes upgrading a whole lot easier.
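
For example, a minimal sketch (path and heap size are assumptions):

# $CATALINA_HOME/bin/setenv.sh -- sourced by catalina.sh if it exists
export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/opt/solr -Xmx1024m"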


Brandon Ramirez | Office: 585.214.5413 | Fax: 585.295.4848 
Software Engineer II | Element K | www.elementk.com


-Original Message-
From: Li Li [mailto:fancye...@gmail.com] 
Sent: Monday, October 31, 2011 8:35 AM
To: solr-user@lucene.apache.org
Subject: Re: Can't find resource 'solrconfig.xml'

Modify catalina.sh (or catalina.bat), adding this Java startup parameter:
-Dsolr.solr.home=/your/path

On Mon, Oct 31, 2011 at 8:30 PM, 刘浪  wrote:

> Hi,
>  After I start Tomcat, I go to http://localhost:8080/solr/admin and 
> it displays.  But in the Tomcat log I find an exception like "Can't 
> find resource 'solrconfig.xml' in classpath or 'solr\.\conf/', 
> cwd=D:\Program Files (x86)\apache-tomcat-6.0.33\bin".  It occurs 
> before "Server start up in 1682 ms."
>  What should I do?  Thank you very much.
>
>  Solr directory: D:\Program Files (x86)\solr.  It contains bin, 
> conf, data, solr.xml, and README.txt.
>  Tomcat directory: D:\Program Files (x86)\apache-tomcat-6.0.33.
>
> Sincerely,
> Amos
>


RE: Partial updates?

2011-10-28 Thread Brandon Ramirez
I would love to see this too.  Most of our data comes from a relational 
database, but there are some files on the file system, related to our products, 
that may need to be indexed.  Those files have a different change-control life 
cycle, so I can't be sure our application will know when that data changes; a 
recurring background re-index job would be helpful.  Having to go to the 
database for 99% of the data (which didn't change anyway) just to send it along 
with the 1% from the file system is a big limitation.

This also rules out using the DataImportHandler (DIH).


Brandon Ramirez | Office: 585.214.5413 | Fax: 585.295.4848 
Software Engineer II | Element K | www.elementk.com


-Original Message-
From: mlevy [mailto:ml...@ushmm.org] 
Sent: Friday, October 28, 2011 2:21 PM
To: solr-user@lucene.apache.org
Subject: Re: Partial updates?

An ability to update would be extremely useful for us.  Different parts of 
records sometimes come from different databases, so being able to update a 
record after the Solr index is created would help a great deal.

I've made some processes that read a record and add a new field to it.  The 
most awkward case is when there's a copyField: when the record is read and 
re-saved, the copied field causes copyField to be invoked again.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Partial-updates-tp502570p3461740.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: how to handle large relational data in Solr

2011-10-20 Thread Brandon Ramirez
I would not recommend removing your relational database altogether.  You should 
treat it as your system of record.  If you replace it, you force Solr to store 
the unmodified value of every field, even when that isn't needed, and you lose 
normalization.  And if you ever need to add data to your system that isn't 
search-related, you have no choice but to put it in your search index.


Brandon Ramirez | Office: 585.214.5413 | Fax: 585.295.4848 
Software Engineer II | Element K | www.elementk.com


-Original Message-
From: Jonathan Carothers [mailto:jonathan.caroth...@amentra.com] 
Sent: Thursday, October 20, 2011 10:12 AM
To: solr-user@lucene.apache.org
Subject: how to handle large relational data in Solr

All,

We are attempting to convert a fairly large relational database into Solr 
index(es).

There are ~100,000 products with ~1,000,000 accessories that can be related to 
any number of the products.  So if I include the search terms and the 
relationships in the same index, we're looking at a pretty huge index.

If we break it out into three indexes, one for the product search, one for the 
accessories search, and one for their relationship, is there a good way to 
merge the results?

Is there a better way to structure the indexes?

We will have a relational database available if it makes sense to do some sort 
of a hybrid approach.

many thanks,
Jonathan



RE: Find Documents with field = maxValue

2011-10-18 Thread Brandon Ramirez
I don't know anything about your environment, so this may not make sense, but 
you could check your source system (database or whatnot) for the max_age, then 
search for that value in your Solr index.

It's not as elegant, but it may be a lot easier.

To reduce the risk of interacting with potentially stale data, you may want to 
change your = to >= or whatever is appropriate.
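
For example, a sketch of the two steps (table, field, and value are made up):

-- in the source database
SELECT MAX(age) FROM people;  -- say it returns 97

# then in Solr, with a range query to guard against staleness
q=age:[97 TO *]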


Brandon Ramirez | Office: 585.214.5413 | Fax: 585.295.4848 
Software Engineer II | Element K | www.elementk.com


-Original Message-
From: Alireza Salimi [mailto:alireza.sal...@gmail.com] 
Sent: Tuesday, October 18, 2011 10:15 AM
To: solr-user@lucene.apache.org
Subject: Re: Find Documents with field = maxValue

Hi Ahmet,

Thanks for your reply, but I want ALL documents with age = max_age.


On Tue, Oct 18, 2011 at 9:59 AM, Ahmet Arslan  wrote:

>
>
> --- On Tue, 10/18/11, Alireza Salimi  wrote:
>
> > From: Alireza Salimi 
> > Subject: Find Documents with field = maxValue
> > To: solr-user@lucene.apache.org
> > Date: Tuesday, October 18, 2011, 4:10 PM Hi,
> >
> > It might be a naive question.
> > Assume we have a list of Documents, each containing the 
> > information of a person, with a numeric field named 'age'.  How 
> > can we find those Documents whose
> > *age* field
> > is *max(age)* in one query?
>
> Maybe http://wiki.apache.org/solr/StatsComponent?
>
> Or sort by age?  q=*:*&start=0&rows=1&sort=age desc
>



--
Alireza Salimi
Java EE Developer


Re: Replication with an HA master

2011-10-13 Thread Brandon Ramirez
I have the luxury of JMS in my environment, so that may be a simple way to 
solve this...
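
For example, a rough sketch of a JMS consumer that drains an indexing queue 
into the master via SolrJ (the class, queue fields, and URL are illustrative 
assumptions; 3.x-era SolrJ API):

import javax.jms.MapMessage;
import javax.jms.Message;
import javax.jms.MessageListener;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Rough sketch: buffer indexing requests in a JMS queue in front of Solr.
public class SolrIndexListener implements MessageListener {

    private final SolrServer solr;

    public SolrIndexListener(String masterUrl) throws Exception {
        // e.g. a load-balanced VIP in front of the masters: "http://master-vip:8080/solr"
        this.solr = new CommonsHttpSolrServer(masterUrl);
    }

    public void onMessage(Message message) {
        try {
            MapMessage m = (MapMessage) message;
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", m.getString("id"));
            doc.addField("name", m.getString("name"));
            solr.add(doc);
            // rely on autoCommit on the master instead of committing per message
        } catch (Exception e) {
            // rethrow so the broker redelivers; nothing is lost while the master is down
            throw new RuntimeException(e);
        }
    }
}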

Sent from my iPhone

On Oct 13, 2011, at 4:02 PM, "Robert Stewart"  wrote:

> Yes that is a good point.  Thanks.
> 
> I think I will avoid using NAS/SAN and use two masters, one set up as a 
> repeater (slave and master).  In the rare case of a master failure, some minor 
> manual intervention will be required to re-configure the remaining master or 
> bring the other one back up.
> 
> My only concern in that case is losing new documents from the SolrJ client, 
> since there is no broker/buffer/queue between the SolrJ client and the Solr 
> master.  It would be nice if there were some open source broker/queue that 
> could sit between SolrJ and Solr and queue up messages (publish/subscribe).
> 
> Bob
> 
> On Oct 13, 2011, at 3:56 PM, Jaeger, Jay - DOT wrote:
> 
>> One thing to consider is the case where the JVM is up, but the system is 
>> otherwise unavailable (say, a NIC failure, firewall failure, load balancer 
>> failure) - especially if you use a SAN (whose connection is different from 
>> the normal network).
>> 
>> In such a case the old master might have uncommitted updates.
>> 
>> JRJ
>> 
>> -Original Message-
>> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] 
>> Sent: Tuesday, October 11, 2011 3:17 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Replication with an HA master
>> 
>> Hello,
>> - Original Message -
>> 
>>> From: Robert Stewart 
>>> To: solr-user@lucene.apache.org
>>> Cc: 
>>> Sent: Tuesday, October 11, 2011 3:37 PM
>>> Subject: Re: Replication with an HA master
>>> 
>>> In the case of using a shared (SAN) index between 2 masters, what happens 
>>> if the live master fails in such a way that the index remains "locked" 
>>> (e.g., a hardware failure where it did not unlock/close the index)?  Will 
>>> the other master be able to open/write to the index as new documents are 
>>> added?
>> 
>> 
>> You'd use native locks, which should disappear if the JVM dies.  If the lock 
>> does not disappear, I'm not 100% sure what happens, but in the worst case a 
>> quick manual (or scripted) intervention would be needed.  But your index 
>> would be up to date!
>> 
>>> Also, if that can work ok, would it work if you have a LB (VIP) from both 
>>> indexing and replication sides of the 2 masters, such that some VIP used by 
>>> solrj for indexing new documents via HTTP, and the same VIP used by slave 
>>> searchers for replication?  That sounds like it would work.
>> 
>> 
>> Precisely what you should do.  e.g. "master-vip" is the "hostname" that both 
>> SolrJ would post new docs to and the master "server" slaves would poll for 
>> index changes.
>> 
>> Otis
>> 
>> 
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>> Lucene ecosystem search :: http://search-lucene.com/
>> 
>> 
>> 
>> 
>>> On Oct 11, 2011, at 3:16 PM, Otis Gospodnetic wrote:
>>> 
>>>> Hello,
>>>> 
>>>> Yes, you've read about NFS, which is why I gave the example of a SAN 
>>> (which can have multiple power supplies, controllers, etc.)
>>>> 
>>>> Yes, it should be OK to have multiple Solr instances holding the same 
>>>> index open, since only one of them will actually be writing to it, thanks 
>>>> to the LB.
>>>> 
>>>> Otis
>>>> 
>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>>>> Lucene ecosystem search :: http://search-lucene.com/
>>>> 
>>>> 
>>>>> 
>>>>> From: Brandon Ramirez 
>>>>> To: "solr-user@lucene.apache.org" 
>>> 
>>>>> Sent: Tuesday, October 11, 2011 2:55 PM
>>>>> Subject: RE: Replication with an HA master
>>>>> 
>>>>> Using a shared volume crossed my mind too, but I discarded the idea 
>>> because of literature I have read about Lucene performing poorly against 
>>> remote 
>>> file systems.  But then I suppose a SAN wouldn't be a remote file system in 
>>> the same sense as an NFS-mounted NAS or similar.
>>>>> 
>>>>> Should I be concerned about two solr instances on two machines having 
>>> the same SAN-based index open, as long as only one of them is receiving 
>>> 

RE: Replication with an HA master

2011-10-11 Thread Brandon Ramirez
Using a shared volume crossed my mind too, but I discarded the idea because of 
literature I have read about Lucene performing poorly against remote file 
systems.  But then I suppose a SAN wouldn't be a remote file system in the same 
sense as an NFS-mounted NAS or similar.

Should I be concerned about two Solr instances on two machines having the same 
SAN-based index open, as long as only one of them is receiving requests?  I 
would think that in theory it would work, but I don't have any production-level 
experience with Solr yet, only textbook knowledge.


Brandon Ramirez | Office: 585.214.5413 | Fax: 585.295.4848 
Software Engineer II | Element K | www.elementk.com


-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] 
Sent: Tuesday, October 11, 2011 2:28 PM
To: solr-user@lucene.apache.org
Subject: Re: Replication with an HA master

A few alternatives:
* Have the master keep the index on a shared disk (e.g. SAN); a sketch of the 
relevant lock setting follows this list
* Use the LB to easily switch between masters, potentially even automatically 
if the LB can detect that the primary is down
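
For the shared-disk option, a sketch of the lock setting in a 3.x-era 
solrconfig.xml:

<indexDefaults>
  <!-- native OS locks are released automatically if the JVM dies -->
  <lockType>native</lockType>
</indexDefaults>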

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem 
search :: http://search-lucene.com/


>
>From: Robert Stewart 
>To: solr-user@lucene.apache.org
>Sent: Friday, October 7, 2011 10:22 AM
>Subject: Re: Replication with an HA master
>
>Your idea sounds like the correct path.  Set up 2 masters, one running 
>in "slave" mode which pulls replicas from the live master.  When/if the live 
>master goes down, you just reconfigure and restart the backup master to be the 
>live master.  You'd also then need to start data import on the backup master 
>(enable/start a cron job?) and redirect slave searchers to pull replicas from 
>the new live master.  All of that could be scripted, possibly with something 
>like Puppet.
>
>Another option may be to run 2 "live" masters, which both index the same 
>content from the same data source.  If one goes down, then you just need to 
>redirect slave searchers to the backup master for replication.
>
>I am also starting a similar project which needs some disaster recovery 
>processes in place, so any other info would be useful to me as well.
>
>Bob
>
>On Oct 7, 2011, at 9:53 AM, Brandon Ramirez wrote:
>
>> We are getting ready to start a project using Solr as our backend search 
>> engine and I am trying to devise a deployment architecture that works for 
>> us.  We definitely need a master/slave replication strategy, that's for 
>> sure, but my concern is the master becomes a single point of failure.
>> 
>> Fortunately, real-time search is not a requirement for us.  If search 
>> results are a few minutes out of sync with our database, it's not a big deal.
>> 
>> So what I would like to do is have a set of query servers (slaves) that are 
>> only used for querying, not indexing, and have them use Solr's HTTP 
>> replication mechanism on a 2 or 3 minute interval.  To get HA indexing, I'd 
>> like to have 2 masters: a primary and a standby.  All indexing requests go 
>> to the primary unless it's taken out of service.  To keep the standby ready 
>> to take over, it needs to be more up to date than the slaves; I'd like to 
>> have it replicate every 30 seconds or so.
>> 
>> The reason I'm asking about it on this list is that I haven't seen any Solr 
>> documentation or even anything that talks about this.  I can't be the only 
>> one concerned about having a single point of failure, so I'm reaching out to 
>> see what others have done in this case before I go with my own solution.
>> 
>> 
>> Brandon Ramirez | Office: 585.214.5413 | Fax: 585.295.4848 Software 
>> Engineer II | Element K | www.elementk.com<http://www.elementk.com/>
>> 
>
>
>
>



RE: searching documents partially

2011-10-10 Thread Brandon Ramirez
I may not be understanding the question correctly, but I think the dismax 
parser would solve this, since you can specify the fields you want to search 
against.  You would just need a pre-login field list and a post-login field 
list in your application logic.  Or, as pravesh suggested, create multiple 
search handlers in Solr and give each a different set of default or invariant 
fields.
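
For example, a sketch of two such handlers in solrconfig.xml (handler and 
field names are assumptions):

<requestHandler name="/search-public" class="solr.SearchHandler">
  <lst name="invariants">
    <str name="defType">dismax</str>
    <str name="qf">title content_public</str>
  </lst>
</requestHandler>

<requestHandler name="/search-member" class="solr.SearchHandler">
  <lst name="invariants">
    <str name="defType">dismax</str>
    <str name="qf">title content</str>
  </lst>
</requestHandler>

Putting qf in invariants rather than defaults keeps clients from overriding 
the field list.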


Brandon Ramirez | Office: 585.214.5413 | Fax: 585.295.4848 
Software Engineer II | Element K | www.elementk.com


-Original Message-
From: pravesh [mailto:suyalprav...@yahoo.com] 
Sent: Monday, October 10, 2011 5:32 AM
To: solr-user@lucene.apache.org
Subject: Re: searching documents partially

Can you clarify the following:

1)  Do you want to hide some documents from search when the user is not 
logged in?
 OR
2)  Do you want to hide some fields of some documents from search when the 
user is not logged in?

For point 2, one solution is to index the same field(s) twice.  Let's say 
content and content_dup.  (Similar treatment can be given to any other fields 
that require this restriction.)

At index time you populate content for all documents, but populate content_dup 
only for the documents/fields you want to appear in search when the user is 
not logged in.

At search time, when the user is not logged in, search on the content_dup 
field; when the user is logged in, search on the content field.
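
A sketch of the two fields in schema.xml (names and type are assumptions; the 
application, not a copyField, would populate content_dup selectively):

<field name="content"     type="text" indexed="true" stored="true"/>
<field name="content_dup" type="text" indexed="true" stored="false"/>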


Another way could be:

Just register your search handler under another name and change the default 
search fields, etc.

I don't know how much this helps :)

--
View this message in context: 
http://lucene.472066.n3.nabble.com/searching-documents-partially-tp3408429p3409022.html
Sent from the Solr - User mailing list archive at Nabble.com.



Replication with an HA master

2011-10-07 Thread Brandon Ramirez
We are getting ready to start a project using Solr as our backend search engine, 
and I am trying to devise a deployment architecture that works for us.  We 
definitely need a master/slave replication strategy, but my concern is that the 
master becomes a single point of failure.

Fortunately, real-time search is not a requirement for us.  If search results 
are a few minutes out of sync with our database, it's not a big deal.

So what I would like to do is have a set of query servers (slaves) that are 
only used for querying, not indexing, and have them use Solr's HTTP replication 
mechanism on a 2 or 3 minute interval.  To get HA indexing, I'd like to have 2 
masters: a primary and a standby.  All indexing requests go to the primary 
unless it's taken out of service.  To keep the standby ready to take over, it 
needs to be more up to date than the slaves; I'd like to have it replicate 
every 30 seconds or so.
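
A sketch of the 1.4/3.x-style replication configuration for this layout (host 
names, conf files, and intervals are assumptions):

<!-- on the master -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- on each slave; the standby would poll every 30 seconds,
     the query slaves every 2-3 minutes -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master:8080/solr/replication</str>
    <str name="pollInterval">00:00:30</str>
  </lst>
</requestHandler>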

The reason I'm asking about it on this list is that I haven't seen any Solr 
documentation or even anything that talks about this.  I can't be the only one 
concerned about having a single point of failure, so I'm reaching out to see 
what others have done in this case before I go with my own solution.


Brandon Ramirez | Office: 585.214.5413 | Fax: 585.295.4848
Software Engineer II | Element K | www.elementk.com<http://www.elementk.com/>