Re: Using Multiple Cores for Multiple Users

2010-11-09 Thread Dennis Gearon
So, if my other filter/selection criteria select some subset of the whole index
that spans, say, 50% to 60% relevance, the subset still gets ordered by
relevance, and each item's position in the returned set is still based on its
relevance relative to that set, right? That would only be a problem if some
minimum relevance were desired, right?

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a better
idea to learn from others’ mistakes, so you do not have to make them yourself.
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



- Original Message 
From: Lance Norskog 
To: solr-user@lucene.apache.org
Sent: Tue, November 9, 2010 8:00:09 PM
Subject: Re: Using Multiple Cores for Multiple Users

Relevance is TF-IDF: TF is the number of times the term appears in the
document, and IDF is based on DF, the number of documents in the index
that contain the term.

There is no quick calculation for "total frequency for terms only in
these documents". Facets do this, and they're very, very slow.

On Tue, Nov 9, 2010 at 7:50 PM, Dennis Gearon  wrote:
> Hm, relevance is computed before filtering, probably during indexing?
>  Dennis Gearon
>
>
>
>
>
> - Original Message 
> From: Lance Norskog 
> To: solr-user@lucene.apache.org
> Sent: Tue, November 9, 2010 7:07:45 PM
> Subject: Re: Using Multiple Cores for Multiple Users
>
> There is a standard problem with this: relevance is determined from
> all of the words in a field of all documents, not just the documents
> that match the query. That is, when user A searches for 'monkeys' and
> one of his feeds has a document with this word, but someone else is a
> zoophile, 'monkeys' will be a common word in the index. This will skew
> the relevance computation for user A.
>
> You could have a separate text field for each user. This might work
> better- but you can't use field norms (they take up space for all
> documents).
>
> Lance
>
> On Tue, Nov 9, 2010 at 6:00 PM, Adam Estrada
>  wrote:
>> Thanks a lot for all the tips, guys! I think that we may explore both
>> options just to see what happens. I'm sure that scalability will be a huge
>> mess with the core-per-user scenario. I like the idea of creating a user ID
>> field and agree that it's probably the best approach. We'll see...I will be
>> sure to let the list know what I find! Please don't stop posting your
>> comments everyone ;-) My inquiring mind wants to know...
>>
>> Adam
>>
>> On Tue, Nov 9, 2010 at 7:34 PM, Jonathan Rochkind  wrote:
>>
>>> If storing in a single index (possibly sharded if you need it), you can
>>> simply include a solr field that specifies the user ID of the saved thing.
>>> On the client side, in your application, simply ensure that there is an fq
>>> parameter limiting to the current user, if you want to limit to the current
>>> user's stuff. Relevancy ranking should work just as if you had 'separate
>>> cores', there is no relevancy issue.
>>>
>>> It IS true that when your index gets very large, commits will start taking
>>> longer, which can be a problem. I don't mean commits will take longer just
>>> because there is more stuff to commit -- the larger the index, the longer an
>>> update to a single document will take to commit.
>>>
>>> In general, I suspect that having dozens or hundreds (or thousands!) of
>>> cores is not going to scale well; it is not going to make good use of your
>>> CPU/RAM/HD resources. Not really the intended use case of multiple cores.
>>>
>>> However, you are probably going to run into some issues with the single
>>> index approach too. In general, how to deal with "multi-tenancy" in Solr is
>>> an oft-asked question for which there doesn't seem to be any "just works and
>>> does everything for you without needing to think about it" solution, judging
>>> from past threads. I am not a Solr developer or expert.
>>>
>>> 
>>> From: Markus Jelsma [markus.jel...@openindex.io]
>>> Sent: Tuesday, November 09, 2010 6:57 PM
>>> To: solr-user@lucene.apache.org
>>> Cc: Adam Estrada
>>> Subject: Re: Using Multiple Cores for Multiple Users
>>>
>>> Hi,
>>>
>>> > All,
>>> >
>>> > I have a web application that requires the user to register and then
>>> login
>>> > to gain access to the site. Pretty standard stuff...Now I would like to
>>> > know what the best approach would be to implement a "customized" search
>>> > experience for each user. Would this mean creating a separate core per
>>> > user? I think that this is not possible without restarting Solr after
>>> each
>>> > core is added 
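[A toy sketch of the point Dennis is asking about, in plain Python rather than Solr code: a filter only removes documents, it does not rescore them, so the relative order of the surviving documents matches their order in the full ranking. This models the fq behavior described in the thread; the scores and document names are made up for illustration.]

```python
# Hypothetical relevance scores for five documents (made-up numbers).
scored = {"d1": 0.61, "d2": 0.58, "d3": 0.55, "d4": 0.50, "d5": 0.40}

def filtered_ranking(scores, keep):
    """Rank only the documents that pass the filter, by their original scores."""
    return sorted((d for d in scores if d in keep), key=scores.get, reverse=True)

full = filtered_ranking(scored, scored.keys())          # no filter applied
subset = filtered_ranking(scored, {"d2", "d4", "d5"})   # filtered result set

# The subset keeps the same relative order it had in the full ranking.
assert [d for d in full if d in {"d2", "d4", "d5"}] == subset
```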

Re: scheduling imports and heartbeats

2010-11-09 Thread Ranveer Kumar
You should use cron for that.

On 10 Nov 2010 08:47, "Tri Nguyen"  wrote:

Hi,

Can I configure solr to schedule imports at a specified time (say once a
day,
once an hour, etc)?

Also, does solr have some sort of heartbeat mechanism?

Thanks,

Tri
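[Solr of this era has no built-in scheduler, so cron plus the DataImportHandler's HTTP commands is the usual approach, and the ping handler doubles as a heartbeat. A sketch under the assumption that DIH is registered at /solr/dataimport and the ping handler at /solr/admin/ping, the common example-config paths; host, port, and the alert address are placeholders.]

```shell
# crontab entries: full import nightly at 2am, delta import hourly
0 2 * * *  curl -s 'http://localhost:8983/solr/dataimport?command=full-import'
0 * * * *  curl -s 'http://localhost:8983/solr/dataimport?command=delta-import'

# "heartbeat": the ping handler returns OK while the core is healthy
* * * * *  curl -sf 'http://localhost:8983/solr/admin/ping' > /dev/null || \
           echo 'solr ping failed' | mail -s 'solr down' ops@example.com
```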


Re: Using Multiple Cores for Multiple Users

2010-11-09 Thread Lance Norskog
Relevance is TF-IDF: TF is the number of times the term appears in the
document, and IDF is based on DF, the number of documents in the index
that contain the term.

There is no quick calculation for "total frequency for terms only in
these documents". Facets do this, and they're very, very slow.

On Tue, Nov 9, 2010 at 7:50 PM, Dennis Gearon  wrote:
> Hm, relevance is computed before filtering, probably during indexing?
>  Dennis Gearon
>
>
>
>
>
> - Original Message 
> From: Lance Norskog 
> To: solr-user@lucene.apache.org
> Sent: Tue, November 9, 2010 7:07:45 PM
> Subject: Re: Using Multiple Cores for Multiple Users
>
> There is a standard problem with this: relevance is determined from
> all of the words in a field of all documents, not just the documents
> that match the query. That is, when user A searches for 'monkeys' and
> one of his feeds has a document with this word, but someone else is a
> zoophile, 'monkeys' will be a common word in the index. This will skew
> the relevance computation for user A.
>
> You could have a separate text field for each user. This might work
> better- but you can't use field norms (they take up space for all
> documents).
>
> Lance
>
> On Tue, Nov 9, 2010 at 6:00 PM, Adam Estrada
>  wrote:
>> Thanks a lot for all the tips, guys! I think that we may explore both
>> options just to see what happens. I'm sure that scalability will be a huge
>> mess with the core-per-user scenario. I like the idea of creating a user ID
>> field and agree that it's probably the best approach. We'll see...I will be
>> sure to let the list know what I find! Please don't stop posting your
>> comments everyone ;-) My inquiring mind wants to know...
>>
>> Adam
>>
>> On Tue, Nov 9, 2010 at 7:34 PM, Jonathan Rochkind  wrote:
>>
>>> If storing in a single index (possibly sharded if you need it), you can
>>> simply include a solr field that specifies the user ID of the saved thing.
>>> On the client side, in your application, simply ensure that there is an fq
>>> parameter limiting to the current user, if you want to limit to the current
>>> user's stuff. Relevancy ranking should work just as if you had 'separate
>>> cores', there is no relevancy issue.
>>>
>>> It IS true that when your index gets very large, commits will start taking
>>> longer, which can be a problem. I don't mean commits will take longer just
>>> because there is more stuff to commit -- the larger the index, the longer an
>>> update to a single document will take to commit.
>>>
>>> In general, I suspect that having dozens or hundreds (or thousands!) of
>>> cores is not going to scale well; it is not going to make good use of your
>>> CPU/RAM/HD resources. Not really the intended use case of multiple cores.
>>>
>>> However, you are probably going to run into some issues with the single
>>> index approach too. In general, how to deal with "multi-tenancy" in Solr is
>>> an oft-asked question for which there doesn't seem to be any "just works and
>>> does everything for you without needing to think about it" solution, judging
>>> from past threads. I am not a Solr developer or expert.
>>>
>>> 
>>> From: Markus Jelsma [markus.jel...@openindex.io]
>>> Sent: Tuesday, November 09, 2010 6:57 PM
>>> To: solr-user@lucene.apache.org
>>> Cc: Adam Estrada
>>> Subject: Re: Using Multiple Cores for Multiple Users
>>>
>>> Hi,
>>>
>>> > All,
>>> >
>>> > I have a web application that requires the user to register and then
>>> login
>>> > to gain access to the site. Pretty standard stuff...Now I would like to
>>> > know what the best approach would be to implement a "customized" search
>>> > experience for each user. Would this mean creating a separate core per
>>> > user? I think that this is not possible without restarting Solr after
>>> each
>>> > core is added to the multi-core xml file, right?
>>>
>>> No, you can dynamically manage cores and parts of their configuration.
>>> Sometimes you must reindex after a change, the same is true for reloading
>>> cores. Check the wiki on this one [1].
>>>
>>> >
>>> > My use case is this...User A would like to index 5 RSS feeds and User B
>>> > would like to index 5 completely different RSS feeds and he is not
>>> > interested at all in what User A is interested in. This means that they
>>> > would have to be separate index cores, right?
>>>
>>> If you view the documents within an RSS feed as separate documents, you can
>>> assign a user ID to each, creating a multi-user index with RSS documents per
>>> user, or per group, or whatever.
>>>
>>> Having a core per user isn't a good idea if you have many users.  It takes
>>> up
>>> additional memory and disk space, do
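[Lance's TF-IDF point above can be sketched in a few lines. This is a toy model, not Lucene's actual Similarity formula: the shape of it is that DF is counted over every document in the shared index, so another user's vocabulary shifts a term's weight for everyone. The corpora and the smoothing in the IDF are made up for illustration.]

```python
import math

def tf_idf(term, doc, corpus):
    """Toy TF-IDF: not Lucene's formula, just its shape."""
    tf = doc.count(term)                        # term frequency in this document
    df = sum(term in d for d in corpus)         # documents containing the term
    idf = math.log(len(corpus) / (1 + df)) + 1  # rarer terms weigh more
    return tf * idf

# User A's feeds mention 'monkeys' once; user B's feeds mention it constantly.
user_a = [["monkeys", "escaped"], ["parrots"], ["zebras"], ["tigers"]]
user_b = [["monkeys"] * 5, ["monkeys", "monkeys", "habitat"]]

alone = tf_idf("monkeys", user_a[0], user_a)            # A's documents only
shared = tf_idf("monkeys", user_a[0], user_a + user_b)  # one shared index
assert shared < alone  # B's documents dilute the term's weight for user A
```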

Re: Using Multiple Cores for Multiple Users

2010-11-09 Thread Dennis Gearon
Hm, relevance is computed before filtering, probably during indexing?
 Dennis Gearon 





- Original Message 
From: Lance Norskog 
To: solr-user@lucene.apache.org
Sent: Tue, November 9, 2010 7:07:45 PM
Subject: Re: Using Multiple Cores for Multiple Users

There is a standard problem with this: relevance is determined from
all of the words in a field of all documents, not just the documents
that match the query. That is, when user A searches for 'monkeys' and
one of his feeds has a document with this word, but someone else is a
zoophile, 'monkeys' will be a common word in the index. This will skew
the relevance computation for user A.

You could have a separate text field for each user. This might work
better- but you can't use field norms (they take up space for all
documents).

Lance

On Tue, Nov 9, 2010 at 6:00 PM, Adam Estrada
 wrote:
> Thanks a lot for all the tips, guys! I think that we may explore both
> options just to see what happens. I'm sure that scalability will be a huge
> mess with the core-per-user scenario. I like the idea of creating a user ID
> field and agree that it's probably the best approach. We'll see...I will be
> sure to let the list know what I find! Please don't stop posting your
> comments everyone ;-) My inquiring mind wants to know...
>
> Adam
>
> On Tue, Nov 9, 2010 at 7:34 PM, Jonathan Rochkind  wrote:
>
>> If storing in a single index (possibly sharded if you need it), you can
>> simply include a solr field that specifies the user ID of the saved thing.
>> On the client side, in your application, simply ensure that there is an fq
>> parameter limiting to the current user, if you want to limit to the current
>> user's stuff. Relevancy ranking should work just as if you had 'separate
>> cores', there is no relevancy issue.
>>
>> It IS true that when your index gets very large, commits will start taking
>> longer, which can be a problem. I don't mean commits will take longer just
>> because there is more stuff to commit -- the larger the index, the longer an
>> update to a single document will take to commit.
>>
>> In general, I suspect that having dozens or hundreds (or thousands!) of
>> cores is not going to scale well; it is not going to make good use of your
>> CPU/RAM/HD resources. Not really the intended use case of multiple cores.
>>
>> However, you are probably going to run into some issues with the single
>> index approach too. In general, how to deal with "multi-tenancy" in Solr is
>> an oft-asked question for which there doesn't seem to be any "just works and
>> does everything for you without needing to think about it" solution, judging
>> from past threads. I am not a Solr developer or expert.
>>
>> 
>> From: Markus Jelsma [markus.jel...@openindex.io]
>> Sent: Tuesday, November 09, 2010 6:57 PM
>> To: solr-user@lucene.apache.org
>> Cc: Adam Estrada
>> Subject: Re: Using Multiple Cores for Multiple Users
>>
>> Hi,
>>
>> > All,
>> >
>> > I have a web application that requires the user to register and then
>> login
>> > to gain access to the site. Pretty standard stuff...Now I would like to
>> > know what the best approach would be to implement a "customized" search
>> > experience for each user. Would this mean creating a separate core per
>> > user? I think that this is not possible without restarting Solr after
>> each
>> > core is added to the multi-core xml file, right?
>>
>> No, you can dynamically manage cores and parts of their configuration.
>> Sometimes you must reindex after a change, the same is true for reloading
>> cores. Check the wiki on this one [1].
>>
>> >
>> > My use case is this...User A would like to index 5 RSS feeds and User B
>> > would like to index 5 completely different RSS feeds and he is not
>> > interested at all in what User A is interested in. This means that they
>> > would have to be separate index cores, right?
>>
>> If you view the documents within an RSS feed as separate documents, you can
>> assign a user ID to each, creating a multi-user index with RSS documents per
>> user, or per group, or whatever.
>>
>> Having a core per user isn't a good idea if you have many users. It takes up
>> additional memory and disk space, doesn't share caches, etc. There is also
>> more maintenance, and you need some support scripts to dynamically create new
>> cores - Solr currently doesn't create a new core directory structure.
>>
>> But reindexing a very large index takes a lot more time and resources, and
>> relevancy might be an issue depending on the RSS feeds' contents.
>>
>> >
>> > What is the best approach for this kind of thing?
>>
>> I'd usually store 

scheduling imports and heartbeats

2010-11-09 Thread Tri Nguyen
Hi,
 
Can I configure solr to schedule imports at a specified time (say once a day, 
once an hour, etc)?
 
Also, does solr have some sort of heartbeat mechanism?
 
Thanks,
 
Tri

Re: Highlighter - multiple instances of term being combined

2010-11-09 Thread Lance Norskog
Have you looked at solr/admin/analysis.jsp? That's the 'Analysis' link
off the main Solr admin page. It will show you how text is broken up
for both the indexing and query processes. You might get some insight
into how these words are torn apart and assigned positions. Trying
the different Analyzers and options might get you there.

But to be frank: highlighting is a tough problem and has always had a
lot of edge cases.

On Tue, Nov 9, 2010 at 6:08 PM, Sasank Mudunuri  wrote:
> I'm finding that if a keyword appears in a field multiple times very close
> together, it will get highlighted as a phrase even though there are other
> terms between the two instances. So this search:
>
> http://localhost:8983/solr/select/?
>
> hl=true&
> hl.snippets=1&
> q=residue&
> hl.fragsize=0&
> mergeContiguous=false&
> indent=on&
> hl.usePhraseHighlighter=false&
> debugQuery=on&
> hl.fragmenter=gap&
> hl.highlightMultiTerm=false
>
> Highlights as:
> What does "low-residue" mean? Like low-residue diet?
>
> Trying to get it to highlight as:
> What does "low-residue" mean? Like low-residue diet?
> I've tried playing with various combinations of mergeContiguous,
> highlightMultiTerm, and usePhraseHighlighter, but they all yield the same
> output.
>
> For reference, field type uses a StandardTokenizerFactory and
> SynonymFilterFactory, StopFilterFactory, StandardFilterFactory and
> SnowballFilterFactory. I've confirmed that the intermediate words don't
> appear in either the synonym or the stop words list. I can post the full
> definition if helpful.
>
> Any pointers as to how to debug this would be greatly appreciated!
> sasank
>



-- 
Lance Norskog
goks...@gmail.com


Re: Using Multiple Cores for Multiple Users

2010-11-09 Thread Lance Norskog
There is a standard problem with this: relevance is determined from
all of the words in a field of all documents, not just the documents
that match the query. That is, when user A searches for 'monkeys' and
one of his feeds has a document with this word, but someone else is a
zoophile, 'monkeys' will be a common word in the index. This will skew
the relevance computation for user A.

You could have a separate text field for each user. This might work
better- but you can't use field norms (they take up space for all
documents).

Lance

On Tue, Nov 9, 2010 at 6:00 PM, Adam Estrada
 wrote:
> Thanks a lot for all the tips, guys! I think that we may explore both
> options just to see what happens. I'm sure that scalability will be a huge
> mess with the core-per-user scenario. I like the idea of creating a user ID
> field and agree that it's probably the best approach. We'll see...I will be
> sure to let the list know what I find! Please don't stop posting your
> comments everyone ;-) My inquiring mind wants to know...
>
> Adam
>
> On Tue, Nov 9, 2010 at 7:34 PM, Jonathan Rochkind  wrote:
>
>> If storing in a single index (possibly sharded if you need it), you can
>> simply include a solr field that specifies the user ID of the saved thing.
>> On the client side, in your application, simply ensure that there is an fq
>> parameter limiting to the current user, if you want to limit to the current
>> user's stuff. Relevancy ranking should work just as if you had 'separate
>> cores', there is no relevancy issue.
>>
>> It IS true that when your index gets very large, commits will start taking
>> longer, which can be a problem. I don't mean commits will take longer just
>> because there is more stuff to commit -- the larger the index, the longer an
>> update to a single document will take to commit.
>>
>> In general, I suspect that having dozens or hundreds (or thousands!) of
>> cores is not going to scale well; it is not going to make good use of your
>> CPU/RAM/HD resources. Not really the intended use case of multiple cores.
>>
>> However, you are probably going to run into some issues with the single
>> index approach too. In general, how to deal with "multi-tenancy" in Solr is
>> an oft-asked question for which there doesn't seem to be any "just works and
>> does everything for you without needing to think about it" solution, judging
>> from past threads. I am not a Solr developer or expert.
>>
>> 
>> From: Markus Jelsma [markus.jel...@openindex.io]
>> Sent: Tuesday, November 09, 2010 6:57 PM
>> To: solr-user@lucene.apache.org
>> Cc: Adam Estrada
>> Subject: Re: Using Multiple Cores for Multiple Users
>>
>> Hi,
>>
>> > All,
>> >
>> > I have a web application that requires the user to register and then
>> login
>> > to gain access to the site. Pretty standard stuff...Now I would like to
>> > know what the best approach would be to implement a "customized" search
>> > experience for each user. Would this mean creating a separate core per
>> > user? I think that this is not possible without restarting Solr after
>> each
>> > core is added to the multi-core xml file, right?
>>
>> No, you can dynamically manage cores and parts of their configuration.
>> Sometimes you must reindex after a change, the same is true for reloading
>> cores. Check the wiki on this one [1].
>>
>> >
>> > My use case is this...User A would like to index 5 RSS feeds and User B
>> > would like to index 5 completely different RSS feeds and he is not
>> > interested at all in what User A is interested in. This means that they
>> > would have to be separate index cores, right?
>>
>> If you view the documents within an RSS feed as separate documents, you can
>> assign a user ID to each, creating a multi-user index with RSS documents per
>> user, or per group, or whatever.
>>
>> Having a core per user isn't a good idea if you have many users. It takes up
>> additional memory and disk space, doesn't share caches, etc. There is also
>> more maintenance, and you need some support scripts to dynamically create new
>> cores - Solr currently doesn't create a new core directory structure.
>>
>> But reindexing a very large index takes a lot more time and resources, and
>> relevancy might be an issue depending on the RSS feeds' contents.
>>
>> >
>> > What is the best approach for this kind of thing?
>>
>> I'd usually store the feeds in a single index and shard if it's too many
>> for a
>> single server with your specifications. Unless the demands are too
>> specific.
>>
>> >
>> > Thanks in advance,
>> > Adam
>>
>> [1]: http://wiki.apache.org/solr/CoreAdmin
>>
>> Cheers
>>
>



-- 
Lance Norskog
goks...@gmail.com
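[Jonathan's single-index approach from the quoted thread can be sketched as below. This is a hedged illustration: the user_id field name, URL, and helper function are assumptions for the example, not anything from the thread's actual schemas. The key property is that fq restricts which documents come back without contributing to the score of those that match.]

```python
from urllib.parse import urlencode

def user_search(base_url, user_id, query, rows=10):
    """Build a Solr select URL that restricts results to one user's documents."""
    params = urlencode({
        "q": query,                  # scored as usual
        "fq": f"user_id:{user_id}",  # filter query: restricts the result set,
                                     # cached separately, does not affect scores
        "rows": rows,
        "wt": "json",
    })
    return f"{base_url}/select?{params}"

url = user_search("http://localhost:8983/solr", "user42", "monkeys")
assert "fq=user_id%3Auser42" in url
```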


Re: dynamically create unique key

2010-11-09 Thread Lance Norskog
Here is an exhausting and exhaustive discussion about picking a unique key:

http://wiki.apache.org/solr/UniqueKey




On Tue, Nov 9, 2010 at 4:20 PM, Dennis Gearon  wrote:
> Seems to me it would be a good idea to build into the Solr code a unique ID
> per instance or installation, or both, accessible either from Java or via a
> query. Kind of like what all the browsers do for their SSL connections.
>
> Then, it's automatically easy to implement what is described below.
>
> Maybe it should be written to the config file upon first run when it does not
> exist, and then any updates or reinstalls would reuse the same
> installation/instance ID.
>
>
>
> From: Christopher Gross 
> To: solr-user@lucene.apache.org
> Sent: Tue, November 9, 2010 11:37:03 AM
> Subject: Re: dynamically create unique key
>
> Thanks Hoss, I'll look into that!
>
> -- Chris
>
>
> On Tue, Nov 9, 2010 at 1:43 PM, Chris Hostetter 
> wrote:
>
>>
>> : one large index.  I need to create a unique key for the Solr index that
>> will
>> : be unique per document.  If I have 3 systems, and they all have a
>> document
>> : with id=1, then I need to create a "uniqueId" field in my schema that
>> : contains both the system name and that id, along the lines of: "sysa1",
>> : "sysb1", and "sysc1".  That way, each document will have a unique id.
>>
>> take a look at the SignatureUpdateProcessorFactory...
>>
>> http://wiki.apache.org/solr/Deduplication
>>
>> :   
>> :   
>>         ...
>> : So instead of just appending to the uniqueId field, it tried to do a
>> : multiValued.  Does anyone have an idea on how I can make this work?
>>
>> copyField doesn't "append", it copies Field (value) instances from the
>> "source" field to the "dest" field -- so you get multiple values for
>> the dest field.
>>
>>
>> -Hoss
>>
>
>



-- 
Lance Norskog
goks...@gmail.com
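[The "sysa1"-style composite key from the quoted thread is easiest to build client-side before documents are posted, since copyField produces multiple values rather than concatenating, as Hoss notes. A minimal sketch; the separator character is an assumption, chosen so that pairs like ("sys", "a1") and ("sysa", "1") cannot collide the way bare concatenation would.]

```python
def composite_id(system: str, local_id: str, sep: str = "!") -> str:
    """Build an index-wide unique key from a system name and its local doc id."""
    if sep in system:
        raise ValueError("separator must not appear in the system name")
    return f"{system}{sep}{local_id}"

docs = [("sysa", "1"), ("sysb", "1"), ("sysc", "1"), ("sys", "a1")]
ids = [composite_id(s, i) for s, i in docs]
# Unique even where bare concatenation would collide ("sysa"+"1" == "sys"+"a1").
assert len(set(ids)) == len(ids)
```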


Re: returning message to sender

2010-11-09 Thread Lance Norskog
David Smiley and Eric Pugh wrote a wonderful book on Solr:

http://www.lucidimagination.com/blog/2010/01/11/book-review-solr-packt-book/

Reading through this book and trying the examples will address all of
your questions.

On Tue, Nov 9, 2010 at 3:23 PM, Erick Erickson  wrote:
> Hmmm, this is a little murky.
> I'm inferring that you believe that DIH somehow
> queries the data source at #query# time, and this
> is not true.  DIH is an #index time# concept.
>
> DIH is used to add data to an index. Once that index is
> created, all searches against it are unaware that there
> were different data sources.
>
> So, with a single Solr schema, you can use DIH
> on as many different data sources as you want,
> mapping the various bits of information from each
> data source into your Solr schema. Searches go
> against fields defined in the schema, so you're
> automatically searching against all the databases
> (assuming you've mapped your data into your
> schema)
>
> If I've misunderstood, perhaps you can add some
> details?
>
> Best
> Erick
>
> On Tue, Nov 9, 2010 at 1:39 PM, Teki, Prasad <
> prasad_t...@standardandpoors.com> wrote:
>
>>
>> Hi guys,
>> I have been exploring Solr for the last few weeks. Our main intention is to
>> expose the data, as web services, across various data sources by linking
>> them using some scenario.
>>
>> I have a couple of questions.
>> Is there any good document/URL which answers...
>>
>> How does the indexing happen/get built for queries across different data
>> sources (DIH)?
>>
>> Does Lucene store the actual data of each individual query, or a
>> combination? If yes, where?
>>
>> Whenever we do a query against the built index, when exactly does it fire
>> the query to the database?
>>
>> How does the index get updates from the DIH? For example, if my query
>> includes 3 DIH sources, what is the max number of data sources I can
>> include to get better performance?
>>
>> How do we measure the scalability?
>>
>> Can I run these search engines in a grid mode?
>>
>> Thanks.
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Storage-tp1871155p1871155.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>



-- 
Lance Norskog
goks...@gmail.com


Re: solr init.d script

2010-11-09 Thread Lance Norskog
As many solrs as you want can open an index for read-only queries. If
you have a shared disk with a global file system, this could work very
well.

A note: Solr sessions are stateless. There is no reason to run Solr
under JBoss in fail-over mode with session replication.

On Tue, Nov 9, 2010 at 12:25 PM, Nikola Garafolic
 wrote:
> On 11/09/2010 07:00 PM, Israel Ekpo wrote:
>>
>> Yes.
>>
>> I recommend running Solr via a servlet container.
>>
>> It is much easier to manage compared to running it by itself.
>>
>> On Tue, Nov 9, 2010 at 10:03 AM, Nikola Garafolic
>> wrote:
>
> But in my case, that would make things more complex as I see it: two JBoss
> servers with Solr in a servlet container, and then I need the same data dir,
> right? I am now running a single Solr instance as a cluster service, with the
> data dir set to a shared LUN, so it can be started on either of the two hosts.
>
> Can you explain the benefits of running two Solr instances in a servlet
> container? Maybe more performance?
>
> Regards,
> Nikola
>
> --
> Nikola Garafolic
> SRCE, Sveucilisni racunski centar
> tel: +385 1 6165 804
> email: nikola.garafo...@srce.hr
>
>
>



-- 
Lance Norskog
goks...@gmail.com


Highlighter - multiple instances of term being combined

2010-11-09 Thread Sasank Mudunuri
I'm finding that if a keyword appears in a field multiple times very close
together, it will get highlighted as a phrase even though there are other
terms between the two instances. So this search:

http://localhost:8983/solr/select/?

hl=true&
hl.snippets=1&
q=residue&
hl.fragsize=0&
mergeContiguous=false&
indent=on&
hl.usePhraseHighlighter=false&
debugQuery=on&
hl.fragmenter=gap&
hl.highlightMultiTerm=false

Highlights as:
What does "low-residue" mean? Like low-residue diet?

Trying to get it to highlight as:
What does "low-residue" mean? Like low-residue diet?
I've tried playing with various combinations of mergeContiguous,
highlightMultiTerm, and usePhraseHighlighter, but they all yield the same
output.

For reference, field type uses a StandardTokenizerFactory and
SynonymFilterFactory, StopFilterFactory, StandardFilterFactory and
SnowballFilterFactory. I've confirmed that the intermediate words don't
appear in either the synonym or the stop words list. I can post the full
definition if helpful.

Any pointers as to how to debug this would be greatly appreciated!
sasank
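[One thing worth double-checking when reproducing the query above: Solr's highlighting parameters all take the hl. prefix, and mergeContiguous / highlightMultiTerm appear here without it, so they may be silently ignored. A hedged version of the same request with every highlighting parameter prefixed; the field name body is a placeholder, and host/port are the Solr example defaults.]

```shell
curl "http://localhost:8983/solr/select/?q=residue&indent=on&debugQuery=on\
&hl=true&hl.fl=body&hl.snippets=1&hl.fragsize=0&hl.fragmenter=gap\
&hl.mergeContiguous=false&hl.usePhraseHighlighter=false&hl.highlightMultiTerm=false"
```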


Re: Using Multiple Cores for Multiple Users

2010-11-09 Thread Adam Estrada
Thanks a lot for all the tips, guys! I think that we may explore both
options just to see what happens. I'm sure that scalability will be a huge
mess with the core-per-user scenario. I like the idea of creating a user ID
field and agree that it's probably the best approach. We'll see...I will be
sure to let the list know what I find! Please don't stop posting your
comments everyone ;-) My inquiring mind wants to know...

Adam

On Tue, Nov 9, 2010 at 7:34 PM, Jonathan Rochkind  wrote:

> If storing in a single index (possibly sharded if you need it), you can
> simply include a solr field that specifies the user ID of the saved thing.
> On the client side, in your application, simply ensure that there is an fq
> parameter limiting to the current user, if you want to limit to the current
> user's stuff.  Relevancy ranking should work just as if you had 'separate
> cores', there is no relevancy issue.
>
> It IS true that when your index gets very large, commits will start taking
> longer, which can be a problem. I don't mean commits will take longer just
> because there is more stuff to commit -- the larger the index, the longer an
> update to a single document will take to commit.
>
> In general, I suspect that having dozens or hundreds (or thousands!) of
> cores is not going to scale well, it is not going to make good use of your
> cpu/ram/hd resources.   Not really the intended use case of multiple cores.
>
> However, you are probably going to run into some issues with the single
> index approach too. In general, how to deal with "multi-tenancy" in Solr is
> an oft-asked question that there doesn't seem to be any "just works and does
> everything for you without needing to think about it" solution for in solr.
> Judging from past threads. I am not a Solr developer or expert.
>
> 
> From: Markus Jelsma [markus.jel...@openindex.io]
> Sent: Tuesday, November 09, 2010 6:57 PM
> To: solr-user@lucene.apache.org
> Cc: Adam Estrada
> Subject: Re: Using Multiple Cores for Multiple Users
>
> Hi,
>
> > All,
> >
> > I have a web application that requires the user to register and then
> login
> > to gain access to the site. Pretty standard stuff...Now I would like to
> > know what the best approach would be to implement a "customized" search
> > experience for each user. Would this mean creating a separate core per
> > user? I think that this is not possible without restarting Solr after
> each
> > core is added to the multi-core xml file, right?
>
> No, you can dynamically manage cores and parts of their configuration.
> Sometimes you must reindex after a change, the same is true for reloading
> cores. Check the wiki on this one [1].
>
> >
> > My use case is this...User A would like to index 5 RSS feeds and User B
> > would like to index 5 completely different RSS feeds and he is not
> > interested at all in what User A is interested in. This means that they
> > would have to be separate index cores, right?
>
> If you view documents within an rss feed as separate documents, you can
> assign a user ID to those documents, creating a multi user index with rss
> documents per user, or group or whatever.
>
> Having a core per user isn't a good idea if you have many users.  It takes
> up
> additional memory and disk space, doesn't share caches etc.  There is also
> more maintenance and you need some support scripts to dynamically create
> new
> cores - Solr currently doesn't create a new core directory structure.
>
> But, reindexing a very large index takes up a lot more time and resources
> and
> relevancy might be an issue depending on the rss feeds' contents.
>
> >
> > What is the best approach for this kind of thing?
>
> I'd usually store the feeds in a single index and shard if it's too many
> for a
> single server with your specifications. Unless the demands are too
> specific.
>
> >
> > Thanks in advance,
> > Adam
>
> [1]: http://wiki.apache.org/solr/CoreAdmin
>
> Cheers
>


RE: Using Multiple Cores for Multiple Users

2010-11-09 Thread Jonathan Rochkind
If storing in a single index (possibly sharded if you need it), you can simply 
include a solr field that specifies the user ID of the saved thing. On the 
client side, in your application, simply ensure that there is an fq parameter 
limiting to the current user, if you want to limit to the current user's stuff. 
 Relevancy ranking should work just as if you had 'separate cores', there is no 
relevancy issue. 
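As a sketch of the fq approach described here (the field name user_id and the value are hypothetical; adapt to your schema):

```
http://localhost:8983/solr/select?q=monkeys&fq=user_id:42
```

The fq is a filter query: it restricts the result set (and is cached independently of q), but it does not contribute to relevancy scoring, which is why ranking behaves as if each user had a separate core.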

It IS true that when your index gets very large, commits will start taking 
longer, which can be a problem. I don't mean commits will take longer just 
because there is more stuff to commit -- the larger the index, the longer an 
update to a single document will take to commit. 

In general, I suspect that having dozens or hundreds (or thousands!) of cores 
is not going to scale well, it is not going to make good use of your cpu/ram/hd 
resources.   Not really the intended use case of multiple cores. 

However, you are probably going to run into some issues with the single index 
approach too. In general, how to deal with "multi-tenancy" in Solr is an 
oft-asked question that there doesn't seem to be any "just works and does 
everything for you without needing to think about it" solution for in solr. 
Judging from past threads. I am not a Solr developer or expert. 


From: Markus Jelsma [markus.jel...@openindex.io]
Sent: Tuesday, November 09, 2010 6:57 PM
To: solr-user@lucene.apache.org
Cc: Adam Estrada
Subject: Re: Using Multiple Cores for Multiple Users

Hi,

> All,
>
> I have a web application that requires the user to register and then login
> to gain access to the site. Pretty standard stuff...Now I would like to
> know what the best approach would be to implement a "customized" search
> experience for each user. Would this mean creating a separate core per
> user? I think that this is not possible without restarting Solr after each
> core is added to the multi-core xml file, right?

No, you can dynamically manage cores and parts of their configuration.
Sometimes you must reindex after a change, the same is true for reloading
cores. Check the wiki on this one [1].

>
> My use case is this...User A would like to index 5 RSS feeds and User B
> would like to index 5 completely different RSS feeds and he is not
> interested at all in what User A is interested in. This means that they
> would have to be separate index cores, right?

If you view documents within an rss feed as separate documents, you can
assign a user ID to those documents, creating a multi user index with rss
documents per user, or group or whatever.

Having a core per user isn't a good idea if you have many users.  It takes up
additional memory and disk space, doesn't share caches etc.  There is also
more maintenance and you need some support scripts to dynamically create new
cores - Solr currently doesn't create a new core directory structure.

But, reindexing a very large index takes up a lot more time and resources and
relevancy might be an issue depending on the rss feeds' contents.

>
> What is the best approach for this kind of thing?

I'd usually store the feeds in a single index and shard if it's too many for a
single server with your specifications. Unless the demands are too specific.

>
> Thanks in advance,
> Adam

[1]: http://wiki.apache.org/solr/CoreAdmin

Cheers


Re: dynamically create unique key

2010-11-09 Thread Dennis Gearon
Seems to me it would be a good idea to put into the Solr code a unique ID per 
instance or installation or both, accessible either with Java or a query. Kind 
of like all the browsers do for their SSL connections.

Then, it's automatically easy to implement what is described below.

Maybe it should be written to the config file upon first run when it does not 
exist, and then any updates or reinstalls would reuse the same 
installation/instance ID.



From: Christopher Gross 
To: solr-user@lucene.apache.org
Sent: Tue, November 9, 2010 11:37:03 AM
Subject: Re: dynamically create unique key

Thanks Hoss, I'll look into that!

-- Chris


On Tue, Nov 9, 2010 at 1:43 PM, Chris Hostetter wrote:

>
> : one large index.  I need to create a unique key for the Solr index that
> will
> : be unique per document.  If I have 3 systems, and they all have a
> document
> : with id=1, then I need to create a "uniqueId" field in my schema that
> : contains both the system name and that id, along the lines of: "sysa1",
> : "sysb1", and "sysc1".  That way, each document will have a unique id.
>
> take a look at the SignatureUpdateProcessorFactory...
>
> http://wiki.apache.org/solr/Deduplication
>
> :   
> :   
> ...
> : So instead of just appending to the uniqueId field, it tried to do a
> : multiValued.  Does anyone have an idea on how I can make this work?
>
> copyField doesn't "append" -- it copies Field (value) instances from the
> "source" field to the "dest" field, so you get multiple values for
> the dest field.
>
>
> -Hoss
>



Re: Using Multiple Cores for Multiple Users

2010-11-09 Thread Dennis Gearon
I'm willing to bet a lot that the standard approach is to use a Server Side 
Language to customize the queries for the user . . . on the same core/set of 
cores.

The only reasons that my limited experience suggests for a 'core per user' are 
privacy/performance. Unless you have a very small set of users, I would think 
managing cores for LOTS of users would be a PIA. Create one (takes time), replicate 
to it (takes MORE time), use it, destroy it after the session expires (requires 
a garbage collection program running pretty often) (LOTS more time/CPU resource 
taken up).

I am happy to be corrected on any of this.

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



- Original Message 
From: Markus Jelsma 
To: solr-user@lucene.apache.org
Cc: Adam Estrada 
Sent: Tue, November 9, 2010 3:57:34 PM
Subject: Re: Using Multiple Cores for Multiple Users

Hi,

> All,
> 
> I have a web application that requires the user to register and then login
> to gain access to the site. Pretty standard stuff...Now I would like to
> know what the best approach would be to implement a "customized" search
> experience for each user. Would this mean creating a separate core per
> user? I think that this is not possible without restarting Solr after each
> core is added to the multi-core xml file, right?

No, you can dynamically manage cores and parts of their configuration. 
Sometimes you must reindex after a change, the same is true for reloading 
cores. Check the wiki on this one [1].

> 
> My use case is this...User A would like to index 5 RSS feeds and User B
> would like to index 5 completely different RSS feeds and he is not
> interested at all in what User A is interested in. This means that they
> would have to be separate index cores, right?

If you view documents within an rss feed as separate documents, you can 
assign a user ID to those documents, creating a multi user index with rss 
documents per user, or group or whatever.

Having a core per user isn't a good idea if you have many users.  It takes up 
additional memory and disk space, doesn't share caches etc.  There is also 
more maintenance and you need some support scripts to dynamically create new 
cores - Solr currently doesn't create a new core directory structure.

But, reindexing a very large index takes up a lot more time and resources and 
relevancy might be an issue depending on the rss feeds' contents. 

> 
> What is the best approach for this kind of thing?

I'd usually store the feeds in a single index and shard if it's too many for a 
single server with your specifications. Unless the demands are too specific.

> 
> Thanks in advance,
> Adam

[1]: http://wiki.apache.org/solr/CoreAdmin

Cheers



Re: Using Multiple Cores for Multiple Users

2010-11-09 Thread Markus Jelsma
Hi,

> All,
> 
> I have a web application that requires the user to register and then login
> to gain access to the site. Pretty standard stuff...Now I would like to
> know what the best approach would be to implement a "customized" search
> experience for each user. Would this mean creating a separate core per
> user? I think that this is not possible without restarting Solr after each
> core is added to the multi-core xml file, right?

No, you can dynamically manage cores and parts of their configuration. 
Sometimes you must reindex after a change, the same is true for reloading 
cores. Check the wiki on this one [1].
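A minimal sketch of managing cores dynamically via the CoreAdmin handler described in [1] (core name and instanceDir are hypothetical; note the instanceDir must already contain a conf/ directory, since Solr won't create the directory structure for you):

```
# create a new core without restarting Solr
http://localhost:8983/solr/admin/cores?action=CREATE&name=user42&instanceDir=cores/user42

# reload an existing core after a config change
http://localhost:8983/solr/admin/cores?action=RELOAD&core=user42
```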

> 
> My use case is this...User A would like to index 5 RSS feeds and User B
> would like to index 5 completely different RSS feeds and he is not
> interested at all in what User A is interested in. This means that they
> would have to be separate index cores, right?

If you view documents within an rss feed as separate documents, you can 
assign a user ID to those documents, creating a multi user index with rss 
documents per user, or group or whatever.

Having a core per user isn't a good idea if you have many users.  It takes up 
additional memory and disk space, doesn't share caches etc.  There is also 
more maintenance and you need some support scripts to dynamically create new 
cores - Solr currently doesn't create a new core directory structure.

But, reindexing a very large index takes up a lot more time and resources and 
relevancy might be an issue depending on the rss feeds' contents. 

> 
> What is the best approach for this kind of thing?

I'd usually store the feeds in a single index and shard if it's too many for a 
single server with your specifications. Unless the demands are too specific.

> 
> Thanks in advance,
> Adam

[1]: http://wiki.apache.org/solr/CoreAdmin

Cheers


Re: returning message to sender

2010-11-09 Thread Erick Erickson
Hmmm, this is a little murky...
I'm inferring that you believe that DIH somehow
queries the data source at #query# time, and this
is not true.  DIH is an #index time# concept.

DIH is used to add data to an index. Once that index is
created, all searches against it are unaware that there
were different data sources.

So, with a single Solr schema, you can use DIH
on as many different data sources as you want,
mapping the various bits of information from each
data source into your Solr schema. Searches go
against fields defined in the schema, so you're
automatically searching against all the databases
(assuming you've mapped your data into your
schema)

If I've misunderstood, perhaps you can add some
details?

Best
Erick

On Tue, Nov 9, 2010 at 1:39 PM, Teki, Prasad <
prasad_t...@standardandpoors.com> wrote:

>
>
> Hi guys,
> I have been exploring Solr for the last few weeks. Our main intention is
> to
> expose the data, as WS, across various data sources by linking them
> using
> some scenario.
>
> I have a couple of questions.
> Is there any good document/URL, which answers...
>
> How the indexing happens/built for the queries across different data
> sources
> (DIH)?
>
> Does Lucene store the actual data of each individual query or a
> combination? If yes, where?
>
> Whenever we do a query against the built index, when exactly does it fire the
> query
> to the database?
>
> How does the index get the updates from the DIH? For example, if my
> query
> includes 3 DIH and
> What is the max number of data sources I can include to get better
> performance?
>
> How do we measure the scalability?
>
> Can I run these search engines in a grid mode?
>
> Thanks.
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Storage-tp1871155p1871155.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
> Standard & Poor's: Empowering Investors and Markets for 150 Years
>
> 
>
> The information contained in this message is intended only for the
> recipient, and may be a confidential attorney-client communication or may
> otherwise be privileged and confidential and protected from disclosure. If
> the reader of this message is not the intended recipient, or an employee or
> agent responsible for delivering this message to the intended recipient,
> please be aware that any dissemination or copying of this communication is
> strictly prohibited. If you have received this communication in error,
> please immediately notify us by replying to the message and deleting it from
> your computer. The McGraw-Hill Companies, Inc. reserves the right, subject
> to applicable local law, to monitor and review the content of any electronic
> message or information sent to or from McGraw-Hill employee e-mail addresses
> without informing the sender or recipient of the message.
> 
>


Using Multiple Cores for Multiple Users

2010-11-09 Thread Adam Estrada
All,

I have a web application that requires the user to register and then login
to gain access to the site. Pretty standard stuff...Now I would like to know
what the best approach would be to implement a "customized" search
experience for each user. Would this mean creating a separate core per user?
I think that this is not possible without restarting Solr after each core is
added to the multi-core xml file, right?

My use case is this...User A would like to index 5 RSS feeds and User B
would like to index 5 completely different RSS feeds and he is not
interested at all in what User A is interested in. This means that they
would have to be separate index cores, right?

What is the best approach for this kind of thing?

Thanks in advance,
Adam


RE: returning message to sender

2010-11-09 Thread Teki, Prasad


Hi guys,
I have been exploring Solr for the last few weeks. Our main intention is
to
expose the data, as WS, across various data sources by linking them
using
some scenario.

I have a couple of questions.
Is there any good document/URL, which answers...

How the indexing happens/built for the queries across different data
sources
(DIH)?

Does Lucene store the actual data of each individual query or a
combination? If yes, where?

Whenever we do a query against the built index, when exactly does it fire the
query
to the database?

How does the index get the updates from the DIH? For example, if my
query
includes 3 DIH and 
What is the max number of data sources I can include to get better
performance?

How do we measure the scalability?

Can I run these search engines in a grid mode?

Thanks.
-- 
View this message in context:
http://lucene.472066.n3.nabble.com/Storage-tp1871155p1871155.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: solr init.d script

2010-11-09 Thread Nikola Garafolic

On 11/09/2010 07:00 PM, Israel Ekpo wrote:

Yes.

I recommend running Solr via a servlet container.

It is much easier to manage compared to running it by itself.

On Tue, Nov 9, 2010 at 10:03 AM, Nikola Garafolic
wrote:


But in my case, that would make things more complex as I see it. Two 
jboss servers with Solr in a servlet container, and then I need the same 
data dir, right? I am now running a single Solr instance as a cluster 
service, with the data dir set to a shared LUN, that can be started on 
either of the two hosts.


Can you explain the benefits of two Solr instances via a servlet container - 
maybe more performance?


Regards,
Nikola

--
Nikola Garafolic
SRCE, Sveucilisni racunski centar
tel: +385 1 6165 804
email: nikola.garafo...@srce.hr




Re: dynamically create unique key

2010-11-09 Thread Christopher Gross
Thanks Hoss, I'll look into that!

-- Chris


On Tue, Nov 9, 2010 at 1:43 PM, Chris Hostetter wrote:

>
> : one large index.  I need to create a unique key for the Solr index that
> will
> : be unique per document.  If I have 3 systems, and they all have a
> document
> : with id=1, then I need to create a "uniqueId" field in my schema that
> : contains both the system name and that id, along the lines of: "sysa1",
> : "sysb1", and "sysc1".  That way, each document will have a unique id.
>
> take a look at the SignatureUpdateProcessorFactory...
>
> http://wiki.apache.org/solr/Deduplication
>
> :   
> :   
> ...
> : So instead of just appending to the uniqueId field, it tried to do a
> : multiValued.  Does anyone have an idea on how I can make this work?
>
> copyField doesn't "append" -- it copies Field (value) instances from the
> "source" field to the "dest" field, so you get multiple values for
> the dest field.
>
>
> -Hoss
>


Re: spell check vs terms component

2010-11-09 Thread Ken Stanley
On Tue, Nov 9, 2010 at 1:02 PM, Shalin Shekhar Mangar
 wrote:
> On Tue, Nov 9, 2010 at 8:20 AM, bbarani  wrote:
>
>>
>> Hi,
>>
>> We are trying to implement auto suggest feature in our application.
>>
>> I would like to know the difference between terms vs spell check component.
>>
>> Both the handlers seem to display almost the same output, can anyone let
>> me
>> know the difference and also I would like to know when to go for spell
>> check
>> and when to go for terms component.
>>
>>
> SpellCheckComponent is designed to operate on whole words and not partial
> words so I don't know how well it will work for auto-suggest, if at all.
>
> As far as differences between SpellCheckComponent and Terms Component is
> concerned, TermsComponent is a straight prefix match whereas SCC takes edit
> distance into account. Also, SCC can deal with phrases composed of multiple
> words and also gives back a collated suggestion.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>

An alternative to using the SpellCheckComponent and/or the
TermsComponent, would be the (Edge)NGrams filter. Basically, this
filter breaks words down into auto-suggest-friendly tokens (i.e.,
"Hello" => "H", "He", "Hel", "Hell", "Hello") that works great for
auto suggestion querying.

Here is an article from Lucid Imagination on using the ngram filter:
http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
Here is the SOLR wiki entry for the filter:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.EdgeNGramFilterFactory
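A hedged sketch of what such a field type might look like in schema.xml, based on the articles above (the type name and gram sizes are illustrative, not from this thread):

```
<fieldType name="autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- index "hello" as h, he, hel, hell, hello -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front"/>
  </analyzer>
  <analyzer type="query">
    <!-- query side stays un-grammed so the typed prefix matches the indexed grams -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```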

- Ken Stanley


Solr highlighter question

2010-11-09 Thread Moazzam Khan
Hey guys,

I have 3 fields: FirstName, LastName, Biography. They are all string
fields.  In schema, I copy them to the default search field which is
"text". Is there any way to get Solr to highlight all the fields when
someone searches the default search field but when someone searches
for FirstName then only highlight that?

For example: if someone searches medical +FirstName:dave, then medical
should be highlighted in all fields and dave only in FirstName.

Thanks in advance,
Moazzam


Re: dynamically create unique key

2010-11-09 Thread Chris Hostetter

: one large index.  I need to create a unique key for the Solr index that will
: be unique per document.  If I have 3 systems, and they all have a document
: with id=1, then I need to create a "uniqueId" field in my schema that
: contains both the system name and that id, along the lines of: "sysa1",
: "sysb1", and "sysc1".  That way, each document will have a unique id.

take a look at the SignatureUpdateProcessorFactory...

http://wiki.apache.org/solr/Deduplication
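A rough sketch of wiring this up in solrconfig.xml, following the Deduplication wiki page (field names are taken from the question above; exact attributes may differ by Solr version, and note this produces a hash signature of the named fields rather than the literal "sysa1"-style concatenation):

```
<updateRequestProcessorChain name="uniqueId">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">uniqueId</str>
    <str name="fields">systemName,id</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```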

:   
:   
...
: So instead of just appending to the uniqueId field, it tried to do a
: multiValued.  Does anyone have an idea on how I can make this work?

copyField doesn't "append" -- it copies Field (value) instances from the 
"source" field to the "dest" field, so you get multiple values for 
the dest field. 


-Hoss


Re: How to Facet on a price range

2010-11-09 Thread Geert-Jan Brits
At http://www.mysecondhome.co.uk/search.htm,
when you drag the sliders, an update of how many results would match is
immediately shown. I really like this. How did you do this? Is this
out-of-the-box available with the suggested Facet_by_range patch?

Thanks,
Geert-Jan

2010/11/9 gwk 

> Hi,
>
> Instead of all the facet queries, you can also make use of range facets (
> http://wiki.apache.org/solr/SimpleFacetParameters#Facet_by_Range), which
> is in trunk afaik, it should also be patchable into older versions of Solr,
> although that should not be necessary.
>
> We make use of it (http://www.mysecondhome.co.uk/search.html) to create
> the nice sliders Geert-Jan describes. We've also used it to add the
> sparklines above the sliders which give a nice indication of how the current
> selection is spread out.
>
> Regards,
>
> gwk
>
>
> On 11/9/2010 3:33 PM, Geert-Jan Brits wrote:
>
>> Just to add to this, if you want to allow the user more choice in his
>> option
>> to select ranges, perhaps by using a 2-sided javascript slider for the
>> pricerange (a la kayak.com), it may be very worthwhile to discretize the
>> allowed values for the slider (e.g. steps of 5 dollars). Most js-slider
>> implementations allow for this easily.
>>
>> This has the advantages of:
>> - having far fewer possible facetqueries and thus a far greater chance of
>> these facetqueries hitting the cache.
>> - a better user-experience, although that's debatable.
>>
>> just to be clear: for this the Solr-side would still use:
>> &facet=on&facet.query=price:[50
>> TO *]&facet.query=price:[* TO 100] and not the optimized pre-computed
>> variant suggested above.
>>
>> Geert-Jan
>>
>> 2010/11/9 jayant
>>
>>  That was very well thought of and a clever solution. Thanks.
>>> --
>>> View this message in context:
>>>
>>> http://lucene.472066.n3.nabble.com/How-to-Facet-on-a-price-range-tp1846392p1869201.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>


Re: spell check vs terms component

2010-11-09 Thread Shalin Shekhar Mangar
On Tue, Nov 9, 2010 at 8:20 AM, bbarani  wrote:

>
> Hi,
>
> We are trying to implement auto suggest feature in our application.
>
> I would like to know the difference between terms vs spell check component.
>
> Both the handlers seem to display almost the same output, can anyone let
> me
> know the difference and also I would like to know when to go for spell
> check
> and when to go for terms component.
>
>
SpellCheckComponent is designed to operate on whole words and not partial
words so I don't know how well it will work for auto-suggest, if at all.

As far as differences between SpellCheckComponent and Terms Component is
concerned, TermsComponent is a straight prefix match whereas SCC takes edit
distance into account. Also, SCC can deal with phrases composed of multiple
words and also gives back a collated suggestion.
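For comparison, a straight prefix match with the TermsComponent looks like this (the URL assumes the /terms handler from the example solrconfig and a field named name):

```
http://localhost:8983/solr/terms?terms=true&terms.fl=name&terms.prefix=mon&terms.limit=10
```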

-- 
Regards,
Shalin Shekhar Mangar.


Re: solr init.d script

2010-11-09 Thread Israel Ekpo
Yes.

I recommend running Solr via a servlet container.

It is much easier to manage compared to running it by itself.

On Tue, Nov 9, 2010 at 10:03 AM, Nikola Garafolic
wrote:

> I  have two nodes running one jboss server each and using one (single) solr
> instance, thats how I run it for now.
>
> Do you recommend running jboss with solr via servlet? Two jboss run in
> load-balancing for high availability purpose.
>
> For now it seems to be ok.
>
>
> On 11/09/2010 03:17 PM, Israel Ekpo wrote:
>
>> I think it would be a better idea to load solr via a servlet container
>> like
>> Tomcat and then create the init.d script for tomcat instead.
>>
>> http://wiki.apache.org/solr/SolrTomcat#Installing_Tomcat_6
>>
>>
> --
> Nikola Garafolic
> SRCE, Sveucilisni racunski centar
> tel: +385 1 6165 804
> email: nikola.garafo...@srce.hr
>



-- 
°O°
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Is there a way to embed terms handler in search handler?

2010-11-09 Thread bbarani

Hi,

I am trying to figure out if there is a way to embed the terms handler as part
of the default search handler and 
access it using a URL something like the one below

http://localhost:8990/solr/db/select?q=*:*&terms.prefix=a&terms.fl=name

Couple of other questions,

I would like to know if it's possible to mention * in fl.name to search on
all fields, or should we specify the field names only?

Will the autosuggest suggest the whole phrase or just the word it matches?

Thanks,
Barani
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-there-a-way-to-embed-terms-handler-in-search-handler-tp1870505p1870505.html
Sent from the Solr - User mailing list archive at Nabble.com.


spell check vs terms component

2010-11-09 Thread bbarani

Hi,

We are trying to implement auto suggest feature in our application.

I would like to know the difference between the terms and spell check components.

Both the handlers seem to display almost the same output. Can anyone let me
know the difference, and also when to go for spell check
and when to go for terms component?

Thanks,
Barani
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/spell-check-vs-terms-component-tp1870214p1870214.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to Facet on a price range

2010-11-09 Thread gwk

Hi,

Instead of all the facet queries, you can also make use of range facets
(http://wiki.apache.org/solr/SimpleFacetParameters#Facet_by_Range),
which are in trunk afaik; they should also be patchable into older versions
of Solr, although that should not be necessary.
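For reference, the range facet parameters on that wiki page look roughly like this; the field name, bounds, and gap here are assumptions chosen to match the 5-dollar-step example discussed in this thread:

```
&facet=true
&facet.range=price
&facet.range.start=0
&facet.range.end=500
&facet.range.gap=5
```

This returns one count per bucket (0-5, 5-10, ...), which is what feeds the sliders and sparklines mentioned below.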


We make use of it (http://www.mysecondhome.co.uk/search.html) to create 
the nice sliders Geert-Jan describes. We've also used it to add the 
sparklines above the sliders which give a nice indication of how the 
current selection is spread out.


Regards,

gwk

On 11/9/2010 3:33 PM, Geert-Jan Brits wrote:

Just to add to this: if you want to allow the user more choice in his option
to select ranges, perhaps by using a 2-sided JavaScript slider for the
price range (à la kayak.com), it may be very worthwhile to discretize the
allowed values for the slider (e.g. steps of 5 dollars). Most JS-slider
implementations allow for this easily.

This has the advantages of:
- having far fewer possible facet queries and thus a far greater chance of
these facet queries hitting the cache.
- a better user experience, although that's debatable.

just to be clear: for this the Solr-side would still use:
&facet=on&facet.query=price:[50
TO *]&facet.query=price:[* TO 100] and not the optimized pre-computed
variant suggested above.

Geert-Jan

2010/11/9 jayant


That was very well thought of and a clever solution. Thanks.





Re: dynamically create unique key

2010-11-09 Thread Ken Stanley
On Tue, Nov 9, 2010 at 10:53 AM, Christopher Gross  wrote:
> Thanks Ken.
>
> I'm using a script with Java/SolrJ to copy documents from their original
> locations into the Solr Index.
>
> I wasn't sure if the copyField would help me, but from your answers it seems
> that I'll have to handle it on my own.  That's fine -- it is definitely not
> hard to pass a new field myself.  I was just thinking that there should be
> an "easy" way to have Solr build the unique field, since it was getting
> everything anyway.
>
> I was just confused as to why I was getting a multiValued error, since I was
> just trying to append to a field.  I wasn't sure if I was missing something.
>
> Thanks again!
>
> -- Chris
>

Chris,

I definitely understand your sentiment. The thing to keep in mind with
SOLR is that it really has limited logic mechanisms; in fact, unless
you're willing to use the DataImportHandler (dih) and the
ScriptTransformer, you really have no logic.

The copyField directive in schema.xml is mainly used to help you
easily copy the contents of one field into another so that it may be
indexed in multiple ways; for example, you can index a string so that
it is stored literally (i.e., "Hello World"), parsed using a
whitespace tokenizer (i.e., "Hello", "World"), parsed for an nGram
tokenizer (i.e., "H", "He", "Hel"... ). This is beneficial to you
because you wouldn't have to explicitly define each possible instance
in your data stream. You just define the field once, and SOLR is smart
enough to copy it where it needs to go.

Glad to have helped. :)

- Ken


Re: dynamically create unique key

2010-11-09 Thread Christopher Gross
Thanks Ken.

I'm using a script with Java/SolrJ to copy documents from their original
locations into the Solr Index.

I wasn't sure if the copyField would help me, but from your answers it seems
that I'll have to handle it on my own.  That's fine -- it is definitely not
hard to pass a new field myself.  I was just thinking that there should be
an "easy" way to have Solr build the unique field, since it was getting
everything anyway.

I was just confused as to why I was getting a multiValued error, since I was
just trying to append to a field.  I wasn't sure if I was missing something.

Thanks again!

-- Chris


On Tue, Nov 9, 2010 at 10:47 AM, Ken Stanley  wrote:

> On Tue, Nov 9, 2010 at 10:39 AM, Christopher Gross 
> wrote:
> > I'm trying to use Solr to store information from a few different sources
> in
> > one large index.  I need to create a unique key for the Solr index that
> will
> > be unique per document.  If I have 3 systems, and they all have a
> document
> > with id=1, then I need to create a "uniqueId" field in my schema that
> > contains both the system name and that id, along the lines of: "sysa1",
> > "sysb1", and "sysc1".  That way, each document will have a unique id.
> >
> > I added this to my schema.xml:
> >
> >  
> >  
> >
> >
> > However, after trying to insert, I got this:
> > java.lang.Exception: ERROR: multiple values encountered for non
> multiValued
> > copy field uniqueId: sysa
> >
> > So instead of just appending to the uniqueId field, it tried to do a
> > multiValued.  Does anyone have an idea on how I can make this work?
> >
> > Thanks!
> >
> > -- Chris
> >
>
> Chris,
>
> Depending on how you insert your documents into SOLR will determine
> how to create your unique field. If you are POST'ing the data via
> HTTP, then you would be responsible for building your unique id (i.e.,
> your program/language would use string concatenation to add the unique
> id to the output before it gets to the update handler in SOLR). If
> you're using the DataImportHandler, then you can use the
> TemplateTransformer
> (http://wiki.apache.org/solr/DataImportHandler#TemplateTransformer) to
> dynamically build your unique id at document insertion time.
>
> For example, we here at bizjournals use SOLR and the DataImportHandler
> to index our documents. Like you, we run the risk of two or more ids
> clashing, and thus overwriting a different type of document. As such,
> we take two or three different fields and combine them together using
> the TemplateTransformer to generate a more unique id for each document
> we index.
>
> With respect to the multiValued option, that is used more for an
> array-like structure within a field. For example, if you have a blog
> entry with multiple tag keywords, you would probably want a field in
> SOLR that can contain the various tag keywords for each blog entry;
> this is where multiValued comes in handy.
>
> I hope that this helps to clarify things for you.
>
> - Ken Stanley
>


Re: dynamically create unique key

2010-11-09 Thread Ken Stanley
On Tue, Nov 9, 2010 at 10:39 AM, Christopher Gross  wrote:
> I'm trying to use Solr to store information from a few different sources in
> one large index.  I need to create a unique key for the Solr index that will
> be unique per document.  If I have 3 systems, and they all have a document
> with id=1, then I need to create a "uniqueId" field in my schema that
> contains both the system name and that id, along the lines of: "sysa1",
> "sysb1", and "sysc1".  That way, each document will have a unique id.
>
> I added this to my schema.xml:
>
>  
>  
>
>
> However, after trying to insert, I got this:
> java.lang.Exception: ERROR: multiple values encountered for non multiValued
> copy field uniqueId: sysa
>
> So instead of just appending to the uniqueId field, it tried to do a
> multiValued.  Does anyone have an idea on how I can make this work?
>
> Thanks!
>
> -- Chris
>

Chris,

Depending on how you insert your documents into SOLR will determine
how to create your unique field. If you are POST'ing the data via
HTTP, then you would be responsible for building your unique id (i.e.,
your program/language would use string concatenation to add the unique
id to the output before it gets to the update handler in SOLR). If
you're using the DataImportHandler, then you can use the
TemplateTransformer
(http://wiki.apache.org/solr/DataImportHandler#TemplateTransformer) to
dynamically build your unique id at document insertion time.

For example, we here at bizjournals use SOLR and the DataImportHandler
to index our documents. Like you, we run the risk of two or more ids
clashing, and thus overwriting a different type of document. As such,
we take two or three different fields and combine them together using
the TemplateTransformer to generate a more unique id for each document
we index.

With respect to the multiValued option, that is used more for an
array-like structure within a field. For example, if you have a blog
entry with multiple tag keywords, you would probably want a field in
SOLR that can contain the various tag keywords for each blog entry;
this is where multiValued comes in handy.

I hope that this helps to clarify things for you.

- Ken Stanley
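For reference, the TemplateTransformer approach Ken describes might look roughly like this in a DIH data-config.xml. The entity name, query, and template string here are assumptions for illustration:

```xml
<!-- Sketch: build "sysa1"-style unique ids at import time.
     "item", the query, and the "sysa" prefix are made-up examples. -->
<entity name="item" transformer="TemplateTransformer"
        query="SELECT id, title FROM docs">
  <!-- Prefix the source system name onto the id, yielding e.g. "sysa1" -->
  <field column="uniqueId" template="sysa${item.id}"/>
</entity>
```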


dynamically create unique key

2010-11-09 Thread Christopher Gross
I'm trying to use Solr to store information from a few different sources in
one large index.  I need to create a unique key for the Solr index that will
be unique per document.  If I have 3 systems, and they all have a document
with id=1, then I need to create a "uniqueId" field in my schema that
contains both the system name and that id, along the lines of: "sysa1",
"sysb1", and "sysc1".  That way, each document will have a unique id.

I added this to my schema.xml:

  
  


However, after trying to insert, I got this:
java.lang.Exception: ERROR: multiple values encountered for non multiValued
copy field uniqueId: sysa

So instead of just appending to the uniqueId field, it tried to do a
multiValued.  Does anyone have an idea on how I can make this work?

Thanks!

-- Chris


Re: Replication and ignored fields

2010-11-09 Thread Jan Høydahl / Cominvent
Cool, thanks for the clarification, Shalin.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 9. nov. 2010, at 15.12, Shalin Shekhar Mangar wrote:

> On Tue, Nov 9, 2010 at 12:33 AM, Jan Høydahl / Cominvent
>  wrote:
>> Not sure about that. I have read that the replication handler actually 
>> issues a commit() on itself once the index is downloaded.
> 
> That was true with the old replication scripts. The Java based
> replication just re-opens the IndexReader after all the files are
> downloaded so the index version on the slave remains in sync with the
> one on the master.
> 
>> 
>> But probably a better way for Markus' case is to hook the prune job on the 
>> master, writing to another core (myIndexPruned). Then you replicate from 
>> that core instead, and you also get the benefit of transferring a smaller 
>> index across the network.
> 
> I agree, that is a good idea.
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.



Re: How to Facet on a price range

2010-11-09 Thread Geert-Jan Brits
Just to add to this: if you want to allow the user more choice in his option
to select ranges, perhaps by using a 2-sided JavaScript slider for the
price range (à la kayak.com), it may be very worthwhile to discretize the
allowed values for the slider (e.g. steps of 5 dollars). Most JS-slider
implementations allow for this easily.

This has the advantages of:
- having far fewer possible facet queries and thus a far greater chance of
these facet queries hitting the cache.
- a better user experience, although that's debatable.

just to be clear: for this the Solr-side would still use:
&facet=on&facet.query=price:[50
TO *]&facet.query=price:[* TO 100] and not the optimized pre-computed
variant suggested above.

Geert-Jan

2010/11/9 jayant 

>
> That was very well thought of and a clever solution. Thanks.
>


Re: Replication and ignored fields

2010-11-09 Thread Shalin Shekhar Mangar
On Tue, Nov 9, 2010 at 12:33 AM, Jan Høydahl / Cominvent
 wrote:
> Not sure about that. I have read that the replication handler actually issues 
> a commit() on itself once the index is downloaded.

That was true with the old replication scripts. The Java based
replication just re-opens the IndexReader after all the files are
downloaded so the index version on the slave remains in sync with the
one on the master.

>
> But probably a better way for Markus' case is to hook the prune job on the 
> master, writing to another core (myIndexPruned). Then you replicate from that 
> core instead, and you also get the benefit of transferring a smaller index 
> across the network.

I agree, that is a good idea.

-- 
Regards,
Shalin Shekhar Mangar.


Re: How to Facet on a price range

2010-11-09 Thread jayant

That was very well thought of and a clever solution. Thanks.


Re: solr init.d script

2010-11-09 Thread Israel Ekpo
I think it would be a better idea to load solr via a servlet container like
Tomcat and then create the init.d script for tomcat instead.

http://wiki.apache.org/solr/SolrTomcat#Installing_Tomcat_6

On Tue, Nov 9, 2010 at 2:47 AM, Eric Martin  wrote:

> Er, what flavor?
>
> RHEL / CentOS
>
> #!/bin/sh
>
> # Starts, stops, and restarts Apache Solr.
> #
> # chkconfig: 35 92 08
> # description: Starts and stops Apache Solr
>
> SOLR_DIR="/var/solr"
> JAVA_OPTIONS="-Xmx1024m -DSTOP.PORT=8079 -DSTOP.KEY=mustard -jar start.jar"
> LOG_FILE="/var/log/solr.log"
> JAVA="/usr/bin/java"
>
> case $1 in
>start)
>echo "Starting Solr"
>cd $SOLR_DIR
>$JAVA $JAVA_OPTIONS 2> $LOG_FILE &
>;;
>stop)
>echo "Stopping Solr"
>cd $SOLR_DIR
>$JAVA $JAVA_OPTIONS --stop
>;;
>restart)
>$0 stop
>sleep 1
>$0 start
>;;
>*)
>echo "Usage: $0 {start|stop|restart}" >&2
>exit 1
>;;
> esac
>
> 
>
>
> Debian
>
> http://xdeb.org/node/1213
>
> __
>
> Ubuntu
>
> STEPS
> Type in the following command in TERMINAL to install nano text editor.
> sudo apt-get install nano
> Type in the following command in TERMINAL to add a new script.
> sudo nano /etc/init.d/solr
> TERMINAL will display a new page title "GNU nano 2.0.x".
> Paste the below script in this TERMINAL window.
> #!/bin/sh -e
>
> # Starts, stops, and restarts solr
>
> SOLR_DIR="/apache-solr-1.4.0/example"
> JAVA_OPTIONS="-Xmx1024m -DSTOP.PORT=8079 -DSTOP.KEY=stopkey -jar start.jar"
> LOG_FILE="/var/log/solr.log"
> JAVA="/usr/bin/java"
>
> case $1 in
>start)
>echo "Starting Solr"
>cd $SOLR_DIR
>$JAVA $JAVA_OPTIONS 2> $LOG_FILE &
>;;
>stop)
>echo "Stopping Solr"
>cd $SOLR_DIR
>$JAVA $JAVA_OPTIONS --stop
>;;
>restart)
>$0 stop
>sleep 1
>$0 start
>;;
>*)
>echo "Usage: $0 {start|stop|restart}" >&2
>exit 1
>;;
> esac
> Note: In above script you might have to replace /apache-solr-1.4.0/example
> with appropriate directory name.
> Press the CTRL-X keys.
> Type in Y.
> When asked for the File Name to Write, press the ENTER key.
> You're now back at the TERMINAL command line.
>
> Type in the following command in TERMINAL to create all the links to the
> script.
> sudo update-rc.d solr defaults
> Type in the following command in TERMINAL to make the script executable.
> sudo chmod a+rx /etc/init.d/solr
> To test, reboot your Ubuntu Server.
> Wait until the Ubuntu Server reboot is completed.
> Wait 2 minutes for Apache Solr to start up.
> Using your internet browser, go to your website and try a Solr search.
>
>
>
> -Original Message-
> From: Nikola Garafolic [mailto:nikola.garafo...@srce.hr]
> Sent: Monday, November 08, 2010 11:42 PM
> To: solr-user@lucene.apache.org
> Subject: solr init.d script
>
> Hi,
>
> Does anyone have some kind of init.d script for solr, that can start,
> stop and check solr status?
>
> --
> Nikola Garafolic
> SRCE, Sveucilisni racunski centar
> tel: +385 1 6165 804
> email: nikola.garafo...@srce.hr
>
>


-- 
°O°
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: solr init.d script

2010-11-09 Thread Nikola Garafolic
I have two nodes, each running one JBoss server and using one (single)
Solr instance; that's how I run it for now.


Do you recommend running JBoss with Solr via a servlet container? The two
JBoss instances run load-balanced for high-availability purposes.


For now it seems to be ok.

On 11/09/2010 03:17 PM, Israel Ekpo wrote:

I think it would be a better idea to load solr via a servlet container like
Tomcat and then create the init.d script for tomcat instead.

http://wiki.apache.org/solr/SolrTomcat#Installing_Tomcat_6



--
Nikola Garafolic
SRCE, Sveucilisni racunski centar
tel: +385 1 6165 804
email: nikola.garafo...@srce.hr


RE: Tomcat special character problem

2010-11-09 Thread Em

The first problem was the wrong URIEncoding in Tomcat itself.
The second problem came from the application's side: the params were wrongly
encoded, so it was not possible to show the desired results.

If you need to convert from different encodings to utf8, I can give you the
following piece of pseudocode:

string = urlencode(encodeForUtf8(myString));

And if you need to decode for several reasons, keep in mind that you must
change the order of decodings:

value = decodeFromUtf8(urldecode(string));

Hope that helps.
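For reference, the Tomcat-side fix usually means setting URIEncoding on the HTTP connector in server.xml, roughly as below; the port and timeout values here are assumptions:

```xml
<!-- server.xml: decode request URIs (and thus GET params) as UTF-8
     instead of Tomcat's ISO-8859-1 default -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           URIEncoding="UTF-8"/>
```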

Thank you!


solr dynamic core creation

2010-11-09 Thread nizan

Hi,

I’m not sure this is the right place, hopefully you can help. Anyway, I also
sent mail to solr-user@lucene.apache.org

I’m using Solr – one master with 17 slaves on the server – and SolrJ as
the Java client.

Currently there’s only one core in all of them (master and slaves) – only
the cpaCore.

I thought about using multi-cores solr, but I have some problems with that.

I don’t know in advance which cores I’d need.

When my Java program runs, I send documents to be indexed to a certain
URL, which contains the core name, and I might build a URL based on a core
that has not yet been created. For example:

(at the begining, the only core is cpaCore)

Calling to index – http://localhost:8080/cpaCore  - existing core,
everything as usual
Calling to index – http://localhost:8080/newCore – currently throws an
exception. What I'd like to happen is: the server realizes there’s no core
“newCore”, creates it, and indexes to it. After that, it also creates the new
core on the slaves.
Calling to index – http://localhost:8080/newCore  - existing core,
everything as usual

What I’d like the server side to do is realize by itself whether the core
exists or not, and if not, create it.

One other restriction – I can’t change anything on the client side. The client
can only make the calls it’s making now – for index and search – and cannot
make calls for core creation via the CoreAdminHandler. All I can do is
something in the server itself.

What can I do to get it done? Write some RequestHandler? RequestProcessor?
Any other option?
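For reference, this is what the stock CoreAdminHandler CREATE call looks like – the operation a custom server-side handler would need to trigger internally, since the client can't be changed. Host, port, and instanceDir here are assumptions:

```
http://localhost:8080/solr/admin/cores?action=CREATE&name=newCore&instanceDir=newCore
```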

Thanks, nizan



Re: Replication and ignored fields

2010-11-09 Thread Jan Høydahl / Cominvent
Not sure about that. I have read that the replication handler actually issues a 
commit() on itself once the index is downloaded.

But probably a better way for Markus' case is to hook the prune job on the 
master, writing to another core (myIndexPruned). Then you replicate from that 
core instead, and you also get the benefit of transferring a smaller index 
across the network.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 8. nov. 2010, at 23.50, Shalin Shekhar Mangar wrote:

> On Fri, Nov 5, 2010 at 2:30 PM, Jan Høydahl / Cominvent
>  wrote:
>> 
>> How about hooking in  Andrzej's pruning tool at the postCommit event, 
>> literally removing unused fields. I believe a "commit" is fired on the slave 
>> by itself after every successful replication, to put
>> the index live. You could execute a script which prunes away the dead meat 
>> and then call a new commit?
> 
> Well, I don't think it will work because a new commit will cause the
> index version on the slave to be ahead of the master which will cause
> Solr replication to download a full index from the master and it'd go
> in an infinite loop.
> 
> --
> Regards,
> Shalin Shekhar Mangar.



Re: solr init.d script

2010-11-09 Thread Nikola Garafolic

Sorry, forgot to mention: CentOS.
Thanks.

I have a very similar script to this CentOS one, but I am missing the status
portion of the script.
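A minimal sketch of a status check that could be added as a status) case to the script below; the pgrep pattern is an assumption and should match however Solr is actually launched on your system:

```shell
#!/bin/sh
# Sketch: a "status" check for the Solr init.d script.
# Assumption: Solr was started via "java ... -jar start.jar",
# so the process command line contains "start.jar".

solr_status() {
    if pgrep -f "start\.jar" > /dev/null 2>&1; then
        echo "Solr is running"
        return 0
    else
        echo "Solr is not running"
        return 3    # LSB exit code for "program is not running"
    fi
}

# Hook it into the existing case statement as:
#     status)
#         solr_status
#         ;;
solr_status || true
```

With the chkconfig header already in the CentOS script, `service solr status` would then report the state.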


On 11/09/2010 08:47 AM, Eric Martin wrote:

Er, what flavor?

RHEL / CentOS

#!/bin/sh

# Starts, stops, and restarts Apache Solr.
#
# chkconfig: 35 92 08
# description: Starts and stops Apache Solr

SOLR_DIR="/var/solr"
JAVA_OPTIONS="-Xmx1024m -DSTOP.PORT=8079 -DSTOP.KEY=mustard -jar start.jar"
LOG_FILE="/var/log/solr.log"
JAVA="/usr/bin/java"

case $1 in
 start)
 echo "Starting Solr"
 cd $SOLR_DIR
 $JAVA $JAVA_OPTIONS 2>  $LOG_FILE&
 ;;
 stop)
 echo "Stopping Solr"
 cd $SOLR_DIR
 $JAVA $JAVA_OPTIONS --stop
 ;;
 restart)
 $0 stop
 sleep 1
 $0 start
 ;;
 *)
 echo "Usage: $0 {start|stop|restart}">&2
 exit 1
 ;;
esac




Debian

http://xdeb.org/node/1213

__

Ubuntu

STEPS
Type in the following command in TERMINAL to install nano text editor.
sudo apt-get install nano
Type in the following command in TERMINAL to add a new script.
sudo nano /etc/init.d/solr
TERMINAL will display a new page title "GNU nano 2.0.x".
Paste the below script in this TERMINAL window.
#!/bin/sh -e

# Starts, stops, and restarts solr

SOLR_DIR="/apache-solr-1.4.0/example"
JAVA_OPTIONS="-Xmx1024m -DSTOP.PORT=8079 -DSTOP.KEY=stopkey -jar start.jar"
LOG_FILE="/var/log/solr.log"
JAVA="/usr/bin/java"

case $1 in
 start)
 echo "Starting Solr"
 cd $SOLR_DIR
 $JAVA $JAVA_OPTIONS 2>  $LOG_FILE&
 ;;
 stop)
 echo "Stopping Solr"
 cd $SOLR_DIR
 $JAVA $JAVA_OPTIONS --stop
 ;;
 restart)
 $0 stop
 sleep 1
 $0 start
 ;;
 *)
 echo "Usage: $0 {start|stop|restart}">&2
 exit 1
 ;;
esac
Note: In above script you might have to replace /apache-solr-1.4.0/example
with appropriate directory name.
Press the CTRL-X keys.
Type in Y.
When asked for the File Name to Write, press the ENTER key.
You're now back at the TERMINAL command line.

Type in the following command in TERMINAL to create all the links to the
script.
sudo update-rc.d solr defaults
Type in the following command in TERMINAL to make the script executable.
sudo chmod a+rx /etc/init.d/solr
To test, reboot your Ubuntu Server.
Wait until the Ubuntu Server reboot is completed.
Wait 2 minutes for Apache Solr to start up.
Using your internet browser, go to your website and try a Solr search.



-Original Message-
From: Nikola Garafolic [mailto:nikola.garafo...@srce.hr]
Sent: Monday, November 08, 2010 11:42 PM
To: solr-user@lucene.apache.org
Subject: solr init.d script

Hi,

Does anyone have some kind of init.d script for solr, that can start,
stop and check solr status?




--
Nikola Garafolic
SRCE, Sveucilisni racunski centar
tel: +385 1 6165 804
email: nikola.garafo...@srce.hr