Re: Possible issue in edismax?

2013-01-30 Thread Sandeep Mestry
Thanks, Felipe.
Can you point me to an example, please?

Also, forgive me, but if a document has matches in more of the searchable
fields, should it not rank higher?

Thanks,
Sandeep
On 30 Jan 2013 19:30, "Felipe Lahti"  wrote:

> If you compare the first and last document scores, you will see that the
> last one matches more fields than the first one. So you may be thinking: why?
> The first doc only matches the "contributions" field and the last matches a
> bunch of fields, so if you want to have it behave more like (series_title^500
> title^100 description^15 contribution) you
> have to override the method of DefaultSimilarity.
>
>
> On Wed, Jan 30, 2013 at 4:12 PM, Sandeep Mestry 
> wrote:
>
> > I have pasted it below and it is slightly different from the dismax
> > configuration I have mentioned above, as I was playing with all sorts of
> > boost values; however, it looks more like below:
> >
> > 
> > 2675.7844 = (MATCH) sum of: 2675.7844 = (MATCH) max plus 0.01 times
> others
> > of: 2675.7844 = (MATCH) weight(contributions:news in 63298)
> > [DefaultSimilarity], result of: 2675.7844 = score(doc=63298,freq=1.0 =
> > termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
> > 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> > 595177.7 = fieldWeight in 63298, product of: 1.0 = tf(freq=1.0), with
> freq
> > of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14, maxDocs=11282414)
> > 40960.0 = fieldNorm(doc=63298)
> > 
> > 
> > 2317.297 = (MATCH) sum of: 2317.297 = (MATCH) max plus 0.01 times others
> > of: 2317.297 = (MATCH) weight(contributions:news in 9826415)
> > [DefaultSimilarity], result of: 2317.297 = score(doc=9826415,freq=3.0 =
> > termFreq=3.0 ), product of: 0.004495774 = queryWeight, product of:
> > 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> > 515439.0 = fieldWeight in 9826415, product of: 1.7320508 = tf(freq=3.0),
> > with freq of: 3.0 = termFreq=3.0 14.530705 = idf(docFreq=14,
> > maxDocs=11282414) 20480.0 = fieldNorm(doc=9826415)
> > 
> > 
> > 2140.6274 = (MATCH) sum of: 2140.6274 = (MATCH) max plus 0.01 times
> others
> > of: 2140.6274 = (MATCH) weight(contributions:news in 9882325)
> > [DefaultSimilarity], result of: 2140.6274 = score(doc=9882325,freq=1.0 =
> > termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
> > 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> > 476142.16 = fieldWeight in 9882325, product of: 1.0 = tf(freq=1.0), with
> > freq of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14, maxDocs=11282414)
> > 32768.0 = fieldNorm(doc=9882325)
> > 
> > 
> > 1605.4707 = (MATCH) sum of: 1605.4707 = (MATCH) max plus 0.01 times
> others
> > of: 1605.4707 = (MATCH) weight(contributions:news in 220007)
> > [DefaultSimilarity], result of: 1605.4707 = score(doc=220007,freq=1.0 =
> > termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
> > 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> > 357106.62 = fieldWeight in 220007, product of: 1.0 = tf(freq=1.0), with
> > freq of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14, maxDocs=11282414)
> > 24576.0 = fieldNorm(doc=220007)
> > 
> > 
> > 1605.4707 = (MATCH) sum of: 1605.4707 = (MATCH) max plus 0.01 times
> others
> > of: 1605.4707 = (MATCH) weight(contributions:news in 241151)
> > [DefaultSimilarity], result of: 1605.4707 = score(doc=241151,freq=1.0 =
> > termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
> > 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> > 357106.62 = fieldWeight in 241151, product of: 1.0 = tf(freq=1.0), with
> > freq of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14, maxDocs=11282414)
> > 24576.0 = fieldNorm(doc=241151)
> > 
> > 
> > id:c208c2b4-1b3e-27b8-e040-a8c00409063a
> > 
> >  
> > 6.5742764 = (MATCH) sum of: 6.5742764 = (MATCH) max plus 0.01 times
> others
> > of: 3.304414 = (MATCH) weight(description:news^25.0 in 967895)
> > [DefaultSimilarity], result of: 3.304414 = score(doc=967895,freq=1.0 =
> > termFreq=1.0 ), product of: 0.042727955 = queryWeight, product of: 25.0 =
> > boost 5.5240083 = idf(docFreq=122362, maxDocs=11282414) 3.093982E-4 =
> > queryNorm 77.33611 = fieldWeight in 967895, product of: 1.0 =
> tf(freq=1.0),
> > with freq of: 1.0 = termFreq=1.0 5.5240083 = idf(docFreq=122362,
> > maxDocs=11282414) 14.0 = fieldNorm(doc=967895) 5.913381 = (MATCH)
> > weight(pg_series_title:news^50.0 in 967895) [DefaultSimilarity], result
> of:
> > 5.913381 = score(doc=967895,freq=1.0 = termFreq=1.0 ), product of:
> > 0.080834694 = queryWeight, product of: 50.0 = boost 5.2252855 =
> > idf(docFreq=164961, maxDocs=11282414) 3.093982E-4 = queryNorm 73.154 =
> > fieldWeight in 967895, product of: 1.0 = tf(freq=1.0), with freq of: 1.0
> =
> > termFreq=1.0 5.2252855 = idf(docFreq=164961, maxDocs=11282414) 14.0 =
> > fieldNorm(doc=967895) 0.18680073 = (MATCH) weight(p_programme_title:news
> in
> > 967895) [DefaultSimilarity], result of: 0.18680073 =
> > score(doc=967895,

Re: Issue with spellcheck and autosuggest

2013-01-30 Thread Artyom
You can of course check suggestions, but then you should remove the
"wordbreak" spellchecker from your handler, because its purpose is to find
cases when the user types spaces wrongly (e.g., solrrocks, sol rrocks, so lr).
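
For reference, a wordbreak spellchecker is typically defined in
solrconfig.xml along these lines (a sketch; the field name is a placeholder):

  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">name</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
  </lst>

Removing that <lst> from the search component leaves any other configured
dictionaries untouched.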



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Issue-with-spellcheck-and-autosuggest-tp4036208p4037631.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud: admin security vs. replication

2013-01-30 Thread AlexeyK
As long as Core Admin is accessible via HTTP and allows manipulation of Solr
cores, it should be secured, regardless of the configured path. The difference
between securing Admin and securing other handlers is that other handlers
are accessed by specific application server(s), and can therefore be
easily firewalled etc.
The Admin interface can (in theory) be accessed from machines other than the
application server, but I cannot really apply security constraints to it as
long as Core Admin is used both internally (replication) and externally
(admin web interface JS).
Therefore, it's necessary to provide a reverse proxy with access control
management to allow both secure external access to the admin AND internal access.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-admin-security-vs-replication-tp4037337p4037628.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: field space consumption - stored vs not stored

2013-01-30 Thread Shawn Heisey

On 1/30/2013 6:24 PM, Shawn Heisey wrote:

If I had to guess about the extra space required for storing an int
field, I would say it's in the neighborhood of 20 bytes per document,
perhaps less.  I am also interested in a definitive answer.


The answer is very likely less than 20 bytes per doc.  I was assuming a 
larger size for VInt than it is likely to use.  See the answer for this 
question:


http://stackoverflow.com/questions/2752612/what-is-the-vint-in-lucene

Thanks,
Shawn



Re: field space consumption - stored vs not stored

2013-01-30 Thread Shawn Heisey

On 1/30/2013 6:04 PM, Petersen, Robert wrote:

Hi

Just a quick question:  for a single valued int field in solr 3.6.1 how much 
more space is used if the field is stored vs indexed and not stored?


Here is the index file format reference for the two files that make up 
stored fields in the 3.6 index format:


http://lucene.apache.org/core/3_6_2/fileformats.html#field_index

If I read that right, the fdx file has a fixed size of 8 bytes times the 
number of documents in the segment.  The size should only depend on the 
number of documents, not the number of stored fields or their contents.


The fdt file contains the actual stored data and will vary according to 
the actual stored data.  Smaller fields take up less space than large 
fields.  If a stored field is missing from a document, it probably 
doesn't take up any space.  There is some overhead - exactly how much 
overhead is hard for me to calculate, especially since I don't know how 
much space a VInt takes up, which may in fact be variable.
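
For the curious, a VInt is a variable-length integer. A minimal Java sketch
of the usual layout (7 payload bits per byte, high bit flagging that more
bytes follow) shows why small values stay small:

  class VIntSketch {
      // Writes i as a Lucene-style VInt and returns the byte count.
      // Values < 128 take one byte; any int needs at most 5 bytes.
      static int writeVInt(int i, java.io.OutputStream out) throws java.io.IOException {
          int count = 1;
          while ((i & ~0x7F) != 0) {        // more than 7 bits remaining?
              out.write((i & 0x7F) | 0x80); // emit low 7 bits + continuation flag
              i >>>= 7;
              count++;
          }
          out.write(i);                     // final byte has the high bit clear
          return count;
      }
  }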


If I had to guess about the extra space required for storing an int 
field, I would say it's in the neighborhood of 20 bytes per document, 
perhaps less.  I am also interested in a definitive answer.


Thanks,
Shawn



Re: queryResultWindowSize

2013-01-30 Thread Erick Erickson
Pretty much. The queryResultCache is pretty inexpensive. But be a bit
careful: it's tempting to increase it greatly, but that only buys you
performance if you see your users actually ask for subsequent pages
reasonably often.

Best
Erick


On Tue, Jan 29, 2013 at 1:38 PM, Isaac Hebsh  wrote:

> Hi everyone.
>
> The queryResultWindowSize parameter sets the number of documents to be
> retrieved and stored in the queryResultCache. That is, a further request
> with the same query, but for the next page, might return very fast.
> The items that are stored in the queryResultCache are IDs only (not stored
> fields or previews/highlighting).
>
> Does that mean that increasing queryResultWindowSize will not significantly
> hurt performance of the first query?
>
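
For reference, both knobs live in solrconfig.xml; a sketch with placeholder
values:

  <queryResultWindowSize>50</queryResultWindowSize>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

With a window of 50, a request for rows 0-9 caches the first 50 doc IDs, so
pages 2-5 of a 10-per-page UI can come straight from the cache.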


Re: SolrException: Error loading class 'org.apache.solr.response.transform.EditorialMarkerFactory'

2013-01-30 Thread Erick Erickson
Please feel free to just edit the Wiki yourself; all you have to do is
create a login.




On Wed, Jan 23, 2013 at 9:04 AM, eShard  wrote:

> Thanks,
> That worked.
> So the documentation needs to be fixed in a few places (the solr wiki and
> the default solrconfig.xml in Solr 4.0 final; I didn't check any other
> versions)
> I'll either open a new ticket in JIRA to request a fix or reopen the old
> one...
>
> Furthermore,
> I tried using the ElevatedMarkerFactory and it didn't behave the way I
> thought it would.
>
> this
> http://localhost:8080/solr/Lisa/elevate?q=foo+bar&wt=xml&defType=dismax
> got me all the doc info but no elevated marker
>
> I ran this
>
> http://localhost:8080/solr/Lisa/elevate?q=foo+bar&fl=[elevated]&wt=xml&defType=dismax
> and all I got was response = 1 and elevated = true
>
> I had to run this to get all of the above info:
>
> http://localhost:8080/solr/Lisa/elevate?q=foo+bar&fl=*,[elevated]&wt=xml&defType=dismax
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SolrException-Error-loading-class-org-apache-solr-response-transform-EditorialMarkerFactory-tp4035203p4035621.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Manually assigning shard leader and replicas during initial setup on EC2

2013-01-30 Thread Erick Erickson
But there's still the latency issue. Draw a diagram of all the
communications that have to go on to do an update and it's a _lot_ of
arrows going across DCs.

My suspicion is that it'll be much easier to just treat the separate DCs as
separate clusters that don't know about each other; that is, your indexing
process just has to send the update request to each cluster. Synchronizing
them afterwards is assuredly a problem, though.

TNSTAAFL


On Wed, Jan 23, 2013 at 4:46 AM, Upayavira  wrote:

> The way Zookeeper is set up, requiring 'quorum' is aimed at avoiding
> 'split brain' where two halves of your cluster start to operate
> independently. This means that you *have* to favour one half of your
> cluster over the other, in the case that they cannot communicate with
> each other.
>
> For example, if you have three zookeepers, you'll put two in one DC and
> one in the other. The DC with two Zookeepers will stay active should the
> link between them go down.
>
> I'm not entirely sure what happens to the side with one zookeeper. I'd like
> to think it can still serve queries, since it *could* work out which
> nodes are accessible to it - but it will certainly not be doing updates
> (those should be buffered until the other DC returns).
>
> If you want true geographical redundancy, I think Markus' suggestion is
> a sensible one.
>
> Upayavira
>
> On Tue, Jan 22, 2013, at 10:11 PM, Markus Jelsma wrote:
> > Hi,
> >
> > Regarding availability: since SolrCloud is not DC-aware at this moment, we
> > 'solve' the problem by simply operating multiple identical clusters in
> > different DCs and sending updates to them all. This works quite well but it
> > requires some manual intervention if a DC is down due to a prolonged DOS
> > attack or a network or power failure.
> >
> > I don't think it's a very good idea to change clusterstate.json, because
> > Solr will modify it when, for example, a node goes down. Your preconfigured
> > state doesn't exist anymore. It's also a bad idea because distributed
> > queries are going to be sent to remote locations, adding a lot of
> > latency. Again, because it's not DC-aware.
> >
> > Any good solution to this problem should be in Solr itself.
> >
> > Cheers,
> >
> >
> > -Original message-
> > > From:Timothy Potter 
> > > Sent: Tue 22-Jan-2013 22:46
> > > To: solr-user@lucene.apache.org
> > > Subject: Manually assigning shard leader and replicas during initial
> setup on EC2
> > >
> > > Hi,
> > >
> > > I'm wanting to split my existing Solr 4 cluster into 2 different
> > > availability zones in EC2, as in have my initial leaders in one zone
> and
> > > their replicas in another AZ. My thinking here is if one zone goes
> down, my
> > > cluster stays online. This is the recommendation of Amazon EC2 docs.
> > >
> > > My thinking here is to just cook up a clusterstate.json file to
> manually
> > > set my desired shard / replica assignments to specific nodes. After
> which I
> > > can update the clusterstate.json file in Zk and then bring the nodes
> > > online.
> > >
> > > The other thing to mention is that I have existing indexes that need
> to be
> > > preserved as I don't want to re-index. For this I'm planning to just
> move
> > > data directories where they need to be based on my changes to
> > > clusterstate.json
> > >
> > > Does this sound reasonable? Any pitfalls I should look out for?
> > >
> > > Thanks.
> > > Tim
> > >
>


Re: Return Meta data when using Suggester

2013-01-30 Thread Erik Holstad
Thanks Arcadius!
Looks very promising and will have a look at it.

On Wed, Jan 30, 2013 at 3:00 PM, Arcadius Ahouansou wrote:

> Hi Erik.
>
> You may want to have a look at:
>
> http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/
>
> Arcadius.
>
>
>
> On 30 January 2013 17:09, Erik Holstad  wrote:
>
> > Hey!
> >
> > Been playing around with Suggester and things are working just fine,
> > but I have a use case where it would be really helpful to return some
> > metadata for each suggestion, for example the "id", or some other input
> > that I can specify.
> >
> > Is this possible, or would it totally mess up the underlying data
> > structure to do something like this?
> >
> > I guess you could add the metadata to the tail of your string and then
> > parse it out at retrieval time, but it feels kinda hacky.
> >
> >
> > --
> > Regards Erik
> >
>



-- 
Regards Erik


Re: Return Meta data when using Suggester

2013-01-30 Thread Arcadius Ahouansou
Hi Erik.

You may want to have a look at:

http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/

Arcadius.



On 30 January 2013 17:09, Erik Holstad  wrote:

> Hey!
>
> Been playing around with Suggester and things are working just fine,
> but I have a use case where it would be really helpful to return some
> metadata for each suggestion, for example the "id", or some other input
> that I can specify.
>
> Is this possible, or would it totally mess up the underlying data structure
> to do something like this?
>
> I guess you could add the metadata to the tail of your string and then
> parse it out at retrieval time, but it feels kinda hacky.
>
>
> --
> Regards Erik
>


Re: Possible issue in edismax?

2013-01-30 Thread Felipe Lahti
If you compare the first and last document scores, you will see that the
last one matches more fields than the first one. So you may be thinking: why?
The first doc only matches the "contributions" field and the last matches a
bunch of fields, so if you want to have it behave more like (series_title^500
title^100 description^15 contribution) you
have to override the method of DefaultSimilarity.


On Wed, Jan 30, 2013 at 4:12 PM, Sandeep Mestry  wrote:

> I have pasted it below and it is slightly different from the dismax
> configuration I have mentioned above, as I was playing with all sorts of
> boost values; however, it looks more like below:
>
> 
> 2675.7844 = (MATCH) sum of: 2675.7844 = (MATCH) max plus 0.01 times others
> of: 2675.7844 = (MATCH) weight(contributions:news in 63298)
> [DefaultSimilarity], result of: 2675.7844 = score(doc=63298,freq=1.0 =
> termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
> 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> 595177.7 = fieldWeight in 63298, product of: 1.0 = tf(freq=1.0), with freq
> of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14, maxDocs=11282414)
> 40960.0 = fieldNorm(doc=63298)
> 
> 
> 2317.297 = (MATCH) sum of: 2317.297 = (MATCH) max plus 0.01 times others
> of: 2317.297 = (MATCH) weight(contributions:news in 9826415)
> [DefaultSimilarity], result of: 2317.297 = score(doc=9826415,freq=3.0 =
> termFreq=3.0 ), product of: 0.004495774 = queryWeight, product of:
> 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> 515439.0 = fieldWeight in 9826415, product of: 1.7320508 = tf(freq=3.0),
> with freq of: 3.0 = termFreq=3.0 14.530705 = idf(docFreq=14,
> maxDocs=11282414) 20480.0 = fieldNorm(doc=9826415)
> 
> 
> 2140.6274 = (MATCH) sum of: 2140.6274 = (MATCH) max plus 0.01 times others
> of: 2140.6274 = (MATCH) weight(contributions:news in 9882325)
> [DefaultSimilarity], result of: 2140.6274 = score(doc=9882325,freq=1.0 =
> termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
> 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> 476142.16 = fieldWeight in 9882325, product of: 1.0 = tf(freq=1.0), with
> freq of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14, maxDocs=11282414)
> 32768.0 = fieldNorm(doc=9882325)
> 
> 
> 1605.4707 = (MATCH) sum of: 1605.4707 = (MATCH) max plus 0.01 times others
> of: 1605.4707 = (MATCH) weight(contributions:news in 220007)
> [DefaultSimilarity], result of: 1605.4707 = score(doc=220007,freq=1.0 =
> termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
> 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> 357106.62 = fieldWeight in 220007, product of: 1.0 = tf(freq=1.0), with
> freq of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14, maxDocs=11282414)
> 24576.0 = fieldNorm(doc=220007)
> 
> 
> 1605.4707 = (MATCH) sum of: 1605.4707 = (MATCH) max plus 0.01 times others
> of: 1605.4707 = (MATCH) weight(contributions:news in 241151)
> [DefaultSimilarity], result of: 1605.4707 = score(doc=241151,freq=1.0 =
> termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
> 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> 357106.62 = fieldWeight in 241151, product of: 1.0 = tf(freq=1.0), with
> freq of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14, maxDocs=11282414)
> 24576.0 = fieldNorm(doc=241151)
> 
> 
> id:c208c2b4-1b3e-27b8-e040-a8c00409063a
> 
>  
> 6.5742764 = (MATCH) sum of: 6.5742764 = (MATCH) max plus 0.01 times others
> of: 3.304414 = (MATCH) weight(description:news^25.0 in 967895)
> [DefaultSimilarity], result of: 3.304414 = score(doc=967895,freq=1.0 =
> termFreq=1.0 ), product of: 0.042727955 = queryWeight, product of: 25.0 =
> boost 5.5240083 = idf(docFreq=122362, maxDocs=11282414) 3.093982E-4 =
> queryNorm 77.33611 = fieldWeight in 967895, product of: 1.0 = tf(freq=1.0),
> with freq of: 1.0 = termFreq=1.0 5.5240083 = idf(docFreq=122362,
> maxDocs=11282414) 14.0 = fieldNorm(doc=967895) 5.913381 = (MATCH)
> weight(pg_series_title:news^50.0 in 967895) [DefaultSimilarity], result of:
> 5.913381 = score(doc=967895,freq=1.0 = termFreq=1.0 ), product of:
> 0.080834694 = queryWeight, product of: 50.0 = boost 5.2252855 =
> idf(docFreq=164961, maxDocs=11282414) 3.093982E-4 = queryNorm 73.154 =
> fieldWeight in 967895, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 =
> termFreq=1.0 5.2252855 = idf(docFreq=164961, maxDocs=11282414) 14.0 =
> fieldNorm(doc=967895) 0.18680073 = (MATCH) weight(p_programme_title:news in
> 967895) [DefaultSimilarity], result of: 0.18680073 =
> score(doc=967895,freq=1.0 = termFreq=1.0 ), product of: 0.002031815 =
> queryWeight, product of: 6.5669904 = idf(docFreq=43120, maxDocs=11282414)
> 3.093982E-4 = queryNorm 91.93787 = fieldWeight in 967895, product of: 1.0 =
> tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 6.5669904 =
> idf(docFreq=43120, maxDocs=11282414) 14.0 = fieldNorm(doc=967895) 6.464123
> = (MATCH) weight(pg_series_title_ci:news^500.0 in 967895)
> [DefaultSim

Re: overlap function query

2013-01-30 Thread Chris Hostetter

: I think coord works at the document level, I was thinking of having
: something that worked at a field level, against a 'principle/primary'
: field.

I'm not sure what you mean by "works at the document level" ... coord is
used by the BooleanQuery scoring mechanism to define how scores should be
affected when a document doesn't match all terms in the query.

Mikhail's suggestion was that with an appropriately defined coord method
in a custom similarity, you could probably get close to what you are
asking about using the "query()" function on a BooleanQuery containing the
terms you are interested in.

It may be easier than that, though ... you might want to take a look at the
termfreq() and norm() functions ... combined with map() (to ensure you get
a "1" for docs that match a term, no matter what the termfreq() is) you
could probably get values proportionate to what you are looking for -- but
the denominators won't be the exact number of terms in the field unless
you customize the norm function in your similarity.
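
As a rough illustration of that combination (the field and term names here
are hypothetical), a function query like

  q={!func}sum(map(termfreq(title,'news'),1,10000,1),map(termfreq(title,'sport'),1,10000,1))

yields, per document, the number of the listed terms that appear in the
field at least once, ignoring how often each one occurs.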

In general, though, this smells like an XY problem, because the kind of
boosting you seem to be trying to achieve sounds like exactly what the
normal TF/IDF scoring algorithm will give you.  So perhaps you should tell
us more about some real-world specifics of the types of data/queries you are
using, what types of results you are seeing, and the types of results you
want...

https://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an "XY Problem" ... that is: you are dealing
with "X", you are assuming "Y" will help you, and you are asking about "Y"
without giving more details about the "X" so that we can understand the
full issue.  Perhaps the best solution doesn't involve "Y" at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341



-Hoss


Re: Server stops at Opening Searcher in 4.1

2013-01-30 Thread adityab
Thanks Shawn,
I will try both approaches.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Server-stops-at-Opening-Searcher-in-4-1-tp4037458p4037499.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Server stops at Opening Searcher in 4.1

2013-01-30 Thread Shawn Heisey

On 1/30/2013 11:48 AM, adityab wrote:

Thanks Shawn,
We use a Master-Slave architecture in Prod and are planning to continue with
it in 4.1.
Our indexing usually happens on the Master, and we index about 10K docs every
2 hrs and then perform a commit.
Our full re-index happens only when we have a schema change, so we don't use
autocommit.

Is there a way to turn off the "tlog", or when/what action recycles this
file?


The autoCommit settings that I have described will recycle the 
transaction logs without changing ANYTHING about how you index.  The 
resulting tlogs will be small, and there will not be very many of them. 
 That would be the first thing to try, and it should solve things for 
you completely.  You do need to have a _version_ field defined in your 
schema, but you probably already know that.


For example (values here are illustrative):

  <autoCommit>
    <maxDocs>25000</maxDocs>
    <maxTime>300000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

Removing the "updateLog" config from your updateHandler definition will 
turn off transaction logs entirely.  Because you have already switched 
to MMapDirectoryFactory, this should be safe - but I would recommend the 
autoCommit changes instead.


Thanks,
Shawn



Re: Server stops at Opening Searcher in 4.1

2013-01-30 Thread adityab
Thanks Shawn,
We use a Master-Slave architecture in Prod and are planning to continue with
it in 4.1.
Our indexing usually happens on the Master, and we index about 10K docs every
2 hrs and then perform a commit.
Our full re-index happens only when we have a schema change, so we don't use
autocommit.

Is there a way to turn off the "tlog", or when/what action recycles this
file?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Server-stops-at-Opening-Searcher-in-4-1-tp4037458p4037493.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Server stops at Opening Searcher in 4.1

2013-01-30 Thread Shawn Heisey

On 1/30/2013 11:21 AM, adityab wrote:

Shawn,
I believe your point is valid ... if you see below, my tlog.* file size is
huge. But shouldn't that be cleared if I am not using soft commit and do an
explicit hard commit?

After deleting this I was able to get my server up. Thanks for the
information/help. Also, please let me know how to avoid such a situation. For
now I can work with manually deleting the log, but I can't do it all the time.

[root@awbmmv030 tlog]# pwd
/storage/solrdata/tlog
[root@awbmmv030 tlog]# ls -l
total 27624908
-rw-r--r-- 1 root root 28260272185 Jan 29 15:44 tlog.001
[root@awbmmv030 tlog]#


I imagine that the reason that you do a large amount of indexing without 
any commits is that you don't want the clients to see any change until 
the entire index run is complete.


If you use autoCommit but make openSearcher false, then a hard commit 
will be automatically performed every time you reach maxDocs or maxTime 
milliseconds.  What the openSearcher value of false does is make it so 
the same IndexSearcher will continue to be used -- no change will be 
visible to clients making queries, even if one of the things you do is 
delete all documents before adding new documents.


I'm currently using a maxDocs of 25000 and a maxTime of 300000 (five 
minutes).  I don't think it ever takes five minutes to index 25000 
documents.


Thanks,
Shawn



Re: Server stops at Opening Searcher in 4.1

2013-01-30 Thread adityab
Shawn,
I believe your point is valid ... if you see below, my tlog.* file size is
huge. But shouldn't that be cleared if I am not using soft commit and do an
explicit hard commit?

After deleting this I was able to get my server up. Thanks for the
information/help. Also, please let me know how to avoid such a situation. For
now I can work with manually deleting the log, but I can't do it all the time.

[root@awbmmv030 tlog]# pwd
/storage/solrdata/tlog
[root@awbmmv030 tlog]# ls -l
total 27624908
-rw-r--r-- 1 root root 28260272185 Jan 29 15:44 tlog.001
[root@awbmmv030 tlog]#




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Server-stops-at-Opening-Searcher-in-4-1-tp4037458p4037483.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Server stops at Opening Searcher in 4.1

2013-01-30 Thread adityab
Thanks for the additional information Shawn,

I am testing 4.1 on a single machine, single core, so no cloud. I did change
NRTCachingDirectoryFactory to MMapDirectoryFactory, and after indexing all
the documents we do a hard commit explicitly from our publisher client.
I was able to run queries to verify my documents. The problem appeared
after I restarted the server for a load test, just to make sure all caches
are clean.
This is where I see the server never starts and is stuck at the point
mentioned in my logs.

If I delete the index manually and then start the server, everything looks
good.
Also, I am not able to set the log level to debug from the Admin UI (it
doesn't show up; maybe that's another bug).



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Server-stops-at-Opening-Searcher-in-4-1-tp4037458p4037481.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Possible issue in edismax?

2013-01-30 Thread Sandeep Mestry
I have pasted it below and it is slightly different from the dismax
configuration I have mentioned above, as I was playing with all sorts of
boost values; however, it looks more like below:


2675.7844 = (MATCH) sum of: 2675.7844 = (MATCH) max plus 0.01 times others
of: 2675.7844 = (MATCH) weight(contributions:news in 63298)
[DefaultSimilarity], result of: 2675.7844 = score(doc=63298,freq=1.0 =
termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
595177.7 = fieldWeight in 63298, product of: 1.0 = tf(freq=1.0), with freq
of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14, maxDocs=11282414)
40960.0 = fieldNorm(doc=63298)


2317.297 = (MATCH) sum of: 2317.297 = (MATCH) max plus 0.01 times others
of: 2317.297 = (MATCH) weight(contributions:news in 9826415)
[DefaultSimilarity], result of: 2317.297 = score(doc=9826415,freq=3.0 =
termFreq=3.0 ), product of: 0.004495774 = queryWeight, product of:
14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
515439.0 = fieldWeight in 9826415, product of: 1.7320508 = tf(freq=3.0),
with freq of: 3.0 = termFreq=3.0 14.530705 = idf(docFreq=14,
maxDocs=11282414) 20480.0 = fieldNorm(doc=9826415)


2140.6274 = (MATCH) sum of: 2140.6274 = (MATCH) max plus 0.01 times others
of: 2140.6274 = (MATCH) weight(contributions:news in 9882325)
[DefaultSimilarity], result of: 2140.6274 = score(doc=9882325,freq=1.0 =
termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
476142.16 = fieldWeight in 9882325, product of: 1.0 = tf(freq=1.0), with
freq of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14, maxDocs=11282414)
32768.0 = fieldNorm(doc=9882325)


1605.4707 = (MATCH) sum of: 1605.4707 = (MATCH) max plus 0.01 times others
of: 1605.4707 = (MATCH) weight(contributions:news in 220007)
[DefaultSimilarity], result of: 1605.4707 = score(doc=220007,freq=1.0 =
termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
357106.62 = fieldWeight in 220007, product of: 1.0 = tf(freq=1.0), with
freq of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14, maxDocs=11282414)
24576.0 = fieldNorm(doc=220007)


1605.4707 = (MATCH) sum of: 1605.4707 = (MATCH) max plus 0.01 times others
of: 1605.4707 = (MATCH) weight(contributions:news in 241151)
[DefaultSimilarity], result of: 1605.4707 = score(doc=241151,freq=1.0 =
termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
357106.62 = fieldWeight in 241151, product of: 1.0 = tf(freq=1.0), with
freq of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14, maxDocs=11282414)
24576.0 = fieldNorm(doc=241151)


id:c208c2b4-1b3e-27b8-e040-a8c00409063a

 
6.5742764 = (MATCH) sum of: 6.5742764 = (MATCH) max plus 0.01 times others
of: 3.304414 = (MATCH) weight(description:news^25.0 in 967895)
[DefaultSimilarity], result of: 3.304414 = score(doc=967895,freq=1.0 =
termFreq=1.0 ), product of: 0.042727955 = queryWeight, product of: 25.0 =
boost 5.5240083 = idf(docFreq=122362, maxDocs=11282414) 3.093982E-4 =
queryNorm 77.33611 = fieldWeight in 967895, product of: 1.0 = tf(freq=1.0),
with freq of: 1.0 = termFreq=1.0 5.5240083 = idf(docFreq=122362,
maxDocs=11282414) 14.0 = fieldNorm(doc=967895) 5.913381 = (MATCH)
weight(pg_series_title:news^50.0 in 967895) [DefaultSimilarity], result of:
5.913381 = score(doc=967895,freq=1.0 = termFreq=1.0 ), product of:
0.080834694 = queryWeight, product of: 50.0 = boost 5.2252855 =
idf(docFreq=164961, maxDocs=11282414) 3.093982E-4 = queryNorm 73.154 =
fieldWeight in 967895, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 =
termFreq=1.0 5.2252855 = idf(docFreq=164961, maxDocs=11282414) 14.0 =
fieldNorm(doc=967895) 0.18680073 = (MATCH) weight(p_programme_title:news in
967895) [DefaultSimilarity], result of: 0.18680073 =
score(doc=967895,freq=1.0 = termFreq=1.0 ), product of: 0.002031815 =
queryWeight, product of: 6.5669904 = idf(docFreq=43120, maxDocs=11282414)
3.093982E-4 = queryNorm 91.93787 = fieldWeight in 967895, product of: 1.0 =
tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 6.5669904 =
idf(docFreq=43120, maxDocs=11282414) 14.0 = fieldNorm(doc=967895) 6.464123
= (MATCH) weight(pg_series_title_ci:news^500.0 in 967895)
[DefaultSimilarity], result of: 6.464123 = score(doc=967895,freq=1.0 =
termFreq=1.0 ), product of: 0.9696 = queryWeight, product of: 500.0 =
boost 6.4641423 = idf(docFreq=47791, maxDocs=11282414) 3.093982E-4 =
queryNorm 6.4641423 = fieldWeight in 967895, product of: 1.0 =
tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 6.4641423 =
idf(docFreq=47791, maxDocs=11282414) 1.0 = fieldNorm(doc=967895) 1.6107484
= (MATCH) weight(title_ci:news^100.0 in 967895) [DefaultSimilarity], result
of: 1.6107484 = score(doc=967895,freq=1.0 = termFreq=1.0 ), product of:
0.22324038 = queryWeight, product of: 100.0 = boost 7.215

Re: Server stops at Opening Searcher in 4.1

2013-01-30 Thread adityab
Thanks Jack,

I did take the latest solrconfig.xml file.
The only change I made to the file is to use MMapDirectory:

  <directoryFactory name="DirectoryFactory" class="solr.MMapDirectoryFactory"/>

Apart from that, I increased the cache sizes for the query/document/filter
caches, all with warm-up set to 0 (for the test).





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Server-stops-at-Opening-Searcher-in-4-1-tp4037458p4037478.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Server stops at Opening Searcher in 4.1

2013-01-30 Thread Shawn Heisey

On 1/30/2013 10:31 AM, Jack Krupansky wrote:

Do you have any customized settings for <indexConfig>, warming
queries, or customized settings for the old/deprecated <indexDefaults>
or <mainIndex>? Try using the settings from the latest solrconfig.xml
and then customize from there. Or at least see how they are different.


Jack,

Is there any possibility that their failure could be caused by a very 
very large updateLog?


If not, the rest of this probably doesn't apply.  If so:

Aditya, a large updateLog is generated when a large amount of indexing 
is done without any hard commits.  One way to make things better without 
changing the way you index is to put nonzero values for maxDocs and/or 
maxTime into autoCommit, with openSearcher=false.


http://wiki.apache.org/solr/SolrConfigXml#Update_Handler_Section

Another option is to turn off the updateLog entirely.  That would not be 
recommended if you are using SolrCloud.  If there's no cloud, you'd 
probably be OK, but you'd also want to change from 
NRTCachingDirectoryFactory to MMapDirectoryFactory to avoid data loss.


Thanks,
Shawn



Re: Possible issue in edismax?

2013-01-30 Thread Felipe Lahti
Let me see if I understood your problem:

From your first e-mail, I think you are worried about the order of documents
returned from Solr. Is that correct? If so, as I said before, it's not only
the boosting that influences the order of returned documents; there are also
term frequency, IDF (inverse document frequency), and so on. If I understood
your first e-mail correctly, you are interested in getting rid of IDF. For
that, you can create a NoIDFSimilarity class to override the default
similarity.
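
A minimal sketch of such a class, assuming the Lucene 4.x DefaultSimilarity
API (the class and package names are placeholders; wire it up in schema.xml
with <similarity class="com.example.NoIDFSimilarity"/>):

  import org.apache.lucene.search.similarities.DefaultSimilarity;

  // Neutralizes IDF so rare terms stop dominating the score;
  // tf, norms and boosts still behave as in DefaultSimilarity.
  public class NoIDFSimilarity extends DefaultSimilarity {
      @Override
      public float idf(long docFreq, long numDocs) {
          return 1.0f;
      }
  }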

Can you paste here the score calculation for one document?


On Wed, Jan 30, 2013 at 2:06 PM, Sandeep Mestry  wrote:

> (Sorry for the incomplete reply in my previous mail; I didn't know Ctrl-F
> sends an email in Gmail.. ;-))
>
> Thanks Felipe, yes I have seen that and my requirement falls for
>
> How can I make exact-case matches score higher
>
> Example: a query of "Penguin" should score documents containing "Penguin"
> higher than docs containing "penguin".
>
> The general strategy is to index the content twice, using different fields
> with different fieldTypes (and different analyzers associated with those
> fieldTypes). One analyzer will contain a lowercase filter for
> case-insensitive matches, and one will preserve case for exact-case
> matches.
>
> Use copyField commands in
> the schema to index a single input field multiple times.
>
> Once the content is indexed into multiple fields that are analyzed
> differently, query across both fields.
>
> I have added a case-insensitive field too, to rank the exact matches
> higher; however, the result is not even considering the matches in that
> field - forget the exact-matching part.
>
> And I have tried the debugQuery option as mentioned in my previous mail,
> and I have also posted the parsed queries. From the debug output, I see that
> the field boosted with a lesser factor (contribution) is still scoring
> higher than the one with a higher boost factor (series_title).
>
>
> Thanks,
>
> Sandeep
>
>
>
>
> On 30 January 2013 16:02, Sandeep Mestry  wrote:
>
> > Thanks Felipe, yes I have seen that and my requirement somewhere falls
> for
> >
> >
> > On 30 January 2013 15:53, Felipe Lahti  wrote:
> >
> >> Hi Sandeep,
> >>
> >> Quick answer is that not only the boost that you define in your
> >> requestHandler is taken to calculate the score of each document. There
> are
> >> others factors that contribute to score calculation. You can take a look
> >> here about http://wiki.apache.org/solr/SolrRelevancyFAQ. Also, you can
> >> see
> >> using debugQuery=true the score calculation for each document returned.
> >>
> >> Let me know you need something else.
> >>
> >>
> >>
> >> On Wed, Jan 30, 2013 at 1:13 PM, Sandeep Mestry 
> >> wrote:
> >>
> >> > Hi All,
> >> >
> >> > I'm facing an issue in relevancy calculation by the dismax query parser.
> >> > The boost factor applied does not work as expected in certain cases when
> >> > the keyword is generic, and by generic I mean the keyword appears
> >> > many times in the document as well as in the index.
> >> >
> >> > I have parser configuration as below:
> >> >
> >> > 
> >> > 
> >> > edismax
> >> > explicit
> >> > 0.01
> >> > series_title^500 title^100 description^15
> >> > contribution
> >> > series_title^200
> >> > 0
> >> > *:*
> >> > 
> >> > 
> >> >
> >> > As you can see above, I'd expect the documents containing the matches
> >> for
> >> > series title should rank higher than the ones in contribution.
> >> >
> >> > This works well, if I type in a query like 'wonderworld' which is a
> less
> >> > occurring term and the series titles rank higher. But, if I type in a
> >> > keyword like 'news' which is the most common term in the index, I get
> >> hits
> >> > in contributions even though I have lots of documents having word news
> >> in
> >> > series title.
> >> >
> >> > The field definition is as below:
> >> >
> >> >  >> > multiValued="false" />
> >> >  >> > multiValued="false" />
> >> >  >> > multiValued="false" />
> >> >  >> > multiValued="true" />
> >> >
> >> >  positionIncrementGap="100"
> >> > compressThreshold="10">
> >> > 
> >> > 
> >> >  >> > generateWordParts="1" generateNumberParts="1" catenateWords="1"
> >> > catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> >> > 
> >> > 
> >> > 
> >> > 
> >> >  >> > generateWordParts="1" generateNumberParts="1" catenateWords="0"
> >> > catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> >> > 
> >> > 
> >> > 
> >> >
> >> >  >> positionIncrementGap="100"
> >> > >
> >> > 
> >> > 
> >> >  >> > stemEnglishPossessive="0" generateWordParts="1"
> generateNumberParts="1"
> >> >

Re: SolrCloud 4.1 - change config set for a collection?

2013-01-30 Thread Mark Miller

On Jan 30, 2013, at 12:14 PM, Shawn Heisey  wrote:

> Is it possible to issue a command through the collections API that will 
> assign a new config set (already stored in zookeeper) to an existing 
> collection?

No, not currently. We are talking about such things here: SOLR-4193.

Right now you either have to rely on one of the auto rules (e.g. collections 
with no link that see a conf set sharing their name will auto-link), use the 
ZkCli tool, or do something manual.
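
For the ZkCli route, the linkconfig command does it; a sketch (the zkhost,
collection and confset names are placeholders):

  cloud-scripts/zkcli.sh -zkhost localhost:9983 -cmd linkconfig \
      -collection mycollection -confname myconf

followed by a reload of the collection so the cores pick up the new config.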

> 
> Related - because such changes would require a reload, is there a RELOAD 
> action on the collection API that finds all the cores for that collection and 
> reloads them?

See the reload command: 
http://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_Collections_API
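
For example (host and collection name are placeholders):

  http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection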

> 
> The SolrCloud wiki page doesn't have a reference for the action parameter and 
> the parameter list is almost guaranteed to be incomplete.

I don't think it is? What is missing?

- Mark



Re: Server stops at Opening Searcher in 4.1

2013-01-30 Thread Jack Krupansky
Do you have any customized settings for <indexConfig>, warming queries, 
or customized settings for the old/deprecated <indexDefaults> or 
<mainIndex>? Try using the settings from the latest solrconfig.xml and then 
customize from there. Or at least see how they are different.


-- Jack Krupansky

-Original Message- 
From: adityab

Sent: Wednesday, January 30, 2013 12:21 PM
To: solr-user@lucene.apache.org
Subject: Server stops at Opening Searcher in 4.1

Hi,
We currently have Solr 3.5 and it is working well. With the features and fixes
available in 4.1, we decided to upgrade.
We started some tests with Solr 4.1 on JBoss 7.1. Everything looked good at
first: we ran indexing and executed some queries. We restarted the servers
before performing a load test and encountered this problem several times: the
searcher is opening (stuck there) and then JBoss throws an error, as seen
below. Our index size is 8.9GB (25M docs).
We have observed this usually during a restart with an existing index. To get
the server started we need to delete the index, and then it goes through. Any
idea what could be the issue?

09:12:08,007 INFO  [org.apache.solr.core.RequestHandlers]
(coreLoadExecutor-3-thread-1) adding lazy requestHandler: solr.SearchHandler
09:12:08,007 INFO  [org.apache.solr.core.RequestHandlers]
(coreLoadExecutor-3-thread-1) created /terms: solr.SearchHandler
09:12:08,020 INFO  [org.apache.solr.handler.loader.XMLLoader]
(coreLoadExecutor-3-thread-1) xsltCacheLifetimeSeconds=60
09:12:08,022 INFO  [org.apache.solr.handler.loader.XMLLoader]
(coreLoadExecutor-3-thread-1) xsltCacheLifetimeSeconds=60
09:12:08,023 INFO  [org.apache.solr.handler.loader.XMLLoader]
(coreLoadExecutor-3-thread-1) xsltCacheLifetimeSeconds=60
09:12:08,026 INFO  [org.apache.solr.core.CachingDirectoryFactory]
(coreLoadExecutor-3-thread-1) Releasing directory:/storage/solrdata
09:12:08,260 INFO  [org.apache.solr.search.SolrIndexSearcher]
(coreLoadExecutor-3-thread-1) Opening Searcher@7703d93e main
09:13:05,576 INFO  [org.jboss.as.server] (DeploymentScanner-threads - 2)
JBAS015870: Deploy of deployment "solr.war" was rolled back with failure
message Operation cancelled
09:13:05,578 ERROR [org.jboss.as.server.deployment.scanner]
(DeploymentScanner-threads - 1) JBAS015052: Did not receive a response to
the deployment operation within the allowed timeout period [60 seconds].
Check the server configuration file and the server logs to find more about
the status of the deployment.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Server-stops-at-Opening-Searcher-in-4-1-tp4037458.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Server stops at Opening Searcher in 4.1

2013-01-30 Thread adityab
Hi,
We currently have Solr 3.5 and it is working well. With the features and fixes
available in 4.1, we decided to upgrade.
We started some tests with Solr 4.1 on JBoss 7.1. Everything looked good at
first: we ran indexing and executed some queries. We restarted the servers
before performing a load test and encountered this problem several times: the
searcher is opening (stuck there) and then JBoss throws an error, as seen
below. Our index size is 8.9GB (25M docs).
We have observed this usually during a restart with an existing index. To get
the server started we need to delete the index, and then it goes through. Any
idea what could be the issue?

09:12:08,007 INFO  [org.apache.solr.core.RequestHandlers]
(coreLoadExecutor-3-thread-1) adding lazy requestHandler: solr.SearchHandler
09:12:08,007 INFO  [org.apache.solr.core.RequestHandlers]
(coreLoadExecutor-3-thread-1) created /terms: solr.SearchHandler
09:12:08,020 INFO  [org.apache.solr.handler.loader.XMLLoader]
(coreLoadExecutor-3-thread-1) xsltCacheLifetimeSeconds=60
09:12:08,022 INFO  [org.apache.solr.handler.loader.XMLLoader]
(coreLoadExecutor-3-thread-1) xsltCacheLifetimeSeconds=60
09:12:08,023 INFO  [org.apache.solr.handler.loader.XMLLoader]
(coreLoadExecutor-3-thread-1) xsltCacheLifetimeSeconds=60
09:12:08,026 INFO  [org.apache.solr.core.CachingDirectoryFactory]
(coreLoadExecutor-3-thread-1) Releasing directory:/storage/solrdata
09:12:08,260 INFO  [org.apache.solr.search.SolrIndexSearcher]
(coreLoadExecutor-3-thread-1) Opening Searcher@7703d93e main
09:13:05,576 INFO  [org.jboss.as.server] (DeploymentScanner-threads - 2)
JBAS015870: Deploy of deployment "solr.war" was rolled back with failure
message Operation cancelled
09:13:05,578 ERROR [org.jboss.as.server.deployment.scanner]
(DeploymentScanner-threads - 1) JBAS015052: Did not receive a response to
the deployment operation within the allowed timeout period [60 seconds].
Check the server configuration file and the server logs to find more about
the status of the deployment.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Server-stops-at-Opening-Searcher-in-4-1-tp4037458.html
Sent from the Solr - User mailing list archive at Nabble.com.


SolrCloud 4.1 - change config set for a collection?

2013-01-30 Thread Shawn Heisey
Is it possible to issue a command through the collections API that will 
assign a new config set (already stored in zookeeper) to an existing 
collection?


Related - because such changes would require a reload, is there a RELOAD 
action on the collection API that finds all the cores for that 
collection and reloads them?


The SolrCloud wiki page doesn't have a reference for the action 
parameter and the parameter list is almost guaranteed to be incomplete.


I thought I remembered seeing a Jira issue that would automatically 
reload collections when making changes to the associated config set.


Thanks,
Shawn


Re: SolrCloud: admin security vs. replication

2013-01-30 Thread Mark Miller
The admin user interface and admin/cores are two very different things; they 
just happen to share "admin" in the URL.

It doesn't make any sense to secure admin/cores unless you are also going to 
secure all the other Solr APIs.

- Mark

On Jan 30, 2013, at 5:55 AM, AlexeyK  wrote:

> Hi,
> There are a lot of posts which talk about hardening the /admin handler with
> user credentials etc.
> On the other hand, the replication handler won't work if /admin/cores is
> also hardened.
> Considering this fact, how can I allow secure external access to the admin
> interface AND allow proper cluster operation?
> Not setting any security on admin/cores is not an option.
> 
> Thanks,
> Alexey 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/SolrCloud-admin-security-vs-replication-tp4037337.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Can I start solr with replication activated but disabled between master and slave

2013-01-30 Thread Arcadius Ahouansou
As stated by Robi, you can do it through the admin UI:

- disable replication on the master through the admin UI, or

- disable polling on the slave through the admin UI. Disabling polling on
the slaves is very handy if you are doing stuff on the master that requires
a master restart.
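
Both toggles are also plain HTTP commands on the replication handler, which
is handy for scripting (the host and core names are placeholders):

  http://master:8983/solr/mycore/replication?command=disablereplication
  http://slave:8983/solr/mycore/replication?command=disablepoll

with enablereplication / enablepoll to switch them back on.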

Thanks.

Arcadius.





On 30 January 2013 16:35, Petersen, Robert  wrote:

> Hi Jamel,
>
> You can start solr slaves with them pointed at a master and then turn off
> replication in the admin replication page.
>
> Hope that helps,
> -Robi
>
> Robert (Robi) Petersen
> Senior Software Engineer
> Search Department
>
>
>
>
> -Original Message-
> From: Jamel ESSOUSSI [mailto:jamel.essou...@gmail.com]
> Sent: Wednesday, January 30, 2013 2:45 AM
> To: solr-user@lucene.apache.org
> Subject: Can I start solr with replication activated but disabled between
> master and slave
>
> Hello,
>
> I would like to start solr with the following configuration;
>
> Replication between master and slave activated but not enabled.
>
> Regards
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Can-I-start-solr-with-replication-activated-but-disabled-between-master-and-slave-tp4037333.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
>


RE: Can I start solr with replication activated but disabled between master and slave

2013-01-30 Thread Petersen, Robert
Hi Jamel,

You can start solr slaves with them pointed at a master and then turn off 
replication in the admin replication page.

Hope that helps,
-Robi

Robert (Robi) Petersen
Senior Software Engineer
Search Department

 


-Original Message-
From: Jamel ESSOUSSI [mailto:jamel.essou...@gmail.com] 
Sent: Wednesday, January 30, 2013 2:45 AM
To: solr-user@lucene.apache.org
Subject: Can I start solr with replication activated but disabled between 
master and slave

Hello,

I would like to start solr with the following configuration;

Replication between master and slave activated but not enabled.

Regards



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-I-start-solr-with-replication-activated-but-disabled-between-master-and-slave-tp4037333.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: A question about attaching shards to load balancers

2013-01-30 Thread Peter Keegan
Aren't you concerned about having a single point of failure with this setup?

On Wed, Jan 30, 2013 at 10:38 AM, Michael Ryan  wrote:

> From a performance point of view, I can't imagine it mattering. In our
> setup, we have a dedicated Solr server that is not a shard that takes
> incoming requests (we call it the "coordinator"). This server is very
> lightweight and practically has no load at all.
>
> My gut feeling is that having a separate dedicated server might be a
> slightly better approach, as it will have totally different performance
> characteristics than the shards, and so you can tune it for this.
>
> -Michael
>


Re: configuring datasource for dynamic password and user

2013-01-30 Thread Walter Underwood
This was discussed last week, with two different solutions:

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201301.mbox/browser

In general, you can set a Java property, like "-Ddbpass=fred", then use it in 
the config files as "${dbpass}".

wunder
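
A minimal data-config.xml sketch of that approach, assuming dburl, dbuser
and dbpass were passed as -D properties at startup and are visible to Solr's
variable substitution:

  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
      url="${dburl}" user="${dbuser}" password="${dbpass}" />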

On Jan 30, 2013, at 3:37 AM, Lapera-Valenzuela, Elizabeth [Primerica] wrote:

> Hi, we will be using solr on development, test and prod platforms.  Is
> there a way to dynamically create the datasource so that the url,
> password and user id are passed in, or can I point it to a properties file
> that has this info?  Thanks
> 







Re: Possible issue in edismax?

2013-01-30 Thread Sandeep Mestry
(Sorry for the incomplete reply in my previous mail; I didn't know Ctrl-F
sends an email in Gmail.. ;-))

Thanks Felipe, yes I have seen that and my requirement falls for

How can I make exact-case matches score higher

Example: a query of "Penguin" should score documents containing "Penguin"
higher than docs containing "penguin".

The general strategy is to index the content twice, using different fields
with different fieldTypes (and different analyzers associated with those
fieldTypes). One analyzer will contain a lowercase filter for
case-insensitive matches, and one will preserve case for exact-case matches.

Use copyField commands in
the schema to index a single input field multiple times.

Once the content is indexed into multiple fields that are analyzed
differently, query across both fields.
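
A schema sketch of that strategy (the field and type names are placeholders):

  <field name="title" type="text_exact" indexed="true" stored="true" />
  <field name="title_ci" type="text_ci" indexed="true" stored="false" />
  <copyField source="title" dest="title_ci" />

and then query both, e.g. qf="title^100 title_ci^10", so exact-case hits
outrank case-insensitive ones.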

I have added a case-insensitive field too, to rank the exact matches
higher; however, the result is not even considering the matches in that
field - forget the exact-matching part.

And I have tried the debugQuery option as mentioned in my previous mail,
and I have also posted the parsed queries. From the debug output, I see that
the field boosted with a lesser factor (contribution) is still scoring higher
than the one with a higher boost factor (series_title).


Thanks,

Sandeep




On 30 January 2013 16:02, Sandeep Mestry  wrote:

> Thanks Felipe, yes I have seen that and my requirement somewhere falls for
>
>
> On 30 January 2013 15:53, Felipe Lahti  wrote:
>
>> Hi Sandeep,
>>
>> Quick answer is that not only the boost that you define in your
>> requestHandler is taken to calculate the score of each document. There are
>> others factors that contribute to score calculation. You can take a look
>> here about http://wiki.apache.org/solr/SolrRelevancyFAQ. Also, you can
>> see
>> using debugQuery=true the score calculation for each document returned.
>>
>> Let me know you need something else.
>>
>>
>>
>> On Wed, Jan 30, 2013 at 1:13 PM, Sandeep Mestry 
>> wrote:
>>
>> > Hi All,
>> >
>> > I'm facing an issue in relevancy calculation by the dismax query parser.
>> > The boost factor applied does not work as expected in certain cases when
>> > the keyword is generic, and by generic I mean the keyword appears
>> > many times in the document as well as in the index.
>> >
>> > I have parser configuration as below:
>> >
>> > 
>> > 
>> > edismax
>> > explicit
>> > 0.01
>> > series_title^500 title^100 description^15
>> > contribution
>> > series_title^200
>> > 0
>> > *:*
>> > 
>> > 
>> >
>> > As you can see above, I'd expect the documents containing the matches
>> for
>> > series title should rank higher than the ones in contribution.
>> >
>> > This works well, if I type in a query like 'wonderworld' which is a less
>> > occurring term and the series titles rank higher. But, if I type in a
>> > keyword like 'news' which is the most common term in the index, I get
>> hits
>> > in contributions even though I have lots of documents having word news
>> in
>> > series title.
>> >
>> > The field definition is as below:
>> >
>> > > > multiValued="false" />
>> > > > multiValued="false" />
>> > > > multiValued="false" />
>> > > > multiValued="true" />
>> >
>> > > > compressThreshold="10">
>> > 
>> > 
>> > > > generateWordParts="1" generateNumberParts="1" catenateWords="1"
>> > catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>> > 
>> > 
>> > 
>> > 
>> > > > generateWordParts="1" generateNumberParts="1" catenateWords="0"
>> > catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>> > 
>> > 
>> > 
>> >
>> > > positionIncrementGap="100"
>> > >
>> > 
>> > 
>> > > > stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
>> > catenateWords="1" catenateNumbers="1" catenateAll="1"
>> splitOnCaseChange="1"
>> > splitOnNumerics="0" preserveOriginal="1" />
>> > 
>> > 
>> > 
>> > 
>> > > > stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
>> > catenateWords="1" catenateNumbers="1" catenateAll="1"
>> splitOnCaseChange="1"
>> > splitOnNumerics="0" preserveOriginal="1" />
>> > 
>> > 
>> >  
>> >
>> > I have tried debugging, and when I use the query term 'news', I see that
>> > matches for contributions are ranked higher than series title. The parsed
>> > queries look like below:
>> > (Note that I have edited the query, as in reality I have a lot of fields
>> > that are searchable, and I have only mentioned the fields containing text
>> > data - the rest all contain uuids.)
>> >
>> > 
>> > (+DisjunctionM

Re: Possible issue in edismax?

2013-01-30 Thread Sandeep Mestry
Thanks Felipe, yes I have seen that and my requirement somewhere falls for


On 30 January 2013 15:53, Felipe Lahti  wrote:

> Hi Sandeep,
>
> Quick answer is that not only the boost that you define in your
> requestHandler is taken to calculate the score of each document. There are
> others factors that contribute to score calculation. You can take a look
> here about http://wiki.apache.org/solr/SolrRelevancyFAQ. Also, you can see
> using debugQuery=true the score calculation for each document returned.
>
> Let me know you need something else.
>
>
>
> On Wed, Jan 30, 2013 at 1:13 PM, Sandeep Mestry 
> wrote:
>
> > Hi All,
> >
> > I'm facing an issue in relevancy calculation by the dismax query parser.
> > The boost factor applied does not work as expected in certain cases when
> > the keyword is generic, and by generic I mean the keyword appears
> > many times in the document as well as in the index.
> >
> > I have parser configuration as below:
> >
> > 
> > 
> > edismax
> > explicit
> > 0.01
> > series_title^500 title^100 description^15
> > contribution
> > series_title^200
> > 0
> > *:*
> > 
> > 
> >
> > As you can see above, I'd expect the documents containing the matches for
> > series title should rank higher than the ones in contribution.
> >
> > This works well if I type in a query like 'wonderworld', which is a
> > rarely occurring term, and the series titles rank higher. But if I type
> > in a keyword like 'news', which is the most common term in the index, I
> > get hits in contributions even though I have lots of documents with the
> > word news in the series title.
> >
> > The field definition is as below:
> >
> >  > multiValued="false" />
> >  > multiValued="false" />
> >  > multiValued="false" />
> >  > multiValued="true" />
> >
> >  > compressThreshold="10">
> > 
> > 
> >  > generateWordParts="1" generateNumberParts="1" catenateWords="1"
> > catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> > 
> > 
> > 
> > 
> >  > generateWordParts="1" generateNumberParts="1" catenateWords="0"
> > catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> > 
> > 
> > 
> >
> >  positionIncrementGap="100"
> > >
> > 
> > 
> >  > stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
> > catenateWords="1" catenateNumbers="1" catenateAll="1"
> splitOnCaseChange="1"
> > splitOnNumerics="0" preserveOriginal="1" />
> > 
> > 
> > 
> > 
> >  > stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
> > catenateWords="1" catenateNumbers="1" catenateAll="1"
> splitOnCaseChange="1"
> > splitOnNumerics="0" preserveOriginal="1" />
> > 
> > 
> >  
> >
> > I have tried debugging, and when I use the query term news, I see that
> > matches for contributions are ranked higher than series title. The
> > parsed queries look like below:
> > (Note that I have edited the query, as in reality I have a lot of
> > searchable fields and I have only mentioned the fields containing text
> > data; the rest all contain UUIDs.)
> >
> > 
> > (+DisjunctionMaxQuery((description:news^15.0 | title:news^100.0 |
> > contributions:news | series_title:news^500.0)~0.01) () () () () () () ()
> ()
> > () () () () () () () () () () () () () () () () () () () ())/no_coord
> > 
> > 
> > +(description:news^15 | title:news^100.0 | contributions:news |
> > series_title:news^500.0)~0.01 () () () () () () () () () () () () () ()
> ()
> > () () () () () () () () () () () () ()
> >
> >
> > Could you guide me in the right direction, please?
> >
> > Many Thanks,
> > Sandeep
> >
>
>
>
> --
> Felipe Lahti
> Consultant Developer - ThoughtWorks Porto Alegre
>


Re: Variable expansion in DIH SimplePropertiesWriter's filename?

2013-01-30 Thread Jonas Birgander
Thank you for the reply, issue created at 
.


Regards,
Jonas Birgander


On 2013-01-30 16:26, Dyer, James wrote:

This is a bug.  Can you paste what you've said here into a new JIRA issue?

https://issues.apache.org/jira/browse/SOLR

James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: Jonas Birgander [mailto:jonas.birgan...@prisjakt.nu]
Sent: Wednesday, January 30, 2013 4:54 AM
To: solr-user@lucene.apache.org
Subject: Variable expansion in DIH SimplePropertiesWriter's filename?

Hello,

I'm testing Solr 4.1, but I've run into some problems with
DataImportHandler's new propertyWriter tag.
I'm trying to use variable expansion in the `filename` field when using
SimplePropertiesWriter.

Here are the relevant parts of my configuration:

conf/solrconfig.xml
-


  db-data-config.xml



  
  ${country_code}
  




conf/db-data-config.xml
-





  





  






If country_code is set to "gb", I want the last_index_time to be read
and written in the file conf/gb.dataimport.properties, instead of the
default conf/dataimport.properties

The variable expansion works perfectly in the SQL and setup of the data
source, but not in the property writer's filename field.

When initiating an import, the log file shows:

Jan 30, 2013 11:25:42 AM org.apache.solr.handler.dataimport.DataImporter
maybeReloadConfiguration
INFO: Loading DIH Configuration: db-data-config.xml
Jan 30, 2013 11:25:42 AM
org.apache.solr.handler.dataimport.config.ConfigParseUtil verifyWithSchema
INFO: The field :$skipDoc present in DataConfig does not have a
counterpart in Solr Schema
Jan 30, 2013 11:25:42 AM
org.apache.solr.handler.dataimport.config.ConfigParseUtil verifyWithSchema
INFO: The field :$deleteDocById present in DataConfig does not have a
counterpart in Solr Schema
Jan 30, 2013 11:25:42 AM org.apache.solr.handler.dataimport.DataImporter
loadDataConfig
INFO: Data Configuration loaded successfully
Jan 30, 2013 11:25:42 AM org.apache.solr.handler.dataimport.DataImporter
doFullImport
INFO: Starting Full Import
Jan 30, 2013 11:25:42 AM
org.apache.solr.handler.dataimport.SimplePropertiesWriter
readIndexerProperties
WARNING: Unable to read:
${dataimporter.request.country_code}.dataimport.properties


Is it supposed to work?
Anyone else having problems with this?


Any help appreciated!

Regards,




--
Jonas Birgander 
Systemutvecklare Prisjakt & Minhembio


Re: Possible issue in edismax?

2013-01-30 Thread Felipe Lahti
Hi Sandeep,

Quick answer is that the boost you define in your requestHandler is not
the only thing taken into account when calculating the score of each
document. There are other factors that contribute to the score
calculation. You can take a look here:
http://wiki.apache.org/solr/SolrRelevancyFAQ. Also, using debugQuery=true
you can see the score calculation for each document returned.

Let me know if you need something else.
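
One of those factors is the per-field length norm (together with any
index-time boost) that gets baked into fieldNorm at index time. Purely as
an illustration - the attribute values below are assumptions, not taken
from this thread - norms can be switched off for a field in schema.xml
like this (a re-index is needed afterwards):

  <field name="contributions" type="text" indexed="true" stored="true"
         multiValued="true" omitNorms="true" />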



On Wed, Jan 30, 2013 at 1:13 PM, Sandeep Mestry  wrote:

> Hi All,
>
> I'm facing an issue in the relevancy calculation by the dismax query
> parser. The boost factor applied does not work as expected in certain
> cases when the keyword is generic; by generic I mean that the keyword
> appears many times in the document as well as in the index.
>
> I have parser configuration as below:
>
> 
> 
> edismax
> explicit
> 0.01
> series_title^500 title^100 description^15
> contribution
> series_title^200
> 0
> *:*
> 
> 
>
> As you can see above, I'd expect the documents matching on series title
> to rank higher than the ones matching on contribution.
>
> This works well if I type in a query like 'wonderworld', which is a
> rarely occurring term, and the series titles rank higher. But if I type in
> a keyword like 'news', which is the most common term in the index, I get
> hits in contributions even though I have lots of documents with the word
> news in the series title.
>
> The field definition is as below:
>
>  multiValued="false" />
>  multiValued="false" />
>  multiValued="false" />
>  multiValued="true" />
>
>  compressThreshold="10">
> 
> 
>  generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> 
> 
> 
> 
>  generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> 
> 
> 
>
>  >
> 
> 
>  stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
> catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
> splitOnNumerics="0" preserveOriginal="1" />
> 
> 
> 
> 
>  stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
> catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
> splitOnNumerics="0" preserveOriginal="1" />
> 
> 
>  
>
> I have tried debugging, and when I use the query term news, I see that
> matches for contributions are ranked higher than series title. The parsed
> queries look like below:
> (Note that I have edited the query, as in reality I have a lot of
> searchable fields and I have only mentioned the fields containing text
> data; the rest all contain UUIDs.)
>
> 
> (+DisjunctionMaxQuery((description:news^15.0 | title:news^100.0 |
> contributions:news | series_title:news^500.0)~0.01) () () () () () () () ()
> () () () () () () () () () () () () () () () () () () () ())/no_coord
> 
> 
> +(description:news^15 | title:news^100.0 | contributions:news |
> series_title:news^500.0)~0.01 () () () () () () () () () () () () () () ()
> () () () () () () () () () () () () ()
>
>
> Could you guide me in the right direction, please?
>
> Many Thanks,
> Sandeep
>



-- 
Felipe Lahti
Consultant Developer - ThoughtWorks Porto Alegre


Term Frequencies for Query Result

2013-01-30 Thread Kai Gülzau
Hi,

I am looking for a way to get the top terms for a query result.

Faceting does not work since counts are measured as documents containing a term 
and not as the overall count of a term in all found documents:

http://localhost:8983/solr/master/select?q=type%3A7&rows=1&wt=json&indent=true&facet=true&facet.query=type%3A7&facet.field=albody&facet.method=fc

  "facet_counts":{
"facet_queries":{
  "type:7":156},
"facet_fields":{
  "albody":[
"der",73,
"in",68,
"betreff",63,
...


Using http://wiki.apache.org/solr/TermVectorComponent and counting all
frequencies manually seems to be the only solution for now:

http://localhost:8983/solr/tvrh/?q=type:7&tv.fl=albody&f.albody.tv.tf=true&wt=json&indent=true


"termVectors":[

"uniqueKeyFieldName","ukey",

"798_7_0",[

  "uniqueKey","798_7_0",

  "albody",[

"der",[

  "tf",5],

"die",[

  "tf",7],

...
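
(For reference, a /tvrh handler like the one queried above is typically
registered in solrconfig.xml along these lines; the component and handler
names follow the TermVectorComponent wiki example:)

  <searchComponent name="tvComponent" class="solr.TermVectorComponent"/>

  <requestHandler name="/tvrh" class="solr.SearchHandler">
    <lst name="defaults">
      <bool name="tv">true</bool>
    </lst>
    <arr name="last-components">
      <str>tvComponent</str>
    </arr>
  </requestHandler>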



Does anyone know a better and more efficient solution?


Regards,

Kai Gülzau



Re: configuring datasource for dynamic password and user

2013-01-30 Thread Michael Della Bitta
Sorry, email sent too quickly. Here's the second url:

http://wiki.apache.org/solr/SolrConfigXml?highlight=%28solrconfig%5C.xml%29#System_property_substitution

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Wed, Jan 30, 2013 at 10:42 AM, Michael Della Bitta
 wrote:
> Hi Elizabeth,
>
> I haven't tried this, but given this entry:
>
> http://wiki.apache.org/solr/DataImportHandler#Adding_datasource_in_solrconfig.xml
>
> You should be able to parameterize the arguments in solrconfig.xml
> with environment variables and then set them in solr.xml or at runtime
> using command line arguments like this:
>
>
> Michael
>
> Michael Della Bitta
>
> 
> Appinions
> 18 East 41st Street, 2nd Floor
> New York, NY 10017-6271
>
> www.appinions.com
>
> Where Influence Isn’t a Game
>
>
> On Wed, Jan 30, 2013 at 6:37 AM, Lapera-Valenzuela, Elizabeth
> [Primerica]  wrote:
>> Hi, we will be using Solr on development, test and prod platforms.  Is
>> there a way to dynamically create the datasource so that the URL,
>> password and user id are passed in, or can I point it to a properties file
>> that has this info?  Thanks
>>


Re: configuring datasource for dynamic password and user

2013-01-30 Thread Michael Della Bitta
Hi Elizabeth,

I haven't tried this, but given this entry:

http://wiki.apache.org/solr/DataImportHandler#Adding_datasource_in_solrconfig.xml

You should be able to parameterize the arguments in solrconfig.xml
with environment variables and then set them in solr.xml or at runtime
using command line arguments like this:
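
(The example itself did not survive the archive; a rough sketch of the
idea, with the db.* property names as placeholders:)

  <requestHandler name="/dataimport"
                  class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
      <lst name="datasource">
        <str name="driver">${db.driver}</str>
        <str name="url">${db.url}</str>
        <str name="user">${db.user}</str>
        <str name="password">${db.password}</str>
      </lst>
    </lst>
  </requestHandler>

  <!-- started with something like:
       java -Ddb.driver=com.mysql.jdbc.Driver -Ddb.url=jdbc:mysql://dbhost/mydb \
            -Ddb.user=solr -Ddb.password=secret -jar start.jar -->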


Michael

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Wed, Jan 30, 2013 at 6:37 AM, Lapera-Valenzuela, Elizabeth
[Primerica]  wrote:
> Hi, we will be using Solr on development, test and prod platforms.  Is
> there a way to dynamically create the datasource so that the URL,
> password and user id are passed in, or can I point it to a properties file
> that has this info?  Thanks
>


RE: A question about attaching shards to load balancers

2013-01-30 Thread Michael Ryan
From a performance point of view, I can't imagine it mattering. In our setup,
we have a dedicated Solr server that is not a shard that takes incoming
requests (we call it the "coordinator"). This server is very lightweight and
practically has no load at all.

My gut feeling is that having a separate dedicated server might be a slightly 
better approach, as it will have totally different performance characteristics 
than the shards, and so you can tune it for this.

-Michael


Re: help to build query

2013-01-30 Thread Jack Krupansky
Start by expressing the specific semantics of those queries in strict
boolean form. I mean, what exactly do you mean by "in", "location1,
location2", and "location1, loc2 and loc3"? Is the latter an AND or an OR?


Or at least fully express those two queries, unambiguously in plain English. 
There is too much ambiguity present to give you any solid direction.


-- Jack Krupansky

-Original Message- 
From: Abhishek tiwari

Sent: Wednesday, January 30, 2013 12:55 AM
To: solr-user@lucene.apache.org
Subject: help to build query

We want to execute queries like:
a)  cat in location1, location2
b)  cat1 and cat2 in location1, loc2 and loc3

in our search.

Our challenges:

1)  picking the right keywords (category and locality) from the entered query.
2)  mapping them to the relevant entity.

How should I proceed?

we have localities and categories data indexed .

thanks in advance.

~abhishek 



RE: Variable expansion in DIH SimplePropertiesWriter's filename?

2013-01-30 Thread Dyer, James
This is a bug.  Can you paste what you've said here into a new JIRA issue?  

https://issues.apache.org/jira/browse/SOLR

James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: Jonas Birgander [mailto:jonas.birgan...@prisjakt.nu] 
Sent: Wednesday, January 30, 2013 4:54 AM
To: solr-user@lucene.apache.org
Subject: Variable expansion in DIH SimplePropertiesWriter's filename?

Hello,

I'm testing Solr 4.1, but I've run into some problems with 
DataImportHandler's new propertyWriter tag.
I'm trying to use variable expansion in the `filename` field when using 
SimplePropertiesWriter.

Here are the relevant parts of my configuration:

conf/solrconfig.xml
-

   
 db-data-config.xml
   

   
 
 ${country_code}
 
   



conf/db-data-config.xml
-

   

   
   
 
   



   
 
   





If country_code is set to "gb", I want the last_index_time to be read 
and written in the file conf/gb.dataimport.properties, instead of the 
default conf/dataimport.properties

The variable expansion works perfectly in the SQL and setup of the data 
source, but not in the property writer's filename field.

When initiating an import, the log file shows:

Jan 30, 2013 11:25:42 AM org.apache.solr.handler.dataimport.DataImporter 
maybeReloadConfiguration
INFO: Loading DIH Configuration: db-data-config.xml
Jan 30, 2013 11:25:42 AM 
org.apache.solr.handler.dataimport.config.ConfigParseUtil verifyWithSchema
INFO: The field :$skipDoc present in DataConfig does not have a 
counterpart in Solr Schema
Jan 30, 2013 11:25:42 AM 
org.apache.solr.handler.dataimport.config.ConfigParseUtil verifyWithSchema
INFO: The field :$deleteDocById present in DataConfig does not have a 
counterpart in Solr Schema
Jan 30, 2013 11:25:42 AM org.apache.solr.handler.dataimport.DataImporter 
loadDataConfig
INFO: Data Configuration loaded successfully
Jan 30, 2013 11:25:42 AM org.apache.solr.handler.dataimport.DataImporter 
doFullImport
INFO: Starting Full Import
Jan 30, 2013 11:25:42 AM 
org.apache.solr.handler.dataimport.SimplePropertiesWriter 
readIndexerProperties
WARNING: Unable to read: 
${dataimporter.request.country_code}.dataimport.properties


Is it supposed to work?
Anyone else having problems with this?


Any help appreciated!

Regards,
-- 
Jonas Birgander 
Systemutvecklare Prisjakt & Minhembio




Re: A question about attaching shards to load balancers

2013-01-30 Thread Shawn Heisey

On 1/30/2013 6:45 AM, Lee, Peter wrote:

Upayavira,

Thank you for your response. I'm sorry my post is perhaps not clear...I am 
relatively new to solr and I'm not sure I'm using the correct nomenclature.

We did encounter the issue of one shard in the stripe going down and all other 
shards continue to receive requests...and return errors because of the missing 
shard. We did in fact correct this problem by making our healthcheck smart 
enough to test all of the other servers in the stripe. That works very well and 
was not hard at all to implement.

My intended question was one entirely about performance.  Perhaps if I am more 
specific it will help.

We have 6 servers per "stripe" (which means, a search request going to any one 
of these servers also generates traffic on the other 5 servers in the stripe to fulfill 
the request) and multiple stripes (for load and for redundancy). For this discussion 
though, let's assume we have only ONE stripe.

We currently have a load balancer that points to all 6 of the servers in our stripe. That 
is, requests from "outside" can be directed to any server in the stripe.

The question is: Has anyone performed empirical testing to see if perhaps 
having 2 or 3 servers (instead of all 6) on the load balancer improves 
performance?

In this configuration, sure, not all servers can field requests from the "outside." 
However, the total amount of "conversation" going on between the different servers will 
also be lower, as distributed searches can now only originate from 2 or 3 servers in the stripe 
(however many we attached to the load balancer).

We can perform this testing, but it will take time, so I thought I'd ask if anyone has 
done this already. I was hoping to find a mention of a "best practice" 
somewhere regarding this type of question, but I have not found one yet.


I have a multi-server distributed Solr 3.5 installation behind a load 
balancer (haproxy).  The application and the load balancer are 
completely unaware of the shards parameter- that's handled in Solr. 
Here's how I've made that work:


The core with the shards parameter (we refer to it as a broker core) 
exists on all servers.  There are two servers for chain A and two 
servers for chain B.  Three of the seven shards live on idxa1/idxb1 and 
four of the shards live on idxa2/idxb2.  The "shards" parameter on both 
chain A servers point only to chain A shards.  The same goes for chain B.


The ping handler's health check query contains shards and shards.qt 
parameters, so the health check will fail if any of the shards for that 
chain are down.
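
(In solrconfig.xml terms, such a health check might look roughly like the
sketch below; the handler class follows the stock 3.x example config, and
the hostnames, ports and core names are placeholders:)

  <requestHandler name="/admin/ping" class="PingRequestHandler">
    <lst name="invariants">
      <str name="q">solrpingquery</str>
      <str name="shards.qt">/search</str>
      <str name="shards">idxa1:8983/solr/s1,idxa1:8983/solr/s2,idxa2:8983/solr/s3</str>
    </lst>
  </requestHandler>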


The load balancer has idxa1 and idxb1 as primary equal cost entries.  It 
has idxa2 and idxb2 as backup entries, with idxa2 having the higher 
weight.  In normal operation, queries only go to idxa1 and idxb1.


If any shard failure happens on either chain A server, both the idxa1 
and idxa2 entries will be marked down by the health check and queries 
will only go to chain B.


I can also disable these servers from the load balancer's perspective 
using the admin UI.  If idxb1 is disabled, all queries will go to idxa1 
(which utilizes both idxa1 and idxa2).  In that situation, if any chain 
A failure were to happen but the chain B shards were all still fine, 
idxb2 would still be marked up and the load balancer would send the 
queries there.


The two index chains are independently updated - no replication.  This 
allows me to disable either idxa1 or idxb1 and completely rebuild (or 
upgrade) the disabled chain while the other chain remains online.  I can 
then switch and do the same thing to the other chain, and the 
application using Solr has no idea anything has happened.


Thanks,
Shawn



Possible issue in edismax?

2013-01-30 Thread Sandeep Mestry
Hi All,

I'm facing an issue in the relevancy calculation by the dismax query parser.
The boost factor applied does not work as expected in certain cases when
the keyword is generic; by generic I mean that the keyword appears many
times in the document as well as in the index.

I have parser configuration as below:



edismax
explicit
0.01
series_title^500 title^100 description^15
contribution
series_title^200
0
*:*



As you can see above, I'd expect the documents matching on series title
to rank higher than the ones matching on contribution.

This works well if I type in a query like 'wonderworld', which is a rarely
occurring term, and the series titles rank higher. But if I type in a
keyword like 'news', which is the most common term in the index, I get hits
in contributions even though I have lots of documents with the word news in
the series title.

The field definition is as below:

[The field and fieldType definitions were garbled by the archive: the XML
tags were stripped. Going by the fragments preserved in the quoted copies
earlier in this digest, the snippet declared the four searchable fields
from qf (presumably series_title, title, description and a multiValued
contributions field) plus two text fieldTypes whose index and query
analyzers use a WordDelimiterFilter with generateWordParts and
generateNumberParts, various catenate settings, splitOnCaseChange=1, and
in one type stemEnglishPossessive=0, splitOnNumerics=0 and
preserveOriginal=1.]

I have tried debugging, and when I use the query term news, I see that
matches for contributions are ranked higher than series title. The parsed
queries look like below:
(Note that I have edited the query, as in reality I have a lot of
searchable fields and I have only mentioned the fields containing text
data; the rest all contain UUIDs.)


(+DisjunctionMaxQuery((description:news^15.0 | title:news^100.0 |
contributions:news | series_title:news^500.0)~0.01) () () () () () () () ()
() () () () () () () () () () () () () () () () () () () ())/no_coord


+(description:news^15 | title:news^100.0 | contributions:news |
series_title:news^500.0)~0.01 () () () () () () () () () () () () () () ()
() () () () () () () () () () () () ()


Could you guide me in the right direction, please?

Many Thanks,
Sandeep


Re: CopyField issue on Solr4.1

2013-01-30 Thread Shawn Heisey

On 1/30/2013 7:28 AM, Upayavira wrote:

Stored fields are now compressed in 4.1. There are other efficiencies in
4.0 too that will also result in smaller indexes, but the compressed
stored fields are the most significant.


The compressed stored fields explains your smaller index.  As to why you 
get different results, are you doing the default relevancy ranking, or 
are you sorting by one of your fields?  If you are doing the default 
relevancy ranking, you may be getting different results because of 
scoring bugs that have been fixed since 3.6.1.  Try sorting your results 
by a field - add "&sort=fieldname asc" or "&sort=fieldname desc" to the 
URL and see if the results are what you expect.


If you are already sorting, that's another situation.  If you search for 
all documents on both indexes, is the numFound value about where it 
should be?


Thanks,
Shawn



Re: CopyField issue on Solr4.1

2013-01-30 Thread Jack Krupansky
There are probably any number of changes between 3.x and 4.x to account for 
query differences. This includes bug fixes and in some cases new bugs, in 
areas such as the query parsers and various filters. The first step is to 
isolate a couple of examples of both false positive queries and false 
negative queries. Then look at the field types involved. Then use the Solr 
Admin Analysis UI to see how an index or query term analyzes differently. 
Post the details here and we can figure out what change is causing your 
query discrepancies.


-- Jack Krupansky

-Original Message- 
From: anarchos78

Sent: Wednesday, January 30, 2013 8:59 AM
To: solr-user@lucene.apache.org
Subject: CopyField issue on Solr4.1

Hello,

I am using Solr 3.6.1 and I am very satisfied. Now I want to move to
Solr 4.1. So I took “schema.xml” and “solrconfig.xml” (with minor changes)
and placed them under my new Solr 4.1 configuration. The indexing was
successful (DIH). But I have noticed an issue. In “schema.xml” I have
“copyField” directives in order to index the same fields using different
“types”. When I try to index using the same configuration on Solr 4.1, the
index size is half of the index size on Solr 3.6.1 (and when I query I get
different results). Has anything changed in Solr 4.1? I need a little help
on this.

*The schema.xml:*

[This listing was garbled by the archive: the XML tags were stripped,
leaving only attribute values and element text. What survives here is
actually solrconfig.xml material: ${solr.abortOnConfigurationError:true},
luceneMatchVersion LUCENE_41, ${solr.data.dir:}, caches sized around 2048
entries with autowarming, firstSearcher/newSearcher warming queries (Greek
terms such as "χρησικτησια νομη" and "νομη" with apofasi_taxonomy filters,
sorts on apofasi_date, ida and apofasi_tmima, and rows 150), a
DataImportHandler reading data-config.xml, an edismax handler with the
boost lists "content contentS^10" and "content^10 contentS^100"
(presumably qf and pf), a field list of solr_id, ida, type, model,
keywordlist, title, apofasi_taxonomy, apofasi_tmima, apofasi_date and
grid_title, highlighting on content and title with fragment size 800, and
spellcheck (textSpell), terms, elevator (elevate.xml) and regex-fragmenter
highlighting components.]

*The solrconfig.xml*

[This listing was garbled by the archive: almost all of the XML was
stripped. The only clearly surviving fragments are solr_id and content
(apparently the uniqueKey and default search field), which are schema.xml
elements.]

Regards,
Tom







Re: A question about attaching shards to load balancers

2013-01-30 Thread Upayavira
I haven't got anything to back this up, but I'd say there's no issue
pointing your load balancer to all your nodes. When you do a distributed
query, the work required of the distributed part is relatively small -
it pushes the request to all the shard nodes, then does the job of
merging the results. This does not require large caches or any such, so
I do not see that you're going to have resource advantages to limiting
them to specific nodes.

Upayavira

On Wed, Jan 30, 2013, at 01:45 PM, Lee, Peter wrote:
> Upayavira,
> 
> Thank you for your response. I'm sorry my post is perhaps not clear...I
> am relatively new to solr and I'm not sure I'm using the correct
> nomenclature.
> 
> We did encounter the issue of one shard in the stripe going down and all
> other shards continue to receive requests...and return errors because of
> the missing shard. We did in fact correct this problem by making our
> healthcheck smart enough to test all of the other servers in the stripe.
> That works very well and was not hard at all to implement.
> 
> My intended question was one entirely about performance.  Perhaps if I am
> more specific it will help.
> 
> We have 6 servers per "stripe" (which means, a search request going to
> any one of these servers also generates traffic on the other 5 servers in
> the stripe to fulfill the request) and multiple stripes (for load and for
> redundancy). For this discussion though, let's assume we have only ONE
> stripe.
> 
> We currently have a load balancer that points to all 6 of the servers in
> our stripe. That is, requests from "outside" can be directed to any
> server in the stripe.
> 
> The question is: Has anyone performed empirical testing to see if perhaps
> having 2 or 3 servers (instead of all 6) on the load balancer improves
> performance? 
> 
> In this configuration, sure, not all servers can field requests from the
> "outside." However, the total amount of "conversation" going on between
> the different servers will also be lower, as distributed searches can now
> only originate from 2 or 3 servers in the stripe (however many we
> attached to the load balancer).
> 
> We can perform this testing, but it will take time, so I thought I'd ask
> if anyone has done this already. I was hoping to find a mention of a
> "best practice" somewhere regarding this type of question, but I have not
> found one yet.
> 
> Thanks.
> 
> Peter S. Lee
> 
> -Original Message-
> From: Upayavira [mailto:u...@odoko.co.uk] 
> Sent: Wednesday, January 30, 2013 5:24 AM
> To: solr-user@lucene.apache.org
> Subject: Re: A question about attaching shards to load balancers
> 
I'm afraid I'm not completely clear about your scenario. Let me say how
I understand what you're saying, and what I've done in the past.

Firstly, I take it you are using Solr 3.x (from your reference to a
'shards' parameter).
> 
> Secondly, you refer to a 'stripe' as one set of nodes, one for each
> shard, that are enough to allow querying your whole collection.
> 
> Having created the concept of a 'slice', you then hardwire the 'shards'
> parameter in solrconfig.xml in each machine in that slice to point to all
> the other nodes in that same slice.
> 
> Then you point your load balancer at some boxes, which will do
> distributed queries. Now, by the sounds of it, every box on your setup
> could do this, they all have a shards parameter set up. Minimally, you'll
> want at least one box from each slice, otherwise you'll have slices that
> aren't receiving queries. But could you include all of your boxes, and
> have all of them handling the query distribution work? I guess you could,
> but I'd suggest another architecture.
> 
In the setup you describe, if you lose one node, you lose an entire
> slice. However, if a distributed query comes into another node in the
> slice, the load balancer may well not notice (unless you make the
> healthcheck itself do a distributed search) and things could get messy.
> 
> What I've done is set up a VIP in my load balancer for each and every
> node that can service a shard. Repeat that for each shard that I have.
> Let's say I have four shards, I'll end up with four VIPs. I then put
> those four VIPs into my shards parameter in solrconfig.xml on all of my
> hosts, regardless of what shard/slice.
> 
> Then, I create another VIP that includes all of my nodes in it. That is
> the one that I hand to my application. 
> 
This way, you can lose any node in any shard and the thing should keep
> on going. 
> 
> Obviously I'm talking about slaves here. There will be a master for each
> shard which each of these nodes pull their indexes from.
> 
> Hope this is helpful.
> 
> Upayavira
> 
> On Tue, Jan 29, 2013, at 09:35 PM, Lee, Peter wrote:
> > I would appreciate people's experience on the following load balancing 
> > question...
> > 
> > We currently have solr configured in shards across multiple machines 
> > to handle our load. That is, a request being sent to any one of these 
> >

Re: CopyField issue on Solr4.1

2013-01-30 Thread Upayavira
Stored fields are now compressed in 4.1. There are other efficiencies in
4.0 too that will also result in smaller indexes, but the compressed
stored fields are the most significant.

Upayavira

On Wed, Jan 30, 2013, at 01:59 PM, anarchos78 wrote:
> Hello,
> 
> I am using Solr 3.6.1 and I am very satisfied. Now I want to move to
> Solr 4.1. So I took “schema.xml” and “solrconfig.xml” (with minor changes)
> and placed them under my new Solr 4.1 configuration. The indexing was
> successful (DIH). But I have noticed an issue. In “schema.xml” I have
> “copyField” directives in order to index the same fields using different
> “types”. When I try to index using the same configuration on Solr 4.1, the
> index size is half of the index size on Solr 3.6.1 (and when I query I
> get different results). Has anything changed in Solr 4.1? I need a little
> help
> on this.
> 
> *The schema.xml:*
> 
> 
> 
>   
>  
> ${solr.abortOnConfigurationError:true}
>   
>  
>   LUCENE_41   
>  
>   ${solr.data.dir:}
>   
>   
> class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
>  
>   
> 
>   
>   
>   
>   
>  
>   
>   
>   
>   
>
> 2048
> 
> 
>size="2048"
>   initialSize="1024"
>   autowarmCount="512"
>   cleanupThread="true" />
> 
>size="2048"
>   initialSize="1024"
>   autowarmCount="512"
>   cleanupThread="true" />
>   
>  size="2048"
>   initialSize="2048"
> autowarmCount="512" />
> 
>size="2048"
>   initialSize="512"
>   autowarmCount="512"
>   cleanupThread="true" /> 
> 
> 
> true
> 
> 150
> 
> 200   
>
> 
>   
> 
>   χρησικτησια νομη
>   apofasi_taxonomy:ΠΟΛΙΤΙΚΕΣ
>   apofasi_date asc,ida desc,apofasi_tmima
> desc
>   0
>   150
> 
> 
>   νομη
>   apofasi_taxonomy:ΠΟΛΙΤΙΚΕΣ
>   apofasi_date asc,ida desc,apofasi_tmima
> desc
>   0
>   150
> 
> 
>   χρησικτησια νομη
>   apofasi_taxonomy:ΠΟΙΝΙΚΕΣ
>   apofasi_date asc,ida desc,apofasi_tmima
> desc
>   0
>   150
> 
>   
> 
>   
>   
>   
> 
>   χρησικτησια νομη
>   apofasi_taxonomy:ΠΟΛΙΤΙΚΕΣ
>   apofasi_date asc,ida desc,apofasi_tmima
> desc
>   0
>   150
> 
> 
>   νομη
>   apofasi_taxonomy:ΠΟΛΙΤΙΚΕΣ
>   apofasi_date asc,ida desc,apofasi_tmima
> desc
>   0
>   150
> 
> 
>   χρησικτησια νομη
>   apofasi_taxonomy:ΠΟΙΝΙΚΕΣ
>   apofasi_date asc,ida desc,apofasi_tmima
> desc
>   0
>   150
> 
>   
>
>
>false
>
>2
> 
>   
>   
>
>  multipartUploadLimitInKB="2048000" /> 
> 
>   
>   
>class="org.apache.solr.handler.dataimport.DataImportHandler">
>   
>   data-config.xml
>   
>   
>   
>   
>  
>  edismax
>  content contentS^10
>content^10 contentS^100
>  100
>explicit
>150
>  score desc
>  edismax
>  content contentS^10
>content^10 contentS^100
>  100
>json
>true  
> name="fl">solr_id,ida,type,model,keywordlist,title,apofasi_taxonomy,apofasi_tmima,apofasi_date,grid_title
>content,title
>content
>800
>800  
> 
>   
>   
>  class="solr.XmlUpdateRequestHandler">
>   
>   
>  class="solr.BinaryUpdateRequestHandler" />
> 
>  class="solr.CSVRequestHandler" 
>   startup="lazy" />
> 
>  class="solr.JsonUpdateRequestHandler" 
>   startup="lazy" />
> 
>  startup="lazy"
>   class="solr.extraction.ExtractingRequestHandler" >
> 
>   text
>   true
>   ignored_
>   last_modified
>   true
>   links
>   ignored_
> 
>   
>   
>   startup="lazy"
>class="solr.XsltUpdateRequestHandler"/>
>   

Re: Solr 4.1 UI fail to display result

2013-01-30 Thread Stefan Matheis
On Wednesday, January 30, 2013 at 2:13 PM, J Mohamed Zahoor wrote:
> I am using Safari 6.0.2 and i see a "SyntaxError: JSON Parse error: 
> Unrecognized token '<'".


I'm not sure why, but this sounds like the JSON parser was called with an
HTML or XML string. After you hit the "Execute" button on the website, at
the top of the right content area there is a link - which is what the UI
will request. If you open that in another browser tab or with curl/wget,
what is the response you get? Is it really JSON? Or perhaps some kind of
error message?


Re: Solr 4.1 UI fail to display result

2013-01-30 Thread J Mohamed Zahoor

Hi Alex,

Cleared cache - problem persists.
Disabled cache - problem persists.

This was in Safari though.

./zahoor


On 30-Jan-2013, at 6:55 PM, Alexandre Rafalovitch  wrote:

> Before worrying about anything else, try doing a full cache clean. My
> (Chrome) browser was caching Solr 4.0 resources for an unreasonably long
> period of time until I completely disabled its cache (in dev tools) and
> tried a full reload.
> 
> Or try a browser you did not use before.
> 
> Regards,
>   Alex.
> 
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
> 
> 
> On Wed, Jan 30, 2013 at 8:17 AM, J Mohamed Zahoor  wrote:
> 
>> The stack is
>> 
>> format_json  -- app.js  (465)
>> json -- query.js (59)
>> complete - query.js (77)
>> fire -- require.js (3099)
>> fireWith -- require.js (3217)
>> done -- require.js (9469)
>> callback -- require.js (10235)
>> 
>> ./zahoor
>> 
>> 
>> On 30-Jan-2013, at 6:43 PM, J Mohamed Zahoor  wrote:
>> 
>>> Hi
>>> 
>>> I am using the 4.1 release and I see a problem when I set the response
>>> type to JSON in the UI.
>>>
>>> I am using Safari 6.0.2 and I see a "SyntaxError: JSON Parse error:
>>> Unrecognized token '<'".
>>>
>>> app.js line 465. When I debug more, I see the response is still coming
>>> in XML format.
>>> 
>>> 
>>> Is anyone else facing this problem?
>>> 
>>> ./Zahoor
>> 
>> 



RE: A question about attaching shards to load balancers

2013-01-30 Thread Lee, Peter
Upayavira,

Thank you for your response. I'm sorry my post is perhaps not clear...I am 
relatively new to solr and I'm not sure I'm using the correct nomenclature.

We did encounter the issue of one shard in the stripe going down and all other 
shards continue to receive requests...and return errors because of the missing 
shard. We did in fact correct this problem by making our healthcheck smart 
enough to test all of the other servers in the stripe. That works very well and 
was not hard at all to implement.

My intended question was one entirely about performance.  Perhaps if I am more 
specific it will help.

We have 6 servers per "stripe" (which means, a search request going to any one 
of these servers also generates traffic on the other 5 servers in the stripe to 
fulfill the request) and multiple stripes (for load and for redundancy). For 
this discussion though, let's assume we have only ONE stripe.

We currently have a load balancer that points to all 6 of the servers in our 
stripe. That is, requests from "outside" can be directed to any server in the 
stripe.

The question is: Has anyone performed empirical testing to see if perhaps 
having 2 or 3 servers (instead of all 6) on the load balancer improves 
performance? 

In this configuration, sure, not all servers can field requests from the 
"outside." However, the total amount of "conversation" going on between the 
different servers will also be lower, as distributed searches can now only 
originate from 2 or 3 servers in the stripe (however many we attached to the 
load balancer).

We can perform this testing, but it will take time, so I thought I'd ask if 
anyone has done this already. I was hoping to find a mention of a "best 
practice" somewhere regarding this type of question, but I have not found one 
yet.

Thanks.

Peter S. Lee

-Original Message-
From: Upayavira [mailto:u...@odoko.co.uk] 
Sent: Wednesday, January 30, 2013 5:24 AM
To: solr-user@lucene.apache.org
Subject: Re: A question about attaching shards to load balancers

I'm afraid I'm not completely clear about your scenario. Let me say how I 
understand what you're saying, and what I've done in the past.

Firstly, I take it you are using Solr 3.x (from your reference to a 'shards' 
parameter).

Secondly, you refer to a 'stripe' as one set of nodes, one for each shard, that 
are enough to allow querying your whole collection.

Having created the concept of a 'slice', you then hardwire the 'shards'
parameter in solrconfig.xml in each machine in that slice to point to all the 
other nodes in that same slice.

Then you point your load balancer at some boxes, which will do distributed 
queries. Now, by the sounds of it, every box on your setup could do this, they 
all have a shards parameter set up. Minimally, you'll want at least one box 
from each slice, otherwise you'll have slices that aren't receiving queries. 
But could you include all of your boxes, and have all of them handling the 
query distribution work? I guess you could, but I'd suggest another 
architecture.

In the setup you describe, if you lose one node, you lose an entire slice. 
However, if a distributed query comes into another node in the slice, the load 
balancer may well not notice (unless you make the healthcheck itself do a 
distributed search) and things could get messy.

What I've done is set up a VIP in my load balancer for each and every node that 
can service a shard. Repeat that for each shard that I have.
Let's say I have four shards, I'll end up with four VIPs. I then put those four 
VIPs into my shards parameter in solrconfig.xml on all of my hosts, regardless 
of what shard/slice.

Then, I create another VIP that includes all of my nodes in it. That is the one 
that I hand to my application. 

This way, you can lose any node in any shard and the thing should keep on 
going. 

Obviously I'm talking about slaves here. There will be a master for each shard 
which each of these nodes pull their indexes from.

Hope this is helpful.

Upayavira

On Tue, Jan 29, 2013, at 09:35 PM, Lee, Peter wrote:
> I would appreciate people's experience on the following load balancing 
> question...
> 
> We currently have solr configured in shards across multiple machines 
> to handle our load. That is, a request being sent to any one of these 
> servers will cause that server to query the rest of the servers in 
> that "stripe" (we use the term "stripe" to refer to a set of servers 
> that point to each other with the shard parameter).
> 
> We currently have all servers in a stripe registered with our load 
> balancer. Thus, requests are being spread out across all servers in 
> the stripe...but of course requests to any shard generates additional 
> traffic on all shards in that stripe.
> 
> My question (finally) is this: Has anyone determined if it is better 
> to place only a few (that is, not all) of the shards in a stripe on 
> the load balancer as versus ALL of the shards in a stripe on the load 
> bala

Re: Solr 4.1 UI fail to display result

2013-01-30 Thread Alexandre Rafalovitch
Before worrying about anything else, try doing a full cache clean. My
(Chrome) browser was caching Solr 4.0 resources for an unreasonably long
period of time until I completely disabled its cache (in dev tools) and
tried a full reload.

Or try a browser you did not use before.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Wed, Jan 30, 2013 at 8:17 AM, J Mohamed Zahoor  wrote:

> The stack is
>
> format_json  -- app.js  (465)
> json -- query.js (59)
> complete - query.js (77)
> fire -- require.js (3099)
> fireWith -- require.js (3217)
> done -- require.js (9469)
> callback -- require.js (10235)
>
> ./zahoor
>
>
> On 30-Jan-2013, at 6:43 PM, J Mohamed Zahoor  wrote:
>
> > Hi
> >
> > I am using the 4.1 release and I see a problem when I set the response
> > type to JSON in the UI.
> >
> > I am using Safari 6.0.2 and I see a "SyntaxError: JSON Parse error:
> > Unrecognized token '<'".
> >
> > app.js line 465. When I debug more, I see the response is still coming
> > in XML format.
> >
> >
> > Is anyone else facing this problem?
> >
> > ./Zahoor
>
>


Re: Solr 4.1 UI fail to display result

2013-01-30 Thread J Mohamed Zahoor
The stack is

format_json  -- app.js  (465)
json -- query.js (59)
complete - query.js (77)
fire -- require.js (3099)
fireWith -- require.js (3217)
done -- require.js (9469)
callback -- require.js (10235)

./zahoor


On 30-Jan-2013, at 6:43 PM, J Mohamed Zahoor  wrote:

> Hi 
> 
> I am using the 4.1 release and I see a problem when I set the response type
> to JSON in the UI.
> 
> I am using Safari 6.0.2 and I see a "SyntaxError: JSON Parse error: 
> Unrecognized token '<'".
> 
> app.js line 465. When I debug more, I see the response is still coming in 
> XML format.
> 
> 
> Is anyone else facing this problem?
> 
> ./Zahoor



Solr 4.1 UI fail to display result

2013-01-30 Thread J Mohamed Zahoor
Hi 

I am using the 4.1 release and I see a problem when I set the response type to
JSON in the UI.

I am using Safari 6.0.2 and I see a "SyntaxError: JSON Parse error: 
Unrecognized token '<'".

app.js line 465. When I debug more, I see the response is still coming in XML 
format.


Is anyone else facing this problem?

./Zahoor

Fwd: advice about develop AbstractSolrEventListener.

2013-01-30 Thread Miguel


Hi

I have to develop a function that communicates with a webservice, and
this function must execute after each commit.
My doubt:
is it possible to get the records that have been updated in the Solr index?
My function must send information about added, updated and deleted records
in the Solr index to an external webservice, and this information must be
sent after the commit event.

I have read the Apache Solr wiki and it seems the best way is to create a
listener with event=postCommit, but I have looked at the
"solr.RunExecutableListener" example and I don't see how to know the
records associated with the commit event.

Example Solrconfig.xml:
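
(The listener XML was stripped by the archive; a postCommit registration
generally looks like the sketch below, with the listener class name as a
placeholder for the actual implementation:)

  <updateHandler class="solr.DirectUpdateHandler2">
    <listener event="postCommit" class="com.example.MyCommitNotifier">
      <!-- init args for the listener, if any -->
    </listener>
  </updateHandler>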


 


Thanks.





configuring datasource for dynamic password and user

2013-01-30 Thread Lapera-Valenzuela, Elizabeth [Primerica]
Hi, we will be using Solr on development, test and prod platforms.  Is
there a way to dynamically create the datasource so that the URL,
password and user id are passed in, or can I point it to a properties file
that has this info?  Thanks



SolrCloud: admin security vs. replication

2013-01-30 Thread AlexeyK
Hi,
There are a lot of posts which talk about hardening the /admin handler with
user credentials etc.
>From the other hand, replication handler wouldn't work if /admin/cores is
also hardened.
Considering this fact, how could I allow secure external access to the admin
interface AND allow proper cluster work?
Not setting any security on admin/cores is not an option.

Thanks,
Alexey 





Variable expansion in DIH SimplePropertiesWriter's filename?

2013-01-30 Thread Jonas Birgander

Hello,

I'm testing Solr 4.1, but I've run into some problems with 
DataImportHandler's new propertyWriter tag.
I'm trying to use variable expansion in the `filename` field when using 
SimplePropertiesWriter.


Here are the relevant parts of my configuration:

conf/solrconfig.xml
-
class="org.apache.solr.handler.dataimport.DataImportHandler">

  
db-data-config.xml
  

  

${country_code}

  



conf/db-data-config.xml
-

  

  
  

  



  

  





If country_code is set to "gb", I want the last_index_time to be read 
and written in the file conf/gb.dataimport.properties, instead of the 
default conf/dataimport.properties
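
(The propertyWriter XML above was stripped by the archive; the filename
value is taken from the warning below, while the directory attribute and
exact placement inside dataConfig are guesses:)

  <dataConfig>
    <propertyWriter type="SimplePropertiesWriter" directory="conf"
        filename="${dataimporter.request.country_code}.dataimport.properties" />
    <!-- dataSource, document and entity definitions follow -->
  </dataConfig>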


The variable expansion works perfectly in the SQL and setup of the data 
source, but not in the property writer's filename field.


When initiating an import, the log file shows:

Jan 30, 2013 11:25:42 AM org.apache.solr.handler.dataimport.DataImporter 
maybeReloadConfiguration

INFO: Loading DIH Configuration: db-data-config.xml
Jan 30, 2013 11:25:42 AM 
org.apache.solr.handler.dataimport.config.ConfigParseUtil verifyWithSchema
INFO: The field :$skipDoc present in DataConfig does not have a 
counterpart in Solr Schema
Jan 30, 2013 11:25:42 AM 
org.apache.solr.handler.dataimport.config.ConfigParseUtil verifyWithSchema
INFO: The field :$deleteDocById present in DataConfig does not have a 
counterpart in Solr Schema
Jan 30, 2013 11:25:42 AM org.apache.solr.handler.dataimport.DataImporter 
loadDataConfig

INFO: Data Configuration loaded successfully
Jan 30, 2013 11:25:42 AM org.apache.solr.handler.dataimport.DataImporter 
doFullImport

INFO: Starting Full Import
Jan 30, 2013 11:25:42 AM 
org.apache.solr.handler.dataimport.SimplePropertiesWriter 
readIndexerProperties
WARNING: Unable to read: 
${dataimporter.request.country_code}.dataimport.properties



Is it supposed to work?
Anyone else having problems with this?


Any help appreciated!

Regards,
--
Jonas Birgander 
Systemutvecklare Prisjakt & Minhembio


Can I start solr with replication activated but disabled between master and slave

2013-01-30 Thread Jamel ESSOUSSI
Hello,

I would like to start Solr with the following configuration:

Replication between master and slave activated but not enabled.
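
A sketch of how this is usually wired up with property substitution (the
enable.master/enable.slave property names and the URLs here are
illustrative):

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="enable">${enable.master:false}</str>
      <str name="replicateAfter">commit</str>
    </lst>
    <lst name="slave">
      <str name="enable">${enable.slave:false}</str>
      <str name="masterUrl">http://master-host:8983/solr/replication</str>
      <str name="pollInterval">00:00:60</str>
    </lst>
  </requestHandler>

Starting with -Denable.master=true (or false) then activates or
deactivates each role without editing the config.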

Regards





[OT] San Fran. Lucene/Solr Hack Night

2013-01-30 Thread Grant Ingersoll
If you are in the San. Fran. area next Wednesday, Feb. 06, LucidWorks and I 
will be hosting a Lucene/Solr hack night.  To reserve a spot or learn more, see 
http://www.meetup.com/SFBay-Lucene-Solr-Meetup/

Bring your laptop, your code, etc. and we'll hack on Lucene/Solr for a few 
hours.

Cheers,
Grant


Grant Ingersoll
http://www.lucidworks.com






Re: Solrcloud 4.1 Cluster state NullPointerException error.

2013-01-30 Thread Luis Cappa Banda
I've noticed when checking the Cloud admin UI that sometimes one of the
nodes appears with no Cloud information (even after reloading
clusterstate.json). However, a while later the whole Cloud status
information appears again. It looks like it disconnects and re-connects
itself.

Quite strange, guys...

2013/1/30 Luis Cappa Banda 

> Hello, guys.
>
> After upgrading from Solr 4.0 to Solr 4.1, the following error has
> frequently appeared in my logs.
>
> *INFO: A cluster state change: WatchedEvent state:SyncConnected
> type:NodeDataChanged path:/clusterstate.json, has occurred - updating...
> (live nodes size: 8)*
> *2013-01-30 10:14:21,631 2432605 [localhost-startStop-1-EventThread]
> ERROR org.apache.zookeeper.ClientCnxn  - Error while calling watcher *
> *java.lang.NullPointerException*
> *at
> org.apache.solr.common.cloud.ZkStateReader$2.process(ZkStateReader.java:201)
> *
> *at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:526)
> *
> *at
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)*
>
>
> I've been using SolrJ CloudSolrServer with a 2-shard configuration and only
> one Zookeeper instance. I'll move to a Zookeeper ensemble (3 Zookeepers)
> soon.
>
> Any idea?
>
> Thanks,
>
> - Luis Cappa.
>
>


Re: overlap function query

2013-01-30 Thread Mikhail Khludnev
I'm not really getting your point. If you are saying that tf/idf overwhelms
coord, I think it's possible to eliminate them with a custom similarity.


On Wed, Jan 30, 2013 at 2:06 PM, Daniel Rosher  wrote:

> Hi Mikhail,
>
> Thanks for the reply.
>
> I think coord works at the document level; I was thinking of having
> something that worked at a field level, against a 'principal/primary'
> field.
>
> I'm using edismax with tie=1 (a.k.a. Disjunction Sum) and several fields,
> but docs with greater query overlap on the primary field should score
> higher if you see what I mean.
>
> Cheers,
> Dan
>
> On Tue, Jan 29, 2013 at 7:14 PM, Mikhail Khludnev <
> mkhlud...@griddynamics.com> wrote:
>
> > Daniel,
> >
> > You can start from here
> >
> >
> http://lucene.apache.org/core/4_0_0-BETA/core/org/apache/lucene/search/similarities/Similarity.html#coord%28int,%20int%29
> > but it requires deep understanding of Lucene internals
> >
> >
> >
> > On Tue, Jan 29, 2013 at 2:12 PM, Daniel Rosher 
> wrote:
> >
> > > Hi,
> > >
> > > I'm wondering if there exists or if someone has implemented something
> > like
> > > the following as a function query:
> > >
> > > overlap(query,field) = number of matching terms in field/number of
> terms
> > in
> > > field
> > >
> > > e.g. with three docs having these tokens(e.g.A B C) in a field
> > > D
> > > 1:A B B
> > > 2:A B
> > > 3:A
> > >
> > > The overlap would be for these queries (-- highlights possibly highest
> > > scoring doc):
> > >
> > > Q:A
> > > 1:1/3
> > > 2:1/2
> > > 3:1/1 --
> > >
> > > Q:A B
> > > 1:2/3
> > > 2:2/2 --
> > > 3:1/1
> > >
> > > Q:A B C
> > > 1:2/3
> > > 2:2/2 --
> > > 3:1/1
> > >
> > > The objective is to pick the most likely doc using the overlap to
> > > boost the score.
> > >
> > > Cheers,
> > > Dan
> > >
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > Principal Engineer,
> > Grid Dynamics
> >
> > 
> >  
> >
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


Re: A question about attaching shards to load balancers

2013-01-30 Thread Upayavira
I'm afraid I'm not completely clear about your scenario. Let me say how
I understand what you're saying, and what I've done in the past.

Firstly, I take it you are using Solr 3.x (from your reference to a
'shards' parameter).

Secondly, you refer to a 'stripe' as one set of nodes, one for each
shard, that are enough to allow querying your whole collection.

Having created the concept of a 'slice', you then hardwire the 'shards'
parameter in solrconfig.xml in each machine in that slice to point to
all the other nodes in that same slice.

Then you point your load balancer at some boxes, which will do
distributed queries. Now, by the sounds of it, every box on your setup
could do this, they all have a shards parameter set up. Minimally,
you'll want at least one box from each slice, otherwise you'll have
slices that aren't receiving queries. But could you include all of your
boxes, and have all of them handling the query distribution work? I
guess you could, but I'd suggest another architecture.

In the setup you describe, if you lose one node, you lose an entire
slice. However, if a distributed query comes into another node in the
slice, the load balancer may well not notice (unless you make the
healthcheck itself do a distributed search) and things could get messy.

What I've done is set up a VIP in my load balancer for each and every
node that can service a shard. Repeat that for each shard that I have.
Let's say I have four shards, I'll end up with four VIPs. I then put
those four VIPs into my shards parameter in solrconfig.xml on all of my
hosts, regardless of what shard/slice.
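
For illustration, the hardwired shards parameter on each host then looks
something like this in solrconfig.xml (the VIP hostnames are placeholders):

  <requestHandler name="/search" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="shards">vip-shard1:8983/solr,vip-shard2:8983/solr,vip-shard3:8983/solr,vip-shard4:8983/solr</str>
    </lst>
  </requestHandler>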

Then, I create another VIP that includes all of my nodes in it. That is
the one that I hand to my application. 

This way, you can lose any node in any shard and the thing should keep
on going. 

Obviously I'm talking about slaves here. There will be a master for each
shard which each of these nodes pull their indexes from.

Hope this is helpful.

Upayavira

On Tue, Jan 29, 2013, at 09:35 PM, Lee, Peter wrote:
> I would appreciate people's experience on the following load balancing
> question...
> 
> We currently have solr configured in shards across multiple machines to
> handle our load. That is, a request being sent to any one of these
> servers will cause that server to query the rest of the servers in that
> "stripe" (we use the term "stripe" to refer to a set of servers that
> point to each other with the shard parameter).
> 
> We currently have all servers in a stripe registered with our load
> balancer. Thus, requests are being spread out across all servers in the
> stripe...but of course requests to any shard generates additional traffic
> on all shards in that stripe.
> 
> My question (finally) is this: Has anyone determined if it is better to
> place only a few (that is, not all) of the shards in a stripe on the load
> balancer as versus ALL of the shards in a stripe on the load balancer? It
> seemed to me at first that it would not make much of a difference, but
> then I realized that this would really depend on the relative costs of a
> few different steps (one step would be the cost of collecting all of the
> responses from the other servers in the shard to formulate the final
> answer. Another step would be the cost of generating more traffic between
> the shards, etc.).
> 
> So what I am trying to ask is this: If we had 6  servers in a "stripe" (6
> servers set up as shards to support a single query), would there be any
> advantage with respect to handling load to only place ONE or TWO of the
> shards on the load balancer as versus putting ALL shards on the load
> balancer?
> 
> We can test this empirically but if the community already has gotten a
> feel for the best practice in this situation I would be happy to learn
> from your experience. I could not find anything online that spoke to this
> particular situation.
> 
> Thanks.
> 
> Peter S. Lee
> Senior Software Engineer
> ProQuest
> 789 E. Eisenhower Parkway
> Ann Arbor, MI, 48106-1346
> USA
> 734-761-4700 x72025
> peter@proquest.com
> www.proquest.com
> 
> ProQuest...Start here
> InformationWeek 500 Top
> Innovator
> 


Re: overlap function query

2013-01-30 Thread Daniel Rosher
Hi Mikhail,

Thanks for the reply.

I think coord works at the document level; I was thinking of having
something that worked at a field level, against a 'principal/primary'
field.

I'm using edismax with tie=1 (a.k.a. Disjunction Sum) and several fields,
but docs with greater query overlap on the primary field should score
higher if you see what I mean.

Cheers,
Dan

On Tue, Jan 29, 2013 at 7:14 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Daniel,
>
> You can start from here
>
> http://lucene.apache.org/core/4_0_0-BETA/core/org/apache/lucene/search/similarities/Similarity.html#coord%28int,%20int%29
> but it requires deep understanding of Lucene internals
>
>
>
> On Tue, Jan 29, 2013 at 2:12 PM, Daniel Rosher  wrote:
>
> > Hi,
> >
> > I'm wondering if there exists or if someone has implemented something
> like
> > the following as a function query:
> >
> > overlap(query,field) = number of matching terms in field/number of terms
> in
> > field
> >
> > e.g. with three docs having these tokens(e.g.A B C) in a field
> > D
> > 1:A B B
> > 2:A B
> > 3:A
> >
> > The overlap would be for these queries (-- highlights possibly highest
> > scoring doc):
> >
> > Q:A
> > 1:1/3
> > 2:1/2
> > 3:1/1 --
> >
> > Q:A B
> > 1:2/3
> > 2:2/2 --
> > 3:1/1
> >
> > Q:A B C
> > 1:2/3
> > 2:2/2 --
> > 3:1/1
> >
> > The objective is to pick the most likely doc using the overlap to boost
> > the score.
> >
> > Cheers,
> > Dan
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
>  
>


How to make Solr multiple dataimporthandler scheduler in PHP

2013-01-30 Thread ashimbose
Hi All,

How to make a Solr multiple dataimporthandler scheduler in PHP?

I have multiple dataimport handlers in solrconfig.xml.

I want to make a PHP script which will run the imports one by one by
requesting URLs like the ones below...
http://localhost:8080/solr/core_sql/dataimport1?command=full-import
http://localhost:8080/solr/core_sql/dataimport2?command=full-import&clean=false
http://localhost:8080/solr/core_sql/dataimport3?command=full-import&clean=false
The second URL should wait until the first import has finished successfully,
or throw some exception if one happens during the process, and it should not
reach the maximum execution time (say 230 seconds).

Can anybody please help me write the script, or share any ideas?

Regards,
Ashim Bose




