Re: Possible issue in edismax?

2013-01-30 Thread Sandeep Mestry
Thanks, Felipe.
Can you point me to an example, please?

Also, forgive me, but if a document has matches in more of the searchable
fields, should it not rank higher?

Thanks,
Sandeep
On 30 Jan 2013 19:30, "Felipe Lahti"  wrote:

> If you compare the first and last document scores, you will see that the
> last one matches more fields than the first one. So you may be thinking: why?
> The first doc only matches the "contributions" field and the last matches a
> bunch of fields, so if you want to have it behave more like (series_title^500
> title^100 description^15 contribution) you
> have to override the method of DefaultSimilarity.
>
>
> On Wed, Jan 30, 2013 at 4:12 PM, Sandeep Mestry 
> wrote:
>
> > I have pasted it below and it is slightly different from the dismax
> > configuration I have mentioned above, as I was playing with all sorts of
> > boost values; however, it looks more like below:
> >
> > 
> > 2675.7844 = (MATCH) sum of: 2675.7844 = (MATCH) max plus 0.01 times
> others
> > of: 2675.7844 = (MATCH) weight(contributions:news in 63298)
> > [DefaultSimilarity], result of: 2675.7844 = score(doc=63298,freq=1.0 =
> > termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
> > 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> > 595177.7 = fieldWeight in 63298, product of: 1.0 = tf(freq=1.0), with
> freq
> > of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14, maxDocs=11282414)
> > 40960.0 = fieldNorm(doc=63298)
> > 
> > 
> > 2317.297 = (MATCH) sum of: 2317.297 = (MATCH) max plus 0.01 times others
> > of: 2317.297 = (MATCH) weight(contributions:news in 9826415)
> > [DefaultSimilarity], result of: 2317.297 = score(doc=9826415,freq=3.0 =
> > termFreq=3.0 ), product of: 0.004495774 = queryWeight, product of:
> > 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> > 515439.0 = fieldWeight in 9826415, product of: 1.7320508 = tf(freq=3.0),
> > with freq of: 3.0 = termFreq=3.0 14.530705 = idf(docFreq=14,
> > maxDocs=11282414) 20480.0 = fieldNorm(doc=9826415)
> > 
> > 
> > 2140.6274 = (MATCH) sum of: 2140.6274 = (MATCH) max plus 0.01 times
> others
> > of: 2140.6274 = (MATCH) weight(contributions:news in 9882325)
> > [DefaultSimilarity], result of: 2140.6274 = score(doc=9882325,freq=1.0 =
> > termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
> > 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> > 476142.16 = fieldWeight in 9882325, product of: 1.0 = tf(freq=1.0), with
> > freq of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14, maxDocs=11282414)
> > 32768.0 = fieldNorm(doc=9882325)
> > 
> > 
> > 1605.4707 = (MATCH) sum of: 1605.4707 = (MATCH) max plus 0.01 times
> others
> > of: 1605.4707 = (MATCH) weight(contributions:news in 220007)
> > [DefaultSimilarity], result of: 1605.4707 = score(doc=220007,freq=1.0 =
> > termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
> > 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> > 357106.62 = fieldWeight in 220007, product of: 1.0 = tf(freq=1.0), with
> > freq of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14, maxDocs=11282414)
> > 24576.0 = fieldNorm(doc=220007)
> > 
> > 
> > 1605.4707 = (MATCH) sum of: 1605.4707 = (MATCH) max plus 0.01 times
> others
> > of: 1605.4707 = (MATCH) weight(contributions:news in 241151)
> > [DefaultSimilarity], result of: 1605.4707 = score(doc=241151,freq=1.0 =
> > termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
> > 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> > 357106.62 = fieldWeight in 241151, product of: 1.0 = tf(freq=1.0), with
> > freq of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14, maxDocs=11282414)
> > 24576.0 = fieldNorm(doc=241151)
> > 
> > 
> > id:c208c2b4-1b3e-27b8-e040-a8c00409063a
> > 
> >  
> > 6.5742764 = (MATCH) sum of: 6.5742764 = (MATCH) max plus 0.01 times
> others
> > of: 3.304414 = (MATCH) weight(description:news^25.0 in 967895)
> > [DefaultSimilarity], result of: 3.304414 = score(doc=967895,freq=1.0 =
> > termFreq=1.0 ), product of: 0.042727955 = queryWeight, product of: 25.0 =
> > boost 5.5240083 = idf(docFreq=122362, maxDocs=11282414) 3.093982E-4 =
> > queryNorm 77.33611 = fieldWeight in 967895, product of: 1.0 =
> tf(freq=1.0),
> > with freq of: 1.0 = termFreq=1.0 5.5240083 = idf(docFreq=122362,
> > maxDocs=11282414) 14.0 = fieldNorm(doc=967895) 5.913381 = (MATCH)
> > weight(pg_series_title:news^50.0 in 967895) [DefaultSimilarity], result
> of:
> > 5.913381 = score(doc=967895,freq=1.0 = termFreq=1.0 ), product of:
> > 0.080834694 = queryWeight, product of: 50.0 = boost 5.2252855 =
> > idf(docFreq=164961, maxDocs=11282414) 3.093982E-4 = queryNorm 73.154 =
> > fieldWeight in 967895, product of: 1.0 = tf(freq=1.0), with freq of: 1.0
> =
> > termFreq=1.0 5.2252855 = idf(docFreq=164961, maxDocs=11282414) 14.0 =
> > fieldNorm(doc=967895) 0.18680073 = (MATCH) weight(p_programme_title:news
> in
> > 967895) [DefaultSimilarity], result of: 0.18680073 =
> > score(doc=967895,

Re: Issue with spellcheck and autosuggest

2013-01-30 Thread Artyom
You can of course check suggestions, but then you should remove the
"wordbreak" spellchecker from your handler, because its purpose is to find
cases when the user types spaces wrongly (e.g., solrrocks, sol rrocks, so lr).
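
For reference, a wordbreak spellchecker is typically defined in
solrconfig.xml along these lines (a sketch; the field name is a placeholder):

  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">name</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
  </lst>

Removing that <lst> from the search component leaves any other configured
dictionaries untouched.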



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Issue-with-spellcheck-and-autosuggest-tp4036208p4037631.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud: admin security vs. replication

2013-01-30 Thread AlexeyK
As long as Core Admin is accessible via HTTP and allows manipulation of Solr
cores, it should be secured, regardless of the configured path. The difference
between securing Admin and securing other handlers is that other handlers
are accessed by specific application server(s), and can therefore be
easily firewalled etc.
The Admin interface can (in theory) be accessed from machines other than the
application server, but I cannot really apply security constraints to it as
long as Core Admin is used both internally (replication) and externally
(admin web interface JS).
Therefore, it's necessary to provide a reverse proxy with access control
management to allow both secure external access to the admin AND internal access.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-admin-security-vs-replication-tp4037337p4037628.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: field space consumption - stored vs not stored

2013-01-30 Thread Shawn Heisey

On 1/30/2013 6:24 PM, Shawn Heisey wrote:

If I had to guess about the extra space required for storing an int
field, I would say it's in the neighborhood of 20 bytes per document,
perhaps less.  I am also interested in a definitive answer.


The answer is very likely less than 20 bytes per doc.  I was assuming a 
larger size for VInt than it is likely to use.  See the answer for this 
question:


http://stackoverflow.com/questions/2752612/what-is-the-vint-in-lucene

Thanks,
Shawn



Re: field space consumption - stored vs not stored

2013-01-30 Thread Shawn Heisey

On 1/30/2013 6:04 PM, Petersen, Robert wrote:

Hi

Just a quick question:  for a single valued int field in solr 3.6.1 how much 
more space is used if the field is stored vs indexed and not stored?


Here is the index file format reference for the two files that make up 
stored fields in the 3.6 index format:


http://lucene.apache.org/core/3_6_2/fileformats.html#field_index

If I read that right, the fdx file has a fixed size of 8 bytes times the 
number of documents in the segment.  The size should only depend on the 
number of documents, not the number of stored fields or their contents.


The fdt file contains the actual stored data and will vary according to 
the actual stored data.  Smaller fields take up less space than large 
fields.  If a stored field is missing from a document, it probably 
doesn't take up any space.  There is some overhead - exactly how much 
overhead is hard for me to calculate, especially since I don't know how 
much space a VInt takes up, which may in fact be variable.
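
For the curious, a VInt is a variable-length integer. A minimal Java sketch
of the usual layout (7 payload bits per byte, high bit flagging that more
bytes follow) shows why small values stay small:

  class VIntSketch {
      // Writes i as a Lucene-style VInt and returns the byte count.
      // Values < 128 take one byte; any int needs at most 5 bytes.
      static int writeVInt(int i, java.io.OutputStream out) throws java.io.IOException {
          int count = 1;
          while ((i & ~0x7F) != 0) {        // more than 7 bits remaining?
              out.write((i & 0x7F) | 0x80); // emit low 7 bits + continuation flag
              i >>>= 7;
              count++;
          }
          out.write(i);                     // final byte has the high bit clear
          return count;
      }
  }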


If I had to guess about the extra space required for storing an int 
field, I would say it's in the neighborhood of 20 bytes per document, 
perhaps less.  I am also interested in a definitive answer.


Thanks,
Shawn



Re: queryResultWindowSize

2013-01-30 Thread Erick Erickson
Pretty much. The queryResultCache is pretty inexpensive. But be a bit
careful: it's tempting to increase it greatly, but that only buys you
performance if you see your users actually ask for subsequent pages
reasonably often.

Best
Erick


On Tue, Jan 29, 2013 at 1:38 PM, Isaac Hebsh  wrote:

> Hi everyone.
>
> The queryResultWindowSize parameter sets the number of documents to be
> retrieved and stored in the queryResultCache. That is, a further request
> with the same query, but for the next page, might return very fast.
> The items that are stored in the queryResultCache are IDs only (not stored
> fields or previews/highlighting).
>
> Does that mean that increasing queryResultWindowSize will not significantly
> hurt performance of the first query?
>
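
For reference, both knobs live in solrconfig.xml; a sketch with placeholder
values:

  <queryResultWindowSize>50</queryResultWindowSize>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

With a window of 50, a request for rows 0-9 caches the first 50 doc IDs, so
pages 2-5 of a 10-per-page UI can come straight from the cache.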


Re: SolrException: Error loading class 'org.apache.solr.response.transform.EditorialMarkerFactory'

2013-01-30 Thread Erick Erickson
Please feel free to just edit the Wiki yourself; all you have to do is
create a login.




On Wed, Jan 23, 2013 at 9:04 AM, eShard  wrote:

> Thanks,
> That worked.
> So the documentation needs to be fixed in a few places (the solr wiki and
> the default solrconfig.xml in Solr 4.0 final; I didn't check any other
> versions)
> I'll either open a new ticket in JIRA to request a fix or reopen the old
> one...
>
> Furthermore,
> I tried using the ElevatedMarkerFactory and it didn't behave the way I
> thought it would.
>
> this
> http://localhost:8080/solr/Lisa/elevate?q=foo+bar&wt=xml&defType=dismax
> got me all the doc info but no elevated marker
>
> I ran this
>
> http://localhost:8080/solr/Lisa/elevate?q=foo+bar&fl=[elevated]&wt=xml&defType=dismax
> and all I got was response = 1 and elevated = true
>
> I had to run this to get all of the above info:
>
> http://localhost:8080/solr/Lisa/elevate?q=foo+bar&fl=*,[elevated]&wt=xml&defType=dismax
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SolrException-Error-loading-class-org-apache-solr-response-transform-EditorialMarkerFactory-tp4035203p4035621.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Manually assigning shard leader and replicas during initial setup on EC2

2013-01-30 Thread Erick Erickson
But there's still the latency issue. Draw a diagram of all the
communications that have to go on to do an update and it's a _lot_ of
arrows going across DCs.

My suspicion is that it'll be much easier to just treat the separate DCs as
separate clusters that don't know about each other; that is, your indexing
process just has to send the update request to each cluster. Synchronizing
them afterwards is assuredly a problem, though.

TNSTAAFL


On Wed, Jan 23, 2013 at 4:46 AM, Upayavira  wrote:

> The way Zookeeper is set up, requiring 'quorum' is aimed at avoiding
> 'split brain' where two halves of your cluster start to operate
> independently. This means that you *have* to favour one half of your
> cluster over the other, in the case that they cannot communicate with
> each other.
>
> For example, if you have three zookeepers, you'll put two in one DC and
> one in the other. The DC with two Zookeepers will stay active should the
> link between them go down.
>
> I'm not entirely sure what happens to the side with one zookeeper. I'd like
> to think it can still serve queries, since it *could* work out which
> nodes are accessible to it - but it will certainly not be doing updates
> (those should be buffered until the other DC returns).
>
> If you want true geographical redundancy, I think Markus' suggestion is
> a sensible one.
>
> Upayavira
>
> On Tue, Jan 22, 2013, at 10:11 PM, Markus Jelsma wrote:
> > Hi,
> >
> > Regarding availability: since SolrCloud is not DC-aware at this moment, we
> > 'solve' the problem by simply operating multiple identical clusters in
> > different DCs and sending updates to them all. This works quite well but it
> > requires some manual intervention if a DC is down due to a prolonged DOS
> > attack or a network or power failure.
> >
> > I don't think it's a very good idea to change clusterstate.json, because
> > Solr will modify it when, for example, a node goes down. Your preconfigured
> > state doesn't exist anymore. It's also a bad idea because distributed
> > queries are going to be sent to remote locations, adding a lot of
> > latency. Again, because it's not DC-aware.
> >
> > Any good solution to this problem should be in Solr itself.
> >
> > Cheers,
> >
> >
> > -Original message-
> > > From:Timothy Potter 
> > > Sent: Tue 22-Jan-2013 22:46
> > > To: solr-user@lucene.apache.org
> > > Subject: Manually assigning shard leader and replicas during initial
> setup on EC2
> > >
> > > Hi,
> > >
> > > I'm wanting to split my existing Solr 4 cluster into 2 different
> > > availability zones in EC2, as in have my initial leaders in one zone
> and
> > > their replicas in another AZ. My thinking here is if one zone goes
> down, my
> > > cluster stays online. This is the recommendation of Amazon EC2 docs.
> > >
> > > My thinking here is to just cook up a clusterstate.json file to
> manually
> > > set my desired shard / replica assignments to specific nodes. After
> which I
> > > can update the clusterstate.json file in Zk and then bring the nodes
> > > online.
> > >
> > > The other thing to mention is that I have existing indexes that need
> to be
> > > preserved as I don't want to re-index. For this I'm planning to just
> move
> > > data directories where they need to be based on my changes to
> > > clusterstate.json
> > >
> > > Does this sound reasonable? Any pitfalls I should look out for?
> > >
> > > Thanks.
> > > Tim
> > >
>


Re: Return Meta data when using Suggester

2013-01-30 Thread Erik Holstad
Thanks Arcadius!
Looks very promising and will have a look at it.

On Wed, Jan 30, 2013 at 3:00 PM, Arcadius Ahouansou wrote:

> Hi Erik.
>
> You may want to have a look at:
>
> http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/
>
> Arcadius.
>
>
>
> On 30 January 2013 17:09, Erik Holstad  wrote:
>
> > Hey!
> >
> > Been playing around with Suggester and things are working just fine,
> > but I have a use case where it would be really helpful to return some
> > metadata for each suggestion, for example the "id", or some other input
> > that I can specify.
> >
> > Is this possible, or would it totally mess up the underlying data
> > structure to do something like this?
> >
> > I guess you could add the metadata to the tail of your string and then
> > parse it out at retrieval time, but it feels kinda hacky.
> >
> >
> > --
> > Regards Erik
> >
>



-- 
Regards Erik


Re: Return Meta data when using Suggester

2013-01-30 Thread Arcadius Ahouansou
Hi Erik.

You may want to have a look at:

http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/

Arcadius.



On 30 January 2013 17:09, Erik Holstad  wrote:

> Hey!
>
> Been playing around with Suggester and things are working just fine,
> but I have a use case where it would be really helpful to return some
> metadata for each suggestion, for example the "id", or some other input
> that I can specify.
>
> Is this possible, or would it totally mess up the underlying data structure
> to do something like this?
>
> I guess you could add the metadata to the tail of your string and then
> parse it out at retrieval time, but it feels kinda hacky.
>
>
> --
> Regards Erik
>


Re: Possible issue in edismax?

2013-01-30 Thread Felipe Lahti
If you compare the first and last document scores, you will see that the
last one matches more fields than the first one. So you may be thinking: why?
The first doc only matches the "contributions" field and the last matches a
bunch of fields, so if you want to have it behave more like (series_title^500
title^100 description^15 contribution) you
have to override the method of DefaultSimilarity.


On Wed, Jan 30, 2013 at 4:12 PM, Sandeep Mestry  wrote:

> I have pasted it below and it is slightly different from the dismax
> configuration I have mentioned above, as I was playing with all sorts of
> boost values; however, it looks more like below:
>
> 
> 2675.7844 = (MATCH) sum of: 2675.7844 = (MATCH) max plus 0.01 times others
> of: 2675.7844 = (MATCH) weight(contributions:news in 63298)
> [DefaultSimilarity], result of: 2675.7844 = score(doc=63298,freq=1.0 =
> termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
> 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> 595177.7 = fieldWeight in 63298, product of: 1.0 = tf(freq=1.0), with freq
> of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14, maxDocs=11282414)
> 40960.0 = fieldNorm(doc=63298)
> 
> 
> 2317.297 = (MATCH) sum of: 2317.297 = (MATCH) max plus 0.01 times others
> of: 2317.297 = (MATCH) weight(contributions:news in 9826415)
> [DefaultSimilarity], result of: 2317.297 = score(doc=9826415,freq=3.0 =
> termFreq=3.0 ), product of: 0.004495774 = queryWeight, product of:
> 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> 515439.0 = fieldWeight in 9826415, product of: 1.7320508 = tf(freq=3.0),
> with freq of: 3.0 = termFreq=3.0 14.530705 = idf(docFreq=14,
> maxDocs=11282414) 20480.0 = fieldNorm(doc=9826415)
> 
> 
> 2140.6274 = (MATCH) sum of: 2140.6274 = (MATCH) max plus 0.01 times others
> of: 2140.6274 = (MATCH) weight(contributions:news in 9882325)
> [DefaultSimilarity], result of: 2140.6274 = score(doc=9882325,freq=1.0 =
> termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
> 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> 476142.16 = fieldWeight in 9882325, product of: 1.0 = tf(freq=1.0), with
> freq of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14, maxDocs=11282414)
> 32768.0 = fieldNorm(doc=9882325)
> 
> 
> 1605.4707 = (MATCH) sum of: 1605.4707 = (MATCH) max plus 0.01 times others
> of: 1605.4707 = (MATCH) weight(contributions:news in 220007)
> [DefaultSimilarity], result of: 1605.4707 = score(doc=220007,freq=1.0 =
> termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
> 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> 357106.62 = fieldWeight in 220007, product of: 1.0 = tf(freq=1.0), with
> freq of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14, maxDocs=11282414)
> 24576.0 = fieldNorm(doc=220007)
> 
> 
> 1605.4707 = (MATCH) sum of: 1605.4707 = (MATCH) max plus 0.01 times others
> of: 1605.4707 = (MATCH) weight(contributions:news in 241151)
> [DefaultSimilarity], result of: 1605.4707 = score(doc=241151,freq=1.0 =
> termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
> 14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
> 357106.62 = fieldWeight in 241151, product of: 1.0 = tf(freq=1.0), with
> freq of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14, maxDocs=11282414)
> 24576.0 = fieldNorm(doc=241151)
> 
> 
> id:c208c2b4-1b3e-27b8-e040-a8c00409063a
> 
>  
> 6.5742764 = (MATCH) sum of: 6.5742764 = (MATCH) max plus 0.01 times others
> of: 3.304414 = (MATCH) weight(description:news^25.0 in 967895)
> [DefaultSimilarity], result of: 3.304414 = score(doc=967895,freq=1.0 =
> termFreq=1.0 ), product of: 0.042727955 = queryWeight, product of: 25.0 =
> boost 5.5240083 = idf(docFreq=122362, maxDocs=11282414) 3.093982E-4 =
> queryNorm 77.33611 = fieldWeight in 967895, product of: 1.0 = tf(freq=1.0),
> with freq of: 1.0 = termFreq=1.0 5.5240083 = idf(docFreq=122362,
> maxDocs=11282414) 14.0 = fieldNorm(doc=967895) 5.913381 = (MATCH)
> weight(pg_series_title:news^50.0 in 967895) [DefaultSimilarity], result of:
> 5.913381 = score(doc=967895,freq=1.0 = termFreq=1.0 ), product of:
> 0.080834694 = queryWeight, product of: 50.0 = boost 5.2252855 =
> idf(docFreq=164961, maxDocs=11282414) 3.093982E-4 = queryNorm 73.154 =
> fieldWeight in 967895, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 =
> termFreq=1.0 5.2252855 = idf(docFreq=164961, maxDocs=11282414) 14.0 =
> fieldNorm(doc=967895) 0.18680073 = (MATCH) weight(p_programme_title:news in
> 967895) [DefaultSimilarity], result of: 0.18680073 =
> score(doc=967895,freq=1.0 = termFreq=1.0 ), product of: 0.002031815 =
> queryWeight, product of: 6.5669904 = idf(docFreq=43120, maxDocs=11282414)
> 3.093982E-4 = queryNorm 91.93787 = fieldWeight in 967895, product of: 1.0 =
> tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 6.5669904 =
> idf(docFreq=43120, maxDocs=11282414) 14.0 = fieldNorm(doc=967895) 6.464123
> = (MATCH) weight(pg_series_title_ci:news^500.0 in 967895)
> [DefaultSim

Re: overlap function query

2013-01-30 Thread Chris Hostetter

: I think coord works at the document level, I was thinking of having
: something that worked at a field level, against a 'principle/primary'
: field.

I'm not sure what you mean by "works at the document level" ... coord is
used by the BooleanQuery scoring mechanism to define how scores should be
affected when a document doesn't match all terms in the query.

Mikhail's suggestion was that with an appropriately defined coord method
in a custom similarity, you could probably get close to what you are
asking about using the "query()" function on a BooleanQuery containing the
terms you are interested in.

It may be easier than that, though ... you might want to take a look at the
termfreq() and norm() functions ... combined with map() (to ensure you get
a "1" for docs that match a term, no matter what the termfreq() is) you
could probably get values proportionate to what you are looking for -- but
the denominators won't be the exact number of terms in the field unless
you customize the norm function in your similarity.
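
As a rough illustration of that combination (the field and term names here
are hypothetical), a function query like

  q={!func}sum(map(termfreq(title,'news'),1,10000,1),map(termfreq(title,'sport'),1,10000,1))

yields, per document, the number of the listed terms that appear in the
field at least once, ignoring how often each one occurs.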

In general, though, this smells like an XY problem, because the kind of
boosting you seem to be trying to achieve sounds like exactly what the
normal TF/IDF scoring algorithm will give you.  So perhaps you should tell
us more about some real-world specifics of the types of data/queries you are
using, what types of results you are seeing, and the types of results you
want...

https://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an "XY Problem" ... that is: you are dealing
with "X", you are assuming "Y" will help you, and you are asking about "Y"
without giving more details about the "X" so that we can understand the
full issue.  Perhaps the best solution doesn't involve "Y" at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341



-Hoss


Re: Server stops at Opening Searcher in 4.1

2013-01-30 Thread adityab
Thanks Shawn,
I will try both approaches.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Server-stops-at-Opening-Searcher-in-4-1-tp4037458p4037499.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Server stops at Opening Searcher in 4.1

2013-01-30 Thread Shawn Heisey

On 1/30/2013 11:48 AM, adityab wrote:

Thanks Shawn,
We use a Master-Slave architecture in Prod and are planning to continue with
it in 4.1.
Our indexing usually happens on the Master, and we index about 10K docs every
2 hrs and then perform a commit.
Our full re-index happens only when we have a schema change, so we don't use
autocommit.

Is there a way to turn off the "tlog", or when/what action recycles this
file?


The autoCommit settings that I have described will recycle the 
transaction logs without changing ANYTHING about how you index.  The 
resulting tlogs will be small, and there will not be very many of them. 
 That would be the first thing to try, and it should solve things for 
you completely.  You do need to have a _version_ field defined in your 
schema, but you probably already know that.


For example (values here are illustrative):

  <autoCommit>
    <maxDocs>25000</maxDocs>
    <maxTime>300000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

Removing the "updateLog" config from your updateHandler definition will 
turn off transaction logs entirely.  Because you have already switched 
to MMapDirectoryFactory, this should be safe - but I would recommend the 
autoCommit changes instead.


Thanks,
Shawn



Re: Server stops at Opening Searcher in 4.1

2013-01-30 Thread adityab
Thanks Shawn,
We use a Master-Slave architecture in Prod and are planning to continue with
it in 4.1.
Our indexing usually happens on the Master, and we index about 10K docs every
2 hrs and then perform a commit.
Our full re-index happens only when we have a schema change, so we don't use
autocommit.

Is there a way to turn off the "tlog", or when/what action recycles this
file?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Server-stops-at-Opening-Searcher-in-4-1-tp4037458p4037493.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Server stops at Opening Searcher in 4.1

2013-01-30 Thread Shawn Heisey

On 1/30/2013 11:21 AM, adityab wrote:

Shawn,
I believe your point is valid ... if you see below, my tlog.* file size is
huge. But shouldn't that be cleared if I am not using soft commit and do an
explicit hard commit?

After deleting this I was able to get my server up. Thanks for the
information/help. Also, please let me know how to avoid such a situation. For
now I can work with manually deleting the log, but I can't do it all the time.

[root@awbmmv030 tlog]# pwd
/storage/solrdata/tlog
[root@awbmmv030 tlog]# ls -l
total 27624908
-rw-r--r-- 1 root root 28260272185 Jan 29 15:44 tlog.001
[root@awbmmv030 tlog]#


I imagine that the reason that you do a large amount of indexing without 
any commits is that you don't want the clients to see any change until 
the entire index run is complete.


If you use autoCommit but make openSearcher false, then a hard commit 
will be automatically performed every time you reach maxDocs or maxTime 
milliseconds.  What the openSearcher value of false does is make it so 
the same IndexSearcher will continue to be used -- no change will be 
visible to clients making queries, even if one of the things you do is 
delete all documents before adding new documents.


I'm currently using a maxDocs of 25000 and a maxTime of 300000 (five 
minutes).  I don't think it ever takes five minutes to index 25000 
documents.


Thanks,
Shawn



Re: Server stops at Opening Searcher in 4.1

2013-01-30 Thread adityab
Shawn,
I believe your point is valid ... if you see below, my tlog.* file size is
huge. But shouldn't that be cleared if I am not using soft commit and do an
explicit hard commit?

After deleting this I was able to get my server up. Thanks for the
information/help. Also, please let me know how to avoid such a situation. For
now I can work with manually deleting the log, but I can't do it all the time.

[root@awbmmv030 tlog]# pwd
/storage/solrdata/tlog
[root@awbmmv030 tlog]# ls -l
total 27624908
-rw-r--r-- 1 root root 28260272185 Jan 29 15:44 tlog.001
[root@awbmmv030 tlog]#




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Server-stops-at-Opening-Searcher-in-4-1-tp4037458p4037483.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Server stops at Opening Searcher in 4.1

2013-01-30 Thread adityab
Thanks for the additional information Shawn,

I am testing 4.1 on a single machine, single core, so no cloud. I did change
NRTCachingDirectoryFactory to MMapDirectoryFactory, and after indexing all
the documents we do a hard commit explicitly from our publisher client.
I was able to run queries to verify my documents. The problem appeared
after I restarted the server for a load test, just to make sure all caches
are clean.
This is where I see the server never starts and is stuck at the point
mentioned in my logs.

If I delete the index manually and then start the server, everything looks
good.
Also, I am not able to set the log level to debug from the Admin UI (it
doesn't show up; maybe that's another bug).



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Server-stops-at-Opening-Searcher-in-4-1-tp4037458p4037481.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Possible issue in edismax?

2013-01-30 Thread Sandeep Mestry
I have pasted it below and it is slightly different from the dismax
configuration I have mentioned above, as I was playing with all sorts of
boost values; however, it looks more like below:


2675.7844 = (MATCH) sum of: 2675.7844 = (MATCH) max plus 0.01 times others
of: 2675.7844 = (MATCH) weight(contributions:news in 63298)
[DefaultSimilarity], result of: 2675.7844 = score(doc=63298,freq=1.0 =
termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
595177.7 = fieldWeight in 63298, product of: 1.0 = tf(freq=1.0), with freq
of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14, maxDocs=11282414)
40960.0 = fieldNorm(doc=63298)


2317.297 = (MATCH) sum of: 2317.297 = (MATCH) max plus 0.01 times others
of: 2317.297 = (MATCH) weight(contributions:news in 9826415)
[DefaultSimilarity], result of: 2317.297 = score(doc=9826415,freq=3.0 =
termFreq=3.0 ), product of: 0.004495774 = queryWeight, product of:
14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
515439.0 = fieldWeight in 9826415, product of: 1.7320508 = tf(freq=3.0),
with freq of: 3.0 = termFreq=3.0 14.530705 = idf(docFreq=14,
maxDocs=11282414) 20480.0 = fieldNorm(doc=9826415)


2140.6274 = (MATCH) sum of: 2140.6274 = (MATCH) max plus 0.01 times others
of: 2140.6274 = (MATCH) weight(contributions:news in 9882325)
[DefaultSimilarity], result of: 2140.6274 = score(doc=9882325,freq=1.0 =
termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
476142.16 = fieldWeight in 9882325, product of: 1.0 = tf(freq=1.0), with
freq of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14, maxDocs=11282414)
32768.0 = fieldNorm(doc=9882325)


1605.4707 = (MATCH) sum of: 1605.4707 = (MATCH) max plus 0.01 times others
of: 1605.4707 = (MATCH) weight(contributions:news in 220007)
[DefaultSimilarity], result of: 1605.4707 = score(doc=220007,freq=1.0 =
termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
357106.62 = fieldWeight in 220007, product of: 1.0 = tf(freq=1.0), with
freq of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14, maxDocs=11282414)
24576.0 = fieldNorm(doc=220007)


1605.4707 = (MATCH) sum of: 1605.4707 = (MATCH) max plus 0.01 times others
of: 1605.4707 = (MATCH) weight(contributions:news in 241151)
[DefaultSimilarity], result of: 1605.4707 = score(doc=241151,freq=1.0 =
termFreq=1.0 ), product of: 0.004495774 = queryWeight, product of:
14.530705 = idf(docFreq=14, maxDocs=11282414) 3.093982E-4 = queryNorm
357106.62 = fieldWeight in 241151, product of: 1.0 = tf(freq=1.0), with
freq of: 1.0 = termFreq=1.0 14.530705 = idf(docFreq=14, maxDocs=11282414)
24576.0 = fieldNorm(doc=241151)


id:c208c2b4-1b3e-27b8-e040-a8c00409063a

 
6.5742764 = (MATCH) sum of: 6.5742764 = (MATCH) max plus 0.01 times others
of: 3.304414 = (MATCH) weight(description:news^25.0 in 967895)
[DefaultSimilarity], result of: 3.304414 = score(doc=967895,freq=1.0 =
termFreq=1.0 ), product of: 0.042727955 = queryWeight, product of: 25.0 =
boost 5.5240083 = idf(docFreq=122362, maxDocs=11282414) 3.093982E-4 =
queryNorm 77.33611 = fieldWeight in 967895, product of: 1.0 = tf(freq=1.0),
with freq of: 1.0 = termFreq=1.0 5.5240083 = idf(docFreq=122362,
maxDocs=11282414) 14.0 = fieldNorm(doc=967895) 5.913381 = (MATCH)
weight(pg_series_title:news^50.0 in 967895) [DefaultSimilarity], result of:
5.913381 = score(doc=967895,freq=1.0 = termFreq=1.0 ), product of:
0.080834694 = queryWeight, product of: 50.0 = boost 5.2252855 =
idf(docFreq=164961, maxDocs=11282414) 3.093982E-4 = queryNorm 73.154 =
fieldWeight in 967895, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 =
termFreq=1.0 5.2252855 = idf(docFreq=164961, maxDocs=11282414) 14.0 =
fieldNorm(doc=967895) 0.18680073 = (MATCH) weight(p_programme_title:news in
967895) [DefaultSimilarity], result of: 0.18680073 =
score(doc=967895,freq=1.0 = termFreq=1.0 ), product of: 0.002031815 =
queryWeight, product of: 6.5669904 = idf(docFreq=43120, maxDocs=11282414)
3.093982E-4 = queryNorm 91.93787 = fieldWeight in 967895, product of: 1.0 =
tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 6.5669904 =
idf(docFreq=43120, maxDocs=11282414) 14.0 = fieldNorm(doc=967895) 6.464123
= (MATCH) weight(pg_series_title_ci:news^500.0 in 967895)
[DefaultSimilarity], result of: 6.464123 = score(doc=967895,freq=1.0 =
termFreq=1.0 ), product of: 0.9696 = queryWeight, product of: 500.0 =
boost 6.4641423 = idf(docFreq=47791, maxDocs=11282414) 3.093982E-4 =
queryNorm 6.4641423 = fieldWeight in 967895, product of: 1.0 =
tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 6.4641423 =
idf(docFreq=47791, maxDocs=11282414) 1.0 = fieldNorm(doc=967895) 1.6107484
= (MATCH) weight(title_ci:news^100.0 in 967895) [DefaultSimilarity], result
of: 1.6107484 = score(doc=967895,freq=1.0 = termFreq=1.0 ), product of:
0.22324038 = queryWeight, product of: 100.0 = boost 7.215

Re: Server stops at Opening Searcher in 4.1

2013-01-30 Thread adityab
Thanks Jack,

I did take the latest solrconfig.xml file.
The only change I made to the file is to use MMapDirectory:

  <directoryFactory name="DirectoryFactory" class="solr.MMapDirectoryFactory"/>

Apart from that, I increased the cache sizes for the query/document/filter
caches, all with warm-up set to 0 (for the test).





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Server-stops-at-Opening-Searcher-in-4-1-tp4037458p4037478.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Server stops at Opening Searcher in 4.1

2013-01-30 Thread Shawn Heisey

On 1/30/2013 10:31 AM, Jack Krupansky wrote:

Do you have any customized settings for <indexConfig>, warming
queries, or customized settings for the old/deprecated <indexDefaults>
or <mainIndex>? Try using the settings from the latest solrconfig.xml
and then customize from there. Or at least see how they are different.


Jack,

Is there any possibility that their failure could be caused by a very 
very large updateLog?


If not, the rest of this probably doesn't apply.  If so:

Aditya, a large updateLog is generated when a large amount of indexing 
is done without any hard commits.  One way to make things better without 
changing the way you index is to put nonzero values for maxDocs and/or 
maxTime into autoCommit, with openSearcher=false.


http://wiki.apache.org/solr/SolrConfigXml#Update_Handler_Section

Another option is to turn off the updateLog entirely.  That would not be 
recommended if you are using SolrCloud.  If there's no cloud, you'd 
probably be OK, but you'd also want to change from 
NRTCachingDirectoryFactory to MMapDirectoryFactory to avoid data loss.


Thanks,
Shawn



Re: Possible issue in edismax?

2013-01-30 Thread Felipe Lahti
Let me see if I understood your problem:

From your first e-mail, I think you are worried about the order of documents
returned from Solr. Is that correct? If so, as I said before, it's not only
the boosting that influences the order of returned documents; there are also
term frequency, IDF (inverse document frequency), and so on. If I understood
your first e-mail correctly, you are interested in getting rid of IDF. For
that, you can create a NoIDFSimilarity class to override the default
similarity.
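
A minimal sketch of such a class, assuming the Lucene 4.x DefaultSimilarity
API (the class and package names are placeholders; wire it up in schema.xml
with <similarity class="com.example.NoIDFSimilarity"/>):

  import org.apache.lucene.search.similarities.DefaultSimilarity;

  // Neutralizes IDF so rare terms stop dominating the score;
  // tf, norms and boosts still behave as in DefaultSimilarity.
  public class NoIDFSimilarity extends DefaultSimilarity {
      @Override
      public float idf(long docFreq, long numDocs) {
          return 1.0f;
      }
  }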

Can you paste here the score calculation for one document?


On Wed, Jan 30, 2013 at 2:06 PM, Sandeep Mestry  wrote:

> (Sorry for the incomplete reply in my previous mail; I didn't know Ctrl-F
> sends an email in Gmail.. ;-))
>
> Thanks Felipe, yes I have seen that and my requirement falls for
>
> How can I make exact-case matches score higher
>
> Example: a query of "Penguin" should score documents containing "Penguin"
> higher than docs containing "penguin".
>
> The general strategy is to index the content twice, using different fields
> with different fieldTypes (and different analyzers associated with those
> fieldTypes). One analyzer will contain a lowercase filter for
> case-insensitive matches, and one will preserve case for exact-case
> matches.
>
> Use copyField commands in
> the schema to index a single input field multiple times.
>
> Once the content is indexed into multiple fields that are analyzed
> differently, query across both fields.
>
> I have added a case-insensitive field too, to rank the exact matches
> higher; however, the result is not even considering the matches in that
> field - forget the exact-matching part.
>
> And I have tried the debugQuery option as mentioned in my previous mail,
> and I have also posted the parsed queries. From the debug output, I see that
> the field boosted with a lesser factor (contribution) is still scoring
> higher than the one with a higher boost factor (series_title).
>
>
> Thanks,
>
> Sandeep
>
>
>
>
> On 30 January 2013 16:02, Sandeep Mestry  wrote:
>
> > Thanks Felipe, yes I have seen that and my requirement somewhere falls
> for
> >
> >
> > On 30 January 2013 15:53, Felipe Lahti  wrote:
> >
> >> Hi Sandeep,
> >>
> >> Quick answer is that not only the boost that you define in your
> >> requestHandler is taken to calculate the score of each document. There
> are
> >> others factors that contribute to score calculation. You can take a look
> >> here about http://wiki.apache.org/solr/SolrRelevancyFAQ. Also, you can
> >> see
> >> using debugQuery=true the score calculation for each document returned.
> >>
> >> Let me know you need something else.
> >>
> >>
> >>
> >> On Wed, Jan 30, 2013 at 1:13 PM, Sandeep Mestry 
> >> wrote:
> >>
> >> > Hi All,
> >> >
> >> > I'm facing an issue in relevancy calculation by the dismax query parser.
> >> > The boost factor applied does not work as expected in certain cases when
> >> > the keyword is generic, and by generic I mean the keyword appears
> >> > many times in the document as well as in the index.
> >> >
> >> > I have parser configuration as below:
> >> >
> >> > 
> >> > 
> >> > edismax
> >> > explicit
> >> > 0.01
> >> > series_title^500 title^100 description^15
> >> > contribution
> >> > series_title^200
> >> > 0
> >> > *:*
> >> > 
> >> > 
> >> >
> >> > As you can see above, I'd expect the documents containing the matches
> >> for
> >> > series title should rank higher than the ones in contribution.
> >> >
> >> > This works well, if I type in a query like 'wonderworld' which is a
> less
> >> > occurring term and the series titles rank higher. But, if I type in a
> >> > keyword like 'news' which is the most common term in the index, I get
> >> hits
> >> > in contributions even though I have lots of documents having word news
> >> in
> >> > series title.
> >> >
> >> > The field definition is as below:
> >> >
> >> >  >> > multiValued="false" />
> >> >  >> > multiValued="false" />
> >> >  >> > multiValued="false" />
> >> >  >> > multiValued="true" />
> >> >
> >> >  positionIncrementGap="100"
> >> > compressThreshold="10">
> >> > 
> >> > 
> >> >  >> > generateWordParts="1" generateNumberParts="1" catenateWords="1"
> >> > catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> >> > 
> >> > 
> >> > 
> >> > 
> >> >  >> > generateWordParts="1" generateNumberParts="1" catenateWords="0"
> >> > catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> >> > 
> >> > 
> >> > 
> >> >
> >> >  >> positionIncrementGap="100"
> >> > >
> >> > 
> >> > 
> >> >  >> > stemEnglishPossessive="0" generateWordParts="1"
> generateNumberParts="1"
> >> >

Re: SolrCloud 4.1 - change config set for a collection?

2013-01-30 Thread Mark Miller

On Jan 30, 2013, at 12:14 PM, Shawn Heisey  wrote:

> Is it possible to issue a command through the collections API that will 
> assign a new config set (already stored in zookeeper) to an existing 
> collection?

No, not currently. We are talking about such things here: SOLR-4193.

Right now you either have to rely on one of the auto rules (e.g. collections 
with no link that see a conf set sharing their name will auto-link), use the 
ZkCli tool, or do something manual.
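
For the ZkCli route, the linkconfig command does it; a sketch (the zkhost,
collection and confset names are placeholders):

  cloud-scripts/zkcli.sh -zkhost localhost:9983 -cmd linkconfig \
      -collection mycollection -confname myconf

followed by a reload of the collection so the cores pick up the new config.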

> 
> Related - because such changes would require a reload, is there a RELOAD 
> action on the collection API that finds all the cores for that collection and 
> reloads them?

See the reload command: 
http://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_Collections_API
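
For example (host and collection name are placeholders):

  http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection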

> 
> The SolrCloud wiki page doesn't have a reference for the action parameter and 
> the parameter list is almost guaranteed to be incomplete.

I don't think it is? What is missing?

- Mark



Re: Server stops at Opening Searcher in 4.1

2013-01-30 Thread Jack Krupansky
Do you have any customized settings for <indexConfig>, warming queries, 
or customized settings for the old/deprecated <indexDefaults> or 
<mainIndex>? Try using the settings from the latest solrconfig.xml and then 
customize from there. Or at least see how they are different.


-- Jack Krupansky

-Original Message- 
From: adityab

Sent: Wednesday, January 30, 2013 12:21 PM
To: solr-user@lucene.apache.org
Subject: Server stops at Opening Searcher in 4.1

Hi,
We currently have Solr 3.5 and it is working well. With the features and fixes
available in 4.1, we decided to upgrade.
We started some tests with Solr 4.1 on JBoss 7.1. Everything looked good at
first: we ran indexing and executed some queries. We restarted the servers
before performing a load test and encountered this problem several times: the
searcher is opening (stuck there) and then JBoss throws an error, as seen
below. Our index size is 8.9GB (25M docs).
We have observed this usually during a restart with an existing index. To get
the server started we need to delete the index, and then it goes through. Any
idea what could be the issue?

09:12:08,007 INFO  [org.apache.solr.core.RequestHandlers]
(coreLoadExecutor-3-thread-1) adding lazy requestHandler: solr.SearchHandler
09:12:08,007 INFO  [org.apache.solr.core.RequestHandlers]
(coreLoadExecutor-3-thread-1) created /terms: solr.SearchHandler
09:12:08,020 INFO  [org.apache.solr.handler.loader.XMLLoader]
(coreLoadExecutor-3-thread-1) xsltCacheLifetimeSeconds=60
09:12:08,022 INFO  [org.apache.solr.handler.loader.XMLLoader]
(coreLoadExecutor-3-thread-1) xsltCacheLifetimeSeconds=60
09:12:08,023 INFO  [org.apache.solr.handler.loader.XMLLoader]
(coreLoadExecutor-3-thread-1) xsltCacheLifetimeSeconds=60
09:12:08,026 INFO  [org.apache.solr.core.CachingDirectoryFactory]
(coreLoadExecutor-3-thread-1) Releasing directory:/storage/solrdata
09:12:08,260 INFO  [org.apache.solr.search.SolrIndexSearcher]
(coreLoadExecutor-3-thread-1) Opening Searcher@7703d93e main
09:13:05,576 INFO  [org.jboss.as.server] (DeploymentScanner-threads - 2)
JBAS015870: Deploy of deployment "solr.war" was rolled back with failure
message Operation cancelled
09:13:05,578 ERROR [org.jboss.as.server.deployment.scanner]
(DeploymentScanner-threads - 1) JBAS015052: Did not receive a response to
the deployment operation within the allowed timeout period [60 seconds].
Check the server configuration file and the server logs to find more about
the status of the deployment.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Server-stops-at-Opening-Searcher-in-4-1-tp4037458.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Server stops at Opening Searcher in 4.1

2013-01-30 Thread adityab
Hi,
We currently have Solr 3.5 and it is working well. With the features and fixes
available in 4.1, we decided to upgrade.
We started some tests with Solr 4.1 on JBoss 7.1. Everything looked good at
first: we ran indexing and executed some queries. We restarted the servers
before performing a load test and encountered this problem several times: the
searcher is opening (stuck there) and then JBoss throws an error, as seen
below. Our index size is 8.9GB (25M docs).
We have observed this usually during a restart with an existing index. To get
the server started we need to delete the index, and then it goes through. Any
idea what could be the issue?

09:12:08,007 INFO  [org.apache.solr.core.RequestHandlers]
(coreLoadExecutor-3-thread-1) adding lazy requestHandler: solr.SearchHandler
09:12:08,007 INFO  [org.apache.solr.core.RequestHandlers]
(coreLoadExecutor-3-thread-1) created /terms: solr.SearchHandler
09:12:08,020 INFO  [org.apache.solr.handler.loader.XMLLoader]
(coreLoadExecutor-3-thread-1) xsltCacheLifetimeSeconds=60
09:12:08,022 INFO  [org.apache.solr.handler.loader.XMLLoader]
(coreLoadExecutor-3-thread-1) xsltCacheLifetimeSeconds=60
09:12:08,023 INFO  [org.apache.solr.handler.loader.XMLLoader]
(coreLoadExecutor-3-thread-1) xsltCacheLifetimeSeconds=60
09:12:08,026 INFO  [org.apache.solr.core.CachingDirectoryFactory]
(coreLoadExecutor-3-thread-1) Releasing directory:/storage/solrdata
09:12:08,260 INFO  [org.apache.solr.search.SolrIndexSearcher]
(coreLoadExecutor-3-thread-1) Opening Searcher@7703d93e main
09:13:05,576 INFO  [org.jboss.as.server] (DeploymentScanner-threads - 2)
JBAS015870: Deploy of deployment "solr.war" was rolled back with failure
message Operation cancelled
09:13:05,578 ERROR [org.jboss.as.server.deployment.scanner]
(DeploymentScanner-threads - 1) JBAS015052: Did not receive a response to
the deployment operation within the allowed timeout period [60 seconds].
Check the server configuration file and the server logs to find more about
the status of the deployment.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Server-stops-at-Opening-Searcher-in-4-1-tp4037458.html
Sent from the Solr - User mailing list archive at Nabble.com.


SolrCloud 4.1 - change config set for a collection?

2013-01-30 Thread Shawn Heisey
Is it possible to issue a command through the collections API that will 
assign a new config set (already stored in zookeeper) to an existing 
collection?


Related - because such changes would require a reload, is there a RELOAD 
action on the collection API that finds all the cores for that 
collection and reloads them?


The SolrCloud wiki page doesn't have a reference for the action 
parameter and the parameter list is almost guaranteed to be incomplete.


I thought I remembered seeing a Jira issue that would automatically 
reload collections when making changes to the associated config set.


Thanks,
Shawn


Re: SolrCloud: admin security vs. replication

2013-01-30 Thread Mark Miller
The admin user interface and admin/cores are two very different things; they 
just happen to share "admin" in the URL.

It doesn't make any sense to secure admin/cores unless you are also going to 
secure all the other Solr APIs.

- Mark

On Jan 30, 2013, at 5:55 AM, AlexeyK  wrote:

> Hi,
> There are a lot of posts which talk about hardening the /admin handler with
> user credentials etc.
> On the other hand, the replication handler won't work if /admin/cores is
> also hardened.
> Considering this fact, how can I allow secure external access to the admin
> interface AND allow proper cluster operation?
> Not setting any security on admin/cores is not an option.
> 
> Thanks,
> Alexey 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/SolrCloud-admin-security-vs-replication-tp4037337.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Can I start solr with replication activated but disabled between master and slave

2013-01-30 Thread Arcadius Ahouansou
As stated by Robi, you can do it through the admin UI:

- disable replication on the master through the admin UI, or

- disable polling on the slave through the admin UI. Disabling polling on
the slaves is very handy if you are doing stuff on the master that requires
a master restart.
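
Both toggles are also plain HTTP commands on the replication handler, which
is handy for scripting (the host and core names are placeholders):

  http://master:8983/solr/mycore/replication?command=disablereplication
  http://slave:8983/solr/mycore/replication?command=disablepoll

with enablereplication / enablepoll to switch them back on.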

Thanks.

Arcadius.





On 30 January 2013 16:35, Petersen, Robert  wrote:

> Hi Jamel,
>
> You can start solr slaves with them pointed at a master and then turn off
> replication in the admin replication page.
>
> Hope that helps,
> -Robi
>
> Robert (Robi) Petersen
> Senior Software Engineer
> Search Department
>
>
>
>
> -Original Message-
> From: Jamel ESSOUSSI [mailto:jamel.essou...@gmail.com]
> Sent: Wednesday, January 30, 2013 2:45 AM
> To: solr-user@lucene.apache.org
> Subject: Can I start solr with replication activated but disabled between
> master and slave
>
> Hello,
>
> I would like to start solr with the following configuration;
>
> Replication between master and slave activated but not enabled.
>
> Regards
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Can-I-start-solr-with-replication-activated-but-disabled-between-master-and-slave-tp4037333.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
>


RE: Can I start solr with replication activated but disabled between master and slave

2013-01-30 Thread Petersen, Robert
Hi Jamel,

You can start solr slaves with them pointed at a master and then turn off 
replication in the admin replication page.

Hope that helps,
-Robi

Robert (Robi) Petersen
Senior Software Engineer
Search Department

 


-Original Message-
From: Jamel ESSOUSSI [mailto:jamel.essou...@gmail.com] 
Sent: Wednesday, January 30, 2013 2:45 AM
To: solr-user@lucene.apache.org
Subject: Can I start solr with replication activated but disabled between 
master and slave

Hello,

I would like to start solr with the following configuration;

Replication between master and slave activated but not enabled.

Regards



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-I-start-solr-with-replication-activated-but-disabled-between-master-and-slave-tp4037333.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: A question about attaching shards to load balancers

2013-01-30 Thread Peter Keegan
Aren't you concerned about having a single point of failure with this setup?

On Wed, Jan 30, 2013 at 10:38 AM, Michael Ryan  wrote:

> From a performance point of view, I can't imagine it mattering. In our
> setup, we have a dedicated Solr server that is not a shard that takes
> incoming requests (we call it the "coordinator"). This server is very
> lightweight and practically has no load at all.
>
> My gut feeling is that having a separate dedicated server might be a
> slightly better approach, as it will have totally different performance
> characteristics than the shards, and so you can tune it for this.
>
> -Michael
>


Re: configuring datasource for dynamic password and user

2013-01-30 Thread Walter Underwood
This was discussed last week, with two different solutions:

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201301.mbox/browser

In general, you can set a Java property, like "-Ddbpass=fred", then use it in 
the config files as "${dbpass}".

wunder
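
A minimal data-config.xml sketch of that approach, assuming dburl, dbuser
and dbpass were passed as -D properties at startup and are visible to Solr's
variable substitution:

  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
      url="${dburl}" user="${dbuser}" password="${dbpass}" />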

On Jan 30, 2013, at 3:37 AM, Lapera-Valenzuela, Elizabeth [Primerica] wrote:

> Hi, we will be using solr on development, test and prod platforms.  Is
> there a way to dynamically create the datasource so that the url,
> password and user id are passed in, or can I point it to a properties file
> that has this info?  Thanks
> 







Re: Possible issue in edismax?

2013-01-30 Thread Sandeep Mestry
(Sorry for the incomplete reply in my previous mail; I didn't know Ctrl-F
sends an email in Gmail.. ;-))

Thanks Felipe, yes I have seen that and my requirement falls for

How can I make exact-case matches score higher

Example: a query of "Penguin" should score documents containing "Penguin"
higher than docs containing "penguin".

The general strategy is to index the content twice, using different fields
with different fieldTypes (and different analyzers associated with those
fieldTypes). One analyzer will contain a lowercase filter for
case-insensitive matches, and one will preserve case for exact-case matches.

Use copyField commands in
the schema to index a single input field multiple times.

Once the content is indexed into multiple fields that are analyzed
differently, query across both fields.
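
A schema sketch of that strategy (the field and type names are placeholders):

  <field name="title" type="text_exact" indexed="true" stored="true" />
  <field name="title_ci" type="text_ci" indexed="true" stored="false" />
  <copyField source="title" dest="title_ci" />

and then query both, e.g. qf="title^100 title_ci^10", so exact-case hits
outrank case-insensitive ones.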

I have added a case-insensitive field too, to rank the exact matches
higher; however, the result is not even considering the matches in that
field - forget the exact-matching part.

And I have tried the debugQuery option as mentioned in my previous mail,
and I have also posted the parsed queries. From the debug output, I see that
the field boosted with a lesser factor (contribution) is still scoring higher
than the one with a higher boost factor (series_title).


Thanks,

Sandeep




On 30 January 2013 16:02, Sandeep Mestry  wrote:

> Thanks Felipe, yes I have seen that and my requirement somewhere falls for
>
>
> On 30 January 2013 15:53, Felipe Lahti  wrote:
>
>> Hi Sandeep,
>>
>> Quick answer is that not only the boost that you define in your
>> requestHandler is taken to calculate the score of each document. There are
>> others factors that contribute to score calculation. You can take a look
>> here about http://wiki.apache.org/solr/SolrRelevancyFAQ. Also, you can
>> see
>> using debugQuery=true the score calculation for each document returned.
>>
>> Let me know you need something else.
>>
>>
>>
>> On Wed, Jan 30, 2013 at 1:13 PM, Sandeep Mestry 
>> wrote:
>>
>> > Hi All,
>> >
>> > I'm facing an issue in relevancy calculation by the dismax query parser.
>> > The boost factor applied does not work as expected in certain cases when
>> > the keyword is generic, and by generic I mean the keyword appears
>> > many times in the document as well as in the index.
>> >
>> > I have parser configuration as below:
>> >
>> > 
>> > 
>> > edismax
>> > explicit
>> > 0.01
>> > series_title^500 title^100 description^15
>> > contribution
>> > series_title^200
>> > 0
>> > *:*
>> > 
>> > 
>> >
>> > As you can see above, I'd expect the documents containing the matches
>> for
>> > series title should rank higher than the ones in contribution.
>> >
>> > This works well, if I type in a query like 'wonderworld' which is a less
>> > occurring term and the series titles rank higher. But, if I type in a
>> > keyword like 'news' which is the most common term in the index, I get
>> hits
>> > in contributions even though I have lots of documents having word news
>> in
>> > series title.
>> >
>> > The field definition is as below:
>> >
>> > > > multiValued="false" />
>> > > > multiValued="false" />
>> > > > multiValued="false" />
>> > > > multiValued="true" />
>> >
>> > > > compressThreshold="10">
>> > 
>> > 
>> > > > generateWordParts="1" generateNumberParts="1" catenateWords="1"
>> > catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>> > 
>> > 
>> > 
>> > 
>> > > > generateWordParts="1" generateNumberParts="1" catenateWords="0"
>> > catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>> > 
>> > 
>> > 
>> >
>> > > positionIncrementGap="100"
>> > >
>> > 
>> > 
>> > > > stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
>> > catenateWords="1" catenateNumbers="1" catenateAll="1"
>> splitOnCaseChange="1"
>> > splitOnNumerics="0" preserveOriginal="1" />
>> > 
>> > 
>> > 
>> > 
>> > > > stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
>> > catenateWords="1" catenateNumbers="1" catenateAll="1"
>> splitOnCaseChange="1"
>> > splitOnNumerics="0" preserveOriginal="1" />
>> > 
>> > 
>> >  
>> >
>> > I have tried debugging, and when I use the query term 'news', I see that
>> > matches for contributions are ranked higher than series title. The parsed
>> > queries look like below:
>> > (Note that I have edited the query, as in reality I have a lot of fields
>> > that are searchable, and I have only mentioned the fields containing text
>> > data - the rest all contain uuids.)
>> >
>> > 
>> > (+DisjunctionM

Re: Possible issue in edismax?

2013-01-30 Thread Sandeep Mestry
Thanks Felipe, yes I have seen that and my requirement somewhere falls for


On 30 January 2013 15:53, Felipe Lahti  wrote:

> Hi Sandeep,
>
> Quick answer is that not only the boost that you define in your
> requestHandler is taken to calculate the score of each document. There are
> others factors that contribute to score calculation. You can take a look
> here about http://wiki.apache.org/solr/SolrRelevancyFAQ. Also, you can see
> using debugQuery=true the score calculation for each document returned.
>
> Let me know you need something else.
>
>
>
> On Wed, Jan 30, 2013 at 1:13 PM, Sandeep Mestry 
> wrote:
>
> > Hi All,
> >
> > I'm facing an issue in relevancy calculation by the dismax query parser.
> > The boost factor applied does not work as expected in certain cases when
> > the keyword is generic, and by generic I mean the keyword appears
> > many times in the document as well as in the index.
> >
> > I have parser configuration as below:
> >
> > 
> > 
> > edismax
> > explicit
> > 0.01
> > series_title^500 title^100 description^15
> > contribution
> > series_title^200
> > 0
> > *:*
> > 
> > 
> >
> > As you can see above, I'd expect the documents containing the matches for
> > series title should rank higher than the ones in contribution.
> >
> > This works well if I type in a query like 'wonderworld', which is a
> > rarely occurring term, and the series titles rank higher. But if I type
> > in a keyword like 'news', which is the most common term in the index, I
> > get hits in contributions even though I have lots of documents with the
> > word news in the series title.
> >
> > The field definition is as below:
> >
> >  > multiValued="false" />
> >  > multiValued="false" />
> >  > multiValued="false" />
> >  > multiValued="true" />
> >
> >  > compressThreshold="10">
> > 
> > 
> >  > generateWordParts="1" generateNumberParts="1" catenateWords="1"
> > catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> > 
> > 
> > 
> > 
> >  > generateWordParts="1" generateNumberParts="1" catenateWords="0"
> > catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> > 
> > 
> > 
> >
> >  positionIncrementGap="100"
> > >
> > 
> > 
> >  > stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
> > catenateWords="1" catenateNumbers="1" catenateAll="1"
> splitOnCaseChange="1"
> > splitOnNumerics="0" preserveOriginal="1" />
> > 
> > 
> > 
> > 
> >  > stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
> > catenateWords="1" catenateNumbers="1" catenateAll="1"
> splitOnCaseChange="1"
> > splitOnNumerics="0" preserveOriginal="1" />
> > 
> > 
> >  
> >
> > I have tried debugging, and when I use the query term news, I see that
> > matches for contributions are ranked higher than series title. The
> > parsed queries look like below:
> > (Note that I have edited the query, as in reality I have a lot of
> > searchable fields and I have only mentioned the fields containing text
> > data; the rest all contain UUIDs.)
> >
> > 
> > (+DisjunctionMaxQuery((description:news^15.0 | title:news^100.0 |
> > contributions:news | series_title:news^500.0)~0.01) () () () () () () ()
> ()
> > () () () () () () () () () () () () () () () () () () () ())/no_coord
> > 
> > 
> > +(description:news^15 | title:news^100.0 | contributions:news |
> > series_title:news^500.0)~0.01 () () () () () () () () () () () () () ()
> ()
> > () () () () () () () () () () () () ()
> >
> >
> > Could you guide me in the right direction, please?
> >
> > Many Thanks,
> > Sandeep
> >
>
>
>
> --
> Felipe Lahti
> Consultant Developer - ThoughtWorks Porto Alegre
>


Re: Variable expansion in DIH SimplePropertiesWriter's filename?

2013-01-30 Thread Jonas Birgander
Thank you for the reply, issue created at 
.


Regards,
Jonas Birgander


On 2013-01-30 16:26, Dyer, James wrote:

This is a bug.  Can you paste what you've said here into a new JIRA issue?

https://issues.apache.org/jira/browse/SOLR

James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: Jonas Birgander [mailto:jonas.birgan...@prisjakt.nu]
Sent: Wednesday, January 30, 2013 4:54 AM
To: solr-user@lucene.apache.org
Subject: Variable expansion in DIH SimplePropertiesWriter's filename?

Hello,

I'm testing Solr 4.1, but I've run into some problems with
DataImportHandler's new propertyWriter tag.
I'm trying to use variable expansion in the `filename` field when using
SimplePropertiesWriter.

Here are the relevant parts of my configuration:

conf/solrconfig.xml
-


  db-data-config.xml



  
  ${country_code}
  




conf/db-data-config.xml
-





  





  






If country_code is set to "gb", I want the last_index_time to be read
and written in the file conf/gb.dataimport.properties, instead of the
default conf/dataimport.properties

The variable expansion works perfectly in the SQL and setup of the data
source, but not in the property writer's filename field.

When initiating an import, the log file shows:

Jan 30, 2013 11:25:42 AM org.apache.solr.handler.dataimport.DataImporter
maybeReloadConfiguration
INFO: Loading DIH Configuration: db-data-config.xml
Jan 30, 2013 11:25:42 AM
org.apache.solr.handler.dataimport.config.ConfigParseUtil verifyWithSchema
INFO: The field :$skipDoc present in DataConfig does not have a
counterpart in Solr Schema
Jan 30, 2013 11:25:42 AM
org.apache.solr.handler.dataimport.config.ConfigParseUtil verifyWithSchema
INFO: The field :$deleteDocById present in DataConfig does not have a
counterpart in Solr Schema
Jan 30, 2013 11:25:42 AM org.apache.solr.handler.dataimport.DataImporter
loadDataConfig
INFO: Data Configuration loaded successfully
Jan 30, 2013 11:25:42 AM org.apache.solr.handler.dataimport.DataImporter
doFullImport
INFO: Starting Full Import
Jan 30, 2013 11:25:42 AM
org.apache.solr.handler.dataimport.SimplePropertiesWriter
readIndexerProperties
WARNING: Unable to read:
${dataimporter.request.country_code}.dataimport.properties


Is it supposed to work?
Anyone else having problems with this?


Any help appreciated!

Regards,




--
Jonas Birgander 
Systemutvecklare Prisjakt & Minhembio


Re: Possible issue in edismax?

2013-01-30 Thread Felipe Lahti
Hi Sandeep,

Quick answer is that the boost you define in your requestHandler is not
the only thing taken into account when calculating the score of each
document. There are other factors that contribute to the score
calculation. You can take a look here:
http://wiki.apache.org/solr/SolrRelevancyFAQ. Also, using debugQuery=true
you can see the score calculation for each document returned.

Let me know if you need something else.
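
One of those factors is the per-field length norm (together with any
index-time boost) that gets baked into fieldNorm at index time. Purely as
an illustration - the attribute values below are assumptions, not taken
from this thread - norms can be switched off for a field in schema.xml
like this (a re-index is needed afterwards):

  <field name="contributions" type="text" indexed="true" stored="true"
         multiValued="true" omitNorms="true" />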



On Wed, Jan 30, 2013 at 1:13 PM, Sandeep Mestry  wrote:

> Hi All,
>
> I'm facing an issue in the relevancy calculation by the dismax query
> parser. The boost factor applied does not work as expected in certain
> cases when the keyword is generic; by generic I mean that the keyword
> appears many times in the document as well as in the index.
>
> I have parser configuration as below:
>
> 
> 
> edismax
> explicit
> 0.01
> series_title^500 title^100 description^15
> contribution
> series_title^200
> 0
> *:*
> 
> 
>
> As you can see above, I'd expect the documents matching on series title
> to rank higher than the ones matching on contribution.
>
> This works well if I type in a query like 'wonderworld', which is a
> rarely occurring term, and the series titles rank higher. But if I type in
> a keyword like 'news', which is the most common term in the index, I get
> hits in contributions even though I have lots of documents with the word
> news in the series title.
>
> The field definition is as below:
>
>  multiValued="false" />
>  multiValued="false" />
>  multiValued="false" />
>  multiValued="true" />
>
>  compressThreshold="10">
> 
> 
>  generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> 
> 
> 
> 
>  generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> 
> 
> 
>
>  >
> 
> 
>  stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
> catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
> splitOnNumerics="0" preserveOriginal="1" />
> 
> 
> 
> 
>  stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
> catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
> splitOnNumerics="0" preserveOriginal="1" />
> 
> 
>  
>
> I have tried debugging, and when I use the query term news, I see that
> matches for contributions are ranked higher than series title. The parsed
> queries look like below:
> (Note that I have edited the query, as in reality I have a lot of
> searchable fields and I have only mentioned the fields containing text
> data; the rest all contain UUIDs.)
>
> 
> (+DisjunctionMaxQuery((description:news^15.0 | title:news^100.0 |
> contributions:news | series_title:news^500.0)~0.01) () () () () () () () ()
> () () () () () () () () () () () () () () () () () () () ())/no_coord
> 
> 
> +(description:news^15 | title:news^100.0 | contributions:news |
> series_title:news^500.0)~0.01 () () () () () () () () () () () () () () ()
> () () () () () () () () () () () () ()
>
>
> Could you guide me in the right direction, please?
>
> Many Thanks,
> Sandeep
>



-- 
Felipe Lahti
Consultant Developer - ThoughtWorks Porto Alegre


Term Frequencies for Query Result

2013-01-30 Thread Kai Gülzau
Hi,

I am looking for a way to get the top terms for a query result.

Faceting does not work since counts are measured as documents containing a term 
and not as the overall count of a term in all found documents:

http://localhost:8983/solr/master/select?q=type%3A7&rows=1&wt=json&indent=true&facet=true&facet.query=type%3A7&facet.field=albody&facet.method=fc

  "facet_counts":{
"facet_queries":{
  "type:7":156},
"facet_fields":{
  "albody":[
"der",73,
"in",68,
"betreff",63,
...


Using http://wiki.apache.org/solr/TermVectorComponent and counting all
frequencies manually seems to be the only solution for now:

http://localhost:8983/solr/tvrh/?q=type:7&tv.fl=albody&f.albody.tv.tf=true&wt=json&indent=true


"termVectors":[

"uniqueKeyFieldName","ukey",

"798_7_0",[

  "uniqueKey","798_7_0",

  "albody",[

"der",[

  "tf",5],

"die",[

  "tf",7],

...
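
(For reference, a /tvrh handler like the one queried above is typically
registered in solrconfig.xml along these lines; the component and handler
names follow the TermVectorComponent wiki example:)

  <searchComponent name="tvComponent" class="solr.TermVectorComponent"/>

  <requestHandler name="/tvrh" class="solr.SearchHandler">
    <lst name="defaults">
      <bool name="tv">true</bool>
    </lst>
    <arr name="last-components">
      <str>tvComponent</str>
    </arr>
  </requestHandler>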



Does anyone know a better and more efficient solution?


Regards,

Kai Gülzau



Re: configuring datasource for dynamic password and user

2013-01-30 Thread Michael Della Bitta
Sorry, email sent too quickly. Here's the second url:

http://wiki.apache.org/solr/SolrConfigXml?highlight=%28solrconfig%5C.xml%29#System_property_substitution

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Wed, Jan 30, 2013 at 10:42 AM, Michael Della Bitta
 wrote:
> Hi Elizabeth,
>
> I haven't tried this, but given this entry:
>
> http://wiki.apache.org/solr/DataImportHandler#Adding_datasource_in_solrconfig.xml
>
> You should be able to parameterize the arguments in solrconfig.xml
> with environment variables and then set them in solr.xml or at runtime
> using command line arguments like this:
>
>
> Michael
>
> Michael Della Bitta
>
> 
> Appinions
> 18 East 41st Street, 2nd Floor
> New York, NY 10017-6271
>
> www.appinions.com
>
> Where Influence Isn’t a Game
>
>
> On Wed, Jan 30, 2013 at 6:37 AM, Lapera-Valenzuela, Elizabeth
> [Primerica]  wrote:
>> Hi, we will be using Solr on development, test and prod platforms.  Is
>> there a way to dynamically create the datasource so that the URL,
>> password and user id are passed in, or can I point it to a properties file
>> that has this info?  Thanks
>>


Re: configuring datasource for dynamic password and user

2013-01-30 Thread Michael Della Bitta
Hi Elizabeth,

I haven't tried this, but given this entry:

http://wiki.apache.org/solr/DataImportHandler#Adding_datasource_in_solrconfig.xml

You should be able to parameterize the arguments in solrconfig.xml
with environment variables and then set them in solr.xml or at runtime
using command line arguments like this:
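
(The example itself did not survive the archive; a rough sketch of the
idea, with the db.* property names as placeholders:)

  <requestHandler name="/dataimport"
                  class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
      <lst name="datasource">
        <str name="driver">${db.driver}</str>
        <str name="url">${db.url}</str>
        <str name="user">${db.user}</str>
        <str name="password">${db.password}</str>
      </lst>
    </lst>
  </requestHandler>

  <!-- started with something like:
       java -Ddb.driver=com.mysql.jdbc.Driver -Ddb.url=jdbc:mysql://dbhost/mydb \
            -Ddb.user=solr -Ddb.password=secret -jar start.jar -->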


Michael

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Wed, Jan 30, 2013 at 6:37 AM, Lapera-Valenzuela, Elizabeth
[Primerica]  wrote:
> Hi, we will be using Solr on development, test and prod platforms.  Is
> there a way to dynamically create the datasource so that the URL,
> password and user id are passed in, or can I point it to a properties file
> that has this info?  Thanks
>


RE: A question about attaching shards to load balancers

2013-01-30 Thread Michael Ryan
From a performance point of view, I can't imagine it mattering. In our setup,
we have a dedicated Solr server that is not a shard that takes incoming
requests (we call it the "coordinator"). This server is very lightweight and
practically has no load at all.

My gut feeling is that having a separate dedicated server might be a slightly 
better approach, as it will have totally different performance characteristics 
than the shards, and so you can tune it for this.

-Michael


Re: help to build query

2013-01-30 Thread Jack Krupansky
Start by expressing the specific semantics of those queries in strict
boolean form. I mean, what exactly do you mean by "in", "location1,
location2", and "location1, loc2 and loc3"? Is the latter an AND or an OR?


Or at least fully express those two queries, unambiguously in plain English. 
There is too much ambiguity present to give you any solid direction.


-- Jack Krupansky

-Original Message- 
From: Abhishek tiwari

Sent: Wednesday, January 30, 2013 12:55 AM
To: solr-user@lucene.apache.org
Subject: help to build query

We want to execute queries like:
a)  cat in location1, location2
b)  cat1 and cat2 in location1, loc2 and loc3

in our search.

Our challenges:

1)  picking the right keywords (category and locality) from the entered query.
2)  mapping them to the relevant entity.

How should I proceed?

we have localities and categories data indexed .

thanks in advance.

~abhishek 



RE: Variable expansion in DIH SimplePropertiesWriter's filename?

2013-01-30 Thread Dyer, James
This is a bug.  Can you paste what you've said here into a new JIRA issue?  

https://issues.apache.org/jira/browse/SOLR

James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: Jonas Birgander [mailto:jonas.birgan...@prisjakt.nu] 
Sent: Wednesday, January 30, 2013 4:54 AM
To: solr-user@lucene.apache.org
Subject: Variable expansion in DIH SimplePropertiesWriter's filename?

Hello,

I'm testing Solr 4.1, but I've run into some problems with 
DataImportHandler's new propertyWriter tag.
I'm trying to use variable expansion in the `filename` field when using 
SimplePropertiesWriter.

Here are the relevant parts of my configuration:

conf/solrconfig.xml
-

   
 db-data-config.xml
   

   
 
 ${country_code}
 
   



conf/db-data-config.xml
-

   

   
   
 
   



   
 
   





If country_code is set to "gb", I want the last_index_time to be read 
and written in the file conf/gb.dataimport.properties, instead of the 
default conf/dataimport.properties

The variable expansion works perfectly in the SQL and setup of the data 
source, but not in the property writer's filename field.

When initiating an import, the log file shows:

Jan 30, 2013 11:25:42 AM org.apache.solr.handler.dataimport.DataImporter 
maybeReloadConfiguration
INFO: Loading DIH Configuration: db-data-config.xml
Jan 30, 2013 11:25:42 AM 
org.apache.solr.handler.dataimport.config.ConfigParseUtil verifyWithSchema
INFO: The field :$skipDoc present in DataConfig does not have a 
counterpart in Solr Schema
Jan 30, 2013 11:25:42 AM 
org.apache.solr.handler.dataimport.config.ConfigParseUtil verifyWithSchema
INFO: The field :$deleteDocById present in DataConfig does not have a 
counterpart in Solr Schema
Jan 30, 2013 11:25:42 AM org.apache.solr.handler.dataimport.DataImporter 
loadDataConfig
INFO: Data Configuration loaded successfully
Jan 30, 2013 11:25:42 AM org.apache.solr.handler.dataimport.DataImporter 
doFullImport
INFO: Starting Full Import
Jan 30, 2013 11:25:42 AM 
org.apache.solr.handler.dataimport.SimplePropertiesWriter 
readIndexerProperties
WARNING: Unable to read: 
${dataimporter.request.country_code}.dataimport.properties


Is it supposed to work?
Anyone else having problems with this?


Any help appreciated!

Regards,
-- 
Jonas Birgander 
Systemutvecklare Prisjakt & Minhembio




Re: A question about attaching shards to load balancers

2013-01-30 Thread Shawn Heisey

On 1/30/2013 6:45 AM, Lee, Peter wrote:

Upayavira,

Thank you for your response. I'm sorry my post is perhaps not clear...I am 
relatively new to solr and I'm not sure I'm using the correct nomenclature.

We did encounter the issue of one shard in the stripe going down and all other 
shards continue to receive requests...and return errors because of the missing 
shard. We did in fact correct this problem by making our healthcheck smart 
enough to test all of the other servers in the stripe. That works very well and 
was not hard at all to implement.

My intended question was one entirely about performance.  Perhaps if I am more 
specific it will help.

We have 6 servers per "stripe" (which means, a search request going to any one 
of these servers also generates traffic on the other 5 servers in the stripe to fulfill 
the request) and multiple stripes (for load and for redundancy). For this discussion 
though, let's assume we have only ONE stripe.

We currently have a load balancer that points to all 6 of the servers in our stripe. That 
is, requests from "outside" can be directed to any server in the stripe.

The question is: Has anyone performed empirical testing to see if perhaps 
having 2 or 3 servers (instead of all 6) on the load balancer improves 
performance?

In this configuration, sure, not all servers can field requests from the "outside." 
However, the total amount of "conversation" going on between the different servers will 
also be lower, as distributed searches can now only originate from 2 or 3 servers in the stripe 
(however many we attached to the load balancer).

We can perform this testing, but it will take time, so I thought I'd ask if anyone has 
done this already. I was hoping to find a mention of a "best practice" 
somewhere regarding this type of question, but I have not found one yet.


I have a multi-server distributed Solr 3.5 installation behind a load 
balancer (haproxy).  The application and the load balancer are 
completely unaware of the shards parameter- that's handled in Solr. 
Here's how I've made that work:


The core with the shards parameter (we refer to it as a broker core) 
exists on all servers.  There are two servers for chain A and two 
servers for chain B.  Three of the seven shards live on idxa1/idxb1 and 
four of the shards live on idxa2/idxb2.  The "shards" parameter on both 
chain A servers point only to chain A shards.  The same goes for chain B.


The ping handler's health check query contains shards and shards.qt 
parameters, so the health check will fail if any of the shards for that 
chain are down.
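
(In solrconfig.xml terms, such a health check might look roughly like the
sketch below; the handler class follows the stock 3.x example config, and
the hostnames, ports and core names are placeholders:)

  <requestHandler name="/admin/ping" class="PingRequestHandler">
    <lst name="invariants">
      <str name="q">solrpingquery</str>
      <str name="shards.qt">/search</str>
      <str name="shards">idxa1:8983/solr/s1,idxa1:8983/solr/s2,idxa2:8983/solr/s3</str>
    </lst>
  </requestHandler>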


The load balancer has idxa1 and idxb1 as primary equal cost entries.  It 
has idxa2 and idxb2 as backup entries, with idxa2 having the higher 
weight.  In normal operation, queries only go to idxa1 and idxb1.


If any shard failure happens on either chain A server, both the idxa1 
and idxa2 entries will be marked down by the health check and queries 
will only go to chain B.


I can also disable these servers from the load balancer's perspective 
using the admin UI.  If idxb1 is disabled, all queries will go to idxa1 
(which utilizes both idxa1 and idxa2).  In that situation, if any chain 
A failure were to happen but the chain B shards were all still fine, 
idxb2 would still be marked up and the load balancer would send the 
queries there.


The two index chains are independently updated - no replication.  This 
allows me to disable either idxa1 or idxb1 and completely rebuild (or 
upgrade) the disabled chain while the other chain remains online.  I can 
then switch and do the same thing to the other chain, and the 
application using Solr has no idea anything has happened.


Thanks,
Shawn



Possible issue in edismax?

2013-01-30 Thread Sandeep Mestry
Hi All,

I'm facing an issue in the relevancy calculation by the dismax query parser.
The boost factor applied does not work as expected in certain cases when
the keyword is generic; by generic I mean that the keyword appears many
times in the document as well as in the index.

I have parser configuration as below:



edismax
explicit
0.01
series_title^500 title^100 description^15
contribution
series_title^200
0
*:*



As you can see above, I'd expect the documents matching on series title
to rank higher than the ones matching on contribution.

This works well if I type in a query like 'wonderworld', which is a rarely
occurring term, and the series titles rank higher. But if I type in a
keyword like 'news', which is the most common term in the index, I get hits
in contributions even though I have lots of documents with the word news in
the series title.

The field definition is as below:

[The field and fieldType definitions were garbled by the archive: the XML
tags were stripped. Going by the fragments preserved in the quoted copies
earlier in this digest, the snippet declared the four searchable fields
from qf (presumably series_title, title, description and a multiValued
contributions field) plus two text fieldTypes whose index and query
analyzers use a WordDelimiterFilter with generateWordParts and
generateNumberParts, various catenate settings, splitOnCaseChange=1, and
in one type stemEnglishPossessive=0, splitOnNumerics=0 and
preserveOriginal=1.]

I have tried debugging, and when I use the query term news, I see that
matches for contributions are ranked higher than series title. The parsed
queries look like below:
(Note that I have edited the query, as in reality I have a lot of
searchable fields and I have only mentioned the fields containing text
data; the rest all contain UUIDs.)


(+DisjunctionMaxQuery((description:news^15.0 | title:news^100.0 |
contributions:news | series_title:news^500.0)~0.01) () () () () () () () ()
() () () () () () () () () () () () () () () () () () () ())/no_coord


+(description:news^15 | title:news^100.0 | contributions:news |
series_title:news^500.0)~0.01 () () () () () () () () () () () () () () ()
() () () () () () () () () () () () ()


Could you guide me in the right direction, please?

Many Thanks,
Sandeep


Re: CopyField issue on Solr4.1

2013-01-30 Thread Shawn Heisey

On 1/30/2013 7:28 AM, Upayavira wrote:

Stored fields are now compressed in 4.1. There are other efficiencies in
4.0 too that will also result in smaller indexes, but the compressed
stored fields are the most significant.


The compressed stored fields explains your smaller index.  As to why you 
get different results, are you doing the default relevancy ranking, or 
are you sorting by one of your fields?  If you are doing the default 
relevancy ranking, you may be getting different results because of 
scoring bugs that have been fixed since 3.6.1.  Try sorting your results 
by a field - add "&sort=fieldname asc" or "&sort=fieldname desc" to the 
URL and see if the results are what you expect.


If you are already sorting, that's another situation.  If you search for 
all documents on both indexes, is the numFound value about where it 
should be?


Thanks,
Shawn



Re: CopyField issue on Solr4.1

2013-01-30 Thread Jack Krupansky
There are probably any number of changes between 3.x and 4.x to account for 
query differences. This includes bug fixes and in some cases new bugs, in 
areas such as the query parsers and various filters. The first step is to 
isolate a couple of examples of both false positive queries and false 
negative queries. Then look at the field types involved. Then use the Solr 
Admin Analysis UI to see how an index or query term analyzes differently. 
Post the details here and we can figure out what change is causing your 
query discrepancies.


-- Jack Krupansky

-Original Message- 
From: anarchos78

Sent: Wednesday, January 30, 2013 8:59 AM
To: solr-user@lucene.apache.org
Subject: CopyField issue on Solr4.1

Hello,

I am using Solr 3.6.1 and I am very satisfied. Now I want to move to
Solr 4.1. So I took “schema.xml” and “solrconfig.xml” (with minor changes)
and placed them under my new Solr 4.1 configuration. The indexing was
successful (DIH). But I have noticed an issue. In “schema.xml” I have
“copyField” directives in order to index the same fields using different
“types”. When I try to index using the same configuration on Solr 4.1, the
index size is half of the index size on Solr 3.6.1 (and when I query I get
different results). Has anything changed in Solr 4.1? I need a little help
on this.

*The schema.xml:*

[This listing was garbled by the archive: the XML tags were stripped,
leaving only attribute values and element text. What survives here is
actually solrconfig.xml material: ${solr.abortOnConfigurationError:true},
luceneMatchVersion LUCENE_41, ${solr.data.dir:}, caches sized around 2048
entries with autowarming, firstSearcher/newSearcher warming queries (Greek
terms such as "χρησικτησια νομη" and "νομη" with apofasi_taxonomy filters,
sorts on apofasi_date, ida and apofasi_tmima, and rows 150), a
DataImportHandler reading data-config.xml, an edismax handler with the
boost lists "content contentS^10" and "content^10 contentS^100"
(presumably qf and pf), a field list of solr_id, ida, type, model,
keywordlist, title, apofasi_taxonomy, apofasi_tmima, apofasi_date and
grid_title, highlighting on content and title with fragment size 800, and
spellcheck (textSpell), terms, elevator (elevate.xml) and regex-fragmenter
highlighting components.]

*The solrconfig.xml*

[This listing was garbled by the archive: almost all of the XML was
stripped. The only clearly surviving fragments are solr_id and content
(apparently the uniqueKey and default search field), which are schema.xml
elements.]

Regards,
Tom







Re: A question about attaching shards to load balancers

2013-01-30 Thread Upayavira
I haven't got anything to back this up, but I'd say there's no issue
pointing your load balancer to all your nodes. When you do a distributed
query, the work required of the distributed part is relatively small -
it pushes the request to all the shard nodes, then does the job of
merging the results. This does not require large caches or any such, so
I do not see that you're going to have resource advantages to limiting
them to specific nodes.

Upayavira

On Wed, Jan 30, 2013, at 01:45 PM, Lee, Peter wrote:
> Upayavira,
> 
> Thank you for your response. I'm sorry my post is perhaps not clear...I
> am relatively new to solr and I'm not sure I'm using the correct
> nomenclature.
> 
> We did encounter the issue of one shard in the stripe going down and all
> other shards continue to receive requests...and return errors because of
> the missing shard. We did in fact correct this problem by making our
> healthcheck smart enough to test all of the other servers in the stripe.
> That works very well and was not hard at all to implement.
> 
> My intended question was one entirely about performance.  Perhaps if I am
> more specific it will help.
> 
> We have 6 servers per "stripe" (which means, a search request going to
> any one of these servers also generates traffic on the other 5 servers in
> the stripe to fulfill the request) and multiple stripes (for load and for
> redundancy). For this discussion though, let's assume we have only ONE
> stripe.
> 
> We currently have a load balancer that points to all 6 of the servers in
> our stripe. That is, requests from "outside" can be directed to any
> server in the stripe.
> 
> The question is: Has anyone performed empirical testing to see if perhaps
> having 2 or 3 servers (instead of all 6) on the load balancer improves
> performance? 
> 
> In this configuration, sure, not all servers can field requests from the
> "outside." However, the total amount of "conversation" going on between
> the different servers will also be lower, as distributed searches can now
> only originate from 2 or 3 servers in the stripe (however many we
> attached to the load balancer).
> 
> We can perform this testing, but it will take time, so I thought I'd ask
> if anyone has done this already. I was hoping to find a mention of a
> "best practice" somewhere regarding this type of question, but I have not
> found one yet.
> 
> Thanks.
> 
> Peter S. Lee
> 
> -Original Message-
> From: Upayavira [mailto:u...@odoko.co.uk] 
> Sent: Wednesday, January 30, 2013 5:24 AM
> To: solr-user@lucene.apache.org
> Subject: Re: A question about attaching shards to load balancers
> 
I'm afraid I'm not completely clear about your scenario. Let me say how
I understand what you're saying, and what I've done in the past.

Firstly, I take it you are using Solr 3.x (from your reference to a
'shards' parameter).
> 
> Secondly, you refer to a 'stripe' as one set of nodes, one for each
> shard, that are enough to allow querying your whole collection.
> 
> Having created the concept of a 'slice', you then hardwire the 'shards'
> parameter in solrconfig.xml in each machine in that slice to point to all
> the other nodes in that same slice.
> 
> Then you point your load balancer at some boxes, which will do
> distributed queries. Now, by the sounds of it, every box on your setup
> could do this, they all have a shards parameter set up. Minimally, you'll
> want at least one box from each slice, otherwise you'll have slices that
> aren't receiving queries. But could you include all of your boxes, and
> have all of them handling the query distribution work? I guess you could,
> but I'd suggest another architecture.
> 
In the setup you describe, if you lose one node, you lose an entire
> slice. However, if a distributed query comes into another node in the
> slice, the load balancer may well not notice (unless you make the
> healthcheck itself do a distributed search) and things could get messy.
> 
> What I've done is set up a VIP in my load balancer for each and every
> node that can service a shard. Repeat that for each shard that I have.
> Let's say I have four shards, I'll end up with four VIPs. I then put
> those four VIPs into my shards parameter in solrconfig.xml on all of my
> hosts, regardless of what shard/slice.
> 
> Then, I create another VIP that includes all of my nodes in it. That is
> the one that I hand to my application. 
> 
This way, you can lose any node in any shard and the thing should keep
> on going. 
> 
> Obviously I'm talking about slaves here. There will be a master for each
> shard which each of these nodes pull their indexes from.
> 
> Hope this is helpful.
> 
> Upayavira
> 
> On Tue, Jan 29, 2013, at 09:35 PM, Lee, Peter wrote:
> > I would appreciate people's experience on the following load balancing 
> > question...
> > 
> > We currently have solr configured in shards across multiple machines 
> > to handle our load. That is, a request being sent to any one of these 
> >

Re: CopyField issue on Solr4.1

2013-01-30 Thread Upayavira
Stored fields are now compressed in 4.1. There are other efficiencies in
4.0 too that will also result in smaller indexes, but the compressed
stored fields are the most significant.

Upayavira

On Wed, Jan 30, 2013, at 01:59 PM, anarchos78 wrote:
> Hello,
> 
> I am using Solr 3.6.1 and I am very satisfied. Now I want to move to
> Solr 4.1. So I took “schema.xml” and “solrconfig.xml” (with minor changes)
> and placed them under my new Solr 4.1 configuration. The indexing was
> successful (DIH). But I have noticed an issue. In “schema.xml” I have
> “copyField” directives in order to index the same fields using different
> “types”. When I try to index using the same configuration on Solr 4.1, the
> index size is half of the index size on Solr 3.6.1 (and when I query I
> get different results). Has anything changed in Solr 4.1? I need a little
> help
> on this.
> 
> *The schema.xml:*
> 
> 
> 
>   
>  
> ${solr.abortOnConfigurationError:true}
>   
>  
>   LUCENE_41   
>  
>   ${solr.data.dir:}
>   
>   
> class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
>  
>   
> 
>   
>   
>   
>   
>  
>   
>   
>   
>   
>
> 2048
> 
> 
>size="2048"
>   initialSize="1024"
>   autowarmCount="512"
>   cleanupThread="true" />
> 
>size="2048"
>   initialSize="1024"
>   autowarmCount="512"
>   cleanupThread="true" />
>   
>  size="2048"
>   initialSize="2048"
> autowarmCount="512" />
> 
>size="2048"
>   initialSize="512"
>   autowarmCount="512"
>   cleanupThread="true" /> 
> 
> 
> true
> 
> 150
> 
> 200   
>
> 
>   
> 
>   χρησικτησια νομη
>   apofasi_taxonomy:ΠΟΛΙΤΙΚΕΣ
>   apofasi_date asc,ida desc,apofasi_tmima
> desc
>   0
>   150
> 
> 
>   νομη
>   apofasi_taxonomy:ΠΟΛΙΤΙΚΕΣ
>   apofasi_date asc,ida desc,apofasi_tmima
> desc
>   0
>   150
> 
> 
>   χρησικτησια νομη
>   apofasi_taxonomy:ΠΟΙΝΙΚΕΣ
>   apofasi_date asc,ida desc,apofasi_tmima
> desc
>   0
>   150
> 
>   
> 
>   
>   
>   
> 
>   χρησικτησια νομη
>   apofasi_taxonomy:ΠΟΛΙΤΙΚΕΣ
>   apofasi_date asc,ida desc,apofasi_tmima
> desc
>   0
>   150
> 
> 
>   νομη
>   apofasi_taxonomy:ΠΟΛΙΤΙΚΕΣ
>   apofasi_date asc,ida desc,apofasi_tmima
> desc
>   0
>   150
> 
> 
>   χρησικτησια νομη
>   apofasi_taxonomy:ΠΟΙΝΙΚΕΣ
>   apofasi_date asc,ida desc,apofasi_tmima
> desc
>   0
>   150
> 
>   
>
>
>false
>
>2
> 
>   
>   
>
>  multipartUploadLimitInKB="2048000" /> 
> 
>   
>   
>class="org.apache.solr.handler.dataimport.DataImportHandler">
>   
>   data-config.xml
>   
>   
>   
>   
>  
>  edismax
>  content contentS^10
>content^10 contentS^100
>  100
>explicit
>150
>  score desc
>  edismax
>  content contentS^10
>content^10 contentS^100
>  100
>json
>true  
> name="fl">solr_id,ida,type,model,keywordlist,title,apofasi_taxonomy,apofasi_tmima,apofasi_date,grid_title
>content,title
>content
>800
>800  
> 
>   
>   
>  class="solr.XmlUpdateRequestHandler">
>   
>   
>  class="solr.BinaryUpdateRequestHandler" />
> 
>  class="solr.CSVRequestHandler" 
>   startup="lazy" />
> 
>  class="solr.JsonUpdateRequestHandler" 
>   startup="lazy" />
> 
>  startup="lazy"
>   class="solr.extraction.ExtractingRequestHandler" >
> 
>   text
>   true
>   ignored_
>   last_modified
>   true
>   links
>   ignored_
> 
>   
>   
>   startup="lazy"
>class="solr.XsltUpdateRequestHandler"/>
>   

Re: Solr 4.1 UI fail to display result

2013-01-30 Thread Stefan Matheis
On Wednesday, January 30, 2013 at 2:13 PM, J Mohamed Zahoor wrote:
> I am using Safari 6.0.2 and i see a "SyntaxError: JSON Parse error: 
> Unrecognized token '<'".


I'm not sure why, but this sounds like the JSON parser was called with an
HTML or XML string. After you hit the "Execute" button on the website, at
the top of the right content area there is a link - which is what the UI
will request. If you open that in another browser tab or with curl/wget,
what is the response you get? Is it really JSON? Or perhaps some kind of
error message?


Re: Solr 4.1 UI fail to display result

2013-01-30 Thread J Mohamed Zahoor

Hi Alex,

Cleared cache - problem persists.
Disabled cache - problem persists.

This was in Safari though.

./zahoor


On 30-Jan-2013, at 6:55 PM, Alexandre Rafalovitch  wrote:

> Before worrying about anything else, try doing a full cache clean. My
> (Chrome) browser was caching Solr 4.0 resources for an unreasonably long
> period of time until I completely disabled its cache (in dev tools) and
> tried a full reload.
> 
> Or try a browser you did not use before.
> 
> Regards,
>   Alex.
> 
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
> 
> 
> On Wed, Jan 30, 2013 at 8:17 AM, J Mohamed Zahoor  wrote:
> 
>> The stack is
>> 
>> format_json  -- app.js  (465)
>> json -- query.js (59)
>> complete - query.js (77)
>> fire -- require.js (3099)
>> fireWith -- require.js (3217)
>> done -- require.js (9469)
>> callback -- require.js (10235)
>> 
>> ./zahoor
>> 
>> 
>> On 30-Jan-2013, at 6:43 PM, J Mohamed Zahoor  wrote:
>> 
>>> Hi
>>> 
>>> I am using the 4.1 release and I see a problem when I set the response
>>> type to JSON in the UI.
>>>
>>> I am using Safari 6.0.2 and I see a "SyntaxError: JSON Parse error:
>>> Unrecognized token '<'".
>>>
>>> app.js line 465. When I debug more, I see the response is still coming
>>> in XML format.
>>> 
>>> 
>>> Is anyone else facing this problem?
>>> 
>>> ./Zahoor
>> 
>> 



RE: A question about attaching shards to load balancers

2013-01-30 Thread Lee, Peter
Upayavira,

Thank you for your response. I'm sorry my post is perhaps not clear...I am 
relatively new to solr and I'm not sure I'm using the correct nomenclature.

We did encounter the issue of one shard in the stripe going down and all other 
shards continue to receive requests...and return errors because of the missing 
shard. We did in fact correct this problem by making our healthcheck smart 
enough to test all of the other servers in the stripe. That works very well and 
was not hard at all to implement.

My intended question was one entirely about performance.  Perhaps if I am more 
specific it will help.

We have 6 servers per "stripe" (which means, a search request going to any one 
of these servers also generates traffic on the other 5 servers in the stripe to 
fulfill the request) and multiple stripes (for load and for redundancy). For 
this discussion though, let's assume we have only ONE stripe.

We currently have a load balancer that points to all 6 of the servers in our 
stripe. That is, requests from "outside" can be directed to any server in the 
stripe.

The question is: Has anyone performed empirical testing to see if perhaps 
having 2 or 3 servers (instead of all 6) on the load balancer improves 
performance? 

In this configuration, sure, not all servers can field requests from the 
"outside." However, the total amount of "conversation" going on between the 
different servers will also be lower, as distributed searches can now only 
originate from 2 or 3 servers in the stripe (however many we attached to the 
load balancer).

We can perform this testing, but it will take time, so I thought I'd ask if 
anyone has done this already. I was hoping to find a mention of a "best 
practice" somewhere regarding this type of question, but I have not found one 
yet.

Thanks.

Peter S. Lee

-Original Message-
From: Upayavira [mailto:u...@odoko.co.uk] 
Sent: Wednesday, January 30, 2013 5:24 AM
To: solr-user@lucene.apache.org
Subject: Re: A question about attaching shards to load balancers

I'm afraid I'm not completely clear about your scenario. Let me say how I 
understand what you're saying, and what I've done in the past.

Firstly, I take it you are using Solr 3.x (from your reference to a 'shards' 
parameter).

Secondly, you refer to a 'stripe' as one set of nodes, one for each shard, that 
are enough to allow querying your whole collection.

Having created the concept of a 'slice', you then hardwire the 'shards'
parameter in solrconfig.xml in each machine in that slice to point to all the 
other nodes in that same slice.

Then you point your load balancer at some boxes, which will do distributed 
queries. Now, by the sounds of it, every box on your setup could do this, they 
all have a shards parameter set up. Minimally, you'll want at least one box 
from each slice, otherwise you'll have slices that aren't receiving queries. 
But could you include all of your boxes, and have all of them handling the 
query distribution work? I guess you could, but I'd suggest another 
architecture.

In the setup you describe, if you lose one node, you lose an entire slice. 
However, if a distributed query comes into another node in the slice, the load 
balancer may well not notice (unless you make the healthcheck itself do a 
distributed search) and things could get messy.

What I've done is set up a VIP in my load balancer for each and every node that 
can service a shard. Repeat that for each shard that I have.
Let's say I have four shards, I'll end up with four VIPs. I then put those four 
VIPs into my shards parameter in solrconfig.xml on all of my hosts, regardless 
of what shard/slice.

Then, I create another VIP that includes all of my nodes in it. That is the one 
that I hand to my application. 

This way, you can lose any node in any shard and the thing should keep on 
going. 

Obviously I'm talking about slaves here. There will be a master for each shard 
which each of these nodes pull their indexes from.

Hope this is helpful.

Upayavira

On Tue, Jan 29, 2013, at 09:35 PM, Lee, Peter wrote:
> I would appreciate people's experience on the following load balancing 
> question...
> 
> We currently have solr configured in shards across multiple machines 
> to handle our load. That is, a request being sent to any one of these 
> servers will cause that server to query the rest of the servers in 
> that "stripe" (we use the term "stripe" to refer to a set of servers 
> that point to each other with the shard parameter).
> 
> We currently have all servers in a stripe registered with our load 
> balancer. Thus, requests are being spread out across all servers in 
> the stripe...but of course requests to any shard generates additional 
> traffic on all shards in that stripe.
> 
> My question (finally) is this: Has anyone determined if it is better 
> to place only a few (that is, not all) of the shards in a stripe on 
> the load balancer as versus ALL of the shards in a stripe on the load 
> bala

Re: Solr 4.1 UI fail to display result

2013-01-30 Thread Alexandre Rafalovitch
Before worrying about anything else, try doing a full cache clean. My
(Chrome) browser was caching Solr 4.0 resources for an unreasonably long
period of time until I completely disabled its cache (in dev tools) and
tried a full reload.

Or try a browser you did not use before.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Wed, Jan 30, 2013 at 8:17 AM, J Mohamed Zahoor  wrote:

> The stack is
>
> format_json  -- app.js  (465)
> json -- query.js (59)
> complete - query.js (77)
> fire -- require.js (3099)
> fireWith -- require.js (3217)
> done -- require.js (9469)
> callback -- require.js (10235)
>
> ./zahoor
>
>
> On 30-Jan-2013, at 6:43 PM, J Mohamed Zahoor  wrote:
>
> > Hi
> >
> > I am using the 4.1 release and I see a problem when I set the response
> > type to JSON in the UI.
> >
> > I am using Safari 6.0.2 and I see a "SyntaxError: JSON Parse error:
> > Unrecognized token '<'".
> >
> > app.js line 465. When I debug more, I see the response is still coming
> > in XML format.
> >
> >
> > Is anyone else facing this problem?
> >
> > ./Zahoor
>
>


Re: Solr 4.1 UI fail to display result

2013-01-30 Thread J Mohamed Zahoor
The stack is

format_json  -- app.js  (465)
json -- query.js (59)
complete - query.js (77)
fire -- require.js (3099)
fireWith -- require.js (3217)
done -- require.js (9469)
callback -- require.js (10235)

./zahoor


On 30-Jan-2013, at 6:43 PM, J Mohamed Zahoor  wrote:

> Hi 
> 
> I am using the 4.1 release and I see a problem when I set the response type
> to JSON in the UI.
> 
> I am using Safari 6.0.2 and I see a "SyntaxError: JSON Parse error: 
> Unrecognized token '<'".
> 
> app.js line 465. When I debug more, I see the response is still coming in 
> XML format.
> 
> 
> Is anyone else facing this problem?
> 
> ./Zahoor



Solr 4.1 UI fail to display result

2013-01-30 Thread J Mohamed Zahoor
Hi 

I am using the 4.1 release and I see a problem when I set the response type to
JSON in the UI.

I am using Safari 6.0.2 and I see a "SyntaxError: JSON Parse error: 
Unrecognized token '<'".

app.js line 465. When I debug more, I see the response is still coming in XML 
format.


Is anyone else facing this problem?

./Zahoor

Fwd: advice about develop AbstractSolrEventListener.

2013-01-30 Thread Miguel


Hi

I have to develop a function that communicates with a webservice, and
this function must execute after each commit.
My doubt:
is it possible to get the records that have been updated in the Solr index?
My function must send information about added, updated and deleted records
in the Solr index to an external webservice, and this information must be
sent after the commit event.

I have read the Apache Solr wiki and it seems the best way is to create a
listener with event=postCommit, but I have looked at the
"solr.RunExecutableListener" example and I don't see how to know the
records associated with the commit event.

Example Solrconfig.xml:
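
(The listener XML was stripped by the archive; a postCommit registration
generally looks like the sketch below, with the listener class name as a
placeholder for the actual implementation:)

  <updateHandler class="solr.DirectUpdateHandler2">
    <listener event="postCommit" class="com.example.MyCommitNotifier">
      <!-- init args for the listener, if any -->
    </listener>
  </updateHandler>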


 


Thanks.





configuring datasource for dynamic password and user

2013-01-30 Thread Lapera-Valenzuela, Elizabeth [Primerica]
Hi, we will be using Solr on development, test and prod platforms.  Is
there a way to dynamically create the datasource so that the URL,
password and user id are passed in, or can I point it to a properties file
that has this info?  Thanks



SolrCloud: admin security vs. replication

2013-01-30 Thread AlexeyK
Hi,
There are a lot of posts which talk about hardening the /admin handler with
user credentials etc.
>From the other hand, replication handler wouldn't work if /admin/cores is
also hardened.
Considering this fact, how could I allow secure external access to the admin
interface AND allow proper cluster work?
Not setting any security on admin/cores is not an option.

Thanks,
Alexey 





Variable expansion in DIH SimplePropertiesWriter's filename?

2013-01-30 Thread Jonas Birgander

Hello,

I'm testing Solr 4.1, but I've run into some problems with 
DataImportHandler's new propertyWriter tag.
I'm trying to use variable expansion in the `filename` field when using 
SimplePropertiesWriter.


Here are the relevant parts of my configuration:

conf/solrconfig.xml
-
class="org.apache.solr.handler.dataimport.DataImportHandler">

  
db-data-config.xml
  

  

${country_code}

  



conf/db-data-config.xml
-

  

  
  

  



  

  





If country_code is set to "gb", I want the last_index_time to be read 
and written in the file conf/gb.dataimport.properties, instead of the 
default conf/dataimport.properties
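
(The propertyWriter XML above was stripped by the archive; the filename
value is taken from the warning below, while the directory attribute and
exact placement inside dataConfig are guesses:)

  <dataConfig>
    <propertyWriter type="SimplePropertiesWriter" directory="conf"
        filename="${dataimporter.request.country_code}.dataimport.properties" />
    <!-- dataSource, document and entity definitions follow -->
  </dataConfig>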


The variable expansion works perfectly in the SQL and setup of the data 
source, but not in the property writer's filename field.


When initiating an import, the log file shows:

Jan 30, 2013 11:25:42 AM org.apache.solr.handler.dataimport.DataImporter 
maybeReloadConfiguration

INFO: Loading DIH Configuration: db-data-config.xml
Jan 30, 2013 11:25:42 AM 
org.apache.solr.handler.dataimport.config.ConfigParseUtil verifyWithSchema
INFO: The field :$skipDoc present in DataConfig does not have a 
counterpart in Solr Schema
Jan 30, 2013 11:25:42 AM 
org.apache.solr.handler.dataimport.config.ConfigParseUtil verifyWithSchema
INFO: The field :$deleteDocById present in DataConfig does not have a 
counterpart in Solr Schema
Jan 30, 2013 11:25:42 AM org.apache.solr.handler.dataimport.DataImporter 
loadDataConfig

INFO: Data Configuration loaded successfully
Jan 30, 2013 11:25:42 AM org.apache.solr.handler.dataimport.DataImporter 
doFullImport

INFO: Starting Full Import
Jan 30, 2013 11:25:42 AM 
org.apache.solr.handler.dataimport.SimplePropertiesWriter 
readIndexerProperties
WARNING: Unable to read: 
${dataimporter.request.country_code}.dataimport.properties



Is it supposed to work?
Anyone else having problems with this?


Any help appreciated!

Regards,
--
Jonas Birgander 
Systemutvecklare Prisjakt & Minhembio


Can I start solr with replication activated but disabled between master and slave

2013-01-30 Thread Jamel ESSOUSSI
Hello,

I would like to start Solr with the following configuration:

Replication between master and slave activated but not enabled.
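
A sketch of how this is usually wired up with property substitution (the
enable.master/enable.slave property names and the URLs here are
illustrative):

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="enable">${enable.master:false}</str>
      <str name="replicateAfter">commit</str>
    </lst>
    <lst name="slave">
      <str name="enable">${enable.slave:false}</str>
      <str name="masterUrl">http://master-host:8983/solr/replication</str>
      <str name="pollInterval">00:00:60</str>
    </lst>
  </requestHandler>

Starting with -Denable.master=true (or false) then activates or
deactivates each role without editing the config.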

Regards





[OT] San Fran. Lucene/Solr Hack Night

2013-01-30 Thread Grant Ingersoll
If you are in the San. Fran. area next Wednesday, Feb. 06, LucidWorks and I 
will be hosting a Lucene/Solr hack night.  To reserve a spot or learn more, see 
http://www.meetup.com/SFBay-Lucene-Solr-Meetup/

Bring your laptop, your code, etc. and we'll hack on Lucene/Solr for a few 
hours.

Cheers,
Grant


Grant Ingersoll
http://www.lucidworks.com






Re: Solrcloud 4.1 Cluster state NullPointerException error.

2013-01-30 Thread Luis Cappa Banda
I've noticed when checking the Cloud admin UI that sometimes one of the
nodes appears with no Cloud information (even after reloading
clusterstate.json). However, a while later the whole Cloud status
information appears again. It looks like it disconnects and re-connects
itself.

Quite strange, guys...

2013/1/30 Luis Cappa Banda 

> Hello, guys.
>
> After upgrading from Solr 4.0 to Solr 4.1, the following error has
> frequently appeared in my logs.
>
> *INFO: A cluster state change: WatchedEvent state:SyncConnected
> type:NodeDataChanged path:/clusterstate.json, has occurred - updating...
> (live nodes size: 8)*
> *2013-01-30 10:14:21,631 2432605 [localhost-startStop-1-EventThread]
> ERROR org.apache.zookeeper.ClientCnxn  - Error while calling watcher *
> *java.lang.NullPointerException*
> *at
> org.apache.solr.common.cloud.ZkStateReader$2.process(ZkStateReader.java:201)
> *
> *at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:526)
> *
> *at
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)*
>
>
> I've been using SolrJ CloudSolrServer with a 2-shard configuration and only
> one Zookeeper instance. I'll move to a Zookeeper ensemble (3 Zookeepers)
> soon.
>
> Any idea?
>
> Thanks,
>
> - Luis Cappa.
>
>


Re: overlap function query

2013-01-30 Thread Mikhail Khludnev
I'm not really getting your point. If you are saying that tf/idf overwhelms
coord, I think it's possible to eliminate them with a custom similarity.


On Wed, Jan 30, 2013 at 2:06 PM, Daniel Rosher  wrote:

> Hi Mikhail,
>
> Thanks for the reply.
>
> I think coord works at the document level; I was thinking of having
> something that worked at a field level, against a 'principal/primary'
> field.
>
> I'm using edismax with tie=1 (a.k.a. Disjunction Sum) and several fields,
> but docs with greater query overlap on the primary field should score
> higher if you see what I mean.
>
> Cheers,
> Dan
>
> On Tue, Jan 29, 2013 at 7:14 PM, Mikhail Khludnev <
> mkhlud...@griddynamics.com> wrote:
>
> > Daniel,
> >
> > You can start from here
> >
> >
> http://lucene.apache.org/core/4_0_0-BETA/core/org/apache/lucene/search/similarities/Similarity.html#coord%28int,%20int%29
> > but it requires deep understanding of Lucene internals
> >
> >
> >
> > On Tue, Jan 29, 2013 at 2:12 PM, Daniel Rosher 
> wrote:
> >
> > > Hi,
> > >
> > > I'm wondering if there exists or if someone has implemented something
> > like
> > > the following as a function query:
> > >
> > > overlap(query,field) = number of matching terms in field/number of
> terms
> > in
> > > field
> > >
> > > e.g. with three docs having these tokens(e.g.A B C) in a field
> > > D
> > > 1:A B B
> > > 2:A B
> > > 3:A
> > >
> > > The overlap would be for these queries (-- highlights possibly highest
> > > scoring doc):
> > >
> > > Q:A
> > > 1:1/3
> > > 2:1/2
> > > 3:1/1 --
> > >
> > > Q:A B
> > > 1:2/3
> > > 2:2/2 --
> > > 3:1/1
> > >
> > > Q:A B C
> > > 1:2/3
> > > 2:2/2 --
> > > 3:1/1
> > >
> > > The objective is to pick the most likely doc using the overlap to
> > > boost the score.
> > >
> > > Cheers,
> > > Dan
> > >
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > Principal Engineer,
> > Grid Dynamics
> >
> > 
> >  
> >
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


Re: A question about attaching shards to load balancers

2013-01-30 Thread Upayavira
I'm afraid I'm not completely clear about your scenario. Let me say how
I understand what you're saying, and what I've done in the past.

Firstly, I take it you are using Solr 3.x (from your reference to a
'shards' parameter).

Secondly, you refer to a 'stripe' as one set of nodes, one for each
shard, that are enough to allow querying your whole collection.

Having created the concept of a 'slice', you then hardwire the 'shards'
parameter in solrconfig.xml in each machine in that slice to point to
all the other nodes in that same slice.

Then you point your load balancer at some boxes, which will do
distributed queries. Now, by the sounds of it, every box on your setup
could do this, they all have a shards parameter set up. Minimally,
you'll want at least one box from each slice, otherwise you'll have
slices that aren't receiving queries. But could you include all of your
boxes, and have all of them handling the query distribution work? I
guess you could, but I'd suggest another architecture.

In the setup you describe, if you lose one node, you lose an entire
slice. However, if a distributed query comes into another node in the
slice, the load balancer may well not notice (unless you make the
healthcheck itself do a distributed search) and things could get messy.

What I've done is set up a VIP in my load balancer for each and every
node that can service a shard. Repeat that for each shard that I have.
Let's say I have four shards, I'll end up with four VIPs. I then put
those four VIPs into my shards parameter in solrconfig.xml on all of my
hosts, regardless of what shard/slice.
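
For illustration, the hardwired shards parameter on each host then looks
something like this in solrconfig.xml (the VIP hostnames are placeholders):

  <requestHandler name="/search" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="shards">vip-shard1:8983/solr,vip-shard2:8983/solr,vip-shard3:8983/solr,vip-shard4:8983/solr</str>
    </lst>
  </requestHandler>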

Then, I create another VIP that includes all of my nodes in it. That is
the one that I hand to my application. 

This way, you can lose any node in any shard and the thing should keep
on going. 

Obviously I'm talking about slaves here. There will be a master for each
shard which each of these nodes pull their indexes from.

Hope this is helpful.

Upayavira

On Tue, Jan 29, 2013, at 09:35 PM, Lee, Peter wrote:
> I would appreciate people's experience on the following load balancing
> question...
> 
> We currently have solr configured in shards across multiple machines to
> handle our load. That is, a request being sent to any one of these
> servers will cause that server to query the rest of the servers in that
> "stripe" (we use the term "stripe" to refer to a set of servers that
> point to each other with the shard parameter).
> 
> We currently have all servers in a stripe registered with our load
> balancer. Thus, requests are being spread out across all servers in the
> stripe...but of course requests to any shard generates additional traffic
> on all shards in that stripe.
> 
> My question (finally) is this: Has anyone determined if it is better to
> place only a few (that is, not all) of the shards in a stripe on the load
> balancer as versus ALL of the shards in a stripe on the load balancer? It
> seemed to me at first that it would not make much of a difference, but
> then I realized that this would really depend on the relative costs of a
> few different steps (one step would be the cost of collecting all of the
> responses from the other servers in the shard to formulate the final
> answer. Another step would be the cost of generating more traffic between
> the shards, etc.).
> 
> So what I am trying to ask is this: If we had 6  servers in a "stripe" (6
> servers set up as shards to support a single query), would there be any
> advantage with respect to handling load to only place ONE or TWO of the
> shards on the load balancer as versus putting ALL shards on the load
> balancer?
> 
> We can test this empirically but if the community already has gotten a
> feel for the best practice in this situation I would be happy to learn
> from your experience. I could not find anything online that spoke to this
> particular situation.
> 
> Thanks.
> 
> Peter S. Lee
> Senior Software Engineer
> ProQuest
> 789 E. Eisenhower Parkway
> Ann Arbor, MI, 48106-1346
> USA
> 734-761-4700 x72025
> peter@proquest.com
> www.proquest.com
> 
> ProQuest...Start here
> InformationWeek 500 Top
> Innovator
> 


Re: overlap function query

2013-01-30 Thread Daniel Rosher
Hi Mikhail,

Thanks for the reply.

I think coord works at the document level; I was thinking of having
something that worked at a field level, against a 'principal/primary'
field.

I'm using edismax with tie=1 (a.k.a. Disjunction Sum) and several fields,
but docs with greater query overlap on the primary field should score
higher if you see what I mean.

Cheers,
Dan

On Tue, Jan 29, 2013 at 7:14 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Daniel,
>
> You can start from here
>
> http://lucene.apache.org/core/4_0_0-BETA/core/org/apache/lucene/search/similarities/Similarity.html#coord%28int,%20int%29
> but it requires deep understanding of Lucene internals
>
>
>
> On Tue, Jan 29, 2013 at 2:12 PM, Daniel Rosher  wrote:
>
> > Hi,
> >
> > I'm wondering if there exists or if someone has implemented something
> like
> > the following as a function query:
> >
> > overlap(query,field) = number of matching terms in field/number of terms
> in
> > field
> >
> > e.g. with three docs having these tokens(e.g.A B C) in a field
> > D
> > 1:A B B
> > 2:A B
> > 3:A
> >
> > The overlap would be for these queries (-- highlights possibly highest
> > scoring doc):
> >
> > Q:A
> > 1:1/3
> > 2:1/2
> > 3:1/1 --
> >
> > Q:A B
> > 1:2/3
> > 2:2/2 --
> > 3:1/1
> >
> > Q:A B C
> > 1:2/3
> > 2:2/2 --
> > 3:1/1
> >
> > The objective is to pick the most likely doc using the overlap to boost
> > the score.
> >
> > Cheers,
> > Dan
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
>  
>


How to make Solr multiple dataimporthandler scheduler in PHP

2013-01-30 Thread ashimbose
Hi All,

How to make a Solr multiple dataimporthandler scheduler in PHP?

I have multiple dataimport handlers in solrconfig.xml.

I want to make a PHP script which will run the imports one by one by
requesting URLs like the ones below...
http://localhost:8080/solr/core_sql/dataimport1?command=full-import
http://localhost:8080/solr/core_sql/dataimport2?command=full-import&clean=false
http://localhost:8080/solr/core_sql/dataimport3?command=full-import&clean=false
The second URL should wait until the first import has finished successfully,
or throw some exception if one happens during the process, and it should not
reach the maximum execution time (say 230 seconds).

Can anybody please help me write the script, or share any ideas?

Regards,
Ashim Bose




