Re: Solr irregularly having QTime 50000ms, stracing solr cures the problem

2014-07-21 Thread Harald Kirsch

Hi IJ,

yes indeed, there are multiple nodes. But I have a 50 second delay, not a 
5 second one.


Anyway I will keep this in mind and will experiment with the hosts file 
if it starts to get annoying again.


Cheers,
Harald.

On 16.07.2014 19:44, IJ wrote:

I know you mentioned you have a single machine at play - but do you have
multiple nodes on the machine that talk to one another?

Does your problem recur when the load on the system is low ?

I also faced a similar problem, wherein the 5 second delay (described in
detail in my other post) kept happening after a 1.5 minute inactivity
interval. This was explained as Solr keeping the http connection alive
for inter-node communication for around 1.5 minutes before disconnecting -
if a new request arrives after those 1.5 minutes, a new connection is
created, which probably suffers latency from a DNS name lookup.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-irregularly-having-QTime-5ms-stracing-solr-cures-the-problem-tp4146047p4147512.html
Sent from the Solr - User mailing list archive at Nabble.com.



--
Harald Kirsch
Raytion GmbH
Kaiser-Friedrich-Ring 74
40547 Duesseldorf
Fon +49 211 53883-216
Fax +49-211-550266-19
http://www.raytion.com


Re: Plugin init failure for custom analysis filter

2014-07-21 Thread ssivakumaran
Hi,
I am not able to find anything specific in the log. This
error is thrown when I add a string argument to my filter in the schema.
If I remove it, I do not get any error. I tried changing the datatype
but still get the same error.
A little more detail regarding the filter arguments:

<fieldType name="textNumeric" class="solr.CustomTextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.CustomFilterFactory"
            pattern="(.*)/([0-9]+)/([0-9]+)/([0-9]+)/(.*)?"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>


<fieldType name="textCustom" class="solr.CustomTextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.CustomFilterFactory"
            pattern="(.*)/([0-9]+)/([0-9]+)/([0-9]+)/(.*)?" ValueToStore="N"/>
    <filter class="solr.LengthFilterFactory" min="1" max="100"
            enablePositionIncrements="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

I get the error only for textCustom, not textNumeric, while initializing
the core.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Plugin-init-failure-for-custom-analysis-filter-tp4147851p4148259.html
Sent from the Solr - User mailing list archive at Nabble.com.


stats.facet with multi-valued field in Solr 4.9

2014-07-21 Thread Nico Kaiser
Hi!

I am storing aggregated article click statistics for a website in a Lucene 
database. Website articles (i.e., pages in this case) can have multiple 
associated financial instruments, which – for statistics reasons – I also copy 
to Lucene. So basically this data is stored (and regularly updated) by 
articleId and date as

{
  "articleId": 1234,
  "date": "2014-07-21",
  "clicks": 5,
  "instrumentIds": [1, 2, 3, 4]
}

Now I need to generate statistics, like aggregated article click count by 
instrumentId:

/solr/article_stats/select
  ?q=*:*
  &stats=true
  &stats.field=clicks
  &stats.facet=instrumentIds
  &rows=0

This way Solr returned a (large) list of instrumentIds in 
stats.stats_fields.clicks.facets.instrumentIds with the clicks per instrument, 
which was exactly what I wanted.


After the upgrade to Solr 4.9 (from 3.6) this seems not to be possible anymore:

Stats can only facet on single-valued fields, not: instrumentIds


Is there a way to replicate the old behaviour?

Thanks,
Nico



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: stats.facet with multi-valued field in Solr 4.9

2014-07-21 Thread Yonik Seeley
On Mon, Jul 21, 2014 at 7:09 AM, Nico Kaiser n...@kaiser.me wrote:
 After the upgrade to Solr 4.9 (from 3.6) this seems not to be possible 
 anymore:

 Stats can only facet on single-valued fields, not: instrumentIds

https://issues.apache.org/jira/browse/SOLR-3642

It looks like perhaps it never did work correctly.

-Yonik
http://heliosearch.org - native code faceting, facet functions,
sub-facets, off-heap data


Re: stats.facet with multi-valued field in Solr 4.9

2014-07-21 Thread Nico Kaiser
Yonik, thanks for your reply! I also found 
https://issues.apache.org/jira/browse/SOLR-1782, which also seems to deal with 
this, but I could not find out whether there is a workaround.

For our use case the previous behaviour was ok and seemed (!) to be consistent.
However I understand that this feature had to be disabled if it was broken.

Do you have an idea how to achieve the behaviour I mentioned before?

Nico


On 21 Jul 2014, at 13:26, Yonik Seeley yo...@heliosearch.com wrote:

 On Mon, Jul 21, 2014 at 7:09 AM, Nico Kaiser n...@kaiser.me wrote:
 After the upgrade to Solr 4.9 (from 3.6) this seems not to be possible 
 anymore:
 
 Stats can only facet on single-valued fields, not: instrumentIds
 
 https://issues.apache.org/jira/browse/SOLR-3642
 
 It looks like perhaps it never did work correctly.
 
 -Yonik
 http://heliosearch.org - native code faceting, facet functions,
 sub-facets, off-heap data



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: stats.facet with multi-valued field in Solr 4.9

2014-07-21 Thread Yonik Seeley
On Mon, Jul 21, 2014 at 7:32 AM, Nico Kaiser n...@kaiser.me wrote:
 Yonik, thanks for your reply! I also found 
 https://issues.apache.org/jira/browse/SOLR-1782, which also seems to deal with 
 this, but I could not find out whether there is a workaround.

 For our use case the previous behaviour was ok and seemed (!) to be 
 consistent.
 However I understand that this feature had to be disabled if it was broken.

 Do you have an idea how to achieve the behaviour I mentioned before?

I don't think there's anything currently committed/released.

There has been work on an Analytics component that could do it.  This
hasn't been committed to Solr yet, but has been committed in
Heliosearch.  Also, Heliosearch has facet functions:
http://heliosearch.org/solr-facet-functions/
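Until something like that lands, one client-side workaround is to fetch clicks and instrumentIds for the matching docs and aggregate outside Solr. A minimal sketch (field names taken from the earlier message; the aggregation itself is not a Solr feature):

```python
from collections import defaultdict

def clicks_per_instrument(docs):
    """Sum each doc's clicks into every instrumentId it carries,
    mimicking the old multi-valued stats.facet output."""
    totals = defaultdict(int)
    for doc in docs:
        for instrument_id in doc.get("instrumentIds", []):
            totals[instrument_id] += doc.get("clicks", 0)
    return dict(totals)

# Docs as they might come back from /select?q=*:*&fl=clicks,instrumentIds
docs = [
    {"articleId": 1234, "clicks": 5, "instrumentIds": [1, 2, 3, 4]},
    {"articleId": 1235, "clicks": 2, "instrumentIds": [2, 3]},
]
print(clicks_per_instrument(docs))  # {1: 5, 2: 7, 3: 7, 4: 5}
```

This only scales while the matching result set is small enough to page through, of course.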

-Yonik
http://heliosearch.org - native code faceting, facet functions,
sub-facets, off-heap data


AUTO: Nicholas M. Wertzberger is out of the office (returning 07/23/2014)

2014-07-21 Thread Nicholas M. Wertzberger


I am out of the office until 07/23/2014.

I'm out of town for the next few days. I am reachable by Blackberry, if
needed. Please contact Jason Brown for anything JAS Team related.


Note: This is an automated response to your message  Re: questions on Solr
WordBreakSolrSpellChecker and WordDelimiterFilterFactory sent on 7/17/2014
7:42:42 AM.

This is the only notification you will receive while this person is away.


faceting within facets

2014-07-21 Thread David Flower
Hi

Is it possible to create a facet within another facet in a single query?
Currently I'm having to filter the query with facet.query=type:foo and run
the query multiple times to return the number and type of objects created on a
given date.

Is it even possible to return this in a single query?

Cheers,
David


Re: faceting within facets

2014-07-21 Thread Yonik Seeley
On Mon, Jul 21, 2014 at 8:08 AM, David Flower dflo...@amplience.com wrote:
 Is it possible to create a facet within another facet in a single query

For simple field facets, there's pivot faceting.
For more complex nested facets, there are sub-facets in heliosearch (a
solr fork):
http://heliosearch.org/solr-subfacets/
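For the "number and type of objects created on a given date" case, a pivot request could look like the sketch below. The field names created_date and type are assumptions about this schema; facet.pivot itself is standard Solr 4.x syntax:

```python
from urllib.parse import urlencode

# facet.pivot nests one field's counts under another in a single request
params = {
    "q": "*:*",
    "rows": 0,
    "facet": "true",
    "facet.pivot": "created_date,type",  # type counts within each date bucket
}
query_string = urlencode(params)
print("/solr/collection/select?" + query_string)
```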

-Yonik


Solr Cassandra MySQL Best Practice Indexing

2014-07-21 Thread Yavar Husain
So my full text data lies in Cassandra along with an ID. I also have a lot
of structured data linked to that ID which lives in an RDBMS (MySQL). I
need this structured data, as it would help with my faceting and other
needs. What is the best practice for indexing in this scenario?
My thoughts (maybe weird):

1. Read the data from Cassandra; for each ID read, fetch the corresponding
row from MySQL, form an XML document on the fly (per ID) and send
it to Solr for indexing without storing anything.
2. I do not have much idea about Solandra. However, even if I use it I will
have to go to MySQL to fetch the structured data.
3. Copy all of the Cassandra data into MySQL (or vice versa), but then the
data would be duplicated.

I will think about incremental indexing for the new records later.

Bit confused. Any help would be appreciated.
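Option 1 is essentially a streaming join keyed on the ID. A rough sketch of just the merge step, with both stores stubbed out as in-memory objects (a real version would use a Cassandra driver and a MySQL cursor, and batch the Solr updates):

```python
def merge_for_indexing(cassandra_rows, mysql_rows_by_id):
    """Attach the structured MySQL columns for the same ID to each
    Cassandra row, yielding one Solr-ready document per ID."""
    for row in cassandra_rows:
        doc = {"id": row["id"], "text": row["text"]}
        doc.update(mysql_rows_by_id.get(row["id"], {}))
        yield doc

# Stand-ins for the two data sources
cassandra_rows = [{"id": "a1", "text": "full text body"}]
mysql_rows_by_id = {"a1": {"category": "news", "author": "jane"}}

docs = list(merge_for_indexing(cassandra_rows, mysql_rows_by_id))
print(docs)  # one doc carrying both the text and the structured fields
```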


Re: Solr Cassandra MySQL Best Practice Indexing

2014-07-21 Thread Jack Krupansky
Solandra is not a supported product. DataStax Enterprise (DSE) supersedes 
it. With DSE, just load your data into a Solr-enabled Cassandra data center 
and it will be indexed automatically in the embedded Solr within DSE, as per 
a Solr schema that you provide. Then use any of the nodes in that 
Solr-enabled Cassandra data center just the same as with normal Solr.


-- Jack Krupansky

-Original Message- 
From: Yavar Husain

Sent: Monday, July 21, 2014 8:37 AM
To: solr-user@lucene.apache.org
Subject: Solr Cassandra MySQL Best Practice Indexing

So my full text data lies on Cassandra along with an ID. Now I have a lot
of structured data linked to the ID which lies on an RDBMS (read MySQL). I
need this structured data as it would help me with my faceting and other
needs. What is the best practice in going about indexing in this scenario.
My thoughts (maybe weird):

1. Read the data from Cassandra, for each ID read, read the corresponding
row from MySQL for that ID, form an XML on the fly (for each ID) and send
it to Solr for Indexing without storing anything.
2. I do not have much idea on Solandra. However even if I use it I will
have to go to MySQL for fetching the structured data.
3. Duplicate the data and either get all of Cassandra to MySQL or vice
versa but then data duplication would happen.

I will think about incremental indexing for the new records later.

Bit confused. Any help would be appreciated. 



RE: SolrCloud performance issues regarding hardware configuration

2014-07-21 Thread Toke Eskildsen
search engn dev [sachinyadav0...@gmail.com] wrote:
 Yes, You are right my facet queries are for text analytic purpose.

Does this mean that facet calls are rare (at most one at a time)?

 Users will send boolean and spatial queries. current performance for spatial
 queries is 100qps with 150 concurrent users and avg response time is 500ms.

What is the limiting factor here? CPU or I/O? If it is the latter, then adding 
more memory to the existing setup seems like the cheapest and easiest choice.

- Toke Eskildsen


Query about Solr

2014-07-21 Thread Ameya Aware
Hi,

How can I stop the content of a file from being indexed?

Will removing the content field from schema.xml do that job?


Thanks,
Ameya


Edit Example Post.jar to read ALL file types

2014-07-21 Thread jrusnak
I am working with Solr 4.8.1 to set up an enterprise search system.

The file system I am working with has numerous files with unique extension
types (ex .20039 .20040 .20041 etc.)

I am using the post.jar file included in the binary download (src: 
SimplePostTool.java
http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/util/SimplePostTool.java
 
) to post these files to the Solr server, and would like to edit this jar file
to recognize /any/ file extension it comes across.

Is there a way to do this with the SimplePostTool.java source? I am right
now working to better understand the Filetype and DEFAULT_FILE_TYPE
variables as well as the mimeMap. It is these that currently allow me to
manually add file extensions.

I would, however, like the tool to be able to read in files no matter what
their extension is, and default their mime type to text/plain.
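The fallback being asked for amounts to "use the known MIME type if the extension is recognized, otherwise text/plain". Sketched here in Python rather than in SimplePostTool's Java; the names are illustrative, not the tool's actual fields:

```python
import mimetypes

DEFAULT_FILE_TYPE = "text/plain"

def guess_type(filename):
    """Return a recognized MIME type for the file's extension, or fall
    back to text/plain so files like foo.20039 still get posted."""
    mime, _encoding = mimetypes.guess_type(filename)
    return mime or DEFAULT_FILE_TYPE

print(guess_type("notes.html"))   # text/html
print(guess_type("data.20039"))   # text/plain
```

In SimplePostTool itself the equivalent change would be in how the mimeMap lookup result is handled when the extension is missing from the map.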



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Edit-Example-Post-jar-to-read-ALL-file-types-tp4148312.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Query about Solr

2014-07-21 Thread Alexandre Rafalovitch
Nothing gets indexed automatically. So you must be doing something (e.g.
Nutch). Tell us what that something is first so we know your baseline setup.

Regards,
 Alex
On 21/07/2014 9:43 pm, Ameya Aware ameya.aw...@gmail.com wrote:

 Hi,

 How can i stop content of file from being getting indexed??

 Will removing content field from schema.xml do that job?


 Thanks,
 Ameya



Re: Query about Solr

2014-07-21 Thread Ameya Aware
Hi,

The data coming into Solr is metadata such as author, created
time, and last modified time, along with the content of the file.

Indexing the content is giving me various errors, so I simply want to
skip indexing the content part.


Thanks,
Ameya


On Mon, Jul 21, 2014 at 11:07 AM, Alexandre Rafalovitch arafa...@gmail.com
wrote:

 Nothing gets indexed automatically. So you must be doing something (e.g.
 Nutch). Tell us what that something is first so we know your baseline
 setup.

 Regards,
  Alex
 On 21/07/2014 9:43 pm, Ameya Aware ameya.aw...@gmail.com wrote:

  Hi,
 
  How can i stop content of file from being getting indexed??
 
  Will removing content field from schema.xml do that job?
 
 
  Thanks,
  Ameya
 



Re: Query about Solr

2014-07-21 Thread Jack Krupansky

Set the field type for such a field to "ignored".

Or set it to "string" and then you can still examine or query the data even 
if it is not properly formatted.


-- Jack Krupansky

-Original Message- 
From: Ameya Aware

Sent: Monday, July 21, 2014 11:12 AM
To: solr-user@lucene.apache.org
Subject: Re: Query about Solr

Hi,

The data coming into Solr is different metadata such as author, created
time, last modified time etc along with content of the file.

So indexing content is giving me different errors, so i just simply want to
skip indexing content part.


Thanks,
Ameya


On Mon, Jul 21, 2014 at 11:07 AM, Alexandre Rafalovitch arafa...@gmail.com
wrote:


Nothing gets indexed automatically. So you must be doing something (e.g.
Nutch). Tell us what that something is first so we know your baseline
setup.

Regards,
 Alex
On 21/07/2014 9:43 pm, Ameya Aware ameya.aw...@gmail.com wrote:

 Hi,

 How can i stop content of file from being getting indexed??

 Will removing content field from schema.xml do that job?


 Thanks,
 Ameya






Solr schema.xml query analyser

2014-07-21 Thread prashantc88


I am a complete beginner to Solr and need some help.

My task is to provide a match when the search term contains the indexed
field.

For example:

If query = "foo bar" and textExactMatch = "foo", I should not get a MATCH
If query = "foo bar" and textExactMatch = "foo bar", I should get a MATCH
If query = "foo bar" and textExactMatch = "xyz foo bar"/"foo bar xyz", I should
get a MATCH

I am indexing my field as follows:

<fieldType name="textExactMatch" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

So I'm indexing the text for the field as-is, without breaking it down
further. Could someone help me out with how I should tokenize and filter the
field at query time?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-schema-xml-query-analyser-tp4148317.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr schema.xml query analyser

2014-07-21 Thread Jack Krupansky
If you don't specify a query analyzer, Solr will use the index analyzer 
at query time.


But... at query time there is something called a query parser which 
typically breaks the query into separate terms, delimited by white space, 
and then calls the analyzer for each term, separately.


You can put the entire query in quotes or escape the space with a backslash.

Or, just use the edismax query parser with the pf or pf2 parameters and 
then Solr will boost exact phrase matches even if not quoted or escaped.


-- Jack Krupansky

-Original Message- 
From: prashantc88

Sent: Monday, July 21, 2014 11:29 AM
To: solr-user@lucene.apache.org
Subject: Solr schema.xml query analyser



I am a complete beginner to Solr and need some help.

My task is to provide a match when the search term contains the indexed
field.

For example:

   If query= foo bar and textExactMatch= foo, I should not get a MATCH
   If query= foo bar and textExactMatch= foo bar, I should get a MATCH
   If query= foo bar and textExactMatch= xyz foo bar/foo bar xyz, I should
get a MATCH

I am indexing my field as follows:

<fieldType name="textExactMatch" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

So I'm indexing the text for the field as it is without breaking it further
down. Could someone help me out with how should I tokenize and filter the
field during query time.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-schema-xml-query-analyser-tp4148317.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Solr schema.xml query analyser

2014-07-21 Thread prashantc88
Thanks Jack for the reply.

I did not mention the query time analyzer in my post because I wasn't sure
what should be put there.

With regard to your reply: if I put the query term in quotes, would I get a
match for the following?

Indexed field value: foo bar
Query term: foo bar xyz / xyz foo bar

I believe it should not, as it will look for the exact term in
both places.

However I want it to behave in the following way:

If query = "foo bar" and textExactMatch = "foo", I SHOULD NOT get a MATCH
If query = "foo bar" and textExactMatch = "foo bar", I SHOULD get a MATCH
If query = "foo bar" and textExactMatch = "xyz foo bar"/"foo bar xyz", I SHOULD
get a MATCH

Thanks in advance.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-schema-xml-query-analyser-tp4148317p4148327.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr schema.xml query analyser

2014-07-21 Thread Jack Krupansky
Based on your stated requirements, there is no obvious need to use the 
keyword tokenizer. So fix that and then quoted phrases or escaped spaces 
should work.


-- Jack Krupansky

-Original Message- 
From: prashantc88

Sent: Monday, July 21, 2014 11:51 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr schema.xml query analyser

Thanks Jack for the reply.

I did not mention the query time analyzer in my post because I wasn't sure
what should be put there.

With regards to your reply, If I put the query term in quotes, would I get a
match for the following:

Indexed field value: foo bar
Query term: foo bar xyz/xyz foo bar

I believe it should not as it will be looking for the exact term present in
both the places.

However I want it to behave in the following way:

   If query= foo bar and textExactMatch= foo, I SHOULD NOT get a MATCH
   If query= foo bar and textExactMatch= foo bar, I SHOULD get a MATCH
   If query= foo bar and textExactMatch= xyz foo bar/foo bar xyz, I SHOULD
get a MATCH

Thanks in advance.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-schema-xml-query-analyser-tp4148317p4148327.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Solr schema.xml query analyser

2014-07-21 Thread newBie88
My apologies Jack. But there was a mistake in my question.

I actually switched query and textExactMatch in my question.

It would be really helpful if you could have a look at the scenario once
again:

My task is to provide a match when the search term contains the indexed
field. 

For example: 

If textExactMatch = "foo bar" and query = "foo", I should not get a MATCH
If textExactMatch = "foo bar" and query = "foo bar", I should get a MATCH
If textExactMatch = "foo bar" and query = "xyz foo bar"/"foo bar xyz", I should
get a MATCH

I am indexing my field as follows: 

<fieldType name="textExactMatch" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

So I'm indexing the text for the field as it is without breaking it further
down. How should I tokenize and filter the field during query time? 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-schema-xml-query-analyser-tp4148317p4148352.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: text search problem

2014-07-21 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)

Thanks for the reply, Erick. I will try as you suggested. I have another 
question related to these lines.

When I have a "-" in my description or name field, the search results are 
different. For e.g.:

ABC-123: it looks for ABC or 123. I want to treat this search as an exact 
match, i.e. if my document has ABC-123 then I should get the results. 

When I check with highlighting on, it shows <em>ABC</em> in the results. How can I 
avoid this situation?

Thanks

Ravi


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Saturday, July 19, 2014 4:40 PM
To: solr-user@lucene.apache.org
Subject: Re: text search problem

Try adding debug=all to the query and see what the parsed form of the query 
is. Likely you're
1> using phrase queries, so "broadway hotel" requires both words in the text
or
2> if you're not using phrases, you're searching for the AND of the two
terms.

But debug=all will show you.

Plus, take a look at the admin/analysis page, your tokenization may not be what 
you expect.

Best,
Erick


On Fri, Jul 18, 2014 at 2:00 PM, EXTERNAL Taminidi Ravi (ETI,
Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote:

 Hi. Below is the text_general field type. When I search Text:Broadway, 
 it is not returning all the records, only a few. 
 But when I search for Text:*Broadway*, it matches more records. 
 When I search with multiple words, like "Broadway Hotel", it may 
 not match "Broadway", "Hotel" or "Broadway Hotel". Do you have any 
 thought on how to handle this type of keyword search?

 Text:Broadway,Vehicle Detailing,Water Systems,Vehicle Detailing,Car 
 Wash Water Recovery

 My field type looks like this:

 <fieldType name="text_general" class="solr.TextField"
            positionIncrementGap="100">
   <analyzer type="index">
     <charFilter class="solr.HTMLStripCharFilterFactory"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt"/>
     <filter class="solr.KStemFilterFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.WordDelimiterFilterFactory"
             generateWordParts="0" generateNumberParts="0" splitOnCaseChange="0"
             splitOnNumerics="0" stemEnglishPossessive="0" catenateWords="1"
             catenateNumbers="1" catenateAll="1" preserveOriginal="0"/>

     <!-- in this example, we will only use synonyms at query time
     <filter class="solr.SynonymFilterFactory"
             synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
     -->
   </analyzer>
   <analyzer type="query">
     <charFilter class="solr.HTMLStripCharFilterFactory"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.KStemFilterFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
             ignoreCase="true" expand="true"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.WordDelimiterFilterFactory"
             generateWordParts="0" generateNumberParts="0" splitOnCaseChange="0"
             splitOnNumerics="0" stemEnglishPossessive="0" catenateWords="1"
             catenateNumbers="1" catenateAll="1" preserveOriginal="0"/>
   </analyzer>
 </fieldType>



 Do you have any thought the behavior or how to get this?

 Thanks

 Ravi



Re: Solr schema.xml query analyser

2014-07-21 Thread Jack Krupansky
That sounds more like a reverse query - trying to match documents against 
the query rather than matching the query against the documents. Solr doesn't 
have that feature currently.


Although I'm not absolutely sure what your textExactMatch is. I'm guessing 
that it is a document field in your index.
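The requested semantics (the query's token sequence must contain the field's token sequence) can at least be expressed outside Solr. A hedged sketch of that predicate, using plain whitespace tokenization and lowercasing to roughly mirror the schema above:

```python
def query_contains_field(query, field_value):
    """True if the field's tokens occur as a contiguous run
    inside the query's tokens."""
    q_tokens = query.lower().split()
    f_tokens = field_value.lower().split()
    n = len(f_tokens)
    return any(q_tokens[i:i + n] == f_tokens
               for i in range(len(q_tokens) - n + 1))

print(query_contains_field("foo", "foo bar"))          # False
print(query_contains_field("foo bar", "foo bar"))      # True
print(query_contains_field("xyz foo bar", "foo bar"))  # True
```

Inside Solr, people usually approximate this kind of "field contained in query" matching with query-side shingles, but that is a separate design discussion.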


-- Jack Krupansky

-Original Message- 
From: newBie88

Sent: Monday, July 21, 2014 1:13 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr schema.xml query analyser

My apologies Jack. But there was a mistake in my question.

I actually switched query and textExactMatch in my question.

I would be really helpful if you could have a look at the scenario once
again:

My task is to provide a match when the search term contains the indexed
field.

For example:

   If textExactMatch= foo bar and query= foo, I should not get a MATCH
   If textExactMatch= foo bar and query= foo bar, I should get a MATCH
   If textExactMatch= foo bar and query= xyz foo bar/foo bar xyz, I should
get a MATCH

I am indexing my field as follows:

<fieldType name="textExactMatch" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

So I'm indexing the text for the field as it is without breaking it further
down. How should I tokenize and filter the field during query time?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-schema-xml-query-analyser-tp4148317p4148352.html
Sent from the Solr - User mailing list archive at Nabble.com. 



RE: Multiterm analysis in complexphrase query

2014-07-21 Thread Gopal Agarwal
That would be really useful.

Can you upload the jar and its requirements?

It would also make it pluggable with different versions of Solr.
 On Jul 1, 2014 9:01 PM, Allison, Timothy B. talli...@mitre.org wrote:

 If there's enough interest, I might get back into the code and throw a
 standalone src (and jar) of the SpanQueryParser and the Solr wrapper onto
 github.  That would make it more widely available until there's a chance to
 integrate it into Lucene/Solr.  If you'd be interested in this, let me know
 (and/or vote on the issue pages on Jira).

 Best,

Tim

 -Original Message-
 From: Michael Ryan [mailto:mr...@moreover.com]
 Sent: Tuesday, July 01, 2014 9:24 AM
 To: solr-user@lucene.apache.org
 Subject: RE: Multiterm analysis in complexphrase query

 Thanks. This looks interesting...

 -Michael

 -Original Message-
 From: Allison, Timothy B. [mailto:talli...@mitre.org]
 Sent: Monday, June 30, 2014 8:15 AM
 To: solr-user@lucene.apache.org
 Subject: RE: Multiterm analysis in complexphrase query

 Ahmet, please correct me if I'm wrong, but the ComplexPhraseQueryParser
 does not perform analysis (as you, Michael, point out).  The
 SpanQueryParser in LUCENE-5205 does perform analysis and might meet your
 needs.  Work on it has gone on pause, though, so you'll have to build from
 the patch or the LUCENE-5205 branch.  Let me know if you have any questions.

 LUCENE-5470 and LUCENE-5504 would move multiterm analysis farther down and
 make it available to all parsers that use QueryParserBase, including the
 ComplexPhraseQueryParser.

 Best,

 Tim

 -Original Message-
 From: Michael Ryan [mailto:mr...@moreover.com]
 Sent: Sunday, June 29, 2014 11:09 AM
 To: solr-user@lucene.apache.org
 Subject: Multiterm analysis in complexphrase query

 I've been using a modified version of the complex phrase query parser
 patch from https://issues.apache.org/jira/browse/SOLR-1604 in Solr 3.6,
 and I'm currently upgrading to 4.9, which has this built-in.

 I'm having trouble with using accents in wildcard queries, support for
 which was added in https://issues.apache.org/jira/browse/SOLR-2438. In
 3.6, I was using a modified version of SolrQueryParser, which simply used
 ComplexPhraseQueryParser in place of QueryParser. In the version of
 ComplexPhraseQParserPlugin in 4.9, it just directly uses
 ComplexPhraseQueryParser, and doesn't go through SolrQueryParser at all.
 SolrQueryParserBase.analyzeIfMultitermTermText() is where the multiterm
 analysis magic happens.

 So, my problem is that ComplexPhraseQParserPlugin/ComplexPhraseQueryParser
 doesn't use SolrQueryParserBase, which breaks doing fun things like this:
 {!complexPhrase}"barac* óba*a"
 and expecting it to match "Barack Obama".

 Anyone run into this before, or have a way to get this working?

 -Michael
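The multiterm analysis being bypassed in the example above is, for this case, essentially accent folding of the wildcard term before matching. What an analyzer chain with something like an ASCII-folding filter would do to "óba*a" can be sketched as follows (the actual filter chain in that schema is an assumption):

```python
import unicodedata

def fold_accents(term):
    """Strip combining marks so 'óba*a' compares equal to 'oba*a',
    roughly what a multiterm-aware analyzer does to wildcard terms."""
    decomposed = unicodedata.normalize("NFD", term)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(fold_accents("óba*a"))  # oba*a
```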



How do I disable distributed search feature when I have only one shard

2014-07-21 Thread pramodEbay
Hi there,

We have a solr cloud set up with only one shard. There is one leader and 15
followers. So the data is replicated on 15 nodes. When we run a solr query,
only one node should handle the request and we do not need any distributed
search feature as all the nodes are exact copies of each other.

Under certain load scenarios, we are seeing the SolrJ API adding
isShard=true&distrib=false&shard.url=A,B,C etc. to all the queries. Is the
Solr query waiting for responses from A, B and C before returning to
the client? If so, it is unnecessary and causing problems for us
under heavy load.

The thing is, somehow, these parameters are automagically added at query
time. How do we disable this? The SolrJ query that we build programmatically
does not add these three parameters. Is there some configuration we can turn
on to tell SolrJ not to add these parameters to the Solr request?
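One knob worth trying is sending distrib=false explicitly on each request, so the receiving replica answers from its own index; in SolrJ that would just be an extra query parameter on the request (whether it fully suppresses the shard fan-out in this setup is worth verifying against the logs). As plain request parameters:

```python
from urllib.parse import urlencode

# distrib=false asks the node to serve the query from its local index
params = {
    "q": "*:*",
    "distrib": "false",
}
print(urlencode(params))
```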

Thanks,
Pramod



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-do-I-disable-distributed-search-feature-when-I-have-only-one-shard-tp4148449.html
Sent from the Solr - User mailing list archive at Nabble.com.


SolrCloud replica dies under high throughput

2014-07-21 Thread Darren Lee
Hi,

I'm doing some benchmarking with Solr Cloud 4.9.0. I am trying to work out 
exactly how much throughput my cluster can handle.

Consistently in my test I see a replica go into recovering state forever caused 
by what looks like a timeout during replication. I can understand the timeout 
and failure (I am hitting it fairly hard) but what seems odd to me is that when 
I stop the heavy load it still does not recover the next time it tries, it 
seems broken forever until I manually go in, clear the index and let it do a 
full resync.

Is this normal? Am I misunderstanding something? My cluster has 4 nodes (2 
shards, 2 replicas) (AWS m3.2xlarge). I am indexing with ~800 concurrent 
connections and a 10 sec soft commit. I consistently get this problem with a 
throughput of around 1.5 million documents per hour.

Thanks all,
Darren


Stack Traces & Messages:

[qtp779330563-627] ERROR org.apache.solr.servlet.SolrDispatchFilter - 
null:org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for 
connection from pool
at 
org.apache.http.impl.conn.PoolingClientConnectionManager.leaseConnection(PoolingClientConnectionManager.java:226)
at 
org.apache.http.impl.conn.PoolingClientConnectionManager$1.getConnection(PoolingClientConnectionManager.java:195)
at 
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:422)
at 
org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
at 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:233)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)

Error while trying to recover. 
core=assets_shard2_replica1:java.util.concurrent.ExecutionException: 
org.apache.solr.client.solrj.SolrServerException: IOException occured when 
talking to server at: http://xxx.xxx.15.171:8080/solr
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:188)
at 
org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:615)
at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:371)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)
Caused by: org.apache.solr.client.solrj.SolrServerException: IOException 
occured when talking to server at: http://xxx.xxx.15.171:8080/solr
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:566)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer$1.call(HttpSolrServer.java:245)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer$1.call(HttpSolrServer.java:241)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.net.SocketException: Socket closed
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at 
org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
at 
org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
at 
org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
at 
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
at 
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
at 
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:260)
at 
org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
at 
org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
at 
org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
at 
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271)
at 
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
at 

Re: SolrCloud replica dies under high throughput

2014-07-21 Thread Mark Miller
Looks like you probably have to raise the http client connection pool limits to 
handle that kind of load currently.

They are specified as top level config in solr.xml:

maxUpdateConnections
maxUpdateConnectionsPerHost
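A hedged sketch of where these settings might live in solr.xml (the exact placement is an assumption here and can differ between Solr versions, so verify against the reference guide for your release):

```xml
<solr>
  <!-- Sketch only: raise the inter-node update connection pool limits.
       Depending on the Solr version these may belong at the top level or
       inside an updateshardhandler section; check the reference guide. -->
  <int name="maxUpdateConnections">100000</int>
  <int name="maxUpdateConnectionsPerHost">100</int>
</solr>
```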

--  
Mark Miller
about.me/markrmiller

On July 21, 2014 at 7:14:59 PM, Darren Lee (d...@amplience.com) wrote:
 Hi,
  
 I'm doing some benchmarking with Solr Cloud 4.9.0. I am trying to work out 
 exactly how  
 much throughput my cluster can handle.
  
 Consistently in my test I see a replica go into recovering state forever 
 caused by what  
 looks like a timeout during replication. I can understand the timeout and 
 failure (I  
 am hitting it fairly hard) but what seems odd to me is that when I stop the 
 heavy load it still  
 does not recover the next time it tries, it seems broken forever until I 
 manually go in,  
 clear the index and let it do a full resync.
  
 Is this normal? Am I misunderstanding something? My cluster has 4 nodes (2 
 shards, 2 replicas)  
 (AWS m3.2xlarge). I am indexing with ~800 concurrent connections and a 10 sec 
 soft commit.  
 I consistently get this problem with a throughput of around 1.5 million 
 documents per  
 hour.
  
 Thanks all,
 Darren
  
  

SolrCloud extended warmup support

2014-07-21 Thread Jeff Wartes

I’d like to ensure an extended warmup is done on each SolrCloud node prior to 
that node serving traffic.
I can do certain things prior to starting Solr, such as pump the index dir 
through /dev/null to pre-warm the filesystem cache, and post-start I can use 
the ping handler with a health check file to keep the node out of the
client’s load balancer until I’m ready.
What I seem to be missing is control over when a node starts participating in 
queries sent to the other nodes.

I can, of course, add solrconfig.xml firstSearcher queries, which I assume (and 
fervently hope!) happens before a node registers itself in ZK clusterstate.json 
as ready for work, but that doesn’t scale so well if I want that initial warmup 
to run thousands of queries, or run them with some parallelism. I’m storing
solrconfig.xml in ZK, so I’m sensitive to the size.

Any ideas, or corrections to my assumptions?

Thanks.
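A small sketch of the "pump the index dir through /dev/null" pre-warm step mentioned above. The real index path is deployment-specific, so the demo below just exercises a throwaway directory:

```shell
#!/bin/sh
# Pre-warm the OS page cache: read every index file once so the first
# queries after startup don't pay the cold disk-read cost.
warm_index() {
  find "$1" -type f -exec cat {} + > /dev/null && echo "warmed: $1"
}

# Demo on a throwaway directory; in production, point warm_index at your
# core's data/index directory instead (that path is an assumption).
demo_dir=$(mktemp -d)
printf 'segment bytes' > "$demo_dir/_0.cfs"
warm_index "$demo_dir"
rm -r "$demo_dir"
```

Note this only warms the filesystem cache, not Solr's own caches — those still need firstSearcher queries.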


Re: SolrCloud extended warmup support

2014-07-21 Thread Shawn Heisey
On 7/21/2014 5:37 PM, Jeff Wartes wrote:
 I’d like to ensure an extended warmup is done on each SolrCloud node prior to 
 that node serving traffic.
 I can do certain things prior to starting Solr, such as pump the index dir 
 through /dev/null to pre-warm the filesystem cache, and post-start I can use 
 the ping handler with a health check file to prevent the node from entering 
 the client’s load balancer until I’m ready.
 What I seem to be missing is control over when a node starts participating in 
 queries sent to the other nodes.
 
 I can, of course, add solrconfig.xml firstSearcher queries, which I assume 
 (and fervently hope!) happens before a node registers itself in ZK 
 clusterstate.json as ready for work, but that doesn’t scale so well if I want 
 that initial warmup to run thousands of queries, or run them with some 
 parallelism. I’m storing solrconfig.xml in ZK, so I’m sensitive to the size.
 
 Any ideas, or corrections to my assumptions?

I think that firstSearcher/newSearcher (and making sure useColdSearcher
is set to false) is going to be the only way you can do this in a way
that's compatible with SolrCloud.  If you were doing manual distributed
search without SolrCloud, you'd have more options available.

If useColdSearcher is set to false, that should keep *everything* from
using the searcher until the warmup has finished.  I cannot be certain
that this is the case, but I have some reasonable confidence that this
is how it works.  If you find that it doesn't behave this way, I'd call
it a bug.

Thanks,
Shawn



Re: SolrCloud extended warmup support

2014-07-21 Thread Jeff Wartes

On 7/21/14, 4:50 PM, Shawn Heisey s...@elyograg.org wrote:

On 7/21/2014 5:37 PM, Jeff Wartes wrote:
 I’d like to ensure an extended warmup is done on each SolrCloud node
prior to that node serving traffic.
 I can do certain things prior to starting Solr, such as pump the index
dir through /dev/null to pre-warm the filesystem cache, and post-start I
can use the ping handler with a health check file to keep the node
out of the client’s load balancer until I’m ready.
 What I seem to be missing is control over when a node starts
participating in queries sent to the other nodes.
 
 I can, of course, add solrconfig.xml firstSearcher queries, which I
assume (and fervently hope!) happens before a node registers itself in
ZK clusterstate.json as ready for work, but that doesn’t scale so well
if I want that initial warmup to run thousands of queries, or run them
with some parallelism. I’m storing solrconfig.xml in ZK, so I’m sensitive
to the size.
 
 Any ideas, or corrections to my assumptions?

I think that firstSearcher/newSearcher (and making sure useColdSearcher
is set to false) is going to be the only way you can do this in a way
that's compatible with SolrCloud.  If you were doing manual distributed
search without SolrCloud, you'd have more options available.

If useColdSearcher is set to false, that should keep *everything* from
using the searcher until the warmup has finished.  I cannot be certain
that this is the case, but I have some reasonable confidence that this
is how it works.  If you find that it doesn't behave this way, I'd call
it a bug.

Thanks,
Shawn


Thanks for the quick reply. Since distributed search latency is the max of
the shard sub-requests, I’m trying my best to minimize any spikes in
cluster latency due to node restarts.
I double-checked useColdSearcher was false, but the doc says this means
requests “block until the first searcher is done warming”, which
translates pretty clearly to “latency spike”. The more I think about it,
the more worried I am that a node might indeed register itself in
live_nodes and get distributed requests before it’s got a searcher to work
with. *Especially* if I have lots of serial firstSearcher queries.

I’ll look through the code myself tomorrow, but if anyone can help
confirm/deny the order of operations here, I’d appreciate it.



Re: Edit Example Post.jar to read ALL file types

2014-07-21 Thread Erick Erickson
So how do you expect these to be indexed? I mean what happens
if you run across a Word document? How about an mp3? Just
blasting all files up seems chancy. And doesn't just
'java -jar post.jar * ' do what you ask?

This seems like an XY problem, _why_ do you want
to do this? Because unless the files being sent to Solr are
properly formatted, they won't be ingested. There's some special
logic that handles XML files and expects the very precise Solr
format. Solr would have no idea what to do with the
extensions in your example.

Perhaps a better approach would be to control the indexing
from a SolrJ client. Here's a blog if you want to follow
that approach.

Best,
Erick


On Mon, Jul 21, 2014 at 7:51 AM, jrusnak jrus...@live.unc.edu wrote:

 I am working with Solr 4.8.1 to set up an enterprise search system.

 The file system I am working with has numerous files with unique extension
 types (e.g. .20039, .20040, .20041, etc.)

 I am using the post.jar file included in the binary download (src:
 SimplePostTool.java
 
 http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/util/SimplePostTool.java
 
 )to post these files to the solr server and would like to edit this jar
 file
 to recognize /any/ file extension it comes across.

 Is there a way to do this with the SimplePostTool.java source? I am right
 now working to better understand the Filetype and DEFAULT_FILE_TYPE
 variables as well as the mimeMap. It is these that currently allow me to
 manually add file extensions.

 I would, however, like the tool to be able to read in files no matter what
 their extension is and default their MIME type to text/plain.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Edit-Example-Post-jar-to-read-ALL-file-types-tp4148312.html
 Sent from the Solr - User mailing list archive at Nabble.com.
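As a sketch of the fallback behavior the original poster describes — map known extensions, default everything else to text/plain — the idea is just a lookup with a default branch. The extension table below is a made-up stand-in, not SimplePostTool's actual mimeMap:

```shell
#!/bin/sh
# Hypothetical extension-to-MIME lookup with a text/plain fallback for
# unknown extensions such as .20039 -- a stand-in, not SimplePostTool code.
guess_type() {
  case "${1##*.}" in
    xml)  echo "application/xml" ;;
    json) echo "application/json" ;;
    pdf)  echo "application/pdf" ;;
    *)    echo "text/plain" ;;   # anything unrecognized, e.g. .20039
  esac
}

guess_type "report.20039"   # prints text/plain
guess_type "manual.pdf"     # prints application/pdf
```

The same default-branch shape is what an edit to SimplePostTool's mime lookup would amount to, subject to Erick's caveat that Solr may still reject content that isn't really text.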



Re: text search problem

2014-07-21 Thread Erick Erickson
Try escaping the hyphen as \-. Or enclosing it all
in quotes.

But you _really_ have to spend some time with the debug option
and the admin/analysis page or you will find endless surprises.

Best,
Erick


On Mon, Jul 21, 2014 at 11:12 AM, EXTERNAL Taminidi Ravi (ETI,
Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote:


 Thanks for the reply Erick, I will try as you suggested. I have
 another question related to these lines.

 When I have a - in my description or name, the search results are
 different. For example:

 ABC-123: it looks for ABC or 123, but I want to treat this search as an exact
 match, i.e. if my document has ABC-123 then I should get the results.

 When I check with hl=on, it has <em>ABC</em> and gets the results. How can
 I avoid this situation?

 Thanks

 Ravi


 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Saturday, July 19, 2014 4:40 PM
 To: solr-user@lucene.apache.org
 Subject: Re: text search problem

 Try adding debug=all to the query and see what the parsed form of the
 query is, likely you're
  1> using phrase queries, so "broadway hotel" requires both words in the
  text,
  or
  2> if you're not using phrases, you're searching for the AND of the two
  terms.

 But debug=all will show you.

 Plus, take a look at the admin/analysis page, your tokenization may not be
 what you expect.

 Best,
 Erick


 On Fri, Jul 18, 2014 at 2:00 PM, EXTERNAL Taminidi Ravi (ETI,
 Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote:

  Hi, below is the text_general field type. When I search Text:Broadway
  it is not returning all the records, it is returning only a few records.
  But when I search for Text:*Broadway*, it is getting more records.
  When I get into a multiple-word search like Broadway Hotel, it may
  not get Broadway, Hotel, or Broadway Hotel. Do you have any
  thoughts on how to handle this type of keyword search?
 
  Text:Broadway,Vehicle Detailing,Water Systems,Vehicle Detailing,Car
  Wash Water Recovery
 
  My Field type look like this.
 
  <fieldType name="text_general" class="solr.TextField"
      positionIncrementGap="100">
    <analyzer type="index">
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
          words="stopwords.txt"/>
      <filter class="solr.KStemFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="0" generateNumberParts="0" splitOnCaseChange="0"
          splitOnNumerics="0" stemEnglishPossessive="0" catenateWords="1"
          catenateNumbers="1" catenateAll="1" preserveOriginal="0"/>

      <!-- in this example, we will only use synonyms at query time
      <filter class="solr.SynonymFilterFactory"
          synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
      -->
    </analyzer>
    <analyzer type="query">
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.KStemFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
          words="stopwords.txt"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="0" generateNumberParts="0" splitOnCaseChange="0"
          splitOnNumerics="0" stemEnglishPossessive="0" catenateWords="1"
          catenateNumbers="1" catenateAll="1" preserveOriginal="0"/>
    </analyzer>
  </fieldType>
 
 
 
  Do you have any thought the behavior or how to get this?
 
  Thanks
 
  Ravi
 



Re: How do I disable distributed search feature when I have only one shard

2014-07-21 Thread Erick Erickson
Are you using CloudSolrServer in your SolrJ program?

No matter what, the distrib=false should be keeping the
query from going to more than one shard.

So I'd check the logs and see if the suspect query appears in
more than one node.

FWIW,
Erick


On Mon, Jul 21, 2014 at 4:13 PM, pramodEbay prmaha...@ebay.com wrote:

 Hi there,

 We have a solr cloud set up with only one shard. There is one leader and 15
 followers. So the data is replicated on 15 nodes. When we run a solr query,
 only one node should handle the request and we do not need any distributed
 search feature as all the nodes are exact copies of each other.

  Under certain load scenarios, we are seeing the SolrJ API is adding
  isShard=true&distrib=false&shard.url=A,B,C etc. to all the queries. Is the
  Solr query waiting for responses from A, B and C before returning back to
  the client? If that is true, it is unnecessary and causing problems for us
  under heavy load.

  The thing is, somehow, these parameters are automagically added at query
  time. How do we disable this? The SolrJ query that we build programmatically
  does not add these three parameters. Is there some configuration we can turn
  on to tell SolrJ not to add these parameters to the Solr request?

 Thanks,
 Pramod



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/How-do-I-disable-distributed-search-feature-when-I-have-only-one-shard-tp4148449.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: SolrCloud extended warmup support

2014-07-21 Thread Erick Erickson
I've never seen it necessary to run thousands of queries
to warm Solr. Usually less than a dozen will work fine. My
challenge would be for you to measure performance differences
on queries after running, say, 12 well-chosen queries as
opposed to hundreds/thousands. I bet that if
1> you search across all the relevant fields, you'll fill up the
     low-level caches for those fields.
2> you facet on all the fields you intend to facet on.
3> you sort on all the fields you intend to sort on.
4> you specify some filter queries. This is fuzzy since it
     really depends on you being able to predict what
     those will be for firstSearcher. Things like "in the
     last day/week/month" can be pre-configured, but
     others you won't get. BTW, here's a blog about
     why "in the last day" fq clauses can be tricky.
   http://searchhub.org/2012/02/23/date-math-now-and-filter-queries/

that you'll pretty much nail warmup and be fine. Note that
you can do all the faceting in a single query. Specifying
the primary, secondary, etc. sorts will fill those caches.
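For reference, a firstSearcher sketch along those lines for solrconfig.xml — the query, fq, sort, and facet field names below are placeholders, not taken from this thread:

```xml
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- One query that searches, filters, sorts, and facets on the
         fields you actually use; names here are placeholders. -->
    <lst>
      <str name="q">text:solr</str>
      <str name="fq">date:[NOW/DAY-7DAYS TO NOW/DAY]</str>
      <str name="sort">price asc, popularity desc</str>
      <str name="facet">true</str>
      <str name="facet.field">category</str>
    </lst>
  </arr>
</listener>
```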

Best,
Erick


On Mon, Jul 21, 2014 at 5:07 PM, Jeff Wartes jwar...@whitepages.com wrote:


 On 7/21/14, 4:50 PM, Shawn Heisey s...@elyograg.org wrote:

 On 7/21/2014 5:37 PM, Jeff Wartes wrote:
  I’d like to ensure an extended warmup is done on each SolrCloud node
 prior to that node serving traffic.
  I can do certain things prior to starting Solr, such as pump the index
 dir through /dev/null to pre-warm the filesystem cache, and post-start I
 can use the ping handler with a health check file to keep the node
 out of the client’s load balancer until I’m ready.
  What I seem to be missing is control over when a node starts
 participating in queries sent to the other nodes.
 
  I can, of course, add solrconfig.xml firstSearcher queries, which I
 assume (and fervently hope!) happens before a node registers itself in
 ZK clusterstate.json as ready for work, but that doesn’t scale so well
 if I want that initial warmup to run thousands of queries, or run them
 with some parallelism. I’m storing solrconfig.xml in ZK, so I’m sensitive
 to the size.
 
  Any ideas, or corrections to my assumptions?
 
 I think that firstSearcher/newSearcher (and making sure useColdSearcher
 is set to false) is going to be the only way you can do this in a way
 that's compatible with SolrCloud.  If you were doing manual distributed
 search without SolrCloud, you'd have more options available.
 
 If useColdSearcher is set to false, that should keep *everything* from
 using the searcher until the warmup has finished.  I cannot be certain
 that this is the case, but I have some reasonable confidence that this
 is how it works.  If you find that it doesn't behave this way, I'd call
 it a bug.
 
 Thanks,
 Shawn


 Thanks for the quick reply. Since distributed search latency is the max of
 the shard sub-requests, I’m trying my best to minimize any spikes in
 cluster latency due to node restarts.
 I double-checked useColdSearcher was false, but the doc says this means
 requests “block until the first searcher is done warming”, which
 translates pretty clearly to “latency spike”. The more I think about it,
 the more worried I am that a node might indeed register itself in
 live_nodes and get distributed requests before it’s got a searcher to work
 with. *Especially* if I have lots of serial firstSearcher queries.

 I’ll look through the code myself tomorrow, but if anyone can help
 confirm/deny the order of operations here, I’d appreciate it.




DocValues without re-index?

2014-07-21 Thread Michael Ryan
Is it possible to use DocValues on an existing index without first re-indexing?

-Michael