Re: Every minute update on solrclound

2014-05-03 Thread Shawn Heisey
On 5/3/2014 8:01 PM, eakarsu wrote:
> I am using Solr 4.3.1 with the attached solrconfig.xml file.
> No softCommit is enabled in the config file, but the master node is
> continuously receiving an update every minute. I could not figure out where
> this update is coming from.
> 
> Solr cloud master:
> [04/May/2014:01:50:54 +] "GET /solr/trcollection2/update/json?commit=true?softCommit=true HTTP/1.1" 200 160

Your servlet container should have a request log, or the ability to have
a request log.  The Jetty that's included in the Solr example has a
commented section in etc/jetty.xml that will enable a request log, written
to files with names like logs/request.2014_05_04.log ... you just have to
uncomment it.

If you're using something like Tomcat instead of the example, you'll
need to figure out where it logs requests, or how to enable its request log.

Once you enable and/or find your servlet container's request log, this
should show you the IP address or hostname where the request originates.

Thanks,
Shawn
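
Once the request log is available, a short script can tally which clients
send the update requests. The sketch below assumes an NCSA-style line layout
(client address first, then the bracketed timestamp and the quoted request
line); the sample lines, the addresses in them, and the function name
update_clients are made up for illustration, and the regular expression may
need adjusting for Tomcat or a custom log format.

```python
import re
from collections import Counter

# Assumed layout: host ident user [timestamp] "request" status bytes
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<ts>[^\]]*)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def update_clients(lines, needle="/update"):
    """Count requests per client host whose path contains `needle`."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and needle in match.group("path"):
            counts[match.group("host")] += 1
    return counts


# Hypothetical sample lines; the hosts are invented for this example.
sample = [
    '10.0.0.7 - - [04/May/2014:01:51:53 +0000] "POST /solr/trcollection2/update/json?commit=true HTTP/1.1" 200 55',
    '10.0.0.7 - - [04/May/2014:01:52:53 +0000] "POST /solr/trcollection2/update/json?commit=true HTTP/1.1" 200 55',
    '10.0.0.9 - - [04/May/2014:01:52:54 +0000] "GET /solr/trcollection2/select?q=*:* HTTP/1.1" 200 160',
]

for host, count in update_clients(sample).most_common():
    print(host, count)
```

With the sample lines above, only the host that sent the two update requests
is reported; the select request from the other host is ignored.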



Every minute update on solrclound

2014-05-03 Thread eakarsu
I am using Solr 4.3.1 with the attached solrconfig.xml file.
No softCommit is enabled in the config file, but the master node is
continuously receiving an update every minute. I could not figure out where
this update is coming from.

Solr cloud master:
[04/May/2014:01:50:54 +] "GET /solr/trcollection2/update/json?commit=true?softCommit=true HTTP/1.1" 200 160
[04/May/2014:01:51:53 +] "POST /solr/trcollection2/update/json?commit=true HTTP/1.1" 200 55
[04/May/2014:01:51:54 +] "GET /solr/trcollection2/update/json?commit=true?softCommit=true HTTP/1.1" 200 160
[04/May/2014:01:52:53 +] "POST /solr/trcollection2/update/json?commit=true HTTP/1.1" 200 55
[04/May/2014:01:52:54 +] "GET /solr/trcollection2/update/json?commit=true?softCommit=true HTTP/1.1" 200 160
[04/May/2014:01:53:53 +] "POST /solr/trcollection2/update/json?commit=true HTTP/1.1" 200 55
[04/May/2014:01:53:54 +] "GET /solr/trcollection2/update/json?commit=true?softCommit=true HTTP/1.1" 200 160
[04/May/2014:01:54:53 +] "POST /solr/trcollection2/update/json?commit=true HTTP/1.1" 200 55
- [04/May/2014:01:54:54 +] "GET /solr/trcollection2/update/json?commit=true?softCommit=true HTTP/1.1" 200 160



Solr cloud slave solrconfig.xml

- [04/May/2014:01:54:39 +] "POST /solr/trcollection2_shard9_replica1/update HTTP/1.1" 200 41
- - [04/May/2014:01:54:40 +] "POST /solr/trcollection2_shard9_replica1/update HTTP/1.1" 200 41
- - [04/May/2014:01:55:39 +] "POST /solr/trcollection2_shard9_replica1/update HTTP/1.1" 200 41
- - [04/May/2014:01:55:40 +] "POST /solr/trcollection2_shard9_replica1/update HTTP/1.1" 200 41
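
The master log above shows one POST and one GET per minute, a second apart.
A quick way to confirm that kind of periodicity is to compute the gaps between
successive timestamps of the same request type. The sketch below is
illustrative only: it parses just the date and time portion of the bracketed
timestamp (the zone offset is truncated in the listing), it assumes an
English locale for the month abbreviation, and the helper name
gaps_in_seconds is made up for this example.

```python
from datetime import datetime


def gaps_in_seconds(stamps, fmt="%d/%b/%Y:%H:%M:%S"):
    """Return the gaps, in seconds, between consecutive timestamps."""
    times = [datetime.strptime(s, fmt) for s in stamps]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


# POST timestamps copied from the master log listing, zone offset omitted.
post_stamps = [
    "04/May/2014:01:51:53",
    "04/May/2014:01:52:53",
    "04/May/2014:01:53:53",
    "04/May/2014:01:54:53",
]

print(gaps_in_seconds(post_stamps))  # [60, 60, 60]
```

A constant 60-second gap suggests a scheduled client (for example a cron job
or an application timer), though the log alone cannot confirm that.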




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Every-minute-update-on-solrclound-tp4134489.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Spellchecking - looking for general advice

2014-05-03 Thread Susheel Kumar
Got it.  Are you also considering stemming and phonetic matching here?  For
example, phonetic matching may catch some of the restaurant variations,
stemming may reduce recruiter and recruited to a common base word, and spell
check would then be the catch-all for whatever remains.
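
As a rough illustration of the phonetic idea, the sketch below implements
plain American Soundex in Python and shows that "restraunt" and "restaurant"
receive the same code. This is a standalone example, not Solr's phonetic
filter; the function name soundex and the choice of Soundex over other
encoders are assumptions made for the example.

```python
# Letter-to-digit table for American Soundex.
CODES = {}
for digit, letters in {
    "1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r",
}.items():
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character American Soundex code for `word`."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = [first.upper()]
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
        prev = code  # a vowel resets prev, so a repeated code counts again
    return ("".join(result) + "000")[:4]


print(soundex("restraunt"))   # R236
print(soundex("restaurant"))  # R236
```

Note that Soundex also gives "recruter", "recruted" and "recruiter" the same
code (R263), so a phonetic key on its own would not decide between those two
candidates.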

-Original Message-
From: Maciej Dziardziel [mailto:fied...@gmail.com]
Sent: Saturday, May 03, 2014 10:15 AM
To: solr-user@lucene.apache.org
Subject: Re: Spellchecking - looking for general advice

Hi

I've set it to 2, but a Python implementation of Levenshtein says it's 3 for
restraunt -> restaurant.

On Sat, May 3, 2014 at 2:44 PM, Susheel Kumar wrote:
> What value have you set for maxEdits? It should catch the restaurant example
> with the edit distance set to 2.
>
> Thanks,
> Susheel
>
> -Original Message-
> From: Maciej Dziardziel [mailto:fied...@gmail.com]
> Sent: Friday, May 02, 2014 7:05 PM
> To: solr-user@lucene.apache.org
> Subject: Spellchecking - looking for general advice
>
> Hi
>
> I was looking at spellcheck (Direct and FileBased) and testing what they can
> do.
> Direct works fine most of the time, but I'd like to find a solution for a few
> corner cases:
>
> 1) With "recruted" and "recruiter" in the index, "recruter" should suggest
> the latter.
> Obviously the distance to the former is smaller, so it may be completely
> arbitrary,
> and perhaps must be handled on the application side rather than in Solr.
> 2) "restraunt" doesn't suggest "restaurant" - I assume that the distance is
> too big for that.
>
> Those are a few examples of queries that spellcheck gets wrong (according to
> my requirements).
> For now I am just looking at possible solutions, and I need to come up with
> an initial concept to have something to show to users and get more feedback,
> likely with more cases to correct.
>
> I'd like to know if there are some tweaks to the spellcheck component I could
> make (or perhaps other ways of doing this with Solr), or am I forced to
> hardcode a list of all such corrections that go beyond what spellcheck can do?
>
> One solution I am considering is to put a list of those special cases
> into FileSpellChecker (it seems to be more relaxed, and handles the
> restraunt case well) and fall back to Direct if this yields no
> results... though I am not sure yet how well that would work in
> practice if the list of misspelled words grew beyond the few I have
> now. It most likely wouldn't scale.
>
> Another possibility would be to analyze the list of queries our users run
> that yield few results and check whether there is a spellchecked version that
> improves on that... but that seems to require a human to review the
> corrections.
>
> Yet another thing I was thinking about would be to pull the terms into a
> separate spellchecker (like aspell) and see if it does a better job or is
> more tweakable.
>
> That's a bit of an open-ended problem, so any advice is welcome.
>
> --
> Maciej Dziardziel
> fied...@gmail.com
> This e-mail message may contain confidential or legally privileged 
> information and is intended only for the use of the intended recipient(s). 
> Any unauthorized disclosure, dissemination, distribution, copying or the 
> taking of any action in reliance on the information herein is prohibited. 
> E-mails are not secure and cannot be guaranteed to be error free as they can 
> be intercepted, amended, or contain viruses. Anyone who communicates with us 
> by e-mail is deemed to have accepted these risks. The Digital Group is not 
> responsible for errors or omissions in this message and denies any 
> responsibility for any damage arising from the use of e-mail. Any opinion 
> defamatory or deemed to be defamatory or  any material which could be 
> reasonably branded to be a species of plagiarism and other statements 
> contained in this message and any attachment are solely those of the author 
> and do not necessarily represent those of the company.



--
Maciej Dziardziel
fied...@gmail.com


Re: PostingHighlighter complains about no offsets

2014-05-03 Thread Michael Sokolov
No, not yet; but that could be one more reason to upgrade.  The 
performance boost from PH is quite nice. In my test, it's about 7x 
faster than the default highlighter, almost 2x faster than "fast" vector 
highlighter, and only about a 50% penalty compared to no highlighting at 
all, so this could be a huge win for us.  I haven't looked at the actual 
highlighting yet.  From what I understand the main sacrifice would be 
phrase-sensitive highlighting, but this could be a good tradeoff.


-Mike

On 5/3/2014 2:39 PM, Markus Jelsma wrote:

Hello Michael, you are not on Lucene 4.8?
https://issues.apache.org/jira/plugins/servlet/mobile#issue/LUCENE-5111


Michael Sokolov wrote:
For posterity, in case anybody follows this thread, I tracked the
problem down to WordDelimiterFilter; apparently it creates an offset of
-1 in some cases, which PostingsHighlighter rejects.

-Mike


On 5/2/2014 10:20 AM, Michael Sokolov wrote:

I checked using the analysis admin page, and I believe there are
offsets being generated (I assume start/end=offsets).  So IDK I am
going to try reindexing again.  Maybe I neglected to reload the config
before I indexed last time.

-Mike

On 05/02/2014 09:34 AM, Michael Sokolov wrote:

I've been wanting to try out the PostingsHighlighter, so I added
storeOffsetsWithPositions to my field definition, enabled the
highlighter in solrconfig.xml,  reindexed and tried it out. When I
issue a query I'm getting this error:

field 'text' was indexed without offsets, cannot highlight

java.lang.IllegalArgumentException: field 'text' was indexed without offsets, cannot highlight
    at org.apache.lucene.search.postingshighlight.PostingsHighlighter.highlightDoc(PostingsHighlighter.java:545)
    at org.apache.lucene.search.postingshighlight.PostingsHighlighter.highlightField(PostingsHighlighter.java:467)
    at org.apache.lucene.search.postingshighlight.PostingsHighlighter.highlightFieldsAsObjects(PostingsHighlighter.java:392)
    at org.apache.lucene.search.postingshighlight.PostingsHighlighter.highlightFields(PostingsHighlighter.java:293)

I've been trying to figure out why the field wouldn't have offsets
indexed, but I just can't see it.  Is there something in the analysis
chain that could be stripping out offsets?

This is the field definition:

(Yes I know PH doesn't require term vectors; I'm keeping them around
for now while I experiment)



Re: Re: PostingHighlighter complains about no offsets

2014-05-03 Thread Ahmet Arslan
Hi,

So is this all about posIncAttribute?

I had opened https://issues.apache.org/jira/browse/SOLR-3193 about
ReversedWildcardFilterFactory causing highlighter exceptions. I wonder
whether ReversedWildcardFilter has a similar bug.

Ahmet


On Saturday, May 3, 2014 9:39 PM, Markus Jelsma wrote:
Hello Michael, you are not on Lucene 4.8?
https://issues.apache.org/jira/plugins/servlet/mobile#issue/LUCENE-5111


Michael Sokolov wrote:
For posterity, in case anybody follows this thread, I tracked the 
problem down to WordDelimiterFilter; apparently it creates an offset of 
-1 in some cases, which PostingsHighlighter rejects.

-Mike


On 5/2/2014 10:20 AM, Michael Sokolov wrote:
> I checked using the analysis admin page, and I believe there are 
> offsets being generated (I assume start/end=offsets).  So IDK I am 
> going to try reindexing again.  Maybe I neglected to reload the config 
> before I indexed last time.
>
> -Mike
>
> On 05/02/2014 09:34 AM, Michael Sokolov wrote:
>> I've been wanting to try out the PostingsHighlighter, so I added 
>> storeOffsetsWithPositions to my field definition, enabled the 
>> highlighter in solrconfig.xml,  reindexed and tried it out. When I 
>> issue a query I'm getting this error:
>>
>> field 'text' was indexed without offsets, cannot highlight
>>
>> java.lang.IllegalArgumentException: field 'text' was indexed without offsets, cannot highlight
>>     at org.apache.lucene.search.postingshighlight.PostingsHighlighter.highlightDoc(PostingsHighlighter.java:545)
>>     at org.apache.lucene.search.postingshighlight.PostingsHighlighter.highlightField(PostingsHighlighter.java:467)
>>     at org.apache.lucene.search.postingshighlight.PostingsHighlighter.highlightFieldsAsObjects(PostingsHighlighter.java:392)
>>     at org.apache.lucene.search.postingshighlight.PostingsHighlighter.highlightFields(PostingsHighlighter.java:293)
>>
>> I've been trying to figure out why the field wouldn't have offsets 
>> indexed, but I just can't see it.  Is there something in the analysis 
>> chain that could be stripping out offsets?
>>
>> This is the field definition:
>>
>> multiValued="false" termVectors="true" termPositions="true" 
>> termOffsets="true" storeOffsetsWithPositions="true" />
>>
>> (Yes I know PH doesn't require term vectors; I'm keeping them around 
>> for now while I experiment)
>>
>> positionIncrementGap="100">
>> stemEnglishPossessive="1" protected="protwords.txt"/>
>> synonyms="synonyms.txt" expand="true" ignoreCase="true"/>
>> dictionary="en_US.dic" affix="en_US.aff" ignoreCase="true"/>
>>
>> protected="protwords.txt"/>
>> dictionary="en_US.dic" affix="en_US.aff" ignoreCase="true"/>
>


Re: Re: PostingHighlighter complains about no offsets

2014-05-03 Thread Markus Jelsma
Hello Michael, you are not on Lucene 4.8?
https://issues.apache.org/jira/plugins/servlet/mobile#issue/LUCENE-5111


Michael Sokolov wrote:
For posterity, in case anybody follows this thread, I tracked the 
problem down to WordDelimiterFilter; apparently it creates an offset of 
-1 in some cases, which PostingsHighlighter rejects.

-Mike


On 5/2/2014 10:20 AM, Michael Sokolov wrote:
> I checked using the analysis admin page, and I believe there are 
> offsets being generated (I assume start/end=offsets).  So IDK I am 
> going to try reindexing again.  Maybe I neglected to reload the config 
> before I indexed last time.
>
> -Mike
>
> On 05/02/2014 09:34 AM, Michael Sokolov wrote:
>> I've been wanting to try out the PostingsHighlighter, so I added 
>> storeOffsetsWithPositions to my field definition, enabled the 
>> highlighter in solrconfig.xml,  reindexed and tried it out. When I 
>> issue a query I'm getting this error:
>>
>> field 'text' was indexed without offsets, cannot highlight
>>
>> java.lang.IllegalArgumentException: field 'text' was indexed without offsets, cannot highlight
>>     at org.apache.lucene.search.postingshighlight.PostingsHighlighter.highlightDoc(PostingsHighlighter.java:545)
>>     at org.apache.lucene.search.postingshighlight.PostingsHighlighter.highlightField(PostingsHighlighter.java:467)
>>     at org.apache.lucene.search.postingshighlight.PostingsHighlighter.highlightFieldsAsObjects(PostingsHighlighter.java:392)
>>     at org.apache.lucene.search.postingshighlight.PostingsHighlighter.highlightFields(PostingsHighlighter.java:293)
>>
>> I've been trying to figure out why the field wouldn't have offsets 
>> indexed, but I just can't see it.  Is there something in the analysis 
>> chain that could be stripping out offsets?
>>
>> This is the field definition:
>>
>> multiValued="false" termVectors="true" termPositions="true" 
>> termOffsets="true" storeOffsetsWithPositions="true" />
>>
>> (Yes I know PH doesn't require term vectors; I'm keeping them around 
>> for now while I experiment)
>>
>> positionIncrementGap="100">
>> stemEnglishPossessive="1" protected="protwords.txt"/>
>> synonyms="synonyms.txt" expand="true" ignoreCase="true"/>
>> dictionary="en_US.dic" affix="en_US.aff" ignoreCase="true"/>
>>
>> protected="protwords.txt"/>
>> dictionary="en_US.dic" affix="en_US.aff" ignoreCase="true"/>
>



Re: PostingHighlighter complains about no offsets

2014-05-03 Thread Michael Sokolov
For posterity, in case anybody follows this thread, I tracked the 
problem down to WordDelimiterFilter; apparently it creates an offset of 
-1 in some cases, which PostingsHighlighter rejects.
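
To illustrate the kind of problem described here, the sketch below checks a
list of token offsets for the sort of invariants an offset-aware index would
typically require: start offsets are non-negative, each end offset is at
least its start offset, and start offsets never move backwards. The Token
tuple, the function offset_problems and the sample values are hypothetical,
and this is not Lucene's code; it only shows how a single -1 offset from one
filter makes an otherwise valid token stream unusable.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    text: str
    start: int
    end: int


def offset_problems(tokens: List[Token]) -> List[str]:
    """Return a description of every token whose offsets look illegal."""
    problems = []
    last_start = 0
    for tok in tokens:
        if tok.start < 0 or tok.end < tok.start:
            problems.append(
                f"{tok.text!r}: illegal offsets start={tok.start} end={tok.end}"
            )
        elif tok.start < last_start:
            problems.append(
                f"{tok.text!r}: start offset {tok.start} goes backwards "
                f"from {last_start}"
            )
        else:
            last_start = tok.start
    return problems


# Hypothetical token streams; the second contains one negative start offset.
good = [Token("wi", 0, 2), Token("fi", 3, 5), Token("router", 6, 12)]
bad = [Token("wi", 0, 2), Token("fi", -1, 5), Token("router", 6, 12)]

print(offset_problems(good))  # []
print(offset_problems(bad))
```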


-Mike


On 5/2/2014 10:20 AM, Michael Sokolov wrote:
I checked using the analysis admin page, and I believe there are 
offsets being generated (I assume start/end=offsets).  So IDK I am 
going to try reindexing again.  Maybe I neglected to reload the config 
before I indexed last time.


-Mike

On 05/02/2014 09:34 AM, Michael Sokolov wrote:
I've been wanting to try out the PostingsHighlighter, so I added 
storeOffsetsWithPositions to my field definition, enabled the 
highlighter in solrconfig.xml,  reindexed and tried it out. When I 
issue a query I'm getting this error:


field 'text' was indexed without offsets, cannot highlight

java.lang.IllegalArgumentException: field 'text' was indexed without offsets, cannot highlight
    at org.apache.lucene.search.postingshighlight.PostingsHighlighter.highlightDoc(PostingsHighlighter.java:545)
    at org.apache.lucene.search.postingshighlight.PostingsHighlighter.highlightField(PostingsHighlighter.java:467)
    at org.apache.lucene.search.postingshighlight.PostingsHighlighter.highlightFieldsAsObjects(PostingsHighlighter.java:392)
    at org.apache.lucene.search.postingshighlight.PostingsHighlighter.highlightFields(PostingsHighlighter.java:293)

I've been trying to figure out why the field wouldn't have offsets 
indexed, but I just can't see it.  Is there something in the analysis 
chain that could be stripping out offsets?

This is the field definition:

multiValued="false" termVectors="true" termPositions="true" 
termOffsets="true" storeOffsetsWithPositions="true" />

(Yes I know PH doesn't require term vectors; I'm keeping them around 
for now while I experiment)

positionIncrementGap="100">
stemEnglishPossessive="1" protected="protwords.txt"/>
synonyms="synonyms.txt" expand="true" ignoreCase="true"/>
dictionary="en_US.dic" affix="en_US.aff" ignoreCase="true"/>

protected="protwords.txt"/>
dictionary="en_US.dic" affix="en_US.aff" ignoreCase="true"/>







Re: Spellchecking - looking for general advice

2014-05-03 Thread Maciej Dziardziel
Hi

I've set it to 2, but a Python implementation of Levenshtein says it's 3
for restraunt -> restaurant.
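
The distance can be checked with a few lines of Python. The function below is
a minimal sketch of the classic dynamic-programming Levenshtein distance
(insert, delete and substitute each cost 1); it is not the implementation
used in the thread, and the name levenshtein is chosen for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions, substitutions cost 1."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]


print(levenshtein("restraunt", "restaurant"))  # 3
print(levenshtein("recruter", "recruted"))     # 1
print(levenshtein("recruter", "recruiter"))    # 1
```

A distance of 3 falls outside a maxEdits of 2, which would explain the
missing suggestion. Under this plain metric both "recruted" and "recruiter"
are one edit away from "recruter"; other scoring schemes may rank them
differently.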

On Sat, May 3, 2014 at 2:44 PM, Susheel Kumar wrote:
> What value have you set for maxEdits? It should catch the restaurant example
> with the edit distance set to 2.
>
> Thanks,
> Susheel
>
> -Original Message-
> From: Maciej Dziardziel [mailto:fied...@gmail.com]
> Sent: Friday, May 02, 2014 7:05 PM
> To: solr-user@lucene.apache.org
> Subject: Spellchecking - looking for general advice
>
> Hi
>
> I was looking at spellcheck (Direct and FileBased) and testing what they can
> do.
> Direct works fine most of the time, but I'd like to find a solution for a few
> corner cases:
>
> 1) With "recruted" and "recruiter" in the index, "recruter" should suggest
> the latter.
> Obviously the distance to the former is smaller, so it may be completely
> arbitrary,
> and perhaps must be handled on the application side rather than in Solr.
> 2) "restraunt" doesn't suggest "restaurant" - I assume that the distance is
> too big for that.
>
> Those are a few examples of queries that spellcheck gets wrong (according to
> my requirements).
> For now I am just looking at possible solutions, and I need to come up with
> an initial concept to have something to show to users and get more feedback,
> likely with more cases to correct.
>
> I'd like to know if there are some tweaks to the spellcheck component I could
> make (or perhaps other ways of doing this with Solr), or am I forced to
> hardcode a list of all such corrections that go beyond what spellcheck can do?
>
> One solution I am considering is to put a list of those special cases into
> FileSpellChecker (it seems to be more relaxed, and handles the restraunt case
> well) and fall back to Direct if this yields no results... though I am not
> sure yet how well that would work in practice if the list of misspelled words
> grew beyond the few I have now. It most likely wouldn't scale.
>
> Another possibility would be to analyze the list of queries our users run
> that yield few results and check whether there is a spellchecked version that
> improves on that... but that seems to require a human to review the
> corrections.
>
> Yet another thing I was thinking about would be to pull the terms into a
> separate spellchecker (like aspell) and see if it does a better job or is
> more tweakable.
>
> That's a bit of an open-ended problem, so any advice is welcome.
>
> --
> Maciej Dziardziel
> fied...@gmail.com



-- 
Maciej Dziardziel
fied...@gmail.com


RE: Spellchecking - looking for general advice

2014-05-03 Thread Susheel Kumar
What value have you set for maxEdits? It should catch the restaurant example
with the edit distance set to 2.

Thanks,
Susheel

-Original Message-
From: Maciej Dziardziel [mailto:fied...@gmail.com]
Sent: Friday, May 02, 2014 7:05 PM
To: solr-user@lucene.apache.org
Subject: Spellchecking - looking for general advice

Hi

I was looking at spellcheck (Direct and FileBased) and testing what they can do.
Direct works fine most of the time, but I'd like to find a solution for a few 
corner cases:

1) With "recruted" and "recruiter" in the index, "recruter" should suggest the 
latter.
Obviously the distance to the former is smaller, so it may be completely 
arbitrary,
and perhaps must be handled on the application side rather than in Solr.
2) "restraunt" doesn't suggest "restaurant" - I assume that the distance is too 
big for that.

Those are a few examples of queries that spellcheck gets wrong (according to my 
requirements).
For now I am just looking at possible solutions, and I need to come up with 
an initial concept to have something to show to users and get more feedback, 
likely with more cases to correct.

I'd like to know if there are some tweaks to the spellcheck component I could 
make (or perhaps other ways of doing this with Solr), or am I forced to hardcode 
a list of all such corrections that go beyond what spellcheck can do?

One solution I am considering is to put a list of those special cases into 
FileSpellChecker (it seems to be more relaxed, and handles the restraunt case 
well) and fall back to Direct if this yields no results... though I am not sure 
yet how well that would work in practice if the list of misspelled words grew 
beyond the few I have now. It most likely wouldn't scale.
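
The fallback idea can be sketched outside Solr as a two-stage lookup: a
hand-maintained table of known misspellings is consulted first, and only when
it has no entry does a similarity search over the vocabulary run. Everything
below is hypothetical (the CURATED table, the VOCABULARY list, the suggest
function, and the use of Python's difflib similarity ratio in place of Solr's
spellcheckers); it illustrates the control flow, not how FileBasedSpellChecker
or DirectSolrSpellChecker work internally.

```python
import difflib

# Hypothetical curated corrections that a distance-based checker misses.
CURATED = {"restraunt": "restaurant", "recruter": "recruiter"}

# Hypothetical vocabulary standing in for the indexed terms.
VOCABULARY = ["recruted", "recruiter", "restaurant"]


def suggest(word, cutoff=0.8):
    """Return (suggestion, source); the curated table wins over the fallback."""
    word = word.lower()
    if word in CURATED:
        return CURATED[word], "curated"
    close = difflib.get_close_matches(word, VOCABULARY, n=1, cutoff=cutoff)
    if close:
        return close[0], "fallback"
    return None, "none"


print(suggest("restraunt"))  # ('restaurant', 'curated')
print(suggest("restaurnt"))  # ('restaurant', 'fallback')
print(suggest("zzzz"))       # (None, 'none')
```

The curated table stays small only as long as the fallback handles the common
cases, which is the scaling concern raised above.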

Another possibility would be to analyze the list of queries our users run that 
yield few results and check whether there is a spellchecked version that 
improves on that... but that seems to require a human to review the corrections.

Yet another thing I was thinking about would be to pull the terms into a 
separate spellchecker (like aspell) and see if it does a better job or is more 
tweakable.

That's a bit of an open-ended problem, so any advice is welcome.

--
Maciej Dziardziel
fied...@gmail.com