subject:"Different Results.."

Re: Why do I get different results for the same query with two Solr versions?

2021-01-04 Thread nettadalet

Tulsi wrote
> Can you post the managed schema and solrconfig content here ?

Schema for the 4.6 index (I omitted all non-relevant data):

Schema for the 7.5 index (I omitted all non-relevant data):

About the solrconfig.xml file - I don't think I can share it because it may
contain sensitive information. Is there something specific from this file
that may be relevant for our discussion?


Tulsi wrote
> Do try the solr admin analysis screen
> once as well to see the behaviour for this field.
> https://lucene.apache.org/solr/guide/7_6/index.html

I looked at the analysis screen, but it wasn't helpful. That's why I started
using the "debug=query" parameter and the content of parsedquery.



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Why do I get different results for the same query with two Solr versions?

2020-12-29 Thread Tulsi Das

Can you post the managed schema and solrconfig content here ?

Do try the solr admin analysis screen
once as well to see the behaviour for this field.

https://lucene.apache.org/solr/guide/7_6/index.html

On Sun, 27 Dec, 2020, 6:54 pm nettadalet,  wrote:

> Thank you, that was helpful!
>
> For Solr 4.6 I get
> "parsedquery": "PhraseQuery(TITLE_ItemCode_t:\"ki 7\")"
>
> For Solr 7.5 I get
> "parsedquery":"+(+(TITLE_ItemCode_t:ki7 (+TITLE_ItemCode_t:ki
> +TITLE_ItemCode_t:7)))"
>
> So this is the cause of the difference in the search result, but I still
> don't know why the parsedquery is different between the two versions.
> Any idea/guess?
> Is it some internal implementation that changed sometime between 4.6 and
> 7.5?
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>

Re: Re:Re: Why do I get different results for the same query with two Solr versions?

2020-12-29 Thread nettadalet

Hi,
thanks for the comment, but I tried both "sow=false" and "sow=true"
and I still get the same result. For the query (TITLE_ItemCode_t:KI_7) I still
see:
Solr 4.6: "parsedquery": "PhraseQuery(TITLE_ItemCode_t:\"ki 7\")"
Solr 7.5: "parsedquery":"+(+(TITLE_ItemCode_t:ki7 (+TITLE_ItemCode_t:ki
+TITLE_ItemCode_t:7)))"



Tulsi wrote
> Hi ,
> Yes, this looks related to the change in the default behaviour of the sow
> (split on whitespace) param in Solr 7.
> 
> The sow parameter (short for "Split on Whitespace") now defaults to
> false, which allows support for multi-word synonyms out of the box.
> This parameter is used with the eDismax and standard/"lucene" query
> parsers. If this parameter is not explicitly specified as true, query
> text will not be split on whitespace before analysis.
> 
> https://lucene.apache.org/solr/guide/7_0/major-changes-in-solr-7.html
> 
> 
> On Sun, 27 Dec, 2020, 8:25 pm nettadalet, nsteinberg@ wrote:
> 
>> I added "defType=lucene" to both searches to make sure I use the same
>> query
>> parser, but it didn't change the results.
>>
>>
>>
>> --
>> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>>






Re:Re: Re:Re: Why do I get different results for the same query with two Solr versions?

2020-12-28 Thread xiefengchang

Does sow default to false?
But here it seems to behave as if it were true, right?
For Solr 7.5 I get
"parsedquery":"+(+(text1:ki7 (+text1:ki
+text1:7)))"

At 2020-12-28 01:13:29, "Tulsi Das"  wrote:
>Hi ,
>Yes, this looks related to the change in the default behaviour of the sow
>(split on whitespace) param in Solr 7.
>
>The sow parameter (short for "Split on Whitespace") now defaults to
>false, which allows support for multi-word synonyms out of the box.
>This parameter is used with the eDismax and standard/"lucene" query
>parsers. If this parameter is not explicitly specified as true, query
>text will not be split on whitespace before analysis.
>
>https://lucene.apache.org/solr/guide/7_0/major-changes-in-solr-7.html
>
>
>On Sun, 27 Dec, 2020, 8:25 pm nettadalet,  wrote:
>
>> I added "defType=lucene" to both searches to make sure I use the same query
>> parser, but it didn't change the results.
>>
>>
>>
>> --
>> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>>

Re: Re:Re: Why do I get different results for the same query with two Solr versions?

2020-12-27 Thread Tulsi Das

Hi ,
Yes, this looks related to the change in the default behaviour of the sow
(split on whitespace) param in Solr 7.

The sow parameter (short for "Split on Whitespace") now defaults to
false, which allows support for multi-word synonyms out of the box.
This parameter is used with the eDismax and standard/"lucene" query
parsers. If this parameter is not explicitly specified as true, query
text will not be split on whitespace before analysis.

https://lucene.apache.org/solr/guide/7_0/major-changes-in-solr-7.html
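
To compare the two settings directly, `sow` can be passed explicitly on each request. The following is a minimal Python sketch that only builds the request URLs; the host and core name (`localhost:8983`, `mycore`) are hypothetical placeholders, and the field and query are the ones quoted in this thread. Note that `sow` only affects query text that contains whitespace, and the original poster reports elsewhere in this thread that setting it did not change the parsed query for `KI_7`.

```python
from urllib.parse import urlencode


def select_url(base_url, query, sow):
    """Build a /select URL that sets the sow parameter explicitly."""
    params = {
        "q": query,
        "defType": "lucene",                   # standard query parser
        "sow": "true" if sow else "false",     # split on whitespace
        "debug": "query",                      # include parsedquery in the response
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host and core name, used only for illustration.
BASE = "http://localhost:8983/solr/mycore"
url_sow_true = select_url(BASE, "TITLE_ItemCode_t:KI_7", True)
url_sow_false = select_url(BASE, "TITLE_ItemCode_t:KI_7", False)
print(url_sow_true)
print(url_sow_false)
```

Running both URLs against each Solr version and comparing the `parsedquery` values would show whether `sow` is actually involved.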

On Sun, 27 Dec, 2020, 8:25 pm nettadalet,  wrote:

> I added "defType=lucene" to both searches to make sure I use the same query
> parser, but it didn't change the results.
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>

Re: Re:Re: Why do I get different results for the same query with two Solr versions?

2020-12-27 Thread nettadalet

I added "defType=lucene" to both searches to make sure I use the same query
parser, but it didn't change the results.




Re: Re:Re: Why do I get different results for the same query with two Solr versions?

2020-12-27 Thread nettadalet

I'm not sure how to check the implementation of the query parser, or how to
change the query parser that I use. I think I'm using the standard query
parser.

I use Solr Admin to run the queries. If I look at the URL, I see
Solr 4.6:
select?q=TITLE_ItemCode_t:KI_7=TITLE_ItemCode_t
Solr 7.5:
select?q=TITLE_ItemCode_t:KI_7=TITLE_ItemCode_t

Should I change something?
Where should I look?




Re:Re: Why do I get different results for the same query with two Solr versions?

2020-12-27 Thread xiefengchang

Which query parser are you using? To answer your question, I think you need to
check the implementation of the query parser.

At 2020-12-27 21:23:59, "nettadalet"  wrote:
>Thank you, that was helpful!
>
>For Solr 4.6 I get 
>"parsedquery": "PhraseQuery(TITLE_ItemCode_t:\"ki 7\")"
>
>For Solr 7.5 I get
>"parsedquery":"+(+(TITLE_ItemCode_t:ki7 (+TITLE_ItemCode_t:ki
>+TITLE_ItemCode_t:7)))"
>
>So this is the cause of the difference in the search result, but I still
>don't know why the parsedquery is different between the two versions.
>Any idea/guess?
>Is it some internal implementation that changed sometime between 4.6 and
>7.5?
>
>
>
>--
>Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Why do I get different results for the same query with two Solr versions?

2020-12-27 Thread nettadalet

Thank you, that was helpful!

For Solr 4.6 I get 
"parsedquery": "PhraseQuery(TITLE_ItemCode_t:\"ki 7\")"

For Solr 7.5 I get
"parsedquery":"+(+(TITLE_ItemCode_t:ki7 (+TITLE_ItemCode_t:ki
+TITLE_ItemCode_t:7)))"

So this is the cause of the difference in the search result, but I still
don't know why the parsedquery is different between the two versions.
Any idea/guess?
Is it some internal implementation that changed sometime between 4.6 and
7.5?
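
The effect of the two parsed queries can be reproduced with a small Python simulation. This is only a sketch under stated assumptions: the analyzer chain is not preserved in the archive, so `split_parts` is a rough stand-in for the word-delimiter filter (split on non-alphanumerics and on letter/digit transitions, then lowercase), and it ignores catenated tokens and real token positions. The six values are the ones listed in the original post.

```python
import re

DOCS = [
    "KI_d5e7b43a", "KI_b7c490bd", "KI_7df2f026",
    "KI_fa7d129d", "KI_5867aec7", "KI_7c3c0b93",
]


def split_parts(value):
    """Rough stand-in for the word-delimiter filter plus lowercasing."""
    return [part.lower() for part in re.findall(r"[A-Za-z]+|[0-9]+", value)]


def phrase_match(tokens, phrase):
    """True if the phrase terms occur next to each other, in order."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def boolean_match(tokens, catenated, parts):
    """True if the catenated term occurs, or all parts occur anywhere."""
    return catenated in tokens or all(p in tokens for p in parts)


# Solr 4.6 style: PhraseQuery(field:"ki 7")
phrase_hits = [d for d in DOCS if phrase_match(split_parts(d), ["ki", "7"])]
# Solr 7.5 style: field:ki7 (+field:ki +field:7)
boolean_hits = [d for d in DOCS
                if boolean_match(split_parts(d), "ki7", ["ki", "7"])]

print(phrase_hits)
print(len(boolean_hits))
```

Under these assumptions the phrase form matches only the two values where `7` directly follows `KI`, while the boolean form matches every value that contains both `ki` and `7` as separate terms, which is consistent with the 2 versus 6 results reported.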




Re: Why do I get different results for the same query with two Solr versions?

2020-12-24 Thread Tulsi Das

Hi,
Try adding debug=true or debug=query to the URL and look at the parsed query at
the end of the response.
That should show you why the results are different.
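
As a rough illustration of this suggestion, the Python sketch below builds a request URL with `debug=query` and pulls `parsedquery` out of a JSON response. The host and core name are hypothetical, and the sample response is hand-written to mirror the Solr 4.6 value quoted in this thread rather than captured from a live server.

```python
import json
from urllib.parse import urlencode


def debug_query_url(base_url, query):
    """Build a /select URL that asks Solr to include query debug output."""
    params = {"q": query, "debug": "query", "wt": "json"}
    return base_url + "/select?" + urlencode(params)


def parsed_query(response_text):
    """Extract debug.parsedquery from a JSON response body."""
    return json.loads(response_text)["debug"]["parsedquery"]


# Hypothetical host and core name, used only for illustration.
url = debug_query_url("http://localhost:8983/solr/mycore",
                      "TITLE_ItemCode_t:KI_7")

# Hand-written sample body; a real response contains many more keys.
sample = '{"debug": {"parsedquery": "PhraseQuery(TITLE_ItemCode_t:\\"ki 7\\")"}}'

print(url)
print(parsed_query(sample))
```

Comparing the extracted string from each Solr version side by side makes the difference in query structure easy to see.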


On Thu, 24 Dec, 2020, 8:05 pm nettadalet,  wrote:

> Hello,
>
> I have the same field type defined in Solr 4.6 and Solr 7.5. When I
> search with both versions, I get different results, and I don't know why.
>
> I have the following *field type definition in Solr 4.6*:
>  positionIncrementGap="1000">
> 
> 
> 
>  words="stopwords.txt" />
>  generateWordParts="1"
> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="0"/>
> 
> 
> 
> 
> 
>  synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
>  ignoreCase="true"
> words="stopwords.txt"
> />
>  generateWordParts="1"
> generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> catenateAll="0" splitOnCaseChange="0"/>
> 
> 
> 
>
>
> I have the following *field type definition in Solr 7.5*:
>  positionIncrementGap="1000">
> 
> 
> 
>  words="stopwords.txt" />
>  generateWordParts="1"
> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="0"/>
> 
> 
> 
> 
> 
> 
>  synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
> ignoreCase="true"
>words="stopwords.txt"
>/>
>  generateWordParts="1"
> generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> catenateAll="0" splitOnCaseChange="0"/>
>     
>     
> 
>
> * I tried to use solr.WordDelimiterFilterFactory with Solr 7.5 instead of
> solr.WordDelimiterGraphFilterFactory so the field types will be more alike,
> but the result was the same.
>
> I have the following *6 values set for field text1 of type text_type1 for 6
> different documents* (the type(s) from above):
> KI_d5e7b43a
> KI_b7c490bd
> KI_7df2f026
> KI_fa7d129d
> KI_5867aec7
> KI_7c3c0b93
>
>
> My query is *text1=KI_7*.
> Using Solr 4.6, I get 2 results - KI_7df2f026, KI_7c3c0b93
> Using Solr 7.5, I get all 6 results.
>
> Questions:
> 1. How come I get different results with the same data, when my fields
> definitions are the same (as far as I can tell)?
>
> 2. What are the expected results?
> I think that the results Solr 7.5 returns are the correct ones, since at
> the end of the analysis I get *KI* as a term and *7* as a term, both
> during the indexing analysis and the query analysis, so, to my
> understanding, all 6 results should be found.
> Is this correct? If not, what am I missing? What don't I understand
> correctly?
>
> I would very much appreciate a full/partial answer, but even a link that
> could explain at least the expected results part would be great.
>
> Thanks in advance, I know this might be a tough one to answer [Hope not
> :)]
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>

Why do I get different results for the same query with two Solr versions?

2020-12-24 Thread nettadalet

Hello,

I have the same field type defined in Solr 4.6 and Solr 7.5. When I
search with both versions, I get different results, and I don't know why.

I have the following *field type definition in Solr 4.6*:

I have the following *field type definition in Solr 7.5*:

* I tried to use solr.WordDelimiterFilterFactory with Solr 7.5 instead of
solr.WordDelimiterGraphFilterFactory so the field types will be more alike,
but the result was the same.

I have the following *6 values set for field text1 of type text_type1 for 6
different documents* (the type(s) from above):
KI_d5e7b43a
KI_b7c490bd
KI_7df2f026
KI_fa7d129d
KI_5867aec7
KI_7c3c0b93


My query is *text1=KI_7*.
Using Solr 4.6, I get 2 results - KI_7df2f026, KI_7c3c0b93
Using Solr 7.5, I get all 6 results.

Questions:
1. How come I get different results with the same data, when my fields
definitions are the same (as far as I can tell)?

2. What are the expected results?
I think that the results Solr 7.5 returns are the correct ones, since at the
end of the analysis I get *KI* as a term and *7* as a term, both
during the indexing analysis and the query analysis, so, to my
understanding, all 6 results should be found.
Is this correct? If not, what am I missing? What don't I understand
correctly?

I would very much appreciate a full/partial answer, but even a link that
could explain at least the expected results part would be great. 

Thanks in advance, I know this might be a tough one to answer [Hope not  :)]




Re: different results in numFound vs using the cursor

2019-11-12 Thread rhys J

> : I am going to adjust my schema, re-index, and try again. See if that
> : doesn't fix this problem. I didn't know that having the uniqueKey be a
> : textField was a bad idea.
>
>
> https://lucene.apache.org/solr/guide/8_3/other-schema-elements.html#OtherSchemaElements-UniqueKey
>
> "The fieldType of uniqueKey must not be analyzed"
>
> (hence my comment about "possible, but hard to get right" ... you can use
> something like the KeywordTokenizer, but at that point you might as well
> use StrField except in some really esoteric special situations)
>
>
Good news. I added a field called ID, and made it string. Then I deleted
documents, re-indexed my data, and tried the search again.

Now solrResults size and numFound size are exactly the same.

Thanks for your help.

Rhys

Re: different results in numFound vs using the cursor

2019-11-12 Thread Chris Hostetter



: > whoa... that's not normal .. what *exactly* does the fieldType declaration
: > (with all analyzers) look like, and what does the  declaration
: > look like?
: >
: >
: 
: 
: 

NOTE: "text_general" != "text_gen_sort"

Assuming your "text_general" declaration looks like it does in the
_default config set, then using that for uniqueKey or sorting is definitely
not a good idea.

If you were *actually* using SortableTextField for your uniqueKeyField ... 
well, that should be ok to *sort* on, but i still wouldn't suggest using 
it as a uniqueKey field ... honestly not sure what behavior that might 
have with things like deleteById, etc...


: I am going to adjust my schema, re-index, and try again. See if that
: doesn't fix this problem. I didn't know that having the uniqueKey be a
: textField was a bad idea.

https://lucene.apache.org/solr/guide/8_3/other-schema-elements.html#OtherSchemaElements-UniqueKey

"The fieldType of uniqueKey must not be analyzed"

(hence my comment about "possible, but hard to get right" ... you can use
something like the KeywordTokenizer, but at that point you might as well
use StrField except in some really esoteric special situations)



-Hoss
http://www.lucidworks.com/

Re: different results in numFound vs using the cursor

2019-11-12 Thread rhys J

On Tue, Nov 12, 2019 at 12:18 PM Chris Hostetter 
wrote:

>
> : > a) What is the fieldType of the uniqueKey field in use?
> : >
> :
> : It is a textField
>
> whoa... that's not normal .. what *exactly* does the fieldType declaration
> (with all analyzers) look like, and what does the  declaration
> look like?
>
>



> you should really never use TextField for a uniqueKey ... it's possible,
> but incredibly tricky to get "right".
>
>
I am going to adjust my schema, re-index, and try again. See if that
doesn't fix this problem. I didn't know that having the uniqueKey be a
textField was a bad idea.


> Independent from that, "sorting" on a TextField doesn't always do what you
> might think (again: depending on the analysis in use)
>
> With a cursorMark you have other factors to consider: i bet what's
> happening is that the post-analysis terms for your docs result in
> duplicate values, so the cursorMark is skipping all docs that have the
> same (post analysis) sort value ... this could also manifest itself in
> other weird ways, like trying to deleteById.
>
> Step #1: switch to using a simple StrField for your uniqueKey field and
> see if that solves all your problems.
>
>
Thanks, doing this now.

Rhys

Re: different results in numFound vs using the cursor

2019-11-12 Thread Chris Hostetter



: > a) What is the fieldType of the uniqueKey field in use?
: >
: 
: It is a textField

whoa... that's not normal .. what *exactly* does the fieldType declaration 
(with all analyzers) look like, and what does the  declaration 
look like?

you should really never use TextField for a uniqueKey ... it's possible, 
but incredibly tricky to get "right".

Independent from that, "sorting" on a TextField doesn't always do what you 
might think (again: depending on the analysis in use)

With a cursorMark you have other factors to consider: i bet what's
happening is that the post-analysis terms for your docs result in
duplicate values, so the cursorMark is skipping all docs that have the
same (post analysis) sort value ... this could also manifest itself in
other weird ways, like trying to deleteById.

Step #1: switch to using a simple StrField for your uniqueKey field and
see if that solves all your problems.
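
The skipping described above can be illustrated with a toy Python model of cursor paging. This is a simplified sketch, not Solr's implementation: each page is taken to be the next `rows` documents whose sort value is strictly greater than the last value returned, and the "analyzed" key (the part before the dash, lowercased) is a made-up stand-in for whatever an analyzed TextField would produce.

```python
def paginate(docs, sort_key, rows):
    """Toy cursor paging: resume strictly after the last sort value seen."""
    ordered = sorted(docs, key=sort_key)
    collected, last = [], None
    while True:
        page = [d for d in ordered
                if last is None or sort_key(d) > last][:rows]
        if not page:
            break
        collected.extend(page)
        last = sort_key(page[-1])
    return collected


# Hypothetical unique keys, used only for illustration.
ids = ["A-1", "A-2", "A-3", "B-1", "B-2", "C-1"]


def analyzed(doc_id):
    """Made-up analysis that maps several ids to the same sort value."""
    return doc_id.split("-")[0].lower()


def exact(doc_id):
    """StrField-like behaviour: the sort value is the id itself."""
    return doc_id


with_analysis = paginate(ids, analyzed, rows=2)
with_strfield = paginate(ids, exact, rows=2)
print(len(with_analysis), len(with_strfield))
```

In this model the analyzed key loses `A-3`, because it shares a sort value with the last document of the first page, while the exact key returns all six ids. That is the same shape of discrepancy as a collected result count smaller than numFound.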


-Hoss
http://www.lucidworks.com/

Re: different results in numFound vs using the cursor

2019-11-12 Thread rhys J

On Mon, Nov 11, 2019 at 8:32 PM Chris Hostetter 
wrote:

>
> Based on the info provided, it's hard to be certain, but reading between
> the lines here are the assumptions i'm making...
>
> 1) your core name is "dbtr"
> 2) the uniqueId field for the "dbtr" core is "debtor_id"
>
> ..are those assumptions correct?
>

Yes they are. Sorry I didn't provide that from the beginning.


> Two key pieces of information that don't seem to be assumable from the
> info you've provided:
>
> a) What is the fieldType of the uniqueKey field in use?
>

It is a textField


> b) how are you determining that "The numFound: 35008"
>
>
I do a preliminary query to the solr core and print out the numFound from
this:

 my $solrResponse = $ua->post( $solrURI );

 my $decoded = decode_json( $solrResponse->{_content} );
 my $numFound = $decoded->{response}{numFound};


> ...
>
> You show the code that prints out "size of solrResults: 22006" but nothing
> in your code ever prints $numFound.  there is a snippet of code at the top
>

I am printing numFound every time it loops. This should remain constant,
because it is the total of all documents found. It's not really necessary
that I am printing it.

The number of docs is the size that I also print, and that is 1000 every
time, until the last little bit, and then it is 6 docs found.


> of your perl logic that seems disconnected from the rest of the code which
> makes me think that before you do anything with a cursor you are already
> parsing some *other* query response to get $numFound that way...
>
>
I am running this query first, to get the cursor set:

"http://10.40.10.14:8983/solr/debt/select?indent=on=1000=id
asc=debt_id: 608384 OR debt_id: 393291=*"

This sets the cursor, and then returns a cursorMark that I start using in
order to grab 1000 documents at a time.



> ...what exactly does all the code *before* this look like? what is the
> request that you are using to get that initial '$solrResponse' that you
> are parsing to extract '$numFound'  are you sure it's exactly the same as
> the query whose cursor you are iterating over?
>
>
query from before the loop:

"http://10.40.10.14:8983/solr/debt/select?indent=on=1000=id
asc=debt_id: 608384 OR debt_id: 393291=*"

query in the loop:

http://10.40.10.14:8983/solr/debt/select?indent=on=1000=id+asc=debt_id:
608384 OR debt_id: 393291=AoElMTg1MzE=

I do have some logic to make sure i grab the first 1000 from the first
query, but other than that, it's a simple loop.


> It looks like you are (also) extracting 'my $numFound =
> $decoded->{response}{numFound};' on every (cursor) request ... what do you
> get if you add this to your cursor loop...
>
>print STDERR "numFound = $numFound at '$cursor'";
>
numFound is always 35008 because that is how many total documents are
found. The number of docs in the response is the number that I care about,
because that shows me how many came back for this slice.


> ...because unless documents are being added/deleted as you iterate over
> the cursor, the numFound value should be consistent on each request.
>
>
numFound is consistently 35008.

Thanks

Rhys

Re: different results in numFound vs using the cursor

2019-11-11 Thread Chris Hostetter



Based on the info provided, it's hard to be certain, but reading between
the lines here are the assumptions i'm making...

1) your core name is "dbtr"
2) the uniqueId field for the "dbtr" core is "debtor_id"

..are those assumptions correct?

Two key pieces of information that don't seem to be assumable from the
info you've provided:

a) What is the fieldType of the uniqueKey field in use?
b) how are you determining that "The numFound: 35008"

...

You show the code that prints out "size of solrResults: 22006" but nothing 
in your code ever prints $numFound.  there is a snippet of code at the top 
of your perl logic that seems disconnected from the rest of the code which 
makes me think that before you do anything with a cursor you are already 
parsing some *other* query response to get $numFound that way...

: i am using this logic in perl:
: 
: my $decoded = decode_json( $solrResponse->{_content} );
: my $numFound = $decoded->{response}{numFound};
: 
: $cursor = "*";
: $prevCursor = '';
: 
: while ( $prevCursor ne $cursor )
: {
:   my $solrURI = "\"http://[SOLR URL]:8983/solr/";
:   $solrURI .= $fdat{core};
...

...what exactly does all the code *before* this look like? what is the 
request that you are using to get that initial '$solrResponse' that you 
are parsing to extract '$numFound'  are you sure it's exactly the same as 
the query whose cursor you are iterating over?

It looks like you are (also) extracting 'my $numFound =
$decoded->{response}{numFound};' on every (cursor) request ... what do you
get if you add this to your cursor loop...

   print STDERR "numFound = $numFound at '$cursor'";


...because unless documents are being added/deleted as you iterate over
the cursor, the numFound value should be consistent on each request.


-Hoss
http://www.lucidworks.com/

different results in numFound vs using the cursor

2019-11-11 Thread rhys J

i am using this logic in perl:

my $decoded = decode_json( $solrResponse->{_content} );
my $numFound = $decoded->{response}{numFound};

$cursor = "*";
$prevCursor = '';

while ( $prevCursor ne $cursor )
{
  my $solrURI = "\"http://[SOLR URL]:8983/solr/";
  $solrURI .= $fdat{core};

  $solrSort = ( $fdat{core} eq 'dbtr' ) ? "debtor_id+asc" : "id+asc";
  $solrOptions = "/select?indent=on=$getrows=$solrSort=";
  $solrURI .= $solrOptions;
  $solrURI .= $query;

  $solrURI .= ( $prevCursor eq '' ) ? "=*\"" : "=$cursor\"";

  print STDERR "solrURI '$solrURI'\n";
  my $solrResponse = $ua->post( $solrURI );
  my $decoded = decode_json( $solrResponse->{_content} );
  my $numFound = $decoded->{response}{numFound};

  foreach my $d ( $decoded->{response}{docs} )
  {
    my @docs = @$d;
    print STDERR "size of docs '" . scalar( @docs ) . "'\n";
    foreach my $r ( @docs )
    {
      if ( $fdat{cust_num} and $fdat{core} eq 'dbtr' )
      {
        push ( @solrResults, $r->{debtor_id} );
      }
      elsif ( $fdat{cust_num} and $fdat{core} eq 'debt' )
      {
        push ( @solrResults, $r->{debt_id} );
      }
    }
  }

  $prevCursor = ( $prevCursor eq '' ) ? "*" : $cursor;
  $cursor = $decoded->{nextCursorMark};
  print STDERR "cursor '$cursor'\n";
  print STDERR "prevCursor '$prevCursor'\n";
  print STDERR "size of solrResults '" . scalar( @solrResults ) . "'\n";
}

print out:

http://[SOLR
URL]:8983/solr/debt/select?indent=on=1000=id+asc=debt_id:
608384 OR debt_id: 393291=AoEmMzkzMjkx

The numFound: 35008
final size of solrResults: 22006

Am I missing something I should be using with cursorMark? Or is this
expected?

I've checked my logic, and I'm using the cursors the way this page is using
them in examples:

https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html
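
For comparison, here is the general shape of a cursorMark loop as a self-contained Python sketch. `fetch_page` is a hypothetical stand-in for the HTTP request to Solr: it serves pages from an in-memory list and uses a plain offset as the cursor mark, whereas real cursor marks are opaque strings. The part that carries over is the loop: start with `*`, send back the `nextCursorMark` from each response, and stop when the returned mark equals the one just sent.

```python
def fetch_page(docs, cursor_mark, rows):
    """Hypothetical stand-in for a Solr /select request with cursorMark."""
    start = 0 if cursor_mark == "*" else int(cursor_mark)
    page = docs[start:start + rows]
    # When nothing is left, hand back the same mark that was sent.
    next_mark = str(start + len(page)) if page else cursor_mark
    return {
        "response": {"numFound": len(docs), "docs": page},
        "nextCursorMark": next_mark,
    }


def fetch_all(docs, rows):
    """Collect every id by following nextCursorMark until it stops changing."""
    results, cursor = [], "*"
    while True:
        resp = fetch_page(docs, cursor, rows)
        results.extend(d["id"] for d in resp["response"]["docs"])
        next_cursor = resp["nextCursorMark"]
        if next_cursor == cursor:   # same mark back means no more results
            break
        cursor = next_cursor
    return results


# Made-up documents; the count is arbitrary.
index = [{"id": f"doc{i:05d}"} for i in range(2506)]
all_ids = fetch_all(index, rows=1000)
print(len(all_ids))
```

With a well-behaved sort (one that ends on a unique, non-analyzed key) the number of collected ids should equal numFound, so a shortfall points at the sort field rather than at the loop.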

Thanks

Rhys

Re: Different results due to sharding and problems with interesting terms in MLT

2019-09-27 Thread Lucky Sharma

Hi Salman,

1. For the first one:
 One suggestion: don't emit [@, ., -, _, +, #, *] as
individual tokens. I guess you need to update your tokenizer in that case.

2. For the second issue, are the scores of both results the same? If the
scores and the queries are the same, then the reason is probably the Lucene
doc ID. I have observed the same thing in Solr 7.6.0; in my case the docID
for the same doc could be different on the two nodes. To get the same
record order, you can add "id desc" as the very
last stage of sorting
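
The tie-breaker idea can be sketched in Python as follows. The documents and scores are made up, and `stable_order` only mimics what a sort such as `score desc, id desc` would do: two nodes that return equally scored documents in different internal order end up with the same final order once the unique id breaks ties.

```python
def stable_order(hits):
    """Order by score descending, then by id descending as a tie-breaker."""
    return sorted(hits, key=lambda h: (h["score"], h["id"]), reverse=True)


# Made-up hits: doc1 and doc2 have identical scores.
node_a = [
    {"id": "doc1", "score": 1.5},
    {"id": "doc2", "score": 1.5},
    {"id": "doc3", "score": 0.7},
]
# The other node returns the tied documents in the opposite order.
node_b = [node_a[1], node_a[0], node_a[2]]

order_a = [h["id"] for h in stable_order(node_a)]
order_b = [h["id"] for h in stable_order(node_b)]
print(order_a, order_b)
```

Without the id in the sort, the relative order of `doc1` and `doc2` would depend on which node answered.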

Regards,
Lucky Sharma

On Sat, 28 Sep, 2019, 8:22 am Salmaan Rashid Syed, <
salmaan.ras...@mroads.com> wrote:

> Hi Solr Users,
>
> I have two questions,
>
> 1) I am working on Solr 7.6 and I have incorporated MLT feature into it. I
> need to allow users to search on emails and skills, so I have allowed few
> of the special characters such as [@, ., -, _, +, #, *]. I am not using
> stemmer as it is removing letter "s" from many of the useful words like
> "AngularJS" to "AngularJ".
>
> Now when I enter a processed text as query into the search bar, I get "."
> as the "*most interesting term*" boosted by the highest order usually. I
> can't figure out how to remove this from interesting terms without removing
> it from the field I am searching in.
>
> 2) I have 2 shards per collections on two nodes 8983 and 7574 in cloud
> mode. I am getting different results for same query.
>
> I have come to know through reading forums and documentation that this is
> happening due to sharding and due to calculation of stats on individual
> sharding rather than on entire collection. So I implemented one of the
> solutions mentioned in forum/documentations in solrconfig.xml as follows,
>
> 
>
> It still doesn't work and gives different results for the same query. Please
> let me know what can be done to avoid these issues.
>
> Regards,
> Salmaan
>

Different results due to sharding and problems with interesting terms in MLT

2019-09-27 Thread Salmaan Rashid Syed

Hi Solr Users,

I have two questions,

1) I am working on Solr 7.6 and I have incorporated the MLT feature into it. I
need to allow users to search on emails and skills, so I have allowed a few
of the special characters such as [@, ., -, _, +, #, *]. I am not using a
stemmer as it removes the letter "s" from many useful words, e.g. it turns
"AngularJS" into "AngularJ".

Now when I enter a processed text as a query into the search bar, I usually
get "." as the "*most interesting term*", with the highest boost. I
can't figure out how to remove this from the interesting terms without removing
it from the field I am searching in.

2) I have 2 shards per collection on two nodes, 8983 and 7574, in cloud
mode. I am getting different results for the same query.

I have learned from reading forums and documentation that this
happens because of sharding: stats are calculated on individual
shards rather than on the entire collection. So I implemented one of the
solutions mentioned in the forums/documentation in solrconfig.xml as follows,



It still doesn't work and gives different results for the same query. Please
let me know what can be done to avoid these issues.

Regards,
Salmaan

Re:Solr query fetching different results

2019-09-19 Thread Ramsey Haddad (BLOOMBERG/ LONDON)

Your query seems simple enough that this may not be your issue, but just 
mentioning it:

Your collection has 1 shard. Depending on how the query is sent, queries to 1 
shard collections can sometimes get interpreted as a "distributed query" and 
sometimes as a "non-distributed query". These have different code paths that 
should *in theory* give identical results. When we made some code extensions to 
Solr in our private plugins, we decided not to support both code paths and so 
instead we use shortCircuit=false (we sent this in the config of our 
) to force use of the distributed query code path. (We want our 
change to work for both our 60 shard collection and our 1 shard collection.) 
This gives us more consistent results from different ways of invoking the 
search.

But, again, your query seems too simple for this to be the cause -- why would 
the distributed vs non-distributed return different results for this??

From: solr-user@lucene.apache.org At: 09/19/19 06:20:30
To: solr-user@lucene.apache.org
Subject: Solr query fetching different results

Hi all,

There is something "strange" happening in our Solr cluster. If I execute a
query from the server, via solarium client, I get one result. If I execute
the same or similar query from admin Panel, I get another result. If I go
to Admin Panel  - Collections - Select Collection and click "Reload", and
then repeat the query, the result I get is consistent with  the one I get
from the server via solarium client. So I picked the query that is getting
executed, from Solr logs. Evidently, the query was going to different nodes.

Query that went from Admin Panel, went to node 4 and fetched 0 documents
2019-09-19 05:02:04.549 INFO  (qtp434091818-205178)
[c:paymetryproducts s:shard1 r:*core_node4*
x:paymetryproducts_shard1_replica_n2] o.a.s.c.S.Request
[paymetryproducts_shard1_replica_n2]  webapp=/solr path=/select
params={q=category_id:5a0aeaeea6bc7239cc21ee39&_=1568868718031} *hits=0*
status=0 QTime=0


Query that went from solarium client running on a server, went to node 3
and fetched 4 documents

2019-09-19 05:06:41.511 INFO  (qtp434091818-17)
[c:paymetryproducts s:shard1 r:*core_node3*
x:paymetryproducts_shard1_replica_n1] o.a.s.c.S.Request
[paymetryproducts_shard1_replica_n1]  webapp=/solr path=/select
params={q=category_id:5a0aeaeea6bc7239cc21ee39=flat=true=ID=0=90=json}
*hits=4* status=0 QTime=104

What could be causing this strange behaviour? How can I fix this?
Solr version: 7.3
Shard count: 1
replicationFactor: 2
maxShardsPerNode: 1

Regards,
Jayadevan

Re: Solr query fetching different results

2019-09-19 Thread Erick Erickson

Multiple replicas of the same shard will execute their autocommits at
different wall clock times. Thus there may be a _temporary_ period when a
newly-indexed document is found by a query that happens to get served by
replica1 but not by replica2. If you have a timestamp in the doc, and a
soft commit interval of, say, 1 minute, you can test whether this is the
case by adding
=timestamp:[* TO NOW-2MINUTE]. In that case you should see identical returns.
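
The reasoning can be checked with a small Python model. Everything here is illustrative: times are plain seconds, each replica only exposes documents indexed up to its own last commit, and the cutoff plays the role of the `[* TO NOW-2MINUTE]` range (the parameter name is missing in the archived message, so it is not assumed here). With a commit interval of about a minute, a two-minute cutoff lies before every replica's last commit, so the filtered results agree.

```python
def visible_docs(doc_times, last_commit, newest_allowed=None):
    """Docs a replica can return: committed, and optionally old enough."""
    hits = [t for t in doc_times if t <= last_commit]
    if newest_allowed is not None:
        hits = [t for t in hits if t <= newest_allowed]
    return hits


# Made-up timeline, in seconds.
doc_times = [5, 20, 100]      # when each document was indexed
now = 130
replica1_commit = 125         # replica1 committed 5 seconds ago
replica2_commit = 80          # replica2 committed 50 seconds ago
cutoff = now - 120            # stand-in for NOW-2MINUTE

unfiltered = (visible_docs(doc_times, replica1_commit),
              visible_docs(doc_times, replica2_commit))
filtered = (visible_docs(doc_times, replica1_commit, cutoff),
            visible_docs(doc_times, replica2_commit, cutoff))
print(unfiltered)
print(filtered)
```

If the replicas still disagree with such a filter in place, commit timing is probably not the explanation.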

Best,
Erick

On Thu, Sep 19, 2019 at 1:20 AM Jayadevan Maymala
 wrote:
>
> Hi all,
>
> There is something "strange" happening in our Solr cluster. If I execute a
> query from the server, via solarium client, I get one result. If I execute
> the same or similar query from admin Panel, I get another result. If I go
> to Admin Panel  - Collections - Select Collection and click "Reload", and
> then repeat the query, the result I get is consistent with  the one I get
> from the server via solarium client. So I picked the query that is getting
> executed, from Solr logs. Evidently, the query was going to different nodes.
>
> Query that went from Admin Panel, went to node 4 and fetched 0 documents
> 2019-09-19 05:02:04.549 INFO  (qtp434091818-205178)
> [c:paymetryproducts s:shard1 r:*core_node4*
> x:paymetryproducts_shard1_replica_n2] o.a.s.c.S.Request
> [paymetryproducts_shard1_replica_n2]  webapp=/solr path=/select
> params={q=category_id:5a0aeaeea6bc7239cc21ee39&_=1568868718031} *hits=0*
> status=0 QTime=0
>
>
> Query that went from solarium client running on a server, went to node 3
> and fetched 4 documents
>
> 2019-09-19 05:06:41.511 INFO  (qtp434091818-17)
> [c:paymetryproducts s:shard1 r:*core_node3*
> x:paymetryproducts_shard1_replica_n1] o.a.s.c.S.Request
> [paymetryproducts_shard1_replica_n1]  webapp=/solr path=/select
> params={q=category_id:5a0aeaeea6bc7239cc21ee39=flat=true=ID=0=90=json}
> *hits=4* status=0 QTime=104
>
> What could be causing this strange behaviour? How can I fix this?
> Solr version: 7.3
> Shard count: 1
> replicationFactor: 2
> maxShardsPerNode: 1
>
> Regards,
> Jayadevan

Solr query fetching different results

2019-09-18 Thread Jayadevan Maymala

Hi all,

There is something "strange" happening in our Solr cluster. If I execute a
query from the server, via solarium client, I get one result. If I execute
the same or similar query from admin Panel, I get another result. If I go
to Admin Panel  - Collections - Select Collection and click "Reload", and
then repeat the query, the result I get is consistent with  the one I get
from the server via solarium client. So I picked the query that is getting
executed, from Solr logs. Evidently, the query was going to different nodes.

Query that went from Admin Panel, went to node 4 and fetched 0 documents
2019-09-19 05:02:04.549 INFO  (qtp434091818-205178)
[c:paymetryproducts s:shard1 r:*core_node4*
x:paymetryproducts_shard1_replica_n2] o.a.s.c.S.Request
[paymetryproducts_shard1_replica_n2]  webapp=/solr path=/select
params={q=category_id:5a0aeaeea6bc7239cc21ee39&_=1568868718031} *hits=0*
status=0 QTime=0


Query that went from solarium client running on a server, went to node 3
and fetched 4 documents

2019-09-19 05:06:41.511 INFO  (qtp434091818-17)
[c:paymetryproducts s:shard1 r:*core_node3*
x:paymetryproducts_shard1_replica_n1] o.a.s.c.S.Request
[paymetryproducts_shard1_replica_n1]  webapp=/solr path=/select
params={q=category_id:5a0aeaeea6bc7239cc21ee39=flat=true=ID=0=90=json}
*hits=4* status=0 QTime=104
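
A rough sketch of how the two log entries above could be compared
mechanically. The log strings here are abbreviated copies of those
entries (emphasis asterisks and the params section removed), and the
regular expression is only an assumption about the parts of the line
that matter: the replica name, the core name and the hit count.

```python
import re

# Abbreviated copies of the two log entries above.
LOG_LINES = [
    "[c:paymetryproducts s:shard1 r:core_node4 x:paymetryproducts_shard1_replica_n2] "
    "webapp=/solr path=/select hits=0 status=0 QTime=0",
    "[c:paymetryproducts s:shard1 r:core_node3 x:paymetryproducts_shard1_replica_n1] "
    "webapp=/solr path=/select hits=4 status=0 QTime=104",
]

# Capture the replica name, the core name and the hit count.
PATTERN = re.compile(r"r:(?P<replica>\S+) x:(?P<core>[^\]]+)\].*?hits=(?P<hits>\d+)")


def hits_by_replica(lines):
    """Map each replica name to the hit count reported in its log line."""
    result = {}
    for line in lines:
        match = PATTERN.search(line)
        if match:
            result[match.group("replica")] = int(match.group("hits"))
    return result


summary = hits_by_replica(LOG_LINES)
print(summary)
```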

What could be causing this strange behaviour? How can I fix this?
Solr version: 7.3
Shard count: 1
replicationFactor: 2
maxShardsPerNode: 1

Regards,
Jayadevan

Re: Consecutive calls to a query give different results

2017-09-08 Thread Erick Erickson

Here's Mike McCandless' blog on the topic:

https://www.elastic.co/blog/lucenes-handling-of-deleted-documents

The same options he mentions are available in Solr as both use Lucene
under the covers.

The long and short of it is that you can have a significant amount of
deleted documents in your index, depending on the update pattern.

One thing Mike doesn't mention is at the root of why I'm so negative
about optimize (and forceMerge is just an optimize that only mashes
segments together if they have > X% deleted docs). Let's say your max
segment size is 5G. And you optimize an index down to a single 100G
segment. That segment will _not_ be merged until it has < 2.5G live
docs. That's not a typo: 97.5% deleted docs.
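
The arithmetic behind that figure, as a small Python sketch. The only
rule used is the one implied by the numbers above (a segment is
reconsidered for merging once its live data drops below half the maximum
segment size); the function name is made up for illustration.

```python
def deleted_fraction_before_merge(segment_gb, max_segment_gb):
    """Fraction of a segment that must be deleted before it can be merged.

    Assumes the rule implied by the message: the segment becomes eligible
    once its live data is below half the maximum segment size.
    """
    live_threshold_gb = max_segment_gb / 2
    return 1 - live_threshold_gb / segment_gb


fraction = deleted_fraction_before_merge(segment_gb=100, max_segment_gb=5)
print(f"{fraction:.1%}")
```

Under the same assumption, a segment of roughly 5G (100G split into 20
segments) would need about 50% deleted docs, which is why optimizing to
several segments looks less harmful.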

You could ameliorate this somewhat by specifying the number of
segments after optimizing (default is 1). Say you determine that you
have 100G of live data, specify 20 segments for optimize. This would
be better I'd guess, but haven't tested personally.

Best,
Erick

On Fri, Sep 8, 2017 at 10:36 AM, Webster Homer <webster.ho...@sial.com> wrote:
> Thank you, Erick Erickson and Shawn Heisey for your excellent answers.
> For some of our collections, it would seem that an occasional optimize
> would be a good thing. However we have some collections that are updated
> constantly
>
> Would using the commit expungeDeletes help mitigate the issue?
>
> I also came across a discussion of Lucene merge policies. and the
> TieredMergePolicy.
> Is there documentation about this? I notice that a couple of our replicas
> in some of our collections have ~30% deleted documents which I would think
> would contribute to the problem.
> I have at least 3 collections that are updated constantly, and would not
> lend themselves to being optimized what is the best approach for these?
>
> Thanks
>
> On Fri, Sep 8, 2017 at 9:47 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>
>> On 9/7/2017 8:54 AM, Webster Homer wrote:
>> > I am not concerned about deleted documents. I am concerned that the same
>> > search gives different results after each search. The top document seems
>> to
>> > cycle between 3 different documents
>> >
>> > I have an enhanced collections info api call that calls the core admin
>> api
>> > to get the index information for the replica.
>> > When I said the numdocs were the same I meant exactly that. maxdocs and
>> > deleted documents are not the same for the replicas, but the number of
>> > numdocs is.
>> >
>> > Or are you saying that the search is looking at deleted documents
>> wouldn't
>> > that be a very significant bug?
>>
>> Lucene score calculations take a lot of information in the index into
>> account when calculating the score.  That includes deleted documents,
>> because they are part of the index.  When you delete a document, Lucene
>> just makes a note saying "internal document ID number  is deleted."
>> The actual information for that document is not removed from the index,
>> because doing so could take a very long time.
>>
>> When you make queries against a replicated SolrCloud, the queries are
>> load balanced across the entire cloud, so different queries will hit
>> different replicas.  With different numbers of deleted documents in
>> different replicas (which is not unusual), the scores are going to come
>> out a little bit different on each query.  If you're sorting by score
>> (which is the default sort), that *can* affect the order.  Your replicas
>> have a fairly high percentage of deleted documents, so there is a lot of
>> extra information affecting the scores.  The relative difference in the
>> deleted document count between the replicas is high as well, so multiple
>> queries could be substantially different.
>>
>> It is not a bug that Lucene and Solr look at deleted documents.
>> Removing deleted document information from things like the score
>> calculation would be VERY computationally intense, bordering on the
>> impossible.  To assure good performance, Lucene doesn't even try.
>> Because the way Lucene tracks deleted documents is with a list of
>> internal Lucene document IDs, those documents are easily removed from
>> *results*, but their contents are an integral part of the index and that
>> information can only be truly removed by completely rewriting (merging)
>> the index segments.
>>
>> You can get rid of all deleted documents with an optimize operation,
>> which is a forced merge of the entire index down to one segment -- but
>> just like it sounds, that is a complete rewrite of the index.  It
>> involves a huge amount of CPU resources and disk

Re: Consecutive calls to a query give different results

2017-09-08 Thread Webster Homer

Thank you, Erick Erickson and Shawn Heisey for your excellent answers.
For some of our collections, it would seem that an occasional optimize
would be a good thing. However, we have some collections that are updated
constantly.

Would using the commit expungeDeletes help mitigate the issue?

I also came across a discussion of Lucene merge policies and the
TieredMergePolicy.
Is there documentation about this? I notice that a couple of our replicas
in some of our collections have ~30% deleted documents, which I would think
would contribute to the problem.
I have at least 3 collections that are updated constantly and would not
lend themselves to being optimized. What is the best approach for these?

Thanks

On Fri, Sep 8, 2017 at 9:47 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 9/7/2017 8:54 AM, Webster Homer wrote:
> > I am not concerned about deleted documents. I am concerned that the same
> > search gives different results after each search. The top document seems
> to
> > cycle between 3 different documents
> >
> > I have an enhanced collections info api call that calls the core admin
> api
> > to get the index information for the replica.
> > When I said the numdocs were the same I meant exactly that. maxdocs and
> > deleted documents are not the same for the replicas, but the number of
> > numdocs is.
> >
> > Or are you saying that the search is looking at deleted documents
> wouldn't
> > that be a very significant bug?
>
> Lucene score calculations take a lot of information in the index into
> account when calculating the score.  That includes deleted documents,
> because they are part of the index.  When you delete a document, Lucene
> just makes a note saying "internal document ID number  is deleted."
> The actual information for that document is not removed from the index,
> because doing so could take a very long time.
>
> When you make queries against a replicated SolrCloud, the queries are
> load balanced across the entire cloud, so different queries will hit
> different replicas.  With different numbers of deleted documents in
> different replicas (which is not unusual), the scores are going to come
> out a little bit different on each query.  If you're sorting by score
> (which is the default sort), that *can* affect the order.  Your replicas
> have a fairly high percentage of deleted documents, so there is a lot of
> extra information affecting the scores.  The relative difference in the
> deleted document count between the replicas is high as well, so multiple
> queries could be substantially different.
>
> It is not a bug that Lucene and Solr look at deleted documents.
> Removing deleted document information from things like the score
> calculation would be VERY computationally intense, bordering on the
> impossible.  To assure good performance, Lucene doesn't even try.
> Because the way Lucene tracks deleted documents is with a list of
> internal Lucene document IDs, those documents are easily removed from
> *results*, but their contents are an integral part of the index and that
> information can only be truly removed by completely rewriting (merging)
> the index segments.
>
> You can get rid of all deleted documents with an optimize operation,
> which is a forced merge of the entire index down to one segment -- but
> just like it sounds, that is a complete rewrite of the index.  It
> involves a huge amount of CPU resources and disk I/O, and can severely
> impact normal indexing and query operations while it's happening.  If
> the collection is extremely large, an optimize could take hours.  For
> indexes that change rapidly, optimize is strongly discouraged, except as
> an occasional "clean things up" operation, run during non-peak times.
>
> Thanks,
> Shawn
>
>

Re: Consecutive calls to a query give different results

2017-09-08 Thread Shawn Heisey

On 9/7/2017 8:54 AM, Webster Homer wrote:
> I am not concerned about deleted documents. I am concerned that the same
> search gives different results after each search. The top document seems to
> cycle between 3 different documents
>
> I have an enhanced collections info api call that calls the core admin api
> to get the index information for the replica.
> When I said the numdocs were the same I meant exactly that. maxdocs and
> deleted documents are not the same for the replicas, but the number of
> numdocs is.
>
> Or are you saying that the search is looking at deleted documents wouldn't
> that be a very significant bug?

Lucene score calculations take a lot of information in the index into
account when calculating the score.  That includes deleted documents,
because they are part of the index.  When you delete a document, Lucene
just makes a note saying "internal document ID number N is deleted."
The actual information for that document is not removed from the index,
because doing so could take a very long time.

When you make queries against a replicated SolrCloud, the queries are
load balanced across the entire cloud, so different queries will hit
different replicas.  With different numbers of deleted documents in
different replicas (which is not unusual), the scores are going to come
out a little bit different on each query.  If you're sorting by score
(which is the default sort), that *can* affect the order.  Your replicas
have a fairly high percentage of deleted documents, so there is a lot of
extra information affecting the scores.  The relative difference in the
deleted document count between the replicas is high as well, so multiple
queries could be substantially different.
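
A toy illustration of the effect in Python. It assumes a BM25-style idf
term (which may not be the similarity in use here) and uses the maxDocs
values reported for the two shard1 replicas elsewhere in this thread;
the docFreq values are invented purely for the example.

```python
import math


def bm25_idf(doc_count, doc_freq):
    """BM25-style inverse document frequency (assumed similarity)."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# maxDocs values come from the thread; doc_freq values are hypothetical.
replicas = {
    "core_node1": {"max_docs": 611592, "doc_freq": 1600},
    "core_node3": {"max_docs": 571737, "doc_freq": 1500},
}

idf = {
    name: bm25_idf(stats["max_docs"], stats["doc_freq"])
    for name, stats in replicas.items()
}
for name, value in idf.items():
    print(name, round(value, 4))
```

Both replicas hold the same live documents, yet the idf differs because
deleted documents still count toward the statistics, and that is enough
to reorder results whose scores are close.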

It is not a bug that Lucene and Solr look at deleted documents. 
Removing deleted document information from things like the score
calculation would be VERY computationally intense, bordering on the
impossible.  To assure good performance, Lucene doesn't even try. 
Because the way Lucene tracks deleted documents is with a list of
internal Lucene document IDs, those documents are easily removed from
*results*, but their contents are an integral part of the index and that
information can only be truly removed by completely rewriting (merging)
the index segments.

You can get rid of all deleted documents with an optimize operation,
which is a forced merge of the entire index down to one segment -- but
just like it sounds, that is a complete rewrite of the index.  It
involves a huge amount of CPU resources and disk I/O, and can severely
impact normal indexing and query operations while it's happening.  If
the collection is extremely large, an optimize could take hours.  For
indexes that change rapidly, optimize is strongly discouraged, except as
an occasional "clean things up" operation, run during non-peak times.

Thanks,
Shawn

Re: Consecutive calls to a query give different results

2017-09-08 Thread Webster Homer

We have several cloud collections, but this one is updated once a day with
a partial load, and once a week with a full load, followed by a delete
which is based upon an index_date field (timestamp of the solr record).

For this and related collections optimizing once per day is probably
acceptable.

We do have other collections that are updated every 15 minutes. From what
you write, I don't think those could be optimized.



On Thu, Sep 7, 2017 at 5:10 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> bq: So apparently it IS essential to run optimize after a data load
>
> Don't do this if you can avoid it, you run the risk of excessive
> amounts of your index consisting of deleted documents unless you are
> following a process whereby you periodically (and I'm talking at least
> hours, if not once per day) index data then don't change the index for
> a bunch more hours.
>
> You're missing the point when it comes to deleted docs. Different
> replicas of the _same_ shard commit at different wall clock times due
> to network delays. Therefore, which segments are merged will not be
> identical between replicas when a commit happens, since commits are
> local.
>
> So replica1 may merge segments 1, 3, 6 in to segment 7
> replica2 may merge segments 1, 2, 4 into segment 7
>
> Here's the key: Now replica1 may have 100 deleted documents (ones
> marked as deleted but still in segments 2, 4 and 5
>  replica2 may have 90 deleted
> documents (the ones still in segments 3, 5 and 6)
>
> The statistics in the term frequency and document frequency for some
> terms are _not_ the same. Therefore the scoring will be slightly
> different. Therefore, depending on which replica serves the query, the
> order of docs may be somewhat different if the scores are close.
>
> optimizing squeezes all the deleted documents out of all the replicas
> so the scores become identical.
>
> This doesn't happen, of course, if you have only one replica.
>
> Best,
> Erick
>
> On Thu, Sep 7, 2017 at 8:13 AM, Webster Homer <webster.ho...@sial.com>
> wrote:
> > We have several solr clouds, a couple of them have only 1 replica per
> > shard. We have never observed the problem when we have a single replica
> > only when there are multiple replicas per shard.
> >
> > On Thu, Sep 7, 2017 at 10:08 AM, Webster Homer <webster.ho...@sial.com>
> > wrote:
> >
> >> the scores are not the same
> >> Doc
> >> 305340 432.44238
> >> C2646 428.24185
> >> 12837 430.61722
> >>
> >> One other thing. I just ran optimize and now document 305340 is
> >> consistently the top score.
> >> So apparently it IS essential to run optimize after a data load
> >>
> >> Note we see this behavior fairly commonly on our solr cloud instances.
> >> This was not the first time. This particular situation was on a
> development
> >> system
> >>
> >> On Thu, Sep 7, 2017 at 10:04 AM, Webster Homer <webster.ho...@sial.com>
> >> wrote:
> >>
> >>> the scores are not the same
> >>> Doc
> >>> 305340 432.44238
> >>>
> >>> On Thu, Sep 7, 2017 at 10:02 AM, David Hastings <
> >>> hastings.recurs...@gmail.com> wrote:
> >>>
> >>>> "I am concerned that the same
> >>>> search gives different results after each search. The top document
> seems
> >>>> to
> >>>> cycle between 3 different documents"
> >>>>
> >>>>
> >>>> if you do debug query on the search, are the scores for the top 3
> >>>> documents
> >>>> the same or not?  you can easily have three documents with the same
> >>>> score,
> >>>> so when you have a result set that is ranked 1-1-1-2-3-4 you can
> >>>> expect
> >>>> 1-1-1 to rotate based on whatever.  use a second element like id to
> your
> >>>> ranking perhaps.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Thu, Sep 7, 2017 at 10:54 AM, Webster Homer <
> webster.ho...@sial.com>
> >>>> wrote:
> >>>>
> >>>> > I am not concerned about deleted documents. I am concerned that the
> >>>> same
> >>>> > search gives different results after each search. The top document
> >>>> seems to
> >>>> > cycle between 3 different documents
> >>>> >
> >>>> > I have an enhanced collections info a

Re: Consecutive calls to a query give different results

2017-09-07 Thread Erick Erickson

bq: So apparently it IS essential to run optimize after a data load

Don't do this if you can avoid it. You run the risk of excessive
amounts of your index consisting of deleted documents unless you are
following a process whereby you periodically (and I'm talking at least
hours, if not once per day) index data and then don't change the index
for a bunch more hours.

You're missing the point when it comes to deleted docs. Different
replicas of the _same_ shard commit at different wall clock times due
to network delays. Therefore, which segments are merged will not be
identical between replicas when a commit happens, since commits are
local.

So replica1 may merge segments 1, 3, 6 into segment 7, while
replica2 may merge segments 1, 2, 4 into segment 7.

Here's the key: now replica1 may have 100 deleted documents (ones
marked as deleted but still in segments 2, 4 and 5), while replica2
may have 90 deleted documents (the ones still in segments 3, 5 and 6).
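
The bookkeeping in that example, sketched in Python. The segment numbers
are the ones from the example; treating each replica's index as a plain
set of segment numbers is a simplification for illustration only.

```python
# Segments present on both replicas before any merge (numbers from the example).
all_segments = {1, 2, 3, 4, 5, 6}

# Which segments each replica happened to merge into its new segment 7.
merged = {
    "replica1": {1, 3, 6},
    "replica2": {1, 2, 4},
}

# Segments left untouched still carry their deleted documents.
unmerged = {name: all_segments - picked for name, picked in merged.items()}

for name in sorted(unmerged):
    print(name, "still holds deletes in segments", sorted(unmerged[name]))
```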

The statistics in the term frequency and document frequency for some
terms are _not_ the same. Therefore the scoring will be slightly
different. Therefore, depending on which replica serves the query, the
order of docs may be somewhat different if the scores are close.

Optimizing squeezes all the deleted documents out of all the replicas,
so the scores become identical.

This doesn't happen, of course, if you have only one replica.

Best,
Erick

On Thu, Sep 7, 2017 at 8:13 AM, Webster Homer <webster.ho...@sial.com> wrote:
> We have several solr clouds, a couple of them have only 1 replica per
> shard. We have never observed the problem when we have a single replica
> only when there are multiple replicas per shard.
>
> On Thu, Sep 7, 2017 at 10:08 AM, Webster Homer <webster.ho...@sial.com>
> wrote:
>
>> the scores are not the same
>> Doc
>> 305340 432.44238
>> C2646 428.24185
>> 12837 430.61722
>>
>> One other thing. I just ran optimize and now document 305340 is
>> consistently the top score.
>> So apparently it IS essential to run optimize after a data load
>>
>> Note we see this behavior fairly commonly on our solr cloud instances.
>> This was not the first time. This particular situation was on a development
>> system
>>
>> On Thu, Sep 7, 2017 at 10:04 AM, Webster Homer <webster.ho...@sial.com>
>> wrote:
>>
>>> the scores are not the same
>>> Doc
>>> 305340 432.44238
>>>
>>> On Thu, Sep 7, 2017 at 10:02 AM, David Hastings <
>>> hastings.recurs...@gmail.com> wrote:
>>>
>>>> "I am concerned that the same
>>>> search gives different results after each search. The top document seems
>>>> to
>>>> cycle between 3 different documents"
>>>>
>>>>
>>>> if you do debug query on the search, are the scores for the top 3
>>>> documents
>>>> the same or not?  you can easily have three documents with the same
>>>> score,
>>>> so when you have a result set that is ranked 1-1-1-2-3-4.... you can
>>>> expect
>>>> 1-1-1 to rotate based on whatever.  use a second element like id to your
>>>> ranking perhaps.
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Sep 7, 2017 at 10:54 AM, Webster Homer <webster.ho...@sial.com>
>>>> wrote:
>>>>
>>>> > I am not concerned about deleted documents. I am concerned that the
>>>> same
>>>> > search gives different results after each search. The top document
>>>> seems to
>>>> > cycle between 3 different documents
>>>> >
>>>> > I have an enhanced collections info api call that calls the core admin
>>>> api
>>>> > to get the index information for the replica.
>>>> > When I said the numdocs were the same I meant exactly that. maxdocs and
>>>> > deleted documents are not the same for the replicas, but the number of
>>>> > numdocs is.
>>>> >
>>>> > Or are you saying that the search is looking at deleted documents
>>>> wouldn't
>>>> > that be a very significant bug?
>>>> >
>>>> > The four replicas:
>>>> > shard1
>>>> > core_node1
>>>> > "numDocs": 383817,
>>>> > "maxDocs": 611592,
>>>> > "deletedDocs": 227775,
>>>> > "size": "2.49 GB",
>>>> > "lastModified": "2017-09-07T08:18:03.639Z",
>>>> > "current&qu

Re: Consecutive calls to a query give different results

2017-09-07 Thread Webster Homer

We have several Solr clouds; a couple of them have only 1 replica per
shard. We have never observed the problem when we have a single replica,
only when there are multiple replicas per shard.

On Thu, Sep 7, 2017 at 10:08 AM, Webster Homer <webster.ho...@sial.com>
wrote:

> the scores are not the same
> Doc
> 305340 432.44238
> C2646 428.24185
> 12837 430.61722
>
> One other thing. I just ran optimize and now document 305340 is
> consistently the top score.
> So apparently it IS essential to run optimize after a data load
>
> Note we see this behavior fairly commonly on our solr cloud instances.
> This was not the first time. This particular situation was on a development
> system
>
> On Thu, Sep 7, 2017 at 10:04 AM, Webster Homer <webster.ho...@sial.com>
> wrote:
>
>> the scores are not the same
>> Doc
>> 305340 432.44238
>>
>> On Thu, Sep 7, 2017 at 10:02 AM, David Hastings <
>> hastings.recurs...@gmail.com> wrote:
>>
>>> "I am concerned that the same
>>> search gives different results after each search. The top document seems
>>> to
>>> cycle between 3 different documents"
>>>
>>>
>>> if you do debug query on the search, are the scores for the top 3
>>> documents
>>> the same or not?  you can easily have three documents with the same
>>> score,
>>> so when you have a result set that is ranked 1-1-1-2-3-4 you can
>>> expect
>>> 1-1-1 to rotate based on whatever.  use a second element like id to your
>>> ranking perhaps.
>>>
>>>
>>>
>>>
>>> On Thu, Sep 7, 2017 at 10:54 AM, Webster Homer <webster.ho...@sial.com>
>>> wrote:
>>>
>>> > I am not concerned about deleted documents. I am concerned that the
>>> same
>>> > search gives different results after each search. The top document
>>> seems to
>>> > cycle between 3 different documents
>>> >
>>> > I have an enhanced collections info api call that calls the core admin
>>> api
>>> > to get the index information for the replica.
>>> > When I said the numdocs were the same I meant exactly that. maxdocs and
>>> > deleted documents are not the same for the replicas, but the number of
>>> > numdocs is.
>>> >
>>> > Or are you saying that the search is looking at deleted documents
>>> wouldn't
>>> > that be a very significant bug?
>>> >
>>> > The four replicas:
>>> > shard1
>>> > core_node1
>>> > "numDocs": 383817,
>>> > "maxDocs": 611592,
>>> > "deletedDocs": 227775,
>>> > "size": "2.49 GB",
>>> > "lastModified": "2017-09-07T08:18:03.639Z",
>>> > "current": true,
>>> > "version": 35644,
>>> > "segmentCount": 28
>>> >
>>> > core_node3
>>> > "numDocs": 383817,
>>> > "maxDocs": 571737,
>>> > "deletedDocs": 187920,
>>> > "size": "2.85 GB",
>>> > "lastModified": "2017-09-07T08:18:03.634Z",
>>> > "current": false,
>>> > "version": 35562,
>>> > "segmentCount": 36
>>> > shard2
>>> > core_node2
>>> > "numDocs": 385326,
>>> > "maxDocs": 529214,
>>> > "deletedDocs": 143888,
>>> > "size": "2.13 GB",
>>> > "lastModified": "2017-09-07T08:18:03.632Z",
>>> > "current": true,
>>> > "version": 34783,
>>> > "segmentCount": 24
>>> > core_node4
>>> > "numDocs": 385326,
>>> > "maxDocs": 488201,
>>> > "deletedDocs": 102875,
>>> > "size": "1.96 GB",
>>> > "lastModified": "2017-09-07T08:18:03.633Z",
>>> > "current": true,
>>> > "version": 34932,
>>> > "segmentCount": 21
>>> >
>>> >
>>> > On Thu, Sep 7, 2017 at 7:58 AM, Yonik Seeley <ysee...@gmail.com>
>>> wrote:
>>> >
>>> > > On Thu, Sep 7, 2017 at 12:47 AM, Erick Erickson <
>>> erickerick...@gmail.com
>>> > >
>>> > > wrote:
>>> >

Re: Consecutive calls to a query give different results

2017-09-07 Thread Webster Homer

The scores are not the same:
Doc     Score
305340  432.44238
C2646   428.24185
12837   430.61722

One other thing: I just ran optimize, and now document 305340 is
consistently the top score.
So apparently it IS essential to run optimize after a data load.

Note that we see this behavior fairly commonly on our Solr cloud instances.
This was not the first time. This particular situation was on a development
system.

On Thu, Sep 7, 2017 at 10:04 AM, Webster Homer <webster.ho...@sial.com>
wrote:

> the scores are not the same
> Doc
> 305340 432.44238
>
> On Thu, Sep 7, 2017 at 10:02 AM, David Hastings <
> hastings.recurs...@gmail.com> wrote:
>
>> "I am concerned that the same
>> search gives different results after each search. The top document seems
>> to
>> cycle between 3 different documents"
>>
>>
>> if you do debug query on the search, are the scores for the top 3
>> documents
>> the same or not?  you can easily have three documents with the same score,
>> so when you have a result set that is ranked 1-1-1-2-3-4 you can
>> expect
>> 1-1-1 to rotate based on whatever.  use a second element like id to your
>> ranking perhaps.
>>
>>
>>
>>
>> On Thu, Sep 7, 2017 at 10:54 AM, Webster Homer <webster.ho...@sial.com>
>> wrote:
>>
>> > I am not concerned about deleted documents. I am concerned that the same
>> > search gives different results after each search. The top document
>> seems to
>> > cycle between 3 different documents
>> >
>> > I have an enhanced collections info api call that calls the core admin
>> api
>> > to get the index information for the replica.
>> > When I said the numdocs were the same I meant exactly that. maxdocs and
>> > deleted documents are not the same for the replicas, but the number of
>> > numdocs is.
>> >
>> > Or are you saying that the search is looking at deleted documents
>> wouldn't
>> > that be a very significant bug?
>> >
>> > The four replicas:
>> > shard1
>> > core_node1
>> > "numDocs": 383817,
>> > "maxDocs": 611592,
>> > "deletedDocs": 227775,
>> > "size": "2.49 GB",
>> > "lastModified": "2017-09-07T08:18:03.639Z",
>> > "current": true,
>> > "version": 35644,
>> > "segmentCount": 28
>> >
>> > core_node3
>> > "numDocs": 383817,
>> > "maxDocs": 571737,
>> > "deletedDocs": 187920,
>> > "size": "2.85 GB",
>> > "lastModified": "2017-09-07T08:18:03.634Z",
>> > "current": false,
>> > "version": 35562,
>> > "segmentCount": 36
>> > shard2
>> > core_node2
>> > "numDocs": 385326,
>> > "maxDocs": 529214,
>> > "deletedDocs": 143888,
>> > "size": "2.13 GB",
>> > "lastModified": "2017-09-07T08:18:03.632Z",
>> > "current": true,
>> > "version": 34783,
>> > "segmentCount": 24
>> > core_node4
>> > "numDocs": 385326,
>> > "maxDocs": 488201,
>> > "deletedDocs": 102875,
>> > "size": "1.96 GB",
>> > "lastModified": "2017-09-07T08:18:03.633Z",
>> > "current": true,
>> > "version": 34932,
>> > "segmentCount": 21
>> >
>> >
>> > On Thu, Sep 7, 2017 at 7:58 AM, Yonik Seeley <ysee...@gmail.com> wrote:
>> >
>> > > On Thu, Sep 7, 2017 at 12:47 AM, Erick Erickson <
>> erickerick...@gmail.com
>> > >
>> > > wrote:
>> > > > bq: and deleted documents are irrelevant to term statistics...
>> > > >
>> > > > Did you mean "relevant"? Or do I have to adjust my thinking _again_?
>> > >
>> > > One can make it work either way ;-)
>> > > Whether a document is marked as deleted or not has no effect on term
>> > > statistics (i.e. irrelevant)
>> > > OR documents marked for deletion still count in term statistics (i.e.
>> > > relevant)
>> > >
>> > > I guess I used the former because we don't go out of our way to still
>> > > include deleted documents... it's just a side effect of the index
>> > > structure t

Re: Consecutive calls to a query give different results

2017-09-07 Thread Webster Homer

the scores are not the same
Doc
305340 432.44238

On Thu, Sep 7, 2017 at 10:02 AM, David Hastings <
hastings.recurs...@gmail.com> wrote:

> "I am concerned that the same
> search gives different results after each search. The top document seems to
> cycle between 3 different documents"
>
>
> if you do debug query on the search, are the scores for the top 3 documents
> the same or not?  you can easily have three documents with the same score,
> so when you have a result set that is ranked 1-1-1-2-3-4 you can expect
> 1-1-1 to rotate based on whatever.  use a second element like id to your
> ranking perhaps.
>
>
>
>
> On Thu, Sep 7, 2017 at 10:54 AM, Webster Homer <webster.ho...@sial.com>
> wrote:
>
> > I am not concerned about deleted documents. I am concerned that the same
> > search gives different results after each search. The top document seems
> to
> > cycle between 3 different documents
> >
> > I have an enhanced collections info api call that calls the core admin
> api
> > to get the index information for the replica.
> > When I said the numdocs were the same I meant exactly that. maxdocs and
> > deleted documents are not the same for the replicas, but the number of
> > numdocs is.
> >
> > Or are you saying that the search is looking at deleted documents
> wouldn't
> > that be a very significant bug?
> >
> > The four replicas:
> > shard1
> > core_node1
> > "numDocs": 383817,
> > "maxDocs": 611592,
> > "deletedDocs": 227775,
> > "size": "2.49 GB",
> > "lastModified": "2017-09-07T08:18:03.639Z",
> > "current": true,
> > "version": 35644,
> > "segmentCount": 28
> >
> > core_node3
> > "numDocs": 383817,
> > "maxDocs": 571737,
> > "deletedDocs": 187920,
> > "size": "2.85 GB",
> > "lastModified": "2017-09-07T08:18:03.634Z",
> > "current": false,
> > "version": 35562,
> > "segmentCount": 36
> > shard2
> > core_node2
> > "numDocs": 385326,
> > "maxDocs": 529214,
> > "deletedDocs": 143888,
> > "size": "2.13 GB",
> > "lastModified": "2017-09-07T08:18:03.632Z",
> > "current": true,
> > "version": 34783,
> > "segmentCount": 24
> > core_node4
> > "numDocs": 385326,
> > "maxDocs": 488201,
> > "deletedDocs": 102875,
> > "size": "1.96 GB",
> > "lastModified": "2017-09-07T08:18:03.633Z",
> > "current": true,
> > "version": 34932,
> > "segmentCount": 21
> >
> >
> > On Thu, Sep 7, 2017 at 7:58 AM, Yonik Seeley <ysee...@gmail.com> wrote:
> >
> > > On Thu, Sep 7, 2017 at 12:47 AM, Erick Erickson <
> erickerick...@gmail.com
> > >
> > > wrote:
> > > > bq: and deleted documents are irrelevant to term statistics...
> > > >
> > > > Did you mean "relevant"? Or do I have to adjust my thinking _again_?
> > >
> > > One can make it work either way ;-)
> > > Whether a document is marked as deleted or not has no effect on term
> > > statistics (i.e. irrelevant)
> > > OR documents marked for deletion still count in term statistics (i.e.
> > > relevant)
> > >
> > > I guess I used the former because we don't go out of our way to still
> > > include deleted documents... it's just a side effect of the index
> > > structure that we don't (and can't easily) update statistics when a
> > > document is marked as deleted.
> > >
> > > -Yonik
> > >
> > >
> > > > Erick
> > > >
> > > > On Wed, Sep 6, 2017 at 7:48 PM, Yonik Seeley <ysee...@gmail.com>
> > wrote:
> > > >> Different replicas of the same shard can have different numbers of
> > > >> deleted documents (really just marked as deleted), and deleted
> > > >> documents are irrelevant to term statistics (like the number of
> > > >> documents a term appears in).  Documents marked for deletion stop
> > > >> contributing to corpus statistics when they are actually removed
> (via
> > > >> expunge deletes, merges, optimizes).
> > > >> -Yonik
> > > >>
> > > >>
> > > >> On

Re: Consecutive calls to a query give different results

2017-09-07 Thread David Hastings

"I am concerned that the same
search gives different results after each search. The top document seems to
cycle between 3 different documents"


If you run a debug query on the search, are the scores for the top 3 documents
the same or not? You can easily have three documents with the same score,
so when you have a result set that is ranked 1-1-1-2-3-4 you can expect the
1-1-1 group to rotate arbitrarily. Perhaps add a second sort element, such as
id, to your ranking.
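
A minimal sketch of the tie-break idea, written in plain Python rather than
against Solr. The document ids and scores are made up for illustration; only
the principle (order by score, then by a unique field such as id) comes from
the suggestion above. In a Solr request the equivalent would be a sort such
as "score desc,id asc".

```python
# Hypothetical hits: three documents tie on score, so their relative order
# is undefined unless a second, unique sort key is supplied.
hits = [
    {"id": "doc-c", "score": 1.0},
    {"id": "doc-a", "score": 1.0},
    {"id": "doc-b", "score": 1.0},
    {"id": "doc-d", "score": 0.5},
]


def rank(docs):
    """Order by score descending, then by id ascending to break ties."""
    return sorted(docs, key=lambda d: (-d["score"], d["id"]))


ranked_ids = [d["id"] for d in rank(hits)]
print(ranked_ids)
```

With the unique id as the final key, the order no longer depends on the order
in which the tied documents arrive.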




On Thu, Sep 7, 2017 at 10:54 AM, Webster Homer <webster.ho...@sial.com>
wrote:

> I am not concerned about deleted documents. I am concerned that the same
> search gives different results after each search. The top document seems to
> cycle between 3 different documents
>
> I have an enhanced collections info api call that calls the core admin api
> to get the index information for the replica.
> When I said the numdocs were the same I meant exactly that. maxdocs and
> deleted documents are not the same for the replicas, but the number of
> numdocs is.
>
> Or are you saying that the search is looking at deleted documents wouldn't
> that be a very significant bug?
>
> The four replicas:
> shard1
> core_node1
> "numDocs": 383817,
> "maxDocs": 611592,
> "deletedDocs": 227775,
> "size": "2.49 GB",
> "lastModified": "2017-09-07T08:18:03.639Z",
> "current": true,
> "version": 35644,
> "segmentCount": 28
>
> core_node3
> "numDocs": 383817,
> "maxDocs": 571737,
> "deletedDocs": 187920,
> "size": "2.85 GB",
> "lastModified": "2017-09-07T08:18:03.634Z",
> "current": false,
> "version": 35562,
> "segmentCount": 36
> shard2
> core_node2
> "numDocs": 385326,
> "maxDocs": 529214,
> "deletedDocs": 143888,
> "size": "2.13 GB",
> "lastModified": "2017-09-07T08:18:03.632Z",
> "current": true,
> "version": 34783,
> "segmentCount": 24
> core_node4
> "numDocs": 385326,
> "maxDocs": 488201,
> "deletedDocs": 102875,
> "size": "1.96 GB",
> "lastModified": "2017-09-07T08:18:03.633Z",
> "current": true,
> "version": 34932,
> "segmentCount": 21
>
>
> On Thu, Sep 7, 2017 at 7:58 AM, Yonik Seeley <ysee...@gmail.com> wrote:
>
> > On Thu, Sep 7, 2017 at 12:47 AM, Erick Erickson <erickerick...@gmail.com
> >
> > wrote:
> > > bq: and deleted documents are irrelevant to term statistics...
> > >
> > > Did you mean "relevant"? Or do I have to adjust my thinking _again_?
> >
> > One can make it work either way ;-)
> > Whether a document is marked as deleted or not has no effect on term
> > statistics (i.e. irrelevant)
> > OR documents marked for deletion still count in term statistics (i.e.
> > relevant)
> >
> > I guess I used the former because we don't go out of our way to still
> > include deleted documents... it's just a side effect of the index
> > structure that we don't (and can't easily) update statistics when a
> > document is marked as deleted.
> >
> > -Yonik
> >
> >
> > > Erick
> > >
> > > On Wed, Sep 6, 2017 at 7:48 PM, Yonik Seeley <ysee...@gmail.com>
> wrote:
> > >> Different replicas of the same shard can have different numbers of
> > >> deleted documents (really just marked as deleted), and deleted
> > >> documents are irrelevant to term statistics (like the number of
> > >> documents a term appears in).  Documents marked for deletion stop
> > >> contributing to corpus statistics when they are actually removed (via
> > >> expunge deletes, merges, optimizes).
> > >> -Yonik
> > >>
> > >>
> > >> On Wed, Sep 6, 2017 at 5:51 PM, Webster Homer <webster.ho...@sial.com
> >
> > wrote:
> > >>> I am using Solr 6.2.0 configured as a solr cloud with 2 shards and 4
> > >>> replicas (total of 4 nodes).
> > >>>
> > >>> If I run the query multiple times I see the three different top
> scoring
> > >>> results.
> > >>> No data load is running, all data has been commited
> > >>>
> > >>> I get these three different hits with their scores:
> > >>> copperiinitratehemipentahydrate2325919004194430.61722
> > >>> copperiinitrateonc

Re: Consecutive calls to a query give different results

2017-09-07 Thread Webster Homer

I am not concerned about deleted documents. I am concerned that the same
search gives different results after each search. The top document seems to
cycle between 3 different documents

I have an enhanced collections info api call that calls the core admin api
to get the index information for the replica.
When I said the numDocs were the same I meant exactly that. maxDocs and
deletedDocs are not the same for the replicas, but numDocs is.

Or are you saying that the search is looking at deleted documents? Wouldn't
that be a very significant bug?

The four replicas:

shard1
  core_node1
    "numDocs": 383817,
    "maxDocs": 611592,
    "deletedDocs": 227775,
    "size": "2.49 GB",
    "lastModified": "2017-09-07T08:18:03.639Z",
    "current": true,
    "version": 35644,
    "segmentCount": 28

  core_node3
    "numDocs": 383817,
    "maxDocs": 571737,
    "deletedDocs": 187920,
    "size": "2.85 GB",
    "lastModified": "2017-09-07T08:18:03.634Z",
    "current": false,
    "version": 35562,
    "segmentCount": 36

shard2
  core_node2
    "numDocs": 385326,
    "maxDocs": 529214,
    "deletedDocs": 143888,
    "size": "2.13 GB",
    "lastModified": "2017-09-07T08:18:03.632Z",
    "current": true,
    "version": 34783,
    "segmentCount": 24

  core_node4
    "numDocs": 385326,
    "maxDocs": 488201,
    "deletedDocs": 102875,
    "size": "1.96 GB",
    "lastModified": "2017-09-07T08:18:03.633Z",
    "current": true,
    "version": 34932,
    "segmentCount": 21

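The replica figures in this message can be cross-checked with a short script.
This is only a sketch: the numbers are copied from the listing in this
message, while the dictionary layout and the helper name compare_shard are
illustrative choices, not part of any Solr API.

```python
# Index statistics copied from the replica listing in this message.
replicas = {
    "shard1": {
        "core_node1": {"numDocs": 383817, "maxDocs": 611592, "deletedDocs": 227775},
        "core_node3": {"numDocs": 383817, "maxDocs": 571737, "deletedDocs": 187920},
    },
    "shard2": {
        "core_node2": {"numDocs": 385326, "maxDocs": 529214, "deletedDocs": 143888},
        "core_node4": {"numDocs": 385326, "maxDocs": 488201, "deletedDocs": 102875},
    },
}


def compare_shard(cores):
    """Return (numDocs_agree, maxDocs_agree) for the replicas of one shard."""
    num_docs = {stats["numDocs"] for stats in cores.values()}
    max_docs = {stats["maxDocs"] for stats in cores.values()}
    return len(num_docs) == 1, len(max_docs) == 1


results = {}
for shard, cores in replicas.items():
    for core, stats in cores.items():
        # maxDocs counts live documents plus those only marked as deleted.
        assert stats["maxDocs"] - stats["numDocs"] == stats["deletedDocs"], core
    results[shard] = compare_shard(cores)
    print(shard, "numDocs agree:", results[shard][0],
          "maxDocs agree:", results[shard][1])
```

For these figures the live document counts agree within each shard, while
maxDocs (and therefore the number of documents still marked as deleted) differs
between the replicas of both shards.
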

On Thu, Sep 7, 2017 at 7:58 AM, Yonik Seeley <ysee...@gmail.com> wrote:

> On Thu, Sep 7, 2017 at 12:47 AM, Erick Erickson <erickerick...@gmail.com>
> wrote:
> > bq: and deleted documents are irrelevant to term statistics...
> >
> > Did you mean "relevant"? Or do I have to adjust my thinking _again_?
>
> One can make it work either way ;-)
> Whether a document is marked as deleted or not has no effect on term
> statistics (i.e. irrelevant)
> OR documents marked for deletion still count in term statistics (i.e.
> relevant)
>
> I guess I used the former because we don't go out of our way to still
> include deleted documents... it's just a side effect of the index
> structure that we don't (and can't easily) update statistics when a
> document is marked as deleted.
>
> -Yonik
>
>
> > Erick
> >
> > On Wed, Sep 6, 2017 at 7:48 PM, Yonik Seeley <ysee...@gmail.com> wrote:
> >> Different replicas of the same shard can have different numbers of
> >> deleted documents (really just marked as deleted), and deleted
> >> documents are irrelevant to term statistics (like the number of
> >> documents a term appears in).  Documents marked for deletion stop
> >> contributing to corpus statistics when they are actually removed (via
> >> expunge deletes, merges, optimizes).
> >> -Yonik
> >>
> >>
> >> On Wed, Sep 6, 2017 at 5:51 PM, Webster Homer <webster.ho...@sial.com>
> wrote:
> >>> I am using Solr 6.2.0 configured as a solr cloud with 2 shards and 4
> >>> replicas (total of 4 nodes).
> >>>
> >>> If I run the query multiple times I see the three different top scoring
> >>> results.
> >>> No data load is running, all data has been commited
> >>>
> >>> I get these three different hits with their scores:
> >>> copperiinitratehemipentahydrate2325919004194430.61722
> >>> copperiinitrateoncelite1234598765   432.44238
> >>> copperiinitratehydrate18756anhydrousbasis13778319 428.24185
> >>>
> >>> How is it that the same search against the same data can give different
> >>> responses?
> >>> I looked at the specific cores they look OK the numdocs for the
> replicas in
> >>> a shard match
> >>>
> >>> This is the query:
> >>> http://ae1c-ecomdev-msc01.sial.com:8983/solr/sial-
> catalog-product/select?defType=edismax=searchmv_
> en_keywords,%20searchmv_keywords,searchmv_pno,%20searchmv_en_s_pri_name,%
> 20search_en_p_pri_name,%20search_pno%20[explain%
> 20style=nl]=id_s=30=true&
> group.sort=sort_ds%20asc=on=2%3C-25%25=
> OR=copper%20nitrate=search_pid
> >>> ^500%20search_concat_pno^400%20searchmv_concat_sku^400%
> 20searchmv_pno^300%20search_concat_pno_genr^100%20searchmv_pno_genr%
> 20searchmv_p_skus_genr%20searchmv_user_term^200%
> 20search_lform^190%20searchmv_en_acronym^180%20search_en_
> root_name^170%20sear

Re: Consecutive calls to a query give different results

2017-09-07 Thread Erick Erickson

Whew! I haven't been lying to people for _years_..

On Thu, Sep 7, 2017 at 5:58 AM, Yonik Seeley  wrote:
> On Thu, Sep 7, 2017 at 12:47 AM, Erick Erickson  
> wrote:
>> bq: and deleted documents are irrelevant to term statistics...
>>
>> Did you mean "relevant"? Or do I have to adjust my thinking _again_?
>
> One can make it work either way ;-)
> Whether a document is marked as deleted or not has no effect on term
> statistics (i.e. irrelevant)
> OR documents marked for deletion still count in term statistics (i.e. 
> relevant)
>
> I guess I used the former because we don't go out of our way to still
> include deleted documents... it's just a side effect of the index
> structure that we don't (and can't easily) update statistics when a
> document is marked as deleted.
>
> -Yonik
>
>
>> Erick
>>
>> On Wed, Sep 6, 2017 at 7:48 PM, Yonik Seeley  wrote:
>>> Different replicas of the same shard can have different numbers of
>>> deleted documents (really just marked as deleted), and deleted
>>> documents are irrelevant to term statistics (like the number of
>>> documents a term appears in).  Documents marked for deletion stop
>>> contributing to corpus statistics when they are actually removed (via
>>> expunge deletes, merges, optimizes).
>>> -Yonik
>>>
>>>
>>> On Wed, Sep 6, 2017 at 5:51 PM, Webster Homer  
>>> wrote:
>>>> I am using Solr 6.2.0 configured as a solr cloud with 2 shards and 4
>>>> replicas (total of 4 nodes).
>>>>
>>>> If I run the query multiple times I see the three different top scoring
>>>> results.
>>>> No data load is running, all data has been commited
>>>>
>>>> I get these three different hits with their scores:
>>>> copperiinitratehemipentahydrate2325919004194430.61722
>>>> copperiinitrateoncelite1234598765   432.44238
>>>> copperiinitratehydrate18756anhydrousbasis13778319 428.24185
>>>>
>>>> How is it that the same search against the same data can give different
>>>> responses?
>>>> I looked at the specific cores they look OK the numdocs for the replicas in
>>>> a shard match
>>>>
>>>> This is the query:
>>>> http://ae1c-ecomdev-msc01.sial.com:8983/solr/sial-catalog-product/select?defType=edismax=searchmv_en_keywords,%20searchmv_keywords,searchmv_pno,%20searchmv_en_s_pri_name,%20search_en_p_pri_name,%20search_pno%20[explain%20style=nl]=id_s=30=true=sort_ds%20asc=on=2%3C-25%25=OR=copper%20nitrate=search_pid
>>>> ^500%20search_concat_pno^400%20searchmv_concat_sku^400%20searchmv_pno^300%20search_concat_pno_genr^100%20searchmv_pno_genr%20searchmv_p_skus_genr%20searchmv_user_term^200%20search_lform^190%20searchmv_en_acronym^180%20search_en_root_name^170%20searchmv_en_s_pri_name^160%20search_en_p_pri_name^150%20searchmv_en_synonyms^145%20searchmv_en_keywords^140%20search_en_sortkey^120%20searchmv_p_skus^100%20searchmv_chem_comp^90%20searchmv_en_name_suf%20searchmv_cas_number^80%20searchmv_component_cas^70%20search_beilstein^50%20search_color_idx^40%20search_ecnumber^30%20search_egecnumber^30%20search_femanumber^20%20searchmv_isbn^10%20search_mdl_number%20searchmv_en_page_title%20searchmv_en_descriptions%20searchmv_en_attributes%20searchmv_rtecs%20searchmv_lookahead_terms%20searchmv_xref_comparable_pno%20searchmv_xref_comparable_sku%20searchmv_xref_equivalent_pno%20searchmv_xref_exact_pno%20searchmv_xref_exact_sku%20searchmv_component_molform=30=score%20desc,sort_en_name%20asc,sort_ds%20asc,search_pid%20asc=json
>>>>
>>>> --
>>>>
>>>> This message and any attachment are confidential and may be privileged or
>>>> otherwise protected from disclosure. If you are not the intended recipient,
>>>> you must not copy this message or attachment or disclose the contents to
>>>> any other person. If you have received this transmission in error, please
>>>> notify the sender immediately and delete the message and any attachment
>>>> from your system. Merck KGaA, Darmstadt, Germany and any of its
>>>> subsidiaries do not accept liability for any omissions or errors in this
>>>> message which may arise as a result of E-Mail-transmission or for damages
>>>> resulting from any unauthorized changes of the content of this message and
>>>> any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its
>>>> subsidiaries do not guarantee that this message is free of viruses and does
>>>> not accept liability for any damages caused by any virus transmitted
>>>> therewith.
>>>>
>>>> Click http://www.emdgroup.com/disclaimer to access the German, French,
>>>> Spanish and Portuguese versions of this disclaimer.

Re: Consecutive calls to a query give different results

2017-09-07 Thread Yonik Seeley

On Thu, Sep 7, 2017 at 12:47 AM, Erick Erickson  wrote:
> bq: and deleted documents are irrelevant to term statistics...
>
> Did you mean "relevant"? Or do I have to adjust my thinking _again_?

One can make it work either way ;-)
Whether a document is marked as deleted or not has no effect on term
statistics (i.e. irrelevant)
OR documents marked for deletion still count in term statistics (i.e. relevant)

I guess I used the former because we don't go out of our way to still
include deleted documents... it's just a side effect of the index
structure that we don't (and can't easily) update statistics when a
document is marked as deleted.

-Yonik


> Erick
>
> On Wed, Sep 6, 2017 at 7:48 PM, Yonik Seeley  wrote:
>> Different replicas of the same shard can have different numbers of
>> deleted documents (really just marked as deleted), and deleted
>> documents are irrelevant to term statistics (like the number of
>> documents a term appears in).  Documents marked for deletion stop
>> contributing to corpus statistics when they are actually removed (via
>> expunge deletes, merges, optimizes).
>> -Yonik
>>
>>
>> On Wed, Sep 6, 2017 at 5:51 PM, Webster Homer  wrote:
>>> I am using Solr 6.2.0 configured as a solr cloud with 2 shards and 4
>>> replicas (total of 4 nodes).
>>>
>>> If I run the query multiple times I see the three different top scoring
>>> results.
>>> No data load is running, all data has been commited
>>>
>>> I get these three different hits with their scores:
>>> copperiinitratehemipentahydrate2325919004194430.61722
>>> copperiinitrateoncelite1234598765   432.44238
>>> copperiinitratehydrate18756anhydrousbasis13778319 428.24185
>>>
>>> How is it that the same search against the same data can give different
>>> responses?
>>> I looked at the specific cores they look OK the numdocs for the replicas in
>>> a shard match
>>>
>>> This is the query:
>>> http://ae1c-ecomdev-msc01.sial.com:8983/solr/sial-catalog-product/select?defType=edismax=searchmv_en_keywords,%20searchmv_keywords,searchmv_pno,%20searchmv_en_s_pri_name,%20search_en_p_pri_name,%20search_pno%20[explain%20style=nl]=id_s=30=true=sort_ds%20asc=on=2%3C-25%25=OR=copper%20nitrate=search_pid
>>> ^500%20search_concat_pno^400%20searchmv_concat_sku^400%20searchmv_pno^300%20search_concat_pno_genr^100%20searchmv_pno_genr%20searchmv_p_skus_genr%20searchmv_user_term^200%20search_lform^190%20searchmv_en_acronym^180%20search_en_root_name^170%20searchmv_en_s_pri_name^160%20search_en_p_pri_name^150%20searchmv_en_synonyms^145%20searchmv_en_keywords^140%20search_en_sortkey^120%20searchmv_p_skus^100%20searchmv_chem_comp^90%20searchmv_en_name_suf%20searchmv_cas_number^80%20searchmv_component_cas^70%20search_beilstein^50%20search_color_idx^40%20search_ecnumber^30%20search_egecnumber^30%20search_femanumber^20%20searchmv_isbn^10%20search_mdl_number%20searchmv_en_page_title%20searchmv_en_descriptions%20searchmv_en_attributes%20searchmv_rtecs%20searchmv_lookahead_terms%20searchmv_xref_comparable_pno%20searchmv_xref_comparable_sku%20searchmv_xref_equivalent_pno%20searchmv_xref_exact_pno%20searchmv_xref_exact_sku%20searchmv_component_molform=30=score%20desc,sort_en_name%20asc,sort_ds%20asc,search_pid%20asc=json
>>>

Re: Consecutive calls to a query give different results

2017-09-06 Thread Erick Erickson

bq: and deleted documents are irrelevant to term statistics...

Did you mean "relevant"? Or do I have to adjust my thinking _again_?

Erick

On Wed, Sep 6, 2017 at 7:48 PM, Yonik Seeley  wrote:
> Different replicas of the same shard can have different numbers of
> deleted documents (really just marked as deleted), and deleted
> documents are irrelevant to term statistics (like the number of
> documents a term appears in).  Documents marked for deletion stop
> contributing to corpus statistics when they are actually removed (via
> expunge deletes, merges, optimizes).
> -Yonik
>
>
> On Wed, Sep 6, 2017 at 5:51 PM, Webster Homer  wrote:
>> I am using Solr 6.2.0 configured as a solr cloud with 2 shards and 4
>> replicas (total of 4 nodes).
>>
>> If I run the query multiple times I see the three different top scoring
>> results.
>> No data load is running, all data has been commited
>>
>> I get these three different hits with their scores:
>> copperiinitratehemipentahydrate2325919004194430.61722
>> copperiinitrateoncelite1234598765   432.44238
>> copperiinitratehydrate18756anhydrousbasis13778319 428.24185
>>
>> How is it that the same search against the same data can give different
>> responses?
>> I looked at the specific cores they look OK the numdocs for the replicas in
>> a shard match
>>
>> This is the query:
>> http://ae1c-ecomdev-msc01.sial.com:8983/solr/sial-catalog-product/select?defType=edismax=searchmv_en_keywords,%20searchmv_keywords,searchmv_pno,%20searchmv_en_s_pri_name,%20search_en_p_pri_name,%20search_pno%20[explain%20style=nl]=id_s=30=true=sort_ds%20asc=on=2%3C-25%25=OR=copper%20nitrate=search_pid
>> ^500%20search_concat_pno^400%20searchmv_concat_sku^400%20searchmv_pno^300%20search_concat_pno_genr^100%20searchmv_pno_genr%20searchmv_p_skus_genr%20searchmv_user_term^200%20search_lform^190%20searchmv_en_acronym^180%20search_en_root_name^170%20searchmv_en_s_pri_name^160%20search_en_p_pri_name^150%20searchmv_en_synonyms^145%20searchmv_en_keywords^140%20search_en_sortkey^120%20searchmv_p_skus^100%20searchmv_chem_comp^90%20searchmv_en_name_suf%20searchmv_cas_number^80%20searchmv_component_cas^70%20search_beilstein^50%20search_color_idx^40%20search_ecnumber^30%20search_egecnumber^30%20search_femanumber^20%20searchmv_isbn^10%20search_mdl_number%20searchmv_en_page_title%20searchmv_en_descriptions%20searchmv_en_attributes%20searchmv_rtecs%20searchmv_lookahead_terms%20searchmv_xref_comparable_pno%20searchmv_xref_comparable_sku%20searchmv_xref_equivalent_pno%20searchmv_xref_exact_pno%20searchmv_xref_exact_sku%20searchmv_component_molform=30=score%20desc,sort_en_name%20asc,sort_ds%20asc,search_pid%20asc=json
>>

Re: Consecutive calls to a query give different results

2017-09-06 Thread Yonik Seeley

Different replicas of the same shard can have different numbers of
deleted documents (really just marked as deleted), and deleted
documents are irrelevant to term statistics (like the number of
documents a term appears in).  Documents marked for deletion stop
contributing to corpus statistics when they are actually removed (via
expunge deletes, merges, optimizes).
-Yonik
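
To illustrate why this matters for scoring, here is a hedged sketch of how a
BM25-style inverse document frequency changes when two replicas of the same
shard carry different numbers of not-yet-merged deleted documents. The
formula is the IDF form used by Lucene's BM25 similarity; using maxDocs as a
stand-in for the document count, and the document frequency of 1000, are
assumptions made purely for illustration. The two maxDocs values are the ones
reported elsewhere in this thread for the shard1 replicas.

```python
import math


def bm25_idf(doc_freq, doc_count):
    """BM25-style IDF: log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Assumed for illustration: the term occurs in 1000 documents on each replica.
ASSUMED_DOC_FREQ = 1000

# maxDocs for the two shard1 replicas, as reported in this thread. Documents
# that are only marked as deleted are still included in these counts.
idf_node1 = bm25_idf(ASSUMED_DOC_FREQ, 611592)
idf_node3 = bm25_idf(ASSUMED_DOC_FREQ, 571737)

print(f"core_node1 idf: {idf_node1:.4f}")
print(f"core_node3 idf: {idf_node3:.4f}")
```

The two values differ slightly, so the same document can receive a slightly
different score depending on which replica answers the request, even though
both replicas hold the same live documents.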


On Wed, Sep 6, 2017 at 5:51 PM, Webster Homer  wrote:
> I am using Solr 6.2.0 configured as a solr cloud with 2 shards and 4
> replicas (total of 4 nodes).
>
> If I run the query multiple times I see the three different top scoring
> results.
> No data load is running, all data has been commited
>
> I get these three different hits with their scores:
> copperiinitratehemipentahydrate2325919004194430.61722
> copperiinitrateoncelite1234598765   432.44238
> copperiinitratehydrate18756anhydrousbasis13778319 428.24185
>
> How is it that the same search against the same data can give different
> responses?
> I looked at the specific cores they look OK the numdocs for the replicas in
> a shard match
>
> This is the query:
> http://ae1c-ecomdev-msc01.sial.com:8983/solr/sial-catalog-product/select?defType=edismax=searchmv_en_keywords,%20searchmv_keywords,searchmv_pno,%20searchmv_en_s_pri_name,%20search_en_p_pri_name,%20search_pno%20[explain%20style=nl]=id_s=30=true=sort_ds%20asc=on=2%3C-25%25=OR=copper%20nitrate=search_pid
> ^500%20search_concat_pno^400%20searchmv_concat_sku^400%20searchmv_pno^300%20search_concat_pno_genr^100%20searchmv_pno_genr%20searchmv_p_skus_genr%20searchmv_user_term^200%20search_lform^190%20searchmv_en_acronym^180%20search_en_root_name^170%20searchmv_en_s_pri_name^160%20search_en_p_pri_name^150%20searchmv_en_synonyms^145%20searchmv_en_keywords^140%20search_en_sortkey^120%20searchmv_p_skus^100%20searchmv_chem_comp^90%20searchmv_en_name_suf%20searchmv_cas_number^80%20searchmv_component_cas^70%20search_beilstein^50%20search_color_idx^40%20search_ecnumber^30%20search_egecnumber^30%20search_femanumber^20%20searchmv_isbn^10%20search_mdl_number%20searchmv_en_page_title%20searchmv_en_descriptions%20searchmv_en_attributes%20searchmv_rtecs%20searchmv_lookahead_terms%20searchmv_xref_comparable_pno%20searchmv_xref_comparable_sku%20searchmv_xref_equivalent_pno%20searchmv_xref_exact_pno%20searchmv_xref_exact_sku%20searchmv_component_molform=30=score%20desc,sort_en_name%20asc,sort_ds%20asc,search_pid%20asc=json
>

Consecutive calls to a query give different results

2017-09-06 Thread Webster Homer

I am using Solr 6.2.0 configured as a SolrCloud with 2 shards and 4
replicas (total of 4 nodes).

If I run the query multiple times I see three different top-scoring
results.
No data load is running, and all data has been committed.

I get these three different hits with their scores:
copperiinitratehemipentahydrate2325919004194   430.61722
copperiinitrateoncelite1234598765   432.44238
copperiinitratehydrate18756anhydrousbasis13778319   428.24185

How is it that the same search against the same data can give different
responses?
I looked at the specific cores and they look OK: the numDocs for the replicas
in a shard match.

This is the query:
http://ae1c-ecomdev-msc01.sial.com:8983/solr/sial-catalog-product/select?defType=edismax=searchmv_en_keywords,%20searchmv_keywords,searchmv_pno,%20searchmv_en_s_pri_name,%20search_en_p_pri_name,%20search_pno%20[explain%20style=nl]=id_s=30=true=sort_ds%20asc=on=2%3C-25%25=OR=copper%20nitrate=search_pid
^500%20search_concat_pno^400%20searchmv_concat_sku^400%20searchmv_pno^300%20search_concat_pno_genr^100%20searchmv_pno_genr%20searchmv_p_skus_genr%20searchmv_user_term^200%20search_lform^190%20searchmv_en_acronym^180%20search_en_root_name^170%20searchmv_en_s_pri_name^160%20search_en_p_pri_name^150%20searchmv_en_synonyms^145%20searchmv_en_keywords^140%20search_en_sortkey^120%20searchmv_p_skus^100%20searchmv_chem_comp^90%20searchmv_en_name_suf%20searchmv_cas_number^80%20searchmv_component_cas^70%20search_beilstein^50%20search_color_idx^40%20search_ecnumber^30%20search_egecnumber^30%20search_femanumber^20%20searchmv_isbn^10%20search_mdl_number%20searchmv_en_page_title%20searchmv_en_descriptions%20searchmv_en_attributes%20searchmv_rtecs%20searchmv_lookahead_terms%20searchmv_xref_comparable_pno%20searchmv_xref_comparable_sku%20searchmv_xref_equivalent_pno%20searchmv_xref_exact_pno%20searchmv_xref_exact_sku%20searchmv_component_molform=30=score%20desc,sort_en_name%20asc,sort_ds%20asc,search_pid%20asc=json


Solr suggester query with quotes produces different results

2017-07-01 Thread Angel Todorov

Hi guys,

I have the Suggester configured using the FreeTextFactory. I noticed that if
I don't use quotation marks, I only get single-term results. If I use
quotation marks around my query, then I only get results that are composed
of multiple terms. There is no configuration that would return both types
of results with a single query.

Thanks
Angel

Re: Solr Cloud Replica Cores Give different Results for the Same query

2016-12-15 Thread Erick Erickson

 message. I only
>>>> realized we may be using a different definition of "different hits"
>>>> part way through writing this reply.
>>>>
>>>> 
>>>>
>>>> Having the timestamp as a string isn't a problem, you can do something
>>>> very similar with wildcards and the like if it's a string that sorts
>>>> the same way the timestamp would. And it's best if it's created
>>>> upstream anyway that way it's guaranteed to be the same for the doc on
>>>> all replicas.
>>>>
>>>> If the date is in canonical form (YYYY-MM-DDTHH:MM:SSZ) then a simple
>>>> copyfield to a date field would do the trick.
>>>>
>>>> But there's no real reason to do any of that. Given that you see this
>>>> when there's no indexing going on then there's no point to those
>>>> tests, those were just for a way to examine your nodes while there was
>>>> active indexing.
>>>>
>>>> How do you fix this problem when you see it? If it goes away by itself
>>>> that would give at least a start on where to look. If you have to
>>>> manually intervene it would be good to know what you do.
>>>>
>>>> The CDCR pattern is that docs go from the leader on the source cluster to
>>>> the leader on the target cluster. Once the target leader gets the
>>>> docs, it's supposed to send them to all the replicas.
>>>>
>>>> To try to narrow down the issue, next time it occurs can you look at
>>>> _both_ the source and target clusters and see if they _both_ show the
>>>> same discrepancy? What I'm looking for is whether both are
>>>> self-consistent. That is, all the replicas for shardN on the source
>>>> cluster show the same documents (M). All the replicas for shardN on
>>>> the target cluster show the same number of docs (N). I'm not as
>>>> concerned if M != N at this point. Note I'm looking at the number of
>>>> hits here, not say the document ordering.
>>>>
>>>> To do this you'll have to do the trick I mentioned where you query
>>>> each replica separately.
>>>>
>>>> And are you absolutely sure that your different results are coming
>>>> from the _same_ cluster? If you're comparing a query from the source
>>>> cluster with a query from the target cluster, that's different than if
>>>> the queries come from the same cluster.
>>>>
>>>> Best,
>>>> Erick
>>>>
>>>> On Wed, Dec 14, 2016 at 2:48 PM, Webster Homer <webster.ho...@sial.com>
>>>> wrote:
>>>> > Thanks for the quick feedback.
>>>> >
>>>> > We are not doing continuous indexing, we do a complete load once a
>>>> week and
>>>> > then have a daily partial load for any documents that have changed
>>>> since
>>>> > the load. These partial loads take only a few minutes every morning.
>>>> >
>>>> > The problem is we see this discrepancy long after the data load
>>>> completes.
>>>> >
>>>> > We have a source collection that uses cdcr to replicate to the target.
>>>> I
>>>> > see the current=false setting in both the source and target
>>>> collections.
>>>> > Only the target collection is being heavily searched so that is where
>>>> my
>>>> > concern is. So what could cause this kind of issue?
>>>> > Do we have a configuration problem?
>>>> >
>>>> > It doesn't happen all the time, so I don't currently have a
>>>> reproducible
>>>> > test case, yet.
>>>> >
>>>> > I will see about adding the timestamp, we have one, but it was created
>>>> as a
>>>> > string, and was generated by our ETL job
>>>> >
>>>> > On Wed, Dec 14, 2016 at 3:42 PM, Erick Erickson <
>>>> erickerick...@gmail.com>
>>>> > wrote:
>>>> >
>>>> >> The commit points on different replicas will trip at different wall
>>>> >> clock times so the leader and replica may return slightly different
>>>> >> results depending on whether doc X was included in the commit on one
>>>> >> replica but not on the second. After the _next_ commit interval (2
>>>> >> seconds in your case), doc X will be committed on the second replica:

Re: Solr Cloud Replica Cores Give different Results for the Same query

2016-12-15 Thread Webster Homer

nt.java:758)\n\tat
>> org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:737)\n\tat
>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:428)\n\tat
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:154)\n\tat
>> org.apache.solr.core.SolrCore.execute(SolrCore.java:2089)\n\tat
>> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:652)\n\tat
>> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:459)\n\tat
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)\n\tat
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)\n\tat
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)\n\tat
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)\n\tat
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\n\tat
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)\n\tat
>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)\n\tat
>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat
>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)\n\tat
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\n\tat
>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\n\tat
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat
>> org.eclipse.jetty.server.Server.handle(Server.java:518)\n\tat
>> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)\n\tat
>> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)\n\tat
>> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)\n\tat
>> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)\n\tat
>> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\tat
>> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)\n\tat
>> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)\n\tat
>> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)\n\tat
>> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)\n\tat
>> java.lang.Thread.run(Thread.java:745)\n", "code":500}}

Re: Solr Cloud Replica Cores Give different Results for the Same query

2016-12-15 Thread Webster Homer

adPool$3.run(QueuedThreadPool.java:572)\n\tat
java.lang.Thread.run(Thread.java:745)\n", "code":500}}


Re: Solr Cloud Replica Cores Give different Results for the Same query

2016-12-14 Thread Erick Erickson

Let's back up a bit. You say "This seems to cause two replicas to
return different hits depending upon which one is queried."

OK, _how_ are they different? I've been assuming different numbers of
hits. If you're getting the same number of hits but different document
ordering, that's a completely different issue and may be easily
explainable. If this is true, skip the rest of this message. I only
realized we may be using a different definition of "different hits"
part way through writing this reply.

Having the timestamp as a string isn't a problem; you can do something
very similar with wildcards and the like if it's a string that sorts
the same way the timestamp would. And it's best if it's created
upstream anyway; that way it's guaranteed to be the same for the doc on
all replicas.

If the date is in canonical form (YYYY-MM-DDTHH:MM:SSZ) then a simple
copyfield to a date field would do the trick.
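
A quick way to check whether string timestamps "sort the same way the timestamp would" is sketched below. It is a minimal illustration only: the sample values are invented, and the helper names `parse_canonical` and `sorts_like_dates` are hypothetical, not part of Solr.

```python
from datetime import datetime, timezone
from typing import List

# Canonical form mentioned in this thread: YYYY-MM-DDTHH:MM:SSZ
CANONICAL = "%Y-%m-%dT%H:%M:%SZ"


def parse_canonical(ts: str) -> datetime:
    """Parse a canonical UTC timestamp; raises ValueError if malformed."""
    return datetime.strptime(ts, CANONICAL).replace(tzinfo=timezone.utc)


def sorts_like_dates(stamps: List[str]) -> bool:
    """True when plain string order equals chronological order."""
    return sorted(stamps) == sorted(stamps, key=parse_canonical)


# Hypothetical sample values, not taken from the poster's index.
samples = ["2016-12-14T15:42:00Z", "2016-01-03T09:00:00Z", "2015-11-30T23:59:59Z"]
print(sorts_like_dates(samples))
```

If an ETL job writes some other layout (for example month first), `parse_canonical` raises `ValueError`, which is a hint that the string would not behave like a date in range queries.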

But there's no real reason to do any of that. Given that you see this
when there's no indexing going on, there's no point to those
tests; they were just a way to examine your nodes while there was
active indexing.

How do you fix this problem when you see it? If it goes away by itself,
that would give us at least a start on where to look. If you have to
manually intervene, it would be good to know what you do.

The CDCR pattern is that docs go from the leader on the source cluster to
the leader on the target cluster. Once the target leader gets the
docs, it's supposed to send them to all the replicas.

To try to narrow down the issue, next time it occurs can you look at
_both_ the source and target clusters and see if they _both_ show the
same discrepancy? What I'm looking for is whether both are
self-consistent. That is, all the replicas for shardN on the source
cluster show the same number of docs (M). All the replicas for shardN on
the target cluster show the same number of docs (N). I'm not as
concerned if M != N at this point. Note I'm looking at the number of
hits here, not, say, the document ordering.

To do this you'll have to do the trick I mentioned where you query
each replica separately.
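
The self-consistency check described above can be sketched as follows. This is an illustration under assumptions: the replica names and counts are invented, and the per-replica hit counts are assumed to have been collected already by querying each replica on its own.

```python
from typing import Dict


def is_self_consistent(replica_counts: Dict[str, int]) -> bool:
    """True when every replica of one shard reports the same hit count."""
    return len(set(replica_counts.values())) <= 1


def compare_clusters(source: Dict[str, int], target: Dict[str, int]) -> Dict[str, bool]:
    """Summarise the checks for one shard on the source and target clusters."""
    return {
        "source_consistent": is_self_consistent(source),
        "target_consistent": is_self_consistent(target),
        # M == N is only meaningful when both sides agree internally.
        "counts_match": set(source.values()) == set(target.values()),
    }


# Hypothetical numFound values for shard1 (replica names are made up).
source_shard1 = {"replica1": 1000, "replica2": 1000}
target_shard1 = {"replica1": 1000, "replica2": 998}
print(compare_clusters(source_shard1, target_shard1))
```

In this made-up case the source is self-consistent and the target is not, which would point at replication inside the target cluster rather than at the source-to-target hop.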

And are you absolutely sure that your different results are coming
from the _same_ cluster? If you're comparing a query from the source
cluster with a query from the target cluster, that's different than if
the queries come from the same cluster.

Best,
Erick

Re: Solr Cloud Replica Cores Give different Results for the Same query

2016-12-14 Thread Webster Homer

Thanks for the quick feedback.

We are not doing continuous indexing. We do a complete load once a week and
then have a daily partial load for any documents that have changed since
the load. These partial loads take only a few minutes every morning.

The problem is we see this discrepancy long after the data load completes.

We have a source collection that uses cdcr to replicate to the target. I
see the current=false setting in both the source and target collections.
Only the target collection is being heavily searched so that is where my
concern is. So what could cause this kind of issue?
Do we have a configuration problem?

It doesn't happen all the time, so I don't currently have a reproducible
test case, yet.

I will see about adding the timestamp. We have one, but it was created as a
string and was generated by our ETL job.



Re: Solr Cloud Replica Cores Give different Results for the Same query

2016-12-14 Thread Erick Erickson

The commit points on different replicas will trip at different wall
clock times so the leader and replica may return slightly different
results depending on whether doc X was included in the commit on one
replica but not on the second. After the _next_ commit interval (2
seconds in your case), doc X will be committed on the second replica;
that is, it's not lost.

Here's a couple of ways to verify:

1> turn off indexing and wait a few seconds. The replicas should have
the exact same documents. "A few seconds" is your autocommit (soft in
your case) interval + autowarm time. This last is unknown, but you can
check your admin/plugins-stats search handler times, it's reported
there. Now issue your queries. If the replicas don't report the same
docs, that's A Bad Thing and should be worrying. BTW, with a 2 second soft
commit interval, which is really aggressive, you _better not_ have
very large autowarm intervals!

2> Include a timestamp in your docs when they are indexed. There's an
automatic way to do that, BTW. Now do your queries and append an fq
clause like &fq=timestamp:[* TO some_point_in_the_past]. The replicas
should have the same counts unless you are deleting documents. I
mention deletes on the off chance that you're deleting documents that
fall in the interval and then the same as above could theoretically
occur. Updates should be fine.

BTW, I've seen continuous monitoring of this done by automated
scripts. The key is to get the shard URL and ping that with
&distrib=false. It'll look something like
http://host:port/solr/collection_shard1_replica1. People usually
just use *:* and compare numFound.
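
A monitoring script along those lines might look roughly like the sketch below. It is only an illustration: the core URL is the placeholder pattern from this message, the function names are hypothetical, and `rows=0` and `wt=json` are added here simply to keep the response small and easy to parse.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def replica_query_url(core_url: str, q: str = "*:*") -> str:
    """Build a query URL that is answered by this one core only."""
    params = {"q": q, "rows": 0, "distrib": "false", "wt": "json"}
    return core_url.rstrip("/") + "/select?" + urlencode(params)


def num_found(core_url: str, timeout: float = 10.0) -> int:
    """Fetch numFound from a single replica (needs a reachable Solr node)."""
    with urlopen(replica_query_url(core_url), timeout=timeout) as resp:
        return json.load(resp)["response"]["numFound"]


# Placeholder core URL following the pattern quoted in this message.
print(replica_query_url("http://localhost:8983/solr/collection_shard1_replica1"))
```

Calling `num_found` for every replica of a shard and comparing the numbers gives the per-replica check; nothing is fetched until `num_found` is actually called.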

Best,
Erick

Solr Cloud Replica Cores Give different Results for the Same query

2016-12-14 Thread Webster Homer

We are using Solr Cloud 6.2

We have been noticing an issue where the index in a core shows as current =
false

We have autocommit set for 15 seconds, and soft commit at 2 seconds

This seems to cause two replicas to return different hits depending upon
which one is queried.

What would lead to the indexes not being "current"? The documentation on
the meaning of current is vague.

The collections in our cloud have two shards each with two replicas. I see
this with several of the collections.

We don't know how they get like this but it's troubling

-- 


This message and any attachment are confidential and may be privileged or 
otherwise protected from disclosure. If you are not the intended recipient, 
you must not copy this message or attachment or disclose the contents to 
any other person. If you have received this transmission in error, please 
notify the sender immediately and delete the message and any attachment 
from your system. Merck KGaA, Darmstadt, Germany and any of its 
subsidiaries do not accept liability for any omissions or errors in this 
message which may arise as a result of E-Mail-transmission or for damages 
resulting from any unauthorized changes of the content of this message and 
any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its 
subsidiaries do not guarantee that this message is free of viruses and does 
not accept liability for any damages caused by any virus transmitted 
therewith.

Click http://www.merckgroup.com/disclaimer to access the German, French, 
Spanish and Portuguese versions of this disclaimer.

Different results for comma and whitespace separated query string using eDisMax Query Parser

2016-10-31 Thread Frank.Zirkelbach

Hi,

different results are obtained for a query separated by comma and one separated 
by whitespace,

   "q":"foo,bar",
   "q":"foo bar",

although solr.StandardTokenizerFactory is utilized. The eDisMax Query Parser is 
used.
Fields of interest are determined by the 'qf' parameter.
   
   "defType":"edismax",
   "qf":"STREET_NAME COMMPART_NAME",

The different results are also reflected within the parsedquery debug output:

Whitespace:
"rawquerystring":"foo bar",
"querystring":"foo bar",
"parsedquery":"(+(DisjunctionMaxQuery((STREET_NAME:foo | 
COMMPART_NAME:foo)) DisjunctionMaxQuery((STREET_NAME:bar | 
COMMPART_NAME:bar))))/no_coord",
"parsedquery_toString":"+((STREET_NAME:foo | COMMPART_NAME:foo) 
(STREET_NAME:bar | COMMPART_NAME:bar))",
"explain":{},
"QParser":"ExtendedDismaxQParser",

Comma:
"rawquerystring":"foo,bar",
"querystring":"foo,bar",
"parsedquery":"(+DisjunctionMaxQuery(((STREET_NAME:foo STREET_NAME:bar) | 
(COMMPART_NAME:foo COMMPART_NAME:bar))))/no_coord",
"parsedquery_toString":"+((STREET_NAME:foo STREET_NAME:bar) | 
(COMMPART_NAME:foo COMMPART_NAME:bar))",
"explain":{},
"QParser":"ExtendedDismaxQParser",

The way I understand the standard tokenizer, both query strings should be split 
in the same way,
treating whitespace and punctuation as delimiters.

However, obviously, different separators result in different evaluations.
In the first case, the score values of both DisjunctionMaxQuery evaluations are 
added together.
In the second case, only one (the maximum) of these score values is returned.
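
The difference between the two parsed queries can be illustrated with a toy scoring model. This is a simplified sketch, not Lucene's actual scoring: it assumes a tie breaker of 0, ignores coord and query normalisation, and uses invented per-field term scores.

```python
from typing import Dict, List

# field -> term -> score of that term in that field (hypothetical numbers)
Scores = Dict[str, Dict[str, float]]


def score_whitespace(scores: Scores, terms: List[str], fields: List[str]) -> float:
    """One DisjunctionMaxQuery per term: sum over terms of the best field."""
    return sum(max(scores[f].get(t, 0.0) for f in fields) for t in terms)


def score_comma(scores: Scores, terms: List[str], fields: List[str]) -> float:
    """One DisjunctionMaxQuery over per-field groups: best field's summed terms."""
    return max(sum(scores[f].get(t, 0.0) for t in terms) for f in fields)


# A document where "foo" matches only the street and "bar" only the other field.
doc: Scores = {"STREET_NAME": {"foo": 1.0}, "COMMPART_NAME": {"bar": 2.0}}
terms = ["foo", "bar"]
fields = ["STREET_NAME", "COMMPART_NAME"]

print(score_whitespace(doc, terms, fields))  # 3.0
print(score_comma(doc, terms, fields))       # 2.0
```

With these made-up numbers the whitespace form rewards a document whose terms match in different fields, while the comma form only counts the single best field.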

Any ideas what I am missing here?

I am using Solr 6.2.0.
Configuration details:
   
   
and
   
 
   
   
   
   
   
 
   


Thanks and all the best,

Frank

-- 

Frank Zirkelbach
LEW Verteilnetz GmbH (LVN), GIS/NIS
Schaezlerstraße 3, 86150 Augsburg

Tel. intern: 71-1379
Tel. extern: +49-821-328-1379
Fax extern: +49-821-328-1360
mailto:frank.zirkelb...@lew-verteilnetz.de
www.lew-verteilnetz.de

Vorsitzender des Aufsichtsrats: Dr. Markus Litpher;
Geschäftsführer: Manfred Lux, Theo Schmidtner, Eugen Wiedemann
Sitz der Gesellschaft: Augsburg; USt-IdNr. DE240432124
Handelsregister HRB 20929, Registergericht: Amtsgericht Augsburg

Re: Solr MLT with stream.body returns different results on each shard

2015-08-11 Thread Chris Hostetter


: I have a fresh install of Solr 5.2.1 with about 3 million docs freshly
: indexed (I can also reproduce this issue on 4.10.0). When I use the Solr
: MorelikeThisHandler with content stream I'm getting different results per
: shard.

I haven't looked at the code recently but i'm 99% certain that the MLT 
handler in general doesn't work with distributed (ie: sharded) queries.  
(unlike the MLT component and the recently added MLT qparser)

I suspect that in the specific case of stream.body, what you are seeing is 
that the interesting terms are being computed relative to the local tf/idf 
stats for that shard, and then only local results from that shard are 
being returned.

: I also looked at using a standard MLT query, but I need to be able to
: stream in a fairly large block of text for comparison that is not in the
: index (different type of document). A standard MLT  query

Until/unless the MLT parser supports arbitrary text (there's some mention 
of this in SOLR-7639 but i'm not sure what the status of that is) you 
might find that just POSTing all of your text as a regular query (q) using 
dismax or edismax is suitable for your needs -- that's essentially the 
equivalent of what MLTHandler does with a stream.body, except it tries to 
only focus on interesting terms based on tf/idf, but if your fields 
are all configured with stopword files anyway, then the results and 
performance may be similar.
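
A rough sketch of what such a POST could look like from a client. The host and collection come from the original message; the qf value "text" and the helper name are assumptions for illustration only, and the request is built but not sent:

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_edismax_request(select_url, text, qf="text", rows=10):
    """Build (but do not send) a form-encoded POST with the whole text as q.

    qf="text" is an assumption; use whatever fields you would have given
    to mlt.fl.
    """
    params = {
        "q": text,
        "defType": "edismax",
        "qf": qf,
        "fl": "id,score",
        "rows": rows,
        "wt": "json",
    }
    body = urlencode(params).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"}
    # Supplying data makes urllib issue a POST, so the text is not limited
    # by URL length.
    return Request(select_url, data=body, headers=headers)


req = build_edismax_request("http://testsolr1:8983/solr/mega/select", "electronics")
print(req.get_method(), req.data)
```

Passing the request to urllib.request.urlopen would send it; a large block of text goes in the body exactly the same way as the single word here.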


-Hoss
http://www.lucidworks.com/

Solr MLT with stream.body returns different results on each shard

2015-08-11 Thread Aaron Gibbons

I have a fresh install of Solr 5.2.1 with about 3 million docs freshly
indexed (I can also reproduce this issue on 4.10.0). When I use the Solr
MorelikeThisHandler with content stream I'm getting different results per
shard.

I also looked at using a standard MLT query, but I need to be able to
stream in a fairly large block of text for comparison that is not in the
index (different type of document). A standard MLT  query
http://testsolr2:8983/solr/mega/select?q=electronics&mlt.flt=text&mlt.mintf=0&fl=id,score
appears to return consistent results between shards.

Any reason why the content stream query would be different between shards?
Thank you for your help!
Aaron


*Content Stream Example:*
http://testsolr1:8983/solr/mega/mlt?stream.body=electronics&mlt.flt=text&mlt.mintf=0&fl=id,score
*Returns:*
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">3</int>
</lst>
<result name="response" numFound="1590" start="0">

http://testsolr2:8983/solr/mega/mlt?stream.body=electronics&mlt.flt=text&mlt.mintf=0&fl=id,score

*Returns:*
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
</lst>
<result name="response" numFound="1619" start="0">

Solr Clustering component different results than Carrot workbench

2014-08-18 Thread Yavar Husain

I am already discussing this with Dawid (the creator of Carrot2) on the Carrot2
mailing list, but I wanted to post my problem to a wider audience.

I am using Solr 4.7 (on both Windows and Linux). I saved my
lingo-attributes.xml file from the Carrot2 workbench and I am using it in Solr.
Note that for testing I have just one Solr index, and all the queries are
fired against it.

The clusters I am getting are good in the workbench (Carrot2) but
pathetic in Solr. In the (Jetty) logs I can see:

Loaded Solr resource: clustering/carrot2/lingo-attributes.xml

which indicates that my attribute file is being loaded.

I am really confused about what accounts for the difference between the two
outputs (workbench vs. Solr). To reiterate, the data source is the same
(just one Solr index and the same queries with 100 results). This is happening
on both Linux and Windows.

Given below is my search component and request handler configuration:

<searchComponent name="clustering"
                 enable="${solr.clustering.enabled:true}"
                 class="solr.clustering.ClusteringComponent">
  <lst name="engine">
    <str name="name">lingo</str>

    <!-- Class name of a clustering algorithm compatible with the Carrot2
         framework.

         Currently available open source algorithms are:
         * org.carrot2.clustering.lingo.LingoClusteringAlgorithm
         * org.carrot2.clustering.stc.STCClusteringAlgorithm
         * org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm

         See http://project.carrot2.org/algorithms.html for more
         information.

         A commercial algorithm Lingo3G (needs to be installed
         separately) is defined as:
         * com.carrotsearch.lingo3g.Lingo3GClusteringAlgorithm
    -->
    <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
    <str name="LingoClusteringAlgorithm.desiredClusterCountBase">30</str>

    <!-- Override location of the clustering algorithm's resources
         (attribute definitions and lexical resources).

         A directory from which to load algorithm-specific stop words,
         stop labels and attribute definition XMLs.

         For an overview of Carrot2 lexical resources, see:
         http://download.carrot2.org/head/manual/#chapter.lexical-resources

         For an overview of Lingo3G lexical resources, see:
         http://download.carrotsearch.com/lingo3g/manual/#chapter.lexical-resources
    -->
    <str name="carrot.resourcesDir">clustering/carrot2</str>
  </lst>
</searchComponent>

<!-- A request handler for demonstrating the clustering component

     This is purely as an example.

     In reality you will likely want to add the component to your
     already specified request handlers.
-->
<requestHandler name="/clustering"
                enable="${solr.clustering.enabled:true}"
                class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="clustering">true</bool>
    <bool name="clustering.results">true</bool>
    <!-- Field name with the logical title of a each document (optional) -->
    <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
    <str name="carrot.resourcesDir">clustering/carrot2</str>
    <str name="carrot.title">film_id</str>
    <!-- Field name with the logical content of a each document (optional) -->
    <str name="carrot.snippet">description</str>
    <!-- Apply highlighter to the title/ content and use this for clustering. -->
    <bool name="carrot.produceSummary">true</bool>
    <!-- the maximum number of labels per cluster -->
    <!--<int name="carrot.numDescriptions">5</int>-->
    <!-- produce sub clusters -->
    <bool name="carrot.outputSubClusters">false</bool>
    <str name="rows">100</str>
  </lst>
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>

Re: Join and non-Join query give different results

2014-07-19 Thread atawfik

I have figured it out. 

The reason is simply how the join works in Solr. Since
both filter queries are executed separately, a house is returned as soon as it
has some available document with discount >= 1 and some (possibly different)
available document with sd_year:2014 AND sd_month:11, even though my intention
was to apply both conditions to the same available document.

However, in the second case, both conditions are applied at the same time to
find available documents, and houses would only be returned based on the
matching available documents. Since no available document
satisfies both conditions, there is no matching house, which gives
zero results.
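
The effect can be reproduced with a small simulation. This is only a sketch of the semantics as I understand them, not Solr code; the sample documents and the helper name are made up, and the start_date condition is left out for brevity:

```python
# Hypothetical "available" child documents pointing at houses via house_id_fk.
available = [
    {"house_id_fk": 1, "discount": 5, "sd_year": 2014, "sd_month": 7},
    {"house_id_fk": 1, "discount": 0, "sd_year": 2014, "sd_month": 11},
    {"house_id_fk": 2, "discount": 0, "sd_year": 2014, "sd_month": 11},
]


def join_to_houses(docs, predicate):
    """Mimic {!join from=house_id_fk to=house_id}: collect the house ids of
    all child documents that match the predicate."""
    return {d["house_id_fk"] for d in docs if predicate(d)}


def has_discount(d):
    return d["discount"] >= 1


def in_november_2014(d):
    return d["sd_year"] == 2014 and d["sd_month"] == 11


# Two separate fq parameters: each join runs on its own, and the resulting
# house sets are intersected afterwards.
two_filters = join_to_houses(available, has_discount) & join_to_houses(
    available, in_november_2014
)

# One fq with all conditions: a single child document must satisfy everything.
one_filter = join_to_houses(
    available, lambda d: has_discount(d) and in_november_2014(d)
)

print(two_filters)  # house 1 matches through two different child documents
print(one_filter)   # no single child document satisfies both conditions
```

Following the same logic, putting all conditions into a single {!join} filter query should give the result I originally intended.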

It really took some time to figure this out; I hope this will help someone
else.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Join-and-non-Join-query-give-different-results-tp4146922p4148131.html
Sent from the Solr - User mailing list archive at Nabble.com.

Join and non-Join query give different results

2014-07-13 Thread atawfik

Hi everyone,

I am trying to link two types of documents in my Solr index. The parent is
named house and the child is named available. So, I want to return a
list of houses that have available documents with some filtering. However,
the following query gives me around 18 documents, which is wrong. It should
return 0 documents.

q=*:*
fq={!join from=house_id_fk to=house_id}doctype:available AND discount:[1 TO
*] AND start_date:[NOW/DAY TO NOW/DAY%2B21DAYS]
fq={!join from=house_id_fk to=house_id}doctype:available AND sd_year:2014
AND sd_month:11

To debug it, I first checked whether there are any available documents
matching the given filter queries. So, I tried the following query:
q=*:*
fq=doctype:available AND discount:[1 TO *] AND start_date:[NOW/DAY TO
NOW/DAY%2B21DAYS]
fq=doctype:available AND sd_year:2014 AND sd_month:11

The query gives 0 results, which is correct. As you can see, both queries
are the same; the only difference is the use of the join query parser. I am a
bit confused about why the first query gives results. My understanding is that
this should not happen, because the second query shows that there are no
available documents that satisfy the given filter queries.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Join-and-non-Join-query-give-different-results-tp4146922.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Debug different Results from different Request Handlers

2014-06-18 Thread O. Olson

Thank you Erik (and steffkes, who helped me on the IRC #Solr chat). Sorry
for the delay in responding, but I got this to work.

Your suggestion about adding debug=true to the query helped me. Since I was
adding this to the Velocity request handler, I could not see the debug
results, but when I added wt=xml, i.e. /products?q=hp|lync&debug=true&wt=xml,
I could see the parsed query as well as the parser used
for each handler.

Thanks also to steffkes, who answered the question from my original post (on
IRC): both of my handlers go through
org.apache.solr.servlet.SolrDispatchFilter, and in particular it is the
doFilter() method that I was looking for.

Also, as steffkes pointed out, the /products
request handler uses the ExtendedDismaxQParser, whereas the second (/search or
/select) request handler uses the LuceneQParser. It seems that these two
parsers handle the | sign very differently. For my limited private
installation, I decided to go to the base class of ExtendedDismaxQParser and
LuceneQParser, i.e. QParser. There, in the constructor, I strip out the | sign
from the qstr parameter. This is probably the dirtiest way to get this to
work, but it works for now.
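
For anyone who cannot patch Solr, a similar effect could probably be had on the client side before the query is sent. This is only a sketch of that alternative, not what I actually did, and the function name is made up:

```python
def strip_pipe(query):
    """Replace every | with a space and collapse repeated whitespace."""
    return " ".join(query.replace("|", " ").split())


print(strip_pipe("hp|lync"))
```

Unlike my change, this keeps the two words separate instead of gluing them together; whether that is what you want depends on how the field is analyzed.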

Thanks again to you all.
O. O. 

 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Debug-different-Results-from-different-Request-Handlers-tp4141804p4142716.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Debug different Results from different Request Handlers

2014-06-16 Thread Erik Hatcher

If you want the two request handlers to have the same behavior, with just the 
velocity stuff being different, then remove everything except echoParams, wt, 
v.template, v.base_dir, v.layout (and title if your templates are using it; 
the default does).

You can see which query parser is being used by adding debug=true to the 
request (or debugQuery=true, legacy param).
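
A small sketch of doing that from a script. The base URL is an assumption, wt=json is used here only because it is easier to parse than XML, and the sample response is hand-written to show the shape of the debug section rather than copied from a real server:

```python
import json
from urllib.parse import urlencode


def debug_url(base, handler, q):
    """Build a request URL that asks the given handler for debug output."""
    return base + handler + "?" + urlencode({"q": q, "debug": "true", "wt": "json"})


def parser_info(response_text):
    """Return (QParser, parsedquery) from the debug section of a JSON response."""
    debug = json.loads(response_text).get("debug", {})
    return debug.get("QParser"), debug.get("parsedquery")


url = debug_url("http://localhost:8983/solr", "/products", "hp|lync")
print(url)

# Hand-written sample; a real response has many more keys.
sample = json.dumps(
    {"debug": {"QParser": "ExtendedDismaxQParser", "parsedquery": "(placeholder)"}}
)
print(parser_info(sample))
```

Running the same query against both handlers and comparing the two tuples shows directly whether the parsers differ.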

Erik


Re: Debug different Results from different Request Handlers

2014-06-14 Thread Erik Hatcher

Try /products?wt=xml and compare.  VRW is just a writer; it doesn't affect the 
results in any way.   Let's see the rest of those handler definitions - 
different query parser is my hunch. Or maybe your velocity template is not 
showing the actual results?

  Erik


Re: Debug different Results from different Request Handlers

2014-06-14 Thread O. Olson

Thank you Erik. I tried /products?q=hp|lync&wt=xml and I get no results, i.e.
numFound=0, so I think there is something wrong. You are correct that the
problem is not the VRW but the query parser. Could you please let me know
how to determine the query parser?

For the most part I have not changed these request handlers from the Solr
examples. The request handler that uses Apache Velocity looks like:

<requestHandler name="/products" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="wt">velocity</str>
    <str name="v.template">browse</str>
    <str name="debugQuery">true</str>
    <str name="v.base_dir">VMTemplates</str>
    <str name="v.layout">layout</str>
    <str name="title">Solritas</str>
    <str name="defType">edismax</str>
    <str name="qf">
      text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
      title^10.0 description^5.0 keywords^5.0 author^2.0 resourcename^1.0
    </str>
    <str name="df">text</str>
    <str name="mm">100%</str>
    <str name="q.alt">*:*</str>
    <str name="rows">10</str>
    <str name="fl">*,score</str>
    <str name="mlt.qf">
      text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
      title^10.0 description^5.0 keywords^5.0 author^2.0 resourcename^1.0
    </str>
    <str name="mlt.fl">text,features,name,sku,id,manu,cat,title,description,keywords,author,resourcename</str>
    <int name="mlt.count">3</int>
    <str name="facet">on</str>
    <str name="facet.field">CategoryID</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.alternativeTermCount">2</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">5</str>
    <str name="spellcheck.maxCollations">3</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

And the regular XML handler looks like:

<requestHandler name="/search"
                class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
  </lst>
</requestHandler>

Does this show which is the Query Parser? I can post more of my
solrconfig.xml if necessary. 

I am curious where the Query Parser hands over the parameters to the Solr
engine that would be common irrespective of Request Handler i.e. I am trying
to put debugging statements into the common code so that these can dump out
intermediate results to the log. 

Thanks again Erik.
O. O.






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Debug-different-Results-from-different-Request-Handlers-tp4141804p4141859.html
Sent from the Solr - User mailing list archive at Nabble.com.

Debug different Results from different Request Handlers

2014-06-13 Thread O. Olson

Hi,

In my solrconfig.xml I have one request handler displaying the results using 
Apache Velocity: 

  <requestHandler name="/products" class="solr.SearchHandler">

And another with regular XML: 

  <requestHandler name="/search"
                  class="org.apache.solr.handler.component.SearchHandler">

I am seeing different results when I use these two handlers. 

Search Query: hp|lync  (Or on the URL  q=hp%7Clync)

I see 0 results when I use the first handler (Velocity), but I see many results 
(10’s) with the second handler. I am trying to debug why this problem occurs.  
I am certain the problem is with the first handler, and I would be grateful if 
anyone can help me debug this. I do not know Solr well enough, so a few 
pointers could help. 

1. First, I would like to know if class=solr.SearchHandler and 
class=org.apache.solr.handler.component.SearchHandler are the same? If no, 
what does solr.SearchHandler refer to?

2. Second, I am working with the source of Solr 4.7 (yes, it is  a bit old, but 
I don’t think it is fundamentally changed). I have put log.debug() statements 
in the org.apache.solr.response.VelocityResponseWriter.write() method to verify 
that my query is not getting mangled with the URL encoding, and it is not. So, 
since I am getting different results for the same queries, I am curious to see 
what the core Solr engine is receiving when I run the same query from different 
handlers. Could someone tell me the class which has the core Solr engine that 
is used irrespective of which Request Handler makes the request? I am trying to 
put debug statements into this class to log the value of the query parameter 
that it receives. The results are different, so I think one or more parameters 
are different.

Thank you in advance,
O. O.

Re: Luke and SOLR search giving different results

2012-12-04 Thread Erol Akarsu

Thanks Shawn and Jack,

I changed solrconfig to set the default query field (qf) to the content field.
It works fine now.

Erol Akarsu

On Mon, Dec 3, 2012 at 5:03 PM, Shawn Heisey s...@elyograg.org wrote:

 On 12/3/2012 1:44 PM, Erol Akarsu wrote:

 I searched not for baş but for features:baş in the q field of the SOLR
 GUI, and I got a result!

 In the one document, I had some fields of type text_eng and text_general, and
 one field, features, of type text_tr. If I don't specify a field name, SOLR
 uses the EnglishAnalyzer. If I do, it uses the analyzer specific to the field
 specified in the search query string.


 Your config is set up to search against a field named text by default -
 either by a setting in schema.xml or a df parameter in your search
 handler definition in solrconfig.xml.  If you are using (e)dismax, it might
 be qf/pf parameters instead of df.

 The field named text is not properly set up for this search.  Your
 attachment at the beginning of this thread indicates that either you do not
 have a text field for this document at all, or that field is not stored.
  If the text field is a copyField as Jack has mentioned, note that it
 doesn't matter what analysis you are doing on features -- the copy is done
 before analysis, so it is completely separate.

 Thanks,
 Shawn
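
For reference, a minimal sketch of the requests being compared here. The base URL and collection name are assumptions; df is the standard default-field parameter Shawn mentions, and the helper name is made up:

```python
from urllib.parse import urlencode


def select_url(base, q, df=None):
    """Build a /select URL; df (default field) is only added when given."""
    params = {"q": q, "wt": "json"}
    if df:
        params["df"] = df
    return base + "/select?" + urlencode(params)


base = "http://localhost:8983/solr/collection1"  # assumed location

# Falls back to the configured default field (the field named text).
print(select_url(base, "baş"))
# Explicit field in the query string, as in features:baş.
print(select_url(base, "features:baş"))
# Same effect per request, using df instead of a field prefix.
print(select_url(base, "baş", df="features"))
```

urlencode percent-encodes the query as UTF-8, so ş is sent as %C5%9F and the term reaches Solr in the encoding it expects.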

Re: Luke and SOLR search giving different results

2012-12-03 Thread Erol Akarsu

Jack,

Thanks for help.

I removed the SOLR data folder and indexed this sample doc from scratch, so
this is the only document in SOLR.

When I ran the analysis, I could see that the stemming is correct: the
words bul, baş, gör and umut appear in the SF row.
I attached the analysis screens.

Erol Akarsu

On Sun, Dec 2, 2012 at 11:00 PM, Jack Krupansky j...@basetechnology.com wrote:

 Have you tried using the Solr Admin Analysis page, using the word and a
 few words of context for index analysis and the word alone for query
 analysis?

 And be sure to fully reindex if you change ANYTHING in the schema fields
 or field types.

 -- Jack Krupansky

 From: Erol Akarsu
 Sent: Sunday, December 02, 2012 10:38 PM
 To: solr-user@lucene.apache.org
 Subject: Luke and SOLR search giving different results

 Hi,

 I am trying to apply SOLR to the Turkish language for my research.

 Instead of using language identification, I manually assigned the Turkish
 language to a sample test document. I configured the SOLR schema.xml and
 activated the part below. I have attached the document
 testTurkishDoc.xml that was inserted into the SOLR database.

 But searching the raw Lucene index through Luke and searching through the
 SOLR 4.0 GUI give different results. In picture Selection_006.png, the word
 baş is listed as a top term. I searched for the word baş in Luke and got the
 only document as a result, as shown in Selection_004.png.

 But in the SOLR GUI, I get an empty result for the word baş, as shown in
 picture Selection_002.png.

 In the text we have the features field, which contains the word baştan, which
 is derived from the root word baş in Turkish grammar. Somehow, the SOLR GUI
 searches differently from Luke. I could not figure out why I cannot find
 it in SOLR while finding it in Luke. The same thing happens for the words
 umut, bul and gör.

 I would appreciate it if you could help me get the same results from the
 SOLR UI.


 <field name="features">
   Firmalarsa “Nasılsa buldum oynatacak ünlüyü, neyleyim senaryoyu!”
   diyerek baştan savma reklamlarla kotarmaya bakıyor işi. Futbolcu Arda Turan
   ve büyük umutlarla Türkiye'ye getirilen Paris Hilton'un oynatıldığı giyim
   firması reklamı da tam bir fiyasko. Birbirinden ünlü bu iki ismin oynadığı
   reklam Arda'nın kabinde papağan gibi tekrarladığı “My darling!” repliği,
   sonunda Paris'i görünce anlam veremediğimiz uyduruk bayılma sahnesi, bir de
   Paris'in ancak 5 kez izledikten sonra anlaşılan “Paris seçti, firma yaptı,
   Arda bayıldı.” sözleriyle kazındı hafızalara, “Keşke unutabilsek!”
   dedirterek.
 </field>



 Added to schema.xml for SOLR:

 <field name="features" type="text_tr" indexed="true" stored="true"
        multiValued="true"/>
 <fieldType name="text_tr" class="solr.TextField"
            positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.TurkishLowerCaseFilterFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
     <filter class="solr.SnowballPorterFilterFactory"
             language="Turkish"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.TurkishLowerCaseFilterFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
     <filter class="solr.SnowballPorterFilterFactory"
             language="Turkish"/>
   </analyzer>
 </fieldType>

Re: Luke and SOLR search giving different results

2012-12-03 Thread Jack Krupansky

So, does that highlight the problem for you or not? Is the term analyzed as you 
expected?

-- Jack Krupansky


Re: Luke and SOLR search giving different results

2012-12-03 Thread Erol Akarsu

Jack,

Yes.

I expect SOLR to give the same search results as Luke does.

Term analyzer gives correct answer in SOLR as expected. But SOLR does not
return correct search results.

I don't know why.

Erol Akarsu


Re: Luke and SOLR search giving different results

2012-12-03 Thread Jack Krupansky


Two points:

1. Possibly an encoding problem with your container? Is UTF-8 encoding 
enabled?
2. Add debugQuery=true to your query (from the browser) and see if the 
parsedquery has the expected term that matches what Luke reports for the 
index and what Solr Admin Analysis also reports for index analysis.
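
For the first point, assuming the container is Tomcat (Tomcat 7 is mentioned elsewhere in this thread), URI decoding is configured per connector in server.xml. A minimal sketch; the port and timeout values are just the stock Tomcat defaults, not taken from this thread:

```xml
<!-- conf/server.xml: decode request URIs (and so the q=... parameter) as UTF-8 -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           URIEncoding="UTF-8"/>
```

Without this attribute, Tomcat 7 decodes query strings as ISO-8859-1 by default, so a term such as baş can be mangled before it reaches the query parser.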


-- Jack Krupansky

-Original Message- 
From: Erol Akarsu

Sent: Monday, December 03, 2012 11:35 AM
To: solr-user@lucene.apache.org
Subject: Re: Luke and SOLR search giving different results

Jack,

Yes.

I expect SOLR to give the same search results as Luke does.

The term analyzer in SOLR gives the correct answer, as expected, but SOLR does
not return the correct search results.

I don't know why.

Erol Akarsu

On Mon, Dec 3, 2012 at 11:21 AM, Jack Krupansky j...@basetechnology.com wrote:



So, does that highlight the problem for you or not? Is the term analyzed
as you expected?

-- Jack Krupansky

From: Erol Akarsu
Sent: Monday, December 03, 2012 8:44 AM
To: solr-user@lucene.apache.org
Subject: Re: Luke and SOLR search giving different results

Jack,

Thanks for help.

I removed the SOLR data folder and indexed this sample doc from scratch,
so it is now the only document in SOLR.

When I ran the analysis, the stemming was correct: I can see the stems
bul, baş, gör and umut in the SF row.
I attached the analysis screens.

Erol Akarsu



Re: Luke and SOLR search giving different results

2012-12-03 Thread Erol Akarsu

.
</str>
</arr>
<float name="price">350.0</float>
<str name="price_c">350,USD</str>
<int name="popularity">6</int>
<bool name="inStock">true</bool>
<date name="manufacturedate_dt">2006-02-13T15:26:37Z</date>
<long name="_version_">1420300467908378624</long>
</doc>
</result>
<lst name="debug">
  <str name="rawquerystring">baştan</str>
  <str name="querystring">baştan</str>
  <str name="parsedquery">text:baştan</str>
  <str name="parsedquery_toString">text:baştan</str>
  <lst name="explain">
    <str name="6H500F0">
0.028767452 = (MATCH) weight(text:baştan in 0) [DefaultSimilarity], result of:
  0.028767452 = fieldWeight in 0, product of:
    1.0 = tf(freq=1.0), with freq of:
      1.0 = termFreq=1.0
    0.30685282 = idf(docFreq=1, maxDocs=1)
    0.09375 = fieldNorm(doc=0)
    </str>
  </lst>
  <str name="QParser">LuceneQParser</str>
  <lst name="timing">
    <double name="time">2.0</double>
    <lst name="prepare">
      <double name="time">1.0</double>
      <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">1.0</double></lst>
      <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
    </lst>
    <lst name="process">
      <double name="time">1.0</double>
      <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
      <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">1.0</double></lst>
    </lst>
  </lst>
</lst>
</response>


Re: Luke and SOLR search giving different results

2012-12-03 Thread Jack Krupansky

Ah! See where it says <str name="parsedquery_toString">text:baş</str>? 
Your query is against the "text" field, which probably doesn't have the 
Turkish analysis.

There is probably a copyField from "features" to "text". You use the 
"text_tr" field type for "features", but probably not for the "text" field.
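
A sketch of what the relevant schema.xml lines probably look like, assuming the schema was derived from the stock Solr 4.0 example. The text_general type on the text field is an assumption; only the features definition and the copyField are posted elsewhere in this thread:

```xml
<!-- features gets the Turkish analysis chain ... -->
<field name="features" type="text_tr" indexed="true" stored="true" multiValued="true"/>

<!-- ... but the catch-all field that is searched by default does not (assumed type) -->
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

<copyField source="features" dest="text"/>
```

With a layout like this, q=baş is analyzed with text_general, which does no Turkish stemming, so it cannot match the unstemmed token baştan that was indexed into text.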


-- Jack Krupansky

-Original Message- 
From: Erol Akarsu

Sent: Monday, December 03, 2012 1:06 PM
To: solr-user@lucene.apache.org
Subject: Re: Luke and SOLR search giving different results

Jack,

I had already set up the Tomcat server for UTF-8 encoding. I added
URIEncoding="UTF-8" to all <Connector ...> elements in server.xml in Tomcat
7.

As you can see below, when I search for the word baş in debug mode I get an
empty response. But when I search for the word baştan, I get the correct
response.

It seems to me that the Turkish analyzer is not being used in the SOLR search,
because only the full word baştan can be found, not the root word
baş. Probably an English analyzer is being used, and it cannot find the root
word. For example, in Luke, if I change the analyzer used for query parsing
to EnglishAnalyzer, then it cannot find the word baş; it can only with
TurkishAnalyzer. I guess SOLR is not using TurkishAnalyzer.

Is this assumption true? I could not find any other reason.

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">58</int>
    <lst name="params">
      <str name="debugQuery">true</str>
      <str name="q">baş</str>
      <str name="wt">xml</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0" />
  <lst name="debug">
    <str name="rawquerystring">baş</str>
    <str name="querystring">baş</str>
    <str name="parsedquery">text:baş</str>
    <str name="parsedquery_toString">text:baş</str>
    <lst name="explain" />
    <str name="QParser">LuceneQParser</str>
    <lst name="timing">
      <double name="time">38.0</double>
      <lst name="prepare">
        <double name="time">16.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">3.0</double></lst>
        <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
      </lst>
      <lst name="process">
        <double name="time">10.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">10.0</double></lst>
      </lst>
    </lst>
  </lst>
</response>

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">2</int>
    <lst name="params">
      <str name="debugQuery">true</str>
      <str name="q">baştan</str>
      <str name="wt">xml</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="url">htt://111.a.b1</str>
      <str name="id">6H500F0</str>
      <str name="lang">tr</str>
      <str name="name">Maxtor DiamondMax 11 - hard drive - 500 GB - SATA-300</str>
      <str name="manu">Maxtor Corp.</str>
      <str name="manu_id_s">maxtor</str>
      <arr name="cat">
        <str>electronics</str>
        <str>hard drive</str>
      </arr>
      <arr name="features">
        <str>SATA 3.0Gb/s, NCQ</str>
        <str>8.5ms seek</str>
        <str>16MB cache</str>
        <str>
        Firmalarsa Nasılsa buldum oynatacak ünlüyü, neyleyim senaryoyu! diyerek
        baştan savma reklamlarla kotarmaya bakıyor işi. Futbolcu Arda Turan
        ve büyük umutlarla Türkiye'ye getirilen Paris Hilton'un oynatıldığı
        giyim firması reklamı da tam bir fiyasko. Birbirinden ünlü bu iki

Re: Luke and SOLR search giving different results

2012-12-03 Thread Erol Akarsu

Jack,

I have these in schema.xml, defining features as type text_tr.

But unfortunately, the search still fails.

<field name="features" type="text_tr" indexed="true" stored="true"
       multiValued="true"/>
<copyField source="features" dest="text"/>

<fieldType name="text_tr" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.TurkishLowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Turkish"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.TurkishLowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Turkish"/>
  </analyzer>
</fieldType>




Re: Luke and SOLR search giving different results

2012-12-03 Thread Erol Akarsu

Jack,

I see interesting stuff here now.

As the search query I tried not baş but features:baş in the q field of the SOLR
GUI. And I got a result!

In the one document, I had some fields of type text_eng and text_general, and
one field, features, of type text_tr. If I don't specify a field name, SOLR uses
the English analyzer. If I do, it uses the analyzer specific to the field named
in the search query string.

Is this true?

Erol Akarsu


Re: Luke and SOLR search giving different results

2012-12-03 Thread Jack Krupansky

As I pointed out in my message, your query is indicating that "text" is your 
default search field. So, either choose a different default search field, or 
ensure that the "text" field has the desired field type.


If you want to change the default search field, either use a "df" request 
parameter or change the "df" default value for the request handler in 
solrconfig.xml.
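
Both options can be sketched briefly. The per-request form is q=baş&df=features. For the handler default, a minimal solrconfig.xml fragment, assuming the standard /select handler and that features is the field you want searched by default:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- default search field, used when the query string names no field -->
    <str name="df">features</str>
  </lst>
</requestHandler>
```

A df parameter sent with the request overrides this default, so the handler setting only matters for queries that do not pass one.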


-- Jack Krupansky


Re: Luke and SOLR search giving different results

2012-12-03 Thread Shawn Heisey


On 12/3/2012 1:44 PM, Erol Akarsu wrote:
> I tried as search query not baş but features:baş in field q in SOLR
> GUI. And, I got result!
>
> In the one document, I had some fields type of text_eng, text_general and
> one field features type of text_tr. If I don't specify field name, SOLR use
> EnglishAnalyzer. If I do, it uses the analyzer specific to field specified
> in search query string.


Your config is set up to search against a field named text by default 
- either by a setting in schema.xml or a df parameter in your search 
handler definition in solrconfig.xml.  If you are using (e)dismax, it 
might be qf/pf parameters instead of df.


The field named text is not properly set up for this search.  Your 
attachment at the beginning of this thread indicates that either you do 
not have a text field for this document at all, or that field is not 
stored.  If the text field is a copyField as Jack has mentioned, note 
that it doesn't matter what analysis you are doing on features -- the 
copy is done before analysis, so it is completely separate.
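
One way to get Turkish analysis on a catch-all field is to give the copy destination the Turkish field type. A sketch, where text_tr_all is a hypothetical field name (not from this thread) and text_tr is the field type posted in this thread:

```xml
<!-- hypothetical catch-all field analyzed with the Turkish chain -->
<field name="text_tr_all" type="text_tr" indexed="true" stored="false" multiValued="true"/>

<!-- copyField passes along the raw input value; text_tr_all then applies its own analysis -->
<copyField source="features" dest="text_tr_all"/>
```

Searching text_tr_all:baş (or setting df=text_tr_all) should then go through the same stemming as features:baş. A full reindex is needed after a schema change like this.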


Thanks,
Shawn

Luke and SOLR search giving different results

2012-12-02 Thread Erol Akarsu

Hi,

I am trying to use SOLR with the Turkish language for my research.

Instead of using language identification, I manually assigned the Turkish
language to a sample test document. I configured the SOLR schema.xml and
activated the part below. I have attached the document
testTurkishDoc.xml that was inserted into SOLR.

But searching the raw Lucene index through Luke and searching through the
SOLR 4.0 GUI give different results. In picture Selection_006.png, the word baş
is listed as the top term. I searched for the word baş in Luke and got a
result containing the only document, shown in Selection_004.png.

But in the SOLR GUI, I get an empty result for the word baş, as shown in picture
Selection_002.png.

In the document, the features field contains the word baştan, which is
derived from the root word baş in Turkish grammar. Somehow, the SOLR GUI
searches differently than Luke. I could not figure out why SOLR does not find
it while Luke does. The same thing happens for the words umut, bul
and gör.

I would appreciate it if you could help me get the same results from the SOLR UI.

<field name="features">
Firmalarsa Nasılsa buldum oynatacak ünlüyü, neyleyim senaryoyu!
diyerek baştan savma reklamlarla kotarmaya bakıyor işi. Futbolcu Arda Turan
ve büyük umutlarla Türkiye'ye getirilen Paris Hilton'un oynatıldığı giyim
firması reklamı da tam bir fiyasko. Birbirinden ünlü bu iki ismin oynadığı
reklam Arda'nın kabinde papağan gibi tekrarladığı My darling! repliği,
sonunda Paris'i görünce anlam veremediğimiz uyduruk bayılma sahnesi, bir de
Paris'in ancak 5 kez izledikten sonra anlaşılan Paris seçti, firma yaptı,
Arda bayıldı. sözleriyle kazındı hafızalara, Keşke unutabilsek!
dedirterek.
</field>

Added to schema.xml for SOLR:

<field name="features" type="text_tr" indexed="true" stored="true"
       multiValued="true"/>
<fieldType name="text_tr" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.TurkishLowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Turkish"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.TurkishLowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Turkish"/>
  </analyzer>
</fieldType>

<add>
<doc>
  <field name="url">htt://111.a.b1</field>
  <field name="id">6H500F0</field>
  <field name="lang">tr</field>
  <field name="name">Maxtor DiamondMax 11 - hard drive - 500 GB - SATA-300</field>
  <field name="manu">Maxtor Corp.</field>
  <!-- Join -->
  <field name="manu_id_s">maxtor</field>
  <field name="cat">electronics</field>
  <field name="cat">hard drive</field>
  <field name="features">SATA 3.0Gb/s, NCQ</field>
  <field name="features">8.5ms seek</field>
  <field name="features">16MB cache</field>
  <field name="price">350</field>
  <field name="popularity">6</field>
  <field name="inStock">true</field>
  <field name="features">
Firmalarsa “Nasılsa buldum oynatacak ünlüyü, neyleyim senaryoyu!” diyerek baştan savma reklamlarla kotarmaya bakıyor işi. Futbolcu Arda Turan ve büyük umutlarla Türkiye’ye getirilen Paris Hilton’un oynatıldığı giyim firması reklamı da tam bir fiyasko. Birbirinden ünlü bu iki ismin oynadığı reklam Arda’nın kabinde papağan gibi tekrarladığı “My darling!” repliği, sonunda Paris’i görünce anlam veremediğimiz uyduruk bayılma sahnesi, bir de Paris’in ancak 5 kez izledikten sonra anlaşılan “Paris seçti, firma yaptı, Arda bayıldı.” sözleriyle kazındı hafızalara, “Keşke unutabilsek!” dedirterek.
  </field>
  <!-- Buffalo store -->
  <field name="manufacturedate_dt">2006-02-13T15:26:37Z</field>
</doc>
</add>

Re: Luke and SOLR search giving different results

2012-12-02 Thread Jack Krupansky

Have you tried using the Solr Admin Analysis page, using the word and a few 
words of context for index analysis and the word alone for query analysis?

And be sure to fully reindex if you change ANYTHING in the schema fields or 
field types.
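
One common way to do a full reindex on a small test index is to delete everything and post the documents again. A sketch of the XML update messages, each sent as its own request to the update handler (assumed here to be /update, as in the stock configuration):

```xml
<!-- request 1: remove every document from the index -->
<delete><query>*:*</query></delete>

<!-- request 2: make the deletion visible to searchers -->
<commit/>
```

After that, post the sample document again so that it is analyzed with the current schema.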

-- Jack Krupansky

From: Erol Akarsu 
Sent: Sunday, December 02, 2012 10:38 PM
To: solr-user@lucene.apache.org 
Subject: Luke and SOLR search giving different results

Hi,

I am trying to apply SOLR to the Turkish language for my research.

Instead of using language identification, I manually assigned the Turkish language
to a sample test document. I configured the SOLR schema.xml and activated the
part below. I then added the attached document testTurkishDoc.xml, which is
inserted into the SOLR database.

But searching the raw Lucene index through Luke and searching through the SOLR 4.0 GUI
give different results. In picture Selection_006.png, the word baş is
listed as the top term. When I search for the word baş in Luke, I get a result
containing the only document, shown in Selection_004.png.

But in the SOLR GUI, I get an empty result for the word baş, as shown in picture
Selection_002.png.

In the text we have the features field, which contains the word baştan, derived
from the root word baş in Turkish grammar. Somehow, the SOLR GUI is searching
differently than Luke. I could not figure out why SOLR does not find the
document while Luke does. The same thing happens for the words umut, bul and
gör.

I would appreciate any help in getting the same results from the SOLR UI.


<field name="features">
   Firmalarsa “Nasılsa buldum oynatacak ünlüyü, neyleyim senaryoyu!” 
diyerek baştan savma reklamlarla kotarmaya bakıyor işi. Futbolcu Arda Turan ve 
büyük umutlarla Türkiye’ye getirilen Paris Hilton’un oynatıldığı giyim firması 
reklamı da tam bir fiyasko. Birbirinden ünlü bu iki ismin oynadığı reklam 
Arda’nın kabinde papağan gibi tekrarladığı “My darling!” repliği, sonunda 
Paris’i görünce anlam veremediğimiz uyduruk bayılma sahnesi, bir de Paris’in 
ancak 5 kez izledikten sonra anlaşılan “Paris seçti, firma yaptı, Arda 
bayıldı.” sözleriyle kazındı hafızalara, “Keşke unutabilsek!” dedirterek.
  </field>



Added to schema.xml for SOLR:

<field name="features" type="text_tr" indexed="true" stored="true"
       multiValued="true"/>
<fieldType name="text_tr" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.TurkishLowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Turkish"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.TurkishLowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Turkish"/>
  </analyzer>
</fieldType>
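
One thing worth ruling out when a term with non-ASCII letters matches in Luke but not over HTTP (an assumption, not a diagnosis of this case): the term has to reach Solr percent-encoded as UTF-8. The sketch below only shows what that encoding looks like; the base URL is a placeholder and `encode_term` / `select_url` are hypothetical helper names.

```python
from urllib.parse import quote, urlencode


def encode_term(term: str) -> str:
    """Percent-encode a single query term as UTF-8."""
    return quote(term, safe="")


def select_url(base: str, field: str, term: str) -> str:
    """Build a select URL; `base` is a placeholder for the real core URL."""
    return base + "?" + urlencode({"q": f"{field}:{term}"})


if __name__ == "__main__":
    print(encode_term("baş"))
    print(select_url("http://localhost:8983/solr/select", "features", "baş"))
```

If the request logged by the servlet container shows different bytes for the term than the ones printed here, the container's URI encoding setting would be the next place to look.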

Re: synonyms.txt: different results on admin and on site..

2011-09-09 Thread deniz

you are right about wildcards and analysis stuff... 

so any way of putting wildcards in for analysis? 

-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/synonyms-txt-different-results-on-admin-and-on-site-tp3318338p3322026.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: synonyms.txt: different results on admin and on site..

2011-09-08 Thread François Schiettecatte

Wildcard terms are not analyzed, so your synonyms.txt may come into play here.
Have you checked the analysis for deniz*?

François
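
Since the wildcard clause skips analysis, one possible client-side workaround (a sketch, not something confirmed in this thread) is to send the plain term and the wildcard term together, so that the analyzed clause can still pick up synonyms. `partial_match_query` is a hypothetical helper standing in for the PHP code that appends the `*`; it assumes a single-word input and the classic Lucene query syntax.

```python
import re

# Characters treated as query syntax by the classic Lucene parser
# (assumed list; escaped one character at a time).
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\&|])')


def partial_match_query(user_input: str) -> str:
    """Return 'term OR term*' so one clause is analyzed and one is a prefix match."""
    term = _SPECIAL.sub(r"\\\1", user_input.strip())
    return f"{term} OR {term}*"


if __name__ == "__main__":
    print(partial_match_query("deniz"))
```

Whether this returns the expected documents still depends on what the index-time analysis stored, so the admin analysis page remains the first check.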

On Sep 7, 2011, at 10:08 PM, deniz wrote:

well yea you are right... i realised that lack of detail issue here... so
here it comes...

This is from my schema.xml and basically i have a synonyms.txt file which
contains

deniz,denis,denise

After posting here, I have checked some stuff that I have faced before,
while trying to add accented letters to the system... so it seems like same
or similar stuff... so...

As i want to support partial matches, the search string is modified on php
side. if user enters deniz, it is sent to solr as deniz*

when i check on solr admin, i was able to make searches with
deniz,denise,denis and they all return correct results, but when i put the
wildcard, i get nothing...

so with the above settings;

deniz
denise
denis
works smoothly

deniz*
denise*
denis*
returns nothing...

should i implement some kinda analyzer or tokenizer or any kinda component
to overtime this thing?

Rob Casson wrote:

you should probably post your schema.xml and some parts of your
synonyms.txt. it could be differences between your index and query
analysis chains, synonym expansion errors, etc, but folks will likely
need more details to help you out.

cheers,
rob

On Wed, Sep 7, 2011 at 9:46 PM, deniz <denizdurmu...@gmail.com>
wrote:
could it be related with analysis issue about synonyms once again?

-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context:
http://lucene.472066.n3.nabble.com/synonyms-txt-different-results-on-admin-and-on-site-tp3318338p3318464.html
Sent from the Solr - User mailing list archive at Nabble.com.

-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context:
http://lucene.472066.n3.nabble.com/synonyms-txt-different-results-on-admin-and-on-site-tp3318338p3318503.html
Sent from the Solr - User mailing list archive at Nabble.com.

synonyms.txt: different results on admin and on site..

2011-09-07 Thread deniz

hi all...

i have checked the list about the issue in the title, but couldnt find any
related info... so my problem is:

i change synonyms.txt and then reload the core without restarting the
server. new synonyms work smoothly if i use the admin interface of solr,
however when i use the site which is written in php, i get nothing when i
use one of the synonyms that i have added.

any ideas why this is happening?

-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/synonyms-txt-different-results-on-admin-and-on-site-tp3318338p3318338.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: synonyms.txt: different results on admin and on site..

2011-09-07 Thread deniz

could it be related with analysis issue about synonyms once again? 



-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/synonyms-txt-different-results-on-admin-and-on-site-tp3318338p3318464.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: synonyms.txt: different results on admin and on site..

2011-09-07 Thread Rob Casson

you should probably post your schema.xml and some parts of your
synonyms.txt.  it could be differences between your index and query
analysis chains, synonym expansion errors, etc, but folks will likely
need more details to help you out.

cheers,
rob

On Wed, Sep 7, 2011 at 9:46 PM, deniz denizdurmu...@gmail.com wrote:
 could it be related with analysis issue about synonyms once again?



 -
 Zeki ama calismiyor... Calissa yapar...
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/synonyms-txt-different-results-on-admin-and-on-site-tp3318338p3318464.html
 Sent from the Solr - User mailing list archive at Nabble.com.

Re: synonyms.txt: different results on admin and on site..

2011-09-07 Thread deniz

well yea you are right... i realised that lack of detail issue here... so
here it comes...

This is from my schema.xml and basically i have a synonyms.txt file which
contains

deniz,denis,denise

After posting here, I have checked some stuff that I have faced before,
while trying to add accented letters to the system... so it seems like same
or similar stuff... so...

As i want to support partial matches, the search string is modified on php
side. if user enters deniz, it is sent to solr as deniz*

when i check on solr admin, i was able to make searches with
deniz,denise,denis and they all return correct results, but when i put the
wildcard, i get nothing...

so with the above settings;

deniz
denise
denis
works smoothly

deniz*
denise*
denis*
returns nothing...

should i implement some kinda analyzer or tokenizer or any kinda component
to overcome this thing?

Rob Casson wrote:

cheers,
rob

On Wed, Sep 7, 2011 at 9:46 PM, deniz <denizdurmu...@gmail.com>
wrote:
could it be related with analysis issue about synonyms once again?

-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context:
http://lucene.472066.n3.nabble.com/synonyms-txt-different-results-on-admin-and-on-site-tp3318338p3318503.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Searching similar values for same field results in different results

2011-01-06 Thread PeterKerk


@iorixxx:
I ran: http://localhost:8983/solr/db/update/?optimize=true
This is the response:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">58</int>
</lst>
</response>

Then I ran:
http://localhost:8983/solr/db/select/?indent=on&facet=on&q=*:*&facet.field=themes_raw

This is response:
<lst name="facet_fields">
<lst name="themes_raw">
<int name="Hotel en Restaurant">366</int>
<int name="Kasteel en Landgoed">153</int>
<int name="Strand en Zee">16</int>
</lst>
</lst>

So, it seems that nothing has changed there, and it looks like the results
were already shown correctly before the optimize operation?

when you say http caching, you mean the caching by the browser? Or does Solr
have some caching by default? If the latter, how can I clear that cache?


@Erick: I added debugQuery

For Strand en Zee I see this:
<arr name="parsed_filter_queries">
<str>PhraseQuery(themes:"strand en zee")</str>
</arr>

Looks correct.


For Kasteel en Landgoed I see this:
<arr name="parsed_filter_queries">
<str>PhraseQuery(themes:"kasteel en landgo")</str>
</arr>

Which isn't correct! So it seems the problem lies here.

Now I'm wondering why the value is cut off... this is my schema.xml:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_dutch.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_dutch.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<field name="themes" type="text" indexed="true" stored="true"
       multiValued="true"/>
<field name="themes_raw" type="string" indexed="true" stored="true"
       multiValued="true"/>


I checked analysis.jsp:
filled in Field: themes
and Field value: Kasteel en Landgoed

and schema.jsp, but I didn't see any weird results

Now, I'm wondering what else it could be..
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Searching-similar-values-for-same-field-results-in-different-results-tp2199269p2205706.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Searching similar values for same field results in different results

2011-01-06 Thread Juan Grande

You have a problem with the analysis chain. When you do a query, the
EnglishPorterFilter is cutting off the last part of your word, but you're
not doing the same when indexing. I think that removing that filter from the
chain will solve your problem.

Remember that there are two different analysis chains, one for indexing time
and one for querying time. I think that you didn't see the shortened word in
analysis.jsp because you entered the text in the Field Value (Index) text
box, so it was using the indexing time analysis chain. If you want to see
the results of applying the querying time analysis chain, you should enter
the text in the Field Value (Query) text box.

Good luck,

Juan Grande
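
The mismatch described above can be reproduced with a toy model of the two chains. This is only a sketch: `toy_stem` is a stand-in that strips a trailing "ed" and is not the real Porter algorithm, and the stop-word, synonym and word-delimiter filters are left out.

```python
def index_chain(text):
    """Index-time analysis in this schema: whitespace split + lowercase, no stemmer."""
    return [tok.lower() for tok in text.split()]


def toy_stem(token):
    """Stand-in for EnglishPorterFilterFactory; only strips a trailing 'ed'."""
    return token[:-2] if token.endswith("ed") and len(token) > 4 else token


def query_chain(text):
    """Query-time analysis: same as index time, plus the (toy) stemmer."""
    return [toy_stem(tok.lower()) for tok in text.split()]


def phrase_matches(indexed_tokens, query_tokens):
    """True if the query tokens occur as a contiguous run in the indexed tokens."""
    n = len(query_tokens)
    return any(indexed_tokens[i:i + n] == query_tokens
               for i in range(len(indexed_tokens) - n + 1))


if __name__ == "__main__":
    for theme in ("Strand en Zee", "Kasteel en Landgoed"):
        print(theme, query_chain(theme),
              phrase_matches(index_chain(theme), query_chain(theme)))
```

"Strand en Zee" survives the query chain unchanged and matches, while "Landgoed" is shortened to "landgo" at query time only, so the phrase no longer lines up with the indexed token "landgoed".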

On Thu, Jan 6, 2011 at 10:58 AM, PeterKerk vettepa...@hotmail.com wrote:


 @iorixxx:
 I ran: http://localhost:8983/solr/db/update/?optimize=true
 This is the response:
 response
lst name=responseHeader
int name=status0/int
int name=QTime58/int
/lst
 /response

 Then I ran:

 http://localhost:8983/solr/db/select/?indent=on&facet=on&q=*:*&facet.field=themes_raw

 This is response:
 lst name=facet_fields
lst name=themes_raw
int name=Hotel en Restaurant366/int
int name=Kasteel en Landgoed153/int
 int name=Strand en Zee16/int
 /lst
 /lst

 So, it seems that nothing has changed there, and it looks like also before
 the optimize operation the results were shown correct?

 when you say http caching, you mean the caching by the browser? Or does
 Solr
 have some caching by default? If the latter, how can I clear that cache?


 @Erick: I added debugquery

 For Strand en Zee I see this:
 arr name=parsed_filter_queries
 strPhraseQuery(themes:strand en zee)/str
 /arr

 Looks correct.


 For Kasteel en Landgoed I see this:
 arr name=parsed_filter_queries
 strPhraseQuery(themes:kasteel en landgo)/str
 /arr

 Which isnt correct! So it seems herein lies the problem.

 Now Im wondering why the value is cut off...this is my schema.xml:

 fieldType name=text class=solr.TextField positionIncrementGap=100
  analyzer type=index
tokenizer class=solr.WhitespaceTokenizerFactory/
filter class=solr.StopFilterFactory ignoreCase=true
 words=stopwords_dutch.txt/
filter class=solr.WordDelimiterFilterFactory
 generateWordParts=1
 generateNumberParts=1 catenateWords=1 catenateNumbers=1
 catenateAll=0 splitOnCaseChange=1/
filter class=solr.LowerCaseFilterFactory/
filter class=solr.RemoveDuplicatesTokenFilterFactory/
  /analyzer
  analyzer type=query
tokenizer class=solr.WhitespaceTokenizerFactory/
filter class=solr.SynonymFilterFactory synonyms=synonyms.txt
 ignoreCase=true expand=true/
filter class=solr.StopFilterFactory ignoreCase=true
 words=stopwords_dutch.txt/
filter class=solr.WordDelimiterFilterFactory
 generateWordParts=1
 generateNumberParts=1 catenateWords=0 catenateNumbers=0
 catenateAll=0 splitOnCaseChange=1/
filter class=solr.LowerCaseFilterFactory/
filter class=solr.EnglishPorterFilterFactory
 protected=protwords.txt/
filter class=solr.RemoveDuplicatesTokenFilterFactory/
  /analyzer
 /fieldType

 field name=themes type=text indexed=true stored=true
 multiValued=true  /
 field name=themes_raw type=string indexed=true stored=true
 multiValued=true/


 I checked analysis.jsp:
 filled in Field: themes
 and Field value: Kasteel en Landgoed

 and schema.jsp, but I didnt see any weird results

 Now, Im wondering what else it could be..
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Searching-similar-values-for-same-field-results-in-different-results-tp2199269p2205706.html
 Sent from the Solr - User mailing list archive at Nabble.com.

Re: Searching similar values for same field results in different results

2011-01-06 Thread PeterKerk


That was it! thanks!
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Searching-similar-values-for-same-field-results-in-different-results-tp2199269p2206087.html
Sent from the Solr - User mailing list archive at Nabble.com.

Searching similar values for same field results in different results

2011-01-05 Thread PeterKerk


Something weird is happening.

I have locations that can have 1 or more themes.
A theme can be: Kasteel en Landgoed, or a theme can be Strand en Zee

I checked in the database, there are many locations that have 1 or more of
these themes assigned to it.

Also in the response xml when I do a general search I get:
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="themes_raw">
<int name="Hotel en Restaurant">366</int>
<int name="Kasteel en Landgoed">153</int>    - 153 found
<int name="Strand en Zee">16</int>           - 16 found
</lst>


When I request this:
http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=themes:%22Strand%20en%20Zee%22&q=*:*&fl=id,title
I get 16 results. Which is expected.

When I request this:
http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=themes:%22Kasteel%20en%20Landgoed%22&q=*:*&fl=id,title
I get 0 results!!!

why?!?


definition in schema.xml:


<field name="themes" type="text" indexed="true" stored="true"
       multiValued="true"/>
<field name="themes_raw" type="string" indexed="true" stored="true"
       multiValued="true"/>

<copyField source="themes" dest="themes_raw"/>

Why are these results differing?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Searching-similar-values-for-same-field-results-in-different-results-tp2199269p2199269.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Searching similar values for same field results in different results

2011-01-05 Thread Ahmet Arslan

 Something weird is happening.
 
 I have locations that can have 1 or more themes.
 A theme can be: Kasteel en Landgoed, or a theme can be
 Strand en Zee
 
 I checked in the database, there are many locations that
 have 1 or more of
 these themes assigned to it.
 
 Also in the response xml when I do a general search I get:
 lst name=facet_counts
 lst name=facet_queries/
 lst name=facet_fields
 lst name=themes_raw
     int name=Hotel en
 Restaurant366/int
     int name=Kasteel en
 Landgoed153/int    - 153
 found
     int name=Strand en
 Zee16/int    - 16 found
 /lst
 
 
 When I request this:
 http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=themes:%22Strand%20en%20Zee%22&q=*:*&fl=id,title
 I get 16 results. Which is expected.
 
 When I request this:
 http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=themes:%22Kasteel%20en%20Landgoed%22&q=*:*&fl=id,title
 I get 0 results!!!
 
 why?!?

Maybe you deleted those documents? Deleted terms can appear in the facet section
until you optimize. Can you run these queries after an optimize operation?
What is the output of this after an optimize:
facet=on&q=*:*&facet.field=themes_raw

Also, using a browser to query/test Solr sometimes gives old results due to HTTP
caching.

Re: Searching similar values for same field results in different results

2011-01-05 Thread PeterKerk


uhm...how do I perform an optimize operation? :)
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Searching-similar-values-for-same-field-results-in-different-results-tp2199269p2199795.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Searching similar values for same field results in different results

2011-01-05 Thread Ahmet Arslan


 
 uhm...how do I perform an optimize operation? :)


http://localhost:8983/solr/db/update/?optimize=true

Re: Searching similar values for same field results in different results

2011-01-05 Thread Erick Erickson

Often adding debugQuery=on to the URL can show you very useful information
that helps pinpoint the problem. I confess I don't see anything amiss in
what you've shown though.

Also, look at the schema browser page off the admin page, and look
at your themes field to see what is actually in your index, it may
surprise you..

Finally, the admin/analysis page (turn debug on) may also help you to see
exactly what tokenization is happening when indexing and querying. I'd guess
that the behavior isn't exactly what you expect.

Best
Erick
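
For reference, a small sketch of adding debugQuery=on to a request like the ones in this thread. The base URL and parameter values are copied from the original post; `debug_url` is a hypothetical helper, and the output is just the encoded URL to paste into a browser or HTTP client.

```python
from urllib.parse import urlencode


def debug_url(base, **params):
    """Append the given parameters plus debugQuery=on to a Solr select URL."""
    params = dict(params, debugQuery="on")
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    print(debug_url(
        "http://localhost:8983/solr/db/select/",
        q="*:*",
        fq='themes:"Kasteel en Landgoed"',
        fl="id,title",
    ))
```

The parsed form of the filter query then shows up in the debug section of the response, which is where an analysis mismatch becomes visible.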


On Wed, Jan 5, 2011 at 10:47 AM, PeterKerk vettepa...@hotmail.com wrote:


 Something weird is happening.

 I have locations that can have 1 or more themes.
 A theme can be: Kasteel en Landgoed, or a theme can be Strand en Zee

 I checked in the database, there are many locations that have 1 or more of
 these themes assigned to it.

 Also in the response xml when I do a general search I get:
 lst name=facet_counts
 lst name=facet_queries/
 lst name=facet_fields
 lst name=themes_raw
int name=Hotel en Restaurant366/int
int name=Kasteel en Landgoed153/int- 153 found
int name=Strand en Zee16/int  - 16 found
 /lst


 When I request this:

 http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=themes:%22Strand%20en%20Zee%22&q=*:*&fl=id,title
 I get 16 results. Which is expected.

 When I request this:

 http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=themes:%22Kasteel%20en%20Landgoed%22&q=*:*&fl=id,title
 I get 0 results!!!

 why?!?


 definition in schema.xml:


 field name=themes type=text indexed=true stored=true
 multiValued=true  /
 field name=themes_raw type=string indexed=true stored=true
 multiValued=true/

 copyField source=themes dest=themes_raw/

 Why are these results differing?
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Searching-similar-values-for-same-field-results-in-different-results-tp2199269p2199269.html
 Sent from the Solr - User mailing list archive at Nabble.com.

Different Results..

2010-12-22 Thread satya swaroop

Hi All,
 I am getting different results when I use certain special characters in the query.
For example:
1) When I use this request
http://localhost:8080/solr/select?q=erlang!ericson
   the result obtained is
   <result name="response" numFound="1934" start="0">

2) When the request is
 http://localhost:8080/solr/select?q=erlang/ericson
the result is
  <result name="response" numFound="1" start="0">


My question is: does Solr treat the two queries differently, and how does
it handle !, / and the other special characters?


Regards,
satya

Re: Different Results..

2010-12-22 Thread Marco Martinez

We need more information about the analyzers and tokenizers of the
default field of your search

Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


2010/12/22 satya swaroop satya.yada...@gmail.com

 Hi All,
 i am getting different results when i used with some escape keys..
 for example:::
 1) when i use this request
http://localhost:8080/solr/select?q=erlang!ericson
   the result obtained is
   result name=response numFound=1934 start=0

 2) when the request is
 http://localhost:8080/solr/select?q=erlang/ericson
the result is
  result name=response numFound=1 start=0


 My query here is, do solr consider both the queries differently and what do
 it consider for !,/ and all other escape characters.


 Regards,
 satya

Re: Different Results..

2010-12-22 Thread Ahmet Arslan

--- On Wed, 12/22/10, satya swaroop satya.yada...@gmail.com wrote:

 From: satya swaroop satya.yada...@gmail.com
 Subject: Different Results..
 To: solr-user@lucene.apache.org
 Date: Wednesday, December 22, 2010, 10:44 AM
 Hi All,
          i am getting
 different results when i used with some escape keys..
 for example:::
 1) when i use this request
             http://localhost:8080/solr/select?q=erlang!ericson

    the result obtained is

    result name=response numFound=1934
 start=0

 2) when the request is
              http://localhost:8080/solr/select?q=erlang/ericson

     the result is

           result
 name=response numFound=1 start=0

 My query here is, do solr consider both the queries
 differently and what do
 it consider for !,/ and all other escape characters.

First of all, ! has a special meaning: it means NOT. It is part of the query
syntax and is equivalent to the minus (-) operator.

q=erlang!ericson is parsed into:
defaultSearchField:erlang -defaultSearchField:ericson

You can see this by appending debugQuery=on to your search URL.

So you need to escape ! in your case.
q=erlang\!ericson will return the same result set as q=erlang/ericson

You can see the complete list of special characters here:
http://lucene.apache.org/java/2_9_1/queryparsersyntax.html#Escaping Special Characters
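
A minimal sketch of the escaping step on the client side, assuming the special-character list from the Lucene 2.9 query parser documentation linked above (`&` and `|` are escaped individually here, which is slightly more conservative than the two-character operators `&&` and `||`). `escape_query_term` is a hypothetical helper name.

```python
# Single characters that act as query syntax in the classic Lucene parser
# (list assumed from the 2.9 documentation; "/" is not in it).
SPECIAL_CHARS = set('+-!(){}[]^"~*?:\\&|')


def escape_query_term(text):
    """Prefix every query-syntax character with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


if __name__ == "__main__":
    print(escape_query_term("erlang!ericson"))
    print(escape_query_term("erlang/ericson"))
```

SolrJ users may prefer `ClientUtils.escapeQueryChars`, which does a similar job. Either way, the escaped string still has to be URL-encoded before it goes into the request.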

Re: different results depending on result format

2010-10-22 Thread Savvas-Andreas Moysidis

strange..are you absolutely sure the two queries are directed to the same
Solr instance? I'm running the same query from the admin page (which
specifies the xml format) and I get the exact same results as solrj.

On 21 October 2010 22:25, Mike Sokolov soko...@ifactory.com wrote:

 quick follow-up: I also notice that the query from solrj gets version=1,
 whereas the admin webapp puts version=2.2 on the query string, although this
 param doesn't seem to change the xml results at all.  Does this indicate an
 older version of solrj perhaps?

 -Mike


 On 10/21/2010 04:47 PM, Mike Sokolov wrote:

 I'm experiencing something really weird: I get different results depending
 on whether I specify wt=javabin, and retrieve using SolrJ, or wt=xml.  I
 spent quite a while staring at query params to make sure everything else is
 the same, and they do seem to be.  At first I thought the problem related to
 the javabin format change that has been talked about recently, but I am
 using solr 1.4.0 and solrj 1.4.0.

 Notice in the two entries that the wt param is different and the hits
 result count is different.

 Oct 21, 2010 4:22:19 PM org.apache.solr.core.SolrCore execute
 INFO: [bopp.ba] webapp=/solr path=/select/
 params={wt=xml&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1}
 hits=261 status=0 QTime=1
 Oct 21, 2010 4:22:28 PM org.apache.solr.core.SolrCore execute
 INFO: [bopp.ba] webapp=/solr path=/select
 params={wt=javabin&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1}
 hits=57 status=0 QTime=0


 The xml format results seem to be the correct ones. So one thought I had
 is that I could somehow fall back to using xml format in solrj, but I tried
 SolrQuery.set('wt','xml') and that didn't have the desired effect (I get
 'wt=javabin&wt=javabin' in the log - ie the param is repeated, but still
 javabin).


 Am I crazy? Is this a known issue?

 Thanks for any suggestions

Re: different results depending on result format

2010-10-22 Thread Mike Sokolov

Yes - I really only have the one solr instance.  And I have plenty of 
other cases where I am getting good results back via solrj.  It's really 
a mystery.  Unfortunately I have to catch up on other stuff I have been 
neglecting, but I'll follow up when I'm able to get a solution...


-Mike


On 10/22/2010 06:58 AM, Savvas-Andreas Moysidis wrote:

strange..are you absolutely sure the two queries are directed to the same
Solr instance? I'm running the same query from the admin page (which
specifies the xml format) and I get the exact same results as solrj.

On 21 October 2010 22:25, Mike Sokolov <soko...@ifactory.com> wrote:

   

quick follow-up: I also notice that the query from solrj gets version=1,
whereas the admin webapp puts version=2.2 on the query string, although this
param doesn't seem to change the xml results at all.  Does this indicate an
older version of solrj perhaps?

-Mike


On 10/21/2010 04:47 PM, Mike Sokolov wrote:

 

I'm experiencing something really weird: I get different results depending
on whether I specify wt=javabin, and retrieve using SolrJ, or wt=xml.  I
spent quite a while staring at query params to make sure everything else is
the same, and they do seem to be.  At first I thought the problem related to
the javabin format change that has been talked about recently, but I am
using solr 1.4.0 and solrj 1.4.0.

Notice in the two entries that the wt param is different and the hits
result count is different.

Oct 21, 2010 4:22:19 PM org.apache.solr.core.SolrCore execute
INFO: [bopp.ba] webapp=/solr path=/select/
params={wt=xml&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1}
hits=261 status=0 QTime=1
Oct 21, 2010 4:22:28 PM org.apache.solr.core.SolrCore execute
INFO: [bopp.ba] webapp=/solr path=/select
params={wt=javabin&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1}
hits=57 status=0 QTime=0


The xml format results seem to be the correct ones. So one thought I had
is that I could somehow fall back to using xml format in solrj, but I tried
SolrQuery.set('wt','xml') and that didn't have the desired effect (I get
'wt=javabin&wt=javabin' in the log - ie the param is repeated, but still
javabin).


Am I crazy? Is this a known issue?

Thanks for any suggestions

Re: different results depending on result format

2010-10-22 Thread Mike Sokolov

OK I solved the problem.  It turns out that I was connecting to the 
server using its FQDN (rosen.ifactory.com).  When, instead, I connect to 
it using the name rosen (which maps to the same IP using the default 
domain name configured in my resolver, ifactory.com), I get results back.


I am looking into the virtual hosts config in tomcat; it seems as if 
there must indeed be another solr instance running; in fact I'm now 
concerned there might be two solr instances running against the same 
data folder. yargh.


-Mike


On 10/22/2010 09:05 AM, Mike Sokolov wrote:
Yes - I really only have the one solr instance.  And I have plenty of 
other cases where I am getting good results back via solrj.  It's 
really a mystery.  Unfortunately I have to catch up on other stuff I 
have been neglecting, but I'll follow up when I'm able to get a 
solution...


-Mike


On 10/22/2010 06:58 AM, Savvas-Andreas Moysidis wrote:
strange..are you absolutely sure the two queries are directed to the 
same

Solr instance? I'm running the same query from the admin page (which
specifies the xml format) and I get the exact same results as solrj.

On 21 October 2010 22:25, Mike Sokolov <soko...@ifactory.com> wrote:

quick follow-up: I also notice that the query from solrj gets 
version=1,
whereas the admin webapp puts version=2.2 on the query string, 
although this
param doesn't seem to change the xml results at all.  Does this 
indicate an

older version of solrj perhaps?

-Mike


On 10/21/2010 04:47 PM, Mike Sokolov wrote:

I'm experiencing something really weird: I get different results 
depending
on whether I specify wt=javabin, and retrieve using SolrJ, or 
wt=xml.  I
spent quite a while staring at query params to make sure everything 
else is
the same, and they do seem to be.  At first I thought the problem 
related to
the javabin format change that has been talked about recently, but 
I am

using solr 1.4.0 and solrj 1.4.0.

Notice in the two entries that the wt param is different and the hits
result count is different.

Oct 21, 2010 4:22:19 PM org.apache.solr.core.SolrCore execute
INFO: [bopp.ba] webapp=/solr path=/select/
params={wt=xml&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1}


hits=261 status=0 QTime=1
Oct 21, 2010 4:22:28 PM org.apache.solr.core.SolrCore execute
INFO: [bopp.ba] webapp=/solr path=/select
params={wt=javabin&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1}


hits=57 status=0 QTime=0


The xml format results seem to be the correct ones. So one thought 
I had
is that I could somehow fall back to using xml format in solrj, but 
I tried
SolrQuery.set('wt','xml') and that didn't have the desired effect 
(I get
'wt=javabin&wt=javabin' in the log - ie the param is repeated, but 
still

javabin).


Am I crazy? Is this a known issue?

Thanks for any suggestions

Re: different results depending on result format

2010-10-21 Thread Mike Sokolov

quick follow-up: I also notice that the query from solrj gets version=1, 
whereas the admin webapp puts version=2.2 on the query string, although 
this param doesn't seem to change the xml results at all.  Does this 
indicate an older version of solrj perhaps?


-Mike

On 10/21/2010 04:47 PM, Mike Sokolov wrote:
I'm experiencing something really weird: I get different results 
depending on whether I specify wt=javabin, and retrieve using SolrJ, 
or wt=xml.  I spent quite a while staring at query params to make sure 
everything else is the same, and they do seem to be.  At first I 
thought the problem related to the javabin format change that has been 
talked about recently, but I am using solr 1.4.0 and solrj 1.4.0.


Notice in the two entries that the wt param is different and the hits 
result count is different.


Oct 21, 2010 4:22:19 PM org.apache.solr.core.SolrCore execute
INFO: [bopp.ba] webapp=/solr path=/select/ 
params={wt=xml&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1} 
hits=261 status=0 QTime=1

Oct 21, 2010 4:22:28 PM org.apache.solr.core.SolrCore execute
INFO: [bopp.ba] webapp=/solr path=/select 
params={wt=javabin&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1} 
hits=57 status=0 QTime=0



The xml format results seem to be the correct ones. So one thought I
had is that I could somehow fall back to using xml format in solrj,
but I tried SolrQuery.set('wt','xml') and that didn't have the desired
effect (I get 'wt=javabin&wt=javabin' in the log - i.e. the param is
repeated, but still javabin).



Am I crazy? Is this a known issue?

Thanks for any suggestions
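
Comparing the two logged parameter strings by eye is error-prone, so here is a minimal sketch of doing it mechanically. It assumes the parameters in the log were originally separated by '&' (the archive shows them run together), and the class and method names (ParamDiff, parse, diff) are made up for this illustration; nothing here is part of SolrJ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of SolrJ: compares two logged parameter strings.
public class ParamDiff {

    // Splits "a=1&b=2" into key -> list of values, so a repeated key
    // (such as wt appearing twice) keeps every value it was given.
    public static Map<String, List<String>> parse(String params) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (String pair : params.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            out.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return out;
    }

    // Returns only the keys whose value lists differ between the two strings.
    public static Map<String, String> diff(String left, String right) {
        Map<String, List<String>> a = parse(left);
        Map<String, List<String>> b = parse(right);
        Set<String> keys = new TreeSet<>(a.keySet());
        keys.addAll(b.keySet());
        Map<String, String> changed = new LinkedHashMap<>();
        for (String key : keys) {
            List<String> av = a.get(key);
            List<String> bv = b.get(key);
            if (!Objects.equals(av, bv)) {
                changed.put(key, av + " vs " + bv);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        // Parameter strings from the two log entries, with '&' assumed as the separator.
        String xml = "wt=xml&rows=20&start=0&facet=true&facet.field=ref_taxid_ms"
                + "&q=*:*&fl=uri,meta_ss&version=1";
        String javabin = "wt=javabin&rows=20&start=0&facet=true&facet.field=ref_taxid_ms"
                + "&q=*:*&fl=uri,meta_ss&version=1";
        System.out.println(diff(xml, javabin));
    }
}
```

This only compares the params={...} part of each entry. The two log entries also show different request paths (path=/select/ for the xml request, path=/select for the javabin one), which a check like this does not cover.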

SolrJ - how separte different results from the same facet query?

2010-03-15 Thread Saïd Radhouani

I'm faceting with two different sets of query ranges using addFacetQuery. I
wonder whether it's possible, using SolrJ, to extract the results of each set
of ranges separately. Here is an example:

addFacetQuery("price:[* TO 150]"); addFacetQuery("price:[151 TO 300]"); etc.
addFacetQuery("length:[* TO 5]"); addFacetQuery("length:[5 TO 10]"); etc.

When I use getFacetQuery, SolrJ gives me the responses for both sets of ranges
(prices and lengths) mixed in the same list. I wonder whether it's possible
to tell SolrJ to extract the response for a specific set of ranges, i.e., to
extract the price-based response into one list and the length-based
response into another. It would be helpful to have something like
getFacetQuery(field=price), getFacetQuery(field=length), etc.

Any ideas?

Thanks.
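
One client-side way to get the separation asked about here: QueryResponse.getFacetQuery() returns a Map<String,Integer> keyed by the facet query string, so the entries can be grouped by the field name in front of the first ':'. The sketch below is an illustration under that assumption, not a SolrJ feature. The class name FacetQueryGrouper and the counts are invented, and a plain LinkedHashMap stands in for the real response so the example runs without a Solr server.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of SolrJ.
public class FacetQueryGrouper {

    // Groups facet-query counts by the text before the first ':' in each
    // query string. Keys without a ':' are collected under the empty string.
    public static Map<String, Map<String, Integer>> groupByField(
            Map<String, Integer> facetQueryCounts) {
        Map<String, Map<String, Integer>> grouped = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : facetQueryCounts.entrySet()) {
            String query = entry.getKey();
            int colon = query.indexOf(':');
            String field = colon < 0 ? "" : query.substring(0, colon);
            grouped.computeIfAbsent(field, k -> new LinkedHashMap<>())
                   .put(query, entry.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Stand-in for the map returned by QueryResponse.getFacetQuery();
        // the counts are made up for illustration.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("price:[* TO 150]", 12);
        counts.put("price:[151 TO 300]", 7);
        counts.put("length:[* TO 5]", 3);
        counts.put("length:[5 TO 10]", 9);

        Map<String, Map<String, Integer>> byField = groupByField(counts);
        System.out.println("price  -> " + byField.get("price"));
        System.out.println("length -> " + byField.get("length"));
    }
}
```

Splitting on the first ':' only works for simple field:range queries like the ones in the example. Facet queries that use local params or more complex syntax would need a different way of deriving the group.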

Re: SolrJ - how separte different results from the same facet query?

2010-03-15 Thread Jon Baer

I am interested in this as well ... I'm also having trouble working out
whether a result has been elevated by the QueryElevation component.  It sounds like
SolrJ would need to know about some type of metadata contained within the docs,
but I haven't seen SolrJ dealing w/ payloads specifically yet.

I also can't tell if these would require a feature request on those
components or if it's something so custom that it would require
writing new components.

It sounds like retrieving a document should answer questions like ...

did this document come from a facet query?
was this document elevated?

Etc.  Maybe something the Debug component can handle if it can write payloads 
back to the results, etc.

- Jon

On Mar 15, 2010, at 7:56 AM, Saïd Radhouani wrote:

> I'm faceting with two different sets of query ranges using addFacetQuery. I
> wonder whether it's possible, using SolrJ, to extract the results of each set
> of ranges separately. Here is an example:
>
> addFacetQuery("price:[* TO 150]"); addFacetQuery("price:[151 TO 300]"); etc.
> addFacetQuery("length:[* TO 5]"); addFacetQuery("length:[5 TO 10]"); etc.
>
> When I use getFacetQuery, SolrJ gives me the responses for both sets of ranges
> (prices and lengths) mixed in the same list. I wonder whether it's possible
> to tell SolrJ to extract the response for a specific set of ranges, i.e., to
> extract the price-based response into one list and the length-based
> response into another. It would be helpful to have something like
> getFacetQuery(field=price), getFacetQuery(field=length), etc.
>
> Any ideas?
>
> Thanks.

Re: Different results return for capital and small letters.

2009-01-02 Thread Otis Gospodnetic

Tushar,

Could you ask on solr-user in the future, please?
Your last sentence got cut off.  Do you have LowerCaseFilter in both the index 
and query-time analyzer sections?  Perhaps you should just paste that section 
of the config.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message -----
> From: Tushar_Gandhi tushar_gan...@neovasolutions.com
> To: solr-...@lucene.apache.org
> Sent: Wednesday, December 31, 2008 3:26:32 AM
> Subject: Different results return for capital and small letters.
>
>
> Hi,
> I am using Solr 1.3.
> I am facing a problem with the ordering of the results returned by
> Solr.
> Whenever I search for cats, it gives me results. Then, when I
> search for CATS, I get the same results but in a different order. Is
> this the expected behavior of Solr? Is there any priority in searching
> that depends on case?
> I want the same results for both. What should I do if this is the default
> behavior of Solr?
> Is there a problem with my indexing?
> Also, I already have LowerCaseFilter configuration for the
> Thanks,
> Tushar
> --
> View this message in context:
> http://www.nabble.com/Different-results-return-for-capital-and-small-letters.-tp21228594p21228594.html
> Sent from the Solr - Dev mailing list archive at Nabble.com.
