Re: strange behavior of solr query parser

2020-03-02 Thread Hongtai Xue
Hi Phil,

Thanks for your reply, but I'm afraid that's a different problem.

Our problem can be reproduced on at least Solr 7.3.0 (the oldest version we
have), and we suspect it has existed since SOLR-9786:
https://github.com/apache/lucene-solr/commit/bf9db95f218f49bac8e7971eb953a9fd9d13a2f0#diff-269ae02e56283ced3ce781cce21b3147R563

Sincerely,
Hongtai




Re: strange behavior of solr query parser

2020-03-02 Thread Staley, Phil R - DCF
I believe we are experiencing the same thing.


We recently upgraded our Drupal 8 sites to Solr 8.3.1. We are now getting
reports of certain patterns of search terms resulting in an error that reads,
“The website encountered an unexpected error. Please try again later.”

Below is a list of example terms that always result in this error, and a similar
list that works fine. The problem pattern seems to be a search term that
contains 2 or 3 characters followed by a space, followed by additional text.

To confirm that the problem is version 8 of Solr, I updated our local and
UAT sites with the latest Drupal updates, which included an update to the
Search API Solr module, and tested the terms below under Solr 7.7.2, 8.3.1, and
8.4.1. Under version 7.7.2 everything works fine. Under either version 8
release, the problem returns.



Thoughts?



Search terms that result in error

  *   w-2 agency directory
  *   agency w-2 directory
  *   w-2 agency
  *   w-2 directory
  *   w2 agency directory
  *   w2 agency
  *   w2 directory



Search terms that do not result in error

  *   w-22 agency directory
  *   agency directory w-2
  *   agency w-2directory
  *   agencyw-2 directory
  *   w-2
  *   w2
  *   agency directory
  *   agency
  *   directory
  *   -2 agency directory
  *   2 agency directory
  *   w-2agency directory
  *   w2agency directory






strange behavior of solr query parser

2020-03-02 Thread Hongtai Xue
Hi,

Our team found strange behavior in the Solr query parser.
In some specific cases, conditional clauses on an unindexed field are
ignored.

For a query like q=A:1 OR B:1 OR A:2 OR B:2,
if field B is not indexed (but has docValues="true"), "B:1" is lost.

But if you write the query as q=A:1 OR A:2 OR B:1 OR B:2,
it works perfectly.

The only difference between the two queries is the order of the clauses:
one is ABAB, the other is AABB.

■ Reproduction steps and example explanation
You can easily reproduce this problem on a Solr collection with the _default
configset and the exampledocs/books.csv data.

1. Create a _default collection:
bin/solr create -c books -s 2 -rf 2

2. Post books.csv:
bin/post -c books example/exampledocs/books.csv

3. Run the following query:
http://localhost:8983/solr/books/select?q=%2B%28name_str%3AFoundation+OR+cat%3Abook+OR+name_str%3AJhereg+OR+cat%3Acd%29&debug=query
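(For reference, name_str comes from the _default configset's "*_str" dynamic-field rule, which is defined roughly like this; we are quoting from memory, so treat the exact attributes as approximate:

<dynamicField name="*_str" type="strings" docValues="true" indexed="false" stored="false" useDocValuesAsStored="false"/>

That is exactly the "docValues but not indexed" situation described above.)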


I printed the query-parsing debug information.
You can see that "name_str:Foundation" is lost.

query: "name_str:Foundation OR cat:book OR name_str:Jhereg OR cat:cd"
(please note "Jhereg" is "4a 68 65 72 65 67" and "Foundation" is "46 6f 75 6e 
64 61 74 69 6f 6e")

  "debug":{
"rawquerystring":"+(name_str:Foundation OR cat:book OR name_str:Jhereg OR 
cat:cd)",
"querystring":"+(name_str:Foundation OR cat:book OR name_str:Jhereg OR 
cat:cd)",
"parsedquery":"+(cat:book cat:cd (name_str:[[4a 68 65 72 65 67] TO [4a 68 
65 72 65 67]]))",
"parsedquery_toString":"+(cat:book cat:cd name_str:[[4a 68 65 72 65 67] TO 
[4a 68 65 72 65 67]])",
"QParser":"LuceneQParser"}}


But for the query "name_str:Foundation OR name_str:Jhereg OR cat:book OR cat:cd",
everything is OK: "name_str:Foundation" is not lost.

  "debug":{
"rawquerystring":"+(name_str:Foundation OR name_str:Jhereg OR cat:book OR 
cat:cd)",
"querystring":"+(name_str:Foundation OR name_str:Jhereg OR cat:book OR 
cat:cd)",
"parsedquery":"+(cat:book cat:cd ((name_str:[[46 6f 75 6e 64 61 74 69 6f 
6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]]) (name_str:[[4a 68 65 72 65 67] TO [4a 
68 65 72 65 67]])))",
"parsedquery_toString":"+(cat:book cat:cd (name_str:[[46 6f 75 6e 64 61 74 
69 6f 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]] name_str:[[4a 68 65 72 65 67] TO 
[4a 68 65 72 65 67]]))",
"QParser":"LuceneQParser"}}

http://localhost:8983/solr/books/select?q=%2B%28name_str%3AFoundation+OR+name_str%3AJhereg+OR+cat%3Abook+OR+cat%3Acd%29&debug=query

We did a little research, and we wonder if it is a bug in SolrQueryParser.
More specifically, we think the if statement here might be wrong:
https://github.com/apache/lucene-solr/blob/branch_8_4/solr/core/src/java/org/apache/solr/parser/SolrQueryParserBase.java#L711

Could you please tell us whether this is a bug, or just an incorrect query?

Thanks,
Hongtai Xue


Re: strange behavior

2019-06-06 Thread Wendy2
Hi David,

I see. It is fixed now by adding the parentheses. Thank you so much!
q=audit_author.name:(Burley,%20S.K.)%20AND%20entity.type:polymer





Re: strange behavior

2019-06-06 Thread Wendy2
Hi Shawn,

I see. 

I added the parentheses and it works now. Thank you very much for your help!

q=audit_author.name:(Burley,%20S.K.)%20AND%20entity.type:polymer=1







Re: strange behavior

2019-06-06 Thread Shawn Heisey

On 6/6/2019 12:46 PM, Wendy2 wrote:

Why "AND" didn't work anymore?

I use Solr 7.3.1 and edismax parser.
Could someone explain to me why the following query doesn't work any more?
What could be the cause? Thanks!

q=audit_author.name:Burley,%20S.K.%20AND%20entity.type:polymer

It worked previously but now returned very lower number of documents.
I had to use "fq" to make it work correctly:

q=audit_author.name:Burley,%20S.K.=entity.type:polymer=1


That should work with no problem with edismax.  It would not, however, work
properly with dismax, and it would be easy to mix up the two query parsers.


The way you have written your query is somewhat ambiguous, because of 
the space after the comma.  That ambiguity exists in both of the queries 
mentioned, even the one with the fq.


Thanks,
Shawn


Re: strange behavior

2019-06-06 Thread David Hastings
audit_author.name:Burley,%20S.K.

translates to
audit_author.name:Burley, DEFAULT_OPERATOR DEFAULT_FIELD:S.K.
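In other words, the field name only applies up to the first whitespace; the remaining terms go to the default search field(s). Comparing the two forms with debugQuery=true makes this easy to see:

q=audit_author.name:Burley, S.K. AND entity.type:polymer      (S.K. is searched against the default field)
q=audit_author.name:(Burley, S.K.) AND entity.type:polymer    (both terms stay on audit_author.name)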




On Thu, Jun 6, 2019 at 2:46 PM Wendy2  wrote:

>
> Hi,
>
> Why "AND" didn't work anymore?
>
> I use Solr 7.3.1 and edismax parser.
> Could someone explain to me why the following query doesn't work any
> more?
> What could be the cause? Thanks!
>
> q=audit_author.name:Burley,%20S.K.%20AND%20entity.type:polymer
>
> It worked previously but now returned very lower number of documents.
> I had to use "fq" to make it work correctly:
>
> q=audit_author.name:Burley,%20S.K.=entity.type:polymer=1
>
>
>
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


strange behavior

2019-06-06 Thread Wendy2


Hi,

Why "AND" didn't work anymore?  

I use Solr 7.3.1 and the edismax parser.
Could someone explain to me why the following query doesn't work any more?  
What could be the cause? Thanks! 

q=audit_author.name:Burley,%20S.K.%20AND%20entity.type:polymer

It worked previously but now returns a much lower number of documents.
I had to use "fq" to make it work correctly:

q=audit_author.name:Burley,%20S.K.=entity.type:polymer=1









Re: Strange Behavior When Extracting Features

2017-10-16 Thread Michael Alcorn
If anyone else is following this thread, I replied on the Jira.

On Mon, Oct 16, 2017 at 4:07 AM, alessandro.benedetti 
wrote:

> This is interesting, the EFI parameter resolution should work using the
> quotes independently of the query parser.
> At that point, the query parsers (both) receive a multi term text.
> Both of them should work the same.
> At the time I saw the mail I tried to reproduce it through the LTR module
> tests and I didn't succeed .
> It would be quite useful if you can contribute a test that is failing with
> the field query parser.
> Have you tried just with the same query, but in a request handler ?
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Strange Behavior When Extracting Features

2017-10-16 Thread alessandro.benedetti
This is interesting: the EFI parameter resolution should work using the
quotes independently of the query parser.
At that point, the query parsers (both) receive a multi-term text.
Both of them should work the same.
When I saw the mail I tried to reproduce it through the LTR module
tests and I didn't succeed.
It would be quite useful if you could contribute a test that fails with
the field query parser.
Have you tried just with the same query, but in a request handler?



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io


Re: Strange Behavior When Extracting Features

2017-10-13 Thread Michael Alcorn
I believe I've discovered a workaround. If you use:

{
    "store": "redhat_efi_feature_store",
    "name": "case_description_issue_tfidf",
    "class": "org.apache.solr.ltr.feature.SolrFeature",
    "params": {
        "q": "{!dismax qf=text_tfidf}${text}"
    }
}

instead of:

{
    "store": "redhat_efi_feature_store",
    "name": "case_description_issue_tfidf",
    "class": "org.apache.solr.ltr.feature.SolrFeature",
    "params": {
        "q": "{!field f=issue_tfidf}${case_description}"
    }
}

you can then use single quotes to incorporate multi-term arguments as
Alessandro suggested. I've added this information to the Jira.
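With that dismax-based feature, the extraction request then looks something like this (a sketch only; the host, handler, model name and reRankDocs value are carried over from earlier in the thread and may not match your setup):

http://gss-test-fusion.usersys.redhat.com:8983/solr/access/query?q=added couple of fiber channel&rq={!ltr model=redhat_efi_model reRankDocs=1 efi.text='added couple of fiber channel'}&fl=id,score,[features]&rows=10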

On Fri, Sep 22, 2017 at 8:30 AM, alessandro.benedetti 
wrote:

> I think this has nothing to do with the LTR plugin.
> The problem here should be just the way you use the local params,
> to properly pass multi term local params in Solr you need to use *'* :
>
> efi.case_description='added couple of fiber channel'
>
> This should work.
> If not only the first term will be passed as a local param and then passed
> in the efi map to LTR.
>
> I will update the Jira issue as well.
>
> Cheers
>
>
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Strange Behavior When Extracting Features

2017-09-22 Thread alessandro.benedetti
I think this has nothing to do with the LTR plugin.
The problem here is probably just the way you use the local params:
to properly pass multi-term local params in Solr you need to use single quotes ('):

efi.case_description='added couple of fiber channel'

This should work.
If not, only the first term will be passed as a local param and then passed
in the efi map to LTR.

I will update the Jira issue as well.

Cheers





-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io


Strange Behavior When Extracting Features

2017-09-20 Thread Michael Alcorn
Hi all,

I'm getting some extremely strange behavior when trying to extract features
for a learning to rank model. The following query incorrectly says all
features have zero values:

http://gss-test-fusion.usersys.redhat.com:8983/solr/access/query?q=added couple of fiber channel&rq={!ltr model=redhat_efi_model reRankDocs=1 efi.case_summary=the efi.case_description=added couple of fiber channel efi.case_issue=the efi.case_environment=the}&fl=id,score,[features]&rows=10

But this query, which simply moves the word "added" from the front of the
provided text to the back, properly fills in the feature values:

http://gss-test-fusion.usersys.redhat.com:8983/solr/access/query?q=couple of fiber channel added&rq={!ltr model=redhat_efi_model reRankDocs=1 efi.case_summary=the efi.case_description=couple of fiber channel added efi.case_issue=the efi.case_environment=the}&fl=id,score,[features]&rows=10

The explain output for the failing query can be found here:

https://gist.github.com/manisnesan/18a8f1804f29b1b62ebfae1211f38cc4

and the explain output for the properly functioning query can be found here:

https://gist.github.com/manisnesan/47685a561605e2229434b38aed11cc65

Have any of you run into this issue? Seems like it could be a bug.

Thanks,
Michael A. Alcorn


Re: Strange behavior of solr

2015-09-02 Thread Zheng Lin Edwin Yeo
Is there any error message in the log when Solr stops indexing the file at
line 2046?

Regards,
Edwin

On 2 September 2015 at 17:17, Long Yan  wrote:

> Hey,
> I have created a core with
> bin\solr create -c mycore
>
> I want to index the csv sample files from solr-5.2.1
>
> If I index film.csv under solr-5.2.1\example\films\, solr can only index
> this file until the line
> "2046,Wong Kar-wai,Romance Film|Fantasy|Science
> Fiction|Drama,,/en/2046_2004,2004-05-20"
>
> But if I at first index books.csv under solr-5.2.1\example\exampledocs and
> then index film.csv, solr can index all lines in film.csv
>
> Why?
>
> Regards
> Long Yan
>
>
>


Re: Strange behavior of solr

2015-09-02 Thread Erik Hatcher
See example/films/README.txt

The “name” field is guessed incorrectly (because the first film has name=“.45”),
so indexing errors out once it hits a name value that is no longer numeric.  The
README provides a command to define the name field *before* indexing.  If
you’ve already indexed and had the name field guessed incorrectly and created, you’ll
need to delete and recreate the collection, then define the name field, then
reindex.
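For reference, the field-definition command in the README looks roughly like this (check example/films/README.txt for the exact version):

curl http://localhost:8983/solr/films/schema -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field": {
    "name": "name",
    "type": "text_general",
    "multiValued": false,
    "stored": true
  }
}'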

We used to have a fake film at the top to allow field guessing to “work”, but I 
felt that was too fake and that the example should be true to what happens with 
real world data and the pitfalls of allowing field type guessing to guess 
incorrectly.

—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com




> On Sep 2, 2015, at 5:17 AM, Long Yan  wrote:
> 
> Hey,
> I have created a core with
> bin\solr create -c mycore
> 
> I want to index the csv sample files from solr-5.2.1
> 
> If I index film.csv under solr-5.2.1\example\films\, solr can only index this 
> file until the line
> "2046,Wong Kar-wai,Romance Film|Fantasy|Science 
> Fiction|Drama,,/en/2046_2004,2004-05-20"
> 
> But if I at first index books.csv under solr-5.2.1\example\exampledocs and 
> then index film.csv, solr can index all lines in film.csv
> 
> Why?
> 
> Regards
> Long Yan
> 
> 



Strange behavior of solr

2015-09-02 Thread Long Yan
Hey,
I have created a core with
bin\solr create -c mycore

I want to index the csv sample files from solr-5.2.1

If I index film.csv under solr-5.2.1\example\films\, Solr can only index this
file up to the line
"2046,Wong Kar-wai,Romance Film|Fantasy|Science
Fiction|Drama,,/en/2046_2004,2004-05-20"

But if I first index books.csv under solr-5.2.1\example\exampledocs and then
index film.csv, Solr can index all lines in film.csv

Why?

Regards
Long Yan




Re: Strange Behavior

2014-08-23 Thread Jack Krupansky
It sounds as if you are trying to treat hyphen as a digit so that negative 
numbers are discrete terms. But... that conflicts with the use of hyphen as 
a word separator. Sorry, but WDF does not support both. Pick one or the 
other, you can't have both.


But first, please explain your intended use case clearly - there may be some 
better way to try to achieve it.


Use the analysis page of the Solr Admin UI to see the detailed query and 
index analysis of your terms. You'll be surprised.


-- Jack Krupansky






Re: Strange Behavior

2014-08-23 Thread Shawn Heisey
On 8/23/2014 9:01 AM, Jack Krupansky wrote:
 It sounds as if you are trying to treat hyphen as a digit so that
 negative numbers are discrete terms. But... that conflicts with the use
 of hyphen as a word separator. Sorry, but WDF does not support both.
 Pick one or the other, you can't have both.
 
 But first, please explain your intended use case clearly - there may be
 some better way to try to achieve it.
 
 Use the analysis page of the Solr Admin UI to see the detailed query and
 index analysis of your terms. You'll be surprised.

You can force WDF to treat hyphen as a digit if you want to, but you are
right that you cannot have both.  To change WDF, create a text file, put
the following in it, and reference it with the types parameter on
WordDelimiterFilterFactory:

- => DIGIT

I use this functionality to build a special analysis chain for
mimetypes.  For that fieldType, I treat hyphen and underscore as ALPHANUM.
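For the mimetype case, that mapping file just contains entries like these (a sketch, not my exact file):

- => ALPHANUM
_ => ALPHANUM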

Search for wdfftypes on this page for more info:

https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

Naturally you have to reindex after making this change.  For anyone who
doesn't know what that entails:

http://wiki.apache.org/solr/HowToReindex

Thanks,
Shawn



Strange Behavior

2014-08-21 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
Hi, I have a field type text_general where, for the query-time WordDelimiterFilter, I am
using the types mapping below; wdfftypes.txt contains: - => DIGIT


When I do a query I am not getting the right results. E.g. Name:Wi-Fi gets
results, but Name:Wi-Fi Devices Make does not get any results;
if I change it to Name:Wi-Fi Devices Make~3 it works.

If someone can explain what is happening with the current situation..? FYI I
have types=wdfftypes.txt in the query analyzer.


My Fieldtype

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">

    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>

    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>

    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>

    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="0" splitOnCaseChange="0"
            splitOnNumerics="0" stemEnglishPossessive="0"
            catenateWords="1" catenateNumbers="1"
            catenateAll="1" preserveOriginal="1"/>

    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>

  </analyzer>
  <analyzer type="query">

    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>

    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>

    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>

    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="0" splitOnCaseChange="0"
            splitOnNumerics="0" stemEnglishPossessive="0"
            catenateWords="1" catenateNumbers="1"
            catenateAll="1" preserveOriginal="1"
            types="wdfftypes.txt"/>

    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>

  </analyzer>
</fieldType>





Re: Strange Behavior with Solr in Tomcat.

2014-06-07 Thread S.L
Thanks, Meraj, that was exactly the issue. Setting
<useColdSearcher>true</useColdSearcher> worked like a charm and the server
starts up as usual.
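For anyone who hits this later: the setting lives in the <query> section of solrconfig.xml, roughly like this:

<query>
  ...
  <useColdSearcher>true</useColdSearcher>
</query>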

Thanks again!


On Fri, Jun 6, 2014 at 2:42 PM, Meraj A. Khan mera...@gmail.com wrote:

 This looks distinctly related to
 https://issues.apache.org/jira/browse/SOLR-4408 , try coldSearcher = true
 as being suggested in JIRA and let us know .


 On Fri, Jun 6, 2014 at 2:39 PM, Jean-Sebastien Vachon 
 jean-sebastien.vac...@wantedanalytics.com wrote:

  I would try a thread dump and check the output to see what`s going on.
  You could also strace the process if you`re running on Unix or changed
 the
  log level in Solr to get more information logged
 
   -Original Message-
   From: S.L [mailto:simpleliving...@gmail.com]
   Sent: June-06-14 2:33 PM
   To: solr-user@lucene.apache.org
   Subject: Re: Strange Behavior with Solr in Tomcat.
  
   Anyone folks?
  
  
   On Wed, Jun 4, 2014 at 10:25 AM, S.L simpleliving...@gmail.com
 wrote:
  
 Hi Folks,
   
I recently started using the spellchecker in my solrconfig.xml. I am
able to build up an index in Solr.
   
But,if I ever shutdown tomcat I am not able to restart it.The server
never spits out the server startup time in seconds in the logs,nor
does it print any error messages in the catalina.out file.
   
The only way for me to get around this is by delete the data
 directory
of the index and then start the server,obviously this makes me loose
 my
   index.
   
Just wondering if anyone faced a similar issue and if they were able
to solve this.
   
Thanks.
   
   
  
 



Re: Strange Behavior with Solr in Tomcat.

2014-06-07 Thread Shalin Shekhar Mangar
Interesting, thanks for reporting back. I've re-opened SOLR-4408.


On Sat, Jun 7, 2014 at 10:50 PM, S.L simpleliving...@gmail.com wrote:

 Thanks, Meraj, that was exactly the issue , setting
  <useColdSearcher>true</useColdSearcher> worked like a charm and the server
 starts up as usual.

 Thanks again!


 On Fri, Jun 6, 2014 at 2:42 PM, Meraj A. Khan mera...@gmail.com wrote:

  This looks distinctly related to
  https://issues.apache.org/jira/browse/SOLR-4408 , try coldSearcher =
 true
  as being suggested in JIRA and let us know .
 
 
  On Fri, Jun 6, 2014 at 2:39 PM, Jean-Sebastien Vachon 
  jean-sebastien.vac...@wantedanalytics.com wrote:
 
   I would try a thread dump and check the output to see what`s going on.
   You could also strace the process if you`re running on Unix or changed
  the
   log level in Solr to get more information logged
  
-Original Message-
From: S.L [mailto:simpleliving...@gmail.com]
Sent: June-06-14 2:33 PM
To: solr-user@lucene.apache.org
Subject: Re: Strange Behavior with Solr in Tomcat.
   
Anyone folks?
   
   
On Wed, Jun 4, 2014 at 10:25 AM, S.L simpleliving...@gmail.com
  wrote:
   
  Hi Folks,

 I recently started using the spellchecker in my solrconfig.xml. I
 am
 able to build up an index in Solr.

 But,if I ever shutdown tomcat I am not able to restart it.The
 server
 never spits out the server startup time in seconds in the logs,nor
 does it print any error messages in the catalina.out file.

 The only way for me to get around this is by delete the data
  directory
 of the index and then start the server,obviously this makes me
 loose
  my
index.

 Just wondering if anyone faced a similar issue and if they were
 able
 to solve this.

 Thanks.


   
  
 




-- 
Regards,
Shalin Shekhar Mangar.


Re: Strange Behavior with Solr in Tomcat.

2014-06-06 Thread S.L
Anyone folks?


On Wed, Jun 4, 2014 at 10:25 AM, S.L simpleliving...@gmail.com wrote:

  Hi Folks,

 I recently started using the spellchecker in my solrconfig.xml. I am able
 to build up an index in Solr.

 But,if I ever shutdown tomcat I am not able to restart it.The server never
 spits out the server startup time in seconds in the logs,nor does it print
 any error messages in the catalina.out file.

 The only way for me to get around this is by delete the data directory of
 the index and then start the server,obviously this makes me loose my index.

 Just wondering if anyone faced a similar issue and if they were able to
 solve this.

 Thanks.




RE: Strange Behavior with Solr in Tomcat.

2014-06-06 Thread Jean-Sebastien Vachon
I would try a thread dump and check the output to see what's going on.
You could also strace the process if you're running on Unix, or change the log
level in Solr to get more information logged.
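For example, something like:

jstack <tomcat-pid> > solr-threads.txt

or kill -3 <tomcat-pid>, which writes the thread dump to catalina.out.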

 -Original Message-
 From: S.L [mailto:simpleliving...@gmail.com]
 Sent: June-06-14 2:33 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Strange Behavior with Solr in Tomcat.
 
 Anyone folks?
 
 
 On Wed, Jun 4, 2014 at 10:25 AM, S.L simpleliving...@gmail.com wrote:
 
   Hi Folks,
 
  I recently started using the spellchecker in my solrconfig.xml. I am
  able to build up an index in Solr.
 
  But,if I ever shutdown tomcat I am not able to restart it.The server
  never spits out the server startup time in seconds in the logs,nor
  does it print any error messages in the catalina.out file.
 
  The only way for me to get around this is by delete the data directory
  of the index and then start the server,obviously this makes me loose my
 index.
 
  Just wondering if anyone faced a similar issue and if they were able
  to solve this.
 
  Thanks.
 
 
 


Re: Strange Behavior with Solr in Tomcat.

2014-06-06 Thread Meraj A. Khan
This looks distinctly related to
https://issues.apache.org/jira/browse/SOLR-4408 , try coldSearcher = true
as being suggested in JIRA and let us know .


On Fri, Jun 6, 2014 at 2:39 PM, Jean-Sebastien Vachon 
jean-sebastien.vac...@wantedanalytics.com wrote:

 I would try a thread dump and check the output to see what`s going on.
 You could also strace the process if you`re running on Unix or changed the
 log level in Solr to get more information logged

  -Original Message-
  From: S.L [mailto:simpleliving...@gmail.com]
  Sent: June-06-14 2:33 PM
  To: solr-user@lucene.apache.org
  Subject: Re: Strange Behavior with Solr in Tomcat.
 
  Anyone folks?
 
 
  On Wed, Jun 4, 2014 at 10:25 AM, S.L simpleliving...@gmail.com wrote:
 
Hi Folks,
  
   I recently started using the spellchecker in my solrconfig.xml. I am
   able to build up an index in Solr.
  
   But,if I ever shutdown tomcat I am not able to restart it.The server
   never spits out the server startup time in seconds in the logs,nor
   does it print any error messages in the catalina.out file.
  
   The only way for me to get around this is by delete the data directory
   of the index and then start the server,obviously this makes me loose my
  index.
  
   Just wondering if anyone faced a similar issue and if they were able
   to solve this.
  
   Thanks.
  
  
 



Strange Behavior with Solr in Tomcat.

2014-06-04 Thread S.L
Hi Folks,

I recently started using the spellchecker in my solrconfig.xml. I am able to 
build up an index in Solr.

But,if I ever shutdown tomcat I am not able to restart it.The server never 
spits out the server startup time in seconds in the logs,nor does it print any 
error messages in the catalina.out file.

The only way for me to get around this is by delete the data directory of the 
index and then start the server,obviously this makes me loose my index.

Just wondering if anyone faced a similar issue and if they were able to solve 
this.

Thanks.



Re: Strange Behavior with Solr in Tomcat.

2014-06-04 Thread Aman Tandon
I guess if you try to copy the index and then kill the process of tomcat
then it might help. If still the index need to be delete you would have the
back up. Next time always make back up.
On Jun 4, 2014 7:55 PM, S.L simpleliving...@gmail.com wrote:

 Hi Folks,

 I recently started using the spellchecker in my solrconfig.xml. I am able
 to build up an index in Solr.

 But,if I ever shutdown tomcat I am not able to restart it.The server never
 spits out the server startup time in seconds in the logs,nor does it print
 any error messages in the catalina.out file.

 The only way for me to get around this is by delete the data directory of
 the index and then start the server,obviously this makes me loose my index.

 Just wondering if anyone faced a similar issue and if they were able to
 solve this.

 Thanks.




Re: Strange Behavior with Solr in Tomcat.

2014-06-04 Thread S.L
Hi,

This is not a case of accidental deletion , the only way I can restart the
tomcat is by deleting the data directory for the index that was created
earlier, this started happening after I started using spellcheckers in my
solrconfig.xml. As long as the Tomcat is running its fine.

Any help from anyone who faced a similar issues would be appreciated.

Thanks.



On Wed, Jun 4, 2014 at 11:08 AM, Aman Tandon antn.s...@gmail.com wrote:

 I guess if you try to copy the index and then kill the process of tomcat
 then it might help. If still the index need to be delete you would have the
 back up. Next time always make back up.
 On Jun 4, 2014 7:55 PM, S.L simpleliving...@gmail.com wrote:

  Hi Folks,
 
  I recently started using the spellchecker in my solrconfig.xml. I am able
  to build up an index in Solr.
 
  But,if I ever shutdown tomcat I am not able to restart it.The server
 never
  spits out the server startup time in seconds in the logs,nor does it
 print
  any error messages in the catalina.out file.
 
  The only way for me to get around this is by delete the data directory of
  the index and then start the server,obviously this makes me loose my
 index.
 
  Just wondering if anyone faced a similar issue and if they were able to
  solve this.
 
  Thanks.
 
 



Re: Strange behavior of edismax and mm=0 with long queries (bug?)

2014-04-06 Thread Nils Kaiser
Actually I found out why... I had "and" as a lowercase word in my queries, and the
checkbox does not seem to work in the admin UI.
Adding lowercaseOperators=false made the queries work.
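(For reference, the request just gains the extra parameter, e.g. ...&defType=edismax&mm=0&lowercaseOperators=false, with everything else unchanged.)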


2014-04-04 18:10 GMT+02:00 Nils Kaiser m...@nils-kaiser.de:

 Hey,

 I am currently using solr to recognize songs and people from a list of
 user comments. My index stores the titles of the songs. At the moment my
 application builds word ngrams and fires a search with that query, which
 works well but is quite inefficient.

 So my thought was to simply use the collated comments as query. So it is a
 case where the query is much longer. I need to use mm=0 or mm=1.

 My plan was to use edismax as the pf2 and pf3 parameters should work well
 for my usecase.

 However when using longer queries, I get a strange behavior which can be
 seen in debugQuery.

 Here is an example:

 Collated Comments (used as query)

 I love Henry so much. It is hard to tear your eyes away from Maria, but
 watch just his feet. You'll be amazed.
 sometimes pure skill can will a comp, sometimes pure joy can win... put
 them both together and there is no competition
 This video clip makes me smile.
 Pure joy!
 so good!
 Who's the person that gave this a thumbs down?!? This is one of the best
 routines I've ever seen. Period. And it's a competitionl! How is that
 possible? They're so good it boggles my mind.
 It's gorgeous. Flawless victory.
 Great number! Does anybody know the name of the piece?
 I believe it's called Sunny side of the street
 Maria is like, the best 'follow' I've ever seen. She's so amazing.
 Thanks so much Johnathan!

 Song name in Index
 Louis Armstrong - Sunny Side of The Street

 parsedquery_toString:
 +(((text:I) (text:love) (text:Henry) (text:so) (text:much.) (text:It)
 (text:is) (text:hard) (text:to) (text:tear) (text:your) (text:eyes)
 (text:away) (text:from) (text:Maria,) (text:but) (text:watch) (text:just)
 (text:his) (text:feet.) (text:You'll) (text:be) (text:amazed.)
 (text:sometimes) (text:pure) (text:skill) (text:can) (text:will) (text:a)
 (text:comp,) (text:sometimes) (text:pure) (text:joy) (text:can)
 (text:win...) (text:put) (text:them) (text:both) +(text:together)
 +(text:there) (text:is) (text:no) (text:competition) (text:This)
 (text:video) (text:clip) (text:makes) (text:me) (text:smile.) (text:Pure)
 (text:joy!) (text:so) (text:good!) (text:Who's) (text:the) (text:person)
 (text:that) (text:gave) (text:this) (text:a) (text:thumbs) (text:down?!?)
 (text:This) (text:is) (text:one) (text:of) (text:the) (text:best)
 (text:routines) (text:I've) (text:ever) (text:seen.) +(text:Period.)
 +(text:it's) (text:a) (text:competitionl!) (text:How) (text:is) (text:that)
 (text:possible?) (text:They're) (text:so) (text:good) (text:it)
 (text:boggles) (text:my) (text:mind.) (text:It's) (text:gorgeous.)
 (text:Flawless) (text:victory.) (text:Great) (text:number!) (text:Does)
 (text:anybody) (text:know) (text:the) (text:name) (text:of) (text:the)
 (text:piece?) (text:I) (text:believe) (text:it's) (text:called)
 (text:Sunny) (text:side) (text:of) (text:the) (text:street) (text:Maria)
 (text:is) (text:like,) (text:the) (text:best) (text:'follow') (text:I've)
 (text:ever) (text:seen.) (text:She's) (text:so) (text:amazing.)
 (text:Thanks) (text:so) (text:much) (text:Johnathan!))~1)/str

 This query generates 0 results. The reason is it expects terms together,
 there, Period., it's to be part of the document (see parsedquery above, all
 other terms are optional, those terms are must).

 Is there any reason for this behavior? If I use shorter queries it works
 flawlessly and returns the document.

 I've appended the whole query.

 Best,

 Nils



Re: Strange behavior of edismax and mm=0 with long queries (bug?)

2014-04-05 Thread Jack Krupansky
Set the q.op parameter to OR and set mm=10% or something like that. The idea is 
to not excessively restrict the documents that will match, but weight the 
matched results based on how many word pairs and triples do match.

In addition, use the pf parameter to provide extra weight when the full query 
term phrase matches exactly.
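For example, something along these lines (the qf/pf field names are placeholders for whatever fields you actually search):

q=<collated comments>&defType=edismax&q.op=OR&mm=10%25&qf=text&pf=text&pf2=text&pf3=text

(mm=10%25 is just the URL-encoded form of 10%.)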

-- Jack Krupansky

From: Nils Kaiser 
Sent: Friday, April 4, 2014 10:10 AM
To: solr-user@lucene.apache.org 
Subject: Strange behavior of edismax and mm=0 with long queries (bug?)

Hey, 

I am currently using solr to recognize songs and people from a list of user 
comments. My index stores the titles of the songs. At the moment my application 
builds word ngrams and fires a search with that query, which works well but is 
quite inefficient.

So my thought was to simply use the collated comments as query. So it is a case 
where the query is much longer. I need to use mm=0 or mm=1.

My plan was to use edismax as the pf2 and pf3 parameters should work well for 
my usecase.

However when using longer queries, I get a strange behavior which can be seen 
in debugQuery.

Here is an example:

Collated Comments (used as query)

I love Henry so much. It is hard to tear your eyes away from Maria, but watch 
just his feet. You'll be amazed.
sometimes pure skill can will a comp, sometimes pure joy can win... put them 
both together and there is no competition
This video clip makes me smile.
Pure joy!
so good!
Who's the person that gave this a thumbs down?!? This is one of the best 
routines I've ever seen. Period. And it's a competitionl! How is that possible? 
They're so good it boggles my mind.
It's gorgeous. Flawless victory.
Great number! Does anybody know the name of the piece?
I believe it's called Sunny side of the street
Maria is like, the best 'follow' I've ever seen. She's so amazing.
Thanks so much Johnathan!

Song name in Index
Louis Armstrong - Sunny Side of The Street

parsedquery_toString:
+(((text:I) (text:love) (text:Henry) (text:so) (text:much.) (text:It) (text:is) 
(text:hard) (text:to) (text:tear) (text:your) (text:eyes) (text:away) 
(text:from) (text:Maria,) (text:but) (text:watch) (text:just) (text:his) 
(text:feet.) (text:You'll) (text:be) (text:amazed.) (text:sometimes) 
(text:pure) (text:skill) (text:can) (text:will) (text:a) (text:comp,) 
(text:sometimes) (text:pure) (text:joy) (text:can) (text:win...) (text:put) 
(text:them) (text:both) +(text:together) +(text:there) (text:is) (text:no) 
(text:competition) (text:This) (text:video) (text:clip) (text:makes) (text:me) 
(text:smile.) (text:Pure) (text:joy!) (text:so) (text:good!) (text:Who's) 
(text:the) (text:person) (text:that) (text:gave) (text:this) (text:a) 
(text:thumbs) (text:down?!?) (text:This) (text:is) (text:one) (text:of) 
(text:the) (text:best) (text:routines) (text:I've) (text:ever) (text:seen.) 
+(text:Period.) +(text:it's) (text:a) (text:competitionl!) (text:How) (text:is) 
(text:that) (text:possible?) (text:They're) (text:so) (text:good) (text:it) 
(text:boggles) (text:my) (text:mind.) (text:It's) (text:gorgeous.) 
(text:Flawless) (text:victory.) (text:Great) (text:number!) (text:Does) 
(text:anybody) (text:know) (text:the) (text:name) (text:of) (text:the) 
(text:piece?) (text:I) (text:believe) (text:it's) (text:called) (text:Sunny) 
(text:side) (text:of) (text:the) (text:street) (text:Maria) (text:is) 
(text:like,) (text:the) (text:best) (text:'follow') (text:I've) (text:ever) 
(text:seen.) (text:She's) (text:so) (text:amazing.) (text:Thanks) (text:so) 
(text:much) (text:Johnathan!))~1)/str
 
This query generates 0 results. The reason is it expects terms together, there, 
Period., it's to be part of the document (see parsedquery above, all other 
terms are optional, those terms are must).

Is there any reason for this behavior? If I use shorter queries it works 
flawlessly and returns the document.

I've appended the whole query.

Best,

Nils

Strange behavior of edismax and mm=0 with long queries (bug?)

2014-04-04 Thread Nils Kaiser
Hey,

I am currently using solr to recognize songs and people from a list of user
comments. My index stores the titles of the songs. At the moment my
application builds word ngrams and fires a search with that query, which
works well but is quite inefficient.

So my thought was to simply use the collated comments as query. So it is a
case where the query is much longer. I need to use mm=0 or mm=1.

My plan was to use edismax as the pf2 and pf3 parameters should work well
for my usecase.

However when using longer queries, I get a strange behavior which can be
seen in debugQuery.

Here is an example:

Collated Comments (used as query)

I love Henry so much. It is hard to tear your eyes away from Maria, but
watch just his feet. You'll be amazed.
sometimes pure skill can will a comp, sometimes pure joy can win... put
them both together and there is no competition
This video clip makes me smile.
Pure joy!
so good!
Who's the person that gave this a thumbs down?!? This is one of the best
routines I've ever seen. Period. And it's a competitionl! How is that
possible? They're so good it boggles my mind.
It's gorgeous. Flawless victory.
Great number! Does anybody know the name of the piece?
I believe it's called Sunny side of the street
Maria is like, the best 'follow' I've ever seen. She's so amazing.
Thanks so much Johnathan!

Song name in Index
Louis Armstrong - Sunny Side of The Street

parsedquery_toString:
+(((text:I) (text:love) (text:Henry) (text:so) (text:much.) (text:It)
(text:is) (text:hard) (text:to) (text:tear) (text:your) (text:eyes)
(text:away) (text:from) (text:Maria,) (text:but) (text:watch) (text:just)
(text:his) (text:feet.) (text:You'll) (text:be) (text:amazed.)
(text:sometimes) (text:pure) (text:skill) (text:can) (text:will) (text:a)
(text:comp,) (text:sometimes) (text:pure) (text:joy) (text:can)
(text:win...) (text:put) (text:them) (text:both) +(text:together)
+(text:there) (text:is) (text:no) (text:competition) (text:This)
(text:video) (text:clip) (text:makes) (text:me) (text:smile.) (text:Pure)
(text:joy!) (text:so) (text:good!) (text:Who's) (text:the) (text:person)
(text:that) (text:gave) (text:this) (text:a) (text:thumbs) (text:down?!?)
(text:This) (text:is) (text:one) (text:of) (text:the) (text:best)
(text:routines) (text:I've) (text:ever) (text:seen.) +(text:Period.)
+(text:it's) (text:a) (text:competitionl!) (text:How) (text:is) (text:that)
(text:possible?) (text:They're) (text:so) (text:good) (text:it)
(text:boggles) (text:my) (text:mind.) (text:It's) (text:gorgeous.)
(text:Flawless) (text:victory.) (text:Great) (text:number!) (text:Does)
(text:anybody) (text:know) (text:the) (text:name) (text:of) (text:the)
(text:piece?) (text:I) (text:believe) (text:it's) (text:called)
(text:Sunny) (text:side) (text:of) (text:the) (text:street) (text:Maria)
(text:is) (text:like,) (text:the) (text:best) (text:'follow') (text:I've)
(text:ever) (text:seen.) (text:She's) (text:so) (text:amazing.)
 (text:Thanks) (text:so) (text:much) (text:Johnathan!))~1)

This query generates 0 results. The reason is it expects terms together,
there, Period., it's to be part of the document (see parsedquery above, all
other terms are optional, those terms are must).

Is there any reason for this behavior? If I use shorter queries it works
flawlessly and returns the document.

I've appended the whole query.

Best,

Nils
<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">11</int>
</lst>
<result name="response" numFound="0" start="0">
</result>
<lst name="debug">
  <str name="rawquerystring">I love Henry so much. It is hard to tear your eyes away from Maria, but watch just his feet. You'll be amazed.
sometimes pure skill can will a comp, sometimes pure joy can win... put them both together and there is no competition
This video clip makes me smile.
Pure joy!
so good!
Who's the person that gave this a thumbs down?!? This is one of the best routines I've ever seen. Period. And it's a competitionl! How is that possible? They're so good it boggles my mind.
It's gorgeous. Flawless victory.
Great number! Does anybody know the name of the piece?
I believe it's called Sunny side of the street
Maria is like, the best 'follow' I've ever seen. She's so amazing.
Thanks so much Johnathan!
</str>
  <str name="querystring">I love Henry so much. It is hard to tear your eyes away from Maria, but watch just his feet. You'll be amazed.
sometimes pure skill can will a comp, sometimes pure joy can win... put them both together and there is no competition
This video clip makes me smile.
Pure joy!
so good!
Who's the person that gave this a thumbs down?!? This is one of the best routines I've ever seen. Period. And it's a competitionl! How is that possible? They're so good it boggles my mind.
It's gorgeous. Flawless victory.
Great number! Does anybody know the name of the piece?
I believe it's called Sunny side of the street
Maria is like, the best 'follow' I've ever seen. She's so amazing.
Thanks so much Johnathan

Strange behavior while deleting

2014-03-31 Thread abhishek jain
Hi friends,
I have observed some strange behavior.

I have two indexes with the same ids and the same number of docs, and I am using a
JSON file to delete records from both indexes.
After deleting the ids, the resulting indexes show different doc counts.

I am not sure why; I used curl with the same JSON file to delete from both indexes.

Please advise ASAP.
Thanks

-- 
Thanks and kind Regards,
Abhishek


Re: Strange behavior while deleting

2014-03-31 Thread Jack Krupansky
Do the two cores have identical schema and solrconfig files? Are the delete
and merge config settings the same/identical?


Are these two cores running on the same Solr server, or two separate Solr 
servers? If the latter, are they both running the same release of Solr?


How big is the discrepancy - just a few, dozens, 10%, 50%?

-- Jack Krupansky

-Original Message- 
From: abhishek jain

Sent: Monday, March 31, 2014 3:26 AM
To: solr-user@lucene.apache.org
Subject: Strange behavior while deleting

hi friends,
I have observed a strange behavior,

I have two indexes of same ids and same number of docs, and i am using a
json file to delete records from both the indexes,
after deleting the ids, the resulting indexes now show different count of
docs,

Not sure why
I used curl with the same json file to delete from both the indexes.

Please advise asap,
thanks

--
Thanks and kind Regards,
Abhishek 



Re: Strange behavior while deleting

2014-03-31 Thread abhishek . netjain
Hi,
These settings are commented out in the schema. These are two different Solr servers
with an almost identical schema, with the exception of one stemmed field.

Same solr versions are running.
Please help.

Thanks 
Abhishek

  Original Message  
From: Jack Krupansky
Sent: Monday, 31 March 2014 14:54
To: solr-user@lucene.apache.org
Reply To: solr-user@lucene.apache.org
Subject: Re: Strange behavior while deleting

Do the two cores have identical schema and solrconfig files? Are the delete 
and merge config settings the sameidentical?

Are these two cores running on the same Solr server, or two separate Solr 
servers? If the latter, are they both running the same release of Solr?

How big is the discrepancy - just a few, dozens, 10%, 50%?

-- Jack Krupansky

-Original Message- 
From: abhishek jain
Sent: Monday, March 31, 2014 3:26 AM
To: solr-user@lucene.apache.org
Subject: Strange behavior while deleting

hi friends,
I have observed a strange behavior,

I have two indexes of same ids and same number of docs, and i am using a
json file to delete records from both the indexes,
after deleting the ids, the resulting indexes now show different count of
docs,

Not sure why
I used curl with the same json file to delete from both the indexes.

Please advise asap,
thanks

-- 
Thanks and kind Regards,
Abhishek 



Re: Strange behavior while deleting

2014-03-31 Thread Jack Krupansky

So, how big is the discrepancy?

If you do a *:* query for rows=100, is the 100th result the same for both?

Do a bunch of random queries and see if you can find a document key that is 
missing from one core, but present in the other, and check if it should have 
been deleted.


Are you deleting by id or by query?

Do you do an explicit commit on your update request? If not, it could just 
take a few minutes before the commit actually occurs.
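For example, a delete request that forces an immediate commit looks something like this (core name and id are placeholders):

curl 'http://localhost:8983/solr/yourcore/update?commit=true' -H 'Content-Type: application/json' -d '{"delete": {"id": "some-id"}}'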


Are the two Solr servers on the same machine or different machines? If the 
latter, is one of the machines significantly faster than the other.


-- Jack Krupansky

-Original Message- 
From: abhishek.netj...@gmail.com

Sent: Monday, March 31, 2014 5:48 AM
To: solr-user@lucene.apache.org ; solr-user@lucene.apache.org
Subject: Re: Strange behavior while deleting

Hi,
These settings are commented in schema. These are two different solr severs 
and almost identical schema ‎with the exception of one stemmed field.


Same solr versions are running.
Please help.

Thanks
Abhishek

 Original Message
From: Jack Krupansky
Sent: Monday, 31 March 2014 14:54
To: solr-user@lucene.apache.org
Reply To: solr-user@lucene.apache.org
Subject: Re: Strange behavior while deleting

Do the two cores have identical schema and solrconfig files? Are the delete
and merge config settings the sameidentical?

Are these two cores running on the same Solr server, or two separate Solr
servers? If the latter, are they both running the same release of Solr?

How big is the discrepancy - just a few, dozens, 10%, 50%?

-- Jack Krupansky

-Original Message- 
From: abhishek jain

Sent: Monday, March 31, 2014 3:26 AM
To: solr-user@lucene.apache.org
Subject: Strange behavior while deleting

hi friends,
I have observed a strange behavior,

I have two indexes of same ids and same number of docs, and i am using a
json file to delete records from both the indexes,
after deleting the ids, the resulting indexes now show different count of
docs,

Not sure why
I used curl with the same json file to delete from both the indexes.

Please advise asap,
thanks

--
Thanks and kind Regards,
Abhishek 



Strange behavior of gap fragmenter on highlighting

2013-11-13 Thread Ing. Jorge Luis Betancourt Gonzalez
I'm seeing strange behavior of the gap fragmenter on Solr 3.6. Right now this is
my configuration for the gap fragmenter:

  <fragmenter name="gap"
              default="true"
              class="solr.highlight.GapFragmenter">
    <lst name="defaults">
      <int name="hl.fragsize">150</int>
    </lst>
  </fragmenter>

This is the basic configuration; I just tweaked the fragsize parameter to get 
shorter fragments. The thing is that for one particular PDF document in my 
results I get a really long snippet, way over 150 characters. It gets a little 
odder: if I change the 150 value to 100, the snippet for the same document is 
normal, ~100 characters. The type of the field being highlighted is this:

<fieldType name="text" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"
            language="Spanish"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="1" types="characters.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Any ideas about what's happening, or how I could debug what is really going 
on?
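
One way to narrow this down is to override hl.fragsize per request and compare the
returned snippet lengths; here is a minimal SolrJ sketch (the core URL, the query, and
the highlighted field name are hypothetical):

import java.util.List;
import java.util.Map;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FragsizeCheck {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr"); // hypothetical core URL

        for (int fragsize : new int[] {100, 150}) {
            SolrQuery q = new SolrQuery("some query terms"); // hypothetical query
            q.setHighlight(true);
            q.addHighlightField("content");                  // hypothetical highlighted field
            q.setHighlightFragsize(fragsize);                // overrides the configured hl.fragsize

            QueryResponse rsp = solr.query(q);
            // Highlighting results: doc id -> (field -> snippets).
            for (Map.Entry<String, Map<String, List<String>>> doc
                    : rsp.getHighlighting().entrySet()) {
                for (List<String> snippets : doc.getValue().values()) {
                    for (String snippet : snippets) {
                        System.out.println("fragsize=" + fragsize + " id=" + doc.getKey()
                                + " snippetLength=" + snippet.length());
                    }
                }
            }
        }
        solr.shutdown();
    }
}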

Greetings!

III Escuela Internacional de Invierno en la UCI del 17 al 28 de febrero del 
2014. Ver www.uci.cu


Re: Strange behavior on text field with number-text content

2013-05-29 Thread Erick Erickson
Hmmm, there are two things you _must_ get familiar with when diagnosing
these G..

1 admin/analysis. That'll show you exactly what the analysis chain does,
and it's
 not always obvious.
2 add debug=query to your input and look at the parsed query results. For
instance,
 this name:4nSolution Inc. parses as name:4nSolution defaultfield:inc.

That doesn't explain why name=4nSolutions, except..

your index chain has splitOnCaseChange=1 and your query bit has
splitOnCaseChange=0
which doesn't seem right
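
As an illustration of that second check, a minimal SolrJ sketch (the core URL is
hypothetical) that requests debug output and prints the raw and parsed query:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ParsedQueryCheck {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr"); // hypothetical core URL

        SolrQuery q = new SolrQuery("name:4nSolution Inc.");
        q.set("debugQuery", "true"); // same effect as adding &debugQuery=true to the URL

        QueryResponse rsp = solr.query(q);
        // The debug section shows the query before and after parsing.
        System.out.println("raw:    " + rsp.getDebugMap().get("rawquerystring"));
        System.out.println("parsed: " + rsp.getDebugMap().get("parsedquery"));
    }
}

The parsedquery entry is where clauses that fall into the default field (like the
"inc." above) become visible.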

Best
Erick


On Tue, May 28, 2013 at 10:31 AM, Алексей Цой alexey...@gmail.com wrote:

 solr-user-unsubscribe solr-user-unsubscr...@lucene.apache.org


 2013/5/28 Michał Matulka michal.matu...@gowork.pl

  Thanks for your responses, I must admit that after hours of trying I
 made some mistakes.
 So the most problematic phrase will now be:
 4nSolution Inc. which cannot be found using query:

 name:4nSolution

 or even

 name:4nSolution Inc.

 but can be using following queries:

 name:nSolution
 name:4
 name:inc

 Sorry for the mess, it turned out I didn't reindex fields after modyfying
 schema so I thought that the problem also applies to 300letters .

 The cause of all of this is the WordDelimiter filter defined as following:

 <fieldType name="text" class="solr.TextField">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <!-- in this example, we will only use synonyms at query time
     <filter class="solr.SynonymFilterFactory"
             synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
     -->
     <!-- Case insensitive stop word removal.
          add enablePositionIncrements=true in both the index and query
          analyzers to leave a 'gap' for more accurate phrase queries.
     -->
     <filter class="solr.StopFilterFactory"
             ignoreCase="true"
             words="stopwords.txt"
             enablePositionIncrements="true"/>
     <filter class="solr.WordDelimiterFilterFactory"
             generateWordParts="1" generateNumberParts="1" catenateWords="1"
             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
             preserveOriginal="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.SnowballPorterFilterFactory"
             language="English" protected="protwords.txt"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
             ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory"
             ignoreCase="true"
             words="stopwords.txt"
             enablePositionIncrements="true"/>
     <filter class="solr.WordDelimiterFilterFactory"
             generateWordParts="1" generateNumberParts="1" catenateWords="0"
             catenateNumbers="0" catenateAll="1" splitOnCaseChange="0"
             preserveOriginal="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.SnowballPorterFilterFactory"
             language="English" protected="protwords.txt"/>
   </analyzer>
 </fieldType>

 and I still don't know why it behaves like that - after all there is
 preserveOriginal attribute set to 1...

 On 28.05.2013 14:21, Erick Erickson wrote:

 Hmmm, with 4.x I get much different behavior than you're
 describing, what version of Solr are you using?

 Besides Alex's comments, try adding debug=query to the url and see what 
 comes
 out from the query parser.

 A quick glance at the code shows that DefaultAnalyzer is used, which doesn't 
 do
 any analysis, here's the javadoc...
  /**
* Default analyzer for types that only produces 1 verbatim token...
* A maximum size of chars to be read must be specified
*/

 so it's much like the string type. Which means I'm totally perplexed by 
 your
 statement that 300 and letters return a hit. Have you perhaps changed the
 field definition and not re-indexed?

 The behavior you're seeing really looks like somehow 
 WordDelimiterFilterFactory
 is getting into your analysis chain with settings that don't mash the parts 
 back
 together, i.e. you can set up WDDF to split on letter/number transitions, 
 index
 each and NOT index the original, but I have no explanation for how that
 could happen with the field definition you indicated

 FWIW,
 Erick

 On Tue, May 28, 2013 at 7:47 AM, Alexandre Rafalovitcharafa...@gmail.com 
 arafa...@gmail.com wrote:

   What does analyzer screen say in the Web AdminUI when you try to do that?
 Also, what are the tokens stored in the field (also in Web AdminUI).

 I think it is very strange to have TextField without a tokenizer chain.
 Maybe you get a standard one assigned by default, but I don't know what the
 standard chain would be.

 Regards,

   Alex.
 On 28 May 2013 04:44, Michał Matulka michal.matu...@gowork.pl 
 michal.matu...@gowork.pl wrote:


  Hello,

 I've got following problem. I have a text type in my schema and a field
 name of that type.
 That field contains a data, there is, for example, record 

Strange behavior on text field with number-text content

2013-05-28 Thread Michał Matulka

Hello,

I've got the following problem. I have a text type in my schema and a field 
name of that type.
That field contains data; there is, for example, a record that has 
300letters as its name.


Now field type definition:
<fieldType name="text" class="solr.TextField"></fieldType>

And, of course, field definition:
<field name="name" type="text" indexed="true" stored="true"/>

yes, that's all - there are no tokenizers.

And now time for my question:

Why following queries:

name:300

and

name:letters

are returning that result, but:

name:300letters

is not (0 results)?

Best regards,
Michał Matulka


Re: Strange behavior on text field with number-text content

2013-05-28 Thread Alexandre Rafalovitch
 What does analyzer screen say in the Web AdminUI when you try to do that?
Also, what are the tokens stored in the field (also in Web AdminUI).

I think it is very strange to have TextField without a tokenizer chain.
Maybe you get a standard one assigned by default, but I don't know what the
standard chain would be.
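
To see the indexed tokens outside the admin UI, a minimal SolrJ sketch (core URL and
field name are hypothetical; it assumes the /terms handler from the example
solrconfig.xml is enabled):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.TermsResponse;

public class IndexedTokens {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr"); // hypothetical core URL

        SolrQuery q = new SolrQuery();
        q.setRequestHandler("/terms"); // TermsComponent handler, assumed enabled in solrconfig.xml
        q.set("terms", "true");
        q.set("terms.fl", "name");     // the field whose indexed tokens we want to inspect
        q.set("terms.limit", "20");

        QueryResponse rsp = solr.query(q);
        TermsResponse terms = rsp.getTermsResponse();
        for (TermsResponse.Term t : terms.getTerms("name")) {
            System.out.println(t.getTerm() + " (docFreq=" + t.getFrequency() + ")");
        }
        solr.shutdown();
    }
}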

Regards,

  Alex.
On 28 May 2013 04:44, Michał Matulka michal.matu...@gowork.pl wrote:

 Hello,

 I've got following problem. I have a text type in my schema and a field
 name of that type.
 That field contains a data, there is, for example, record that has
 300letters as name.

 Now field type definition:
 <fieldType name="text" class="solr.TextField"></fieldType>

 And, of course, field definition:
 <field name="name" type="text" indexed="true" stored="true"/>

 yes, that's all - there are no tokenizers.

 And now time for my question:

 Why following queries:

 name:300

 and

 name:letters

 are returning that result, but:

 name:300letters

 is not (0 results)?

 Best regards,
 Michał Matulka



Re: Strange behavior on text field with number-text content

2013-05-28 Thread Erick Erickson
Hmmm, with 4.x I get much different behavior than you're
describing, what version of Solr are you using?

Besides Alex's comments, try adding debug=query to the url and see what comes
out from the query parser.

A quick glance at the code shows that DefaultAnalyzer is used, which doesn't do
any analysis, here's the javadoc...
 /**
   * Default analyzer for types that only produces 1 verbatim token...
   * A maximum size of chars to be read must be specified
   */

so it's much like the string type. Which means I'm totally perplexed by your
statement that 300 and letters return a hit. Have you perhaps changed the
field definition and not re-indexed?

The behavior you're seeing really looks like somehow WordDelimiterFilterFactory
is getting into your analysis chain with settings that don't mash the parts back
together, i.e. you can set up WDDF to split on letter/number transitions, index
each and NOT index the original, but I have no explanation for how that
could happen with the field definition you indicated

FWIW,
Erick

On Tue, May 28, 2013 at 7:47 AM, Alexandre Rafalovitch
arafa...@gmail.com wrote:
  What does analyzer screen say in the Web AdminUI when you try to do that?
 Also, what are the tokens stored in the field (also in Web AdminUI).

 I think it is very strange to have TextField without a tokenizer chain.
 Maybe you get a standard one assigned by default, but I don't know what the
 standard chain would be.

 Regards,

   Alex.
 On 28 May 2013 04:44, Michał Matulka michal.matu...@gowork.pl wrote:

 Hello,

 I've got following problem. I have a text type in my schema and a field
 name of that type.
 That field contains a data, there is, for example, record that has
 300letters as name.

 Now field type definition:
 <fieldType name="text" class="solr.TextField"></fieldType>

 And, of course, field definition:
 <field name="name" type="text" indexed="true" stored="true"/>

 yes, that's all - there are no tokenizers.

 And now time for my question:

 Why following queries:

 name:300

 and

 name:letters

 are returning that result, but:

 name:300letters

 is not (0 results)?

 Best regards,
 Michał Matulka



Re: Strange behavior on text field with number-text content

2013-05-28 Thread Michał Matulka

  
  
Thanks for your responses, I must admit that after hours of trying I made
some mistakes.
So the most problematic phrase will now be:
"4nSolution Inc." which cannot be found using the query:

name:4nSolution

or even

name:4nSolution Inc.

but can be found using the following queries:

name:nSolution
name:4
name:inc

Sorry for the mess; it turned out I didn't reindex the fields after
modifying the schema, so I thought that the problem also applied to
"300letters".

The cause of all of this is the WordDelimiter filter, defined as
follows:
  
  fieldType name="text" class="solr.TextField"
    analyzer type="index"
      tokenizer class="solr.WhitespaceTokenizerFactory"/
      !-- in this example, we will only use synonyms at
  query time
      filter class="solr.SynonymFilterFactory"
  synonyms="index_synonyms.txt" ignoreCase="true"
  expand="false"/
      --
      !-- Case insensitive stop word removal.
    add enablePositionIncrements=true in both the index and
  query
    analyzers to leave a 'gap' for more accurate phrase
  queries.
      --
      filter class="solr.StopFilterFactory"
      ignoreCase="true"
      words="stopwords.txt"
      enablePositionIncrements="true"
      /
      filter class="solr.WordDelimiterFilterFactory"
  generateWordParts="1" generateNumberParts="1" catenateWords="1"
  catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
  preserveOriginal="1"/
      filter class="solr.LowerCaseFilterFactory"/
      filter class="solr.SnowballPorterFilterFactory"
  language="English" protected="protwords.txt"/
    /analyzer
    analyzer type="query"
      tokenizer class="solr.WhitespaceTokenizerFactory"/
      filter class="solr.SynonymFilterFactory"
  synonyms="synonyms.txt" ignoreCase="true" expand="true"/
      filter class="solr.StopFilterFactory"
      ignoreCase="true"
      words="stopwords.txt"
      enablePositionIncrements="true"
      /
      filter class="solr.WordDelimiterFilterFactory"
  generateWordParts="1" generateNumberParts="1" catenateWords="0"
  catenateNumbers="0" catenateAll="1" splitOnCaseChange="0"
  preserveOriginal="1" /
      filter class="solr.LowerCaseFilterFactory"/
      filter class="solr.SnowballPorterFilterFactory"
  language="English" protected="protwords.txt"/
    /analyzer
      /fieldType
  
and I still don't know why it behaves like that - after all, the
"preserveOriginal" attribute is set to 1...
  
  On 28.05.2013 14:21, Erick Erickson wrote:

  Hmmm, with 4.x I get much different behavior than you're
  describing, what version of Solr are you using?

Besides Alex's comments, try adding debug=query to the url and see what comes
out from the query parser.

A quick glance at the code shows that DefaultAnalyzer is used, which doesn't do
any analysis, here's the javadoc...
 /**
   * Default analyzer for types that only produces 1 verbatim token...
   * A maximum size of chars to be read must be specified
   */

so it's much like the "string" type. Which means I'm totally perplexed by your
statement that 300 and letters return a hit. Have you perhaps changed the
field definition and not re-indexed?

The behavior you're seeing really looks like somehow WordDelimiterFilterFactory
is getting into your analysis chain with settings that don't mash the parts back
together, i.e. you can set up WDDF to split on letter/number transitions, index
each and NOT index the original, but I have no explanation for how that
could happen with the field definition you indicated

FWIW,
Erick

On Tue, May 28, 2013 at 7:47 AM, Alexandre Rafalovitch
arafa...@gmail.com wrote:

  
 What does analyzer screen say in the Web AdminUI when you try to do that?
Also, what are the tokens stored in the field (also in Web AdminUI).

I think it is very strange to have TextField without a tokenizer chain.
Maybe you get a standard one assigned by default, but I don't know what the
standard chain would be.

Regards,

  Alex.
On 28 May 2013 04:44, "Michał Matulka" michal.matu...@gowork.pl wrote:



  Hello,

I've got following problem. I have a text type in my schema and a field
"name" of that type.
That field contains a data, there is, for example, record that has
"300letters" as name.

Now field type definition:
fieldType name="text" class="solr.TextField"/**fieldType

And, of course, field definition:
fieldname="name"type="text"**indexed="true"stored="true"/

yes, that's all - there are no tokenizers.

And now time for my 

Re: Strange behavior on text field with number-text content

2013-05-28 Thread Алексей Цой
solr-user-unsubscribe solr-user-unsubscr...@lucene.apache.org


2013/5/28 Michał Matulka michal.matu...@gowork.pl

  Thanks for your responses, I must admit that after hours of trying I
 made some mistakes.
 So the most problematic phrase will now be:
 4nSolution Inc. which cannot be found using query:

 name:4nSolution

 or even

 name:4nSolution Inc.

 but can be using following queries:

 name:nSolution
 name:4
 name:inc

 Sorry for the mess, it turned out I didn't reindex fields after modyfying
 schema so I thought that the problem also applies to 300letters .

 The cause of all of this is the WordDelimiter filter defined as following:

 <fieldType name="text" class="solr.TextField">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <!-- in this example, we will only use synonyms at query time
     <filter class="solr.SynonymFilterFactory"
             synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
     -->
     <!-- Case insensitive stop word removal.
          add enablePositionIncrements=true in both the index and query
          analyzers to leave a 'gap' for more accurate phrase queries.
     -->
     <filter class="solr.StopFilterFactory"
             ignoreCase="true"
             words="stopwords.txt"
             enablePositionIncrements="true"/>
     <filter class="solr.WordDelimiterFilterFactory"
             generateWordParts="1" generateNumberParts="1" catenateWords="1"
             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
             preserveOriginal="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.SnowballPorterFilterFactory"
             language="English" protected="protwords.txt"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
             ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory"
             ignoreCase="true"
             words="stopwords.txt"
             enablePositionIncrements="true"/>
     <filter class="solr.WordDelimiterFilterFactory"
             generateWordParts="1" generateNumberParts="1" catenateWords="0"
             catenateNumbers="0" catenateAll="1" splitOnCaseChange="0"
             preserveOriginal="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.SnowballPorterFilterFactory"
             language="English" protected="protwords.txt"/>
   </analyzer>
 </fieldType>

 and I still don't know why it behaves like that - after all there is
 preserveOriginal attribute set to 1...

 On 28.05.2013 14:21, Erick Erickson wrote:

 Hmmm, with 4.x I get much different behavior than you're
 describing, what version of Solr are you using?

 Besides Alex's comments, try adding debug=query to the url and see what comes
 out from the query parser.

 A quick glance at the code shows that DefaultAnalyzer is used, which doesn't 
 do
 any analysis, here's the javadoc...
  /**
* Default analyzer for types that only produces 1 verbatim token...
* A maximum size of chars to be read must be specified
*/

 so it's much like the string type. Which means I'm totally perplexed by your
 statement that 300 and letters return a hit. Have you perhaps changed the
 field definition and not re-indexed?

 The behavior you're seeing really looks like somehow 
 WordDelimiterFilterFactory
 is getting into your analysis chain with settings that don't mash the parts 
 back
 together, i.e. you can set up WDDF to split on letter/number transitions, 
 index
 each and NOT index the original, but I have no explanation for how that
 could happen with the field definition you indicated

 FWIW,
 Erick

 On Tue, May 28, 2013 at 7:47 AM, Alexandre Rafalovitcharafa...@gmail.com 
 arafa...@gmail.com wrote:

   What does analyzer screen say in the Web AdminUI when you try to do that?
 Also, what are the tokens stored in the field (also in Web AdminUI).

 I think it is very strange to have TextField without a tokenizer chain.
 Maybe you get a standard one assigned by default, but I don't know what the
 standard chain would be.

 Regards,

   Alex.
 On 28 May 2013 04:44, Michał Matulka michal.matu...@gowork.pl 
 michal.matu...@gowork.pl wrote:


  Hello,

 I've got following problem. I have a text type in my schema and a field
 name of that type.
 That field contains a data, there is, for example, record that has
 300letters as name.

 Now field type definition:
 <fieldType name="text" class="solr.TextField"></fieldType>

 And, of course, field definition:
 <field name="name" type="text" indexed="true" stored="true"/>

 yes, that's all - there are no tokenizers.

 And now time for my question:

 Why following queries:

 name:300

 and

 name:letters

 are returning that result, but:

 name:300letters

 is not (0 results)?

 Best regards,
 Michał Matulka




 --
 Regards,
 Michał Matulka
 Developer
 michal.matu...@gowork.pl

 GoWork.pl
 ul. Zielna 39
 00-108 Warszawa
 www.GoWork.pl



Re: Distributed query: strange behavior.

2013-05-28 Thread Valery Giner

Erick,

Thank you for the explanation.

My problem is that if docs with the same unique ids are allowed to be
present in multiple shards in a normal situation, it becomes impossible to
estimate the number of shards needed for an index with a really large
number of docs.


Thanks,
Val

On 05/26/2013 11:16 AM, Erick Erickson wrote:

Valery:

I share your puzzlement. _If_ you are letting Solr do the document
routing, and not doing any of the custom routing, then the same unique
key should be going to the same shard and replacing the previous doc
with that key.

But, if you're using custom routing, if you've been experimenting with
different configurations and didn't start over, in general if you're
configuration is in an interesting state this could happen.

So in the normal case if you have a document with the same key indexed
in multiple shards, that would indicate a bug. But there are many
ways, especially when experimenting, that you could have this happen
which are _not_ a bug. I'm guessing that Luis may be trying the custom
routing option maybe?

Best
Erick

On Fri, May 24, 2013 at 9:09 AM, Valery Giner valgi...@research.att.com wrote:

Shawn,

How is it possible for more than one document with the same unique key to
appear in the index, even in different shards?
Isn't it a bug by definition?
What am I missing here?

Thanks,
Val


On 05/23/2013 09:55 AM, Shawn Heisey wrote:

On 5/23/2013 1:51 AM, Luis Cappa Banda wrote:

I've query each Solr shard server one by one and the total number of
documents is correct. However, when I change rows parameter from 10 to
100
the total numFound of documents change:

I've seen this problem on the list before and the cause has been
determined each time to be caused by documents with the same uniqueKey
value appearing in more than one shard.

What I think happens here:

With rows=10, you get the top ten docs from each of the three shards,
and each shard sends its numFound for that query to the core that's
coordinating the search.  The coordinator adds up numFound, looks
through those thirty docs, and arranges them according to the requested
sort order, returning only the top 10.  In this case, there happen to be
no duplicates.

With rows=100, you get a total of 300 docs.  This time, duplicates are
found and removed by the coordinator.  I think that the coordinator
adjusts the total numFound by the number of duplicate documents it
removed, in an attempt to be more accurate.

I don't know if adjusting numFound when duplicates are found in a
sharded query is the right thing to do, I'll leave that for smarter
people.  Perhaps Solr should return a message with the results saying
that duplicates were found, and if a config option is not enabled, the
server should throw an exception and return a 4xx HTTP error code.  One
idea for a config parameter name would be allowShardDuplicates, but
something better can probably be found.

Thanks,
Shawn





Re: Distributed query: strange behavior.

2013-05-27 Thread Luis Cappa Banda
Hi, Erick!

That's it! I'm using a custom implementation of a SolrServer with
distributed behavior that routes queries and updates using an in-house
round-robin method. The thing is that I'm doing this myself because
I've noticed that duplicated documents appear when using the LBHttpSolrServer
implementation. Last week I modified my implementation to avoid that with
these changes:


   - I have normalized the key field across all documents. Now every document
   indexed must include an *_id_* field that stores the selected key value.
   The value is set with a *copyField*.
   - When I index a new document, an *HttpSolrServer* from the shard list is
   selected using a round-robin strategy. Then a field called *_shard_* is
   set on the *SolrInputDocument*. That field value records which main shard
   was selected.
   - If a document to be indexed/updated already includes the *_shard_* field,
   the shard it belongs to (*HttpSolrServer*) is selected automatically.
   - If a document to be indexed/updated does not include the *_shard_* field,
   the key value is taken from *_id_* in the *SolrInputDocument*. With that key
   a distributed search query is executed to retrieve the *_shard_* field, and
   with *_shard_* we can then choose the correct shard (*HttpSolrServer*).
   It's not good practice and performance isn't the best, but it's safe.
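
A minimal sketch of that routing idea (shard URLs are hypothetical, and this is only
an outline of the approach described above, not the actual in-house code):

import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class RoundRobinRouter {
    private final List<HttpSolrServer> shards; // one client per shard, hypothetical URLs
    private final AtomicInteger next = new AtomicInteger(0);

    public RoundRobinRouter(List<HttpSolrServer> shards) {
        this.shards = shards;
    }

    /** Index a document, reusing its _shard_ assignment if it already has one. */
    public void index(SolrInputDocument doc) throws Exception {
        int shardIndex;
        Object assigned = doc.getFieldValue("_shard_");
        if (assigned != null) {
            // The document already knows its shard: send the update there.
            shardIndex = Integer.parseInt(assigned.toString());
        } else {
            // New document: pick the next shard round-robin and remember the choice.
            shardIndex = Math.abs(next.getAndIncrement() % shards.size());
            doc.setField("_shard_", String.valueOf(shardIndex));
        }
        HttpSolrServer shard = shards.get(shardIndex);
        shard.add(doc);
        shard.commit();
    }
}

The lookup-by-_id_ step for documents that arrive without a _shard_ value is omitted
here for brevity.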

Best Regards,

- Luis Cappa


2013/5/26 Erick Erickson erickerick...@gmail.com

 Valery:

 I share your puzzlement. _If_ you are letting Solr do the document
 routing, and not doing any of the custom routing, then the same unique
 key should be going to the same shard and replacing the previous doc
 with that key.

 But, if you're using custom routing, if you've been experimenting with
 different configurations and didn't start over, in general if you're
 configuration is in an interesting state this could happen.

 So in the normal case if you have a document with the same key indexed
 in multiple shards, that would indicate a bug. But there are many
 ways, especially when experimenting, that you could have this happen
 which are _not_ a bug. I'm guessing that Luis may be trying the custom
 routing option maybe?

 Best
 Erick

 On Fri, May 24, 2013 at 9:09 AM, Valery Giner valgi...@research.att.com
 wrote:
  Shawn,
 
  How is it possible for more than one document with the same unique key to
  appear in the index, even in different shards?
  Isn't it a bug by definition?
  What am I missing here?
 
  Thanks,
  Val
 
 
  On 05/23/2013 09:55 AM, Shawn Heisey wrote:
 
  On 5/23/2013 1:51 AM, Luis Cappa Banda wrote:
 
  I've query each Solr shard server one by one and the total number of
  documents is correct. However, when I change rows parameter from 10 to
  100
  the total numFound of documents change:
 
  I've seen this problem on the list before and the cause has been
  determined each time to be caused by documents with the same uniqueKey
  value appearing in more than one shard.
 
  What I think happens here:
 
  With rows=10, you get the top ten docs from each of the three shards,
  and each shard sends its numFound for that query to the core that's
  coordinating the search.  The coordinator adds up numFound, looks
  through those thirty docs, and arranges them according to the requested
  sort order, returning only the top 10.  In this case, there happen to be
  no duplicates.
 
  With rows=100, you get a total of 300 docs.  This time, duplicates are
  found and removed by the coordinator.  I think that the coordinator
  adjusts the total numFound by the number of duplicate documents it
  removed, in an attempt to be more accurate.
 
  I don't know if adjusting numFound when duplicates are found in a
  sharded query is the right thing to do, I'll leave that for smarter
  people.  Perhaps Solr should return a message with the results saying
  that duplicates were found, and if a config option is not enabled, the
  server should throw an exception and return a 4xx HTTP error code.  One
  idea for a config parameter name would be allowShardDuplicates, but
  something better can probably be found.
 
  Thanks,
  Shawn
 
 




-- 
- Luis Cappa


Re: Distributed query: strange behavior.

2013-05-27 Thread Luis Cappa Banda
Hello, guys!

Well, I've done some tests and I think there is some kind of bug
related to distributed search. Currently I'm setting a key field that
cannot be duplicated, and I still see the same wrong numFound
behavior when changing the rows parameter. Has anyone
experienced the same?

Best regards,

- Luis Cappa


2013/5/27 Luis Cappa Banda luisca...@gmail.com

 Hi, Erick!

 That's it! I'm using a custom implementation of a SolrServer with
 distributed behavior that routes queries and updates using an in-house
 Round Robin method. But the thing is that I'm doing this myself because
 I've noticed that duplicated documents appears using LBHttpSolrServer
 implementation. Last week I modified my implementation to avoid that with
 this changes:


- I have normalized the key field to all documents. Now every document
indexed must include *_id_* field that stores the selected key value.
The value is setted with a *copyField*.
- When I index a new document a *HttpSolrServer* from the shard list
is selected using a Round Robin strategy. Then, a field called *_shard_
* is setted to *SolrInputDocument*. That field value includes a
relationship with the main shard selected.
- If a document wants to be indexed/updated and it includes *_shard_*field 
 to update it automatically the belonged shard (
*HttpSolrServer*) is selected.
- If a document wants to be indexed/updated and *_shard_* field is not
included then the key value from *_id_* is getted from *
SolrInputDocument*. With that key a distributed search query is
executed by it's key to retrieve *_shard_* field. With *_shard_* field
we can now choose the correct shard (*HttpSolrServer*). It's not a
good practice and performance isn't the best, but it's secure.

 Best Regards,

 - Luis Cappa


 2013/5/26 Erick Erickson erickerick...@gmail.com

 Valery:

 I share your puzzlement. _If_ you are letting Solr do the document
 routing, and not doing any of the custom routing, then the same unique
 key should be going to the same shard and replacing the previous doc
 with that key.

 But, if you're using custom routing, if you've been experimenting with
 different configurations and didn't start over, in general if you're
 configuration is in an interesting state this could happen.

 So in the normal case if you have a document with the same key indexed
 in multiple shards, that would indicate a bug. But there are many
 ways, especially when experimenting, that you could have this happen
 which are _not_ a bug. I'm guessing that Luis may be trying the custom
 routing option maybe?

 Best
 Erick

 On Fri, May 24, 2013 at 9:09 AM, Valery Giner valgi...@research.att.com
 wrote:
  Shawn,
 
  How is it possible for more than one document with the same unique key
 to
  appear in the index, even in different shards?
  Isn't it a bug by definition?
  What am I missing here?
 
  Thanks,
  Val
 
 
  On 05/23/2013 09:55 AM, Shawn Heisey wrote:
 
  On 5/23/2013 1:51 AM, Luis Cappa Banda wrote:
 
  I've query each Solr shard server one by one and the total number of
  documents is correct. However, when I change rows parameter from 10 to
  100
  the total numFound of documents change:
 
  I've seen this problem on the list before and the cause has been
  determined each time to be caused by documents with the same uniqueKey
  value appearing in more than one shard.
 
  What I think happens here:
 
  With rows=10, you get the top ten docs from each of the three shards,
  and each shard sends its numFound for that query to the core that's
  coordinating the search.  The coordinator adds up numFound, looks
  through those thirty docs, and arranges them according to the requested
  sort order, returning only the top 10.  In this case, there happen to
 be
  no duplicates.
 
  With rows=100, you get a total of 300 docs.  This time, duplicates are
  found and removed by the coordinator.  I think that the coordinator
  adjusts the total numFound by the number of duplicate documents it
  removed, in an attempt to be more accurate.
 
  I don't know if adjusting numFound when duplicates are found in a
  sharded query is the right thing to do, I'll leave that for smarter
  people.  Perhaps Solr should return a message with the results saying
  that duplicates were found, and if a config option is not enabled, the
  server should throw an exception and return a 4xx HTTP error code.  One
  idea for a config parameter name would be allowShardDuplicates, but
  something better can probably be found.
 
  Thanks,
  Shawn
 
 




 --
 - Luis Cappa




-- 
- Luis Cappa


Re: Distributed query: strange behavior.

2013-05-26 Thread Erick Erickson
Valery:

I share your puzzlement. _If_ you are letting Solr do the document
routing, and not doing any of the custom routing, then the same unique
key should be going to the same shard and replacing the previous doc
with that key.

But if you're using custom routing, or if you've been experimenting with
different configurations and didn't start over (in general, if your
configuration is in an "interesting" state), this could happen.

So in the normal case if you have a document with the same key indexed
in multiple shards, that would indicate a bug. But there are many
ways, especially when experimenting, that you could have this happen
which are _not_ a bug. I'm guessing that Luis may be trying the custom
routing option maybe?

Best
Erick

On Fri, May 24, 2013 at 9:09 AM, Valery Giner valgi...@research.att.com wrote:
 Shawn,

 How is it possible for more than one document with the same unique key to
 appear in the index, even in different shards?
 Isn't it a bug by definition?
 What am I missing here?

 Thanks,
 Val


 On 05/23/2013 09:55 AM, Shawn Heisey wrote:

 On 5/23/2013 1:51 AM, Luis Cappa Banda wrote:

 I've query each Solr shard server one by one and the total number of
 documents is correct. However, when I change rows parameter from 10 to
 100
 the total numFound of documents change:

 I've seen this problem on the list before and the cause has been
 determined each time to be caused by documents with the same uniqueKey
 value appearing in more than one shard.

 What I think happens here:

 With rows=10, you get the top ten docs from each of the three shards,
 and each shard sends its numFound for that query to the core that's
 coordinating the search.  The coordinator adds up numFound, looks
 through those thirty docs, and arranges them according to the requested
 sort order, returning only the top 10.  In this case, there happen to be
 no duplicates.

 With rows=100, you get a total of 300 docs.  This time, duplicates are
 found and removed by the coordinator.  I think that the coordinator
 adjusts the total numFound by the number of duplicate documents it
 removed, in an attempt to be more accurate.

 I don't know if adjusting numFound when duplicates are found in a
 sharded query is the right thing to do, I'll leave that for smarter
 people.  Perhaps Solr should return a message with the results saying
 that duplicates were found, and if a config option is not enabled, the
 server should throw an exception and return a 4xx HTTP error code.  One
 idea for a config parameter name would be allowShardDuplicates, but
 something better can probably be found.

 Thanks,
 Shawn




Re: Distributed query: strange behavior.

2013-05-24 Thread Luis Cappa Banda
Uhm... that sounds reasonable. My data model may allow duplicate keys, but
it's quite unlikely. My key is a hash formed from a URL during a crawling
process, and it's possible to re-crawl an existing URL. I think I need
to find a new way to compose a unique key to avoid this kind of bad
behavior. However, it would be very useful if Solr could alert about
duplicate keys or something. Maybe an extra field included in the
response alongside numFound, docs, facets, etc. would be nice. Thank you
very much!
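
For what it's worth, a minimal sketch of one way to build such a key: normalize the
URL first, then hash it, so re-crawling the same page always produces the same id
(the normalization rules here are only an example):

import java.math.BigInteger;
import java.security.MessageDigest;

public class UrlKey {
    /** Deterministic document id from a URL: same page, same key, even after a re-crawl. */
    public static String idFor(String url) throws Exception {
        // Example normalization only: lower-case the whole URL, strip the fragment and
        // a trailing slash. Real crawlers need stricter canonicalization.
        String normalized = url.trim().toLowerCase();
        int fragment = normalized.indexOf('#');
        if (fragment >= 0) {
            normalized = normalized.substring(0, fragment);
        }
        if (normalized.endsWith("/")) {
            normalized = normalized.substring(0, normalized.length() - 1);
        }
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        byte[] digest = md5.digest(normalized.getBytes("UTF-8"));
        return new BigInteger(1, digest).toString(16);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(idFor("http://example.com/some/page/"));
        System.out.println(idFor("http://example.com/some/page#section")); // same id as above
    }
}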

Best regards,

- Luis Cappa


2013/5/23 Shawn Heisey s...@elyograg.org

 On 5/23/2013 1:51 AM, Luis Cappa Banda wrote:
  I've query each Solr shard server one by one and the total number of
  documents is correct. However, when I change rows parameter from 10 to
 100
  the total numFound of documents change:

 I've seen this problem on the list before and the cause has been
 determined each time to be caused by documents with the same uniqueKey
 value appearing in more than one shard.

 What I think happens here:

 With rows=10, you get the top ten docs from each of the three shards,
 and each shard sends its numFound for that query to the core that's
 coordinating the search.  The coordinator adds up numFound, looks
 through those thirty docs, and arranges them according to the requested
 sort order, returning only the top 10.  In this case, there happen to be
 no duplicates.

 With rows=100, you get a total of 300 docs.  This time, duplicates are
 found and removed by the coordinator.  I think that the coordinator
 adjusts the total numFound by the number of duplicate documents it
 removed, in an attempt to be more accurate.

 I don't know if adjusting numFound when duplicates are found in a
 sharded query is the right thing to do, I'll leave that for smarter
 people.  Perhaps Solr should return a message with the results saying
 that duplicates were found, and if a config option is not enabled, the
 server should throw an exception and return a 4xx HTTP error code.  One
 idea for a config parameter name would be allowShardDuplicates, but
 something better can probably be found.

 Thanks,
 Shawn




-- 
- Luis Cappa


Re: Distributed query: strange behavior.

2013-05-24 Thread Valery Giner

Shawn,

How is it possible for more than one document with the same unique key 
to appear in the index, even in different shards?

Isn't it a bug by definition?
What am I missing here?

Thanks,
Val

On 05/23/2013 09:55 AM, Shawn Heisey wrote:

On 5/23/2013 1:51 AM, Luis Cappa Banda wrote:

I've query each Solr shard server one by one and the total number of
documents is correct. However, when I change rows parameter from 10 to 100
the total numFound of documents change:

I've seen this problem on the list before and the cause has been
determined each time to be caused by documents with the same uniqueKey
value appearing in more than one shard.

What I think happens here:

With rows=10, you get the top ten docs from each of the three shards,
and each shard sends its numFound for that query to the core that's
coordinating the search.  The coordinator adds up numFound, looks
through those thirty docs, and arranges them according to the requested
sort order, returning only the top 10.  In this case, there happen to be
no duplicates.

With rows=100, you get a total of 300 docs.  This time, duplicates are
found and removed by the coordinator.  I think that the coordinator
adjusts the total numFound by the number of duplicate documents it
removed, in an attempt to be more accurate.

I don't know if adjusting numFound when duplicates are found in a
sharded query is the right thing to do, I'll leave that for smarter
people.  Perhaps Solr should return a message with the results saying
that duplicates were found, and if a config option is not enabled, the
server should throw an exception and return a 4xx HTTP error code.  One
idea for a config parameter name would be allowShardDuplicates, but
something better can probably be found.

Thanks,
Shawn





Re: Distributed query: strange behavior.

2013-05-24 Thread Shalin Shekhar Mangar
The uniqueKey is enforced within the same shard/index only.


On Fri, May 24, 2013 at 6:39 PM, Valery Giner valgi...@research.att.comwrote:

 Shawn,

 How is it possible for more than one document with the same unique key to
 appear in the index, even in different shards?
 Isn't it a bug by definition?
 What am I missing here?

 Thanks,
 Val


 On 05/23/2013 09:55 AM, Shawn Heisey wrote:

 On 5/23/2013 1:51 AM, Luis Cappa Banda wrote:

 I've query each Solr shard server one by one and the total number of
 documents is correct. However, when I change rows parameter from 10 to
 100
 the total numFound of documents change:

 I've seen this problem on the list before and the cause has been
 determined each time to be caused by documents with the same uniqueKey
 value appearing in more than one shard.

 What I think happens here:

 With rows=10, you get the top ten docs from each of the three shards,
 and each shard sends its numFound for that query to the core that's
 coordinating the search.  The coordinator adds up numFound, looks
 through those thirty docs, and arranges them according to the requested
 sort order, returning only the top 10.  In this case, there happen to be
 no duplicates.

 With rows=100, you get a total of 300 docs.  This time, duplicates are
 found and removed by the coordinator.  I think that the coordinator
 adjusts the total numFound by the number of duplicate documents it
 removed, in an attempt to be more accurate.

 I don't know if adjusting numFound when duplicates are found in a
 sharded query is the right thing to do, I'll leave that for smarter
 people.  Perhaps Solr should return a message with the results saying
 that duplicates were found, and if a config option is not enabled, the
 server should throw an exception and return a 4xx HTTP error code.  One
 idea for a config parameter name would be allowShardDuplicates, but
 something better can probably be found.

 Thanks,
 Shawn





-- 
Regards,
Shalin Shekhar Mangar.


Distributed query: strange behavior.

2013-05-23 Thread Luis Cappa Banda
Hello, guys!

I'm running Solr 4.3.0 and I've noticed strange behavior during
distributed query execution. Currently I have three Solr servers as
shards, and when I do the following query...


http://localhost:11080/twitter/data/select?q=*:*&rows=10&shards=localhost:11080/twitter/data,localhost:12080/twitter/data,localhost:13080/twitter/data&wt=json

*Numfound* = 47131


I've queried each Solr shard server one by one and the total number of
documents is correct. However, when I change the rows parameter from 10 to 100,
the total numFound of documents changes:

http://localhost:11080/twitter/data/select?q=*:*&rows=100&shards=localhost:11080/twitter/data,localhost:12080/twitter/data,localhost:13080/twitter/data&wt=json

*Numfound* = 47124

And if I set rows=50, the numFound count changes again:

http://localhost:11080/twitter/data/select?q=*:*&rows=50&shards=localhost:11080/twitter/data,localhost:12080/twitter/data,localhost:13080/twitter/data&wt=json

*Numfound* = 47129


What's happening here? Does anybody know? Is it a distributed search bug or
something?

Thank you very much in advance!


Best regards,

-- 
- Luis Cappa
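
A minimal SolrJ sketch that reproduces the check above (the URLs follow the shard
layout from the queries in this message; adjust as needed):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class NumFoundCheck {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:11080/twitter/data");
        String shards = "localhost:11080/twitter/data,"
                      + "localhost:12080/twitter/data,"
                      + "localhost:13080/twitter/data";

        for (int rows : new int[] {10, 50, 100}) {
            SolrQuery q = new SolrQuery("*:*");
            q.setRows(rows);
            q.set("shards", shards);
            long numFound = solr.query(q).getResults().getNumFound();
            // If the same uniqueKey exists in more than one shard, numFound
            // shifts as rows changes, exactly as reported above.
            System.out.println("rows=" + rows + " numFound=" + numFound);
        }
        solr.shutdown();
    }
}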


Re: Distributed query: strange behavior.

2013-05-23 Thread Shawn Heisey
On 5/23/2013 1:51 AM, Luis Cappa Banda wrote:
 I've query each Solr shard server one by one and the total number of
 documents is correct. However, when I change rows parameter from 10 to 100
 the total numFound of documents change:

I've seen this problem on the list before and the cause has been
determined each time to be caused by documents with the same uniqueKey
value appearing in more than one shard.

What I think happens here:

With rows=10, you get the top ten docs from each of the three shards,
and each shard sends its numFound for that query to the core that's
coordinating the search.  The coordinator adds up numFound, looks
through those thirty docs, and arranges them according to the requested
sort order, returning only the top 10.  In this case, there happen to be
no duplicates.

With rows=100, you get a total of 300 docs.  This time, duplicates are
found and removed by the coordinator.  I think that the coordinator
adjusts the total numFound by the number of duplicate documents it
removed, in an attempt to be more accurate.

I don't know if adjusting numFound when duplicates are found in a
sharded query is the right thing to do, I'll leave that for smarter
people.  Perhaps Solr should return a message with the results saying
that duplicates were found, and if a config option is not enabled, the
server should throw an exception and return a 4xx HTTP error code.  One
idea for a config parameter name would be allowShardDuplicates, but
something better can probably be found.

Thanks,
Shawn
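
One way to confirm this diagnosis is to pull the uniqueKey values from each shard
separately and look for overlaps; a minimal sketch (shard URLs and the "id" field name
are assumptions, and it simply pages through each shard, so it is only suitable for a
test-sized index):

import java.util.HashSet;
import java.util.Set;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class FindCrossShardDuplicates {
    public static void main(String[] args) throws Exception {
        String[] shardUrls = {            // hypothetical shard URLs
            "http://localhost:11080/twitter/data",
            "http://localhost:12080/twitter/data",
            "http://localhost:13080/twitter/data"
        };

        Set<String> seen = new HashSet<String>();
        for (String url : shardUrls) {
            HttpSolrServer shard = new HttpSolrServer(url);
            int start = 0;
            while (true) {
                SolrQuery q = new SolrQuery("*:*");
                q.setFields("id");        // uniqueKey field, assumed to be "id"
                q.setStart(start);
                q.setRows(1000);
                SolrDocumentList page = shard.query(q).getResults();
                for (SolrDocument doc : page) {
                    String id = String.valueOf(doc.getFieldValue("id"));
                    if (!seen.add(id)) {
                        System.out.println("uniqueKey present in more than one shard: " + id);
                    }
                }
                start += page.size();
                if (page.size() == 0 || start >= page.getNumFound()) {
                    break;
                }
            }
            shard.shutdown();
        }
    }
}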



Re: SolrCloud: Very strange behavior when doing atomic updates or documents reindexation.

2012-11-25 Thread joe.cohe...@gmail.com

I'm having a similar problem.

Did you by any chance try the suggestion here:
https://issues.apache.org/jira/browse/SOLR-4080?focusedCommentId=13498055&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13498055

?



Rakudten wrote
 More info:
 
 - I'm trying to update the document by re-indexing the whole document again.
 I first retrieve the document by querying for its id, then delete it by its
 id, and re-index it including the new changes.
 - At the same time there are other index writing operations.
 
 *RESULT*: in most cases the document wasn't updated. Bad news... it smells
 like a critical bug.
 
 Regards,
 
 
 - Luis Cappa.
 
 2012/11/22 Luis Cappa Banda lt;

 luiscappa@

 gt;
 
 For more details, my indexation App is:

 1. Multithreaded.
 2. NRT indexation.
 3. It's a Web App with a REST API. It receives asynchronous requests that
 produce those atomic updates / document reindexations I mentioned before.

 I'm pretty sure that the wrong behavior is related to CloudSolrServer
 and to the fact that you may be trying to modify the index while another
 index update is still in progress.

 Regards,


 - Luis Cappa.


 2012/11/22 Luis Cappa Banda lt;

 luiscappa@

 gt;

 Hello!

 I'm using a simple test configuration with nShards=1 without any
 replica.
 SolrCloudServer is supposed to forward those index/update
 operations properly, isn't it? I tested with a complete document reindexation, not
 atomic updates, using the official LBHttpSolrServer, not my custom
 BinaryLBHttpSolrServer, and it doesn't work. I think it is not just a bug
 related to atomic updates via CloudSolrServer but a general bug when
 an index changes frequently with reindexations/updates.

 Regards,

 - Luis Cappa.


 2012/11/22 Sami Siren lt;

 ssiren@

 gt;

 It might even depend on the cluster layout! Let's say you have 2 shards
 (no
 replicas) if the doc belongs to the node you send it to so that it does
 not
 get forwarded to another node then the update should work and in case
 where
 the doc gets forwarded to another node the problem occurs. With
 replicas
 it
 could appear even more strange: the leader might have the doc right and
 the
 replica not.

 I only briefly looked at the bits that deal with this so perhaps
 there's
 something more involved.


 On Thu, Nov 22, 2012 at 8:29 PM, Luis Cappa Banda lt;

 luiscappa@

 gt; wrote:

  Hi, Sami!
 
  But isn´t strange that some documents were updated (atomic updates)
  correctly and other ones not? Can´t it be a more serious problem like
 some
  kind of index writer lock, or whatever?
 
  Regards,
 
  - Luis Cappa.
 
  2012/11/22 Sami Siren lt;

 ssiren@

 gt;
 
   I think the problem is that even though you were able to work
 around
 the
   bug in the client solr still uses the xml format internally so the
 atomic
   update (with multivalued field) fails later down the stack. The bug
 you
   filed needs to be fixed to get the problem solved.
  
  
   On Thu, Nov 22, 2012 at 8:19 PM, Luis Cappa Banda 
 

 luiscappa@

   wrote:
  
Hello everyone.
   
I´ve starting to seriously worry about with SolrCloud due an
 strange
behavior that I have detected. The situation is this the
 following:
   
*1.* SolrCloud with one shard and two Solr instances.
*2.* Indexation via SolrJ with CloudServer and a custom
BinaryLBHttpSolrServer that uses BinaryRequestWriter to execute
  correctly
atomic updates. Check
JIRA-4080
   
  
 
 https://issues.apache.org/jira/browse/SOLR-4080?focusedCommentId=13498055page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13498055

*3.* An asynchronous proccess updates partially some document
 fields.
   After
that operation I automatically execute a commit, so the index
 must
 be
reloaded.
   
What I have checked is that both using atomic updates or complete
   document
reindexations* aleatory documents are not updated* *even if I saw
   debugging
how the add() and commit() operations were executed correctly*
 *and
   without
errors*. Has anyone experienced a similar behavior? Is it posible
 that
  if
an index update operation didn´t finish and CloudSolrServer
 receives a
   new
one this second update operation doesn´t complete?
   
Thank you in advance.
   
Regards,
   
--
   
- Luis Cappa
   
  
 
 
 
  --
 
  - Luis Cappa
 




 --

 - Luis Cappa




 --

 - Luis Cappa


 
 
 -- 
 
 - Luis Cappa





--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-Very-strange-behavior-when-doing-atomic-updates-or-documents-reindexation-tp4021899p4022250.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud: Very strange behavior when doing atomic updates or documents reindexation.

2012-11-25 Thread Luis Cappa Banda
Yes! I opened that issue, :-P Next week I'll test with the latest trunk
artifacts and check if the problem still happens.

Regards,

- Luis Cappa.
On 25/11/2012 13:35, joe.cohe...@gmail.com wrote:


 I'm having a smiliar problem.

 Did you by any chance try the suggestion here:

 https://issues.apache.org/jira/browse/SOLR-4080?focusedCommentId=13498055page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13498055

 ?



 Rakudten wrote
  More info:
 
  -  I´m trying to update the document re-indexing the whole document
 again.
  I first retrieve the document querying by it´s id, then delete it by it´s
  id, and re-index including the new changes.
  - At the same time there are other index writing operations.
 
  *RESULT*: in most cases the document wasn´t updated. Bad news... it
 smells
  like a critical bug.
 
  Regards,
 
 
  - Luis Cappa.
 
  2012/11/22 Luis Cappa Banda lt;

  luiscappa@

  gt;
 
  For more details, my indexation App is:
 
  1. Multithreaded.
  2. NRT indexation.
  3. It´s a Web App with a REST API. It receives asynchronous requests
 that
  produces those atomic updates / document reindexations I told before.
 
  I´m pretty sure that the wrong behavior is related with CloudSolrServer
  and with the fact that maybe you are trying to modify the index while an
  index update is in course.
 
  Regards,
 
 
  - Luis Cappa.
 
 
  2012/11/22 Luis Cappa Banda lt;

  luiscappa@

  gt;
 
  Hello!
 
  I´m using a simple test configuration with nShards=1 without any
  replica.
  SolrCloudServer is suposed to forward properly those index/update
  operations, isn´t it? I test with a complete document reindexation, not
  atomic updates, using the official LBHttpSolrServer, not my custom
  BinaryLBHttpSolrServer, and it dosn´t work. I think is not just a bug
  related with atomic updates via CloudSolrServer but a general bug when
  an
  index changes with reindexations/updates frequently.
 
  Regards,
 
  - Luis Cappa.
 
 
  2012/11/22 Sami Siren lt;

  ssiren@

  gt;
 
  It might even depend on the cluster layout! Let's say you have 2
 shards
  (no
  replicas) if the doc belongs to the node you send it to so that it
 does
  not
  get forwarded to another node then the update should work and in case
  where
  the doc gets forwarded to another node the problem occurs. With
  replicas
  it
  could appear even more strange: the leader might have the doc right
 and
  the
  replica not.
 
  I only briefly looked at the bits that deal with this so perhaps
  there's
  something more involved.
 
 
  On Thu, Nov 22, 2012 at 8:29 PM, Luis Cappa Banda lt;

  luiscappa@

  gt; wrote:
 
   Hi, Sami!
  
   But isn´t strange that some documents were updated (atomic updates)
   correctly and other ones not? Can´t it be a more serious problem
 like
  some
   kind of index writer lock, or whatever?
  
   Regards,
  
   - Luis Cappa.
  
   2012/11/22 Sami Siren lt;

  ssiren@

  gt;
  
I think the problem is that even though you were able to work
  around
  the
bug in the client solr still uses the xml format internally so the
  atomic
update (with multivalued field) fails later down the stack. The
 bug
  you
filed needs to be fixed to get the problem solved.
   
   
On Thu, Nov 22, 2012 at 8:19 PM, Luis Cappa Banda 
 

  luiscappa@

wrote:
   
 Hello everyone.

 I´ve starting to seriously worry about with SolrCloud due an
  strange
 behavior that I have detected. The situation is this the
  following:

 *1.* SolrCloud with one shard and two Solr instances.
 *2.* Indexation via SolrJ with CloudServer and a custom
 BinaryLBHttpSolrServer that uses BinaryRequestWriter to execute
   correctly
 atomic updates. Check
 JIRA-4080

   
  
 
 https://issues.apache.org/jira/browse/SOLR-4080?focusedCommentId=13498055page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13498055
 
 *3.* An asynchronous proccess updates partially some document
  fields.
After
 that operation I automatically execute a commit, so the index
  must
  be
 reloaded.

 What I have checked is that both using atomic updates or
 complete
document
 reindexations* aleatory documents are not updated* *even if I
 saw
debugging
 how the add() and commit() operations were executed correctly*
  *and
without
 errors*. Has anyone experienced a similar behavior? Is it
 posible
  that
   if
 an index update operation didn´t finish and CloudSolrServer
  receives a
new
 one this second update operation doesn´t complete?

 Thank you in advance.

 Regards,

 --

 - Luis Cappa

   
  
  
  
   --
  
   - Luis Cappa
  
 
 
 
 
  --
 
  - Luis Cappa
 
 
 
 
  --
 
  - Luis Cappa
 
 
 
 
  --
 
  - Luis Cappa





 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/SolrCloud-Very-strange-behavior-when-doing-atomic-updates

SolrCloud: Very strange behavior when doing atomic updates or documents reindexation.

2012-11-22 Thread Luis Cappa Banda
Hello everyone.

I've started to seriously worry about SolrCloud due to a strange
behavior that I have detected. The situation is the following:

*1.* SolrCloud with one shard and two Solr instances.
*2.* Indexation via SolrJ with CloudServer and a custom
BinaryLBHttpSolrServer that uses BinaryRequestWriter to execute
atomic updates correctly. Check
JIRA-4080: https://issues.apache.org/jira/browse/SOLR-4080?focusedCommentId=13498055&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13498055
*3.* An asynchronous process partially updates some document fields. After
that operation I automatically execute a commit, so the index must be
reloaded.

What I have checked is that, both using atomic updates and complete document
reindexations, *random documents are not updated*, *even though I saw while
debugging that the add() and commit() operations were executed correctly* *and
without errors*. Has anyone experienced a similar behavior? Is it possible that if
an index update operation didn't finish and CloudSolrServer receives a new
one, this second update operation doesn't complete?

Thank you in advance.

Regards,

-- 

- Luis Cappa


Re: SolrCloud: Very strange behavior when doing atomic updates or documents reindexation.

2012-11-22 Thread Sami Siren
I think the problem is that even though you were able to work around the
bug in the client solr still uses the xml format internally so the atomic
update (with multivalued field) fails later down the stack. The bug you
filed needs to be fixed to get the problem solved.
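
For context, this is roughly what the atomic update under discussion looks like in
SolrJ (ZooKeeper address, collection and field names are hypothetical; the "add"
modifier targets a multiValued field, which is the case mentioned above):

import java.util.HashMap;
import java.util.Map;

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class AtomicUpdateExample {
    public static void main(String[] args) throws Exception {
        CloudSolrServer solr = new CloudSolrServer("localhost:2181"); // hypothetical ZooKeeper address
        solr.setDefaultCollection("collection1");                     // hypothetical collection name

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-42"); // existing document to update

        // "set" replaces a single-valued field, "add" appends to a multiValued field.
        Map<String, Object> setTitle = new HashMap<String, Object>();
        setTitle.put("set", "New title");
        doc.addField("title", setTitle);

        Map<String, Object> addTag = new HashMap<String, Object>();
        addTag.put("add", "new-tag");
        doc.addField("tags", addTag);

        solr.add(doc);
        solr.commit();
        solr.shutdown();
    }
}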


On Thu, Nov 22, 2012 at 8:19 PM, Luis Cappa Banda luisca...@gmail.comwrote:

 Hello everyone.

 I´ve starting to seriously worry about with SolrCloud due an strange
 behavior that I have detected. The situation is this the following:

 *1.* SolrCloud with one shard and two Solr instances.
 *2.* Indexation via SolrJ with CloudServer and a custom
 BinaryLBHttpSolrServer that uses BinaryRequestWriter to execute correctly
 atomic updates. Check
 JIRA-4080
 https://issues.apache.org/jira/browse/SOLR-4080?focusedCommentId=13498055page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13498055
 
 *3.* An asynchronous proccess updates partially some document fields. After
 that operation I automatically execute a commit, so the index must be
 reloaded.

 What I have checked is that both using atomic updates or complete document
 reindexations* aleatory documents are not updated* *even if I saw debugging
 how the add() and commit() operations were executed correctly* *and without
 errors*. Has anyone experienced a similar behavior? Is it posible that if
 an index update operation didn´t finish and CloudSolrServer receives a new
 one this second update operation doesn´t complete?

 Thank you in advance.

 Regards,

 --

 - Luis Cappa



Re: SolrCloud: Very strange behavior when doing atomic updates or documents reindexation.

2012-11-22 Thread Luis Cappa Banda
Hi, Sami!

But isn't it strange that some documents were updated (atomic updates)
correctly and others were not? Couldn't it be a more serious problem, like some
kind of index writer lock, or whatever?

Regards,

- Luis Cappa.

2012/11/22 Sami Siren ssi...@gmail.com

 I think the problem is that even though you were able to work around the
 bug in the client solr still uses the xml format internally so the atomic
 update (with multivalued field) fails later down the stack. The bug you
 filed needs to be fixed to get the problem solved.


 On Thu, Nov 22, 2012 at 8:19 PM, Luis Cappa Banda luisca...@gmail.com
 wrote:

  Hello everyone.
 
  I´ve starting to seriously worry about with SolrCloud due an strange
  behavior that I have detected. The situation is this the following:
 
  *1.* SolrCloud with one shard and two Solr instances.
  *2.* Indexation via SolrJ with CloudServer and a custom
  BinaryLBHttpSolrServer that uses BinaryRequestWriter to execute correctly
  atomic updates. Check
  JIRA-4080
 
 https://issues.apache.org/jira/browse/SOLR-4080?focusedCommentId=13498055page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13498055
  
  *3.* An asynchronous proccess updates partially some document fields.
 After
  that operation I automatically execute a commit, so the index must be
  reloaded.
 
  What I have checked is that both using atomic updates or complete
 document
  reindexations* aleatory documents are not updated* *even if I saw
 debugging
  how the add() and commit() operations were executed correctly* *and
 without
  errors*. Has anyone experienced a similar behavior? Is it posible that if
  an index update operation didn´t finish and CloudSolrServer receives a
 new
  one this second update operation doesn´t complete?
 
  Thank you in advance.
 
  Regards,
 
  --
 
  - Luis Cappa
 




-- 

- Luis Cappa


Re: SolrCloud: Very strange behavior when doing atomic updates or documents reindexation.

2012-11-22 Thread Sami Siren
It might even depend on the cluster layout! Let's say you have 2 shards (no
replicas): if the doc belongs to the node you send it to, so that it does not
get forwarded to another node, then the update should work; in the case where
the doc gets forwarded to another node, the problem occurs. With replicas it
could look even stranger: the leader might have the doc right and the
replica might not.

I only briefly looked at the bits that deal with this, so perhaps there's
something more involved.
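
One way to check this, assuming SolrJ 4.x, is to query each core directly with
distrib=false and compare what the leader and the replica return for the same
document; the core URLs and document id below are hypothetical:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ReplicaConsistencyCheck {
    public static void main(String[] args) throws Exception {
        String[] coreUrls = {
            "http://host1:8983/solr/collection1",   // leader (hypothetical)
            "http://host2:8983/solr/collection1"    // replica (hypothetical)
        };
        for (String url : coreUrls) {
            HttpSolrServer core = new HttpSolrServer(url);
            SolrQuery q = new SolrQuery("id:doc-42");
            q.set("distrib", "false");   // ask only this core, no fan-out
            QueryResponse rsp = core.query(q);
            System.out.println(url + " -> " + rsp.getResults().getNumFound() + " hit(s)");
            core.shutdown();
        }
    }
}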


On Thu, Nov 22, 2012 at 8:29 PM, Luis Cappa Banda luisca...@gmail.comwrote:

 Hi, Sami!

 But isn´t strange that some documents were updated (atomic updates)
 correctly and other ones not? Can´t it be a more serious problem like some
 kind of index writer lock, or whatever?

 Regards,

 - Luis Cappa.

 2012/11/22 Sami Siren ssi...@gmail.com

  I think the problem is that even though you were able to work around the
  bug in the client solr still uses the xml format internally so the atomic
  update (with multivalued field) fails later down the stack. The bug you
  filed needs to be fixed to get the problem solved.
 
 
  On Thu, Nov 22, 2012 at 8:19 PM, Luis Cappa Banda luisca...@gmail.com
  wrote:
 
   Hello everyone.
  
   I´ve starting to seriously worry about with SolrCloud due an strange
   behavior that I have detected. The situation is this the following:
  
   *1.* SolrCloud with one shard and two Solr instances.
   *2.* Indexation via SolrJ with CloudServer and a custom
   BinaryLBHttpSolrServer that uses BinaryRequestWriter to execute
 correctly
   atomic updates. Check
   JIRA-4080
  
 
 https://issues.apache.org/jira/browse/SOLR-4080?focusedCommentId=13498055page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13498055
   
   *3.* An asynchronous proccess updates partially some document fields.
  After
   that operation I automatically execute a commit, so the index must be
   reloaded.
  
   What I have checked is that both using atomic updates or complete
  document
   reindexations* aleatory documents are not updated* *even if I saw
  debugging
   how the add() and commit() operations were executed correctly* *and
  without
   errors*. Has anyone experienced a similar behavior? Is it posible that
 if
   an index update operation didn´t finish and CloudSolrServer receives a
  new
   one this second update operation doesn´t complete?
  
   Thank you in advance.
  
   Regards,
  
   --
  
   - Luis Cappa
  
 



 --

 - Luis Cappa



Re: SolrCloud: Very strange behavior when doing atomic updates or documents reindexation.

2012-11-22 Thread Luis Cappa Banda
Hello!

I'm using a simple test configuration with nShards=1 and no replicas.
CloudSolrServer is supposed to forward those index/update operations
properly, isn't it? I tested with a complete document reindexation, not
atomic updates, using the official LBHttpSolrServer rather than my custom
BinaryLBHttpSolrServer, and it doesn't work. I think this is not just a bug
related to atomic updates via CloudSolrServer, but a general bug when an
index changes frequently through reindexations/updates.

Regards,

- Luis Cappa.


2012/11/22 Sami Siren ssi...@gmail.com

 It might even depend on the cluster layout! Let's say you have 2 shards (no
 replicas) if the doc belongs to the node you send it to so that it does not
 get forwarded to another node then the update should work and in case where
 the doc gets forwarded to another node the problem occurs. With replicas it
 could appear even more strange: the leader might have the doc right and the
 replica not.

 I only briefly looked at the bits that deal with this so perhaps there's
 something more involved.


 On Thu, Nov 22, 2012 at 8:29 PM, Luis Cappa Banda luisca...@gmail.com
 wrote:

  Hi, Sami!
 
  But isn´t strange that some documents were updated (atomic updates)
  correctly and other ones not? Can´t it be a more serious problem like
 some
  kind of index writer lock, or whatever?
 
  Regards,
 
  - Luis Cappa.
 
  2012/11/22 Sami Siren ssi...@gmail.com
 
   I think the problem is that even though you were able to work around
 the
   bug in the client solr still uses the xml format internally so the
 atomic
   update (with multivalued field) fails later down the stack. The bug you
   filed needs to be fixed to get the problem solved.
  
  
   On Thu, Nov 22, 2012 at 8:19 PM, Luis Cappa Banda luisca...@gmail.com
   wrote:
  
Hello everyone.
   
I´ve starting to seriously worry about with SolrCloud due an strange
behavior that I have detected. The situation is this the following:
   
*1.* SolrCloud with one shard and two Solr instances.
*2.* Indexation via SolrJ with CloudServer and a custom
BinaryLBHttpSolrServer that uses BinaryRequestWriter to execute
  correctly
atomic updates. Check
JIRA-4080
   
  
 
 https://issues.apache.org/jira/browse/SOLR-4080?focusedCommentId=13498055page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13498055

*3.* An asynchronous proccess updates partially some document fields.
   After
that operation I automatically execute a commit, so the index must be
reloaded.
   
What I have checked is that both using atomic updates or complete
   document
reindexations* aleatory documents are not updated* *even if I saw
   debugging
how the add() and commit() operations were executed correctly* *and
   without
errors*. Has anyone experienced a similar behavior? Is it posible
 that
  if
an index update operation didn´t finish and CloudSolrServer receives
 a
   new
one this second update operation doesn´t complete?
   
Thank you in advance.
   
Regards,
   
--
   
- Luis Cappa
   
  
 
 
 
  --
 
  - Luis Cappa
 




-- 

- Luis Cappa


Re: SolrCloud: Very strange behavior when doing atomic updates or documents reindexation.

2012-11-22 Thread Luis Cappa Banda
For more details, my indexing app is:

1. Multithreaded.
2. NRT indexing.
3. A web app with a REST API. It receives asynchronous requests that
produce the atomic updates / document reindexations I mentioned before.

I'm pretty sure that the wrong behavior is related to CloudSolrServer and
to the fact that the index may be being modified while another index
update is still in progress.

Regards,


- Luis Cappa.
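
For a multithreaded NRT indexer like this, one commonly suggested pattern is to
share a single CloudSolrServer instance across threads and let commitWithin (or
autoSoftCommit in solrconfig.xml) control visibility, rather than issuing an
explicit commit from every request thread. A sketch only, assuming SolrJ 4.x;
the names and the 5-second window are made up:

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class SharedIndexer {
    // SolrJ server objects are meant to be created once and shared across threads.
    private final CloudSolrServer server;

    public SharedIndexer(String zkHost, String collection) throws Exception {
        this.server = new CloudSolrServer(zkHost);
        this.server.setDefaultCollection(collection);
    }

    // Called concurrently by the REST layer.
    public void index(SolrInputDocument doc) throws Exception {
        server.add(doc, 5000);   // commitWithin 5s instead of commit() per request
    }
}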


2012/11/22 Luis Cappa Banda luisca...@gmail.com

 Hello!

 I´m using a simple test configuration with nShards=1 without any replica.
 SolrCloudServer is suposed to forward properly those index/update
 operations, isn´t it? I test with a complete document reindexation, not
 atomic updates, using the official LBHttpSolrServer, not my custom
 BinaryLBHttpSolrServer, and it dosn´t work. I think is not just a bug
 related with atomic updates via CloudSolrServer but a general bug when an
 index changes with reindexations/updates frequently.

 Regards,

 - Luis Cappa.


 2012/11/22 Sami Siren ssi...@gmail.com

 It might even depend on the cluster layout! Let's say you have 2 shards
 (no
 replicas) if the doc belongs to the node you send it to so that it does
 not
 get forwarded to another node then the update should work and in case
 where
 the doc gets forwarded to another node the problem occurs. With replicas
 it
 could appear even more strange: the leader might have the doc right and
 the
 replica not.

 I only briefly looked at the bits that deal with this so perhaps there's
 something more involved.


 On Thu, Nov 22, 2012 at 8:29 PM, Luis Cappa Banda luisca...@gmail.com
 wrote:

  Hi, Sami!
 
  But isn´t strange that some documents were updated (atomic updates)
  correctly and other ones not? Can´t it be a more serious problem like
 some
  kind of index writer lock, or whatever?
 
  Regards,
 
  - Luis Cappa.
 
  2012/11/22 Sami Siren ssi...@gmail.com
 
   I think the problem is that even though you were able to work around
 the
   bug in the client solr still uses the xml format internally so the
 atomic
   update (with multivalued field) fails later down the stack. The bug
 you
   filed needs to be fixed to get the problem solved.
  
  
   On Thu, Nov 22, 2012 at 8:19 PM, Luis Cappa Banda 
 luisca...@gmail.com
   wrote:
  
Hello everyone.
   
I´ve starting to seriously worry about with SolrCloud due an strange
behavior that I have detected. The situation is this the following:
   
*1.* SolrCloud with one shard and two Solr instances.
*2.* Indexation via SolrJ with CloudServer and a custom
BinaryLBHttpSolrServer that uses BinaryRequestWriter to execute
  correctly
atomic updates. Check
JIRA-4080
   
  
 
 https://issues.apache.org/jira/browse/SOLR-4080?focusedCommentId=13498055page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13498055

*3.* An asynchronous proccess updates partially some document
 fields.
   After
that operation I automatically execute a commit, so the index must
 be
reloaded.
   
What I have checked is that both using atomic updates or complete
   document
reindexations* aleatory documents are not updated* *even if I saw
   debugging
how the add() and commit() operations were executed correctly* *and
   without
errors*. Has anyone experienced a similar behavior? Is it posible
 that
  if
an index update operation didn´t finish and CloudSolrServer
 receives a
   new
one this second update operation doesn´t complete?
   
Thank you in advance.
   
Regards,
   
--
   
- Luis Cappa
   
  
 
 
 
  --
 
  - Luis Cappa
 




 --

 - Luis Cappa




-- 

- Luis Cappa


Re: SolrCloud: Very strange behavior when doing atomic updates or documents reindexation.

2012-11-22 Thread Luis Cappa Banda
More info:

- I'm trying to update the document by re-indexing the whole document again.
I first retrieve the document by querying on its id, then delete it by its
id, and re-index it including the new changes.
- At the same time there are other index writing operations.

*RESULT*: in most cases the document wasn't updated. Bad news... it smells
like a critical bug.

Regards,


- Luis Cappa.
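
For reference, the retrieve / delete / re-add cycle described above looks
roughly like this in SolrJ 4.x; the uniqueKey is assumed to be "id" and the
modified field is hypothetical:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class ReindexById {
    public static void reindex(CloudSolrServer server, String id) throws Exception {
        // 1. Retrieve the current version of the document by its id.
        SolrDocument old = server.query(new SolrQuery("id:" + id)).getResults().get(0);

        // 2. Copy the stored fields and apply the change.
        SolrInputDocument fresh = new SolrInputDocument();
        for (String field : old.getFieldNames()) {
            if ("_version_".equals(field)) continue;   // avoid optimistic-locking conflicts
            fresh.addField(field, old.getFieldValue(field));
        }
        fresh.setField("status", "updated");   // the modified field (hypothetical)

        // 3. Delete by id, re-add, and commit.
        server.deleteById(id);
        server.add(fresh);
        server.commit();
    }
}

Note that an add with the same uniqueKey already overwrites the previous
version of the document, so the explicit deleteById() step is not strictly
required; it is kept here only because it mirrors the procedure described
above.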

2012/11/22 Luis Cappa Banda luisca...@gmail.com

 For more details, my indexation App is:

 1. Multithreaded.
 2. NRT indexation.
 3. It´s a Web App with a REST API. It receives asynchronous requests that
 produces those atomic updates / document reindexations I told before.

 I´m pretty sure that the wrong behavior is related with CloudSolrServer
 and with the fact that maybe you are trying to modify the index while an
 index update is in course.

 Regards,


 - Luis Cappa.


 2012/11/22 Luis Cappa Banda luisca...@gmail.com

 Hello!

 I´m using a simple test configuration with nShards=1 without any replica.
 SolrCloudServer is suposed to forward properly those index/update
 operations, isn´t it? I test with a complete document reindexation, not
 atomic updates, using the official LBHttpSolrServer, not my custom
 BinaryLBHttpSolrServer, and it dosn´t work. I think is not just a bug
 related with atomic updates via CloudSolrServer but a general bug when an
 index changes with reindexations/updates frequently.

 Regards,

 - Luis Cappa.


 2012/11/22 Sami Siren ssi...@gmail.com

 It might even depend on the cluster layout! Let's say you have 2 shards
 (no
 replicas) if the doc belongs to the node you send it to so that it does
 not
 get forwarded to another node then the update should work and in case
 where
 the doc gets forwarded to another node the problem occurs. With replicas
 it
 could appear even more strange: the leader might have the doc right and
 the
 replica not.

 I only briefly looked at the bits that deal with this so perhaps there's
 something more involved.


 On Thu, Nov 22, 2012 at 8:29 PM, Luis Cappa Banda luisca...@gmail.com
 wrote:

  Hi, Sami!
 
  But isn´t strange that some documents were updated (atomic updates)
  correctly and other ones not? Can´t it be a more serious problem like
 some
  kind of index writer lock, or whatever?
 
  Regards,
 
  - Luis Cappa.
 
  2012/11/22 Sami Siren ssi...@gmail.com
 
   I think the problem is that even though you were able to work around
 the
   bug in the client solr still uses the xml format internally so the
 atomic
   update (with multivalued field) fails later down the stack. The bug
 you
   filed needs to be fixed to get the problem solved.
  
  
   On Thu, Nov 22, 2012 at 8:19 PM, Luis Cappa Banda 
 luisca...@gmail.com
   wrote:
  
Hello everyone.
   
I´ve starting to seriously worry about with SolrCloud due an
 strange
behavior that I have detected. The situation is this the following:
   
*1.* SolrCloud with one shard and two Solr instances.
*2.* Indexation via SolrJ with CloudServer and a custom
BinaryLBHttpSolrServer that uses BinaryRequestWriter to execute
  correctly
atomic updates. Check
JIRA-4080
   
  
 
 https://issues.apache.org/jira/browse/SOLR-4080?focusedCommentId=13498055page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13498055

*3.* An asynchronous proccess updates partially some document
 fields.
   After
that operation I automatically execute a commit, so the index must
 be
reloaded.
   
What I have checked is that both using atomic updates or complete
   document
reindexations* aleatory documents are not updated* *even if I saw
   debugging
how the add() and commit() operations were executed correctly* *and
   without
errors*. Has anyone experienced a similar behavior? Is it posible
 that
  if
an index update operation didn´t finish and CloudSolrServer
 receives a
   new
one this second update operation doesn´t complete?
   
Thank you in advance.
   
Regards,
   
--
   
- Luis Cappa
   
  
 
 
 
  --
 
  - Luis Cappa
 




 --

 - Luis Cappa




 --

 - Luis Cappa




-- 

- Luis Cappa


Field names w/ leading digits cause strange behavior

2012-04-24 Thread bleakley
When specifying a field name that starts with a digit (or digits) in the fl
parameter, Solr returns both the field name and the field value as those
digits. For example, using nightly build
apache-solr-4.0-2012-04-24_08-27-47 I run:

java -jar start.jar
and
java -jar post.jar solr.xml monitor.xml

If I then add a field to the field list that starts with a digit (
localhost:8983/solr/select?q=*:*&fl=24 ) the results look like:
...
<doc>
  <long name="24">24</long>
</doc>
...

If I try fl=24_7 it looks like everything after the underscore is truncated:
...
<doc>
  <long name="24">24</long>
</doc>
...

And if I try fl=3test it looks like everything after the last digit is
truncated:
...
<doc>
  <long name="3">3</long>
</doc>
...

If I have an actual value for that field (say I've indexed 24_7 to be true)
I get back that value as well as the behavior above:
...
<doc>
  <bool name="24_7">true</bool>
  <long name="24">24</long>
</doc>
...

Is it ok to have fields that start with digits? If so, is there a different
way to specify them using the fl parameter? Thanks!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Field-names-w-leading-digits-cause-strange-behavior-tp3936354p3936354.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Field names w/ leading digits cause strange behavior

2012-04-24 Thread Erick Erickson
Hmmm, this does NOT happen on 3.6, and it DOES happen on
trunk. Sure sounds like a JIRA to me, would you mind raising one?

I can't imagine this is desired behavior, it's just weird.

Thanks for pointing this out!
Erick

On Tue, Apr 24, 2012 at 3:38 PM, bleakley bleak...@factual.com wrote:
 When specifying a field name that starts with a digit (or digits) in the fl
 parameter solr returns both the field name and field value as the those
 digits. For example, using nightly build
 apache-solr-4.0-2012-04-24_08-27-47 I run:

 java -jar start.jar
 and
 java -jar post.jar solr.xml monitor.xml

 If I then add a field to the field list that starts with a digit (
 localhost:8983/solr/select?q=*:*fl=24 ) the results look like:
 ...
 doc
 long name=2424/long
 /doc
 ...

 if I try fl=24_7 it looks like everything after the underscore is truncated
 ...
 doc
 long name=2424/long
 /doc
 ...

 and if I try fl=3test it looks like everything after the last digit is
 truncated
 ...
 doc
 long name=33/long
 /doc
 ...

 If I have an actual value for that field (say I've indexed 24_7 to be true
 ) I get back that value as well as the behavior above.
 ...
 doc
 bool name=24_7true/bool
 long name=2424/long
 /doc
 ...

 Is it ok the have fields that start with digits? If so, is there a different
 way to specify them using the fl parameter? Thanks!

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Field-names-w-leading-digits-cause-strange-behavior-tp3936354p3936354.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Field names w/ leading digits cause strange behavior

2012-04-24 Thread bleakley
Thank you for verifying the issue. I've created a ticket at
https://issues.apache.org/jira/browse/SOLR-3407

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Field-names-w-leading-digits-cause-strange-behavior-tp3936354p3936599.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Strange behavior with search on empty string and NOT

2012-04-09 Thread Chris Hostetter

: Would it be a good idea to have Solr throw syntax error if an empty string
: query occurs? 

Erick's explanation wasn't very precise ...

Solr doesn't have any special handling of empty strings, but what you
are searching for *might* be a totally valid query based on how the field
type is configured (ie: StrField, or KeywordTokenizer, etc...).

In your case, you seem to be searching for "" in a field whose
analyzer produces no tokens for "", so it falls out of the query.


-Hoss


Re: Strange behavior with search on empty string and NOT

2012-03-13 Thread Lan
Would it be a good idea to have Solr throw a syntax error if an empty-string
query occurs?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Strange-behavior-with-search-on-empty-string-and-NOT-tp3818023p3823572.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Strange behavior with search on empty string and NOT

2012-03-12 Thread Erick Erickson
Because Lucene query syntax is not a strict Boolean logic system.
There's a good explanation here:
http://www.lucidimagination.com/blog/2011/12/28/why-not-and-or-and-not/

Adding debugQuery=on to your search is your friend <g>. You'll see
that your response (at least on 3.5, going at /solr/select) returns
this as the parsed query:

<str name="parsedquery">-name:foobar</str>

Solr really doesn't have semantics for empty strings (or NULL for
that matter), so the empty clause just gets dropped out.

Best
Erick
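
One way to make the intent explicit, whatever happens to the dropped clause, is
to anchor the negation to a match-all query. A small SolrJ sketch, assuming a
3.6+/4.x client; the URL and field name are hypothetical:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class NegativeQuerySketch {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");

        // Everything except documents whose name field matches FOOBAR.
        SolrQuery q = new SolrQuery("*:* AND NOT name:FOOBAR");
        q.set("debugQuery", "on");   // inspect the parsed query, as suggested above

        System.out.println(solr.query(q).getResults().getNumFound() + " matches");
        solr.shutdown();
    }
}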

On Sun, Mar 11, 2012 at 11:36 PM, Lan dung@gmail.com wrote:
 I am curious why solr results are inconsistent for the query below for an
 empty string search on a TextField.

 q=name: returns 0 results
 q=name: AND NOT name:FOOBAR return all results in the solr index. Should
 it should not return 0 results too?

 Here is the debugQuery.

 response
 lst name=responseHeader
 int name=status0/int
 int name=QTime1/int
 lst name=params
 str name=debugQueryon/str
 str name=indenton/str
 str name=start0/str
 str name=qname: AND NOT name:BLAH232282/str
 str name=rows0/str
 str name=version2.2/str
 /lst
 /lst
 result name=response numFound=3790790 start=0/
 lst name=debug
 str name=rawquerystringname: AND NOT name:BLAH232282/str
 str name=querystringname: AND NOT name:BLAH232282/str
 str name=parsedquery-PhraseQuery(name:blah 232282)/str
 str name=parsedquery_toString-name:blah 232282/str
 lst name=explain/
 str name=QParserLuceneQParser/str
 lst name=timing
 double name=time1.0/double
 lst name=prepare
 double name=time1.0/double
 lst name=org.apache.solr.handler.component.QueryComponent
 double name=time1.0/double
 /lst
 lst name=org.apache.solr.handler.component.FacetComponent
 double name=time0.0/double
 /lst
 lst name=org.apache.solr.handler.component.MoreLikeThisComponent
 double name=time0.0/double
 /lst
 lst name=org.apache.solr.handler.component.HighlightComponent
 double name=time0.0/double
 /lst
 lst name=org.apache.solr.handler.component.StatsComponent
 double name=time0.0/double
 /lst
 lst name=org.apache.solr.handler.component.DebugComponent
 double name=time0.0/double
 /lst
 /lst
 lst name=process
 double name=time0.0/double
 lst name=org.apache.solr.handler.component.QueryComponent
 double name=time0.0/double
 /lst
 lst name=org.apache.solr.handler.component.FacetComponent
 double name=time0.0/double
 /lst
 lst name=org.apache.solr.handler.component.MoreLikeThisComponent
 double name=time0.0/double
 /lst
 lst name=org.apache.solr.handler.component.HighlightComponent
 double name=time0.0/double
 /lst
 lst name=org.apache.solr.handler.component.StatsComponent
 double name=time0.0/double
 /lst
 lst name=org.apache.solr.handler.component.DebugComponent
 double name=time0.0/double
 /lst
 /lst
 /lst
 /lst
 /response


 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Strange-behavior-with-search-on-empty-string-and-NOT-tp3818023p3818023.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Strange behavior with search on empty string and NOT

2012-03-11 Thread Lan
I am curious why Solr results are inconsistent for the queries below, for an
empty-string search on a TextField.

q=name:"" returns 0 results
q=name:"" AND NOT name:FOOBAR returns all results in the Solr index. Shouldn't
it return 0 results too?

Here is the debugQuery.

response
lst name=responseHeader
int name=status0/int
int name=QTime1/int
lst name=params
str name=debugQueryon/str
str name=indenton/str
str name=start0/str
str name=qname: AND NOT name:BLAH232282/str
str name=rows0/str
str name=version2.2/str
/lst
/lst
result name=response numFound=3790790 start=0/
lst name=debug
str name=rawquerystringname: AND NOT name:BLAH232282/str
str name=querystringname: AND NOT name:BLAH232282/str
str name=parsedquery-PhraseQuery(name:blah 232282)/str
str name=parsedquery_toString-name:blah 232282/str
lst name=explain/
str name=QParserLuceneQParser/str
lst name=timing
double name=time1.0/double
lst name=prepare
double name=time1.0/double
lst name=org.apache.solr.handler.component.QueryComponent
double name=time1.0/double
/lst
lst name=org.apache.solr.handler.component.FacetComponent
double name=time0.0/double
/lst
lst name=org.apache.solr.handler.component.MoreLikeThisComponent
double name=time0.0/double
/lst
lst name=org.apache.solr.handler.component.HighlightComponent
double name=time0.0/double
/lst
lst name=org.apache.solr.handler.component.StatsComponent
double name=time0.0/double
/lst
lst name=org.apache.solr.handler.component.DebugComponent
double name=time0.0/double
/lst
/lst
lst name=process
double name=time0.0/double
lst name=org.apache.solr.handler.component.QueryComponent
double name=time0.0/double
/lst
lst name=org.apache.solr.handler.component.FacetComponent
double name=time0.0/double
/lst
lst name=org.apache.solr.handler.component.MoreLikeThisComponent
double name=time0.0/double
/lst
lst name=org.apache.solr.handler.component.HighlightComponent
double name=time0.0/double
/lst
lst name=org.apache.solr.handler.component.StatsComponent
double name=time0.0/double
/lst
lst name=org.apache.solr.handler.component.DebugComponent
double name=time0.0/double
/lst
/lst
/lst
/lst
/response


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Strange-behavior-with-search-on-empty-string-and-NOT-tp3818023p3818023.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: strange behavior of scores and term proximity use

2011-11-25 Thread Erick Erickson
You might try with a less fraught search phrase; "to be or not to be"
is a classic query that may be all stop words.

Otherwise, I'm clueless.

On Wed, Nov 23, 2011 at 3:15 PM, Ariel Zerbib ariel.zer...@gmail.com wrote:
 I tested with the version 4.0-2011-11-04_09-29-42.

 Ariel


 2011/11/17 Erick Erickson erickerick...@gmail.com

 Hmmm, I'm not seeing similar behavior on a trunk from today, when did
 you get your copy?

 Erick

 On Wed, Nov 16, 2011 at 2:06 PM, Ariel Zerbib ariel.zer...@gmail.com
 wrote:
  Hi,
 
  For this term proximity query: ab_main_title_l0:to be or not to be~1000
 
 
 http://localhost:/solr/select?q=ab_main_title_l0%3A%22og54ct8n+to+be+or+not+to+be+5w8ojsx2%22~1000sort=score+descstart=0rows=3fl=ab_main_title_l0%2Cscore%2CiddebugQuery=truehttp://localhost:/solr/select?q=ab_main_title_l0%3A%22og54ct8n+to+be+or+not+to+be+5w8ojsx2%22%7E1000sort=score+descstart=0rows=3fl=ab_main_title_l0%2Cscore%2CiddebugQuery=true
 
  The third first results are the following one:
 
  ?xml version=1.0 encoding=UTF-8?
  response
  lst name=responseHeader
   int name=status0/int
   int name=QTime5/int
  /lst
  result name=response numFound=318 start=0 maxScore=3.0814114
   doc
     long name=id2315190010001021/long
     arr name=ab_main_title_l0
       strog54ct8n To be or not to be a Jew. 5w8ojsx2/str
     /arr
     float name=score3.0814114/float/doc
   doc
     long name=id2313006480001021/long
     arr name=ab_main_title_l0
       strog54ct8n To be or not to be 5w8ojsx2/str
     /arr
     float name=score3.0814114/float/doc
   doc
     long name=id2356410250001021/long
     arr name=ab_main_title_l0
       strog54ct8n Rumspringa : to be or not to be Amish / 5w8ojsx2/str
     /arr
     float name=score3.0814114/float/doc
  /result
  lst name=debug
   str name=rawquerystringab_main_title_l0:og54ct8n to be or not to be
  5w8ojsx2~1000/str
   str name=querystringab_main_title_l0:og54ct8n to be or not to be
  5w8ojsx2~1000/str
   str name=parsedqueryPhraseQuery(ab_main_title_l0:og54ct8n to be or
  not to be 5w8ojsx2~1000)/str
   str name=parsedquery_toStringab_main_title_l0:og54ct8n to be or not
  to be 5w8ojsx2~1000/str
   lst name=explain
     str name=2315190010001021
  5.337161 = (MATCH) weight(ab_main_title_l0:og54ct8n to be or not to be
  5w8ojsx2~1000 in 378403) [DefaultSimilarity], result of:
   5.337161 = fieldWeight in 378403, product of:
     0.57735026 = tf(freq=0.3334), with freq of:
       0.3334 = phraseFreq=0.3334
     29.581549 = idf(), sum of:
       1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
       3.0405464 = idf(docFreq=429046, maxDocs=3301436)
       5.3583193 = idf(docFreq=42257, maxDocs=3301436)
       4.3826413 = idf(docFreq=112108, maxDocs=3301436)
       6.3982043 = idf(docFreq=14937, maxDocs=3301436)
       3.0405464 = idf(docFreq=429046, maxDocs=3301436)
       5.3583193 = idf(docFreq=42257, maxDocs=3301436)
       1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
     0.3125 = fieldNorm(doc=378403)
  /str
     str name=2313006480001021
  9.244234 = (MATCH) weight(ab_main_title_l0:og54ct8n to be or not to be
  5w8ojsx2~1000 in 482807) [DefaultSimilarity], result of:
   9.244234 = fieldWeight in 482807, product of:
     1.0 = tf(freq=1.0), with freq of:
       1.0 = phraseFreq=1.0
     29.581549 = idf(), sum of:
       1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
       3.0405464 = idf(docFreq=429046, maxDocs=3301436)
       5.3583193 = idf(docFreq=42257, maxDocs=3301436)
       4.3826413 = idf(docFreq=112108, maxDocs=3301436)
       6.3982043 = idf(docFreq=14937, maxDocs=3301436)
       3.0405464 = idf(docFreq=429046, maxDocs=3301436)
       5.3583193 = idf(docFreq=42257, maxDocs=3301436)
       1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
     0.3125 = fieldNorm(doc=482807)
  /str
     str name=2356410250001021
  5.337161 = (MATCH) weight(ab_main_title_l0:og54ct8n to be or not to be
  5w8ojsx2~1000 in 1317563) [DefaultSimilarity], result of:
   5.337161 = fieldWeight in 1317563, product of:
     0.57735026 = tf(freq=0.3334), with freq of:
       0.3334 = phraseFreq=0.3334
     29.581549 = idf(), sum of:
       1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
       3.0405464 = idf(docFreq=429046, maxDocs=3301436)
       5.3583193 = idf(docFreq=42257, maxDocs=3301436)
       4.3826413 = idf(docFreq=112108, maxDocs=3301436)
       6.3982043 = idf(docFreq=14937, maxDocs=3301436)
       3.0405464 = idf(docFreq=429046, maxDocs=3301436)
       5.3583193 = idf(docFreq=42257, maxDocs=3301436)
       1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
     0.3125 = fieldNorm(doc=1317563)
  /str
  /response
 
  The used version is a 4.0 October snapshot.
 
  I have 2 questions about the result:
  - Why debug print and scores in result are different?
  - What is the expected behavior of this kind of term proximity query?
           - The debug scores seem to be well ordered but the result scores
  seem to be wrong.
 
 
  Thanks,
  Ariel
 



Re: strange behavior of scores and term proximity use

2011-11-23 Thread Ariel Zerbib
I tested with the version 4.0-2011-11-04_09-29-42.

Ariel


2011/11/17 Erick Erickson erickerick...@gmail.com

 Hmmm, I'm not seeing similar behavior on a trunk from today, when did
 you get your copy?

 Erick

 On Wed, Nov 16, 2011 at 2:06 PM, Ariel Zerbib ariel.zer...@gmail.com
 wrote:
  Hi,
 
  For this term proximity query: ab_main_title_l0:to be or not to be~1000
 
 
 http://localhost:/solr/select?q=ab_main_title_l0%3A%22og54ct8n+to+be+or+not+to+be+5w8ojsx2%22~1000sort=score+descstart=0rows=3fl=ab_main_title_l0%2Cscore%2CiddebugQuery=truehttp://localhost:/solr/select?q=ab_main_title_l0%3A%22og54ct8n+to+be+or+not+to+be+5w8ojsx2%22%7E1000sort=score+descstart=0rows=3fl=ab_main_title_l0%2Cscore%2CiddebugQuery=true
 
  The third first results are the following one:
 
  ?xml version=1.0 encoding=UTF-8?
  response
  lst name=responseHeader
   int name=status0/int
   int name=QTime5/int
  /lst
  result name=response numFound=318 start=0 maxScore=3.0814114
   doc
 long name=id2315190010001021/long
 arr name=ab_main_title_l0
   strog54ct8n To be or not to be a Jew. 5w8ojsx2/str
 /arr
 float name=score3.0814114/float/doc
   doc
 long name=id2313006480001021/long
 arr name=ab_main_title_l0
   strog54ct8n To be or not to be 5w8ojsx2/str
 /arr
 float name=score3.0814114/float/doc
   doc
 long name=id2356410250001021/long
 arr name=ab_main_title_l0
   strog54ct8n Rumspringa : to be or not to be Amish / 5w8ojsx2/str
 /arr
 float name=score3.0814114/float/doc
  /result
  lst name=debug
   str name=rawquerystringab_main_title_l0:og54ct8n to be or not to be
  5w8ojsx2~1000/str
   str name=querystringab_main_title_l0:og54ct8n to be or not to be
  5w8ojsx2~1000/str
   str name=parsedqueryPhraseQuery(ab_main_title_l0:og54ct8n to be or
  not to be 5w8ojsx2~1000)/str
   str name=parsedquery_toStringab_main_title_l0:og54ct8n to be or not
  to be 5w8ojsx2~1000/str
   lst name=explain
 str name=2315190010001021
  5.337161 = (MATCH) weight(ab_main_title_l0:og54ct8n to be or not to be
  5w8ojsx2~1000 in 378403) [DefaultSimilarity], result of:
   5.337161 = fieldWeight in 378403, product of:
 0.57735026 = tf(freq=0.3334), with freq of:
   0.3334 = phraseFreq=0.3334
 29.581549 = idf(), sum of:
   1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
   3.0405464 = idf(docFreq=429046, maxDocs=3301436)
   5.3583193 = idf(docFreq=42257, maxDocs=3301436)
   4.3826413 = idf(docFreq=112108, maxDocs=3301436)
   6.3982043 = idf(docFreq=14937, maxDocs=3301436)
   3.0405464 = idf(docFreq=429046, maxDocs=3301436)
   5.3583193 = idf(docFreq=42257, maxDocs=3301436)
   1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
 0.3125 = fieldNorm(doc=378403)
  /str
 str name=2313006480001021
  9.244234 = (MATCH) weight(ab_main_title_l0:og54ct8n to be or not to be
  5w8ojsx2~1000 in 482807) [DefaultSimilarity], result of:
   9.244234 = fieldWeight in 482807, product of:
 1.0 = tf(freq=1.0), with freq of:
   1.0 = phraseFreq=1.0
 29.581549 = idf(), sum of:
   1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
   3.0405464 = idf(docFreq=429046, maxDocs=3301436)
   5.3583193 = idf(docFreq=42257, maxDocs=3301436)
   4.3826413 = idf(docFreq=112108, maxDocs=3301436)
   6.3982043 = idf(docFreq=14937, maxDocs=3301436)
   3.0405464 = idf(docFreq=429046, maxDocs=3301436)
   5.3583193 = idf(docFreq=42257, maxDocs=3301436)
   1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
 0.3125 = fieldNorm(doc=482807)
  /str
 str name=2356410250001021
  5.337161 = (MATCH) weight(ab_main_title_l0:og54ct8n to be or not to be
  5w8ojsx2~1000 in 1317563) [DefaultSimilarity], result of:
   5.337161 = fieldWeight in 1317563, product of:
 0.57735026 = tf(freq=0.3334), with freq of:
   0.3334 = phraseFreq=0.3334
 29.581549 = idf(), sum of:
   1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
   3.0405464 = idf(docFreq=429046, maxDocs=3301436)
   5.3583193 = idf(docFreq=42257, maxDocs=3301436)
   4.3826413 = idf(docFreq=112108, maxDocs=3301436)
   6.3982043 = idf(docFreq=14937, maxDocs=3301436)
   3.0405464 = idf(docFreq=429046, maxDocs=3301436)
   5.3583193 = idf(docFreq=42257, maxDocs=3301436)
   1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
 0.3125 = fieldNorm(doc=1317563)
  /str
  /response
 
  The used version is a 4.0 October snapshot.
 
  I have 2 questions about the result:
  - Why debug print and scores in result are different?
  - What is the expected behavior of this kind of term proximity query?
   - The debug scores seem to be well ordered but the result scores
  seem to be wrong.
 
 
  Thanks,
  Ariel
 



Re: strange behavior of scores and term proximity use

2011-11-17 Thread Erick Erickson
Hmmm, I'm not seeing similar behavior on a trunk from today, when did
you get your copy?

Erick

On Wed, Nov 16, 2011 at 2:06 PM, Ariel Zerbib ariel.zer...@gmail.com wrote:
 Hi,

 For this term proximity query: ab_main_title_l0:to be or not to be~1000

 http://localhost:/solr/select?q=ab_main_title_l0%3A%22og54ct8n+to+be+or+not+to+be+5w8ojsx2%22~1000sort=score+descstart=0rows=3fl=ab_main_title_l0%2Cscore%2CiddebugQuery=true

 The third first results are the following one:

 ?xml version=1.0 encoding=UTF-8?
 response
 lst name=responseHeader
  int name=status0/int
  int name=QTime5/int
 /lst
 result name=response numFound=318 start=0 maxScore=3.0814114
  doc
    long name=id2315190010001021/long
    arr name=ab_main_title_l0
      strog54ct8n To be or not to be a Jew. 5w8ojsx2/str
    /arr
    float name=score3.0814114/float/doc
  doc
    long name=id2313006480001021/long
    arr name=ab_main_title_l0
      strog54ct8n To be or not to be 5w8ojsx2/str
    /arr
    float name=score3.0814114/float/doc
  doc
    long name=id2356410250001021/long
    arr name=ab_main_title_l0
      strog54ct8n Rumspringa : to be or not to be Amish / 5w8ojsx2/str
    /arr
    float name=score3.0814114/float/doc
 /result
 lst name=debug
  str name=rawquerystringab_main_title_l0:og54ct8n to be or not to be
 5w8ojsx2~1000/str
  str name=querystringab_main_title_l0:og54ct8n to be or not to be
 5w8ojsx2~1000/str
  str name=parsedqueryPhraseQuery(ab_main_title_l0:og54ct8n to be or
 not to be 5w8ojsx2~1000)/str
  str name=parsedquery_toStringab_main_title_l0:og54ct8n to be or not
 to be 5w8ojsx2~1000/str
  lst name=explain
    str name=2315190010001021
 5.337161 = (MATCH) weight(ab_main_title_l0:og54ct8n to be or not to be
 5w8ojsx2~1000 in 378403) [DefaultSimilarity], result of:
  5.337161 = fieldWeight in 378403, product of:
    0.57735026 = tf(freq=0.3334), with freq of:
      0.3334 = phraseFreq=0.3334
    29.581549 = idf(), sum of:
      1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
      4.3826413 = idf(docFreq=112108, maxDocs=3301436)
      6.3982043 = idf(docFreq=14937, maxDocs=3301436)
      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
      1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
    0.3125 = fieldNorm(doc=378403)
 /str
    str name=2313006480001021
 9.244234 = (MATCH) weight(ab_main_title_l0:og54ct8n to be or not to be
 5w8ojsx2~1000 in 482807) [DefaultSimilarity], result of:
  9.244234 = fieldWeight in 482807, product of:
    1.0 = tf(freq=1.0), with freq of:
      1.0 = phraseFreq=1.0
    29.581549 = idf(), sum of:
      1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
      4.3826413 = idf(docFreq=112108, maxDocs=3301436)
      6.3982043 = idf(docFreq=14937, maxDocs=3301436)
      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
      1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
    0.3125 = fieldNorm(doc=482807)
 /str
    str name=2356410250001021
 5.337161 = (MATCH) weight(ab_main_title_l0:og54ct8n to be or not to be
 5w8ojsx2~1000 in 1317563) [DefaultSimilarity], result of:
  5.337161 = fieldWeight in 1317563, product of:
    0.57735026 = tf(freq=0.3334), with freq of:
      0.3334 = phraseFreq=0.3334
    29.581549 = idf(), sum of:
      1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
      4.3826413 = idf(docFreq=112108, maxDocs=3301436)
      6.3982043 = idf(docFreq=14937, maxDocs=3301436)
      3.0405464 = idf(docFreq=429046, maxDocs=3301436)
      5.3583193 = idf(docFreq=42257, maxDocs=3301436)
      1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
    0.3125 = fieldNorm(doc=1317563)
 /str
 /response

 The used version is a 4.0 October snapshot.

 I have 2 questions about the result:
 - Why debug print and scores in result are different?
 - What is the expected behavior of this kind of term proximity query?
          - The debug scores seem to be well ordered but the result scores
 seem to be wrong.


 Thanks,
 Ariel



strange behavior of scores and term proximity use

2011-11-16 Thread Ariel Zerbib
Hi,

For this term proximity query: ab_main_title_l0:"to be or not to be"~1000

http://localhost:/solr/select?q=ab_main_title_l0%3A%22og54ct8n+to+be+or+not+to+be+5w8ojsx2%22~1000&sort=score+desc&start=0&rows=3&fl=ab_main_title_l0%2Cscore%2Cid&debugQuery=true

The first three results are the following:

?xml version=1.0 encoding=UTF-8?
response
lst name=responseHeader
  int name=status0/int
  int name=QTime5/int
/lst
result name=response numFound=318 start=0 maxScore=3.0814114
  doc
long name=id2315190010001021/long
arr name=ab_main_title_l0
  strog54ct8n To be or not to be a Jew. 5w8ojsx2/str
/arr
float name=score3.0814114/float/doc
  doc
long name=id2313006480001021/long
arr name=ab_main_title_l0
  strog54ct8n To be or not to be 5w8ojsx2/str
/arr
float name=score3.0814114/float/doc
  doc
long name=id2356410250001021/long
arr name=ab_main_title_l0
  strog54ct8n Rumspringa : to be or not to be Amish / 5w8ojsx2/str
/arr
float name=score3.0814114/float/doc
/result
lst name=debug
  str name=rawquerystringab_main_title_l0:og54ct8n to be or not to be
5w8ojsx2~1000/str
  str name=querystringab_main_title_l0:og54ct8n to be or not to be
5w8ojsx2~1000/str
  str name=parsedqueryPhraseQuery(ab_main_title_l0:og54ct8n to be or
not to be 5w8ojsx2~1000)/str
  str name=parsedquery_toStringab_main_title_l0:og54ct8n to be or not
to be 5w8ojsx2~1000/str
  lst name=explain
str name=2315190010001021
5.337161 = (MATCH) weight(ab_main_title_l0:og54ct8n to be or not to be
5w8ojsx2~1000 in 378403) [DefaultSimilarity], result of:
  5.337161 = fieldWeight in 378403, product of:
0.57735026 = tf(freq=0.3334), with freq of:
  0.3334 = phraseFreq=0.3334
29.581549 = idf(), sum of:
  1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
  3.0405464 = idf(docFreq=429046, maxDocs=3301436)
  5.3583193 = idf(docFreq=42257, maxDocs=3301436)
  4.3826413 = idf(docFreq=112108, maxDocs=3301436)
  6.3982043 = idf(docFreq=14937, maxDocs=3301436)
  3.0405464 = idf(docFreq=429046, maxDocs=3301436)
  5.3583193 = idf(docFreq=42257, maxDocs=3301436)
  1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
0.3125 = fieldNorm(doc=378403)
/str
str name=2313006480001021
9.244234 = (MATCH) weight(ab_main_title_l0:og54ct8n to be or not to be
5w8ojsx2~1000 in 482807) [DefaultSimilarity], result of:
  9.244234 = fieldWeight in 482807, product of:
1.0 = tf(freq=1.0), with freq of:
  1.0 = phraseFreq=1.0
29.581549 = idf(), sum of:
  1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
  3.0405464 = idf(docFreq=429046, maxDocs=3301436)
  5.3583193 = idf(docFreq=42257, maxDocs=3301436)
  4.3826413 = idf(docFreq=112108, maxDocs=3301436)
  6.3982043 = idf(docFreq=14937, maxDocs=3301436)
  3.0405464 = idf(docFreq=429046, maxDocs=3301436)
  5.3583193 = idf(docFreq=42257, maxDocs=3301436)
  1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
0.3125 = fieldNorm(doc=482807)
/str
str name=2356410250001021
5.337161 = (MATCH) weight(ab_main_title_l0:og54ct8n to be or not to be
5w8ojsx2~1000 in 1317563) [DefaultSimilarity], result of:
  5.337161 = fieldWeight in 1317563, product of:
0.57735026 = tf(freq=0.3334), with freq of:
  0.3334 = phraseFreq=0.3334
29.581549 = idf(), sum of:
  1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
  3.0405464 = idf(docFreq=429046, maxDocs=3301436)
  5.3583193 = idf(docFreq=42257, maxDocs=3301436)
  4.3826413 = idf(docFreq=112108, maxDocs=3301436)
  6.3982043 = idf(docFreq=14937, maxDocs=3301436)
  3.0405464 = idf(docFreq=429046, maxDocs=3301436)
  5.3583193 = idf(docFreq=42257, maxDocs=3301436)
  1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
0.3125 = fieldNorm(doc=1317563)
/str
/response

The used version is a 4.0 October snapshot.

I have 2 questions about the result:
- Why are the debug scores and the scores in the result different?
- What is the expected behavior of this kind of term proximity query?
  - The debug scores seem to be well ordered, but the result scores
seem to be wrong.


Thanks,
Ariel


Re: Strange behavior

2011-06-16 Thread Alexey Serba
Did you stop Solr before manually copying the data? That way you
can be sure the index is the same and that you didn't have any new docs
in flight.

2011/6/14 Denis Kuzmenok forward...@ukr.net:
 What  should  i provide, OS is the same, environment is the same, solr
 is  completely  copied,  searches  work,  except that one, and that is
 strange..

 I think you will need to provide more information than this, no-one on this 
 list is omniscient AFAIK.

 François

 On Jun 14, 2011, at 10:44 AM, Denis Kuzmenok wrote:

 Hi.

 I've  debugged search on test machine, after copying to production server
 the  entire  directory  (entire solr directory), i've noticed that one
 query  (SDR  S70EE  K)  does  match  on  test  server, and does not on
 production.
 How can that be?








Strange behavior

2011-06-14 Thread Denis Kuzmenok
Hi.

I've debugged search on a test machine; after copying the entire directory
(the entire Solr directory) to the production server, I've noticed that one
query (SDR S70EE K) matches on the test server but does not on
production. How can that be?



Re: Strange behavior

2011-06-14 Thread François Schiettecatte
I think you will need to provide more information than this, no-one on this 
list is omniscient AFAIK.

François

On Jun 14, 2011, at 10:44 AM, Denis Kuzmenok wrote:

 Hi.
 
 I've  debugged search on test machine, after copying to production server
 the  entire  directory  (entire solr directory), i've noticed that one
 query  (SDR  S70EE  K)  does  match  on  test  server, and does not on
 production.
 How can that be?
 



Re: Strange behavior

2011-06-14 Thread Denis Kuzmenok
What should I provide? The OS is the same, the environment is the same,
Solr is completely copied, and searches work, except that one, and that
is strange...

 I think you will need to provide more information than this, no-one on this 
 list is omniscient AFAIK.

 François

 On Jun 14, 2011, at 10:44 AM, Denis Kuzmenok wrote:

 Hi.
 
 I've  debugged search on test machine, after copying to production server
 the  entire  directory  (entire solr directory), i've noticed that one
 query  (SDR  S70EE  K)  does  match  on  test  server, and does not on
 production.
 How can that be?
 






Re: Strange behavior

2011-06-14 Thread Erick Erickson
Well, you could provide the results with debugQuery=on. You could
provide the schema.xml and solrconfig.xml files for both. You
could provide a listing of your index files. You could provide some
evidence that you've tried chasing down your problem using tools
like Luke or the Solr admin interface. Something please...

You might also review:
http://wiki.apache.org/solr/UsingMailingLists

Best
Erick
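
A sketch of gathering that debug output programmatically, assuming a SolrJ 3.x
client; the URLs are hypothetical and the query string is the one from this
thread:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DebugCompare {
    public static void main(String[] args) throws Exception {
        String[] servers = {
            "http://test-host:8983/solr",   // hypothetical
            "http://prod-host:8983/solr"    // hypothetical
        };
        for (String url : servers) {
            CommonsHttpSolrServer solr = new CommonsHttpSolrServer(url);
            SolrQuery q = new SolrQuery("SDR S70EE K");
            q.set("debugQuery", "on");
            QueryResponse rsp = solr.query(q);
            // Compare the parsed query (and explain section) between the two servers.
            System.out.println(url + " parsed as: " + rsp.getDebugMap().get("parsedquery"));
        }
    }
}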

2011/6/14 Denis Kuzmenok forward...@ukr.net:
 What  should  i provide, OS is the same, environment is the same, solr
 is  completely  copied,  searches  work,  except that one, and that is
 strange..

 I think you will need to provide more information than this, no-one on this 
 list is omniscient AFAIK.

 François

 On Jun 14, 2011, at 10:44 AM, Denis Kuzmenok wrote:

 Hi.

 I've  debugged search on test machine, after copying to production server
 the  entire  directory  (entire solr directory), i've noticed that one
 query  (SDR  S70EE  K)  does  match  on  test  server, and does not on
 production.
 How can that be?








strange behavior of echoParams

2011-04-13 Thread Bernd Fehling

Dear list,

after setting echoParams to none, wildcard search isn't working.
Only if I set echoParams to explicit is wildcard search possible.

http://wiki.apache.org/solr/CoreQueryParameters
states that echoParams is for debugging purposes.

We use Solr 3.1.0.

Snippet from solrconfig.xml:
<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">none</str>
<!--   <str name="echoParams">explicit</str> -->
    <str name="wt">xml</str>
    <int name="rows">10</int>
  </lst>
</requestHandler>

Any explanation about this behavior?

Regards,
Bernd


Re: strange behavior of echoParams

2011-04-13 Thread Erik Hatcher
What does the parsed query look like with debugQuery=true for both scenarios?
Any difference? It doesn't make any sense that echoParams would have an effect,
unless somehow your search client is relying on the parameters returned to do
something with them?!

Erik

On Apr 13, 2011, at 09:57 , Bernd Fehling wrote:

 Dear list,
 
 after setting echoParams to none wildcard search isn't working.
 Only if I set echoParams to explicit then wildcard is possible.
 
 http://wiki.apache.org/solr/CoreQueryParameters
 states that echoParams is for debugging purposes.
 
 We use Solr 3.1.0.
 
 Snippet from solrconfig.xml:
 requestHandler name=standard class=solr.SearchHandler default=true
 lst name=defaults
   str name=echoParamsnone/str
 !--   str name=echoParamsexplicit/str --
   str name=wtxml/str
   int name=rows10/int
 /lst
 /requestHandler
 
 Any explanation about this behavior?
 
 Regards,
 Bernd



Re: strange behavior of echoParams

2011-04-13 Thread Bernd Fehling

Hi Erik,

Never mind.
I can't reproduce this strange behavior.
Obviously stopping and starting Solr solved this.

Thanks,
Bernd


Am 13.04.2011 16:00, schrieb Erik Hatcher:

What does the parsed query look like with debugQuery=true for both scenarios?
Any difference?
Doesn't make any sense that echoParams would have an effect, unless somehow 
your search client is relying on parameters returned to do something with 
them.?!

Erik

On Apr 13, 2011, at 09:57 , Bernd Fehling wrote:


Dear list,

after setting echoParams to none wildcard search isn't working.
Only if I set echoParams to explicit then wildcard is possible.

http://wiki.apache.org/solr/CoreQueryParameters
states that echoParams is for debugging purposes.

We use Solr 3.1.0.

Snippet from solrconfig.xml:
requestHandler name=standard class=solr.SearchHandler default=true
 lst name=defaults
   str name=echoParamsnone/str
!--str name=echoParamsexplicit/str  --
   str name=wtxml/str
   int name=rows10/int
 /lst
/requestHandler

Any explanation about this behavior?

Regards,
Bernd






RE: Strange behavior for certain words

2010-05-13 Thread Ahmet Arslan
Hi,
       Thanks for your response. Attached are the Schema.xml and sample docs 
that were indexed. The query and response are as below. The attachment 
Prodsku4270257.xml has a field paymenttype whose value is 'prepaid'.

query:
q=prepaid&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=json&debugQuery=on&explainOther=&hl=on

But you are populating your text field from deviceType, features, description
and color; paymentType is not copied into text, so this behavior is normal.
Either add this copyField declaration:
  <copyField source="paymentType" dest="text" />
or query this field directly: q=paymentType:prepaid



  

Strange behavior for certain words

2010-05-12 Thread RamaKrishna Atmakur

Hi,
   We are trying to use SOLR for searching our catalog online, and during QA
came across an interesting case where SOLR is not returning results that it
should.

Specifically, we have indexed things like Title and Description; some of the
words in the Title happen to be 'Prepaid' and 'Postpaid'. However, when we
search on those words, SOLR does not return any results.
But if we search on some other words in the same title in which the word
Prepaid occurs, then the correct results are returned. In fact SOLR even
returns the result count for the Prepaid and Postpaid facets.

We know that there are no synonyms associated with either of those words, and
these words are also not in any other list such as stopwords.txt etc.

Any idea as to why this should be happening?

Thanks in advance,
Rama
  

Re: Strange behavior for certain words

2010-05-12 Thread Erick Erickson
Hmmm, there's not much information to go on here.
You might review this page:
http://wiki.apache.org/solr/UsingMailingLists
and post with more information. At minimum,
the field definitions, the query output (include
debugQuery=on), perhaps what comes out
of the analysis admin page for both indexing
and querying the problem text, and whatever
else you can think of that would help analyze the
problem.

Best
Erick

On Wed, May 12, 2010 at 8:26 PM, RamaKrishna Atmakur 
ramkrishn...@hotmail.com wrote:


 Hi,
   We are trying to use SOLR for searching our catalog online and during QA
 came across a interesting case where SOLR is not returning results that it
 should.

 Specificially, we have indexed things like Title and Description, of
 the words in the Title happens to be Prepaid' and Postpaid. However when
 we search on those words, SOLR does not return any results.
 But if we search on some other words in the same title in which the word
 Prepaid occurs then the correct results are returned. In fact SOLR even
 returns the result count for the Prepaid and Postpaid facets.

 We know that there are no synonyms associated with both those words and
 these words are also not in any other list such as stopwords.txt etc.

 Any idea as to why this should be happening ?

 Thanks in advance,
 Rama



RE: Strange behavior for certain words

2010-05-12 Thread RamaKrishna Atmakur

Hi,
   Thanks for your response. Attached are the Schema.xml and sample docs 
that were indexed. The query and response are as below. The attachment 
Prodsku4270257.xml has a field paymenttype whose value is 'prepaid'.

query:
q=prepaid&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=json&debugQuery=on&explainOther=&hl=on
Result:
{
 responseHeader:{
  status:0,
  QTime:0,
  params:{
wt:json,
debugQuery:on,
start:0,
rows:10,
explainOther:,
indent:on,
fl:*,score,
hl:on,
qt:standard,
version:2.2,
q:prepaid,
hl.fl:}},
 response:{numFound:0,start:0,maxScore:0.0,docs:[]
 },
 highlighting:{},
 debug:{
  rawquerystring:prepaid,
  querystring:prepaid,
  parsedquery:text:prepaid,
  parsedquery_toString:text:prepaid,
  explain:{},
  QParser:OldLuceneQParser,
  timing:{
time:0.0,
prepare:{
 time:0.0,
 org.apache.solr.handler.component.QueryComponent:{
  time:0.0},
 org.apache.solr.handler.component.FacetComponent:{
  time:0.0},
 org.apache.solr.handler.component.MoreLikeThisComponent:{
  time:0.0},
 org.apache.solr.handler.component.HighlightComponent:{
  time:0.0},
 org.apache.solr.handler.component.DebugComponent:{
  time:0.0}},
process:{
 time:0.0,
 org.apache.solr.handler.component.QueryComponent:{
  time:0.0},
 org.apache.solr.handler.component.FacetComponent:{
  time:0.0},
 org.apache.solr.handler.component.MoreLikeThisComponent:{
  time:0.0},
 org.apache.solr.handler.component.HighlightComponent:{
  time:0.0},
 org.apache.solr.handler.component.DebugComponent:{
  time:0.0}
Thanks and Regards
Rama K Atmakur.

 Date: Wed, 12 May 2010 20:46:11 -0400
 Subject: Re: Strange behavior for certain words
 From: erickerick...@gmail.com
 To: solr-user@lucene.apache.org
 
 Hmmm, there's not much information to go on here.
 You might review this page:
 http://wiki.apache.org/solr/UsingMailingLists
 and post with more information. At minimum,
 the field definitions, the query output (include
 debugQuery=on), perhaps what comes out
 of the analysis admin page for both indexing
 and querying the problem text, and whatever
 else you can think of that would help analyze the
 problem.
 
 Best
 Erick
 
 On Wed, May 12, 2010 at 8:26 PM, RamaKrishna Atmakur 
 ramkrishn...@hotmail.com wrote:
 
 
  Hi,
We are trying to use SOLR for searching our catalog online and during QA
  came across a interesting case where SOLR is not returning results that it
  should.
 
  Specificially, we have indexed things like Title and Description, of
  the words in the Title happens to be Prepaid' and Postpaid. However when
  we search on those words, SOLR does not return any results.
  But if we search on some other words in the same title in which the word
  Prepaid occurs then the correct results are returned. In fact SOLR even
  returns the result count for the Prepaid and Postpaid facets.
 
  We know that there are no synonyms associated with both those words and
  these words are also not in any other list such as stopwords.txt etc.
 
  Any idea as to why this should be happening ?
 
  Thanks in advance,
  Rama
 
  ?xml version=1.0 encoding=UTF-8 ?
!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the License); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an AS IS BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
--

!--  
 This is the Solr schema file. This file should be named schema.xml and
 should be in the conf directory under the solr home
 (i.e. ./solr/conf/schema.xml by default) 
 or located where the classloader for the Solr webapp can find it.

 This example schema is the recommended starting point for users.
 It should be kept correct and concise, usable out-of-the-box.

 For more information, on how to customize this file, please see
 http://wiki.apache.org/solr/SchemaXml
--

schema name=attcatalog version=1.1
  !-- attribute name is the name of this schema and is only used for display purposes.
   Applications should change this to reflect the nature of the search collection.
   version=1.1 is Solr's version number for the schema syntax and semantics.  It should
   not normally be changed by applications

RE: Strange behavior for certain words

2010-05-12 Thread Naga Darbha
Hi Rama,

What field types are these Title and Description fields?

You may go to the SOLR admin console, try Analysis, select the field type
that you have used for Title and Description, provide the words Prepaid
and Postpaid to the indexing analyzer, and see how the information is stored.

regards,
Naga Ranjan

-Original Message-
From: RamaKrishna Atmakur [mailto:ramkrishn...@hotmail.com] 
Sent: Thursday, May 13, 2010 5:57 AM
To: solr-user@lucene.apache.org
Subject: Strange behavior for certain words


Hi,
   We are trying to use SOLR for searching our catalog online and during QA
came across an interesting case where SOLR is not returning results that it
should.

Specifically, we have indexed things like Title and Description, and some of
the words in the Title happen to be "Prepaid" and "Postpaid". However, when we
search on those words, SOLR does not return any results.
But if we search on some other words in the same title in which the word
Prepaid occurs, then the correct results are returned. In fact, SOLR even
returns the result count for the Prepaid and Postpaid facets.

We know that there are no synonyms associated with either of those words, and
these words are also not in any other list such as stopwords.txt.

Any idea as to why this should be happening?

Thanks in advance,
Rama
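
To make the field-type question concrete: the answer lives in the schema's
field definitions, and the names below are hypothetical placeholders rather
than entries from Rama's attached schema.xml. An exact-match string field and
an analyzed text field behave very differently for a term like Prepaid:

  <!-- exact, unanalyzed matching -->
  <field name="Title" type="string" indexed="true" stored="true"/>

  <!-- tokenized, lowercased, possibly stemmed matching -->
  <field name="Title_text" type="text" indexed="true" stored="true"/>
  <copyField source="Title" dest="Title_text"/>

Feeding the same field type and the word Prepaid to the Analysis page, as
suggested above, shows exactly which tokens ended up in the index.
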
  


Re: Strange Behavior When Using CSVRequestHandler

2010-01-07 Thread danben

Erick - thanks very much, all of this makes sense.  But the one thing I still
find puzzling is the fact that re-adding the file a second, third, fourth,
etc. time causes numDocs to increase, and ALWAYS by the same amount
(141,645).  Any ideas as to what could cause that?

Dan


Erick Erickson wrote:
 
 I think the root of your problem is that unique fields should NOT
 be multivalued. See
 http://wiki.apache.org/solr/FieldOptionsByUseCase?highlight=(unique)|(key)

 In this case, since you're tokenizing, your "query" field is
 implicitly multi-valued, and I don't know what the behavior will be.

 But there's another problem:
 All the filters in your analyzer definition will mess up the
 correspondence between the Unix uniq count and numDocs even
 if you got by the above. I.e.:

 StopFilter would make the lines "a problem" and "the problem" identical.
 WordDelimiter would do all kinds of interesting things.
 LowerCaseFilter would make "Myproblem" and "myproblem" identical.
 RemoveDuplicatesFilter would make "interesting interesting" and
 "interesting" identical.

 You could define a second field, make *that* one unique and NOT analyze
 it in any way...

 You could hash your sentences and define the hash as your unique key.

 You could ...

 HTH
 Erick
 
 On Wed, Jan 6, 2010 at 1:06 PM, danben dan...@gmail.com wrote:
 

 The problem:

 Not all of the documents that I expect to be indexed are showing up in
 the
 index.

 The background:

 I start off with an empty index based on a schema with a single field
 named
 'query', marked as unique and using the following analyzer:

 <analyzer type="index">
   <tokenizer class="solr.WhitespaceTokenizerFactory"/>
   <filter class="solr.StopFilterFactory" ignoreCase="true"
           words="stopwords.txt" enablePositionIncrements="true"/>
   <filter class="solr.WordDelimiterFilterFactory"
           generateWordParts="1" generateNumberParts="1" catenateWords="1"
           catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
   <filter class="solr.LowerCaseFilterFactory"/>
   <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
 </analyzer>

 My input is a utf-8 encoded file with one sentence per line.  Its total
 size
 is about 60MB.  I would like each line of the file to correspond to a
 single
 document in the solr index.  If I print the number of unique lines in the
 file (using cat | sort | uniq | wc -l), I get a little over 2M.  Printing
 the total number of lines in the file gives me around 2.7M.

 I use the following to start indexing:

 curl
 'http://localhost:8983/solr/update/csv?commit=true&separator=%09&stream.file=/home/gkropitz/querystage2map/file1&stream.contentType=text/plain;charset=utf-8&fieldnames=query&escape=\'

 When this command completes, I see numDocs is approximately 470k (which
 is
 what I find strange) and maxDocs is approximately 890k (which is fine
 since
 I know I have around 700k duplicates).  Even more confusing is that if I
 run
 this exact command a second time without performing any other operations,
 numDocs goes up to around 610k, and a third time brings it up to about
 750k.

 Can anyone tell me what might cause Solr not to index everything in my
 input
 file the first time, and why it would be able to index new documents the
 second and third times?

 I also have this line in solrconfig.xml, if it matters:

 <requestParsers enableRemoteStreaming="true"
     multipartUploadLimitInKB="2048" />

 Thanks,
 Dan

 --
 View this message in context:
 http://old.nabble.com/Strange-Behavior-When-Using-CSVRequestHandler-tp27026926p27026926.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 

-- 
View this message in context: 
http://old.nabble.com/Strange-Behavior-When-Using-CSVRequestHandler-%28Solr-1.4%29-tp27026926p27061086.html
Sent from the Solr - User mailing list archive at Nabble.com.
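
A minimal sketch of the first suggestion above, using hypothetical field and
type names rather than Dan's actual schema: keep the analyzed field for
searching, but make the unique key a plain, untokenized string field.

  <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="query" type="text" indexed="true" stored="true"/>

  <uniqueKey>id</uniqueKey>

Because no analyzer touches the key, each distinct id value maps to exactly one
live document, so numDocs should line up with the sort | uniq | wc -l count,
provided the id carries the raw line (or a hash of it).
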



Re: Strange Behavior When Using CSVRequestHandler

2010-01-07 Thread Erick Erickson
It puzzles me too. I don't know the internals of that code
well enough to speculate, but once you're into undefined
behavior, I have great faith in *many* inexplicable things
happening.

Erick

On Thu, Jan 7, 2010 at 9:45 AM, danben dan...@gmail.com wrote:


 Erick - thanks very much, all of this makes sense.  But the one thing I
 still
 find puzzling is the fact that re-adding the file a second, third, fourth
 etc time causes numDocs to increase, and ALWAYS by the same amount
 (141,645).  Any ideas as to what could cause that?

 Dan


 Erick Erickson wrote:
 
  I think the root of your problem is that unique fields should NOT
  be multivalued. See
  http://wiki.apache.org/solr/FieldOptionsByUseCase?highlight=(unique)|(key)

  In this case, since you're tokenizing, your "query" field is
  implicitly multi-valued, and I don't know what the behavior will be.

  But there's another problem:
  All the filters in your analyzer definition will mess up the
  correspondence between the Unix uniq count and numDocs even
  if you got by the above. I.e.:

  StopFilter would make the lines "a problem" and "the problem" identical.
  WordDelimiter would do all kinds of interesting things.
  LowerCaseFilter would make "Myproblem" and "myproblem" identical.
  RemoveDuplicatesFilter would make "interesting interesting" and
  "interesting" identical.

  You could define a second field, make *that* one unique and NOT analyze
  it in any way...

  You could hash your sentences and define the hash as your unique key.

  You could ...

  HTH
  Erick
  Erick
 
  On Wed, Jan 6, 2010 at 1:06 PM, danben dan...@gmail.com wrote:
 
 
  The problem:
 
  Not all of the documents that I expect to be indexed are showing up in
  the
  index.
 
  The background:
 
  I start off with an empty index based on a schema with a single field
  named
  'query', marked as unique and using the following analyzer:
 
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
 
  My input is a utf-8 encoded file with one sentence per line.  Its total
  size
  is about 60MB.  I would like each line of the file to correspond to a
  single
  document in the solr index.  If I print the number of unique lines in
 the
  file (using cat | sort | uniq | wc -l), I get a little over 2M.
  Printing
  the total number of lines in the file gives me around 2.7M.
 
  I use the following to start indexing:
 
  curl
  'http://localhost:8983/solr/update/csv?commit=true&separator=%09&stream.file=/home/gkropitz/querystage2map/file1&stream.contentType=text/plain;charset=utf-8&fieldnames=query&escape=\'
 
  When this command completes, I see numDocs is approximately 470k (which
  is
  what I find strange) and maxDocs is approximately 890k (which is fine
  since
  I know I have around 700k duplicates).  Even more confusing is that if I
  run
  this exact command a second time without performing any other
 operations,
  numDocs goes up to around 610k, and a third time brings it up to about
  750k.
 
  Can anyone tell me what might cause Solr not to index everything in my
  input
  file the first time, and why it would be able to index new documents the
  second and third times?
 
  I also have this line in solrconfig.xml, if it matters:
 
  <requestParsers enableRemoteStreaming="true"
      multipartUploadLimitInKB="2048" />
 
  Thanks,
  Dan
 
  --
  View this message in context:
 
 http://old.nabble.com/Strange-Behavior-When-Using-CSVRequestHandler-tp27026926p27026926.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 

 --
 View this message in context:
 http://old.nabble.com/Strange-Behavior-When-Using-CSVRequestHandler-%28Solr-1.4%29-tp27026926p27061086.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Strange Behavior When Using CSVRequestHandler

2010-01-06 Thread danben

The problem:

Not all of the documents that I expect to be indexed are showing up in the
index.

The background:

I start off with an empty index based on a schema with a single field named
'query', marked as unique and using the following analyzer:

<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true"
          words="stopwords.txt" enablePositionIncrements="true"/>
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1" catenateWords="1"
          catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>

My input is a utf-8 encoded file with one sentence per line.  Its total size
is about 60MB.  I would like each line of the file to correspond to a single
document in the solr index.  If I print the number of unique lines in the
file (using cat | sort | uniq | wc -l), I get a little over 2M.  Printing
the total number of lines in the file gives me around 2.7M.

I use the following to start indexing:

curl
'http://localhost:8983/solr/update/csv?commit=true&separator=%09&stream.file=/home/gkropitz/querystage2map/file1&stream.contentType=text/plain;charset=utf-8&fieldnames=query&escape=\'

When this command completes, I see numDocs is approximately 470k (which is
what I find strange) and maxDocs is approximately 890k (which is fine since
I know I have around 700k duplicates).  Even more confusing is that if I run
this exact command a second time without performing any other operations,
numDocs goes up to around 610k, and a third time brings it up to about 750k.

Can anyone tell me what might cause Solr not to index everything in my input
file the first time, and why it would be able to index new documents the
second and third times?

I also have this line in solrconfig.xml, if it matters:

<requestParsers enableRemoteStreaming="true"
    multipartUploadLimitInKB="2048" />

Thanks,
Dan

-- 
View this message in context: 
http://old.nabble.com/Strange-Behavior-When-Using-CSVRequestHandler-tp27026926p27026926.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Strange Behavior When Using CSVRequestHandler

2010-01-06 Thread Erick Erickson
I think the root of your problem is that unique fields should NOT
be multivalued. See
http://wiki.apache.org/solr/FieldOptionsByUseCase?highlight=(unique)|(key)

In this case, since you're tokenizing, your "query" field is
implicitly multi-valued, and I don't know what the behavior will be.

But there's another problem:
All the filters in your analyzer definition will mess up the
correspondence between the Unix uniq count and numDocs even
if you got by the above. I.e.:

StopFilter would make the lines "a problem" and "the problem" identical.
WordDelimiter would do all kinds of interesting things.
LowerCaseFilter would make "Myproblem" and "myproblem" identical.
RemoveDuplicatesFilter would make "interesting interesting" and
"interesting" identical.

You could define a second field, make *that* one unique and NOT analyze
it in any way...

You could hash your sentences and define the hash as your unique key.

You could ...

HTH
Erick

On Wed, Jan 6, 2010 at 1:06 PM, danben dan...@gmail.com wrote:


 The problem:

 Not all of the documents that I expect to be indexed are showing up in the
 index.

 The background:

 I start off with an empty index based on a schema with a single field named
 'query', marked as unique and using the following analyzer:

 <analyzer type="index">
   <tokenizer class="solr.WhitespaceTokenizerFactory"/>
   <filter class="solr.StopFilterFactory" ignoreCase="true"
           words="stopwords.txt" enablePositionIncrements="true"/>
   <filter class="solr.WordDelimiterFilterFactory"
           generateWordParts="1" generateNumberParts="1" catenateWords="1"
           catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
   <filter class="solr.LowerCaseFilterFactory"/>
   <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
 </analyzer>

 My input is a utf-8 encoded file with one sentence per line.  Its total
 size
 is about 60MB.  I would like each line of the file to correspond to a
 single
 document in the solr index.  If I print the number of unique lines in the
 file (using cat | sort | uniq | wc -l), I get a little over 2M.  Printing
 the total number of lines in the file gives me around 2.7M.

 I use the following to start indexing:

 curl
 'http://localhost:8983/solr/update/csv?commit=true&separator=%09&stream.file=/home/gkropitz/querystage2map/file1&stream.contentType=text/plain;charset=utf-8&fieldnames=query&escape=\'

 When this command completes, I see numDocs is approximately 470k (which is
 what I find strange) and maxDocs is approximately 890k (which is fine since
 I know I have around 700k duplicates).  Even more confusing is that if I
 run
 this exact command a second time without performing any other operations,
 numDocs goes up to around 610k, and a third time brings it up to about
 750k.

 Can anyone tell me what might cause Solr not to index everything in my
 input
 file the first time, and why it would be able to index new documents the
 second and third times?

 I also have this line in solrconfig.xml, if it matters:

 <requestParsers enableRemoteStreaming="true"
     multipartUploadLimitInKB="2048" />

 Thanks,
 Dan

 --
 View this message in context:
 http://old.nabble.com/Strange-Behavior-When-Using-CSVRequestHandler-tp27026926p27026926.html
 Sent from the Solr - User mailing list archive at Nabble.com.
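
A rough sketch of the hashing idea, reusing the tab-separated CSV parameters
already shown in the thread; the output path and the id/query field names are
placeholders:

  # prepend an MD5 of each sentence as an id column (tab-separated)
  while IFS= read -r line; do
    printf '%s\t%s\n' "$(printf '%s' "$line" | md5sum | cut -d' ' -f1)" "$line"
  done < /home/gkropitz/querystage2map/file1 > /tmp/file1.with-id

  curl 'http://localhost:8983/solr/update/csv?commit=true&separator=%09&fieldnames=id,query&stream.file=/tmp/file1.with-id&stream.contentType=text/plain;charset=utf-8'

With <uniqueKey>id</uniqueKey> pointing at an unanalyzed string field,
re-posting the same file overwrites documents instead of colliding
unpredictably on tokenized values.
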




RE: SOLR uniqueKey - extremely strange behavior! Documents disappeared...

2009-08-18 Thread Fuad Efendi
UPDATE:

Crazy stuff with the SLES10 SP2 default installation/partitioning: LVM (Logical
Volume Manager) shows 400Gb available, but... I lost 90% of the index without
even noticing it!

Aug 16, 2009 8:04:32 PM org.apache.solr.common.SolrException log
SEVERE: java.io.IOException: No space left on device
at java.io.RandomAccessFile.writeBytes(RandomAccessFile.java)

- then somehow no exceptions at all for a few hours, and no corrupted index
after several commits; then again not enough space, etc.; and finally a
corrupted index (still on SATA)


Thanks


-Original Message-
From: Funtick [mailto:f...@efendi.ca] 
Sent: August-18-09 12:25 AM
To: solr-user@lucene.apache.org
Subject: Re: SOLR uniqueKey - extremely strange behavior! Documents
disappeared...


sorry for typo in prev msg,

Increase = 2,297,231 - 1,786,552  = 500,000 (average)

RATE (non-unique-id:unique-id) = 7,000,000 : 500,000 = 14:1

but 125:1 (initial 30 hours) was very strange...



Funtick wrote:
 
 UPDATE:
 
 After few more minutes (after previous commit):
 docsPending: about 7,000,000
 
 After commit:
 numDocs: 2,297,231
 
 Increase = 2,297,231 - 1,281,851 = 1,000,000 (average)
 
 So that I have 7 docs with same ID in average.
 
 Having 100,000,000 and then dropping below 1,000,000 is strange; it is a
 bug somewhere... need to investigate ramBufferSize and MergePolicy,
 including SOLR uniqueId implementation...
 
 
 
 Funtick wrote:
 
 After running an application which heavily uses MD5 HEX-representation as
 uniqueKey for SOLR v.1.4-dev-trunk:
 
 1. After 30 hours: 
 101,000,000 documents added
 
 2. Commit: 
 numDocs = 783,714 
 maxDoc = 3,975,393
 
 3. Upload new docs to SOLR during 1 hour(!!!), then commit, then
 optimize:
 numDocs=1,281,851
 maxDocs=1,281,851
 
 It looks _extremely_ strange that within an hour I have such a huge
 increase with same 'average' document set...
 
 I am suspecting something goes wrong with Lucene buffer flush / index
 merge OR SOLR - Unique ID handling...
 
 According to my own estimates, I should have about 10,000,000 new
 documents now... I had 0.5 millions within an hour, and 0.8 mlns within a
 day; same 'random' documents.
 
 This morning index size was about 4Gb, then suddenly dropped below 0.5
 Gb. Why? I haven't issued any commit...
 
 I am using ramBufferMB=8192
 
 
 
 
 
 
 
 
 

-- 
View this message in context:
http://www.nabble.com/SOLR-%3CuniqueKey%3E---extremely-strange-behavior%21-D
ocuments-disappeared...-tp25017728p25018263.html
Sent from the Solr - User mailing list archive at Nabble.com.
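
Before suspecting the index format itself, it can help to confirm the disk
situation directly; the paths below are placeholders for wherever the Solr
data directory actually lives:

  # free space on the partition holding the index
  df -h /var/solr/data

  # size of the index directory itself
  du -sh /var/solr/data/index

A Lucene index that hits "No space left on device" mid-flush can easily end up
with missing or truncated segment files, which matches the corruption
described above.
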





SOLR uniqueKey - extremely strange behavior! Documents disappeared...

2009-08-17 Thread Funtick

After running an application which heavily uses MD5 HEX-representation as
uniqueKey for SOLR v.1.4-dev-trunk:

1. After 30 hours: 
101,000,000 documents added

2. Commit: 
numDocs = 783,714 
maxDoc = 3,975,393

3. Upload new docs to SOLR during 1 hour(!!!), then commit, then
optimize:
numDocs=1,281,851
maxDocs=1,281,851

It looks _extremely_ strange that within an hour I have such a huge increase
with the same 'average' document set...

I am suspecting something goes wrong with the Lucene buffer flush / index merge
OR the SOLR Unique ID handling...

According to my own estimates, I should have about 10,000,000 new documents
by now... I had 0.5 million within an hour, and 0.8 million within a day; same
'random' documents.

This morning the index size was about 4Gb, then it suddenly dropped below 0.5 Gb.
Why? I haven't issued any commit...

I am using ramBufferMB=8192






-- 
View this message in context: 
http://www.nabble.com/SOLR-%3CuniqueKey%3E---extremely-strange-behavior%21-Documents-disappeared...-tp25017728p25017728.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: SOLR uniqueKey - extremely strange behavior! Documents disappeared...

2009-08-17 Thread Mark Miller
I'd say you have a lot of documents that have the same id.
When you add a doc with the same id, first the old one is deleted, then the
new one is added (atomically though).

The deleted docs are not removed from the index immediately though - the doc
id is just marked as deleted.

Over time though, as segments are merged due to hitting triggers while
adding new documents, deletes are removed (which deletes depends on which
segments have been merged).

So if you add a ton of documents over time, many with the same ids, you
would likely see this type of maxDoc, numDoc churn. maxDoc will include
deleted docs while numDoc will not.


-- 
- Mark

http://www.lucidimagination.com

On Mon, Aug 17, 2009 at 11:09 PM, Funtick f...@efendi.ca wrote:


 After running an application which heavily uses MD5 HEX-representation as
 uniqueKey for SOLR v.1.4-dev-trunk:

 1. After 30 hours:
 101,000,000 documents added

 2. Commit:
 numDocs = 783,714
 maxDoc = 3,975,393

 3. Upload new docs to SOLR during 1 hour(!!!), then commit, then
 optimize:
 numDocs=1,281,851
 maxDocs=1,281,851

 It looks _extremely_ strange that within an hour I have such a huge
 increase
 with same 'average' document set...

 I am suspecting something goes wrong with Lucene buffer flush / index merge
 OR SOLR - Unique ID handling...

 According to my own estimates, I should have about 10,000,000 new documents
 now... I had 0.5 millions within an hour, and 0.8 mlns within a day; same
 'random' documents.

 This morning index size was about 4Gb, then suddenly dropped below 0.5 Gb.
 Why? I haven't issued any commit...

 I am using ramBufferMB=8192






 --
 View this message in context:
 http://www.nabble.com/SOLR-%3CuniqueKey%3E---extremely-strange-behavior%21-Documents-disappeared...-tp25017728p25017728.html
 Sent from the Solr - User mailing list archive at Nabble.com.
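
A quick way to watch the churn described above, assuming the stock example URL
and that the admin handlers are enabled in solrconfig.xml (they are in the
example config):

  # numDocs vs. maxDoc, the latter still counting documents only marked as deleted
  curl 'http://localhost:8983/solr/admin/luke?numTerms=0'

  # force a merge so the deleted duplicates are physically expunged
  curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' --data-binary '<optimize/>'

After the optimize, maxDoc should fall back to numDocs, since the overwritten
copies are gone from the segments.
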




Re: SOLR uniqueKey - extremely strange behavior! Documents disappeared...

2009-08-17 Thread Funtick


But how do you explain that within an hour (after a commit) I got about
500,000 new documents, while within the first 30 hours (after a commit) I got
only 1,300,000?

Same _random_enough_ documents...

BTW, the SOLR Console was showing only a few hundred deletesById, although I
don't use any deleteById explicitly; only update with allowOverwrite and
uniqueId.




markrmiller wrote:
 
 I'd say you have a lot of documents that have the same id.
 When you add a doc with the same id, first the old one is deleted, then
 the
 new one is added (atomically though).
 
 The deleted docs are not removed from the index immediately though - the
 doc
 id is just marked as deleted.
 
 Over time though, as segments are merged due to hitting triggers while
 adding new documents, deletes are removed (which deletes depends on which
 segments have been merged).
 
 So if you add a tone of documents over time, many with the same ids, you
 would likely see this type of maxDoc, numDoc churn. maxDoc will include
 deleted docs while numDoc will not.
 
 
 -- 
 - Mark
 
 http://www.lucidimagination.com
 
 On Mon, Aug 17, 2009 at 11:09 PM, Funtick f...@efendi.ca wrote:
 

 After running an application which heavily uses MD5 HEX-representation as
 uniqueKey for SOLR v.1.4-dev-trunk:

 1. After 30 hours:
 101,000,000 documents added

 2. Commit:
 numDocs = 783,714
 maxDoc = 3,975,393

 3. Upload new docs to SOLR during 1 hour(!!!), then commit, then
 optimize:
 numDocs=1,281,851
 maxDocs=1,281,851

 It looks _extremely_ strange that within an hour I have such a huge
 increase
 with same 'average' document set...

 I am suspecting something goes wrong with Lucene buffer flush / index
 merge
 OR SOLR - Unique ID handling...

 According to my own estimates, I should have about 10,000,000 new
 documents
 now... I had 0.5 millions within an hour, and 0.8 mlns within a day; same
 'random' documents.

 This morning index size was about 4Gb, then suddenly dropped below 0.5
 Gb.
 Why? I haven't issued any commit...

 I am using ramBufferMB=8192






 --
 View this message in context:
 http://www.nabble.com/SOLR-%3CuniqueKey%3E---extremely-strange-behavior%21-Documents-disappeared...-tp25017728p25017728.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 

-- 
View this message in context: 
http://www.nabble.com/SOLR-%3CuniqueKey%3E---extremely-strange-behavior%21-Documents-disappeared...-tp25017728p25017826.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: SOLR uniqueKey - extremely strange behavior! Documents disappeared...

2009-08-17 Thread Funtick

One more hour, and I have about 0.5 million more (after commit/optimize).

Something strange is happening with the SOLR buffer flush (if we have a single
segment???)... an explicit commit prevents it...

30 hours, with index flush, commit: 783,714
+ 1 hour, commit, optimize: 1,281,851
+ 1 hour, commit, optimize: 1,786,552

Same random docs retrieved from the web...



Funtick wrote:
 
 
 But how to explain that within an hour (after commit) I have had about
 500,000 new documents, and within 30 hours (after commit) only 783,714?
 
 Same _random_enough_ documents... 
 
 BTW, SOLR Console was showing only few hundreds deletesById although I
 don't use any deleteById explicitly; only update with allowOverwrite
 and uniqueId.
 
 
 
 
 markrmiller wrote:
 
 I'd say you have a lot of documents that have the same id.
 When you add a doc with the same id, first the old one is deleted, then
 the
 new one is added (atomically though).
 
 The deleted docs are not removed from the index immediately though - the
 doc
 id is just marked as deleted.
 
 Over time though, as segments are merged due to hitting triggers while
 adding new documents, deletes are removed (which deletes depends on which
 segments have been merged).
 
 So if you add a tone of documents over time, many with the same ids, you
 would likely see this type of maxDoc, numDoc churn. maxDoc will include
 deleted docs while numDoc will not.
 
 
 -- 
 - Mark
 
 http://www.lucidimagination.com
 
 On Mon, Aug 17, 2009 at 11:09 PM, Funtick f...@efendi.ca wrote:
 

 After running an application which heavily uses MD5 HEX-representation
 as
 uniqueKey for SOLR v.1.4-dev-trunk:

 1. After 30 hours:
 101,000,000 documents added

 2. Commit:
 numDocs = 783,714
 maxDoc = 3,975,393

 3. Upload new docs to SOLR during 1 hour(!!!), then commit, then
 optimize:
 numDocs=1,281,851
 maxDocs=1,281,851

 It looks _extremely_ strange that within an hour I have such a huge
 increase
 with same 'average' document set...

 I am suspecting something goes wrong with Lucene buffer flush / index
 merge
 OR SOLR - Unique ID handling...

 According to my own estimates, I should have about 10,000,000 new
 documents
 now... I had 0.5 millions within an hour, and 0.8 mlns within a day;
 same
 'random' documents.

 This morning index size was about 4Gb, then suddenly dropped below 0.5
 Gb.
 Why? I haven't issued any commit...

 I am using ramBufferMB=8192






 --
 View this message in context:
 http://www.nabble.com/SOLR-%3CuniqueKey%3E---extremely-strange-behavior%21-Documents-disappeared...-tp25017728p25017728.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 
 
 

-- 
View this message in context: 
http://www.nabble.com/SOLR-%3CuniqueKey%3E---extremely-strange-behavior%21-Documents-disappeared...-tp25017728p25017967.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: SOLR uniqueKey - extremely strange behavior! Documents disappeared...

2009-08-17 Thread Funtick

UPDATE:

After a few more minutes (after the previous commit):
docsPending: about 7,000,000

After commit:
numDocs: 2,297,231

Increase = 2,297,231 - 1,281,851 = 1,000,000 (average)

So I have about 7 docs with the same ID on average.

Having 100,000,000 adds and then dropping below 1,000,000 documents is strange;
it is a bug somewhere... I need to investigate ramBufferSize and MergePolicy,
including the SOLR uniqueId implementation...



Funtick wrote:
 
 After running an application which heavily uses MD5 HEX-representation as
 uniqueKey for SOLR v.1.4-dev-trunk:
 
 1. After 30 hours: 
 101,000,000 documents added
 
 2. Commit: 
 numDocs = 783,714 
 maxDoc = 3,975,393
 
 3. Upload new docs to SOLR during 1 hour(!!!), then commit, then
 optimize:
 numDocs=1,281,851
 maxDocs=1,281,851
 
 It looks _extremely_ strange that within an hour I have such a huge
 increase with same 'average' document set...
 
 I am suspecting something goes wrong with Lucene buffer flush / index
 merge OR SOLR - Unique ID handling...
 
 According to my own estimates, I should have about 10,000,000 new
 documents now... I had 0.5 millions within an hour, and 0.8 mlns within a
 day; same 'random' documents.
 
 This morning index size was about 4Gb, then suddenly dropped below 0.5 Gb.
 Why? I haven't issued any commit...
 
 I am using ramBufferMB=8192
 
 
 
 
 
 
 

-- 
View this message in context: 
http://www.nabble.com/SOLR-%3CuniqueKey%3E---extremely-strange-behavior%21-Documents-disappeared...-tp25017728p25018221.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: SOLR uniqueKey - extremely strange behavior! Documents disappeared...

2009-08-17 Thread Funtick

sorry for typo in prev msg,

Increase = 2,297,231 - 1,786,552  = 500,000 (average)

RATE (non-unique-id:unique-id) = 7,000,000 : 500,000 = 14:1

but 125:1 (initial 30 hours) was very strange...



Funtick wrote:
 
 UPDATE:
 
 After few more minutes (after previous commit):
 docsPending: about 7,000,000
 
 After commit:
 numDocs: 2,297,231
 
 Increase = 2,297,231 - 1,281,851 = 1,000,000 (average)
 
 So that I have 7 docs with same ID in average.
 
 Having 100,000,000 and then dropping below 1,000,000 is strange; it is a
 bug somewhere... need to investigate ramBufferSize and MergePolicy,
 including SOLR uniqueId implementation...
 
 
 
 Funtick wrote:
 
 After running an application which heavily uses MD5 HEX-representation as
 uniqueKey for SOLR v.1.4-dev-trunk:
 
 1. After 30 hours: 
 101,000,000 documents added
 
 2. Commit: 
 numDocs = 783,714 
 maxDoc = 3,975,393
 
 3. Upload new docs to SOLR during 1 hour(!!!), then commit, then
 optimize:
 numDocs=1,281,851
 maxDocs=1,281,851
 
 It looks _extremely_ strange that within an hour I have such a huge
 increase with same 'average' document set...
 
 I am suspecting something goes wrong with Lucene buffer flush / index
 merge OR SOLR - Unique ID handling...
 
 According to my own estimates, I should have about 10,000,000 new
 documents now... I had 0.5 millions within an hour, and 0.8 mlns within a
 day; same 'random' documents.
 
 This morning index size was about 4Gb, then suddenly dropped below 0.5
 Gb. Why? I haven't issued any commit...
 
 I am using ramBufferMB=8192
 
 
 
 
 
 
 
 
 

-- 
View this message in context: 
http://www.nabble.com/SOLR-%3CuniqueKey%3E---extremely-strange-behavior%21-Documents-disappeared...-tp25017728p25018263.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Strange behavior

2008-02-12 Thread Yonik Seeley
On Feb 12, 2008 9:50 AM, Traut [EMAIL PROTECTED] wrote:
 Thank you, it works. Stemming filter works only with lowercased words?

I've never tried it in the order you have it.
You could try the analysis admin page and report back what happens...

-Yonik


 On Feb 12, 2008 4:29 PM, Yonik Seeley [EMAIL PROTECTED] wrote:

  Try putting the stemmer after the lowercase filter.
  -Yonik
 
  On Feb 12, 2008 9:15 AM, Traut [EMAIL PROTECTED] wrote:
   Hi all
  
   Please take a look at this strange behavior (connected with stemming I
   suppose):
  
  
   type:
  
   <fieldtype name="customTextField" class="solr.TextField" indexed="true"
     stored="false">
     <analyzer type="query">
       <tokenizer class="solr.StandardTokenizerFactory"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
       <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
       <filter class="solr.LowerCaseFilterFactory"/>
     </analyzer>
     <analyzer type="index">
       <tokenizer class="solr.StandardTokenizerFactory"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
       <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
       <filter class="solr.LowerCaseFilterFactory"/>
     </analyzer>
   </fieldtype>

   field:

   <field name="name" type="customTextField" indexed="true" stored="false"/>
  
  
  
   I'm adding a document:
  
   <add><doc><field name="id">99</field><field name="name">Apple</field></doc></add>

   <commit/>
  
  
   Queriyng name:apple - 0 results. Searching name:Apple - 1 result.
  But
   name:appl* - 1 result
  
  
   Adding next document:
  
   <add><doc><field name="id">8</field><field name="name">Somenamele</field></doc></add>

   <commit/>
  
  
   Searching for name:somenamele - 1 result, for name:Somenamele - 1
  result
  
  
   What is the problem with Apple ? Maybe StandardTokenizer understands
  it as
   trademark :) ?
  
  
   Thank you in advence
  
  
   --
   Best regards,
   Traut
  
 



 --
 Best regards,
 Traut
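
The reordered chain suggested above looks roughly like this - the same
factories from Traut's type, with LowerCaseFilterFactory moved ahead of the
stemmer (the query analyzer should be reordered the same way):

  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
  </analyzer>

With the stemmer first, "Apple" apparently passed through unstemmed at index
time and only then became "apple", while the query term "apple" was stemmed to
"appl", so the two sides never produced the same token. Lowercasing before
stemming makes both sides agree.
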



Strange behavior

2008-02-12 Thread Traut
Hi all

Please take a look at this strange behavior (connected with stemming I
suppose):


type:

<fieldtype name="customTextField" class="solr.TextField" indexed="true"
  stored="false">
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>

field:

<field name="name" type="customTextField" indexed="true" stored="false"/>


I'm adding a document:

<add><doc><field name="id">99</field><field name="name">Apple</field></doc></add>

<commit/>


Querying name:apple - 0 results. Searching name:Apple - 1 result. But
name:appl* - 1 result.


Adding the next document:

<add><doc><field name="id">8</field><field name="name">Somenamele</field></doc></add>

<commit/>


Searching for name:somenamele - 1 result, for name:Somenamele - 1 result.


What is the problem with "Apple"? Maybe StandardTokenizer understands it as
a trademark :)?


Thank you in advance


-- 
Best regards,
Traut


Re: Strange behavior

2008-02-12 Thread Traut
Thank you, it works. Does the stemming filter work only with lowercased words?

On Feb 12, 2008 4:29 PM, Yonik Seeley [EMAIL PROTECTED] wrote:

 Try putting the stemmer after the lowercase filter.
 -Yonik

 On Feb 12, 2008 9:15 AM, Traut [EMAIL PROTECTED] wrote:
  Hi all
 
  Please take a look at this strange behavior (connected with stemming I
  suppose):
 
 
  type:
 
  <fieldtype name="customTextField" class="solr.TextField" indexed="true"
    stored="false">
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldtype>

  field:

  <field name="name" type="customTextField" indexed="true" stored="false"/>
 
 
 
  I'm adding a document:
 
  <add><doc><field name="id">99</field><field name="name">Apple</field></doc></add>

  <commit/>
 
 
  Queriyng name:apple - 0 results. Searching name:Apple - 1 result.
 But
  name:appl* - 1 result
 
 
  Adding next document:
 
  <add><doc><field name="id">8</field><field name="name">Somenamele</field></doc></add>

  <commit/>
 
 
  Searching for name:somenamele - 1 result, for name:Somenamele - 1
 result
 
 
  What is the problem with Apple ? Maybe StandardTokenizer understands
 it as
  trademark :) ?
 
 
  Thank you in advence
 
 
  --
  Best regards,
  Traut
 




-- 
Best regards,
Traut


Re: Strange behavior

2008-02-12 Thread Yonik Seeley
Try putting the stemmer after the lowercase filter.
-Yonik

On Feb 12, 2008 9:15 AM, Traut [EMAIL PROTECTED] wrote:
 Hi all

 Please take a look at this strange behavior (connected with stemming I
 suppose):


 type:

 <fieldtype name="customTextField" class="solr.TextField" indexed="true"
   stored="false">
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldtype>

 field:

 <field name="name" type="customTextField" indexed="true" stored="false"/>



 I'm adding a document:

 <add><doc><field name="id">99</field><field name="name">Apple</field></doc></add>

 <commit/>


 Queriyng name:apple - 0 results. Searching name:Apple - 1 result. But
 name:appl* - 1 result


 Adding next document:

 <add><doc><field name="id">8</field><field name="name">Somenamele</field></doc></add>

 <commit/>


 Searching for name:somenamele - 1 result, for name:Somenamele - 1 result


 What is the problem with Apple ? Maybe StandardTokenizer understands it as
 trademark :) ?


 Thank you in advence


 --
 Best regards,
 Traut



Re: Strange behavior MoreLikeThis Feature

2007-11-22 Thread Ryan McKinley


Now when I run the following query:
http://localhost:8080/solr/mlt?q=id:neardup06&mlt.fl=features&mlt.mindf=1&mlt.mintf=1&mlt.displayTerms=details&wt=json&indent=on



try adding:
 debugQuery=on

to your query string and you can see why each document matches...

My guess is that features uses a text field with stemming and a 
stemmed word matches


ryan
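
Concretely, that means re-running the same MLT request with the debug flag
appended (same host, core, and parameters as in the original mail):

  http://localhost:8080/solr/mlt?q=id:neardup06&mlt.fl=features&mlt.mindf=1&mlt.mintf=1&debugQuery=on&wt=json&indent=on

The debug/explain section should then show, term by term, why each similar
document was considered a match.
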


Re: Strange behavior MoreLikeThis Feature

2007-11-22 Thread Rishabh Joshi
Thanks Ryan. I now know the reason why.
Before I explain the reason, let me correct the mistake I made in my earlier
mail. I was not using the first document mentioned in the XML. Instead it
was this one:
<doc>
  <field name="id">IW-02</field>
  <field name="name">iPod &amp; iPod Mini USB 2.0 Cable</field>
  <field name="manu">Belkin</field>
  <field name="cat">electronics</field>
  <field name="cat">connector</field>
  <field name="features">car power adapter for iPod, white</field>
  <field name="weight">2</field>
  <field name="price">11.50</field>
  <field name="popularity">1</field>
  <field name="inStock">false</field>
</doc>

The reason I was getting strange results was the character "i".
Here is what I learnt from the debug info:

debug:{
  rawquerystring:id:neardup06,
  querystring:id:neardup06,
  parsedquery:features:og features:en features:til features:er
features:af features:der features:ts features:se features:i features:p
features:pet features:brag features:efter features:zombier features:k
features:tilbag features:ala features:sviner features:folk
features:klassisk features:resid features:horder features:lidt
features:man features:denn,
  parsedquery_toString:features:og features:en features:til
features:er features:af features:der features:ts features:se
features:i features:p features:pet features:brag features:efter
features:zombier features:k features:tilbag features:ala
features:sviner features:folk features:klassisk features:resid
features:horder features:lidt features:man features:denn,
  explain:{
id=IW-02,internal_docid=8:\n0.0050230525 = (MATCH) product of:\n
0.12557632 = (MATCH) sum of:\n0.12557632 = (MATCH)
weight(features:i in 8), product of:\n  0.17474915 =
queryWeight(features:i), product of:\n1.9162908 =
idf(docFreq=3)\n0.09119135 = queryNorm\n  0.71860904 =
(MATCH) fieldWeight(features:i in 8), product of:\n1.0 =
tf(termFreq(features:i)=1)\n1.9162908 = idf(docFreq=3)\n
 0.375 = fieldNorm(field=features, doc=8)\n  0.04 = coord(1/25)\n}}}

The "features" field uses the default fieldtype - "text" - in the schema.xml.
The problem was solved by adding the character "i" to the stopwords.txt file:
the "i" tokens in document 2 were being matched with the "i" in "iPod" of
document 1.

I still have to figure out why a single character - "i" - matched the "i" in
a word - "iPod".

Regards,
Rishabh
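
One likely answer, assuming the stock example "text" type of that Solr
release: its index analyzer includes WordDelimiterFilterFactory with
splitOnCaseChange="1", which splits "iPod" at the case change into the
sub-tokens "i" and "Pod" (plus the catenated "iPod"), so the lone "i" tokens
in document 2's features text can match the "i" sub-token of "iPod" in
document 1:

  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
          generateNumberParts="1" catenateWords="1" catenateNumbers="1"
          catenateAll="0" splitOnCaseChange="1"/>

Adding "i" to stopwords.txt hides the symptom; using a MoreLikeThis field type
without splitOnCaseChange, or raising mlt.mintf/mlt.mindf, avoids it at the
source.
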

On 22/11/2007, Ryan McKinley [EMAIL PROTECTED] wrote:

 
  Now when I run the following query:
 
 http://localhost:8080/solr/mlt?q=id:neardup06&mlt.fl=features&mlt.mindf=1&mlt.mintf=1&mlt.displayTerms=details&wt=json&indent=on
 

 try adding:
   debugQuery=on

 to your query string and you can see why each document matches...

 My guess is that features uses a text field with stemming and a
 stemmed word matches

 ryan


