Solr 4.10 SSL with Sharded Collection

2017-10-03 Thread Tarala, Magesh
We have a 3 node solr cloud installation running on version 4.10. There is one 
collection that's sharded. After enabling SSL, we are unable to query the 
sharded collection. Getting this error:


"no servers hosting shard:"


I've googled and seen reports of this issue, but have not seen a resolution.

Thanks in advance for your help!


RE: Scramble data

2015-10-08 Thread Tarala, Magesh
I already have the data ingested and it takes several days to do that. I was 
trying to avoid re-ingesting the data. 

Thanks,
Magesh

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Wednesday, October 07, 2015 9:26 PM
To: solr-user@lucene.apache.org
Subject: Re: Scramble data

Probably sanitize the data on the front end? Something simple like put 
"REDACTED" for all of the customer-sensitive fields.

You might also write a DocTransformer plugin; all you have to do is subclass 
DocTransformer and override one very simple "transform" method.

Best,
Erick
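
A minimal sketch of the front-end sanitizing idea above, in Python rather than .NET purely for illustration. It assumes the standard Solr JSON response shape (response -> docs); the field names in SENSITIVE_FIELDS and the sample document are hypothetical and would need to match your own schema.

```python
import copy

# Hypothetical list of customer-sensitive fields; adjust to your schema.
SENSITIVE_FIELDS = {"contact", "sold_to_party"}


def redact_docs(response, sensitive_fields=SENSITIVE_FIELDS, placeholder="REDACTED"):
    """Return a copy of a Solr JSON response with sensitive values replaced."""
    cleaned = copy.deepcopy(response)  # leave the original response untouched
    for doc in cleaned.get("response", {}).get("docs", []):
        for field in sensitive_fields:
            if field in doc:
                value = doc[field]
                # Multi-valued fields come back as lists; keep the same shape.
                if isinstance(value, list):
                    doc[field] = [placeholder] * len(value)
                else:
                    doc[field] = placeholder
    return cleaned


# Made-up sample response, for illustration only.
sample = {
    "response": {
        "numFound": 1,
        "docs": [
            {
                "id": "1",
                "contact": "Jane Doe",
                "sold_to_party": ["Acme"],
                "description": "jackshaft",
            }
        ],
    }
}
redacted = redact_docs(sample)
```

Because the redaction happens after the query, the index itself stays as it is and no re-ingestion is needed.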

On Wed, Oct 7, 2015 at 5:09 PM, Tarala, Magesh  wrote:
> Folks,
> I have a strange question. We have a Solr implementation that we would like 
> to demo to external customers. But we don't want to display the real data, 
> which contains our customer information and so is sensitive data. What's the 
> best way to scramble the data of the Solr Query results? By best I mean the 
> simplest way with least amount of work. BTW, we have a .NET front end 
> application.
>
> Thanks,
> Magesh
>
>
>


Scramble data

2015-10-07 Thread Tarala, Magesh
Folks,
I have a strange question. We have a Solr implementation that we would like to 
demo to external customers. But we don't want to display the real data, which 
contains our customer information and so is sensitive data. What's the best way 
to scramble the data of the Solr Query results? By best I mean the simplest way 
with least amount of work. BTW, we have a .NET front end application.

Thanks,
Magesh





Solr Log Analysis

2015-09-23 Thread Tarala, Magesh
I'm using Solr 4.10.4 in a 3 node cloud setup. I have 3 shards and 3 replicas 
for the collection.

I want to analyze the logs to extract the queries and query times. Is there a 
tool or script someone has created already for this?

Thanks,
Magesh
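
A rough sketch of the kind of script asked about here, not an existing tool. It assumes request log lines shaped like the SolrCore INFO line quoted elsewhere in this archive, and it further assumes each line ends with hits=, status= and QTime= fields (the quoted sample is cut off before that point). The sample line and its numbers are made up.

```python
import re
from urllib.parse import parse_qs

# Assumed layout of a Solr 4.x request log line; adjust if your log4j pattern differs.
LOG_LINE = re.compile(
    r"^INFO\s+-\s+(?P<timestamp>\S+ \S+);\s+\S+;\s+"
    r"\[(?P<core>[^\]]+)\]\s+webapp=\S+\s+path=(?P<path>\S+)\s+"
    r"params=\{(?P<params>.*)\}\s+hits=(?P<hits>\d+)\s+"
    r"status=(?P<status>\d+)\s+QTime=(?P<qtime>\d+)"
)


def parse_query_times(lines):
    """Extract the core, q parameter and QTime (ms) from matching log lines."""
    results = []
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # not a request line, skip it
        params = parse_qs(match.group("params"))
        results.append(
            {
                "timestamp": match.group("timestamp"),
                "core": match.group("core"),
                "q": params.get("q", [""])[0],
                "qtime_ms": int(match.group("qtime")),
            }
        )
    return results


# Illustrative input: one request line and one unrelated line.
sample_lines = [
    "INFO  - 2015-07-23 16:56:09.683; org.apache.solr.core.SolrCore; "
    "[serviceorder_shard1_replica2] webapp=/solr path=/select "
    "params={q=description:jackshaft&rows=10} hits=42 status=0 QTime=17",
    "INFO  - 2015-07-23 16:56:10.001; some.other.Logger; unrelated message",
]
parsed = parse_query_times(sample_lines)
```

In a sharded setup each node logs its own per-shard requests, so the output would need to be grouped or filtered (for example on path or on the distrib/isShard parameters) before drawing conclusions about end-user query times.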


RE: Solr Cloud Security Question

2015-08-16 Thread Tarala, Magesh
Thanks Shawn!

We are on 4.10.4. Will consider 5.x upgrade shortly. 


-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Sunday, August 16, 2015 9:05 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Cloud Security Question

On 8/16/2015 12:09 PM, Tarala, Magesh wrote:
> I have a solr cloud with 3 nodes. I've added password protection 
> following the steps here: 
> http://stackoverflow.com/questions/28043957/how-to-set-apache-solr-adm
> in-password
> 
> Now only one node is able to load the collections. The others are getting 401 
> Unauthorized error when loading the collections.
> 
> Could anybody provide the instructions to configure security for solr cloud?

Authentication and SolrCloud do not work well together, unless it's 
client-certificate-based authentication (SSL).  This is because there is 
currently no way to tell Solr what user/pass to use when making requests to 
another node.

This was one of the early issues trying to solve the problem with user/pass 
authentication and inter-node requests:

https://issues.apache.org/jira/browse/SOLR-4470

That issue is now closed as a duplicate, because Solr 5.3 will have an 
authentication/authorization framework, and basic authentication is one of the 
first things that has been implemented using that framework:

https://issues.apache.org/jira/browse/SOLR-7692

The release process for 5.3 is underway now.  If all goes well, the release 
will happen well before the end of August.

Thanks,
Shawn



Solr Cloud Security Question

2015-08-16 Thread Tarala, Magesh
I have a solr cloud with 3 nodes. I've added password protection following the 
steps here: 
http://stackoverflow.com/questions/28043957/how-to-set-apache-solr-admin-password

Now only one node is able to load the collections. The others are getting 401 
Unauthorized error when loading the collections.

Could anybody provide the instructions to configure security for solr cloud?

Thanks,
Magesh




RE: Duplicate Documents

2015-08-05 Thread Tarala, Magesh
I deleted the index and re-indexed. Duplicates went away. Have not identified 
root cause, but looks like updating documents is causing it sporadically. Going 
to try deleting the document and then update. 
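
To check whether duplicates reappear after the next update run, something like the following sketch could be applied to the docs returned by a query that fetches id and _version_. The sample documents and version numbers are made up.

```python
from collections import Counter


def find_duplicate_ids(docs, key="id"):
    """Return {id: count} for every id that occurs more than once."""
    counts = Counter(doc[key] for doc in docs)
    return {doc_id: n for doc_id, n in counts.items() if n > 1}


# Made-up query results: two documents share the id "A1".
docs = [
    {"id": "A1", "_version_": 1508000000000000001},
    {"id": "B2", "_version_": 1508000000000000002},
    {"id": "A1", "_version_": 1508000000000000003},
]
dupes = find_duplicate_ids(docs)
```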


-Original Message-
From: Tarala, Magesh 
Sent: Monday, August 03, 2015 8:27 AM
To: solr-user@lucene.apache.org
Subject: Duplicate Documents

I'm using solr 4.10.2. I'm using "id" field as the unique key - it is passed in 
with the document when ingesting the documents into solr. When querying I get 
duplicate documents with different "_version_". Out of approx. 25K unique 
documents ingested into solr, I see approx. 300 duplicates.

It is a 3 node solr cloud with one shard and 2 replicas.
I'm also using nested documents.

Thanks in advance for any insights.

--Magesh



Duplicate Documents

2015-08-03 Thread Tarala, Magesh
I'm using solr 4.10.2. I'm using "id" field as the unique key - it is passed in 
with the document when ingesting the documents into solr. When querying I get 
duplicate documents with different "_version_". Out of approx. 25K unique 
documents ingested into solr, I see approx. 300 duplicates.

It is a 3 node solr cloud with one shard and 2 replicas.
I'm also using nested documents.

Thanks in advance for any insights.

--Magesh



RE: StandardTokenizerFactory and WhitespaceTokenizerFactory

2015-07-30 Thread Tarala, Magesh
I'm adding PatternReplaceCharFilterFactory to exclude characters. Looks like 
this works. 
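
The idea can be simulated outside Solr. This is only a sketch of the behaviour, not Solr's implementation: a regex stands in for PatternReplaceCharFilterFactory and str.split() stands in for WhitespaceTokenizerFactory. The punctuation set and the part number 222-333-444 are illustrative assumptions.

```python
import re

# Punctuation to blank out before tokenizing; hyphens are deliberately kept.
PUNCTUATION = re.compile(r"[,.;:!?]")


def analyze(text):
    """Replace punctuation with spaces, then split on whitespace."""
    filtered = PUNCTUATION.sub(" ", text)
    return filtered.split()


tokens = analyze("Replace wheel, part 222-333-444.")
```

Here "wheel," is indexed as "wheel" while the part number stays in one piece. One thing to watch: a pattern like this also splits on periods inside a token (for example 1.5 or a part number such as AB.123), so the pattern may need to be narrowed to punctuation followed by whitespace or end of input.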

-Original Message-
From: Tarala, Magesh 
Sent: Thursday, July 30, 2015 10:37 AM
To: solr-user@lucene.apache.org
Subject: RE: StandardTokenizerFactory and WhitespaceTokenizerFactory

Would using PatternReplaceCharFilterFactory to replace comma, period, etc. with 
a space or an empty string work?

-Original Message-
From: Tarala, Magesh 
Sent: Thursday, July 30, 2015 10:08 AM
To: solr-user@lucene.apache.org
Subject: StandardTokenizerFactory and WhitespaceTokenizerFactory

I am indexing text that contains part numbers in various formats that contain 
hyphens/dashes, and a few other special characters.

Here's the problem: If I use StandardTokenizerFactory, the hyphens, etc. are 
stripped and so I cannot search by the part number 222-333-. I can only 
search for 222 or 333 or 444.
If I use the WhitespaceTokenizerFactory instead, I can search part numbers, but 
I'm not able to search words if they are followed by punctuation such as a 
comma or period. Example: wheel,

Should I use copy fields with different tokenizers and then, during the 
search, pick the field based on the search string? Any other options?





RE: StandardTokenizerFactory and WhitespaceTokenizerFactory

2015-07-30 Thread Tarala, Magesh
Would using PatternReplaceCharFilterFactory to replace comma, period, etc. with 
a space or an empty string work?

-Original Message-
From: Tarala, Magesh 
Sent: Thursday, July 30, 2015 10:08 AM
To: solr-user@lucene.apache.org
Subject: StandardTokenizerFactory and WhitespaceTokenizerFactory

I am indexing text that contains part numbers in various formats that contain 
hyphens/dashes, and a few other special characters.

Here's the problem: If I use StandardTokenizerFactory, the hyphens, etc. are 
stripped and so I cannot search by the part number 222-333-. I can only 
search for 222 or 333 or 444.
If I use the WhitespaceTokenizerFactory instead, I can search part numbers, but 
I'm not able to search words if they are followed by punctuation such as a 
comma or period. Example: wheel,

Should I use copy fields with different tokenizers and then, during the 
search, pick the field based on the search string? Any other options?





StandardTokenizerFactory and WhitespaceTokenizerFactory

2015-07-30 Thread Tarala, Magesh
I am indexing text that contains part numbers in various formats that contain 
hyphens/dashes, and a few other special characters.

Here's the problem: If I use StandardTokenizerFactory, the hyphens, etc. are 
stripped and so I cannot search by the part number 222-333-. I can only 
search for 222 or 333 or 444.
If I use the WhitespaceTokenizerFactory instead, I can search part numbers, but 
I'm not able to search words if they are followed by punctuation such as a 
comma or period. Example: wheel,

Should I use copy fields with different tokenizers and then, during the 
search, pick the field based on the search string? Any other options?





RE: Different scores for the same search

2015-07-23 Thread Tarala, Magesh
 = fieldNorm(doc=10803)\n",
  "874085_HDR":"\n4.4176526 = (MATCH) weight(description:jackshaft in 
16291) [DefaultSimilarity], result of:\n  4.4176526 = fieldWeight in 16291, 
product of:\n1.0 = tf(freq=1.0), with freq of:\n  1.0 = termFreq=1.0\n  
  8.835305 = idf(docFreq=12, maxDocs=32868)\n0.5 = fieldNorm(doc=16291)\n",
  "877118_HDR":"\n4.4176526 = (MATCH) weight(description:jackshaft in 
22106) [DefaultSimilarity], result of:\n  4.4176526 = fieldWeight in 22106, 
product of:\n1.0 = tf(freq=1.0), with freq of:\n  1.0 = termFreq=1.0\n  
  8.835305 = idf(docFreq=12, maxDocs=32868)\n0.5 = fieldNorm(doc=22106)\n",
  "883634_HDR":"\n4.393703 = (MATCH) weight(description:jackshaft in 
24564) [DefaultSimilarity], result of:\n  4.393703 = fieldWeight in 24564, 
product of:\n1.0 = tf(freq=1.0), with freq of:\n  1.0 = termFreq=1.0\n  
  10.042749 = idf(docFreq=3, maxDocs=33828)\n0.4375 = 
fieldNorm(doc=24564)\n",
  "880146_HDR":"\n4.32111 = (MATCH) weight(description:jackshaft in 
7010) [DefaultSimilarity], result of:\n  4.32111 = score(doc=7010,freq=1.0 = 
termFreq=1.0\n), product of:\n0.9994 = queryWeight, product of:\n  
9.876823 = idf(docFreq=4, maxDocs=35820)\n  0.101247124 = queryNorm\n
4.3211102 = fieldWeight in 7010, product of:\n  1.0 = tf(freq=1.0), with 
freq of:\n1.0 = termFreq=1.0\n  9.876823 = idf(docFreq=4, 
maxDocs=35820)\n  0.4375 = fieldNorm(doc=7010)\n",
  "877317_HDR":"\n3.865446 = (MATCH) weight(description:jackshaft in 
21143) [DefaultSimilarity], result of:\n  3.865446 = fieldWeight in 21143, 
product of:\n1.0 = tf(freq=1.0), with freq of:\n  1.0 = termFreq=1.0\n  
  8.835305 = idf(docFreq=12, maxDocs=32868)\n0.4375 = 
fieldNorm(doc=21143)\n",
  "886520_HDR":"\n3.865446 = (MATCH) weight(description:jackshaft in 
2674) [DefaultSimilarity], result of:\n  3.865446 = fieldWeight in 2674, 
product of:\n1.0 = tf(freq=1.0), with freq of:\n  1.0 = termFreq=1.0\n  
  8.835305 = idf(docFreq=12, maxDocs=32868)\n0.4375 = 
fieldNorm(doc=2674)\n",
  "879606_HDR":"\n3.766031 = (MATCH) weight(description:jackshaft in 
979) [DefaultSimilarity], result of:\n  3.766031 = fieldWeight in 979, product 
of:\n1.0 = tf(freq=1.0), with freq of:\n  1.0 = termFreq=1.0\n
10.042749 = idf(docFreq=3, maxDocs=33828)\n0.375 = fieldNorm(doc=979)\n",
  "802721_HDR":"\n3.3132396 = (MATCH) weight(description:jackshaft in 
23662) [DefaultSimilarity], result of:\n  3.3132396 = fieldWeight in 23662, 
product of:\n1.0 = tf(freq=1.0), with freq of:\n  1.0 = termFreq=1.0\n  
  8.835305 = idf(docFreq=12, maxDocs=32868)\n0.375 = 
fieldNorm(doc=23662)\n",
  "875853_HDR":"\n3.3132396 = (MATCH) weight(description:jackshaft in 
11654) [DefaultSimilarity], result of:\n  3.3132396 = fieldWeight in 11654, 
product of:\n1.0 = tf(freq=1.0), with freq of:\n  1.0 = termFreq=1.0\n  
  8.835305 = idf(docFreq=12, maxDocs=32868)\n0.375 = 
fieldNorm(doc=11654)\n",
  "878628_HDR":"\n3.3132396 = (MATCH) weight(description:jackshaft in 
1141) [DefaultSimilarity], result of:\n  3.3132396 = fieldWeight in 1141, 
product of:\n1.0 = tf(freq=1.0), with freq of:\n  1.0 = termFreq=1.0\n  
  8.835305 = idf(docFreq=12, maxDocs=32868)\n0.375 = fieldNorm(doc=1141)\n",
  "880537_HDR":"\n3.3132396 = (MATCH) weight(description:jackshaft in 
7821) [DefaultSimilarity], result of:\n  3.3132396 = fieldWeight in 7821, 
product of:\n1.0 = tf(freq=1.0), with freq of:\n  1.0 = termFreq=1.0\n  
  8.835305 = idf(docFreq=12, maxDocs=32868)\n0.375 = fieldNorm(doc=7821)\n",
  "884278_HDR":"\n3.3132396 = (MATCH) weight(description:jackshaft in 
21496) [DefaultSimilarity], result of:\n  3.3132396 = fieldWeight in 21496, 
product of:\n1.0 = tf(freq=1.0), with freq of:\n  1.0 = termFreq=1.0\n  
  8.835305 = idf(docFreq=12, maxDocs=32868)\n0.375 = 
fieldNorm(doc=21496)\n",
  "885893_HDR":"\n3.3132396 = (MATCH) weight(description:jackshaft in 
26120) [DefaultSimilarity], result of:\n  3.3132396 = fieldWeight in 26120, 
product of:\n1.0 = tf(freq=1.0), with freq of:\n  1.0 = termFreq=1.0\n  
  8.835305 = idf(docFreq=12, maxDocs=32868)\n0.375 = 
fieldNorm(doc=26120)\n"}}}


-Original Message-
From: Upayavira [mailto:u...@odoko.co.uk] 
Sent: Thursday, July 23, 2015 2:36 PM
To: solr-user@lucene.apache.org
Subject: Re: Different scores for the same search

What it looks like is kinda as Erick suggested - the scores are the same for 
some docs, so it probably depends upon which order they come back from the 
shards as to which wi

Different scores for the same search

2015-07-23 Thread Tarala, Magesh
I'm executing a very simple search in a 3 node cluster - 3 shards with 1 
replica each. Solr version 4.10.2:
http://server1.domain.com:8983/solr/serviceorder_shard1_replica2/select?q=description%3Ajackshaft&fl=service_order&wt=json&indent=true&debugQuery=true

I'm getting different scores when I run them several times. This brings the 
results in different order. How can I ensure the same search returns the 
results with consistent rank?


First time I run the query:

"884420_HDR":"\n6.357613 = (MATCH) weight(description:jackshaft in 3339) 
[DefaultSimilarity], result of:\n  6.357613 = fieldWeight in 3339, product 
of:\n1.0 = tf(freq=1.0), with freq of:\n  1.0 = termFreq=1.0\n
10.172181 = idf(docFreq=4, maxDocs=48128)\n0.625 = fieldNorm(doc=3339)\n",
  "882606_HDR":"\n5.53756 = (MATCH) weight(description:jackshaft in 
5663) [DefaultSimilarity], result of:\n  5.53756 = fieldWeight in 5663, product 
of:\n1.0 = tf(freq=1.0), with freq of:\n  1.0 = termFreq=1.0\n
8.860096 = idf(docFreq=12, maxDocs=33693)\n0.625 = fieldNorm(doc=5663)\n",
  "881351_HDR":"\n5.53756 = (MATCH) weight(description:jackshaft in 
10434) [DefaultSimilarity], result of:\n  5.53756 = fieldWeight in 10434, 
product of:\n1.0 = tf(freq=1.0), with freq of:\n  1.0 = termFreq=1.0\n  
  8.860096 = idf(docFreq=12, maxDocs=33693)\n0.625 = 
fieldNorm(doc=10434)\n",
  "880845_HDR":"\n5.0860906 = (MATCH) weight(description:jackshaft in 
19728) [DefaultSimilarity], result of:\n  5.0860906 = fieldWeight in 19728, 
product of:\n1.0 = tf(freq=1.0), with freq of:\n  1.0 = termFreq=1.0\n  
  10.172181 = idf(docFreq=4, maxDocs=48128)\n0.5 = fieldNorm(doc=19728)\n",
  "877923_HDR":"\n5.0860906 = (MATCH) weight(description:jackshaft in 
1569) [DefaultSimilarity], result of:\n  5.0860906 = fieldWeight in 1569, 
product of:\n1.0 = tf(freq=1.0), with freq of:\n  1.0 = termFreq=1.0\n  
  10.172181 = idf(docFreq=4, maxDocs=48128)\n0.5 = fieldNorm(doc=1569)\n",
  "880918_HDR":"\n5.0213747 = (MATCH) weight(description:jackshaft in 
13013) [DefaultSimilarity], result of:\n  5.0213747 = fieldWeight in 13013, 
product of:\n1.0 = tf(freq=1.0), with freq of:\n  1.0 = termFreq=1.0\n  
  10.042749 = idf(docFreq=3, maxDocs=33828)\n0.5 = fieldNorm(doc=13013)\n",
  "880146_HDR":"\n4.4503293 = (MATCH) weight(description:jackshaft in 
24626) [DefaultSimilarity], result of:\n  4.4503293 = fieldWeight in 24626, 
product of:\n1.0 = tf(freq=1.0), with freq of:\n  1.0 = termFreq=1.0\n  
  10.172181 = idf(docFreq=4, maxDocs=48128)\n0.4375 = 
fieldNorm(doc=24626)\n",
  "874085_HDR":"\n4.430048 = (MATCH) weight(description:jackshaft in 
2470) [DefaultSimilarity], result of:\n  4.430048 = fieldWeight in 2470, 
product of:\n1.0 = tf(freq=1.0), with freq of:\n  1.0 = termFreq=1.0\n  
  8.860096 = idf(docFreq=12, maxDocs=33693)\n0.5 = fieldNorm(doc=2470)\n",
  "877118_HDR":"\n4.430048 = (MATCH) weight(description:jackshaft in 
12530) [DefaultSimilarity], result of:\n  4.430048 = fieldWeight in 12530, 
product of:\n1.0 = tf(freq=1.0), with freq of:\n  1.0 = termFreq=1.0\n  
  8.860096 = idf(docFreq=12, maxDocs=33693)\n0.5 = fieldNorm(doc=12530)\n",
  "883634_HDR":"\n4.393703 = (MATCH) weight(description:jackshaft in 
24564) [DefaultSimilarity], result of:\n  4.393703 = fieldWeight in 24564, 
product of:\n1.0 = tf(freq=1.0), with freq of:\n  1.0 = termFreq=1.0\n  
  10.042749 = idf(docFreq=3, maxDocs=33828)\n0.4375 = 
fieldNorm(doc=24564)\n",
  "877317_HDR":"\n3.876292 = (MATCH) weight(description:jackshaft in 
12902) [DefaultSimilarity], result of:\n  3.876292 = fieldWeight in 12902, 
product of:\n1.0 = tf(freq=1.0), with freq of:\n  1.0 = termFreq=1.0\n  
  8.860096 = idf(docFreq=12, maxDocs=33693)\n0.4375 = 
fieldNorm(doc=12902)\n",
  "886520_HDR":"\n3.876292 = (MATCH) weight(description:jackshaft in 
2053) [DefaultSimilarity], result of:\n  3.876292 = fieldWeight in 2053, 
product of:\n1.0 = tf(freq=1.0), with freq of:\n  1.0 = termFreq=1.0\n  
  8.860096 = idf(docFreq=12, maxDocs=33693)\n0.4375 = 
fieldNorm(doc=2053)\n",
  "879606_HDR":"\n3.766031 = (MATCH) weight(description:jackshaft in 
979) [DefaultSimilarity], result of:\n  3.766031 = fieldWeight in 979, product 
of:\n1.0 = tf(freq=1.0), with freq of:\n  1.0 = termFreq=1.0\n
10.042749 = idf(docFreq=3, maxDocs=33828)\n0.375 = fieldNorm(doc=979)\n",
  "875853_HDR":"\n3.322536 = (MATCH) weight(description:jackshaft in 
6890) [DefaultSimilarity], result of:\n  3.322536 = fieldWeight in 6890, 
product of:\n1.0 = tf(freq=1.0), with freq of:\n  1.0 = termFreq=1.0\n  
  8.860096 = idf(docFreq=12, maxDocs=33693)\n0.375 = fieldNorm(doc=6890)\n",
  "880537_HDR":"\n3.322536 = (MATCH) weight(description:jackshaft in 
10814) [DefaultSimilarity], result of
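
The numbers in the explain output can be reproduced by hand. The sketch below uses the classic Lucene DefaultSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))) and plugs in the values reported for 880146_HDR in the two runs quoted in this thread. The function names are just for illustration.

```python
import math


def idf(doc_freq, max_docs):
    """DefaultSimilarity inverse document frequency."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(term_freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm, as shown in the explain output."""
    tf = math.sqrt(term_freq)
    return tf * idf(doc_freq, max_docs) * field_norm


# 880146_HDR: same docFreq and fieldNorm in both runs, but a different maxDocs.
score_a = field_weight(1.0, 4, 48128, 0.4375)  # run reporting maxDocs=48128
score_b = field_weight(1.0, 4, 35820, 0.4375)  # run reporting maxDocs=35820
```

The only input that changes between the two runs is maxDocs, which suggests the request was served by different replicas of the same shard. maxDocs counts deleted documents that have not been merged away yet, so replicas holding the same live documents can still report different values; that is one plausible explanation, not something confirmed in the thread.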

RE: Inconsistent Solr Search Results

2015-07-23 Thread Tarala, Magesh
Erick,
The 3 node cluster is setup to use 3 shards each with 1 replica. So, the index 
is split on 3 servers. 

Another piece of info - I think the issue happens only when I use pagination. 
Verifying if that's the case..


Here's a query from the solr log on the server I'm pointing the query to:

INFO  - 2015-07-23 16:56:09.683; org.apache.solr.core.SolrCore; 
[serviceorder_shard1_replica2] webapp=/solr path=/select 
params={f.contact_facet.facet.sort=false&f.environment_facet.facet.mincount=1&facet=true&f.item_category1_hdr_facet.facet.sort=false&f.item_category3_hdr_facet.facet.sort=false&f.order_reason_facet.facet.sort=false&f.ac_model_facet.facet.mincount=1&f.employee_responsible_facet.facet.sort=false&f.header_category2_facet.facet.mincount=1&f.item_category3_hdr_facet.facet.mincount=1&f.ac_model_facet.facet.sort=false&f.environment_facet.facet.sort=false&f.header_status_facet.facet.mincount=1&f.howmal_facet.facet.mincount=1&fl=ac_model,contact,description,header_category2,header_status,id,notes_question_subject,notes_problem_description,notes_quick_response,order_reason,requested_start,serial_number,service_order,sold_to_party&f.operational_effect_facet.facet.sort=false&f.item_category2_hdr_facet.facet.sort=false&f.header_category2_facet.facet.sort=false&f.sold_to_party.facet.sort=false&facet.field=sold_to_party&facet.field=item_category3_hdr_facet&facet.field=item_category2_hdr_facet&facet.field=item_category1_hdr_facet&facet.field=operational_effect_facet&facet.field=when_discovered_facet&facet.field=environment_facet&facet.field=header_category2_facet&facet.field=header_status_facet&facet.field=howmal_facet&facet.field=order_reason_facet&facet.field=employee_responsible_facet&facet.field=contact_facet&facet.field=priority&facet.field=ac_model_facet&f.order_reason_facet.facet.mincount=1&f.howmal_facet.facet.sort=false&fq=requested_start:[+1991-01-01T00:00:00Z+TO+2015-07-23T23:59:59Z+]&fq=document_type:header&f.priority.facet.sort=false&f.header_status_facet.facet.sort=false&f.item_category1_hdr_facet.facet.mincount=1&f.sold_to_party.facet.mincount=1&f.operational_effect_facet.facet.mincount=1&f.priority.facet.mincount=1&rows=100&f.item_category2_hdr_facet.facet.mincount=1&f.employee_responsible_facet.facet.mincount=1&start=0&q={!type%3Dedismax+qf%3D'service_order^9+serial_number^9+material_hdr^9+description^8+notes_problem_description^7+notes_quick_response^6+notes_quest
ion_subject^5+notes_internal_note^4+notes_request_comments^3+notes_easa_aircarrier_notes^3+notes_apparent_cause^3+doc_content_hdr^2'+pf%3D'description~4^8+notes_problem_description~4^7+notes_quick_response~4^6+notes_question_subject~4^5+notes_internal_note~4^4+notes_request_comments~4^3+notes_easa_aircarrier_notes~4^3+notes_apparent_cause~4^3+doc_content_hdr~10^2'+pf2%3D'description~4^8+notes_problem_description~4^7+notes_quick_response~4^6+notes_question_subject~4^5+notes_internal_note~4^4+notes_request_comments~4^3+notes_easa_aircarrier_notes~4^3+notes_apparent_cause~4^3+doc_content_hdr~10^2'+pf3%3D'description~4^8+notes_problem_description~4^7+notes_quick_response~4^6+notes_question_subject~4^5+notes_internal_note~4^4+notes_request_comments~4^3+notes_easa_aircarrier_notes~4^3+notes_apparent_cause~4^3+doc_content_hdr~10^2'}driveshaft+corrosion&f.when_discovered_facet.facet.sort=fa





-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Thursday, July 23, 2015 11:18 AM
To: solr-user@lucene.apache.org
Subject: Re: Inconsistent Solr Search Results

The query you're running would help. But here's a guess:
You say you have a "3 node Solr cluster". By that I'm guessing you mean a 
single shard with 1 leader and 2 replicas.

When the primary sort criterion (score by default) is tied between two 
documents, the internal Lucene doc ID is used as a tiebreaker. If you're doing 
something like a
*:* query (asterisk:asterisk in case bolding happens) then the score for all 
docs is the same. Here's the kicker:
the internal Lucene doc ID will be different in each replica.
So my guess is you're getting results from different replicas, where the 
internal doc IDs of the tied docs are ordered differently.

So I claim you don't get different results every time, you get
1 of three result orderings at a guess.

If you have a decent size corpus and are searching by more interesting criteria 
this should be much less of a problem, but still theoretically can happen. To 
nail down the ordering completely, specify a secondary sort, such as 
&sort=score desc,id asc

Best,
Erick
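
The effect of a secondary sort can be shown with a small sketch. The two lists stand for the same tied documents arriving in different orders from different replicas; the ids and scores are borrowed from the explain output elsewhere in this archive, and the rank() helper is illustrative, mirroring what sort=score desc,id asc asks Solr to do.

```python
def rank(docs):
    """Order by score descending, then by id ascending as the tiebreaker."""
    return sorted(docs, key=lambda d: (-d["score"], d["id"]))


# The same documents as returned by two replicas, in different orders.
replica_a = [
    {"id": "882606_HDR", "score": 5.53756},
    {"id": "881351_HDR", "score": 5.53756},
    {"id": "884420_HDR", "score": 6.357613},
]
replica_b = list(reversed(replica_a))

order_a = [d["id"] for d in rank(replica_a)]
order_b = [d["id"] for d in rank(replica_b)]
```

Both orderings come out identical. Note that this only stabilizes true ties; if the replicas compute different scores for the same document, the ranking can still change between requests.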

On Thu, Jul 23, 2015 at 8:46 AM, Tarala, Magesh  wrote:
> I have about 15K documents in a 3 node solr cluster. When I execute a simple 
> search, I g

Inconsistent Solr Search Results

2015-07-23 Thread Tarala, Magesh
I have about 15K documents in a 3 node solr cluster. When I execute a simple 
search, I get the results in different order every time I search. But the 
number of records is the same. Here's the definition for the field.

Any ideas, suggestions would be greatly appreciated.



Thanks,
Magesh



RE: Solr cloud error during document ingestion

2015-07-13 Thread Tarala, Magesh
Shawn,
Here are my responses:

>> Is that the entire error, or is there additional error information?  Do
>> you have any way to know exactly what is in that request that's throwing
>> the error?
That's the entire error stack. Don’t see anything else in solr log. Probably 
need to turn on additional logging? 
I've identified the text in the email (.msg) that's causing it. This is it: 
(daños)
The tilde in the n is the culprit. If I remove this and run the load, it works 
fine. 

>> You said 4.10.2 ... is this the Solr or SolrJ version?  Are both of them
>> the same version?  Are you running Solr in the included jetty, or have
>> you installed it into another servlet container?  What Java vendor and
>> version are you running, and is it 64-bit?
Solr version is 4.10.2
Solrj version is 4.10.3
Using built in Jetty. 
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)

>> Can you share your SolrJ code, solrconfig, schema, and any other
>> information you can think of that might be relevant?
Yes, absolutely. Where would you like to see it posted?

>> Because the error is from a request, I doubt that autoCommit has
>> anything to do with the problem, but I could be wrong about that.
Yes, agree. This is not related to autoCommit.


-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Sunday, July 12, 2015 6:25 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr cloud error during document ingestion

On 7/11/2015 9:33 PM, Tarala, Magesh wrote:
> I'm using 4.10.2 in a 3 node solr cloud setup
> I have a collection with 3 shards and 2 replicas each.
> I'm ingesting solr documents via solrj.
> 
> While ingesting the documents, I get the following error:
> 
> 264147944 [updateExecutor-1-thread-268] ERROR 
> org.apache.solr.update.StreamingSolrServers  ? error 
> org.apache.solr.common.SolrException: Bad Request
> 
> request: 
> http://10.222.238.35:8983/solr/serviceorder_shard1_replica2/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F10.222.238.36%3A8983%2Fsolr%2Fserviceorder_shard2_replica1%2F&wt=javabin&version=2
> at 
> org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:241)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>at java.lang.Thread.run(Thread.java:745)
> 
> I commit after every 100 documents in solrj.
> And I also have the following solrconfig.xml setting:
>  <autoCommit>
>    <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
>    <openSearcher>false</openSearcher>
>  </autoCommit>

Is that the entire error, or is there additional error information?  Do
you have any way to know exactly what is in that request that's throwing
the error?

You said 4.10.2 ... is this the Solr or SolrJ version?  Are both of them
the same version?  Are you running Solr in the included jetty, or have
you installed it into another servlet container?  What Java vendor and
version are you running, and is it 64-bit?

Can you share your SolrJ code, solrconfig, schema, and any other
information you can think of that might be relevant?

Because the error is from a request, I doubt that autoCommit has
anything to do with the problem, but I could be wrong about that.

Thanks,
Shawn



RE: Solr cloud error during document ingestion

2015-07-12 Thread Tarala, Magesh
I'm using solrj to ingest the documents. But I'm using only one client now. 

Yes Erick, it is weird. You are right, it is already in UTF-8 and so even if I 
convert it explicitly it is the same string and so, same issue... I'm still 
stumped:(

The code is straightforward. I am using Tika and solrj. 
Creating Tika AutoDetectParser(), getting the BodyContentHandler, creating a 
SafeContentHandler, and getting the content as string. 
Then passing the string to SolrInputDocument.
The error occurs when the SolrInputDocument is sent to solr. 

One other piece of info.. We are creating nested documents.


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Sunday, July 12, 2015 1:41 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr cloud error during document ingestion

How are you ingesting documents? ExtractingRequestHandler? That loads all the 
work onto the Solr node(s); you might want to consider using SolrJ, as that 
gives you much more control as well as the ability to farm out the work to N 
clients.

Another blog:
https://lucidworks.com/blog/indexing-with-solrj/

 Best,
Erick

P.S. Glad you found the problem, but it's a little weird. Solr already talks 
UTF-8 so this should "just work", but then I'm not familiar with all the 
details of your setup.



On Sun, Jul 12, 2015 at 10:11 AM, Tarala, Magesh  wrote:
> I narrowed down the cause. And it is a character issue!
>
> The .msg file content I'm extracting using Tika parser has this text 
> (daños) If I remove the character n with the tilde, it works.
>
> Explicitly convert to UTF-8 before sending it to solr?
>
> Erick - I'm in the QA phase. I'll be ingesting around 800K documents total 
> (word, pdf, excel, .msg, txt, etc.) For now I'm considering daily updates 
> when we first go to prod end of month. i.e., capture all the new and modified 
> documents on a daily basis and update solr. Once we get a grasp of things, we 
> want to go near real time. Thanks for the link to your post. It is very 
> helpful.
>
>
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Sunday, July 12, 2015 11:24 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr cloud error during document ingestion
>
> Probably not related to your problem, but if you're sending lots of docs at 
> Solr, committing every 100 is very aggressive.
> I'm assuming you're committing from the client, which, while OK, 
> doesn't scale very well if you ever decide to have more than
> 1 client sending docs.
>
> I'd recommend setting your hard commit to a minute or so and just leaving it 
> at that if possible, with soft committing to make the docs visible.
>
> Here's more than you ever wanted to know about soft commits, hard commits and 
> such:
> https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> Best,
> Erick
>
> On Sun, Jul 12, 2015 at 8:40 AM, Mikhail Khludnev 
>  wrote:
>> I suggest checking the
>> http://10.222.238.35:8983/solr/serviceorder_shard1_replica2
>> <http://10.222.238.35:8983/solr/serviceorder_shard1_replica2/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F10.222.238.36%3A8983%2Fsolr%2Fserviceorder_shard2_replica1%2F&wt=javabin&version=2>
>> logs to find the root cause.
>>
>> On Sun, Jul 12, 2015 at 6:33 AM, Tarala, Magesh  wrote:
>>
>>> I'm using 4.10.2 in a 3 node solr cloud setup I have a collection 
>>> with 3 shards and 2 replicas each.
>>> I'm ingesting solr documents via solrj.
>>>
>>> While ingesting the documents, I get the following error:
>>>
>>> 264147944 [updateExecutor-1-thread-268] ERROR 
>>> org.apache.solr.update.StreamingSolrServers  ? error
>>> org.apache.solr.common.SolrException: Bad Request
>>>
>>> request:
>>> http://10.222.238.35:8983/solr/serviceorder_shard1_replica2/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F10.222.238.36%3A8983%2Fsolr%2Fserviceorder_shard2_replica1%2F&wt=javabin&version=2
>>> at
>>> org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:241)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>at java.lang.Thread.run(Thread.java:745)
>>>
>>> I commit after every 100 documents in solrj.
>>> And I also have the following solrconfig.xml setting:
>>>  
>>> 

RE: Solr cloud error during document ingestion

2015-07-12 Thread Tarala, Magesh
I narrowed down the cause. And it is a character issue! 

The .msg file content I'm extracting using Tika parser has this text (daños)
If I remove the character n with the tilde, it works. 

Explicitly convert to UTF-8 before sending it to solr?
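
A small sketch of what is at stake with this character. It is shown in Python for brevity and does not claim to identify the root cause in this thread: it only illustrates that the ñ in "daños" is one byte in Latin-1 and two bytes in UTF-8, and that Latin-1 bytes are not valid UTF-8. The decodes_as_utf8 helper is hypothetical.

```python
text = "da\u00f1os"  # "daños", written with an escape so the source stays ASCII

utf8_bytes = text.encode("utf-8")      # the n-tilde becomes two bytes: C3 B1
latin1_bytes = text.encode("latin-1")  # the n-tilde becomes one byte: F1


def decodes_as_utf8(raw):
    """Return True if the byte string is valid UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


utf8_ok = decodes_as_utf8(utf8_bytes)
latin1_ok = decodes_as_utf8(latin1_bytes)
```

If the string is already correctly decoded in memory, re-encoding it to UTF-8 on the client changes nothing, so it is worth checking where in the pipeline the bytes are produced and which charset each step assumes.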

Erick - I'm in the QA phase. I'll be ingesting around 800K documents total 
(word, pdf, excel, .msg, txt, etc.) For now I'm considering daily updates when 
we first go to prod end of month. i.e., capture all the new and modified 
documents on a daily basis and update solr. Once we get a grasp of things, we 
want to go near real time. Thanks for the link to your post. It is very 
helpful. 



-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Sunday, July 12, 2015 11:24 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr cloud error during document ingestion

Probably not related to your problem, but if you're sending lots of docs at 
Solr, committing every 100 is very aggressive.
I'm assuming you're committing from the client, which, while OK, doesn't scale 
very well if you ever decide to have more than
1 client sending docs.

I'd recommend setting your hard commit to a minute or so and just leaving it at 
that if possible, with soft committing to make the docs visible.

Here's more than you ever wanted to know about soft commits, hard commits and 
such:
https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Best,
Erick

On Sun, Jul 12, 2015 at 8:40 AM, Mikhail Khludnev  
wrote:
> I suggest checking the logs on
> http://10.222.238.35:8983/solr/serviceorder_shard1_replica2
> to find the root cause.
>
> On Sun, Jul 12, 2015 at 6:33 AM, Tarala, Magesh  wrote:
>
>> I'm using 4.10.2 in a 3 node solr cloud setup. I have a collection 
>> with 3 shards and 2 replicas each.
>> I'm ingesting solr documents via solrj.
>>
>> While ingesting the documents, I get the following error:
>>
>> 264147944 [updateExecutor-1-thread-268] ERROR org.apache.solr.update.StreamingSolrServers  ? error
>> org.apache.solr.common.SolrException: Bad Request
>>
>> request: http://10.222.238.35:8983/solr/serviceorder_shard1_replica2/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F10.222.238.36%3A8983%2Fsolr%2Fserviceorder_shard2_replica1%2F&wt=javabin&version=2
>>     at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:241)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>     at java.lang.Thread.run(Thread.java:745)
>>
>> I commit after every 100 documents in solrj.
>> And I also have the following solrconfig.xml setting:
>>  <autoCommit>
>>    <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
>>    <openSearcher>false</openSearcher>
>>  </autoCommit>
>>
>>
>> IMO, tlogs (for serviceorder_shard1_replica2) are not too big
>> -rw-r--r-- 1 solr users  8338 Jul 11 21:40 tlog.364
>> -rw-r--r-- 1 solr users  6385 Jul 11 21:40 tlog.365
>> -rw-r--r-- 1 solr users 10221 Jul 11 21:41 tlog.366
>> -rw-r--r-- 1 solr users  5981 Jul 11 21:41 tlog.367
>> -rw-r--r-- 1 solr users  2682 Jul 11 21:41 tlog.368
>> -rw-r--r-- 1 solr users  8515 Jul 11 21:42 tlog.369
>> -rw-r--r-- 1 solr users  7373 Jul 11 21:42 tlog.370
>> -rw-r--r-- 1 solr users  6907 Jul 11 21:42 tlog.371
>> -rw-r--r-- 1 solr users  5524 Jul 11 21:42 tlog.372
>> -rw-r--r-- 1 solr users  5600 Jul 11 21:43 tlog.373
>>
>>
>> So far I've not been able to resolve this issue. Any ideas / pointers 
>> would be greatly appreciated!
>>
>> Thanks,
>> Magesh
>>
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
> 


Solr cloud error during document ingestion

2015-07-11 Thread Tarala, Magesh
I'm using 4.10.2 in a 3 node solr cloud setup.
I have a collection with 3 shards and 2 replicas each.
I'm ingesting solr documents via solrj.

While ingesting the documents, I get the following error:

264147944 [updateExecutor-1-thread-268] ERROR org.apache.solr.update.StreamingSolrServers  ? error
org.apache.solr.common.SolrException: Bad Request

request: http://10.222.238.35:8983/solr/serviceorder_shard1_replica2/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F10.222.238.36%3A8983%2Fsolr%2Fserviceorder_shard2_replica1%2F&wt=javabin&version=2
    at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:241)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

I commit after every 100 documents in solrj.
And I also have the following solrconfig.xml setting:
 <autoCommit>
   <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
   <openSearcher>false</openSearcher>
 </autoCommit>


IMO, tlogs (for serviceorder_shard1_replica2) are not too big
-rw-r--r-- 1 solr users  8338 Jul 11 21:40 tlog.364
-rw-r--r-- 1 solr users  6385 Jul 11 21:40 tlog.365
-rw-r--r-- 1 solr users 10221 Jul 11 21:41 tlog.366
-rw-r--r-- 1 solr users  5981 Jul 11 21:41 tlog.367
-rw-r--r-- 1 solr users  2682 Jul 11 21:41 tlog.368
-rw-r--r-- 1 solr users  8515 Jul 11 21:42 tlog.369
-rw-r--r-- 1 solr users  7373 Jul 11 21:42 tlog.370
-rw-r--r-- 1 solr users  6907 Jul 11 21:42 tlog.371
-rw-r--r-- 1 solr users  5524 Jul 11 21:42 tlog.372
-rw-r--r-- 1 solr users  5600 Jul 11 21:43 tlog.373


So far I've not been able to resolve this issue. Any ideas / pointers would be 
greatly appreciated!

Thanks,
Magesh



RE: Solr Encoding Issue?

2015-07-08 Thread Tarala, Magesh
Shawn - It was a stupid coding error in my Java code. I was using the default 
charset. I changed it to UTF-8 and the problem is fixed.
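
For anyone hitting the same thing, the kind of change described above can be sketched as follows. This is an illustration only, assuming the source file is UTF-8 and is read with plain JDK calls; the class and method names are hypothetical, and the actual indexing code from this thread is not shown here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper showing an explicit charset instead of the platform default.
class ExplicitCharsetRead {

    /** Reads the whole file and decodes it as UTF-8, regardless of the JVM default charset. */
    static String readUtf8(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    /** Writes text to a temp file as UTF-8 and reads it back, to show the round trip is lossless. */
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("crm-sample", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                return readUtf8(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new IllegalStateException("temp file round trip failed", e);
        }
    }

    public static void main(String[] args) {
        String sample = "Enter Data in TK Only\u00e0"; // ends with "à"
        System.out.println(sample.equals(roundTrip(sample)));
    }
}
```

Constructors such as `new String(bytes)` or `new FileReader(file)` use the platform default charset, which is where this sort of bug tends to hide.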

Thanks again!

-Original Message-
From: Tarala, Magesh 
Sent: Wednesday, July 08, 2015 8:11 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr Encoding Issue?

Wow, that makes total sense. Thanks Shawn!! 

I'll go down this path. 

Thanks,
Magesh

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Wednesday, July 08, 2015 7:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Encoding Issue?

On 7/8/2015 6:09 PM, Tarala, Magesh wrote:
> I believe the issue is in solr. The character “à” is getting stored in solr 
> as “Ã ”. Notice the space after Ã.
>
> I'm using solrj to ingest the documents into solr. So, one of those could be 
> the culprit?

Solr accepts and outputs text in UTF-8.  The UTF-8 hex encoding for the à 
character is C3A0.

In the latin1 character set, hex C3 is the Ã character.  Similarly, in latin1, 
hex A0 is a non-breaking space.

So it sounds like your input is encoded as UTF-8, therefore that character in 
your input source is hex c3a0, but something in your indexing process is 
incorrectly interpreting the UTF-8 representation as latin1, so it sees it as 
"Ã ".

SolrJ is faithfully converting that input to UTF-8 and sending it to Solr.

Thanks,
Shawn



RE: Solr Encoding Issue?

2015-07-08 Thread Tarala, Magesh
Wow, that makes total sense. Thanks Shawn!! 

I'll go down this path. 

Thanks,
Magesh

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Wednesday, July 08, 2015 7:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Encoding Issue?

On 7/8/2015 6:09 PM, Tarala, Magesh wrote:
> I believe the issue is in solr. The character “à” is getting stored in solr 
> as “Ã ”. Notice the space after Ã.
>
> I'm using solrj to ingest the documents into solr. So, one of those could be 
> the culprit?

Solr accepts and outputs text in UTF-8.  The UTF-8 hex encoding for the à 
character is C3A0.

In the latin1 character set, hex C3 is the Ã character.  Similarly, in latin1, 
hex A0 is a non-breaking space.

So it sounds like your input is encoded as UTF-8, therefore that character in 
your input source is hex c3a0, but something in your indexing process is 
incorrectly interpreting the UTF-8 representation as latin1, so it sees it as 
"Ã ".

SolrJ is faithfully converting that input to UTF-8 and sending it to Solr.

Thanks,
Shawn
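
The byte-level explanation above can be reproduced with a few lines of plain Java. This is just a sketch to illustrate the mechanism, with hypothetical class and method names; it does not touch Solr or SolrJ.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class reproducing the misinterpretation described above.
class MojibakeDemo {

    /** Encodes the text as UTF-8, then wrongly decodes those bytes as latin1 (ISO-8859-1). */
    static String misreadAsLatin1(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    /** Renders bytes as upper-case hex, two digits per byte. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "\u00e0"; // the "à" character
        System.out.println(toHex(original.getBytes(StandardCharsets.UTF_8))); // C3A0

        String garbled = misreadAsLatin1(original);
        // Two characters now: U+00C3 (Ã) followed by U+00A0 (non-breaking space).
        for (char c : garbled.toCharArray()) {
            System.out.println(String.format("U+%04X", (int) c));
        }
    }
}
```

Once the garbled two-character string is handed to SolrJ, it is encoded and stored exactly as given, which matches what shows up in the query response.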



RE: Solr Encoding Issue?

2015-07-08 Thread Tarala, Magesh
Thanks Erick.

I believe the issue is in solr. The character “à” is getting stored in solr as 
“Ã ”. Notice the space after Ã.

I'm using solrj to ingest the documents into solr. So, one of those could be 
the culprit?


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Wednesday, July 08, 2015 1:36 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Encoding Issue?

Attachments are pretty aggressively stripped by the e-mail server, so there's 
nothing to see. You'll have to paste it somewhere else and provide a link.

Usually, though, this is a character set issue where the browser is using a 
different charset than Solr. It's really the same character, just displayed 
differently.

Shot in the dark though.

Erick

On Wed, Jul 8, 2015 at 10:49 AM, Tarala, Magesh  wrote:

> I’m ingesting a .TXT file with HTML content into Solr. The content 
> has the following character highlighted below:
>
> The file we get from CRM (also attached):
>
> [image: cid:image001.png@01D0B972.75BE23F0]
>
> After ingesting into solr, I see a different character. This is query 
> response from solr management console.
>
> [image: cid:image003.png@01D0B972.D1AED290]
>
> Anybody know how I can prevent this from happening?
>
> Thanks!
>


RE: Solr Encoding Issue?

2015-07-08 Thread Tarala, Magesh
Looks like images did not come through. Here's the text...


I'm ingesting a .TXT file with HTML content into Solr. The content has the 
following character highlighted below:
The file we get from CRM (also attached):
Enter Data in TK Onlyà



After ingesting into solr, I see a different character. This is the query 
response from the solr management console.
Enter Data in TK OnlyÃ 

I'm expecting to see à
But I'm seeing Ã followed by a space

Anybody know how I can prevent this from happening?

Thanks!


Solr Encoding Issue?

2015-07-08 Thread Tarala, Magesh
I'm ingesting a .TXT file with HTML content into Solr. The content has the 
following character highlighted below:
The file we get from CRM (also attached):
[cid:image001.png@01D0B972.75BE23F0]


After ingesting into solr, I see a different character. This is query response 
from solr management console.

[cid:image003.png@01D0B972.D1AED290]


Anybody know how I can prevent this from happening?

Thanks!


RE: Jetty Plus for Solr 4.10.4

2015-06-29 Thread Tarala, Magesh
Hi Shawn - Thank you for the quick and detailed response!! 

Good to hear that the Jetty 8 included with Solr does not need to be modified 
for typical uses. 

I believe what we have is a "typical" use case. We will be installing Solr on 3 
nodes in our Hadoop cluster and will use Hadoop's ZooKeeper. There will be one 
collection with 3 shards and 2 replicas each. We have not benchmarked 
performance yet, so we may need more shards or nodes. Data volume and user 
volume are not very high, but we are using a nested document structure, and we 
are concerned that it may introduce performance issues. We will check it out. 

Regarding your recommendation to upgrade to Solr 5.2.1: we have Hortonworks HDP 
2.2 in place and they support 4.10. We will revisit the decision. 

Thanks,
Magesh

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Monday, June 29, 2015 11:50 AM
To: solr-user@lucene.apache.org
Subject: Re: Jetty Plus for Solr 4.10.4

On 6/29/2015 8:44 AM, Tarala, Magesh wrote:
> We are planning to go to production with Solr 4.10.4. The documentation 
> recommends using the full Jetty package that includes JettyPlus. I'm not able 
> to find the instructions to do this. Can someone point me in the right direction?

I found the official page that talks about JettyPlus.

https://wiki.apache.org/solr/SolrJetty

Note at the top of the page where it says that info is outdated for Jetty 8.  
Solr has been using Jetty 8 since version 4.0-ALPHA -- for nearly three years 
now.  Typical use cases for Solr do *not* require a full Jetty install.  Even 
most non-typical use cases do not require it.

Solr 4.10 includes the bin/solr script for startup, which runs the Jetty that's 
included in the Solr download.  Solr 5.x makes those scripts even better.

If you haven't made it to production yet, you should probably consider 
upgrading to Solr 5.2.1.

If you are not going to use the Jetty included with Solr, then you're pretty 
much on your own. You can take the war file from the dist directory, the 
logging jars from the example/lib/ext directory, and the logging config from 
example/resources, and install it in most of the available servlet containers.

Starting with 5.0, the included Jetty is the only officially supported way to 
start Solr, and the war is no longer included in the dist directory in the 
download.

https://wiki.apache.org/solr/WhyNoWar

Thanks,
Shawn



Jetty Plus for Solr 4.10.4

2015-06-29 Thread Tarala, Magesh
We are planning to go to production with Solr 4.10.4. The documentation 
recommends using the full Jetty package that includes JettyPlus. I'm not able 
to find the instructions to do this. Can someone point me in the right direction?

Thanks,
Magesh