Solr 4.10 SSL with Sharded Collection
We have a 3 node solr cloud installation running on version 4.10. There is one collection that's sharded. After enabling SSL, we are unable to query the sharded collection. Getting this error: "no servers hosting shard:" I've googled and seen reports of this issue, but have not seen a resolution. Thanks in advance for your help!
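A common cause of "no servers hosting shard:" after enabling SSL on SolrCloud 4.x is that the nodes are still registered in ZooKeeper with http:// URLs, so inter-node shard requests fail even though direct queries work. The 4.x SSL setup requires setting the urlScheme cluster property to https before restarting the nodes; a sketch of the command (the ZooKeeper host and script path are placeholders for your install):

```shell
# Tell the cluster that nodes should be addressed over https.
# (zk1:2181 and the script location are placeholders for your install.)
cloud-scripts/zkcli.sh -zkhost zk1:2181 \
  -cmd clusterprop -name urlScheme -val https
# Then restart the Solr nodes so they re-register with https URLs.
```

If the live_nodes entries in ZooKeeper still show http after a restart, the property did not take effect.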
RE: Scramble data
I already have the data ingested and it takes several days to do that. I was trying to avoid re-ingesting the data. Thanks, Magesh -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, October 07, 2015 9:26 PM To: solr-user@lucene.apache.org Subject: Re: Scramble data Probably sanitize the data on the front end? Something simple like putting "REDACTED" in all of the customer-sensitive fields. You might also write a DocTransformer plugin; all you have to do is subclass DocTransformer and override one very simple "transform" method. Best, Erick On Wed, Oct 7, 2015 at 5:09 PM, Tarala, Magesh wrote: > Folks, > I have a strange question. We have a Solr implementation that we would like > to demo to external customers. But we don't want to display the real data, > which contains our customer information and so is sensitive data. What's the > best way to scramble the data of the Solr Query results? By best I mean the > simplest way with least amount of work. BTW, we have a .NET front end > application. > > Thanks, > Magesh > > >
Scramble data
Folks, I have a strange question. We have a Solr implementation that we would like to demo to external customers. But we don't want to display the real data, which contains our customer information and so is sensitive data. What's the best way to scramble the data of the Solr Query results? By best I mean the simplest way with least amount of work. BTW, we have a .NET front end application. Thanks, Magesh
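If a server-side DocTransformer turns out to be more work than it's worth, the "sanitize on the front end" route can be as small as a function that masks chosen fields in the Solr JSON response before display. A minimal sketch (Python for brevity; the field names are hypothetical, substitute your own):

```python
# Client-side redaction for demo purposes: mask the values of sensitive
# fields in a Solr select response before rendering. The field names in
# SENSITIVE_FIELDS are hypothetical examples.

SENSITIVE_FIELDS = {"customer_name", "contact", "sold_to_party"}

def redact(doc, fields=SENSITIVE_FIELDS):
    """Return a copy of a Solr doc with sensitive field values replaced."""
    return {k: ("REDACTED" if k in fields else v) for k, v in doc.items()}

def redact_response(response):
    """Apply redact() to every doc in a Solr select response."""
    docs = response["response"]["docs"]
    response["response"]["docs"] = [redact(d) for d in docs]
    return response
```

The upside is that nothing in Solr changes and no re-ingestion is needed; the downside is that the raw data still leaves Solr, so this only suits demos run through your own front end.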
Solr Log Analysis
I'm using Solr 4.10.4 in a 3 node cloud setup. I have 3 shards and 3 replicas for the collection. I want to analyze the logs to extract the queries and query times. Is there a tool or script someone has created already for this? Thanks, Magesh
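There is no canonical tool bundled with 4.10 for this; as a starting point, a rough sketch that pulls the core, query params, and QTime out of request-log lines, assuming the default 4.x log layout (adjust the regex to your log4j pattern):

```python
import re

# Rough parser for Solr 4.x request log lines. Assumes the default
# "INFO - <timestamp>; <class>; [<core>] webapp=... path=/select
# params={...} hits=... status=... QTime=<ms>" layout; your log4j
# pattern may differ.
LINE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+).*"
    r"\[(?P<core>[^\]]+)\].*path=(?P<path>\S+) "
    r"params=\{(?P<params>.*)\}.*QTime=(?P<qtime>\d+)"
)

def parse_query_log(lines):
    """Yield (timestamp, core, params, qtime_ms) for /select requests."""
    for line in lines:
        m = LINE_RE.search(line)
        if m and m.group("path") == "/select":
            yield (m.group("ts"), m.group("core"),
                   m.group("params"), int(m.group("qtime")))
```

From the tuples it yields you can aggregate query-time percentiles or extract the q parameter with a query-string parser.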
RE: Solr Cloud Security Question
Thanks Shawn! We are on 4.10.4. Will consider 5.x upgrade shortly. -Original Message- From: Shawn Heisey [mailto:apa...@elyograg.org] Sent: Sunday, August 16, 2015 9:05 PM To: solr-user@lucene.apache.org Subject: Re: Solr Cloud Security Question On 8/16/2015 12:09 PM, Tarala, Magesh wrote: > I have a solr cloud with 3 nodes. I've added password protection > following the steps here: > http://stackoverflow.com/questions/28043957/how-to-set-apache-solr-adm > in-password > > Now only one node is able to load the collections. The others are getting 401 > Unauthorized error when loading the collections. > > Could anybody provide the instructions to configure security for solr cloud? Authentication and SolrCloud do not work well together, unless it's client-certificate-based authentication (SSL). This is because there is currently no way to tell Solr what user/pass to use when making requests to another node. This was one of the early issues trying to solve the problem with user/pass authentication and inter-node requests: https://issues.apache.org/jira/browse/SOLR-4470 That issue is now closed as a duplicate, because Solr 5.3 will have an authentication/authorization framework, and basic authentication is one of the first things that has been implemented using that framework: https://issues.apache.org/jira/browse/SOLR-7692 The release process for 5.3 is underway now. If all goes well, the release will happen well before the end of August. Thanks, Shawn
Solr Cloud Security Question
I have a solr cloud with 3 nodes. I've added password protection following the steps here: http://stackoverflow.com/questions/28043957/how-to-set-apache-solr-admin-password Now only one node is able to load the collections. The others are getting 401 Unauthorized error when loading the collections. Could anybody provide the instructions to configure security for solr cloud? Thanks, Magesh
RE: Duplicate Documents
I deleted the index and re-indexed. The duplicates went away. I have not identified the root cause, but it looks like updating documents is causing it sporadically. Going to try deleting the document and then updating. -Original Message- From: Tarala, Magesh Sent: Monday, August 03, 2015 8:27 AM To: solr-user@lucene.apache.org Subject: Duplicate Documents I'm using solr 4.10.2. I'm using the "id" field as the unique key - it is passed in with the document when ingesting the documents into solr. When querying I get duplicate documents with different "_version_". Out of approx. 25K unique documents ingested into solr, I see approx. 300 duplicates. It is a 3 node solr cloud with one shard and 2 replicas. I'm also using nested documents. Thanks in advance for any insights. --Magesh
Duplicate Documents
I'm using solr 4.10.2. I'm using the "id" field as the unique key - it is passed in with the document when ingesting the documents into solr. When querying I get duplicate documents with different "_version_". Out of approx. 25K unique documents ingested into solr, I see approx. 300 duplicates. It is a 3 node solr cloud with one shard and 2 replicas. I'm also using nested documents. Thanks in advance for any insights. --Magesh
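Until the root cause is found, a client-side guard can at least hide the symptom: collapse duplicates by id, keeping the copy with the highest _version_ (Solr assigns larger _version_ values to later updates). A sketch in Python:

```python
def dedupe_by_id(docs):
    """Collapse duplicate Solr docs by 'id', keeping the doc with the
    highest _version_ (later updates receive larger _version_ values).
    Preserves the first-seen order of ids from the result list."""
    best = {}
    for doc in docs:
        doc_id = doc["id"]
        if doc_id not in best or doc["_version_"] > best[doc_id]["_version_"]:
            best[doc_id] = doc
    seen, out = set(), []
    for doc in docs:
        if doc["id"] not in seen:
            seen.add(doc["id"])
            out.append(best[doc["id"]])
    return out
```

This is a workaround, not a fix; it also requires requesting the _version_ field in fl.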
RE: StandardTokenizerFactory and WhitespaceTokenizerFactory
I'm adding a PatternReplaceCharFilterFactory to exclude those characters. Looks like this works. -Original Message- From: Tarala, Magesh Sent: Thursday, July 30, 2015 10:37 AM To: solr-user@lucene.apache.org Subject: RE: StandardTokenizerFactory and WhitespaceTokenizerFactory Will using PatternReplaceCharFilterFactory to replace the comma, period, etc. with a space or an empty char work? -Original Message- From: Tarala, Magesh Sent: Thursday, July 30, 2015 10:08 AM To: solr-user@lucene.apache.org Subject: StandardTokenizerFactory and WhitespaceTokenizerFactory I am indexing text that contains part numbers in various formats with hyphens/dashes and a few other special characters. Here's the problem: If I use StandardTokenizerFactory, the hyphens, etc. are stripped, so I cannot search by the part number 222-333-. I can only search for 222 or 333 or 444. If I use WhitespaceTokenizerFactory instead, I can search part numbers, but I'm not able to search words that have punctuation like a comma or period after them. Example: wheel, Should I use copy fields with different tokenizers, and then choose which field to search based on the search string? Any other options?
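For reference, a sketch of what such a field type could look like in schema.xml. The pattern and type name here are illustrative, not a drop-in; tune the character class to the punctuation you actually see in your data:

```xml
<!-- Sketch only: a whitespace-tokenized field type that strips commas,
     periods, etc. with a char filter before tokenizing, so "wheel,"
     indexes as "wheel" while hyphenated part numbers keep their
     hyphens. The pattern is an illustrative assumption. -->
<fieldType name="text_partnum" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="[,.;:!?]" replacement=" "/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</charFilter -->
</fieldType>
```

The Analysis screen in the Solr admin UI is the quickest way to confirm the char filter and tokenizer produce the tokens you expect.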
RE: StandardTokenizerFactory and WhitespaceTokenizerFactory
Will using PatternReplaceCharFilterFactory to replace the comma, period, etc. with a space or an empty char work? -Original Message- From: Tarala, Magesh Sent: Thursday, July 30, 2015 10:08 AM To: solr-user@lucene.apache.org Subject: StandardTokenizerFactory and WhitespaceTokenizerFactory I am indexing text that contains part numbers in various formats with hyphens/dashes and a few other special characters. Here's the problem: If I use StandardTokenizerFactory, the hyphens, etc. are stripped, so I cannot search by the part number 222-333-. I can only search for 222 or 333 or 444. If I use WhitespaceTokenizerFactory instead, I can search part numbers, but I'm not able to search words that have punctuation like a comma or period after them. Example: wheel, Should I use copy fields with different tokenizers, and then choose which field to search based on the search string? Any other options?
StandardTokenizerFactory and WhitespaceTokenizerFactory
I am indexing text that contains part numbers in various formats with hyphens/dashes and a few other special characters. Here's the problem: If I use StandardTokenizerFactory, the hyphens, etc. are stripped, so I cannot search by the part number 222-333-. I can only search for 222 or 333 or 444. If I use WhitespaceTokenizerFactory instead, I can search part numbers, but I'm not able to search words that have punctuation like a comma or period after them. Example: wheel, Should I use copy fields with different tokenizers, and then choose which field to search based on the search string? Any other options?
RE: Different scores for the same search
= fieldNorm(doc=10803)\n", "874085_HDR":"\n4.4176526 = (MATCH) weight(description:jackshaft in 16291) [DefaultSimilarity], result of:\n 4.4176526 = fieldWeight in 16291, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 8.835305 = idf(docFreq=12, maxDocs=32868)\n0.5 = fieldNorm(doc=16291)\n", "877118_HDR":"\n4.4176526 = (MATCH) weight(description:jackshaft in 22106) [DefaultSimilarity], result of:\n 4.4176526 = fieldWeight in 22106, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 8.835305 = idf(docFreq=12, maxDocs=32868)\n0.5 = fieldNorm(doc=22106)\n", "883634_HDR":"\n4.393703 = (MATCH) weight(description:jackshaft in 24564) [DefaultSimilarity], result of:\n 4.393703 = fieldWeight in 24564, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 10.042749 = idf(docFreq=3, maxDocs=33828)\n0.4375 = fieldNorm(doc=24564)\n", "880146_HDR":"\n4.32111 = (MATCH) weight(description:jackshaft in 7010) [DefaultSimilarity], result of:\n 4.32111 = score(doc=7010,freq=1.0 = termFreq=1.0\n), product of:\n0.9994 = queryWeight, product of:\n 9.876823 = idf(docFreq=4, maxDocs=35820)\n 0.101247124 = queryNorm\n 4.3211102 = fieldWeight in 7010, product of:\n 1.0 = tf(freq=1.0), with freq of:\n1.0 = termFreq=1.0\n 9.876823 = idf(docFreq=4, maxDocs=35820)\n 0.4375 = fieldNorm(doc=7010)\n", "877317_HDR":"\n3.865446 = (MATCH) weight(description:jackshaft in 21143) [DefaultSimilarity], result of:\n 3.865446 = fieldWeight in 21143, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 8.835305 = idf(docFreq=12, maxDocs=32868)\n0.4375 = fieldNorm(doc=21143)\n", "886520_HDR":"\n3.865446 = (MATCH) weight(description:jackshaft in 2674) [DefaultSimilarity], result of:\n 3.865446 = fieldWeight in 2674, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 8.835305 = idf(docFreq=12, maxDocs=32868)\n0.4375 = fieldNorm(doc=2674)\n", "879606_HDR":"\n3.766031 = (MATCH) weight(description:jackshaft in 979) 
[DefaultSimilarity], result of:\n 3.766031 = fieldWeight in 979, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 10.042749 = idf(docFreq=3, maxDocs=33828)\n0.375 = fieldNorm(doc=979)\n", "802721_HDR":"\n3.3132396 = (MATCH) weight(description:jackshaft in 23662) [DefaultSimilarity], result of:\n 3.3132396 = fieldWeight in 23662, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 8.835305 = idf(docFreq=12, maxDocs=32868)\n0.375 = fieldNorm(doc=23662)\n", "875853_HDR":"\n3.3132396 = (MATCH) weight(description:jackshaft in 11654) [DefaultSimilarity], result of:\n 3.3132396 = fieldWeight in 11654, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 8.835305 = idf(docFreq=12, maxDocs=32868)\n0.375 = fieldNorm(doc=11654)\n", "878628_HDR":"\n3.3132396 = (MATCH) weight(description:jackshaft in 1141) [DefaultSimilarity], result of:\n 3.3132396 = fieldWeight in 1141, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 8.835305 = idf(docFreq=12, maxDocs=32868)\n0.375 = fieldNorm(doc=1141)\n", "880537_HDR":"\n3.3132396 = (MATCH) weight(description:jackshaft in 7821) [DefaultSimilarity], result of:\n 3.3132396 = fieldWeight in 7821, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 8.835305 = idf(docFreq=12, maxDocs=32868)\n0.375 = fieldNorm(doc=7821)\n", "884278_HDR":"\n3.3132396 = (MATCH) weight(description:jackshaft in 21496) [DefaultSimilarity], result of:\n 3.3132396 = fieldWeight in 21496, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 8.835305 = idf(docFreq=12, maxDocs=32868)\n0.375 = fieldNorm(doc=21496)\n", "885893_HDR":"\n3.3132396 = (MATCH) weight(description:jackshaft in 26120) [DefaultSimilarity], result of:\n 3.3132396 = fieldWeight in 26120, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 8.835305 = idf(docFreq=12, maxDocs=32868)\n0.375 = fieldNorm(doc=26120)\n"}}} -Original Message- From: Upayavira [mailto:u...@odoko.co.uk] 
Sent: Thursday, July 23, 2015 2:36 PM To: solr-user@lucene.apache.org Subject: Re: Different scores for the same search What it looks like is kinda as Erick suggested - the scores are the same for some docs, so it probably depends upon which order they come back from the shards as to which wi
Different scores for the same search
I'm executing a very simple search in a 3 node cluster - 3 shards with 1 replica each. Solr version 4.10.2: http://server1.domain.com:8983/solr/serviceorder_shard1_replica2/select?q=description%3Ajackshaft&fl=service_order&wt=json&indent=true&debugQuery=true I'm getting different scores when I run them several times. This brings the results in different order. How can I ensure the same search returns the results with consistent rank? First time I run the query: "884420_HDR":"\n6.357613 = (MATCH) weight(description:jackshaft in 3339) [DefaultSimilarity], result of:\n 6.357613 = fieldWeight in 3339, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 10.172181 = idf(docFreq=4, maxDocs=48128)\n0.625 = fieldNorm(doc=3339)\n", "882606_HDR":"\n5.53756 = (MATCH) weight(description:jackshaft in 5663) [DefaultSimilarity], result of:\n 5.53756 = fieldWeight in 5663, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 8.860096 = idf(docFreq=12, maxDocs=33693)\n0.625 = fieldNorm(doc=5663)\n", "881351_HDR":"\n5.53756 = (MATCH) weight(description:jackshaft in 10434) [DefaultSimilarity], result of:\n 5.53756 = fieldWeight in 10434, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 8.860096 = idf(docFreq=12, maxDocs=33693)\n0.625 = fieldNorm(doc=10434)\n", "880845_HDR":"\n5.0860906 = (MATCH) weight(description:jackshaft in 19728) [DefaultSimilarity], result of:\n 5.0860906 = fieldWeight in 19728, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 10.172181 = idf(docFreq=4, maxDocs=48128)\n0.5 = fieldNorm(doc=19728)\n", "877923_HDR":"\n5.0860906 = (MATCH) weight(description:jackshaft in 1569) [DefaultSimilarity], result of:\n 5.0860906 = fieldWeight in 1569, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 10.172181 = idf(docFreq=4, maxDocs=48128)\n0.5 = fieldNorm(doc=1569)\n", "880918_HDR":"\n5.0213747 = (MATCH) weight(description:jackshaft in 13013) [DefaultSimilarity], result of:\n 
5.0213747 = fieldWeight in 13013, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 10.042749 = idf(docFreq=3, maxDocs=33828)\n0.5 = fieldNorm(doc=13013)\n", "880146_HDR":"\n4.4503293 = (MATCH) weight(description:jackshaft in 24626) [DefaultSimilarity], result of:\n 4.4503293 = fieldWeight in 24626, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 10.172181 = idf(docFreq=4, maxDocs=48128)\n0.4375 = fieldNorm(doc=24626)\n", "874085_HDR":"\n4.430048 = (MATCH) weight(description:jackshaft in 2470) [DefaultSimilarity], result of:\n 4.430048 = fieldWeight in 2470, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 8.860096 = idf(docFreq=12, maxDocs=33693)\n0.5 = fieldNorm(doc=2470)\n", "877118_HDR":"\n4.430048 = (MATCH) weight(description:jackshaft in 12530) [DefaultSimilarity], result of:\n 4.430048 = fieldWeight in 12530, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 8.860096 = idf(docFreq=12, maxDocs=33693)\n0.5 = fieldNorm(doc=12530)\n", "883634_HDR":"\n4.393703 = (MATCH) weight(description:jackshaft in 24564) [DefaultSimilarity], result of:\n 4.393703 = fieldWeight in 24564, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 10.042749 = idf(docFreq=3, maxDocs=33828)\n0.4375 = fieldNorm(doc=24564)\n", "877317_HDR":"\n3.876292 = (MATCH) weight(description:jackshaft in 12902) [DefaultSimilarity], result of:\n 3.876292 = fieldWeight in 12902, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 8.860096 = idf(docFreq=12, maxDocs=33693)\n0.4375 = fieldNorm(doc=12902)\n", "886520_HDR":"\n3.876292 = (MATCH) weight(description:jackshaft in 2053) [DefaultSimilarity], result of:\n 3.876292 = fieldWeight in 2053, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 8.860096 = idf(docFreq=12, maxDocs=33693)\n0.4375 = fieldNorm(doc=2053)\n", "879606_HDR":"\n3.766031 = (MATCH) weight(description:jackshaft in 979) [DefaultSimilarity], result 
of:\n 3.766031 = fieldWeight in 979, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 10.042749 = idf(docFreq=3, maxDocs=33828)\n0.375 = fieldNorm(doc=979)\n", "875853_HDR":"\n3.322536 = (MATCH) weight(description:jackshaft in 6890) [DefaultSimilarity], result of:\n 3.322536 = fieldWeight in 6890, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 8.860096 = idf(docFreq=12, maxDocs=33693)\n0.375 = fieldNorm(doc=6890)\n", "880537_HDR":"\n3.322536 = (MATCH) weight(description:jackshaft in 10814) [DefaultSimilarity], result of
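For what it's worth, the explain output above is consistent with per-shard statistics driving the differences: DefaultSimilarity computes idf = 1 + ln(maxDocs / (docFreq + 1)) from the stats of whichever shard or replica scores the document, and 4.x does not use distributed (global) IDF by default, so maxDocs of 33693 in one run versus 48128 in another yields different idf values and therefore different scores for the same term. A quick check of the formula against the numbers in the explain output:

```python
import math

def idf(doc_freq, max_docs):
    """Lucene DefaultSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def field_weight(tf, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm, matching the explain output."""
    return tf * idf(doc_freq, max_docs) * field_norm

# Same term, same docFreq, but different per-shard maxDocs give a
# different idf -- hence different scores from run to run when a
# different replica or shard answers.
```

Plugging in docFreq=12, maxDocs=32868 reproduces the 8.835305 idf in the dump, while docFreq=4, maxDocs=48128 reproduces 10.172181.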
RE: Inconsistent Solr Search Results
Erick, The 3 node cluster is setup to use 3 shards each with 1 replica. So, the index is split on 3 servers. Another piece of info - I think the issue happens only when I use pagination. Verifying if that's the case.. Here's a query from the solr log on the server I'm pointing the query to: INFO - 2015-07-23 16:56:09.683; org.apache.solr.core.SolrCore; [serviceorder_shard1_replica2] webapp=/solr path=/select params={f.contact_facet.facet.sort=false&f.environment_facet.facet.mincount=1&facet=true&f.item_category1_hdr_facet.facet.sort=false&f.item_category3_hdr_facet.facet.sort=false&f.order_reason_facet.facet.sort=false&f.ac_model_facet.facet.mincount=1&f.employee_responsible_facet.facet.sort=false&f.header_category2_facet.facet.mincount=1&f.item_category3_hdr_facet.facet.mincount=1&f.ac_model_facet.facet.sort=false&f.environment_facet.facet.sort=false&f.header_status_facet.facet.mincount=1&f.howmal_facet.facet.mincount=1&fl=ac_model,contact,description,header_category2,header_status,id,notes_question_subject,notes_problem_description,notes_quick_response,order_reason,requested_start,serial_number,service_order,sold_to_party&f.operational_effect_facet.facet.sort=false&f.item_category2_hdr_facet.facet.sort=false&f.header_category2_facet.facet.sort=false&f.sold_to_party.facet.sort=false&facet.field=sold_to_party&facet.field=item_category3_hdr_facet&facet.field=item_category2_hdr_facet&facet.field=item_category1_hdr_facet&facet.field=operational_effect_facet&facet.field=when_discovered_facet&facet.field=environment_facet&facet.field=header_category2_facet&facet.field=header_status_facet&facet.field=howmal_facet&facet.field=order_reason_facet&facet.field=employee_responsible_facet&facet.field=contact_facet&facet.field=priority&facet.field=ac_model_facet&f.order_reason_facet.facet.mincount=1&f.howmal_facet.facet.sort=false&fq=requested_start:[+1991-01-01T00:00:00Z+TO+2015-07-23T23:59:59Z+]&fq=document_type:header&f.priority.facet.sort=false&f.header_status_facet.facet.sor
t=false&f.item_category1_hdr_facet.facet.mincount=1&f.sold_to_party.facet.mincount=1&f.operational_effect_facet.facet.mincount=1&f.priority.facet.mincount=1&rows=100&f.item_category2_hdr_facet.facet.mincount=1&f.employee_responsible_facet.facet.mincount=1&start=0&q={!type%3Dedismax+qf%3D'service_order^9+serial_number^9+material_hdr^9+description^8+notes_problem_description^7+notes_quick_response^6+notes_question_subject^5+notes_internal_note^4+notes_request_comments^3+notes_easa_aircarrier_notes^3+notes_apparent_cause^3+doc_content_hdr^2'+pf%3D'description~4^8+notes_problem_description~4^7+notes_quick_response~4^6+notes_question_subject~4^5+notes_internal_note~4^4+notes_request_comments~4^3+notes_easa_aircarrier_notes~4^3+notes_apparent_cause~4^3+doc_content_hdr~10^2'+pf2%3D'description~4^8+notes_problem_description~4^7+notes_quick_response~4^6+notes_question_subject~4^5+notes_internal_note~4^4+notes_request_comments~4^3+notes_easa_aircarrier_notes~4^3+notes_apparent_cause~4^3+doc_content_hdr~10^2'+pf3%3D'description~4^8+notes_problem_description~4^7+notes_quick_response~4^6+notes_question_subject~4^5+notes_internal_note~4^4+notes_request_comments~4^3+notes_easa_aircarrier_notes~4^3+notes_apparent_cause~4^3+doc_content_hdr~10^2'}driveshaft+corrosion&f.when_discovered_facet.facet.sort=fa -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Thursday, July 23, 2015 11:18 AM To: solr-user@lucene.apache.org Subject: Re: Inconsistent Solr Search Results The query you're running would help. But here's a guess: You say you have a "3 node Solr cluster". By that I'm guessing you mean a single shard with 1 leader and 2 replicas. when the primary sort criteria (score by default) is tied between two documents, the internal Lucene doc ID is used as a tiebreaker. If you're doing something like a *:* query (asterisk:asterisk in case bolding happens) then the score for all docs is the same. 
Here's the kicker: the internal Lucene doc id will be different in each replica. So my guess is you're getting results from different replicas, in which the relative ordering of internal doc ids differs between docs. So I claim you don't get different results every time; you get one of three result orderings, at a guess. If you have a decent-size corpus and are searching by more interesting criteria this should be much less of a problem, but it can still theoretically happen. To nail down the ordering completely, specify a secondary sort, as &sort=score desc,id asc Best, Erick On Thu, Jul 23, 2015 at 8:46 AM, Tarala, Magesh wrote: > I have about 15K documents in a 3 node solr cluster. When I execute a simple > search, I g
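The effect of the tiebreaker is easy to see client-side: with a deterministic secondary key such as the uniqueKey, tied scores always come back in the same order no matter which replica answered. A small illustration:

```python
def stable_order(docs):
    """Order docs by score descending, breaking ties on id ascending --
    the client-side analogue of Solr's sort=score desc,id asc."""
    return sorted(docs, key=lambda d: (-d["score"], d["id"]))
```

Without the id tiebreaker, documents with equal scores may legally come back in any order; with it, the ordering is a pure function of the result set.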
Inconsistent Solr Search Results
I have about 15K documents in a 3 node solr cluster. When I execute a simple search, I get the results in different order every time I search. But the number of records is the same. Here's the definition for the field. Any ideas, suggestions would be greatly appreciated. Thanks, Magesh
RE: Solr cloud error during document ingestion
Shawn, Here are my responses: >> Is that the entire error, or is there additional error information? Do >> you have any way to know exactly what is in that request that's throwing >> the error? That's the entire error stack. Don’t see anything else in solr log. Probably need to turn on additional logging? I've identified the text in the email (.msg) that's causing it. This is it: (daños) The tilde in the n is the culprit. If I remove this and run the load, it works fine. >> You said 4.10.2 ... is this the Solr or SolrJ version? Are both of them >> the same version? Are you running Solr in the included jetty, or have >> you installed it into another servlet container? What Java vendor and >> version are you running, and is it 64-bit? Solr version is 4.10.2 Solrj version is 4.10.3 Using built in Jetty. Java(TM) SE Runtime Environment (build 1.7.0_67-b01) >> Can you share your SolrJ code, solrconfig, schema, and any other >> information you can think of that might be relevant? Yes, absolutely. Where would you like to see it posted? >> Because the error is from a request, I doubt that autoCommit has >> anything to do with the problem, but I could be wrong about that. Yes, agree. This is not related to autoCommit. -Original Message- From: Shawn Heisey [mailto:apa...@elyograg.org] Sent: Sunday, July 12, 2015 6:25 PM To: solr-user@lucene.apache.org Subject: Re: Solr cloud error during document ingestion On 7/11/2015 9:33 PM, Tarala, Magesh wrote: > I'm using 4.10.2 in a 3 node solr cloud setup > I have a collection with 3 shards and 2 replicas each. > I'm ingesting solr documents via solrj. > > While ingesting the documents, I get the following error: > > 264147944 [updateExecutor-1-thread-268] ERROR > org.apache.solr.update.StreamingSolrServers ? 
error > org.apache.solr.common.SolrException: Bad Request > > request: > http://10.222.238.35:8983/solr/serviceorder_shard1_replica2/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F10.222.238.36%3A8983%2Fsolr%2Fserviceorder_shard2_replica1%2F&wt=javabin&version=2 > at > org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:241) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >at java.lang.Thread.run(Thread.java:745) > > I commit after every 100 documents in solrj. > And I also have the following solrconfig.xml setting: > >${solr.autoCommit.maxTime:15000} >false > Is that the entire error, or is there additional error information? Do you have any way to know exactly what is in that request that's throwing the error? You said 4.10.2 ... is this the Solr or SolrJ version? Are both of them the same version? Are you running Solr in the included jetty, or have you installed it into another servlet container? What Java vendor and version are you running, and is it 64-bit? Can you share your SolrJ code, solrconfig, schema, and any other information you can think of that might be relevant? Because the error is from a request, I doubt that autoCommit has anything to do with the problem, but I could be wrong about that. Thanks, Shawn
RE: Solr cloud error during document ingestion
I'm using solrj to ingest the documents. But I'm using only one client now. Yes Erick, it is weird. You are right, it is already in UTF-8, so even if I convert it explicitly it is the same string and so, same issue... I'm still stumped:( The code is straightforward. I am using Tika and solrj: creating a Tika AutoDetectParser(), getting the BodyContentHandler, creating a SafeContentHandler, and getting the content as a string, then passing the string to a SolrInputDocument. The error occurs when the SolrInputDocument is sent to solr. One other piece of info: we are creating nested documents. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Sunday, July 12, 2015 1:41 PM To: solr-user@lucene.apache.org Subject: Re: Solr cloud error during document ingestion How are you ingesting documents? ExtractingRequestHandler? That loads all the work onto the Solr node(s); you might want to consider using SolrJ, as that gives you much more control as well as the ability to farm out the work to N clients. Another blog: https://lucidworks.com/blog/indexing-with-solrj/ Best, Erick P.S. Glad you found the problem, but it's a little weird. Solr already talks UTF-8 so this should "just work", but then I'm not familiar with all the details of your setup. On Sun, Jul 12, 2015 at 10:11 AM, Tarala, Magesh wrote: > I narrowed down the cause. And it is a character issue! > > The .msg file content I'm extracting using Tika parser has this text > (daños) If I remove the character n with the tilde, it works. > > Explicitly convert to UTF-8 before sending it to solr? > > Erick - I'm in the QA phase. I'll be ingesting around 800K documents total > (word, pdf, excel, .msg, txt, etc.) For now I'm considering daily updates > when we first go to prod end of month. i.e., capture all the new and modified > documents on a daily basis and update solr. Once we get a grasp of things, we > want to go near real time. Thanks for the link to your post. It is very > helpful.
> > > > -Original Message- > From: Erick Erickson [mailto:erickerick...@gmail.com] > Sent: Sunday, July 12, 2015 11:24 AM > To: solr-user@lucene.apache.org > Subject: Re: Solr cloud error during document ingestion > > Probably not related to your problem, but if you're sending lots of docs at > Solr, committing every 100 is very aggressive. > I'm assuming you're committing from the client, which, while OK > doesn't scale very well if you ever decide to have more than > 1 client sending docs. > > I'd recommend setting your hard commit to a minute or so and just leaving it > at that if possible, with soft committing to make the docs visible. > > Here's more than you ever wanted to know about soft commits, hard commits and > such: > https://lucidworks.com/blog/understanding-transaction-logs-softcommit- > and-commit-in-sorlcloud/ > > Best, > Erick > > On Sun, Jul 12, 2015 at 8:40 AM, Mikhail Khludnev > wrote: >> I suggest to check >> http://10.222.238.35:8983/solr/serviceorder_shard1_replica2 >> <http://10.222.238.35:8983/solr/serviceorder_shard1_replica2/update?u >> p >> date.distrib=TOLEADER&distrib.from=http%3A%2F%2F10.222.238.36%3A8983% >> 2 Fsolr%2Fserviceorder_shard2_replica1%2F&wt=javabin&version=2> >> logs to find root cause. >> >> On Sun, Jul 12, 2015 at 6:33 AM, Tarala, Magesh wrote: >> >>> I'm using 4.10.2 in a 3 node solr cloud setup I have a collection >>> with 3 shards and 2 replicas each. >>> I'm ingesting solr documents via solrj. >>> >>> While ingesting the documents, I get the following error: >>> >>> 264147944 [updateExecutor-1-thread-268] ERROR >>> org.apache.solr.update.StreamingSolrServers ? 
error >>> org.apache.solr.common.SolrException: Bad Request >>> >>> request: >>> http://10.222.238.35:8983/solr/serviceorder_shard1_replica2/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F10.222.238.36%3A8983%2Fsolr%2Fserviceorder_shard2_replica1%2F&wt=javabin&version=2 >>> at >>> org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:241) >>> at >>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >>> at >>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >>>at java.lang.Thread.run(Thread.java:745) >>> >>> I commit after every 100 documents in solrj. >>> And I also have the following solrconfig.xml setting: >>> >>>
RE: Solr cloud error during document ingestion
I narrowed down the cause. And it is a character issue! The .msg file content I'm extracting using Tika parser has this text (daños) If I remove the character n with the tilde, it works. Explicitly convert to UTF-8 before sending it to solr? Erick - I'm in the QA phase. I'll be ingesting around 800K documents total (word, pdf, excel, .msg, txt, etc.) For now I'm considering daily updates when we first go to prod end of month. i.e., capture all the new and modified documents on a daily basis and update solr. Once we get a grasp of things, we want to go near real time. Thanks for the link to your post. It is very helpful. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Sunday, July 12, 2015 11:24 AM To: solr-user@lucene.apache.org Subject: Re: Solr cloud error during document ingestion Probably not related to your problem, but if you're sending lots of docs at Solr, committing every 100 is very aggressive. I'm assuming you're committing from the client, which, while OK doesn't scale very well if you ever decide to have more than 1 client sending docs. I'd recommend setting your hard commit to a minute or so and just leaving it at that if possible, with soft committing to make the docs visible. Here's more than you ever wanted to know about soft commits, hard commits and such: https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ Best, Erick On Sun, Jul 12, 2015 at 8:40 AM, Mikhail Khludnev wrote: > I suggest to check > http://10.222.238.35:8983/solr/serviceorder_shard1_replica2 > <http://10.222.238.35:8983/solr/serviceorder_shard1_replica2/update?up > date.distrib=TOLEADER&distrib.from=http%3A%2F%2F10.222.238.36%3A8983%2 > Fsolr%2Fserviceorder_shard2_replica1%2F&wt=javabin&version=2> > logs to find root cause. > > On Sun, Jul 12, 2015 at 6:33 AM, Tarala, Magesh wrote: > >> I'm using 4.10.2 in a 3 node solr cloud setup I have a collection >> with 3 shards and 2 replicas each. 
>> I'm ingesting solr documents via solrj. >> >> While ingesting the documents, I get the following error: >> >> 264147944 [updateExecutor-1-thread-268] ERROR >> org.apache.solr.update.StreamingSolrServers ? error >> org.apache.solr.common.SolrException: Bad Request >> >> request: >> http://10.222.238.35:8983/solr/serviceorder_shard1_replica2/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F10.222.238.36%3A8983%2Fsolr%2Fserviceorder_shard2_replica1%2F&wt=javabin&version=2 >> at >> org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:241) >> at >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >> at >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >>at java.lang.Thread.run(Thread.java:745) >> >> I commit after every 100 documents in solrj. >> And I also have the following solrconfig.xml setting: >> >>${solr.autoCommit.maxTime:15000} >>false >> >> >> >> IMO, tlogs (for serviceorder_shard1_replica2) are not too big >> -rw-r--r-- 1 solr users 8338 Jul 11 21:40 tlog.364 >> -rw-r--r-- 1 solr users 6385 Jul 11 21:40 tlog.365 >> -rw-r--r-- 1 solr users 10221 Jul 11 21:41 tlog.366 >> -rw-r--r-- 1 solr users 5981 Jul 11 21:41 tlog.367 >> -rw-r--r-- 1 solr users 2682 Jul 11 21:41 tlog.368 >> -rw-r--r-- 1 solr users 8515 Jul 11 21:42 tlog.369 >> -rw-r--r-- 1 solr users 7373 Jul 11 21:42 tlog.370 >> -rw-r--r-- 1 solr users 6907 Jul 11 21:42 tlog.371 >> -rw-r--r-- 1 solr users 5524 Jul 11 21:42 tlog.372 >> -rw-r--r-- 1 solr users 5600 Jul 11 21:43 tlog.373 >> >> >> So far I've not been able to resolve this issue. Any ideas / pointers >> would be greatly appreciated! >> >> Thanks, >> Magesh >> >> > > > -- > Sincerely yours > Mikhail Khludnev > Principal Engineer, > Grid Dynamics > > <http://www.griddynamics.com> >
Solr cloud error during document ingestion
I'm using 4.10.2 in a 3 node solr cloud setup. I have a collection with 3 shards and 2 replicas each. I'm ingesting solr documents via solrj. While ingesting the documents, I get the following error: 264147944 [updateExecutor-1-thread-268] ERROR org.apache.solr.update.StreamingSolrServers ? error org.apache.solr.common.SolrException: Bad Request request: http://10.222.238.35:8983/solr/serviceorder_shard1_replica2/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F10.222.238.36%3A8983%2Fsolr%2Fserviceorder_shard2_replica1%2F&wt=javabin&version=2 at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:241) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) I commit after every 100 documents in solrj. And I also have the following solrconfig.xml setting: <autoCommit> <maxTime>${solr.autoCommit.maxTime:15000}</maxTime> <openSearcher>false</openSearcher> </autoCommit> IMO, tlogs (for serviceorder_shard1_replica2) are not too big -rw-r--r-- 1 solr users 8338 Jul 11 21:40 tlog.364 -rw-r--r-- 1 solr users 6385 Jul 11 21:40 tlog.365 -rw-r--r-- 1 solr users 10221 Jul 11 21:41 tlog.366 -rw-r--r-- 1 solr users 5981 Jul 11 21:41 tlog.367 -rw-r--r-- 1 solr users 2682 Jul 11 21:41 tlog.368 -rw-r--r-- 1 solr users 8515 Jul 11 21:42 tlog.369 -rw-r--r-- 1 solr users 7373 Jul 11 21:42 tlog.370 -rw-r--r-- 1 solr users 6907 Jul 11 21:42 tlog.371 -rw-r--r-- 1 solr users 5524 Jul 11 21:42 tlog.372 -rw-r--r-- 1 solr users 5600 Jul 11 21:43 tlog.373 So far I've not been able to resolve this issue. Any ideas / pointers would be greatly appreciated! Thanks, Magesh
RE: Solr Encoding Issue?
Shawn - Stupid coding error in my java code. Used default charset. Changed to UTF-8 and problem fixed. Thanks again! -Original Message- From: Tarala, Magesh Sent: Wednesday, July 08, 2015 8:11 PM To: solr-user@lucene.apache.org Subject: RE: Solr Encoding Issue? Wow, that makes total sense. Thanks Shawn!! I'll go down this path. Thanks, Magesh -Original Message- From: Shawn Heisey [mailto:apa...@elyograg.org] Sent: Wednesday, July 08, 2015 7:24 PM To: solr-user@lucene.apache.org Subject: Re: Solr Encoding Issue? On 7/8/2015 6:09 PM, Tarala, Magesh wrote: > I believe the issue is in solr. The character “à” is getting stored in solr > as “Ã ”. Notice the space after Ã. > > I'm using solrj to ingest the documents into solr. So, one of those could be > the culprit? Solr accepts and outputs text in UTF-8. The UTF-8 hex encoding for the à character is C3A0. In the latin1 character set, hex C3 is the Ã character. Similarly, in latin1, hex A0 is a non-breaking space. So it sounds like your input is encoded as UTF-8, therefore that character in your input source is hex C3A0, but something in your indexing process is incorrectly interpreting the UTF-8 representation as latin1, so it sees it as "Ã ". SolrJ is faithfully converting that input to UTF-8 and sending it to Solr. Thanks, Shawn
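Shawn's byte-level explanation can be reproduced with a few lines of plain JDK code. This is an illustrative sketch (the class name is made up, not code from the thread):

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    public static void main(String[] args) {
        // "à" (U+00E0) encodes in UTF-8 as the two bytes 0xC3 0xA0.
        byte[] utf8Bytes = "\u00E0".getBytes(StandardCharsets.UTF_8);

        // Correct: decode the bytes as UTF-8, recovering "à".
        String correct = new String(utf8Bytes, StandardCharsets.UTF_8);

        // Wrong: decode the same bytes as latin1 --
        // 0xC3 becomes "Ã" and 0xA0 becomes a non-breaking space.
        String mangled = new String(utf8Bytes, StandardCharsets.ISO_8859_1);

        System.out.println("correct: " + correct);
        System.out.println("mangled: " + mangled);
    }
}
```

The fix reported above amounts to never relying on the platform default charset: always pass an explicit charset, e.g. `new String(bytes, StandardCharsets.UTF_8)` or `new InputStreamReader(in, StandardCharsets.UTF_8)`, rather than the no-argument overloads.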
RE: Solr Encoding Issue?
Wow, that makes total sense. Thanks Shawn!! I'll go down this path. Thanks, Magesh -Original Message- From: Shawn Heisey [mailto:apa...@elyograg.org] Sent: Wednesday, July 08, 2015 7:24 PM To: solr-user@lucene.apache.org Subject: Re: Solr Encoding Issue? On 7/8/2015 6:09 PM, Tarala, Magesh wrote: > I believe the issue is in solr. The character “à” is getting stored in solr > as “Ã ”. Notice the space after Ã. > > I'm using solrj to ingest the documents into solr. So, one of those could be > the culprit? Solr accepts and outputs text in UTF-8. The UTF-8 hex encoding for the à character is C3A0. In the latin1 character set, hex C3 is the Ã character. Similarly, in latin1, hex A0 is a non-breaking space. So it sounds like your input is encoded as UTF-8, therefore that character in your input source is hex C3A0, but something in your indexing process is incorrectly interpreting the UTF-8 representation as latin1, so it sees it as "Ã ". SolrJ is faithfully converting that input to UTF-8 and sending it to Solr. Thanks, Shawn
RE: Solr Encoding Issue?
Thanks Erick. I believe the issue is in solr. The character “à” is getting stored in solr as “Ã ”. Notice the space after Ã. I'm using solrj to ingest the documents into solr. So, one of those could be the culprit? -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, July 08, 2015 1:36 PM To: solr-user@lucene.apache.org Subject: Re: Solr Encoding Issue? Attachments are pretty aggressively stripped by the e-mail server, so there's nothing to see, you'll have to paste it somewhere else and provide a link. Usually, though, this is a character set issue with the browser using a different charset than Solr, it's really the same character, just displayed differently. Shot in the dark though. Erick On Wed, Jul 8, 2015 at 10:49 AM, Tarala, Magesh wrote: > I’m ingesting a .TXT file with HTML content into Solr. The content > has the following character highlighted below: > > The file we get from CRM (also attached): > > [image: cid:image001.png@01D0B972.75BE23F0] > > > > > > After ingesting into solr, I see a different character. This is query > response from solr management console. > > > > [image: cid:image003.png@01D0B972.D1AED290] > > > > > > Anybody know how I can prevent this from happening? > > > > Thanks! >
RE: Solr Encoding Issue?
Looks like images did not come through. Here's the text... I'm ingesting a .TXT file with HTML content into Solr. The content has the following character highlighted below: The file we get from CRM (also attached): Enter Data in TK Onlyà After ingesting into solr, I see a different character. This is query response from solr management console. Enter Data in TK OnlyÃ I'm expecting to see à But I'm seeing à Anybody know how I can prevent this from happening? Thanks!
Solr Encoding Issue?
I'm ingesting a .TXT file with HTML content into Solr. The content has the following character highlighted below: The file we get from CRM (also attached): [cid:image001.png@01D0B972.75BE23F0] After ingesting into solr, I see a different character. This is query response from solr management console. [cid:image003.png@01D0B972.D1AED290] Anybody know how I can prevent this from happening? Thanks!
RE: Jetty Plus for Solr 4.10.4
Hi Shawn - Thank you for the quick and detailed response!! Good to hear that the Jetty 8 installation bundled with solr does not need to be modified for typical uses. I believe what we have is a "typical" use case. We will be installing solr on 3 nodes in our Hadoop cluster. Will use Hadoop's zookeeper. One collection with 3 shards and 2 replicas each. Have not benchmarked performance. So, may need more shards, nodes,... Data volume and user volumes are not very high. But we are using nested document structure. We are concerned that it may introduce performance issues. Will check it out. Regarding your recommendation to upgrade to Solr 5.2.1, we have Hortonworks HDP 2.2 in place and they support 4.10. Will revisit the decision. Thanks, Magesh -Original Message- From: Shawn Heisey [mailto:apa...@elyograg.org] Sent: Monday, June 29, 2015 11:50 AM To: solr-user@lucene.apache.org Subject: Re: Jetty Plus for Solr 4.10.4 On 6/29/2015 8:44 AM, Tarala, Magesh wrote: > We are planning to go to production with Solr 4.10.4. Documentation > recommends using the full Jetty package that includes JettyPlus. I'm not able to > find the instructions to do this. Can someone point me in the right direction? I found the official page that talks about JettyPlus. https://wiki.apache.org/solr/SolrJetty Note at the top of the page where it says that info is outdated for Jetty 8. Solr has been using Jetty 8 since version 4.0-ALPHA -- for nearly three years now. Typical use cases for Solr do *not* require a full Jetty install. Even most non-typical use cases do not require it. Solr 4.10 includes the bin/solr script for startup, which runs the Jetty that's included in the Solr download. Solr 5.x makes those scripts even better. If you haven't made it to production yet, you should probably consider upgrading to Solr 5.2.1. If you are not going to use the Jetty included with Solr, then you're pretty much on your own. 
You can take the war file from the dist directory, the logging jars from the example/lib/ext directory, and the logging config from example/resources, and install it in most of the available servlet containers. Starting with 5.0, the included Jetty is the only officially supported way to start Solr, and the war is no longer included in the dist directory in the download. https://wiki.apache.org/solr/WhyNoWar Thanks, Shawn
Jetty Plus for Solr 4.10.4
We are planning to go to production with Solr 4.10.4. Documentation recommends using the full Jetty package that includes JettyPlus. I'm not able to find the instructions to do this. Can someone point me in the right direction? Thanks, Magesh