RE: date range tree
I solved it by adding a loop for years and one for quarters in which I count the month facets.

-----Original Message-----
From: Andreas Owen [mailto:a...@conx.ch]
Sent: Monday, 11 November 2013 17:52
To: solr-user@lucene.apache.org
Subject: RE: date range tree

Has someone at least got an idea how I could do a year/month date tree? The Solr wiki mentions that facet.date.gap=+1DAY,+2DAY,+3DAY,+10DAY should create 4 buckets, but it doesn't work.

-----Original Message-----
From: Andreas Owen [mailto:a...@conx.ch]
Sent: Thursday, 7 November 2013 18:23
To: solr-user@lucene.apache.org
Subject: date range tree

I would like to make a facet on a date field with the following tree:

2013
  4th Quarter: December, November, October
  3rd Quarter: September, August, July
  2nd Quarter: June, May, April
  1st Quarter: March, February, January
2012
  (same as above)

So far I have this in solrconfig.xml:

<str name="facet.date">{!ex=last_modified,thema,inhaltstyp,doctype}last_modified</str>
<str name="facet.date.gap">+1MONTH</str>
<str name="facet.date.end">NOW/MONTH</str>
<str name="facet.date.start">NOW/MONTH-36MONTHS</str>
<str name="facet.date.other">after</str>

Can I do this in one query or do I need multiple queries? If multiple, how would I do the second one and keep all the facet queries in the count?
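One way to get fixed year/quarter buckets in a single request is one facet.query per bucket; a minimal sketch, assuming the last_modified field from the thread (the date ranges are illustrative and would be repeated per year):

facet=true
facet.query={!key=2013_Q4}last_modified:[2013-10-01T00:00:00Z TO 2013-12-31T23:59:59Z]
facet.query={!key=2013_Q3}last_modified:[2013-07-01T00:00:00Z TO 2013-09-30T23:59:59Z]
facet.query={!key=2013_Q2}last_modified:[2013-04-01T00:00:00Z TO 2013-06-30T23:59:59Z]
facet.query={!key=2013_Q1}last_modified:[2013-01-01T00:00:00Z TO 2013-03-31T23:59:59Z]

Each facet.query comes back with its own labelled count, so four of these per year alongside the +1MONTH facet.date ranges yields the whole year/quarter/month tree in one query.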
Re: serialization error - BinaryResponseWriter
Hmm, I run a DIH full reload every night, and the source field is a SQL Server smallint column... By the way, I'll try cleaning the data dir of the index and reindexing.

Il 12/11/13 17:13, Shawn Heisey wrote:

On 11/12/2013 2:37 AM, giovanni.bricc...@banzai.it wrote:

I'm getting some errors reading boolean fields, can you give me any suggestions? In this example I only have four false fields: leasing=false, FiltroNovita=false, FiltroFreeShipping=false, Outlet=false. This is the stack trace (Solr 4.2.1):

java.lang.NumberFormatException: For input string: "false"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:492)
    at java.lang.Integer.valueOf(Integer.java:582)
    at org.apache.solr.schema.IntField.toObject(IntField.java:89)
    at org.apache.solr.schema.IntField.toObject(IntField.java:43)
    at org.apache.solr.response.BinaryResponseWriter$Resolver.getValue(BinaryResponseWriter.java:223)

Solr stores boolean values internally as a number - 0 or 1. That gets changed to true/false when displaying search results. It sounds like what you have here is quite possibly an index which originally had text fields with the literal string "true" or "false", and you've changed your schema so these fields are now boolean. When you change your schema, you have to reindex. http://wiki.apache.org/solr/HowToReindex

Thanks,
Shawn
Re: Multi-Tenant Setup in Single Core
On 11/12/13 5:20 PM, Shawn Heisey wrote:

Ensure that all handler names start with a slash character, so they are things like /query, /select, and so on. Make sure that handleSelect is set to false on your requestDispatcher config. This is how Solr 4.x examples are set up already. With that config, the qt parameter will not function and will be ignored -- you must use the request handler path as part of the URL -- /solr/corename/handler.

Great, thanks. I already had it this way but I wasn't aware of these fine details; very helpful.

Christian
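For reference, a minimal sketch of the 4.x-style configuration Shawn describes (solrconfig.xml; handler contents elided):

<requestDispatcher handleSelect="false">
  ...
</requestDispatcher>

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
  </lst>
</requestHandler>

With handleSelect=false the qt parameter is ignored and the handler is chosen purely by the URL path, e.g. /solr/corename/select.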
Re: Modify the querySearch to q=*:*
Hi:

First of all I have to say that I had never heard of *\* as the query to get all the documents in an index, only *:* (maybe I'm wrong). Re-reading the Apache Solr 4 Cookbook, Solr 1.4 Enterprise Search Server and Apache Solr 3 Enterprise Search Server, there is no trace of *\* as the universal query to get every doc. If you enable debugQuery (http://wiki.apache.org/solr/CommonQueryParameters#debugQuery) you can see that *:* is transformed into MatchAllDocsQuery(*:*) (Solr 1.4 and Solr 4.4), which means "give me all the documents", but the query *\* is transformed into something else. In my case, having a default field called description defined in the schema, in Solr 1.4 I get description:*\\*, which means "give me all the documents that have the char \ in the field description", and in Solr 4.4 I get description:**, which also gets all the documents in the index. It would be helpful to see how *\* is interpreted in your system (Solr 3.5 and Solr 4).

I think the best way to solve your problem is to modify the system which launches the request to Solr and replace *\* with *:* (if that is possible). I don't know if Solr can make that kind of translation, I mean changing *\* to *:*. One possible workaround, with collateral damage, is to include a PatternReplaceCharFilterFactory (in schema.xml) within the field types you use for search, in order to delete every \ character in the input, or even to include the expression to transform *\* into *:*. But including that element in your schema means it will always be applied during search (thus if your users type a\b they will search for ab). If you want to explore that path, I recommend the analysis tool (https://cwiki.apache.org/confluence/display/solr/Analysis+Screen) included in Solr.

Regards.

On Wed, Nov 13, 2013 at 2:34 AM, Shawn Heisey s...@elyograg.org wrote:

On 11/12/2013 6:03 PM, Abhijith Jain -X (abhijjai - DIGITAL-X INC at Cisco) wrote:

I am trying to set the query to q=*:* permanently. I tried to set q=*:* in solrconfig.xml as follows:

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">none</str>
    <str name="q">*:*</str>
  </lst>
</requestHandler>

But this didn't help. Please advise how to change the query to q=*:* in Solr 4.4.

This configuration sets the default for the q parameter to *:*, but if the actual query that is sent to Solr has a q parameter, it will override that default. In the very unlikely situation that you don't want to ever do any query besides *:*, you can put that setting into the invariants section instead of the defaults section - but be aware that if you do that, you will never be able to send any other query. Normally your application decides what the query string should be, not Solr.

I concur with Jack's recommendation that you migrate to the 4.x way of naming handlers. You would need to set handleSelect to false and change all your search handlers so their names start with a slash. The one that is currently named "standard" would instead be named "/select" and you would need to remove the default="true" setting.

Thanks,
Shawn
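As a hedged sketch of the invariants variant Shawn mentions (only if you truly never want any other query; handler name follows the 4.x convention):

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="invariants">
    <str name="q">*:*</str>
  </lst>
</requestHandler>

An invariant overrides whatever q the client sends, which is exactly why it should be used with caution.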
Re: solrcloud - forward update to a shard failed
Do you do your commit from the two indexing clients or have the autocommit set to maxDocs = 1000?

- Thanks, Michael
Updating Document Score With Payload of Multivalued Field?
Here is my case: I have a field in my schema named elmo_field. I want elmo_field to have multiple values and multiple payloads, i.e.:

dorothy|0.46 sesame|0.37 big bird|0.19 bird|0.22

When a user searches for a keyword, i.e. dorothy, I want to add 0.46 to the score; if the user searches for big bird, 0.19; and if the user searches for bird, 0.22. I mean, I will make a search on the other fields of my Solr schema, and at the same time make another search (this one an exact-match search) on elmo_field; if something matches, I will increase the score with the payloads. How can I do that - adding something to the score from a multivalued payload (with a nested query or not) - and do you have any other ideas to achieve this?
Re: Updating Document Score With Payload of Multivalued Field?
PS: I use Solr 4.5.1
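For reference, the schema side usually used for payloaded terms looks roughly like this (a sketch; the delimiter and float encoder are assumptions matching the dorothy|0.46 format above):

<fieldType name="payloads" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.DelimitedPayloadTokenFilterFactory" delimiter="|" encoder="float"/>
  </analyzer>
</fieldType>

Getting the payload folded into the score still requires custom code on top of this - e.g. Lucene's PayloadTermQuery with a PayloadFunction, plus a Similarity whose scorePayload reads the float back - since Solr 4.5 has no out-of-the-box payload scoring.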
Re: Why do people want to deploy to Tomcat?
So, it sounds like either Solr is treated as a webapp, in which case it is installed with most of the other webapps under Tomcat (legacy/operational reasons) - so the Solr docs just need to explain how to deploy under Tomcat and the rest of the documentation/tooling comes from the Tomcat community. Or, if Solr is treated not as a webapp but as a black box, it needs to support and explain all the operational requirements (deployment, extension, monitoring) that are currently waved away as a 'container issue'.

Regards, Alex.

P.s. I also agree that the example directory layout is becoming very confusing and may need to be rethought. Probably a discussion for a different thread, if somebody has a thought-out suggestion.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Tue, Nov 12, 2013 at 8:32 PM, Gopal Patwa gopalpa...@gmail.com wrote:

My case is also similar to Sujit Pal's, but we have JBoss 6.

On Tue, Nov 12, 2013 at 9:47 AM, Sujit Pal sujit@comcast.net wrote:

In our case, it is because all our other applications are deployed on Tomcat and ops is familiar with the deployment process. We also had customizations that needed to go in, so we inserted our custom JAR into the solr.war's WEB-INF/lib directory, so to ops the process of deploying Solr was (almost, except for schema.xml or solrconfig.xml changes) identical to any of the other apps. But I think if Solr becomes a server with clearly defined extension points (such as dropping your custom JARs into lib/ and custom configuration in conf/solrconfig.xml or similar, like it already is) then it will be treated as something other than a webapp and the expectation that it runs on Tomcat will not apply. Just my $0.02... Sujit

On Tue, Nov 12, 2013 at 9:13 AM, Siegfried Goeschl sgoes...@gmx.at wrote:

Hi Alex, in my case:
* ignorance that Tomcat is not fully supported
* Tomcat configuration and operations know-how in house
* could migrate to Jetty but would need an approved change request to do so
Cheers, Siegfried Goeschl

On 12.11.13 04:54, Alexandre Rafalovitch wrote:

Hello, I keep seeing here and on Stack Overflow people trying to deploy Solr to Tomcat. We don't usually ask why, we just help where we can. But the question happens often enough that I am curious what the actual business case is. Is it because Tomcat is well known? Is it because other apps are running under Tomcat and it is ops' requirement? Is it because Tomcat gives something to Solr that Jetty does not? It might be useful to know, especially since the Solr team is considering making the server part into a black-box component. What use cases will that break? So, if somebody runs Solr under Tomcat (or needed to and gave up), let's use this thread to collect this knowledge.

Regards, Alex.
(info) lucene first search performance
Dear Lucene,

In order to test the Solr search performance, I disabled all the Solr caches, inserted 10 million documents, and found the first search very slow (700ms) while the second search is very quick (20ms); I am sure no Solr cache is involved. This problem has been bothering me for a month. Tracing the source code, I found that the first invocation of the readVIntBlock method is always very slow while the second invocation is very quick. I don't know the reason for this. Eagerly awaiting your reply, thanks very much!
Re: Why do people want to deploy to Tomcat?
Hi,

Reading that people have considered deploying the example folder is slightly strange to me. No wonder they are confused and confuse their ops. We just took vanilla Jetty (Jetty 9) and installed solr.war on it, configured it, no example folders at all. Since then it works nicely. The main reason for us to get away from Tomcat, which we had used originally, was that it felt too heavy for running a Solr webapp, which isn't using anything Tomcat-specific. In older versions (Tomcat 6) it would leak memory and threads. We knew that Jetty is mature enough, is lighter, and is used at large companies like Google. This was convincing enough to try. We are still using Tomcat for other webapps, specifically for clustering and load balancing between webapp instances, but that is not needed for our Solr installation at this point.

Regards, Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: twitter.com/dmitrykan
Re: distributed search is significantly slower than direct search
One thing you can try, and this is more diagnostic than a cure, is to return just the id field (and ensure that lazy field loading is true). That'll tell you whether the issue is actually fetching the document off disk and decompressing, although frankly that's unlikely since you can get your 5,000 rows from a single machine quickly.

The code you found where Solr is spending its time - is that on the routing core or on the shards? I actually have a hard time understanding how that code could take a long time; it doesn't seem right. You are transferring 5,000 docs across the network, so it's possible that your network is just slow; that's certainly a difference between the local and remote case, but that's a stab in the dark. Not much help I know, Erick

On Wed, Nov 13, 2013 at 2:52 AM, Elran Dvir elr...@checkpoint.com wrote:

Erick, thanks for your response. We are upgrading our system using Solr. We need to preserve old functionality. Our client displays 5K documents and groups them. Is there a way to refactor code in order to improve distributed document fetching? Thanks.

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, October 30, 2013 3:17 AM
To: solr-user@lucene.apache.org
Subject: Re: distributed search is significantly slower than direct search

You can't. There will inevitably be some overhead in the distributed case. That said, 7 seconds is quite long. 5,000 rows is excessive, and probably where your issue is. You're having to go out and fetch the docs across the wire. Perhaps there is some batching that could be done there; I don't know whether this is one document per request or not. Why 5K docs? Best, Erick

On Tue, Oct 29, 2013 at 2:54 AM, Elran Dvir elr...@checkpoint.com wrote:

Hi all, I am using Solr 4.4 with multiple cores. One core (called template) is my routing core. When I run http://127.0.0.1:8983/solr/template/select?rows=5000&q=*:*&shards=127.0.0.1:8983/solr/core1, it consistently takes about 7s. When I run http://127.0.0.1:8983/solr/core1/select?rows=5000&q=*:*, it consistently takes about 40ms. I profiled the distributed query. This is the distributed query process (I hope the terms are accurate): when Solr identifies a distributed query, it sends the query to the shard and gets the matched shard docs. Then it sends another query to the shard to get the Solr documents. Most time is spent in the last stage, in the process function of QueryComponent, in:

for (int i = 0; i < idArr.size(); i++) {
  int id = req.getSearcher().getFirstMatch(
      new Term(idField.getName(), idField.getType().toInternal(idArr.get(i))));
}

How can I make my distributed query as fast as the direct one? Thanks.
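A quick way to run Erick's diagnostic using the thread's own URLs (a sketch; only fl=id is added):

http://127.0.0.1:8983/solr/template/select?rows=5000&q=*:*&fl=id&shards=127.0.0.1:8983/solr/core1

If this comes back fast while the full-field version stays at ~7s, the time is going into fetching and decompressing stored fields; if it stays slow, the overhead is in the distributed id-collection and second-pass fetch itself.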
Re: (info) lucene first search performance
Solr uses the MMapDirectory by default. What you see is surely the filesystem cache: once a file is accessed, it's memory mapped, and restarting Solr won't reset it. On Unix, you may reset this cache with:

echo 3 > /proc/sys/vm/drop_caches

Franck Brisbart
Re: solrcloud - forward update to a shard failed
Explicit commits after writing 1000 docs in a batch from both indexing clients. No auto commit. Thanks.
Re: (info) lucene first search performance
I have to ask a different question: why would you disable the caches? You're trying to test worst-case times, perhaps? Because the caches are an integral part of Solr performance, disabling them artificially reduces your performance numbers. So disabling them is useful for answering the question "how bad can it get", but it's also skewing your results. FWIW, Erick

On Wed, Nov 13, 2013 at 7:42 AM, fbrisbart fbrisb...@bestofmedia.com wrote:
Re: Modify the querySearch to q=*:*
Just in case anybody is curious what *\* would really mean: the backslash escapes the following character, which in this case means don't treat the second asterisk as a wildcard. But since the initial asterisk was not escaped, any escaping of wildcards in the term is ignored (the full rule is that if there is any unescaped wildcard in a term, then all of the escaped wildcards are treated as unescaped, since Lucene has no support for escaping in WildcardQuery). So *\* is treated as **, and ** is redundant and matches the same as *; a *\* query would therefore simply match all documents that have a value in the default search field. In many cases this would give identical results to a *:* query, but in some apps it might not. Still, it would be nice to know who originated this suggestion to use *\* instead of *:* - or even simply *.

-- Jack Krupansky
Re: solrcloud - forward update to a shard failed
I did something like that also, and I was getting some nasty problems when one of my clients would try to commit before a commit issued by another one had finished. Might be the same problem for you too. Try not doing explicit commits from the indexing clients and instead set the autocommit to 1000 docs or whichever value fits you best.

- Thanks, Michael
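For reference, a minimal solrconfig.xml sketch of the autocommit Michael suggests (values are examples):

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>1000</maxDocs>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>

With openSearcher=false each batch is made durable without paying for a new searcher; pair it with autoSoftCommit (or occasional explicit soft commits) if the documents need to become visible quickly.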
SOLRJ API to do similar CURL command execution
I am able to perform the XML atomic update properly using curl commands. However, the moment I try to achieve the same using the SolrJ APIs I am facing problems. What should be the equivalent SolrJ API code to perform the same action as the below curl command?

curl "http://search1.es.dupont.com:8080/solr/core1/update" -H "Content-Type: text/xml" --data-binary "<add><doc><field name=\"id\">uniqueid</field><field name=\"tags\" update=\"add\">updatefieldvalue</field></doc></add>"

I have attempted the code below, but it fails to add the field in the proper manner, as the value gets set literally as {add=[updatefieldvalue]}.

QueryResponse qs2 = solr.query(params2);
Map<String, List<String>> operation = new HashMap<String, List<String>>();
List<String> vals = new ArrayList<String>();
vals.add(tag);
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", (String) qs2.getResults().get(j).get("id"));
operation.put("add", vals);
doc.addField("tags", operation);

Thanks in advance for any inputs.
Regards, Anupam
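For comparison, a hedged sketch of the documented SolrJ atomic-update pattern (field names from the message above; "server" is assumed to be an initialized HttpSolrServer). Atomic updates also require <updateLog/> to be enabled in solrconfig.xml - without it the map tends to be indexed literally as {add=[...]}, which may explain the symptom described:

import java.util.HashMap;
import java.util.Map;
import org.apache.solr.common.SolrInputDocument;

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "uniqueid");
Map<String, Object> op = new HashMap<String, Object>();
op.put("add", "updatefieldvalue"); // a List<String> also works for several values
doc.addField("tags", op);          // the map, not a plain value, signals an atomic update
server.add(doc);
server.commit();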
Re: SOLRJ API to do similar CURL command execution
How can I post the whole XML string to Solr using its SolrJ API?

-- Thanks & Regards, Anupam Bhattacharya
Updating an entry in Solr
Hi, I've been researching how to update a specific field of an entry in Solr, and it seems like the only way to do this is a delete then an add. Is there a better way to do this? If I want to change one field, do I have to store the whole entry locally, delete it from the Solr index, and then add it with the new field? That seems like a big missing feature if so! Thanks, Zach
Re: Updating an entry in Solr
Okay, so I've found in the Solr tutorial that if you do a POST command and post a new entry with the same uniqueKey (in my case, id_) as an entry already in the index, Solr will automatically replace it for you. That seems to be what I need, right?
Re: SOLRJ API to do similar CURL command execution
(13/11/13 22:25), Anupam Bhattacharya wrote: How can I post the whole XML string to SOLR using its SOLRJ API ? The source code of SimplePostTool would be of some help: http://lucene.apache.org/solr/4_5_1/solr-core/org/apache/solr/util/SimplePostTool.html koji -- http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html
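If the goal really is to send a prebuilt XML string, one hedged option is SolrJ's DirectXmlRequest (the XML body here is a placeholder matching the curl example earlier in the thread; "server" is an initialized HttpSolrServer):

import org.apache.solr.client.solrj.request.DirectXmlRequest;

String xml = "<add><doc><field name=\"id\">uniqueid</field>"
           + "<field name=\"tags\" update=\"add\">updatefieldvalue</field></doc></add>";
DirectXmlRequest req = new DirectXmlRequest("/update", xml);
server.request(req);
server.commit();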
Re: Updating an entry in Solr
Yes, that's correct. You can also update a document per field, but all fields need to be stored=true, because Solr (version >= 4.0) first gets your document from the index, creates a new document with the modified field, and adds it again to the index...

Primoz
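A minimal XML sketch of such a field-level update (the id and field names are examples; the update attribute is what distinguishes it from a full replace):

<add>
  <doc>
    <field name="id">mydoc1</field>
    <field name="price" update="set">99</field>
  </doc>
</add>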
RE: Data Import Handler
James, can you elaborate how to process driver=${dataimporter.request.driver} and url=${dataimporter.request.url}, and where to put these? My purpose is to configure my DB details (url, username, password) in a properties file.

-----Original Message-----
From: Dyer, James [mailto:james.d...@ingramcontent.com]
Sent: Wednesday, November 06, 2013 7:42 PM
To: solr-user@lucene.apache.org
Subject: RE: Data Import Handler

If you prepend the variable name with dataimporter.request, you can include variables like these as request parameters:

<dataSource name="ds" driver="${dataimporter.request.driver}" url="${dataimporter.request.url}" />

/dih?driver=some.driver.class&url=jdbc:url:something

If you want to include these in solrcore.properties, you can additionally add each property to solrconfig.xml like this:

<requestHandler name="/dih" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="driver">${dih.driver}</str>
    <str name="url">${dih.url}</str>
  </lst>
</requestHandler>

Then in solrcore.properties:

dih.driver=some.driver.class
dih.url=jdbc:url:something

See http://wiki.apache.org/solr/SolrConfigXml?#System_property_substitution

James Dyer, Ingram Content Group, (615) 213-4311

-----Original Message-----
From: Ramesh [mailto:ramesh.po...@vensaiinc.com]
Sent: Wednesday, November 06, 2013 7:25 AM
To: solr-user@lucene.apache.org
Subject: Data Import Handler

Hi Folks, can anyone suggest how I can customize the data-config.xml file? I want to provide database details (db_url, uname, password) from my own properties file instead of data-config.xml.
Re: Updating an entry in Solr
You should read here: http://wiki.apache.org/solr/Atomic_Updates
RE: Data Import Handler
In solrcore.properties, put:

datasource.url=jdbc:xxx:yyy
datasource.driver=com.some.driver

In solrconfig.xml, put:

<requestHandler name="/dih" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    ...
    <str name="dsDriver">${datasource.driver}</str>
    <str name="dsUrl">${datasource.url}</str>
    ...
  </lst>
</requestHandler>

In data-config.xml, put:

<dataSource name="ds" driver="${dataimporter.request.dsDriver}" url="${dataimporter.request.dsUrl}" />

Hope this works for you.

James Dyer, Ingram Content Group, (615) 213-4311
RE: Data Import Handler
The properties need to be kept outside of Solr, in a customized file like Mysolr_core.properties - how do I access that?
Strange behavior of gap fragmenter on highlighting
I'm seeing a rare behavior of the gap fragmenter on Solr 3.6. Right now this is my configuration for the gap fragmenter:

<fragmenter name="gap" default="true" class="solr.highlight.GapFragmenter">
  <lst name="defaults">
    <int name="hl.fragsize">150</int>
  </lst>
</fragmenter>

This is the basic configuration; I just tweaked the fragsize parameter to get shorter fragments. The thing is that for one particular PDF document in my results I get a really long snippet, way over 150 characters. It gets a little more odd: if I change the 150 value to 100, the snippet for the same document is normal, ~100 characters. The type of the field being highlighted is this:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" languange="Spanish"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" types="characters.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Any ideas about what's happening? Or how could I debug what is really going on? Greetings!
Re: High disk IO during UpdateCSV
Bumping this one again, any suggestions?

On Tue, Nov 12, 2013 at 3:58 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote:

Hello, I load data from CSV to Solr via UpdateCSV. There are about 50M documents with 10 columns in each document. The index size is about 15GB and I am using a 3-node distributed Solr cluster. While loading the data the disk I/O goes to 100%. If the load balancer in front of Solr hits the machine which is doing the processing, then the request times out. But in general, requests to all the machines become slow. I have attached a screenshot of the disk I/O and CPU usage. Is there a fix in Solr which can possibly throttle the load, or maybe it's due to the MergePolicy? How can I debug Solr to get the exact cause?

-- Thanks, -Utkarsh
Re: High disk IO during UpdateCSV
Utkarsh,

Your screenshot didn't come through. I don't think this list allows attachments. Maybe put it up on imgur or something? I'm a little unclear on whether you're using Solr in Cloud mode, or with a single master.

Michael Della Bitta
Applications Developer
o: +1 646 532 3062 | c: +1 917 477 7906
appinions inc. "The Science of Influence Marketing"
18 East 41st Street, New York, NY 10017
t: @appinions (https://twitter.com/Appinions)
g+: https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
w: http://www.appinions.com/
Re: High disk IO during UpdateCSV
Hi Michael,

I am using Solr Cloud 4.5, and UpdateCSV loads data to one of these nodes. Attachment: http://i.imgur.com/1xmoNtt.png

Thanks, -Utkarsh
Re: High disk IO during UpdateCSV
Don't load 50M documents in one shot. Break it up into reasonable chunks (100K?) with commits at each point. You will have a bottleneck somewhere, usually disk or CPU. Yours appears to be disk. If you get faster disks, it might become the CPU.

wunder
-- Walter Underwood, wun...@wunderwood.org
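A hedged sketch of that chunked loading from the shell (paths, chunk size and the /update/csv endpoint follow the stock 4.x example config; adjust to your setup):

split -l 100000 big.csv chunk_
for f in chunk_*; do
  # note: only the first chunk keeps the CSV header, so either pass
  # fieldnames=...&header=false or re-add the header to each chunk
  curl "http://localhost:8983/solr/collection1/update/csv?commit=true" \
       -H 'Content-type: text/csv' --data-binary @"$f"
done

Smaller batches bound how much segment merging gets triggered at once, which is usually what saturates the disk during a bulk load.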
Re: Why do people want to deploy to Tomcat?
On 11/13/2013 5:29 AM, Dmitry Kan wrote: Reading that people have considered deploying example folder is slightly strange to me. No wonder they are confused and confuse their ops. I do use the stripped jetty included in the example, but my setup is not a straight copy of the example directory. I removed a lot of it and changed how jars get loaded. I built my own init script from scratch, tailored for my setup. I'll start a new thread with my init script and some info about how I installed Solr. Thanks, Shawn
Re: Why do people want to deploy to Tomcat?
RE: the example folder. It's something I've been pushing towards moving away from for a long time - see https://issues.apache.org/jira/browse/SOLR-3619, "Rename 'example' dir to 'server' and pull examples into an 'examples' directory". Part of a push I've been on to own the container level (people are now on board with that for 5.0), add start scripts, and other niceties that we should have but don't yet. Even our config files should move away from being an "example" and end up more like a default starting template. Like a database, it should be simple to create a collection without needing to deal with config - you want to deal with the config when you need to, not face it all up front every time you create a new collection. IMO, the name "example" is historical - most people already use it this way, and the name just confuses matters.

- Mark
Re: My setup - init script and other info
Thank you. This will help me a lot.
Atomic Update at Solrj For a Newly Added Schema Field
I use Solr 4.5.1. I indexed some documents and decided to add a new field to my schema some time later. I want to use atomic updates for that newly added field. I use SolrJ for indexing. However, because the existing documents were indexed before the field existed, Solr does not apply the atomic update to them. I do not want to reindex my whole data. Any ideas?
My setup - init script and other info
In the hopes that it will help someone get Solr running in a very clean way, here's an informational email.

For my Solr install on CentOS 6, I use /opt/solr4 as my installation path, and /index/solr4 as my solr home. The /index directory is a dedicated filesystem; /opt is part of the root filesystem. From the example directory, I copied cloud-scripts, contexts, etc, lib, webapps, and start.jar over to /opt/solr4. My stuff was created before 4.3.0, so the resources directory didn't exist. I was already using log4j with a custom Solr build, and I put my log4j.properties file in etc instead. I created a logs directory and a run directory in /opt/solr4.

My data structure in /index/solr4 is complex. All a new user really needs to know is that solr.xml goes here and dictates the rest of the structure. There is a symlink at /index/solr4/lib, pointing to /opt/solr4/solrlib - so that jars placed in ${solr.solr.home}/lib are actually located in the program directory, not the data directory. That makes for a much cleaner version control scenario - both directories are git repositories cloned from our internal git server. Unlike the example configs, my solrconfig.xml files do not have lib directives for loading jars. That gets automatically handled by the jars living in that symlinked lib directory. See SOLR-4852 for caveats regarding central lib directories: https://issues.apache.org/jira/browse/SOLR-4852

If you want to run SolrCloud, you would need to install zookeeper separately and put your zkHost parameter in solr.xml. Due to a bug, putting zkHost in solr.xml doesn't work properly until 4.4.0.

Here's the current state of my init script. It's redhat-specific. I used /bin/bash (instead of /bin/sh) in the shebang because I am pretty sure that there are bash-isms in it, and bash is always available on the systems that I use: http://apaste.info/9fVA

Notable features:
* Runs Solr as an unprivileged user.
* Has three methods for stopping Solr, tries graceful methods first. 1) The jetty STOPPORT/STOPKEY mechanism. 2) PID saved by the 'start' action. 3) Any program using the Solr listening port.
* Before killing by PID, tries to make sure that the process actually is Solr.
* Sets up remote JMX, by default without authentication or SSL.
* Highly tuned CMS garbage collection.
* Sets up GC logging.
* Virtually everything is overridable via /etc/sysconfig/solr4.
* Points at an overridable log4j config file, by default in /opt/solr4/etc.
* Removes the existing PID file if the server is just booting up -- which it knows by noting that server uptime is less than three minutes.

It shouldn't be too hard to convert this so it works on debian-derived systems. That would involve rewriting portions that use redhat init routines, and probably start-stop-daemon. What I'd really like is one script that will work on any system, but that will require a fair amount of work. It's a work in progress. It should load log4j.properties from resources instead of etc. I'd like to include it in the Solr download, but without a fair amount of documentation and possibly an installation script, which still must be written, that won't be possible.

Feel free to ask questions about anything that doesn't seem clear. I welcome ideas for improvement on both my own setup and the solr example.

Thanks,
Shawn
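For the curious, the Jetty stop mechanism in the first feature looks roughly like this (port and key values are examples; the same properties must be set at start time):

java -DSTOP.PORT=8079 -DSTOP.KEY=secret -jar start.jar &        # start with the stop port armed
java -DSTOP.PORT=8079 -DSTOP.KEY=secret -jar start.jar --stop   # ask the running instance to shut down gracefully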
Using data-config.xml from DIH in SolrJ
Hi All, I'm building a utility (Java jar) to create SolrInputDocuments and send them to a HttpSolrServer using the SolrJ API. The intention is to find an efficient way to create documents from a large directory of files (where multiple files make one Solr document) and be sent to a remote Solr instance for update and commit. I've already solved the problem using the DataImportHandler (DIH) so I have a data-config.xml that describes the templated fields and cross-walking of the source(s) to the schema. The original data won't always be able to be co-located with the Solr server which is why I'm looking for another option. I've also already solved the problem using ant and xslt to create a temporary (and unfortunately a potentially large) document which the UpdateHandler will accept. I couldn't think of a solution that took advantage of the XSLT support in the UpdateHandler because each document is created from multiple files. Our current dated Java based solution significantly outperforms this solution in terms of disk and time. I've rejected it based on that and gone back to the drawing board. Does anyone have any suggestions on how I might be able to reuse my DIH configuration in the SolrJ context without re-inventing the wheel (or DIH in this case)? If I'm doing something ridiculous I hope you'll point that out too. Thanks, Tricia
Re: collections API error
Try Solr 4.5.1. https://issues.apache.org/jira/browse/SOLR-5306 - "Extra collection creation parameters like collection.configName are not being respected."

- Mark

On Nov 13, 2013, at 2:24 PM, Christopher Gross cogr...@gmail.com wrote:
collections API error
Running Apache Solr 4.5 on Tomcat 7.0.29, Java 1.6_30. 3 SolrCloud nodes running. 5 ZK nodes (v 3.4.5), one on each SolrCloud server, and on 2 other servers. I want to create a collection on all 3 nodes. I only need 1 shard. The config is in Zookeeper (another collection is using it): http://solrserver:8080/solr/admin/collections?action=CREATE&name=newtest&numShards=1&replicationFactor=3&collection.configName=test I get this error (3 times, though for a different replica #): org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error CREATEing SolrCore 'newtest_shard1_replica2': Unable to create core: newtest_shard1_replica2 The SolrCloud Admin logs give this as the root error: Caused by: org.apache.solr.common.cloud.ZooKeeperException: Specified config does not exist in ZooKeeper: newtest You can see from my call that I don't want it to be called test (already have one) but I want to make a new instance of the test collection. This seems pretty straightforward -- what am I missing? Did the parameters change and the wiki not get updated? [ http://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_Collections_API ] Thanks. -- Chris
field collapsing performance in sharded environment
Hello, I'm hitting a performance issue when using field collapsing in a distributed Solr setup and I'm wondering if others have seen it, and if anyone has an idea to work around it. I'm using field collapsing to deduplicate documents that have the same near-duplicate hash value, and deduplicating at query time (as opposed to filtering at index time) is a requirement. I have a sharded setup with 10 cores (not SolrCloud), each with ~1000 documents. Of the 10k docs, most have a unique near-duplicate hash value, so there are about 10k unique values for the field that I'm grouping on. The grouping parameters that I'm using are: group=true group.field=<near-dupe hash field> group.main=true I'm attempting distributed queries (shards=s1,s2,...,s10) where the only difference is the absence or presence of these three grouping parameters, and I'm consistently seeing a marked difference in performance (as a representative data point, 200ms latency without grouping and 1600ms with grouping). Interestingly, if I put all 10k docs on the same core and query that core independently with and without grouping, I don't see much of a latency difference, so the performance degradation seems to exist only in the sharded setup. Is there a known performance issue when field collapsing in a sharded setup (perhaps one that only manifests when the grouping field has many unique values), or have other people observed this? Any ideas for a workaround? Note that docs in my sharded setup can only have the same signature if they're in the same shard, so perhaps that can be used to boost performance, though I don't see an exposed way to do so. A follow-on question is whether we're likely to see the same issue if/when we move to SolrCloud. Thanks, Dave
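For anyone wanting to reproduce the comparison, here is roughly what the grouped distributed query described above looks like through SolrJ. A sketch only - the field name and shard URLs are placeholders:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class GroupedQueryBench {
        public static void main(String[] args) throws Exception {
            // Any node can federate the distributed query; URLs are placeholders.
            HttpSolrServer server = new HttpSolrServer("http://host1:8983/solr/s1");
            SolrQuery q = new SolrQuery("*:*");
            q.set("group", true);
            q.set("group.field", "signature"); // stands in for the near-dupe hash field
            q.set("group.main", true);
            q.set("shards", "host1:8983/solr/s1,host2:8983/solr/s2"); // ...through s10
            QueryResponse rsp = server.query(q);
            System.out.println("grouped QTime=" + rsp.getQTime() + "ms");
            server.shutdown();
        }
    }

Dropping the three group.* lines gives the ungrouped baseline (200ms vs 1600ms in the report above).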
How to escape special characters from SOLR response header
I am trying to escape special characters from SOLR response header (to prevent cross site scripting). I couldn't find any method in SolrQueryResponse to get just the SOLR response header. Can someone let me know if there is a way to modify the SOLR response header?
Re: How to escape special characters from SOLR response header
I'm not quite sure what you're trying to do here, can you please elaborate with an example? But, you can get the response header from a SolrQueryResponse using the getResponseHeader() method. Erik On Nov 13, 2013, at 3:21 PM, Developer bbar...@gmail.com wrote: I am trying to escape special characters from SOLR response header (to prevent cross site scripting). I couldn't find any method in SolrQueryResponse to get just the SOLR response header. Can someone let me know if there is a way to modify the SOLR response header?
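To make that concrete, here is a sketch of what a last-stage SearchComponent doing the escaping might look like. This is an assumption about the intent, not a tested component: it HTML-escapes only top-level string values (the echoed params entry is itself a NamedList, so a real version would need to recurse), and it assumes commons-lang, which Solr ships, for the escaping. It would also need to be registered in solrconfig.xml and appended to a handler's component list.

    import org.apache.commons.lang.StringEscapeUtils;
    import org.apache.solr.common.util.NamedList;
    import org.apache.solr.handler.component.ResponseBuilder;
    import org.apache.solr.handler.component.SearchComponent;

    public class EscapeHeaderComponent extends SearchComponent {
        @Override public void prepare(ResponseBuilder rb) { }

        @Override
        @SuppressWarnings("unchecked")
        public void process(ResponseBuilder rb) {
            // rb.rsp is the SolrQueryResponse; escape top-level string values
            NamedList header = rb.rsp.getResponseHeader();
            for (int i = 0; i < header.size(); i++) {
                Object v = header.getVal(i);
                if (v instanceof String) {
                    header.setVal(i, StringEscapeUtils.escapeHtml((String) v));
                }
            }
        }

        @Override public String getDescription() { return "escape response header"; }
        @Override public String getSource() { return null; }
    }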
Re: distributed search is significantly slower than direct search
It's surprising that such a query takes so long; I would assume that after consistently trying q=*:* you should be getting cache hits and times should be faster. Check in the admin UI how your query and document caches perform. Moreover, the query itself just asks for the first 5000 docs that were indexed (returning the lowest [docid]s), so it seems all this time is spent on transfer. Out of these 7 secs, how much is spent in the method above? What do you return by default? How big is each doc you display in your results? It might also be that both collections are working against the same resources. Try elaborating your use case. Anyway, it seems like you just ran a test to see what the performance hit would be in a distributed environment, so I'll explain some things we encountered in our benchmarks, with a case that at least fetches a similar number of docs. We fetch 2000 docs per query, running over 40 shards. This means every shard transfers 2000 docs to our frontend for every document-match request (the first phase you were referring to). Even if lazily loaded, reading 2000 ids (on 40 servers) and lazy-loading the fields is a tough job. Waiting for the slowest shard to respond, then sorting the docs and reloading (lazily or not) the top 2000 docs can take a long time. Our times are 4-8 secs, but it's not really possible to compare cases. We've taken a few steps that improved things along the way, steps that led to others. These were our starting points:
1. Profile these queries from different servers and Solr instances, and try to put your finger on which collection is working hard and why. Check whether you're stuck on components that add no value for you but are enabled by default.
2. Consider eliminating the document cache. It loads lots of (partly lazy) documents whose probability of secondary use is low - there's no such thing as a popular doc when requesting so many docs. You may be able to put that memory to better use.
3. Bottleneck check - server metrics such as CPU user / iowait time, packets transferred over the network, page faults, etc. are excellent for understanding whether the disk, network, or CPU is slowing you down. Then upgrade the hardware on one of the shards and check whether it helps by comparing the upgraded shard's qTime to the others.
4. Warm up the index after committing - benchmark how queries perform before and after some warm-up, say a few hundred queries (from your previous system), in order to warm up the OS cache (assuming you're using NRTCachingDirectoryFactory).
Good luck, Manu On Wed, Nov 13, 2013 at 2:38 PM, Erick Erickson erickerick...@gmail.com wrote: One thing you can try, and this is more diagnostic than a cure, is to return just the id field (and ensure that lazy field loading is true). That'll tell you whether the issue is actually fetching the document off disk and decompressing, although frankly that's unlikely since you can get your 5,000 rows from a single machine quickly. The code you found where Solr is spending its time, is that on the routing core or on the shards? I actually have a hard time understanding how that code could take a long time, doesn't seem right. You are transferring 5,000 docs across the network, so it's possible that your network is just slow; that's certainly a difference between the local and remote case, but that's a stab in the dark. Not much help I know, Erick On Wed, Nov 13, 2013 at 2:52 AM, Elran Dvir elr...@checkpoint.com wrote: Erick, Thanks for your response. We are upgrading our system using Solr.
We need to preserve the old functionality. Our client displays 5K documents and groups them. Is there a way to refactor the code in order to improve distributed document fetching? Thanks. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, October 30, 2013 3:17 AM To: solr-user@lucene.apache.org Subject: Re: distributed search is significantly slower than direct search You can't. There will inevitably be some overhead in the distributed case. That said, 7 seconds is quite long. 5,000 rows is excessive, and probably where your issue is. You're having to go out and fetch the docs across the wire. Perhaps there is some batching that could be done there; I don't know whether this is one document per request or not. Why 5K docs? Best, Erick On Tue, Oct 29, 2013 at 2:54 AM, Elran Dvir elr...@checkpoint.com wrote: Hi all, I am using Solr 4.4 with multiple cores. One core (called template) is my routing core. When I run http://127.0.0.1:8983/solr/template/select?rows=5000&q=*:*&shards=127.0.0.1:8983/solr/core1, it consistently takes about 7s. When I run http://127.0.0.1:8983/solr/core1/select?rows=5000&q=*:*, it consistently takes about 40ms. I profiled the distributed query. This is the distributed query process (I hope the terms
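Erick's fl=id diagnostic, spelled out in SolrJ for anyone who wants to reproduce it (a sketch; the URLs match the ones in this thread). Comparing Solr's reported QTime against wall-clock time helps separate transfer/merge overhead from actual query work:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class FetchDiagnostic {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://127.0.0.1:8983/solr/template");
            SolrQuery q = new SolrQuery("*:*");
            q.setRows(5000);
            q.setFields("id"); // only the id: stored-field fetch is largely bypassed
            q.set("shards", "127.0.0.1:8983/solr/core1");
            long t0 = System.currentTimeMillis();
            QueryResponse rsp = server.query(q);
            long wall = System.currentTimeMillis() - t0;
            // A large wall-vs-QTime gap points at transfer rather than search
            System.out.println("QTime=" + rsp.getQTime() + "ms wall=" + wall + "ms");
            server.shutdown();
        }
    }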
Re: High disk IO during UpdateCSV
Thanks guys! I will start by splitting the file into chunks of 5M (10 chunks) and reduce the chunk size if needed. Thanks, -Utkarsh On Wed, Nov 13, 2013 at 9:08 AM, Walter Underwood wun...@wunderwood.org wrote: Don't load 50M documents in one shot. Break it up into reasonable chunks (100K?) with commits at each point. You will have a bottleneck somewhere, usually disk or CPU. Yours appears to be disk. If you get faster disks, it might become the CPU. wunder On Nov 13, 2013, at 8:22 AM, Utkarsh Sengar utkarsh2...@gmail.com wrote: Bumping this one again, any suggestions? On Tue, Nov 12, 2013 at 3:58 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote: Hello, I load data from csv to solr via UpdateCSV. There are about 50M documents with 10 columns in each document. The index size is about 15GB and I am using a 3 node distributed solr cluster. While loading the data the disk I/O goes to 100%. If the load balancer in front of solr hits the machine which is doing the processing, the request times out. But in general, requests to all the machines become slow. I have attached a screenshot of the disk I/O and CPU usage. Is there a fix in solr which can possibly throttle the load, or maybe it's due to the MergePolicy? How can I debug solr to get the exact cause? -- Thanks, -Utkarsh -- Walter Underwood wun...@wunderwood.org
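A sketch of the chunked loading Walter suggests, in SolrJ terms. The file name, URL, and chunk size are placeholders; the CSV header line is re-sent with every chunk so that each request is a complete CSV document, and a commit is issued per chunk:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
    import org.apache.solr.common.util.ContentStreamBase;

    public class ChunkedCsvLoader {
        static final int ROWS_PER_CHUNK = 100000; // ~100K, per the advice above

        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
            BufferedReader in = new BufferedReader(new FileReader("data.csv")); // placeholder file
            String header = in.readLine(); // CSV column header, repeated per chunk
            StringBuilder chunk = new StringBuilder(header).append('\n');
            int rows = 0;
            for (String line; (line = in.readLine()) != null; ) {
                chunk.append(line).append('\n');
                if (++rows == ROWS_PER_CHUNK) {
                    send(server, chunk.toString());
                    chunk = new StringBuilder(header).append('\n');
                    rows = 0;
                }
            }
            if (rows > 0) send(server, chunk.toString());
            in.close();
            server.shutdown();
        }

        static void send(HttpSolrServer server, String csv) throws Exception {
            ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/csv");
            ContentStreamBase.StringStream stream = new ContentStreamBase.StringStream(csv);
            stream.setContentType("text/csv");
            req.addContentStream(stream);
            req.setParam("commit", "true"); // commit at each point, as suggested
            req.process(server);
        }
    }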
Re: Why do people want to deploy to Tomcat?
which example? there are so many. On Wed, Nov 13, 2013 at 1:00 PM, Mark Miller markrmil...@gmail.com wrote: RE: the example folder It’s something I’ve been pushing towards moving away from for a long time - see https://issues.apache.org/jira/browse/SOLR-3619 Rename 'example' dir to 'server' and pull examples into an 'examples’ directory Part of a push I’ve been on to own the Container level (people are now on board with that for 5.0), add start scripts, and other niceties that we should have but don’t yet. Even our config files should move away from being an “example” and end up more like a default starting template. Like a database, it should be simple to create a collection without needing to deal with config - you want to deal with the config when you need to, not face it all up front every time it is time to create a new collection. IMO, the name example is historical - most people already use it this way, the name just confuses matters. - Mark On Nov 13, 2013, at 12:30 PM, Shawn Heisey s...@elyograg.org wrote: On 11/13/2013 5:29 AM, Dmitry Kan wrote: Reading that people have considered deploying example folder is slightly strange to me. No wonder they are confused and confuse their ops. I do use the stripped jetty included in the example, but my setup is not a straight copy of the example directory. I removed a lot of it and changed how jars get loaded. I built my own init script from scratch, tailored for my setup. I'll start a new thread with my init script and some info about how I installed Solr. Thanks, Shawn
queries including time zone
Can anybody provide any insight about using the tz param? It doesn't seem to affect date math or /DAY rounding for us. What format do the tz values need to be in? I'm not finding any documentation on this. Sample query we're using: path=/select params={tz=America/Chicago&sort=id+desc&start=0&q=application_id:51b30ed9bc571bd96773f09c+AND+object_key:object_26+AND+values_field_215_date:[*+TO+NOW/DAY%2B1DAY]&wt=json&rows=25} Thanks! Eric
Re: queries including time zone
I believe it is the TZ column from this table: http://en.wikipedia.org/wiki/List_of_tz_database_time_zones Yeah, it's on my TODO list for my book. I suspect that tz will not affect NOW, which is probably UTC. I suspect that tz only affects literal dates in date math. -- Jack Krupansky -Original Message- From: Eric Katherman Sent: Wednesday, November 13, 2013 11:38 PM To: solr-user@lucene.apache.org Subject: queries including time zone Can anybody provide any insight about using the tz param? It doesn't seem to affect date math or /DAY rounding for us. What format do the tz values need to be in? I'm not finding any documentation on this. Sample query we're using: path=/select params={tz=America/Chicago&sort=id+desc&start=0&q=application_id:51b30ed9bc571bd96773f09c+AND+object_key:object_26+AND+values_field_215_date:[*+TO+NOW/DAY%2B1DAY]&wt=json&rows=25} Thanks! Eric
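For what it's worth, one thing to check: as far as I can tell Solr's parameter is the uppercase TZ (CommonParams.TZ), and parameter names are case sensitive, so the lowercase tz in the query above may simply be ignored - which would also explain seeing no effect on NOW/DAY rounding. A sketch of the query from this thread with the uppercase name (the URL is a placeholder):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class TzQuery {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
            SolrQuery q = new SolrQuery("application_id:51b30ed9bc571bd96773f09c"
                + " AND object_key:object_26"
                + " AND values_field_215_date:[* TO NOW/DAY+1DAY]");
            q.set("TZ", "America/Chicago"); // uppercase TZ; value is a tz database ID
            q.set("sort", "id desc");
            q.setRows(25);
            System.out.println(server.query(q).getResults().getNumFound());
            server.shutdown();
        }
    }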
(info) about Lucene search performance
Dear Lucene team, I've run into a Lucene search performance question: the first search is very slow, but subsequent searches are very fast. I use MMapDirectoryFactory in solrconfig.xml (I have already disabled all Solr caches in order to test raw Lucene search performance). Calling mmap() only makes the kernel build the logical-to-physical address mapping table; no data is actually loaded into memory by the mapping. madvise() should be used together with mmap(), but MMapDirectoryFactory has no madvise method. I found a JIRA issue (LUCENE-3178), but I don't know whether it can solve this problem.
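No definitive answer here, but if the first-query slowness is the OS page cache faulting in the mmapped index, the usual mitigation is to run a few representative warm-up queries after startup - this is what firstSearcher/newSearcher warming listeners in solrconfig.xml do. A sketch of doing it externally via SolrJ, with placeholder URL and queries:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class WarmUp {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/core1");
            String[] warmers = {"*:*", "title:test", "body:lucene"}; // representative queries
            for (String qs : warmers) {
                SolrQuery q = new SolrQuery(qs);
                q.setRows(10);
                server.query(q); // faults index pages into the OS cache
            }
            server.shutdown();
        }
    }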