Re: Concurrent Indexing and Searching in Solr.
Thanks, Erick, for your suggestion. I will remove commit=true, use Solr 5.2, and then get back to you for further help. Thanks.

On Sat, Aug 8, 2015 at 4:07 AM Erick Erickson erickerick...@gmail.com wrote:

bq: So, how many concurrent threads should I run at minimum?

I really can't answer that in the abstract; you'll simply have to test. I'd prefer SolrJ to post.jar. If you're not going to use SolrJ, I'd imagine that moving from Python to post.jar isn't all that useful. But before you do anything, see what really happens when you remove the commit=true. That's likely way more important than the rest.

Best,
Erick

On Fri, Aug 7, 2015 at 3:15 PM, Nitin Solanki nitinml...@gmail.com wrote:

Hi Erick,

"posting files to Solr via curl" = Rather than posting files via curl, which is better: SolrJ or post.jar? I don't use either. I wrote a Python script for indexing, using urllib and urllib2 to send data via HTTP. I don't have the option to use SolrJ right now. How can I do the same thing via post.jar from Python? Any help, please.

"indexing with 100 threads is going to eat up a lot of CPU cycles" = So, how many concurrent threads should I run at minimum? I also need concurrent searching. So, how many? And thanks for the Solr 5.2 pointer; I will go through that. Thanks for the reply. Please help me.

On Fri, Aug 7, 2015 at 11:51 PM Erick Erickson erickerick...@gmail.com wrote:

bq: How many limitations does Solr have related to indexing and searching simultaneously? That is, how many simultaneous calls can I make for searching and indexing at once?

None a priori. It all depends on the hardware you're throwing at it. Obviously, indexing with 100 threads is going to eat up a lot of CPU cycles that can't then be devoted to satisfying queries. You need to strike a balance. Do seriously consider using some other method than posting files to Solr via curl or the like; that's rarely a robust solution for production.
As for adding commit=true, this shouldn't be affecting the index size; I suspect you were misled by something else happening. Really, remove it or you'll beat up your system hugely. As for the soft commit interval, that's totally irrelevant when you're committing every document, but do lengthen it as much as you can. Most of the time when people say "real time," it turns out that 10 seconds is OK. Or 60 seconds is OK. You have to check what the _real_ requirement is; it's often not what's stated.

bq: I am using Solr 5.0. Is 5.0 almost similar to 5.2 regarding indexing and searching data?

Did you read the link I provided? With replicas, 5.2 will index almost twice as fast. That means (roughly) half the work on the followers is being done, freeing up cycles for performing queries.

Best,
Erick

On Fri, Aug 7, 2015 at 2:06 PM, Nitin Solanki nitinml...@gmail.com wrote:

Hi Erick,

You said that the soft commit interval should be more than 3000 ms. Actually, I need real-time searching, and that's why I need soft commits to be fast.

commit=true = I set commit=true because it reduces my indexed data size from 1.5GB to 500MB on *each shard*. When I used commit=false, my indexed data size was 1.5GB. After changing it to commit=true, the size reduced to 500MB. I don't understand how that happens.

I am using Solr 5.0. Is 5.0 almost similar to 5.2 regarding indexing and searching data? How many limitations does Solr have related to indexing and searching simultaneously? That is, how many simultaneous calls can I make for searching and indexing at once?

On Fri, Aug 7, 2015 at 9:18 PM Erick Erickson erickerick...@gmail.com wrote:

Your soft commit time of 3 seconds is quite aggressive; I'd lengthen it as much as possible. Ugh, I looked at your query more closely. Adding commit=true to every update request is horrible performance-wise. Letting your autocommit process handle the commits is the first thing I'd do.
Second, I'd try going to SolrJ and batching up documents (I usually start with 1,000), or using the post.jar tool, rather than sending them via a raw URL. I agree with Upayavira: 100 concurrent threads is a _lot_. Also, what version of Solr? There was a 2x speedup in Solr 5.2; see: http://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/ One symptom was that the followers were doing way more work than the leader (BTW, saying master/slave when talking about SolrCloud is a bit confusing...), which will affect query response rates. Basically, if query response is paramount, you really need to throttle your indexing; there's just a whole lot of work going on here.

Best,
Erick

On Fri, Aug 7, 2015 at 11:23 AM, Upayavira u...@odoko.co.uk wrote:

How many CPUs do you have? 100 concurrent
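Erick's batching advice applies on the Python side too: accumulate documents into batches of around 1,000 and send each batch as a single JSON update request, leaving commits to the autocommit settings in solrconfig.xml. A minimal sketch, assuming a collection named gettingstarted on localhost (the endpoint, helper names, and batch size are illustrative, not anything from the thread):

```python
import json
import urllib.request

SOLR_UPDATE_URL = "http://localhost:8983/solr/gettingstarted/update"  # assumed endpoint

def batches(docs, size=1000):
    """Yield successive batches of at most `size` documents."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def post_batch(batch):
    """Send one batch as a single JSON update request.

    Note there is no commit=true here -- autocommit handles commits,
    per Erick's advice above."""
    req = urllib.request.Request(
        SOLR_UPDATE_URL,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Posting one batch per 1,000 documents replaces 1,000 separate HTTP round trips (each formerly carrying its own commit=true) with a single request.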
Re: Concurrent Indexing and Searching in Solr.
If you are using Python, then you can use urllib2, or requests (which is reportedly better), or better still something like pysolr, which makes life simpler. Here's a pull request that makes pysolr ZooKeeper-aware, which will help if you are using SolrCloud. I hope one day they will merge it: https://github.com/toastdriven/pysolr/pull/138

Upayavira
Re: SolrJ update
Hi Henrique,

I don't believe there's an easy way to do that. As you noticed, the SolrInputDocument is not an I/O param; that is, it is not sent back once the data has been indexed. This is good, because here you're sending just one document, but imagine what could happen if you did a bulk load... the response would be huge! Although I could imagine some workaround (with a custom UpdateRequestProcessor and a custom ResponseWriter), the point is that (see above) I believe it would end in a bad design:

- if you send one document at a time, this is *often* considered a bad practice;
- if you send a lot of data, the corresponding response would be huge; it would contain a lot of newly created identifiers. And BTW, how do you match them with your input documents? Sequentially? That way you won't be able to use any *async* client.

Personally, if that is OK for your context, I'd completely avoid the problem by moving the logic to the client side. I mean, create the UUID on the SolrJ side and add that ID to the outgoing document.

Best,
Andrea

2015-08-06 21:39 GMT+02:00 Henrique O. Santos hensan...@gmail.com:

Hello all,

I am using SolrJ to do an index update on one of my collections. This collection has a uniqueKey id field, declared in the schema:

  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="_version_" type="long" indexed="true" stored="true"/>
    <field name="name" type="string" indexed="true" stored="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>

This field is configured to be auto-generated in solrconfig.xml like this:

  <updateRequestProcessorChain>
    <processor class="solr.UUIDUpdateProcessorFactory">
      <str name="fieldName">id</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

In my Java code, I just add the name field to my document and then proceed with the add:

  doc.addField("name", this.name);
  solrClient.add(doc);
  solrClient.commit();

Everything works; the document gets indexed.
What I really need is to know, right away in the code, the id that was generated for that single document. I have tried looking into the UpdateResponse, but no luck. Is there any easy way to do that?

Thank you in advance,
Henrique.
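Andrea's client-side workaround translates to any client language. The idea is simply to generate the UUID before sending the document, so the id never has to be read back from Solr. A sketch in Python (the helper name is illustrative; the field name matches Henrique's schema):

```python
import uuid

def with_generated_id(doc):
    """Return a copy of the document with a client-generated UUID id,
    so the id is known *before* the document is sent to Solr."""
    out = dict(doc)
    out["id"] = str(uuid.uuid4())
    return out

doc = with_generated_id({"name": "some name"})
known_id = doc["id"]  # available immediately, no round trip to Solr needed
```

Since solr.UUIDUpdateProcessorFactory only fills in the id field when the incoming document lacks one, a client-supplied value like this should be indexed as-is.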
Re: how to extend JavaBinCodec and make it available in solrj api
Shalin,

Thanks. Can I also introduce custom entity tags, as in my example with the highlighter output?

Dmitry

On Fri, Aug 7, 2015 at 5:10 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

The thing is that you are trying to introduce custom xml tags, which requires changing the response writers. Instead, if you just used nested maps/lists or SimpleOrderedMap/NamedList, then every response writer would be able to write the output directly. Nesting is not a problem.

On Fri, Aug 7, 2015 at 6:09 PM, Dmitry Kan solrexp...@gmail.com wrote:

Shawn: thanks, we found an intermediate solution by serializing our data structure to a string representation, perhaps less optimal than using the binary format directly. In the original route with JavaBinCodec we found that BinaryResponseWriter should also be extended. But the following method is static and does not allow overriding:

  public static NamedList<Object> getParsedResponse(SolrQueryRequest req, SolrQueryResponse rsp) {
    try {
      Resolver resolver = new Resolver(req, rsp.getReturnFields());
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      new JavaBinCodec(resolver).marshal(rsp.getValues(), out);
      InputStream in = new ByteArrayInputStream(out.toByteArray());
      return (NamedList<Object>) new JavaBinCodec(resolver).unmarshal(in);
    } catch (Exception ex) {
      throw new RuntimeException(ex);
    }
  }

Shalin: We needed a new data structure in the highlighter with more nested levels than just one. Something like this (in xml representation):

  <lst name="highlighting">
    <lst name="doc1">
      <arr name="snippets">
        <snippet>
          <id>id1</id>
          <contents>Snippet text goes here</contents>
          <!-- other params -->
        </snippet>
      </arr>
    </lst>
  </lst>

Can this be modelled with existing types?

On Thu, Aug 6, 2015 at 9:47 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

What do you mean by a custom format? As long as your custom component is writing primitives or NamedList/SimpleOrderedMap or collections such as List/Map, any response writer should be able to handle them.
On Wed, Aug 5, 2015 at 5:08 PM, Dmitry Kan solrexp...@gmail.com wrote:

Hello,

Solr: 5.2.1
class: org.apache.solr.common.util.JavaBinCodec

I'm working on a custom data structure for the highlighter. The data structure is ready in JSON and XML formats. I also need the JavaBin format. The data structure has already been made serializable by extending the WritableValue class (methods write and resolve).

To receive the custom format on the client via the solrj api, the data structure needs to be parseable by JavaBinCodec. Is this a correct assumption? Can we introduce a consumer for the custom data structure on the solrj api without a complete overhaul of the api? Is there a plugin framework such that JavaBinCodec can be extended and used for the new data structure?

--
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info

--
Regards,
Shalin Shekhar Mangar.
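Shalin's suggestion is that the nested snippet structure needs no custom tags at all: modelled as plain nested maps and lists, any generic response writer can serialize it unchanged. A sketch of that shape (Python dicts standing in for NamedList/SimpleOrderedMap; key names mirror Dmitry's XML example):

```python
import json

# Dmitry's desired highlighter output, modelled with plain maps and lists
# rather than custom XML entities -- a shape any response writer can emit.
highlighting = {
    "highlighting": {
        "doc1": {
            "snippets": [
                {
                    "id": "id1",
                    "contents": "Snippet text goes here",
                    # further params would be additional key/value pairs here
                }
            ]
        }
    }
}

# A generic writer (JSON here) serializes it with no writer changes at all:
serialized = json.dumps(highlighting)
```

The same nesting, built from NamedList/SimpleOrderedMap and List on the Solr side, would be handled by the XML, JSON, and JavaBin writers alike, which is why no JavaBinCodec extension is needed for this case.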
Re: Streaming API running a simple query
Hi,

Thanks, good to know. In fact, my requirement needs to merge multiple expressions, while the current streaming expressions support only two expressions. Do you think we can expect that in future versions?

On 07-Aug-2015 6:46 pm, Joel Bernstein joels...@gmail.com wrote:

Hi,

There is a new error handling framework in trunk (SOLR-7441) for the Streaming API / Streaming Expressions. So if you're purely in testing mode, it will be much easier to work in trunk than Solr 5.2. If you run into errors in trunk that are still confusing, please continue to report them so we can get all the error messages covered.

Thanks,
Joel

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Aug 7, 2015 at 6:19 AM, Selvam s.selvams...@gmail.com wrote:

Hi,

Sorry, it is working now:

  curl --data-urlencode 'stream=search(gettingstarted,q=*:*,fl=id,sort=id asc)' http://localhost:8983/solr/gettingstarted/stream

I missed *'asc'* in the sort :) Thanks for the help, Shawn Heisey.

On Fri, Aug 7, 2015 at 3:46 PM, Selvam s.selvams...@gmail.com wrote:

Hi,

Thanks for your update. Yes, I was missing the cloud mode; I am new to the world of SolrCloud. Now I have enabled a single node (with two shard replicas) that runs on port 8983, along with ZooKeeper running on port 9983. When I run

  curl --data-urlencode 'stream=search(gettingstarted,q=*:*,fl=id,sort=id)' http://localhost:8983/solr/gettingstarted/stream

again, I get:

  Unable to construct instance of org.apache.solr.client.solrj.io.stream.CloudSolrStream
  ...
  Caused by: java.lang.reflect.InvocationTargetException
  ...
  Caused by: java.lang.ArrayIndexOutOfBoundsException: 1

I tried a different port, 9983, as well, which returns "Empty reply from server". I think I'm missing some obvious configuration.
On Fri, Aug 7, 2015 at 2:04 PM, Shawn Heisey apa...@elyograg.org wrote:

On 8/7/2015 1:37 AM, Selvam wrote:

https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions

I tried this from my linux terminal:

1) curl --data-urlencode 'stream=search(gettingstarted,q=*:*,fl=id,sort=id)' http://localhost:8983/solr/gettingstarted/stream

This threw a zkHost error. Then I tried:

2) curl --data-urlencode 'stream=search(gettingstarted,zkHost=localhost:8983,q=*:*,fl=id,sort=id)' http://localhost:8983/solr/gettingstarted/stream

It throws:

  java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.solr.client.solrj.io.stream.CloudSolrStream.parseComp(CloudSolrStream.java:260)

The documentation page you linked seems to indicate that this is a feature that only works in SolrCloud. Your inclusion of localhost:8983 as the zkHost suggests that either you are NOT running in cloud mode, or that you do not understand what zkHost means. Zookeeper runs on a different port than Solr; 8983 is Solr's port. If you are running a 5.x cloud with the embedded zookeeper, it is most likely running on port 9983. If you are running in cloud mode with a properly configured external zookeeper, then your zkHost parameter will probably have three hosts in it with port 2181.

Thanks,
Shawn

--
Regards,
Selvam
KnackForge http://knackforge.com Acquia Service Partner
No. 1, 12th Line, K.K. Road, Venkatapuram, Ambattur, Chennai, Tamil Nadu, India. PIN - 600 053.
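The fix Selvam found — every sort field in a streaming expression needs an explicit asc/desc direction — is easy to enforce when building the expression string programmatically rather than by hand. A small sketch (the helper name and validation are illustrative, not part of Solr):

```python
def search_expr(collection, q="*:*", fl="id", sort="id asc"):
    """Build a Streaming Expressions search(...) string, raising early if any
    sort clause lacks the asc/desc direction that CloudSolrStream requires
    (omitting it produced the ArrayIndexOutOfBoundsException above)."""
    for clause in sort.split(","):
        parts = clause.split()
        if len(parts) != 2 or parts[1] not in ("asc", "desc"):
            raise ValueError(f"sort clause needs '<field> asc|desc': {clause!r}")
    return f"search({collection},q={q},fl={fl},sort={sort})"
```

Failing fast in the client gives a clear message instead of the server-side ArrayIndexOutOfBoundsException.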
Re: Streaming API running a simple query
Can you describe your use case?

Joel Bernstein
http://joelsolr.blogspot.com/
Re: Streaming API running a simple query
Hi,

I needed to run multiple subqueries, each with its own limit of rows. For example: get 30 users from country India with age greater than 30, and 50 users from England who are all male. Thanks again.

On 08-Aug-2015 5:30 pm, Joel Bernstein joels...@gmail.com wrote:

Can you describe your use case?

Joel Bernstein
http://joelsolr.blogspot.com/
Re: how to extend JavaBinCodec and make it available in solrj api
No, I'm afraid you will have to extend the XmlResponseWriter in that case.

On Sat, Aug 8, 2015 at 2:02 PM, Dmitry Kan solrexp...@gmail.com wrote:

Shalin,

Thanks. Can I also introduce custom entity tags, as in my example with the highlighter output?

Dmitry

--
Regards,
Shalin Shekhar Mangar.
Re: how to extend JavaBinCodec and make it available in solrj api
Or use the XsltResponseWriter :)

On Sat, Aug 8, 2015 at 7:51 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

No, I'm afraid you will have to extend the XmlResponseWriter in that case.

--
Regards,
Shalin Shekhar Mangar.
SolrCloud - Error getting leader from zk
Hello there, I'm getting these errors after an election:

ERROR - 2015-08-08 13:51:05.035; org.apache.solr.cloud.ZkController; Error getting leader from zk
org.apache.solr.common.SolrException: There is conflicting information about the leader of shard: shard1 our state says: http://HOST/solr/COLLECTION/ but zookeeper says: http://ANOTHER_HOST/solr/COLLECTION
  at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:889)
...
INFO - 2015-08-08 13:51:05.036; org.apache.solr.cloud.ZkController; publishing core=COLLECTION state=down collection=COLLECTION

Then the host gets registered as down. I've tried cleaning the data path and restarting the node, but it didn't work. Yesterday I had an issue where I needed to update the collection leader in clusterstate.json by editing the file in Exhibitor. If I access the console through other nodes, the leader shown is the right one, but it seems to me that the updated clusterstate.json was not sent properly to this specific node. Any suggestions on how to fix it? Att., Francisco Andrade
Re: SolrCloud - Error getting leader from zk
On 8/8/2015 8:38 AM, Francisco Andrade wrote: Yesterday I had an issue where I needed to update the collection leader in clusterstate.json by editing the file in Exhibitor. If I access the console through other nodes, the leader shown is the right one. But it seems to me that the updated clusterstate.json was not sent properly to this specific node. Any suggestions on how to fix it? First, what version of Solr are you running? Do you have the same version on all nodes? I am not familiar enough with the code to try to debug what the log messages mean, so I will just speak in general terms. Unless it's the only way to fix a situation that has arisen because of extraordinary circumstances, it's a bad idea to manually edit what's in the zookeeper database, and even then, it's a good idea to restart Solr to be sure it notices the change. On the topic of your specific edit, if you want to be able to control which node is the leader, upgrade to the latest Solr version and use the new preferred leader capability. I believe it was added in 5.0. https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-RebalanceLeaders Something I would try at this point is shutting down and restarting each Solr instance in your cluster, to be absolutely sure their internal state agrees with zookeeper. Restarting Solr instances will shuffle your leaders around, and you may not end up with the leader assignments you want. The preferred leader feature is a much better way to handle this. Thanks, Shawn
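For reference, the preferred-leader capability Shawn mentions is driven through the Collections API in two steps; the host, collection, shard, and replica names below are placeholders, not values from this thread:

```
# 1) Mark the replica you want as the preferred leader for its shard
curl "http://localhost:8983/solr/admin/collections?action=ADDREPLICAPROP&collection=COLLECTION&shard=shard1&replica=core_node1&property=preferredLeader&property.value=true"

# 2) Ask Solr to make preferred leaders the actual leaders
curl "http://localhost:8983/solr/admin/collections?action=REBALANCELEADERS&collection=COLLECTION"
```

This keeps leader assignment inside Solr's own election machinery instead of hand-editing zookeeper state.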
Re: Streaming API running a simple query
This sounds doable using nested merge functions like this: merge(search(...), merge(search(...), search(...), ...), ...) Joel Bernstein http://joelsolr.blogspot.com/ On Sat, Aug 8, 2015 at 8:08 AM, Selvam s.selvams...@gmail.com wrote: Hi, I needed to run multiple subqueries, each with its own limit of rows. For example: get 30 users from country India with age greater than 30, and 50 users from England who are all male. Thanks again. On 08-Aug-2015 5:30 pm, Joel Bernstein joels...@gmail.com wrote: Can you describe your use case? Joel Bernstein http://joelsolr.blogspot.com/ On Sat, Aug 8, 2015 at 7:36 AM, Selvam s.selvams...@gmail.com wrote: Hi, Thanks, good to know. In fact my requirement needs to merge multiple expressions, while the current merge expression supports only two expressions. Do you think we can expect that in future versions? On 07-Aug-2015 6:46 pm, Joel Bernstein joels...@gmail.com wrote: Hi, There is a new error handling framework in trunk (SOLR-7441) for the Streaming API / Streaming Expressions. So if you're purely in testing mode, it will be much easier to work in trunk than Solr 5.2. If you run into errors in trunk that are still confusing, please continue to report them so we can get all the error messages covered. Thanks, Joel Joel Bernstein http://joelsolr.blogspot.com/ On Fri, Aug 7, 2015 at 6:19 AM, Selvam s.selvams...@gmail.com wrote: Hi, Sorry, it is working now. curl --data-urlencode 'stream=search(gettingstarted,q=*:*,fl=id,sort=id asc)' http://localhost:8983/solr/gettingstarted/stream I missed *'asc'* in sort :) Thanks for the help Shawn Heisey. On Fri, Aug 7, 2015 at 3:46 PM, Selvam s.selvams...@gmail.com wrote: Hi, Thanks for your update. Yes, I was missing the cloud mode; I am new to the world of SolrCloud. Now I have enabled a single node (with two shards and replicas) that runs on port 8983, along with ZooKeeper running on port 9983.
When I run curl --data-urlencode 'stream=search(gettingstarted,q=*:*,fl=id,sort=id)' http://localhost:8983/solr/gettingstarted/stream again, I get Unable to construct instance of org.apache.solr.client.solrj.io.stream.CloudSolrStream ... Caused by: java.lang.reflect.InvocationTargetException ... Caused by: java.lang.ArrayIndexOutOfBoundsException: 1 I tried a different port, 9983, as well, which returns Empty reply from server. I think I'm missing some obvious configuration. On Fri, Aug 7, 2015 at 2:04 PM, Shawn Heisey apa...@elyograg.org wrote: On 8/7/2015 1:37 AM, Selvam wrote: https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions I tried this from my linux terminal, 1) curl --data-urlencode 'stream=search(gettingstarted,q=*:*,fl=id,sort=id)' http://localhost:8983/solr/gettingstarted/stream Threw zkHost error. Then tried with, 2) curl --data-urlencode 'stream=search(gettingstarted,zkHost=localhost:8983,q=*:*,fl=id,sort=id)' http://localhost:8983/solr/gettingstarted/stream It throws me java.lang.ArrayIndexOutOfBoundsException: 1\n\tat org.apache.solr.client.solrj.io.stream.CloudSolrStream.parseComp(CloudSolrStream.java:260) The documentation page you linked seems to indicate that this is a feature that only works in SolrCloud. Your inclusion of localhost:8983 as the zkHost suggests that either you are NOT running in cloud mode, or that you do not understand what zkHost means. Zookeeper runs on a different port than Solr. 8983 is Solr's port. If you are running a 5.x cloud with the embedded zookeeper, it is most likely running on port 9983. If you are running in cloud mode with a properly configured external zookeeper, then your zkHost parameter will probably have three hosts in it with port 2181. Thanks, Shawn -- Regards, Selvam KnackForge http://knackforge.com Acquia Service Partner No. 1, 12th Line, K.K. Road, Venkatapuram, Ambattur, Chennai, Tamil Nadu, India. PIN - 600 053.
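Selvam's use case (a fixed row limit per subquery) could be sketched as a nested merge along the lines Joel describes; the collection and field names here are hypothetical, and exact parameter support (e.g. rows on a search stream) depends on the Solr version, so treat this as a shape rather than a tested expression:

```
merge(
  search(users, q="country:India AND age:[31 TO *]", fl="id,country,age", sort="id asc", rows="30"),
  search(users, q="country:England AND gender:male", fl="id,country,age", sort="id asc", rows="50"),
  on="id asc")
```

Because merge takes two streams, merging N subqueries means nesting N-1 merge calls, each sharing the same on= sort criteria.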
Re: SolrCloud - Error getting leader from zk
Hi Shawn, thanks for replying. My Solr version is 4.9.0 on all nodes. I just figured out what the problem was. When I edited clusterstate.json in Exhibitor, I forgot to also edit the file located at /collections/<collection_name>/leaders/<shard>. There is also a leader config in that file. Once I updated the leader node in that file to match the one chosen in clusterstate.json, and restarted the Solr nodes that were facing the problem, everything worked fine. Att., Francisco Andrade On Sat, Aug 8, 2015 at 12:04 PM, Shawn Heisey apa...@elyograg.org wrote: On 8/8/2015 8:38 AM, Francisco Andrade wrote: Yesterday I had an issue where I needed to update the collection leader on the clusterstate.json editing the file in exhibitor. If i access the console through other nodes the leader showing is the right one. But it seems to me that the updated clusterstate.json was not sent properly to this specific node. Any suggestions on how to fix it? First, what version of Solr are you running? Do you have the same version on all nodes? I am not familiar enough with the code to try and debug what the log messages mean, I will just speak in general terms. Unless it's the only way to fix a situation that has arisen because of extraordinary circumstances, it's a bad idea to manually edit what's in the zookeeper database, and even then, it's a good idea to restart Solr to be sure it notices the change. On the topic of your specific edit, if you want to be able to control which node is the leader, upgrade to the latest Solr version and use the new preferred leader capability. I believe it was added in 5.0. https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-RebalanceLeaders Something I would try at this point is shutting down and restarting each Solr instance in your cluster, to be absolutely sure their internal state agrees with zookeeper. Restarting Solr instances will shuffle your leaders around, and you may not end up with the leader assignments you want. 
The preferred leader feature is a much better way to handle this. Thanks, Shawn
Re: docValues
I am seeing a significant difference in the query time after enabling docValues. I am curious to know what's happening with docValues included in the schema. On 07-Aug-2015, at 4:31 pm, Shawn Heisey apa...@elyograg.org wrote: On 8/7/2015 11:47 AM, naga sharathrayapati wrote: JVM-Memory has gone up from 3% to 17.1% In my experience, a healthy Java application (after the heap size has stabilized) will have a heap utilization graph where the low points are between 50 and 75 percent. If the low points in heap utilization are consistently below 25 percent, you would be better off reducing the heap size and allowing the OS to use that memory instead. If you want to track heap utilization, JVM-Memory in the Solr dashboard is a very poor tool. Use tools like visualvm or jconsole. https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap I need to add what I said about very low heap utilization to that wiki page. Thanks, Shawn
Re: docValues
Hi, I am seeing a significant difference in the query time after using docValue what kind of difference, is it good or bad? With Regards Aman Tandon On Sat, Aug 8, 2015 at 11:38 PM, Nagasharath sharathrayap...@gmail.com wrote: I am seeing a significant difference in the query time after using docValue. I am curious to know what's happening with 'docValue' included in the schema On 07-Aug-2015, at 4:31 pm, Shawn Heisey apa...@elyograg.org wrote: On 8/7/2015 11:47 AM, naga sharathrayapati wrote: JVM-Memory has gone up from 3% to 17.1% In my experience, a healthy Java application (after the heap size has stabilized) will have a heap utilization graph where the low points are between 50 and 75 percent. If the low points in heap utilization are consistently below 25 percent, you would be better off reducing the heap size and allowing the OS to use that memory instead. If you want to track heap utilization, JVM-Memory in the Solr dashboard is a very poor tool. Use tools like visualvm or jconsole. https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap I need to add what I said about very low heap utilization to that wiki page. Thanks, Shawn
Re: docValues
Good Sent from my iPhone On 08-Aug-2015, at 8:12 pm, Aman Tandon amantandon...@gmail.com wrote: Hi, I am seeing a significant difference in the query time after using docValue what kind of difference, is it good or bad? With Regards Aman Tandon On Sat, Aug 8, 2015 at 11:38 PM, Nagasharath sharathrayap...@gmail.com wrote: I am seeing a significant difference in the query time after using docValue. I am curious to know what's happening with 'docValue' included in the schema On 07-Aug-2015, at 4:31 pm, Shawn Heisey apa...@elyograg.org wrote: On 8/7/2015 11:47 AM, naga sharathrayapati wrote: JVM-Memory has gone up from 3% to 17.1% In my experience, a healthy Java application (after the heap size has stabilized) will have a heap utilization graph where the low points are between 50 and 75 percent. If the low points in heap utilization are consistently below 25 percent, you would be better off reducing the heap size and allowing the OS to use that memory instead. If you want to track heap utilization, JVM-Memory in the Solr dashboard is a very poor tool. Use tools like visualvm or jconsole. https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap I need to add what I said about very low heap utilization to that wiki page. Thanks, Shawn
Re: docValues
Have you seen: https://cwiki.apache.org/confluence/display/solr/DocValues? What kind of speedup? How often are you committing? Is there a speed difference after a while or on the first few queries? Details matter a lot for questions like this. Best, Erick On Sat, Aug 8, 2015 at 6:22 PM, Nagasharath sharathrayap...@gmail.com wrote: Good Sent from my iPhone On 08-Aug-2015, at 8:12 pm, Aman Tandon amantandon...@gmail.com wrote: Hi, I am seeing a significant difference in the query time after using docValue what kind of difference, is it good or bad? With Regards Aman Tandon On Sat, Aug 8, 2015 at 11:38 PM, Nagasharath sharathrayap...@gmail.com wrote: I am seeing a significant difference in the query time after using docValue. I am curious to know what's happening with 'docValue' included in the schema On 07-Aug-2015, at 4:31 pm, Shawn Heisey apa...@elyograg.org wrote: On 8/7/2015 11:47 AM, naga sharathrayapati wrote: JVM-Memory has gone up from 3% to 17.1% In my experience, a healthy Java application (after the heap size has stabilized) will have a heap utilization graph where the low points are between 50 and 75 percent. If the low points in heap utilization are consistently below 25 percent, you would be better off reducing the heap size and allowing the OS to use that memory instead. If you want to track heap utilization, JVM-Memory in the Solr dashboard is a very poor tool. Use tools like visualvm or jconsole. https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap I need to add what I said about very low heap utilization to that wiki page. Thanks, Shawn
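As background for the link Erick gives: docValues is switched on per field in schema.xml and only applies to documents indexed after the change, so a full reindex is the safe route. A minimal illustrative fragment (the field and type names are hypothetical, not from this thread):

```xml
<!-- string field used for sorting/faceting: docValues stores the values
     column-oriented on disk instead of un-inverting them onto the heap -->
<field name="manu_exact" type="string" indexed="true" stored="false" docValues="true"/>
```

Sorting, faceting, and grouping on such a field then read the docValues structures via memory-mapped files rather than the FieldCache, which typically trades heap pressure for OS page-cache usage; that shift is the usual source of query-time differences after enabling it.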