Re: Solr Atomic Updates
Basically, I think about using SolrCloud whenever you have to split your corpus into more than one core (shard in SolrCloud terms), or when you require fault tolerance in terms of machines going up and down. Despite the name, it does _not_ require AWS or similar. You can run SolrCloud on a single machine, that is, host multiple shards on a single physical machine to take advantage of the many CPU cores often available on modern hardware. Or you can host your SolrCloud in your own data center. Or, really, anywhere that you have one or more machines available that can talk to each other.

I _really_ recommend you look at this option before pursuing your original question; it's vastly easier to let SolrCloud handle your routing, queries etc. than to re-invent all that yourself.

Best,
Erick

On Wed, Jun 3, 2015 at 11:23 AM, Ксения Баталова batalova...@gmail.com wrote:

Upayavira,

I'm using stand-alone Solr instances. I've not learnt SolrCloud yet. Please give me some advice on when SolrCloud is better than stand-alone Solr instances, or when it is worth choosing SolrCloud.

_ _ _
Batalova Kseniya

If you are using stand-alone Solr instances, then it is your responsibility to decide which node a document resides in, and thus to which core you will send your update request. If, however, you used SolrCloud, it would handle that for you - deciding which node should contain a document, and directing the update there, all behind the scenes for you.

Upayavira

On Wed, Jun 3, 2015, at 08:15 AM, Ксения Баталова wrote:

Hi!

Thanks for your quick reply. The problem is that my index consists of several parts (several cores), and while updating I don't know in advance in which part the updated id lies (in which core the document with the specified id lies). For example, I have two cores (Core1 and Core2) and I want to update the document with id Id1, and I don't know where this document lies. So I have to do two select queries to my cores to find out where it is, and then generate an update query to the right core. What am I doing wrong? I remind you that I'm using SOLR 4.4.0.

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Best regards,
Batalova Kseniya

What exactly is the problem? And why do you care about cores, per se - other than to send the update to the core/collection you are trying to update? You should specify the core/collection name in the URL. You should also be using the Solr reference guide rather than the (old) wiki:
https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents

-- Jack Krupansky

On Tue, Jun 2, 2015 at 10:15 AM, Ксения Баталова batalova...@gmail.com wrote:

Hi!

I'm using SOLR 4.4.0 for searching in my project. Now I am facing a problem with atomic updates in multiple cores. From the wiki:

    curl http://localhost:8983/solr/update -H 'Content-type:application/json' -d '
    [
     {"id"        : "TestDoc1",
      "title"     : {"set":"test1"},
      "revision"  : {"inc":3},
      "publisher" : {"add":"TestPublisher"}
     },
     {"id"        : "TestDoc2",
      "publisher" : {"add":"TestPublisher"}
     }
    ]'

As far as I understand, this means that the document, for example, with id TestDoc1 will be searched for updating only in one core. And if there is no document with id TestDoc1, the document will be created. Can I somehow specify the list of cores for searching and then updating the document with a specific id? It's something like the shards parameter in a select query. From the wiki:

    # now do a distributed search across both servers with your browser or curl
    curl 'http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=ipod+solr'

Or is it planned for the future? Thanks in advance.

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Best regards,
Batalova Kseniya
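To illustrate Erick's point above about letting SolrCloud handle routing, here is a minimal SolrJ sketch (untested; the ZooKeeper addresses and collection name are placeholders). With SolrCloud the client sends an atomic update without ever knowing which shard holds the document:

    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.common.SolrInputDocument;
    import java.util.Collections;

    public class AtomicUpdateExample {
        public static void main(String[] args) throws Exception {
            // Placeholder ZooKeeper quorum; SolrJ discovers the cluster layout from ZK.
            CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
            server.setDefaultCollection("mycollection"); // placeholder collection name

            // Atomic update: SolrCloud routes it to whichever shard owns TestDoc1.
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "TestDoc1");
            doc.addField("title", Collections.singletonMap("set", "test1"));
            server.add(doc);
            server.commit();
            server.shutdown();
        }
    }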
Re: retrieving large number of docs
Hi Erick,

They are on the same JVM. I had already tried the core join strategy, but that doesn't solve the faceting problem. I.e., if I have 2 cores, core0 and core1, and I run this query on core0:

/select?q=QUERY&fq={!join from=id1 to=id2 fromIndex=core1}&facet=true&facet.field=tag

it has 2 problems: 1) I need to specify the docIDs with the fq (so back to the same fq={!terms} problem), and 2) faceting doesn't work. Flattening the data is not possible due to security reasons. Am I using join correctly?

thank you Erick

Peyman

On Wed, Jun 3, 2015 at 2:12 PM, Erick Erickson erickerick...@gmail.com wrote:

Are these indexes on different machines? Because if they're in the same JVM, you might be able to use cross-core joins. Be aware, though, that joining on high-cardinality fields (which, by definition, docID probably is) is where pseudo joins perform worst. Have you considered flattening the data and including whatever information you have in your from index in your main index? Because 100ms response is probably not going to be tough if you have to have two indexes/cores.

Best,
Erick

On Wed, Jun 3, 2015 at 10:58 AM, Joel Bernstein joels...@gmail.com wrote:

You may have to do something custom to meet your needs. 10,000 DocIDs is not huge, but your latency requirement is pretty low. Are your DocIDs by any chance integers? This can make custom PostFilters run much faster. You should also be aware of the Streaming API in Solr 5.1, which will give you fast Map/Reduce approaches (http://joelsolr.blogspot.com/2015/04/the-streaming-api-solrjio-basics.html).

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Jun 3, 2015 at 1:46 PM, Robust Links pey...@robustlinks.com wrote:

Hey Joel,

see below

On Wed, Jun 3, 2015 at 1:43 PM, Joel Bernstein joels...@gmail.com wrote:

A few questions for you:

How large can the list of filtering IDs be?
10k

What's your expectation on latency?
10 < latency < 100

What version of Solr are you using?
5.0.0

SolrCloud or not?
not

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Jun 3, 2015 at 1:23 PM, Robust Links pey...@robustlinks.com wrote:

Hi,

I have a set of document IDs from one core and I want to query another core using the ids retrieved from the first core... the constraint is that the size of the doc ID set can be very large. I want to:

1) retrieve these docs from the 2nd index
2) facet on the results

I can think of 3 solutions:

1) boolean query
2) terms fq
3) use a DB rather than Solr

I am trying to keep latencies down, so I prefer not to use (3). The problem with (1) is that maxBooleanClauses is hardwired and I am not sure when I will hit the exception. Option (2) seems to also hit limits... so if I do

select?fl=*&q=*:*&facet=true&facet.field=title&fq={!terms f=id}LONG_LIST_OF_IDS

solr just goes blank. I have tried adding cost=200 to try to run the query first, fq={!terms f=id cost=200}, but still no good. Paging on doc IDs could be a solution, but the problem then is that the faceting results correspond to the paged IDs and not the global set. My filter cache spec is as follows:

<filterCache class="solr.FastLRUCache" size="100" initialSize="100" autowarmCount="10"/>

What would be the best way for me to solve this problem?

thank you
Re: retrieving large number of docs
Specify the join query parser for the main query. See:
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-JoinQueryParser

-- Jack Krupansky

On Wed, Jun 3, 2015 at 3:32 PM, Robust Links pey...@robustlinks.com wrote:

Hi Erick,

They are on the same JVM. I had already tried the core join strategy, but that doesn't solve the faceting problem. I.e., if I have 2 cores, core0 and core1, and I run this query on core0:

/select?q=QUERY&fq={!join from=id1 to=id2 fromIndex=core1}&facet=true&facet.field=tag

it has 2 problems: 1) I need to specify the docIDs with the fq (so back to the same fq={!terms} problem), and 2) faceting doesn't work. Flattening the data is not possible due to security reasons. Am I using join correctly?

thank you Erick

Peyman

<snip>
Re: Solr Atomic Updates
BTW, does anybody know how SolrCloud got that name? I mean, SolrCluster would make a lot more sense, since a cloud is typically a very large collection of machines and more of a place than a specific configuration, while a Solr deployment is more typically a more modest number of machines - a cluster. It just seems totally out of sync with the current popular conception of a cloud, and it helps confuse people as to when and where they can use it. I think it must have occurred after the end of my tenure at Lucid (October 2011), because my recollection is that it was then just known as "distributed".

-- Jack Krupansky

On Wed, Jun 3, 2015 at 3:26 PM, Erick Erickson erickerick...@gmail.com wrote:

Basically, I think about using SolrCloud whenever you have to split your corpus into more than one core (shard in SolrCloud terms), or when you require fault tolerance in terms of machines going up and down. Despite the name, it does _not_ require AWS or similar. You can run SolrCloud on a single machine, that is, host multiple shards on a single physical machine to take advantage of the many CPU cores often available on modern hardware. Or you can host your SolrCloud in your own data center. Or, really, anywhere that you have one or more machines available that can talk to each other.

I _really_ recommend you look at this option before pursuing your original question; it's vastly easier to let SolrCloud handle your routing, queries etc. than to re-invent all that yourself.

Best,
Erick

<snip>
Re: SolrCloud 5.1 startup looking for standalone config
Yes, adding _solr worked, thx. But I also had to populate the SOLR_HOST param for each of the 4 hosts, as in SOLR_HOST=ec2-52-4-232-216.compute-1.amazonaws.com. I'm in an EC2 VPN environment, which might be the problem. This command now works (leaving off the port):

http://s1/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&collection.configName=mycollection_cloud_conf&createNodeSet=s1_solr,s2_solr,s3_solr

The shard directories do now appear on s1, s2, s3, but the order is different every time I DELETE the collection and rerun the CREATE. Right now it is:

s1: mycollection_shard2_replica1
s2: mycollection_shard3_replica1
s3: mycollection_shard1_replica1

I'll look further at your article, but any advice is appreciated on controlling what hosts the shards land on. Also, are these considered leaders? If so, I don't understand the replica1 suffix.
Re: Solr Atomic Updates
On 6/3/2015 2:19 PM, Jack Krupansky wrote:

BTW, does anybody know how SolrCloud got that name? <snip>

This all happened before I was paying attention to any development stuff on Solr. The earliest mention I have found so far is this:

https://issues.apache.org/jira/browse/SOLR-1873

Here's the first revision of the SolrCloud wiki page that I can access:

http://wiki.apache.org/solr/SolrCloud?action=recall&rev=1

I can't find anything about the origins. I'd like to search the dev list for history, but I can't find anyplace where this list is searchable for the correct (2009-2010) timeframe.

Possible origins that I have thought of:

1) *Very* large clusters were envisioned. There are real SolrCloud installs consisting of hundreds of machines and billions of documents. That certainly qualifies for the "cloud" moniker.

2) Somebody was interested in leveraging a hot buzzword, to help generate excitement and support for a new feature.

Thanks,
Shawn
Re: retrieving large number of docs
That doesn't work either, and even if it did, joining is not going to be a solution, since I can't query 1 core and facet on the result of the other. To sum up, my problem is:

core0
  field: id
  field: text

core1
  field: id
  field: tag

I want to 1) query the text field of core0, 2) use the {id} of the matches (which can be 10K) to retrieve the docs in core1 with the same id, and 3) facet on the tags in core1. Is this possible without denormalizing (which is not an option)?

thank you

On Wed, Jun 3, 2015 at 4:24 PM, Jack Krupansky jack.krupan...@gmail.com wrote:

Specify the join query parser for the main query. See:
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-JoinQueryParser

-- Jack Krupansky

<snip>
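For reference, Jack's suggestion of the join query parser as the main query would look roughly like this - a sketch, assuming the request is sent to core1 (whose tag field is faceted) and that the id fields of the two cores line up:

    /solr/core1/select?q={!join from=id to=id fromIndex=core0}text:YOUR_QUERY&facet=true&facet.field=tag

This runs the text query against core0 and maps the matching ids onto core1, so the facets are computed over core1's tag field for the whole match set rather than a page of it. Whether it meets the 10-100ms latency target with 10K matching ids is a separate question.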
Re: BoolField fieldType
I took a quick look at the code, and it _looks_ like any string starting with "t", "T" or "1" is evaluated as true, and everything else as false.

sortMissingLast determines sort order if you're sorting on this field and the document doesn't have a value. Should they be sorted after or before docs that have a value for the field?

Hmm, could use some better docs...

Erick

On Wed, Jun 3, 2015 at 2:38 PM, Steven White swhite4...@gmail.com wrote:

Hi everyone,

This is a two-part question:

1) I see the following:

<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>

a) What does sortMissingLast do?
b) What kind of data is considered boolean? TRUE, True, true, 1, yes, Yes, FALSE, etc.?

2) When searching, what do I search on: q=MyBoolField:what? That is, what should "what" be?

Thanks

Steve
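To make that concrete, a quick sketch (the field name is taken from the question; the behavior matches what Erick describes):

In schema.xml:

    <field name="MyBoolField" type="boolean" indexed="true" stored="true"/>

At query time:

    q=MyBoolField:true      (matches docs whose indexed value started with t, T or 1)
    q=MyBoolField:false     (matches everything else)
    sort=MyBoolField asc    (docs missing the field sort last, given sortMissingLast="true")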
BoolField fieldType
Hi everyone,

This is a two-part question:

1) I see the following:

<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>

a) What does sortMissingLast do?
b) What kind of data is considered boolean? TRUE, True, true, 1, yes, Yes, FALSE, etc.?

2) When searching, what do I search on: q=MyBoolField:what? That is, what should "what" be?

Thanks

Steve
Re: SolrCloud 5.1 startup looking for standalone config
On 6/3/2015 2:48 PM, tuxedomoon wrote:

The shard directories do now appear on s1, s2, s3, but the order is different every time I DELETE the collection and rerun the CREATE. <snip> Any advice is appreciated on controlling what hosts the shards land on. Also, are these considered leaders? If so, I don't understand the replica1 suffix.

A leader is merely a replica that has won an election and has a temporary title. It's still a replica, even if it's the ONLY replica.

I would need to look at the code to figure out how it works, but I would imagine that the shards are shuffled randomly among the hosts so that multiple collections will be evenly distributed across the cluster. It would take me quite a while to familiarize myself with the code before I could figure out where to look.

If you want to have absolute control over shard and replica placement, then you will probably need to follow steps similar to these:

* Create a collection with replicationFactor=1.
* Create foo_shardN_replica2 cores with CoreAdmin or ADDREPLICA where you want them.
* Let the replication fully catch up.
* Use DELETEREPLICA on all the foo_shardN_replica1 cores.
* Manually create the foo_shardN_replica1 cores where you want them.
* Manually create any additional replicas that you desire.

Thanks,
Shawn
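A rough sketch of the Collections API calls involved in those steps (collection, shard, host and replica names are placeholders; ADDREPLICA's node parameter pins the new replica to a specific node):

    # Add a replica of shard1 on a chosen node:
    curl 'http://s1:8983/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1&node=s2:8983_solr'

    # Drop an unwanted replica (the replica name is the core_nodeN id from clusterstate.json):
    curl 'http://s1:8983/solr/admin/collections?action=DELETEREPLICA&collection=mycollection&shard=shard1&replica=core_node1'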
Lost connection to Zookeeper
Hi All -

I've run into a problem where every once in a while one or more of the shards (27-shard cluster) will lose connection to zookeeper and report that updates are disabled. In addition to the CLUSTERSTATUS timeout errors, which don't seem to cause any issue, this one certainly does, as that shard no longer takes any (you guessed it!) updates! We are using Zookeeper with 7 nodes (7 servers in our quorum). The stack trace is:

282833508 [qtp1221263105-801058] INFO org.apache.solr.update.processor.LogUpdateProcessor [UNCLASS shard17 core_node17 UNCLASS] - [UNCLASS] webapp=/solr path=/update params={wt=javabin&version=2} {add=[COLLECT20001208773720 (1502857505963769856)]} 0 3
282837711 [qtp1221263105-802489] INFO org.apache.solr.update.processor.LogUpdateProcessor [UNCLASS shard17 core_node17 UNCLASS] - [UNCLASS] webapp=/solr path=/update params={wt=javabin&version=2} {add=[COLLECT20001208773796 (1502857510369886208)]} 0 3
282839485 [qtp1221263105-800319] INFO org.apache.solr.update.processor.LogUpdateProcessor [UNCLASS shard17 core_node17 UNCLASS] - [UNCLASS] webapp=/solr path=/update params={wt=javabin&version=2} {add=[COLLECT20001208773821 (1502857512230060032)]} 0 4
282841460 [qtp1221263105-801228] INFO org.apache.solr.update.processor.LogUpdateProcessor [UNCLASS shard17 core_node17 UNCLASS] - [UNCLASS] webapp=/solr path=/update params={wt=javabin&version=2} {} 0 1
282841461 [qtp1221263105-801228] ERROR org.apache.solr.core.SolrCore [UNCLASS shard17 core_node17 UNCLASS] - org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
at org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:1474)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:661)
at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:104)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
at org.apache.solr.update.processor.AbstractDefaultValueUpdateProcessorFactory$DefaultValueUpdateProcessor.processAdd(AbstractDefaultValueUpdateProcessorFactory.java:94)
at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:96)
at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:166)
at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:136)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:225)
at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:121)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:190)
at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:116)
at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:173)
at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:106)
at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:103)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1984)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at ...
How http connections are handled in Solr?
Hi,

I wanted to know in detail how http connections are handled in Solr.

1. From my code, I am using CloudSolrServer from the solrj client library to get the connection. From one of my previous discussions in this forum, I understood that Solr uses Apache's HttpClient for connections, and the default maxConnectionsPerHost is 32 and the default maxConnections is 128.

CloudSolrServer cloudSolrServer = new CloudSolrServer(zookeeper_quorum);
cloudSolrServer.connect();

My first question here is: what do maxConnectionsPerHost and maxConnections imply? Are these the connections from the solrj client to the Zookeeper quorum, OR from the solrj client to the solr nodes?

2. CloudSolrServer uses LBHttpSolrServer, which sends requests in round-robin fashion, i.e., first request to node1, 2nd request to node2, etc. If the answer to the above question is "from solrj client to the solr nodes", then is the http connection pool to a particular solr node created on the first request to that node during the round robin?

3. Consider that in my solr cloud I have one collection with 8 shards spread over 4 solr nodes. My understanding is that the solrj client will send a query to one of the solr cores (e.g. solr core1) residing on one of the solr nodes (e.g. node1). Solr core1 is responsible for sending queries to all 8 solr cores of that collection. Once it gets the responses from all the solr cores, it merges the data and returns it to the client. In this process, how are the http connections between one solr node and the rest of the solr nodes handled? Does Solr maintain a connection pool here between the solr nodes? If so, when is the connection pool between the solr nodes created?

Thanks,
Manohar
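For reference, the SolrJ-side pool sizes can be tuned along these lines (a sketch, untested; the property names come from SolrJ's HttpClientUtil, the values and ZooKeeper addresses are arbitrary placeholders):

    import org.apache.http.client.HttpClient;
    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.client.solrj.impl.HttpClientUtil;
    import org.apache.solr.client.solrj.impl.LBHttpSolrServer;
    import org.apache.solr.common.params.ModifiableSolrParams;

    public class ClientPoolExample {
        public static void main(String[] args) throws Exception {
            ModifiableSolrParams params = new ModifiableSolrParams();
            params.set(HttpClientUtil.PROP_MAX_CONNECTIONS, 256);          // pool-wide cap
            params.set(HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, 64);  // per Solr node
            HttpClient httpClient = HttpClientUtil.createClient(params);

            // The LB server (and hence this pool) talks HTTP to the Solr nodes;
            // the ZooKeeper connection is separate and not part of this pool.
            LBHttpSolrServer lb = new LBHttpSolrServer(httpClient);
            CloudSolrServer cloudSolrServer = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181", lb);
            cloudSolrServer.connect();
        }
    }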
Re: Solr OutOfMemory but no heap and dump and oo_solr.sh is not triggered
On 6/3/2015 12:20 AM, Clemens Wyss DEV wrote:

Context: Lucene 5.1, Java 8 on debian. 24G of RAM whereof 16G available for Solr. I am seeing the following OOMs:

ERROR - 2015-06-03 05:17:13.317; [ customer-1-de_CH_1] org.apache.solr.common.SolrException; null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
<snip>
Caused by: java.lang.OutOfMemoryError: Java heap space

WARN - 2015-06-03 05:17:13.319; [ customer-1-de_CH_1] org.eclipse.jetty.servlet.ServletHandler; Error for /solr/customer-1-de_CH_1/suggest_phrase
java.lang.OutOfMemoryError: Java heap space

<snip>

So I'd expect /usr/local/solr/bin/oom_solr.sh to be triggered, but this does not seem to happen. What am I missing? Is it ok to pull a heapdump from Solr before killing/rebooting in oom_solr.sh? Also I would like to know what query parameters were sent to /solr/customer-1-de_CH_1/suggest_phrase (which may be the reason for the OOM)...

The oom script just kills Solr with the KILL signal (-9) and logs the kill. That's it. It does not attempt to make a heap dump. If you *want* to dump the heap on OOM, you can, with some additional options:

http://stackoverflow.com/questions/542979/using-heapdumponoutofmemoryerror-parameter-for-heap-dump-for-jboss/20496376#20496376

I don't know if a heap dump on OOM is compatible with the OOM script. If Java chooses to run the OOM script before the heap dump is done, the process will be killed before the heap finishes dumping.

FYI, the stacktrace on the OOM error, especially in a multi-threaded app like Solr, will frequently be completely useless in tracking down the problem. The thread that makes the triggering memory allocation may be completely unrelated. This error happened on a suggest handler ... but the large memory allocations may be happening in a completely different part of the code.

We have not had any recent indications of a memory leak in Solr. Memory leaks in Solr *do* happen, but they are usually caught by the tests, which run in a minimal memory space. The project has continuous integration servers set up that run all the tests many times per day.

If you are running out of heap with 16GB allocated, then either your Solr installation is enormous or you've got a configuration that's not tuned properly. With a very large Solr installation, you may need to simply allocate more memory to the heap ... which may mean that you'll need to install more memory in the server. The alternative would be figuring out where you can change your configuration to reduce memory requirements.

Here's some incomplete info on settings and situations that can require a very large heap:

https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

To provide much help, we'll need lots of details about your system ... number of documents in all cores, total index size on disk, your config, possibly your schema, and maybe a few other things I haven't thought of yet.

Thanks,
Shawn
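For reference, the options described in that linked answer amount to adding something like this to the Solr start command (the dump path is a placeholder, must be writable by the Solr process, and the dump file can be as large as the heap):

    -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/solr/logs/heapdumps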
Re: Derive suggestions across multiple fields
Can you share your suggester configuration? Have you read the guide I linked? Has the suggestion index/FST been built? (You need to build the suggester.)

Cheers

2015-06-03 4:07 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:

Thank you for your explanation. I'll not need to care where the suggestions are coming from. All the suggestions from different fields can be consolidated and displayed together.

I've tried to put those fields into a new Suggestion copy field, but no suggestion is shown when I set:

<str name="field">Suggestion</str> <!-- the indexed field to derive suggestions from -->

Is there a need to re-index the documents in order for this to work?

Regards,
Edwin

On 2 June 2015 at 17:25, Alessandro Benedetti benedetti.ale...@gmail.com wrote:

Hi Edwin,

I have worked extensively on suggesters recently, and the blog I would suggest is Erick's one. It's really detailed and good for a beginner and an expert as well. [1]

Apart from that, let's see your particular use case:

1) Do you want to be able to get also where the suggestions are coming from? E.g. suggestion1 from field1, suggestion2 from field2? In this case I would try with multiple dictionaries, but I am not sure Solr allows you to use them concurrently. It could be a really nice extension to develop.

2) If you don't care where the suggestions are coming from, just use a copy field, where you copy the content of the interesting fields. The suggestions will come from the fields you have copied into the copy field, without distinction.

Hope this helps you

Cheers

[1] http://lucidworks.com/blog/solr-suggester/

2015-06-02 4:22 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:

Hi,

Does anyone know if we can derive suggestions across multiple fields?

I tried to set something like this in my field in the suggest searchComponent in solrconfig.xml, but nothing is returned. It only works when I set a single field, not multiple fields.

<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookupFactory</str>
    <str name="field">Content, Summary</str> <!-- the indexed field to derive suggestions from -->
    <float name="threshold">0.005</float>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

I'm using Solr 5.1.

Regards,
Edwin

--
Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
Solr OutOfMemory but no heap and dump and oo_solr.sh is not triggered
Context: Lucene 5.1, Java 8 on debian. 24G of RAM whereof 16G available for Solr. I am seeing the following OOMs:

ERROR - 2015-06-03 05:17:13.317; [ customer-1-de_CH_1] org.apache.solr.common.SolrException; null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:854)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:463)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:628)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Java heap space

WARN - 2015-06-03 05:17:13.319; [ customer-1-de_CH_1] org.eclipse.jetty.servlet.ServletHandler; Error for /solr/customer-1-de_CH_1/suggest_phrase
java.lang.OutOfMemoryError: Java heap space

The full commandline is:

/usr/local/java/bin/java -server -Xss256k -Xms16G -Xmx16G -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark -XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000 -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -Xloggc:/opt/solr/logs/solr_gc.log -Djetty.port=8983 -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks -Duser.timezone=UTC -Dsolr.solr.home=/opt/solr/data -Dsolr.install.dir=/usr/local/solr -Dlog4j.configuration=file:/opt/solr/log4j.properties -jar start.jar -XX:OnOutOfMemoryError=/usr/local/solr/bin/oom_solr.sh 8983 /opt/solr/logs OPTIONS=default,rewrite

So I'd expect /usr/local/solr/bin/oom_solr.sh to be triggered, but this does not seem to happen. What am I missing? Is it ok to pull a heapdump from Solr before killing/rebooting in oom_solr.sh? Also I would like to know what query parameters were sent to /solr/customer-1-de_CH_1/suggest_phrase (which may be the reason for the OOM)...
Re: Number of clustering labels to show
Thank you so much for your explanation.

On 2 June 2015 at 17:31, Alessandro Benedetti benedetti.ale...@gmail.com wrote:

The scope there is to try to make clustering lighter and more related to the query. The summary produced is a fragment surrounding the query terms in the document content. Arguably this is a way to improve the quality of clusters, but for sure it makes the clustering operation lighter, as the content used to produce the clusters is much smaller than the full content.

We can discuss, of course, whether the window of text surrounding query matches is really helpful for clustering the documents in a more precise way. That is not an easy research topic, and for sure it depends strictly on the use case. For this reason a user should decide whether to go with the summary (lighter) approach or the more comprehensive, full-content approach.

Cheers

2015-06-02 3:21 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:

Thank you so much, Alessandro. But I do not find any difference in the quality of the clustering results when I change the hl.fragsize to a different value, even though I've set my carrot.produceSummary to true.

Regards,
Edwin

On 1 June 2015 at 17:31, Alessandro Benedetti benedetti.ale...@gmail.com wrote:

Only to clarify the initial mail: the carrot.fragSize has nothing to do with the number of clusters produced. When you select to work with the field summary (you will work only on snippets from the original content, snippets produced by highlighting the query in the content), the fragSize specifies the size of these fragments. From the Carrot documentation:

carrot.produceSummary: When true, the carrot.snippet field (if no snippet field, then the carrot.title field) will be highlighted and the highlighted text will be used for clustering. Highlighting is recommended when the snippet field contains a lot of content. Highlighting can also increase the quality of clustering because the clustered content will get an additional query-specific context.

carrot.fragSize: The frag size to use for highlighting. Meaningful only when carrot.produceSummary is true. If not specified, the default highlighting fragsize (hl.fragsize) will be used. If that isn't specified, then 100.

Cheers

2015-06-01 2:00 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:

Thank you Stanislaw for the links. Will read them to better understand how the algorithm works.

Regards,
Edwin

On 29 May 2015 at 17:22, Stanislaw Osinski stanislaw.osin...@carrotsearch.com wrote:

Hi,

The number of clusters primarily depends on the parameters of the specific clustering algorithm. If you're using the default Lingo algorithm, the number of clusters is governed by the LingoClusteringAlgorithm.desiredClusterCountBase parameter. Take a look at the documentation (https://cwiki.apache.org/confluence/display/solr/Result+Clustering#ResultClustering-TweakingAlgorithmSettings) for some more details (the "Tweaking at Query-Time" section shows how to pass the specific parameters at request time). A complete overview of the Lingo clustering algorithm parameters is here: http://doc.carrot2.org/#section.component.lingo.

Stanislaw

--
Stanislaw Osinski, stanislaw.osin...@carrotsearch.com
http://carrotsearch.com

On Fri, May 29, 2015 at 4:29 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote:

Hi,

I'm trying to increase the number of cluster results to be shown during the search. I tried to set carrot.fragSize=20, but only 15 cluster labels are shown. Even when I tried to set carrot.fragSize=5, there are also 15 labels shown. Is this the correct way to do this?

I understand that setting it to 20 might not necessarily mean 20 labels will be shown, as the setting is for the maximum number. But when I set this to 5, shouldn't it reduce the number of labels to 5?

I'm using Solr 5.1.

Regards,
Edwin

--
Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
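As an illustration, passing the Lingo parameter at query time would look roughly like this (a sketch; the handler, field names and the value 30 are placeholders for whatever the clustering engine is configured with):

    /select?q=ipod&rows=100&clustering=true&clustering.results=true&carrot.title=title&carrot.snippet=content&LingoClusteringAlgorithm.desiredClusterCountBase=30

Note that desiredClusterCountBase steers the number of clusters, while carrot.fragSize only controls the length of the highlighted fragments fed into clustering - which is why changing fragSize left the label count at 15.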
AW: Solr OutOfMemory but no heap and dump and oo_solr.sh is not triggered
Ciao Shawn,

thanks for your reply.

> The oom script just kills Solr with the KILL signal (-9) and logs the kill.

I know. But my feeling is that not even this happens, i.e. the script is not being executed. At least I see no solr_oom_killer-$SOLR_PORT-$NOW.log file... Btw: who re-starts Solr after it's been killed?

> FYI, the stacktrace on the OOM error, especially in a multi-threaded app like Solr, will frequently be completely useless in tracking down the problem.

I agree.

> I don't know if a heap dump on OOM is compatible with the OOM script. If Java chooses to run the OOM script before the heap dump is done, the process will be killed before the heap finishes dumping.

What if I did a jmap call in the oom script before killing the process?

-Clemens

-----Original Message-----
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Wednesday, June 3, 2015 09:16
To: solr-user@lucene.apache.org
Subject: Re: Solr OutOfMemory but no heap and dump and oo_solr.sh is not triggered

<snip>
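A sketch of that idea - a variant of the kill script that dumps the heap before killing the process (hypothetical and untested: the pid lookup, paths and log line are assumptions, and jmap on a 16G heap can take minutes, during which Solr stays wedged):

    #!/bin/bash
    # Hypothetical oom_solr.sh variant: dump the heap, then kill Solr.
    SOLR_PORT=$1
    SOLR_LOGS_DIR=$2
    NOW=$(date +"%F_%H_%M_%S")
    SOLR_PID=$(ps auxww | grep start.jar | grep "jetty.port=$SOLR_PORT" | grep -v grep | awk '{print $2}')
    jmap -dump:format=b,file="$SOLR_LOGS_DIR/heapdump-$SOLR_PORT-$NOW.hprof" $SOLR_PID
    kill -9 $SOLR_PID
    echo "$NOW: heap dumped and Solr on port $SOLR_PORT killed" >> "$SOLR_LOGS_DIR/solr_oom_killer-$SOLR_PORT-$NOW.log"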
Re: Derive suggestions across multiple fields
This is my suggester configuration:

<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookupFactory</str>
    <str name="field">text</str> <!-- the indexed field to derive suggestions from -->
    <float name="threshold">0.005</float>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="defType">edismax</str>
    <int name="rows">10</int>
    <str name="wt">json</str>
    <str name="indent">true</str>
    <str name="df">text</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

Yes, I've read the guide. I've found out that there is a need to re-index if I'm creating a new copyField; it works when I use a copyField that was created before the indexing was done.

As I'm using the spellcheck dictionary as my suggester, does that mean I just need to build the spellcheck dictionary?

Regards,
Edwin

On 3 June 2015 at 17:36, Alessandro Benedetti benedetti.ale...@gmail.com wrote:

Can you share your suggester configuration? Have you read the guide I linked? Has the suggestion index/FST been built? (You need to build the suggester.)

Cheers

<snip>
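For reference, building and then querying a spellcheck-based suggester like the one above would look roughly like this (a sketch; the core name is a placeholder, and buildOnCommit=true in the config also rebuilds the dictionary on every commit):

    # Build the suggest dictionary explicitly:
    curl 'http://localhost:8983/solr/mycore/suggest?spellcheck=true&spellcheck.build=true'

    # Ask for suggestions:
    curl 'http://localhost:8983/solr/mycore/suggest?spellcheck=true&spellcheck.q=sugg'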
Re: Solr Atomic Updates
Explain a little about why you have separate cores, and how you decide which core a new document should reside in. Your scenario still seems a bit odd, so help us understand.

-- Jack Krupansky

On Wed, Jun 3, 2015 at 3:15 AM, Ксения Баталова batalova...@gmail.com wrote:

Hi!

Thanks for your quick reply. The problem is that my index consists of several parts (several cores), and while updating I don't know in advance in which part the updated id lies (in which core the document with the specified id lies). For example, I have two cores (Core1 and Core2) and I want to update the document with id Id1, and I don't know where this document lies. So I have to do two select queries to my cores to find out where it is, and then generate an update query to the right core. What am I doing wrong? I remind you that I'm using SOLR 4.4.0.

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Best regards,
Batalova Kseniya

<snip>
Re: Solr Atomic Updates
If you are using stand-alone Solr instances, then it is your responsibility to decide which node a document resides in, and thus to which core you will send your update request. If, however, you used SolrCloud, it would handle that for you - deciding which node should contain a document, and directing the update their all behind the scenes for you. Upayavira On Wed, Jun 3, 2015, at 08:15 AM, Ксения Баталова wrote: Hi! Thanks for your quick reply. The problem that all my index is consists of several parts (several cores) and while updating I don't know in advance in which part updated id is lying (in which core the document with specified id is lying). For example, I have two cores (*Core1 *and *Core2*) and I want to update the document with id *Id1 *and I don't know where this document is lying. So, I have to do two select-queries to my cores to know where it is. And then generate update-query to necessary core. What am I doing wrong? I remind that I'm using SOLR 4.4.0. _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ Best regards, Batalova Kseniya What exactly is the problem? And why do you care about cores, per se - other than to send the update to the core/collection you are trying to update? You should specify the core/collection name in the URL. You should also be using the Solr reference guide rather than the (old) wiki: https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents -- Jack Krupansky On Tue, Jun 2, 2015 at 10:15 AM, Ксения Баталова batalova...@gmail.com wrote: Hi! I'm using *SOLR 4.4.0* for searching in my project. Now I am facing a problem of atomic updates in multiple cores. From wiki: curl *http://localhost:8983/solr/update http://localhost:8983/solr/update *-H 'Content-type:application/json' -d ' [ { *id*: *TestDoc1*, title : {set:test1}, revision : {inc:3}, publisher : {add:TestPublisher} }, { id: TestDoc2, publisher : {add:TestPublisher} } ]' As well as I understand, this means that the document, for example, with id *TestDoc1*, will be searched for updating *only in one core*. And if there is no any document with id *TestDoc1*, the document will be created. Can I somehow to specify the* list of cores* for searching and then updating necessary document with specific id? It's something like *shards *parameter in *select* query. From wiki: #now do a distributed search across both servers with your browser or curl curl ' http://localhost:8983/solr/*select*?*shards*=localhost:8983/solr,localhost:7574/solrindent=trueq=ipod+solr ' Or is it planned in the future? Thanks in advance. _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ Best regards, Batalova Kseniya
Re: How to tell when Collector finishes collect loop?
I think there are easier ways to do what you are trying to do. Take a look at the function query parser. It will allow you to control the score for each document from within a function query. The basic use case is this:

q={!func}myFunc()&fq=my+query

In this scenario the func qparser plugin controls the score and the fq provides the query.

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Jun 3, 2015 at 9:50 AM, adfel70 adfe...@gmail.com wrote:

Hi guys, need your help (again): I have a search handler which needs to override Solr's scoring. I chose to implement it with the RankQuery API, so when getTopDocsCollector() gets called it instantiates my TopDocsCollector instance, and every docId gets its own score:

public class MyScorerRankQuery extends RankQuery {
  ...
  @Override
  public TopDocsCollector getTopDocsCollector(int i, SolrIndexSearcher.QueryCommand cmd, IndexSearcher searcher) {
    ...
    return new MyCollector(...);
  }
}

public class MyCollector extends TopDocsCollector {
  // initialized in constructor
  MyScorer scorer;

  public MyCollector() {
    scorer = new MyScorer();
    scorer.start(); // the scorer's API needs start() called before every query and close() at the end of the query
  }

  @Override
  public void collect(int id) {
    // 1. get a specific field from the doc using DocValues and calculate its score using my scorer
    // 2. add docId and score (a ScoreDoc object) into the PriorityQueue
  }
}

My problem is that I can't find a place to call scorer.close(), which needs to be executed when the query ends (after we have calculated the score for each docId). I saw that DelegatingCollector has a finish() method which is called after the collector is done, but I cannot extend both TopDocsCollector and DelegatingCollector...

--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-tell-when-Collector-finishes-collect-loop-tp4209447.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to tell when Collector finishes collect loop?
The finish method would still be a problem using the func qparser. Out of curiosity, why do you need to call close on the scorer?

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Jun 3, 2015 at 10:53 AM, Joel Bernstein joels...@gmail.com wrote:

I think there are easier ways to do what you are trying to do. Take a look at the function query parser. It will allow you to control the score for each document from within a function query. The basic use case is this:

q={!func}myFunc()&fq=my+query

In this scenario the func qparser plugin controls the score and the fq provides the query.

Joel Bernstein
http://joelsolr.blogspot.com/
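For reference, a myFunc() like the one in Joel's example is typically backed by a custom ValueSourceParser registered in solrconfig.xml with <valueSourceParser name="myFunc" class="com.example.MyFuncParser"/>. A rough sketch follows; the class name and the placeholder doubling logic are illustrative, not the poster's actual scorer:

import org.apache.lucene.queries.function.FunctionValues;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.valuesource.SimpleFloatFunction;
import org.apache.solr.search.FunctionQParser;
import org.apache.solr.search.SyntaxError;
import org.apache.solr.search.ValueSourceParser;

public class MyFuncParser extends ValueSourceParser {
  @Override
  public ValueSource parse(FunctionQParser fp) throws SyntaxError {
    // e.g. q={!func}myFunc(myField) -- parse the wrapped field or function
    ValueSource source = fp.parseValueSource();
    return new SimpleFloatFunction(source) {
      @Override
      protected String name() {
        return "myFunc";
      }

      @Override
      protected float func(int doc, FunctionValues vals) {
        // per-document score; placeholder logic, replace with real scoring
        return vals.floatVal(doc) * 2.0f;
      }
    };
  }
}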
Re: How http connections are handled in Solr?
On 6/3/2015 4:12 AM, Manohar Sripada wrote:

1. From my code, I am using CloudSolrServer from the SolrJ client library to get the connection. From one of my previous discussions in this forum, I understood that Solr uses Apache's HttpClient for connections, and the default maxConnectionsPerHost is 32 and the default maxConnections is 128.

CloudSolrServer cloudSolrServer = new CloudSolrServer(zookeeper_quorum);
cloudSolrServer.connect();

My first question here is: what do maxConnectionsPerHost and maxConnections imply? Are these the connections from the SolrJ client to the ZooKeeper quorum, or from the SolrJ client to the Solr nodes?

By default, CloudSolrServer sets up an HttpClient object that is given to the LBHttpSolrServer instance inside it. The LBHttpSolrServer object shares that HttpClient between all of the HttpSolrServer objects that it maintains. You can configure your own HttpClient in your code and then use that to create CloudSolrServer:

http://lucene.apache.org/solr/5_1_0/solr-solrj/org/apache/solr/client/solrj/impl/CloudSolrClient.html#CloudSolrClient%28java.util.Collection,%20java.lang.String,%20org.apache.http.client.HttpClient%29

ZooKeeper is a separate jar entirely, and handles its own network connectivity. That connectivity is NOT http.

3. Consider that in my Solr cloud I have one collection with 8 shards spread over 4 Solr nodes. My understanding is that the SolrJ client will send a query to one of the Solr cores (e.g. solr core1) residing on one of the Solr nodes (e.g. node1). Solr core1 is responsible for sending queries to all 8 Solr cores of that collection. Once it gets the responses from all the Solr cores, it merges the data and returns it to the client. In this process, how are the http connections between one Solr node and the rest of the Solr nodes handled?

For distributed searching, Solr uses the SolrJ client internally to collect responses from the shards. The HttpClient for THAT communication is configured with the shardHandler in solrconfig.xml:

https://wiki.apache.org/solr/SolrConfigXml?highlight=%28shardhandler%29#Configuration_of_Shard_Handlers_for_Distributed_searches

Thanks,
Shawn
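A minimal sketch of passing a custom-configured HttpClient into the cloud client (SolrJ 5.x class names; the connection limits and ZooKeeper addresses are illustrative):

import org.apache.http.client.HttpClient;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.impl.HttpClientUtil;
import org.apache.solr.common.params.ModifiableSolrParams;

public class CustomHttpClientExample {
  public static void main(String[] args) {
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set(HttpClientUtil.PROP_MAX_CONNECTIONS, 256);         // total pool size
    params.set(HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, 64); // per Solr node
    HttpClient httpClient = HttpClientUtil.createClient(params);

    // this HttpClient is shared by all the per-node clients inside CloudSolrClient
    CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181", httpClient);
    client.connect();
  }
}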
AW: Solr OutOfMemory but no heap and dump and oo_solr.sh is not triggered
Hi Mark,

what exactly should I file? What needs to be added/appended to the issue?

Regards,
Clemens

-----Original Message-----
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: Wednesday, June 3, 2015 14:23
To: solr-user@lucene.apache.org
Subject: Re: Solr OutOfMemory but no heap and dump and oo_solr.sh is not triggered

We will have to find a way to deal with this long term. Browsing the code I can see a variety of places where problem exception handling has been introduced since this all was fixed.

- Mark

On Wed, Jun 3, 2015 at 8:19 AM Mark Miller markrmil...@gmail.com wrote:

File a JIRA issue please. That OOM Exception is getting wrapped in a RuntimeException it looks. Bug.

- Mark

On Wed, Jun 3, 2015 at 2:20 AM Clemens Wyss DEV clemens...@mysign.ch wrote:

Context: Lucene 5.1, Java 8 on debian. 24G of RAM whereof 16G is available for Solr. I am seeing the following OOMs:

ERROR - 2015-06-03 05:17:13.317; [ customer-1-de_CH_1] org.apache.solr.common.SolrException; null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
  at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:854)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:463)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
  ...
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Java heap space
WARN - 2015-06-03 05:17:13.319; [ customer-1-de_CH_1] org.eclipse.jetty.servlet.ServletHandler; Error for /solr/customer-1-de_CH_1/suggest_phrase
java.lang.OutOfMemoryError: Java heap space

The full command line is:

/usr/local/java/bin/java -server -Xss256k -Xms16G -Xmx16G -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark -XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000 -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -Xloggc:/opt/solr/logs/solr_gc.log -Djetty.port=8983 -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks -Duser.timezone=UTC -Dsolr.solr.home=/opt/solr/data -Dsolr.install.dir=/usr/local/solr -Dlog4j.configuration=file:/opt/solr/log4j.properties -jar start.jar -XX:OnOutOfMemoryError=/usr/local/solr/bin/oom_solr.sh 8983 /opt/solr/logs OPTIONS=default,rewrite

So I'd expect /usr/local/solr/bin/oom_solr.sh to be triggered, but this does not seem to happen. What am I missing? Is it ok to pull a heap dump from Solr before killing/rebooting in oom_solr.sh? Also, I would like to know what query parameters were sent to /solr/customer-1-de_CH_1/suggest_phrase (which may be the reason for the OOM) ...
Re: Derive suggestions across multiple fields
I can see a lot of confusion in the configuration! A few suggestions:

- Read the documentation carefully and try to apply the suggester guidance.
- Currently there is no need to use spellcheck for suggestions; they are now separate things.
- I see "text" used to derive suggestions; I would prefer to see there the copy field specifically used to contain the interesting fields.
- Yes, you need to build the suggester the first time to see suggestions.
- Yes, if you add a copy field you need to re-index to see it filled!

Cheers

2015-06-03 11:07 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:

This is my suggester configuration:

<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookupFactory</str>
    <str name="field">text</str>  <!-- the indexed field to derive suggestions from -->
    <float name="threshold">0.005</float>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="defType">edismax</str>
    <int name="rows">10</int>
    <str name="wt">json</str>
    <str name="indent">true</str>
    <str name="df">text</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

Yes, I've read the guide. I've found out that there is a need to re-index if I'm creating a new copyField. It works when I use a copyField that was created before the indexing was done. As I'm using the spellcheck dictionary as my suggester, does that mean I just need to build the spellcheck dictionary?

Regards,
Edwin

--
Benedetti Alessandro
Visiting card: http://about.me/alessandro_benedetti
Re: Solr OutOfMemory but no heap and dump and oo_solr.sh is not triggered
File a JIRA issue please. That OOM Exception is getting wrapped in a RuntimeException it looks. Bug. - Mark On Wed, Jun 3, 2015 at 2:20 AM Clemens Wyss DEV clemens...@mysign.ch wrote: Context: Lucene 5.1, Java 8 on debian. 24G of RAM whereof 16G available for Solr. I am seeing the following OOMs: ERROR - 2015-06-03 05:17:13.317; [ customer-1-de_CH_1] org.apache.solr.common.SolrException; null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:854) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:463) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:368) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82) at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:628) at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.OutOfMemoryError: Java heap space WARN - 2015-06-03 05:17:13.319; [ customer-1-de_CH_1] org.eclipse.jetty.servlet.ServletHandler; Error for /solr/customer-1-de_CH_1/suggest_phrase java.lang.OutOfMemoryError: Java heap space The full commandline is /usr/local/java/bin/java -server -Xss256k -Xms16G -Xmx16G -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark -XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000 -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -verbose:gc 
-XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -Xloggc:/opt/solr/logs/solr_gc.log -Djetty.port=8983 -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks -Duser.timezone=UTC -Dsolr.solr.home=/opt/solr/data -Dsolr.install.dir=/usr/local/solr -Dlog4j.configuration=file:/opt/solr/log4j.properties -jar start.jar -XX:OnOutOfMemoryError=/usr/local/solr/bin/oom_solr.sh 8983 /opt/solr/logs OPTIONS=default,rewrite

So I'd expect /usr/local/solr/bin/oom_solr.sh to be triggered. But this does not seem to happen. What am I missing? Is it ok to pull a heap dump from Solr before killing/rebooting in oom_solr.sh? Also, I would like to know what query parameters were sent to /solr/customer-1-de_CH_1/suggest_phrase (which may be the reason for the OOM) ...

--
- Mark
about.me/markrmiller
How to tell when Collector finishes collect loop?
Hi guys, need your help (again): I have a search handler which needs to override Solr's scoring. I chose to implement it with the RankQuery API, so when getTopDocsCollector() gets called it instantiates my TopDocsCollector instance, and every docId gets its own score:

public class MyScorerRankQuery extends RankQuery {
  ...
  @Override
  public TopDocsCollector getTopDocsCollector(int i, SolrIndexSearcher.QueryCommand cmd, IndexSearcher searcher) {
    ...
    return new MyCollector(...);
  }
}

public class MyCollector extends TopDocsCollector {
  // initialized in constructor
  MyScorer scorer;

  public MyCollector() {
    scorer = new MyScorer();
    scorer.start(); // the scorer's API needs start() called before every query and close() at the end of the query
  }

  @Override
  public void collect(int id) {
    // 1. get a specific field from the doc using DocValues and calculate its score using my scorer
    // 2. add docId and score (a ScoreDoc object) into the PriorityQueue
  }
}

My problem is that I can't find a place to call scorer.close(), which needs to be executed when the query ends (after we have calculated the score for each docId). I saw that DelegatingCollector has a finish() method which is called after the collector is done, but I cannot extend both TopDocsCollector and DelegatingCollector...

--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-tell-when-Collector-finishes-collect-loop-tp4209447.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr OutOfMemory but no heap and dump and oo_solr.sh is not triggered
We will have to find a way to deal with this long term. Browsing the code I can see a variety of places where problem exception handling has been introduced since this all was fixed.

- Mark

On Wed, Jun 3, 2015 at 8:19 AM Mark Miller markrmil...@gmail.com wrote:

File a JIRA issue please. That OOM Exception is getting wrapped in a RuntimeException it looks. Bug.

- Mark

On Wed, Jun 3, 2015 at 2:20 AM Clemens Wyss DEV clemens...@mysign.ch wrote:

Context: Lucene 5.1, Java 8 on debian. 24G of RAM whereof 16G is available for Solr. I am seeing the following OOMs:

ERROR - 2015-06-03 05:17:13.317; [ customer-1-de_CH_1] org.apache.solr.common.SolrException; null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
  at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:854)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:463)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
  ...
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Java heap space
WARN - 2015-06-03 05:17:13.319; [ customer-1-de_CH_1] org.eclipse.jetty.servlet.ServletHandler; Error for /solr/customer-1-de_CH_1/suggest_phrase
java.lang.OutOfMemoryError: Java heap space

The full command line is:

/usr/local/java/bin/java -server -Xss256k -Xms16G -Xmx16G -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark -XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000 -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -Xloggc:/opt/solr/logs/solr_gc.log -Djetty.port=8983 -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks -Duser.timezone=UTC -Dsolr.solr.home=/opt/solr/data -Dsolr.install.dir=/usr/local/solr -Dlog4j.configuration=file:/opt/solr/log4j.properties -jar start.jar -XX:OnOutOfMemoryError=/usr/local/solr/bin/oom_solr.sh 8983 /opt/solr/logs OPTIONS=default,rewrite

So I'd expect /usr/local/solr/bin/oom_solr.sh to be triggered. But this does not seem to happen. What am I missing? Is it ok to pull a heap dump from Solr before killing/rebooting in oom_solr.sh? Also, I would like to know what query parameters were sent to /solr/customer-1-de_CH_1/suggest_phrase (which may be the reason for the OOM) ...

--
- Mark
Re: Solr Atomic Updates
Hi! Thanks for your quick reply. The problem that all my index is consists of several parts (several cores) and while updating I don't know in advance in which part updated id is lying (in which core the document with specified id is lying). For example, I have two cores (*Core1 *and *Core2*) and I want to update the document with id *Id1 *and I don't know where this document is lying. So, I have to do two select-queries to my cores to know where it is. And then generate update-query to necessary core. What am I doing wrong? I remind that I'm using SOLR 4.4.0. _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ Best regards, Batalova Kseniya What exactly is the problem? And why do you care about cores, per se - other than to send the update to the core/collection you are trying to update? You should specify the core/collection name in the URL. You should also be using the Solr reference guide rather than the (old) wiki: https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents -- Jack Krupansky On Tue, Jun 2, 2015 at 10:15 AM, Ксения Баталова batalova...@gmail.com wrote: Hi! I'm using *SOLR 4.4.0* for searching in my project. Now I am facing a problem of atomic updates in multiple cores. From wiki: curl *http://localhost:8983/solr/update http://localhost:8983/solr/update *-H 'Content-type:application/json' -d ' [ { *id*: *TestDoc1*, title : {set:test1}, revision : {inc:3}, publisher : {add:TestPublisher} }, { id: TestDoc2, publisher : {add:TestPublisher} } ]' As well as I understand, this means that the document, for example, with id *TestDoc1*, will be searched for updating *only in one core*. And if there is no any document with id *TestDoc1*, the document will be created. Can I somehow to specify the* list of cores* for searching and then updating necessary document with specific id? It's something like *shards *parameter in *select* query. From wiki: #now do a distributed search across both servers with your browser or curl curl ' http://localhost:8983/solr/*select*?*shards*=localhost:8983/solr,localhost:7574/solrindent=trueq=ipod+solr ' Or is it planned in the future? Thanks in advance. _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ Best regards, Batalova Kseniya
Could not find configName for collection client_active found:null
I'm helping someone with this but my ZooKeeper experience is limited (as in none). They have purportedly followed the instructions from the wiki: https://cwiki.apache.org/confluence/display/solr/Setting+Up+an+External+ZooKeeper+Ensemble

Jun 02, 2015 2:40:37 PM org.apache.solr.common.cloud.ZkStateReader updateClusterState
INFO: Updating cloud state from ZooKeeper...
Jun 02, 2015 2:40:37 PM org.apache.solr.cloud.ZkController createCollectionZkNode
INFO: Check for collection zkNode:client_active
Jun 02, 2015 2:40:37 PM org.apache.solr.cloud.Overseer$ClusterStateUpdater updateState
INFO: Update state numShards=null message={
  "operation":"state",
  "state":"down",
  "base_url":"http://10.10.1.178:8983/solr",
  "core":"client_active",
  "roles":null,
  "node_name":"10.10.1.178:8983_solr",
  "shard":null,
  "collection":"client_active",
  "numShards":null,
  "core_node_name":"10.10.1.178:8983_solr_client_active"}
Jun 02, 2015 2:40:37 PM org.apache.solr.cloud.ZkController createCollectionZkNode
INFO: Creating collection in ZooKeeper:client_active
Jun 02, 2015 2:40:37 PM org.apache.solr.cloud.Overseer$ClusterStateUpdater updateState
INFO: shard=shard1 is already registered
Jun 02, 2015 2:40:37 PM org.apache.solr.cloud.ZkController getConfName
INFO: Looking for collection configName
Jun 02, 2015 2:40:37 PM org.apache.solr.cloud.ZkController getConfName
INFO: Could not find collection configName - pausing for 3 seconds and trying again - try: 1
Jun 02, 2015 2:40:37 PM org.apache.solr.cloud.DistributedQueue$LatchChildWatcher process
INFO: LatchChildWatcher fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged
Jun 02, 2015 2:40:37 PM org.apache.solr.common.cloud.ZkStateReader$2 process
INFO: A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 1)
Jun 02, 2015 2:40:40 PM org.apache.solr.cloud.ZkController getConfName
INFO: Could not find collection configName - pausing for 3 seconds and trying again - try: 2
Jun 02, 2015 2:40:43 PM org.apache.solr.cloud.ZkController getConfName
INFO: Could not find collection configName - pausing for 3 seconds and trying again - try: 3
Jun 02, 2015 2:40:46 PM org.apache.solr.cloud.ZkController getConfName
INFO: Could not find collection configName - pausing for 3 seconds and trying again - try: 4
Jun 02, 2015 2:40:49 PM org.apache.solr.cloud.ZkController getConfName
INFO: Could not find collection configName - pausing for 3 seconds and trying again - try: 5
Jun 02, 2015 2:40:52 PM org.apache.solr.cloud.ZkController getConfName
SEVERE: Could not find configName for collection client_active
Jun 02, 2015 2:40:52 PM org.apache.solr.core.CoreContainer recordAndThrow
SEVERE: Unable to create core: client_active
org.apache.solr.common.cloud.ZooKeeperException: Could not find configName for collection client_active found:null

--
*My hovercraft is full of eels.*
Re: AW: Solr OutOfMemory but no heap and dump and oo_solr.sh is not triggered
On 6/3/2015 1:41 AM, Clemens Wyss DEV wrote: The oom script just kills Solr with the KILL signal (-9) and logs the kill. I know. But my feeling is, that not even this happens, i.e. the script is not being executed. At least I see no solr_oom_killer-$SOLR_PORT-$NOW.log file ... Btw: Who re-starts solr after it's been killed? I'm not sure what to think here. I wonder if the Java you are using is broken? Restarting would most likely need to be handled by you. You could put Solr under the management of something that keeps it running, like heartbeat, pacemaker, or the supervisor program that comes with qmail, whose name I can never remember. You could add something to send you an email every time the OOM script is run, so you know it has been killed. If you have sized the memory appropriately, the OOM killer will never run. I don't know if a heap dump on OOM is compatible with the OOM script. If Java chooses to run the OOM script before the heap dump is done, the process will be killed before the heap finishes dumping. What if I did a jmap-call in the oom-script before killing the process? You very likely could add that, but be aware of how much time anything you add will take. If your cloud is sufficiently redundant, then it probably won't matter if one node is down for several minutes. Thanks, Shawn
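If you do add a dump, the script might look roughly like this. This is a hypothetical variant of the shipped oom_solr.sh, not the stock script; the pid lookup and paths are assumptions to adapt:

#!/bin/bash
# hypothetical oom_solr.sh variant: capture a heap dump, then kill Solr
SOLR_PORT=$1
SOLR_LOGS_DIR=$2
NOW=$(date +"%F_%H-%M-%S")
SOLR_PID=$(ps auxww | grep start.jar | grep "jetty.port=$SOLR_PORT" | grep -v grep | awk '{print $2}')
# jmap can take minutes on a 16G heap -- the node stays down for that long
jmap -dump:format=b,file="$SOLR_LOGS_DIR/solr-oom-$NOW.hprof" $SOLR_PID
kill -9 $SOLR_PID
echo "$NOW: killed Solr on port $SOLR_PORT (pid $SOLR_PID) after heap dump" \
  > "$SOLR_LOGS_DIR/solr_oom_killer-$SOLR_PORT-$NOW.log"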
Re: Solr OutOfMemory but no heap and dump and oo_solr.sh is not triggered
bq: what exactly should I file? What needs to be added/appended to the issue?

Just what Mark said; title it something like "OOM exception wrapped in runtime exception". Include your original post and note that you were asked to open the JIRA after discussion on the user's list. Don't worry too much, the title etc. can be changed afterwards as things become clearer.

Best,
Erick

On Wed, Jun 3, 2015 at 5:58 AM, Clemens Wyss DEV clemens...@mysign.ch wrote:

Hi Mark,

what exactly should I file? What needs to be added/appended to the issue?

Regards,
Clemens

-----Original Message-----
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: Wednesday, June 3, 2015 14:23
To: solr-user@lucene.apache.org
Subject: Re: Solr OutOfMemory but no heap and dump and oo_solr.sh is not triggered

We will have to find a way to deal with this long term. Browsing the code I can see a variety of places where problem exception handling has been introduced since this all was fixed.

- Mark

On Wed, Jun 3, 2015 at 8:19 AM Mark Miller markrmil...@gmail.com wrote:

File a JIRA issue please. That OOM Exception is getting wrapped in a RuntimeException it looks. Bug.

- Mark

On Wed, Jun 3, 2015 at 2:20 AM Clemens Wyss DEV clemens...@mysign.ch wrote:

Context: Lucene 5.1, Java 8 on debian. 24G of RAM whereof 16G is available for Solr. I am seeing the following OOMs:

ERROR - 2015-06-03 05:17:13.317; [ customer-1-de_CH_1] org.apache.solr.common.SolrException; null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
  at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:854)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:463)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
  ...
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Java heap space
WARN - 2015-06-03 05:17:13.319; [ customer-1-de_CH_1] org.eclipse.jetty.servlet.ServletHandler; Error for /solr/customer-1-de_CH_1/suggest_phrase
java.lang.OutOfMemoryError: Java heap space

The full command line is /usr/local/java/bin/java -server -Xss256k -Xms16G -Xmx16G -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark -XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000 -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps
Re: Could not find configName for collection client_active found:null
It's not entirely clear what you're trying to do when this is pushed out, but I'm guessing it's creating a collection. If that's so, then this is your problem:

Could not find configName for collection client_active

You've set up ZooKeeper correctly. But _before_ you create a collection, you have to upload a configset to ZooKeeper. This is actually just a Solr conf directory, where things like schema.xml, solrconfig.xml and all that live. If you use the startup scripts with the '-c -z zkaddress -e cloud' options, you'll be guided through this process. Otherwise, you'll need to push a configuration up to ZooKeeper with the command line options; see:

https://cwiki.apache.org/confluence/display/solr/Command+Line+Utilities

Note that when creating a collection, if you _don't_ specify collection.configName, Solr will assume that there is a configset with the same name as your collection.

But just to check that your ZK is set up, take a look at the Solr admin UI under Cloud. If you see things like live_nodes showing your Solr nodes (expand the triangle), then ZooKeeper is running just fine and Solr can talk to it.

Best,
Erick

On Wed, Jun 3, 2015 at 5:36 AM, David McReynolds david.mcreyno...@gmail.com wrote:

I'm helping someone with this but my ZooKeeper experience is limited (as in none). They have purportedly followed the instructions from the wiki: https://cwiki.apache.org/confluence/display/solr/Setting+Up+an+External+ZooKeeper+Ensemble
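Concretely, uploading a configset and then creating the collection against it might look like this on Solr 5.x (the paths and the configset name are illustrative):

# upload a conf directory (solrconfig.xml, schema.xml, ...) to ZooKeeper
server/scripts/cloud-scripts/zkcli.sh -zkhost 10.10.1.178:2181 \
  -cmd upconfig -confdir /path/to/conf -confname client_active

# create the collection against that configset
curl "http://10.10.1.178:8983/solr/admin/collections?action=CREATE&name=client_active&numShards=1&replicationFactor=1&collection.configName=client_active"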
Re: Derive suggestions across multiple fields
Thank you for your suggestions. Will try that out and update on the results again.

Regards,
Edwin

On 3 June 2015 at 21:13, Alessandro Benedetti benedetti.ale...@gmail.com wrote:

I can see a lot of confusion in the configuration! A few suggestions:

- Read the documentation carefully and try to apply the suggester guidance.
- Currently there is no need to use spellcheck for suggestions; they are now separate things.
- I see "text" used to derive suggestions; I would prefer to see there the copy field specifically used to contain the interesting fields.
- Yes, you need to build the suggester the first time to see suggestions.
- Yes, if you add a copy field you need to re-index to see it filled!

Cheers

--
Benedetti Alessandro
Re: Sorting in Solr
: https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThesortParameter
:
: I think we may have an omission from the docs -- docValues can also be
: used for sorting, and may also offer a performance advantage.

I added a note about that.

-Hoss
http://www.lucidworks.com/
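For instance, a field used for sorting could be declared with docValues in schema.xml; a small sketch (the field and type names are illustrative):

<field name="price" type="tdouble" indexed="true" stored="false" docValues="true"/>

A query such as select?q=*:*&sort=price+asc can then sort using the on-disk docValues structure rather than the in-memory FieldCache, which is typically friendlier to the heap.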
retrieving large number of docs
Hi,

I have a set of document IDs from one core and I want to query another core using the IDs retrieved from the first core. The constraint is that the size of the doc ID set can be very large. I want to:

1) retrieve these docs from the 2nd index
2) facet on the results

I can think of 3 solutions:

1) boolean query
2) terms fq
3) use a DB rather than Solr

I am trying to keep latencies down, so I prefer not to use (3). The problem with (1) is that maxBooleanClauses is hardwired and I am not sure when I will hit the exception. Option (2) seems to also hit limits; if I do

select?fl=*&q=*:*&facet=true&facet.field=title&fq={!terms f=id}LONG_LIST_OF_IDS

Solr just goes blank. I have tried adding cost=200 to try to run the query first, i.e. fq={!terms f=id cost=200}, but still no good. Paging on doc IDs could be a solution, but the problem then is that the faceting results correspond to the paged IDs and not the global set. My filter cache spec is as follows:

<filterCache class="solr.FastLRUCache" size="100" initialSize="100" autowarmCount="10"/>

What would be the best way for me to solve this problem?

thank you
Re: How to identify field names from the suggested values in multiple fields
Configure two suggesters, one based on each field. Use both of them and you'll get separate suggestions from each.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

On Jun 3, 2015, at 10:03 PM, Dhanesh Radhakrishnan dhan...@hifx.co.in wrote:

Hi,

Can anyone help me build autocomplete suggestions based on multiple fields? There are two fields in my schema, category and subcategory, and I'm trying to build a suggester based on these two fields. When the suggestions are returned, how can I distinguish which field they came from? I used copyFields to combine multiple fields into a single field and used that field in the suggester, but this returns the combined results of category and subcategory. I can't differentiate which field the results were fetched from. These are the copyFields for autocomplete:

<copyField source="category" dest="businessAutoComplete"/>
<copyField source="subcategory" dest="businessAutoComplete"/>

Suggestions should indicate which field they are from. For example, my suggester returns 5 results for the keyword "schools"; 2 are from the category field and 3 from the subcategory field:

Schools (Category)
Primary Schools (Subcategory)
Driving Schools (Subcategory)
Day care and play school (Subcategory)
Day Care/Play School (Category)

Is there any way to build it like this?

--
dhanesh s.r
Team Lead
t: (+91) 484 4011750 (ext. 712)
e: dhan...@hifx.in | w: www.hifx.in
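A sketch of that two-suggester setup with the Solr 5.x solr.SuggestComponent (the lookup, dictionary, and analyzer field type choices are illustrative): both dictionaries can live in one component, and each suggestion comes back grouped under its dictionary name, which answers the which-field question:

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">categorySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">category</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
  </lst>
  <lst name="suggester">
    <str name="name">subcategorySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">subcategory</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
  </lst>
</searchComponent>

Both can then be queried in a single request, since suggest.dictionary can be passed multiple times:

/suggest?suggest=true&suggest.q=schools&suggest.dictionary=categorySuggester&suggest.dictionary=subcategorySuggester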
Re: Solr Atomic Updates
Upayavira, I'm using stand-alone Solr instances. I've not learnt SolrCloud yet. Please, give me some advice when SolrCloud is better then stand-alone Solr instances. Or when it is worth to choose SolrCloud. _ _ _ Batalova Kseniya If you are using stand-alone Solr instances, then it is your responsibility to decide which node a document resides in, and thus to which core you will send your update request. If, however, you used SolrCloud, it would handle that for you - deciding which node should contain a document, and directing the update their all behind the scenes for you. Upayavira On Wed, Jun 3, 2015, at 08:15 AM, Ксения Баталова wrote: Hi! Thanks for your quick reply. The problem that all my index is consists of several parts (several cores) and while updating I don't know in advance in which part updated id is lying (in which core the document with specified id is lying). For example, I have two cores (*Core1 *and *Core2*) and I want to update the document with id *Id1 *and I don't know where this document is lying. So, I have to do two select-queries to my cores to know where it is. And then generate update-query to necessary core. What am I doing wrong? I remind that I'm using SOLR 4.4.0. _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ Best regards, Batalova Kseniya What exactly is the problem? And why do you care about cores, per se - other than to send the update to the core/collection you are trying to update? You should specify the core/collection name in the URL. You should also be using the Solr reference guide rather than the (old) wiki: https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents -- Jack Krupansky On Tue, Jun 2, 2015 at 10:15 AM, Ксения Баталова batalova...@gmail.com wrote: Hi! I'm using *SOLR 4.4.0* for searching in my project. Now I am facing a problem of atomic updates in multiple cores. From wiki: curl *http://localhost:8983/solr/update http://localhost:8983/solr/update *-H 'Content-type:application/json' -d ' [ { *id*: *TestDoc1*, title : {set:test1}, revision : {inc:3}, publisher : {add:TestPublisher} }, { id: TestDoc2, publisher : {add:TestPublisher} } ]' As well as I understand, this means that the document, for example, with id *TestDoc1*, will be searched for updating *only in one core*. And if there is no any document with id *TestDoc1*, the document will be created. Can I somehow to specify the* list of cores* for searching and then updating necessary document with specific id? It's something like *shards *parameter in *select* query. From wiki: #now do a distributed search across both servers with your browser or curl curl ' http://localhost:8983/solr/*select*?*shards*=localhost:8983/solr,localhost:7574/solrindent=trueq=ipod+solr ' Or is it planned in the future? Thanks in advance. _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ Best regards, Batalova Kseniya
Re: retrieving large number of docs
what would be a custom solution?

On Wed, Jun 3, 2015 at 1:58 PM, Joel Bernstein joels...@gmail.com wrote:

You may have to do something custom to meet your needs. 10,000 docIDs is not huge but your latency requirements are pretty low. Are your docIDs by any chance integers? This can make custom PostFilters run much faster. You should also be aware of the Streaming API in Solr 5.1, which will give you fast Map/Reduce approaches (http://joelsolr.blogspot.com/2015/04/the-streaming-api-solrjio-basics.html).

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Jun 3, 2015 at 1:46 PM, Robust Links pey...@robustlinks.com wrote:

Hey Joel,

see below

On Wed, Jun 3, 2015 at 1:43 PM, Joel Bernstein joels...@gmail.com wrote:

A few questions for you:

How large can the list of filtering IDs be? 10k

What's your expectation on latency? 10 < latency < 100

What version of Solr are you using? 5.0.0

SolrCloud or not? not

Joel Bernstein
http://joelsolr.blogspot.com/
Re: retrieving large number of docs
Erick makes a great point: if they are in the same JVM, try the cross-core join first. It might be fast enough for you. A custom solution would be to build a custom query or post filter that works with your specific scenario. For example, if the DocIDs are integers you could build a fast PostFilter using data structures best suited for integer filters. Joel Bernstein http://joelsolr.blogspot.com/

On Wed, Jun 3, 2015 at 2:23 PM, Robust Links pey...@robustlinks.com wrote: what would be a custom solution?
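For reference, the cross-core join works through the join query parser's fromIndex local param when both cores live in the same CoreContainer. A sketch with hypothetical core names, where seed_query stands for whatever query selects the driving documents in the first core:

  curl -g 'http://localhost:8983/solr/core2/select?q=*:*&facet=true&facet.field=title&fq={!join+from=id+to=id+fromIndex=core1}seed_query'

(The -g flag stops curl from treating the braces as URL globs.) As noted elsewhere in this thread, joining on a high-cardinality field like a docID is the join parser's worst case, so this needs to be benchmarked against the latency target.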
Re: Derive suggestions across multiple fields
My previous suggester configuration is derived from this page: https://wiki.apache.org/solr/Suggester Does it mean that what is written there is outdated? Regards, Edwin

On 3 June 2015 at 23:44, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Thank you for your suggestions. Will try that out and update on the results again. Regards, Edwin

On 3 June 2015 at 21:13, Alessandro Benedetti benedetti.ale...@gmail.com wrote: I can see a lot of confusion in the configuration! A few suggestions:
- read the documentation carefully and try to apply the suggester guidance
- currently there is no need to use spellcheck for suggestions; they are now separate things
- I see "text" used to derive suggestions; I would prefer to see there a copy field specifically created to contain the interesting fields
- yes, you need to build the suggester the first time to see suggestions
- yes, if you add a copy field you need to re-index to see it filled!
Cheers

2015-06-03 11:07 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com: This is my suggester configuration:

  <searchComponent class="solr.SpellCheckComponent" name="suggest">
    <lst name="spellchecker">
      <str name="name">suggest</str>
      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
      <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookupFactory</str>
      <str name="field">text</str>  <!-- the indexed field to derive suggestions from -->
      <float name="threshold">0.005</float>
      <str name="buildOnCommit">true</str>
    </lst>
  </searchComponent>

  <requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="defType">edismax</str>
      <int name="rows">10</int>
      <str name="wt">json</str>
      <str name="indent">true</str>
      <str name="df">text</str>
      <str name="spellcheck">true</str>
      <str name="spellcheck.dictionary">suggest</str>
      <str name="spellcheck.onlyMorePopular">true</str>
      <str name="spellcheck.count">5</str>
      <str name="spellcheck.collate">true</str>
    </lst>
    <arr name="components">
      <str>suggest</str>
    </arr>
  </requestHandler>

Yes, I've read the guide. I've found out that there is a need to do re-indexing if I'm creating a new copyField. It works when I use a copyField that was created before the indexing was done. As I'm using the spellcheck dictionary as my suggester, does that mean I just need to build the spellcheck dictionary? Regards, Edwin

On 3 June 2015 at 17:36, Alessandro Benedetti benedetti.ale...@gmail.com wrote: Can you share your suggester configuration? Have you read the guide I linked? Has the suggestion index/FST been built? (You need to build the suggester.) Cheers

2015-06-03 4:07 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com: Thank you for your explanation. I'll not need to care where the suggestions are coming from. All the suggestions from different fields can be consolidated and displayed together. I've tried to put those fields into a new "Suggestion" copy field, but no suggestion is shown when I set:

  <str name="field">Suggestion</str>  <!-- the indexed field to derive suggestions from -->

Is there a need to re-index the documents in order for this to work? Regards, Edwin

On 2 June 2015 at 17:25, Alessandro Benedetti benedetti.ale...@gmail.com wrote: Hi Edwin, I have worked extensively on suggesters recently, and the blog I would suggest is Erick's. It's really detailed and good for beginners and experts alike. [1] Apart from that, let's look at your particular use case: 1) Do you want to be able to tell where the suggestions are coming from? e.g. suggestion1 from field1, suggestion2 from field2? In this case I would try multiple dictionaries, but I am not sure Solr allows you to use them concurrently. It could be a really nice extension to develop, though. 2) If you don't care where the suggestions are coming from, just use a copy field, where you copy the content of the interesting fields. The suggestions will come from the fields you have copied into the copy field, without distinction. Hope this helps. Cheers [1] http://lucidworks.com/blog/solr-suggester/

2015-06-02 4:22 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com: Hi, Does anyone know if we can derive suggestions across multiple fields? I tried to set something like this in my "suggest" searchComponent in solrconfig.xml, but nothing is returned. It only works when I set a single field, and not multiple fields:

  <searchComponent class="solr.SpellCheckComponent" name="suggest">
    <lst name="spellchecker">
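What Alessandro describes in option 2 looks like this in schema.xml; the field and type names below are hypothetical, the source fields must already exist in the schema, and a full re-index plus a suggester build is needed afterwards:

  <field name="suggestContent" type="text_general" indexed="true" stored="false" multiValued="true"/>
  <copyField source="title" dest="suggestContent"/>
  <copyField source="category" dest="suggestContent"/>

The suggester's <str name="field"> is then pointed at suggestContent instead of text.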
How to identify field names from the suggested values in multiple fields
Hi, can anyone help me build an autocomplete suggester based on multiple fields? There are two fields in my schema, category and subcategory, and I'm trying to build a suggester based on these two fields. When the suggestions come back, how can I distinguish which field they came from? I used copyFields to combine multiple fields into a single field and used that field in the suggester:

  <copyField source="category" dest="businessAutoComplete"/>
  <copyField source="subcategory" dest="businessAutoComplete"/>

But this returns the combined result of category and subcategory, and I can't tell which field a result was fetched from. Suggestions should indicate which field they are from. For example, my suggester returns 5 results for the keyword "schools"; 2 are from the category field and 3 from the subcategory field:

Schools (Category)
Primary Schools (Subcategory)
Driving Schools (Subcategory)
Day care and play school (Subcategory)
Day Care/Play School (Category)

Is there any way to build it like this? -- dhanesh s.r, Team Lead, HiFX | dhan...@hifx.in | www.hifx.in
Re: Derive suggestions across multiple fields
This may be helpful: http://lucidworks.com/blog/solr-suggester/ Note that there are a series of fixes in various versions of Solr, particularly buildOnStartup=false and working on multiValued fields. Best, Erick

On Wed, Jun 3, 2015 at 8:04 PM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: My previous suggester configuration is derived from this page: https://wiki.apache.org/solr/Suggester Does it mean that what is written there is outdated? Regards, Edwin
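The post Erick links describes the dedicated SuggestComponent that replaces the spellcheck-based setup quoted earlier in this thread. A minimal sketch of that style of configuration, assuming Solr 4.7 or later and a copy field named suggestContent; the suggester name and lookup choice are illustrative:

  <searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">mySuggester</str>
      <str name="lookupImpl">FuzzyLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">suggestContent</str>
      <str name="suggestAnalyzerFieldType">text_general</str>
      <str name="buildOnStartup">false</str>
    </lst>
  </searchComponent>

  <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <str name="suggest">true</str>
      <str name="suggest.count">10</str>
      <str name="suggest.dictionary">mySuggester</str>
    </lst>
    <arr name="components">
      <str>suggest</str>
    </arr>
  </requestHandler>

With buildOnStartup=false, a one-off build is triggered explicitly, e.g. http://localhost:8983/solr/core1/suggest?suggest.q=test&suggest.build=true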
Re: How to identify field names from the suggested values in multiple fields
Thank you for the quick response. If I use 2 suggesters, can I get the result in a single request? http://192.17.80.99:8983/solr/core1/suggest?suggest=true&suggest.dictionary=mySuggester&wt=xml&suggest.q=school Is there any helping document on building multiple suggesters?

On Thu, Jun 4, 2015 at 10:40 AM, Walter Underwood wun...@wunderwood.org wrote: Configure two suggesters, one based on each field. Use both of them and you'll get separate suggestions from each. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)

On Jun 3, 2015, at 10:03 PM, Dhanesh Radhakrishnan dhan...@hifx.co.in wrote: Hi, can anyone help me build an autocomplete suggester based on multiple fields?

-- dhanesh s.r, Team Lead, HiFX | dhan...@hifx.in | www.hifx.in
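On the follow-up question: the SuggestComponent accepts the suggest.dictionary parameter multiple times in one request, and the response keeps each dictionary's results in its own section, which also answers the which-field question. A sketch under the assumption of two hypothetically named suggesters, one per field, defined in the same component:

  <searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">categorySuggester</str>
      <str name="lookupImpl">FuzzyLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">category</str>
      <str name="suggestAnalyzerFieldType">text_general</str>
    </lst>
    <lst name="suggester">
      <str name="name">subcategorySuggester</str>
      <str name="lookupImpl">FuzzyLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">subcategory</str>
      <str name="suggestAnalyzerFieldType">text_general</str>
    </lst>
  </searchComponent>

  curl 'http://192.17.80.99:8983/solr/core1/suggest?suggest=true&suggest.q=school&suggest.dictionary=categorySuggester&suggest.dictionary=subcategorySuggester'

Results arriving under categorySuggester came from the category field and those under subcategorySuggester from the subcategory field, so no combined copyField is needed for this use case.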
Re: retrieving large number of docs
Are these indexes on different machines? Because if they're in the same JVM, you might be able to use cross-core joins. Be aware, though, that joining on high-cardinality fields (which, by definition, docID probably is) is where pseudo-joins perform worst. Have you considered flattening the data and including whatever information you have in your "from" index in your main index? Because a 100ms response is probably going to be tough if you have to have two indexes/cores. Best, Erick

On Wed, Jun 3, 2015 at 10:58 AM, Joel Bernstein joels...@gmail.com wrote: You may have to do something custom to meet your needs. 10,000 DocIDs is not huge but your latency requirement is pretty low. Are your DocIDs by any chance integers? This can make custom PostFilters run much faster. You should also be aware of the Streaming API in Solr 5.1 which will give you fast Map/Reduce approaches ( http://joelsolr.blogspot.com/2015/04/the-streaming-api-solrjio-basics.html ). Joel Bernstein http://joelsolr.blogspot.com/
Re: Solr Atomic Updates
I have to ask then why you're not using SolrCloud with multiple shards? It seems to me that that would give you the indexing throughput you need (be sure to use CloudSolrServer from your client). At 300M complex documents, you pretty much certainly will need to shard anyway, so in some sense you're re-inventing the wheel here. You can host multiple shards on the same machine, and these _are_ separate Solr cores under the covers, so your problem with atomic updates disappears. That said, I would consider upgrading to Solr 4.10.3 or even 5.2 (which is being voted on even now and should be out in a week or so barring problems). Best, Erick

On Wed, Jun 3, 2015 at 11:04 AM, Ксения Баталова batalova...@gmail.com wrote: Jack, The decision to use several cores was made to increase indexing and searching performance (experimentally). In my project the index is about 300-500 million documents (each document has a rather complex structure) and it may grow larger. So, while indexing, the documents are added to different cores by a number of threads. In other words, each thread collects the necessary information for a list of documents and generates a create-documents query to a specific core. At that moment it doesn't matter (and can't be found out) which document will end up in which core. And now it is necessary to update (atomic update) this index. Something like this.. _ _ Batalova Kseniya
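With SolrCloud, the atomic update from the original question is addressed to the collection rather than to any particular core, and the receiving node routes each document to the shard that owns its id. A sketch, assuming a collection named mycollection and the document ids from the thread:

  curl 'http://localhost:8983/solr/mycollection/update?commit=true' -H 'Content-type:application/json' -d '
  [
   {"id": "TestDoc1", "title": {"set": "test1"}, "revision": {"inc": 3}, "publisher": {"add": "TestPublisher"}},
   {"id": "TestDoc2", "publisher": {"add": "TestPublisher"}}
  ]'

No prior lookup of which shard holds TestDoc1 is needed; the default compositeId router hashes the id and forwards the update to the right shard.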
Re: retrieving large number of docs
A few questions for you: How large can the list of filtering IDs be? What's your expectation on latency? What version of Solr are you using? SolrCloud or not? Joel Bernstein http://joelsolr.blogspot.com/

On Wed, Jun 3, 2015 at 1:23 PM, Robust Links pey...@robustlinks.com wrote: Hi, I have a set of document IDs from one core and I want to query another core using the IDs retrieved from the first core... the constraint is that the size of the doc ID set can be very large. I want to: 1) retrieve these docs from the 2nd index 2) facet on the results. I can think of 3 solutions: 1) boolean query 2) terms fq 3) use a DB rather than Solr. I am trying to keep latencies down so prefer not to use (3). The problem with (1) is that maxBooleanClauses is hardwired and I am not sure when I will hit the exception. Option (2) seems to also hit limits.. so if I do

  select?fl=*&q=*:*&facet=true&facet.field=title&fq={!terms f=id}LONG_LIST_OF_IDS

solr just goes blank. I have tried adding cost=200 to try to run the query first, fq={!terms f=id cost=200}, but still no good. Paging on doc IDs could be a solution but the problem then is that the faceting results correspond to the paged IDs and not the global set. My filter cache spec is as follows:

  <filterCache class="solr.FastLRUCache" size="100" initialSize="100" autowarmCount="10"/>

What would be the best way for me to solve this problem? thank you
Re: retrieving large number of docs
Hey Joel, see below.

On Wed, Jun 3, 2015 at 1:43 PM, Joel Bernstein joels...@gmail.com wrote: A few questions for you:
How large can the list of filtering IDs be? 10k
What's your expectation on latency? 10 < latency < 100 (ms)
What version of Solr are you using? 5.0.0
SolrCloud or not? not
Joel Bernstein http://joelsolr.blogspot.com/
Re: retrieving large number of docs
You may have to do something custom to meet your needs. 10,000 DocIDs is not huge but your latency requirement is pretty low. Are your DocIDs by any chance integers? This can make custom PostFilters run much faster. You should also be aware of the Streaming API in Solr 5.1 which will give you fast Map/Reduce approaches ( http://joelsolr.blogspot.com/2015/04/the-streaming-api-solrjio-basics.html ). Joel Bernstein http://joelsolr.blogspot.com/

On Wed, Jun 3, 2015 at 1:46 PM, Robust Links pey...@robustlinks.com wrote: Hey Joel, see below.
Re: Solr Atomic Updates
Jack, The decision to use several cores was made to increase indexing and searching performance (experimentally). In my project the index is about 300-500 million documents (each document has a rather complex structure) and it may grow larger. So, while indexing, the documents are added to different cores by a number of threads. In other words, each thread collects the necessary information for a list of documents and generates a create-documents query to a specific core. At that moment it doesn't matter (and can't be found out) which document will end up in which core. And now it is necessary to update (atomic update) this index. Something like this.. _ _ Batalova Kseniya

Explain a little about why you have separate cores, and how you decide which core a new document should reside in. Your scenario still seems a bit odd, so help us understand. -- Jack Krupansky