Re: Merge Two Fields in SOLR
Ravi, what about using field aliasing at search time? Would that do the trick for your use case?

http://localhost:8983/solr/mycollection/select?defType=edismax&q=name:john doe&f.name.qf=firstname surname

For more details: https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser

Damien

On 04/07/2015 10:21 AM, Erick Erickson wrote:
I don't understand why copyField doesn't work. Admittedly, firstName and surName would be separate tokens, but isn't that what you want? The fact that it's multiValued isn't really a problem: multiValued fields are functionally identical to single-valued fields if you set positionIncrementGap to... hmmm... 1 or 0, I'm not quite sure which. Of course, if you're sorting by the field, that's a different story.

Here's a discussion with several options, but I really wonder what your specific objection to copyField is; it's the simplest, and on the surface it seems like it would work. http://lucene.472066.n3.nabble.com/Concat-2-fields-in-another-field-td4086786.html

Best, Erick

On Tue, Apr 7, 2015 at 10:08 AM, EXTERNAL Taminidi Ravi (ETI, AA-AS/PAS-PTS) external.ravi.tamin...@us.bosch.com wrote:
Hi Group, is there an easy way to merge the data of two fields into one field? copyField doesn't work for me, as it stores the result as multiValued. Can someone suggest a workaround to achieve this use case?

FirstName:ABC SurName:XYZ

I need another field, Name:ABCXYZ, and I have to do this on the Solr end, as the source data is read-only and I have no control to combine the fields upstream. Thanks, Ravi
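For reference, the copyField approach Erick describes would look roughly like the following schema.xml sketch. The field names come from the thread; the field types are assumptions, so adjust them to your actual schema:

```xml
<!-- Source fields (names from the thread; types are placeholders) -->
<field name="firstname" type="text_general" indexed="true" stored="true"/>
<field name="surname"   type="text_general" indexed="true" stored="true"/>

<!-- Destination must be multiValued, since two sources copy into it -->
<field name="name" type="text_general" indexed="true" stored="true" multiValued="true"/>

<copyField source="firstname" dest="name"/>
<copyField source="surname"   dest="name"/>
```

As noted above, the multiValued destination behaves much like a single-valued field for search purposes; it mainly matters if you need to sort on it.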
Retrieving list of words for highlighting
In Solr 5 (or 4), is there an easy way to retrieve the list of words to highlight? Use case: allow an external application to highlight the matching words of a matching document, rather than using the highlighted snippets returned by Solr. Thanks, Damien
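One common workaround is to request highlighting with distinctive markers and then strip the matched words out of the returned snippets on the client side. A rough sketch, assuming Solr's default hl.simple.pre/hl.simple.post markers of `<em>`/`</em>` (pass your own markers if you changed them):

```python
import re

def extract_highlighted_words(snippets, pre="<em>", post="</em>"):
    """Collect the unique words Solr wrapped in highlight markers,
    preserving first-seen order. Assumes the default <em>/</em>
    markers; override pre/post if your config differs."""
    pattern = re.compile(re.escape(pre) + r"(.*?)" + re.escape(post))
    words = []
    for snippet in snippets:
        for match in pattern.findall(snippet):
            if match not in words:
                words.append(match)
    return words

snippets = ["the <em>quick</em> brown <em>fox</em>", "a <em>quick</em> start"]
print(extract_highlighted_words(snippets))  # ['quick', 'fox']
```

This hands the external application the raw word list, so it can apply its own highlighting instead of using Solr's snippets directly.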
Re: Solr 5.0.0 - Multiple instances sharing Solr server *read-only* dir
Thanks Timothy for the pointer to the Jira ticket. That's exactly it :-)

Erick, the main reason I would run multiple instances on the same machine is to simulate a multi-node environment. But beyond that, I like the idea of being able to clearly separate the server dir from the data dirs. That way the server dir could be deployed by root, yet Solr instances could run in userland.

Damien

On 03/10/2015 09:31 AM, Timothy Potter wrote:
I think the next step here is to ship Solr with the war already extracted so that Jetty doesn't need to extract it on first startup - https://issues.apache.org/jira/browse/SOLR-7227

On Tue, Mar 10, 2015 at 10:15 AM, Erick Erickson erickerick...@gmail.com wrote:
If I'm understanding your problem correctly, I think you want the -d option; then all the -s guys would be under that. Just to check, though, why are you running multiple Solrs? There are sometimes very good reasons; just checking that you're not making things more difficult than necessary. Best, Erick

On Mon, Mar 9, 2015 at 4:59 PM, Damien Dykman damien.dyk...@gmail.com wrote:
Hi all,

Quoted from https://cwiki.apache.org/confluence/display/solr/Solr+Start+Script+Reference: "When running multiple instances of Solr on the same host, it is more common to use the same server directory for each instance and use a unique Solr home directory using the -s option."

Is there a way to achieve this without making *any* changes to the extracted content of solr-5.0.0.tgz and only use runtime parameters? In other words, make the extracted folder solr-5.0.0 strictly read-only? By default, the Solr web app is deployed under server/solr-webapp, as per solr-jetty-context.xml. So unless I change solr-jetty-context.xml, I cannot make folder solr-5.0.0 read-only to my Solr instances.
I've figured out how to make the log files and pid file be located under the Solr data dir by doing:

export SOLR_PID_DIR=mySolrDataDir/logs; \
export SOLR_LOGS_DIR=mySolrDataDir/logs; \
bin/solr start -c -z localhost:32101/solr \
  -s mySolrDataDir \
  -a -Dsolr.log=mySolrDataDir/logs \
  -p 31100 -h localhost

But if there was a way to not have to change solr-jetty-context.xml, that would be awesome! Thoughts?

Thanks, Damien
Solr 5.0.0 - Multiple instances sharing Solr server *read-only* dir
Hi all,

Quoted from https://cwiki.apache.org/confluence/display/solr/Solr+Start+Script+Reference: "When running multiple instances of Solr on the same host, it is more common to use the same server directory for each instance and use a unique Solr home directory using the -s option."

Is there a way to achieve this without making *any* changes to the extracted content of solr-5.0.0.tgz and only use runtime parameters? In other words, make the extracted folder solr-5.0.0 strictly read-only?

By default, the Solr web app is deployed under server/solr-webapp, as per solr-jetty-context.xml. So unless I change solr-jetty-context.xml, I cannot make folder solr-5.0.0 read-only to my Solr instances.

I've figured out how to make the log files and pid file be located under the Solr data dir by doing:

export SOLR_PID_DIR=mySolrDataDir/logs; \
export SOLR_LOGS_DIR=mySolrDataDir/logs; \
bin/solr start -c -z localhost:32101/solr \
  -s mySolrDataDir \
  -a -Dsolr.log=mySolrDataDir/logs \
  -p 31100 -h localhost

But if there was a way to not have to change solr-jetty-context.xml, that would be awesome! Thoughts?

Thanks, Damien
/export - Why need sort criteria (4.10.2)?
The /export request handler mandates a sort order. Is there a particular reason? It'd be nice to have the option to tell Solr: just export in whatever order you want, to avoid any overhead added by sorting. Or am I missing something? If exports were distributed, I could see the need for some kind of sort order, but they are not.

BTW, kudos for adding this feature; it rocks and seems to scale really well :-) Though I did see some weird behavior (NullPointerException at SortingResponseWriter.java:784) in some cases. I'll investigate further, and if I manage to make that issue a little more deterministic and reproducible, I'll share my findings.

Thanks, Damien
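For context, a typical /export request spells out the sort explicitly; when the order genuinely doesn't matter, sorting ascending on the uniqueKey is probably the cheapest choice. A sketch with placeholder collection and field names:

```text
http://localhost:8983/solr/collection1/export?q=*:*&sort=id+asc&fl=id
```

Note that /export requires both the sort and fl fields to be docValues fields, which is presumably tied to why a sort is mandatory: the handler streams values out of docValues in sort order rather than collecting a ranked result set.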
Duplicate unique ID in implicit collection - Illegal?
Hi all,

With an implicit collection, is it legal to index the same document (same unique ID) into 2 different shards? I know, it kind of defeats the purpose of having a unique ID...

The reason I'm doing this is that I want to move a single document from one shard to another. During the transition period, I'd use a search criterion to specify which shard I want to target to find that document.

At search time, I do notice some weird behavior. The facets take the duplicate into account, but the number of results varies, for instance depending on the rows=xx parameter. That doesn't surprise me too much, given the non-uniqueness of the unique ID. So my actual question is the following: if my search query guarantees there will be no duplicate matches, will my search results be consistent? That's assuming it's legal, from an indexing point of view, to have duplicates across shards.

Thanks, Damien
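Until the duplicate is cleaned up, one defensive option is to de-duplicate on the client side when merging per-shard results during the transition period. A minimal sketch; the result shape and the "id" key are assumptions for illustration, not Solr's actual response format:

```python
def merge_shard_results(*shard_results):
    """Merge per-shard result lists, keeping only the first occurrence
    of each unique ID. A client-side guard for the window in which the
    same document may live in two shards."""
    seen = set()
    merged = []
    for results in shard_results:
        for doc in results:
            if doc["id"] not in seen:
                seen.add(doc["id"])
                merged.append(doc)
    return merged

shard1 = [{"id": "a"}, {"id": "b"}]
shard2 = [{"id": "b"}, {"id": "c"}]  # "b" is mid-move, so it appears twice
print(merge_shard_results(shard1, shard2))  # [{'id': 'a'}, {'id': 'b'}, {'id': 'c'}]
```

This doesn't fix the varying facet counts, which are computed server-side, but it keeps the document list itself consistent for the caller.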
Re: Transparently rebalancing a Solr cluster without splitting or moving shards
Thanks for your suggestions and recommendations.

If I understand correctly, the MIGRATE command does shard splitting (around the range of the split.key) and merging behind the scenes. Though it's a bit difficult to properly monitor the actual migration, set the proper timeouts, know when to direct indexing and search traffic to the destination collection, etc.

Not sure how to MIGRATE an entire collection. By providing the full list of split.keys? I'd be surprised if that was doable, but I guess it would skip the splitting part, which makes it easier ;-) Or much tougher, by splitting around all the ranges. More seriously, doing a MERGEINDEXES at the core level might not be a bad alternative, provided the hash ranges are compatible.

Damien

On 07/07/2014 05:14 PM, Shawn Heisey wrote:
I don't think you'd want to disable mmap. It could be done, by choosing another DirectoryFactory object. Adding memory is likely to be the only sane way forward.

Another possibility would be to bump up the maxShardsPerNode value and build the new collection (with the proper number of shards) only on the new machines... Then when they are built, move them to their proper homes and manually adjust the cluster state in ZooKeeper. This will still generate a lot of I/O, but hopefully it will last for less time on the wall clock, and it will be something you can do when load is low. After that's done and you've switched to it, you can add replicas with either the ADDREPLICA collections API or the core admin API.

You should be on the newest Solr version... Lots of bugs have been found and fixed.

One thing I wonder is whether the MIGRATE API can be used on an entire collection. It says it works by shard key, but I suspect that most users will not be using that functionality.

Thanks, Shawn
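For reference, MIGRATE operates on one split.key at a time, which is why migrating a whole collection would mean issuing one call per routing key. A sketch with placeholder collection and key names:

```text
http://localhost:8983/solr/admin/collections?action=MIGRATE&collection=source_coll&split.key=A!&target.collection=target_coll&forward.timeout=60
```

The forward.timeout parameter controls how long (in seconds) update forwarding to the target collection stays active after the copy, which is the window in which to switch indexing traffic over.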
Transparently rebalancing a Solr cluster without splitting or moving shards
I have a cluster of N boxes/nodes and I'd like to add M boxes/nodes and rebalance data accordingly. Let's add the following constraints:

1. Boxes have different characteristics (RAM, CPU, disks).
2. Different number of shards per box/node (let's pretend we have found the sweet spot for each box).
3. Once rebalancing is over, the layout of the cluster should be the same as if it had been bootstrapped from N+M boxes.

Because of the above constraints, shard splitting or moving shards around is not an option. And to keep the discussion simple, let's ignore shard replicas. So far, the best scenario I could think of is the following:

a. 1 collection on the N nodes using implicit routing
b. add shards on the M new nodes as part of that collection
c. reindex a portion of the data onto the shards of the M new nodes, while excluding them from search
d. in 1 transaction, delete the old data, immediately issue a soft commit, and remove the search restrictions

Any better idea?

I could also use 1 collection per box and have Solr do the routing within each collection. I would still have to handle the routing across collections, but collection aliases would come in handy. Overall, though, it would be similar to the above scenario. Actually, in my case it wouldn't work as well, because I also use a kind of flag document on the M new nodes which I need to update atomically with the delete of the old data. And, if I'm not mistaken, I'd lose atomicity with the multi-collection scenario.

Thank you for your feedback, Damien
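Steps a and b above map roughly onto the Collections API as follows. All names are placeholders, and CREATESHARD only works for collections using the implicit router:

```text
# a. create the collection with implicit routing on the N original nodes
http://localhost:8983/solr/admin/collections?action=CREATE&name=my_coll&router.name=implicit&shards=shard1,shard2&router.field=shard_name

# b. later, add a shard hosted on one of the M new nodes
http://localhost:8983/solr/admin/collections?action=CREATESHARD&collection=my_coll&shard=shard3&createNodeSet=newnode1:8983_solr
```

With implicit routing, documents land on whichever shard the router.field (or the shard name in the request) designates, which is what makes step c's targeted reindexing possible.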
Re: Transparently rebalancing a Solr cluster without splitting or moving shards
Thanks Shawn, a clean way to do it indeed. And going your route, one could even copy the existing shards into the new collection and then delete the data that is getting reindexed on the new nodes. That would spare reindexing everything.

But in my case, I add boxes after a noticeable performance degradation due to data volume increase. So the old boxes cannot afford reindexing data (or deleting it, if using the proposed variation) into the new collection while serving searches with the old collection. Unless there is a way to aggressively bound the RAM consumption of the new collection (disabling MMAP?), given that it's not being used for search during the transition? That said, even if that was possible, both collections would compete for disk I/O.

Thanks, Damien

On 07/07/2014 12:26 PM, Shawn Heisey wrote:
On 7/7/2014 12:41 PM, Damien Dykman wrote:
[original message quoted in full above]

You may not like this answer, but here's a fairly clean way to do this, assuming you have enough disk space on the existing machines:

1. Add the new boxes to the cluster.
2. Create a new collection across all the boxes.
2a. If your current collection is named "test", then name the new one "test0" or something else that's related, but different.
3. Index all data into the new collection.
4. As quickly as possible, do the following actions:
4a. Stop indexing.
4b. Do a synchronization pass on the new collection so it's current.
4c. Delete the original collection.
4d. Create a collection alias so that you can access the new collection with the original collection name.
4e. Restart indexing.

Thanks, Shawn
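Steps 4c and 4d correspond to the following Collections API calls, using the test/test0 names from the example above:

```text
# 4c. delete the original collection
http://localhost:8983/solr/admin/collections?action=DELETE&name=test

# 4d. alias the old name to the new collection
http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=test&collections=test0
```

Once the alias is in place, clients keep querying "test" unchanged while the requests are served by "test0".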
Re: Adding router.field property to an existing collection.
Hi Modassar,

I ran into the same issue (Solr 4.8.1) with an existing collection set to implicit routing but with no router.field defined. I managed to set the router.field by modifying /clusterstate.json and pushing it back to ZooKeeper. For instance, I use the field shard_name for routing. Now, in my /clusterstate.json, I have:

"router":{"name":"implicit","field":"shard_name"}

Warning: you'll probably need to reload your collection (see the Collections API) for the change to be taken into account. Or, a more brutal way, restart your Solr nodes. Then you should see the update in http://localhost:8983/solr/admin/collections?action=clusterstatus. I'd be curious to know if there's a cleaner method, though, rather than modifying /clusterstate.json.

Otherwise, if you want to create a collection from scratch with implicit routing and a router.field (see the Collections API), use:

http://localhost:8983/solr/admin/collections?action=CREATE&name=my_collection&router.name=implicit&router.field=shard_name

Good luck, Damien

On 05/06/2014 05:59 AM, Modassar Ather wrote:
Hi,

I have a setup of two shards with embedded ZooKeeper and one collection on two Tomcat instances. I cannot use the uniqueKey, i.e. compositeId routing, for document routing, as per my understanding it will change the uniqueKey. Another way mentioned on the Solr wiki is to use router.field, but I could not find a way of setting it in solr.xml or another configuration file. Kindly share your suggestions on how I can:
- use router.field in an existing collection;
- create a collection with router.field and implicit routing enabled.
Thanks, Modassar
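For the /clusterstate.json round-trip described above, the zkcli script bundled with Solr can fetch and push the file. A sketch assuming a local ZooKeeper; the paths are placeholders (in Solr 4.x the script ships under example/scripts/cloud-scripts/):

```text
# fetch the current cluster state from ZooKeeper
./zkcli.sh -zkhost localhost:2181 -cmd getfile /clusterstate.json /tmp/clusterstate.json

# edit /tmp/clusterstate.json to add "field":"shard_name" under "router", then push it back
./zkcli.sh -zkhost localhost:2181 -cmd putfile /clusterstate.json /tmp/clusterstate.json
```

As noted above, reload the collection (or restart the nodes) afterwards for the change to take effect.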
Atomic commit across shards?
Is a commit (hard or soft) atomic across shards? In other words, can I guarantee that any given search on a multi-shard collection will hit the same index generation of each shard? Thanks, Damien