[ 
https://issues.apache.org/jira/browse/SOLR-4136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510698#comment-13510698
 ] 

Hoss Man commented on SOLR-4136:
--------------------------------


Been poking around the SolrCloud/zk code ... fun times.

From what i can tell, we don't record anywhere in zookeeper the mapping of 
"nodeName" -> "baseURL" for the various solr nodes in a solr cloud cluster.  
We _do_ evidently seem to record the baseUrl associated with a nodeName in the 
info about each _replica_ -- but that information is per collection & shard, 
so as is it doesn't really help in the general case of the bad code in 
OverseerCollectionProcessor.

Three options occur to me...

1) We could consider adding these mappings to ZK as 1st order info, possibly by 
adding some data to the ephemeral "liveNodes" path for each node, so code like 
OverseerCollectionProcessor could just ask for the data of each liveNode to 
know its baseUrl ... but i'm not sure how far down that rabbit hole we want to 
go (i don't really know the performance characteristics of ZK enough to know if 
it's a good idea to have code doing lots of those kinds of lookups ad-hoc)

2) we could cheat: we could add something like this to ClusterState...
{code}
private final Map<String,String> baseUrls;
public String getBaseUrl(final String nodeName);
{code}
...and populate the baseUrls Map in the constructor based on the properties 
found when looping over every collections->slice->replica.  The only question 
is what to do if/when two different collections/slices/replicas in the 
clusterstate disagree about the baseUrl?  (assertion failed?)
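
A rough sketch of what option 2 might look like, outside of Solr. Everything here is hypothetical: `BaseUrlIndexSketch` and `ReplicaInfo` are stand-ins for ClusterState and the per-replica properties, not real Solr classes, and throwing `IllegalStateException` on a conflict is just one possible answer to the "what if they disagree" question.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for ClusterState; not a real Solr class. */
final class BaseUrlIndexSketch {

  /** Hypothetical stand-in for the nodeName/baseUrl properties of one replica. */
  static final class ReplicaInfo {
    final String nodeName;
    final String baseUrl;

    ReplicaInfo(String nodeName, String baseUrl) {
      this.nodeName = nodeName;
      this.baseUrl = baseUrl;
    }
  }

  private final Map<String,String> baseUrls = new HashMap<>();

  /** Populates the map from every replica, failing fast if two replicas disagree. */
  BaseUrlIndexSketch(Iterable<ReplicaInfo> replicas) {
    for (ReplicaInfo replica : replicas) {
      String previous = baseUrls.put(replica.nodeName, replica.baseUrl);
      if (previous != null && !previous.equals(replica.baseUrl)) {
        // Assumption: a conflict is treated as a fatal inconsistency.
        throw new IllegalStateException("Conflicting baseUrl for " + replica.nodeName
            + ": " + previous + " vs " + replica.baseUrl);
      }
    }
  }

  /** Returns the baseUrl recorded for nodeName, or null if no replica lives on that node. */
  public String getBaseUrl(final String nodeName) {
    return baseUrls.get(nodeName);
  }
}
```

Note the limitation visible in `getBaseUrl`: a live node that hosts no replicas yet has no entry, so this approach only knows about nodes that already appear in the clusterstate.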

3) We could improve the kludge to be a bit less kludgy: 
OverseerCollectionProcessor (and possibly other places) currently assume that a 
baseUrl can be computed from a nodeName by replacing all "\_" with "/" -- if we 
change that substitution to only apply to the _first_ "\_" in the nodeName, and 
combine it with some URL decoding on the "hostContext" portion of the nodeName 
(to match my suggested improvement in the previous patch) i think we would have 
a fairly safe way of bidirectionally converting nodeName<->URL regardless of 
what's in the hostContext - because hostnames and ports can't ever have "\_" in 
them.  (this wouldn't address the "http://" kludge, but that assumption seems 
to be more pervasive - we can fight that battle another day)
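
To make option 3 concrete, here is a minimal sketch of the round trip. The class and method names (`NodeNameSketch`, `toNodeName`, `toBaseUrl`) are made up for illustration and are not existing Solr code. Using `java.net.URLEncoder`/`URLDecoder` for the hostContext is an assumption about what the "URL decoding" would look like, and the hard-coded "http://" is the same kludge mentioned above, deliberately left alone.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

/** Hypothetical helper, not part of Solr: first-underscore nodeName <-> baseUrl conversion. */
final class NodeNameSketch {

  private static final String SCHEME = "http://"; // assumption carried over from the existing kludge

  /** "http://host:8983/my_apps/solr" becomes "host:8983_" plus the URL-encoded hostContext. */
  static String toNodeName(String baseUrl) {
    if (!baseUrl.startsWith(SCHEME)) {
      throw new IllegalArgumentException("Expected an http:// URL: " + baseUrl);
    }
    String rest = baseUrl.substring(SCHEME.length());
    int slash = rest.indexOf('/');
    String hostPort = slash < 0 ? rest : rest.substring(0, slash);
    String hostContext = slash < 0 ? "" : rest.substring(slash + 1);
    try {
      // Encoding turns "/" into "%2F", so the context can contain slashes safely.
      return hostPort + "_" + URLEncoder.encode(hostContext, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException("UTF-8 is always supported", e);
    }
  }

  /** Splits on the first "_" only, then URL-decodes the hostContext portion. */
  static String toBaseUrl(String nodeName) {
    int separator = nodeName.indexOf('_');
    if (separator < 0) {
      throw new IllegalArgumentException("No '_' separator in nodeName: " + nodeName);
    }
    String hostPort = nodeName.substring(0, separator);
    try {
      String hostContext = URLDecoder.decode(nodeName.substring(separator + 1), "UTF-8");
      return SCHEME + hostPort + (hostContext.isEmpty() ? "" : "/" + hostContext);
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException("UTF-8 is always supported", e);
    }
  }
}
```

Because `URLEncoder` leaves "_" untouched, any underscores in the hostContext survive as-is; the conversion stays unambiguous only because the split happens at the first "_", which relies on the host:port portion never containing one.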

--

Option #3 seems the least invasive for now, so unless mark/yonik/sami/somebody 
chimes in with more encouragement to go down one of the other routes, i'll take 
a stab at #3 and see what other problems i encounter.


                
> SolrCloud bugs when servlet context contains "/" or "_"
> -------------------------------------------------------
>
>                 Key: SOLR-4136
>                 URL: https://issues.apache.org/jira/browse/SOLR-4136
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 4.0
>            Reporter: Hoss Man
>            Assignee: Hoss Man
>         Attachments: SOLR-4136.patch
>
>
> SolrCloud does not work properly with non-trivial values for "hostContext" 
> (ie: the servlet context path).  In particular...
> * Using a hostContext containing a  "/" (ie: a servlet context with a subdir 
> path, semi-common among people who organize webapps hierarchically for load 
> balancer rules) is explicitly forbidden in ZkController because of how the 
> hostContext is used to build a ZK nodeName
> * Using a hostContext containing a "\_" causes problems in 
> OverseerCollectionProcessor where it assumes all "\_" characters should be 
> converted to "/" to reconstitute a URL from nodeName (NOTE: this code 
> specifically has a TODO to fix this, and then has a subsequent TODO about 
> assuming "http://" labeled "this sucks")

