[ 
https://issues.apache.org/jira/browse/STORM-411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320588#comment-14320588
 ] 

ASF GitHub Bot commented on STORM-411:
--------------------------------------

Github user Parth-Brahmbhatt commented on the pull request:

    https://github.com/apache/storm/pull/354#issuecomment-74308624
  
    @revans2 I have added nimbus discovery using the thrift API to address
your concerns about clients connecting directly to zookeeper.
    
    For fencing around zk, non-leader nimbuses still need to write some
information (e.g. they record under /storm/code-distributor/ which topologies
they have replicated locally, and now they also write their nimbus summary
under /storm/nimbuses), which means I cannot put a single fence around all zk
writes. One way or another I will have to differentiate between leader
operations and non-leader operations. I prefer to put the fencing upfront so
that any in-memory state is also not altered.
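    Roughly, this is what putting the fencing upfront looks like (a minimal
sketch only; the class and method names below are illustrative, not the actual
nimbus code):

        // Minimal sketch of upfront fencing; names are illustrative only.
        interface LeaderElector {
            boolean isLeader();
        }

        class FencedNimbus {
            private final LeaderElector elector;

            FencedNimbus(LeaderElector elector) {
                this.elector = elector;
            }

            // Called first in every leader-only thrift method, before any
            // in-memory or zk state is touched.
            private void assertIsLeader() {
                if (!elector.isLeader()) {
                    throw new IllegalStateException(
                        "not the leader nimbus; redirect to the current leader");
                }
            }

            void submitTopology(String name, String jarLocation, String jsonConf) {
                assertIsLeader();   // fence upfront
                // ... leader-only work follows ...
            }

            // Non-leader nimbuses still perform their own limited writes:
            //   /storm/code-distributor/<topology-id>  -> locally replicated topologies
            //   /storm/nimbuses/<nimbus-host:port>     -> this nimbus's summary
        }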
    
    The coupling between the storage layer and leader election is a side
effect of having to support a protocol like bittorrent in the absence of any
highly available storage from which other hosts could download .torrent files.
I could assume that ICodeDistributor implementations are always backed by
highly available storage systems, which would allow removing the coupling with
the replication count, but then unless STORM-411 is done we would have to
force users to use HDFS to achieve nimbus HA.
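    To make the coupling concrete, the code-distributor abstraction has
roughly this shape (a sketch only; the method names approximate the idea and
are not necessarily the exact ICodeDistributor signatures):

        // Rough sketch of a code-distributor; not the exact ICodeDistributor API.
        import java.io.File;
        import java.util.List;
        import java.util.Map;

        interface CodeDistributorSketch {
            void prepare(Map<String, Object> conf);

            // Leader nimbus: create a meta file (e.g. a .torrent) describing the
            // topology code so other hosts know what to fetch.
            File upload(String dirPath, String topologyId) throws Exception;

            // Other nimbuses/supervisors: download the code using the meta file.
            List<File> download(String topologyId, File metaFile) throws Exception;

            // How many nimbuses hold a local replica; without highly available
            // storage the leader has to wait for this to reach the configured
            // minimum before a submission is considered safely replicated.
            int getReplicationCount(String topologyId) throws Exception;

            void cleanup(String topologyId) throws Exception;
        }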


> Extend file uploads to support more distributed cache like semantics
> --------------------------------------------------------------------
>
>                 Key: STORM-411
>                 URL: https://issues.apache.org/jira/browse/STORM-411
>             Project: Apache Storm
>          Issue Type: Improvement
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>
> One of the big features that we are asked about for a hosted storm instance
> is how to distribute and update large shared data sets for topologies.
> These could be things like ip-to-geolocation tables, machine-learned models,
> or just about anything else.
> Currently with storm you either have to package the data as part of your
> topology jar, install it on the machine ahead of time, or access an external
> service to pull the data down.  Packaging it in the jar does not allow users
> to update the dataset without restarting their topologies, installing it on
> the machine will not work for a hosted storm solution, and pulling it from
> an external service without the supervisors being aware of it means it would
> be downloaded multiple times and may not be cleaned up properly afterwards.
> I propose that instead we set up something similar to the distributed cache
> in Hadoop, but with a pluggable backend.  The APIs would be for a simple
> blobstore, so they could be backed by local disk on nimbus, HDFS, swift, or
> even bittorrent.
> Adding new "files" to the blob store or downloading them would by default go
> through nimbus, but if an external store is properly configured, direct
> access into the store could be used.
> The worker process would access the files through symlinks in the worker's
> current working directory.  On posix systems, when a new version of the file
> is made available the symlink would be atomically replaced by a new one
> pointing to the new version.  Windows does not support atomic replacement of
> a symlink, so we should provide a simple library that returns resolved paths
> to be used and can detect when the links have changed, with some retry logic
> built in for the case where the symlink disappears in the middle of a swap.
> We are in the early stages of implementing this functionality and would like 
> some feedback on the concepts before getting too far along. 
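To make the pluggable blob-store idea above concrete, here is one possible
shape for such an API (a minimal sketch; all names are illustrative and
nothing below is a committed interface):

    // Illustrative sketch of a pluggable blob-store API; names are hypothetical.
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.util.Iterator;
    import java.util.Map;

    interface BlobStoreSketch {
        // conf selects the backend: local disk on nimbus, HDFS, swift, bittorrent, ...
        void prepare(Map<String, Object> conf);

        // Create a new blob; the caller writes the bytes into the returned stream.
        OutputStream createBlob(String key) throws Exception;

        // Replace an existing blob's contents, producing a new version.
        OutputStream updateBlob(String key) throws Exception;

        // Read a blob; supervisors would use this to download and refresh cached copies.
        InputStream getBlob(String key) throws Exception;

        // Monotonically increasing version, so supervisors can tell when to re-download.
        long getBlobVersion(String key) throws Exception;

        void deleteBlob(String key) throws Exception;

        Iterator<String> listKeys();
    }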

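Similarly, a small helper along these lines could hide the non-atomic symlink
swap on Windows from user code (purely a sketch of the retry idea described
above, not an existing Storm class):

    // Sketch of a resolved-path helper with retry, for platforms where the
    // symlink swap is not atomic. Purely illustrative.
    import java.io.IOException;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    final class CachedFileResolver {
        private static final int MAX_RETRIES = 5;
        private static final long RETRY_SLEEP_MS = 100;

        // Resolve a symlink in the worker's working directory to its current
        // target, retrying briefly if the link disappears mid-swap.
        static Path resolve(String linkName) throws IOException, InterruptedException {
            IOException last = null;
            for (int i = 0; i < MAX_RETRIES; i++) {
                try {
                    return Paths.get(linkName).toRealPath();
                } catch (IOException e) {   // link missing or being replaced
                    last = e;
                    Thread.sleep(RETRY_SLEEP_MS);
                }
            }
            throw last;
        }

        // Callers can compare resolved paths to detect that a new version of
        // the file has been published.
        static boolean hasChanged(Path previouslyResolved, String linkName)
                throws IOException, InterruptedException {
            return !resolve(linkName).equals(previouslyResolved);
        }
    }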


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
