FYI: This proposal is filed as STORM-2016
<https://issues.apache.org/jira/browse/STORM-2016>, and I've been working on
it.

I'd like to explain the details of the topology submitter, since I wasn't
clear on that.

I've been experimenting with several ways of submitting a topology, but each
of them has pros and cons.

1. Introduce a Submitter class which resolves dependencies and uploads them to
the blobstore, loads the topology code and dependencies into a custom mutable
classloader, and finally runs the child class's main method via reflection.
This is what SparkSubmit does, though SparkSubmit is more complicated because
it supports various options.

pros.
- No need to handle communication between processes. That class bootstraps
and handles everything.
cons.
- We would have to pass the custom classloader to every usage of
Class.forName in order to prevent ClassNotFoundExceptions.
- Spark uses checkstyle to check usages of Class.forName, but we don't apply
such a check, so we could miss some.
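
A rough Java sketch of option 1, to show where the Class.forName concern
comes from. ReflectiveSubmitter and its methods are hypothetical names, not
existing Storm classes; the sketch assumes the topology jar and its resolved
dependency jars are already on local disk, and that the class running this
code was loaded from the storm lib classpath.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch of option 1; ReflectiveSubmitter is not a Storm class.
final class ReflectiveSubmitter {

    // Builds a classloader over the topology jar and its dependency jars.
    // Its parent is the loader that loaded this class (assumed: storm's lib).
    static URLClassLoader newLoader(List<Path> jars) throws MalformedURLException {
        URL[] urls = new URL[jars.size()];
        for (int i = 0; i < jars.size(); i++) {
            urls[i] = jars.get(i).toUri().toURL();
        }
        return new URLClassLoader(urls, ReflectiveSubmitter.class.getClassLoader());
    }

    // Loads mainClassName through that loader and invokes its static main(String[]).
    static void runMain(List<Path> jars, String mainClassName, String[] args) throws Exception {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        try (URLClassLoader loader = newLoader(jars)) {
            // Code that consults the context classloader will also see the jars.
            current.setContextClassLoader(loader);
            // The loader must be passed explicitly: the one-argument
            // Class.forName(name) resolves against the caller's own loader,
            // which cannot see these jars.
            Class<?> mainClass = Class.forName(mainClassName, true, loader);
            Method main = mainClass.getMethod("main", String[].class);
            try {
                main.invoke(null, (Object) args);
            } catch (InvocationTargetException e) {
                // Surface the user's exception rather than the reflection wrapper.
                Throwable cause = e.getCause();
                if (cause instanceof Exception) {
                    throw (Exception) cause;
                }
                throw e;
            }
        } finally {
            current.setContextClassLoader(previous);
        }
    }
}
```

Classes loaded from the jars resolve the one-argument Class.forName through
this loader automatically; the risk is storm-core code (loaded by the parent)
looking up user classes by name without passing the loader.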

2. Introduce a Helper class which resolves transitive dependencies (fetching
them), uploads them to the blobstore, and returns a map of (blob key, file)
pairs. storm.py reads the Helper class's response, adds the files to the
classpath, and runs the child class's main.

pros.
- We don't need the classloader hack (?).
- If we move the Helper class to a separate module, we can even place that
module outside of lib and avoid adding the aether libraries to the lib
directory.
cons.
- It's annoying and error-prone to capture and parse the Helper's output from
stdout.
- Also, storm.py needs to run two classes, but that's not a big deal since we
already do that (confvalue and ClientJarTransformerRunner).
- It's not easy to remove dependencies from the blobstore if topology
submission from the child class fails.
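
For option 2, one way to make the stdout handoff less fragile is to prefix
every result line with a marker so that log lines on the same stream can be
skipped. This is only a sketch: HelperOutput, the marker string, and the
tab-separated line format are assumptions, and the parsing side is written in
Java here for brevity even though storm.py would do it in Python.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stdout protocol between the option-2 Helper and storm.py.
final class HelperOutput {

    // Assumed marker prefix; any line without it (e.g. log output) is ignored.
    static final String MARKER = "STORM_DEPENDENCY\t";

    private static boolean hasSeparator(String s) {
        return s.contains("\t") || s.contains("\n") || s.contains("\r");
    }

    // Writes one "<MARKER><blob key>\t<local file>" line per dependency.
    static String format(Map<String, String> blobKeyToFile) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> entry : blobKeyToFile.entrySet()) {
            if (hasSeparator(entry.getKey()) || hasSeparator(entry.getValue())) {
                throw new IllegalArgumentException("tab/newline not allowed: " + entry);
            }
            out.append(MARKER).append(entry.getKey()).append('\t')
               .append(entry.getValue()).append('\n');
        }
        return out.toString();
    }

    // Reads the (blob key, file) map back, skipping every unmarked line.
    static Map<String, String> parse(String stdout) {
        Map<String, String> blobKeyToFile = new LinkedHashMap<>();
        for (String line : stdout.split("\\R")) {
            if (!line.startsWith(MARKER)) {
                continue;
            }
            String[] parts = line.substring(MARKER.length()).split("\t", -1);
            if (parts.length != 2) {
                throw new IllegalArgumentException("malformed helper line: " + line);
            }
            blobKeyToFile.put(parts[0], parts[1]);
        }
        return blobKeyToFile;
    }
}
```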

3. Let the Helper class just resolve transitive dependencies and return the
file list. storm.py reads the Helper class's response, adds the files to the
classpath, and runs the child class's main. StormSubmitter uploads them to the
blobstore.

pros.
- Same as 2.
- Easy to remove dependencies from the blobstore if submission fails.
- The Helper class no longer depends on storm-core, which makes it easier to
place the module outside of lib.
cons.
- StormSubmitter has to handle dependencies when submitting the topology.
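
The cleanup benefit of option 3 comes from StormSubmitter doing the upload and
the submission in one place, so it can roll back the uploads if submission
fails. A minimal sketch, assuming hypothetical BlobStoreClient and Submission
interfaces rather than Storm's real blobstore client API:

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of option 3's rollback; none of these types are Storm APIs.
final class DependencyAwareSubmit {

    // Minimal stand-in for a blobstore client (an assumption).
    interface BlobStoreClient {
        void upload(String key, Path file) throws Exception;
        void delete(String key) throws Exception;
    }

    // Stand-in for the actual topology submission to Nimbus.
    interface Submission {
        void submit(List<String> dependencyBlobKeys) throws Exception;
    }

    // Uploads every dependency, then submits; if any step fails, deletes the
    // blobs uploaded so far and rethrows the original failure.
    static void submitWithDependencies(Map<String, Path> keyToFile,
                                       BlobStoreClient blobs,
                                       Submission submission) throws Exception {
        List<String> uploaded = new ArrayList<>();
        try {
            for (Map.Entry<String, Path> entry : keyToFile.entrySet()) {
                blobs.upload(entry.getKey(), entry.getValue());
                uploaded.add(entry.getKey());
            }
            submission.submit(uploaded);
        } catch (Exception failure) {
            for (String key : uploaded) {
                try {
                    blobs.delete(key);
                } catch (Exception cleanupError) {
                    // Keep the original failure primary; attach cleanup problems to it.
                    failure.addSuppressed(cleanupError);
                }
            }
            throw failure;
        }
    }
}
```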

I've succeeded with 2, and will try 3 to see if it helps.

Any other suggestions, or opinions on the existing options, are much
appreciated!

Thanks,
Jungtaek Lim (HeartSaVioR)

On Wed, Aug 3, 2016 at 8:01 AM, Jungtaek Lim <[email protected]> wrote:

> Hi Priyank,
>
> First of all, this feature is similar (close) to what Spark provides.
>
> https://spark.apache.org/docs/2.0.0/submitting-applications.html#advanced-dependency-management
>
> If you have additional jars which are not packed into the uber topology jar,
> you can use the --jars option to include them without repackaging the
> topology jar.
>
> And I think I was not clear on the submitter. I'm still designing that part
> in detail: resolving dependencies needs the Eclipse Aether libraries, so I'm
> thinking about how to avoid adding that dependency to storm-core. But it
> doesn't seem that easy or clear. I'll update once I'm clear on this.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> On Wed, Aug 3, 2016 at 7:43 AM, Priyank Shah <[email protected]> wrote:
>
>> Hi Jungtaek,
>>
>> For adding jars and maven artifacts at submission, you have used the word
>> Submitter. Is Submitter the person running the storm jar command, or is
>> Submitter the Java code that actually submits it to Nimbus?
>> Also, I did not quite understand the --jars option. If you could please
>> elaborate a little on that, that would be great.
>>
>> Thanks
>> Priyank
>>
>> On 8/2/16, 7:05 AM, "Jungtaek Lim" <[email protected]> wrote:
>>
>> >Ah, Satish, you got the point. I meant the copied version of the files in
>> >the supervisor, but that itself can be isolated.
>> >I hadn't thought about removing blobs, and it doesn't seem easy to do.
>> >
>> >Jungtaek Lim (HeartSaVioR)
>> >
>> >
>> >On Tue, Aug 2, 2016 at 7:35 PM, Satish Duggana <[email protected]> wrote:
>> >
>> >> Hi Jungtaek,
>> >> With the current proposal, are we removing blob store files referenced
>> >> by a topology when it is killed?
>> >>
>> >> Thanks,
>> >> Satish.
>> >>
>> >> On Tue, Aug 2, 2016 at 3:50 PM, Jungtaek Lim <[email protected]> wrote:
>> >>
>> >> > Hi Satish,
>> >> >
>> >> > Thanks for reviewing and sharing your idea.
>> >> >
>> >> > Yes, this is shared dependencies vs. isolated dependencies.
>> >> > If we name each dependency file to contain the group name, artifact
>> >> > name, and version, it can be shared.
>> >> > One downside of this approach is storage space, since we don't know
>> >> > when it's safe to delete without additional care, but I doubt the disk
>> >> > fills up due to dependency blob jar files in normal situations.
>> >> > So I think we're OK to do this, but I would like to see others'
>> >> > opinions.
>> >> >
>> >> > Btw, I'm designing the details based on the proposal. I will update
>> >> > this thread if there are things not covered by the initial design.
>> >> >
>> >> > Thanks,
>> >> > Jungtaek Lim (HeartSaVioR)
>> >> >
>> >> > On Tue, Aug 2, 2016 at 6:58 PM, Satish Duggana <[email protected]> wrote:
>> >> >
>> >> > > Hi Jungtaek,
>> >> > > The proposal looks good to me. Good that we are not going with the
>> >> > > other alternative using a mutable classloader, etc.
>> >> > >
>> >> > > It would be good to have the config mentioned in the proposal for
>> >> > > adding those jars before or after storm core/libs. There is a property
>> >> > > Config.TOPOLOGY_CLASSPATH_BEGINNING whose value is used as the
>> >> > > beginning of the classpath, and it should continue to work as expected
>> >> > > even with the new configuration.
>> >> > >
>> >> > > One enhancement we may want to add to the existing proposal:
>> >> > > when --packages is used, the storm submitter can upload those
>> >> > > dependencies to the blob store with a defined naming convention, so
>> >> > > that the same set of packages is not uploaded again and can be reused
>> >> > > by other topologies that use the same packages.
>> >> > >
>> >> > > Thanks,
>> >> > > Satish.
>> >> > >
>> >> > >
>> >> > > On Tue, Aug 2, 2016 at 7:25 AM, Jungtaek Lim <[email protected]> wrote:
>> >> > >
>> >> > > > Hi dev,
>> >> > > >
>> >> > > > This is the proposal review thread for submitting a topology with
>> >> > > > added jars and maven artifacts. It also follows up on the discussion
>> >> > > > thread "[DISCUSSION] Policy of resolving dependencies for non
>> >> > > > storm-core modules".[1]
>> >> > > >
>> >> > > > I've written a design doc which also describes the motivation for this.
>> >> > > >
>> >> > > > https://cwiki.apache.org/confluence/display/STORM/A.+Design+doc%3A+adding+jars+and+maven+artifacts+at+submission
>> >> > > >
>> >> > > > Please review this and comment on "this thread" instead of the wiki
>> >> > > > page so that all devs are notified of updates.
>> >> > > >
>> >> > > > Thanks,
>> >> > > > Jungtaek Lim (HeartSaVioR)
>> >> > > >
>> >> > > > [1]
>> >> > > > http://mail-archives.apache.org/mod_mbox/storm-dev/201607.mbox/%3CCAF5108jByyJLTKrV_P4fS=dj8rsr_o5oubzqbviscggsc1c...@mail.gmail.com%3E
>> >> > > >
>> >> > >
>> >> >
>> >>
>>
>
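
Satish's suggestion quoted above (deterministic blob names built from group,
artifact, and version, so that topologies using the same package can share
one blob) could look roughly like this. DependencyBlobKeys, the "dependency"
prefix, and the "_" separator are all hypothetical, and the separator would
need checking against the blobstore's key validation rules.

```java
// Hypothetical naming convention for shareable dependency blobs; neither the
// "dependency" prefix nor the "_" separator comes from Storm itself.
final class DependencyBlobKeys {

    private static final String SEPARATOR = "_";

    // The same coordinates always yield the same key, so a second topology
    // using the same artifact can find the blob instead of uploading it again.
    static String blobKeyFor(String groupId, String artifactId, String version) {
        for (String part : new String[] {groupId, artifactId, version}) {
            if (part.isEmpty() || part.contains(SEPARATOR)) {
                // Reject rather than risk two coordinates mapping to one key.
                throw new IllegalArgumentException("cannot encode coordinate part: '" + part + "'");
            }
        }
        return "dependency" + SEPARATOR + groupId + SEPARATOR + artifactId
                + SEPARATOR + version + ".jar";
    }

    // A SNAPSHOT artifact can change under the same version string, so sharing
    // it by coordinates alone could hand a topology a stale jar.
    static boolean isShareable(String version) {
        return !version.endsWith("-SNAPSHOT");
    }
}
```

Excluding SNAPSHOT versions is a design choice layered on top of the
suggestion, not part of it.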
