One thing I've found while working on this is that we may want to support adding packages with exclusions.
Launching the child class with --packages pointing to kafka_2.10 simply fails, because it pulls in conflicting libraries as transitive dependencies. I'm not sure how to represent that, but technically Aether seems to support it.

SLF4J: Detected both log4j-over-slf4j.jar AND slf4j-log4j12.jar on the class path, preempting StackOverflowError.
SLF4J: See also http://www.slf4j.org/codes.html#log4jDelegationLoop for more details.

Both jars are transitive dependencies of org.apache.kafka:kafka_2.10:0.9.0 (also 0.8.2.1).
So if we want to add the Kafka library at the submission step, exclusions should be supported.

On Thu, Aug 4, 2016 at 12:03 AM, Jungtaek Lim <[email protected]> wrote:

> FYI: This proposal is filed as STORM-2016
> <https://issues.apache.org/jira/browse/STORM-2016> and I've been working
> on it.
>
> I'd like to explain the details of the topology submitter, as I wasn't
> clear on that.
>
> I've been experimenting with several ways of submitting a topology, but
> they all have pros and cons.
>
> 1. Introduce a Submitter class which resolves dependencies and uploads
> them to the blobstore, loads the topology code and dependencies into a
> custom mutable classloader, and finally runs the child class' main method
> by reflection. This is what SparkSubmit does, though that is more
> complicated because it supports various options.
>
> pros.
> - No need to handle communication between processes. That class
> bootstraps and handles everything.
> cons.
> - We should pass the custom classloader to all usages of Class.forName in
> order to prevent any CNFs.
> - Spark uses checkstyle to check usages of Class.forName, but we don't
> apply that, so we could miss one.
>
> 2. Introduce a Helper class which resolves transitive dependencies (with
> fetching), uploads them to the blobstore, and returns a map of
> (blob key, file) pairs. storm.py reads the response of the Helper class,
> adds the files to the classpath, and runs the child class' main.
>
> pros.
> - We don't need to use the classloader hack (?).
> - If we make the Helper class a separate module, we can even place that
> module outside of lib and avoid adding the Aether libraries to the lib
> directory.
> cons.
> - It's annoying and error prone to get and parse the Helper's output from
> stdout.
> - Also storm.py needs to run two classes, but that's not a big deal since
> we already do that (confvalue and ClientJarTransformerRunner).
> - It's not easy to remove dependencies from the blobstore if topology
> submission from the child class fails.
>
> 3. Let the Helper class just resolve transitive dependencies and return a
> file list. storm.py reads the response of the Helper class, adds the files
> to the classpath, and runs the child class' main. StormSubmitter will
> upload them to the blobstore.
>
> pros.
> - Same as 2.
> - Easy to remove dependencies from the blobstore if submission fails.
> - The Helper class no longer depends on storm-core. Easier to place the
> module outside of lib.
> cons.
> - StormSubmitter should handle dependencies when submitting a topology.
>
> I've succeeded with 2, and will try 3 to see whether it helps.
>
> Any other suggestions or opinions on the existing options are much
> appreciated!
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> On Wed, Aug 3, 2016 at 8:01 AM, Jungtaek Lim <[email protected]> wrote:
>
>> Hi Priyank,
>>
>> First of all, this feature is similar (close) to what Spark provides.
>>
>> https://spark.apache.org/docs/2.0.0/submitting-applications.html#advanced-dependency-management
>>
>> If you have additional jars which are not packed into the uber topology
>> jar, you can use the --jars option to include them without repackaging
>> the topology jar.
>>
>> And I think I was not clear on the submitter. I'm still trying to design
>> that point in detail: resolving dependencies needs the Eclipse Aether
>> libraries, so I'm thinking about how to avoid adding that dependency to
>> storm-core. But it seems not that easy or clear. I'll update once I'm
>> clear on this.
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>
>> On Wed, Aug 3, 2016 at 7:43 AM, Priyank Shah <[email protected]> wrote:
>>
>>> Hi Jungtaek,
>>>
>>> For adding jars and maven at submission, you have used the word
>>> Submitter. Is Submitter the person running the storm jar command, or is
>>> Submitter the Java code that actually submits it to Nimbus?
>>> Also, I did not quite understand the --jars option. If you could please
>>> elaborate a little on that, that would be great.
>>>
>>> Thanks
>>> Priyank
>>>
>>> On 8/2/16, 7:05 AM, "Jungtaek Lim" <[email protected]> wrote:
>>>
>>> >Ah, Satish, you got the point. I meant the copied version of the files
>>> >in the supervisor, but that itself can be isolated.
>>> >I didn't think about removing blobs, and it seems not easy to do.
>>> >
>>> >Jungtaek Lim (HeartSaVioR)
>>> >
>>> >
>>> >On Tue, Aug 2, 2016 at 7:35 PM, Satish Duggana <[email protected]> wrote:
>>> >
>>> >> Hi Jungtaek,
>>> >> With the current proposal, are we removing blob store files referred
>>> >> to by a topology when it is killed?
>>> >>
>>> >> Thanks,
>>> >> Satish.
>>> >>
>>> >> On Tue, Aug 2, 2016 at 3:50 PM, Jungtaek Lim <[email protected]> wrote:
>>> >>
>>> >> > Hi Satish,
>>> >> >
>>> >> > Thanks for reviewing and sharing your idea.
>>> >> >
>>> >> > Yes, this is shared dependencies vs. isolated dependencies.
>>> >> > If we name each dependency file so that it contains the group name,
>>> >> > artifact name, and version, it can be shared.
>>> >> > One downside of this approach is storage space, since we don't know
>>> >> > when it's safe to delete a file without additional care, but I'm
>>> >> > curious whether the disk would really fill up with dependency blob
>>> >> > jar files in a normal situation.
>>> >> > So I think we're OK to do this, but I would like to see others'
>>> >> > opinions.
>>> >> >
>>> >> > Btw, I'm designing the details based on the proposal. I will update
>>> >> > this thread if there are things not covered by the initial design.
>>> >> >
>>> >> > Thanks,
>>> >> > Jungtaek Lim (HeartSaVioR)
>>> >> >
>>> >> > On Tue, Aug 2, 2016 at 6:58 PM, Satish Duggana <[email protected]> wrote:
>>> >> >
>>> >> > > Hi Jungtaek,
>>> >> > > The proposal looks good to me. Good that we are not going with the
>>> >> > > other alternative using a mutable classloader etc.
>>> >> > >
>>> >> > > Good to have the mentioned config in the proposal to add those jars
>>> >> > > before or after storm core/libs. There is a property,
>>> >> > > Config.TOPOLOGY_CLASSPATH_BEGINNING, whose value is used as the
>>> >> > > initial classpath, and that should continue to work as expected
>>> >> > > even with the new configuration.
>>> >> > >
>>> >> > > One enhancement which we may want to add to the existing proposal:
>>> >> > > when --packages is used, the storm submitter can upload those
>>> >> > > dependencies to the blob store with a defined naming convention, so
>>> >> > > that the same set of packages is not uploaded again and can be
>>> >> > > reused by other topologies if they use the same package.
>>> >> > >
>>> >> > > Thanks,
>>> >> > > Satish.
>>> >> > >
>>> >> > >
>>> >> > > On Tue, Aug 2, 2016 at 7:25 AM, Jungtaek Lim <[email protected]> wrote:
>>> >> > >
>>> >> > > > Hi dev,
>>> >> > > >
>>> >> > > > This is the proposal review thread for submitting a topology with
>>> >> > > > added jars and maven artifacts. It also follows up the discussion
>>> >> > > > thread for
>>> >> > > > [DISCUSSION]
>>> >> > > > Policy of resolving dependencies for non storm-core modules.[1]
>>> >> > > >
>>> >> > > > I've written a design doc which also describes the motivation for
>>> >> > > > this.
>>> >> > > >
>>> >> > > > https://cwiki.apache.org/confluence/display/STORM/A.+Design+doc%3A+adding+jars+and+maven+artifacts+at+submission
>>> >> > > >
>>> >> > > > Please review this and comment to "this thread" instead of wiki
>>> >> > > > page so that all devs can be notified for the update.
>>> >> > > >
>>> >> > > > Thanks,
>>> >> > > > Jungtaek Lim (HeartSaVioR)
>>> >> > > >
>>> >> > > > [1]
>>> >> > > > http://mail-archives.apache.org/mod_mbox/storm-dev/201607.mbox/%3CCAF5108jByyJLTKrV_P4fS=dj8rsr_o5oubzqbviscggsc1c...@mail.gmail.com%3E
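
P.S. To make the exclusion idea from the top of this mail a bit more concrete, here is a minimal sketch of how storm.py could parse a --packages value that carries exclusions. The `^` separator, the function names, and the tuple shapes are all my assumptions for illustration; nothing in the proposal defines them yet, and the parsed exclusions would still have to be handed to Aether during resolution.

```python
from typing import List, NamedTuple, Tuple


class Exclusion(NamedTuple):
    group_id: str
    artifact_id: str


class Package(NamedTuple):
    group_id: str
    artifact_id: str
    version: str
    exclusions: Tuple[Exclusion, ...]


def parse_package(spec: str) -> Package:
    """Parse 'group:artifact:version[^group:artifact...]'.

    The '^' exclusion separator is a hypothetical syntax, not an agreed one.
    """
    coordinate, *excluded = spec.strip().split("^")
    parts = coordinate.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected group:artifact:version, got %r" % coordinate)

    exclusions = []
    for item in excluded:
        ex_parts = item.split(":")
        if len(ex_parts) != 2 or not all(ex_parts):
            raise ValueError("expected group:artifact exclusion, got %r" % item)
        exclusions.append(Exclusion(ex_parts[0], ex_parts[1]))

    return Package(parts[0], parts[1], parts[2], tuple(exclusions))


def parse_packages(value: str) -> List[Package]:
    """Parse a comma-separated --packages value into Package tuples."""
    return [parse_package(spec) for spec in value.split(",") if spec.strip()]
```

With this, the Kafka case above could be written as `org.apache.kafka:kafka_2.10:0.9.0^org.slf4j:slf4j-log4j12`, which would drop one of the two conflicting SLF4J bindings. Which of the two jars to exclude is a separate decision; the sketch only shows how the intent could be carried on the command line.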
