I know we want to keep breaking changes to a minimum, but I'm hoping that with Spark 2.0 we can also look at better classpath isolation for user programs. I propose we build on spark.{driver|executor}.userClassPathFirst, setting it to true by default, and not allow any of Spark's transitive dependencies to leak into user code. For backwards compatibility we can have a whitelist if we want, but it'd be good if we started requiring user apps to explicitly pull in all their dependencies. From what I can tell, Hadoop 3 is also moving in this direction.
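For reference, both properties already exist in Spark 1.x as opt-in, experimental settings that default to false, so the proposal amounts to flipping that default. A minimal sketch of opting in today, assuming a cluster-wide conf/spark-defaults.conf (the same keys can also be passed per application with spark-submit --conf). Note that the driver setting only takes effect in cluster mode.

```
# conf/spark-defaults.conf
# Prefer classes from the user's jars over Spark's own copies.
# Both settings are experimental and default to false in Spark 1.x;
# the driver setting is only used in cluster mode.
spark.driver.userClassPathFirst    true
spark.executor.userClassPathFirst  true
```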
Kostas

On Thu, Nov 12, 2015 at 9:56 AM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

> With regards to machine learning, it would be great to move useful
> features from MLlib to ML and deprecate the former. The current structure
> of two separate machine learning packages seems somewhat confusing.
>
> With regards to GraphX, it would be great to deprecate the use of RDDs in
> GraphX and switch to DataFrames. This would allow GraphX to evolve with
> Tungsten.
>
> On that note of deprecating stuff, it might be good to deprecate some
> things in 2.0 without removing or replacing them immediately. That way 2.0
> doesn't have to wait for everything that we want to deprecate to be
> replaced all at once.
>
> Nick
>
> On Thu, Nov 12, 2015 at 12:45 PM Ulanov, Alexander <alexander.ula...@hpe.com> wrote:
>
>> Parameter Server is a new feature and thus does not match the goal of
>> 2.0, which is "to fix things that are broken in the current API and
>> remove certain deprecated APIs". At the same time, I would be happy to
>> have that feature.
>>
>> With regards to machine learning, it would be great to move useful
>> features from MLlib to ML and deprecate the former. The current structure
>> of two separate machine learning packages seems somewhat confusing.
>>
>> With regards to GraphX, it would be great to deprecate the use of RDDs in
>> GraphX and switch to DataFrames. This would allow GraphX to evolve with
>> Tungsten.
>>
>> Best regards, Alexander
>>
>> From: Nan Zhu [mailto:zhunanmcg...@gmail.com]
>> Sent: Thursday, November 12, 2015 7:28 AM
>> To: wi...@qq.com
>> Cc: dev@spark.apache.org
>> Subject: Re: A proposal for Spark 2.0
>>
>> Being specific to Parameter Server, I think the current agreement is that
>> PS should exist as a third-party library instead of a component of the
>> core code base, isn't it?
>>
>> Best,
>>
>> --
>> Nan Zhu
>> http://codingcat.me
>>
>> On Thursday, November 12, 2015 at 9:49 AM, wi...@qq.com wrote:
>>
>> Does anyone have plans for machine learning? Spark is missing some
>> features for machine learning, for example, a parameter server.
>>
>> On Nov 12, 2015, at 05:32, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>>
>> I like the idea of popping out Tachyon to an optional component too, to
>> reduce the number of dependencies. In the future, it might even be useful
>> to do this for Hadoop, but it requires too many API changes to be worth
>> doing now.
>>
>> Regarding Scala 2.12, we should definitely support it eventually, but I
>> don't think we need to block 2.0 on that because it can be added later
>> too. Has anyone investigated what it would take to run on there? I
>> imagine we don't need many code changes, just maybe some REPL stuff.
>>
>> Needless to say, but I'm all for the idea of making "major" releases as
>> undisruptive as possible in the model Reynold proposed. Keeping everyone
>> working with the same set of releases is super important.
>>
>> Matei
>>
>> On Nov 11, 2015, at 4:58 AM, Sean Owen <so...@cloudera.com> wrote:
>>
>>> On Wed, Nov 11, 2015 at 12:10 AM, Reynold Xin <r...@databricks.com> wrote:
>>>> to the Spark community. A major release should not be very different
>>>> from a minor release and should not be gated based on new features.
>>>> The main purpose of a major release is an opportunity to fix things
>>>> that are broken in the current API and remove certain deprecated APIs
>>>> (examples follow).
>>>
>>> Agree with this stance. Generally, a major release might also be a
>>> time to replace some big old API or implementation with a new one, but
>>> I don't see obvious candidates.
>>>
>>> I wouldn't mind turning attention to 2.x sooner rather than later,
>>> unless there's a fairly good reason to continue adding features in 1.x
>>> to a 1.7 release. The scope as of 1.6 is already pretty darned big.
>>>
>>>> 1. Scala 2.11 as the default build. We should still support Scala
>>>> 2.10, but it has been end-of-life.
>>>
>>> By the time 2.x rolls around, 2.12 will be the main version, 2.11 will
>>> be quite stable, and 2.10 will have been EOL for a while. I'd propose
>>> dropping 2.10. Otherwise it's supported for 2 more years.
>>>
>>>> 2. Remove Hadoop 1 support.
>>>
>>> I'd go further and drop support for <2.2 for sure (2.0 and 2.1 were
>>> sort of 'alpha' and 'beta' releases) and even <2.6.
>>>
>>> I'm sure we'll think of a number of other small things -- shading a
>>> bunch of stuff? Reviewing and updating dependencies in light of
>>> simpler, more recent dependencies to support from Hadoop etc.?
>>>
>>> Farming out Tachyon to a module? (I felt like someone proposed this?)
>>> Pop out any Docker stuff to another repo?
>>> Continue that same effort for EC2?
>>> Farming out some of the "external" integrations to another repo (?
>>> controversial)
>>>
>>> See also anything marked version "2+" in JIRA.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: dev-h...@spark.apache.org