Maybe let's ask the folks from Lightbend who helped with the previous Scala
upgrade for their thoughts?
On Mon, Oct 14, 2019 at 8:24 PM Xiao Li <gatorsm...@gmail.com> wrote:

>> 1. On the technical side, my main concern is the runtime dependency on
>> org.opencypher:okapi-shade. okapi depends on several Scala libraries. We
>> came up with the solution of shading a few Scala libraries to avoid
>> pollution. However, I'm not super confident that the approach is
>> sustainable, for two reasons: a) there exists no proper shading library
>> for Scala, and b) we will have to wait for upgrades from those Scala
>> libraries before we can upgrade Spark to use a newer Scala version. So
>> it would be great if some Scala experts could help review the current
>> implementation and help assess the risk.
>
> This concern is valid. I think we should start the vote to ensure the
> whole community is aware of the risk and takes responsibility for
> maintaining this in the long term.
>
> Cheers,
>
> Xiao
>
> On Fri, Oct 4, 2019 at 12:27 PM, Xiangrui Meng <men...@gmail.com> wrote:
>
>> Hi all,
>>
>> I want to clarify my role first to avoid misunderstanding. I'm an
>> individual contributor here. My work on the graph SPIP, as well as on
>> other Spark features I contributed to, is not associated with my
>> employer. It became quite challenging for me to keep track of the graph
>> SPIP work due to having less available time.
>>
>> In retrospect, we should have involved more Spark devs and committers
>> early on so that there is no single point of failure, i.e., me.
>> Hopefully it is not too late to fix that. I summarize my thoughts here
>> to help onboard other reviewers:
>>
>> 1. On the technical side, my main concern is the runtime dependency on
>> org.opencypher:okapi-shade. okapi depends on several Scala libraries. We
>> came up with the solution of shading a few Scala libraries to avoid
>> pollution. However, I'm not super confident that the approach is
>> sustainable, for two reasons: a) there exists no proper shading library
>> for Scala, and b) we will have to wait for upgrades from those Scala
>> libraries before we can upgrade Spark to use a newer Scala version. So
>> it would be great if some Scala experts could help review the current
>> implementation and help assess the risk. (A sketch of what such shading
>> looks like appears after this message.)
>>
>> 2. Overloaded helper methods. MLlib used to have several overloaded
>> helper methods for each algorithm, which later became a major
>> maintenance burden. Builders and setters/getters are more maintainable
>> (illustrated below). I will comment again on the PR.
>>
>> 3. The proposed API partitions the graph into sub-graphs, as described
>> in the property graph model. It is unclear to me how this would affect
>> query performance, because it requires the SQL optimizer to correctly
>> recognize data from the same source and make execution efficient (see
>> the DataFrame example below).
>>
>> 4. The feature, although originally targeted for Spark 3.0, should not
>> be a Spark 3.0 release blocker because it doesn't require breaking
>> changes. If we miss the code freeze deadline, we can introduce a build
>> flag to exclude the module from the official release/distribution, and
>> then include it by default once the module is ready.
>>
>> 5. If, unfortunately, we still don't see sufficient committer reviews,
>> I think the best option would be submitting the work to the Apache
>> Incubator instead to unblock it. But maybe it is too early to discuss
>> this option.
>>
>> It would be great if other committers could offer help with the review!
>> It would be much appreciated!
>>
>> Best,
>> Xiangrui
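For context on point 1: shading rewrites a bundled dependency's packages
into a private namespace so the bundled classes cannot clash with a
user-provided version on the classpath. Below is a minimal sketch of what
this looks like with sbt-assembly's ShadeRule; the library names and the
shaded namespace are illustrative assumptions, not the actual okapi-shade
configuration.

```scala
// build.sbt -- minimal shading sketch using sbt-assembly. The renamed
// packages and the target namespace are assumptions for illustration only,
// not the real okapi-shade setup.
assembly / assemblyShadeRules := Seq(
  // Rewrite bytecode references so the bundled copies live in a private
  // namespace and cannot collide with user-provided versions.
  ShadeRule.rename("cats.**" -> "org.opencypher.okapi.shaded.cats.@1").inAll,
  ShadeRule.rename("upickle.**" -> "org.opencypher.okapi.shaded.upickle.@1").inAll
)
```

One caveat behind "there exists no proper shading library for Scala":
renamers of this kind were designed for Java bytecode, and Scala artifacts
additionally carry pickled signature metadata that plain class-file
renaming does not rewrite, which is part of why shading Scala libraries is
considered fragile.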
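On point 2, a hypothetical sketch of the contrast (all names here are
invented for illustration and are not the proposed API): each optional
parameter multiplies the number of overloads, while a setter-based class
keeps a single entry point that can grow without breaking callers.

```scala
// Hypothetical API sketch -- names invented for illustration.
// Overloaded helpers multiply with every optional parameter:
//   def pageRank(graph: PropertyGraph): Unit
//   def pageRank(graph: PropertyGraph, maxIter: Int): Unit
//   def pageRank(graph: PropertyGraph, maxIter: Int, tol: Double): Unit
trait PropertyGraph // stand-in for the proposed graph type

// The setter style MLlib settled on: one entry point, and new parameters
// can be added later without breaking existing callers.
class PageRank {
  private var maxIter: Int = 20
  private var tol: Double = 1e-6

  def setMaxIter(value: Int): this.type = { maxIter = value; this }
  def setTol(value: Double): this.type = { tol = value; this }

  def run(graph: PropertyGraph): Unit = {
    // algorithm elided; the sketch only demonstrates the API shape
  }
}
```

Usage would look like `new PageRank().setMaxIter(50).setTol(1e-8).run(graph)`.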
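To make point 3 concrete: in the property graph model, each node label and
relationship type is effectively backed by its own DataFrame, so a single
Cypher pattern lowers to joins across several frames. The sketch below
uses plain Spark SQL (schemas invented for illustration) to show roughly
what a pattern like MATCH (a:Person)-[:KNOWS]->(b:Person) lowers to, and
why an efficient plan depends on the optimizer recognizing shared sources.

```scala
import org.apache.spark.sql.SparkSession

object SubGraphJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    // Invented schemas: one DataFrame per node label / relationship type.
    val persons = Seq((1L, "Alice"), (2L, "Bob")).toDF("id", "name") // :Person
    val knows   = Seq((1L, 2L)).toDF("src", "dst")                   // :KNOWS

    // MATCH (a:Person)-[:KNOWS]->(b:Person) RETURN a.name, b.name
    // becomes two joins against the same node frame; the optimizer must
    // recognize that both join inputs come from the same source.
    val result = knows
      .join(persons.as("a"), $"src" === $"a.id")
      .join(persons.as("b"), $"dst" === $"b.id")
      .select($"a.name", $"b.name")

    result.show()
    spark.stop()
  }
}
```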
>> On Fri, Oct 4, 2019 at 1:32 AM Mats Rydberg <m...@neo4j.org.invalid>
>> wrote:
>>
>>> Hello dear Spark community,
>>>
>>> We are the developers behind the SparkGraph SPIP, a project created
>>> out of our work on openCypher Morpheus
>>> (https://github.com/opencypher/morpheus). During this year we have
>>> collaborated mainly with Xiangrui Meng of Databricks to define and
>>> develop a new SparkGraph module based on our experience from working
>>> on Morpheus. Morpheus - formerly known as "Cypher for Apache Spark" -
>>> has been in development for over 3 years and has matured in its API
>>> and implementation.
>>>
>>> The SPIP work has been on hold for some time now, as priorities at
>>> Databricks have changed, which has occupied Xiangrui's time (among
>>> other things). As you may know, the latest API PR
>>> (https://github.com/apache/spark/pull/24851) is blocking us from
>>> moving forward with the implementation.
>>>
>>> In an attempt not to lose track of this project, we are now reaching
>>> out to ask whether any Spark committers in the community would be
>>> prepared to help us review and merge our code contributions to Apache
>>> Spark. We are not asking for much direct development support; we
>>> believe the implementation has been more or less complete since early
>>> this year. There is a proof-of-concept PR
>>> (https://github.com/apache/spark/pull/24297) which contains the
>>> functionality.
>>>
>>> If you could offer such aid it would be greatly appreciated. None of
>>> us are Spark committers, which is hindering our ability to deliver
>>> this project in time for Spark 3.0.
>>>
>>> Sincerely,
>>> the Neo4j Graph Analytics team
>>> Mats, Martin, Max, Sören, Jonatan

--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau