Re: SparkGraph review process

kant kodali Thu, 13 Feb 2020 23:13:28 -0800

Any update on this? Is SparkGraph going to make it into Spark or not?

On Mon, Oct 14, 2019 at 12:26 PM Holden Karau <hol...@pigscanfly.ca> wrote:


> Maybe let’s ask the folks from Lightbend who helped with the previous
> Scala upgrade for their thoughts?
>
> On Mon, Oct 14, 2019 at 8:24 PM Xiao Li <gatorsm...@gmail.com> wrote:
>
>>> 1. On the technical side, my main concern is the runtime dependency on
>>> org.opencypher:okapi-shade. okapi depends on several Scala libraries. We
>>> came up with the solution of shading a few Scala libraries to avoid
>>> pollution. However, I'm not super confident that the approach is
>>> sustainable, for two reasons: a) there are no proper shading libraries
>>> for Scala, b) we will have to wait for upgrades from those Scala libraries
>>> before we can upgrade Spark to use a newer Scala version. So it would be
>>> great if some Scala experts could help review the current implementation and
>>> help assess the risk.
>>
>>
>> This concern is valid. I think we should start the vote to ensure that the
>> whole community is aware of the risk and takes responsibility for
>> maintaining this in the long term.
>>
>> Cheers,
>>
>> Xiao
>>
>>
>> On Fri, Oct 4, 2019 at 12:27 PM Xiangrui Meng <men...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I want to clarify my role first to avoid misunderstanding. I'm an
>>> individual contributor here. My work on the graph SPIP, as well as on other
>>> Spark features I contributed to, is not associated with my employer. It
>>> has become quite challenging for me to keep track of the graph SPIP work
>>> because I have less time available at home.
>>>
>>> In retrospect, we should have involved more Spark devs and committers
>>> early on so that there would be no single point of failure, i.e., me.
>>> Hopefully it is not too late to fix this. I summarize my thoughts here to
>>> help onboard other reviewers:
>>>
>>> 1. On the technical side, my main concern is the runtime dependency on
>>> org.opencypher:okapi-shade. okapi depends on several Scala libraries. We
>>> came up with the solution of shading a few Scala libraries to avoid
>>> pollution. However, I'm not super confident that the approach is
>>> sustainable, for two reasons: a) there are no proper shading libraries
>>> for Scala, b) we will have to wait for upgrades from those Scala libraries
>>> before we can upgrade Spark to use a newer Scala version. So it would be
>>> great if some Scala experts could help review the current implementation and
>>> help assess the risk.
>>>
>>> 2. Overloading helper methods. MLlib used to have several overloaded
>>> helper methods for each algorithm, which later became a major maintenance
>>> burden. Builders and setters/getters are more maintainable. I will comment
>>> again on the PR.
>>>
>>> 3. The proposed API partitions a graph into sub-graphs, as described in
>>> the property graph model. It is unclear to me how this would affect query
>>> performance, because it requires the SQL optimizer to correctly recognize
>>> data from the same source and make execution efficient.
>>>
>>> 4. The feature, although originally targeted for Spark 3.0, should not
>>> be a Spark 3.0 release blocker because it doesn't require breaking changes.
>>> If we miss the code freeze deadline, we can introduce a build flag to
>>> exclude the module from the official release/distribution, and then make it
>>> default once the module is ready.
>>>
>>> 5. If, unfortunately, we still don't see sufficient committer reviews, I
>>> think the best option would be to submit the work to the Apache Incubator
>>> instead to unblock it. But maybe it is too early to discuss this
>>> option.
>>>
>>> It would be great if other committers can offer help on the review!
>>> Really appreciated!
>>>
>>> Best,
>>> Xiangrui
>>>
>>> On Fri, Oct 4, 2019 at 1:32 AM Mats Rydberg <m...@neo4j.org.invalid>
>>> wrote:
>>>
>>>> Hello dear Spark community
>>>>
>>>> We are the developers behind the SparkGraph SPIP, which is a project
>>>> created out of our work on openCypher Morpheus (
>>>> https://github.com/opencypher/morpheus). This year we have
>>>> collaborated mainly with Xiangrui Meng of Databricks to define and develop
>>>> a new SparkGraph module based on our experience from working on Morpheus.
>>>> Morpheus - formerly known as "Cypher for Apache Spark" - has been in
>>>> development for over 3 years and has matured in its API and implementation.
>>>>
>>>> The SPIP work has been on hold for a period of time now, as priorities
>>>> at Databricks have changed which has occupied Xiangrui's time (as well as
>>>> other happenings). As you may know, the latest API PR (
>>>> https://github.com/apache/spark/pull/24851) is blocking us from moving
>>>> forward with the implementation.
>>>>
>>>> In an attempt to not lose track of this project we now reach out to you
>>>> to ask whether there are any Spark committers in the community who would be
>>>> prepared to commit to helping us review and merge our code contributions to
>>>> Apache Spark? We are not asking for lots of direct development support, as
>>>> we believe we have had the implementation more or less complete since
>>>> early this year. There is a proof-of-concept PR (
>>>> https://github.com/apache/spark/pull/24297) which contains the
>>>> functionality.
>>>>
>>>> If you could offer such aid it would be greatly appreciated. None of us
>>>> are Spark committers, which is hindering our ability to deliver this
>>>> project in time for Spark 3.0.
>>>>
>>>> Sincerely
>>>> the Neo4j Graph Analytics team
>>>> Mats, Martin, Max, Sören, Jonatan
>>>>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
