I would personally love to see us provide a gentle migration path to Spark 3, especially if much of the work is already going to happen anyway.
Maybe giving it a different name (e.g. something like Spark-2-to-3-transitional) would make its intended purpose clearer and encourage folks to move to 3 when they can?

On Mon, Sep 23, 2019 at 9:17 AM Ryan Blue <rb...@netflix.com.invalid> wrote:

My understanding is that 3.0-preview is not going to be a production-ready release. For those of us that have been using backports of DSv2 in production, that doesn't help.

It also doesn't help as a stepping stone, because users would need to handle all of the incompatible changes in 3.0. Using 3.0-preview would mean an unstable release with breaking changes instead of a stable release without the breaking changes.

I'm offering to help build a stable release without breaking changes. But if there is no community interest in it, I'm happy to drop this.

On Sun, Sep 22, 2019 at 6:39 PM Hyukjin Kwon <gurwls...@gmail.com> wrote:

+1 for Matei's as well.

On Sun, 22 Sep 2019, 14:59 Marco Gaido <marcogaid...@gmail.com> wrote:

I agree with Matei too.

Thanks,
Marco

On Sun, Sep 22, 2019 at 03:44 Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

+1 for Matei's suggestion!

Bests,
Dongjoon.

On Sat, Sep 21, 2019 at 5:44 PM Matei Zaharia <matei.zaha...@gmail.com> wrote:

If the goal is to get people to try the DSv2 API and build DSv2 data sources, can we recommend the 3.0-preview release for this? That would get people shifting to 3.0 faster, which is probably better overall compared to maintaining two major versions. There's not that much else changing in 3.0 if you already want to update your Java version.

On Sep 21, 2019, at 2:45 PM, Ryan Blue <rb...@netflix.com.INVALID> wrote:

> If you insist we shouldn't change the unstable temporary API in 3.x . . .

Not what I'm saying at all.
I said we should carefully consider whether a breaking change is the right decision in the 3.x line.

All I'm suggesting is that we can make a 2.5 release with the feature and an API that is the same as the one in 3.0.

> I also don't get this backporting a giant feature to 2.x line

I am planning to do this so we can use DSv2 before 3.0 is released. Then we can have a source implementation that works in both 2.x and 3.0 to make the transition easier. Since I'm already doing the work, I'm offering to share it with the community.

On Sat, Sep 21, 2019 at 2:36 PM Reynold Xin <r...@databricks.com> wrote:

Because, for example, we'd need to move the location of InternalRow, breaking the package name. If you insist we shouldn't change the unstable temporary API in 3.x to maintain compatibility with 3.0, which is totally different from my understanding of the situation when you exposed it, then I'd say we should gate 3.0 on having a stable row interface.

I also don't get this backporting of a giant feature to the 2.x line. As suggested by others in the thread, DSv2 would be one of the main reasons people upgrade to 3.0. What's so special about DSv2 that we are doing this? Why not abandon 3.0 entirely and backport all the features to 2.x?

On Sat, Sep 21, 2019 at 2:31 PM Ryan Blue <rb...@netflix.com> wrote:

Why would that require an incompatible change?

We *could* make an incompatible change and remove support for InternalRow, but I think we would want to carefully consider whether that is the right decision. And in any case, we would be able to keep 2.5 and 3.0 compatible, which is the main goal.
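One way a single source implementation could tolerate the kind of package move Reynold describes (relocating a class like InternalRow between releases) is a small reflection shim that probes each known location at runtime. This is only a sketch of the idea; the two Spark package names in main() are hypothetical illustrations, not actual or planned Spark locations:

```java
import java.util.List;

// Minimal sketch of a cross-version class shim. The Spark package names used
// in main() are hypothetical examples, not real or planned Spark locations.
public final class ClassShim {

    // Returns the first class that loads from the candidate names, or null if none do.
    public static Class<?> loadFirstAvailable(List<String> candidates) {
        for (String name : candidates) {
            try {
                return Class.forName(name);
            } catch (ClassNotFoundException e) {
                // Not on this version's classpath; try the next candidate.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // A source built against two release lines would probe the locations it knows about.
        Class<?> rowClass = loadFirstAvailable(List.of(
                "org.apache.spark.sql.connector.InternalRow",   // hypothetical relocated name
                "org.apache.spark.sql.catalyst.InternalRow"));  // hypothetical original name
        System.out.println(rowClass == null ? "InternalRow not on classpath" : rowClass.getName());
    }
}
```

Shims like this add complexity and only help at the classloading layer, which is part of why keeping the API identical between a 2.5 and 3.0 release, as proposed above, is attractive.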
On Sat, Sep 21, 2019 at 2:28 PM Reynold Xin <r...@databricks.com> wrote:

How would you not make incompatible changes in 3.x? As discussed, the InternalRow API is not stable and needs to change.

On Sat, Sep 21, 2019 at 2:27 PM Ryan Blue <rb...@netflix.com> wrote:

> Making downstream implementations diverge heavily between minor versions (say, 2.4 vs. 2.5) wouldn't be a good experience

You're right that the API has been evolving in the 2.x line. But it is now reasonably stable with respect to the current feature set, and we should not need to break compatibility in the 3.x line. Because we have reached our goals for the 3.0 release, we can backport at least those features to 2.x and confidently have an API that works in a 2.x release and is compatible with 3.0, if not 3.1 and later releases as well.

> I'd rather say preparation of Spark 2.5 should be started after Spark 3.0 is officially released

The reason I'm suggesting this is that I'm already going to do the work to backport the 3.0 release features to 2.4. I've been asked by several people when DSv2 will be released, so I know there is a lot of interest in making this available sooner than 3.0. If I'm already doing the work, then I'd be happy to share that with the community.

I don't see why 2.5 and 3.0 are mutually exclusive. We can work on 2.5 while preparing the 3.0 preview and fixing bugs. For DSv2, the work is about complete, so we can easily release the same set of features and API in 2.5 and 3.0.
If we decide for some reason to wait until after 3.0 is released, I don't know that there is much value in a 2.5. The purpose is to be a step toward 3.0, and releasing that step after 3.0 doesn't seem helpful to me. It also wouldn't get these features out any sooner than 3.0, as a 2.5 release probably would, given the work needed to validate the incompatible changes in 3.0.

> the DSv2 change would be the major backward incompatibility that makes Spark 2.x users hesitate to upgrade

As I pointed out, DSv2 has been changing in the 2.x line, so this is expected. I don't think it will need incompatible changes in the 3.x line.

On Fri, Sep 20, 2019 at 9:25 PM Jungtaek Lim <kabh...@gmail.com> wrote:

Just 2 cents. I haven't tracked the changes to DSv2 (though I needed to deal with them, as the changes caused confusion on my PRs...), but my bet is that DSv2 has already changed in incompatible ways, at least for anyone who maintains a custom DataSource. Making downstream implementations diverge heavily between minor versions (say, 2.4 vs. 2.5) wouldn't be a good experience, especially since we haven't completely closed off the chance of further modifying DSv2, and those changes could be backward incompatible.

If we really want to bring the DSv2 changes to the 2.x version line so that end users aren't forced to upgrade to Spark 3.x to enjoy the new DSv2, I'd rather say preparation of Spark 2.5 should start after Spark 3.0 is officially released, honestly even later than that, say, after getting some reports from Spark 3.0 users about DSv2 so that we feel DSv2 is OK.
I hope we don't make Spark 2.5 a kind of "tech preview" that Spark 2.4 users may be frustrated to upgrade to as the next minor version.

Btw, do we have any specific target users for this? Personally, I'd expect the DSv2 change to be the major backward incompatibility that makes Spark 2.x users hesitate to upgrade, so anyone prepared to migrate to the new DSv2 might already be prepared to migrate to Spark 3.0.

On Sat, Sep 21, 2019 at 12:46 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

Do you mean you want to have a breaking API change between 3.0 and 3.1? I believe we follow Semantic Versioning (https://spark.apache.org/versioning-policy.html).

> We just won't add any breaking changes before 3.1.

Bests,
Dongjoon.

On Fri, Sep 20, 2019 at 11:48 AM Ryan Blue <rb...@netflix.com.invalid> wrote:

> I don't think we need to gate a 3.0 release on making a more stable version of InternalRow

Sounds like we agree, then. We will use it for 3.0, but there are known problems with it.

> Thinking we'd have dsv2 working in both 3.x (which will change and progress towards more stable, but will have to break certain APIs) and 2.x seems like a false premise.

Why do you think we will need to break certain APIs before 3.0?

I'm only suggesting that we release the same support in a 2.5 release that we do in 3.0. Since we are nearly finished with the 3.0 goals, it seems like we can certainly do that.
We just won't add any breaking changes before 3.1.

On Fri, Sep 20, 2019 at 11:39 AM Reynold Xin <r...@databricks.com> wrote:

I don't think we need to gate a 3.0 release on making a more stable version of InternalRow, but thinking we'd have dsv2 working in both 3.x (which will change and progress towards more stable, but will have to break certain APIs) and 2.x seems like a false premise.

To point out some problems with InternalRow that you think are already pragmatic and stable:

The class is in catalyst, which states:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/package.scala

  /**
   * Catalyst is a library for manipulating relational query plans. All classes in catalyst are
   * considered an internal API to Spark SQL and are subject to change between minor releases.
   */

There is not even an annotation on the interface.

The entire dependency chain was created to be private and tightly coupled with internal implementations. For example:

https://github.com/apache/spark/blob/master/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java

  /**
   * A UTF-8 String for internal Spark use.
   * <p>
   * A String encoded in UTF-8 as an Array[Byte], which can be used for comparison,
   * search, see http://en.wikipedia.org/wiki/UTF-8 for details.
   * <p>
   * Note: This is not designed for general use cases, should not be used outside SQL.
   */

https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala

(which again is in the catalyst package)

If you want to argue this way, you might as well argue we should make the entire catalyst package public to be pragmatic and not allow any changes.

On Fri, Sep 20, 2019 at 11:32 AM Ryan Blue <rb...@netflix.com> wrote:

> When you created the PR to make InternalRow public

This isn't quite accurate. The change I made was to use InternalRow instead of UnsafeRow, which is a specific implementation of InternalRow. Exposing this API has always been a part of DSv2, and while both you and I did some work to avoid this, we are still in the phase of starting with that API.

Note that any change to InternalRow would be very costly to implement because this interface is widely used. That is why I think we can certainly consider it stable enough to use here, and that's probably why UnsafeRow was part of the original proposal.

In any case, the goal for 3.0 was not to replace the use of InternalRow; it was to get the majority of SQL working on top of the interface added after 2.4. That's done and stable, so I think a 2.5 release with it is also reasonable.
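The distinction drawn above between InternalRow (the abstract interface) and UnsafeRow (one concrete implementation) can be illustrated with simplified stand-ins. These are not Spark's real classes, just a sketch of why a source written against the abstract type leaves Spark free to evolve or swap the concrete row implementations behind it:

```java
import java.util.Arrays;

// Simplified stand-in for an abstract row interface like Spark's InternalRow.
abstract class Row {
    abstract int numFields();
    abstract Object get(int ordinal);
}

// One concrete implementation; in Spark, UnsafeRow (a binary row format)
// plays this role. Other implementations can exist behind the same interface.
final class SimpleRow extends Row {
    private final Object[] values;
    SimpleRow(Object... values) { this.values = values; }
    int numFields() { return values.length; }
    Object get(int ordinal) { return values[ordinal]; }
}

public final class SourceDemo {
    // A data source written against the abstract Row type works with any implementation.
    static String describe(Row row) {
        Object[] copy = new Object[row.numFields()];
        for (int i = 0; i < copy.length; i++) copy[i] = row.get(i);
        return Arrays.toString(copy);
    }

    public static void main(String[] args) {
        System.out.println(describe(new SimpleRow(1, "a")));  // prints "[1, a]"
    }
}
```

Code written against the concrete type, by contrast, breaks whenever that one implementation changes, which is the cost of exposing UnsafeRow directly.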
On Fri, Sep 20, 2019 at 11:23 AM Reynold Xin <r...@databricks.com> wrote:

To push back: while I agree we should not drastically change InternalRow, there are a lot of changes that need to happen to make it stable. For example, none of the publicly exposed interfaces should be in the catalyst package or the unsafe package. External implementations should be decoupled from the internal implementations, with cheap ways to convert back and forth.

When you created the PR to make InternalRow public, the understanding was to work towards making it stable in the future, assuming we would start with an unstable API temporarily. You can't just make a bunch of internal APIs that are tightly coupled with other internal pieces public and stable and call it a day, just because they happen to satisfy some use cases temporarily, assuming the rest of Spark doesn't change.

On Fri, Sep 20, 2019 at 11:19 AM Ryan Blue <rb...@netflix.com> wrote:

> DSv2 is far from stable right?

No, I think it is reasonably stable and very close to being ready for a release.

> All the actual data types are unstable and you guys have completely ignored that.

I think what you're referring to is the use of `InternalRow`. That's a stable API, and there has been no work to avoid using it.
In any case, I don't think that anyone is suggesting that we delay 3.0 until a replacement for `InternalRow` is added, right?

While I understand the motivation for a better solution here, I think the pragmatic solution is to continue using `InternalRow`.

> If the goal is to make DSv2 work across 3.x and 2.x, that seems too invasive of a change to backport once you consider the parts needed to make dsv2 stable.

I believe that those of us working on DSv2 are confident about its current stability. We set goals for what to get into the 3.0 release months ago and have very nearly reached the point where we are ready for that release.

I don't think instability would be a problem in maintaining compatibility between the 2.5 version and the 3.0 version. If we find that we need to make API changes (other than additions), then we can make those in the 3.1 release. Because the goals we set for the 3.0 release have been reached with the current API, if we are ready to release 3.0, we can release a 2.5 with the same API.

On Fri, Sep 20, 2019 at 11:05 AM Reynold Xin <r...@databricks.com> wrote:

DSv2 is far from stable, right? All the actual data types are unstable and you guys have completely ignored that. We'd need to work on that, and that will be a breaking change.
If the goal is to make DSv2 work across 3.x and 2.x, that seems too invasive of a change to backport once you consider the parts needed to make dsv2 stable.

On Fri, Sep 20, 2019 at 10:47 AM Ryan Blue <rb...@netflix.com.invalid> wrote:

Hi everyone,

In the DSv2 sync this week, we talked about a possible Spark 2.5 release based on the latest Spark 2.4, but with DSv2 and Java 11 support added.

A Spark 2.5 release with these two additions would help people migrate to Spark 3.0 when it is released, because they would be able to use a single implementation for DSv2 sources that works in both 2.5 and 3.0. Similarly, upgrading to 3.0 wouldn't also require updating to Java 11, because users could update to Java 11 with the 2.5 release and have fewer major changes at once.

Another reason to consider a 2.5 release is that many people are interested in a release with the latest DSv2 API and support for DSv2 SQL. I'm already going to be backporting DSv2 support to the Spark 2.4 line, so it makes sense to share this work with the community.

This release line would just consist of backports, like DSv2 and Java 11, that assist compatibility, to keep the scope of the release small.
The purpose is to assist people moving to 3.0 and not distract from the 3.0 release.

Would a Spark 2.5 release help anyone else? Are there any concerns about this plan?

rb

--
Ryan Blue
Software Engineer
Netflix

--
Name : Jungtaek Lim
Blog : http://medium.com/@heartsavior
Twitter : http://twitter.com/heartsavior
LinkedIn : http://www.linkedin.com/in/heartsavior

--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau