Re: [DISCUSS] Spark 3.0 and DataSourceV2

Matei Zaharia Tue, 26 Feb 2019 18:31:23 -0800

To add to this, we can add a stable interface at any time if the original one 
was marked as unstable; we wouldn’t have to wait until 4.0. For example, we had 
a lot of APIs that were experimental in 2.0 and then got stabilized in later 
2.x releases.


Matei

> On Feb 26, 2019, at 5:12 PM, Reynold Xin <r...@databricks.com> wrote:
> 
> We will have to fix that before we declare DSv2 stable, because 
> InternalRow is not a stable API. We don’t necessarily need to do it in 3.0. 
> 
> On Tue, Feb 26, 2019 at 5:10 PM Matt Cheah <mch...@palantir.com> wrote:
> Will that then require an API break down the line? Do we save that for Spark 
> 4?
> 
> 
>  
> 
> -Matt Cheah
> 
>  
> 
> From: Ryan Blue <rb...@netflix.com>
> Reply-To: "rb...@netflix.com" <rb...@netflix.com>
> Date: Tuesday, February 26, 2019 at 4:53 PM
> To: Matt Cheah <mch...@palantir.com>
> Cc: Sean Owen <sro...@apache.org>, Wenchen Fan <cloud0...@gmail.com>, Xiao Li 
> <lix...@databricks.com>, Matei Zaharia <matei.zaha...@gmail.com>, Spark Dev 
> List <dev@spark.apache.org>
> Subject: Re: [DISCUSS] Spark 3.0 and DataSourceV2
> 
>  
> 
> That's a good question.
> 
>  
> 
> While I'd love to have a solution for that, I don't think it is a good idea 
> to delay DSv2 until we have one. That is going to require a lot of internal 
> changes and I don't see how we could make the release date if we are 
> including an InternalRow replacement.
> 
>  
> 
> On Tue, Feb 26, 2019 at 4:41 PM Matt Cheah <mch...@palantir.com> wrote:
> 
> Reynold made a note earlier about a proper Row API that isn’t InternalRow – 
> is that still on the table?
> 
>  
> 
> -Matt Cheah
> 
>  
> 
> From: Ryan Blue <rb...@netflix.com>
> Reply-To: "rb...@netflix.com" <rb...@netflix.com>
> Date: Tuesday, February 26, 2019 at 4:40 PM
> To: Matt Cheah <mch...@palantir.com>
> Cc: Sean Owen <sro...@apache.org>, Wenchen Fan <cloud0...@gmail.com>, Xiao Li 
> <lix...@databricks.com>, Matei Zaharia <matei.zaha...@gmail.com>, Spark Dev 
> List <dev@spark.apache.org>
> Subject: Re: [DISCUSS] Spark 3.0 and DataSourceV2
> 
>  
> 
> Thanks for bumping this, Matt. I think we can have the discussion here to 
> clarify exactly what we’re committing to and then have a vote thread once 
> we’re agreed.
> Getting back to the DSv2 discussion, I think we have a good handle on what 
> would be added:
> · Plugin system for catalogs
> · TableCatalog interface (I’ll start a vote thread for this SPIP shortly)
> · TableCatalog implementation backed by SessionCatalog that can load v2 tables
> · Resolution rule to load v2 tables using the new catalog
> · CTAS logical and physical plan nodes
> · Conversions from SQL parsed logical plans to v2 logical plans
> 
> Initially, this will always use the v2 catalog backed by SessionCatalog to 
> avoid dependence on the multi-catalog work. All of those are already 
> implemented and working, so I think it is reasonable that we can get them in.
> Then we can consider a few stretch goals:
> · Get in as much DDL as we can. I think create and drop table should be easy.
> · Multi-catalog identifier parsing and multi-catalog support
> 
> If we get those last two in, it would be great. We can make the call closer 
> to release time. Does anyone want to change this set of work?
>  
> 
> On Tue, Feb 26, 2019 at 4:23 PM Matt Cheah <mch...@palantir.com> wrote:
> 
> What would then be the next steps we'd take to collectively decide on plans 
> and timelines moving forward? Might I suggest scheduling a conference call 
> with appropriate PMCs to put our ideas together? Maybe such a discussion can 
> take place at next week's meeting? Or do we need to have a separate 
> formalized voting thread which is guided by a PMC?
> 
> My suggestion is to try to make concrete steps forward and to avoid letting 
> this slip through the cracks.
> 
> I also think there would be merits to having a project plan and estimates 
> around how long each of the features we want to complete is going to take to 
> implement and review.
> 
> -Matt Cheah
> 
> On 2/24/19, 3:05 PM, "Sean Owen" <sro...@apache.org> wrote:
> 
>     Sure, I don't read anyone making these statements though? Let's assume
>     good intent, and read "foo should happen" as "my opinion as a member of
>     the community, which is not solely up to me, is that foo should
>     happen". I understand it's possible for a person to make their opinion
>     over-weighted; this whole style of decision making assumes good actors
>     and doesn't optimize against bad ones. Not that it can't happen, I'm just
>     not seeing it here.
> 
>     I have never seen any vote on a feature list, by a PMC or otherwise.
>     We can do that if really needed I guess. But that also isn't the
>     authoritative process in play here, in contrast.
> 
>     If there's not a more specific subtext or issue here, which is fine to
>     say (on private@ if it's sensitive or something), yes, let's move on
>     in good faith.
> 
>     On Sun, Feb 24, 2019 at 3:45 PM Mark Hamstra <m...@clearstorydata.com> wrote:
>     > There is nothing wrong with individuals advocating for what they think
>     > should or should not be in Spark 3.0, nor should anyone shy away from
>     > explaining why they think delaying the release for some reason is or isn't a
>     > good idea. What is a problem, or is at least something that I have a problem
>     > with, are declarative, pseudo-authoritative statements that 3.0 (or some
>     > other release) will or won't contain some feature, API, etc. or that some
>     > issue is or is not a blocker or worth delaying for. When the PMC has not voted
>     > on such issues, I'm often left thinking, "Wait... what? Who decided that, or
>     > where did that decision come from?"
> 
> 
> 
>  
> 
> --
> 
> Ryan Blue
> 
> Software Engineer
> 
> Netflix
> 
> 

