Will that then require an API break down the line? Do we save that for Spark 4?
-Matt Cheah

From: Ryan Blue <rb...@netflix.com>
Reply-To: "rb...@netflix.com" <rb...@netflix.com>
Date: Tuesday, February 26, 2019 at 4:53 PM
To: Matt Cheah <mch...@palantir.com>
Cc: Sean Owen <sro...@apache.org>, Wenchen Fan <cloud0...@gmail.com>, Xiao Li <lix...@databricks.com>, Matei Zaharia <matei.zaha...@gmail.com>, Spark Dev List <dev@spark.apache.org>
Subject: Re: [DISCUSS] Spark 3.0 and DataSourceV2

That's a good question. While I'd love to have a solution for that, I don't think it is a good idea to delay DSv2 until we have one. That is going to require a lot of internal changes, and I don't see how we could make the release date if we include an InternalRow replacement.

On Tue, Feb 26, 2019 at 4:41 PM Matt Cheah <mch...@palantir.com> wrote:

Reynold made a note earlier about a proper Row API that isn’t InternalRow – is that still on the table?

-Matt Cheah

From: Ryan Blue <rb...@netflix.com>
Reply-To: "rb...@netflix.com" <rb...@netflix.com>
Date: Tuesday, February 26, 2019 at 4:40 PM
To: Matt Cheah <mch...@palantir.com>
Cc: Sean Owen <sro...@apache.org>, Wenchen Fan <cloud0...@gmail.com>, Xiao Li <lix...@databricks.com>, Matei Zaharia <matei.zaha...@gmail.com>, Spark Dev List <dev@spark.apache.org>
Subject: Re: [DISCUSS] Spark 3.0 and DataSourceV2

Thanks for bumping this, Matt. I think we can have the discussion here to clarify exactly what we’re committing to, and then start a vote thread once we agree.

Getting back to the DSv2 discussion, I think we have a good handle on what would be added:

· Plugin system for catalogs
· TableCatalog interface (I’ll start a vote thread for this SPIP shortly)
· TableCatalog implementation backed by SessionCatalog that can load v2 tables
· Resolution rule to load v2 tables using the new catalog
· CTAS logical and physical plan nodes
· Conversions from SQL parsed logical plans to v2 logical plans

Initially, this will always use the v2 catalog backed by SessionCatalog to avoid dependence on the multi-catalog work.
All of those are already implemented and working, so I think it is reasonable that we can get them in. Then we can consider a few stretch goals:

· Get in as much DDL as we can. I think create and drop table should be easy.
· Multi-catalog identifier parsing and multi-catalog support

If we get those last two in, it would be great. We can make the call closer to release time. Does anyone want to change this set of work?

On Tue, Feb 26, 2019 at 4:23 PM Matt Cheah <mch...@palantir.com> wrote:

What would then be the next steps we'd take to collectively decide on plans and timelines moving forward? Might I suggest scheduling a conference call with the appropriate PMC members to put our ideas together? Maybe such a discussion can take place at next week's meeting? Or do we need a separate, formal voting thread guided by the PMC?

My suggestion is to take concrete steps forward and avoid letting this slip through the cracks. I also think there would be merit in having a project plan, with estimates of how long each of the features we want to complete will take to implement and review.

-Matt Cheah

On 2/24/19, 3:05 PM, "Sean Owen" <sro...@apache.org> wrote:

Sure, I don't read anyone making these statements, though? Let's assume good intent and read "foo should happen" as "my opinion as a member of the community, which is not solely up to me, is that foo should happen". I understand it's possible for a person to make their opinion over-weighted; this whole style of decision-making assumes good actors and doesn't optimize against bad ones. Not that it can't happen, I'm just not seeing it here.

I have never seen any vote on a feature list, by a PMC or otherwise. We can do that if really needed, I guess. But that also isn't the authoritative process in play here.

If there's not a more specific subtext or issue here, which is fine to say (on private@ if it's sensitive or something), yes, let's move on in good faith.
On Sun, Feb 24, 2019 at 3:45 PM Mark Hamstra <m...@clearstorydata.com> wrote:

> There is nothing wrong with individuals advocating for what they think should or should not be in Spark 3.0, nor should anyone shy away from explaining why they think delaying the release for some reason is or isn't a good idea. What is a problem, or is at least something that I have a problem with, are declarative, pseudo-authoritative statements that 3.0 (or some other release) will or won't contain some feature, API, etc., or that some issue is or is not a blocker or worth delaying for. When the PMC has not voted on such issues, I'm often left thinking, "Wait... what? Who decided that, or where did that decision come from?"

-- 
Ryan Blue
Software Engineer
Netflix