Re: [VOTE] Release Apache Spark 2.4.1 (RC2)

2019-03-05 Thread Saisai Shao
Hi DB,

I saw that we already have 6 RCs, but the latest vote I can find is for RC2.
Were they all canceled?

Thanks
Saisai

DB Tsai wrote on Fri, Feb 22, 2019 at 4:51 AM:

> I am cutting a new rc4 with the fix from Felix. Thanks.
>
> Sincerely,
>
> DB Tsai
> --
> Web: https://www.dbtsai.com
> PGP Key ID: 0359BC9965359766
>
> On Thu, Feb 21, 2019 at 8:57 AM Felix Cheung 
> wrote:
> >
> > I merged the fix to 2.4.
> >
> >
> > 
> > From: Felix Cheung 
> > Sent: Wednesday, February 20, 2019 9:34 PM
> > To: DB Tsai; Spark dev list
> > Cc: Cesar Delgado
> > Subject: Re: [VOTE] Release Apache Spark 2.4.1 (RC2)
> >
> > Could you hold for a bit - I have one more fix to get in
> >
> >
> > 
> > From: d_t...@apple.com on behalf of DB Tsai 
> > Sent: Wednesday, February 20, 2019 12:25 PM
> > To: Spark dev list
> > Cc: Cesar Delgado
> > Subject: Re: [VOTE] Release Apache Spark 2.4.1 (RC2)
> >
> > Okay. Let's fail rc2, and I'll prepare rc3 with SPARK-26859.
> >
> > DB Tsai | Siri Open Source Technologies [not a contribution] |  Apple,
> Inc
> >
> > > On Feb 20, 2019, at 12:11 PM, Marcelo Vanzin
>  wrote:
> > >
> > > Just wanted to point out that
> > > https://issues.apache.org/jira/browse/SPARK-26859 is not in this RC,
> > > and is marked as a correctness bug. (The fix is in the 2.4 branch,
> > > just not in rc2.)
> > >
> > > On Wed, Feb 20, 2019 at 12:07 PM DB Tsai 
> wrote:
> > >>
> > >> Please vote on releasing the following candidate as Apache Spark
> version 2.4.1.
> > >>
> > >> The vote is open until Feb 24 PST and passes if a majority +1 PMC
> votes are cast, with
> > >> a minimum of 3 +1 votes.
> > >>
> > >> [ ] +1 Release this package as Apache Spark 2.4.1
> > >> [ ] -1 Do not release this package because ...
> > >>
> > >> To learn more about Apache Spark, please see http://spark.apache.org/
> > >>
> > >> The tag to be voted on is v2.4.1-rc2 (commit
> 229ad524cfd3f74dd7aa5fc9ba841ae223caa960):
> > >> https://github.com/apache/spark/tree/v2.4.1-rc2
> > >>
> > >> The release files, including signatures, digests, etc. can be found
> at:
> > >> https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc2-bin/
> > >>
> > >> Signatures used for Spark RCs can be found in this file:
> > >> https://dist.apache.org/repos/dist/dev/spark/KEYS
> > >>
> > >> The staging repository for this release can be found at:
> > >>
> https://repository.apache.org/content/repositories/orgapachespark-1299/
> > >>
> > >> The documentation corresponding to this release can be found at:
> > >> https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc2-docs/
> > >>
> > >> The list of bug fixes going into 2.4.1 can be found at the following
> URL:
> > >> https://issues.apache.org/jira/projects/SPARK/versions/2.4.1
> > >>
> > >> FAQ
> > >>
> > >> =
> > >> How can I help test this release?
> > >> =
> > >>
> > >> If you are a Spark user, you can help us test this release by taking
> > >> an existing Spark workload and running it on this release candidate, then
> > >> reporting any regressions.
> > >>
> > >> If you're working in PySpark you can set up a virtual env and install
> > >> the current RC and see if anything important breaks. In Java/Scala,
> > >> you can add the staging repository to your project's resolvers and test
> > >> with the RC (make sure to clean up the artifact cache before/after so
> > >> you don't end up building with an out-of-date RC going forward).
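
A minimal sbt sketch of the Java/Scala path above: it points a build at the
staging repository from this vote and pulls in the RC artifacts. The module
(spark-sql) and Scala version are assumptions for illustration; use whatever
your project actually depends on.

// build.sbt sketch -- assumptions: an sbt project that only needs spark-sql,
// built with Scala 2.11 (the default for the 2.4.x line).
scalaVersion := "2.11.12"

// Staging repository from this release candidate (see the link above).
resolvers += ("Spark 2.4.1 RC staging" at
  "https://repository.apache.org/content/repositories/orgapachespark-1299/")

// The RC artifacts are published under the final version number, 2.4.1.
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.1" % "provided"

// As the email notes: clear the artifact cache (e.g. ~/.ivy2/cache/org.apache.spark)
// before and after testing so you don't keep resolving a stale RC later.
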
> > >>
> > >> ===
> > >> What should happen to JIRA tickets still targeting 2.4.1?
> > >> ===
> > >>
> > >> The current list of open tickets targeted at 2.4.1 can be found at:
> > >> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 2.4.1
> > >>
> > >> Committers should look at those and triage. Extremely important bug
> > >> fixes, documentation, and API tweaks that impact compatibility should
> > >> be worked on immediately. Everything else please retarget to an
> > >> appropriate release.
> > >>
> > >> ==
> > >> But my bug isn't fixed?
> > >> ==
> > >>
> > >> In order to make timely releases, we will typically not hold the
> > >> release unless the bug in question is a regression from the previous
> > >> release. That being said, if there is something which is a regression
> > >> that has not been correctly targeted please ping me or a committer to
> > >> help target the issue.
> > >>
> > >>
> > >> DB Tsai | Siri Open Source Technologies [not a contribution] | 
> Apple, Inc
> > >>
> > >>
> > >> -
> > >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> > >>
> > >
> > >
> > > --
> > > Marcelo
> > >
> > > -
> > > To unsubscribe e-mail: dev-unsubscr...@sp

Re: DataSourceV2 sync notes - 20 Feb 2019

2019-03-05 Thread Stavros Kontopoulos
Thanks for the update. Is this meeting open for other people to join?

Stavros

On Thu, Feb 21, 2019 at 10:56 PM Ryan Blue 
wrote:

> Here are my notes from the DSv2 sync last night. As always, if you have
> corrections, please reply with them. And if you’d like to be included on
> the invite to participate in the next sync (6 March), send me an email.
>
> Here’s a quick summary of the topics where we had consensus last night:
>
>- The behavior of v1 sources needs to be documented to come up with a
>migration plan
>- Spark 3.0 should include DSv2, even if it would delay the release
>(pending community discussion and vote)
>- Design for the v2 Catalog plugin system
>- V2 catalog approach of separate TableCatalog, FunctionCatalog, and
>ViewCatalog interfaces
>    - Common v2 Table metadata should be schema, partitioning, and
>    string-map of properties; leaving out sorting for now. (Ready to vote on
>    metadata SPIP.) [Ed: an illustrative sketch follows this summary.]
>
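
An illustrative sketch of the catalog decisions above: separate, narrowly-scoped
catalog interfaces plus the agreed table metadata (schema, partitioning, and a
string-map of properties). All names and signatures here are simplified
stand-ins invented for this sketch, not the interfaces actually proposed in the
SPIP documents.

// Illustrative stand-ins only -- not Spark's actual v2 catalog API.
import java.util.{Map => JMap}

// A simplified identifier. The multi-catalog proposal also covers path-based
// tables; isPath below is an assumption standing in for that idea.
case class Identifier(namespace: Seq[String], name: String) {
  def isPath: Boolean = name.contains("/")
}

case class StructType(fields: Seq[(String, String)]) // (column name, type) pairs
sealed trait Transform                               // a partitioning expression
case class IdentityTransform(column: String) extends Transform

// Table metadata as agreed in the sync: schema, partitioning, and a
// string-map of properties (sort order intentionally left out for now).
trait Table {
  def name: String
  def schema: StructType
  def partitioning: Seq[Transform]
  def properties: JMap[String, String]
}

// Catalog functionality split into narrow interfaces, so an implementation
// can support only the pieces it needs.
trait CatalogPlugin {
  def initialize(name: String, options: Map[String, String]): Unit
}

trait TableCatalog extends CatalogPlugin {
  def tableExists(ident: Identifier): Boolean
  def loadTable(ident: Identifier): Table
  def createTable(
      ident: Identifier,
      schema: StructType,
      partitions: Seq[Transform],
      properties: JMap[String, String]): Table
  def dropTable(ident: Identifier): Boolean
}

trait FunctionCatalog extends CatalogPlugin // function lookup methods omitted
trait ViewCatalog extends CatalogPlugin     // view lookup methods omitted
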
> *Topics*:
>
>- Issues raised by ORC v2 commit
>- Migration to v2 sources
>- Roadmap and current blockers
>- Catalog plugin system
>- Catalog API separate interfaces approach
>- Catalog API metadata (schema, partitioning, and properties)
>- Public catalog API proposal
>
> *Notes*:
>
>- Issues raised by ORC v2 commit
>   - Ryan: Disabled change to use v2 by default in PR for overwrite
>   plans: tests rely on CTAS, which is not implemented in v2.
>   - Wenchen: suggested using a StagedTable to work around not having
>   a CTAS finished. TableProvider could create a staged table.
>   - Ryan: Using StagedTable doesn’t make sense to me. It was intended
>   to solve a different problem (atomicity). Adding an interface to create a
>   staged table either requires the same metadata as CTAS or requires a blank
>   staged table, which isn’t the same concept: these staged tables would
>   behave entirely differently than the ones for atomic operations. Better to
>   spend time getting CTAS done and work through the long-term plan than to
>   hack around it.
>   - Second issue raised by the ORC work: how to support tables that
>   use different validations.
>   - Ryan: What Gengliang’s PRs are missing is a clear definition of
>   what tables require different validation and what that validation should
>   be. In some cases, CTAS is validated against existing data [Ed: this is
>   PreprocessTableCreation] and in some cases, Append has no validation
>   because the table doesn’t exist. What isn’t clear is when these validations
>   are applied.
>   - Ryan: Without knowing exactly how v1 works, we can’t mirror that
>   behavior in v2. Building a way to turn off validation is going to be
>   needed, but is insufficient without knowing when to apply it.
>   - Ryan: We also don’t know if it will make sense to maintain all of
>   these rules to mimic v1 behavior. In v1, CTAS and Append can both write to
>   existing tables, but use different rules to validate. What are the
>   differences between them? It is unlikely that Spark will support both as
>   options, if that is even possible. [Ed: see later discussion on migration
>   that continues this.]
>   - Gengliang: Using SaveMode is an option.
>   - Ryan: Using SaveMode only appears to fix this, but doesn’t
>   actually test v2. Using SaveMode appears to work because it disables all
>   validation and uses code from v1 that will “create” tables by writing. But
>   this isn’t helpful for the v2 goal of having defined and reliable behavior.
>   - Gengliang: SaveMode is not correctly translated. Append could
>   mean AppendData or CTAS.
>   - Ryan: This is why we need to focus on finishing the v2 plans: so
>   we can correctly translate the SaveMode into the right plan. That depends
>   on having a catalog for CTAS and on being able to check whether a table
>   exists. [Ed: an illustrative sketch of this translation follows these notes.]
>   - Wenchen: Catalog doesn’t support path tables, so how does this
>   help?
>   - Ryan: The multi-catalog identifiers proposal includes a way to
>   pass paths as CatalogIdentifiers. [Ed: see PathIdentifier]. This allows a
>   catalog implementation to handle path-based tables. The identifier will
>   also have a method to test whether the identifier is a path identifier and
>   catalogs are not required to support path identifiers.
>- Migration to v2 sources
>   - Hyukjin: Once the ORC upgrade is done how will we move from v1 to
>   v2?
>   - Ryan: We will need to develop v1 and v2 in parallel. There are
>   many code paths in v1 and we don’t know exactly what they do. We first need
>   to know what they do and make a migration plan after that.
>   - Hyukjin: What if there are many behavior differences? Will this
>   require an API to opt in for each one?
>   - Ryan: Without knowing how v1 behaves
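
An illustrative sketch of the SaveMode translation discussed in the notes above
(Append may mean AppendData or CTAS, and resolving it requires a catalog that
can check whether a table exists). The plan and mode types are simplified
stand-ins for this sketch, not Spark's actual logical plans or API.

object SaveModeTranslation {
  // Simplified stand-ins for the v2 write plans discussed above.
  sealed trait V2WritePlan
  case class AppendData(table: String) extends V2WritePlan
  case class CreateTableAsSelect(table: String) extends V2WritePlan

  sealed trait SaveMode
  case object Append extends SaveMode
  case object ErrorIfExists extends SaveMode

  // Translating a SaveMode into a v2 plan only works once a catalog can answer
  // "does this table exist?" -- the dependency called out in the notes.
  def resolveWrite(mode: SaveMode, table: String,
      tableExists: String => Boolean): V2WritePlan = mode match {
    case Append if tableExists(table) => AppendData(table)          // plain append
    case Append                       => CreateTableAsSelect(table) // acts as CTAS
    case ErrorIfExists if !tableExists(table) => CreateTableAsSelect(table)
    case ErrorIfExists =>
      throw new IllegalArgumentException(s"Table already exists: $table")
  }
}

// Example: SaveModeTranslation.resolveWrite(SaveModeTranslation.Append,
//   "db.events", t => t == "db.events") returns AppendData("db.events");
// with a catalog that reports no such table, the same call returns
// CreateTableAsSelect("db.events").
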

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-05 Thread Tom Graves
So to me most of the questions here are implementation/design questions. I've
had this issue in the past with SPIPs, where I expected to have more high-level
design details but was basically told that belongs in the design JIRA follow-on.
This makes me think we need to revisit what a SPIP really needs to contain,
which should be done in a separate thread. Note that personally I would be for
having more high-level details in it. But the way I read our documentation on a
SPIP right now, that detail is all optional; maybe we could argue it's based on
what reviewers request, but perhaps we should make that wording more of a
requirement. Thoughts? We should probably separate that discussion if people
want to talk about it.
For this SPIP in particular, the reason I +1'd it is that it came down to 2
questions:
1) Do I think Spark should support this? -> My answer is yes. I think this would
improve Spark; users have been requesting both better GPU support and support
for controlling container requests at a finer granularity for a while. If
Spark doesn't support this then users may go to something else, so I think
we should support it.
2) Do I think it's possible to design and implement it without causing large
instabilities? My opinion here again is yes. I agree with Imran and others
that the scheduler piece needs to be looked at very closely, as we have had a
lot of issues there, and that is why I was asking for more details in the design
JIRA: https://issues.apache.org/jira/browse/SPARK-27005. But I do believe it's
possible to do.
If others have reservations on similar questions then I think we should resolve
them here, or take the discussion of what a SPIP is to a different thread and
then come back to this. Thoughts?
Note there is already a high-level design for at least the core piece, which is
what people seem concerned with, so including it in the SPIP should be
straightforward.
Tom
On Monday, March 4, 2019, 2:52:43 PM CST, Imran Rashid 
 wrote:  
 
 On Sun, Mar 3, 2019 at 6:51 PM Xiangrui Meng  wrote:

On Sun, Mar 3, 2019 at 10:20 AM Felix Cheung  wrote:
IMO upfront allocation is less useful. Specifically too expensive for large 
jobs.

This is also an API/design discussion.

I agree with Felix -- this is more than just an API question.  It has a huge 
impact on the complexity of what you're proposing.  You might be proposing big 
changes to a core and brittle part of Spark, which is already short of experts.
I don't see any value in having a vote on "does feature X sound cool?"  We have 
to evaluate the potential benefit against the risks the feature brings and the 
continued maintenance cost.  We don't need super low-level details, but we have 
to have a sketch of the design to be able to make that tradeoff.

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-05 Thread Imran Rashid
OK, I suppose we are getting bogged down in what a vote on a SPIP
means anyway, which I guess we can set aside for now.  With the level
of detail in this proposal, I feel like there is a reasonable chance I'd
still -1 the design or implementation.

And the other thing you're implicitly asking the community for is to
prioritize this feature for continued review and maintenance.  There is
already work to be done in things like making barrier mode support dynamic
allocation (SPARK-24942), bugs in failure handling (eg. SPARK-25250), and
general efficiency of failure handling (eg. SPARK-25341, SPARK-20178).  I'm
very concerned about getting spread too thin.

But if this is really just a vote on (1) is better GPU support important
for Spark, in some form, in some release? and (2) is it *possible* to do
this in a safe way?  then I will vote +0.

On Tue, Mar 5, 2019 at 8:25 AM Tom Graves  wrote:

> So to me most of the questions here are implementation/design questions,
> I've had this issue in the past with SPIPs where I expected to have more
> high level design details but was basically told that belongs in the design
> jira follow on. This makes me think we need to revisit what a SPIP really
> needs to contain, which should be done in a separate thread.  Note
> personally I would be for having more high level details in it.
> But the way I read our documentation on a SPIP right now that detail is
> all optional, now maybe we could argue it's based on what reviewers request,
> but really perhaps we should make the wording of that more required.
>  thoughts?  We should probably separate that discussion if people want to
> talk about that.
>
> For this SPIP in particular the reason I +1 it is because it came down to
> 2 questions:
>
> 1) do I think spark should support this -> my answer is yes, I think this
> would improve spark, users have been requesting both better GPU support
> and support for controlling container requests at a finer granularity for a
> while.  If spark doesn't support this then users may go to something else,
> so I think we should support it
>
> 2) do I think it's possible to design and implement it without causing
> large instabilities?   My opinion here again is yes. I agree with Imran and
> others that the scheduler piece needs to be looked at very closely as we
> have had a lot of issues there and that is why I was asking for more
> details in the design jira:
> https://issues.apache.org/jira/browse/SPARK-27005.  But I do believe it's
> possible to do.
>
> If others have reservations on similar questions then I think we should
> resolve here or take the discussion of what a SPIP is to a different thread
> and then come back to this, thoughts?
>
> Note there is a high level design for at least the core piece, which is
> what people seem concerned with, already so including it in the SPIP should
> be straightforward.
>
> Tom
>
> On Monday, March 4, 2019, 2:52:43 PM CST, Imran Rashid <
> im...@therashids.com> wrote:
>
>
> On Sun, Mar 3, 2019 at 6:51 PM Xiangrui Meng  wrote:
>
> On Sun, Mar 3, 2019 at 10:20 AM Felix Cheung 
> wrote:
>
> IMO upfront allocation is less useful. Specifically too expensive for
> large jobs.
>
>
> This is also an API/design discussion.
>
>
> I agree with Felix -- this is more than just an API question.  It has a
> huge impact on the complexity of what you're proposing.  You might be
> proposing big changes to a core and brittle part of spark, which is already
> short of experts.
>
> I don't see any value in having a vote on "does feature X sound cool?"  We
> have to evaluate the potential benefit against the risks the feature brings
> and the continued maintenance cost.  We don't need super low-level details,
> but we have to have a sketch of the design to be able to make that tradeoff.
>


Re: DataSourceV2 sync notes - 20 Feb 2019

2019-03-05 Thread Ryan Blue
Everyone is welcome to join this discussion. Just send me an e-mail to get
added to the invite.

Stavros, I'll add you.

rb

On Tue, Mar 5, 2019 at 5:43 AM Stavros Kontopoulos <
stavros.kontopou...@lightbend.com> wrote:

> Thanks for the update, is this meeting open for other people to join?
>
> Stavros
>
> On Thu, Feb 21, 2019 at 10:56 PM Ryan Blue 
> wrote:
>
>> Here are my notes from the DSv2 sync last night. As always, if you have
>> corrections, please reply with them. And if you’d like to be included on
>> the invite to participate in the next sync (6 March), send me an email.
>>
>> Here’s a quick summary of the topics where we had consensus last night:
>>
>>- The behavior of v1 sources needs to be documented to come up with a
>>migration plan
>>- Spark 3.0 should include DSv2, even if it would delay the release
>>(pending community discussion and vote)
>>- Design for the v2 Catalog plugin system
>>- V2 catalog approach of separate TableCatalog, FunctionCatalog, and
>>ViewCatalog interfaces
>>- Common v2 Table metadata should be schema, partitioning, and
>>string-map of properties; leaving out sorting for now. (Ready to vote on
>>metadata SPIP.)
>>
>> *Topics*:
>>
>>- Issues raised by ORC v2 commit
>>- Migration to v2 sources
>>- Roadmap and current blockers
>>- Catalog plugin system
>>- Catalog API separate interfaces approach
>>- Catalog API metadata (schema, partitioning, and properties)
>>- Public catalog API proposal
>>
>> *Notes*:
>>
>>- Issues raised by ORC v2 commit
>>   - Ryan: Disabled change to use v2 by default in PR for overwrite
>>   plans: tests rely on CTAS, which is not implemented in v2.
>>   - Wenchen: suggested using a StagedTable to work around not having
>>   a CTAS finished. TableProvider could create a staged table.
>>   - Ryan: Using StagedTable doesn’t make sense to me. It was
>>   intended to solve a different problem (atomicity). Adding an interface 
>> to
>>   create a staged table either requires the same metadata as CTAS or 
>> requires
>>   a blank staged table, which isn’t the same concept: these staged tables
>>   would behave entirely differently than the ones for atomic operations.
>>   Better to spend time getting CTAS done and work through the long-term 
>> plan
>>   than to hack around it.
>>   - Second issue raised by the ORC work: how to support tables that
>>   use different validations.
>>   - Ryan: What Gengliang’s PRs are missing is a clear definition of
>>   what tables require different validation and what that validation 
>> should
>>   be. In some cases, CTAS is validated against existing data [Ed: this is
>>   PreprocessTableCreation] and in some cases, Append has no validation
>>   because the table doesn’t exist. What isn’t clear is when these 
>> validations
>>   are applied.
>>   - Ryan: Without knowing exactly how v1 works, we can’t mirror that
>>   behavior in v2. Building a way to turn off validation is going to be
>>   needed, but is insufficient without knowing when to apply it.
>>   - Ryan: We also don’t know if it will make sense to maintain all
>>   of these rules to mimic v1 behavior. In v1, CTAS and Append can both 
>> write
>>   to existing tables, but use different rules to validate. What are the
>>   differences between them? It is unlikely that Spark will support both 
>> as
>>   options, if that is even possible. [Ed: see later discussion on 
>> migration
>>   that continues this.]
>>   - Gengliang: Using SaveMode is an option.
>>   - Ryan: Using SaveMode only appears to fix this, but doesn’t
>>   actually test v2. Using SaveMode appears to work because it disables 
>> all
>>   validation and uses code from v1 that will “create” tables by writing. 
>> But
>>   this isn’t helpful for the v2 goal of having defined and reliable 
>> behavior.
>>   - Gengliang: SaveMode is not correctly translated. Append could
>>   mean AppendData or CTAS.
>>   - Ryan: This is why we need to focus on finishing the v2 plans: so
>>   we can correctly translate the SaveMode into the right plan. That 
>> depends
>>   on having a catalog for CTAS and to check the existence of a table.
>>   - Wenchen: Catalog doesn’t support path tables, so how does this
>>   help?
>>   - Ryan: The multi-catalog identifiers proposal includes a way to
>>   pass paths as CatalogIdentifiers. [Ed: see PathIdentifier]. This 
>> allows a
>>   catalog implementation to handle path-based tables. The identifier will
>>   also have a method to test whether the identifier is a path identifier 
>> and
>>   catalogs are not required to support path identifiers.
>>- Migration to v2 sources
>>   - Hyukjin: Once the ORC upgrade is done how will we move from v1
>>   to v2?
>>   - Ryan: We will need to develop v1 and v2 in paralle

Re: DataSourceV2 sync notes - 20 Feb 2019

2019-03-05 Thread Stavros Kontopoulos
Thanks Ryan!

On Tue, Mar 5, 2019 at 7:19 PM Ryan Blue  wrote:

> Everyone is welcome to join this discussion. Just send me an e-mail to get
> added to the invite.
>
> Stavros, I'll add you.
>
> rb
>
> On Tue, Mar 5, 2019 at 5:43 AM Stavros Kontopoulos <
> stavros.kontopou...@lightbend.com> wrote:
>
>> Thanks for the update, is this meeting open for other people to join?
>>
>> Stavros
>>
>> On Thu, Feb 21, 2019 at 10:56 PM Ryan Blue 
>> wrote:
>>
>>> Here are my notes from the DSv2 sync last night. As always, if you have
>>> corrections, please reply with them. And if you’d like to be included on
>>> the invite to participate in the next sync (6 March), send me an email.
>>>
>>> Here’s a quick summary of the topics where we had consensus last night:
>>>
>>>- The behavior of v1 sources needs to be documented to come up with
>>>a migration plan
>>>- Spark 3.0 should include DSv2, even if it would delay the release
>>>(pending community discussion and vote)
>>>- Design for the v2 Catalog plugin system
>>>- V2 catalog approach of separate TableCatalog, FunctionCatalog, and
>>>ViewCatalog interfaces
>>>- Common v2 Table metadata should be schema, partitioning, and
>>>string-map of properties; leaving out sorting for now. (Ready to vote on
>>>metadata SPIP.)
>>>
>>> *Topics*:
>>>
>>>- Issues raised by ORC v2 commit
>>>- Migration to v2 sources
>>>- Roadmap and current blockers
>>>- Catalog plugin system
>>>- Catalog API separate interfaces approach
>>>- Catalog API metadata (schema, partitioning, and properties)
>>>- Public catalog API proposal
>>>
>>> *Notes*:
>>>
>>>- Issues raised by ORC v2 commit
>>>   - Ryan: Disabled change to use v2 by default in PR for overwrite
>>>   plans: tests rely on CTAS, which is not implemented in v2.
>>>   - Wenchen: suggested using a StagedTable to work around not
>>>   having a CTAS finished. TableProvider could create a staged table.
>>>   - Ryan: Using StagedTable doesn’t make sense to me. It was
>>>   intended to solve a different problem (atomicity). Adding an 
>>> interface to
>>>   create a staged table either requires the same metadata as CTAS or 
>>> requires
>>>   a blank staged table, which isn’t the same concept: these staged 
>>> tables
>>>   would behave entirely differently than the ones for atomic operations.
>>>   Better to spend time getting CTAS done and work through the long-term 
>>> plan
>>>   than to hack around it.
>>>   - Second issue raised by the ORC work: how to support tables that
>>>   use different validations.
>>>   - Ryan: What Gengliang’s PRs are missing is a clear definition of
>>>   what tables require different validation and what that validation 
>>> should
>>>   be. In some cases, CTAS is validated against existing data [Ed: this 
>>> is
>>>   PreprocessTableCreation] and in some cases, Append has no validation
>>>   because the table doesn’t exist. What isn’t clear is when these 
>>> validations
>>>   are applied.
>>>   - Ryan: Without knowing exactly how v1 works, we can’t mirror
>>>   that behavior in v2. Building a way to turn off validation is going 
>>> to be
>>>   needed, but is insufficient without knowing when to apply it.
>>>   - Ryan: We also don’t know if it will make sense to maintain all
>>>   of these rules to mimic v1 behavior. In v1, CTAS and Append can both 
>>> write
>>>   to existing tables, but use different rules to validate. What are the
>>>   differences between them? It is unlikely that Spark will support both 
>>> as
>>>   options, if that is even possible. [Ed: see later discussion on 
>>> migration
>>>   that continues this.]
>>>   - Gengliang: Using SaveMode is an option.
>>>   - Ryan: Using SaveMode only appears to fix this, but doesn’t
>>>   actually test v2. Using SaveMode appears to work because it disables 
>>> all
>>>   validation and uses code from v1 that will “create” tables by 
>>> writing. But
>>>   this isn’t helpful for the v2 goal of having defined and reliable 
>>> behavior.
>>>   - Gengliang: SaveMode is not correctly translated. Append could
>>>   mean AppendData or CTAS.
>>>   - Ryan: This is why we need to focus on finishing the v2 plans:
>>>   so we can correctly translate the SaveMode into the right plan. That
>>>   depends on having a catalog for CTAS and to check the existence of a 
>>> table.
>>>   - Wenchen: Catalog doesn’t support path tables, so how does this
>>>   help?
>>>   - Ryan: The multi-catalog identifiers proposal includes a way to
>>>   pass paths as CatalogIdentifiers. [Ed: see PathIdentifier]. This 
>>> allows a
>>>   catalog implementation to handle path-based tables. The identifier 
>>> will
>>>   also have a method to test whether the identifier is a path 
>>> identifier and
>>>   catalogs are not required to support path 

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-05 Thread Xiangrui Meng
How about letting Xingbo make a major revision to the SPIP doc to make it
clear what is proposed? I like Felix's suggestion to switch to the new
Heilmeier template, which helps clarify what is proposed and what is not.
Then let's review the new SPIP and resume the vote.

On Tue, Mar 5, 2019 at 7:54 AM Imran Rashid  wrote:

> OK, I suppose we are getting bogged down in what a vote on a SPIP
> means anyway, which I guess we can set aside for now.  With the level
> of detail in this proposal, I feel like there is a reasonable chance I'd
> still -1 the design or implementation.
>
> And the other thing you're implicitly asking the community for is to
> prioritize this feature for continued review and maintenance.  There is
> already work to be done in things like making barrier mode support dynamic
> allocation (SPARK-24942), bugs in failure handling (eg. SPARK-25250), and
> general efficiency of failure handling (eg. SPARK-25341, SPARK-20178).  I'm
> very concerned about getting spread too thin.
>

> But if this is really just a vote on (1) is better gpu support important
> for spark, in some form, in some release? and (2) is it *possible* to do
> this in a safe way?  then I will vote +0.
>
> On Tue, Mar 5, 2019 at 8:25 AM Tom Graves  wrote:
>
>> So to me most of the questions here are implementation/design questions,
>> I've had this issue in the past with SPIPs where I expected to have more
>> high level design details but was basically told that belongs in the design
>> jira follow on. This makes me think we need to revisit what a SPIP really
>> needs to contain, which should be done in a separate thread.  Note
>> personally I would be for having more high level details in it.
>> But the way I read our documentation on a SPIP right now that detail is
>> all optional, now maybe we could argue it's based on what reviewers request,
>> but really perhaps we should make the wording of that more required.
>>  thoughts?  We should probably separate that discussion if people want to
>> talk about that.
>>
>> For this SPIP in particular the reason I +1 it is because it came down to
>> 2 questions:
>>
>> 1) do I think spark should support this -> my answer is yes, I think this
>> would improve spark, users have been requesting both better GPU support
>> and support for controlling container requests at a finer granularity for a
>> while.  If spark doesn't support this then users may go to something else,
>> so I think we should support it
>>
>> 2) do I think it's possible to design and implement it without causing
>> large instabilities?   My opinion here again is yes. I agree with Imran and
>> others that the scheduler piece needs to be looked at very closely as we
>> have had a lot of issues there and that is why I was asking for more
>> details in the design jira:
>> https://issues.apache.org/jira/browse/SPARK-27005.  But I do believe it's
>> possible to do.
>>
>> If others have reservations on similar questions then I think we should
>> resolve here or take the discussion of what a SPIP is to a different thread
>> and then come back to this, thoughts?
>>
>> Note there is a high level design for at least the core piece, which is
>> what people seem concerned with, already so including it in the SPIP should
>> be straightforward.
>>
>> Tom
>>
>> On Monday, March 4, 2019, 2:52:43 PM CST, Imran Rashid <
>> im...@therashids.com> wrote:
>>
>>
>> On Sun, Mar 3, 2019 at 6:51 PM Xiangrui Meng  wrote:
>>
>> On Sun, Mar 3, 2019 at 10:20 AM Felix Cheung 
>> wrote:
>>
>> IMO upfront allocation is less useful. Specifically too expensive for
>> large jobs.
>>
>>
>> This is also an API/design discussion.
>>
>>
>> I agree with Felix -- this is more than just an API question.  It has a
>> huge impact on the complexity of what you're proposing.  You might be
>> proposing big changes to a core and brittle part of spark, which is already
>> short of experts.
>>
>> I don't see any value in having a vote on "does feature X sound cool?"
>> We have to evaluate the potential benefit against the risks the feature
>> brings and the continued maintenance cost.  We don't need super low-level
>> details, but we have to have a sketch of the design to be able to make that
>> tradeoff.
>>
>