Reviewers for Stage level Scheduling prs

2020-01-08 Thread Tom Graves
Hey everyone,
I'm trying to get reviewers for the Stage Level Scheduling pull requests. I was 
hoping to get this into Spark 3.0. The code is mostly complete - it's just 
missing the web UI and final doc changes.
If anyone has time, reviews from committers would be appreciated.
You can find information about the overall feature and design 
here: [SPARK-27495] SPIP: Support Stage level resource configuration and 
scheduling (ASF JIRA). It has the "Stage Level Scheduling SPIP Appendices 
API/Design" document attached, with a high-level overview.

I have a reference PR with most of the code implemented: 
[WIP][SPARK-27495][Core][YARN][k8s] Stage Level Scheduling code for reference 
by tgravescs · Pull Request #27053 · apache/spark

and I've been trying to break that into smaller pieces for easier review - this 
is the current one: [SPARK-29306][CORE] Stage Level Sched: Executors need to 
track what ResourceProfile they are created with by tgravescs · Pull Request 
#26682 · apache/spark
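
For readers new to the feature, here is a minimal sketch of the usage it is meant to enable, assuming a builder-style API along the lines of ResourceProfileBuilder / ExecutorResourceRequests / TaskResourceRequests and RDD.withResources; the names, paths, and numbers here are illustrative, not final:

```scala
import org.apache.spark.resource.{ExecutorResourceRequests, ResourceProfileBuilder, TaskResourceRequests}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("stage-level-sketch").getOrCreate()
val sc = spark.sparkContext

// Stage 1: ordinary ETL on the default resource profile (input path is a placeholder).
val preprocessed = sc.textFile("hdfs:///input").map(_.trim)

// Executor-side requirements for the next stage: more memory and GPUs per executor.
val execReqs = new ExecutorResourceRequests().cores(4).memory("6g").resource("gpu", 2)

// Task-side requirements: one GPU per task.
val taskReqs = new TaskResourceRequests().cpus(1).resource("gpu", 1.0)

val profile = new ResourceProfileBuilder().require(execReqs).require(taskReqs).build

// Stage 2: only the work after the shuffle runs with the custom ResourceProfile.
val scored = preprocessed
  .repartition(8)
  .withResources(profile)
  .mapPartitions(iter => iter.map(_.length))   // stand-in for the GPU work

scored.count()
```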

Regards,
Tom Graves


Re: [SPARK-30319][SQL] Add a stricter version of as[T]

2020-01-08 Thread Enrico Minack
Yes, as[T] is lazy like any transformation, but only in terms of data 
processing, not in terms of the schema. You seem to imply that as[T] is lazy 
with respect to the schema as well, and I do not know of any other 
transformation that behaves like that.


Your proposed solution works because the map transformation returns the 
right schema, even though it is also a lazy transformation. The as[T] should 
behave like this too.
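
To make the behaviour concrete, here is a small spark-shell-style illustration (Person and the column names are invented for the example):

```scala
import org.apache.spark.sql.SparkSession

case class Person(id: Long, name: String)

val spark = SparkSession.builder().master("local[*]").appName("asT-schema").getOrCreate()
import spark.implicits._

// A DataFrame carrying one column more than Person declares.
val df = Seq((1L, "alice", "extra")).toDF("id", "name", "junk")

// as[Person] keeps the extra column in the schema ...
df.as[Person].printSchema()                  // id, name, junk

// ... while the map(identity) workaround yields only Person's columns,
// at the cost of decoding every row into a Person and re-encoding it.
df.as[Person].map(identity).printSchema()    // id, name
```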


The map transformation is a quick fix in terms of code length, but it 
materializes the data as instances of T, which introduces a prohibitive 
deserialization / serialization round trip for no good reason.


I think returning the right schema does not need to touch any data and 
should be as lightweight as a projection.
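
A sketch of what such a projection-only fix can look like from user code today; strictAs is a name invented for this sketch, and it uses the encoder only to read T's schema, never to decode rows:

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Encoder}
import org.apache.spark.sql.functions.col

// Projection-only "strict as[T]": select exactly T's columns, then call as[T].
// Unlike the map(identity) workaround, no row is ever decoded into a T.
def strictAs[T : Encoder](df: DataFrame): Dataset[T] = {
  val tColumns = implicitly[Encoder[T]].schema.fieldNames.map(col)
  df.select(tColumns: _*).as[T]
}

// strictAs[Person](df).printSchema()   // id, name -- no extra columns, no round trip
```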


Enrico


On 07.01.20 at 10:13, Wenchen Fan wrote:
I think it's simply because as[T] is lazy. You will see the right 
schema if you do `df.as[T].map(identity)`.




On Tue, Jan 7, 2020 at 4:42 PM Enrico Minack wrote:


Hi Devs,

I'd like to propose a stricter version of as[T]. Given the
interface def as[T]: Dataset[T], it is counter-intuitive that
the schema of the returned Dataset[T] depends on the
schema of the originating Dataset. The schema should always be
derived only from T.

I am proposing a stricter version so that user code does not need
to pair every .as[T] with a select(schemaOfT.fields.map(f => col(f.name)):
_*) whenever it expects the Dataset[T] to really contain only
the columns of T.

https://github.com/apache/spark/pull/26969

Regards,
Enrico