> think. I'm still not convinced there is a burning need to use Java 11
> but stay on 2.4, after 3.0 is out, and at least the wheels are in
> motion there. Java 8 is still free and being updated.
>
> On Fri, Sep 20, 2019 at 12:48 PM Ryan Blue
> wrote:
> >
> > H
> On Fri, Sep 20, 2019 at 10:47 AM, Ryan Blue
> wrote:
>
>> Hi everyone,
>>
>> In the DSv2 sync this week, we talked about a possible Spark 2.5 release
>> based on the latest Spark 2.4, but with DSv2 and Java 11 support added.
>>
>> A Spark 2.5
>
>> > Here, I am proposing to cut the branch on October 15th. If the features
>> are targeting to 3.0 preview release, please prioritize the work and finish
>> it before the date. Note, Oct. 15th is not the code freeze of Spark 3.0.
>> That means, the community will still work on the features for the upcoming
>> Spark 3.0 release, even if they are not included in the preview release.
>> The goal of preview release is to collect more feedback from the community
>> regarding the new 3.0 features/behavior changes.
>> >
>> > Thanks!
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>
--
Ryan Blue
Software Engineer
Netflix
, to keep the scope of the release small. The
purpose is to assist people moving to 3.0 and not distract from the 3.0
release.
Would a Spark 2.5 release help anyone else? Are there any concerns about
this plan?
rb
--
Ryan Blue
Software Engineer
Netflix
Here are my notes from this week’s DSv2 sync.
*Attendees*:
Ryan Blue
Holden Karau
Russell Spitzer
Terry Kim
Wenchen Fan
Shiv Prashant Sood
Joseph Torres
Gengliang Wang
Matt Cheah
Burak Yavuz
*Topics*:
- Driver-side Hadoop conf
- SHOW DATABASES/NAMESPACES behavior
- Review outstanding
>>> >>>>>>> SPARK-24941 Add RDDBarrier.coalesce() function
>>> >>>>>>> SPARK-25017 Add test suite for ContextBarrierState
>>> >>>>>>> SPARK-25083 remove the type erasure hack in data source scan
>>> >>>>>>> SPARK-25383 Image data source supports sample pushdown
>>> >>>>>>> SPARK-27272 Enable blacklisting of node/executor on fetch
>>> failures by default
>>> >>>>>>> SPARK-27296 User Defined Aggregating Functions (UDAFs) have a
>>> major
>>> >>>>>>> efficiency problem
>>> >>>>>>> SPARK-25128 multiple simultaneous job submissions against k8s
>>> backend
>>> >>>>>>> cause driver pods to hang
>>> >>>>>>> SPARK-26731 remove EOLed spark jobs from jenkins
>>> >>>>>>> SPARK-26664 Make DecimalType's minimum adjusted scale
>>> configurable
>>> >>>>>>> SPARK-21559 Remove Mesos fine-grained mode
>>> >>>>>>> SPARK-24942 Improve cluster resource management with jobs
>>> containing
>>> >>>>>>> barrier stage
>>> >>>>>>> SPARK-25914 Separate projection from grouping and aggregate in
>>> logical Aggregate
>>> >>>>>>> SPARK-26022 PySpark Comparison with Pandas
>>> >>>>>>> SPARK-20964 Make some keywords reserved along with the ANSI/SQL
>>> standard
>>> >>>>>>> SPARK-26221 Improve Spark SQL instrumentation and metrics
>>> >>>>>>> SPARK-26425 Add more constraint checks in file streaming source
>>> to
>>> >>>>>>> avoid checkpoint corruption
>>> >>>>>>> SPARK-25843 Redesign rangeBetween API
>>> >>>>>>> SPARK-25841 Redesign window function rangeBetween API
>>> >>>>>>> SPARK-25752 Add trait to easily whitelist logical operators that
>>> >>>>>>> produce named output from CleanupAliases
>>> >>>>>>> SPARK-23210 Introduce the concept of default value to schema
>>> >>>>>>> SPARK-25640 Clarify/Improve EvalType for grouped aggregate and
>>> window aggregate
>>> >>>>>>> SPARK-25531 new write APIs for data source v2
>>> >>>>>>> SPARK-25547 Pluggable jdbc connection factory
>>> >>>>>>> SPARK-20845 Support specification of column names in INSERT INTO
>>> >>>>>>> SPARK-24417 Build and Run Spark on JDK11
>>> >>>>>>> SPARK-24724 Discuss necessary info and access in barrier mode +
>>> Kubernetes
>>> >>>>>>> SPARK-24725 Discuss necessary info and access in barrier mode +
>>> Mesos
>>> >>>>>>> SPARK-25074 Implement maxNumConcurrentTasks() in
>>> >>>>>>> MesosFineGrainedSchedulerBackend
>>> >>>>>>> SPARK-23710 Upgrade the built-in Hive to 2.3.5 for hadoop-3.2
>>> >>>>>>> SPARK-25186 Stabilize Data Source V2 API
>>> >>>>>>> SPARK-25376 Scenarios we should handle but missed in 2.4 for
>>> barrier
>>> >>>>>>> execution mode
>>> >>>>>>> SPARK-25390 data source V2 API refactoring
>>> >>>>>>> SPARK-7768 Make user-defined type (UDT) API public
>>> >>>>>>> SPARK-14922 Alter Table Drop Partition Using Predicate-based
>>> Partition Spec
>>> >>>>>>> SPARK-15691 Refactor and improve Hive support
>>> >>>>>>> SPARK-15694 Implement ScriptTransformation in sql/core
>>> >>>>>>> SPARK-16217 Support SELECT INTO statement
>>> >>>>>>> SPARK-16452 basic INFORMATION_SCHEMA support
>>> >>>>>>> SPARK-18134 SQL: MapType in Group BY and Joins not working
>>> >>>>>>> SPARK-18245 Improving support for bucketed table
>>> >>>>>>> SPARK-19842 Informational Referential Integrity Constraints
>>> Support in Spark
>>> >>>>>>> SPARK-22231 Support of map, filter, withColumn, dropColumn in
>>> nested
>>> >>>>>>> list of structures
>>> >>>>>>> SPARK-22632 Fix the behavior of timestamp values for R's
>>> DataFrame to
>>> >>>>>>> respect session timezone
>>> >>>>>>> SPARK-22386 Data Source V2 improvements
>>> >>>>>>> SPARK-24723 Discuss necessary info and access in barrier mode +
>>> YARN
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>
>>> >>>>
>>> >>>> --
>>> >>>> Name : Jungtaek Lim
>>> >>>> Blog : http://medium.com/@heartsavior
>>> >>>> Twitter : http://twitter.com/heartsavior
>>> >>>> LinkedIn : http://www.linkedin.com/in/heartsavior
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> John Zhuge
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Twitter: https://twitter.com/holdenkarau
>>> >> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9
>>> >> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>> >
>>> >
>>>
>>>
>>>
--
Ryan Blue
Software Engineer
Netflix
+0
> [ ] -1: I don't think this is a good idea because ...
>
> I'll start with my +1
>
> Thanks,
> Tom
>
>
>
--
Ryan Blue
Software Engineer
Netflix
am DBMS is using this policy by
>> default.
>>
>> Currently, the V1 data source uses "Legacy" policy by default, while V2 uses
>> "Strict". This proposal is to use "ANSI" policy by default for both V1 and
>> V2 in Spark 3.0.
>>
>> There was also a DISCUSS thread "Follow ANSI SQL on table insertion" in the
>> dev mailing list.
>>
>> This vote is open until next Thursday (Sept. 12th).
>>
>> [ ] +1: Accept the proposal
>> [ ] +0
>> [ ] -1: I don't think this is a good idea because ...
>>
>> Thank you!
>>
>> Gengliang
>>
>>
>
--
Ryan Blue
Software Engineer
Netflix
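For anyone trying the proposal out, the three policies in the vote above map to a single session setting in Spark 3.0. A minimal config sketch, assuming the `spark.sql.storeAssignmentPolicy` key (the exact name and accepted values may differ between builds):

```python
# Sketch: selecting the table-insertion cast policy discussed in the vote.
# Assumption: Spark 3.0 exposes it as spark.sql.storeAssignmentPolicy
# with values ANSI, LEGACY, or STRICT.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # ANSI: invalid insertions fail per the ANSI SQL standard;
    # LEGACY: the old V1 behavior; STRICT: the old V2 default.
    .config("spark.sql.storeAssignmentPolicy", "ANSI")
    .getOrCreate()
)
```

The same key can also be set per-session with `SET spark.sql.storeAssignmentPolicy=ANSI` in SQL.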
Here are my notes from the latest sync. Feel free to reply with
clarifications if I’ve missed anything.
*Attendees*:
Ryan Blue
John Zhuge
Russell Spitzer
Matt Cheah
Gengliang Wang
Priyanka Gomatam
Holden Karau
*Topics*:
- DataFrameWriterV2 insert vs append (recap)
- ANSI and strict modes
Sorry these notes were delayed. Here’s what we talked about in the last
DSv2 sync.
*Attendees*:
Ryan Blue
John Zhuge
Burak Yavuz
Gengliang Wang
Terry Kim
Wenchen Fan
Xin Ren
Srabasti Banerjee
Priyanka Gomatam
*Topics*:
- Follow up on renaming append to insert in v2 API
- Changes
ime
>- softwareVersion
>- options (map)
>
> ViewColumn interface:
>
>- name
>- type
>
>
> Thanks,
> John Zhuge
>
--
Ryan Blue
Software Engineer
Netflix
Here are my notes from the last DSv2 sync. Sorry it's a bit late!
*Attendees*:
Ryan Blue
John Zhuge
Raymond McCollum
Terry Kim
Gengliang Wang
Jose Torres
Wenchen Fan
Priyanka Gomatam
Matt Cheah
Russell Spitzer
Burak Yavuz
*Topics*:
- Check in on blockers
- Remove SaveMode
. My intuition is yes, because
>> different users have different levels of tolerance for different kinds of
>> errors. I’d expect these sorts of configurations to be set up at an
>> infrastructure level, e.g. to maintain consistent standards throughout a
>> whole organiz
in advance for your help.
>>
>> Regards,
>> Shiv
>>
>
>
> --
> Name : Jungtaek Lim
> Blog : http://medium.com/@heartsavior
> Twitter : http://twitter.com/heartsavior
> LinkedIn : http://www.linkedin.com/in/heartsavior
>
--
Ryan Blue
Software Engineer
Netflix
>
>>
>> -Matt Cheah
>>
>>
>>
>> *From: *Reynold Xin
>> *Date: *Wednesday, July 31, 2019 at 9:58 AM
>> *To: *Matt Cheah
>> *Cc: *Russell Spitzer , Takeshi Yamamuro <
>> linguin@gmail.com>, Gengliang Wang ,
>> Ryan Blue ,
SQL is a better idea.
>> For more information, please read the Discuss: Follow ANSI SQL on table
>> insertion
>> <https://docs.google.com/document/d/1b9nnWWbKVDRp7lpzhQS1buv1_lDzWIZY2ApFs5rBcGI/edit?usp=sharing>
>> Please let me know if you have any thoughts on this.
>>
>> Regards,
>> Gengliang
>>
>
--
Ryan Blue
Software Engineer
Netflix
rt data source v2 performance a lot and we'd better
> fix it sooner rather than later.
>
>
> On Sat, Jul 20, 2019 at 8:20 AM Ryan Blue
> wrote:
>
>> Here are my notes from the last sync. If you’d like to be added to the
>> invite or have topics, please let me know.
>
Here are my notes from the last sync. If you’d like to be added to the
invite or have topics, please let me know.
*Attendees*:
Ryan Blue
Matt Cheah
Yifei Huang
Jose Torres
Burak Yavuz
Gengliang Wang
Michael Artz
Russell Spitzer
*Topics*:
- Existing PRs
- V2 session catalog: https
Sounds great! Ping me on the review, I think this will be really valuable.
On Fri, Jul 12, 2019 at 6:51 PM Xianyin Xin
wrote:
> If there’s nobody working on that, I’d like to contribute.
>
>
>
> Loop in @Gengliang Wang.
>
>
>
> Xianyin
>
>
>
> *F
Master,
> but can't find a JDBC implementation or related JIRA.
>
> DatasourceV2 APIs to me look in good shape to attempt a JDBC connector for
> READ/WRITE path.
>
> Thanks & Regards,
> Shiv
>
--
Ryan Blue
Software Engineer
Netflix
Here are my notes from this week’s sync.
*Attendees*:
Ryan Blue
John Zhuge
Dale Richardson
Gabor Somogyi
Matt Cheah
Yifei Huang
Xin Ren
Jose Torres
Gengliang Wang
Kevin Yu
*Topics*:
- Metadata columns or function push-down for Kafka v2 source
- Open PRs
- REPLACE TABLE
in terms of the first RC and final
> release?
> >
> >
> >
> > Cheers Andrew
>
>
>
--
Ryan Blue
Software Engineer
Netflix
(in case of DSv2 replacement is implemented). After some
> digging I've found DSv1 sources which have already been removed, but in some cases
> v1 and v2 still exist in parallel.
>
> Can somebody please tell me what's the overall plan in this area?
>
> BR,
> G
>
>
--
Ryan Blue
Software Engineer
Netflix
>>>>
>>>>>>
>>>>>>
>>>>>> I would like to call a vote for the SPIP for SPARK-25299
>>>>>> <https://issues.apache.org/jira/browse/SPARK-25299>, which proposes
>>>>>> to introduce a pluggable storage API for temporary shuffle data.
>>>>>>
>>>>>>
>>>>>>
>>>>>> You may find the SPIP document here
>>>>>> <https://docs.google.com/document/d/1d6egnL6WHOwWZe8MWv3m8n4PToNacdx7n_0iMSWwhCQ/edit>
>>>>>> .
>>>>>>
>>>>>>
>>>>>>
>>>>>> The discussion thread for the SPIP was conducted here
>>>>>> <https://lists.apache.org/thread.html/2fe82b6b86daadb1d2edaef66a2d1c4dd2f45449656098ee38c50079@%3Cdev.spark.apache.org%3E>
>>>>>> .
>>>>>>
>>>>>>
>>>>>>
>>>>>> Please vote on whether or not this proposal is agreeable to you.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>>
>>>>>>
>>>>>> -Matt Cheah
>>>>>>
>>>>>
--
Ryan Blue
Software Engineer
Netflix
Here are the latest DSv2 sync notes. Please reply with updates or
corrections.
*Attendees*:
Ryan Blue
Michael Armbrust
Gengliang Wang
Matt Cheah
John Zhuge
*Topics*:
Wenchen’s reorganization proposal
Problems with TableProvider - property map isn’t sufficient
New PRs:
- ReplaceTable
Here are my notes from last night’s sync. I had to leave early, so there
may be more discussion. Others can fill in the details for those topics.
*Attendees*:
John Zhuge
Ryan Blue
Yifei Huang
Matt Cheah
Yuanjian Li
Russell Spitzer
Kevin Yu
*Topics*:
- Atomic extensions for the TableCatalog
Sorry these notes are so late; I didn’t get to the write-up until now. As
usual, if anyone has corrections or comments, please reply.
*Attendees*:
John Zhuge
Ryan Blue
Andrew Long
Wenchen Fan
Gengliang Wang
Russell Spitzer
Yuanjian Li
Yifei Huang
Matt Cheah
Amardeep Singh Dhilon
Zhilmil Dhion
.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:89)
> at
> org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:41)
> at
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:541)
> at
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:763)
> at
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:463)
> at
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:209)]
>
>
>
--
Ryan Blue
Software Engineer
Netflix
Here are my notes for the latest DSv2 community sync. As usual, if you have
comments or corrections, please reply. If you’d like to be invited to the
next sync, email me directly. Everyone is welcome to attend.
*Attendees*:
Ryan Blue
John Zhuge
Andrew Long
Bruce Robbins
Dilip Biswal
Gengliang
Catalyst? I’ve been trying to piece together how
> Catalyst knows that it can remove a sort and shuffle given that both tables
> are bucketed and sorted the same way. Is there any classes in particular I
> should look at?
>
>
>
> Cheers Andrew
>
--
Ryan Blue
Software Engineer
Netflix
orges Perrin
> j...@jgp.net
>
>
>
> On Apr 19, 2019, at 10:10, Ryan Blue wrote:
>
> Here are my notes from the last DSv2 sync. As always:
>
>- If you’d like to attend the sync, send me an email and I’ll add you
>to the invite. Everyone is welcome.
>
*:
- TableCatalog PR #24246: https://github.com/apache/spark/pull/24246
- Remove SaveMode PR #24233: https://github.com/apache/spark/pull/24233
- Streaming capabilities PR #24129:
https://github.com/apache/spark/pull/24129
*Attendees*:
Ryan Blue
John Zhuge
Matt Cheah
Yifei Huang
Bruce Robbins
Jamison
r. Do you have a different proposal about how this should
> be handled?
>
> On Tue, Apr 16, 2019 at 4:23 PM Ryan Blue wrote:
>
>> Is this a bug fix? It looks like a new feature to me.
>>
>> On Tue, Apr 16, 2019 at 4:13 PM Michael Armbrust
>> wrote:
>>
&
'd like to start the process shortly.
>
> Michael
>
--
Ryan Blue
Software Engineer
Netflix
>>>
>>>>>>>>> Hello,
>>>>>>>>> I fail to see how an equi-join on the key columns is different
>>>>>>>>> than the cogroup you propose.
>>>>>>>>>
>>>>>>>>> I
>> That is, when reading column-partitioned Parquet files, the explicitly
>> specified schema is not adhered to; instead, the partitioning columns are
>> appended to the end of the column list. This is quite a severe issue, as some
>> operations, such as union, fail if columns are in
Ryan Blue
John Zhuge
Russell Spitzer
Gengliang Wang
Yuanjian Li
Matt Cheah
Yifei Huang
Felix Cheung
Dilip Biswal
Wenchen Fan
--
Ryan Blue
Software Engineer
Netflix
he SparkContext but is state only needed by one
>
> SparkSession and that there isn't any way to clean up now, that's a
>
> compelling reason to change the API. Is that the situation? The only
>
> downside is making the user separately stop the SparkContext then.
>
>
trying to understand why this is the intended behavior – anyone
> have any knowledge of why this is the case?
> >
> >
> >
> > Thanks,
> >
> > Vinoo
>
>
>
>
>
--
Ryan Blue
Software Engineer
Netflix
doopConfWithOptions(relation.options))
> )
>
> import scala.collection.JavaConverters._
>
> val rows = readFile(pFile).flatMap(_ match {
>   case r: InternalRow => Seq(r)
>   // This doesn't work. vector mode is doing something screwy
>   case b: ColumnarBatch => b.rowIterator().asScala
> }).toList
>
> println(rows)
> // List([0,1,5b,24,66647361])
> // ?? this is wrong I think
>
>
>
> Has anyone attempted something similar?
>
>
>
> Cheers Andrew
>
>
>
--
Ryan Blue
Software Engineer
Netflix
artitioned using Hive Hash? By
>> understanding, I mean that I’m able to avoid a full shuffle join on Table A
>> (partitioned by Hive Hash) when joining with a Table B that I can shuffle
>> via Hive Hash to Table A.
>>
>>
>>
>> Thank you,
>>
>> Tyson
>>
>
>
--
Ryan Blue
Software Engineer
Netflix
e to join?
>
> Stavros
>
> On Thu, Feb 21, 2019 at 10:56 PM Ryan Blue
> wrote:
>
>> Here are my notes from the DSv2 sync last night. As always, if you have
>> corrections, please reply with them. And if you’d like to be included on
>> the invite to participate
This vote fails with the following counts:
3 +1 votes:
- Matt Cheah
- Ryan Blue
- Sean Owen (binding)
1 -0 vote:
- Jose Torres
2 -1 votes:
- Mark Hamstra (binding)
- Mridul Muralidharan (binding)
Thanks for the discussion, everyone. It sounds to me that the main
objection
Actually, I went ahead and removed the confusing section. There is no
public API in the doc now, so that it is clear that it isn't a relevant
part of this vote.
On Fri, Mar 1, 2019 at 4:58 PM Ryan Blue wrote:
> I moved the public API to the "Implementation Sketch" sect
> anthony.young-gar...@cloudera.com.invalid> wrote:
>
>> +1 (non-binding)
>>
>> On Thu, Feb 28, 2019 at 5:54 PM John Zhuge wrote:
>>
>> +1 (non-binding)
>>
>> On Thu, Feb 28, 2019 at 9:11 AM Matt Cheah wrote:
>>
>> +1 (non-binding)
>>
&
Sv2 at the end to meet a deadline? all the better to,
> if anything, agree it's important now. It's also an agreement to delay
> the release for it, not rush it. I don't see that later is a better
> time to make the decision, if rush is a worry?
>
> Given my definition, and understanding of t
roject management, IMO.
>
> On Thu, Feb 28, 2019 at 10:06 AM Ryan Blue wrote:
>
>> Mark, if this goal is adopted, "we" is the Apache Spark community.
>>
>> On Thu, Feb 28, 2019 at 9:52 AM Mark Hamstra
>> wrote:
>>
>>> Who is "we&qu
tt Cheah wrote:
>
>> +1 (non-binding)
>>
>>
>>
>> Are identifiers and namespaces going to be rolled under one of those six
>> points?
>>
>>
>>
>> *From: *Ryan Blue
>> *Reply-To: *"rb...@netflix.com"
>> *Date: *Thursd
u, Feb 28, 2019 at 1:00 AM Ryan Blue wrote:
>
>> I think that's a good plan. Let's get the functionality done, but mark it
>> experimental pending a new row API.
>>
>> So is there agreement on this set of work, then?
>>
>> On Tue, Feb 26, 2019 at 6:30 PM Matei
(e.g.,
INSERT INTO support)
Please vote in the next 3 days on whether you agree with committing to this
goal.
[ ] +1: Agree that we should consider a functional DSv2 implementation a
blocker for Spark 3.0
[ ] +0: . . .
[ ] -1: I disagree with this goal because . . .
Thank you!
--
Ryan Blue
+1 (non-binding)
On Wed, Feb 27, 2019 at 8:34 PM Russell Spitzer
wrote:
> +1 (non-binding)
>
> On Wed, Feb 27, 2019, 6:28 PM Ryan Blue wrote:
>
>> Hi everyone,
>>
>> In the last DSv2 sync, the consensus was that the table metadata SPIP was
>> ready to bri
doc
<https://docs.google.com/document/d/1zLFiA1VuaWeVxeTDXNg8bL6GP3BVoOZBkewFtEnjEoo/edit#heading=h.m45webtwxf2d>
.
Please vote in the next 3 days.
[ ] +1: Accept the proposal as an official SPIP
[ ] +0
[ ] -1: I don't think this is a good idea because ...
Thanks!
--
Ryan Blue
Software En
x that before we declare dev2 is stable, because
> InternalRow is not a stable API. We don’t necessarily need to do it in 3.0.
> >
> > On Tue, Feb 26, 2019 at 5:10 PM Matt Cheah wrote:
> > Will that then require an API break down the line? Do we save that for
> Spark 4?
&
26, 2019 at 4:41 PM Matt Cheah wrote:
> Reynold made a note earlier about a proper Row API that isn’t InternalRow
> – is that still on the table?
>
>
>
> -Matt Cheah
>
>
>
> *From: *Ryan Blue
> *Reply-To: *"rb...@netflix.com"
> *Date: *Tuesday, F
t I have a
> problem with, are declarative, pseudo-authoritative statements that 3.0 (or
> some other release) will or won't contain some feature, API, etc. or that
> some issue is or is not blocker or worth delaying for. When the PMC has not
> voted on such issues, I'm often left thinking, "Wait... what? Who decided
> that, or where did that decision come from?"
>
>
--
Ryan Blue
Software Engineer
Netflix
wrote:
>>
>>> +1
>>>
>>>
>>>
>>> --
>>> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>>>
>>>
>>>
>>
>> --
>> ---
>> Takeshi Yamamuro
>>
>
--
Ryan Blue
Software Engineer
Netflix
id not target new features in the Spark 2.0.0 release.
>>> The fact that some entity other than the PMC thinks that Spark 3.0 should
>>> contain certain new features or that it will be costly to them if 3.0 does
>>> not contain those features is not dispositive. If there are public API
>>> changes that should occur in a timely fashion and there is also a list of
>>> new features that some users or contributors want to see in 3.0 but that
>>> look likely to not be ready in a timely fashion, then the PMC should fully
>>> consider releasing 3.0 without all those new features. There is no reason
>>> that they can't come in with 3.1.0.
>>>
>>
--
Ryan Blue
Software Engineer
Netflix
also the features that have remained open for the longest time
> and we really need to move forward on these. Putting a target release for
> 3.0 will help in that regard.
>
>
>
> -Matt Cheah
>
>
>
> *From: *Ryan Blue
> *Reply-To: *"rb...@netflix.com"
> *D
an around when the release branch will likely be cut.
>
> Matei
>
> > On Feb 21, 2019, at 1:03 PM, Ryan Blue
> wrote:
> >
> > Hi everyone,
> >
> > In the DSv2 sync last night, we had a discussion about roadmap and what
> the goal should be for getting th
years to get
the work done.
Are there any objections to targeting 3.0 for this?
In addition, much of the planning for multi-catalog support has been done
to make v2 possible. Do we also want to include multi-catalog support?
rb
--
Ryan Blue
Software Engineer
Netflix
contains sort information, but it isn’t used
because it applies only to single files.
- *Consensus formed not including sorts in v2 table metadata.*
*Attendees*:
Ryan Blue
John Zhuge
Dongjoon Hyun
Felix Cheung
Gengliang Wang
Hyukjin Kwon
Jacky Lee
Jamison Bennett
Matt Cheah
Yifei Huang
Russel
:33 AM Maryann Xue
> wrote:
>
>> +1
>>
>> On Mon, Feb 18, 2019 at 10:46 PM John Zhuge wrote:
>>
>>> +1
>>>
>>> On Mon, Feb 18, 2019 at 8:43 PM Dongjoon Hyun
>>> wrote:
>>>
>>>> +1
>>>>
in the next 3 days.
[ ] +1: Accept the proposal as an official SPIP
[ ] +0
[ ] -1: I don't think this is a good idea because ...
Thanks!
rb
--
Ryan Blue
Software Engineer
Netflix
Sure. I'll start a thread.
On Mon, Feb 18, 2019 at 6:27 PM Wenchen Fan wrote:
> I think this is the right direction to go. Shall we move forward with a
> vote and detailed designs?
>
> On Mon, Feb 4, 2019 at 9:57 AM Ryan Blue wrote:
>
>> Hi everyone,
>
I can understand retaining
> old behavior under a flag where the behavior change could be
> problematic for some users or facilitate migration, but this is just a
> change to some UI links no? the underlying links don't change.
> On Fri, Feb 8, 2019 at 5:41 PM Ryan Blue wrote:
>
f flag option to just get one url or
> default two stdout/stderr urls.
> 3. We could let users enumerate file names they want to link, and create
> log links for each file.
>
> Which one do you suggest?
>
> 2019년 2월 9일 (토) 오전 8:24, Ryan Blue 님이 작성:
>
>> Jungtaek,
>>
>
og overview
> > page would make much more sense for us. We work it around with a custom
> > submit process that logs all important URLs on the submit side log.
> >
> >
> >
> > 2019년 2월 9일 (토) 오전 5:42, Ryan Blue 님이 작성:
> >>
> >> Here's what I see from a run
o remove file part manually from URL to
> access list page. Instead of this we may be able to change default URL to
> show all of local logs and let users choose which file to read. (though it
> would be two-clicks to access to actual file)
>
> -Jungtaek Lim (HeartSaVioR)
>
> 2
would put more files than just
> stdout and stderr (like GC logs).
>
> SPARK-23155 provides the way to modify log URL but it's only applied to
> SHS, and in Spark UI in running apps it still only shows "stdout" and
> "stderr". SPARK-26792 is for applying this to Spark UI as well, but I've
> got suggestion to just change the default log URL.
>
> Thanks again,
> Jungtaek Lim (HeartSaVioR)
>
--
Ryan Blue
Software Engineer
Netflix
get(0,
> DataTypes.DateType));
>
> }
>
> It prints an integer as output:
>
> MyDataWriter.write: 17039
>
>
> Is this a bug? or I am doing something wrong?
>
> Thanks,
> Shubham
>
--
Ryan Blue
Software Engineer
Netflix
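For context on the integer output quoted above: this is expected rather than a bug. Spark's `InternalRow` stores a `DateType` value as the number of days since the Unix epoch (1970-01-01), so `17039` is an encoded date. A quick decoding check in plain Python:

```python
from datetime import date, timedelta

# DateType values in InternalRow are day counts since the Unix epoch,
# so 17039 decodes to a real calendar date.
EPOCH = date(1970, 1, 1)
decoded = EPOCH + timedelta(days=17039)
print(decoded)  # 2016-08-26
```

Reading the value back through the external Row API converts it to a date object instead of the raw day count.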
, and partitioning is already
supported. The idea to use conditions to create separate data frames would
actually make that harder because you'd need to create and name tables for
each one.
On Mon, Feb 4, 2019 at 9:16 AM Andrew Melo wrote:
> Hello Ryan,
>
> On Mon, Feb 4, 2019 at 10:52 AM
t; > --
> >
> > Moein Hosseini
> > Data Engineer
> > mobile: +98 912 468 1859
> > site: www.moein.xyz
> > email: moein...@gmail.com
> >
>
>
>
--
Ryan Blue
Software Engineer
Netflix
discussion. From the feedback in the
DSv2 sync and on the previous thread, I think it should go quickly.
Thanks for taking a look at the proposal,
rb
--
Ryan Blue
e broadcast timeout really meant to be a timeout on
> sparkContext.broadcast, instead of the child.executeCollectIterator()? In
> that case, would it make sense to move the timeout to wrap only
> sparkContext.broadcast?
>
> Best,
>
> Justin
>
--
Ryan Blue
Software Engineer
Netflix
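A toy illustration of the question above (plain Python, not Spark internals): scoping the timeout to only the broadcast step rather than to the whole collect-then-broadcast sequence. The function names mirror the Spark calls being discussed but are stand-ins:

```python
from concurrent.futures import ThreadPoolExecutor

def execute_collect():    # stand-in for child.executeCollectIterator()
    return [1, 2, 3]

def broadcast(rows):      # stand-in for sparkContext.broadcast(rows)
    return tuple(rows)

with ThreadPoolExecutor(max_workers=1) as pool:
    rows = execute_collect()              # the slow collect is NOT under the timeout
    future = pool.submit(broadcast, rows)
    result = future.result(timeout=5.0)   # only the broadcast step can time out
print(result)  # (1, 2, 3)
```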
- Ryan: next time, we should talk about the set of metadata proposed
for TableCatalog, but we’re out of time.
*Attendees*:
Ryan Blue
John Zhuge
Reynold Xin
Xiao Li
Dongjoon Hyun
Eric Wohlstadter
Hyukjin Kwon
Jacky Lee
Jamison Bennett
Kevin Yu
Yuanjian Li
Maryann Xue
Matt Cheah
Dale Richardson
scheme will need to play nice with column identifier as
> well.
>
>
>
>
> --
>
> *From:* Ryan Blue
> *Sent:* Thursday, January 17, 2019 11:38 AM
> *To:* Spark Dev List
> *Subject:* Re: [DISCUSS] Identifiers with multi-catalog support
>
&
Any discussion on how Spark should manage identifiers when multiple
catalogs are supported?
I know this is an area where a lot of people are interested in making
progress, and it is a blocker for both multi-catalog support and CTAS in
DSv2.
On Sun, Jan 13, 2019 at 2:22 PM Ryan Blue wrote:
>
a long time (say .. until
> Spark 4.0.0?).
> >
> > I know somehow it happened to be sensitive but to be just literally
> honest to myself, I think we should make a try.
> >
>
>
> --
> Marcelo
>
--
Ryan Blue
Software Engineer
Netflix
we are super 100% dependent on Hive...
>>
>>
>> --
>> *From:* Ryan Blue
>> *Sent:* Tuesday, January 15, 2019 9:53 AM
>> *To:* Xiao Li
>> *Cc:* Yuming Wang; dev
>> *Subject:* Re: [DISCUSS] Upgrade built-in Hive to 2.3.4
>
st PR <https://github.com/apache/spark/pull/23552> does not
>> contain the changes of hive-thriftserver. Please ignore the failed test in
>> hive-thriftserver.
>>
>> The second PR <https://github.com/apache/spark/pull/23553> is complete
>> changes.
>>
>>
>>
>> I have created a Spark distribution for Apache Hadoop 2.7, you might
>> download it via Google Drive
>> <https://drive.google.com/open?id=1cq2I8hUTs9F4JkFyvRfdOJ5BlxV0ujgt> or Baidu
>> Pan <https://pan.baidu.com/s/1b090Ctuyf1CDYS7c0puBqQ>.
>>
>> Please help review and test. Thanks.
>>
>
--
Ryan Blue
Software Engineer
Netflix
to support
path-based tables by adding a path to CatalogIdentifier, either as a
namespace or as a separate optional string. Then, the identifier passed to
a catalog would work for either a path-based table or a catalog table,
without needing a path-based catalog API.
Thoughts?
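The option described above can be sketched as a single identifier type with an optional path (a hypothetical shape for discussion, not an actual Spark API; the name `CatalogIdentifier` comes from the thread):

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class CatalogIdentifier:
    """One identifier shape for both catalog tables and path-based tables."""
    namespace: Tuple[str, ...] = ()
    name: Optional[str] = None
    path: Optional[str] = None  # set only for path-based tables

table_id = CatalogIdentifier(namespace=("db",), name="events")
path_id = CatalogIdentifier(path="/warehouse/events")
print(table_id.path, path_id.name)  # None None
```

A catalog receiving such an identifier could resolve whichever form is populated, avoiding a separate path-based catalog API.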
On Sun, Jan 13, 2019 at 1:
are tables and not nested namespaces. How would Spark handle
arbitrary nesting that differs across catalogs?
Hopefully, I’ve captured the design question well enough for a productive
discussion. Thanks!
rb
--
Ryan Blue
Software Engineer
Netflix
Here are my notes from the DSv2 sync last night.
*As usual, I didn’t take great notes because I was participating in the
discussion. Feel free to send corrections or clarification.*
*Attendees*:
Ryan Blue
John Zhuge
Xiao Li
Reynold Xin
Felix Cheung
Anton Okolnychyi
Bruce Robbins
Dale Richardson
an also talk about the user-facing API
<https://docs.google.com/document/d/1zLFiA1VuaWeVxeTDXNg8bL6GP3BVoOZBkewFtEnjEoo/edit#heading=h.cgnrs9vys06x>
proposed in the SPIP.
Thanks,
rb
--
Ryan Blue
Software Engineer
Netflix
the tune of 2-6%. Has anyone
>> considered this before?
>>
>> Sean
>>
>>
>>
--
Ryan Blue
Software Engineer
Netflix
>
>>> > Following this direction, it makes more sense to delegate everything
>>> to data sources.
>>> >
>>> > As the first step, maybe we should not add DDL commands to change
>>> schema of data source, but just use the capability API to let data s
writing.
> Users can use the data source's native client to change the schema.
> >
> > On Fri, Dec 21, 2018 at 8:03 AM Ryan Blue wrote:
> >>
> >> I think it is good to know that not all sources support default values.
> That makes me think that we should delegat
l that we should follow RDBMS/SQL standard
>> regarding the behavior?
>>
>> > pass the default through to the underlying data source
>>
>> This is one way to implement the behavior.
>>
>> On Thu, Dec 20, 2018 at 11:12 AM Ryan Blue wrote:
>>
>>>
sers.
>
> On Thu, Dec 20, 2018 at 12:43 AM Ryan Blue wrote:
>
>> Wenchen, can you give more detail about the different ADD COLUMN syntax?
>> That sounds confusing to end users to me.
>>
>> On Wed, Dec 19, 2018 at 7:15 AM Wenchen Fan wrote:
>>
>>> N
ng data, fill the missing column with the initial default
>>> value
>>> 4. when writing data, fill the missing column with the latest default
>>> value
>>> 5. when altering a column to change its default value, only update the
>>> latest defa
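A toy model of the quoted read/write rules (hypothetical code, not Spark's implementation): reads of old data fill missing columns with the default captured when the column was added, while new writes use the latest default:

```python
class ColumnDefault:
    def __init__(self, initial):
        self.initial = initial   # frozen when the column is added
        self.latest = initial    # updated by a later ALTER ... SET DEFAULT

flag = ColumnDefault(initial=False)
flag.latest = True               # simulate changing the default later

read_fill = flag.initial         # rule 3: fill missing column on read
write_fill = flag.latest         # rule 4: fill missing column on write
print(read_fill, write_fill)     # False True
```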
that this complexity probably isn’t worth consistency in default
values across sources, if that is even achievable.
In the sync we thought it was a good idea to send this out to the larger
group to discuss. Please reply with comments!
rb
--
Ryan Blue
Software Engineer
Netflix
to the invite
list, just let me know. Everyone is welcome.
rb
*Attendees*:
Ryan Blue
Xiao Li
Bruce Robbins
John Zhuge
Anton Okolnychyi
Jackey Lee
Jamison Bennett
Srabasti Banerjee
Thomas D’Silva
Wenchen Fan
Matt Cheah
Maryann Xue
(possibly others that entered after the start)
*Agenda
; *Cc: *Spark Dev List
>> *Subject: *Re: [DISCUSS] Function plugins
>>
>>
>>
>>
>>
>> Having a way to register UDFs that are not using Hive APIs would be great!
>>
>>
>>
>>
>>
>>
>>
>&
ave to
solve challenges with function naming (whether there is a db component).
Right now I’d like to think through the overall idea and not get too
focused on those details.
Thanks,
rb
--
Ryan Blue
Software Engineer
Netflix
d to address this).
> I'd consider this as a brainstorming email thread. Once we have a good
> proposal, then we can go ahead with a SPIP.
>
> Thanks,
> Marco
>
> Il giorno mer 12 dic 2018 alle ore 19:13 Ryan Blue ha
> scritto:
>
>> Marco,
>>
>>
esign for each data
> type.
>
> The above are the big ones I can think of. I probably missed some, but a
> lot of other smaller things can be improved on later.
>
>
>
>
>
>
>
--
Ryan Blue
Software Engineer
Netflix
people that are
looking at it now are the ones already familiar with the problem.
rb
On Wed, Dec 12, 2018 at 2:05 AM Marco Gaido wrote:
> Thank you all for your answers.
>
> @Ryan Blue sure, let me state the problem more
> clearly: imagine you have 2 dataframes with a co
nges in the design, we can do that.
>
> Thoughts on this?
>
> Thanks,
> Marco
>
--
Ryan Blue
Software Engineer
Netflix
f = spark.read.json("s3://sample_bucket/people.json")
>>>>> > df.printSchema()
>>>>> > df.filter($"age" > 20).explain()
>>>>> >
>>>>> > root
>>>>> > |-- age: long (nullable = true)
>>>>> > |-- name: string (nullable = true)
>>>>> >
>>>>> > == Physical Plan ==
>>>>> > *Project [age#47L, name#48]
>>>>> > +- *Filter (isnotnull(age#47L) && (age#47L > 20))
>>>>> >+- *FileScan json [age#47L,name#48] Batched: false, Format: JSON,
>>>>> Location: InMemoryFileIndex[s3://sample_bucket/people.json],
>>>>> PartitionFilters: [], PushedFilters: [IsNotNull(age),
>>>>> GreaterThan(age,20)],
>>>>> ReadSchema: struct
>>>>> >
>>>>> > # Comments
>>>>> > As you can see, PushedFilters are shown even when the input data is JSON.
>>>>> > This pushdown is not actually used.
>>>>> >
>>>>> > I'm wondering if it has been already discussed or not.
>>>>> > If not, this is a chance to have such feature in DataSourceV2
>>>>> because it would require some API level changes.
>>>>> >
>>>>> >
>>>>> > Warm regards,
>>>>> >
>>>>> > Noritaka Sekiyama
>>>>> >
>>>>>
>>>>>
>>>>>
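To restate the observation in the quoted report: even though `PushedFilters` appears in the plan, the physical plan above still contains a `*Filter` node, so Spark re-applies the predicates itself and results stay correct; the JSON reader simply ignores them. A small illustration of that split (plain Python, not the DSv2 API itself):

```python
def push_filters(source_supported, requested):
    """Split requested filters into those the source will evaluate
    natively and those the engine must re-apply after the scan."""
    pushed = [f for f in requested if f in source_supported]
    post_scan = [f for f in requested if f not in source_supported]
    return pushed, post_scan

# A JSON scan supports no native filtering, so everything is post-scan.
pushed, post_scan = push_filters(set(), ["IsNotNull(age)", "GreaterThan(age,20)"])
print(pushed)     # []
print(post_scan)  # ['IsNotNull(age)', 'GreaterThan(age,20)']
```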
--
Ryan Blue
Software Engineer
Netflix
--
Ryan Blue
Software Engineer
Netflix