Behavior of SaveMode.Append when table is not present

2018-11-08 Thread Shubham Chaurasia
Hi, For SaveMode.Append, the doc https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html#save-modes says: *When saving a DataFrame to a data source, if data/table already exists, contents of the DataFrame are expected to be appended to existing data.* However it does not

Re: [ANNOUNCE] Announcing Apache Spark 2.4.0

2018-11-08 Thread Xiao Li
Try to clear your browsing data or use a different web browser. Enjoy it, Xiao On Thu, Nov 8, 2018 at 4:15 PM Reynold Xin wrote: > Do you have a cached copy? I see it here > > http://spark.apache.org/downloads.html > > > > On Thu, Nov 8, 2018 at 4:12 PM Li Gao wrote: > >> this is wonderful !

Re: [ANNOUNCE] Announcing Apache Spark 2.4.0

2018-11-08 Thread Reynold Xin
Do you have a cached copy? I see it here http://spark.apache.org/downloads.html On Thu, Nov 8, 2018 at 4:12 PM Li Gao wrote: > this is wonderful ! > I noticed the official spark download site does not have 2.4 download > links yet. > > On Thu, Nov 8, 2018, 4:11 PM Swapnil Shinde wrote: > >>

Re: DataSourceV2 capability API

2018-11-08 Thread Ryan Blue
Yes, we currently use traits that have methods. Something like “supports reading missing columns” doesn’t need to deliver methods. The other example is where we don’t have an object to test for a trait ( scan.isInstanceOf[SupportsBatch]) until we have a Scan with pushdown done. That could be
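A minimal sketch of the trait-based check mentioned above, written in plain Scala with no Spark dependency. The names `Scan` and `SupportsBatch` come from the message; `planBatches`, `FileScan`, `StreamOnlyScan`, and `TraitCheckDemo` are hypothetical and are not the actual Spark interfaces. It illustrates the limitation being described: the check needs a `Scan` instance in hand, so it cannot run before that object exists.

```scala
// Hypothetical stand-ins modelled on the thread; not the real Spark interfaces.
trait Scan

// A capability expressed as a trait that also delivers a method.
trait SupportsBatch extends Scan {
  def planBatches(): Seq[String]
}

final class FileScan extends Scan with SupportsBatch {
  def planBatches(): Seq[String] = Seq("part-0", "part-1")
}

final class StreamOnlyScan extends Scan

object TraitCheckDemo {
  // Equivalent to testing scan.isInstanceOf[SupportsBatch] and then casting.
  def describe(scan: Scan): String = scan match {
    case b: SupportsBatch => s"batch with ${b.planBatches().size} splits"
    case _                => "no batch support"
  }

  def main(args: Array[String]): Unit = {
    println(describe(new FileScan))
    println(describe(new StreamOnlyScan))
  }
}
```

A capability such as "supports reading missing columns" would have no method to contribute, which is why a marker trait feels heavier than a simple flag in that case.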

Re: DataSourceV2 capability API

2018-11-08 Thread Reynold Xin
This is currently accomplished by having traits that data sources can extend, as well as runtime exceptions right? It's hard to argue one way vs another without knowing how things will evolve (e.g. how many different capabilities there will be). On Thu, Nov 8, 2018 at 12:50 PM Ryan Blue wrote:

Re: [ANNOUNCE] Announcing Apache Spark 2.4.0

2018-11-08 Thread Stavros Kontopoulos
Awesome! On Thu, Nov 8, 2018 at 9:36 PM, Jules Damji wrote: > Indeed! > > Sent from my iPhone > Pardon the dumb thumb typos :) > > On Nov 8, 2018, at 11:31 AM, Dongjoon Hyun > wrote: > > Finally, thank you all. Especially, thanks to the release manager, Wenchen! > > Bests, > Dongjoon. > > > On

Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

2018-11-08 Thread Felix Cheung
They were discussed on dev@ in Mar 2018, for example. Several attempts were made in 2.3.0, 2.3.1, 2.3.2, 2.4.0. It’s not just tests, the last one is with vignettes. The current doc about RStudio actually assumes you have the full Spark distribution (ie from the download page and Apache Mirror)

Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

2018-11-08 Thread Wenchen Fan
Do we need to create a JIRA ticket for it and list it as a known issue in 2.4.0 release notes? On Wed, Nov 7, 2018 at 11:26 PM Shivaram Venkataraman < shiva...@eecs.berkeley.edu> wrote: > Agree with the points Felix made. > > One thing is that it looks like the only problem is vignettes and the

[ANNOUNCE] Announcing Apache Spark 2.4.0

2018-11-08 Thread Wenchen Fan
Hi all, Apache Spark 2.4.0 is the fifth release in the 2.x line. This release adds Barrier Execution Mode for better integration with deep learning frameworks, introduces 30+ built-in and higher-order functions that make it easier to deal with complex data types, improves the K8s integration, along with

DataSourceV2 capability API

2018-11-08 Thread Ryan Blue
Hi everyone, I’d like to propose an addition to DataSourceV2 tables, a capability API. This API would allow Spark to query a table to determine whether it supports a capability or not:
    val table = catalog.load(identifier)
    val supportsContinuous = table.isSupported("continuous-streaming")
There
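The capability check in the proposal could be sketched roughly as follows, in plain Scala with no Spark dependency. Only `isSupported` and the `"continuous-streaming"` string appear in the message; the `Table` trait shape, `InMemoryTable`, the `"batch-read"` capability, and the `load` helper standing in for `catalog.load(identifier)` are assumptions made for illustration and do not reflect the actual Spark API.

```scala
// Hypothetical sketch: none of these definitions are the actual Spark API.
trait Table {
  def name: String
  // Capabilities modelled as plain strings, as in the proposal's example call.
  def capabilities: Set[String]
  def isSupported(capability: String): Boolean =
    capabilities.contains(capability)
}

final case class InMemoryTable(name: String, capabilities: Set[String])
    extends Table

object CapabilityDemo {
  // Stand-in for a catalog; the table names and "batch-read" are made up.
  private val tables: Map[String, Table] = Map(
    "events"  -> InMemoryTable("events", Set("batch-read", "continuous-streaming")),
    "archive" -> InMemoryTable("archive", Set("batch-read"))
  )

  // Stand-in for catalog.load(identifier).
  def load(identifier: String): Table = tables(identifier)

  def main(args: Array[String]): Unit = {
    val table = load("events")
    val supportsContinuous = table.isSupported("continuous-streaming")
    println(s"${table.name} supports continuous streaming: $supportsContinuous")
  }
}
```

Unlike a trait check on a scan object, this query only needs the table itself, so it can be asked before any scan has been built.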

Re: [ANNOUNCE] Announcing Apache Spark 2.4.0

2018-11-08 Thread Jules Damji
Indeed! Sent from my iPhone Pardon the dumb thumb typos :) > On Nov 8, 2018, at 11:31 AM, Dongjoon Hyun wrote: > > Finally, thank you all. Especially, thanks to the release manager, Wenchen! > > Bests, > Dongjoon. > > >> On Thu, Nov 8, 2018 at 11:24 AM Wenchen Fan wrote: >> + user list >>

Re: [ANNOUNCE] Announcing Apache Spark 2.4.0

2018-11-08 Thread Dongjoon Hyun
Finally, thank you all. Especially, thanks to the release manager, Wenchen! Bests, Dongjoon. On Thu, Nov 8, 2018 at 11:24 AM Wenchen Fan wrote: > + user list > > On Fri, Nov 9, 2018 at 2:20 AM Wenchen Fan wrote: > >> resend >> >> On Thu, Nov 8, 2018 at 11:02 PM Wenchen Fan wrote: >> >>> >>>

Re: Did the 2.4 release email go out?

2018-11-08 Thread Santiago Saavedra
The announcement has now been delivered. On Thu, Nov 8, 2018 at 20:09, Wenchen Fan wrote: > ping > > On Fri, Nov 9, 2018 at 2:20 AM Wenchen Fan wrote: > >> Actually I did it 3 hours ago, however the mail server seems to have some >> problems and my email was lost. Let me resend it. >>

Re: Did the 2.4 release email go out?

2018-11-08 Thread Xiao Li
Me too. On Thu, Nov 8, 2018 at 9:56 AM, Reynold Xin wrote: > The website is already up but I didn’t see any email announcement yet. >

Re: [ANNOUNCE] Announcing Apache Spark 2.4.0

2018-11-08 Thread Wenchen Fan
+ user list On Fri, Nov 9, 2018 at 2:20 AM Wenchen Fan wrote: > resend > > On Thu, Nov 8, 2018 at 11:02 PM Wenchen Fan wrote: > >> >> >> -- Forwarded message - >> From: Wenchen Fan >> Date: Thu, Nov 8, 2018 at 10:55 PM >> Subject: [ANNOUNCE] Announcing Apache Spark 2.4.0 >>

Re: [ANNOUNCE] Announcing Apache Spark 2.4.0

2018-11-08 Thread Marcelo Vanzin
+user@ >> -- Forwarded message - >> From: Wenchen Fan >> Date: Thu, Nov 8, 2018 at 10:55 PM >> Subject: [ANNOUNCE] Announcing Apache Spark 2.4.0 >> To: Spark dev list >> >> >> Hi all, >> >> Apache Spark 2.4.0 is the fifth release in the 2.x line. This release adds >> Barrier

Re: Did the 2.4 release email go out?

2018-11-08 Thread Wenchen Fan
Actually I did it 3 hours ago, however the mail server seems to have some problems and my email was lost. Let me resend it. On Fri, Nov 9, 2018 at 1:56 AM Reynold Xin wrote: > The website is already up but I didn’t see any email announcement yet. >

Re: Did the 2.4 release email go out?

2018-11-08 Thread Wenchen Fan
ping On Fri, Nov 9, 2018 at 2:20 AM Wenchen Fan wrote: > Actually I did it 3 hours ago, however the mail server seems to have some > problems and my email was lost. Let me resend it. > > On Fri, Nov 9, 2018 at 1:56 AM Reynold Xin wrote: > >> The website is already up but I didn’t see any email

On Java 9+ support, Cleaners, modules and the death of reflection

2018-11-08 Thread Sean Owen
I think this is a key thread, perhaps one of the only big problems, for Java 9+ support: https://issues.apache.org/jira/browse/SPARK-24421?focusedCommentId=16680169&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16680169 We basically can't access a certain method

Re: [ANNOUNCE] Announcing Apache Spark 2.4.0

2018-11-08 Thread Wenchen Fan
resend On Thu, Nov 8, 2018 at 11:02 PM Wenchen Fan wrote: > > > -- Forwarded message - > From: Wenchen Fan > Date: Thu, Nov 8, 2018 at 10:55 PM > Subject: [ANNOUNCE] Announcing Apache Spark 2.4.0 > To: Spark dev list > > > Hi all, > > Apache Spark 2.4.0 is the fifth release in

Re: Make Scala 2.12 as default Scala version in Spark 3.0

2018-11-08 Thread Sean Owen
This seems fine to me. At least we should be primarily testing against 2.12 now. Shane will need to alter the current 2.12 master build to actually test 2.11, but should be a trivial change. On Thu, Nov 8, 2018 at 12:11 AM DB Tsai wrote: > > Based on the discussions, I created a PR that makes

Arrow optimization in conversion from R DataFrame to Spark DataFrame

2018-11-08 Thread Hyukjin Kwon
Hi all, I am trying to introduce R Arrow optimization by reusing PySpark Arrow optimization. It makes the R DataFrame -> Spark DataFrame conversion roughly 900% to 1200% faster. It looks to be working fine so far; however, I would appreciate it if you have some time to take a look

Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

2018-11-08 Thread Matei Zaharia
I didn’t realize the same thing was broken in 2.3.0, but we should probably have made this a blocker for future releases, if it’s just a matter of removing things from the test script. We should also make the docs at https://spark.apache.org/docs/latest/sparkr.html clear about how we want

Re: Happy Diwali everyone!!!

2018-11-08 Thread Srabasti Banerjee
Thoughtful of you to remember Xiao :-) Wish everyone a Happy & Prosperous Diwali! Thanks, Srabasti Banerjee On Wednesday, 7 November, 2018, 3:12:01 PM GMT-8, Dilip Biswal wrote: Thank you Sean. Happy Diwali !! -- Dilip - Original message - From: Xiao Li To:

Re: Make Scala 2.12 as default Scala version in Spark 3.0

2018-11-08 Thread DB Tsai
Based on the discussions, I created a PR that makes 2.12 Spark's default Scala version, with Scala 2.11 as the alternative version. This implies that Scala 2.12 will be used by our CI builds, including pull request builds. https://github.com/apache/spark/pull/22967 We can decide later

Did the 2.4 release email go out?

2018-11-08 Thread Reynold Xin
The website is already up but I didn’t see any email announcement yet.