Re: On Scala 2.12.7

2018-10-01 Thread Sean Owen
I don't think so. Spark brings the Scala dependency, and I don't think the installed scalac matters in this respect. Darcy, there was an open question about whether this enables you to back out the workaround you created for 2.12.6. I tried removing it and it failed again, so left it in as still

Re: On Scala 2.12.7

2018-10-01 Thread Wenchen Fan
SGTM then. Is there anything we need to do to pick up the 2.12.7 upgrade, like updating the Jenkins config? On Tue, Oct 2, 2018 at 10:53 AM Sean Owen wrote: > I tested both ways, and it actually works fine. It calls into question > whether there's really a fix we need with 2.12.7, but, I hear two >

Re: On Scala 2.12.7

2018-10-01 Thread Sean Owen
I tested both ways, and it actually works fine. It calls into question whether there's really a fix we need with 2.12.7, but, I hear two informed opinions (Darcy and the scala release notes) that it was relevant. As we have no prior 2.12 support, I guess my feeling was indeed to get this update in

Re: On Scala 2.12.7

2018-10-01 Thread Felix Cheung
Although, like you said, Spark support for Scala 2.12 is beta anyway, so shouldn’t we get it to a working state by basing it on 2.12.7? There shouldn’t be a stability issue since it is not officially “supported”. From: Wenchen Fan Sent: Monday, October 1, 2018

Re: On Scala 2.12.7

2018-10-01 Thread Wenchen Fan
My major concern is how it will affect end-users if Spark 2.4 is built with Scala versions prior to 2.12.7. Generally I'm hesitant to upgrade the Scala version when we are very close to a release, and the Scala 2.12 build of Spark 2.4 is beta anyway. On Sat, Sep 29, 2018 at 6:46 AM Sean Owen wrote: >

Re: BroadcastJoin failed on partitioned parquet table

2018-10-01 Thread Wenchen Fan
I'm not sure if Spark 1.6 is still maintained. Can you try a 2.x Spark version and see if the problem still exists? On Sun, Sep 30, 2018 at 4:14 PM 白也诗无敌 <445484...@qq.com> wrote: > Besides, I have tried the ANALYZE statement. It is no use because I need the > single partition but get the total table

Re: Data source V2 in spark 2.4.0

2018-10-01 Thread Wenchen Fan
Ryan, thanks for putting up a list! Generally there are a few tweaks to the data source v2 API in 2.4, and it shouldn't be too hard if you already have a data source v2 implementation and you want to upgrade to Spark 2.4. However, we do want to make some big API changes for data source v2 in the

Re: [VOTE] SPARK 2.4.0 (RC2)

2018-10-01 Thread Michael Heuer
FYI I’ve opened two new issues against 2.4.0 rc2 https://issues.apache.org/jira/browse/SPARK-25587 https://issues.apache.org/jira/browse/SPARK-25588 that are regressions against 2.3.1, and may

Re: [VOTE] SPARK 2.4.0 (RC2)

2018-10-01 Thread Wenchen Fan
This RC fails because of the correctness bug SPARK-25538. I'll start a new RC once the fix (https://github.com/apache/spark/pull/22602) is merged. Thanks, Wenchen On Tue, Oct 2, 2018 at 1:21 AM Sean Owen wrote: > Given that this release is probably still 2 weeks from landing, I don't > think

Re: saveAsTable in 2.3.2 throws IOException while 2.3.1 works fine?

2018-10-01 Thread Jacek Laskowski
Hi, OK. Sorry for the noise. I don't know why it started working, but I cannot reproduce it anymore. Sorry for the false alarm (but I could swear it didn't work and I changed nothing). Back to work... Regards, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL

Re: [DISCUSS] Syntax for table DDL

2018-10-01 Thread Ryan Blue
What do you mean by consistent with the syntax in SqlBase.g4? These aren’t currently defined, so we need to decide what syntax to support. There are more details below, but the syntax I’m proposing is more standard across databases than Hive, which uses confusing and non-standard syntax. I doubt

Re: [VOTE] SPARK 2.4.0 (RC2)

2018-10-01 Thread Xiangrui Meng
On Mon, Oct 1, 2018 at 9:52 AM Holden Karau wrote: > Oh that does look like an important correctness issue. > -1 > > On Mon, Oct 1, 2018, 9:57 AM Marco Gaido wrote: > >> -1, I was able to reproduce SPARK-25538 with the provided data. >> >> On Mon, Oct 1, 2018 at 09:11, Ted Yu >>

Re: Data source V2 in spark 2.4.0

2018-10-01 Thread Ryan Blue
Hi Assaf, The major changes to the V2 API that you linked to aren’t going into 2.4. Those will be in the next release because they weren’t finished in time for 2.4. Here are the major updates that will be in 2.4: - SPARK-23323: The output

Re: [VOTE] SPARK 2.4.0 (RC2)

2018-10-01 Thread Holden Karau
Oh that does look like an important correctness issue. -1 On Mon, Oct 1, 2018, 9:57 AM Marco Gaido wrote: > -1, I was able to reproduce SPARK-25538 with the provided data. > > On Mon, Oct 1, 2018 at 09:11, Ted Yu wrote: > >> +1 >> >> Original message >> From:

Data source V2 in spark 2.4.0

2018-10-01 Thread assaf.mendelson
Hi all, I understood from previous threads that the Data source V2 API will see some changes in Spark 2.4.0; however, I can't seem to find what these changes are. Is there any documentation that summarizes the changes? The only mention I seem to find is this pull request:

Re: saveAsTable in 2.3.2 throws IOException while 2.3.1 works fine?

2018-10-01 Thread Steve Loughran
On 30 Sep 2018, at 19:37, Jacek Laskowski <ja...@japila.pl> wrote: scala> spark.range(1).write.saveAsTable("demo") 2018-09-30 17:44:27 WARN ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException 2018-09-30 17:44:28 ERROR FileOutputCommitter:314 - Mkdirs

Re: [Structured Streaming SPARK-23966] Why non-atomic rename is problem in State Store ?

2018-10-01 Thread Steve Loughran
On 11 Aug 2018, at 17:33, chandan prakash <chandanbaran...@gmail.com> wrote: Hi All, I was going through this pull request about the new CheckpointFileManager abstraction in structured streaming coming in 2.4: https://issues.apache.org/jira/browse/SPARK-23966

Re: Some PRs not automatically linked to JIRAs

2018-10-01 Thread Hyukjin Kwon
Seems fixed, but it looks like it has started leaving duplicate PR links on some recent JIRAs. Not a big deal, but are they maybe being run in multiple places? For instance, https://issues.apache.org/jira/browse/SPARK-25579 https://issues.apache.org/jira/browse/SPARK-25574

why y.size is 65536 but y size in new dataset is 1000

2018-10-01 Thread hagersaleh
Please help me. The code is written in Spark with Python. The error is: Caused by: java.lang.IllegalArgumentException: requirement failed: BLAS.dot(x: Vector, y:Vector) was given Vectors with non-matching sizes: x.size = 1000, y.size = 65536. Why is y.size 65536 when the y size in the new dataset is 1000? 1-I train model

Re: SPIP: Support Kafka delegation token in Structured Streaming

2018-10-01 Thread Gabor Somogyi
Hi Saisai, The reasons why I originally limited the goal to Structured Streaming are the following: * I haven't seen much interest in new features in the DStream area * Separating the concerns, even if there is a need. All in all, I'm happy to port the feature to DStream if you think it's worth it and you can

Re: SPIP: Support Kafka delegation token in Structured Streaming

2018-10-01 Thread Gabor Somogyi
Hi Jungtaek, Thanks for your comments, I've just responded to them. BR, G On Sat, Sep 29, 2018 at 2:50 PM Jungtaek Lim wrote: > Hi Gabor, > > Thanks for proposing the feature. I'm definitely interested to see this > feature, but honestly I'm not familiar with how Spark deals with delegation > token

Re: [VOTE] SPARK 2.4.0 (RC2)

2018-10-01 Thread Marco Gaido
-1, I was able to reproduce SPARK-25538 with the provided data. On Mon, Oct 1, 2018 at 09:11, Ted Yu wrote: > +1 > > Original message > From: Denny Lee > Date: 9/30/18 10:30 PM (GMT-08:00) > To: Stavros Kontopoulos > Cc: Sean Owen , Wenchen Fan , dev < >

Re: [VOTE] SPARK 2.4.0 (RC2)

2018-10-01 Thread Ted Yu
+1 Original message From: Denny Lee Date: 9/30/18 10:30 PM (GMT-08:00) To: Stavros Kontopoulos Cc: Sean Owen , Wenchen Fan , dev Subject: Re: [VOTE] SPARK 2.4.0 (RC2) +1 (non-binding) On Sat, Sep 29, 2018 at 10:24 AM Stavros Kontopoulos wrote: +1 Stavros On Sat, Sep

RE: saveAsTable in 2.3.2 throws IOException while 2.3.1 works fine?

2018-10-01 Thread Jyoti Ranjan Mahapatra
Hi Jacek, The issue might not be very widespread. I couldn’t reproduce it. Can you see if I am doing anything incorrect in the below queries? scala> spark.range(10).write.saveAsTable("t1") scala> spark.sql("describe formatted t1").show(100, false)

Re: saveAsTable in 2.3.2 throws IOException while 2.3.1 works fine?

2018-10-01 Thread Jacek Laskowski
Hi Sean, Thanks again for helping me remain sane and confirming that the issue is not imaginary :) I'd expect spark-warehouse to be in the directory where spark-shell is executed (which is what has always been used for the metastore). I'm reviewing all the changes between 2.3.1..2.3.2 to find anything