Re: Distinct on Map data type -- SPARK-19893

2018-01-12 Thread HariKrishnan CK
Hi Wan, could you please be more specific about the scenarios where it will give wrong results? I checked the distinct and intersect operators in many use cases I have and could not find a failure scenario that gives wrong results. Thanks. On Jan 12, 2018 7:36 PM, "Wenchen Fan"

Re: Distinct on Map data type -- SPARK-19893

2018-01-12 Thread Wenchen Fan
Actually, Spark 2.1.0 doesn't work for your case; it may give you wrong results... We are still working on adding this feature, but until then we should fail early instead of returning wrong results. On Sat, Jan 13, 2018 at 11:02 AM, ckhari4u wrote: > I see SPARK-19893 is
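To make the "wrong result" concrete, here is a minimal, Spark-free Java sketch. It assumes, as a simplification rather than a description of Spark's code, that distinct compares map values by their stored entry order, which is roughly what a byte-wise comparison of serialized map data amounts to. Under that assumption, two maps with the same entries inserted in different orders count as two distinct rows, while sorting the entries first collapses them into one. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical illustration; this is not how Spark implements distinct.
final class MapDistinctSketch {

    // Naive key: key/value pairs in the map's own iteration (insertion) order,
    // standing in for a byte-wise comparison of serialized map data.
    static List<Object> naiveKey(Map<String, Integer> m) {
        List<Object> key = new ArrayList<>();
        for (Map.Entry<String, Integer> e : m.entrySet()) {
            key.add(e.getKey());
            key.add(e.getValue());
        }
        return key;
    }

    // Canonical key: the same pairs sorted by map key, so insertion order is irrelevant.
    static List<Object> canonicalKey(Map<String, Integer> m) {
        return naiveKey(new TreeMap<>(m));
    }

    // Counts distinct rows under the given notion of row equality.
    static int distinctCount(List<Map<String, Integer>> rows,
                             Function<Map<String, Integer>, List<Object>> keyFn) {
        Set<List<Object>> seen = new HashSet<>();
        for (Map<String, Integer> row : rows) {
            seen.add(keyFn.apply(row));
        }
        return seen.size();
    }

    // Builds an insertion-ordered map from alternating key/value arguments.
    static Map<String, Integer> ordered(Object... kv) {
        Map<String, Integer> m = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            m.put((String) kv[i], (Integer) kv[i + 1]);
        }
        return m;
    }

    public static void main(String[] args) {
        // Logically equal maps whose entries were inserted in different orders.
        List<Map<String, Integer>> rows =
                List.of(ordered("a", 1, "b", 2), ordered("b", 2, "a", 1));
        System.out.println("naive distinct:     " + distinctCount(rows, MapDistinctSketch::naiveKey));
        System.out.println("canonical distinct: " + distinctCount(rows, MapDistinctSketch::canonicalKey));
    }
}
```

Running `main` prints a naive count of 2 and a canonical count of 1. Canonicalizing (sorting) the entries is one way to define map equality; until something like that exists, failing early, as described above, avoids silently returning the naive answer.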

Distinct on Map data type -- SPARK-19893

2018-01-12 Thread ckhari4u
I see SPARK-19893 is backported to Spark 2.1 and 2.0.1 as well. I do not see a clear justification for why SPARK-19893 is important and needed. I have a sample table which works fine with an earlier build of Spark 2.1.0. Now that the latest build has the backport of SPARK-19893, it's failing

Re: [VOTE] Spark 2.3.0 (RC1)

2018-01-12 Thread Anirudh Ramanathan
Felix just pointed out to me that staging is missing the spark-kubernetes package. I think we missed updating release-build.sh, which is why staging

Re: Build timed out for `branch-2.3 (hadoop-2.7)`

2018-01-12 Thread Shixiong(Ryan) Zhu
FYI, we reverted a commit in https://github.com/apache/spark/commit/55dbfbca37ce4c05f83180777ba3d4fe2d96a02e to fix the issue. On Fri, Jan 12, 2018 at 11:45 AM, Xin Lu wrote: > seems like someone should investigate what caused the build time to go up > an hour and if it's

[VOTE] Spark 2.3.0 (RC1)

2018-01-12 Thread Sameer Agarwal
Please vote on releasing the following candidate as Apache Spark version 2.3.0. The vote is open until Thursday, January 18, 2018 at 8:00:00 am UTC and passes if a majority of at least 3 PMC +1 votes are cast.

[ ] +1 Release this package as Apache Spark 2.3.0
[ ] -1 Do not release this package
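As a hedged sketch of the pass condition stated above, assuming the usual ASF reading of a release vote (only PMC votes are binding; the vote needs at least three binding +1 votes and more binding +1 than -1 votes), a tally could look like this. The class and method names are hypothetical.

```java
// Hypothetical tally of a release vote; only binding (PMC) votes are counted.
final class ReleaseVoteSketch {

    // Assumed rule: at least three binding +1 votes, and more +1 than -1 votes.
    static boolean passes(int bindingPlusOnes, int bindingMinusOnes) {
        return bindingPlusOnes >= 3 && bindingPlusOnes > bindingMinusOnes;
    }

    public static void main(String[] args) {
        System.out.println(passes(3, 0)); // true: enough binding support
        System.out.println(passes(2, 0)); // false: too few binding +1 votes
        System.out.println(passes(4, 4)); // false: no majority
    }
}
```

Under this reading a single -1 is not a veto on its own; it only matters through the "more +1 than -1" comparison.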

Re: Kubernetes: why use init containers?

2018-01-12 Thread Anirudh Ramanathan
That's fair - I guess it would be a stretch to assume users wouldn't put custom logic in their init containers if that hook is provided to them. :) Experimental sounds like a good idea for 2.3. Gives us enough wriggle room for the next one, and hopefully user feedback in the meantime. Thanks,

Re: Kubernetes: why use init containers?

2018-01-12 Thread Marcelo Vanzin
On Fri, Jan 12, 2018 at 1:53 PM, Anirudh Ramanathan wrote: > As I understand, the bigger changes discussed here, like the init > containers, will be more on the implementation side than a user-facing > change/behavioral change - which is why it seemed okay to

Re: Kubernetes: why use init containers?

2018-01-12 Thread Anirudh Ramanathan
I'd like to discuss the criteria here for graduating from experimental status. (As a fork, we were mentioned in the documentation as experimental.) As I understand, the bigger changes discussed here, like the init containers, will be more on the implementation side than a user-facing

Re: Kubernetes: why use init containers?

2018-01-12 Thread Andrew Ash
+1 on the first release being marked experimental. Many major features coming into Spark in the past have gone through a stabilization process On Fri, Jan 12, 2018 at 1:18 PM, Marcelo Vanzin wrote: > BTW I most probably will not have time to get back to this at any time >

Re: Kubernetes: why use init containers?

2018-01-12 Thread Marcelo Vanzin
BTW I most probably will not have time to get back to this at any time soon, so if anyone is interested in doing some clean up, I'll leave my branch up. I'm seriously thinking about proposing that we document the k8s backend as experimental in 2.3; it seems there is still a lot to be cleaned up in

Re: Build timed out for `branch-2.3 (hadoop-2.7)`

2018-01-12 Thread Xin Lu
seems like someone should investigate what caused the build time to go up an hour and if it's expected or not. On Thu, Jan 11, 2018 at 7:37 PM, Dongjoon Hyun wrote: > Hi, All and Shane. > > Can we increase the build time for `branch-2.3` during 2.3 RC period? > > There

Re: Kubernetes: why use init containers?

2018-01-12 Thread Marcelo Vanzin
On Fri, Jan 12, 2018 at 4:13 AM, Eric Charles wrote: >> Again, I don't see what all this hoopla is about fine-grained control >> of dependency downloads. Spark solved this years ago for Spark >> applications. Don't reinvent the wheel. > > Init-containers are used today to

Re: Schema Evolution in Apache Spark

2018-01-12 Thread Dongjoon Hyun
This is about Spark-layer test cases on **read-only** CSV, JSON, Parquet, ORC files. You can find more details and comparisons in terms of Spark support coverage. Bests, Dongjoon. On Thu, Jan 11, 2018 at 22:19 Georg Heiler wrote: > Isn't this related to the data

Re: Compiling Spark UDF at runtime

2018-01-12 Thread Georg Heiler
You could store the jar in HDFS. Then even in YARN cluster mode your given workaround should work. Michael Shtelma wrote on Fri, 12 Jan 2018 at 12:58: > Hi all, > > I would like to be able to compile Spark UDF at runtime. Right now I > am using Janino for that. > My problem

Compiling Spark UDF at runtime

2018-01-12 Thread Michael Shtelma
Hi all, I would like to be able to compile Spark UDF at runtime. Right now I am using Janino for that. My problem is that, in order to make my compiled functions visible to Spark, I have to set the Janino class loader (Janino gives me a class loader with the compiled UDF classes) as the context class loader
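A common Java pattern for the situation described above is to install a loader as the thread's context class loader only for the duration of a call and restore the previous one afterwards. The sketch below is hypothetical: `withContextClassLoader` is not a Spark or Janino API, and the stand-in loader replaces the one Janino would hand back. The context class loader is per thread and per JVM, so this only affects the calling driver thread; executors in YARN cluster mode run in other JVMs, which is why shipping the compiled classes in a jar, as suggested in the reply above, may still be necessary.

```java
import java.util.function.Supplier;

// Hypothetical helper; not part of Spark or Janino.
final class ContextClassLoaderSketch {

    // Runs body with loader installed as the current thread's context class loader,
    // then restores the previous loader, even if body throws.
    static <T> T withContextClassLoader(ClassLoader loader, Supplier<T> body) {
        Thread thread = Thread.currentThread();
        ClassLoader previous = thread.getContextClassLoader();
        thread.setContextClassLoader(loader);
        try {
            return body.get();
        } finally {
            thread.setContextClassLoader(previous);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the class loader that holds the runtime-compiled UDF classes.
        ClassLoader compiledUdfLoader =
                new ClassLoader(ContextClassLoaderSketch.class.getClassLoader()) { };
        ClassLoader before = Thread.currentThread().getContextClassLoader();

        boolean swapped = withContextClassLoader(compiledUdfLoader,
                () -> Thread.currentThread().getContextClassLoader() == compiledUdfLoader);

        System.out.println("swapped inside: " + swapped);
        System.out.println("restored after: "
                + (Thread.currentThread().getContextClassLoader() == before));
    }
}
```

Scoping the swap with `try`/`finally` keeps the custom loader from leaking into unrelated work that later runs on the same thread.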

Re: Accessing the SQL parser

2018-01-12 Thread Michael Shtelma
Hi AbdealiJK, In order to get the AST, you can parse your query with the Spark parser: LogicalPlan logicalPlan = sparkSession.sessionState().sqlParser().parsePlan("select * from myTable"); Afterwards you can implement your custom logic and execute it in this way: Dataset ds =

Re: [SQL] parse_url does not work for Internationalized domain names ?

2018-01-12 Thread yash datta
Thanks for the prompt reply! I opened a ticket here: https://issues.apache.org/jira/browse/SPARK-23056 BR Yash On Fri, Jan 12, 2018 at 3:41 PM, StanZhai wrote: > This problem was introduced by > which is designed to >
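For context, here is a hedged Java sketch of the symptom and one possible workaround. It assumes (an assumption, not taken from the thread) that the host is extracted with `java.net.URI`: `URI.getHost()` returns `null` when the host contains non-ASCII characters, because such an authority is not parsed as a server-based hostname. Converting the host to its ASCII (Punycode) form with `java.net.IDN.toASCII` first gives `URI` a hostname it accepts. The helper names are hypothetical, and the conversion assumes an authority with no user-info part.

```java
import java.net.IDN;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helpers illustrating IDN host extraction; not Spark code.
final class IdnHostSketch {

    // Host as java.net.URI reports it; null when the authority is not a valid ASCII hostname.
    static String rawHost(String url) {
        try {
            return new URI(url).getHost();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Host after converting an internationalized name to ASCII (Punycode).
    // Assumes the authority has no user-info and at most a trailing ":port".
    static String asciiHost(String url) {
        try {
            URI uri = new URI(url);
            if (uri.getHost() != null) {
                return uri.getHost();
            }
            String authority = uri.getAuthority();
            if (authority == null) {
                return null;
            }
            int colon = authority.lastIndexOf(':');
            String host = colon >= 0 ? authority.substring(0, colon) : authority;
            // Re-parse so the converted name is validated by the same URI rules.
            return new URI("http://" + IDN.toASCII(host)).getHost();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // The IANA test domain "example.test" in Chinese, written with Unicode escapes.
        String url = "http://\u4f8b\u5b50.\u6d4b\u8bd5/path?q=1";
        System.out.println("raw host:   " + rawHost(url));
        System.out.println("ASCII host: " + asciiHost(url));
    }
}
```

The sketch only shows where a `null` host can come from under that assumption; how Spark should handle such URLs is the subject of the ticket.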