Re: Creating a memory-efficient AggregateFunction to calculate Median

2021-12-13 Thread Nicholas Chammas
Yeah, I think approximate percentile is good enough most of the time. I don't have a specific need for a precise median. I was interested in implementing it more as a Catalyst learning exercise, but it turns out I picked a bad learning exercise to solve. :) On Mon, Dec 13, 2021 at 9:46 PM

Re: Creating a memory-efficient AggregateFunction to calculate Median

2021-12-13 Thread Reynold Xin
tl;dr: there's no easy way to implement aggregate expressions that would require multiple passes over the data. It is simply not something that's supported, and supporting it would be very costly. Would you be OK using approximate percentile? That's relatively cheap. On Mon, Dec 13, 2021 at 6:43 PM,
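Approximate percentile is cheap because its state has a fixed size, is filled in a single pass, and partial results from different partitions can be merged. Here is a minimal sketch of that idea. It is not Spark's implementation: it assumes the value range [lo, hi) is known in advance and uses a plain fixed-width histogram, which is a much simpler technique than whatever Spark's `percentile_approx` uses internally. The class name `HistogramMedian` is hypothetical.

```python
from __future__ import annotations


class HistogramMedian:
    """Approximate median over [lo, hi) with equal-width buckets (hypothetical helper).

    Memory is O(num_buckets) no matter how many values are added, and two
    sketches with identical buckets merge by adding counts, so a single
    pass over the data is enough.
    """

    def __init__(self, lo: float, hi: float, num_buckets: int = 1000) -> None:
        if not lo < hi or num_buckets < 1:
            raise ValueError("need lo < hi and num_buckets >= 1")
        self.lo, self.hi, self.num_buckets = lo, hi, num_buckets
        self.width = (hi - lo) / num_buckets
        self.counts = [0] * num_buckets
        self.total = 0

    def update(self, value: float | None) -> None:
        if value is None:  # ignore nulls, as SQL aggregates typically do
            return
        # Assumption: out-of-range values are clamped into the edge buckets.
        idx = int((value - self.lo) / self.width)
        idx = min(max(idx, 0), self.num_buckets - 1)
        self.counts[idx] += 1
        self.total += 1

    def merge(self, other: HistogramMedian) -> None:
        if (self.lo, self.hi, self.num_buckets) != (other.lo, other.hi, other.num_buckets):
            raise ValueError("can only merge sketches with identical buckets")
        self.counts = [a + b for a, b in zip(self.counts, other.counts)]
        self.total += other.total

    def evaluate(self) -> float | None:
        if self.total == 0:
            return None
        # Find the bucket holding the lower median (rank (n + 1) // 2) and
        # report its midpoint; for in-range inputs this is within about half
        # a bucket width of that value.
        target = (self.total + 1) // 2
        seen = 0
        for idx, count in enumerate(self.counts):
            seen += count
            if seen >= target:
                return self.lo + (idx + 0.5) * self.width
        return None  # not reached when total > 0


if __name__ == "__main__":
    # Two "partitions" aggregated separately, then merged.
    left, right = HistogramMedian(0, 100, 100), HistogramMedian(0, 100, 100)
    for v in range(0, 49):
        left.update(v)
    for v in range(49, 99):
        right.update(v)
    left.merge(right)
    print(left.evaluate())  # 49.5; the exact median of 0..98 is 49
```

With 100 buckets the state stays at 100 counters however many rows arrive. An exact median has no such fixed-size, mergeable state, which is why it needs either every value in memory or extra passes over the data.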

Re: Creating a memory-efficient AggregateFunction to calculate Median

2021-12-13 Thread Nicholas Chammas
No takers here? :) I can see now why a median function is not available in most data processing systems. It's pretty annoying to implement! On Thu, Dec 9, 2021 at 9:25 PM Nicholas Chammas wrote: > I'm trying to create a new aggregate function. It's my first time working > with Catalyst, so
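Median is awkward as an aggregate because Spark's imperative aggregates roughly follow an initialize/update/merge/evaluate lifecycle: partial results are built per partition and then merged. The Python sketch below only mimics that shape (it is not Catalyst's API, and `MedianBuffer` is a hypothetical name) to show that an exact median's buffer has to keep every value in the group, so its memory grows with the group size.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MedianBuffer:
    """Hypothetical per-group buffer for an exact median.

    It must retain every input value, so its size grows linearly with
    the number of rows in the group.
    """

    values: list[float] = field(default_factory=list)

    def update(self, value: float | None) -> None:
        # Skip nulls, the way SQL aggregates usually do.
        if value is not None:
            self.values.append(value)

    def merge(self, other: MedianBuffer) -> None:
        # Combining two partial aggregates means concatenating their
        # full value lists; nothing can be summarized away.
        self.values.extend(other.values)

    def evaluate(self) -> float | None:
        if not self.values:
            return None
        ordered = sorted(self.values)
        n = len(ordered)
        mid = n // 2
        if n % 2 == 1:
            return ordered[mid]
        return (ordered[mid - 1] + ordered[mid]) / 2


if __name__ == "__main__":
    left, right = MedianBuffer(), MedianBuffer()
    for v in [5.0, 1.0, None]:
        left.update(v)
    for v in [3.0, 8.0]:
        right.update(v)
    left.merge(right)
    print(left.evaluate())  # values 1, 3, 5, 8 -> (3 + 5) / 2 = 4.0
```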

Re: Log4j 1.2.17 spark CVE

2021-12-13 Thread Qian Sun
My understanding is that we don’t need to do anything. Log4j2-core is not used in Spark. > On Dec 13, 2021, at 12:45 PM, Pralabh Kumar wrote: > > Hi developers, users > > Spark is built using log4j 1.2.17. Is there a plan to upgrade based on the recent CVE detected? > > > Regards > Pralabh kumar
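Whether log4j 2 core is on a given classpath can be checked by looking inside the jars themselves. A minimal sketch, assuming you pass the jar paths you care about (for example the files under Spark's `jars/` directory): it reports whether a jar bundles log4j 2 core classes and, in particular, `org/apache/logging/log4j/core/lookup/JndiLookup.class`, the class whose removal was the widely published mitigation for CVE-2021-44228. The function name `log4j2_report` is hypothetical, and jars nested inside fat jars are not inspected.

```python
from __future__ import annotations

import sys
import zipfile

# Entry whose removal from log4j-core was the published CVE-2021-44228 mitigation.
JNDI_LOOKUP = "org/apache/logging/log4j/core/lookup/JndiLookup.class"
# Package prefix of log4j 2's core implementation (log4j-core).
LOG4J2_CORE_PREFIX = "org/apache/logging/log4j/core/"


def log4j2_report(jar_path: str) -> dict[str, bool]:
    """Report whether one jar bundles log4j 2 core classes and JndiLookup."""
    with zipfile.ZipFile(jar_path) as jar:
        names = set(jar.namelist())
    return {
        "log4j2_core": any(n.startswith(LOG4J2_CORE_PREFIX) for n in names),
        "jndi_lookup": JNDI_LOOKUP in names,
    }


if __name__ == "__main__":
    # Usage (script name is hypothetical): python log4j2_check.py "$SPARK_HOME"/jars/*.jar
    for path in sys.argv[1:]:
        print(path, log4j2_report(path))
```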

Re: Log4j 1.2.17 spark CVE

2021-12-13 Thread Sean Owen
You would want to shade this dependency in your app, in which case you would be using log4j 2. If you don't shade and just include it, you will also be using log4j 2 as some of the API classes are different. If they overlap with log4j 1, you will probably hit errors anyway. On Mon, Dec 13, 2021
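The overlap Sean mentions can be checked directly: class files that appear under the same name in more than one jar are candidates for conflicts, and shading (relocating packages) renames the app's copies so they no longer collide. A minimal sketch, using the hypothetical helper `overlapping_classes`, that lists `.class` entries shared by two or more jars, for example your assembly jar against Spark's `jars/` directory:

```python
from __future__ import annotations

import sys
import zipfile
from collections import defaultdict


def overlapping_classes(jar_paths: list[str]) -> dict[str, list[str]]:
    """Map each .class entry found in more than one jar to the jars containing it."""
    owners: dict[str, list[str]] = defaultdict(list)
    for path in jar_paths:
        with zipfile.ZipFile(path) as jar:
            # set() so a jar listing the same entry twice is counted once.
            for name in set(jar.namelist()):
                if name.endswith(".class"):
                    owners[name].append(path)
    return {name: jars for name, jars in owners.items() if len(jars) > 1}


if __name__ == "__main__":
    # Usage (script name is hypothetical):
    #   python overlap.py my-app-assembly.jar "$SPARK_HOME"/jars/*.jar
    for cls, jars in sorted(overlapping_classes(sys.argv[1:]).items()):
        print(cls, "->", ", ".join(jars))
```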

Re: Log4j 1.2.17 spark CVE

2021-12-13 Thread Sean Owen
This has come up several times over the years - search JIRA. The very short summary is: Spark does not use log4j 1.x, but its dependencies do, and that's the issue. Anyone that can successfully complete the surgery at this point is welcome to, but I failed ~2 years ago. On Mon, Dec 13, 2021 at 10:02
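Which dependencies use log4j 1.x can be approximated without a build tool: compiled class files store referenced class names such as `org/apache/log4j/Logger` as plain strings in their constant pool, so a byte search over each class is a workable heuristic (it can also match unrelated string constants). A minimal sketch, assuming a directory of jars such as Spark's `jars/`; `log4j1_users` is a hypothetical name, and reading every class file in a large directory will be slow.

```python
from __future__ import annotations

import sys
import zipfile
from pathlib import Path

# Internal-name prefix of log4j 1.x classes as it appears inside class files.
LOG4J1_PREFIX = "org/apache/log4j/"


def log4j1_users(jars_dir: str) -> dict[str, int]:
    """Count, per jar, the class files that appear to reference log4j 1.x.

    Heuristic: a byte search for the package prefix in each class file.
    Classes that live under the prefix themselves (log4j 1.x, or a bridge
    that re-implements its API) are skipped, since we want its users.
    """
    needle = LOG4J1_PREFIX.encode("ascii")
    report: dict[str, int] = {}
    for jar_path in sorted(Path(jars_dir).glob("*.jar")):
        count = 0
        with zipfile.ZipFile(jar_path) as jar:
            for name in jar.namelist():
                if not name.endswith(".class") or name.startswith(LOG4J1_PREFIX):
                    continue
                if needle in jar.read(name):
                    count += 1
        if count:
            report[jar_path.name] = count
    return report


if __name__ == "__main__":
    # Usage (script name is hypothetical): python log4j1_users.py "$SPARK_HOME/jars"
    for jar_name, n in sorted(log4j1_users(sys.argv[1]).items()):
        print(f"{jar_name}: {n} class file(s) reference {LOG4J1_PREFIX}")
```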

Re: Log4j 1.2.17 spark CVE

2021-12-13 Thread Jörn Franke
Is it appropriate at all to use log4j 1.x, which is no longer maintained and has other security vulnerabilities that won’t be fixed? > On 13.12.2021, at 06:06, Sean Owen wrote: > > Check the CVE - the log4j vulnerability appears to affect log4j 2, not 1.x. > There was

Re: Hi Team, I put a UDF-Utils jar on Google Cloud Storage, but I can't run it

2021-12-13 Thread Mich Talebzadeh
Probably it is because you are using an older version of Spark. This works for version 3.1.1:
gsutil ls gs://etcbucket/ojdbc8.jar
gs://etcbucket/ojdbc8.jar
spark-sql add jar gs://etcbucket/ojdbc8.jar
21/12/13 08:56:00 WARN NativeCodeLoader: Unable to load native-hadoop library for your
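The two steps above can be scripted as a quick reproduction. This is a hedged sketch, not a tested recipe: it assumes `gsutil` and `spark-sql` are on the PATH and that the cluster can read `gs://` paths (as on the 3.1.1 setup shown), and it uses the `spark-sql -e` option to run a single SQL statement. `gcs_jar_commands` and `run` are hypothetical names.

```python
from __future__ import annotations

import subprocess


def gcs_jar_commands(jar_uri: str) -> list[list[str]]:
    """Build the two commands from the example: list the jar, then ADD JAR it."""
    if not jar_uri.startswith("gs://"):
        raise ValueError("expected a gs:// URI")
    return [
        ["gsutil", "ls", jar_uri],
        # -e runs one quoted SQL string and exits.
        ["spark-sql", "-e", f"ADD JAR {jar_uri}"],
    ]


def run(jar_uri: str) -> None:
    for cmd in gcs_jar_commands(jar_uri):
        # check=True stops at the first failing step (e.g. the jar is not found).
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run("gs://etcbucket/ojdbc8.jar")
```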