Re: Log4j 1.2.17 spark CVE

2021-12-13 Thread Qian Sun
My understanding is that we don’t need to do anything. Log4j2-core is not used in Spark. > On Dec 13, 2021, at 12:45 PM, Pralabh Kumar wrote: > > Hi developers, users > > Spark is built using log4j 1.2.17. Is there a plan to upgrade based on > recent CVE detected ? > > > Regards > Pralabh kumar

Re: Log4j 1.2.17 spark CVE

2021-12-13 Thread Sean Owen
You would want to shade this dependency in your app, in which case you would be using log4j 2. If you don't shade and just include it, you will also be using log4j 2 as some of the API classes are different. If they overlap with log4j 1, you will probably hit errors anyway. On Mon, Dec 13, 2021
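
For readers who build with sbt, the shading suggested above could look roughly like the following build fragment. This is a sketch only: it assumes the sbt-assembly plugin is already enabled in the project, and the target prefix `myapp.shaded.log4j2` is a hypothetical name chosen for illustration. Whether a relocated log4j 2 behaves correctly in a given application is not something this thread confirms.

```scala
// build.sbt fragment (assumes the sbt-assembly plugin is already enabled).
// Relocates log4j 2 classes bundled in the application jar so they do not
// clash with the log4j 1.x classes on the Spark classpath.
// "myapp.shaded.log4j2" is a hypothetical prefix, not a required name.
assembly / assemblyShadeRules := Seq(
  ShadeRule
    .rename("org.apache.logging.log4j.**" -> "myapp.shaded.log4j2.@1")
    .inAll
)
```

log4j 2 lives under `org.apache.logging.log4j`, while log4j 1.x uses `org.apache.log4j`, so the rule above leaves the 1.x classes untouched.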

Re: Log4j 1.2.17 spark CVE

2021-12-13 Thread James Yu
Question: Spark uses log4j 1.2.17. If my application jar contains log4j 2.x and gets submitted to the Spark cluster, which version of log4j actually gets used during the Spark session? From: Sean Owen Sent: Monday, December 13, 2021 8:25 AM To: Jörn Franke Cc:

Re: spark 3.2.0 the different dataframe createOrReplaceTempView the same name TempView

2021-12-13 Thread Daniel de Oliveira Mantovani
You are correct, I understand. My only concern is the backward compatibility problem: this worked in previous versions of Apache Spark. It's painful when an OOTB feature breaks without documentation or a workaround like "spark.sql.legacy.keepSqlRecursive" true/false. It's not about "my code", it

Re: spark 3.2.0 the different dataframe createOrReplaceTempView the same name TempView

2021-12-13 Thread Sean Owen
I think we're going around in circles - you should not do this. You essentially have "__TABLE__ = SELECT * FROM __TABLE__" and I hope it's clear why that can't work in general. At first execution, sure, maybe "old" __TABLE__ refers to "SELECT 1", but what about the second time? If you stick to that

Re: spark 3.2.0 the different dataframe createOrReplaceTempView the same name TempView

2021-12-13 Thread Daniel de Oliveira Mantovani
I've reduced the code to reproduce the issue:
val df = spark.sql("SELECT 1")
df.createOrReplaceTempView("__TABLE__")
spark.sql("SELECT * FROM __TABLE__").show
val df2 = spark.sql("SELECT *,2 FROM __TABLE__")
df2.createOrReplaceTempView("__TABLE__") // Exception in Spark 3.2 but works for Spark

Re: spark 3.2.0 the different dataframe createOrReplaceTempView the same name TempView

2021-12-13 Thread Sean Owen
_shrug_ I think this is a bug fix, unless I am missing something here. You shouldn't just use __TABLE__ for everything, and I'm not seeing a good reason to do that other than it's what you do now. I'm not clear if it's coming across that this _can't_ work in the general case. On Mon, Dec 13, 2021

Re: spark 3.2.0 the different dataframe createOrReplaceTempView the same name TempView

2021-12-13 Thread Daniel de Oliveira Mantovani
In this context, I don't want to worry about the name of the temporary table. That's why it is "__TABLE__". The point is that this behavior in Spark 3.2.x breaks backward compatibility with all previous versions of Apache Spark. In my opinion we should at least have some flag like

Re: spark 3.2.0 the different dataframe createOrReplaceTempView the same name TempView

2021-12-13 Thread Sean Owen
You can replace temp views. Again: what you can't do here is define a temp view in terms of itself. If you are reusing the same name over and over, it's probably easy to do that, so you don't want to do that. You want different names for different temp views, or else ensure you aren't doing the

Re: spark 3.2.0 the different dataframe createOrReplaceTempView the same name TempView

2021-12-13 Thread Daniel de Oliveira Mantovani
I didn't post the SO issue; I just found the same exception I'm facing with Spark 3.2. Almaren Framework has a concept of creating temporary views with the name "__TABLE__". For example, if you want to use the SQL dialect on a DataFrame to join a table, aggregate, apply a function, or whatever. Instead of

Re: spark 3.2.0 the different dataframe createOrReplaceTempView the same name TempView

2021-12-13 Thread Sean Owen
If the issue is what you posted in SO, I think the stack trace explains it already. You want to avoid this recursive definition, which in general can't work. I think it's simply explicitly disallowed in all cases now, but, you should not be depending on this anyway - why can't this just be

Re: Log4j 1.2.17 spark CVE

2021-12-13 Thread Sean Owen
This has come up several times over the years - search JIRA. The very short summary is: Spark does not use log4j 1.x, but its dependencies do, and that's the issue. Anyone that can successfully complete the surgery at this point is welcome to, but I failed ~2 years ago. On Mon, Dec 13, 2021 at 10:02

Re: Log4j 1.2.17 spark CVE

2021-12-13 Thread Martin Wunderlich
There is a discussion on GitHub on this topic and the recommendation is to upgrade from 1.x to 2.15.0, due to the vulnerability of 1.x: https://github.com/apache/logging-log4j2/pull/608 This discussion is also referenced by the German Federal Office for Information Security:

Re: spark 3.2.0 the different dataframe createOrReplaceTempView the same name TempView

2021-12-13 Thread Daniel de Oliveira Mantovani
Sean, https://github.com/music-of-the-ainur/almaren-framework/tree/spark-3.2 Just executing "sbt test" will reproduce the error. The same code works for Spark 2.3.x, 2.4.x and 3.1.x, so why doesn't it work for Spark 3.2? Thank you so much On Mon, Dec 13, 2021 at 12:59 PM Sean Owen wrote: >

Re: Log4j 1.2.17 spark CVE

2021-12-13 Thread Jörn Franke
Is it in any case appropriate to use log4j 1.x, which is no longer maintained and has other security vulnerabilities that won’t be fixed? > On 13.12.2021 at 06:06, Sean Owen wrote: > > Check the CVE - the log4j vulnerability appears to affect log4j 2, not 1.x. > There was

Re: spark 3.2.0 the different dataframe createOrReplaceTempView the same name TempView

2021-12-13 Thread Sean Owen
... but the error is not "because that already exists". See your stack trace. It's because the definition is recursive. You define temp view test1, create a second DF from it, and then redefine test1 as that result. test1 depends on test1. On Mon, Dec 13, 2021 at 9:58 AM Daniel de Oliveira
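
To make the "test1 depends on test1" point concrete, here is a minimal, self-contained Scala sketch of a toy view catalog. It is not Spark's catalog and does not show how Spark detects the problem internally. The names `ViewCycleSketch`, `dependsOn`, and `createOrReplace` are hypothetical, and each view is modeled simply as the set of view names its definition reads from.

```scala
// Toy model only: NOT Spark's implementation. A catalog maps each view name
// to the set of view names that its definition refers to.
object ViewCycleSketch {
  type Catalog = Map[String, Set[String]]

  // True if `target` can be reached from `start` by following references.
  // `seen` guards against looping on cycles that do not involve `target`.
  def dependsOn(catalog: Catalog, start: String, target: String,
                seen: Set[String] = Set.empty): Boolean =
    catalog.getOrElse(start, Set.empty[String]).exists { ref =>
      ref == target ||
        (!seen.contains(ref) && dependsOn(catalog, ref, target, seen + ref))
    }

  // Register or replace `name`. The replacement is rejected when the new
  // definition would, directly or indirectly, refer back to `name` itself.
  def createOrReplace(catalog: Catalog, name: String,
                      refs: Set[String]): Either[String, Catalog] = {
    val updated = catalog.updated(name, refs)
    if (dependsOn(updated, name, name)) Left(s"Recursive view $name detected")
    else Right(updated)
  }
}
```

In this toy model, replacing `test1` with a definition that reads from `test1` is rejected, while registering the same definition under a different name such as `test2` is accepted. That mirrors the advice in this thread to use distinct names for distinct temp views.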

Re: spark 3.2.0 the different dataframe createOrReplaceTempView the same name TempView

2021-12-13 Thread Daniel de Oliveira Mantovani
Sean, The method name is very clear: "createOrReplaceTempView". It doesn't make any sense to throw an exception because this view already exists. Spark 3.2.x is breaking backward compatibility for no reason. On Mon, Dec 13, 2021 at 12:53 PM Sean Owen wrote: > The error looks 'valid' - you

Re: spark 3.2.0 the different dataframe createOrReplaceTempView the same name TempView

2021-12-13 Thread Sean Owen
The error looks 'valid' - you define a temp view in terms of its own previous version, which doesn't quite make sense - somewhere the new definition depends on the old definition. I think it just correctly surfaces as an error now. On Mon, Dec 13, 2021 at 9:41 AM Daniel de Oliveira Mantovani <

spark 3.2.0 the different dataframe createOrReplaceTempView the same name TempView

2021-12-13 Thread Daniel de Oliveira Mantovani
Hello team, I found this issue while porting my project from Apache Spark 3.1.x to 3.2.x. https://stackoverflow.com/questions/69937415/spark-3-2-0-the-different-dataframe-createorreplacetempview-the-same-name-tempvi Do we have a bug for that in apache-spark, or do I need to create one?

Re: About some Spark technical assistance

2021-12-13 Thread sam smith
You were added to the repo to contribute, thanks. I included the Java class and the paper I am replicating. On Mon, Dec 13, 2021 at 04:27, wrote: > github url please. > > On 2021-12-13 01:06, sam smith wrote: > > Hello guys, > > > > I am replicating a paper's algorithm (graph coloring