Re: Code fails when AQE enabled in Spark 3.1

2022-01-31 Thread Gaspar Muñoz
It looks like this commit (https://github.com/apache/spark/commit/a85490659f45410be3588c669248dc4f534d2a71) does the trick. Don't you think this bug is important enough to include in the 3.1 branch? Regards On Thu, Jan 20, 2022 at 8:55 AM, Gaspar Muñoz () wrote: > Hi guys, >
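A hedged workaround sketch while a backport is pending (it assumes the failure is tied to adaptive query execution, per the thread subject; the config key is the standard Spark SQL one):

```
from pyspark.sql import SparkSession

# Sketch: disable adaptive query execution as a temporary workaround,
# assuming the failure only appears with AQE enabled on Spark 3.1.
spark = (
    SparkSession.builder
    .appName("aqe-workaround")
    .config("spark.sql.adaptive.enabled", "false")  # AQE off
    .getOrCreate()
)
```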

Re: [EXTERNAL] Fwd: Log4j upgrade in spark binary from 1.2.17 to 2.17.1

2022-01-31 Thread Martin Grigorov
Hi, On Mon, Jan 31, 2022 at 7:57 PM KS, Rajabhupati wrote: > Thanks a lot Sean. One final question before I close the conversation: how do > we know which features will be added as part of the Spark 3.3 > release? > There will be release notes for 3.3 linked at

bucketBy in pyspark not retaining partition information

2022-01-31 Thread Nitin Siwach
I am reading two datasets that I saved to disk with the ```bucketBy``` option, on the same key and with the same number of buckets. When I read them back and join them, the join should not result in a shuffle. But that isn't what I am seeing. The following code demonstrates the alleged
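Since the code is cut off in the archive, here is a minimal sketch of the pattern being described (table names, key, and bucket count are illustrative; note that ```bucketBy``` requires ```saveAsTable```, and matching bucket specs on both sides should let the join skip the exchange):

```
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

df1 = spark.range(0, 100000).selectExpr("id", "id * 2 AS v1")
df2 = spark.range(0, 100000).selectExpr("id", "id * 3 AS v2")

# Write both datasets bucketed by the join key with the same bucket count.
df1.write.bucketBy(16, "id").sortBy("id").mode("overwrite").saveAsTable("t1")
df2.write.bucketBy(16, "id").sortBy("id").mode("overwrite").saveAsTable("t2")

# Read back and join on the bucketed key, then inspect the plan:
# no Exchange before the SortMergeJoin means the shuffle was avoided.
spark.table("t1").join(spark.table("t2"), "id").explain()
```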

Re: [EXTERNAL] Fwd: Log4j upgrade in spark binary from 1.2.17 to 2.17.1

2022-01-31 Thread KS, Rajabhupati
Thanks a lot Sean. One final question before I close the conversation: how do we know which features will be added as part of the Spark 3.3 release? Regards Rajabhupati From: Sean Owen Sent: Monday, January 31, 2022 10:50:16 PM To: KS, Rajabhupati Cc:

RE: [EXTERNAL] Fwd: Log4j upgrade in spark binary from 1.2.17 to 2.17.1

2022-01-31 Thread KS, Rajabhupati
Thanks Sean, when is Spark 3.3.0 expected to be released? Regards Raja From: Sean Owen Sent: Monday, January 31, 2022 10:28 PM To: KS, Rajabhupati Subject: [EXTERNAL] Fwd: Log4j upgrade in spark binary from 1.2.17 to 2.17.1 Further,

Re: [EXTERNAL] Fwd: Log4j upgrade in spark binary from 1.2.17 to 2.17.1

2022-01-31 Thread Sean Owen
https://spark.apache.org/versioning-policy.html On Mon, Jan 31, 2022 at 11:15 AM KS, Rajabhupati wrote: > Thanks Sean, when is Spark 3.3.0 expected to be released? > > > > Regards > > Raja > > *From:* Sean Owen > *Sent:* Monday, January 31, 2022 10:28 PM > *To:* KS, Rajabhupati > *Subject:*

Re: Log4j upgrade in spark binary from 1.2.17 to 2.17.1

2022-01-31 Thread Sean Owen
(BTW you are sending to the Spark incubator list, and Spark has not been in incubation for about 7 years. Use user@spark.apache.org) What update are you looking for? This has been discussed extensively on the Spark mailing list. Spark is not evidently vulnerable to this. 3.3.0 will include log4j

RE: Log4j upgrade in spark binary from 1.2.17 to 2.17.1

2022-01-31 Thread KS, Rajabhupati
Hi Team, is there any update on this request? We did see JIRA https://issues.apache.org/jira/browse/SPARK-37630 for this request, but we see it is closed. Regards Raja From: KS, Rajabhupati Sent: Sunday, January 30, 2022 9:03 AM To: u...@spark.incubator.apache.org Subject: Log4j upgrade in

Re:

2022-01-31 Thread Bitfox
Please send an e-mail to user-unsubscr...@spark.apache.org to unsubscribe from the mailing list. On Mon, Jan 31, 2022 at 10:11 PM wrote: > unsubscribe > > >

Re:

2022-01-31 Thread Bitfox
Please send an e-mail to user-unsubscr...@spark.apache.org to unsubscribe from the mailing list. On Mon, Jan 31, 2022 at 10:23 PM Gaetano Fabiano wrote: > Unsubscribe > > Sent from iPhone > > - > To unsubscribe e-mail:

[no subject]

2022-01-31 Thread Gaetano Fabiano
Unsubscribe Sent from iPhone - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

[no subject]

2022-01-31 Thread pduflot
unsubscribe

Re: A Persisted Spark DataFrame is computed twice

2022-01-31 Thread Sean Owen
One guess - you are doing two things here, count() and write(). There is a persist(), but it's async. It won't necessarily wait for the persist to finish before proceeding and may have to recompute at least some partitions for the second op. You could debug further by looking at the stages and
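A minimal sketch of the pattern Sean describes (output path and column names are illustrative): run one action to populate the cache, so the second action can reuse the cached partitions instead of recomputing them.

```
from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(0, 1000000).selectExpr("id", "id % 7 AS bucket")

# persist() only marks the plan as cacheable; nothing is stored yet.
df = df.persist(StorageLevel.DISK_ONLY)

# The first action populates the cache...
df.count()

# ...so this second action should read cached partitions instead of
# recomputing them. Verify in the Spark UI (Storage tab, stage details).
df.write.mode("overwrite").parquet("/tmp/df_out")
```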

Regarding Spark Cassandra Metrics

2022-01-31 Thread Yogesh Kumar Garg
Hi all, I am developing a Spark application where I am loading data into Cassandra, using the Spark Cassandra connector for the same. I have created a fat jar with all the dependencies and submitted it using spark-submit. I am able to load the data successfully into Cassandra, but I
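For reference, a hedged sketch of the write path being described (keyspace, table, and host are illustrative; it assumes the spark-cassandra-connector is on the classpath via the fat jar or --packages):

```
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.cassandra.connection.host", "127.0.0.1")  # illustrative
    .getOrCreate()
)

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])

# Write through the connector's DataSource API.
(df.write
    .format("org.apache.spark.sql.cassandra")
    .options(table="my_table", keyspace="my_keyspace")
    .mode("append")
    .save())
```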

Re: unsubscribe

2022-01-31 Thread Bitfox
The signature in your messages already shows how to unsubscribe. To unsubscribe e-mail: user-unsubscr...@spark.apache.org On Mon, Jan 31, 2022 at 7:53 PM Lucas Schroeder Rossi wrote: > unsubscribe > > - > To unsubscribe e-mail:

unsubscribe

2022-01-31 Thread Lucas Schroeder Rossi
unsubscribe - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Re: Migration to Spark 3.2

2022-01-31 Thread Aurélien Mazoyer
Hi Stephen, I managed to solve my issue: I had a conflicting version of jackson-databind that came from the parent pom. Thank you, Aurelien On Sun, Jan 30, 2022 at 11:28 PM, Aurélien Mazoyer wrote: > Hi Stephen, > > Thank you for your answer. Yes, I changed the scope to "provided" but got > the

Re: why the pyspark RDD API is so slow?

2022-01-31 Thread Sebastian Piu
When you operate on a DataFrame from the Python side you are just invoking methods in the JVM via a proxy (py4j), so it is almost the same as coding in Java itself. This holds as long as you don't define any UDFs or any other code that needs to invoke Python for processing. Check the High Performance Spark
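A small sketch contrasting the two paths described above (column names are illustrative): the built-in expression stays entirely in the JVM, while the Python UDF forces every row through Python worker processes.

```
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import LongType

spark = SparkSession.builder.getOrCreate()
df = spark.range(0, 1000000)

# JVM-only: the expression is planned and executed inside Spark's JVM.
jvm_only = df.select((col("id") * 2).alias("doubled"))

# Python UDF: rows are serialized out to Python workers and back.
double_py = udf(lambda x: x * 2, LongType())
via_python = df.select(double_py("id").alias("doubled"))
```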

Re: why the pyspark RDD API is so slow?

2022-01-31 Thread Bitfox
Hi, in PySpark, RDDs need to be serialised/deserialised, but DataFrames don't? Why? Thanks On Mon, Jan 31, 2022 at 4:46 PM Khalid Mammadov wrote: > Your Scala program does not use any Spark API, hence it is faster than the others. If > you write the same code in pure Python I think it will be even faster than

Re: why the pyspark RDD API is so slow?

2022-01-31 Thread Khalid Mammadov
Your Scala program does not use any Spark API, hence it is faster than the others. If you write the same code in pure Python, I think it will be even faster than the Scala program, especially taking into account that these two programs run on a single VM. Regarding DataFrames and RDDs, I would suggest using DataFrames
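A small sketch of that suggestion (sizes are illustrative): the RDD version ships each element across the JVM/Python boundary to run the lambda, while the DataFrame version keeps the whole pipeline in the JVM.

```
from pyspark.sql import SparkSession
from pyspark.sql.functions import sum as spark_sum

spark = SparkSession.builder.getOrCreate()

# RDD API: the map closure runs in Python worker processes,
# so every element crosses the JVM <-> Python boundary.
rdd_total = spark.sparkContext.range(0, 1000000).map(lambda x: x * 2).sum()

# DataFrame API: the same computation compiled to JVM expressions.
df_total = (
    spark.range(0, 1000000)
    .selectExpr("id * 2 AS doubled")
    .agg(spark_sum("doubled"))
    .first()[0]
)
```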

Re:[ANNOUNCE] Apache Spark 3.2.1 released

2022-01-31 Thread beliefer
Thank you huaxin gao! Glad to see the release. At 2022-01-29 09:07:13, "huaxin gao" wrote: We are happy to announce the availability of Spark 3.2.1! Spark 3.2.1 is a maintenance release containing stability fixes. This release is based on the branch-3.2 maintenance branch of Spark. We

Re: A Persisted Spark DataFrame is computed twice

2022-01-31 Thread Sebastian Piu
Can you share the stages as seen in the Spark UI for the count and coalesce jobs? My suggestion of moving things around was just for troubleshooting rather than a solution, if that wasn't clear before. On Mon, 31 Jan 2022, 08:07 Benjamin Du wrote: > Removing coalesce didn't help either. > > > >

Re: A Persisted Spark DataFrame is computed twice

2022-01-31 Thread Benjamin Du
I did check the execution plan; there were two stages, and both show that the pandas UDF (which takes almost all of the DataFrame's computation time) is executed. It didn't seem to be an issue with repartition/coalesce, as the DataFrame was still computed twice after removing coalesce.

unsubscribe

2022-01-31 Thread Rajeev

Re: A Persisted Spark DataFrame is computed twice

2022-01-31 Thread Benjamin Du
Removing coalesce didn't help either. Best, Ben Du Personal Blog | GitHub | Bitbucket | Docker Hub From: Deepak Sharma Sent: