Re: unsubscribe

2022-01-21 Thread capitnfrakass
On 22/01/2022 11:07, Renan F. Souza wrote: > unsubscribe You should be able to unsubscribe yourself from the list by sending an email to: user-unsubscr...@spark.apache.org Thanks.

unsubscribe

2022-01-21 Thread Renan F. Souza
unsubscribe

Migration to Spark 3.2

2022-01-21 Thread Aurélien Mazoyer
Hello, I migrated my code to Spark 3.2 and I am facing some issues. When I run my unit tests via Maven, I get this error: java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$ which is not super nice. However, when I run my test via IntelliJ, I get

RE: Is user@spark indexed by google?

2022-01-21 Thread Theodore J Griesenbrock
Try searching here: https://lists.apache.org/list.html?user@spark.apache.org -T.J. T.J. Griesenbrock, Technical Release Manager, Watson Health, He/Him/His, +1 (602) 377-7673 (Text only), t...@ibm.com, IBM - Original message - From: "Mich Talebzadeh" To: Cc: "user @spark" Subject:

Re: Is user@spark indexed by google?

2022-01-21 Thread Mich Talebzadeh
Well agreed that this user@spark is a great place to search for answers and no I don't think this email list is indexed by Google. For this reason I use gmail and all my user@/dev@ memberships are added to my gmail account. For example, I can search starting from 2016 onwards the gmail mailing

Re: What happens when a partition that holds data under a task fails

2022-01-21 Thread Sean Owen
Probably, because Spark prefers locality, but not necessarily. On Fri, Jan 21, 2022 at 2:10 PM Siddhesh Kalgaonkar < kalgaonkarsiddh...@gmail.com> wrote: > Thank you so much for this information, Sean. One more question, that when > it wants to re-run the failed partition, where does it run? On

Re: How to configure log4j in pyspark to get log level, file name, and line number

2022-01-21 Thread Andrew Davidson
Interesting. I noticed that my driver logs messages with a timestamp and function name but no line number. However, log messages in other Python files contain only the message. All of my Python code is in a single zip file. The zip file is a job submit argument. 2022-01-21 19:45:02 WARN __main__:? -
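For comparison, here is a minimal sketch of how Python's standard `logging` module can emit the level, file name, and line number on each record. This is plain Python logging, not log4j, and it is not taken from the thread; the logger name `example_job` and the helper `make_logger` are hypothetical. Whether records from modules packaged in a zip file resolve their file names the same way is not something this sketch verifies.

```python
import io
import logging


def make_logger(name, stream):
    """Build a logger whose records carry level, file name, and line number."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(levelname)s %(filename)s:%(lineno)d - %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    ))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # avoid duplicate output via the root logger
    return logger


# Write to an in-memory buffer so the formatted line can be inspected.
buffer = io.StringIO()
log = make_logger("example_job", buffer)  # hypothetical logger name
log.warning("starting step")
print(buffer.getvalue(), end="")
```

The `%(filename)s` and `%(lineno)d` fields are filled in by the logging module from the call site, so each module that logs through a logger configured this way reports its own location.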

Re: What happens when a partition that holds data under a task fails

2022-01-21 Thread Siddhesh Kalgaonkar
Thank you so much for this information, Sean. One more question: when it wants to re-run the failed partition, where does it run? On the same node or some other node? On Fri, 21 Jan 2022, 23:41 Sean Owen, wrote: > The Spark program already knows the partitions of the data and where they >

Re: What happens when a partition that holds data under a task fails

2022-01-21 Thread Sean Owen
The Spark program already knows the partitions of the data and where they exist; that's just defined by the data layout. It doesn't care what data is inside. It knows partition 1 needs to be processed and if the task processing it fails, needs to be run again. I'm not sure where you're seeing data
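The idea that a failed task is re-run against the same partition, identified by its index rather than by its contents, can be illustrated with a toy model in plain Python. This is only a sketch and not Spark's scheduler: `run_stage`, `flaky_sum`, and the simulated failure are hypothetical, and the limit of 4 attempts mirrors the default mentioned in the thread.

```python
def run_stage(partitions, process, max_failures=4):
    """Process each partition, retrying a failed one up to max_failures attempts."""
    results = {}
    for index, records in enumerate(partitions):
        for attempt in range(1, max_failures + 1):
            try:
                # The retry receives the same partition index and records.
                results[index] = process(index, records, attempt)
                break
            except RuntimeError:
                if attempt == max_failures:
                    raise  # out of attempts: the whole job fails
    return results


# 10 records split into 5 partitions of 2 records each.
partitions = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
attempts_seen = []


def flaky_sum(index, records, attempt):
    """Sum a partition, failing once on the first attempt at partition 0."""
    attempts_seen.append((index, attempt))
    if index == 0 and attempt == 1:
        raise RuntimeError("simulated task failure")
    return sum(records)


totals = run_stage(partitions, flaky_sum)
print(totals)
print(attempts_seen)
```

In this model only partition 0 is attempted twice; the other partitions are processed once and are never re-read because of the failure.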

Unsubscribe

2022-01-21 Thread Aniket Khandelwal
unsubscribe Thanks

Re: What happens when a partition that holds data under a task fails

2022-01-21 Thread Siddhesh Kalgaonkar
Okay, so suppose I have 10 records distributed across 5 nodes and the partition of the first node holding 2 records failed. I understand that it will re-process this partition but how will it come to know that XYZ partition was holding XYZ data so that it will pick again only those records and

Is user@spark indexed by google?

2022-01-21 Thread Andrew Davidson
There is a ton of great info in this archive. I noticed when I do a google search it does not seem to find results from this source Kind regards Andy

Re: What happens when a partition that holds data under a task fails

2022-01-21 Thread Sean Owen
In that case, the file exists in parts across machines. No, tasks won't re-read the whole file; no task does or can do that. Failed partitions are reprocessed, but as in the first pass, the same partition is processed. On Fri, Jan 21, 2022 at 12:00 PM Siddhesh Kalgaonkar <

What happens when a partition that holds data under a task fails

2022-01-21 Thread Siddhesh Kalgaonkar
Hello team, I am aware that when a task fails, for example because of memory issues, it is retried up to 4 times by default, and if it still fails the entire job fails. But suppose I am reading a file that is distributed across nodes in partitions. So, what will

Re: Spark 3.2.0 upgrade

2022-01-21 Thread Amit Sharma
Hello, I tried both the unshaded Cassandra connector and the normal connector; both give the same error at runtime while connecting to Cassandra. "com.datastax.spark" %% "spark-cassandra-connector-unshaded" % "2.4.2" Or "com.datastax.spark" %% "spark-cassandra-connector" % "3.1.0" Russ

Re: questions on these functions

2022-01-21 Thread Sean Owen
Eh, are you talking about foldLeft and foldRight in Scala? Those are not from Hadoop or Spark. They are common functions in functional languages. They 'fold' a value into a new value by applying a function to the starting value and every element of a collection. Because the op may be non-commutative,
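The difference between folding from the left and from the right can be sketched in Python with `functools.reduce`. This is an analogue of Scala's foldLeft and foldRight, not their implementation; the helper names `fold_left` and `fold_right` are hypothetical. Subtraction is used because the order of application changes the result.

```python
from functools import reduce


def fold_left(op, zero, items):
    """Combine from the left: ((zero op x1) op x2) op ... op xn."""
    return reduce(op, items, zero)


def fold_right(op, zero, items):
    """Combine from the right: x1 op (x2 op (... (xn op zero)))."""
    return reduce(lambda acc, x: op(x, acc), reversed(items), zero)


def subtract(a, b):
    return a - b


numbers = [1, 2, 3]

left = fold_left(subtract, 0, numbers)    # ((0 - 1) - 2) - 3 = -6
right = fold_right(subtract, 0, numbers)  # 1 - (2 - (3 - 0)) = 2
print(left, right)
```

With an operation such as addition, where grouping and order do not matter, both helpers return the same value.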

Re: questions on these functions

2022-01-21 Thread Sherd Fox
Sorry, I am programming in Scala, which has these functions. Regards. On Fri, 21 Jan 2022 at 20:59, Theodore J Griesenbrock wrote: > I discovered several instances of discussion on leftFold and rightFold in > a variety of forums, but I can not find anything related to RDD in the > official

Re: questions on these functions

2022-01-21 Thread Theodore J Griesenbrock
I discovered several instances of discussion on leftFold and rightFold in a variety of forums, but I cannot find anything related to RDD in the official documentation: https://spark.apache.org/docs/latest/api/scala/org/apache/spark/rdd/RDD.html It appears to be unrelated to Spark, and

RE: Does Spark 3.1.2/3.2 support log4j 2.17.1+, and how? your target release day for Spark3.3?

2022-01-21 Thread Bode, Meikel, NM-X-DS
Hello Juan Liu, The release process is well documented (see last step on announcement): https://spark.apache.org/release-process.html To (un)subscribe to the mailing lists see: https://spark.apache.org/community.html Best, Meikel Meikel Bode, MSc Senior Manager | Head of SAP Data Platforms &

questions on these functions

2022-01-21 Thread Sherd Fox
Hello sparkers, What are the differences between leftFold, rightFold, and fold in the RDD functions? I am not very clear about how to use them. Thanks.