Re: unsubscribe

2022-01-21 Thread capitnfrakass

On 22/01/2022 11:07, Renan F. Souza wrote:

unsubscribe


You should be able to unsubscribe yourself from the list by sending an
email to:

user-unsubscr...@spark.apache.org

thanks.




unsubscribe

2022-01-21 Thread Renan F. Souza
unsubscribe


Migration to Spark 3.2

2022-01-21 Thread Aurélien Mazoyer
Hello,

I migrated my code to Spark 3.2 and I am facing some issues. When I run my
unit tests via Maven, I get this error:
java.lang.NoClassDefFoundError: Could not initialize class
org.apache.spark.rdd.RDDOperationScope$
which is not super nice.

However, when I run my tests via IntelliJ, I get the following one:
java.lang.ExceptionInInitializerError
at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
at org.apache.spark.rdd.RDD.map(RDD.scala:421)
...
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Scala
module 2.12.3 requires Jackson Databind version >= 2.12.0 and < 2.13.0
which is far better IMO since it gives me some clue on what is missing in
my pom.xml file to make it work. After putting a few more dependencies, my
tests are passing again in IntelliJ, but I am stuck on the same error
when I run the Maven command :-/.
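
For reference, the kind of pom.xml change I mean is roughly this (the versions
are only a guess based on the error message, so treat it as a sketch rather
than the actual fix):

<dependencyManagement>
  <dependencies>
    <!-- pin jackson-databind into the range the Scala module asks for
         (>= 2.12.0 and < 2.13.0); 2.12.3 is a guess to match "Scala module 2.12.3" -->
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
      <version>2.12.3</version>
    </dependency>
    <dependency>
      <groupId>com.fasterxml.jackson.module</groupId>
      <artifactId>jackson-module-scala_2.12</artifactId>
      <version>2.12.3</version>
    </dependency>
  </dependencies>
</dependencyManagement>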
It seems that the JDK and Maven versions are the same and both are using the
same .m2 directory.
Any clue on what can be going wrong?

Thank you,

Aurelien


RE: Is user@spark indexed by google?

2022-01-21 Thread Theodore J Griesenbrock
Try searching here:
 
https://lists.apache.org/list.html?user@spark.apache.org
 
-T.J.
 
 
T.J. Griesenbrock
Technical Release Manager
Watson Health
He/Him/His
 
+1 (602) 377-7673 (Text only)
t...@ibm.com
IBM
 
 
- Original message -
From: "Mich Talebzadeh"
To:
Cc: "user @spark"
Subject: [EXTERNAL] Re: Is user@spark indexed by google?
Date: Fri, Jan 21, 2022 16:08
 
Well agreed that this user@spark is a great place to search for answers and no I don't think this email list is indexed by Google.
 
For this reason I use Gmail and all my user@/dev@ memberships are added to my Gmail account. For example, I can search the mailing list in Gmail going back to 2016. I suggest you explore that option if it helps.
 
Mich
 
   view my Linkedin profile
 
Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction. 
  

On Fri, 21 Jan 2022 at 18:03, Andrew Davidson  wrote:
There is a ton of great info in this archive. I noticed that when I do a Google search it does not seem to find results from this source.
 
Kind regards
 
Andy
 





Re: Is user@spark indexed by google?

2022-01-21 Thread Mich Talebzadeh
Well agreed that this user@spark is a great place to search for answers and
no I don't think this email list is indexed by Google.


For this reason I use Gmail and all my user@/dev@ memberships are added to
my Gmail account. For example, I can search the mailing list in Gmail going
back to 2016. I suggest you explore that option if it helps.


Mich


   view my Linkedin profile




*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Fri, 21 Jan 2022 at 18:03, Andrew Davidson 
wrote:

> There is a ton of great info in this archive. I noticed when I do a google
> search it does not seem to find results from this source
>
>
>
> Kind regards
>
>
>
> Andy
>


Re: What happens when a partition that holds data under a task fails

2022-01-21 Thread Sean Owen
Probably, because Spark prefers locality, but not necessarily.

On Fri, Jan 21, 2022 at 2:10 PM Siddhesh Kalgaonkar <
kalgaonkarsiddh...@gmail.com> wrote:

> Thank you so much for this information, Sean. One more question, that when
> it wants to re-run the failed partition, where does it run? On the same
> node or some other node?
>
>
> On Fri, 21 Jan 2022, 23:41 Sean Owen,  wrote:
>
>> The Spark program already knows the partitions of the data and where they
>> exist; that's just defined by the data layout. It doesn't care what data is
>> inside. It knows partition 1 needs to be processed and if the task
>> processing it fails, needs to be run again. I'm not sure where you're
>> seeing data loss here? the data is already stored to begin with, not
>> somehow consumed and deleted.
>>
>> On Fri, Jan 21, 2022 at 12:07 PM Siddhesh Kalgaonkar <
>> kalgaonkarsiddh...@gmail.com> wrote:
>>
>>> Okay, so suppose I have 10 records distributed across 5 nodes and the
>>> partition of the first node holding 2 records failed. I understand that it
>>> will re-process this partition but how will it come to know that XYZ
>>> partition was holding XYZ data so that it will pick again only those
>>> records and reprocess it? In case of failure of a partition, is there a
>>> data loss? or is it stored somewhere?
>>>
>>> Maybe my question is very naive but I am trying to understand it in a
>>> better way.
>>>
>>> On Fri, Jan 21, 2022 at 11:32 PM Sean Owen  wrote:
>>>
 In that case, the file exists in parts across machines. No, tasks won't
 re-read the whole file; no task does or can do that. Failed partitions are
 reprocessed, but as in the first pass, the same partition is processed.

 On Fri, Jan 21, 2022 at 12:00 PM Siddhesh Kalgaonkar <
 kalgaonkarsiddh...@gmail.com> wrote:

> Hello team,
>
> I am aware that in case of memory issues when a task fails, it will
> try to restart 4 times since it is a default number and if it still fails
> then it will cause the entire job to fail.
>
> But suppose if I am reading a file that is distributed across nodes in
> partitions. So, what will happen if a partition fails that holds some 
> data?
> Will it re-read the entire file and get that specific subset of data since
> the driver has the complete information? or will it copy the data to the
> other working nodes or tasks and try to run it?
>



Re: How to configure log4j in pyspark to get log level, file name, and line number

2022-01-21 Thread Andrew Davidson
Interesting. I noticed that my driver log messages include a time stamp and
function name but no line number. However, log messages in other Python files only
contain the message. All of my Python code is in a single zip file, which is passed
as a job submit argument.

2022-01-21 19:45:02 WARN  __main__:? - sparkConfig: ('spark.sql.cbo.enabled', 
'true')
2022-01-21 19:48:34 WARN  __main__:? - readsSparkDF.rdd.getNumPartitions():1698
__init__ BEGIN
__init__ END
run BEGIN
run rawCountsSparkDF numRows:5387495 numCols:10409

My guess is that somehow I need to change the way log4j is configured on the workers?
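
If it helps, the kind of worker-side change I have in mind is shipping a custom
log4j.properties with the job, roughly like this (untested sketch; %F and %L are
standard log4j 1.x PatternLayout fields, although I suspect that for messages sent
through py4j they would show the JVM call site rather than the Python file and line):

# log4j.properties (sketch)
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
# %d date, %p level, %c logger, %F file, %L line, %m message
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1} %F:%L - %m%n

and then an invocation along these lines (I have not verified the exact flags for
Dataproc, so treat this as hypothetical):

gcloud dataproc jobs submit pyspark main.py --py-files ${extraPkg} \
    --files log4j.properties \
    --properties spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties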

Kind regards

Andy

From: Andrew Davidson 
Date: Thursday, January 20, 2022 at 2:32 PM
To: "user @spark" 
Subject: How to configure log4j in pyspark to get log level, file name, and 
line number

Hi

When I use Python logging for my unit tests, I am able to control the output
format: I get the log level, the file and line number, then the message.

[INFO testEstimatedScalingFactors.py:166 - test_B_convertCountsToInts()] BEGIN

In my spark driver I am able to get the log4j logger

from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("estimatedScalingFactors") \
    .getOrCreate()

#
# https://medium.com/@lubna_22592/building-production-pyspark-jobs-5480d03fd71e
# initialize logger for yarn cluster logs
#
log4jLogger = spark.sparkContext._jvm.org.apache.log4j
logger = log4jLogger.LogManager.getLogger(__name__)

However, it only outputs the message. As a hack I have been adding the function
names to the message.

I wonder if this is because of the way I make my Python code available. When I
submit my job using

‘$ gcloud dataproc jobs submit pyspark’

I pass my Python file in a zip file:
 --py-files ${extraPkg}

I use level warn because the driver info logs are very verbose.


###
# needs: from functools import reduce
#        from operator import add
#        from pyspark.sql.functions import col

def rowSums( self, countsSparkDF, columnNames ):
    self.logger.warn( "rowSums BEGIN" )

    # https://stackoverflow.com/a/54283997/4586180
    retDF = countsSparkDF.na.fill( 0 ).withColumn( "rowSum",
                reduce( add, [col( x ) for x in columnNames] ) )

    self.logger.warn( "rowSums retDF numRows:{} numCols:{}"
                      .format( retDF.count(), len( retDF.columns ) ) )

    self.logger.warn( "rowSums END\n" )

    return retDF

kind regards

Andy


Re: What happens when a partition that holds data under a task fails

2022-01-21 Thread Siddhesh Kalgaonkar
Thank you so much for this information, Sean. One more question, that when
it wants to re-run the failed partition, where does it run? On the same
node or some other node?


On Fri, 21 Jan 2022, 23:41 Sean Owen,  wrote:

> The Spark program already knows the partitions of the data and where they
> exist; that's just defined by the data layout. It doesn't care what data is
> inside. It knows partition 1 needs to be processed and if the task
> processing it fails, needs to be run again. I'm not sure where you're
> seeing data loss here? the data is already stored to begin with, not
> somehow consumed and deleted.
>
> On Fri, Jan 21, 2022 at 12:07 PM Siddhesh Kalgaonkar <
> kalgaonkarsiddh...@gmail.com> wrote:
>
>> Okay, so suppose I have 10 records distributed across 5 nodes and the
>> partition of the first node holding 2 records failed. I understand that it
>> will re-process this partition but how will it come to know that XYZ
>> partition was holding XYZ data so that it will pick again only those
>> records and reprocess it? In case of failure of a partition, is there a
>> data loss? or is it stored somewhere?
>>
>> Maybe my question is very naive but I am trying to understand it in a
>> better way.
>>
>> On Fri, Jan 21, 2022 at 11:32 PM Sean Owen  wrote:
>>
>>> In that case, the file exists in parts across machines. No, tasks won't
>>> re-read the whole file; no task does or can do that. Failed partitions are
>>> reprocessed, but as in the first pass, the same partition is processed.
>>>
>>> On Fri, Jan 21, 2022 at 12:00 PM Siddhesh Kalgaonkar <
>>> kalgaonkarsiddh...@gmail.com> wrote:
>>>
 Hello team,

 I am aware that in case of memory issues when a task fails, it will try
 to restart 4 times since it is a default number and if it still fails then
 it will cause the entire job to fail.

 But suppose if I am reading a file that is distributed across nodes in
 partitions. So, what will happen if a partition fails that holds some data?
 Will it re-read the entire file and get that specific subset of data since
 the driver has the complete information? or will it copy the data to the
 other working nodes or tasks and try to run it?

>>>


Re: What happens when a partition that holds data under a task fails

2022-01-21 Thread Sean Owen
The Spark program already knows the partitions of the data and where they
exist; that's just defined by the data layout. It doesn't care what data is
inside. It knows partition 1 needs to be processed and if the task
processing it fails, needs to be run again. I'm not sure where you're
seeing data loss here? the data is already stored to begin with, not
somehow consumed and deleted.

On Fri, Jan 21, 2022 at 12:07 PM Siddhesh Kalgaonkar <
kalgaonkarsiddh...@gmail.com> wrote:

> Okay, so suppose I have 10 records distributed across 5 nodes and the
> partition of the first node holding 2 records failed. I understand that it
> will re-process this partition but how will it come to know that XYZ
> partition was holding XYZ data so that it will pick again only those
> records and reprocess it? In case of failure of a partition, is there a
> data loss? or is it stored somewhere?
>
> Maybe my question is very naive but I am trying to understand it in a
> better way.
>
> On Fri, Jan 21, 2022 at 11:32 PM Sean Owen  wrote:
>
>> In that case, the file exists in parts across machines. No, tasks won't
>> re-read the whole file; no task does or can do that. Failed partitions are
>> reprocessed, but as in the first pass, the same partition is processed.
>>
>> On Fri, Jan 21, 2022 at 12:00 PM Siddhesh Kalgaonkar <
>> kalgaonkarsiddh...@gmail.com> wrote:
>>
>>> Hello team,
>>>
>>> I am aware that in case of memory issues when a task fails, it will try
>>> to restart 4 times since it is a default number and if it still fails then
>>> it will cause the entire job to fail.
>>>
>>> But suppose if I am reading a file that is distributed across nodes in
>>> partitions. So, what will happen if a partition fails that holds some data?
>>> Will it re-read the entire file and get that specific subset of data since
>>> the driver has the complete information? or will it copy the data to the
>>> other working nodes or tasks and try to run it?
>>>
>>


Unsubscribe

2022-01-21 Thread Aniket Khandelwal
unsubscribe

Thanks


Re: What happens when a partition that holds data under a task fails

2022-01-21 Thread Siddhesh Kalgaonkar
Okay, so suppose I have 10 records distributed across 5 nodes and the
partition of the first node holding 2 records failed. I understand that it
will re-process this partition but how will it come to know that XYZ
partition was holding XYZ data so that it will pick again only those
records and reprocess it? In case of failure of a partition, is there a
data loss? or is it stored somewhere?

Maybe my question is very naive but I am trying to understand it in a
better way.

On Fri, Jan 21, 2022 at 11:32 PM Sean Owen  wrote:

> In that case, the file exists in parts across machines. No, tasks won't
> re-read the whole file; no task does or can do that. Failed partitions are
> reprocessed, but as in the first pass, the same partition is processed.
>
> On Fri, Jan 21, 2022 at 12:00 PM Siddhesh Kalgaonkar <
> kalgaonkarsiddh...@gmail.com> wrote:
>
>> Hello team,
>>
>> I am aware that in case of memory issues when a task fails, it will try
>> to restart 4 times since it is a default number and if it still fails then
>> it will cause the entire job to fail.
>>
>> But suppose if I am reading a file that is distributed across nodes in
>> partitions. So, what will happen if a partition fails that holds some data?
>> Will it re-read the entire file and get that specific subset of data since
>> the driver has the complete information? or will it copy the data to the
>> other working nodes or tasks and try to run it?
>>
>


Is user@spark indexed by google?

2022-01-21 Thread Andrew Davidson
There is a ton of great info in this archive. I noticed that when I do a Google
search it does not seem to find results from this source.

Kind regards

Andy


Re: What happens when a partition that holds data under a task fails

2022-01-21 Thread Sean Owen
In that case, the file exists in parts across machines. No, tasks won't
re-read the whole file; no task does or can do that. Failed partitions are
reprocessed, but as in the first pass, the same partition is processed.

On Fri, Jan 21, 2022 at 12:00 PM Siddhesh Kalgaonkar <
kalgaonkarsiddh...@gmail.com> wrote:

> Hello team,
>
> I am aware that in case of memory issues when a task fails, it will try to
> restart 4 times since it is a default number and if it still fails then it
> will cause the entire job to fail.
>
> But suppose if I am reading a file that is distributed across nodes in
> partitions. So, what will happen if a partition fails that holds some data?
> Will it re-read the entire file and get that specific subset of data since
> the driver has the complete information? or will it copy the data to the
> other working nodes or tasks and try to run it?
>


What happens when a partition that holds data under a task fails

2022-01-21 Thread Siddhesh Kalgaonkar
Hello team,

I am aware that, in case of memory issues, when a task fails it will be retried
4 times, since that is the default, and if it still fails then it will cause the
entire job to fail.

But suppose I am reading a file that is distributed across nodes in partitions.
What will happen if a task fails for a partition that holds some data? Will Spark
re-read the entire file and get that specific subset of data, since the driver has
the complete information? Or will it copy the data to the other worker nodes or
tasks and try to run it?
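
(For concreteness, the 4 retries I am referring to is, I believe, the
spark.task.maxFailures setting; a rough sketch of what I mean:)

// minimal sketch; the app name and path are just placeholders
val spark = org.apache.spark.sql.SparkSession.builder()
  .appName("task-retry-example")
  .config("spark.task.maxFailures", "4") // 4 is the default; a failed task is retried up to this many times
  .getOrCreate()

// each partition of this file is processed by one task
val df = spark.read.text("hdfs:///path/to/some/file")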


Re: Spark 3.2.0 upgrade

2022-01-21 Thread Amit Sharma
Hello, I tried using the unshaded Cassandra connector and the normal connector;
both give the same error at runtime while connecting to Cassandra.

"com.datastax.spark" %% "spark-cassandra-connector-unshaded" % "2.4.2"

Or

"com.datastax.spark" %% "spark-cassandra-connector" % "3.1.0"


Russ, a similar issue is reported here as well, but with no solution:

https://community.datastax.com/questions/3519/issue-with-spring-boot-starter-data-cassandra-and.html

Caused by: java.lang.ClassNotFoundException: com.codahale.metrics.JmxReporter
at 
java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
at 
java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
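
For completeness, the other thing I was going to try is the shaded assembly
artifact, in case this is a dependency conflict around the metrics classes
(just a guess on my part):

// hypothetical sbt coordinate; the assembly artifact bundles/shades the connector's dependencies
"com.datastax.spark" %% "spark-cassandra-connector-assembly" % "3.1.0"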




On Thu, Jan 20, 2022 at 5:17 PM Amit Sharma  wrote:

> Hello, I am trying to upgrade my project from spark 2.3.3 to spark 3.2.0.
> While running the application locally I am getting below error.
>
> Could you please let me know which version of the cassandra connector I
> should use. I am using below shaded connector  but i think that causing the
> issue
>
> "com.datastax.spark" %% "spark-cassandra-connector-unshaded" % "2.4.2"
>
>
> Caused by: java.lang.ClassNotFoundException: com.codahale.metrics.JmxReporter
>   at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
>   at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
>   at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
>
>
> Thanks
>
> Amit
>
>


Re: questions on these functions

2022-01-21 Thread Sean Owen
Eh, are you talking about foldLeft and foldRight in Scala? those are not
from Hadoop or Spark.
They are common functions in functional languages. They 'fold' a value into
a new value by applying a function to the starting value and every element of
a collection.
Because the op may be non-commutative, doing this left to right or right to
left is different, hence two functions.
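
For example (plain Scala, nothing Spark-specific):

val xs = List(1, 2, 3)
// foldLeft applies the op left to right: ((0 - 1) - 2) - 3
xs.foldLeft(0)(_ - _)   // -6
// foldRight applies it right to left: 1 - (2 - (3 - 0))
xs.foldRight(0)(_ - _)  // 2
// RDD.fold also exists, but it folds within each partition and then combines the
// per-partition results, so results can differ unless the op is associative and commutative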

On Fri, Jan 21, 2022 at 6:59 AM Theodore J Griesenbrock 
wrote:

> I discovered several instances of discussion on leftFold and rightFold in
> a variety of forums, but I can not find anything related to RDD in the
> official documentation:
>
>
> https://spark.apache.org/docs/latest/api/scala/org/apache/spark/rdd/RDD.html
>
> It appears to be non-related to Spark, and probably something
> Hadoop-related.  Can you please be more specific on how leftFold and
> rightFold is imported and the language you are using to implement Spark?
>
> Thanks!
>
> -T.J.
>
>
> *T.J. Griesenbrock*
> Technical Release Manager
> Watson Health
> He/Him/His
>
> +1 (602) 377-7673 (Text only)
> t...@ibm.com
>
> IBM
>
>
>
> - Original message -
> From: "Sherd Fox" 
> To: user@spark.apache.org
> Cc:
> Subject: [EXTERNAL] questions on these functions
> Date: Fri, Jan 21, 2022 04:26
>
> Hello sparkers,
>
> What were the differences for leftFold, rightFold and the fold in RDD
> functions?
>
> I am not very clear about the usage of them.
>
> Thanks.
>
>
>
>


Re: questions on these functions

2022-01-21 Thread Sherd Fox
Sorry, I am programming with Scala, which has these functions.

regards.

On Fri, 21 Jan 2022 at 20:59, Theodore J Griesenbrock  wrote:

> I discovered several instances of discussion on leftFold and rightFold in
> a variety of forums, but I can not find anything related to RDD in the
> official documentation:
>
>
> https://spark.apache.org/docs/latest/api/scala/org/apache/spark/rdd/RDD.html
>
> It appears to be non-related to Spark, and probably something
> Hadoop-related.  Can you please be more specific on how leftFold and
> rightFold is imported and the language you are using to implement Spark?
>
> Thanks!
>
> -T.J.
>
>
> *T.J. Griesenbrock*
> Technical Release Manager
> Watson Health
> He/Him/His
>
> +1 (602) 377-7673 (Text only)
> t...@ibm.com
>
> IBM
>
>
>
> - Original message -
> From: "Sherd Fox" 
> To: user@spark.apache.org
> Cc:
> Subject: [EXTERNAL] questions on these functions
> Date: Fri, Jan 21, 2022 04:26
>
> Hello sparkers,
>
> What were the differences for leftFold, rightFold and the fold in RDD
> functions?
>
> I am not very clear about the usage of them.
>
> Thanks.
>
>
>
>
>


Re: questions on these functions

2022-01-21 Thread Theodore J Griesenbrock
I discovered several instances of discussion on leftFold and rightFold in a variety of forums, but I cannot find anything related to RDD in the official documentation:
 
https://spark.apache.org/docs/latest/api/scala/org/apache/spark/rdd/RDD.html
 
It appears to be unrelated to Spark, and probably something Hadoop-related.  Can you please be more specific on how leftFold and rightFold are imported and the language you are using to implement Spark?
 
Thanks!
 
-T.J.
 
 
T.J. Griesenbrock
Technical Release Manager
Watson Health
He/Him/His
 
+1 (602) 377-7673 (Text only)
t...@ibm.com
IBM
 
 
- Original message -
From: "Sherd Fox"
To: user@spark.apache.org
Cc:
Subject: [EXTERNAL] questions on these functions
Date: Fri, Jan 21, 2022 04:26
Hello sparkers,
 
What were the differences for leftFold, rightFold and the fold in RDD functions?
 
I am not very clear about the usage of them.
 
Thanks.
 





RE: Does Spark 3.1.2/3.2 support log4j 2.17.1+, and how? your target release day for Spark3.3?

2022-01-21 Thread Bode, Meikel, NM-X-DS
Hello Juan Liu,

The release process is well documented (see last step on announcement):
https://spark.apache.org/release-process.html

To (un)subcribe to the mailing lists see:
https://spark.apache.org/community.html

Best,
Meikel

Meikel Bode, MSc
Senior Manager | Head of SAP Data Platforms & Analytics
-
Postal address:
Arvato Systems GmbH
Reinhard-Mohn-Straße 200
3 Gütersloh
Germany

Visitor address:
Arvato Systems GmbH
Fuggerstraße 11
33689 Bielefeld
Germany

Phone: +49(5241)80-89734
Mobile: +49(151)14774185
E-Mail: meikel.b...@bertelsmann.de
arvato-systems.de



From: Juan Liu 
Sent: Thursday, 20 January 2022 09:44
To: Bode, Meikel, NM-X-DS 
Cc: sro...@gmail.com; Theodore J Griesenbrock ; 
user@spark.apache.org
Subject: RE: Does Spark 3.1.2/3.2 support log4j 2.17.1+, and how? your target 
release day for Spark3.3?

Hi Meikel, would you please help to add both of us (t...@ibm.com,
liuj...@cn.ibm.com) to the mailing list user@spark.apache.org? Thanks!
Juan Liu (刘娟) PMP®
Release Manager, Watson Health, China Development Lab
Email: liuj...@cn.ibm.com
Mobile: 86-13521258532





From: "Bode, Meikel, NM-X-DS" <meikel.b...@bertelsmann.de>
To: "Theodore J Griesenbrock" <t...@ibm.com>, "sro...@gmail.com" <sro...@gmail.com>
Cc: "Juan Liu" <liuj...@cn.ibm.com>, "user@spark.apache.org" <user@spark.apache.org>
Date: 2022/01/20 03:05 PM
Subject: [EXTERNAL] RE: Does Spark 3.1.2/3.2 support log4j 2.17.1+, and how? your
target release day for Spark3.3?





Hi,

New releases are announced via mailing lists user@spark.apache.org &
d...@spark.apache.org.

Best,
Meikel

From: Theodore J Griesenbrock <t...@ibm.com>
Sent: Wednesday, 19 January 2022 18:50
To: sro...@gmail.com
Cc: Juan Liu <liuj...@cn.ibm.com>; user@spark.apache.org
Subject: RE: Does Spark 3.1.2/3.2 support log4j 2.17.1+, and how? your target
release day for Spark3.3?




Again, sorry to bother you.

What is the best option available to ensure we get notified when a new version
is released for Apache Spark? I do not see any RSS feeds, nor do I see any
e-mail subscription option for this page:
https://spark.apache.org/news/index.html

Please let me know what we can do to ensure we stay up to date with the news.

Thanks!

-T.J.


T.J. Griesenbrock
Technical Release Manager
Watson Health
He/Him/His

+1 (602) 377-7673 (Text only)
t...@ibm.com

IBM

- Original message -
From: "Sean Owen" <sro...@gmail.com>
To: "Juan Liu" <liuj...@cn.ibm.com>
Cc: "Theodore J Griesenbrock" <t...@ibm.com>, "User" <user@spark.apache.org>
Subject: [EXTERNAL] Re: Does Spark 3.1.2/3.2 support log4j 2.17.1+, and how? 
your target release day for Spark3.3?
Date: Thu, Jan 13, 2022 08:05

Yes. Spark does not use the SocketServer mentioned in CVE-2019-17571, so it is
not affected.

3.3.0 would probably be out in a couple months.



On Thu, Jan 13, 2022 at 3:14 AM Juan Liu <liuj...@cn.ibm.com> wrote:

We are informed that CVE-2021-4104 is not the only problem with Log4j 1.x. There is
one more, CVE-2019-17571, and as Apache announced its EOL in 2015, Spark 3.3.0
will be very much expected. Do you think mid-2022 is a reasonable time for the
Spark 3.3.0 release?


Juan Liu (刘娟) PMP®




Release Management, Watson Health, China Development 

questions on these functions

2022-01-21 Thread Sherd Fox
Hello sparkers,

What are the differences between leftFold, rightFold, and the fold in RDD
functions?

I am not very clear about how to use them.

Thanks.