On 22/01/2022 11:07, Renan F. Souza wrote:
unsubscribe
You can unsubscribe yourself from the list by sending an
email to:
user-unsubscr...@spark.apache.org
thanks.
Hello,
I migrated my code to Spark 3.2 and I am facing some issues. When I run my
unit tests via Maven, I get this error:
java.lang.NoClassDefFoundError: Could not initialize class
org.apache.spark.rdd.RDDOperationScope$
which is not super nice.
However, when I run my tests via IntelliJ, I get
Try searching here:
https://lists.apache.org/list.html?user@spark.apache.org
-T.J.
T.J. Griesenbrock
Technical Release Manager
Watson Health
He/Him/His
+1 (602) 377-7673 (Text only)
t...@ibm.com
IBM
- Original message -
From: "Mich Talebzadeh"
To:
Cc: "user @spark"
Subject:
Agreed, this user@spark list is a great place to search for answers, and
no, I don't think this mailing list is indexed by Google.
For this reason I use Gmail, and all my user@/dev@ memberships are added to
my Gmail account. For example, I can search starting from 2016 onwards the
gmail mailing
Probably, because Spark prefers locality, but not necessarily.
On Fri, Jan 21, 2022 at 2:10 PM Siddhesh Kalgaonkar <
kalgaonkarsiddh...@gmail.com> wrote:
> Thank you so much for this information, Sean. One more question: when
> Spark wants to re-run the failed partition, where does it run? On
Interesting. I noticed that my driver log messages contain a time stamp and
function name but no line number, while log messages from my other Python files
contain only the message text. All of my Python code is in a single zip file,
which is passed as a job-submit argument.
2022-01-21 19:45:02 WARN __main__:? -
Thank you so much for this information, Sean. One more question: when Spark
wants to re-run the failed partition, where does it run? On the same
node or some other node?
On Fri, 21 Jan 2022, 23:41 Sean Owen, wrote:
> The Spark program already knows the partitions of the data and where they
>
The Spark program already knows the partitions of the data and where they
exist; that's just defined by the data layout. It doesn't care what data is
inside. It knows partition 1 needs to be processed and if the task
processing it fails, needs to be run again. I'm not sure where you're
seeing data
Okay, so suppose I have 10 records distributed across 5 nodes, and the
partition on the first node, holding 2 records, failed. I understand that it
will re-process this partition, but how will it come to know that partition
XYZ was holding data XYZ, so that it picks up again only those
records and
There is a ton of great info in this archive. I noticed that when I do a
Google search it does not seem to find results from this source.
Kind regards
Andy
In that case, the file exists in parts across machines. No, tasks won't
re-read the whole file; no task does or can do that. Failed partitions are
reprocessed, but as in the first pass, the same partition is processed.
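The point above can be sketched in plain Scala (no Spark; the 10-records/5-partitions split is a hypothetical matching the example in this thread): a partition is defined by the data layout, so re-running a failed task just means re-reading the same slice.

```scala
// Plain-Scala sketch: a partition is defined by its index and slice bounds
// in the layout, not by remembering what data was inside it.
val records = (1 to 10).toVector // 10 records
val numPartitions = 5            // 5 "nodes"
val size = records.length / numPartitions

// Partition i is just slice i of the layout; nothing about its contents
// needs to be stored to recompute it.
def partition(i: Int): Vector[Int] =
  records.slice(i * size, (i + 1) * size)

// "Re-running" partition 0 after a failure yields the same two records.
assert(partition(0) == Vector(1, 2))
```

This is why a retry never needs to re-read the whole file: the failed partition's bounds alone determine which records it processes again.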
On Fri, Jan 21, 2022 at 12:00 PM Siddhesh Kalgaonkar <
Hello team,
I am aware that in case of memory issues, when a task fails it will be
retried up to 4 times (the default value of spark.task.maxFailures), and if it
still fails then the entire job fails.
But suppose I am reading a file that is distributed across nodes in
partitions. So, what will
Hello, I tried using the Cassandra unshaded connector as well as the normal
connector; both give the same error at runtime when connecting to Cassandra.
"com.datastax.spark" %% "spark-cassandra-connector-unshaded" % "2.4.2"
or
"com.datastax.spark" %% "spark-cassandra-connector" % "3.1.0"
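One common cause of runtime errors here is a version mismatch: as far as I know, the 2.4.x connector line targets Spark 2.4 while the 3.x line targets Spark 3.x, so mixing them with the wrong Spark version typically fails at runtime. A minimal build.sbt sketch (versions are illustrative assumptions; check the connector's published compatibility matrix for your Spark version):

```scala
// build.sbt sketch -- versions are illustrative; match the connector
// line (3.x vs 2.4.x) to the Spark version you actually run against.
libraryDependencies ++= Seq(
  "org.apache.spark"   %% "spark-sql"                 % "3.1.2" % Provided,
  "com.datastax.spark" %% "spark-cassandra-connector" % "3.1.0"
)
```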
Russ
Eh, are you talking about foldLeft and foldRight in Scala? Those are not
from Hadoop or Spark.
They are common functions in functional languages. They 'fold' a collection
into a new value by applying a function to a starting value and every element
of the collection.
Because the op may be non-commutative,
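The distinction can be sketched in plain Scala: foldLeft threads the accumulator left-to-right, foldRight right-to-left, and with a non-commutative operator like subtraction the two give different results.

```scala
// foldLeft combines left-to-right; foldRight combines right-to-left.
// With subtraction (non-commutative, non-associative), order matters.
val xs = List(1, 2, 3)
val left  = xs.foldLeft(10)(_ - _)   // ((10 - 1) - 2) - 3 = 4
val right = xs.foldRight(10)(_ - _)  // 1 - (2 - (3 - 10)) = -8
assert(left == 4)
assert(right == -8)
```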
Sorry, I am programming in Scala, which has these functions.
Regards.
On Fri, 21 Jan 2022 at 20:59, Theodore J Griesenbrock wrote:
> I discovered several instances of discussion on leftFold and rightFold in
> a variety of forums, but I can not find anything related to RDD in the
> official
I discovered several instances of discussion on leftFold and rightFold in a variety of forums, but I cannot find anything related to RDD in the official documentation:
https://spark.apache.org/docs/latest/api/scala/org/apache/spark/rdd/RDD.html
It appears to be unrelated to Spark, and
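Indeed, RDD exposes only fold, not foldLeft/foldRight: a distributed fold combines per-partition results in no fixed order, so the operator must take and return the same type and should be associative (and in practice commutative) with a neutral zero element. A plain-Scala sketch of that two-level combination (no Spark; the explicit per-partition vectors are a stand-in for RDD partitions):

```scala
// Sketch of the constraint a distributed fold imposes: the operator is
// applied within each partition, then across partition results in no
// fixed order, so it must be associative with a neutral zero.
val partitions = Vector(Vector(1, 2), Vector(3, 4), Vector(5))
val zero = 0

// Step 1: fold inside each partition (may happen on different nodes).
val perPartition = partitions.map(_.fold(zero)(_ + _)) // Vector(3, 7, 5)

// Step 2: fold the partial results together, in any order.
val total = perPartition.fold(zero)(_ + _)
assert(total == 15)
```

Sums satisfy these requirements; subtraction, for example, does not, which is why there is no order-guaranteeing foldLeft/foldRight on RDD.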
Hello Juan Liu,
The release process is well documented (see last step on announcement):
https://spark.apache.org/release-process.html
To (un)subscribe to the mailing lists see:
https://spark.apache.org/community.html
Best,
Meikel
Meikel Bode, MSc
Senior Manager | Head of SAP Data Platforms &
Hello sparkers,
What are the differences between leftFold, rightFold, and the fold among the
RDD functions?
I am not very clear on how to use them.
Thanks.