df.count()
but this seems clunky (and expensive) for something that should be easy to
keep track of. I then thought accumulators might be the solution, but it
seems that I would have to do a second pass through the data at least to
"addInPlace" to the lines total. I might as well do that
;> > On Fri, 29 May 2020 11:16:12 -0700
>>>>>> > Something Something wrote:
>>>>>> >
>>>>>> > > Did you try this on the Cluster? Note: This works just fine under
>>>>>> 'Local'
>>>>>> > >
his works just fine under
>>>>> 'Local'
>>>>> > > mode.
>>>>> > >
>>>>> > > On Thu, May 28, 2020 at 9:12 PM ZHANG Wei
>>>>> wrote:
>>>>> > >
>>>>> > > > I c
ener {
>>>> > > > override def onQueryProgress(event:
>>>> > > > StreamingQueryListener.QueryProgressEvent): Unit = {
>>>> > > > println(event.progress.id + " is on progress")
>>>> > > > println(s&qu
s")
>>> > > > println(s"My accu is ${myAcc.value} on query progress")
>>> > > > }
>>> > > > ...
>>> > > > })
>>> > > >
>>> > > >
; state: ${state}")
>> > > > ...
>> > > > }
>> > > >
>> > > > val wordCounts = words
>> > > > .groupByKey(v => ...)
>> > > > .m
.mapGroupsWithState(timeoutConf =
> > > > GroupStateTimeout.ProcessingTimeTimeout)(func = mappingFunc)
> > > >
> > > > val query = wordCounts.writeStream
> > > > .outputMode(OutputMode.Update)
> > > > ...
> >
meTimeout)(func = mappingFunc)
> > >
> > > val query = wordCounts.writeStream
> > > .outputMode(OutputMode.Update)
> > > ...
> > > ```
> > >
> > > I'm wondering if there were any errors can be found from driver logs? The
> >
; micro-batch
> > exceptions won't terminate the streaming job running.
> >
> > For the following code, we have to make sure that `StateUpdateTask` is
> > started:
> > > .mapGroupsWithState(
> > >
, Iterator
> eventsIterator, GroupState state) {
> }
> }
>
> On Fri, May 29, 2020 at 1:08 PM Srinivas V wrote:
>
>>
>> Yes, accumulators are updated in the call method of StateUpdateTask. Like
>> when state times out or when the data is pushed to next Kafka topic
*--> I was expecting to
see 'accumulator' here in the definition.*
@Override
public ModelUpdate call(String productId, Iterator
eventsIterator, GroupState state) {
}
}
On Fri, May 29, 2020 at 1:08 PM Srinivas V wrote:
>
> Yes, accumulators are updated in the call method of StateU
Yes, accumulators are updated in the call method of StateUpdateTask. Like
when state times out or when the data is pushed to next Kafka topic etc.
On Fri, May 29, 2020 at 11:55 PM Something Something <
mailinglist...@gmail.com> wrote:
> Thanks! I will take a look at the link. Just one
Thanks! I will take a look at the link. Just one question, you seem to be
passing 'accumulators' in the constructor but where do you use it in the
StateUpdateTask class? I am still missing that connection. Sorry, if my
question is dumb. I must be missing something. Thanks for your help so far
ro-batch
> exceptions won't terminate the streaming job running.
>
> For the following code, we have to make sure that `StateUpdateTask` is
> started:
> > .mapGroupsWithState(
> > new
> StateUpdateTask(Long.pa
event.getId(), Encoders.STRING())
>> .mapGroupsWithState(
>> new
>> StateUpdateTask(Long.parseLong(appConfig.getSparkStructuredStreamingConfig().STATE_TIMEOUT),
>> appConfig, accumulators),
>>
tructuredStreamingConfig().STATE_TIMEOUT),
> appConfig, accumulators),
> Encoders.bean(ModelStateInfo.class),
> Encoders.bean(ModelUpdate.class),
> GroupStateTimeout.ProcessingTimeTimeout());
--
Cheers,
-z
On Thu, 28 M
getId(), Encoders.STRING())
> .mapGroupsWithState(
> new
> StateUpdateTask(Long.parseLong(appConfig.getSparkStructuredStreamingConfig().STATE_TIMEOUT),
> appConfig, accumulators),
> Encoders.bean(ModelStateInfo.class),
>
Giving the code below:
//accumulators is a class level variable in driver.
sparkSession.streams().addListener(new StreamingQueryListener() {
@Override
public void onQueryStarted(QueryStartedEvent queryStarted) {
logger.info("Query st
new
> StateUpdateTask(Long.parseLong(appConfig.getSparkStructuredStreamingConfig().STATE_TIMEOUT),
> appConfig, accumulators),
> Encoders.bean(ModelStateInfo.class),
> Encoders.bean(ModelUpdate.class),
>
ate(
new
StateUpdateTask(Long.parseLong(appConfig.getSparkStructuredStreamingConfig().STATE_TIMEOUT),
appConfig, accumulators),
Encoders.bean(ModelStateInfo.class),
Encoders.bean(ModelUpdate.cl
'? We're
experiencing this only under 'Stateful Structured Streaming'. In other
streaming applications it works as expected.
On Wed, May 27, 2020 at 9:01 AM Srinivas V wrote:
> Yes, I am talking about Application specific Accumulators. Actually I am
> getting the values printed in my driv
Yes, I am talking about Application specific Accumulators. Actually I am
getting the values printed in my driver log as well as sent to Grafana. Not
sure where and when I saw 0 before. My deploy mode is “client” on a yarn
cluster(not local Mac) where I submit from master node. It should work
Hmm... how would they go to Graphana if they are not getting computed in
your code? I am talking about the Application Specific Accumulators. The
other standard counters such as 'event.progress.inputRowsPerSecond' are
getting populated correctly!
On Mon, May 25, 2020 at 8:39 PM Srinivas V wrote
[3]
>> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.util.LongAccumulator
>>
>> ____
>> From: Something Something
>> Sent: Saturday, May 16, 2020 0:38
>> To: spark-user
>> Subject: Re: Using Spark Accumulators with Structured Streaming
&g
_
> From: Something Something
> Sent: Saturday, May 16, 2020 0:38
> To: spark-user
> Subject: Re: Using Spark Accumulators with Structured Streaming
>
> Can someone from Spark Development team tell me if this functionality is
> supported and tested? I've spent a lot of time
://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.util.LongAccumulator
From: Something Something
Sent: Saturday, May 16, 2020 0:38
To: spark-user
Subject: Re: Using Spark Accumulators with Structured Streaming
Can someone from Spark Development
accumulators. Here's the definition:
class CollectionLongAccumulator[T]
extends AccumulatorV2[T, java.util.Map[T, Long]]
When the job begins we register an instance of this class:
spark.sparkContext.register(myAccumulator, "MyAccumulator")
Is this working under Structured Streaming?
I
In my structured streaming job I am updating Spark Accumulators in the
updateAcrossEvents method but they are always 0 when I try to print them in
my StreamingListener. Here's the code:
.mapGroupsWithState(GroupStateTimeout.ProcessingTimeTimeout())(
updateAcrossEvents
Yes!
Thanks!
~Kedar Dixit
Bigdata Analytics at Persistent Systems Ltd.
From: Holden Karau <hol...@pigscanfly.ca>
Sent: 14 November 2017 20:04:50
To: Kedarnath Dixit
Cc: user@spark.apache.org
Subject: Re: Use of Accumulators
And where do you want t
[via Apache Spark User List] [
> mailto:ml+s1001560n29995...@n3.nabble.com
> <ml+s1001560n29995...@n3.nabble.com>]
> *Sent:* Tuesday, November 14, 2017 1:16 PM
> *To:* Kedarnath Dixit <kedarnath_di...@persistent.com>
>
>
> *Subject:* Re: Use of Accumulators
>
>
&g
: Holden Karau [via Apache Spark User List]
[mailto:ml+s1001560n29995...@n3.nabble.com]
Sent: Tuesday, November 14, 2017 1:16 PM
To: Kedarnath Dixit
<kedarnath_di...@persistent.com<mailto:kedarnath_di...@persistent.com>>
Subject: Re: Use of Accumulators
So you want to set an accumulato
So you want to set an accumulator to 1 after a transformation has fully
completed? Or what exactly do you want to do?
On Mon, Nov 13, 2017 at 9:47 PM vaquar khan <vaquar.k...@gmail.com> wrote:
> Confirmed ,you can use Accumulators :)
>
> Regards,
> Vaquar khan
>
> On Mo
Confirmed ,you can use Accumulators :)
Regards,
Vaquar khan
On Mon, Nov 13, 2017 at 10:58 AM, Kedarnath Dixit <
kedarnath_di...@persistent.com> wrote:
> Hi,
>
>
> We need some way to toggle the flag of a variable in transformation.
>
>
> We are thinking to make
Hi,
We need some way to toggle the flag of a variable in transformation.
We are thinking to make use of spark Accumulators for this purpose.
Can we use these as below:
Variables -> Initial Value
Variable1 -> 0
Variable2 -> 0
In one of the transformations if we nee
path single-threaded and pass around the result when the task
competes; which sounds like AccumulatorV2.
I started rewriting the instrumented logic to be based off accumulators,
but having a hard time getting these to show up in the UI/API (using this
to see if I am linking things properly).
So my
> seem to find the accumulator:
>
> First way:
>
> AccumulatorContext.lookForAccumulatorByName("accumulator-name").
>
> map(accum => {
> accum.asInstanceOf[MyCustomAccumulator].add(*k, v*))
> })
>
>
> Second way:
>
> taskContext.taskMetrics().accumula
v*))
})
Second way:
taskContext.taskMetrics().accumulators().
filter(_.name == Some("accumulator-name")).
map(accum => {
accum.asInstanceOf[MyCustomAccumulator].add(*k, v*))
})
Thanks
Tarun
As you noted, Accumulators do not guarantee accurate results except in
specific situations. I recommend never using them.
This article goes into some detail on the problems with accumulators:
http://imranrashid.com/posts/Spark-Accumulators/
On Wed, Mar 1, 2017 at 7:26 AM, Charles O. Bajomo
Hello everyone,
I wanted to know if there is any benefit to using an acculumator over just
executing a count() on the whole RDD. There seems to be a lot of issues with
accumulator during a stage failure and also seems to be an issue rebuilding
them if the application restarts from a
Accumulators aren't related directly to RDDs or Datasets. They're a
separate construct. You can imagine updating accumulators in any
distributed operation that you see documented for RDDs or Datasets.
On Wed, Jan 18, 2017 at 2:16 PM Hanna Mäki <hanna.m...@comptel.com> wrote:
Hi,
The documentation
(http://spark.apache.org/docs/latest/programming-guide.html#accumulators)
describes how to use accumulators with RDDs, but I'm wondering if and how I
can use accumulators with the Dataset API.
BR,
Hanna
--
View this message in context:
http://apache-spark-user-list
On 12 Dec 2016, at 19:57, Daniel Siegmann
<dsiegm...@securityscorecard.io<mailto:dsiegm...@securityscorecard.io>> wrote:
Accumulators are generally unreliable and should not be used. The answer to (2)
and (4) is yes. The answer to (3) is both.
Here's a more in-depth explan
Thank you for the clarification.
On Tue, Dec 13, 2016 at 1:27 AM Daniel Siegmann <
dsiegm...@securityscorecard.io> wrote:
> Accumulators are generally unreliable and should not be used. The answer
> to (2) and (4) is yes. The answer to (3) is both.
>
> Here's a more in-depth e
Accumulators are generally unreliable and should not be used. The answer to
(2) and (4) is yes. The answer to (3) is both.
Here's a more in-depth explanation:
http://imranrashid.com/posts/Spark-Accumulators/
On Sun, Dec 11, 2016 at 11:27 AM, Sudev A C <sudev...@goibibo.com> wrote:
> Pl
Please help.
Anyone, any thoughts on the previous mail ?
Thanks
Sudev
On Fri, Dec 9, 2016 at 2:28 PM Sudev A C <sudev...@goibibo.com> wrote:
> Hi,
>
> Can anyone please help clarity on how accumulators can be used reliably to
> measure error/success/analytical metrics ?
>
Hi,
Can anyone please help clarity on how accumulators can be used reliably to
measure error/success/analytical metrics ?
Given below is use case / code snippet that I have.
val amtZero = sc.accumulator(0)
> val amtLarge = sc.accumulator(0)
> val amtNormal = sc.accumulator(0)
> val
Hi all,
In order to recover Accumulators (functionally) from a Driver failure, it
is recommended to use it within a foreachRDD/transform and use the RDD
context with a Singleton wrapping the Accumulator as shown in the examples
<https://github.com/apache/spark/blob/branch-1.6/examples/src/m
Hi all,
To recover (functionally) Accumulators from Driver failure in a streaming
application, we wrap them in a "getOrCreate" Singleton as shown here
<https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/RecoverableNetworkWordCou
Hi,
I am writing an app in Spark ( 1.6.1 ) in which I am using an accumulator.
My accumulator is simply counting rows: acc += 1.
My test processes 4 files each with 4 rows however the value of the
accumulator in the end is not 16 and even worse is inconsistent between
runs.
Are accumulators
On 25 May 2016 6:00 p.m., "Daniel Barclay" <danielbarclay@gmail.com>
wrote:
>
> Was the feature of displaying accumulators in the Spark UI implemented in
Spark 1.4.1, or was that added later?
Dunno, but only *named* *accumulators* are displayed in Spark’s webUI
(under
Was the feature of displaying accumulators in the Spark UI implemented in Spark
1.4.1, or was that added later?
Thanks,
Daniel
-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail
Hi,
My question is about having a lot of counters in spark to keep track of bad/null
values in my rdd its descried in detail in below stackoverflow link
http://stackoverflow.com/questions/35953400/too-many-accumulators-in-spark-job
Posting to the user group to get more traction. Appreciate your
ssage without saveAs
INFO : org.apache.spark.executor.Executor - Finished task 0.0 in stage 1.0 (TID
1). 987 bytes result sent to driver
INFO : org.apache.spark.scheduler.DAGScheduler - ResultStage 1 (foreachRDD at
KafkaURLStreaming.java:90) finished in 0.103 s
INFO : org.apache.spark.sche
Hi Rachana,
don't you have two messages on the kafka broker ?
Regards
JB
On 01/05/2016 05:14 PM, Rachana Srivastava wrote:
I have a very simple two lines program. I am getting input from Kafka
and save the input in a file and counting the input received. My code
looks like this, when I run
I have a very simple two lines program. I am getting input from Kafka and save
the input in a file and counting the input received. My code looks like this,
when I run this code I am getting two accumulator count for each input.
HashMap kafkaParams = new HashMap
; tasks from ResultStage 1 (MapPartitionsRDD[3] at map at
> KafkaURLStreaming.java:83)
>
> INFO : org.apache.spark.scheduler.TaskSchedulerImpl - Adding task set 1.0
> with 1 tasks
>
> INFO : org.apache.spark.scheduler.TaskSetManager - Starting task 0.0 in
> stage 1.0 (TID 1, localhost, AN
batch and a streaming driver using same functions (Scala). I use
> accumulators (passed to functions constructors) to count stuff.
>
> In the batch driver, doing so in the right point of the pipeline, I'm able
> to retrieve the accumulator value and print it as log4j log.
>
> In the st
Hello,
I have a batch and a streaming driver using same functions (Scala). I use
accumulators (passed to functions constructors) to count stuff.
In the batch driver, doing so in the right point of the pipeline, I'm able
to retrieve the accumulator value and print it as log4j log
to creating many discrete accumulators
* The merge operation is add the values on key conflict
* I’m adding K->Vs to this accumulator in a variety of places (maps,
flatmaps, transforms and updateStateBy key)
* In a foreachRdd at the end of the transformations I’m reading the
accumula
It seems like there is not much literature about Spark's Accumulators so I
thought I'd ask here:
Do Accumulators reside in a Task ? Are they being serialized with the task ?
Sent back on task completion as part of the ResultTask ?
Are they reliable ? If so, when ? Can I relay on accumulators
Can you show related code in DriverAccumulator.java ?
Which Spark release do you use ?
Cheers
On Mon, Aug 3, 2015 at 3:13 PM, Anubhav Agarwal anubha...@gmail.com wrote:
Hi,
I am trying to modify my code to use HDFS and multiple nodes. The code
works fine when I run it locally in a single
Hi,
I am trying to modify my code to use HDFS and multiple nodes. The code
works fine when I run it locally in a single machine with a single worker.
I have been trying to modify it and I get the following error. Any hint
would be helpful.
java.lang.NullPointerException
at
The code was written in 1.4 but I am compiling it and running it with 1.3.
import it.unimi.dsi.fastutil.objects.Object2ObjectOpenHashMap;
import org.apache.spark.AccumulableParam;
import scala.Tuple4;
import thomsonreuters.trailblazer.operation.DriverCalc;
import
Putting your code in a file I find the following on line 17:
stepAcc = new StepAccumulator();
However I don't think that was where the NPE was thrown.
Another thing I don't understand was that there were two addAccumulator()
calls at the top of stack trace while in your code I
to update accumulators for
ResultTask(1, 16)
java.util.NoSuchElementException: key not found: 2
at scala.collection.MapLike$class.default(MapLike.scala:228)
at scala.collection.AbstractMap.default(Map.scala:58)
at scala.collection.mutable.HashMap.apply(HashMap.scala:64
)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2.For accumulator variable it says :
15/07/29 19:23:12 ERROR DAGScheduler: Failed to update accumulators for
ResultTask(1, 16)
java.util.NoSuchElementException: key not found: 2
(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2.For accumulator variable it says :
15/07/29 19:23:12 ERROR DAGScheduler: Failed to update accumulators for
ResultTask(1, 16)
java.util.NoSuchElementException: key not found: 2
at scala.collection.MapLike$class.default
ERROR DAGScheduler: Failed to update accumulators for
ResultTask(1, 16)
java.util.NoSuchElementException: key not found: 2
at scala.collection.MapLike$class.default(MapLike.scala:228)
at scala.collection.AbstractMap.default(Map.scala:58
.
Guillaume
Hey,
I am not 100% sure but from my understanding accumulators
are per partition (so per task as its the same) and are sent
back to the driver with the task result and merged. When a
task needs to be run n times (multiple rdds depend
June 2015 at 21:32, Will Briggs wrbri...@gmail.com wrote:
It sounds like accumulators are not necessary in Spark Streaming - see
this post (
http://apache-spark-user-list.1001560.n3.nabble.com/Shared-variable-in-Spark-Streaming-td11762.html)
for more details.
On June 21, 2015, at 7:31 PM
If I am not mistaken, one way to see the accumulators is that they are just
write-only for the workers and their value can be read by the driver.
Therefore they cannot be used for ID generation as you wish.
On 22 June 2015 at 04:30, anshu shukla anshushuk...@gmail.com wrote:
But i just want
the accumulators is that they are
just write-only for the workers and their value can be read by the driver.
Therefore they cannot be used for ID generation as you wish.
On 22 June 2015 at 04:30, anshu shukla anshushuk...@gmail.com wrote:
But i just want to update rdd , by appending unique
StreamingContext.sparkContext()
On 21 June 2015 at 21:32, Will Briggs wrbri...@gmail.com wrote:
It sounds like accumulators are not necessary in Spark Streaming - see
this post (
http://apache-spark-user-list.1001560.n3.nabble.com/Shared-variable-in-Spark-Streaming-td11762.html)
for more
It sounds like accumulators are not necessary in Spark Streaming - see this
post (
http://apache-spark-user-list.1001560.n3.nabble.com/Shared-variable-in-Spark-Streaming-td11762.html)
for more details.
On June 21, 2015, at 7:31 PM, anshu shukla anshushuk...@gmail.com wrote:
In spark Streaming
In spark Streaming ,Since we are already having Streaming context , which
does not allows us to have accumulators .We have to get sparkContext for
initializing accumulator value .
But having 2 spark context will not serve the problem .
Please Help !!
--
Thanks Regards,
Anshu Shukla
Hi,
Thank you for this confirmation.
Coalescing is what we do now. It creates, however, very big partitions.
Guillaume
Hey,
I am not 100% sure but from my understanding accumulators are per partition
(so per task as its the same) and are sent back to the driver with the task
result
is what we do now. It creates, however, very big
partitions.
Guillaume
Hey,
I am not 100% sure but from my understanding accumulators are per
partition (so per task as its the same) and are sent back to the
driver with the task result and merged. When a task needs
Hey,
I am not 100% sure but from my understanding accumulators are per partition
(so per task as its the same) and are sent back to the driver with the task
result and merged. When a task needs to be run n times (multiple rdds
depend on this one, some partition loss later in the chain etc
for this confirmation.
Coalescing is what we do now. It creates, however, very big partitions.
Guillaume
Hey,
I am not 100% sure but from my understanding accumulators are per
partition (so per task as its the same) and are sent back to the driver
with the task result and merged. When a task
but from my understanding accumulators are per
partition (so per task as its the same) and are sent back to the driver
with the task result and merged. When a task needs to be run n times
(multiple rdds depend on this one, some partition loss later in the chain
etc) then the accumulator will count
Hi,
I'm trying to figure out the smartest way to implement a global
count-min-sketch on accumulators. For now, we are doing that with RDDs.
It works well, but with one sketch per partition, merging takes too long.
As you probably know, a count-min sketch is a big mutable array of array
, but it
would become over complicated.
2015-06-18 14:27 GMT+02:00 Guillaume Pitel guillaume.pi...@exensa.com:
Hi,
Thank you for this confirmation.
Coalescing is what we do now. It creates, however, very big partitions.
Guillaume
Hey,
I am not 100% sure but from my understanding accumulators
Hello all,
I have accumulator in spark streaming application which counts number of
events received from Kafka.
From the documentation , It seems Spark UI has support to display it .
But I am unable to see it on UI. I am using spark 1.3.1
Do I need to call any method (print) or am I missing
You need to make sure to name the accumulator.
On Tue, May 26, 2015 at 2:23 PM, Snehal Nagmote nagmote.sne...@gmail.com
wrote:
Hello all,
I have accumulator in spark streaming application which counts number of
events received from Kafka.
From the documentation , It seems Spark UI has
? And then the final result
will be many times of the real result?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Questions-about-Accumulators-tp22746.html
Sent from the Apache Spark User List mailing list archive at Nabble.com
), then the final result
of acculumator will be 2, twice as the correct result?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Questions-about-Accumulators-tp22746p22747.html
Sent from the Apache Spark User List mailing list archive at Nabble.com
), this operation will be
execuated more than once if the task restarte? And then the final result
will be many times of the real result?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Questions-about-Accumulators-tp22746.html
Sent from the Apache Spark User List
Given the lazy nature of an RDD if you use an accumulator inside a map()
and then you call count and saveAsTextfile over that accumulator will be
called twice. IMHO, accumulators are a bit nondeterministic you need to be
sure when to read them to avoid unexpected re-executions
El 3/5/2015 2:09 p
://apache-spark-user-list.1001560.n3.nabble.com/Questions-about-Accumulators-tp22746p22747.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
Hello,
I am seeing negative values for accumulators. Here's my implementation in a
standalone app in Spark 1.1.1rc:
implicit object BigIntAccumulatorParam extends AccumulatorParam[BigInt] {
def addInPlace(t1: Int, t2: BigInt) = BigInt(t1) + t2
def addInPlace(t1: BigInt, t2: BigInt
Sanity-check: would it be possible that `threshold_var` be negative ?
—
FG
On Fri, Jan 30, 2015 at 5:06 PM, Peter Thai thai.pe...@gmail.com wrote:
Hello,
I am seeing negative values for accumulators. Here's my implementation in a
standalone app in Spark 1.1.1rc:
implicit object
in my wording.
Should be I'm assuming it's not immediately aggregating on the driver
each time I call the += on the Accumulator.
On Wed, Jan 14, 2015 at 9:19 PM, Corey Nolet cjno...@gmail.com wrote:
What are the limitations of using Accumulators to get a union of a bunch
of small sets
Just noticed an error in my wording.
Should be I'm assuming it's not immediately aggregating on the driver
each time I call the += on the Accumulator.
On Wed, Jan 14, 2015 at 9:19 PM, Corey Nolet cjno...@gmail.com wrote:
What are the limitations of using Accumulators to get a union of a bunch
What are the limitations of using Accumulators to get a union of a bunch of
small sets?
Let's say I have an RDD[Map{String,Any} and i want to do:
rdd.map(accumulator += Set(_.get(entityType).get))
What implication does this have on performance? I'm assuming it's not
immediately aggregating
Hi,
The Spark documentation states that If accumulators are created with a
name, they will be displayed in Spark’s UI
http://spark.apache.org/docs/latest/programming-guide.html#accumulators
Where exactly are they shown? I may be dense, but I can't find them on the
UI from http://localhost:4040
Hi
Does spark have built in possiblity of exposing current value of
Accumulator [1] using Monitoring and Instrumentation [2].
Unfortunately I couldn't find anything in Sources which could be used.
Does it mean only way to expose current accumulator value is to implement
new Source which would
Hi all – I’m running a long running batch-processing job with Spark through
Yarn. I am doing the following
Batch Process
val resultsArr =
sc.accumulableCollection(mutable.ArrayBuffer[ListenableFuture[Result]]())
InMemoryArray.forEach{
1) Using a thread pool, generate callable jobs that
...@capitalone.com
Date: Tuesday, December 16, 2014 at 10:23 AM
To: 'user@spark.apache.orgmailto:'user@spark.apache.org'
user@spark.apache.orgmailto:user@spark.apache.org
Subject: Understanding disk usage with Accumulators
Hi all – I’m running a long running batch-processing job with Spark through
Yarn. I am
={
myAccumulator+=math.min(x._1, 100)
})
//works
someRDD.foreach(x={
myAccumulator+=x._1+100
})
Any ideas?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Negative-Accumulators-tp19706p20183.html
Sent from the Apache Spark User List mailing list archive
: org.apache.spark.Accumulator[scala.math.BigInt] = 0
scala accu += 100
scala accu.value
res1: scala.math.BigInt = 100
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Negative-Accumulators-tp19706p20199.html
Sent from the Apache Spark User List mailing list archive
1 - 100 of 123 matches
Mail list logo