Accumulators and other important metrics for your job

2021-05-27 Thread Hamish Whittal
df.count(), but this seems clunky (and expensive) for something that should be easy to keep track of. I then thought accumulators might be the solution, but it seems that I would have to do at least a second pass through the data to "addInPlace" to the lines total. I might as well do that
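
A minimal sketch of the alternative being hinted at here: a named LongAccumulator bumped inside the transformation, so the line total falls out of the same pass as the real work. Paths and names are assumptions, not from the thread, and counts bumped inside transformations can be inflated by retried tasks.

```
import org.apache.spark.sql.{Encoders, SparkSession}

object RowCountWhileProcessing {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("row-count").getOrCreate()
    // Named so it also shows up in the web UI.
    val rowsSeen = spark.sparkContext.longAccumulator("rowsSeen")

    val lines = spark.read.textFile("/path/to/input")   // hypothetical input path
    val upper = lines.map { line =>
      rowsSeen.add(1)                                    // counted as a side effect of the real work
      line.toUpperCase
    }(Encoders.STRING)

    upper.write.text("/path/to/output")                  // the action triggers the count
    println(s"lines processed: ${rowsSeen.value}")       // read back in the driver
    spark.stop()
  }
}
```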

Re: Using Spark Accumulators with Structured Streaming

2020-06-08 Thread Something Something
On Fri, 29 May 2020 11:16:12 -0700, Something Something wrote: Did you try this on the Cluster? Note: This works just fine under 'Local'…

Re: Using Spark Accumulators with Structured Streaming

2020-06-08 Thread Srinivas V
…this works just fine under 'Local' mode. On Thu, May 28, 2020 at 9:12 PM ZHANG Wei wrote: I c…

Re: Using Spark Accumulators with Structured Streaming

2020-06-08 Thread Something Something
…Listener { override def onQueryProgress(event: StreamingQueryListener.QueryProgressEvent): Unit = { println(event.progress.id + " is on progress") println(s"…

Re: Using Spark Accumulators with Structured Streaming

2020-06-08 Thread Something Something
…") println(s"My accu is ${myAcc.value} on query progress") } ... })…

Re: Using Spark Accumulators with Structured Streaming

2020-06-08 Thread Srinivas V
…state: ${state}") ... } val wordCounts = words .groupByKey(v => ...) .m…

Re: Using Spark Accumulators with Structured Streaming

2020-06-07 Thread Something Something
.mapGroupsWithState(timeoutConf = GroupStateTimeout.ProcessingTimeTimeout)(func = mappingFunc) val query = wordCounts.writeStream .outputMode(OutputMode.Update) ...

Re: Using Spark Accumulators with Structured Streaming

2020-06-04 Thread ZHANG Wei
…meTimeout)(func = mappingFunc) val query = wordCounts.writeStream .outputMode(OutputMode.Update) ... ``` I'm wondering whether any errors can be found in the driver logs? The…

Re: Using Spark Accumulators with Structured Streaming

2020-06-01 Thread ZHANG Wei
…micro-batch exceptions won't terminate the running streaming job. For the following code, we have to make sure that `StateUpdateTask` is started: .mapGroupsWithState(…

Re: Using Spark Accumulators with Structured Streaming

2020-05-30 Thread Srinivas V
…Iterator eventsIterator, GroupState state) { } } On Fri, May 29, 2020 at 1:08 PM Srinivas V wrote: Yes, accumulators are updated in the call method of StateUpdateTask. Like when state times out or when the data is pushed to next Kafka topic…

Re: Using Spark Accumulators with Structured Streaming

2020-05-29 Thread Something Something
*--> I was expecting to see 'accumulator' here in the definition.* @Override public ModelUpdate call(String productId, Iterator eventsIterator, GroupState state) { } } On Fri, May 29, 2020 at 1:08 PM Srinivas V wrote: Yes, accumulators are updated in the call method of StateU…

Re: Using Spark Accumulators with Structured Streaming

2020-05-29 Thread Srinivas V
Yes, accumulators are updated in the call method of StateUpdateTask. Like when state times out or when the data is pushed to next Kafka topic etc. On Fri, May 29, 2020 at 11:55 PM Something Something <mailinglist...@gmail.com> wrote: Thanks! I will take a look at the link. Just one…

Re: Using Spark Accumulators with Structured Streaming

2020-05-29 Thread Something Something
Thanks! I will take a look at the link. Just one question: you seem to be passing 'accumulators' in the constructor, but where do you use it in the StateUpdateTask class? I am still missing that connection. Sorry if my question is dumb; I must be missing something. Thanks for your help so far

Re: Using Spark Accumulators with Structured Streaming

2020-05-29 Thread Something Something
…micro-batch exceptions won't terminate the running streaming job. For the following code, we have to make sure that `StateUpdateTask` is started: .mapGroupsWithState( new StateUpdateTask(Long.pa…

Re: Using Spark Accumulators with Structured Streaming

2020-05-29 Thread Srinivas V
event.getId(), Encoders.STRING()) .mapGroupsWithState( new StateUpdateTask(Long.parseLong(appConfig.getSparkStructuredStreamingConfig().STATE_TIMEOUT), appConfig, accumulators),…

Re: Using Spark Accumulators with Structured Streaming

2020-05-28 Thread ZHANG Wei
…tructuredStreamingConfig().STATE_TIMEOUT), appConfig, accumulators), Encoders.bean(ModelStateInfo.class), Encoders.bean(ModelUpdate.class), GroupStateTimeout.ProcessingTimeTimeout()); -- Cheers, -z On Thu, 28 M…

Re: Using Spark Accumulators with Structured Streaming

2020-05-28 Thread Something Something
getId(), Encoders.STRING()) .mapGroupsWithState( new StateUpdateTask(Long.parseLong(appConfig.getSparkStructuredStreamingConfig().STATE_TIMEOUT), appConfig, accumulators), Encoders.bean(ModelStateInfo.class),…

Re: Using Spark Accumulators with Structured Streaming

2020-05-28 Thread Srinivas V
Giving the code below: //accumulators is a class level variable in driver. sparkSession.streams().addListener(new StreamingQueryListener() { @Override public void onQueryStarted(QueryStartedEvent queryStarted) { logger.info("Query st
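
A Scala sketch of the pattern this message describes (the original is Java): the accumulator lives in the driver and a StreamingQueryListener prints its value on every progress event. Names are assumptions.

```
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

val spark = SparkSession.builder().appName("acc-listener").getOrCreate()
val myAcc = spark.sparkContext.longAccumulator("myAcc")   // driver-side, class-level in the original

spark.streams.addListener(new StreamingQueryListener {
  override def onQueryStarted(event: QueryStartedEvent): Unit =
    println(s"Query started: ${event.id}")
  override def onQueryProgress(event: QueryProgressEvent): Unit =
    // The value is only meaningful in the driver; executors can only add to it.
    println(s"My accu is ${myAcc.value} on query progress")
  override def onQueryTerminated(event: QueryTerminatedEvent): Unit =
    println(s"Query terminated: ${event.id}")
})
```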

Re: Using Spark Accumulators with Structured Streaming

2020-05-28 Thread ZHANG Wei
new StateUpdateTask(Long.parseLong(appConfig.getSparkStructuredStreamingConfig().STATE_TIMEOUT), appConfig, accumulators), Encoders.bean(ModelStateInfo.class), Encoders.bean(ModelUpdate.class),…

Re: Using Spark Accumulators with Structured Streaming

2020-05-28 Thread Srinivas V
…mapGroupsWithState( new StateUpdateTask(Long.parseLong(appConfig.getSparkStructuredStreamingConfig().STATE_TIMEOUT), appConfig, accumulators), Encoders.bean(ModelStateInfo.class), Encoders.bean(ModelUpdate.cl…

Re: Using Spark Accumulators with Structured Streaming

2020-05-27 Thread Something Something
…? We're experiencing this only under 'Stateful Structured Streaming'. In other streaming applications it works as expected. On Wed, May 27, 2020 at 9:01 AM Srinivas V wrote: Yes, I am talking about Application specific Accumulators. Actually I am getting the values printed in my driv…

Re: Using Spark Accumulators with Structured Streaming

2020-05-27 Thread Srinivas V
Yes, I am talking about Application specific Accumulators. Actually I am getting the values printed in my driver log as well as sent to Grafana. Not sure where and when I saw 0 before. My deploy mode is “client” on a YARN cluster (not local Mac) where I submit from the master node. It should work

Re: Using Spark Accumulators with Structured Streaming

2020-05-26 Thread Something Something
Hmm... how would they go to Grafana if they are not getting computed in your code? I am talking about the Application Specific Accumulators. The other standard counters such as 'event.progress.inputRowsPerSecond' are getting populated correctly! On Mon, May 25, 2020 at 8:39 PM Srinivas V wrote

Re: Using Spark Accumulators with Structured Streaming

2020-05-25 Thread Srinivas V
[3] http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.util.LongAccumulator From: Something Something Sent: Saturday, May 16, 2020 0:38 To: spark-user Subject: Re: Using Spark Accumulators with Structured Streaming…

Re: Using Spark Accumulators with Structured Streaming

2020-05-25 Thread Something Something
From: Something Something Sent: Saturday, May 16, 2020 0:38 To: spark-user Subject: Re: Using Spark Accumulators with Structured Streaming Can someone from Spark Development team tell me if this functionality is supported and tested? I've spent a lot of time…

Re: Using Spark Accumulators with Structured Streaming

2020-05-15 Thread ZHANG Wei
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.util.LongAccumulator From: Something Something Sent: Saturday, May 16, 2020 0:38 To: spark-user Subject: Re: Using Spark Accumulators with Structured Streaming Can someone from Spark Development…

Re: Using Spark Accumulators with Structured Streaming

2020-05-15 Thread Something Something
accumulators. Here's the definition: class CollectionLongAccumulator[T] extends AccumulatorV2[T, java.util.Map[T, Long]] When the job begins we register an instance of this class: spark.sparkContext.register(myAccumulator, "MyAccumulator") Is this working under Structured Streaming? I
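
A sketch of what a custom accumulator like the CollectionLongAccumulator described here could look like; the per-key counting behaviour is an assumption based on the name, not the poster's actual code.

```
import java.util.{HashMap => JHashMap, Map => JMap}
import org.apache.spark.util.AccumulatorV2

class CollectionLongAccumulator[T] extends AccumulatorV2[T, JMap[T, Long]] {
  private val counts = new JHashMap[T, Long]()

  override def isZero: Boolean = counts.isEmpty
  override def copy(): CollectionLongAccumulator[T] = {
    val acc = new CollectionLongAccumulator[T]
    acc.counts.putAll(counts)
    acc
  }
  override def reset(): Unit = counts.clear()
  override def add(v: T): Unit =
    counts.put(v, counts.getOrDefault(v, 0L) + 1L)          // count occurrences per key
  override def merge(other: AccumulatorV2[T, JMap[T, Long]]): Unit = {
    val it = other.value.entrySet().iterator()
    while (it.hasNext) {
      val e = it.next()
      counts.put(e.getKey, counts.getOrDefault(e.getKey, 0L) + e.getValue)
    }
  }
  override def value: JMap[T, Long] = counts
}

// Registered once in the driver, as in the message above:
// spark.sparkContext.register(new CollectionLongAccumulator[String], "MyAccumulator")
```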

Using Spark Accumulators with Structured Streaming

2020-05-14 Thread Something Something
In my structured streaming job I am updating Spark Accumulators in the updateAcrossEvents method but they are always 0 when I try to print them in my StreamingListener. Here's the code: .mapGroupsWithState(GroupStateTimeout.ProcessingTimeTimeout())( updateAcrossEvents
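
A compact sketch (event and state types are hypothetical) of the set-up described above: the accumulator is added to inside the updateAcrossEvents function on the executors, and read back in the driver, for example from a StreamingQueryListener as shown later in this thread.

```
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

case class Event(id: String, payload: String)      // hypothetical event type
case class EventCount(id: String, count: Long)     // hypothetical state/result type

val spark = SparkSession.builder().appName("stateful-acc").getOrCreate()
import spark.implicits._

val eventsProcessed = spark.sparkContext.longAccumulator("eventsProcessed")

def updateAcrossEvents(id: String,
                       events: Iterator[Event],
                       state: GroupState[EventCount]): EventCount = {
  val seen = events.size
  eventsProcessed.add(seen)                         // executor-side add; value is read in the driver
  val updated = EventCount(id, state.getOption.map(_.count).getOrElse(0L) + seen)
  state.update(updated)
  updated
}

// events: Dataset[Event] read from a streaming source (not shown)
// val counts = events
//   .groupByKey(_.id)
//   .mapGroupsWithState(GroupStateTimeout.ProcessingTimeTimeout())(updateAcrossEvents)
// counts.writeStream.outputMode(OutputMode.Update()).format("console").start()
```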

Re: Use of Accumulators

2017-11-14 Thread Kedarnath Dixit
Yes! Thanks! ~Kedar Dixit Bigdata Analytics at Persistent Systems Ltd. From: Holden Karau <hol...@pigscanfly.ca> Sent: 14 November 2017 20:04:50 To: Kedarnath Dixit Cc: user@spark.apache.org Subject: Re: Use of Accumulators And where do you want t

Re: Use of Accumulators

2017-11-14 Thread Holden Karau
[via Apache Spark User List] [mailto:ml+s1001560n29995...@n3.nabble.com] *Sent:* Tuesday, November 14, 2017 1:16 PM *To:* Kedarnath Dixit <kedarnath_di...@persistent.com> *Subject:* Re: Use of Accumulators…

RE: Use of Accumulators

2017-11-14 Thread Kedarnath Dixit
: Holden Karau [via Apache Spark User List] [mailto:ml+s1001560n29995...@n3.nabble.com] Sent: Tuesday, November 14, 2017 1:16 PM To: Kedarnath Dixit <kedarnath_di...@persistent.com> Subject: Re: Use of Accumulators So you want to set an accumulato…

Re: Use of Accumulators

2017-11-13 Thread Holden Karau
So you want to set an accumulator to 1 after a transformation has fully completed? Or what exactly do you want to do? On Mon, Nov 13, 2017 at 9:47 PM vaquar khan <vaquar.k...@gmail.com> wrote: Confirmed, you can use Accumulators :) Regards, Vaquar khan On Mo…

Re: Use of Accumulators

2017-11-13 Thread vaquar khan
Confirmed, you can use Accumulators :) Regards, Vaquar khan On Mon, Nov 13, 2017 at 10:58 AM, Kedarnath Dixit <kedarnath_di...@persistent.com> wrote: Hi, We need some way to toggle the flag of a variable in transformation. We are thinking to make…

Use of Accumulators

2017-11-13 Thread Kedarnath Dixit
Hi, We need some way to toggle the flag of a variable in a transformation. We are thinking of using Spark Accumulators for this purpose. Can we use them as below: Variables -> Initial Value Variable1 -> 0 Variable2 -> 0 In one of the transformations, if we nee…
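
A minimal sketch, under the assumption that "toggle the flag" means driving a counter from 0 to a non-zero value inside a transformation and reading it in the driver once an action has finished (which is also what the reply above asks about).

```
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("flag-toggle").getOrCreate()
val sc = spark.sparkContext

val variable1 = sc.longAccumulator("Variable1")   // initial value 0
val variable2 = sc.longAccumulator("Variable2")   // initial value 0

val mapped = sc.parallelize(1 to 100).map { x =>
  if (x % 2 == 0) variable1.add(1) else variable2.add(1)   // "toggle" by making it non-zero
  x
}
mapped.count()                                    // the flags are only meaningful after an action

val flag1 = variable1.value > 0
val flag2 = variable2.value > 0
println(s"flag1=$flag1 flag2=$flag2")
```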

Dynamic Accumulators in 2.x?

2017-10-11 Thread David Capwell
path single-threaded and pass around the result when the task completes, which sounds like AccumulatorV2. I started rewriting the instrumented logic to be based on accumulators, but I'm having a hard time getting these to show up in the UI/API (using this to see if I am linking things properly). So my…

Re: Accumulators not available in Task.taskMetrics

2017-10-05 Thread Tarun Kumar
…seem to find the accumulator: First way: AccumulatorContext.lookForAccumulatorByName("accumulator-name"). map(accum => { accum.asInstanceOf[MyCustomAccumulator].add(*k, v*)) }) Second way: taskContext.taskMetrics().accumula…

Accumulators not available in Task.taskMetrics

2017-10-05 Thread Tarun Kumar
v*)) }) Second way: taskContext.taskMetrics().accumulators(). filter(_.name == Some("accumulator-name")). map(accum => { accum.asInstanceOf[MyCustomAccumulator].add(*k, v*)) }) Thanks Tarun

Re: [Spark] Accumulators or count()

2017-03-01 Thread Daniel Siegmann
As you noted, Accumulators do not guarantee accurate results except in specific situations. I recommend never using them. This article goes into some detail on the problems with accumulators: http://imranrashid.com/posts/Spark-Accumulators/ On Wed, Mar 1, 2017 at 7:26 AM, Charles O. Bajomo

[Spark] Accumulators or count()

2017-03-01 Thread Charles O. Bajomo
Hello everyone, I wanted to know if there is any benefit to using an accumulator over just executing a count() on the whole RDD. There seem to be a lot of issues with accumulators during a stage failure, and there also seems to be an issue rebuilding them if the application restarts from a…

Re: Accumulators and Datasets

2017-01-18 Thread Sean Owen
Accumulators aren't related directly to RDDs or Datasets. They're a separate construct. You can imagine updating accumulators in any distributed operation that you see documented for RDDs or Datasets. On Wed, Jan 18, 2017 at 2:16 PM Hanna Mäki <hanna.m...@comptel.com> wrote:

Accumulators and Datasets

2017-01-18 Thread Hanna Mäki
Hi, The documentation (http://spark.apache.org/docs/latest/programming-guide.html#accumulators) describes how to use accumulators with RDDs, but I'm wondering if and how I can use accumulators with the Dataset API. BR, Hanna

Re: Few questions on reliability of accumulators value.

2016-12-15 Thread Steve Loughran
On 12 Dec 2016, at 19:57, Daniel Siegmann <dsiegm...@securityscorecard.io> wrote: Accumulators are generally unreliable and should not be used. The answer to (2) and (4) is yes. The answer to (3) is both. Here's a more in-depth explan…

Re: Few questions on reliability of accumulators value.

2016-12-13 Thread Sudev A C
Thank you for the clarification. On Tue, Dec 13, 2016 at 1:27 AM Daniel Siegmann <dsiegm...@securityscorecard.io> wrote: Accumulators are generally unreliable and should not be used. The answer to (2) and (4) is yes. The answer to (3) is both. Here's a more in-depth e…

Re: Few questions on reliability of accumulators value.

2016-12-12 Thread Daniel Siegmann
Accumulators are generally unreliable and should not be used. The answer to (2) and (4) is yes. The answer to (3) is both. Here's a more in-depth explanation: http://imranrashid.com/posts/Spark-Accumulators/ On Sun, Dec 11, 2016 at 11:27 AM, Sudev A C <sudev...@goibibo.com> wrote: Pl…

Re: Few questions on reliability of accumulators value.

2016-12-11 Thread Sudev A C
Please help. Anyone, any thoughts on the previous mail? Thanks, Sudev On Fri, Dec 9, 2016 at 2:28 PM Sudev A C <sudev...@goibibo.com> wrote: Hi, Can anyone please help clarify how accumulators can be used reliably to measure error/success/analytical metrics?…

Few questions on reliability of accumulators value.

2016-12-09 Thread Sudev A C
Hi, Can anyone please help clarify how accumulators can be used reliably to measure error/success/analytical metrics? Given below is the use case / code snippet that I have. val amtZero = sc.accumulator(0) val amtLarge = sc.accumulator(0) val amtNormal = sc.accumulator(0) val…
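
A sketch of the same use case with the non-deprecated AccumulatorV2-based API (the thresholds and sample data are made up); doing the adds inside an action such as foreach, rather than a transformation, keeps the counts from being inflated by recomputed lineage.

```
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("amount-metrics").getOrCreate()
val sc = spark.sparkContext

val amtZero   = sc.longAccumulator("amtZero")
val amtLarge  = sc.longAccumulator("amtLarge")
val amtNormal = sc.longAccumulator("amtNormal")

sc.parallelize(Seq(0.0, 12.5, 100000.0, 42.0)).foreach { amt =>
  if (amt == 0.0) amtZero.add(1)
  else if (amt > 10000.0) amtLarge.add(1)       // hypothetical "large" threshold
  else amtNormal.add(1)
}
// foreach is an action, so the values are read in the driver after it returns.
println(s"zero=${amtZero.value} large=${amtLarge.value} normal=${amtNormal.value}")
```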

Fault-tolerant Accumulators in a DStream-only transformations.

2016-11-29 Thread Amit Sela
Hi all, In order to recover Accumulators (functionally) from a Driver failure, it is recommended to use them within a foreachRDD/transform and use the RDD context with a Singleton wrapping the Accumulator, as shown in the examples <https://github.com/apache/spark/blob/branch-1.6/examples/src/m…

Fault-tolerant Accumulators in stateful operators.

2016-11-22 Thread Amit Sela
Hi all, To recover (functionally) Accumulators from Driver failure in a streaming application, we wrap them in a "getOrCreate" Singleton as shown here <https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/RecoverableNetworkWordCou
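
A sketch of the "getOrCreate" singleton being referenced, modelled on the RecoverableNetworkWordCount example; the object and accumulator names are assumptions.

```
import org.apache.spark.SparkContext
import org.apache.spark.util.LongAccumulator

object DroppedRecordsCounter {
  @volatile private var instance: LongAccumulator = _

  def getInstance(sc: SparkContext): LongAccumulator = {
    if (instance == null) {
      synchronized {
        if (instance == null) {
          instance = sc.longAccumulator("droppedRecords")
        }
      }
    }
    instance
  }
}

// Inside foreachRDD, fetch it lazily from the RDD's SparkContext so the code
// still works after the driver recovers from a checkpoint:
// stream.foreachRDD { rdd =>
//   val dropped = DroppedRecordsCounter.getInstance(rdd.sparkContext)
//   ...
// }
```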

Using accumulators in Local mode for testing

2016-07-11 Thread harelglik
Hi, I am writing an app in Spark (1.6.1) in which I am using an accumulator. My accumulator is simply counting rows: acc += 1. My test processes 4 files, each with 4 rows; however, the value of the accumulator in the end is not 16 and, even worse, is inconsistent between runs. Are accumulators…

Re: Accumulators displayed in SparkUI in 1.4.1?

2016-05-25 Thread Jacek Laskowski
On 25 May 2016 6:00 p.m., "Daniel Barclay" <danielbarclay@gmail.com> wrote: Was the feature of displaying accumulators in the Spark UI implemented in Spark 1.4.1, or was that added later? Dunno, but only *named* *accumulators* are displayed in Spark’s webUI (under…
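
A short illustration of the point being made: only accumulators that were given a name show up on the web UI's stage pages (names here are made up).

```
import org.apache.spark.sql.SparkSession

val sc = SparkSession.builder().appName("named-acc").getOrCreate().sparkContext
val shown    = sc.longAccumulator("eventsFromKafka")   // appears on the web UI stage pages
val notShown = sc.longAccumulator                      // works, but is not displayed
```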

Accumulators displayed in SparkUI in 1.4.1?

2016-05-25 Thread Daniel Barclay
Was the feature of displaying accumulators in the Spark UI implemented in Spark 1.4.1, or was that added later? Thanks, Daniel

Too Many Accumulators in my Spark Job

2016-03-12 Thread Harshvardhan Chauhan
Hi, My question is about having a lot of counters in Spark to keep track of bad/null values in my RDD; it's described in detail in the Stack Overflow link below: http://stackoverflow.com/questions/35953400/too-many-accumulators-in-spark-job Posting to the user group to get more traction. Appreciate your…

RE: Double Counting When Using Accumulators with Spark Streaming

2016-01-05 Thread Rachana Srivastava
…message without saveAs INFO : org.apache.spark.executor.Executor - Finished task 0.0 in stage 1.0 (TID 1). 987 bytes result sent to driver INFO : org.apache.spark.scheduler.DAGScheduler - ResultStage 1 (foreachRDD at KafkaURLStreaming.java:90) finished in 0.103 s INFO : org.apache.spark.sche…

Re: Double Counting When Using Accumulators with Spark Streaming

2016-01-05 Thread Jean-Baptiste Onofré
Hi Rachana, don't you have two messages on the kafka broker ? Regards JB On 01/05/2016 05:14 PM, Rachana Srivastava wrote: I have a very simple two lines program. I am getting input from Kafka and save the input in a file and counting the input received. My code looks like this, when I run

Double Counting When Using Accumulators with Spark Streaming

2016-01-05 Thread Rachana Srivastava
I have a very simple two-line program. I get input from Kafka, save the input in a file, and count the input received. My code looks like this; when I run it, I get two accumulator counts for each input. HashMap kafkaParams = new HashMap…
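
A sketch of the usual cause behind this symptom (assumed from the rest of the thread, not confirmed by it): if the increment sits in a transformation that feeds two output operations, the lineage is recomputed and the accumulator is bumped twice. Counting once, in the driver, inside foreachRDD avoids it.

```
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.util.LongAccumulator

// `stream` is the DStream read from Kafka (creation not shown) and `received`
// a LongAccumulator created in the driver.
def saveAndCount(stream: DStream[String], received: LongAccumulator): Unit =
  stream.foreachRDD { rdd =>
    rdd.cache()                                                   // save and count share one computation
    rdd.saveAsTextFile(s"/tmp/out-${System.currentTimeMillis}")  // hypothetical output path
    received.add(rdd.count())                                     // count once, add once, in the driver
    rdd.unpersist()
  }
```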

Re: Double Counting When Using Accumulators with Spark Streaming

2016-01-05 Thread Shixiong(Ryan) Zhu
…tasks from ResultStage 1 (MapPartitionsRDD[3] at map at KafkaURLStreaming.java:83) INFO : org.apache.spark.scheduler.TaskSchedulerImpl - Adding task set 1.0 with 1 tasks INFO : org.apache.spark.scheduler.TaskSetManager - Starting task 0.0 in stage 1.0 (TID 1, localhost, AN…

Re: Spark Streaming - print accumulators value every period as logs

2015-12-25 Thread Ali Gouta
…batch and a streaming driver using the same functions (Scala). I use accumulators (passed to function constructors) to count stuff. In the batch driver, doing so at the right point of the pipeline, I'm able to retrieve the accumulator value and print it as a log4j log. In the st…

Spark Streaming - print accumulators value every period as logs

2015-12-24 Thread Roberto Coluccio
Hello, I have a batch and a streaming driver using the same functions (Scala). I use accumulators (passed to function constructors) to count stuff. In the batch driver, doing so at the right point of the pipeline, I'm able to retrieve the accumulator value and print it as a log4j log

Re: Accumulators internals and reliability

2015-10-26 Thread Adrian Tanase
…to creating many discrete accumulators * The merge operation is to add the values on key conflict * I'm adding K->Vs to this accumulator in a variety of places (maps, flatMaps, transforms and updateStateByKey) * In a foreachRDD at the end of the transformations I'm reading the accumula…

Accumulators internals and reliability

2015-10-26 Thread Sela, Amit
It seems like there is not much literature about Spark's Accumulators, so I thought I'd ask here: Do Accumulators reside in a Task? Are they serialized with the task? Sent back on task completion as part of the ResultTask? Are they reliable? If so, when? Can I rely on accumulators…

Re: NullPointException Help while using accumulators

2015-08-03 Thread Ted Yu
Can you show related code in DriverAccumulator.java ? Which Spark release do you use ? Cheers On Mon, Aug 3, 2015 at 3:13 PM, Anubhav Agarwal anubha...@gmail.com wrote: Hi, I am trying to modify my code to use HDFS and multiple nodes. The code works fine when I run it locally in a single

NullPointException Help while using accumulators

2015-08-03 Thread Anubhav Agarwal
Hi, I am trying to modify my code to use HDFS and multiple nodes. The code works fine when I run it locally in a single machine with a single worker. I have been trying to modify it and I get the following error. Any hint would be helpful. java.lang.NullPointerException at

Re: NullPointException Help while using accumulators

2015-08-03 Thread Anubhav Agarwal
The code was written in 1.4 but I am compiling it and running it with 1.3. import it.unimi.dsi.fastutil.objects.Object2ObjectOpenHashMap; import org.apache.spark.AccumulableParam; import scala.Tuple4; import thomsonreuters.trailblazer.operation.DriverCalc; import

Re: NullPointException Help while using accumulators

2015-08-03 Thread Ted Yu
Putting your code in a file I find the following on line 17: stepAcc = new StepAccumulator(); However I don't think that was where the NPE was thrown. Another thing I don't understand was that there were two addAccumulator() calls at the top of stack trace while in your code I

broadcast variable and accumulators issue while spark streaming checkpoint recovery

2015-07-29 Thread Shushant Arora
to update accumulators for ResultTask(1, 16) java.util.NoSuchElementException: key not found: 2 at scala.collection.MapLike$class.default(MapLike.scala:228) at scala.collection.AbstractMap.default(Map.scala:58) at scala.collection.mutable.HashMap.apply(HashMap.scala:64

Re: broadcast variable and accumulators issue while spark streaming checkpoint recovery

2015-07-29 Thread Tathagata Das
) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2.For accumulator variable it says : 15/07/29 19:23:12 ERROR DAGScheduler: Failed to update accumulators for ResultTask(1, 16) java.util.NoSuchElementException: key not found: 2

Re: broadcast variable and accumulators issue while spark streaming checkpoint recovery

2015-07-29 Thread Shushant Arora
(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2.For accumulator variable it says : 15/07/29 19:23:12 ERROR DAGScheduler: Failed to update accumulators for ResultTask(1, 16) java.util.NoSuchElementException: key not found: 2 at scala.collection.MapLike$class.default

Re: broadcast variable and accumulators issue while spark streaming checkpoint recovery

2015-07-29 Thread Tathagata Das
ERROR DAGScheduler: Failed to update accumulators for ResultTask(1, 16) java.util.NoSuchElementException: key not found: 2 at scala.collection.MapLike$class.default(MapLike.scala:228) at scala.collection.AbstractMap.default(Map.scala:58

Re: Accumulators / Accumulables : thread-local, task-local, executor-local ?

2015-06-23 Thread Guillaume Pitel
. Guillaume Hey, I am not 100% sure but from my understanding accumulators are per partition (so per task as its the same) and are sent back to the driver with the task result and merged. When a task needs to be run n times (multiple rdds depend

Re: Using Accumulators in Streaming

2015-06-22 Thread anshu shukla
June 2015 at 21:32, Will Briggs wrbri...@gmail.com wrote: It sounds like accumulators are not necessary in Spark Streaming - see this post ( http://apache-spark-user-list.1001560.n3.nabble.com/Shared-variable-in-Spark-Streaming-td11762.html) for more details. On June 21, 2015, at 7:31 PM

Re: Using Accumulators in Streaming

2015-06-22 Thread Michal Čizmazia
If I am not mistaken, one way to see the accumulators is that they are just write-only for the workers and their value can be read by the driver. Therefore they cannot be used for ID generation as you wish. On 22 June 2015 at 04:30, anshu shukla anshushuk...@gmail.com wrote: But i just want

Re: Using Accumulators in Streaming

2015-06-22 Thread Michal Čizmazia
the accumulators is that they are just write-only for the workers and their value can be read by the driver. Therefore they cannot be used for ID generation as you wish. On 22 June 2015 at 04:30, anshu shukla anshushuk...@gmail.com wrote: But i just want to update rdd , by appending unique

Re: Using Accumulators in Streaming

2015-06-21 Thread Michal Čizmazia
StreamingContext.sparkContext() On 21 June 2015 at 21:32, Will Briggs wrbri...@gmail.com wrote: It sounds like accumulators are not necessary in Spark Streaming - see this post ( http://apache-spark-user-list.1001560.n3.nabble.com/Shared-variable-in-Spark-Streaming-td11762.html) for more

Re: Using Accumulators in Streaming

2015-06-21 Thread Will Briggs
It sounds like accumulators are not necessary in Spark Streaming - see this post ( http://apache-spark-user-list.1001560.n3.nabble.com/Shared-variable-in-Spark-Streaming-td11762.html) for more details. On June 21, 2015, at 7:31 PM, anshu shukla anshushuk...@gmail.com wrote: In spark Streaming

Using Accumulators in Streaming

2015-06-21 Thread anshu shukla
In Spark Streaming, since we already have a StreamingContext, which does not allow us to create accumulators, we have to get a SparkContext to initialize the accumulator value. But having two Spark contexts will not solve the problem. Please help!! -- Thanks & Regards, Anshu Shukla
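
A sketch of the answer given earlier in this thread ("StreamingContext.sparkContext()"): no second context is needed, because the StreamingContext exposes the SparkContext it was built on. The master setting below is an assumption.

```
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("streaming-acc").setMaster("local[2]")  // master is an assumption
val ssc = new StreamingContext(conf, Seconds(10))
val processed = ssc.sparkContext.longAccumulator("processed")  // same SparkContext, no second one created
```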

Re: Accumulators / Accumulables : thread-local, task-local, executor-local ?

2015-06-18 Thread Guillaume Pitel
Hi, Thank you for this confirmation. Coalescing is what we do now. It creates, however, very big partitions. Guillaume Hey, I am not 100% sure but from my understanding accumulators are per partition (so per task as its the same) and are sent back to the driver with the task result

Re: Accumulators / Accumulables : thread-local, task-local, executor-local ?

2015-06-18 Thread Guillaume Pitel
is what we do now. It creates, however, very big partitions. Guillaume Hey, I am not 100% sure but from my understanding accumulators are per partition (so per task as its the same) and are sent back to the driver with the task result and merged. When a task needs

Re: Accumulators / Accumulables : thread-local, task-local, executor-local ?

2015-06-18 Thread Eugen Cepoi
Hey, I am not 100% sure, but from my understanding accumulators are per partition (so per task, as it's the same) and are sent back to the driver with the task result and merged. When a task needs to be run n times (multiple RDDs depend on this one, some partition loss later in the chain, etc.…

Re: Accumulators / Accumulables : thread-local, task-local, executor-local ?

2015-06-18 Thread Eugen Cepoi
for this confirmation. Coalescing is what we do now. It creates, however, very big partitions. Guillaume Hey, I am not 100% sure but from my understanding accumulators are per partition (so per task as its the same) and are sent back to the driver with the task result and merged. When a task

Re: Accumulators / Accumulables : thread-local, task-local, executor-local ?

2015-06-18 Thread Eugen Cepoi
but from my understanding accumulators are per partition (so per task as its the same) and are sent back to the driver with the task result and merged. When a task needs to be run n times (multiple rdds depend on this one, some partition loss later in the chain etc) then the accumulator will count

Accumulators / Accumulables : thread-local, task-local, executor-local ?

2015-06-18 Thread Guillaume Pitel
Hi, I'm trying to figure out the smartest way to implement a global count-min-sketch on accumulators. For now, we are doing that with RDDs. It works well, but with one sketch per partition, merging takes too long. As you probably know, a count-min sketch is a big mutable array of array
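
One way to get the "global count-min sketch" asked about is to wrap Spark's built-in CountMinSketch in an AccumulatorV2, so each task adds locally and Spark merges the per-task copies on completion; the sizes and names below are assumptions, and the merge cost is still proportional to the number of tasks.

```
import org.apache.spark.util.AccumulatorV2
import org.apache.spark.util.sketch.CountMinSketch

class CMSAccumulator(depth: Int = 10, width: Int = 2000, seed: Int = 42)
    extends AccumulatorV2[String, CountMinSketch] {
  private var sketch = CountMinSketch.create(depth, width, seed)
  private var empty  = true

  override def isZero: Boolean = empty
  override def copy(): CMSAccumulator = {
    val acc = new CMSAccumulator(depth, width, seed)
    acc.sketch.mergeInPlace(sketch)          // sketches with identical parameters merge cleanly
    acc.empty = empty
    acc
  }
  override def reset(): Unit = { sketch = CountMinSketch.create(depth, width, seed); empty = true }
  override def add(v: String): Unit = { sketch.add(v); empty = false }
  override def merge(other: AccumulatorV2[String, CountMinSketch]): Unit = {
    sketch.mergeInPlace(other.value)
    empty = empty && other.isZero
  }
  override def value: CountMinSketch = sketch
}
```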

Re: Accumulators / Accumulables : thread-local, task-local, executor-local ?

2015-06-18 Thread Eugen Cepoi
, but it would become over complicated. 2015-06-18 14:27 GMT+02:00 Guillaume Pitel guillaume.pi...@exensa.com: Hi, Thank you for this confirmation. Coalescing is what we do now. It creates, however, very big partitions. Guillaume Hey, I am not 100% sure but from my understanding accumulators

Accumulators in Spark Streaming on UI

2015-05-26 Thread Snehal Nagmote
Hello all, I have an accumulator in a Spark Streaming application which counts the number of events received from Kafka. From the documentation, it seems the Spark UI has support for displaying it, but I am unable to see it on the UI. I am using Spark 1.3.1. Do I need to call any method (print), or am I missing…

Re: Accumulators in Spark Streaming on UI

2015-05-26 Thread Justin Pihony
You need to make sure to name the accumulator. On Tue, May 26, 2015 at 2:23 PM, Snehal Nagmote nagmote.sne...@gmail.com wrote: Hello all, I have accumulator in spark streaming application which counts number of events received from Kafka. From the documentation , It seems Spark UI has

Re: Questions about Accumulators

2015-05-03 Thread Eugen Cepoi
…? And then the final result will be many times the real result?

Re: Questions about Accumulators

2015-05-03 Thread Dean Wampler
…), then the final result of the accumulator will be 2, twice the correct result?

Questions about Accumulators

2015-05-03 Thread xiazhuchang
…), this operation will be executed more than once if the task restarts? And then the final result will be many times the real result?

Re: Questions about Accumulators

2015-05-03 Thread Ignacio Blasco
Given the lazy nature of an RDD, if you use an accumulator inside a map() and then call count and saveAsTextFile over it, the accumulator will be updated twice. IMHO, accumulators are a bit nondeterministic; you need to be sure when to read them to avoid unexpected re-executions. On 3/5/2015 2:09 p…
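
A minimal illustration of the double execution described above (the path and master setting are made up): without caching, both actions recompute the map and the accumulator ends at twice the expected value.

```
import org.apache.spark.sql.SparkSession

val sc   = SparkSession.builder().appName("acc-twice").master("local[*]").getOrCreate().sparkContext
val acc  = sc.longAccumulator("mapped")
val data = sc.parallelize(1 to 10).map { x => acc.add(1); x }

data.count()                          // acc.value is 10 here
data.saveAsTextFile("/tmp/acc-demo")  // hypothetical path; recomputes the map, acc.value becomes 20
// Calling data.cache() before the first action keeps the value at 10.
```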

Re: Questions about Accumulators

2015-05-03 Thread xiazhuchang

Negative Accumulators

2015-01-30 Thread Peter Thai
Hello, I am seeing negative values for accumulators. Here's my implementation in a standalone app in Spark 1.1.1rc: implicit object BigIntAccumulatorParam extends AccumulatorParam[BigInt] { def addInPlace(t1: Int, t2: BigInt) = BigInt(t1) + t2 def addInPlace(t1: BigInt, t2: BigInt
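
For reference, a complete AccumulatorParam[BigInt] (the Spark 1.x API used in this thread) only needs zero() plus a BigInt-to-BigInt addInPlace; the extra Int overload quoted above is not part of the trait. This is a sketch, not the poster's actual fix.

```
import org.apache.spark.AccumulatorParam

implicit object BigIntAccumulatorParam extends AccumulatorParam[BigInt] {
  def zero(initialValue: BigInt): BigInt = BigInt(0)
  def addInPlace(t1: BigInt, t2: BigInt): BigInt = t1 + t2
}

// val accu = sc.accumulator(BigInt(0))   // negative values often point to Int/Long
//                                        // overflow in the arithmetic feeding +=
```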

Re: Negative Accumulators

2015-01-30 Thread francois . garillot
Sanity-check: would it be possible that `threshold_var` be negative ? — FG On Fri, Jan 30, 2015 at 5:06 PM, Peter Thai thai.pe...@gmail.com wrote: Hello, I am seeing negative values for accumulators. Here's my implementation in a standalone app in Spark 1.1.1rc: implicit object

Re: Accumulators

2015-01-15 Thread Imran Rashid
in my wording. Should be I'm assuming it's not immediately aggregating on the driver each time I call the += on the Accumulator. On Wed, Jan 14, 2015 at 9:19 PM, Corey Nolet cjno...@gmail.com wrote: What are the limitations of using Accumulators to get a union of a bunch of small sets

Re: Accumulators

2015-01-14 Thread Corey Nolet
Just noticed an error in my wording. Should be I'm assuming it's not immediately aggregating on the driver each time I call the += on the Accumulator. On Wed, Jan 14, 2015 at 9:19 PM, Corey Nolet cjno...@gmail.com wrote: What are the limitations of using Accumulators to get a union of a bunch

Accumulators

2015-01-14 Thread Corey Nolet
What are the limitations of using Accumulators to get a union of a bunch of small sets? Let's say I have an RDD[Map[String, Any]] and I want to do: rdd.map(accumulator += Set(_.get(entityType).get)) What implication does this have on performance? I'm assuming it's not immediately aggregating
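
A sketch of the same idea with the built-in CollectionAccumulator (field names and sample data are made up); as the poster suspects, values are merged on the driver when tasks complete, not immediately on each +=.

```
import scala.collection.JavaConverters._
import org.apache.spark.sql.SparkSession

val sc = SparkSession.builder().appName("set-union").master("local[*]").getOrCreate().sparkContext
val entityTypes = sc.collectionAccumulator[String]("entityTypes")

sc.parallelize(Seq(Map("entityType" -> "user"), Map("entityType" -> "order")))
  .foreach(m => m.get("entityType").foreach(entityTypes.add))   // add inside an action

// Deduplicate on the driver to get the union as a Set:
val union: Set[String] = entityTypes.value.asScala.toSet
println(union)
```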

Question on Spark UI/accumulators

2015-01-05 Thread Virgil Palanciuc
Hi, The Spark documentation states that "If accumulators are created with a name, they will be displayed in Spark’s UI" (http://spark.apache.org/docs/latest/programming-guide.html#accumulators). Where exactly are they shown? I may be dense, but I can't find them on the UI from http://localhost:4040

Spark Accumulators exposed as Metrics to Graphite

2014-12-30 Thread Łukasz Stefaniak
Hi, Does Spark have a built-in possibility of exposing the current value of an Accumulator [1] using Monitoring and Instrumentation [2]? Unfortunately I couldn't find anything in the sources which could be used. Does it mean the only way to expose the current accumulator value is to implement a new Source which would

Understanding disk usage with Accumulators

2014-12-16 Thread Ganelin, Ilya
Hi all – I’m running a long running batch-processing job with Spark through Yarn. I am doing the following Batch Process val resultsArr = sc.accumulableCollection(mutable.ArrayBuffer[ListenableFuture[Result]]()) InMemoryArray.forEach{ 1) Using a thread pool, generate callable jobs that

Re: Understanding disk usage with Accumulators

2014-12-16 Thread Ganelin, Ilya
...@capitalone.com Date: Tuesday, December 16, 2014 at 10:23 AM To: user@spark.apache.org Subject: Understanding disk usage with Accumulators Hi all – I’m running a long running batch-processing job with Spark through Yarn. I am…

Re: Negative Accumulators

2014-12-02 Thread Peter Thai
…={ myAccumulator += math.min(x._1, 100) }) // works someRDD.foreach(x => { myAccumulator += x._1 + 100 }) Any ideas?

Re: Negative Accumulators

2014-12-02 Thread Peter Thai
…: org.apache.spark.Accumulator[scala.math.BigInt] = 0 scala> accu += 100 scala> accu.value res1: scala.math.BigInt = 100
