Re: [GraphX]: Prevent recomputation of DAG

2024-03-18 Thread Mich Talebzadeh
Hi, I must admit I don't know much about this Fruchterman-Reingold (call it FR) visualization using GraphX and Kubernetes. But you are suggesting this slowdown issue starts after the second iteration, and caching/persisting the graph after each iteration does not help. FR involves many

[GraphX]: Prevent recomputation of DAG

2024-03-17 Thread Marek Berith
Dear community, for my diploma thesis, we are implementing a distributed version of the Fruchterman-Reingold visualization algorithm, using GraphX and Kubernetes. Our solution is a backend that continuously computes new positions of vertices in a graph and sends them via RabbitMQ to a consumer.

Re: Help me learn about JOB TASK and DAG in Apache Spark

2023-04-01 Thread Mich Talebzadeh
for any monetary damages arising from such loss, damage or destruction. On Fri, 31 Mar 2023 at 15:15, AN-TRUONG Tran Phan <tr.phan.tru...@gmail.com> wrote: Hi,

Re: Help me learn about JOB TASK and DAG in Apache Spark

2023-04-01 Thread Khalid Mammadov
learning about Apache Spark and want to know the meaning of each Task created on the Jobs recorded on Spark history. For example, the application I write creates 17 jobs, in which job 0 runs for 10 minutes, there are 2384 small tasks and I want to le

Re: Help me learn about JOB TASK and DAG in Apache Spark

2023-03-31 Thread Mich Talebzadeh
On Fri, 31 Mar 2023 at 15:15, AN-TRUONG Tran Phan <tr.phan.tru...@gmail.com> wrote: Hi, I am learning about Apache Spark and want to know the meaning of each Task created on the Jobs

Re: Help me learn about JOB TASK and DAG in Apache Spark

2023-03-31 Thread AN-TRUONG Tran Phan
ite creates 17 jobs, in which job 0 runs for 10 minutes, there are 2384 small tasks and I want to learn about the meaning of these 2384, is it possible? I found a picture of DAG in the Jobs and want to know the relationship between DAG and Task,

Re: Help me learn about JOB TASK and DAG in Apache Spark

2023-03-31 Thread Mich Talebzadeh
reates 17 jobs, in which job 0 runs for 10 minutes, there are 2384 small tasks and I want to learn about the meaning of these 2384, is it possible? I found a picture of DAG in the Jobs and want to know the relationship between DAG and Task, is it possible (Specifically from

Mapping stages in DAG to line of code in pyspark

2021-04-18 Thread Dhruv Kumar
Hi, I am using PySpark for writing Spark queries. My research project requires me to accurately measure latency for each and every operator/stage in the query. I can make some guesses but am unable to exactly map the stages (shown in the DAG on Spark UI) to the exact line in my PySpark code. Can

Re: Where is the DAG stored before catalyst gets it?

2018-10-06 Thread Jacek Laskowski
Hi Jean Georges, > I am assuming it is still in the master and when catalyst is finished it sends the tasks to the workers. Sorry to be that direct, but the sentence does not make much sense to me. Again, very sorry for saying it in the very first sentence. Since I know Jean Georges I allowed

Where is the DAG stored before catalyst gets it?

2018-10-04 Thread Jean Georges Perrin
Hi, I am assuming it is still in the master and when catalyst is finished it sends the tasks to the workers. Correct? tia jg

Spark ML DAG Pipelines

2017-09-07 Thread Srikanth Sampath
Hi Spark Experts, Can someone point me to some examples for non-linear (DAG) ML pipelines. That would be of great help. Thanks much in advance -Srikanth

Re: [Spark Streaming] DAG Output Processing mechanism

2017-05-29 Thread Nipun Arora
Sending out the message again. Hopefully someone can clarify :) I would like some clarification on the execution model for spark streaming. Broadly, I am trying to understand if output operations in a DAG are only processed after all intermediate operations are finished for all parts

Re: [Spark Streaming] DAG Output Processing mechanism

2017-05-28 Thread Nipun Arora
Apologies - Resending as the previous mail went with some unnecessary copy paste. I would like some clarification on the execution model for spark streaming. Broadly, I am trying to understand if output operations in a DAG are only processed after all intermediate operations are finished for all

[Spark Streaming] DAG Output Processing mechanism

2017-05-28 Thread Nipun Arora
I would like some clarification on the execution model for spark streaming. Broadly, I am trying to understand if output operations in a DAG are only processed after all intermediate operations are finished for all parts of the DAG. Let me give an example: I have

[Spark Streaming] DAG Execution Model Clarification

2017-05-26 Thread Nipun Arora
Hi, I would like some clarification on the execution model for spark streaming. Broadly, I am trying to understand if output operations in a DAG are only processed after all intermediate operations are finished for all parts of the DAG. Let me give an example: I have a dstream -A , I do map

How to generate stage for this RDD DAG please?

2017-05-23 Thread ??????????
Hi all, I read some papers about stages, and I know about narrow dependencies and shuffle dependencies. For the RDD DAG below, how does Spark generate the stage DAG, please? And is this RDD DAG legal, please

Re: DAG Visualization option is missing on Spark Web UI

2017-01-30 Thread Md. Rezaul Karim
a...@insight-centre.org> wrote: Hi Jacek, I tried accessing Spark web UI on both Firefox and Google Chrome browsers with ad blocker enabled. I do see other options like User, Total Uptime, Scheduling Mode, Active Jobs, Completed Jobs and Event

Re: DAG Visualization option is missing on Spark Web UI

2017-01-28 Thread Mark Hamstra
blocker enabled. I do see other options like User, Total Uptime, Scheduling Mode, Active Jobs, Completed Jobs and Event Timeline. However, I don't see an option for DAG visualization. Please note that I am experiencing the same issue with Spark 2.x (i.e. 2.0.0, 2.0.1, 2.0

Re: DAG Visualization option is missing on Spark Web UI

2017-01-28 Thread Md. Rezaul Karim
Hi Jacek, I tried accessing Spark web UI on both Firefox and Google Chrome browsers with ad blocker enabled. I do see other options like User, Total Uptime, Scheduling Mode, Active Jobs, Completed Jobs and Event Timeline. However, I don't see an option for DAG visualization. Please note

Re: DAG Visualization option is missing on Spark Web UI

2017-01-28 Thread Jacek Laskowski
job on my local machine written in Scala with Spark 2.1.0. However, I am not seeing any option of "*DAG Visualization*" at http://localhost:4040/jobs/ Suggestion, please. Regards, _ *Md. Rezaul Karim*, BSc, MSc PhD Researcher, INSIGHT Centre for Data Ana

DAG Visualization option is missing on Spark Web UI

2017-01-28 Thread Md. Rezaul Karim
Hi All, I am running a Spark job on my local machine written in Scala with Spark 2.1.0. However, I am not seeing any option of "*DAG Visualization*" at http://localhost:4040/jobs/ Suggestion, please. Regards, _ *Md. Rezaul Karim*, BSc, MSc PhD

Re: SparkSQL DAG generation , DAG optimization , DAG execution

2016-09-10 Thread Mich Talebzadeh
Right, let us simplify this. Can you run the whole thing *once* only and send the DAG execution output from the UI? You can use the snipping tool to take the image. HTH Dr Mich Talebzadeh

Re: SparkSQL DAG generation , DAG optimization , DAG execution

2016-09-10 Thread Rabin Banerjee
(look at UI page, 4040 by default)? *I checked the Spark UI DAG; so many file reads. Why?* 6. What Spark mode is being used (Local, Standalone, Yarn)? *Yarn* 7. OOM could be anything depending on how much you are allocating to your driver memory in spark-submit? *Driver

Re: SparkSQL DAG generation , DAG optimization , DAG execution

2016-09-10 Thread Mich Talebzadeh
Executor memory and Driver memory are set to 4 GB, which is too high as the data size is in MB. Questions: 1. Will Spark optimize multiple SQL queries into one single physical plan? 2. In the DAG I can see a lot of file reads and a lot of stages. Why? I only called act

SparkSQL DAG generation , DAG optimization , DAG execution

2016-09-09 Thread Rabin Banerjee
will optimize multiple SQL into one physical execution plan. 2. Executor memory and Driver memory are set to 4 GB, which is too high as the data size is in MB. Questions: 1. Will Spark optimize multiple SQL queries into one single physical plan? 2. In the DAG I can see a lot of file reads and a lot

DAG of Spark Sort application spanning two jobs

2016-05-30 Thread alvarobrandon
com/file/n27047/cbKDZ.png> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n27047/GXIkS.png> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n27047/H9LXF.png> -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/DAG-of-Spa

DAG Pipelines?

2016-05-04 Thread Cesar Flores
I read the ml-guide page ( http://spark.apache.org/docs/latest/ml-guide.html#details). It mentions that it is possible to construct DAG Pipelines. Unfortunately there is no example to explain under which use case this may be useful. *Can someone give me an example or use case where

the "DAG Visualization" in 1.6 not works fine here

2016-03-15 Thread charles li
Sometimes it just shows several *black dots*, and sometimes it cannot show the entire graph. Did anyone encounter this before, and how did you fix it? -- a spark lover, a quant, a developer and a good man. http://github.com/litaotao

streaming application redundant dag stage execution/performance/caching

2016-02-16 Thread krishna ramachandran
We have a streaming application containing approximately 12 stages every batch, running in streaming mode (4 sec batches). Each stage persists output to Cassandra. The pipeline stages: stage 1 --> receive Stream A --> map --> filter --> (union with another stream B) --> map --> groupByKey -->

RE: Question on Spark architecture and DAG

2016-02-12 Thread Mich Talebzadeh
ndy Davidson [mailto:a...@santacruzintegration.com] Sent: 12 February 2016 21:17 To: Mich Talebzadeh <m...@peridale.co.uk>; user@spark.apache.org Subject: Re: Question on Spark architecture and DAG From: Mich Talebzadeh <m...@peridale.co.uk> Dat

Re: Question on Spark architecture and DAG

2016-02-12 Thread Andy Davidson
From: Mich Talebzadeh <m...@peridale.co.uk> Date: Thursday, February 11, 2016 at 2:30 PM To: "user @spark" <user@spark.apache.org> Subject: Question on Spark architecture and DAG > Hi, > > I have used Hive on Spark engine and of course Hive tables and its pre

Question on Spark architecture and DAG

2016-02-11 Thread Mich Talebzadeh
Hi, I have used Hive on Spark engine and of course Hive tables, and it's pretty impressive compared to Hive using the MR engine. Let us assume that I use spark shell. Spark shell is a client that connects to spark master running on a host and port like below spark-shell --master

Is there some open source tools which implements draggable widget and make the app running in a form of DAG?

2016-02-01 Thread zml张明磊
Hello, I have been trying to find some tools, but without success. So, as the title describes, are there any open source tools which implement draggable widgets and make the app run in the form of a DAG, like a workflow? Thanks, Minglei.

DAG visualization: no visualization information available with history server

2016-01-31 Thread Raghava
spark.ui.retainedJobs=1 --conf spark.ui.retainedStages=1 In the Spark Web UI (http://localhost:18080/), the DAG visualization of only the most recent job is available. For the rest of the jobs, I get the following message: No visualization information available for this job! If this is an old job, its

Re: Spark RDD DAG behaviour understanding in case of checkpointing

2016-01-25 Thread Tathagata Das
> Hi Tathagata/Cody, > > I am facing a challenge in Production with DAG behaviour during > checkpointing in spark streaming - > > Step 1 : Read data from Kafka every 15 min - call this KafkaStreamRDD ~ > 100 GB of data > > Step 2 : Repartition KafkaStreamRdd from 5 t

Re: Spark RDD DAG behaviour understanding in case of checkpointing

2016-01-25 Thread Cody Koeninger
ing a challenge in Production with DAG behaviour during > checkpointing in spark streaming - > > Step 1 : Read data from Kafka every 15 min - call this KafkaStreamRDD ~ > 100 GB of data > > Step 2 : Repartition KafkaStreamRdd from 5 to 100 partitions to > parallelise processing - c

Spark RDD DAG behaviour understanding in case of checkpointing

2016-01-23 Thread gaurav sharma
Hi Tathagata/Cody, I am facing a challenge in Production with DAG behaviour during checkpointing in spark streaming - Step 1 : Read data from Kafka every 15 min - call this KafkaStreamRDD ~ 100 GB of data Step 2 : Repartition KafkaStreamRdd from 5 to 100 partitions to parallelise processing

Re: Dynamic DAG use-case for spark streaming.

2015-09-29 Thread Tathagata Das
A very basic form of support that exists in DStream is DStream.transform(), which takes an arbitrary RDD => RDD function. This function can actually choose to do different computation with time. That may be of help to you. On Tue, Sep 29, 2015 at 12:06 PM, Archit Thakur wrote:

DAG Scheduler deadlock when two RDDs reference each other, force Stages manually?

2015-09-14 Thread petranidis
is passed in the aggregate function as an initialization parameter and then for each B element key keyB, if M(keyA, keyB) == 1 then the B element is taken into account in the summation. The calculation of A is done successfully and correctly, but then the DAG scheduler seems to dead

Re: DAG Scheduler deadlock when two RDDs reference each other, force Stages manually?

2015-09-14 Thread Petros Nyfantis
e if the key of each mapped element keyA is passed in the aggregate function as an initialization parameter and then for each B element key keyB, if M(keyA, keyB) == 1 then the B element is taken into account in the summation. The calculation of A is done successfully and correctly, but then th

How to create combine DAG visualization?

2015-09-10 Thread b.bhavesh
Hi, How can I create a combined DAG visualization of pyspark code instead of separate DAGs of jobs and stages? Thanks b.bhavesh -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-create-combine-DAG-visualization-tp24653.html Sent from the Apache Spark

Re: DAG related query

2015-08-20 Thread Andrew Or
driver program. The first RDD still exists. On Thu, Aug 20, 2015 at 2:15 PM, Bahubali Jain bahub...@gmail.com wrote: Hi, How would the DAG look like for the below code JavaRDD<String> rdd1 = context.textFile(SOMEPATH); JavaRDD<String> rdd2 = rdd1.map(DO something); rdd1 = rdd2.map(Do

Re: DAG related query

2015-08-20 Thread Sean Owen
No. The third line creates a third RDD whose reference simply replaces the reference to the first RDD in your local driver program. The first RDD still exists. On Thu, Aug 20, 2015 at 2:15 PM, Bahubali Jain bahub...@gmail.com wrote: Hi, How would the DAG look like for the below code

DAG related query

2015-08-20 Thread Bahubali Jain
Hi, What would the DAG look like for the code below? JavaRDD<String> rdd1 = context.textFile(SOMEPATH); JavaRDD<String> rdd2 = rdd1.map(DO something); rdd1 = rdd2.map(Do SOMETHING); Does this lead to any kind of cycle? Thanks, Baahu
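
Sean Owen's reply in this thread says the third line only replaces the local reference to the first RDD, which itself still exists. The following is a minimal sketch of that point in plain Python, not Spark code: the `Node` class and the `lineage` helper are hypothetical stand-ins for an RDD and its chain of parents, introduced purely for illustration.

```python
class Node:
    """Hypothetical stand-in for an RDD: a label plus a link to its parent."""

    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent


def lineage(node):
    """Follow parent links from node back to the source, failing on a cycle."""
    labels = []
    seen = set()
    while node is not None:
        if id(node) in seen:
            raise ValueError("cycle detected in lineage")
        seen.add(id(node))
        labels.append(node.label)
        node = node.parent
    return labels


rdd1 = Node("textFile")       # first dataset
original = rdd1               # keep a second reference to the first dataset
rdd2 = Node("map-1", rdd1)    # derived from the first dataset
rdd1 = Node("map-2", rdd2)    # rebinds the name rdd1; the first dataset is untouched

CHAIN = lineage(rdd1)
print(CHAIN)
```

In this toy model the chain is linear: the name `rdd1` ends up pointing at a third object, while the object it first named is still reachable through `original` and has no parent, so no cycle is formed.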

DataFrame DAG recomputed even though DataFrame is cached?

2015-07-28 Thread Kristina Rogale Plazonic
Hi, I'm puzzling over the following problem: when I cache a small sample of a big dataframe, the small dataframe is recomputed when selecting a column (but not if show() or count() is invoked). Why is that so and how can I avoid recomputation of the small sample dataframe? More details: - I

Re: DataFrame DAG recomputed even though DataFrame is cached?

2015-07-28 Thread Michael Armbrust
We will try to address this before Spark 1.5 is released: https://issues.apache.org/jira/browse/SPARK-9141 On Tue, Jul 28, 2015 at 11:50 AM, Kristina Rogale Plazonic kpl...@gmail.com wrote: Hi, I'm puzzling over the following problem: when I cache a small sample of a big dataframe, the

Building DAG from log

2015-05-04 Thread Giovanni Paolo Gibilisco
Hi, I'm trying to build the DAG of an application from the logs. I've had a look at SparkReplayDebugger but it doesn't operate offline on logs. I looked also at the one in this pull: https://github.com/apache/spark/pull/2077 that seems to operate only on logs but it doesn't clearly show

Re: DAG

2015-04-25 Thread Akhil Das
Maybe this will give you a good start: https://github.com/apache/spark/pull/2077 Thanks Best Regards On Sat, Apr 25, 2015 at 1:29 AM, Giovanni Paolo Gibilisco gibb...@gmail.com wrote: Hi, I would like to know if it is possible to build the DAG before actually executing the application. My

Re: DAG

2015-04-25 Thread Corey Nolet
Giovanni, The DAG can be walked by calling the dependencies() function on any RDD. It returns a Seq containing the parent RDDs. If you start at the leaves and walk through the parents until dependencies() returns an empty Seq, you ultimately have your DAG. On Sat, Apr 25, 2015 at 1:28 PM, Akhil
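
The walk described in this reply (start at a leaf, follow the parents until none remain) can be sketched without Spark. This is only a toy model under stated assumptions: `Node`, its `dependencies()` method, and `walk_lineage` are hypothetical names that mimic the shape of the idea, not Spark's actual classes or return types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for an RDD: a name plus its parent nodes."""
    name: str
    parents: tuple = ()

    def dependencies(self):
        # In this toy model, dependencies are simply the parent nodes.
        return self.parents


def walk_lineage(leaf):
    """Collect (child, parent) edges reachable from leaf, expanding each node once."""
    edges = []
    seen = set()
    stack = [leaf]
    while stack:
        node = stack.pop()
        if node.name in seen:
            continue  # a shared parent is expanded only once
        seen.add(node.name)
        for parent in node.dependencies():
            edges.append((node.name, parent.name))
            stack.append(parent)
    return edges


# A small lineage in which "map" is shared by two downstream nodes.
source = Node("textFile")
mapped = Node("map", (source,))
filtered = Node("filter", (mapped,))
joined = Node("join", (mapped, filtered))

LINEAGE_EDGES = walk_lineage(joined)
for child, parent in sorted(LINEAGE_EDGES):
    print(f"{child} <- {parent}")
```

Tracking visited nodes matters once a parent is shared by several children; without it the same sub-lineage would be walked repeatedly.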

DAG

2015-04-24 Thread Giovanni Paolo Gibilisco
Hi, I would like to know if it is possible to build the DAG before actually executing the application. My guess is that in the scheduler the DAG is built dynamically at runtime since it might depend on the data, but I was wondering if there is a way (and maybe a tool already) to analyze the code

Re: Spark Application Stages and DAG

2015-04-07 Thread Vijay Innamuri
+ 1) / 3125] One of the stages is taking a long time to execute. How to find the transformations/actions associated with a particular stage? Is there any way to find the execution DAG of a Spark Application? Regards Vijay

Re: Spark Application Stages and DAG

2015-04-03 Thread Tathagata Das
. How to find the transformations/actions associated with a particular stage? Is there any way to find the execution DAG of a Spark Application? Regards Vijay

Support for Data flow graphs and not DAG only

2015-04-02 Thread anshu shukla
Hey, I didn't find any documentation regarding support for cycles in a Spark topology, although Storm supports this using manual configuration in the acker function logic (setting it to a particular count). By cycles I don't mean infinite loops. -- Thanks Regards, Anshu Shukla

question regarding the dependency DAG in Spark

2015-03-16 Thread Grandl Robert
Hi guys, I am trying to get a better understanding of the DAG generation for a job in Spark. Ideally, what I want is to run some SQL query and extract the generated DAG by Spark. By DAG I mean the stages and dependencies among stages, and the number of tasks in every stage. Could you guys

Re: Visualizing the DAG of a Spark application

2015-03-13 Thread Todd Nist
There is the PR https://github.com/apache/spark/pull/2077 for doing this. On Fri, Mar 13, 2015 at 6:42 AM, t1ny wbr...@gmail.com wrote: Hi all, We are looking for a tool that would let us visualize the DAG generated by a Spark application as a simple graph. This graph would represent

Visualizing the DAG of a Spark application

2015-03-13 Thread t1ny
Hi all, We are looking for a tool that would let us visualize the DAG generated by a Spark application as a simple graph. This graph would represent the Spark Job, its stages and the tasks inside the stages, with the dependencies between them (either narrow or shuffle dependencies). The Spark

Re: Visualizing the DAG of a Spark application

2015-03-13 Thread t1ny
For anybody who's interested in this, here's a link to a PR that addresses this feature: https://github.com/apache/spark/pull/2077 (thanks to Todd Nist for sending it to me) -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Visualizing-the-DAG-of-a-Spark

Re: If an RDD appeared twice in a DAG, of which calculation is triggered by a single action, will this RDD be calculated twice?

2015-01-19 Thread Tobias Pfeiffer
Hi, On Sat, Jan 17, 2015 at 3:37 AM, Peng Cheng pc...@uow.edu.au wrote: I'm talking about RDD1 (not persisted or checkpointed) in this situation: ...(somewhere) - RDD1 - RDD2 || V V

Re: If an RDD appeared twice in a DAG, of which calculation is triggered by a single action, will this RDD be calculated twice?

2015-01-19 Thread Xuefeng Wu
(one for schema inferring, another for data read). It almost guarantees that the source jsonRDD is calculated twice. Has this problem been addressed so far? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/If-an-RDD-appeared-twice-in-a-DAG-of-which

If an RDD appeared twice in a DAG, of which calculation is triggered by a single action, will this RDD be calculated twice?

2015-01-16 Thread Peng Cheng
: http://apache-spark-user-list.1001560.n3.nabble.com/If-an-RDD-appeared-twice-in-a-DAG-of-which-calculation-is-triggered-by-a-single-action-will-this-RDD-tp21192.html

If an RDD appeared twice in a DAG, of which calculation is triggered by a single action, will this RDD be calculated twice?

2015-01-16 Thread Peng Cheng
I'm talking about RDD1 (not persisted or checkpointed) in this situation:

...(somewhere) - RDD1 - RDD2
                  |      |
                  V      V
                 RDD3 - RDD4 - Action!

To my experience the change RDD1 get

Strange DAG scheduling behavior on currently dependent RDDs

2015-01-07 Thread Corey Nolet
We just updated to Spark 1.2.0 from Spark 1.1.0. We have a small framework that we've been developing that connects various different RDDs together based on some predefined business cases. After updating to 1.2.0, some of the concurrency expectations about how the stages within jobs are executed

Re: Strange DAG scheduling behavior on currently dependent RDDs

2015-01-07 Thread Corey Nolet
I asked this question too soon. I am caching off a bunch of RDDs in a TrieMap so that our framework can wire them together, and the locking was not completely correct; therefore it was creating multiple new RDDs at times instead of using cached versions, which were creating completely separate

Re: DAG info

2015-01-03 Thread madhu phatak
Hi, You can turn off these messages using log4j.properties. On Fri, Jan 2, 2015 at 1:51 PM, Robineast robin.e...@xense.co.uk wrote: Do you have some example code of what you are trying to do? Robin -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/DAG

Re: DAG info

2015-01-02 Thread Robineast
Do you have some example code of what you are trying to do? Robin -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/DAG-info-tp20940p20941.html

Re: DAG info

2015-01-01 Thread Josh Rosen
), which has no missing parents Also my program is taking a lot of time to execute. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/DAG-info-tp20940.html

DAG info

2015-01-01 Thread shahid
:43), which has no missing parents Also my program is taking a lot of time to execute. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/DAG-info-tp20940.html