subject:"DAG"

Re: [GraphX]: Prevent recomputation of DAG

2024-03-18 Thread Mich Talebzadeh

Hi, I must admit I don't know much about this Fruchterman-Reingold (call it FR) visualization using GraphX and Kubernetes. But you are suggesting that this slowdown starts after the second iteration, and that caching/persisting the graph after each iteration does not help. FR involves many

[GraphX]: Prevent recomputation of DAG

2024-03-17 Thread Marek Berith

Dear community, for my diploma thesis, we are implementing a distributed version of the Fruchterman-Reingold visualization algorithm using GraphX and Kubernetes. Our solution is a backend that continuously computes new positions of vertices in a graph and sends them via RabbitMQ to a consumer.
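A commonly suggested explanation for slowdowns that grow with every iteration in iterative Spark and GraphX jobs (offered as an assumption, not a diagnosis of this particular job) is lineage growth: each iteration's graph is defined in terms of the previous one, so the DAG that must be planned, and replayed on any recomputation, gets longer every time. Persisting keeps the data but still retains the full lineage, whereas checkpointing (for example `sc.setCheckpointDir(...)`, then `graph.checkpoint()` before an action materializes the graph) writes the data out and cuts the lineage. The plain-Python toy model below only illustrates that effect; `Node`, `checkpoint` and `run` are hypothetical names, not GraphX APIs.

```python
# Toy model (plain Python, not the GraphX API): each iteration derives a new
# dataset from the previous one, so the lineage chain grows by one step per
# iteration unless it is cut by a checkpoint.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for an RDD or Graph that only remembers its parent."""
    name: str
    parent: Optional["Node"] = None


def lineage_depth(node: Node) -> int:
    """Number of derivation steps a full recomputation would replay."""
    depth = 0
    while node.parent is not None:
        depth += 1
        node = node.parent
    return depth


def checkpoint(node: Node) -> Node:
    """Model of checkpointing: the data is saved, so the parent link is dropped."""
    return Node(name=node.name + " (checkpointed)", parent=None)


def run(iterations: int, checkpoint_every: int = 0) -> List[int]:
    """Return the lineage depth after each iteration of a hypothetical FR step."""
    current = Node("initial-positions")
    depths = []
    for i in range(1, iterations + 1):
        current = Node(f"positions-{i}", parent=current)  # one layout iteration
        if checkpoint_every and i % checkpoint_every == 0:
            current = checkpoint(current)
        depths.append(lineage_depth(current))
    return depths


print(run(6))                      # [1, 2, 3, 4, 5, 6]: grows every iteration
print(run(6, checkpoint_every=2))  # [1, 0, 1, 0, 1, 0]: periodically truncated
```

In this toy, the depth grows linearly without checkpoints, while checkpointing every second iteration keeps it at zero or one.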

Re: Help me learn about JOB TASK and DAG in Apache Spark

2023-04-01 Thread Mich Talebzadeh

On Fri, 31 Mar 2023 at 15:15, AN-TRUONG Tran Phan <tr.phan.tru...@gmail.com> wrote: Hi,

Re: Help me learn about JOB TASK and DAG in Apache Spark

2023-04-01 Thread Khalid Mammadov

learning about Apache Spark and want to know the meaning of each Task created on the Jobs recorded on Spark history. For example, the application I write creates 17 jobs, in which job 0 runs for 10 minutes, there are 2384 small tasks and I want to le

Re: Help me learn about JOB TASK and DAG in Apache Spark

2023-03-31 Thread Mich Talebzadeh

On Fri, 31 Mar 2023 at 15:15, AN-TRUONG Tran Phan <tr.phan.tru...@gmail.com> wrote: Hi, I am learning about Apache Spark and want to know the meaning of each Task created on the Jobs

Re: Help me learn about JOB TASK and DAG in Apache Spark

2023-03-31 Thread AN-TRUONG Tran Phan

creates 17 jobs, in which job 0 runs for 10 minutes, there are 2384 small tasks and I want to learn about the meaning of these 2384, is it possible? I found a picture of DAG in the Jobs and want to know the relationship between DAG and Task,

Re: Help me learn about JOB TASK and DAG in Apache Spark

2023-03-31 Thread Mich Talebzadeh

creates 17 jobs, in which job 0 runs for 10 minutes, there are 2384 small tasks and I want to learn about the meaning of these 2384, is it possible? I found a picture of DAG in the Jobs and want to know the relationship between DAG and Task, is it possible (Specifically from
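As a rough guide to the job/stage/task relationship asked about in this thread: an action triggers a job, the job's DAG is split into stages at shuffle boundaries, and each stage runs one task per partition it computes, so a job's task count is the sum over the stages it actually runs (stages whose shuffle output is reused from an earlier job appear as skipped and launch no tasks). The arithmetic below is a toy illustration with made-up partition counts; it is not the breakdown of the 2384-task job in the question.

```python
# Toy arithmetic (not a Spark API): a job launches one task per partition in
# each stage it runs; stages whose output is reused are skipped.
from typing import Collection, Dict


def tasks_in_job(partitions_per_stage: Dict[str, int], skipped: Collection[str] = ()) -> int:
    """Sum the partition counts of the stages that actually run."""
    return sum(n for stage, n in partitions_per_stage.items() if stage not in skipped)


# Hypothetical job: a file read as 120 input partitions, then a DataFrame
# shuffle into spark.sql.shuffle.partitions, whose default is 200.
job = {"stage 0: read + map": 120, "stage 1: after shuffle": 200}
print(tasks_in_job(job))                                   # 320
print(tasks_in_job(job, skipped={"stage 0: read + map"}))  # 200
```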

Mapping stages in DAG to line of code in pyspark

2021-04-18 Thread Dhruv Kumar

Hi, I am using PySpark to write Spark queries. My research project requires me to accurately measure the latency of each and every operator/stage in the query. I can make some guesses, but I am unable to map the stages (shown in the DAG in the Spark UI) exactly to the lines in my PySpark code. Can

Re: Where is the DAG stored before catalyst gets it?

2018-10-06 Thread Jacek Laskowski

Hi Jean Georges, "I am assuming it is still in the master and when catalyst is finished it sends the tasks to the workers." Sorry to be that direct, but the sentence does not make much sense to me. Again, very sorry for saying it in the very first sentence. Since I know Jean Georges I allowed

Where is the DAG stored before catalyst gets it?

2018-10-04 Thread Jean Georges Perrin

Hi, I am assuming it is still in the master and when Catalyst is finished it sends the tasks to the workers. Correct? TIA, jg

Spark ML DAG Pipelines

2017-09-07 Thread Srikanth Sampath

Hi Spark Experts, Can someone point me to some examples for non-linear (DAG) ML pipelines. That would be of great help. Thanks much in advance -Srikanth

Re: [Spark Streaming] DAG Output Processing mechanism

2017-05-29 Thread Nipun Arora

Sending out the message again. Hopefully someone can clarify :) I would like some clarification on the execution model for Spark Streaming. Broadly, I am trying to understand if output operations in a DAG are only processed after all intermediate operations are finished for all parts

Re: [Spark Streaming] DAG Output Processing mechanism

2017-05-28 Thread Nipun Arora

Apologies, resending as the previous mail went out with some unnecessary copy-paste. I would like some clarification on the execution model for Spark Streaming. Broadly, I am trying to understand if output operations in a DAG are only processed after all intermediate operations are finished for all

[Spark Streaming] DAG Output Processing mechanism

2017-05-28 Thread Nipun Arora

I would like some clarification on the execution model for Spark Streaming. Broadly, I am trying to understand if output operations in a DAG are only processed after all intermediate operations are finished for all parts of the DAG. Let me give an example: I have

[Spark Streaming] DAG Execution Model Clarification

2017-05-26 Thread Nipun Arora

Hi, I would like some clarification on the execution model for Spark Streaming. Broadly, I am trying to understand if output operations in a DAG are only processed after all intermediate operations are finished for all parts of the DAG. Let me give an example: I have a DStream A, I do map
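A sketch of the execution model being asked about, assuming the default configuration: Spark Streaming turns every output operation (print, foreachRDD, saveAs...) into its own job for each batch, and with the default of one concurrent job (the `spark.streaming.concurrentJobs` setting) those jobs run one after another in the order the output operations were defined. Each job computes the upstream lineage it needs, and that lineage is computed again for the next job unless the DStream is persisted. The plain-Python toy below models only that ordering; `mapped_a` and `make_job` are hypothetical names, not the DStream API.

```python
# Toy model (plain Python, not the Spark Streaming API) of one micro-batch:
# each output operation becomes its own job, and the jobs run sequentially in
# definition order, as with a single concurrent job.
from typing import Callable, List

events: List[str] = []


def mapped_a() -> List[int]:
    """Hypothetical upstream step: map over DStream A's data for this batch."""
    events.append("compute map(A)")
    return [x * 2 for x in [1, 2, 3]]


def make_job(name: str, upstream: Callable[[], List[int]]) -> Callable[[], None]:
    """Wrap an output operation as a job that pulls its own upstream data."""
    def job() -> None:
        data = upstream()  # not persisted, so recomputed for every job
        events.append(f"{name}: wrote {sum(data)}")
    return job


batch_jobs = [make_job("output-1", mapped_a), make_job("output-2", mapped_a)]
for job in batch_jobs:  # one at a time, in the order the outputs were defined
    job()
print(events)
```

The printed event log shows the upstream map running once per output job, and output-1 finishing before output-2 starts.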

How to generate stage for this RDD DAG please?

2017-05-23 Thread ??????????

Hi all, I have read some papers about stages, and I understand narrow dependencies and shuffle dependencies. For the RDD DAG below, how does Spark generate the stage DAG? And is this RDD DAG legal?
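In outline, the DAGScheduler works backwards from the RDD the action is called on: narrow dependencies are pipelined into the same stage, and each shuffle dependency starts a new parent (shuffle map) stage. The toy function below applies that rule to a small hand-written lineage. It is a plain-Python sketch, not Spark's implementation; the RDD names are made up, and its stage numbering is arbitrary (Spark assigns its own stage ids).

```python
# Toy stage builder (plain Python, not Spark's DAGScheduler): narrow
# dependencies stay in the current stage, shuffle dependencies start a new one.
from typing import Dict, List, Set, Tuple

# rdd name -> list of (parent name, "narrow" or "shuffle")
Deps = Dict[str, List[Tuple[str, str]]]


def build_stages(deps: Deps, final_rdd: str) -> Dict[int, Set[str]]:
    stages: Dict[int, Set[str]] = {}
    shuffle_stage_of: Dict[str, int] = {}  # map-side RDD -> its stage id

    def visit(rdd: str, stage: int) -> None:
        members = stages.setdefault(stage, set())
        if rdd in members:
            return
        members.add(rdd)
        for parent, kind in deps.get(rdd, []):
            if kind == "narrow":
                visit(parent, stage)  # pipelined into the same stage
            else:
                if parent not in shuffle_stage_of:
                    shuffle_stage_of[parent] = len(shuffle_stage_of) + 1
                visit(parent, shuffle_stage_of[parent])  # new map stage

    visit(final_rdd, 0)  # stage 0 plays the role of the result stage
    return stages


# Hypothetical lineage: A -> B and C -> D, then a join E of two inputs that are
# not co-partitioned (so both sides shuffle), then a narrow map F.
lineage: Deps = {
    "F": [("E", "narrow")],
    "E": [("B", "shuffle"), ("D", "shuffle")],
    "B": [("A", "narrow")],
    "D": [("C", "narrow")],
}
print(build_stages(lineage, "F"))
```

The result groups E and F into the final stage and gives each shuffled input branch its own map stage.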

Re: DAG Visualization option is missing on Spark Web UI

2017-01-30 Thread Md. Rezaul Karim

a...@insight-centre.org> wrote: Hi Jacek, I tried accessing the Spark web UI on both Firefox and Google Chrome with an ad blocker enabled. I do see other options like User, Total Uptime, Scheduling Mode, Active Jobs, Completed Jobs and Event

Re: DAG Visualization option is missing on Spark Web UI

2017-01-28 Thread Mark Hamstra

blocker enabled. I do see other options like User, Total Uptime, Scheduling Mode, Active Jobs, Completed Jobs and Event Timeline. However, I don't see an option for DAG visualization. Please note that I am experiencing the same issue with Spark 2.x (i.e. 2.0.0, 2.0.1, 2.0

Re: DAG Visualization option is missing on Spark Web UI

2017-01-28 Thread Md. Rezaul Karim

Hi Jacek, I tried accessing the Spark web UI on both Firefox and Google Chrome with an ad blocker enabled. I do see other options like User, Total Uptime, Scheduling Mode, Active Jobs, Completed Jobs and Event Timeline. However, I don't see an option for DAG visualization. Please note

Re: DAG Visualization option is missing on Spark Web UI

2017-01-28 Thread Jacek Laskowski

job on my local machine written in Scala with Spark 2.1.0. However, I do not see any "DAG Visualization" option at http://localhost:4040/jobs/. Any suggestions? Regards, Md. Rezaul Karim

DAG Visualization option is missing on Spark Web UI

2017-01-28 Thread Md. Rezaul Karim

Hi all, I am running a Spark job written in Scala with Spark 2.1.0 on my local machine. However, I do not see any "DAG Visualization" option at http://localhost:4040/jobs/. Any suggestions? Regards, Md. Rezaul Karim

Re: SparkSQL DAG generation , DAG optimization , DAG execution

2016-09-10 Thread Mich Talebzadeh

Right, let us simplify this. Can you run the whole thing *once* only and send the DAG execution output from the UI? You can use the Snipping Tool to take the image. HTH, Dr Mich Talebzadeh

Re: SparkSQL DAG generation , DAG optimization , DAG execution

2016-09-10 Thread Rabin Banerjee

(look at the UI page, port 4040 by default)? *I checked the Spark UI DAG: so many file reads. Why?* 6. What Spark mode is being used (Local, Standalone, YARN)? *YARN* 7. OOM could be anything depending on how much you are allocating to your driver memory in spark-submit? *Driver

Re: SparkSQL DAG generation , DAG optimization , DAG execution

2016-09-10 Thread Mich Talebzadeh

Executor memory and driver memory are set to 4 GB, which is too high as the data size is in MB. Questions: 1. Will Spark optimize multiple SQL queries into one single physical plan? 2. In the DAG I can see a lot of file reads and a lot of stages. Why? I only called act

SparkSQL DAG generation , DAG optimization , DAG execution

2016-09-09 Thread Rabin Banerjee

will optimize multiple SQL queries into one physical execution plan. 2. Executor memory and driver memory are set to 4 GB, which is too high as the data size is in MB. Questions: 1. Will Spark optimize multiple SQL queries into one single physical plan? 2. In the DAG I can see a lot of file reads and a lot

DAG of Spark Sort application spanning two jobs

2016-05-30 Thread alvarobrandon

[Attached images (...cbKDZ.png, GXIkS.png, H9LXF.png), not reproduced here.]

DAG Pipelines?

2016-05-04 Thread Cesar Flores

I read on the ML guide page (http://spark.apache.org/docs/latest/ml-guide.html#details) that it is possible to construct DAG Pipelines. Unfortunately, there is no example explaining in which use case this may be useful. Can someone give me an example or use case where
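The Spark ML guide's rule for non-linear pipelines is that the data flow graph, implied by each stage's input and output column names, must form a DAG, and the stages must then be listed in topological order. A typical use case is merging independent feature branches, such as tokenized text and an indexed category column, into one feature vector. The sketch below computes such an order in plain Python with the standard library's `graphlib` (Python 3.9+); it does not use `pyspark.ml`, and the stage and column names are hypothetical.

```python
# Toy planner (plain Python, not pyspark.ml): derive a valid stage order for a
# non-linear pipeline from each stage's input and output column names.
# Requires Python 3.9+ for graphlib.
from graphlib import TopologicalSorter
from typing import Dict, List, Tuple

# Hypothetical stages: name -> (input columns, output column). Listed here in
# an order that would NOT be valid as-is (the assembler comes first).
pipeline_spec: Dict[str, Tuple[List[str], str]] = {
    "assembler": (["tf", "cat_idx"], "features"),
    "tokenizer": (["text"], "words"),
    "hashing_tf": (["words"], "tf"),
    "indexer": (["category"], "cat_idx"),
    "classifier": (["features"], "prediction"),
}


def topological_stage_order(spec: Dict[str, Tuple[List[str], str]]) -> List[str]:
    """Order stages so every stage comes after the stages producing its inputs."""
    producer = {out: name for name, (_, out) in spec.items()}
    predecessors = {
        name: {producer[col] for col in inputs if col in producer}
        for name, (inputs, _) in spec.items()
    }
    # static_order() raises graphlib.CycleError if the graph is not a DAG.
    return list(TopologicalSorter(predecessors).static_order())


order = topological_stage_order(pipeline_spec)
print(order)
```

Several orders are valid here; any of them places the text branch (tokenizer, then hashing_tf) and the indexer before the assembler, and the classifier last.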

The "DAG Visualization" in 1.6 does not work properly here

2016-03-15 Thread charles li

sometimes it just shows several *black dots*, and sometimes it cannot show the entire graph. Has anyone run into this before, and how did you fix it?

streaming application redundant dag stage execution/performance/caching

2016-02-16 Thread krishna ramachandran

We have a streaming application containing approximately 12 stages per batch, running in streaming mode (4-second batches). Each stage persists its output to Cassandra. The pipeline stages: stage 1 ---> receive Stream A --> map --> filter --> (union with another stream B) --> map --> groupByKey -->

RE: Question on Spark architecture and DAG

2016-02-12 Thread Mich Talebzadeh

Andy Davidson [mailto:a...@santacruzintegration.com] Sent: 12 February 2016 21:17 To: Mich Talebzadeh <m...@peridale.co.uk>; user@spark.apache.org Subject: Re: Question on Spark architecture and DAG From: Mich Talebzadeh <m...@peridale.co.uk>

Re: Question on Spark architecture and DAG

2016-02-12 Thread Andy Davidson

From: Mich Talebzadeh <m...@peridale.co.uk> Date: Thursday, February 11, 2016 at 2:30 PM To: "user @spark" <user@spark.apache.org> Subject: Question on Spark architecture and DAG Hi, I have used Hive on Spark engine and of course Hive tables and its pre

Question on Spark architecture and DAG

2016-02-11 Thread Mich Talebzadeh

Hi, I have used Hive on the Spark engine, and of course Hive tables, and it's pretty impressive compared with Hive on the MR engine. Let us assume that I use the Spark shell. The Spark shell is a client that connects to a Spark master running on a host and port, like below: spark-shell --master

Are there any open source tools that implement draggable widgets and run the app in the form of a DAG?

2016-02-01 Thread zml张明磊

Hello, I have tried to find such tools without success. So, as the title says: are there any open source tools that implement draggable widgets and run the app in the form of a DAG-like workflow? Thanks, Minglei.

DAG visualization: no visualization information available with history server

2016-01-31 Thread Raghava

spark.ui.retainedJobs=1 --conf spark.ui.retainedStages=1 In the Spark Web UI (http://localhost:18080/), only the DAG visualization of the most recent job is available. For the rest of the jobs, I get the following message: No visualization information available for this job! If this is an old job, its

Re: Spark RDD DAG behaviour understanding in case of checkpointing

2016-01-25 Thread Tathagata Das

Hi Tathagata/Cody, I am facing a challenge in production with DAG behaviour during checkpointing in Spark Streaming: Step 1: Read data from Kafka every 15 min; call this KafkaStreamRDD (~100 GB of data). Step 2: Repartition KafkaStreamRdd from 5 t

Re: Spark RDD DAG behaviour understanding in case of checkpointing

2016-01-25 Thread Cody Koeninger

a challenge in production with DAG behaviour during checkpointing in Spark Streaming: Step 1: Read data from Kafka every 15 min; call this KafkaStreamRDD (~100 GB of data). Step 2: Repartition KafkaStreamRdd from 5 to 100 partitions to parallelise processing - c

Spark RDD DAG behaviour understanding in case of checkpointing

2016-01-23 Thread gaurav sharma

Hi Tathagata/Cody, I am facing a challenge in production with DAG behaviour during checkpointing in Spark Streaming: Step 1: Read data from Kafka every 15 min; call this KafkaStreamRDD (~100 GB of data). Step 2: Repartition KafkaStreamRdd from 5 to 100 partitions to parallelise processing

Re: Dynamic DAG use-case for spark streaming.

2015-09-29 Thread Tathagata Das

A very basic form of support that exists in DStream is DStream.transform(), which takes an arbitrary RDD => RDD function. This function can change the computation it performs over time. That may be of help to you.

DAG Scheduler deadlock when two RDDs reference each other, force Stages manually?

2015-09-14 Thread petranidis

is passed to the aggregate function as an initialization parameter, and then for each B element with key keyB, if M(keyA, keyB) == 1, the B element is taken into account in the summation. The calculation of A is done successfully and correctly, but then the DAG scheduler seems to dead

Re: DAG Scheduler deadlock when two RDDs reference each other, force Stages manually?

2015-09-14 Thread Petros Nyfantis

if the key of each mapped element keyA is passed to the aggregate function as an initialization parameter, and then for each B element with key keyB, if M(keyA, keyB) == 1, the B element is taken into account in the summation. The calculation of A is done successfully and correctly, but then th
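One likely culprit, assuming B is referenced from inside the function that aggregates A: Spark does not support invoking RDD transformations or actions inside another RDD's transformations (see SPARK-5063). A common restructuring is to turn the entries where M(keyA, keyB) == 1 into pairs keyed by keyB, join them with B on keyB, and sum per keyA. The sketch below shows that data flow with plain Python dicts standing in for pair RDDs; `sum_linked` and its inputs are hypothetical, and it is not Spark code.

```python
# Toy data flow (plain Python dicts standing in for pair RDDs; not Spark code):
# compute, for every keyA, the sum of B's values over the keyB entries with
# M(keyA, keyB) == 1, without referencing B from inside a function over A.
from collections import defaultdict
from typing import Dict, List, Tuple


def sum_linked(b: Dict[str, float], m_ones: List[Tuple[str, str]]) -> Dict[str, float]:
    """m_ones lists the (keyA, keyB) pairs for which M(keyA, keyB) == 1."""
    key_as_by_key_b: Dict[str, List[str]] = defaultdict(list)
    for key_a, key_b in m_ones:               # re-key M by keyB
        key_as_by_key_b[key_b].append(key_a)
    totals: Dict[str, float] = defaultdict(float)
    for key_b, value in b.items():            # like a join on keyB ...
        for key_a in key_as_by_key_b.get(key_b, []):
            totals[key_a] += value            # ... followed by reduceByKey on keyA
    return dict(totals)


b = {"b1": 1.0, "b2": 2.0, "b3": 4.0}
m_ones = [("a1", "b1"), ("a1", "b3"), ("a2", "b2")]
print(sum_linked(b, m_ones))  # {'a1': 5.0, 'a2': 2.0}
```

On pair RDDs, the same flow would roughly be a `join` of the re-keyed M with B followed by `reduceByKey`, all issued from the driver.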

How to create a combined DAG visualization?

2015-09-10 Thread b.bhavesh

Hi, how can I create a combined DAG visualization of PySpark code instead of separate DAGs for jobs and stages? Thanks, b.bhavesh

Re: DAG related query

2015-08-20 Thread Andrew Or

driver program. The first RDD still exists. On Thu, Aug 20, 2015 at 2:15 PM, Bahubali Jain bahub...@gmail.com wrote: Hi, what would the DAG look like for the code below? JavaRDD<String> rdd1 = context.textFile(SOMEPATH); JavaRDD<String> rdd2 = rdd1.map(DO something); rdd1 = rdd2.map(Do

Re: DAG related query

2015-08-20 Thread Sean Owen

No. The third line creates a third RDD whose reference simply replaces the reference to the first RDD in your local driver program. The first RDD still exists. On Thu, Aug 20, 2015 at 2:15 PM, Bahubali Jain bahub...@gmail.com wrote: Hi, what would the DAG look like for the code below
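Sean Owen's point can be modelled in a few lines: transformations return new immutable objects that point at their parents, so rebinding the variable `rdd1` changes only which object the driver-side name refers to, never the parent links, and no cycle can form. The plain-Python `FakeRDD` class below is a hypothetical stand-in for illustration, not the Spark API.

```python
# Toy model (plain Python, not the Spark API) of the three lines in the
# question. Each "transformation" returns a new immutable node that points at
# its parent, so rebinding the name rdd1 cannot introduce a cycle.
from typing import List, Optional


class FakeRDD:
    """Hypothetical stand-in for an RDD: knows only its label and its parent."""

    def __init__(self, label: str, parent: Optional["FakeRDD"] = None) -> None:
        self.label = label
        self.parent = parent

    def map(self, label: str) -> "FakeRDD":
        return FakeRDD(label, parent=self)  # never modifies self


def lineage(rdd: FakeRDD) -> List[str]:
    """Follow parent links back to the source, failing loudly on a cycle."""
    seen = set()
    labels = []
    node: Optional[FakeRDD] = rdd
    while node is not None:
        if id(node) in seen:
            raise ValueError("cycle detected")
        seen.add(id(node))
        labels.append(node.label)
        node = node.parent
    return labels


rdd1 = FakeRDD("textFile(SOMEPATH)")
first = rdd1                      # keep a handle on the first node
rdd2 = rdd1.map("map #1")
rdd1 = rdd2.map("map #2")         # rebinds the name; the first node is untouched
print(lineage(rdd1))              # ['map #2', 'map #1', 'textFile(SOMEPATH)']
print(first.parent is None)       # True: the first RDD still exists, unchanged
```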

DAG related query

2015-08-20 Thread Bahubali Jain

Hi, what would the DAG look like for the code below? JavaRDD<String> rdd1 = context.textFile(SOMEPATH); JavaRDD<String> rdd2 = rdd1.map(DO something); rdd1 = rdd2.map(Do SOMETHING); Does this lead to any kind of cycle? Thanks, Baahu

DataFrame DAG recomputed even though DataFrame is cached?

2015-07-28 Thread Kristina Rogale Plazonic

Hi, I'm puzzling over the following problem: when I cache a small sample of a big dataframe, the small dataframe is recomputed when selecting a column (but not if show() or count() is invoked). Why is that so and how can I avoid recomputation of the small sample dataframe? More details: - I

Re: DataFrame DAG recomputed even though DataFrame is cached?

2015-07-28 Thread Michael Armbrust

We will try to address this before Spark 1.5 is released: https://issues.apache.org/jira/browse/SPARK-9141 On Tue, Jul 28, 2015 at 11:50 AM, Kristina Rogale Plazonic kpl...@gmail.com wrote: Hi, I'm puzzling over the following problem: when I cache a small sample of a big dataframe, the

Building DAG from log

2015-05-04 Thread Giovanni Paolo Gibilisco

Hi, I'm trying to build the DAG of an application from the logs. I've had a look at SparkReplayDebugger, but it doesn't operate offline on logs. I also looked at the one in this pull request: https://github.com/apache/spark/pull/2077 that seems to operate only on logs, but it doesn't clearly show

Re: DAG

2015-04-25 Thread Akhil Das

Maybe this will give you a good start: https://github.com/apache/spark/pull/2077 Thanks, Best Regards On Sat, Apr 25, 2015 at 1:29 AM, Giovanni Paolo Gibilisco gibb...@gmail.com wrote: Hi, I would like to know if it is possible to build the DAG before actually executing the application. My

Re: DAG

2015-04-25 Thread Corey Nolet

Giovanni, the DAG can be walked by calling the dependencies method on any RDD. It returns a Seq of Dependency objects, each of which refers to a parent RDD through its rdd member. If you start at the leaves and walk through the parents until dependencies returns an empty Seq, you ultimately have your DAG.
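The walk described above is a plain graph traversal. In the Scala API each element of `rdd.dependencies` exposes its parent as `.rdd`, and `rdd.toDebugString` prints the lineage directly. The sketch below reproduces the traversal over hypothetical plain-Python `Node` and `Dep` classes that only mimic that shape; it is not Spark code, and it assumes node names are unique.

```python
# Toy lineage walk (plain Python): every node has a list of dependencies, and
# each dependency points at a parent node through `.rdd`, loosely mirroring
# the shape of Spark's Scala RDD API. Names are assumed to be unique.
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str
    dependencies: List["Dep"] = field(default_factory=list)


@dataclass
class Dep:
    rdd: Node


def walk_lineage(final: Node) -> Dict[str, List[str]]:
    """Map each reachable node to its parents, stopping at nodes with none."""
    edges: Dict[str, List[str]] = {}
    queue = deque([final])
    while queue:
        node = queue.popleft()
        if node.name in edges:
            continue  # shared ancestor, already recorded
        edges[node.name] = [dep.rdd.name for dep in node.dependencies]
        queue.extend(dep.rdd for dep in node.dependencies)
    return edges


source = Node("textFile")
words = Node("flatMap", [Dep(source)])
pairs = Node("map", [Dep(words)])
counts = Node("reduceByKey", [Dep(pairs)])
print(walk_lineage(counts))
```

Because visited names are skipped, a diamond-shaped lineage (two children sharing one parent) records the shared parent once.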

DAG

2015-04-24 Thread Giovanni Paolo Gibilisco

Hi, I would like to know if it is possible to build the DAG before actually executing the application. My guess is that in the scheduler the DAG is built dynamically at runtime since it might depend on the data, but I was wondering if there is a way (and maybe a tool already) to analyze the code

Re: Spark Application Stages and DAG

2015-04-07 Thread Vijay Innamuri

One of the stages is taking a long time to execute. How can I find the transformations/actions associated with a particular stage? Is there any way to find the execution DAG of a Spark application? Regards, Vijay

Re: Spark Application Stages and DAG

2015-04-03 Thread Tathagata Das

How can I find the transformations/actions associated with a particular stage? Is there any way to find the execution DAG of a Spark application? Regards, Vijay

Support for Data flow graphs and not DAG only

2015-04-02 Thread anshu shukla

Hey, I didn't find any documentation regarding support for cycles in a Spark topology, although Storm supports this through manual configuration in the acker function logic (setting it to a particular count). By cycles I don't mean infinite loops. Thanks and regards, Anshu Shukla

question regarding the dependency DAG in Spark

2015-03-16 Thread Grandl Robert

Hi guys, I am trying to get a better understanding of DAG generation for a job in Spark. Ideally, what I want is to run some SQL query and extract the DAG generated by Spark. By DAG I mean the stages and dependencies among stages, and the number of tasks in every stage. Could you guys

Re: Visualizing the DAG of a Spark application

2015-03-13 Thread Todd Nist

There is the PR https://github.com/apache/spark/pull/2077 for doing this. On Fri, Mar 13, 2015 at 6:42 AM, t1ny wbr...@gmail.com wrote: Hi all, We are looking for a tool that would let us visualize the DAG generated by a Spark application as a simple graph. This graph would represent

Visualizing the DAG of a Spark application

2015-03-13 Thread t1ny

Hi all, We are looking for a tool that would let us visualize the DAG generated by a Spark application as a simple graph. This graph would represent the Spark Job, its stages and the tasks inside the stages, with the dependencies between them (either narrow or shuffle dependencies). The Spark

Re: Visualizing the DAG of a Spark application

2015-03-13 Thread t1ny

For anybody who's interested in this, here's a link to a PR that addresses this feature: https://github.com/apache/spark/pull/2077 (thanks to Todd Nist for sending it to me)

Re: If an RDD appeared twice in a DAG, of which calculation is triggered by a single action, will this RDD be calculated twice?

2015-01-19 Thread Tobias Pfeiffer

Hi, On Sat, Jan 17, 2015 at 3:37 AM, Peng Cheng pc...@uow.edu.au wrote: I'm talking about RDD1 (not persisted or checkpointed) in this situation:
...(somewhere) -> RDD1 -> RDD2
                   |       |
                   V       V

Re: If an RDD appeared twice in a DAG, of which calculation is triggered by a single action, will this RDD be calculated twice?

2015-01-19 Thread Xuefeng Wu

(one for schema inference, another for reading the data). It almost guarantees that the source jsonRDD is calculated twice. Has this problem been addressed so far?

If an RDD appeared twice in a DAG, of which calculation is triggered by a single action, will this RDD be calculated twice?

2015-01-16 Thread Peng Cheng

If an RDD appeared twice in a DAG, of which calculation is triggered by a single action, will this RDD be calculated twice?

2015-01-16 Thread Peng Cheng

I'm talking about RDD1 (not persisted or checkpointed) in this situation:
...(somewhere) -> RDD1 -> RDD2
                   |       |
                   V       V
                  RDD3 -> RDD4 -> Action!
In my experience the change RDD1 get
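For the situation described above, the usual answer is yes: an RDD that is neither persisted nor checkpointed is computed once for every consumer that pulls its partitions, so if RDD2 and RDD3 both read RDD1 through narrow dependencies, RDD1 is computed twice within the single action, and calling persist() (or cache()) on RDD1 avoids that. When a branch goes through a shuffle, Spark's reuse of shuffle output makes the real picture more nuanced. The plain-Python `LazyNode` below is a toy model of the narrow-dependency case with hypothetical names, not Spark's implementation.

```python
# Toy model (plain Python, not Spark's implementation): a lazily evaluated
# node recomputes its data each time a consumer pulls it, unless persisted.
from typing import Callable, List, Optional


class LazyNode:
    def __init__(self, compute: Callable[[], List[int]]) -> None:
        self._compute = compute
        self._persisted = False
        self._cached: Optional[List[int]] = None
        self.compute_count = 0  # how many times the data was actually computed

    def persist(self) -> "LazyNode":
        self._persisted = True
        return self

    def iterator(self) -> List[int]:
        if self._cached is not None:
            return self._cached
        self.compute_count += 1
        result = self._compute()
        if self._persisted:
            self._cached = result
        return result


def single_action(rdd1: LazyNode) -> int:
    """RDD2 and RDD3 both read RDD1; RDD4 combines them; one action runs it all."""
    rdd2 = [x + 1 for x in rdd1.iterator()]
    rdd3 = [x * 10 for x in rdd1.iterator()]
    rdd4 = [a + b for a, b in zip(rdd2, rdd3)]
    return sum(rdd4)


plain = LazyNode(lambda: [1, 2, 3])
cached = LazyNode(lambda: [1, 2, 3]).persist()
print(single_action(plain), plain.compute_count)    # 69 2
print(single_action(cached), cached.compute_count)  # 69 1
```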

Strange DAG scheduling behavior on currently dependent RDDs

2015-01-07 Thread Corey Nolet

We just updated to Spark 1.2.0 from Spark 1.1.0. We have a small framework that we've been developing that connects various different RDDs together based on some predefined business cases. After updating to 1.2.0, some of the concurrency expectations about how the stages within jobs are executed

Re: Strange DAG scheduling behavior on currently dependent RDDs

2015-01-07 Thread Corey Nolet

I asked this question too soon. I am caching a bunch of RDDs in a TrieMap so that our framework can wire them together, and the locking was not completely correct; therefore it was at times creating multiple new RDDs instead of using the cached versions, which were creating completely separate
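The fix described here amounts to making get-or-create atomic, so concurrent callers asking for the same key share one instance instead of each building a new RDD. The sketch below shows the pattern in plain Python with a single `threading.Lock`; `Registry`, `get_or_create` and the `factory` callback are hypothetical names standing in for the framework's own code, not a Spark API.

```python
# Sketch of an atomic get-or-create registry (plain Python; `factory` stands in
# for whatever builds and caches an RDD in the framework described above).
import threading
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


class Registry(Generic[T]):
    def __init__(self) -> None:
        self._items: Dict[str, T] = {}
        self._lock = threading.Lock()

    def get_or_create(self, key: str, factory: Callable[[], T]) -> T:
        # Check and insert under one lock so that two threads asking for the
        # same key can never both call `factory`.
        with self._lock:
            if key not in self._items:
                self._items[key] = factory()
            return self._items[key]


calls = []
registry: Registry[object] = Registry()


def build() -> object:
    calls.append(1)  # record every time a new instance is actually built
    return object()


threads = [threading.Thread(target=registry.get_or_create, args=("rddA", build))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(calls))  # 1: eight concurrent requests, one instance built
```

Holding one lock while `factory` runs also serializes the creation of unrelated keys; a per-key lock is a possible refinement if that becomes a bottleneck.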

Re: DAG info

2015-01-03 Thread madhu phatak

Hi, You can turn off these messages using log4j.properties. On Fri, Jan 2, 2015 at 1:51 PM, Robineast robin.e...@xense.co.uk wrote: Do you have some example code of what you are trying to do? Robin

Re: DAG info

2015-01-02 Thread Robineast

Do you have some example code of what you are trying to do? Robin

Re: DAG info

2015-01-01 Thread Josh Rosen

), which has no missing parents. Also, my program is taking a lot of time to execute.

DAG info

2015-01-01 Thread shahid

:43), which has no missing parents. Also, my program is taking a lot of time to execute.

67 matches
