Re: Spark DAG scheduler

2020-04-16 Thread Reynold Xin
If you are talking about a tree, then the RDDs are nodes, and the dependencies are the edges. If you are talking about a DAG, then the partitions in the RDDs are the nodes, and the dependencies between the partitions are the edges.

On Thu, Apr 16, 2020 at 4:02 PM, Mania Abdi <
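The two views described in this reply can be illustrated with a small toy model. This is only a sketch in plain Python, not Spark code: `ToyRDD`, `rdd_edges`, `partition_edges`, and the "narrow"/"shuffle" labels are hypothetical names invented for illustration, and a narrow dependency is assumed to be one-to-one between partitions with the same index.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ToyRDD:
    """Hypothetical stand-in for an RDD: a name, a partition count, and parents."""
    name: str
    num_partitions: int
    # Each parent is (parent RDD, kind), where kind is "narrow" or "shuffle".
    parents: List[Tuple["ToyRDD", str]] = field(default_factory=list)


def rdd_edges(rdd):
    """RDD-level view: nodes are RDDs, edges are dependencies (child -> parent)."""
    edges, seen, stack = [], set(), [rdd]
    while stack:
        cur = stack.pop()
        if cur.name in seen:
            continue
        seen.add(cur.name)
        for parent, _kind in cur.parents:
            edges.append((cur.name, parent.name))
            stack.append(parent)
    return sorted(edges)


def partition_edges(rdd):
    """Partition-level view: nodes are (RDD name, partition index) pairs."""
    edges, seen, stack = set(), set(), [rdd]
    while stack:
        cur = stack.pop()
        if cur.name in seen:
            continue
        seen.add(cur.name)
        for parent, kind in cur.parents:
            for i in range(cur.num_partitions):
                if kind == "narrow":
                    # Assumption: child partition i reads only parent partition i.
                    targets = [i]
                else:
                    # Shuffle: child partition i may read every parent partition.
                    targets = range(parent.num_partitions)
                for j in targets:
                    edges.add(((cur.name, i), (parent.name, j)))
            stack.append(parent)
    return sorted(edges)


# A three-step lineage: source -> mapped (narrow) -> reduced (shuffle).
source = ToyRDD("source", 2)
mapped = ToyRDD("mapped", 2, [(source, "narrow")])
reduced = ToyRDD("reduced", 3, [(mapped, "shuffle")])

print(rdd_edges(reduced))
print(len(partition_edges(reduced)))
```

In this toy lineage the RDD-level graph has one edge per dependency, while the partition-level graph has one edge per pair of partitions that a dependency connects, so a shuffle contributes many more edges than a narrow dependency does.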

Re: Spark DAG scheduler

2020-04-16 Thread Mania Abdi
Is it correct to say that the nodes in the DAG are RDDs and the edges are computations?

On Thu, Apr 16, 2020 at 6:21 PM Reynold Xin wrote:
> The RDD is the DAG.
>
> On Thu, Apr 16, 2020 at 3:16 PM, Mania Abdi wrote:
>> Hello everyone,
>>
>> I am implementing a caching mechanism for analytic

Re: Spark DAG scheduler

2020-04-16 Thread Reynold Xin
The RDD is the DAG.

On Thu, Apr 16, 2020 at 3:16 PM, Mania Abdi <abdi...@husky.neu.edu> wrote:
> Hello everyone,
>
> I am implementing a caching mechanism for analytic workloads running on
> top of Spark and I need to retrieve the Spark DAG right after it is
> generated and the DAG

Spark DAG scheduler

2020-04-16 Thread Mania Abdi
Hello everyone, I am implementing a caching mechanism for analytic workloads running on top of Spark and I need to retrieve the Spark DAG right after it is generated and the DAG scheduler. I would appreciate it if you could give me some hints or refer me to some documents about where the DAG

Re: DSv2 & DataSourceRegister

2020-04-16 Thread Andrew Melo
Hi again,

Does anyone have thoughts on either the idea or the implementation?

Thanks,
Andrew

On Thu, Apr 9, 2020 at 11:32 PM Andrew Melo wrote:
> Hi all,
>
> I've opened a WIP PR here https://github.com/apache/spark/pull/28159
> I'm a novice at Scala, so I'm sure the code isn't idiomatic,

BlockManager and ShuffleManager = can getLocalBytes be ever used for shuffle blocks?

2020-04-16 Thread Jacek Laskowski
Hi, While trying to understand the relationship between BlockManager and ShuffleManager I found that ShuffleManager is used for shuffle block data [1] (and that makes sense). What I found quite surprising is that BlockManager can call getLocalBytes for non-shuffle blocks that in turn does...fetching

Re: InferFiltersFromConstraints logical optimization rule and Optimizer.defaultBatches?

2020-04-16 Thread Jacek Laskowski
Hi Jungtaek, Thanks a lot for your answer. What you're saying reflects my understanding perfectly. It's a small change, but it makes understanding where rules are used much simpler (= less confusing). I'll propose a PR and see where it goes from there. Thanks! Regards, Jacek Laskowski