RE: Should I avoid "state" in a Spark application?

2016-06-12 Thread Haopu Wang
Can someone look at my questions? Thanks again! From: Haopu Wang Sent: June 12, 2016 16:40 To: u...@spark.apache.org Subject: Should I avoid "state" in a Spark application? I have a Spark application whose structure is below: var ts: Long = 0L

Re: DAG in Pipeline

2016-06-12 Thread Joseph Bradley
One more note: When you specify the stages in the Pipeline, they need to be in topological order according to the DAG. On Sun, Jun 12, 2016 at 10:47 AM, Joseph Bradley wrote: Hi Pranay, Yes, you can do this. The DAG structure should be specified via the various

Re: DAG in Pipeline

2016-06-12 Thread Joseph Bradley
Hi Pranay, Yes, you can do this. The DAG structure should be specified via the various Transformers' input and output columns, where a Transformer can have multiple input and/or output columns. Most of the classification and regression Models are good examples of Transformers with multiple
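The two replies above say that the DAG is implied by each stage's input and output columns, and that the stages handed to the Pipeline must already be in topological order. The following is a minimal sketch of that idea in plain Scala (2.13+), with no Spark dependency: `Stage`, `topoOrder`, and `isTopological` are hypothetical helpers, not Spark ML API, and the column names only loosely mirror a tokenizer, a term-frequency stage, a feature assembler, and a classification model with several output columns.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a pipeline stage: only its column wiring matters here.
// This is NOT a Spark ML type.
final case class Stage(name: String, inputCols: Seq[String], outputCols: Seq[String])

// Simple topological sort: repeatedly emit every stage whose input columns are
// either raw input columns (produced by no stage) or produced by stages already
// emitted. Returns None if the column wiring contains a cycle.
def topoOrder(stages: Seq[Stage]): Option[Seq[Stage]] = {
  // Column name -> index of the stage that outputs it.
  val producer: Map[String, Int] =
    stages.zipWithIndex.flatMap { case (s, i) => s.outputCols.map(_ -> i) }.toMap
  // deps(i) = indices of the stages whose outputs stage i reads.
  val deps: IndexedSeq[Set[Int]] =
    stages.indices.map(i => stages(i).inputCols.flatMap(producer.get).toSet)

  val remaining = mutable.LinkedHashSet.from(stages.indices)
  val done = mutable.Set.empty[Int]
  val order = mutable.ArrayBuffer.empty[Stage]
  var progress = true
  while (remaining.nonEmpty && progress) {
    val ready = remaining.filter(i => deps(i).subsetOf(done)).toList
    progress = ready.nonEmpty
    ready.foreach { i => remaining -= i; done += i; order += stages(i) }
  }
  if (remaining.isEmpty) Some(order.toList) else None
}

// Check a user-supplied order: each stage may only read raw columns or
// columns output by an earlier stage in the sequence.
def isTopological(ordered: Seq[Stage], rawCols: Set[String]): Boolean =
  ordered.foldLeft(Option(rawCols)) { (available, s) =>
    available.filter(cols => s.inputCols.forall(cols.contains)).map(_ ++ s.outputCols)
  }.isDefined

// Illustrative wiring, deliberately listed out of order.
val stages = Seq(
  Stage("model", Seq("allFeatures"), Seq("rawPrediction", "probability", "prediction")),
  Stage("assembler", Seq("tf", "age"), Seq("allFeatures")),
  Stage("hashingTF", Seq("words"), Seq("tf")),
  Stage("tokenizer", Seq("text"), Seq("words"))
)
val ordered = topoOrder(stages).getOrElse(sys.error("cycle in column wiring"))
println(ordered.map(_.name).mkString(" -> ")) // tokenizer -> hashingTF -> assembler -> model
println(isTopological(stages, Set("text", "age")))  // false
println(isTopological(ordered, Set("text", "age"))) // true
```

In a real application the ordered stages would be what you pass to `Pipeline.setStages`; the sketch only reasons about column names and does not check column types or schemas.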

Re: Welcoming Yanbo Liang as a committer

2016-06-12 Thread Joseph Bradley
Congrats & welcome! On Tue, Jun 7, 2016 at 7:15 AM, Xiangrui Meng wrote: Congrats!! On Mon, Jun 6, 2016, 8:12 AM Gayathri Murali wrote: Congratulations Yanbo Liang! Well deserved. On Sun, Jun 5, 2016 at 7:10 PM,

Re: Shrinking the DataFrame lineage

2016-06-12 Thread Joseph Bradley
Sorry for the slow response. I agree with Hamel on #1. GraphFrames are mostly wrappers for GraphX algorithms. There are a few which are not: * BFS: This is an iterative DataFrame alg. Though it has unit tests, I have not pushed its scaling to see how far it can go. * Belief Propagation
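To make the "iterative DataFrame algorithm" point concrete, here is a minimal level-synchronous BFS sketch using plain Scala (2.13+) collections, assuming a directed edge list with `Int` vertex IDs. `bfsLevels` and its `maxIter` bound are hypothetical names, not the GraphFrames API. Each loop iteration expands the frontier by one hop; a DataFrame version would express that step as a join of the frontier with the edges, so the lineage grows by roughly one join per iteration unless it is truncated.

```scala
// Hypothetical helper: returns vertex -> hop distance from `source`,
// exploring at most `maxIter` hops.
def bfsLevels(edges: Seq[(Int, Int)], source: Int, maxIter: Int = 10): Map[Int, Int] = {
  // Adjacency list: vertex -> its out-neighbors.
  val adj: Map[Int, Seq[Int]] = edges.groupMap(_._1)(_._2)
  var dist = Map(source -> 0)
  var frontier = Set(source)
  var level = 0
  while (frontier.nonEmpty && level < maxIter) {
    level += 1
    // One hop: neighbors of the frontier that have not been visited yet.
    // (In a DataFrame version this is the per-iteration join.)
    val next = frontier.flatMap(v => adj.getOrElse(v, Seq.empty)).filterNot(dist.contains)
    dist ++= next.map(_ -> level)
    frontier = next
  }
  dist
}

println(bfsLevels(Seq(1 -> 2, 2 -> 3, 1 -> 4, 4 -> 3, 3 -> 5), source = 1))
```

The `maxIter` cap plays the same role as a bound on iterations in any iterative job: it limits both the work done and how long the lineage can grow.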