Fwd: spark graphx storage RDD memory leak

2016-04-11 Thread zhang juntao
Yes, I use version 1.6, and thanks Ted.
> Begin forwarded message:
>
> From: Robin East
> Subject: Re: spark graphx storage RDD memory leak
> Date: April 12, 2016 at 2:13:10 AM GMT+8
> To: zhang juntao
> Cc: Ted Yu ,

Re: Discuss: commit to Scala 2.10 support for Spark 2.x lifecycle

2016-04-11 Thread Mark Hamstra
Yes, some organizations do lag behind the current release, sometimes by a significant amount. That is a bug, not a feature -- and one that increases pressure toward fragmentation of the Spark community. To date, that hasn't been a significant problem, and I think that is mainly because the factors

Re: Discuss: commit to Scala 2.10 support for Spark 2.x lifecycle

2016-04-11 Thread Daniel Siegmann
On Wed, Apr 6, 2016 at 2:57 PM, Mark Hamstra wrote:
> ... My concern is that either of those options will take more resources
> than some Spark users will have available in the ~3 months remaining before
> Spark 2.0.0, which will cause fragmentation into Spark 1.x and

Re: spark graphx storage RDD memory leak

2016-04-11 Thread Robin East
This looks like https://issues.apache.org/jira/browse/SPARK-12655, fixed in 2.0.

---
Robin East
Spark GraphX in Action
Michael Malak and Robin East
Manning Publications

Re: [Streaming] textFileStream has no events shown in web UI

2016-04-11 Thread Yogesh Mahajan
Yes, this has been observed in my case also. The Input Rate is 0 even in the case of rawSocketStream. Is there a way we can enable the Input Rate for these types of streams?

Thanks,
http://www.snappydata.io/blog

On Wed, Mar 16, 2016 at 4:21 PM, Hao Ren wrote:
>

Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-04-11 Thread Ted Yu
Gentle ping: spark-1.6.1-bin-hadoop2.4.tgz from S3 is still corrupt.

On Wed, Apr 6, 2016 at 12:55 PM, Josh Rosen wrote:
> Sure, I'll take a look. Planning to do full verification in a bit.
>
> On Wed, Apr 6, 2016 at 12:54 PM Ted Yu wrote:
>>

Different maxBins value for categorical and continuous features in RandomForest implementation.

2016-04-11 Thread Rahul Tanwani
Hi, currently the RandomForest algo takes a single maxBins value to decide the number of splits to take. This sometimes causes training time to become very high when there is a single categorical column having a sufficiently large number of unique values. This single column impacts all the numeric
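To make the coupling concrete, here is a small illustrative sketch (plain Python, not Spark's actual implementation; the function names are hypothetical): tree learners typically require maxBins to be at least the arity of the largest categorical feature, so one high-cardinality categorical column forces a large bin count, which in turn inflates the number of candidate split thresholds evaluated for every continuous feature.

```python
# Illustrative sketch (hypothetical helpers, not Spark's API): one
# high-cardinality categorical column drives maxBins up for the whole model.

def required_max_bins(categorical_arities, default_max_bins=32):
    """maxBins must be >= the arity of the largest categorical feature."""
    return max([default_max_bins] + list(categorical_arities))

def continuous_split_candidates(max_bins):
    """Each continuous feature is discretized into max_bins bins,
    i.e. max_bins - 1 candidate split thresholds to evaluate."""
    return max_bins - 1

# One categorical column with 1000 distinct values alongside a small one:
bins = required_max_bins([4, 1000])
print(bins)                               # 1000
print(continuous_split_candidates(bins))  # 999 thresholds per numeric feature
```

This is why a per-feature (categorical vs. continuous) maxBins, as suggested above, would help: the numeric features could keep a small bin count independent of the categorical arity.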

Fwd: spark graphx storage RDD memory leak

2016-04-11 Thread zhang juntao
Thanks Ted for replying. These three lines can't release the param graph's cache; they only release g ( graph.mapVertices((vid, vdata) => vprog(vid, vdata, initialMsg)).cache() ). ConnectedComponents.scala: the param graph will be cached in ccGraph and won't be released in Pregel def run[VD: ClassTag, ED:
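The lifecycle being described can be mocked in a few lines of plain Python (hypothetical MockGraph class, not Spark's API): the Pregel-style loop caches each derived graph and unpersists its predecessor, but the caller's input graph, cached before the loop, is never unpersisted, so its blocks stay in storage.

```python
# Minimal mock of the caching pattern described above (hypothetical classes,
# not Spark's API). Pregel-style code releases previous *derived* graphs, but
# the original input graph cached by the caller is never unpersisted.

class MockGraph:
    storage = []  # everything currently cached, like the web UI's Storage tab

    def __init__(self, name):
        self.name = name
        self.cached = False

    def cache(self):
        self.cached = True
        MockGraph.storage.append(self)
        return self

    def unpersist(self):
        if self.cached:
            self.cached = False
            MockGraph.storage.remove(self)

    def map_vertices(self, name):
        return MockGraph(name)  # derives a new, uncached graph

def pregel_like(input_graph, iterations=3):
    g = input_graph.map_vertices("g0").cache()  # derived graph is cached...
    for i in range(iterations):
        prev = g
        g = g.map_vertices(f"g{i + 1}").cache()
        prev.unpersist()  # ...previous derived graphs are released,
                          # but input_graph itself never is.
    return g

graph = MockGraph("param graph").cache()  # caller caches the input graph
result = pregel_like(graph)
leaked = [g.name for g in MockGraph.storage]
print(leaked)  # ['param graph', 'g3'] -- the input stays cached (the "leak")
```

SPARK-12655 (linked earlier in the thread) addresses exactly this: the iterative code needs to unpersist the input it materialized once it is no longer referenced.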