It would be nice if the RDD cache() method incorporated depth information.
That is,

    void test()
    {
        JavaRDD<...> rdd = ...;
        rdd.cache();   // to depth 1: actual caching happens.
        rdd.cache();   // to depth 2: a no-op as long as the storage level is
                       // the same; otherwise, an exception.
        ...
        rdd.uncache(); // back to depth 1.
Shouldn't I be seeing N2 and N4 in the output below? (Spark 0.9.0 REPL) Or am I
missing something fundamental?

    val nodes = sc.parallelize(Array((1L, "N1"), (2L, "N2"), (3L, "N3"),
      (4L, "N4"), (5L, "N5")))
    val edges = sc.parallelize(Array(Edge(1L, 2L, "E1"), Edge(1L, 3L, "E2"),
      Edge(2L, 4L, "E3"), Edge(3L,
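For reference, here is roughly how the complete snippet might look in the 0.9 shell; since the message is cut off, the quoting of the attribute strings and the final edges are my guesses. Collecting the triplets is one quick way to see which vertex attributes actually come through:

    import org.apache.spark.graphx.{Edge, Graph}

    // Hypothetical completion of the truncated example; sc already exists in the REPL.
    val nodes = sc.parallelize(Array(
      (1L, "N1"), (2L, "N2"), (3L, "N3"), (4L, "N4"), (5L, "N5")))
    val edges = sc.parallelize(Array(
      Edge(1L, 2L, "E1"), Edge(1L, 3L, "E2"),
      Edge(2L, 4L, "E3"), Edge(3L, 5L, "E4")))  // the last edge is made up

    val graph = Graph(nodes, edges)
    graph.triplets.collect().foreach { t =>
      println(t.srcAttr + " -> " + t.dstAttr + " (" + t.attr + ")")
    }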
This is a pretty cool idea — instead of cache depth I’d call it something like
reference counting. Would you mind opening a JIRA issue about it?
How to compose libraries that use RDDs together nicely hasn't been fully
explored, but this is certainly one thing that would help with it.
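For what it's worth, here is a minimal sketch of how the reference-counting behaviour could be approximated in user code today, using only the existing cache()/unpersist() calls; the RefCountedCache helper and its method names are hypothetical, not Spark API:

    import org.apache.spark.rdd.RDD
    import scala.collection.mutable

    object RefCountedCache {
      private val counts = mutable.Map.empty[Int, Int]  // RDD id -> reference count

      def retain[T](rdd: RDD[T]): RDD[T] = synchronized {
        val c = counts.getOrElse(rdd.id, 0)
        if (c == 0) rdd.cache()        // first retain triggers the actual caching
        counts(rdd.id) = c + 1         // later retains only bump the count
        rdd
      }

      def release[T](rdd: RDD[T]): Unit = synchronized {
        counts.get(rdd.id) match {
          case Some(1) => counts -= rdd.id; rdd.unpersist()  // last release really uncaches
          case Some(c) => counts(rdd.id) = c - 1
          case None    => throw new IllegalStateException("RDD " + rdd.id + " was never retained")
        }
      }
    }

A library could then call RefCountedCache.retain(rdd) where it would otherwise call rdd.cache(), and release(rdd) when it is done, without uncaching an RDD that another component still needs.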
Take a look at this one: https://issues.apache.org/jira/browse/SPARK-1188
It was an optimization that caused some user inconvenience; we got rid of it
in Spark 1.0.
On Wed, May 28, 2014 at 11:48 PM, Michael Malak michaelma...@yahoo.com wrote:
Shouldn't I be seeing N2 and N4 in the output
Opened a JIRA issue. (https://issues.apache.org/jira/browse/SPARK-1962)
Thanks.
-Original Message-
From: Matei Zaharia [mailto:matei.zaha...@gmail.com]
Sent: Thursday, May 29, 2014 3:54 PM
To: dev@spark.apache.org
Subject: Re: Suggestion: RDD cache depth
This is a pretty cool idea -
Xiangrui, Christopher,
Thanks for responding. I'll go through the code in detail to evaluate whether
the loss function used is suitable for our dataset. I'll also go through the
referenced paper, since I was unaware of the underlying theory. Thanks again.
-Bharath
On Thu, May 29, 2014 at 8:16 AM,
The instructions are at
http://spark.apache.org/docs/0.9.0/spark-standalone.html#launching-applications-inside-the-cluster
or
http://spark.apache.org/docs/0.9.1/spark-standalone.html#launching-applications-inside-the-cluster
The original command is:
./bin/spark-class
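If I remember the 0.9 standalone docs correctly, the command on that page takes roughly this form (treat this as a recollection and verify against the linked page):

    ./bin/spark-class org.apache.spark.deploy.Client launch \
      [client-options] <cluster-url> <application-jar-url> <main-class> \
      [application-options]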
I do see the issue with centering sparse data. Actually, the centering is less
important than the scaling by the standard deviation. Not having unit
variance causes convergence issues and long runtimes.
Will RowMatrix compute the variance of a column?
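If it helps, column variances can be computed without any MLlib support in a single pass over an RDD of rows; here is a small sketch with made-up data (and, if I remember right, the 1.0 MLlib RowMatrix also gained a computeColumnSummaryStatistics() that reports per-column variance, but please double-check that against the 1.0 API docs):

    // In the spark-shell, sc already exists; each row is one feature vector.
    val rows = sc.parallelize(Seq(Array(1.0, 10.0), Array(2.0, 20.0), Array(3.0, 30.0)))
    val n = rows.count().toDouble

    // One pass: per-column sum and sum of squares.
    val (sums, sumSqs) = rows
      .map(r => (r, r.map(x => x * x)))
      .reduce { (a, b) =>
        (a._1.zip(b._1).map { case (x, y) => x + y },
         a._2.zip(b._2).map { case (x, y) => x + y })
      }

    val means = sums.map(_ / n)
    // Population variance per column: E[x^2] - (E[x])^2
    val variances = sumSqs.zip(means).map { case (sq, m) => sq / n - m * m }

With the variances in hand, each column can be divided by its standard deviation to get the unit variance mentioned above, without ever materializing a centered (and therefore dense) copy of sparse data.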
Can anyone verify which RC "[SPARK-1360] Add Timestamp Support for SQL" (#275,
https://github.com/apache/spark/pull/275) is included in? I am running
rc3, but I am getting errors with TIMESTAMP as a datatype in my Hive tables
when trying to use them in PySpark.
The error I get:
14/05/29 15:44:47
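For anyone trying to reproduce this from the Scala shell (the PySpark path goes through the same analyzer on the JVM side), a minimal sketch; the table and column names are made up:

    import org.apache.spark.sql.hive.HiveContext

    val hc = new HiveContext(sc)
    // Hypothetical table whose schema contains a TIMESTAMP column.
    hc.hql("CREATE TABLE IF NOT EXISTS events (id INT, ts TIMESTAMP)")
    // Selecting the TIMESTAMP column exercises the support added by SPARK-1360.
    val rows = hc.hql("SELECT id, ts FROM events")
    rows.collect().foreach(println)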
I can confirm that the commit is included in the 1.0.0 release candidates
(it was committed before branch-1.0 split off from master), but I can't
confirm that it works in PySpark. Generally the Python and Java interfaces
lag a little behind the Scala interface to Spark, but we're working to keep
them in sync.
+1
I spun up a few EC2 clusters and ran my normal audit checks. Tests
passing, sigs, CHANGES and NOTICE look good
Thanks TD for helping cut this RC!
On Wed, May 28, 2014 at 9:38 PM, Kevin Markey kevin.mar...@oracle.com wrote:
+1
Built -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0
Ran current
Thanks for reporting this!
https://issues.apache.org/jira/browse/SPARK-1964
https://github.com/apache/spark/pull/913
If you could test out that PR and see if it fixes your problems I'd really
appreciate it!
Michael
On Thu, May 29, 2014 at 9:09 AM, Andrew Ash and...@andrewash.com wrote:
I
Yes, I get the same error:
scala> val hc = new org.apache.spark.sql.hive.HiveContext(sc)
14/05/29 16:53:40 INFO deprecation: mapred.input.dir.recursive is
deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/05/29 16:53:40 INFO deprecation: mapred.max.split.size is
Michael,
Will I have to rebuild after adding the change? Thanks
Darn, I was hoping just to sneak it in that file. I am not the only person
working on the cluster; if I rebuild it, that means I have to redeploy
everything to all the nodes as well. So I cannot do that ... today. If
someone else doesn't beat me to it, I can rebuild at another time.
-
Yes, you'll need to download the code from that PR and reassemble Spark
(sbt/sbt assembly).
On Thu, May 29, 2014 at 10:02 AM, dataginjaninja
rickett.stepha...@gmail.com wrote:
Michael,
Will I have to rebuild after adding the change? Thanks
You should be able to get away with only doing it locally. This bug happens
during analysis, which only occurs on the driver.
On Thu, May 29, 2014 at 10:17 AM, dataginjaninja
rickett.stepha...@gmail.com wrote:
Darn, I was hoping just to sneak it in that file. I am not the only person
[tl;dr: stable APIs are important - sorry, this is slightly meandering]
Hey - just wanted to chime in on this as I was travelling. Sean, you
bring up great points here about the velocity and stability of Spark.
Many projects have fairly customized semantics around what versions
actually mean
Hello everyone,
The vote on Spark 1.0.0 RC11 passes with 13 +1 votes, one 0 vote and no
-1 vote.
Thanks to everyone who tested the RC and voted. Here are the totals:
+1: (13 votes)
Matei Zaharia*
Mark Hamstra*
Holden Karau
Nick Pentreath*
Will Benton
Henry Saputra
Sean McNamara*
Xiangrui Meng*
Let me put in my +1 as well!
This voting is now closed, and it successfully passes with 13 +1
votes and one 0 vote.
Yup, congrats all. The most impressive thing is the number of contributors to
this release — with over 100 contributors, it’s becoming hard to even write the
credits. Look forward to the Apache press release tomorrow.
Matei
On May 29, 2014, at 1:33 PM, Patrick Wendell pwend...@gmail.com wrote:
Yes great work all. Special thanks to Patrick (and TD) for excellent
leadership!
On May 29, 2014 5:39 PM, Usman Ghani us...@platfora.com wrote:
Congrats everyone. Really pumped about this.
On Thu, May 29, 2014 at 2:57 PM, Henry Saputra henry.sapu...@gmail.com
wrote:
Congrats guys! Another
Hi Spark developers, I am using Shark/Spark and I am puzzled by a question
that I cannot find any information about on the web, so I am asking you.
1. How does Spark partition data in memory when creating a table with
create table a tblproperties(shark.cache=memory) as select * from table b
, in another