In the shell you could do:
val ssc = new StreamingContext(sc, Seconds(1))
as sc is the SparkContext, which is already instantiated.
Thanks
Best Regards
On Sun, Dec 28, 2014 at 6:55 AM, Thomas Frisk tfris...@gmail.com wrote:
Yes you are right - thanks for that :)
On 27 December 2014 at
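Spelled out, the shell version would look something like this sketch (the
socket source and port are assumptions, just to show the flow):

import org.apache.spark.streaming.{Seconds, StreamingContext}

// In spark-shell, sc already exists, so build the streaming context from it.
val ssc = new StreamingContext(sc, Seconds(1))
val lines = ssc.socketTextStream("localhost", 9999) // hypothetical source
lines.print()  // print a few elements of each batch
ssc.start()
ssc.awaitTermination()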
Something like?
val a = myRDD.mapPartitions { p =>
  // Do the init
  // Perform some operations
  // Shut it down?
}
Thanks
Best Regards
On Sun, Dec 28, 2014 at 1:53 AM, Kevin Burton bur...@spinn3r.com wrote:
I have a job where I want to map over
Hi Zigen,
Looks like they missed it.
Thanks
Best Regards
On Sat, Dec 27, 2014 at 12:43 PM, Zigen Zigen dbviewer.zi...@gmail.com
wrote:
Hello, I am zigen.
I am using Spark SQL 1.1.0.
I want to use Spark SQL 1.2.0,
but my Spark application gets a compile error.
Spark 1.1.0 had a
Hi Nicholas,
The RDD contains only one Iterable[Int].
Pankaj,
I used collect and I am getting items: Array[Iterable[Int]].
Then I did:
val check = items.take(1).contains(item)
I am getting check: Boolean = false, but the item is present.
Thanks
Amit
On Sun, Dec 28, 2014
Hi Sean,
I have an RDD like:
theItems: org.apache.spark.rdd.RDD[Iterable[Int]]
I did:
val items = theItems.collect // to get it as an array
items: Array[Iterable[Int]]
val check = items.contains(item)
Thanks
Amit
On Sun, Dec 28, 2014 at 1:58 PM, Amit Behera amit.bd...@gmail.com wrote:
Hi Nicholas,
I am getting
error: value contains is not a member of Iterable[Int]
On Sun, Dec 28, 2014 at 2:06 PM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:
take(1) will just give you a single item from the RDD. RDDs are not ideal
for point lookups like you are doing, but you can
You can't quite do cleanup in mapPartitions in that way. Here is a bit more
explanation (farther down):
http://blog.cloudera.com/blog/2014/09/how-to-translate-from-mapreduce-to-apache-spark/
On Dec 28, 2014 8:18 AM, Akhil Das ak...@sigmoidanalytics.com wrote:
Something like?
val a =
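For reference, the pattern described in that post looks roughly like the
sketch below; createConnection and process are hypothetical helpers, and the
point is that cleanup runs only once the partition's iterator is exhausted:

val result = myRDD.mapPartitions { iter =>
  if (iter.isEmpty) {
    iter
  } else {
    val conn = createConnection() // per-partition setup (hypothetical)
    iter.map { item =>
      val out = process(conn, item) // per-element work (hypothetical)
      // mapPartitions returns a lazy iterator, so closing the connection
      // before the iterator is consumed would be too early; close it when
      // the last element is produced instead.
      if (!iter.hasNext) conn.close()
      out
    }
  }
}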
Try instead i.exists(_ == target)
On Dec 28, 2014 8:46 AM, Amit Behera amit.bd...@gmail.com wrote:
Hi Nicholas,
I am getting
error: value contains is not a member of Iterable[Int]
On Sun, Dec 28, 2014 at 2:06 PM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:
take(1) will just give
Hi Sean and Nicholas
Thank you very much, the exists method works here :)
On Sun, Dec 28, 2014 at 2:27 PM, Sean Owen so...@cloudera.com wrote:
Try instead i.exists(_ == target)
On Dec 28, 2014 8:46 AM, Amit Behera amit.bd...@gmail.com wrote:
Hi Nicholas,
I am getting
error: value contains
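For anyone finding this thread later, a minimal self-contained version of the
fix, runnable in spark-shell (the data is made up for illustration):

val theItems = sc.parallelize(Seq[Iterable[Int]](Seq(1, 2, 3), Seq(4, 5)))
val item = 5
// Iterable[Int] has no contains method, but exists with an equality
// predicate performs the same membership test:
val check = theItems.collect().exists(_.exists(_ == item))
// check: Boolean = true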
Hi Experts,
I am confused about the input parameters of GenSort.scala and have encountered
some strange issues.
It requires 3 parameters: [num-parts] [records-per-part] [output-path].
As in Hadoop, I think each row (or record) of the sort file is 100 bytes. So if
I want to
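Assuming 100-byte records as in Hadoop, the sizing arithmetic would go
something like this sketch (the 10 GB target and 100 parts are made-up
numbers):

val targetBytes = 10L * 1024 * 1024 * 1024   // 10 GB of sort data
val recordSize = 100L                        // assumed bytes per record
val numParts = 100
val totalRecords = targetBytes / recordSize  // 107374182 records
val recordsPerPart = totalRecords / numParts // 1073741 records per part
// then run GenSort with: [num-parts] [records-per-part] [output-path]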
A follow-up to the blog cited below was hinted at, per "But Wait,
There's More ... To keep this post brief, the remainder will be left to
a follow-up post."
Is this follow-up pending? Is it sort of pending? Did the follow-up
happen, but I just couldn't find it on the web?
Regards, Ray.
On Sun,
Hi, please find the complete error:
Downloading:
org/apache/spark/spark-core_2.10/1.2.0/spark-core_2.10-1.2.0.pom
Error: Resolve error
Thanks Cody.
I reported the core dump as an issue on the Spark JIRA and a developer
diagnosed it as an OpenJDK issue.
So I switched over to Oracle Java 8 and... no more core dumps on the
examples. I reported the OpenJDK issue at the IcedTea Bugzilla.
Looks like I'm off and running with Spark on
What do you mean when you say the overhead of spark shuffles start to
accumulate? Could you elaborate more?
In newer versions of Spark shuffle data is cleaned up automatically
when an RDD goes out of scope. It is safe to remove shuffle data at
this point because the RDD can no longer be
Hi there,
I have a cluster with CDH5.1 running on top of Red Hat 6.5, where the default
Python version is 2.6. I am trying to set up a proper IPython notebook
environment to develop Spark applications using PySpark.
Here
(Still pending, but believe it's in progress and being written by a
colleague here.)
On Sun, Dec 28, 2014 at 2:41 PM, Ray Melton rtmel...@gmail.com wrote:
A follow-up to the blog cited below was hinted at, per "But Wait,
There's More ... To keep this post brief, the remainder will be left to
a
http://search.maven.org/#artifactdetails%7Corg.apache.spark%7Cspark-core_2.10%7C1.2.0%7Cjar
That checksum is correct for this file, and is what Maven Central
reports. I wonder if your repo is corrupted. Try deleting everything
under ~/.m2/repository that's related to Spark and let it download
Hi Patrick - is that cleanup present in 1.1?
The overhead I am talking about is with regards to what I believe is
shuffle related metadata. If I watch the execution log I see small
broadcast variables created for every stage of execution, a few KB at a
time, and a certain number of MB remaining
Greetings!
Thanks for the comment.
I have tried several variants of this, as indicated.
The code works on small sets, but fails on larger sets. However, I don't get
memory errors. I see java.nio.channels.CancelledKeyException and things about
lost task, and then things like Resubmitting stage 1, and
One value is at least 12 + 4 + 4 + 12 + 4 = 36 bytes if you factor in
object overhead, if my math is right. 60M of them is about 2.1GB for a
single key. I could imagine that blowing up an executor that's trying
to have one in memory and deserialize another. You won't want to use
groupByKey if the
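Sanity-checking that estimate in the shell:

val bytesPerValue = 12 + 4 + 4 + 12 + 4           // 36 bytes with object overhead
val totalBytes = bytesPerValue.toLong * 60000000L // 2,160,000,000 bytes
// roughly 2.1 GB for a single key's values, matching the estimate above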
Along with Xiangrui’s suggestion, we will soon be adding an implementation of
Streaming Logistic Regression, which will be similar to the current version of
Streaming Linear Regression, and continually update the model as new data
arrive (JIRA). Hopefully this will be in v1.3.
— Jeremy
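For context, the existing Streaming Linear Regression that the logistic
version would mirror is used roughly like this (a sketch; trainingStream and
testStream are assumed DStreams, as in the MLlib streaming examples):

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD

val numFeatures = 3 // assumption for illustration
val model = new StreamingLinearRegressionWithSGD()
  .setInitialWeights(Vectors.zeros(numFeatures))
model.trainOn(trainingStream)       // weights update continually as data arrive
model.predictOn(testStream).print() // predictions on the incoming stream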
Hi Fernando,
There’s currently no streaming ALS in Spark. I’m exploring a streaming singular
value decomposition (JIRA) based on this paper
(http://www.stat.osu.edu/~dmsl/thinSVDtracking.pdf), which might be one way to
think about it.
There has also been some cool recent work explicitly on
Yes, I had tried that too. I took the pre-built Spark 1.1 release. If there
are changes coming to the GraphX library, just let me know, or I can try it
on Spark 1.2.
--Harihar
Hey Eric,
I'm just curious - which specific features in 1.2 do you find most
help with usability? This is a theme we're focusing on for 1.3 as
well, so it's helpful to hear what makes a difference.
- Patrick
On Sun, Dec 28, 2014 at 1:36 AM, Eric Friedman
eric.d.fried...@gmail.com wrote:
Hi
Hi,
I'm trying to use sampling with Spark Streaming. I imported the following:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._
I then call sample
val streamtoread = KafkaUtils.createStream(ssc, zkQuorum, group,
The method you're referring to is a method of RDD, not DStream. If you
want to do something with a sample of each RDD in the DStream, then
call
streamtoread.foreachRDD { rdd =>
val sampled = rdd.sample(...)
...
}
On Sun, Dec 28, 2014 at 10:44 PM, Josh J joshjd...@gmail.com wrote:
Hi,
I'm
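Filling in Sean's sketch with concrete (assumed) arguments:

streamtoread.foreachRDD { rdd =>
  // sample ~10% of each micro-batch without replacement; the fraction is
  // an assumption for illustration
  val sampled = rdd.sample(withReplacement = false, fraction = 0.1)
  sampled.take(5).foreach(println)
}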
I found the TF-IDF feature extraction and all the MLlib code that works with
pure Vector RDDs very difficult to work with, due to the inability to
associate a vector back to the original data. Why can't Spark MLlib support
LabeledPoint?
Can you show how to do IDF transform on tfWithId? Thanks.
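One common workaround (a sketch, not an official MLlib API for this): keep an
ID alongside each document and zip it back after the transforms, which works
because each step is a one-to-one, order-preserving map. The data here is
made up:

import org.apache.spark.mllib.feature.{HashingTF, IDF}

val docs = sc.parallelize(Seq(
  (1L, Seq("spark", "rdd")),
  (2L, Seq("tf", "idf", "spark"))
))
val tf = new HashingTF().transform(docs.map(_._2))
tf.cache() // both fit and transform traverse tf
val tfidf = new IDF().fit(tf).transform(tf)
val tfidfWithId = docs.map(_._1).zip(tfidf) // RDD[(Long, Vector)]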
Hey,
I saw this commit go by, and find it fairly fascinating:
https://github.com/apache/spark/commit/c233ab3d8d75a33495298964fe73dbf7dd8fe305
For two reasons: 1) we have a report that is bogging down exactly in
a .join with lots of elements, so I'm glad to see the fix, but, more
interestingly, I
Hi Patrick,
I don't mean to be glib, but the fact that it works at all on my cluster (600
nodes) and data is a novel experience. This is the first release that I haven't
had to struggle with and then give up entirely. I can, for example, finally
use HiveContext from PySpark on CDH, at least
Hello Everyone,
Thank you for the time and the help :).
My goal here is to get this program working:
https://github.com/apache/spark/blob/master/examples/scala-2.10/src/main/java/org/apache/spark/examples/streaming/JavaKafkaWordCount.java
The only lines I do not have from the example are lines
Make sure you verify the following:
- Scala version: I think the correct version would be 2.10.x
- Spark master URL: Be sure that you copied the one displayed in the web UI's
top left corner (running on port 8080)
Thanks
Best Regards
On Mon, Dec 29, 2014 at 12:26 PM, suhshekar52
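On the second point, a sketch of what the wiring looks like; the host, port,
and app name are placeholders, so copy the exact spark:// URL from the web UI:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("my-app")                  // placeholder
  .setMaster("spark://master-host:7077") // as shown on the web UI (port 8080)
val sc = new SparkContext(conf)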
Hi all
I just realized that spark-yarn artifact hasn't been published for 1.2.0
release. Any particular reason for that? I was using it in my yet another
spark-job-server project to submit jobs to a YARN cluster through
convenient REST APIs (with some success). The job server was creating
1) Could you please clarify what you mean by checking that the Scala version
is correct? In my pom.xml file it is 2.10.4 (which is the same as when I
start spark-shell).
2) The spark master URL is definitely correct as I have run other apps with
the same script that use Spark (like a word count
Just looked at the pom file that you are using; why do you have different
versions in it?
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-kafka_2.10</artifactId>
  <version>1.1.1</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
See this thread:
http://search-hadoop.com/m/JW1q5vd61V1/Spark-yarn+1.2.0subj=Re+spark+yarn_2+10+1+2+0+artifacts
Cheers
On Dec 28, 2014, at 11:13 PM, Aniket Bhatnagar aniket.bhatna...@gmail.com
wrote:
Hi all
I just realized that spark-yarn artifact hasn't been published for 1.2.0
I made both versions 1.1.1 and I got the same error. I then tried making
both 1.1.0 as that is the version of my Spark Core, but I got the same
error.
I noticed my Kafka dependency is for Scala 2.9.2, while my spark-streaming-kafka
dependency is 2.10.x... I will try changing that next, but don't