How RDD lineage works

2015-07-30 Thread bit1...@163.com
Hi, I don't have a good understanding of how RDD lineage works, so I would like to ask whether Spark provides a unit test in the code base to illustrate how RDD lineage works. If there is, what's the class name? Thanks! bit1...@163.com
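Not a unit test, but RDD.toDebugString prints the lineage chain directly; a minimal sketch, assuming an existing SparkContext sc:

    // Build a short transformation chain and print its lineage, child to parent.
    val base = sc.parallelize(1 to 100)
    val doubled = base.map(_ * 2)
    val kept = doubled.filter(_ % 3 == 0)
    println(kept.toDebugString) // each line shows an RDD and the parent it derives from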

Re: Re: How RDD lineage works

2015-07-30 Thread bit1...@163.com
Thanks TD and Zhihong for the guide. I will check it. bit1...@163.com From: Tathagata Das Date: 2015-07-31 12:27 To: Ted Yu CC: bit1...@163.com; user Subject: Re: How RDD lineage works You have to read the original Spark paper to understand how RDD lineage works. https://www.cs.berkeley.edu

Re: Re: How RDD lineage works

2015-07-30 Thread bit1...@163.com
that partition. Thus, lost data can be recovered, often quite quickly, without requiring costly replication. bit1...@163.com From: bit1...@163.com Date: 2015-07-31 13:11 To: Tathagata Das; yuzhihong CC: user Subject: Re: Re: How RDD lineage works Thanks TD and Zhihong for the guide. I

A question about spark checkpoint

2015-07-28 Thread bit1...@163.com
    val heavyOpRDD = rdd.map(squareWithHeavyOp)
    heavyOpRDD.checkpoint()
    heavyOpRDD.foreach(println)
    println("Job 0 has been finished, press ENTER to do job 1")
    readLine()
    heavyOpRDD.foreach(println)
  }
}
bit1...@163.com
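A self-contained sketch of what this snippet appears to exercise (the local master and the heavy op are assumptions); note that checkpoint() is silently a no-op unless sc.setCheckpointDir has been set first:

    import org.apache.spark.{SparkConf, SparkContext}

    object CheckpointSketch {
      // Stand-in for an expensive operation (hypothetical).
      def squareWithHeavyOp(x: Int): Int = { Thread.sleep(100); x * x }

      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("CheckpointSketch").setMaster("local[2]"))
        sc.setCheckpointDir("/tmp/spark-checkpoint") // required, or checkpoint() does nothing
        val heavyOpRDD = sc.parallelize(1 to 10).map(squareWithHeavyOp)
        heavyOpRDD.cache()          // avoids recomputing the heavy map when the checkpoint is written
        heavyOpRDD.checkpoint()
        heavyOpRDD.foreach(println) // job 0: computes the RDD, then saves it to the checkpoint dir
        heavyOpRDD.foreach(println) // job 1: served from cache/checkpoint, not recomputed
      }
    }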

What if requested cores are not satisfied

2015-07-22 Thread bit1...@163.com
with fewer cores, but I didn't get a chance to try/test it. Thanks. bit1...@163.com

Re: Re: Application jar file not found exception when submitting application

2015-07-06 Thread bit1...@163.com
Thanks Shixiong for the reply. Yes, I confirm that the file exists there; simply checked with ls -l /data/software/spark-1.3.1-bin-2.4.0/applications/pss.am.core-1.0-SNAPSHOT-shaded.jar bit1...@163.com From: Shixiong Zhu Date: 2015-07-06 18:41 To: bit1...@163.com CC: user Subject: Re

Application jar file not found exception when submitting application

2015-07-06 Thread bit1...@163.com
(DriverRunner.scala:72) bit1...@163.com

Explanation of the numbers on Spark Streaming UI

2015-06-30 Thread bit1...@163.com
and the received records are many more than the processed records; I can't understand why the total delay or scheduling delay is not obvious (5 secs) here. Can someone help explain what clues this UI gives? Thanks. bit1...@163.com

when cached RDD will unpersist its data

2015-06-23 Thread bit1...@163.com
I am kind of confused about when a cached RDD will unpersist its data. I know we can explicitly unpersist it with RDD.unpersist, but can it be unpersisted automatically by the Spark framework? Thanks. bit1...@163.com
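For reference, a minimal sketch of the explicit path, with the automatic behavior (as I understand the 1.x framework) noted in comments:

    val data = sc.parallelize(1 to 1000000).map(_ * 2)
    data.cache()    // only marks the RDD; nothing is stored yet
    data.count()    // the first action actually materializes the cached blocks
    data.unpersist(blocking = true) // explicit removal

    // Without the explicit call, cached blocks can still go away on their own:
    // they are evicted LRU-style under memory pressure, and once the RDD object
    // is garbage-collected on the driver, the ContextCleaner asynchronously
    // drops its blocks (the default reference-tracking behavior).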

How to figure out how many records received by individual receiver

2015-06-23 Thread bit1...@163.com
Hi, I am using Spark 1.3.1 and have 2 receivers. On the web UI, I can only see the total records received by these 2 receivers together, but I can't figure out the records received by each individual receiver. Not sure whether that information is shown on the UI in Spark 1.4. bit1...@163.com

What does [Stage 0: (0 + 2) / 2] mean on the console

2015-06-23 Thread bit1...@163.com
it bit1...@163.com

Re: Re: What does [Stage 0: (0 + 2) / 2] mean on the console

2015-06-23 Thread bit1...@163.com
Hi, Akhil, Thank you for the explanation! bit1...@163.com From: Akhil Das Date: 2015-06-23 16:29 To: bit1...@163.com CC: user Subject: Re: What does [Stage 0: (0 + 2) / 2] mean on the console Well, you could say that (Stage information) is an ASCII representation of the WebUI (running on port

Re: RE: Build spark application into uber jar

2015-06-19 Thread bit1...@163.com
      <spark.scope>compile</spark.scope>
    </properties>
  </profile>
  <profile>
    <id>ClusterRun</id>
    <properties>
      <spark.scope>provided</spark.scope>
    </properties>
  </profile>
bit1...@163.com From: prajod.vettiyat...@wipro.com Date: 2015-06-19 15:22 To: bit1...@163.com; ak...@sigmoidanalytics.com CC: user@spark.apache.org Subject

Re: RE: Build spark application into uber jar

2015-06-19 Thread bit1...@163.com
Sure, thanks Prajod for the detailed steps! bit1...@163.com From: prajod.vettiyat...@wipro.com Date: 2015-06-19 16:56 To: bit1...@163.com; ak...@sigmoidanalytics.com CC: user@spark.apache.org Subject: RE: RE: Build spark application into uber jar Multiple maven profiles may be the ideal way

Re: RE: Spark or Storm

2015-06-19 Thread bit1...@163.com
, then it will be at-most-once semantics? bit1...@163.com From: Haopu Wang Date: 2015-06-19 18:47 To: Enno Shioji; Tathagata Das CC: prajod.vettiyat...@wipro.com; Cody Koeninger; bit1...@163.com; Jordan Pilat; Will Briggs; Ashish Soni; ayan guha; user@spark.apache.org; Sateesh Kavuri; Spark Enthusiast

Re: RE: Build spark application into uber jar

2015-06-19 Thread bit1...@163.com
Thank you for the reply. Running the application locally means that I run it in my IDE with the master set to local[*]. When the Spark dependencies are marked as provided, I can't run it because they are missing from the classpath. So, how do you work around this? Thanks! bit1...@163.com From
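For concreteness, a sketch of the profile-driven scope switch discussed in this thread (the artifact and version below are assumptions):

    <properties>
        <spark.scope>compile</spark.scope> <!-- default: IDE/local runs keep Spark on the classpath -->
    </properties>
    <profiles>
        <profile>
            <id>ClusterRun</id>
            <properties>
                <spark.scope>provided</spark.scope> <!-- cluster builds keep Spark out of the uber jar -->
            </properties>
        </profile>
    </profiles>
    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>1.3.1</version>
            <scope>${spark.scope}</scope>
        </dependency>
    </dependencies>

With this, mvn package -PClusterRun builds the uber jar for spark-submit, while a plain IDE run keeps Spark compile-scoped.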

Re: RE: Spark or Storm

2015-06-18 Thread bit1...@163.com
. From the user end, since tasks may reprocess already-processed data, the user code should detect that some data has already been processed, e.g., by using some unique ID. Not sure if I have understood correctly. bit1...@163.com From: prajod.vettiyat...@wipro.com Date: 2015-06-18 16:56 To: jrpi

Build spark application into uber jar

2015-06-18 Thread bit1...@163.com
! bit1...@163.com

Re: Weird Problem: Task not serializable [Spark Streaming]

2015-06-08 Thread bit1...@163.com
Could someone help explain what happens that leads to the Task not serializable issue? Thanks. bit1...@163.com From: bit1...@163.com Date: 2015-06-08 19:08 To: user Subject: Weird Problem: Task not serializable [Spark Streaming] Hi, With the following simple code, I got an exception
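The offending code isn't quoted here, but the classic trigger for this exception is a closure capturing a non-serializable driver-side object; a sketch (rdd and the pool class are hypothetical):

    // Hypothetical non-serializable dependency created on the driver.
    class ConnectionPool { def send(s: String): Unit = println(s) } // does not extend Serializable

    val pool = new ConnectionPool
    // rdd.foreach(x => pool.send(x.toString))
    // ^ fails with "Task not serializable": the closure drags `pool` along to the executors.

    // Fix: create the dependency on the executor side, once per partition.
    rdd.foreachPartition { iter =>
      val localPool = new ConnectionPool // built on the executor, never shipped
      iter.foreach(x => localPool.send(x.toString))
    }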

Which class takes the place of BlockManagerWorker in Spark 1.3.1

2015-06-06 Thread bit1...@163.com
. BTW, BlockManagerMaster is still there; it makes no sense that BlockManagerWorker alone is gone. bit1...@163.com

Don't understand the numbers on the Storage UI (/storage/rdd/?id=4)

2015-06-06 Thread bit1...@163.com
, in my opinion it should be about 600M * 2. It looks like some compression happens behind the scenes, or is it something else? Thanks! bit1...@163.com

Articles related to how Spark handles Spark component (Driver, Worker, Executor, Task) failures

2015-06-05 Thread bit1...@163.com
Hi, I am looking for some articles/blogs on the topic of how Spark handles various failures, such as Driver, Worker, Executor, Task, etc. Are there some articles/blogs on this topic? Details down to the source code would be best. Thanks very much! bit1...@163.com

Don't understand scheduling jobs within an Application

2015-06-01 Thread bit1...@163.com
good response times, without waiting for the long job to finish. This mode is best for multi-user settings bit1...@163.com
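For reference, switching an application to the fair scheduler quoted above is a one-setting change; a sketch (the pool name is hypothetical):

    val conf = new SparkConf().setAppName("MultiJobApp")
      .set("spark.scheduler.mode", "FAIR")   // default is FIFO
    val sc = new SparkContext(conf)
    // Jobs submitted from different threads now share executors round-robin,
    // so short jobs get good response times while a long job is still running.
    sc.setLocalProperty("spark.scheduler.pool", "shortJobs") // optional named pool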

Re: How Broadcast variable works

2015-05-30 Thread bit1...@163.com
Can someone help take a look at my questions? Thanks. bit1...@163.com From: bit1...@163.com Date: 2015-05-29 18:57 To: user Subject: How Broadcast variable works Hi, I have a spark streaming application. SparkContext uses broadcast variables to broadcast Configuration information that each
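A minimal sketch of the pattern being asked about, assuming a SparkContext sc and an existing rdd (the config keys are made up):

    // Broadcast read-only configuration once, instead of shipping it with every closure.
    val config = Map("endpoint" -> "http://example.com", "batchSize" -> "100")
    val bcConfig = sc.broadcast(config)

    val enriched = rdd.map { record =>
      val cfg = bcConfig.value // read on the executor; fetched once per executor, not per task
      (record, cfg("batchSize"))
    }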

Re: How to use zookeeper in Spark Streaming

2015-05-24 Thread bit1...@163.com
Can someone please help me on this? bit1...@163.com From: bit1...@163.com Date: 2015-05-24 13:53 To: user Subject: How to use zookeeper in Spark Streaming Hi, In my spark streaming application, when the application starts and gets running, the Tasks running on the Worker nodes need

Re: Re: Spark streaming - textFileStream/fileStream - Get file name

2015-04-29 Thread bit1...@163.com
Correct myself: for SparkContext#wholeTextFiles, the RDD's elements are kv pairs: the key is the file path and the value is the file content. So, for SparkContext#wholeTextFiles, the RDD already carries the file information. bit1...@163.com From: Saisai Shao Date: 2015-04-29 15:50
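A sketch of the contrast being described (paths are hypothetical):

    // wholeTextFiles keeps the originating file path as the key:
    val files = sc.wholeTextFiles("hdfs:///data/input")   // RDD[(String, String)] = (path, content)
    val sizes = files.map { case (path, content) => (path, content.length) }

    // textFile, by contrast, yields bare lines with no file information:
    val lines = sc.textFile("hdfs:///data/input")          // RDD[String]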

Re: Re: Question about Memory Used and VCores Used

2015-04-29 Thread bit1...@163.com
Thanks Sandy, it is very useful! bit1...@163.com From: Sandy Ryza Date: 2015-04-29 15:24 To: bit1...@163.com CC: user Subject: Re: Question about Memory Used and VCores Used Hi, Good question. The extra memory comes from spark.yarn.executor.memoryOverhead, the space used
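Filling in the arithmetic that answer points at, as a sketch (the 7% factor and 384 MB floor are the defaults of this era; exact values vary by Spark version):

    val executorMemoryMB = 3 * 1024                                  // --executor-memory 3G
    val overheadMB = math.max(384, (0.07 * executorMemoryMB).toInt)  // spark.yarn.executor.memoryOverhead
    val perExecutorMB = executorMemoryMB + overheadMB                // 3456 MB requested per executor
    // YARN then rounds each container up to a multiple of
    // yarn.scheduler.minimum-allocation-mb and allocates one extra container
    // (with its own vcore) for the ApplicationMaster, which is why the UI's
    // "Memory Used" exceeds executor-memory * numExecutors.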

Question about Memory Used and VCores Used

2015-04-28 Thread bit1...@163.com
think the memory used should be executor-memory * numOfWorkers = 3G * 3 = 9G, and the VCores used should be executor-cores * numOfWorkers = 6. Can you please explain the result? Thanks. bit1...@163.com

Re: Re: Spark streaming - textFileStream/fileStream - Get file name

2015-04-28 Thread bit1...@163.com
Looks to me that the same thing also applies to SparkContext.textFile or SparkContext.wholeTextFiles; there is no way in the RDD to figure out which file the data came from. bit1...@163.com From: Saisai Shao Date: 2015-04-29 10:10 To: lokeshkumar CC: spark users Subject

Re: Re: Spark streaming - textFileStream/fileStream - Get file name

2015-04-28 Thread bit1...@163.com
For SparkContext#textFile, if a directory is given as the path parameter, then it will pick up the files in the directory, so the same thing will occur. bit1...@163.com From: Saisai Shao Date: 2015-04-29 10:54 To: Vadim Bichutskiy CC: bit1...@163.com; lokeshkumar; user Subject: Re: Re

Why Spark is much faster than Hadoop MapReduce even on disk

2015-04-27 Thread bit1...@163.com
Hi, I am frequently asked why Spark is much faster than Hadoop MapReduce even on disk (without the use of the memory cache). I have no convincing answer for this question; could you guys elaborate on this? Thanks!

Re: Re: Why Spark is much faster than Hadoop MapReduce even on disk

2015-04-27 Thread bit1...@163.com
Is it? I learned somewhere else that Spark is 5~10 times faster than Hadoop MapReduce. bit1...@163.com From: Ilya Ganelin Date: 2015-04-28 10:55 To: bit1...@163.com; user Subject: Re: Why Spark is much faster than Hadoop MapReduce even on disk I believe the typical answer

Re: Re: spark streaming printing no output

2015-04-15 Thread bit1...@163.com
Looks like the message is consumed by another console? (You can see messages typed on this port from another console.) bit1...@163.com From: Shushant Arora Date: 2015-04-15 17:11 To: Akhil Das CC: user@spark.apache.org Subject: Re: spark streaming printing no output When I launched spark-shell

Re: Re: About Waiting batches on the spark streaming UI

2015-04-05 Thread bit1...@163.com
Thanks Tathagata for the explanation! bit1...@163.com From: Tathagata Das Date: 2015-04-04 01:28 To: Ted Yu CC: bit1129; user Subject: Re: About Waiting batches on the spark streaming UI Maybe that should be marked as waiting as well. Will keep that in mind. We plan to update the ui soon, so

About Waiting batches on the spark streaming UI

2015-04-03 Thread bit1...@163.com
: 23 Waiting batches: 1 Received records: 0 Processed records: 0 bit1...@163.com

Re: Spark + Kafka

2015-04-01 Thread bit1...@163.com
Please make sure that you have given more cores than the number of Receivers. From: James King Date: 2015-04-01 15:21 To: user Subject: Spark + Kafka I have a simple setup/runtime of Kafka and Spark. I have a command line consumer displaying arrivals to the Kafka topic. So I know messages are being
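A minimal sketch of that core-count advice (host, group, and topic names are hypothetical):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    // A receiver occupies one core for its whole lifetime, so with a single
    // receiver at least local[2] is needed: one core to receive, one to process.
    val conf = new SparkConf().setAppName("KafkaCheck").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))
    val stream = KafkaUtils.createStream(ssc, "zkhost:2181", "my-group", Map("my-topic" -> 1))
    stream.map(_._2).print()
    ssc.start()
    ssc.awaitTermination()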

Re: Re: Explanation on the Hive in the Spark assembly

2015-03-15 Thread bit1...@163.com
Thanks Cheng for the great explanation! bit1...@163.com From: Cheng Lian Date: 2015-03-16 00:53 To: bit1...@163.com; Wang, Daoyuan; user Subject: Re: Explanation on the Hive in the Spark assembly Spark SQL supports most commonly used features of HiveQL. However, different HiveQL statements

Re: RE: Explanation on the Hive in the Spark assembly

2015-03-13 Thread bit1...@163.com
Thanks Daoyuan. What do you mean by running some native command? I never thought that Hive would run without a computing engine like Hadoop MR or Spark. Thanks. bit1...@163.com From: Wang, Daoyuan Date: 2015-03-13 16:39 To: bit1...@163.com; user Subject: RE: Explanation on the Hive

How does Spark honor data locality when allocating computing resources for an application

2015-03-13 Thread bit1...@163.com
for the application. My question is: assume that the data the application will process is spread across all the worker nodes; then is data locality lost if the above policy is used? Not sure whether I have understood correctly or have missed something. bit1...@163.com

Explanation on the Hive in the Spark assembly

2015-03-13 Thread bit1...@163.com
and Hive on Hadoop? 2. Does Hive in the Spark assembly use the Spark execution engine or the Hadoop MR engine? Thanks. bit1...@163.com

Re: Explanation on the Hive in the Spark assembly

2015-03-13 Thread bit1...@163.com
Can anyone have a look on this question? Thanks. bit1...@163.com From: bit1...@163.com Date: 2015-03-13 16:24 To: user Subject: Explanation on the Hive in the Spark assembly Hi, sparkers, I am kind of confused about hive in the spark assembly. I think hive in the spark assembly

Number of cores per executor on Spark Standalone

2015-02-27 Thread bit1...@163.com
Hi, I know that Spark on YARN has a configuration parameter (executor-cores NUM) to specify the number of cores per executor. How about Spark standalone? I can specify the total cores, but how could I know how many cores each executor will take (presuming one executor per node)? bit1...@163
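For what it's worth, a sketch of the knobs in this era's standalone scheduler, which gives an application one executor per worker and only caps the total (flag values below are made up):

    # Cap the application at 6 cores total; the standalone master spreads them
    # across workers, one executor per worker.
    spark-submit --master spark://master:7077 \
      --total-executor-cores 6 \
      --executor-memory 2G \
      myapp.jar
    # Equivalent in code: conf.set("spark.cores.max", "6")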

Re: Re: Many Receiver vs. Many threads per Receiver

2015-02-26 Thread bit1...@163.com
Sure, Thanks Tathagata! bit1...@163.com From: Tathagata Das Date: 2015-02-26 14:47 To: bit1...@163.com CC: Akhil Das; user Subject: Re: Re: Many Receiver vs. Many threads per Receiver Spark Streaming has a new Kafka direct stream, to be released as an experimental feature with 1.3. That uses
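For reference, the direct stream TD mentions looks like this in the 1.3 API (broker and topic names are made up; ssc is an existing StreamingContext):

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("my-topic"))
    // No long-running receiver: each batch computes offset ranges and reads them
    // directly from Kafka, one Kafka partition per Spark partition.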

Re: Re: Many Receiver vs. Many threads per Receiver

2015-02-24 Thread bit1...@163.com
Thanks Akhil. Not sure whether the low-level consumer will be officially supported by Spark Streaming. So far, I don't see it mentioned/documented in the Spark Streaming programming guide. bit1...@163.com From: Akhil Das Date: 2015-02-24 16:21 To: bit1...@163.com CC: user Subject: Re: Many

Re: Re: Does Spark Streaming depend on Hadoop? (4)

2015-02-23 Thread bit1...@163.com
    ( _ => KafkaUtils.createStream(ssc, zkQuorum, group, topicMap).map(_._2) )
    // repartition to 18, 3 times of the receiver
    val partitions = ssc.union(streams).repartition(18).map("DataReceived: " + _)
    partitions.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
bit1...@163.com

Re: Re: About FlumeUtils.createStream

2015-02-23 Thread bit1...@163.com
Thanks both of you guys on this! bit1...@163.com From: Akhil Das Date: 2015-02-24 12:58 To: Tathagata Das CC: user; bit1129 Subject: Re: About FlumeUtils.createStream I see, thanks for the clarification TD. On 24 Feb 2015 09:56, Tathagata Das t...@databricks.com wrote: Akhil

Re: Re: About FlumeUtils.createStream

2015-02-23 Thread bit1...@163.com
The behavior is exactly what I expected. Thanks Akhil and Tathagata! bit1...@163.com From: Akhil Das Date: 2015-02-24 13:32 To: bit1129 CC: Tathagata Das; user Subject: Re: Re: About FlumeUtils.createStream That depends on how many machines you have in your cluster. Say you have 6 workers

Re: Re: About FlumeUtils.createStream

2015-02-23 Thread bit1...@163.com
will stay on one cluster node, or will they be distributed among the cluster nodes? bit1...@163.com From: Akhil Das Date: 2015-02-24 12:58 To: Tathagata Das CC: user; bit1129 Subject: Re: About FlumeUtils.createStream I see, thanks for the clarification TD. On 24 Feb 2015 09:56, Tathagata Das t

Re: Re: Does Spark Streaming depend on Hadoop?

2015-02-23 Thread bit1...@163.com
) at org.apache.hadoop.ipc.Client.call(Client.java:1381) ... 32 more bit1...@163.com From: Ted Yu Date: 2015-02-24 10:24 To: bit1...@163.com CC: user Subject: Re: Does Spark Streaming depend on Hadoop? Can you pastebin the whole stack trace ? Thanks On Feb 23, 2015, at 6:14 PM, bit1...@163.com

Does Spark Streaming depend on Hadoop?

2015-02-23 Thread bit1...@163.com
main java.net.ConnectException: Call From hadoop.master/192.168.26.137 to hadoop.master:9000 failed on connection exception. From the exception, it tries to connect to port 9000, which is for Hadoop/HDFS, yet I don't use Hadoop at all in my code (such as saving to HDFS). bit1...@163.com

Re: Re: Re: Re: Does Spark Streaming depend on Hadoop? (4)

2015-02-23 Thread bit1...@163.com
Thanks Tathagata! You are right, I have packaged the contents of the Spark-shipped example jar into my jar, which contains several HDFS configuration files like hdfs-default.xml, etc. Thanks! bit1...@163.com From: Tathagata Das Date: 2015-02-24 12:04 To: bit1...@163.com CC: yuzhihong

Many Receiver vs. Many threads per Receiver

2015-02-23 Thread bit1...@163.com
on this. Thank bit1...@163.com

Re: Re: Spark streaming doesn't print output when working with standalone master

2015-02-20 Thread bit1...@163.com
Thanks Akhil. From: Akhil Das Date: 2015-02-20 16:29 To: bit1...@163.com CC: user Subject: Re: Re: Spark streaming doesn't print output when working with standalone master local[3] spawns 3 threads on 1 core :) Thanks Best Regards On Fri, Feb 20, 2015 at 12:50 PM, bit1...@163.com bit1

About FlumeUtils.createStream

2015-02-20 Thread bit1...@163.com
Hi, In the spark streaming application, I write the code FlumeUtils.createStream(ssc, "localhost", ...), which means Spark will listen on that port and wait for a Flume Sink to write to it. My question is: when I submit the application to the Spark Standalone cluster, will be opened only
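A sketch with a made-up port: the receiver task, not the driver, binds this host:port, so it opens on whichever worker ends up hosting the receiver (the Flume sink must point there):

    import org.apache.spark.streaming.flume.FlumeUtils

    val flumeStream = FlumeUtils.createStream(ssc, "localhost", 4141) // hypothetical port
    flumeStream.count().map(cnt => "Received " + cnt + " flume events").print()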

Re: Re: Spark streaming doesn't print output when working with standalone master

2015-02-19 Thread bit1...@163.com
only be allocated one processor. This leads me to another question: although I have only one core, I have specified the master and executor as --master local[3] --executor-memory 512M --total-executor-cores 3. Since I have only one core, why does this work? bit1...@163.com From: Akhil

Spark streaming doesn't print output when working with standalone master

2015-02-19 Thread bit1...@163.com
Hi, I am trying the spark streaming log analysis reference application provided by Databricks at https://github.com/databricks/reference-apps/tree/master/logs_analyzer When I deploy the code to the standalone cluster, there is no output at all with the following shell script. Which means, the

java.lang.StackOverflowError when doing spark sql

2015-02-19 Thread bit1...@163.com
I am using Spark 1.2.0 (prebuilt with Hadoop 2.4) on Windows 7. I found the same bug here: https://issues.apache.org/jira/browse/SPARK-4208, but it is still open; is there a workaround for this? Thanks! The stack trace: StackOverflow Exception occurs Exception in thread main
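Not from the thread, and only a possible mitigation: since the overflow comes from deep parser recursion, enlarging the driver JVM's thread stack sometimes helps (the 4m value is a guess):

    # On spark-submit: give the driver JVM a larger thread stack.
    spark-submit --driver-java-options "-Xss4m" --class spark.examples.Main myapp.jar
    # Inside an IDE, add -Xss4m to the run configuration's VM options instead.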

Re: Problem with 1 master + 2 slaves cluster

2015-02-18 Thread bit1...@163.com
But I am able to run the SparkPi example: ./run-example SparkPi 1000 --master spark://192.168.26.131:7077 Result: Pi is roughly 3.14173708 bit1...@163.com From: bit1...@163.com Date: 2015-02-18 16:29 To: user Subject: Problem with 1 master + 2 slaves cluster Hi sparkers, I set up a spark (1.2.1

Re: Re: Problem with 1 master + 2 slaves cluster

2015-02-18 Thread bit1...@163.com
Sure, thanks Akhil. A further question: is the local file system (file:///) not supported in a standalone cluster? bit1...@163.com From: Akhil Das Date: 2015-02-18 17:35 To: bit1...@163.com CC: user Subject: Re: Problem with 1 master + 2 slaves cluster Since the cluster is standalone, you

Re: Re: Question about spark streaming+Flume

2015-02-16 Thread bit1...@163.com
Hi Arush, With your code, I still didn't see the output "Received X flume events". bit1...@163.com From: bit1...@163.com Date: 2015-02-17 14:08 To: Arush Kharbanda CC: user Subject: Re: Re: Question about spark streaming+Flume Ok, you are missing a letter in foreachRDD.. let me proceed

Question about spark streaming+Flume

2015-02-16 Thread bit1...@163.com
Hi, I am trying the Spark Streaming + Flume example: 1. Code

    object SparkFlumeNGExample {
      def main(args: Array[String]) {
        val conf = new SparkConf().setAppName("SparkFlumeNGExample")
        val ssc = new StreamingContext(conf, Seconds(10))
        val lines =

Re: Re: Question about spark streaming+Flume

2015-02-16 Thread bit1...@163.com
Ok, you are missing a letter in foreachRDD.. let me proceed.. bit1...@163.com From: Arush Kharbanda Date: 2015-02-17 14:31 To: bit1...@163.com CC: user Subject: Re: Question about spark streaming+Flume Hi Can you try this val lines = FlumeUtils.createStream(ssc, "localhost", ...) // Print

Re: Re: Question about spark streaming+Flume

2015-02-16 Thread bit1...@163.com
14:31 To: bit1...@163.com CC: user Subject: Re: Question about spark streaming+Flume Hi Can you try this

    val lines = FlumeUtils.createStream(ssc, "localhost", ...)
    // Print out the count of events received from this server in each batch
    lines.count().map(cnt => "Received " + cnt + " flume events"

Re: Hi: hadoop 2.5 for spark

2015-01-30 Thread bit1...@163.com
You can use the prebuilt version that is built against Hadoop 2.4. From: Siddharth Ubale Date: 2015-01-30 15:50 To: user@spark.apache.org Subject: Hi: hadoop 2.5 for spark Hi, I am a beginner with Apache Spark. Can anyone let me know if it is mandatory to build Spark with the Hadoop version I am

Re: RE: Shuffle to HDFS

2015-01-26 Thread bit1...@163.com
I had also thought that the Hadoop mapper output is saved on HDFS, at least if the job only has a Mapper and no Reducer. If there is a reducer, then the map output will be saved on local disk? From: Shao, Saisai Date: 2015-01-26 15:23 To: Larry Liu CC:

Error occurs when running Spark SQL example

2015-01-17 Thread bit1...@163.com
When I run the following Spark SQL example within Idea, I got the StackOverflowError; it looks like the scala.util.parsing.combinator.Parsers are calling themselves recursively and infinitely. Anyone encounter this?

    package spark.examples
    import org.apache.spark.{SparkContext, SparkConf}
    import

EventBatch and SparkFlumeProtocol not found in spark codebase?

2015-01-09 Thread bit1...@163.com
Hi, When I fetch the Spark code base and import it into IntelliJ Idea as an SBT project, then build it with SBT, there are compile errors in the examples module, complaining that EventBatch and SparkFlumeProtocol are missing; it looks like they should be in the org.apache.spark.streaming.flume.sink package. Not

Re: Re: I think I am almost lost in the internals of Spark

2015-01-06 Thread bit1...@163.com
Thanks Eric. Yes..I am Chinese, :-). I will read through the articles, thank you! bit1...@163.com From: eric wong Date: 2015-01-07 10:46 To: bit1...@163.com CC: user Subject: Re: Re: I think I am almost lost in the internals of Spark A good beginning if you are Chinese. https://github.com

Re: Unable to build spark from source

2015-01-03 Thread bit1...@163.com
The error hints that the Maven artifact scala-compiler can't be fetched from repo1.maven.org. Should some repository URLs be added to Maven's settings file? bit1...@163.com From: Manoj Kumar Date: 2015-01-03 18:46 To: user Subject: Unable to build spark from source Hello, I tried

Re: sqlContext is undefined in the Spark Shell

2015-01-03 Thread bit1...@163.com
This is noise, please ignore. I figured out what happened... bit1...@163.com From: bit1...@163.com Date: 2015-01-03 19:03 To: user Subject: sqlContext is undefined in the Spark Shell Hi, In the spark shell, I do the following two things: 1. scala val cxt = new

sqlContext is undefined in the Spark Shell

2015-01-03 Thread bit1...@163.com
._ Is there something missing? I am using Spark 1.2.0. Thanks. bit1...@163.com