Cannot connect to Python process in Spark Streaming

2017-08-01 Thread canan chen
I ran the pyspark streaming example queue_streaming.py but ran into the following error. Does anyone know what might be wrong? Thanks. ERROR [2017-08-02 08:29:20,023] ({Stop-StreamingContext} Logging.scala[logError]:91) - Cannot connect to Python process. It's probably dead. Stopping

Is there any API for categorical column statistics?

2016-11-23 Thread canan chen
DataSet.describe only calculates statistics for numerical columns, not for categorical columns. R's summary method can calculate statistics for categorical data as well, which is very useful for exploratory data analysis. Just wondering whether there is any API for categorical column statistics or
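No built-in equivalent comes to mind for that Spark version; a common workaround, sketched below with a hypothetical input file and column name, is to compute the frequency table yourself:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("categorical-stats").getOrCreate()
    import spark.implicits._

    val df = spark.read.json("people.json") // hypothetical input

    // Frequency table for one categorical column, most common level first
    df.groupBy("gender").count().orderBy($"count".desc).show()

    // Number of distinct levels in the column
    println(df.select("gender").distinct().count())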

How to use a custom class in a DataSet

2016-08-29 Thread canan chen
E.g. I have a custom class A (not a case class), and I'd like to use it as DataSet[A]. I guess I need to implement an Encoder for this, but I didn't find any example of that. Is there any documentation for it? Thanks
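One route that should work (a sketch under the assumption that a generic binary encoder is acceptable, since A is not a case class; the class itself is illustrative) is Encoders.kryo:

    import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}

    class A(val id: Int, val name: String) // plain class, not a case class

    val spark = SparkSession.builder().appName("custom-encoder").getOrCreate()
    import spark.implicits._

    // Kryo-based encoder: the Dataset is stored as opaque bytes rather than typed columns
    implicit val aEncoder: Encoder[A] = Encoders.kryo[A]

    val ds: Dataset[A] = spark.createDataset(Seq(new A(1, "x"), new A(2, "y")))
    println(ds.map(_.id).count())

Encoders.javaSerialization[A] is the slower drop-in alternative if Kryo gives trouble.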

Why is the shuffle write not exactly the same as the shuffle read of the next stage?

2016-03-10 Thread canan chen
Here's my screenshot: stages 19 and 20 have a one-to-one relationship; they are each other's only child/parent. From my understanding, the shuffle write of stage 19 should be the same as the shuffle read of stage 20, but here they differ slightly. Is there any reason for that? Thanks.

Re: When does the Python program start in pyspark?

2015-10-13 Thread canan chen
> See PythonRunner @ https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala > On Tue, Oct 13, 2015 at 7:50 PM, canan chen <ccn...@gmail.com> wrote: >> I looked at the source code of Spark, but didn't find where

When does the Python program start in pyspark?

2015-10-13 Thread canan chen
I looked at the source code of Spark, but didn't find where the Python program is started in pyspark. It seems spark-submit will call PythonGatewayServer, but where is the Python program started? Thanks

Re: Cannot allocate executors when running Spark on Mesos

2015-09-09 Thread canan chen
Yes, I followed the guide in that doc and ran it in Mesos client mode. On Tue, Sep 8, 2015 at 6:31 PM, Akhil Das <ak...@sigmoidanalytics.com> wrote: > In which mode are you submitting your application? (

Cannot allocate executors when running Spark on Mesos

2015-09-08 Thread canan chen
Hi all, I am trying to run Spark on Mesos, but it looks like I cannot allocate resources from Mesos. I am no expert on Mesos, but from the Mesos log it seems Spark always declines the offers from Mesos. Not sure what's wrong; maybe some configuration change is needed. Here's the mesos master log I0908
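For reference, a hedged sketch of the kind of configuration involved (the master host and values are placeholders): offers get declined when the framework's demands can't be met, so capping spark.cores.max at something the cluster can actually offer is one thing to try:

    import org.apache.spark.{SparkConf, SparkContext}

    // Illustrative settings only; mesos-master:5050 is a placeholder
    val conf = new SparkConf()
      .setAppName("mesos-test")
      .setMaster("mesos://mesos-master:5050")
      .set("spark.mesos.coarse", "true") // coarse-grained mode, per the Spark-on-Mesos docs of that era
      .set("spark.cores.max", "4")       // don't request more cores than Mesos offers
    val sc = new SparkContext(conf)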

Re: Cannot allocate executors when running Spark on Mesos

2015-09-08 Thread canan chen
documentation already? > http://spark.apache.org/docs/latest/running-on-mesos.html#using-a-mesos-master-url > Thanks > Best Regards > On Tue, Sep 8, 2015 at 12:54 PM, canan chen <ccn...@gmail.com> wrote: >> Hi all, >> I try to run spark on mesos, but it lo

Re: Where is the doc about the Spark REST API?

2015-08-31 Thread canan chen
> (https://github.com/apache/spark/tree/master/core/src/main/scala/org/apache/spark/deploy/rest), currently I don't think there's a document addressing this part; also this REST API is only used by SparkSubmit currently, not a public API as far as I know. > Thanks > Jerry > On Mon, Aug 31, 201

Re: Where is the doc about the Spark REST API?

2015-08-31 Thread canan chen
I mean the Spark built-in REST API. On Mon, Aug 31, 2015 at 3:09 PM, Akhil Das <ak...@sigmoidanalytics.com> wrote: > Check Spark Jobserver > <https://github.com/spark-jobserver/spark-jobserver> > Thanks > Best Regards > On Mon, Aug 31, 2015 at 8:54 AM, ca

Re: What's the best practice for developing new features for Spark?

2015-08-19 Thread canan chen
http://search-hadoop.com/m/q3RTtdZv0d1btRHl/Spark+build+modulesubj=Building+Spark+Building+just+one+module+ On Aug 19, 2015, at 1:44 AM, canan chen ccn...@gmail.com wrote: I want to work on one JIRA, but it is not easy to unit test because it involves different components, especially the UI

Re: Why use spark.history.fs.logDirectory instead of spark.eventLog.dir

2015-08-19 Thread canan chen
Anyone know about this? Or am I missing something here? On Fri, Aug 7, 2015 at 4:20 PM, canan chen ccn...@gmail.com wrote: Is there any reason that the history server uses another property for the event log dir? Thanks

Re: Why use spark.history.fs.logDirectory instead of spark.eventLog.dir

2015-08-19 Thread canan chen
then the `spark.history.fs.logDirectory` will happen to point to `spark.eventLog.dir`, but the use case it provides is broader than that. -Andrew 2015-08-19 5:13 GMT-07:00 canan chen ccn...@gmail.com: Anyone know about this? Or am I missing something here? On Fri, Aug 7, 2015 at 4:20 PM

What's the best practice for developing new features for Spark?

2015-08-19 Thread canan chen
I want to work on one JIRA, but it is not easy to unit test because it involves different components, especially the UI. Building Spark is pretty slow, and I don't want to rebuild it each time I test a code change. I am wondering how other people do this. Is there any experience you can share? Thanks

Why doesn't standalone mode allow setting num-executors?

2015-08-18 Thread canan chen
--num-executors only works in yarn mode. In standalone mode, I have to set --total-executor-cores and --executor-cores instead. Isn't that less intuitive? Any reason for that?
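For what it's worth, a sketch of the standalone-mode knobs (values illustrative): the executor count falls out of the two cores settings rather than being set directly:

    import org.apache.spark.SparkConf

    // In standalone mode the executor count is derived, roughly
    // total-executor-cores / executor-cores, rather than set directly.
    val conf = new SparkConf()
      .set("spark.cores.max", "8")      // same effect as --total-executor-cores 8
      .set("spark.executor.cores", "2") // same effect as --executor-cores 2
    // => up to 4 executors of 2 cores each, subject to what the workers can offer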

Re: TestSQLContext compilation error when running SparkPi in IntelliJ?

2015-08-16 Thread canan chen
TestSQLContext, or just create a new SQLContext from a SparkContext. -Andrew 2015-08-15 20:33 GMT-07:00 canan chen ccn...@gmail.com: I am not sure about other people's Spark debugging environment (I mean for the master branch). Can anyone share their experience? On Sun, Aug 16, 2015 at 10:40 AM, canan

TestSQLContext compilation error when running SparkPi in IntelliJ?

2015-08-15 Thread canan chen
I imported the Spark source code into IntelliJ and want to run SparkPi there, but I hit the following weird compilation error. I googled it, and sbt clean doesn't work for me. I am not sure whether anyone else has met this issue too; any help is appreciated. Error:scalac: while compiling:

Re: TestSQLContext compilation error when running SparkPi in IntelliJ?

2015-08-15 Thread canan chen
I am not sure about other people's Spark debugging environment (I mean for the master branch). Can anyone share their experience? On Sun, Aug 16, 2015 at 10:40 AM, canan chen ccn...@gmail.com wrote: I imported the Spark source code into IntelliJ and want to run SparkPi there, but hit

Error when running SparkPi in IntelliJ

2015-08-11 Thread canan chen
I imported the Spark project into IntelliJ and tried to run SparkPi there, but it failed with a compilation error: Error:scalac: while compiling: /Users/werere/github/spark/sql/core/src/main/scala/org/apache/spark/sql/test/TestSQLContext.scala during phase: jvm library version:

Re: Why use spark.history.fs.logDirectory instead of spark.eventLog.dir

2015-08-10 Thread canan chen
Anyone know about this? Thanks. On Fri, Aug 7, 2015 at 4:20 PM, canan chen ccn...@gmail.com wrote: Is there any reason that the history server uses another property for the event log dir? Thanks

Why use spark.history.fs.logDirectory instead of spark.eventLog.dir

2015-08-07 Thread canan chen
Is there any reason that the history server uses another property for the event log dir? Thanks
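A sketch of why the two properties coexist (paths illustrative): spark.eventLog.dir is where each application writes its event logs, while spark.history.fs.logDirectory is where a single history server reads from; they often point at the same location but don't have to:

    import org.apache.spark.SparkConf

    // Application side: write event logs (path is a placeholder)
    val conf = new SparkConf()
      .set("spark.eventLog.enabled", "true")
      .set("spark.eventLog.dir", "hdfs://namenode:8020/spark-logs")

    // History server side, set in its own configuration (shown for comparison):
    //   spark.history.fs.logDirectory  hdfs://namenode:8020/spark-logs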

How to set the log level in spark-submit?

2015-07-29 Thread canan chen
Anyone know how to set the log level in spark-submit? Thanks
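Two approaches I'm aware of for that era, sketched below: set the level programmatically (sc.setLogLevel arrived around Spark 1.4), or hand the JVMs a custom log4j configuration:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("log-level-demo"))
    sc.setLogLevel("WARN") // valid levels include ALL, DEBUG, INFO, WARN, ERROR, OFF

    // Alternatively, point spark-submit at a log4j.properties file, e.g.
    //   --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties"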

Re: How to set the log level in spark-submit?

2015-07-29 Thread canan chen
On Wednesday, July 29, 2015, canan chen ccn...@gmail.com wrote: Anyone know how to set the log level in spark-submit? Thanks

Re: RDD saveAsTextFile() to local disk

2015-07-08 Thread canan chen
It works for me using the following code. Could you share yours? val data = sc.parallelize(List(1, 2, 3)); data.saveAsTextFile("file:///Users/chen/Temp/c") On Thu, Jul 9, 2015 at 4:05 AM, spok20nn vijaypawnar...@gmail.com wrote: Getting exception when writing RDD to local disk using

What does RDD lineage refer to?

2015-07-08 Thread canan chen
Lots of places refer to RDD lineage; I'd like to know what it refers to exactly. My understanding is that it means the RDD dependencies and the intermediate MapOutput info in MapOutputTracker. Correct me if I am wrong. Thanks
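That understanding is close; lineage usually denotes the chain of dependencies an RDD records so any lost partition can be recomputed, and it can be inspected with toDebugString. A small sketch, assuming a spark-shell sc:

    // Each transformation appends to the lineage rather than mutating data
    val data = sc.parallelize(1 to 100)
    val doubled = data.map(_ * 2)
    val filtered = doubled.filter(_ % 4 == 0)

    // Prints the dependency chain (the lineage) used for recomputation on failure
    println(filtered.toDebugString)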

Re: Spark standalone cluster - resource management

2015-06-23 Thread canan chen
Check the available resources you have (CPU cores, memory) on the master web UI. The log you see means the job can't get any resources. On Wed, Jun 24, 2015 at 5:03 AM, Nizan Grauer ni...@windward.eu wrote: I have 30G per machine. This is the first (and only) job I'm trying to submit. So

Re: map vs mapPartitions

2015-06-23 Thread canan chen
One example is when you'd like to set up a JDBC connection per partition and share that connection across the partition's records. mapPartitions is much like the mapper paradigm in MapReduce: in a MapReduce mapper, you have a setup method to do any initialization before processing the
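A minimal sketch of that pattern (the JDBC URL, table, and credentials are placeholders), assuming a spark-shell sc: open one connection per partition, reuse it for every record, and close it before handing the iterator back:

    import java.sql.DriverManager

    val rdd = sc.parallelize(Seq("a", "b", "c"))

    val updates = rdd.mapPartitions { records =>
      // One connection per partition, shared across its records
      val conn = DriverManager.getConnection("jdbc:postgresql://db-host/mydb", "user", "pass")
      val stmt = conn.prepareStatement("INSERT INTO events(value) VALUES (?)")
      // Force evaluation before closing, since the returned iterator is lazy
      val counts = records.map { r =>
        stmt.setString(1, r)
        stmt.executeUpdate()
      }.toList
      stmt.close()
      conn.close()
      counts.iterator
    }
    println(updates.count())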

Re: When to use underlying data management layer versus standalone Spark?

2015-06-23 Thread canan chen
I don't think this is the right question. Spark can be deployed on different cluster manager frameworks such as standalone, YARN, and Mesos. Spark can't run without a cluster manager framework, which means Spark depends on one. And the data management layer is the upstream

Re: Spark launching without all of the requested YARN resources

2015-06-23 Thread canan chen
Why do you want it to wait until all the resources are ready before starting? Making it start as early as possible should make it complete earlier and increase resource utilization. On Tue, Jun 23, 2015 at 10:34 PM, Arun Luthra arun.lut...@gmail.com wrote: Sometimes if my Hortonworks yarn-enabled cluster
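If waiting really is desired, the knobs I'd look at (a sketch; values illustrative) are the registered-resources gates, which delay task scheduling until some fraction of executors have registered:

    import org.apache.spark.SparkConf

    // Wait for ~100% of requested executors to register, up to 60s,
    // before scheduling any tasks
    val conf = new SparkConf()
      .set("spark.scheduler.minRegisteredResourcesRatio", "1.0")
      .set("spark.scheduler.maxRegisteredResourcesWaitingTime", "60s")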

Re: Yarn application ID for Spark job on Yarn

2015-06-23 Thread canan chen
I don't think there is YARN-related stuff to access in Spark; Spark doesn't depend on YARN. BTW, why do you want the YARN application id? On Mon, Jun 22, 2015 at 11:45 PM, roy rp...@njit.edu wrote: Hi, Is there a way to get the YARN application ID inside a Spark application when running spark
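For later reference, SparkContext does expose an application id, and on YARN it is the YARN application id; a sketch, assuming a Spark version where SparkContext.applicationId is public:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("app-id-demo"))

    // On YARN this returns the YARN application id, e.g. "application_14338...";
    // on other cluster managers it is that manager's own id format
    println(sc.applicationId)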

Will the intermediate stage be cached automatically?

2015-06-17 Thread canan chen
Here's one simple Spark example where I call RDD#count twice. The first time it invokes 2 stages, but the second one only needs 1 stage. It seems the first stage is cached. Is that true? Is there any flag to control whether the intermediate stage is cached? val data = sc.parallelize(1 to 10,
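A reconstruction of the shape of that example (a sketch only, since the snippet above is truncated): the second count can skip the first stage because the scheduler still has that stage's shuffle output, and an explicit cache() makes the reuse deliberate rather than incidental:

    val data = sc.parallelize(1 to 10, 2)
    val pairs = data.map(x => (x % 2, x)).reduceByKey(_ + _)

    pairs.count() // first run: shuffle map stage + result stage
    pairs.count() // second run: the map stage shows as skipped, since its
                  // shuffle output is still available to the scheduler

    // To control reuse explicitly, persist instead of relying on shuffle files:
    pairs.cache()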

Re: Will the intermediate stage be cached automatically?

2015-06-17 Thread canan chen
Best, Ayan. On Wed, Jun 17, 2015 at 10:21 PM, Mark Tse mark@d2l.com wrote: I think https://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence might shed some light on the behaviour you’re seeing. Mark From: canan chen [mailto:ccn...@gmail.com] Sent: June-17-15

Spark compilation issue in IntelliJ

2015-06-08 Thread canan chen
Maybe someone has asked this question before. I have this compilation issue when compiling Spark SQL. I found a couple of posts on Stack Overflow, but they didn't work for me. Does anyone have experience with this? Thanks. http://stackoverflow.com/questions/26788367/quasiquotes-in-intellij-14

Re: How many executors can I acquire in standalone mode?

2015-05-27 Thread canan chen
executors I want in the code? On Tue, May 26, 2015 at 5:57 PM, Arush Kharbanda ar...@sigmoidanalytics.com wrote: I believe you would be restricted by the number of cores you have in your cluster. Having a worker running without a core is useless. On Tue, May 26, 2015 at 3:04 PM, canan chen ccn

Re: Is the executor number fixed during the lifetime of one app?

2015-05-27 Thread canan chen
With dynamic allocation, the number of executors is not fixed; it will change dynamically according to the load. Thanks Jerry 2015-05-27 14:44 GMT+08:00 canan chen ccn...@gmail.com: It seems the executor number is fixed for standalone mode; not sure about other modes.
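A sketch of the dynamic allocation settings being referred to (values illustrative; the external shuffle service has to be enabled alongside it):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.dynamicAllocation.minExecutors", "1")
      .set("spark.dynamicAllocation.maxExecutors", "10")
      .set("spark.shuffle.service.enabled", "true") // required by dynamic allocation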

Re: How does Spark manage the memory of an executor with multiple tasks

2015-05-27 Thread canan chen
that by cooperating with the master and the driver. There is a one-to-one mapping between Executor and JVM. Sent from Samsung Mobile. Original message From: Arush Kharbanda Date: 2015/05/26 10:55 (GMT+00:00) To: canan chen Cc: Evo Eftimov, user@spark.apache.org Subject: Re: How does

Re: How does Spark manage the memory of an executor with multiple tasks

2015-05-27 Thread canan chen
From: Arush Kharbanda Date: 2015/05/26 10:55 (GMT+00:00) To: canan chen Cc: Evo Eftimov, user@spark.apache.org Subject: Re: How does spark manage the memory of executor with multiple tasks Hi Evo, Worker is the JVM and an executor runs on the JVM. And after Spark 1.4 you would

How does Spark manage the memory of an executor with multiple tasks

2015-05-26 Thread canan chen
Since Spark can run multiple tasks in one executor, I am curious how Spark manages memory across these tasks. Say one executor takes 1GB of memory; if this executor can run 10 tasks simultaneously, then each task can consume 100MB on average. Do I understand it correctly? It

Re: How does Spark manage the memory of an executor with multiple tasks

2015-05-26 Thread canan chen
etc developer From: canan chen [mailto:ccn...@gmail.com] Sent: Tuesday, May 26, 2015 9:02 AM To: user@spark.apache.org Subject: How does spark manage the memory of executor with multiple tasks Since Spark can run multiple tasks in one executor, I am curious to know how does

Re: How does Spark manage the memory of an executor with multiple tasks

2015-05-26 Thread canan chen
as there is available in the Executor, aka the JVM heap. From: canan chen [mailto:ccn...@gmail.com] Sent: Tuesday, May 26, 2015 9:30 AM To: Evo Eftimov Cc: user@spark.apache.org Subject: Re: How does spark manage the memory of executor with multiple tasks Yes, I know that one task represent

How many executors can I acquire in standalone mode?

2015-05-26 Thread canan chen
In Spark standalone mode, there will be one executor per worker. I am wondering how many executors I can acquire when I submit an app. Is it greedy mode (as many as it can acquire)?