I ran the PySpark streaming example queue_streaming.py, but ran into the
following error. Does anyone know what might be wrong? Thanks
ERROR [2017-08-02 08:29:20,023] ({Stop-StreamingContext}
Logging.scala[logError]:91) - Cannot connect to Python process. It's
probably dead. Stopping
Dataset.describe only calculates statistics for numerical columns, not for
categorical ones. R's summary method also calculates statistics for
numerical data, which is very useful for exploratory data analysis. Just
wondering, is there any API for categorical column statistics as well, or
e.g. I have a custom class A (not a case class), and I'd like to use it as
Dataset[A]. I guess I need to implement an Encoder for this, but didn't find
any example of that. Is there any documentation for it? Thanks
Here's my screenshot; stages 19 and 20 have a one-to-one relationship.
They're each other's only child/parent. From my understanding, the shuffle
write of stage 19 should be the same as the shuffle read of stage 20, but
here they differ slightly. Is there any reason for that? Thanks.
> See PythonRunner @
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala
>
> On Tue, Oct 13, 2015 at 7:50 PM, canan chen <ccn...@gmail.com> wrote:
>
I looked at the source code of Spark, but didn't find where the Python
program is started.
It seems spark-submit calls PythonGatewayServer, but where is the Python
program started?
Thanks
, 2015 at 10:39 PM, canan chen <ccn...@gmail.com> wrote:
> Yes, I followed the guide in this doc and ran it in Mesos client mode
>
> On Tue, Sep 8, 2015 at 6:31 PM, Akhil Das <ak...@sigmoidanalytics.com>
> wrote:
>
>> In which mode are you submitting your application? (
Hi all,
I try to run Spark on Mesos, but it looks like I cannot allocate resources
from Mesos. I am no expert on Mesos, but from the Mesos log it seems
Spark always declines the offers from Mesos. Not sure what's wrong; maybe
some configuration change is needed. Here's the Mesos master log
I0908
tion already?
> http://spark.apache.org/docs/latest/running-on-mesos.html#using-a-mesos-master-url
>
> Thanks
> Best Regards
>
> On Tue, Sep 8, 2015 at 12:54 PM, canan chen <ccn...@gmail.com> wrote:
>
>> Hi all,
>>
>> I try to run spark on mesos, but it lo
b.com/apache/spark/tree/master/core/src/main/scala/org/apache/spark/deploy/rest
> ), currently I don't think there's a document addressing this part; also this
> REST API is only used by SparkSubmit currently, not a public API as far as I know.
>
> Thanks
> Jerry
>
>
> On Mon, Aug 31, 201
I mean the Spark built-in REST API
On Mon, Aug 31, 2015 at 3:09 PM, Akhil Das <ak...@sigmoidanalytics.com>
wrote:
> Check Spark Jobserver
> <https://github.com/spark-jobserver/spark-jobserver>
>
> Thanks
> Best Regards
>
> On Mon, Aug 31, 2015 at 8:54 AM, ca
:
http://search-hadoop.com/m/q3RTtdZv0d1btRHl/Spark+build+modulesubj=Building+Spark+Building+just+one+module+
On Aug 19, 2015, at 1:44 AM, canan chen ccn...@gmail.com wrote:
I want to work on one JIRA, but it is not easy to unit test, because
it involves different components, especially the UI
Anyone know about this ? Or do I miss something here ?
On Fri, Aug 7, 2015 at 4:20 PM, canan chen ccn...@gmail.com wrote:
Is there any reason that the history server uses another property for the
event log dir? Thanks
then the `spark.history.fs.logDirectory`
will happen to point to `spark.eventLog.dir`, but the use case it provides
is broader than that.
-Andrew
2015-08-19 5:13 GMT-07:00 canan chen ccn...@gmail.com:
Anyone know about this ? Or do I miss something here ?
On Fri, Aug 7, 2015 at 4:20 PM
I want to work on one JIRA, but it is not easy to unit test because it
involves different components, especially the UI. Building Spark is pretty
slow, and I don't want to rebuild it each time to test my code change. I am
wondering how other people do this. Is there any experience to share? Thanks
--num-executors only works in YARN mode. In standalone mode, I have to set
--total-executor-cores and --executor-cores. Isn't this unintuitive? Any
reason for that?
Use TestSQLContext,
or just create a new SQLContext from a SparkContext.
-Andrew
2015-08-15 20:33 GMT-07:00 canan chen ccn...@gmail.com:
I am not sure about other people's Spark debugging environment (I mean for
the master branch). Can anyone share his experience?
On Sun, Aug 16, 2015 at 10:40 AM, canan
I imported the Spark source code into IntelliJ and want to run SparkPi in
IntelliJ, but hit the following weird compilation error. I googled it and
`sbt clean` doesn't work for me. I am not sure whether anyone else has hit
this issue; any help is appreciated
Error:scalac:
while compiling:
I imported the Spark project into IntelliJ and tried to run SparkPi in
IntelliJ, but failed with a compilation error:
Error:scalac:
while compiling:
/Users/werere/github/spark/sql/core/src/main/scala/org/apache/spark/sql/test/TestSQLContext.scala
during phase: jvm
library version:
Anyone know about this? Thanks
On Fri, Aug 7, 2015 at 4:20 PM, canan chen ccn...@gmail.com wrote:
Is there any reason that the history server uses another property for the
event log dir? Thanks
Anyone know how to set the log level in spark-submit? Thanks
On Wednesday, July 29, 2015, canan chen ccn...@gmail.com wrote:
Anyone know how to set the log level in spark-submit? Thanks
It works for me with the following code. Could you share your code?
val data = sc.parallelize(List(1, 2, 3))
data.saveAsTextFile("file:///Users/chen/Temp/c")
On Thu, Jul 9, 2015 at 4:05 AM, spok20nn vijaypawnar...@gmail.com wrote:
Getting an exception when writing an RDD to local disk using
Lots of places refer to RDD lineage; I'd like to know what exactly it
refers to. My understanding is that it means the RDD dependencies and the
intermediate MapOutput info in the MapOutputTracker. Correct me if I am wrong.
Thanks
Check the available resources you have (CPU cores, memory) on the master
web UI.
The log you see means the job can't get any resources.
On Wed, Jun 24, 2015 at 5:03 AM, Nizan Grauer ni...@windward.eu wrote:
I'm having 30G per machine
This is the first (and only) job I'm trying to submit. So
One example is that you'd like to set up a JDBC connection for each
partition and share this connection across the records.
mapPartitions is much like the mapper paradigm in MapReduce: in a MapReduce
mapper, you have a setup method to do any initialization before processing the
I don't think this is the correct question. Spark can be deployed on
different cluster manager frameworks like standalone, YARN, and Mesos.
Spark can't run without these cluster managers, which means Spark depends
on a cluster manager framework.
And the data management layer is the upstream
Why do you want it to wait until all the resources are ready? Making it
start as early as possible should let it complete earlier and increase
resource utilization.
On Tue, Jun 23, 2015 at 10:34 PM, Arun Luthra arun.lut...@gmail.com wrote:
Sometimes if my Hortonworks yarn-enabled cluster
I don't think there is YARN-related stuff to access in Spark; Spark doesn't
depend on YARN.
BTW, why do you want the YARN application ID?
On Mon, Jun 22, 2015 at 11:45 PM, roy rp...@njit.edu wrote:
Hi,
Is there a way to get Yarn application ID inside spark application, when
running spark
Here's one simple Spark example where I call RDD#count twice. The first
call invokes 2 stages, but the second one only needs 1. It seems the first
stage's output is cached. Is that true? Is there any flag to control whether
the intermediate stage is cached
val data = sc.parallelize(1 to 10,
.
Best
Ayan
On Wed, Jun 17, 2015 at 10:21 PM, Mark Tse mark@d2l.com wrote:
I think
https://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence
might shed some light on the behaviour you’re seeing.
Mark
*From:* canan chen [mailto:ccn...@gmail.com]
*Sent:* June-17-15
Maybe someone has asked this question before. I have this compilation issue
when compiling Spark SQL. I found a couple of posts on Stack Overflow, but
they didn't work for me. Does anyone have experience with this? Thanks
http://stackoverflow.com/questions/26788367/quasiquotes-in-intellij-14
executors I want in the code ?
On Tue, May 26, 2015 at 5:57 PM, Arush Kharbanda ar...@sigmoidanalytics.com
wrote:
I believe you would be restricted by the number of cores you have in your
cluster. Having a worker running without a core is useless.
On Tue, May 26, 2015 at 3:04 PM, canan chen ccn
dynamic allocation, the number of executors is not fixed; it will change
dynamically according to the load.
Thanks
Jerry
2015-05-27 14:44 GMT+08:00 canan chen ccn...@gmail.com:
It seems the executor number is fixed in standalone mode; I'm not sure
about other modes.
that by cooperating with the master and the driver
There is a one-to-one mapping between Executor and JVM
Sent from Samsung Mobile
Original message
From: Arush Kharbanda
Date: 2015/05/26 10:55 (GMT+00:00)
To: canan chen
Cc: Evo Eftimov, user@spark.apache.org
Subject: Re: How does spark manage the memory of executor with multiple tasks
Hi Evo,
Worker is the JVM and an executor runs on the JVM. And after Spark 1.4 you
would
Since Spark can run multiple tasks in one executor, I am curious to know
how Spark manages memory across these tasks. Say one executor takes 1GB of
memory; if this executor can run 10 tasks simultaneously, then each task
can consume 100MB on average. Do I understand it correctly? It
*From:* canan chen [mailto:ccn...@gmail.com]
*Sent:* Tuesday, May 26, 2015 9:02 AM
*To:* user@spark.apache.org
*Subject:* How does spark manage the memory of executor with multiple
tasks
Since spark can run multiple tasks in one executor, so I am curious to
know how does
as there is available in the Executor aka JVM Heap
*From:* canan chen [mailto:ccn...@gmail.com]
*Sent:* Tuesday, May 26, 2015 9:30 AM
*To:* Evo Eftimov
*Cc:* user@spark.apache.org
*Subject:* Re: How does spark manage the memory of executor with multiple
tasks
Yes, I know that one task represent
In Spark standalone mode, there is one executor per worker. I am wondering
how many executors I can acquire when I submit an app. Is it greedy mode
(as many as I can acquire)?