Spark on YARN can use the History Server by setting the configuration
spark.yarn.historyServer.address. But I can't find a similar config for
Mesos. Is the History Server supported by Spark on Mesos? Thanks.
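For reference, on YARN I just pass it at submit time, something like this (the host and port below are only placeholders):

spark-submit --master yarn-cluster \
  --conf spark.yarn.historyServer.address=history-host:18080 \
  your-app.jar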
Kelvin
Hi, I used union() before and yes it may be slow sometimes. I _guess_ your
variable 'data' is a Scala collection and compute() returns an RDD. Right?
If yes, I tried the approach below to operate on one RDD only during the
whole computation (Yes, I also saw that too many RDDs hurt performance).
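Roughly something like this (just a sketch of what I mean; computePerItem is a made-up stand-in for whatever per-element work your compute() does):

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Sketch only: instead of building one RDD per element and unioning them,
//   val result = data.map(compute).reduce(_ union _)
// parallelize the local collection once and keep everything in a single RDD.
def singleRdd(sc: SparkContext,
              data: Seq[Int],
              computePerItem: Int => Seq[Int]): RDD[Int] =
  sc.parallelize(data).flatMap(computePerItem)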
Hi Joe, you might increase spark.yarn.executor.memoryOverhead to see if it
fixes the problem. Please take a look at this report:
https://issues.apache.org/jira/browse/SPARK-4996
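Something like this, for example (the values are only placeholders; tune them for your job):

spark-submit --master yarn-cluster \
  --executor-memory 8g \
  --conf spark.yarn.executor.memoryOverhead=1024 \
  your-app.jar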
Hope this helps.
On Tue, Feb 24, 2015 at 2:05 PM, Yiannis Gkoufas johngou...@gmail.com
wrote:
No problem, Joe. There
Hi Darin, you might increase spark.yarn.executor.memoryOverhead to see if
it fixes the problem. Please take a look at this report:
https://issues.apache.org/jira/browse/SPARK-4996
On Fri, Feb 27, 2015 at 12:38 AM, Arush Kharbanda
ar...@sigmoidanalytics.com wrote:
Can you share what error you
Spark bookkeeping and anything the user does inside UDFs.
-Sandy
On Fri, Feb 20, 2015 at 11:44 AM, Kelvin Chu 2dot7kel...@gmail.com
wrote:
Hi Sandy,
I am also doing memory tuning on YARN. Just want to confirm, is it
correct to say:
spark.executor.memory
Hi,
Currently, there is only one executor per worker. There is a JIRA ticket to
relax this:
https://issues.apache.org/jira/browse/SPARK-1706
But, if you want to use more cores, maybe you can try increasing
SPARK_WORKER_INSTANCES. It increases the number of workers per machine.
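For example, in conf/spark-env.sh on each machine (the numbers are only an illustration):

export SPARK_WORKER_INSTANCES=2   # run two workers per machine
export SPARK_WORKER_CORES=8       # cores each worker can hand to executors
export SPARK_WORKER_MEMORY=16g    # memory each worker can hand to executors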
Take a look here:
Hi Sandy,
I am also doing memory tuning on YARN. Just want to confirm, is it correct
to say:
spark.executor.memory - spark.yarn.executor.memoryOverhead = the memory I
can actually use in my JVM application
If it is not, what is the correct relationship? Any other variables or
config parameters
Hi Mohammed,
Did you use --jars to specify your jdbc driver when you submitted your job?
Take a look at this link:
http://spark.apache.org/docs/1.2.0/submitting-applications.html
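For example (the jar path and class name here are just placeholders):

spark-submit --master yarn-cluster \
  --jars /path/to/your-jdbc-driver.jar \
  --class com.example.YourApp \
  your-app.jar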
Hope this helps!
Kelvin
On Thu, Feb 19, 2015 at 7:24 PM, Mohammed Guller moham...@glassbeam.com
wrote:
Hi –
I
Since the stacktrace shows Kryo is being used, maybe you could also try
increasing spark.kryoserializer.buffer.max.mb. Hope this helps.
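For example, something along these lines (512 is an arbitrary value, in MB; size it to your largest serialized object):

import org.apache.spark.SparkConf

// Example value only; spark.kryoserializer.buffer.max.mb is in megabytes.
val conf = new SparkConf().set("spark.kryoserializer.buffer.max.mb", "512")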
Kelvin
On Tue, Feb 10, 2015 at 1:26 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
You could try increasing the driver memory. Also, can you be more specific
I had a similar use case before. I found:
1. textFile() produced one partition per file. It can result in many
partitions. I found that calling coalesce() without shuffle helped.
2. If you used persist(), count() will do the I/O and put the result into
the cache. Transformations later did their computation out of the cache.
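Putting those two together, it looked roughly like this (the path and partition count are made up for illustration):

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Rough sketch only. textFile() gives one partition per input file, so
// coalesce (without shuffle) merges the many small partitions first.
def loadAndCache(sc: SparkContext): RDD[String] = {
  val lines = sc.textFile("s3n://some-bucket/input/*")
    .coalesce(200, shuffle = false)
    .persist()
  lines.count()  // forces the read and fills the cache
  lines          // later transformations compute from the cached data
}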
Hi Su,
Out of the box, no. But I know people who integrate it with Spark Streaming to
do real-time visualization. It will take some work though.
Kelvin
On Mon, Feb 9, 2015 at 5:04 PM, Su She suhsheka...@gmail.com wrote:
Hello Everyone,
I was reading this blog post:
Maybe try local: under the heading of Advanced Dependency
Management here:
https://spark.apache.org/docs/1.1.0/submitting-applications.html
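For example (the path is a placeholder; with local: the jar must already exist at that path on every node, so it isn't shipped with the job):

spark-submit --master spark://your-master:7077 \
  --jars local:/opt/libs/your-dependency.jar \
  --class com.example.YourApp \
  your-app.jar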
It seems this is what you want. Hope this helps.
Kelvin
On Sun, Feb 8, 2015 at 9:13 PM, ey-chih chow eyc...@hotmail.com wrote:
Is there any way we
Joe, I also use S3 and gzip. So far the I/O is not a problem. In my case,
the operation is SQLContext.jsonFile() and I can see from Ganglia that the
whole cluster is CPU bound (99% saturated). I have 160 cores and I can see
the network can sustain about 150MBit/s.
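For reference, the load itself is basically just this (the bucket and path are placeholders):

import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

// Placeholder path. The gzip decompression and JSON parsing happen on the
// executors, which (I think) is why the cluster ends up CPU bound rather
// than I/O bound.
def loadJson(sc: SparkContext) = {
  val sqlContext = new SQLContext(sc)
  sqlContext.jsonFile("s3n://some-bucket/events/*.json.gz")
}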
Kelvin
On Wed, Feb 4, 2015 at
Hi Andy,
It sounds great! Quick questions: I have been using IPython + PySpark. I
crunch the data by PySpark and then visualize the data by Python libraries
like matplotlib and basemap. Could I still use these Python libraries in
the Scala Notebook? If not, what are the suggested approaches for