Hi,
How do I start the Spark Thrift Server with Cloudera CDH 5.3?
Thanks.
Can someone look at my questions? Thanks again!
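For reference, stock Apache Spark starts it with a bundled script; I have not
verified the CDH 5.3 packaging, so the path and options below are only a sketch:

  ./sbin/start-thriftserver.sh --master yarn-client \
    --hiveconf hive.server2.thrift.port=10001   # hypothetical port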
From: Haopu Wang
Sent: June 12, 2016 16:40
To: user@spark.apache.org
Subject: Should I avoid "state" in an Spark application?
I have a Spark application whose structure is below:
var ts: Long = 0L
Spark is a software product. In software a "core" is something that a
process can run on. So it's a "virtual core". (Do not call these "threads".
A "thread" is not something a process can run on.)
local[*] uses java.lang.Runtime.availableProcessors()
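A quick way to see what that resolves to on a given machine (minimal sketch):

  // local[*] sizes its scheduler from the JVM's processor count:
  val n = Runtime.getRuntime.availableProcessors()
  println(s"local[*] would run tasks on $n virtual cores")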
Hi,
I was writing some docs on Spark P and came across this.
It is about the terminology, or the interpretation of it, in the Spark docs.
This is my understanding of cores and threads.
Cores are physical cores. Threads are virtual cores. A core with 2 threads
uses hyper-threading technology, so 2
Two of my faves:
https://www.amazon.com/Advanced-Analytics-Spark-Patterns-Learning/dp/1491912766/
(Cloudera authors)
https://www.amazon.com/Machine-Learning-Spark-Powerful-Algorithms/dp/1783288515/
(IBM author)
(Most) authors are Spark committers.
While not totally up to date w/ ML pipelines
What's the value of spark.version?
Do you know which version of Spark the mongodb connector 0.10.3 was built
against?
You can use the following command to find out:
mvn dependency:tree
Maybe the Spark version you use is different from what the mongodb connector
was built against.
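If in doubt, the running version can also be checked from spark-shell
(a minimal sketch):

  println(sc.version)                      // e.g. "1.6.1"
  println(org.apache.spark.SPARK_VERSION)  // same value, from the package object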
On Fri, Jun 10,
Hi Asfandyar,
*NoSuchMethodError* in Java means you compiled against one version of code
and executed against a different version.
Please make sure the dependency versions you add are built for the same Java
(and Spark) version you are running on.
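For example, a hypothetical build.sbt fragment where the versions must line
up with the cluster:

  // The Spark version (and the Scala suffix picked by %%) must match what
  // the cluster runs, or NoSuchMethodError shows up at runtime.
  scalaVersion := "2.10.6"
  libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1" % "provided"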
regards,
vaquar khan
On Fri, Jun 10, 2016 at 4:50 AM, Asfandyar
Agreed with Mich
The spark driver is the program that declares the transformations and
actions on RDDs of data and submits such requests to the master.
spark.driver.host: Hostname or IP address for the driver to listen on.
This is used for communicating with the executors and the standalone
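A hedged sketch of setting it explicitly (the IP below is hypothetical):

  // Bind the driver to a specific address so executors reach it over the
  // intended interface.
  val conf = new org.apache.spark.SparkConf()
    .setAppName("driver-host-example")
    .set("spark.driver.host", "10.0.0.5")
  val sc = new org.apache.spark.SparkContext(conf)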
Thanks Mich, great explanation.
On Saturday, 11 June 2016, 22:35, Mich Talebzadeh
wrote:
Hi Gavin,
I believe in standalone mode a simple cluster manager is included with Spark
that makes it easy to set up a cluster. It does not rely on YARN or Mesos.
In summary
Hi Sharad.
The array size you (or the serializer) tries to allocate is just too big
for the JVM.
You can also split your input further by increasing parallelism.
The following is a good explanation:
https://plumbr.eu/outofmemoryerror/requested-array-size-exceeds-vm-limit
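For example, a minimal sketch assuming an existing RDD named input:

  // More partitions -> smaller per-task serialized chunks, which helps keep
  // any single array under the JVM's hard size limit.
  val repartitioned = input.repartition(input.partitions.length * 4)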
regards,
Vaquar khan
On
Hi,
You basically want to use wired/Ethernet connections as opposed to wireless?
In your Spark Web UI, under the Environment tab, what do you get for
"spark.driver.host"?
Also, can you cat /etc/hosts and send the output please, plus the output
from ifconfig -a.
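If Spark did bind to the wireless address, one hedged workaround is to pin
the interface in conf/spark-env.sh (the address below is hypothetical):

  export SPARK_LOCAL_IP=10.0.0.5   # force binding to the Ethernet address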
HTH
Dr Mich Talebzadeh
Hi, guys
My question is about Spark Worker IP address.
I have four nodes; each node has a wireless module and an Ethernet module, so
all nodes have two IP addresses.
When I visit the web UI, the information always shows the wireless IP
address, but my Spark computing cluster is based on Ethernet.
Thank you... Please see inline.
On Sun, Jun 12, 2016 at 3:39 PM, wrote:
> Machine learning - I would suggest that you pick up a fine book that
> explains machine learning. That's the way I went about it - pick up each type
> of machine learning concept - say Linear
Looks like a bug in the code generating the SQL query… why it would be specific
to SAS, I can't guess. Did you try the same with another database? As a
workaround you can write the select statement yourself instead of just
providing the table name.
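For example, a hedged sketch (the JDBC URL and column names are hypothetical):

  // Passing a parenthesized subquery as "dbtable" bypasses Spark's own
  // SELECT generation for the table.
  val df = sqlContext.read
    .format("jdbc")
    .option("url", "jdbc:sas://host:8591/lib")               // hypothetical URL
    .option("dbtable", "(SELECT col1, col2 FROM myTable) t")
    .load()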
> On Jun 11, 2016, at 6:27 PM, Ajay Chander
Machine learning - I would suggest that you pick up a fine book that explains
machine learning. That's the way I went about it - pick up each type of machine
learning concept - say, linear regression - then understand the why/when/how,
and infer results, etc.
Then apply the learning to a small
Trying to save a word2vec model trained over 10G of data leads to the
below OOM error.
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
Spark Version: 1.6
spark.dynamicAllocation.enabled false
spark.executor.memory 75g
spark.driver.memory 150g
spark.driver.cores 10
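This "requested array size" error is a hard JVM ceiling (a single array cannot
exceed roughly Integer.MAX_VALUE elements), so more memory will not help, but
shrinking the model can. A hedged sketch with hypothetical parameter values:

  import org.apache.spark.mllib.feature.Word2Vec

  // A smaller vectorSize and a higher minCount shrink the vocabulary-by-vector
  // matrix that gets serialized when the model is saved.
  val model = new Word2Vec()
    .setVectorSize(100)   // hypothetical: smaller vectors
    .setMinCount(25)      // hypothetical: prune rare words
    .fit(tokenizedRdd)    // assumes an RDD[Seq[String]] named tokenizedRdd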
Hey,
I have some additional Spark ML algorithms implemented in Scala that I would
like to make available in PySpark. For reference, I am looking at the
available logistic regression implementation here:
https://spark.apache.org/docs/1.6.0/api/python/_modules/pyspark/ml/classification.html
I
Hi,
I have a pipeline for classification. However, before classification I want
to use a model generated earlier in a stage. How can I get a reference to this
model to use as an input to another stage? Where are the model references
generated in the pipeline held? How can I get the model by uid, etc.?
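If this is the ML Pipeline API, a minimal sketch (pipeline and trainingData
are hypothetical names):

  import org.apache.spark.ml.PipelineModel

  // After fitting, the PipelineModel holds the fitted models in stage order,
  // so an earlier stage's model can be fetched by index.
  val fitted: PipelineModel = pipeline.fit(trainingData)
  val earlyModel = fitted.stages(0)   // model produced by the first stage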
I have a Spark application whose structure is below:
var ts: Long = 0L
dstream1.foreachRDD { (x, time) =>
  ts = time
  x.do_something()...
}
..
process_data(dstream2, ts, ..)
I assume foreachRDD function call can