Broadcast variables: when should I use them?

2015-01-26 Thread frodo777
Hello.

I have a number of static Arrays and Maps in my Spark Streaming driver
program.
They are simple collections, initialized directly in the code with integer
values and strings; there is no RDD/DStream involvement here.
I do not expect any of them to contain more than 100 entries.
They are used in several subsequent parallel operations.
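
For illustration, this is roughly the situation (a minimal, made-up sketch
using a plain SparkContext instead of my real streaming code; codeToLabel
just stands in for one of the static Maps):

import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast

object Example {
  def run(sc: SparkContext): Unit = {
    // A small static lookup table defined directly in the driver (< 100 entries).
    val codeToLabel: Map[Int, String] = Map(1 -> "red", 2 -> "green", 3 -> "blue")

    // Option A: reference the Map directly; it is serialized into every task closure.
    val labelsA = sc.parallelize(Seq(1, 2, 3)).map(k => codeToLabel.getOrElse(k, "unknown"))

    // Option B: wrap it in a broadcast variable; it is shipped once per executor and cached.
    val bc: Broadcast[Map[Int, String]] = sc.broadcast(codeToLabel)
    val labelsB = sc.parallelize(Seq(1, 2, 3)).map(k => bc.value.getOrElse(k, "unknown"))

    labelsA.collect()
    labelsB.collect()
  }
}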

The question is:
Should I convert them into broadcast variables?

Thanks and regards.
-Bob






Re: Spark Standalone Cluster not correctly configured

2015-01-08 Thread frodo777
Hello everyone.

With respect to the configuration problem that I explained before: do you
have any idea what is wrong there?

The problem in a nutshell:
- When more than one master is started in the cluster, all of them schedule
independently, each thinking it is the leader.
- The ZooKeeper configuration seems to be correct: only one leader is
reported, and the remaining nodes are followers (a quick way to inspect what
the masters register in ZooKeeper is sketched below).
- The default /spark directory is used for ZooKeeper.
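
For completeness, this is a rough way to look at what (if anything) the
masters have registered under the /spark znode, using the plain ZooKeeper
Java client (sketch only; it just needs the zookeeper client jar on the
classpath, and the connection string matches my ensemble):

import org.apache.zookeeper.{WatchedEvent, Watcher, ZooKeeper}

object ZkSparkCheck {
  def main(args: Array[String]): Unit = {
    // Connect to the same ensemble the Spark masters point at.
    val zk = new ZooKeeper("bigdata1:2181,bigdata2:2181,bigdata3:2181", 10000,
      new Watcher { override def process(event: WatchedEvent): Unit = () })
    try {
      // spark.deploy.zookeeper.dir defaults to /spark; if nothing shows up here,
      // the masters are not registering with ZooKeeper for recovery at all.
      if (zk.exists("/spark", false) == null)
        println("/spark znode does not exist")
      else
        println("children of /spark: " + zk.getChildren("/spark", false))
    } finally {
      zk.close()
    }
  }
}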

Thanks a lot.
-Bob






Spark Standalone Cluster not correctly configured

2014-12-30 Thread frodo777
Hi.

I'm trying to configure a Spark standalone cluster with three master nodes
(bigdata1, bigdata2 and bigdata3), managed by ZooKeeper.

It seems there is a configuration problem, since every master claims to be
the cluster leader:

...
14/12/30 13:54:59 INFO Master: I have been elected leader! New state: ALIVE

The message above is dumped by every master I start.


ZooKeeper is configured identically on all three nodes, as follows:

dataDir=/spark


The only difference is the myid file in the /spark directory, of course.


The masters are started using the following configuration:

...
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoverymode=ZOOKEEPER \
-Dspark.deploy.zookeeper.url=bigdata1:2181,bigdata2:2181,bigdata3:2181"

I'm not setting the spark.deploy.zookeeper.dir property, since I'm using its
default value, /spark, the same directory I configured in ZooKeeper, as I
mentioned before.
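
As a sanity check on the options themselves, a tiny throwaway class like the
following could be used to print the spark.deploy.* system properties a JVM
actually receives (rough sketch; PropsCheck is just an illustrative name):

object PropsCheck {
  def main(args: Array[String]): Unit = {
    // Print every spark.deploy.* system property this JVM was started with.
    sys.props.toSeq
      .filter { case (key, _) => key.startsWith("spark.deploy") }
      .sortBy(_._1)
      .foreach { case (key, value) => println(s"$key=$value") }
  }
}

Running it as java $SPARK_DAEMON_JAVA_OPTS -cp <classes> PropsCheck should
list both spark.deploy.* properties from the export above.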

I would like to know if there is anything else I have to configure in order
to make the masters behave correctly (only one master node active at a
time).

With the current situation, I can connect workers and applications to the
whole cluster. For instance, I can connect a worker using:

spark-class org.apache.spark.deploy.worker.Worker \
  spark://bigdata1:2181,bigdata2:2181,bigdata3:2181


But the worker registers with each of the masters independently.
If I stop one of the masters, the worker keeps trying to re-register with it.
The notion of a single active master is completely lost.

Do you have any idea?

Thanks a lot.
-Bob


