Re: Missing Spark URL after starting the master

2014-03-03 Thread Mayur Rustagi
I think you have been through enough :).
Basically you have to download the spark-ec2 scripts & run them. They'll
just need your Amazon secret key & access key, start your cluster, install
everything, create security groups & give you the URL; just log in & go
ahead...
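
(For readers who haven't used it, the invocation looks roughly like the
sketch below. The key pair name, identity file, slave count and cluster
name are placeholders, and the flags are the ones documented for the
spark-ec2 script shipped in the ec2/ directory of the Spark distribution.)

  # credentials are picked up from the environment
  export AWS_ACCESS_KEY_ID=<your-access-key>
  export AWS_SECRET_ACCESS_KEY=<your-secret-key>

  # launch a cluster with 5 slaves; the script provisions the machines,
  # installs Spark and reports the master's address when it finishes
  ./spark-ec2 -k my-keypair -i ~/my-keypair.pem -s 5 launch my-spark-cluster

  # log in to the master afterwards
  ./spark-ec2 -k my-keypair -i ~/my-keypair.pem login my-spark-cluster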

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi 



On Mon, Mar 3, 2014 at 11:00 AM, Bin Wang  wrote:

> Hi there,
>
> I have a CDH cluster set up, and I tried using the Spark parcel that comes
> with Cloudera Manager, but it turned out it doesn't even have the
> run-example shell script in the bin folder. Then I removed it from the
> cluster, cloned incubator-spark onto the name node of my cluster, and built
> it from source there successfully with everything at default.
>
> I ran a few examples and everything seems to work fine in local mode.
> Now I am thinking about scaling it out to my cluster, which is what the
> "DISTRIBUTE + ACTIVATE" command does in Cloudera Manager. I want to add all
> the datanodes as slaves, and I think I should run Spark in standalone mode.
>
> Say I am trying to set up Spark in standalone mode following these
> instructions:
> https://spark.incubator.apache.org/docs/latest/spark-standalone.html
> However, it says "Once started, the master will print out a
> spark://HOST:PORT URL for itself, which you can use to connect workers to
> it, or pass as the “master” argument to SparkContext. You can also find
> this URL on the master’s web UI, which is http://localhost:8080 by
> default."
>
> After I started the master, no URL is printed to the screen, and the
> web UI is not running either.
> Here is the output:
> [root@box incubator-spark]# ./sbin/start-master.sh
> starting org.apache.spark.deploy.master.Master, logging to
> /root/bwang_spark_new/incubator-spark/sbin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-box.out
>
> First Question: am I even in the ballpark running Spark in standalone mode
> if I want to fully utilize my cluster? I saw there are four ways to launch
> Spark on a cluster - Amazon EC2, Spark standalone, Apache Mesos, Hadoop
> YARN... - and I guess standalone mode is the way to go?
>
> Second Question: how do I get the Spark URL of the cluster, and why is the
> output not like what the instructions say?
>
> Best regards,
>
> Bin
>


Re: Missing Spark URL after starting the master

2014-03-03 Thread Ognen Duzlevski
I have a standalone Spark cluster running in an Amazon VPC that I set up
by hand. All I did was provision the machines from a common AMI image
(my underlying distribution is Ubuntu), create a "sparkuser" on each
machine, and download Spark into a /home/sparkuser/spark folder. I did
the build on the master only: I ran sbt/sbt assembly and set up
conf/spark-env.sh to point to the master, which is an IP address (in my
case 10.10.0.200; the port is the default 7077). I also set up the slaves
file in the same subdirectory to list all 16 IP addresses of the worker
nodes (in my case 10.10.0.201-216). After sbt/sbt assembly was done on
the master, I did cd ~/; tar -czf spark.tgz spark/, copied the resulting
tgz file to each worker using the same "sparkuser" account, and unpacked
the .tgz on each slave (this effectively replicates everything from the
master to all slaves - you can script it so you don't do it by hand).
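
(A rough sketch of that sequence, assuming the /home/sparkuser/spark
layout and the example IPs above; SPARK_MASTER_IP and SPARK_MASTER_PORT
are the standard variables from the conf/spark-env.sh template in that
Spark release.)

  # on the master, as sparkuser
  cd ~/spark
  sbt/sbt assembly

  # point everything at the master
  cat >> conf/spark-env.sh <<'EOF'
  export SPARK_MASTER_IP=10.10.0.200
  export SPARK_MASTER_PORT=7077
  EOF

  # list the 16 workers, one IP per line
  for i in $(seq 201 216); do echo "10.10.0.$i"; done > conf/slaves

  # replicate the built tree to every worker
  cd ~ && tar -czf spark.tgz spark/
  while read host; do
    scp spark.tgz sparkuser@"$host":~/
    ssh sparkuser@"$host" 'tar -xzf spark.tgz'
  done < ~/spark/conf/slaves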


Your AMI should have the distribution's version of Java and git 
installed by the way.


All you have to do then is run sparkuser@spark-master>
spark/sbin/start-all.sh (for 0.9; in 0.8.1 it is spark/bin/start-all.sh)
and it will all automagically start :)


All my Amazon nodes come with 4x400 GB of ephemeral space, which I have
set up as a 1.6 TB RAID0 array on each node. I am pooling this into an
HDFS filesystem operated by a namenode outside the Spark cluster, while
all the datanodes are the same nodes as the Spark workers. This gives
replication and extremely fast access, since ephemeral storage is much
faster than EBS or anything else on Amazon (you can do even better with
SSD drives in this setup, but it will cost ya).
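
(If you want to reproduce the ephemeral RAID0 part, a minimal sketch
with mdadm follows; the device names vary by instance type, and the
HDFS property shown is the standard dfs.datanode.data.dir, so treat this
as an outline rather than the exact commands used in this cluster.)

  # stripe the four ephemeral disks into one array (device names are examples)
  mdadm --create /dev/md0 --level=0 --raid-devices=4 \
    /dev/xvdb /dev/xvdc /dev/xvdd /dev/xvde
  mkfs.ext4 /dev/md0
  mkdir -p /data && mount /dev/md0 /data

  # then point each datanode at it in hdfs-site.xml:
  #   <property>
  #     <name>dfs.datanode.data.dir</name>
  #     <value>/data/hdfs</value>
  #   </property>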


If anyone is interested, I can document our pipeline setup - I came up
with it myself and do not have a clue as to what the industry standards
are, since I could not find any written instructions anywhere online
about how to set up a whole data analytics pipeline from the point of
ingestion to the point of analytics (people don't want to share their
secrets? or am I just in the dark and incapable of using Google
properly?). My requirement was that I wanted this to run within a VPC
for added security and simplicity; the Amazon security groups get really
old quickly. An added bonus is that you can use a VPN as an entry into
the whole system, and your cluster instantly becomes "local" to you in
terms of IPs etc. I use OpenVPN since I don't like Cisco or Juniper (the
only two options Amazon provides for their VPN gateways).


Ognen




--
Some people, when confronted with a problem, think "I know, I'll use regular 
expressions." Now they have two problems.
-- Jamie Zawinski



Re: Missing Spark URL after starting the master

2014-03-03 Thread Ognen Duzlevski
I should add that in this setup you really do not need to look for a
printout of the master node's IP - you set it yourself a priori. If
anyone is interested, let me know and I can write it all up so that
people can follow a set of instructions. Who knows, maybe I can come up
with a set of scripts to automate it all...
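
(In other words, with the spark-env.sh example above the master URL is
simply spark://10.10.0.200:7077, known before the master is ever
started; a quick sanity check against Spark 0.9, assuming that address,
would be:)

  # the master's web UI should answer on port 8080 of the same host
  curl http://10.10.0.200:8080

  # and a shell can connect using the URL you chose yourself
  MASTER=spark://10.10.0.200:7077 ./bin/spark-shell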


Ognen



Re: Missing Spark URL after starting the master

2014-03-03 Thread Bin Wang
Hi Ognen/Mayur,

Thanks for the reply, and it is good to know how easy it is to set up
Spark on an AWS cluster.

My situation is a bit different from yours: our company already has a
cluster, and it really doesn't make much sense not to use it. That is
why I have been "going through" this. I really wish there were some
tutorials teaching you how to set up a Spark cluster on a bare-metal
CDH cluster, or... some way to tweak the CDH Spark distribution so it
is up to date.

Ognen, of course it would be very helpful if you can 'history | grep
spark...' and document the work that you have done, since you've already
made it work!

Bin




Re: Missing Spark URL after starting the master

2014-03-04 Thread Mayur Rustagi
I have it running on a Cloudera VM:
http://docs.sigmoidanalytics.com/index.php/How_to_Install_Spark_on_Cloudera_VM
Which Spark version are you trying to set up on Cloudera? Also, which
Cloudera version are you using?


Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi 




Re: Missing Spark URL after starting the master

2014-03-04 Thread Bin Wang
Hi Mayur,

I am using CDH4.6.0p0.26, and the latest Cloudera Spark parcel is Spark
0.9.0 CDH4.6.0p0.50. As I mentioned, somehow the Cloudera Spark version
doesn't contain the run-example shell scripts. However, it is
automatically configured, and it is pretty easy to set up across the
cluster...

Thanks,
Bin

