Re: how can I run spark job in my environment which is a single Ubuntu host with no hadoop installed

2018-06-17 Thread Matei Zaharia
Maybe your application is overriding the master setting when it creates its 
SparkContext. I see you are still passing “yarn-client” to it as an argument 
later in your command.
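
For illustration, here is the kind of code I mean. This is only a
hypothetical sketch of what GetRevenuePerOrder might be doing (I have not
seen its source, so reading the master from args(0) is an assumption):

    import org.apache.spark.{SparkConf, SparkContext}

    object GetRevenuePerOrder {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("Get Revenue Per Order")
          // If the code does this, args(0), the "yarn-client" in your
          // command, silently overrides the --master local[*] flag
          // that spark-submit passed in.
          .setMaster(args(0))
        val sc = new SparkContext(conf)
        // ... rest of the job ...
        sc.stop()
      }
    }

If that is what is happening, either remove the setMaster() call and let
spark-submit supply the master, or call conf.setIfMissing("spark.master",
args(0)) so that an explicit --master flag on the command line wins.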


Re: how can I run spark job in my environment which is a single Ubuntu host with no hadoop installed

2018-06-17 Thread Raymond Xie
Thank you Subhash.

Here is the new command:
spark-submit --master local[*] --class retail_db.GetRevenuePerOrder --conf spark.ui.port=12678 spark2practice_2.11-0.1.jar yarn-client /public/retail_db/order_items /home/rxie/output/revenueperorder

Still seeing the same issue here.
2018-06-17 11:51:25 INFO  RMProxy:98 - Connecting to ResourceManager at /0.0.0.0:8032
2018-06-17 11:51:27 INFO  Client:871 - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-06-17 11:51:28 INFO  Client:871 - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-06-17 11:51:29 INFO  Client:871 - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-06-17 11:51:30 INFO  Client:871 - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-06-17 11:51:31 INFO  Client:871 - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-06-17 11:51:32 INFO  Client:871 - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-06-17 11:51:33 INFO  Client:871 - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-06-17 11:51:34 INFO  Client:871 - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-06-17 11:51:35 INFO  Client:871 - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-06-17 11:51:36 INFO  Client:871 - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)



Sincerely yours,

Raymond


Re: how can I run spark job in my environment which is a single Ubuntu host with no hadoop installed

2018-06-17 Thread Subhash Sriram
Hi Raymond,

If you set your master to local[*] instead of yarn-client, it should run on 
your local machine.
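
For example, something like this (just a sketch; I am keeping your jar,
class, and arguments exactly as they were, and the quotes around local[*]
only protect the brackets from shell globbing):

    spark-submit --master "local[*]" --class retail_db.GetRevenuePerOrder \
      --conf spark.ui.port=12678 spark2practice_2.11-0.1.jar \
      yarn-client /public/retail_db/order_items /home/rxie/output/revenueperorder

One caveat: everything after the jar name is passed to your main() as
application arguments, so if the program itself reads that "yarn-client"
argument and uses it as the master, the --master flag will not take effect
until the program is changed as well.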

Thanks,
Subhash 

Sent from my iPhone


how can I run spark job in my environment which is a single Ubuntu host with no hadoop installed

2018-06-17 Thread Raymond Xie
Hello,

I am wondering how I can run a Spark job in my environment, which is a single
Ubuntu host with no Hadoop installed. If I run my job as shown below, I end
up with an infinite retry loop at the end. Thank you very much.

rxie@ubuntu:~/data$ spark-submit --class retail_db.GetRevenuePerOrder --conf spark.ui.port=12678 spark2practice_2.11-0.1.jar yarn-client /public/retail_db/order_items /home/rxie/output/revenueperorder
2018-06-17 11:19:36 WARN  Utils:66 - Your hostname, ubuntu resolves to a loopback address: 127.0.1.1; using 192.168.112.141 instead (on interface ens33)
2018-06-17 11:19:36 WARN  Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
2018-06-17 11:19:37 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-06-17 11:19:38 INFO  SparkContext:54 - Running Spark version 2.3.1
2018-06-17 11:19:38 WARN  SparkConf:66 - spark.master yarn-client is deprecated in Spark 2.0+, please instead use "yarn" with specified deploy mode.
2018-06-17 11:19:38 INFO  SparkContext:54 - Submitted application: Get Revenue Per Order
2018-06-17 11:19:38 INFO  SecurityManager:54 - Changing view acls to: rxie
2018-06-17 11:19:38 INFO  SecurityManager:54 - Changing modify acls to: rxie
2018-06-17 11:19:38 INFO  SecurityManager:54 - Changing view acls groups to:
2018-06-17 11:19:38 INFO  SecurityManager:54 - Changing modify acls groups to:
2018-06-17 11:19:38 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(rxie); groups with view permissions: Set(); users  with modify permissions: Set(rxie); groups with modify permissions: Set()
2018-06-17 11:19:39 INFO  Utils:54 - Successfully started service 'sparkDriver' on port 44709.
2018-06-17 11:19:39 INFO  SparkEnv:54 - Registering MapOutputTracker
2018-06-17 11:19:39 INFO  SparkEnv:54 - Registering BlockManagerMaster
2018-06-17 11:19:39 INFO  BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2018-06-17 11:19:39 INFO  BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2018-06-17 11:19:39 INFO  DiskBlockManager:54 - Created local directory at /tmp/blockmgr-69a8a12d-0881-4454-96ab-6a45d5c58bfe
2018-06-17 11:19:39 INFO  MemoryStore:54 - MemoryStore started with capacity 413.9 MB
2018-06-17 11:19:39 INFO  SparkEnv:54 - Registering OutputCommitCoordinator
2018-06-17 11:19:40 INFO  log:192 - Logging initialized @7035ms
2018-06-17 11:19:40 INFO  Server:346 - jetty-9.3.z-SNAPSHOT
2018-06-17 11:19:40 INFO  Server:414 - Started @7383ms
2018-06-17 11:19:40 INFO  AbstractConnector:278 - Started ServerConnector@51ad75c2{HTTP/1.1,[http/1.1]}{0.0.0.0:12678}
2018-06-17 11:19:40 INFO  Utils:54 - Successfully started service 'SparkUI' on port 12678.
2018-06-17 11:19:40 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@50b8ae8d{/jobs,null,AVAILABLE,@Spark}
2018-06-17 11:19:40 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@60afd40d{/jobs/json,null,AVAILABLE,@Spark}
2018-06-17 11:19:40 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@28a2a3e7{/jobs/job,null,AVAILABLE,@Spark}
2018-06-17 11:19:40 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@10b3df93{/jobs/job/json,null,AVAILABLE,@Spark}
2018-06-17 11:19:40 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@ea27e34{/stages,null,AVAILABLE,@Spark}
2018-06-17 11:19:40 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@33a2499c{/stages/json,null,AVAILABLE,@Spark}
2018-06-17 11:19:40 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@e72dba7{/stages/stage,null,AVAILABLE,@Spark}
2018-06-17 11:19:40 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3c321bdb{/stages/stage/json,null,AVAILABLE,@Spark}
2018-06-17 11:19:40 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@24855019{/stages/pool,null,AVAILABLE,@Spark}
2018-06-17 11:19:40 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3abd581e{/stages/pool/json,null,AVAILABLE,@Spark}
2018-06-17 11:19:40 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4d4d8fcf{/storage,null,AVAILABLE,@Spark}
2018-06-17 11:19:40 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@610db97e{/storage/json,null,AVAILABLE,@Spark}
2018-06-17 11:19:40 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6f0628de{/storage/rdd,null,AVAILABLE,@Spark}
2018-06-17 11:19:40 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3fabf088{/storage/rdd/json,null,AVAILABLE,@Spark}
2018-06-17 11:19:40 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1e392345{/environment,null,AVAILABLE,@Spark}
2018-06-17 11:19:40 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@12f3afb5{/environment/json,null,AVAILABLE,@Spark}
2018-06-17 11:19:40 INFO  ContextHandler:781 - Started