Connection to Presto via Spark

2021-01-13 Thread Vineet Mishra
Hi,

I am trying to connect to Presto via the Spark shell using the following
connection string; however, I end up with an exception:

*-bash-4.2$ spark-shell  --driver-class-path
com.facebook.presto.jdbc.PrestoDriver  --jars presto-jdbc-0.221.jar*

*scala> val presto_df = sqlContext.read.format("jdbc").option("url",
"jdbc:presto://presto-prd.url.com:8443/hive/xyz
").option("dbtable","testTable").option("driver","com.facebook.presto.jdbc.PrestoDriver").load()*
java.sql.SQLException: Unrecognized connection property 'url'
at
com.facebook.presto.jdbc.PrestoDriverUri.validateConnectionProperties(PrestoDriverUri.java:316)
at com.facebook.presto.jdbc.PrestoDriverUri.(PrestoDriverUri.java:95)
at com.facebook.presto.jdbc.PrestoDriverUri.(PrestoDriverUri.java:85)
at com.facebook.presto.jdbc.PrestoDriver.connect(PrestoDriver.java:87)
at
org.apache.spark.sql.execution.datasources.jdbc.DriverWrapper.connect(DriverWrapper.scala:45)
at
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2.apply(JdbcUtils.scala:61)
at
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2.apply(JdbcUtils.scala:52)
at
org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:120)
at
org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.(JDBCRelation.scala:91)
at
org.apache.spark.sql.execution.datasources.jdbc.DefaultSource.createRelation(DefaultSource.scala:57)
at
org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)

Upon replacing the url option with uri in the above string, I get the
following exception:

*scala> val presto_df = sqlContext.read.format("jdbc").option("uri",
"jdbc:presto://presto-prd.url.com:8443/hive/xyz
").option("dbtable","testTable").option("driver","com.facebook.presto.jdbc.PrestoDriver").load()*
 java.lang.RuntimeException: Option 'url' not specified
at scala.sys.package$.error(package.scala:27)
at
org.apache.spark.sql.execution.datasources.jdbc.DefaultSource$$anonfun$1.apply(DefaultSource.scala:33)
at
org.apache.spark.sql.execution.datasources.jdbc.DefaultSource$$anonfun$1.apply(DefaultSource.scala:33)
at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
at
org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.getOrElse(ddl.scala:150)
at
org.apache.spark.sql.execution.datasources.jdbc.DefaultSource.createRelation(DefaultSource.scala:33)
at
org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:25)
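
One way to isolate whether the failure comes from the JDBC URL itself or from
the extra options Spark forwards to the driver as connection properties (which
is what the "Unrecognized connection property 'url'" above suggests) is to open
a plain JDBC connection from the same shell. A minimal sketch, assuming the
same host/catalog/schema as above, that presto-jdbc-0.221.jar is on the driver
classpath, and placeholder user/SSL settings:

  import java.sql.DriverManager
  import java.util.Properties

  // Placeholder connection properties -- adjust user/SSL for the real cluster.
  val props = new Properties()
  props.setProperty("user", "spark_user")
  props.setProperty("SSL", "true")

  // Same JDBC URL as in the spark-shell attempt above.
  val conn = DriverManager.getConnection(
    "jdbc:presto://presto-prd.url.com:8443/hive/xyz", props)

  val rs = conn.createStatement().executeQuery("SELECT 1")
  while (rs.next()) println(rs.getInt(1))
  conn.close()

If this succeeds, the URL and driver are fine and the problem is limited to how
the DataFrameReader options are handled.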

It would be great if someone could help here!

Thanks!
VM


Re: Running Spark on Yarn

2016-03-30 Thread Vineet Mishra
RM and NM logs are traced below:

RM -->

2016-03-30 14:59:15,498 INFO
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher:
Setting up container Container: [ContainerId:
container_1459326455972_0004_01_01, NodeId: myhost:60653,
NodeHttpAddress: myhost:8042, Resource: , Priority:
0, Token: Token { kind: ContainerToken, service: 10.20.53.123:60653 }, ]
for AM appattempt_1459326455972_0004_01
2016-03-30 14:59:15,498 INFO
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher:
Command to launch container container_1459326455972_0004_01_01 :
{{JAVA_HOME}}/bin/java,-server,-Xmx512m,-Djava.io.tmpdir={{PWD}}/tmp,-Dspark.yarn.app.container.log.dir=,-XX:MaxPermSize=256m,org.apache.spark.deploy.yarn.ExecutorLauncher,--arg,'
10.20.53.123:45379
',--executor-memory,1024m,--executor-cores,1,--properties-file,{{PWD}}/__spark_conf__/__spark_conf__.properties,1>,/stdout,2>,/stderr
2016-03-30 14:59:15,498 INFO
org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager:
Create AMRMToken for ApplicationAttempt:
appattempt_1459326455972_0004_01
2016-03-30 14:59:15,498 INFO
org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager:
Creating password for appattempt_1459326455972_0004_01
2016-03-30 14:59:15,533 INFO
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Done
launching container Container: [ContainerId:
container_1459326455972_0004_01_01, NodeId: myhost:60653,
NodeHttpAddress: myhost:8042, Resource: , Priority:
0, Token: Token { kind: ContainerToken, service: 10.20.53.123:60653 }, ]
for AM appattempt_1459326455972_0004_01
2016-03-30 14:59:15,533 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
appattempt_1459326455972_0004_01 State change from ALLOCATED to LAUNCHED
2016-03-30 14:59:16,437 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
container_1459326455972_0004_01_01 Container Transitioned from ACQUIRED
to RUNNING
2016-03-30 14:59:28,514 INFO SecurityLogger.org.apache.hadoop.ipc.Server:
Auth successful for appattempt_1459326455972_0004_01 (auth:SIMPLE)
2016-03-30 14:59:28,527 INFO
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: AM
registration appattempt_1459326455972_0004_01
2016-03-30 14:59:28,527 INFO
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=myhost
IP=10.20.53.123 OPERATION=Register App Master
TARGET=ApplicationMasterService RESULT=SUCCESS
APPID=application_1459326455972_0004
APPATTEMPTID=appattempt_1459326455972_0004_01
2016-03-30 14:59:28,527 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
appattempt_1459326455972_0004_01 State change from LAUNCHED to RUNNING
2016-03-30 14:59:28,528 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
application_1459326455972_0004 State change from ACCEPTED to RUNNING
2016-03-30 14:59:29,456 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
container_1459326455972_0004_01_02 Container Transitioned from NEW to
ALLOCATED
2016-03-30 14:59:29,457 INFO
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger:
USER=myhost OPERATION=AM
Allocated Container TARGET=SchedulerApp RESULT=SUCCESS
APPID=application_1459326455972_0004
CONTAINERID=container_1459326455972_0004_01_02
2016-03-30 14:59:29,457 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode:
Assigned container container_1459326455972_0004_01_02 of capacity
 on host myhost:60653, which has 2 containers,
 used and  available after
allocation
2016-03-30 14:59:30,121 INFO
org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM:
Sending NMToken for nodeId : myhost:60653 for container :
container_1459326455972_0004_01_02
2016-03-30 14:59:30,122 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
container_1459326455972_0004_01_02 Container Transitioned from
ALLOCATED to ACQUIRED
2016-03-30 14:59:30,458 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt:
Making reservation: node=myhost app_id=application_1459326455972_0004
2016-03-30 14:59:30,458 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
container_1459326455972_0004_01_03 Container Transitioned from NEW to
RESERVED
2016-03-30 14:59:30,458 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode:
Reserved container container_1459326455972_0004_01_03 on node host:
myhost:60653 #containers=2 available=1468 used=2560 for application
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt@57cf4903
2016-03-30 14:59:31,460 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
container_1459326455972_0004_01_02 Container Transitioned from ACQUIRED
to RUNNING

NM -->

2016-03-30 15:02:38,537 INFO
org.apache.hadoop.yarn.server.nodemana

Re: Running Spark on Yarn

2016-03-29 Thread Vineet Mishra
:~/Downloads/package/spark-1.6.1-bin-hadoop2.6$ bin/spark-shell --master
yarn-client

16/03/30 03:24:43 DEBUG ipc.Client: IPC Client (111576772) connection to
myhost/192.168.1.108:8032 from myhost sending #138
16/03/30 03:24:43 DEBUG ipc.Client: IPC Client (111576772) connection to
myhost/192.168.1.108:8032 from myhost got value #138
16/03/30 03:24:43 DEBUG ipc.ProtobufRpcEngine: Call: getApplicationReport
took 3ms
16/03/30 03:24:44 DEBUG ipc.Client: IPC Client (111576772) connection to
myhost/192.168.1.108:8032 from myhost sending #139
16/03/30 03:24:44 DEBUG ipc.Client: IPC Client (111576772) connection to
myhost/192.168.1.108:8032 from myhost got value #139
16/03/30 03:24:44 DEBUG ipc.ProtobufRpcEngine: Call: getApplicationReport
took 4ms


scala>

scala>

scala>

scala>

scala> 16/03/30 03:24:45 DEBUG ipc.Client: IPC Client (111576772)
connection to myhost/192.168.1.108:8032 from myhost sending #140
16/03/30 03:24:45 DEBUG ipc.Client: IPC Client (111576772) connection to
myhost/192.168.1.108:8032 from myhost got value #140
16/03/30 03:24:45 DEBUG ipc.ProtobufRpcEngine: Call: getApplicationReport
took 4ms
16/03/30 03:24:46 DEBUG ipc.Client: IPC Client (111576772) connection to
myhost/192.168.1.108:8032 from myhost sending #141
16/03/30 03:24:46 DEBUG ipc.Client: IPC Client (111576772) connection to
myhost/192.168.1.108:8032 from myhost got value #141
16/03/30 03:24:46 DEBUG ipc.ProtobufRpcEngine: Call: getApplicationReport
took 2ms
16/03/30 03:24:47 DEBUG ipc.Client: IPC Client (111576772) connection to
myhost/192.168.1.108:8032 from myhost sending #142
16/03/30 03:24:47 DEBUG ipc.Client: IPC Client (111576772) connection to
myhost/192.168.1.108:8032 from myhost got value #142
16/03/30 03:24:47 DEBUG ipc.ProtobufRpcEngine: Call: getApplicationReport
took 4ms


On Wed, Mar 30, 2016 at 3:26 AM, Vineet Mishra 
wrote:

> Looks like still the same while the other MR application is working fine,
>
>
>
> On Wed, Mar 30, 2016 at 3:15 AM, Alexander Pivovarov  > wrote:
>
>> for small cluster set the following settings
>>
>> yarn-site.xml
>>
>> <property>
>>   <name>yarn.scheduler.minimum-allocation-mb</name>
>>   <value>32</value>
>> </property>
>>
>>
>> capacity-scheduler.xml
>>
>> <property>
>>   <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
>>   <value>0.5</value>
>>   <description>
>>     Maximum percent of resources in the cluster which can be used to run
>>     application masters i.e. controls number of concurrent running
>>     applications.
>>   </description>
>> </property>
>>
>>
>> Probably YARN cannot allocate memory for the AM container. The default value is 0.1,
>> and the Spark AM needs 896 MB (0.1 gives just ~393 MB, which is not enough).
>>
>> On Tue, Mar 29, 2016 at 2:35 PM, Vineet Mishra 
>> wrote:
>>
>>> Yarn seems to be running fine, I have successful MR jobs completed on
>>> the same,
>>>
>>> Cluster Metrics:
>>>   Apps Submitted: 1, Apps Pending: 0, Apps Running: 0, Apps Completed: 1,
>>>   Containers Running: 0, Memory Used: 0 B, Memory Total: 8 GB, Memory Reserved: 0 B,
>>>   VCores Used: 0, VCores Total: 8, VCores Reserved: 0, Active Nodes: 1,
>>>   Decommissioned Nodes: 0, Lost Nodes: 0, Unhealthy Nodes: 0, Rebooted Nodes: 0
>>>
>>> User Metrics for dr.who:
>>>   Apps Submitted: 0, Apps Pending: 0, Apps Running: 0, Apps Completed: 1,
>>>   Containers Running: 0, Containers Pending: 0, Containers Reserved: 0,
>>>   Memory Used: 0 B, Memory Pending: 0 B, Memory Reserved: 0 B,
>>>   VCores Used: 0, VCores Pending: 0, VCores Reserved: 0
>>>
>>> Application:
>>>   ID: application_1459287061048_0001, User: myhost, Name: word count,
>>>   Application Type: MAPREDUCE, Queue: root.myhost,
>>>   StartTime: Tue, 29 Mar 2016 21:31:39 GMT, FinishTime: Tue, 29 Mar 2016 21:31:59 GMT,
>>>   State: FINISHED, FinalStatus: SUCCEEDED, Tracking UI: History
>>>
>>> On Wed, Mar 30, 2016 at 2:52 AM, Alexander Pivovarov <
>>> apivova...@gmail.com> wrote:
>>>
>>>> check resource manager and node manager logs.
>>>> Maybe you find smth explaining why 1 app is pending
>>>>
>>>> do you have any app run successfully? *Apps Completed is 0 on the UI*
>>>>
>>>>
>>>> On Tue, Mar 29, 2016 at 2:13 PM, Vineet Mishra 
>>>> wrote:
>>>>
>>>>> Hi Alex/Surendra,
>>>>>
>>>>> Hadoop is up and running fine and I am able to run example on the same.
>>>>>
>>>>> *Cluster Metrics*
>>>>> *Apps Submitted Apps Pending App

Re: Running Spark on Yarn

2016-03-29 Thread Vineet Mishra
Looks like it's still the same, while the other MR application is working fine.



On Wed, Mar 30, 2016 at 3:15 AM, Alexander Pivovarov 
wrote:

> for small cluster set the following settings
>
> yarn-site.xml
>
> <property>
>   <name>yarn.scheduler.minimum-allocation-mb</name>
>   <value>32</value>
> </property>
>
>
> capacity-scheduler.xml
>
> <property>
>   <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
>   <value>0.5</value>
>   <description>
>     Maximum percent of resources in the cluster which can be used to run
>     application masters i.e. controls number of concurrent running
>     applications.
>   </description>
> </property>
>
>
> Probably YARN cannot allocate memory for the AM container. The default value is 0.1,
> and the Spark AM needs 896 MB (0.1 gives just ~393 MB, which is not enough).
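
A quick back-of-the-envelope check of the numbers in the suggestion above (a
sketch, assuming the Spark 1.6 defaults of 512 MB for spark.yarn.am.memory plus
a 384 MB minimum overhead, and the 3.93 GB "Memory Total" from the metrics
quoted further down):

  // With maximum-am-resource-percent at its 0.1 default, the AM share of a
  // 3.93 GB cluster is ~400 MB, below the 896 MB the Spark AM asks for, so the
  // AM is never scheduled and the application stays in ACCEPTED.
  val clusterMemMb = 3.93 * 1024
  val amShareMb    = 0.1 * clusterMemMb   // ~402 MB
  val sparkAmMb    = 512 + 384            // 896 MB
  println(amShareMb >= sparkAmMb)         // false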
>
> On Tue, Mar 29, 2016 at 2:35 PM, Vineet Mishra 
> wrote:
>
>> Yarn seems to be running fine, I have successful MR jobs completed on the
>> same,
>>
>> Cluster Metrics:
>>   Apps Submitted: 1, Apps Pending: 0, Apps Running: 0, Apps Completed: 1,
>>   Containers Running: 0, Memory Used: 0 B, Memory Total: 8 GB, Memory Reserved: 0 B,
>>   VCores Used: 0, VCores Total: 8, VCores Reserved: 0, Active Nodes: 1,
>>   Decommissioned Nodes: 0, Lost Nodes: 0, Unhealthy Nodes: 0, Rebooted Nodes: 0
>>
>> User Metrics for dr.who:
>>   Apps Submitted: 0, Apps Pending: 0, Apps Running: 0, Apps Completed: 1,
>>   Containers Running: 0, Containers Pending: 0, Containers Reserved: 0,
>>   Memory Used: 0 B, Memory Pending: 0 B, Memory Reserved: 0 B,
>>   VCores Used: 0, VCores Pending: 0, VCores Reserved: 0
>>
>> Application:
>>   ID: application_1459287061048_0001, User: myhost, Name: word count,
>>   Application Type: MAPREDUCE, Queue: root.myhost,
>>   StartTime: Tue, 29 Mar 2016 21:31:39 GMT, FinishTime: Tue, 29 Mar 2016 21:31:59 GMT,
>>   State: FINISHED, FinalStatus: SUCCEEDED, Tracking UI: History
>>
>> On Wed, Mar 30, 2016 at 2:52 AM, Alexander Pivovarov <
>> apivova...@gmail.com> wrote:
>>
>>> check resource manager and node manager logs.
>>> Maybe you find smth explaining why 1 app is pending
>>>
>>> do you have any app run successfully? *Apps Completed is 0 on the UI*
>>>
>>>
>>> On Tue, Mar 29, 2016 at 2:13 PM, Vineet Mishra 
>>> wrote:
>>>
>>>> Hi Alex/Surendra,
>>>>
>>>> Hadoop is up and running fine and I am able to run example on the same.
>>>>
>>>> Cluster Metrics:
>>>>   Apps Submitted: 1, Apps Pending: 1, Apps Running: 0, Apps Completed: 0,
>>>>   Containers Running: 0, Memory Used: 0 B, Memory Total: 3.93 GB, Memory Reserved: 0 B,
>>>>   VCores Used: 0, VCores Total: 4, VCores Reserved: 0, Active Nodes: 1,
>>>>   Decommissioned Nodes: 0, Lost Nodes: 0, Unhealthy Nodes: 0, Rebooted Nodes: 0
>>>>
>>>> User Metrics for dr.who:
>>>>   Apps Submitted: 0, Apps Pending: 1, Apps Running: 0, Apps Completed: 0,
>>>>   Containers Running: 0, Containers Pending: 0, Containers Reserved: 0,
>>>>   Memory Used: 0 B, Memory Pending: 0 B, Memory Reserved: 0 B,
>>>>   VCores Used: 0, VCores Pending: 0, VCores Reserved: 0
>>>>
>>>> Any Other trace?
>>>>
>>>> On Wed, Mar 30, 2016 at 2:31 AM, Alexander Pivovarov <
>>>> apivova...@gmail.com> wrote:
>>>>
>>>>> check 8088 ui
>>>>> - how many cores and memory available
>>>>> - how many slaves are active
>>>>>
>>>>> run teragen or pi from hadoop examples to make sure that yarn works
>>>>>
>>>>> On Tue, Mar 29, 2016 at 1:25 PM, Surendra , Manchikanti <
>>>>> surendra.manchika...@gmail.com> wrote:
>>>>>
>>>>>> Hi Vineeth,
>>>>>>
>>>>>> Can you please check resource(RAM,Cores) availability in your local
>>>>>> cluster, And change accordingly.
>>>>>>
>>>>>> Regards,
>>>>>> Surendra M
>>>>>>
>>>>>> -- Surendra Manchikanti
>>>>>>
>>>>>> On Tue, Mar 29, 2016 at 1:15 PM, Vineet Mishra <
>>>>>> clearmido...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> While starting Spark on Yarn on local cluster(Single Node Hadoop 2.6
>>>>>>> yarn) I am facing some issues.
>>>>>>>
>>>>>>> As I try to start the Spark Shell it keeps on iterating i

Re: Running Spark on Yarn

2016-03-29 Thread Vineet Mishra
YARN seems to be running fine; I have successful MR jobs completed on it:

Cluster Metrics:
  Apps Submitted: 1, Apps Pending: 0, Apps Running: 0, Apps Completed: 1,
  Containers Running: 0, Memory Used: 0 B, Memory Total: 8 GB, Memory Reserved: 0 B,
  VCores Used: 0, VCores Total: 8, VCores Reserved: 0, Active Nodes: 1,
  Decommissioned Nodes: 0, Lost Nodes: 0, Unhealthy Nodes: 0, Rebooted Nodes: 0

User Metrics for dr.who:
  Apps Submitted: 0, Apps Pending: 0, Apps Running: 0, Apps Completed: 1,
  Containers Running: 0, Containers Pending: 0, Containers Reserved: 0,
  Memory Used: 0 B, Memory Pending: 0 B, Memory Reserved: 0 B,
  VCores Used: 0, VCores Pending: 0, VCores Reserved: 0

Application:
  ID: application_1459287061048_0001, User: myhost, Name: word count,
  Application Type: MAPREDUCE, Queue: root.myhost,
  StartTime: Tue, 29 Mar 2016 21:31:39 GMT, FinishTime: Tue, 29 Mar 2016 21:31:59 GMT,
  State: FINISHED, FinalStatus: SUCCEEDED, Tracking UI: History

On Wed, Mar 30, 2016 at 2:52 AM, Alexander Pivovarov 
wrote:

> check resource manager and node manager logs.
> Maybe you find smth explaining why 1 app is pending
>
> do you have any app run successfully? *Apps Completed is 0 on the UI*
>
>
> On Tue, Mar 29, 2016 at 2:13 PM, Vineet Mishra 
> wrote:
>
>> Hi Alex/Surendra,
>>
>> Hadoop is up and running fine and I am able to run example on the same.
>>
>> Cluster Metrics:
>>   Apps Submitted: 1, Apps Pending: 1, Apps Running: 0, Apps Completed: 0,
>>   Containers Running: 0, Memory Used: 0 B, Memory Total: 3.93 GB, Memory Reserved: 0 B,
>>   VCores Used: 0, VCores Total: 4, VCores Reserved: 0, Active Nodes: 1,
>>   Decommissioned Nodes: 0, Lost Nodes: 0, Unhealthy Nodes: 0, Rebooted Nodes: 0
>>
>> User Metrics for dr.who:
>>   Apps Submitted: 0, Apps Pending: 1, Apps Running: 0, Apps Completed: 0,
>>   Containers Running: 0, Containers Pending: 0, Containers Reserved: 0,
>>   Memory Used: 0 B, Memory Pending: 0 B, Memory Reserved: 0 B,
>>   VCores Used: 0, VCores Pending: 0, VCores Reserved: 0
>>
>> Any Other trace?
>>
>> On Wed, Mar 30, 2016 at 2:31 AM, Alexander Pivovarov <
>> apivova...@gmail.com> wrote:
>>
>>> check 8088 ui
>>> - how many cores and memory available
>>> - how many slaves are active
>>>
>>> run teragen or pi from hadoop examples to make sure that yarn works
>>>
>>> On Tue, Mar 29, 2016 at 1:25 PM, Surendra , Manchikanti <
>>> surendra.manchika...@gmail.com> wrote:
>>>
>>>> Hi Vineeth,
>>>>
>>>> Can you please check resource(RAM,Cores) availability in your local
>>>> cluster, And change accordingly.
>>>>
>>>> Regards,
>>>> Surendra M
>>>>
>>>> -- Surendra Manchikanti
>>>>
>>>> On Tue, Mar 29, 2016 at 1:15 PM, Vineet Mishra 
>>>> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> While starting Spark on Yarn on local cluster(Single Node Hadoop 2.6
>>>>> yarn) I am facing some issues.
>>>>>
>>>>> As I try to start the Spark Shell it keeps on iterating in a endless
>>>>> loop while initiating,
>>>>>
>>>>> *6/03/30 01:32:38 DEBUG ipc.Client: IPC Client (1782965120) connection
>>>>> to myhost/192.168.1.108:8032 <http://192.168.1.108:8032> from myhost
>>>>> sending #11971*
>>>>> *16/03/30 01:32:38 DEBUG ipc.Client: IPC Client (1782965120)
>>>>> connection to myhost/192.168.1.108:8032 <http://192.168.1.108:8032> from
>>>>> myhost got value #11971*
>>>>> *16/03/30 01:32:38 DEBUG ipc.ProtobufRpcEngine: Call:
>>>>> getApplicationReport took 1ms*
>>>>> *16/03/30 01:32:38 INFO yarn.Client: Application report for
>>>>> application_1459260674306_0003 (state: ACCEPTED)*
>>>>> *16/03/30 01:32:38 DEBUG yarn.Client: *
>>>>> * client token: N/A*
>>>>> * diagnostics: N/A*
>>>>> * ApplicationMaster host: N/A*
>>>>> * ApplicationMaster RPC port: -1*
>>>>> * queue: root.thequeue*
>>>>> * start time: 1459269797431*
>>>>> * final status: UNDEFINED*
>>>>> * tracking URL:
>>>>> http://myhost:8088/proxy/application_1459260674306_0003/
>>>>> <http://myhost:8088/proxy/application_1459260674306_0003/>*
>>>>> * user: myhost*
>>>>>
>>>>> *16/03/30 01:45:07 DEBUG ipc.Client: IPC Client (101088744) connection
>>>>> to myhost/192.168.1.108:8032 <http://192.168.1.108:8032> from myhost
>>>>> sending #3

Re: Running Spark on Yarn

2016-03-29 Thread Vineet Mishra
Hi Alex/Surendra,

Hadoop is up and running fine, and I am able to run the examples on it.

Cluster Metrics:
  Apps Submitted: 1, Apps Pending: 1, Apps Running: 0, Apps Completed: 0,
  Containers Running: 0, Memory Used: 0 B, Memory Total: 3.93 GB, Memory Reserved: 0 B,
  VCores Used: 0, VCores Total: 4, VCores Reserved: 0, Active Nodes: 1,
  Decommissioned Nodes: 0, Lost Nodes: 0, Unhealthy Nodes: 0, Rebooted Nodes: 0

User Metrics for dr.who:
  Apps Submitted: 0, Apps Pending: 1, Apps Running: 0, Apps Completed: 0,
  Containers Running: 0, Containers Pending: 0, Containers Reserved: 0,
  Memory Used: 0 B, Memory Pending: 0 B, Memory Reserved: 0 B,
  VCores Used: 0, VCores Pending: 0, VCores Reserved: 0

Any other trace?

On Wed, Mar 30, 2016 at 2:31 AM, Alexander Pivovarov 
wrote:

> check 8088 ui
> - how many cores and memory available
> - how many slaves are active
>
> run teragen or pi from hadoop examples to make sure that yarn works
>
> On Tue, Mar 29, 2016 at 1:25 PM, Surendra , Manchikanti <
> surendra.manchika...@gmail.com> wrote:
>
>> Hi Vineeth,
>>
>> Can you please check resource(RAM,Cores) availability in your local
>> cluster, And change accordingly.
>>
>> Regards,
>> Surendra M
>>
>> -- Surendra Manchikanti
>>
>> On Tue, Mar 29, 2016 at 1:15 PM, Vineet Mishra 
>> wrote:
>>
>>> Hi All,
>>>
>>> While starting Spark on Yarn on local cluster(Single Node Hadoop 2.6
>>> yarn) I am facing some issues.
>>>
>>> As I try to start the Spark Shell it keeps on iterating in a endless
>>> loop while initiating,
>>>
>>> *6/03/30 01:32:38 DEBUG ipc.Client: IPC Client (1782965120) connection
>>> to myhost/192.168.1.108:8032 <http://192.168.1.108:8032> from myhost
>>> sending #11971*
>>> *16/03/30 01:32:38 DEBUG ipc.Client: IPC Client (1782965120) connection
>>> to myhost/192.168.1.108:8032 <http://192.168.1.108:8032> from myhost got
>>> value #11971*
>>> *16/03/30 01:32:38 DEBUG ipc.ProtobufRpcEngine: Call:
>>> getApplicationReport took 1ms*
>>> *16/03/30 01:32:38 INFO yarn.Client: Application report for
>>> application_1459260674306_0003 (state: ACCEPTED)*
>>> *16/03/30 01:32:38 DEBUG yarn.Client: *
>>> * client token: N/A*
>>> * diagnostics: N/A*
>>> * ApplicationMaster host: N/A*
>>> * ApplicationMaster RPC port: -1*
>>> * queue: root.thequeue*
>>> * start time: 1459269797431*
>>> * final status: UNDEFINED*
>>> * tracking URL: http://myhost:8088/proxy/application_1459260674306_0003/
>>> <http://myhost:8088/proxy/application_1459260674306_0003/>*
>>> * user: myhost*
>>>
>>> *16/03/30 01:45:07 DEBUG ipc.Client: IPC Client (101088744) connection
>>> to myhost/192.168.1.108:8032 <http://192.168.1.108:8032> from myhost
>>> sending #338*
>>> *16/03/30 01:45:07 DEBUG ipc.Client: IPC Client (101088744) connection
>>> to myhost/192.168.1.108:8032 <http://192.168.1.108:8032> from myhost got
>>> value #338*
>>> *16/03/30 01:45:07 DEBUG ipc.ProtobufRpcEngine: Call:
>>> getApplicationReport took 2ms*
>>> *16/03/30 01:45:08 DEBUG ipc.Client: IPC Client (101088744) connection
>>> to myhost/192.168.1.108:8032 <http://192.168.1.108:8032> from myhost
>>> sending #339*
>>> *16/03/30 01:45:08 DEBUG ipc.Client: IPC Client (101088744) connection
>>> to myhost/192.168.1.108:8032 <http://192.168.1.108:8032> from myhost got
>>> value #339*
>>> *16/03/30 01:45:08 DEBUG ipc.ProtobufRpcEngine: Call:
>>> getApplicationReport took 2ms*
>>> *16/03/30 01:45:09 DEBUG ipc.Client: IPC Client (101088744) connection
>>> to myhost/192.168.1.108:8032 <http://192.168.1.108:8032> from myhost
>>> sending #340*
>>> *16/03/30 01:45:09 DEBUG ipc.Client: IPC Client (101088744) connection
>>> to myhost/192.168.1.108:8032 <http://192.168.1.108:8032> from myhost got
>>> value #340*
>>> *16/03/30 01:45:09 DEBUG ipc.ProtobufRpcEngine: Call:
>>> getApplicationReport took 2ms*
>>> *16/03/30 01:45:10 DEBUG ipc.Client: IPC Client (101088744) connection
>>> to myhost/192.168.1.108:8032 <http://192.168.1.108:8032> from myhost
>>> sending #341*
>>> *16/03/30 01:45:10 DEBUG ipc.Client: IPC Client (101088744) connection
>>> to myhost/192.168.1.108:8032 <http://192.168.1.108:8032> from myhost got
>>> value #341*
>>> *16/03/30 01:45:10 DEBUG ipc.ProtobufRpcEngine: Call:
>>> getApplicationReport took 1ms*
>>>
>>> Any leads would be appreciated.
>>>
>>> Thanks!
>>>
>>
>>
>


Running Spark on Yarn

2016-03-29 Thread Vineet Mishra
Hi All,

While starting Spark on YARN on a local cluster (single-node Hadoop 2.6 YARN),
I am facing some issues.

As I try to start the Spark shell, it keeps iterating in an endless loop
while initiating:

*6/03/30 01:32:38 DEBUG ipc.Client: IPC Client (1782965120) connection to
myhost/192.168.1.108:8032  from myhost sending
#11971*
*16/03/30 01:32:38 DEBUG ipc.Client: IPC Client (1782965120) connection to
myhost/192.168.1.108:8032  from myhost got value
#11971*
*16/03/30 01:32:38 DEBUG ipc.ProtobufRpcEngine: Call: getApplicationReport
took 1ms*
*16/03/30 01:32:38 INFO yarn.Client: Application report for
application_1459260674306_0003 (state: ACCEPTED)*
*16/03/30 01:32:38 DEBUG yarn.Client: *
* client token: N/A*
* diagnostics: N/A*
* ApplicationMaster host: N/A*
* ApplicationMaster RPC port: -1*
* queue: root.thequeue*
* start time: 1459269797431*
* final status: UNDEFINED*
* tracking URL: http://myhost:8088/proxy/application_1459260674306_0003/
*
* user: myhost*

*16/03/30 01:45:07 DEBUG ipc.Client: IPC Client (101088744) connection to
myhost/192.168.1.108:8032  from myhost sending
#338*
*16/03/30 01:45:07 DEBUG ipc.Client: IPC Client (101088744) connection to
myhost/192.168.1.108:8032  from myhost got value
#338*
*16/03/30 01:45:07 DEBUG ipc.ProtobufRpcEngine: Call: getApplicationReport
took 2ms*
*16/03/30 01:45:08 DEBUG ipc.Client: IPC Client (101088744) connection to
myhost/192.168.1.108:8032  from myhost sending
#339*
*16/03/30 01:45:08 DEBUG ipc.Client: IPC Client (101088744) connection to
myhost/192.168.1.108:8032  from myhost got value
#339*
*16/03/30 01:45:08 DEBUG ipc.ProtobufRpcEngine: Call: getApplicationReport
took 2ms*
*16/03/30 01:45:09 DEBUG ipc.Client: IPC Client (101088744) connection to
myhost/192.168.1.108:8032  from myhost sending
#340*
*16/03/30 01:45:09 DEBUG ipc.Client: IPC Client (101088744) connection to
myhost/192.168.1.108:8032  from myhost got value
#340*
*16/03/30 01:45:09 DEBUG ipc.ProtobufRpcEngine: Call: getApplicationReport
took 2ms*
*16/03/30 01:45:10 DEBUG ipc.Client: IPC Client (101088744) connection to
myhost/192.168.1.108:8032  from myhost sending
#341*
*16/03/30 01:45:10 DEBUG ipc.Client: IPC Client (101088744) connection to
myhost/192.168.1.108:8032  from myhost got value
#341*
*16/03/30 01:45:10 DEBUG ipc.ProtobufRpcEngine: Call: getApplicationReport
took 1ms*

Any leads would be appreciated.

Thanks!


Re: Running Spark on Gateway - Connecting to Resource Manager Retries

2015-04-15 Thread Vineet Mishra
Hi Akhil,

It runs fine when launched from the NameNode (RM) but fails when launched from
the gateway; if I add the hadoop-core jars to the Hadoop
directory (/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/), it
works fine.

It is really strange: I am submitting the job through spark-submit in both
cases, and it works fine via the NameNode but fails via the gateway, even
though both have the same classpath.

Has anyone tried running Spark from a gateway?

Looking for a quick revert!
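
A quick way to compare the two environments from spark-shell (a hypothetical
check; jar layouts differ between CDH hosts) is to dump the Hadoop
configuration directories and the effective driver classpath on both machines
and diff the output:

  // Run the same snippet on the NameNode and on the gateway, then compare.
  println(sys.env.getOrElse("HADOOP_CONF_DIR", "<not set>"))
  println(sys.env.getOrElse("YARN_CONF_DIR", "<not set>"))
  println(System.getProperty("java.class.path").split(":").sorted.mkString("\n"))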

Thanks,


On Wed, Apr 15, 2015 at 12:07 PM, Akhil Das 
wrote:

> Make sure your yarn service is running on 8032.
>
> Thanks
> Best Regards
>
> On Tue, Apr 14, 2015 at 12:35 PM, Vineet Mishra 
> wrote:
>
>> Hi Team,
>>
>> I am running Spark Word Count example(
>> https://github.com/sryza/simplesparkapp), if I go with master as local
>> it works fine.
>>
>> But when I change the master to yarn its end with retries connecting to
>> resource manager(stack trace mentioned below),
>>
>> 15/04/14 11:31:57 INFO RMProxy: Connecting to ResourceManager at /
>> 0.0.0.0:8032
>> 15/04/14 11:31:58 INFO Client: Retrying connect to server:
>> 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is
>> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
>> MILLISECONDS)
>> 15/04/14 11:31:59 INFO Client: Retrying connect to server:
>> 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is
>> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
>> MILLISECONDS)
>>
>> If I run the same command from Namenode instance it ends with
>> ArrayOutofBoundException(Stack trace mentioned below),
>>
>> 15/04/14 11:38:44 INFO YarnClientSchedulerBackend: SchedulerBackend is
>> ready for scheduling beginning after reached minRegisteredResourcesRatio:
>> 0.8
>> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
>> at
>> com.cloudera.sparkwordcount.SparkWordCount$.main(SparkWordCount.scala:28)
>> at com.cloudera.sparkwordcount.SparkWordCount.main(SparkWordCount.scala)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>> at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>
>> Looking forward to get it resolve to work on respective nodes.
>>
>> Thanks,
>>
>
>


Running Spark on Gateway - Connecting to Resource Manager Retries

2015-04-14 Thread Vineet Mishra
Hi Team,

I am running the Spark word count example
(https://github.com/sryza/simplesparkapp); if I go with master as local, it
works fine.

But when I change the master to yarn, it ends with retries connecting to the
ResourceManager (stack trace mentioned below):

15/04/14 11:31:57 INFO RMProxy: Connecting to ResourceManager at /
0.0.0.0:8032
15/04/14 11:31:58 INFO Client: Retrying connect to server:
0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
MILLISECONDS)
15/04/14 11:31:59 INFO Client: Retrying connect to server:
0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
MILLISECONDS)
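
The retries against 0.0.0.0:8032 usually mean the client is falling back to the
default yarn.resourcemanager.address, i.e. yarn-site.xml is not being picked up
on that host. A minimal sketch (assuming spark-shell on the gateway with the
Hadoop client libraries on the classpath) to confirm which address the
client-side configuration resolves to:

  import org.apache.hadoop.yarn.conf.YarnConfiguration

  // Expect <rm-host>:8032 here; 0.0.0.0:8032 means the defaults are in effect
  // and HADOOP_CONF_DIR / YARN_CONF_DIR is probably not pointing at the cluster
  // configuration.
  val yarnConf = new YarnConfiguration()
  println(yarnConf.get(YarnConfiguration.RM_ADDRESS))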

If I run the same command from the Namenode instance, it ends with an
ArrayIndexOutOfBoundsException (stack trace mentioned below):

15/04/14 11:38:44 INFO YarnClientSchedulerBackend: SchedulerBackend is
ready for scheduling beginning after reached minRegisteredResourcesRatio:
0.8
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
at com.cloudera.sparkwordcount.SparkWordCount$.main(SparkWordCount.scala:28)
at com.cloudera.sparkwordcount.SparkWordCount.main(SparkWordCount.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
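
The ArrayIndexOutOfBoundsException: 1 is thrown inside the application's own
main at SparkWordCount.scala:28, which likely points to user code reading a
second command-line argument (args(1)) that was not supplied, rather than a
YARN problem. A hypothetical guard (the argument names below are assumptions;
the linked example expects positional arguments such as an input path and a
count threshold):

  // Hypothetical argument check for a word-count driver that reads args(0) and
  // args(1); a missing args(1) would throw ArrayIndexOutOfBoundsException: 1,
  // as seen in the stack trace above.
  object SparkWordCountArgs {
    def main(args: Array[String]): Unit = {
      if (args.length < 2) {
        System.err.println("Usage: SparkWordCount <input path> <count threshold>")
        sys.exit(1)
      }
      val inputPath = args(0)
      val threshold = args(1).toInt
      println(s"input=$inputPath, threshold=$threshold")
    }
  }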

Looking forward to getting it resolved so it works on the respective nodes.

Thanks,