Re: I cannot use spark 2.3.0 and kafka 0.9?

2018-05-08 Thread Shixiong(Ryan) Zhu
"note that the 0.8 integration is compatible with later 0.9 and 0.10
brokers, but the 0.10 integration is not compatible with earlier brokers."

This is pretty clear: you can use the 0.8 integration to talk to a 0.9 broker.
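For example, a minimal sketch of reading from a 0.9 broker with the 0.8 direct stream API (this assumes the spark-streaming-kafka-0-8 artifact is on the classpath; the broker address and topic name below are made up):

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object Kafka09Example {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-0.8-integration-example")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Hypothetical broker address and topic name.
    val kafkaParams = Map("metadata.broker.list" -> "broker-host:9092")
    val topics = Set("events")

    // The 0.8 direct stream; per the docs quoted above it can also talk to 0.9/0.10 brokers.
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    stream.map(_._2).print()

    ssc.start()
    ssc.awaitTermination()
  }
}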

Best Regards,
Shixiong Zhu
Databricks Inc.
shixi...@databricks.com





On Fri, May 4, 2018 at 2:02 AM, kant kodali  wrote:

> Hi All,
>
> This link seems to suggest I can't use Spark 2.3.0 with a Kafka 0.9 broker. Is
> that correct?
>
> https://spark.apache.org/docs/latest/streaming-kafka-integration.html
>
> Thanks!
>


Re: Error submitting Spark Job in yarn-cluster mode on EMR

2018-05-08 Thread Marco Mistroni
Did you by any chance leave a sparkSession.setMaster("local") lurking in
your code?
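If so, remove it and let spark-submit supply the master. A minimal sketch (the app name is just a placeholder):

import org.apache.spark.sql.SparkSession

// Leave the master unset in code so the value passed to spark-submit
// (e.g. --master yarn) takes effect.
val spark = SparkSession.builder()
  .appName("my-yarn-app")
  .getOrCreate()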

Last time I checked, to run on YARN you have to package a 'fat jar'. Could
you make sure the Spark dependencies in your jar match the version you are
running on YARN?

Alternatively, please share your code, including how you submit your
application to Spark.
FYI, this is the command I am using to submit a program to Spark:

spark-submit --master yarn --deploy-mode cluster --class 
 

hth

On Tue, May 8, 2018 at 10:14 AM, SparkUser6 
wrote:

> I have a simple program that works fine in local mode, but I am having
> issues when I try to run the program in yarn-cluster mode. I know a
> NoSuchMethodError usually happens when the compile-time and runtime
> versions don't match, but I made sure I used the same version.
>
> 205  [main] INFO  org.spark_project.jetty.server.ServerConnector  - Started Spark@29539e36{HTTP/1.1}{0.0.0.0:4040}
> 205  [main] INFO  org.spark_project.jetty.server.Server  - Started @3265ms
> Exception in thread "main" java.lang.NoSuchMethodError:
> org.apache.spark.internal.config.package$.APP_CALLER_CONTEXT()Lorg/apache/spark/internal/config/OptionalConfigEntry;
> at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:163)
> at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
> at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
> at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
> at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
> at com.voicebase.etl.PhoenixToElasticSearch.main(PhoenixToElasticSearch.java:54)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
>
>
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
>
>


Re: Guava dependency issue

2018-05-08 Thread Koert Kuipers
we shade guava in our fat jar/assembly jar/application jar
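For example, with sbt-assembly the relocation looks roughly like this in build.sbt (just a sketch, assuming an sbt build; the relocation prefix is arbitrary):

// Relocate Guava classes inside the assembly jar so they cannot clash with
// the (older) Guava that Spark/Hadoop put on the classpath.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.common.**" -> "shaded.com.google.common.@1").inAll
)

With Maven, the maven-shade-plugin's relocation feature does the same thing.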

On Tue, May 8, 2018 at 12:31 PM, Marcelo Vanzin  wrote:

> Using a custom Guava version with Spark is not that simple. Spark
> shades Guava, but a lot of libraries Spark uses do not - the main one
> being all of the Hadoop ones, and they need a quite old Guava.
>
> So you have two options: shade/relocate Guava in your application, or
> use spark.{driver|executor}.userClassPathFirst.
>
> There really isn't anything easier until we get shaded Hadoop client
> libraries...
>
> On Tue, May 8, 2018 at 8:44 AM, Stephen Boesch  wrote:
> >
> > I downgraded to spark 2.0.1 and it fixed that particular runtime
> exception:
> > but then a similar one appears when saving to parquet:
> >
> > An  SOF question on this was created a month ago and today further
> details
> > plus an open bounty were added to it:
> >
> > https://stackoverflow.com/questions/49713485/spark-error-with-google-guava-library-java-lang-nosuchmethoderror-com-google-c
> >
> > The new but similar exception is shown below:
> >
> > The hack to downgrade to 2.0.1 does help - i.e. execution proceeds
> further :
> > but then when writing out to parquet the above error does happen.
> >
> > 8/05/07 11:26:11 ERROR Executor: Exception in task 0.0 in stage 2741.0 (TID 2618)
> > java.lang.NoSuchMethodError:
> > com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
> > at org.apache.hadoop.io.compress.CodecPool.createCache(CodecPool.java:62)
> > at org.apache.hadoop.io.compress.CodecPool.<clinit>(CodecPool.java:74)
> > at org.apache.parquet.hadoop.CodecFactory$BytesCompressor.<init>(CodecFactory.java:92)
> > at org.apache.parquet.hadoop.CodecFactory.getCompressor(CodecFactory.java:169)
> > at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:303)
> > at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:262)
> > at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetFileFormat.scala:562)
> > at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:139)
> > at org.apache.spark.sql.execution.datasources.BaseWriterContainer.newOutputWriter(WriterContainer.scala:131)
> > at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:247)
> > at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
> > at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
> > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
> > at org.apache.spark.scheduler.Task.run(Task.scala:86)
> > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:6
> >
> >
> >
> > 2018-05-07 10:30 GMT-07:00 Stephen Boesch :
> >>
> >> I am intermittently running into guava dependency issues across multiple
> >> spark projects.  I have tried maven shade / relocate but it does not
> resolve
> >> the issues.
> >>
> >> The current project is extremely simple: *no* additional dependencies
> >> beyond scala, spark, and scalatest - yet the issues remain (and yes mvn
> >> clean was re-applied).
> >>
> >> Is there a reliable approach to handling the versioning for guava within
> >> spark dependency projects?
> >>
> >>
> >> [INFO]
> >> 
> 
> >> [INFO] Building ccapps_final 1.0-SNAPSHOT
> >> [INFO]
> >> 
> 
> >> [INFO]
> >> [INFO] --- exec-maven-plugin:1.6.0:java (default-cli) @ ccapps_final ---
> >> 18/05/07 10:24:00 WARN NativeCodeLoader: Unable to load native-hadoop
> >> library for your platform... using builtin-java classes where applicable
> >> [WARNING]
> >> java.lang.NoSuchMethodError:
> >> com.google.common.cache.CacheBuilder.refreshAfterWrite(JLjava/util/concurrent/TimeUnit;)Lcom/google/common/cache/CacheBuilder;
> >> at org.apache.hadoop.security.Groups.<init>(Groups.java:96)
> >> at org.apache.hadoop.security.Groups.<init>(Groups.java:73)
> >> at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:293)
> >> at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:283)
> >> at
> >>

Re: Guava dependency issue

2018-05-08 Thread Marcelo Vanzin
Using a custom Guava version with Spark is not that simple. Spark
shades Guava, but a lot of libraries Spark uses do not - the main one
being all of the Hadoop ones, and they need a quite old Guava.

So you have two options: shade/relocate Guava in your application, or
use spark.{driver|executor}.userClassPathFirst.

There really isn't anything easier until we get shaded Hadoop client
libraries...
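For the second option, the submit-time flags would look roughly like this (the class and jar names are placeholders; note these settings are marked experimental, and the driver variant applies in cluster mode):

spark-submit \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  --class com.example.MyApp \
  my-app-assembly.jar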

On Tue, May 8, 2018 at 8:44 AM, Stephen Boesch  wrote:
>
> I downgraded to spark 2.0.1 and it fixed that particular runtime exception:
> but then a similar one appears when saving to parquet:
>
> An  SOF question on this was created a month ago and today further details
> plus an open bounty were added to it:
>
> https://stackoverflow.com/questions/49713485/spark-error-with-google-guava-library-java-lang-nosuchmethoderror-com-google-c
>
> The new but similar exception is shown below:
>
> The hack to downgrade to 2.0.1 does help - i.e. execution proceeds further :
> but then when writing out to parquet the above error does happen.
>
> 8/05/07 11:26:11 ERROR Executor: Exception in task 0.0 in stage 2741.0 (TID
> 2618)
> java.lang.NoSuchMethodError:
> com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
> at
> org.apache.hadoop.io.compress.CodecPool.createCache(CodecPool.java:62)
> at org.apache.hadoop.io.compress.CodecPool.(CodecPool.java:74)
> at
> org.apache.parquet.hadoop.CodecFactory$BytesCompressor.(CodecFactory.java:92)
> at
> org.apache.parquet.hadoop.CodecFactory.getCompressor(CodecFactory.java:169)
> at
> org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:303)
> at
> org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:262)
> at
> org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.(ParquetFileFormat.scala:562)
> at
> org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:139)
> at
> org.apache.spark.sql.execution.datasources.BaseWriterContainer.newOutputWriter(WriterContainer.scala:131)
> at
> org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:247)
> at
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
> at
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
> at org.apache.spark.scheduler.Task.run(Task.scala:86)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:6
>
>
>
> 2018-05-07 10:30 GMT-07:00 Stephen Boesch :
>>
>> I am intermittently running into guava dependency issues across multiple
>> spark projects.  I have tried maven shade / relocate but it does not resolve
>> the issues.
>>
>> The current project is extremely simple: *no* additional dependencies
>> beyond scala, spark, and scalatest - yet the issues remain (and yes mvn
>> clean was re-applied).
>>
>> Is there a reliable approach to handling the versioning for guava within
>> spark dependency projects?
>>
>>
>> [INFO]
>> 
>> [INFO] Building ccapps_final 1.0-SNAPSHOT
>> [INFO]
>> 
>> [INFO]
>> [INFO] --- exec-maven-plugin:1.6.0:java (default-cli) @ ccapps_final ---
>> 18/05/07 10:24:00 WARN NativeCodeLoader: Unable to load native-hadoop
>> library for your platform... using builtin-java classes where applicable
>> [WARNING]
>> java.lang.NoSuchMethodError:
>> com.google.common.cache.CacheBuilder.refreshAfterWrite(JLjava/util/concurrent/TimeUnit;)Lcom/google/common/cache/CacheBuilder;
>> at org.apache.hadoop.security.Groups.(Groups.java:96)
>> at org.apache.hadoop.security.Groups.(Groups.java:73)
>> at
>> org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:293)
>> at
>> org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:283)
>> at
>> org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:260)
>> at
>> org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:789)
>> at
>> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:774)
>> at
>> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:647)
>> at
>> 

Re: Advice on multiple streaming job

2018-05-08 Thread Peter Liu
Hi Dhaval,

I'm using the YARN scheduler (without the need to specify the port in the
submit). Not sure why the port issue comes up here.

Gerard seems to have a good point about managing the multiple topics within
your application (to avoid the port issue). Not sure if you're using Spark
Streaming or Spark Structured Streaming (see the developer links below for
Spark 2.2.0; the same applies to the latest version).

https://spark.apache.org/docs/2.2.0/streaming-programming-guide.html
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html

I'm new to Spark Streaming and I was curious what the reason is in your
case for having to run multiple Spark services. Is this because of the "fact"
(just my question) that each service can only maintain one DStream?

I'm reading the following part from the guide above (
https://spark.apache.org/docs/2.2.0/streaming-programming-guide.html ) and
was wondering whether having multiple topics (splitting one into several)
would be good practice to enable multiple DStreams and thus better
parallelism in the data processing - see the sketch after the quote below.

Quote:>> Note that each input DStream creates a single receiver (running on
a worker machine) that receives a single stream of data
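For instance, would something along these lines be a reasonable approach (a rough sketch only; the broker address, topic names, group id and batch interval are made up, and it assumes the 0.10 direct stream API)?

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val conf = new SparkConf().setAppName("multi-topic-streams")
val ssc = new StreamingContext(conf, Seconds(30))

// Hypothetical Kafka settings and topic names.
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker-host:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "demo-group")
val topics = Seq("topicA", "topicB", "topicC")

// One input DStream per topic, unioned so the rest of the pipeline sees a
// single DStream while consumption happens in parallel across topics.
val perTopicStreams = topics.map { t =>
  KafkaUtils.createDirectStream[String, String](ssc, PreferConsistent,
    Subscribe[String, String](Seq(t), kafkaParams))
}
val unified = ssc.union(perTopicStreams)
unified.map(_.value).print()

ssc.start()
ssc.awaitTermination()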

Any comment from you guys would be much appreciated!

Cheers,

Peter


On Mon, May 7, 2018 at 5:08 AM, Dhaval Modi  wrote:

> Hi Gerard,
>
> Our source is kafka, and we are using standard streaming api (DStreams).
>
> Our requirement is: we have 100s of Kafka topics, and each topic sends
> different messages in (complex) JSON format. Topics are structured per
> domain.
> Hence, the topics are independent of each other.
> These JSON messages need to be flattened and stored in Hive.
>
> For these 100s of topics, we currently have 100s of jobs running
> independently, each using a different UI port.
>
>
>
> Regards,
> Dhaval Modi
> dhavalmod...@gmail.com
>
> On 7 May 2018 at 13:53, Gerard Maas  wrote:
>
>> Dhaval,
>>
>> Which Streaming API are you using?
>> In Structured Streaming, you are able to start several streaming queries
>> within the same context.
>>
>> kind regards, Gerard.
>>
>> On Sun, May 6, 2018 at 7:59 PM, Dhaval Modi 
>> wrote:
>>
>>> Hi Susan,
>>>
>>> Thanks for your response.
>>>
>>> Will try configuration as suggested.
>>>
>>> But I am still looking for an answer: does Spark support running multiple
>>> jobs on the same port?
>>>
>>> On Sun, May 6, 2018, 20:27 Susan X. Huynh  wrote:
>>>
 Hi Dhaval,

 Not sure if you have considered this: the port 4040 sounds like a
 driver UI port. By default it will try up to 4056, but you can increase
 that number with "spark.port.maxRetries". (
 https://spark.apache.org/docs/latest/configuration.html) Try setting
 it to "32". This would help if the only conflict is among the driver UI
 ports (like if you have > 16 drivers running on the same host).

 Susan

 On Sun, May 6, 2018 at 12:32 AM, vincent gromakowski <
 vincent.gromakow...@gmail.com> wrote:

> Use a scheduler that abstracts the network away with a CNI, for instance,
> or other mechanisms (Mesos, Kubernetes, YARN). The CNI will allow you to
> always bind on the same ports because each container will have its own IP.
> Other solutions like Mesos and Marathon can work without a CNI, with host
> IP binding, but will manage the ports for you, ensuring there isn't any
> conflict.
>
> Le sam. 5 mai 2018 à 17:10, Dhaval Modi  a
> écrit :
>
>> Hi All,
>>
>> Need advice on executing multiple streaming jobs.
>>
>> Problem: We have 100s of streaming jobs, and every streaming job uses a
>> new port. Also, Spark automatically checks ports from 4040 to 4056, after
>> which it fails. One workaround is to provide the port explicitly.
>>
>> Is there a way to tackle this situation? Or am I missing anything?
>>
>> Thanking you in advance.
>>
>> Regards,
>> Dhaval Modi
>> dhavalmod...@gmail.com
>>
>


 --
 Susan X. Huynh
 Software engineer, Data Agility
 xhu...@mesosphere.com

>>>
>>
>


Re: Guava dependency issue

2018-05-08 Thread Stephen Boesch
I downgraded to Spark 2.0.1 and it fixed that *particular* runtime
exception, but then a similar one appears when saving to parquet:

An SOF question on this was created a month ago, and today further details
plus an open bounty were added to it:

https://stackoverflow.com/questions/49713485/spark-error-with-google-guava-library-java-lang-nosuchmethoderror-com-google-c

The new but similar exception is shown below:

The hack of downgrading to 2.0.1 does help - i.e. execution proceeds
*further* - but then the exception shown below happens when writing out to *parquet*.

8/05/07 11:26:11 ERROR Executor: Exception in task 0.0 in stage 2741.0
(TID 2618)
java.lang.NoSuchMethodError:
com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
at org.apache.hadoop.io.compress.CodecPool.createCache(CodecPool.java:62)
at org.apache.hadoop.io.compress.CodecPool.(CodecPool.java:74)
at 
org.apache.parquet.hadoop.CodecFactory$BytesCompressor.(CodecFactory.java:92)
at 
org.apache.parquet.hadoop.CodecFactory.getCompressor(CodecFactory.java:169)
at 
org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:303)
at 
org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:262)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.(ParquetFileFormat.scala:562)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:139)
at 
org.apache.spark.sql.execution.datasources.BaseWriterContainer.newOutputWriter(WriterContainer.scala:131)
at 
org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:247)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:6



2018-05-07 10:30 GMT-07:00 Stephen Boesch :

> I am intermittently running into guava dependency issues across multiple
> spark projects.  I have tried maven shade / relocate but it does not
> resolve the issues.
>
> The current project is extremely simple: *no* additional dependencies
> beyond scala, spark, and scalatest - yet the issues remain (and yes mvn
> clean was re-applied).
>
> Is there a reliable approach to handling the versioning for guava within
> spark dependency projects?
>
>
> [INFO] 
> 
> [INFO] Building ccapps_final 1.0-SNAPSHOT
> [INFO] 
> 
> [INFO]
> [INFO] --- exec-maven-plugin:1.6.0:java (default-cli) @ ccapps_final ---
> 18/05/07 10:24:00 WARN NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> [WARNING]
> java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.refreshAfterWrite(JLjava/util/concurrent/TimeUnit;)Lcom/google/common/cache/CacheBuilder;
> at org.apache.hadoop.security.Groups.<init>(Groups.java:96)
> at org.apache.hadoop.security.Groups.<init>(Groups.java:73)
> at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:293)
> at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:283)
> at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:260)
> at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:789)
> at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:774)
> at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:647)
> at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2424)
> at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2424)
> at scala.Option.getOrElse(Option.scala:121)
> at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2424)
> at org.apache.spark.SparkContext.<init>(SparkContext.scala:295)
> at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2516)
> at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:918)
> at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:910)
> at

Re: Help Required - Unable to run spark-submit on YARN client mode

2018-05-08 Thread Deepak Sharma
Can you try increasing the number of partitions for the base RDD/DataFrame
that you are working on?
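Something along these lines (just a sketch - the input path and the partition count of 200 are arbitrary placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("repartition-example").getOrCreate()
// Spread the base data over more partitions before the heavy transformations.
val df = spark.read.parquet("/path/to/input").repartition(200)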


On Tue, May 8, 2018 at 5:05 PM, Debabrata Ghosh 
wrote:

> Hi Everyone,
> I have been trying to run spark-shell in YARN client mode, but am getting a
> lot of ClosedChannelException errors; however, the program works fine in
> local mode. I am using the Spark 2.2.0 build for Hadoop 2.7.3. If you are
> familiar with this error, could you please help with a possible resolution?
>
> Any help would be greatly appreciated!
>
> Here is the error message:
>
> 18/05/08 00:01:18 ERROR TransportClient: Failed to send RPC
> 7905321254854295784 to /9.30.94.43:60220: java.nio.channels.ClosedChannelException
> java.nio.channels.ClosedChannelException
> at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
> 18/05/08 00:01:18 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint:
> Sending RequestExecutors(5,0,Map(),Set()) to AM was unsuccessful
> java.io.IOException: Failed to send RPC 7905321254854295784 to /9.30.94.43:60220: java.nio.channels.ClosedChannelException
> at org.apache.spark.network.client.TransportClient.lambda$sendRpc$2(TransportClient.java:237)
> at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
> at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
> at io.netty.util.concurrent.DefaultPromise.access$000(DefaultPromise.java:34)
> at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:431)
> at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:446)
> at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
> at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.nio.channels.ClosedChannelException
>
> Cheers,
>
> Debu
>



-- 
Thanks
Deepak
www.bigdatabig.com
www.keosha.net


Help Required - Unable to run spark-submit on YARN client mode

2018-05-08 Thread Debabrata Ghosh
Hi Everyone,
I have been trying to run spark-shell in YARN client mode, but am getting a
lot of ClosedChannelException errors; however, the program works fine in
local mode. I am using the Spark 2.2.0 build for Hadoop 2.7.3. If you are
familiar with this error, could you please help with a possible resolution?

Any help would be greatly appreciated!

Here is the error message:

18/05/08 00:01:18 ERROR TransportClient: Failed to send RPC
7905321254854295784 to /9.30.94.43:60220:
java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
at
io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
18/05/08 00:01:18 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint: Sending
RequestExecutors(5,0,Map(),Set()) to AM was unsuccessful
java.io.IOException: Failed to send RPC 7905321254854295784 to /
9.30.94.43:60220: java.nio.channels.ClosedChannelException
at
org.apache.spark.network.client.TransportClient.lambda$sendRpc$2(TransportClient.java:237)
at
io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
at
io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
at
io.netty.util.concurrent.DefaultPromise.access$000(DefaultPromise.java:34)
at
io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:431)
at
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:446)
at
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
at
io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.ClosedChannelException

Cheers,

Debu


Error submitting Spark Job in yarn-cluster mode on EMR

2018-05-08 Thread SparkUser6
I have a simple program that works fine in local mode, but I am having
issues when I try to run the program in yarn-cluster mode. I know a
NoSuchMethodError usually happens when the compile-time and runtime versions
don't match, but I made sure I used the same version.

205  [main] INFO  org.spark_project.jetty.server.ServerConnector  - Started
Spark@29539e36{HTTP/1.1}{0.0.0.0:4040}
205  [main] INFO  org.spark_project.jetty.server.Server  - Started @3265ms
Exception in thread "main" java.lang.NoSuchMethodError:
org.apache.spark.internal.config.package$.APP_CALLER_CONTEXT()Lorg/apache/spark/internal/config/OptionalConfigEntry;
at 
org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:163)
at
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
at
org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
at
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at
com.voicebase.etl.PhoenixToElasticSearch.main(PhoenixToElasticSearch.java:54)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)




--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
