Re: Spark Streaming not working

2020-04-10 Thread Debabrata Ghosh
Any solution, please?

On Fri, Apr 10, 2020 at 11:04 PM Debabrata Ghosh 
wrote:

> Hi,
> I have a Spark Streaming application where Kafka is producing
> records, but unfortunately Spark Streaming isn't able to consume them.
>
> I am hitting the following error:
>
> 20/04/10 17:28:04 ERROR Executor: Exception in task 0.5 in stage 0.0 (TID 24)
> java.lang.AssertionError: assertion failed: Failed to get records for 
> spark-executor-service-spark-ingestion dice-ingestion 11 0 after polling for 
> 12
>   at scala.Predef$.assert(Predef.scala:170)
>   at 
> org.apache.spark.streaming.kafka010.CachedKafkaConsumer.get(CachedKafkaConsumer.scala:74)
>   at 
> org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.next(KafkaRDD.scala:223)
>   at 
> org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.next(KafkaRDD.scala:189)
>
>
> Would you please be able to help with a resolution?
>
> Thanks,
> Debu
>


Re: Spark Streaming not working

2020-04-10 Thread Debabrata Ghosh
Yes, the Kafka producer is producing records from the same host. I rechecked
the Kafka connection and it is up. I came across this URL but am unable to
understand it:

https://stackoverflow.com/questions/42264669/spark-streaming-assertion-failed-failed-to-get-records-for-spark-executor-a-gro
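
For reference, a minimal sketch of the mitigation that question points at:
raise the executor-side poll budget (spark.streaming.kafka.consumer.poll.ms)
and keep the Kafka client timeouts consistent with it. This is a sketch, not
a confirmed fix for this job; the broker address is a placeholder, while the
group id and topic name are taken from the error message above.

```
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val conf = new SparkConf()
  .setAppName("service-spark-ingestion")
  // How long the cached executor-side consumer may poll before the
  // "Failed to get records ... after polling for ..." assertion fires.
  .set("spark.streaming.kafka.consumer.poll.ms", "180000")

val ssc = new StreamingContext(conf, Seconds(30))

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker-1:9092",   // placeholder broker address
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "service-spark-ingestion",
  "auto.offset.reset" -> "latest",
  "enable.auto.commit" -> (false: java.lang.Boolean),
  // Client-side timeouts kept consistent with the larger poll budget.
  "request.timeout.ms" -> "190000",
  "session.timeout.ms" -> "60000",
  "heartbeat.interval.ms" -> "20000"
)

val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Seq("dice-ingestion"), kafkaParams))

stream.foreachRDD(rdd => println(s"batch record count = ${rdd.count()}"))
ssc.start()
ssc.awaitTermination()
```

If the assertion still fires with a generous poll budget, the cause is more
likely connectivity from the executors to the brokers, or records expiring
from the topic before the executors read them, than the timeout itself.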

On Fri, Apr 10, 2020 at 11:14 PM Srinivas V  wrote:

> Check whether your broker details are correct, and verify that you have
> network connectivity between your client box and the Kafka broker host.
>
> On Fri, Apr 10, 2020 at 11:04 PM Debabrata Ghosh 
> wrote:
>
>> Hi,
>> I have a Spark Streaming application where Kafka is producing
>> records, but unfortunately Spark Streaming isn't able to consume them.
>>
>> I am hitting the following error:
>>
>> 20/04/10 17:28:04 ERROR Executor: Exception in task 0.5 in stage 0.0 (TID 24)
>> java.lang.AssertionError: assertion failed: Failed to get records for 
>> spark-executor-service-spark-ingestion dice-ingestion 11 0 after polling for 
>> 12
>>  at scala.Predef$.assert(Predef.scala:170)
>>  at 
>> org.apache.spark.streaming.kafka010.CachedKafkaConsumer.get(CachedKafkaConsumer.scala:74)
>>  at 
>> org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.next(KafkaRDD.scala:223)
>>  at 
>> org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.next(KafkaRDD.scala:189)
>>
>>
>> Would you please be able to help with a resolution?
>>
>> Thanks,
>> Debu
>>
>


Re: Spark Streaming not working

2020-04-10 Thread Srinivas V
Check whether your broker details are correct, and verify that you have
network connectivity between your client box and the Kafka broker host.
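
One way to make that check concrete is to run a bare Kafka consumer from the
same hosts the executors run on and ask the broker for metadata. This is only
a sketch: the broker address is a placeholder, and it assumes the
kafka-clients jar matching your broker version is on the classpath.

```
import java.util.Properties
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer

object KafkaConnectivityCheck {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker-1:9092") // placeholder broker address
    props.put("group.id", "connectivity-check")
    props.put("key.deserializer",
      "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer",
      "org.apache.kafka.common.serialization.StringDeserializer")

    val consumer = new KafkaConsumer[String, String](props)
    try {
      // listTopics() forces a metadata round trip, so it fails fast if the
      // broker host or port is unreachable from this machine.
      val topics = consumer.listTopics().asScala.keys
      println(s"Broker reachable. Topics visible: ${topics.mkString(", ")}")
    } finally {
      consumer.close()
    }
  }
}
```

If this succeeds on the driver host but fails on the executor hosts, the
problem is at the network level rather than in the Spark job itself.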

On Fri, Apr 10, 2020 at 11:04 PM Debabrata Ghosh 
wrote:

> Hi,
> I have a Spark Streaming application where Kafka is producing
> records, but unfortunately Spark Streaming isn't able to consume them.
>
> I am hitting the following error:
>
> 20/04/10 17:28:04 ERROR Executor: Exception in task 0.5 in stage 0.0 (TID 24)
> java.lang.AssertionError: assertion failed: Failed to get records for 
> spark-executor-service-spark-ingestion dice-ingestion 11 0 after polling for 
> 12
>   at scala.Predef$.assert(Predef.scala:170)
>   at 
> org.apache.spark.streaming.kafka010.CachedKafkaConsumer.get(CachedKafkaConsumer.scala:74)
>   at 
> org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.next(KafkaRDD.scala:223)
>   at 
> org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.next(KafkaRDD.scala:189)
>
>
> Would you please be able to help with a resolution?
>
> Thanks,
> Debu
>


Spark Streaming not working

2020-04-10 Thread Debabrata Ghosh
Hi,
I have a Spark Streaming application where Kafka is producing
records, but unfortunately Spark Streaming isn't able to consume them.

I am hitting the following error:

20/04/10 17:28:04 ERROR Executor: Exception in task 0.5 in stage 0.0 (TID 24)
java.lang.AssertionError: assertion failed: Failed to get records for
spark-executor-service-spark-ingestion dice-ingestion 11 0 after
polling for 12
at scala.Predef$.assert(Predef.scala:170)
at 
org.apache.spark.streaming.kafka010.CachedKafkaConsumer.get(CachedKafkaConsumer.scala:74)
at 
org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.next(KafkaRDD.scala:223)
at 
org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.next(KafkaRDD.scala:189)


Would you please be able to help with a resolution?

Thanks,
Debu


Re: [External Sender] Re: Driver pods stuck in running state indefinitely

2020-04-10 Thread Prudhvi Chennuru (CONT)
No, there was no internal domain name resolution issue. As I mentioned, I saw
this issue only on a few nodes in the cluster.

On Thu, Apr 9, 2020 at 10:49 PM Wei Zhang  wrote:

> Are there any internal domain name resolution issues?
>
> > Caused by:  java.net.UnknownHostException:
> spark-1586333186571-driver-svc.fractal-segmentation.svc
>
> -z
> 
> From: Prudhvi Chennuru (CONT) 
> Sent: Friday, April 10, 2020 2:44
> To: user
> Subject: Driver pods stuck in running state indefinitely
>
>
> Hi,
>
> We are running Spark batch jobs on K8s.
> Kubernetes version: 1.11.5,
> Spark version: 2.3.2,
> Docker version: 19.3.8
>
> Issue: a few driver pods are stuck in the running state indefinitely with
> the following error
>
>```
>The Initial job has not accepted any resources; check your cluster UI
> to ensure that workers are registered and have sufficient resources.
>```
>
> Below is the log of the errored-out executor pods:
>
>   ```
>Exception in thread "main"
> java.lang.reflect.UndeclaredThrowableException
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1858)
> at
> org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:63)
> at
> org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188)
> at
> org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:293)
> at
> org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
> Caused by: org.apache.spark.SparkException: Exception thrown in
> awaitResult:
> at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
> at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
> at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
> at
> org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:201)
> at
> org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:64)
> at
> org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:63)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1840)
> ... 4 more
> Caused by: java.io.IOException: Failed to connect to
> spark-1586333186571-driver-svc.fractal-segmentation.svc:7078
> at
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
> at
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
> at
> org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198)
> at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
> at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.net.UnknownHostException:
> spark-1586333186571-driver-svc.fractal-segmentation.svc
> at java.net.InetAddress.getAllByName0(InetAddress.java:1280)
> at java.net.InetAddress.getAllByName(InetAddress.java:1192)
> at java.net.InetAddress.getAllByName(InetAddress.java:1126)
> at java.net.InetAddress.getByName(InetAddress.java:1076)
> at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:146)
> at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:143)
> at java.security.AccessController.doPrivileged(Native Method)
> at io.netty.util.internal.SocketUtils.addressByName(SocketUtils.java:143)
> at
> io.netty.resolver.DefaultNameResolver.doResolve(DefaultNameResolver.java:43)
> at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:63)
> at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:55)
> at
> io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:57)
> at
> io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:32)
> at
> io.netty.resolver.AbstractAddressResolver.resolve(AbstractAddressResolver.java:108)
> at io.netty.bootstrap.Bootstrap.doResolveAndConnect0(Bootstrap.java:208)
> at io.netty.bootstrap.Bootstrap.access$000(Bootstrap.java:49)
> at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:188)
> at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:174)
> at
> io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
> at
> io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
> at
> io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
> at
> io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
> at
> 

Spark hangs while reading from jdbc - does nothing

2020-04-10 Thread Ruijing Li
Hi all,

I am on Spark 2.4.4 with Scala 2.11.12, running in cluster mode on
Mesos. I am ingesting from an Oracle database using spark.read.jdbc. I am
seeing a strange issue where Spark just hangs and does nothing, not
starting any new tasks. Normally this job finishes in 30 stages, but
sometimes it stops at 29 completed stages and doesn't start the last stage.
The Spark job is idle and there are no pending or active tasks. What could
be the problem? Thanks.
-- 
Cheers,
Ruijing Li
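
Not an answer from the thread, but a sketch of how such a read is often made
easier to diagnose: partition the JDBC read explicitly and set a fetch size
and query timeout, so each task maps to one bounded Oracle query and a hang
can be pinned to a specific partition in the Spark UI. All connection
details, table and column names below are placeholders.

```
import java.util.Properties
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("oracle-ingest").getOrCreate()

val props = new Properties()
props.setProperty("user", "app_user")          // placeholder credentials
props.setProperty("password", "app_password")
props.setProperty("fetchsize", "10000")        // rows per round trip to Oracle
props.setProperty("queryTimeout", "3600")      // seconds; relies on the driver honoring setQueryTimeout

val df = spark.read.jdbc(
  url = "jdbc:oracle:thin:@//db-host:1521/SERVICE",  // placeholder connection URL
  table = "SOURCE_TABLE",
  columnName = "ID",          // numeric column used to split the read
  lowerBound = 1L,
  upperBound = 100000000L,
  numPartitions = 24,
  connectionProperties = props)

df.write.mode("overwrite").parquet("/tmp/oracle_ingest")
```

With explicit bounds, checking which partition's query is still open on the
Oracle side (and whether queryTimeout eventually fires) can help tell whether
the hang is in the database or in Spark.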


Re: How to import PySpark into Jupyter

2020-04-10 Thread Akchhaya S
Hello Yasir,

You need to check your 'PYTHONPATH' environment variable.

For Windows, if I do a "pip install", the package is installed in
"lib\site-packages" under the Python folder. If I "print(sys.path)", I see
"lib\site-packages" as one of the entries, and I can expect "import
<package>" to work.

Find the installation location of 'findspark' and add it to the PYTHONPATH.
You can do that inside your script as well, like:

import sys
# Use a raw string so the backslashes in the Windows path are not treated as escapes
sys.path.append(r'X:\PathTo\findspark\module')

Hope it works.

Regards,
Akchhaya Sharma


On Fri, Apr 10, 2020 at 4:35 PM Yasir Elgohary  wrote:

> Peace dear all,
>
> I hope you all are well and healthy...
>
> I am brand new to Spark/Hadoop. My env. is: Windows 7 with
> Jupyter/Anaconda and Spark/Hadoop all installed on my laptop. How can I run
> the following without errors:
>
> import findspark
> findspark.init()
> findspark.find()
> from pyspark.sql import SparkSession
>
> This is the error msg. I get:
>
> ModuleNotFoundError: No module named 'findspark'
>
>
> It seems I am missing something for Spark to run well with Jupyter/Anaconda on
> Windows 7.
>
>
> Cheers
>


Fwd: How to import PySpark into Jupyter

2020-04-10 Thread Yasir Elgohary
Peace dear all,

I hope you all are well and healthy...

I am brand new to Spark/Hadoop. My env. is: Windows 7 with Jupyter/Anaconda
and Spark/Hadoop all installed on my laptop. How can I run the following
without errors:

import findspark
findspark.init()
findspark.find()
from pyspark.sql import SparkSession

This is the error msg. I get:

ModuleNotFoundError: No module named 'findspark'


It seems I am missing something for Spark to run well with
Jupyter/Anaconda on Windows 7.


Cheers