Re: Spark 3.1 Json4s-native jar compatibility

2022-02-04 Thread Amit Sharma
Spark versions. Use > mvn dependency:tree or equivalent on your build to see what you actually > build in. You probably do not need to include json4s at all as it is in > Spark anyway > > On Fri, Feb 4, 2022 at 2:35 PM Amit Sharma wrote: > >> Martin Sean, changed it to 3.

Re: Spark 3.1 Json4s-native jar compatibility

2022-02-04 Thread Amit Sharma
PM Sean Owen wrote: > >> You can look it up: >> https://github.com/apache/spark/blob/branch-3.2/pom.xml#L916 >> 3.7.0-M11 >> >> On Thu, Feb 3, 2022 at 1:57 PM Amit Sharma wrote: >> >>> Hello, everyone. I am migrating my spark stream to spark ve

Spark 3.1 Json4s-native jar compatibility

2022-02-03 Thread Amit Sharma
Hello, everyone. I am migrating my Spark streaming job to Spark 3.1. I also upgraded the json4s version as below: libraryDependencies += "org.json4s" %% "json4s-native" % "3.7.0-M5" While running the job I am getting an error in the code below, where I am serializing the given inputs. implicit val
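As Sean Owen's replies in this thread suggest, Spark already bundles json4s, so the usual sbt fix is to pin the dependency to the exact version your Spark branch ships (3.7.0-M5 on branch-3.1, 3.7.0-M11 on branch-3.2, per the pom link above) and mark it Provided so your jar does not shadow Spark's copy. A sketch, not a verified build file; the Spark version shown is hypothetical:

```scala
// build.sbt (fragment) — json4s versions taken from this thread; verify
// against your Spark branch's pom.xml before relying on them.
val sparkVersion = "3.1.2" // hypothetical; use your actual Spark version

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % sparkVersion % Provided,
  // Pin json4s to the version Spark 3.1 bundles, and keep it Provided so
  // the cluster's copy wins at runtime.
  "org.json4s" %% "json4s-native" % "3.7.0-M5" % Provided
)
```

Alternatively, omit the json4s dependency entirely and compile against the one Spark pulls in transitively.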

Re: Kafka to spark streaming

2022-01-29 Thread Amit Sharma
explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such loss, damage or destruction. > > > > > On Fri, 28 Jan 2022 at 22:14, Amit Sharma wrote: > >> Hello everyone, we have spark streaming application. We send request to

Kafka to spark streaming

2022-01-28 Thread Amit Sharma
Hello everyone, we have a Spark Streaming application. We send requests to the stream through an Akka actor using a Kafka topic and wait for the response, as it is real time. Just want a suggestion: is there any better option, like Livy, for sending requests to and receiving responses from Spark Streaming? Thanks Amit

Fwd: Cassandra driver upgrade

2022-01-24 Thread Amit Sharma
I am upgrading my Cassandra Java driver to the latest version, 4.13. I have a Cassandra cluster running Cassandra version 3.11.11. I am getting the runtime error below while connecting to Cassandra. Before 4.13 I was using driver version 3.9 and things were working fine.

Re: Spark 3.2.0 upgrade

2022-01-22 Thread Amit Sharma
ark-mllib" % sparkVersion , "com.datastax.spark" %% "spark-cassandra-connector" % "3.1.0", // this includes cassandra-driver "org.apache.spark" %% "spark-hive" % sparkVersion, "org.apache.spark" %% "spark-streaming-ka

Re: Spark 3.2.0 upgrade

2022-01-21 Thread Amit Sharma
tion: com.codahale.metrics.JmxReporter at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581) at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178) at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521) On Thu, Jan 20, 2022 at 5:1

Spark 3.2.0 upgrade

2022-01-20 Thread Amit Sharma
Hello, I am trying to upgrade my project from Spark 2.3.3 to Spark 3.2.0. While running the application locally I am getting the error below. Could you please let me know which version of the Cassandra connector I should use? I am using the shaded connector below, but I think that is causing the issue

Log4j2 upgrade

2022-01-12 Thread Amit Sharma
Hello, everyone. I am replacing log4j with log4j2 in my Spark Streaming application. When I deployed my application to the Spark cluster it gave me the error below: "ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to

Kafka Sink Issue

2021-08-23 Thread Amit Sharma
) at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:193) Can anyone help here and let me know why it happened and what the resolution is? -- Thanks & Regards, Amit Sharma

Spark Null Pointer Exception

2021-06-30 Thread Amit Sharma
Hi, I am using Spark 2.7 with Scala. I am calling a method as below: 1. val rddBacklog = spark.sparkContext.parallelize(MAs) // MAs is a list of, say, cities 2. rddBacklog.foreach(ma => doAlloc3Daily(ma, fteReview.forecastId, startYear, endYear)) 3. The doAlloc3Daily method is just doing a database

unit testing for spark code

2021-03-22 Thread Amit Sharma
Hi, can we write unit tests for Spark code? Is there any specific framework? Thanks Amit
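A common pattern (not the only one) is to keep row-level logic in plain functions so most tests run without a SparkSession at all; for the tests that do need one, ScalaTest with a local[*] SparkSession or the spark-testing-base library are frequently used. A minimal sketch with a hypothetical business rule, runnable with no Spark on the classpath:

```scala
// Keep transformation logic pure so it is unit-testable without a cluster.
object SalesLogic {
  // Hypothetical rule: 10% discount for orders of 10 or more units.
  def discountedPrice(price: Double, qty: Int): Double =
    if (qty >= 10) price * 0.9 else price
}

object SalesLogicTest {
  def main(args: Array[String]): Unit = {
    // Plain assertions stand in for a test framework here.
    assert(SalesLogic.discountedPrice(100.0, 10) == 90.0)
    assert(SalesLogic.discountedPrice(100.0, 1) == 100.0)
    println("all tests passed")
  }
}
```

In a real suite you would then wrap `SalesLogic.discountedPrice` in a UDF or Dataset map and reserve SparkSession-based tests for the wiring, not the logic.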

Re: Understanding Executors UI

2021-01-09 Thread Amit Sharma
I believe it’s a Spark UI issue that does not display the correct value. I believe it is resolved in Spark 3.0. Thanks Amit On Fri, Jan 8, 2021 at 4:00 PM Luca Canali wrote: > You report 'Storage Memory': 3.3TB/ 598.5 GB -> The first number is the > memory used for storage, the second one is the

Re: Spark UI Storage Memory

2020-12-07 Thread Amit Sharma
Any suggestion, please? Thanks Amit On Fri, Dec 4, 2020 at 2:27 PM Amit Sharma wrote: > Is there any memory leak in spark 2.3.3 version as mentioned in below > Jira. > https://issues.apache.org/jira/browse/SPARK-29055. > > Please let me know how to solve it. > > Thanks >

Re: Caching

2020-12-07 Thread Amit Sharma
performance when > compared to cache because Spark will optimize for cache the data during > shuffle. > > > > *From: *Amit Sharma > *Reply-To: *"resolve...@gmail.com" > *Date: *Monday, December 7, 2020 at 12:47 PM > *To: *Theodoros Gkountouvas , " > user

Re: Caching

2020-12-07 Thread Amit Sharma
at 1:01 PM Sean Owen wrote: > No, it's not true that one action means every DF is evaluated once. This > is a good counterexample. > > On Mon, Dec 7, 2020 at 11:47 AM Amit Sharma wrote: > >> Thanks for the information. I am using spark 2.3.3 There are few more >> questi

Re: Caching

2020-12-07 Thread Amit Sharma
d I suspect that > DF1 is used more than once (one time at DF2 and another one at DF3). So, > Spark is going to cache it the first time and it will load it from cache > instead of running it again the second time. > > > > I hope this helped, > > Theo. > > >

Caching

2020-12-07 Thread Amit Sharma
Hi All, I am using caching in my code. I have DFs like: val DF1 = read csv; val DF2 = DF1.groupBy().agg().select(...); val DF3 = read csv .join(DF1).join(DF2); DF3.save. If I do not cache DF1 or DF2 it takes longer. But I am doing only one action, so why do I need to cache? Thanks
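Sean Owen's reply in this thread confirms the behavior: one action does not mean each DF is evaluated once. DF3 reaches DF1 twice (directly and via DF2), so without caching the DF1 lineage runs twice. A plain-Scala analogy of that lazy recomputation (no Spark required; all names hypothetical):

```scala
object CachingAnalogy {
  var reads = 0
  // Stands in for DF1's plan: each evaluation re-reads the "csv".
  def readCsv(): List[Int] = { reads += 1; List(1, 2, 3) }

  def withoutCache(): Int = {
    reads = 0
    def df1 = readCsv()       // a recipe, not a result (lazy, like a DataFrame)
    val df2 = df1.map(_ * 2)  // first evaluation of df1
    val df3 = df1 ++ df2      // second evaluation of df1
    df3.sum                   // one "action", but reads is now 2
  }

  def withCache(): Int = {
    reads = 0
    val df1 = readCsv()       // materialized once, like DF1.cache()
    val df2 = df1.map(_ * 2)
    val df3 = df1 ++ df2
    df3.sum                   // reads stays at 1
  }
}
```

In Spark the same holds: without `DF1.cache()`, the CSV scan (and the groupBy behind DF2) can be recomputed for each branch of the plan even though there is a single `save`.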

Re: Spark UI Storage Memory

2020-12-04 Thread Amit Sharma
Is there a memory leak in Spark 2.3.3 as mentioned in the Jira below? https://issues.apache.org/jira/browse/SPARK-29055. Please let me know how to solve it. Thanks Amit On Fri, Dec 4, 2020 at 1:55 PM Amit Sharma wrote: > Can someone help me on this please. > > > Thanks >

Re: Spark UI Storage Memory

2020-12-04 Thread Amit Sharma
Can someone help me on this, please? Thanks Amit On Wed, Dec 2, 2020 at 11:52 AM Amit Sharma wrote: > Hi , I have a spark streaming job. When I am checking the Executors tab , > there is a Storage Memory column. It displays used memory /total memory. > What is used memory. Is it memor

Spark UI Storage Memory

2020-12-02 Thread Amit Sharma
Hi, I have a Spark Streaming job. When I check the Executors tab, there is a Storage Memory column. It displays used memory / total memory. What is "used memory" — memory currently in use, or memory used so far? How would I know how much memory is unused at a given point in time? Thanks Amit

Re: Cache not getting cleaned.

2020-11-21 Thread Amit Sharma
Please find attached a screenshot showing no active tasks, but memory is still used. [image: image.png] On Sat, Nov 21, 2020 at 4:25 PM Amit Sharma wrote: > I am using df.cache and also unpersisting it. But when I check spark Ui > storage I still see cache memory usage. Do I need to do any

Cache not getting cleaned.

2020-11-21 Thread Amit Sharma
I am using df.cache and also unpersisting it. But when I check the Spark UI storage tab I still see cache memory usage. Do I need to do anything else? Also, in the executor tab on the Spark UI, memory used / total memory for each executor always displays some used memory; not sure if, when there is no request on the streaming job, then

Re: Spark Exception

2020-11-20 Thread Amit Sharma
Russell, I increased the RPC timeout to 240 seconds but I am still getting this issue once in a while, and after it my Spark Streaming job gets stuck and does not process any requests, so I need to restart it every time. Any suggestion, please? Thanks Amit On Wed, Nov 18, 2020 at 12:05 PM Amit

Re: Out of memory issue

2020-11-20 Thread Amit Sharma
please help. Thanks Amit On Mon, Nov 9, 2020 at 4:18 PM Amit Sharma wrote: > Please find below the exact exception > > Exception in thread "streaming-job-executor-3" java.lang.OutOfMemoryError: > Java heap space > at java.util.Array

Re: Spark Exception

2020-11-20 Thread Amit Sharma
Please help. Thanks Amit On Wed, Nov 18, 2020 at 12:05 PM Amit Sharma wrote: > Hi, we are running a spark streaming job and sometimes it throws the two > exceptions below. I do not understand the difference between > these two exceptions: one timeout is 120 seconds an

Spark Exception

2020-11-18 Thread Amit Sharma
Hi, we are running a Spark Streaming job and sometimes it throws the two exceptions below. I do not understand the difference between these two exceptions: one timeout is 120 seconds and the other is 600 seconds. What could be the reason for these? Error running job streaming job

spark UI storage tab

2020-11-11 Thread Amit Sharma
Hi, I have a few questions as below. 1. The Spark UI storage tab displays 'storage level', 'size in memory' and 'size on disk'. It shows RDD ID 16 with memory usage 76 MB; I am not sure why it does not go to 0 once a Spark Streaming request is completed. I am caching some RDD

Re: Out of memory issue

2020-11-09 Thread Amit Sharma
nfun$map$1.apply(Try.scala:237) at scala.util.Try$.apply(Try.scala:192) at scala.util.Success.map(Try.scala:237) On Sun, Nov 8, 2020 at 1:35 PM Amit Sharma wrote: > Hi , I am using 16 nodes spark cluster with below config > 1. Executor memory 8 GB > 2. 5 cores per executor

Re: Out of memory issue

2020-11-09 Thread Amit Sharma
Can you please help? Thanks Amit On Sun, Nov 8, 2020 at 1:35 PM Amit Sharma wrote: > Hi , I am using 16 nodes spark cluster with below config > 1. Executor memory 8 GB > 2. 5 cores per executor > 3. Driver memory 12 GB. > > > We have streaming job. We do not see problem

Out of memory issue

2020-11-08 Thread Amit Sharma
Hi, I am using a 16-node Spark cluster with the config below: 1. Executor memory 8 GB 2. 5 cores per executor 3. Driver memory 12 GB. We have a streaming job. We usually do not see a problem, but sometimes we get an exception, an executor-1 heap memory issue. I do not understand: if the data size is the same and this job

Spark reading from cassandra

2020-11-04 Thread Amit Sharma
Hi, I have a question: when we are reading from Cassandra, should we use only the partition key in the where clause from a performance perspective, or does it not matter from the Spark perspective because it always allows filtering? Thanks Amit

Driver Information

2020-08-17 Thread Amit Sharma
Hi, I have a 20-node cluster. I run multiple batch jobs. In the spark-submit file, driver memory=2g and executor memory=4g, and I have 8 GB workers. I have the questions below: 1. Is there any way to know, for each batch job, which worker is the driver node? 2. Will the driver node be part of one of the

Re: help on use case - spark parquet processing

2020-08-13 Thread Amit Sharma
Can you keep an Option field in your case class? Thanks Amit On Thu, Aug 13, 2020 at 12:47 PM manjay kumar wrote: > Hi , > > I have a use case, > > where i need to merge three data set and build one where ever data is > available. > > And my dataset is a complex object. > > Customer > - name -
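The Option-field suggestion can be sketched like this: each source contributes a partial Customer (field names are hypothetical, extrapolated from the "Customer - name" fragment in the question), and the merge takes each field from the first source that has a value:

```scala
// Each data set's partial view of a customer; missing fields are None.
case class Customer(name: Option[String], city: Option[String], phone: Option[String])

object CustomerMerge {
  // Build one record "where ever data is available": prefer a, then b, then c.
  def merge(a: Customer, b: Customer, c: Customer): Customer =
    Customer(
      a.name.orElse(b.name).orElse(c.name),
      a.city.orElse(b.city).orElse(c.city),
      a.phone.orElse(b.phone).orElse(c.phone)
    )
}
```

For nested complex objects the same orElse pattern applies recursively, one merge function per case class.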

Re: Spark batch job chaining

2020-08-08 Thread Amit Sharma
Any help is appreciated. I have a Spark batch job; based on a condition, I would like to start another batch job by invoking a .sh file. Just want to know: can we achieve that? Thanks Amit On Fri, Aug 7, 2020 at 3:58 PM Amit Sharma wrote: > Hi, I want to write a batch job which would call another ba

Spark batch job chaining

2020-08-07 Thread Amit Sharma
Hi, I want to write a batch job which would call another batch job based on a condition. Can I call one batch job from another in Scala, or can I only do it via a Python script? An example would be really helpful. Thanks Amit

how to copy from one cassandra cluster to another

2020-07-28 Thread Amit Sharma
Hi, I have table A in Cassandra cluster 1 in one data center and table B in cluster 2 in another data center. I want to copy the data from one cluster to the other using Spark. The problem I faced is that I cannot create two Spark sessions, and we need a Spark session per cluster.

spark exception

2020-07-24 Thread Amit Sharma
Hi All, sometimes I get this error in the Spark logs. I notice a few executors are shown as dead in the executor tab during this error, although my job succeeds. Please help me find the root cause of this issue. I have 3 workers with 30 cores and 64 GB RAM each. My job uses 3 cores per executor

Re: Garbage collection issue

2020-07-20 Thread Amit Sharma
Please help on this. Thanks Amit On Fri, Jul 17, 2020 at 2:34 PM Amit Sharma wrote: > Hi All, i am running the same batch job in my two separate spark clusters. > In one of the clusters it is showing GC warning on spark -ui under > executer tag. Garbage collection is taking lo

Re: Future timeout

2020-07-20 Thread Amit Sharma
Please help on this. Thanks Amit On Fri, Jul 17, 2020 at 9:10 AM Amit Sharma wrote: > Hi, sometimes my spark streaming job throw this exception Futures timed > out after [300 seconds]. > I am not sure where is the default timeout configuration. Can i increase > it.

Garbage collection issue

2020-07-17 Thread Amit Sharma
Hi All, I am running the same batch job in my two separate Spark clusters. In one of the clusters there is a GC warning in the Spark UI under the executor tab: garbage collection is taking around 20% of the time, while in the other cluster it is under 10%. I am using the same configuration in my

Future timeout

2020-07-17 Thread Amit Sharma
Hi, sometimes my Spark Streaming job throws this exception: Futures timed out after [300 seconds]. I am not sure where the default timeout configuration is. Can I increase it? Please help. Thanks Amit Caused by: java.util.concurrent.TimeoutException: Futures timed out after [300 seconds]
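"Futures timed out after [...]" is the message scala.concurrent's Await.result produces when a future misses its deadline; in Spark the 300 s figure often (not always) comes from a config such as spark.sql.broadcastTimeout, whose default is 300 s — check which call site appears in the stack trace before raising anything. A small stdlib-only reproduction of the message, with a deliberately short deadline:

```scala
import scala.concurrent.{Await, Promise, TimeoutException}
import scala.concurrent.duration._

object TimeoutDemo {
  def main(args: Array[String]): Unit = {
    val never = Promise[Int]().future   // a future that never completes
    try {
      Await.result(never, 200.millis)   // stands in for the 300 s wait
    } catch {
      case e: TimeoutException =>
        // Prints: Futures timed out after [200 milliseconds]
        println(e.getMessage)
    }
  }
}
```

Raising the matching Spark config (e.g. `--conf spark.sql.broadcastTimeout=600`, an assumption to verify against your trace) lengthens the wait, but a recurring timeout usually signals a stuck or overloaded stage rather than a too-small setting.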

Cassandra raw deletion

2020-07-04 Thread Amit Sharma
Hi, I have to delete certain rows from Cassandra during my Spark batch process. Is there any way to delete rows using the Spark Cassandra connector? Thanks Amit

Truncate table

2020-07-01 Thread Amit Sharma
Hi, I have a scenario where I have to read certain rows from a table, truncate the table, and store those rows back into the table. I am doing the steps below: 1. reading certain rows into DF1 from Cassandra table A; 2. saving DF1 back into Cassandra table A with overwrite. The problem is when I truncate the

No of cores per executor.

2019-12-08 Thread Amit Sharma
I have set 5 cores per executor. Is there any formula to determine the best combination of executors, cores, and memory per core for better performance? Also, when I run a local Spark instance in my web jar I get better speed than running in the cluster. Thanks Amit

Re: spark streaming exception

2019-10-17 Thread Amit Sharma
Please update me if anyone knows about it. Thanks Amit On Thu, Oct 10, 2019 at 3:49 PM Amit Sharma wrote: > Hi, we have a Spark Streaming job to which we send a request through our UI > using Kafka. It processes and returns the response. We are getting the > error below and this

spark streaming exception

2019-10-10 Thread Amit Sharma
Hi, we have a Spark Streaming job to which we send a request through our UI using Kafka. It processes and returns the response. We are getting the error below and the streaming job is not processing any requests. Listener StreamingJobProgressListener threw an exception java.util.NoSuchElementException: key

Re: Driver vs master

2019-10-07 Thread Amit Sharma
mit > > On Mon, Oct 7, 2019 at 18:33 Amit Sharma wrote: > >> Can you please help me understand this. I believe the driver program runs on >> the master node > > If we are running 4 spark jobs and driver memory config is 4g then a total of >> 16 GB would be used on the master node. >

Driver vs master

2019-10-07 Thread Amit Sharma
Can you please help me understand this? I believe the driver program runs on the master node. If we are running 4 Spark jobs and the driver memory config is 4g, then a total of 16 GB would be used on the master node. So if we run more jobs then we need more memory on the master. Please correct me if I am wrong.

Re: Memory Limits error

2019-08-16 Thread Amit Sharma
Try increasing your driver memory to 12g. On Thursday, August 15, 2019, Dennis Suhari wrote: > Hi community, > > I am using Spark on Yarn. When submitting a job after a long time I get an > error message and a retry. > > It happens when I want to store the dataframe to a table. > >

Spark Streaming concurrent calls

2019-08-13 Thread Amit Sharma
I am using Kafka Spark streaming. My UI application sends requests to the stream through Kafka. The problem is the stream handles one request at a time, so if multiple users send requests at the same time they have to wait till earlier requests are done. Is there any way it can handle multiple requests?

spark job getting hang

2019-08-05 Thread Amit Sharma
I am running a Spark job; sometimes it runs successfully, but most of the time I get ERROR Dropping event from queue appStatus. This likely means one of the listeners is too slow and cannot keep up with the rate at which tasks are being started by the scheduler.

Core allocation is scattered

2019-07-25 Thread Amit Sharma
I have a cluster with 26 nodes, with 16 cores on each. I am running a Spark job with 20 cores, but I do not understand why my application gets 1-2 cores on a couple of machines; why does it not just run on two nodes, like node1=16 cores and node2=4 cores? But cores are allocated like node1=2 node

Re: spark dataset.cache is not thread safe

2019-07-22 Thread Amit Sharma
Please update me if anyone knows how to handle it. On Sun, Jul 21, 2019 at 7:18 PM Amit Sharma wrote: > Hi , I wrote a code in future block which read data from dataset and cache > it which is used later in the code. I faced a issue that data.cached() data > will be replaced by c

spark dataset.cache is not thread safe

2019-07-21 Thread Amit Sharma
Hi, I wrote code in a Future block which reads data from a dataset and caches it; the cached data is used later in the code. I faced an issue where the data.cached() data gets replaced by a concurrently running thread. Is there any way we can avoid this condition? val dailyData = callDetailsDS.collect.toList val

Re: spark standalone mode problem about executor add and removed again and again!

2019-07-17 Thread Amit Sharma
Do you have dynamic resource allocation enabled? On Wednesday, July 17, 2019, zenglong chen wrote: > Hi, all, > My standalone mode has two slaves. When I submit my job, the > localhost slave works well, but the second slave adds and removes executors > constantly! The logs are below: >

Dynamic allocation not working

2019-07-08 Thread Amit Sharma
Hi All, I have set the dynamic allocation property = true in my script file, and also the shuffle property in the script as well as in the spark-env file on all worker nodes. I am using Spark Kafka streaming. I checked that as a request comes in, the number of allocated cores increases, but even after the request is completed the number of

Spark-cluster slowness

2019-06-20 Thread Amit Sharma
I have a Spark cluster in each of two data centers. Spark cluster B is 6 times slower than cluster A: I ran the same job on both clusters and the time difference is 6x. I used the same config, with Spark 2.3.3. I checked that the Spark UI displays the slave nodes, but when I check

Spark Kafka Streaming stopped

2019-06-14 Thread Amit Sharma
We are using Spark Kafka streaming. We have 6 nodes in the Kafka cluster; if any of the nodes goes down we get the exception below and streaming stops. ERROR DirectKafkaInputDStream:70 - ArrayBuffer(kafka.common.NotLeaderForPartitionException, kafka.common.NotLeaderForPartitionException,

Re: Spark kafka streaming job stopped

2019-06-11 Thread Amit Sharma
Please provide an update if anyone knows. On Monday, June 10, 2019, Amit Sharma wrote: > > We have a Spark Kafka streaming job running on a standalone Spark cluster. We > have the Kafka architecture below: > > 1. Two clusters running in two data centers. > 2. There is an LTM on top of each

Fwd: Spark kafka streaming job stopped

2019-06-10 Thread Amit Sharma
We have a Spark Kafka streaming job running on a standalone Spark cluster. We have the Kafka architecture below: 1. Two clusters running in two data centers. 2. There is an LTM on top of each data center (load balancer). 3. There is a GSLB on top of the LTMs. I observed that whenever any of the nodes in the Kafka cluster is