Re: Log file location in Spark on K8s

2023-10-09 Thread Prashant Sharma
Hi Sanket, Driver and executor logs are written to stdout by default; this can be configured via the SPARK_HOME/conf/log4j.properties file. That file, along with the entire SPARK_HOME/conf directory, is automatically propagated to all driver and executor containers and mounted as a volume. Thanks On Mon, 9 Oct, 2023, 5:37
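For reference, a minimal conf/log4j.properties along the lines of the template Spark ships might look like the following (a sketch only; newer Spark 3.3+ builds use log4j2.properties instead, and the pattern and logger names here are just examples):

```properties
# Route all logging to the console (container stdout/stderr)
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Example: quiet down a chatty package
log4j.logger.org.apache.spark.storage=WARN
```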

Re: Connection pool shut down in Spark Iceberg Streaming Connector

2023-10-05 Thread Prashant Sharma
Hi Sanket, more details might help here. What does your Spark configuration look like? What exactly was done when this happened? On Thu, 5 Oct, 2023, 2:29 pm Agrawal, Sanket, wrote: > Hello Everyone, > > > > We are trying to stream the changes in our Iceberg tables stored in AWS > S3. We are

Re: EOF Exception Spark Structured Streams - Kubernetes

2021-02-01 Thread Prashant Sharma
Hi Sachit, The fix version on that JIRA says 3.0.2, so this fix is not yet released. Soon there will be a 3.1.1 release; in the meantime you can try out the 3.1.1-rc, which also has the fix, and let us know your findings. Thanks, On Mon, Feb 1, 2021 at 10:24 AM Sachit Murarka wrote: >

Re: Suggestion on Spark 2.4.7 vs Spark 3 for Kubernetes

2021-01-05 Thread Prashant Sharma
A lot of developers may have already moved to 3.0.x. FYI, 3.1.0 is just around the corner, hopefully in a few days, and has a lot of improvements to Spark on K8s, including transitioning from experimental to GA in this release. See: https://issues.apache.org/jira/browse/SPARK-33005

Re: Error while running Spark on K8s

2021-01-04 Thread Prashant Sharma
ate.driver.serviceAccountName=spark-sa --conf > spark.kubernetes.container.image=sparkpy local:///opt/spark/da/main.py > > Kind Regards, > Sachit Murarka > > > On Mon, Jan 4, 2021 at 5:46 PM Prashant Sharma > wrote: > >> Hi Sachit, >> >> Can you give more details on how did you

Re: Error while running Spark on K8s

2021-01-04 Thread Prashant Sharma
Hi Sachit, Can you give more details on how you ran it, i.e. the spark-submit command? My guess is that a service account with sufficient privileges was not provided. Please see: http://spark.apache.org/docs/latest/running-on-kubernetes.html#rbac Thanks, On Mon, Jan 4, 2021 at 5:27 PM Sachit Murarka
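For context, the RBAC setup on the linked page boils down to something like the following (the account name `spark`, the `default` namespace, and the API-server address are placeholders):

```shell
# Create a service account and grant it permission to manage executor pods
kubectl create serviceaccount spark
kubectl create clusterrolebinding spark-role --clusterrole=edit \
  --serviceaccount=default:spark --namespace=default

# Then point spark-submit at that account
spark-submit \
  --master k8s://https://<api-server>:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  ...
```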

Re: Spark3 on k8S reading encrypted data from HDFS with KMS in HA

2020-08-19 Thread Prashant Sharma
-dev Hi, I have used Spark with HDFS encrypted with Hadoop KMS, and it worked well. Somehow, I could not recall if I had Kubernetes in the mix. Looking at the error, it is not clear what caused the failure. Can I reproduce this somehow? Thanks, On Sat, Aug 15, 2020 at 7:18 PM Michel

Re: Spark 3.0 with Hadoop 2.6 HDFS/Hive

2020-07-19 Thread Prashant Sharma
Hi Ashika, Hadoop 2.6 is no longer supported, and since it has not been maintained in the last 2 years, it may have unpatched security issues. From Spark 3.0 onwards we no longer support it; in other words, we have modified our codebase in a way that Hadoop 2.6 won't work. However,

Re: Spark Compatibility with Java 11

2020-07-14 Thread Prashant Sharma
Hi Ankur, Java 11 support was added in Spark 3.0. https://issues.apache.org/jira/browse/SPARK-24417 Thanks, On Tue, Jul 14, 2020 at 6:12 PM Ankur Mittal wrote: > Hi, > > I am using Spark 2.X and need to execute Java 11 .Its not able to execute > Java 11 using Spark 2.X. > > Is there any way

Re: [Spark 3.0 Kubernetes] Does Spark 3.0 support production deployment

2020-07-12 Thread Prashant Sharma
> scalable and dynamic-allocation-enabled for deploying Spark on K8s? Any > suggested github repo or link? > > > > Thanks, > > Vaibhav V > > > > > > *From:* Prashant Sharma > *Sent:* Friday, July 10, 2020 12:57 AM > *To:* user@spark.apache.org

Re: [Spark 3.0 Kubernetes] Does Spark 3.0 support production deployment

2020-07-09 Thread Prashant Sharma
Hi, Whether it is a blocker or not is up to you to decide. But Spark on a K8s cluster supports dynamic allocation through a different mechanism, that is, without using an external shuffle service. https://issues.apache.org/jira/browse/SPARK-27963. There are pros and cons of both approaches. The only
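As a sketch, the shuffle-tracking-based dynamic allocation from SPARK-27963 is enabled with configuration along these lines (the master URL and executor counts are placeholders):

```shell
spark-submit \
  --master k8s://https://<api-server>:6443 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=10 \
  ...
```

With shuffle tracking, Spark keeps executors that hold shuffle data alive instead of relying on an external shuffle service, which is the trade-off discussed in the thread.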

Employment opportunities.

2019-06-12 Thread Prashant Sharma
Hi, My employer (IBM) is interested in hiring people in Hyderabad who are committers on any Apache project and are interested in Spark and its ecosystem. Thanks, Prashant.

Spark Streaming RDD Cleanup too slow

2018-09-05 Thread Prashant Sharma
I have a Spark Streaming job which takes too long to delete temp RDDs. I collect about 4MM telemetry metrics per minute and do minor aggregations in the streaming job. I am using Amazon R4 instances. The driver RPC call, although async, I believe, is slow getting the handle for the future object at

Re: Spark Structured Streaming not connecting to Kafka using kerberos

2017-10-26 Thread Prashant Sharma
Hi Darshan, Did you try passing the config directly as an option, like this: .option("kafka.sasl.jaas.config", saslConfig) Where saslConfig can look like: com.sun.security.auth.module.Krb5LoginModule required \ useKeyTab=true \ storeKey=true \
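A hedged Structured Streaming sketch of that suggestion; the broker address, topic name, keytab path, and principal below are made-up placeholders:

```scala
val saslConfig =
  """com.sun.security.auth.module.Krb5LoginModule required
    |useKeyTab=true
    |storeKey=true
    |keyTab="/etc/security/keytabs/kafka-client.keytab"
    |principal="kafka-client@EXAMPLE.COM";""".stripMargin

val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")   // placeholder broker
  .option("subscribe", "events")                       // placeholder topic
  .option("kafka.security.protocol", "SASL_PLAINTEXT")
  .option("kafka.sasl.jaas.config", saslConfig)        // the option discussed above
  .load()
```

Options prefixed with `kafka.` are passed through to the underlying Kafka consumer, which is why the JAAS config can be supplied per-query this way.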

Kafka Spark structured streaming latency benchmark.

2016-12-17 Thread Prashant Sharma
Hi, Goal of my benchmark is to arrive at end to end latency lower than 100ms and sustain them over time, by consuming from a kafka topic and writing back to another kafka topic using Spark. Since the job does not do aggregation and does a constant time processing on each message, it appeared to

Re: If we run sc.textfile(path,xxx) many times, will the elements be the same in each partition

2016-11-10 Thread Prashant Sharma
+user -dev Since the same hash-based partitioner is in action by default, in my understanding the same partitioning will happen every time. Thanks, On Nov 10, 2016 7:13 PM, "WangJianfei" wrote: > Hi Devs: > If i run sc.textFile(path,xxx) many times, will the

Re: Large files with wholetextfile()

2016-07-12 Thread Prashant Sharma
Hi Baahu, That should not be a problem, given you allocate a sufficient buffer for reading. I was just working on implementing a patch[1] to support the feature of reading wholetextfiles in SQL. This can actually be a slightly better approach, because here we read to offheap memory for holding

Re: Streaming K-means not printing predictions

2016-04-26 Thread Prashant Sharma
Since you are reading from a file stream, I would suggest saving the output to a file instead of printing it. There may be output the first time and then no data in subsequent iterations. Prashant Sharma On Tue, Apr 26, 2016 at 7:40 PM, Ashutosh Kumar <kmr.ashutos...@gmail.com> wrote: >
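A sketch of that advice using the MLlib streaming k-means API; the paths, k, and dimensions are placeholders, and an already-created StreamingContext `ssc` is assumed:

```scala
import org.apache.spark.mllib.clustering.StreamingKMeans
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

val model = new StreamingKMeans()
  .setK(3)                  // placeholder cluster count
  .setDecayFactor(1.0)
  .setRandomCenters(2, 0.0) // placeholder dimensionality

val trainingData = ssc.textFileStream("/data/train").map(Vectors.parse)
val testData = ssc.textFileStream("/data/test").map(LabeledPoint.parse)

model.trainOn(trainingData)
// Save predictions to files rather than print(), so empty batches are
// visible as empty output directories on disk.
model.predictOnValues(testData.map(lp => (lp.label, lp.features)))
  .saveAsTextFiles("/data/predictions/out")
```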

Re: Save RDD to HDFS using Spark Python API

2016-04-26 Thread Prashant Sharma
is one such formatter class. thanks, Prashant Sharma On Wed, Apr 27, 2016 at 5:22 AM, Davies Liu <dav...@databricks.com> wrote: > hdfs://192.168.10.130:9000/dev/output/test already exists, so you need > to remove it first. > > On Tue, Apr 26, 2016 at 5:28 AM, Luke Adolph &l

Re: Choosing an Algorithm in Spark MLib

2016-04-21 Thread Prashant Sharma
As far as I can understand, your requirements are pretty straightforward and doable with just simple SQL queries. Take a look at Spark SQL in the Spark documentation. Prashant Sharma On Tue, Apr 12, 2016 at 8:13 PM, Joe San <codeintheo...@gmail.com> wrote: > up vote > down votefavo

Re: Spark streaming batch time displayed is not current system time but it is processing current messages

2016-04-19 Thread Prashant Sharma
This can happen if system time is not in sync. By default, streaming uses SystemClock(it also supports ManualClock) and that relies on System.currentTimeMillis() for determining start time. Prashant Sharma On Sat, Apr 16, 2016 at 10:09 PM, Hemalatha A < hemalatha.amru...@googlemail.com>

Re: [Spark 1.5.2] Log4j Configuration for executors

2016-04-19 Thread Prashant Sharma
May be you can try creating it before running the App.

Re: Processing millions of messages in milliseconds -- Architecture guide required

2016-04-18 Thread Prashant Sharma
and xml[1] messages. Thanks, Prashant Sharma 1. https://github.com/databricks/spark-xml On Tue, Apr 19, 2016 at 10:31 AM, Deepak Sharma <deepakmc...@gmail.com> wrote: > Hi all, > I am looking for an architecture to ingest 10 mils of messages in the > micro batches of seconds. > If

Re: Renaming sc variable in sparkcontext throws task not serializable

2016-03-02 Thread Prashant Sharma
This is a known issue. https://issues.apache.org/jira/browse/SPARK-3200 Prashant Sharma On Thu, Mar 3, 2016 at 9:01 AM, Rahul Palamuttam <rahulpala...@gmail.com> wrote: > Thank you Jeff. > > I have filed a JIRA under the following link : > > https://issues.apache.

Re: External JARs not loading Spark Shell Scala 2.11

2015-04-09 Thread Prashant Sharma
) are planning to work, I can help you ? Prashant Sharma On Thu, Apr 9, 2015 at 3:08 PM, anakos ana...@gmail.com wrote: Hi- I am having difficulty getting the 1.3.0 Spark shell to find an external jar. I have build Spark locally for Scala 2.11 and I am starting the REPL as follows: bin/spark

UnsatisfiedLinkError related to libgfortran when running MLLIB code on RHEL 5.8

2015-03-03 Thread Prashant Sharma
Hi Folks, We are trying to run the following code from the spark shell in a CDH 5.3 cluster running on RHEL 5.8. spark-shell --master yarn --deploy-mode client --num-executors 15 --executor-cores 6 --executor-memory 12G import org.apache.spark.mllib.recommendation.ALS import

Re: Bind Exception

2015-01-19 Thread Prashant Sharma
, That is just a warning. FYI, Spark ignores BindException and probes for the next available port and continues. So your application is fine if that particular error comes up. Prashant Sharma On Tue, Jan 20, 2015 at 10:30 AM, Deep Pradhan pradhandeep1...@gmail.com wrote: Yes, I have increased the driver

Re: Is it safe to use Scala 2.11 for Spark build?

2014-11-17 Thread Prashant Sharma
/patch-3/docs/building-spark.md Prashant Sharma On Tue, Nov 18, 2014 at 12:19 PM, Jianshi Huang jianshi.hu...@gmail.com wrote: Any notable issues for using Scala 2.11? Is it stable now? Or can I use Scala 2.11 in my spark application and use Spark dist build with 2.10 ? I'm looking forward

Re: Is it safe to use Scala 2.11 for Spark build?

2014-11-17 Thread Prashant Sharma
Looks like sbt/sbt -Pscala-2.11 is broken by a recent patch for improving the Maven build. Prashant Sharma On Tue, Nov 18, 2014 at 12:57 PM, Prashant Sharma scrapco...@gmail.com wrote: It is safe in the sense we would help you with the fix if you run into issues. I have used it, but since I

Re: Spray client reports Exception: akka.actor.ActorSystem.dispatcher()Lscala/concurrent/ExecutionContext

2014-10-29 Thread Prashant Sharma
spray depends on and use the akka spark depends on. Prashant Sharma On Wed, Oct 29, 2014 at 9:27 AM, Jianshi Huang jianshi.hu...@gmail.com wrote: I'm using Spark built from HEAD, I think it uses modified Akka 2.3.4, right? Jianshi On Wed, Oct 29, 2014 at 5:53 AM, Mohammed Guller moham

Re: Spark SQL reduce number of java threads

2014-10-28 Thread Prashant Sharma
What is the motivation behind this ? You can start with master as local[NO_OF_THREADS]. Reducing the threads at all other places can have unexpected results. Take a look at this. http://spark.apache.org/docs/latest/configuration.html. Prashant Sharma On Tue, Oct 28, 2014 at 2:08 PM, Wanda

Re: unable to make a custom class as a key in a pairrdd

2014-10-23 Thread Prashant Sharma
Are you doing this in REPL ? Then there is a bug filed for this, I just can't recall the bug ID at the moment. Prashant Sharma On Fri, Oct 24, 2014 at 4:07 AM, Niklas Wilcke 1wil...@informatik.uni-hamburg.de wrote: Hi Jao, I don't really know why this doesn't work but I have two hints

Re: Default spark.deploy.recoveryMode

2014-10-15 Thread Prashant Sharma
[Removing dev lists] You are absolutely correct about that. Prashant Sharma On Tue, Oct 14, 2014 at 5:03 PM, Priya Ch learnings.chitt...@gmail.com wrote: Hi Spark users/experts, In Spark source code (Master.scala Worker.scala), when registering the worker with master, I see the usage

Re: Default spark.deploy.recoveryMode

2014-10-15 Thread Prashant Sharma
So if you need those features you can go ahead and setup one of Filesystem or zookeeper options. Please take a look at: http://spark.apache.org/docs/latest/spark-standalone.html. Prashant Sharma On Wed, Oct 15, 2014 at 3:25 PM, Chitturi Padma learnings.chitt...@gmail.com wrote: which means

Re: Nested Case Classes (Found and Required Same)

2014-09-12 Thread Prashant Sharma
What is your spark version ? This was fixed I suppose. Can you try it with latest release ? Prashant Sharma On Fri, Sep 12, 2014 at 9:47 PM, Ramaraju Indukuri iramar...@gmail.com wrote: This is only a problem in shell, but works fine in batch mode though. I am also interested in how others

Re: .sparkrc for Spark shell?

2014-09-03 Thread Prashant Sharma
Hey, You can use spark-shell -i sparkrc to do this. Prashant Sharma On Wed, Sep 3, 2014 at 2:17 PM, Jianshi Huang jianshi.hu...@gmail.com wrote: To make my shell experience merrier, I need to import several packages, and define implicit sparkContext and sqlContext. Is there a startup
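For example, a startup file could look like this (the file name and contents are only an illustration; the SQLContext idiom matches Spark of that era, and `sc` is the SparkContext the shell has already created):

```scala
// ~/sparkrc -- load with: bin/spark-shell -i ~/sparkrc
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sqlContext._
println("sparkrc loaded: sqlContext is ready")
```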

Re: spark streaming actor receiver doesn't play well with kryoserializer

2014-07-31 Thread Prashant Sharma
-framework/chill-akka) might help. I am not well aware of how Kryo works internally; maybe someone else can throw some light on this. Prashant Sharma On Sat, Jul 26, 2014 at 6:26 AM, Alan Ngai a...@opsclarity.com wrote: The stack trace was from running the Actor count sample directly

Re: Emacs Setup Anyone?

2014-07-26 Thread Prashant Sharma
it is kinda fast to do things like tag prediction at point, which is not accurate etc., but it's useful. In case you are working on building this (inferior mode for the Spark REPL) for us, I can come up with a wishlist. Prashant Sharma On Sat, Jul 26, 2014 at 3:07 AM, Andrei faithlessfri...@gmail.com wrote: I

Re: ZeroMQ Stream - stack guard problem and no data

2014-06-04 Thread Prashant Sharma
Hi, What is your ZeroMQ version? It is known to work well with 2.2; the output of `sudo ldconfig -v | grep zmq` would be helpful in this regard. Thanks Prashant Sharma On Wed, Jun 4, 2014 at 11:40 AM, Tobias Pfeiffer t...@preferred.jp wrote: Hi, I am trying to use Spark Streaming (1.0.0

Re: Apache Spark is not building in Mac/Java 8

2014-05-02 Thread Prashant Sharma
%3DGJh1g2zxOJd02Wt7L06mCLjo-vwwG9Q%40mail.gmail.com%3E Prashant Sharma On Fri, May 2, 2014 at 3:56 PM, N.Venkata Naga Ravi nvn_r...@hotmail.comwrote: Hi, I am trying to build Apache Spark with Java 8 on my Mac system (OS X 10.8.5), but am getting the following exception. Please help on resolving

Re: Apache Spark is not building in Mac/Java 8

2014-05-02 Thread Prashant Sharma
I have pasted the link in my previous post. Prashant Sharma On Fri, May 2, 2014 at 4:15 PM, N.Venkata Naga Ravi nvn_r...@hotmail.comwrote: Thanks for your quick reply. I tried with a fresh installation, it downloads sbt 0.12.4 only (please check below logs). So it is not working. Can you

Re: when to use broadcast variables

2014-05-02 Thread Prashant Sharma
I'd like to be corrected on this, but I am just trying to say small enough, on the order of a few hundred MBs. Imagine the size gets shipped to all nodes; it can be a GB but not GBs, and then it depends on the network too. Prashant Sharma On Fri, May 2, 2014 at 6:42 PM, Diana Carroll dcarr
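In other words, broadcast suits lookup data of modest size; a minimal sketch (the map and RDD contents are just toy stand-ins):

```scala
// A small lookup table (imagine ~100 MB in practice) shipped once per
// executor instead of once per task closure.
val lookup = Map("a" -> 1, "b" -> 2)
val bLookup = sc.broadcast(lookup)

val result = sc.parallelize(Seq("a", "b", "a", "c"))
  .map(k => bLookup.value.getOrElse(k, 0))
  .collect()
```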

Re: Issue during Spark streaming with ZeroMQ source

2014-04-29 Thread Prashant Sharma
with zeromq 2.2.0 and if you have jzmq libraries installed performance is much better. Prashant Sharma On Tue, Apr 29, 2014 at 12:29 PM, Francis.Hu francis...@reachjunction.comwrote: Hi, all I installed spark-0.9.1 and zeromq 4.0.1 , and then run below example: ./bin/run-example

Re: 答复: Issue during Spark streaming with ZeroMQ source

2014-04-29 Thread Prashant Sharma
Well that is not going to be easy, simply because we depend on akka-zeromq for zeromq support. And since akka does not support the latest zeromq library yet, I doubt if there is something simple that can be done to support it. Prashant Sharma On Tue, Apr 29, 2014 at 2:44 PM, Francis.Hu francis

Re: Need help about how hadoop works.

2014-04-24 Thread Prashant Sharma
Prashant Sharma On Thu, Apr 24, 2014 at 12:15 PM, Carter gyz...@hotmail.com wrote: Thanks Mayur. So without Hadoop and any other distributed file systems, by running: val doc = sc.textFile(/home/scalatest.txt,5) doc.count we can only get parallelization within the computer where

Re: Need help about how hadoop works.

2014-04-24 Thread Prashant Sharma
It is the same file, and the Hadoop library that we use for splitting takes care of assigning the right split to each node. Prashant Sharma On Thu, Apr 24, 2014 at 1:36 PM, Carter gyz...@hotmail.com wrote: Thank you very much for your help Prashant. Sorry I still have another question about your
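Following the thread's own example, the split behaviour can be observed like this (assumes a running shell with `sc`; the file path comes from the thread):

```scala
// The second argument is only a minimum-partitions hint; the Hadoop
// input-format code computes the actual splits and assigns each one to a task.
val doc = sc.textFile("/home/scalatest.txt", 5)
println(doc.partitions.size)
doc.count()
```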

Re: K-means faster on Mahout then on Spark

2014-03-25 Thread Prashant Sharma
I think Mahout uses FuzzyKMeans, which is a different algorithm and is not iterative. Prashant Sharma On Tue, Mar 25, 2014 at 6:50 PM, Egor Pahomov pahomov.e...@gmail.comwrote: Hi, I'm running benchmark, which compares Mahout and SparkML. For now I have next results for k-means: Number