Hello,
Is Kubernetes dynamic executor scaling for Spark available in the latest
release of Spark?
I mean scaling the executors based on the workload, as opposed to
preallocating a number of executors for a Spark job.
Thanks,
Purna
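For context on the question above: at the time of Spark 2.3/2.4, dynamic allocation was not yet supported on the Kubernetes scheduler backend (it relied on an external shuffle service that did not exist on k8s). It later became usable in Spark 3.x via shuffle tracking. A sketch, with a placeholder API server address, image, and jar:

```shell
# Sketch (Spark 3.x only): dynamic allocation on Kubernetes without an
# external shuffle service, using shuffle tracking to decide when idle
# executors can be removed. <k8s-apiserver>, image, and jar are placeholders.
spark-submit \
  --master k8s://https://<k8s-apiserver>:443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=<my-spark-image> \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=10 \
  local:///opt/spark/jars/my-job.jar
```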
Thanks, this is great news.
Can you please let me know if dynamic resource allocation is available in
Spark 2.4?
I'm using Spark 2.3.2 on Kubernetes. Do I still need to provide executor
memory options as part of the spark-submit command, or will Spark manage the
required executor memory based on the Spark job?
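To the question above: on Spark 2.3.x, executor memory is not inferred from the job; it has to be requested up front (it defaults to 1g otherwise). A sketch with placeholder master URL, image, and jar:

```shell
# Sketch: executor resources must be declared explicitly on Spark 2.3.x/k8s.
# <k8s-apiserver> and <my-spark-image> are placeholders.
spark-submit \
  --master k8s://https://<k8s-apiserver>:443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=<my-spark-image> \
  --conf spark.executor.instances=5 \
  --conf spark.executor.memory=4g \
  --conf spark.executor.cores=2 \
  --conf spark.driver.memory=2g \
  local:///opt/spark/jars/my-job.jar
```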
Hello,
We're running Spark 2.3.1 on Kubernetes v1.11.0 and our driver pods are
getting stuck in the initializing state, like so:

NAME                                            READY  STATUS    RESTARTS  AGE
my-pod-fd79926b819d3b34b05250e23347d0e7-driver  0/1    Init:0/1  0         18h
And from *kubectl
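A usual first step for a pod stuck in Init:0/1 is to inspect the init container's events and logs. A sketch, using the pod name from the listing above (the init container name spark-init is an assumption based on Spark 2.3's default):

```shell
# Events at the bottom of describe usually show why the init container
# hasn't finished (image pull, volume mount, dependency download, etc.)
kubectl describe pod my-pod-fd79926b819d3b34b05250e23347d0e7-driver

# Logs of the init container itself (container name is assumed)
kubectl logs my-pod-fd79926b819d3b34b05250e23347d0e7-driver -c spark-init
```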
Resurfacing the question to get more attention.
Hello,

I'm running a Spark 2.3 job on a Kubernetes cluster.

kubectl version

Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3",
GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", GitTreeState:"clean",
BuildDate:"2018-02-09T21:51:06Z", GoVersion:"go1.9.4", Compiler:"gc",
va:858)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
at java.lang.Thread.run(Thread.java:
On Tue, Jul 31, 2018 at 8:32 AM purna pradeep wrote:

> Hello,
>
> I'm getting the below error in the Spark driver pod logs, and executor pods
> are getting killed midway while the job is running; even the driver pod
> terminates with the below intermittent error. This happens if I run
> multiple jobs in parallel.
>
> Not able to see the executor logs, as the executor pods
Hello,

When I'm trying to set the below options on the spark-submit command on the
k8s master, I'm getting the below error in the spark-driver pod logs:

--conf spark.executor.extraJavaOptions=" -Dhttps.proxyHost=myhost
-Dhttps.proxyPort=8099 -Dhttp.useproxy=true -Dhttps.protocols=TLSv1.2" \
--conf
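One observation on the options above: -Dhttp.useproxy is not one of the standard JVM networking properties (those are http.proxyHost, http.proxyPort, https.proxyHost, https.proxyPort, and http.nonProxyHosts), so it would be silently ignored at best. A sketch using only the standard properties; the API server address, proxy host/port, and jar are placeholders:

```shell
# Sketch: proxy settings via the standard JVM properties only.
# <k8s-apiserver>, myhost:8099, and the jar path are placeholders.
spark-submit \
  --master k8s://https://<k8s-apiserver>:443 \
  --deploy-mode cluster \
  --conf spark.driver.extraJavaOptions="-Dhttps.proxyHost=myhost -Dhttps.proxyPort=8099 -Dhttps.protocols=TLSv1.2" \
  --conf spark.executor.extraJavaOptions="-Dhttps.proxyHost=myhost -Dhttps.proxyPort=8099 -Dhttps.protocols=TLSv1.2" \
  local:///opt/spark/jars/my-job.jar
```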
Hello,
When I run spark-submit on the k8s cluster, I'm seeing the driver pod stuck
in the Running state, and when I pulled the driver pod logs I'm able to see
the below log.
I do understand that this warning might be because of a lack of CPU/memory,
but I'd expect the driver pod to be in the "Pending" state rather than
"Running".
I'm reading the below JSON in Spark:
{"bucket": "B01", "actionType": "A1", "preaction": "NULL", "postaction": "NULL"}
{"bucket": "B02", "actionType": "A2", "preaction": "NULL", "postaction": "NULL"}
{"bucket": "B03", "actionType": "A3", "preaction": "NULL", "postaction": "NULL"}
val
t/crashloop due to
> lack of resource.
>
> On Tue, May 29, 2018 at 3:18 PM, purna pradeep
> wrote:
>
>> Hello,
>>
>> I’m getting below error when I spark-submit a Spark 2.3 app on
>> Kubernetes *v1.8.3* , some of the executor pods were killed with below
>
Hello,

I'm getting the below intermittent error when I spark-submit a Spark 2.3 app
on Kubernetes v1.8.3; some of the executor pods were killed with the below
error as soon as they come up:

Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
at
Hello,
Currently I observe that dead pods are not getting garbage collected (i.e.
Spark driver pods which have completed execution), so pods could potentially
sit in the namespace for weeks. This makes listing, parsing, and reading pods
slower, as well as leaving junk on the cluster.
I believe
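Until that is automated, completed driver pods can be cleaned up out of band. A sketch, assuming the default spark-role=driver label that Spark on k8s puts on driver pods, and a placeholder namespace:

```shell
# Sketch: delete finished Spark driver pods in a namespace.
# The spark-jobs namespace is a placeholder; the spark-role=driver label is
# the one Spark's Kubernetes backend applies by default.
kubectl -n spark-jobs delete pod \
  -l spark-role=driver \
  --field-selector=status.phase=Succeeded
kubectl -n spark-jobs delete pod \
  -l spark-role=driver \
  --field-selector=status.phase=Failed
```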
Hi,
What would be the recommended approach to wait for the Spark driver pod to
complete the currently running job before it gets evicted to new nodes,
while maintenance on the current node (kernel upgrade, hardware maintenance,
etc.) is going on, using the drain command?
I don't think I can use
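One common pattern for the maintenance scenario above is to cordon first so no new drivers land on the node, then drain with a generous grace period; a PodDisruptionBudget on the driver pods can additionally block eviction until the job finishes. A sketch with a hypothetical node name:

```shell
# Sketch: node name is a placeholder. Cordon stops new pods scheduling here;
# drain then evicts existing pods, respecting any PodDisruptionBudget and
# giving each pod up to --grace-period seconds to terminate.
kubectl cordon ip-10-0-1-23.ec2.internal
kubectl drain ip-10-0-1-23.ec2.internal \
  --ignore-daemonsets \
  --delete-local-data \
  --grace-period=600 \
  --timeout=30m
```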
+ Joe
On Mon, May 21, 2018 at 2:56 PM purna pradeep <purna2prad...@gmail.com>
wrote:
> I do know only to some extent , I mean If you see my sample s3 locations
>
> s3a://mybucket/20180425_111447_data1/_SUCCESS
>
> s3a://mybucket/20180424_111241_data1/_SUCCESS
>
>
isenbergwoodworking.com> wrote:
>
> > I suggest it’ll work for your needs.
> >
> > Sent from a device with less than stellar autocorrect
> >
> > > On May 21, 2018, at 10:16 AM, purna pradeep <purna2prad...@gmail.com>
> > wrote:
> > >
>
Cseh <gezap...@cloudera.com> wrote:
> Wow, great work!
> Can you please summarize the required steps? This would be useful for
> others so we probably should add it to our documentation.
> Thanks in advance!
> Peter
>
> On Fri, May 18, 2018 at 11:33 PM, purna prade
Hello,

Event-triggered Oozie datasets:

1) Does Oozie support event triggers, i.e. triggering a workflow based on a
file arrival on AWS S3?

As per my understanding, based on the start date mentioned on the
coordinator, it can poll for a file on S3 and, once the dependency is met,
execute an action/SparkAction.
.sh
But now I’m getting this error
On Thu, May 17, 2018 at 2:53 PM purna pradeep <purna2prad...@gmail.com>
wrote:
> Ok I got passed this error
>
> By rebuilding Oozie with -Dhttpclient.version=4.5.5 -Dhttpcore.version=4.4.9
>
> now getting this error
>
>
credentials from service
endpoint]
On Thu, May 17, 2018 at 12:24 PM purna pradeep <purna2prad...@gmail.com>
wrote:
>
> Peter,
>
> Also When I submit a job with new http client jar, I get
>
> ```Error: IO_ERROR : java.io.IOException: Error while connecting Oozie
> server. N
purna pradeep <purna2prad...@gmail.com>
wrote:
> Ok I have tried this
>
> It appears that s3a support requires httpclient 4.4.x and oozie is bundled
> with httpclient 4.3.6. When httpclient is upgraded, the ext UI stops
> loading.
>
>
>
> On Thu, May 17, 2
ames are slightly different so you'll have to change
> the example I've given.
>
>
>
> On Thu, May 17, 2018 at 4:16 PM, purna pradeep <purna2prad...@gmail.com>
> wrote:
>
>> Peter,
>>
>> I’m using latest oozie 5.0.0 and I have tried below changes but no luck
https://search.maven.org/#artifactdetails%7Corg.apache.spark%7Cspark-kubernetes_2.11%7C2.3.0%7Cjar
> >
> in
> the sharelib/spark/pom.xml as a compile-time dependency.
>
> gp
>
> On Tue, May 15, 2018 at 9:04 PM, purna pradeep <purna2prad...@gmail.com>
> wrote:
&g
(respectively)
I have tried adding AWS access ,secret keys in
oozie-site.xml and hadoop core-site.xml , and hadoop-config.xml
On Wed, May 16, 2018 at 2:30 PM purna pradeep <purna2prad...@gmail.com>
wrote:
>
> I have tried this ,just adde
)
at
org.apache.oozie.service.HadoopAccessorService$5.run(HadoopAccessorService.java:623
On Wed, May 16, 2018 at 2:19 PM purna pradeep <purna2prad...@gmail.com>
wrote:
> This is what is in the logs
>
> 2018-05-16 14:06:13,500 INFO URIHandlerService:520 - SERVER[loca
ses));
> LOG.info("Loaded default urihandler {0}",
> defaultHandler.getClass().getName());
> Thanks
>
> On Wed, May 16, 2018 at 5:47 PM, purna pradeep <purna2prad...@gmail.com>
> wrote:
>
>> This is what I already have in my oozie-site.xml
>&
stems
> hdfs,hftp,webhdfs Enlist
> the different filesystems supported for federation. If wildcard "*" is
> specified, then ALL file schemes will be allowed.properly.
>
> For testing purposes it's ok to put * in there in oozie-site.xml
>
> On Wed, May 16, 2018 at 5:29
+Peter
On Wed, May 16, 2018 at 11:29 AM purna pradeep <purna2prad...@gmail.com>
wrote:
> Peter,
>
> I have tried to specify dataset with uri starting with s3://, s3a:// and
> s3n:// and I am getting exception
>
>
>
> Exception occurred:E0904: Scheme [s3] not sup
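The E0904 "Scheme [s3] not supported" error above is what the supported-filesystems property in oozie-site.xml controls (the thread later mentions "*" as a testing-only wildcard). A sketch of the property block to add, assuming the s3a scheme is the one needed:

```shell
# Sketch: property block for oozie-site.xml enlisting s3a alongside the
# defaults. Emitted here so it can be copied into the config file.
cat <<'EOF'
<property>
  <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
  <value>hdfs,hftp,webhdfs,s3a</value>
</property>
EOF
```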
tion on how to make it work in jobs, something similar should work
> on the server side as well
>
> On Tue, May 15, 2018 at 4:43 PM, purna pradeep <purna2prad...@gmail.com>
> wrote:
>
> > Thanks Andras,
> >
> > Also I also would like to know if oozie supports
artemerv...@gmail.com> wrote:
>
> > Did you run
> > mvn clean install first on the parent directory?
> >
> > On Tue, May 15, 2018, 11:35 AM purna pradeep <purna2prad...@gmail.com>
> > wrote:
> >
> > > Thanks peter,
> >
f you overwrite the -Dspark.version and compile Oozie that way it
> will work.
> gp
>
>
> On Tue, May 15, 2018 at 5:07 PM, purna pradeep <purna2prad...@gmail.com>
> wrote:
>
> > Hello,
> >
> > Does oozie supports spark 2.3? Or will it even care of the spark versi
Hello,
Does Oozie support Spark 2.3? Or does it even care about the Spark version?
I want to use a Spark action.
Thanks,
Purna
on.
>
> At the moment it's not feasible to install Oozie without those Hadoop
> components. How to install Oozie, please find here:
> <https://oozie.apache.org/docs/5.0.0/AG_Install.html>
>
> Regards,
>
> Andras
>
> On Tue, May 15, 2018 at 4:11 PM, purna pradeep &
Hi,
I'm very new to Oozie. Actually, I would like to run Spark 2.3 jobs on Oozie
based on file arrival on AWS S3, which is a dependency for the job.
I see some examples which use S3 input event datasets, as below:
s3n://mybucket/a/b/${YEAR}/${MONTH}/${DAY}
So my question is, does oozie
Hello,
Would like to know if anyone has tried Oozie with Spark 2.3 actions on
Kubernetes for scheduling Spark jobs.
Thanks,
Purna
Yes, a "REST application that submits a Spark job to a k8s cluster by running
spark-submit programmatically", and I would also like to expose it as a
Kubernetes service so that clients can access it like any other REST API.
On Wed, Apr 4, 2018 at 12:25 PM Yinan Li wrote:
> Hi Kittu,
>
>
-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
kApplication CRD objects and
> automatically submits the applications to run on a Kubernetes cluster.
>
> Yinan
>
> On Tue, Mar 20, 2018 at 7:47 PM, purna pradeep <purna2prad...@gmail.com>
> wrote:
>
>> Im using kubernetes cluster on AWS to run spark jobs ,im using spa
I'm using a Kubernetes cluster on AWS to run Spark jobs, and I'm using Spark
2.3. Now I want to run spark-submit from an AWS Lambda function to the k8s
master, and would like to know if there is any REST interface to run
spark-submit on the k8s master.
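One approach a reply in this thread points at is the spark-on-k8s-operator: instead of a REST endpoint for spark-submit, Lambda can create a SparkApplication CRD object through the Kubernetes HTTPS API. A sketch, assuming the operator and its CRD are installed; name, namespace, image, class, jar, and apiVersion are placeholders/assumptions:

```shell
# Sketch: submitting a job by creating a SparkApplication object.
# All values below are illustrative; the operator watches these objects
# and runs spark-submit on the cluster's behalf.
kubectl apply -f - <<'EOF'
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: my-spark-job
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: <my-spark-image>
  mainClass: com.example.MyJob
  mainApplicationFile: local:///opt/spark/jars/my-job.jar
  sparkVersion: "2.3.0"
  driver:
    cores: 1
    memory: 2g
    serviceAccount: spark
  executor:
    instances: 3
    cores: 2
    memory: 4g
EOF
```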
Hello, I'm using Cluster Autoscaler (CA) v1.0.4 from Kubernetes for AWS node
autoscaling (Kubernetes version 1.8.3):
https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/cloudprovider/aws
And I'm getting the below error in the logs of the CA pod:
node registry: RequestError: send request failed
caused
ed in your cluster? This issue
> https://github.com/apache-spark-on-k8s/spark/issues/558 might help.
>
>
> On Sun, Mar 11, 2018 at 5:01 PM, purna pradeep <purna2prad...@gmail.com>
> wrote:
>
>> Getting below errors when I’m trying to run spark-submit on k8 cluster
>>
Getting the below errors when I'm trying to run spark-submit on the k8s
cluster.

Error 1: This looks like a warning; it doesn't interrupt the app running
inside the executor pod, but I keep on getting this warning:

2018-03-09 11:15:21 WARN WatchConnectionManager:192 - Exec Failure
I'm trying to run spark-submit to a Kubernetes cluster with the Spark 2.3
Docker container image.
The challenge I'm facing is that the application has a mainapplication.jar
and other dependency files & jars which are located in a remote location like
AWS S3, but as per the Spark 2.3 documentation there is something
Hi all,
I'm performing spark-submit using the Spark REST API POST operation on port
6066 with the below config:
> Launch Command:
> "/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.141-1.b16.el7_3.x86_64/jre/bin/java"
> "-cp" "/usr/local/spark/conf/:/usr/local/spark/jars/*" "-Xmx4096M"
>
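For reference, the standalone REST submission endpoint on port 6066 used above can also be exercised directly with curl. A sketch; the API is undocumented, and the host, jar path, class, and Spark version below are placeholders:

```shell
# Sketch: create a submission via the standalone REST gateway.
# <master-host>, jar, and mainClass are placeholders.
curl -X POST http://<master-host>:6066/v1/submissions/create \
  --header "Content-Type: application/json" \
  --data '{
    "action": "CreateSubmissionRequest",
    "appResource": "file:/path/to/my-job.jar",
    "clientSparkVersion": "2.3.0",
    "mainClass": "com.example.MyJob",
    "appArgs": [],
    "environmentVariables": { "SPARK_ENV_LOADED": "1" },
    "sparkProperties": {
      "spark.master": "spark://<master-host>:7077",
      "spark.app.name": "my-job",
      "spark.submit.deployMode": "cluster",
      "spark.jars": "file:/path/to/my-job.jar",
      "spark.driver.memory": "4g"
    }
  }'
```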
Hi,
I'm using Spark standalone in AWS EC2, and I'm using the Spark REST API on
port 8080 (/json) to get completed apps, but the JSON shows completed apps as
an empty array even though the job ran successfully.
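Two things worth checking for the empty-array symptom above (stated as suggestions, not a confirmed diagnosis): the standalone master keeps completed applications only in memory, so a master restart empties completedapps; and the /json endpoint must be hit on the master web UI port, not a worker's. A sketch with a placeholder host:

```shell
# Sketch: the master UI's JSON view; the response carries "activeapps"
# and "completedapps" arrays. <master-host> is a placeholder.
curl -s http://<master-host>:8080/json
```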
ested but maybe with @ayan sql
>
> spark.sql("select *, row_number(), last_value(income) over (partition by
> id order by income_age_ts desc) r from t")
>
>
> On Tue, Aug 29, 2017 at 11:30 PM, purna pradeep <purna2prad...@gmail.com>
> wrote:
>
>> @ay
3| ES|101| 19000| 4/20/17| 1|
> | 4/20/12| DS|102| 13000| 5/9/17| 1|
> +++---+--+-+---+
>
> This should be better because it uses all in-built optimizations in Spark.
>
> Best
> Ayan
>
> On Wed, Aug 30, 2017 at 11:06 AM, purna pradeep <
Please click on unnamed text/html link for better view
On Tue, Aug 29, 2017 at 8:11 PM purna pradeep <purna2prad...@gmail.com>
wrote:
>
> -- Forwarded message -
> From: Mamillapalli, Purna Pradeep <
> purnapradeep.mamillapa...@capitalone.com>
> Date: T
va().calculateExpense(pexpense.toDouble,
> cexpense.toDouble))
>
>
>
>
>
> On Tue, Aug 29, 2017 at 6:53 AM, purna pradeep <purna2prad...@gmail.com>
> wrote:
>
>> I have data in a DataFrame with below columns
>>
>> 1)Fileformat is csv
>> 2)All below
-- Forwarded message -
From: Mamillapalli, Purna Pradeep <purnapradeep.mamillapa...@capitalone.com>
Date: Tue, Aug 29, 2017 at 8:08 PM
Subject: Spark question
To: purna pradeep <purna2prad...@gmail.com>
Below is the input Dataframe(In real this is a very lar
I have data in a DataFrame with the below columns:
1) File format is CSV
2) All below column datatypes are String
employeeid, pexpense, cexpense
Now I need to create a new DataFrame which has a new column called `expense`,
which is calculated based on the columns `pexpense` and `cexpense`.
The tricky part is
And also, is query.stop() a graceful stop operation? What happens to already
received data? Will it be processed?
On Tue, Aug 15, 2017 at 7:21 PM purna pradeep <purna2prad...@gmail.com>
wrote:
> Ok thanks
>
> Few more
>
> 1.when I looked into the documentation it
rigger/batch after the asynchronous
> unpersist+persist will probably take longer as it has to reload the data.
>
>
> On Tue, Aug 15, 2017 at 2:29 PM, purna pradeep <purna2prad...@gmail.com>
> wrote:
>
>> Thanks Tathagata Das, actually I'm planning to do something like this
>&g
ion and
> restart query
> activeQuery.stop()
> activeQuery = startQuery()
>}
>
>activeQuery.awaitTermination(100) // wait for 100 ms.
>// if there is any error it will throw exception and quit the loop
>// otherwise it will keep checking the conditi
:
> See
> https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#recovering-from-failures-with-checkpointing
>
> Though I think that this currently doesn't work with the console sink.
>
> On Tue, Aug 15, 2017 at 9:40 AM, purna pradeep <purn
Hi,
>
> I'm trying to restart a streaming query to refresh cached data frame
>
> Where and how should I restart streaming query
>
val sparkSes = SparkSession
  .builder
  .config("spark.master", "local")
  .appName("StreamingCahcePoc")
  .getOrCreate()
import
I'm working on a structured streaming application wherein I'm reading from
Kafka as a stream, and for each batch of streams I need to perform an S3
lookup against a file (which is nearly 200 GB) to fetch some attributes. So
I'm using df.persist() (basically caching the lookup), but I need to refresh
the dataframe as the