date:20210729

How can I sync 2 hive cluster

2021-07-29 Thread igyu

I want to read data from Hive cluster1 and write it to Hive cluster2. How can I do it? Note: both cluster1 and cluster2 have Kerberos enabled. igyu

Running Spark Rapids on GPU-Powered Spark Cluster

2021-07-29 Thread Artemis User

Has anyone had any experience with running Spark-Rapids on a GPU-powered cluster (https://github.com/NVIDIA/spark-rapids)? I am very interested in knowing: 1. What is the hardware/software platform and the type of Spark cluster you are using to run Spark-Rapids? 2. How easy was the

Hacking my way through Kubernetes docker file

2021-07-29 Thread Mich Talebzadeh

You may recall that I raised a few questions here and in Stacktrace regarding two items, both related to running PySpark inside Kubernetes. The challenges were: 1. Loading third-party packages like tensorflow, numpy, pyyaml in a job running in k8s 2. How to read from a yaml file to load

Re: Well balanced Python code with Pandas compared to PySpark

2021-07-29 Thread Mich Talebzadeh

Yes indeed, very good points by the Artemis User. Just to add, if I may: why choose Spark? Generally, a parallel architecture comes into play when the data is too large to be handled on a single machine; that is when using Spark becomes meaningful. In cases where (the

Re: Well balanced Python code with Pandas compared to PySpark

2021-07-29 Thread Artemis User

PySpark still uses Spark dataframes underneath (it wraps Java code). Use PySpark when you have to deal with big data ETL and analytics, so you can leverage the distributed architecture in Spark. If your job is simple, the dataset is relatively small, and it doesn't require distributed processing, use

Re: Connection Reset by Peer : failed to remove cached rdd

2021-07-29 Thread Artemis User

Can you please post the error log/exception messages? There is not enough info to help diagnose what the real problem is. On 7/29/21 8:55 AM, Big data developer need help relat to spark gateway roles in 2.0 wrote: Hi Team, We are facing an issue in production where we are getting frequent

Well balanced Python code with Pandas compared to PySpark

2021-07-29 Thread ashok34...@yahoo.com.INVALID

Hello team, Someone asked me about well-developed Python code using Pandas dataframes and how it compares to PySpark. Under what situations would one choose PySpark instead of Python and Pandas? Appreciate it, AK

Connection Reset by Peer : failed to remove cached rdd

2021-07-29 Thread Big data developer need help relat to spark gateway roles in 2 . 0

Hi Team, We are facing an issue in production where we are getting frequent "Still have 1 request outstanding when connection with the hostname was closed" and "connection reset by peer" errors, as well as the warnings "failed to remove cache rdd" or "failed to remove broadcast variable". Please help us how to

Connection Reset by Peer : failed to remove cached rdd

2021-07-29 Thread Big data developer need help relat to spark gateway roles in 2 . 0

Hi Team, We are facing an issue in production where we are getting frequent "Still have 1 request outstanding when connection with the hostname was closed" and "connection reset by peer" errors, as well as the warnings "failed to remove cache rdd" or "failed to remove broadcast variable". Please help us how to

Re: Spark Architecture Question

2021-07-29 Thread Pasha Finkelshteyn

Hi Renganathan, Not quite. It strongly depends on your usage of UDFs defined in any manner, whether as UDF objects or just lambdas. If you have any, they may and will be called on executors too. On 21/07/29 05:17, Renganathan Mutthiah wrote: > Hi, > > I have read in many materials (including from the

Spark Architecture Question

2021-07-29 Thread Renganathan Mutthiah

Hi, I have read in many materials (including the book Spark - The Definitive Guide) that Spark is a compiler. As I understand it, our program is used only up to the point of DAG generation. This portion can be written in any language: Java, Scala, R, or Python. After that (executing the DAG), the

11 matches
