RE: Re: Using Avro file format with SparkSQL

2022-02-14 Thread Morven Huang
Hi Steve, you're correct about the '--packages' option; it seems my memory does not serve me well :)
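As a concrete illustration of the approach discussed in this thread, a spark-submit invocation using '--packages' for spark-avro might look like the following sketch (the application jar name, class name, and Spark/Scala versions are placeholders, not taken from the thread):

```shell
# Pull spark-avro (and its transitive dependencies) from a Maven repository
# at submit time, instead of baking it into a fat/uber jar. The artifact's
# Scala suffix and version must match the cluster's Spark build;
# 2.12 / 3.2.1 here are illustrative only.
spark-submit \
  --packages org.apache.spark:spark-avro_2.12:3.2.1 \
  --class com.example.MyAvroJob \
  my-application.jar
```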

Re: Using Avro file format with SparkSQL

2022-02-14 Thread Stephen Coy
Hi Morven, We use --packages for all of our spark jobs. Spark downloads the specified jar and all of its dependencies from a Maven repository. This means we never have to build fat or uber jars. It does mean that the Apache Ivy configuration has to be set up correctly, though. Cheers, Steve C

RE: Using Avro file format with SparkSQL

2022-02-14 Thread Morven Huang
I wrote a toy Spark job and ran it within my IDE; I get the same error if I don't add spark-avro to my pom.xml. After adding the spark-avro dependency to my pom.xml, everything works fine. Another thing is, if my memory serves me right, the spark-submit option for extra jars is '--jars', not
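For reference, the Maven dependency Morven describes adding would look something like this (the version shown is illustrative and should match your Spark version):

```xml
<!-- spark-avro is not bundled with the core Spark distribution; add it
     explicitly. The Scala suffix and version must match your Spark build. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-avro_2.12</artifactId>
  <version>3.2.1</version>
</dependency>
```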

Unsubscribe

2022-02-14 Thread William R
Unsubscribe

Position for 'cf.content' not found in row

2022-02-14 Thread 潘明文
Hi, could you help me with the issue below? Thanks! This is my source code: SparkConf sparkConf = new SparkConf(true); sparkConf.setAppName(ESTest.class.getName()); SparkSession spark = null; sparkConf.setMaster("local[*]"); sparkConf.set("spark.cleaner.ttl", "3600"); sparkConf.set("es.nodes",

Re: Apache spark 3.0.3 [Spark lower version enhancements]

2022-02-14 Thread Sean Owen
What vulnerabilities are you referring to? I'm not aware of any critical outstanding issues, but not sure what you have in mind either. See https://spark.apache.org/versioning-policy.html - 3.0.x is EOL about now, which doesn't mean there can't be another release, but would not generally expect

Re: Spark kubernetes s3 connectivity issue

2022-02-14 Thread Mich Talebzadeh
Actually, can you create an uber jar file in a conventional way using those two Hadoop versions? You have HADOOP_AWS_VERSION=3.3.0 besides 3.2. HTH. View my LinkedIn profile: https://en.everybodywiki.com/Mich_Talebzadeh
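To illustrate the version mismatch Mich points at: the hadoop-aws jar should track the Hadoop version shipped in the Spark distribution, not a newer one. A hedged Dockerfile fragment (versions and paths are illustrative, not from the thread):

```dockerfile
# Keep hadoop-aws aligned with the Hadoop version bundled with Spark;
# mixing a 3.3.0 hadoop-aws with Hadoop 3.2.x jars is a common source of
# NoSuchMethodError / ClassNotFoundException at S3 access time.
ENV HADOOP_VERSION=3.2.0
ENV HADOOP_AWS_VERSION=${HADOOP_VERSION}
ADD https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/${HADOOP_AWS_VERSION}/hadoop-aws-${HADOOP_AWS_VERSION}.jar ${SPARK_HOME}/jars/
```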

Re: Spark kubernetes s3 connectivity issue

2022-02-14 Thread Raj ks
I understand what you are saying. However, I am not sure how to implement this when I create a docker image using Spark 3.2.1 with Hadoop 3.2, which already has the Guava jar as part of the distribution.

Re: Spark kubernetes s3 connectivity issue

2022-02-14 Thread Mich Talebzadeh
Hi Raj, I found the old email. That is what I did but it is 2018 stuff. The email says I sorted out this problem. I rewrote the assembly with shade rules to avoid old jar files as follows: lazy val root = (project in file(".")). settings( name := "${APPLICATION}", version := "1.0",

Re: Spark kubernetes s3 connectivity issue

2022-02-14 Thread Raj ks
Should we remove the existing jar and upgrade it to some recent version?

Re: Spark kubernetes s3 connectivity issue

2022-02-14 Thread Mich Talebzadeh
I recall I had similar issues running Spark on Google Dataproc. It sounds like it gets Hadoop's jars on the classpath, which include an older version of Guava. The solution is to shade/relocate Guava in your distribution. HTH
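A minimal sketch of the shading Mich describes, using the sbt-assembly plugin (this is a generic Guava relocation rule, not the exact 2018 build file mentioned in the thread; the shaded package prefix is arbitrary):

```scala
// In build.sbt, with sbt-assembly enabled via project/plugins.sbt:
//   addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.2.0")
// Relocate Guava classes inside the assembly so the application's Guava
// cannot collide with the older Guava that Hadoop puts on the classpath.
assembly / assemblyShadeRules := Seq(
  ShadeRule
    .rename("com.google.common.**" -> "shaded.com.google.common.@1")
    .inAll
)
```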

Spark kubernetes s3 connectivity issue

2022-02-14 Thread Raj ks
Hi Team, We are trying to build a docker image using CentOS and trying to connect to S3. The same works with Hadoop 3.2.0 and Spark 3.1.2. #Installing spark binaries ENV SPARK_HOME /opt/spark ENV SPARK_VERSION 3.2.1 ENV HADOOP_VERSION 3.2.0 ARG HADOOP_VERSION_SHORT=3.2 ARG

Re: Deploying Spark on Google Kubernetes (GKE) autopilot, preliminary findings

2022-02-14 Thread Gourav Sengupta
Hi, sorry in case it appeared otherwise; Mich's takes are super interesting. It is just that, when applying solutions in commercial undertakings, things are quite different from research/development scenarios. Regards, Gourav Sengupta On Mon, Feb 14, 2022 at 5:02 PM

Re: [MLlib]: GLM with multinomial family

2022-02-14 Thread Sean Owen
SparkR is just a wrapper on the Scala implementations. Are you just looking for setting family = "multinomial" on LogisticRegression? Sure, it's there in the Scala API.
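For anyone following along, the setting Sean refers to looks roughly like this in Scala (the DataFrame and column names are placeholders; this is a sketch, not a complete job):

```scala
import org.apache.spark.ml.classification.LogisticRegression

// Multinomial logistic regression in spark.ml; "multinomial" is one of the
// accepted values for setFamily (the others being "auto" and "binomial").
val lr = new LogisticRegression()
  .setFamily("multinomial")
  .setLabelCol("label")
  .setFeaturesCol("features")

// trainingDf: a DataFrame with the label and assembled feature columns.
val model = lr.fit(trainingDf)
// model.coefficientMatrix holds one row of coefficients per class.
```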

[MLlib]: GLM with multinomial family

2022-02-14 Thread Surya Rajaraman Iyer
Hi Team, I am using multinomial regression in Spark Scala. I want to generate the coefficients and p-values for every category. For example, given two variables, salary group (dependent variable) and age group (independent variable): salary-group: 10,000-, 10,000-100,000, 100,000+ age-group:

Re: Spark 3.2.1 in Google Kubernetes Version 1.19 or 1.21

2022-02-14 Thread Mich Talebzadeh
Hi, it is complaining about the missing driver container image. Does $SPARK_IMAGE point to a valid image in the GCP container registry? Example for a docker image for PySpark: IMAGEDRIVER="eu.gcr.io/ /spark-py:3.1.1-scala_2.12-8-jre-slim-buster-java8PlusPackages" spark-submit
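Following Mich's point, the driver image is supplied through the spark.kubernetes.container.image setting. A hedged sketch (the image name, API server address, and example jar path are placeholders, not values from this thread):

```shell
# The image must exist in a registry the Kubernetes cluster can pull from,
# e.g. GCP Container Registry. All names below are illustrative.
SPARK_IMAGE="eu.gcr.io/my-project/spark-py:3.2.1"
spark-submit \
  --master k8s://https://<k8s-api-server>:443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=$SPARK_IMAGE \
  --class org.apache.spark.examples.SparkPi \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.2.1.jar
```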

Re: Spark 3.2.1 in Google Kubernetes Version 1.19 or 1.21

2022-02-14 Thread Gnana Kumar
Also, I'm using the below parameters while submitting the spark job: spark-submit \ --master k8s://$K8S_SERVER \ --deploy-mode cluster \ --name $POD_NAME \ --class org.apache.spark.examples.SparkPi \ --conf spark.executor.instances=2 \ --conf

Spark 3.2.1 in Google Kubernetes Version 1.19 or 1.21

2022-02-14 Thread Gnana Kumar
Hi There, I have been trying to run Spark 3.2.1 in Google Cloud's Kubernetes cluster, version 1.19 or 1.21, but I keep getting the following error and cannot proceed. Please help me resolve this issue. 22/02/14 16:00:48 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using

Re: Deploying Spark on Google Kubernetes (GKE) autopilot, preliminary findings

2022-02-14 Thread ashok34...@yahoo.com.INVALID
Thanks Mich. Very insightful. AK

Re: Deploying Spark on Google Kubernetes (GKE) autopilot, preliminary findings

2022-02-14 Thread Gourav Sengupta
Hi, I would still not build any custom solution; if in GCP, use serverless Dataproc. I think that it is always better to be hands-on with AWS Glue before commenting on it. Regards, Gourav Sengupta

Re: Deploying Spark on Google Kubernetes (GKE) autopilot, preliminary findings

2022-02-14 Thread Mich Talebzadeh
Good question. However, we ought to look at what options we have, so to speak. Let us consider Spark on Dataproc, Spark on Kubernetes and Spark on Dataflow. Spark on Dataproc is proven and in use at many organizations; I have deployed it extensively. It

Re: [EXTERNAL] Re: Unable to access Google buckets using spark-submit

2022-02-14 Thread Saurabh Gulati
Hey Karan, you can get the jar from here