Re: [Spark 3.0 Kubernetes] Does Spark 3.0 support production deployment

2020-07-09 Thread Prashant Sharma
Hi, whether it is a blocker or not is up to you to decide. But Spark on a K8s cluster supports dynamic allocation through a different mechanism, that is, without using an external shuffle service: https://issues.apache.org/jira/browse/SPARK-27963. There are pros and cons to both approaches. The only
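
A minimal sketch of the settings involved, assuming Spark 3.0 or later. `spark.dynamicAllocation.shuffleTracking.enabled` is the option introduced by SPARK-27963. The min/max executor values are illustrative placeholders, not recommendations, and the `DynamicAllocationConf` object is a hypothetical helper that renders the keys as `spark-submit --conf` arguments:

```scala
object DynamicAllocationConf {
  // Dynamic allocation without an external shuffle service (Spark 3.0+).
  // Executor bounds are placeholder values chosen only for illustration.
  val settings: Seq[(String, String)] = Seq(
    "spark.dynamicAllocation.enabled" -> "true",
    "spark.dynamicAllocation.shuffleTracking.enabled" -> "true",
    "spark.dynamicAllocation.minExecutors" -> "1",
    "spark.dynamicAllocation.maxExecutors" -> "10"
  )

  // Renders each key/value pair as a "--conf key=value" argument pair.
  def submitArgs: Seq[String] =
    settings.flatMap { case (k, v) => Seq("--conf", s"$k=$v") }

  def main(args: Array[String]): Unit =
    println(submitArgs.mkString(" "))
}
```

The same keys can be passed with `SparkSession.builder().config(key, value)` or put in `spark-defaults.conf`. One trade-off of shuffle tracking: an executor that still holds shuffle data the job may need is not released until that data is no longer referenced (or `spark.dynamicAllocation.shuffleTracking.timeout` expires), so scale-down can be slower than with an external shuffle service.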

Re: Blog : Apache Spark Window Functions

2020-07-09 Thread Anwar AliKhan
My advice would be to go here: https://www.coursera.org/courses?query=machine%20learning%20andrew%20ng, Machine Learning by Andrew Ng. After three weeks you will have more valuable skills than most engineers in Silicon Valley. I am past week 3. He does go 90 miles per hour. I wish

sparksql 2.4.0 java.lang.NoClassDefFoundError: com/esotericsoftware/minlog/Log

2020-07-09 Thread Ivan Petrov
Hi there! I'm seeing this exception in the Spark driver log. The executor log stays empty: no exceptions, nothing. 8 tasks out of 402 failed with this exception. What is the right way to debug it? Thank you. I see that spark/jars -> minlog-1.3.0.jar is on the driver classpath, at least...
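
One way to check where (or whether) a JVM can load `com.esotericsoftware.minlog.Log` is to ask the class loader directly and print the jar the class came from. This is a stdlib-only sketch; `ClassLocator` is a hypothetical helper, not a Spark API. Because the failures here happen inside tasks, the driver-side check alone may not be enough: the same call could be run on the executors (for example inside a `mapPartitions` over a small dataset) to compare their classpaths.

```scala
object ClassLocator {
  // Returns the jar/directory URL the class was loaded from,
  // "<bootstrap or unknown>" when no code source is available (e.g. JDK classes),
  // or None if this JVM cannot load the class at all.
  def locate(className: String): Option[String] = {
    // Prefer the thread's context loader (Spark sets one for user code),
    // falling back to this object's own loader.
    val loader = Option(Thread.currentThread().getContextClassLoader)
      .getOrElse(getClass.getClassLoader)
    try {
      // initialize = false: only look the class up, don't run static initializers.
      val cls = Class.forName(className, false, loader)
      val location = Option(cls.getProtectionDomain.getCodeSource)
        .flatMap(cs => Option(cs.getLocation))
        .map(_.toString)
      Some(location.getOrElse("<bootstrap or unknown>"))
    } catch {
      case _: ClassNotFoundException | _: LinkageError => None
    }
  }

  def main(args: Array[String]): Unit = {
    val name = "com.esotericsoftware.minlog.Log"
    println(s"$name -> ${locate(name).getOrElse("NOT FOUND on this JVM's classpath")}")
  }
}
```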

Re: Strange WholeStageCodegen UI values

2020-07-09 Thread Sean Owen
It sounds like you have huge data skew? On Thu, Jul 9, 2020 at 4:15 PM Bobby Evans wrote: > > Sadly there isn't a lot you can do to fix this. All of the operations take > iterators of rows as input and produce iterators of rows as output. For > efficiency reasons, the timing is not done for

Re: Strange WholeStageCodegen UI values

2020-07-09 Thread Bobby Evans
Sadly there isn't a lot you can do to fix this. All of the operations take iterators of rows as input and produce iterators of rows as output. For efficiency reasons, the timing is not done for each individual row. If we did that in many cases it would take longer to measure how long something
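
To make the cost argument concrete, here is a hypothetical sketch (not Spark's actual metrics code) contrasting per-row timing of an iterator with timing the whole consumption once. Per-row timing adds two `System.nanoTime()` calls for every row, and because `underlying.next()` also runs any upstream `map`/`filter` work lazily, the time attributed to one operator silently includes its upstream operators in a pipelined iterator chain.

```scala
object TimingSketch {
  // Hypothetical wrapper that times every next() call individually.
  final class PerRowTimedIterator[A](underlying: Iterator[A]) extends Iterator[A] {
    var elapsedNanos: Long = 0L
    var rows: Long = 0L

    def hasNext: Boolean = underlying.hasNext

    def next(): A = {
      val start = System.nanoTime()
      // This call also executes any lazy upstream transformations,
      // so the measured time is inclusive of upstream work.
      val row = underlying.next()
      elapsedNanos += System.nanoTime() - start
      rows += 1
      row
    }
  }

  // Coarse alternative: one pair of clock reads around the whole consumption.
  def timeWhole[A](it: Iterator[A])(consume: Iterator[A] => Unit): Long = {
    val start = System.nanoTime()
    consume(it)
    System.nanoTime() - start
  }

  def main(args: Array[String]): Unit = {
    val perRow = new PerRowTimedIterator((1 to 1000000).iterator.map(_ * 2))
    perRow.foreach(_ => ())
    println(s"per-row: ${perRow.rows} rows, ${perRow.elapsedNanos} ns measured")

    val whole = timeWhole((1 to 1000000).iterator.map(_ * 2))(_.foreach(_ => ()))
    println(s"whole: $whole ns")
  }
}
```

Exact timings depend on the JVM and hardware, so no expected numbers are given here.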

RE: [Spark 3.0 Kubernetes] Does Spark 3.0 support production deployment

2020-07-09 Thread Varshney, Vaibhav
Thanks for the response. We have tried it in our dev environment. For production, if Spark 3.0 is not leveraging the K8s scheduler, would a Spark cluster in K8s be "static"? As per https://issues.apache.org/jira/browse/SPARK-24432, it seems it is still a blocker for production workloads? Thanks, Vaibhav V

Re: [Spark 3.0 Kubernetes] Does Spark 3.0 support production deployment

2020-07-09 Thread Sean Owen
I haven't used the K8S scheduler personally, but just based on that comment, I wouldn't worry too much. It's been around for several versions and AFAIK works fine in general. We sometimes aren't so great about removing "experimental" labels. That said, I know there are still some things that could

[Spark 3.0 Kubernetes] Does Spark 3.0 support production deployment

2020-07-09 Thread Varshney, Vaibhav
Hi Spark experts, we are trying to deploy Spark on Kubernetes. As per the doc at http://spark.apache.org/docs/latest/running-on-kubernetes.html, it looks like K8s deployment is experimental: "The Kubernetes scheduler is currently experimental." Spark 3.0 does not support production deployment using

Re: How To Access Hive 2 Through JDBC Using Kerberos

2020-07-09 Thread Jeff Evans
There are various sample JDBC URLs documented here, depending on the driver vendor, Kerberos (or not), and SSL (or not). Oftentimes, unsurprisingly, SSL is used in conjunction with Kerberos. Even if you don't use StreamSets software at all, you might find these useful.
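
For the Apache Hive JDBC driver (HiveServer2), Kerberos and SSL are both expressed as `;key=value` session parameters appended to a `jdbc:hive2://` URL. The sketch below assembles such a URL; the host, database, principal, and truststore values are placeholders, and `HiveJdbcUrl` is a hypothetical helper. Other vendors' drivers (for example Cloudera's or Simba's) use different parameter names, so check that driver's own documentation. The client JVM still needs valid Kerberos credentials (e.g. from `kinit` or a keytab login) for the connection to succeed.

```scala
object HiveJdbcUrl {
  // Builds an Apache Hive (HiveServer2) JDBC URL.
  // kerberosPrincipal: the HiveServer2 service principal, e.g. "hive/_HOST@REALM".
  // sslTrustStore: optional (truststore path, truststore password).
  def build(
      host: String,
      port: Int,
      database: String,
      kerberosPrincipal: Option[String],
      sslTrustStore: Option[(String, String)]
  ): String = {
    val kerberos = kerberosPrincipal.map(p => s";principal=$p").getOrElse("")
    val ssl = sslTrustStore
      .map { case (path, password) => s";ssl=true;sslTrustStore=$path;trustStorePassword=$password" }
      .getOrElse("")
    s"jdbc:hive2://$host:$port/$database$kerberos$ssl"
  }

  def main(args: Array[String]): Unit =
    println(build("hs2.example.com", 10000, "default",
      Some("hive/_HOST@EXAMPLE.COM"), Some(("/path/to/truststore.jks", "changeit"))))
}
```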

Re: sparksql 2.4.0 java.lang.NoClassDefFoundError: com/esotericsoftware/minlog/Log

2020-07-09 Thread Ivan Petrov
spark/jars -> minlog-1.3.0.jar. I see that the jar is there. What am I doing wrong? On Thu, 9 Jul 2020 at 20:43, Ivan Petrov wrote: > Hi there! > I'm seeing this exception in Spark Driver log. > Executor log stays empty. No exceptions, nothing. > 8 tasks out of 402 failed with this exception. > What is the

sparksql 2.4.0 java.lang.NoClassDefFoundError: com/esotericsoftware/minlog/Log

2020-07-09 Thread Ivan Petrov
Hi there! I'm seeing this exception in the Spark driver log. The executor log stays empty: no exceptions, nothing. 8 tasks out of 402 failed with this exception. What is the right way to debug it? Thank you. java.lang.NoClassDefFoundError: com/esotericsoftware/minlog/Log at

Re: How To Access Hive 2 Through JDBC Using Kerberos

2020-07-09 Thread Daniel de Oliveira Mantovani
One of my colleagues found this solution: https://github.com/morfious902002/impala-spark-jdbc-kerberos/blob/master/src/main/java/ImpalaSparkJDBC.java If you need to connect to Hive or Impala using JDBC with Kerberos authentication from Apache Spark, you can use it and it will work. You can

Re: com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.9.6 requires Jackson Databind version >= 2.9.0 and < 2.10.0

2020-07-09 Thread Sean Owen
You have a Jackson version conflict somewhere. It might be from other libraries you include in your application. I am not sure Spark 2.3 works with Hadoop 3.1, so this may be the issue. Make sure you match these to Spark, and/or use the latest versions. On Thu, Jul 9, 2020 at 8:23 AM Julian Jiang
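
Since the job works in the IDE but fails on the cluster, a first diagnostic step could be to print which Jackson versions the cluster JVM actually loads. This stdlib-only sketch reads the `Implementation-Version` entry from the manifest of the jar that provides each class. `JacksonVersionCheck` is a hypothetical helper, and the result is `None` if the class is missing or the jar's manifest does not set that entry.

```scala
object JacksonVersionCheck {
  // Implementation-Version from the manifest of the jar providing `className`,
  // or None if the class can't be loaded or the manifest lacks the entry.
  def implementationVersion(className: String): Option[String] =
    try {
      // initialize = false: look the class up without running static initializers.
      val cls = Class.forName(className, false, getClass.getClassLoader)
      Option(cls.getPackage).flatMap(p => Option(p.getImplementationVersion))
    } catch {
      case _: ClassNotFoundException | _: LinkageError => None
    }

  def main(args: Array[String]): Unit =
    Seq(
      "com.fasterxml.jackson.databind.ObjectMapper",
      "com.fasterxml.jackson.module.scala.DefaultScalaModule"
    ).foreach { name =>
      val v = implementationVersion(name).getOrElse("not found, or no version in manifest")
      println(s"$name -> $v")
    }
}
```

If the databind and Scala-module versions printed on the cluster disagree with what the application was built against, aligning the application's Jackson dependencies with the ones shipped in the cluster's Spark distribution is one way to resolve it.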

com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.9.6 requires Jackson Databind version >= 2.9.0 and < 2.10.0

2020-07-09 Thread Julian Jiang
When I run it in IntelliJ IDEA, it works well, but when I submit it to the cluster, this problem appears. Thanks for helping me. My versions are as follows: 2.11.8 3.1.1 2.3.2 0.2.4 My code is as follows: val spark: SparkSession = SparkSession.builder().appName("CkConnect")