Scala version changed in spark job

2018-01-24 Thread Fawze Abujaber
Hi all, I upgraded my Hadoop cluster, which includes Spark 1.6.0. I noticed that sometimes a job runs with Scala version 2.10.5 and sometimes with 2.10.4. Any idea why this is happening?
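A version mismatch like this usually means different nodes or classpaths picked up different scala-library patch jars. A minimal sketch (object and names are illustrative, not from the thread) for printing the Scala runtime a job actually loaded, e.g. at the start of main on the driver or inside a task on an executor:

```scala
// Prints the version of the scala-library jar actually on the classpath.
// Running this on both the driver and the executors shows whether they
// loaded the same Scala patch release.
object ShowScalaVersion {
  def main(args: Array[String]): Unit = {
    // e.g. "2.10.4" or "2.10.5"
    println(scala.util.Properties.versionNumberString)
  }
}
```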

Re: a way to allow spark job to continue despite task failures?

2018-01-24 Thread Sunita Arvind
Had a similar situation and landed on this question. Finally I was able to make it do what I needed by cheating the Spark driver :) i.e. by setting a very high value for "--conf spark.task.maxFailures=800". I deliberately set it to 800; the default is typically 4. So by the time 800 attempts for failed
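A sketch of the setting described above (app name is illustrative; assumes Spark on the classpath). spark.task.maxFailures is a per-task attempt limit: a stage is aborted only after a single task has failed that many times.

```scala
// Sketch: raise spark.task.maxFailures so a stage is not aborted
// after the default 4 failed attempts of any one task.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("failure-tolerant-job")
  .set("spark.task.maxFailures", "800") // default is 4

// Equivalent on the command line:
//   spark-submit --conf spark.task.maxFailures=800 ...
```

Note this only delays stage abortion; it does not skip failed tasks' data, so a deterministic failure will still be retried up to the limit.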

CI/CD for spark and scala

2018-01-24 Thread Deepak Sharma
Hi All, I just wanted to check if there are any best practices around using CI/CD for Spark / Scala projects running on AWS Hadoop clusters. If there are any specific tools, please do let me know. -- Thanks Deepak

Re: Providing Kafka configuration as Map of Strings

2018-01-24 Thread Cody Koeninger
Have you tried passing in a Map that happens to have strings for all the values? I haven't tested this, but the underlying Kafka consumer constructor is documented to take either strings or objects as values, despite the static type. On Wed, Jan 24, 2018 at 2:48 PM, Tecno Brain

Apache Hadoop and Spark

2018-01-24 Thread Mutahir Ali
Hello All, Cordial Greetings, I am trying to familiarize myself with Apache Hadoop and its different software components and how they can be deployed on physical or virtual infrastructure. I have a few questions: Q1) Can we use MapReduce and Apache Spark in the same cluster? Q2) is it

Re: Providing Kafka configuration as Map of Strings

2018-01-24 Thread Tecno Brain
Basically, I am trying to avoid writing code like: switch (key) { case "key.deserializer": result.put(key, Class.forName(value)); break; case "key.serializer": result.put(key, Class.forName(value)); break; case "value.deserializer":
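If Cody's observation holds (the Kafka consumer accepts class names as plain strings despite the Map's Object value type), the per-key switch may be unnecessary. A minimal Scala sketch of the idea (hosts and deserializer settings are the usual documented examples, not values from the thread):

```scala
object KafkaParams {
  // Plain string settings throughout: no Class.forName, no per-key switch.
  val stringParams: Map[String, String] = Map(
    "bootstrap.servers"  -> "localhost:9092",
    "key.deserializer"   -> "org.apache.kafka.common.serialization.StringDeserializer",
    "value.deserializer" -> "org.apache.kafka.common.serialization.StringDeserializer"
  )

  // Scala's immutable Map is covariant in its value type and String is an
  // Object, so the string-valued map already IS a Map[String, Object]:
  val kafkaParams: Map[String, Object] = stringParams
}
```

In Java the analogous move is declaring the map as `Map<String, Object>` and putting string values into it directly, rather than converting each value by key.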

Providing Kafka configuration as Map of Strings

2018-01-24 Thread Tecno Brain
On page https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html there is this Java example: Map kafkaParams = new HashMap<>(); kafkaParams.put("bootstrap.servers", "localhost:9092,anotherhost:9092"); kafkaParams.put("key.deserializer",

Re: Spark Tuning Tool

2018-01-24 Thread Shmuel Blitz
Hi, Which versions of Spark does the tool support? Does the tool have any reference to the number of executor cores? From your blog post it seems that this is a new feature on your service. Are you offering the tool for download? Shmuel On Wed, Jan 24, 2018 at 7:02 PM, Timothy Chen

Re: uncontinuous offset in kafka will cause the spark streaming failure

2018-01-24 Thread Cody Koeninger
When you say the patch is not suitable, can you clarify why? Probably best to get the various findings centralized on https://issues.apache.org/jira/browse/SPARK-17147 Happy to help with getting the patch up to date and working. On Wed, Jan 24, 2018 at 1:19 AM, namesuperwood
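The issue under discussion (SPARK-17147) is that offsets in a Kafka topic need not be contiguous, e.g. after log compaction, so a consumer that assumes endOffset - startOffset records will miscount. A small illustrative helper (name and shape are hypothetical, not from the patch) that surfaces gaps in a fetched batch instead of assuming contiguity:

```scala
object OffsetGaps {
  // Given the offsets actually returned by a fetch, report each gap as a
  // (lastOffsetBeforeGap, firstOffsetAfterGap) pair. A consumer assuming
  // contiguous offsets would expect every adjacent pair to differ by 1.
  def findGaps(offsets: Seq[Long]): Seq[(Long, Long)] =
    offsets.sorted.sliding(2).collect {
      case Seq(a, b) if b > a + 1 => (a, b)
    }.toSeq
}
```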

Re: Spark Tuning Tool

2018-01-24 Thread Timothy Chen
Interested to try as well. Tim On Tue, Jan 23, 2018 at 5:54 PM, Raj Adyanthaya wrote: > Its very interesting and I do agree that it will get a lot of traction once > made open source. > > On Mon, Jan 22, 2018 at 9:01 PM, Rohit Karlupia wrote: >> >> Hi, >>

Re: spark.sql call takes far too long

2018-01-24 Thread lucas.g...@gmail.com
Hi Michael. I haven't had this particular issue before, but I have had other performance issues. Some questions which may help: 1. Have you checked the Spark Console? 2. Have you isolated the query in question; are you sure it's actually where the slowdown occurs? 3. How much data are you

spark.sql call takes far too long

2018-01-24 Thread Michael Shtelma
Hi all, I have a problem with the performance of the sparkSession.sql call. It takes up to a couple of seconds for me right now. I have a lot of generated temporary tables registered within the session, and also a lot of temporary data frames. Is it possible that the
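Because DataFrames are lazy, spark.sql() itself only parses and analyzes the query; a slow call here points at planning against a large catalog of registered temp tables rather than at execution. A generic timing helper (names hypothetical) to confirm where the seconds go:

```scala
object Timed {
  // Evaluates body once and returns its result together with the
  // elapsed wall-clock time in milliseconds.
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1000000L)
  }
}

// Usage sketch (assumes a SparkSession named spark):
//   val (df, ms) = Timed.timed(spark.sql("SELECT ..."))
//   println(s"sql() took $ms ms")  // df is lazy, so this isolates
//                                  // parse/analysis time from execution
```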