Scala question (in Spark project) - not able to call getClassSchema method in Avro-generated class

2018-02-24 Thread karan alang
I have an Avro-generated class, com.avro.Person, which has a method getClassSchema. I'm passing the class name to a method, and in that method I need to get the Avro schema. Here is the code I'm trying to use: val pr = Class.forName(productCls) // where productCls = classOf[Product].getName How
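
The snippet above is cut off, so the following is only a minimal sketch of one way to reach a static method when nothing but the class name is known. Avro-generated classes expose a public static `getClassSchema()`; to keep the example free of the Avro dependency, `PersonStandIn` is a hypothetical stand-in whose method returns a `String` instead of `org.apache.avro.Schema`, and `SchemaLookup` and `schemaFor` are illustrative names that do not appear in the thread. The same reflection calls can be made from Scala.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for an Avro-generated class such as com.avro.Person.
// A real generated class returns org.apache.avro.Schema from getClassSchema();
// a String is used here only to avoid depending on Avro.
class PersonStandIn {
    public static String getClassSchema() {
        return "{\"type\":\"record\",\"name\":\"Person\",\"fields\":[]}";
    }
}

class SchemaLookup {
    // Loads the class by name and invokes its public static, no-argument
    // getClassSchema() method through reflection.
    static Object schemaFor(String className) {
        try {
            Class<?> cls = Class.forName(className);
            Method method = cls.getMethod("getClassSchema");
            return method.invoke(null); // null receiver: the method is static
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot read schema of " + className, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(schemaFor(PersonStandIn.class.getName()));
    }
}
```

With a real generated class, the returned `Object` would be cast to `org.apache.avro.Schema` by the caller.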

Saving spark output to multiple files as map

2018-02-24 Thread pooja bhojwani
Hello everyone, I wanted to do something like this: Given a JavaPairRDD (let's say with 10 rows), I want to store each of the rows separately with the following requirements: a) Each of them should be a map (cannot use saveAsTextFile) b) The file name should have the key in it (e.g. if the key is 0,1..

Timezone conversion using from_utc_timestamp

2018-02-24 Thread Srinath C
Hi, This is a question regarding timezone conversion with the from_utc_timestamp function. The observation is that the function returns different values for a zoneId and a zoneOffset that denote the same timezone. Ex: "America/Los_Angeles" and "-08:00" System Timezone is +05:30 Timestamp: 1519430400
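
As a reference point for the expectation described above, here is a small sketch in plain Java (not Spark) showing that `java.time` maps the timestamp 1519430400 to the same wall-clock time for both zone strings, while the legacy `java.util.TimeZone` API does not recognise a bare "-08:00" and falls back to GMT. Whether that fallback explains the behaviour reported in the thread is an assumption; the snippet does not say. `ZoneCheck`, `localTime` and `legacyId` are illustrative names.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.util.TimeZone;

class ZoneCheck {
    // Wall-clock time of the given epoch second in the given zone (java.time).
    static LocalDateTime localTime(long epochSecond, String zone) {
        return LocalDateTime.ofInstant(Instant.ofEpochSecond(epochSecond), ZoneId.of(zone));
    }

    // ID that the legacy java.util.TimeZone API resolves the string to.
    // Strings it cannot parse resolve to "GMT".
    static String legacyId(String zone) {
        return TimeZone.getTimeZone(zone).getID();
    }

    public static void main(String[] args) {
        long ts = 1519430400L; // 2018-02-24T00:00:00Z
        System.out.println(localTime(ts, "America/Los_Angeles")); // 2018-02-23T16:00
        System.out.println(localTime(ts, "-08:00"));              // 2018-02-23T16:00
        System.out.println(legacyId("-08:00"));                   // GMT
        System.out.println(legacyId("GMT-08:00"));                // GMT-08:00
    }
}
```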

Re: sqoop import job not working when spark thrift server is running.

2018-02-24 Thread akshay naidu
Thanks Jörn, FairScheduler is already enabled in yarn-site.xml: yarn.resourcemanager.scheduler.class - org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler; yarn.scheduler.fair.allow-undeclared-pools - true; yarn.scheduler.fair.user-as-default-queue - true

Re: Apache Spark - Structured Streaming reading from Kafka some tasks take much longer

2018-02-24 Thread M Singh
Hi Vijay: I am using spark-shell because I am still prototyping the steps involved. Regarding executors - I have 280 executors and the UI only shows a few straggler tasks on each trigger. The UI does not show too much time spent on GC. I suspect the delay is because of getting data from Kafka. The

Re: sqoop import job not working when spark thrift server is running.

2018-02-24 Thread Jörn Franke
FairScheduler in YARN lets you use more resources than configured if they are available. On 24. Feb 2018, at 13:47, akshay naidu wrote: >> it sure is not able to get sufficient resources from YARN to start the >> containers. > that's right. I

Re: sqoop import job not working when spark thrift server is running.

2018-02-24 Thread akshay naidu
> it sure is not able to get sufficient resources from YARN to start the containers. That's right. It worked when I reduced the thrift server's executors, but that also reduced thrift's performance, so it is not the solution I am looking for. My sqoop import job runs just once a day, and thrift