Recommended cluster parameters

2017-04-30 Thread rakesh sharma
Hi

I would like to know the details of implementing a cluster.

What kind of machines would one require, how many nodes, how many cores, etc.?


thanks

rakesh
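There is no single right answer here; sizing depends on data volume and workload. As an illustration only, these are the spark-submit parameters that a sizing decision usually feeds into (the host name and all figures below are placeholders, not recommendations):

```shell
# Illustrative only: the resource knobs a cluster-sizing decision maps onto.
# Master URL, memory and core figures are placeholders.
spark-submit \
  --master spark://master-host:7077 \
  --driver-memory 4g \
  --executor-memory 8g \
  --total-executor-cores 16 \
  my-app.jar
```

Note that `--total-executor-cores` applies to Standalone mode; on YARN the equivalent knobs are `--num-executors` and `--executor-cores`.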


Spark SQL shell hangs

2016-11-13 Thread rakesh sharma
Hi

I'm trying to convert an XML file to a DataFrame using Databricks spark-xml. 
But the shell hangs when I do a select operation on the table. I believe it's 
a memory issue. How can I debug this? The XML file is 86 MB.
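A first step many would try is restarting the shell with more driver memory and the spark-xml package loaded explicitly; a hedged sketch (the memory figure and package version are guesses and may need adjusting to your Spark/Scala build):

```shell
# Hedged sketch: rule out driver memory as the cause before digging deeper.
spark-shell \
  --driver-memory 4g \
  --packages com.databricks:spark-xml_2.10:0.3.3
```

The Spark UI (port 4040 by default) is also useful for seeing whether the select is actually progressing or stuck on a single task.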

Thanks in advance
Rakesh

Get Outlook for Android


Re: Converting Dataframe to resultSet in Spark Java

2016-08-18 Thread rakesh sharma
Hi Sree


I don't think what you are trying to do is correct. DataFrame and ResultSet are 
two different types, and no strongly typed language will allow you to cast one 
to the other.

If your intention is to traverse the DataFrame or get at the individual rows and 
columns, then try the map function and pass an anonymous function with the 
required logic.
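A minimal sketch of that idea in the Spark 1.x Java API (the class and column index here are illustrative, not taken from Sree's code, and this needs a Spark cluster and the Spark jars to actually run):

```java
// Hedged sketch: traverse a DataFrame row by row instead of casting it.
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;

public class Traverse {
    static JavaRDD<String> firstColumn(DataFrame df) {
        // map each Row through an anonymous function with the required logic
        return df.javaRDD().map(new Function<Row, String>() {
            public String call(Row row) throws Exception {
                return row.getString(0);   // pull out the first column
            }
        });
    }
}
```

For small results, `df.collectAsList()` returns a `java.util.List<Row>` that can be iterated directly on the driver.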


thanks

rakesh


From: Sree Eedupuganti 
Sent: Thursday, August 18, 2016 1:26:52 PM
To: user
Subject: Converting Dataframe to resultSet in Spark Java


Retrieved the data into a DataFrame but I can't convert it into a ResultSet. 
Is there any possible way to convert it? Any suggestions please...

Exception in thread "main" java.lang.ClassCastException: 
org.apache.spark.sql.DataFrame cannot be cast to 
com.datastax.driver.core.ResultSet

--
Best Regards,
Sreeharsha Eedupuganti
Data Engineer
innData Analytics Private Limited


Re: My notes on Spark Performance & Tuning Guide

2016-05-17 Thread rakesh sharma
It would be a rare doc. Please share




On Tue, May 17, 2016 at 9:14 AM -0700, "Natu Lauchande" wrote:

Hi Mich,

I am also interested in the write up.

Regards,
Natu

On Thu, May 12, 2016 at 12:08 PM, Mich Talebzadeh wrote:
Hi All,


Following the threads in the Spark forum, I decided to write up notes on the 
configuration of Spark: allocation of resources, configuration of the driver, 
executors and threads, execution of Spark apps, and general troubleshooting, 
taking into account the resources allocated to Spark applications and the OS 
tools at our disposal.

Since the most widespread configuration I see is "Spark Standalone Mode", I 
have decided to write these notes starting with Standalone and later moving 
on to YARN:


  *   Standalone - a simple cluster manager included with Spark that makes it 
easy to set up a cluster.

  *   YARN - the resource manager in Hadoop 2.
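In practice the two differ mainly in the `--master` argument passed at submit time; a hedged illustration (host name is a placeholder):

```shell
# Standalone: point directly at the Spark master
spark-submit --master spark://master-host:7077 my-app.jar

# YARN: let Hadoop's resource manager allocate the containers
# (on newer Spark releases: --master yarn --deploy-mode cluster)
spark-submit --master yarn-cluster my-app.jar
```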


I would appreciate it if anyone interested in reading and commenting would get 
in touch with me directly at 
mich.talebza...@gmail.com so I can send them the 
write-up for review and comments.


Just to be clear, this is not meant to be a commercial proposition or anything 
like that. As I seem to get involved with members' troubleshooting issues and 
threads on this topic, I thought it worthwhile to write a note summarising the 
findings for the benefit of the community.


Regards.


Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com





Running window functions in spark dataframe

2016-01-13 Thread rakesh sharma
Hi all

I am getting a HiveContext error when trying to run window functions such as 
OVER with an ORDER BY clause. Any help on how to go about this? I am running 
Spark locally.
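In Spark 1.x, window functions are only available through HiveContext, not the plain SQLContext, which is the likely cause of this error when running locally. A minimal sketch in Java (table and column names are illustrative, and this requires a Spark build compiled with Hive support):

```java
// Hedged sketch, Spark 1.x: window functions need HiveContext, not SQLContext.
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class WindowExample {
    static DataFrame ranked(JavaSparkContext jsc) {
        HiveContext hive = new HiveContext(jsc.sc());   // not SQLContext
        // row_number() OVER (...) is parsed by the Hive query engine
        return hive.sql(
            "SELECT name, age, row_number() OVER (ORDER BY age) AS rn FROM people");
    }
}
```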

Sent from Outlook Mobile


-- Forwarded message --
From: "King sami"
Date: Wed, Jan 13, 2016 at 7:20 AM -0800
Subject: Need 'Learning Spark' Partner
To: "user@spark.apache.org"


Hi,

As I'm a beginner in Spark, I'm looking for someone who's also a beginner, so 
we can learn and train on Spark together.

Please contact me if interested

Cordially,


Error using SQLContext in spark

2015-09-02 Thread rakesh sharma
Error: application failed with exception
java.lang.NoSuchMethodError: org.apache.spark.sql.SQLContext.<init>(Lorg/apache/spark/api/java/JavaSparkContext;)V
        at examples.PersonRecordReader.getPersonRecords(PersonRecordReader.java:35)
        at examples.PersonRecordReader.main(PersonRecordReader.java:17)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:367)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:77)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)


Hi All
I am getting the above exception when I am using SQLContext in Spark jobs. The 
error occurs only with the insertion of these statements: the RDD is fine and 
it prints everything correctly. The error occurs when creating DataFrames. I am 
using Maven dependency version 1.3.1.
public static void getPersonRecords(String... args) {
    SparkConf sparkConf = new SparkConf().setAppName("SQLContext");
    JavaSparkContext javaSparkContext = new JavaSparkContext(sparkConf);
    JavaRDD<String> lines = javaSparkContext.textFile(args[0], 1);
    JavaRDD<Person> personRecords = lines.map(new Function<String, Person>() {
        public Person call(String line) throws Exception {
            System.out.println(line);
            String[] rec = line.split(",");
            return new Person(Integer.parseInt(rec[1].trim()), rec[0]);
        }
    });
    for (Person p : personRecords.collect()) {
        System.out.println(p.getName());
    }
    SQLContext sqlContext = new SQLContext(javaSparkContext);
    DataFrame dataFrame = sqlContext.createDataFrame(personRecords, Person.class);
}

Please help, I have been stuck with this since morning.

thanks
rakesh
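A NoSuchMethodError on an SQLContext constructor is the classic signature of a compile/runtime version mismatch: the jar was built against one Spark version but launched with another. A hedged check is to pin every Spark artifact in the pom to the same version as the runtime and mark it provided (the artifact name and Scala suffix below are assumptions for a Spark 1.3.1 / Scala 2.10 build):

```xml
<!-- Hedged sketch: align the compile-time Spark version with the
     1.3.1 runtime mentioned above, so spark-submit supplies the jars. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.10</artifactId>
  <version>1.3.1</version>
  <scope>provided</scope>
</dependency>
```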

How mature is spark sql

2015-09-01 Thread rakesh sharma
Is it mature enough to use extensively? I see that it is easier to use than 
writing map/reduce in Java. We are being asked to do it in Java itself and 
cannot move to Python or Scala.

thanks
rakesh
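As one data point on the ergonomics question, an aggregation that takes a full Mapper/Reducer pair in classic MapReduce is a few lines against a DataFrame in Java (Spark 1.3-era API; the file and column names below are made up, and this needs the Spark jars to run):

```java
// Hedged sketch: a group-by count via the Spark SQL Java API.
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class GroupByCount {
    static void run(JavaSparkContext jsc) {
        SQLContext sqlContext = new SQLContext(jsc);
        DataFrame df = sqlContext.jsonFile("people.json");  // Spark 1.3 API
        df.groupBy("age").count().show();                   // one-line aggregation
    }
}
```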