Re: Create an Empty dataframe

2018-07-08 Thread रविशंकर नायर
From Stack Overflow: from pyspark.sql.types import StructType from pyspark.sql.types import StructField from pyspark.sql.types import StringType sc = SparkContext(conf=SparkConf()) spark = SparkSession(sc) # Need to use SparkSession(sc) to createDataFrame schema = StructType([

Re: spark-shell gets stuck in ACCEPTED state forever when ran in YARN client mode.

2018-07-08 Thread रविशंकर नायर
Are you able to run a simple MapReduce job on YARN without any issues? If you have any issues: I had this problem on Mac. Use csrutil on the Mac to disable it (System Integrity Protection). Then add a softlink: sudo ln -s /usr/bin/java/bin/java The newer versions of macOS, from El Capitan onward, do not allow softlinks in

Re: Spark Structured Streaming is giving error “org.apache.spark.sql.AnalysisException: Inner join between two streaming DataFrames/Datasets is not supported;”

2018-05-15 Thread रविशंकर नायर
Hi Jacek, If we use RDD instead of DataFrame, can we accomplish the same? I mean, is joining between RDDs allowed in Spark Streaming? Best, Ravi On Sun, May 13, 2018 at 11:18 AM Jacek Laskowski wrote: > Hi, > > The exception message should be self-explanatory and says that

Re: Spark Structured Streaming is giving error “org.apache.spark.sql.AnalysisException: Inner join between two streaming DataFrames/Datasets is not supported;”

2018-05-12 Thread रविशंकर नायर
Perhaps this link might help you. https://stackoverflow.com/questions/48699445/inner-join-not-working-in-dataframe-using-spark-2-1 Best, Passion On Sat, May 12, 2018, 10:57 AM ThomasThomas wrote: > Hi There, > > Our use case is like this. > > We have a nested(multiple)

Spark Streaming for more file types

2018-04-27 Thread रविशंकर नायर
All, I have the following methods in my Scala code, currently executed on demand: val files = sc.binaryFiles("file:///imocks/data/ocr/raw") // Above line takes all PDF files files.map(myconverter(_)).count myconverter signature: def myconverter ( file: (String,

Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-21 Thread रविशंकर नायर
Excellent. You filled a missing link. Best, Passion On Wed, Mar 21, 2018 at 11:36 PM, Rohit Karlupia wrote: > Hi, > > Happy to announce the availability of Sparklens as open source project. It > helps in understanding the scalability limits of spark applications and > can

Spark production scenario

2018-03-08 Thread रविशंकर नायर
Hi all, We are going to move to production with an 8-node Spark cluster. Requesting some help on the below. We are running on the YARN cluster manager; that means YARN is installed with SSH between the nodes. When we run a standalone Spark program with spark-submit, YARN initializes a resource manager
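For reference, a typical submission against YARN looks like the sketch below; the class name, jar, and resource numbers are placeholders, not values from the question:

```shell
# Hypothetical spark-submit for an 8-node YARN cluster; tune resources to the nodes.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 8 \
  --executor-memory 4g \
  --executor-cores 2 \
  --class com.example.MyApp \
  myapp.jar
```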

Re: Spark StreamingContext Question

2018-03-07 Thread रविशंकर नायर
on. > Yet, that does not mean you need to put them all in the same job. You can > (and should) still submit different jobs for different application concerns. > > kind regards, Gerard. > > > > On Wed, Mar 7, 2018 at 4:56 AM, ☼ R Nair (रविशंकर नायर) < > ravishankar.n...@gm

Spark StreamingContext Question

2018-03-06 Thread रविशंकर नायर
Hi all, I understand from the documentation that only one streaming context can be active in a JVM at the same time. Hence, in an enterprise cluster, how can we manage/handle multiple users having many different streaming applications? One may be ingesting data from Flume, another from Twitter

Re: Spark Dataframe and HIVE

2018-02-11 Thread रविशंकर नायर
rdReader.nextKeyValue(ParquetRecor >> dReader.java:201) >> >> So the table looks good but this needs to be fixed before you can query >> the data in hive. >> >> Thanks >> Deepak >> >> On Sun, Feb 11, 2018 at 1:45 PM, ☼ R Nair (रविशंकर नायर) < &

Re: Spark Dataframe and HIVE

2018-02-11 Thread रविशंकर नायर
ne"; > > On Sun, Feb 11, 2018 at 1:38 PM, Deepak Sharma <deepakmc...@gmail.com> > wrote: > >> Try this in hive: >> alter table mine set locations "hdfs://localhost:8020/user/hi >> ve/warehouse/mine"; >> >> Thanks >> Deepak >>

Re: Spark Dataframe and HIVE

2018-02-11 Thread रविशंकर नायर
;> Dr Mich Talebzadeh >> >> >> >> LinkedIn * >> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw >> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* >> >> >> >> http://t

Re: Spark Dataframe and HIVE

2018-02-11 Thread रविशंकर नायर
, ☼ R Nair (रविशंकर नायर) < ravishankar.n...@gmail.com> wrote: > I have created it using Spark SQL. Then I want to retrieve from HIVE. > Thats where the issue is. I can , still retrieve from Spark. No problems. > Why HIVE is not giving me the data?? > > > > On Sun, Feb

Re: Spark Dataframe and HIVE

2018-02-11 Thread रविशंकर नायर
rofile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw > <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* > > > > http://talebzadehmich.wordpress.com > > > *Disclaimer:* Use it at your own risk. Any and all responsibility for any >

Re: Spark Dataframe and HIVE

2018-02-10 Thread रविशंकर नायर
le\":true,\"metadata\":{\"name\":\"hu_site_deductible\",\"scale\":0}},{\"name\":\"fl_site_deductible\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{\"name\":\"fl_site_deductib

Re: Spark Dataframe and HIVE

2018-02-10 Thread रविशंकर नायर
No, No luck. Thanks On Sun, Feb 11, 2018 at 12:48 AM, Deepak Sharma <deepakmc...@gmail.com> wrote: > In hive cli: > msck repair table 《table_name》; > > Thanks > Deepak > > On Feb 11, 2018 11:14, "☼ R Nair (रविशंकर नायर)" < > ravishankar.n...@gmail

Re: Spark Dataframe and HIVE

2018-02-10 Thread रविशंकर नायर
before reading it in hive ? > > Thanks > Deepak > > On Feb 11, 2018 11:06, "☼ R Nair (रविशंकर नायर)" < > ravishankar.n...@gmail.com> wrote: > >> All, >> >> Thanks for the inputs. Again I am not successful. I think, we need to >> resolve thi

Re: Spark Dataframe and HIVE

2018-02-10 Thread रविशंकर नायर
\",\"type\":\"string\",\"nullable\":true,\"metadata\":{\"name\":\"line\",\"scale\":0}},{\"name\":\"construction\",\"type\":\"string\",\"nullable\":true,\"metadata\":

Re: Spark Dataframe and HIVE

2018-02-09 Thread रविशंकर नायर
hanks On Fri, Feb 9, 2018 at 9:49 AM, ☼ R Nair (रविशंकर नायर) < ravishankar.n...@gmail.com> wrote: > All, > > It has been three days continuously I am on this issue. Not getting any > clue. > > Environment: Spark 2.2.x, all configurations are correct. hive-site.xml is

Spark Dataframe and HIVE

2018-02-09 Thread रविशंकर नायर
All, I have been on this issue for three days continuously and am not getting any clue. Environment: Spark 2.2.x, all configurations are correct. hive-site.xml is in Spark's conf. 1) Step 1: I created a data frame DF1 reading a CSV file. 2) Did manipulations on DF1. Resulting frame is passion_df.

Re: Spark Tuning Tool

2018-01-23 Thread रविशंकर नायर
Very good job; in fact, a missing link has been addressed. Any plan of porting it to GitHub? I would like to contribute. Best, RS On Tue, Jan 23, 2018 at 12:01 AM, Rohit Karlupia wrote: > Hi, > > I have been working on making the performance tuning of spark applications > bit

Re: Save hive table from spark in hive 2.1.0

2017-12-10 Thread रविशंकर नायर
Hi, Good try. As you can see, when you run the upgrade using schematool, there is a duplicate column error. Can you please look at the generated script and edit it to avoid the duplicate column? Not sure why the Hive guys made it complicated; I did face the same issues as you. Can anyone else give a clean and

Re: Save hive table from spark in hive 2.1.1

2017-12-09 Thread रविशंकर नायर
There is an option in hive-site.xml to ignore metadata validation; I mean, try setting the property below to false and try. The Hive schematool can also help. hive.metastore.schema.verification true Best, Ravion On Dec 9, 2017 5:56 PM, "konu" wrote: I am using Spark 2.2 with scala,
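The flattened property above, reconstructed as an XML sketch for hive-site.xml (setting it to false skips the metastore schema version check):

```xml
<property>
  <name>hive.metastore.schema.verification</name>
  <value>false</value>
</property>
```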

Re: Question on using pseudo columns in spark jdbc options

2017-12-07 Thread रविशंकर नायर
t sure). > > The first question - could you just check and provide us the answer? :) > > Cheers, > Tomasz > > 2017-12-03 7:39 GMT+01:00 ☼ R Nair (रविशंकर नायर) < > ravishankar.n...@gmail.com>: > >> Hi all, >> >> I am using a query to fetch data from

Question on using pseudo columns in spark jdbc options

2017-12-02 Thread रविशंकर नायर
Hi all, I am using a query to fetch data from MySQL as follows: var df = spark.read. format("jdbc"). option("url", "jdbc:mysql://10.0.0.192:3306/retail_db"). option("driver" ,"com.mysql.jdbc.Driver"). option("user", "retail_dba"). option("password", "cloudera"). option("dbtable", "orders").
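One common pseudo-column trick for partitioned JDBC reads is to wrap the query in a subquery that generates a row number and hand that column to Spark's partitioning options. The MySQL user-variable row number below is an assumption (a well-known idiom), not something confirmed by the thread; the bounds are illustrative:

```python
# Connection details copied from the question above; rno and the bounds are
# illustrative assumptions for the pseudo-column partitioning sketch.
jdbc_options = {
    "url": "jdbc:mysql://10.0.0.192:3306/retail_db",
    "driver": "com.mysql.jdbc.Driver",
    "user": "retail_dba",
    "password": "cloudera",
    # Expose a generated row number (MySQL user variable) as a pseudo column
    "dbtable": "(SELECT o.*, (@rn := @rn + 1) AS rno "
               "FROM orders o, (SELECT @rn := 0) r) AS t",
    # Spark splits the read into numPartitions ranges over the pseudo column
    "partitionColumn": "rno",
    "lowerBound": "1",
    "upperBound": "100000",
    "numPartitions": "8",
}

# df = spark.read.format("jdbc").options(**jdbc_options).load()  # needs live MySQL
```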

sqlContext vs spark.

2017-02-03 Thread रविशंकर नायर
All, In Spark 1.6.0, we used val jdbcDF = sqlContext.read.format(-) for creating a data frame through JDBC. In Spark 2.1.x, we have seen this is val jdbcDF = *spark*.read.format(-) Does that mean we should not be using sqlContext going forward? Also, we see that sqlContext is not auto

No Reducer scenarios

2017-01-29 Thread रविशंकर नायर
Dear all, 1) When we don't set the reducer class in the driver program, the IdentityReducer is invoked. 2) When we set setNumReduceTasks(0), no reducer, not even the IdentityReducer, is invoked. Now, in the second scenario, we observed that the output is in part-m-xx format (instead of part-r-xx format), which

Re: Dataframe caching

2017-01-20 Thread रविशंकर नायर
Thanks, Will look into this. Best regards, Ravion -- Forwarded message -- From: "Muthu Jayakumar" <bablo...@gmail.com> Date: Jan 20, 2017 10:56 AM Subject: Re: Dataframe caching To: "☼ R Nair (रविशंकर नायर)" <ravishankar.n...@gmail.com> C

Dataframe caching

2017-01-20 Thread रविशंकर नायर
Dear all, Here is a requirement I am thinking of implementing in Spark core. Please let me know if this is possible, and kindly provide your thoughts. A user executes a query to fetch 1 million records from, let's say, a database. We let the user store this as a dataframe, partitioned across

Re: Spark build failure with com.oracle:ojdbc6:jar:11.2.0.1.0

2016-05-02 Thread रविशंकर नायर
The Oracle JDBC driver is not part of the Maven repository; are you keeping a downloaded file in your local repo? Best, RS On May 2, 2016 8:51 PM, "Hien Luu" wrote: > Hi all, > > I am running into a build problem with com.oracle:ojdbc6:jar:11.2.0.1.0. > It kept getting "Operation timed
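Because Oracle does not publish ojdbc6 to Maven Central for licensing reasons, the usual workaround is to download the jar from Oracle and install it into the local repository by hand; the file path below is a placeholder:

```shell
# Install the manually downloaded driver so the build can resolve
# com.oracle:ojdbc6:jar:11.2.0.1.0 from the local repository.
mvn install:install-file -DgroupId=com.oracle -DartifactId=ojdbc6 \
    -Dversion=11.2.0.1.0 -Dpackaging=jar -Dfile=ojdbc6.jar
```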

New Spark User Group in Florida

2016-03-06 Thread रविशंकर नायर
Hi Organizer, We have just started a new user group for Spark in Florida. Can you please add this entry in Spark community ? Thanks Florida Spark Meetup Best regards, R Nair.