Iterating all columns in a pyspark dataframe

2020-09-04 Thread Devi P.V
Hi all, What is the best approach for iterating over all columns in a PySpark dataframe? I want to apply some conditions on all columns in the dataframe. Currently I am using a for loop for the iteration. Is that a good practice with Spark? I am using Spark 3.0. Please advise. Thanks, Devi
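A driver-side loop is usually fine here: each withColumn only extends the query plan, and Spark still runs a single optimized job when an action is called. Below is a minimal sketch of the pattern, written in Scala for the spark-shell like most code in this archive (df.columns can be iterated the same way in PySpark); the null-to-"missing" condition is only a placeholder assumption, not taken from the original mail:

import org.apache.spark.sql.functions._

val df = Seq(("a", 1), (null, 2)).toDF("name", "value")

// Fold over all column names, applying the same placeholder condition to each.
// The loop runs on the driver and only builds the plan; no per-column job is triggered.
val transformed = df.columns.foldLeft(df) { (acc, c) =>
  acc.withColumn(c, when(col(c).isNull, lit("missing")).otherwise(col(c)))
}

transformed.show()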

FP growth - Items in a transaction must be unique

2017-02-01 Thread Devi P.V
Hi all, I am trying to run the FP-growth algorithm using Spark and Scala. A sample input dataframe is the following: +---+ |productName
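The subject line is the error MLlib's FPGrowth raises when a single basket contains the same item more than once; de-duplicating each transaction before fitting avoids it. A hedged sketch for the spark-shell, with made-up product names standing in for the truncated sample data:

import org.apache.spark.mllib.fpm.FPGrowth

// Toy transactions; "milk" deliberately appears twice in the first basket.
val transactions = sc.parallelize(Seq(
  Array("milk", "bread", "milk"),
  Array("bread", "butter"),
  Array("milk", "butter")
))

// FPGrowth requires unique items per transaction, so de-duplicate each basket first.
val deduped = transactions.map(_.distinct)

val model = new FPGrowth()
  .setMinSupport(0.3)
  .setNumPartitions(2)
  .run(deduped)

model.freqItemsets.collect().foreach { itemset =>
  println(itemset.items.mkString("[", ",", "]") + " -> " + itemset.freq)
}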

How to find unique values after groupBy() in a Spark dataframe?

2016-12-08 Thread Devi P.V
Hi all, I have a dataframe like the following:

+---------+----------+
|client_id|Date      |
+---------+----------+
| a       |2016-11-23|
| b       |2016-11-18|
| a       |2016-11-23|
| a       |2016-11-23|
| a       |2016-11-24|
+---------+----------+

I want to find the unique dates of each client_id
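Grouping by client_id and collecting a set of dates (or counting the distinct dates) is the usual way to do this; a minimal sketch for the spark-shell, rebuilt from the sample rows above:

import org.apache.spark.sql.functions._

val df = Seq(
  ("a", "2016-11-23"), ("b", "2016-11-18"), ("a", "2016-11-23"),
  ("a", "2016-11-23"), ("a", "2016-11-24")
).toDF("client_id", "Date")

// One row per client with its distinct dates, plus how many distinct dates it has.
df.groupBy("client_id")
  .agg(collect_set("Date").as("unique_dates"),
       countDistinct("Date").as("n_unique_dates"))
  .show(false)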

Re: How to convert a unix timestamp column into date format (yyyy-MM-dd)?

2016-12-05 Thread Devi P.V
[client_id: string, ts: string, ts1: string]

scala> finaldf.show
+--------------------+-------------+-----------+
|           client_id|           ts|        ts1|
+--------------------+-------------+-----------+
|cd646551-fceb-416...|1477989416803|48805-08-14|

Re: How to convert a unix timestamp column into date format (yyyy-MM-dd)?

2016-12-05 Thread Devi P.V
Hi, Thanks for replying to my question. I am using Scala. On Mon, Dec 5, 2016 at 1:20 PM, Marco Mistroni <mmistr...@gmail.com> wrote: > Hi > In Python you can use datetime.fromtimestamp(..).strftime('%Y%m%d') > Which Spark API are you using? > Kr > > O

How to convert a unix timestamp column into date format (yyyy-MM-dd)?

2016-12-04 Thread Devi P.V
Hi all, I have a dataframe like the following:

+--------------------------+-------------+
|client_id                 |timestamp    |
+--------------------------+-------------+
|cd646551-fceb-4166-acbc-b9|1477989416803|
|3bc61951-0f49-43bf-9848-b2|1477983725292|
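The ts1 value 48805-08-14 in the reply above is the giveaway: from_unixtime expects seconds, while 1477989416803 is a millisecond epoch, so it has to be divided by 1000 before formatting. A hedged sketch for the spark-shell using the two sample rows; the division step is the assumption being illustrated:

import org.apache.spark.sql.functions._

val df = Seq(
  ("cd646551-fceb-4166-acbc-b9", "1477989416803"),
  ("3bc61951-0f49-43bf-9848-b2", "1477983725292")
).toDF("client_id", "timestamp")

// The epochs are in milliseconds; from_unixtime expects seconds,
// so divide by 1000 before formatting as yyyy-MM-dd.
val withDate = df.withColumn(
  "date",
  from_unixtime(col("timestamp").cast("long") / 1000, "yyyy-MM-dd")
)

withDate.show(false)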

What is the optimized way to combine multiple dataframes into one dataframe?

2016-11-15 Thread Devi P.V
Hi all, I have 4 dataframes with three columns: client_id, product_id, interest. I want to combine these 4 dataframes into one dataframe. I used union like the following: df1.union(df2).union(df3).union(df4) But it is time-consuming for big data. What is the optimized way of doing this using Spark 2.0?
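Worth noting that union itself is lazy, plan-level concatenation, so the chain of unions is rarely the bottleneck by itself; the cost shows up in whatever action runs afterwards. A tidier equivalent folds over a sequence of frames. A sketch for the spark-shell, with tiny toy frames standing in for the four real ones:

val df1 = Seq(("c1", "p1", 0.9)).toDF("client_id", "product_id", "interest")
val df2 = Seq(("c2", "p2", 0.4)).toDF("client_id", "product_id", "interest")
val df3 = Seq(("c3", "p1", 0.7)).toDF("client_id", "product_id", "interest")
val df4 = Seq(("c4", "p3", 0.1)).toDF("client_id", "product_id", "interest")

// union matches columns by position, so every frame must have them in the same order.
// All four frames end up in one logical plan and a single job when an action is called.
val combined = Seq(df1, df2, df3, df4).reduce(_ union _)

combined.show()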

Re: Couchbase-Spark 2.0.0

2016-10-17 Thread Devi P.V
build.sbt. Is my query wrong, or is anything else needed for the import? Please help. On Sun, Oct 16, 2016 at 8:23 PM, Rodrick Brown <rodr...@orchardplatform.com> wrote: > On Sun, Oct 16, 2016 at 10:51 AM, Devi P.V <devip2...@gmail.com> wrote: >> Hi all, >> I am

Couchbase-Spark 2.0.0

2016-10-16 Thread Devi P.V
Hi all, I am trying to read data from Couchbase using Spark 2.0.0. I need to fetch the complete data from a bucket as an RDD. How can I solve this? Does Spark 2.0.0 support Couchbase? Please help. Thanks

Re: How to write data into CouchBase using Spark & Scala?

2016-09-07 Thread Devi P.V
> -- > Oleksiy Dyagilev > On Wed, Sep 7, 2016 at 9:42 AM, Devi P.V <devip2...@gmail.com> wrote: >> I am a newbie in Couchbase. I am trying to write data into Couchbase. My >> sample code is the following: >> val cfg = new SparkConf() >

How to write data into CouchBase using Spark & Scala?

2016-09-07 Thread Devi P.V
I am a newbie in Couchbase. I am trying to write data into Couchbase. My sample code is the following:

val cfg = new SparkConf()
  .setAppName("couchbaseQuickstart")
  .setMaster("local[*]")
  .set("com.couchbase.bucket.MyBucket", "pwd")

val sc = new SparkContext(cfg)

val doc1 =

Re: How to install spark with s3 on AWS?

2016-08-26 Thread Devi P.V
The following piece of code works for me to read data from S3 using Spark.

val conf = new SparkConf().setAppName("Simple Application").setMaster("local[*]")
val sc = new SparkContext(conf)
val hadoopConf = sc.hadoopConfiguration
hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native
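For completeness, a hedged sketch of the full pattern the snippet above is truncated from, using the s3n / NativeS3FileSystem settings from Hadoop versions of that era (newer stacks would use s3a instead); bucket name, path, and credentials are placeholders:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("Simple Application").setMaster("local[*]")
val sc = new SparkContext(conf)

// Configure the legacy native S3 filesystem and its credentials.
val hadoopConf = sc.hadoopConfiguration
hadoopConf.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
hadoopConf.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY")      // placeholder
hadoopConf.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY")  // placeholder

// Placeholder bucket and path.
val lines = sc.textFile("s3n://your-bucket/path/to/file.csv")
println(lines.count())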

Re: Spark MLlib: Collaborative Filtering

2016-08-24 Thread Devi P.V
"Courage doesn't always roar. Sometimes courage is the quiet voice at the end of the day saying I will try again" > From: glen <cng...@126.com> > To: "Devi P.V" <devip2...@gmail.com> > Cc: "user@spark.apache.org"

Spark MLlib: Collaborative Filtering

2016-08-24 Thread Devi P.V
Hi all, I am a newbie in collaborative filtering. I want to implement a collaborative filtering algorithm (I need to find the top 10 recommended products) using Spark and Scala. I have a rating dataset where userID & productID are of String type.

UserID ProductID Rating
b3a68043-c1
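ALS in Spark expects numeric user and item ids, so String ids like these are usually indexed first. A hedged sketch using ml's StringIndexer and ALS for the spark-shell; the sample ratings are made up, and recommendForAllUsers needs Spark 2.2 or later:

import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.ml.recommendation.ALS

val ratings = Seq(
  ("b3a68043-c1", "prod-1", 4.0),
  ("b3a68043-c1", "prod-2", 1.0),
  ("7f2d9e11-aa", "prod-1", 5.0)
).toDF("userId", "productId", "rating")

// ALS works on integer ids, so map the String ids to numeric indices first.
val userIdx = new StringIndexer().setInputCol("userId").setOutputCol("userIndex").fit(ratings)
val prodIdx = new StringIndexer().setInputCol("productId").setOutputCol("productIndex").fit(ratings)
val indexed = prodIdx.transform(userIdx.transform(ratings))

val als = new ALS()
  .setUserCol("userIndex")
  .setItemCol("productIndex")
  .setRatingCol("rating")
  .setRank(10)
  .setMaxIter(5)

val model = als.fit(indexed)

// Top 10 products per user; the indices can be mapped back to the original
// String ids via the indexers' labels.
model.recommendForAllUsers(10).show(false)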

What are the configurations needed to connect Spark and MS SQL Server?

2016-08-08 Thread Devi P.V
Hi all, I am trying to write a Spark dataframe into MS SQL Server. I have tried using the following code:

val sqlprop = new java.util.Properties
sqlprop.setProperty("user", "uname")
sqlprop.setProperty("password", "pwd")
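The pieces that usually have to be in place are the JDBC URL, the driver class, and Microsoft's sqljdbc jar on the classpath. A hedged sketch of the write for the spark-shell; host, port, database, table, and credentials are placeholders:

val sqlprop = new java.util.Properties
sqlprop.setProperty("user", "uname")          // placeholder
sqlprop.setProperty("password", "pwd")        // placeholder
sqlprop.setProperty("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")

// Placeholder URL; host, port and database name depend on the server setup.
val url = "jdbc:sqlserver://localhost:1433;databaseName=mydb"

val df = Seq(("c1", "p1", 0.9)).toDF("client_id", "product_id", "interest")

// Writes the dataframe as a table over JDBC; the sqljdbc driver jar must be
// visible to driver and executors (e.g. spark-submit --jars sqljdbc42.jar).
df.write.mode("append").jdbc(url, "dbo.my_table", sqlprop)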

How to connect Power BI to Apache Spark on local machine?

2016-08-04 Thread Devi P.V
Hi all, I am a newbie in Power BI. What configurations are needed to connect Power BI to Spark on my local machine? I found some documents that mention Spark over Azure's HDInsight, but I didn't find any reference material for connecting to Spark on a remote machine. Is it possible? The following is the

Optimized way to multiply two large matrices and save output using Spark and Scala

2016-01-13 Thread Devi P.V
I want to multiply two large matrices (read from CSV files) using Spark and Scala and save the output. I use the following code:

val rows = file1.coalesce(1, false).map(x => {
  val line = x.split(delimiter).map(_.toDouble)
  Vectors.sparse(line.length, line.zipWithIndex.map(e => (e._2,
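One common route for a large distributed product is mllib's distributed matrices: build IndexedRowMatrix objects from the parsed rows, convert to BlockMatrix, and multiply. A hedged sketch for the spark-shell, with tiny inline matrices standing in for the CSV-backed ones and a placeholder output path:

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}

// Tiny stand-ins for the two CSV-backed matrices.
val rowsA = sc.parallelize(Seq(
  IndexedRow(0L, Vectors.dense(1.0, 2.0)),
  IndexedRow(1L, Vectors.dense(3.0, 4.0))
))
val rowsB = sc.parallelize(Seq(
  IndexedRow(0L, Vectors.dense(5.0, 6.0)),
  IndexedRow(1L, Vectors.dense(7.0, 8.0))
))

// BlockMatrix distributes the multiplication over pairs of blocks.
val a = new IndexedRowMatrix(rowsA).toBlockMatrix()
val b = new IndexedRowMatrix(rowsB).toBlockMatrix()
val product = a.multiply(b)

// Save as text; each line is one (row, column, value) entry of the product.
product.toCoordinateMatrix().entries
  .map(e => s"${e.i},${e.j},${e.value}")
  .saveAsTextFile("/tmp/matrix-product")  // placeholder output path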

Count of distinct values in each column

2015-07-29 Thread Devi P.V
Hi All, I have a 5 GB CSV dataset with 69 columns. I need to find the count of distinct values in each column. What is the optimized way to do this using Spark and Scala?

Example CSV format:
a,b,c,d
a,c,b,a
b,b,c,d
b,b,c,a
c,b,b,a

Expected output: (a,2),(b,2),(c,1)  # first column
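One way that avoids launching a separate job per column is to explode every row into (columnIndex, value) pairs once and aggregate everything in a single pass. A hedged sketch for the spark-shell using the sample rows above; with the real data the first line would be sc.textFile on the CSV:

// Sample rows from the mail; with real data this would be sc.textFile("data.csv").
val lines = sc.parallelize(Seq("a,b,c,d", "a,c,b,a", "b,b,c,d", "b,b,c,a", "c,b,b,a"))

// One pass over the data: emit ((columnIndex, value), 1) for every cell, then reduce.
val valueCounts = lines
  .flatMap(_.split(",").zipWithIndex.map { case (v, idx) => ((idx, v), 1) })
  .reduceByKey(_ + _)

// Regroup per column so each column's (value, count) pairs are printed together.
valueCounts
  .map { case ((idx, v), n) => (idx, (v, n)) }
  .groupByKey()
  .sortByKey()
  .collect()
  .foreach { case (idx, counts) =>
    println(s"column $idx: " + counts.toSeq.sortBy(_._1).mkString(", "))
  }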