Re: Spark Dataframe 1.4 (GroupBy partial match)

2015-07-03 Thread Suraj Shetiya
://www.linkedin.com/in/salihoztop
--
*From:* Suraj Shetiya surajshet...@gmail.com
*To:* Michael Armbrust mich...@databricks.com
*Cc:* Salih Oztop soz...@yahoo.com; user@spark.apache.org; megha.sridh...@cynepia.com
*Sent:* Thursday, July 2, 2015

Re: Spark Dataframe 1.4 (GroupBy partial match)

2015-07-02 Thread Suraj Shetiya
Date: Jul 2, 2015 12:49 AM
Subject: Re: Spark Dataframe 1.4 (GroupBy partial match)
To: Suraj Shetiya surajshet...@gmail.com
Cc: Salih Oztop soz...@yahoo.com, user@spark.apache.org

You should probably write a UDF that uses a regular expression or other string munging
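The regular-expression suggestion above can be sketched in plain Python. This is only an illustration of the string-munging step, not the code used in the thread: the function name `normalize_subject` and the list of prefixes it strips (`Re:`, `Fwd:`, `Fw:`) are assumptions, and the sample rows are the ones quoted in the original question of this thread.

```python
import re
from collections import defaultdict

# Assumption: a subject may start with any chain of "Re:", "Fwd:" or "Fw:" markers.
_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject):
    """Strip leading reply/forward markers so related subjects share one key."""
    return _PREFIX.sub("", subject).strip()


# Sample rows taken from the original question in this thread.
rows = [
    ("2015-01-14", "SEC Inquiry"),
    ("2014-02-12", "Happy birthday"),
    ("2014-02-13", "Re: Happy birthday"),
    ("2015-01-16", "Re: SEC Inquiry"),
    ("2015-01-18", "Fwd: Re: SEC Inquiry"),
]

# Group the dates by the normalized subject.
groups = defaultdict(list)
for date, subject in rows:
    groups[normalize_subject(subject)].append(date)

for key, dates in groups.items():
    print(key, dates)
```

In Spark, a function like this could presumably be wrapped as a UDF (for example with `pyspark.sql.functions.udf`) and its result used as the grouping key; that wiring is left out here so the sketch runs without a Spark installation.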

Re: Spark Dataframe 1.4 (GroupBy partial match)

2015-06-30 Thread Suraj Shetiya
. If you want to count the 2015 records then it is possible.

Kind Regards
Salih Oztop
--
*From:* Suraj Shetiya surajshet...@gmail.com
*To:* user@spark.apache.org
*Sent:* Tuesday, June 30, 2015 3:05 PM
*Subject:* Spark Dataframe 1.4 (GroupBy partial match)

I have

Spark Dataframe 1.4 (GroupBy partial match)

2015-06-30 Thread Suraj Shetiya
I have a dataset (trimmed and simplified) with 2 columns as below.

Date        Subject
2015-01-14  SEC Inquiry
2014-02-12  Happy birthday
2014-02-13  Re: Happy birthday
2015-01-16  Re: SEC Inquiry
2015-01-18  Fwd: Re: SEC Inquiry

I have imported the same in a

Spark group by sub column

2015-06-19 Thread Suraj Shetiya
Hi, I wanted to obtain a grouped frame from a DataFrame. A snippet of the column on which I need to perform the groupBy is below.

    df.select("To").show()

    To
    ArrayBuffer(vance...
    ArrayBuffer(vance...
    ArrayBuffer(rober...
    ArrayBuffer(richa...
    ArrayBuffer(guill...
    ArrayBuffer(m..pr...

Pipeline in pyspark

2015-04-23 Thread Suraj Shetiya
Hi, I have come across ways of building pipelines of input, transform, and output stages with Java (Google Dataflow, Spark, etc.). I also understand that Spark itself provides ways of creating a pipeline within MLlib for ML transforms (primarily fit). Both of the above are available in Java/Scala