Hi Suraj,

What will be your output after the group by? GroupBy is meant for
aggregations like sum, count, etc.
If you want to count the 2015 records, then that is possible.

Kind Regards
Salih Oztop


From: Suraj Shetiya <surajshet...@gmail.com>
To: user@spark.apache.org
Sent: Tuesday, June 30, 2015 3:05 PM
Subject: Spark Dataframe 1.4 (GroupBy partial match)

I have a dataset (trimmed and simplified) with 2 columns as below.

Date            Subject
2015-01-14      "SEC Inquiry"
2014-02-12      "Happy birthday"
2014-02-13      "Re: Happy birthday"
2015-01-16      "Re: SEC Inquiry"
2015-01-18      "Fwd: Re: SEC Inquiry"

I have imported this into a Spark DataFrame. What I want is to groupBy the
Subject field, but with a partial match, so that messages belonging to the
same discussion topic end up in the same group.

For example, in the above case I would like to group all messages whose
subject contains "SEC Inquiry", which would return the following grouped frame:

2015-01-14      "SEC Inquiry"
2015-01-16      "Re: SEC Inquiry"
2015-01-18      "Fwd: Re: SEC Inquiry"
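
One way to think about this kind of partial-match grouping is to derive a
normalized key from each subject and then group on that key instead of on the
raw subject. Below is a minimal sketch in plain Python (no Spark involved),
assuming the only prefixes to strip are "Re:", "Fw:" and "Fwd:". The name
topic_key is hypothetical and not part of any Spark API; in Spark the same
logic would presumably have to be wrapped in a UDF or rewritten as a column
expression.

```python
import re
from collections import defaultdict

# Assumption: thread prefixes are "Re:", "Fw:" or "Fwd:", possibly repeated.
_PREFIX = re.compile(r'^(?:\s*(?:re|fwd?)\s*:\s*)+', re.IGNORECASE)

def topic_key(subject):
    """Strip leading reply/forward prefixes so related subjects share one key."""
    return _PREFIX.sub('', subject).strip()

# The sample rows from the message above.
rows = [
    ("2015-01-14", "SEC Inquiry"),
    ("2014-02-12", "Happy birthday"),
    ("2014-02-13", "Re: Happy birthday"),
    ("2015-01-16", "Re: SEC Inquiry"),
    ("2015-01-18", "Fwd: Re: SEC Inquiry"),
]

# Group the rows by the derived key rather than by the raw subject.
groups = defaultdict(list)
for date, subject in rows:
    groups[topic_key(subject)].append((date, subject))
```

With the sample data, groups["SEC Inquiry"] holds the three rows listed above
and groups["Happy birthday"] holds the other two.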

Another use case for a similar problem would be grouping by year: in the above
example, that means a partial match on the Date field, i.e. a groupBy on Date
that matches only the year, "2014" or "2015".
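
The year case can be sketched the same way: derive the key (here the first
four characters of the date string) and aggregate per key. This is plain
Python for illustration only, assuming the dates are ISO-style "YYYY-MM-DD"
strings; year_key is a hypothetical helper, and in a DataFrame the equivalent
would presumably be a groupBy on a derived column followed by a count.

```python
from collections import Counter

# The Date column from the sample data above.
dates = ["2015-01-14", "2014-02-12", "2014-02-13", "2015-01-16", "2015-01-18"]

def year_key(date_str):
    """Return the year part of a 'YYYY-MM-DD' string (assumed format)."""
    return date_str[:4]

# Count the records per year, i.e. a group-by on the derived key plus count.
counts = Counter(year_key(d) for d in dates)
```

For the sample data, counts["2015"] is 3 and counts["2014"] is 2.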

Keenly looking forward to a reply/solution to the above.

- Suraj






  
