Re: Data in one partition after reduceByKey

2015-11-25 Thread Ruslan Dautkhanov
public long getTime() Returns the number of milliseconds since January 1, 1970, 00:00:00 GMT represented by this Date object. http://docs.oracle.com/javase/7/docs/api/java/util/Date.html#getTime%28%29 Based on what you did, it might be easier to derive a date partitioner from that. Also, to get even
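
One possible reading of the date-partitioner suggestion, as a rough sketch only: take the epoch milliseconds from getTime(), convert them to a minute number, and use that number modulo the partition count. The object name MinutePartitioning, the helper minutePartition and the sample timestamp are hypothetical and do not come from the thread. In a real job the same arithmetic would presumably sit inside a custom Spark Partitioner, which is not shown here.

```scala
import java.util.Date

object MinutePartitioning {
  // Hypothetical helper: choose a partition from the minute number
  // (epoch millis / 60000) instead of from the key's hashCode.
  def minutePartition(millis: Long, numPartitions: Int): Int = {
    val minute = java.lang.Math.floorDiv(millis, 60000L)
    // floorMod keeps the result non-negative even for pre-1970 timestamps.
    java.lang.Math.floorMod(minute, numPartitions.toLong).toInt
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical minute-aligned timestamp, read back through Date.getTime().
    val start = new Date(1448000040000L).getTime
    val partitions = (0 until 16).map(i => minutePartition(start + i * 60000L, 16))
    println(s"distinct partitions used: ${partitions.distinct.size}")
  }
}
```

With this scheme, 16 consecutive minutes land in 16 different partitions, because consecutive minute numbers differ by exactly one.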

Re: Data in one partition after reduceByKey

2015-11-23 Thread Patrick McGloin
I will answer my own question, since I figured it out. Here is my answer in case anyone else has the same issue. My DateTimes were all without seconds and milliseconds since I wanted to group data belonging to the same minute. The hashCode() for Joda DateTimes which are one minute apart is a
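
The effect described above can be sketched without Spark or Joda on the classpath. This is a minimal illustration under stated assumptions, not the poster's code: millisHash uses the same formula as java.lang.Long.hashCode as a stand-in for the Joda DateTime hash (which, as far as I know, also adds a chronology-dependent constant), and partitionFor mimics the non-negative modulo that Spark's HashPartitioner applies to key.hashCode. The object name and the sample timestamp are hypothetical.

```scala
object MinuteKeyPartitions {
  // Stand-in for the key's hashCode: fold the upper 32 bits of the
  // epoch millis into the lower 32 bits, as java.lang.Long.hashCode does.
  def millisHash(millis: Long): Int = (millis ^ (millis >>> 32)).toInt

  // Non-negative modulo: maps a hash to a partition index in [0, numPartitions).
  def partitionFor(hash: Int, numPartitions: Int): Int = {
    val raw = hash % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  def main(args: Array[String]): Unit = {
    val numPartitions = 16
    val start = 1448000040000L // hypothetical minute-aligned epoch millis
    // 100 keys, each exactly one minute (60000 ms) after the previous one.
    val keys = (0 until 100).map(i => start + i * 60000L)
    val used = keys.map(k => partitionFor(millisHash(k), numPartitions)).distinct
    println(s"distinct partitions used: ${used.size}")
  }
}
```

Because 60000 is a multiple of 16, minute-aligned keys in this sketch differ only in bits that the modulo discards, so all 100 keys map to a single partition. A partition count that does not divide the key spacing evenly, or a key with better-mixed hash bits, would spread the same data out.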

Data in one partition after reduceByKey

2015-11-20 Thread Patrick McGloin
Hi, I have a Spark application which contains the following segment:
val reparitioned = rdd.repartition(16)
val filtered: RDD[(MyKey, myData)] = MyUtils.filter(reparitioned, startDate, endDate)
val mapped: RDD[(DateTime, myData)] = filtered.map(kv => (kv._1.processingTime, kv._2))
val reduced: