That's right, though it's possible the default way Kryo chooses to serialize the object doesn't work. I'd debug a little more and print out as much as you can about the DateTime object at the point it appears to not work. I think there's a real problem and it only happens to not turn up for the map + take(1) for reasons below.
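One way to test that suspicion locally is to push a DateTime through Kryo by hand and inspect what comes back. The following is only a sketch under assumptions, not a confirmed fix: it assumes the Kryo 2.x API that Spark 1.x depends on, and `DateTimeMillisSerializer` and `KryoDateTimeCheck` are hypothetical names invented for this illustration. The serializer writes only the millisecond instant and the zone ID, so it assumes the default ISO chronology.

```scala
import com.esotericsoftware.kryo.{Kryo, Serializer}
import com.esotericsoftware.kryo.io.{Input, Output}
import org.joda.time.{DateTime, DateTimeZone}

// Hypothetical explicit serializer (not part of Spark, Kryo or Joda).
// It stores the instant and the zone ID rather than relying on Kryo's
// default field-by-field handling of the DateTime object.
// Assumption: the value uses the default ISO chronology.
class DateTimeMillisSerializer extends Serializer[DateTime] {
  override def write(kryo: Kryo, output: Output, dt: DateTime): Unit = {
    output.writeLong(dt.getMillis)
    output.writeString(dt.getZone.getID)
  }

  // Kryo 2.x signature; later major versions declare the last parameter differently.
  override def read(kryo: Kryo, input: Input, cls: Class[DateTime]): DateTime =
    new DateTime(input.readLong(), DateTimeZone.forID(input.readString()))
}

// Hypothetical helper: serialize and deserialize one value in-process so the
// result can be printed or compared without running a Spark job.
object KryoDateTimeCheck {
  def roundTrip(kryo: Kryo, dt: DateTime): DateTime = {
    val output = new Output(64, -1) // 64-byte initial buffer, no upper limit
    kryo.writeObject(output, dt)
    kryo.readObject(new Input(output.toBytes), classOf[DateTime])
  }
}
```

Calling `KryoDateTimeCheck.roundTrip` with a plain `new Kryo()` exercises whatever Kryo does by default for the class. Registering the serializer first, with `kryo.register(classOf[DateTime], new DateTimeMillisSerializer)`, exercises the explicit path instead, and the same two-argument `register` call could replace the one-argument call in the registrator quoted below.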
Sandy I know you work with DateTime for spark-timeseries; does this ring a bell?

On Thu, Jan 14, 2016 at 2:28 PM, Spencer, Alex (Santander) <alex.spen...@santander.co.uk> wrote:
> Hi,
>
> I tried take(1500) and test.collect and these both work on the "single" map
> statement.
>
> I'm very new to Kryo serialisation, I managed to find some code and I copied
> and pasted and that's what originally made the single map statement work:
>
> class MyRegistrator extends KryoRegistrator {
>   override def registerClasses(kryo: Kryo) {
>     kryo.register(classOf[org.joda.time.DateTime])
>   }
> }
>
> Is it because the groupBy sees a different class type? Maybe Array[DateTime]?
> I don’t want to find the answer by trial and error though.
>
> Alex
>
> -----Original Message-----
> From: Sean Owen [mailto:so...@cloudera.com]
> Sent: 14 January 2016 14:07
> To: Spencer, Alex (Santander)
> Cc: user@spark.apache.org
> Subject: Re: NPE when using Joda DateTime
>
> It does look somehow like the state of the DateTime object isn't being
> recreated properly on deserialization somehow, given where the NPE occurs
> (look at the Joda source code). However the object is java.io.Serializable.
> Are you sure the Kryo serialization is correct?
>
> It doesn't quite explain why the map operation works by itself. It could be
> the difference between executing locally (take(1) will look at 1 partition in
> 1 task which prefers to be local) and executing remotely (groupBy is going to
> need a shuffle).
>
> On Thu, Jan 14, 2016 at 1:01 PM, Spencer, Alex (Santander)
> <alex.spen...@santander.co.uk.invalid> wrote:
>> Hello,
>>
>> I was wondering if somebody is able to help me get to the bottom of a
>> null pointer exception I’m seeing in my code. I’ve managed to narrow
>> down a problem in a larger class to my use of Joda’s DateTime
>> functions. I’ve successfully run my code in scala, but I’ve hit a few
>> problems when adapting it to run in spark.
>>
>> Spark version: 1.3.0
>> Scala version: 2.10.4
>> Java HotSpot 1.7
>>
>> I have a small case class called Transaction, which looks something
>> like this:
>>
>> case class Transaction(date : org.joda.time.DateTime = new
>> org.joda.time.DateTime())
>>
>> I have an RDD[Transactions] trans:
>>
>> org.apache.spark.rdd.RDD[Transaction] = MapPartitionsRDD[4] at map at
>> <console>:44
>>
>> I am able to run this successfully:
>>
>> val test = trans.map(_.date.minusYears(10))
>> test.take(1)
>>
>> However if I do:
>>
>> val groupedTrans = trans.groupBy(_.account)
>>
>> //For each group, process transactions in turn:
>> val test = groupedTrans.flatMap { case (_, transList) =>
>>   transList.map {transaction =>
>>     transaction.date.minusYears(10)
>>   }
>> }
>> test.take(1)
>>
>> I get:
>>
>> java.lang.NullPointerException
>>   at org.joda.time.DateTime.minusYears(DateTime.java:1268)
>>
>> Should the second operation not be equivalent to the first .map one?
>> (It’s a long way round of producing my error – but it’s extremely
>> similar to what’s happening in my class).
>>
>> I’ve got a custom registration class for Kryo which I think is working
>> - before I added this the original .map did not work – but shouldn’t
>> it be able to serialize all instances of Joda DateTime?
>>
>> Thank you for any help / pointers you can give me.
>>
>> Kind Regards,
>>
>> Alex.
>>
>> Alex Spencer
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org