How much work would it be to produce a small standalone reproduction? Could you create an Avro file with some mock data, maybe 10 or so records, and then reproduce this locally?

On Mon, Jun 1, 2015 at 12:31 PM, Igor Berman <igor.ber...@gmail.com> wrote:

> Switching to simple POJOs instead of using Avro for Spark
> serialization solved the problem (I mean reading Avro from S3 and then
> mapping each Avro object to its serializable POJO counterpart with the same
> fields; the POJO is registered with Kryo).
> Any thoughts on where to look for a problem/misconfiguration?
>
> On 31 May 2015 at 22:48, Igor Berman <igor.ber...@gmail.com> wrote:
>
>> Hi
>> We are using Spark 1.3.1
>> Avro-chill (tomorrow I will check if it's important); we register Avro
>> classes from Java
>> Avro 1.7.6
>> On May 31, 2015 22:37, "Josh Rosen" <rosenvi...@gmail.com> wrote:
>>
>>> Which Spark version are you using? I'd like to understand whether this
>>> change could be caused by recent Kryo serializer re-use changes in master /
>>> Spark 1.4.
>>>
>>> On Sun, May 31, 2015 at 11:31 AM, igor.berman <igor.ber...@gmail.com>
>>> wrote:
>>>
>>>> After investigation, the problem is somehow connected to Avro
>>>> serialization with Kryo + chill-avro (mapping the Avro object to a
>>>> simple Scala case class and running reduce on these case class
>>>> objects solves the problem).
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/union-and-reduceByKey-wrong-shuffle-tp23092p23093.html