Sandy, Ryan, Andrew
Thanks very much. I think I now understand it better.
Jeff
From: ryan.blake.willi...@gmail.com
Date: Thu, 19 Nov 2015 06:00:30 +
Subject: Re: SequenceFile and object reuse
To: sandy.r...@cloudera.com; jeffsar...@hotmail.com
CC: user@spark.apache.org
Hey Jeff, in addition to what Sandy said, there are two more reasons that
this might not be as bad as it seems; I may be incorrect in my
understanding, though.
First, the "additional step" you're referring to is unlikely to add
any overhead; the "extra map" is really just materializing the
Hi Jeff,
Many access patterns simply take the result of hadoopFile and use it to
create some other object, and thus have no need for each input record to
refer to a different object. In those cases, the current API is more
performant than an alternative that would create an object for each record.
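To illustrate the access pattern Sandy describes, here is a minimal sketch with no Spark or Hadoop dependency: the `MutableText` class and the loop stand in for a reused Writable (e.g. `org.apache.hadoop.io.Text`) and the per-record conversion, and are our invention for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class ImmediateConvertDemo {
    // Stand-in for a reused Writable such as org.apache.hadoop.io.Text.
    static class MutableText {
        private final StringBuilder buf = new StringBuilder();
        void set(String s) { buf.setLength(0); buf.append(s); }
        @Override public String toString() { return buf.toString(); }
    }

    public static void main(String[] args) {
        List<String> rows = List.of("x", "y", "z");
        MutableText reused = new MutableText(); // one object for ALL records

        // Derive a NEW immutable value from each record right away
        // (here via toString()), so it never matters that the underlying
        // Writable is the same instance every time.
        List<String> out = new ArrayList<>();
        for (String r : rows) {
            reused.set(r);              // the "reader" mutates in place
            out.add(reused.toString()); // immediate conversion -> safe
        }
        System.out.println(out); // [x, y, z]
    }
}
```

Because each record is converted before the next one overwrites the shared object, no defensive copy is ever needed in this pattern.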
So we tried reading a SequenceFile in Spark and realized that all our records
ended up being the same.
Then one of us found this:
Note: Because Hadoop's RecordReader class re-uses the same Writable object for
each record, directly caching the returned RDD or directly passing it to an
aggregation or shuffle operation will create many references to the same
object. If you plan to directly cache, sort, or aggregate Writable objects,
you should first copy them using a map function.
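The failure mode and the documented fix can be sketched without a Spark cluster. In the snippet below, the shared `Row` instance plays the role of Hadoop's reused Writable; the class name and loop are ours, and the Spark equivalent of the fix would be something like `rdd.map(r => copyOf(r))` before caching.

```java
import java.util.ArrayList;
import java.util.List;

public class WritableReuseDemo {
    // Stand-in for a reused Writable record.
    static class Row {
        String value;
        Row(String v) { value = v; }
    }

    public static void main(String[] args) {
        List<String> data = List.of("a", "b", "c");

        // Broken path: cache references to the single reused object.
        Row shared = new Row("");
        List<Row> cached = new ArrayList<>();
        for (String s : data) {
            shared.value = s;   // "RecordReader" mutates the same instance
            cached.add(shared); // caching the reference, not a copy
        }
        // Every cached entry now shows the LAST record's value.
        System.out.println(cached.get(0).value); // c

        // Fix: copy each record before caching, as the docs advise
        // (in Spark: a map that clones each Writable).
        Row shared2 = new Row("");
        List<Row> copied = new ArrayList<>();
        for (String s : data) {
            shared2.value = s;
            copied.add(new Row(shared2.value)); // defensive copy
        }
        System.out.println(copied.get(0).value); // a
    }
}
```

This is exactly why all of your records appeared identical: the RDD held many references to one mutable object, so every element reflected whatever record was read last.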