Sorry, but the accumulator is still going to require you to walk through the
RDD to get an accurate count, right?
It's not being persisted?
On Jan 14, 2015, at 5:17 AM, Ganelin, Ilya ilya.gane...@capitalone.com wrote:
An alternative to doing a naive toArray is to declare an accumulator per partition and use that.
Hi again,
On Wed, Jan 14, 2015 at 10:06 AM, Tobias Pfeiffer t...@preferred.jp wrote:
If you think of

    items.map(x => /* throw exception */).count()

then even though the count you want to get does not necessarily require
the evaluation of the function in map() (i.e., the number is the same
whether or not that function runs), Spark will still evaluate it when
you call count(), so the exception will be thrown.
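A quick illustration of that point (a sketch, assuming a live SparkContext named sc; the values are made up):

    val items = sc.parallelize(1 to 10)
    // count() does not need the mapped values, but Spark evaluates the
    // map function anyway, so this job fails with the exception.
    items.map { x =>
      if (x == 5) throw new RuntimeException("boom")
      x
    }.count()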
An alternative to doing a naive toArray is to declare an accumulator per partition
and use that. It's specifically what they were designed to do. See the
programming guide.
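A rough Scala sketch of that approach, assuming the Spark 1.x accumulator API (sc.accumulator) that was current at the time, and a hypothetical rdd value:

    // One accumulator per partition, indexed by partition id.
    val counters = Array.fill(rdd.partitions.length)(sc.accumulator(0))

    // Walk each partition once; the empty iterator means nothing is
    // collected back, and count() only forces the evaluation.
    rdd.mapPartitionsWithIndex { (idx, iter) =>
      iter.foreach(_ => counters(idx) += 1)
      Iterator.empty
    }.count()

    // Accumulator values are only reliable on the driver after the action.
    counters.zipWithIndex.foreach { case (acc, idx) =>
      println(s"partition $idx: ${acc.value} elements")
    }

Note that this still triggers a full pass over the data, which is the concern raised in the reply at the top of this thread.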
-----Original Message-----
From: Tobias Pfeiffer
Hi,
On Mon, Jan 12, 2015 at 8:09 PM, Ganelin, Ilya ilya.gane...@capitalone.com
wrote:
Use the mapPartitions function. It gives you an iterator over each partition.
Then just get the length by converting it to an array.
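In Scala that might look like this (a sketch with a hypothetical rdd; note that toArray materializes each partition in memory just to take its length):

    val sizes = rdd.mapPartitions(iter => Iterator(iter.toArray.length)).collect()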
On Tue, Jan 13, 2015 at 2:50 PM, Kevin Burton bur...@spinn3r.com wrote:
Yes, using mapPartitionsWithIndex, e.g. in PySpark:

    sc.parallelize(xrange(0, 1000), 4).mapPartitionsWithIndex(
        lambda idx, iter: ((idx, len(list(iter))),)).collect()
    [(0, 250), (1, 250), (2, 250), (3, 250)]
(This is not the most efficient way to get the length of an iterator, but
you get the idea.)
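For the record, a Scala variant that avoids materializing each partition (again assuming a hypothetical rdd):

    rdd.mapPartitionsWithIndex { (idx, iter) =>
      // iter.size consumes the iterator without building a collection.
      Iterator((idx, iter.size))
    }.collect()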