Re: New to Spark - Paritioning Question

2015-09-09 Thread Richard Marscher
>>> that mine previously.
>>>
>>> To ensure that this works, the idea is to:
>>>
>>> 1) Filter the superset to relevant mines (done)
>>> 2) Group the subset by the unique identifier for the mine. So, a group may be all the rows for mine

Re: New to Spark - Paritioning Question

2015-09-08 Thread Richard Marscher
> It's step 3 that is confusing me. I suspect it's very easy ... do I simply use PartitionByKey?
>
> We're using Java if that makes any difference.
>
> Thanks!
>
> --
> View this message in context:

Re: New to Spark - Paritioning Question

2015-09-08 Thread Mike Wright
>> "A" for 1990-2015
>> 3) I then want to ensure that the RDD is partitioned by the Mine Identifier (an Integer).
>>
>> It's step 3 that is confusing me. I suspect it's very easy ... do I simply

New to Spark - Paritioning Question

2015-09-04 Thread mmike87
... want to ensure that the RDD is partitioned by the Mine Identifier (an Integer).

It's step 3 that is confusing me. I suspect it's very easy ... do I simply use PartitionByKey?

We're using Java if that makes any difference.

Thanks!

--
View this message in context: http://apache-spark-user-list.100156
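
On the step 3 question: as far as I know there is no PartitionByKey in Spark's Java API. The usual call is partitionBy(...) on a JavaPairRDD keyed by the mine identifier, passing a Partitioner such as org.apache.spark.HashPartitioner. A hash partitioner sends every row with the same key to the same partition by taking a non-negative modulo of the key's hash code. The sketch below shows only that idea in plain Java, with no Spark dependency so it runs on its own. MineRow, partitionFor and partitionByMine are made-up names for illustration, not Spark classes or methods, and the row fields are assumptions about the data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch (no Spark dependency). All names here are hypothetical.
class MinePartitionSketch {

    // Hypothetical row: one record for one mine in one year.
    public static final class MineRow {
        public final int mineId;
        public final int year;

        public MineRow(int mineId, int year) {
            this.mineId = mineId;
            this.year = year;
        }

        @Override
        public String toString() {
            return "mine " + mineId + " / " + year;
        }
    }

    // Non-negative modulo of the key's hash code: the same idea a hash
    // partitioner uses to map a key to a partition index.
    public static int partitionFor(Integer mineId, int numPartitions) {
        int mod = mineId.hashCode() % numPartitions;
        return mod < 0 ? mod + numPartitions : mod;
    }

    // Buckets rows by partition index, so all rows for one mine share a bucket.
    public static Map<Integer, List<MineRow>> partitionByMine(List<MineRow> rows, int numPartitions) {
        Map<Integer, List<MineRow>> partitions = new TreeMap<>();
        for (MineRow row : rows) {
            int p = partitionFor(row.mineId, numPartitions);
            partitions.computeIfAbsent(p, k -> new ArrayList<>()).add(row);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<MineRow> rows = new ArrayList<>();
        rows.add(new MineRow(7, 1990));
        rows.add(new MineRow(12, 1990));
        rows.add(new MineRow(7, 1991));

        // Print each partition index with the rows assigned to it.
        partitionByMine(rows, 4).forEach(
            (p, rs) -> System.out.println("partition " + p + " -> " + rs));
    }
}
```

Note that several mines can share one partition when there are more mines than partitions. Hash partitioning only guarantees that a single mine's rows are never split across partitions.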