that mine previously.
>>>
>>> To ensure that this works, the idea is to:
>>>
>>> 1) Filter the superset to relevant mines (done)
>>> 2) Group the subset by the unique identifier for the mine. So, a group may
>>> be all the rows for mine "A" for 1990-2015
>> 3) I then want to ensure that the RDD is partitioned by the Mine Identifier
>> (an Integer).
>>
It's step 3 that is confusing me. I suspect it's very easy ... do I simply
use PartitionByKey?
We're using Java if that makes any difference.
Thanks!
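
For reference, here is a rough sketch of what steps 2 and 3 amount to, written in
plain Java with no Spark dependency so it can be run anywhere. As far as I know
the Spark method is named partitionBy rather than PartitionByKey: on a
JavaPairRDD keyed by the mine identifier, partitionBy(new HashPartitioner(n))
sends each key to partition (hashCode mod n), made non-negative. The sketch
below mimics only that rule. The names MineRow, partitionFor and
partitionByMine, the sample rows and the partition count of 4 are all made up
for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MinePartitioningSketch {

    /** Hypothetical row type: one record for a mine in a given year. */
    static final class MineRow {
        final int mineId;
        final int year;

        MineRow(int mineId, int year) {
            this.mineId = mineId;
            this.year = year;
        }

        @Override
        public String toString() {
            return "mine " + mineId + " (" + year + ")";
        }
    }

    /** Hash partitioning rule: non-negative (hashCode mod numPartitions). */
    static int partitionFor(int mineId, int numPartitions) {
        return Math.floorMod(Integer.hashCode(mineId), numPartitions);
    }

    /** Places every row in the partition chosen by its mine identifier. */
    static Map<Integer, List<MineRow>> partitionByMine(List<MineRow> rows, int numPartitions) {
        Map<Integer, List<MineRow>> partitions = new TreeMap<>();
        for (MineRow row : rows) {
            int p = partitionFor(row.mineId, numPartitions);
            partitions.computeIfAbsent(p, k -> new ArrayList<>()).add(row);
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Assumed sample data: two mines, two years each.
        List<MineRow> rows = List.of(
            new MineRow(7, 1990),
            new MineRow(12, 1990),
            new MineRow(7, 1991),
            new MineRow(12, 1991));

        // All rows for one mine end up in the same partition.
        partitionByMine(rows, 4).forEach((p, group) ->
            System.out.println("partition " + p + " -> " + group));
    }
}
```

The point of the sketch is that every row sharing a mine identifier lands in the
same partition, so per-mine work can then run without moving data again. In
Spark itself the rows would first have to be keyed by mine identifier (for
example with mapToPair) before partitionBy is available.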
--
View this message in context:
http://apache-spark-user-list.100156