I'm looking into how to do more efficient jobs by using partition strategies, but I'm hitting a blocker after I do a `union` between two RDDs. Suppose A and B are both RDDs with the same partitioner, that is,
`A.partitioner == B.partitioner` If I do A.union(B), the resulting RDD has None partitioner. If I am reading the code correctly, this comes from this line returning False and a partitioner never being set: https://github.com/apache/spark/blob/95f4fbae52d26ede94c3ba8248394749f3d95dcc/python/pyspark/rdd.py#L532 What confuses me is that that line will _always_ return False: is it not true that the union of RDDs results in the sum of number of partitions, which will never be equal to self.getNumPartitions? Am I missing something about this logical check? -- Cam Davidson-Pilon cam...@shopify.com