Hi,

How can I convert this Python NumPy code to Spark RDDs so that the
operations leverage Spark's distributed architecture for Big Data?

The code is as follows:

import numpy as np

def gini(array):
    """Calculate the Gini coefficient of a numpy array."""
    array = array.flatten()       # all values are treated equally, array must be 1d
    if np.amin(array) < 0:
        array -= np.amin(array)   # values cannot be negative
    array += 0.0000001            # values cannot be 0
    array = np.sort(array)        # values must be sorted ascending
    index = np.arange(1, array.shape[0] + 1)  # 1-based index per array element
    n = array.shape[0]            # number of array elements
    return np.sum((2 * index - n - 1) * array) / (n * np.sum(array))  # Gini coefficient
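
For reference, below is a rough, untested sketch of the shape I imagine
the RDD version taking (assuming a SparkContext named sc is already
available and the data can live in an RDD of plain numbers). Does
something like this make sense, or is there a more idiomatic way?

def gini_rdd(values_rdd):
    """Sketch: Gini coefficient over an RDD of numeric values (untested)."""
    min_val = values_rdd.min()
    if min_val < 0:
        values_rdd = values_rdd.map(lambda x: x - min_val)  # values cannot be negative
    values_rdd = values_rdd.map(lambda x: x + 0.0000001)    # values cannot be 0
    n = values_rdd.count()                                   # number of elements
    total = values_rdd.sum()                                 # sum of all values
    # sort ascending, then attach a 0-based position; after sortBy,
    # zipWithIndex yields the global sorted rank of each value
    indexed = values_rdd.sortBy(lambda x: x).zipWithIndex()  # (value, 0-based rank)
    weighted_sum = indexed.map(
        lambda vi: (2 * (vi[1] + 1) - n - 1) * vi[0]).sum()
    return weighted_sum / (n * total)

# e.g. gini_rdd(sc.parallelize([1.0, 2.0, 3.0, 4.0]))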




Thanks in advance,
Aakash.
