Hi,

I am trying to use SSVD in Mahout for dimensionality reduction. The input is
sample data in CSV format; below is a snippet:

22,2,44,36,5,9,2824,2,4,733,285,169
25,1,150,175,3,9,4037,2,18,1822,254,171

I have executed the following steps.

1. Loaded the CSV file and vectorized the data following the steps described
at https://github.com/tdunning/pig-vector, with TextConverter as the key and
VectorWritable as the value. The output of this step is listed below. I believe
values such as 420468 and 279945 are indices; please correct me if I am wrong.
Key: 1: Value:
{420468:733.0,279945:2.0,607618:285.0,107323:4.0,88330:2.0,263605:9.0,975378:169.0,796003:2824.0,899937:44.0,422862:5.0,723271:22.0,508675:36.0}
Key: 1: Value:
{420468:1822.0,279945:2.0,607618:254.0,107323:18.0,88330:1.0,263605:9.0,975378:171.0,796003:4037.0,899937:150.0,422862:3.0,723271:25.0,508675:175.0}

2. Passed the output of the above step to SSVD as follows:
bin/mahout ssvd -i /user/cloudera/vectorized_data/ -o /user/cloudera/reduced_dimensions --rank 7 -us true -V false -U false -pca true -ow -t 1

Below is a snippet of the output in the USigma folder:
Key: 1: Value:
{0:190.78376981262613,1:350.30406212052424,2:78.24932121461198,3:98.67283686605012,4:-122.95056058078157,5:-4.201436498582381,6:-1.4370820809434337}
Key: 1: Value:
{0:1295.933786837574,1:-698.5629072274602,2:-24.15996813349674,3:60.936737740013946,4:11.859426028893711,5:-6.379057682687426,6:0.9356299409590896}
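Regarding the indices in step 1: values like 420468 and 279945 look like positions in a large sparse vector rather than column numbers 0 to 11. Feature hashing would produce exactly this pattern, and both rows above do share the same twelve indices. Whether pig-vector actually hashes column names, and with which hash function and vector size, is an assumption here and should be checked against its README. The sketch below uses `zlib.crc32` and a hypothetical cardinality of 1,000,000, along with hypothetical column names `c0` to `c11`, purely to show why the same field lands on the same index in every row.

```python
import zlib

# Hypothetical vector size; the real one depends on how pig-vector is configured.
CARDINALITY = 1_000_000


def hashed_index(column_name: str, cardinality: int = CARDINALITY) -> int:
    """Map a column name to a vector index with a deterministic hash (illustrative only)."""
    return zlib.crc32(column_name.encode("utf-8")) % cardinality


def vectorize_row(values, column_names, cardinality: int = CARDINALITY) -> dict:
    """Turn one CSV row into a sparse {index: value} mapping."""
    vec = {}
    for name, raw in zip(column_names, values):
        idx = hashed_index(name, cardinality)
        # If two columns ever collide on one index, their values are summed.
        vec[idx] = vec.get(idx, 0.0) + float(raw)
    return vec


# Hypothetical column names for the 12 CSV fields.
columns = [f"c{i}" for i in range(12)]
row1 = "22,2,44,36,5,9,2824,2,4,733,285,169".split(",")
row2 = "25,1,150,175,3,9,4037,2,18,1822,254,171".split(",")

v1 = vectorize_row(row1, columns)
v2 = vectorize_row(row2, columns)

# The index depends only on the column name, so both rows use the same index set.
print(sorted(v1) == sorted(v2))  # True
```

Under this reading, the index values carry no meaning of their own. They only identify which CSV field a value came from.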

Please let me know whether my approach is correct, and help me interpret the
output in the USigma folder.
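The following is a minimal in-memory sketch of what the step 2 command computes, based on my understanding of the Mahout SSVD options. With `-pca true`, the columns are mean-centred before decomposition. With `-us true`, the output is U×Σ truncated to `--rank` columns. Each USigma row would then be the corresponding input row's coordinates along the top 7 principal directions, which is the reduced 7-dimensional version of that row, with column 0 carrying the most variance. The sketch uses NumPy's exact SVD on synthetic data as a stand-in for Mahout's randomized SSVD, so it is not Mahout's implementation. Results from the two can differ slightly, and the sign of any column may be flipped.

```python
import numpy as np


def pca_usigma(A: np.ndarray, k: int):
    """Exact, dense analogue of U*Sigma in PCA mode: SVD of the column-centred matrix."""
    centred = A - A.mean(axis=0)                    # subtract each column's mean
    U, s, Vt = np.linalg.svd(centred, full_matrices=False)
    usigma = U[:, :k] * s[:k]                       # scale column j of U by singular value j
    return usigma, centred, s, Vt


# Synthetic stand-in for the 12-column CSV (not the real data). Columns get very
# different scales, like the raw fields where one value is in the thousands.
rng = np.random.default_rng(42)
scales = np.array([10, 1, 50, 50, 2, 1, 1000, 1, 5, 500, 100, 10], dtype=float)
A = rng.normal(size=(200, 12)) * scales

k = 7  # corresponds to --rank 7
usigma, centred, s, Vt = pca_usigma(A, k)
print(usigma.shape)  # (200, 7)

# Each U*Sigma row equals that row projected onto the top-k principal directions.
projection = centred @ Vt[:k].T
print(np.allclose(usigma, projection))  # True

# Share of total variance captured by each retained component (non-increasing).
explained = s[:k] ** 2 / np.sum(s ** 2)
print(np.round(explained, 3))
```

When the columns are not scaled, the leading components tend to be dominated by the fields with the largest values, such as the one holding 2824 and 4037. If that is not intended, standardising the columns before running SSVD may be worth considering.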


Thanks in advance
Pratap
