Hi, I am trying to use SSVD for dimensionality reduction in Mahout. The input is sample data in CSV format. Below is a snippet of the input:

22,2,44,36,5,9,2824,2,4,733,285,169
25,1,150,175,3,9,4037,2,18,1822,254,171

I have executed the steps below.

1. Loaded the CSV file and vectorized the data by following the steps described at https://github.com/tdunning/pig-vector, with the key as TextConverter and the value as VectorWritable. The output of this step is listed below. I believe the values 420468, 279945, etc. are indices; please correct me if I am wrong.

Key: 1: Value: {420468:733.0,279945:2.0,607618:285.0,107323:4.0,88330:2.0,263605:9.0,975378:169.0,796003:2824.0,899937:44.0,422862:5.0,723271:22.0,508675:36.0}
Key: 1: Value: {420468:1822.0,279945:2.0,607618:254.0,107323:18.0,88330:1.0,263605:9.0,975378:171.0,796003:4037.0,899937:150.0,422862:3.0,723271:25.0,508675:175.0}

2. Passed the output of the above step to SSVD as follows:

bin/mahout ssvd -i /user/cloudera/vectorized_data/ -o /user/cloudera/reduced_dimensions --rank 7 -us true -V false -U false -pca true -ow -t 1

Below is a snippet of the output in the USigma folder:

Key: 1: Value: {0:190.78376981262613,1:350.30406212052424,2:78.24932121461198,3:98.67283686605012,4:-122.95056058078157,5:-4.201436498582381,6:-1.4370820809434337}
Key: 1: Value: {0:1295.933786837574,1:-698.5629072274602,2:-24.15996813349674,3:60.936737740013946,4:11.859426028893711,5:-6.379057682687426,6:0.9356299409590896}

Please let me know if my approach is correct, and help me interpret the output in the USigma folder.

Thanks in advance,
Pratap
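
For reference, here is a minimal NumPy sketch of what the USigma rows are generally understood to represent when PCA is enabled: each row is the corresponding mean-centered input row projected onto the top k principal directions. This is not Mahout code. The function name `pca_usigma` and the random input matrix are assumptions made for illustration only, and an exact SVD stands in for the stochastic one, so the values (and possibly the signs of individual columns) will not match the SSVD output above.

```python
import numpy as np

def pca_usigma(X, k):
    """Return (U_k * Sigma_k, V_k^T) for the column-mean-centered matrix X."""
    # PCA works on the data after subtracting each column's mean.
    Xc = X - X.mean(axis=0)
    # Exact thin SVD: Xc = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Scale the first k columns of U by the first k singular values.
    return U[:, :k] * s[:k], Vt[:k]

# Hypothetical stand-in data: 20 rows, 12 numeric columns like the CSV.
rng = np.random.default_rng(0)
X = rng.integers(1, 5000, size=(20, 12)).astype(float)

usigma, Vt = pca_usigma(X, k=7)

# Each row of usigma is a 7-dimensional representation of one input row.
# It equals the centered row multiplied by the top-7 right singular vectors.
projected = (X - X.mean(axis=0)) @ Vt.T
```

Under these assumptions, `usigma` and `projected` agree up to floating-point error, which is why the USigma output can be read as the reduced-dimension version of the input, one output row per input row.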