If the rows in the input to SSVD are the data points you are trying to create a
reduced space for, then the rows of USigma represent the same points in the PCA
(reduced) space. The mapping between input rows and output rows is by the same
keys in the sequence files. However, it doesn't look like your input
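To make that mapping concrete, here is a minimal plain-Java sketch (not Mahout's own matrix classes; the matrices below are made-up toy values): row i of U*Sigma is just row i of U with each column scaled by the corresponding singular value, giving input row i's coordinates in the reduced space.

```java
// Toy sketch (assumption: plain Java arrays, hypothetical values) of how
// rows of U * Sigma give the reduced-space coordinates of the input rows.
public class USigmaSketch {
    // Scale column j of U by the j-th singular value; row i of the result
    // is input row i expressed in the k-dimensional PCA space.
    static double[][] uTimesSigma(double[][] u, double[] sigma) {
        double[][] out = new double[u.length][sigma.length];
        for (int i = 0; i < u.length; i++)
            for (int j = 0; j < sigma.length; j++)
                out[i][j] = u[i][j] * sigma[j];
        return out;
    }

    public static void main(String[] args) {
        double[][] u = {{0.6, 0.8}, {0.8, -0.6}}; // hypothetical left singular vectors
        double[] sigma = {10.0, 2.0};             // hypothetical singular values
        double[][] us = uTimesSigma(u, sigma);
        System.out.println(us[0][0] + " " + us[0][1]); // row 0 in reduced space
    }
}
```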
Hi,
I am trying to use SSVD for dimensionality reduction in Mahout; the input
is sample data in CSV format. Below is a snippet of the input:
22,2,44,36,5,9,2824,2,4,733,285,169
25,1,150,175,3,9,4037,2,18,1822,254,171
I have executed the below steps.
1. Loaded the CSV file and vectorized the data
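That vectorization step could be sketched roughly as follows (plain Java with a hypothetical helper, not Mahout's actual CSV-to-SequenceFile tooling): each CSV line becomes a dense numeric vector, keyed by its row index so input rows can later be matched to SSVD's USigma output rows by key.

```java
// Hypothetical sketch of the "vectorize a CSV row" step in plain Java;
// in a real Mahout pipeline the values would go into vectors written to a
// SequenceFile, keyed by row index.
public class CsvVectorize {
    static double[] toVector(String csvLine) {
        String[] fields = csvLine.split(",");
        double[] v = new double[fields.length];
        for (int i = 0; i < fields.length; i++)
            v[i] = Double.parseDouble(fields[i].trim());
        return v;
    }

    public static void main(String[] args) {
        // First sample row from the thread above
        double[] v = toVector("22,2,44,36,5,9,2824,2,4,733,285,169");
        System.out.println(v.length + " columns, first = " + v[0]);
    }
}
```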
As mahout streamingkmeans has no problems in sequential mode,
I would like to try sequential mode.
However, a "java.lang.OutOfMemoryError" occurs.
I wonder where to set the JVM heap size for sequential mode?
Is it the same as in mapreduce mode?
-----Original Message-----
From: fx MA XIAOJUN
The -rskm option works only in sequential mode and fails in MR. That's still an
issue in the present trunk that needs to be fixed.
That should explain why Streaming KMeans with -rskm works only in sequential
mode for you.
Mahout 0.9 has been built with the Hadoop 1.2.1 profile; not sure if that's gonna
work
Thank you for your extremely quick reply.
>> What do u mean by this? kmeans hasn't changed between 0.8 and 0.9. Did u
>> mean Streaming KMeans here?
I want to try using -rskm in streaming kmeans.
But in mahout 0.8, if I set -rskm to true, errors occur.
I heard that the bug has been fixed in 0.9.
You could take a look
at org.apache.mahout.clustering.classify.ClusterClassificationMapper
Enjoy,
Wei Shung
On Sat, Mar 15, 2014 at 2:51 PM, Suneel Marthi wrote:
> The clustering code is in CIMapper and CIReducer. Following the clustering,
> there is cluster classification, which is mapper-only.
>
On Monday, March 17, 2014 8:10 AM, Bikash Gupta
wrote:
> Want to achieve a few things:
> 1. Normalize input data for the clustering and classification algorithms

Not sure what you consider normalization, but:
If u r trying to normalize text, Lucene's analyzers do it while generating term
vectors
Want to achieve a few things:
1. Normalize input data for the clustering and classification algorithms
2. Normalize output data to plot in a graph
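For numeric input like the CSV sample earlier in this thread, one common reading of "normalize" is column-wise min-max scaling, so no feature dominates the clustering distance metric. A hedged sketch in plain Java (a hypothetical helper, not an existing Mahout utility):

```java
// Hypothetical sketch: rescale each column of a numeric matrix to [0, 1].
// This is only one interpretation of "normalize"; L2 row normalization is
// another common choice for clustering input.
public class ColumnNormalize {
    static double[][] minMaxByColumn(double[][] m) {
        int rows = m.length, cols = m[0].length;
        double[][] out = new double[rows][cols];
        for (int j = 0; j < cols; j++) {
            double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
            for (double[] row : m) {
                min = Math.min(min, row[j]);
                max = Math.max(max, row[j]);
            }
            double range = max - min;
            for (int i = 0; i < rows; i++)
                out[i][j] = range == 0 ? 0.0 : (m[i][j] - min) / range;
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] out = minMaxByColumn(new double[][]{{1, 10}, {2, 20}, {3, 30}});
        System.out.println(out[0][0] + " " + out[1][0] + " " + out[2][0]);
    }
}
```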
On Mon, Mar 17, 2014 at 5:32 PM, Suneel Marthi wrote:
> What r u trying to do?
>
> On Monday, March 17, 2014 7:45 AM, Bikash Gupta
> wrote:
>
> Hi,
>
What r u trying to do?
On Monday, March 17, 2014 7:45 AM, Bikash Gupta
wrote:
Hi,
Do we have any utility for Column and Row normalization in Mahout?
--
Thanks & Regards
Bikash Gupta
I have a 3-node cluster of CDH 4.6; however, I have built Mahout 0.9 with the
Hadoop 2.x profile.
I have also created a mount point for these nodes, and the path URI is the same
as HDFS.
I have manually configured the filesystem parameter:
conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
Have not seen that behavior with KMeans, what were ur settings again?
Sorry for joining this thread late; I have not looked at the entire
history.
On Monday, March 17, 2014 6:52 AM, Bikash Gupta
wrote:
Suneel,
Just for information, I haven't found this issue in Canopy; Canopy's cluster-0
was created in HDFS only.
However, KMeans' cluster-0 was created on the local file system and cluster-1 in
HDFS, and after that it spat out an error as it was unable to locate cluster-0.
On Mon, Mar 17, 2014 at 3:10 PM, Sun
R u running on Hadoop 2.x? That seems to be the case here.
Compile with the hadoop 2 profile:
mvn -DskipTests clean install -Dhadoop2.profile=
On Monday, March 17, 2014 5:57 AM, Margusja wrote:
Hi
Here is my output:
[speech@h14 ~]$ mahout/bin/mahout seqdirectory -c UTF-8 -i
/user/speech/demo -o demo-seqfiles
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /usr/bin/hadoop and
HADOOP_CONF_DIR=/etc/hadoop/conf
MAHOUT-JOB:
/home/speech/mahout/examp
This problem is specific to Canopy clustering and is not an issue with KMeans.
I had seen this behavior with Canopy, and looking at the code it's indeed an
issue wherein cluster-0 is created on the local file system and the remaining
clusters land on HDFS.
Please file a JIRA for this.
On Monday, March 17, 2014 3:43 AM, fx MA XIAOJUN
wrote:
Thank you for your quick reply.
As to -km, I thought it was log10 instead of ln. I was wrong...
This time I set -km 14 and ran mahout streamingkmeans again (CDH 5.0 MRv1,
Mahout 0.8).
The maps ran faster than before, but the reduce was still stuck at 76% forever.
So, I uninstalled mahout 0.
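As a sanity check on the ln-vs-log10 point above, a tiny sketch of the k * ln(n) estimate commonly used for Streaming KMeans' -km (the k and n values below are hypothetical, not taken from this thread):

```java
// Hedged sketch: Streaming KMeans' estimated map-cluster count is usually
// taken as k * ln(n) (natural log, not log10). k and n are made-up examples.
public class KmEstimate {
    static long km(int k, long n) {
        return Math.round(k * Math.log(n)); // Math.log is the natural log
    }

    public static void main(String[] args) {
        // e.g. k = 2 final clusters, n = 1000 points -> 2 * ln(1000) ~ 13.8
        System.out.println(km(2, 1000));
    }
}
```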