How do you want to combine Mahout and Solr? => That was my question. I was using Mahout 0.6, but since yesterday I have been using Mahout 0.7.
So I tried to run the following, just as a test to make sure that everything works properly:

###############################################################################################################
:/usr/local/mahout-distribution-0.7/examples/bin$ ./build-cluster-syntheticcontrol.sh
Please call cluster-syntheticcontrol.sh directly next time. This file is going away.
Please select a number to choose the corresponding clustering algorithm
1. canopy clustering
2. kmeans clustering
3. fuzzykmeans clustering
4. dirichlet clustering
5. meanshift clustering
Enter your choice : 1
ok. You chose 1 and we'll use canopy Clustering
creating work directory at /tmp/mahout-work-hduser
Downloading Synthetic control data
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:01:03 --:--:-- 0
curl: (7) couldn't connect to host
Checking the health of DFS...
Warning: $HADOOP_HOME is deprecated.
Found 4 items
drwxr-xr-x - hduser supergroup 0 2012-06-18 14:05 /user/hduser/gutenberg
drwxr-xr-x - hduser supergroup 0 2012-06-18 14:07 /user/hduser/gutenberg-output
drwxr-xr-x - hduser supergroup 0 2012-06-18 15:35 /user/hduser/output
drwxr-xr-x - hduser supergroup 0 2012-06-19 14:24 /user/hduser/testdata
DFS is healthy...
Uploading Synthetic control data to HDFS
Warning: $HADOOP_HOME is deprecated.
Deleted hdfs://localhost:54310/user/hduser/testdata
Warning: $HADOOP_HOME is deprecated.
Warning: $HADOOP_HOME is deprecated.
put: File /tmp/mahout-work-hduser/synthetic_control.data does not exist.
Successfully Uploaded Synthetic control data to HDFS
Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /usr/local/mahout-distribution-0.7/mahout-examples-0.7-job.jar
Warning: $HADOOP_HOME is deprecated.
12/06/20 08:20:24 WARN driver.MahoutDriver: No org.apache.mahout.clustering.syntheticcontrol.canopy.Job.props found on classpath, will use command-line arguments only
12/06/20 08:20:24 INFO canopy.Job: Running with default arguments
12/06/20 08:20:25 INFO common.HadoopUtil: Deleting output
12/06/20 08:20:26 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/06/20 08:20:28 INFO input.FileInputFormat: Total input paths to process : 0
12/06/20 08:20:28 INFO mapred.JobClient: Running job: job_201206181326_0030
12/06/20 08:20:29 INFO mapred.JobClient: map 0% reduce 0%
12/06/20 08:20:52 INFO mapred.JobClient: Job complete: job_201206181326_0030
12/06/20 08:20:52 INFO mapred.JobClient: Counters: 4
12/06/20 08:20:52 INFO mapred.JobClient: Job Counters
12/06/20 08:20:52 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=10970
12/06/20 08:20:52 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/06/20 08:20:52 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/06/20 08:20:52 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
12/06/20 08:20:52 INFO canopy.CanopyDriver: Build Clusters Input: output/data Out: output Measure: org.apache.mahout.common.distance.EuclideanDistanceMeasure@c5967f t1: 80.0 t2: 55.0
12/06/20 08:20:52 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/06/20 08:20:53 INFO input.FileInputFormat: Total input paths to process : 0
12/06/20 08:20:53 INFO mapred.JobClient: Running job: job_201206181326_0031
12/06/20 08:20:54 INFO mapred.JobClient: map 0% reduce 0%
12/06/20 08:21:17 INFO mapred.JobClient: map 0% reduce 100%
12/06/20 08:21:22 INFO mapred.JobClient: Job complete: job_201206181326_0031
12/06/20 08:21:22 INFO mapred.JobClient: Counters: 19
12/06/20 08:21:22 INFO mapred.JobClient: Job Counters
12/06/20 08:21:22 INFO mapred.JobClient: Launched reduce tasks=1
12/06/20 08:21:22 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=9351
12/06/20 08:21:22 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/06/20 08:21:22 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/06/20 08:21:22 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=7740
12/06/20 08:21:22 INFO mapred.JobClient: File Output Format Counters
12/06/20 08:21:22 INFO mapred.JobClient: Bytes Written=106
12/06/20 08:21:22 INFO mapred.JobClient: FileSystemCounters
12/06/20 08:21:22 INFO mapred.JobClient: FILE_BYTES_WRITTEN=22545
12/06/20 08:21:22 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=106
12/06/20 08:21:22 INFO mapred.JobClient: Map-Reduce Framework
12/06/20 08:21:22 INFO mapred.JobClient: Reduce input groups=0
12/06/20 08:21:22 INFO mapred.JobClient: Combine output records=0
12/06/20 08:21:22 INFO mapred.JobClient: Reduce shuffle bytes=0
12/06/20 08:21:22 INFO mapred.JobClient: Physical memory (bytes) snapshot=40652800
12/06/20 08:21:22 INFO mapred.JobClient: Reduce output records=0
12/06/20 08:21:22 INFO mapred.JobClient: Spilled Records=0
12/06/20 08:21:22 INFO mapred.JobClient: CPU time spent (ms)=420
12/06/20 08:21:22 INFO mapred.JobClient: Total committed heap usage (bytes)=16252928
12/06/20 08:21:22 INFO mapred.JobClient: Virtual memory (bytes) snapshot=383250432
12/06/20 08:21:22 INFO mapred.JobClient: Combine input records=0
12/06/20 08:21:22 INFO mapred.JobClient: Reduce input records=0
12/06/20 08:21:22 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/06/20 08:21:23 INFO input.FileInputFormat: Total input paths to process : 0
12/06/20 08:21:23 INFO mapred.JobClient: Running job: job_201206181326_0032
12/06/20 08:21:24 INFO mapred.JobClient: map 0% reduce 0%
12/06/20 08:21:43 INFO mapred.JobClient: Job complete: job_201206181326_0032
12/06/20 08:21:43 INFO mapred.JobClient: Counters: 4
12/06/20 08:21:43 INFO mapred.JobClient: Job Counters
12/06/20 08:21:43 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=9347
12/06/20 08:21:43 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/06/20 08:21:43 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/06/20 08:21:43 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
12/06/20 08:21:43 INFO clustering.ClusterDumper: Wrote 0 clusters
12/06/20 08:21:43 INFO driver.MahoutDriver: Program took 78406 ms (Minutes: 1.3067666666666666)
###############################################################################################

How do you want to combine Mahout and Solr? Also, Solr is a web service and can receive and supply data in several different formats.

On Tue, Jun 19, 2012 at 6:04 AM, Paritosh Ranjan <pranjan <at> xebia.com> wrote:
> Regarding the errors,
> which version of Mahout are you using?
> There was some problem in cluster-reuters.sh ( build-reuters.sh calls
> cluster-reuters.sh ) which has been fixed in the last release 0.7.
> ________________________________________
> From: Svet [svetlana.videnova <at> logica.com]
> Sent: Tuesday, June 19, 2012 2:51 PM
> To: user <at> mahout.apache.org
> Subject: several info
>
> Hi all,
>
> First of all I would like to thank Praveenesh Kumar for helping me with
> Hadoop and Mahout!!!
>
> Nevertheless I have several questions about Mahout.
>
> 1) I need Mahout working with SOLR.
> Can somebody point me to a good tutorial for getting them working together?
>
> 2) What exactly are the possible input and output file formats of Mahout
> (especially when Mahout works with SOLR; I know that the output file of
> SOLR is XML)?
>
> 3) Which of those algorithms use Hadoop? And please complete the list if I
> forgot some.
> - Canopy, KMeans, Dirichlet, Mean-shift, Latent Dirichlet Allocation
>
> 4) Moreover, I was trying to run "./build-reuters.sh" and choosing kmeans
> clustering (but it's the same error with fuzzykmeans).
> Can somebody help me with this error? (but look at 8) ! )
> ###########################
> 12/06/19 13:33:52 WARN mapred.LocalJobRunner: job_local_0001
> java.lang.IllegalStateException: No clusters found. Check your -c path.
>     at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:59)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> 12/06/19 13:33:52 INFO mapred.JobClient: map 0% reduce 0%
> 12/06/19 13:33:52 INFO mapred.JobClient: Job complete: job_local_0001
> 12/06/19 13:33:52 INFO mapred.JobClient: Counters: 0
> Exception in thread "main" java.lang.InterruptedException: K-Means Iteration failed processing /tmp/mahout-work-hduser/reuters-kmeans-clusters/part-randomSeed
>     at org.apache.mahout.clustering.kmeans.KMeansDriver.runIteration(KMeansDriver.java:371)
>     at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClustersMR(KMeansDriver.java:316)
>     at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:239)
>     at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:154)
>     at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:112)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:61)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:601)
>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
> ###########################
>
> 5) There is also a problem with "./build-reuters" when choosing lda (but look at 8) ! )
> ############################
> 12/06/19 13:40:01 WARN mapred.LocalJobRunner: job_local_0001
> java.lang.IllegalArgumentException
>     at com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
>     at org.apache.mahout.clustering.lda.LDADriver.createState(LDADriver.java:124)
>     at org.apache.mahout.clustering.lda.LDADriver.createState(LDADriver.java:92)
>     at org.apache.mahout.clustering.lda.LDAWordTopicMapper.configure(LDAWordTopicMapper.java:96)
>     at org.apache.mahout.clustering.lda.LDAWordTopicMapper.setup(LDAWordTopicMapper.java:102)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> 12/06/19 13:40:02 INFO mapred.JobClient: map 0% reduce 0%
> 12/06/19 13:40:02 INFO mapred.JobClient: Job complete: job_local_0001
> 12/06/19 13:40:02 INFO mapred.JobClient: Counters: 0
> Exception in thread "main" java.lang.InterruptedException: LDA Iteration failed processing /tmp/mahout-work-hduser/reuters-lda/state-0
>     at org.apache.mahout.clustering.lda.LDADriver.runIteration(LDADriver.java:449)
>     at org.apache.mahout.clustering.lda.LDADriver.run(LDADriver.java:249)
>     at org.apache.mahout.clustering.lda.LDADriver.run(LDADriver.java:169)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.mahout.clustering.lda.LDADriver.main(LDADriver.java:88)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:601)
>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
> ############################
>
> 6) But when I started "./build-reuters" with dirichlet clustering, it wrote
> 20 clusters without problems (but look at 8) ! )
> The result is:
> ############################
> ...
> 12/06/19 13:45:12 INFO driver.MahoutDriver: Program took 142609 ms (Minutes: 2.3768166666666666)
> MAHOUT_LOCAL is set, so we don't add HADOOP_CONF_DIR to classpath.
> MAHOUT_LOCAL is set, running locally
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.6/mahout-examples-0.6-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.6/lib/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.6/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> 12/06/19 13:45:13 INFO common.AbstractJob: Command line arguments: {--dictionary=/tmp/mahout-work-hduser/reuters-out-seqdir-sparse-dirichlet/dictionary.file-0, --dictionaryType=sequencefile, --distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure, --endPhase=2147483647, --numWords=20, --outputFormat=TEXT, --seqFileDir=/tmp/mahout-work-hduser/reuters-dirichlet/clusters-20-final, --startPhase=0, --substring=100, --tempDir=temp}
> DC-0 total= 0 model= DMC:0{n=0 c=[] r=[]}
> Top Terms:
> DC-1 total= 0 model= DMC:1{n=0 c=[] r=[]}
> Top Terms:
> DC-10 total= 0 model= DMC:10{n=0 c=[] r=[]}
> Top Terms:
> DC-11 total= 0 model= DMC:11{n=0 c=[] r=[]}
> Top Terms:
> DC-12 total= 0 model= DMC:12{n=0 c=[] r=[]}
> Top Terms:
> DC-13 total= 0 model= DMC:13{n=0 c=[] r=[]}
> Top Terms:
> DC-14 total= 0 model= DMC:14{n=0 c=[] r=[]}
> Top Terms:
> DC-15 total= 0 model= DMC:15{n=0 c=[] r=[]}
> Top Terms:
> DC-16 total= 0 model= DMC:16{n=0 c=[] r=[]}
> Top Terms:
> DC-17 total= 0 model= DMC:17{n=0 c=[] r=[]}
> Top Terms:
> DC-18 total= 0 model= DMC:18{n=0 c=[] r=[]}
> Top Terms:
> DC-19 total= 0 model= DMC:19{n=0 c=[] r=[]}
> Top Terms:
> DC-2 total= 0 model= DMC:2{n=0 c=[] r=[]}
> Top Terms:
> DC-3 total= 0 model= DMC:3{n=0 c=[] r=[]}
> Top Terms:
> DC-4 total= 0 model= DMC:4{n=0 c=[] r=[]}
> Top Terms:
> DC-5 total= 0 model= DMC:5{n=0 c=[] r=[]}
> Top Terms:
> DC-6 total= 0 model= DMC:6{n=0 c=[] r=[]}
> Top Terms:
> DC-7 total= 0 model= DMC:7{n=0 c=[] r=[]}
> Top Terms:
> DC-8 total= 0 model= DMC:8{n=0 c=[] r=[]}
> Top Terms:
> DC-9 total= 0 model= DMC:9{n=0 c=[] r=[]}
> Top Terms:
> 12/06/19 13:45:14 INFO clustering.ClusterDumper: Wrote 20 clusters
> 12/06/19 13:45:14 INFO driver.MahoutDriver: Program took 789 ms (Minutes: 0.01315)
> ############################
>
> 7) And finally: "./build-reuters" with minhash clustering.
> Works well!
>
> 8) For 4), 5), 6) and 7) there is a SUCCESS file in /tmp/mahout-work-hduser/
>
> ...
>
> Thanks everybody
> Regards

--
Lance Norskog
goksron <at> gmail.com
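
A note on the synthetic-control run at the top of this thread: the log shows curl failing ("couldn't connect to host"), then "put: File /tmp/mahout-work-hduser/synthetic_control.data does not exist", then "Total input paths to process : 0" and "Wrote 0 clusters". So the jobs appear to have run on empty input. Below is a minimal sketch of a guard that could be run before the upload step. The function name `require_nonempty_file` is hypothetical and is not part of the Mahout example scripts. The commented usage lines assume the same local path and HDFS target directory that appear in the log.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of Mahout): succeeds only if the given
# path exists and has a size greater than zero.
require_nonempty_file() {
  local path="$1"
  if [ ! -s "$path" ]; then
    echo "missing or empty input: $path" >&2
    return 1
  fi
  return 0
}

# Usage sketch (left commented out; the paths are taken from the log above):
# DATA=/tmp/mahout-work-hduser/synthetic_control.data
# require_nonempty_file "$DATA" && hadoop fs -put "$DATA" testdata
```

With a check like this, a failed download stops the run before any MapReduce job is started, instead of the jobs completing with zero clusters.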