How do you want to combine Mahout and Solr? => that was exactly my question

I was using Mahout 0.6, but since yesterday I have been on Mahout 0.7.

So I tried to run the following (just as a test, to make sure everything works 
properly):



###############################################################################################################

:/usr/local/mahout-distribution-0.7/examples/bin$ 
./build-cluster-syntheticcontrol.sh

Please call cluster-syntheticcontrol.sh directly next time.  This file is going 
away.

Please select a number to choose the corresponding clustering algorithm

1. canopy clustering

2. kmeans clustering

3. fuzzykmeans clustering

4. dirichlet clustering

5. meanshift clustering

Enter your choice : 1

ok. You chose 1 and we'll use canopy Clustering

creating work directory at /tmp/mahout-work-hduser

Downloading Synthetic control data

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current

                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:--  0:01:03 --:--:--     
0curl: (7) couldn't connect to host

Checking the health of DFS...

Warning: $HADOOP_HOME is deprecated.



Found 4 items

drwxr-xr-x   - hduser supergroup          0 2012-06-18 14:05 
/user/hduser/gutenberg

drwxr-xr-x   - hduser supergroup          0 2012-06-18 14:07 
/user/hduser/gutenberg-output

drwxr-xr-x   - hduser supergroup          0 2012-06-18 15:35 /user/hduser/output

drwxr-xr-x   - hduser supergroup          0 2012-06-19 14:24 
/user/hduser/testdata

DFS is healthy...

Uploading Synthetic control data to HDFS

Warning: $HADOOP_HOME is deprecated.



Deleted hdfs://localhost:54310/user/hduser/testdata

Warning: $HADOOP_HOME is deprecated.



Warning: $HADOOP_HOME is deprecated.



put: File /tmp/mahout-work-hduser/synthetic_control.data does not exist.

Successfully Uploaded Synthetic control data to HDFS

Warning: $HADOOP_HOME is deprecated.



Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=

MAHOUT-JOB: /usr/local/mahout-distribution-0.7/mahout-examples-0.7-job.jar

Warning: $HADOOP_HOME is deprecated.



12/06/20 08:20:24 WARN driver.MahoutDriver: No 
org.apache.mahout.clustering.syntheticcontrol.canopy.Job.props found on 
classpath, will use command-line arguments only

12/06/20 08:20:24 INFO canopy.Job: Running with default arguments

12/06/20 08:20:25 INFO common.HadoopUtil: Deleting output

12/06/20 08:20:26 WARN mapred.JobClient: Use GenericOptionsParser for parsing 
the arguments. Applications should implement Tool for the same.

12/06/20 08:20:28 INFO input.FileInputFormat: Total input paths to process : 0

12/06/20 08:20:28 INFO mapred.JobClient: Running job: job_201206181326_0030

12/06/20 08:20:29 INFO mapred.JobClient:  map 0% reduce 0%

12/06/20 08:20:52 INFO mapred.JobClient: Job complete: job_201206181326_0030

12/06/20 08:20:52 INFO mapred.JobClient: Counters: 4

12/06/20 08:20:52 INFO mapred.JobClient:   Job Counters

12/06/20 08:20:52 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=10970

12/06/20 08:20:52 INFO mapred.JobClient:     Total time spent by all reduces 
waiting after reserving slots (ms)=0

12/06/20 08:20:52 INFO mapred.JobClient:     Total time spent by all maps 
waiting after reserving slots (ms)=0

12/06/20 08:20:52 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0

12/06/20 08:20:52 INFO canopy.CanopyDriver: Build Clusters Input: output/data 
Out: output Measure: 
org.apache.mahout.common.distance.EuclideanDistanceMeasure@c5967f t1: 80.0 t2: 
55.0

12/06/20 08:20:52 WARN mapred.JobClient: Use GenericOptionsParser for parsing 
the arguments. Applications should implement Tool for the same.

12/06/20 08:20:53 INFO input.FileInputFormat: Total input paths to process : 0

12/06/20 08:20:53 INFO mapred.JobClient: Running job: job_201206181326_0031

12/06/20 08:20:54 INFO mapred.JobClient:  map 0% reduce 0%

12/06/20 08:21:17 INFO mapred.JobClient:  map 0% reduce 100%

12/06/20 08:21:22 INFO mapred.JobClient: Job complete: job_201206181326_0031

12/06/20 08:21:22 INFO mapred.JobClient: Counters: 19

12/06/20 08:21:22 INFO mapred.JobClient:   Job Counters

12/06/20 08:21:22 INFO mapred.JobClient:     Launched reduce tasks=1

12/06/20 08:21:22 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=9351

12/06/20 08:21:22 INFO mapred.JobClient:     Total time spent by all reduces 
waiting after reserving slots (ms)=0

12/06/20 08:21:22 INFO mapred.JobClient:     Total time spent by all maps 
waiting after reserving slots (ms)=0

12/06/20 08:21:22 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=7740

12/06/20 08:21:22 INFO mapred.JobClient:   File Output Format Counters

12/06/20 08:21:22 INFO mapred.JobClient:     Bytes Written=106

12/06/20 08:21:22 INFO mapred.JobClient:   FileSystemCounters

12/06/20 08:21:22 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=22545

12/06/20 08:21:22 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=106

12/06/20 08:21:22 INFO mapred.JobClient:   Map-Reduce Framework

12/06/20 08:21:22 INFO mapred.JobClient:     Reduce input groups=0

12/06/20 08:21:22 INFO mapred.JobClient:     Combine output records=0

12/06/20 08:21:22 INFO mapred.JobClient:     Reduce shuffle bytes=0

12/06/20 08:21:22 INFO mapred.JobClient:     Physical memory (bytes) 
snapshot=40652800

12/06/20 08:21:22 INFO mapred.JobClient:     Reduce output records=0

12/06/20 08:21:22 INFO mapred.JobClient:     Spilled Records=0

12/06/20 08:21:22 INFO mapred.JobClient:     CPU time spent (ms)=420

12/06/20 08:21:22 INFO mapred.JobClient:     Total committed heap usage 
(bytes)=16252928

12/06/20 08:21:22 INFO mapred.JobClient:     Virtual memory (bytes) 
snapshot=383250432

12/06/20 08:21:22 INFO mapred.JobClient:     Combine input records=0

12/06/20 08:21:22 INFO mapred.JobClient:     Reduce input records=0

12/06/20 08:21:22 WARN mapred.JobClient: Use GenericOptionsParser for parsing 
the arguments. Applications should implement Tool for the same.

12/06/20 08:21:23 INFO input.FileInputFormat: Total input paths to process : 0

12/06/20 08:21:23 INFO mapred.JobClient: Running job: job_201206181326_0032

12/06/20 08:21:24 INFO mapred.JobClient:  map 0% reduce 0%

12/06/20 08:21:43 INFO mapred.JobClient: Job complete: job_201206181326_0032

12/06/20 08:21:43 INFO mapred.JobClient: Counters: 4

12/06/20 08:21:43 INFO mapred.JobClient:   Job Counters

12/06/20 08:21:43 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=9347

12/06/20 08:21:43 INFO mapred.JobClient:     Total time spent by all reduces 
waiting after reserving slots (ms)=0

12/06/20 08:21:43 INFO mapred.JobClient:     Total time spent by all maps 
waiting after reserving slots (ms)=0

12/06/20 08:21:43 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0

12/06/20 08:21:43 INFO clustering.ClusterDumper: Wrote 0 clusters

12/06/20 08:21:43 INFO driver.MahoutDriver: Program took 78406 ms (Minutes: 
1.3067666666666666)

###############################################################################################
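Note that the curl error (7) above means synthetic_control.data was never actually downloaded, which is why the later `put` fails and every job afterwards reports 0 input paths and finally "Wrote 0 clusters". A possible workaround is to stage the dataset manually; the UCI URL and the HDFS paths below are assumptions, so verify them against what is hard-coded in cluster-syntheticcontrol.sh:

```shell
# Manually stage the dataset the script failed to download.
# URL and paths are assumptions -- check cluster-syntheticcontrol.sh for the real ones.
WORK_DIR=/tmp/mahout-work-$USER
mkdir -p "$WORK_DIR"
curl -o "$WORK_DIR/synthetic_control.data" \
  http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data

# Upload it to the HDFS directory the example reads from.
hadoop fs -rmr testdata            # ignore the error if it does not exist yet
hadoop fs -mkdir testdata
hadoop fs -put "$WORK_DIR/synthetic_control.data" testdata

# Then re-run the example:
./cluster-syntheticcontrol.sh
```

Once the file is really in HDFS, "Total input paths to process" should be non-zero and the clustering should produce output.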

How do you want to combine Mahout and Solr? Also, Solr is a web

service and can receive and supply data in several different formats.
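For instance, the same Solr query can be returned in different formats just by changing the wt parameter (the host, port, and request handler below are the defaults of a stock install and may differ on yours):

```shell
# Ask Solr for the same result set in three formats via the wt parameter.
# localhost:8983 and /solr/select are defaults of a stock install -- adjust as needed.
curl "http://localhost:8983/solr/select?q=*:*&wt=xml"    # XML (the default)
curl "http://localhost:8983/solr/select?q=*:*&wt=json"   # JSON
curl "http://localhost:8983/solr/select?q=*:*&wt=csv"    # CSV, handy for feeding other tools
```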



On Tue, Jun 19, 2012 at 6:04 AM, Paritosh Ranjan <pranjan <at> xebia.com> wrote:

> Regarding the errors,

> which version of Mahout are you using?

> There was a problem in cluster-reuters.sh (build-reuters.sh calls 
> cluster-reuters.sh) which has been fixed in the latest release, 0.7.

> ________________________________________

> From: Svet [svetlana.videnova <at> logica.com]

> Sent: Tuesday, June 19, 2012 2:51 PM

> To: user <at> mahout.apache.org

> Subject: several info

>

> Hi all,

>

>

> First of all I would like to thank Praveenesh Kumar for helping me with 
> Hadoop and Mahout!!!

>

> Nevertheless I have several questions about Mahout.

>

> 1) I need Mahout working with Solr. Can somebody point me to a good tutorial
> on getting them working together?

>

> 2) What exactly are the possible input and output file formats of Mahout 
> (especially when Mahout works with Solr; I know that Solr's output format is XML)?

>

> 3) Which of these algorithms use Hadoop? Please complete the list 
> if I forgot some.

>          -Canopy, KMeans, Dirichlet, Mean-shift, Latent Dirichlet Allocation

>

>

>

>

> 4) Moreover, I was trying to run "./build-reuters.sh" with kmeans
> clustering (it is the same error with fuzzykmeans).
>  Can somebody help me with this error? (but look at 8) ! )

> ###########################

> 12/06/19 13:33:52 WARN mapred.LocalJobRunner: job_local_0001

> java.lang.IllegalStateException: No clusters found. Check your -c path.

>        at

> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:59)

>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)

>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)

>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)

>        at

> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)

> 12/06/19 13:33:52 INFO mapred.JobClient:  map 0% reduce 0%

> 12/06/19 13:33:52 INFO mapred.JobClient: Job complete: job_local_0001

> 12/06/19 13:33:52 INFO mapred.JobClient: Counters: 0

> Exception in thread "main" java.lang.InterruptedException: K-Means Iteration

> failed processing /tmp/mahout-work-hduser/reuters-kmeans-clusters/part-

> randomSeed

>        at

> org.apache.mahout.clustering.kmeans.KMeansDriver.runIteration(KMeansDriver.java:

> 371)

>        at

> org.apache.mahout.clustering.kmeans.KMeansDriver.buildClustersMR(KMeansDriver.ja

> va:316)

>        at

> org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java

> :239)

>        at

> org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:154)

>        at

> org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:112)

>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)

>        at

> org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:61)

>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

>        at

> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)

>        at

> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.jav

> a:43)

>        at java.lang.reflect.Method.invoke(Method.java:601)

>        at

> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.jav

> a:68)

>        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)

>        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)

>

> ###########################

>

>

> 5) There is also a problem with "./build-reuters.sh" using lda (but look at 8) ! )

> ############################

> 12/06/19 13:40:01 WARN mapred.LocalJobRunner: job_local_0001

> java.lang.IllegalArgumentException

>        at

> com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)

>        at

> org.apache.mahout.clustering.lda.LDADriver.createState(LDADriver.java:124)

>        at

> org.apache.mahout.clustering.lda.LDADriver.createState(LDADriver.java:92)

>        at

> org.apache.mahout.clustering.lda.LDAWordTopicMapper.configure(LDAWordTopicMapper

> .java:96)

>        at

> org.apache.mahout.clustering.lda.LDAWordTopicMapper.setup(LDAWordTopicMapper.jav

> a:102)

>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)

>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)

>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)

>        at

> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)

> 12/06/19 13:40:02 INFO mapred.JobClient:  map 0% reduce 0%

> 12/06/19 13:40:02 INFO mapred.JobClient: Job complete: job_local_0001

> 12/06/19 13:40:02 INFO mapred.JobClient: Counters: 0

> Exception in thread "main" java.lang.InterruptedException: LDA Iteration 
> failed

> processing /tmp/mahout-work-hduser/reuters-lda/state-0

>        at

> org.apache.mahout.clustering.lda.LDADriver.runIteration(LDADriver.java:449)

>        at org.apache.mahout.clustering.lda.LDADriver.run(LDADriver.java:249)

>        at org.apache.mahout.clustering.lda.LDADriver.run(LDADriver.java:169)

>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)

>        at org.apache.mahout.clustering.lda.LDADriver.main(LDADriver.java:88)

>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

>        at

> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)

>        at

> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.jav

> a:43)

>        at java.lang.reflect.Method.invoke(Method.java:601)

>        at

> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.jav

> a:68)

>        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)

>        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)

> ############################

>

>

> 6) However, I ran "./build-reuters.sh" with dirichlet clustering and it wrote
> 20 clusters without problems (but look at 8) ! )
> The result is:

> ############################

> ...

> 12/06/19 13:45:12 INFO driver.MahoutDriver: Program took 142609 ms (Minutes:

> 2.3768166666666666)

> MAHOUT_LOCAL is set, so we don't add HADOOP_CONF_DIR to classpath.

> MAHOUT_LOCAL is set, running locally

> SLF4J: Class path contains multiple SLF4J bindings.

> SLF4J: Found binding in [jar:file:/usr/local/mahout-distribution-0.6/mahout-

> examples-0.6-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]

> SLF4J: Found binding in 
> [jar:file:/usr/local/mahout-distribution-0.6/lib/slf4j-

> jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

> SLF4J: Found binding in 
> [jar:file:/usr/local/mahout-distribution-0.6/lib/slf4j-

> log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.

> 12/06/19 13:45:13 INFO common.AbstractJob: Command line arguments: {--

> dictionary=/tmp/mahout-work-hduser/reuters-out-seqdir-sparse-

> dirichlet/dictionary.file-0, --dictionaryType=sequencefile, --

> distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasur

> e, --endPhase=2147483647, --numWords=20, --outputFormat=TEXT, --

> seqFileDir=/tmp/mahout-work-hduser/reuters-dirichlet/clusters-20-final, --

> startPhase=0, --substring=100, --tempDir=temp}

> DC-0 total= 0 model= DMC:0{n=0 c=[] r=[]}

>        Top Terms:

> DC-1 total= 0 model= DMC:1{n=0 c=[] r=[]}

>        Top Terms:

> DC-10 total= 0 model= DMC:10{n=0 c=[] r=[]}

>        Top Terms:

> DC-11 total= 0 model= DMC:11{n=0 c=[] r=[]}

>        Top Terms:

> DC-12 total= 0 model= DMC:12{n=0 c=[] r=[]}

>        Top Terms:

> DC-13 total= 0 model= DMC:13{n=0 c=[] r=[]}

>        Top Terms:

> DC-14 total= 0 model= DMC:14{n=0 c=[] r=[]}

>        Top Terms:

> DC-15 total= 0 model= DMC:15{n=0 c=[] r=[]}

>        Top Terms:

> DC-16 total= 0 model= DMC:16{n=0 c=[] r=[]}

>        Top Terms:

> DC-17 total= 0 model= DMC:17{n=0 c=[] r=[]}

>        Top Terms:

> DC-18 total= 0 model= DMC:18{n=0 c=[] r=[]}

>        Top Terms:

> DC-19 total= 0 model= DMC:19{n=0 c=[] r=[]}

>        Top Terms:

> DC-2 total= 0 model= DMC:2{n=0 c=[] r=[]}

>        Top Terms:

> DC-3 total= 0 model= DMC:3{n=0 c=[] r=[]}

>        Top Terms:

> DC-4 total= 0 model= DMC:4{n=0 c=[] r=[]}

>        Top Terms:

> DC-5 total= 0 model= DMC:5{n=0 c=[] r=[]}

>        Top Terms:

> DC-6 total= 0 model= DMC:6{n=0 c=[] r=[]}

>        Top Terms:

> DC-7 total= 0 model= DMC:7{n=0 c=[] r=[]}

>        Top Terms:

> DC-8 total= 0 model= DMC:8{n=0 c=[] r=[]}

>        Top Terms:

> DC-9 total= 0 model= DMC:9{n=0 c=[] r=[]}

>        Top Terms:

> 12/06/19 13:45:14 INFO clustering.ClusterDumper: Wrote 20 clusters

> 12/06/19 13:45:14 INFO driver.MahoutDriver: Program took 789 ms (Minutes:

> 0.01315)

> ############################

>

>

> 7) And finally: "./build-reuters.sh" with minhash clustering.
> It works fine!

>

>

> 8) For 4), 5), 6) and 7) there is a SUCCESS file in /tmp/mahout-work-hduser/

>

> ...

>

>

>

> Thanks everybody

> Regards

>



--

Lance Norskog

goksron <at> gmail.com
