[ https://issues.apache.org/jira/browse/MAHOUT-789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved MAHOUT-789.
------------------------------

    Resolution: Cannot Reproduce

I don't think this is enough info, or at least, it is not nearly narrowed down enough to point to a problem in the classifier. What have you tried in debugging? I'd want some indication that you've ruled out problems in your environment or data. Reopen if so.

> testclassifier seems does not work using kdd data set
> -----------------------------------------------------
>
>                 Key: MAHOUT-789
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-789
>             Project: Mahout
>          Issue Type: Bug
>          Components: Classification
>    Affects Versions: 0.6
>         Environment: CentOS 5.5, Hadoop 0.20.203, and the latest Mahout 0.6 snapshot.
>            Reporter: XiaoboGu
>
> I am now testing the trainclassifier and testclassifier commands in
> Mahout. I prepared nbdf-train.csv and nbdf-test.csv files with the
> following R commands:
> df <- read.arff(file = "d:/temp/kdd/KDDTrain+.arff")
> nbdf <- data.frame(class=df["class"], protocol_type=df["protocol_type"],service=df["service"],flag=df["flag"],land=df["land"],logged_in=df["logged_in"],is_host_login=df["is_host_login"],is_guest_login=df["is_guest_login"])
> nbdf[["logged_in"]] <- as.factor(nbdf[["logged_in"]])
> nbdf[["is_guest_login"]] <- as.factor(nbdf[["is_guest_login"]])
> nbdf[["is_host_login"]] <- as.factor(nbdf[["is_host_login"]])
> nbdf[["land"]] <- as.factor(nbdf[["land"]])
> write.table(nbdf, file="D:/nbdf-train.csv", row.names=FALSE, col.names=FALSE, quote=FALSE, sep="\t")
> df <- read.arff(file = "d:/temp/kdd/KDDTest+.arff")
> nbdf <- data.frame(class=df["class"], protocol_type=df["protocol_type"],service=df["service"],flag=df["flag"],land=df["land"],logged_in=df["logged_in"],is_host_login=df["is_host_login"],is_guest_login=df["is_guest_login"])
> nbdf[["logged_in"]] <- as.factor(nbdf[["logged_in"]])
> nbdf[["is_guest_login"]] <- as.factor(nbdf[["is_guest_login"]])
> nbdf[["is_host_login"]] <- as.factor(nbdf[["is_host_login"]])
> nbdf[["land"]] <- as.factor(nbdf[["land"]])
> write.table(nbdf, file="D:/nbdf-test.csv", row.names=FALSE, col.names=FALSE, quote=FALSE, sep="\t")
> I put them under nbtest/train and nbtest/test in HDFS, then issued:
> mahout trainclassifier --input nbtest/train --output nbtest/output
> mahout testclassifier --testDir nbtest/test --model nbtest/output
> trainclassifier seems to succeed, but testclassifier failed with this:
> [gpadmin@linuxsvr2 mahtest]$ mahout testclassifier --testDir nbtest/test --model nbtest/output
> Running on hadoop, using HADOOP_HOME=/usr/local/hadoop
> HADOOP_CONF_DIR=/usr/local/hadoop/conf
> MAHOUT-JOB: /usr/local/mahout/mahout-examples-0.6-SNAPSHOT-job.jar
> 11/08/15 18:06:20 WARN driver.MahoutDriver: No testclassifier.props found on classpath, will use command-line arguments only
> 11/08/15 18:06:20 INFO bayes.TestClassifier: Loading model from: {basePath=nbtest/output, classifierType=bayes, alpha_i=1.0, dataSource=hdfs, gramSize=1, verbose=false, encoding=UTF-8, defaultCat=unknown, testDirPath=nbtest/test}
> 11/08/15 18:06:20 INFO bayes.TestClassifier: Testing Bayes Classifier
> 11/08/15 18:06:20 INFO bayes.SequenceFileModelReader: 77319.90481464032
> 11/08/15 18:06:20 INFO bayes.InMemoryBayesDatastore: normal -213.05542661827678 442.8886516970405 -0.48105867197522617
> 11/08/15 18:06:20 INFO bayes.InMemoryBayesDatastore: anomaly -442.8886516970405 442.8886516970405 -1.0
> 11/08/15 18:06:20 INFO bayes.TestClassifier:
> =======================================================
> Summary
> -------------------------------------------------------
> Correctly Classified Instances   : 0
> Incorrectly Classified Instances : 0
> Total Classified Instances       : 0
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a   b   c   <--Classified as
> 0   0   0    |  0   a = normal
> 0   0   0    |  0   b = anomaly
> 0   0   0    |  0   c = unknown
> Default Category: unknown: 2
> 11/08/15 18:06:20 INFO driver.MahoutDriver: Program took 746 ms

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira