[jira] [Commented] (MAHOUT-1354) Mahout Support for Hadoop 2

2013-12-09 Thread Gokhan Capan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842960#comment-13842960
 ] 

Gokhan Capan commented on MAHOUT-1354:
--

It looks like when the hadoop-2 profile is activated, this patch fails to apply 
the hadoop-2 related dependencies to the integration and examples modules, even 
though both depend on core and core depends on hadoop-2. For me, moving the 
hadoop dependencies to the root solved the problem, but I don't think we want 
that, since hadoop is not a common dependency of all modules in the project. 

CC'ing [~frankscholten]

 Mahout Support for Hadoop 2 
 

 Key: MAHOUT-1354
 URL: https://issues.apache.org/jira/browse/MAHOUT-1354
 Project: Mahout
  Issue Type: Improvement
Affects Versions: 0.8
Reporter: Suneel Marthi
Assignee: Suneel Marthi
 Fix For: 1.0

 Attachments: MAHOUT-1354_initial.patch


 Mahout support for Hadoop, now that Hadoop 2 is official.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (MAHOUT-1265) Add Multilayer Perceptron

2013-12-09 Thread Yexi Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yexi Jiang updated MAHOUT-1265:
---

Attachment: Mahout-1265-11.patch

This is the final version of the patch. It has been reviewed by [~smarthi].

 Add Multilayer Perceptron 
 --

 Key: MAHOUT-1265
 URL: https://issues.apache.org/jira/browse/MAHOUT-1265
 Project: Mahout
  Issue Type: New Feature
Reporter: Yexi Jiang
  Labels: machine_learning, neural_network
 Attachments: Mahout-1265-11.patch, Mahout-1265-6.patch, 
 mahout-1265.patch


 Design of multilayer perceptron
 1. Motivation
 A multilayer perceptron (MLP) is a kind of feed-forward artificial neural 
 network, a mathematical model inspired by biological neural networks. The 
 multilayer perceptron can be used for various machine learning tasks such as 
 classification and regression. It would be useful to include it in Mahout.
 2. API
 The design goal of the API is to make the MLP easy to use and to keep the 
 implementation details transparent to the user.
 The following example code shows how a user would use the MLP.
 -
 // set the parameters
 double learningRate = 0.5;
 double momentum = 0.1;
 int[] layerSizeArray = new int[] {2, 5, 1};
 String costFuncName = "SquaredError";
 String squashingFuncName = "Sigmoid";
 // the location to store the model; if a model already exists at the
 // specified location, MLP will throw an exception
 URI modelLocation = ...
 MultilayerPerceptron mlp = new MultilayerPerceptron(layerSizeArray, modelLocation);
 mlp.setLearningRate(learningRate)
     .setMomentum(momentum)
     .setRegularization(...)
     .setCostFunction(...)
     .setSquashingFunction(...);
 // the user can also load an existing model from the given URI and update it
 // with new training data; if there is no model at the specified location,
 // an exception will be thrown
 /*
 MultilayerPerceptron mlp = new MultilayerPerceptron(learningRate,
     regularization, momentum, squashingFuncName, costFuncName, modelLocation);
 */
 URI trainingDataLocation = …
 // training details are hidden from the user; training may run on a
 // single machine or in a distributed environment
 mlp.train(trainingDataLocation);
 // the user can also train the model on one training instance at a time,
 // in stochastic gradient descent fashion
 Vector trainingInstance = ...
 mlp.train(trainingInstance);
 // prepare the input feature
 Vector inputFeature = …
 // the semantic meaning of the output is defined by the user;
 // in general, the output vector has dimension 1 for regression and
 // two-class classification, and dimension n for n-class classification (n > 2)
 Vector outputVector = mlp.output(inputFeature);
 -
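 For reference, the feed-forward computation behind mlp.output(...) can be 
 sketched in plain Java. This is a hypothetical illustration, not the proposed 
 Mahout class: the class FeedForwardSketch, the double[][][] weight layout, and 
 storing each neuron's bias in the last column of its weight row are all 
 assumptions, and every layer uses the sigmoid squashing function.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch of the feed-forward output computation; not Mahout's API.
// weights[l][j][i] is the weight from neuron i of layer l to neuron j of layer l + 1;
// the last entry of each row, weights[l][j][size of layer l], is neuron j's bias.
class FeedForwardSketch {

    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    static double[] output(double[][][] weights, double[] input) {
        double[] activations = input;
        for (double[][] layer : weights) {
            double[] next = new double[layer.length];
            for (int j = 0; j < layer.length; j++) {
                double net = layer[j][activations.length]; // bias term
                for (int i = 0; i < activations.length; i++) {
                    net += layer[j][i] * activations[i];
                }
                next[j] = sigmoid(net); // sigmoid squashing in every layer
            }
            activations = next;
        }
        return activations;
    }

    public static void main(String[] args) {
        int[] layerSizeArray = {2, 5, 1}; // same shape as in the API example
        Random rnd = new Random(42);
        double[][][] weights = new double[layerSizeArray.length - 1][][];
        for (int l = 0; l < weights.length; l++) {
            // one row per neuron in layer l + 1, one column per input plus the bias
            weights[l] = new double[layerSizeArray[l + 1]][layerSizeArray[l] + 1];
            for (double[] row : weights[l]) {
                for (int i = 0; i < row.length; i++) {
                    row[i] = rnd.nextGaussian() * 0.1; // small random initial weights
                }
            }
        }
        double[] out = output(weights, new double[] {0.3, 0.7});
        System.out.println(Arrays.toString(out)); // a single value: the output layer has one neuron
    }
}
```

 Keeping the bias inside each weight row avoids a separate bias array, at the 
 cost of one extra column per neuron.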
 3. Methodology
 The output calculation can be implemented with a straightforward feed-forward 
 pass, and single-machine training is also straightforward. The following 
 describes how to train the MLP in a distributed way with batch gradient descent. 
 The workflow is illustrated in the figure below.
 https://docs.google.com/drawings/d/1s8hiYKpdrP3epe1BzkrddIfShkxPrqSuQBH0NAawEM4/pub?w=960&h=720
 For distributed training, each training iteration is divided into two steps: 
 the weight update calculation step and the weight update step. The distributed 
 MLP can only be trained in batch-update mode.
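 As an illustration of the two-step batch iteration just described, the 
 following is a minimal sketch that simulates the tasks in a single JVM. It 
 assumes a single sigmoid output neuron with no hidden layer, the usual 
 sigmoid/squared-error error term delta = o(1 - o)(t - o), and the update 
 Delta w_i = (l / M) * sum of delta * x_i over all M instances. The class and 
 method names (BatchUpdateSketch, partialUpdate, applyUpdate) are hypothetical; 
 this is not Mahout code and not a MapReduce job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of one batch iteration split into two steps. Simplifying
// assumption: a single sigmoid output neuron and no hidden layer; the last
// weight is the bias, whose input is the constant 1.
class BatchUpdateSketch {

    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    static double predict(double[] w, double[] x) {
        double net = w[w.length - 1]; // bias weight
        for (int i = 0; i < x.length; i++) {
            net += w[i] * x[i];
        }
        return sigmoid(net);
    }

    // Step 1 (one per task): unscaled sum of delta * input over one data partition.
    static double[] partialUpdate(double[] w, double[][] xs, double[] ts) {
        double[] partial = new double[w.length];
        for (int m = 0; m < xs.length; m++) {
            double o = predict(w, xs[m]);
            double delta = o * (1 - o) * (ts[m] - o); // sigmoid + squared-error term
            for (int i = 0; i < xs[m].length; i++) {
                partial[i] += delta * xs[m][i];
            }
            partial[w.length - 1] += delta; // bias input is 1
        }
        return partial;
    }

    // Step 2 (once per iteration): sum the partial updates, scale by l / M, apply.
    static double[] applyUpdate(double[] w, List<double[]> partials,
                                double learningRate, int totalInstances) {
        double[] updated = w.clone();
        for (double[] p : partials) {
            for (int i = 0; i < w.length; i++) {
                updated[i] += learningRate / totalInstances * p[i];
            }
        }
        return updated;
    }

    public static void main(String[] args) {
        double[][] xs = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        double[] ts = {0, 1, 1, 1}; // logical OR as toy training data
        double[] w = new double[3];
        for (int iter = 0; iter < 2000; iter++) {
            List<double[]> partials = new ArrayList<>();
            // two partitions, as two map tasks would see them
            partials.add(partialUpdate(w, Arrays.copyOfRange(xs, 0, 2), Arrays.copyOfRange(ts, 0, 2)));
            partials.add(partialUpdate(w, Arrays.copyOfRange(xs, 2, 4), Arrays.copyOfRange(ts, 2, 4)));
            w = applyUpdate(w, partials, 0.5, xs.length);
        }
        for (double[] x : xs) {
            System.out.println(Arrays.toString(x) + " -> " + predict(w, x));
        }
    }
}
```

 Because each partial result is an unscaled sum, the aggregation step only has 
 to add the vectors and scale once by l / M, which is why the split produces the 
 same update as a single-machine batch step (up to floating-point rounding).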
 3.1 The partial weight update calculation step:
 This step trains the MLP in a distributed manner. Each task gets a copy of the 
 MLP model and calculates the weight update on one partition of the data.
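 Suppose the training error is
 $$E(\mathbf{w}) = \frac{1}{2} \sum_{d \in D} \mathrm{cost}(t_d, y_d),$$
 where $D$ denotes the training set, $d$ a training instance, $t_d$ the class 
 label (target) of $d$, and $y_d$ the output of the MLP for $d$. Also suppose that 
 the sigmoid function is used as the squashing function and squared error as the 
 cost function, and let
 $t_j$ denote the target value for the $j$th dimension of the output layer, 
 $o_j$ denote the actual output of neuron $j$ (for the output layer, its $j$th dimension), 
 $l$ denote the learning rate, 
 $w_{ij}$ denote the weight between the $i$th neuron in the previous layer and the 
 $j$th neuron in the next layer, 
 $M = |D|$ denote the number of training instances, with the superscript $(m)$ 
 marking a value computed for the $m$th instance.
 The weight of each edge is updated as
 $$\Delta w_{ij} = \frac{l}{M} \sum_{m=1}^{M} \delta_j^{(m)} o_i^{(m)},$$
 where
 $$\delta_j^{(m)} = o_j^{(m)} \bigl(1 - o_j^{(m)}\bigr) \bigl(t_j^{(m)} - o_j^{(m)}\bigr)$$
 for the output layer, and
 $$\delta_j^{(m)} = o_j^{(m)} \bigl(1 - o_j^{(m)}\bigr) \sum_k \delta_k^{(m)} w_{jk}$$
 for a hidden layer, where $k$ ranges over the neurons of the next layer. With 
 this sign convention, $\delta_j^{(m)}$ is the negative derivative of the $m$th 
 instance's error with respect to the net input of neuron $j$, so adding 
 $\Delta w_{ij}$ to $w_{ij}$ is a gradient descent step.
 If the training set is split into $P$ disjoint partitions and $D_p$ denotes the 
 indices of the instances in partition $p$, the update can be rewritten as
 $$\Delta w_{ij} = \frac{l}{M} \sum_{p=1}^{P} \sum_{m \in D_p} \delta_j^{(m)} o_i^{(m)}.$$
 This shows that the weight update can be divided into $P$ parts, one per partition.
 So, for the implementation, each mapper can calculate its part of the update with 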
 

[jira] [Commented] (MAHOUT-1354) Mahout Support for Hadoop 2

2013-12-09 Thread Gokhan Capan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843226#comment-13843226
 ] 

Gokhan Capan commented on MAHOUT-1354:
--

Yeah, I agree



Jenkins build is back to normal : Mahout-Examples-Cluster-Reuters-II #689

2013-12-09 Thread Apache Jenkins Server
See 
https://builds.apache.org/job/Mahout-Examples-Cluster-Reuters-II/689/changes



[jira] [Created] (MAHOUT-1375) Apache Mahout

2013-12-09 Thread kaan can (JIRA)
kaan can created MAHOUT-1375:


 Summary: Apache Mahout
 Key: MAHOUT-1375
 URL: https://issues.apache.org/jira/browse/MAHOUT-1375
 Project: Mahout
  Issue Type: Bug
Reporter: kaan can


Hello,
 First, thank you for taking the time to read my letter!
 My questions are:

1) Which tools are used in Carrot2?
2) Is Carrot2 suitable for supervised or unsupervised learning?
3) Which preprocessing methods or tools does Carrot2 provide?


Kind regards





[jira] [Commented] (MAHOUT-1375) Apache Mahout

2013-12-09 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843651#comment-13843651
 ] 

Suneel Marthi commented on MAHOUT-1375:
---

Is this about Carrot2? This should be discussed on Carrot2 forums then.



[jira] [Commented] (MAHOUT-1375) Apache Mahout

2013-12-09 Thread kaan can (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843673#comment-13843673
 ] 

kaan can commented on MAHOUT-1375:
--

Sorry, my mistake. My questions are:
1) Which tools are used in Apache Mahout?
2) Is Apache Mahout suitable for supervised or unsupervised learning?
3) Which preprocessing methods or tools does Apache Mahout provide?





[jira] [Commented] (MAHOUT-1375) Apache Mahout

2013-12-09 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843695#comment-13843695
 ] 

Suneel Marthi commented on MAHOUT-1375:
---

This seems more like a question that should have been posted to the user@ 
mailing list. Please post your question there. 



[jira] [Updated] (MAHOUT-1371) Arff loader can misinterprete nominals with integer, real or string

2013-12-09 Thread mansur (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mansur updated MAHOUT-1371:
---

Attachment: (was: MAHOUT-1371.patch)

 Arff loader can misinterprete nominals with integer, real or string
 ---

 Key: MAHOUT-1371
 URL: https://issues.apache.org/jira/browse/MAHOUT-1371
 Project: Mahout
  Issue Type: Bug
  Components: Integration
Affects Versions: 0.9
 Environment: all
Reporter: mansur
  Labels: ARFF
 Fix For: 0.9

 Attachments: MAHOUT-1371.patch


 If a nominal attribute's values include a value such as integer, real or string, 
 the attribute is misinterpreted as that type instead of as nominal.
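 One way to avoid this, sketched below under the assumption that the loader 
 inspects each "@attribute <name> <type>" declaration, is to recognize the 
 nominal {...} list before matching any type keyword. ArffTypeSketch and 
 attributeType are hypothetical names, this is not Mahout's ARFF loader code, 
 and the attached patch may fix the problem differently. The sketch also assumes 
 that quoted attribute names contain no escaped quotes.

```java
import java.util.Locale;

// Hypothetical sketch, not Mahout's ARFF loader: determine an attribute's type from
// its "@attribute <name> <type>" declaration, testing for a nominal "{...}" list
// before any keyword matching. Assumption: quoted names contain no escaped quotes.
class ArffTypeSketch {

    enum Type { NUMERIC, INTEGER, REAL, STRING, DATE, NOMINAL }

    static Type attributeType(String line) {
        String rest = line.trim();
        if (!rest.toLowerCase(Locale.ROOT).startsWith("@attribute")) {
            throw new IllegalArgumentException("not an @attribute line: " + line);
        }
        rest = rest.substring("@attribute".length()).trim();

        // Skip the attribute name, which may be quoted.
        if (rest.startsWith("'") || rest.startsWith("\"")) {
            int close = rest.indexOf(rest.charAt(0), 1);
            if (close < 0) {
                throw new IllegalArgumentException("unterminated attribute name: " + line);
            }
            rest = rest.substring(close + 1).trim();
        } else {
            int end = 0;
            while (end < rest.length()
                    && !Character.isWhitespace(rest.charAt(end))
                    && rest.charAt(end) != '{') {
                end++;
            }
            rest = rest.substring(end).trim();
        }

        // A brace means nominal, whatever the listed values look like,
        // so {integer, real, string} is not mistaken for a numeric or string type.
        if (rest.startsWith("{")) {
            return Type.NOMINAL;
        }
        String keyword = rest.split("\\s+")[0].toLowerCase(Locale.ROOT);
        switch (keyword) {
            case "numeric": return Type.NUMERIC;
            case "integer": return Type.INTEGER;
            case "real":    return Type.REAL;
            case "string":  return Type.STRING;
            case "date":    return Type.DATE;
            default: throw new IllegalArgumentException("unknown attribute type: " + keyword);
        }
    }

    public static void main(String[] args) {
        System.out.println(attributeType("@attribute kind {integer, real, string}")); // NOMINAL
        System.out.println(attributeType("@attribute size integer"));                // INTEGER
        System.out.println(attributeType("@ATTRIBUTE 'long name' real"));            // REAL
    }
}
```

 The key design choice is the order of the checks: the type is decided from the 
 token after the name, and a leading brace wins before any keyword comparison.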





[jira] [Updated] (MAHOUT-1371) Arff loader can misinterprete nominals with integer, real or string

2013-12-09 Thread mansur (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mansur updated MAHOUT-1371:
---

Attachment: MAHOUT-1371.patch

Unit tests written and passed.



[jira] [Updated] (MAHOUT-1371) Arff loader can misinterprete nominals with integer, real or string

2013-12-09 Thread mansur (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mansur updated MAHOUT-1371:
---

Status: Patch Available  (was: Open)



[jira] [Updated] (MAHOUT-1371) Arff loader can misinterprete nominals with integer, real or string

2013-12-09 Thread mansur (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mansur updated MAHOUT-1371:
---

Attachment: (was: MAHOUT-1371.patch)



[jira] [Updated] (MAHOUT-1371) Arff loader can misinterprete nominals with integer, real or string

2013-12-09 Thread mansur (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mansur updated MAHOUT-1371:
---

Attachment: MAHOUT-1371.patch



[jira] [Created] (MAHOUT-1376) when mahout train data, there is Task Id : attempt_201312031842_0751_m_000000_0, Status : FAILED java.lang.IllegalArgumentException

2013-12-09 Thread wangqiaoshi (JIRA)
wangqiaoshi created MAHOUT-1376:
---

 Summary: when mahout train data, there is Task Id : 
attempt_201312031842_0751_m_000000_0, Status : FAILED 
java.lang.IllegalArgumentException
 Key: MAHOUT-1376
 URL: https://issues.apache.org/jira/browse/MAHOUT-1376
 Project: Mahout
  Issue Type: Bug
  Components: Classification
Affects Versions: 0.8
 Environment: Hadoop 1.0.3, Mahout 0.8
Reporter: wangqiaoshi
 Fix For: 0.8


vm001:/usr/local/hadoop/mahout-distribution-0.8 # ./bin/mahout trainnb -i 
/tmp/mahout-work-root/20news-train-vectors -el -o /tmp/mahout-work-root/model 
-li /tmp/mahout-work-root/labelindex -ow -c
Running on hadoop, using /usr/local/hadoop/hadoop-0.20.2/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /usr/local/hadoop/mahout-distribution-0.8/mahout-examples-0.8-job.jar
13/12/10 10:29:56 WARN driver.MahoutDriver: No trainnb.props found on classpath, will use command-line arguments only
13/12/10 10:29:56 INFO common.AbstractJob: Command line arguments: 
{--alphaI=[1.0], --endPhase=[2147483647], --extractLabels=null, 
--input=[/tmp/mahout-work-root/20news-train-vectors], 
--labelIndex=[/tmp/mahout-work-root/labelindex], 
--output=[/tmp/mahout-work-root/model], --overwrite=null, --startPhase=[0], 
--tempDir=[temp], --trainComplementary=null}
13/12/10 10:29:56 INFO common.HadoopUtil: Deleting temp
13/12/10 10:29:57 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/12/10 10:29:57 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
13/12/10 10:29:57 INFO compress.CodecPool: Got brand-new decompressor
13/12/10 10:30:00 INFO input.FileInputFormat: Total input paths to process : 1
13/12/10 10:30:01 INFO mapred.JobClient: Running job: job_201312031842_0750
13/12/10 10:30:02 INFO mapred.JobClient:  map 0% reduce 0%
13/12/10 10:30:18 INFO mapred.JobClient:  map 100% reduce 0%
13/12/10 10:30:30 INFO mapred.JobClient:  map 100% reduce 100%
13/12/10 10:30:35 INFO mapred.JobClient: Job complete: job_201312031842_0750
13/12/10 10:30:35 INFO mapred.JobClient: Counters: 29
13/12/10 10:30:35 INFO mapred.JobClient:   Job Counters 
13/12/10 10:30:35 INFO mapred.JobClient: Launched reduce tasks=1
13/12/10 10:30:35 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=12445
13/12/10 10:30:35 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/12/10 10:30:35 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/12/10 10:30:35 INFO mapred.JobClient: Rack-local map tasks=1
13/12/10 10:30:35 INFO mapred.JobClient: Launched map tasks=1
13/12/10 10:30:35 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=10355
13/12/10 10:30:35 INFO mapred.JobClient:   File Output Format Counters 
13/12/10 10:30:35 INFO mapred.JobClient: Bytes Written=97
13/12/10 10:30:35 INFO mapred.JobClient:   FileSystemCounters
13/12/10 10:30:35 INFO mapred.JobClient: FILE_BYTES_READ=119
13/12/10 10:30:35 INFO mapred.JobClient: HDFS_BYTES_READ=270
13/12/10 10:30:35 INFO mapred.JobClient: FILE_BYTES_WRITTEN=45827
13/12/10 10:30:35 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=97
13/12/10 10:30:35 INFO mapred.JobClient:   File Input Format Counters 
13/12/10 10:30:35 INFO mapred.JobClient: Bytes Read=133
13/12/10 10:30:35 INFO mapred.JobClient:   Map-Reduce Framework
13/12/10 10:30:35 INFO mapred.JobClient: Map output materialized bytes=14
13/12/10 10:30:35 INFO mapred.JobClient: Map input records=0
13/12/10 10:30:35 INFO mapred.JobClient: Reduce shuffle bytes=0
13/12/10 10:30:35 INFO mapred.JobClient: Spilled Records=0
13/12/10 10:30:35 INFO mapred.JobClient: Map output bytes=0
13/12/10 10:30:35 INFO mapred.JobClient: CPU time spent (ms)=2080
13/12/10 10:30:35 INFO mapred.JobClient: Total committed heap usage (bytes)=1016594432
13/12/10 10:30:35 INFO mapred.JobClient: Combine input records=0
13/12/10 10:30:35 INFO mapred.JobClient: SPLIT_RAW_BYTES=137
13/12/10 10:30:35 INFO mapred.JobClient: Reduce input records=0
13/12/10 10:30:35 INFO mapred.JobClient: Reduce input groups=0
13/12/10 10:30:35 INFO mapred.JobClient: Combine output records=0
13/12/10 10:30:35 INFO mapred.JobClient: Physical memory (bytes) snapshot=313008128
13/12/10 10:30:35 INFO mapred.JobClient: Reduce output records=0
13/12/10 10:30:35 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2980098048
13/12/10 10:30:35 INFO mapred.JobClient: Map output records=0
13/12/10 10:30:38 INFO input.FileInputFormat: Total input paths to process : 1
13/12/10 10:30:38 INFO mapred.JobClient: Running job: job_201312031842_0751
13/12/10 10:30:39 INFO mapred.JobClient:  map 0% reduce 0%
13/12/10 10:30:55 INFO mapred.JobClient: Task Id : attempt_201312031842_0751_m_000000_0, Status : FAILED
java.lang.IllegalArgumentException
at