GitHub user edi-bice opened a pull request:
https://github.com/apache/incubator-samoa/pull/47
Changes required to work on newer versions of Samza (0.10) and YARN (2.6.0)
One important issue remains, however: the job does not seem to pick up
yarn.resourcemanager.address from yarn-site.xml and defaults to 0.0.0.0:8032,
which is not the ResourceManager address on my cluster, so the connection
times out.
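To check which ResourceManager address the configuration directory itself
defines, a minimal sketch along these lines may help. It assumes the standard
Hadoop XML layout (property/name/value elements) and the standard YARN property
names yarn.resourcemanager.address and yarn.resourcemanager.hostname. The helper
name resource_manager_address is hypothetical and is not part of SAMOA or Samza.
The script only reads the file; it does not reproduce how Samza's YARN client
loads its configuration.

```python
import os
import sys
import xml.etree.ElementTree as ET

DEFAULT_HOST = "0.0.0.0"  # YARN's default for yarn.resourcemanager.hostname
DEFAULT_PORT = 8032       # YARN's default ResourceManager client port


def read_properties(path):
    """Return the name -> value pairs of a Hadoop-style *-site.xml file."""
    props = {}
    for prop in ET.parse(path).getroot().iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def resource_manager_address(conf_dir):
    """Hypothetical helper: the address that conf_dir/yarn-site.xml yields.

    Falls back to the YARN defaults when the file or the property is missing.
    """
    path = os.path.join(conf_dir, "yarn-site.xml")
    props = read_properties(path) if os.path.isfile(path) else {}
    host = props.get("yarn.resourcemanager.hostname", DEFAULT_HOST)
    address = props.get(
        "yarn.resourcemanager.address",
        "${yarn.resourcemanager.hostname}:%d" % DEFAULT_PORT,
    )
    # Only the hostname placeholder is expanded; other ${...} variables are
    # left untouched in this sketch.
    return address.replace("${yarn.resourcemanager.hostname}", host)


if __name__ == "__main__":
    # Assumption: the directory is passed as an argument or via YARN_HOME,
    # mirroring how the variable is used in the transcript that follows.
    conf_dir = sys.argv[1] if len(sys.argv) > 1 else os.environ.get("YARN_HOME", ".")
    print(resource_manager_address(conf_dir))
```

If this prints 0.0.0.0:8032 for the directory passed as YARN_HOME, the file
itself probably lacks the setting. Any other result would suggest the file is
fine and is simply not being read by the job client.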
bash-4.1$ YARN_HOME=/usr/hdp/current/hadoop-yarn-client/etc/hadoop
bin/samoa samza target/SAMOA-Samza-0.4.0-incubating-SNAPSHOT.jar
"PrequentialEvaluation -l classifiers.ensemble.Bagging -s (ArffFileStream -f
covtypeNorm.arff) -f 1000"
bin/samoa
Deploying to SAMZA
Command line string = PrequentialEvaluation -l
classifiers.ensemble.Bagging -s (ArffFileStream -f covtypeNorm.arff) -f 1000
2016-01-22 21:49:16,860 [main] INFO org.apache.samoa.SamzaDoTask
(SamzaDoTask.java:103) - Sucessfully instantiating
org.apache.samoa.tasks.PrequentialEvaluation
2016-01-22 21:49:17,023 [main] INFO org.apache.samoa.utils.SystemsUtils
(SystemsUtils.java:114) - Hadoop config
home:/usr/hdp/current/hadoop-yarn-client/etc/hadoop
2016-01-22 21:49:17,338 [main] INFO org.apache.samoa.utils.SystemsUtils
(SystemsUtils.java:172) - Filesystem name:hdfs://lvsdevfrsa01.gspt.net:8020
2016-01-22 21:49:21,684 [main] INFO org.apache.samoa.utils.SystemsUtils
(SystemsUtils.java:114) - Hadoop config
home:/usr/hdp/current/hadoop-yarn-client/etc/hadoop
2016-01-22 21:49:21,924 [main] INFO org.apache.samoa.utils.SystemsUtils
(SystemsUtils.java:172) - Filesystem name:hdfs://lvsdevfrsa01.gspt.net:8020
2016-01-22 21:49:23,689 [main] INFO
org.apache.samoa.topology.impl.SamzaEngine (SamzaEngine.java:106) -
Config:{yarn.container.count=1, yarn.container.memory.mb=1024,
yarn.config.home=/usr/hdp/current/hadoop-yarn-client/etc/hadoop,
systems.kafka0.consumer.zookeeper.connect=lvsdevfrsa01.gspt.net:2181,
serializers.registry.kryo.class=org.apache.samoa.utils.SamzaKryoSerdeFactory,
systems.kafka0.producer.bootstrap.servers=lvsdevfrsa01.gspt.net:6667,lvsdevfrsa02.gspt.net:6667,lvsdevfrsa03.gspt.net:6667,
systems.samoa.samza.factory=org.apache.samoa.topology.impl.SamoaSystemFactory,
kryo.register=org.apache.samoa.learners.classifiers.rules.common.Perceptron:org.apache.samoa.learners.classifiers.rules.common.Perceptron?PerceptronSerializer,org.apache.samoa.learners.classifiers.trees.ComputeContentEvent:org.apache.samoa.learners.classifiers.trees.ComputeContentEvent?ComputeCEFullPrecSerializer,org.apache.samoa.moa.classifiers.core.AttributeSplitSuggestion:org.apache.samoa.utils.SerializableSerializer,org.apache.samoa.learners.classifiers.trees.AttributeContentEvent:org.apache.samoa.learners.classifiers.trees.AttributeContentEvent?AttributeCEFullPrecSerializer,org.apache.samoa.learners.classifiers.rules.common.TargetMean:org.apache.samoa.learners.classifiers.rules.common.TargetMean?TargetMeanSerializer,
systems.kafka0.producer.producer.type=sync, yarn.am.container.memory.mb=1024,
job.coordinator.replication.factor=2, task.processor.filesystem=hdfs,
job.name=Prequential_20160122214916-0,
task.processor.file=hdfs://lvsdevfrsa01.gspt.net:8020/user/ebice/.samoa/dat/Prequential_20160122214916.dat,
job.coordinator.system=kafka0, systems.kafka0.samza.msg.serde=kryo,
task.inputs=samoa.Prequential_20160122214916-0,
job.factory.class=org.apache.samza.job.yarn.YarnJobFactory,
yarn.package.path=hdfs://lvsdevfrsa01.gspt.net:8020/user/ebice/.samoa/SAMOA-Samza-0.4.0-incubating-SNAPSHOT.jar,
systems.kafka0.producer.batch.num.messages=1,
task.class=org.apache.samoa.topology.impl.SamzaEntranceProcessingItem,
systems.kafka0.samza.offset.default=oldest,
task.opts=-Xmx768M -XX:+PrintGCDateStamps,
systems.kafka0.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory}
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/edi-bice/incubator-samoa master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-samoa/pull/47.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #47
----
commit 6d57f0409b1a4bd909c768ae36541eff77c9bf82
Author: edi_bice <[email protected]>
Date: 2016-01-21T16:31:03Z
See this for details
http://stackoverflow.com/questions/17265002/hadoop-no-filesystem-for-scheme-file
commit 3ff4d05e9f027254d8e891695cc0d723245b5c58
Author: edi_bice <[email protected]>
Date: 2016-01-21T20:51:13Z
Upgraded to Samza 0.10
commit 956b01645b0b5d5b39ebce2f98b75df5024b7ecb
Author: edi_bice <[email protected]>
Date: 2016-01-21T21:02:14Z
Job coordinator config required
commit c3a9dcd348cf4e2447c4c04e267d88524aa798d7
Author: edi_bice <[email protected]>
Date: 2016-01-21T21:11:07Z
still complaining about missing coordinator config
commit a309024392302cc6120c57f17c314920a41d2828
Author: edi_bice <[email protected]>
Date: 2016-01-21T21:17:00Z
key was changed between 0.7 and 0.10
commit b82726d6c01d3f098f43ec60525e635c0c9ac1e3
Author: edi_bice <[email protected]>
Date: 2016-01-22T21:44:21Z
key was changed between 0.7 and 0.10
----