[ https://issues.apache.org/jira/browse/SAMOA-58?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182115#comment-15182115 ]
ASF GitHub Bot commented on SAMOA-58:
-------------------------------------
Github user gdfm commented on a diff in the pull request:
https://github.com/apache/incubator-samoa/pull/48#discussion_r55137139
--- Diff: samoa-instances/src/main/java/org/apache/samoa/instances/AvroLoader.java ---
@@ -254,12 +257,30 @@ protected InstanceInformation getHeader() {
List<String> attributeLabels = attributeSchema.getEnumSymbols();
attributes.add(new Attribute(field.name(), attributeLabels));
}
- else
+ else if (isNumeric(field))
--- End diff --
I think the idea here is that if we don't know better, we treat the
attribute as numeric.
With this change, we ignore attributes we don't know how to treat.
Is my understanding correct?
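
To make the question concrete, here is a minimal, self-contained sketch of the two behaviors being compared. It does not use the real AvroLoader or Avro classes: `FieldKind`, `isNumeric`, `headerWithFallback`, and `headerWithoutFallback` are hypothetical stand-ins invented for illustration, and the string labels only mark how each field would be treated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class AttributeFallbackSketch {
    // Hypothetical stand-in for the field types a loader might encounter.
    enum FieldKind { ENUM, INT, DOUBLE, STRING, BYTES }

    // Hypothetical check, playing the role of isNumeric(field) in the diff.
    static boolean isNumeric(FieldKind kind) {
        return kind == FieldKind.INT || kind == FieldKind.DOUBLE;
    }

    // Behavior with a plain "else": every non-enum field falls back to numeric.
    static List<String> headerWithFallback(Map<String, FieldKind> fields) {
        List<String> attributes = new ArrayList<>();
        for (Map.Entry<String, FieldKind> e : fields.entrySet()) {
            if (e.getValue() == FieldKind.ENUM) {
                attributes.add(e.getKey() + ":nominal");
            } else {
                attributes.add(e.getKey() + ":numeric");
            }
        }
        return attributes;
    }

    // Behavior with "else if (isNumeric(field))": fields that are neither
    // enum nor numeric match no branch and are silently left out of the header.
    static List<String> headerWithoutFallback(Map<String, FieldKind> fields) {
        List<String> attributes = new ArrayList<>();
        for (Map.Entry<String, FieldKind> e : fields.entrySet()) {
            if (e.getValue() == FieldKind.ENUM) {
                attributes.add(e.getKey() + ":nominal");
            } else if (isNumeric(e.getValue())) {
                attributes.add(e.getKey() + ":numeric");
            }
        }
        return attributes;
    }
}
```

Under these assumptions, a STRING field would appear as a numeric attribute in the first variant and would be missing from the header in the second, which is the difference the question above is about.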
> Samoa AvroFileStream from HDFSFileStreamSource stops at end of first file
> -------------------------------------------------------------------------
>
> Key: SAMOA-58
> URL: https://issues.apache.org/jira/browse/SAMOA-58
> Project: SAMOA
> Issue Type: Bug
> Components: SAMOA-Instances
> Environment: RHEL 6.6, java 1.8.0_72
> Reporter: Edi Bice
>
> It appears Samoa is capable of streaming a collection of files as a single
> stream, effectively concatenating the files. However, when using Samoa
> AvroFileStream from HDFSFileStreamSource, the stream seems to stop at the
> end of the first file:
> bin/samoa local target/SAMOA-Local-0.4.0-incubating-SNAPSHOT.jar
> "PrequentialEvaluation -i -1 -l (classifiers.ensemble.Bagging -s 100) -s
> (AvroFileStream -s HDFSFileStreamSource -f
> /tmp/order_and_feats_flat_avro/2016_02_18/ -c 1 -e binary) -f 10000"
> 2016-02-18 20:43:20,991 [main] INFO org.apache.samoa.evaluation.EvaluatorProcessor (EvaluatorProcessor.java:183) - last event is received!
> 2016-02-18 20:43:20,991 [main] INFO org.apache.samoa.evaluation.EvaluatorProcessor (EvaluatorProcessor.java:184) - total count: 262144
> ...
> 2016-02-18 20:43:20,993 [main] INFO org.apache.samoa.evaluation.EvaluatorProcessor (EvaluatorProcessor.java:191) - total evaluation time: 34 seconds for 262144 instances
> bash-4.1$ hadoop fs -ls /tmp/order_and_feats_flat_avro/2016_02_18 | more
> Found 70 items
> -rw-r--r-- 3 yarn hdfs 230855335 2016-02-18 16:01 /tmp/order_and_feats_flat_avro/2016_02_18/hdfs-1a238673-c4ec-4462-be67-78d573efa790-00001
> -rw-r--r-- 3 yarn hdfs 229800273 2016-02-18 16:04 /tmp/order_and_feats_flat_avro/2016_02_18/hdfs-1a238673-c4ec-4462-be67-78d573efa790-00002
> ...
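
For reference, the behavior the issue expects (many files read as one continuous stream) can be sketched as a chained iterator. This is only an illustration and not SAMOA's implementation: `ChainedFileStreamSketch` is a hypothetical class, and each "file" is modeled as an in-memory `Iterable` of records rather than an Avro file on HDFS.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ChainedFileStreamSketch<T> implements Iterator<T> {
    // Remaining "files"; each one is assumed to expose its records as an Iterable.
    private final Iterator<? extends Iterable<T>> files;
    // Records of the file currently being read (null before the first file is opened).
    private Iterator<T> current = null;

    public ChainedFileStreamSketch(List<? extends Iterable<T>> files) {
        this.files = files.iterator();
    }

    @Override
    public boolean hasNext() {
        // When the current file is exhausted (or empty), move on to the next
        // file instead of ending the stream.
        while ((current == null || !current.hasNext()) && files.hasNext()) {
            current = files.next().iterator();
        }
        return current != null && current.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }
}
```

The key point is the loop in `hasNext()`: a reader that only checks the current file for more records would report the end of the stream after the first file, which matches the symptom described in the issue.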