GitHub user edi-bice opened a pull request:
https://github.com/apache/incubator-samoa/pull/48
Patch for SAMOA-58 (Samoa AvroFileStream from HDFSFileStreamSource stops at
end of first file)
FileStreamSource appeared to support multiple files, but in my testing an
AvroFileStream reading from an HDFSFileStreamSource stopped at the end of the
first file. To make multi-file streams work I had to change AvroFileStream,
ArffFileStream and their parent class FileStream.
See the following JIRA issue for additional detail:
https://issues.apache.org/jira/browse/SAMOA-58
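For readers unfamiliar with the failure mode, here is a minimal sketch of the
behaviour the patch is after: when the current file is exhausted, the stream
moves to the next file and also rebuilds its per-file loader. This is not the
actual SAMOA code. The class MultiFileRecordStream, the method nextRecord and
the use of in-memory lists in place of HDFS files are all assumptions made for
illustration.

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a multi-file stream; not SAMOA's FileStream. */
class MultiFileRecordStream {
    // Each inner list plays the role of one file's records (assumption).
    private final Iterator<List<String>> files;
    // Plays the role of the per-file loader that must be rebuilt per file.
    private Iterator<String> loader;

    MultiFileRecordStream(List<List<String>> files) {
        this.files = files.iterator();
    }

    /** Returns the next record across all files, or null when all are exhausted. */
    String nextRecord() {
        // Advance past exhausted (or empty) files, replacing the loader each time.
        while (loader == null || !loader.hasNext()) {
            if (!files.hasNext()) {
                return null; // no more files: the whole stream is finished
            }
            loader = files.next().iterator();
        }
        return loader.next();
    }
}
```

The point of the sketch is that replacing the underlying file alone is not
enough; the loader reading from it has to be replaced in the same step,
otherwise the stream keeps returning null after the first file.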
Additionally, I modified bin/samoa, pom.xml and SystemUtils, and added a
resource, to fix reading from HDFS on my cluster.
A seemingly unrelated change adds an explicit check for supported Avro types.
Fields of unsupported types are now filtered out, instead of every non-nominal
(non-enum) field being assumed numeric and then failing at read time.
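The filtering idea can be sketched as follows. The supported set (double,
float, long, int and enum) is taken from the first commit message below;
everything else is an assumption for illustration. In particular, FieldType is
a hypothetical stand-in for the Avro schema types, and SupportedFieldFilter is
not a class in SAMOA.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper; illustrates the filtering only, not the patch itself. */
class SupportedFieldFilter {
    /** Stand-in for a subset of Avro's schema types (assumption). */
    enum FieldType { DOUBLE, FLOAT, LONG, INT, ENUM, STRING, BYTES, BOOLEAN }

    /** Types kept as attributes: numeric types plus enum (nominal). */
    static final Set<FieldType> SUPPORTED = EnumSet.of(
            FieldType.DOUBLE, FieldType.FLOAT, FieldType.LONG,
            FieldType.INT, FieldType.ENUM);

    /** Returns the names of fields whose type is supported, in schema order. */
    static List<String> supportedFields(Map<String, FieldType> schema) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, FieldType> field : schema.entrySet()) {
            // Skip unsupported fields up front instead of failing at parse time.
            if (SUPPORTED.contains(field.getValue())) {
                kept.add(field.getKey());
            }
        }
        return kept;
    }
}
```

With a schema containing an int field, a string field and an enum field, only
the int and enum fields would be kept, so a string column no longer causes a
numeric parse failure while reading.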
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/edi-bice/incubator-samoa master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-samoa/pull/48.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #48
----
commit 5cbbcfab94db47732ab44b3b9d752c45f02e2f30
Author: edi_bice <[email protected]>
Date: 2016-02-17T15:45:07Z
Only add fields of supported types (double, float, long, int and enum)
rather than adding and defaulting all non-enum to numeric and failing at value
parse time
commit d5a055f5c5ff0c6787beaa03234375cdcbb89cb5
Author: edi_bice <[email protected]>
Date: 2016-02-17T21:53:02Z
until we change samza to produce files with .avro extension
commit ba73bb24d9477207e8dfd85fbf478be1e3877c7d
Author: edi_bice <[email protected]>
Date: 2016-02-18T22:06:12Z
A tentative solution to issue described in:
https://issues.apache.org/jira/browse/SAMOA-58
commit 29e0379949eb7847ea46bfe432d98d90dff993e9
Author: edi_bice <[email protected]>
Date: 2016-02-19T16:55:03Z
Issue described in https://issues.apache.org/jira/browse/SAMOA-58 was
apparently more complicated than what was expected in previous commit. While we
did succeed in replacing the first exhausted file stream with a new one, the
loader was not changed and would return null. This rework of AvroFileStream,
FileStream and ArffFileStream hopefully cleans things up a bit and allows
multi-file streams of either (Avro or Arff) type.
commit fe093240a248e26be84ded4d378acc1d5c81d599
Author: edi_bice <[email protected]>
Date: 2016-01-25T17:02:22Z
configure don't code
commit 99f04bb4396190e92af2a43e56d005cb502357ca
Author: Edi Bice <[email protected]>
Date: 2016-02-22T14:25:43Z
cherry-picked from faf branch - changes needed to be able to read from HDFS
on a YARN 2.7.1 cluster
----