[ 
https://issues.apache.org/jira/browse/SAMZA-138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000319#comment-14000319
 ] 

Yan Fang commented on SAMZA-138:
--------------------------------

Thank you, Chris. That solved my problem and I tested the system successfully.

RB: https://reviews.apache.org/r/21581/
(It is a new RB request because I changed the repository)

1. Created FileReaderSystemAdmin, SystemConsumer, and SystemFactory. Also 
added unit tests for each of them.
2. Converted them to Scala.
3. The patch is based on samza-core.
4. In SystemAdmin, the oldest, newest and upcoming offsets behave as described 
in the previous comment. getOffsetsAfter is not implemented as described 
there: because the offset we store is always the position of the \n that ends 
a line, simply adding 1 is sufficient to get the beginning of the next line. 
(More explanation is in the javadoc.)
5. SystemFactory does not provide Producer.
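
To illustrate the offset arithmetic in point 4, here is a minimal sketch. It 
is not the code from the patch: the object and method names (NewlineOffsets, 
newlineOffsets, offsetAfter, readLineAt) are made up for this example, and it 
works on an in-memory String whose character indices are assumed to equal file 
positions (i.e. single-byte characters).

```scala
// Hypothetical sketch of the "offset = position of \n" convention.
// None of these names come from the patch.
object NewlineOffsets {

  // The offset recorded for each complete line is the index of the '\n'
  // that terminates it. Assumes one byte per character.
  def newlineOffsets(content: String): Seq[Long] =
    content.zipWithIndex.collect { case ('\n', i) => i.toLong }

  // Because an offset always points at a '\n', the next line starts
  // exactly one position later.
  def offsetAfter(offset: Long): Long = offset + 1

  // Reads the complete line starting at `start`, returning the line text
  // together with the offset (position of its '\n') to store for it.
  // Returns None if no terminating '\n' has been written yet.
  def readLineAt(content: String, start: Long): Option[(String, Long)] = {
    val end = content.indexOf('\n', start.toInt)
    if (end < 0) None
    else Some((content.substring(start.toInt, end), end.toLong))
  }
}
```

With the content "ab\ncd\nef", the stored offsets would be 2 and 5. Resuming 
after offset 2 means reading from position 3, which yields the line "cd" with 
offset 5. Reading from position 6 yields nothing, since "ef" has no 
terminating newline yet.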

Thank you!


> System that places specified file contents onto stream
> ------------------------------------------------------
>
>                 Key: SAMZA-138
>                 URL: https://issues.apache.org/jira/browse/SAMZA-138
>             Project: Samza
>          Issue Type: New Feature
>    Affects Versions: 0.7.0
>         Environment: RHELinux 2.6.18-371.4.1.el5
>            Reporter: Jonathan Poltak Samosir
>            Assignee: Yan Fang
>            Priority: Minor
>              Labels: feature, newbie, patch
>         Attachments: FileReaderConsumer.java, FileReaderSystemFactory.java, 
> SAMZA-138.1.patch, SAMZA-138.patch
>
>
> A fairly straightforward Samza System that reads from a specified file, and 
> places that file's contents onto a SystemStreamPartition for use as input for 
> a StreamTask.
> Roughly based off how the hello-samza example project's WikipediaSystem works 
> (more the SystemConsumerFactory rather than SystemConsumer class). 
> Probably needs a bit of work, but basic functionality works as intended. 
> Hopefully useful to some, either as a functioning system or as a base for a 
> more robust and functionally-promising system that you wish to implement.
> Some suggested improvements (not yet implemented):
> * handle reading from multiple files ([suggested alternative input 
> specification|https://mail-archives.apache.org/mod_mbox/incubator-samza-dev/201401.mbox/%3C1B43C7411DB20E47AB0FB62E7262B80179BA7465%40ESV4-MBX01.linkedin.biz%3E]-
>  point 2)
> * use of filepos for IncomingMessageEnvelope offset ([more info 
> here|https://mail-archives.apache.org/mod_mbox/incubator-samza-dev/201401.mbox/%3C1B43C7411DB20E47AB0FB62E7262B80179BA749D%40ESV4-MBX01.linkedin.biz%3E])
> * come up with a reasonable bounded queue threshold (the value of 100 was 
> arbitrary, as I was unsure of a reasonable value here) 
> * better handling for the exceptions encountered (I wasn't 100% sure about 
> some of them)



--
This message was sent by Atlassian JIRA
(v6.2#6252)
