[ 
https://issues.apache.org/jira/browse/SAMZA-138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Poltak Samosir updated SAMZA-138:
------------------------------------------

    Description: 
A fairly straightforward Samza System that reads from a specified file, and 
places that file's contents onto a SystemStreamPartition for use as input for a 
StreamTask.

Loosely based on how the hello-samza example project's WikipediaSystem works 
(more on the SystemConsumerFactory than on the SystemConsumer class). 

Probably needs a bit of work, but the basic functionality works as intended. 
Hopefully useful to some, either as a functioning system or as a base for a 
more robust and more capable system that you wish to implement.

Some suggested improvements (not yet implemented):
* handle reading from multiple files ([suggested alternative input 
specification|https://mail-archives.apache.org/mod_mbox/incubator-samza-dev/201401.mbox/%3C1B43C7411DB20E47AB0FB62E7262B80179BA7465%40ESV4-MBX01.linkedin.biz%3E], 
point 2)
* use filepos as the IncomingMessageEnvelope offset ([more info 
here|https://mail-archives.apache.org/mod_mbox/incubator-samza-dev/201401.mbox/%3C1B43C7411DB20E47AB0FB62E7262B80179BA749D%40ESV4-MBX01.linkedin.biz%3E])
* come up with a reasonable bounded queue threshold (the value of 100 was 
arbitrary, as I was unsure of a reasonable value here) 
* better handling for the exceptions encountered (I wasn't 100% sure about some 
of them)

  was:
A fairly straightforward Samza System that reads from a specified file, and 
places that file's contents onto a SystemStreamPartition for use as input for a 
StreamTask.

Roughly based off how the hello-samza example project's WikipediaSystem works 
(more the SystemConsumerFactory rather than SystemConsumer class). 

Probably needs a bit of work, but basic functionality works as intended. 
Hopefully useful to some, either as a functioning system or as a base for a 
more robust and functionally-promising system that you wish to implement.

Some suggested improvements (not yet implemented):
* handle reading from multiple files ([suggested alternative input 
specification|https://mail-archives.apache.org/mod_mbox/incubator-samza-dev/201401.mbox/%3C1B43C7411DB20E47AB0FB62E7262B80179BA7465%40ESV4-MBX01.linkedin.biz%3E]-
 point 2)
* use of filepos for IncomingMessageEnvelope offset ([more info 
here|https://mail-archives.apache.org/mod_mbox/incubator-samza-dev/201401.mbox/%3C1B43C7411DB20E47AB0FB62E7262B80179BA749D%40ESV4-MBX01.linkedin.biz%3E]
* come up with a reasonable bounded queue threshold (the value of 100 was 
arbitrary, as I was unsure of a reasonable value here) 


> System that places specified file contents onto stream
> ------------------------------------------------------
>
>                 Key: SAMZA-138
>                 URL: https://issues.apache.org/jira/browse/SAMZA-138
>             Project: Samza
>          Issue Type: New Feature
>    Affects Versions: 0.7.0
>         Environment: RHELinux 2.6.18-371.4.1.el5
>            Reporter: Jonathan Poltak Samosir
>            Priority: Minor
>              Labels: feature, newbie, patch
>
> A fairly straightforward Samza System that reads from a specified file, and 
> places that file's contents onto a SystemStreamPartition for use as input for 
> a StreamTask.
> Loosely based on how the hello-samza example project's WikipediaSystem works 
> (more on the SystemConsumerFactory than on the SystemConsumer class). 
> Probably needs a bit of work, but the basic functionality works as intended. 
> Hopefully useful to some, either as a functioning system or as a base for a 
> more robust and more capable system that you wish to implement.
> Some suggested improvements (not yet implemented):
> * handle reading from multiple files ([suggested alternative input 
> specification|https://mail-archives.apache.org/mod_mbox/incubator-samza-dev/201401.mbox/%3C1B43C7411DB20E47AB0FB62E7262B80179BA7465%40ESV4-MBX01.linkedin.biz%3E], 
> point 2)
> * use filepos as the IncomingMessageEnvelope offset ([more info 
> here|https://mail-archives.apache.org/mod_mbox/incubator-samza-dev/201401.mbox/%3C1B43C7411DB20E47AB0FB62E7262B80179BA749D%40ESV4-MBX01.linkedin.biz%3E])
> * come up with a reasonable bounded queue threshold (the value of 100 was 
> * come up with a reasonable bounded queue threshold (the value of 100 was 
> arbitrary, as I was unsure of a reasonable value here) 
> * better handling for the exceptions encountered (I wasn't 100% sure about 
> some of them)
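
For anyone picking this up, two of the suggested improvements above (the 
bounded queue and using the file position as the offset) could look roughly 
like the sketch below. This is not the attached patch and it does not use any 
Samza classes: FileLine, FileLineReader, readInto and QUEUE_CAPACITY are 
hypothetical names made up for illustration, and only the JDK is assumed. The 
capacity of 100 is the same arbitrary value mentioned in the description.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.concurrent.BlockingQueue;

/** Hypothetical stand-in for a message envelope: one line plus the byte position where it starts. */
final class FileLine {
    final long filePos;   // byte offset of the first byte of this line
    final String text;    // line content without the line terminator

    FileLine(long filePos, String text) {
        this.filePos = filePos;
        this.text = text;
    }
}

/** Hypothetical helper that feeds the lines of a file into a bounded queue. */
final class FileLineReader {
    // Arbitrary capacity, as in the description; a real value would need tuning.
    static final int QUEUE_CAPACITY = 100;

    /**
     * Reads lines starting at byte position startPos and puts them on the queue.
     * put() blocks while the queue is full, which is what bounds memory use.
     * Returns the file position just past the last line read, so a later call
     * can resume from there.
     */
    static long readInto(Path file, long startPos, BlockingQueue<FileLine> queue)
            throws IOException, InterruptedException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(startPos);
            long pos = raf.getFilePointer();
            String raw;
            while ((raw = raf.readLine()) != null) {
                // readLine() maps each byte to one char, so recover the bytes
                // and decode them as UTF-8 (assumed encoding of the input file).
                String text = new String(
                        raw.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
                queue.put(new FileLine(pos, text));
                pos = raf.getFilePointer();
            }
            return pos;
        }
    }
}
```

A consumer would drain the queue from another thread and could pass filePos 
along as the offset of each message (for example as a decimal string), so that 
a restarted job could call readInto with the last recorded position instead of 
re-reading the whole file. RandomAccessFile.readLine() is unbuffered, so this 
favours simplicity over throughput.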



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
