[ 
https://issues.apache.org/jira/browse/SAMZA-138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998435#comment-13998435
 ] 

Yan Fang commented on SAMZA-138:
--------------------------------

Hey guys, I have finished the code. But during integration testing, I ran 
into a problem with the metadata:
If I do as [~criccomini] said, 
[quote]
In the file reader's case, for a non-empty file, the oldest offset will always 
be 0, the newest offset will always be the offset immediately after the second 
to last newline, and the upcoming offset will always be the offset immediately 
after the last newline.
[quote]
The first time I run the system, lastProcessedOffsets in OffsetManager is 
empty, so the loadStartingOffsets method (OffsetManager line 166) does not 
find anything. loadDefaults (line 167) is then called, which in turn gets 
the metadata from getSystemStreamMetadata. Since the upcoming offset is 
always the end of the file, the system will not read any existing content 
from the file.
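
To make the problem concrete, here is a rough sketch of the offset rules 
from the quote and of the startup fallback described above. It is not the 
code from the patch and does not use the real Samza classes: the class 
FileOffsetSketch and the methods offsetsFor and startingOffset are 
hypothetical names, offsets are assumed to be byte positions, and I assume 
that a file with fewer than two newlines gets a newest offset of 0.

```java
import java.nio.charset.StandardCharsets;

public class FileOffsetSketch {

    // Index of the last '\n' at or before position `from`, or -1 if there is none.
    static int lastNewlineAtOrBefore(byte[] content, int from) {
        for (int i = Math.min(from, content.length - 1); i >= 0; i--) {
            if (content[i] == '\n') {
                return i;
            }
        }
        return -1;
    }

    // Returns {oldest, newest, upcoming} for a non-empty file, following the quote:
    // oldest is 0, newest is the offset right after the second-to-last newline,
    // upcoming is the offset right after the last newline.
    // Assumption: with fewer than two newlines, newest falls back to 0.
    static long[] offsetsFor(byte[] content) {
        int lastNewline = lastNewlineAtOrBefore(content, content.length - 1);
        int secondToLastNewline = lastNewlineAtOrBefore(content, lastNewline - 1);
        long oldest = 0;
        long newest = secondToLastNewline + 1;
        long upcoming = lastNewline + 1;
        return new long[] {oldest, newest, upcoming};
    }

    // Hypothetical, simplified stand-in for the startup decision: with no
    // checkpointed offset, fall back to the upcoming offset from the metadata.
    static long startingOffset(Long lastProcessed, long[] offsets) {
        if (lastProcessed != null) {
            return lastProcessed; // simplified: resume at the checkpointed position
        }
        return offsets[2]; // first run: default to upcoming
    }

    public static void main(String[] args) {
        byte[] content = "line1\nline2\nline3\n".getBytes(StandardCharsets.UTF_8);
        long[] offsets = offsetsFor(content);
        long start = startingOffset(null, offsets);
        System.out.println("oldest=" + offsets[0]
                + " newest=" + offsets[1]
                + " upcoming=" + offsets[2]);
        System.out.println("bytes left to read on first run: " + (content.length - start));
    }
}
```

For the three-line sample, the offsets come out as oldest 0, newest 12 and 
upcoming 18, and the file is 18 bytes long, so a first run that starts at 
the upcoming offset has nothing left to read.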

Any thoughts? Or did I misunderstand something?

> System that places specified file contents onto stream
> ------------------------------------------------------
>
>                 Key: SAMZA-138
>                 URL: https://issues.apache.org/jira/browse/SAMZA-138
>             Project: Samza
>          Issue Type: New Feature
>    Affects Versions: 0.7.0
>         Environment: RHELinux 2.6.18-371.4.1.el5
>            Reporter: Jonathan Poltak Samosir
>            Assignee: Yan Fang
>            Priority: Minor
>              Labels: feature, newbie, patch
>         Attachments: FileReaderConsumer.java, FileReaderSystemFactory.java, 
> SAMZA-138.patch
>
>
> A fairly straightforward Samza System that reads from a specified file, and 
> places that file's contents onto a SystemStreamPartition for use as input for 
> a StreamTask.
> Roughly based off how the hello-samza example project's WikipediaSystem works 
> (more the SystemConsumerFactory rather than SystemConsumer class). 
> Probably needs a bit of work, but basic functionality works as intended. 
> Hopefully useful to some, either as a functioning system or as a base for a 
> more robust and functionally-promising system that you wish to implement.
> Some suggested improvements (not yet implemented):
> * handle reading from multiple files ([suggested alternative input 
> specification|https://mail-archives.apache.org/mod_mbox/incubator-samza-dev/201401.mbox/%3C1B43C7411DB20E47AB0FB62E7262B80179BA7465%40ESV4-MBX01.linkedin.biz%3E]-
>  point 2)
> * use of filepos for IncomingMessageEnvelope offset ([more info 
> here|https://mail-archives.apache.org/mod_mbox/incubator-samza-dev/201401.mbox/%3C1B43C7411DB20E47AB0FB62E7262B80179BA749D%40ESV4-MBX01.linkedin.biz%3E])
> * come up with a reasonable bounded queue threshold (the value of 100 was 
> arbitrary, as I was unsure of a reasonable value here) 
> * better handling for the exceptions encountered (I wasn't 100% sure about 
> some of them)



--
This message was sent by Atlassian JIRA
(v6.2#6252)
