[
https://issues.apache.org/jira/browse/SAMZA-138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986286#comment-13986286
]
Yan Fang commented on SAMZA-138:
--------------------------------
Hi Chris,
Thanks a lot for the review. For (1), I totally agree that we should use the
file offset instead of the line number. For (2), I have two ways of
implementing it:
* old fashion: keep comparing the file pointer with the file size. If the file
size is bigger than the file pointer, read the new records.
* java 7: use WatchService to monitor file changes. If there is a change,
check for new lines.
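A minimal sketch of the first option, assuming line-delimited records and a byte offset kept by the reader. The class name FileTailer and its methods are hypothetical and not part of the attached patch:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of the SAMZA-138 patch. */
class FileTailer {
    private final String path;
    private long filePointer; // byte offset of the next unread record

    FileTailer(String path, long startOffset) {
        this.path = path;
        this.filePointer = startOffset;
    }

    /** Returns the lines appended since the last call, or an empty list. */
    List<String> pollNewLines() throws IOException {
        List<String> lines = new ArrayList<>();
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            // Only read when the file has grown past our position.
            // Truncation or rotation is not handled in this sketch.
            if (file.length() <= filePointer) {
                return lines;
            }
            file.seek(filePointer);
            String line;
            // readLine() decodes bytes as ISO-8859-1 and may return a
            // partially written final line if the writer is mid-record.
            while ((line = file.readLine()) != null) {
                lines.add(line);
            }
            filePointer = file.getFilePointer();
        }
        return lines;
    }

    long getFilePointer() {
        return filePointer;
    }
}
```

The caller would invoke pollNewLines() on a timer. The value of getFilePointer() is the byte offset that could be stored as the message offset, as agreed in (1).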
I prefer the second approach. What is your opinion, based on your
experience? Does Scala have other approaches?
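A rough sketch of the WatchService option, for comparison. WatchService watches directories rather than single files, so this registers the parent directory and filters events by file name. The class name FileChangeWatcher is hypothetical, and how quickly events arrive depends on the platform:

```java
import java.io.Closeable;
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper, not part of the SAMZA-138 patch. */
class FileChangeWatcher implements Closeable {
    private final WatchService watchService;
    private final Path fileName;

    FileChangeWatcher(Path file) throws IOException {
        Path absolute = file.toAbsolutePath();
        this.fileName = absolute.getFileName();
        this.watchService = absolute.getFileSystem().newWatchService();
        // Register the parent directory for modification events.
        absolute.getParent().register(watchService, StandardWatchEventKinds.ENTRY_MODIFY);
    }

    /** Blocks until the watched file is modified or the timeout expires. */
    boolean awaitChange(long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                return false;
            }
            WatchKey key = watchService.poll(remaining, TimeUnit.NANOSECONDS);
            if (key == null) {
                return false; // timed out with no events
            }
            boolean changed = false;
            for (WatchEvent<?> event : key.pollEvents()) {
                // The context is the changed entry's name relative to the
                // directory. OVERFLOW events carry a null context.
                Object context = event.context();
                if (context != null && fileName.equals(context)) {
                    changed = true;
                }
            }
            key.reset(); // re-arm the key so later events are delivered
            if (changed) {
                return true;
            }
        }
    }

    @Override
    public void close() throws IOException {
        watchService.close();
    }
}
```

When awaitChange returns true, the reader would still need to seek to its saved file offset and read from there, since the event only says that the file changed, not what was appended.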
{quote}
Most local log files can be read using this reader.
{quote}
This reminds me of SAMZA-200. The MySQL log is just a simple log file, so we
may be able to solve that ticket by modifying this reader a little.
As for the offsets, I asked a similar question in RB. I am still not sure how
to get the newest offset and the upcoming offset from a file. Should I read the
file in the FilereaderSystemAdmin class to get the end position of the file, or
create a Kafka consumer to get the latest offset from the checkpoint topic?
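To make the first alternative concrete, here is a sketch that derives both values from the file itself. It assumes byte offsets, newline-delimited records, that "upcoming" means the position where the next appended record would start (the file length), and that "newest" means the start position of the last line. The class name FileOffsets is hypothetical, and these definitions are my assumption rather than something confirmed in the review:

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical helper, not part of the SAMZA-138 patch. */
class FileOffsets {

    /** Byte position where the next appended record would start. */
    static long upcomingOffset(String path) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            return file.length();
        }
    }

    /** Byte position where the last line starts, or -1 for an empty file. */
    static long newestOffset(String path) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            long length = file.length();
            if (length == 0) {
                return -1L;
            }
            // Skip the trailing newline, if there is one.
            long pos = length - 1;
            file.seek(pos);
            if (file.read() == '\n') {
                pos--;
            }
            // Scan backwards one byte at a time to the previous newline.
            // Simple but slow for very long lines; fine for a sketch.
            while (pos >= 0) {
                file.seek(pos);
                if (file.read() == '\n') {
                    return pos + 1;
                }
                pos--;
            }
            return 0L; // the file holds a single line
        }
    }
}
```

This only needs local file access, so it avoids a dependency on Kafka in the admin class. The checkpoint topic would still be where a restarted task finds the offset it last processed.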
Thanks.
> System that places specified file contents onto stream
> ------------------------------------------------------
>
> Key: SAMZA-138
> URL: https://issues.apache.org/jira/browse/SAMZA-138
> Project: Samza
> Issue Type: New Feature
> Affects Versions: 0.7.0
> Environment: RHELinux 2.6.18-371.4.1.el5
> Reporter: Jonathan Poltak Samosir
> Assignee: Yan Fang
> Priority: Minor
> Labels: feature, newbie, patch
> Attachments: FileReaderConsumer.java, FileReaderSystemFactory.java,
> SAMZA-138.patch
>
>
> A fairly straightforward Samza System that reads from a specified file, and
> places that file's contents onto a SystemStreamPartition for use as input for
> a StreamTask.
> Roughly based off how the hello-samza example project's WikipediaSystem works
> (more the SystemConsumerFactory rather than SystemConsumer class).
> Probably needs a bit of work, but basic functionality works as intended.
> Hopefully useful to some, either as a functioning system or as a base for a
> more robust and functionally-promising system that you wish to implement.
> Some suggested improvements (not yet implemented):
> * handle reading from multiple files ([suggested alternative input
> specification|https://mail-archives.apache.org/mod_mbox/incubator-samza-dev/201401.mbox/%3C1B43C7411DB20E47AB0FB62E7262B80179BA7465%40ESV4-MBX01.linkedin.biz%3E]-
> point 2)
> * use of filepos for IncomingMessageEnvelope offset ([more info
> here|https://mail-archives.apache.org/mod_mbox/incubator-samza-dev/201401.mbox/%3C1B43C7411DB20E47AB0FB62E7262B80179BA749D%40ESV4-MBX01.linkedin.biz%3E])
> * come up with a reasonable bounded queue threshold (the value of 100 was
> arbitrary, as I was unsure of a reasonable value here)
> * better handling for the exceptions encountered (I wasn't 100% sure about
> some of them)