> On March 15, 2015, 7:40 a.m., Ashish Paliwal wrote:
> > flume-ng-core/src/main/java/org/apache/flume/serialization/ResettableGenericInputStream.java,
> >  line 107
> > <https://reviews.apache.org/r/31643/diff/1/?file=882380#file882380line107>
> >
> >     Can we use InputStream instead of StreamCreator? It makes it easy to 
> > read the code and perhaps this class doen't need to know about how the 
> > stream is created and can work directly on InputStream. The using class can 
> > hide the details on how the stream is created.

The streamCreator is needed in this class, because the seek() and 
markPosition() uses it to create a new stream, if the request to seek is to a 
position before the buffering we have. markPosition() also uses it to mark a 
position that is way before what we have buffered up.


> On March 15, 2015, 7:40 a.m., Ashish Paliwal wrote:
> > flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStore.java,
> >  line 23
> > <https://reviews.apache.org/r/31643/diff/1/?file=882388#file882388line23>
> >
> >     Would be worth adding an init() or start() API, where the 
> > initialization can take place like for Zookeeper store, connection can be 
> > initialized.

Can we defer this to after pulling in the first commit, and may be track it by 
a seperate Jira ?


> On March 15, 2015, 7:40 a.m., Ashish Paliwal wrote:
> > flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStore.java,
> >  line 30
> > <https://reviews.apache.org/r/31643/diff/1/?file=882388#file882388line30>
> >
> >     Would be worth adding null checks in the base class and expose mark 
> > protected abstract API's to do the real job

Can we defer this to after pulling in the first commit, and if needed track it 
by a seperate Jira ?


> On March 15, 2015, 7:40 a.m., Ashish Paliwal wrote:
> > flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStoreFactory.java,
> >  line 23
> > <https://reviews.apache.org/r/31643/diff/1/?file=882389#file882389line23>
> >
> >     enum style factory that we use across Flume is recommended. This would 
> > enable supporting a User supplied custom backing store

Can we defer this to after pulling in the first commit, and if needed track it 
by a seperate Jira ?


> On March 15, 2015, 7:40 a.m., Ashish Paliwal wrote:
> > flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3ObjectEventReader.java,
> >  line 146
> > <https://reviews.apache.org/r/31643/diff/1/?file=882390#file882390line146>
> >
> >     Do we still need this? Should't it be abstracted with MetadataStore

Currently the MetadataStore handles just the metadata info on the objects 
processed so far. It does make sense to store the position tracker as part of 
the MetaDataBackingStore.
Can we defer this to after pulling in the first commit, and if needed track it 
by a seperate Jira ?


- Johny Rufus


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31643/#review76501
-----------------------------------------------------------


On March 2, 2015, 10:26 p.m., Johny Rufus John wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31643/
> -----------------------------------------------------------
> 
> (Updated March 2, 2015, 10:26 p.m.)
> 
> 
> Review request for Flume.
> 
> 
> Bugs: FLUME-2437
>     https://issues.apache.org/jira/browse/FLUME-2437
> 
> 
> Repository: flume-git
> 
> 
> Description
> -------
> 
> S3 Source, initial version
> 
> 
> Diffs
> -----
> 
>   
> flume-ng-core/src/main/java/org/apache/flume/serialization/FileStreamCreator.java
>  PRE-CREATION 
>   
> flume-ng-core/src/main/java/org/apache/flume/serialization/ResettableGenericInputStream.java
>  PRE-CREATION 
>   
> flume-ng-core/src/main/java/org/apache/flume/serialization/StreamCreator.java 
> PRE-CREATION 
>   
> flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableFileInputStream.java
>  d1240fb 
>   
> flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableGenericInputStream.java
>  PRE-CREATION 
>   flume-ng-dist/pom.xml a083fe2 
>   flume-ng-sources/flume-s3-source/pom.xml PRE-CREATION 
>   
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/FileBasedMetadataBackingStore.java
>  PRE-CREATION 
>   
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/InMemoryMetadataBackingStore.java
>  PRE-CREATION 
>   
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStore.java
>  PRE-CREATION 
>   
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/MetadataBackingStoreFactory.java
>  PRE-CREATION 
>   
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3ObjectEventReader.java
>  PRE-CREATION 
>   
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3Source.java
>  PRE-CREATION 
>   
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3SourceConfigurationConstants.java
>  PRE-CREATION 
>   
> flume-ng-sources/flume-s3-source/src/main/java/org/apache/flume/source/s3/S3StreamCreator.java
>  PRE-CREATION 
>   
> flume-ng-sources/flume-s3-source/src/test/java/org/apache/flume/source/s3/TestS3Source.java
>  PRE-CREATION 
>   flume-ng-sources/pom.xml ab8eca4 
>   pom.xml ea7ffe3 
> 
> Diff: https://reviews.apache.org/r/31643/diff/
> 
> 
> Testing
> -------
> 
> TestResettableGenericInputStream and 
> TestS3Source 
> 
> Manual testing of scenarios:
> 1. Created multiple files in S3 Bucket - make sure the source processes all 
> the files
> 2. Add more files, after the S3 source starts - make sure the newly created 
> S3 objects are processed
> 3. Stop the source after a few files are processed - make sure on restart, 
> the source only processes the rest of the unprocessed files
> 4. Stop the source in the middle of processing a file - make sure the postion 
> tracker is read on re-start and processing of the partial file continues from 
> where it was last marked
> 
> 
> Thanks,
> 
> Johny Rufus John
> 
>

Reply via email to