----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/20869/#review41856 -----------------------------------------------------------
1. Stylistic: white space should be spaces, not tabs. 2. Can we put this in the main code base, rather than in hello-samza? Hello-samza can still use it, but this seems like a really useful consumer. I think samza-core is appropriate for it. 3. If we do (2), would you feel comfortable converting this to Scala? Happy to provide Scala guidance, if you need it. 4. Need tests. 5. I see that your implementation right now treats an offset as a line (e.g. 0 means the first line, 1 means the second, etc). This makes things a lot easier to implement internally, but means we have to read the entire file every time we start up. If we use the actual file offset for the \n location, we can do an fseek, which is much faster. samza-wikipedia/src/main/java/samza/examples/filereader/system/FileReaderSystemAdmin.java <https://reviews.apache.org/r/20869/#comment75498> Javadocs here would be useful. samza-wikipedia/src/main/java/samza/examples/filereader/system/FileReaderSystemAdmin.java <https://reviews.apache.org/r/20869/#comment75497> Some java docs describing behavior would be good here. samza-wikipedia/src/main/java/samza/examples/filereader/system/FileReaderSystemAdmin.java <https://reviews.apache.org/r/20869/#comment75496> This is not quite correct. Rather than use a static FilePartitionMetadata and hard code made-up offsets, you'll need to uniquely determine the oldest, newest, and upcoming offsets for every stream (file). The oldest offset will always be 0. The newest offset will always be the offset of second-to-last \n in the file + 1. The upcoming offset will be the offset of the last \n in the file + 1. msg1\n <-- oldest offset msg2\n <-- newest offset msg3\n <-- upcoming offset not-yet-complete-msg-that-has-no-newline-after-it-yet samza-wikipedia/src/main/java/samza/examples/filereader/system/FileReaderSystemAdmin.java <https://reviews.apache.org/r/20869/#comment75499> Javadocs here would be useful. samza-wikipedia/src/main/java/samza/examples/filereader/system/FileReaderSystemAdmin.java <https://reviews.apache.org/r/20869/#comment75500> You can't add 1 here. You have to find the location of the next newline in the file after the supplied offset. samza-wikipedia/src/main/java/samza/examples/filereader/system/FileReaderSystemConsumer.java <https://reviews.apache.org/r/20869/#comment75501> Stylistic: keep the ASF header 80 char aligned. Easiest way is to just copy from an existing file: /* * Licensed to the Apache Software Foundation (ASF) under one * or more contributor license agreements. See the NOTICE file * distributed with this work for additional information * regarding copyright ownership. The ASF licenses this file * to you under the Apache License, Version 2.0 (the * "License"); you may not use this file except in compliance * with the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, * software distributed under the License is distributed on an * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY * KIND, either express or implied. See the License for the * specific language governing permissions and limitations * under the License. */ samza-wikipedia/src/main/java/samza/examples/filereader/system/FileReaderSystemConsumer.java <https://reviews.apache.org/r/20869/#comment75502> Javadocs. samza-wikipedia/src/main/java/samza/examples/filereader/system/FileReaderSystemConsumer.java <https://reviews.apache.org/r/20869/#comment75507> Stylistic: camel case in variable names. samza-wikipedia/src/main/java/samza/examples/filereader/system/FileReaderSystemConsumer.java <https://reviews.apache.org/r/20869/#comment75508> Do you need metricsRegistry? If so, should be Registry, not REgistry. samza-wikipedia/src/main/java/samza/examples/filereader/system/FileReaderSystemConsumer.java <https://reviews.apache.org/r/20869/#comment75510> Keep track of these threads so we can shut them down in stop(). samza-wikipedia/src/main/java/samza/examples/filereader/system/FileReaderSystemConsumer.java <https://reviews.apache.org/r/20869/#comment75509> Stop all reader threads that were created in start. - Chris Riccomini On April 30, 2014, 12:42 a.m., Yan Fang wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/20869/ > ----------------------------------------------------------- > > (Updated April 30, 2014, 12:42 a.m.) > > > Review request for samza. > > > Repository: samza-hello-samza > > > Description > ------- > > a filereader system which reads specified file contents onto stream. > > 1. FileReaderSystemAdmin implements SystemAdmin and is similar to > SinglePartitionWithoutOffsetsSystemAdmin except it updates offset. > 2. FileReaderSystemFactory implements SystemFactory. It throws a Samza > exception for getProducer because this system does not suppose to write to > files. > 3. FileReaderSystemConsumer gets file path from stream names, creates threads > for each file and then read files line-by-line. It skips certain lines when > its startingOffset is not zero, meaning it has checkpoint. > > > Diffs > ----- > > > samza-wikipedia/src/main/java/samza/examples/filereader/system/FileReaderSystemAdmin.java > PRE-CREATION > > samza-wikipedia/src/main/java/samza/examples/filereader/system/FileReaderSystemConsumer.java > PRE-CREATION > > samza-wikipedia/src/main/java/samza/examples/filereader/system/FileReaderSystemFactory.java > PRE-CREATION > > Diff: https://reviews.apache.org/r/20869/diff/ > > > Testing > ------- > > > Thanks, > > Yan Fang > >
