Re: Review Request 35445: SAMZA-693: Very basic HDFS Producer service for Samza
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35445/ --- (Updated Aug. 6, 2015, 11:23 p.m.) Review request for samza. Repository: samza Description --- SAMZA-693: Very basic HDFS Producer service for Samza Diffs (updated) - build.gradle 0852adc docs/learn/documentation/versioned/hdfs/producer.md PRE-CREATION docs/learn/documentation/versioned/index.html 85209bd docs/learn/documentation/versioned/jobs/configuration-table.html 926fd50 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemAdmin.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemFactory.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducerMetrics.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/BinarySequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/Bucketer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/HdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/SequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-text.properties PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-batch-job.properties PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-job-text.properties PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-job.properties PRE-CREATION samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala PRE-CREATION settings.gradle 19bff97 Diff: https://reviews.apache.org/r/35445/diff/ Testing --- Updated: See JIRA SAMZA-693 for details, this latest update (693-6) should be good to go. Thanks again for the execellent reviews and prompt responses, your time is appreciated! Passes 'gradle clean test'. Thanks, Eli Reisman
Re: Review Request 35445: SAMZA-693: Very basic HDFS Producer service for Samza
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35445/ --- (Updated July 31, 2015, 8:40 p.m.) Review request for samza. Repository: samza Description --- SAMZA-693: Very basic HDFS Producer service for Samza Diffs - build.gradle 0852adc docs/learn/documentation/versioned/hdfs/producer.md PRE-CREATION docs/learn/documentation/versioned/index.html 85209bd docs/learn/documentation/versioned/jobs/configuration-table.html ea73b40 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemAdmin.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemFactory.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducerMetrics.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/BinarySequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/Bucketer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/HdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/SequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-text.properties PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-batch-job.properties PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-job-text.properties PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-job.properties PRE-CREATION samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala PRE-CREATION settings.gradle 19bff97 Diff: https://reviews.apache.org/r/35445/diff/ Testing (updated) --- Updated: See JIRA SAMZA-693 for details, this latest update (693-6) should be good to go. Thanks again for the execellent reviews and prompt responses, your time is appreciated! Passes 'gradle clean test'. Thanks, Eli Reisman
Re: Review Request 35445: SAMZA-693: Very basic HDFS Producer service for Samza
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35445/ --- (Updated July 31, 2015, 7:53 p.m.) Review request for samza. Repository: samza Description --- SAMZA-693: Very basic HDFS Producer service for Samza Diffs (updated) - build.gradle 0852adc docs/learn/documentation/versioned/hdfs/producer.md PRE-CREATION docs/learn/documentation/versioned/index.html 85209bd docs/learn/documentation/versioned/jobs/configuration-table.html ea73b40 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemAdmin.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemFactory.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducerMetrics.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/BinarySequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/Bucketer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/HdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/SequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-text.properties PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-batch-job.properties PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-job-text.properties PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-job.properties PRE-CREATION samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala PRE-CREATION settings.gradle 19bff97 Diff: https://reviews.apache.org/r/35445/diff/ Testing --- Updated: See JIRA SAMZA-693 for details, this latest update (693-4) addresses post-review issues and adds more pluggable design, several default writer implementations, and more (and more thorough) unit tests. Passes 'gradle clean test'. Thanks, Eli Reisman
Re: Review Request 35445: SAMZA-693: Very basic HDFS Producer service for Samza
On July 29, 2015, 7:41 p.m., Navina Ramesh wrote: docs/learn/documentation/versioned/hdfs/producer.md, line 24 https://reviews.apache.org/r/35445/diff/4/?file=1023362#file1023362line24 Can you please update the list of available configuration for this system at http://samza.apache.org/learn/documentation/latest/jobs/configuration-table.html as well? Like we did with elastic search, adding an example job to hello-samza will be very useful for adoption. If not in this JIRA, please consider opening a follow-up JIRA to add this example. Yeah I can do that. I think adding an example to hello-samza is a great idea, but I agree I would make that a separate ticket and then I'd claim that ticket. On July 29, 2015, 7:41 p.m., Navina Ramesh wrote: samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala, line 37 https://reviews.apache.org/r/35445/diff/4/?file=1023373#file1023373line37 What is the motivation for having 2 sequenceFileHdfsWriters? Is the only difference in the data type being written to the fs? The difference is not clear from the class's javadoc or the example configuration provided. The reason I ask is we already have a config overload in Samza. I would rather not have the user configure a writer type, if we can safely default to a single type and only override the relevant diverging properties. Functionally, I believe they are doing the same thing. I have worked on another Apache project where introducing Writable type params all over the place has painted us into some corners over time, so I wanted to try it this way first. I could definitely rework the patch so that the current base class SequenceFileHdfsWriter takes [K, V] type params and handles data types that way if you like? - Eli --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35445/#review93501 --- On July 28, 2015, 5:25 a.m., Eli Reisman wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35445/ --- (Updated July 28, 2015, 5:25 a.m.) Review request for samza. Repository: samza Description --- SAMZA-693: Very basic HDFS Producer service for Samza Diffs - build.gradle 0852adc docs/learn/documentation/versioned/hdfs/producer.md PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemAdmin.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemFactory.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducerMetrics.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/BinarySequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/Bucketer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/HdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/SequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-text.properties PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-batch-job.properties PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-job-text.properties PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-job.properties PRE-CREATION samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala PRE-CREATION settings.gradle 19bff97 Diff: https://reviews.apache.org/r/35445/diff/ Testing --- Updated: See JIRA SAMZA-693 for details, this latest update (693-4) addresses post-review issues and adds more pluggable design, several default writer implementations, and more (and more thorough) unit tests. Passes 'gradle clean test'. Thanks, Eli Reisman
Re: Review Request 35445: SAMZA-693: Very basic HDFS Producer service for Samza
On July 29, 2015, 7:41 p.m., Navina Ramesh wrote: samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala, line 37 https://reviews.apache.org/r/35445/diff/4/?file=1023373#file1023373line37 What is the motivation for having 2 sequenceFileHdfsWriters? Is the only difference in the data type being written to the fs? The difference is not clear from the class's javadoc or the example configuration provided. The reason I ask is we already have a config overload in Samza. I would rather not have the user configure a writer type, if we can safely default to a single type and only override the relevant diverging properties. Functionally, I believe they are doing the same thing. Eli Reisman wrote: I have worked on another Apache project where introducing Writable type params all over the place has painted us into some corners over time, so I wanted to try it this way first. I could definitely rework the patch so that the current base class SequenceFileHdfsWriter takes [K, V] type params and handles data types that way if you like? Oh and thanks once again for the prompt review, you folks are on top of it! - Eli --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35445/#review93501 --- On July 28, 2015, 5:25 a.m., Eli Reisman wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35445/ --- (Updated July 28, 2015, 5:25 a.m.) Review request for samza. Repository: samza Description --- SAMZA-693: Very basic HDFS Producer service for Samza Diffs - build.gradle 0852adc docs/learn/documentation/versioned/hdfs/producer.md PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemAdmin.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemFactory.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducerMetrics.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/BinarySequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/Bucketer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/HdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/SequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-text.properties PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-batch-job.properties PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-job-text.properties PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-job.properties PRE-CREATION samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala PRE-CREATION settings.gradle 19bff97 Diff: https://reviews.apache.org/r/35445/diff/ Testing --- Updated: See JIRA SAMZA-693 for details, this latest update (693-4) addresses post-review issues and adds more pluggable design, several default writer implementations, and more (and more thorough) unit tests. Passes 'gradle clean test'. Thanks, Eli Reisman
Re: Review Request 35445: SAMZA-693: Very basic HDFS Producer service for Samza
On July 30, 2015, 6:59 p.m., Yan Fang wrote: samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala, line 37 https://reviews.apache.org/r/35445/diff/4/?file=1023373#file1023373line37 I would prefer the param idea because 1) Samza is already using this fashion 2) less code especially when there are more SequenceFileHdfsWriter come out (LongWritable, etc) like the casting of the outgoing message to something not-Writable like Array[Byte] or String might require a third param and it might start to get awkward -- We can always cast the outgoing msg to Array[Byte] using the serde defined for this msg. So as long as the Wriable accepts Array[Byte], this should be fine. Also there are some Writable types that would not allow us to determine message size for batching purposes the way -- I think we can either give it a default size (this can be configurable) when there is not getLength method or use a subclass. Either way will be fine. I definitely agree on the less code point, and I think we can move functions like the compression selection to the base class. But, I don't think we can't just cast to Array[Byte] for all the Writable types to accept the message, even from the serde. Only Text and BytesWritable will accept Array[Byte] messages, so we will be limited to just those two types forever if we are only using that cast on the outgoing message before wrapping it in the Writable. If that works (i.e. messages will never be FloatWritable, LongWritable etc.) then generics will work there. But the getLength issue still presents a problem. We already have a configuration to set a batch size default or user-defined one, but getLength is called per-message-write, and it's how we track how big the current file is. We won't know when to split or when we hit that configured size without tracking it. Each Writable will need slightly different logic to pick up or estimate message size, they don't all supply a getLength call for byte size. So again that seems to force us to only work with BytesWritable and Text value types? If I'm completely missing something here please let me know and we can make the desire changes. Thanks for the input! - Eli --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35445/#review93614 --- On July 28, 2015, 5:25 a.m., Eli Reisman wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35445/ --- (Updated July 28, 2015, 5:25 a.m.) Review request for samza. Repository: samza Description --- SAMZA-693: Very basic HDFS Producer service for Samza Diffs - build.gradle 0852adc docs/learn/documentation/versioned/hdfs/producer.md PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemAdmin.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemFactory.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducerMetrics.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/BinarySequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/Bucketer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/HdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/SequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-text.properties PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-batch-job.properties PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-job-text.properties PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-job.properties PRE-CREATION samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala PRE-CREATION settings.gradle 19bff97 Diff: https://reviews.apache.org/r/35445/diff/ Testing --- Updated: See JIRA SAMZA-693 for details, this latest update (693-4) addresses post-review issues and adds more pluggable design, several default writer
Re: Review Request 35445: SAMZA-693: Very basic HDFS Producer service for Samza
On July 30, 2015, 6:59 p.m., Yan Fang wrote: samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala, line 40 https://reviews.apache.org/r/35445/diff/4/?file=1023371#file1023371line40 My overall concern here is that, if there are more than one tasks are running, is it possible that all the tasks are writing to one file at the same time? I don't think so, each registered source should be using it's own HdfsWriter in write() calls even on the same Producer and the filenames per writer are unique-ified in the writer impl. There are other ways to accomplish that uniqueness though. - Eli --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35445/#review93614 --- On July 28, 2015, 5:25 a.m., Eli Reisman wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35445/ --- (Updated July 28, 2015, 5:25 a.m.) Review request for samza. Repository: samza Description --- SAMZA-693: Very basic HDFS Producer service for Samza Diffs - build.gradle 0852adc docs/learn/documentation/versioned/hdfs/producer.md PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemAdmin.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemFactory.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducerMetrics.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/BinarySequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/Bucketer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/HdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/SequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-text.properties PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-batch-job.properties PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-job-text.properties PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-job.properties PRE-CREATION samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala PRE-CREATION settings.gradle 19bff97 Diff: https://reviews.apache.org/r/35445/diff/ Testing --- Updated: See JIRA SAMZA-693 for details, this latest update (693-4) addresses post-review issues and adds more pluggable design, several default writer implementations, and more (and more thorough) unit tests. Passes 'gradle clean test'. Thanks, Eli Reisman
Re: Review Request 35445: SAMZA-693: Very basic HDFS Producer service for Samza
On July 30, 2015, 6:59 p.m., Yan Fang wrote: samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala, line 55 https://reviews.apache.org/r/35445/diff/4/?file=1023371#file1023371line55 this will very possibly never be executed, because line 53 already makes currentDateTime == dateTime. Some version of that in our internal patch works but maybe I broke it here I'll take a look, thanks - Eli --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35445/#review93614 --- On July 28, 2015, 5:25 a.m., Eli Reisman wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35445/ --- (Updated July 28, 2015, 5:25 a.m.) Review request for samza. Repository: samza Description --- SAMZA-693: Very basic HDFS Producer service for Samza Diffs - build.gradle 0852adc docs/learn/documentation/versioned/hdfs/producer.md PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemAdmin.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemFactory.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducerMetrics.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/BinarySequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/Bucketer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/HdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/SequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-text.properties PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-batch-job.properties PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-job-text.properties PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-job.properties PRE-CREATION samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala PRE-CREATION settings.gradle 19bff97 Diff: https://reviews.apache.org/r/35445/diff/ Testing --- Updated: See JIRA SAMZA-693 for details, this latest update (693-4) addresses post-review issues and adds more pluggable design, several default writer implementations, and more (and more thorough) unit tests. Passes 'gradle clean test'. Thanks, Eli Reisman
Re: Review Request 35445: SAMZA-693: Very basic HDFS Producer service for Samza
On July 30, 2015, 6:59 p.m., Yan Fang wrote: samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala, line 40 https://reviews.apache.org/r/35445/diff/4/?file=1023371#file1023371line40 My overall concern here is that, if there are more than one tasks are running, is it possible that all the tasks are writing to one file at the same time? Eli Reisman wrote: I don't think so, each registered source should be using it's own HdfsWriter in write() calls even on the same Producer and the filenames per writer are unique-ified in the writer impl. There are other ways to accomplish that uniqueness though. I see. We are using the UUID.randomUUID to make sure the writers writes to different files. This is fine unless we win the lottery. :) On July 30, 2015, 6:59 p.m., Yan Fang wrote: samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala, line 37 https://reviews.apache.org/r/35445/diff/4/?file=1023373#file1023373line37 I would prefer the param idea because 1) Samza is already using this fashion 2) less code especially when there are more SequenceFileHdfsWriter come out (LongWritable, etc) like the casting of the outgoing message to something not-Writable like Array[Byte] or String might require a third param and it might start to get awkward -- We can always cast the outgoing msg to Array[Byte] using the serde defined for this msg. So as long as the Wriable accepts Array[Byte], this should be fine. Also there are some Writable types that would not allow us to determine message size for batching purposes the way -- I think we can either give it a default size (this can be configurable) when there is not getLength method or use a subclass. Either way will be fine. Eli Reisman wrote: I definitely agree on the less code point, and I think we can move functions like the compression selection to the base class. But, I don't think we can't just cast to Array[Byte] for all the Writable types to accept the message, even from the serde. Only Text and BytesWritable will accept Array[Byte] messages, so we will be limited to just those two types forever if we are only using that cast on the outgoing message before wrapping it in the Writable. If that works (i.e. messages will never be FloatWritable, LongWritable etc.) then generics will work there. But the getLength issue still presents a problem. We already have a configuration to set a batch size default or user-defined one, but getLength is called per-message-write, and it's how we track how big the current file is. We won't know when to split or when we hit that configured size without tracking it. Each Writable will need slightly different logic to pick up or estimate message size, they don't all supply a getLength call for byte size. So again that seems to force us to only work with BytesWritable and Text value types? If I'm completely missing something here please let me know and we can make the desire changes. Thanks for the input! I see. Considering those facts, IMO, moving the common code, such as getCompression, getBukets, to their parent class is sufficient. BTW, def getBucketer(systemName: String, config: HdfsConfig) = { new JobNameDateTimeBucketer(systemName, config) } Should it be Bucketer.getInstance(systemName, config) ? If we move it to the parent class, then just bucketer, I think. - Yan --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35445/#review93614 --- On July 28, 2015, 5:25 a.m., Eli Reisman wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35445/ --- (Updated July 28, 2015, 5:25 a.m.) Review request for samza. Repository: samza Description --- SAMZA-693: Very basic HDFS Producer service for Samza Diffs - build.gradle 0852adc docs/learn/documentation/versioned/hdfs/producer.md PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemAdmin.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemFactory.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducerMetrics.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/BinarySequenceFileHdfsWriter.scala PRE-CREATION
Re: Review Request 35445: SAMZA-693: Very basic HDFS Producer service for Samza
On July 29, 2015, 7:41 p.m., Navina Ramesh wrote: samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala, line 37 https://reviews.apache.org/r/35445/diff/4/?file=1023373#file1023373line37 What is the motivation for having 2 sequenceFileHdfsWriters? Is the only difference in the data type being written to the fs? The difference is not clear from the class's javadoc or the example configuration provided. The reason I ask is we already have a config overload in Samza. I would rather not have the user configure a writer type, if we can safely default to a single type and only override the relevant diverging properties. Functionally, I believe they are doing the same thing. Eli Reisman wrote: I have worked on another Apache project where introducing Writable type params all over the place has painted us into some corners over time, so I wanted to try it this way first. I could definitely rework the patch so that the current base class SequenceFileHdfsWriter takes [K, V] type params and handles data types that way if you like? Eli Reisman wrote: Oh and thanks once again for the prompt review, you folks are on top of it! I am liking the type param idea more as I think about it, despite past experience, but I am concerned that things like the casting of the outgoing message to something not-Writable like Array[Byte] or String might require a third param and it might start to get awkward. Also there are some Writable types that would not allow us to determine message size for batching purposes the way Text and BytesWritable do and could not be asbtracted this way, so we might end up with subclasses still (but smaller interfaces to implement) in that case. Any thoughts or guidance is welcome here, I'm happy enough with it at this point to make whatever of these changes sounds good to you and the committers. - Eli --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35445/#review93501 --- On July 28, 2015, 5:25 a.m., Eli Reisman wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35445/ --- (Updated July 28, 2015, 5:25 a.m.) Review request for samza. Repository: samza Description --- SAMZA-693: Very basic HDFS Producer service for Samza Diffs - build.gradle 0852adc docs/learn/documentation/versioned/hdfs/producer.md PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemAdmin.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemFactory.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducerMetrics.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/BinarySequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/Bucketer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/HdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/SequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-text.properties PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-batch-job.properties PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-job-text.properties PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-job.properties PRE-CREATION samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala PRE-CREATION settings.gradle 19bff97 Diff: https://reviews.apache.org/r/35445/diff/ Testing --- Updated: See JIRA SAMZA-693 for details, this latest update (693-4) addresses post-review issues and adds more pluggable design, several default writer implementations, and more (and more thorough) unit tests. Passes 'gradle clean test'. Thanks, Eli Reisman
Re: Review Request 35445: SAMZA-693: Very basic HDFS Producer service for Samza
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35445/#review93614 --- docs/learn/documentation/versioned/hdfs/producer.md (line 33) https://reviews.apache.org/r/35445/#comment148011 com.etsy - org.apache. docs/learn/documentation/versioned/hdfs/producer.md (line 65) https://reviews.apache.org/r/35445/#comment148012 you also need to add a link in the document/index.html to make it reachable. samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/BinarySequenceFileHdfsWriter.scala (line 38) https://reviews.apache.org/r/35445/#comment148015 can we use the camel case with an initial lower case character for the varaible? Just to obey the same coding guide http://samza.apache.org/contribute/coding-guide.html Thank you. samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/Bucketer.scala (line 38) https://reviews.apache.org/r/35445/#comment148016 not necessary samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala (line 40) https://reviews.apache.org/r/35445/#comment148018 My overall concern here is that, if there are more than one tasks are running, is it possible that all the tasks are writing to one file at the same time? samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala (line 55) https://reviews.apache.org/r/35445/#comment148017 this will very possibly never be executed, because line 53 already makes currentDateTime == dateTime. samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala (line 37) https://reviews.apache.org/r/35445/#comment148024 I would prefer the param idea because 1) Samza is already using this fashion 2) less code especially when there are more SequenceFileHdfsWriter come out (LongWritable, etc) like the casting of the outgoing message to something not-Writable like Array[Byte] or String might require a third param and it might start to get awkward -- We can always cast the outgoing msg to Array[Byte] using the serde defined for this msg. So as long as the Wriable accepts Array[Byte], this should be fine. Also there are some Writable types that would not allow us to determine message size for batching purposes the way -- I think we can either give it a default size (this can be configurable) when there is not getLength method or use a subclass. Either way will be fine. samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala (line 106) https://reviews.apache.org/r/35445/#comment148019 do we need to bring up the MiniCluster in every test? Is it be better if we just bring them up in the beforeClass and then shutdown it in afterClass? - Yan Fang On July 28, 2015, 5:25 a.m., Eli Reisman wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35445/ --- (Updated July 28, 2015, 5:25 a.m.) Review request for samza. Repository: samza Description --- SAMZA-693: Very basic HDFS Producer service for Samza Diffs - build.gradle 0852adc docs/learn/documentation/versioned/hdfs/producer.md PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemAdmin.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemFactory.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducerMetrics.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/BinarySequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/Bucketer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/HdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/SequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-text.properties PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-batch-job.properties PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-job-text.properties PRE-CREATION
Re: Review Request 35445: SAMZA-693: Very basic HDFS Producer service for Samza
On July 29, 2015, 7:41 p.m., Navina Ramesh wrote: docs/learn/documentation/versioned/hdfs/producer.md, line 24 https://reviews.apache.org/r/35445/diff/4/?file=1023362#file1023362line24 Can you please update the list of available configuration for this system at http://samza.apache.org/learn/documentation/latest/jobs/configuration-table.html as well? Like we did with elastic search, adding an example job to hello-samza will be very useful for adoption. If not in this JIRA, please consider opening a follow-up JIRA to add this example. Eli Reisman wrote: Yeah I can do that. I think adding an example to hello-samza is a great idea, but I agree I would make that a separate ticket and then I'd claim that ticket. Eli Reisman wrote: I'd also be interested in taking some follow up tickets for more output formats than just seq files once this is out, and possibly the reader implemention too although we'd need some up front discussion on the JIRA ticket about what that looks like in a long-lived Samza job that needs to read forever That will be great ! :) - Yan --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35445/#review93501 --- On July 28, 2015, 5:25 a.m., Eli Reisman wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35445/ --- (Updated July 28, 2015, 5:25 a.m.) Review request for samza. Repository: samza Description --- SAMZA-693: Very basic HDFS Producer service for Samza Diffs - build.gradle 0852adc docs/learn/documentation/versioned/hdfs/producer.md PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemAdmin.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemFactory.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducerMetrics.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/BinarySequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/Bucketer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/HdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/SequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-text.properties PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-batch-job.properties PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-job-text.properties PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-job.properties PRE-CREATION samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala PRE-CREATION settings.gradle 19bff97 Diff: https://reviews.apache.org/r/35445/diff/ Testing --- Updated: See JIRA SAMZA-693 for details, this latest update (693-4) addresses post-review issues and adds more pluggable design, several default writer implementations, and more (and more thorough) unit tests. Passes 'gradle clean test'. Thanks, Eli Reisman
Re: Review Request 35445: SAMZA-693: Very basic HDFS Producer service for Samza
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35445/ --- (Updated July 26, 2015, 10:47 p.m.) Review request for samza. Repository: samza Description --- SAMZA-693: Very basic HDFS Producer service for Samza Diffs - build.gradle 0852adc samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemAdmin.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemFactory.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducerMetrics.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/BinarySequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/Bucketer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/HdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/SequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-text.properties PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-batch-job.properties PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-job-text.properties PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-job.properties PRE-CREATION samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala PRE-CREATION settings.gradle 19bff97 Diff: https://reviews.apache.org/r/35445/diff/ Testing (updated) --- Updated: See JIRA SAMZA-693 for details, this latest update (693-4) addresses post-review issues and adds more pluggable design, several default writer implementations, and more (and more thorough) unit tests. Passes 'gradle clean test'. Thanks, Eli Reisman
Re: Review Request 35445: SAMZA-693: Very basic HDFS Producer service for Samza
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35445/ --- (Updated July 26, 2015, 10:31 p.m.) Review request for samza. Repository: samza Description --- SAMZA-693: Very basic HDFS Producer service for Samza Diffs (updated) - build.gradle 0852adc samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemAdmin.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemFactory.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducerMetrics.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/BinarySequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/Bucketer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/HdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/SequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-text.properties PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-batch-job.properties PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-job-text.properties PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-job.properties PRE-CREATION samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala PRE-CREATION settings.gradle 19bff97 Diff: https://reviews.apache.org/r/35445/diff/ Testing --- New unit test, but it's fairly rudimentary. Passes ./gradlew test and ./gradlew check This only supplies an HDFS Producer, and this producer only writes SequenceFiles of ByteWriteables so far. If the patch were accepted as-is, I'd suggest future tickets for a matching HDFS Consumer, and a pluggable set of output formats, configurable via HdfsConfig settings. On the upside, this patch has been tested on a real cluster with real data, using several serdes, with good results. Thanks, Eli Reisman
Re: Review Request 35445: SAMZA-693: Very basic HDFS Producer service for Samza
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35445/ --- (Updated July 20, 2015, 3:29 p.m.) Review request for samza. Repository: samza Description --- SAMZA-693: Very basic HDFS Producer service for Samza Diffs (updated) - build.gradle 0852adc samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemAdmin.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemFactory.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducer.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducerMetrics.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/BinarySequenceFileHdfsWriter.scala PRE-CREATION samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/HdfsWriter.scala PRE-CREATION samza-hdfs/src/test/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala PRE-CREATION samza-hdfs/src/test/resources/samza-hdfs-test-job.properties PRE-CREATION settings.gradle 19bff97 Diff: https://reviews.apache.org/r/35445/diff/ Testing --- New unit test, but it's fairly rudimentary. Passes ./gradlew test and ./gradlew check This only supplies an HDFS Producer, and this producer only writes SequenceFiles of ByteWriteables so far. If the patch were accepted as-is, I'd suggest future tickets for a matching HDFS Consumer, and a pluggable set of output formats, configurable via HdfsConfig settings. On the upside, this patch has been tested on a real cluster with real data, using several serdes, with good results. Thanks, Eli Reisman