Re: Review Request 35445: SAMZA-693: Very basic HDFS Producer service for Samza

2015-08-06 Thread Eli Reisman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35445/
---

(Updated Aug. 6, 2015, 11:23 p.m.)


Review request for samza.


Repository: samza


Description
---

SAMZA-693: Very basic HDFS Producer service for Samza


Diffs (updated)
-

  build.gradle 0852adc 
  docs/learn/documentation/versioned/hdfs/producer.md PRE-CREATION 
  docs/learn/documentation/versioned/index.html 85209bd 
  docs/learn/documentation/versioned/jobs/configuration-table.html 926fd50 
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala 
PRE-CREATION 
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemAdmin.scala 
PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemFactory.scala 
PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducer.scala 
PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducerMetrics.scala
 PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/BinarySequenceFileHdfsWriter.scala
 PRE-CREATION 
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/Bucketer.scala 
PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/HdfsWriter.scala 
PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala
 PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/SequenceFileHdfsWriter.scala
 PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala
 PRE-CREATION 
  samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-text.properties 
PRE-CREATION 
  samza-hdfs/src/test/resources/samza-hdfs-test-batch-job.properties 
PRE-CREATION 
  samza-hdfs/src/test/resources/samza-hdfs-test-job-text.properties 
PRE-CREATION 
  samza-hdfs/src/test/resources/samza-hdfs-test-job.properties PRE-CREATION 
  
samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala
 PRE-CREATION 
  settings.gradle 19bff97 

Diff: https://reviews.apache.org/r/35445/diff/


Testing
---

Updated: See JIRA SAMZA-693 for details, this latest update (693-6) should be 
good to go. Thanks again for the execellent reviews and prompt responses, your 
time is appreciated!

Passes 'gradle clean test'.


Thanks,

Eli Reisman



Re: Review Request 35445: SAMZA-693: Very basic HDFS Producer service for Samza

2015-07-31 Thread Eli Reisman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35445/
---

(Updated July 31, 2015, 8:40 p.m.)


Review request for samza.


Repository: samza


Description
---

SAMZA-693: Very basic HDFS Producer service for Samza


Diffs
-

  build.gradle 0852adc 
  docs/learn/documentation/versioned/hdfs/producer.md PRE-CREATION 
  docs/learn/documentation/versioned/index.html 85209bd 
  docs/learn/documentation/versioned/jobs/configuration-table.html ea73b40 
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala 
PRE-CREATION 
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemAdmin.scala 
PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemFactory.scala 
PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducer.scala 
PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducerMetrics.scala
 PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/BinarySequenceFileHdfsWriter.scala
 PRE-CREATION 
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/Bucketer.scala 
PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/HdfsWriter.scala 
PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala
 PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/SequenceFileHdfsWriter.scala
 PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala
 PRE-CREATION 
  samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-text.properties 
PRE-CREATION 
  samza-hdfs/src/test/resources/samza-hdfs-test-batch-job.properties 
PRE-CREATION 
  samza-hdfs/src/test/resources/samza-hdfs-test-job-text.properties 
PRE-CREATION 
  samza-hdfs/src/test/resources/samza-hdfs-test-job.properties PRE-CREATION 
  
samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala
 PRE-CREATION 
  settings.gradle 19bff97 

Diff: https://reviews.apache.org/r/35445/diff/


Testing (updated)
---

Updated: See JIRA SAMZA-693 for details, this latest update (693-6) should be 
good to go. Thanks again for the execellent reviews and prompt responses, your 
time is appreciated!

Passes 'gradle clean test'.


Thanks,

Eli Reisman



Re: Review Request 35445: SAMZA-693: Very basic HDFS Producer service for Samza

2015-07-31 Thread Eli Reisman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35445/
---

(Updated July 31, 2015, 7:53 p.m.)


Review request for samza.


Repository: samza


Description
---

SAMZA-693: Very basic HDFS Producer service for Samza


Diffs (updated)
-

  build.gradle 0852adc 
  docs/learn/documentation/versioned/hdfs/producer.md PRE-CREATION 
  docs/learn/documentation/versioned/index.html 85209bd 
  docs/learn/documentation/versioned/jobs/configuration-table.html ea73b40 
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala 
PRE-CREATION 
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemAdmin.scala 
PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemFactory.scala 
PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducer.scala 
PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducerMetrics.scala
 PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/BinarySequenceFileHdfsWriter.scala
 PRE-CREATION 
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/Bucketer.scala 
PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/HdfsWriter.scala 
PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala
 PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/SequenceFileHdfsWriter.scala
 PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala
 PRE-CREATION 
  samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-text.properties 
PRE-CREATION 
  samza-hdfs/src/test/resources/samza-hdfs-test-batch-job.properties 
PRE-CREATION 
  samza-hdfs/src/test/resources/samza-hdfs-test-job-text.properties 
PRE-CREATION 
  samza-hdfs/src/test/resources/samza-hdfs-test-job.properties PRE-CREATION 
  
samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala
 PRE-CREATION 
  settings.gradle 19bff97 

Diff: https://reviews.apache.org/r/35445/diff/


Testing
---

Updated: See JIRA SAMZA-693 for details, this latest update (693-4) addresses 
post-review issues and adds more pluggable design, several default writer 
implementations, and more (and more thorough) unit tests.

Passes 'gradle clean test'.


Thanks,

Eli Reisman



Re: Review Request 35445: SAMZA-693: Very basic HDFS Producer service for Samza

2015-07-30 Thread Eli Reisman


 On July 29, 2015, 7:41 p.m., Navina Ramesh wrote:
  docs/learn/documentation/versioned/hdfs/producer.md, line 24
  https://reviews.apache.org/r/35445/diff/4/?file=1023362#file1023362line24
 
  Can you please update the list of available configuration for this 
  system at 
  http://samza.apache.org/learn/documentation/latest/jobs/configuration-table.html
   as well? 
  
  Like we did with elastic search, adding an example job to hello-samza 
  will be very useful for adoption. If not in this JIRA, please consider 
  opening a follow-up JIRA to add this example.

Yeah I can do that. I think adding an example to hello-samza is a great idea, 
but I agree I would make that a separate ticket and then I'd claim that ticket.


 On July 29, 2015, 7:41 p.m., Navina Ramesh wrote:
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala,
   line 37
  https://reviews.apache.org/r/35445/diff/4/?file=1023373#file1023373line37
 
  What is the motivation for having 2 sequenceFileHdfsWriters?  Is the 
  only difference in the data type being written to the fs? 
  
  The difference is not clear from the class's javadoc or the example 
  configuration provided. The reason I ask is we already have a config 
  overload in Samza. I would rather not have the user configure a writer 
  type, if we can safely default to a single type and only override the 
  relevant diverging properties. Functionally, I believe they are doing the 
  same thing.

I have worked on another Apache project where introducing Writable type params 
all over the place has painted us into some corners over time, so I wanted to 
try it this way first. I could definitely rework the patch so that the current 
base class SequenceFileHdfsWriter takes [K, V] type params and handles data 
types that way if you like?


- Eli


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35445/#review93501
---


On July 28, 2015, 5:25 a.m., Eli Reisman wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/35445/
 ---
 
 (Updated July 28, 2015, 5:25 a.m.)
 
 
 Review request for samza.
 
 
 Repository: samza
 
 
 Description
 ---
 
 SAMZA-693: Very basic HDFS Producer service for Samza
 
 
 Diffs
 -
 
   build.gradle 0852adc 
   docs/learn/documentation/versioned/hdfs/producer.md PRE-CREATION 
   samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala 
 PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemAdmin.scala 
 PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemFactory.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducer.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducerMetrics.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/BinarySequenceFileHdfsWriter.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/Bucketer.scala 
 PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/HdfsWriter.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/SequenceFileHdfsWriter.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala
  PRE-CREATION 
   samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-text.properties 
 PRE-CREATION 
   samza-hdfs/src/test/resources/samza-hdfs-test-batch-job.properties 
 PRE-CREATION 
   samza-hdfs/src/test/resources/samza-hdfs-test-job-text.properties 
 PRE-CREATION 
   samza-hdfs/src/test/resources/samza-hdfs-test-job.properties PRE-CREATION 
   
 samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala
  PRE-CREATION 
   settings.gradle 19bff97 
 
 Diff: https://reviews.apache.org/r/35445/diff/
 
 
 Testing
 ---
 
 Updated: See JIRA SAMZA-693 for details, this latest update (693-4) addresses 
 post-review issues and adds more pluggable design, several default writer 
 implementations, and more (and more thorough) unit tests.
 
 Passes 'gradle clean test'.
 
 
 Thanks,
 
 Eli Reisman
 




Re: Review Request 35445: SAMZA-693: Very basic HDFS Producer service for Samza

2015-07-30 Thread Eli Reisman


 On July 29, 2015, 7:41 p.m., Navina Ramesh wrote:
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala,
   line 37
  https://reviews.apache.org/r/35445/diff/4/?file=1023373#file1023373line37
 
  What is the motivation for having 2 sequenceFileHdfsWriters?  Is the 
  only difference in the data type being written to the fs? 
  
  The difference is not clear from the class's javadoc or the example 
  configuration provided. The reason I ask is we already have a config 
  overload in Samza. I would rather not have the user configure a writer 
  type, if we can safely default to a single type and only override the 
  relevant diverging properties. Functionally, I believe they are doing the 
  same thing.
 
 Eli Reisman wrote:
 I have worked on another Apache project where introducing Writable type 
 params all over the place has painted us into some corners over time, so I 
 wanted to try it this way first. I could definitely rework the patch so that 
 the current base class SequenceFileHdfsWriter takes [K, V] type params and 
 handles data types that way if you like?

Oh and thanks once again for the prompt review, you folks are on top of it!


- Eli


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35445/#review93501
---


On July 28, 2015, 5:25 a.m., Eli Reisman wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/35445/
 ---
 
 (Updated July 28, 2015, 5:25 a.m.)
 
 
 Review request for samza.
 
 
 Repository: samza
 
 
 Description
 ---
 
 SAMZA-693: Very basic HDFS Producer service for Samza
 
 
 Diffs
 -
 
   build.gradle 0852adc 
   docs/learn/documentation/versioned/hdfs/producer.md PRE-CREATION 
   samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala 
 PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemAdmin.scala 
 PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemFactory.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducer.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducerMetrics.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/BinarySequenceFileHdfsWriter.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/Bucketer.scala 
 PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/HdfsWriter.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/SequenceFileHdfsWriter.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala
  PRE-CREATION 
   samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-text.properties 
 PRE-CREATION 
   samza-hdfs/src/test/resources/samza-hdfs-test-batch-job.properties 
 PRE-CREATION 
   samza-hdfs/src/test/resources/samza-hdfs-test-job-text.properties 
 PRE-CREATION 
   samza-hdfs/src/test/resources/samza-hdfs-test-job.properties PRE-CREATION 
   
 samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala
  PRE-CREATION 
   settings.gradle 19bff97 
 
 Diff: https://reviews.apache.org/r/35445/diff/
 
 
 Testing
 ---
 
 Updated: See JIRA SAMZA-693 for details, this latest update (693-4) addresses 
 post-review issues and adds more pluggable design, several default writer 
 implementations, and more (and more thorough) unit tests.
 
 Passes 'gradle clean test'.
 
 
 Thanks,
 
 Eli Reisman
 




Re: Review Request 35445: SAMZA-693: Very basic HDFS Producer service for Samza

2015-07-30 Thread Eli Reisman


 On July 30, 2015, 6:59 p.m., Yan Fang wrote:
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala,
   line 37
  https://reviews.apache.org/r/35445/diff/4/?file=1023373#file1023373line37
 
  I would prefer the param idea because 1) Samza is already using this 
  fashion 2) less code especially when there are more SequenceFileHdfsWriter 
  come out (LongWritable, etc)
  
  like the casting of the outgoing message to something not-Writable 
  like Array[Byte] or String might require a third param and it might start 
  to get awkward
  
  -- We can always cast the outgoing msg to Array[Byte] using the serde 
  defined for this msg. So as long as the Wriable accepts Array[Byte], this 
  should be fine.
  
  Also there are some Writable types that would not allow us to 
  determine message size for batching purposes the way 
  
  -- I think we can either give it a default size (this can be 
  configurable) when there is not getLength method or use a subclass. Either 
  way will be fine.

I definitely agree on the less code point, and I think we can move functions 
like the compression selection to the base class.

But, I don't think we can't just cast to Array[Byte] for all the Writable types 
to accept the message, even from the serde. Only Text and BytesWritable will 
accept Array[Byte] messages, so we will be limited to just those two types 
forever if we are only using that cast on the outgoing message before wrapping 
it in the Writable. If that works (i.e. messages will never be FloatWritable, 
LongWritable etc.) then generics will work there.

But the getLength issue still presents a problem. We already have a 
configuration to set a batch size default or user-defined one, but getLength is 
called per-message-write, and it's how we track how big the current file is. We 
won't know when to split or when we hit that configured size without tracking 
it. Each Writable will need slightly different logic to pick up or estimate 
message size, they don't all supply a getLength call for byte size.

So again that seems to force us to only work with BytesWritable and Text value 
types? If I'm completely missing something here please let me know and we can 
make the desire changes. Thanks for the input!


- Eli


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35445/#review93614
---


On July 28, 2015, 5:25 a.m., Eli Reisman wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/35445/
 ---
 
 (Updated July 28, 2015, 5:25 a.m.)
 
 
 Review request for samza.
 
 
 Repository: samza
 
 
 Description
 ---
 
 SAMZA-693: Very basic HDFS Producer service for Samza
 
 
 Diffs
 -
 
   build.gradle 0852adc 
   docs/learn/documentation/versioned/hdfs/producer.md PRE-CREATION 
   samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala 
 PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemAdmin.scala 
 PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemFactory.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducer.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducerMetrics.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/BinarySequenceFileHdfsWriter.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/Bucketer.scala 
 PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/HdfsWriter.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/SequenceFileHdfsWriter.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala
  PRE-CREATION 
   samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-text.properties 
 PRE-CREATION 
   samza-hdfs/src/test/resources/samza-hdfs-test-batch-job.properties 
 PRE-CREATION 
   samza-hdfs/src/test/resources/samza-hdfs-test-job-text.properties 
 PRE-CREATION 
   samza-hdfs/src/test/resources/samza-hdfs-test-job.properties PRE-CREATION 
   
 samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala
  PRE-CREATION 
   settings.gradle 19bff97 
 
 Diff: https://reviews.apache.org/r/35445/diff/
 
 
 Testing
 ---
 
 Updated: See JIRA SAMZA-693 for details, this latest update (693-4) addresses 
 post-review issues and adds more pluggable design, several default writer 
 

Re: Review Request 35445: SAMZA-693: Very basic HDFS Producer service for Samza

2015-07-30 Thread Eli Reisman


 On July 30, 2015, 6:59 p.m., Yan Fang wrote:
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala,
   line 40
  https://reviews.apache.org/r/35445/diff/4/?file=1023371#file1023371line40
 
  My overall concern here is that, if there are more than one tasks are 
  running, is it possible that all the tasks are writing to one file at the 
  same time?

I don't think so, each registered source should be using it's own HdfsWriter in 
write() calls even on the same Producer and the filenames per writer are 
unique-ified in the writer impl. There are other ways to accomplish that 
uniqueness though.


- Eli


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35445/#review93614
---


On July 28, 2015, 5:25 a.m., Eli Reisman wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/35445/
 ---
 
 (Updated July 28, 2015, 5:25 a.m.)
 
 
 Review request for samza.
 
 
 Repository: samza
 
 
 Description
 ---
 
 SAMZA-693: Very basic HDFS Producer service for Samza
 
 
 Diffs
 -
 
   build.gradle 0852adc 
   docs/learn/documentation/versioned/hdfs/producer.md PRE-CREATION 
   samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala 
 PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemAdmin.scala 
 PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemFactory.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducer.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducerMetrics.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/BinarySequenceFileHdfsWriter.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/Bucketer.scala 
 PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/HdfsWriter.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/SequenceFileHdfsWriter.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala
  PRE-CREATION 
   samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-text.properties 
 PRE-CREATION 
   samza-hdfs/src/test/resources/samza-hdfs-test-batch-job.properties 
 PRE-CREATION 
   samza-hdfs/src/test/resources/samza-hdfs-test-job-text.properties 
 PRE-CREATION 
   samza-hdfs/src/test/resources/samza-hdfs-test-job.properties PRE-CREATION 
   
 samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala
  PRE-CREATION 
   settings.gradle 19bff97 
 
 Diff: https://reviews.apache.org/r/35445/diff/
 
 
 Testing
 ---
 
 Updated: See JIRA SAMZA-693 for details, this latest update (693-4) addresses 
 post-review issues and adds more pluggable design, several default writer 
 implementations, and more (and more thorough) unit tests.
 
 Passes 'gradle clean test'.
 
 
 Thanks,
 
 Eli Reisman
 




Re: Review Request 35445: SAMZA-693: Very basic HDFS Producer service for Samza

2015-07-30 Thread Eli Reisman


 On July 30, 2015, 6:59 p.m., Yan Fang wrote:
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala,
   line 55
  https://reviews.apache.org/r/35445/diff/4/?file=1023371#file1023371line55
 
  this will very possibly never be executed, because line 53 already 
  makes currentDateTime == dateTime.

Some version of that in our internal patch works but maybe I broke it here I'll 
take a look, thanks


- Eli


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35445/#review93614
---


On July 28, 2015, 5:25 a.m., Eli Reisman wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/35445/
 ---
 
 (Updated July 28, 2015, 5:25 a.m.)
 
 
 Review request for samza.
 
 
 Repository: samza
 
 
 Description
 ---
 
 SAMZA-693: Very basic HDFS Producer service for Samza
 
 
 Diffs
 -
 
   build.gradle 0852adc 
   docs/learn/documentation/versioned/hdfs/producer.md PRE-CREATION 
   samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala 
 PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemAdmin.scala 
 PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemFactory.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducer.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducerMetrics.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/BinarySequenceFileHdfsWriter.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/Bucketer.scala 
 PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/HdfsWriter.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/SequenceFileHdfsWriter.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala
  PRE-CREATION 
   samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-text.properties 
 PRE-CREATION 
   samza-hdfs/src/test/resources/samza-hdfs-test-batch-job.properties 
 PRE-CREATION 
   samza-hdfs/src/test/resources/samza-hdfs-test-job-text.properties 
 PRE-CREATION 
   samza-hdfs/src/test/resources/samza-hdfs-test-job.properties PRE-CREATION 
   
 samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala
  PRE-CREATION 
   settings.gradle 19bff97 
 
 Diff: https://reviews.apache.org/r/35445/diff/
 
 
 Testing
 ---
 
 Updated: See JIRA SAMZA-693 for details, this latest update (693-4) addresses 
 post-review issues and adds more pluggable design, several default writer 
 implementations, and more (and more thorough) unit tests.
 
 Passes 'gradle clean test'.
 
 
 Thanks,
 
 Eli Reisman
 




Re: Review Request 35445: SAMZA-693: Very basic HDFS Producer service for Samza

2015-07-30 Thread Yan Fang


 On July 30, 2015, 6:59 p.m., Yan Fang wrote:
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala,
   line 40
  https://reviews.apache.org/r/35445/diff/4/?file=1023371#file1023371line40
 
  My overall concern here is that, if there are more than one tasks are 
  running, is it possible that all the tasks are writing to one file at the 
  same time?
 
 Eli Reisman wrote:
 I don't think so, each registered source should be using it's own 
 HdfsWriter in write() calls even on the same Producer and the filenames per 
 writer are unique-ified in the writer impl. There are other ways to 
 accomplish that uniqueness though.

I see. We are using the UUID.randomUUID to make sure the writers writes to 
different files. This is fine unless we win the lottery. :)


 On July 30, 2015, 6:59 p.m., Yan Fang wrote:
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala,
   line 37
  https://reviews.apache.org/r/35445/diff/4/?file=1023373#file1023373line37
 
  I would prefer the param idea because 1) Samza is already using this 
  fashion 2) less code especially when there are more SequenceFileHdfsWriter 
  come out (LongWritable, etc)
  
  like the casting of the outgoing message to something not-Writable 
  like Array[Byte] or String might require a third param and it might start 
  to get awkward
  
  -- We can always cast the outgoing msg to Array[Byte] using the serde 
  defined for this msg. So as long as the Wriable accepts Array[Byte], this 
  should be fine.
  
  Also there are some Writable types that would not allow us to 
  determine message size for batching purposes the way 
  
  -- I think we can either give it a default size (this can be 
  configurable) when there is not getLength method or use a subclass. Either 
  way will be fine.
 
 Eli Reisman wrote:
 I definitely agree on the less code point, and I think we can move 
 functions like the compression selection to the base class.
 
 But, I don't think we can't just cast to Array[Byte] for all the Writable 
 types to accept the message, even from the serde. Only Text and BytesWritable 
 will accept Array[Byte] messages, so we will be limited to just those two 
 types forever if we are only using that cast on the outgoing message before 
 wrapping it in the Writable. If that works (i.e. messages will never be 
 FloatWritable, LongWritable etc.) then generics will work there.
 
 But the getLength issue still presents a problem. We already have a 
 configuration to set a batch size default or user-defined one, but getLength 
 is called per-message-write, and it's how we track how big the current file 
 is. We won't know when to split or when we hit that configured size without 
 tracking it. Each Writable will need slightly different logic to pick up or 
 estimate message size, they don't all supply a getLength call for byte size.
 
 So again that seems to force us to only work with BytesWritable and Text 
 value types? If I'm completely missing something here please let me know and 
 we can make the desire changes. Thanks for the input!

I see. Considering those facts, IMO, moving the common code, such as 
getCompression, getBukets, to their parent class is sufficient.

BTW, def getBucketer(systemName: String, config: HdfsConfig) = {
new JobNameDateTimeBucketer(systemName, config)
  }
  
Should it be Bucketer.getInstance(systemName, config) ?

If we move it to the parent class, then just bucketer, I think.


- Yan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35445/#review93614
---


On July 28, 2015, 5:25 a.m., Eli Reisman wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/35445/
 ---
 
 (Updated July 28, 2015, 5:25 a.m.)
 
 
 Review request for samza.
 
 
 Repository: samza
 
 
 Description
 ---
 
 SAMZA-693: Very basic HDFS Producer service for Samza
 
 
 Diffs
 -
 
   build.gradle 0852adc 
   docs/learn/documentation/versioned/hdfs/producer.md PRE-CREATION 
   samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala 
 PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemAdmin.scala 
 PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemFactory.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducer.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducerMetrics.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/BinarySequenceFileHdfsWriter.scala
  PRE-CREATION 
   

Re: Review Request 35445: SAMZA-693: Very basic HDFS Producer service for Samza

2015-07-30 Thread Eli Reisman


 On July 29, 2015, 7:41 p.m., Navina Ramesh wrote:
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala,
   line 37
  https://reviews.apache.org/r/35445/diff/4/?file=1023373#file1023373line37
 
  What is the motivation for having 2 sequenceFileHdfsWriters?  Is the 
  only difference in the data type being written to the fs? 
  
  The difference is not clear from the class's javadoc or the example 
  configuration provided. The reason I ask is we already have a config 
  overload in Samza. I would rather not have the user configure a writer 
  type, if we can safely default to a single type and only override the 
  relevant diverging properties. Functionally, I believe they are doing the 
  same thing.
 
 Eli Reisman wrote:
 I have worked on another Apache project where introducing Writable type 
 params all over the place has painted us into some corners over time, so I 
 wanted to try it this way first. I could definitely rework the patch so that 
 the current base class SequenceFileHdfsWriter takes [K, V] type params and 
 handles data types that way if you like?
 
 Eli Reisman wrote:
 Oh and thanks once again for the prompt review, you folks are on top of 
 it!

I am liking the type param idea more as I think about it, despite past 
experience, but I am concerned that things like the casting of the outgoing 
message to something not-Writable like Array[Byte] or String might require a 
third param and it might start to get awkward. Also there are some Writable 
types that would not allow us to determine message size for batching purposes 
the way Text and BytesWritable do and could not be asbtracted this way, so we 
might end up with subclasses still (but smaller interfaces to implement) in 
that case.

Any thoughts or guidance is welcome here, I'm happy enough with it at this 
point to make whatever of these changes sounds good to you and the committers.


- Eli


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35445/#review93501
---


On July 28, 2015, 5:25 a.m., Eli Reisman wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/35445/
 ---
 
 (Updated July 28, 2015, 5:25 a.m.)
 
 
 Review request for samza.
 
 
 Repository: samza
 
 
 Description
 ---
 
 SAMZA-693: Very basic HDFS Producer service for Samza
 
 
 Diffs
 -
 
   build.gradle 0852adc 
   docs/learn/documentation/versioned/hdfs/producer.md PRE-CREATION 
   samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala 
 PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemAdmin.scala 
 PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemFactory.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducer.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducerMetrics.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/BinarySequenceFileHdfsWriter.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/Bucketer.scala 
 PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/HdfsWriter.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/SequenceFileHdfsWriter.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala
  PRE-CREATION 
   samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-text.properties 
 PRE-CREATION 
   samza-hdfs/src/test/resources/samza-hdfs-test-batch-job.properties 
 PRE-CREATION 
   samza-hdfs/src/test/resources/samza-hdfs-test-job-text.properties 
 PRE-CREATION 
   samza-hdfs/src/test/resources/samza-hdfs-test-job.properties PRE-CREATION 
   
 samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala
  PRE-CREATION 
   settings.gradle 19bff97 
 
 Diff: https://reviews.apache.org/r/35445/diff/
 
 
 Testing
 ---
 
 Updated: See JIRA SAMZA-693 for details, this latest update (693-4) addresses 
 post-review issues and adds more pluggable design, several default writer 
 implementations, and more (and more thorough) unit tests.
 
 Passes 'gradle clean test'.
 
 
 Thanks,
 
 Eli Reisman
 




Re: Review Request 35445: SAMZA-693: Very basic HDFS Producer service for Samza

2015-07-30 Thread Yan Fang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35445/#review93614
---



docs/learn/documentation/versioned/hdfs/producer.md (line 33)
https://reviews.apache.org/r/35445/#comment148011

com.etsy - org.apache.



docs/learn/documentation/versioned/hdfs/producer.md (line 65)
https://reviews.apache.org/r/35445/#comment148012

you also need to add a link in the document/index.html to make it 
reachable.



samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/BinarySequenceFileHdfsWriter.scala
 (line 38)
https://reviews.apache.org/r/35445/#comment148015

can we use the camel case with an initial lower case character  for the 
varaible? Just to obey the same coding guide 
http://samza.apache.org/contribute/coding-guide.html

Thank you.



samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/Bucketer.scala 
(line 38)
https://reviews.apache.org/r/35445/#comment148016

not necessary



samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala
 (line 40)
https://reviews.apache.org/r/35445/#comment148018

My overall concern here is that, if there are more than one tasks are 
running, is it possible that all the tasks are writing to one file at the same 
time?



samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala
 (line 55)
https://reviews.apache.org/r/35445/#comment148017

this will very possibly never be executed, because line 53 already makes 
currentDateTime == dateTime.



samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala
 (line 37)
https://reviews.apache.org/r/35445/#comment148024

I would prefer the param idea because 1) Samza is already using this 
fashion 2) less code especially when there are more SequenceFileHdfsWriter come 
out (LongWritable, etc)

like the casting of the outgoing message to something not-Writable like 
Array[Byte] or String might require a third param and it might start to get 
awkward

-- We can always cast the outgoing msg to Array[Byte] using the serde 
defined for this msg. So as long as the Wriable accepts Array[Byte], this 
should be fine.

Also there are some Writable types that would not allow us to determine 
message size for batching purposes the way 

-- I think we can either give it a default size (this can be configurable) 
when there is not getLength method or use a subclass. Either way will be fine.



samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala
 (line 106)
https://reviews.apache.org/r/35445/#comment148019

do we need to bring up the MiniCluster in every test? Is it be better if we 
just bring them up in the beforeClass and then shutdown it in afterClass?


- Yan Fang


On July 28, 2015, 5:25 a.m., Eli Reisman wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/35445/
 ---
 
 (Updated July 28, 2015, 5:25 a.m.)
 
 
 Review request for samza.
 
 
 Repository: samza
 
 
 Description
 ---
 
 SAMZA-693: Very basic HDFS Producer service for Samza
 
 
 Diffs
 -
 
   build.gradle 0852adc 
   docs/learn/documentation/versioned/hdfs/producer.md PRE-CREATION 
   samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala 
 PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemAdmin.scala 
 PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemFactory.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducer.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducerMetrics.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/BinarySequenceFileHdfsWriter.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/Bucketer.scala 
 PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/HdfsWriter.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/SequenceFileHdfsWriter.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala
  PRE-CREATION 
   samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-text.properties 
 PRE-CREATION 
   samza-hdfs/src/test/resources/samza-hdfs-test-batch-job.properties 
 PRE-CREATION 
   samza-hdfs/src/test/resources/samza-hdfs-test-job-text.properties 
 PRE-CREATION 
   

Re: Review Request 35445: SAMZA-693: Very basic HDFS Producer service for Samza

2015-07-30 Thread Yan Fang


 On July 29, 2015, 7:41 p.m., Navina Ramesh wrote:
  docs/learn/documentation/versioned/hdfs/producer.md, line 24
  https://reviews.apache.org/r/35445/diff/4/?file=1023362#file1023362line24
 
  Can you please update the list of available configuration for this 
  system at 
  http://samza.apache.org/learn/documentation/latest/jobs/configuration-table.html
   as well? 
  
  Like we did with elastic search, adding an example job to hello-samza 
  will be very useful for adoption. If not in this JIRA, please consider 
  opening a follow-up JIRA to add this example.
 
 Eli Reisman wrote:
 Yeah I can do that. I think adding an example to hello-samza is a great 
 idea, but I agree I would make that a separate ticket and then I'd claim that 
 ticket.
 
 Eli Reisman wrote:
 I'd also be interested in taking some follow up tickets for more output 
 formats than just seq files once this is out, and possibly the reader 
 implemention too although we'd need some up front discussion on the JIRA 
 ticket about what that looks like in a long-lived Samza job that needs to 
 read forever

That will be great ! :)


- Yan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35445/#review93501
---


On July 28, 2015, 5:25 a.m., Eli Reisman wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/35445/
 ---
 
 (Updated July 28, 2015, 5:25 a.m.)
 
 
 Review request for samza.
 
 
 Repository: samza
 
 
 Description
 ---
 
 SAMZA-693: Very basic HDFS Producer service for Samza
 
 
 Diffs
 -
 
   build.gradle 0852adc 
   docs/learn/documentation/versioned/hdfs/producer.md PRE-CREATION 
   samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala 
 PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemAdmin.scala 
 PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemFactory.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducer.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducerMetrics.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/BinarySequenceFileHdfsWriter.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/Bucketer.scala 
 PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/HdfsWriter.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/SequenceFileHdfsWriter.scala
  PRE-CREATION 
   
 samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala
  PRE-CREATION 
   samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-text.properties 
 PRE-CREATION 
   samza-hdfs/src/test/resources/samza-hdfs-test-batch-job.properties 
 PRE-CREATION 
   samza-hdfs/src/test/resources/samza-hdfs-test-job-text.properties 
 PRE-CREATION 
   samza-hdfs/src/test/resources/samza-hdfs-test-job.properties PRE-CREATION 
   
 samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala
  PRE-CREATION 
   settings.gradle 19bff97 
 
 Diff: https://reviews.apache.org/r/35445/diff/
 
 
 Testing
 ---
 
 Updated: See JIRA SAMZA-693 for details, this latest update (693-4) addresses 
 post-review issues and adds more pluggable design, several default writer 
 implementations, and more (and more thorough) unit tests.
 
 Passes 'gradle clean test'.
 
 
 Thanks,
 
 Eli Reisman
 




Re: Review Request 35445: SAMZA-693: Very basic HDFS Producer service for Samza

2015-07-26 Thread Eli Reisman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35445/
---

(Updated July 26, 2015, 10:47 p.m.)


Review request for samza.


Repository: samza


Description
---

SAMZA-693: Very basic HDFS Producer service for Samza


Diffs
-

  build.gradle 0852adc 
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala 
PRE-CREATION 
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemAdmin.scala 
PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemFactory.scala 
PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducer.scala 
PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducerMetrics.scala
 PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/BinarySequenceFileHdfsWriter.scala
 PRE-CREATION 
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/Bucketer.scala 
PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/HdfsWriter.scala 
PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala
 PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/SequenceFileHdfsWriter.scala
 PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala
 PRE-CREATION 
  samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-text.properties 
PRE-CREATION 
  samza-hdfs/src/test/resources/samza-hdfs-test-batch-job.properties 
PRE-CREATION 
  samza-hdfs/src/test/resources/samza-hdfs-test-job-text.properties 
PRE-CREATION 
  samza-hdfs/src/test/resources/samza-hdfs-test-job.properties PRE-CREATION 
  
samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala
 PRE-CREATION 
  settings.gradle 19bff97 

Diff: https://reviews.apache.org/r/35445/diff/


Testing (updated)
---

Updated: See JIRA SAMZA-693 for details, this latest update (693-4) addresses 
post-review issues and adds more pluggable design, several default writer 
implementations, and more (and more thorough) unit tests.

Passes 'gradle clean test'.


Thanks,

Eli Reisman



Re: Review Request 35445: SAMZA-693: Very basic HDFS Producer service for Samza

2015-07-26 Thread Eli Reisman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35445/
---

(Updated July 26, 2015, 10:31 p.m.)


Review request for samza.


Repository: samza


Description
---

SAMZA-693: Very basic HDFS Producer service for Samza


Diffs (updated)
-

  build.gradle 0852adc 
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala 
PRE-CREATION 
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemAdmin.scala 
PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemFactory.scala 
PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducer.scala 
PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducerMetrics.scala
 PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/BinarySequenceFileHdfsWriter.scala
 PRE-CREATION 
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/Bucketer.scala 
PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/HdfsWriter.scala 
PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala
 PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/SequenceFileHdfsWriter.scala
 PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala
 PRE-CREATION 
  samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-text.properties 
PRE-CREATION 
  samza-hdfs/src/test/resources/samza-hdfs-test-batch-job.properties 
PRE-CREATION 
  samza-hdfs/src/test/resources/samza-hdfs-test-job-text.properties 
PRE-CREATION 
  samza-hdfs/src/test/resources/samza-hdfs-test-job.properties PRE-CREATION 
  
samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala
 PRE-CREATION 
  settings.gradle 19bff97 

Diff: https://reviews.apache.org/r/35445/diff/


Testing
---

New unit test, but it's fairly rudimentary. Passes ./gradlew test and 
./gradlew check

This only supplies an HDFS Producer, and this producer only writes 
SequenceFiles of ByteWriteables so far. If the patch were accepted as-is, I'd 
suggest future tickets for a matching HDFS Consumer, and a pluggable set of 
output formats, configurable via HdfsConfig settings.

On the upside, this patch has been tested on a real cluster with real data, 
using several serdes, with good results.


Thanks,

Eli Reisman



Re: Review Request 35445: SAMZA-693: Very basic HDFS Producer service for Samza

2015-07-20 Thread Eli Reisman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35445/
---

(Updated July 20, 2015, 3:29 p.m.)


Review request for samza.


Repository: samza


Description
---

SAMZA-693: Very basic HDFS Producer service for Samza


Diffs (updated)
-

  build.gradle 0852adc 
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala 
PRE-CREATION 
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemAdmin.scala 
PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemFactory.scala 
PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducer.scala 
PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducerMetrics.scala
 PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/BinarySequenceFileHdfsWriter.scala
 PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/HdfsWriter.scala 
PRE-CREATION 
  
samza-hdfs/src/test/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala
 PRE-CREATION 
  samza-hdfs/src/test/resources/samza-hdfs-test-job.properties PRE-CREATION 
  settings.gradle 19bff97 

Diff: https://reviews.apache.org/r/35445/diff/


Testing
---

New unit test, but it's fairly rudimentary. Passes ./gradlew test and 
./gradlew check

This only supplies an HDFS Producer, and this producer only writes 
SequenceFiles of ByteWriteables so far. If the patch were accepted as-is, I'd 
suggest future tickets for a matching HDFS Consumer, and a pluggable set of 
output formats, configurable via HdfsConfig settings.

On the upside, this patch has been tested on a real cluster with real data, 
using several serdes, with good results.


Thanks,

Eli Reisman