Re: Review Request 36545: SAMZA-682 Refactor Coordinator stream messages

2015-07-31 Thread Navina Ramesh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36545/#review93826
---

Ship it!


Thanks for testing with Zopkio !

- Navina Ramesh


On July 31, 2015, 12:49 p.m., József Márton Jung wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/36545/
> ---
> 
> (Updated July 31, 2015, 12:49 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> The following has been refactored: 
> 1. Static inner classes from CoordinatorStreamMessage has been extracted
> 2. Common functionality from CheckpointManager, ChangelogMappingManager and 
> LocalityManager has benn moved to a base class
> 
> 
> Diffs
> -
> 
>   checkstyle/import-control.xml 6654319 
>   samza-core/src/main/java/org/apache/samza/checkpoint/CheckpointManager.java 
> 7445996 
>   samza-core/src/main/java/org/apache/samza/container/LocalityManager.java 
> 55c258f 
>   
> samza-core/src/main/java/org/apache/samza/coordinator/stream/AbstractCoordinatorStreamManager.java
>  PRE-CREATION 
>   
> samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamMessage.java
>  e5ab4fb 
>   
> samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java
>  b1078bd 
>   
> samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemProducer.java
>  92f8907 
>   
> samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamWriter.java
>  f769756 
>   
> samza-core/src/main/java/org/apache/samza/coordinator/stream/messages/CoordinatorStreamMessage.java
>  PRE-CREATION 
>   
> samza-core/src/main/java/org/apache/samza/coordinator/stream/messages/Delete.java
>  PRE-CREATION 
>   
> samza-core/src/main/java/org/apache/samza/coordinator/stream/messages/SetChangelogMapping.java
>  PRE-CREATION 
>   
> samza-core/src/main/java/org/apache/samza/coordinator/stream/messages/SetCheckpoint.java
>  PRE-CREATION 
>   
> samza-core/src/main/java/org/apache/samza/coordinator/stream/messages/SetConfig.java
>  PRE-CREATION 
>   
> samza-core/src/main/java/org/apache/samza/coordinator/stream/messages/SetContainerHostMapping.java
>  PRE-CREATION 
>   samza-core/src/main/java/org/apache/samza/job/model/JobModel.java ad6387d 
>   
> samza-core/src/main/java/org/apache/samza/storage/ChangelogPartitionManager.java
>  7d3409c 
>   samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala 
> 0dbf14b 
>   samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala 1c178a6 
>   
> samza-core/src/test/java/org/apache/samza/coordinator/stream/MockCoordinatorStreamWrappedConsumer.java
>  e454593 
>   
> samza-core/src/test/java/org/apache/samza/coordinator/stream/TestCoordinatorStreamMessage.java
>  ac26a01 
>   
> samza-core/src/test/java/org/apache/samza/coordinator/stream/TestCoordinatorStreamSystemConsumer.java
>  c25f6a7 
>   
> samza-core/src/test/java/org/apache/samza/coordinator/stream/TestCoordinatorStreamSystemProducer.java
>  1ef07d0 
>   
> samza-core/src/test/java/org/apache/samza/coordinator/stream/TestCoordinatorStreamWriter.java
>  c484660 
>   
> samza-core/src/test/scala/org/apache/samza/container/TestSamzaContainer.scala 
> 84fdeaa 
>   samza-yarn/src/main/resources/scalate/WEB-INF/views/index.scaml 41303f7 
> 
> Diff: https://reviews.apache.org/r/36545/diff/
> 
> 
> Testing
> ---
> 
> Tests has been updated.
> 
> 
> Thanks,
> 
> József Márton Jung
> 
>



Re: Review Request 36006: Writing a tool to read from the coordinator stream and react to config changes accordingly.

2015-07-31 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36006/#review93820
---



samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala (line 70)


shouldRewriteConfigToCoordinatorStream is the action, not the job-level 
functionality this variable is controlling. I would prefer "overwriteJobConfig" 
or "resetJobConfig" which tells more explicitly what the job-level function 
this controls.



samza-autoscaling/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java
 (line 1)


The file directory name is still autoScaling. Isn't it a problem?



samza-autoscaling/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java
 (line 226)


Can we lower it down to info? This should not trigger alerts.



samza-autoscaling/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java
 (line 307)


This also reminds me of one thing: where do we assume that the 
ConfigManager will be running? If we assumes a specific host or hosts to run 
this ConfigManager (e.g. RM nodes), we better call it out and add some docs to 
describe it.



samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMasterService.scala 
(line 61)


Can we use CoordinatorStreamMessage defined constants, instead of hard-code 
strings here?


- Yi Pan (Data Infrastructure)


On July 28, 2015, 2:39 a.m., Shadi A. Noghabi wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/36006/
> ---
> 
> (Updated July 28, 2015, 2:39 a.m.)
> 
> 
> Review request for samza, Yi Pan (Data Infrastructure), Navina Ramesh, and 
> Naveen Somasundaram.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> After a job is submitted, it might need some configuration change, 
> specifically it might need more containers. In SAMZA-704 a tool is being 
> added to write to the coordinator stream (CoordinatorStreamWriter).  This 
> tool can be used to write new configurations to the coordinator stream. 
> However, another tool (ConfigManager) is needed to read the config changes 
> and react to them, which is the goal of this task. This tool should be 
> brought up after the job is submitted and read any config changes added to 
> the coordinator stream, and react to each accordingly. 
> 
> This tool, called the Config Manager, is focusing on handling container 
> changs by reacting to set-config massages with key "yarn.container.count". 
> 
> The config manager is a separate standa alone module, that should be brought 
> up separately after the submission of a job. Therefore, you have to add two 
> configurations to the input config file:
> 1. yarn.rm.address= 
> 2. yarn.rm.port= 
> 
> The config manger will periodically poll the coordinator stream to see if 
> there are any new messages. This period is set to 100 ms by deafualt. 
> However, it can be configured by adding 
> configManager.polling.interval= to the input config file. 
> Thus, overal the command to run the config manager along with the job would 
> be:
> 
> 
> /bin/run-config-manager.sh --config-factory= factory> --config-path=
> 
> 
> Diffs
> -
> 
>   build.gradle 0852adc4e8e0c2816afd1ebf433f1af6b44852f7 
>   checkstyle/import-control.xml 6654319392929857bb861d77763afd8a5ea7674c 
>   gradle/dependency-versions.gradle fb06e8ed393d1a38abfa1a48fe5244fc7f6c7339 
>   
> samza-autoscaling/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java
>  PRE-CREATION 
>   
> samza-autoscaling/src/main/java/org/apache/samza/autoScaling/utils/YarnUtil.java
>  PRE-CREATION 
>   
> samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java
>  b1078bdf7bddd16c9ccc6559b9efd40ca5ae67bc 
>   samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala 
> 1c178a661e449c6bdfc4ce431aef9bb2d261a6c2 
>   samza-shell/src/main/bash/run-config-manager.sh PRE-CREATION 
>   
> samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala
>  ea702a919348305ff95ce0b4ca1996a13aff04ec 
>   
> samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMasterService.scala
>  ce88698c12c4bf6f4cf128f92d60b0b9496997d7 
>   settings.gradle 19bff971ad221084dac10d3f7f3facfa42b829a7 
> 
> Diff: https://reviews.apache.org/r/36006/diff/
> 
> 
> Testing
> ---
> 
> Tested with hello samza and works properly.
> 
> 
> Thanks,
> 
> Shadi A. Noghabi
> 
>



Re: Review Request 36471: Autoscaling for samza (work in progress)

2015-07-31 Thread Navina Ramesh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36471/#review93785
---


As discussed, we definitely need a document on the auto-scaling design and 
config model. You can add a separate web-page to the site. Please make sure you 
create follow-up JIRAs for the same and include a auto-scaling config table as 
well.


samza-core/src/main/java/org/apache/samza/autoScaling/AutoScalingConfig.java 
(line 30)


Can you mention the available modes/values for this config as a comment 
here?
It's easier to understand rather than scrolling till the method where this 
is used.



samza-core/src/main/java/org/apache/samza/autoScaling/AutoScalingConfig.java 
(line 31)


Mention a one-liner comment on what this config indicates.
It can perhaps be renamed as "autoscaling.max.containers.allowed"

One general comment: Try to keep the variable name similar to the 
configuration. For example, "autoscaling.max.containers.allowed" can be 
AUTOSCALING_MAX_CONTAINERS_ALLOWED.

It might make the variable name really long. But it improved readability.



samza-core/src/main/java/org/apache/samza/autoScaling/AutoScalingConfig.java 
(line 34)


Rename this to "autoscaling.analyzer.%s.num.datapoints" ?
I can't think of anything better :) 
But let's not overuse "window" as it automatically makes you relate to 
"time"

1 line comment like - "number of data points the analyzer looks at"



samza-core/src/main/java/org/apache/samza/autoScaling/AutoScalingConfig.java 
(line 35)


Does this represent the period for which we store the gathered metrics ?



samza-core/src/main/java/org/apache/samza/autoScaling/AutoScalingConfig.java 
(line 36)


This is confusing because it only refers to the percentage of data-points 
above the trigger threshold. 
Try "autoscaling.analyzer.%s.percent.datapoints.above.trigger" or 
"autoscaling.analyzer.%s.datatpoints.above.trigger" ??



samza-core/src/main/java/org/apache/samza/autoScaling/AutoScalingConfig.java 
(line 37)


This is fine! Just add a one-liner comment here



samza-core/src/main/java/org/apache/samza/autoScaling/AutoScalingConfig.java 
(line 38)


I am a little lost on this config even after reading the description below. 
I will sync up with you in person!



samza-core/src/main/java/org/apache/samza/autoScaling/AutoScalingConfig.java 
(line 41)


The varaible name is fine. You can change the config name to look similar. 
"autoscaling.optimizer.max.allocatable.capacity" or does 
"autoscaling.optimizer.max.container.capacity" sound better??



samza-core/src/main/java/org/apache/samza/autoScaling/AutoScalingConfig.java 
(line 42)


"remove" sounds too harsh. It should indicate how much memory to free up 
per container that you scale up??

Does "autoscaling.optimizer.percent.memory.revoke" or 
"autoscaling.optimizer.memory.revoke" sound correct?


- Navina Ramesh


On July 31, 2015, 4:24 a.m., Shadi A. Noghabi wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/36471/
> ---
> 
> (Updated July 31, 2015, 4:24 a.m.)
> 
> 
> Review request for samza and Navina Ramesh.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> This work is for SAMZA-719. Currently, a fixed number of containers is 
> assigned to a job as an input configuration parameter. However, with this 
> design jobs can fail due to lack of enough resources (such as memory), or 
> they can become a bottleneck in a workflow containing many jobs. While 
> auto-scaling is much broader term, the goal of this project will be to enable 
> a Samza job to automatically scale its containers such that there is improved 
> job performance.
> 
> Based on the design, we need a profiler, analyzer, optimizer and deployer 
> module.
> 
> -currently, all components added based on memory usage and the loop seems to 
> work in inital test
> 
> -tests not added for all components, and further testing is needed.
> 
> 
> Diffs
> -
> 
>   build.gradle 0852adc4e8e0c2816afd1ebf433f1af6b44852f7 
>   checkstyle/import-control.xml 6654319392929857bb861d77763afd8a5ea7674c 
>   gradle/dependency-versions.gradle fb06e8ed393d1a38abfa1a48fe5244fc7f6c7339 
>   
> samza-core/src/main/java/org/apa

Re: [DISCUSS] Release 0.10.0

2015-07-31 Thread Yi Pan
Hi, Guozhang,

Good call. I moved SAMZA-697 out of 0.10.0. Boris is still working on that
and it is not clear yet whether this separation of class loaders will be
ready in 0.10.0.

Thanks!

-Yi

On Fri, Jul 31, 2015 at 10:12 AM, Guozhang Wang  wrote:

> Would SAMZA-697 worth adding to 0.10.0 as well?
>
> Guozhang
>
> On Thu, Jul 30, 2015 at 6:45 PM, Yan Fang  wrote:
>
> > For SAMZA-747, may ping Naveen or Chris. :)
> >
> > They have the permission to publish to maven. From the discuss
> > , they seem ready for
> the
> > release.
> >
> > Thanks,
> >
> > Fang, Yan
> > yanfang...@gmail.com
> >
> > On Thu, Jul 30, 2015 at 5:35 PM, Navina Ramesh
> >  > > wrote:
> >
> > > Ok. I think it got confusing because you are talking about tickets NOT
> to
> > > be included in the release :)
> > >
> > > Got it, now. +1 for this list of exclusions!
> > > SAMZA-723 (StreamAppender bug) and SAMZA-747 (rocksdb) should be in
> > 0.10.0.
> > > I think we don't have an ETA on the fix for SAMZA-747?
> > >
> > > Thanks!
> > > Navina
> > >
> > > On Thu, Jul 30, 2015 at 5:26 PM, Yi Pan  wrote:
> > >
> > > > Uh... wrong math all the day today... :(
> > > >
> > > > Let me re-try:
> > > >
> > > > 29/32 tickets are to be moved to later (i.e. excluded from 0.10.0).
> > > >
> > > > 2/32 tickets (including SAMZA-723) will be included
> > > >
> > > > 1/32 ticket (SAMZA-689) is up for discussion and I am leaning toward
> > mark
> > > > it as won't fix.
> > > >
> > > > -Yi
> > > >
> > > > On Thu, Jul 30, 2015 at 5:24 PM, Yi Pan  wrote:
> > > >
> > > > > Hi, Navina,
> > > > >
> > > > > The 29/30 tickets are to be excluded from 0.10.0 (i.e. moved to
> > > 0.11.0),
> > > > > the three tickets are either to be included in 0.10.0, or won't
> fix.
> > > > >
> > > > > -Yi
> > > > >
> > > > > On Thu, Jul 30, 2015 at 5:10 PM, Navina Ramesh <
> > > > > nram...@linkedin.com.invalid> wrote:
> > > > >
> > > > >> Hi Yi,
> > > > >> Thanks for summarizing. But why are we excluding SAMZA-723 from
> the
> > > > >> current
> > > > >> release ? Doesn't this break the existing StreamAppender
> > functionality
> > > > in
> > > > >> 0.9?
> > > > >>
> > > > >> Thanks!
> > > > >> Navina
> > > > >>
> > > > >> On Thu, Jul 30, 2015 at 4:55 PM, Yi Pan 
> > wrote:
> > > > >>
> > > > >> > Sorry, hit the send button too fast. Let me correct the summary
> > > > section:
> > > > >> >
> > > > >> > 29/32 tickets Status=open/reopen :
> > > > >> >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SAMZA%20AND%20status%20in%20(Open%2C%20Reopened)%20AND%20fixVersion%20%3D%200.10.0%20ORDER%20BY%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC
> > > > >> >
> > > > >> > Three tickets excluded:
> > > > >> > - SAMZA-689, @Chinmay, were you able to re-produce it? I am
> > thinking
> > > > of
> > > > >> > mark it as won't fix since 0.10.0 will remove the checkpoint
> topic
> > > > >> > altogether.
> > > > >> > - SAMZA-747, we will have to upgrade to RocksDB 3.11.1 to
> include
> > > the
> > > > >> fix
> > > > >> > that breaks Samza 0.10.0 build on Linux boxes
> > > > >> > - SAMZA-723, stream appender deadlock issue
> > > > >> >
> > > > >> > Thanks!
> > > > >> >
> > > > >> > -Yi
> > > > >> >
> > > > >> >
> > > > >> > On Thu, Jul 30, 2015 at 4:52 PM, Yi Pan 
> > > wrote:
> > > > >> >
> > > > >> > > Hi, all,
> > > > >> > >
> > > > >> > > Thanks a lot for helping out to select the features planned in
> > > > 0.10.0.
> > > > >> > >
> > > > >> > > Based on the above discussion, I am proposing to move the
> > > following
> > > > >> > > tickets later (i.e. 0.11.0).
> > > > >> > >
> > > > >> > > 30/32 tickets Status=open/reopen (exception for SAMZA-723:
> > stream
> > > > >> > appender
> > > > >> > > deadlock issue) :
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SAMZA%20AND%20status%20in%20(Open%2C%20Reopened)%20AND%20fixVersion%20%3D%200.10.0%20ORDER%20BY%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC
> > > > >> > >
> > > > >> > > Two tickets excluded:
> > > > >> > > - SAMZA-689, @Chinmay, were you able to re-produce it? I am
> > > thinking
> > > > >> of
> > > > >> > > mark it as won't fix since 0.10.0 will remove the checkpoint
> > topic
> > > > >> > > altogether.
> > > > >> > > - SAMZA-747, we will have to upgrade to RocksDB 3.11.1 to
> > include
> > > > the
> > > > >> fix
> > > > >> > > that breaks Samza 0.10.0 build on Linux boxes
> > > > >> > >
> > > > >> > > Anything that I missed?
> > > > >> > >
> > > > >> > > Thanks!
> > > > >> > >
> > > > >> > > -Yi
> > > > >> > >
> > > > >> > >
> > > > >> > > On Wed, Jul 29, 2015 at 9:57 AM, Navina Ramesh <
> > > > >> > > nram...@linkedin.com.invalid> wrote:
> > > > >> > >
> > > > >> > >> +1 for StreamAppender bug fix as a mandatory in 0.10.
> > > > >> > >>
> > > > >> > >> @Yan: will review SAMZA-676 today and test it out locally.
> > > > >> > >>
> > > > >> > >> Thanks!
> > >

Re: Review Request 35445: SAMZA-693: Very basic HDFS Producer service for Samza

2015-07-31 Thread Eli Reisman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35445/
---

(Updated July 31, 2015, 8:40 p.m.)


Review request for samza.


Repository: samza


Description
---

SAMZA-693: Very basic HDFS Producer service for Samza


Diffs
-

  build.gradle 0852adc 
  docs/learn/documentation/versioned/hdfs/producer.md PRE-CREATION 
  docs/learn/documentation/versioned/index.html 85209bd 
  docs/learn/documentation/versioned/jobs/configuration-table.html ea73b40 
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala 
PRE-CREATION 
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemAdmin.scala 
PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemFactory.scala 
PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducer.scala 
PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducerMetrics.scala
 PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/BinarySequenceFileHdfsWriter.scala
 PRE-CREATION 
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/Bucketer.scala 
PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/HdfsWriter.scala 
PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala
 PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/SequenceFileHdfsWriter.scala
 PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala
 PRE-CREATION 
  samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-text.properties 
PRE-CREATION 
  samza-hdfs/src/test/resources/samza-hdfs-test-batch-job.properties 
PRE-CREATION 
  samza-hdfs/src/test/resources/samza-hdfs-test-job-text.properties 
PRE-CREATION 
  samza-hdfs/src/test/resources/samza-hdfs-test-job.properties PRE-CREATION 
  
samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala
 PRE-CREATION 
  settings.gradle 19bff97 

Diff: https://reviews.apache.org/r/35445/diff/


Testing (updated)
---

Updated: See JIRA SAMZA-693 for details, this latest update (693-6) should be 
good to go. Thanks again for the execellent reviews and prompt responses, your 
time is appreciated!

Passes 'gradle clean test'.


Thanks,

Eli Reisman



Re: Review Request 35445: SAMZA-693: Very basic HDFS Producer service for Samza

2015-07-31 Thread Eli Reisman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35445/
---

(Updated July 31, 2015, 7:53 p.m.)


Review request for samza.


Repository: samza


Description
---

SAMZA-693: Very basic HDFS Producer service for Samza


Diffs (updated)
-

  build.gradle 0852adc 
  docs/learn/documentation/versioned/hdfs/producer.md PRE-CREATION 
  docs/learn/documentation/versioned/index.html 85209bd 
  docs/learn/documentation/versioned/jobs/configuration-table.html ea73b40 
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala 
PRE-CREATION 
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemAdmin.scala 
PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemFactory.scala 
PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducer.scala 
PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducerMetrics.scala
 PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/BinarySequenceFileHdfsWriter.scala
 PRE-CREATION 
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/Bucketer.scala 
PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/HdfsWriter.scala 
PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala
 PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/SequenceFileHdfsWriter.scala
 PRE-CREATION 
  
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala
 PRE-CREATION 
  samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-text.properties 
PRE-CREATION 
  samza-hdfs/src/test/resources/samza-hdfs-test-batch-job.properties 
PRE-CREATION 
  samza-hdfs/src/test/resources/samza-hdfs-test-job-text.properties 
PRE-CREATION 
  samza-hdfs/src/test/resources/samza-hdfs-test-job.properties PRE-CREATION 
  
samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala
 PRE-CREATION 
  settings.gradle 19bff97 

Diff: https://reviews.apache.org/r/35445/diff/


Testing
---

Updated: See JIRA SAMZA-693 for details, this latest update (693-4) addresses 
post-review issues and adds more pluggable design, several default writer 
implementations, and more (and more thorough) unit tests.

Passes 'gradle clean test'.


Thanks,

Eli Reisman



Re: Review Request 35445: SAMZA-693: Very basic HDFS Producer service for Samza

2015-07-31 Thread Eli Reisman


> On July 30, 2015, 6:59 p.m., Yan Fang wrote:
> > samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala,
> >  line 40
> > 
> >
> > My overall concern here is that, if there are more than one tasks are 
> > running, is it possible that all the tasks are writing to one file at the 
> > same time?
> 
> Eli Reisman wrote:
> I don't think so, each registered source should be using it's own 
> HdfsWriter in write() calls even on the same Producer and the filenames per 
> writer are unique-ified in the writer impl. There are other ways to 
> accomplish that uniqueness though.
> 
> Yan Fang wrote:
> I see. We are using the UUID.randomUUID to make sure the writers writes 
> to different files. This is fine unless we win the lottery. :)

Right! I was wondering if you wanted something more concise or there's a better 
uniquing pattern using some combo of other fields available in the Config + 
systemName? Glad you brought it up


- Eli


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35445/#review93614
---


On July 28, 2015, 5:25 a.m., Eli Reisman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35445/
> ---
> 
> (Updated July 28, 2015, 5:25 a.m.)
> 
> 
> Review request for samza.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> SAMZA-693: Very basic HDFS Producer service for Samza
> 
> 
> Diffs
> -
> 
>   build.gradle 0852adc 
>   docs/learn/documentation/versioned/hdfs/producer.md PRE-CREATION 
>   samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala 
> PRE-CREATION 
>   
> samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemAdmin.scala 
> PRE-CREATION 
>   
> samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemFactory.scala
>  PRE-CREATION 
>   
> samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducer.scala
>  PRE-CREATION 
>   
> samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducerMetrics.scala
>  PRE-CREATION 
>   
> samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/BinarySequenceFileHdfsWriter.scala
>  PRE-CREATION 
>   
> samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/Bucketer.scala 
> PRE-CREATION 
>   
> samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/HdfsWriter.scala
>  PRE-CREATION 
>   
> samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala
>  PRE-CREATION 
>   
> samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/SequenceFileHdfsWriter.scala
>  PRE-CREATION 
>   
> samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala
>  PRE-CREATION 
>   samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-text.properties 
> PRE-CREATION 
>   samza-hdfs/src/test/resources/samza-hdfs-test-batch-job.properties 
> PRE-CREATION 
>   samza-hdfs/src/test/resources/samza-hdfs-test-job-text.properties 
> PRE-CREATION 
>   samza-hdfs/src/test/resources/samza-hdfs-test-job.properties PRE-CREATION 
>   
> samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala
>  PRE-CREATION 
>   settings.gradle 19bff97 
> 
> Diff: https://reviews.apache.org/r/35445/diff/
> 
> 
> Testing
> ---
> 
> Updated: See JIRA SAMZA-693 for details, this latest update (693-4) addresses 
> post-review issues and adds more pluggable design, several default writer 
> implementations, and more (and more thorough) unit tests.
> 
> Passes 'gradle clean test'.
> 
> 
> Thanks,
> 
> Eli Reisman
> 
>



Re: Coordinator URL always 127.0.0.1

2015-07-31 Thread Tommy Becker

I realize the JC runs in the AM YARN container.  The SamzaContainers run in 
their own YARN containers, which may or may not be on the same node as the AM 
in the YARN grid.  My point is simply that when they are not, it does not work. 
 This is because the AM is telling the SamzaContainers to fetch the config from 
127.0.0.1.

-Tommy

On 07/31/2015 02:31 PM, Navina Ramesh wrote:

Hi Tommy,
I am not sure what you mean below:
{quote}
this context by container I meant the SamzaContainer.  What we are seeing
is that jobs only start when YARN happens to place the AM and
SamzaContainer(s) on the same node.Which is increasingly unlikely as you
increase container count for your job and/or expand your YARN grid.
{quote}
Any YARN application finds a yarn container to run the AM. In our case, JC
is running in the same container as the AM. So, I don't understand why this
will cause an issue on cluster expansion. I can understand your concern if
JC and AM are running on 2 separate containers.

Please correct me if I have misunderstood your statement.

Thanks!
Navina

On Fri, Jul 31, 2015 at 6:05 AM, Tommy Becker 
 wrote:



Hey Yan,

-- I do not quite understand this part. AM essentially is running in a
container as well. And the http server is brought up in the same container

Sorry, the term "container" is overloaded.  In this context by container I
meant the SamzaContainer.  What we are seeing is that jobs only start when
YARN happens to place the AM and SamzaContainer(s) on the same node.  Which
is increasingly unlikely as you increase container count for your job
and/or expand your YARN grid.

-Tommy

On 07/30/2015 10:08 PM, Yan Fang wrote:

Hi Thommy,

{quote}
Because I don't see how this is ever going to work in scenarios where the
AM is on a different node than the containers.
{quote}

-- I do not quite understand this part. AM essentially is running in a
container as well. And the http server is brought up in the same container.

{quote}
even if we can't get a better address for the AM from YARN, we could at
least filter the addresses we get back from the JVM to exclude loopbacks.
{quote}

-- You are right. InetAddress.getLocalHost() gives back loopback address
sometimes. We should filter this out. Just googling one possible solution
<
http://www.coderanch.com/t/491883/java/java/IP>
 .

+ @Yi, @Navina,

Also, I think this fix should go to the 0.10.0 release.

What do you guys think?

Thanks,

Fang, Yan
yanfang...@gmail.com

On Thu, Jul 30, 2015 at 6:39 PM, Yan Fang 
 wrote:



Just one point to add:

{quote}
AM gets notified of container status from the RM.
{quote}

I think this is not 100% correct. AM can communicate with NM through
NMClientAsync
<
https://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/yarn/client/api/async/NMClientAsync.html


<


https://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/yarn/client/api/async/NMClientAsync.html>
to
get container status, though Samza does not implement the CallbackHandler.

Thanks,

Fang, Yan
yanfang...@gmail.com

On Thu, Jul 30, 2015 at 6:06 PM, Navina Ramesh <
nram...@linkedin.com.invalid>
 wrote:



The NM (and hence, by extension the container) heartbeats to the RM, not
the AM. AM gets notified of container status from the RM.
The AM starts / stops /releases a container process by communicating to
the
NM.

Navina


On Thu, Jul 30, 2015 at 5:55 PM, Thomas Becker 
 wrote:



Ok, I thought there was some communication from the container to the AM,
it sounds like you're saying it's in the other direction only?  Don't
containers heartbeat to the AM?  Regardless, even if we can't get a


better


address for the AM from YARN, we could at least filter the addresses we


get


back from the JVM to exclude loopbacks.

-Tommy

From: Navina Ramesh 
[nram...@linkedin.com.INVALID]
Sent: Thursday, July 30, 2015 8:40 PM
To: 
dev@samza.apache.org
Subject: Re: Coordinator URL always 127.0.0.1

Hi Tommy,
Yi is right. Container start is coordinated by the AppMaster using an
NMClient. Container host name and port is provided by the RM during
allocation.
In Yarn (at least, afaik), when the node joins a cluster, the NM


registers


itself with the RM. So, the NM might still be using
getLocalhost.getAddress().

I don

Re: Coordinator URL always 127.0.0.1

2015-07-31 Thread Navina Ramesh
Hi Tommy,
I am not sure what you mean below:
{quote}
this context by container I meant the SamzaContainer.  What we are seeing
is that jobs only start when YARN happens to place the AM and
SamzaContainer(s) on the same node.Which is increasingly unlikely as you
increase container count for your job and/or expand your YARN grid.
{quote}
Any YARN application finds a yarn container to run the AM. In our case, JC
is running in the same container as the AM. So, I don't understand why this
will cause an issue on cluster expansion. I can understand your concern if
JC and AM are running on 2 separate containers.

Please correct me if I have misunderstood your statement.

Thanks!
Navina

On Fri, Jul 31, 2015 at 6:05 AM, Tommy Becker  wrote:

> Hey Yan,
>
> -- I do not quite understand this part. AM essentially is running in a
> container as well. And the http server is brought up in the same container
>
> Sorry, the term "container" is overloaded.  In this context by container I
> meant the SamzaContainer.  What we are seeing is that jobs only start when
> YARN happens to place the AM and SamzaContainer(s) on the same node.  Which
> is increasingly unlikely as you increase container count for your job
> and/or expand your YARN grid.
>
> -Tommy
>
> On 07/30/2015 10:08 PM, Yan Fang wrote:
>
> Hi Thommy,
>
> {quote}
> Because I don't see how this is ever going to work in scenarios where the
> AM is on a different node than the containers.
> {quote}
>
> -- I do not quite understand this part. AM essentially is running in a
> container as well. And the http server is brought up in the same container.
>
> {quote}
> even if we can't get a better address for the AM from YARN, we could at
> least filter the addresses we get back from the JVM to exclude loopbacks.
> {quote}
>
> -- You are right. InetAddress.getLocalHost() gives back loopback address
> sometimes. We should filter this out. Just googling one possible solution
> <
> http://www.coderanch.com/t/491883/java/java/IP> .
>
> + @Yi, @Navina,
>
> Also, I think this fix should go to the 0.10.0 release.
>
> What do you guys think?
>
> Thanks,
>
> Fang, Yan
> yanfang...@gmail.com
>
> On Thu, Jul 30, 2015 at 6:39 PM, Yan Fang  yanfang...@gmail.com> wrote:
>
>
>
> Just one point to add:
>
> {quote}
> AM gets notified of container status from the RM.
> {quote}
>
> I think this is not 100% correct. AM can communicate with NM through
> NMClientAsync
> <
> https://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/yarn/client/api/async/NMClientAsync.html
> ><
> https://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/yarn/client/api/async/NMClientAsync.html>
> to
> get container status, though Samza does not implement the CallbackHandler.
>
> Thanks,
>
> Fang, Yan
> yanfang...@gmail.com
>
> On Thu, Jul 30, 2015 at 6:06 PM, Navina Ramesh <
> nram...@linkedin.com.invalid> wrote:
>
>
>
> The NM (and hence, by extension the container) heartbeats to the RM, not
> the AM. AM gets notified of container status from the RM.
> The AM starts / stops /releases a container process by communicating to
> the
> NM.
>
> Navina
>
>
> On Thu, Jul 30, 2015 at 5:55 PM, Thomas Becker  tobec...@tivo.com> wrote:
>
>
>
> Ok, I thought there was some communication from the container to the AM,
> it sounds like you're saying it's in the other direction only?  Don't
> containers heartbeat to the AM?  Regardless, even if we can't get a
>
>
> better
>
>
> address for the AM from YARN, we could at least filter the addresses we
>
>
> get
>
>
> back from the JVM to exclude loopbacks.
>
> -Tommy
> 
> From: Navina Ramesh [nram...@linkedin.com.INVALID nram...@linkedin.com.INVALID>]
> Sent: Thursday, July 30, 2015 8:40 PM
> To: dev@samza.apache.org
> Subject: Re: Coordinator URL always 127.0.0.1
>
> Hi Tommy,
> Yi is right. Container start is coordinated by the AppMaster using an
> NMClient. Container host name and port is provided by the RM during
> allocation.
> In Yarn (at least, afaik), when the node joins a cluster, the NM
>
>
> registers
>
>
> itself with the RM. So, the NM might still be using
> getLocalhost.getAddress().
>
> I don't know of any other way to programmatically fetch the machine's
> hostname (apart from some hacky shell commands).
>
> Cheers,
> Navina
>
> On Thu, Jul 30, 2015 at 5:23 PM, Yi Pan  nickpa...@gmail.com> wrote:
>
>
>
> Hi, Tommy,
>
> Yeah, I agree that the current implementation is not bullet-proof to
>
>
> any
>
>
> different networking configuration on the host. As for the AM <->
>
>
> container
>
>
> communication, if I am not mistaken, it is through the NMClient and
>
>
> the
>
>
> node HTTP address is wrapped within the Container object returned from
>
>
> RM.
>
>
> I am not very familiar with that part of source code. Navina may be
>
>
> able
>
>
> to
>
>
> help m

Re: [DISCUSS] Release 0.10.0

2015-07-31 Thread Guozhang Wang
Would SAMZA-697 worth adding to 0.10.0 as well?

Guozhang

On Thu, Jul 30, 2015 at 6:45 PM, Yan Fang  wrote:

> For SAMZA-747, may ping Naveen or Chris. :)
>
> They have the permission to publish to maven. From the discuss
> , they seem ready for the
> release.
>
> Thanks,
>
> Fang, Yan
> yanfang...@gmail.com
>
> On Thu, Jul 30, 2015 at 5:35 PM, Navina Ramesh
>  > wrote:
>
> > Ok. I think it got confusing because you are talking about tickets NOT to
> > be included in the release :)
> >
> > Got it, now. +1 for this list of exclusions!
> > SAMZA-723 (StreamAppender bug) and SAMZA-747 (rocksdb) should be in
> 0.10.0.
> > I think we don't have an ETA on the fix for SAMZA-747?
> >
> > Thanks!
> > Navina
> >
> > On Thu, Jul 30, 2015 at 5:26 PM, Yi Pan  wrote:
> >
> > > Uh... wrong math all the day today... :(
> > >
> > > Let me re-try:
> > >
> > > 29/32 tickets are to be moved to later (i.e. excluded from 0.10.0).
> > >
> > > 2/32 tickets (including SAMZA-723) will be included
> > >
> > > 1/32 ticket (SAMZA-689) is up for discussion and I am leaning toward
> mark
> > > it as won't fix.
> > >
> > > -Yi
> > >
> > > On Thu, Jul 30, 2015 at 5:24 PM, Yi Pan  wrote:
> > >
> > > > Hi, Navina,
> > > >
> > > > The 29/30 tickets are to be excluded from 0.10.0 (i.e. moved to
> > 0.11.0),
> > > > the three tickets are either to be included in 0.10.0, or won't fix.
> > > >
> > > > -Yi
> > > >
> > > > On Thu, Jul 30, 2015 at 5:10 PM, Navina Ramesh <
> > > > nram...@linkedin.com.invalid> wrote:
> > > >
> > > >> Hi Yi,
> > > >> Thanks for summarizing. But why are we excluding SAMZA-723 from the
> > > >> current
> > > >> release ? Doesn't this break the existing StreamAppender
> functionality
> > > in
> > > >> 0.9?
> > > >>
> > > >> Thanks!
> > > >> Navina
> > > >>
> > > >> On Thu, Jul 30, 2015 at 4:55 PM, Yi Pan 
> wrote:
> > > >>
> > > >> > Sorry, hit the send button too fast. Let me correct the summary
> > > section:
> > > >> >
> > > >> > 29/32 tickets Status=open/reopen :
> > > >> >
> > > >> >
> > > >>
> > >
> >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SAMZA%20AND%20status%20in%20(Open%2C%20Reopened)%20AND%20fixVersion%20%3D%200.10.0%20ORDER%20BY%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC
> > > >> >
> > > >> > Three tickets excluded:
> > > >> > - SAMZA-689, @Chinmay, were you able to re-produce it? I am
> thinking
> > > of
> > > >> > mark it as won't fix since 0.10.0 will remove the checkpoint topic
> > > >> > altogether.
> > > >> > - SAMZA-747, we will have to upgrade to RocksDB 3.11.1 to include
> > the
> > > >> fix
> > > >> > that breaks Samza 0.10.0 build on Linux boxes
> > > >> > - SAMZA-723, stream appender deadlock issue
> > > >> >
> > > >> > Thanks!
> > > >> >
> > > >> > -Yi
> > > >> >
> > > >> >
> > > >> > On Thu, Jul 30, 2015 at 4:52 PM, Yi Pan 
> > wrote:
> > > >> >
> > > >> > > Hi, all,
> > > >> > >
> > > >> > > Thanks a lot for helping out to select the features planned in
> > > 0.10.0.
> > > >> > >
> > > >> > > Based on the above discussion, I am proposing to move the
> > following
> > > >> > > tickets later (i.e. 0.11.0).
> > > >> > >
> > > >> > > 30/32 tickets Status=open/reopen (exception for SAMZA-723:
> stream
> > > >> > appender
> > > >> > > deadlock issue) :
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SAMZA%20AND%20status%20in%20(Open%2C%20Reopened)%20AND%20fixVersion%20%3D%200.10.0%20ORDER%20BY%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC
> > > >> > >
> > > >> > > Two tickets excluded:
> > > >> > > - SAMZA-689, @Chinmay, were you able to re-produce it? I am
> > thinking
> > > >> of
> > > >> > > mark it as won't fix since 0.10.0 will remove the checkpoint
> topic
> > > >> > > altogether.
> > > >> > > - SAMZA-747, we will have to upgrade to RocksDB 3.11.1 to
> include
> > > the
> > > >> fix
> > > >> > > that breaks Samza 0.10.0 build on Linux boxes
> > > >> > >
> > > >> > > Anything that I missed?
> > > >> > >
> > > >> > > Thanks!
> > > >> > >
> > > >> > > -Yi
> > > >> > >
> > > >> > >
> > > >> > > On Wed, Jul 29, 2015 at 9:57 AM, Navina Ramesh <
> > > >> > > nram...@linkedin.com.invalid> wrote:
> > > >> > >
> > > >> > >> +1 for StreamAppender bug fix as a mandatory in 0.10.
> > > >> > >>
> > > >> > >> @Yan: will review SAMZA-676 today and test it out locally.
> > > >> > >>
> > > >> > >> Thanks!
> > > >> > >> Navina
> > > >> > >>
> > > >> > >> On Wed, Jul 29, 2015 at 8:38 AM, Chinmay Soman <
> > > >> > chinmay.cere...@gmail.com
> > > >> > >> >
> > > >> > >> wrote:
> > > >> > >>
> > > >> > >> > I can take care of SAMZA-340, SAMZA-683  and will follow up
> > with
> > > >> Luis
> > > >> > >> for
> > > >> > >> > SAMZA-401,2,3,4
> > > >> > >> >
> > > >> > >> > On Wed, Jul 29, 2015 at 12:10 AM, Dan  >
> > > >> wrote:
> > > >> > >> >
> > > >> > >> > > I agree SAMZA-741 for the ElasticSearch producer should be
> in
> > > >> too so
> > > >> > >> > we've
> > > >> > >> > > 

Re: Coordinator URL always 127.0.0.1

2015-07-31 Thread Tommy Becker

Hey Yan,

-- I do not quite understand this part. AM essentially is running in a
container as well. And the http server is brought up in the same container

Sorry, the term "container" is overloaded.  In this context by container I 
meant the SamzaContainer.  What we are seeing is that jobs only start when YARN happens 
to place the AM and SamzaContainer(s) on the same node.  Which is increasingly unlikely 
as you increase container count for your job and/or expand your YARN grid.

-Tommy

On 07/30/2015 10:08 PM, Yan Fang wrote:

Hi Thommy,

{quote}
Because I don't see how this is ever going to work in scenarios where the
AM is on a different node than the containers.
{quote}

-- I do not quite understand this part. AM essentially is running in a
container as well. And the http server is brought up in the same container.

{quote}
even if we can't get a better address for the AM from YARN, we could at
least filter the addresses we get back from the JVM to exclude loopbacks.
{quote}

-- You are right. InetAddress.getLocalHost() gives back loopback address
sometimes. We should filter this out. Just googling one possible solution

 .

+ @Yi, @Navina,

Also, I think this fix should go to the 0.10.0 release.

What do you guys think?

Thanks,

Fang, Yan
yanfang...@gmail.com

On Thu, Jul 30, 2015 at 6:39 PM, Yan Fang 
 wrote:



Just one point to add:

{quote}
AM gets notified of container status from the RM.
{quote}

I think this is not 100% correct. AM can communicate with NM through
NMClientAsync

 to
get container status, though Samza does not implement the CallbackHandler.

Thanks,

Fang, Yan
yanfang...@gmail.com

On Thu, Jul 30, 2015 at 6:06 PM, Navina Ramesh <
nram...@linkedin.com.invalid> wrote:



The NM (and hence, by extension the container) heartbeats to the RM, not
the AM. AM gets notified of container status from the RM.
The AM starts / stops /releases a container process by communicating to
the
NM.

Navina


On Thu, Jul 30, 2015 at 5:55 PM, Thomas Becker 
 wrote:



Ok, I thought there was some communication from the container to the AM,
it sounds like you're saying it's in the other direction only?  Don't
containers heartbeat to the AM?  Regardless, even if we can't get a


better


address for the AM from YARN, we could at least filter the addresses we


get


back from the JVM to exclude loopbacks.

-Tommy

From: Navina Ramesh 
[nram...@linkedin.com.INVALID]
Sent: Thursday, July 30, 2015 8:40 PM
To: dev@samza.apache.org
Subject: Re: Coordinator URL always 127.0.0.1

Hi Tommy,
Yi is right. Container start is coordinated by the AppMaster using an
NMClient. Container host name and port is provided by the RM during
allocation.
In Yarn (at least, afaik), when the node joins a cluster, the NM


registers


itself with the RM. So, the NM might still be using
getLocalhost.getAddress().

I don't know of any other way to programmatically fetch the machine's
hostname (apart from some hacky shell commands).

Cheers,
Navina

On Thu, Jul 30, 2015 at 5:23 PM, Yi Pan 
 wrote:



Hi, Tommy,

Yeah, I agree that the current implementation is not bullet-proof to


any


different networking configuration on the host. As for the AM <->


container


communication, if I am not mistaken, it is through the NMClient and


the


node HTTP address is wrapped within the Container object returned from


RM.


I am not very familiar with that part of source code. Navina may be


able


to


help more here.

-Yi

On Thu, Jul 30, 2015 at 4:27 PM, Thomas Becker 



wrote:





Hi Yi,
Thanks a lot for your reply.  I don't doubt we can get it to work by
mucking with the networking configuration, but to me this feels


like a


workaround, not a solution.


InetAddress.getLocalHost().getHostAddress()


is


not a reliable way of obtaining an IP that other machines can


connect


to.


Just today I tested on several Linux distros and it did not work on


any


of


them.  Can we do something more robust here?  How does the container
communicate status to the AM?

-Tommy


From: Yi Pan [nickpa...@gmail.com]
Sent: Thursday, July 30, 2015 6:48 PM
To: dev@samza.apache.org
Subject: Re: Coordinator URL always 127.0.0.1

Hi, Tommy,

I think that it might be a commonly asked question regarding to


multiple


IPs on a single host. A common trick

Re: Review Request 36545: SAMZA-682 Refactor Coordinator stream messages

2015-07-31 Thread József Márton Jung

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36545/
---

(Updated July 31, 2015, 12:49 p.m.)


Review request for samza.


Changes
---

Updated the patch, it can be cleanly applied to current master branch. Comments 
addressed.


Repository: samza


Description
---

The following has been refactored: 
1. Static inner classes from CoordinatorStreamMessage has been extracted
2. Common functionality from CheckpointManager, ChangelogMappingManager and 
LocalityManager has benn moved to a base class


Diffs (updated)
-

  checkstyle/import-control.xml 6654319 
  samza-core/src/main/java/org/apache/samza/checkpoint/CheckpointManager.java 
7445996 
  samza-core/src/main/java/org/apache/samza/container/LocalityManager.java 
55c258f 
  
samza-core/src/main/java/org/apache/samza/coordinator/stream/AbstractCoordinatorStreamManager.java
 PRE-CREATION 
  
samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamMessage.java
 e5ab4fb 
  
samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java
 b1078bd 
  
samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemProducer.java
 92f8907 
  
samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamWriter.java
 f769756 
  
samza-core/src/main/java/org/apache/samza/coordinator/stream/messages/CoordinatorStreamMessage.java
 PRE-CREATION 
  
samza-core/src/main/java/org/apache/samza/coordinator/stream/messages/Delete.java
 PRE-CREATION 
  
samza-core/src/main/java/org/apache/samza/coordinator/stream/messages/SetChangelogMapping.java
 PRE-CREATION 
  
samza-core/src/main/java/org/apache/samza/coordinator/stream/messages/SetCheckpoint.java
 PRE-CREATION 
  
samza-core/src/main/java/org/apache/samza/coordinator/stream/messages/SetConfig.java
 PRE-CREATION 
  
samza-core/src/main/java/org/apache/samza/coordinator/stream/messages/SetContainerHostMapping.java
 PRE-CREATION 
  samza-core/src/main/java/org/apache/samza/job/model/JobModel.java ad6387d 
  
samza-core/src/main/java/org/apache/samza/storage/ChangelogPartitionManager.java
 7d3409c 
  samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala 
0dbf14b 
  samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala 1c178a6 
  
samza-core/src/test/java/org/apache/samza/coordinator/stream/MockCoordinatorStreamWrappedConsumer.java
 e454593 
  
samza-core/src/test/java/org/apache/samza/coordinator/stream/TestCoordinatorStreamMessage.java
 ac26a01 
  
samza-core/src/test/java/org/apache/samza/coordinator/stream/TestCoordinatorStreamSystemConsumer.java
 c25f6a7 
  
samza-core/src/test/java/org/apache/samza/coordinator/stream/TestCoordinatorStreamSystemProducer.java
 1ef07d0 
  
samza-core/src/test/java/org/apache/samza/coordinator/stream/TestCoordinatorStreamWriter.java
 c484660 
  samza-core/src/test/scala/org/apache/samza/container/TestSamzaContainer.scala 
84fdeaa 
  samza-yarn/src/main/resources/scalate/WEB-INF/views/index.scaml 41303f7 

Diff: https://reviews.apache.org/r/36545/diff/


Testing
---

Tests has been updated.


Thanks,

József Márton Jung



Review Request 36985: SAMZA-749 Improve documentation for integration tests

2015-07-31 Thread József Márton Jung

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36985/
---

Review request for samza.


Repository: samza


Description
---

The integration tests section in http://samza.apache.org/contribute/tests.html 
should contain more information on how to run the tests locally.


Diffs
-

  docs/contribute/tests.md f485ce2 

Diff: https://reviews.apache.org/r/36985/diff/


Testing
---

Rebuilt the docs module to see if the new information is available. No other 
testing is required, since this is a docs update.


Thanks,

József Márton Jung



Re: Review Request 36545: SAMZA-682 Refactor Coordinator stream messages

2015-07-31 Thread József Márton Jung


> On July 24, 2015, 6:01 p.m., Navina Ramesh wrote:
> > Thanks for picking this up! It feels good to look at a refactored code. 
> > 
> > One suggestion: Please run all the intergration test (including the zopkio 
> > tests) before checking in this patch. I don't think we cleanly start and 
> > stop coordinator stream producers/consumers in all the managers. Please 
> > verify that nothing is broken due to this change.
> 
> József Márton Jung wrote:
> I have difficulties running the zopkio tests. The error message is the 
> following: 
> 2015-07-27 11:36:44,608 zopkio.remote_host_helper [ERROR] Error: 
> JAVA_HOME is not set and could not be found.
> 
> JAVA_HOME is set on my machine (in /etc/profile, so it is available 
> system-wide), echoing it outputs the path to Java installation. I don't knw 
> what is wrong.
> 
> Navina Ramesh wrote:
> I think the problem is with the remote host (host to which zopkio is 
> trying to ssh) not having JAVA_HOME set correctly. Are you running the test 
> on a remote machine?
> 
> József Márton Jung wrote:
> I'm trying to run the tests locally, so zopkio is ssh-ing to localhost. 
> When I connect to localhost through ssh and when I try "echo $JAVA_HOME", it 
> prints the correct path to my Java home. I'm clueless at the moment.
> 
> József Márton Jung wrote:
> Okay, I figured out. It works. Yay! When deployment is going through SSH, 
> it is a non-interctive shell, therefore /etc/profile is not executed.
> More details can be found here: 
> http://askubuntu.com/questions/247738/why-is-etc-profile-not-invoked-for-non-login-shells
> When I added the following line before the line '# If not running 
> interactively, don't do anything' in ~/.bashrc:
> ```
>export JAVA_HOME=/path/to/java/home
> ```
> Integration tests started working.
> 
> Maybe it would be a good idea to mention this on the page where 
> integration tests are described.
> 
> Navina Ramesh wrote:
> Duh.. that's true! It didn't occur to me. Glad you were able to figure it 
> out. 
> Yeah. We should probably mention this in the documentation.

I'll create a ticket for that.

I ran the tests, the run was successful, the report tells that all tests 
passed. Logs are looking good, too.
I'll align my branch with the current master, will run the tests again, and 
I'll attach the new patch.


- József Márton


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36545/#review92938
---


On July 27, 2015, 10:15 a.m., József Márton Jung wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/36545/
> ---
> 
> (Updated July 27, 2015, 10:15 a.m.)
> 
> 
> Review request for samza.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> The following has been refactored: 
> 1. Static inner classes from CoordinatorStreamMessage has been extracted
> 2. Common functionality from CheckpointManager, ChangelogMappingManager and 
> LocalityManager has benn moved to a base class
> 
> 
> Diffs
> -
> 
>   checkstyle/import-control.xml 6654319 
>   samza-core/src/main/java/org/apache/samza/checkpoint/CheckpointManager.java 
> 7445996 
>   samza-core/src/main/java/org/apache/samza/container/LocalityManager.java 
> 55c258f 
>   
> samza-core/src/main/java/org/apache/samza/coordinator/stream/AbstractCoordinatorStreamManager.java
>  PRE-CREATION 
>   
> samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamMessage.java
>  e5ab4fb 
>   
> samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java
>  b1078bd 
>   
> samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemProducer.java
>  92f8907 
>   
> samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamWriter.java
>  f769756 
>   
> samza-core/src/main/java/org/apache/samza/coordinator/stream/messages/CoordinatorStreamMessage.java
>  PRE-CREATION 
>   
> samza-core/src/main/java/org/apache/samza/coordinator/stream/messages/Delete.java
>  PRE-CREATION 
>   
> samza-core/src/main/java/org/apache/samza/coordinator/stream/messages/SetChangelogMapping.java
>  PRE-CREATION 
>   
> samza-core/src/main/java/org/apache/samza/coordinator/stream/messages/SetCheckpoint.java
>  PRE-CREATION 
>   
> samza-core/src/main/java/org/apache/samza/coordinator/stream/messages/SetConfig.java
>  PRE-CREATION 
>   
> samza-core/src/main/java/org/apache/samza/coordinator/stream/messages/SetContainerHostMapping.java
>  PRE-CREATION 
>   samza-core/src/main/java/org/apache/samza/job/model/JobModel.java ad6387d 
>   
> samza-core/src/main/java/org/apache/samza/storage/ChangelogPartitionManager.java
>  7d3409c