Re: Review Request 50828: SAMZA-994 Fix StreamAppender to work with the refactored Job Coordinator

2016-08-05 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50828/#review144983
---


Ship it!




Ship It!

- Yi Pan (Data Infrastructure)


On Aug. 5, 2016, 1:32 a.m., Jagadish Venkatraman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50828/
> ---
> 
> (Updated Aug. 5, 2016, 1:32 a.m.)
> 
> 
> Review request for samza, Yi Pan (Data Infrastructure), Navina Ramesh, and 
> Xinyu Liu.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> The Stream Appender recognizes that it's running in the app-master by 
> checking for the presence of the string "application-master" in the 
> container.id environment variable. However, the refactoring changed the AM to 
> Job Coordinator. The Stream Appender must be updated to look for the 
> "job-coordinator" in the container.id instead of "application-master".
> 
> In the absence of this change, the stream appender in the AM will crash.
> 
> 
> Diffs
> -
> 
>   
> samza-log4j/src/main/java/org/apache/samza/logging/log4j/StreamAppender.java 
> ca4eb7fa890fe8d7d8f6e2f59351b35c0e41 
>   
> samza-log4j/src/test/java/org/apache/samza/logging/log4j/TestStreamAppender.java
>  e2e17a06159a35234270f342e8ae1c81031d971b 
> 
> Diff: https://reviews.apache.org/r/50828/diff/
> 
> 
> Testing
> ---
> 
> Updated unit tests and tested with a sample job.
> 
> 
> Thanks,
> 
> Jagadish Venkatraman
> 
>



Re: Review Request 50619: SAMZA-963: add KV storage engine timers to help identify the issues on kv stores and also add unit test

2016-08-05 Thread Fred Ji


> On Aug. 5, 2016, 6:31 p.m., Yi Pan (Data Infrastructure) wrote:
> > samza-kv/src/main/scala/org/apache/samza/storage/kv/BaseKeyValueStorageEngineFactory.scala,
> >  line 135
> > 
> >
> > Why do we need to add the default function, if you already defined the 
> > default in KeyValueStorageEngine.scala?

Thanks a lot. My initial consideration was to minimize potential errors if 
somebody would like to add new paramters for this constructor since there are a 
lot already. I also saw some similar uses in the existing samza code base as 
well. 

I am OK on either one. I can follow whatever code standard our team has.


- Fred


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50619/#review144968
---


On July 29, 2016, 10:24 p.m., Fred Ji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50619/
> ---
> 
> (Updated July 29, 2016, 10:24 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Jake Maes, and Yi Pan (Data 
> Infrastructure).
> 
> 
> Bugs: SAMZA-963
> https://issues.apache.org/jira/browse/SAMZA-963
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> SAMZA-963: add KV storage engine timers to help identify the issues on kv 
> stores and also add unit test
> 
> 
> Diffs
> -
> 
>   
> samza-kv/src/main/scala/org/apache/samza/storage/kv/BaseKeyValueStorageEngineFactory.scala
>  c975893a42689732c39c39600fecacee843bf9d6 
>   
> samza-kv/src/main/scala/org/apache/samza/storage/kv/KeyValueStorageEngine.scala
>  a3ffc421020b7a84c40b2101f2e37db8a20690cb 
>   
> samza-kv/src/main/scala/org/apache/samza/storage/kv/KeyValueStorageEngineMetrics.scala
>  233fba91caf041bfb78189efef00ce8fc56f9f15 
>   
> samza-kv/src/test/scala/org/apache/samza/storage/kv/TestKeyValueStorageEngine.scala
>  PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50619/diff/
> 
> 
> Testing
> ---
> 
> 1. unit test is successful on a newly created test file for 
> KeyValueStorageEngine: ./gradlew clean :samza-kv_2.10:test 
> -Dtest.single=TestKeyValueStorageEngine
> 2. build and all unit tests are successful: ./gradlew clean build
> 3. ./gradlew checkstyleMain checkstyleTest passed
> 4. manually tested on local machine for a stateful sample job depending on 
> KVStore, and from jconsole, the corresponding metrics were seen in mbeans 
> (see attached file) and the metrics were updated as expected.
> 
> 
> File Attachments
> 
> 
> snapshot of KeyValueStorageEngineMetrics bean from jconsole on local test
>   
> https://reviews.apache.org/media/uploaded/files/2016/07/29/ce4b0456-73f9-44d5-af7d-0c45bf91bcaa__KeyValueStorageEngineMetricsFromJconsole.png
> 
> 
> Thanks,
> 
> Fred Ji
> 
>



Re: State store changelog format

2016-08-05 Thread David Yu
I'm reporting back my observations after enabling compression.

Looks like compression is not doing anything. I'm still seeing
"compression-rate-avg=1.0" and the same "record-size-avg" from JMX
"kafka.producer" metrics.

I did set the following:
systems.kafka.producer.compression.type=snappy

Am I missing anything?

Thanks,
David

On Wed, Aug 3, 2016 at 1:48 PM David Yu  wrote:

> Great. Thx.
>
> On Wed, Aug 3, 2016 at 1:42 PM Jacob Maes  wrote:
>
>> Hey David,
>>
>> what gets written to the changelog topic
>>
>> The changelog gets the same value as the store, which is the serialized
>> form of the key and value. The serdes for the store are configured with
>> the
>> properties:
>> stores.store-name.key.serde
>> stores.store-name.msg.serde
>>
>> If I want to compress the changelog topic, do I enable that from the
>> > producer?
>>
>> Yes. When you specify the changelog for your store, you specify it in
>> terms
>> of a SystemStream (typically a Kafka topic). In the part of the config
>> where you define the Kafka system, you can pass any Kafka producer config
>> . So to
>> configure compression you should configure the following property.
>> systems.system-name.producer.compression.type
>>
>> Hope this helps.
>> -Jake
>>
>>
>>
>> On Wed, Aug 3, 2016 at 11:16 AM, David Yu 
>> wrote:
>>
>> > I'm trying to understand what gets written to the changelog topic. Is it
>> > just the serialized value of the particular state store entry? If I
>> want to
>> > compress the changelog topic, do I enable that from the producer?
>> >
>> > The reason I'm asking is that, we are seeing producer throughput issues
>> and
>> > suspected that writing to changelog takes up most of the network
>> bandwidth.
>> >
>> > Thanks,
>> > David
>> >
>>
>


Re: Understand KV store restoring

2016-08-05 Thread David Yu
Sorry, you are right. It is going through the changelog topic partition by
partition, which happens sequentially.

Thanks,
David

On Fri, Aug 5, 2016 at 1:53 PM Navina Ramesh 
wrote:

> Hi David,
>
> For a given container, it should go through the entire changelog only for
> the partitions owned by the tasks in the container. Restoration should
> happen only once and not multiple times.
>
> What logs statements do you see that indicate that it is going through the
> changelog multiple times? Can you please share that ?
>
> Thanks!
> Navina
>
> On Fri, Aug 5, 2016 at 11:39 AM, David Yu  wrote:
>
> > Within a given container, does the restoration process go through the
> > changelog topic once to restore ALL stores in that container? From the
> > logs, I have a feeling that it is going through the changelog multiple
> > times.
> >
> > Can anyone confirm?
> >
> > Thanks,
> > David
> >
>
>
>
> --
> Navina R.
>


Re: Understand KV store restoring

2016-08-05 Thread Navina Ramesh
Hi David,

For a given container, it should go through the entire changelog only for
the partitions owned by the tasks in the container. Restoration should
happen only once and not multiple times.

What logs statements do you see that indicate that it is going through the
changelog multiple times? Can you please share that ?

Thanks!
Navina

On Fri, Aug 5, 2016 at 11:39 AM, David Yu  wrote:

> Within a given container, does the restoration process go through the
> changelog topic once to restore ALL stores in that container? From the
> logs, I have a feeling that it is going through the changelog multiple
> times.
>
> Can anyone confirm?
>
> Thanks,
> David
>



-- 
Navina R.


Understand KV store restoring

2016-08-05 Thread David Yu
Within a given container, does the restoration process go through the
changelog topic once to restore ALL stores in that container? From the
logs, I have a feeling that it is going through the changelog multiple
times.

Can anyone confirm?

Thanks,
David


Re: Review Request 50619: SAMZA-963: add KV storage engine timers to help identify the issues on kv stores and also add unit test

2016-08-05 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50619/#review144968
---


Ship it!




LGTM. Just one minor comment. Thanks!


samza-kv/src/main/scala/org/apache/samza/storage/kv/BaseKeyValueStorageEngineFactory.scala
 (line 135)


Why do we need to add the default function, if you already defined the 
default in KeyValueStorageEngine.scala?


- Yi Pan (Data Infrastructure)


On July 29, 2016, 10:24 p.m., Fred Ji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50619/
> ---
> 
> (Updated July 29, 2016, 10:24 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Jake Maes, and Yi Pan (Data 
> Infrastructure).
> 
> 
> Bugs: SAMZA-963
> https://issues.apache.org/jira/browse/SAMZA-963
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> SAMZA-963: add KV storage engine timers to help identify the issues on kv 
> stores and also add unit test
> 
> 
> Diffs
> -
> 
>   
> samza-kv/src/main/scala/org/apache/samza/storage/kv/BaseKeyValueStorageEngineFactory.scala
>  c975893a42689732c39c39600fecacee843bf9d6 
>   
> samza-kv/src/main/scala/org/apache/samza/storage/kv/KeyValueStorageEngine.scala
>  a3ffc421020b7a84c40b2101f2e37db8a20690cb 
>   
> samza-kv/src/main/scala/org/apache/samza/storage/kv/KeyValueStorageEngineMetrics.scala
>  233fba91caf041bfb78189efef00ce8fc56f9f15 
>   
> samza-kv/src/test/scala/org/apache/samza/storage/kv/TestKeyValueStorageEngine.scala
>  PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50619/diff/
> 
> 
> Testing
> ---
> 
> 1. unit test is successful on a newly created test file for 
> KeyValueStorageEngine: ./gradlew clean :samza-kv_2.10:test 
> -Dtest.single=TestKeyValueStorageEngine
> 2. build and all unit tests are successful: ./gradlew clean build
> 3. ./gradlew checkstyleMain checkstyleTest passed
> 4. manually tested on local machine for a stateful sample job depending on 
> KVStore, and from jconsole, the corresponding metrics were seen in mbeans 
> (see attached file) and the metrics were updated as expected.
> 
> 
> File Attachments
> 
> 
> snapshot of KeyValueStorageEngineMetrics bean from jconsole on local test
>   
> https://reviews.apache.org/media/uploaded/files/2016/07/29/ce4b0456-73f9-44d5-af7d-0c45bf91bcaa__KeyValueStorageEngineMetricsFromJconsole.png
> 
> 
> Thanks,
> 
> Fred Ji
> 
>



Re: Samza yarn job - cannot bind to local host

2016-08-05 Thread Jacob Maes
Hey Shekar,

There's currently no way to force the port or disable the JMXServer.

We've seen port conflicts (fixed in 10.1) and connection issues due to VPN
but no connection issue for other reasons.

While googling that exception, I found a couple cases without clear
resolutions, but they were both on Windows machines. Are your servers
running Windows?

Ultimately it seems that there is something atypical about your environment
compared to how Samza is commonly run. If we can determine what that is,
hopefully that'll lead to a fix in either the environment or Samza. So any
other details about your cluster you can provide would be useful.

-Jake

On Fri, Aug 5, 2016 at 7:33 AM, Shekar Tippur  wrote:

> Any pointers on this please. I am completely blocked.
>
> - Shekar
>
> On Thu, Aug 4, 2016 at 5:05 PM, Shekar Tippur  wrote:
>
> > This server is not connected to vpn.
> >
> > - Shekar
> >
>


Re: Samza yarn job - cannot bind to local host

2016-08-05 Thread Shekar Tippur
Any pointers on this please. I am completely blocked.

- Shekar

On Thu, Aug 4, 2016 at 5:05 PM, Shekar Tippur  wrote:

> This server is not connected to vpn.
>
> - Shekar
>