Re: Review Request 40525: SAMZA-819: RocksDbKeyValueStore.flush() should be implemented

2015-11-19 Thread Navina Ramesh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40525/#review107321
---

Ship it!



samza-kv-rocksdb/src/test/scala/org/apache/samza/storage/kv/TestRocksDbKeyValueStore.scala
 (line 84)


I agree. Calling close() doesn't really test the case.


- Navina Ramesh


On Nov. 20, 2015, 6:18 a.m., tao feng wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40525/
> ---
> 
> (Updated Nov. 20, 2015, 6:18 a.m.)
> 
> 
> Review request for samza.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> 1. implement RocksDB flush method; 
> 2. RocksDB flushOptions waitForFlush set to true; 
> 3. unit test
> 
> 
> Diffs
> -
> 
>   
> samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStorageEngineFactory.scala
>  b949793a63951576937fa848bd674ec68f6f9727 
>   
> samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStore.scala
>  4620037f2a0da43149a754aa808d3d5d280ea893 
>   
> samza-kv-rocksdb/src/test/scala/org/apache/samza/storage/kv/TestRocksDbKeyValueStore.scala
>  a428a16bc1e9ab4980a6f17db4fd810057d31136 
> 
> Diff: https://reviews.apache.org/r/40525/diff/
> 
> 
> Testing
> ---
> 
> 1. ./gradlew clean build
> 2. ./gradlew checkstyleMain checkstyleTest
> 
> 
> Thanks,
> 
> tao feng
> 
>



Re: Review Request 40525: SAMZA-819: RocksDbKeyValueStore.flush() should be implemented

2015-11-19 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40525/#review107320
---



samza-kv-rocksdb/src/test/scala/org/apache/samza/storage/kv/TestRocksDbKeyValueStore.scala
 (line 84)


To make sure that it is flush() that write to the disk, not close(), you 
may want to keep this db open and open another db in read-only mode to verify 
that the read-only db sees the data.


- Yi Pan (Data Infrastructure)


On Nov. 20, 2015, 6:18 a.m., tao feng wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40525/
> ---
> 
> (Updated Nov. 20, 2015, 6:18 a.m.)
> 
> 
> Review request for samza.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> 1. implement RocksDB flush method; 
> 2. RocksDB flushOptions waitForFlush set to true; 
> 3. unit test
> 
> 
> Diffs
> -
> 
>   
> samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStorageEngineFactory.scala
>  b949793a63951576937fa848bd674ec68f6f9727 
>   
> samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStore.scala
>  4620037f2a0da43149a754aa808d3d5d280ea893 
>   
> samza-kv-rocksdb/src/test/scala/org/apache/samza/storage/kv/TestRocksDbKeyValueStore.scala
>  a428a16bc1e9ab4980a6f17db4fd810057d31136 
> 
> Diff: https://reviews.apache.org/r/40525/diff/
> 
> 
> Testing
> ---
> 
> 1. ./gradlew clean build
> 2. ./gradlew checkstyleMain checkstyleTest
> 
> 
> Thanks,
> 
> tao feng
> 
>



Re: Review Request 40525: SAMZA-819: RocksDbKeyValueStore.flush() should be implemented

2015-11-19 Thread Jagadish Venkatraman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40525/#review107319
---

Ship it!


lgtm

- Jagadish Venkatraman


On Nov. 20, 2015, 6:18 a.m., tao feng wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40525/
> ---
> 
> (Updated Nov. 20, 2015, 6:18 a.m.)
> 
> 
> Review request for samza.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> 1. implement RocksDB flush method; 
> 2. RocksDB flushOptions waitForFlush set to true; 
> 3. unit test
> 
> 
> Diffs
> -
> 
>   
> samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStorageEngineFactory.scala
>  b949793a63951576937fa848bd674ec68f6f9727 
>   
> samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStore.scala
>  4620037f2a0da43149a754aa808d3d5d280ea893 
>   
> samza-kv-rocksdb/src/test/scala/org/apache/samza/storage/kv/TestRocksDbKeyValueStore.scala
>  a428a16bc1e9ab4980a6f17db4fd810057d31136 
> 
> Diff: https://reviews.apache.org/r/40525/diff/
> 
> 
> Testing
> ---
> 
> 1. ./gradlew clean build
> 2. ./gradlew checkstyleMain checkstyleTest
> 
> 
> Thanks,
> 
> tao feng
> 
>



Review Request 40525: SAMZA-819: RocksDbKeyValueStore.flush() should be implemented

2015-11-19 Thread tao feng

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40525/
---

Review request for samza.


Repository: samza


Description
---

1. implement RocksDB flush method; 
2. RocksDB flushOptions waitForFlush set to true; 
3. unit test


Diffs
-

  
samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStorageEngineFactory.scala
 b949793a63951576937fa848bd674ec68f6f9727 
  
samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStore.scala
 4620037f2a0da43149a754aa808d3d5d280ea893 
  
samza-kv-rocksdb/src/test/scala/org/apache/samza/storage/kv/TestRocksDbKeyValueStore.scala
 a428a16bc1e9ab4980a6f17db4fd810057d31136 

Diff: https://reviews.apache.org/r/40525/diff/


Testing
---

1. ./gradlew clean build
2. ./gradlew checkstyleMain checkstyleTest


Thanks,

tao feng



Re: Avro vs Protocol buffer for Samza output

2015-11-19 Thread Selina Tech
Hi, Yi:

  Thanks a lot for your reply with information about Avro schema
registry.
  I studied the Avro message on Kafka after your reply, the Avro
message will automatically have  [magic byte][schema id][actual message] after
encode.
  Your mentioned " It is a specific way of maintaining compatibility
between producer and consumer in LinkedIn."  I am wondering how this work?
Any "AvroSchemaRegistry" API for Samza,  Kafka or Avro? Do you know any
link for this API or link for code example?
   In another word, If I send messages out with Schema Id1 to topic
"temp", and then later on I add or delete a filed and then the schema
changed. I send the messages out with Schema Id2 to topic "temp". When I
consumer the temp. how can I decode the message? Should I need the schema
Id, How can I get it? Does Kafka, Samza or Avro implement it?

Sincerely,
Selina

On Wed, Nov 18, 2015 at 5:29 PM, Yi Pan  wrote:

> Hi, Selina,
>
> Samza's producer/consumer is highly tunable. You can configure it to use
> ProtocolBufferSerde class if your messages in Kafka are in ProtocolBuf
> format. The use of Avro in Kafka is LinkedIn's choice and does not
> necessarily fit others.
>
> For the sake of "why LinkedIn uses Avro", here is the biggest reason:
> LinkedIn uses Avro schema registry to ensure that producer/consumer are
> using compatible Avro schema versions. It is a specific way of maintaining
> compatibility between producer and consumer in LinkedIn. ProtoBuf does not
> seem to have the schema registry functionality and requires re-compilation
> to make sure producer and consumer are compatible on the wire-format of the
> message.
>
> If you have other ways to maintain the compatibility between producer and
> consumers using ProtoBuf, I don't see why you cannot use ProtoBuf in Samza.
>
> Best,
>
> -Yi
>
> On Wed, Nov 18, 2015 at 3:43 PM, Selina Tech 
> wrote:
>
> > Dear All:
> >
> >   I need to generate some data by Samza to Kafka and then write to
> > Parquet formate file.  I was asked why I choose Avro type as my Samza
> > output to Kafka instead of Protocol Buffer. Since currently our data on
> > Kafka are all Protocol buffer.
> >   I explained for Avro encoded message -- The encoded size is
> smaller,
> > no extra code compile, implementation easier.  fast to
> > serialize/deserialize and support a lot language.  However some people
> > believe when encoded the Avro message take as much space as Protocol
> > buffer, but with schema, the size could be much bigger.
> >
> >   I am wondering if there are any other advantages make you choose
> Avro
> > as your message type at Kafka?
> >
> > Sincerely,
> > Selina
> >
>


Re: Review Request 40472: SAMZA-816: avoid creating and registering CoordinatorSystemConsumer in LocalityManager in SamzaContainer

2015-11-19 Thread Navina Ramesh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40472/#review107253
---

Ship it!


A few nits. Fix them and then, +1 !


samza-core/src/main/java/org/apache/samza/container/LocalityManager.java (line 
86)


Can you add some comment/javadoc explain which components can read locality 
information?



samza-core/src/test/java/org/apache/samza/container/TestLocalityManager.java 
(line 54)


nit: line break after test annotations

This is the format everywhere else in the codebase.


cou

- Navina Ramesh


On Nov. 19, 2015, 7:46 a.m., Yi Pan (Data Infrastructure) wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40472/
> ---
> 
> (Updated Nov. 19, 2015, 7:46 a.m.)
> 
> 
> Review request for samza, Yan Fang, Chinmay Soman, Chris Riccomini, and 
> Navina Ramesh.
> 
> 
> Bugs: SAMZA-816
> https://issues.apache.org/jira/browse/SAMZA-816
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> SAMZA-816: avoid creating and registering CoordinatorSystemConsumer in 
> LocalityManager in SamzaContainer
> 
> 
> Diffs
> -
> 
>   samza-core/src/main/java/org/apache/samza/container/LocalityManager.java 
> d19b5740069706cb98c0e59507dd7d4595aaa8b3 
>   
> samza-core/src/main/java/org/apache/samza/coordinator/stream/AbstractCoordinatorStreamManager.java
>  ca97ce80d0a56e1ace3931a06bd5cf062d9516d7 
>   samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 
> 3787b85e5cc738d16c0d1eaea4de3345a3a9106c 
>   
> samza-core/src/test/java/org/apache/samza/container/TestLocalityManager.java 
> PRE-CREATION 
>   
> samza-core/src/test/java/org/apache/samza/coordinator/stream/MockCoordinatorStreamSystemFactory.java
>  9d8c98ed442cd893d657a580ba3794cda3aa0efa 
> 
> Diff: https://reviews.apache.org/r/40472/diff/
> 
> 
> Testing
> ---
> 
> Unit test and local builds all passed.
> 
> 
> Thanks,
> 
> Yi Pan (Data Infrastructure)
> 
>



Re: Sample code or tutorial for writing/reading Avro type message in Samza

2015-11-19 Thread Luis Casillas

I did a Samza proof of concept project recently and I ended up writing this 
code:

https://gist.github.com/ldcasillas-progreso/871af3c1a1790be975fd

In the end, however, I switched the project from Avro to JSON.  The issue is 
that Avro is designed to work with its self-describing container file format, 
which embeds the schema used to write the records in the file.  Avro’s schema 
evolution features rely on this embedded schema; when the embedded schema and 
the reader’s schema are not equal, Avro uses its special rules to translate the 
old data to the new schema.

But when you’re working with Kafka/Samza, there is no container file.  
Therefore, none of the schema evolution tools work.  Therefore, if you change 
your Avro schema, you likely won’t be able to read any of the old messages 
again.

There’s a Kafka Avro schema registry project that aims to fix this:

https://github.com/confluentinc/schema-registry

I tried it but the released version just was not mature enough—which is why I 
ended up using JSON.  But I did write a Serde that encodes/decodes the Avro 
objects in JSON:

https://gist.github.com/ldcasillas-progreso/3611d40d2833aa62c1b3

Hope this helps.





On 11/17/15, 12:32 AM, "Selina Tech"  wrote:

>Dear All:
> Do you know where I can find the tutorial or sample code for writing
>Avro type message to Kafka and reading Avro type message from Kafka in
>Samza?
>  I am wondering how should I serialized GenericRecord to byte and
>deserialized it?
> Your comments/suggestion are highly appreciated.
>
>Sincerely,
>Selina


---
This message and any files or text attached to it are intended only for the 
recipients named above, and contain information that is confidential or 
privileged. If you are not an intended recipient, you must not read, copy, use 
or disclose this communication. Please also notify the sender by replying to 
this message, and then delete all copies of it from your system.

Este mensaje y cualquier archivo o texto adjunto es dirigido solamente a los 
destinatarios especificados en el encabezado y contiene información 
confidencial y/o privilegiada. Si usted no es el destinatario no deberá leer, 
copiar, usar o divulgar el contenido. Por favor notifique al remitente, 
respondiendo a esté mensaje y elimine todas las copias del mismo de su sistema.


Re: Review Request 40457: SAMZA-788 - coordinator stream configuration should not guess the system names

2015-11-19 Thread Navina Ramesh


> On Nov. 19, 2015, 5:59 p.m., Yi Pan (Data Infrastructure) wrote:
> > samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala,
> >  line 72
> > 
> >
> > The patch to SAMZA-754 already added this to StreamTaskTestUtils.

Fixed it and verified.


- Navina


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40457/#review107209
---


On Nov. 19, 2015, 10:05 p.m., Navina Ramesh wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40457/
> ---
> 
> (Updated Nov. 19, 2015, 10:05 p.m.)
> 
> 
> Review request for samza, Boris Shkolnik, Yan Fang, Chris Riccomini, Jagadish 
> Venkatraman, Xinyu Liu, and Yi Pan (Data Infrastructure).
> 
> 
> Bugs: SAMZA-788
> https://issues.apache.org/jira/browse/SAMZA-788
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> SAMZA-788 - coordinator stream configuration should not guess the system names
> 
> 
> Diffs
> -
> 
>   samza-core/src/main/scala/org/apache/samza/config/JobConfig.scala 
> 6d73bb9d3ef9c95445ec7c5f539d6e6279286a73 
>   
> samza-core/src/test/java/org/apache/samza/coordinator/stream/TestCoordinatorStreamWriter.java
>  f9c630419bc355d03c28e3221acde9efd8e17ae6 
>   samza-core/src/test/resources/test-migration-fail.properties 
> b0657defeebe5ed61ebc7f1dfa7688a583071fe1 
>   samza-core/src/test/resources/test.properties 
> 41eb82eb673a82fa75f464c69000ddcde0bd17cb 
> 
> Diff: https://reviews.apache.org/r/40457/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew clean build
> 
> 
> Thanks,
> 
> Navina Ramesh
> 
>



Re: Review Request 40457: SAMZA-788 - coordinator stream configuration should not guess the system names

2015-11-19 Thread Navina Ramesh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40457/
---

(Updated Nov. 19, 2015, 10:05 p.m.)


Review request for samza, Boris Shkolnik, Yan Fang, Chris Riccomini, Jagadish 
Venkatraman, Xinyu Liu, and Yi Pan (Data Infrastructure).


Bugs: SAMZA-788
https://issues.apache.org/jira/browse/SAMZA-788


Repository: samza


Description
---

SAMZA-788 - coordinator stream configuration should not guess the system names


Diffs (updated)
-

  samza-core/src/main/scala/org/apache/samza/config/JobConfig.scala 
6d73bb9d3ef9c95445ec7c5f539d6e6279286a73 
  
samza-core/src/test/java/org/apache/samza/coordinator/stream/TestCoordinatorStreamWriter.java
 f9c630419bc355d03c28e3221acde9efd8e17ae6 
  samza-core/src/test/resources/test-migration-fail.properties 
b0657defeebe5ed61ebc7f1dfa7688a583071fe1 
  samza-core/src/test/resources/test.properties 
41eb82eb673a82fa75f464c69000ddcde0bd17cb 

Diff: https://reviews.apache.org/r/40457/diff/


Testing
---

./gradlew clean build


Thanks,

Navina Ramesh



Re: Review Request 40485: SAMZA-767 yarn.queue option is not used anywhere

2015-11-19 Thread Boris Shkolnik

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40485/#review107232
---

Ship it!



samza-yarn/src/main/scala/org/apache/samza/job/yarn/YarnJob.scala (line 76)


nit. add a new line here.


- Boris Shkolnik


On Nov. 19, 2015, 2:56 p.m., Aleksandar Pejakovic wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40485/
> ---
> 
> (Updated Nov. 19, 2015, 2:56 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Added missing code from 
> [SAMZA-491](https://issues.apache.org/jira/browse/SAMZA-491)
> 
> 
> Diffs
> -
> 
>   samza-yarn/src/main/java/org/apache/samza/config/YarnConfig.java a572aa2 
>   samza-yarn/src/main/scala/org/apache/samza/job/yarn/ClientHelper.scala 
> a2b9279 
>   samza-yarn/src/main/scala/org/apache/samza/job/yarn/YarnJob.scala 02f46a1 
> 
> Diff: https://reviews.apache.org/r/40485/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Aleksandar Pejakovic
> 
>



Re: Review Request 40485: SAMZA-767 yarn.queue option is not used anywhere

2015-11-19 Thread Navina Ramesh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40485/#review107221
---

Ship it!


Agree. Fix the nit and then +1 !

- Navina Ramesh


On Nov. 19, 2015, 2:56 p.m., Aleksandar Pejakovic wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40485/
> ---
> 
> (Updated Nov. 19, 2015, 2:56 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Added missing code from 
> [SAMZA-491](https://issues.apache.org/jira/browse/SAMZA-491)
> 
> 
> Diffs
> -
> 
>   samza-yarn/src/main/java/org/apache/samza/config/YarnConfig.java a572aa2 
>   samza-yarn/src/main/scala/org/apache/samza/job/yarn/ClientHelper.scala 
> a2b9279 
>   samza-yarn/src/main/scala/org/apache/samza/job/yarn/YarnJob.scala 02f46a1 
> 
> Diff: https://reviews.apache.org/r/40485/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Aleksandar Pejakovic
> 
>



Re: Review Request 40485: SAMZA-767 yarn.queue option is not used anywhere

2015-11-19 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40485/#review107215
---

Ship it!


Fix the nit and +1


samza-yarn/src/main/scala/org/apache/samza/job/yarn/YarnJob.scala (line 76)


nit: better to follow the convention that each input argument to 
submitApplication() starts a newline and aligned.


- Yi Pan (Data Infrastructure)


On Nov. 19, 2015, 2:56 p.m., Aleksandar Pejakovic wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40485/
> ---
> 
> (Updated Nov. 19, 2015, 2:56 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Added missing code from 
> [SAMZA-491](https://issues.apache.org/jira/browse/SAMZA-491)
> 
> 
> Diffs
> -
> 
>   samza-yarn/src/main/java/org/apache/samza/config/YarnConfig.java a572aa2 
>   samza-yarn/src/main/scala/org/apache/samza/job/yarn/ClientHelper.scala 
> a2b9279 
>   samza-yarn/src/main/scala/org/apache/samza/job/yarn/YarnJob.scala 02f46a1 
> 
> Diff: https://reviews.apache.org/r/40485/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Aleksandar Pejakovic
> 
>



Re: Review Request 40457: SAMZA-788 - coordinator stream configuration should not guess the system names

2015-11-19 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40457/#review107209
---

Ship it!


LGTM except one comment.


samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala
 (line 72)


The patch to SAMZA-754 already added this to StreamTaskTestUtils.


- Yi Pan (Data Infrastructure)


On Nov. 18, 2015, 10:13 p.m., Navina Ramesh wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40457/
> ---
> 
> (Updated Nov. 18, 2015, 10:13 p.m.)
> 
> 
> Review request for samza, Boris Shkolnik, Yan Fang, Chris Riccomini, Jagadish 
> Venkatraman, Xinyu Liu, and Yi Pan (Data Infrastructure).
> 
> 
> Bugs: SAMZA-788
> https://issues.apache.org/jira/browse/SAMZA-788
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> SAMZA-788 - coordinator stream configuration should not guess the system names
> 
> 
> Diffs
> -
> 
>   samza-core/src/main/scala/org/apache/samza/config/JobConfig.scala 
> 6d73bb9d3ef9c95445ec7c5f539d6e6279286a73 
>   
> samza-core/src/test/java/org/apache/samza/coordinator/stream/TestCoordinatorStreamWriter.java
>  f9c630419bc355d03c28e3221acde9efd8e17ae6 
>   samza-core/src/test/resources/test-migration-fail.properties 
> b0657defeebe5ed61ebc7f1dfa7688a583071fe1 
>   samza-core/src/test/resources/test.properties 
> 41eb82eb673a82fa75f464c69000ddcde0bd17cb 
>   
> samza-test/src/test/scala/org/apache/samza/test/integration/TestShutdownStatefulTask.scala
>  06a107bdc180c4590d5f2225639b8455da8edc91 
>   
> samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala
>  224090333f26715024b4e2eb39f095e64927ed01 
> 
> Diff: https://reviews.apache.org/r/40457/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew clean build
> 
> 
> Thanks,
> 
> Navina Ramesh
> 
>



Re: question on yarn.container.cpu.cores

2015-11-19 Thread Jagadish Venkatraman
Hi Chen,

Yes, theoretically, it's possible and it'll certainly increase parallelism.

You should be careful about auto-commit when you implement this, Say you
have auto-commit turned on, and do a time-consuming operation for certain
types of messages in your process method (like - a long computation/ a
remote call), and return from process immediately after you submit your
message to your threadpool. If Samza's auto-commit kicks in, and commits
the message offset, and the fails before your threadpool completes, then,
you've lost that submitted message to the threadpool (Since, after your
restart jobs resume from the checkpoint).

You should turn auto-commit off and call task.commit yourself when all your
threadpool tasks complete. (maybe, in window method) That way, you'll
achieve parallelism. However, you may shoot yourself in the foot if not
implemented right ;-)




On Wed, Nov 18, 2015 at 8:40 PM, Chen Song  wrote:

> Thanks Navina
>
> So theoretically I can create a thread pool within a container. I know it
> is very hacky but it should increase parallelism.
>
> Chen
>
> On Mon, Nov 16, 2015 at 5:49 PM, Navina Ramesh 
> wrote:
>
> > Hi Chen,
> > Samza container is still single threaded. In case of yarn based
> deployment,
> > Samza uses this config value to verify that the cluster has sufficient
> > capacity to support running your job.
> >
> > Apart from this verification, I don't believe we utilize this config
> value.
> > If you set it to > 1, it won't have any effect on the Samza job execution
> > itself. However, you may end-up under-utilizing your Yarn cluster
> > resources.
> >
> > HTH!
> > Navina
> >
> > On Mon, Nov 16, 2015 at 2:32 PM, Chen Song 
> wrote:
> >
> > > According to the documentation, each Samza container is single
> threaded.
> > > Why giving yarn.container.cpu.cores as a config option and what is the
> > > implication
> > > to set this to a value > 1?
> > >
> > > --
> > > Chen Song
> > >
> >
> >
> >
> > --
> > Navina R.
> >
>
>
>
> --
> Chen Song
>



-- 
Jagadish V,
Graduate Student,
Department of Computer Science,
Stanford University


Re: question on yarn.container.cpu.cores

2015-11-19 Thread Rick Mangi
I’m not sure you can do this for the purposes of consuming messages faster, but 
I’m interested to hear if I’m mistaken.

You could spawn threads to do other work within a job though.




> On Nov 18, 2015, at 11:40 PM, Chen Song  wrote:
> 
> Thanks Navina
> 
> So theoretically I can create a thread pool within a container. I know it
> is very hacky but it should increase parallelism.
> 
> Chen
> 
> On Mon, Nov 16, 2015 at 5:49 PM, Navina Ramesh  wrote:
> 
>> Hi Chen,
>> Samza container is still single threaded. In case of yarn based deployment,
>> Samza uses this config value to verify that the cluster has sufficient
>> capacity to support running your job.
>> 
>> Apart from this verification, I don't believe we utilize this config value.
>> If you set it to > 1, it won't have any effect on the Samza job execution
>> itself. However, you may end-up under-utilizing your Yarn cluster
>> resources.
>> 
>> HTH!
>> Navina
>> 
>> On Mon, Nov 16, 2015 at 2:32 PM, Chen Song  wrote:
>> 
>>> According to the documentation, each Samza container is single threaded.
>>> Why giving yarn.container.cpu.cores as a config option and what is the
>>> implication
>>> to set this to a value > 1?
>>> 
>>> --
>>> Chen Song
>>> 
>> 
>> 
>> 
>> --
>> Navina R.
>> 
> 
> 
> 
> --
> Chen Song



signature.asc
Description: Message signed with OpenPGP using GPGMail


Review Request 40485: SAMZA-767 yarn.queue option is not used anywhere

2015-11-19 Thread Aleksandar Pejakovic

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40485/
---

Review request for samza.


Repository: samza


Description
---

Added missing code from 
[SAMZA-491](https://issues.apache.org/jira/browse/SAMZA-491)


Diffs
-

  samza-yarn/src/main/java/org/apache/samza/config/YarnConfig.java a572aa2 
  samza-yarn/src/main/scala/org/apache/samza/job/yarn/ClientHelper.scala 
a2b9279 
  samza-yarn/src/main/scala/org/apache/samza/job/yarn/YarnJob.scala 02f46a1 

Diff: https://reviews.apache.org/r/40485/diff/


Testing
---


Thanks,

Aleksandar Pejakovic