Re: [VOTE] Apache Samza 1.7.0 RC1

2022-03-15 Thread Yi Pan
+1 (binding).

Ran check-all, verified the signature and checksums. All passed.

Thanks for pushing 1.7.0 out of the door!

Yi

On Fri, Mar 11, 2022 at 2:31 PM Xinyu Liu  wrote:

> +1 (binding).
>
> Verified the signature and checksums, and also ran check-all tests which
> all passed.
>
> Thanks,
> Xinyu
>
> On Fri, Mar 11, 2022 at 2:00 PM Bob S  wrote:
>
> > +1
> > Ran build, test, and both integration tests (regular + standalone) and
> > check-all.
> > Verified signatures, sha and md5.
> > Thanks Daniel!
> >
> > On Wed, Mar 9, 2022 at 4:34 PM Daniel Chen  wrote:
> >
> > > Hey all, This is a call for a vote on a release of Apache Samza 1.7.0.
> > > Thanks to everyone who has contributed to this release.
> > >
> > > The release candidate can be downloaded from here:
> > >
> > > https://home.apache.org/~dchen/samza-1.7.0-rc1/
> > >
> > > The release candidate is signed with pgp key 1D9ADCE059431C34, which is
> > > included in the repository's KEYS file:
> > >
> > > https://git-wip-us.apache.org/repos/asf?p=samza.git;a=blob_plain;f=KEYS;hb=c5831bfc01b2e70ba57c4bd3505c6a84a73c8a7b
> > > and can also be found on keyservers:
> > >
> > > https://keyserver.ubuntu.com/pks/lookup?search=dchen%40apache.org=on=index
> > >
> > > The git tag is release-1.7.0-rc1 and signed with the same pgp key:
> > >
> > > https://gitbox.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-1.7.0-rc1
> > >
> > > Test binaries have been published to Maven's staging repository, and are
> > > available here:
> > >
> > > Scala 2.11:
> > > https://repository.apache.org/content/repositories/orgapachesamza-1092/
> > > Scala 2.12:
> > > https://repository.apache.org/content/repositories/orgapachesamza-1093/
> > >
> > > The vote will be open for 72 hours (ending at 5:00pm Saturday, 03/12/2022).
> > > Please download the release candidate, check the hashes/signature, build it
> > > and test it, and then please vote: [ ] +1 approve [ ] +0 no opinion [ ] -1
> > > disapprove (and reason why)
> > >
> > > I ran check-all.sh, and bor...@apache.org helped run the integration tests
> > > (both YARN and standalone); all passed for rc1.
> > >
> > > +1 from my side for the release.
> > > Thanks,
> > > Daniel
> > >
> >
>
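
Several voters in the thread above report verifying the release candidate's checksums. A minimal sketch of that check, assuming (hypothetically) that each artifact ships with a sidecar file whose first whitespace-separated token is the hex digest; the helper names and the sidecar layout are illustrative and are not taken from the Samza release scripts:

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha512", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(artifact, checksum_file, algorithm="sha512"):
    """Compare an artifact's digest with the first token of its sidecar file."""
    # Assumption: the sidecar holds "<hex digest> [file name]" on its first line.
    expected = Path(checksum_file).read_text().split()[0].lower()
    return file_digest(artifact, algorithm) == expected
```

Passing `algorithm="md5"`, `"sha1"`, or `"sha256"` covers the other digests mentioned in these threads. Signature verification is a separate step (typically `gpg --verify` against the project's KEYS file) and is not covered by this sketch.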


Re: [RESULT][VOTE] Apache Samza 1.7.0 RC1

2022-03-15 Thread Yi Pan
Thanks, Daniel!

Just want to mention that Boris also voted +1 (binding).

Best!

-Yi

On Tue, Mar 15, 2022 at 9:22 AM Daniel Chen  wrote:

> Hey all,
>
> The vote for the 1.7.0 release has been open for more than 72 hours, and we
> got three +1 (binding) votes, from Yi, Xinyu, and Daniel.
>
> Samza 1.7.0 officially passed the VOTE phase!
>
> Thanks to everyone who helped with the validation!
>
> Daniel
>
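
The tally in this thread can be sketched as follows. The rule encoded here (at least three binding +1 votes, and more binding +1 than binding -1 votes) is the usual Apache release-vote rule as commonly described, stated as an assumption rather than quoted from the thread; the `Vote` type and function name are hypothetical. The vote list reflects the four binding +1 votes named above, including Boris's.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int      # +1, 0, or -1
    binding: bool


def release_vote_passes(votes, minimum_binding_plus_ones=3):
    """Apply the assumed rule: enough binding +1s, and more +1s than -1s."""
    binding = [vote for vote in votes if vote.binding]
    plus = sum(1 for vote in binding if vote.value > 0)
    minus = sum(1 for vote in binding if vote.value < 0)
    return plus >= minimum_binding_plus_ones and plus > minus


# Binding votes reported in the thread (Boris's vote was added in the reply).
rc1_votes = [
    Vote("Yi", +1, True),
    Vote("Xinyu", +1, True),
    Vote("Daniel", +1, True),
    Vote("Boris", +1, True),
]
```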


[REPORT] Samza - Feb 2022

2022-02-09 Thread Yi Pan
## Description:
The mission of Samza is the creation and maintenance of software related to
a distributed stream processing framework.

## Issues:
- There are no issues requiring board attention.

## Membership Data:
Apache Samza was founded 2015-01-22 (7 years ago)
There are currently 29 committers and 17 PMC members in this project.
The Committer-to-PMC ratio is roughly 8:5.

Community changes, past quarter:
- No new PMC members. Last addition was Bharath Kumarasubramanian on
2020-02-13.
- No new committers. Last addition was Daniel Chen on 2021-09-17.

## Project Activity:
- Samza 1.7.x release is in DISCUSSION to include the following major
features
  - [SAMZA-2591] Introduce Async State Backup API (SEP-28)
  - [SAMZA-2657] Blob store backed state backup and restore (SEP-29)
  - [SAMZA-2709] Adding partial updates to Samza Table API (SEP-30)
  - [SAMZA-2716] Upgrade to Kafka 2.4
- Samza auto-scaling presented in Stream Processing Meetup@LinkedIn on Dec 1

## Community Health:
JIRA
- 11 issues opened in JIRA, past quarter (-72% change)
- 24 issues closed in JIRA, past quarter (166% increase)
COMMITS
- 28 commits in the past quarter (-15% change)
- 15 code contributors in the past quarter (87% increase)
- 24 PRs opened on GitHub, past quarter (-42% change)
- 25 PRs closed on GitHub, past quarter (-30% change)


Re: [VOTE] SEP-30: Support Updates in Table API

2022-01-24 Thread Yi Pan
Discussed and resolved the minor concerns offline. +1 (binding) for this
one.

Thanks!

-Yi

On Tue, Dec 21, 2021 at 1:28 PM Xinyu Liu  wrote:

> +1 on my side.
>
> Glad to see this feature coming. Please make sure the api changes are
> reflected in the documents, e.g.
> https://samza.apache.org/learn/documentation/1.0.0/api/table-api.html.
>
> Thanks,
> Xinyu
>
> On Mon, Dec 20, 2021 at 10:44 AM Ajo Thomas 
> wrote:
>
> > Hi All,
> >
> > This is a call for a vote on SEP-30: Support Updates in Table API
> > Thanks to everyone involved with the design and reviews to refine the
> > proposal.
> >
> > Email Thread:
> > http://mail-archives.apache.org/mod_mbox/samza-dev/202112.mbox/%3cCAAMuQDN9fX64KONdqD1n06xTXvgMXNUqkt2RnPnt9Zr=vjn...@mail.gmail.com%3e
> >
> > SEP-30:
> > https://cwiki.apache.org/confluence/display/SAMZA/SEP-30%3A+Support+Updates+in+Table+API
> >
> > Jira ticket:
> > https://issues.apache.org/jira/browse/SAMZA-2709
> >
> > Please vote:
> > [ ] +1 approve
> > [ ] +0 no opinion
> > [ ] -1 disapprove (and reason why)
> >
> > Thanks,
> > Ajo Thomas
> >
>


Re: [DISCUSS] Apache Samza 1.7.0 RC0

2022-01-26 Thread Yi Pan
Huge +1! Can't wait to see this list of features coming out!

-Yi

On Wed, Jan 26, 2022 at 2:36 PM Daniel Chen  wrote:

> Hi folks,
>
> We have added a number of major features and changes to master since
> 1.6 that warrant a major 1.7 release.
>
> Within LinkedIn, some of these features have already been tested as
> part of our test suites and are currently used in many of our production
> jobs. We plan to continue our testing in the coming weeks to validate the
> stability prior to release.
>
> We wanted to kick off the discussion in the open source forum to keep
> the momentum flowing.
>
> Here is a selected list of major features that are part of the new release:
>
>    - SEP-28: Samza State Backend Interface and Checkpointing Improvements
>      (#1514)
>    - SEP-29: Blob Store as backend for Samza State backup and restore (#1501)
>    - SEP-30: Adding partial update api to Table API (#1560)
>
> You can find a concrete list of the features, bug-fixes, and upgrades here:
>
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SAMZA%20AND%20fixVersion%20%3D%201.7
>


[REPORT] Samza - April 2022

2022-04-13 Thread Yi Pan
## Description:
The mission of Samza is the creation and maintenance of software related to
a distributed stream processing framework.

## Issues:
- There are no issues requiring board attention.

## Membership Data:
Apache Samza was founded 2015-01-22 (7 years ago)
There are currently 29 committers and 17 PMC members in this project.
The Committer-to-PMC ratio is roughly 8:5.

Community changes, past quarter:
- No new PMC members. Last addition was Bharath Kumarasubramanian on
2020-02-13.
- No new committers. Last addition was Daniel Chen on 2021-09-17.

## Project Activity:
- Samza 1.7.0 was released on 2022-04-04
- Stream Processing Meetup@LinkedIn on Kafka and Samza was held on 2022-04-07

## Community Health:
JIRA:
13 issues opened in JIRA, past quarter (-13% change)
13 issues closed in JIRA, past quarter (-23% change)
Commits:
32 commits in the past quarter (10% increase)
11 code contributors in the past quarter (-21% change)
22 PRs opened on GitHub, past quarter (-15% change)
20 PRs closed on GitHub, past quarter (-20% change)


Re: Running v1.7.0 locally

2022-09-02 Thread Yi Pan
Hey, Malcolm,

Thanks for reporting this issue. Could you open a JIRA to track that?

Best!

-Yi

On Mon, Aug 29, 2022 at 5:53 PM Malcolm McFarland 
wrote:

> Hey folks,
>
> I've recently been attempting to upgrade our legacy application from Samza
> 1.5.1 to 1.7.0. With version 1.5.1, I've had no problems running the
> application with this command:
>
> ./bin/run-app.sh --config-path=path/to/file.properties
>
> Starting in 1.6.0, this doesn't seem to work. As far as I can tell, the
> application is starting fully up without errors and then is simply shutting
> down, once again without error. Afaict it runs fine on YARN. Does Samza
> v1.6.0+ support running local processes? I've tried this on both OS X and
> Ubuntu, using Java 1.8.
>
> Here are the relevant portions of the properties file:
>
> task.class=com.cavulus.task.SimpleLegacyTask
> job.factory.class=org.apache.samza.job.local.ThreadJobFactory
> job.default.system=kafka
> systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
> job.name=simple-legacy-task
> task.inputs=kafka.event-input
>
> ...plus serdes, ZooKeeper configuration, etc, etc. Here are the last few
> lines of logging output:
>
> 2022-08-29 17:19:42,842  DEBUG  [org.apache.kafka.clients.NetworkClient]
>  [Consumer clientId=kafka_admin_consumer-simple_legacy_task-1,
> groupId=simple-legacy-task-1] Sending metadata request
> (type=MetadataRequest, topics=) to node localhost:9092 (id: -1 rack: null)
> 2022-08-29 17:19:42,843  INFO   [org.apache.kafka.clients.Metadata]
>  Cluster ID: fwnjhL2kQayFxN0xpatT-g
> 2022-08-29 17:19:42,843  DEBUG  [org.apache.kafka.clients.Metadata]
>  Updated cluster metadata version 2 to Cluster(id = fwnjhL2kQayFxN0xpatT-g,
> nodes = [localhost:9092 (id: 0 rack: null)], partitions = [], controller =
> localhost:9092 (id: 0 rack: null))
> 2022-08-29 17:19:42,843  DEBUG
>  [org.apache.samza.system.kafka.KafkaSystemAdmin]  Stream
> simple-legacy-task-broadcast-stream has partitions [Partition(topic =
> simple-legacy-task-broadcast-stream, partition = 0, leader = 0, replicas =
> [0], isr = [0], offlineReplicas = [])]
> 2022-08-29 17:19:42,844  DEBUG  [org.apache.kafka.clients.NetworkClient]
>  [Consumer clientId=kafka_admin_consumer-simple_legacy_task-1,
> groupId=simple-legacy-task-1] Initiating connection to node localhost:9092
> (id: 0 rack: null)
> 2022-08-29 17:19:42,844  DEBUG  [org.apache.kafka.common.metrics.Metrics]
>  Added sensor with name node-0.bytes-sent
> 2022-08-29 17:19:42,844  DEBUG  [org.apache.kafka.common.metrics.Metrics]
>  Added sensor with name node-0.bytes-received
> 2022-08-29 17:19:42,844  DEBUG  [org.apache.kafka.common.metrics.Metrics]
>  Added sensor with name node-0.latency
> 2022-08-29 17:19:42,844  DEBUG  [org.apache.kafka.common.network.Selector]
>  [Consumer clientId=kafka_admin_consumer-simple_legacy_task-1,
> groupId=simple-legacy-task-1] Created socket with SO_RCVBUF = 342972,
> SO_SNDBUF = 146988, SO_TIMEOUT = 0 to node 0
> 2022-08-29 17:19:42,844  DEBUG  [org.apache.kafka.clients.NetworkClient]
>  [Consumer clientId=kafka_admin_consumer-simple_legacy_task-1,
> groupId=simple-legacy-task-1] Completed connection to node 0. Fetching API
> versions.
> 2022-08-29 17:19:42,844  DEBUG  [org.apache.kafka.clients.NetworkClient]
>  [Consumer clientId=kafka_admin_consumer-simple_legacy_task-1,
> groupId=simple-legacy-task-1] Initiating API versions fetch from node 0.
> 2022-08-29 17:19:42,845  DEBUG  [org.apache.kafka.clients.NetworkClient]
>  [Consumer clientId=kafka_admin_consumer-simple_legacy_task-1,
> groupId=simple-legacy-task-1] Recorded API versions for node 0:
> (Produce(0): 0 to 7 [usable: 6], Fetch(1): 0 to 11 [usable: 8],
> ListOffsets(2): 0 to 5 [usable: 3], Metadata(3): 0 to 8 [usable: 6],
> LeaderAndIsr(4): 0 to 2 [usable: 1], StopReplica(5): 0 to 1 [usable: 0],
> UpdateMetadata(6): 0 to 5 [usable: 4], ControlledShutdown(7): 0 to 2
> [usable: 1], OffsetCommit(8): 0 to 7 [usable: 4], OffsetFetch(9): 0 to 5
> [usable: 4], FindCoordinator(10): 0 to 2 [usable: 2], JoinGroup(11): 0 to 5
> [usable: 3], Heartbeat(12): 0 to 3 [usable: 2], LeaveGroup(13): 0 to 2
> [usable: 2], SyncGroup(14): 0 to 3 [usable: 2], DescribeGroups(15): 0 to 3
> [usable: 2], ListGroups(16): 0 to 2 [usable: 2], SaslHandshake(17): 0 to 1
> [usable: 1], ApiVersions(18): 0 to 2 [usable: 2], CreateTopics(19): 0 to 3
> [usable: 3], DeleteTopics(20): 0 to 3 [usable: 2], DeleteRecords(21): 0 to
> 1 [usable: 1], InitProducerId(22): 0 to 1 [usable: 1],
> OffsetForLeaderEpoch(23): 0 to 3 [usable: 1], AddPartitionsToTxn(24): 0 to
> 1 [usable: 1], AddOffsetsToTxn(25): 0 to 1 [usable: 1], EndTxn(26): 0 to 1
> [usable: 1], WriteTxnMarkers(27): 0 [usable: 0], TxnOffsetCommit(28): 0 to
> 2 [usable: 1], DescribeAcls(29): 0 to 1 [usable: 1], CreateAcls(30): 0 to 1
> [usable: 1], DeleteAcls(31): 0 to 1 [usable: 1], DescribeConfigs(32): 0 to
> 2 [usable: 2], AlterConfigs(33): 0 to 1 [usable: 1],
> AlterReplicaLogDirs(34): 0 to 1 [usable: 1], 

Re: Java 11 Checkin again

2022-09-02 Thread Yi Pan
Hey, James,

Thanks for the ping. @prateek, can we have someone to review this change?

One question: have you tested the change w/ the older YARN cluster version
(running 2.10.1)? If this change requires YARN cluster upgrade to 3.3.4 as
well, that may be a breaking change to existing Samza users (i.e. LinkedIn
is still running a YARN cluster with version 2.10.1).

Best, and apologies for the delay.

-Yi

On Fri, Sep 2, 2022 at 8:56 AM James DeMichele
 wrote:

> Hey y'all. I just am not sure how to get some traction on these Java 11
> PRs.
>
> https://github.com/apache/samza/pull/1628
> https://github.com/apache/samza-hello-samza/pull/87
>
> Would someone that is a maintainer for Samza just let us know that y'all
> are looking at them? I can stop pestering you :)
>
> I ran all tests in both PRs, all pass. I also confirmed that using my Samza
> PR in the Hello World app all works with Java 11.
>
> Thanks!
>
> -Jamie
>


Re: Java 11 Checkin again

2022-09-19 Thread Yi Pan
Hi, James,

Thanks a lot for reporting this. I will take a look this week.

Best!

-Yi

On Mon, Sep 19, 2022 at 8:20 AM James DeMichele
 wrote:

> Hey Yi. I can take a look at this. I do want to point out that your current
> "master" branch is actually broken for using Scala 2.11.
>
> You can repro by just going into the master branch using Java 8 and
> compiling like this
>
> $ java -version
> openjdk version "1.8.0_332"
> OpenJDK Runtime Environment (Temurin)(build 1.8.0_332-b09)
> OpenJDK 64-Bit Server VM (Temurin)(build 25.332-b09, mixed mode)
>
> ./gradlew build -PscalaSuffix=2.11
>
> The build fails with this command using that version of Java 8 ^.
>
> Anyway, just wanted to point that out since I hit this in my branch
> trying to utilize the "bin/check-all.sh" script. That doesn't block
> me/us, but just wanted to call it out.
>
> -Jamie
>

Re: Java 11 Checkin again

2022-09-20 Thread Yi Pan
Hi, James,

Sorry that I was busy during the day and couldn't check your email. I have
joined the slack channel you created. Let's discuss there.

Best,

-Yi

On Mon, Sep 19, 2022 at 9:23 AM James DeMichele
 wrote:

> Hey, Yi! Thanks.
>
> I started a slack channel that maybe would make it easier to communicate if
> I have questions. I do have one issue that I am hitting between the 2
> different Yarn versions I think and that I am not entirely sure what to do
> about. I made a change to a Test class that was needed for a
> compilation fix:
>
> https://github.com/apache/samza/pull/1628/files#diff-34db8b18730bda1058014e87ec3ad88dfc03f79854b00a339407761d224f66e9
> and the class is TestSamzaYarnAppMasterLifecycle.scala.
>
> I'm not sure how to go about toggling this class between a compatible one
> for yarn 2.10.1 and 3.3.4.
>
> Thanks!
>
> -Jamie
>

Re: Java 11 Checkin again

2022-09-14 Thread Yi Pan
Hi, James,

Sorry to reply late. I just came back from a trip and had a discussion with
our internal team as well. So, there is one proposal other than creating a
branch. Let me elaborate below:
a) create a new module samza-yarn3 that depends on YARN 3.3.0 and is the
hosting module for most of the jdk11 related changes.
b) modify the build script s.t. samza-yarn3 will only compile and build
with jdk11 and samza-yarn will only compile and build with jdk8.
Thus, we can have two builds: a jdk8 build that builds with samza-yarn w/
YARN 2.10.0, and a jdk11 build that builds with samza-yarn3 w/ YARN 3.3.0. We
can manage to publish both jdk8 and jdk11 artifacts if needed.
The benefit of this approach is that we can still maintain the trunk
release while opening up the jdk11 support.
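
The selection logic in this proposal can be sketched as follows. This is an illustration only: Samza's build is driven by Gradle, so the Python below just models the decision (JDK version in, module and YARN version out). The table layout and function names are hypothetical; the module names and YARN versions are the ones given in the proposal.

```python
import re

# Mapping taken from the proposal; the dictionary layout is illustrative.
BUILD_PROFILES = {
    8: {"yarn_module": "samza-yarn", "yarn_version": "2.10.0"},
    11: {"yarn_module": "samza-yarn3", "yarn_version": "3.3.0"},
}


def java_major_version(version_string):
    """Map a version string such as "1.8.0_332" or "11.0.16" to its major version."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version_string)
    if match is None:
        raise ValueError(f"unrecognized Java version: {version_string!r}")
    first, second = match.group(1), match.group(2)
    # Java 8 and earlier report themselves as 1.x; later releases lead with the major number.
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def select_build_profile(version_string):
    """Return the module/YARN pairing for a JDK, or fail for unsupported JDKs."""
    major = java_major_version(version_string)
    if major not in BUILD_PROFILES:
        raise ValueError(f"no build profile defined for Java {major}")
    return BUILD_PROFILES[major]
```

Failing loudly for any other JDK keeps the two builds from silently mixing modules, which is the point of restricting each module to one JDK.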

Let me know if that works for you and we can work together to get the code
in.

Best!

-Yi

On Tue, Sep 13, 2022 at 7:53 AM James DeMichele
 wrote:

> Hey Yi, I wanted to follow up here and figure what a path forward is here.
> We need to move to Java 11, and Samza currently is our only blocking issue.
> In order to move to Java 11, the Yarn Cluster would need to run on Java 11
> correct? If that's the case, then it would need to be 3.3+. I don't know
> what it entails on your end to have a new Major version, but that seems
> like a good option here right? Version 2 could be where we can move this
> project forward to Java 11, while Version 1 can still remain, and would not
> break people that can't/won't upgrade to Java 11.
>
> -Jamie
>
> On Tue, Sep 6, 2022 at 9:55 AM James DeMichele wrote:
>
> > Yeah I mean if Samza works fine with the hadoop-yarn library running
> > against a 3.3.x YARN cluster, then I don't mind keeping that library of
> > 2.10.x in Samza's code. But it is still a moot point in terms of upgrading
> > your YARN cluster, since it must be upgraded to 3.3.x+ in order to be able
> > to run the Cluster with Java 11.
> >
> > @Yi, I think that moving to a new major version might be the solution
> > here. That way LinkedIn can still have a pathway of upgrading code for the
> > old legacy 1.x version of Samza. While a new major version of 2.x of Samza
> > could then make it a requirement that it runs with a YARN cluster of 3.3.x
> > if you want to use Java 11.
> >
> > The only issue there is that you'll probably need to backport changes
> > between the 2 versions. But in all honesty, this project does not look
> > extremely active with commits so it might not be that big of a problem.
> >
> > -Jamie
> >
> > On Fri, Sep 2, 2022 at 9:08 PM Malcolm McFarland wrote:
> >
> >> Hi all,
> >>
> >> I've been doing a little bit of testing with Samza and Hadoop 3.3.4;
> >> afaict, in light testing, Samza seems to work fine using the 2.10.x
> >> hadoop-yarn library against a YARN cluster running 3.3.x. As Jamie pointed
> >> out, YARN didn't incorporate Java 11 compatibility until v3.3.0
> >> (https://hadoop.apache.org/docs/r3.3.0/index.html). Are there any unit tests
> >> in Samza that verify compatibility against a YARN cluster? If so, that
> >> could be a place to validate YARN v2.10/v3.3 cross-compatibility.
> >>
> >> Just throwing my 2 cents out there,
> >> Malcolm McFarland
> >> Cavulus
> >>
> >> On Fri, Sep 2, 2022 at 6:27 PM James DeMichele
> >>  wrote:
> >>
> >> > Hey Yi,
> >> >
> >> > Thanks for getting back to me. I have not tried the older yarn cluster
> >> > version yet in the Samza app running against 3.3.4 but I am wary it would
> >> > work. Yarn itself is not compatible at 2.10.1 with Java 11 so you would
> >> > have to update yarn even if the Java library here wasn't updated.
> >> >
> >> > Could we move this version I'm proposing to a 2.x version of Samza? So
> >> > people that wanted to move forward with yarn upgrade and Samza and Java 11
> >> > (like us) could do so? Then 1.x could only be java 8 compatible and 2.x
> >> > could be java 11.
> >> >
> >> > Jamie

Re: Java 11 Checkin again

2022-09-14 Thread Yi Pan
Hey, James,

In order to merge your PR without breaking the jdk8 older modules, we will
need the changes proposed here. Can you try to add those build script
changes in the same PR? We will definitely help review and merge it.

Best!

-Yi

On Wed, Sep 14, 2022 at 8:00 AM James DeMichele
 wrote:

> Also, do you have a timeline for when this could be completed? Thanks.
>
> On Wed, Sep 14, 2022 at 7:16 AM James DeMichele <
> james.demich...@redfin.com>
> wrote:
>
> > That sounds like a great solution to me if that works for y'all!
> >
> > Note too, the Java 11 and yarn 3 module need to only use the Scala 2.12
> > version of the build.
> >
> > Jamie
> >
> >
> > On Wed, Sep 14, 2022, 2:38 AM Yi Pan  wrote:
> >
> >> Hi, James,
> >>
> >> Sorry to reply late. I just came back from a trip and had a discussion
> >> with our internal team as well. So, there is one proposal other than
> >> creating a branch. Let me elaborate below:
> >> a) create a new module samza-yarn3 that depends on YARN 3.3.0 and is the
> >> hosting module for most of the jdk11 related changes.
> >> b) modify the build script s.t. samza-yarn3 will only compile and build
> >> with jdk11 and samza-yarn will only compile and build with jdk8.
> >> Thus, we can have two builds: a jdk8 build that builds with samza-yarn w/
> >> YARN 2.10.0, and a jdk11 build that builds with samza-yarn3 w/ YARN 3.3.0.
> >> We can manage to publish both jdk8 and jdk11 artifacts if needed.
> >> The benefit of this approach is that we can still maintain the trunk
> >> release while opening up the jdk11 support.
> >>
> >> Let me know if that works for you and we can work together to get the
> >> code in.
> >>
> >> Best!
> >>
> >> -Yi

Re: Request for new release

2022-10-15 Thread Yi Pan
Hi, James,

Thanks for the reminder. We are preparing the new 1.8 release. It is
expected by the end of this quarter.

Best,

-Yi

On Mon, Oct 10, 2022 at 12:52 PM James DeMichele
 wrote:

> Hello, we just had a pr merged to main in the Samza app that now supports
> Java 11 runtime environments.
>
> Could we get a new official release of this project?
>
> Here's the pr: https://github.com/apache/samza/pull/1628
>
> Jamie
>


Re: [VOTE] Apache Samza 1.8.0 RC0

2023-01-06 Thread Yi Pan
(+1) binding,

Downloaded the src tarball and ran check-all.sh; all tests passed.

One thing I noticed: configurations for your personal keys, used for
publishing the jars, are checked in to gradle.properties. I don't think that
we need to include those in the published src tarball.

Otherwise, lgtm.

Thanks a lot!

-Yi

On Wed, Dec 21, 2022 at 4:40 PM Xinyu Liu  wrote:

> +1 (binding).
>
> Verified the md5 and sha1 checksums. Ran check-all.sh on Linux, and the
> build/tests all passed. Please also generate the sha256 checksums for the
> release artifacts, according to Apache's requirements for open source
> releases.
>
> Thanks,
> Xinyu
>
> On Wed, Dec 21, 2022 at 9:47 AM Ajo Thomas  wrote:
>
> > Hey All,
> >
> > This is a call for a vote on the release of *Apache Samza 1.8.0.*
> > Thanks to everyone who contributed to this release.
> >
> > The release candidate can be downloaded from here:
> > https://home.apache.org/~ajothomas/samza-1.8.0-rc0/
> > The release candidate is signed with pgp key *1A4639DA*, which is included
> > in the repository's KEYS file:
> > https://git-wip-us.apache.org/repos/asf?p=samza.git;a=blob_plain;f=KEYS
> > and can also be found on keyservers:
> >
> > https://keyserver.ubuntu.com/pks/lookup?search=ajothomas%40apache.org=on=index
> >
> > The git tag is *release-1.8.0-rc0* and signed with the same pgp key:
> >
> > https://gitbox.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-1.8.0-rc0
> >
> > Test binaries have been published to Maven's staging repository, and are
> > available here:
> > URL: https://repository.apache.org/#stagingRepositories
> > 
> > Repository: orgapachesamza-1095 (org.apache.samza)
> >
> > Please download the release candidate, check the hashes/signature, build it
> > and test it, and then please vote:
> > [ ] +1 approve
> > [ ] +0 no opinion
> > [ ] -1 disapprove (and reason why)
> >
> > Please note that check-all.sh was run and integration tests *were not* run
> > as there are some issues with the legacy zopkio library used for
> > integration testing. However, most of the key features being released as a
> > part of this release have been tested and are currently used in many of our
> > production jobs at LinkedIn. hadoop/yarn3 changes have been tested with
> > https://github.com/apache/samza-hello-samza which brings up yarn 3,
> > zookeeper and kafka clusters locally for testing.
> >
> > Thanks,
> > Ajo
> >
>
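The sha256 request in the quoted reply can be illustrated with a minimal sketch. It assumes each artifact gets a sibling file named `<artifact>.sha256` holding the hex digest followed by the file name; that layout and the helper names below are illustrative assumptions, not the project's actual release tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksum_file(artifact):
    """Write `<artifact>.sha256` next to the artifact (assumed layout) and return its path."""
    artifact = Path(artifact)
    out = artifact.with_name(artifact.name + ".sha256")
    out.write_text(f"{sha256_of(artifact)}  {artifact.name}\n")
    return out


def verify_checksum_file(artifact):
    """Compare the artifact's digest with the one recorded in `<artifact>.sha256`."""
    artifact = Path(artifact)
    checksum_file = artifact.with_name(artifact.name + ".sha256")
    recorded = checksum_file.read_text().split()[0]
    return recorded == sha256_of(artifact)
```

A release manager would call `write_checksum_file` once per artifact, and a voter would call `verify_checksum_file` after downloading both files.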


Re: [ANNOUNCE] Welcome Ajo Thomas as Samza Committer

2022-12-16 Thread Yi Pan
Welcome and congrats, Ajo!

- Yi

On Wed, Dec 14, 2022 at 3:42 PM Xinyu Liu  wrote:

> Hi, All,
>
> I am glad to announce that Ajo Thomas has officially accepted our
> invitation and become an Apache Samza Committer now.
>
> Ajo has made contributions to improve both Samza user experience and
> operability greatly. He added the partial update functionality to Samza
> Table API to allow field-level updates to stores. He developed the
> “Pipeline Drain” feature for cleaning up intermediate data and state before
> introducing backward incompatible changes. He is also actively working on
> the next release of Samza 1.8.
>
> Considering his contributions, the Samza PMC trusts Ajo with the
> responsibilities of a Samza Committer.
>
> Please join me to give him a warm welcome!
>
> Xinyu Liu
> on behalf of the Apache Samza PMC
>


Re: SEP-31: Pipeline Drain: Support the ability to drain pipelines to allow incompatible intermediate schema changes

2022-12-08 Thread Yi Pan
As discussed offline, and having seen the clarifications in the SEP, +1 (binding)

On Fri, Dec 2, 2022 at 8:05 AM Ajo Thomas  wrote:

> Hi Yi,
>
> Currently, for every source SSP (all input SSPs minus intermediate SSPs), we
> insert an infinity watermark followed by a drain control message into the
> in-memory buffer in SystemConsumers. Prior to this step, we also
> stop calling refresh in Chooser to make sure that the last messages in the
> in-memory SSP buffer are the watermark and drain messages.
> The infinity watermark is essentially tasked with flushing windows and
> triggers.
> The drain message signals to the processing logic that it is the
> last message for the SSP and that it should shut down. We track the SSPs that
> have received this token in a task. Once all SSPs have been drained, the task
> is marked ready to shut down. Once all tasks are ready to shut down, RunLoop
> shuts down.
>
> Do you see any issues with it ?
>
> - Ajo
>
>
> On Thu, 1 Dec 2022 at 20:06, Yi Pan  wrote:
>
> > Hi, Ajo,
> >
> > Sorry to reply this late. Could you clarify one thing in the design: for
> > watermark-triggered window draining, does the infinite watermark trigger
> > happen first, or does the drain token in all source SSPs happen first?
> > Shouldn't it be the following sequence: a) all drain tokens from all input
> > source SSPs (except for intermediate streams) are received by tasks ==>
> > b) the infinite watermark triggers from the source and flushes all
> > windows/triggers in the pipeline ==> c) once the infinite watermark has
> > propagated through all stages in the pipeline, the tasks stop. Could you
> > confirm?
> >
> > Thanks a lot!
> >
> > -Yi
> >
> > > On Thu, Nov 17, 2022 at 9:48 AM Ajo Thomas wrote:
> >
> > > Hi All,
> > >
> > > > Samza currently doesn't have a way to gracefully drain pipelines before
> > > > making a backward-incompatible intermediate schema change. We have added a
> > > > feature called Pipeline Drain to the samza engine to address this problem.
> > > > Here is the SEP page for it:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-31%3A+Pipeline+Drain%3A+Support+the+ability+to+drain+pipelines+to+allow+incompatible+intermediate+schema+changes
> > >
> > >
> > > > If there are no major blockers, we are tentatively seeking to open a vote
> > > > on Monday, Nov 28th, 2022.
> > >
> > > Thanks,
> > > Ajo
> > >
> >
>
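The per-task bookkeeping described in the quoted reply (track which SSPs have received the drain message, mark a task ready once all of its SSPs are drained, stop once every task is ready) can be sketched roughly as follows. The class and method names are hypothetical and are not taken from the Samza code base; the sketch only models the counting logic, not watermark handling or the real RunLoop.

```python
class DrainTracker:
    """Hypothetical bookkeeping of which source SSPs each task still waits on."""

    def __init__(self, task_ssps):
        # task_ssps: mapping of task name -> iterable of source SSP names
        self._pending = {task: set(ssps) for task, ssps in task_ssps.items()}

    def on_drain_message(self, task, ssp):
        """Record that `task` has received the drain message for `ssp`."""
        self._pending[task].discard(ssp)

    def task_ready(self, task):
        """A task is ready to shut down once all of its SSPs are drained."""
        return not self._pending[task]

    def all_tasks_ready(self):
        """The run loop may stop once every task is ready to shut down."""
        return all(not pending for pending in self._pending.values())
```

In this sketch a run loop would call `on_drain_message` as each drain message is processed and exit when `all_tasks_ready()` returns true.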


Re: [VOTE] SEP-31: Pipeline Drain- Support the ability to drain pipelines to allow incompatible intermediate schema changes

2022-12-08 Thread Yi Pan
+1. Long awaited feature! Thanks!

-Yi

On Tue, Nov 29, 2022 at 11:46 AM Xinyu Liu  wrote:

> +1.
>
> Overall the design looks good. Thanks for contributing to this feature.
>
> Thanks,
> Xinyu
>
> On Tue, Nov 29, 2022 at 10:44 AM Ajo Thomas 
> wrote:
>
> > Hi All,
> >
> > This is a call for a vote on *SEP-31: Pipeline Drain- Support the ability
> > to drain pipelines to allow incompatible intermediate schema changes.*
> > Thanks to everyone involved with the design and reviews to refine the
> > proposal.
> >
> > Discuss Email Thread:
> > https://lists.apache.org/thread/7m2hqcqq9lx9o1d48gb64glplb3g2crt
> >
> > SEP-31:
> >
> >
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-31%3A+Pipeline+Drain-+Support+the+ability+to+drain+pipelines+to+allow+incompatible+intermediate+schema+changes
> >
> > Jira ticket:
> > https://issues.apache.org/jira/browse/SAMZA-2741
> >
> > Please vote:
> > [ ] +1 approve
> > [ ] +0 no opinion
> > [ ] -1 disapprove (and reason why)
> >
> > Thanks,
> > Ajo
> >
>


Re: SEP-31: Pipeline Drain: Support the ability to drain pipelines to allow incompatible intermediate schema changes

2022-12-01 Thread Yi Pan
Hi, Ajo,

Sorry to reply this late. Could you clarify one thing in the design: for
watermark-triggered window draining, does the infinite watermark trigger
happen first, or does the drain token in all source SSPs happen first?
Shouldn't it be the following sequence: a) all drain tokens from all input
source SSPs (except for intermediate streams) are received by tasks ==>
b) the infinite watermark triggers from the source and flushes all
windows/triggers in the pipeline ==> c) once the infinite watermark has
propagated through all stages in the pipeline, the tasks stop. Could you
confirm?

Thanks a lot!

-Yi

On Thu, Nov 17, 2022 at 9:48 AM Ajo Thomas  wrote:

> Hi All,
>
> Samza currently doesn't have a way to gracefully drain pipelines before
> making a backward-incompatible intermediate schema change. We have added a
> feature called Pipeline Drain to the samza engine to address this problem.
> Here is the SEP page for it:
>
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-31%3A+Pipeline+Drain%3A+Support+the+ability+to+drain+pipelines+to+allow+incompatible+intermediate+schema+changes
>
>
> If there are no major blockers, we are tentatively seeking to open a vote
> on Monday, Nov 28th, 2022.
>
> Thanks,
> Ajo
>


Re: [DISCUSS] SEP-32: Elasticity for Samza

2023-01-19 Thread Yi Pan
Hey, Manasa,

Sorry to chime in late. A few questions:
a) how are states for the virtual tasks managed during split/merge?
b) what's perf impact when we have 2 virtual tasks on the same SSP in the
same container, while one virtual task is much faster than the other?
c) what's the reason that a virtual task can not filter older messages from
a previous offset, in case the container restarts from a smaller offset
from another virtual task consuming the same SSP?
d) how do we compare this w/ an alternative idea that implements a
KeyedOrderedExecutor w/ multiple parallel threads within the single task's
main event loop to increase the parallelism?

Best,

-Yi


On Thu, Jan 19, 2023 at 3:26 PM Lakshmi Manasa 
wrote:

> hi all,
>
>  if there are no concerns or questions about this SEP, I shall start the
> vote email thread tomorrow.
>
> thanks,
> Manasa
>
> On Fri, Jan 6, 2023 at 8:08 AM Lakshmi Manasa 
> wrote:
>
> > Hi all,
> >   We created SEP-32: Elasticity for Samza.
> >
> > Please find SEP here (
> >
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-32%3A+Elasticity+for+Samza
> > )
> >   Please take a look and provide feedback. thanks, Manasa
> >
>
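The alternative raised in question d) above, a keyed ordered executor, can be sketched as follows. This is an illustrative assumption about what such an executor might look like, not an existing Samza class: each key is hashed to one single-threaded lane, so work for the same key runs in submission order while different keys can run in parallel.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


class KeyedOrderedExecutor:
    """Hypothetical executor that preserves per-key ordering across parallel lanes."""

    def __init__(self, parallelism):
        # One single-threaded pool per lane; each lane runs its tasks in FIFO order.
        self._lanes = [ThreadPoolExecutor(max_workers=1) for _ in range(parallelism)]

    def submit(self, key, fn, *args):
        # A stable hash keeps every message for a given key on the same lane.
        lane = zlib.crc32(key.encode("utf-8")) % len(self._lanes)
        return self._lanes[lane].submit(fn, *args)

    def shutdown(self):
        # Wait for all queued work on every lane to finish.
        for lane in self._lanes:
            lane.shutdown(wait=True)
```

All lanes in this sketch live in one process, which is the trade-off the question asks about: parallelism goes up, but the work stays on a single host.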


[REPORT] Samza - Nov 2022

2022-11-09 Thread Yi Pan
## Description:
The mission of Samza is the creation and maintenance of software related to
a distributed stream processing framework

## Issues:
- There are no issues requiring board attention.

## Membership Data:
Apache Samza was founded 2015-01-22 (8 years ago)
There are currently 29 committers and 17 PMC members in this project.
The Committer-to-PMC ratio is roughly 8:5.

Community changes, past quarter:
- No new PMC members. Last addition was Bharath Kumarasubramanian on
2020-02-13.
- No new committers. Last addition was Daniel Chen on 2021-09-17.

## Project Activity:
- Samza upgrade to be runtime compatible w/ Java 11 and YARN 3.3
- Stream Processing Meetup@LinkedIn on Kafka, Samza held on 2022-10-19

## Community Health:
JIRA:
13 issues opened in JIRA, past quarter (no change)
10 issues closed in JIRA, past quarter (233% increase)
Commits:
11 commits in the past quarter (-50% change)
7 code contributors in the past quarter (-36% change)
16 PRs opened on GitHub, past quarter (-23% change)
13 PRs closed on GitHub, past quarter (-35% change)


Re: [DISCUSS] SEP-32: Elasticity for Samza

2023-02-06 Thread Yi Pan
ty as virtual tasks can be spread across hosts whereas
>> increased throughput due to all keys (single task) in key ordered executor
>> sitting in the same host will increase the load on the host and (c) if one
>> or more of the parallel units (threads here) needs more resources, it will
>> result in large container which makes scheduling harder as finding large
>> chunks takes longer in a cluster whereas with virtual tasks, we can have
>> smaller containers for virtual tasks.
>>
>>
>> Please let me know if the above answers make sense and if there are any
>> follow-ups for this SEP.
>>
>> On Thu, Jan 19, 2023 at 10:33 PM Yi Pan  wrote:
>>
>>> Hey, Manasa,
>>>
>>> Sorry to chime in late. A few questions:
>>> a) how are states for the virtual tasks managed during split/merge?
>>> b) what's perf impact when we have 2 virtual tasks on the same SSP in the
>>> same container, while one virtual task is much faster than the other?
>>> c) what's the reason that a virtual task can not filter older messages
>>> from
>>> a previous offset, in case the container restarts from a smaller offset
>>> from another virtual task consuming the same SSP?
>>> d) how do we compare this w/ an alternative idea that implements a
>>> KeyedOrderedExecutor w/ multiple parallel threads within the single
>>> task's
>>> main event loop to increase the parallelism?
>>>
>>> Best,
>>>
>>> -Yi
>>>
>>>
>>> On Thu, Jan 19, 2023 at 3:26 PM Lakshmi Manasa <
>>> lakshmimanas...@gmail.com>
>>> wrote:
>>>
>>> > hi all,
>>> >
>>> >  if there are no concerns or questions about this SEP, I shall start
>>> the
>>> > vote email thread tomorrow.
>>> >
>>> > thanks,
>>> > Manasa
>>> >
>>> > On Fri, Jan 6, 2023 at 8:08 AM Lakshmi Manasa <
>>> lakshmimanas...@gmail.com>
>>> > wrote:
>>> >
>>> > > Hi all,
>>> > >   We created SEP-32: Elasticity for Samza.
>>> > >
>>> > > Please find SEP here (
>>> > >
>>> >
>>> https://cwiki.apache.org/confluence/display/SAMZA/SEP-32%3A+Elasticity+for+Samza
>>> > > )
>>> > >   Please take a look and provide feedback. thanks, Manasa
>>> > >
>>> >
>>>
>>


Re: [VOTE] SEP-32: Elasticity for Samza

2023-02-08 Thread Yi Pan
+1 (binding)

Thanks!

-Yi

On Tue, Feb 7, 2023 at 2:14 PM Bharath Kumara Subramanian <
codin.mart...@gmail.com> wrote:

> +1 (binding)
>
> Cheers,
> Bharath
>
> On Tue, Feb 7, 2023 at 12:56 PM Lakshmi Manasa 
> wrote:
>
> > Hi folks,
> >
> >  This is a call for vote on SEP-32: Elasticity for Samza.
> > Thank you for reviewing the SEP and giving feedback.
> >
> > I have addressed the comments on the SEP and since there were three +1 on
> > the discuss thread, starting this vote.
> >
> > Discussion thread:
> > https://lists.apache.org/thread/vjtl5fnf64kpkoxc591466y92dlt2bsb
> >
> > SEP:
> >
> >
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-32%3A+Elasticity+for+Samza
> >
> > Please vote:
> > [ ] +1 approve
> > [ ] +0 no opinion
> > [ ] -1 disapprove (and reason why)
> >
> > thanks,
> > Manasa
> >
>


Re: Review Request 30287: Stream SQL Object Model Draft

2015-01-27 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30287/#review69927
---


The examples look clean and neat. I really liked it. One high-level overview 
question: how do we map the operator classes to this intermediate layer of OM? 
That would probably help to understand how to translate from this OM to the 
operators.


samza-sql/src/main/java/org/apache/samza/sql/om/ArithmeticExpression.java
https://reviews.apache.org/r/30287/#comment114764

Don't quite get the use of operands here. From the enum 
type, it seems that ArithmeticExpression is a binary operator with 
lhs and rhs. Not sure what the intention is in having 
the additional list of operands here?



samza-sql/src/main/java/org/apache/samza/sql/om/FromExpression.java
https://reviews.apache.org/r/30287/#comment114766

This FromExpression is confusing to me. IMO, the FROM clause should just take 
one expression as a parameter, which defines a data source. It seems to me that 
FROM t1 JOIN t2 ON conditions should be considered as FROM JOIN expression, 
in which the JOIN expression has two operands, t1 and t2, and the condition is a 
compound logic expression following ON. Maybe FROM should not be an 
expression. It simply signifies a data source in the statement.



samza-sql/src/main/java/org/apache/samza/sql/om/PartitionByExpression.java
https://reviews.apache.org/r/30287/#comment114770

Conceptually, I would prefer to differentiate clause from expression. 
IMO, the hierarchy should be: statement => clauses => expressions => operands.



samza-sql/src/main/java/org/apache/samza/sql/om/Stream.java
https://reviews.apache.org/r/30287/#comment114776

Following Chris' comment on remote streams, it would be better if we define a canonical naming 
scheme for all remote and local streams. I.e., all 
streams that are in memory are named as local:astream and all remote streams 
are named as kafka:bstream.
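
A minimal sketch of the naming idea above, assuming names take the form `<system>:<stream>` as in local:astream and kafka:bstream. The helper names are hypothetical and are not part of the proposed object model.

```python
from typing import NamedTuple


class StreamName(NamedTuple):
    system: str  # e.g. "local" or "kafka"
    stream: str  # e.g. "astream"


def parse_stream_name(name):
    """Split a canonical '<system>:<stream>' name into its two parts."""
    system, sep, stream = name.partition(":")
    if not sep or not system or not stream:
        raise ValueError(f"expected '<system>:<stream>', got {name!r}")
    return StreamName(system, stream)


def is_local(name):
    """In this sketch, in-memory streams are the ones under the 'local' system."""
    return parse_stream_name(name).system == "local"
```

With one canonical form, code that needs to distinguish in-memory streams from remote ones only has to inspect the system prefix.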



samza-sql/src/main/java/org/apache/samza/sql/om/TimeVaryingRelation.java
https://reviews.apache.org/r/30287/#comment114777

I had a feeling that the expression classes may look better organized if we 
categorize the expressions based on the output entity that they represent. i.e. 
essentially, window expression and join expression are expressions that all 
generate a table-type data source entity.


- Yi Pan (Data Infrastructure)


On Jan. 26, 2015, 10:04 p.m., Milinda Pathirage wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/30287/
 ---
 
 (Updated Jan. 26, 2015, 10:04 p.m.)
 
 
 Review request for samza, Chris Riccomini and Yi Pan (Data Infrastructure).
 
 
 Repository: samza
 
 
 Description
 ---
 
 WIP: Stream SQL Object Model 
 
 Overview of the design:
 
 There are three main types of objects in this model
 
 * Expression: Can be a reference to a column of a table or a field of a tuple 
 or a property of a nested data structure, an arithmetic expression, a logical 
 expression, a function or expressions that can be used in where clauses, 
 having clauses, field based window clauses  
 * DataSource: Data source can be a stream, a table or a time varying relation 
 resulting from applying a window operator to a stream. (**I'm not sure whether 
 DataSource is the correct naming. We need to discuss this.**)
 * Statement: Stream SQL statements like SELECT, INSERT and UPDATE
 
 Then we have a simple factory (**StreamSQLFactory**) which creates different 
 types of expressions. And I have implemented a fluent style API for 
 **Select** to demonstrate what building a query using this OM will look like. 
 Two example queries can be found in **StreamSQLSamples**.
 
 This is the first draft. Please feel free to comment on this. 
 
 
 Diffs
 -
 
   build.gradle 7a40ad4 
   gradle/wrapper/gradle-wrapper.properties 78596c0 
   samza-sql/src/main/java/org/apache/samza/sql/om/AbstractDataSource.java 
 PRE-CREATION 
   samza-sql/src/main/java/org/apache/samza/sql/om/AbstractExpression.java 
 PRE-CREATION 
   samza-sql/src/main/java/org/apache/samza/sql/om/AliasedExpression.java 
 PRE-CREATION 
   samza-sql/src/main/java/org/apache/samza/sql/om/ArithmeticExpression.java 
 PRE-CREATION 
   samza-sql/src/main/java/org/apache/samza/sql/om/CompareExpression.java 
 PRE-CREATION 
   samza-sql/src/main/java/org/apache/samza/sql/om/FieldReference.java 
 PRE-CREATION 
   samza-sql/src/main/java/org/apache/samza/sql/om/FromExpression.java 
 PRE-CREATION 
   samza-sql/src/main/java/org/apache/samza/sql/om/FunctionExpressions.java 
 PRE-CREATION 
   samza-sql/src/main/java/org/apache/samza/sql/om/Literal.java PRE-CREATION 
   samza-sql/src/main/java/org/apache/samza/sql/om/LogicalExpression.java 
 PRE-CREATION

Re: Review Request 29754: StreamSQL operator API draft

2015-01-30 Thread Yi Pan (Data Infrastructure)
 
  
samza-sql/src/main/java/org/apache/samza/sql/operators/stream/InsertStream.java 
PRE-CREATION 
  
samza-sql/src/main/java/org/apache/samza/sql/operators/stream/InsertStreamSpec.java
 PRE-CREATION 
  
samza-sql/src/main/java/org/apache/samza/sql/operators/window/BoundedTimeWindow.java
 PRE-CREATION 
  samza-sql/src/main/java/org/apache/samza/sql/operators/window/WindowSpec.java 
PRE-CREATION 
  
samza-sql/src/main/java/org/apache/samza/sql/operators/window/WindowState.java 
PRE-CREATION 
  samza-sql/src/main/java/org/apache/samza/sql/router/SimpleRouter.java 
PRE-CREATION 
  
samza-sql/src/main/java/org/apache/samza/task/sql/OperatorMessageCollector.java 
PRE-CREATION 
  samza-sql/src/main/java/org/apache/samza/task/sql/SqlMessageCollector.java 
PRE-CREATION 
  samza-sql/src/main/java/org/apache/samza/task/sql/StoreMessageCollector.java 
PRE-CREATION 
  samza-sql/src/test/java/org/apache/samza/task/sql/RandomOperatorTask.java 
PRE-CREATION 
  samza-sql/src/test/java/org/apache/samza/task/sql/StreamSqlTask.java 
PRE-CREATION 
  settings.gradle 3a01fd66359b8c79954ae8f34eeaf4b2e3fdc0b4 

Diff: https://reviews.apache.org/r/29754/diff/


Testing
---

run ./bin/check-all.sh passed


Thanks,

Yi Pan (Data Infrastructure)



Re: Review Request 31909: SAMZA-590

2015-03-11 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31909/#review76120
---

Ship it!


Ship It!

- Yi Pan (Data Infrastructure)


On March 10, 2015, 7:26 p.m., Chris Riccomini wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/31909/
 ---
 
 (Updated March 10, 2015, 7:26 p.m.)
 
 
 Review request for samza.
 
 
 Bugs: SAMZA-590
 https://issues.apache.org/jira/browse/SAMZA-590
 
 
 Repository: samza
 
 
 Description
 ---
 
 update test to check for refreshes as well
 
 
 add test
 
 
 abdicate all partitions in broker proxy when there's a consumer failure
 
 
 Diffs
 -
 
   samza-kafka/src/main/scala/org/apache/samza/system/kafka/BrokerProxy.scala 
 f768263961d395c5d21a857a7581e0c472bbe547 
   
 samza-kafka/src/test/scala/org/apache/samza/system/kafka/TestBrokerProxy.scala
  d559d8b276007ebb99b37b2ebf42c1b415e3fbce 
 
 Diff: https://reviews.apache.org/r/31909/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Chris Riccomini
 




Re: Review Request 32006: SAMZA-597

2015-03-13 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32006/#review76381
---



samza-log4j/src/main/java/org/apache/samza/config/Log4jSystemConfig.java
https://reviews.apache.org/r/32006/#comment123930

It seems that the default is set to false. Just curious: don't we always 
want to see the file location info when using StreamAppender? Or is it included 
somewhere else?



samza-log4j/src/main/java/org/apache/samza/config/Log4jSystemConfig.java
https://reviews.apache.org/r/32006/#comment123931

It seems that the default return value of this method has changed and it no 
longer defaults to LoggingEventStringSerdeFactory. It would be 
better to remove this Javadoc description here.



samza-log4j/src/main/java/org/apache/samza/logging/log4j/serializers/LoggingEventJsonSerde.java
https://reviews.apache.org/r/32006/#comment123934

nit: SerdeObject


- Yi Pan (Data Infrastructure)


On March 13, 2015, 12:57 a.m., Chris Riccomini wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/32006/
 ---
 
 (Updated March 13, 2015, 12:57 a.m.)
 
 
 Review request for samza.
 
 
 Repository: samza
 
 
 Description
 ---
 
 add docs
 
 
 fail if containerName is not set.
 
 
 update tests
 
 
 move location enabled config to log4j config file
 
 
 fix MDC link in docs
 
 
 add a logging event json serde for log4j, and set it as default.
 
 
 default to log4j string serde for now
 
 
 Diffs
 -
 
   build.gradle 08583e07f1c0bda88433bacb59bc2fd9ef6ce310 
   docs/learn/documentation/versioned/jobs/configuration-table.html 
 ec1287418042b95df73ff7c36a684d3123c46372 
   docs/learn/documentation/versioned/jobs/logging.md 
 af2fd0ea6929230cdc6bc3c51d9ae62adacb55fa 
   samza-log4j/src/main/java/org/apache/samza/config/Log4jSystemConfig.java 
 107ddf0c3d4e0f584a2f68a23debbada5f68dcb8 
   
 samza-log4j/src/main/java/org/apache/samza/logging/log4j/StreamAppender.java 
 4ef3551f470e77e27bd156e81ce96486f25c21bf 
   
 samza-log4j/src/main/java/org/apache/samza/logging/log4j/serializers/LoggingEventJsonSerde.java
  PRE-CREATION 
   
 samza-log4j/src/main/java/org/apache/samza/logging/log4j/serializers/LoggingEventJsonSerdeFactory.java
  PRE-CREATION 
   
 samza-log4j/src/test/java/org/apache/samza/config/TestLog4jSystemConfig.java 
 16ccb459892f62245648235eb65f53b26e8ecb87 
   
 samza-log4j/src/test/java/org/apache/samza/logging/log4j/TestStreamAppender.java
  3e4ddc9c72868e22f993f60015224cd3a153266c 
 
 Diff: https://reviews.apache.org/r/32006/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Chris Riccomini
 




Re: Review Request 32052: SAMZA-592

2015-03-13 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32052/#review76412
---



samza-kafka/src/main/scala/org/apache/samza/system/kafka/BrokerProxy.scala
https://reviews.apache.org/r/32052/#comment123960

nit: can we still keep this import in the original line?



samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemAdmin.scala
https://reviews.apache.org/r/32052/#comment123963

Just a question for my own understanding: it seems that we were 
throwing exceptions for partition metadata errors and we stopped doing so in the 
new code. Two questions here:
1. What would happen to the cache entry if we don't check 
partitionMetadata.errorCode?
2. Assuming that the errored partition metadata is not inserted in the 
cache, would getOffsets raise an exception? And how do we capture that case?


- Yi Pan (Data Infrastructure)


On March 13, 2015, 7:56 p.m., Chris Riccomini wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/32052/
 ---
 
 (Updated March 13, 2015, 7:56 p.m.)
 
 
 Review request for samza.
 
 
 Bugs: SAMZA-592
 https://issues.apache.org/jira/browse/SAMZA-592
 
 
 Repository: samza
 
 
 Description
 ---
 
 refresh topic metadata if partitions have bad error codes. add a test
 
 
 add a little test to verify we ignore replica not available exceptions
 
 
 remove partition metadata check from KafkaSystemAdmin since it's already done 
 in getOffsets
 
 
 switch to KafkaUtil.maybeThrowException
 
 
 Diffs
 -
 
   
 samza-kafka/src/main/scala/org/apache/samza/checkpoint/kafka/KafkaCheckpointManager.scala
  4a1b31f025ba7b05a7b46041aa8e12074599ce24 
   samza-kafka/src/main/scala/org/apache/samza/system/kafka/BrokerProxy.scala 
 c6e231a2588ce95940aa2da9483a98c6115e38d9 
   samza-kafka/src/main/scala/org/apache/samza/system/kafka/GetOffset.scala 
 147aabc947f0cb01c0780edb693e9714f810b5f6 
   
 samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemAdmin.scala
  b790be17cfe08da28220ffb381cbd618ebe25cf0 
   
 samza-kafka/src/main/scala/org/apache/samza/system/kafka/TopicMetadataCache.scala
  4a49d22a3fc403f624ca17a6414d84eaba1898be 
   samza-kafka/src/main/scala/org/apache/samza/util/KafkaUtil.scala 
 2482f23cc6b9c072651df9cbfe9714ffeb203687 
   
 samza-kafka/src/test/scala/org/apache/samza/system/kafka/TestKafkaSystemAdmin.scala
  3d1e6ecbb3fd95816c722a68c4f5907120eb20d0 
   
 samza-kafka/src/test/scala/org/apache/samza/system/kafka/TestTopicMetadataCache.scala
  e698d2f1f004740a4d74a488c469d8ca8426c6e4 
   samza-kafka/src/test/scala/org/apache/samza/utils/TestKafkaUtil.scala 
 PRE-CREATION 
   
 samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala
  a8b724bf781003142e455fdf1fed2f13d6c18353 
 
 Diff: https://reviews.apache.org/r/32052/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Chris Riccomini
 




Re: Review Request 32006: SAMZA-597

2015-03-13 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32006/#review76430
---

Ship it!


Ship It!

- Yi Pan (Data Infrastructure)


On March 13, 2015, 8:39 p.m., Chris Riccomini wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/32006/
 ---
 
 (Updated March 13, 2015, 8:39 p.m.)
 
 
 Review request for samza.
 
 
 Repository: samza
 
 
 Description
 ---
 
 fixing yi's feedback
 
 
 add docs
 
 
 fail if containerName is not set.
 
 
 update tests
 
 
 move location enabled config to log4j config file
 
 
 fix MDC link in docs
 
 
 add a logging event json serde for log4j, and set it as default.
 
 
 default to log4j string serde for now
 
 
 Diffs
 -
 
   build.gradle 08583e07f1c0bda88433bacb59bc2fd9ef6ce310 
   docs/learn/documentation/versioned/jobs/configuration-table.html 
 ec1287418042b95df73ff7c36a684d3123c46372 
   docs/learn/documentation/versioned/jobs/logging.md 
 af2fd0ea6929230cdc6bc3c51d9ae62adacb55fa 
   samza-log4j/src/main/java/org/apache/samza/config/Log4jSystemConfig.java 
 107ddf0c3d4e0f584a2f68a23debbada5f68dcb8 
   
 samza-log4j/src/main/java/org/apache/samza/logging/log4j/StreamAppender.java 
 4ef3551f470e77e27bd156e81ce96486f25c21bf 
   
 samza-log4j/src/main/java/org/apache/samza/logging/log4j/serializers/LoggingEventJsonSerde.java
  PRE-CREATION 
   
 samza-log4j/src/main/java/org/apache/samza/logging/log4j/serializers/LoggingEventJsonSerdeFactory.java
  PRE-CREATION 
   
 samza-log4j/src/test/java/org/apache/samza/config/TestLog4jSystemConfig.java 
 16ccb459892f62245648235eb65f53b26e8ecb87 
   
 samza-log4j/src/test/java/org/apache/samza/logging/log4j/TestStreamAppender.java
  3e4ddc9c72868e22f993f60015224cd3a153266c 
 
 Diff: https://reviews.apache.org/r/32006/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Chris Riccomini
 




Re: Review Request 32052: SAMZA-592

2015-03-13 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32052/#review76442
---

Ship it!


Ship It!

- Yi Pan (Data Infrastructure)


On March 13, 2015, 9:33 p.m., Chris Riccomini wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/32052/
 ---
 
 (Updated March 13, 2015, 9:33 p.m.)
 
 
 Review request for samza.
 
 
 Bugs: SAMZA-592
 https://issues.apache.org/jira/browse/SAMZA-592
 
 
 Repository: samza
 
 
 Description
 ---
 
 updating based on Ewen's feedback
 
 
 minor nit formatting fix for import
 
 
 refresh topic metadata if partitions have bad error codes. add a test
 
 
 add a little test to verify we ignore replica not available exceptions
 
 
 remove partition metadata check from KafkaSystemAdmin since it's already done 
 in getOffsets
 
 
 switch to KafkaUtil.maybeThrowException
 
 
 Diffs
 -
 
   
 samza-kafka/src/main/scala/org/apache/samza/checkpoint/kafka/KafkaCheckpointManager.scala
  4a1b31f025ba7b05a7b46041aa8e12074599ce24 
   samza-kafka/src/main/scala/org/apache/samza/system/kafka/BrokerProxy.scala 
 c6e231a2588ce95940aa2da9483a98c6115e38d9 
   samza-kafka/src/main/scala/org/apache/samza/system/kafka/GetOffset.scala 
 147aabc947f0cb01c0780edb693e9714f810b5f6 
   
 samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemAdmin.scala
  b790be17cfe08da28220ffb381cbd618ebe25cf0 
   
 samza-kafka/src/main/scala/org/apache/samza/system/kafka/TopicMetadataCache.scala
  4a49d22a3fc403f624ca17a6414d84eaba1898be 
   samza-kafka/src/main/scala/org/apache/samza/util/KafkaUtil.scala 
 2482f23cc6b9c072651df9cbfe9714ffeb203687 
   
 samza-kafka/src/test/scala/org/apache/samza/system/kafka/TestKafkaSystemAdmin.scala
  3d1e6ecbb3fd95816c722a68c4f5907120eb20d0 
   
 samza-kafka/src/test/scala/org/apache/samza/system/kafka/TestTopicMetadataCache.scala
  e698d2f1f004740a4d74a488c469d8ca8426c6e4 
   samza-kafka/src/test/scala/org/apache/samza/utils/TestKafkaUtil.scala 
 PRE-CREATION 
   
 samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala
  a8b724bf781003142e455fdf1fed2f13d6c18353 
 
 Diff: https://reviews.apache.org/r/32052/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Chris Riccomini
 




Re: Review Request 32052: SAMZA-592

2015-03-13 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32052/#review76437
---

Ship it!


Ship It!

- Yi Pan (Data Infrastructure)


On March 13, 2015, 8:48 p.m., Chris Riccomini wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/32052/
 ---
 
 (Updated March 13, 2015, 8:48 p.m.)
 
 
 Review request for samza.
 
 
 Bugs: SAMZA-592
 https://issues.apache.org/jira/browse/SAMZA-592
 
 
 Repository: samza
 
 
 Description
 ---
 
 minor nit formatting fix for import
 
 
 refresh topic metadata if partitions have bad error codes. add a test
 
 
 add a little test to verify we ignore replica not available exceptions
 
 
 remove partition metadata check from KafkaSystemAdmin since it's already done 
 in getOffsets
 
 
 switch to KafkaUtil.maybeThrowException
 
 
 Diffs
 -
 
   
 samza-kafka/src/main/scala/org/apache/samza/checkpoint/kafka/KafkaCheckpointManager.scala
  4a1b31f025ba7b05a7b46041aa8e12074599ce24 
   samza-kafka/src/main/scala/org/apache/samza/system/kafka/BrokerProxy.scala 
 c6e231a2588ce95940aa2da9483a98c6115e38d9 
   samza-kafka/src/main/scala/org/apache/samza/system/kafka/GetOffset.scala 
 147aabc947f0cb01c0780edb693e9714f810b5f6 
   
 samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemAdmin.scala
  b790be17cfe08da28220ffb381cbd618ebe25cf0 
   
 samza-kafka/src/main/scala/org/apache/samza/system/kafka/TopicMetadataCache.scala
  4a49d22a3fc403f624ca17a6414d84eaba1898be 
   samza-kafka/src/main/scala/org/apache/samza/util/KafkaUtil.scala 
 2482f23cc6b9c072651df9cbfe9714ffeb203687 
   
 samza-kafka/src/test/scala/org/apache/samza/system/kafka/TestKafkaSystemAdmin.scala
  3d1e6ecbb3fd95816c722a68c4f5907120eb20d0 
   
 samza-kafka/src/test/scala/org/apache/samza/system/kafka/TestTopicMetadataCache.scala
  e698d2f1f004740a4d74a488c469d8ca8426c6e4 
   samza-kafka/src/test/scala/org/apache/samza/utils/TestKafkaUtil.scala 
 PRE-CREATION 
   
 samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala
  a8b724bf781003142e455fdf1fed2f13d6c18353 
 
 Diff: https://reviews.apache.org/r/32052/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Chris Riccomini
 




Re: Review Request 32407: SAMZA-571: add suppression interface for uncaught exceptions

2015-03-24 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32407/
---

(Updated March 25, 2015, 2:24 a.m.)


Review request for samza, Yan Fang, Chinmay Soman, Chris Riccomini, Navina 
Ramesh, and Naveen Somasundaram.


Changes
---

Revised: add more exception callback methods to ExceptionTask interface for 
each specific task lifecycle stage. Consolidate invocation of 
ExceptionTask.exception() and the configured suppression in a single place.


Bugs: SAMZA-571
https://issues.apache.org/jira/browse/SAMZA-571


Repository: samza


Description (updated)
---

[SAMZA-571] Adding task interface to allow customized handling of exceptions 
from user code in tasks

Just to add the first part: add ExceptionTask interface to allow user to add 
code to handle the exceptions from user code in the task's lifecycle: init(), 
process(), window(), and close().
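
As an illustration of the idea only, here is a minimal Java sketch of routing exceptions from user code through one place, with a callback that is told which lifecycle stage failed. The names Stage, StageExceptionCallback, and GuardedTask are hypothetical and are not taken from the patch; the actual ExceptionTask signatures are in the diff below.

```java
// Hypothetical lifecycle stages, mirroring init(), process(), window(), close().
enum Stage { INIT, PROCESS, WINDOW, CLOSE }

// Hypothetical callback: return true to suppress the exception and keep running.
interface StageExceptionCallback {
  boolean onException(Stage stage, Exception e);
}

// Hypothetical wrapper: the single place where user code is invoked and
// failures are handed to the callback.
final class GuardedTask {
  interface Body {
    void run() throws Exception;
  }

  private final StageExceptionCallback callback;

  GuardedTask(StageExceptionCallback callback) {
    this.callback = callback;
  }

  void run(Stage stage, Body body) {
    try {
      body.run();
    } catch (Exception e) {
      // Rethrow only if the callback declines to suppress the failure.
      if (!callback.onException(stage, e)) {
        throw new RuntimeException("Unhandled exception in " + stage, e);
      }
    }
  }
}
```

With this shape, the suppression decision lives in one method rather than being repeated at every call site.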


Diffs (updated)
-

  samza-api/src/main/java/org/apache/samza/task/ExceptionTask.java PRE-CREATION 
  samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 
9fc3b557bdcc2756a0ddfed6642deb529936b7a9 
  samza-core/src/main/scala/org/apache/samza/container/TaskInstance.scala 
be0b55ace5b4b9d29f42da17fabac93bb6a25605 
  
samza-core/src/main/scala/org/apache/samza/container/TaskInstanceExceptionHandler.scala
 99b729f129344dee7d324d5889a3964c331f6521 
  samza-core/src/test/scala/org/apache/samza/container/TestTaskInstance.scala 
54b4df84f47f818d62ac0361196567ad1f430fde 

Diff: https://reviews.apache.org/r/32407/diff/


Testing (updated)
---

Unit tests added. Pass with ./bin/check-all.sh


Thanks,

Yi Pan (Data Infrastructure)



Review Request 33142: [SAMZA-561] Review in progress

2015-04-13 Thread Yi Pan (Data Infrastructure)
 
  samza-test/src/main/python/samza_job_yarn_deployer.py 
38635ca5899c43fb61d6b4042e8543f0508fd41b 
  samza-test/src/main/python/tests/sql_tests.py PRE-CREATION 
  samza-test/src/main/resources/orders.avsc PRE-CREATION 
  samza-test/src/main/resources/orders.json PRE-CREATION 

Diff: https://reviews.apache.org/r/33142/diff/


Testing
---


Thanks,

Yi Pan (Data Infrastructure)



Re: Review Request 33146: New KeyValueStore Features

2015-04-21 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33146/#review81004
---



samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStore.scala
https://reviews.apache.org/r/33146/#comment131225

From the put() method below, null values seem to be impossible:

if (value == null) {
  db.remove(writeOptions, key)
  ...



samza-kv/src/main/java/org/apache/samza/storage/kv/KeyValueStore.java
https://reviews.apache.org/r/33146/#comment131197

The signatures of close() and flush() from AutoCloseable and 
Flushable are different from the existing function signatures. Wouldn't 
that cause backward-compatibility issues w/ the user code? I would prefer to do 
the minimum and just add the new methods, instead of changing the existing ones.



samza-kv/src/main/java/org/apache/samza/storage/kv/KeyValueStore.java
https://reviews.apache.org/r/33146/#comment131198

The methods seem to have been re-ordered, which makes it a bit difficult to 
identify the exact changes. Could you turn off the re-ordering in the format 
template so that we keep the change minimal?



samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStore.scala
https://reviews.apache.org/r/33146/#comment131242

This could be more concise with keys.filter(...) and misses.foreach(...)



samza-kv/src/main/scala/org/apache/samza/storage/kv/LoggedStore.scala
https://reviews.apache.org/r/33146/#comment131246

nit: can be concise as keys.foreach(key => collector.send(new 
OutgoingMessageEnvelope(systemStream, partitionId, key, null)))



samza-test/src/test/scala/org/apache/samza/storage/kv/TestKeyValueStores.scala
https://reviews.apache.org/r/33146/#comment131257

nit: prefer not to re-order the methods if not necessary.


- Yi Pan (Data Infrastructure)


On April 16, 2015, 10:43 a.m., Mohamed Mahmoud (El-Geish) wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/33146/
 ---
 
 (Updated April 16, 2015, 10:43 a.m.)
 
 
 Review request for samza.
 
 
 Bugs: SAMZA-647
 https://issues.apache.org/jira/browse/SAMZA-647
 
 
 Repository: samza
 
 
 Description
 ---
 
 * Adding new KeyValueStore methods: Map<K, V> getAll(List<K>), and void 
 deleteAll(List<K>).
 **Please note: Backwards incompatible API changes, config changes, or 
 library upgrades should only happen between major revision changes, or when 
 the major revision is 0. -- since the latter is true, I found adding 
 the new methods to the already-existing interface to be a better solution, 
 even though it breaks backward compatibility (please see the first iteration 
 for context).** Alternatively, to maintain backward compat., I would have 
 added a new contract and a new class for each class that implements 
 KeyValueStore (to implement the new contract and pass calls through to the 
 underlying store, whose type needs to change in the constructor to the new 
 contract).
 * Improved the javadoc of KeyValueStore to have the same voice, to add API 
 notes, and to follow the javadoc standards.
 * Making the KeyValueStore extend AutoCloseable and Flushable (since we can 
 use Java 1.7 now).
 * Removing stress tests from TestKeyValueStores.scala because:
 * * Unit tests are not meant to be stress-testing the system,
 * * The test load looked arbitrary,
 * * It wasn't measuring anything (just testing it doesn't crash),
 * * Stress testing requires extended periods of testing, and
 * * My machine, a MacBook Air, is not beefy enough to survive stress tests; 
 they should be run on a typical production machine and not a dev one -- a 
 flaky test is not a good test.
 * * There's a test class dedicated for KV stores' performance testing: 
 TestKeyValuePerformance
 * Last, but definitely not least, the main motivation behind this change: 
 Allowing RocksDbKeyValueStore to implement getAll(List<K>) to call 
 multiGet(List<K>); Preliminary tests showed that multiGet is at least 1.25x 
 faster per key than get (see 
 https://reviews.facebook.net/rROCKSDB4985a9f73b9fb8a0323fbbb06222ae1f758a6b1d).
 
 
 Diffs
 -
 
   
 samza-kv-inmemory/src/main/scala/org/apache/samza/storage/kv/inmemory/InMemoryKeyValueStore.scala
  217333c84c696c0cc1bc3eeabf1c4066a6e89795 
   
 samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStore.scala
  66c2a0dc2e38e21f951727a30f0987776ac52fe2 
   samza-kv/src/main/java/org/apache/samza/storage/kv/KeyValueStore.java 
 b708341abed15aaad34df5934f5f310bc1feb87a 
   samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStore.scala 
 61bb3f6acb080b653f8b11176538549738255acc 
   
 samza-kv/src/main/scala/org/apache/samza/storage/kv/KeyValueStorageEngine.scala

Re: Review Request 33146: New KeyValueStore Features

2015-04-26 Thread Yi Pan (Data Infrastructure)


 On April 24, 2015, 5:01 p.m., Yi Pan (Data Infrastructure) wrote:
  Ship It!
 
 Mohamed Mahmoud (El-Geish) wrote:
 I don't have access to commit. Can you please grant me access or commit 
 for me? Thanks!

Hi, Mohamed, I am going through all the tests w/ your patch. After all 
the tests are done, I will commit it for you. Thanks a lot!


- Yi


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33146/#review81497
---


On April 24, 2015, 4:59 p.m., Mohamed Mahmoud (El-Geish) wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/33146/
 ---
 
 (Updated April 24, 2015, 4:59 p.m.)
 
 
 Review request for samza.
 
 
 Bugs: SAMZA-647
 https://issues.apache.org/jira/browse/SAMZA-647
 
 
 Repository: samza
 
 
 Description
 ---
 
 * Adding new KeyValueStore methods: Map<K, V> getAll(List<K>), and void 
 deleteAll(List<K>).
 **Please note: Backwards incompatible API changes, config changes, or 
 library upgrades should only happen between major revision changes, or when 
 the major revision is 0. -- since the latter is true, I found adding 
 the new methods to the already-existing interface to be a better solution, 
 even though it breaks backward compatibility (please see the first iteration 
 for context).** Alternatively, to maintain backward compat., I would have 
 added a new contract and a new class for each class that implements 
 KeyValueStore (to implement the new contract and pass calls through to the 
 underlying store, whose type needs to change in the constructor to the new 
 contract).
 * Improved the javadoc of KeyValueStore to have the same voice, to add API 
 notes, and to follow the javadoc standards.
 * Removing stress tests from TestKeyValueStores.scala because:
 * * Unit tests are not meant to be stress-testing the system,
 * * The test load looked arbitrary,
 * * It wasn't measuring anything (just testing it doesn't crash),
 * * Stress testing requires extended periods of testing, and
 * * My machine, a MacBook Air, is not beefy enough to survive stress tests; 
 they should be run on a typical production machine and not a dev one -- a 
 flaky test is not a good test.
 * * There's a test class dedicated for KV stores' performance testing: 
 TestKeyValuePerformance
 * Last, but definitely not least, the main motivation behind this change: 
 Allowing RocksDbKeyValueStore to implement getAll(List<K>) to call 
 multiGet(List<K>); Preliminary tests showed that multiGet is at least 1.25x 
 faster per key than get (see 
 https://reviews.facebook.net/rROCKSDB4985a9f73b9fb8a0323fbbb06222ae1f758a6b1d).
 
 
 Diffs
 -
 
   
 samza-kv-inmemory/src/main/scala/org/apache/samza/storage/kv/inmemory/InMemoryKeyValueStore.scala
  217333c84c696c0cc1bc3eeabf1c4066a6e89795 
   
 samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStore.scala
  66c2a0dc2e38e21f951727a30f0987776ac52fe2 
   samza-kv/src/main/java/org/apache/samza/storage/kv/KeyValueStore.java 
 b708341abed15aaad34df5934f5f310bc1feb87a 
   samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStore.scala 
 61bb3f6acb080b653f8b11176538549738255acc 
   
 samza-kv/src/main/scala/org/apache/samza/storage/kv/KeyValueStorageEngine.scala
  3a23daf053f0b8dec3a7ec83a51c9c5527078a3b 
   
 samza-kv/src/main/scala/org/apache/samza/storage/kv/KeyValueStoreMetrics.scala
  79092b91c9498e55f1c4e28661b7280c6c19cef7 
   samza-kv/src/main/scala/org/apache/samza/storage/kv/LoggedStore.scala 
 26f4cd9cfef305546c85ef9330f3e8b8be5336f7 
   
 samza-kv/src/main/scala/org/apache/samza/storage/kv/NullSafeKeyValueStore.scala
  4f48cf490d6c1012591a602c0d29dcc71473090f 
   
 samza-kv/src/main/scala/org/apache/samza/storage/kv/SerializedKeyValueStore.scala
  531e8bef2069a77fa9ceab36fa738bbaa162fe8c 
   samza-test/src/main/config/perf/kv-perf.properties 
 33fcd8d1aea14ecea47bbadb24936f737feedb39 
   
 samza-test/src/test/scala/org/apache/samza/storage/kv/TestKeyValueStores.scala
  50dfc10bb053d74dba70fdbce0ef87609ba447ea 
 
 Diff: https://reviews.apache.org/r/33146/diff/
 
 
 Testing
 ---
 
 Unit-tested.
 
 
 Thanks,
 
 Mohamed Mahmoud (El-Geish)
 




Re: Review Request 33146: New KeyValueStore Features

2015-04-21 Thread Yi Pan (Data Infrastructure)


 On April 21, 2015, 6:49 p.m., Yi Pan (Data Infrastructure) wrote:
  samza-kv/src/main/java/org/apache/samza/storage/kv/KeyValueStore.java, line 
  33
  https://reviews.apache.org/r/33146/diff/2/?file=931566#file931566line33
 
  The signatures of close() and flush() from AutoCloseable and 
  Flushable are different from the existing function signatures. 
  Wouldn't that cause backward-compatibility issues w/ the user code? I would 
  prefer to do the minimum and just add the new methods, instead of changing 
  the existing ones.

Actually, reading @Chris's comments on SAMZA-425, I am starting to think that it might 
be time for us to add JDK7/8-only features in the code. But that would need 
a wider discussion and agreement. I would prefer to put this up for 
discussion or in a separate JIRA first, if you don't mind.


- Yi


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33146/#review81004
---


On April 16, 2015, 10:43 a.m., Mohamed Mahmoud (El-Geish) wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/33146/
 ---
 
 (Updated April 16, 2015, 10:43 a.m.)
 
 
 Review request for samza.
 
 
 Bugs: SAMZA-647
 https://issues.apache.org/jira/browse/SAMZA-647
 
 
 Repository: samza
 
 
 Description
 ---
 
 * Adding new KeyValueStore methods: Map<K, V> getAll(List<K>), and void 
 deleteAll(List<K>).
 **Please note: Backwards incompatible API changes, config changes, or 
 library upgrades should only happen between major revision changes, or when 
 the major revision is 0. -- since the latter is true, I found adding 
 the new methods to the already-existing interface to be a better solution, 
 even though it breaks backward compatibility (please see the first iteration 
 for context).** Alternatively, to maintain backward compat., I would have 
 added a new contract and a new class for each class that implements 
 KeyValueStore (to implement the new contract and pass calls through to the 
 underlying store, whose type needs to change in the constructor to the new 
 contract).
 * Improved the javadoc of KeyValueStore to have the same voice, to add API 
 notes, and to follow the javadoc standards.
 * Making the KeyValueStore extend AutoCloseable and Flushable (since we can 
 use Java 1.7 now).
 * Removing stress tests from TestKeyValueStores.scala because:
 * * Unit tests are not meant to be stress-testing the system,
 * * The test load looked arbitrary,
 * * It wasn't measuring anything (just testing it doesn't crash),
 * * Stress testing requires extended periods of testing, and
 * * My machine, a MacBook Air, is not beefy enough to survive stress tests; 
 they should be run on a typical production machine and not a dev one -- a 
 flaky test is not a good test.
 * * There's a test class dedicated for KV stores' performance testing: 
 TestKeyValuePerformance
 * Last, but definitely not least, the main motivation behind this change: 
 Allowing RocksDbKeyValueStore to implement getAll(List<K>) to call 
 multiGet(List<K>); Preliminary tests showed that multiGet is at least 1.25x 
 faster per key than get (see 
 https://reviews.facebook.net/rROCKSDB4985a9f73b9fb8a0323fbbb06222ae1f758a6b1d).
 
 
 Diffs
 -
 
   
 samza-kv-inmemory/src/main/scala/org/apache/samza/storage/kv/inmemory/InMemoryKeyValueStore.scala
  217333c84c696c0cc1bc3eeabf1c4066a6e89795 
   
 samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStore.scala
  66c2a0dc2e38e21f951727a30f0987776ac52fe2 
   samza-kv/src/main/java/org/apache/samza/storage/kv/KeyValueStore.java 
 b708341abed15aaad34df5934f5f310bc1feb87a 
   samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStore.scala 
 61bb3f6acb080b653f8b11176538549738255acc 
   
 samza-kv/src/main/scala/org/apache/samza/storage/kv/KeyValueStorageEngine.scala
  3a23daf053f0b8dec3a7ec83a51c9c5527078a3b 
   
 samza-kv/src/main/scala/org/apache/samza/storage/kv/KeyValueStoreMetrics.scala
  79092b91c9498e55f1c4e28661b7280c6c19cef7 
   samza-kv/src/main/scala/org/apache/samza/storage/kv/LoggedStore.scala 
 26f4cd9cfef305546c85ef9330f3e8b8be5336f7 
   
 samza-kv/src/main/scala/org/apache/samza/storage/kv/NullSafeKeyValueStore.scala
  4f48cf490d6c1012591a602c0d29dcc71473090f 
   
 samza-kv/src/main/scala/org/apache/samza/storage/kv/SerializedKeyValueStore.scala
  531e8bef2069a77fa9ceab36fa738bbaa162fe8c 
   samza-test/src/main/config/perf/kv-perf.properties 
 33fcd8d1aea14ecea47bbadb24936f737feedb39 
   
 samza-test/src/test/scala/org/apache/samza/storage/kv/TestKeyValueStores.scala
  50dfc10bb053d74dba70fdbce0ef87609ba447ea 
 
 Diff: https://reviews.apache.org/r/33146/diff/
 
 
 Testing
 ---
 
 Unit-tested.
 
 
 Thanks,
 
 Mohamed Mahmoud

Re: Review Request 33146: New KeyValueStore Features

2015-04-24 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33146/#review81497
---

Ship it!


Ship It!

- Yi Pan (Data Infrastructure)


On April 24, 2015, 4:59 p.m., Mohamed Mahmoud (El-Geish) wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/33146/
 ---
 
 (Updated April 24, 2015, 4:59 p.m.)
 
 
 Review request for samza.
 
 
 Bugs: SAMZA-647
 https://issues.apache.org/jira/browse/SAMZA-647
 
 
 Repository: samza
 
 
 Description
 ---
 
 * Adding new KeyValueStore methods: Map<K, V> getAll(List<K>), and void 
 deleteAll(List<K>).
 **Please note: Backwards incompatible API changes, config changes, or 
 library upgrades should only happen between major revision changes, or when 
 the major revision is 0. -- since the latter is true, I found adding 
 the new methods to the already-existing interface to be a better solution, 
 even though it breaks backward compatibility (please see the first iteration 
 for context).** Alternatively, to maintain backward compat., I would have 
 added a new contract and a new class for each class that implements 
 KeyValueStore (to implement the new contract and pass calls through to the 
 underlying store, whose type needs to change in the constructor to the new 
 contract).
 * Improved the javadoc of KeyValueStore to have the same voice, to add API 
 notes, and to follow the javadoc standards.
 * Removing stress tests from TestKeyValueStores.scala because:
 * * Unit tests are not meant to be stress-testing the system,
 * * The test load looked arbitrary,
 * * It wasn't measuring anything (just testing it doesn't crash),
 * * Stress testing requires extended periods of testing, and
 * * My machine, a MacBook Air, is not beefy enough to survive stress tests; 
 they should be run on a typical production machine and not a dev one -- a 
 flaky test is not a good test.
 * * There's a test class dedicated for KV stores' performance testing: 
 TestKeyValuePerformance
 * Last, but definitely not least, the main motivation behind this change: 
 Allowing RocksDbKeyValueStore to implement getAll(List<K>) to call 
 multiGet(List<K>); Preliminary tests showed that multiGet is at least 1.25x 
 faster per key than get (see 
 https://reviews.facebook.net/rROCKSDB4985a9f73b9fb8a0323fbbb06222ae1f758a6b1d).
 
 
 Diffs
 -
 
   
 samza-kv-inmemory/src/main/scala/org/apache/samza/storage/kv/inmemory/InMemoryKeyValueStore.scala
  217333c84c696c0cc1bc3eeabf1c4066a6e89795 
   
 samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStore.scala
  66c2a0dc2e38e21f951727a30f0987776ac52fe2 
   samza-kv/src/main/java/org/apache/samza/storage/kv/KeyValueStore.java 
 b708341abed15aaad34df5934f5f310bc1feb87a 
   samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStore.scala 
 61bb3f6acb080b653f8b11176538549738255acc 
   
 samza-kv/src/main/scala/org/apache/samza/storage/kv/KeyValueStorageEngine.scala
  3a23daf053f0b8dec3a7ec83a51c9c5527078a3b 
   
 samza-kv/src/main/scala/org/apache/samza/storage/kv/KeyValueStoreMetrics.scala
  79092b91c9498e55f1c4e28661b7280c6c19cef7 
   samza-kv/src/main/scala/org/apache/samza/storage/kv/LoggedStore.scala 
 26f4cd9cfef305546c85ef9330f3e8b8be5336f7 
   
 samza-kv/src/main/scala/org/apache/samza/storage/kv/NullSafeKeyValueStore.scala
  4f48cf490d6c1012591a602c0d29dcc71473090f 
   
 samza-kv/src/main/scala/org/apache/samza/storage/kv/SerializedKeyValueStore.scala
  531e8bef2069a77fa9ceab36fa738bbaa162fe8c 
   samza-test/src/main/config/perf/kv-perf.properties 
 33fcd8d1aea14ecea47bbadb24936f737feedb39 
   
 samza-test/src/test/scala/org/apache/samza/storage/kv/TestKeyValueStores.scala
  50dfc10bb053d74dba70fdbce0ef87609ba447ea 
 
 Diff: https://reviews.apache.org/r/33146/diff/
 
 
 Testing
 ---
 
 Unit-tested.
 
 
 Thanks,
 
 Mohamed Mahmoud (El-Geish)
 




Re: Review Request 33761: Fix SAMZA-658

2015-05-04 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33761/#review82361
---


We would need a unit test to ensure the fix is effective, i.e. create a cached 
store, create an iterator, call iterator.remove() and verify that the entry 
is deleted from the cache.
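
A rough sketch of the behaviour such a test would check, written in plain Java rather than against the real CachedStore. TinyCachedStore and its methods are hypothetical stand-ins, and the cache here is a simple write-through map, which may differ from how the real store caches entries.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical write-through cache in front of a sorted in-memory store.
final class TinyCachedStore {
  private final TreeMap<String, String> store = new TreeMap<>();
  private final Map<String, String> cache = new HashMap<>();

  void put(String key, String value) {
    cache.put(key, value);
    store.put(key, value);
  }

  String get(String key) {
    String cached = cache.get(key);
    return cached != null ? cached : store.get(key);
  }

  boolean isCached(String key) {
    return cache.containsKey(key);
  }

  // Wraps the store iterator so that remove() also evicts the cached entry.
  Iterator<Map.Entry<String, String>> all() {
    final Iterator<Map.Entry<String, String>> inner = store.entrySet().iterator();
    return new Iterator<Map.Entry<String, String>>() {
      private String lastKey;

      @Override
      public boolean hasNext() {
        return inner.hasNext();
      }

      @Override
      public Map.Entry<String, String> next() {
        Map.Entry<String, String> entry = inner.next();
        lastKey = entry.getKey();
        return entry;
      }

      @Override
      public void remove() {
        inner.remove();        // delete from the underlying store
        cache.remove(lastKey); // and drop the stale cached copy
      }
    };
  }
}
```

A unit test along these lines would put two keys, call next() and remove() on the iterator, then assert that the removed key is gone from both the cache and the store while the other key is still readable.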


samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStore.scala
https://reviews.apache.org/r/33761/#comment133093

This iterator needs to be wrapped as well.



samza-kv/src/main/scala/org/apache/samza/storage/kv/KeyValueStorageEngine.scala
https://reviews.apache.org/r/33761/#comment133094

nit: If we are cleaning up the coding style, should be consistent w/ 
wrapperStore.flush() here as well.



samza-kv/src/main/scala/org/apache/samza/storage/kv/KeyValueStorageEngine.scala
https://reviews.apache.org/r/33761/#comment133095

nit: same here.


- Yi Pan (Data Infrastructure)


On May 1, 2015, 6:43 p.m., Guozhang Wang wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/33761/
 ---
 
 (Updated May 1, 2015, 6:43 p.m.)
 
 
 Review request for samza.
 
 
 Bugs: SAMZA-658
 https://issues.apache.org/jira/browse/SAMZA-658
 
 
 Repository: samza
 
 
 Description
 ---
 
 Add removing logic to the intermediate cached store for kv engine
 
 
 Diffs
 -
 
   
 samza-kv/src/main/scala/org/apache/samza/storage/kv/BaseKeyValueStorageEngineFactory.scala
  b3624e6057ee1a86090f00d2853035b06f63358d 
   samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStore.scala 
 61bb3f6acb080b653f8b11176538549738255acc 
   
 samza-kv/src/main/scala/org/apache/samza/storage/kv/KeyValueStorageEngine.scala
  3a23daf053f0b8dec3a7ec83a51c9c5527078a3b 
 
 Diff: https://reviews.apache.org/r/33761/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Guozhang Wang
 




Re: Review Request 33453: SAMZA-557 Reuse local state in SamzaContainer on clean shutdown

2015-04-28 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33453/#review81838
---


Overall looks good to me. It would be nice to add unit tests as well. Thanks!


samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala
https://reviews.apache.org/r/33453/#comment132322

Question: shouldn't the base dir be determined by container ID, not 
jobId?



samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala
https://reviews.apache.org/r/33453/#comment132326

nit: I think that the readability is a little better if:

val storeBaseDir = if (changeLogSystemStreamPartition != null) {
  loggedStorageBaseDir
} else {
  defaultStoreBaseDir
}
...
val storageEngine = ... (
  storeName,
  TaskStorageManager.getStorePartitionDir(storeBaseDir, storeName, taskName),
  ...



samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala
https://reviews.apache.org/r/33453/#comment132329

How do we know that this store is using the default storeBaseDir not 
loggedStoreBaseDir?



samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala
https://reviews.apache.org/r/33453/#comment132331

qq: Is there any checksum in the offset file to make sure that we don't 
read in a corrupted value?
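
To show what is being asked for, here is a minimal Java sketch of storing a CRC32 checksum next to the offset and verifying it on read. The class name ChecksummedOffset and the "offset,checksum" layout are assumptions made for illustration; they are not the format used in the patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper: encodes an offset together with its CRC32 checksum.
final class ChecksummedOffset {
  // Produces "<offset>,<crc32 of offset as decimal>".
  static String encode(String offset) {
    return offset + "," + crc(offset);
  }

  // Returns the offset, or null if the contents are missing or corrupted.
  static String decode(String fileContents) {
    if (fileContents == null) {
      return null;
    }
    int sep = fileContents.lastIndexOf(',');
    if (sep < 0) {
      return null;
    }
    String offset = fileContents.substring(0, sep);
    String expected = fileContents.substring(sep + 1).trim();
    return expected.equals(Long.toString(crc(offset))) ? offset : null;
  }

  private static long crc(String s) {
    CRC32 crc = new CRC32();
    crc.update(s.getBytes(StandardCharsets.UTF_8));
    return crc.getValue();
  }
}
```

If decode() returns null, the caller could fall back to treating the store as dirty and restoring from the changelog instead of trusting the offset.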



samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala
https://reviews.apache.org/r/33453/#comment132330

From the store initiation code, it seems that storeBaseDir won't have any 
logged store paths if logged store base dir is configured. Hence, here we may 
be deleting empty/non-existing storagePartitionDirs.


- Yi Pan (Data Infrastructure)


On April 22, 2015, 9:54 p.m., Navina Ramesh wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/33453/
 ---
 
 (Updated April 22, 2015, 9:54 p.m.)
 
 
 Review request for samza, Yan Fang, Chris Riccomini, Naveen Somasundaram, and 
 Yi Pan (Data Infrastructure).
 
 
 Repository: samza
 
 
 Description
 ---
 
 Changed default to yarn cwd instead of io.tmpDir and refactored code
 
 
 Diffs
 -
 
   samza-core/src/main/scala/org/apache/samza/config/ShellCommandConfig.scala 
 1a2dd4413f56e53dbeeb47b5637d7b0c50522f02 
   samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 
 720fbdceafc4fe69b048d81a677e874d13e6d22f 
   samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala 
 f68a7fee24614fce101e91c4f933d9b4e65dda0a 
   
 samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStore.scala
  66c2a0dc2e38e21f951727a30f0987776ac52fe2 
 
 Diff: https://reviews.apache.org/r/33453/diff/
 
 
 Testing
 ---
 
 Tested locally using hello-samza.
 Note: you have to set an environment variable LOGGED_STORE_BASE_DIR pointing 
 to the new location to persist the changelog attached stores. Otherwise, it 
 will default to YARN's cwd and will not re-use local state.
 
 
 Thanks,
 
 Navina Ramesh
 




Re: Review Request 33749: WIP: SAMZA-650 window store implementation

2015-05-07 Thread Yi Pan (Data Infrastructure)


 On May 7, 2015, 2:22 p.m., Milinda Pathirage wrote:
  samza-sql-core/src/main/java/org/apache/samza/sql/operators/join/StreamStreamJoiner.java,
   line 47
  https://reviews.apache.org/r/33749/diff/2/?file=948513#file948513line47
 
  I think that stream-to-stream joining is not practical for possibly 
  infinite streams. We may need to define some constraints or some other 
  restrictions to make this practical.

Hi, Milinda, yes. Infinite stream-to-stream join should not be allowed. My 
thought is that the parser/planner should invalidate the unbounded join by 
inspecting the join conditions. Consider that stream A is ordered on field X 
and stream B is ordered on field Y: if the join condition does not put a 
bounded range on field X in A and field Y in B, the validation should 
fail.
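
A minimal Java sketch of that validation, assuming the planner has already reduced the join condition to a range of the form lower <= A.X - B.Y <= upper. JoinRange and JoinValidator are hypothetical names, not classes from the patch.

```java
// Hypothetical summary of a join condition over the ordered fields A.X and B.Y:
// lower <= A.X - B.Y <= upper. A null bound means "unbounded on that side".
final class JoinRange {
  final Long lower;
  final Long upper;

  JoinRange(Long lower, Long upper) {
    this.lower = lower;
    this.upper = upper;
  }

  boolean isBounded() {
    return lower != null && upper != null && lower <= upper;
  }
}

// Hypothetical planner check: reject stream-to-stream joins without a bounded range.
final class JoinValidator {
  static void validate(JoinRange range) {
    if (!range.isBounded()) {
      throw new IllegalArgumentException(
          "Stream-to-stream join must bound A.X - B.Y on both sides");
    }
  }
}
```

Under this sketch, a condition such as "A.X BETWEEN B.Y - 10 AND B.Y + 10" would pass, while a plain equality on an unordered key would be rejected because no bound on X and Y can be derived.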


- Yi


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33749/#review82821
---


On May 4, 2015, 6:58 a.m., Yi Pan (Data Infrastructure) wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/33749/
 ---
 
 (Updated May 4, 2015, 6:58 a.m.)
 
 
 Review request for samza, Chris Riccomini, Guozhang Wang, Milinda Pathirage, 
 Navina Ramesh, and Naveen Somasundaram.
 
 
 Bugs: SAMZA-650
 https://issues.apache.org/jira/browse/SAMZA-650
 
 
 Repository: samza
 
 
 Description
 ---
 
 WIP: SAMZA-650 window store implementation
 
 First patch to implement window store and message store.
 
 - Added window store initial implementation
 - Added MessageStore initial implementation as well
 - Completed skeleton APIs in window operators to illustrate the use cases of 
 window store and message store
 - Added a StreamTask test case w/ a single window operator usage
 - Unit tests are still WIP and will be updated later
 
 
 Diffs
 -
 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Relation.java 
 PRE-CREATION 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Stream.java 
 PRE-CREATION 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Table.java 
 PRE-CREATION 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Tuple.java 
 PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/Operator.java 
 PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/SqlOperatorFactory.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/spec/OperatorSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/data/IncomingMessageTuple.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/exception/OperatorException.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/SimpleOperator.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/SimpleOperatorFactoryImpl.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/SimpleOperatorSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/join/StreamStreamJoinSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/join/StreamStreamJoiner.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/partition/PartitionSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/relation/JoinSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/stream/InsertStreamSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/BoundedTimeWindow.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/FullStateTimeWindow.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/FullStateTimeWindowAutoOp.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/FullStateTimeWindowOp.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/FullStateWindowOp.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/WindowOp.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/WindowOpSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/WindowSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/WindowState.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/window/storage/FilteredMessageIterator.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache

Re: Review Request 33749: WIP: SAMZA-650 window store implementation

2015-05-07 Thread Yi Pan (Data Infrastructure)


 On May 7, 2015, 2:35 p.m., Milinda Pathirage wrote:
  samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/FullStateTimeWindowAutoOp.java,
   line 30
  https://reviews.apache.org/r/33749/diff/2/?file=948519#file948519line30
 
  Hi Yi, What is automated operator in this context?

It is evolving now and will be removed. The original thought was to put 
two overlaid APIs on top of the core set of window APIs: a) one used by OperatorRouter 
to automatically execute the connected operators (i.e. the DAG for the query); 
b) a more low-level API that allows the programmer to control when to add messages 
and when to retrieve/flush results. After an internal team discussion, we are 
going to remove b). Instead, we will add support for user callback functions in 
the Operator, to be invoked right before process() is called and right 
before the result is collected and sent.
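
As a sketch of the callback idea only (the final API is not settled in this thread), the Java below shows an operator with one user hook applied before processing and one applied before the result is sent. CallbackOperator, its String messages, and the upper-casing "operator logic" are all hypothetical placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical operator with two user-supplied hooks.
final class CallbackOperator {
  private final UnaryOperator<String> beforeProcess;
  private final UnaryOperator<String> beforeSend;
  private final List<String> sent = new ArrayList<>();

  CallbackOperator(UnaryOperator<String> beforeProcess, UnaryOperator<String> beforeSend) {
    this.beforeProcess = beforeProcess;
    this.beforeSend = beforeSend;
  }

  void process(String message) {
    // User hook invoked right before the operator logic runs.
    String input = beforeProcess.apply(message);
    // Stand-in for the real operator logic.
    String result = input.toUpperCase(Locale.ROOT);
    // User hook invoked right before the result is collected and sent.
    sent.add(beforeSend.apply(result));
  }

  List<String> sent() {
    return sent;
  }
}
```

For example, new CallbackOperator(m -> m.trim(), r -> r + "!") would trim each input before processing and append a marker to each output before it is sent.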


 On May 7, 2015, 2:35 p.m., Milinda Pathirage wrote:
  samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/FullStateTimeWindowOp.java,
   line 76
  https://reviews.apache.org/r/33749/diff/2/?file=948520#file948520line76
 
  When will this input disabling happen for a window operator?

As described in the design doc, the input to a window operator may be disabled if the 
downstream operator cannot accept the output of the window operator (e.g. if 
the joiner needs to wait for the output from another stream to be available 
before accepting this window's output). It should not happen in the normal case, 
but may occur if we optimize the message store recovery with lazy recovery, or if 
the two streams are skewed by an offset gap that is larger than the retention 
size.


- Yi


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33749/#review82824
---


On May 4, 2015, 6:58 a.m., Yi Pan (Data Infrastructure) wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/33749/
 ---
 
 (Updated May 4, 2015, 6:58 a.m.)
 
 
 Review request for samza, Chris Riccomini, Guozhang Wang, Milinda Pathirage, 
 Navina Ramesh, and Naveen Somasundaram.
 
 
 Bugs: SAMZA-650
 https://issues.apache.org/jira/browse/SAMZA-650
 
 
 Repository: samza
 
 
 Description
 ---
 
 WIP: SAMZA-650 window store implementation
 
 First patch to implement window store and message store.
 
 - Added window store initial implementation
 - Added MessageStore initial implementation as well
 - Completed skeleton APIs in window operators to illustrate the use cases of 
 window store and message store
 - Added a StreamTask test case w/ a single window operator usage
 - Unit tests are still WIP and will be updated later
 
 
 Diffs
 -
 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Relation.java 
 PRE-CREATION 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Stream.java 
 PRE-CREATION 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Table.java 
 PRE-CREATION 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Tuple.java 
 PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/Operator.java 
 PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/SqlOperatorFactory.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/spec/OperatorSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/data/IncomingMessageTuple.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/exception/OperatorException.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/SimpleOperator.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/SimpleOperatorFactoryImpl.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/SimpleOperatorSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/join/StreamStreamJoinSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/join/StreamStreamJoiner.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/partition/PartitionSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/relation/JoinSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/stream/InsertStreamSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/BoundedTimeWindow.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/FullStateTimeWindow.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql

Re: Review Request 33488: SAMZA-657

2015-05-07 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33488/#review82800
---


Overall looks good to me. We may need to re-base.


checkstyle/import-control.xml
https://reviews.apache.org/r/33488/#comment133624

nit: indentation is not aligned.



samza-kv/src/main/java/org/apache/samza/storage/kv/KeyValueStore.java
https://reviews.apache.org/r/33488/#comment133626

nit: should have been removed?



samza-test/src/main/java/org/apache/samza/test/integration/join/Emitter.java
https://reviews.apache.org/r/33488/#comment133628

nit: There are still many trailing white spaces. We should remove them.


- Yi Pan (Data Infrastructure)


On April 27, 2015, 7:59 p.m., Guozhang Wang wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/33488/
 ---
 
 (Updated April 27, 2015, 7:59 p.m.)
 
 
 Review request for samza.
 
 
 Bugs: SAMZA-657
 https://issues.apache.org/jira/browse/SAMZA-657
 
 
 Repository: samza
 
 
 Description
 ---
 
 SAMZA-657.v1
 
 1. Add the checkstyle xml files for the following packages (that have Java 
 code)
 
 samza-api
 samza-core
 samza-log4j
 samza-kv
 samza-test
 
 2. Fix some coding style issues found by checkstyle.
 
 3. Remove one class EpochPartitioner.java since it is from the old producer 
 and not used anywhere.
 
 Some questions I have:
 
 1. The current packaging hierarchy seems unnecessarily nested to me, and hence
 the import-control.xml rules are quite messy. For example:
 
 a. We have an o.a.s.test package, with nested integration.join etc., but
 the test and test-util classes are actually spread all over other packages,
 like o.a.s.system.mock
 
 b. We have an o.a.s.serializers, and o.a.s.logging.log4j.serializers.
 Shall we just make one serializers?
 
 c. We have both o.a.s.serializers.model and o.a.s.job.model. Shall we
 just make one model?
 
 d. We have a top-level o.a.s.task and o.a.s.container.grouper.task, etc.
 
 Should we consider making the packaging hierarchy clearer?
 
 2. We currently claim to use 2-space indentation in order to be consistent
 with Scala, but there are some places where 4-space indentation is also used.
 Shall we just choose one standard and stick with it? The current
 checkstyle.xml overrides the default (4) to 2.
 
 
 Diffs
 -
 
   README.md 7f92020726626e606dbd97b86dcd91f4157c9ea7 
   build.gradle 97de3a28f6379e3862eec845da87587b1d4f742e 
   checkstyle/checkstyle.xml PRE-CREATION 
   checkstyle/import-control.xml PRE-CREATION 
   samza-api/src/main/java/org/apache/samza/checkpoint/CheckpointManager.java 
 092cb910b40d312217e86420bf1ddfbaf605e9e5 
   samza-api/src/main/java/org/apache/samza/config/Config.java 
 2b990506864c38ec2c46d55f27c2ba2f98f271ea 
   samza-api/src/main/java/org/apache/samza/config/MapConfig.java 
 38d7424429d4bf81614311c39630b165236d8fbb 
   samza-api/src/main/java/org/apache/samza/system/SystemStreamPartition.java 
 8dcea09ece60f91a890ff6b1abcb4e93c248dfe4 
   samza-api/src/main/java/org/apache/samza/util/BlockingEnvelopeMap.java 
 e30321d521f0fd7d3d69e2858352916142fb27bf 
   
 samza-api/src/main/java/org/apache/samza/util/SinglePartitionWithoutOffsetsSystemAdmin.java
  01997ae22641b735cd452a0e89a49219e2874892 
   samza-api/src/test/java/org/apache/samza/config/TestConfig.java 
 d9f378d8de6da3bdf002e69dfb1e4605c3d90cec 
   
 samza-api/src/test/java/org/apache/samza/system/TestSystemStreamPartitionIterator.java
  5af2a11812983cfb8b6a8a0927ef6f3eba7d340f 
   samza-api/src/test/java/org/apache/samza/util/TestBlockingEnvelopeMap.java 
 4eb87eb9033a45b5367480b9e38e18c629c79bc0 
   
 samza-core/src/main/java/org/apache/samza/serializers/model/SamzaObjectMapper.java
  3517912eaafbf95f8c8cc70ab5869548a56b76e7 
   
 samza-core/src/main/scala/org/apache/samza/container/grouper/task/GroupByContainerCount.scala
  8071fec3ac1bad76ddb3c6116e35c646be71d891 
   samza-kv/src/main/java/org/apache/samza/storage/kv/KeyValueIterator.java 
 2fb26e28685928547342b325ff6f0a63b3d83887 
   samza-kv/src/main/java/org/apache/samza/storage/kv/KeyValueStore.java 
 b708341abed15aaad34df5934f5f310bc1feb87a 
   
 samza-log4j/src/main/java/org/apache/samza/logging/log4j/StreamAppender.java 
 d3f25c0e03a727e64a774581384ef5aae9ef9c1c 
   
 samza-log4j/src/main/java/org/apache/samza/logging/log4j/serializers/LoggingEventStringSerde.java
  8d8f5e8a8e5fd1e4d9e5482a5accd4a7ece463bc 
   
 samza-log4j/src/test/java/org/apache/samza/config/TestLog4jSystemConfig.java 
 6314a3ed7ae6d49328ba3f32af5d9d1097899009 
   
 samza-log4j/src/test/java/org/apache/samza/logging/log4j/TestJmxAppender.java 
 0bdade0d7c097bcb33acdfc8077c1ce57ad7988c 
   samza-test/src/main/java/org/apache/samza/system/mock/MockSystemAdmin.java

Re: Review Request 33761: Fix SAMZA-658

2015-05-06 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33761/#review82777
---

Ship it!


Ship It!

- Yi Pan (Data Infrastructure)


On May 6, 2015, 11:38 p.m., Guozhang Wang wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/33761/
 ---
 
 (Updated May 6, 2015, 11:38 p.m.)
 
 
 Review request for samza.
 
 
 Bugs: SAMZA-658
 https://issues.apache.org/jira/browse/SAMZA-658
 
 
 Repository: samza
 
 
 Description
 ---
 
 Address Yi's comments round two
 
 
 Diffs
 -
 
   
 samza-kv-inmemory/src/main/scala/org/apache/samza/storage/kv/inmemory/InMemoryKeyValueStore.scala
  217333c84c696c0cc1bc3eeabf1c4066a6e89795 
   
 samza-kv/src/main/scala/org/apache/samza/storage/kv/BaseKeyValueStorageEngineFactory.scala
  b3624e6057ee1a86090f00d2853035b06f63358d 
   samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStore.scala 
 61bb3f6acb080b653f8b11176538549738255acc 
   
 samza-kv/src/main/scala/org/apache/samza/storage/kv/KeyValueStorageEngine.scala
  3a23daf053f0b8dec3a7ec83a51c9c5527078a3b 
   samza-kv/src/test/scala/org/apache/samza/storage/kv/MockKeyValueStore.scala 
 PRE-CREATION 
   samza-kv/src/test/scala/org/apache/samza/storage/kv/TestCachedStore.scala 
 d03ec925b103ccf3c1561de0461fbc39cbe9d9ca 
 
 Diff: https://reviews.apache.org/r/33761/diff/
 
 
 Testing
 ---
 
 unit tests
 
 
 Thanks,
 
 Guozhang Wang
 




Re: Review Request 33453: SAMZA-557 Reuse local state in SamzaContainer on clean shutdown

2015-05-06 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33453/#review82738
---


Overall looks good. Just one minor comment in the info log.


samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala
https://reviews.apache.org/r/33453/#comment133550

Should be default here.


- Yi Pan (Data Infrastructure)


On May 6, 2015, 6:22 a.m., Navina Ramesh wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/33453/
 ---
 
 (Updated May 6, 2015, 6:22 a.m.)
 
 
 Review request for samza, Yan Fang, Chris Riccomini, Naveen Somasundaram, and 
 Yi Pan (Data Infrastructure).
 
 
 Repository: samza
 
 
 Description
 ---
 
 Added checksum to the Offset file and some unit tests
 
 Added Unit Tests for TaskStorageManager and refactored some code
 
 Changed default to yarn cwd instead of io.tmpDir and refactored code
 
 
 Diffs
 -
 
   samza-core/src/main/scala/org/apache/samza/config/ShellCommandConfig.scala 
 e94a4735217f59d074510ce1556c8c439e6a72f0 
   samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 
 ac4793afe1e6868933e750181bee1e27c157b5e6 
   samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala 
 f68a7fee24614fce101e91c4f933d9b4e65dda0a 
   samza-core/src/main/scala/org/apache/samza/util/Util.scala 
 8a83566ae6139127d7fe04ab42231151227dc479 
   
 samza-core/src/test/scala/org/apache/samza/storage/TestTaskStorageManager.scala
  PRE-CREATION 
   samza-core/src/test/scala/org/apache/samza/util/TestUtil.scala 
 b75f44060fb8e660e824eaeb9cfdcc9d6fa902e8 
   
 samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStore.scala
  1b44a517129b35affac802929087eaa0061e6b5d 
 
 Diff: https://reviews.apache.org/r/33453/diff/
 
 
 Testing
 ---
 
 Tested locally using hello-samza.
 Note: you have to set an environment variable LOGGED_STORE_BASE_DIR pointing 
 to the new location to persist the changelog attached stores. Otherwise, it 
 will default to YARN's cwd and will not re-use local state.
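
The fallback behaviour in the note above can be sketched as follows. This is not the patch's code: the class and method names are hypothetical, and only the LOGGED_STORE_BASE_DIR variable name and the "default to the working directory" rule come from the description.

```java
import java.io.File;
import java.util.Map;

// Hypothetical helper; only the environment variable name comes from the review.
class StoreBaseDirResolver {
  static final String ENV_VAR = "LOGGED_STORE_BASE_DIR";

  // Returns the configured base directory, or the working directory when the
  // variable is unset or empty.
  static File resolve(Map<String, String> env, File workingDir) {
    String configured = env.get(ENV_VAR);
    if (configured == null || configured.isEmpty()) {
      return workingDir;
    }
    return new File(configured);
  }
}
```

A caller would typically pass System.getenv() and the process working directory; the map parameter is only there so the rule can be exercised without touching the real environment.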
 
 
 Thanks,
 
 Navina Ramesh
 




Re: Review Request 33749: WIP: SAMZA-650 window store implementation

2015-05-06 Thread Yi Pan (Data Infrastructure)


 On May 6, 2015, 2:16 p.m., Milinda Pathirage wrote:
  samza-sql-core/src/main/java/org/apache/samza/sql/window/storage/MessageStore.java,
   line 40
  https://reviews.apache.org/r/33749/diff/1-2/?file=947212#file947212line40
 
  Invalid parameter in doc comment.

Thanks! Will fix.


- Yi


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33749/#review82674
---


On May 4, 2015, 6:58 a.m., Yi Pan (Data Infrastructure) wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/33749/
 ---
 
 (Updated May 4, 2015, 6:58 a.m.)
 
 
 Review request for samza, Chris Riccomini, Guozhang Wang, Milinda Pathirage, 
 Navina Ramesh, and Naveen Somasundaram.
 
 
 Bugs: SAMZA-650
 https://issues.apache.org/jira/browse/SAMZA-650
 
 
 Repository: samza
 
 
 Description
 ---
 
 WIP: SAMZA-650 window store implementation
 
 First patch to implement window store and message store.
 
 - Added window store initial implementation
 - Added MessageStore initial implementation as well
 - Completed skeleton APIs in window operators to illustrate the use cases of 
 window store and message store
 - Added a StreamTask test case w/ a single window operator usage
 - Unit tests are still WIP and will be updated later
 
 
 Diffs
 -
 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Relation.java 
 PRE-CREATION 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Stream.java 
 PRE-CREATION 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Table.java 
 PRE-CREATION 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Tuple.java 
 PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/Operator.java 
 PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/SqlOperatorFactory.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/spec/OperatorSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/data/IncomingMessageTuple.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/exception/OperatorException.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/SimpleOperator.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/SimpleOperatorFactoryImpl.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/SimpleOperatorSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/join/StreamStreamJoinSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/join/StreamStreamJoiner.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/partition/PartitionSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/relation/JoinSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/stream/InsertStreamSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/BoundedTimeWindow.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/FullStateTimeWindow.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/FullStateTimeWindowAutoOp.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/FullStateTimeWindowOp.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/FullStateWindowOp.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/WindowOp.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/WindowOpSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/WindowSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/WindowState.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/window/storage/FilteredMessageIterator.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/window/storage/HashPrefixedMessageStore.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/window/storage/MessageStore.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/window/storage/MessageStoreSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/window/storage/OffsetKey.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/window/storage/OrderedStoreKey.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java

Re: Review Request 33749: WIP: SAMZA-650 window store implementation

2015-05-06 Thread Yi Pan (Data Infrastructure)


 On May 6, 2015, 2:11 p.m., Milinda Pathirage wrote:
  samza-sql-core/src/main/java/org/apache/samza/sql/window/storage/FilteredMessageIterator.java,
   line 54
  https://reviews.apache.org/r/33749/diff/2/?file=948526#file948526line54
 
  I'm not sure whether this is 100% correct. For example, let's take a
  situation where only 1 element is left and that element doesn't match the
  given filters. In this situation hasNext will return true, but next will
  return null. So this doesn't exactly adhere to the iterator semantics we know.

Sure. I was debating whether the slight semantic deviation is OK or not.
Good point, and I will change it in the next update. Thanks!
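
One common way to restore the usual iterator contract in a case like this is to look ahead inside hasNext(). The sketch below is a generic illustration of that technique, not the FilteredMessageIterator from the patch; the class name and the use of java.util.function.Predicate are assumptions.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical stand-in for the iterator under review; not the patch's code.
class LookAheadFilteredIterator<T> implements Iterator<T> {
  private final Iterator<T> source;
  private final Predicate<T> filter;
  private T nextMatch;      // buffered element that passed the filter
  private boolean hasMatch; // true while nextMatch holds an unconsumed element

  LookAheadFilteredIterator(Iterator<T> source, Predicate<T> filter) {
    this.source = source;
    this.filter = filter;
  }

  @Override
  public boolean hasNext() {
    // Advance until a matching element is buffered or the source is exhausted.
    while (!hasMatch && source.hasNext()) {
      T candidate = source.next();
      if (filter.test(candidate)) {
        nextMatch = candidate;
        hasMatch = true;
      }
    }
    return hasMatch;
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    T result = nextMatch;
    nextMatch = null;
    hasMatch = false;
    return result;
  }
}
```

With this shape, a trailing element that fails the filter makes hasNext() return false, so next() never has to return null to signal "no match".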


- Yi


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33749/#review82673
---


On May 4, 2015, 6:58 a.m., Yi Pan (Data Infrastructure) wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/33749/
 ---
 
 (Updated May 4, 2015, 6:58 a.m.)
 
 
 Review request for samza, Chris Riccomini, Guozhang Wang, Milinda Pathirage, 
 Navina Ramesh, and Naveen Somasundaram.
 
 
 Bugs: SAMZA-650
 https://issues.apache.org/jira/browse/SAMZA-650
 
 
 Repository: samza
 
 
 Description
 ---
 
 WIP: SAMZA-650 window store implementation
 
 First patch to implement window store and message store.
 
 - Added window store initial implementation
 - Added MessageStore initial implementation as well
 - Completed skeleton APIs in window operators to illustrate the use cases of 
 window store and message store
 - Added a StreamTask test case w/ a single window operator usage
 - Unit tests are still WIP and will be updated later
 
 
 Diffs
 -
 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Relation.java 
 PRE-CREATION 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Stream.java 
 PRE-CREATION 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Table.java 
 PRE-CREATION 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Tuple.java 
 PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/Operator.java 
 PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/SqlOperatorFactory.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/spec/OperatorSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/data/IncomingMessageTuple.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/exception/OperatorException.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/SimpleOperator.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/SimpleOperatorFactoryImpl.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/SimpleOperatorSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/join/StreamStreamJoinSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/join/StreamStreamJoiner.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/partition/PartitionSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/relation/JoinSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/stream/InsertStreamSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/BoundedTimeWindow.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/FullStateTimeWindow.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/FullStateTimeWindowAutoOp.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/FullStateTimeWindowOp.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/FullStateWindowOp.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/WindowOp.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/WindowOpSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/WindowSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/WindowState.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/window/storage/FilteredMessageIterator.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/window/storage/HashPrefixedMessageStore.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/window/storage

Re: Review Request 33146: New KeyValueStore Features

2015-05-04 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33146/#review82429
---



samza-test/src/main/scala/org/apache/samza/test/performance/TestKeyValuePerformance.scala
https://reviews.apache.org/r/33146/#comment133156

This is a bit confusing to me: why do we want to populate the cache (and
which cache are we referring to here)? I thought that the straightforward
comparison is to disable the CachedStore on top of RocksDB and call
getAll() vs. many get() calls that directly hit the RocksDB APIs?


- Yi Pan (Data Infrastructure)


On May 4, 2015, 4:27 a.m., Mohamed Mahmoud (El-Geish) wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/33146/
 ---
 
 (Updated May 4, 2015, 4:27 a.m.)
 
 
 Review request for samza.
 
 
 Bugs: SAMZA-647
 https://issues.apache.org/jira/browse/SAMZA-647
 
 
 Repository: samza
 
 
 Description
 ---
 
 * Adding new KeyValueStore methods: Map<K, V> getAll(List<K>), and void
 deleteAll(List<K>).
 **Please note: Backwards incompatible API changes, config changes, or
 library upgrades should only happen between major revision changes, or when
 the major revision is 0. -- since the latter is true, I found adding
 the new methods to the already-existing interface to be a better solution,
 even though it breaks backward compatibility (please see the first iteration
 for context).** Alternatively, to maintain backward compat., I would have
 added a new contract and a new class for each class that implements
 KeyValueStore (to implement the new contract and pass calls through to the
 underlying store, whose type needs to change in the constructor to the new
 contract).
 * Improved the javadoc of KeyValueStore to have the same voice, to add API 
 notes, and to follow the javadoc standards.
 * Removing stress tests from TestKeyValueStores.scala because:
 * * Unit tests are not meant to be stress-testing the system,
 * * The test load looked arbitrary,
 * * It wasn't measuring anything (just testing it doesn't crash),
 * * Stress testing requires extended periods of testing, and
 * * My machine, a MacBook Air, is not beefy enough to survive stress tests; 
 they should be run on a typical production machine and not a dev one -- a 
 flaky test is not a good test.
 * * There's a test class dedicated to KV stores' performance testing:
 TestKeyValuePerformance -- I added my perf tests there
 * Last, but definitely not least, the main motivation behind this change:
 Allowing RocksDbKeyValueStore to implement getAll(List<K>) by calling
 multiGet(List<K>); Preliminary tests showed that multiGet is at least 1.25x
 faster per key than get (see 
 https://reviews.facebook.net/rROCKSDB4985a9f73b9fb8a0323fbbb06222ae1f758a6b1d).
  My tests show that getAll is definitely an improvement that's worth adding:
 ![getAll Perf 
 Chart](https://issues.apache.org/jira/secure/attachment/12729937/getAllPerf.png)
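
To make the shape of the two new methods concrete, here is a minimal sketch of a store that offers getAll and deleteAll. It is not Samza's KeyValueStore: BatchKeyValueStore and InMemoryBatchStore are hypothetical names, the interface is trimmed to a few methods, and the in-memory implementation simply loops over the single-key calls. A RocksDB-backed store would be the place to swap that loop for a native batch read.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, trimmed-down contract; not the real Samza interface.
interface BatchKeyValueStore<K, V> {
  V get(K key);

  void put(K key, V value);

  void delete(K key);

  // Returns one entry per requested key; missing keys map to null.
  Map<K, V> getAll(List<K> keys);

  void deleteAll(List<K> keys);
}

// Baseline implementation: batch calls are plain loops over single-key calls.
class InMemoryBatchStore<K, V> implements BatchKeyValueStore<K, V> {
  private final Map<K, V> data = new HashMap<>();

  @Override
  public V get(K key) {
    return data.get(key);
  }

  @Override
  public void put(K key, V value) {
    data.put(key, value);
  }

  @Override
  public void delete(K key) {
    data.remove(key);
  }

  @Override
  public Map<K, V> getAll(List<K> keys) {
    Map<K, V> result = new HashMap<>(keys.size());
    for (K key : keys) {
      result.put(key, get(key)); // null value marks a key that was not found
    }
    return result;
  }

  @Override
  public void deleteAll(List<K> keys) {
    for (K key : keys) {
      delete(key);
    }
  }
}
```

Because the batch methods live on the same interface as get and put, every existing implementer has to add them, which is the compatibility cost the description acknowledges.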
 
 
 Diffs
 -
 
   README.md 7f92020726626e606dbd97b86dcd91f4157c9ea7 
   
 samza-kv-inmemory/src/main/scala/org/apache/samza/storage/kv/inmemory/InMemoryKeyValueStore.scala
  217333c84c696c0cc1bc3eeabf1c4066a6e89795 
   
 samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStore.scala
  66c2a0dc2e38e21f951727a30f0987776ac52fe2 
   samza-kv/src/main/java/org/apache/samza/storage/kv/KeyValueStore.java 
 b708341abed15aaad34df5934f5f310bc1feb87a 
   samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStore.scala 
 61bb3f6acb080b653f8b11176538549738255acc 
   
 samza-kv/src/main/scala/org/apache/samza/storage/kv/KeyValueStorageEngine.scala
  3a23daf053f0b8dec3a7ec83a51c9c5527078a3b 
   
 samza-kv/src/main/scala/org/apache/samza/storage/kv/KeyValueStoreMetrics.scala
  79092b91c9498e55f1c4e28661b7280c6c19cef7 
   samza-kv/src/main/scala/org/apache/samza/storage/kv/LoggedStore.scala 
 26f4cd9cfef305546c85ef9330f3e8b8be5336f7 
   
 samza-kv/src/main/scala/org/apache/samza/storage/kv/NullSafeKeyValueStore.scala
  4f48cf490d6c1012591a602c0d29dcc71473090f 
   
 samza-kv/src/main/scala/org/apache/samza/storage/kv/SerializedKeyValueStore.scala
  531e8bef2069a77fa9ceab36fa738bbaa162fe8c 
   samza-test/src/main/config/perf/kv-perf.properties 
 33fcd8d1aea14ecea47bbadb24936f737feedb39 
   
 samza-test/src/main/scala/org/apache/samza/test/performance/TestKeyValuePerformance.scala
  0858b981add6581230960356f65fe6f6e6ab108f 
   
 samza-test/src/test/scala/org/apache/samza/storage/kv/TestKeyValueStores.scala
  50dfc10bb053d74dba70fdbce0ef87609ba447ea 
 
 Diff: https://reviews.apache.org/r/33146/diff/
 
 
 Testing
 ---
 
 Unit-tested.
 
 
 Thanks,
 
 Mohamed Mahmoud (El-Geish)
 




Re: Review Request 33146: New KeyValueStore Features

2015-05-04 Thread Yi Pan (Data Infrastructure)


 On May 4, 2015, 8:14 p.m., Yi Pan (Data Infrastructure) wrote:
  samza-test/src/main/scala/org/apache/samza/test/performance/TestKeyValuePerformance.scala,
   line 320
  https://reviews.apache.org/r/33146/diff/5-6/?file=943969#file943969line320
 
  This is a bit confusing to me: why do we want to populate the cache
  (and which cache are we referring to here)? I thought that the
  straightforward comparison is to disable the CachedStore on top of
  RocksDB and call getAll() vs. many get() calls that directly hit the RocksDB APIs?
 
 Mohamed Mahmoud (El-Geish) wrote:
 A- I'm referring to the RocksDB cache, please see my comment on this 
 method:
 Test that ::getAll performance is better than that of ::get (test when 
 data are written once and read many times); load is usually greater than the 
 storage engine's cache size (not to be confused with Samza's cache layer)
 
 B- The first call to get() or getAll() is an outlier because it takes 
 longer than any subsequent call -- we are testing many reads, so it makes 
 sense to remove those outliers before the cache is populated (the perf test 
 is warming up)
 
 C- This test hits the RocksDB APIs directly

Ah, that makes sense. Thanks!
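
The warm-up argument in point B can be sketched as a small timing helper. This is an illustration only, with hypothetical names (WarmupTimer, averageNanos); it is not the code in TestKeyValuePerformance, and it assumes the operation under test is passed in as a Runnable.

```java
// Hypothetical timing helper; not taken from TestKeyValuePerformance.
class WarmupTimer {
  // Runs the operation warmupRuns times untimed, then measuredRuns times timed,
  // and returns the average wall-clock nanoseconds per measured run.
  static long averageNanos(Runnable operation, int warmupRuns, int measuredRuns) {
    if (measuredRuns <= 0) {
      throw new IllegalArgumentException("measuredRuns must be positive");
    }
    for (int i = 0; i < warmupRuns; i++) {
      operation.run(); // first calls are outliers while caches fill; not timed
    }
    long start = System.nanoTime();
    for (int i = 0; i < measuredRuns; i++) {
      operation.run();
    }
    return (System.nanoTime() - start) / measuredRuns;
  }
}
```

Comparing a get() loop against a single getAll() call would then mean calling averageNanos once per variant with the same warm-up count, so that both are measured after the storage engine's cache has been touched.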


- Yi


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33146/#review82429
---


On May 4, 2015, 4:27 a.m., Mohamed Mahmoud (El-Geish) wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/33146/
 ---
 
 (Updated May 4, 2015, 4:27 a.m.)
 
 
 Review request for samza.
 
 
 Bugs: SAMZA-647
 https://issues.apache.org/jira/browse/SAMZA-647
 
 
 Repository: samza
 
 
 Description
 ---
 
 * Adding new KeyValueStore methods: Map<K, V> getAll(List<K>), and void
 deleteAll(List<K>).
 **Please note: Backwards incompatible API changes, config changes, or
 library upgrades should only happen between major revision changes, or when
 the major revision is 0. -- since the latter is true, I found adding
 the new methods to the already-existing interface to be a better solution,
 even though it breaks backward compatibility (please see the first iteration
 for context).** Alternatively, to maintain backward compat., I would have
 added a new contract and a new class for each class that implements
 KeyValueStore (to implement the new contract and pass calls through to the
 underlying store, whose type needs to change in the constructor to the new
 contract).
 * Improved the javadoc of KeyValueStore to have the same voice, to add API 
 notes, and to follow the javadoc standards.
 * Removing stress tests from TestKeyValueStores.scala because:
 * * Unit tests are not meant to be stress-testing the system,
 * * The test load looked arbitrary,
 * * It wasn't measuring anything (just testing it doesn't crash),
 * * Stress testing requires extended periods of testing, and
 * * My machine, a MacBook Air, is not beefy enough to survive stress tests; 
 they should be run on a typical production machine and not a dev one -- a 
 flaky test is not a good test.
 * * There's a test class dedicated to KV stores' performance testing:
 TestKeyValuePerformance -- I added my perf tests there
 * Last, but definitely not least, the main motivation behind this change:
 Allowing RocksDbKeyValueStore to implement getAll(List<K>) by calling
 multiGet(List<K>); Preliminary tests showed that multiGet is at least 1.25x
 faster per key than get (see 
 https://reviews.facebook.net/rROCKSDB4985a9f73b9fb8a0323fbbb06222ae1f758a6b1d).
  My tests show that getAll is definitely an improvement that's worth adding:
 ![getAll Perf 
 Chart](https://issues.apache.org/jira/secure/attachment/12729937/getAllPerf.png)
 
 
 Diffs
 -
 
   README.md 7f92020726626e606dbd97b86dcd91f4157c9ea7 
   
 samza-kv-inmemory/src/main/scala/org/apache/samza/storage/kv/inmemory/InMemoryKeyValueStore.scala
  217333c84c696c0cc1bc3eeabf1c4066a6e89795 
   
 samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStore.scala
  66c2a0dc2e38e21f951727a30f0987776ac52fe2 
   samza-kv/src/main/java/org/apache/samza/storage/kv/KeyValueStore.java 
 b708341abed15aaad34df5934f5f310bc1feb87a 
   samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStore.scala 
 61bb3f6acb080b653f8b11176538549738255acc 
   
 samza-kv/src/main/scala/org/apache/samza/storage/kv/KeyValueStorageEngine.scala
  3a23daf053f0b8dec3a7ec83a51c9c5527078a3b 
   
 samza-kv/src/main/scala/org/apache/samza/storage/kv/KeyValueStoreMetrics.scala
  79092b91c9498e55f1c4e28661b7280c6c19cef7 
   samza-kv/src/main/scala/org/apache/samza/storage/kv/LoggedStore.scala 
 26f4cd9cfef305546c85ef9330f3e8b8be5336f7 
   
 samza-kv/src/main/scala/org/apache/samza/storage/kv

Review Request 34009: WIP: SAMZA-650 window store implementation

2015-05-08 Thread Yi Pan (Data Infrastructure)
/samza/sql/window/storage/PrefixedKey.java
 PRE-CREATION 
  samza-sql-core/src/main/java/org/apache/samza/sql/window/storage/Range.java 
PRE-CREATION 
  
samza-sql-core/src/main/java/org/apache/samza/sql/window/storage/TimeAndOffsetKey.java
 PRE-CREATION 
  samza-sql-core/src/main/java/org/apache/samza/sql/window/storage/TimeKey.java 
PRE-CREATION 
  
samza-sql-core/src/main/java/org/apache/samza/sql/window/storage/WindowOutputStream.java
 PRE-CREATION 
  
samza-sql-core/src/main/java/org/apache/samza/sql/window/storage/WindowState.java
 PRE-CREATION 
  
samza-sql-core/src/main/java/org/apache/samza/sql/window/storage/WindowStore.java
 PRE-CREATION 
  samza-sql-core/src/main/java/org/apache/samza/system/sql/LongOffset.java 
PRE-CREATION 
  samza-sql-core/src/main/java/org/apache/samza/system/sql/Offset.java 
PRE-CREATION 
  
samza-sql-core/src/main/java/org/apache/samza/task/sql/OperatorMessageCollector.java
 PRE-CREATION 
  
samza-sql-core/src/main/java/org/apache/samza/task/sql/RouterMessageCollector.java
 PRE-CREATION 
  
samza-sql-core/src/main/java/org/apache/samza/task/sql/SimpleMessageCollector.java
 PRE-CREATION 
  
samza-sql-core/src/main/java/org/apache/samza/task/sql/SqlMessageCollector.java 
PRE-CREATION 
  
samza-sql-core/src/main/java/org/apache/samza/task/sql/StoreMessageCollector.java
 PRE-CREATION 
  
samza-sql-core/src/test/java/org/apache/samza/task/sql/RandomOperatorTask.java 
PRE-CREATION 
  
samza-sql-core/src/test/java/org/apache/samza/task/sql/RandomWindowOperatorTask.java
 PRE-CREATION 
  samza-sql-core/src/test/java/org/apache/samza/task/sql/StreamSqlTask.java 
PRE-CREATION 
  
samza-sql-core/src/test/java/org/apache/samza/task/sql/UserCallbacksSqlTask.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/34009/diff/


Testing
---


Thanks,

Yi Pan (Data Infrastructure)



Re: Review Request 34009: WIP: SAMZA-650 window store implementation

2015-05-08 Thread Yi Pan (Data Infrastructure)
/java/org/apache/samza/sql/window/storage/FilteredMessageIterator.java
 PRE-CREATION 
  
samza-sql-core/src/main/java/org/apache/samza/sql/window/storage/HashPrefixedMessageStore.java
 PRE-CREATION 
  
samza-sql-core/src/main/java/org/apache/samza/sql/window/storage/MessageStore.java
 PRE-CREATION 
  
samza-sql-core/src/main/java/org/apache/samza/sql/window/storage/MessageStoreSpec.java
 PRE-CREATION 
  
samza-sql-core/src/main/java/org/apache/samza/sql/window/storage/OrderedStoreKey.java
 PRE-CREATION 
  
samza-sql-core/src/main/java/org/apache/samza/sql/window/storage/PrefixedKey.java
 PRE-CREATION 
  samza-sql-core/src/main/java/org/apache/samza/sql/window/storage/Range.java 
PRE-CREATION 
  
samza-sql-core/src/main/java/org/apache/samza/sql/window/storage/TimeAndOffsetKey.java
 PRE-CREATION 
  samza-sql-core/src/main/java/org/apache/samza/sql/window/storage/TimeKey.java 
PRE-CREATION 
  
samza-sql-core/src/main/java/org/apache/samza/sql/window/storage/WindowOutputStream.java
 PRE-CREATION 
  
samza-sql-core/src/main/java/org/apache/samza/sql/window/storage/WindowState.java
 PRE-CREATION 
  
samza-sql-core/src/main/java/org/apache/samza/sql/window/storage/WindowStore.java
 PRE-CREATION 
  samza-sql-core/src/main/java/org/apache/samza/system/sql/LongOffset.java 
PRE-CREATION 
  samza-sql-core/src/main/java/org/apache/samza/system/sql/Offset.java 
PRE-CREATION 
  
samza-sql-core/src/main/java/org/apache/samza/task/sql/OperatorMessageCollector.java
 PRE-CREATION 
  
samza-sql-core/src/main/java/org/apache/samza/task/sql/RouterMessageCollector.java
 PRE-CREATION 
  
samza-sql-core/src/main/java/org/apache/samza/task/sql/SimpleMessageCollector.java
 PRE-CREATION 
  
samza-sql-core/src/main/java/org/apache/samza/task/sql/SqlMessageCollector.java 
PRE-CREATION 
  
samza-sql-core/src/main/java/org/apache/samza/task/sql/StoreMessageCollector.java
 PRE-CREATION 
  
samza-sql-core/src/test/java/org/apache/samza/task/sql/RandomOperatorTask.java 
PRE-CREATION 
  
samza-sql-core/src/test/java/org/apache/samza/task/sql/RandomWindowOperatorTask.java
 PRE-CREATION 
  samza-sql-core/src/test/java/org/apache/samza/task/sql/StreamSqlTask.java 
PRE-CREATION 
  
samza-sql-core/src/test/java/org/apache/samza/task/sql/UserCallbacksSqlTask.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/34009/diff/


Testing (updated)
---

./gradlew clean build passed


Thanks,

Yi Pan (Data Infrastructure)



Re: Review Request 34009: WIP: SAMZA-650 window store implementation

2015-05-12 Thread Yi Pan (Data Infrastructure)


 On May 12, 2015, 3:07 p.m., Milinda Pathirage wrote:
  samza-sql-core/src/main/java/org/apache/samza/sql/operators/join/StreamStreamJoin.java,
   line 83
  https://reviews.apache.org/r/34009/diff/1/?file=954283#file954283line83
 
  Don't we need to make this final because we are passing tuples to an 
  inner class?

Are you referring to making the iterator final so that the joiner cannot change the 
message store content outside the window operator? I agree on this point, but I 
think making the iterator read-only would be safer than just making it 
final. Let me make a note of this and address it when we get to the point of 
actually implementing the joiner class.
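
To illustrate the distinction drawn in this reply, here is a minimal sketch of a read-only iterator wrapper. The class names ReadOnlyIterator and ReadOnlyIteratorDemo are hypothetical and not part of the patch. The point is that `final` only fixes the reference, while a wrapper that rejects remove() is what actually keeps a caller from changing the underlying collection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical wrapper, not part of the patch: forwards reads, rejects remove().
final class ReadOnlyIterator<T> implements Iterator<T> {
  private final Iterator<T> delegate;

  ReadOnlyIterator(Iterator<T> delegate) {
    this.delegate = delegate;
  }

  @Override
  public boolean hasNext() {
    return delegate.hasNext();
  }

  @Override
  public T next() {
    return delegate.next();
  }

  @Override
  public void remove() {
    // Mutation through the iterator is refused.
    throw new UnsupportedOperationException("read-only iterator");
  }
}

class ReadOnlyIteratorDemo {
  public static void main(String[] args) {
    List<String> store = new ArrayList<>(Arrays.asList("a", "b"));
    // 'final' alone would still allow it.remove(); the wrapper blocks it.
    final Iterator<String> it = new ReadOnlyIterator<>(store.iterator());
    it.next();
    try {
      it.remove();
    } catch (UnsupportedOperationException e) {
      System.out.println("remove rejected, store size = " + store.size());
    }
  }
}
```

A wrapper like this only guards the iterator itself; if the elements are mutable objects, the joiner could still modify them, which would need a separate defensive-copy or immutability decision.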


- Yi


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34009/#review83405
---


On May 9, 2015, 1:52 a.m., Yi Pan (Data Infrastructure) wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34009/
 ---
 
 (Updated May 9, 2015, 1:52 a.m.)
 
 
 Review request for samza, Chris Riccomini, Guozhang Wang, Milinda Pathirage, 
 Navina Ramesh, and Naveen Somasundaram.
 
 
 Bugs: SAMZA-552
 https://issues.apache.org/jira/browse/SAMZA-552
 
 
 Repository: samza
 
 
 Description
 ---
 
 SAMZA-650: adding updateOutputs(), getResult(), and flush() in window 
 operator. Adding example code as use case for window operator.
 
 This is to solicit iteration on the change of Operator APIs to make them 
 simpler for programmers.
 * Main highlights:
   * Removed differentiation between RelationOperator and TupleOperator
   * Added OperatorCallback interface to allow users to insert callback 
 functions to be invoked before processing the input and before sending the result
   * Simplified test task implementation
 
 WIP:
   * Implementation of window store and message store
 
 
 Diffs
 -
 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/EntityName.java 
 PRE-CREATION 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Relation.java 
 PRE-CREATION 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Stream.java 
 PRE-CREATION 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Tuple.java 
 PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/Operator.java 
 PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/OperatorCallback.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/RelationOperator.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/SimpleOperator.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/SqlOperatorFactory.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/TupleOperator.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/spec/OperatorSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/router/OperatorRouter.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/data/IncomingMessageTuple.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/exception/OperatorException.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/DefaultOperatorCallback.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/SimpleOperator.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/SimpleOperatorFactoryImpl.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/SimpleOperatorImpl.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/SimpleOperatorSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/SimpleRouter.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/join/StreamStreamJoin.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/partition/PartitionOp.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/partition/PartitionSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/relation/Join.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/relation/JoinSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/stream/InsertStream.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/stream/InsertStreamSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql

Re: Review Request 34009: WIP: SAMZA-650 window store implementation

2015-05-12 Thread Yi Pan (Data Infrastructure)


 On May 12, 2015, 2:36 p.m., Milinda Pathirage wrote:
  samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Tuple.java, line 
  62
  https://reviews.apache.org/r/34009/diff/1/?file=954266#file954266line62
 
  I assume this is the system time. If so, a brief 
  description of the difference between system time and event time would 
  help.

@Milinda, I was hoping to get the broker's publish timestamp from the input 
stream if no timestamp field is specified by the user. I will make it explicit. 
Thanks!
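
The fallback described in this reply can be sketched as follows. This is only an illustration under assumptions: the TimestampResolver class, the map-based message, and the parameter names are hypothetical and do not come from the patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the patch.
final class TimestampResolver {
  // timestampField == null means the user did not specify a timestamp field.
  static long resolve(Map<String, Object> message, String timestampField,
                      long brokerPublishTimeMs) {
    if (timestampField != null) {
      Object value = message.get(timestampField);
      if (value instanceof Number) {
        // Event time taken from the user-specified field.
        return ((Number) value).longValue();
      }
    }
    // Otherwise fall back to the broker's publish timestamp.
    return brokerPublishTimeMs;
  }
}

class TimestampResolverDemo {
  public static void main(String[] args) {
    Map<String, Object> message = new HashMap<>();
    message.put("eventTime", 1000L);
    System.out.println(TimestampResolver.resolve(message, "eventTime", 2000L));
    System.out.println(TimestampResolver.resolve(message, null, 2000L));
  }
}
```

Whether a missing or non-numeric field should fall back silently, as in this sketch, or raise an error is a design choice the thread does not settle.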


 On May 12, 2015, 2:36 p.m., Milinda Pathirage wrote:
  samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/OperatorCallback.java,
   line 26
  https://reviews.apache.org/r/34009/diff/1/?file=954268#file954268line26
 
  Having some description about the usage of OperatorCallback would be 
  good.

Yes, absolutely. I am working on filling in all the Javadoc for these changes 
now.


 On May 12, 2015, 2:36 p.m., Milinda Pathirage wrote:
  samza-sql-core/src/main/java/org/apache/samza/sql/api/router/OperatorRouter.java,
   line 32
  https://reviews.apache.org/r/34009/diff/1/?file=954274#file954274line32
 
  I think this description is no longer 100% correct. Even though the 
  implementation is still the same, the interface doesn't expose some of the 
  things we exposed earlier to the user.
  
  I should have more comments about the router after I change my planner 
  implementation to work with this version of the router.

Sure. I will update the description here.


 On May 12, 2015, 2:36 p.m., Milinda Pathirage wrote:
  samza-sql-core/src/test/java/org/apache/samza/task/sql/UserCallbacksSqlTask.java,
   line 53
  https://reviews.apache.org/r/34009/diff/1/?file=954322#file954322line53
 
  Why is this class named UserCallbacksSqlTask?

I use this name to indicate that this is a task in which users of the operator 
APIs implement a customized OperatorCallback in the SQL task.
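
As a rough illustration of the idea, the sketch below shows a user-supplied callback invoked before the input is processed and before the result is sent, as the review description puts it. SketchCallback and SketchOperator are hypothetical stand-ins; the actual OperatorCallback signatures in the patch may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a callback interface; real signatures may differ.
interface SketchCallback {
  String beforeProcess(String input);  // invoked before the operator sees the input
  String afterProcess(String output);  // invoked before the result is sent
}

// Hypothetical operator that upper-cases its input and collects what it sends.
final class SketchOperator {
  private final SketchCallback callback;
  final List<String> sent = new ArrayList<>();

  SketchOperator(SketchCallback callback) {
    this.callback = callback;
  }

  void process(String input) {
    String in = callback.beforeProcess(input);
    if (in == null) {
      return;  // the callback chose to drop the message
    }
    String out = in.toUpperCase(Locale.ROOT);
    String result = callback.afterProcess(out);
    if (result != null) {
      sent.add(result);
    }
  }
}

class UserCallbackDemo {
  public static void main(String[] args) {
    SketchOperator op = new SketchOperator(new SketchCallback() {
      @Override
      public String beforeProcess(String input) {
        // Drop empty messages, trim the rest.
        return input.isEmpty() ? null : input.trim();
      }

      @Override
      public String afterProcess(String output) {
        return "out:" + output;
      }
    });
    op.process(" hello ");
    op.process("");
    System.out.println(op.sent);  // prints [out:HELLO]
  }
}
```

Keeping the callback separate from the operator lets the same pre- and post-processing logic be reused across operator types without subclassing each one.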


- Yi


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34009/#review83395
---


On May 9, 2015, 1:52 a.m., Yi Pan (Data Infrastructure) wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34009/
 ---
 
 (Updated May 9, 2015, 1:52 a.m.)
 
 
 Review request for samza, Chris Riccomini, Guozhang Wang, Milinda Pathirage, 
 Navina Ramesh, and Naveen Somasundaram.
 
 
 Bugs: SAMZA-552
 https://issues.apache.org/jira/browse/SAMZA-552
 
 
 Repository: samza
 
 
 Description
 ---
 
 SAMZA-650: adding updateOutputs(), getResult(), and flush() in window 
 operator. Adding example code as use case for window operator.
 
 This is to solicit iteration on the change of Operator APIs to make them 
 simpler for programmers.
 * Main highlights:
   * Removed differentiation between RelationOperator and TupleOperator
   * Added OperatorCallback interface to allow users to insert callback 
 functions to be invoked before processing the input and before sending the result
   * Simplified test task implementation
 
 WIP:
   * Implementation of window store and message store
 
 
 Diffs
 -
 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/EntityName.java 
 PRE-CREATION 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Relation.java 
 PRE-CREATION 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Stream.java 
 PRE-CREATION 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Tuple.java 
 PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/Operator.java 
 PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/OperatorCallback.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/RelationOperator.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/SimpleOperator.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/SqlOperatorFactory.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/TupleOperator.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/spec/OperatorSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/router/OperatorRouter.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/data/IncomingMessageTuple.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/exception/OperatorException.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/DefaultOperatorCallback.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/SimpleOperator.java

Re: Review Request 34009: WIP: SAMZA-650 window store implementation

2015-05-12 Thread Yi Pan (Data Infrastructure)


 On May 12, 2015, 2:39 p.m., Milinda Pathirage wrote:
  Hi, Yi,
  
  Patch looks good overall. I think we should get this into samza-sql branch 
  first and change the SAMZA-561 patch to work with latest API changes.

Thanks a lot! I am addressing the comments/issues from you and from my team 
members here now. I will update and commit asap.


- Yi


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34009/#review83396
---


On May 9, 2015, 1:52 a.m., Yi Pan (Data Infrastructure) wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34009/
 ---
 
 (Updated May 9, 2015, 1:52 a.m.)
 
 
 Review request for samza, Chris Riccomini, Guozhang Wang, Milinda Pathirage, 
 Navina Ramesh, and Naveen Somasundaram.
 
 
 Bugs: SAMZA-552
 https://issues.apache.org/jira/browse/SAMZA-552
 
 
 Repository: samza
 
 
 Description
 ---
 
 SAMZA-650: adding updateOutputs(), getResult(), and flush() in window 
 operator. Adding example code as use case for window operator.
 
 This is to solicit iteration on the change of Operator APIs to make them 
 simpler for programmers.
 * Main highlights:
   * Removed differentiation between RelationOperator and TupleOperator
   * Added OperatorCallback interface to allow users to insert callback 
 functions to be invoked before processing the input and before sending the result
   * Simplified test task implementation
 
 WIP:
   * Implementation of window store and message store
 
 
 Diffs
 -
 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/EntityName.java 
 PRE-CREATION 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Relation.java 
 PRE-CREATION 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Stream.java 
 PRE-CREATION 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Tuple.java 
 PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/Operator.java 
 PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/OperatorCallback.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/RelationOperator.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/SimpleOperator.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/SqlOperatorFactory.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/TupleOperator.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/spec/OperatorSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/router/OperatorRouter.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/data/IncomingMessageTuple.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/exception/OperatorException.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/DefaultOperatorCallback.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/SimpleOperator.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/SimpleOperatorFactoryImpl.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/SimpleOperatorImpl.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/SimpleOperatorSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/SimpleRouter.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/join/StreamStreamJoin.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/partition/PartitionOp.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/partition/PartitionSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/relation/Join.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/relation/JoinSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/stream/InsertStream.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/stream/InsertStreamSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/BoundedTimeWindow.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/FullStateTimeWindow.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/FullStateTimeWindowOp.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql

Re: Review Request 33735: RocksDB TTL support

2015-05-14 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33735/#review83847
---

Ship it!


Ship It!

- Yi Pan (Data Infrastructure)


On May 13, 2015, 11:10 p.m., Naveen Somasundaram wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/33735/
 ---
 
 (Updated May 13, 2015, 11:10 p.m.)
 
 
 Review request for samza.
 
 
 Repository: samza
 
 
 Description
 ---
 
 RocksDB TTL support
 https://issues.apache.org/jira/browse/SAMZA-537
 https://issues.apache.org/jira/browse/SAMZA-442
 
 Please ignore the maven link added to build.gradle, I'll remove it once I 
 validate the release is good.
 
 
 Diffs
 -
 
   build.gradle ac80a8664180e556ec83e229e04e3d8c56b70506 
   docs/learn/documentation/versioned/jobs/configuration-table.html 
 728197d01d1e3f551ea53e2a14e97df44e29ee19 
   gradle/dependency-versions.gradle ee6dfc411b7ab90b187df79f109884127953862e 
   
 samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStorageEngineFactory.scala
  5ab68590a4ed2686d730344665e25776cade6add 
   
 samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStore.scala
  dd20f171491da4b4d900551932b2a06d58526d73 
   
 samza-kv-rocksdb/src/test/scala/org/apache/samza/storage/kv/TestRocksDbKeyValueStore.scala
  PRE-CREATION 
   
 samza-test/src/test/scala/org/apache/samza/storage/kv/TestKeyValueStores.scala
  9dee7be9a58c491dbd1a6b9cf73d5c111c570da2 
 
 Diff: https://reviews.apache.org/r/33735/diff/
 
 
 Testing
 ---
 
 Added Unit test
 
 
 Thanks,
 
 Naveen Somasundaram
 




Re: Review Request 34206: WIP: update operator API to allow callbacks and allow a single API to trigger OperatorRouter execution w/ user callbacks

2015-05-14 Thread Yi Pan (Data Infrastructure)
/StoreMessageCollector.java
 PRE-CREATION 
  
samza-sql-core/src/test/java/org/apache/samza/task/sql/RandomOperatorTask.java 
PRE-CREATION 
  
samza-sql-core/src/test/java/org/apache/samza/task/sql/RandomWindowOperatorTask.java
 PRE-CREATION 
  samza-sql-core/src/test/java/org/apache/samza/task/sql/StreamSqlTask.java 
PRE-CREATION 
  
samza-sql-core/src/test/java/org/apache/samza/task/sql/UserCallbacksSqlTask.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/34206/diff/


Testing
---


Thanks,

Yi Pan (Data Infrastructure)



Re: Review Request 34206: WIP: update operator API to allow callbacks and allow a single API to trigger OperatorRouter execution w/ user callbacks

2015-05-14 Thread Yi Pan (Data Infrastructure)
/RandomOperatorTask.java 
PRE-CREATION 
  
samza-sql-core/src/test/java/org/apache/samza/task/sql/RandomWindowOperatorTask.java
 PRE-CREATION 
  samza-sql-core/src/test/java/org/apache/samza/task/sql/StreamSqlTask.java 
PRE-CREATION 
  
samza-sql-core/src/test/java/org/apache/samza/task/sql/UserCallbacksSqlTask.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/34206/diff/


Testing
---


Thanks,

Yi Pan (Data Infrastructure)



Re: Review Request 34009: SAMZA-552 window store implementation

2015-05-13 Thread Yi Pan (Data Infrastructure)


 On May 13, 2015, 9:56 p.m., Navina Ramesh wrote:
  samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/FullStateTimeWindowOp.java,
   line 309
  https://reviews.apache.org/r/34009/diff/1/?file=954292#file954292line309
 
  There are 2 refresh method definitions - one is internal and the other is 
  the public API?
  
  When is this refresh method invoked?

This is probably a mess-up in the operator API changes. I will fix it. Thanks!


 On May 13, 2015, 9:56 p.m., Navina Ramesh wrote:
  samza-sql-core/src/test/java/org/apache/samza/task/sql/RandomWindowOperatorTask.java,
   line 96
  https://reviews.apache.org/r/34009/diff/1/?file=954320#file954320line96
 
  Shouldn't the window id be system generated?

Yes, in the case when the whole query plan is system-generated. Here we just 
illustrate how a human programmer can instantiate and use the window operators.


- Yi


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34009/#review83667
---


On May 13, 2015, 5:36 p.m., Yi Pan (Data Infrastructure) wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34009/
 ---
 
 (Updated May 13, 2015, 5:36 p.m.)
 
 
 Review request for samza, Chris Riccomini, Guozhang Wang, Milinda Pathirage, 
 Navina Ramesh, and Naveen Somasundaram.
 
 
 Bugs: SAMZA-552
 https://issues.apache.org/jira/browse/SAMZA-552
 
 
 Repository: samza
 
 
 Description
 ---
 
 SAMZA-650: adding updateOutputs(), getResult(), and flush() in window 
 operator. Adding example code as use case for window operator.
 
 This is to solicit iteration on the change of Operator APIs to make them 
 simpler for programmers.
 * Main highlights:
   * Removed differentiation between RelationOperator and TupleOperator
   * Added OperatorCallback interface to allow users to insert callback 
 functions to be invoked before processing the input and before sending the result
   * Simplified test task implementation
 
 WIP:
   * Implementation of window store and message store
 
 
 Diffs
 -
 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/EntityName.java 
 PRE-CREATION 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Relation.java 
 PRE-CREATION 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Stream.java 
 PRE-CREATION 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Tuple.java 
 PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/Operator.java 
 PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/OperatorCallback.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/RelationOperator.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/SimpleOperator.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/SqlOperatorFactory.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/TupleOperator.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/spec/OperatorSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/router/OperatorRouter.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/data/IncomingMessageTuple.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/exception/OperatorException.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/DefaultOperatorCallback.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/SimpleOperator.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/SimpleOperatorFactoryImpl.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/SimpleOperatorImpl.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/SimpleOperatorSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/SimpleRouter.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/join/StreamStreamJoin.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/partition/PartitionOp.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/partition/PartitionSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/relation/Join.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/relation/JoinSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql

Review Request 34206: WIP: update operator API to allow callbacks and allow a single API to trigger OperatorRouter execution w/ user callbacks

2015-05-14 Thread Yi Pan (Data Infrastructure)
-core/src/test/java/org/apache/samza/task/sql/StreamSqlTask.java 
PRE-CREATION 
  
samza-sql-core/src/test/java/org/apache/samza/task/sql/UserCallbacksSqlTask.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/34206/diff/


Testing
---


Thanks,

Yi Pan (Data Infrastructure)



Re: Review Request 34206: WIP: update operator API to allow callbacks and allow a single API to trigger OperatorRouter execution w/ user callbacks

2015-05-15 Thread Yi Pan (Data Infrastructure)


 On May 15, 2015, 4:27 p.m., Milinda Pathirage wrote:
  samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Table.java, line 
  36
  https://reviews.apache.org/r/34206/diff/2/?file=960924#file960924line36
 
  How about supporting multi-column primary keys? We can add it to the 
  API even if we are not going to support it at the beginning.

Sure. I will change the API to return a list instead of a single value.
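
A minimal sketch of what such a change could look like, assuming a list-returning accessor. The interface name SketchTable and the method name getPrimaryKeyNames() are hypothetical; the actual Table API in the patch is not shown in this thread.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical shape of the accessor; names are assumptions, not the patch's API.
interface SketchTable {
  // A list allows composite keys; a single-column key is a list of size one.
  List<String> getPrimaryKeyNames();
}

final class SimpleSketchTable implements SketchTable {
  private final List<String> keys;

  SimpleSketchTable(String... keys) {
    // Copy the array and expose it as an unmodifiable list.
    this.keys = Collections.unmodifiableList(Arrays.asList(keys.clone()));
  }

  @Override
  public List<String> getPrimaryKeyNames() {
    return keys;
  }
}

class PrimaryKeyDemo {
  public static void main(String[] args) {
    SketchTable single = new SimpleSketchTable("id");
    SketchTable composite = new SimpleSketchTable("userId", "pageId");
    System.out.println(single.getPrimaryKeyNames());     // prints [id]
    System.out.println(composite.getPrimaryKeyNames());  // prints [userId, pageId]
  }
}
```

Returning a list from the start means a later move to composite keys does not require another signature change.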


- Yi


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34206/#review83937
---


On May 15, 2015, 2:16 a.m., Yi Pan (Data Infrastructure) wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34206/
 ---
 
 (Updated May 15, 2015, 2:16 a.m.)
 
 
 Review request for samza, Yan Fang, Chris Riccomini, Guozhang Wang, Milinda 
 Pathirage, Navina Ramesh, and Naveen Somasundaram.
 
 
 Bugs: SAMZA-552
 https://issues.apache.org/jira/browse/SAMZA-552
 
 
 Repository: samza
 
 
 Description
 ---
 
 This is one version of the Operator API change:
 - Merge RelationOperator and TupleOperator to SimpleOperator
 - Make OperatorRouter extend Operator to allow process() and 
 refresh() on the whole connected set of SimpleOperators
 - Modified test case examples to illustrate usage
 
 The downside of not defining a separate OperatorCallback function is that if 
 there are some commonly used user functions to preprocess the input and 
 outgoing messages, the user will have to extend different SimpleOperator 
 classes to override the beforeProcess() and afterProcess() functions.
 
 
 Diffs
 -
 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/EntityName.java 
 PRE-CREATION 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Relation.java 
 PRE-CREATION 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Table.java 
 PRE-CREATION 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Tuple.java 
 PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/Operator.java 
 PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/OperatorCallback.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/OperatorRouter.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/RelationOperator.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/SimpleOperator.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/SqlOperatorFactory.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/TupleOperator.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/spec/OperatorSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/router/OperatorRouter.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/data/IncomingMessageTuple.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/NoopOperatorCallback.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/SimpleOperator.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/SimpleOperatorFactoryImpl.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/SimpleOperatorImpl.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/SimpleOperatorSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/SimpleRouter.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/join/StreamStreamJoin.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/partition/PartitionOp.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/partition/PartitionSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/relation/Join.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/relation/JoinSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/stream/InsertStream.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/stream/InsertStreamSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/BoundedTimeWindow.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/WindowSpec.java
  PRE-CREATION 
   samza-sql-core/src/main/java/org/apache/samza/sql/router/SimpleRouter.java 
 PRE-CREATION

Re: Review Request 33170: Renamed samza-sql to samza-sql-core

2015-04-14 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33170/#review80043
---

Ship it!


I assume that those files w/ Calcite dependencies will be moved to 
samza-sql-calcite later.


samza-sql-core/README.md
https://reviews.apache.org/r/33170/#comment129800

We may want to add a bit more description here, e.g. that samza-sql-core is the 
module for backend operators and samza-sql-calcite is the module for the front-end 
parser/planner.



samza-sql-core/src/main/java/org/apache/samza/sql/metadata/AvroSchemaConverter.java
https://reviews.apache.org/r/33170/#comment129803

We should consider moving this to samza-sql-calcite, when the module is 
created.



samza-sql-core/src/main/java/org/apache/samza/sql/planner/QueryPlanner.java
https://reviews.apache.org/r/33170/#comment129805

This should be moved to samza-sql-calcite as well.



samza-sql-core/src/main/java/org/apache/samza/sql/planner/SamzaCalciteConnection.java
https://reviews.apache.org/r/33170/#comment129806

Same here.



samza-sql-core/src/main/java/org/apache/samza/sql/planner/SamzaQueryPreparingStatement.java
https://reviews.apache.org/r/33170/#comment129808

move to samza-sql-calcite.



samza-sql-core/src/main/java/org/apache/samza/sql/planner/SamzaSqlValidator.java
https://reviews.apache.org/r/33170/#comment129809

Same here.



samza-sql-core/src/test/java/org/apache/samza/sql/planner/QueryPlannerTest.java
https://reviews.apache.org/r/33170/#comment129811

Move to samza-sql-calcite



samza-sql-core/src/test/java/org/apache/samza/sql/planner/SamzaStreamTableFactory.java
https://reviews.apache.org/r/33170/#comment129814

Move to samza-sql-calcite



samza-sql-core/src/test/java/org/apache/samza/sql/test/metadata/TestAvroSchemaConverter.java
https://reviews.apache.org/r/33170/#comment129816

Same here.


- Yi Pan (Data Infrastructure)


On April 14, 2015, 3:14 p.m., Milinda Pathirage wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/33170/
 ---
 
 (Updated April 14, 2015, 3:14 p.m.)
 
 
 Review request for samza, Chris Riccomini and Yi Pan (Data Infrastructure).
 
 
 Bugs: SAMZA-648
 https://issues.apache.org/jira/browse/SAMZA-648
 
 
 Repository: samza
 
 
 Description
 ---
 
 This patch renames the samza-sql module to samza-sql-core. This is the first 
 step of separating samza-sql into multiple modules. samza-sql-core will 
 contain the operator-layer code, while separate modules will hold the 
 different front-end implementations. After this step, the Calcite-based 
 front-end code will be moved to the samza-sql-calcite module.
 
 
 Diffs
 -
 
   build.gradle ee4baa5 
   gradle/dependency-versions.gradle 46e7c28 
   samza-sql-core/README.md PRE-CREATION 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Data.java 
 PRE-CREATION 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/EntityName.java 
 PRE-CREATION 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Relation.java 
 PRE-CREATION 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Schema.java 
 PRE-CREATION 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Tuple.java 
 PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/Operator.java 
 PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/RelationOperator.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/SqlOperatorFactory.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/TupleOperator.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/spec/OperatorSpec.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/router/OperatorRouter.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/data/IncomingMessageTuple.java
  PRE-CREATION 
   samza-sql-core/src/main/java/org/apache/samza/sql/data/avro/AvroData.java 
 PRE-CREATION 
   samza-sql-core/src/main/java/org/apache/samza/sql/data/avro/AvroSchema.java 
 PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/data/serializers/SqlAvroSerde.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/data/serializers/SqlAvroSerdeFactory.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/data/serializers/SqlStringSerde.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/data/serializers/SqlStringSerdeFactory.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/data/string/StringData.java 
 PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/data/string

Re: Review Request 33219: [SAMZA-649] Create samza-sql-calcite module for Calcite SQL front end

2015-04-15 Thread Yi Pan (Data Infrastructure)


 On April 15, 2015, 6:20 p.m., Yi Pan (Data Infrastructure) wrote:
  samza-sql-calcite/src/main/java/org/apache/samza/sql/calcite/schema/AvroSchemaConverter.java,
   line 37
  https://reviews.apache.org/r/33219/diff/1/?file=930371#file930371line37
 
  I assume that this class is used to convert the data schema/types in 
  samza-sql-core model to Calcite's RelDataType? In that case, can we use the 
  generic Schema class in samza-sql-core instead of implementation specific 
  for Avro?

Please discard this comment. I just realized that this converter is probably 
used in query validation which will need to convert schemas in an available 
Avro schema repo to Calcite data model.


- Yi


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33219/#review80228
---


On April 15, 2015, 2:49 p.m., Milinda Pathirage wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/33219/
 ---
 
 (Updated April 15, 2015, 2:49 p.m.)
 
 
 Review request for samza, Chris Riccomini and Yi Pan (Data Infrastructure).
 
 
 Bugs: SAMZA-649
 https://issues.apache.org/jira/browse/SAMZA-649
 
 
 Repository: samza
 
 
 Description
 ---
 
 Moved Calcite based front-end to samza-sql-calcite module.
 
 
 Diffs
 -
 
   build.gradle a1c7133 
   
 samza-sql-calcite/src/main/java/org/apache/samza/sql/calcite/planner/QueryPlanner.java
  PRE-CREATION 
   
 samza-sql-calcite/src/main/java/org/apache/samza/sql/calcite/planner/SamzaCalciteConnection.java
  PRE-CREATION 
   
 samza-sql-calcite/src/main/java/org/apache/samza/sql/calcite/planner/SamzaQueryPreparingStatement.java
  PRE-CREATION 
   
 samza-sql-calcite/src/main/java/org/apache/samza/sql/calcite/planner/SamzaSqlValidator.java
  PRE-CREATION 
   
 samza-sql-calcite/src/main/java/org/apache/samza/sql/calcite/schema/AvroSchemaConverter.java
  PRE-CREATION 
   
 samza-sql-calcite/src/test/java/org/apache/samza/sql/calcite/planner/SamzaStreamTableFactory.java
  PRE-CREATION 
   
 samza-sql-calcite/src/test/java/org/apache/samza/sql/calcite/planner/TestQueryPlanner.java
  PRE-CREATION 
   
 samza-sql-calcite/src/test/java/org/apache/samza/sql/calcite/schema/TestAvroSchemaConverter.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/metadata/AvroSchemaConverter.java
  3dad046 
   samza-sql-core/src/main/java/org/apache/samza/sql/planner/QueryPlanner.java 
 1dfb262 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/planner/SamzaCalciteConnection.java
  63b1da5 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/planner/SamzaQueryPreparingStatement.java
  0721573 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/planner/SamzaSqlValidator.java
  f46c1f0 
   
 samza-sql-core/src/test/java/org/apache/samza/sql/planner/QueryPlannerTest.java
  022116e 
   
 samza-sql-core/src/test/java/org/apache/samza/sql/planner/SamzaStreamTableFactory.java
  f757d8f 
   
 samza-sql-core/src/test/java/org/apache/samza/sql/test/metadata/TestAvroSchemaConverter.java
  b4ac5f5 
   settings.gradle 5cbb755 
 
 Diff: https://reviews.apache.org/r/33219/diff/
 
 
 Testing
 ---
 
 ./bin/check-all.sh passed.
 
 
 Thanks,
 
 Milinda Pathirage
 




Re: Review Request 33219: [SAMZA-649] Create samza-sql-calcite module for Calcite SQL front end

2015-04-15 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33219/#review80237
---

Ship it!


+1

- Yi Pan (Data Infrastructure)


On April 15, 2015, 2:49 p.m., Milinda Pathirage wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/33219/
 ---
 
 (Updated April 15, 2015, 2:49 p.m.)
 
 
 Review request for samza, Chris Riccomini and Yi Pan (Data Infrastructure).
 
 
 Bugs: SAMZA-649
 https://issues.apache.org/jira/browse/SAMZA-649
 
 
 Repository: samza
 
 
 Description
 ---
 
 Moved Calcite based front-end to samza-sql-calcite module.
 
 
 Diffs
 -
 
   build.gradle a1c7133 
   
 samza-sql-calcite/src/main/java/org/apache/samza/sql/calcite/planner/QueryPlanner.java
  PRE-CREATION 
   
 samza-sql-calcite/src/main/java/org/apache/samza/sql/calcite/planner/SamzaCalciteConnection.java
  PRE-CREATION 
   
 samza-sql-calcite/src/main/java/org/apache/samza/sql/calcite/planner/SamzaQueryPreparingStatement.java
  PRE-CREATION 
   
 samza-sql-calcite/src/main/java/org/apache/samza/sql/calcite/planner/SamzaSqlValidator.java
  PRE-CREATION 
   
 samza-sql-calcite/src/main/java/org/apache/samza/sql/calcite/schema/AvroSchemaConverter.java
  PRE-CREATION 
   
 samza-sql-calcite/src/test/java/org/apache/samza/sql/calcite/planner/SamzaStreamTableFactory.java
  PRE-CREATION 
   
 samza-sql-calcite/src/test/java/org/apache/samza/sql/calcite/planner/TestQueryPlanner.java
  PRE-CREATION 
   
 samza-sql-calcite/src/test/java/org/apache/samza/sql/calcite/schema/TestAvroSchemaConverter.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/metadata/AvroSchemaConverter.java
  3dad046 
   samza-sql-core/src/main/java/org/apache/samza/sql/planner/QueryPlanner.java 
 1dfb262 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/planner/SamzaCalciteConnection.java
  63b1da5 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/planner/SamzaQueryPreparingStatement.java
  0721573 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/planner/SamzaSqlValidator.java
  f46c1f0 
   
 samza-sql-core/src/test/java/org/apache/samza/sql/planner/QueryPlannerTest.java
  022116e 
   
 samza-sql-core/src/test/java/org/apache/samza/sql/planner/SamzaStreamTableFactory.java
  f757d8f 
   
 samza-sql-core/src/test/java/org/apache/samza/sql/test/metadata/TestAvroSchemaConverter.java
  b4ac5f5 
   settings.gradle 5cbb755 
 
 Diff: https://reviews.apache.org/r/33219/diff/
 
 
 Testing
 ---
 
 ./bin/check-all.sh passed.
 
 
 Thanks,
 
 Milinda Pathirage
 




Re: Review Request 33142: [SAMZA-561] Review in progress

2015-04-14 Thread Yi Pan (Data Infrastructure)

I think that we may be able to combine these two operators. Let me think 
about it a bit more.



samza-sql/src/main/java/org/apache/samza/sql/operators/scan/ProjectableFilterableStreamScanSpec.java
https://reviews.apache.org/r/33142/#comment129692

These Calcite-specific classes should be moved out of the samza-sql-core module.


- Yi Pan (Data Infrastructure)


On April 13, 2015, 9:04 p.m., Yi Pan (Data Infrastructure) wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/33142/
 ---
 
 (Updated April 13, 2015, 9:04 p.m.)
 
 
 Review request for samza and Milinda Pathirage.
 
 
 Bugs: SAMZA-561
 https://issues.apache.org/jira/browse/SAMZA-561
 
 
 Repository: samza
 
 
 Description
 ---
 
 [SAMZA-561] Review in progress
 
 Posting Milinda's patch for SAMZA-561 to ease commenting and discussion.
 
 
 Diffs
 -
 
   build.gradle 97de3a28f6379e3862eec845da87587b1d4f742e 
   gradle/dependency-versions.gradle ee6dfc411b7ab90b187df79f109884127953862e 
   samza-sql/src/main/java/org/apache/samza/sql/Utils.java PRE-CREATION 
   
 samza-sql/src/main/java/org/apache/samza/sql/api/operators/spec/OperatorSpec.java
  PRE-CREATION 
   
 samza-sql/src/main/java/org/apache/samza/sql/data/IntermediateMessageTuple.java
  PRE-CREATION 
   
 samza-sql/src/main/java/org/apache/samza/sql/data/serializers/SqlAvroSerdeFactory.java
  PRE-CREATION 
   samza-sql/src/main/java/org/apache/samza/sql/expressions/Expression.java 
 PRE-CREATION 
   
 samza-sql/src/main/java/org/apache/samza/sql/expressions/RexToJavaCompiler.java
  PRE-CREATION 
   
 samza-sql/src/main/java/org/apache/samza/sql/expressions/RexToJavaUtils.java 
 PRE-CREATION 
   
 samza-sql/src/main/java/org/apache/samza/sql/metadata/AvroSchemaConverter.java
  PRE-CREATION 
   
 samza-sql/src/main/java/org/apache/samza/sql/metadata/RelDataTypeToAvroSchemaConverter.java
  PRE-CREATION 
   samza-sql/src/main/java/org/apache/samza/sql/metadata/Stream.java 
 PRE-CREATION 
   
 samza-sql/src/main/java/org/apache/samza/sql/operators/factory/SimpleOperatorFactoryImpl.java
  PRE-CREATION 
   
 samza-sql/src/main/java/org/apache/samza/sql/operators/factory/SimpleOperatorSpec.java
  PRE-CREATION 
   
 samza-sql/src/main/java/org/apache/samza/sql/operators/factory/TypeAwareOperatorSpec.java
  PRE-CREATION 
   
 samza-sql/src/main/java/org/apache/samza/sql/operators/insert/InsertToStreamOp.java
  PRE-CREATION 
   
 samza-sql/src/main/java/org/apache/samza/sql/operators/insert/InsertToStreamSpec.java
  PRE-CREATION 
   
 samza-sql/src/main/java/org/apache/samza/sql/operators/project/ProjectOp.java 
 PRE-CREATION 
   
 samza-sql/src/main/java/org/apache/samza/sql/operators/project/ProjectSpec.java
  PRE-CREATION 
   
 samza-sql/src/main/java/org/apache/samza/sql/operators/scan/ProjectableFilterableStreamScanOp.java
  PRE-CREATION 
   
 samza-sql/src/main/java/org/apache/samza/sql/operators/scan/ProjectableFilterableStreamScanSpec.java
  PRE-CREATION 
   
 samza-sql/src/main/java/org/apache/samza/sql/operators/scan/StreamScanSpec.java
  PRE-CREATION 
   samza-sql/src/main/java/org/apache/samza/sql/planner/ExecutionPlanner.java 
 PRE-CREATION 
   samza-sql/src/main/java/org/apache/samza/sql/planner/QueryPlanner.java 
 PRE-CREATION 
   
 samza-sql/src/main/java/org/apache/samza/sql/planner/rules/FilterableStreamScanRule.java
  PRE-CREATION 
   
 samza-sql/src/main/java/org/apache/samza/sql/planner/rules/ProjectableStreamScanRule.java
  PRE-CREATION 
   
 samza-sql/src/main/java/org/apache/samza/sql/planner/rules/RemoveIdentityProjectRule.java
  PRE-CREATION 
   
 samza-sql/src/main/java/org/apache/samza/sql/rel/ProjectableFilterableStreamScan.java
  PRE-CREATION 
   samza-sql/src/main/java/org/apache/samza/sql/rel/StreamScan.java 
 PRE-CREATION 
   samza-sql/src/main/java/org/apache/samza/task/sql/StreamSqlTask.java 
 PRE-CREATION 
   
 samza-sql/src/test/java/org/apache/samza/sql/data/serializers/SqlAvroSerdeTest.java
  PRE-CREATION 
   samza-sql/src/test/java/org/apache/samza/sql/planner/QueryPlannerTest.java 
 PRE-CREATION 
   
 samza-sql/src/test/java/org/apache/samza/sql/planner/SamzaStreamTableFactory.java
  PRE-CREATION 
   
 samza-sql/src/test/java/org/apache/samza/sql/planner/TestExecutionPlanner.java
  PRE-CREATION 
   samza-sql/src/test/java/org/apache/samza/sql/planner/TestQueryPlanner.java 
 PRE-CREATION 
   
 samza-sql/src/test/java/org/apache/samza/sql/planner/TestRexToJavaCompiler.java
  PRE-CREATION 
   samza-sql/src/test/java/org/apache/samza/sql/test/Constants.java 
 PRE-CREATION 
   samza-sql/src/test/java/org/apache/samza/sql/test/Utils.java PRE-CREATION 
   
 samza-sql/src/test/java/org/apache/samza/sql/test/metadata/TestAvroSchemaConverter.java
  PRE-CREATION 
   samza-sql/src/test/java/org/apache/samza/task/sql/RandomOperatorTask.java 
 PRE-CREATION 
   samza-sql/src/test/java/org/apache/samza/task

Review Request 32872: SAMZA-571: add test to capture RecordTooLargeException

2015-04-06 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32872/
---

Review request for samza, Yan Fang, Chinmay Soman, Chris Riccomini, Navina 
Ramesh, and Naveen Somasundaram.


Bugs: SAMZA-571
https://issues.apache.org/jira/browse/SAMZA-571


Repository: samza


Description
---

SAMZA-571: add test to capture RecordTooLargeException


Diffs
-

  
samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala
 d66b3bd070a4cef4b1d3dded1d79a33cbe3fa09b 

Diff: https://reviews.apache.org/r/32872/diff/


Testing
---

Passed local test suite


Thanks,

Yi Pan (Data Infrastructure)



Re: Review Request 35723: SAMZA-720: fix bootstrap hangs when container number 1

2015-06-22 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35723/#review88820
---

Ship it!


+1. LGTM. Thanks for the quick fix, Yan.

- Yi Pan (Data Infrastructure)


On June 22, 2015, 8 a.m., Yan Fang wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/35723/
 ---
 
 (Updated June 22, 2015, 8 a.m.)
 
 
 Review request for samza.
 
 
 Bugs: SAMZA-720
 https://issues.apache.org/jira/browse/SAMZA-720
 
 
 Repository: samza
 
 
 Description
 ---
 
 remove the unregistered ssps in the laggingSsp set.
 
 
 Diffs
 -
 
   
 samza-core/src/main/scala/org/apache/samza/system/chooser/BootstrappingChooser.scala
  dd500b9 
 
 Diff: https://reviews.apache.org/r/35723/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Yan Fang
 




Re: Review Request 35397: Fix SAMZA-697

2015-06-19 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35397/#review88236
---


Overall looks good. I have a few comments/questions. Thanks!


bin/check-all.sh (line 60)
https://reviews.apache.org/r/35397/#comment140653

I think that this is from another commit? You may need to rebase your 
changes.



docs/learn/documentation/versioned/jobs/configuration-table.html (line 443)
https://reviews.apache.org/r/35397/#comment141085

Users who implement StreamTask usually put their implementation classes in 
the same package, org.apache.samza.task. Wouldn't this default blacklist also 
ignore the user-implemented StreamTask classes?
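
The concern can be illustrated with a minimal sketch, assuming the blacklist is matched by package prefix (the patch itself may match differently). `PrefixBlacklist`, `MyUserTask`, and `com.example.jobs` are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical prefix-based filter; NOT the TaskClassLoader code under review.
final class PrefixBlacklist {
    private final List<String> prefixes;

    PrefixBlacklist(String... prefixes) {
        this.prefixes = Arrays.asList(prefixes);
    }

    // True if the class name starts with any blacklisted package prefix.
    boolean isBlacklisted(String className) {
        for (String prefix : prefixes) {
            if (className.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}

final class PrefixBlacklistDemo {
    public static void main(String[] args) {
        PrefixBlacklist blacklist = new PrefixBlacklist("org.apache.samza.task.");
        // Framework interface: intended to match the blacklist.
        System.out.println(blacklist.isBlacklisted("org.apache.samza.task.StreamTask"));
        // Hypothetical user class placed in the same package: also matches.
        System.out.println(blacklist.isBlacklisted("org.apache.samza.task.MyUserTask"));
        // Hypothetical user class in its own package: does not match.
        System.out.println(blacklist.isBlacklisted("com.example.jobs.MyUserTask"));
    }
}
```

Under this assumption, a pure prefix match cannot tell framework classes from user classes that share the package, which is the case the question raises.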



samza-core/src/main/java/org/apache/samza/task/TaskClassLoader.java (line 71)
https://reviews.apache.org/r/35397/#comment141086

nit: for unit tests, changing it to package-private (default) visibility 
should work.



samza-core/src/main/java/org/apache/samza/task/TaskClassLoader.java (line 156)
https://reviews.apache.org/r/35397/#comment141087

Question: does findLoadedClass(name) return a class (non-null) if a class with 
the same binary name has been loaded by another classloader? Can we add a test 
here to make sure?
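
A small probe along these lines could serve as the basis for such a test. It relies on the documented behavior of `ClassLoader.findLoadedClass`, which returns a `Class` only when this loader has been recorded as an initiating loader for that binary name, and `null` otherwise. `ProbeLoader` is a hypothetical helper, not part of the patch.

```java
// Hypothetical helper that exposes the protected ClassLoader.findLoadedClass.
final class ProbeLoader extends ClassLoader {
    ProbeLoader(ClassLoader parent) {
        super(parent);
    }

    // Returns the class if THIS loader is recorded as having loaded it, else null.
    Class<?> probe(String binaryName) {
        return findLoadedClass(binaryName);
    }
}

final class FindLoadedClassDemo {
    public static void main(String[] args) {
        // This class is already loaded by the application class loader.
        String name = FindLoadedClassDemo.class.getName();
        // A fresh loader that has never been asked to load anything.
        ProbeLoader fresh = new ProbeLoader(FindLoadedClassDemo.class.getClassLoader());
        // Expected to be null: the class was loaded elsewhere, not by this loader.
        Class<?> seen = fresh.probe(name);
        System.out.println("visible to fresh loader: " + (seen != null));
    }
}
```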



samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala (line 
433)
https://reviews.apache.org/r/35397/#comment141089

What if the taskClassLoaderPath is null? Shouldn't we just skip the swapping 
of classloaders?

Maybe it would be good to print a warning for now and use the default 
classloader, then later enforce it by throwing an exception if 
taskClassLoaderPath is not set up.
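
The warn-and-fall-back behavior suggested here might look roughly like the sketch below. `TaskClassLoaderResolver` and its `resolve` method are hypothetical, and treating the path as a single jar or directory is an assumption; the patch may interpret the setting differently.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Paths;
import java.util.logging.Logger;

// Hypothetical helper; not part of the patch under review.
final class TaskClassLoaderResolver {
    private static final Logger LOG =
        Logger.getLogger(TaskClassLoaderResolver.class.getName());

    private TaskClassLoaderResolver() {}

    // Returns a loader for the configured path, or the default loader (with a
    // warning) when no path is configured.
    static ClassLoader resolve(String taskClassLoaderPath, ClassLoader defaultLoader) {
        if (taskClassLoaderPath == null || taskClassLoaderPath.isEmpty()) {
            LOG.warning("taskClassLoaderPath is not set; using the default classloader");
            return defaultLoader;
        }
        try {
            // Assumption: the path names one jar or directory on the local file system.
            URL url = Paths.get(taskClassLoaderPath).toUri().toURL();
            return new URLClassLoader(new URL[] {url}, defaultLoader);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(
                "Invalid taskClassLoaderPath: " + taskClassLoaderPath, e);
        }
    }
}
```

Switching to the stricter behavior later would only require replacing the warning branch with an exception.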



samza-core/src/main/scala/org/apache/samza/container/TaskInstance.scala (line 
97)
https://reviews.apache.org/r/35397/#comment141088

It would seem cleaner as:

def swapClassloader(userTask: => Unit): Unit = {
  val thread = Thread.currentThread
  val oldClassLoader = thread.getContextClassLoader
  thread.setContextClassLoader(taskClassLoader)
  try {
    userTask
  } finally {
    // always restore the original classloader, even if userTask throws
    thread.setContextClassLoader(oldClassLoader)
  }
}

Then:
swapClassloader {
  task.asInstanceOf[InitableTask].init(config, context)
}

swapClassloader {
  exceptionHandler.maybeHandle {
    task.process(envelope, collector, coordinator)
  }
}
...
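
For the Java side of the code base, the same swap-and-restore pattern can be sketched as follows. `ContextClassLoaders.runWith` is a hypothetical helper name; only `Thread.getContextClassLoader` and `Thread.setContextClassLoader` are real JDK calls.

```java
// Hypothetical helper mirroring the swapClassloader suggestion above.
final class ContextClassLoaders {
    private ContextClassLoaders() {}

    // Runs body with the given context classloader and restores the previous
    // one afterwards, even if body throws.
    static void runWith(ClassLoader loader, Runnable body) {
        Thread thread = Thread.currentThread();
        ClassLoader previous = thread.getContextClassLoader();
        thread.setContextClassLoader(loader);
        try {
            body.run();
        } finally {
            thread.setContextClassLoader(previous);
        }
    }
}

final class ContextClassLoadersDemo {
    public static void main(String[] args) {
        ClassLoader before = Thread.currentThread().getContextClassLoader();
        // Stand-in for the task classloader.
        ClassLoader taskLoader = new ClassLoader(before) {};
        ContextClassLoaders.runWith(taskLoader, () ->
            System.out.println("inside uses task loader: "
                + (Thread.currentThread().getContextClassLoader() == taskLoader)));
        System.out.println("restored afterwards: "
            + (Thread.currentThread().getContextClassLoader() == before));
    }
}
```

Keeping the restore in a `finally` block is what makes the helper safe to wrap around `init` and `process` calls that may throw.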


- Yi Pan (Data Infrastructure)


On June 18, 2015, 6:42 p.m., Guozhang Wang wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/35397/
 ---
 
 (Updated June 18, 2015, 6:42 p.m.)
 
 
 Review request for samza.
 
 
 Bugs: SAMZA-697
 https://issues.apache.org/jira/browse/SAMZA-697
 
 
 Repository: samza
 
 
 Description
 ---
 
 Address Yan's comments
 
 
 Diffs
 -
 
   checkstyle/import-control.xml 3374f0c432e61ac4cda275377604cfd481f0cddf 
   docs/learn/documentation/versioned/jobs/configuration-table.html 
 405e2cea4fd1d037cc26b3537f6bb406eded202b 
   samza-core/src/main/java/org/apache/samza/task/TaskClassLoader.java 
 PRE-CREATION 
   samza-core/src/main/scala/org/apache/samza/config/TaskConfig.scala 
 0b3a235b5ab1d6bd60669bfe6023f6b0b4e943d3 
   samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 
 cbacd183420e9d1d72b05693b55a8f0a62d59fc5 
   samza-core/src/main/scala/org/apache/samza/container/TaskInstance.scala 
 c5a5ea5dea9a950fc741625238f5bf8b1f362180 
   samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala 
 1c178a661e449c6bdfc4ce431aef9bb2d261a6c2 
   
 samza-core/src/main/scala/org/apache/samza/job/local/ProcessJobFactory.scala 
 4fac154709d72ab594485dad93c912b55fb1617e 
   samza-core/src/test/java/org/apache/samza/task/TestTaskClassLoader.java 
 PRE-CREATION 
   
 samza-core/src/test/scala/org/apache/samza/container/TestSamzaContainer.scala 
 9fb1aa98fcd14397e8a4cb00c67537482e95fa53 
   samza-core/src/test/scala/org/apache/samza/container/TestTaskInstance.scala 
 7caad28c9298485753ab861da76793cf925953ed 
 
 Diff: https://reviews.apache.org/r/35397/diff/
 
 
 Testing
 ---
 
 unit tests
 
 
 Thanks,
 
 Guozhang Wang
 




Re: Review Request 35325: SAMZA-698: update Samza and Spark Streaming Comparison

2015-06-12 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35325/#review87787
---


Overall very nice comparison and review. I just have one recommendation here. 
Thanks, Yan!


docs/learn/documentation/versioned/comparisons/spark-streaming.md
https://reviews.apache.org/r/35325/#comment140211

I feel that mentioning exactly-once for Spark here is not appropriate if it 
has to fall back to at-least-once on failures. It is trivial to claim 
exactly-once under the assumption that the system is always running healthily.


- Yi Pan (Data Infrastructure)


On June 12, 2015, 11:54 p.m., Yan Fang wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/35325/
 ---
 
 (Updated June 12, 2015, 11:54 p.m.)
 
 
 Review request for samza.
 
 
 Bugs: SAMZA-698
 https://issues.apache.org/jira/browse/SAMZA-698
 
 
 Repository: samza
 
 
 Description
 ---
 
 update the doc
 
 
 Diffs
 -
 
   docs/learn/documentation/versioned/comparisons/spark-streaming.md b8a521f 
 
 Diff: https://reviews.apache.org/r/35325/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Yan Fang
 




Re: Review Request 34500: SAMZA-552 Operator API change: builder and simplified operator classes

2015-05-29 Thread Yi Pan (Data Infrastructure)


 On May 29, 2015, 1:28 a.m., Navina Ramesh wrote:
  samza-sql-core/src/test/java/org/apache/samza/task/sql/UserCallbacksSqlTask.java,
   line 120
  https://reviews.apache.org/r/34500/diff/1/?file=965740#file965740line120
 
  I thought TopologyBuilder was to abstract away the spec and provide a 
  simplified API for a user implementing a simple SQL query. 
  Imo, this still seems pretty involved for a user concerned with just 
  defining a simple join query. 
  
  I assumed we could have a builder pattern as below:
  
  ```
  TopologyBuilder builder = TopologyBuilder
  .create()
  .join(window(stream1, 10), 
  window(stream2, 10), List{joinKey1, joinKey2, ...})
  .partition(partitionKey)
  .build()
  ```
  
  The idea here is that the build statement order determines the 
  topology. The builder just validates and chains them together. 
  I can see that this can be a problem with running operators in parallel 
  and possibly, make it hard for the user to understand the correct sequence 
  of operators. 
  I am wondering if you think this kind of a model is possible. It would 
  greatly simplify the API for most users. 
  Just wanted to put this comment out so that we can discuss further.
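
The idea that call order determines the topology, with the builder only validating and chaining, can be sketched as below. This is an illustration under stated assumptions: `LinearTopologyBuilder` is a hypothetical class, streams and operators are represented as plain strings, and none of it is the Samza API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical linear builder: operators are chained in the order they are called.
final class LinearTopologyBuilder {
    private final List<String> steps = new ArrayList<>();

    static LinearTopologyBuilder create() {
        return new LinearTopologyBuilder();
    }

    // Describes a windowed input; size is an arbitrary illustrative parameter.
    static String window(String stream, int size) {
        return "window(" + stream + ", " + size + ")";
    }

    LinearTopologyBuilder join(String left, String right, List<String> joinKeys) {
        if (joinKeys.isEmpty()) {
            throw new IllegalArgumentException("join needs at least one key");
        }
        steps.add("join(" + left + ", " + right + " on " + joinKeys + ")");
        return this;
    }

    LinearTopologyBuilder partition(String partitionKey) {
        requireUpstream("partition");
        steps.add("partition(" + partitionKey + ")");
        return this;
    }

    // Returns the validated chain of operators in call order.
    List<String> build() {
        requireUpstream("build");
        return Collections.unmodifiableList(new ArrayList<>(steps));
    }

    private void requireUpstream(String op) {
        if (steps.isEmpty()) {
            throw new IllegalStateException(op + " requires an upstream operator");
        }
    }
}
```

A chain built this way is strictly linear, so it cannot by itself express operators with several outputs or outputs shared by several consumers.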
 
 Milinda Pathirage wrote:
 I also agree with Navina here. I think we should make building topologies 
 simple with the builder API. One complexity of the current OperatorSpec-based 
 API is that you need to create intermediate streams (EntityNames) to wire 
 operators together. I think we should try to hide that complexity through the 
 builder API. Even though source and sink hide that complexity to some extent, 
 it's better if we can remove it completely.
 
 Yi Pan (Data Infrastructure) wrote:
 Thank you both for the good points here. @Navina, yes, the basic idea for 
 the topology builder is exactly what you mentioned and the model you 
 illustrated is much simpler and very attractive. The issue I saw is that the 
 topology may not be completely linear, or a tree. It is not easy to describe 
 a network of operators like the following, w/o introducing the concept of 
 intermediate streams.
window--aggregate--+-+
window--aggregate join --+  |   +--aggregate --+
  project--window-- join --|split -+   |
  +join 
 --join -- partition
 There are three issues in the above example:
 1. the join input may be intermediate streams, which essentially could be 
 an output from a sub-topology
 2. the multi-output operator will make the downstream expression branch 
 off and not easily expressible in linear format
 3. the output of a single operator may be used by multiple downstream 
 operators, again forking off the linear expression
 
 Maybe we can adopt the simple join builder as you illustrated for simple 
 queries, although I think that I would like to add an OperatorBuilder as well 
 here:
 OperatorRouter simpleJoinQuery = TopologyBuilder.create()
   .join(OperatorBuilder.window(stream1, 10), OperatorBuilder.window(stream2, 10), joinKeys)
   .partition(partitionKey).build();
 
 For more complex queries, I think the following method may work better:
 OperatorRouter router = TopologyBuilder.create()
   .beginStream(stream1).window(10).aggregate(group-by, treeId, sum)
   .beginStream(stream2).window(10).aggregate(group-by, treeId, avg).join(joinCondition)
   .beginStream(stream3).project(fieldList).window(10).join(joinCondition)
   .partition(partitionKey, number, outstream1).build();
   
 Here, beginStream() always signifies the start of one linear path of 
 operators and adds it to the topology. The following join operator joins the 
 latest two streams and creates one joined stream. This model may even solve 
 issue 2 by allowing: 
   ...split().beginStream(1).aggregate().beginStream(2)
 .beginStream(stream4).filter().window()
 .join().join().partition().build();
 
 For issue 3, i.e. reusing a certain intermediate stream in multiple 
 downstream operators, we can introduce beginReuseStream(source, name) as 
 follows:
 OperatorRouter router = TopologyBuilder.create()
   .beginReuseStream(stream1, reuseStream1).window(10).aggregate(group-by, treeId, sum)
   .beginStream(stream2).window(10).aggregate(group-by, treeId, avg).join(joinCondition)
   .beginStream(stream3).project(fieldList).window(10).join(joinCondition)
   .join(reuseStream1, joinCondition

Re: Review Request 34746: Adding new CoordinatorStreamMessage SetContainerHostMapping and LocalityManager (SAMZA-618)

2015-06-01 Thread Yi Pan (Data Infrastructure)


 On May 30, 2015, 8:58 a.m., Yi Pan (Data Infrastructure) wrote:
  samza-core/src/main/java/org/apache/samza/container/LocalityManager.java, 
  line 62
  https://reviews.apache.org/r/34746/diff/2/?file=974783#file974783line62
 
  This would be invoked twice by checkpointManager and localityManager. 
  Generally, I felt that since checkpointManager and localityManager are 
  referring to the same set of coordinatorSystemConsumer and 
  coordinatorSystemProducer, shouldn't the start/stop/register be in the same 
  life cycle, instead of being invoked twice separately?
 
 Navina Ramesh wrote:
 Yeah. Naveen and I discussed this. CheckpointManager, ChangelogManager and 
 LocalityManager can all use the same CoordinatorStreamAccessor instance, and 
 we can have accessors like getCoordinatorStreamProducer() and 
 getCoordinatorStreamConsumer().
 We have to refactor the entire set of coordinator stream related code. I 
 started making these changes. But they are pretty extensive. This change will 
 be a part of [SAMZA-678](https://issues.apache.org/jira/browse/SAMZA-678)

Got it. Thanks for explaining.


 On May 30, 2015, 8:58 a.m., Yi Pan (Data Infrastructure) wrote:
  samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala,
   line 74
  https://reviews.apache.org/r/34746/diff/2/?file=974787#file974787line74
 
  Maybe it is easier to wrap the three managers, which all need to 
  initialize the coordinatorSystemProducer/coordinatorSystemConsumer, in a 
  single CoordinatorStreamManager? 
  CoordinatorStreamManager.getCheckpointManager()/getChangelogManager()/getLocalityManager()
   would then return the specific management function handler.
 
 Navina Ramesh wrote:
 I guess this is also a pattern we can follow during the refactoring. 
 As mentioned above, I will add this refactoring as a part of 
 [SAMZA-678](https://issues.apache.org/jira/browse/SAMZA-678)

Thanks!
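
The shared-lifecycle idea discussed in this thread could take roughly the following shape. This is only a sketch: `CoordinatorStreamConnection`, `CoordinatorStreamManagerSketch`, and `Handler` are hypothetical stand-ins, and the real producer/consumer and manager classes are not modeled.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the shared coordinator stream producer/consumer pair.
final class CoordinatorStreamConnection {
    final AtomicInteger startCount = new AtomicInteger();
    final AtomicInteger stopCount = new AtomicInteger();

    void start() { startCount.incrementAndGet(); }
    void stop() { stopCount.incrementAndGet(); }
}

// Hypothetical wrapper that owns the connection lifecycle for all three managers.
final class CoordinatorStreamManagerSketch {
    // Hypothetical handler: a role-specific view over the shared connection.
    static final class Handler {
        final String role;
        private final CoordinatorStreamConnection connection;

        Handler(String role, CoordinatorStreamConnection connection) {
            this.role = role;
            this.connection = connection;
        }

        boolean sharesConnectionWith(Handler other) {
            return connection == other.connection;
        }
    }

    private final CoordinatorStreamConnection connection;
    private boolean started;

    CoordinatorStreamManagerSketch(CoordinatorStreamConnection connection) {
        this.connection = connection;
    }

    // Starts the underlying connection once, no matter how many callers ask.
    synchronized void start() {
        if (!started) {
            connection.start();
            started = true;
        }
    }

    synchronized void stop() {
        if (started) {
            connection.stop();
            started = false;
        }
    }

    Handler getCheckpointManager() { return new Handler("checkpoint", connection); }
    Handler getChangelogManager() { return new Handler("changelog", connection); }
    Handler getLocalityManager() { return new Handler("locality", connection); }
}
```

With a single owner for start and stop, the double invocation noted in the first comment cannot occur, since repeated start calls become no-ops.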


- Yi


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34746/#review85855
---


On May 30, 2015, 11:49 p.m., Navina Ramesh wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34746/
 ---
 
 (Updated May 30, 2015, 11:49 p.m.)
 
 
 Review request for samza, Chris Riccomini, Guozhang Wang, Yi Pan (Data 
 Infrastructure), and Naveen Somasundaram.
 
 
 Bugs: SAMZA-618
 https://issues.apache.org/jira/browse/SAMZA-618
 
 
 Repository: samza
 
 
 Description
 ---
 
 Adding Locality Manager file
 
 
 reading in JC and writing from containers
 
 
 After SAMZA-686 changes
 
 
 Fixing stylechecks
 
 
 Correcting when coordinator system accessors start and stop
 
 
 Corrected documentation
 
 
 Diffs
 -
 
   checkstyle/import-control.xml 5f8e103a2e43f96518b20de1c7cbd84e0af24842 
   samza-core/src/main/java/org/apache/samza/container/LocalityManager.java 
 PRE-CREATION 
   
 samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamMessage.java
  0988dedc3e8ad1b4080fb89dfff7c6f95fba8b67 
   samza-core/src/main/java/org/apache/samza/job/model/JobModel.java 
 fa113e12080384586b329c82133bc0601b855ae5 
   samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 
 50e53fbcb55c4e9176bf29217a341b195c96d762 
   samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala 
 5b43b58a851c363846b433ebd589ce6dd5c5c932 
   
 samza-core/src/test/scala/org/apache/samza/container/TestSamzaContainer.scala 
 a7fa0857d1243f5a24e4550a39ee230fbd7705bb 
 
 Diff: https://reviews.apache.org/r/34746/diff/
 
 
 Testing
 ---
 
 Tested with a sample job locally and also on a YARN setup across 3 machines. 
 Verified that the message is correctly written to and consumed from the 
 Coordinator Stream.
 
 
 Thanks,
 
 Navina Ramesh
 




Re: Review Request 34746: Adding new CoordinatorStreamMessage SetContainerHostMapping and LocalityManager (SAMZA-618)

2015-06-01 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34746/#review86125
---

Ship it!


Ship It!

- Yi Pan (Data Infrastructure)


On May 30, 2015, 11:49 p.m., Navina Ramesh wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34746/
 ---
 
 (Updated May 30, 2015, 11:49 p.m.)
 
 
 Review request for samza, Chris Riccomini, Guozhang Wang, Yi Pan (Data 
 Infrastructure), and Naveen Somasundaram.
 
 
 Bugs: SAMZA-618
 https://issues.apache.org/jira/browse/SAMZA-618
 
 
 Repository: samza
 
 
 Description
 ---
 
 Adding Locality Manager file
 
 
 reading in JC and writing from containers
 
 
 After SAMZA-686 changes
 
 
 Fixing stylechecks
 
 
 Correcting when coordinator system accessors start and stop
 
 
 Corrected documentation
 
 
 Diffs
 -
 
   checkstyle/import-control.xml 5f8e103a2e43f96518b20de1c7cbd84e0af24842 
   samza-core/src/main/java/org/apache/samza/container/LocalityManager.java 
 PRE-CREATION 
   
 samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamMessage.java
  0988dedc3e8ad1b4080fb89dfff7c6f95fba8b67 
   samza-core/src/main/java/org/apache/samza/job/model/JobModel.java 
 fa113e12080384586b329c82133bc0601b855ae5 
   samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 
 50e53fbcb55c4e9176bf29217a341b195c96d762 
   samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala 
 5b43b58a851c363846b433ebd589ce6dd5c5c932 
   
 samza-core/src/test/scala/org/apache/samza/container/TestSamzaContainer.scala 
 a7fa0857d1243f5a24e4550a39ee230fbd7705bb 
 
 Diff: https://reviews.apache.org/r/34746/diff/
 
 
 Testing
 ---
 
 Used a sample job to test it locally and also, by setting up YARN on 3 
 machines. 
 Verified that the message is correctly written and consumed from the 
 Coordinator Stream
 
 
 Thanks,
 
 Navina Ramesh
 




Review Request 34664: SAMZA-552 Operator API change: builder and simplified operator classes

2015-05-26 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34664/
---

Review request for samza, Yan Fang, Chinmay Soman, Chris Riccomini, Guozhang 
Wang, Milinda Pathirage, Navina Ramesh, and Naveen Somasundaram.


Bugs: SAMZA-650
https://issues.apache.org/jira/browse/SAMZA-650


Repository: samza


Description
---

WIP: update operator API to allow callbacks and allow a single API to trigger 
OperatorRouter execution w/ user callbacks

Conflicts:
samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Tuple.java

samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/Operator.java

samza-sql-core/src/main/java/org/apache/samza/sql/data/IncomingMessageTuple.java

samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/SimpleOperator.java

samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/SimpleOperatorFactoryImpl.java

samza-sql-core/src/main/java/org/apache/samza/sql/operators/join/StreamStreamJoin.java

samza-sql-core/src/main/java/org/apache/samza/sql/operators/join/StreamStreamJoinSpec.java

samza-sql-core/src/main/java/org/apache/samza/sql/operators/relation/JoinSpec.java

samza-sql-core/src/main/java/org/apache/samza/sql/operators/stream/InsertStreamSpec.java

samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/FullStateTimeWindowOp.java

samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/FullStateWindowOp.java

samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/WindowOp.java

samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/WindowOpSpec.java

samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/WindowState.java

samza-sql-core/src/main/java/org/apache/samza/sql/window/storage/HashPrefixedMessageStore.java

samza-sql-core/src/main/java/org/apache/samza/task/sql/OperatorMessageCollector.java

samza-sql-core/src/test/java/org/apache/samza/task/sql/RandomOperatorTask.java

samza-sql-core/src/test/java/org/apache/samza/task/sql/RandomWindowOperatorTask.java

samza-sql-core/src/test/java/org/apache/samza/task/sql/StreamSqlTask.java

WIP: updated operator API and use case in test tasks

Conflicts:

samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/Operator.java

samza-sql-core/src/main/java/org/apache/samza/sql/api/operators/SimpleOperator.java

samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/SimpleOperatorFactoryImpl.java

samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/FullStateTimeWindowOp.java

samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/FullStateWindowOp.java

samza-sql-core/src/main/java/org/apache/samza/sql/operators/window/WindowOp.java

samza-sql-core/src/main/java/org/apache/samza/sql/window/storage/MessageStore.java

samza-sql-core/src/main/java/org/apache/samza/sql/window/storage/MessageStoreSpec.java

samza-sql-core/src/main/java/org/apache/samza/sql/window/storage/WindowState.java

samza-sql-core/src/main/java/org/apache/samza/sql/window/storage/WindowStore.java

samza-sql-core/src/main/java/org/apache/samza/task/sql/StoreMessageCollector.java

samza-sql-core/src/test/java/org/apache/samza/task/sql/StreamSqlTask.java

SAMZA-552: Updated operator API w/o callbacks

SAMZA-552: updated Operator API w/o callbacks

SAMZA-552: use OperatorCallback to allow implementation of callbacks w/o 
inheriting and creating many sub-classes from operators

SAMZA-552 update the operator API

SAMZA-552: operator builder API update

Squashed commit of the following:

commit fad81106901e494d3950eeaafaeefef482ac0125
Author: Yi Pan (Data Infrastructure) yi...@linkedin.com
Date:   Mon May 25 23:40:00 2015 -0700

SAMZA-650 window message store and window store implementation

commit 58c2eeebf4bb0975f70aeba733379e1104f3a7de
Author: Yi Pan (Data Infrastructure) yi...@linkedin.com
Date:   Fri May 22 00:25:13 2015 -0700

WIP: window store implementation

commit 917e1b599622c2d46ad9a6c63e52dcded893eb8e
Merge: 174eeb0 1183e9f
Author: Yi Pan (Data Infrastructure) yi...@linkedin.com
Date:   Fri May 22 00:18:35 2015 -0700

Merge branch 'samza-window-op' into samza-650-v2

Conflicts:

samza-sql-core/src/main/java/org/apache/samza/sql/api/data/EntityName.java
samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Relation.java
samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Stream.java
samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Table.java
samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Tuple.java

samza-sql-core/src/main/java/org

Review Request 34574: SAMZA-608; don't hang on serde errors in system consumers

2015-05-21 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34574/
---

Review request for samza, Yan Fang, Chinmay Soman, Chris Riccomini, Guozhang 
Wang, Navina Ramesh, and Naveen Somasundaram.


Bugs: SAMZA-608
https://issues.apache.org/jira/browse/SAMZA-608


Repository: samza


Description
---

SAMZA-608; don't hang on serde errors in system consumers


Diffs
-

  samza-core/src/main/scala/org/apache/samza/system/SystemConsumers.scala 
125d37602e2c0a9da75674f37580a1ac02f94796 
  samza-core/src/test/scala/org/apache/samza/system/TestSystemConsumers.scala 
3fdc781c1275f928f4b51b01869e1122502a2c08 

Diff: https://reviews.apache.org/r/34574/diff/


Testing
---


Thanks,

Yi Pan (Data Infrastructure)



Re: Review Request 36006: Writing a tool to read from the coordinator stream and react to config changes accordingly.

2015-07-31 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36006/#review93820
---



samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala (line 70)
https://reviews.apache.org/r/36006/#comment148216

shouldRewriteConfigToCoordinatorStream names the action, not the job-level 
functionality this variable controls. I would prefer overwriteJobConfig 
or resetJobConfig, which states more explicitly which job-level function 
it controls.



samza-autoscaling/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java
 (line 1)
https://reviews.apache.org/r/36006/#comment148213

The file directory name is still autoScaling. Isn't it a problem?



samza-autoscaling/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java
 (line 226)
https://reviews.apache.org/r/36006/#comment148214

Can we lower this to info level? This should not trigger alerts.



samza-autoscaling/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java
 (line 307)
https://reviews.apache.org/r/36006/#comment148215

This also reminds me of one thing: where do we assume that the 
ConfigManager will be running? If we assume a specific host or set of hosts 
to run this ConfigManager (e.g. RM nodes), we had better call it out and add 
some docs to describe it.



samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMasterService.scala 
(line 61)
https://reviews.apache.org/r/36006/#comment148217

Can we use the constants defined in CoordinatorStreamMessage instead of 
hard-coded strings here?
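
To illustrate the suggestion, here is a minimal Java sketch. The class 
MessageKeys and its field names are hypothetical stand-ins, not the actual 
constants in CoordinatorStreamMessage; only the set-config message type and 
the yarn.container.count key are taken from this review thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder for shared names. The real constants would live in
// CoordinatorStreamMessage, whose exact field names are not shown here.
final class MessageKeys {
    static final String SET_CONFIG_TYPE = "set-config";
    static final String CONTAINER_COUNT_KEY = "yarn.container.count";

    private MessageKeys() { }
}

final class ConstantsSketch {
    // Builds a toy message as a map. Writers and readers share the constants
    // above, so a misspelled name fails at compile time instead of at runtime.
    static Map<String, String> containerCountMessage(int count) {
        Map<String, String> message = new HashMap<>();
        message.put("type", MessageKeys.SET_CONFIG_TYPE);
        message.put("key", MessageKeys.CONTAINER_COUNT_KEY);
        message.put("value", Integer.toString(count));
        return message;
    }

    public static void main(String[] args) {
        System.out.println(containerCountMessage(4));
    }
}
```

With this shape, SamzaAppMasterService and ConfigManager would both refer to 
the same definitions rather than repeating string literals.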


- Yi Pan (Data Infrastructure)


On July 28, 2015, 2:39 a.m., Shadi A. Noghabi wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/36006/
 ---
 
 (Updated July 28, 2015, 2:39 a.m.)
 
 
 Review request for samza, Yi Pan (Data Infrastructure), Navina Ramesh, and 
 Naveen Somasundaram.
 
 
 Repository: samza
 
 
 Description
 ---
 
 After a job is submitted, it might need some configuration change, 
 specifically it might need more containers. In SAMZA-704 a tool is being 
 added to write to the coordinator stream (CoordinatorStreamWriter).  This 
 tool can be used to write new configurations to the coordinator stream. 
 However, another tool (ConfigManager) is needed to read the config changes 
 and react to them, which is the goal of this task. This tool should be 
 brought up after the job is submitted and read any config changes added to 
 the coordinator stream, and react to each accordingly. 
 
 This tool, called the Config Manager, focuses on handling container 
 changes by reacting to set-config messages with key yarn.container.count. 
 
 The config manager is a separate standalone module that should be brought 
 up separately after the submission of a job. Therefore, you have to add two 
 configurations to the input config file:
 1. yarn.rm.address=<IP of the resource manager in YARN>, e.g. localhost 
 2. yarn.rm.port=<port of the resource manager HTTP server>, e.g. 8088 
 
 The config manager periodically polls the coordinator stream to see if 
 there are any new messages. This period is set to 100 ms by default. 
 However, it can be configured by adding 
 configManager.polling.interval=<polling interval> to the input config file. 
 Thus, overall, the command to run the config manager along with the job 
 would be:
 
 
 <path to samza deployment>/bin/run-config-manager.sh --config-factory=<config 
 factory> --config-path=<path to config file of a job>
 
 
 Diffs
 -
 
   build.gradle 0852adc4e8e0c2816afd1ebf433f1af6b44852f7 
   checkstyle/import-control.xml 6654319392929857bb861d77763afd8a5ea7674c 
   gradle/dependency-versions.gradle fb06e8ed393d1a38abfa1a48fe5244fc7f6c7339 
   
 samza-autoscaling/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java
  PRE-CREATION 
   
 samza-autoscaling/src/main/java/org/apache/samza/autoScaling/utils/YarnUtil.java
  PRE-CREATION 
   
 samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java
  b1078bdf7bddd16c9ccc6559b9efd40ca5ae67bc 
   samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala 
 1c178a661e449c6bdfc4ce431aef9bb2d261a6c2 
   samza-shell/src/main/bash/run-config-manager.sh PRE-CREATION 
   
 samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala
  ea702a919348305ff95ce0b4ca1996a13aff04ec 
   
 samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMasterService.scala
  ce88698c12c4bf6f4cf128f92d60b0b9496997d7 
   settings.gradle 19bff971ad221084dac10d3f7f3facfa42b829a7 
 
 Diff: https://reviews.apache.org/r/36006/diff/
 
 
 Testing
 ---
 
 Tested with hello samza and works properly.
 
 
 Thanks,
 
 Shadi A. Noghabi
 




Re: Review Request 37069: SAMZA-738 Samza Timer based metrics does not have enough precision

2015-08-18 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37069/#review95692
---

Ship it!


LGTM.

- Yi Pan (Data Infrastructure)


On Aug. 5, 2015, 2:58 p.m., Aleksandar Pejakovic wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/37069/
 ---
 
 (Updated Aug. 5, 2015, 2:58 p.m.)
 
 
 Review request for samza.
 
 
 Repository: samza
 
 
 Description
 ---
 
 Changed SystemProducersMetrics and RunLoop so that metrics now show 
 nanoseconds instead of milliseconds.
 
 
 Diffs
 -
 
   samza-core/src/main/scala/org/apache/samza/container/RunLoop.scala c292ae4 
   
 samza-core/src/main/scala/org/apache/samza/container/SamzaContainerMetrics.scala
  aa7a9bc 
   samza-core/src/main/scala/org/apache/samza/util/TimerUtils.scala 1643070 
   samza-core/src/test/scala/org/apache/samza/container/TestRunLoop.scala 
 64a5844 
   
 samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemProducer.scala
  39c54aa 
   
 samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemProducerMetrics.scala
  8aa73ce 
 
 Diff: https://reviews.apache.org/r/37069/diff/
 
 
 Testing
 ---
 
 Tested on hello-samza - wikipedia-parser, results:
 ```
 org.apache.samza.container.SamzaContainerMetrics:{
   commit-calls:10,
   window-ns:3198.62544796632,
   process-null-envelopes:56292,
   process-envelopes:989,
   window-calls:0,
   commit-ns:5130.901534393375,
   send-calls:0,
   process-calls:57283,
   choose-ns:10368839.818551894,
   process-ns:10390588.194071393,
   event-loop-utilization:0.99807554
 }
 ```
 
 
 Thanks,
 
 Aleksandar Pejakovic
 




Re: Review Request 37506: WIP: SAMZA-552 Operator API change: New Builder API

2015-08-24 Thread Yi Pan (Data Infrastructure)
/ the isSystemStream flag in the EntityName class?



samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/TopologyBuilderV2.java
 (line 100)
https://reviews.apache.org/r/37506/#comment151398

nit: why don't we call it addOperator() directly?



samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/TopologyBuilderV2.java
 (line 133)
https://reviews.apache.org/r/37506/#comment151399

Me neither. I don't see the need to emit the table to a stream either.



samza-sql-core/src/main/java/org/apache/samza/sql/operators/factory/TopologyBuilderV2.java
 (line 136)
https://reviews.apache.org/r/37506/#comment151400

So, I assume that the stack is used as intermediate context for DAG 
computation? It works for computations like algebra. What I am worried about is 
that when the non-algebra types of operators (such as split operator in my 
previous examples, or in a case where one intermediate result is used by 
multiple downstream operators as input) are needed, this builder will need to 
be completely re-written, due to the strict stack-implementation that limits 
the types of computation it can support. I would prefer to have a generic 
implementation that can support more than DAG type of computation, but we can 
keep the API to look like fluent style for DAGs.



samza-sql-core/src/main/java/org/apache/samza/sql/operators/modify/InsertToStreamOp.java
 (line 16)
https://reviews.apache.org/r/37506/#comment151401

Question: I am not quite sure why we need this. Is it simply a 
projection operator that sends its output directly to the system streams?



samza-sql-core/src/test/java/org/apache/samza/task/sql/StreamSqlTask.java (line 
89)
https://reviews.apache.org/r/37506/#comment151402

The goal here is to use the topology builder to generate the query. Can you 
update the code here to use the topology builder?



samza-sql-core/src/test/java/org/apache/samza/task/sql/UserCallbacksSqlTask.java
 (line 119)
https://reviews.apache.org/r/37506/#comment151405

The previous discussion has led us to think that using OperatorBuilder 
would be easier here:

    this.simpleRtr = TopologyBuilder.create()
        .join(OperatorBuilder.window()
                  .size(10).source("kafka:inputstream2")
                  .setCallback(this.wndCallback),
              OperatorBuilder.window()
                  .size(10).source("kafka:inputstream1")
                  .setCallback(this.wndCallback),
              OperatorBuilder.join().setJoinFields(new ArrayList<String>() {{
                  add("key1"); add("key2"); }}))
        .partition(OperatorBuilder.partition()
                  .setPartitionKey("joinKey")
                  .setPartitionNum(50)
                  .setOutput("kafka:parOutputStrm1"))
        .build();

Here, intermediate streams that are immediately consumed by downstream 
operators are not named; only the actual input/output streams are named. 
OperatorBuilders are passed in as parameters to TopologyBuilder, so that 
intermediate stream/table names are generated and set on the 
OperatorBuilders within the topology without involving the user. Also, with 
the OperatorBuilder model, it would be easier to build a more flexible 
non-DAG topology later: users can name an operator's outputs so that they can 
be consumed by multiple downstream operators. I agree that implementing it 
should not be the first priority, but it would be nice to keep the door open 
instead of having to re-implement the TopologyBuilder layer later.
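
As a rough illustration of why named outputs keep that door open, here is a 
small Java sketch. It is a hypothetical model and not the samza-sql-core API: 
the class NamedTopologySketch, its methods, the operator ids and the stream 
name windowed1 are all invented for this example. Operators are connected by 
stream name, so a single output can feed several downstream operators, which 
a strictly stack-based builder cannot express.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: operators are wired together by stream names.
final class NamedTopologySketch {
    private final Map<String, List<String>> consumersByStream = new LinkedHashMap<>();
    private int generated = 0;

    // Registers an operator that reads `input` and returns the name of its
    // output stream. A null `outputName` means the caller does not care, so
    // a name is generated on the caller's behalf.
    String addOperator(String operatorId, String input, String outputName) {
        consumersByStream.computeIfAbsent(input, k -> new ArrayList<>()).add(operatorId);
        return outputName != null ? outputName : "generated-stream-" + (generated++);
    }

    // Lists the operators that read from the given stream.
    List<String> consumersOf(String stream) {
        return consumersByStream.getOrDefault(stream, new ArrayList<>());
    }

    public static void main(String[] args) {
        NamedTopologySketch topology = new NamedTopologySketch();
        // The window output is named explicitly so two operators can read it.
        String windowed = topology.addOperator("window", "kafka:inputstream1", "windowed1");
        topology.addOperator("join", windowed, null);
        topology.addOperator("audit", windowed, null);
        System.out.println(topology.consumersOf(windowed));
    }
}
```

The fluent DAG-style API could still be layered on top of such a model by 
leaving the output name unset for every intermediate stream.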


- Yi Pan (Data Infrastructure)


On Aug. 16, 2015, 3:57 p.m., Milinda Pathirage wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/37506/
 ---
 
 (Updated Aug. 16, 2015, 3:57 p.m.)
 
 
 Review request for samza, Yi Pan (Data Infrastructure) and Navina Ramesh.
 
 
 Bugs: SAMZA-552
 https://issues.apache.org/jira/browse/SAMZA-552
 
 
 Repository: samza
 
 
 Description
 ---
 
 New proposal for TopologyBuilder API proposed in rb34500 
 (https://reviews.apache.org/r/34500/).
 
 * Created a new class called TopologyBuilderV2 instead of changing existing 
 TopologyBuilder
 * org.apache.samza.sql.operators.factory.TestTopologyBuilderV2 contains two 
 tests which demonstrate the basic usage of the new API
 * Window and aggregate related draft APIs are not done yet
 * This is a WIP, please feel free to comment on the APIs
 * This contains Yi's changes from RB 34500
 
 
 Diffs
 -
 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/EntityName.java 
 80ba455 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Schema.java 
 1e8f192 
   samza-sql-core/src/main/java/org/apache/samza/sql/api/data/Table.java 
 7b4d984 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/expressions/ScalarExpression.java
  PRE-CREATION 
   
 samza-sql-core/src/main/java/org/apache/samza/sql/api/expressions/TupleExpression.java
  PRE

Re: Review Request 37528: SAMZA-736 BrokerProxy will get stuck in infinite loop if consumer.fetch throws OOME

2015-08-20 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37528/#review96004
---


LGTM. Just one nit. Thanks!


samza-kafka/src/test/scala/org/apache/samza/system/kafka/TestBrokerProxy.scala 
(line 307)
https://reviews.apache.org/r/37528/#comment151197

nit: trailing white space.


- Yi Pan (Data Infrastructure)


On Aug. 19, 2015, 9:03 a.m., Aleksandar Pejakovic wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/37528/
 ---
 
 (Updated Aug. 19, 2015, 9:03 a.m.)
 
 
 Review request for samza.
 
 
 Repository: samza
 
 
 Description
 ---
 
 Added new catch blocks to prevent infinite loops
 
 
 Diffs
 -
 
   
 samza-core/src/main/scala/org/apache/samza/util/ExponentialSleepStrategy.scala
  376b277 
   samza-kafka/src/main/scala/org/apache/samza/system/kafka/BrokerProxy.scala 
 614f33f 
   
 samza-kafka/src/test/scala/org/apache/samza/system/kafka/TestBrokerProxy.scala
  e285dec 
 
 Diff: https://reviews.apache.org/r/37528/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Aleksandar Pejakovic
 




Re: Review Request 36903: SAMZA-744: shutdown stores before shutdown producers

2015-07-29 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36903/
---

(Updated July 29, 2015, 9:48 p.m.)


Review request for samza, Yan Fang, Chinmay Soman, Chris Riccomini, and Navina 
Ramesh.


Summary (updated)
-

SAMZA-744: shutdown stores before shutdown producers


Bugs: SAMZA-744
https://issues.apache.org/jira/browse/SAMZA-744


Repository: samza


Description
---

SAMZA-744: shutdown stores before shutdown producers


Diffs
-

  samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 
27b2517048ad5730762506426ee7578c66181db8 

Diff: https://reviews.apache.org/r/36903/diff/


Testing (updated)
---

./bin/check-all.sh passed


Thanks,

Yi Pan (Data Infrastructure)



Re: Review Request 36905: SAMZA-745 elasticsearch module has Javadoc warning

2015-07-29 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36905/#review93479
---

Ship it!


Ship It!

- Yi Pan (Data Infrastructure)


On July 29, 2015, 9:10 a.m., Aleksandar Pejakovic wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/36905/
 ---
 
 (Updated July 29, 2015, 9:10 a.m.)
 
 
 Review request for samza.
 
 
 Repository: samza
 
 
 Description
 ---
 
 Quick fix for javadoc.
 
 
 Diffs
 -
 
   
 samza-elasticsearch/src/main/java/org/apache/samza/system/elasticsearch/ElasticsearchSystemFactory.java
  d8ca70e 
 
 Diff: https://reviews.apache.org/r/36905/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Aleksandar Pejakovic
 




Review Request 36903: SAMZA-744: WIP patch

2015-07-28 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36903/
---

Review request for samza, Yan Fang, Chinmay Soman, Chris Riccomini, and Navina 
Ramesh.


Bugs: SAMZA-744
https://issues.apache.org/jira/browse/SAMZA-744


Repository: samza


Description
---

SAMZA-744: shutdown stores before shutdown producers


Diffs
-

  samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 
27b2517048ad5730762506426ee7578c66181db8 

Diff: https://reviews.apache.org/r/36903/diff/


Testing
---


Thanks,

Yi Pan (Data Infrastructure)



Re: Review Request 36006: Writing a tool to read from the coordinator stream and react to config changes accordingly.

2015-07-29 Thread Yi Pan (Data Infrastructure)


 On July 22, 2015, 7:08 p.m., Yi Pan (Data Infrastructure) wrote:
  samza-core/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java,
   line 318
  https://reviews.apache.org/r/36006/diff/2/?file=1015151#file1015151line318
 
  Did you update the container count in this config object? I couldn't 
  find the code to update this before you start the JobRunner?
 
 Shadi A. Noghabi wrote:
 There is no need to change the config, since the new configuration is 
 written to the coordinator stream, and it will be picked up there. I thought 
 changing the config file might make it confusing.

OK. This is not clear to me. Who is writing the new configuration to the 
coordinator stream? What's the sequence of messages written in the coordinator 
stream, between the new job configuration, the set-configure messages changing 
the JobCoordinator URL, and the set-configure messages changing the number of 
containers? Let's sync up tomorrow.


- Yi


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36006/#review92625
---


On July 28, 2015, 2:39 a.m., Shadi A. Noghabi wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/36006/
 ---
 
 (Updated July 28, 2015, 2:39 a.m.)
 
 
 Review request for samza, Yi Pan (Data Infrastructure), Navina Ramesh, and 
 Naveen Somasundaram.
 
 
 Repository: samza
 
 
 Description
 ---
 
 After a job is submitted, it might need some configuration change, 
 specifically it might need more containers. In SAMZA-704 a tool is being 
 added to write to the coordinator stream (CoordinatorStreamWriter).  This 
 tool can be used to write new configurations to the coordinator stream. 
 However, another tool (ConfigManager) is needed to read the config changes 
 and react to them, which is the goal of this task. This tool should be 
 brought up after the job is submitted and read any config changes added to 
 the coordinator stream, and react to each accordingly. 
 
 This tool, called the Config Manager, focuses on handling container 
 changes by reacting to set-config messages with key yarn.container.count. 
 
 The config manager is a separate standalone module that should be brought 
 up separately after the submission of a job. Therefore, you have to add two 
 configurations to the input config file:
 1. yarn.rm.address=<IP of the resource manager in YARN>, e.g. localhost 
 2. yarn.rm.port=<port of the resource manager HTTP server>, e.g. 8088 
 
 The config manager periodically polls the coordinator stream to see if 
 there are any new messages. This period is set to 100 ms by default. 
 However, it can be configured by adding 
 configManager.polling.interval=<polling interval> to the input config file. 
 Thus, overall, the command to run the config manager along with the job 
 would be:
 
 
 <path to samza deployment>/bin/run-config-manager.sh --config-factory=<config 
 factory> --config-path=<path to config file of a job>
 
 
 Diffs
 -
 
   build.gradle 0852adc4e8e0c2816afd1ebf433f1af6b44852f7 
   checkstyle/import-control.xml 6654319392929857bb861d77763afd8a5ea7674c 
   gradle/dependency-versions.gradle fb06e8ed393d1a38abfa1a48fe5244fc7f6c7339 
   
 samza-autoscaling/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java
  PRE-CREATION 
   
 samza-autoscaling/src/main/java/org/apache/samza/autoScaling/utils/YarnUtil.java
  PRE-CREATION 
   
 samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java
  b1078bdf7bddd16c9ccc6559b9efd40ca5ae67bc 
   samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala 
 1c178a661e449c6bdfc4ce431aef9bb2d261a6c2 
   samza-shell/src/main/bash/run-config-manager.sh PRE-CREATION 
   
 samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala
  ea702a919348305ff95ce0b4ca1996a13aff04ec 
   
 samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMasterService.scala
  ce88698c12c4bf6f4cf128f92d60b0b9496997d7 
   settings.gradle 19bff971ad221084dac10d3f7f3facfa42b829a7 
 
 Diff: https://reviews.apache.org/r/36006/diff/
 
 
 Testing
 ---
 
 Tested with hello samza and works properly.
 
 
 Thanks,
 
 Shadi A. Noghabi
 




Re: Review Request 36815: SAMZA-741 Support for versioning with Elasticsearch Producer

2015-07-29 Thread Yi Pan (Data Infrastructure)


 On July 29, 2015, 5:47 a.m., Yi Pan (Data Infrastructure) wrote:
  samza-elasticsearch/src/main/java/org/apache/samza/system/elasticsearch/ElasticsearchSystemProducer.java,
   line 149
  https://reviews.apache.org/r/36815/diff/4/?file=1024086#file1024086line149
 
  Quick question: Is it guaranteed that there is no DeleteResponse here? 
  It would be good to at least log a warn if we get an unexpected response 
  here.
 
 Roger Hoover wrote:
 It is guaranteed that you will not get a DeleteResponse back because the 
 producer currently only allows IndexRequests.  In the future, if it supports 
 DeleteRequest, then we should add a counter metric for deletes.

@Roger, thanks for the explanation. My point is: it would be good to make the 
code detect and log an unexpected response.
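
A minimal Java sketch of the kind of check I have in mind. The types below 
are hypothetical stand-ins for the Elasticsearch response classes so that the 
example compiles on its own, and the warnings list stands in for a 
LOGGER.warn call; none of these names come from ElasticsearchSystemProducer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the Elasticsearch client response types.
interface ActionResponseSketch { }
final class IndexResponseSketch implements ActionResponseSketch { }
final class DeleteResponseSketch implements ActionResponseSketch { }

final class ResponseCounterSketch {
    int inserts = 0;
    // Stand-in for a logger: collects the messages that would be logged at warn.
    final List<String> warnings = new ArrayList<>();

    // Counts the response type the producer expects and records a warning
    // for anything else, rather than silently ignoring it.
    void onResponse(ActionResponseSketch response) {
        if (response instanceof IndexResponseSketch) {
            inserts++;
        } else {
            String type = response == null ? "null" : response.getClass().getSimpleName();
            warnings.add("Unexpected response type: " + type);
        }
    }

    public static void main(String[] args) {
        ResponseCounterSketch counter = new ResponseCounterSketch();
        counter.onResponse(new IndexResponseSketch());
        counter.onResponse(new DeleteResponseSketch());
        System.out.println(counter.inserts + " insert(s), warnings: " + counter.warnings);
    }
}
```

The else branch costs nothing today, and it would surface the problem 
immediately if the producer ever starts issuing other request types.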


- Yi


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36815/#review93395
---


On July 29, 2015, 5:17 a.m., Roger Hoover wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/36815/
 ---
 
 (Updated July 29, 2015, 5:17 a.m.)
 
 
 Review request for samza and Dan Harvey.
 
 
 Repository: samza
 
 
 Description
 ---
 
 SAMZA-741 Add support for versioning to Elasticsearch System Producer
 
 
 Diffs
 -
 
   
 samza-elasticsearch/src/main/java/org/apache/samza/system/elasticsearch/ElasticsearchSystemProducer.java
  f61bd36 
   
 samza-elasticsearch/src/main/java/org/apache/samza/system/elasticsearch/ElasticsearchSystemProducerMetrics.java
  e3b635b 
   
 samza-elasticsearch/src/main/java/org/apache/samza/system/elasticsearch/indexrequest/DefaultIndexRequestFactory.java
  afe0eee 
   
 samza-elasticsearch/src/test/java/org/apache/samza/system/elasticsearch/ElasticsearchSystemProducerMetricsTest.java
  980964f 
   
 samza-elasticsearch/src/test/java/org/apache/samza/system/elasticsearch/ElasticsearchSystemProducerTest.java
  684d7f6 
 
 Diff: https://reviews.apache.org/r/36815/diff/
 
 
 Testing
 ---
 
 Refactored DefaultIndexRequestFactory to make it easier to subclass and 
 customize to handle version and version_type parameters.
 
 
 Thanks,
 
 Roger Hoover
 




Re: Review Request 36163: SAMZA-690: changelog topic creation should not be in the container code

2015-08-05 Thread Yi Pan (Data Infrastructure)


 On July 27, 2015, 7:36 a.m., Yi Pan (Data Infrastructure) wrote:
  The code LGTM. For testing, it would be good to verify this fix with a 
  stateful StreamTask that has a changelog enabled and a partition count 
  different from the default auto-creation partition number (i.e. 8) in 
  Kafka. The integration test suite in samza-test should be a good place to 
  add the test. Try following the steps in 
  samza-test/src/main/config/join/README and run the integration test. The 
  joiner task has a changelog configured with a partition number of 2. You 
  can verify that the test passes with your fix.
 
 Robert Zuljevic wrote:
 Hi Yi, sorry for bothering you so much with this task : ) I'll just write 
 down what I managed to do regarding integration tests:
 
 1. I ran integration tests via Zopkio and they all finished successfully.
 2. I ran the integration tests per the guide in 
 samza-test/src/main/config/join/README and I suspect they ran successfully, 
 since none of them had an abnormal final status. I also ran the failure tests 
 (albeit after some limited fiddling with the Python scripts involved).
 3. I ran ./gradlew clean build (which runs TestStatefulTask). It 
 finished with a STANDARD_ERROR, which I assume is a good thing, but here is 
 the output, just in case: http://pastebin.com/aLT5jRdd
 
 What I suspect are the next (possible) steps:
 
 1. Create integration tests to be used with Zopkio. Here I am uncertain 
 how I would kill/stop Samza task to verify that changelog stream is being 
 consumed properly.
 2. Create another set of tasks similar to Checker/Emitter/Joiner/Watcher. 
 I believe this is unnecessary since they have their changelogs and their 
 restartability is being tested. Of course, I might be wrong.
 3. Add another test similar to TestStatefulTask.
    a. Or add a num.partitions param to TestStatefulTask.
 4. None of the above : )

 Again, I am very sorry for relying on you this much, but I'm really 
 unclear on how to proceed regarding this.

Hi @Robert, sorry that I was not specific enough in my earlier comment. If you 
have successfully run the integration tests via the steps in 
samza-test/src/main/config/join/README, it should be good to go. Thanks!


- Yi


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36163/#review93093
---


On July 9, 2015, 2:39 p.m., Robert Zuljevic wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/36163/
 ---
 
 (Updated July 9, 2015, 2:39 p.m.)
 
 
 Review request for samza.
 
 
 Repository: samza
 
 
 Description
 ---
 
 Removed trailing whitespaces
 
 
 Diffs
 -
 
   samza-api/src/main/java/org/apache/samza/system/SystemAdmin.java 
 7a588ebc99b5f07d533e48e10061a3075a63665a 
   
 samza-api/src/main/java/org/apache/samza/util/SinglePartitionWithoutOffsetsSystemAdmin.java
  249b8ae3a904716ea51a2b27c7701ac30d13b854 
   samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala 
 8ee034a642a13af1b0fdb4ebbb3b2592bb8e2be1 
   samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala 
 aeba61a95371faaba23c97d896321b8d95467f87 
   
 samza-core/src/main/scala/org/apache/samza/system/filereader/FileReaderSystemAdmin.scala
  097f41062f3432ae9dc9a9737b48ed7b2f709f20 
   
 samza-core/src/test/scala/org/apache/samza/checkpoint/TestOffsetManager.scala 
 8d54c4639fc226b34e64915935c1d90e5917af2e 
   
 samza-core/src/test/scala/org/apache/samza/coordinator/TestJobCoordinator.scala
  d9ae187c7707673fe15c8cb7ea854e02c4a89a54 
   
 samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemAdmin.scala
  35086f54f526d5d88ad3bc312b71fce40260e7c6 
   samza-test/src/main/java/org/apache/samza/system/mock/MockSystemAdmin.java 
 b063366f0f60e401765a000fa265c59dee4a461e 
   
 samza-yarn/src/test/scala/org/apache/samza/job/yarn/TestSamzaAppMasterTaskManager.scala
  1e936b42a5b9a4bfb43766c17847b2947ebdb21d 
 
 Diff: https://reviews.apache.org/r/36163/diff/
 
 
 Testing
 ---
 
 I wasn't really sure what kind of test (unit test / integration test) I 
 should make here, so any pointers would be greatly appreciated! I tested the 
 change with the unit/integration tests already available.
 
 
 Thanks,
 
 Robert Zuljevic
 




Re: Review Request 37102: SAMZA-753: BrokerProxy stop should shutdown kafka consumer first

2015-08-04 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37102/#review94146
---

Ship it!


This makes sense! Thanks Yan! Did you verify with some test? It would be good 
to verify it.

- Yi Pan (Data Infrastructure)


On Aug. 4, 2015, 11:07 p.m., Yan Fang wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/37102/
 ---
 
 (Updated Aug. 4, 2015, 11:07 p.m.)
 
 
 Review request for samza.
 
 
 Bugs: SAMZA-753
 https://issues.apache.org/jira/browse/SAMZA-753
 
 
 Repository: samza
 
 
 Description
 ---
 
 shutdown the kafka consumer before interrupting the BrokerProxy
 
 
 Diffs
 -
 
   samza-kafka/src/main/scala/org/apache/samza/system/kafka/BrokerProxy.scala 
 614f33f 
 
 Diff: https://reviews.apache.org/r/37102/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Yan Fang
 




Re: Review Request 37069: SAMZA-738 Samza Timer based metrics does not have enough precision

2015-08-04 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37069/#review94122
---



samza-core/src/main/scala/org/apache/samza/container/RunLoop.scala (line 46)
https://reviews.apache.org/r/37069/#comment148638

Why are we adding a new clock here? changing clock() implementation to:
val clock: () => Long = { System.nanoTime }

Should work as well.
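
For illustration only, here is a minimal Java sketch of the same idea: keep a single injectable clock and point it at System.nanoTime, rather than adding a second clock. The names LoopTimer and timeNs are hypothetical and are not taken from RunLoop.

```java
import java.util.function.LongSupplier;

// Hypothetical helper: a timer with one injectable clock.
final class LoopTimer {
    private final LongSupplier clock;

    // Production default: nanosecond-resolution monotonic clock.
    LoopTimer() {
        this(System::nanoTime);
    }

    // Tests can pass a fake clock instead.
    LoopTimer(LongSupplier clock) {
        this.clock = clock;
    }

    // Runs the block and returns its duration in nanoseconds.
    long timeNs(Runnable block) {
        long start = clock.getAsLong();
        block.run();
        return clock.getAsLong() - start;
    }
}
```

Because the clock is a plain function value, switching from milliseconds to nanoseconds is a one-line change at the default, and tests can drive time by hand.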



samza-core/src/main/scala/org/apache/samza/container/RunLoop.scala (line 49)
https://reviews.apache.org/r/37069/#comment148634

The variable should be lastWindowNs and lastCommitNs



samza-core/src/main/scala/org/apache/samza/container/RunLoop.scala (line 78)
https://reviews.apache.org/r/37069/#comment148635

This should be totalNs



samza-core/src/main/scala/org/apache/samza/container/RunLoop.scala (line 107)
https://reviews.apache.org/r/37069/#comment148636

This variable name should also be activeNs



samza-core/src/main/scala/org/apache/samza/container/RunLoop.scala (line 134)
https://reviews.apache.org/r/37069/#comment148637

Same here.



samza-core/src/main/scala/org/apache/samza/container/RunLoop.scala (line 135)
https://reviews.apache.org/r/37069/#comment148641

This does not need to be done by metricClocks. The original code works. 
The one optimization we should make here is to reduce the number of 
clock() calls: updateTimerAndGetDuration internally calls clock(), and we are 
calling clock() twice here as well. We should at least cache the return value 
from clock() and re-use it here.
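
A minimal Java sketch of caching the clock reading, under the assumption that consecutive phases are timed back to back. LapClock and lapNs are hypothetical names, not part of RunLoop: each phase boundary reads the clock once, and that reading serves as both the end of one phase and the start of the next.

```java
import java.util.function.LongSupplier;

// Hypothetical helper: one clock read per phase boundary.
final class LapClock {
    private final LongSupplier clock;
    private long lastNs;

    LapClock(LongSupplier clock) {
        this.clock = clock;
        this.lastNs = clock.getAsLong(); // initial reading
    }

    // Returns nanoseconds since the previous boundary and reuses the
    // same reading as the start of the next phase.
    long lapNs() {
        long now = clock.getAsLong(); // the only clock call in this method
        long elapsed = now - lastNs;
        lastNs = now;
        return elapsed;
    }
}
```

With this shape, timing N consecutive phases costs N + 1 clock reads instead of 2N.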


- Yi Pan (Data Infrastructure)


On Aug. 4, 2015, 9:18 a.m., Aleksandar Pejakovic wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/37069/
 ---
 
 (Updated Aug. 4, 2015, 9:18 a.m.)
 
 
 Review request for samza.
 
 
 Repository: samza
 
 
 Description
 ---
 
 Changed SystemProducersMetrics and RunLoop so that metrics now show 
 nanoseconds instead of milliseconds.
 
 
 Diffs
 -
 
   samza-core/src/main/scala/org/apache/samza/container/RunLoop.scala c292ae4 
   
 samza-core/src/main/scala/org/apache/samza/container/SamzaContainerMetrics.scala
  aa7a9bc 
 
 Diff: https://reviews.apache.org/r/37069/diff/
 
 
 Testing
 ---
 
 Tested on hello-samza - wikipedia-parser, results:
 ```
 org.apache.samza.container.SamzaContainerMetrics:{
   commit-calls:10,
   window-ns:3198.62544796632,
   process-null-envelopes:56292,
   process-envelopes:989,
   window-calls:0,
   commit-ns:5130.901534393375,
   send-calls:0,
   process-calls:57283,
   choose-ns:10368839.818551894,
   process-ns:10390588.194071393,
   event-loop-utilization:0.99807554
 }
 ```
 
 
 Thanks,
 
 Aleksandar Pejakovic
 




Re: Review Request 36903: SAMZA-744: shutdown stores before shutdown producers

2015-08-04 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36903/
---

(Updated Aug. 4, 2015, 9:30 p.m.)


Review request for samza, Yan Fang, Chinmay Soman, Chris Riccomini, and Navina 
Ramesh.


Changes
---

Added integration tests and verified the fix.
- Restructured the stateful task tests s.t. it is easier to add more tests.


Bugs: SAMZA-744
https://issues.apache.org/jira/browse/SAMZA-744


Repository: samza


Description
---

SAMZA-744: shutdown stores before shutdown producers


Diffs (updated)
-

  build.gradle 0852adc4e8e0c2816afd1ebf433f1af6b44852f7 
  samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 
27b2517048ad5730762506426ee7578c66181db8 
  
samza-test/src/test/scala/org/apache/samza/test/integration/StreamTaskTest.scala
 PRE-CREATION 
  
samza-test/src/test/scala/org/apache/samza/test/integration/TestShutdownContainer.scala
 PRE-CREATION 
  
samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala
 ea702a919348305ff95ce0b4ca1996a13aff04ec 
  samza-test/src/test/scala/org/apache/samza/test/integration/TestTask.scala 
PRE-CREATION 

Diff: https://reviews.apache.org/r/36903/diff/


Testing
---

./bin/check-all.sh passed


Thanks,

Yi Pan (Data Infrastructure)



Re: Review Request 36903: SAMZA-744: shutdown stores before shutdown producers

2015-08-06 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36903/
---

(Updated Aug. 6, 2015, 11:24 a.m.)


Review request for samza, Yan Fang, Chinmay Soman, Chris Riccomini, and Navina 
Ramesh.


Changes
---

Addressed the comments on test code refactoring.


Bugs: SAMZA-744
https://issues.apache.org/jira/browse/SAMZA-744


Repository: samza


Description
---

SAMZA-744: shutdown stores before shutdown producers


Diffs (updated)
-

  build.gradle 0852adc4e8e0c2816afd1ebf433f1af6b44852f7 
  samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 
27b2517048ad5730762506426ee7578c66181db8 
  
samza-test/src/test/scala/org/apache/samza/test/integration/StreamTaskTestUtil.scala
 PRE-CREATION 
  
samza-test/src/test/scala/org/apache/samza/test/integration/TestShutdownStatefulTask.scala
 PRE-CREATION 
  
samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala
 ea702a919348305ff95ce0b4ca1996a13aff04ec 

Diff: https://reviews.apache.org/r/36903/diff/


Testing
---

./bin/check-all.sh passed


Thanks,

Yi Pan (Data Infrastructure)



Re: Review Request 36903: SAMZA-744: shutdown stores before shutdown producers

2015-08-06 Thread Yi Pan (Data Infrastructure)


 On Aug. 5, 2015, 7:48 a.m., Yan Fang wrote:
  samza-test/src/test/scala/org/apache/samza/test/integration/StreamTaskTest.scala,
   lines 47-49
  https://reviews.apache.org/r/36903/diff/2/?file=1029022#file1029022line47
 
  we need to remove the author information. :) And maybe add some java 
  doc instead.
  
  My 2 cents:
  1. If this is a real test, to be consistent, we may want to use 
  TestStreamTask (begin with Test), or change all other TestSomething to 
  SomethingTest (e.g. change TestStateful to StatefulTest)
  
  2. If this is not a real test, I prefer something like StreamTaskUtil 
  to be less ambiguous.

Done.


 On Aug. 5, 2015, 7:48 a.m., Yan Fang wrote:
  samza-test/src/test/scala/org/apache/samza/test/integration/StreamTaskTest.scala,
   line 96
  https://reviews.apache.org/r/36903/diff/2/?file=1029022#file1029022line96
 
  is this tag used?

Removed.


 On Aug. 5, 2015, 7:48 a.m., Yan Fang wrote:
  samza-test/src/test/scala/org/apache/samza/test/integration/StreamTaskTest.scala,
   line 148
  https://reviews.apache.org/r/36903/diff/2/?file=1029022#file1029022line148
 
  same, is this used?

Removed


 On Aug. 5, 2015, 7:48 a.m., Yan Fang wrote:
  samza-test/src/test/scala/org/apache/samza/test/integration/StreamTaskTest.scala,
   line 169
  https://reviews.apache.org/r/36903/diff/2/?file=1029022#file1029022line169
 
  There is no TestJob. (I know, it is copy/paste issue :)

Fixed.


 On Aug. 5, 2015, 7:48 a.m., Yan Fang wrote:
  samza-test/src/test/scala/org/apache/samza/test/integration/TestShutdownContainer.scala,
   line 64
  https://reviews.apache.org/r/36903/diff/2/?file=1029023#file1029023line64
 
  From the description, it is not testing the Container Shutdown, 
  actually it is testing the store restoring feature.

Updated the description


 On Aug. 5, 2015, 7:48 a.m., Yan Fang wrote:
  samza-test/src/test/scala/org/apache/samza/test/integration/TestShutdownContainer.scala,
   line 66
  https://reviews.apache.org/r/36903/diff/2/?file=1029023#file1029023line66
 
  Since we already are doing the abstraction, is it possible to put the 
  common config into StreamTaskTest object? Because I see a lot of the same 
  configs in ShutdownContainerTest and TestStatefulTask.

Done.


 On Aug. 5, 2015, 7:48 a.m., Yan Fang wrote:
  samza-test/src/test/scala/org/apache/samza/test/integration/TestShutdownContainer.scala,
   lines 87-89
  https://reviews.apache.org/r/36903/diff/2/?file=1029023#file1029023line87
 
  in the 0.10.0, we do not have checkpoint factory, I believe

There are issues w/ removing it. Opened JIRA SAMZA-754 for it.


 On Aug. 5, 2015, 7:48 a.m., Yan Fang wrote:
  samza-test/src/test/scala/org/apache/samza/test/integration/TestShutdownContainer.scala,
   lines 142-146
  https://reviews.apache.org/r/36903/diff/2/?file=1029023#file1029023line142
 
  are those two methods used anywhere?

Removed.


 On Aug. 5, 2015, 7:48 a.m., Yan Fang wrote:
  samza-test/src/test/scala/org/apache/samza/test/integration/TestShutdownContainer.scala,
   line 165
  https://reviews.apache.org/r/36903/diff/2/?file=1029023#file1029023line165
 
  how about adding override?

Done


 On Aug. 5, 2015, 7:48 a.m., Yan Fang wrote:
  samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala,
   line 227
  https://reviews.apache.org/r/36903/diff/2/?file=1029024#file1029024line227
 
  actually i do not understand why we need a companion object here. We 
  just use the default task number, 1.
  
  And awaitTaskRegistered and register methods are not used anywhere.

Removed


 On Aug. 5, 2015, 7:48 a.m., Yan Fang wrote:
  samza-test/src/test/scala/org/apache/samza/test/integration/TestTask.scala, 
  lines 32-34
  https://reviews.apache.org/r/36903/diff/2/?file=1029025#file1029025line32
 
  Instead of the author information, I think putting some java doc 
  explaining this class/object will be better.

Fixed


 On Aug. 5, 2015, 7:48 a.m., Yan Fang wrote:
  samza-test/src/test/scala/org/apache/samza/test/integration/TestTask.scala, 
  line 37
  https://reviews.apache.org/r/36903/diff/2/?file=1029025#file1029025line37
 
  rm ;

Fixed


- Yi


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36903/#review94193
---


On Aug. 6, 2015, 11:24 a.m., Yi Pan (Data Infrastructure) wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/36903/
 ---
 
 (Updated Aug. 6, 2015, 11:24 a.m.)
 
 
 Review request for samza, Yan Fang, Chinmay Soman, Chris Riccomini, and 
 Navina Ramesh.
 
 
 Bugs: SAMZA-744
 https://issues.apache.org/jira/browse/SAMZA-744
 
 
 Repository: samza
 
 
 Description
 ---
 
 SAMZA-744

Re: Review Request 36006: Writing a tool to read from the coordinator stream and react to config changes accordingly.

2015-07-22 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36006/#review92616
---



build.gradle (line 150)
https://reviews.apache.org/r/36006/#comment146824

We already include the jackson libraries as dependencies. What additional 
functionality requires the gson lib here? Generally, we want to be 
stingy about adding dependencies unless absolutely necessary.



build.gradle (line 151)
https://reviews.apache.org/r/36006/#comment146825

Same question here. We want samza-core to depend on fewer external libraries. 
I am starting to think that it might be reasonable to move the tools to a 
separate module like samza-man?



samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala (line 52)
https://reviews.apache.org/r/36006/#comment146826

And how does the JobRunner know whether it is starting the job for the first 
time, if the first time is defined across job restarts?



samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMaster.scala (line 
127)
https://reviews.apache.org/r/36006/#comment146828

Question: can we move this code into the JobCoordinator.start()? I don't 
see any reason why this has to be outside JobCoordinator?


Some high-level comments: I don't see a reason that these tools classes have to 
be in samza-core. Can we move it to a separate module, like samza-man? That 
will also allow us to keep the minimum dependencies in samza-core. I will 
continue w/ the implementation of ConfigManager.

- Yi Pan (Data Infrastructure)


On July 18, 2015, 2:52 a.m., Shadi A. Noghabi wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/36006/
 ---
 
 (Updated July 18, 2015, 2:52 a.m.)
 
 
 Review request for samza, Yi Pan (Data Infrastructure), Navina Ramesh, and 
 Naveen Somasundaram.
 
 
 Repository: samza
 
 
 Description
 ---
 
 After a job is submitted, it might need some configuration change, 
 specifically it might need more containers. In SAMZA-704 a tool is being 
 added to write to the coordinator stream (CoordinatorStreamWriter).  This 
 tool can be used to write new configurations to the coordinator stream. 
 However, another tool (ConfigManager) is needed to read the config changes 
 and react to them, which is the goal of this task. This tool should be 
 brought up after the job is submitted and read any config changes added to 
 the coordinator stream, and react to each accordingly. 
 
 This tool, called the Config Manager, focuses on handling container 
 changes by reacting to set-config messages with key yarn.container.count. 
 
 The config manager is a separate standalone module that should be brought 
 up separately after the submission of a job. Therefore, you have to add two 
 configurations to the input config file:
 1. yarn.rm.address= IP of the resource manager in YARN. ex: localhost 
 2. yarn.rm.port= the port of the resource manager HTTP server. ex: 8088 
 
 The config manager will periodically poll the coordinator stream to see if 
 there are any new messages. This period is set to 100 ms by default. 
 However, it can be configured by adding 
 configManager.polling.interval=<polling interval> to the input config file. 
 Thus, overall, the command to run the config manager along with the job would 
 be:
 
 
 <path to samza deployment>/bin/run-config-manager.sh 
 --config-factory=<config factory> --config-path=<path to config file of a job>
 
 
 Diffs
 -
 
   build.gradle 0852adc4e8e0c2816afd1ebf433f1af6b44852f7 
   checkstyle/import-control.xml 6654319392929857bb861d77763afd8a5ea7674c 
   gradle/dependency-versions.gradle fb06e8ed393d1a38abfa1a48fe5244fc7f6c7339 
   
 samza-core/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java
  PRE-CREATION 
   
 samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java
  b1078bdf7bddd16c9ccc6559b9efd40ca5ae67bc 
   samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala 
 1c178a661e449c6bdfc4ce431aef9bb2d261a6c2 
   samza-shell/src/main/bash/run-config-manager.sh PRE-CREATION 
   
 samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala
  ea702a919348305ff95ce0b4ca1996a13aff04ec 
   samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMaster.scala 
 af42c6a6636953a95f79837fe372e0dbd735df70 
   
 samza-yarn/src/test/scala/org/apache/samza/job/yarn/TestSamzaAppMaster.scala 
 7b7d86a43c69e72c47eaa91f68be24e0f4022891 
 
 Diff: https://reviews.apache.org/r/36006/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Shadi A. Noghabi
 




Re: Review Request 36006: Writing a tool to read from the coordinator stream and react to config changes accordingly.

2015-07-22 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36006/#review92625
---



samza-core/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java
 (line 21)
https://reviews.apache.org/r/36006/#comment146833

We don't use camel style package names in Samza. It should be 
org.apache.samza.autoscaling.deployer



samza-core/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java
 (line 123)
https://reviews.apache.org/r/36006/#comment146838

Question: shouldn't the JobConfig be returned from the JobCoordinator's web 
interface? Or, more precisely, this ConfigManager should only need to know the 
CoordinatorStream topic to start with, since the JobCoordinator will send its 
url info into the CoordinatorStream topic. Even the initial config can be read 
from that topic as well.



samza-core/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java
 (line 131)
https://reviews.apache.org/r/36006/#comment146839

As Navina pointed out, we could have wrapped the Yarn related 
variables/functions into a YarnUtils class, which can hide all the details 
about yarnClient, hConfig, etc.



samza-core/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java
 (line 188)
https://reviews.apache.org/r/36006/#comment146841

Any reason the bootstrap, skipUnreadMessages, readConfigMessages need to be 
public?



samza-core/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java
 (line 217)
https://reviews.apache.org/r/36006/#comment146842

The function name does not convey what it actually does. This is *not* just 
readConfigMessages, this method is *processing* messages in coordinator 
streams. I am still a bit confused among the three modes you defined here. If I 
read correctly, you are referring to three different *process_mode* here:
1) SKIP_ALL
2) PROCESS_SERVER_URL
3) PROCESS_CONTAINER_COUNT_AND_SERVER_URL
Does the above make sense? I would suggest that you define a set-config 
message filter here that defines the list of messages that you want to react 
to. Then, if you want to skip all, the filter filters out everything, and you 
can specify an arbitrary combination of message types you want to react 
to, instead of just the 3 combinations here.
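
A rough Java sketch of such a filter, assuming set-config messages are identified by their config key. SetConfigFilter, skipAll, reactTo and accepts are hypothetical names, as is the key example.server.url; only yarn.container.count comes from the review request.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical filter: holds the set-config keys the tool should react to.
final class SetConfigFilter {
    private final Set<String> keys;

    private SetConfigFilter(Set<String> keys) {
        this.keys = keys;
    }

    // Equivalent of a SKIP_ALL mode: no key is accepted.
    static SetConfigFilter skipAll() {
        return new SetConfigFilter(Collections.emptySet());
    }

    // Accepts any combination of keys instead of a fixed list of modes.
    static SetConfigFilter reactTo(String... keys) {
        return new SetConfigFilter(new HashSet<>(Arrays.asList(keys)));
    }

    boolean accepts(String key) {
        return keys.contains(key);
    }
}
```

The three modes then become three filter instances, e.g. skipAll(), reactTo("example.server.url"), and reactTo("yarn.container.count", "example.server.url"), and further combinations need no new mode.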



samza-core/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java
 (line 303)
https://reviews.apache.org/r/36006/#comment146843

Change to logger-based logging. And the message should be "killing the 
current job".



samza-core/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java
 (line 310)
https://reviews.apache.org/r/36006/#comment146844

And add a "Killed the current job" log line after confirming the job is 
dead.



samza-core/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java
 (line 318)
https://reviews.apache.org/r/36006/#comment146850

Did you update the container count in this config object? I couldn't find 
the code to update this before you start the JobRunner?



samza-core/src/main/java/org/apache/samza/autoScaling/deployer/ConfigManager.java
 (line 338)
https://reviews.apache.org/r/36006/#comment146848

I don't see why jackson lib can't do this? Can we avoid adding the Gson 
dependency here?



samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java
 (line 92)
https://reviews.apache.org/r/36006/#comment146851

How is it used? I don't see it used in ConfigManager. Where else could it 
be used? If no one uses it, we should remove this function.


- Yi Pan (Data Infrastructure)


On July 18, 2015, 2:52 a.m., Shadi A. Noghabi wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/36006/
 ---
 
 (Updated July 18, 2015, 2:52 a.m.)
 
 
 Review request for samza, Yi Pan (Data Infrastructure), Navina Ramesh, and 
 Naveen Somasundaram.
 
 
 Repository: samza
 
 
 Description
 ---
 
 After a job is submitted, it might need some configuration change, 
 specifically it might need more containers. In SAMZA-704 a tool is being 
 added to write to the coordinator stream (CoordinatorStreamWriter).  This 
 tool can be used to write new configurations to the coordinator stream. 
 However, another tool (ConfigManager) is needed to read the config changes 
 and react to them, which is the goal of this task. This tool should be 
 brought up after the job is submitted and read any config changes added to 
 the coordinator stream, and react to each accordingly. 
 
 This tool, called the Config Manager, focuses on handling container 
 changes by reacting to set-config messages with key yarn.container.count. 
 
 The config manager

Re: Review Request 36163: SAMZA-690: changelog topic creation should not be in the container code

2015-07-27 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36163/#review93093
---


The code LGTM. For testing, it would be good to verify this fix w/ a stateful 
StreamTask w/ changelog enabled and a partition count that is different from 
the default auto-creation partition number (i.e. 8) in Kafka. The 
integration test suite in samza-test should be a good place to add the test. 
Try following the steps in samza-test/src/main/config/join/README and 
run the integration test. The joiner task has a changelog configured with 
partition number of 2. You can verify the test passes w/ your fix.


samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala 
(line 132)
https://reviews.apache.org/r/36163/#comment147349

Just want to make a note here. When Kafka admin API allows a read-only flag 
in fetchMetadata(), we should use it in validateChangelogStream() to avoid 
auto-creation of topic w/ un-wanted partition numbers.



samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemAdmin.scala 
(line 400)
https://reviews.apache.org/r/36163/#comment147350

Nit: It would be good to put a note here: validateChangelogStream() should 
not be called before createChangelogStream(), before Kafka fixes the admin API 
to add a read-only flag in fetchMetadata().


- Yi Pan (Data Infrastructure)


On July 9, 2015, 2:39 p.m., Robert Zuljevic wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/36163/
 ---
 
 (Updated July 9, 2015, 2:39 p.m.)
 
 
 Review request for samza.
 
 
 Repository: samza
 
 
 Description
 ---
 
 Removed trailing whitespaces
 
 
 Diffs
 -
 
   samza-api/src/main/java/org/apache/samza/system/SystemAdmin.java 
 7a588ebc99b5f07d533e48e10061a3075a63665a 
   
 samza-api/src/main/java/org/apache/samza/util/SinglePartitionWithoutOffsetsSystemAdmin.java
  249b8ae3a904716ea51a2b27c7701ac30d13b854 
   samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala 
 8ee034a642a13af1b0fdb4ebbb3b2592bb8e2be1 
   samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala 
 aeba61a95371faaba23c97d896321b8d95467f87 
   
 samza-core/src/main/scala/org/apache/samza/system/filereader/FileReaderSystemAdmin.scala
  097f41062f3432ae9dc9a9737b48ed7b2f709f20 
   
 samza-core/src/test/scala/org/apache/samza/checkpoint/TestOffsetManager.scala 
 8d54c4639fc226b34e64915935c1d90e5917af2e 
   
 samza-core/src/test/scala/org/apache/samza/coordinator/TestJobCoordinator.scala
  d9ae187c7707673fe15c8cb7ea854e02c4a89a54 
   
 samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemAdmin.scala
  35086f54f526d5d88ad3bc312b71fce40260e7c6 
   samza-test/src/main/java/org/apache/samza/system/mock/MockSystemAdmin.java 
 b063366f0f60e401765a000fa265c59dee4a461e 
   
 samza-yarn/src/test/scala/org/apache/samza/job/yarn/TestSamzaAppMasterTaskManager.scala
  1e936b42a5b9a4bfb43766c17847b2947ebdb21d 
 
 Diff: https://reviews.apache.org/r/36163/diff/
 
 
 Testing
 ---
 
 I wasn't really sure what kind of test (unit test / integration test) I 
 should make here, so any pointers would be greatly appreciated! I tested the 
 change with the unit/integration tests already available.
 
 
 Thanks,
 
 Robert Zuljevic
 




Re: Review Request 39558: Fix parsing errors in broadcast stream config values

2015-10-22 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39558/#review103657
---

Ship it!


Ship It!

- Yi Pan (Data Infrastructure)


On Oct. 22, 2015, 5:12 p.m., Navina Ramesh wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39558/
> ---
> 
> (Updated Oct. 22, 2015, 5:12 p.m.)
> 
> 
> Review request for samza, Yan Fang, Chinmay Soman, Chris Riccomini, Jake 
> Maes, Jagadish Venkatraman, and Yi Pan (Data Infrastructure).
> 
> 
> Bugs: SAMZA-797
> https://issues.apache.org/jira/browse/SAMZA-797
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Fix parsing errors in broadcast stream config values
> 
> 
> Diffs
> -
> 
>   samza-core/src/main/java/org/apache/samza/config/TaskConfigJava.java 
> 015e99449f045fa578059f8b23a401274b0503ba 
>   samza-core/src/test/java/org/apache/samza/config/TestTaskConfigJava.java 
> 2d6060ef41ba3105fb9f954d3f12065d699469b8 
> 
> Diff: https://reviews.apache.org/r/39558/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew clean build
> 
> 
> Thanks,
> 
> Navina Ramesh
> 
>



Re: Review Request 39252: SAMZA-626 - tool to read the RocksDb in a running job (Yan's patch)

2015-10-22 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39252/#review103702
---


Overall LGTM. Just a few minor comments on command options. Thanks!


docs/learn/documentation/versioned/container/state-management.md (line 214)
<https://reviews.apache.org/r/39252/#comment161775>

From the code, it seems that the argument is required. We should fix the 
document here.



samza-core/src/main/java/org/apache/samza/config/JavaSerializerConfig.java 
(line 1)
<https://reviews.apache.org/r/39252/#comment161770>

I noticed that we are creating new JavaSerializerConfig, JavaStorageConfig, 
etc. instead of changing the scala classes directly to Java. Why is that? It 
seems to be a good opportunity to move part of the code to Java.



samza-core/src/test/scala/org/apache/samza/util/TestUtil.scala (line 74)
<https://reviews.apache.org/r/39252/#comment161771>

nit: trailing whitespaces



samza-kv-rocksdb/src/main/java/org/apache/samza/storage/kv/RocksDbKeyValueStoreHelper.java
 (line 34)
<https://reviews.apache.org/r/39252/#comment161772>

nit: maybe rename to RocksDbOptionsHelper, to be more accurate stating the 
function of this class?



samza-kv-rocksdb/src/main/java/org/apache/samza/storage/kv/RocksDbReadingTool.java
 (line 80)
<https://reviews.apache.org/r/39252/#comment161774>

Shouldn't we validate only one keyArgu is included in the options?
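
One way this validation could look, as a minimal Java sketch. It assumes the parsed options are available as a map from option name to value; the option names string-key, integer-key and long-key and the class KeyOptionCheck are hypothetical and may not match the actual tool.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical check: exactly one of the mutually exclusive key options.
final class KeyOptionCheck {
    // Assumed option names, for illustration only.
    static final List<String> KEY_OPTIONS =
        Arrays.asList("string-key", "integer-key", "long-key");

    // Returns the single key option that was supplied, or throws if
    // none or more than one was given.
    static String requireSingleKeyOption(Map<String, String> parsed) {
        String found = null;
        for (String option : KEY_OPTIONS) {
            if (parsed.containsKey(option)) {
                if (found != null) {
                    throw new IllegalArgumentException(
                        "Only one key option is allowed, got --" + found + " and --" + option);
                }
                found = option;
            }
        }
        if (found == null) {
            throw new IllegalArgumentException("One of " + KEY_OPTIONS + " is required");
        }
        return found;
    }
}
```

Failing fast here gives the user a clear message instead of silently picking whichever key option happens to be checked first.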



samza-kv-rocksdb/src/main/java/org/apache/samza/storage/kv/RocksDbReadingTool.java
 (line 96)
<https://reviews.apache.org/r/39252/#comment161776>

nit: Since the db-path and db-name are required options, the if conditions 
seem to be unnecessary.


- Yi Pan (Data Infrastructure)


On Oct. 16, 2015, 11:44 p.m., Navina Ramesh wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39252/
> ---
> 
> (Updated Oct. 16, 2015, 11:44 p.m.)
> 
> 
> Review request for samza, Yan Fang, Chinmay Soman, Jake Maes, Yi Pan (Data 
> Infrastructure), and Jagadish Venkatraman.
> 
> 
> Bugs: SAMZA-626
> https://issues.apache.org/jira/browse/SAMZA-626
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Porting changes from Yan's patch in https://reviews.apache.org/r/36973/
> 
> *refactored some java code
> 
> *changed RocksDbKeyValueStore.options from scala to java
> 
> *moved default serde name from container to util, because it is useful to 
> other classes
> 
> *added a class to read the running rocksdb
> 
> *added a command-line tool
> 
> *updated the doc accordingly
> 
> https://reviews.apache.org/r/36973/#issue-summary
> 
> 
> Diffs
> -
> 
>   build.gradle 682d4f80f33939d8471c9f0cecb7ccbf4eb1bfec 
>   docs/learn/documentation/versioned/container/state-management.md 
> 50d4b657582661975369c1caa088d9b8b55d7745 
>   samza-core/src/main/java/org/apache/samza/config/JavaSerializerConfig.java 
> PRE-CREATION 
>   samza-core/src/main/java/org/apache/samza/config/JavaStorageConfig.java 
> af7d4ca70f77eb4865b52fe71a55f506b60474e7 
>   samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 
> f351ad6959eadde8766140bd3e6334a18bda4886 
>   samza-core/src/main/scala/org/apache/samza/util/Util.scala 
> 948c19ab39ff1686be4efe6a55a6fc9aa6a01d86 
>   
> samza-core/src/test/scala/org/apache/samza/container/TestSamzaContainer.scala 
> 6de8710d5a4ebceca3df6cd925388441f8823a37 
>   samza-core/src/test/scala/org/apache/samza/util/TestUtil.scala 
> d16726393d844b3de7a814db2a40f71b8feac3dc 
>   
> samza-kv-rocksdb/src/main/java/org/apache/samza/storage/kv/RocksDbKeyValueReader.java
>  PRE-CREATION 
>   
> samza-kv-rocksdb/src/main/java/org/apache/samza/storage/kv/RocksDbKeyValueStoreHelper.java
>  PRE-CREATION 
>   
> samza-kv-rocksdb/src/main/java/org/apache/samza/storage/kv/RocksDbReadingTool.java
>  PRE-CREATION 
>   
> samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStorageEngineFactory.scala
>  571a50e4a9abee25e880db3c268ccf892f2c5125 
>   
> samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStore.scala
>  a423f7bd6c43461e051b5fd1f880dd01db785991 
>   
> samza-kv-rocksdb/src/test/java/org/apache/samza/storage/kv/TestRocksDbKeyValueReader.java
>  PRE-CREATION 
>   samza-shell/src/main/bash/read-rocksdb-tool.sh PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/39252/diff/
> 
> 
> Testing
> ---
> 
> ./bin/check-all.sh
> Tested and verified with sample job on Yarn.
> 
> 
> Thanks,
> 
> Navina Ramesh
> 
>



Review Request 39575: SAMZA-793 static config rewriter should be invoked in JobRunner

2015-10-22 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39575/
---

Review request for samza, Yan Fang and Navina Ramesh.


Bugs: SAMZA-793
https://issues.apache.org/jira/browse/SAMZA-793


Repository: samza


Description
---

Move rewriter logic to JobRunner so that:
1. It is invoked once
2. It is invoked no matter which job factory is used


Diffs
-

  samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala 
a926ce639dbf6776ebfb1d885621f4b6bf5f5aa5 
  samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala 
d6109ec4b3eedfe32325c6dc420c6c8bb18eb79b 

Diff: https://reviews.apache.org/r/39575/diff/


Testing
---

Verified via test jobs.


Thanks,

Yi Pan (Data Infrastructure)



Re: Review Request 39464: SAMZA-723 - hello-samza hangs when we use StreamAppender (Yan's patch)

2015-10-23 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39464/#review103838
---

Ship it!


LGTM. Thanks!


samza-log4j/src/main/java/org/apache/samza/logging/log4j/StreamAppender.java 
(line 179)
<https://reviews.apache.org/r/39464/#comment161936>

This is also one of my concerns from when I was debugging the coordinator HTTP 
request timeout issue. Wouldn't it be better if the StreamAppender were given 
the config instead of embedding the code here to grab the configuration? Then 
we wouldn't even need this if-else condition, since the caller should be 
aware of the process context in which getConfig is called and can supply the 
corresponding config accordingly. However, this could be a later code cleanup 
task.
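
A rough sketch of what I mean, assuming a simplified appender. SketchAppender, the Map-based config, and the key "example.log.stream" are hypothetical names for illustration and are not the real StreamAppender API.

```java
import java.util.HashMap;
import java.util.Map;

final class InjectedConfigSketch {

    /** Hypothetical appender that is handed its config instead of looking it up. */
    static final class SketchAppender {
        private final Map<String, String> config;

        SketchAppender(Map<String, String> config) {
            // Defensive copy so later changes by the caller do not leak in.
            this.config = new HashMap<>(config);
        }

        String targetStream() {
            return config.getOrDefault("example.log.stream", "default-log-stream");
        }
    }

    public static void main(String[] args) {
        // Caller in the coordinator process: the config is already in memory.
        Map<String, String> coordinatorConfig = new HashMap<>();
        coordinatorConfig.put("example.log.stream", "coordinator-logs");
        SketchAppender inCoordinator = new SketchAppender(coordinatorConfig);

        // Caller in a container: this map stands in for a config fetched elsewhere.
        Map<String, String> fetchedConfig = new HashMap<>();
        fetchedConfig.put("example.log.stream", "container-logs");
        SketchAppender inContainer = new SketchAppender(fetchedConfig);

        System.out.println(inCoordinator.targetStream());
        System.out.println(inContainer.targetStream());
    }
}
```

Each caller knows which process it is running in, so the branch on process context lives with the caller and the appender itself has no if-else.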


- Yi Pan (Data Infrastructure)


On Oct. 23, 2015, 6:33 p.m., Navina Ramesh wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39464/
> ---
> 
> (Updated Oct. 23, 2015, 6:33 p.m.)
> 
> 
> Review request for samza, Yan Fang, Chris Riccomini, Jake Maes, Jagadish 
> Venkatraman, and Yi Pan (Data Infrastructure).
> 
> 
> Bugs: SAMZA-723
> https://issues.apache.org/jira/browse/SAMZA-723
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Fixed two parts of the problem:
> 1. Do not start StreamAppender until the JobCoordinator is running
> 2. Fix the deadlock between the Producer thread and the main thread
> 
> More explanation is in JIRA.
> 
> 
> Diffs
> -
> 
>   samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala 
> a926ce639dbf6776ebfb1d885621f4b6bf5f5aa5 
>   samza-log4j/src/main/java/org/apache/samza/config/Log4jSystemConfig.java 
> d98b8c658dd8ae4d952a1004a796d64c229012a7 
>   
> samza-log4j/src/main/java/org/apache/samza/logging/log4j/StreamAppender.java 
> 776a36bd539f747d440a65f844cfcede52625e1b 
>   
> samza-log4j/src/test/java/org/apache/samza/logging/log4j/TestStreamAppender.java
>  1c6f9a48590d55fe808940adba72415c3ab9614e 
> 
> Diff: https://reviews.apache.org/r/39464/diff/
> 
> 
> Testing
> ---
> 
> Verified with hello-samza.
> Still investigating to understand the original cause and the fix
> 
> 
> Thanks,
> 
> Navina Ramesh
> 
>



Re: Review Request 39464: SAMZA-723 - hello-samza hangs when we use StreamAppender (Yan's patch)

2015-10-23 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39464/#review103842
---



samza-log4j/src/main/java/org/apache/samza/logging/log4j/StreamAppender.java 
(line 105)
<https://reviews.apache.org/r/39464/#comment161939>

Minor suggestion for the SamzaAppMaster code: if we do not want to lose the 
first few lines of logs, we can move the initialization of 
coordinatorSystemConfig and the JobCoordinator to the beginning of the main() 
method.


- Yi Pan (Data Infrastructure)


On Oct. 23, 2015, 6:33 p.m., Navina Ramesh wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39464/
> ---
> 
> (Updated Oct. 23, 2015, 6:33 p.m.)
> 
> 
> Review request for samza, Yan Fang, Chris Riccomini, Jake Maes, Jagadish 
> Venkatraman, and Yi Pan (Data Infrastructure).
> 
> 
> Bugs: SAMZA-723
> https://issues.apache.org/jira/browse/SAMZA-723
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Fixed two parts of the problem:
> 1. Do not start StreamAppender until the JobCoordinator is running
> 2. Fix the deadlock between the Producer thread and the main thread
> 
> More explanation is in JIRA.
> 
> 
> Diffs
> -
> 
>   samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala 
> a926ce639dbf6776ebfb1d885621f4b6bf5f5aa5 
>   samza-log4j/src/main/java/org/apache/samza/config/Log4jSystemConfig.java 
> d98b8c658dd8ae4d952a1004a796d64c229012a7 
>   
> samza-log4j/src/main/java/org/apache/samza/logging/log4j/StreamAppender.java 
> 776a36bd539f747d440a65f844cfcede52625e1b 
>   
> samza-log4j/src/test/java/org/apache/samza/logging/log4j/TestStreamAppender.java
>  1c6f9a48590d55fe808940adba72415c3ab9614e 
> 
> Diff: https://reviews.apache.org/r/39464/diff/
> 
> 
> Testing
> ---
> 
> Verified with hello-samza.
> Still investigating to understand the original cause and the fix
> 
> 
> Thanks,
> 
> Navina Ramesh
> 
>



Re: Review Request 39252: SAMZA-626 - tool to read the RocksDb in a running job (Yan's patch)

2015-10-26 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39252/#review104029
---


Except for the check on the key variable arguments and minor comments on the 
document, everything looks good. Feel free to commit after you address the 
comments. Thanks!


docs/learn/documentation/versioned/container/state-management.md (line 214)
<https://reviews.apache.org/r/39252/#comment162216>

nit: The description here still says that this is optional.


- Yi Pan (Data Infrastructure)


On Oct. 23, 2015, 11:46 p.m., Navina Ramesh wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39252/
> ---
> 
> (Updated Oct. 23, 2015, 11:46 p.m.)
> 
> 
> Review request for samza, Yan Fang, Chinmay Soman, Jake Maes, Yi Pan (Data 
> Infrastructure), and Jagadish Venkatraman.
> 
> 
> Bugs: SAMZA-626
> https://issues.apache.org/jira/browse/SAMZA-626
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Porting changes from Yan's patch in https://reviews.apache.org/r/36973/
> 
> *refactored some java code
> 
> *changed RocksDbKeyValueStore.options from Scala to Java
> 
> *moved default serde name from container to util, because it is useful to 
> other classes
> 
> *added a class to read the running rocksdb
> 
> *added a command-line tool
> 
> *updated the doc accordingly
> 
> https://reviews.apache.org/r/36973/#issue-summary
> 
> 
> Diffs
> -
> 
>   build.gradle 682d4f80f33939d8471c9f0cecb7ccbf4eb1bfec 
>   docs/learn/documentation/versioned/container/state-management.md 
> 50d4b657582661975369c1caa088d9b8b55d7745 
>   samza-core/src/main/java/org/apache/samza/config/JavaSerializerConfig.java 
> PRE-CREATION 
>   samza-core/src/main/java/org/apache/samza/config/JavaStorageConfig.java 
> af7d4ca70f77eb4865b52fe71a55f506b60474e7 
>   samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 
> f351ad6959eadde8766140bd3e6334a18bda4886 
>   samza-core/src/main/scala/org/apache/samza/util/Util.scala 
> 948c19ab39ff1686be4efe6a55a6fc9aa6a01d86 
>   
> samza-core/src/test/scala/org/apache/samza/container/TestSamzaContainer.scala 
> 6de8710d5a4ebceca3df6cd925388441f8823a37 
>   samza-core/src/test/scala/org/apache/samza/util/TestUtil.scala 
> d16726393d844b3de7a814db2a40f71b8feac3dc 
>   
> samza-kv-rocksdb/src/main/java/org/apache/samza/storage/kv/RocksDbKeyValueReader.java
>  PRE-CREATION 
>   
> samza-kv-rocksdb/src/main/java/org/apache/samza/storage/kv/RocksDbOptionsHelper.java
>  PRE-CREATION 
>   
> samza-kv-rocksdb/src/main/java/org/apache/samza/storage/kv/RocksDbReadingTool.java
>  PRE-CREATION 
>   
> samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStorageEngineFactory.scala
>  571a50e4a9abee25e880db3c268ccf892f2c5125 
>   
> samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStore.scala
>  a423f7bd6c43461e051b5fd1f880dd01db785991 
>   
> samza-kv-rocksdb/src/test/java/org/apache/samza/storage/kv/TestRocksDbKeyValueReader.java
>  PRE-CREATION 
>   samza-shell/src/main/bash/read-rocksdb-tool.sh PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/39252/diff/
> 
> 
> Testing
> ---
> 
> ./bin/check-all.sh
> Tested and verified with sample job on Yarn.
> 
> 
> Thanks,
> 
> Navina Ramesh
> 
>



Re: Review Request 40106: SAMZA-812 CachedStore flushes too often

2015-11-11 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40106/#review106084
---

Ship it!


Ship It!

- Yi Pan (Data Infrastructure)


On Nov. 9, 2015, 9:30 p.m., Tommy Becker wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40106/
> ---
> 
> (Updated Nov. 9, 2015, 9:30 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-812
> https://issues.apache.org/jira/browse/SAMZA-812
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Fix for SAMZA-812. Only flush CachedStore when necessary, with the exception 
> that this preserves the buggy flush behavior for array keys. Otherwise the 
> store will not behave properly for array keys due to the mismatch between the 
> reference semantics of the cache vs the value semantics of the store. See the 
> bug for details.
> 
> 
> Diffs
> -
> 
>   samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStore.scala 
> 1112350 
>   samza-kv/src/test/scala/org/apache/samza/storage/kv/TestCachedStore.scala 
> cc9c9f3 
> 
> Diff: https://reviews.apache.org/r/40106/diff/
> 
> 
> Testing
> ---
> 
> Unit tested
> 
> 
> Thanks,
> 
> Tommy Becker
> 
>



Re: Review Request 40106: SAMZA-812 CachedStore flushes too often

2015-11-11 Thread Yi Pan (Data Infrastructure)


> On Nov. 11, 2015, 7:31 a.m., Yi Pan (Data Infrastructure) wrote:
> > samza-kv/src/test/scala/org/apache/samza/storage/kv/TestCachedStore.scala, 
> > line 37
> > <https://reviews.apache.org/r/40106/diff/1/?file=1120744#file1120744line37>
> >
> > question: so, here by preserving the old behavior for array keys, the 
> > end result is that the array keys would be immediately flushed out to the 
> > store as they are today, right? Wouldn't it be nicer to fix the 
> > CachedStore's cache hit issue w/ array keys s.t. array keys and other 
> > primitive key types behave the same?
> 
> Tommy Becker wrote:
> I think boxed primitives will work ok as is since they have sane 
> equals/hashCode methods; arrays don't. See 
> https://issues.apache.org/jira/browse/SAMZA-505 for background on the current 
> behavior. In short, we could special case arrays, but they thought it would 
> be better to just call out the semantics of the cache more loudly.

Yes. Thanks for digging up the history on this issue. Sounds reasonable.
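
For anyone reading this thread later, a small standalone illustration of the equals/hashCode point above, using a plain HashMap as a stand-in for the cache. This is not CachedStore code, and the ByteBuffer wrapping is only shown to contrast value semantics with reference semantics; it is not what the patch does.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

final class ArrayKeyCacheDemo {

    /** Looks up an equal-content but distinct array; arrays use identity equals/hashCode. */
    static String lookupWithRawArrayKey() {
        Map<byte[], String> cache = new HashMap<>();
        cache.put(new byte[] {1, 2, 3}, "cached");
        return cache.get(new byte[] {1, 2, 3});
    }

    /** Same bytes wrapped in ByteBuffer, whose equals/hashCode compare content. */
    static String lookupWithWrappedArrayKey() {
        Map<ByteBuffer, String> cache = new HashMap<>();
        cache.put(ByteBuffer.wrap(new byte[] {1, 2, 3}), "cached");
        return cache.get(ByteBuffer.wrap(new byte[] {1, 2, 3}));
    }

    /** Boxed primitives compare by value, so a separately created key still hits. */
    static String lookupWithBoxedKey() {
        Map<Long, String> cache = new HashMap<>();
        cache.put(Long.valueOf(100000L), "cached");
        return cache.get(Long.valueOf(100000L));
    }

    public static void main(String[] args) {
        System.out.println("raw array key:     " + lookupWithRawArrayKey());
        System.out.println("wrapped array key: " + lookupWithWrappedArrayKey());
        System.out.println("boxed key:         " + lookupWithBoxedKey());
    }
}
```

The raw array lookup misses because the second array is a different object, which is the mismatch between the cache's reference semantics and the store's value semantics described in the review request.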


- Yi


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40106/#review106023
---


On Nov. 9, 2015, 9:30 p.m., Tommy Becker wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40106/
> ---
> 
> (Updated Nov. 9, 2015, 9:30 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-812
> https://issues.apache.org/jira/browse/SAMZA-812
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Fix for SAMZA-812. Only flush CachedStore when necessary, with the exception 
> that this preserves the buggy flush behavior for array keys. Otherwise the 
> store will not behave properly for array keys due to the mismatch between the 
> reference semantics of the cache vs the value semantics of the store. See the 
> bug for details.
> 
> 
> Diffs
> -
> 
>   samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStore.scala 
> 1112350 
>   samza-kv/src/test/scala/org/apache/samza/storage/kv/TestCachedStore.scala 
> cc9c9f3 
> 
> Diff: https://reviews.apache.org/r/40106/diff/
> 
> 
> Testing
> ---
> 
> Unit tested
> 
> 
> Thanks,
> 
> Tommy Becker
> 
>



Re: Review Request 40034: added list-yarn-job.sh

2015-11-09 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40034/#review105721
---

Ship it!


Ship It!

- Yi Pan (Data Infrastructure)


On Nov. 6, 2015, 7:42 p.m., Boris Shkolnik wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40034/
> ---
> 
> (Updated Nov. 6, 2015, 7:42 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-810
> https://issues.apache.org/jira/browse/SAMZA-810
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> SAMZA-810: added list-yarn-job.sh
> 
> 
> Diffs
> -
> 
>   samza-shell/src/main/bash/list-yarn-job.sh PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/40034/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Boris Shkolnik
> 
>



Re: Review Request 39119: SAMZA-792: SamzaAppMaster Java code needs to pass the requested container memory size to RM

2015-10-08 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39119/#review101950
---

Ship it!


LGTM. Just a minor comment in test cases.


samza-yarn/src/test/java/org/apache/samza/job/yarn/TestSamzaTaskManager.java 
(line 172)
<https://reviews.apache.org/r/39119/#comment159461>

It would be nice to add a unit test here to verify that the 
SamzaContainerRequest actually uses the configured request parameters in 
YarnConfig, instead of the default ones.
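
Something along these lines is the shape of check I have in mind. This is only a sketch with stand-in types: SketchRequest, the "example.*" config keys, and the default values are hypothetical and are not the real SamzaContainerRequest or YarnConfig.

```java
import java.util.HashMap;
import java.util.Map;

final class RequestUsesConfigSketch {
    // Assumed defaults for this sketch only.
    static final int DEFAULT_MEMORY_MB = 1024;
    static final int DEFAULT_CPU_CORES = 1;

    /** Hypothetical container request holding the resources to ask for. */
    static final class SketchRequest {
        final int memoryMb;
        final int cpuCores;

        SketchRequest(int memoryMb, int cpuCores) {
            this.memoryMb = memoryMb;
            this.cpuCores = cpuCores;
        }
    }

    /** Builds a request from config, falling back to defaults only for missing keys. */
    static SketchRequest fromConfig(Map<String, String> config) {
        int memoryMb = Integer.parseInt(config.getOrDefault(
                "example.container.memory.mb", String.valueOf(DEFAULT_MEMORY_MB)));
        int cpuCores = Integer.parseInt(config.getOrDefault(
                "example.container.cpu.cores", String.valueOf(DEFAULT_CPU_CORES)));
        return new SketchRequest(memoryMb, cpuCores);
    }

    public static void main(String[] args) {
        // Deliberately use values that differ from the defaults, so the check
        // fails if the request silently falls back to them.
        Map<String, String> config = new HashMap<>();
        config.put("example.container.memory.mb", "4096");
        config.put("example.container.cpu.cores", "2");

        SketchRequest request = fromConfig(config);
        if (request.memoryMb != 4096 || request.cpuCores != 2) {
            throw new AssertionError("request ignored the configured values");
        }
        System.out.println("request uses configured memory and cpu");
    }
}
```

The important part is choosing configured values that differ from the defaults; otherwise the test passes even when the configuration is ignored.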


- Yi Pan (Data Infrastructure)


On Oct. 8, 2015, 7:39 a.m., Navina Ramesh wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39119/
> ---
> 
> (Updated Oct. 8, 2015, 7:39 a.m.)
> 
> 
> Review request for samza, Yan Fang, Chinmay Soman, Jake Maes, and Yi Pan 
> (Data Infrastructure).
> 
> 
> Bugs: SAMZA-792
> https://issues.apache.org/jira/browse/SAMZA-792
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> SAMZA-792: SamzaAppMaster Java code needs to pass the requested container 
> memory size to RM
> 
> 
> Diffs
> -
> 
>   
> samza-yarn/src/main/java/org/apache/samza/job/yarn/AbstractContainerAllocator.java
>  eec1708571cb361d9c228efa19a14b24a3ae4a8e 
>   samza-yarn/src/main/java/org/apache/samza/job/yarn/ContainerAllocator.java 
> 9911540ad65cc75fc7f74f97264573ef2a80dad2 
>   
> samza-yarn/src/main/java/org/apache/samza/job/yarn/HostAwareContainerAllocator.java
>  e3b58685084a8643f0ef554a00daadff409a8ffa 
>   
> samza-yarn/src/main/java/org/apache/samza/job/yarn/SamzaContainerRequest.java 
> 9441d772af6971ad7ef4665430cf8fe20ce4f24b 
>   samza-yarn/src/main/java/org/apache/samza/job/yarn/SamzaTaskManager.java 
> 12f2f2cba980c82f07b3919771f841dfca7a7945 
>   
> samza-yarn/src/test/java/org/apache/samza/job/yarn/TestContainerAllocator.java
>  01f32a47726ef5b8e8512826a3336ddfa7709eaf 
>   
> samza-yarn/src/test/java/org/apache/samza/job/yarn/TestHostAwareContainerAllocator.java
>  663ea250e88949da13ce2af7dbecd4cb737e75d5 
>   
> samza-yarn/src/test/java/org/apache/samza/job/yarn/TestSamzaTaskManager.java 
> 4c1eaa9354e3e3cfed9bf5e032d6d9e89a9bd8b5 
>   
> samza-yarn/src/test/java/org/apache/samza/job/yarn/util/MockContainerAllocator.java
>  85f871a85b8fced212c7418d4c9a7f0de702811e 
> 
> Diff: https://reviews.apache.org/r/39119/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Navina Ramesh
> 
>



Re: Review Request 39119: SAMZA-792: SamzaAppMaster Java code needs to pass the requested container memory size to RM

2015-10-08 Thread Yi Pan (Data Infrastructure)

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39119/#review101974
---

Ship it!


Ship It!

- Yi Pan (Data Infrastructure)


On Oct. 8, 2015, 10:07 p.m., Navina Ramesh wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39119/
> ---
> 
> (Updated Oct. 8, 2015, 10:07 p.m.)
> 
> 
> Review request for samza, Yan Fang, Chinmay Soman, Jake Maes, and Yi Pan 
> (Data Infrastructure).
> 
> 
> Bugs: SAMZA-792
> https://issues.apache.org/jira/browse/SAMZA-792
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> SAMZA-792: SamzaAppMaster Java code needs to pass the requested container 
> memory size to RM
> 
> 
> Diffs
> -
> 
>   
> samza-yarn/src/main/java/org/apache/samza/job/yarn/AbstractContainerAllocator.java
>  eec1708571cb361d9c228efa19a14b24a3ae4a8e 
>   samza-yarn/src/main/java/org/apache/samza/job/yarn/ContainerAllocator.java 
> 9911540ad65cc75fc7f74f97264573ef2a80dad2 
>   
> samza-yarn/src/main/java/org/apache/samza/job/yarn/HostAwareContainerAllocator.java
>  e3b58685084a8643f0ef554a00daadff409a8ffa 
>   
> samza-yarn/src/main/java/org/apache/samza/job/yarn/SamzaContainerRequest.java 
> 9441d772af6971ad7ef4665430cf8fe20ce4f24b 
>   samza-yarn/src/main/java/org/apache/samza/job/yarn/SamzaTaskManager.java 
> 12f2f2cba980c82f07b3919771f841dfca7a7945 
>   
> samza-yarn/src/test/java/org/apache/samza/job/yarn/TestContainerAllocator.java
>  01f32a47726ef5b8e8512826a3336ddfa7709eaf 
>   
> samza-yarn/src/test/java/org/apache/samza/job/yarn/TestHostAwareContainerAllocator.java
>  663ea250e88949da13ce2af7dbecd4cb737e75d5 
>   
> samza-yarn/src/test/java/org/apache/samza/job/yarn/TestSamzaTaskManager.java 
> 4c1eaa9354e3e3cfed9bf5e032d6d9e89a9bd8b5 
>   
> samza-yarn/src/test/java/org/apache/samza/job/yarn/util/MockContainerAllocator.java
>  85f871a85b8fced212c7418d4c9a7f0de702811e 
> 
> Diff: https://reviews.apache.org/r/39119/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Navina Ramesh
> 
>


