Review Request 35918: Monitoring page for REST API and the dashboard

2015-06-26 Thread Aleksandar Bircakovic

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35918/
---

Review request for samza.


Repository: samza


Description
---

Added new monitoring page for REST API and the dashboard and removed dashboard 
from ApplicationMaster. Also added table that shortly explains REST service.


Diffs
-

  docs/learn/documentation/versioned/index.html e1b9f2d 
  docs/learn/documentation/versioned/jobs/reprocessing.md 28d9925 
  docs/learn/documentation/versioned/jobs/web-ui-rest-api.md PRE-CREATION 

Diff: https://reviews.apache.org/r/35918/diff/


Testing
---


Thanks,

Aleksandar Bircakovic



Samza and sliding window

2015-06-26 Thread Shekar Tippur
Hello,
My apologies if I have raised it earlier.
Here is the use case:
I have a stream that is partitioned based on application name. I want to be
able to count hte number of events happening for that particular
application in the past 5 minutes (sliding window) and update either
another topic or a local cache.

Is this possible via 0.9 version of Samza?
If not, what is the easiest way to achieve this?

- Shekar


RE: [SAMZA-690] Changelog topic creation should not be in the container code

2015-06-26 Thread Robert Zuljevic
Hi Yi,

Thank you for your quick response! Your suggestions make a lot of sense, and I 
will begin implementing them right away : )

Met vriendelijke groet / Kind regards,
Robert Žuljević
Software Developer


Address: Trifkovicev trg 6, 21000 Novi Sad, Serbia
Tel.: +31 20 6701 947 | +381 21 2155 500
Mobile: +381 64 428 28 46
Skype: robert.zuljevic
Internet: www.levi9.com

Chamber of commerce Levi9 Holding: 34221951
Chamber of commerce Levi9 IT Services BV: 34224746

This e-mail may contain confidential or privileged information. If you are not 
(one of) the intended recipient(s), please notify the sender immediately by 
reply e-mail and delete this message and any attachments permanently without 
retaining a copy. Any review, disclosure, copying, distribution or taking any 
action in reliance on the contents of this e-mail by persons or entities other 
than the intended recipient(s) is strictly prohibited and may be unlawful.
The services of Levi9 are exclusively subject to its general terms and 
conditions. These general terms and conditions can be found on www.levi9.com 
and a copy will be promptly submitted to you on your request and free of charge.

-Original Message-
From: Yi Pan [mailto:nickpa...@gmail.com] 
Sent: Thursday, June 25, 2015 6:38 PM
To: dev@samza.apache.org
Subject: Re: [SAMZA-690] Changelog topic creation should not be in the 
container code

Hi, Robert,

Thanks for digging into this. I am embedding my answers below:


On Thu, Jun 25, 2015 at 7:40 AM, Robert Zuljevic r.zulje...@levi9.com
wrote:

  1.   Is checkpoint topic referred to in the description coordinator
 stream/topic?


In the master branch, checkpoint topic is deprecated (except for migration 
purpose). The checkpoints will be sent to coordinator stream. Hence, there is 
no need to change the creation of checkpoint topic any more.


  2.   Changelog topic creation is handled by TaskStorageManager class
 which is required by SamzaContainer. Would it be preferable to:

 a.   Create TaskStoreManager instance(s) in JobRunner and pass them
 to SamzaContainer

 b.  Create changelog stream in JobRunner/JobCoordinator and skip it
 in TaskStoreManager

I prefer option b in your above proposal, w/ a slight modification:
1. JobCoordinator now has a changelogManager which reads the changelog 
partition to task mapping from the coordinator stream 2. JobCoordinator also 
has access to the job config and will be able to figure out what are the 
changelog topics needed in the job 3. JobCoordinator start should have an 
additional step to create all the changelog topics needed in the job. If 
exists, validate the partition numbers 4. In TaskStoreManager, we should just 
get the changelog topic metadata and validate the partitions are correct.

Does that sound reasonable?

-Yi






 Met vriendelijke groet / Kind regards,

 Robert Žuljević

 Software Developer

 [image: Title: Levi9 IT Services]
  --

 Address: Trifkovicev trg 6, 21000 Novi Sad, Serbia

 Tel.: +31 20 6701 947 | +381 21 2155 500

 Mobile: +381 64 428 28 46

 Skype: robert.zuljevic

 Internet: www.levi9.com



 Chamber of commerce Levi9 Holding: 34221951

 Chamber of commerce Levi9 IT Services BV: 34224746
  --

 This e-mail may contain confidential or privileged information. If you 
 are not (one of) the intended recipient(s), please notify the sender 
 immediately by reply e-mail and delete this message and any 
 attachments permanently without retaining a copy. Any review, 
 disclosure, copying, distribution or taking any action in reliance on 
 the contents of this e-mail by persons or entities other than the 
 intended recipient(s) is strictly prohibited and may be unlawful.

 The services of Levi9 are exclusively subject to its general terms and 
 conditions. These general terms and conditions can be found on 
 www.levi9.com and a copy will be promptly submitted to you on your 
 request and free of charge.





Re: Best way to log from inside a Samza task?

2015-06-26 Thread Rick Mangi
Hey Jason,

If you configure log4j as described here: 
http://samza.apache.org/learn/documentation/0.9/jobs/logging.html 
http://samza.apache.org/learn/documentation/0.9/jobs/logging.html

Your log statements will wind up in the samza-container logs which you can get 
to via the application master gui.

hth,

Rick


 On Jun 26, 2015, at 1:39 PM, ja...@marketingscience.co wrote:
 
 Hello,
 
 
 I am working on a basic Samza task that pulls from one Kafka topic and writes 
 to another. This task runs in Yarn but the Output topic does not contain any 
 data. 
 
 
 In order to troubleshoot this more effectively I would like to log the 
 incoming message as my example below. Ideally, I would like to be able to see 
 the log messages in Yarn, maybe in the .out files in the /logs directory. 
 
 
 Any advice is appreciated. 
 
 
 
 Here is my task:
 
 
 package com.project.samza.tasks;
 
 
 import java.util.Map;
 import org.apache.samza.system.IncomingMessageEnvelope;
 import org.apache.samza.system.OutgoingMessageEnvelope;
 import org.apache.samza.system.SystemStream;
 import org.apache.samza.task.MessageCollector;
 import org.apache.samza.task.StreamTask;
 import org.apache.samza.task.TaskCoordinator;
 
 
 public class exampleStreamTask implements StreamTask {
   private static final SystemStream OUTPUT_STREAM = new SystemStream(“kafka”, 
 “new-topic-test”);
 
 
   @Override
   public void process(IncomingMessageEnvelope envelope,
   MessageCollector collector,
   TaskCoordinator coordinator) {
   String msg = (String) envelope.getMessage();
   System.out.println(msg);
   collector.send(new OutgoingMessageEnvelope(OUTPUT_STREAM, msg));
   }
 }
 
 
 Thanks, 
 
 
 Jason



Best way to log from inside a Samza task?

2015-06-26 Thread jason
Hello,


I am working on a basic Samza task that pulls from one Kafka topic and writes 
to another. This task runs in Yarn but the Output topic does not contain any 
data. 


In order to troubleshoot this more effectively I would like to log the incoming 
message as my example below. Ideally, I would like to be able to see the log 
messages in Yarn, maybe in the .out files in the /logs directory. 


Any advice is appreciated. 



Here is my task:


package com.project.samza.tasks;


import java.util.Map;
import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.system.OutgoingMessageEnvelope;
import org.apache.samza.system.SystemStream;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.TaskCoordinator;


public class exampleStreamTask implements StreamTask {
  private static final SystemStream OUTPUT_STREAM = new SystemStream(“kafka”, 
“new-topic-test”);


  @Override
  public void process(IncomingMessageEnvelope envelope,
      MessageCollector collector,
      TaskCoordinator coordinator) {
          String msg = (String) envelope.getMessage();
          System.out.println(msg);
          collector.send(new OutgoingMessageEnvelope(OUTPUT_STREAM, msg));
  }
}


Thanks, 


Jason

[CANCEL][VOTE] Apache Samza 0.9.1 RC0

2015-06-26 Thread Jakob Homan
As discussed, this vote has been CANCELED.

On 25 June 2015 at 16:34, Yan Fang yanfang...@gmail.com wrote:
 no objection from me. :)

 Thanks,

 Fang, Yan
 yanfang...@gmail.com

 On Thu, Jun 25, 2015 at 4:18 PM, Yi Pan nickpa...@gmail.com wrote:

 Hi, all,

 I have been preparing for the new 0.9.1 RC1 and it is close to be done. I
 am going to cancel this VOTE, if no objections.

 Thanks!

 On Mon, Jun 22, 2015 at 5:41 PM, Yan Fang yanfang...@gmail.com wrote:

  Hi Yi,
 
  This only publishes the artifacts to the staging repository for testing.
  After completing the vote, you can release the artifacts to the public
  repository by clicking the release button. :)
 
  Thanks,
 
  Fang, Yan
  yanfang...@gmail.com
 
  On Mon, Jun 22, 2015 at 5:30 PM, Yi Pan nickpa...@gmail.com wrote:
 
   Hi, Yan,
  
   Thanks for point out that! Actually I saw that last time and had the
   following question: should we publish the artifacts after the VOTE is
   completed or together w/ the VOTE?
   It seems like that we want to publish the binary artifacts together w/
  the
   VOTE, right?
  
   -Yi
  
  
   On Mon, Jun 22, 2015 at 5:25 PM, Yan Fang yanfang...@gmail.com
 wrote:
  
Hi Yi Pan,
   
 Is there any document regarding to how to publish the maven staging
   link?

   
  -- Yes. Check the last part of the
https://github.com/apache/samza/blob/master/RELEASE.md . Not sure if
  you
have seen this. I should have pointed it out earlier. *_*
   
Thanks,
   
Fang, Yan
yanfang...@gmail.com
   
On Mon, Jun 22, 2015 at 3:47 PM, Yi Pan nickpa...@gmail.com wrote:
   
 Hi, guys,

 I am working on the list of things posted by Yan:

 1. I have the difficulty in building the 0.9.1 branch. I think this
  is
 mainly related to SAMZA-721
 https://issues.apache.org/jira/browse/SAMZA-721.

 This seems to be an invalid case in 0.9.1. We only need the
 joint-compilation option in master.

 2. Also, https://issues.apache.org/jira/browse/SAMZA-712 seems
   bothering
 people as well.

 Committed to master and backported to 0.9.1.

 3. https://issues.apache.org/jira/browse/SAMZA-720 is a critical
 bug
   we
 need to fix. Have already attached a patch.

 Plan to backport to 0.9.1.

 4. There is no maven staging link.

 Is there any document regarding to how to publish the maven staging
   link?

 On Mon, Jun 22, 2015 at 3:02 PM, Naveen Somasundaram 
 nsomasunda...@linkedin.com.invalid wrote:

  Hey Yan,
 SAMZA-721 might be because you checkout master and
 switched
  to 0.9.1 branch, and you still have some files from master which
  git
   is
 not
  tracking. Can you try a git clean before you build 0.9.1 ?  AFAIK
  you
 don't
  need joint compilation for core in 0.9.1.
 
  On Mon, Jun 22, 2015 at 1:25 PM, Roger Hoover 
   roger.hoo...@gmail.com
  wrote:
 
   Yan,
  
   I tested to patch locally and it looks good.  Creating a
 patched
 release
   for myself to test in our environment.  Thanks, again.
  
   Sent from my iPhone
  
On Jun 22, 2015, at 10:59 AM, Yi Pan nickpa...@gmail.com
   wrote:
   
Hi, Yan,
   
Thanks a lot for the quick fix on the mentioned bugs. It
 seems
   the
 fix
   for
SAMZA-720 is pretty localized and I am OK to push it into
  0.9.1.
   I
 will
   be
working on back porting those changes to 0.9.1 later today
 and
   fix
 all
   the
release related issues.
   
Thanks!
   
-Yi
   
On Mon, Jun 22, 2015 at 10:30 AM, Roger Hoover 
 roger.hoo...@gmail.com
  
wrote:
   
Yan,
   
You rock.  Thank you so much for the quick fix.  I'm working
  on
  building
and testing the patch.
   
Cheers,
   
Roger
   
On Mon, Jun 22, 2015 at 1:09 AM, Yan Fang 
   yanfang...@gmail.com
   wrote:
   
Hi guys,
   
1. I have the difficulty in building the 0.9.1 branch. I
  think
this
  is
mainly related to SAMZA-721
https://issues.apache.org/jira/browse/SAMZA-721.
   
2. Also, https://issues.apache.org/jira/browse/SAMZA-712
  seems
   bothering
people as well.
   
3. https://issues.apache.org/jira/browse/SAMZA-720 is a
   critical
 bug
   we
need to fix. Have already attached a patch.
   
4. There is no maven staging link.
   
Thanks,
   
Fang, Yan
yanfang...@gmail.com
   
On Sun, Jun 21, 2015 at 1:53 PM, Roger Hoover 
  roger.hoo...@gmail.com
wrote:
   
Hi all,
   
Do you think we could get this bootstrapping bug fixed
  before
 0.9.1
release?  It seems like a critical bug.
   

Review Request 35933: SAMZA-449 Expose RocksDB statistic

2015-06-26 Thread Gustavo Anatoly F . V . Solís

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35933/
---

Review request for samza.


Repository: samza


Description
---

RocksDB statistic


Diffs
-

  
samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStore.scala
 a423f7bd6c43461e051b5fd1f880dd01db785991 
  
samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbStatistic.scala
 PRE-CREATION 
  
samza-kv-rocksdb/src/test/scala/org/apache/samza/storage/kv/TestRocksDbKeyValueStore.scala
 a428a16bc1e9ab4980a6f17db4fd810057d31136 

Diff: https://reviews.apache.org/r/35933/diff/


Testing
---


Thanks,

Gustavo Anatoly F. V. Solís



Re: Samza and sliding window

2015-06-26 Thread Shekar Tippur
Never mind. I see it here:

http://samza.apache.org/learn/documentation/0.8/container/windowing.html

Thanks again Milinda.

- Shekar

On Fri, Jun 26, 2015 at 11:39 AM, Shekar Tippur ctip...@gmail.com wrote:

 Thanks Milinda.
 Is this feature available on 0.8 version of Samza?

 - Shekar

 On Fri, Jun 26, 2015 at 11:24 AM, Milinda Pathirage mpath...@umail.iu.edu
  wrote:

 Hi Shekar,

 You can use Samza's local storage (

 http://samza.apache.org/learn/documentation/0.9/container/state-management.html
 )
 to keep the window state and windowing (
 http://samza.apache.org/learn/documentation/0.9/container/windowing.html)
 capabilities to handle the window advancement. During advancement you can
 update the local cache (Redis in your case). AFAIK, Samza doesn't provide
 any helpers or utilities to handle window state maintenance. You have to
 implement it on top of local storage or if you don't won't fault tolerance
 you can keep the state in-memory too (as long as the state fit in memory).

 Thanks
 Milinda

 On Fri, Jun 26, 2015 at 1:53 PM, Shekar Tippur ctip...@gmail.com wrote:

  Yan,
 
 
  *What do you mean by a local cache? Is it a db like MySQL, something
  likeRocksDB, or even just in-memory?*
 
  Local cache as in Redis
 
 
 
  *When you say another topic, is this the topic consumed by the same
  Samzajob as your 5-minutes-job, or in a separate job? What is the
  relationbetween the topic and the application name*
 
  We dont have a 5 min job. All we have now is a stream of events coming
 from
  a bunch of applications. All these land on a raw kafka topic. The stream
  data has application name. I want to create a job that takes incoming
  stream and group it by application name and count the number of events
 we
  get in a 5 min sliding window.
 
  - Shekar
 
  On Fri, Jun 26, 2015 at 10:29 AM, Yan Fang yanfang...@gmail.com
 wrote:
 
   Hi Shekar,
  
   Need a little more clarification.
  
   What do you mean by a local cache? Is it a db like MySQL, something
  like
   RocksDB, or even just in-memory?
  
   When you say another topic, is this the topic consumed by the same
  Samza
   job as your 5-minutes-job, or in a separate job? What is the relation
   between the topic and the application name?
  
   Thanks,
  
   Fang, Yan
   yanfang...@gmail.com
  
   On Fri, Jun 26, 2015 at 1:08 AM, Shekar Tippur ctip...@gmail.com
  wrote:
  
Hello,
My apologies if I have raised it earlier.
Here is the use case:
I have a stream that is partitioned based on application name. I
 want
  to
   be
able to count hte number of events happening for that particular
application in the past 5 minutes (sliding window) and update either
another topic or a local cache.
   
Is this possible via 0.9 version of Samza?
If not, what is the easiest way to achieve this?
   
- Shekar
   
  
 



 --
 Milinda Pathirage

 PhD Student | Research Assistant
 School of Informatics and Computing | Data to Insight Center
 Indiana University

 twitter: milindalakmal
 skype: milinda.pathirage
 blog: http://milinda.pathirage.org





Re: Samza and sliding window

2015-06-26 Thread Shekar Tippur
Thanks Milinda.
Is this feature available on 0.8 version of Samza?

- Shekar

On Fri, Jun 26, 2015 at 11:24 AM, Milinda Pathirage mpath...@umail.iu.edu
wrote:

 Hi Shekar,

 You can use Samza's local storage (

 http://samza.apache.org/learn/documentation/0.9/container/state-management.html
 )
 to keep the window state and windowing (
 http://samza.apache.org/learn/documentation/0.9/container/windowing.html)
 capabilities to handle the window advancement. During advancement you can
 update the local cache (Redis in your case). AFAIK, Samza doesn't provide
 any helpers or utilities to handle window state maintenance. You have to
 implement it on top of local storage or if you don't won't fault tolerance
 you can keep the state in-memory too (as long as the state fit in memory).

 Thanks
 Milinda

 On Fri, Jun 26, 2015 at 1:53 PM, Shekar Tippur ctip...@gmail.com wrote:

  Yan,
 
 
  *What do you mean by a local cache? Is it a db like MySQL, something
  likeRocksDB, or even just in-memory?*
 
  Local cache as in Redis
 
 
 
  *When you say another topic, is this the topic consumed by the same
  Samzajob as your 5-minutes-job, or in a separate job? What is the
  relationbetween the topic and the application name*
 
  We dont have a 5 min job. All we have now is a stream of events coming
 from
  a bunch of applications. All these land on a raw kafka topic. The stream
  data has application name. I want to create a job that takes incoming
  stream and group it by application name and count the number of events we
  get in a 5 min sliding window.
 
  - Shekar
 
  On Fri, Jun 26, 2015 at 10:29 AM, Yan Fang yanfang...@gmail.com wrote:
 
   Hi Shekar,
  
   Need a little more clarification.
  
   What do you mean by a local cache? Is it a db like MySQL, something
  like
   RocksDB, or even just in-memory?
  
   When you say another topic, is this the topic consumed by the same
  Samza
   job as your 5-minutes-job, or in a separate job? What is the relation
   between the topic and the application name?
  
   Thanks,
  
   Fang, Yan
   yanfang...@gmail.com
  
   On Fri, Jun 26, 2015 at 1:08 AM, Shekar Tippur ctip...@gmail.com
  wrote:
  
Hello,
My apologies if I have raised it earlier.
Here is the use case:
I have a stream that is partitioned based on application name. I want
  to
   be
able to count hte number of events happening for that particular
application in the past 5 minutes (sliding window) and update either
another topic or a local cache.
   
Is this possible via 0.9 version of Samza?
If not, what is the easiest way to achieve this?
   
- Shekar
   
  
 



 --
 Milinda Pathirage

 PhD Student | Research Assistant
 School of Informatics and Computing | Data to Insight Center
 Indiana University

 twitter: milindalakmal
 skype: milinda.pathirage
 blog: http://milinda.pathirage.org



Re: Best way to log from inside a Samza task?

2015-06-26 Thread Rick Mangi
If you do something like this in your log4j.xml

  root
priority value=INFO /
appender-ref ref=RollingAppender /
/root

logger name=cbsamza additivity=false
  level value=“DEBUG /
  appender-ref ref=RollingAppender /
/logger

the root controls samza’s logging and the logger controls your own… I haven’t 
managed to get the imx configuration working yet.




 On Jun 26, 2015, at 1:50 PM, ja...@marketingscience.co wrote:
 
 I was almost there. Got it now. Thanks for your help Rick. 
 
 
 
 
 Cheers, 
 
 
 
 
 Jason
 
 
 
 
 
 
 
 
 
 On Friday, Jun 26, 2558 at 11:43, Rick Mangi r...@chartbeat.com, wrote:
 Hey Jason,
 
 
 If you configure log4j as described here: 
 http://samza.apache.org/learn/documentation/0.9/jobs/logging.html 
 http://samza.apache.org/learn/documentation/0.9/jobs/logging.html
 
 
 Your log statements will wind up in the samza-container logs which you can 
 get to via the application master gui.
 
 
 hth,
 
 
 Rick
 
 
 
 On Jun 26, 2015, at 1:39 PM, ja...@marketingscience.co wrote:
 
 
 
 Hello,
 
 
 
 
 
 I am working on a basic Samza task that pulls from one Kafka topic and 
 writes to another. This task runs in Yarn but the Output topic does not 
 contain any data. 
 
 
 
 
 
 In order to troubleshoot this more effectively I would like to log the 
 incoming message as my example below. Ideally, I would like to be able to 
 see the log messages in Yarn, maybe in the .out files in the /logs 
 directory. 
 
 
 
 
 
 Any advice is appreciated. 
 
 
 
 
 
 
 
 Here is my task:
 
 
 
 
 
 package com.project.samza.tasks;
 
 
 
 
 
 import java.util.Map;
 
 import org.apache.samza.system.IncomingMessageEnvelope;
 
 import org.apache.samza.system.OutgoingMessageEnvelope;
 
 import org.apache.samza.system.SystemStream;
 
 import org.apache.samza.task.MessageCollector;
 
 import org.apache.samza.task.StreamTask;
 
 import org.apache.samza.task.TaskCoordinator;
 
 
 
 
 
 public class exampleStreamTask implements StreamTask {
 
  private static final SystemStream OUTPUT_STREAM = new SystemStream(“kafka”, 
 “new-topic-test”);
 
 
 
 
 
  @Override
 
  public void process(IncomingMessageEnvelope envelope,
 
  MessageCollector collector,
 
  TaskCoordinator coordinator) {
 
  String msg = (String) envelope.getMessage();
 
  System.out.println(msg);
 
  collector.send(new OutgoingMessageEnvelope(OUTPUT_STREAM, msg));
 
  }
 
 }
 
 
 
 
 
 Thanks, 
 
 
 
 
 
 Jason



Re: Triggering emits for streaming window aggregates

2015-06-26 Thread Milinda Pathirage
Hi Yi,

In this specific case ordering is declared in the schema. Quoting from
Calcite documentation


Monotonic columns need to be declared in the schema. The monotonicity is
enforced when records enter the stream and assumed by queries that read
from that stream. We recommend that you give each stream a timestamp column
called rowtime, but you can declare others, orderId, for example.




If we can propagate this ordering information to LogicalAggregate then we
can easily handle this. As I understand required information is accessible
to Calcite query planner. But in our case we need this information after we
get the query plan from Calcite. AFAIK, current API doesn't provide a way
to get this information in scenarios like above where ORDER BY is not
specified in the query (I am not 100% sure about ORDER BY case too. I need
to have a look at a query plan generated for a query with ORDER BY).

Thanks
Milinda

On Fri, Jun 26, 2015 at 2:30 PM, Yi Pan nickpa...@gmail.com wrote:

 Hi, Milinda,

 I thought that in your example, the ordering field is given in GROUP BY.
 Are we missing a way to pass the ordering field(s) to the LogicalAggregate?

 -Yi

 On Fri, Jun 26, 2015 at 10:49 AM, Milinda Pathirage mpath...@umail.iu.edu
 
 wrote:

  Hi Julian,
 
  Even though this is a general question across all the streaming
 aggregates
  which utilize GROUP BY clause and a monotonic timestamp field for
  specifying the window, but I am going to stick to most basic example
 (which
  is from Calcite Streaming document).
 
  SELECT STREAM FLOOR(rowtime TO HOUR) AS rowtime,
productId,
COUNT(*) AS c,
SUM(units) AS units
  FROM Orders
  GROUP BY FLOOR(rowtime TO HOUR), productId;
 
  I was trying to implement an aggregate operator which handles tumbling
  windows via the monotonic field in GROUP By clause in addition to the
  general aggregations. I went in this path because I thought integrating
  windowing aspects (at least for tumbling and hopping) into aggregate
  operator will be easier than trying to extract the window spec from the
  query plan for a query like above. But I hit a wall when trying to figure
  out trigger condition for emitting aggregate results. I was initially
  planning to detect new values for FLOOR(rowtime TO HOUR) and emit current
  aggregate results for previous groups (I was thinking to keep old groups
  around until we clean them up after a timeout). But when trying to
  implement this I figured out that I don’t know how to check which GROUP
 BY
  field is monotonic so that I only detect new values for the monotonic
  field/fields, not for the all the other fields. I think this is not a
  problem for tables because we have the whole input before computation and
  we wait till we are done with the input before emitting the results.
 
  With regards to above can you please clarify following things:
 
  - Is the method I described above for handling streaming aggregates make
  sense at all?
  - Is there a way that I can figure out which fields/expressions in
  LogicalAggregate are monotonic?
  - Or can we write a rule to annotate or add extra metadata to
  LogicalAggregate so that we can get monotonic fields in the GROUP By
 clause
 
  Thanks in advance
  Milinda
 
 
  --
  Milinda Pathirage
 
  PhD Student | Research Assistant
  School of Informatics and Computing | Data to Insight Center
  Indiana University
 
  twitter: milindalakmal
  skype: milinda.pathirage
  blog: http://milinda.pathirage.org
 




-- 
Milinda Pathirage

PhD Student | Research Assistant
School of Informatics and Computing | Data to Insight Center
Indiana University

twitter: milindalakmal
skype: milinda.pathirage
blog: http://milinda.pathirage.org


Re: Best way to log from inside a Samza task?

2015-06-26 Thread jason
I was almost there. Got it now. Thanks for your help Rick. 




Cheers, 




Jason









On Friday, Jun 26, 2558 at 11:43, Rick Mangi r...@chartbeat.com, wrote:
Hey Jason,


If you configure log4j as described here: 
http://samza.apache.org/learn/documentation/0.9/jobs/logging.html 
http://samza.apache.org/learn/documentation/0.9/jobs/logging.html


Your log statements will wind up in the samza-container logs which you can get 
to via the application master gui.


hth,


Rick



 On Jun 26, 2015, at 1:39 PM, ja...@marketingscience.co wrote:

 

 Hello,

 

 

 I am working on a basic Samza task that pulls from one Kafka topic and writes 
 to another. This task runs in Yarn but the Output topic does not contain any 
 data. 

 

 

 In order to troubleshoot this more effectively I would like to log the 
 incoming message as my example below. Ideally, I would like to be able to see 
 the log messages in Yarn, maybe in the .out files in the /logs directory. 

 

 

 Any advice is appreciated. 

 

 

 

 Here is my task:

 

 

 package com.project.samza.tasks;

 

 

 import java.util.Map;

 import org.apache.samza.system.IncomingMessageEnvelope;

 import org.apache.samza.system.OutgoingMessageEnvelope;

 import org.apache.samza.system.SystemStream;

 import org.apache.samza.task.MessageCollector;

 import org.apache.samza.task.StreamTask;

 import org.apache.samza.task.TaskCoordinator;

 

 

 public class exampleStreamTask implements StreamTask {

   private static final SystemStream OUTPUT_STREAM = new SystemStream(“kafka”, 
 “new-topic-test”);

 

 

   @Override

   public void process(IncomingMessageEnvelope envelope,

   MessageCollector collector,

   TaskCoordinator coordinator) {

   String msg = (String) envelope.getMessage();

   System.out.println(msg);

   collector.send(new OutgoingMessageEnvelope(OUTPUT_STREAM, msg));

   }

 }

 

 

 Thanks, 

 

 

 Jason

Re: Review Request 35918: SAMZA-709 Monitoring page for REST API and the dashboard

2015-06-26 Thread Yan Fang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35918/#review89590
---



docs/learn/documentation/versioned/jobs/web-ui-rest-api.md (line 33)
https://reviews.apache.org/r/35918/#comment142228

after SAMZA-418, the dashboard is a little different with new information, 
will you be able to update the dashboard screenshot ?



docs/learn/documentation/versioned/jobs/web-ui-rest-api.md (lines 39 - 44)
https://reviews.apache.org/r/35918/#comment142229

This seems not working correclty. It loses format. I think we can direclty 
use table html tags like what samza-container.md does.


- Yan Fang


On June 26, 2015, 11:50 a.m., Aleksandar Bircakovic wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/35918/
 ---
 
 (Updated June 26, 2015, 11:50 a.m.)
 
 
 Review request for samza.
 
 
 Repository: samza
 
 
 Description
 ---
 
 Added new monitoring page for REST API and the dashboard and removed 
 dashboard from ApplicationMaster. Also added table that shortly explains REST 
 service.
 
 
 Diffs
 -
 
   docs/learn/documentation/versioned/index.html e1b9f2d 
   docs/learn/documentation/versioned/jobs/reprocessing.md 28d9925 
   docs/learn/documentation/versioned/jobs/web-ui-rest-api.md PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/35918/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Aleksandar Bircakovic
 




Re: Review Request 35918: SAMZA-709 Monitoring page for REST API and the dashboard

2015-06-26 Thread Aleksandar Bircakovic

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35918/
---

(Updated June 26, 2015, 11:50 a.m.)


Review request for samza.


Summary (updated)
-

SAMZA-709 Monitoring page for REST API and the dashboard


Repository: samza


Description
---

Added new monitoring page for REST API and the dashboard and removed dashboard 
from ApplicationMaster. Also added table that shortly explains REST service.


Diffs
-

  docs/learn/documentation/versioned/index.html e1b9f2d 
  docs/learn/documentation/versioned/jobs/reprocessing.md 28d9925 
  docs/learn/documentation/versioned/jobs/web-ui-rest-api.md PRE-CREATION 

Diff: https://reviews.apache.org/r/35918/diff/


Testing
---


Thanks,

Aleksandar Bircakovic