Re: LoggingMetricsConsumer

2014-09-26 Thread Raphael Hsieh
, if initialization of my consumer failed, the LoggingMetricsConsumer would also fail. It may have depended on the order that I was registering them in, but I don't remember exactly. Cheers, John On Thu, Sep 25, 2014 at 10:07 AM, Raphael Hsieh raffihs...@gmail.com wrote: Hi, I've been trying

Trident Metrics Consumer

2014-09-26 Thread Raphael Hsieh
I've been following the tutorial here (http://www.bigdata-cookbook.com/post/72320512609/storm-metrics-how-to) to create metrics in Storm. However, I am using Trident, which abstracts bolts away from the user. How can I go about creating metrics in Trident? Thanks -- Raphael Hsieh

LoggingMetricsConsumer

2014-09-25 Thread Raphael Hsieh
? When is the handleDataPoints function called? Thanks -- Raphael Hsieh

Re: metrics consumer logging stormUI data

2014-09-24 Thread Raphael Hsieh
in the metrics.log. -Harsha On Mon, Sep 22, 2014, at 10:41 AM, Raphael Hsieh wrote: Hi Harsha, Did you have to bind the metrics consumer to the default StormUI metrics at all? Or do those automagically get included ? Thanks! On Mon, Sep 22, 2014 at 10:33 AM, Otis Gospodnetic otis.gospodne

Re: metrics consumer logging stormUI data

2014-09-22 Thread Raphael Hsieh
://blog.sematext.com/2014/01/30/announcement-apache-storm-monitoring-in-spm/ I hope this helps. Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Fri, Sep 19, 2014 at 6:12 PM, Raphael Hsieh raffihs...@gmail.com

Re: metrics consumer logging stormUI data

2014-09-22 Thread Raphael Hsieh
/ I hope this helps. Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Fri, Sep 19, 2014 at 6:12 PM, Raphael Hsieh raffihs...@gmail.com wrote: Hi, Using Storm/Trident, how do I register a metrics consumer

metrics consumer logging stormUI data

2014-09-19 Thread Raphael Hsieh
to me how I am supposed to go about doing that. Thanks -- Raphael Hsieh

Re: Parallelism for KafkaSpout

2014-07-23 Thread Raphael Hsieh
for the topic? Regards, Kashyap -- Raphael Hsieh

Re: Naming Components In Trident Topology

2014-07-22 Thread Raphael Hsieh
. Currently the UI shows $mastercoord-bg0 for the spout then the following for the bolts $spoutcoord-spout0 b-1 b-0 Is there any way to make this more friendly? Thanks Justin -- Raphael Hsieh Amazon.com Software Development Engineer I (978) 764-9014

Max Spout Pending

2014-07-14 Thread Raphael Hsieh
sense for a Max Spout Pending value? I expect my topology to have a throughput of around 80,000/s and I've been seeing a complete latency of around 300ms, so given this formula, I'd want 2*80,000*.3 = 48,000 Max Spout Pending. This seems absurdly high to me. -- Raphael Hsieh

Re: Max Spout Pending

2014-07-14 Thread Raphael Hsieh
pending config specifies how many *batches* can be processed simultaneously by your topology. That's why 48,000 seems absurdly high to you. Divide it by the batch size and you'll get the max spout pending config that you were expecting. 2014-07-14 19:00 GMT+02:00 Raphael Hsieh raffihs
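The tuples-versus-batches point in this reply can be worked through numerically. The following is a minimal sketch in plain Java, not Storm code: the class and method names are illustrative, and the batch size of 1,000 tuples is an assumption, since the thread does not state the real one. The 2 x throughput x latency figure is taken from the question as quoted.

```java
// Illustrative helper (not a Storm API): converts a tuple-based in-flight
// estimate into a number of pending batches.
final class MaxSpoutPendingEstimate {
    private MaxSpoutPendingEstimate() {}

    /** Tuples in flight, using the 2 * throughput * latency figure quoted in the thread. */
    static long tuplesInFlight(long tuplesPerSecond, double completeLatencySeconds) {
        return Math.round(2 * tuplesPerSecond * completeLatencySeconds);
    }

    /** Batches needed to hold that many tuples, rounded up. */
    static long pendingBatches(long tuplesInFlight, long batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return (tuplesInFlight + batchSize - 1) / batchSize;
    }
}
```

With the thread's figures (80,000 tuples/s, 0.3 s latency) the first method gives 48,000 tuples; with the assumed batch size of 1,000 the second gives 48 pending batches, which is the order of magnitude the reply is pointing at.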

Spout process latency

2014-07-09 Thread Raphael Hsieh
to fill a batch with data and send it to the first bolt in the topology? Thanks -- Raphael Hsieh

topology system metrics

2014-07-08 Thread Raphael Hsieh
Is there a way to get hold of the topology's system metrics and send them to an external datastore such as DynamoDB? -- Raphael Hsieh

Re: key values in PersistentAggregate

2014-07-02 Thread Raphael Hsieh
actually I think this is a non-issue, given the field exists in the stream already, I should be able to access it right ? On Wed, Jul 2, 2014 at 10:27 AM, Raphael Hsieh raffihs...@gmail.com wrote: From my understanding, if I implement my own state factory to use in PersistentAggregate

Re: using CachedMap in a trident state

2014-06-13 Thread Raphael Hsieh
If we don't serialize the data when we store it in the cache, doesn't that defeat the purpose of having an OpaqueValue in order to keep transactional consistency and the processed exactly once semantics? On Thu, Jun 12, 2014 at 8:57 AM, Raphael Hsieh raffihs...@gmail.com wrote: How come we

using CachedMap in a trident state

2014-06-11 Thread Raphael Hsieh
the same data as the external datastore ? thanks -- Raphael Hsieh

is storm.trident LRUMap distributed among hosts?

2014-06-10 Thread Raphael Hsieh
Is the storm.trident.util.LRUMap distributed among all the hosts in the Storm cluster? If not, is there any way to combine this with a memcache? -- Raphael Hsieh

how does PersistentAggregate distribute the DB Calls ?

2014-06-03 Thread Raphael Hsieh
How does PersistentAggregate distribute the database calls across all the worker nodes? Does it do the global aggregation, then choose a single host to do a multiget/multiput to the external db? Thanks -- Raphael Hsieh

Re: how does PersistentAggregate distribute the DB Calls ?

2014-06-03 Thread Raphael Hsieh
it needs to interact with database. So if you do a persistent global count, for example, it will compute the count for the batch (in parallel), and then the task that finishes the global count will do a single get/update/put to the database. On Tue, Jun 3, 2014 at 3:08 PM, Raphael Hsieh raffihs
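The behaviour described in this reply (count the batch in parallel, then touch the database once) can be pictured with a small stand-alone simulation. This is a sketch in plain Java, not Trident code: `FakeDb`, `applyBatch`, and the `"count"` key are hypothetical names, and an in-memory map stands in for the external database.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simulation only: shows one get and one put per batch for a global count.
final class GlobalCountSketch {
    private GlobalCountSketch() {}

    /** Stand-in for the external database; tracks how often it is read and written. */
    static final class FakeDb {
        private final Map<String, Long> rows = new HashMap<>();
        int gets = 0;
        int puts = 0;

        long get(String key) {
            gets++;
            return rows.getOrDefault(key, 0L);
        }

        void put(String key, long value) {
            puts++;
            rows.put(key, value);
        }
    }

    /** Counts every partition of the batch in parallel, then does a single get/update/put. */
    static long applyBatch(List<List<String>> partitions, FakeDb db) {
        // Partial counts are computed per partition and summed without touching the database.
        long batchCount = partitions.parallelStream().mapToLong(List::size).sum();
        long updated = db.get("count") + batchCount; // single read
        db.put("count", updated);                    // single write
        return updated;
    }
}
```

However many partitions the batch has, the fake database sees exactly one read and one write per call to `applyBatch`.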

Re: Optimizing Kafka Stream

2014-06-02 Thread Raphael Hsieh
, and would likely share resources with other Storm processes (spouts and bolts). I recommend to increase the number of workers so Storm has a chance to spread out the work, and keep a good balance. Hope this helps. Chi On Fri, May 30, 2014 at 4:24 PM, Raphael Hsieh raffihs...@gmail.com

Re: Optimizing Kafka Stream

2014-06-02 Thread Raphael Hsieh
#brokerconfigs (num.partitions) - or when you create the topic. The behavior is different for each version of Kafka, so you should read more documentation. Your topology needs to match the Kafka configuration for the topic. Chi On Mon, Jun 2, 2014 at 8:46 AM, Raphael Hsieh raffihs...@gmail.com wrote

Optimizing Kafka Stream

2014-05-30 Thread Raphael Hsieh
and I'm starting to run out of ideas. Thanks -- Raphael Hsieh

Re: Trident, ZooKeeper and Kafka

2014-05-29 Thread Raphael Hsieh
:23 AM, Raphael Hsieh raffihs...@gmail.com wrote: I'm doing both tridentKafkaConfig.forceFromStart = false; as well as tridentKafkaConfig.startOffsetTime = -1; Neither is working for me. Looking at my nimbus UI, I still get a large spike in processed data, before it levels off and seems

Re: Nimbus UI fields

2014-05-29 Thread Raphael Hsieh
, Raphael Hsieh wrote: I reattached the previous image in case it was too difficult to read before On Tue, May 20, 2014 at 3:31 PM, Raphael Hsieh raffihs...@gmail.com wrote: Hi I'm confused as to what each field in the StormUI represents and how to use the information. [image: Inline

Re: Position in Kafka Stream

2014-05-29 Thread Raphael Hsieh
...@adobe.com wrote: I found this blog helpful: http://www.bigdata-cookbook.com/post/72320512609/storm-metrics-how-to Best regards, Tyson On May 29, 2014, at 8:41 AM, Raphael Hsieh raffihs...@gmail.com wrote: Can someone explain to me what LoggingMetrics is ? I've heard of it and people

Trident, ZooKeeper and Kafka

2014-05-28 Thread Raphael Hsieh
storm system start processing the live data. Thanks -- Raphael Hsieh

Re: Trident, ZooKeeper and Kafka

2014-05-28 Thread Raphael Hsieh
forceStartOffset to -2 to start consuming from the earliest available offset, or -1 to start consuming from the latest available offset. On Wednesday, May 28, 2014, Raphael Hsieh raffihs...@gmail.com wrote: If I don't tell trident to start consuming data from the beginning of the Kafka stream, where does

Re: Trident, ZooKeeper and Kafka

2014-05-28 Thread Raphael Hsieh
This is still not working for me. I've set the offset to -1 and it is still backfilling data. Is there any documentation on the start offsets that I could take a look at ? Or even documentation on kafka.api.OffsetRequest.LatestTime() ? On Wed, May 28, 2014 at 1:01 PM, Raphael Hsieh raffihs

Re: logging 'failed' tuples in mastercoord-bg0

2014-05-28 Thread Raphael Hsieh
-Trident. Is there a particular string other than 'failed' that I can grep for? Thanks -- Raphael Hsieh -- Raphael Hsieh

Re: Different ZooKeeper Cluster for storm and kafka ?

2014-05-28 Thread Raphael Hsieh
Never mind I figured this out. Thanks On Wed, May 28, 2014 at 3:59 PM, Raphael Hsieh raffihs...@gmail.com wrote: Hi I believe it is possible to have my Storm topology run on a different ZooKeeper cluster than the source of my data (this case being Kafka). I cannot seem to find documentation

PersistentAggregate

2014-05-27 Thread Raphael Hsieh
From my understanding, PersistentAggregate should first aggregate the batch, then once the batch has finished aggregating, send it to whatever datastore is specified. Is this the case ? Or will the Persistent Aggregate use the external datastore in order to do the aggregations ? -- Raphael

Batches per second

2014-05-27 Thread Raphael Hsieh
Is there a way to tell how many batches per second are being processed by my topology? Thanks -- Raphael Hsieh

Position in Kafka Stream

2014-05-27 Thread Raphael Hsieh
and what the most recent/oldest position is? Thanks -- Raphael Hsieh

Re: $mastercoord-bg0

2014-05-22 Thread Raphael Hsieh
and look at logs there. On Wed, May 21, 2014 at 4:47 PM, Raphael Hsieh raffihs...@gmail.com wrote: What does the $mastercoord-bg0 represent? It seems to have much less work than my bolt spout. Also, how can I set the parallelism of this master spout? when my other bolts are emitting and acking

$mastercoord-bg0

2014-05-21 Thread Raphael Hsieh
in my code is broken and the mastercoord spout is failing 20, and nothing is being passed through. Is this mastercoord-bg0 spout acking batches ? How might I go about troubleshooting this to figure out why it is broken ? Thanks -- Raphael Hsieh

Multiple writers to datastore ?

2014-04-28 Thread Raphael Hsieh
writes first, then 28. However, 28 did not aggregate on top of 27's aggregate, and hence the final data in the datastore is wrong. How does storm handle this ? -- Raphael Hsieh

Re: Memcached

2014-04-28 Thread Raphael Hsieh
Also, how fault tolerant is using the MemcachedState? If a host/worker node dies, does its in-memory map get lost forever? Or is this map distributed among worker nodes? On Mon, Apr 28, 2014 at 3:57 PM, Raphael Hsieh raffihs...@gmail.com wrote: How would one pull data from a Memcached

Re: Flush aggregated data every X seconds

2014-04-24 Thread Raphael Hsieh
cared so much to implement this) was that the rules need to be dynamic and the topology needs to be static as to make the best use of resources while users are defining that they need. On Thu, Apr 24, 2014 at 11:27 PM, Raphael Hsieh raffihs...@gmail.com wrote: Is there a way in Storm Trident

Re: PersistentAggregate across batches

2014-04-22 Thread Raphael Hsieh
On Apr 22, 2014 10:32 AM, Raphael Hsieh raffihs...@gmail.com wrote: The Reducer/Combiner Aggregators hold the logic to aggregate across an entire batch; however, they do not have the logic to aggregate between batches. In order for this to happen, it must read the previous TransactionId
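The cross-batch step raised here, where the stored transaction id decides which earlier value a new batch builds on, can be pictured as follows. This is a simplified sketch of opaque-transactional bookkeeping for a running count, written in plain Java; it is not Trident's own code, and `OpaqueCountSketch`, `Stored`, and `apply` are hypothetical names.

```java
// Simplified sketch: keeps the current and previous value next to the txid
// that produced the current one, so a replayed batch can be re-applied safely.
final class OpaqueCountSketch {
    private OpaqueCountSketch() {}

    /** What would be stored per key: last txid applied, value before it, value after it. */
    static final class Stored {
        final long currTxid;
        final long prev;
        final long curr;

        Stored(long currTxid, long prev, long curr) {
            this.currTxid = currTxid;
            this.prev = prev;
            this.curr = curr;
        }
    }

    /** Folds one batch's partial count into the stored value. */
    static Stored apply(Stored stored, long batchTxid, long batchCount) {
        if (stored == null) {
            // First batch ever seen for this key: start from zero.
            return new Stored(batchTxid, 0L, batchCount);
        }
        if (batchTxid == stored.currTxid) {
            // Same txid again: the batch was replayed, so rebuild from prev, not curr.
            return new Stored(batchTxid, stored.prev, stored.prev + batchCount);
        }
        if (batchTxid < stored.currTxid) {
            throw new IllegalStateException("batch txid is older than the stored txid");
        }
        // New txid: the old current value becomes the previous value.
        return new Stored(batchTxid, stored.curr, stored.curr + batchCount);
    }
}
```

Applying txid 1 with count 5 and then txid 2 with count 3 leaves a current value of 8; replaying txid 2 with a different count of 4 yields 9 rather than 12, because the replay starts again from the previous value.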

Re: PersistentAggregate across batches

2014-04-22 Thread Raphael Hsieh
The previous link didn't work: https://github.com/nathanmarz/storm/wiki/Trident-API-Overview#operations-on-grouped-streams On Tue, Apr 22, 2014 at 10:30 AM, Raphael Hsieh raffihs...@gmail.com wrote: Yes partially, The part I was missing was getting old values and feeding it through

PersistentAggregate across batches

2014-04-21 Thread Raphael Hsieh
in order to check the TxId of the batch. Instead of using an OpaqueMap class, should I just make my own implementation ? Thanks -- Raphael Hsieh

Re: How to think of batches vs partitions

2014-04-17 Thread Raphael Hsieh
I guess I'm just confused as to when multiGet and multiPut are called when using an implementation of the IBackingMap On Thu, Apr 17, 2014 at 8:33 AM, Raphael Hsieh raffihs...@gmail.com wrote: So from my understanding, this is how the different spout types guarantee single message processing

Re: How to think of batches vs partitions

2014-04-17 Thread Raphael Hsieh
delegates to an IBackingMap which handles the actual persistence. IBackingMap just has multiGet and multiPut methods. An implementation for a database (like Cassandra, Riak, HBase, etc.) just has to implement IBackingMap. On Thu, Apr 17, 2014 at 10:15 AM, Raphael Hsieh raffihs...@gmail.com wrote
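To make the two-method contract concrete, here is a minimal in-memory sketch. `BackingMapSketch` is a local stand-in modelled on the multiGet and multiPut methods named above, declared here only so the example compiles without Storm on the classpath. Its exact parameter types are an assumption and should be checked against the Storm version in use.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Local stand-in for the interface described above (assumed shape, not imported from Storm).
interface BackingMapSketch<T> {
    List<T> multiGet(List<List<Object>> keys);

    void multiPut(List<List<Object>> keys, List<T> vals);
}

// Keeps everything in a HashMap; a real implementation would issue one
// batched read or write against the database per call instead.
final class InMemoryBackingMap<T> implements BackingMapSketch<T> {
    private final Map<List<Object>, T> store = new HashMap<>();

    @Override
    public List<T> multiGet(List<List<Object>> keys) {
        List<T> result = new ArrayList<>(keys.size());
        for (List<Object> key : keys) {
            // Missing keys come back as null, in the same position as the key.
            result.add(store.get(key));
        }
        return result;
    }

    @Override
    public void multiPut(List<List<Object>> keys, List<T> vals) {
        if (keys.size() != vals.size()) {
            throw new IllegalArgumentException("keys and vals must have the same length");
        }
        for (int i = 0; i < keys.size(); i++) {
            store.put(keys.get(i), vals.get(i));
        }
    }
}
```

Each key is itself a list because a grouped stream can be keyed on more than one field. Results are returned in key order so the caller can pair them up by index.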

In memory state, drop data after X time

2014-04-17 Thread Raphael Hsieh
a state for a period of time, then flush it out to an external datastore. -- Raphael Hsieh

How to think of batches vs partitions

2014-04-16 Thread Raphael Hsieh
into partitions within each host running on multiple threads ? Thanks! -- Raphael Hsieh

Re: setting trident transaction window

2014-04-15 Thread Raphael Hsieh
per batch. This should though, definitely give it a shot. On Thu, Apr 10, 2014 at 1:33 PM, Raphael Hsieh raffihs...@gmail.com wrote: Thanks for your reply Jason, So what I'm hearing is that there is no nice way of doing temporal flushes to a database. My main reason for wanting to do

setting trident transaction window

2014-04-09 Thread Raphael Hsieh
, flush it out to an external datastore, then rinse and repeat ? there are some blogs out there regarding how to use a sliding window in storm, however I just want sequential windows in Trident. Thanks -- Raphael Hsieh