HBase spout?

2014-02-10 Thread Niels Basjes
Hi,

I'm looking for a way to hook Storm to an HBase table.
What I've found so far is this one https://github.com/ypf412/storm-hbase but
that simply scans the table 'very often' and imposes a requirement on the
structure of the row key.

Is there a storm-hbase spout that does not have this requirement and
'simply' hooks into the events in the HBase WAL?

-- 
Best regards / Met vriendelijke groeten,

Niels Basjes


Reading state of a streaming topology via DRPC?

2014-02-15 Thread Niels Basjes
Hi,

I want to create a bolt that keeps some kind of state (an aggregate) about the
messages it has seen so far (e.g. a web click stream).
Once such a bolt has gathered information I would like to get at that
information from an application I am designing.

So far I've come up with two ways of getting at this data:
1) Persist it into something like HBase (i.e. push from the bolt).
2) Use DRPC to query the bolt state directly.

Regarding this last idea (using DRPC): is this possible?
If it is, where can I find an example of how to create a single
topology that is both streaming and DRPC?

Thanks.

-- 
Best regards,

Niels Basjes


Re: Reading state of a streaming topology via DRPC?

2014-02-18 Thread Niels Basjes
Thanks for the pointer. I'll have a closer look at this.
Is there also an example in the 'plain' Storm API?

Niels
On Feb 16, 2014 11:14 AM, "Enno Shioji"  wrote:

> This uses Trident but I think it covers what you need:
>
> https://github.com/eshioji/trident-tutorial/blob/master/src/main/java/tutorial/storm/trident/Part04_BasicStateAndDRPC.java
>
>
> On Sat, Feb 15, 2014 at 8:17 PM, Niels Basjes  wrote:
>
>> Hi,
>>
>> I want to create a bolt that keeps some kind of state (an aggregate) about
>> the messages it has seen so far (e.g. a web click stream).
>> Once such a bolt has gathered information I would like to get at that
>> information from an application I am designing.
>>
>> So far I've come up with two ways of getting at this data:
>> 1) Persist it into something like HBase (i.e. push from the bolt).
>> 2) Use DRPC to query the bolt state directly.
>>
>> Regarding this last idea (using DRPC): is this possible?
>> If it is, where can I find an example of how to create a single
>> topology that is both streaming and DRPC?
>>
>> Thanks.
>>
>> --
>> Best regards,
>>
>> Niels Basjes
>>
>
>
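The Trident example linked in the reply above can be condensed into a schematic fragment like the one below. This is a hedged sketch, not a complete compilable class: `spout`, `drpc`, and the `Split` function are assumed to be defined elsewhere, and the stream and field names are illustrative, not taken from the thread.

```java
// Schematic Trident fragment: one topology that both maintains streaming
// state and answers DRPC queries against that state.
// Assumptions: 'spout' emits a "sentence" field, 'drpc' is a LocalDRPC
// handle (omitted on a real cluster), and Split is a user-defined
// BaseFunction that tokenizes its input.
TridentTopology topology = new TridentTopology();

// Streaming half: aggregate word counts into (in-memory) Trident state.
TridentState wordCounts = topology
    .newStream("clicks", spout)
    .each(new Fields("sentence"), new Split(), new Fields("word"))
    .groupBy(new Fields("word"))
    .persistentAggregate(new MemoryMapState.Factory(),
                         new Count(), new Fields("count"));

// DRPC half: query the same state on demand from a DRPC client.
topology.newDRPCStream("words", drpc)
    .each(new Fields("args"), new Split(), new Fields("word"))
    .groupBy(new Fields("word"))
    .stateQuery(wordCounts, new Fields("word"),
                new MapGet(), new Fields("count"));
```

As for the follow-up question about the 'plain' Storm API: the non-Trident equivalent is considerably more manual, built around the `DRPCSpout`/`ReturnResults` pair, with the bolt serving its own state, which is why Trident is the usual recommendation for this pattern.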


Storm cannot run in combination with a recent Hadoop/HBase version.

2014-02-26 Thread Niels Basjes
Hi,

I'm trying to write some Storm bolts and I want them to output the
information they produce into HBase.
Now the HBase we have running here is based on CDH 4.5.0, which is built
against ZooKeeper versions in the 3.4.x range.

The problem I have is that Storm currently still uses ZooKeeper 3.3.3.

The important difference in my case between these two is that
3.3.x has:  org.apache.zookeeper.server.NIOServerCnxn$Factory
3.4.x has:  org.apache.zookeeper.server.NIOServerCnxnFactory

As a consequence I'm getting a ClassNotFoundException.

I found that this problem was fixed for a short while, but the fix was
reverted because of a performance problem in Curator:
https://github.com/nathanmarz/storm/pull/225

What does it take to get this fixed (i.e. zookeeper goes to a 3.4.x
version)?

-- 
Best regards / Met vriendelijke groeten,

Niels Basjes
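Until the upstream dependency moves, one common client-side workaround (a sketch; the `storm` artifact coordinates and version numbers vary by release and are illustrative here) is to exclude Storm's transitive ZooKeeper and pin a 3.4.x version in the topology's own pom.xml:

```xml
<!-- Sketch: force ZooKeeper 3.4.x onto the topology's classpath.
     Coordinates and versions are illustrative assumptions. -->
<dependency>
  <groupId>storm</groupId>
  <artifactId>storm</artifactId>
  <version>0.9.0.1</version>
  <scope>provided</scope>
  <exclusions>
    <exclusion>
      <groupId>org.apache.zookeeper</groupId>
      <artifactId>zookeeper</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>org.apache.zookeeper</groupId>
  <artifactId>zookeeper</artifactId>
  <version>3.4.5</version>
</dependency>
```

Note that this only affects the topology jar's own dependency tree; the worker JVMs still run with whatever ZooKeeper the Storm installation ships, which is why this thread ultimately needed the upstream patch.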


Re: Storm cannot run in combination with a recent Hadoop/HBase version.

2014-03-03 Thread Niels Basjes
Thanks.

A few days ago I found this recent pull request that seems to work:
https://github.com/apache/incubator-storm/pull/29

Today I updated my RPM scripting to include this update
https://github.com/nielsbasjes/storm-rhel-packaging

Tomorrow I'll do some testing to see if I can actually get a record from
Storm into HBase.

Niels Basjes



On Sat, Mar 1, 2014 at 1:19 PM, Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN) <
skada...@bloomberg.net> wrote:

> We faced this issue a couple of weeks ago and ended up patching the code
> ourselves since it was such a small change.
>
> We weren't aware of the performance issue with the curator client. We
> don't see anything obvious, so I wonder if that's an edge case of some sort.
>
> ----- Original Message -----
> From: Niels Basjes 
> To: user@storm.incubator.apache.org
> At: Feb 26, 2014 11:43:37 AM
>
> Hi,
>
> I'm trying to write some Storm bolts and I want them to output the
> information they produce into HBase.
> Now the HBase we have running here is based on CDH 4.5.0, which is built
> against ZooKeeper versions in the 3.4.x range.
>
> The problem I have is that Storm currently still uses ZooKeeper 3.3.3.
>
> The important difference in my case between these two is that
> 3.3.x has: org.apache.zookeeper.server.NIOServerCnxn$Factory
> 3.4.x has: org.apache.zookeeper.server.NIOServerCnxnFactory
>
> As a consequence I'm getting a ClassNotFoundException.
>
> I found that this problem was fixed for a short while, but the fix was
> reverted because of a performance problem in Curator:
> https://github.com/nathanmarz/storm/pull/225
>
> What does it take to get this fixed (i.e. zookeeper goes to a 3.4.x
> version)?
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
>
>


-- 
Best regards / Met vriendelijke groeten,

Niels Basjes


Re: Storm-0.9.0-wip21 failed for topology with Hadoop and Hbase dependencies

2014-03-09 Thread Niels Basjes
Try putting those static initializations into the prepare.
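The advice above can be sketched in plain Java. This is deliberately written without Storm or HBase on the classpath, so `FakeClient` and `ThresholdBolt` below only stand in for `HTable` and the bolt in question: non-serializable clients should be created once per worker in `prepare()`, which runs after the bolt has been deserialized on the worker, rather than in a static initializer that may run in a different JVM or before configuration is available.

```java
// Minimal sketch of the "create resources in prepare()" pattern.
// FakeClient stands in for org.apache.hadoop.hbase.client.HTable.
public class PrepareInitSketch {

    static class FakeClient {
        void put(String row) { /* would write to HBase here */ }
    }

    static class ThresholdBolt {
        // transient: the client is never serialized with the bolt;
        // it is (re)created on each worker in prepare().
        private transient FakeClient table;

        void prepare() {
            table = new FakeClient();
        }

        boolean execute(String row) {
            if (table == null) {
                // This is the NullPointerException seen in the thread:
                // execute() ran against a client that was never
                // successfully created in this JVM.
                return false;
            }
            table.put(row);
            return true;
        }
    }

    public static void main(String[] args) {
        ThresholdBolt bolt = new ThresholdBolt();
        if (bolt.execute("row1")) throw new AssertionError("client must not exist before prepare()");
        bolt.prepare();
        if (!bolt.execute("row1")) throw new AssertionError("client must exist after prepare()");
        System.out.println("ok");
    }
}
```

In the quoted bolt below, note also that `prepare()` swallows the `IOException` from `new HTable(...)`; if that constructor fails, `table` stays null and the NPE surfaces later at `table.put(put)`, far from the real cause.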
On Mar 8, 2014 12:20 PM, "Zheng Xue"  wrote:

> Thanks for your reply.
>
> I excluded the log4j dependency in the pom file. The topology can run in
> Storm-0.9.0-wip21 now.
>
> But another error appears in the last bolt, which is responsible for
> writing statistics into HBase. Here is the exception information:
>
>
> java.lang.RuntimeException: java.lang.NullPointerException
>   at 
> backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:87)
>   at 
> backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:58)
>   at 
> backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:62)
>   at 
> backtype.storm.daemon.executor$fn__3483$fn__3495$fn__3542.invoke(executor.clj:715)
>   at backtype.storm.util$async_loop$fn__441.invoke(util.clj:396)
>   at clojure.lang.AFn.run(AFn.java:24)
>   at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.NullPointerException
>   at 
> com.mycompany.app.MobileNetLogThresholdBolt.execute(MobileNetLogThresholdBolt.java:81)
>   at 
> backtype.storm.daemon.executor$fn__3483$tuple_action_fn__3485.invoke(executor.clj:610)
>   at 
> backtype.storm.daemon.executor$mk_task_receiver$fn__3406.invoke(executor.clj:381)
>   at 
> backtype.storm.disruptor$clojure_handler$reify__2948.onEvent(disruptor.clj:43)
>   at 
> backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:79)
>
> Line 81 in MobileNetLogThresholdBolt.java is "table.put(put)" (see the
> Java code of this bolt below). According to
> https://github.com/nathanmarz/storm/wiki/Troubleshooting, multiple
> threads using the OutputCollector can lead to this exception. But how can I
> fix it? Thanks.
>
>  public class MobileNetLogThresholdBolt implements IRichBolt {
> private OutputCollector outputCollector;
> public static Configuration configuration;
> public static String tablename = "t_mobilenet_threshold";
> public static HTable table;
> static {
>  configuration = HBaseConfiguration.create();
>  configuration.set("hbase.zookeeper.property.clientPort","2181");
>  configuration.set("hbase.zookeeper.quorum", "xx.xx.xx.xx");
>  configuration.set("hbase.master", "xx.xx.xx.xx:6");
> }
>
>
> private Log log = LogFactory.getLog(MobileNetLogThresholdBolt.class);
>
> @Override
> public void prepare(Map map, TopologyContext topologyContext,
> OutputCollector outputCollector) {
> this.outputCollector = outputCollector;
> try {
>  table = new HTable(configuration, tablename);
>  } catch (IOException e) {
>  // TODO Auto-generated catch block
>  e.printStackTrace();
>  }
>
> }
>
>
> @Override
>  public void execute(Tuple tuple) {
>
> log.info("deal data " + tuple.getString(0) + "=" +
> tuple.getInteger(1));
> if (tuple.getInteger(1) > 2) {
>
> Put put = new Put(Bytes.toBytes(tuple.getString(0)));
> put.add(Bytes.toBytes("STAT_INFO"), Bytes.toBytes("COUNT"),
> Bytes.toBytes(String.valueOf(tuple.getInteger(1))));
> try {
>  table.put(put);
>  } catch (IOException e) {
>  e.printStackTrace();
>  }
> }
> this.outputCollector.emit(tuple, tuple.getValues());
> this.outputCollector.ack(tuple);
> }
>
> @Override
> public void cleanup() {
>
> }
>
> @Override
> public void declareOutputFields(OutputFieldsDeclarer
> outputFieldsDeclarer) {
> }
>
> @Override
> public Map<String, Object> getComponentConfiguration() {
> return null;
> }
> }
>
>
> 2014-03-08 0:53 GMT+08:00 bijoy deb :
>
>> Hi Zheng,
>>
>> Did you look at the logs to see what exactly the error message is?
>> In case you see any error that says "multiple default.yaml in path
>> ...", try the below:
>>   -- In the with-dependencies jar, check whether you have a default.yaml
>> or storm.yaml file. If yes, please delete it and try submitting the
>> resulting jar.
>>
>> Thanks
>> Bijoy
>>
>>
>> On Fri, Mar 7, 2014 at 10:51 AM, Zheng Xue  wrote:
>>
>>> Hi, all:
>>>
>>> I was trying to build a Storm topology with Hadoop and HBase
>>> dependencies, and I want to run this topology with Storm-on-YARN. The
>>> version of Storm in it is 0.9.0-wip21. I created the jar file with Maven,
>>> and the pom.xml file is attached.
>>>
>>> I submitted the topology (with dependencies), and there were no
>>> exceptions, but it didn't run at all. I checked the supervisor logs (see
>>> bottom), which show that the workers failed to start.
>>>
>>> To trace the cause of this issue, I added a test topology
>>> (WordCountTopology) to the jar file. The problem remains when trying to
>>> submit the WordCountTopology with dependencies, but it works well when
>>> submitting the topology without dependencies.
>>>
>>> To make it clear where this issue comes from, I ran the topology
>>> without Storm-on-YARN. I

Re: Question on Kafka Spout

2014-03-29 Thread Niels Basjes
It looks like you used a ZooKeeper connect string with port numbers in
there, i.e. host:port,host:port.
Storm is the only tool I know that forces you to use bare hostnames and the
port in a separate property.
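For comparison, the shape Storm expects in storm.yaml (a sketch; hostnames are placeholders) is bare hostnames with the port in its own property, whereas most other ZooKeeper clients take a single connect string:

```yaml
# Storm: bare hostnames, port in a separate property.
storm.zookeeper.servers:
  - "zk1.example.com"
  - "zk2.example.com"
storm.zookeeper.port: 2181

# Typical connect-string form used by most other tools:
#   zk1.example.com:2181,zk2.example.com:2181
```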

Niels
On Mar 29, 2014 12:37 AM, "Software Dev"  wrote:

> When running locally it correctly connects to my Kafka broker and I
> see it display:
>
> Writing /kafka/wtf/172.16.1.91:9092:0 the data
> {topology={id=be4212a6-c140-4512-837f-13786e265504,
> name=ExampleTopology}, offset=19250,
>
> but I can't see any nodes that are created when I log into the ZK
> console. Does this get saved somewhere different when running locally?
>


Re: why does storm still use its own zookeeper and not my zookeeper cluster?

2014-04-24 Thread Niels Basjes
Where is the storm.yaml placed?
Do you have a ~/.storm/storm.yaml with overrides?
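One way to check which configuration Storm actually resolves on the client side (a sketch; assumes the `storm` script is on the PATH, and note this reflects the local merge of defaults.yaml with ~/.storm/storm.yaml, not necessarily what the daemons see) is the localconfvalue command:

```shell
# Print the locally resolved value for the ZooKeeper server list.
storm localconfvalue storm.zookeeper.servers
```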

Niels
On Apr 24, 2014 8:05 AM, "ch huang"  wrote:

> Hi maillist,
> I am a Storm newbie. When I start the Storm nimbus and supervisor,
> I check the logs and find the following info:
>
> 2014-04-08 09:26:02 o.a.z.ZooKeeper [INFO] Client
> environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011 22:27 GMT
> 2014-04-08 09:26:02 o.a.z.s.ZooKeeperServer [INFO] Server
> environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011 22:27 GMT
> But my ZooKeeper is 3.4.5:
>
> # rpm -qa|grep zookeeper
> zookeeper-server-3.4.5+23-1.cdh4.4.0.p0.24.el6.noarch
> zookeeper-3.4.5+23-1.cdh4.4.0.p0.24.el6.noarch
> my storm.yaml is
>
> storm.zookeeper.servers:
> - 192.168.10.220
> - 192.168.10.221
> - 192.168.10.223
> But why does Storm still use its built-in ZooKeeper and not my ZooKeeper cluster?
>


Re: [VOTE] Storm Logo Contest - Final Round

2014-06-10 Thread Niels Basjes
#10 - 5 pts.
On Jun 9, 2014 8:39 PM, "P. Taylor Goetz"  wrote:

> This is a call to vote on selecting the winning Storm logo from the 3
> finalists.
>
> The three candidates are:
>
>  * [No. 6 - Alec Bartos](
> http://storm.incubator.apache.org/2014/04/23/logo-abartos.html)
>  * [No. 9 - Jennifer Lee](
> http://storm.incubator.apache.org/2014/04/29/logo-jlee1.html)
>  * [No. 10 - Jennifer Lee](
> http://storm.incubator.apache.org/2014/04/29/logo-jlee2.html)
>
> VOTING
>
> Each person can cast a single vote. A vote consists of 5 points that can
> be divided among multiple entries. To vote, list the entry number, followed
> by the number of points assigned. For example:
>
> #1 - 2 pts.
> #2 - 1 pt.
> #3 - 2 pts.
>
> Votes cast by PPMC members are considered binding, but voting is open to
> anyone. In the event of a tie vote from the PPMC, votes from the community
> will be used to break the tie.
>
> This vote will be open until Monday, June 16 11:59 PM UTC.
>
> - Taylor
>


Re: Apache Storm Graduation to a TLP

2014-09-24 Thread Niels Basjes
What is the expected timeline for changes like this:
0.9.3-incubating-SNAPSHOT
in https://github.com/apache/storm/blob/master/pom.xml?


On Mon, Sep 22, 2014 at 11:16 PM, P. Taylor Goetz  wrote:

> I’m pleased to announce that Apache Storm has graduated to a Top-Level
> Project (TLP), and I’d like to thank everyone in the Storm community for
> your contributions and help in achieving this important milestone.
>
> As part of the graduation process, a number of infrastructure changes have
> taken place:
>
> *New website url:* http://storm.apache.org
>
> *New git repo urls:*
>
> https://git-wip-us.apache.org/repos/asf/storm.git (for committer push)
>
> g...@github.com:apache/storm.git
> -or-
> https://github.com/apache/storm.git (for github pull requests)
>
> *Mailing Lists:*
> If you are already subscribed, you’re subscription has been migrated. New
> messages should be sent to the new address:
>
> [list]@storm.apache.org
>
> This includes any subscribe/unsubscribe requests.
>
> Note: The mail-archives.apache.org site will not reflect these changes
> until October 1.
>
>
> Most of these changes have already occurred and are seamless. Please
> update your git remotes and address books accordingly.
>
> - Taylor
>



-- 
Best regards / Met vriendelijke groeten,

Niels Basjes